Manav Pandey
I build production AI systems — from agentic frameworks serving thousands of developers to fine-tuned open-weight models running in production. I’m drawn to the boundary where engineering meets research, especially energy-based approaches to reasoning.
Currently a Senior AI Research Engineer at American Express and an M.S. student at Georgia Tech. My side research explores whether energy-based models can offer a principled alternative to autoregressive reasoning — Enso is my first proof of concept.

If I had to describe myself
01. Python Engineer.
Production Python is my foundation — every system I’ve shipped started here, from conversational AI to agentic frameworks to internal developer tools.
02. ML Engineer.
I’ve fine-tuned, quantized, and deployed open-weight models in production — the kind of work where inference latency and model quality both matter.
At Lightsource I was the sole ML engineer: vLLM serving across multiple GPUs, dynamic LoRA adapters, DPO and PPO fine-tuning of Mistral and Mixtral, and INT4/FP8 quantization for production inference. At Amex I built the internal model-routing and tooling layer that sits between developers and LLMs. The craft is making models work reliably at scale, not just in a notebook.
03. Researcher.
I’m pursuing a research question: can energy-based models learn to reason through optimization rather than token prediction?
Enso is my first attempt at an answer — a 36M-parameter model that solves Sudoku via Langevin dynamics in a learned energy landscape. I reproduce results before I trust them; Enso started as a replication of Kona 1.0 and became something new. I’m now studying ML formally at Georgia Tech while continuing to build in this direction.
Experience
Projects
Energy-Based Model for Constraint Satisfaction
A personal research project: a 36.5M-parameter model combining JEPA-style joint embedding with energy-based inference, solving hard Sudoku through Langevin dynamics in latent space — exploring whether energy-based optimization can substitute for autoregressive generation.
- 96.6% puzzle accuracy — exceeding Kona 1.0’s open-source benchmark of 96.2%
- Forward pass achieves 95.6%; Langevin dynamics adds +1.0% through test-time compute scaling
- Uses mechanistic interpretability to analyze energy-based model reasoning
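The core inference move — descending a learned energy landscape with injected noise — can be sketched in a few lines. This is a minimal illustration, not the Enso implementation: the quadratic energy, step sizes, and iteration count are placeholder assumptions standing in for the learned model.

```python
import numpy as np

def langevin_step(z, grad_E, step=0.01, noise=0.005, rng=None):
    """One Langevin update: gradient descent on the energy plus
    Gaussian noise, so iterates can escape shallow local minima."""
    rng = rng or np.random.default_rng(0)
    return z - step * grad_E(z) + noise * rng.standard_normal(z.shape)

# Toy quadratic energy E(z) = ||z - target||^2 standing in for the
# learned latent energy landscape; its gradient is 2 * (z - target).
target = np.array([1.0, -2.0, 0.5])
grad_E = lambda z: 2.0 * (z - target)

rng = np.random.default_rng(42)
z = rng.standard_normal(3)          # random initialization in latent space
for _ in range(500):                # more steps = more test-time compute
    z = langevin_step(z, grad_E, rng=rng)

print(np.allclose(z, target, atol=0.1))
```

Running more Langevin steps trades compute for solution quality — the same test-time scaling behavior reflected in the +1.0% over the plain forward pass.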
Dialogue Tree Search
MCTS-Inspired Synthetic RL Data Generation
A parallel beam search system that treats conversation trajectories as a search tree, using Monte Carlo rollouts to explore diverse dialogue paths. Generates synthetic preference datasets for training tool-using agents via GRPO and PPO with Elo-based scoring.
- MCTS-inspired parallel beam search over conversation trajectories
- Produces preference data for RL fine-tuning of tool-using agents
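The search loop itself is compact: keep a beam of partial trajectories, expand each with candidate next turns, score the extended trajectories, and retain the top few. A minimal sketch follows — the `expand` and `score` functions here are hypothetical stand-ins for the real LLM sampler and Monte Carlo rollout / Elo-based scorer.

```python
import heapq

def beam_search_dialogues(root, expand, score, beam_width=3, depth=3):
    """Parallel beam search over conversation trajectories.

    Each beam entry is a list of turns; expand(traj) proposes candidate
    next turns and score(traj) rates a full trajectory.
    """
    beam = [[root]]
    for _ in range(depth):
        candidates = [traj + [turn] for traj in beam for turn in expand(traj)]
        beam = heapq.nlargest(beam_width, candidates, key=score)
    return beam

# Hypothetical stand-ins: expand proposes two fixed replies, and score
# rewards trajectories that use tools more often.
expand = lambda traj: ["tool_call", "chat"]
score = lambda traj: traj.count("tool_call")

best = beam_search_dialogues("user: hi", expand, score)[0]
print(best)  # ['user: hi', 'tool_call', 'tool_call', 'tool_call']
```

Ranking whole trajectories rather than single turns is what makes the output usable as preference data: high- and low-scoring paths from the same tree form natural chosen/rejected pairs for GRPO or PPO.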
Education
Awards & Recognition
American Express Inventor Award
2025
Recognized for 4 filed patents (28 more in progress) spanning agentic AI, applied ML infrastructure, and self-supervised learning.
Anthropic Bug Bounty Program
Contributed to AI safety through Anthropic’s bug bounty program, identifying vulnerabilities in foundation model behavior.