Manav Pandey
I'm a research engineer interested in how neural networks can learn to reason — not by generating token sequences, but by navigating energy landscapes and optimizing latent representations.
I believe small models with structured search and learned energy landscapes will outperform large autoregressive models on reasoning tasks. Enso and Dialogue Tree Search are direct expressions of that conviction.

If I had to describe myself
01 Python Engineer.
Every autoregressive system I’ve shipped started with production Python — conversational AI, LLM fine-tuning, agentic tool-use.
02 ML Engineer.
Building Enso taught me that diffusion-like reasoning — Langevin dynamics refining noise into structure — is a principled alternative to autoregression.
03 Researcher.
Self-supervised learning is my core research bet — JEPA, contrastive methods, and the conviction that reasoning emerges from representations, not token prediction.
My core passions are mechanistic interpretability and self-supervised learning, now focused on JEPA architectures and Langevin dynamics for inference. I’m drawn to energy-based models because they offer a principled framework for reasoning through optimization rather than generation.
I reproduce results before I trust them — Enso started as a replication of Kona 1.0 and became something new. I read papers end to end, question assumptions, and build to understand. When I encounter a claim, my first instinct is to verify it myself.
Experience
Projects
Energy-Based Model for Constraint Satisfaction
A 36.5M-parameter JEPA-EBM that solves hard Sudoku through Langevin dynamics in latent space — navigating a learned energy landscape rather than generating tokens sequentially.
- 96.6% puzzle accuracy — exceeding Kona 1.0’s open-source benchmark of 96.2%
- Forward pass achieves 95.6%; Langevin dynamics adds +1.0% through test-time compute scaling
- Uses mechanistic interpretability to analyze energy-based model reasoning
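The refinement loop at the heart of this approach can be sketched in a few lines. This is a minimal illustration, not Enso's implementation: `energy_fn`, the step count, step size, and noise scale are all placeholder values, and the real model operates on encoder latents with its own learned energy head.

```python
import torch

def langevin_refine(z, energy_fn, steps=50, step_size=0.1, noise_scale=0.005):
    """Refine a latent toward low energy via Langevin dynamics.

    z         -- initial latent (e.g., an encoder's guess for a puzzle)
    energy_fn -- learned scalar energy E(z); lower means more self-consistent
    """
    z = z.clone().requires_grad_(True)
    for _ in range(steps):
        energy = energy_fn(z).sum()
        (grad,) = torch.autograd.grad(energy, z)
        with torch.no_grad():
            # Descend the energy landscape, plus Gaussian exploration noise
            z -= step_size * grad
            z += noise_scale * torch.randn_like(z)
    return z.detach()
```

The test-time compute scaling noted above falls out naturally: running more refinement steps buys additional accuracy on top of the single forward pass, at the cost of extra gradient evaluations.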
Dialogue Tree Search
MCTS-Inspired Synthetic RL Data Generation
A parallel beam search system that treats conversation trajectories as a search tree, using Monte Carlo rollouts to explore diverse dialogue paths. Generates synthetic preference datasets for training tool-using agents via GRPO and PPO with Elo-based scoring.
- MCTS-inspired parallel beam search over conversation trajectories
- Produces preference data for RL fine-tuning of tool-using agents
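The core search loop can be sketched as follows. This is a simplified stand-in for the real system: `expand` and `rollout_value` are hypothetical callbacks (the actual system would sample candidate turns from an LLM and score full Monte Carlo rollouts), and the Elo update shown is the standard pairwise formula used to rank trajectories.

```python
def elo_update(r_a, r_b, a_wins, k=32):
    """Standard Elo update from one pairwise comparison between trajectories."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))
    score_a = 1.0 if a_wins else 0.0
    return r_a + k * (score_a - expected_a), r_b + k * ((1 - score_a) - (1 - expected_a))

def beam_search_dialogue(root, expand, rollout_value, beam_width=4, depth=3):
    """Parallel beam search over conversation trajectories.

    expand(trajectory)        -- candidate next turns for a trajectory
    rollout_value(trajectory) -- Monte Carlo estimate of trajectory quality
    """
    beams = [[root]]
    for _ in range(depth):
        # Expand every kept trajectory by every candidate next turn
        candidates = [traj + [turn] for traj in beams for turn in expand(traj)]
        if not candidates:
            break
        # Keep only the top-scoring trajectories for the next round
        candidates.sort(key=rollout_value, reverse=True)
        beams = candidates[:beam_width]
    return beams
```

The surviving beams form a natural source of preference pairs: a high-Elo trajectory versus a pruned sibling from the same parent node yields a chosen/rejected example for GRPO or PPO fine-tuning.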
Education
Awards & Recognition
American Express Inventor Award
2025
Received for several patent filings spanning agentic AI, mechanistic interpretability, and self-supervised learning.
Anthropic Bug Bounty Program
Contributed to AI safety through Anthropic’s bug bounty program, identifying vulnerabilities in foundation model behavior.