Manav Pandey

Senior Research Engineer - Agentic AI

Scottsdale, AZ
AI Research
LLMs
Production ML
Agentic AI

Professional Focus

Architecting enterprise multi-agent AI frameworks serving thousands of users

Fine-tuning and optimizing large language models for production deployment

Leading AI safety research and discovering critical vulnerabilities in foundation models

Building scalable ML systems with comprehensive monitoring and security protocols

Programming Languages

🐍 Python: Expert
🟨 JavaScript: Expert
C++: Proficient
🔵 Go: Proficient
Java: Proficient
🦀 Rust: Learning

Employment Authorization

🇺🇸

US Citizen

About Me

In a Nutshell...

  • Senior Research Engineer at American Express, focused on async RL environment research with code execution sandboxing.

  • Optimizing tool use for small language models using GRPO and PPO with Elo-based scoring systems.

  • Building synthetic dataset generation pipelines for RL training through dialogue tree search.

  • Diverse experience from enterprise R&D to startups, with a focus on RL training, model fine-tuning, and agentic AI.

Experience

Senior Research Engineer - CTO R&D Team
American Express
Sep 2024 - Present

Leading RL research and enterprise-wide adoption of agentic AI systems. Driving research on tool optimization for small language models and async environment training.

Key Achievements:

  • Conducting async RL environment research with code execution sandboxing for safe agent training
  • Single-handedly led enterprise-wide technical sessions distilling recent research (DeepSeek MLA, the Muon optimizer, and others) for 500+ attendees
  • Contributing to patents in the RL, Quantum Computing, and Agentic AI domains
  • Architected enterprise multi-agent AI framework using LangGraph and MCP
  • Established company-wide standards for agentic AI: security protocols, evaluation metrics, and tool-use patterns

Technologies:

LangGraph
MCP
GRPO
PPO
Reinforcement Learning
Small Language Models
Agentic AI

Machine Learning Engineer - Research
Lightsource
Feb 2024 - Sep 2024

Specialized in fine-tuning large language models for multilingual content generation, with a focus on German-language podcast applications.

Key Achievements:

  • Applied DPO (Direct Preference Optimization) for German-language podcast episode description generation
  • Generated synthetic training data and used PPO for fine-tuning Mistral 7B and Mixtral 8x7B models
  • Optimized inference pipelines with INT4-FP8 quantization techniques across heterogeneous GPU hardware
  • Implemented spherical linear interpolation (SLERP) model merging for improved multilingual performance

Technologies:

Mistral
Mixtral
DPO
PPO
Synthetic Data Generation
Model Quantization
GPU Optimization

Director of AI
Curiouser
Dec 2023 - Sep 2024

Led technical direction for an engineering team developing a high-EQ conversational AI platform with advanced reasoning and adapter selection.

Key Achievements:

  • Implemented reasoning and chain-of-thought tool calling architecture for improved response quality
  • Built dynamic LoRA adapter selection, driven by an orchestrator agent, for high-EQ responses across conversation contexts
  • Developed autonomous knowledge graph creation for persistent user understanding
  • Employed task queues with Celery and graph-based agentic systems for parallel, real-time conversational processing

Technologies:

LoRA
Chain-of-Thought
Multi-LLM Architecture
Knowledge Graphs
Celery
Orchestrator Agents
Conversational AI

Software Engineer
American Express
Aug 2022 - Feb 2024

Built production conversational AI systems and implemented comprehensive monitoring across microservices architecture. Focused on scalable AI deployment and system observability.

Key Achievements:

  • Built production conversational AI using BERT models, achieving 90% accuracy across 1M+ monthly interactions
  • Implemented comprehensive APM monitoring across 100+ microservices using Splunk and OpenTelemetry

Technologies:

BERT
Conversational AI
Microservices
Splunk
OpenTelemetry
APM Monitoring

Machine Learning Research Assistant
Texas A&M University
Oct 2021 - May 2022

Conducted research on sentiment classification and cross-border threat detection using advanced machine learning techniques. Applied NLP and deep learning to process large-scale social media and news data.

Key Achievements:

  • Enhanced SVM model accuracy from 79% to 94% through kernel optimization for sentiment classification and cross-border threat detection
  • Applied NLP and deep learning techniques to aggregate and process news/social media data for US-Mexico-Canada supply chain assessment

Technologies:

SVM
Kernel Optimization
NLP
Deep Learning
Sentiment Analysis
Social Media Analytics

Education & Certifications

Academic background and professional certifications in AI and machine learning

Education

Master's in Computer Science (Machine Learning)

Georgia Institute of Technology

2025 - Present

Bachelor's in Computing

Texas A&M University

2018 - 2022

Certifications

Coursera AI Instructor

Coursera

2023

Core Competencies
Foundation Models
Reinforcement Learning
SLM Tool Optimization (GRPO/PPO/Elo)
Production AI Systems
Multi-Agent Systems
LLM Fine-tuning
Model Optimization
Agentic AI

Technical Skills

LLMs & Deep Learning

PyTorch
TensorFlow
Transformers
RLHF/PPO
Supervised Fine-tuning
Distillation
DPO
LangChain
LlamaIndex

AI/ML Engineering

vLLM
TensorRT
GGUF/AWQ/EXL2 Quantization
Distributed Training
CUDA
Model Merging
RAG Architectures

Software Engineering

Python
C++
TypeScript
FastAPI
Docker
Kubernetes
AWS SageMaker
Multi-agent Systems
MCP
Production ML

Research & Safety

AI Safety Research
Adversarial Testing
Model Validation
Bug Bounty Programs
Security Protocols
Evaluation Metrics

© 2026 Manav Pandey. All rights reserved.
