1. What AI capabilities must we measure?
I’m interested in identifying AI capabilities that matter, grounding them in literature from the human sciences, and breaking them down into constructs that are theoretically meaningful and machine-measurable.
2. How do we measure them rigorously?
I’m interested in working with annotators, designing interfaces, understanding disagreement, and building human-in-the-loop processes to create benchmarks and evaluation datasets. I apply psychometrics (item response theory (IRT), validity, reliability, and measurement theory) to evaluate whether our systems capture what they are intended to measure.
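As a minimal sketch of the kind of psychometric tooling this involves, the two-parameter logistic (2PL) IRT model gives the probability that a respondent (human or model) with ability θ answers an item correctly, given the item's discrimination a and difficulty b. The function name and parameter values below are illustrative, not from any specific benchmark.

```python
import math

def irt_2pl(theta: float, a: float, b: float) -> float:
    """2PL IRT item response function: probability of a correct
    response given ability theta, item discrimination a, and
    item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# When ability equals item difficulty (theta == b), the predicted
# probability of a correct response is exactly 0.5, regardless of a.
p = irt_2pl(theta=0.0, a=1.5, b=0.0)  # -> 0.5
```

In practice, item parameters would be estimated from a response matrix (e.g., via marginal maximum likelihood) rather than set by hand; this sketch only shows the response function that such fits optimize.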
3. Do systems built to demonstrate these capabilities have real-world impact?
Once we define what to measure and build rigorous evaluation methods, I draw on social-science research methods to test whether these signals hold in real-world settings, linking evaluations to deployment through causal and experimental designs.
Looking for
Researcher or research TPM roles on benchmarks, evaluations, and human data teams.
Skills & tools
Programming: Python, R, SQL, Jupyter, Git, LaTeX, Cursor
Data / ML: NumPy, Pandas, Scikit-Learn, PyTorch, Transformers, NLTK
AI: Benchmark & Evaluation Design; Post-Training (QLoRA, RLHF/DPO); RAG; Prompt Engineering; Human & LLM Annotation
Research Methods: Item Response Theory; Dimensionality Reduction (Factor Analysis/PCA); Reliability & Validity; Regression Analysis; Experimental & Quasi-Experimental Designs; Structural Equation Modelling
news
| May 05, 2026 | Returning to Stanford AI4ALL 2026 and the Stanford AIMI Summer Research Internship 2026 as an NLP mentor for high school students |
|---|---|
| Apr 30, 2026 | Two papers accepted at 21st BEA @ ACL 2026: A Bigger Catch: Fine-Grained Curriculum Standards Alignment on the MathFish Benchmark (with Xinman Liu & Teah Shi), and Predicting Item Difficulty and Generating Reading Comprehension Items via an Annotated Repository (with collaborators) |
| Apr 29, 2026 | ClaimCLAIRE: A Trust-Aware Multi-Component Fact-Checking Agent for Open-World Claims (with Xinman Liu) accepted for oral presentation at 6th TrustNLP @ ACL 2026 |
| Apr 15, 2026 | ConvoLearn dataset (40K turns, post-training data for dialogic alignment of LLM tutors) released on Hugging Face |
| Apr 12, 2026 | Attending ASU+GSV 2026 in San Diego on Stanford GSE scholarship |