Publications
My work is used by AI labs such as DeepMind [1, 2, 3, 4], Meta [5, 6, 7], NVIDIA [8, 9], and Mila [10, 11, 12]:
Reasoning Gym: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
*, Oliver Stanley*, Joe Sharratt*, Richard Jones*, Abdulhakeem Adefioye, Jean Kaddour, Andreas Köpf
NeurIPS 2025 Spotlight
Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning
*, Karsten Roth*, Zeynep Akata
Interpolate @ NeurIPS 2022 Best Paper Award
Selected Work
Core contributor to Reasoning Gym – a library of procedural data generators for training reasoning models with RL, created by Andreas Köpf (co-author of PyTorch, OpenAssistant). I built dozens of RL environments and ran the zero-shot, external benchmark, and curriculum learning experiments for our NeurIPS publication.
Reinforcement Learning
Worked with Karsten Roth (now Research Scientist at DeepMind) on mitigating catastrophic forgetting in foundation models. Using momentum-based weight interpolation, we demonstrated performance close to the upper bound set by joint training on all data in our NeurIPS workshop publication.
Continual Learning
Wrote several sections of the RLHF Book by Nathan Lambert (Research Scientist at Ai2), where I derived the policy gradient objective and Bradley-Terry loss, provided intuitions for the PPO gradient dynamics, and built the foundations of the code library.
Reinforcement Learning
Led a team that automated glomerular sclerosis classification from gigapixel kidney biopsies; the work is deployed in a system serving over half of the Organ Procurement Organizations in the US.
Healthcare Life Sciences
Part of a team developing models to predict protein-ligand binding affinity from DNA-Encoded Library (DEL) data for drug discovery, resulting in numerous experimentally confirmed binders!
Healthcare Life Sciences
Contributed several datasets to EleutherAI's Evaluation Harness (such as Lambada Translations, Paloma, LegalBench), and implemented higher-is-better indicators and tests for output table consistency.
Model Evaluation
Co-founded uxo.ai in 2023 to develop agents capable of understanding and navigating the web. The goal was to build universal web scrapers that can extract structured content at scale.
Startups