
Azalia Mirhoseini
Azalia Mirhoseini is the Founder of Ricursive Intelligence, a frontier lab dedicated to recursive self-improvement through AI that designs the chips that fuel it. She is also an Assistant Professor of Computer Science at Stanford University, where she directs Scaling Intelligence, a lab focused on developing scalable and self-improving AI systems and methodologies toward the goal of artificial general intelligence. Previously, she spent several years in industry AI labs, including Google Brain, Anthropic, and Google DeepMind, working on the development of Claude and Gemini. Her past work includes Mixture-of-Experts (MoE) neural architectures, now predominantly used in leading generative AI models; AlphaChip, a pioneering application of deep reinforcement learning to layout optimization, used in the design of advanced chips such as Google's AI accelerators (TPUs) and data center CPUs; and early research on LLM test-time scaling. Her work has been recognized through the Okawa Research Grant, the Google ML and Systems Junior Faculty Award, MIT Technology Review's 35 Under 35 Award, the Best ECE Thesis Award at Rice University, publications in flagship venues such as Nature, and coverage by media outlets including WSJ, NYT, Forbes, MIT Technology Review, IEEE Spectrum, WIRED, and TechCrunch.
I'm generally interested in enabling self-improving AI, including AI for model, software, and hardware design. I believe that recursive self-improvement is the path to general intelligence. I highlight some of our projects below, but please visit Scaling Intelligence's publications, blogposts, and GitHub pages for a comprehensive list of projects, including test-time scaling methodologies, multi-model AI systems, inference optimization, AI-based verification, AI for software, long-context optimization, and more.
Highlights
OpenJarvis
OpenJarvis is an open-source framework for building personal AI agents that run entirely on local devices. Motivated by the Intelligence Per Watt finding that local models already handle the vast majority of real-world queries, OpenJarvis provides the missing software stack to make local-first personal AI practical. The framework is built around three core ideas: shared primitives for composing on-device agents, evaluations that treat energy, latency, and cost as first-class constraints alongside accuracy, and a learning loop that improves models using local trace data. OpenJarvis supports multiple inference backends and includes an energy leaderboard for benchmarking on-device efficiency (Blog'26, GitHub).
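As a rough illustration of treating energy, latency, and cost as first-class metrics alongside accuracy, consider the following minimal sketch; the field names and the ranking rule are illustrative, not OpenJarvis's actual schema:

```python
from dataclasses import dataclass

@dataclass
class AgentEvalResult:
    """One leaderboard row: accuracy alongside efficiency metrics.

    These fields are hypothetical stand-ins for an energy-aware
    evaluation record, not the real OpenJarvis data model.
    """
    name: str
    accuracy: float        # fraction of tasks solved
    energy_joules: float   # energy consumed per benchmark run
    latency_s: float       # median end-to-end latency
    cost_usd: float        # estimated dollar cost per run

def rank_key(r: AgentEvalResult):
    # Rank by accuracy first; break ties by lower energy use,
    # so equally accurate agents compete on efficiency.
    return (-r.accuracy, r.energy_joules)
```

Sorting results with `sorted(results, key=rank_key)` then yields an energy-aware leaderboard ordering.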
Intelligence Per Watt
Intelligence Per Watt (IPW) introduces a unified metric, task accuracy per unit of power, for evaluating the viability of local AI inference on personal devices. Through a large-scale empirical study across 20+ local LMs, 8 hardware accelerators, and 1M+ real-world queries, we show that local models can successfully handle 88.7% of single-turn chat and reasoning queries, with intelligence per watt improving 5.3× from 2023 to 2025 through compounding advances in model architectures (3.1×) and hardware accelerators (1.7×). Hybrid local-cloud routing achieves 60 to 80% reductions in energy, compute, and cost while maintaining answer quality, demonstrating that local inference can meaningfully redistribute demand from centralized cloud infrastructure (Preprint'25).
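The metric itself, and the hybrid routing decision it motivates, can be sketched in a few lines; the confidence threshold and function names here are hypothetical illustrations, not taken from the paper:

```python
def intelligence_per_watt(accuracy: float, avg_power_watts: float) -> float:
    """Task accuracy per unit of power (higher is better)."""
    if avg_power_watts <= 0:
        raise ValueError("power must be positive")
    return accuracy / avg_power_watts

def route_query(local_confidence: float, threshold: float = 0.7) -> str:
    """Hybrid local-cloud routing: keep a query on-device when the
    local model is confident enough, otherwise escalate to the cloud.
    The 0.7 threshold is an arbitrary illustrative value."""
    return "local" if local_confidence >= threshold else "cloud"
```

Under this framing, gains in either model quality (accuracy) or hardware efficiency (watts) compound into a higher IPW.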
KernelBench
KernelBench is a benchmark and evaluation framework for AI-based GPU kernel generation. KernelBench continues to drive progress on AI for performance optimization and has been adopted by industry leaders, including Meta, METR, Nvidia, Prime Intellect, Cognition AI, and SakanaAI (ICML’25).
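A benchmark of this kind hinges on checking each generated kernel against a reference implementation on sampled inputs before measuring speed. A toy sketch of such a correctness check (the kernels and tolerance below are illustrative, not KernelBench's actual harness):

```python
import math

def reference_kernel(xs):
    """Ground-truth operation the generated kernel must reproduce."""
    return [x * 2.0 for x in xs]

def candidate_kernel(xs):
    """Stand-in for an AI-generated kernel under evaluation."""
    return [x + x for x in xs]

def is_correct(candidate, reference, inputs, tol=1e-6):
    """A generated kernel passes only if its outputs match the
    reference on the sampled inputs within a numeric tolerance."""
    out_c, out_r = candidate(inputs), reference(inputs)
    return len(out_c) == len(out_r) and all(
        math.isclose(a, b, abs_tol=tol) for a, b in zip(out_c, out_r)
    )
```

Only kernels that pass this gate would then be timed against the baseline.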
SWiRL
SWiRL is a recursive self-improvement technique enabled by a new multi-step deep RL approach. We leverage LLMs to synthetically generate new reasoning, planning, and tool-use traces that can then be used to train LLMs to advance their multi-step, long-horizon problem-solving capabilities. Importantly, SWiRL generalizes across tasks: for example, training only on multi-hop question-answering improves zero-shot performance on math, and vice versa (COLM'25).
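The core trace-filtering idea can be sketched as follows, with hypothetical `is_valid_step` and `reaches_answer` judges standing in for SWiRL's model-based filtering; the actual method pairs this data with multi-step RL rather than this toy loop:

```python
def filter_traces(traces, is_valid_step, reaches_answer):
    """Keep synthetically generated multi-step traces whose every
    intermediate step passes a judge and whose final step answers
    the task; each retained step becomes one training example
    (context so far, next step)."""
    kept = []
    for trace in traces:
        if all(is_valid_step(s) for s in trace) and reaches_answer(trace[-1]):
            kept.extend((trace[:i], step) for i, step in enumerate(trace))
    return kept
```

Training on the retained step-level pairs is what lets improvements transfer across task families.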
Large Language Monkeys
Large Language Monkeys is a pioneering work on test-time scaling that introduces inference-time scaling laws. While the impact of scaling training compute is well understood, at inference time we often limit compute to only one or a few attempts per problem. Here, we explore inference compute as another axis for scaling by increasing the number of generated samples. We show that the relationship between coverage and the number of samples is often log-linear and can be modeled with an exponentiated power law, suggesting the existence of inference-time scaling laws. In domains with automated verifiers, like coding and formal proofs, these increases in coverage translate directly into improved performance. For example, on SWE-bench Lite, we showed that coverage increases from 15.9% with one sample to a new state-of-the-art performance of 56% with 250 samples (paper’24, ICML Oral’25).
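Coverage under repeated sampling is commonly reported as pass@k, estimated without bias from n generations of which c are correct. The formula below is the standard estimator used across the code-generation literature, not something specific to this paper:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of coverage: the probability that at least
    one of k samples drawn without replacement from n generations is
    correct, given that c of the n are correct."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Plotting `pass_at_k` against k on log axes is what reveals the near log-linear coverage curves described above.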
Constitutional AI
Constitutional AI (CAI) is a novel approach to training Large Language Models (LLMs) that are helpful, harmless, and honest. CAI introduces a new strategy that relies on AI feedback for self-improvement: the LLM criticizes its own outputs against a set of principles (the constitution) and revises them to comply with it. This technique is used to finetune the LLM, first with supervised training and later with reinforcement learning (based on a preference model trained on AI feedback). CAI reduces reliance on human supervision for training, improves transparency, and enables faster adaptation to new domains (Anthropic'22).
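The supervised-stage critique-and-revise loop can be sketched as follows, with `llm` standing in as a hypothetical text-completion callable and the prompt wording purely illustrative:

```python
def constitutional_revision(llm, response, constitution):
    """For each principle, have the model critique its own response
    and then revise it in light of that critique. The final revised
    response (paired with the original prompt) becomes supervised
    finetuning data."""
    for principle in constitution:
        critique = llm(
            f"Critique this response according to the principle "
            f"'{principle}':\n{response}"
        )
        response = llm(
            f"Revise the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    return response
```

In the later RL stage, the same AI feedback is used to train a preference model instead of producing revisions directly.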
FAST
FAST is a full-stack accelerator search framework with a broad design search space, including hardware datapath, software scheduling, and compiler passes such as operation fusion. When evaluated on vision and NLP benchmarks, custom designs generated by FAST show up to 6x improvement in Perf/TDP vs. TPU-v3. We also demonstrated that accelerators with 2x higher Perf/TCO can become ROI-positive at moderate datacenter deployments (ASPLOS'22).
Lossless Accelerators
Developed lossless accelerators using approximate multipliers for fast and energy-efficient inference on vision benchmarks (DATE'22).
AlphaChip
Developed AlphaChip, one of the first deep reinforcement learning approaches used to solve a real-world engineering problem. It generates superhuman or comparable chip layouts in hours rather than the weeks or months of human effort, and its layouts are used in advanced chips, from data centers to mobile phones. I co-founded and led this project, a cross-functional effort with 20+ engineers and researchers across organizations including Google Research and (Hardware) Platforms (Nature'21, AlphaChip'24).
Deep Reinforcement Learning Algorithms
Developed deep reinforcement learning algorithms for automating model parallelism, speeding up deep network training by more than 60% over top-performing baselines (ICML’17, ICLR’18, and NeurIPS’20).
Mixture-of-Experts
Introduced Mixture-of-Experts (MoE) architectures for language models and showed efficient training of LLMs with over 130 billion parameters, yielding state-of-the-art results on established language modeling and translation tasks (ICLR’17).
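The sparse routing at the heart of MoE can be illustrated in a few lines; here the gate scores are given rather than produced by a learned gating network, and the experts are plain callables, so this is a conceptual sketch rather than the trained architecture:

```python
def moe_forward(x, experts, gate_scores, k=1):
    """Minimal sparse Mixture-of-Experts step: route the input to the
    top-k experts by gate score and mix their outputs weighted by the
    renormalized scores. Only k of the experts run, which is what
    keeps per-token compute roughly constant as experts are added."""
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i])[-k:]
    total = sum(gate_scores[i] for i in top)
    return sum(gate_scores[i] / total * experts[i](x) for i in top)
```

With k=1 this reduces to hard routing: a single expert handles each input, selected by the gate.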
Publications
For a full list of recent papers, blogposts, code, and dataset pointers, visit my lab webpage. For a comprehensive list of publications, please visit my Google Scholar page.
Teaching
- Fall 2023, Fall 2024 - CS229s: Systems for Machine Learning
- Winter 2024, Fall 2025 - CS329a: Self-Improving AI Agents

Talks
- Keynote at NeurIPS ML for Systems Workshop, 2025
- Invited speaker at NeurIPS Foundations of Reasoning of LMs Workshop, 2025
- Invited speaker at CoLM RAM2: Reasoning, Attention & Memory-10 Years On Workshop, 2025
- Invited speaker at CoLM SCALR Workshop, 2025
- Panelist at CoLM Main Conference, 2025
- Invited speaker at Simons Institute Workshop on the Future of LLMs and Transformers, 2025 (Video)
- Distinguished Speaker Series, IBM's Watson Research Center, 2025
- Invited speaker at Stanford HAI Congressional Boot Camp on AI, 2025
- Invited speaker at NeurIPS Fine-Tuning in Modern Machine Learning Workshop, 2024
- Invited speaker at NeurIPS ML with New Compute Paradigms (MLNCP) Workshop, 2024
- Keynote speaker at Netflix Machine Learning Summit, 2024
- Invited speaker at Center for AI Safety Annual Meeting, 2024
- Invited speaker at ICML ES-FoMo Workshop, 2024
- Invited speaker at LLM Aided Design Workshop, 2024
- Invited speaker at ISCA Workshop on Cognitive Architectures, 2024
- Panelist at ML and Systems Rising Stars Workshop, 2024
- Panelist at VLSI Symposium, 2024
- Keynote speaker at ASPLOS, 2023
- Keynote speaker at the 32nd Microelectronics Design and Test Symposium, 2023
- Keynote speaker at NeurIPS Workshop on New Frontiers in Graph Learning, 2022
- Invited speaker at AICoM Workshop on AI-Enhanced Co-Design for Next-gen Microelectronics, 2022
- Keynote speaker at CVPR Workshop on Dynamic Neural Networks Meets Computer Vision, 2021 (Slides)
- Keynote speaker at MLSys Workshop on Graph Neural Networks and Systems, 2021
- Invited speaker at IPAM Workshop on Deep Learning and Combinatorial Optimization, 2021
- Invited speaker at NVIDIA GTC, 2021 (Video)
- Panelist at IEEE Custom Integrated Circuits Conference ML for Chip Design Forum, 2021
- Keynote speaker at MIT EmTech China, 2020
- Keynote speaker at Ray Summit, 2020 (Video)
- Keynote speaker at Open Data Science Conference, 2020
- Keynote speaker at International Supercomputing ML Day, 2019
- Keynote speaker at ML in HPC Workshop at Supercomputing, 2018
- Interviewed by MIT Technology Review, 2020
- Interviewed by IEEE Spectrum, 2020
- Interviewed by ACM Learning Center, 2020 (Interview)
- Interviewed by Towards Data Science, 2019 (Video)

Awards
- Okawa Research Grant Award for Self-Improving AI Systems, 2025 (Link)
- Google ML and Systems Junior Faculty Award, 2025 (Link)
- AlphaChip launched as a Google DeepMind flagship project, 2024 (Blogpost)
- Publication in Nature, 2021 (Article)
- MIT Technology Review 35 Innovators Under 35, 2019 (Article)
- Best Thesis Award at Rice University, ECE Department, 2015
- Fellowships and scholarships from Microsoft Research, IBM Research, and Schlumberger, 2010-2015
- Gold Medal in National Math Olympiad, Iran, 2004
If you are interested in working with me at Ricursive Intelligence, please visit ricursive.com
If you are interested in working with me at Stanford, please visit scalingintelligence.stanford.edu