
Azalia Mirhoseini
Azalia Mirhoseini is the Founder of Ricursive Intelligence, a frontier lab dedicated to recursive self-improvement through AI that designs the chips that fuel it. She is also an Assistant Professor of Computer Science at Stanford University, where she directs Scaling Intelligence, a lab focused on developing scalable and self-improving AI systems and methodologies toward the goal of artificial general intelligence. Previously, she spent several years in industry AI labs, including Google Brain, Anthropic, and Google DeepMind, working on the development of Claude and Gemini. Her past work includes Mixture-of-Experts (MoE) neural architectures, now predominantly used in leading generative AI models; AlphaChip, a pioneering application of deep reinforcement learning to layout optimization, used in the design of advanced chips such as Google's AI accelerators (TPUs) and data center CPUs; and early research on LLM test-time scaling. Her work has been recognized through the Okawa Research Grant, the Google ML and Systems Junior Faculty Award, MIT Technology Review's 35 Under 35 Award, the Best ECE Thesis Award at Rice University, publications in flagship venues such as Nature, and coverage by media outlets including WSJ, NYT, Forbes, MIT Technology Review, IEEE Spectrum, WIRED, and TechCrunch.
I'm generally interested in enabling self-improving AI, including AI for model, software, and hardware design. I believe that recursive self-improvement is the path to general intelligence. I highlight some of our projects below, but please visit Scaling Intelligence's publications, blogposts, and GitHub pages for a comprehensive list of projects, including test-time scaling methodologies, multi-model AI systems, inference optimization, AI-based verification, AI for software, long-context optimization, and more.
Highlights
OpenJarvis
OpenJarvis is an open-source framework for building personal AI agents that run entirely on local devices. Motivated by the Intelligence Per Watt finding that local models already handle the vast majority of real-world queries, OpenJarvis provides the missing software stack to make local-first personal AI practical. The framework is built around three core ideas: shared primitives for composing on-device agents, evaluations that treat energy, latency, and cost as first-class constraints alongside accuracy, and a learning loop that improves models using local trace data. OpenJarvis supports multiple inference backends and includes an energy leaderboard for benchmarking on-device efficiency (Blog'26, GitHub).
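As a rough illustration of treating energy, latency, and cost as first-class metrics alongside accuracy, consider the following minimal sketch; the field names and the ranking rule are illustrative, not OpenJarvis's actual schema:

```python
from dataclasses import dataclass

@dataclass
class AgentEvalResult:
    """One leaderboard row: accuracy alongside efficiency metrics.

    These fields are hypothetical stand-ins for an energy-aware
    evaluation record, not the real OpenJarvis data model.
    """
    name: str
    accuracy: float        # fraction of tasks solved
    energy_joules: float   # energy consumed per benchmark run
    latency_s: float       # median end-to-end latency
    cost_usd: float        # estimated dollar cost per run

def rank_key(r: AgentEvalResult):
    # Rank by accuracy first; break ties by lower energy use,
    # so equally accurate agents compete on efficiency.
    return (-r.accuracy, r.energy_joules)
```

Sorting results with `sorted(results, key=rank_key)` then yields an energy-aware leaderboard ordering.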
Intelligence Per Watt
Intelligence Per Watt (IPW) introduces a unified metric, task accuracy per unit of power, for evaluating the viability of local AI inference on personal devices. Through a large-scale empirical study across 20+ local LMs, 8 hardware accelerators, and 1M+ real-world queries, we show that local models can successfully handle 88.7% of single-turn chat and reasoning queries, with intelligence per watt improving 5.3× from 2023 to 2025 through compounding advances in model architectures (3.1×) and hardware accelerators (1.7×). Hybrid local-cloud routing achieves 60 to 80% reductions in energy, compute, and cost while maintaining answer quality, demonstrating that local inference can meaningfully redistribute demand from centralized cloud infrastructure (Preprint'25).
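The metric itself, and the hybrid routing decision it motivates, can be sketched in a few lines; the confidence threshold and function names here are hypothetical illustrations, not taken from the paper:

```python
def intelligence_per_watt(accuracy: float, avg_power_watts: float) -> float:
    """Task accuracy per unit of power (higher is better)."""
    if avg_power_watts <= 0:
        raise ValueError("power must be positive")
    return accuracy / avg_power_watts

def route_query(local_confidence: float, threshold: float = 0.7) -> str:
    """Hybrid local-cloud routing: keep a query on-device when the
    local model is confident enough, otherwise escalate to the cloud.
    The 0.7 threshold is an arbitrary illustrative value."""
    return "local" if local_confidence >= threshold else "cloud"
```

Under this framing, gains in either model quality (accuracy) or hardware efficiency (watts) compound into a higher IPW.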
KernelBench
KernelBench is a benchmark and evaluation framework for AI-based GPU kernel generation. KernelBench continues to drive progress on AI for performance optimization and has been adopted by industry leaders, including Meta, METR, Nvidia, Prime Intellect, Cognition AI, and SakanaAI (ICML’25).
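A benchmark of this kind hinges on checking each generated kernel against a reference implementation on sampled inputs before measuring speed. A toy sketch of such a correctness check (the kernels and tolerance below are illustrative, not KernelBench's actual harness):

```python
import math

def reference_kernel(xs):
    """Ground-truth operation the generated kernel must reproduce."""
    return [x * 2.0 for x in xs]

def candidate_kernel(xs):
    """Stand-in for an AI-generated kernel under evaluation."""
    return [x + x for x in xs]

def is_correct(candidate, reference, inputs, tol=1e-6):
    """A generated kernel passes only if its outputs match the
    reference on the sampled inputs within a numeric tolerance."""
    out_c, out_r = candidate(inputs), reference(inputs)
    return len(out_c) == len(out_r) and all(
        math.isclose(a, b, abs_tol=tol) for a, b in zip(out_c, out_r)
    )
```

Only kernels that pass this gate would then be timed against the baseline.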
SWiRL
SWiRL is a recursive self-improvement technique enabled by a new multi-step deep RL approach. We leverage LLMs to synthetically generate new reasoning, planning, and tool-use traces that can then be used to train LLMs to advance their multi-step, long-horizon problem-solving capabilities. Importantly, SWiRL generalizes across tasks: for example, training only on multi-hop question-answering improves zero-shot performance on math, and vice versa (COLM'25).
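The core trace-filtering idea can be sketched as follows, with hypothetical `is_valid_step` and `reaches_answer` judges standing in for SWiRL's model-based filtering; the actual method pairs this data with multi-step RL rather than this toy loop:

```python
def filter_traces(traces, is_valid_step, reaches_answer):
    """Keep synthetically generated multi-step traces whose every
    intermediate step passes a judge and whose final step answers
    the task; each retained step becomes one training example
    (context so far, next step)."""
    kept = []
    for trace in traces:
        if all(is_valid_step(s) for s in trace) and reaches_answer(trace[-1]):
            kept.extend((trace[:i], step) for i, step in enumerate(trace))
    return kept
```

Training on the retained step-level pairs is what lets improvements transfer across task families.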
Large Language Monkeys
Large Language Monkeys is a pioneering work on test-time scaling that introduces inference-time scaling laws. While the impact of scaling training compute is well understood, at inference time we often limit compute to only one or a few attempts per problem. Here, we explore inference compute as another axis for scaling by increasing the number of generated samples. We show that the relationship between coverage and the number of samples is often log-linear and can be modeled with an exponentiated power law, suggesting the existence of inference-time scaling laws. In domains with automated verifiers, like coding and formal proofs, these increases in coverage translate directly into improved performance. For example, on SWE-bench Lite, we showed that coverage increases from 15.9% with one sample to a new state-of-the-art performance of 56% with 250 samples (paper’24, ICML Oral’25).
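Coverage under repeated sampling is commonly reported as pass@k, estimated without bias from n generations of which c are correct. The formula below is the standard estimator used across the code-generation literature, not something specific to this paper:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of coverage: the probability that at least
    one of k samples drawn without replacement from n generations is
    correct, given that c of the n are correct."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Plotting `pass_at_k` against k on log axes is what reveals the near log-linear coverage curves described above.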
Constitutional AI
Constitutional AI (CAI) is a novel approach to training Large Language Models (LLMs) that are helpful, harmless, and honest. CAI introduces a new strategy that relies on AI feedback for self-improvement: the LLM criticizes its own outputs against a set of principles (the constitution) and revises them to comply with it. This technique is used to finetune the LLM, first with supervised training and later with reinforcement learning (based on a preference model trained on AI feedback). CAI reduces reliance on human supervision for training, improves transparency, and enables faster adaptation to new domains (Anthropic'22).
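The supervised-stage critique-and-revise loop can be sketched as follows, with `llm` standing in as a hypothetical text-completion callable and the prompt wording purely illustrative:

```python
def constitutional_revision(llm, response, constitution):
    """For each principle, have the model critique its own response
    and then revise it in light of that critique. The final revised
    response (paired with the original prompt) becomes supervised
    finetuning data."""
    for principle in constitution:
        critique = llm(
            f"Critique this response according to the principle "
            f"'{principle}':\n{response}"
        )
        response = llm(
            f"Revise the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    return response
```

In the later RL stage, the same AI feedback is used to train a preference model instead of producing revisions directly.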
FAST
FAST is a full-stack accelerator search framework with a broad design search space, including hardware datapath, software scheduling, and compiler passes such as operation fusion. When evaluated on vision and NLP benchmarks, custom designs generated by FAST show up to 6x improvement in Perf/TDP vs. TPU-v3. We also demonstrated that accelerators with 2x higher Perf/TCO can become ROI-positive at moderate datacenter deployments (ASPLOS'22).
Lossless Accelerators
Developed lossless accelerators using approximate multipliers for fast and energy-efficient inference on vision benchmarks (DATE'22).
AlphaChip
Developed AlphaChip, one of the first deep reinforcement learning approaches used to solve a real-world engineering problem. It generates superhuman or comparable chip layouts in hours rather than the weeks or months of human effort, and its layouts are used in advanced chips, from data centers to mobile phones. I co-founded and led this project, a cross-functional effort with 20+ engineers and researchers across organizations including Google Research and (Hardware) Platforms (Nature'21, AlphaChip'24).
Deep Reinforcement Learning Algorithms
Developed deep reinforcement learning algorithms for automating model parallelism, speeding up deep network training by more than 60% over top-performing baselines (ICML’17, ICLR’18, and NeurIPS’20).
Mixture-of-Experts
Introduced Mixture-of-Experts (MoE) architectures for language models and showed efficient training of LLMs with over 130 billion parameters, yielding state-of-the-art results on established language modeling and translation tasks (ICLR’17).
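The sparse routing at the heart of MoE can be illustrated in a few lines; here the gate scores are given rather than produced by a learned gating network, and the experts are plain callables, so this is a conceptual sketch rather than the trained architecture:

```python
def moe_forward(x, experts, gate_scores, k=1):
    """Minimal sparse Mixture-of-Experts step: route the input to the
    top-k experts by gate score and mix their outputs weighted by the
    renormalized scores. Only k of the experts run, which is what
    keeps per-token compute roughly constant as experts are added."""
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i])[-k:]
    total = sum(gate_scores[i] for i in top)
    return sum(gate_scores[i] / total * experts[i](x) for i in top)
```

With k=1 this reduces to hard routing: a single expert handles each input, selected by the gate.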
Publications
For a full list of recent papers, blogposts, code, and dataset pointers, visit my lab webpage. For a comprehensive list of publications, please visit my Google Scholar page.
Teaching
- Fall 2023, Fall 2024 - CS229s: Systems for Machine Learning
- Winter 2024, Fall 2025 - CS329a: Self-Improving AI Agents

Talks
- Keynote at NeurIPS ML for Systems Workshop, 2025
- Invited speaker at NeurIPS Foundations of Reasoning of LMs Workshop, 2025
- Invited speaker at CoLM RAM2: Reasoning, Attention & Memory-10 Years On Workshop, 2025
- Invited speaker at CoLM SCALR Workshop, 2025
- Panelist at CoLM Main Conference, 2025
- Invited speaker at Simons Institute Workshop on the Future of LLMs and Transformers, 2025 (Video)
- Distinguished Speaker Series, IBM's Watson Research Center, 2025
- Invited speaker at Stanford HAI Congressional Boot Camp on AI, 2025
- Invited speaker at NeurIPS Fine-Tuning in Modern Machine Learning Workshop, 2024
- Invited speaker at NeurIPS ML with New Compute Paradigms (MLNCP) Workshop, 2024
- Keynote speaker at Netflix Machine Learning Summit, 2024
- Invited speaker at Center for AI Safety Annual Meeting, 2024
- Invited speaker at ICML ES-FoMo Workshop, 2024
- Invited speaker at LLM Aided Design Workshop, 2024
- Invited speaker at ISCA Workshop on Cognitive Architectures, 2024
- Panelist at ML and Systems Rising Stars Workshop, 2024
- Panelist at VLSI Symposium, 2024
- Keynote speaker at ASPLOS, 2023
- Keynote speaker at the 32nd Microelectronics Design and Test Symposium, 2023
- Keynote speaker at NeurIPS Workshop on New Frontiers in Graph Learning, 2022
- Invited speaker at AICoM Workshop on AI-Enhanced Co-Design for Next-gen Microelectronics, 2022
- Keynote speaker at CVPR Workshop on Dynamic Neural Networks Meets Computer Vision, 2021 (Slides)
- Keynote speaker at MLSys Workshop on Graph Neural Networks and Systems, 2021
- Invited speaker at IPAM Workshop on Deep Learning and Combinatorial Optimization, 2021
- Invited speaker at NVIDIA GTC, 2021 (Video)
- Panelist at IEEE Custom Integrated Circuits Conference ML for Chip Design Forum, 2021
- Keynote speaker at MIT EmTech China, 2020
- Keynote speaker at Ray Summit, 2020 (Video)
- Keynote speaker at Open Data Science Conference, 2020
- Keynote speaker at International Supercomputing ML Day, 2019
- Keynote speaker at ML in HPC Workshop at Supercomputing, 2018
- Interviewed by MIT Technology Review, 2020
- Interviewed by IEEE Spectrum, 2020
- Interviewed by ACM Learning Center, 2020 (Interview)
- Interviewed by Towards Data Science, 2019 (Video)

Awards
- Okawa Research Grant Award for Self-Improving AI Systems, 2025 (Link)
- Google ML and Systems Junior Faculty Award, 2025 (Link)
- AlphaChip launched as a Google DeepMind flagship project, 2024 (Blogpost)
- Publication in Nature, 2021 (Article)
- MIT Technology Review 35 Innovators Under 35, 2019 (Article)
- Best Thesis Award at Rice University, ECE Department, 2015
- Fellowships and scholarships from Microsoft Research, IBM Research, and Schlumberger, 2010-2015
- Gold Medal in National Math Olympiad, Iran, 2004
If you are interested in working with me at Ricursive Intelligence, please visit ricursive.com
If you are interested in working with me at Stanford, please visit scalingintelligence.stanford.edu