Eddie Zhang

San Francisco, CA · ezhang [at] g.[school name].edu

I study reinforcement learning for social good.

About

I am currently at OpenAI, working on safety and alignment, and I am interested in applying AI for social good. I am on leave from a CS PhD at Harvard. Other interests include the principles of intelligence.

I am grateful to have worked closely with Professor Milind Tambe, Chuang Gan at MIT-IBM Watson, Amy Zhang at Meta, and William Wang at UCSB.

Selected Research

Transcendence: Generative Models Can Outperform The Experts That Train Them

Edwin Zhang, Vincent Zhu, Naomi Saphra, Anat Kleiman, Benjamin L. Edelman, Milind Tambe, Sham Kakade, Eran Malach

Theoretically and empirically demonstrates that generative models can outperform the experts that train them through low-temperature sampling
Ran experiments in a toy Gaussian setting, chess, and NLP (SQuAD v2)

NeurIPS 2024

Social Environment Design

Edwin Zhang, Sadie Zhao, Tonghan Wang, Safwan Hossain, Henry Gasztowtt, Stephan Zheng, David C. Parkes, Milind Tambe, Yiling Chen

Introduces a research agenda towards using Generative AI for social good
Sits at the intersection of EconCS, MARL, Computational Social Choice, and Mechanism Design

ICML 2024, Position Paper

Towards Generalist Agents Through Scaling Offline Reinforcement Learning

Edwin Zhang

Introduced new perspectives on pursuing Artificial General Intelligence (AGI) under the modern data-driven regime
Proposed a computability hypothesis regarding the potential and limits of applying RL to the real world

Master's Thesis

Language Control Diffusion

Edwin Zhang, Yujie Lu, William Yang Wang, Amy Zhang

Proposed and created language-conditioned diffusion RL models, enabling generalization in control through large language models
Ran experiments comparing the proposed method against baselines on the distributed FAIR cluster using SLURM

ICLR 2024

Education

Harvard University

PhD Student

Computer Science

September 2023 - Present

University of California, Santa Barbara

Master of Science

Computer Science

Led three disparate research projects: CFPI, LCD, and an unreleased hierarchical RL project
Teaching Assistant for CS 165B (Machine Learning)

June 2022 - June 2023

University of California, Santa Barbara

Bachelor of Science

Computer Science

GPA: 3.96

High Honors
Regents Scholar (top 2.5% of school)
Relevant coursework: Convex Optimization, Game Theory, Advanced Linear Algebra, Differential Geometry, Statistical Machine Learning, Special Topics in Deep Learning

September 2019 - June 2022

Employment History

Safety Research

OpenAI

July 2024 - Present

Visiting Researcher

MIT-IBM Watson AI

Started an ongoing hierarchical RL project with Chuang Gan, with the potential to solve extremely long-horizon control problems such as crafting diamonds in Minecraft
Won 3rd place out of 19 in the NeurIPS Interactive Grounded Language Understanding (IGLU) Challenge, receiving a $1500 cash prize

December 2022 - May 2023

Research Intern

Computer Vision and Software Engineering Intern

Plato Systems

Developed a multi-view calibration pipeline using planar homographies and OpenCV.
Created the setup process and capture script for the NVIDIA Jetson platform across multiple third-party imaging providers.
Designed and led benchmarking of several candidate imaging systems in low-light, bright-light, and no-light settings.
Refactored and contributed to the primary user-facing web application, built with Vue.js and Express.

June 2021 - June 2022

Lead Full-Stack Engineer / First Hire

Allthenticate

Led development of the cloud platform at an early-stage startup, collaborating directly with the CEO to architect and implement a proprietary API.
Took complete responsibility for every step of development, delivering a full web application while leading two other interns on the same project and teaching them advanced Vue.js.
Built and deployed a 27,000-line Python backend on Elastic Beanstalk, implementing a Dockerized development process that sped up iteration cycles by 25%.
Gained experience with emerging web technologies such as JWT, Protobuf, and Nuxt.js.

January 2020 - June 2021

Founder and Lead Tutor

Yaitea

Identified a need for teaching coding and critical thinking to children, as demand for programming skills rose and traditional tutoring services struggled to keep up.
Built lasting relationships with several students and their parents
Picked up basic marketing on the fly and applied it to give sales pitches for the tutoring service
Organized an extensive 24-lesson programming curriculum
Taught over 200 hours of coding and critical thinking to students
Gained hands-on experience with Google Cloud, Nginx, WordPress, and front-end web development by building the tutoring business's website, yaitea.com

August 2018 - September 2019

Projects

AlphaGo Zero Reimplementation

Graph Theory w/ UCSB

BERT Lecture Summarization

Predicting Winners in League

3D graphics with React

It's like LinkedIn but Tinder

Green Uber

Connecting HS Students w/ College Students

Invited Talks & Teaching

Teaching is one of my passions. I really, really love it.

Invited Talks

Stanford AI Safety Seminar, 4/3/25. Deliberative Alignment: On Alignment through Reasoning in LLMs. Slides, Video
Harvard EconCS Incentives and Learning Seminar, 10/31/23. AI Economist: Review and Analysis. Slides, Video
Harvard SEAS AI for Social Good Seminar, 10/26/23. Language Control Diffusion: Efficiently Scaling to Generalist Agents through Space, Time, and Tasks.
Harvard Law School Tax Law Seminar, 10/25/23. Review and Analysis of the AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies. (Review of work done by Stephan Zheng et al.)
UCSB Master's Thesis Defense, 5/31/23. Towards Generalist Agents through Scaling Offline Reinforcement Learning. Slides, Video
MIT Brain and Cognitive Sciences, 12/13/22. Integrating Language into Reinforcement Learning through Diffusion
NeurIPS Interactive Grounded Language Understanding (IGLU) Competition, 12/06/22. Hierarchical RL through Diffusion Models
UCSB Computer Science Research Poster Session, 6/03/22. Offline RL with CFPI
UCSB Natural Language Processing Lab, 4/08/22. Review of Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Allthenticate Lunch & Learn, 2/24/21. Introduction to neural networks

Teaching

UCSB CMPSC 16 (C++) Learning Assistant, Winter '20
UCSB CMPSC 165B (ML) Teaching Assistant, Spring '23

Interests

I really like learning, and thinking about learning. I like spending time with people even more.

I love playing tennis (and losing miserably at it to my superior roommate), riding the BART, hating on Apple (sometimes while riding the BART), watching anime, and hunting dinosaurs. Haha, just kidding on that last one.

or am i?

Awards & Certifications

Google Cloud Research Grant ($87,200), 2023
Harvard AI Safety Technical Fellowship, 2023
Harvard Effective Altruism Precipice Fellowship, 2023


Second out of 10 in Amazon Alexa SimBot Challenge ($100,000 cash prize), 2023
Third out of 19 in Interactive Grounded Language Understanding (IGLU) Challenge at NeurIPS ($1500 cash prize), 2022
First out of 16 in React Category at SBhacks, 2022
UCSB Distinction in the Major: Research Track, 2022
UCSB High Honors (Top 8.5% at graduation), 2022
First out of 78 in Startup Category at SDhacks, 2021
Best use of Google Cloud out of 71 at SBhacks, 2021
First overall out of 6 at Santa Barbara Startup Weekend, 2020
First out of 70 in Database Category at SBhacks, 2020
Second out of 85 in AI classification competition at UCSB, 2020
Google Cloud Cybersecurity Grant Winner ($1,000), 2020
Regents Scholar UCSB ($24,000), 2019
Google Cloud Startup Grant Winner ($3,000), 2019
AP Scholar with Distinction, 2019

Miscellaneous

Professional Skills

On credit assignment

The credit assignment problem is a fascinating problem that appears in reinforcement learning and in AI more broadly. Say I play a game of chess and make n moves in succession. At the end of the game, I get just one discrete feedback signal: the outcome. How much of that outcome should be attributed to each individual move? That is the credit assignment problem. For a more in-depth introduction to the topic, I recommend this paper by Minsky, starting from Part 3 on page 10.
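To make the chess example concrete, here is a minimal sketch of one classic (and crude) answer from RL: discounted Monte Carlo returns, which push the single end-of-game signal backward so that each move t receives G_t = sum over k >= t of gamma^(k - t) * r_k. The +1 reward for a win, the 5-move game, and gamma = 0.9 are illustrative assumptions, not anything from a specific paper. Note what it does not solve: every move in a won game gets positive credit, blunders included, which is exactly why credit assignment stays hard.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Discounted return G_t = sum_{k>=t} gamma^(k-t) * r_k for every step t.

    With a single terminal reward (e.g. +1 for a win), every move gets a
    share of the outcome that shrinks geometrically the further that move
    is from the end of the game.
    """
    credits = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        credits[t] = running
    return credits


if __name__ == "__main__":
    # Hypothetical 5-move game: no feedback until the final move wins (+1).
    game_rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
    for move, credit in enumerate(returns_to_go(game_rewards, gamma=0.9), start=1):
        print(f"move {move}: credit {credit:.4f}")
```

Here the winning move gets credit 1.0 and the opening move only 0.9^4 = 0.6561, purely because of its distance from the outcome, not because of how good it actually was.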

I mention credit assignment here because very little of the credit for my career should be attributed to me. I am eternally grateful to the following people for their kindness, support, and guidance. Without them, I would have nothing. In order of recency (not importance): Jiachen Li, Chad Spensky, Shou Chaofan, Derren Slinde.

Student mentoring

Parentheses denote each student's first position after mentorship. I try to work closely with students for at least half a year and help them reach either a workshop or a conference paper.

Vincent Zhu (Currently mentoring, 2024)
Henry Gasztowtt (Currently mentoring, 2024)
Ben Smith (Currently mentoring, 2023)
Matthew Ho (UCSD PhD, 2024)
Peiyang Song (Caltech BS, 2024)
Lauren Cooke (Harvard BS, 2023)
Shinda Huang (UCSB MS, 2023)
Katelyn Zhang (Google SWE, 2022)
Yuhao Zhang (Amazon SWE, 2021)

*why the domain name eddie.win?

my mom used to call me 'ai da win', a
bastardization of my actual name edwin.
my friends thought this was hilarious and
so they started calling me that too:
the domain name is just a massive joke.

Made by ya boi Eddie Zhang xxxx

Feel free to take any of the code