Research Interests
I am a research scientist on the Strawberry team at OpenAI, working on reasoning models. Previously, I worked on RLHF for Gemini at DeepMind. I am also interested in fundamental research on RL and multi-armed bandits. See my talk at the Stanford RL forum on information-directed sampling for exploration.
Publications
2025
OpenAI o1 System Card
Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, Alex Iftimie, Alex Karpenko, Alex Tachard Passos, Alexander Neitz, Alexander Prokofiev, Alexander Wei, Allison Tam, Ally Bennett, Ananya Kumar, Andre Saraiva, Andrea Vallone, Andrew Duberstein, Andrew Kondrich, Andrey Mishchenko, Andy Applebaum, Angela Jiang, Ashvin Nair, Barret Zoph, Behrooz Ghorbani, Ben Rossen, Benjamin Sokolowsky, Boaz Barak, Bob McGrew, Borys Minaiev, Botao Hao, et al.
Preprint. [arXiv]
2024
Sequential Best-Arm Identification with Application to Brain-Computer Interface
Xin Zhou, Botao Hao, Jian Kang, Tor Lattimore, Lexin Li
Transactions on Machine Learning Research. [arXiv]
Efficient Exploration for LLMs
Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao, Benjamin Van Roy
ICML 2024. [arXiv]
Stochastic Low-rank Tensor Bandits for Multi-dimensional Online Decision Making
Jie Zhou, Botao Hao, Zheng Wen, Jingfei Zhang, Will Wei Sun
Journal of the American Statistical Association. [arXiv]
2023
Sample Efficient Deep Reinforcement Learning via Local Planning
Dong Yin, Sridhar Thiagarajan, Nevena Lazic, Nived Rajaraman, Botao Hao, Csaba Szepesvári
Submitted. [arXiv]
Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale
Botao Hao, Rahul Jain, Dengwang Tang, Zheng Wen
Transactions on Machine Learning Research. [arXiv]
Leveraging Demonstrations to Improve Online Learning: Quality Matters
Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, Zheng Wen
ICML 2023. [arXiv]
2022
Regret Bounds for Information-Directed Reinforcement Learning
Botao Hao, Tor Lattimore
NeurIPS 2022. [arXiv]
The Neural Testbed: Evaluating Predictive Distributions
Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Botao Hao, Morteza Ibrahimi, Dieterich Lawson, Xiuyuan Lu, Brendan O'Donoghue, Benjamin Van Roy
NeurIPS 2022. [arXiv]
Interacting Contour Stochastic Gradient Langevin Dynamics
Wei Deng, Siqi Liang, Botao Hao, Guang Lin, Faming Liang
ICLR 2022. [arXiv]
Contextual Information-Directed Sampling
Botao Hao, Tor Lattimore, Chao Qin
ICML 2022. [arXiv]
Confident Least Square Value Iteration with Local Access to a Simulator
Botao Hao, Nevena Lazic, Dong Yin, Yasin Abbasi-Yadkori, Csaba Szepesvári
AISTATS 2022. [Proceedings]
Efficient Local Planning with Linear Function Approximation
Dong Yin, Botao Hao, Yasin Abbasi-Yadkori, Nevena Lazic, Csaba Szepesvári
ALT 2022. [arXiv]
2021
Bootstrapping Fitted Q-Evaluation for Off-Policy Inference
Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvári, Mengdi Wang
ICML 2021. [arXiv]
Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang
ICML 2021. [arXiv]
Online Sparse Reinforcement Learning
Botao Hao, Tor Lattimore, Csaba Szepesvári, Mengdi Wang
AISTATS 2021. [arXiv] [poster]
Adaptive Approximate Policy Iteration
Botao Hao, Nevena Lazic, Yasin Abbasi-Yadkori, Pooria Joulani, Csaba Szepesvári
AISTATS 2021. [arXiv] [poster]
Information Directed Sampling for Sparse Linear Bandits
Botao Hao, Tor Lattimore, Wei Deng
NeurIPS 2021 (spotlight). [Proceedings] [slides]
Bandit Phase Retrieval
Tor Lattimore, Botao Hao
NeurIPS 2021. [arXiv]
Sparse Tensor Additive Regression
Botao Hao, Boxiang Wang, Pengyuan Wang, Jingfei Zhang, Jian Yang, Will Wei Sun
Journal of Machine Learning Research (2021). [arXiv]
2020
High-Dimensional Sparse Linear Bandits
Botao Hao, Tor Lattimore, Mengdi Wang
NeurIPS 2020. [arXiv] [slides] [poster]
Adaptive Exploration in Linear Contextual Bandit
Botao Hao, Tor Lattimore, Csaba Szepesvári
AISTATS 2020. [arXiv] [slides]
Sparse and Low-rank Tensor Estimation via Cubic Sketchings
Botao Hao, Anru Zhang, Guang Cheng
IEEE Transactions on Information Theory (2020). [arXiv] [slides]
Accepted in part to AISTATS 2020.
2019
Bootstrapping Upper Confidence Bound
Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, Guang Cheng
NeurIPS 2019. [arXiv] [poster]
Nonparametric Bayesian Aggregation for Massive Data
Zuofeng Shang, Botao Hao, Guang Cheng
Journal of Machine Learning Research (2019). [pdf]
2018