Research Interests
I am currently a research scientist at OpenAI working on reasoning and safety research. Previously, I worked on developing principled, data-efficient RLHF for fine-tuning large language models (LLMs) and its application to Bard and Gemini. I am also interested in fundamental research on RL and multi-armed bandits. See my talk at the Stanford RL Forum on information-directed sampling for exploration.
Publications
2024
Sequential Best-Arm Identification with Application to Brain-Computer Interface
Xin Zhou, Botao Hao, Jian Kang, Tor Lattimore, Lexin Li
Transactions on Machine Learning Research. [arXiv]
Efficient Exploration for LLMs
Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao, Benjamin Van Roy
ICML 2024. [arXiv]
Stochastic Low-rank Tensor Bandits for Multi-dimensional Online Decision Making
Jie Zhou, Botao Hao, Zheng Wen, Jingfei Zhang, Will Wei Sun
Journal of the American Statistical Association. [arXiv]
2023
Sample Efficient Deep Reinforcement Learning via Local Planning
Dong Yin, Sridhar Thiagarajan, Nevena Lazic, Nived Rajaraman, Botao Hao, Csaba Szepesvári
Submitted. [arXiv]
Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale
Botao Hao, Rahul Jain, Dengwang Tang, Zheng Wen
Transactions on Machine Learning Research. [arXiv]
Leveraging Demonstrations to Improve Online Learning: Quality Matters
Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, Zheng Wen
ICML 2023. [arXiv]
2022
Regret Bounds for Information-Directed Reinforcement Learning
Botao Hao, Tor Lattimore
NeurIPS 2022. [arXiv]
The Neural Testbed: Evaluating Predictive Distributions
Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Botao Hao, Morteza Ibrahimi, Dieterich Lawson, Xiuyuan Lu, Brendan O'Donoghue, Benjamin Van Roy
NeurIPS 2022. [arXiv]
Interacting Contour Stochastic Gradient Langevin Dynamics
Wei Deng, Siqi Liang, Botao Hao, Guang Lin, Faming Liang
ICLR 2022. [arXiv]
Contextual Information-Directed Sampling
Botao Hao, Tor Lattimore, Chao Qin
ICML 2022. [arXiv]
Confident Least Square Value Iteration with Local Access to a Simulator
Botao Hao, Nevena Lazic, Dong Yin, Yasin Abbasi-Yadkori, Csaba Szepesvári
AISTATS 2022. [Proceedings]
Efficient Local Planning with Linear Function Approximation
Dong Yin, Botao Hao, Yasin Abbasi-Yadkori, Nevena Lazic, Csaba Szepesvári
ALT 2022. [arXiv]
2021
Bootstrapping Fitted Q-Evaluation for Off-Policy Inference
Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvári, Mengdi Wang
ICML 2021. [arXiv]
Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang
ICML 2021. [arXiv]
Online Sparse Reinforcement Learning
Botao Hao, Tor Lattimore, Csaba Szepesvári, Mengdi Wang
AISTATS 2021. [arXiv] [poster]
Adaptive Approximate Policy Iteration
Botao Hao, Nevena Lazic, Yasin Abbasi-Yadkori, Pooria Joulani, Csaba Szepesvári
AISTATS 2021. [arXiv] [poster]
Information Directed Sampling for Sparse Linear Bandits
Botao Hao, Tor Lattimore, Wei Deng
NeurIPS 2021 (spotlight). [Proceedings] [slides]
Bandit Phase Retrieval
Tor Lattimore, Botao Hao
NeurIPS 2021. [arXiv]
Sparse Tensor Additive Regression
Botao Hao, Boxiang Wang, Pengyuan Wang, Jingfei Zhang, Jian Yang, Will Wei Sun
Journal of Machine Learning Research (2021). [arXiv]
2020
High-Dimensional Sparse Linear Bandits
Botao Hao, Tor Lattimore, Mengdi Wang
NeurIPS 2020. [arXiv] [slides] [poster]
Adaptive Exploration in Linear Contextual Bandit
Botao Hao, Tor Lattimore, Csaba Szepesvári
AISTATS 2020. [arXiv] [slides]
Sparse and Low-rank Tensor Estimation via Cubic Sketchings
Botao Hao, Anru Zhang, Guang Cheng
IEEE Transactions on Information Theory (2020). [arXiv] [slides]
Accepted in part to AISTATS 2020.
2019
Bootstrapping Upper Confidence Bound
Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, Guang Cheng
NeurIPS 2019. [arXiv] [poster]
Nonparametric Bayesian Aggregation for Massive Data
Zuofeng Shang, Botao Hao, Guang Cheng
Journal of Machine Learning Research (2019). [pdf]
2018