I am an undergraduate at Tsinghua University, pursuing a
dual degree in Computer Science and Economics since 2022.
In the summer of 2023, I joined Tsinghua NLP Lab, advised by Prof. Zhiyuan Liu.
In the summer of 2024, I interned at Rose-STL-Lab, Department of CSE, UCSD, advised by Prof. Rose Yu. I am now working on LLMs for automated theorem proving in formal language (Lean 4) at PLI, Princeton.
Before that, I interned as a quantitative researcher at China Securities in 2023.
My research interests lie in facilitating scientific discovery with large models. My current research
questions are:
1. How can we enable models to learn from their interactions with the physical world and drive progress in scientific discovery?
2. How can we rigorously evaluate the capabilities of scientific agents with methods that go beyond static datasets?
3. How can we verify natural-language mathematical reasoning with formal languages?
The 7B Goedel-Prover sets a new state of the art in open-source automated theorem proving: a 7% improvement over the previous best on miniF2F, the top spot on the PutnamBench Leaderboard, and nearly twice as many problems solved on Lean Workbook.
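To make the formal-language setting concrete, here is a toy Lean 4 statement and proof of the kind an automated prover is asked to produce; this example is illustrative only and is not taken from Goedel-Prover or its benchmarks.

```lean
-- Toy example: a formal statement plus a machine-checkable proof term.
-- An automated prover's job is to fill in the proof after `:= by`.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```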
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
Bohan Lyu*, Yadi Cao*, Duncan Watson-Parris, Leon Bergen, Taylor Berg-Kirkpatrick, Rose Yu
AAAI FSS (Oral); main conference under review, 2024
arxiv /
slides /
youtube /
This work proposes a fine-tuning method where LLMs internalize tool-generated solutions (World Knowledge Distillation) and learn to switch between direct answers and tool use for complex problems (Tool Usage Adaptation). It outperforms GPT-4 and Claude-3.5 across six scientific benchmarks.
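A minimal sketch of the routing idea behind Tool Usage Adaptation: answer directly when the model is confident, otherwise fall back to a tool. The function and threshold names here are my assumptions for illustration, not the paper's released code.

```python
# Sketch of confidence-based routing between a direct answer and a tool call.
# All names and the 0.8 threshold are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Routed:
    used_tool: bool
    answer: str

def route(question: str,
          direct_answer: Callable[[str], tuple[str, float]],
          tool_answer: Callable[[str], str],
          threshold: float = 0.8) -> Routed:
    """Answer directly when confidence is high; otherwise invoke a tool."""
    draft, confidence = direct_answer(question)  # distilled "world knowledge" path
    if confidence >= threshold:
        return Routed(used_tool=False, answer=draft)
    return Routed(used_tool=True, answer=tool_answer(question))

# Toy usage with stub callables standing in for an LLM and an external solver:
if __name__ == "__main__":
    print(route("What is 2 + 2?",
                direct_answer=lambda q: ("4", 0.95),
                tool_answer=lambda q: "4 (via calculator)"))
```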
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
MEGA-Bench contains 505 multimodal tasks with diverse data sources, input/output formats, and skill requirements. The benchmark is equipped with a suite of 45 evaluation metrics to handle various output formats beyond multiple-choice questions.
VIDEOSCORE: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
Xuan He*, Dongfu Jiang*, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Haonan Chen, Abhraneil Chandra, Ziyan Jiang, Aaran Arulraj, Kai Wang, Quy Duc Do, Yuansheng Ni, Bohan Lyu, Yaswanth Narsupalli, Rongqi Fan, Zhiheng Lyu, Yuchen Lin, Wenhu Chen
EMNLP (Main), 2024
arxiv /
website /
We release VIDEOFEEDBACK, the first large-scale dataset containing human-provided multi-aspect scores for 37.6K synthesized videos from 11 existing video generative models. We train VIDEOSCORE on VIDEOFEEDBACK to enable automatic video quality assessment.
Exploring Diffusion Models’ Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks
Xiaoyu Wu*, Jiaru Zhang*, Yang Hua, Bohan Lyu, Hao Wang, Tao Song, Haibing Guan
Under Peer Review, 2024
arxiv /
We apply Bayesian Neural Networks (BNNs) to Diffusion Models (DMs) with variational inference to implicitly broaden the learned distribution, and show that the BNNs' learning target can be naturally regarded as an expectation of the diffusion loss plus a regularization term anchored to the pretrained DMs.
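A schematic form of that decomposition, assuming a standard variational-inference setup; the symbols (q for the weight posterior, θ_pre for the pretrained weights, λ for the trade-off) are my notation, not the paper's.

```latex
% Expected diffusion loss plus a regularizer toward the pretrained DM weights.
\mathcal{L}(q) \;=\;
\mathbb{E}_{\theta \sim q}\!\big[\mathcal{L}_{\mathrm{diff}}(\theta)\big]
\;+\; \lambda \,
\mathrm{KL}\!\big(q(\theta)\,\big\|\,p(\theta \mid \theta_{\mathrm{pre}})\big)
```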
Enhancing LLM’s Capabilities in Open Domains via Autonomous Tool Integration
Bohan Lyu*, Xin Cong*, Heyang Yu, Pan Yang, Yujia Qin, Yining Ye, Yaxi Lu, Zhong Zhang, Yukun Yan, Yankai Lin, Zhiyuan Liu, Maosong Sun
Under Peer Review, 2023
arxiv /
We develop an autonomous agent that leverages GitHub repositories to extend its own capabilities and address diverse user queries, and introduce a new agent architecture that achieves state-of-the-art performance on SciAct.