I am an undergraduate at Tsinghua University, pursuing a
dual degree in Computer Science and Economics since 2022.
In the summer of 2023, I joined Tsinghua NLP Lab and OpenBMB, advised by Prof. Zhiyuan Liu.
In the summer of 2024, I started a research internship at Rose-STL-Lab, Department of CSE, UCSD, advised by Prof. Rose Yu.
Before that, I interned as a quantitative researcher at China Securities in 2023.
I am interested in designing innovative computer science methods, with a focus on language techniques,
knowledge discovery, and data mining, to address real-world challenges.
I am seeking CS PhD opportunities for Fall 2026. I welcome any collaboration or
discussion, whether with seniors or peers. Please feel free to reach out!
2024-09 My first-authored paper was invited for a 30-minute talk at AAAI FSS 2024.
2024-06 My team ranked first out of 600+ teams in a Baidu Inc. data mining competition.
2024-05 I became a member of the 'Sparking Program', the most prestigious and selective academic
organization for students at Tsinghua University.
2024-04 I led my team to win Second Place and the Newcomer Prize at Tsinghua University's
Challenge Cup.
2024-04 I received a Grade A in the 'Global Exploration Initiative' and was awarded a grant
of 20,000 RMB.
Research (* indicates equal contribution)
I'm interested in Natural Language Processing, AI4Science and multimodal learning.
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
Bohan Lyu*, Yadi Cao*, Duncan Watson-Parris, Leon Bergen, Taylor Berg-Kirkpatrick, Rose Yu
Preprint, 2024
arxiv / slides / youtube
This work proposes a fine-tuning method where LLMs internalize tool-generated solutions (World Knowledge Distillation) and learn to switch between direct answers and tool use for complex problems (Tool Usage Adaptation). It outperforms GPT-4 and Claude-3.5 across six scientific benchmarks.
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
Jiacheng Chen*, Tianhao Liang*, Sherman Siu*, Zhengqing Wang, Kai Wang, Yubo Wang, Yuansheng Ni, Wang Zhu, Ziyan Jiang, Bohan Lyu, Dongfu Jiang, Xuan He, Yuan Liu, Hexiang Hu, Xiang Yue, Wenhu Chen
Under Peer Review, 2024
arxiv / website
MEGA-Bench contains 505 multimodal tasks with diverse data sources, input/output formats, and skill requirements. The benchmark is equipped with a suite of 45 evaluation metrics to handle various output formats beyond multiple-choice questions.
VIDEOSCORE: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
Xuan He*, Dongfu Jiang*, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Haonan Chen, Abhraneil Chandra, Ziyan Jiang, Aaran Arulraj, Kai Wang, Quy Duc Do, Yuansheng Ni, Bohan Lyu, Yaswanth Narsupalli, Rongqi Fan, Zhiheng Lyu, Yuchen Lin, Wenhu Chen
EMNLP (Main), 2024
arxiv / website
We release VIDEOFEEDBACK, the first large-scale dataset containing human-provided multi-aspect scores for 37.6K synthesized videos from 11 existing video generative models. We train VIDEOSCORE on VIDEOFEEDBACK to enable automatic video quality assessment.
Exploring Diffusion Models’ Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks
Xiaoyu Wu*, Jiaru Zhang*, Yang Hua, Bohan Lyu, Hao Wang, Tao Song, Haibing Guan
Under Peer Review, 2024
arxiv
We apply Bayesian Neural Networks (BNNs) to Diffusion Models (DMs) with variational inference to implicitly broaden the learned distribution, and show that the learning target of the BNNs can be naturally regarded as an expectation of the diffusion loss plus a further regularization with the pretrained DMs.
Enhancing LLM’s Capabilities in Open Domains via Autonomous Tool Integration
Bohan Lyu*, Xin Cong*, Heyang Yu, Pan Yang, Yujia Qin, Yining Ye, Yaxi Lu, Zhong Zhang, Yukun Yan, Yankai Lin, Zhiyuan Liu, Maosong Sun
Under Peer Review, 2023
arxiv
Developed an autonomous agent that leverages GitHub repositories to extend its capabilities to address diverse user queries. Introduced a new agent architecture that achieved SOTA performance on SciAct.