I am an undergruduate at Tsinghua University, pursuing a
dual degree in Computer Science and Economics since 2022. I'm interested in ML and NLP topics. My works are published in ICML and ACL.
Recently I'm particularly interested in math theorem proving in formal language, and its applications (e.g. how it can benefit natural language reasoning).
SURGE evaluates LLMs’ ability to predict code execution across eight key areas. While they show promise, limitations prevent general-purpose surrogate execution.
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving
The new 7B Goedel-Prover sets a new state-of-the-art in open-source automated theorem proving, beating previous records with a 7% improvement on miniF2F, topping the PutnamBench Leaderboard, and solving nearly twice as many problems on Lean Workbook.
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
Bohan Lyu*, Yadi Cao*, Duncan Watson-Parris, Leon Bergen, Taylor Berg-Kirkpatrick, Rose Yu
ICML 2025, AAAI FSS ORAL, 2024
arXiv /
Slides /
YouTube /
This work proposes a fine-tuning method where LLMs internalize tool-generated solutions (World Knowledge Distillation) and learn to switch between direct answers and tool use for complex problems (Tool Usage Adaptation). It outperforms GPT-4 and Claude-3.5 across six scientific benchmarks.
Enhancing LLM’s Capabilities in Open Domains via Autonomous Tool Integration
Bohan Lyu*, Xin Cong*, Heyang Yu, Pan Yang, Yujia Qin, Yining Ye, Yaxi Lu, Zhong Zhang, Yukun Yan, Yankai Lin, Zhiyuan Liu, Maosong Sun
ACL 2025 (Main), 2023
arXiv /
Developed an autonomous agent that leverages GitHub repositories to extend its capabilities to address diverse user queries. Introduced a new agent architecture that achieved SOTA performance on SciAct.