2025-05 Enhancing LLM's Capabilities in Open Domains via Autonomous Tool Integration is accepted to ACL 2025 Main!
2025-05 Adapting While Learning is accepted to ICML 2025!
2024-12 Delivered a talk at the Challenge Cup Seminar hosted by the Department of Automation, Tsinghua University.
2024-11 Gave a talk on my summer project at AAAI FSS.
2024-06 My team placed No. 1 out of 600+ teams in a Baidu Inc. data mining competition.
2024-05 Awarded the Spark Scientific and Technological Innovation Fellowship (top 1% at Tsinghua).
2024-04 I led my team to win Second Place and the Newcomer Prize at Tsinghua University's Challenge Cup.
Research (* indicates equal contribution, highlight indicates representative papers)
I'm interested in Natural Language Processing, AI4Science, and multimodal learning.
Goedel-Prover-V2: The Strongest Open-Source Theorem Prover to Date
Yong Lin*, Shange Tang*, Bohan Lyu*, Ziran Yang*, Jui-Hui Chung*, Haoyu Zhao*, Lai Jiang, Yihan Geng, Jiawei Ge, Jingruo Sun, Jiayun Wu, Jiri Gesi, David Acuna, Kaiyu Yang, Hongzhou Lin, Yejin Choi, Danqi Chen, Sanjeev Arora, Chi Jin
Blog / Website /
Goedel-Prover-V2 sets a new state of the art in automated proof generation with dramatically smaller models, using scaffolded training, self-correction, and model averaging to outperform systems 20-100× larger on key mathematical benchmarks.
Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities
LLM-based proof assistants struggle with compositional reasoning on inequalities. The Ineq-Comp benchmark reveals that even strong models such as DeepSeek-Prover-V2-7B falter even when given proofs of the subparts, highlighting a key gap between AI and human mathematical intuition.
SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
SURGE evaluates LLMs' ability to predict code execution across eight key areas. While the models show promise, their limitations currently prevent general-purpose surrogate execution. (This work extends a course project and does not reflect my main research focus.)
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving
The new 7B Goedel-Prover sets a new state-of-the-art in open-source automated theorem proving, beating previous records with a 7% improvement on miniF2F, topping the PutnamBench Leaderboard, and solving nearly twice as many problems on Lean Workbook.
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
Bohan Lyu*, Yadi Cao*, Duncan Watson-Parris, Leon Bergen, Taylor Berg-Kirkpatrick, Rose Yu
ICML 2025; AAAI 2024 FSS (Oral)
arXiv / Code / Slides / Website / YouTube / BibTeX
This work proposes a fine-tuning method in which LLMs internalize tool-generated solutions (World Knowledge Distillation) and learn to switch between direct answers and tool use for complex problems (Tool Usage Adaptation). It outperforms GPT-4 and Claude-3.5 across six scientific benchmarks.
Enhancing LLM's Capabilities in Open Domains via Autonomous Tool Integration
We construct the OpenAct benchmark for complex open-domain task solving and develop OpenAgent, a novel LLM agent system that leverages GitHub repositories to extend its capabilities and address diverse user queries.
Resources
Some tools and resources I've developed for the research community:
jload - A Python package for convenient JSON/JSONL data loading and saving.
vlllm - A Python package for convenient text generation with multiprocessing support.
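For context, here is a minimal sketch of the JSON Lines round-trip that a package like jload streamlines, written with only the Python standard library. The helper names below are hypothetical stand-ins for illustration, not jload's actual API.

```python
import json

def save_jsonl(records, path):
    """Write a list of dicts to a JSONL file, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")

def load_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Round-trip example with a small record list
data = [{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}]
save_jsonl(data, "demo.jsonl")
loaded = load_jsonl("demo.jsonl")
```

JSONL is line-delimited, so it can be streamed and appended record by record, which is why it is the common interchange format for LLM training and evaluation data.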