Bohan Lyu

I am an undergraduate at Tsinghua University, interested in machine learning and natural language processing. My work has been published at ICML and ACL.

I am currently working on LLMs for autonomous mathematical theorem proving at PLI, Princeton, advised by Prof. Chi Jin. Before that, I worked in the Tsinghua NLP Lab, advised by Prof. Zhiyuan Liu, and in the Rose-STL-Lab at UCSD, advised by Prof. Rose Yu. I have also interned as a quantitative researcher at China Securities.

I have served as a reviewer for ARR Feb. 2025, ICLR 2025, AI4MATH @ ICML 2024, and LLMAgents @ ICLR 2024.

I am seeking PhD opportunities starting in Fall 2026. Please feel free to reach out!

Email  /  GitHub  /  Google Scholar  /  Twitter  /  CV


News

  • 2025-08 SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors is accepted to EMNLP 2025 Main, with a top 0.3% meta score!
  • 2025-07 Gave a tutorial titled LLM Reasoning: From Informal to Formal at Department of Computer Science and Technology, Tsinghua University.
  • 2025-07 Released Goedel-Prover-V2!
  • 2025-05 Enhancing LLM's Capabilities in Open Domains via Autonomous Tool Integration is accepted to ACL 2025 Main!
  • 2025-05 Adapting while Learning is accepted to ICML 2025!
  • 2024-06 My team placed 1st out of 600+ teams in a Baidu Inc. data mining competition.
  • 2024-05 Awarded the Spark Scientific and Technological Innovation Fellowship (top 1% at Tsinghua).
  • 2024-04 I led my team to win Second Place and the Newcomer Prize at Tsinghua University's Challenge Cup.

Research (* indicates equal contribution, highlight indicates representative papers)

I'm interested in AI for math, machine learning for NLP, and their applications.


Goedel-Prover-V2: The Strongest Open-Source Theorem Prover to Date


Yong Lin*, Shange Tang*, Bohan Lyu*, Ziran Yang*, Jui-Hui Chung*, Haoyu Zhao*, Lai Jiang*, Yihan Geng*, Jiawei Ge, Ximing Lu, Jingruo Sun, Jiayun Wu, Jiri Gesi, David Acuna, Kaiyu Yang, Hongzhou Lin, Yejin Choi, Danqi Chen, Sanjeev Arora, Chi Jin
AI4MATH @ ICML 2025 (Oral)
arXiv / Code / Website / BibTeX

Goedel-Prover-V2 sets a new state-of-the-art in automated proof generation with dramatically smaller models, using scaffolded training, self-correction, and model averaging to outperform systems 20-100x larger on key mathematical benchmarks.


SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors


Bohan Lyu*, Siqiao Huang*, Zichen Liang*, Qi-An Sun, Jiaming Zhang
EMNLP 2025 (Main, meta score top 0.3%)
arXiv / Code / BibTeX

SURGE evaluates LLMs' ability to predict code execution across eight key areas. While LLMs show promise, their current limitations prevent reliable general-purpose surrogate execution. This work extends a course project and is not representative of my main research focus.


Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving


Yong Lin*, Shange Tang*, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, Chi Jin
COLM 2025
arXiv / Code / Website / YouTube / BibTeX

The new 7B Goedel-Prover sets a new state-of-the-art in open-source automated theorem proving, beating previous records with a 7% improvement on miniF2F, topping the PutnamBench Leaderboard, and solving nearly twice as many problems on Lean Workbook.


Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation


Bohan Lyu*, Yadi Cao*, Duncan Watson-Parris, Leon Bergen, Taylor Berg-Kirkpatrick, Rose Yu
ICML 2025, AAAI 2024 FSS (Oral)
arXiv / Code / Slides / Website / YouTube / BibTeX

This work proposes a fine-tuning method where LLMs internalize tool-generated solutions (World Knowledge Distillation) and learn to switch between direct answers and tool use for complex problems (Tool Usage Adaptation). It outperforms GPT-4 and Claude-3.5 across six scientific benchmarks.


Enhancing LLM's Capabilities in Open Domains via Autonomous Tool Integration


Bohan Lyu*, Xin Cong*, Heyang Yu, Pan Yang, Yujia Qin, Yining Ye, Yaxi Lu, Zhong Zhang, Yukun Yan, Yankai Lin, Zhiyuan Liu, Maosong Sun
ACL 2025 (Main)
arXiv / Code / Slides / Website / YouTube / BibTeX

We construct OpenAct, a benchmark for complex open-domain task solving, and develop OpenAgent, a novel LLM agent system that leverages GitHub repositories to extend its capabilities and address diverse user queries.

Resources

Some tools and resources I've developed for the research community:

  • jload - A Python package for convenient JSON/JSONL data loading and saving.
  • vlllm - A Python package for convenient text generation with multiprocessing support.





Design and source code from Leonid Keselman's website