Bohan Lyu

I am an undergraduate at Tsinghua University, pursuing a dual degree in Computer Science and Economics since 2022.

I am currently working on LLMs for autonomous mathematical theorem proving in formal languages at PLI, Princeton. Before that, I worked at the Tsinghua NLP Lab, advised by Prof. Zhiyuan Liu, and at the Rose-STL-Lab, UCSD, advised by Prof. Rose Yu. I have also interned as a quantitative researcher at China Securities.

I am particularly interested in mathematical theorem proving in formal languages and its applications (e.g., how it can benefit natural language reasoning).

I have served as a reviewer for ICLR 2025, AI4MATH @ ICML 2024, and LLMAgents @ ICLR 2024.

I am seeking PhD opportunities starting in Fall 2026. Please feel free to reach out!

Email  /  GitHub  /  Google Scholar  /  Twitter  /  CV



News

  • 2024-12 Delivered a talk at the Challenge Cup Seminar hosted by Department of Automation, Tsinghua University.
  • 2024-11 Presented my summer project at AAAI FSS.
  • 2024-10 Awarded the Scholarship for Technological Innovation of 5,000 RMB.
  • 2024-06 My team placed 1st out of 600+ teams in a Baidu Inc. data mining competition.
  • 2024-05 I became a member of the 'Sparking Program', the most prestigious and selective academic organization for students at Tsinghua University.
  • 2024-04 I led my team to win Second Place and the Newcomer Prize at Tsinghua University's Challenge Cup.
  • 2024-04 I received a Grade A in the 'Global Exploration Initiative' and was awarded a grant of 20,000 RMB.

Research (* indicates equal contribution, highlight indicates representative papers)

I'm interested in Natural Language Processing, AI4Science and multimodal learning.


SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors


Bohan Lyu*, Siqiao Huang*, Zichen Liang*
Preprint, 2025
arxiv / code /

SURGE evaluates LLMs' ability to predict code execution across eight key areas. While LLMs show promise, several limitations prevent them from serving as general-purpose surrogate executors.


Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving


Yong Lin*, Shange Tang*, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, Chi Jin
Preprint, 2025
arxiv / code / website / youtube /

The new 7B Goedel-Prover sets a new state-of-the-art in open-source automated theorem proving, beating previous records with a 7% improvement on miniF2F, topping the PutnamBench Leaderboard, and solving nearly twice as many problems on Lean Workbook.


Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation


Bohan Lyu*, Yadi Cao*, Duncan Watson-Parris, Leon Bergen, Taylor Berg-Kirkpatrick, Rose Yu
AAAI FSS 2024 (Oral); main-conference version under review
arxiv / slides / youtube /

This work proposes a fine-tuning method where LLMs internalize tool-generated solutions (World Knowledge Distillation) and learn to switch between direct answers and tool use for complex problems (Tool Usage Adaptation). It outperforms GPT-4 and Claude-3.5 across six scientific benchmarks.


Enhancing LLM’s Capabilities in Open Domains via Autonomous Tool Integration


Bohan Lyu*, Xin Cong*, Heyang Yu, Pan Yang, Yujia Qin, Yining Ye, Yaxi Lu, Zhong Zhang, Yukun Yan, Yankai Lin, Zhiyuan Liu, Maosong Sun
Under review, 2023
arxiv /

We developed an autonomous agent that leverages GitHub repositories to extend its own capabilities and address diverse user queries, and introduced a new agent architecture that achieves SOTA performance on SciAct.






Design and source code from Leonid Keselman's website