Paper Accepted at ICSE 2026
I am excited to share that FLAMES has been accepted to the Research Track of ICSE 2026. FLAMES is a memory-efficient LLM-based APR technique built around semantic-guided patch generation. If you are interested in FLAMES, please read our preprint and explore the tool.
Abstract
Fixing software bugs is crucial yet demands significant developer resources. Automated Program Repair (APR) is a promising way to address this challenging task, and the emergence of Large Language Models (LLMs) has opened a new era of LLM-based APR, substantially advancing the field. However, LLM-based APR methods suffer from significant memory inefficiency, which hinders their scalability and effectiveness. This is largely due to the beam search used in the patch-generation phase of LLM-based APR, which requires large beam sizes to explore a wider space of promising repair candidates.
In this paper, we first show that increasing the beam size, even for small LLMs (1B-7B parameters), requires extensive GPU memory and leads to recurring crashes from memory overload in up to 80% of cases. Two seemingly simple ways to reduce memory consumption are (1) quantization, i.e., converting the weights of an LLM from high-precision values to lower-precision ones, and (2) sequential beam search, i.e., forwarding each beam through the model one at a time and then concatenating the results back into a single output. However, we show through both theoretical analysis and experiments that these approaches still do not work.
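For intuition, here is a minimal sketch of one decoding step of such sequential beam search, assuming a Hugging Face-style causal LM whose forward pass exposes `.logits`; the function name and interface are illustrative, not the exact setup studied in the paper.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sequential_beam_step(model, beams, beam_scores, num_beams):
    """One beam-search decoding step with per-beam forwarding.

    `beams` is a list of 1-D token-id tensors (one per beam) and
    `beam_scores` their cumulative log-probabilities. Instead of stacking
    the beams into a batch of size `num_beams`, each beam is forwarded
    alone (batch size 1); the per-token log-probs are then concatenated
    so the usual beam update can run on them.
    """
    stepwise = []
    for seq in beams:                          # forward one beam at a time
        logits = model(seq.unsqueeze(0)).logits[0, -1]
        stepwise.append(F.log_softmax(logits, dim=-1))
    logprobs = torch.stack(stepwise)           # (num_beams, vocab_size)

    # Standard beam update: keep the num_beams best (beam, token) pairs.
    cand = beam_scores.unsqueeze(1) + logprobs
    top_scores, flat = cand.view(-1).topk(num_beams)
    vocab = logprobs.size(1)
    new_beams = [torch.cat([beams[i // vocab], torch.tensor([i % vocab])])
                 for i in flat.tolist()]
    return new_beams, top_scores
```

Even with per-beam forwarding, the model weights and every beam's growing sequence must stay resident on the GPU, which is consistent with the finding above that this workaround alone does not prevent memory overloads.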
To address this, we introduce FLAMES, a novel LLM-based APR technique that employs semantic-guided patch generation to improve both repair effectiveness and memory efficiency. Unlike conventional methods that rely on beam search, FLAMES uses greedy decoding for memory efficiency while steering the search toward promising repair candidates via a semantic-guided best-first search algorithm. At each decoding step, FLAMES uses semantic feedback from test validation, such as the number of passing and failing test cases, to select the most promising token to explore further. Our empirical evaluation on Defects4J shows that FLAMES substantially reduces memory consumption, by up to 83% compared to conventional LLM-based APR, without compromising time efficiency. Moreover, FLAMES correctly fixes 133 bugs on Defects4J, 10 more than the best baseline. These improvements also generalize to the HumanEval-Java and TransformedD4J datasets, where FLAMES generates 12% and 36.5% more correct patches, respectively, than the best baseline.
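To make the idea concrete, below is a minimal sketch of such a semantic-guided best-first search, written against a Hugging Face-style model and tokenizer. Everything here is reconstructed from the abstract alone: `run_tests` is a hypothetical callback returning the number of passing tests for a candidate patch, and the branching factor `k`, the candidate `budget`, and the exact scoring are illustrative simplifications, not FLAMES's actual design.

```python
import heapq
import itertools
import torch
import torch.nn.functional as F

@torch.no_grad()
def greedy_complete(model, seq, eos_id, max_new_tokens=64):
    """Finish a partial token sequence with plain greedy decoding
    (batch size 1, so memory use stays far below wide beam search)."""
    for _ in range(max_new_tokens):
        logits = model(seq.unsqueeze(0)).logits[0, -1]
        tok = logits.argmax()
        seq = torch.cat([seq, tok.view(1)])
        if tok.item() == eos_id:
            break
    return seq

@torch.no_grad()
def best_first_patch_search(model, tokenizer, prompt, run_tests,
                            k=5, budget=50):
    """Explore alternative decodings in best-first order, prioritized by
    semantic feedback (here: the number of passing tests reported by the
    hypothetical `run_tests` callback) rather than by a wide beam."""
    tie = itertools.count()  # tie-breaker so the heap never compares tensors
    root = tokenizer(prompt, return_tensors="pt").input_ids[0]
    frontier = [(0.0, next(tie), root)]   # (-score, tie, partial sequence)
    best_patch, best_score = None, float("-inf")

    for _ in range(budget):
        if not frontier:
            break
        _, _, seq = heapq.heappop(frontier)
        patch_ids = greedy_complete(model, seq, tokenizer.eos_token_id)
        patch = tokenizer.decode(patch_ids, skip_special_tokens=True)
        score = run_tests(patch)          # semantic feedback from tests
        if score > best_score:
            best_patch, best_score = patch, score
        # Branch on the k most likely next tokens of the *partial* sequence,
        # carrying the parent's test score as the priority (a simplification
        # of the paper's actual scoring).
        logits = model(seq.unsqueeze(0)).logits[0, -1]
        for tok in F.log_softmax(logits, dim=-1).topk(k).indices:
            heapq.heappush(frontier, (-score, next(tie),
                                      torch.cat([seq, tok.view(1)])))
    return best_patch, best_score
```

The key contrast with beam search is that every forward pass runs at batch size 1, so peak memory stays flat while test outcomes, rather than a wide beam, decide which partial patch to expand next.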