Large language models have demonstrated impressive reasoning capabilities but are inherently limited by their knowledge reservoir. Retrieval-augmented reasoning mitigates this limitation by allowing LLMs to query external resources, but existing methods often retrieve irrelevant or noisy information, hindering accurate reasoning. In this paper, we propose AutoRefine, a reinforcement learning post-training framework that adopts a new "search-and-refine-during-think" paradigm. AutoRefine introduces explicit knowledge refinement steps between successive search calls, enabling the model to iteratively filter, distill, and organize evidence before generating an answer. Furthermore, we incorporate tailored retrieval-specific rewards alongside answer correctness rewards using group relative policy optimization (GRPO). Experiments on single-hop and multi-hop QA benchmarks demonstrate that AutoRefine significantly outperforms existing approaches, particularly in complex, multi-hop reasoning scenarios. Detailed analysis shows that AutoRefine issues frequent, higher-quality searches and synthesizes evidence effectively.
Current retrieval-augmented reasoning approaches face two key limitations:

1. Retrieved documents are fed into the reasoning process as-is, with no explicit step to filter or distill them, so irrelevant or noisy passages can derail the final answer.
2. Training rewards typically score only final answer correctness, giving the model no direct signal about whether its searches actually surfaced useful evidence.
At the core of AutoRefine is a novel "search-and-refine-during-think" paradigm that extends the traditional "search-during-think" approach. This paradigm allows the LLM to:

1. Interleave reasoning with search calls, issuing a query whenever its internal knowledge is insufficient.
2. Refine the retrieved documents in an explicit refinement step, filtering out noise and distilling the evidence relevant to the question.
3. Iterate between searching and refining until sufficient evidence has been gathered, and only then generate the final answer.
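To make the paradigm concrete, the sketch below shows one way such a rollout loop could be implemented. It is a minimal illustration rather than the actual AutoRefine implementation: the tag names (`<think>`, `<search>`, `<documents>`, `<refine>`, `<answer>`), the `llm_generate` and `retrieve` callables, and the turn limit are all assumptions.

```python
import re

# Illustrative tag names; the exact prompt template used by AutoRefine may differ.
SEARCH_TAG = re.compile(r"<search>(.*?)</search>", re.DOTALL)
ANSWER_TAG = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def rollout(llm_generate, retrieve, question, max_turns=4):
    """Sketch of a search-and-refine-during-think rollout.

    llm_generate(prompt, stop) -> str : continues the trajectory up to and
        including one of the stop tags (placeholder for the policy model).
    retrieve(query, k) -> list[str]   : returns top-k documents (placeholder
        for the retriever).
    """
    trajectory = f"Question: {question}\n<think>"
    for _ in range(max_turns):
        # The model reasons, then either issues a search query or answers.
        completion = llm_generate(trajectory, stop=["</search>", "</answer>"])
        trajectory += completion

        answered = ANSWER_TAG.search(completion)
        if answered:  # the model judged its refined evidence sufficient
            return answered.group(1).strip(), trajectory

        searched = SEARCH_TAG.search(completion)
        if searched is None:  # malformed output: neither searched nor answered
            break
        docs = retrieve(searched.group(1).strip(), k=3)

        # Retrieved documents are appended verbatim; the model is then prompted
        # to emit an explicit refinement step that filters and distills the
        # evidence before it decides whether to search again or answer.
        trajectory += "\n<documents>\n" + "\n".join(docs) + "\n</documents>\n<refine>"
    return None, trajectory
```

The difference from a plain "search-during-think" loop is the explicit `<refine>` segment appended after each batch of documents, which prompts the model to distill the evidence before deciding whether to search again.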
AutoRefine employs a reward design that combines:

1. An answer correctness reward, which scores whether the final answer matches the ground truth.
2. A tailored retrieval-specific reward, which directly evaluates the quality of the retrieved-and-refined evidence rather than only the final answer.

The two signals are combined during reinforcement learning with GRPO, so that rollouts which both gather useful evidence and answer correctly are reinforced.
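Below is a minimal Python sketch of how such a combined reward could be computed and turned into group-relative advantages. The concrete reward definitions are assumptions made for illustration: the string-containment check in `retrieval_reward`, the weight `alpha`, and the function names are not taken from the paper.

```python
import statistics

def _normalize(text):
    """Lowercase and collapse whitespace for a crude exact-match comparison."""
    return " ".join(text.lower().strip().split())

def answer_reward(pred_answer, gold_answers):
    """Outcome reward: 1.0 if the predicted answer matches any gold answer."""
    return float(any(_normalize(pred_answer) == _normalize(g) for g in gold_answers))

def retrieval_reward(refined_spans, gold_answers):
    """Retrieval-specific reward (an assumed instantiation): credit the rollout
    if the distilled evidence already contains a gold answer string."""
    evidence = _normalize(" ".join(refined_spans))
    return float(any(_normalize(g) in evidence for g in gold_answers))

def trajectory_reward(pred_answer, refined_spans, gold_answers, alpha=0.5):
    """Combined scalar reward; the weighting `alpha` is an illustrative choice."""
    return (answer_reward(pred_answer, gold_answers)
            + alpha * retrieval_reward(refined_spans, gold_answers))

def grpo_advantages(rewards):
    """GRPO-style advantages: rewards of a group of rollouts sampled for the
    same question are normalized by the group's mean and standard deviation."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid dividing by zero
    return [(r - mean) / std for r in rewards]
```

Note that GRPO uses no learned value model: each question is answered with a group of sampled rollouts, and a rollout's advantage is simply how much better its reward is than the group average.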
AutoRefine was evaluated on seven question-answering benchmarks: three single-hop datasets (NQ, TriviaQA, and PopQA) and four multi-hop datasets (HotpotQA, 2WikiMultihopQA, MuSiQue, and Bamboogle). The results demonstrate that:

1. AutoRefine significantly outperforms existing retrieval-augmented approaches, with the largest gains on the complex, multi-hop benchmarks.
2. AutoRefine issues searches more frequently and with higher-quality queries than comparable baselines.
3. The explicit refinement steps enable the model to synthesize retrieved evidence effectively.
@article{shi2025search,
  title={Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs},
  author={Shi, Yaorui and Li, Shihan and Wu, Chang and Liu, Zhiyuan and Fang, Junfeng and Cai, Hengxing and Zhang, An and Wang, Xiang},
  journal={arXiv preprint arXiv:2505.11277},
  year={2025}
}