Agentic Skills Reading List

Structured paper table with figures, benchmarks, and one-line summaries

Alias Title Source Institution Benchmarking Figure Summary
Comp-RL Complementary Reinforcement Learning arXiv 2603.17621 Alibaba Group / HKUST MiniHack ALFWorld WebShop SWE-Bench ★★★★★ Comp-RL Fig.1 受 CLS 启发的 actor-extractor 共演化 RL 框架:静态经验随训练变为负资产,共演化闭环单任务 1.3× margin + 多任务 +7%;split-group GRPO 防止 actor 对经验过度依赖。
ExGRPO ExGRPO: Learning to Reason from Experience arXiv 2510.02245 University of Macau / Shanghai AI Lab AIME24/25 MATH-500 GPQA★ MMLU-Pro ★★★★ ExGRPO Fig.1 首个系统研究 RLVR 经验价值的工作:中等难度题+低熵轨迹是最优经验来源;bucketed replay + entropy selection 在 5 模型上平均 +3.5/+7.6(ID/OOD)。
TRT Test-time Recursive Thinking: Self-Improvement without External Feedback arXiv 2602.03094 Microsoft Research AIME-24/25 LiveCodeBench Hard ★★★★ N/A 无需外部反馈的 test-time 迭代自改进:Generate-Select-Reflect 三阶段闭环;开源模型 AIME 100%,核心发现:失败知识 > 成功知识,depth > breadth。
OEL Online Experiential Learning for Language Models arXiv 2603.16856 Frozen Lake Sokoban IF-Eval ★★★★ OEL Fig.1 经验抽取后再做 on-policy 蒸馏显著优于直接吃原始轨迹(Sokoban consolidate 21.4 vs 7.8),self-policy 经验优于更大教师经验。
SkillsBench Benchmarking How Well Agent Skills Work Across Diverse Tasks arXiv 2602.12670 86 tasks 11 domains ★★★★ N/A 最关键结论:人工 curated skills 平均 +16.2pp,而 self-generated skills 平均退化(-3.3pp pass rate),说明 skill 质量控制比数量更重要。
MemSkill Learning and Evolving Memory Skills for Self-Evolving Agents arXiv 2602.02474 LoCoMo LongMemEval HotpotQA ALFWorld ★★★☆☆ MemSkill Fig.1 把记忆操作做成可检索、可演化 skill(controller/executor/designer);消融显示去掉 designer 退化最大,说明"持续增量技能"是关键。
SkillRL Evolving Agents via Recursive Skill-Augmented Reinforcement Learning arXiv 2602.08234 ALFWorld WebShop 7 search tasks ★★★☆☆ SkillRL Fig.1 轨迹记忆→技能记忆+递归演化有效:复杂子任务增益大、收敛更快,约 10.3% 上下文 token 缩减。
AutoSkill Experience-Driven Lifelong Learning via Skill Self-Evolution arXiv 2603.01145 ECNU / Shanghai AI Lab Lifelong personalization ★★★☆☆ AutoSkill Fig.1 重点不是加检索器,而是把长期经验沉淀为可版本化 SKILL.md 并异步 merge/演化;工程价值在 training-free 个性化闭环。
SkillOrchestra SkillOrchestra: Learning to Route Agents via Skill Transfer arXiv 2602.19672 UW-Madison / Salesforce AI 10 benchmarks ★★★☆☆ N/A 用"技能需求"而非 query 相似度做路由,能缓解 RL 路由 collapse,在性能-成本 Pareto 上更稳。
SoK-Skills Agentic Skills — Beyond Tool Use in LLM Agents arXiv 2602.20867 SoK / taxonomy ★★★☆☆ SoK Fig.1 贡献在系统化:给出 skill 四元组与生命周期框架,并把供应链安全纳入同一评估视角。
Survey Agent Skills for LLMs: Architecture, Acquisition, Security, and the Path Forward arXiv 2602.12430 Survey ★★★☆☆ Survey Fig.1 价值在于把 architecture/acquisition/security/governance 串成一张图,用大规模安全统计强调 skill 生态的治理刚需。