Don't Break the Cache: An Evaluation of Prompt Caching for Long-Horizon Agentic Tasks

January 9, 2026

7 authors

arXiv:2601.06007v1

Authors

Elias Lumer, Faheem Nizar, Akshaya Jangiti, Kevin Frank, Anmol Gulati, Mandar Phadate, Vamse Kumar Subbiah

Abstract

Recent advancements in Large Language Model (LLM) agents have enabled complex multi-turn agentic tasks requiring extensive tool calling, where conversations can span dozens of API calls with increasingly large context windows. Although major LLM providers offer prompt caching to reduce cost and latency, its benefits for agentic workloads remain underexplored in the research literature. To our knowledge, no prior work quantifies these cost savings or compares caching strategies for multi-turn agentic tasks. We present a comprehensive evaluation of prompt caching across three major LLM providers (OpenAI, Anthropic, and Google) and compare three caching strategies: full-context caching, system-prompt-only caching, and caching that excludes dynamic tool results. We evaluate on DeepResearchBench, a multi-turn agentic benchmark where agents autonomously execute real-world web search tool calls to answer complex research questions, measuring both API cost and time to first token (TTFT) across over 500 agent sessions with 10,000-token system prompts. Our results demonstrate that prompt caching reduces API costs by 45-80% and improves time to first token by 13-31% across providers. We find that strategic prompt cache block control, such as placing dynamic content at the end of the system prompt, avoiding dynamic traditional function calling, and excluding dynamic tool results, provides more consistent benefits than naive full-context caching, which can paradoxically increase latency. Our analysis reveals nuanced variations in caching behavior across providers, and we provide practical guidance for implementing prompt caching in production agentic systems.
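
The recommendation to place dynamic content at the end of the system prompt can be illustrated with a minimal sketch. It assumes that a cache can only reuse the leading part of a prompt that is identical between requests, and it uses a character-level shared prefix as a rough proxy for that reusable portion. Real providers match on tokens and apply their own block and minimum-length rules, which this sketch does not model. The helper names `build_prompt` and `shared_prefix_fraction`, and the example prompt text, are hypothetical and do not come from the paper.

```python
from os.path import commonprefix


def build_prompt(static_instructions: str, dynamic_context: str,
                 dynamic_last: bool = True) -> str:
    """Assemble a system prompt (hypothetical helper).

    With dynamic_last=True the static text forms a stable prefix and the
    changing content is appended after it.
    """
    if dynamic_last:
        parts = [static_instructions, dynamic_context]
    else:
        parts = [dynamic_context, static_instructions]
    return "\n\n".join(parts)


def shared_prefix_fraction(previous: str, current: str) -> float:
    """Fraction of `current` that is a character-level prefix shared with
    `previous`. This is only a proxy for the cacheable portion; it is an
    assumption of this sketch, not a provider's actual matching rule."""
    if not current:
        return 0.0
    return len(commonprefix([previous, current])) / len(current)


# Hypothetical long static instructions plus a small value that changes per turn.
STATIC = "You are a research agent. " * 200

turn1_last = build_prompt(STATIC, "Current time: 09:00")
turn2_last = build_prompt(STATIC, "Current time: 09:05")
turn1_first = build_prompt(STATIC, "Current time: 09:00", dynamic_last=False)
turn2_first = build_prompt(STATIC, "Current time: 09:05", dynamic_last=False)

dynamic_last_share = shared_prefix_fraction(turn1_last, turn2_last)
dynamic_first_share = shared_prefix_fraction(turn1_first, turn2_first)

print(f"dynamic content last:  {dynamic_last_share:.4f} of the prompt is a shared prefix")
print(f"dynamic content first: {dynamic_first_share:.4f} of the prompt is a shared prefix")
```

Under these assumptions, nearly the whole prompt remains a shared prefix when the changing value comes last, while placing it first leaves almost nothing reusable between turns.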

Paper Information

arXiv ID: 2601.06007v1
Published: January 9, 2026
Categories: cs.CL

