May 8, 2025:
Google Unveils Cost-Saving Implicit AI Caching - Google's Gemini API now supports implicit caching, which automatically caches repeated context and, according to Google, discounts the cached tokens by 75%. The feature covers the Gemini 2.5 Pro and 2.5 Flash models and passes the savings on to developers without requiring them to set up explicit caches manually.
Although implicit caching is enabled by default and applies once a request meets a low minimum token count, developers are still advised to structure requests to maximize cache hits, for example by keeping repeated context at the start of the prompt. Skepticism persists about the actual savings because they have not been independently verified.
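To make the structuring advice and the savings figure concrete, here is a minimal Python sketch. It assumes cache hits depend on a request sharing a verbatim prefix with an earlier one, and that the reported 75% discount applies only to the cached share of input tokens. The helper names (`build_prompt`, `shared_prefix_fraction`, `relative_input_cost`) are hypothetical, it calls no Gemini API, and it compares characters rather than real tokens.

```python
from os.path import commonprefix

# Reported discount on cached tokens (figure from the announcement, not independently verified).
CACHE_DISCOUNT = 0.75


def build_prompt(shared_context: str, question: str) -> str:
    """Place the stable, repeated context first and the variable question last."""
    return f"{shared_context}\n\nQuestion: {question}"


def shared_prefix_fraction(previous: str, current: str) -> float:
    """Fraction of `current` (by characters) that is a verbatim prefix shared with `previous`."""
    if not current:
        return 0.0
    return len(commonprefix([previous, current])) / len(current)


def relative_input_cost(cached_fraction: float, discount: float = CACHE_DISCOUNT) -> float:
    """Input cost relative to a fully uncached request when `cached_fraction` of tokens hit the cache."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    return 1.0 - discount * cached_fraction


if __name__ == "__main__":
    context = "Reference manual text. " * 200  # stand-in for a long shared document
    first = build_prompt(context, "What is covered in chapter 1?")
    second = build_prompt(context, "Summarize the warranty terms.")

    fraction = shared_prefix_fraction(first, second)
    print(f"shared prefix fraction: {fraction:.3f}")
    print(f"relative input cost:    {relative_input_cost(fraction):.3f}")
```

Under these assumptions the full 75% saving is reached only when an entire prompt is served from cache. A prompt that puts the variable question ahead of the shared context would share almost no prefix with earlier requests and would see little or no discount.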