đź§ 

Day 3 Whitepaper Companion Podcast - Context Engineering: Sessions & Memory

Nov 13, 2025

Overview

This episode explains how to give LLM agents effective “memory” using context engineering, sessions, and long-term memory, based on a Google x Kaggle whitepaper.

Core Concepts

  • Context engineering dynamically assembles all needed inputs per turn to overcome LLM statelessness.
  • Sessions manage immediate, per-conversation state and history for one user interaction.
  • Memory provides long-term, cross-session personalization via an LLM-driven ETL pipeline.

Context Engineering

  • LLMs are stateless; statefulness requires dynamic, per-call context packaging.
  • Goes beyond static prompt engineering; continuously adapts inputs to current turn.
  • Inputs include system instructions, tool definitions, few-shot examples, external data, and dialogue history.
  • Manages function outputs, scratchpads, and the latest user message for precise responses; a minimal assembly sketch follows this list.
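
A minimal sketch of this per-turn assembly, assuming a chat-message format; the function name assemble_context and its parameters are illustrative, not from the whitepaper:

```python
from typing import Any

def assemble_context(
    system_instructions: str,
    tool_definitions: list[dict[str, Any]],
    few_shot_examples: list[dict[str, str]],
    retrieved_docs: list[str],
    dialogue_history: list[dict[str, str]],
    latest_user_message: str,
) -> list[dict[str, Any]]:
    """Rebuild the full model input for this turn; nothing persists between calls."""
    messages: list[dict[str, Any]] = [
        {"role": "system", "content": system_instructions},
        {"role": "system", "content": f"Available tools: {tool_definitions}"},
    ]
    messages += few_shot_examples                 # demonstrations of desired behavior
    if retrieved_docs:                            # external data: RAG hits, memories
        docs = "\n".join(retrieved_docs)
        messages.append({"role": "system", "content": f"Relevant context:\n{docs}"})
    messages += dialogue_history                  # prior turns within this session
    messages.append({"role": "user", "content": latest_user_message})
    return messages
```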

Context Rot and Compaction

  • Context rot: oversized, noisy context degrades attention and reasoning quality.
  • Compaction trims context using summarization and pruning to preserve key signal.
  • Compaction must run asynchronously to avoid blocking user responses and adding latency.

Context Management Cycle

  • Fetch context: retrieve relevant memories, RAG documents, and required data.
  • Prepare context: assemble full prompt string on the hot path before inference.
  • Invoke LLM and tools: send prompt, run functions, collect outputs as needed.
  • Upload context: persist new insights to storage asynchronously after the response; the full loop is sketched below.
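
A hedged sketch of the fetch → prepare → invoke → upload cycle, reusing assemble_context from above; the session, memory_store, and llm interfaces are hypothetical stand-ins, and the upload step is deliberately fire-and-forget:

```python
import asyncio

async def handle_turn(user_message: str, session, memory_store, llm) -> str:
    # 1. Fetch: retrieve relevant memories/documents (hot path, keep it fast).
    memories = await memory_store.search(user_message, top_k=5)

    # 2. Prepare: assemble the full prompt for this turn. Session events are
    #    assumed to carry .role and .content attributes.
    prompt = assemble_context(
        system_instructions=session.instructions,
        tool_definitions=session.tools,
        few_shot_examples=[],
        retrieved_docs=[m.text for m in memories],
        dialogue_history=[{"role": e.role, "content": e.content}
                          for e in session.events],
        latest_user_message=user_message,
    )

    # 3. Invoke: call the model (tool execution omitted for brevity).
    response = await llm.generate(prompt)

    # 4. Upload: persist new insights asynchronously, off the hot path.
    asyncio.create_task(memory_store.extract_and_store(session.events, response))
    return response
```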

Sessions: Structure and Frameworks

  • A session is a self-contained container for one continuous user conversation.
  • Events: chronological log of user/agent messages and tool calls in order.
  • State: structured working memory like cart items or workflow step progress.
  • Frameworks differ: ADK separates events and state; LangGraph uses mutable state. A minimal session container is sketched below.
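
One way the events/state split could look in code; the field names are illustrative and follow the ADK-style separation rather than any specific framework API:

```python
import time
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Event:
    """One entry in the chronological log: a user/agent message or a tool call."""
    role: str          # "user", "agent", or "tool"
    content: str
    timestamp: float

@dataclass
class Session:
    """Self-contained container for one continuous conversation."""
    session_id: str
    user_id: str
    events: list[Event] = field(default_factory=list)    # append-only history
    state: dict[str, Any] = field(default_factory=dict)  # working memory, e.g. cart items

# Usage: log a turn and update workflow state.
session = Session("s1", "u42")
session.events.append(Event("user", "Add two tickets to my cart", time.time()))
session.state["cart_items"] = 2
```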

Multi-Agent Systems (MAS)

  • Shared unified history: all agents read/write one central log; high visibility, clutter risk.
  • Separate individual histories: agents communicate via messages; autonomy but less shared context.
  • Requires an abstract, framework-agnostic memory layer to share synthesized knowledge.

Security, Privacy, and Performance

  • Strict isolation and ACLs are baseline; enforce user-by-user separation.
  • PII redaction must occur before storage for compliance (e.g., GDPR, CCPA).
  • Data hygiene: TTL policies and deterministic event ordering maintain integrity; a pruning sketch follows this list.
  • Performance: minimize hot-path session size; compaction reduces cost and latency.
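
A small sketch of TTL-based pruning with deterministic ordering, reusing the Event type from the session sketch above; the 30-day retention value is purely illustrative:

```python
import time

TTL_SECONDS = 30 * 24 * 3600  # assumed retention policy, not from the whitepaper

def prune_expired(events: list[Event], now: float | None = None) -> list[Event]:
    """Drop events past their TTL, then sort to keep deterministic ordering."""
    now = now if now is not None else time.time()
    kept = [e for e in events if now - e.timestamp < TTL_SECONDS]
    return sorted(kept, key=lambda e: e.timestamp)
```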

Compaction Strategies

  • Sliding window: keep last N turns to limit context growth per conversation.
  • Token-based truncation: cut oldest content once token budget is reached.
  • Recursive summarization: periodically replace older chunks with concise summaries.
  • Triggers: number of turns, inactivity periods, or task completion events (all three strategies are sketched after this list).
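
Minimal sketches of the three strategies, again using the Event type from the session sketch; the chars-per-token estimate and the llm.summarize call are assumptions, not a real API:

```python
def sliding_window(events: list[Event], max_turns: int = 20) -> list[Event]:
    """Sliding window: keep only the last N events."""
    return events[-max_turns:]

def truncate_to_budget(events: list[Event], max_tokens: int = 4000) -> list[Event]:
    """Token-based truncation: drop the oldest events once the budget is hit."""
    kept: list[Event] = []
    total = 0
    for event in reversed(events):            # walk newest to oldest
        tokens = len(event.content) // 4      # crude chars-per-token estimate
        if total + tokens > max_tokens:
            break
        kept.append(event)
        total += tokens
    return list(reversed(kept))               # restore chronological order

def recursive_summarize(events: list[Event], llm, trigger_turns: int = 50) -> list[Event]:
    """Recursive summarization: fold older turns into one concise summary event."""
    if len(events) < trigger_turns:           # trigger: turn count
        return events
    old, recent = events[:-10], events[-10:]
    summary = llm.summarize("\n".join(e.content for e in old))  # hypothetical call
    return [Event("agent", f"Summary of earlier turns: {summary}",
                  recent[0].timestamp)] + recent
```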

Memory vs RAG

  • RAG: static, shared knowledge; “research librarian” for world facts.
  • Memory: dynamic, user-specific knowledge; “personal assistant” for personalization.
  • The two are complementary, serving distinct roles in agent intelligence.

Types and Organization of Memory

  • Declarative memory: facts and events (e.g., favorite team, upcoming destination).
  • Procedural memory: skills and workflows (e.g., tool call sequences for tasks).
  • Organization: per-user collections, topic collections, structured profiles, rolling summaries.
  • Storage: vector databases for semantic search; knowledge graphs for relations.
  • Scope: user-level (across sessions), session-level (temporary), application-level (global); a sample record layout follows this list.
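
A possible record layout capturing type, scope, and organization in one place; every field name here is an assumption for illustration:

```python
from dataclasses import dataclass

@dataclass
class MemoryRecord:
    text: str                # extracted fact or procedure, stored as text
    kind: str                # "declarative" (knowing what) or "procedural" (knowing how)
    scope: str               # "user", "session", or "application"
    user_id: str | None      # set only for user-scoped memories
    topic: str               # collection key, e.g. "travel" or "preferences"
    embedding: list[float]   # vector for semantic search in a vector DB
```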

Multimodal Considerations

  • Sources may be images or audio; store extracted key facts as text for processing.
  • Text remains the primary representation for LLM search and reasoning.

Memory Generation: LLM-Driven ETL

  • Extraction: targeted filtering of meaningful details based on agent purpose.
  • Consolidation: compare with existing memories; create, update, delete, or invalidate.
  • Provenance and confidence: source, age, explicitness, and reinforcement drive trust.
  • Relevance decay: reduce importance over time without reinforcement to mimic forgetting.
  • Asynchronous processing: run the ETL in the background to prevent response latency spikes; the pipeline is sketched below.
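
A hedged sketch of the extract-then-consolidate pipeline; llm.extract_facts, llm.consolidate, and the store methods are hypothetical interfaces standing in for real prompt and database calls:

```python
async def memory_etl(session_events: list[Event], store, llm) -> None:
    """Background ETL: runs after the response, never on the hot path."""
    transcript = "\n".join(f"{e.role}: {e.content}" for e in session_events)

    # Extract: targeted filtering of meaningful details for this agent's purpose.
    candidates = await llm.extract_facts(transcript)

    for fact in candidates:
        existing = await store.search(fact, top_k=3)
        # Consolidate: is this fact new, an update, or a contradiction?
        decision = await llm.consolidate(fact, [m.text for m in existing])
        if decision.action == "create":
            await store.add(fact, confidence=decision.confidence)
        elif decision.action == "update":
            await store.update(decision.target_id, fact)
        elif decision.action == "invalidate":
            await store.invalidate(decision.target_id)  # keep, but mark untrusted
```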

Memory as a Tool

  • Provide agent tools like create_memory and query_memory for autonomous management.
  • The agent decides when to save or retrieve based on conversational needs and goals; hypothetical tool bodies are sketched below.
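
The tool names come from the talk; the bodies below are assumptions, with embed and memory_store as hypothetical helpers and MemoryRecord reused from the earlier sketch:

```python
def create_memory(text: str, topic: str, user_id: str) -> str:
    """Tool the agent calls when it judges something worth remembering."""
    record = MemoryRecord(text=text, kind="declarative", scope="user",
                          user_id=user_id, topic=topic, embedding=embed(text))
    memory_store.add(record)
    return "saved"

def query_memory(query: str, user_id: str, top_k: int = 5) -> list[str]:
    """Tool the agent calls to look up memories relevant to the current turn."""
    hits = memory_store.search(query, user_id=user_id, top_k=top_k)
    return [h.text for h in hits]
```

Registered as function-calling tools, these let the model itself decide when remembering or recalling serves the conversation.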

Retrieval and Scoring

  • Blend scores: combine relevance (similarity), recency, and importance for ranking.
  • Proactive retrieval: fetch likely memories each turn; simple but may add latency.
  • Reactive retrieval: the agent queries memory on demand; efficient but requires smarter control (a blended-scoring helper follows this list).
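
A compact helper showing one way to blend the three signals; the weights and the 30-day half-life are illustrative, and the exponential term doubles as the relevance-decay mechanism described earlier:

```python
import math
import time

def blended_score(similarity: float, created_at: float, importance: float,
                  w_rel: float = 0.6, w_rec: float = 0.2, w_imp: float = 0.2,
                  half_life_days: float = 30.0) -> float:
    """Rank a memory by relevance (similarity), recency, and importance."""
    age_days = (time.time() - created_at) / 86400
    recency = math.exp(-math.log(2) * age_days / half_life_days)  # halves per half-life
    return w_rel * similarity + w_rec * recency + w_imp * importance
```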

Inference and Prompt Placement

  • Placement signals authority; system messages carry strong weight but risk bias.
  • Conversation history injection may confuse roles and dilute dialogue clarity.
  • Choose placement carefully to balance stability and risk from imperfect memories.

Evaluation and Testing

  • Generation metrics: precision and recall for captured memories’ correctness.
  • Retrieval metrics: recall@K to assess presence of correct memories in top results (see the helper after this list).
  • Latency targets: memory lookups ideally under 200 milliseconds for responsiveness.
  • End-to-end success: use LLM judges across test cases to score task completion gains.
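
recall@K itself is simple to compute; a minimal helper, with a worked example in the assert:

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int = 5) -> float:
    """Fraction of relevant memories that appear in the top-K retrieved results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for mid in retrieved_ids[:k] if mid in relevant_ids)
    return hits / len(relevant_ids)

# 2 of the 3 relevant memories show up in the top 5 -> recall@5 = 2/3.
assert abs(recall_at_k(["a", "x", "b", "y", "z"], {"a", "b", "c"}) - 2 / 3) < 1e-9
```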

Summary Table: Architecture Elements

| Component | Purpose | Key Techniques | Performance Notes |
| --- | --- | --- | --- |
| Context Engineering | Build per-turn statefulness | Fetch/prepare/invoke/upload; compaction | Hot path; minimize latency and cost |
| Sessions | Immediate conversation state | Events log; mutable state; TTL | Deterministic ordering; framework differences |
| Memory | Long-term personalization | LLM ETL; provenance; decay | Asynchronous processing required |
| Retrieval | Bring memories into context | Blended scoring; proactive/reactive | Aim for <200 ms lookup times |
| Storage | Persist and query knowledge | Vector DB + knowledge graph | Hybrid supports semantic and relational queries |

Key Terms & Definitions

  • Context engineering: Dynamic assembly of all inputs per turn to overcome statelessness.
  • Context rot: Quality drop when context becomes too large or noisy.
  • Compaction: Techniques to reduce context size while preserving essential information.
  • Session: Self-contained container of one conversation’s events and working state.
  • Declarative memory: “Knowing what” facts and events about the user.
  • Procedural memory: “Knowing how” processes and tool-use sequences.
  • Provenance: Origin and characteristics of a memory used to judge reliability.
  • Relevance decay: Scheduled reduction in memory importance without reinforcement.
  • Blended scoring: Combining relevance, recency, and importance for retrieval ranking.

Action Items / Next Steps

  • Implement session compaction with sliding window and token-based truncation first.
  • Add asynchronous recursive summarization triggered by turn count or inactivity.
  • Design and deploy an LLM-driven ETL for memory with provenance and decay.
  • Expose memory tools to the agent for create/query within conversations.
  • Adopt blended retrieval scoring and evaluate with recall@K and latency targets.