🛠️

GPT-4.1 Prompting Strategies and Best Practices

Apr 18, 2025

GPT-4.1 Prompting Guide

Overview

  • GPT-4.1 models represent a significant improvement over GPT-4o with advancements in coding, instruction following, and long context handling.
  • Key focus: improved instruction adherence, making it highly steerable with precise prompts.
  • Prompt migration may be necessary for leveraging the new capabilities effectively.

Best Practices

  • Provide context examples and clear, specific instructions.
  • Utilize prompt-induced planning for better model performance.
  • Clear instructions are crucial for effective steering of model behavior.

Agentic Workflows

  • GPT-4.1 excels in agentic workflows, with a state-of-the-art performance in solving SWE-bench Verified problems.
  • Important to include three types of reminders in agent prompts:
    1. Persistence: Prevents premature yielding of control by the model.
    2. Tool-calling: Encourages tool usage to reduce hallucinations.
    3. Planning (Optional): Induces explicit planning and reflection.

Tool Calls

  • GPT-4.1 utilizes tools more effectively than previous models.
  • Use the tools field in API requests to minimize errors and remain in distribution.
  • Provide clear names and descriptions for tools.
  • Include examples in prompts for complex tools.

Prompting-Induced Planning & Chain-of-Thought

  • Encourages explicit planning between tool calls.
  • Although not a reasoning model, explicit step-by-step prompts can improve performance.
  • Increased pass rates in SWE-bench Verified tasks when using planning prompts.

Long Context

  • Supports a 1M token input context window, aiding in tasks like structured document parsing and multi-hop reasoning.
  • Performance can degrade with more complex reasoning tasks.

Instruction Following

  • GPT-4.1 follows instructions very closely, requiring precise and explicit prompt structures.
  • Developers should iteratively refine prompts for desired outcomes.

Recommended Workflow

  • Start with high-level response rules and refine with specific instructions and examples.
  • Address conflicting instructions and test extensively to ensure prompt effectiveness.

Common Failure Modes

  • Issues like hallucination of tool inputs or repetitive answers can occur, and must be managed with clear instructions.

General Advice

  • Structure prompts with a clear role, objective, instructions, reasoning steps, and examples.
  • Use markdown or XML for prompt structuring, and avoid verbose formats like JSON for long contexts.

Apply Patch

  • Improved diff generation capabilities in GPT-4.1.
  • Recommended diff format involves clear context without line numbers.

Additional Notes

  • Use AI-powered IDEs for efficient prompt iteration and debugging.
  • Consider potential model resistance to very long outputs.
  • Test for correctness in parallel tool calls.