ChatGPT Agent Launch Overview

Jul 21, 2025

Overview

OpenAI has launched ChatGPT Agent, an AI tool that automates multi-step tasks, integrates with external services, and interacts with websites. It is designed to complete real-world tasks end to end, ships with a dedicated safety stack, and is rolling out gradually by subscription tier.

Key Features and Capabilities

  • ChatGPT Agent automates complex tasks using its own virtual computer, separate from the user's device.
  • It can browse the web, call APIs, edit spreadsheets, and generate editable slide decks.
  • The agent combines OpenAI's earlier Operator and Deep Research agents and adds new tools such as terminal access and connectors to apps.
  • Users can prompt the agent for multi-step tasks, such as competitor analysis or online shopping automation.
  • The system is interactive; users can pause, adjust, or cancel tasks and request progress updates.
  • ChatGPT Agent connects to services like Gmail, Google Calendar, and GitHub after secure authentication.
  • The mobile app can notify users when a task completes.

Benchmark and Performance Results

  • Achieved 41.6% on the “Humanity’s Last Exam” benchmark, nearly double the score of previous OpenAI models.
  • Scored 44.4% with parallel attempts on the same exam.
  • Outperformed prior models on the FrontierMath benchmark, scoring 27.4% with tool access versus 6.3% for the previous best.
  • Matched or exceeded human outputs in about half of real-world business task evaluations.
  • Scored 45.54% on spreadsheet tasks when editing .xls files, more than double the score of Copilot in Excel, though humans scored 71.33%.
  • Outperformed human baselines on DSBench data science tasks.
  • Set new records on web navigation benchmarks such as BrowseComp (68.9%) and outperformed previous agents on WebArena.

Technical Architecture

  • Uses multiple environments (visual and text-based browsers, terminal) on a virtual machine for persistent context.
  • Can run scripts, manipulate files, and display outputs without losing track of progress.
  • Employs connectors for secure API access across user apps and services.

Safety, Privacy, and Limitations

  • Classified as high-risk for biological and chemical misuse; comprehensive safety stack implemented.
  • All prompts are classified and filtered for harmful content; real-time monitoring is active, and the agent's memory feature is disabled.
  • Designed to resist prompt injection threats from malicious web content.
  • Requires explicit user permission before consequential actions and offers a watch mode for supervised tasks.
  • Provides privacy features: data deletion, active session logout, cookie management, and private browser logins.
  • Known limitations include beta-level slide deck creation and lack of custom slide uploads.
  • Next version aims for improved formatting and broader capabilities.
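
The explicit-permission requirement listed above can be pictured as a small gate in front of each action. The following is a minimal sketch under stated assumptions, not OpenAI's implementation: the `Action` class, the `consequential` flag, and the `execute` and `confirm` names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """Hypothetical description of one step an agent wants to take."""
    description: str
    consequential: bool          # e.g. sending email or making a purchase
    run: Callable[[], str]       # the work itself, deferred until approved


def execute(action: Action, confirm: Callable[[str], bool]) -> str:
    """Run the action, asking the user first only when it is consequential."""
    if action.consequential and not confirm(action.description):
        # The user declined, so the deferred work is never invoked.
        return "skipped: user declined"
    return action.run()
```

In this sketch, low-impact steps such as reading a page run without interruption, while anything flagged as consequential waits for the `confirm` callback, which in a real interface would be a prompt shown to the user.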

Rollout and Access

  • Pro users have access now with 400 messages/month; Team users get 40 messages/month, with more available through paid credits.
  • Rollout to Enterprise and Education plans follows in the coming weeks; access in the European Economic Area and Switzerland is delayed.
  • Web developers are advised to use structured content to facilitate AI agent interactions.
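
The advice on structured content does not name a format. One common option, assumed here for illustration, is JSON-LD using the schema.org vocabulary. The sketch below builds a `Product` description and wraps it in the script tag that pages embed; the function names and the example values are hypothetical.

```python
import json


def product_jsonld(name: str, price: str, currency: str, url: str) -> dict:
    """Build a schema.org Product description as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "url": url,
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
        },
    }


def to_script_tag(data: dict) -> str:
    """Wrap JSON-LD in the script tag that is placed in a page's HTML."""
    body = json.dumps(data, indent=2, sort_keys=True)
    # Escape "<" so a value cannot close the script tag early;
    # "\u003c" is still valid JSON and decodes back to "<".
    body = body.replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    item = product_jsonld(
        "Example Kettle", "24.99", "USD", "https://example.com/kettle"
    )
    print(to_script_tag(item))
```

Markup like this gives an agent machine-readable fields for name, price, and currency, so it does not have to infer them from the page layout.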

Impact and Future Outlook

  • ChatGPT Agent is positioned to change online productivity for tasks like trip planning, business analysis, and spreadsheet editing.
  • Structured website content is increasingly important for AI agent efficiency.
  • AI is already enabling new income streams and workflows for regular users.