Automating Repository-Level Coding with CodePlan

Apr 25, 2025

CodePlan: Repository-Level Coding Using LLMs and Planning

Overview

  • Authors: Ramakrishna Bairi, Atharv Sonwane, Aditya Kanade, Vageesh D C, Arun Iyer, Suresh Parthasarathy, Sriram Rajamani, B. Ashok, Shashank Shet
  • Institution: Microsoft Research, India
  • Topic: Automates repository-level coding tasks using Large Language Models (LLMs) and planning.
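  • Challenge: Repository-level tasks require edits spread across a codebase. An LLM cannot make them in a single prompt because the code is inter-dependent and the whole repository may be too large to fit in the prompt.
  • Solution: CodePlan, a task-agnostic framework that synthesizes a multi-step chain of edits (a plan). Each step calls an LLM on one code location, with context drawn from the repository, previous changes, and the task instructions.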

Key Features of CodePlan

  • Incremental Dependency Analysis: Tracks syntactic and semantic relations between code elements.
  • Change May-Impact Analysis: Identifies potential impacts of code changes on the rest of the repository.
  • Adaptive Planning Algorithm: Constructs a plan graph to guide edits based on dependency analysis.
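
The first two features can be illustrated with a small sketch. This is not CodePlan's implementation: the class `DependencyGraph`, the relation labels (`calls`, `overrides`), and the block names are hypothetical, and the rule "every dependent of a changed block may be impacted" is a deliberate simplification.

```python
from collections import defaultdict, deque


class DependencyGraph:
    """Toy dependency graph: nodes are code blocks, edges are labeled relations."""

    def __init__(self):
        # dependents[target] holds (source, relation) pairs, meaning
        # "source depends on target via relation".
        self.dependents = defaultdict(set)

    def add_dependency(self, source, target, relation):
        self.dependents[target].add((source, relation))

    def may_impact(self, changed, transitive=False):
        """Return the blocks that may need an edit after `changed` is modified."""
        impacted, queue = set(), deque([changed])
        while queue:
            block = queue.popleft()
            for source, _relation in self.dependents[block]:
                if source not in impacted and source != changed:
                    impacted.add(source)
                    if transitive:
                        queue.append(source)
        return impacted


graph = DependencyGraph()
graph.add_dependency("service.run", "utils.parse", "calls")
graph.add_dependency("cli.main", "service.run", "calls")
graph.add_dependency("FastParser.parse", "utils.parse", "overrides")

direct = graph.may_impact("utils.parse")
everything = graph.may_impact("utils.parse", transitive=True)
```

Here a change to `utils.parse` flags its caller and its overrider, and the transitive variant also reaches `cli.main`. Whether a flagged block really needs an edit is a separate decision, which is why the analysis is called "may-impact".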

Evaluation

  • Tasks Evaluated: Package migration (C#) and temporal code edits (Python).
  • Results: CodePlan got more repositories to pass their validity checks than the baselines did.

Problem Formulation

  • Framed repository-level coding as a planning problem.
  • LLM-driven Repository-level Coding Task: Given a repository and a set of seed edit specifications, reach a state the oracle accepts as valid by applying the seed edits together with the further edit specifications derived from them.

Proposed Solution

  • CodePlan Framework:
    • Constructs a plan graph.
    • Uses dependency analysis to identify areas needing changes.
    • Adapts the plan as code changes occur.
    • Validates repository state with an oracle after each plan execution.
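
The loop described above can be sketched as follows. It is a minimal illustration under stated assumptions, not the paper's algorithm: `run_plan`, `apply_edit`, and `oracle` are hypothetical names, `apply_edit` stands in for an LLM call, the plan is a simple work queue in place of a plan graph, and every dependent of a changed block is treated as a new obligation.

```python
from collections import deque


def run_plan(seed_edits, dependents, apply_edit, oracle, max_rounds=5):
    """Process edit obligations until the oracle reports no errors.

    seed_edits: block names taken from the seed specification.
    dependents: dict mapping a block to the blocks that depend on it.
    apply_edit: callable(block) -> bool, True if the block was changed.
    oracle:     callable() -> list of blocks that still have errors.
    """
    history = []                  # order in which blocks were edited
    pending = deque(seed_edits)
    for _ in range(max_rounds):
        done = set()
        while pending:
            block = pending.popleft()
            if block in done:
                continue
            done.add(block)
            if apply_edit(block):
                history.append(block)
                # Adapt the plan: a change may create new obligations.
                pending.extend(d for d in dependents.get(block, ()) if d not in done)
        errors = oracle()
        if not errors:
            return history, True
        pending.extend(errors)    # remaining errors seed the next round
    return history, False


# Toy repository: each block is either "old" or "new".
state = {"utils.parse": "old", "service.run": "old", "cli.main": "old", "docs.build": "old"}
dependents = {"utils.parse": ["service.run"], "service.run": ["cli.main"]}


def apply_edit(block):
    changed = state[block] == "old"
    state[block] = "new"
    return changed


def oracle():
    # A block is broken if it is stale while something it depends on was updated.
    return [d for b, ds in dependents.items() for d in ds
            if state[b] == "new" and state[d] == "old"]


history, valid = run_plan(["utils.parse"], dependents, apply_edit, oracle)
```

In this toy run the seed edit to `utils.parse` propagates to `service.run` and `cli.main`, while the unrelated `docs.build` is never touched.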

Experimental Design

  • Datasets Used: Internal and external repositories.
  • Oracles: C# build tools for migration tasks; Pyright for temporal edits.
  • Baselines: Oracle-Guided Repair (reactive approach based on errors flagged by oracles).
  • Metrics:
    • Block Metrics: Matched, Missed, and Spurious Blocks.
    • Edit Metrics: Levenshtein Distance and Diff BLEU.
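
A short sketch may make the metrics concrete. It assumes the natural reading of the block metrics (matched = edited and in the target, missed = in the target but not edited, spurious = edited but not in the target). The function names are hypothetical, and Diff BLEU is left out because it needs a BLEU implementation.

```python
def block_metrics(edited_blocks, target_blocks):
    """Compare the set of blocks a tool edited with the ground-truth set."""
    edited, target = set(edited_blocks), set(target_blocks)
    return {
        "matched": len(edited & target),
        "missed": len(target - edited),
        "spurious": len(edited - target),
    }


def levenshtein(a, b):
    """Edit distance between sequences a and b (each operation costs 1)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


metrics = block_metrics(["a.f", "b.g", "c.h"], ["a.f", "b.g", "d.k"])
distance = levenshtein("kitten", "sitting")
```

For the sample inputs, `metrics` is `{"matched": 2, "missed": 1, "spurious": 1}` and `distance` is 3.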

Results and Analysis

  • RQ1: CodePlan effectively localizes and makes required changes for repository-level tasks, outperforming Oracle-Guided Repair.
  • RQ2: Temporal context (earlier related changes) and spatial context (related code elsewhere in the repository) are both crucial for CodePlan's performance.
  • RQ3: Key differentiators for CodePlan include its strategic planning, context awareness, and change-may-impact analysis.

Implementation Details

  • Dependency Graph Construction: Tree-sitter for C# and Jedi for Python.
  • Integration with GPT-4: Provides temporal and spatial context for code edits.
  • Language Extensibility: Framework extensible to other programming languages.
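
To show the kind of relation a dependency graph records, here is a minimal sketch that extracts "calls" edges from Python source. It uses the standard-library `ast` module as a stand-in for Tree-sitter or Jedi, handles only direct calls between top-level functions, and the name `call_dependencies` is hypothetical.

```python
import ast


def call_dependencies(source):
    """Map each top-level function to the top-level functions it calls."""
    tree = ast.parse(source)
    functions = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    graph = {}
    for name, node in functions.items():
        callees = set()
        for child in ast.walk(node):
            # Only plain-name calls such as parse(x); method calls are skipped.
            if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                if child.func.id in functions:
                    callees.add(child.func.id)
        graph[name] = callees
    return graph


SOURCE = '''
def parse(text):
    return text.split()

def run(text):
    return len(parse(text))

def main():
    print(run("a b c"))
'''

deps = call_dependencies(SOURCE)
```

A real analysis would also need imports, inheritance, field uses, and cross-file references, which is where a dedicated parser or static-analysis library comes in.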

Limitations and Future Work

  • Current Limitations:
    • Static analysis limitations in dynamically typed languages.
    • Handling dynamic dependencies remains a challenge.
  • Future Directions:
    • Expand to more programming languages and artifacts.
    • Improve change may-impact analysis with machine learning.
    • Address dynamic dependencies in software systems.

Conclusion

  • CodePlan presents a promising approach to automating complex coding tasks at the repository level, with potential for significant productivity gains and increased code accuracy.
  • Future work includes expanding its applicability and refining its analysis capabilities.