CodePlan: Repository-Level Coding Using LLMs and Planning
Overview
Authors: Ramakrishna Bairi, Atharv Sonwane, Aditya Kanade, Vageesh D C, Arun Iyer, Suresh Parthasarathy, Sriram Rajamani, B. Ashok, Shashank Shet
Institution: Microsoft Research, India
Topic: Automates repository-level coding tasks using Large Language Models (LLMs) and planning.
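Challenge: Repository-level tasks require pervasive edits across a codebase. LLMs cannot solve them directly because code within a repository is inter-dependent and the whole repository may be too large to fit into a prompt.
Solution: CodePlan, a task-agnostic framework that synthesizes a multi-step chain of edits (a plan), where each step is an LLM call on one code location with context drawn from the repository, earlier edits, and the task instructions.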
Key Features of CodePlan
Incremental Dependency Analysis: Tracks syntactic and semantic relations between code elements.
Change May-Impact Analysis: Identifies potential impacts of code changes on the rest of the repository.
Adaptive Planning Algorithm: Constructs a plan graph to guide edits based on dependency analysis.
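A minimal sketch of how dependency tracking and may-impact analysis fit together, assuming a simple in-memory graph. The class name `DependencyGraph`, the relation label "calls", and the block names are illustrative assumptions, not taken from the paper. CodePlan's actual analysis distinguishes more relation and change types.

```python
from collections import defaultdict, deque


class DependencyGraph:
    """Directed graph where an edge (src, relation, dst) means src depends on dst."""

    def __init__(self):
        # Reverse index: for each block, who depends on it and how.
        self.dependents = defaultdict(set)

    def add_edge(self, src, relation, dst):
        self.dependents[dst].add((relation, src))

    def may_impact(self, changed, max_depth=1):
        """Return (block, relation) pairs that may need an edit after `changed` is edited.

        This is a conservative over-approximation: every dependent within
        `max_depth` hops is reported, whether or not it really needs a change.
        """
        seen = {changed}
        frontier = deque([(changed, 0)])
        impacted = []
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_depth:
                continue
            for relation, src in sorted(self.dependents[node]):
                if src not in seen:
                    seen.add(src)
                    impacted.append((src, relation))
                    frontier.append((src, depth + 1))
        return impacted


graph = DependencyGraph()
graph.add_edge("app.main", "calls", "util.parse")
graph.add_edge("cli.run", "calls", "util.parse")
graph.add_edge("tests.test_cli", "calls", "cli.run")

# Direct dependents of util.parse are the candidates for follow-up edits.
print(graph.may_impact("util.parse"))
```

Keeping a reverse index (block to dependents) makes the impact query cheap, and the graph can be updated incrementally by adding or removing edges for only the block that was just edited.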
Evaluation
Tasks Evaluated: Package migration (C#) and temporal code edits (Python).
Results: CodePlan outperformed baselines in getting repositories to pass validity checks.
Problem Formulation
Framed repository-level coding as a planning problem.
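LLM-driven Repository-level Coding Task: Given a repository, a set of seed edit specifications, and an oracle that judges validity, reach a repository state the oracle accepts by executing a chain of edits drawn from the seed specifications and the edit specifications derived from them.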
Proposed Solution
CodePlan Framework:
Constructs a plan graph.
Uses dependency analysis to identify areas needing changes.
Adapts the plan as code changes occur.
Validates repository state with an oracle after each plan execution.
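The four steps above can be sketched as a single loop. This is a simplified illustration under stated assumptions, not the paper's algorithm: the repository is a dict from block name to source text, `edit_fn` stands in for the LLM call, `impact_fn` stands in for the dependency and may-impact analysis, and the function name `run_plan` is hypothetical.

```python
from collections import deque


def run_plan(repo, seeds, edit_fn, impact_fn, oracle, max_rounds=5):
    """Apply seed edits, follow their impact, and re-plan from oracle errors.

    repo      : dict mapping block name -> source text (mutated in place)
    seeds     : list of (block, instruction) pairs
    edit_fn   : (source, instruction) -> new source  (placeholder for the LLM)
    impact_fn : block -> iterable of blocks that may be affected
    oracle    : repo -> list of (block, instruction) errors, empty if valid
    """
    plan_log = []  # (block, cause) pairs: the executed plan with its cause links
    for _ in range(max_rounds):
        pending = deque((block, instr, None) for block, instr in seeds)
        while pending:
            block, instruction, cause = pending.popleft()
            old = repo[block]
            new = edit_fn(old, instruction)
            plan_log.append((block, cause))
            if new == old:
                continue  # nothing changed, so nothing downstream is affected
            repo[block] = new
            # Adapt the plan: the edit just made creates new obligations.
            for dependent in impact_fn(block):
                pending.append((dependent, f"Update for change in {block}", block))
        errors = oracle(repo)
        if not errors:
            return True, plan_log
        seeds = errors  # oracle errors become the next round's seeds
    return False, plan_log


repo = {
    "util.parse": "def parse(s): return s",
    "app.main": "parse(x)",
    "cli.run": "parse(y)",
}
deps = {"util.parse": ["app.main", "cli.run"]}

ok, log = run_plan(
    repo,
    seeds=[("util.parse", "Rename parse to parse_v2")],
    edit_fn=lambda src, _instr: src.replace("parse(", "parse_v2("),
    impact_fn=lambda block: deps.get(block, []),
    oracle=lambda r: [(b, "unresolved name") for b, s in r.items() if "parse(" in s],
)
print(ok, log)
```

The point of the design is that follow-up edits are scheduled proactively from the impact analysis, so the oracle acts as a final check and a fallback source of seeds instead of the main driver.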
Experimental Design
Datasets Used: Internal and external repositories.
Oracles: C# build tools for migration tasks; Pyright for temporal edits.
Baselines: Oracle-Guided Repair (reactive approach based on errors flagged by oracles).
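For contrast, a reactive repair loop can be sketched as follows. This is an illustration of the general idea only, with a dict-based repository and a placeholder `edit_fn` in place of the LLM. The function name and signature are assumptions, not the baseline's actual implementation.

```python
def oracle_guided_repair(repo, seeds, edit_fn, oracle, max_rounds=5):
    """Apply the seed edits, then fix only what the oracle flags.

    Returns (success, number of repair rounds used).
    """
    for block, instruction in seeds:
        repo[block] = edit_fn(repo[block], instruction)
    for rounds in range(max_rounds):
        errors = oracle(repo)
        if not errors:
            return True, rounds
        # No dependency analysis: each flagged location is edited using
        # only the error message the oracle reported for it.
        for block, message in errors:
            repo[block] = edit_fn(repo[block], message)
    return False, max_rounds


repo = {
    "util.parse": "def parse(s): return s",
    "app.main": "parse(x)",
    "cli.run": "parse(y)",
}
result = oracle_guided_repair(
    repo,
    seeds=[("util.parse", "Rename parse to parse_v2")],
    edit_fn=lambda src, _msg: src.replace("parse(", "parse_v2("),
    oracle=lambda r: [(b, "unresolved name") for b, s in r.items() if "parse(" in s],
)
print(result)
```

Such a loop can only react to problems the oracle is able to detect, and it gives the editing step no information about why a location needs to change.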
Metrics:
Block Metrics: Matched, Missed, and Spurious Blocks.
Edit Metrics: Levenshtein Distance and Diff BLEU.
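A small sketch of how the block metrics and Levenshtein distance could be computed, assuming blocks are identified by name and compared as sets against a reference (ground-truth) edit set. The function names are illustrative. Diff BLEU is omitted because it needs a BLEU implementation and a tokenization choice the summary does not specify.

```python
def block_metrics(edited, reference):
    """Compare the set of blocks a tool edited with the reference set."""
    edited, reference = set(edited), set(reference)
    return {
        "matched": len(edited & reference),   # edited and should have been
        "missed": len(reference - edited),    # should have been edited, was not
        "spurious": len(edited - reference),  # edited, should not have been
    }


def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of tokens/lines)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


print(block_metrics({"a", "b", "c"}, {"b", "c", "d"}))
print(levenshtein("kitten", "sitting"))
```

Lower is better for missed blocks, spurious blocks, and Levenshtein distance. Higher is better for matched blocks.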
Results and Analysis
RQ1: CodePlan effectively localizes and makes required changes for repository-level tasks, outperforming Oracle-Guided Repair.
RQ2: Temporal and spatial contexts are crucial for CodePlan's performance.
RQ3: Key differentiators for CodePlan include its strategic planning, context awareness, and change-may-impact analysis.
Implementation Details
Dependency Graph Construction: Tree-sitter for C# and Jedi for Python.
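To illustrate what extracting dependency edges from source looks like, here is a sketch that uses Python's standard `ast` module as a stand-in. It is not how CodePlan uses Tree-sitter or Jedi. It only finds direct calls by simple name between top-level functions of a single module, and the function name `call_edges` is hypothetical.

```python
import ast


def call_edges(source):
    """Return sorted (caller, callee) pairs between top-level functions."""
    tree = ast.parse(source)
    functions = {
        node.name: node for node in tree.body if isinstance(node, ast.FunctionDef)
    }
    edges = set()
    for name, node in functions.items():
        for sub in ast.walk(node):
            # Only plain-name calls such as parse(...); attribute calls
            # like obj.method(...) would need type or scope resolution.
            if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                if sub.func.id in functions:
                    edges.add((name, sub.func.id))
    return sorted(edges)


SOURCE = '''
def parse(s):
    return s.strip()

def load(path):
    return parse(open(path).read())

def main():
    print(load("input.txt"))
'''

print(call_edges(SOURCE))
```

A purely syntactic pass like this misses method calls, imports across files, and inheritance, which is why semantic tooling is needed for a full dependency graph.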
Integration with GPT-4: Each edit prompt supplies temporal context (earlier edits that led to this one) and spatial context (related code elsewhere in the repository).
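A sketch of how such a prompt might be assembled from labelled sections. The section titles, their order, and the function name `build_prompt` are assumptions for illustration and may differ from the template CodePlan actually uses.

```python
def build_prompt(instruction, temporal, causes, spatial, target):
    """Assemble one edit prompt from labelled context sections.

    temporal, causes and spatial are lists of strings. target is the
    source of the block to edit.
    """
    sections = [
        ("Task instructions", instruction),
        ("Earlier code changes (temporal context)", "\n".join(temporal)),
        ("Causes for change", "\n".join(causes)),
        ("Related code (spatial context)", "\n".join(spatial)),
        ("Code to be changed next", target),
    ]
    # Skip sections with no content so the prompt stays compact.
    return "\n\n".join(f"{title}:\n{body}" for title, body in sections if body)


prompt = build_prompt(
    instruction="Rename parse to parse_v2 and update all call sites.",
    temporal=["- def parse(s): return s", "+ def parse_v2(s): return s"],
    causes=["app.main calls util.parse, which was renamed"],
    spatial=["def parse_v2(s): return s"],
    target="parse(x)",
)
print(prompt)
```

Only the block to be edited and its selected context are sent, which is what lets the approach work on repositories too large to fit in a single prompt.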
Language Extensibility: Framework extensible to other programming languages.
Limitations and Future Work
Current Limitations:
Static analysis is less precise in dynamically typed languages, so some dependencies may go undetected.
Handling dynamic dependencies remains a challenge.
Future Directions:
Expand to more programming languages and artifacts.
Improve change may-impact analysis with machine learning.
Address dynamic dependencies in software systems.
Conclusion
CodePlan is a promising approach to automating complex repository-level coding tasks, with potential gains in developer productivity and in the correctness of the resulting edits.
Future work includes expanding its applicability and refining its analysis capabilities.