Improving Language Models through Interventions

Jul 10, 2024

Introduction

  • Language Models (LMs):

    • Used in various fields such as medicine, finance, science, and entertainment.
    • Can behave unpredictably and generate incorrect or harmful content.
    • User needs change due to new regulations, limited resources, outdated knowledge, or copyright issues.
    • Need for quick solutions to prevent LMs from becoming inaccurate, outdated, or biased.
  • Interventions:

    • Efficient updates for LMs after initial training.
    • Aim to address issues such as outdated information and changing user requirements.
    • Examples: model compression, knowledge editing, and unlearning.
    • Typically developed independently, posing challenges for effective combination.
  • Proposed Solution:

    • Interventions should be composable, meaning they can work together harmoniously.
    • Metrics introduced:
      • Order-free error: Assesses whether one intervention undermines the success of the others, irrespective of order.
      • Order sensitivity: Assesses whether the order in which interventions are applied changes the outcome.
  • Experiments:

    • Conducted using the LLaMA 3-8B model.
    • Results showed that model compression can hinder other interventions.
    • Emphasis on developing interventions with composability in mind.

Composable Interventions for Language Models

  • Intervention Composition:

    • Involves applying multiple interventions in a specific order to a model.
    • For simplicity, each intervention is assumed to target a single criterion (e.g., compress, edit, or unlearn).
  • Measuring Effectiveness:

    • Consider how one intervention affects the application of subsequent interventions.
    • Metrics introduced (formalized in the sketch after this list):
      • Order-free error: How well composed interventions succeed, regardless of application order.
      • Order sensitivity: How much the outcome changes when the application order is swapped.
  • Evaluation Method:

    • Apply one intervention and measure its impact, then apply a second and measure again.
    • Reverse the order of the two interventions and compare the results.
    • Helps identify direct interactions between interventions.
  • Experimental Setup:

    • Study of the impact of multiple sequential interventions on model performance.
    • Importance of order in applying interventions for effective results.
    • Detailed evaluation crucial for developing practical and composable interventions.
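
To make these metrics and the evaluation loop concrete, here is a minimal Python sketch. The helper names (composed_error, evaluate_error) and the exact formulas (mean of the two orders for order-free error, absolute gap between orders for order sensitivity) are illustrative assumptions, not the paper's precise definitions.

```python
import copy

def composed_error(model, first, second, evaluate_error):
    """Apply two interventions in the given order, then score the result.

    `first` and `second` are callables that take a model and return an
    intervened model (e.g., prune, quantize, edit, unlearn);
    `evaluate_error` maps a model to a scalar error on the relevant task.
    """
    intervened = second(first(copy.deepcopy(model)))
    return evaluate_error(intervened)

def composability_metrics(model, a, b, evaluate_error):
    """Score a pair of interventions under both application orders."""
    err_ab = composed_error(model, a, b, evaluate_error)  # a first, then b
    err_ba = composed_error(model, b, a, evaluate_error)  # b first, then a
    order_free_error = (err_ab + err_ba) / 2  # assumption: mean over both orders
    order_sensitivity = abs(err_ab - err_ba)  # assumption: gap between orders
    return order_free_error, order_sensitivity
```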

Implementation Details

  • Model & Metrics:

    • Experiments conducted with the LLaMA 3-8B model.
    • Focus on how interventions work individually and combined.
  • Findings:

    • Model compression impacts the success of knowledge editing and overall performance.
    • Order of applying interventions affects the outcome.
    • Emphasis on creating tailored methods for successful composition.
  • Utility Evaluations:

    • Overall utility evaluations may not accurately capture composability.
    • Detailed evaluations that treat composability as a design factor are needed; a sketch of such a multi-metric evaluation follows below.
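
As an illustration of what such a detailed evaluation could record, the sketch below scores both application orders on several metrics at once. The metric names and the harness itself are hypothetical placeholders, not the evaluation code used in the experiments.

```python
import copy

def composability_report(model, a, b, metrics):
    """Evaluate both application orders on several metrics at once.

    `metrics` maps metric names (e.g., "overall_utility", "edit_success",
    "unlearning_success") to callables that score a model. Breaking results
    out per metric and per order surfaces interactions that a single
    aggregate utility number would hide.
    """
    report = {}
    for order, (first, second) in {"a_then_b": (a, b), "b_then_a": (b, a)}.items():
        intervened = second(first(copy.deepcopy(model)))
        report[order] = {name: score(intervened) for name, score in metrics.items()}
    return report
```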

Composing Model Compression with Machine Unlearning

  • Interventions:

    • Composed three model compression methods with three machine unlearning methods.
    • Evaluated unlearning success on the WMDP (Weapons of Mass Destruction Proxy) benchmark.
  • Results:

    • Compressing a model can make subsequent unlearning more difficult.
    • Pruning before unlearning reduces unlearning performance, especially at higher sparsity (a minimal pruning sketch appears at the end of this section).
    • The optimal order varies across methods and intervention pairs.
    • Compression tends to hinder other interventions and makes targeted updates harder.
  • Metrics & Techniques:

      • GD and RMU outperform Gau in order-free error on WMDP.
    • RMU shows lower overall order sensitivity, while GD has higher order sensitivity for pruning compared to quantization.
  • Knowledge Editing and Unlearning:

    • High composability between some unlearning methods and knowledge editing.
    • RMU stands out due to low order-free error and order sensitivity.
    • Precise edits do not disrupt unlearning targets.
  • Summary:

    • Compression alters how the model stores knowledge, making targeted updates difficult.
    • RMU emerges as the most composable unlearning technique.
    • Need thorough evaluations using multiple metrics and datasets to assess intervention composability effectively.
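
For illustration, here is a minimal unstructured magnitude-pruning sketch in PyTorch. The compression methods actually composed in these experiments are more sophisticated, so treat this only as a stand-in showing how pruning zeroes low-magnitude weights, the kind of change that can disturb where knowledge is stored and make later unlearning or editing harder.

```python
import torch

@torch.no_grad()
def magnitude_prune_(model, sparsity=0.5):
    """Zero out the smallest-magnitude weights of every linear layer, in place.

    Higher `sparsity` removes more weights; per the results above, applying
    this before unlearning would be expected to hurt unlearning success,
    and more so at higher sparsity.
    """
    for layer in model.modules():
        if isinstance(layer, torch.nn.Linear):
            weights = layer.weight
            k = int(weights.numel() * sparsity)
            if k == 0:
                continue
            # The k-th smallest absolute value serves as the pruning threshold.
            threshold = weights.abs().flatten().kthvalue(k).values
            weights.mul_((weights.abs() > threshold).to(weights.dtype))
```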