Transcript for:
Improving Language Models through Interventions

Section: Introduction

In this section, we explore how language models (LMs) can be improved through interventions. LMs are powerful tools used across fields like medicine, finance, science, and entertainment. However, they can sometimes behave unpredictably, generating incorrect or harmful content. User needs also change over time due to new regulations, limited resources, outdated knowledge, or copyright issues. Without quick solutions, LMs may become inaccurate, outdated, or biased, limiting their responsible use.

Researchers have been developing efficient post-training updates for LMs, known as interventions, to address these issues. These interventions target specific properties of LMs after their initial training: for example, model compression makes LMs more memory efficient, while knowledge editing and machine unlearning are other evolving techniques. However, these interventions are often developed independently, making it challenging to combine them effectively.

We propose that interventions should be composable, meaning they can work together without interfering with each other. For instance, fixing a factual error in an LM should not be undone by a later optimization like model compression. We introduce two metrics to measure composability: order-free error assesses how well interventions succeed regardless of the order in which they are applied, while order sensitivity checks whether the order of applying interventions affects their combined success.

To evaluate composability, we conducted experiments using popular interventions on a specific LM, Llama 3 8B. Our results revealed that model compression can hinder the success of other interventions, emphasizing the need for new approaches designed for composability. The order in which interventions are applied also significantly influences their effectiveness, highlighting the importance of developing interventions explicitly for composability. We also found that relying solely on overall model performance after interventions is not enough to assess composability accurately; therefore, focusing on composability as a metric can drive the creation of more effective interventions. Our study underscores the necessity of expanding evaluation methods for interventions and of designing new interventions with composability in mind.

Our contributions include introducing the concept of composability for LM interventions, paving the way for addressing practical challenges in updating LMs online. By conducting extensive experiments, we identified previously unknown interactions between different interventions, emphasizing the importance of developing interventions that prioritize composability. We have also provided a codebase that integrates various cutting-edge interventions, enabling others to create new interventions with multiple objectives.

Section Summary

In this section, we explored the challenges faced by language models (LMs) despite their impressive capabilities, such as generating inaccurate or harmful content. To address issues like outdated knowledge and changing user requirements, efficient interventions are being developed to update LMs after pre-training. Our research focuses on the composability of these interventions, highlighting the need for methods that can be applied sequentially without interfering with each other, ultimately driving the development of new, more practical interventions.

Section: Composable Interventions for Language Models

In this section, we consider various interventions for language models; we have a total of 28 possible interventions that we can apply. Intervention composition involves applying multiple interventions to a model in a specific order: for instance, we can first apply one intervention to a model and then apply another intervention to the modified model. We can represent this composition as a sequence of interventions. Most interventions are designed around specific criteria; to simplify, we assume each intervention is based on one criterion, although in reality it may involve multiple criteria. We suggest that the
effectiveness of interventions should also account for how they affect the application of other interventions. To address this, we introduce two metrics to measure composability. These metrics can be calculated for a single criterion and across different hyperparameter settings to understand how interventions interact.

Our evaluation method is simple: we apply an intervention and measure its impact, then add another intervention and measure the new impact. The difference between these measurements shows how the second intervention affects the success of the first, according to the chosen criterion. To separate overall performance effects from direct interactions, we reverse the order of the interventions and compare the results. By combining absolute changes across the different orderings, we can identify direct interactions between interventions. This approach helps in selecting interventions to combine and highlights potential issues with existing methods.

We introduce two concrete metrics for composability. The first, order-free error, evaluates how well interventions perform regardless of the order in which they are applied. The second, order sensitivity, assesses whether the performance of interventions is consistent regardless of their order of application. These metrics provide initial insights into intervention composability and can be simplified to a single error score when hyperparameters are fixed. By exploring these metrics across various hyperparameters, we gain a better understanding of how interventions interact; this broader analysis helps in visualizing and comprehending the composability of interventions.

Our experimental setup focuses on studying the impact of applying multiple interventions sequentially on a model's performance, an understanding that is crucial for practical applications of interventions at test time. We will now detail the 10 techniques selected from three intervention categories (model editing, unlearning, and compression), along with their respective metrics.

Section Summary

In this section, we explored the concept of composable interventions for language models by considering various interventions and their application sequences. We proposed metrics to measure the impact of interventions on each other: order-free error, which assesses performance regardless of order, and order sensitivity, which evaluates performance invariance to order. These metrics help in understanding intervention composability and provide insights into how interventions interact when applied sequentially.

Section: Implementation Details

In this section, we conducted experiments using the Llama 3 8B model, which is known for its efficiency. We tested various editing methods over multiple batches of random edits; the details of our implementations and hyperparameters can be found in the appendix. We first analyzed how each intervention works on its own and then looked at overall trends. We used our metrics to measure how well interventions can be combined and to assess the impact of compression at different levels. All intervention pairs were tested in both directions, and to compare methods we counted how many times each method outperformed the others.

When combining model compression with knowledge editing, we observed that as compression increases, editing performance decreases, with different editing methods showing varying degrees of degradation at increasing compression levels. Interestingly, we found that model compression can restrict a model's ability to be edited effectively. The order in which interventions are applied is also crucial, as it can significantly affect editing performance, and the best order of interventions varies depending on the editing method used. Composability can differ even within the same intervention category, highlighting the importance of tailored methods for successful composition. Overall utility evaluations may not accurately capture
composability: methods that perform similarly on overall utility metrics can exhibit significant differences in composability. This emphasizes the need for more detailed evaluations focused specifically on composability as a design factor.

Section Summary

In this section, we explored the interaction between model compression and knowledge editing using the Llama 3 8B model. We found that as compression increases, editing performance deteriorates, with different editing methods showing varying levels of degradation. The order in which interventions are applied significantly impacts editing performance, highlighting the need for tailored, composable methods to achieve successful composition.

Section: Composing Model Compression with Machine Unlearning

In this section, we composed three model compression and three machine unlearning interventions, evaluating the success of unlearning on WMDP. We found that compressing a model can make unlearning more difficult: unlearning a compressed model can lead to decreased performance. When using RMU, pruning the model before unlearning can notably reduce unlearning performance, especially at higher sparsity levels; a similar trend was observed with quantization when combining RMU and GPTQ. With GD, we discovered that pruning after unlearning works best, although quantizing first can also be effective. Likewise, with GA, pruning after unlearning was optimal, despite GA's generally poor unlearning performance. Overall, unlearning a compressed model is more challenging, and the impact of unlearning after compression depends on the specific composition and compression level.

Order sensitivity plays a crucial role in determining overall composability. Our analysis of the composability metrics for each unlearning technique revealed that GD and RMU outperform GA in terms of order-free error on WMDP. GD shows higher order sensitivity for pruning compared to quantization, while RMU exhibits low order sensitivity across all techniques except when quantized by GPTQ. Although both GD and RMU are successful at unlearning, RMU has lower overall order sensitivity than GD, indicating that while task performance is important, it does not fully capture the practical challenges of interventions; RMU is more resilient to order variations.

We then explored the composability between knowledge editing and machine unlearning. Both interventions aim to modify a model's knowledge without compromising its overall utility, and pairing editing with unlearning is practical for keeping language models factually updated while removing undesirable knowledge. Some unlearning methods show high composability with editing, as indicated by the order-free error and order sensitivity metrics. RMU stands out with lower overall order-free error and order sensitivity than the other unlearning techniques across all editors. GD performs similarly to RMU but with higher order sensitivity, while GA's tendency toward catastrophic forgetting makes it unsuitable for composition. Editing does not disrupt the model's ability to forget the unlearning target, possibly due to its precise modifications and its targeting of semantically distinct knowledge.

In summary, we observed that compression tends to hinder other interventions: compressing an edited model can reduce editing performance, and unlearning a compressed model is often more challenging than unlearning the original model. These trends suggest that compression may alter how models store knowledge, making targeted updates harder. RMU emerges as the most composable unlearning technique, with strong performance and order invariance; RMU's training objective directs activations related to the unlearning target in a random direction while minimally affecting unrelated activations, leading to effective interventions. MMLU alone is insufficient for measuring composability, as it may mask significant differences in intervention performance. We recommend thorough evaluations using multiple metrics and datasets for assessing intervention composability.
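The evaluation procedure described above (apply two interventions in both orders, score each composed model, then compare) can be sketched in code. The following is a minimal illustration, not the paper's implementation: the toy model, the intervention functions, and the error function are hypothetical placeholders, and one plausible formalization of the two metrics is assumed (average error across both orders for order-free error, absolute gap between orders for order sensitivity).

```python
def order_free_error(err_ab: float, err_ba: float) -> float:
    """Average error across both application orders (lower is better).

    A plausible formalization of the metric described in the transcript;
    the exact definition may differ in the original paper.
    """
    return (err_ab + err_ba) / 2


def order_sensitivity(err_ab: float, err_ba: float) -> float:
    """Absolute error gap between the two orders (0 = order does not matter)."""
    return abs(err_ab - err_ba)


def evaluate_composition(model, intervene_a, intervene_b, error_fn):
    """Apply A then B, and B then A, scoring each composed model with error_fn."""
    err_ab = error_fn(intervene_b(intervene_a(model)))  # A first, then B
    err_ba = error_fn(intervene_a(intervene_b(model)))  # B first, then A
    return order_free_error(err_ab, err_ba), order_sensitivity(err_ab, err_ba)


if __name__ == "__main__":
    # Toy stand-ins: a dict "model", an edit that overwrites a fact, a
    # "compression" that rounds a weight, and an error of 1.0 if the edit
    # did not survive. All purely illustrative.
    base = {"fact": "old", "weight": 0.123}
    edit = lambda m: {**m, "fact": "new"}
    compress = lambda m: {**m, "weight": round(m["weight"], 1)}
    edit_error = lambda m: 0.0 if m["fact"] == "new" else 1.0

    ofe, osens = evaluate_composition(base, edit, compress, edit_error)
    print(ofe, osens)  # in this toy case the edit survives both orders
```

In a real study, `error_fn` would run a benchmark (e.g., edit success rate or a WMDP/MMLU score converted to an error), and the two metrics would be swept across compression levels and other hyperparameters to map out interactions, as the experiments above describe.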