Axolotl and Fine-Tuning Lecture Notes

Jul 27, 2024


Introduction

  • Agenda:
    • Overview of Axolotl
    • Honeycomb example
    • Q&A with Wing
    • Zach's discussion on parallelism and Hugging Face Accelerate
    • Fine-tuning on Modal
    • Final Q&A

Fine-Tuning Basics

  • Model Capacity Questions:
    • What model to fine-tune?
    • Should I use LoRA or a full fine-tune?

Choosing the Base Model

  • Model Size:
    • Options: 7B, 13B, 70B parameter models.
    • 7B is often preferred due to:
      • Faster training times
      • Easier to find compatible GPUs
      • The most widely used size in the community
  • Model Family:
    • Examples: Llama 2, Llama 3, Mistral, Zephyr, Gemma.
    • Let trending models on hubs like Hugging Face guide your selection.

LoRA vs. Full Fine-Tuning

  • LoRA (Low-Rank Adaptation):
    • Instead of updating every weight, learns low-rank updates that are added to selected layers of the frozen base model.
    • For a transformer layer whose inputs and outputs are on the order of 4,000 dimensions, the weight update is factored into two much smaller matrices, sharply reducing the trainable parameter count (a code sketch follows this list).
    • The majority of fine-tuning in practice uses either LoRA or QLoRA.
    • Recommended starting point for most users.
  • QLoRA (Quantized LoRA):
    • Stores the frozen base-model weights in fewer bits (typically 4-bit) while training LoRA adapters, which substantially reduces memory requirements.
    • The impact on quality is less noticeable than you might expect.
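
The low-rank idea is easiest to see in code. Below is a minimal sketch in plain PyTorch, assuming a roughly 4096-dimensional layer of a 7B-class model; the class, rank, and alpha values are illustrative, not Axolotl's actual implementation (Axolotl delegates LoRA to the PEFT library).

```python
# A toy LoRA layer in plain PyTorch (illustrative, not Axolotl's internals).
# The full d_out x d_in weight stays frozen; only the small A and B matrices train.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pretrained weights
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r                # the lora_alpha / lora_r scaling

    def forward(self, x):
        # frozen full-rank path + scaled low-rank update
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# For a 4096-dim layer, r=8 trains 2 * 4096 * 8 ≈ 65K parameters
# instead of 4096 * 4096 ≈ 16.8M for a full-rank update.
layer = LoRALinear(nn.Linear(4096, 4096), r=8, alpha=16)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 65536
```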

Axolotl Usage

  • Getting Started:

    • Start with example YAML configurations for simplicity.
    • Important flags include:
      • lora_r: rank (size) of the LoRA matrices.
      • lora_alpha: scaling factor applied to the LoRA update (see the config sketch after this section's bullets).
  • Pre-processing Data:

    • Run Axolotl's preprocessing step to put your data into the format the trainer expects.
    • Review the preprocessed output for correct formatting, especially checking that tokenization looks right.
  • Training Process:

    • Launch training jobs with accelerate launch.
    • Pass the options the run needs, such as logging to Weights & Biases.
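
For reference, Axolotl's lora_r and lora_alpha settings map onto the same knobs in Hugging Face PEFT, which Axolotl uses under the hood. The sketch below is a hedged illustration of what those two values control; the model name, target modules, and numbers are placeholders, not a recommended configuration.

```python
# Illustration of what lora_r / lora_alpha control, expressed with Hugging Face
# PEFT directly (Axolotl configures this for you from the YAML). Model name,
# target modules, and values are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

config = LoraConfig(
    r=16,                                  # lora_r: rank of the adapter matrices
    lora_alpha=32,                         # lora_alpha: scaling applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # which projections receive adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()         # a small fraction of the 7B base weights
```

Training itself is then kicked off with accelerate launch pointed at the YAML config, as noted above.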

Honeycomb Case Study

  • Overview of Honeycomb:

    • Observability platform that wants to make its query language easier for users.
    • Uses an LLM to interpret natural-language requests and convert them into queries in Honeycomb's domain-specific query language.
  • Evaluation Strategy:

    • Level 1 evals: cheap, assertion-style unit tests adapted to validate generated queries (a sketch follows this list).
    • Level 2 evals: more involved evaluations that incorporate human feedback.
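
To make Level 1 concrete, here is a hypothetical sketch of assertion-style checks over a generated query; the schema, allowed operations, and function name are invented for illustration and are not Honeycomb's actual validation code.

```python
# Hypothetical Level-1 ("unit test" style) checks on a generated query:
# cheap assertions that run on every output, with no human in the loop.
import json

ALLOWED_OPS = {"COUNT", "AVG", "MAX", "P95"}  # illustrative, not Honeycomb's real spec

def level1_checks(raw_output: str, valid_columns: set[str]) -> list[str]:
    """Return a list of failure reasons; an empty list means the query passes."""
    failures = []
    try:
        query = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    for calc in query.get("calculations", []):
        if calc.get("op") not in ALLOWED_OPS:
            failures.append(f"unknown op: {calc.get('op')}")
        col = calc.get("column")
        if col is not None and col not in valid_columns:
            failures.append(f"unknown column: {col}")
    return failures

print(level1_checks('{"calculations": [{"op": "COUNT"}]}', {"duration_ms"}))  # []
```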

Modal Overview

  • What is Modal?

    • A serverless cloud platform that keeps code development fast and iterative while running jobs remotely.
    • Supports parallel execution, which is useful for hyperparameter tuning.
  • Training with Modal:

    • Axolotl is loaded through Modal-configured scripts that handle the data flags and required dependencies for you.
  • Key Commands for Modal:

    • modal run: the command that kicks off training with the required configuration (see the sketch after this list).
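
A minimal sketch of the Modal pattern, assuming Modal's Python SDK as of mid-2024 (modal.App, the function decorator, and .map for parallel fan-out); the image contents, GPU choice, and training body are placeholders rather than the course's actual Modal scripts.

```python
# Minimal sketch of the Modal pattern: the environment and GPU are declared in
# code, and the same function can be fanned out over several configs in parallel
# (e.g., a LoRA rank sweep). Names, dependencies, and the body are illustrative;
# this is not the official Axolotl-on-Modal repo.
import modal

app = modal.App("finetune-sketch")
image = modal.Image.debian_slim().pip_install("torch", "transformers", "peft")

@app.function(image=image, gpu="A100", timeout=60 * 60)
def train(overrides: dict) -> str:
    # A real job would invoke Axolotl's trainer with a YAML config plus these
    # overrides; here we only echo them to show the execution pattern.
    return f"trained with {overrides}"

@app.local_entrypoint()
def main():
    # `modal run this_file.py` runs main() locally and train() in the cloud;
    # .map() launches one container per config in parallel.
    sweep = [{"lora_r": 8}, {"lora_r": 16}, {"lora_r": 32}]
    for result in train.map(sweep):
        print(result)
```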

Final Q&A and Takeaways

  • Deterministic Results:

    • Achievable by controlling decoding at inference time (e.g., greedy decoding and fixed seeds) rather than anything done during training (see the decoding sketch at the end of these notes).
  • Custom Evaluations During Training:

    • Can potentially be wired in through Axolotl's evaluation flags and its integrated logging capabilities.
  • Using Smaller Models:

    • 7B models typically considered sufficient for most tasks; smaller sizes may lose reasoning quality.
  • Webinars and Resources:

    • Recorded YouTube presentations and community discussions are available after the session for deeper dives.
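
On the determinism point, a small sketch with Hugging Face transformers, assuming greedy decoding and a fixed seed; the model name and prompt are placeholders. With sampling disabled, repeated runs on the same hardware should produce the same output.

```python
# Sketch: determinism comes from the decoding side. Greedy decoding (no sampling)
# plus a fixed seed makes repeated generations reproducible. Model name and
# prompt are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

set_seed(42)  # fixes Python / NumPy / torch RNG state
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16
)

inputs = tok("Translate this request into a query:", return_tensors="pt")
out = model.generate(
    **inputs,
    do_sample=False,   # greedy decoding: no temperature / top-p randomness
    max_new_tokens=64,
)
print(tok.decode(out[0], skip_special_tokens=True))
```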