# Fine-Tuning Llama 3.2 3B on Windows with QLoRA: Complete Guide
I'll walk you through setting up a Windows machine with an RTX 3060 GPU for fine-tuning Llama 3.2 3B using QLoRA with 8-bit quantization on your Alpaca-format dataset.
## 1. Windows Setup with WSL
First, you'll need to set up Windows Subsystem for Linux (WSL), since it provides a more compatible environment for ML tools:
1. **Install WSL with Ubuntu**:
   - Open PowerShell as administrator and run:
     ```powershell
     wsl --install -d Ubuntu
     ```
   - Set up a username and password when prompted.
2. **Configure GPU Access in WSL**:
   - Ensure you have the latest NVIDIA drivers for your 3060 installed on Windows.
   - In WSL, verify GPU access with:
     ```bash
     nvidia-smi
     ```
   - If the command doesn't work, you may need to install the CUDA on WSL driver from NVIDIA: https://developer.nvidia.com/cuda/wsl
3. **Install Essential Development Tools in WSL**:
   ```bash
   sudo apt-get update
   sudo apt-get install -y build-essential python3-dev git wget
   ```
## 2. Python Environment Setup
1. **Install Miniconda in WSL**:
   ```bash
   wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
   bash Miniconda3-latest-Linux-x86_64.sh
   ```
   - Follow the prompts to complete the installation.
   - Restart your terminal or run `source ~/.bashrc`.
2. **Create and Activate a Conda Environment**:
   ```bash
   conda create -n llama_ft python=3.10
   conda activate llama_ft
   ```
3. **Install PyTorch with CUDA Support** (a quick verification follows this list):
   ```bash
   conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
   ```
4. **Install Fine-Tuning Dependencies**:
   ```bash
   pip install transformers datasets accelerate peft trl bitsandbytes xformers
   pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
   ```
## 3. Fine-Tuning Script with QLoRA 8-bit
Now let's put together a Python script for fine-tuning Llama 3.2 3B with QLoRA in 8-bit.
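Below is a minimal sketch of `fine_tune_llama32_3b.py` built on transformers, peft, datasets, and bitsandbytes (it doesn't use Unsloth's wrappers, though you can swap those in for extra speed). The model ID `meta-llama/Llama-3.2-3B-Instruct`, the LoRA hyperparameters, and the learning rate are reasonable assumptions for a 12GB card, not fixed requirements:

```python
# fine_tune_llama32_3b.py -- minimal sketch (transformers + peft + bitsandbytes)
import argparse

from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

# Gated repo: accept the license on Hugging Face and run `huggingface-cli login` first.
MODEL_ID = "meta-llama/Llama-3.2-3B-Instruct"


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset_path", required=True)
    parser.add_argument("--output_dir", default="./finetuned_model")
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--batch_size", type=int, default=2)
    parser.add_argument("--gradient_accumulation", type=int, default=4)
    parser.add_argument("--max_seq_length", type=int, default=2048)
    args = parser.parse_args()

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    tokenizer.pad_token = tokenizer.eos_token  # Llama has no pad token by default

    # Load the frozen base model in 8-bit via bitsandbytes.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
    )
    model = prepare_model_for_kbit_training(model)

    # Attach small trainable LoRA adapters to the attention projections.
    model = get_peft_model(model, LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    ))

    # Map each Alpaca record onto Llama 3.2's chat template.
    def to_chat(example):
        user = example["instruction"]
        if example.get("input"):
            user += "\n\n" + example["input"]
        messages = [{"role": "user", "content": user},
                    {"role": "assistant", "content": example["output"]}]
        return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

    dataset = load_dataset("json", data_files=args.dataset_path, split="train")
    dataset = dataset.map(to_chat)
    dataset = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True,
                                max_length=args.max_seq_length),
        batched=True, remove_columns=dataset.column_names,
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir=args.output_dir,
            num_train_epochs=args.epochs,
            per_device_train_batch_size=args.batch_size,
            gradient_accumulation_steps=args.gradient_accumulation,
            learning_rate=2e-4,
            fp16=True,
            logging_steps=10,
            save_strategy="epoch",  # checkpoint after every epoch
        ),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    model.save_pretrained(f"{args.output_dir}/final")  # writes only the adapters
    tokenizer.save_pretrained(f"{args.output_dir}/final")


if __name__ == "__main__":
    main()
```

Note that `save_strategy="epoch"` is what gives you a checkpoint after every epoch, and only the small LoRA adapter weights are written to `final`, not the full 3B model.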
## 4. Running the Fine-Tuning
Now let's create a script to test your fine-tuned model:
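Here's a matching sketch of `test_finetuned_model.py`. It loads the same 8-bit base model, attaches the saved LoRA adapters with PEFT, and wraps your prompt in the chat template used during training; the sampling settings are illustrative assumptions:

```python
# test_finetuned_model.py -- minimal sketch for prompting the fine-tuned model
import argparse

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE_ID = "meta-llama/Llama-3.2-3B-Instruct"  # must match the training base model

parser = argparse.ArgumentParser()
parser.add_argument("--model_path", required=True)  # e.g. ./finetuned_model/final
parser.add_argument("--prompt", required=True)
args = parser.parse_args()

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)

# Load the 8-bit base model, then attach the trained LoRA adapters on top.
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
model = PeftModel.from_pretrained(base, args.model_path)
model.eval()

# Wrap the prompt in the same chat template used during training.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": args.prompt}],
    add_generation_prompt=True, return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=256,
                            do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```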
## 5. Step-by-Step Usage Instructions
1. **Prepare Your Alpaca Format Dataset**:
   - Ensure your dataset is in Alpaca format with "instruction", "input", and "output" fields (see the example file after these steps).
   - Save it as a JSON file or upload it to Hugging Face datasets.
2. **Start Fine-Tuning**:
   Navigate to your project directory in WSL and run:
   ```bash
   python fine_tune_llama32_3b.py --dataset_path /path/to/your/dataset.json --output_dir ./finetuned_model --epochs 3 --batch_size 2 --gradient_accumulation 4
   ```
   Key parameters:
   - `--dataset_path`: Path to your Alpaca-format dataset JSON file
   - `--batch_size`: Start with 2 for a 3060 (12GB VRAM) and lower it if you hit CUDA out-of-memory errors
   - `--gradient_accumulation`: Higher values allow larger effective batch sizes with limited VRAM
   - `--max_seq_length`: Maximum sequence length (default 2048; reduce if needed)
3. **Test Your Fine-Tuned Model**:
   After training completes, test the model with:
   ```bash
   python test_finetuned_model.py --model_path ./finetuned_model/final --prompt "Your test prompt here"
   ```
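For reference, a minimal Alpaca-format dataset file looks like this (the records themselves are just illustrative):

```json
[
  {
    "instruction": "Summarize the following paragraph in one sentence.",
    "input": "Llama 3.2 3B is a small open-weight language model that can be fine-tuned on consumer GPUs...",
    "output": "Llama 3.2 3B is a compact open-weight model suited to fine-tuning on consumer hardware."
  },
  {
    "instruction": "Write a haiku about GPUs.",
    "input": "",
    "output": "Silicon rivers\nstream through a thousand small cores\nheat blooms into thought"
  }
]
```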
## 6. Memory Optimization Tips for RTX 3060 (12GB VRAM)
1. **If You Encounter CUDA Out-of-Memory Errors** (an example invocation follows this list):
   - Reduce `--max_seq_length` to 1024 or 512
   - Decrease the batch size to 1
   - Increase gradient accumulation steps to 8
   - Use 8-bit quantization (already implemented)
2. **Monitoring GPU Usage**:
   - Run `nvidia-smi` in another terminal to monitor GPU memory usage
   - Use `watch -n 1 nvidia-smi` for continuous monitoring
3. **Efficient Training**:
   - Start with a small subset of your dataset to test the setup
   - Use checkpointing to save progress regularly
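Putting the out-of-memory tips together, a conservative run on a 12GB card might look like this (same hypothetical script and flags as in Section 5):

```bash
python fine_tune_llama32_3b.py \
  --dataset_path /path/to/your/dataset.json \
  --output_dir ./finetuned_model \
  --batch_size 1 \
  --gradient_accumulation 8 \
  --max_seq_length 1024
```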
## 7. Understanding the Process
- **QLoRA** fine-tunes by adding small adapter modules to the frozen quantized model, dramatically reducing memory requirements while maintaining performance.
- **8-bit Quantization** stores the model weights in INT8 instead of FP16/FP32, roughly halving memory use relative to FP16 (and cutting it by about 75% relative to FP32).
- **Unsloth** optimizes the training process with memory-efficient operations designed specifically for Llama models.
- **Format Adaptation**: The script automatically maps your Alpaca dataset onto Llama 3.2's chat template.
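To make the first point concrete, PEFT can report how few parameters the LoRA adapters add on top of the frozen base. This sketch assumes the same model ID and 8-bit setup as the training script above:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
))

# Prints trainable vs. total parameter counts -- the adapters are a tiny
# fraction (well under 1%) of the frozen 3B base weights.
model.print_trainable_parameters()
```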
## 8. Additional Resources
- You can modify the learning rate and other hyperparameters in the script based on your specific task.
- For very specialized tasks, consider increasing the LoRA rank (the `r` parameter) to 32 or 64, as shown below.
- Checkpoints are saved automatically to your specified output directory after each epoch.
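For example, raising the rank in the training sketch's `LoraConfig` might look like this; the `lora_alpha = 2 * r` pairing is a common rule of thumb, an assumption rather than a requirement:

```python
from peft import LoraConfig

# Higher rank gives the adapters more capacity, at the cost of extra VRAM
# and a higher risk of overfitting on small datasets.
lora = LoraConfig(r=32, lora_alpha=64, lora_dropout=0.05,
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
                  task_type="CAUSAL_LM")
```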
With this setup, you should be able to fine-tune Llama 3.2 3B effectively on your Windows machine with an RTX 3060. The 8-bit quantization and QLoRA adapters make this feasible even within the 12GB VRAM limit.