⚑ Technical Documentation

Fine-Tuning Llama 3.2 3B on Windows with QLoRA: Complete Guide


πŸ‘€ Author: Cosmic Lounge AI Team
πŸ“… Updated: 6/1/2025
⏱️ Read Time: 8 min
Topics: #ai #model #fine-tuning #training #gpu #cuda #pytorch #setup #design


🌌 Fine-Tuning Llama 3.2 3B on Windows with QLoRA: Complete Guide

I’ll walk you through setting up your Windows machine with an RTX 3060 GPU for fine-tuning Llama 3.2 3B using QLoRA with 8-bit quantization and an Alpaca-format dataset.



🌟 1. Windows Setup with WSL

First, you’ll need to set up Windows Subsystem for Linux (WSL) since it provides a more compatible environment for ML tools:

1. **Install WSL with Ubuntu**:

  • Open PowerShell as administrator and run:

wsl --install -d Ubuntu
  • Set up a username and password when prompted

2. **Configure GPU Access in WSL**:

  • Ensure you have the latest NVIDIA drivers for your 3060 GPU installed on Windows

  • In WSL, verify GPU access with:


nvidia-smi

3. **Install Essential Development Tools in WSL**:


sudo apt-get update

sudo apt-get install -y build-essential python3-dev git wget


🌟 2. Python Environment Setup

1. **Install Miniconda in WSL**:


wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

bash Miniconda3-latest-Linux-x86_64.sh
  • Follow the prompts to complete installation

  • Restart your terminal or run source ~/.bashrc

2. **Create and Activate Conda Environment**:


conda create -n llama_ft python=3.10

conda activate llama_ft

3. **Install PyTorch with CUDA Support**:


conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

4. **Install Fine-Tuning Dependencies**:


pip install transformers datasets accelerate peft trl bitsandbytes xformers

pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"


🌟 3. Fine-Tuning Script with QLoRA 8-bit

Now I’ll provide a Python script for fine-tuning Llama 3.2 3B with QLoRA in 8-bit:
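The script below is a minimal sketch of what `fine_tune_llama32_3b.py` could look like, built on the transformers/datasets/peft/trl stack installed above. The model ID, LoRA hyperparameters, and Alpaca prompt format are assumptions, and trl’s `SFTTrainer` arguments have shifted between releases, so adjust it to your installed versions:

```python
"""Sketch of fine_tune_llama32_3b.py: QLoRA fine-tuning of Llama 3.2 3B in 8-bit."""
import argparse
import sys


def format_alpaca_example(example):
    """Render one Alpaca record ("instruction", "input", "output") as a single
    training string. The prompt layout here is an illustrative assumption."""
    if example.get("input"):
        text = (f"### Instruction:\n{example['instruction']}\n\n"
                f"### Input:\n{example['input']}\n\n### Response:\n{example['output']}")
    else:
        text = (f"### Instruction:\n{example['instruction']}\n\n"
                f"### Response:\n{example['output']}")
    return {"text": text}


def main():
    # Heavy imports live inside main() so the formatting helper stays importable
    # on machines without a GPU or the ML stack installed.
    from datasets import load_dataset
    from peft import LoraConfig
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              BitsAndBytesConfig, TrainingArguments)
    from trl import SFTTrainer

    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset_path", required=True)
    parser.add_argument("--output_dir", default="./finetuned_model")
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--batch_size", type=int, default=2)
    parser.add_argument("--gradient_accumulation", type=int, default=4)
    parser.add_argument("--max_seq_length", type=int, default=2048)
    args = parser.parse_args()

    model_id = "meta-llama/Llama-3.2-3B-Instruct"  # assumed Hugging Face model ID
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token

    # Load the base model with 8-bit weights via bitsandbytes.
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
    )

    dataset = load_dataset("json", data_files=args.dataset_path, split="train")
    dataset = dataset.map(format_alpaca_example)

    # LoRA adapters on the attention projections; r/alpha values are illustrative.
    peft_config = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )

    trainer = SFTTrainer(
        model=model,
        train_dataset=dataset,
        peft_config=peft_config,
        # In newer trl releases these two move into SFTConfig:
        dataset_text_field="text",
        max_seq_length=args.max_seq_length,
        args=TrainingArguments(
            output_dir=args.output_dir,
            num_train_epochs=args.epochs,
            per_device_train_batch_size=args.batch_size,
            gradient_accumulation_steps=args.gradient_accumulation,
            learning_rate=2e-4,
            fp16=True,
            logging_steps=10,
            save_strategy="epoch",  # checkpoint after every epoch
        ),
    )
    trainer.train()
    trainer.save_model(f"{args.output_dir}/final")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Training only starts when CLI arguments are supplied.
    main()
```

Run it from WSL with the command shown in section 5 below.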



🌟 4. Running the Fine-Tuning

Now let’s create a script to test your fine-tuned model:
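As a companion piece, here is a hedged sketch of what `test_finetuned_model.py` could contain. The generation parameters (`max_new_tokens`, `temperature`, `top_p`) are illustrative defaults, and the prompt layout assumes the same Alpaca-style format used during training:

```python
"""Sketch of test_finetuned_model.py: generate from the fine-tuned model."""
import argparse
import sys


def build_prompt(instruction, input_text=""):
    """Mirror the Alpaca-style training prompt (layout is an assumption)."""
    if input_text:
        return (f"### Instruction:\n{instruction}\n\n"
                f"### Input:\n{input_text}\n\n### Response:\n")
    return f"### Instruction:\n{instruction}\n\n### Response:\n"


def main():
    # Heavy imports stay inside main() so build_prompt is importable anywhere.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    parser = argparse.ArgumentParser()
    parser.add_argument("--model_path", required=True)
    parser.add_argument("--prompt", required=True)
    args = parser.parse_args()

    tokenizer = AutoTokenizer.from_pretrained(args.model_path)
    model = AutoModelForCausalLM.from_pretrained(
        args.model_path, device_map="auto", torch_dtype=torch.float16)

    inputs = tokenizer(build_prompt(args.prompt), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256, do_sample=True,
                            temperature=0.7, top_p=0.9)
    print(tokenizer.decode(output[0], skip_special_tokens=True))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Inference only runs when CLI arguments are supplied.
    main()
```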



🌟 5. Step-by-Step Usage Instructions

1. **Prepare Your Alpaca Format Dataset**:

  • Ensure your dataset is in Alpaca format with β€œinstruction”, β€œinput”, and β€œoutput” fields

  • Save it as a JSON file or upload it to Hugging Face datasets

2. **Start Fine-Tuning**:

Navigate to your project directory in WSL and run:


python fine_tune_llama32_3b.py --dataset_path /path/to/your/dataset.json --output_dir ./finetuned_model --epochs 3 --batch_size 2 --gradient_accumulation 4

Key parameters:

  • --dataset\_path: Path to your Alpaca format dataset JSON file

  • --batch\_size: Start with 2 for a 3060 (12GB VRAM) and adjust based on CUDA out-of-memory errors

  • --gradient\_accumulation: Higher values allow for larger effective batch sizes with limited VRAM

  • --max\_seq\_length: Maximum sequence length (default 2048, reduce if needed)

3. **Test Your Fine-Tuned Model**:

After training completes, test the model with:


python test_finetuned_model.py --model_path ./finetuned_model/final --prompt "Your test prompt here"


🌟 6. Memory Optimization Tips for RTX 3060 (12GB VRAM)

1. **If You Encounter CUDA Out-of-Memory Errors**:

  • Reduce max\_seq\_length to 1024 or 512

  • Decrease batch size to 1

  • Increase gradient accumulation steps to 8

  • Use 8-bit quantization (already implemented)

2. **Monitoring GPU Usage**:

  • Run nvidia-smi in another terminal to monitor GPU memory usage

  • Use watch -n 1 nvidia-smi for continuous monitoring

3. **Efficient Training**:

  • Start with a small subset of your dataset to test the setup

  • Use checkpointing to save progress regularly



🌟 7. Understanding the Process

  • **QLoRA** performs fine-tuning by adding small adapter modules to the frozen quantized model, dramatically reducing memory requirements while maintaining performance.

  • **8-bit Quantization** reduces the model precision from FP16/FP32 to INT8, cutting memory usage by up to 50%.

  • **Unsloth** optimizes the training process with memory-efficient operations specifically designed for Llama models.

  • **Format Adaptation**: The script automatically formats your Alpaca dataset for Llama 3.2’s chat template.



🌟 8. Additional Resources

  • You can modify the learning rate and other hyperparameters in the script based on your specific task

  • For very specialized tasks, consider increasing the LoRA rank (r parameter) to 32 or 64

  • The models will be automatically saved to your specified output directory after each epoch

With this setup, you should be able to effectively fine-tune Llama 3.2 3B on your Windows machine with an RTX 3060. The 8-bit quantization and QLoRA approach make this possible even with the 12GB VRAM limitation.