🌌 Fine-tuning LLMs with Unsloth in WSL and Google Colab

Large language models (LLMs) can understand and generate human-like text, and fine-tuning adapts a pre-trained model to a specific task or domain so that it becomes more useful for a particular application. Unsloth is an open-source library that streamlines and optimizes the fine-tuning process, making it faster and more memory-efficient. Its key features are:

Speed and Efficiency: Unsloth leverages highly optimized kernels written in OpenAI’s Triton language and a manual backprop engine to achieve significant speed improvements during the fine-tuning process.
Accuracy: Unsloth's kernels compute exact results rather than approximations, so the speedup should not cost accuracy compared with a standard fine-tuning run.
Hardware Compatibility: Unsloth is compatible with a wide range of NVIDIA GPUs, supporting models with varying sizes and complexities.
Operating System Support: Unsloth works seamlessly on both Linux and Windows (via WSL), providing flexibility in your development environment.
Quantization Support: Unsloth supports both 4-bit QLoRA (LoRA adapters trained on top of a 4-bit quantized base model) and 16-bit LoRA, enabling you to fine-tune large models with limited resources.

🌟 Understanding Fine-tuning

Before diving into the setup process, let’s briefly discuss the concept of fine-tuning and its benefits. Fine-tuning involves taking a pre-trained LLM, which has already learned a vast amount of general knowledge from a massive dataset, and further training it on a smaller, more specific dataset. This allows you to:

Update Knowledge: Introduce new domain-specific information to the model, making it more knowledgeable in a particular area.
Customize Behavior: Adjust the model’s tone, personality, or response style to better suit your needs.
Optimize for Tasks: Improve the model’s accuracy and relevance for specific tasks, such as text summarization, question answering, or code generation. For example, you could fine-tune an LLM on a dataset of legal texts to improve its ability to analyze contracts or on a dataset of customer service conversations to enhance its ability to provide helpful and empathetic responses.

🌟 Setting up Unsloth in WSL Linux

WSL (Windows Subsystem for Linux) provides a convenient environment for running Linux applications on Windows, offering a seamless integration between the two operating systems. To set up Unsloth in WSL Linux, follow these steps:

1. Install WSL: If you haven’t already, install WSL by following the instructions on the official Microsoft documentation. This will allow you to run a Linux distribution, such as Ubuntu, within your Windows environment.
2. Install Python: Download and install Python from the official Python website. Python is the programming language used for most machine learning tasks, and it’s essential for working with Unsloth.
3. Launch WSL: Open a command prompt as administrator and run wsl -d ubuntu to launch your Ubuntu distribution on WSL. This will open a Linux terminal where you can execute commands and install software.
4. Update WSL: Update your WSL distribution by running sudo apt update && sudo apt upgrade -y. This ensures that you have the latest software packages and security updates.
5. Install Pip: Install pip, the Python package installer, by running sudo apt install python3-pip. Pip is used to install Python libraries, including Unsloth and its dependencies.
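
Before installing anything else, it can help to confirm that the terminal really is running inside WSL. The sketch below assumes the common convention that WSL kernels report a release string containing "microsoft"; the helper name is_wsl is hypothetical and not part of any library.

```python
import platform


def is_wsl(release: str) -> bool:
    """Return True if a kernel release string looks like a WSL kernel."""
    # WSL kernels typically report something like
    # "5.15.90.1-microsoft-standard-WSL2". The match is case-insensitive
    # because WSL 1 capitalizes "Microsoft".
    return "microsoft" in release.lower()


if __name__ == "__main__":
    release = platform.uname().release
    print(f"Kernel release: {release}")
    print("Running inside WSL" if is_wsl(release) else "Not running inside WSL")
```

Run it with python3 inside the Ubuntu terminal opened in step 3. If it reports that you are not inside WSL, you are most likely still in a Windows shell.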

⚡ Setting up Conda Environment

While Unsloth can be installed directly using pip, setting up a Conda environment is highly recommended. Conda is a powerful package and environment management system that helps you create isolated environments with specific versions of Python and packages. This ensures that your projects have the correct dependencies and avoids conflicts between different projects. Here’s how to set up a Conda environment for Unsloth:

1. Install Miniconda: Download and install Miniconda, a minimal installer for Conda, from the official website. Miniconda provides the core Conda functionality without including a large number of pre-installed packages.
2. Create a Conda environment: Open a terminal and run the following command to create a Conda environment named unsloth_env:

conda create --name unsloth_env \
    python=3.11 \
    pytorch-cuda=12.1 \
    pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
    -y

This command creates an environment with Python 3.11, PyTorch with CUDA 12.1, cudatoolkit, and xformers. These are essential dependencies for Unsloth and deep learning tasks in general.

3. Activate the environment: Activate the newly created environment by running conda activate unsloth_env. This will switch your terminal session to the unsloth_env environment, ensuring that any packages you install or commands you run are specific to this environment.
4. Install Unsloth: Install Unsloth and its dependencies within the environment:

pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps trl peft accelerate bitsandbytes

This ensures that Unsloth and its dependencies are installed within the isolated Conda environment.

⚡ Installing Dependencies

Unsloth relies on several dependencies for optimal performance. Ensure that you have the following dependencies installed:

PyTorch: A deep learning framework that provides essential tools for building and training neural networks. It provides data structures for tensors (multi-dimensional arrays) and automatic differentiation, which are crucial for implementing gradient-based optimization algorithms.
TRL (Transformer Reinforcement Learning): A library of trainers for post-training transformer models. It provides SFTTrainer for supervised fine-tuning, alongside preference-based and reinforcement learning trainers such as DPO and GRPO.
xformers: A library that optimizes transformer operations for improved speed and memory efficiency. It provides optimized implementations of common transformer components, such as attention mechanisms, leading to faster training and inference.
PEFT (Parameter-Efficient Fine-Tuning): A library that enables efficient fine-tuning of large language models by only updating a small subset of parameters. This reduces the computational cost and memory requirements of fine-tuning, making it feasible to fine-tune large models on resource-constrained devices.
Accelerate: A library that simplifies the process of training and deploying PyTorch models on various hardware accelerators, including GPUs and TPUs. It provides a high-level API that abstracts away the complexities of managing different hardware devices.
bitsandbytes: A library that provides efficient quantization techniques for reducing the memory footprint of large language models. It allows you to represent model parameters with fewer bits, reducing memory usage without significant loss of accuracy.

You can install these dependencies using pip:

pip install torch trl nvitop xformers peft accelerate bitsandbytes
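
To make PEFT's "small subset of parameters" concrete, here is a sketch of the arithmetic behind LoRA: a frozen weight matrix of shape d_out x d_in gets two trainable low-rank factors holding rank * (d_in + d_out) parameters in total. The 4096 x 4096 layer size and rank 16 are illustrative assumptions, not values taken from Unsloth.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * d_in + d_out * rank


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the original dense weight matrix (bias ignored)."""
    return d_in * d_out


if __name__ == "__main__":
    d_in, d_out, rank = 4096, 4096, 16  # illustrative sizes, not from Unsloth
    added = lora_trainable_params(d_in, d_out, rank)
    total = full_params(d_in, d_out)
    print(f"LoRA trains {added:,} of {total:,} parameters ({added / total:.2%})")
```

For this example layer the adapter holds 131,072 parameters against 16,777,216 in the frozen matrix, or about 0.78%, which is why LoRA fits on GPUs that cannot hold full fine-tuning.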

⚡ Verifying Installation

To verify that Unsloth is installed correctly, you can run the following command in your terminal:

python -c "import unsloth; print(unsloth.__version__)"

This command should print the version of Unsloth installed on your system.
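
Beyond the version string, it is worth checking that the supporting packages are importable and that PyTorch can see a GPU. The following is a minimal sketch rather than an official Unsloth check: the helper names and the REQUIRED list are assumptions based on the dependencies named in this guide, and the CUDA query only runs if torch is installed.

```python
import importlib.util

# Package names taken from the dependency list in this guide.
REQUIRED = ["torch", "unsloth", "trl", "peft", "accelerate", "bitsandbytes", "xformers"]


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def cuda_available() -> bool:
    """Return True only if torch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the script still runs without PyTorch

    return torch.cuda.is_available()


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
    print("CUDA available:", cuda_available())
```

If CUDA is reported as unavailable inside WSL, the usual suspects are an outdated Windows NVIDIA driver or a CPU-only PyTorch build.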

🌟 Setting up Unsloth in Google Colab

Google Colab provides a free cloud-based environment for running machine learning experiments, making it an accessible platform for fine-tuning LLMs. Colab offers free access to GPUs, eliminating the need for expensive hardware. To set up Unsloth in Google Colab, follow these steps:

1. Open a Colab notebook: Create a new Colab notebook or open an existing one. Colab notebooks are interactive environments where you can write and execute code, as well as add text and visualizations.
2. Install Unsloth: In the first code cell of your notebook, run the following command to install Unsloth:

!pip install unsloth

This command installs Unsloth and its dependencies in your Colab environment.

⚡ Using Colab Notebooks for Unsloth Fine-tuning

Unsloth provides a collection of Colab notebooks that demonstrate how to fine-tune various LLMs. These notebooks offer a convenient starting point for your fine-tuning experiments. Here’s how to use them:

1. Access the notebooks: You can find the Unsloth Colab notebooks on the official Unsloth GitHub repository. These notebooks are organized by model and task, making it easy to find the one you need.
2. Select a notebook: Choose a notebook that corresponds to the LLM you want to fine-tune and the specific task you’re interested in. For example, if you want to fine-tune a Llama 2 model for conversational AI, you would select the “Llama 2 - Conversational” notebook.
3. Open in Colab: Click the “Open in Colab” button to open the notebook in your Colab environment. This will load the notebook into Colab, where you can execute the code and modify it as needed.
4. Run the notebook: Follow the instructions in the notebook to run the fine-tuning process. The notebooks typically provide step-by-step guidance and explanations, making it easy to understand the process.

The notebooks typically include the following steps:

Loading the model and tokenizer: The notebook loads the pre-trained LLM and its corresponding tokenizer. The tokenizer is responsible for converting text into numerical representations that the model can understand.
Loading and processing the dataset: The notebook loads the dataset you want to use for fine-tuning and preprocesses it into a suitable format. This may involve cleaning the data, formatting it correctly, and splitting it into training and validation sets.
Setting up the model: The notebook configures the model for fine-tuning, including setting hyperparameters and enabling techniques like LoRA or QLoRA. Hyperparameters are settings that control the learning process, such as the learning rate and batch size. LoRA and QLoRA are techniques that make fine-tuning more efficient by reducing the number of parameters that need to be updated.
Model training: The notebook trains the model on the dataset, optimizing its parameters to improve performance on the specific task. This involves feeding the training data to the model and adjusting the model’s weights based on the errors it makes.
Testing the model: The notebook evaluates the fine-tuned model on a test dataset or provides an interactive interface for you to test it manually. This allows you to assess the model’s performance and see how well it generalizes to unseen data.
Saving the model: The notebook saves the fine-tuned model and tokenizer, allowing you to use it for inference or further fine-tuning. This allows you to reuse the model without having to retrain it from scratch.
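
The loading, setup, and saving steps above can be sketched with Unsloth's FastLanguageModel class. This is a hedged outline, not a copy of any particular notebook: the checkpoint name, sequence length, and LoRA values are assumed starting points, and the dataset and training steps (which the notebooks typically hand to TRL's SFTTrainer) are left out because their arguments vary between TRL versions.

```python
MODEL_NAME = "unsloth/llama-3-8b-bnb-4bit"  # assumed example checkpoint
MAX_SEQ_LENGTH = 2048  # assumed context length for training

# LoRA settings; illustrative starting points, not tuned recommendations.
LORA_KWARGS = dict(
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)


def load_model_with_lora():
    """Load a 4-bit base model and attach LoRA adapters (needs an NVIDIA GPU)."""
    # Imported inside the function so this file can still be imported
    # on machines where Unsloth is not installed.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=MODEL_NAME,
        max_seq_length=MAX_SEQ_LENGTH,
        load_in_4bit=True,  # QLoRA-style 4-bit base weights
    )
    model = FastLanguageModel.get_peft_model(model, **LORA_KWARGS)
    return model, tokenizer


def save_adapters(model, tokenizer, output_dir="lora_model"):
    """Save the LoRA adapters and tokenizer for later inference or training."""
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)
```

Keeping the settings in plain dictionaries makes it easy to log them next to each run and to change one value at a time when comparing experiments.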

⚡ Template Notebook

Unsloth provides a template notebook (Template_Notebook.ipynb) that you can use as a starting point for creating your own Colab notebooks. This template includes the basic structure and formatting guidelines for Unsloth notebooks, making it easier to create consistent and well-organized notebooks. To use the template notebook:

1. Make a copy: Create a copy of the Template_Notebook.ipynb file.
2. Rename the copy: Rename the copied file to a descriptive name that reflects the model and task you’re working on.
3. Modify the copy: Modify the copied notebook to include your own code, data, and explanations.

⚡ Choosing the Right GPU and Model Size

When fine-tuning LLMs in Colab, it’s important to select the appropriate GPU and model size based on your needs and the available resources. Colab offers different types of GPUs with varying amounts of memory. If you’re working with relatively small models and datasets, the default T4 GPU may be sufficient. However, for larger models and more demanding tasks, you may need to upgrade to a more powerful GPU, such as the A100.
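
A quick back-of-the-envelope calculation helps when matching a model to a GPU. The sketch below estimates the memory taken by the weights alone; the 30% headroom reserved for activations, adapters, and optimizer state is a rough assumption, not a measured figure, and real usage depends on sequence length and batch size.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def fits_on_gpu(num_params: float, bits_per_param: int, gpu_gb: float,
                headroom: float = 0.3) -> bool:
    """Rough check: weights must fit in the memory left after reserving headroom."""
    return weight_memory_gb(num_params, bits_per_param) <= gpu_gb * (1 - headroom)


if __name__ == "__main__":
    params = 7e9  # a 7-billion-parameter model
    t4_gb = 16    # memory of a T4 GPU
    for bits in (16, 4):
        size = weight_memory_gb(params, bits)
        verdict = "fits" if fits_on_gpu(params, bits, t4_gb) else "does not fit"
        print(f"{bits}-bit weights: {size:.1f} GB, {verdict} on a {t4_gb} GB GPU")
```

A 7-billion-parameter model needs about 14 GB for 16-bit weights but only about 3.5 GB at 4-bit, which is why 4-bit QLoRA is the usual choice on a T4.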

⚡ Types of Fine-tuning

Unsloth supports various types of fine-tuning, each with its own applications:

Instruction Fine-tuning: This involves fine-tuning the model on a dataset of instructions and their corresponding outputs. This is commonly used to improve the model’s ability to follow instructions and generate responses in a specific format.
Conversational Fine-tuning: This involves fine-tuning the model on a dataset of conversations, such as those between a user and a chatbot. This is used to improve the model’s ability to engage in natural and coherent conversations.
GRPO (Group Relative Policy Optimization): This is a reinforcement learning technique that samples a group of responses for each prompt, scores them with a reward function, and updates the model toward the responses that score above the group average. This is particularly useful for tasks that require complex reasoning, such as solving problems step by step.
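
The core of GRPO is that each sampled response is scored relative to the other responses in its group rather than by a separate value model. Here is a minimal sketch of that normalization step only. Using the population standard deviation and a small eps to avoid division by zero are assumptions; implementations differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one prompt; 1.0 means the answer was correct.
    rewards = [1.0, 0.0, 0.0, 1.0]
    for reward, advantage in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Responses that beat the group average get a positive advantage and are reinforced, while the rest are pushed down. When every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.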

⚡ Dataset Preparation

The quality and size of your dataset play a crucial role in the success of your fine-tuning experiments. Here are some tips for preparing your dataset:

High-Quality Data: Ensure that your dataset is accurate, consistent, and free of errors.
Sufficient Size: The size of your dataset should be appropriate for the complexity of the task and the size of the model. Larger models generally require larger datasets.
Synthetic Data Generation: If you don’t have enough real data, you can consider using synthetic data generation techniques to create additional training examples.
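
The first two tips can be turned into a small preprocessing pass. The sketch below removes empty and duplicate rows, splits the rest into training and validation sets, and renders each row with an Alpaca-style prompt. The field names (instruction, input, output) and the template text are assumptions modeled on that common format, so adjust them to your own data.

```python
import random

ALPACA_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)


def clean(records):
    """Drop rows with an empty instruction or output, and exact duplicates."""
    seen, kept = set(), []
    for row in records:
        key = (
            row.get("instruction", "").strip(),
            row.get("input", "").strip(),
            row.get("output", "").strip(),
        )
        if not key[0] or not key[2] or key in seen:
            continue
        seen.add(key)
        kept.append({"instruction": key[0], "input": key[1], "output": key[2]})
    return kept


def split(records, val_fraction=0.1, seed=0):
    """Shuffle deterministically and split into (train, validation)."""
    rows = list(records)
    random.Random(seed).shuffle(rows)
    n_val = max(1, int(len(rows) * val_fraction)) if len(rows) > 1 else 0
    return rows[n_val:], rows[:n_val]


def to_text(row):
    """Render one cleaned row as a single training string."""
    return ALPACA_TEMPLATE.format(**row)
```

A typical call chain would be train, val = split(clean(raw_rows)) followed by [to_text(r) for r in train]. Fixing the seed keeps the split reproducible between runs.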

⚡ Advanced Fine-tuning Techniques

Two further practices can improve the performance of your fine-tuned models:

Prompt Engineering: This involves carefully designing the prompts that you use to interact with the model. Well-crafted prompts can elicit more accurate and relevant responses.
Hyperparameter Optimization: This involves systematically exploring different hyperparameter settings to find the ones that yield the best results.
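
Systematic exploration can be as simple as a grid search. In the sketch below, train_and_eval stands for whatever function runs one fine-tuning job and returns a validation loss. The toy_objective used here is a made-up stand-in so the example runs instantly, and the grid values are illustrative only.

```python
from itertools import product


def grid_search(train_and_eval, grid):
    """Try every combination in grid and return (best_params, best_loss)."""
    names = list(grid)
    best_params, best_loss = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = train_and_eval(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_objective(learning_rate, lora_rank):
    """Stand-in for a real training run; lowest at lr=2e-4 and rank=16."""
    return abs(learning_rate - 2e-4) * 1e4 + abs(lora_rank - 16) / 16


if __name__ == "__main__":
    grid = {"learning_rate": [1e-4, 2e-4, 5e-4], "lora_rank": [8, 16, 32]}
    params, loss = grid_search(toy_objective, grid)
    print(f"Best settings: {params} (loss {loss:.3f})")
```

Because every combination is a full training run, grids grow expensive quickly. With real fine-tuning jobs it is common to search a few values per setting on a small data subset first.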

⚡ Avoiding Overfitting and Underfitting

During fine-tuning, it’s important to be aware of the risks of overfitting and underfitting:

Overfitting: This occurs when the model memorizes the training data too well and fails to generalize to unseen data. This can be addressed by reducing the learning rate, lowering the number of training epochs, or increasing the dropout rate.
Underfitting: This occurs when the model fails to learn from the training data and provides responses similar to the base model. This can be addressed by increasing the learning rate, increasing the number of training epochs, or using a more complex model.
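
One practical way to catch overfitting is to watch the validation loss and stop once it has not improved for a few epochs. This is a simplified sketch of that early-stopping rule; the patience value and the loss numbers are illustrative assumptions, not results from a real run.

```python
def best_epoch_with_patience(val_losses, patience=2):
    """Return (best_epoch, stop_epoch) under a simple early-stopping rule.

    Epochs are 0-indexed. Training stops at the first epoch that is
    `patience` epochs past the best validation loss seen so far.
    """
    if not val_losses:
        raise ValueError("val_losses must contain at least one value")
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


if __name__ == "__main__":
    # Validation loss falls, then rises again: a typical overfitting curve.
    losses = [1.20, 0.95, 0.80, 0.82, 0.90, 1.05]
    best, stopped = best_epoch_with_patience(losses, patience=2)
    print(f"Best checkpoint: epoch {best}; stopped at epoch {stopped}")
```

If the validation loss never stops falling, the function simply returns the last epoch, which hints that the model may still be underfitting and could use more training.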

⚡ Troubleshooting

Here are some common issues you might encounter during Unsloth setup and fine-tuning, along with their solutions:

Dependency Conflicts: If you encounter errors related to dependency conflicts, ensure that you have the correct versions of all dependencies installed and that they are compatible with each other.
Out of Memory (OOM) Errors: If you encounter OOM errors, try reducing the batch size or using quantization techniques to reduce memory usage.
Incorrect Chat Template: If the model produces poor results after exporting it to another platform, ensure that you are using the same chat template that was used during fine-tuning.
Untrained Tokens: If you encounter errors related to untrained tokens, make sure that you are using the correct model version (e.g., the instruct version) and that the embed_tokens and lm_head modules are included in the target_modules list when setting up the PEFT model.
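
For OOM errors, a common approach is to halve the per-device batch size and raise gradient accumulation so that the effective batch size does not shrink. The sketch below shows only that arithmetic. The helper name shrink_batch is hypothetical; the two values correspond to the per_device_train_batch_size and gradient_accumulation_steps training arguments in Hugging Face trainers.

```python
import math


def shrink_batch(per_device_batch: int, grad_accum_steps: int):
    """Halve the per-device batch and raise accumulation to compensate."""
    if per_device_batch <= 1:
        return per_device_batch, grad_accum_steps  # nothing left to shrink
    effective = per_device_batch * grad_accum_steps
    new_batch = per_device_batch // 2
    # Round up so the effective batch size never drops below the original.
    new_accum = math.ceil(effective / new_batch)
    return new_batch, new_accum


if __name__ == "__main__":
    batch, accum = 8, 1
    while batch > 1:
        batch, accum = shrink_batch(batch, accum)
        print(f"per-device batch={batch}, accumulation steps={accum}, "
              f"effective batch={batch * accum}")
```

Memory use follows the per-device batch, while the gradient the optimizer sees follows the effective batch, so this trade costs training time rather than quality.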

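🌟 DeepSeek-R1: A Reasoning Model Available Through Unsloth

Unsloth's tooling also covers reasoning models. DeepSeek-R1 is DeepSeek's first-generation reasoning model, not a model built by Unsloth; Unsloth publishes quantized GGUF builds of it on Hugging Face (unsloth/DeepSeek-R1-GGUF) and supports GRPO training for teaching reasoning behavior to other models. DeepSeek-R1 is trained using large-scale reinforcement learning (RL) and demonstrates strong reasoning capabilities, including self-verification, reflection, and generating long chains of thought.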
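
DeepSeek-R1 writes its chain of thought between <think> and </think> tags before giving the final answer, so applications usually separate the two. Here is a minimal sketch of that split, assuming a single <think> block per response; the helper name split_reasoning is hypothetical.

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str):
    """Return (reasoning, answer) from a response that may contain a <think> block."""
    match = THINK_PATTERN.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


if __name__ == "__main__":
    response = "<think>2 + 2 is 4.</think>\nThe answer is 4."
    reasoning, answer = split_reasoning(response)
    print("Reasoning:", reasoning)
    print("Answer:", answer)
```

Keeping the reasoning separate lets you log it for debugging while showing users only the final answer.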

🌟 Conclusion

Unsloth provides a powerful and efficient way to fine-tune large language models, offering significant speed improvements and reduced memory usage compared to traditional methods. Its integration with Google Colab makes it accessible to a wide range of users, even those without access to expensive hardware. By following the steps outlined in this guide and leveraging the provided Colab notebooks, you can adapt LLMs to your specific needs and unlock their full potential for various NLP tasks.

