🌌 Unsloth AI: A Comprehensive Manual for Optimized LLM Fine-tuning
1. Introduction:
This manual provides a comprehensive guide to Unsloth AI, a framework designed to accelerate and optimize the fine-tuning of large language models (LLMs).
Addressing the complexities of navigating extensive online documentation with numerous hyperlinks, this report consolidates essential information into a structured format. It covers all critical aspects of Unsloth, from initial setup and installation to advanced features, practical usage, and illustrative examples, accompanied by necessary code snippets.
2. Unsloth AI Overview:
Unsloth AI is a Python-based framework that significantly enhances the efficiency of fine-tuning large language models such as Llama-3, Mistral, Phi-4, and Gemma 1. It achieves this by offering training speeds up to two times faster and reducing memory usage by approximately 70% without compromising accuracy 1. The core principle behind Unsloth’s performance is the manual derivation of computationally intensive mathematical operations and the implementation of handwritten GPU kernels 2. This approach allows for faster training without requiring any hardware modifications 2. Unsloth supports a wide range of NVIDIA GPUs, from Tesla T4 to H100, and is designed to be portable to AMD and Intel GPUs 2. The framework guides users through the essential steps of installing and updating Unsloth, creating custom datasets, and running and deploying their fine-tuned models 1. By fine-tuning a pre-trained model on a specialized dataset, users can update the model’s knowledge with new domain-specific information, customize its behavior to adjust tone and response style, and optimize it for specific tasks to improve accuracy and relevance 1.
3. System Requirements and Installation:
Before installing and using Unsloth, it is crucial to ensure that your system meets the necessary requirements. The documentation outlines several prerequisites depending on the intended use case 4.
3.1. System Requirements:
The fundamental requirements include Python, with version 3.10.14 specified in the environment files, and pip, version 24.0 4. For managing environments, Miniconda is utilized in the provided Dockerfile 4.
3.1.1. NVIDIA GPU Support (Pascal):
For users with NVIDIA GPUs, particularly those with Pascal architecture, Unsloth provides specific setup guidance involving Docker and Docker Compose 4. The NVIDIA Container Toolkit must be installed to enable GPU support within Docker environments 4. The Dockerfile employs nvidia/cuda:11.8.0-devel-ubuntu22.04 as the base image, indicating a need for compatible NVIDIA drivers 4.
3.1.2. Ollama Integration:
To utilize Unsloth LoRA adapters with Ollama, a local installation of Ollama (version 0.1.32 or later) is required 4. A Hugging Face account is necessary for saving and loading models and adapters from the Hugging Face Hub 4. Users will also need a base Unsloth model, such as unsloth/tinyllama, and the LoRA adapters saved online 4.
3.1.3. GGUF Conversion:
For saving models to the GGUF format, the llama.cpp repository needs to be cloned and built 4. Additionally, the gguf Python package must be installed via pip install gguf protobuf 4.
3.1.4. Faster Inference:
The 2x faster inference capability of Unsloth does not list any specific additional dependencies, suggesting it is a native feature of the library 4.
3.1.5. UTF-8 Locale:
In Google Colab environments, users might encounter a NotImplementedError related to UTF-8 locale. The documentation provides a code snippet to set the locale to UTF-8 to resolve this issue 4.
3.2. Installation Methods:
Unsloth offers several installation methods to accommodate different operating systems and user preferences 6.
3.2.1. Linux Installation (Pip):
The recommended method for Linux devices is to use pip. The command provided for installation is:
pip install —upgrade —no-cache-dir “unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git”
This command ensures that any existing version of Unsloth is uninstalled before installing the latest version from the GitHub repository 4.
3.2.2. Windows Installation:
Unsloth can be installed on Windows through several methods 8.
- Method #1 - Windows Directly:
- Begin by installing the latest NVIDIA Video Driver suitable for your GPU from the official NVIDIA website 10.
- Next, install Visual Studio Community Edition, ensuring that the C++ workload is selected during the installation. This includes components like the. NET Framework SDK, C# and Visual Basic Roslyn compilers, MSBuild, MSVC v143 build tools, C++ Redistributable Update, C++ CMake tools, C++/CLI support, MSBuild support for LLVM, C++ Clang Compiler, and the Windows SDK.
“C:\Program Files (x86)\Microsoft Visual Studio\Installer\vs_installer.exe” modify ^ —installPath “C:\Program…source
- Install the CUDA Toolkit following the instructions on the NVIDIA website. After installation, install Miniconda, which includes Python, from the official Anaconda website 10.
- Install PyTorch, carefully selecting the version that is compatible with your installed CUDA drivers from the PyTorch website 10.
- Finally, open a Conda command prompt or your terminal with Python activated and run the command 10: pip install “unsloth[windows] @ git+https://github.com/unslothai/unsloth.git”
- For users intending to use GRPO or vLLM, it’s important to note that vLLM currently does not offer direct Windows support and might require WSL or Linux 10.
- When using the SFTTrainer, setting the dataset_num_proc parameter to 1 can help avoid potential crashing issues on Windows 10: Python trainer = SFTTrainer( dataset_num_proc=1, … )
- Method #2 - Windows using PowerShell:
- Install the NVIDIA CUDA Toolkit from the official NVIDIA website. Reboot your system if prompted. No additional setup within Unsloth is required after this 10.
- Download and install Microsoft Build Tools for Visual Studio from the official website, ensuring the “C++ build tools” workload and the MSVC compiler toolset are selected 10.
- Set environment variables for the C++ compiler. Open System Properties, click “Environment Variables…”, and under System variables, add or update CC and CXX with the path to the cl.exe C++ compiler (example path provided in 10). Verify by opening a new terminal and typing cl 10.
- Install Miniconda from the official website 10.
- Download the unsloth_windows.ps1 PowerShell script (link needed). Open PowerShell as Administrator, navigate to the script’s location using cd, and run the script using 10: cd path\to\script\folder
powershell.exe -ExecutionPolicy Bypass -File .\unsloth_windows.ps1
- After the installation completes, activate the environment using 10: conda activate unsloth_env
- Method #3 - Windows via WSL:
- Install Python from the official Python website 10.
- Start WSL (should be preinstalled, or install Ubuntu from the Microsoft Store) 10.
- Update WSL 10: wsl -d ubuntu
sudo apt update && sudo apt upgrade -y
- Install pip 10: sudo apt install python3-pip
- Install Unsloth 10: pip install unsloth
- Optional: Install Jupyter Notebook 10: pip3 install notebook and launch it with 10: jupyter notebook
- Download any Unsloth Colab notebook, import it into your Jupyter Notebook environment in WSL, adjust parameters, and execute 10.
3.2.3. Conda Installation:
Unsloth can also be installed using Conda. First, save the content of the unsloth_env_file.yml (content needs to be sourced from the original file) to a local file named unsloth_env_file.yml 4. Then, open your terminal or command prompt, navigate to the directory where the file is saved, and run 4:
Bash
conda env create -f unsloth_env_file.yml
Once the environment is created, activate it using 4:
Bash
conda activate unsloth_env
Finally, install Unsloth using pip within the activated environment 4:
Bash
pip install “unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git”
3.2.4. Google Colab Setup:
The most straightforward way for beginners to start with Unsloth is by using the pre-made Google Colab notebooks 12. A comprehensive list of these notebooks, categorized by model (Gemma 3, Llama 3.1, Phi-4, Qwen2.5, Mistral), task (GRPO reasoning, conversational), and specific use case (inference chat UI, text classification), is available in the documentation 12. Users can simply open the desired notebook in Google Colab. Python
pip uninstall unsloth -y pip install —upgrade —no-cache-dir “unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git”
For Llama-3 8b in Colab, users might encounter memory limitations when fine-tuning both the lm_head and embed_tokens. In such cases, it is advisable to only include lm_head in the target_modules 4. To ensure models are saved in the .safetensors format instead of the default .bin in Colab, the safe_serialization = None argument should be used with the save_pretrained or push_to_hub methods 4.
3.2.5. Updating Unsloth:
To update Unsloth to the latest version without modifying other dependencies, users can first uninstall the current version and then install the newest one using pip 4:
Python
pip uninstall unsloth -y pip install —upgrade —no-cache-dir “unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git”
4. Getting Started with Unsloth:
Fine-tuning a large language model involves taking a pre-trained model and further training it on a smaller, specific dataset 1. This process customizes the model’s behavior, enhances its knowledge in particular domains, and optimizes its performance for specific tasks. A fundamental decision when starting is whether to use an instruct model or a base model 19. Instruct models are pre-trained with instructions, making them readily usable for instruction-following tasks 19. Base models, conversely, are the original pre-trained versions without instruction fine-tuning, designed for extensive customization 19. The choice often hinges on the size and quality of your dataset; larger datasets (1,000+ rows) typically benefit from fine-tuning a base model, while smaller, high-quality datasets (300-1,000 rows) can work well with either 19. Unsloth utilizes parameter-efficient fine-tuning (PEFT) techniques, primarily Low-Rank Adaptation (LoRA) and Quantized LoRA (QLoRA), to minimize the computational resources required for fine-tuning large models 16. LoRA freezes the original model weights and introduces small, trainable matrices (adapters) 16. QLoRA combines LoRA with 4-bit quantization of the model weights, further reducing memory usage 16. Starting with QLoRA is often recommended due to its efficiency and accessibility 16.
5. Fine-tuning Guide:
The process of fine-tuning with Unsloth involves several key steps, each requiring careful consideration to achieve optimal results 16.
5.1. Choosing the Right Model:
Selecting the appropriate pre-trained model is paramount. Consider the specific application of the fine-tuned model; for instance, vision models are suited for image-based tasks, while specialized models like Qwen Coder 2.5 are ideal for code-related datasets 19. It is also important to review the licensing terms and system requirements associated with different models to ensure compatibility 19. Unsloth supports a wide range of models, and their documentation provides guidance on selecting the most suitable one for your needs 16.
5.2. Preparing Your Dataset:
The quality and format of your dataset are critical determinants of the fine-tuned model’s performance 16. For language models, datasets typically consist of text data that needs to be tokenized 20. A common format involves question-answer pairs or instructions paired with desired outputs 16. For example, the Alpaca dataset, frequently used in Unsloth examples, contains “instruction” and “output” columns 16. For conversational tasks, the ShareGPT format, with alternating turns between “human” and “assistant,” is often used 20. The size of your dataset is also a factor; while a minimum of 100 rows is suggested for reasonable results, datasets with over 300 rows tend to yield better outcomes 20.
5.3. Understanding Model Parameters:
Fine-tuning involves adjusting various hyperparameters that control the training process 16. The learning rate defines the magnitude of weight adjustments during each training step; higher rates lead to faster training but risk overfitting, while lower rates offer more stable training but may require more epochs 16. The number of epochs specifies how many times the model iterates over the entire training dataset; typically, 1-3 epochs are recommended to balance learning and the risk of overfitting 16. Batch size determines the number of samples processed before the model’s parameters are updated, and gradient accumulation steps allow for simulating larger batch sizes without increased memory usage 16.
5.4. Avoiding Overfitting and Underfitting:
Overfitting occurs when a model learns the training data too well, including its noise and outliers, and consequently performs poorly on new, unseen data 16. Underfitting, on the other hand, happens when the model fails to capture the underlying patterns in the training data and performs poorly even on the training set 16. To combat overfitting, strategies include reducing the learning rate, lowering the number of training epochs, combining your specific dataset with more general data, and increasing the dropout rate for regularization 16. Addressing underfitting might involve increasing the learning rate, training for more epochs, or using a dataset that is more relevant to the task 16. The optimal approach often involves experimentation to find the right balance for your particular needs 16.
6. Key Features and Functionalities:
Unsloth AI offers a range of features designed to optimize and enhance the fine-tuning experience for large language models 3.
6.1. Optimized Training:
Unsloth is engineered to provide significant performance improvements in LLM fine-tuning, achieving up to 2x faster training speeds and reducing VRAM usage by up to 80% compared to standard methods 2. This efficiency is attributed to the framework’s manually derived GPU kernels and custom backpropagation engine, which ensure rapid computation without sacrificing model accuracy 2.
6.2. Model Support:
The framework demonstrates broad compatibility with a variety of popular LLM architectures, including Llama-3, Mistral, Phi-4, Gemma, and Qwen 1. Unsloth provides optimized support for these models, often accompanied by specific configurations and tutorials to facilitate their use 3. A comprehensive collection of GGUF, 16-bit, and 4-bit quantized versions of these models, uploaded by the Unsloth team, is readily available on the Hugging Face platform 22.
6.3. Reasoning with GRPO and RL:
Unsloth incorporates advanced techniques for training models with enhanced reasoning capabilities, most notably Group Relative Policy Optimization (GRPO), a method developed by DeepSeek 23. GRPO efficiently optimizes model responses without relying on a value function model, thereby reducing memory and computational demands 23.
6.4. Vision Fine-tuning:
For multimodal applications, Unsloth supports the fine-tuning of vision models, including Llama 3.2 Vision, Qwen 2 VL, and Pixtral 26. The documentation provides example notebooks that illustrate use cases such as analyzing radiography images, converting handwritten text to LaTeX, and performing general visual question answering 26. It is recommended that users prepare datasets with images of consistent dimensions to ensure efficient and effective training 26.
6.5. Continued Pretraining:
Unsloth enables continued pretraining, also known as continual fine-tuning, which is vital for adapting language models to new domains, languages, or data distributions that were not well-represented in their initial pretraining 27. The documentation offers notebooks and guidance for tasks such as text completion and learning new languages 27.
6.6. Chat Templates:
Unsloth simplifies the management of various chat template formats used by different language models through its get_chat_template function 28. This function supports popular templates like zephyr, chatml, mistral, llama, alpaca, vicuna, and unsloth, facilitating the easy formatting of conversational data for training and inference 28.
7. Using Unsloth: Running and Saving Models:
After fine-tuning a model with Unsloth, the next crucial steps involve running inference and saving the model for future use or deployment 30.
7.1. Running Inference:
Unsloth incorporates a natively optimized inference engine that provides a 2x speed enhancement 31. To activate this faster inference, users simply need to call the FastLanguageModel.for_inference(model) function after loading their fine-tuned model 16. The documentation also addresses a common error, NotImplementedError: A UTF-8 locale is required, which can occur in some environments, and offers a code snippet to resolve it 31.
7.2. Saving to GGUF:
To save a fine-tuned model in the GGUF format, which is compatible with various inference engines like Ollama, Jan AI, and Open WebUI, Unsloth provides the save_pretrained_gguf function 33. This function allows users to specify different quantization methods, such as q4_k_m, q8_0, and f16, to balance model size, inference speed, and accuracy according to their needs 33. The documentation also includes a detailed, step-by-step guide for manually converting models to the GGUF format using the llama.cpp toolkit.
7.3. Saving to Ollama:
Unsloth streamlines the process of exporting fine-tuned models for use with Ollama, a tool for running LLMs locally 16. This typically begins with installing Ollama, especially in cloud environments like Google Colab, followed by exporting the model to the GGUF format 34. A significant feature of Unsloth is its ability to automatically generate the Modelfile that Ollama requires. This file contains essential settings and includes the chat template that was used during the fine-tuning process 34.
7.4. Saving to vLLM:
For users who prefer the vLLM inference library, Unsloth offers methods to save fine-tuned models in compatible formats 35. This includes saving the merged model in 16-bit precision using the save_pretrained_merged function with the save_method = “merged_16bit” argument 35.
7.5. Saving LoRA Adapters:
Users who wish to save only the LoRA adapters, which are the small, trainable weight matrices added during fine-tuning, can do so using Unsloth 35. This can be achieved either by saving the model and tokenizer separately using their respective save_pretrained methods or by using the save_pretrained_merged function with the save_method = “lora” argument 35.
7.6. Troubleshooting Saving and Running Models:
The Unsloth documentation includes a dedicated troubleshooting section to address common issues that users may encounter when saving and running their fine-tuned models 14. This section provides guidance on resolving discrepancies in model performance between Unsloth and other inference platforms, which often stem from incorrect chat template usage or problems with start-of-sequence tokens 14. It also offers solutions for crashes or out-of-memory errors that might occur when saving models to GGUF or vLLM in 16-bit precision, suggesting reducing the maximum GPU memory usage during the saving process 14. Additionally, the troubleshooting section covers out-of-memory errors during the evaluation loop, recommending a reduction in the evaluation batch size 32, and reiterates the solution for the NotImplementedError related to UTF-8 locale 31.
8. Examples and Tutorials:
Unsloth AI provides a rich set of examples and tutorials to help users understand and implement various functionalities 12. These resources offer practical demonstrations and step-by-step guidance for a wide array of models and tasks.
-
A tutorial demonstrates how to train a reasoning model using GRPO, complete with specific instructions and code snippets 1.
-
A guide explains how to run the DeepSeek-R1 reasoning model locally using llama.cpp, including setup instructions and example commands 1.
-
Another tutorial details the effective execution of the QwQ-32B reasoning model, providing recommended settings and instructions for use with Ollama and llama.cpp 1.
-
A step-by-step guide is available on running and fine-tuning the Gemma 3 family of models using Unsloth 1.
-
A beginner-friendly tutorial illustrates the process of fine-tuning Llama-3 and deploying it locally using Ollama, including automatic Modelfile creation 40.
-
The documentation includes categorized lists of Unsloth notebooks available on Google Colab and Kaggle. These notebooks cover a variety of base models (Gemma, Llama, Phi, Mistral, Qwen), vision tasks, Ollama integration, continued pretraining, and specific use cases like text classification and conversational AI 12.
-
Specific examples within the notebooks, such as text classification, handling multiple datasets, using KTO, creating conversational agents, working with ChatML format, and performing text completion, are also highlighted 12.
-
Based on the Analytics Vidhya blog, a tutorial guides users through fine-tuning Llama 2 with Unsloth, covering dataset preparation, model training, inference, and LoRA integration 18. These examples and tutorials serve as valuable hands-on resources, enabling users to quickly apply Unsloth to their specific use cases and accelerate their learning process.
9. Troubleshooting:
The Unsloth documentation compiles various troubleshooting tips and solutions to assist users in resolving common issues 32.
-
Performance Discrepancies: If a model performs well in Unsloth but poorly on other platforms like Ollama or vLLM, the most likely cause is an incorrect chat template. Ensure the same template used during training is applied during inference. Also, check if the inference engine is adding or missing a start-of-sequence token 14.
-
Saving Crashes (GGUF/vLLM 16bit): Out-of-memory errors during saving can sometimes be resolved by reducing the maximum GPU usage. The default is 75%; try reducing it to 50% or lower 14.
-
Evaluation Loop Errors: Out-of-memory errors during evaluation often occur due to a batch size that is too high. Try setting the evaluation batch size to 2 or lower 32.
-
UTF-8 Locale Error: The NotImplementedError: A UTF-8 locale is required error can be fixed by running the following code in a new cell 31: Python import locale locale.getpreferredencoding = lambda: “UTF-8”
-
Saving to safetensors in Colab: To force saving models to the .safetensors format instead of .bin in Google Colab, use safe_serialization = None in the save_pretrained or push_to_hub methods 14.
10. Benchmarks:
Unsloth AI demonstrates significant performance advantages in terms of speed and memory efficiency, as highlighted by several benchmarks 21.
- When tested on the Alpaca Dataset with specific training parameters, Unsloth achieved a 2x faster training speed and over 70% reduction in VRAM usage for both Llama 3.1 (8B) and Llama 3.3 (70B) models compared to standard Hugging Face implementations using Flash Attention 2 21. The following table summarizes these findings:
Model | VRAM | 🦥Unsloth speed | 🦥VRAM reduction | Longer context | 😊Hugging Face + FA2 |
---|---|---|---|---|---|
Llama 3.3 (70B) | 80GB | 2x | >75% | 13x longer | 1x |
Llama 3.1 (8B) | 80GB | 2x | >70% | 12x longer | 1x |
-
Context length benchmarks for Llama 3.1 (8B) showed that Unsloth can handle significantly longer context lengths at various GPU VRAM capacities compared to Hugging Face + FA2. For instance, on a 12GB GPU, Unsloth achieved a maximum context length of 21,848, while Hugging Face + FA2 reached only 932 before running out of memory 21.
-
Similarly, tests with Llama 3.3 (70B) on 48GB and 80GB GPUs revealed that Unsloth supports substantially longer context lengths than the standard implementation 21. More detailed benchmarking results are available on the Unsloth AI blog and the Hugging Face blog 21.
11. Conclusion and Further Resources:
Unsloth AI stands out as a powerful framework for optimizing the fine-tuning of large language models, offering remarkable improvements in training speed and memory efficiency without compromising accuracy. Its broad compatibility with popular models and its support for advanced features like GRPO reasoning and vision fine-tuning make it a versatile tool for a wide range of applications. For further learning and updates, users are encouraged to visit the Unsloth AI blog and the official GitHub repository. Additionally, the Discord servers for Unsloth AI and Ollama provide platforms for community engagement and support. These resources offer valuable insights, additional tutorials, and direct interaction with the developers and other users.
🔧 Works cited
1. Unsloth Documentation: Welcome, accessed on March 22, 2025, https://docs.unsloth.ai/ 2. Unsloth AI - Open Source Fine-Tuning for LLMs, accessed on March 22, 2025, https://unsloth.ai/ 3. unslothai/unsloth: Finetune Llama 3.3, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory! - GitHub, accessed on March 22, 2025, https://github.com/unslothai/unsloth 4. Home · unslothai/unsloth Wiki · GitHub, accessed on March 22, 2025, https://github.com/unslothai/unsloth/wiki/Home/f961aac2ad938b243fe5ed58d1c3f8a2c9b8f128 5. accessed on December 31, 1969, https://docs.unsloth.ai/get-started/faq 6. accessed on December 31, 1969, https://docs.unsloth.ai/get-started/installing-and-updating 7. Installing + Updating - Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/get-started/installing-+-updating 8. accessed on December 31, 1969, https://docs.unsloth.ai/get-started/installing-and-updating/windows-installation 9. Install Unsloth on your Windows and Finetune LLM models Locally | by Vinod Kumar G R, accessed on March 22, 2025, https://medium.com/@vinodkumargr/install-unsloth-on-your-windows-and-finetune-llm-models-locally-20ebcce34014 10. Windows Installation | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/get-started/installing-+-updating/windows-installation 11. Unsloth now works for Windows! - Reddit, accessed on March 22, 2025, https://www.reddit.com/r/unsloth/comments/1j0jbbi/unsloth_now_works_for_windows/ 12. Unsloth Notebooks | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/get-started/unsloth-notebooks 13. accessed on December 31, 1969, https://docs.unsloth.ai/get-started/installing-and-updating/google-colab 14. Troubleshooting | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/basics/running-and-saving-models/troubleshooting 15. accessed on December 31, 1969, https://docs.unsloth.ai/get-started/installing-and-updating/updating 16. Fine-tuning Guide - Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/get-started/fine-tuning-guide 17. FAQ + Is Fine-tuning Right For Me? | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/get-started/beginner-start-here/faq-+-is-fine-tuning-right-for-me 18. How to Fine-tune Llama 2 with Unsloth? - Analytics Vidhya, accessed on March 22, 2025, https://www.analyticsvidhya.com/blog/2024/05/fine-tune-llama-2-with-unsloth/ 19. What Model Should I Use? | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/get-started/beginner-start-here/what-model-should-i-use 20. Datasets 101 | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/basics/datasets-101 21. Unsloth Benchmarks | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/basics/unsloth-benchmarks 22. All Our Models | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/get-started/all-our-models 23. Reasoning - GRPO & RL | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/basics/reasoning-grpo-and-rl 24. Tutorial: Train your own Reasoning model with GRPO | Unsloth …, accessed on March 22, 2025, https://docs.unsloth.ai/basics/reasoning-grpo-and-rl/tutorial-train-your-own-reasoning-model-with-grpo 25. Reinforcement Learning - DPO, ORPO & KTO | Unsloth …, accessed on March 22, 2025, https://docs.unsloth.ai/basics/reasoning-grpo-and-rl/reinforcement-learning-dpo-orpo-and-kto 26. Vision Fine-tuning | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/basics/vision-fine-tuning 27. Continued Pretraining | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/basics/continued-pretraining 28. Chat Templates | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/basics/chat-templates 29. Home · unslothai/unsloth Wiki - GitHub, accessed on March 22, 2025, https://github.com/unslothai/unsloth/wiki 30. Running & Saving Models | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/basics/running-and-saving-models 31. Inference | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/basics/running-and-saving-models/inference 32. Errors/Troubleshooting | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/basics/errors-troubleshooting 33. Saving to GGUF | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/basics/running-and-saving-models/saving-to-gguf 34. Saving to Ollama | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/basics/running-and-saving-models/saving-to-ollama 35. Saving to VLLM | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/basics/running-and-saving-models/saving-to-vllm 36. Tutorial: How to Run DeepSeek-R1 Locally | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/basics/tutorial-how-to-run-deepseek-r1-locally 37. DeepSeek-R1 Dynamic 1.58-bit | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/basics/tutorial-how-to-run-deepseek-r1-locally/deepseek-r1-dynamic-1.58-bit 38. Tutorial: How to Run QwQ-32B effectively | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/basics/tutorial-how-to-run-qwq-32b-effectively 39. Tutorial: How to Run & Fine-tune Gemma 3 | Unsloth Documentation, accessed on March 22, 2025, https://docs.unsloth.ai/basics/tutorial-how-to-run-and-fine-tune-gemma-3 40. Tutorial: How to Finetune Llama-3 and Use In Ollama | Unsloth …, accessed on March 22, 2025, https://docs.unsloth.ai/basics/tutorial-how-to-finetune-llama-3-and-use-in-ollama 41. Unsloth Guide: Optimize and Speed Up LLM Fine-Tuning - DataCamp, accessed on March 22, 2025, https://www.datacamp.com/tutorial/unsloth-guide-optimize-and-speed-up-llm-fine-tuning