🌌 Fine-Tuning Large Language Models with Unsloth on an RTX 3060
🌟 1. Introduction: Understanding Efficient LLM Fine-Tuning with Unsloth
The adaptation of pre-trained Large Language Models (LLMs) to specific tasks and domains through fine-tuning has become a cornerstone of modern natural language processing. Fine-tuning customizes a model's behavior and injects domain-specific knowledge, leading to better performance in targeted applications. However, the computational demands of fine-tuning these massive models can be substantial, often requiring high-end computing infrastructure. Unsloth emerges as a powerful solution to this challenge, providing a Python framework engineered to accelerate the fine-tuning of LLMs while drastically reducing memory consumption 1. This optimization is achieved by rewriting the underlying computations of the fine-tuning process, leveraging OpenAI's Triton language for highly efficient GPU operations and employing manual autograd for tighter memory management 1. By offering performance that is reportedly twice as fast as traditional methods and reducing VRAM usage by up to 70% without compromising accuracy, Unsloth makes fine-tuning larger models feasible on hardware like the RTX 3060 3.

For users with an NVIDIA RTX 3060 GPU, which typically features 12GB of VRAM, Unsloth's optimizations are particularly advantageous 6. The framework's support for 4-bit quantization further reduces memory requirements, enabling the fine-tuning of models that might otherwise exceed the GPU's capacity 7. This report provides a comprehensive guide to leveraging Unsloth for fine-tuning several prominent LLM families, including Llama, Gemma, Phi, and Granite, using a local instruction dataset. It covers environment setup, dataset preparation, detailed code examples for each model family, explanations of crucial parameters, and considerations for training and evaluation on the RTX 3060.

The performance gains offered by Unsloth over standard Hugging Face Transformers, even with Flash Attention 2 enabled, are significant 3. Benchmarks available in the Unsloth GitHub repository demonstrate these improvements across different models and datasets, providing tangible evidence of the framework's efficiency in both speed and memory reduction. Because Unsloth's core design prioritizes memory efficiency and speed, it directly addresses the constraints the RTX 3060 faces when working with large language models.
🌟 2. Setting Up Your Environment for Unsloth on an RTX 3060
Before embarking on the fine-tuning process, it is crucial to establish the correct development environment. For utilizing Unsloth on an NVIDIA RTX 3060 GPU, several hardware and software prerequisites must be met 6. Naturally, an NVIDIA RTX 3060 GPU is the primary hardware requirement. On the software side, Python, preferably version 3.10 or higher, needs to be installed on the system 3. Ensuring that the appropriate CUDA drivers are installed for PyTorch is also paramount for seamless GPU utilization 3.

The installation of Unsloth itself is straightforward via the Python package manager, pip. The command pip install unsloth will typically suffice for the basic installation 3. In certain scenarios, particularly if encountering performance bottlenecks or specific issues, the optional installation of xformers might be beneficial, as it can sometimes provide further speed enhancements 13.

While not strictly mandatory, setting up a Hugging Face token is highly recommended, especially if the user intends to fine-tune models that require access through a gated repository 2. This token acts as an authentication credential, allowing the user to download and utilize these models. Optionally, for users interested in meticulously tracking the progress of their fine-tuning runs, integrating Weights & Biases (W&B) can be invaluable 2. W&B is a platform that provides tools for experiment tracking, visualization, and collaboration in machine learning projects. Setting it up involves obtaining an API key from the W&B website and then logging in using Python code. The following snippets illustrate this process:
Python
import wandb
from kaggle_secrets import UserSecretsClient  # If using Kaggle

user_secrets = UserSecretsClient()
wb_token = user_secrets.get_secret("wandb")  # Replace "wandb" with your secret name if on Kaggle

wandb.login(key=wb_token)
run = wandb.init(project="Fine-tune YourModel", job_type="training", anonymous="allow")
This setup allows for real-time monitoring of training metrics and provides a comprehensive overview of the fine-tuning process. The need for specific versions of PyTorch and CUDA, along with potential additional installations like xformers or a Windows-specific Triton, means the environment setup requires careful attention to ensure compatibility. This is a crucial first step in successfully utilizing Unsloth. A quick sanity check, such as the one below, can confirm that PyTorch detects the GPU and reports the expected 12GB of VRAM before any fine-tuning begins.
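The snippet below is a minimal sketch using standard PyTorch calls; the exact values it prints depend on the local driver and CUDA installation:
Python
import torch

# Confirm the GPU is visible to PyTorch
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("GPU:", props.name)  # Should report an RTX 3060
    print("VRAM (GB):", round(props.total_memory / 1024**3, 1))  # Roughly 12 GB on the RTX 3060
    print("BF16 supported:", torch.cuda.is_bf16_supported())  # Ampere GPUs typically report True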
🌟 3. Preparing Your Local Instruction Dataset for Fine-Tuning
While many tutorials and examples load datasets directly from the Hugging Face Hub, this guide assumes a local instruction dataset. For effective instruction fine-tuning with Unsloth, that dataset needs to adhere to a structured format, typically comprising fields for "instruction," an optional "input," and the corresponding "output" 16. Common file formats like JSON or CSV are well suited for this purpose and can be readily processed using libraries such as Pandas 14. Loading a local dataset can be achieved with the datasets library's load_dataset function, specifying the 'csv' or 'json' option and providing the path to the local file 14. Alternatively, if the data is already in a Pandas DataFrame, it can be converted into a Hugging Face Dataset object using datasets.Dataset.from_pandas 16. Here are illustrative code snippets:
Python
from datasets import load_dataset, Dataset
import pandas as pd
import yaml  # For JSON/YAML

# Loading from a CSV file
dataset = load_dataset("csv", data_files="/path/to/your/local_dataset.csv")

# Loading from a JSON/YAML file
with open("/path/to/your/local_dataset.json", "r") as f:
    json_data = yaml.safe_load(f.read())
df = pd.DataFrame(json_data)
dataset = Dataset.from_pandas(df)
Once the dataset is loaded, it needs to be formatted into a specific structure that Unsloth expects. This typically involves creating a new column named “text” which concatenates the instruction, input (if present), and output into a single prompt string, often following a template reminiscent of the Alpaca format 2. A generic example of a formatting function is shown below:
Python
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs = examples["input"]
    outputs = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        text = f"### Instruction: {instruction}\n"
        if input:
            text += f"### Input: {input}\n"
        text += f"### Response: {output}"
        texts.append(text)
    return {"text": texts}
This function can then be applied to the entire dataset using the dataset.map method:
Python
dataset = dataset.map(formatting_prompts_func, batched=True)
It is also crucial to append an end-of-sequence (EOS) token to the end of each formatted text string. This signals to the model when the generation should stop 2. The EOS token can be retrieved from the tokenizer:
Python
EOS_TOKEN = tokenizer.eos_token

def formatting_prompts_func_with_eos(examples):
    # ... (previous formatting logic) ...
    for instruction, input, output in zip(instructions, inputs, outputs):
        # ...
        text += EOS_TOKEN
        texts.append(text)
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func_with_eos, batched=True)
Finally, it is often beneficial to split the dataset into training and validation sets. This allows for monitoring the model’s performance on unseen data during training, helping to prevent overfitting. This can be done using the dataset.train_test_split method:
Python
train_val_dataset = dataset.train_test_split(test_size=0.1)
train_dataset = train_val_dataset["train"]
eval_dataset = train_val_dataset["test"]
The consistent requirement to format the dataset into a “text” column following a specific prompt template highlights its importance for Unsloth’s supervised fine-tuning process. This structure enables the model to learn the desired task effectively. Furthermore, understanding the different dataset formats supported by Unsloth, such as “Instruct,” allows the user to align their local data structure accordingly.
🌟 4. Comprehensive Guide and Code Examples for Fine-Tuning Llama Models with Unsloth
Fine-tuning Llama family models with Unsloth on an RTX 3060 involves a series of well-defined steps. The initial stage is loading the pre-trained Llama model and its corresponding tokenizer. This is typically done using the FastLanguageModel.from_pretrained function from the unsloth library 2. A common practice for memory efficiency on the RTX 3060 is to load the model in a 4-bit quantized format. Here’s an example:
Python
from unsloth import FastLanguageModel
import torch

model_name = "unsloth/Meta-Llama-3.1-8B-bnb-4bit"  # Example Llama model
max_seq_length = 2048  # Recommended starting value
dtype = None  # Auto-detect data type
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
The key parameters here are model_name, which specifies the Hugging Face repository of the Llama model; max_seq_length, which determines the maximum length of input sequences (a value of 2048 is often a good starting point for the RTX 3060); dtype, which can be set to None for automatic detection; and load_in_4bit, set to True to enable 4-bit quantization for reduced memory usage 7. Next, Low-Rank Adaptation (LoRA) is typically employed for efficient fine-tuning. LoRA involves freezing the pre-trained model weights and adding a small number of trainable rank-decomposition matrices. This significantly reduces the number of trainable parameters. A LoRA configuration can be created using FastLanguageModel.get_peft_model:
Python
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # Rank of the LoRA matrices
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # Commonly targeted modules
    lora_alpha=16,  # Scaling factor for LoRA weights
    lora_dropout=0,  # Dropout probability for LoRA layers
    bias="none",  # Bias type for LoRA layers
    use_gradient_checkpointing="unsloth",  # Optimized gradient checkpointing for memory saving
    random_state=3407,  # For reproducibility
    use_rslora=False,  # Rank-Stabilized LoRA
    loftq_config=None,  # LoftQ configuration
)
Key LoRA parameters include r (the rank of the adaptation matrices, influencing memory and accuracy), target_modules (specifying which layers to apply LoRA to), lora_alpha (a scaling factor), lora_dropout (for regularization), and bias. Utilizing use_gradient_checkpointing=“unsloth” is highly recommended on the RTX 3060 as it provides further memory reduction, especially for longer sequences 1. The training process is configured using TrainingArguments from the transformers library:
Python
from transformers import TrainingArguments

output_dir = "llama_fine_tuned"  # Directory to save the fine-tuned model
per_device_train_batch_size = 2  # Start with a small batch size
gradient_accumulation_steps = 4  # Increase to simulate larger batch sizes
max_steps = 60  # For a quick test, adjust for full training
learning_rate = 2e-4  # Common learning rate
fp16 = not torch.cuda.is_bf16_supported()  # Use FP16 if BF16 is not supported
bf16 = torch.cuda.is_bf16_supported()  # Use BF16 if supported
logging_steps = 1
save_steps = 10
seed = 3407

training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    max_steps=max_steps,
    learning_rate=learning_rate,
    fp16=fp16,
    bf16=bf16,
    logging_steps=logging_steps,
    save_steps=save_steps,
    seed=seed,
)
Important training parameters include output_dir (where the fine-tuned model will be saved), per_device_train_batch_size (start with a small value like 2 due to VRAM constraints), gradient_accumulation_steps (increase to simulate larger effective batch sizes), max_steps or num_train_epochs (to control the duration of training), learning_rate, and the precision settings (fp16 or bf16).
Finally, the SFTTrainer from the trl library is initialized and the training is started:
Python
from trl import SFTTrainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,  # Your prepared training dataset
    dataset_text_field="text",  # The column containing the formatted text
    max_seq_length=max_seq_length,
    args=training_args,
)
trainer.train()
The dataset_text_field parameter should point to the “text” column created during dataset preparation. The snippets consistently demonstrate the use of 4-bit quantization and LoRA with similar parameter settings for Llama models, indicating a common and effective configuration for memory-efficient fine-tuning with Unsloth. Unsloth’s optimized gradient checkpointing can further reduce VRAM usage, which is particularly beneficial for RTX 3060 users.
🌟 5. Comprehensive Guide and Code Examples for Fine-Tuning Gemma Models with Unsloth
Fine-tuning Gemma models with Unsloth on an RTX 3060 follows a similar structure to Llama models but with some specific considerations. Loading a pre-trained Gemma model, such as Gemma 3 4B, in a 4-bit quantized format is done using FastLanguageModel.from_pretrained 3:
Python
from unsloth import FastLanguageModel
model_name = "unsloth/gemma-3-4B-it-bnb-4bit"  # Example Gemma model
max_seq_length = 2048
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit,
)
Gemma models, especially Gemma 3, have specific requirements regarding the chat template format 9. Unsloth automatically handles some of these specifics, such as the double Beginning-of-Sequence (BOS) tokens present in Gemma 3 20. However, the user should ensure their local instruction dataset is formatted according to the correct chat template for Gemma. Setting up the LoRA configuration for Gemma models is similar to Llama:
Python
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)
The TrainingArguments and SFTTrainer initialization also follow the same pattern as with Llama models. The key difference lies in the model name and potentially the specific formatting of the input data to align with Gemma’s expected chat format. Unsloth’s blog posts highlight significant performance improvements when fine-tuning Gemma 3, demonstrating the efficiency of using the framework with this model family on resource-constrained hardware 21.
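Returning to the chat-template point above: if the local dataset should follow Gemma's conversational format rather than the Alpaca-style template from Section 3, the tokenizer's built-in chat template can render each example. The sketch below is a minimal illustration assuming the Gemma tokenizer exposes its template through the standard Hugging Face apply_chat_template API and that the dataset uses the "instruction"/"output" fields described earlier; it is not the only valid formatting approach.
Python
def formatting_prompts_gemma(examples):
    texts = []
    for instruction, output in zip(examples["instruction"], examples["output"]):
        # Build a two-turn conversation; Gemma's template maps the assistant role to its "model" turn
        messages = [
            {"role": "user", "content": instruction},
            {"role": "assistant", "content": output},
        ]
        # tokenize=False returns the formatted prompt string (Unsloth handles Gemma 3's BOS specifics)
        text = tokenizer.apply_chat_template(messages, tokenize=False)
        texts.append(text)
    return {"text": texts}

dataset = dataset.map(formatting_prompts_gemma, batched=True)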
🌟 6. Comprehensive Guide and Code Examples for Fine-Tuning Phi Models with Unsloth
Fine-tuning Phi family models, such as Phi-3 Mini, with Unsloth on an RTX 3060 generally aligns with the process for Llama and Gemma 3. The initial step involves loading the pre-trained Phi model and tokenizer:
Python
from unsloth import FastLanguageModel
model_name = "unsloth/phi-3-mini-4k-bnb-4bit"  # Example Phi model
max_seq_length = 2048
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit,
)
The LoRA configuration for Phi models can be set up as follows:
Python
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "dense",
                    "dense_h_to_4h", "dense_4h_to_h"],  # Target modules specific to Phi
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)
Note that the target_modules can differ for Phi models compared to Llama and Gemma because of architectural variations; if an adapter fails to attach, inspect the loaded model's layer names (for example, by printing model.named_modules()) and adjust the list accordingly. The TrainingArguments and SFTTrainer are then configured and used in the same way. While Unsloth accelerates the fine-tuning process for Phi models, the data preparation step might require similar manual effort as with standard Hugging Face practices 12.
🌟 7. Comprehensive Guide and Code Examples for Fine-Tuning Granite Models with Unsloth
Fine-tuning Granite models, such as Granite 3.2, with Unsloth on an RTX 3060 follows the general pattern established for other model families 3. Loading the pre-trained Granite model and tokenizer is the first step:
Python
from unsloth import FastLanguageModel
model_name = "unsloth/granite-3.2-2b-instruct-bnb-4bit"  # Example Granite model
max_seq_length = 2048
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit,
)
Granite models come in various versions, including instruct and vision models 25. The LoRA configuration for Granite models can be set up as follows:
Python
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "dense",
                    "dense_h_to_4h", "dense_4h_to_h"],  # Target modules for Granite
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)
Similar to Phi, the target_modules might need adjustment based on the specific Granite model’s architecture. The subsequent steps of setting up TrainingArguments and initializing the SFTTrainer remain consistent. Unsloth’s support for IBM’s Granite models has been explicitly mentioned, indicating compatibility 15. While a dedicated Unsloth tutorial for Granite might not be explicitly present in the provided snippets, the general fine-tuning process demonstrated for other models should be applicable, given Unsloth’s broad model support 3.
🌟 8. Deep Dive into Key Unsloth Fine-Tuning Parameters and Configurations
A thorough understanding of the key parameters in Unsloth’s fine-tuning process is essential for optimizing performance and managing resources, especially on an RTX 3060. The FastLanguageModel.from_pretrained function offers several important parameters.
model_name dictates which pre-trained model to load. max_seq_length controls the context length; while models might support longer contexts, a value of 2048 is often recommended for initial testing on the RTX 3060 to balance context and memory usage 7.
dtype can be left as None for automatic data type detection, but for newer GPUs, torch.float16 or torch.bfloat16 might be considered. The LoRA configuration parameters within FastLanguageModel.get_peft_model are equally significant. r (rank) determines the dimensionality of the low-rank matrices; higher values can improve accuracy on complex tasks but increase memory usage.
target_modules specifies the layers to apply LoRA to; the default set including attention projections is generally recommended. lora_alpha is a scaling factor for the LoRA weights. lora_dropout helps prevent overfitting. bias is best set to “none” for optimized training.
use_gradient_checkpointing=“unsloth” is vital for memory saving on the RTX 3060. random_state ensures reproducibility. Within TrainingArguments, per_device_train_batch_size and gradient_accumulation_steps are critical for managing the RTX 3060’s memory. Starting with a small batch size (e.g., 2) and increasing the accumulation steps can simulate larger batch sizes without exceeding VRAM 7.
max_steps defines the total number of training steps, while num_train_epochs specifies the number of times to iterate over the dataset. learning_rate controls the step size during optimization; common values range from 1e-4 to 5e-5 7.
fp16 and bf16 enable mixed-precision training, which can speed up training and reduce memory usage on compatible GPUs. The Unsloth documentation provides an extensive encyclopedia of LoRA parameters and key fine-tuning parameters, offering recommended settings and explaining their impact 27. This resource is invaluable for users looking to fine-tune their models effectively. Furthermore, general recommendations for training parameters such as batch size, gradient accumulation, learning rate, and epochs are available across various snippets, providing a good starting point for experimentation on the RTX 3060 7.
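As a concrete illustration of how the batch-size parameters interact, the effective batch size is simply the product of per_device_train_batch_size and gradient_accumulation_steps. The short sketch below shows how a small per-device batch combined with accumulation reproduces the optimizer behavior of a larger batch while keeping activation memory low; the specific numbers are illustrative starting points, not measurements.
Python
# Effective batch size = per-device batch size x gradient accumulation steps
per_device_train_batch_size = 2   # What must fit in the RTX 3060's 12 GB of VRAM at once
gradient_accumulation_steps = 4   # Gradients are accumulated over this many micro-batches

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 8 -> same optimizer step as batch size 8, at a fraction of the activation memory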
🌟 9. Training and Evaluation Considerations for RTX 3060
Given the RTX 3060's 12GB of VRAM, careful memory management is paramount during the fine-tuning process. Leveraging 4-bit quantization and Unsloth's inherent memory optimizations are the first steps. It is advisable to begin with smaller values for per_device_train_batch_size (e.g., 2) and compensate by increasing gradient_accumulation_steps to achieve a desired effective batch size 7. For initial testing and to conserve memory, setting max_seq_length to 2048 or even lower might be prudent 2. Crucially, ensure that use_gradient_checkpointing="unsloth" is enabled in the LoRA configuration 1.

Monitoring the training progress is typically done by observing the training loss reported by the SFTTrainer 7. If Weights & Biases was set up, it can provide more detailed insights into various training metrics 2. For evaluating the fine-tuned model, a basic approach is manual evaluation by engaging in conversations or providing prompts to the model and assessing the quality of its responses 7. While using dedicated evaluation datasets and metrics is possible, it might be memory-intensive on an RTX 3060. Referencing snippets that demonstrate example inference code after fine-tuning can be helpful in understanding how to interact with the model 9.
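For such manual spot checks, a lightweight approach is to switch the model into Unsloth's inference mode and stream a few generations. The sketch below assumes the Alpaca-style prompt format used earlier in this guide; the prompt wording and generation settings are illustrative, not prescriptive.
Python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Enable Unsloth's faster inference path for the fine-tuned model
FastLanguageModel.for_inference(model)

prompt = "### Instruction: Summarize the benefits of LoRA fine-tuning.\n### Response: "
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Stream tokens to the console as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=256)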
🌟 10. Saving and Utilizing Your Fine-Tuned Language Models
Once the fine-tuning process is complete, saving the model is essential for future use. The LoRA adapters, which contain the fine-tuned weights, can be saved using the model.save_pretrained method 1. This saves only the small set of modified weights. For easier deployment, the LoRA adapters can be merged with the original base model weights using model.save_pretrained_merged 1, creating a larger but self-contained model. Optionally, the saved model can be pushed to the Hugging Face Hub using trainer.push_to_hub_merged or model.push_to_hub 14. This requires a Hugging Face account and an API token 9.

To use the fine-tuned model for inference, it can be loaded using FastLanguageModel.for_inference 18, and text can be generated using the loaded model and tokenizer 13. Text streamers can be employed for real-time output during generation 2. Unsloth also supports converting the fine-tuned model to other formats, such as GGUF, which is suitable for local inference with tools like llama.cpp or Ollama 12. The GGUF format is optimized for CPU inference 1. Detailed instructions for this conversion can be found in Unsloth's documentation or other relevant resources 12. The flexibility in saving formats allows users to deploy their fine-tuned models in various environments.
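To make these options concrete, the sketch below shows three common saving paths: keeping only the LoRA adapters, merging them into the base weights, and exporting a GGUF file for llama.cpp or Ollama. The directory names and the q4_k_m quantization choice are illustrative assumptions; consult the Unsloth documentation for the full set of save_method and quantization_method options.
Python
# 1. Save just the LoRA adapters (small; the base model is still needed at load time)
model.save_pretrained("llama_finetuned_lora")
tokenizer.save_pretrained("llama_finetuned_lora")

# 2. Merge the adapters into the base weights for a self-contained model
model.save_pretrained_merged("llama_finetuned_merged", tokenizer, save_method="merged_16bit")

# 3. Export to GGUF for local inference with llama.cpp or Ollama
model.save_pretrained_gguf("llama_finetuned_gguf", tokenizer, quantization_method="q4_k_m")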
🌟 11. Conclusion: Empowering Efficient LLM Fine-Tuning
Unsloth offers a compelling solution for efficiently fine-tuning large language models on resource-constrained hardware like the NVIDIA RTX 3060. By providing significant speedups and memory reductions, Unsloth democratizes access to advanced NLP techniques. The fine-tuning process involves setting up the environment, preparing the local instruction dataset, loading the desired model family (Llama, Gemma, Phi, or Granite), configuring LoRA, defining training arguments, and initiating the training. Careful consideration of parameters like max_seq_length, batch size, gradient accumulation, and the use of Unsloth’s optimized gradient checkpointing are crucial for successful training on the RTX 3060. After fine-tuning, the models can be saved in various formats and utilized for inference.
🔧 Works cited
1. Finetuning with Unsloth: The Game-Changer in LLM Fine-tuning | by Sridevi Panneerselvam, accessed on March 19, 2025, https://medium.com/@sridevi17j/finetuning-with-unsloth-the-game-changer-in-llm-fine-tuning-e32262701195
2. Unsloth Guide: Optimize and Speed Up LLM Fine-Tuning - DataCamp, accessed on March 19, 2025, https://www.datacamp.com/tutorial/unsloth-guide-optimize-and-speed-up-llm-fine-tuning
3. unslothai/unsloth: Finetune Llama 3.3, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory! - GitHub, accessed on March 19, 2025, https://github.com/unslothai/unsloth
4. Unsloth Documentation: Welcome, accessed on March 19, 2025, https://docs.unsloth.ai/
5. Fine-Tuning Large Language Models with Unsloth | by Kushal V | Medium, accessed on March 19, 2025, https://medium.com/@kushalvala/fine-tuning-large-language-models-with-unsloth-380216a76108
6. Mistral fine-tuning resume error. · Issue #600 · unslothai/unsloth - GitHub, accessed on March 19, 2025, https://github.com/unslothai/unsloth/issues/600
7. Fine-tuning Guide - Unsloth Documentation, accessed on March 19, 2025, https://docs.unsloth.ai/get-started/fine-tuning-guide
8. Tutorial: How to Finetune Llama-3 and Use In Ollama - Unsloth Documentation, accessed on March 19, 2025, https://docs.unsloth.ai/basics/tutorial-how-to-finetune-llama-3-and-use-in-ollama
9. Finetuning Llama-3.1 8b model Using Unsloth - YouTube, accessed on March 19, 2025, https://www.youtube.com/watch?v=dFj0xVOHz68
10. A Step-by-Step Guide to Fine-Tuning Llama 7B with Unsloth and LoRA - Medium, accessed on March 19, 2025, https://medium.com/@sohanm10/a-step-by-step-guide-to-fine-tuning-llama-7b-with-unsloth-and-lora-bc00a90899a2
11. Entreprenerdly/unsloth - GitHub, accessed on March 19, 2025, https://github.com/Entreprenerdly/unsloth
12. Fast Fine Tuning with Unsloth - YouTube, accessed on March 19, 2025, https://www.youtube.com/watch?v=dMY3dBLojTk
13. Fine-Tune PHI-3.5 Model on Custom Dataset Using Free Google Colab with Unsloth, accessed on March 19, 2025, https://m.youtube.com/watch?v=iClsWNcBN8U
14. Fine-tuning Llama 3.2 Using Unsloth - KDnuggets, accessed on March 19, 2025, https://www.kdnuggets.com/fine-tuning-llama-using-unsloth
15. Gemma 3 now available in Unsloth! - Reddit, accessed on March 19, 2025, https://www.reddit.com/r/unsloth/comments/1jb8ts0/gemma_3_now_available_in_unsloth/
16. Fine-tuning made easy with Unsloth and Colab | by Ahamed Musthafa R S | Medium, accessed on March 19, 2025, https://medium.com/@amrstech/fine-tuning-made-easy-with-unsloth-and-colab-e0993f3f4c07
17. Datasets 101 - Unsloth Documentation, accessed on March 19, 2025, https://docs.unsloth.ai/basics/datasets-101
18. How to Fine-tune Llama 2 with Unsloth? - Analytics Vidhya, accessed on March 19, 2025, https://www.analyticsvidhya.com/blog/2024/05/fine-tune-llama-2-with-unsloth/
19. Fine-Tune Gemma-3 on Custom Dataset Locally: Step-by-Step Easy Tutorial - YouTube, accessed on March 19, 2025, https://www.youtube.com/watch?v=TWL10n8ZFCQ
20. Gemma 3 Fine-tuning now in Unsloth - 1.6x faster with 60% less VRAM - Reddit, accessed on March 19, 2025, https://www.reddit.com/r/LocalLLaMA/comments/1jba8c1/gemma_3_finetuning_now_in_unsloth_16x_faster_with/
21. Fine-tune Gemma 3 with Unsloth, accessed on March 19, 2025, https://www.unsloth.ai/blog/gemma3
22. Fine-tune Gemma 3 with Unsloth, accessed on March 19, 2025, https://unsloth.ai/blog/gemma3
23. Unsloth Fine-tuning Notebooks for Google Colab, Kaggle, Hugging Face and more - GitHub, accessed on March 19, 2025, https://github.com/unslothai/notebooks
24. Tutorial: How to Run Gemma 3 effectively - Unsloth Documentation, accessed on March 19, 2025, https://docs.unsloth.ai/basics/tutorial-how-to-run-gemma-3-effectively
25. unsloth/granite-3.2-2b-instruct - Hugging Face, accessed on March 19, 2025, https://huggingface.co/unsloth/granite-3.2-2b-instruct
26. unsloth/granite-vision-3.2-2b - Hugging Face, accessed on March 19, 2025, https://huggingface.co/unsloth/granite-vision-3.2-2b
27. LoRA Parameters Encyclopedia - Unsloth Documentation, accessed on March 19, 2025, https://docs.unsloth.ai/get-started/beginner-start-here/lora-parameters-encyclopedia
28. Fine-Tuning Quantized Llama 2 on a Single RTX 3060 12 GB | Innova company blog, accessed on March 19, 2025, https://medium.com/innova-technology/efficient-fine-tuning-of-quantized-llama-2-da383228ee1e
29. Home · unslothai/unsloth Wiki - GitHub, accessed on March 19, 2025, https://github.com/unslothai/unsloth/wiki