Technical Documentation

Fine-Tuning Large Language Models with TRL and SFTTrainer on an RTX 3060

Technical guide covering **fine-tuning large language models with TRL and SFTTrainer on an RTX 3060**

👤 Author: Cosmic Lounge AI Team
📅 Updated: 6/1/2025
⏱️ Read Time: 13 min
Topics: #llm #ai #model #fine-tuning #training #gpu #cuda #pytorch #introduction #design


🌌 Fine-Tuning Large Language Models with TRL and SFTTrainer on an RTX 3060



🌟 1. Introduction to Fine-Tuning and the TRL Library

🚀 Welcome to this comprehensive guide! This section gives you the foundational knowledge you need. Fine-tuning large language models (LLMs) has become a crucial technique for adapting pre-trained models to specific downstream tasks or domains. This process leverages the extensive knowledge already embedded within these models and refines it with a smaller, task-specific dataset, leading to improved performance and more relevant outputs 1.

The Transformer Reinforcement Learning (TRL) library, developed by Hugging Face, provides a suite of tools and trainers designed to simplify the post-training of foundation models 3. Built on top of the popular 🤗 Transformers ecosystem, TRL offers high-level abstractions for advanced techniques such as Supervised Fine-Tuning (SFT), Proximal Policy Optimization (PPO), and Direct Preference Optimization (DPO) 3. The library is engineered for efficiency and scalability, leveraging 🤗 Accelerate to support training across various hardware setups, from single GPUs to multi-node clusters 3. Furthermore, its integration with 🤗 PEFT (Parameter-Efficient Fine-Tuning) enables training on large models with limited computational resources through techniques like quantization and Low-Rank Adaptation (LoRA) 3. Among the trainers offered by TRL, the SFTTrainer stands out for its user-friendly API, which facilitates supervised fine-tuning with just a few lines of code 3.



🌟 2. Setting Up the Development Environment

Before embarking on the fine-tuning process, it is essential to set up the development environment with the necessary libraries. This typically involves installing PyTorch, the foundational deep learning framework, along with the Hugging Face libraries transformers, datasets, accelerate, peft, and trl 7. Additionally, ensuring the correct CUDA drivers are installed is crucial for performance on NVIDIA GPUs. For users with newer NVIDIA GPUs (Ampere architecture or later, which includes the RTX 3060), installing Flash Attention can significantly speed up training and reduce memory usage by optimizing the attention computation 7. This is particularly beneficial when working with longer sequences. The installation command typically involves pip along with specific flags to ensure compatibility with the hardware and software environment 7.

To manage and track experiments, as well as to share fine-tuned models, integrating with platforms like Weights & Biases (W&B) is often recommended 9. This involves installing the wandb library and setting up an account; environment variables can be used to configure the W&B project name and logging behavior 9. In some cases, particularly when dealing with very recent models or requiring the latest features, installing the TRL library from its source on GitHub might be necessary 3. This can be done using pip install git+https://github.com/huggingface/trl.git. For general usage, installing the stable release via pip install trl is usually sufficient 3. Once everything is installed, a quick sanity check like the one below confirms the GPU is visible.
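
A minimal verification snippet, assuming the packages above are installed; it confirms PyTorch can see the RTX 3060 and reports the library versions in use:

```python
import torch
import transformers
import datasets
import accelerate
import peft
import trl

# Confirm the GPU is visible to PyTorch and report library versions
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    vram_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"VRAM: {vram_gib:.1f} GiB")
print(f"transformers {transformers.__version__}, trl {trl.__version__}, peft {peft.__version__}")
```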



🌟 3. Comprehensive Python Code Examples for Fine-Tuning phi4

  • 3.1. Loading the Model and Tokenizer

The first step in fine-tuning any language model is to load the pre-trained model and its associated tokenizer. Hugging Face's transformers library provides the AutoModelForCausalLM and AutoTokenizer classes for this purpose, which can automatically infer the model architecture and tokenizer from the provided model name or path 4. For the phi4 model, its identifier on the Hugging Face Hub would be used. To address the memory constraints of an RTX 3060, loading the model in a quantized format, such as 4-bit, is highly recommended. This significantly reduces the model's memory footprint, allowing for larger batch sizes or the fine-tuning of larger models than would otherwise be possible 11. Libraries like bitsandbytes or Unsloth's FastLanguageModel can facilitate this quantization 7.

⚡ Code Example 1 (Loading phi4 with transformers):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/phi-2"  # Or a specific phi4 variant if available

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# num_parameters() is a method of the model itself, not of model.config
print(f"Model loaded with {model.num_parameters():,} parameters.")
```

⚡ Code Example 2 (Loading phi4 in 4-bit with Unsloth):

```python
from unsloth import FastLanguageModel

model_name = "unsloth/Phi-4"  # Example phi4 variant available through Unsloth
max_seq_length = 2048         # Adjust as needed
load_in_4bit = True           # Reduces memory usage

# Load the model in 4-bit for a significant memory reduction
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit,
)
print("Phi-4 model loaded in 4-bit with Unsloth.")
```

  • 3.2. Preparing the Dataset

The SFTTrainer in TRL expects the training data in a specific format. For standard language modeling tasks, the dataset should ideally have a column named "text" containing the sequences to be used for fine-tuning 15. For conversational fine-tuning, the dataset might instead use a "messages" column holding a list of dictionaries, each representing one turn in the conversation with "role" and "content" keys 4. Loading a dataset from the Hugging Face Hub is straightforward using the load_dataset() function from the datasets library 3, and numerous datasets suitable for instruction fine-tuning are available, such as the Databricks Dolly 15k dataset 12.

Formatting the dataset to match the phi4 model's expected input format is crucial. When using libraries like Unsloth, the tokenizer often carries a specific chat template that needs to be applied to structure conversations correctly 11. The tokenizer.apply_chat_template() method formats a conversation according to this template. This is particularly important for multi-turn conversations, where the roles of the user and assistant need to be clearly delineated 11.

The dataset_text_field parameter in the SFTTrainer constructor specifies the name of the dataset column that contains the text to train on 6; this tells the trainer which part of the dataset to process. For memory efficiency, especially when dealing with shorter sequences, the SFTTrainer supports dataset packing via the packing=True option 6. This technique uses the ConstantLengthDataset utility to pack multiple short examples into a single longer sequence up to max_seq_length, reducing padding and improving training efficiency 6.

⚡ Code Example 3 (Loading and formatting dataset for phi4 with Unsloth):

```python
from datasets import load_dataset
from unsloth.chat_templates import get_chat_template

dataset_name = "databricks/databricks-dolly-15k"  # Example instruction dataset
dataset = load_dataset(dataset_name, split="train")

# Assuming `tokenizer` is already loaded from the previous step
tokenizer = get_chat_template(tokenizer, chat_template="phi-4")

def formatting_prompts_func(examples):
    # apply_chat_template expects a list of {"role", "content"} messages
    texts = [
        tokenizer.apply_chat_template(
            [
                {"role": "user", "content": instruction},
                {"role": "assistant", "content": response},
            ],
            tokenize=False,
            add_generation_prompt=False,
        )
        for instruction, response in zip(examples["instruction"], examples["response"])
    ]
    return {"text": texts}

processed_dataset = dataset.map(formatting_prompts_func, batched=True)
```

  • 3.3. Basic Fine-Tuning Implementation with SFTTrainer

The SFTTrainer class from the trl library simplifies the supervised fine-tuning process 3. It requires a set of TrainingArguments to configure the training parameters 9; these control aspects such as the output directory for saving the model, the batch size used during training, the learning rate, the number of training epochs, and the directory for saving logs 9. After defining the TrainingArguments, the SFTTrainer is initialized with the pre-trained model, the processed training dataset, the tokenizer, and the training arguments 3, with the dataset_text_field parameter specifying the column containing the training text 6. Finally, the trainer.train() method initiates the fine-tuning process 3, and once training is complete, the fine-tuned model can be saved using trainer.save_model() 9.

⚡ Code Example 4 (Basic SFTTrainer usage for phi4):

```python
from trl import SFTTrainer
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./phi4_finetuned",
    per_device_train_batch_size=2,  # Adjust based on your GPU memory
    gradient_accumulation_steps=8,  # Effectively increases the batch size
    learning_rate=2e-4,
    num_train_epochs=3,
    logging_dir="./logs",
    fp16=True,  # Mixed precision for memory efficiency and speed
)

trainer = SFTTrainer(
    model=model,
    train_dataset=processed_dataset,
    tokenizer=tokenizer,
    args=training_args,
    dataset_text_field="text",
)

trainer.train()
trainer.save_model("./phi4_finetuned")
```

  • 3.4. Exploring Key SFTTrainer Configuration Options

Several parameters within TrainingArguments are particularly relevant for fine-tuning on an RTX 3060 with limited memory 4. The per_device_train_batch_size determines how many training examples are processed in parallel on each GPU; given the RTX 3060's memory capacity, this value usually needs to stay relatively low to avoid out-of-memory errors. gradient_accumulation_steps simulates a larger effective batch size by accumulating gradients over multiple forward and backward passes before each optimizer step 2, which is a crucial technique when per_device_train_batch_size is limited by GPU memory.

Enabling gradient_checkpointing can significantly reduce memory usage during training by recomputing activations during the backward pass instead of storing them all 9. This comes at the cost of a slight increase in training time but can be essential for fitting larger models into memory. Using mixed precision training with fp16=True or bf16=True (if supported by the GPU) can reduce memory usage and potentially speed up training by using lower-precision floating-point numbers for computations 9.

The max_seq_length parameter, set on the SFTTrainer itself, controls the maximum length of input sequences. Shorter sequence lengths require less memory, so adjusting this value based on the dataset characteristics can be beneficial. As mentioned earlier, the packing=True option in SFTTrainer can improve training efficiency by packing shorter sequences together, which can indirectly help with memory usage by reducing padding 6. A sketch combining these options follows.
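
A minimal sketch combining these options, assuming the model, tokenizer, and processed_dataset from the earlier examples; all parameter values below are illustrative starting points, not tuned settings:

```python
from trl import SFTTrainer
from transformers import TrainingArguments

# Note: newer TRL releases move max_seq_length/packing/dataset_text_field
# into SFTConfig; the keyword style below matches the API used in this guide.
training_args = TrainingArguments(
    output_dir="./phi4_memopt",
    per_device_train_batch_size=1,    # Keep small on a memory-limited card
    gradient_accumulation_steps=16,   # Effective batch size of 16
    gradient_checkpointing=True,      # Recompute activations to save memory
    fp16=True,                        # Mixed precision
    learning_rate=2e-4,
    num_train_epochs=1,
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,
    train_dataset=processed_dataset,
    tokenizer=tokenizer,
    args=training_args,
    dataset_text_field="text",
    max_seq_length=1024,              # Shorter sequences use less memory
    packing=True,                     # Pack short examples into full-length sequences
)
trainer.train()
```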

  • 3.5. Implementing Parameter-Efficient Fine-Tuning (PEFT) with LoRA

Parameter-Efficient Fine-Tuning (PEFT) techniques address the challenge of fine-tuning large models with limited computational resources by training only a small subset of the model's parameters while keeping the majority frozen 1. Low-Rank Adaptation (LoRA) is a popular PEFT technique that achieves this by adding trainable low-rank matrices alongside the existing layers of the pre-trained model 2. To use LoRA, the peft library provides the LoraConfig class, which specifies the adapter configuration: the rank (r), the scaling factor (lora_alpha), the dropout probability (lora_dropout), and the target modules where LoRA should be applied 11. The get_peft_model() function then wraps the base model with the configured LoRA adapters 11, and the resulting peft_model can be used directly with the SFTTrainer 11. Quantized LoRA (QLoRA) is an even more memory-efficient variant that combines quantization with LoRA, typically loading the base model in 4-bit precision before adding the adapters 2; a QLoRA sketch follows Code Example 5.

⚡ Code Example 5 (Fine-tuning phi4 with LoRA):

```python
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer
from transformers import TrainingArguments

# Assuming model and tokenizer are already loaded

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "up_proj", "down_proj", "gate_proj"],  # Common target modules
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()

training_args = TrainingArguments(
    output_dir="./phi4_lora_finetuned",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    num_train_epochs=3,
    logging_dir="./logs",
    fp16=True,
)

trainer = SFTTrainer(
    model=peft_model,
    train_dataset=processed_dataset,
    tokenizer=tokenizer,
    args=training_args,
    dataset_text_field="text",
)

trainer.train()
trainer.save_model("./phi4_lora_finetuned")
```
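
The QLoRA variant mentioned above is not shown in Code Example 5; here is a minimal sketch, assuming bitsandbytes is installed. The model name and hyperparameters are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit NF4 precision
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",              # Illustrative; substitute your phi4 variant
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for training, then attach LoRA adapters
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    bias="none", task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
))
model.print_trainable_parameters()
```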

  • 3.6. Techniques for Memory Optimization on RTX 3060

To effectively fine-tune LLMs like phi4 on an RTX 3060, a combination of memory optimization techniques is often necessary 2. Loading the model in 4-bit precision, using libraries like Unsloth or bitsandbytes within transformers, provides a significant initial reduction in memory footprint 11. Applying LoRA adapters further minimizes memory usage by drastically reducing the number of trainable parameters 2; experimenting with different LoRA configurations, such as the rank r, can help find a balance between performance and memory consumption.

Carefully adjusting the per_device_train_batch_size and gradient_accumulation_steps is crucial for maximizing throughput without exceeding the GPU's memory capacity 2. Starting with a small batch size and gradually increasing gradient_accumulation_steps can help determine the optimal values. Enabling gradient_checkpointing should be considered if memory issues persist, as it can provide substantial memory savings, although with a potential slowdown in training speed 9. Using mixed precision training with fp16=True in TrainingArguments is generally recommended for both memory efficiency and potential speed improvements on compatible GPUs 9. Finally, employing dataset packing can lead to more efficient training by reducing padding, which indirectly helps manage memory, especially when the dataset contains many short sequences 6. A small helper for watching memory headroom during these experiments follows.
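
A small monitoring helper using PyTorch's built-in CUDA memory counters; the function name is illustrative:

```python
import torch

def report_gpu_memory(tag: str = "") -> None:
    """Print current and peak GPU memory usage in GiB."""
    if not torch.cuda.is_available():
        print("CUDA not available")
        return
    allocated = torch.cuda.memory_allocated() / 1024**3
    reserved = torch.cuda.memory_reserved() / 1024**3
    peak = torch.cuda.max_memory_allocated() / 1024**3
    print(f"{tag} allocated={allocated:.2f} GiB "
          f"reserved={reserved:.2f} GiB peak={peak:.2f} GiB")

# Example usage around a training run:
# report_gpu_memory("before training:")
# trainer.train()
# report_gpu_memory("after training:")
```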


🌟 4. Comprehensive Python Code Examples for Fine-Tuning gemma3

  • 4.1. Loading the Model and Tokenizer

As with phi4, fine-tuning gemma3 models begins with loading the pre-trained model and its tokenizer using AutoModelForCausalLM.from_pretrained() and AutoTokenizer.from_pretrained() from the transformers library 16. It is important to use the correct model identifier for the desired gemma3 variant (e.g., google/gemma-3-1b-it for the instruction-tuned version) 20. Again, loading the model in a quantized format like 4-bit is highly recommended for memory efficiency on an RTX 3060.

⚡ Code Example 6 (Loading gemma3):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-3-1b-it"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# num_parameters() is a method of the model itself, not of model.config
print(f"Gemma-3 model loaded with {model.num_parameters():,} parameters.")
```

  • 4.2. Preparing the Dataset (Considering Specific Chat Templates)

Gemma3 models have their own chat template that must be applied to format the input data correctly for conversational fine-tuning 16. The template wraps each turn in the model's special turn markers, along the lines of <start_of_turn>user\n{prompt}<end_of_turn>\n<start_of_turn>model\n 16, which helps the model understand the roles of the user and the assistant in a conversation. The template ships with the tokenizer (exposed via the tokenizer.chat_template attribute), so calling tokenizer.apply_chat_template() on a list of role/content messages applies it automatically 16.

⚡ Code Example 7 (Loading and formatting dataset for gemma3):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

model_name = "google/gemma-3-1b-it"
tokenizer = AutoTokenizer.from_pretrained(model_name)  # Ships with the Gemma chat template

dataset_name = "databricks/databricks-dolly-15k"  # Example instruction dataset
dataset = load_dataset(dataset_name, split="train")

def formatting_prompts_func(examples):
    # Build role/content messages; the built-in template adds the turn markers
    texts = [
        tokenizer.apply_chat_template(
            [
                {"role": "user", "content": instruction},
                {"role": "assistant", "content": response},
            ],
            tokenize=False,
            add_generation_prompt=False,
        )
        for instruction, response in zip(examples["instruction"], examples["response"])
    ]
    return {"text": texts}

processed_dataset = dataset.map(formatting_prompts_func, batched=True)
```

  • 4.3. Fine-Tuning Implementation with SFTTrainer

The fine-tuning process for gemma3 using SFTTrainer is very similar to that of phi4. The TrainingArguments are configured with the desired training parameters, and the SFTTrainer is initialized with the model, dataset, tokenizer, and training arguments. The dataset_text_field should be set to "text" to indicate the column containing the formatted prompts 16.

⚡ Code Example 8 (Basic SFTTrainer usage for gemma3):

```python
from trl import SFTTrainer
from transformers import TrainingArguments

# Assuming model and tokenizer are already loaded
# Assuming processed_dataset is already prepared

training_args = TrainingArguments(
    output_dir="./gemma3_finetuned",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    num_train_epochs=3,
    logging_dir="./logs",
    fp16=True,
)

trainer = SFTTrainer(
    model=model,
    train_dataset=processed_dataset,
    tokenizer=tokenizer,
    args=training_args,
    dataset_text_field="text",
)

trainer.train()
trainer.save_model("./gemma3_finetuned")
```

  • 4.4. Highlighting Model-Specific Configuration Options

For gemma3, the source material emphasizes parameter-efficient fine-tuning techniques like LoRA to manage memory usage effectively 16. The target modules for LoRA may vary slightly depending on the specific model architecture, but common targets include the query, key, value, and output projection layers in the attention mechanisms, as well as the up and down projection layers in the MLP 18. A configuration sketch follows.
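
A minimal LoRA configuration sketch for a Gemma 3 text model; the target_modules list reflects the projection layers named above plus the MLP gate projection, and should be verified against the actual module names of the loaded model (for example by inspecting model.named_modules()):

```python
from peft import LoraConfig, get_peft_model

# Assuming the Gemma 3 model from section 4.1 is loaded as `model`
gemma_lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Attention and MLP projections commonly targeted in Gemma-family models
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

peft_model = get_peft_model(model, gemma_lora_config)
peft_model.print_trainable_parameters()
```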

  • 4.5. Advanced Fine-Tuning Techniques

One advanced technique mentioned in the context of gemma3 is freezing certain layers, particularly in multimodal variants of the model, when the fine-tuning task primarily concerns the language side 18. For the text-only base models covered in this guide, however, the focus is typically on fine-tuning the language layers directly or on using PEFT techniques like LoRA. A freezing sketch follows.
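
A sketch of selective layer freezing, assuming a multimodal checkpoint whose vision parameters can be identified by name; the "vision" substring filter below is illustrative and must be adapted to the module names of the actual model:

```python
# Freeze everything first, then unfreeze only the language-model parameters
for param in model.parameters():
    param.requires_grad = False

for name, param in model.named_parameters():
    # Illustrative filter: unfreeze parameters outside the vision tower
    if "vision" not in name:
        param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} / {total:,}")
```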



🌟 5. Detailed Explanation of SFTTrainer and TrainingArguments Parameters

| Parameter (SFTTrainer) | Description | Relevance for RTX 3060 |
| --- | --- | --- |
| model | The pre-trained model to be fine-tuned; a string (model ID) or a PreTrainedModel object. | Essential for specifying the model. Loading in 4-bit is crucial for memory. |
| train_dataset | The dataset to use for training; supports various formats (plain text, conversational). | The core data for fine-tuning. Formatting and packing are important. |
| tokenizer | The tokenizer associated with the model. | Necessary for processing text data into tokens. Chat templates are model-specific. |
| args | An instance of TrainingArguments containing training configurations. | Crucial for controlling the training process and memory usage. |
| dataset_text_field | The name of the dataset column containing the text. | Specifies the input data for training. |
| packing | Whether to use example packing for more efficient training. | Recommended for datasets with shorter sequences to improve efficiency. |
| max_seq_length | Maximum sequence length to use. | Directly impacts memory consumption; should be set based on the dataset. |

| Parameter (TrainingArguments) | Description | Relevance for RTX 3060 |
| --- | --- | --- |
| output_dir | The directory where the fine-tuned model and logs will be saved. | Standard parameter for saving outputs. |
| per_device_train_batch_size | The batch size per GPU during training. | Needs to be tuned carefully to fit within the RTX 3060's memory. |
| gradient_accumulation_steps | Number of steps to accumulate gradients before each backward/update pass. | Allows simulating larger effective batch sizes when memory is limited. |
| learning_rate | The initial learning rate for the optimizer. | Standard hyperparameter for training. |
| num_train_epochs | The total number of training epochs to perform. | Standard hyperparameter for training. |
| logging_dir | The directory where training logs will be saved. | Useful for monitoring the training process. |
| fp16 / bf16 | Whether to use 16-bit (mixed) precision training. | Highly recommended for reducing memory usage and potentially increasing speed. |
| gradient_checkpointing | If True, recompute activations in the backward pass to save memory at the cost of speed. | Important memory optimization technique for limited GPU memory. |


🌟 6. Best Practices and Considerations for Fine-Tuning LLMs

Selecting a relevant and high-quality dataset is paramount for achieving good performance after fine-tuning 1; the dataset should be representative of the target task or domain. Proper data preprocessing and formatting are essential to ensure the data is in the expected format for the chosen model and the SFTTrainer 2. This includes tokenization and, for conversational models, applying chat templates.

Choosing appropriate hyperparameters, such as the learning rate, batch size, and number of epochs, is crucial and often requires experimentation to find the optimal values for a specific task and model 2. Monitoring evaluation metrics, such as perplexity or task-specific metrics like ROUGE or BLEU, is important for assessing the performance of the fine-tuned model and identifying potential issues like overfitting 1. Strategies for preventing overfitting should also be considered: using a validation set to monitor performance on unseen data, employing early stopping to halt training when validation performance starts to degrade (see the sketch below), and applying regularization techniques 1.

Monitoring GPU usage during training is essential to ensure efficient resource utilization and to diagnose any memory-related issues 11; tools like nvidia-smi can be used to track GPU memory consumption and utilization. Finally, after successful fine-tuning, the model should be saved in a suitable format for deployment or further use 9. The saved model can then be loaded for inference or shared with the community.
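
As an example of the overfitting safeguards mentioned above, here is a sketch using the Trainer's built-in EarlyStoppingCallback, assuming a held-out evaluation split exists; the split names and thresholds are illustrative:

```python
from transformers import TrainingArguments, EarlyStoppingCallback
from trl import SFTTrainer

training_args = TrainingArguments(
    output_dir="./phi4_finetuned",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    evaluation_strategy="steps",      # Evaluate on the validation set periodically
    eval_steps=200,
    save_strategy="steps",
    save_steps=200,
    load_best_model_at_end=True,      # Required for early stopping
    metric_for_best_model="eval_loss",
    fp16=True,
)

trainer = SFTTrainer(
    model=model,
    train_dataset=train_split,        # Illustrative split names
    eval_dataset=eval_split,
    tokenizer=tokenizer,
    args=training_args,
    dataset_text_field="text",
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```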



🌟 7. Conclusion and Further Resources

Fine-tuning large language models like phi4 and gemma3 on an RTX 3060 with limited memory requires careful consideration of memory optimization techniques. The TRL library and its SFTTrainer provide a user-friendly interface for this process, especially when combined with parameter-efficient fine-tuning methods like LoRA and quantization. For further exploration, users are encouraged to consult the official documentation of the TRL library (3), the 🤗 Transformers library, the PEFT library, and Unsloth. Examining the example scripts provided in the TRL repository (4) and exploring community tutorials and blog posts (1) can provide additional insights and practical guidance for specific use cases. Experimentation with different configurations and techniques is key to achieving the best possible performance for the desired task.

🔧 Works cited

1. SFTTrainer: A Comprehensive Exploration of Its Concept, Advantages, Limitations, History, and Applications | by Frank Morales Aguilera | The Deep Hub | Medium, accessed on March 19, 2025, https://medium.com/thedeephub/sfttrainer-a-comprehensive-exploration-of-its-concept-advantages-limitations-history-and-19ab0926e74e
2. Parameter-Efficient Fine-Tuning of Llama 3.1: A Comprehensive Guide - Medium, accessed on March 19, 2025, https://medium.com/@govindarajpriyanthan/parameter-efficient-fine-tuning-of-llama-3-1-a-comprehensive-guide-bed38d232285
3. huggingface/trl: Train transformer language models with … - GitHub, accessed on March 19, 2025, https://github.com/huggingface/trl
4. Supervised Fine-tuning Trainer - Hugging Face, accessed on March 19, 2025, https://huggingface.co/docs/trl/en/sft_trainer
5. trl/docs/source/sft_trainer.md at main · huggingface/trl · GitHub, accessed on March 19, 2025, https://github.com/huggingface/trl/blob/main/docs/source/sft_trainer.md
6. Supervised Fine-tuning Trainer - Hugging Face, accessed on March 19, 2025, https://huggingface.co/docs/trl/v0.7.4/sft_trainer
7. deep-learning-pytorch-huggingface/training/fine-tune-llms-in-2024-with-trl.ipynb at main, accessed on March 19, 2025, https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/fine-tune-llms-in-2024-with-trl.ipynb
8. How to Fine-Tune Multimodal Models or VLMs with Hugging Face TRL - Philschmid, accessed on March 19, 2025, https://www.philschmid.de/fine-tune-multimodal-llms-with-trl
9. How to Fine-tune an LLM Part 3: The HuggingFace Trainer | alpaca_ft - Wandb, accessed on March 19, 2025, https://wandb.ai/capecape/alpaca_ft/reports/How-to-Fine-tune-an-LLM-Part-3-The-HuggingFace-Trainer—Vmlldzo1OTEyNjMy
10. Model fine-tuning with Hugging Face | Llama - DataCamp, accessed on March 19, 2025, https://campus.datacamp.com/courses/fine-tuning-with-llama-3/fine-tuning-with-sfttrainer-on-hugging-face?ex=1
11. How to Fine-Tune Phi-4 Locally? - Analytics Vidhya, accessed on March 19, 2025, https://www.analyticsvidhya.com/blog/2025/01/fine-tune-phi-4-locally/
12. I Used Microsoft's New Phi-4 SLM with unsloth to Predict User Responses (Classification Tasks) — Finetune — Part 2 | by Nurman Naz - Medium, accessed on March 19, 2025, https://medium.com/@nurmanaz/i-used-microsofts-new-phi-4-slm-with-unsloth-to-predict-user-responses-classification-tasks-2284aa13badc
13. Finally, My Gaming GPU can Optimally Fine-Tune LLMs | by Austin Choi - Medium, accessed on March 19, 2025, https://medium.com/@austinchoi/finally-my-gaming-gpu-can-optimally-fine-tune-llms-cfd3b88e30ed
14. Fine-tuning with the Hugging Face ecosystem (TRL) - ROCm Documentation - AMD, accessed on March 19, 2025, https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/notebooks/fine_tune/fine_tuning_lora_qwen2vl.html
15. SFTTrainer usage · Issue #2390 · huggingface/trl - GitHub, accessed on March 19, 2025, https://github.com/huggingface/trl/issues/2390
16. Finetune Gemma - AI Engineering Academy, accessed on March 19, 2025, https://aiengineering.academy/LLM/Gemma/Gemma_finetuning_notebook/
17. Fine-Tuning: Unleashing the Potential of Large Language Models - Medium, accessed on March 19, 2025, https://medium.com/@adnaan525/fine-tuning-unleashing-the-potential-of-large-language-models-27c8012d3d69
18. Fine-Tune Gemma-3 on Custom Dataset Locally: Step-by-Step Easy Tutorial - YouTube, accessed on March 19, 2025, https://www.youtube.com/watch?v=TWL10n8ZFCQ
19. Fine-Tune Gemma for Vision Tasks using Hugging Face Transformers and QLoRA, accessed on March 19, 2025, https://ai.google.dev/gemma/docs/core/huggingface_vision_finetune_qlora
20. Fine-Tune Gemma using Hugging Face Transformers and QloRA | Google AI for Developers, accessed on March 19, 2025, https://ai.google.dev/gemma/docs/core/huggingface_text_finetune_qlora
21. Bypassing the V100: Finetuning LLM on a Single 3060 Card | by Aria | Medium, accessed on March 19, 2025, https://medium.com/@kudoysl/bypassing-the-v100-train-llm-on-a-single-3060-card-3165aef506c4
22. Fine-tuning open AI models using Hugging Face TRL - YouTube, accessed on March 19, 2025, https://www.youtube.com/watch?v=cnGyyM0vOes