Technical Documentation

Fine-Tuning Large Language Models with TRL and SFTTrainer on an RTX 3060: A Comprehensive Guide

A technical guide covering **fine-tuning large language models with TRL and SFTTrainer on an RTX 3060**.

👤
Author
Cosmic Lounge AI Team
📅
Updated
6/1/2025
⏱️
Read Time
18 min
Topics
#llm #ai #model #fine-tuning #training #gpu #cuda #pytorch #introduction #design



🌌 Fine-Tuning Large Language Models with TRL and SFTTrainer on an RTX 3060: A Comprehensive Guide



🌟 1. Introduction to Supervised Fine-Tuning of Large Language Models

🚀 Welcome to this comprehensive guide! This section gives you the foundational knowledge you need.

The adaptation of pre-trained Large Language Models (LLMs) to specific tasks through fine-tuning has emerged as a pivotal technique in natural language processing. This process allows practitioners to leverage the extensive general knowledge embedded within these models and tailor them for enhanced performance in specialized domains or applications 1. By building upon the foundations laid during the initial pre-training phase, fine-tuning enables significant improvements in task-specific accuracy, control over output style, and the incorporation of domain-specific knowledge.

The Hugging Face ecosystem has become central to the development and deployment of LLMs, providing a rich set of open-source libraries that streamline various aspects of working with these models 2. Key libraries within this ecosystem include transformers, which offers access to a vast repository of pre-trained models and tools for their utilization; datasets, which simplifies loading and managing training data; accelerate, which facilitates distributed training and optimizes resource utilization across different hardware configurations; PEFT (Parameter-Efficient Fine-Tuning), which provides techniques to adapt large models efficiently with limited computational resources; and TRL (Transformer Reinforcement Learning), which builds upon these libraries to offer high-level trainers for various fine-tuning and alignment strategies, including Supervised Fine-Tuning (SFT) 2.

Among the various fine-tuning techniques, Supervised Fine-Tuning (SFT) plays a crucial role in transforming general-purpose language models into specialized assistants capable of performing specific tasks or adhering to particular styles 7. SFT involves further training a pre-trained "base model" on a smaller, task-specific dataset that consists of input-output pairs or conversational turns. The model learns to map the given inputs to the desired outputs by minimizing the discrepancy between its predictions and the provided ground truth. This process refines the model's understanding and generation capabilities for the target task, allowing it to produce more relevant, accurate, and contextually appropriate responses 7.



🌟 2. Understanding the TRL Library and the SFTTrainer

The Transformer Reinforcement Learning (TRL) library is an extension of the well-established transformers and datasets libraries, designed to simplify the process of fine-tuning and aligning large language models 2. It offers a suite of high-level trainer classes that cater to various fine-tuning and alignment methodologies, including Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), and Direct Preference Optimization (DPO) 6. The library leverages accelerate to ensure efficient training across a diverse range of hardware setups, from single GPUs to large-scale multi-node clusters 6. Furthermore, TRL integrates seamlessly with the PEFT library, enabling training on extremely large models with limited computational resources through techniques like quantization and Low-Rank Adaptation (LoRA) 6.

Within TRL, the SFTTrainer class stands out as a powerful and user-friendly tool designed specifically for supervised fine-tuning of causal language models 5. As a subclass of the Trainer class from the transformers library, SFTTrainer inherits all the core functionality for managing the training loop, including logging, evaluation, and checkpointing 5. It extends these capabilities with features that address the specifics of supervised fine-tuning: automated dataset formatting, support for various dataset structures (such as plain text and conversational formats), efficient data packing to maximize training throughput, and native integration with Parameter-Efficient Fine-Tuning (PEFT) techniques like LoRA and QLoRA 5.

To use the SFTTrainer effectively, several key components are essential 2:

  • Model: This refers to the pre-trained causal language model that will be the subject of fine-tuning. Typically, this model is loaded from the transformers library using the AutoModelForCausalLM.from_pretrained() method, specifying the model identifier from the Hugging Face Hub or a local path.

  • Tokenizer: The tokenizer is responsible for converting text data into numerical tokens that the model can process. It is crucial to use the tokenizer that corresponds to the pre-trained model being used. The tokenizer is usually loaded using AutoTokenizer.from_pretrained() with the same model identifier as the model.

  • Dataset: The training dataset provides the examples that the model will learn from during the fine-tuning process. This dataset should be loaded using the datasets library and formatted appropriately for the specific task. The SFTTrainer supports various dataset formats, including those with plain text or structured conversational turns.

  • Training Arguments: These arguments define the hyperparameters and configuration settings for the training process. They are specified through an instance of TrainingArguments (from transformers) or SFTConfig (from trl, which inherits from TrainingArguments). These arguments control aspects such as the learning rate, batch size, number of training epochs, the directory for saving output files, and logging options.

Understanding the role and configuration of each of these components is fundamental to successfully implementing supervised fine-tuning with the SFTTrainer; a minimal sketch combining all four appears below.
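The following minimal sketch shows how the four components fit together. The model identifier, prompt template, and hyperparameter values are illustrative assumptions, not recommendations from this guide, and the keyword for passing the tokenizer differs across TRL versions (recent releases use processing_class, older ones use tokenizer).

```python
# Minimal SFTTrainer sketch: model, tokenizer, dataset, training arguments.
# All names and values here are illustrative assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # small enough for a 12GB card
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Map the instruction dataset into a single "text" column for training.
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

def to_text(example):
    return {
        "text": f"### Instruction:\n{example['instruction']}\n\n"
                f"### Response:\n{example['response']}"
    }

dataset = dataset.map(to_text)

args = SFTConfig(
    output_dir="./sft-output",
    dataset_text_field="text",       # column holding the training text
    per_device_train_batch_size=2,
    num_train_epochs=1,
    learning_rate=2e-5,
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # use tokenizer=... on older TRL versions
)
trainer.train()
```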

⚡ Table 1: Key Parameters of SFTTrainer and SFTConfig

| Parameter | Description | Typical Use Cases |
| --- | --- | --- |
| model | The pre-trained language model to fine-tune (can be a string identifier or a model object). | Specifies the base model for adaptation. |
| tokenizer | The tokenizer corresponding to the pre-trained model. | Converts text to tokens and back. |
| train_dataset | The dataset used for training. | Provides the input-output examples for learning. |
| eval_dataset | The dataset used for evaluation during training (optional). | Allows monitoring of the model's performance on unseen data. |
| args (TrainingArguments or SFTConfig) | Configuration object specifying training hyperparameters and settings. | Controls learning rate, batch size, output directory, logging, etc. |
| dataset_text_field | Name of the column in the dataset containing the text data. | Specifies which column to use for training text. |
| max_length | Maximum length of tokenized sequences. | Truncates or pads sequences to a fixed length. |
| packing | Whether to pack multiple short sequences into longer ones to improve training efficiency. | Increases GPU utilization by reducing padding. |
| peft_config | Configuration for Parameter-Efficient Fine-Tuning (e.g., LoRA, QLoRA). | Enables memory-efficient fine-tuning of large models. |
| formatting_func | A function to format dataset samples into the desired input-output structure. | Customizes how data is presented to the model. |
| callbacks | List of callback functions to customize the training loop. | Allows adding custom behavior during training. |
| optimizers | Tuple containing the optimizer and learning rate scheduler. | Provides control over the optimization process. |
| model_init_kwargs | Keyword arguments for initializing the model from a string identifier. | Used to pass specific initialization parameters. |
| dataset_kwargs | Keyword arguments for dataset preparation. | Allows customization of how the dataset is loaded and processed. |
| dataset_num_proc | Number of processes to use for dataset processing. | Speeds up data loading and preprocessing. |
| padding_free | Whether to perform forward passes without padding (requires flash_attention_2). | Can improve efficiency with specific attention implementations. |
| eval_packing | Whether to apply packing to the evaluation dataset. | Ensures consistent data handling for evaluation. |
| learning_rate | Initial learning rate for the optimizer. | Controls the step size during weight updates. |
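As a brief illustration, the sketch below combines a few of the parameters from Table 1 (dataset_text_field, max_length, packing, dataset_num_proc) in one SFTConfig. Parameter names vary across TRL versions (older releases use max_seq_length instead of max_length), and all values shown are illustrative assumptions.

```python
# Hedged sketch: an SFTConfig exercising several parameters from Table 1.
# Values are illustrative; check your installed TRL version's signature.
from trl import SFTConfig

config = SFTConfig(
    output_dir="./sft-packed",
    dataset_text_field="text",  # column holding the training text
    max_length=1024,            # truncate/pack sequences to 1024 tokens
    packing=True,               # concatenate short examples to reduce padding
    dataset_num_proc=4,         # parallel workers for dataset preprocessing
    learning_rate=2e-5,
)
```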


🌟 3. Setting Up Your Environment for Efficient Fine-Tuning on an RTX 3060

The NVIDIA RTX 3060, equipped with 12GB of Video RAM (VRAM), presents a viable option for individuals looking to fine-tune large language models without access to high-end data center GPUs 20. However, the 12GB VRAM capacity imposes limitations on the size of the models that can be effectively fine-tuned and the batch sizes that can be used without encountering memory constraints 20. For very large models, especially those with tens or hundreds of billions of parameters, full fine-tuning, which updates all of the model's weights, may exceed the available GPU memory 7. Therefore, employing memory-efficient fine-tuning strategies becomes essential to successfully adapt these models on an RTX 3060 1.

To begin the fine-tuning process, a properly configured software environment is necessary 2. The first step involves ensuring that Python (version 3.8 or later is generally recommended) is installed on the system. PyTorch, the deep learning framework underlying the Hugging Face libraries, must also be installed, ideally with CUDA support to leverage the RTX 3060's GPU capabilities 2. The specific installation command for PyTorch with CUDA depends on the system's configuration, including the CUDA version and operating system, and can be found on the official PyTorch website. Once PyTorch is successfully installed, the essential Hugging Face libraries can be installed using pip: pip install transformers datasets accelerate peft trl 2.

Given the memory constraints of the RTX 3060, adopting strategies to optimize memory usage during fine-tuning is critical 1. Parameter-Efficient Fine-Tuning (PEFT) techniques are particularly relevant in this context 1. LoRA (Low-Rank Adaptation) is a popular PEFT method that freezes the majority of the pre-trained model's parameters and introduces a small number of trainable adapter layers 7. These adapter layers consist of low-rank matrices trained to approximate the weight updates required for the downstream task, significantly reducing the number of trainable parameters and thus the memory footprint 7. QLoRA (Quantized LoRA) takes this efficiency a step further by quantizing the pre-trained model's weights to 4-bit precision while still using LoRA for parameter-efficient adaptation 1. This combination of quantization and low-rank adaptation can enable the fine-tuning of much larger models on GPUs with limited VRAM, such as the RTX 3060 1.

Other beneficial memory optimization techniques include gradient accumulation, which simulates larger batch sizes without increasing per-batch memory usage, and gradient checkpointing, which trades some computation time for reduced memory consumption by recomputing activations during the backward pass 7. Because the RTX 3060 is based on NVIDIA's Ampere architecture, leveraging Flash Attention can also provide speedups and memory savings for attention computations 2. A configuration sketch combining these techniques follows.
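The sketch below shows one way to combine these techniques: 4-bit quantization via bitsandbytes, a LoRA adapter configuration, and memory-saving training arguments. The model identifier, target modules, and every hyperparameter are illustrative assumptions sized for a 12GB card, not prescriptions from this guide.

```python
# Hedged QLoRA sketch for a 12GB RTX 3060. All names and values are illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig

# 4-bit quantization of the frozen base weights (QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4, the QLoRA default
    bnb_4bit_compute_dtype=torch.bfloat16,  # Ampere GPUs support bfloat16
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # illustrative 7B model; needs 4-bit to fit
    quantization_config=bnb_config,
    device_map="auto",
)

# Small trainable LoRA adapters on top of the frozen quantized weights.
peft_config = LoraConfig(
    r=16,               # rank of the low-rank update matrices
    lora_alpha=32,      # scaling factor for the adapter output
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # attention projections; model-specific
)

# Memory-saving training arguments: gradient accumulation simulates a larger
# batch; gradient checkpointing trades compute for memory.
args = SFTConfig(
    output_dir="./qlora-output",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size of 8
    gradient_checkpointing=True,
    learning_rate=2e-4,
    optim="paged_adamw_8bit",        # paged 8-bit optimizer reduces memory spikes
)
# peft_config is passed to SFTTrainer via its peft_config parameter.
```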



🌟 4. Preparing Your Dataset for Supervised Fine-Tuning

The SFTTrainer within the TRL library is designed to be flexible and can accommodate various formats for training datasets 5. At its core, the trainer requires a dataset containing a text field that holds the data to be used for fine-tuning 2. The name of this text field can be specified through the dataset_text_field parameter when initializing the SFTTrainer 2.

For tasks that involve instruction-based learning, a common data format includes columns for "instruction" and "output" (or "response") 8. The SFTTrainer can then be configured to combine these columns into a single text sequence that the model learns to generate, given an instruction 8. For conversational fine-tuning, the dataset might be structured as a list of messages, where each message has a defined "role" (e.g., "user" or "assistant") and the corresponding "content" 10. The SFTTrainer is capable of handling these conversational formats, often in conjunction with the tokenizer's specific chat template, which dictates how conversations should be structured and tokenized 10.

The Hugging Face datasets library provides a user-friendly interface for loading and exploring a wide range of datasets, either from the Hugging Face Hub or from local file sources 2. The load_dataset() function is the primary tool for this purpose, allowing users to specify the name of a dataset hosted on the Hub (and optionally, the specific split they wish to use) 2. For instance, to load the Databricks Dolly 15k dataset, the following code can be used: dataset = load_dataset("databricks/databricks-dolly-15k") 2. Similarly, datasets stored locally in formats like CSV, JSON, or plain text can be loaded by providing the file path to load_dataset() and specifying the appropriate data format 10. Once a dataset is loaded, it can be inspected to understand its structure, examine individual samples, and identify the columns relevant to the fine-tuning task 10.

Before the data can be used for fine-tuning, it must undergo preprocessing, primarily tokenization 10. Tokenization converts the raw text data into numerical IDs that the language model can process 10, and is typically performed with the tokenizer associated with the pre-trained model being fine-tuned 10. The tokenizer can be applied to the entire dataset via its __call__ method or by defining a mapping function applied to each sample with the map() method of the datasets library 10. For instruction-based or conversational fine-tuning, it is often necessary to format the data into a specific structure the model is trained to understand 10. This might involve combining the "instruction" and "output" columns into a single text sequence, possibly with separator tokens or special tokens the model was pre-trained on 10. For conversational data, the messages might need to be formatted according to the chat template defined by the tokenizer 10.

Many modern large language models, especially those designed for interactive tasks, are associated with a specific "chat template" within their tokenizer 9. This chat template dictates the precise formatting applied to the different components of a conversation, such as system messages, user prompts, and assistant responses, including the use and placement of special tokens marking the beginning and end of a sequence (such as <s> and </s>) and role markers like <|user|> and <|assistant|> 9. When the goal is to fine-tune these models for conversational abilities, adhering to the tokenizer's chat template during the preparation of the training data is of paramount importance 9. This ensures that the model learns to generate responses conforming to the expected conversational structure and understands the distinct roles of the participants in the dialogue 9. The SFTTrainer often integrates seamlessly with these chat templates, simplifying the formatting of conversational datasets for fine-tuning 9, as sketched below.
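The following sketch applies a tokenizer's chat template to a conversational dataset. The model, dataset, and column layout are illustrative assumptions for demonstration, not choices made by this guide.

```python
# Hedged sketch: rendering a conversational dataset with the tokenizer's
# chat template so each example becomes a single training string.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

# This dataset stores each example as a "messages" list of
# {"role": ..., "content": ...} dicts.
dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

def apply_template(example):
    # The tokenizer inserts the model's own special tokens and role markers.
    return {"text": tokenizer.apply_chat_template(example["messages"], tokenize=False)}

dataset = dataset.map(apply_template)
print(dataset[0]["text"][:500])  # inspect the rendered conversation
```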



🌟 5. Comprehensive Guide and Code Examples for Fine-Tuning Phi-4 Models

(Detailed content for this section will be added based on further analysis of snippets related to Phi models, focusing on practical code examples using SFTTrainer and PEFT on an RTX 3060.)



🌟 6. Comprehensive Guide and Code Examples for Fine-Tuning Gemma Models

(Detailed content for this section will be added based on further analysis of snippets related to Gemma models, focusing on practical code examples using SFTTrainer and PEFT on an RTX 3060.)



🌟 7. Comprehensive Guide and Code Examples for Fine-Tuning Other Architectures (Llama and Mistral)

(Detailed content for this section will be added based on further analysis of snippets related to other LLM architectures like Llama and Mistral, focusing on practical code examples using SFTTrainer and PEFT on an RTX 3060.)



🌟 8. Advanced Techniques and Options in SFTTrainer

(Detailed content for this section will be added based on further analysis of snippets related to advanced SFTTrainer parameters and PEFT techniques.)



🌟 9. Evaluating the Performance of Your Fine-Tuned Models

(Detailed content for this section will be added based on further analysis of snippets related to evaluation metrics and strategies.)



🌟 10. Saving, Sharing, and Utilizing Your Fine-Tuned Models

(Detailed content for this section will be added based on further analysis of snippets related to saving, sharing, and inference.)



🌟 11. Conclusion and Further Resources

(Content for this section will be added at the end of the report.)

🔧 Works cited

1. Introduction to Fine-tuning Large Language Models - Stephen Diehl, accessed on March 19, 2025, https://www.stephendiehl.com/posts/training_llms/
2. deep-learning-pytorch-huggingface/training/fine-tune-llms-in-2024 … - GitHub, accessed on March 19, 2025, https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/fine-tune-llms-in-2024-with-trl.ipynb
3. How to Fine-Tune Multimodal Models or VLMs with Hugging Face TRL - Philschmid, accessed on March 19, 2025, https://www.philschmid.de/fine-tune-multimodal-llms-with-trl
4. How to Fine-Tune LLMs in 2024 with Hugging Face - Philschmid, accessed on March 19, 2025, https://www.philschmid.de/fine-tune-llms-in-2024-with-trl
5. How to fine-tune open LLMs in 2025 with Hugging Face - Philschmid, accessed on March 19, 2025, https://www.philschmid.de/fine-tune-llms-in-2025
6. huggingface/trl: Train transformer language models with reinforcement learning. - GitHub, accessed on March 19, 2025, https://github.com/huggingface/trl
7. Transformers-Tutorials/Mistral/Supervised_fine_tuning_(SFT)_of_an_LLM_using_Hugging_Face_tooling.ipynb at master - GitHub, accessed on March 19, 2025, https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Mistral/Supervised_fine_tuning_(SFT)_of_an_LLM_using_Hugging_Face_tooling.ipynb
8. LLaMA 2 Fine Tuning: Building Your Own LLaMA, Step by Step - Run:ai, accessed on March 19, 2025, https://www.run.ai/guides/generative-ai/llama-2-fine-tuning
9. Supervised Fine-tuning Trainer - Hugging Face, accessed on March 19, 2025, https://huggingface.co/docs/trl/en/sft_trainer
10. Fine-Tuning Large Language Models for Custom Tasks Using Hugging Face TRL - Medium, accessed on March 19, 2025, https://medium.com/@yash9439/fine-tuning-large-language-models-for-custom-tasks-using-hugging-face-trl-2d8b69adc72c
11. A Practical Guide: Fine-Tuning Large Language Models with HuggingFace | by Yu-Cheng Tsai | Sage Ai | Medium, accessed on March 19, 2025, https://medium.com/sage-ai/a-practical-guide-fine-tuning-large-language-models-with-huggingface-3d4b8298b55f
12. How to Fine-tune an LLM Part 3: The HuggingFace Trainer | alpaca_ft - Wandb, accessed on March 19, 2025, https://wandb.ai/capecape/alpaca_ft/reports/How-to-Fine-tune-an-LLM-Part-3-The-HuggingFace-Trainer—Vmlldzo1OTEyNjMy
13. Fine-Tuning Your First Large Language Model (LLM) with PyTorch …, accessed on March 19, 2025, https://huggingface.co/blog/dvgodoy/fine-tuning-llm-hugging-face
14. Fine-Tune Gemma-3 on Custom Dataset Locally: Step-by-Step Easy Tutorial - YouTube, accessed on March 19, 2025, https://www.youtube.com/watch?v=TWL10n8ZFCQ
15. Fine-Tune Gemma using Hugging Face Transformers and QloRA | Google AI for Developers, accessed on March 19, 2025, https://ai.google.dev/gemma/docs/core/huggingface_text_finetune_qlora
16. Fine-Tune Gemma for Vision Tasks using Hugging Face Transformers and QLoRA, accessed on March 19, 2025, https://ai.google.dev/gemma/docs/core/huggingface_vision_finetune_qlora
17. deep-learning-pytorch-huggingface/training/gemma-lora-example.ipynb at main - GitHub, accessed on March 19, 2025, https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/gemma-lora-example.ipynb
18. Meta-Llama3-Fine-Tuning/How-to-FineTune-Llama-3-with-SFTTrainer.ipynb at master - GitHub, accessed on March 19, 2025, https://github.com/ruslanmv/Meta-Llama3-Fine-Tuning/blob/master/How-to-FineTune-Llama-3-with-SFTTrainer.ipynb
19. Model fine-tuning with Hugging Face | Llama - DataCamp, accessed on March 19, 2025, https://campus.datacamp.com/courses/fine-tuning-with-llama-3/fine-tuning-with-sfttrainer-on-hugging-face?ex=1
20. kuleshov-group/llmtools: Finetuning Large Language Models on One Consumer GPU in 2 Bits - GitHub, accessed on March 19, 2025, https://github.com/kuleshov-group/llmtools
21. The Complete Guide to GPU Requirements for LLM Fine-tuning - RunPod Blog, accessed on March 19, 2025, https://blog.runpod.io/the-complete-guide-to-gpu-requirements-for-llm-fine-tuning/
22. Guide to Hardware Requirements for Training and Fine-Tuning Large Language Models - Towards AI, accessed on March 19, 2025, https://towardsai.net/p/artificial-intelligence/guide-to-hardware-requirements-for-training-and-fine-tuning-large-language-models
23. Two RTX 3060 for running llms locally : r/LocalLLaMA - Reddit, accessed on March 19, 2025, https://www.reddit.com/r/LocalLLaMA/comments/16q8cyt/two_rtx_3060_for_running_llms_locally/
24. The Ultimate Guide to Hardware Requirements for Training and Fine-Tuning Large Language Models (LLMs) - Towards AI, accessed on March 19, 2025, https://pub.towardsai.net/the-ultimate-guide-to-hardware-requirements-for-training-and-fine-tuning-large-language-models-7b5fe3884f64
25. Please Help Choosing Best Machine for Running Local LLM (3 Options and my objectives inside) : r/LocalLLaMA - Reddit, accessed on March 19, 2025, https://www.reddit.com/r/LocalLLaMA/comments/1j7wnye/please_help_choosing_best_machine_for_running/
26. Fine-Tuning with LoRA: Optimizing Parameter Selection for LLMs - DagsHub, accessed on March 19, 2025, https://dagshub.com/blog/streamlining-fine-tuning-with-lora-optimizing-parameter-selection-for-llms/
27. Bypassing the V100: Finetuning LLM on a Single 3060 Card | by Aria | Medium, accessed on March 19, 2025, https://medium.com/@kudoysl/bypassing-the-v100-train-llm-on-a-single-3060-card-3165aef506c4
28. Fine-Tuning Gemma LLM model - Medium, accessed on March 19, 2025, https://medium.com/@_AchrefTlili/fine-tuning-gemma-llm-model-86ee764e7a3b
29. Fine-tuning | How-to guides - Llama, accessed on March 19, 2025, https://www.llama.com/docs/how-to-guides/fine-tuning/
30. Finetuning Meta-Llama-3.1-8B using PEFT - Models - Hugging Face Forums, accessed on March 19, 2025, https://discuss.huggingface.co/t/finetuning-meta-llama-3-1-8b-using-peft/108319
31. Fine-tuning with the Hugging Face ecosystem (TRL) - ROCm Documentation - AMD, accessed on March 19, 2025, https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/notebooks/fine_tune/fine_tuning_lora_qwen2vl.html
32. LoRA (Low-Rank Adaptation) - Hugging Face NLP Course, accessed on March 19, 2025, https://huggingface.co/learn/nlp-course/chapter11/4
33. Finetuning with LoRA and variants - Prem AI, accessed on March 19, 2025, https://blog.premai.io/lora/
34. Parameter-Efficient Fine-Tuning of Llama 3.1: A Comprehensive Guide - Medium, accessed on March 19, 2025, https://medium.com/@govindarajpriyanthan/parameter-efficient-fine-tuning-of-llama-3-1-a-comprehensive-guide-bed38d232285
35. Efficient Fine-Tuning with LoRA: A Guide to Optimal Parameter Selection for Large Language Models - Databricks, accessed on March 19, 2025, https://www.databricks.com/blog/efficient-fine-tuning-lora-guide-llms
36. Fine-tune a pretrained model - Hugging Face, accessed on March 19, 2025, https://huggingface.co/docs/transformers/en/training