
Have you ever thought about how to make communication easier for people who use a mix of Hindi and English, commonly known as Hinglish? With the growing use of Hinglish in everyday conversations, social media, and advertising, there’s a need for tools that can accurately translate between English and Hinglish. This is where advanced language models like Gemma 2 9B come into play. By fine-tuning this model, we can create solutions that understand the unique blend of Hindi and English, making communication more effective for a wider audience.
Learning Objectives
- Understand the key features and multilingual capabilities of the Gemma 2 9B model.
- Learn how Unsloth AI accelerates fine-tuning for large language models.
- Gain hands-on experience in fine-tuning the Gemma 2 9B model for English-to-Hinglish translation.
- Explore the impact of fine-tuning on translation accuracy compared to the original model.
- Learn how to deploy and query the fine-tuned model using Ollama for real-world applications.
This article was published as a part of the Data Science Blogathon.
Understanding Gemma 2 9B Model
Gemma 2 models represent a significant advancement in artificial intelligence, offering powerful language processing capabilities with a focus on efficiency and accessibility. These models are designed to excel in tasks such as text generation, code writing, and problem-solving. With their compact size and robust performance, Gemma 2 models provide a versatile tool for developers and users alike. They are particularly noted for their competitive performance relative to larger models.
- Parameter Size: The model has 9 billion parameters, which is relatively small compared to other larger LLMs, making it efficient for deployment on devices with limited resources.
- Training Data: It was trained on a massive dataset of 8 trillion tokens, including web documents, code, and mathematical text. This diverse training enables the model to excel in tasks like text generation, code writing, and mathematical problem-solving.
- Architecture: Gemma 2 uses a transformer architecture, which is well-suited for natural language processing tasks. It is designed to handle a wide range of tasks, from answering questions to generating code.
- Multilingual and Code Generation: Gemma 2 is proficient in multiple languages and can generate code in various programming languages, making it a versatile tool for developers.
- Efficiency and Accessibility: Its relatively small size allows for deployment on laptops or desktops, democratizing access to state-of-the-art AI models. It also supports fast inference, making it suitable for real-time applications.
Fine tuning Gemma 2 9B using Unsloth AI
Fine-tuning the multilingual Gemma 2 9B model can be highly beneficial for Hindi translations due to its robust multilingual capabilities and adaptability.
- Multilingual Strengths: Gemma 2 models, including the 9B version, have demonstrated strong multilingual performance across various languages, often surpassing larger models like Llama-3-70B in specific tasks. For instance, fine-tuned versions have excelled in languages such as French and Korean, showcasing their ability to handle diverse linguistic structures effectively. This capability indicates that with fine-tuning on Hinglish datasets, the model can achieve high-quality translations and semantic understanding.
- Customization for Hinglish: Fine-tuning allows the model to adapt specifically to Hinglish's unique syntax, grammar, and cultural nuances. Using techniques like Supervised Fine-Tuning (SFT) or Low-Rank Adaptation (LoRA), developers can enhance its translation accuracy by training it on curated Hinglish datasets. This process ensures that the model generates contextually accurate and culturally relevant translations.
- Efficiency for Low-Resource Scenarios: The Gemma 2 9B model is computationally efficient compared to larger models like the 27B version, making it ideal for projects with limited resources while still delivering excellent results.
What is Unsloth AI?
Unsloth AI, founded in 2023 and based in San Francisco, is an innovative startup revolutionizing the fine-tuning and training of large language models (LLMs). With a focus on speed and efficiency, Unsloth’s platform enables model training up to 30 times faster while using 90% less memory compared to traditional methods. This is achieved through advanced software optimizations, such as handwritten GPU kernels, rather than relying on hardware upgrades. The company embraces an open-source approach, boasting over 8 million monthly downloads and 29,000 GitHub stars. By making AI training more accessible and cost-effective, Unsloth AI caters to developers and enterprises alike, fostering a collaborative and inclusive AI ecosystem.
Unsloth speeds up LLM training using several techniques. It manually derives backpropagation steps (manual autograd) for faster gradient calculations, optimizes chained matrix multiplications, and builds custom, more efficient kernels written in the Triton language. It also uses Flash Attention to focus computation on the most relevant parts of the input. Along with other memory-efficient strategies, these enhance training speed and efficiency.

Hands On Tutorial on Fine Tuning Gemma 2 9B For English to Hinglish Translations
In the following tutorial, we fine-tune the multilingual Gemma 2 9B on a Hinglish dataset using the Unsloth AI library on Google Colab with a T4 GPU. We save the fine-tuned model to Hugging Face and then query it with different inputs through Ollama. Finally, we explore how the fine-tuned model produces more accurate English-to-Hinglish translations than the original model.
Step 1: Install Necessary Libraries
We will first install necessary libraries below:
!pip install unsloth
Step 2: Loading the Model
The code below loads the pre-trained Gemma 2 9B language model using the unsloth library. It sets configuration options like a maximum sequence length of 2048 tokens and enables 4-bit quantization to reduce memory usage. The data type (dtype) is auto-detected, and the model and tokenizer are loaded for use in further language processing tasks. This setup optimizes memory efficiency while working with large language models.
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
dtype = None  # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True  # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2-9b",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
Step 3: Adding LoRA Adapters
To add LoRA adapters, we only need to update 1 to 10% of all parameters. The code below uses the FastLanguageModel.get_peft_model function to adapt the model with LoRA (Low-Rank Adaptation). It specifies parameters such as the rank (r = 16), the target modules to adapt, and optimization settings like lora_alpha and bias.
The code also enables “unsloth” for efficient memory usage and sets a random state for reproducibility.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # Choose any number > 0! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,  # Supports any, but = 0 is optimized
    bias = "none",  # Supports any, but = "none" is optimized
    # "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth",  # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None,  # And LoftQ
)
Step 4: Defining the Alpaca Format For Preparing the Dataset
The code below defines a prompt formatting function for preparing training data in a structured format. It starts by creating a template (alpaca_prompt) that includes placeholders for the instruction, input, and output. The formatting_prompts_func function takes in a batch of examples, extracts the English (en) and Hinglish (hi_ng) text, and formats them into the defined template. It adds an EOS_TOKEN (End-of-Sequence token) at the end of each formatted prompt to prevent the model from generating responses indefinitely. The final output is a dictionary with the formatted text for each example, ready for model training or fine-tuning.
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    # Repeat the instruction so it pairs with every example in the batch
    instructions = ["Translate English to Hinglish"] * len(examples["en"])
    inputs = examples["en"]
    outputs = examples["hi_ng"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}
Step 5: Loading the Dataset
The code below prepares the dataset in the correct format, with each entry consisting of a properly structured instruction-input-output prompt for Hinglish translation tasks.
from datasets import load_dataset
from datasets import Dataset, DatasetDict

dataset = load_dataset("nateraw/english-to-hinglish", split = "train")
dataset = dataset.remove_columns(["source"])
df_pandas = dataset.to_pandas()

def apply_format(col1, col2):
    instruction = "Translate English to Hinglish"
    text = alpaca_prompt.format(instruction, col1, col2) + EOS_TOKEN
    return text

df_pandas["text"] = df_pandas.apply(lambda e: apply_format(e["en"], e["hi_ng"]), axis = 1)
df_pandas.drop(["en", "hi_ng"], axis = 1, inplace = True)
dataset = Dataset.from_pandas(df_pandas)
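Before moving on, it helps to print one formatted example and confirm that the Alpaca template and EOS token were applied as expected. This quick sanity check is an addition to the original steps:

# Inspect the first formatted training example
print(dataset[0]["text"])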
Step 6: Defining Hugging Face TRL's SFTTrainer for Training the Model
The code below initializes an SFTTrainer for fine-tuning a model using the trl library. It sets up training parameters such as batch size, gradient accumulation steps, and learning rate within TrainingArguments. The trainer also configures logging and optimization settings, including the use of mixed precision (fp16 or bf16) based on hardware support. The training process is optimized with an AdamW optimizer and a linear learning rate scheduler.
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,  # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        # Logging arguments
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",  # Use this for WandB etc
    ),
)
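Before starting training, you can optionally confirm that the 4-bit base model plus LoRA adapters fit comfortably in the Colab T4's memory. The following snippet uses standard PyTorch CUDA utilities and is an optional addition to the tutorial:

# Optional: report total GPU memory and how much is currently reserved
import torch

gpu_stats = torch.cuda.get_device_properties(0)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
reserved_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{reserved_memory} GB of memory reserved.")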
Step 7: Starting the Training
trainer_stats = trainer.train()
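trainer.train() returns a standard Hugging Face TrainOutput object, so if you want a quick summary of the run you can inspect its fields. This is an optional check, not part of the original tutorial:

# Summary of the training run (runtime, final loss, etc.)
print(trainer_stats.metrics)
print(f"Final training loss: {trainer_stats.training_loss:.4f}")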
Step 8: Inference from the Fine Tuned Model
The code below sets up inference for the fine-tuned model using FastLanguageModel. It first prepares a prompt (alpaca_prompt) for translation from English to Hinglish by formatting it with an example input. The prompt is tokenized and transferred to a GPU (cuda) for efficient computation. The model then generates a response with a maximum of 64 new tokens, and the output is decoded back into text. Finally, it extracts the part of the output after the “### Response:” section, which contains the generated Hinglish translation.
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

inputs = tokenizer(
    [
        alpaca_prompt.format(
            "Translate English to Hinglish",  # instruction
            "remind me to get eggs today",    # input
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
output = tokenizer.batch_decode(outputs)
output[0].split("### Response:\n")[1]
Output
'mujhe aaj eggs lene ke liye yaad dilaayen'
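If you would rather watch the translation stream token by token instead of decoding the whole batch at the end, Hugging Face's TextStreamer can be passed to generate. This is an optional variation on the inference cell above:

from transformers import TextStreamer

# Stream generated tokens to stdout as they are produced
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)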
Step 9: Saving the Model & Pushing to Hugging Face
The following code saves the trained model locally and pushes it to the Hugging Face Hub. You will need to provide a Hugging Face token with write access to push to the Hub.
model.save_pretrained("lora_model") # Local saving
tokenizer.save_pretrained("lora_model")
model.push_to_hub("mimidutta007/english_to_hinglish_FTgemma2", token = "") # Online saving
tokenizer.push_to_hub("mimidutta007/english_to_hinglish_FTgemma2", token = "") # Online saving
You can find the model here. I have also converted it to GGUF format so that we can query the model through Ollama as well.
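One way to produce the GGUF file is with Unsloth's built-in export helpers. The quantization method shown below (q4_k_m) is a common choice and is assumed here; it is not necessarily the exact setting used for the published model:

# Save a GGUF copy locally (q4_k_m quantization assumed)
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method = "q4_k_m")

# Or push the GGUF file straight to the Hugging Face Hub (requires a write token)
model.push_to_hub_gguf(
    "mimidutta007/english_to_hinglish_FTgemma2", tokenizer,
    quantization_method = "q4_k_m", token = "",
)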
Querying the Model Through Ollama
Learn how to interact with the fine-tuned Gemma 2 9B model using Ollama, enabling seamless English-to-Hinglish translations through efficient API queries.
Pulling the Fine Tuned Model Through Ollama
This code installs the Ollama software and the langchain-ollama library, which allows interaction with language models via Ollama. It then starts Ollama as a background subprocess (subprocess.Popen) to run in a non-blocking manner. After waiting for 3 seconds (time.sleep(3)), the code pulls a fine-tuned model (english_to_hinglish_FTgemma2) from Ollama using the ollama pull command. This setup enables the model to be used for English-to-Hinglish translation tasks.
#Installing Ollama and langchain-ollama library
!curl -fsSL https://ollama.com/install.sh | sh
!pip install langchain-ollama
#Starting a subprocess so that ollama can be run in a non blocking manner
import subprocess
subprocess.Popen(["ollama", "serve"])
import time
time.sleep(3)
#Pulling the Model
!ollama pull hf.co/mimidutta007/english_to_hinglish_FTgemma2
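Before wiring up LangChain, you can confirm the model is being served by hitting Ollama's local REST API directly. The plain one-line prompt below is a simplification of the Alpaca template used during training, so its output may differ slightly from the templated queries that follow:

# Quick sanity check against Ollama's local REST endpoint
import requests

payload = {
    "model": "hf.co/mimidutta007/english_to_hinglish_FTgemma2",
    "prompt": "Translate English to Hinglish: remind me to get eggs today",
    "stream": False,
}
r = requests.post("http://localhost:11434/api/generate", json = payload)
print(r.json()["response"])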
Querying the Fine Tuned Model Through Ollama
This code sets up a prompt template using langchain for an English-to-Hinglish translation task. It defines a template that includes placeholders for the instruction and input, then creates a ChatPromptTemplate from it. The model (OllamaLLM) is instantiated with a fine-tuned Hinglish translation model. The prompt and model are combined in a chain. The input data is passed to the chain, generating a translation response.
The result is then displayed in Markdown format.
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM
from IPython.display import Markdown
# Define the template
template = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{Instruction}

### Input:
{Input}

### Response:
"""
# Create a prompt template
prompt = ChatPromptTemplate.from_template(template)
# Instantiate the model
model = OllamaLLM(model="hf.co/mimidutta007/english_to_hinglish_FTgemma2")
# Chain the prompt and model
chain = prompt | model
input_data = {
    "Instruction": "Translate from English to Hinglish",
    "Input": "are there any roads closed in the area due to construction"
}

# Invoke the chain with input data and display the response in Markdown format
response = chain.invoke(input_data)
Markdown(response)
Output
'kya area ke kisi road par construction ki wajah se band hai'
Query-2
“Input”: “please text Joanne Brennan that I will be five minutes late.”
Output
'Joanne Brenan ko message karo ke main 5 minutes late hoon'
Query-3
“Input”: “remind me to get eggs today”
Output
'mujhe aaj eggs lene ke liye yaad dilaayen'
Query-4
“Input”: “Rotate the Image 90 degrees to the right”
Output
'Image ko 90 degree tak apne Right me rotate kare'
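The remaining queries above can be reproduced by swapping out the "Input" field and invoking the same chain. A small loop like the following (the input list here simply reuses the examples from this article) runs them in one go:

# Run several inputs through the same prompt | model chain
queries = [
    "please text Joanne Brennan that I will be five minutes late.",
    "remind me to get eggs today",
    "Rotate the Image 90 degrees to the right",
]
for q in queries:
    response = chain.invoke({
        "Instruction": "Translate from English to Hinglish",
        "Input": q,
    })
    print(q, "->", response)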
As the outputs above show, the inputs are translated well into Hinglish: some words are kept in English while others are converted to Hindi. This suggests that, after training, the model understands Hinglish grammar and can generate accurate outputs.
Comparison with Original Gemma 2 9B Model
Let's also check what the original Gemma 2 9B model outputs for the same inputs:
| Input | Output From Original Gemma 2 9B | Output From Fine-tuned Gemma 2 9B |
|---|---|---|
| play some motivating music | Karo Yaar Kuch Motivational Gaane | kuch motivate karne wala music bajao |
| play some motivating music | Chal yaar, kucch pumping gaane bajao | kuch motivate karne wali music play kare |
| remind me to get eggs today | Yaar, mujhe aaj ande lene ka yaad dila de! | mujhe aaj eggs lene ke liye yaad dilaayen |
| please text Joanne Brennan that I will be five minutes late | Bhai Joanne Brennan ko msg kar de ki main 5 minute late ho jaunga. | Joanne Brennan ko message karo ki main 5 minutes late ho |
| Request you to please give feedback on comment section | Kya comment section mein kya chal raha hai? Bata de feedback do! | Mujhe comment section par apne feedback dene ki request hai |
- As the table above shows, the original Gemma 2 9B outputs are not entirely incorrect, but the fine-tuned model provides more contextually accurate responses. The fine-tuned model also maintains a formal tone, whereas the original model's output sounds more casual.
- Also, some outputs from the original model are not Hinglish but entirely Hindi, such as "Yaar, mujhe aaj ande lene ka yaad dila de!"
- We also observe some contextually inaccurate translations by the original Gemma 2 9B model, like "Kya comment section mein kya chal raha hai? Bata de feedback do!", while the fine-tuned model translates the same input accurately.
Conclusion
The development of LLM models for Hinglish translation is crucial for bridging the gap between formal languages and the hybrid dialect commonly used in India’s everyday communication. Fine-tuning the multilingual Gemma 2 9B model offers significant advantages, especially with its efficiency, multilingual strengths, and adaptability to Hinglish’s unique nuances. This approach not only enhances translation accuracy but also facilitates better communication in personal and professional contexts. With the support of Unsloth AI’s innovative fine-tuning capabilities, this model can revolutionize Hinglish translation and improve engagement across diverse audiences.
Key Takeaways
- Hinglish, a blend of Hindi and English, is increasingly used in informal communication across India, making it essential for businesses and individuals to develop accurate translation models to engage a broader audience effectively.
- The Gemma 2 9B model is compact yet powerful, with 9 billion parameters and excellent multilingual capabilities. It excels in various tasks such as text generation, code writing, and problem-solving, making it highly versatile.
- Fine-tuning the Gemma 2 9B model on Hinglish datasets improves its translation accuracy and ensures it adapts to Hinglish’s unique syntax, grammar, and cultural nuances, making it more effective for real-world applications.
- The Gemma 2 9B model’s smaller size (9 billion parameters) allows for efficient deployment on devices with limited resources, offering high performance without the need for costly hardware.
- Unsloth AI’s platform significantly enhances the fine-tuning process by enabling faster training (up to 30 times faster) with 90% less memory usage, making AI training more accessible and cost-effective for developers.
Frequently Asked Questions
Q. Why build LLM-based translation models for Hinglish?
A. Hinglish, a blend of Hindi and English, is widely used in informal communication in India, especially on social media, in advertising, and in daily conversations. Developing LLM models for Hinglish translation helps businesses and individuals effectively communicate with a broader audience, improving engagement and bridging the gap between formal and colloquial language.
Q. What makes the Gemma 2 9B model a good fit for this task?
A. The Gemma 2 9B model is a powerful language processing tool with 9 billion parameters, offering robust performance across multilingual tasks. Its compact size, high efficiency, and adaptability make it an ideal candidate for fine-tuning on Hinglish datasets, improving translation accuracy and capturing Hinglish's unique syntax and cultural nuances.
Q. How does fine-tuning improve English-to-Hinglish translation?
A. Fine-tuning the Gemma 2 9B model using curated Hinglish datasets allows the model to adapt to the language's distinct syntax, grammar, and vocabulary. This customization ensures more accurate and culturally relevant translations from English to Hinglish, improving communication in both personal and professional contexts.
Q. What does Unsloth AI add to the fine-tuning process?
A. Unsloth AI offers significant advantages by enabling faster training (up to 30 times faster) while using 90% less memory than traditional methods. This platform makes the fine-tuning process more efficient, cost-effective, and accessible, helping developers create highly specialized language models with fewer resources.