An interactive, curiosity-driven exploration of Nepali Low-Rank Adaptation (LoRA). Use this lab bench to physically analyze the speech frequencies and weights of the fine-tuned voice model.
A pretrained voice model stores its knowledge in large weight matrices $W_0$. Retraining all of them for Nepali would be expensive and risk forgetting what the model already knows. LoRA leaves $W_0$ frozen and learns a small correction $\Delta W = BA$ through a narrow rank-$r$ channel.
Box sizes and labels update live. Multiplying $B \times A$ collapses the inner dimension $r$, so the correction is low-rank by construction.
Most of the model stays untouched. The green slice is what Nepali fine-tuning actually learns.
| Base weights ($d^2$) | 1,048,576 |
| LoRA params ($2dr$) | 16,384 |
| Compression ratio | 64× fewer trained |
| Effective scale | α/r = 2.0 |
Why this matters for Nepali: Devanagari phonetics — vowel nasalization, aspirated consonants, schwa deletion — live in a small subspace of the model's behaviour. A rank-8 adapter is enough to shift pronunciation without relearning voice quality. The listening tests below are the audible difference that $BA$ makes.
Three listening tests, one tab each: Compare (base vs LoRA on the same sentence), Clone · demo (an early reference speaker), and Voice design (no reference — persona from an English line). Jump tabs anytime; audio in hidden tabs is paused so only what you see is playing.
Each row plays the same Nepali sentence twice. Original is the base text-to-speech model on its own. Fine-tuned is the same model with the Nepali LoRA adapter loaded on top. Words highlight as they play.
01 — greeting
नमस्ते।
Before — base model
After — with Nepali LoRA
02 — weather & movement
आज मौसम धैरे राम्रो छ, हामी बाहिर घुम्न जाऔं।
Before — base model
After — with Nepali LoRA
03 — geography & regions
नेपाल एक सुन्दर देश हो जहाँ हिमालय, पहाड र तराई क्षेत्र छन्।
Before — base model
After — with Nepali LoRA
04 — education policy
सरकारले नयाँ शिक्षा नीति लागू गर्ने घोषणा गरेको छ, जसले विद्यार्थीहरूको भविष्य उज्यालो बनाउने अपेक्षा गरिएको छ।
Before — base model
After — with Nepali LoRA
05 — wise old man
एक समयको कुरा हो, एउटा सानो गाउँमा एक जना बुद्धिमान वृद्ध मानिस बस्थे। उनले आफ्नो जीवनभर धेरै कठिनाइहरू झेलेका थिए, तर कहिल्यै हार मानेनन्।
Before — base model
After — with Nepali LoRA
06 — area & stats
नेपालको क्षेत्रफल एक लाख सत्तरी हजार वर्ग किलोमिटर छ र जनसंख्या लगभग तीन करोड छ।
Before — base model
After — with Nepali LoRA
07 — inquiries
तपाईंको नाम के हो? तपाईं कहाँबाट आउनुभयो? के तपाईंलाई नेपाली खाना मनपर्छ?
Before — base model
After — with Nepali LoRA
08 — mother's love
आमाको माया संसारमा सबैभन्दा ठूलो माया हो। उनको आँचलमा सुत्दा संसारका सबै दुःख बिर्सिन्छन्।
Before — base model
After — with Nepali LoRA
This experiment isolates **identity (reference clip)** from **skill (the LoRA)**. The reference speaker is a short english recording. The model copies this specific timbre, and uses the LoRA weights to pronounce new Nepali text with natural phonology.
Reference speaker
A 3-second recording of the target voice. This identical anchor controls the synthesized speaker identity for all files below.
Reference clip (real human voice)
01 — sentence in the cloned voice
नमस्ते, मेरो नाम सारा हो।
In the cloned voice
02 — sentence in the cloned voice
आज मौसम धेरै राम्रो छ।
In the cloned voice
03 — sentence in the cloned voice
नेपाल एक सुन्दर देश हो जहाँ हिमालय, पहाड र तराई क्षेत्र छन्।
In the cloned voice
04 — sentence in the cloned voice
सरकारले नयाँ शिक्षा नीति लागू गर्ने घोषणा गरेको छ।
In the cloned voice
05 — sentence in the cloned voice
आमाको माया संसारमा सबैभन्दा ठूलो माया हो।
In the cloned voice
This test invents brand-new identities entirely from a single-line English description (like "warm storyteller" or "elderly deep man"). No reference audio is used.
Acoustic Drift Phenomenon: Rows 02, 03, and 08 demonstrate a fascinating model behavior. Despite requesting deep, masculine voices, the model synthesized higher-pitched, lighter timbres. In deep learning research, this represents an "attractor state" where a highly skewed female dataset acts as an acoustic gravity well, pulling male prompts toward feminine formants.
Deep, authoritative, calm elderly man
Energetic, enthusiastic young man
Measured, authoritative male teacher
01 young_female_gentle
(A young woman with a gentle, sweet, and warm voice, speaking slowly and clearly)
नमस्ते, मेरो नाम सीता हो। म नेपालबाट बोल्दै छु।
Designed voice
02 old_male_deep
(An elderly man with a deep, authoritative, and calm voice)
नेपाल एक सुन्दर देश हो जहाँ हिमालय, पहाड र तराई क्षेत्र छन्।
Designed voice
03 young_male_energetic
(A young man with an energetic, enthusiastic, and fast-paced voice)
आज मौसम धेरै राम्रो छ, हामी बाहिर घुम्न जाऔं!
Designed voice
04 female_newsreader
(A professional female news anchor, clear articulation, neutral tone, moderate pace)
सरकारले नयाँ शिक्षा नीति लागू गर्ने घोषणा गरेको छ, जसले विद्यार्थीहरूको भविष्य उज्यालो बनाउने अपेक्षा गरिएको छ।
Designed voice
05 female_storyteller
(A warm, motherly woman telling a bedtime story, soft and soothing voice)
एक समयको कुरा हो, एउटा सानो गाउँमा एक जना बुद्धिमान वृद्ध मानिस बस्थे।
Designed voice
06 male_cheerful
(A cheerful middle-aged man, slightly smiling, friendly and inviting tone)
तपाईंलाई नेपाली खाना मनपर्छ? आउनुहोस्, हामीसँग खाना खानुहोस्!
Designed voice
07 female_sad_emotional
(A young woman, sad and emotional, speaking slowly with a trembling voice)
आमाको माया संसारमा सबैभन्दा ठूलो माया हो। उनको आँचलमा सुत्दा संसारका सबै दुःख बिर्सिन्छन्।
Designed voice
08 male_formal_teacher
(A male teacher, calm and measured, explaining clearly with authority)
नेपालको क्षेत्रफल एक लाख सत्तरी हजार वर्ग किलोमिटर छ र जनसंख्या लगभग तीन करोड छ।
Designed voice