Ref · NPL-LoRA/2026
Subject · Nepali speech adaptation
Medium · synthetic audio

Nepali LoRA Listening Lab

An interactive, curiosity-driven exploration of Low-Rank Adaptation (LoRA) applied to Nepali speech. Use this lab bench to examine the speech frequencies and weights of the fine-tuned voice model.

Every voice clip on this page is generated. The only human-recorded audio is the 3-second reference clip in the cloning tab, which provides the speaker's physical timbre.

How LoRA works

A pretrained voice model stores its knowledge in large weight matrices $W_0$. Retraining all of them for Nepali would be expensive and risk forgetting what the model already knows. LoRA leaves $W_0$ frozen and learns a small correction $\Delta W = BA$ through a narrow rank-$r$ channel.

$W = W_0 + \frac{\alpha}{r} BA$
Step 1: Freeze the base weights. $W_0$ ($d \times d$) stays fixed, so timbre, vocoder, and the general speech prior are preserved.
Step 2: Learn a low-rank detour. Only $B$ ($d \times r$) and $A$ ($r \times d$) are trained. Their product $BA$ is a $d \times d$ update, but it is forced through rank $r$.
Step 3: Scale the adapter. $\alpha$ controls how strongly the correction is applied. The effective gain is $\alpha/r$, so it can be tuned without retraining $B$ and $A$.
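The three steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the lab's actual training code: the function name `lora_forward`, the row-vector convention (`x @ W0.T`), the zero initialisation of $B$, the stand-in "trained" values, and the small demo size $d = 64$ are all assumptions made for the example.

```python
import numpy as np

def lora_forward(x, W0, A, B, alpha):
    """Frozen linear layer plus a scaled low-rank correction.

    x: (batch, d) inputs, W0: (d, d) frozen weights,
    B: (d, r) and A: (r, d) trainable factors.
    """
    r = A.shape[0]
    base = x @ W0.T               # Step 1: frozen path, W0 is never updated
    detour = (x @ A.T) @ B.T      # Step 2: project down to r dims, then back up to d
    return base + (alpha / r) * detour   # Step 3: scale the correction by alpha / r

rng = np.random.default_rng(0)
d, r, alpha = 64, 8, 16           # small d for a quick demo; the page uses d = 1024
W0 = rng.standard_normal((d, d))
A = rng.standard_normal((r, d)) * 0.01   # assumed init: small random A
B = np.zeros((d, r))                     # assumed init: zero B, so the adapter starts as a no-op
x = rng.standard_normal((4, d))

# With B = 0 the adapted layer reproduces the base layer exactly.
y_start = lora_forward(x, W0, A, B, alpha)
same_as_base = np.allclose(y_start, x @ W0.T)

# Stand-in for learned values: the two-step detour matches the explicit d x d update.
B_trained = rng.standard_normal((d, r)) * 0.01
y_adapted = lora_forward(x, W0, A, B_trained, alpha)
y_merged = x @ (W0 + (alpha / r) * (B_trained @ A)).T
matches_merged = np.allclose(y_adapted, y_merged)
```

Both `same_as_base` and `matches_merged` evaluate to true. Computing the detour as two thin matrix products avoids ever materialising the full $d \times d$ correction, which is one common reason to keep the adapter separate during training.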
Hidden dim $d$: 1024
Side length of each square weight matrix. Larger models have more capacity, and more parameters to freeze.
LoRA rank $r$: 8
Width of the bottleneck. Lower $r$ means cheaper training and a less expressive adapter; higher $r$ means more degrees of freedom.
Scale $\alpha$: 16
Adapter strength at inference. With $\alpha = 16$ and $r = 8$, the effective multiplier is $\alpha/r = 2$.
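To see why $\alpha$ can be changed at inference without retraining, the sketch below folds the same $B$ and $A$ into the base weights at two different settings. The helper name `merge_adapter` and the random stand-in matrices are hypothetical; only the scaling rule $\alpha/r$ comes from the text above.

```python
import numpy as np

def merge_adapter(W0, A, B, alpha):
    """Fold the scaled low-rank correction into the frozen weights."""
    r = A.shape[0]
    return W0 + (alpha / r) * (B @ A)

rng = np.random.default_rng(1)
d, r = 32, 8                      # small d for a quick demo
W0 = rng.standard_normal((d, d))
B = rng.standard_normal((d, r))   # stand-in for trained values
A = rng.standard_normal((r, d))   # stand-in for trained values

# Same B and A, two adapter strengths.
delta_16 = merge_adapter(W0, A, B, alpha=16) - W0   # multiplier 16 / 8 = 2
delta_8 = merge_adapter(W0, A, B, alpha=8) - W0     # multiplier 8 / 8 = 1

doubles = np.allclose(delta_16, 2 * delta_8)
```

Here `doubles` is true: moving $\alpha$ from 8 to 16 doubles the correction while $B$ and $A$ stay exactly as trained.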

Matrix shapes

Box sizes and labels update live. Multiplying $B \times A$ collapses the inner dimension $r$, so the correction is low-rank by construction.

[Diagram: $W_0$ (1024 × 1024, frozen base weights) + $\alpha/r = 2.0$ · $B$ (1024 × 8) × $A$ (8 × 1024). The rank-8 bottleneck yields a trainable correction of size 1024 × 1024.]
Bottleneck intuition: A full $d \times d$ update needs $d^2$ parameters. LoRA routes the change through rank 8, needing only 16,384 trainable values ($2dr$). The inner dimension $r$ is the subspace the adapter is allowed to move in.
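The low-rank claim can be checked numerically. The sketch below builds random stand-ins for $B$ and $A$ and asks NumPy for the rank of their product; the smaller size $d = 128$ is an assumption chosen only so the rank computation runs quickly.

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 128, 8                     # the page uses d = 1024; 128 keeps the SVD cheap
B = rng.standard_normal((d, r))   # stand-in for trained values
A = rng.standard_normal((r, d))   # stand-in for trained values

delta = B @ A                     # a full d x d matrix of numbers...
rank = np.linalg.matrix_rank(delta)   # ...whose rank cannot exceed r

full_params = d * d               # values in an unconstrained d x d update
lora_params = 2 * d * r           # values actually stored in B and A
print(delta.shape, rank, full_params, lora_params)
```

The product has the shape of a full update, but its rank is bounded by the inner dimension, which is the sense in which the adapter can only move within an $r$-dimensional subspace.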

Parameter budget

Most of the model stays untouched. The green slice is what Nepali fine-tuning actually learns.

Base weights ($d^2$): 1,048,576
LoRA params ($2dr$): 16,384
Compression ratio: 64× fewer trained values
Effective scale: $\alpha/r = 2.0$
Frozen share: 98.44% (computed as $1 - 2dr/d^2$)
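The figures in this panel follow directly from $d$ and $r$. A small sketch of the arithmetic, assuming a single square $d \times d$ matrix as on this page (the function name `lora_budget` is hypothetical):

```python
def lora_budget(d, r):
    """Parameter counts for one d x d weight matrix with a rank-r adapter."""
    base = d * d                  # frozen base weights
    lora = 2 * d * r              # B is d x r, A is r x d
    return {
        "base_params": base,
        "lora_params": lora,
        "compression_ratio": base / lora,     # simplifies to d / (2r)
        "frozen_share": 1 - lora / base,      # adapter size measured against the base count
    }

budget = lora_budget(d=1024, r=8)
print(f"{budget['base_params']:,}")         # 1,048,576
print(f"{budget['lora_params']:,}")         # 16,384
print(budget["compression_ratio"])          # 64.0
print(budget["frozen_share"])               # 0.984375
```

The frozen share here measures the adapter against the base count alone, which is the convention that reproduces the 98.44% shown in the panel.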

Why this matters for Nepali: the features that set Nepali pronunciation apart (vowel nasalization, aspirated consonants, schwa deletion) live in a small subspace of the model's behaviour. A rank-8 adapter is enough to shift pronunciation without relearning voice quality. The listening tests below let you hear the difference that $BA$ makes.

Three listening tests

Each test has its own tab:
Compare: the same Nepali sentence before and after the LoRA adapter (base vs LoRA).
Clone · demo: an early reference speaker's voice reading new Nepali lines with adapted pronunciation.
Voice design: new voices generated from short English persona descriptions, with no reference clip.

Jump between tabs at any time. Audio in hidden tabs is paused, so only the visible tab plays.

1 · Same sentence, before and after

Each row plays the same Nepali sentence twice. Before is the base text-to-speech model on its own. After is the same model with the Nepali LoRA adapter loaded on top. Words highlight as they play.

01 — greeting

नमस्ते।

Before — base model

After — with Nepali LoRA

02 — weather & movement

आज मौसम धेरै राम्रो छ, हामी बाहिर घुम्न जाऔं।

Before — base model

After — with Nepali LoRA

03 — geography & regions

नेपाल एक सुन्दर देश हो जहाँ हिमालय, पहाड र तराई क्षेत्र छन्।

Before — base model

After — with Nepali LoRA

04 — education policy

सरकारले नयाँ शिक्षा नीति लागू गर्ने घोषणा गरेको छ, जसले विद्यार्थीहरूको भविष्य उज्यालो बनाउने अपेक्षा गरिएको छ।

Before — base model

After — with Nepali LoRA

05 — wise old man

एक समयको कुरा हो, एउटा सानो गाउँमा एक जना बुद्धिमान वृद्ध मानिस बस्थे। उनले आफ्नो जीवनभर धेरै कठिनाइहरू झेलेका थिए, तर कहिल्यै हार मानेनन्।

Before — base model

After — with Nepali LoRA

06 — area & stats

नेपालको क्षेत्रफल एक लाख सत्तरी हजार वर्ग किलोमिटर र जनसंख्या लगभग तीन करोड छ।

Before — base model

After — with Nepali LoRA

07 — inquiries

तपाईंको नाम के हो? तपाईं कहाँबाट आउनुभयो? के तपाईंलाई नेपाली खाना मनपर्छ?

Before — base model

After — with Nepali LoRA

08 — mother's love

आमाको माया संसारमा सबैभन्दा ठूलो माया हो। उनको आँचलमा सुत्दा संसारका सबै दुःख बिर्सिन्छन्।

Before — base model

After — with Nepali LoRA