What Is Lexical Diversity?
Lexical diversity describes how varied a child's vocabulary is in spontaneous speech. It is one of the three core descriptive dimensions of a language sample, alongside syntactic complexity (mean length of utterance, MLU) and grammaticality (percent grammatical utterances, PGU). A child with rich lexical diversity uses many different words; a child with poor lexical diversity reuses a small set of words across many utterances.
Why it matters. Lexical diversity is independently predictive of school readiness, narrative quality, and long-term reading outcomes. Children with developmental language disorder (DLD) consistently score lower on lexical diversity measures than typically developing peers, even when their MLU is in the normal range. Tracking lexical diversity gives you a second axis for monitoring expressive language growth.
Three commonly reported measures. Type-Token Ratio (TTR), Number of Different Words in the first 100 tokens (NDW-100), and Number of Different Words across the first 50 utterances (NDW-50). All three are computed by this tool from the same pasted transcript.
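As a minimal sketch (illustrative function names, not the tool's actual code), the three measures reduce to simple set arithmetic over token lists:

```python
def ttr(tokens: list[str]) -> float:
    # Type-Token Ratio: unique word types divided by total tokens.
    return len(set(tokens)) / len(tokens)

def ndw_100(tokens: list[str]) -> int:
    # Number of Different Words in the first 100 tokens (Miller 1981 truncation).
    return len(set(tokens[:100]))

def ndw_50(utterances: list[list[str]]) -> int:
    # Number of Different Words across the first 50 utterances (SUGAR convention).
    return len({token for utt in utterances[:50] for token in utt})
```

The key difference is the unit of truncation: NDW-100 cuts the sample by token count, NDW-50 by utterance count, and TTR uses the whole sample.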
TTR vs. NDW — Which Should I Report?
Use TTR when you are describing a single sample for a teaching example, comparing two samples of equal length, or showing the raw types-to-tokens relationship for a worked example.
Use NDW-100 (Miller 1981) when you are comparing children across samples of different lengths, reporting against published norms, or writing a research-grade evaluation. This is the default in SALT and CLAN.
Use NDW-50 utterances (SUGAR) when your sample is structured in utterances rather than tokens (which is how most clinicians collect samples), you need to compare against the SUGAR norms, or you are working with the standard 50-utterance SUGAR sample.
Common pitfall. Do not report TTR alone for samples of different lengths. A child with a TTR of 0.45 from a 60-token sample is not necessarily more lexically diverse than a child with a TTR of 0.32 from a 300-token sample: TTR falls as sample length grows, so the raw comparison is meaningless. Report NDW-100 instead.
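You can see the length effect with a quick simulation. This sketch assumes a hypothetical child with a fixed 150-word expressive vocabulary sampled uniformly; real speech is Zipfian, which makes the drop even steeper:

```python
import random

random.seed(0)  # reproducible demo

# Hypothetical 150-word expressive vocabulary (uniform draws for simplicity).
vocab = [f"word{i}" for i in range(150)]

def sample_ttr(n_tokens: int) -> float:
    # Draw n tokens from the same vocabulary and compute TTR.
    tokens = random.choices(vocab, k=n_tokens)
    return len(set(tokens)) / n_tokens

short_ttr = sample_ttr(60)   # short sample
long_ttr = sample_ttr(300)   # long sample, same underlying vocabulary
# The short sample's TTR comes out much higher even though the
# child's vocabulary has not changed at all.
```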
How This Calculator Works
Paste your language sample into the textarea, one utterance per line. The calculator:
- Splits the sample into utterances (newlines first, sentence-ending punctuation as a fallback) using the shared SLP utterance parser.
- Tokenises each utterance: lowercases, strips leading and trailing punctuation, treats contractions like "don't" as a single token.
- Counts total tokens and total unique types across the entire sample to compute TTR.
- Computes NDW-100 by counting unique types in the first 100 tokens of the sample (Miller 1981 truncation).
- Computes NDW-50 by counting unique types across the first 50 utterances of the sample (SUGAR convention).
- Lists the 20 most frequent word types so you can sanity-check the tokeniser and spot transcription typos.
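The steps above can be sketched end to end in a few lines of Python. This is an illustrative approximation, not the tool's actual parser or tokeniser, and the function names are assumptions:

```python
import re

def split_utterances(sample: str) -> list[str]:
    # Newlines first; sentence-ending punctuation as a fallback.
    lines = [line.strip() for line in sample.splitlines() if line.strip()]
    if len(lines) <= 1:
        lines = [u.strip() for u in re.split(r"(?<=[.!?])\s+", sample) if u.strip()]
    return lines

def tokenize(utterance: str) -> list[str]:
    # Lowercase and strip leading/trailing punctuation; internal apostrophes
    # are kept so contractions like "don't" stay a single token.
    tokens = []
    for raw in utterance.lower().split():
        word = raw.strip(".,!?;:\"()[]")
        if word:
            tokens.append(word)
    return tokens

def lexical_diversity(sample: str) -> dict:
    utterances = [tokenize(u) for u in split_utterances(sample)]
    tokens = [t for utt in utterances for t in utt]
    return {
        # TTR over the entire sample.
        "ttr": len(set(tokens)) / len(tokens) if tokens else None,
        # NDW-100: unique types in the first 100 tokens.
        "ndw_100": len(set(tokens[:100])) if len(tokens) >= 100 else None,
        # NDW-50: unique types across the first 50 utterances.
        "ndw_50": (len({t for utt in utterances[:50] for t in utt})
                   if len(utterances) >= 50 else None),
    }
```

For example, `lexical_diversity("I don't know.\nI want the ball.")` yields 7 tokens and 6 types, so a TTR of 6/7, while both NDW measures stay `None` because the sample is below their minimum lengths.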
Length-truncated measures are reported as `—` until the sample meets the minimum length, with a hint telling you how many more tokens or utterances you need.