What Is Lexical Diversity?
Lexical diversity describes how varied a child's vocabulary is in spontaneous speech. It is one of the three core descriptive dimensions of a language sample, alongside syntactic complexity (mean length of utterance, MLU) and grammaticality (percent grammatical utterances, PGU). A child with rich lexical diversity uses many different words; a child with poor lexical diversity reuses a small set of words across many utterances.
Why it matters. Lexical diversity is independently predictive of school readiness, narrative quality, and long-term reading outcomes. Children with developmental language disorder (DLD) consistently score lower on lexical diversity measures than typically developing peers, even when their MLU is in the normal range. Tracking lexical diversity gives you a second axis for monitoring expressive language growth.
Three commonly reported measures. Type-Token Ratio (TTR), Number of Different Words in the first 100 tokens (NDW-100), and Number of Different Words across the first 50 utterances (NDW-50). All three are computed by this tool from the same pasted transcript.
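As a minimal sketch (illustrative function names, not the tool's actual code), the three measures reduce to simple set arithmetic over token lists:

```python
def ttr(tokens: list[str]) -> float:
    # Type-Token Ratio: unique word types divided by total tokens.
    return len(set(tokens)) / len(tokens)

def ndw_100(tokens: list[str]) -> int:
    # Number of Different Words in the first 100 tokens (Miller 1981 truncation).
    return len(set(tokens[:100]))

def ndw_50(utterances: list[list[str]]) -> int:
    # Number of Different Words across the first 50 utterances (SUGAR convention).
    return len({token for utt in utterances[:50] for token in utt})
```

The key difference is the unit of truncation: NDW-100 cuts the sample by token count, NDW-50 by utterance count, and TTR uses the whole sample.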
TTR vs. NDW — Which Should I Report?
Use TTR when you are describing a single sample for a teaching example, comparing two samples of equal length, or showing the raw types-to-tokens relationship for a worked example.
Use NDW-100 (Miller 1981) when you are comparing children across samples of different lengths, reporting against published norms, or writing a research-grade evaluation. This is the default in SALT and CLAN.
Use NDW-50 utterances (SUGAR) when your sample is structured in utterances rather than tokens (which is how most clinicians collect samples), you need to compare against the SUGAR norms, or you are working with the standard 50-utterance SUGAR sample.
Common pitfall. Do not report TTR alone for samples of different lengths. A child with a TTR of 0.45 from a 60-token sample is not necessarily more lexically diverse than a child with a TTR of 0.32 from a 300-token sample: TTR falls as sample length grows, so the raw comparison is meaningless. Report NDW-100 instead.
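You can see the length effect with a quick simulation. This sketch assumes a hypothetical child with a fixed 150-word expressive vocabulary sampled uniformly; real speech is Zipfian, which makes the drop even steeper:

```python
import random

random.seed(0)  # reproducible demo

# Hypothetical 150-word expressive vocabulary (uniform draws for simplicity).
vocab = [f"word{i}" for i in range(150)]

def sample_ttr(n_tokens: int) -> float:
    # Draw n tokens from the same vocabulary and compute TTR.
    tokens = random.choices(vocab, k=n_tokens)
    return len(set(tokens)) / n_tokens

short_ttr = sample_ttr(60)   # short sample
long_ttr = sample_ttr(300)   # long sample, same underlying vocabulary
# The short sample's TTR comes out much higher even though the
# child's vocabulary has not changed at all.
```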
How This Calculator Works
Paste your language sample into the textarea, one utterance per line. The calculator:
- Splits the sample into utterances (newlines first, sentence-ending punctuation as a fallback) using the shared SLP utterance parser.
- Tokenises each utterance: lowercases, strips leading and trailing punctuation, treats contractions like "don't" as a single token.
- Counts total tokens and total unique types across the entire sample to compute TTR.
- Computes NDW-100 by counting unique types in the first 100 tokens of the sample (Miller 1981 truncation).
- Computes NDW-50 by counting unique types across the first 50 utterances of the sample (SUGAR convention).
- Lists the 20 most frequent word types so you can sanity-check the tokeniser and spot transcription typos.
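The steps above can be sketched end to end in a few lines of Python. This is an illustrative approximation, not the tool's actual parser or tokeniser, and the function names are assumptions:

```python
import re

def split_utterances(sample: str) -> list[str]:
    # Newlines first; sentence-ending punctuation as a fallback.
    lines = [line.strip() for line in sample.splitlines() if line.strip()]
    if len(lines) <= 1:
        lines = [u.strip() for u in re.split(r"(?<=[.!?])\s+", sample) if u.strip()]
    return lines

def tokenize(utterance: str) -> list[str]:
    # Lowercase and strip leading/trailing punctuation; internal apostrophes
    # are kept so contractions like "don't" stay a single token.
    tokens = []
    for raw in utterance.lower().split():
        word = raw.strip(".,!?;:\"()[]")
        if word:
            tokens.append(word)
    return tokens

def lexical_diversity(sample: str) -> dict:
    utterances = [tokenize(u) for u in split_utterances(sample)]
    tokens = [t for utt in utterances for t in utt]
    return {
        # TTR over the entire sample.
        "ttr": len(set(tokens)) / len(tokens) if tokens else None,
        # NDW-100: unique types in the first 100 tokens.
        "ndw_100": len(set(tokens[:100])) if len(tokens) >= 100 else None,
        # NDW-50: unique types across the first 50 utterances.
        "ndw_50": (len({t for utt in utterances[:50] for t in utt})
                   if len(utterances) >= 50 else None),
    }
```

For example, `lexical_diversity("I don't know.\nI want the ball.")` yields 7 tokens and 6 types, so a TTR of 6/7, while both NDW measures stay `None` because the sample is below their minimum lengths.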
Length-truncated measures are reported as `—` until the sample meets the minimum length, with a hint telling you how many more tokens or utterances you need.