Lexical Diversity Calculator

Compute Type-Token Ratio (TTR), NDW-100 (Miller 1981), and NDW per 50 utterances (SUGAR) from a single pasted transcript. The calculator handles tokenisation, length truncation, and type counting, and your data stays in your browser.

TTR · NDW-100 · NDW-50 (SUGAR) · Client-Side

Paste your language sample

One utterance per line. The calculator tokenises the sample and reports type-token ratio (TTR), number of different words (NDW), and total tokens — instantly, in your browser.

Paste a transcript above to compute TTR, NDW-100 (first 100 tokens), NDW-50 (first 50 utterances), total tokens, and total types.

Automate this workflow

Skip the manual count with ConductSpeech

ConductSpeech transcribes the audio, runs the analysis, and writes the clinical report — all in minutes instead of hours.

Use for

  • Computing TTR, NDW-100, and NDW-50 from a transcribed language sample
  • Tracking expressive vocabulary growth across therapy sessions
  • Comparing a child against SUGAR (Pavelko & Owens 2017) lexical diversity norms
  • Reporting a second descriptive metric alongside MLU in an evaluation
  • Teaching graduate SLP students why TTR is sample-length sensitive
  • Spot-checking a transcript for typos by reviewing the most frequent word types

Don't use for

  • Comparing TTR across samples of different lengths — use NDW-100 instead
  • Reporting NDW-100 on samples below 100 tokens (the calculator withholds the value and tells you how many more tokens you need)
  • Reporting NDW-50 on samples below 50 utterances (the calculator withholds the value and tells you how many more utterances you need)
  • Diagnosing developmental language disorder from lexical diversity alone — combine with MLU and PGU
  • Interpreting a bilingual child's English NDW without a parallel sample in their other language

What Is Lexical Diversity?

Lexical diversity describes how varied a child's vocabulary is in spontaneous speech. It is one of the three core descriptive dimensions of a language sample, alongside syntactic complexity (mean length of utterance, MLU) and grammaticality (percent grammatical utterances, PGU). A child with rich lexical diversity uses many different words; a child with poor lexical diversity reuses a small set of words across many utterances.

Why it matters. Lexical diversity is independently predictive of school readiness, narrative quality, and long-term reading outcomes. Children with developmental language disorder (DLD) consistently score lower on lexical diversity measures than typically developing peers, even when their MLU is in the normal range. Tracking lexical diversity gives you a second axis for monitoring expressive language growth.
Three commonly reported measures. Type-Token Ratio (TTR), Number of Different Words in the first 100 tokens (NDW-100), and Number of Different Words across the first 50 utterances (NDW-50). All three are computed by this tool from the same pasted transcript.
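
As a minimal illustration of the types-versus-tokens distinction behind all three measures, the Python sketch below counts both for a three-utterance toy sample. The sample text and the whitespace-only `split()` are assumptions made for illustration; they are not the calculator's own tokeniser.

```python
# Toy sample, already lowercased and free of punctuation (an assumption
# made so that a plain whitespace split is enough here).
utterances = [
    "the dog is big",
    "the dog is running",
    "i like the dog",
]

tokens = [word for line in utterances for word in line.split()]
types = set(tokens)

total_tokens = len(tokens)        # every word occurrence counts
total_types = len(types)          # each distinct word counts once
ttr = total_types / total_tokens  # type-token ratio

print(total_tokens, total_types, round(ttr, 2))  # prints: 12 7 0.58
```

Here 12 tokens contain 7 types, so TTR is 7/12, about 0.58. NDW-100 and NDW-50 are the same type count restricted to the first 100 tokens or the first 50 utterances, which this toy sample is too short to supply.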

TTR vs. NDW — Which Should I Report?

Use TTR when you are describing a single sample for a teaching example, comparing two samples of equal length, or showing the raw types-to-tokens relationship for a worked example.
Use NDW-100 (Miller 1981) when you are comparing children across samples of different lengths, reporting against published norms, or writing a research-grade evaluation. This is the default in SALT and CLAN.
Use NDW-50 utterances (SUGAR) when your sample is structured in utterances rather than tokens (which is how most clinicians collect samples), you need to compare against the SUGAR norms, or you are working with the standard 50-utterance SUGAR sample.
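Common pitfall. Do not compare TTR across samples of different lengths. A child with TTR 0.45 from a 60-token sample does not necessarily have a more diverse vocabulary than a child with TTR 0.32 from a 300-token sample: TTR falls as a sample gets longer because common words repeat, so the comparison is meaningless. Report NDW-100 instead, which means collecting at least 100 tokens from each child.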
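
The length effect can be seen in a small Python sketch. It assumes a hypothetical speaker who cycles through a fixed 50-word vocabulary, which is far more regular than real speech, and the helper names `ttr` and `ndw_100` are illustrative rather than taken from the calculator.

```python
def ttr(tokens):
    """Type-token ratio: distinct words divided by total words."""
    return len(set(tokens)) / len(tokens)


def ndw_100(tokens):
    """Distinct words in the first 100 tokens (None if the sample is shorter)."""
    if len(tokens) < 100:
        return None
    return len(set(tokens[:100]))


# Hypothetical speaker who cycles through the same 50-word vocabulary.
vocabulary = [f"word{i}" for i in range(50)]
long_sample = [vocabulary[i % 50] for i in range(300)]
short_sample = long_sample[:150]  # same speaker, shorter recording

print(round(ttr(short_sample), 2), round(ttr(long_sample), 2))  # prints: 0.33 0.17
print(ndw_100(short_sample), ndw_100(long_sample))              # prints: 50 50
```

The vocabulary is identical in both samples, yet TTR halves when the sample doubles in length, while NDW-100 stays at 50 because both samples are truncated to the same 100 tokens.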

How This Calculator Works

Paste your language sample into the textarea, one utterance per line. The calculator:

  • Splits the sample into utterances (newlines first, sentence-ending punctuation as a fallback) using the shared SLP utterance parser.
  • Tokenises each utterance: lowercases, strips leading and trailing punctuation, treats contractions like "don't" as a single token.
  • Counts total tokens and total unique types across the entire sample to compute TTR.
  • Computes NDW-100 by counting unique types in the first 100 tokens of the sample (Miller 1981 truncation).
  • Computes NDW-50 by counting unique types across the first 50 utterances of the sample (SUGAR convention).
  • Lists the 20 most frequent word types so you can sanity-check the tokeniser and spot transcription typos.
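
The steps above can be sketched in Python as follows. This is an approximation under stated assumptions, not the calculator's actual code: the function names are hypothetical, the punctuation fallback is assumed to apply only when the input has no line breaks, and `string.punctuation` stands in for whatever punctuation set the shared SLP utterance parser uses.

```python
import re
import string
from collections import Counter


def split_utterances(sample):
    """Split on newlines; fall back to sentence-ending punctuation.

    Assumption: the fallback is used only when the input is a single line.
    """
    lines = [line.strip() for line in sample.splitlines() if line.strip()]
    if len(lines) <= 1:
        parts = re.split(r"(?<=[.!?])\s+", sample.strip())
        lines = [part.strip() for part in parts if part.strip()]
    return lines


def tokenise(utterance):
    """Lowercase, strip leading/trailing punctuation, keep contractions whole."""
    tokens = []
    for raw in utterance.lower().split():
        word = raw.strip(string.punctuation)  # interior apostrophes survive
        if word:
            tokens.append(word)
    return tokens


def analyse(sample):
    """Return TTR, NDW-100, NDW-50, and the 20 most frequent types."""
    per_utterance = [tokenise(u) for u in split_utterances(sample)]
    per_utterance = [toks for toks in per_utterance if toks]  # drop empty lines
    tokens = [t for toks in per_utterance for t in toks]
    first_50 = [t for toks in per_utterance[:50] for t in toks]
    types = set(tokens)
    return {
        "utterances": len(per_utterance),
        "tokens": len(tokens),
        "types": len(types),
        "ttr": len(types) / len(tokens) if tokens else None,
        # Length-truncated measures are None until the minimum is met.
        "ndw_100": len(set(tokens[:100])) if len(tokens) >= 100 else None,
        "ndw_50": len(set(first_50)) if len(per_utterance) >= 50 else None,
        "top_20": Counter(tokens).most_common(20),
    }


result = analyse("The dog is big.\nThe dog is running!\nI don't like the dog.")
print(result["tokens"], result["types"])  # prints: 13 8
print(result["ndw_100"], result["ndw_50"])  # prints: None None
```

In this three-utterance example "don't" is counted as one token, and both truncated measures are unavailable because the sample has fewer than 100 tokens and fewer than 50 utterances.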

Length-truncated measures are reported as `—` until the sample meets the minimum length, with a hint telling you how many more tokens or utterances you need.
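
One way to implement this gating is sketched below in Python. The function name `gated_ndw`, the `(display, hint)` return shape, and the hint wording are all hypothetical; only the placeholder dash and the "how many more" behaviour come from the description above.

```python
PLACEHOLDER = "\u2014"  # the dash shown while a measure is unavailable


def gated_ndw(window_tokens, have, need, unit):
    """Return (display, hint): a type count, or the placeholder plus a hint.

    window_tokens: tokens inside the truncation window
    have / need:   current and minimum sample length, measured in `unit`
    """
    if have < need:
        return PLACEHOLDER, f"Add {need - have} more {unit} to report this measure."
    return str(len(set(window_tokens))), ""


short = ["the", "dog", "is", "big"] * 20   # 80 tokens, 4 types
longer = short + ["a", "cat"] * 10         # 100 tokens, 6 types

print(gated_ndw(short[:100], len(short), 100, "tokens"))
print(gated_ndw(longer[:100], len(longer), 100, "tokens"))
```

The same function covers NDW-50 by passing the tokens from the first 50 utterances, the utterance count, a minimum of 50, and "utterances" as the unit.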

Frequently Asked Questions