Lexical Diversity Calculator.

Type-Token Ratio, NDW-100 (Miller 1981), and NDW per 50 utterances (SUGAR) from a single pasted transcript. The calculator handles tokenisation, length truncation, and type counting — your data stays in your browser.

Private: Data stays in your browser
Live: No sign-up required
Validated: 2026-04-06
Citable: Methods and citation included

Calculator

Paste your language sample

One utterance per line. The calculator tokenises the sample and reports type-token ratio (TTR), number of different words (NDW), and total tokens — instantly, in your browser.

Automate this workflow

Skip the manual count with ConductSpeech

ConductSpeech transcribes the audio, runs the analysis, and writes the clinical report — all in minutes instead of hours.

When to use

  • Computing TTR, NDW-100, and NDW-50 from a transcribed language sample
  • Tracking expressive vocabulary growth across therapy sessions
  • Comparing a child against SUGAR (Pavelko & Owens 2017) lexical diversity norms
  • Reporting a second descriptive metric alongside MLU in an evaluation
  • Teaching graduate SLP students why TTR is sample-length sensitive
  • Spot-checking a transcript for typos by reviewing the most frequent word types

Do not use for

  • Comparing TTR across samples of different lengths — use NDW-100 instead
  • Reporting NDW-100 on samples below 100 tokens (the calculator will refuse and tell you)
  • Reporting NDW-50 on samples below 50 utterances (the calculator will refuse and tell you)
  • Diagnosing developmental language disorder from lexical diversity alone — combine with MLU and PGU
  • Interpreting English NDW values for bilingual children without a parallel sample in their other language

Always fix the length before reporting

TTR falls as a sample gets longer and raw NDW rises with it, so neither number is interpretable without a fixed sample length. Always report NDW-100 (Miller) or NDW-50 (SUGAR) so other clinicians can interpret your number against norms.

Tokens are case-insensitive but morphology-sensitive

The calculator merges "Dog" and "dog" into a single type but treats "run", "runs", and "running" as distinct types. This matches SALT and most published norms — do not lemmatise unless your reference data was lemmatised too.
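
A minimal sketch of the case-folding rule, assuming the tokens have already been split out. The names `typeKey` and `countDistinctTypes` are hypothetical and are not taken from the calculator's source.

```typescript
// Hypothetical helper: the key under which a token is counted as a type.
// Only case is folded; no stemming or lemmatisation is applied.
function typeKey(token: string): string {
  return token.toLowerCase();
}

// Number of distinct types in a list of tokens.
function countDistinctTypes(tokens: string[]): number {
  return new Set(tokens.map(typeKey)).size;
}

// "Dog" and "dog" collapse into one type; "run", "runs", "running" stay separate.
console.log(countDistinctTypes(["Dog", "dog", "run", "runs", "running"])); // 4
```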

Contractions count as one token

Under the standard SALT/SUGAR NDW convention, "don't", "I'm", and "we're" each count as one token (and one type). MLU rules differ — for MLU, contractions count as 2 morphemes — so the same word contributes differently to MLU vs. NDW.
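
The contraction rule can be illustrated with a small tokeniser. This is a sketch under stated assumptions, not the calculator's parser: `tokeniseUtterance` is a hypothetical name, and the character class is ASCII-only for brevity.

```typescript
// Illustrative tokeniser (an assumption, not the calculator's parser):
// split on whitespace, lowercase, strip leading/trailing non-alphanumerics.
// Apostrophes inside a word are untouched, so contractions stay whole.
// ASCII-only for brevity; accented letters would need a wider character class.
function tokeniseUtterance(utterance: string): string[] {
  return utterance
    .split(/\s+/)
    .map((word) => word.toLowerCase().replace(/^[^a-z0-9]+|[^a-z0-9]+$/g, ""))
    .filter((word) => word.length > 0);
}

console.log(tokeniseUtterance("I'm sure we don't know!"));
// ["i'm", "sure", "we", "don't", "know"]
```

The utterance yields five tokens, with "i'm" and "don't" each counted once.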

Use the per-utterance breakdown to spot transcription errors

A surprising spike in unique types per utterance often indicates a typo or an unusually creative utterance. Review the top word types table for misspellings — they inflate NDW.
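
To see why misspellings inflate NDW, here is a rough sketch of a word-type frequency table. The function name `wordTypeFrequencies` and the alphabetical tie-breaking are assumptions made for illustration.

```typescript
// Hypothetical helper: frequency table of word types, most frequent first.
// Ties are broken alphabetically so the output is deterministic.
function wordTypeFrequencies(tokens: string[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const token of tokens) {
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => {
    if (b[1] !== a[1]) return b[1] - a[1];
    return a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0;
  });
}

// "teh" is a transcription typo for "the".
const sampleTokens = ["the", "dog", "chased", "teh", "ball", "the", "dog"];
console.log(wordTypeFrequencies(sampleTokens));
// [["dog", 2], ["the", 2], ["ball", 1], ["chased", 1], ["teh", 1]]
```

The typo shows up as its own type, so this sample reports five different words instead of four.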

Lexical diversity complements MLU; it does not replace it

A child can have low MLU and high lexical diversity (many different words in short utterances) or high MLU and low lexical diversity (long sentences built from a limited vocabulary). Report both axes for a complete description of expressive language.

Method

Tokens are lowercased and stripped of leading and trailing punctuation by the shared SLP utterance parser (src/lib/slp/utterance-parser.ts). Contractions are treated as a single token to match the SALT/SUGAR NDW convention. TTR = total types / total tokens across the full sample. NDW-100 = number of unique types in the first 100 tokens (Miller 1981 truncation). NDW-50 = number of unique types across the first 50 utterances (SUGAR convention; Pavelko & Owens 2017). Length-truncated measures are reported only when the sample meets the minimum length.
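
The definitions above can be sketched as follows, assuming the sample has already been tokenised into one array of lowercase tokens per utterance. The interface and function names are hypothetical; `null` stands in for a measure that is withheld because the sample is too short.

```typescript
interface LexicalDiversity {
  totalTokens: number;
  totalTypes: number;
  ttr: number | null;    // types / tokens over the full sample
  ndw100: number | null; // unique types in the first 100 tokens
  ndw50: number | null;  // unique types across the first 50 utterances
}

const uniqueCount = (tokens: string[]): number => new Set(tokens).size;

// Input: one array of already-normalised tokens per utterance.
function computeLexicalDiversity(utterances: string[][]): LexicalDiversity {
  const all: string[] = [];
  for (const utterance of utterances) all.push(...utterance);

  const first50: string[] = [];
  for (const utterance of utterances.slice(0, 50)) first50.push(...utterance);

  return {
    totalTokens: all.length,
    totalTypes: uniqueCount(all),
    ttr: all.length > 0 ? uniqueCount(all) / all.length : null,
    // Truncated measures are withheld until the minimum length is met.
    ndw100: all.length >= 100 ? uniqueCount(all.slice(0, 100)) : null,
    ndw50: utterances.length >= 50 ? uniqueCount(first50) : null,
  };
}

console.log(computeLexicalDiversity([["the", "dog"], ["the", "cat"]]));
// { totalTokens: 4, totalTypes: 3, ttr: 0.75, ndw100: null, ndw50: null }
```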

Validated

Last validated 2026-04-06. Calculations are designed for planning and documentation support; verify clinical decisions against published norms or institutional SOPs.

How to cite

ConductScience Lexical Diversity Calculator (v1.0). ConductScience, Inc. 2026. Available at: https://conductscience.com/tools/lexical-diversity-calculator

Templin MC. Certain Language Skills in Children: Their Development and Interrelationships. University of Minnesota Press; 1957.

Miller JF. Assessing Language Production in Children: Experimental Procedures. University Park Press; 1981.

Pavelko SL, Owens RE. Sampling Utterances and Grammatical Analysis Revised (SUGAR): New normative values for language sample analysis measures. Language, Speech, and Hearing Services in Schools. 2017;48(3):197-215. doi:10.1044/2017_LSHSS-17-0022

Watkins RV, Kelly DJ, Harbers HM, Hollis W. Measuring children's lexical diversity: Differentiating typical and impaired language learners. Journal of Speech and Hearing Research. 1995;38(6):1349-1355. doi:10.1044/jshr.3806.1349

What Is Lexical Diversity?

Lexical diversity describes how varied a child's vocabulary is in spontaneous speech. It is one of the three core descriptive dimensions of a language sample alongside syntactic complexity (MLU) and grammaticality (PGU). A child with rich lexical diversity uses many different words; a child with poor lexical diversity reuses a small set of words across many utterances.

Why it matters. Lexical diversity is independently predictive of school readiness, narrative quality, and long-term reading outcomes. Children with developmental language disorder (DLD) consistently score lower on lexical diversity measures than typically developing peers, even when their MLU is in the normal range. Tracking lexical diversity gives you a second axis for monitoring expressive language growth.

Three commonly reported measures. Type-Token Ratio (TTR), Number of Different Words in the first 100 tokens (NDW-100), and Number of Different Words across the first 50 utterances (NDW-50). All three are computed by this tool from the same pasted transcript.

TTR vs. NDW — Which Should I Report?

Use TTR when you are describing a single sample for a teaching example, comparing two samples of equal length, or showing the raw types-to-tokens relationship for a worked example.

Use NDW-100 (Miller 1981) when you are comparing children across samples of different lengths, reporting against published norms, or writing a research-grade evaluation. This is the default in SALT and CLAN.

Use NDW-50 utterances (SUGAR) when your sample is structured in utterances rather than tokens (which is how most clinicians collect samples), you need to compare against the SUGAR norms, or you are working with the standard 50-utterance SUGAR sample.

Common pitfall. Do not compare TTR across samples of different lengths. A TTR of 0.45 from a 60-token sample does not indicate a richer vocabulary than a TTR of 0.32 from a 300-token sample: TTR drops as a sample gets longer, so the comparison is meaningless. Report NDW-100 instead, which requires at least 100 tokens in each sample.
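
The length effect is easy to reproduce with synthetic data. The sketch below uses an invented speaker who cycles through the same 30 word types; the data and the helper name `typeTokenRatio` are illustrative assumptions, not clinical values.

```typescript
// TTR = distinct types / total tokens.
function typeTokenRatio(tokens: string[]): number {
  return new Set(tokens).size / tokens.length;
}

// Synthetic speaker who keeps cycling through the same 30 word types.
const speech: string[] = Array.from({ length: 300 }, (_, i) => `word${i % 30}`);

console.log(typeTokenRatio(speech.slice(0, 60))); // 0.5 (30 types / 60 tokens)
console.log(typeTokenRatio(speech));              // 0.1 (30 types / 300 tokens)
console.log(new Set(speech.slice(0, 100)).size);  // 30  (NDW-100)
```

The vocabulary is identical in both cuts, yet TTR falls from 0.5 to 0.1 purely because the sample is longer. The count at a fixed 100 tokens does not depend on how much more speech was recorded.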

How This Calculator Works

Paste your language sample into the textarea, one utterance per line. The calculator:

  • Splits the sample into utterances (newlines first, sentence-ending punctuation as a fallback) using the shared SLP utterance parser.
  • Tokenises each utterance: lowercases, strips leading and trailing punctuation, treats contractions like "don't" as a single token.
  • Counts total tokens and total unique types across the entire sample to compute TTR.
  • Computes NDW-100 by counting unique types in the first 100 tokens of the sample (Miller 1981 truncation).
  • Computes NDW-50 by counting unique types across the first 50 utterances of the sample (SUGAR convention).
  • Lists the 20 most frequent word types so you can sanity-check the tokeniser and spot transcription typos.

Length-truncated measures are reported as `—` until the sample meets the minimum length, with a hint telling you how many more tokens or utterances you need.
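
A rough sketch of the splitting step and the "how many more" hint described above. It assumes the punctuation fallback applies only when the pasted text has no usable line breaks, which the page does not spell out; `splitUtterances` and `shortfall` are hypothetical names.

```typescript
// Sketch of the splitting step. Assumption: the punctuation fallback applies
// only when the pasted text contains no usable line breaks.
function splitUtterances(sample: string): string[] {
  const lines = sample
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  if (lines.length > 1) return lines;

  // Fallback: cut after runs of sentence-ending punctuation.
  const sentences: string[] = sample.match(/[^.!?]+[.!?]*/g) || [];
  return sentences.map((s) => s.trim()).filter((s) => s.length > 0);
}

// How many more tokens (or utterances) are needed before a
// length-truncated measure can be reported; 0 means the minimum is met.
function shortfall(have: number, minimum: number): number {
  return Math.max(0, minimum - have);
}

console.log(splitUtterances("The dog ran. The cat sat!"));
// ["The dog ran.", "The cat sat!"]
console.log(shortfall(60, 100)); // 40 more tokens needed for NDW-100
```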
