Vocabulary Diversity (VOCD / D-measure)
VOCD fits a mathematical curve to TTR values at many sub-sample sizes to produce a single diversity score, D, that is far less sensitive to sample length than raw TTR.
What VOCD / D measures
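VOCD is the procedure, and the program of the same name described by McKee, Malvern, and Richards (2000), that computes D, the lexical-diversity measure developed by Malvern and colleagues. It draws many random sub-samples of different sizes from the transcript (in the standard implementation, 100 random samples at each size from 35 to 50 tokens), computes the mean TTR at each size, and then fits a theoretical curve to the resulting points. The parameter of that curve, D, is the metric. Higher D corresponds to richer diversity. Because D describes how TTR falls as the sample grows, rather than TTR at one particular length, it is designed to be largely independent of sample length. That made it one of the first principled answers to the TTR sample-length problem, and it remains a research standard alongside MATTR. It is not entirely length-free, though: later evaluation (McCarthy & Jarvis, 2007) found that D can still vary with text length.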
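The sampling-and-fitting procedure can be sketched in a few lines of Python. This is a minimal illustration, not the vocd program itself: it assumes the transcript is already tokenised into a list of words, exposes the standard vocd sampling defaults as function parameters, and fits D with a golden-section search, a simple stand-in for whatever least-squares routine a given tool actually uses. The function names (`model_ttr`, `mean_ttr_at_size`, `fit_d`, `vocd_d`) and the search bounds for D are hypothetical choices made for this example; `model_ttr` implements the curve given under Formula.

```python
import math
import random
from statistics import mean
from typing import Optional


def model_ttr(n: int, d: float) -> float:
    """TTR predicted by the vocd curve for a sub-sample of n tokens."""
    return (d / n) * (math.sqrt(1 + 2 * n / d) - 1)


def mean_ttr_at_size(tokens: list, n: int, trials: int, rng: random.Random) -> float:
    """Mean TTR over `trials` random sub-samples of n tokens, drawn without replacement."""
    return mean(len(set(rng.sample(tokens, n))) / n for _ in range(trials))


def fit_d(points: list, lo: float = 0.01, hi: float = 1000.0, tol: float = 1e-6) -> float:
    """Least-squares D for (size, mean TTR) points, via golden-section search on [lo, hi].

    The bounds are an assumption of this sketch, not part of vocd.
    """
    def sse(d: float) -> float:
        return sum((ttr - model_ttr(n, d)) ** 2 for n, ttr in points)

    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        e = a + inv_phi * (b - a)
        if sse(c) < sse(e):
            b = e  # minimum lies in [a, e]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


def vocd_d(tokens: list, sizes=range(35, 51), trials: int = 100,
           repeats: int = 3, seed: Optional[int] = None) -> float:
    """Estimate D: sample at each size, average TTR, fit the curve, average over repeats."""
    if len(tokens) < max(sizes):
        raise ValueError(f"need at least {max(sizes)} tokens, got {len(tokens)}")
    rng = random.Random(seed)
    estimates = []
    for _ in range(repeats):
        points = [(n, mean_ttr_at_size(tokens, n, trials, rng)) for n in sizes]
        estimates.append(fit_d(points))
    return mean(estimates)
```

Because D comes from random sampling, two runs on the same transcript can differ slightly unless the seed is fixed, which is one reason implementations average several runs.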
Formula
TTR(N) = (D / N) * (√(1 + 2N/D) − 1), fit for D across sub-sample sizes N
Normative ranges and benchmarks
- Age 5;0 — D ≈ 40 – 60
- Age 8;0 — D ≈ 55 – 80
- Age 12;0 — D ≈ 70 – 100
- Adult conversational speech — D ≈ 80 – 120
- Children with DLD typically score 10 – 20 points below age-matched peers
Normative bands are central estimates drawn from the cited literature. Individual variation is wide — always cross-reference against the source paper and your assessment's own manual before quoting a cut-score in a report.
Clinical use
D is used far more in research than in clinical practice because computing it by hand is impractical: every software implementation draws hundreds of random sub-samples and fits a curve. If you use an automated LSA tool, D is almost always in the output, and it is worth including in your report as a secondary lexical index when you need a number that stays comparable across samples of different lengths, for example for a child you intend to re-assess in a year. For paper-and-pencil sessions, stick with NDW at a fixed sample length and MATTR computed in a spreadsheet. The main thing clinicians should know is that D and MATTR usually agree qualitatively: if one is low and the other is high, the sample is probably too short or contains a lot of one-word utterances.
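The spreadsheet MATTR calculation mentioned above is short enough to express as code, which makes it easy to set beside a D value from an LSA tool. This is a minimal sketch, assuming the sample is already tokenised into a list of words; the function name `mattr` and the 50-token default window are illustrative assumptions, so substitute whatever window your protocol specifies.

```python
from collections import Counter


def mattr(tokens: list, window: int = 50) -> float:
    """Moving-average TTR: mean TTR over every run of `window` consecutive tokens."""
    if len(tokens) < window:
        raise ValueError("sample is shorter than the MATTR window")
    counts = Counter(tokens[:window])  # type counts for the first window
    ttrs = [len(counts) / window]
    for i in range(window, len(tokens)):
        # Slide the window one token to the right: drop the oldest, add the newest.
        old = tokens[i - window]
        counts[old] -= 1
        if counts[old] == 0:
            del counts[old]  # keep only types actually present in the window
        counts[tokens[i]] += 1
        ttrs.append(len(counts) / window)
    return sum(ttrs) / len(ttrs)
```

Keeping a running count per word type means each slide costs constant time, mirroring a spreadsheet that recomputes TTR for each window row.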
“D is the metric you report when the reviewer will read the appendix. MATTR is the one you put in the body of the report because the parents have to understand it. They both tell the same story about the same kid.”
Related LSA metrics
Moving-Average Type-Token Ratio (MATTR)
MATTR computes TTR across a sliding window and removes the sample-length confound that makes raw TTR unreliable.
Type-Token Ratio (TTR)
TTR divides unique word roots by total words to index lexical diversity — quick to compute but highly sensitive to sample length.
Number of Different Words (NDW)
NDW counts unique word roots across a fixed sample length and is the most stable lexical-diversity measure for school-age children.
References
- Malvern, D., Richards, B., Chipere, N., & Durán, P. (2004). Lexical Diversity and Language Development: Quantification and Assessment. Palgrave Macmillan.
- McCarthy, P. M., & Jarvis, S. (2007). vocd: A theoretical and empirical evaluation. Language Testing, 24(4), 459–488.
- McKee, G., Malvern, D., & Richards, B. (2000). Measuring vocabulary diversity using dedicated software. Literary and Linguistic Computing, 15(3), 323–337.