VOCD / D

Vocabulary Diversity (VOCD / D-measure)

VOCD fits a theoretical curve to type-token ratio (TTR) values measured at many sub-sample sizes, producing a single diversity score, D, that is far less sensitive to sample length than raw TTR.

What VOCD / D measures

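VOCD, named after the vocd program developed by Malvern, Richards, and colleagues and often reported simply as "D", estimates lexical diversity by drawing many random sub-samples of different sizes from the transcript, averaging the TTR at each size, and then fitting a theoretical curve to the resulting points. The single parameter of that curve, D, is the metric. Higher D corresponds to greater lexical diversity. Because D describes the whole TTR-versus-size curve rather than TTR at one sample size, it is far less sensitive to sample length than raw TTR, although later evaluation (McCarthy & Jarvis, 2007) found that it is not fully independent of length. D was one of the first principled answers to the TTR sample-length problem and remains a research standard alongside MATTR (moving-average type-token ratio).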

Formula

TTR(N) = (D / N) * ( √(1 + 2N/D) − 1 ), fit for D across sub-sample sizes N
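
The fitting step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the vocd program itself: the function names (`ttr_model`, `mean_ttr`, `fit_d`, `estimate_d`) are hypothetical, the defaults of sub-sample sizes 35 to 50 with 100 random draws per size follow published descriptions of the procedure, and the golden-section search is a stand-in for whatever optimizer a given tool uses. It assumes the squared error has a single minimum between `lo` and `hi`.

```python
import math
import random


def ttr_model(n, d):
    """Theoretical TTR at sub-sample size n for diversity parameter d."""
    return (d / n) * (math.sqrt(1.0 + 2.0 * n / d) - 1.0)


def mean_ttr(tokens, n, trials, rng):
    """Average TTR over `trials` random sub-samples of n tokens (no replacement)."""
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(tokens, n)
        total += len(set(sample)) / n
    return total / trials


def fit_d(points, lo=1.0, hi=500.0, tol=1e-4):
    """Least-squares D by golden-section search.

    Assumption: the squared error has one minimum inside [lo, hi].
    """
    def sse(d):
        return sum((ttr_model(n, d) - t) ** 2 for n, t in points)

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)  # left probe
        e = a + ratio * (b - a)  # right probe
        if sse(c) < sse(e):
            b = e  # minimum lies in [a, e]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2.0


def estimate_d(tokens, sizes=range(35, 51), trials=100, seed=0):
    """Estimate D for a list of word tokens (assumed defaults, see lead-in)."""
    if len(tokens) < max(sizes):
        raise ValueError("transcript is shorter than the largest sub-sample size")
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    points = [(n, mean_ttr(tokens, n, trials, rng)) for n in sizes]
    return fit_d(points)
```

As a quick check on the formula, `ttr_model(50, 60)` evaluates to about 0.76: a speaker with D = 60 is expected to produce roughly 38 different words in a 50-token sub-sample. Because the sub-samples are random, two runs with different seeds will give slightly different D values, which is why implementations typically average over repeated runs.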

Normative ranges and benchmarks

  • Age 5;0 — D ≈ 40 – 60
  • Age 8;0 — D ≈ 55 – 80
  • Age 12;0 — D ≈ 70 – 100
  • Adult conversational speech — D ≈ 80 – 120
  • Children with DLD typically score 10 – 20 points below age-matched peers

Normative bands are central estimates drawn from the cited literature. Individual variation is wide — always cross-reference against the source paper and your assessment's own manual before quoting a cut-score in a report.

Clinical use

D is used more in research than in clinic because computing it by hand is impractical: every software implementation draws hundreds of random sub-samples and then runs a curve fit. If you use an automated language sample analysis (LSA) tool, D is almost always in the output, and it is worth including in your report as a secondary lexical index when you need a length-robust number for a child you intend to re-assess in a year. For paper-and-pencil sessions, stick with NDW (number of different words) at a fixed sample length and MATTR computed in a spreadsheet. The main thing clinicians should know is that D and MATTR usually agree qualitatively. If one is low and the other is high, the sample is probably too short or contains a lot of one-word utterances.
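
For readers who prefer a script to a spreadsheet, the two paper-and-pencil indices mentioned above can be sketched as follows. This is an illustration only: the function names `ndw` and `mattr` are hypothetical, and the defaults (a 100-token sample for NDW, a 50-token window for MATTR) are assumptions, so use whatever fixed length your protocol or manual specifies.

```python
def ndw(tokens, sample_length=100):
    """Number of different words in the first `sample_length` tokens."""
    if len(tokens) < sample_length:
        raise ValueError("transcript is shorter than the fixed sample length")
    return len(set(tokens[:sample_length]))


def mattr(tokens, window=50):
    """Mean TTR over every run of `window` consecutive tokens."""
    if len(tokens) < window:
        raise ValueError("transcript is shorter than the window")
    ttrs = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ttrs) / len(ttrs)
```

Both functions expect a list of already-normalized word tokens (for example, lower-cased, with mazes and unintelligible segments removed according to your transcription conventions).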

For the appendix, not the parent meeting

D is the metric you report when the reviewer will read the appendix. MATTR is the one you put in the body of the report because the parents have to understand it. They both tell the same story about the same kid.

References

  1. Malvern, D., Richards, B., Chipere, N., & Durán, P. (2004). Lexical Diversity and Language Development: Quantification and Assessment. Palgrave Macmillan.
  2. McCarthy, P. M., & Jarvis, S. (2007). vocd: A theoretical and empirical evaluation. Language Testing, 24(4), 459–488.
  3. McKee, G., Malvern, D., & Richards, B. (2000). Measuring vocabulary diversity using dedicated software. Literary and Linguistic Computing, 15(3), 323–337.