MATTR

Moving-Average Type-Token Ratio (MATTR)

MATTR computes TTR across a sliding window and removes the sample-length confound that makes raw TTR unreliable.

What MATTR measures

Moving-Average Type-Token Ratio, introduced by Covington and McFall in 2010, computes TTR within a sliding window of fixed size (typically 50 or 100 tokens), advancing one token at a time, and then averages across all windows in the sample. Because every window is the same length, the resulting average does not depend systematically on how long the overall sample is, so the single biggest criticism of traditional TTR disappears. MATTR has become a widely used lexical-diversity metric in modern automated language sample analysis (LSA) tools and in corpus-linguistic research.

Formula

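MATTR = (TTR_1 + TTR_2 + ... + TTR_K) / K, where TTR_i is the number of distinct types in window i divided by W, W is the window size in tokens, and K = N - W + 1 is the number of length-W sliding windows in a sample of N tokens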
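The formula can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function name `mattr`, its parameters, and the fallback to plain TTR for samples shorter than one window are all assumptions made for this example. It keeps a running count of the types in the current window so each one-token shift costs constant work.

```python
from collections import Counter


def mattr(tokens, window=50):
    """Mean TTR over every length-`window` sliding window of `tokens`."""
    if window < 1:
        raise ValueError("window must be a positive integer")
    n = len(tokens)
    if n == 0:
        raise ValueError("tokens must not be empty")
    if n < window:
        # Assumption for this sketch: fall back to plain TTR when the
        # sample is shorter than a single window.
        return len(set(tokens)) / n

    counts = Counter(tokens[:window])  # type frequencies in the current window
    total = len(counts) / window       # TTR of the first window
    for i in range(window, n):
        outgoing = tokens[i - window]  # token leaving the window
        counts[outgoing] -= 1
        if counts[outgoing] == 0:
            del counts[outgoing]       # keep len(counts) equal to the type count
        counts[tokens[i]] += 1         # token entering the window
        total += len(counts) / window
    return total / (n - window + 1)    # average over K = N - W + 1 windows
```

A call such as `mattr(transcript.lower().split(), window=50)` would work on a plain string named `transcript`, but lower-casing and whitespace splitting are assumptions too; a real transcript usually needs its own tokenization and cleaning rules before the count means anything.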

Normative ranges and benchmarks

  • W = 50, age 5;0 — mean 0.72 (typical range 0.68 – 0.78)
  • W = 50, age 8;0 — mean 0.77 (typical range 0.72 – 0.82)
  • W = 50, age 12;0 — mean 0.80 (typical range 0.76 – 0.84)
  • Window sizes below 25 tokens give noisy estimates; window sizes above 100 tokens give diminishing returns
  • MATTR is stable across samples of 100 to 1000 tokens, making it ideal for cross-child comparison

Normative bands are central estimates drawn from the cited literature. Individual variation is wide — always cross-reference against the source paper and your assessment's own manual before quoting a cut-score in a report.
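As a rough illustration of how the W = 50 bands listed above might be applied, the sketch below stores only those three age points and reports where a score falls relative to the typical range. The names `BANDS_W50` and `position_in_band` are hypothetical, and the code deliberately does not interpolate between ages or extend to other window sizes, since the list above gives no basis for that.

```python
# Central estimates and typical ranges copied from the W = 50 list above.
# Keys are ages in years;months notation. No other ages or windows are covered.
BANDS_W50 = {
    "5;0": {"mean": 0.72, "low": 0.68, "high": 0.78},
    "8;0": {"mean": 0.77, "low": 0.72, "high": 0.82},
    "12;0": {"mean": 0.80, "low": 0.76, "high": 0.84},
}


def position_in_band(score, age):
    """Return 'below', 'within', or 'above' relative to the typical range."""
    if age not in BANDS_W50:
        raise KeyError(f"no W = 50 band listed for age {age!r}")
    band = BANDS_W50[age]
    if score < band["low"]:
        return "below"
    if score > band["high"]:
        return "above"
    return "within"
```

The result is a descriptive label only; it is not a cut-score and should be read with the same caution as the bands themselves.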

Clinical use

MATTR earns its place in a report on the days you have samples of wildly different lengths: one child produced 80 utterances at a picnic, the next produced 280 at the art table, and you still want to compare their lexical diversity honestly. Because MATTR is averaged across same-size windows, it behaves itself even on uneven samples. For day-to-day clinical practice the interpretation is the same as TTR (higher is more diverse), but a score can be set alongside published values computed at the same window size without the "compared to what?" asterisk. Clinicians should set their window to 50 tokens for typical clinic-length samples and to 100 tokens for corpus-level data.
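The uneven-sample point can be seen on toy data. The sketch below invents two token streams of 80 and 280 tokens that cycle through the same 40-word vocabulary, so their underlying diversity is identical by construction; the data, the helper names `ttr` and `mattr`, and the deliberately naive windowing are all assumptions for illustration, and real transcripts will not be this tidy.

```python
def ttr(tokens):
    """Plain type-token ratio: distinct types divided by total tokens."""
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window=50):
    """Naive MATTR: mean TTR over every length-`window` slice.

    Assumes len(tokens) >= window.
    """
    windows = [tokens[i:i + window] for i in range(len(tokens) - window + 1)]
    return sum(ttr(w) for w in windows) / len(windows)


# Toy data (invented): the same 40-word vocabulary, cycled to two lengths.
vocab = [f"w{i}" for i in range(40)]
short_sample = [vocab[i % 40] for i in range(80)]
long_sample = [vocab[i % 40] for i in range(280)]

print(f"TTR   short={ttr(short_sample):.3f}  long={ttr(long_sample):.3f}")
# TTR   short=0.500  long=0.143
print(f"MATTR short={mattr(short_sample):.3f}  long={mattr(long_sample):.3f}")
# MATTR short=0.800  long=0.800
```

Raw TTR drops for the longer sample purely because it is longer, while the W = 50 MATTR is the same for both. The naive slicing here is chosen for readability; it recomputes every window from scratch and would be slow on corpus-sized input.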

MATTR is what happens when a linguist does the statistics homework TTR skipped. For about forty lines of Python you get a number you can actually put into a report without the committee asking about sample length.

References

  1. Covington, M. A., & McFall, J. D. (2010). Cutting the Gordian knot: The moving-average type-token ratio (MATTR). Journal of Quantitative Linguistics, 17(2), 94–100.
  2. Fergadiotis, G., Wright, H. H., & West, T. M. (2013). Measuring lexical diversity in narrative discourse of people with aphasia. American Journal of Speech-Language Pathology, 22(2), S397–S408.
  3. Kyle, K. (2016). Measuring syntactic development in L2 writing: Fine grained indices of syntactic complexity and usage-based indices of syntactic sophistication (Doctoral dissertation). Georgia State University.