
MLU: Calculation, Norms, and Clinical Use — A Practical Guide for SLPs

Mean Length of Utterance (MLU) is the single most-reported number in Language Sample Analysis, and the single most-misreported. This guide walks through Brown’s morpheme-counting rules, both flavours of MLU, the Brown and SUGAR normative data, and the IEP-goal workflow that turns the score into a defensible clinical decision — all with free browser tools.

1. What MLU is and why it became the standard

Mean Length of Utterance is the average number of meaningful units in a child’s spontaneous utterances. Roger Brown introduced the metric in A First Language (1973) as a way to summarise the grammar of his three longitudinal participants (Adam, Eve, and Sarah), and the field adopted it almost overnight because it does something no test score can: it produces a single, defensible number from the child’s real spoken language. Five decades later it is still the first metric on every clinical Language Sample Analysis report.

MLU works because grammar gets longer before it gets more complex. A two-year-old saying "doggie sleep" produces a 2-morpheme utterance. A five-year-old saying "the doggie is sleeping in his bed" produces 8 morphemes in 7 words (progressive -ing adds the extra one). The growth curve from 1.0 to roughly 6.0 morphemes is a remarkably consistent index of expressive-language maturation in monolingual General American English children, which is why ASHA, Owens (2014), and the SUGAR research team all keep recommending it as the headline metric for school-based reports.

There are two flavours: MLU in morphemes (MLU-m) and MLU in words (MLU-w). MLU-m counts every meaningful unit, including bound morphemes such as past-tense -ed and plural -s. MLU-w counts whitespace-separated words. They report related but distinct things, and most clinicians end up reporting both.

2. Brown’s morpheme-counting rules — the part everyone gets wrong

The single biggest source of error in clinical MLU reporting is hand-counting morphemes by ear. Brown’s rules are precise, and most of them are also counterintuitive. The list below is the consensus reading of Brown (1973, pp. 54-55) as restated by Owens (2014) and the SALT 2020 transcription manual. If your scoring software (or your tally sheet) does not implement them, your MLU will drift in the same direction every time.

  • Count compound words ("birthday", "playground", "pancake") as ONE morpheme. They were a single lexical item to the child.
  • Count irregular past-tense verbs ("went", "ate", "ran") as ONE morpheme. The child has not yet decomposed them.
  • Count irregular plurals ("feet", "children", "mice") as ONE morpheme.
  • Count diminutives such as "doggie", "horsie", "kitty" as ONE morpheme — the -ie/-y is not a productive morpheme in child speech.
  • Count all auxiliary forms ("is", "have", "will", "do", "be") as SEPARATE morphemes. Count the catenatives ("gonna", "wanna", "hafta") as ONE morpheme each, not as "going to", "want to", or "have to": Brown’s evidence was that they function as single units for the child.
  • Count inflections (-s plural, -ed past, -ing progressive, possessive 's, third-person -s) as SEPARATE morphemes once the child uses them productively in obligatory contexts.
  • Do NOT count fillers ("um", "uh", "er"), false starts, mazes, or repeated whole words. Bracket them in the transcript so they are excluded.
  • Do NOT count unintelligible segments. Mark them and skip them in the denominator.
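The two exclusion rules at the end of this list can be sketched in a few lines of Python. This is an illustration only, assuming a hypothetical transcription convention in which mazes, false starts, and repetitions are wrapped in parentheses and unintelligible speech is typed as XXX. The filler set is just the three examples named above, and dropping the whole utterance when it contains an unintelligible segment is one possible reading of the rule, not a quotation of Brown.

```python
import re

FILLERS = {"um", "uh", "er"}          # the filler examples listed above
MAZE = re.compile(r"\([^)]*\)")       # assumed convention: mazes in parentheses
UNINTELLIGIBLE = "XXX"                # assumed marker for unintelligible speech


def countable_tokens(utterance: str) -> list[str]:
    """Return the tokens that remain once mazes and fillers are removed."""
    without_mazes = MAZE.sub(" ", utterance)
    tokens = [t.strip(".,!?").lower() for t in without_mazes.split()]
    return [t for t in tokens if t and t not in FILLERS]


def usable_utterances(utterances: list[str]) -> list[list[str]]:
    """Drop utterances with an unintelligible marker; tokenise the rest."""
    kept = []
    for utt in utterances:
        if UNINTELLIGIBLE in utt:
            continue  # excluded from both the counts and the denominator
        tokens = countable_tokens(utt)
        if tokens:  # an utterance that was only fillers contributes nothing
            kept.append(tokens)
    return kept


sample = ["um (the the) the doggie sleep", "XXX want cookie", "uh"]
print(usable_utterances(sample))  # [['the', 'doggie', 'sleep']]
```

Only the first utterance survives: the second is dropped as unintelligible and the third is nothing but a filler.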

Counting "the doggies are running" the right way

"the" (1) + "doggie" (1) + "-s plural" (1) + "are" (1) + "run" (1) + "-ing progressive" (1) = 6 morphemes in 4 words. Hand-counters routinely report this utterance as 4 or 5 morphemes; the calculator reports 6.
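The worked count above can be reproduced mechanically once bound morphemes are marked in the transcript. The sketch below is a minimal illustration, assuming a hypothetical convention in which the transcriber has already applied Brown’s rules and marked each bound morpheme with a slash (doggie/s, run/ing). The function names are invented for this example, and the code does not decide what counts as a morpheme; it only tallies what the transcriber marked.

```python
def count_units(utterance: str) -> tuple[int, int]:
    """Return (morphemes, words) for one slash-segmented utterance."""
    words = utterance.split()
    # Each slash inside a word marks one additional bound morpheme.
    morphemes = sum(len(word.split("/")) for word in words)
    return morphemes, len(words)


def mlu(utterances: list[str]) -> tuple[float, float]:
    """Return (MLU-m, MLU-w) over a list of slash-segmented utterances."""
    if not utterances:
        raise ValueError("need at least one utterance")
    counts = [count_units(u) for u in utterances]
    total_morphemes = sum(m for m, _ in counts)
    total_words = sum(w for _, w in counts)
    n = len(utterances)
    return total_morphemes / n, total_words / n


print(count_units("the doggie/s are run/ing"))            # (6, 4)
print(mlu(["the doggie/s are run/ing", "doggie sleep"]))  # (4.0, 3.0)
```

The first call matches the hand count: 6 morphemes in 4 words. The second shows how MLU-m and MLU-w diverge as soon as one utterance carries bound morphemes.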

3. MLU-m vs MLU-w — when to report which

MLU-morphemes is more sensitive to early grammatical development because every newly-acquired bound morpheme adds to the count. It is the right metric for any child below MLU 4.0, which spans Brown Stages I through IV and the lower part of Stage V, or developmental ages of roughly 18 months to 4 years. Below this ceiling, MLU-m correlates with language age at r > 0.85 in the Brown corpus and at r ≈ 0.7 in the larger SUGAR sample.

MLU-words is easier to compute, requires no morpheme-counting rules, and is the metric most school-age normative datasets actually report. Above MLU 4.0, MLU-m and MLU-w correlate at r > 0.95 — they essentially measure the same thing once the child has acquired the high-frequency bound morphemes. For school-age children (kindergarten through grade 5) the field generally reports MLU-w against the SUGAR or Heilmann reference set and only falls back to MLU-m when the analysis question is specifically about morphological development.

For bilingual children, MLU-w is usually the safer choice because morpheme-counting rules differ across languages and there is no consensus cross-linguistic equivalent for "morpheme" as Brown defined it. Spanish-English bilingual evaluations almost always report MLU-w in each language separately and avoid MLU-m altogether.

4. Normative data — Brown’s stages and the SUGAR update

Brown (1973) divided early language development into five stages defined by MLU bands: Stage I (MLU 1.0-2.0), Stage II (2.0-2.5), Stage III (2.5-3.0), Stage IV (3.0-3.75), and Stage V (3.75-4.5). The stages map approximately to chronological ages 18-26 months, 27-30 months, 31-34 months, 35-40 months, and 41-46 months for typically-developing monolingual children. The stages remain useful as a rapid clinical gloss but they were derived from three children and they overestimate growth past Stage V — most school SLPs need a different reference for anyone older than 4.
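Mapping an MLU-m value to a stage is a simple band lookup. The sketch below uses the bands quoted in this paragraph; because adjacent bands share an endpoint (2.0 closes Stage I and opens Stage II), it assumes a boundary value belongs to the higher stage. That tie-breaking choice and the function name are assumptions made for the example.

```python
# (upper bound of band, label), using the MLU bands quoted above
BROWN_STAGES = [
    (2.0, "Stage I"),
    (2.5, "Stage II"),
    (3.0, "Stage III"),
    (3.75, "Stage IV"),
    (4.5, "Stage V"),
]


def brown_stage(mlu_m: float) -> str:
    """Return the Brown stage label for an MLU-m value."""
    if mlu_m < 1.0:
        raise ValueError("MLU cannot be below 1.0")
    for upper, label in BROWN_STAGES:
        if mlu_m < upper:  # half-open bands: a boundary value moves up a stage
            return label
    # 4.5 itself is treated as the top of Stage V; anything higher is off the scale
    return "Stage V" if mlu_m == 4.5 else "above Stage V"


print(brown_stage(2.8))   # Stage III
print(brown_stage(2.0))   # Stage II
print(brown_stage(5.2))   # above Stage V
```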

The current state of the art for school-age MLU is the SUGAR (Sampling Utterances and Grammatical Analysis Revised) project — Pavelko and Owens (2017, 2019) — which collected 50-utterance conversational samples from 250 typically-developing US children aged 3 through 7 and published age-banded means and standard deviations for MLU-w, NDW, and percent grammatical utterances. The SUGAR norms are the most defensible reference for monolingual General American English speakers in the 3-7 age band today.

For older school-age children, Heilmann, Nockerts, and Miller (2010) extended the reference range up through age 11 using the SALT databases. Above age 11, MLU as a metric loses sensitivity because typical adolescents already produce most utterances above the 7-morpheme ceiling, and clinicians switch to subordination index, T-units, or IPSyn for syntactic-complexity tracking.

  • Brown Stage I (MLU 1.0-2.0) — typical age 18-26 months. Single-word and two-word telegraphic speech.
  • Brown Stage II (MLU 2.0-2.5) — typical age 27-30 months. Present progressive -ing and prepositions in/on emerge.
  • Brown Stage III (MLU 2.5-3.0) — typical age 31-34 months. Plural -s and irregular past appear.
  • Brown Stage IV (MLU 3.0-3.75) — typical age 35-40 months. Possessive 's and uncontractible copula stabilise.
  • Brown Stage V (MLU 3.75-4.5) — typical age 41-46 months. Articles a/the and regular past -ed reach mastery.
  • SUGAR age 5 — mean MLU-w ≈ 5.5 (SD ≈ 0.8) on a 50-utterance conversational sample.
  • SUGAR age 7 — mean MLU-w ≈ 6.5 (SD ≈ 0.9). Children scoring more than 1 SD below the mean warrant follow-up.
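The 1 SD follow-up rule is a z-score comparison. As a minimal sketch, the code below hard-codes only the two approximate age bands quoted in this list; a real report should take the exact values from the lookup tool. The dictionary and function names are invented for the example.

```python
# Approximate (mean, SD) for MLU-w quoted in the list above, keyed by age in years
REFERENCE = {5: (5.5, 0.8), 7: (6.5, 0.9)}


def z_score(child_mlu: float, age_years: int) -> float:
    """Distance of the child's MLU from the reference mean, in SD units."""
    mean, sd = REFERENCE[age_years]
    return (child_mlu - mean) / sd


def warrants_follow_up(child_mlu: float, age_years: int, cutoff_sd: float = 1.0) -> bool:
    """True when the child scores more than cutoff_sd SDs below the mean."""
    return z_score(child_mlu, age_years) < -cutoff_sd


print(round(z_score(4.5, 5), 2), warrants_follow_up(4.5, 5))  # -1.25 True
print(round(z_score(6.0, 7), 2), warrants_follow_up(6.0, 7))  # -0.56 False
```

A five-year-old at MLU-w 4.5 sits 1.25 SD below the quoted mean and is flagged; a seven-year-old at 6.0 is within 1 SD and is not.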

Use the SUGAR Lookup, not your memory

The exact SUGAR means and standard deviations for ages 3-7 (and the Heilmann extension to age 11) live in the SUGAR Norms Lookup tool. Look them up at point of care — the field updates them often enough that hand-memorising is a liability.

5. Collecting a sample that gives you a stable MLU

MLU is unstable on small samples. The consensus floor is 50 utterances — below that, your single-utterance noise dominates the mean and the score swings by 0.5-1.0 morphemes between sessions. Pavelko and Owens (2017) collect exactly 50 utterances per child for the SUGAR norms, Heilmann et al. (2010) collect 100, and Eisenberg and Guo (2013) recommend 50-100 for clinical decisions. Anything shorter than 50 is a screening tool, not an MLU report.
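One way to see why short samples are unstable is the standard error of the mean, which shrinks with the square root of the number of utterances. The sketch below assumes a per-utterance standard deviation of 2.0 morphemes, a made-up figure chosen only for illustration, and treats utterances as independent, which real conversational samples only approximate.

```python
import math


def standard_error(utterance_sd: float, n_utterances: int) -> float:
    """Standard error of the mean utterance length for a sample of size n."""
    if n_utterances < 1:
        raise ValueError("need at least one utterance")
    return utterance_sd / math.sqrt(n_utterances)


ASSUMED_SD = 2.0  # hypothetical spread of individual utterance lengths

for n in (25, 50, 100):
    print(n, round(standard_error(ASSUMED_SD, n), 2))
# 25 0.4
# 50 0.28
# 100 0.2
```

Under these assumptions, halving the sample from 50 to 25 utterances widens the standard error from about 0.28 to 0.40 morphemes, and doubling it to 100 narrows it to 0.20.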

The sample also has to be elicited under conditions the norms expect. SUGAR norms come from a free-play conversation between an adult and the child for ages 3-5, and a topic-prompted conversation for ages 6-7. If you elicit a story-retell instead, your MLU will be inflated by 0.5-1.0 morphemes because narrative tasks pull longer utterances. Match the elicitation method to the reference set.

  • Aim for 50-100 utterances. 50 is the SUGAR floor; 100 is the Heilmann standard.
  • Use age-appropriate elicitation: free-play conversation (ages 3-5), topic-prompted conversation (ages 6-7), expository or narrative (ages 8+).
  • Keep adult talk turns at ≈30% of total turns. Adult-dominated samples depress child MLU.
  • Bracket mazes, fillers, and false starts in transcription — they are not part of MLU.
  • Record the elicitation method on the report so the next clinician can compare apples to apples.
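The 30% guideline in this list can be checked directly from a speaker-labelled transcript. The sketch below assumes a hypothetical format in which every line is one turn prefixed with "A:" for the adult or "C:" for the child; the labels and the function name are illustrative and not tied to any particular transcription tool.

```python
def adult_turn_share(turns: list[str]) -> float:
    """Fraction of labelled turns spoken by the adult ('A:' prefix)."""
    speakers = [t.split(":", 1)[0].strip().upper() for t in turns if ":" in t]
    if not speakers:
        raise ValueError("no speaker-labelled turns found")
    return speakers.count("A") / len(speakers)


turns = [
    "A: what is the doggie doing", "C: doggie sleep", "C: him tired",
    "A: where is he sleeping", "C: in bed", "C: big bed", "C: my bed",
    "A: is that your bed", "C: yeah", "C: doggie like it",
]
print(adult_turn_share(turns))  # 0.3
```

Three adult turns out of ten gives a share of 0.3, which sits at the guideline.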

6. Five common MLU errors and how to avoid them

Almost every flagged MLU report fails at one of the same five points. Read this list before your next clinic day; it will save you a re-test and probably save a child from a misclassified eligibility decision.

  • Counting catenatives wrong. Under Brown’s rules "gonna", "wanna", and "hafta" are 1 morpheme each, because young children use them as unanalysed wholes. Expanding them to 2 (going + to, want + to, have + to) systematically inflates MLU for ages 3-5.
  • Counting compound words as multiple morphemes. "Birthday" is 1, not 2. "Pancake" is 1, not 2.
  • Counting proper names by syllable. "Spider-Man" is 1 morpheme. "Elsa" is 1 morpheme.
  • Including unintelligible utterances in the denominator. Only intelligible utterances count toward N.
  • Reporting MLU on fewer than 50 utterances. Below the 50-utterance floor, the mean is unstable and the standard error is too wide for clinical interpretation.

7. From MLU score to IEP goal — the workflow

A defensible IEP goal written from an MLU score names the metric, the structures driving the gap, the target criterion, and the conditions under which the data will be collected. The MLU score on its own is a baseline number; the goal turns it into a measurable plan.

The workflow: identify the SUGAR or Brown reference for the child’s age, calculate the gap in morphemes between the child’s mean and the reference mean, examine the missing structures (which Brown’s morphemes are not appearing in obligatory contexts?), and write a goal that targets those structures with a 6-month criterion. Re-run the same 50-utterance MLU at the next IEP review — the metric is the goal’s own progress monitor.

  • Sample IEP goal (MLU 2.8 vs SUGAR age-4 expected 4.2): "Within 36 weeks of specially designed instruction, when shown an age-appropriate picture, [Student] will produce 5 consecutive utterances with mean length ≥ 3.5 morphemes including productive use of present-progressive -ing and plural -s, on 4 out of 5 trials across 3 sessions."
  • Always include the structures driving the gap, not just the metric — a goal that says "increase MLU to 3.5" gives the therapist no instructional target.
  • Re-run a 25-utterance probe quarterly and a full 50-utterance sample annually for IEP progress monitoring.
  • Document the elicitation method in the goal so the re-test conditions are reproducible.
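The gap and missing-structures steps of the workflow can be sketched as follows. The MLU figures are the ones from the sample goal above; the obligatory-context tallies are invented for illustration. The 90% default threshold follows Brown’s acquisition criterion of use in 90% of obligatory contexts, but it should be treated as an adjustable assumption here.

```python
def mlu_gap(child_mlu: float, reference_mean: float) -> float:
    """Gap between the reference mean and the child's MLU, in the same units."""
    return round(reference_mean - child_mlu, 2)


def structures_to_target(
    obligatory: dict[str, tuple[int, int]], criterion: float = 0.9
) -> list[str]:
    """Return morphemes supplied in fewer than `criterion` of obligatory contexts.

    `obligatory` maps a morpheme label to (times supplied, obligatory contexts).
    """
    targets = []
    for morpheme, (supplied, contexts) in obligatory.items():
        if contexts and supplied / contexts < criterion:
            targets.append(morpheme)
    return targets


print(mlu_gap(2.8, 4.2))  # 1.4

# Hypothetical tallies from a 50-utterance sample
tallies = {"-ing": (3, 12), "plural -s": (4, 10), "in/on": (9, 10)}
print(structures_to_target(tallies))  # ['-ing', 'plural -s']
```

With these made-up tallies the child is 1.4 below the reference mean, and progressive -ing and plural -s fall under the threshold, so those two structures go into the goal.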

8. Computing MLU — free tools, SALT, and ConductSpeech

Three options exist for actually scoring MLU. The browser-based MLU Calculator on this site implements Brown’s rules in code, accepts plain-text utterances with no special transcription syntax, and emits MLU-m, MLU-w, total morphemes, total words, and the matching Brown stage in one pass. It is free, runs entirely client-side (the child’s data never leaves the browser), and is the fastest option for a single sample.

SALT (Systematic Analysis of Language Transcripts) is the long-standing professional alternative. It costs roughly $295 per single-user licence, requires its own transcription syntax, and ships with a 700-child reference database that is broader than SUGAR. For a clinician running 50+ language samples per year on a research-grade workflow, SALT is still worth the investment. For a clinician running 10 samples per year as part of school-based evaluations, the free MLU Calculator covers the same metric on the same Brown rules.

ConductSpeech is the AI alternative for clinicians whose bottleneck is transcription, not scoring. It accepts the raw audio recording, runs the transcription, computes the full MLU/NDW/PGU battery, and drafts a present-levels paragraph. It is the only option in this list that automates the slow step — typing the transcript — so it pays for itself on caseloads above 20-30 students.

Free tools and reference pages

Every link in this guide stays on conductscience.com. Open any tool in a new tab and come back here for context.

Frequently asked questions

What is a normal MLU for a 3-year-old?
On the SUGAR norms (Pavelko & Owens, 2017), the mean MLU-words for typically-developing 3-year-olds on a 50-utterance conversational sample is approximately 3.2 (SD ≈ 0.6). Children scoring more than 1 standard deviation below the mean (roughly 2.6) on a 50-utterance sample warrant follow-up. Use the SUGAR Norms Lookup for the exact, current values.
How is MLU in morphemes different from MLU in words?
MLU-morphemes counts every meaningful unit, including bound morphemes such as past-tense -ed, plural -s, and progressive -ing. MLU-words counts whitespace-separated words and ignores bound morphemes. For children below MLU 4.0, MLU-m is more sensitive to grammatical development and is the recommended metric. Above MLU 4.0 the two metrics correlate at r > 0.95 and either is acceptable.
How many utterances do I need for a reliable MLU?
50 utterances is the consensus floor across Owens (2014), Pavelko and Owens (2017), Heilmann et al. (2010), and Eisenberg and Guo (2013). Below 50 utterances the mean is unstable and the standard error is too wide for clinical decisions. SUGAR norms are derived on exactly 50 utterances, which is also the most efficient floor for school-based clinicians.
Should I count "gonna" as one morpheme or two?
One. Brown’s rules list the catenatives "gonna", "wanna", and "hafta" alongside the auxiliaries but count each as a single morpheme rather than as "going to", "want to", or "have to", because the evidence is that they function as single units for young children. Expanding catenatives to 2 morphemes is a common hand-counting error and systematically inflates MLU for ages 3-5.
Can I use MLU for bilingual children?
Yes, but report MLU separately for each language and use MLU-words rather than MLU-morphemes — morpheme-counting rules differ across languages and there is no consensus cross-linguistic equivalent. Use language-matched norms (SUGAR for English, the SUGAR Spanish supplement for Spanish, or within-child progress monitoring where no published norms exist).
What does ConductSpeech add over the free MLU Calculator?
The free MLU Calculator on this site assumes you already have a typed transcript. ConductSpeech accepts the raw audio recording, runs the transcription, computes MLU plus the rest of the standard battery (NDW and PGU, the percentage of grammatical utterances), and drafts a present-levels paragraph in one pass. It pays for itself on caseloads above 20-30 students where the bottleneck is transcription time, not scoring.

References

  1. Brown, R. (1973). A First Language: The Early Stages. Cambridge, MA: Harvard University Press.
  2. Owens, R. E. (2014). Language Disorders: A Functional Approach to Assessment and Intervention (6th ed.). Boston, MA: Pearson.
  3. Pavelko, S. L., & Owens, R. E. (2017). Sampling Utterances and Grammatical Analysis Revised (SUGAR): New normative values for language sample analysis measures. Language, Speech, and Hearing Services in Schools, 48(3), 197-215.
  4. Pavelko, S. L., & Owens, R. E. (2019). Diagnostic accuracy of the Sampling Utterances and Grammatical Analysis Revised (SUGAR) measures for identifying children with language impairment. Language, Speech, and Hearing Services in Schools, 50(2), 211-223.
  5. Heilmann, J., Nockerts, A., & Miller, J. F. (2010). Language sampling: Does the length of the transcript matter? Language, Speech, and Hearing Services in Schools, 41(4), 393-404.
  6. Eisenberg, S. L., & Guo, L. (2013). Differentiating children with and without language impairment based on grammaticality. Language, Speech, and Hearing Services in Schools, 44(1), 20-31.
  7. Miller, J. F., & Iglesias, A. (2008). Systematic Analysis of Language Transcripts (SALT) [Computer software]. SALT Software, LLC.
  8. Rice, M. L., Smolik, F., Perpich, D., Thompson, T., Rytting, N., & Blossom, M. (2010). Mean length of utterance levels in 6-month intervals for children 3 to 9 years with and without language impairments. Journal of Speech, Language, and Hearing Research, 53(2), 333-349.
  9. Parker, M. D., & Brorson, K. (2005). A comparative study between mean length of utterance in morphemes (MLUm) and mean length of utterance in words (MLUw). First Language, 25(3), 365-376.

This article is a clinical reference, not a substitute for individual clinical judgement. Clinicians must adapt every recommendation to the individual student and to the current edition of any cited instrument manual.
