CAPE-V Voice Rating

Rate the six perceptual parameters of the Consensus Auditory-Perceptual Evaluation of Voice — Overall Severity, Roughness, Breathiness, Strain, Pitch, and Loudness — on the 100 mm visual analog scale as published by Kempster et al. (2009), with instant classification against the Kempster anchor bands (WNL, mild MI, moderate MO, severe SE) and copy-paste chart note output. Built for voice clinic intake, voice therapy progress visits, pre- and post-laryngology surgical follow-up, and SLP graduate training.

Kempster et al. 2009 | 100 mm Visual Analog Scale | 6 Perceptual Parameters | WNL / MI / MO / SE | Client-Side

Rate six CAPE-V perceptual parameters on the 100 mm visual analog scale

The Consensus Auditory-Perceptual Evaluation of Voice (Kempster et al. 2009) rates six perceptual parameters — Overall Severity, Roughness, Breathiness, Strain, Pitch, and Loudness — on a 100 mm visual analog scale. Slide each parameter to the clinician-rated mm position, mark consistent (C) or intermittent (I), and — for pitch and loudness — indicate too high or too low. The tool classifies each parameter against the Kempster anchor bands and returns a copy-paste clinical verdict line.

  1. Overall Severity

    Global, integrated impression of voice deviation — the single clinician judgment that anchors the overall CAPE-V verdict.

    0: none | 35: MI ceiling | 60: MO ceiling | 100: SE
  2. Roughness

    Perceived irregularity in the voicing source — a raspy, grating, or aperiodic quality most audible on sustained vowels.

    0: none | 35: MI ceiling | 60: MO ceiling | 100: SE
  3. Breathiness

    Audible escape of air through an incompletely adducted glottis — a noisy, whispered, or hypofunctional quality.

    0: none | 35: MI ceiling | 60: MO ceiling | 100: SE
  4. Strain

    Perceived effort — a squeezed, hyperfunctional, or pressed quality suggestive of elevated intrinsic laryngeal musculature tension.

    0: none | 35: MI ceiling | 60: MO ceiling | 100: SE
  5. Pitch

    Perceived deviation of fundamental frequency from age-, sex-, and culture-expected norms. Mark direction (too high or too low) as well as magnitude.

    0: none | 35: MI ceiling | 60: MO ceiling | 100: SE
  6. Loudness

    Perceived deviation of vocal intensity from culturally expected norms. Mark direction (too high or too low) as well as magnitude.

    0: none | 35: MI ceiling | 60: MO ceiling | 100: SE
Slide each parameter to the clinician-rated VAS position to see the CAPE-V severity verdict and the per-parameter bands.
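
The inputs described above (a mm position, a consistent or intermittent mark, and a direction for pitch and loudness only) can be captured in a small record. The following is a minimal sketch, not the tool's actual code: the `Rating` class, its field names, and the output format of `as_text` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

PARAMETERS = (
    "Overall Severity", "Roughness", "Breathiness",
    "Strain", "Pitch", "Loudness",
)
DIRECTIONAL = {"Pitch", "Loudness"}  # only these carry a too-high / too-low mark


@dataclass(frozen=True)
class Rating:
    """One clinician rating (hypothetical structure, for illustration only)."""
    parameter: str
    mm: float                        # position on the 100 mm line
    consistency: str                 # "C" (consistent) or "I" (intermittent)
    direction: Optional[str] = None  # "too high" / "too low", Pitch and Loudness only

    def __post_init__(self) -> None:
        if self.parameter not in PARAMETERS:
            raise ValueError(f"unknown parameter: {self.parameter!r}")
        if not 0 <= self.mm <= 100:
            raise ValueError("mm must lie on the 0-100 mm scale")
        if self.consistency not in ("C", "I"):
            raise ValueError("consistency must be 'C' or 'I'")
        if self.direction is not None:
            if self.parameter not in DIRECTIONAL:
                raise ValueError("direction applies to Pitch and Loudness only")
            if self.direction not in ("too high", "too low"):
                raise ValueError("direction must be 'too high' or 'too low'")

    def as_text(self) -> str:
        """Render the rating as a short chart-note fragment."""
        suffix = f", {self.direction}" if self.direction else ""
        return f"{self.parameter}: {self.mm:g} mm ({self.consistency}{suffix})"
```

For example, `Rating("Pitch", 42, "I", "too low").as_text()` returns `"Pitch: 42 mm (I, too low)"`, while passing a direction for Strain raises a `ValueError`.
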
CAPE-V severity bands (Kempster et al. 2009)
Band | VAS range | Clinical note
Severe (SE) | 61 - 100 mm | Severe deviation from normal voice quality on the CAPE-V visual analog scale. The perceptual feature is consistently and markedly atypical in connected speech and sustained vowels. Expect functional, occupational, and social voice-use impact. Prioritise laryngology consultation and standard voice therapy; pair with a stroboscopic laryngeal examination and an acoustic / aerodynamic battery.
Moderate (MO) | 36 - 60 mm | Moderate deviation from normal voice quality. Regular perceptual abnormality heard in connected speech, with measurable impact on listener perception and on the patient-reported voice handicap. Voice therapy is indicated; pair with the VHI-10, stroboscopic laryngeal examination, and acoustic (jitter, shimmer, CPP) and aerodynamic (MPT, s/z ratio) measures.
Mild (MI) | 10 - 35 mm | Mild deviation from normal voice quality. The perceptual feature is detectable but does not dominate the percept. Voice therapy is appropriate especially for occupational voice users; re-rate with the CAPE-V at every progress visit and anchor against the patient-reported VHI-10 and a stroboscopic exam.
Within normal limits (WNL) | 0 - 9 mm | Within normal limits on the CAPE-V visual analog scale. No clinically meaningful deviation for this parameter. Document the score and re-rate at the next progress visit; a score at or near 0 mm on every parameter is the intended post-therapy discharge target for most voice rehabilitation cases.

Boundary rule: 0 - 9 mm is within normal limits. 10 - 35 mm is mildly deviant (MI). 36 - 60 mm is moderately deviant (MO). 61 - 100 mm is severely deviant (SE). Source: Kempster GB, Gerratt BR, Verdolini Abbott K, Barkmeier-Kraemer J, Hillman RE (2009) Consensus Auditory-Perceptual Evaluation of Voice: Development of a Standardized Clinical Protocol. American Journal of Speech-Language Pathology 18(2):124-132.
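
The boundary rule can be expressed as a short lookup. This is a sketch under stated assumptions rather than the tool's own implementation: the function name `classify_vas` is hypothetical, and because the bands are stated in whole millimetres, the treatment of fractional values (each band starts at its stated lower bound, so 35.5 mm stays in MI) is an assumption.

```python
def classify_vas(mm: float) -> str:
    """Map a 0-100 mm VAS position to the band labels used in the text.

    Bands as stated above: 0-9 WNL, 10-35 MI, 36-60 MO, 61-100 SE.
    Assumption: a fractional value belongs to the band whose lower
    bound it has reached (for example, 35.5 mm is still MI).
    """
    if not 0 <= mm <= 100:
        raise ValueError("VAS position must be between 0 and 100 mm")
    if mm >= 61:
        return "SE"
    if mm >= 36:
        return "MO"
    if mm >= 10:
        return "MI"
    return "WNL"
```

Checking the edges of each band is the quickest sanity test: 9 mm gives `"WNL"`, 10 mm gives `"MI"`, 36 mm gives `"MO"`, and 61 mm gives `"SE"`.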

Use for

  • Voice clinic intake — rate the CAPE-V alongside the VHI-10, stroboscopic exam, and acoustic / aerodynamic battery
  • Voice therapy progress visits — re-rate every 2 - 4 weeks and track Overall Severity change across visits
  • Pre- and post-surgical laryngology follow-up — compare CAPE-V scores before and after vocal-fold lesion excision, injection laryngoplasty, or thyroplasty
  • Occupational voice user screening — teachers, singers, broadcasters, attorneys, clergy, fitness instructors
  • Parkinson disease hypophonia baseline and LSVT LOUD progress tracking
  • Gender-affirming voice therapy — rate the CAPE-V pitch and loudness parameters across treatment
  • SLP graduate training in voice evaluation — practise CAPE-V rating against expert anchor cases
  • Dysphonia outcome research — the CAPE-V is the U.S. standard perceptual outcome measure
  • Second opinion — re-rate the CAPE-V to confirm a prior rating before revising the treatment plan
  • Chart-review audit — re-rate archived voice samples to verify inter-rater agreement across a clinic team

Don't use for

  • As a substitute for a stroboscopic laryngeal examination — perceptual rating cannot identify structural lesions
  • As the only outcome measure — always pair with the VHI-10 and acoustic / aerodynamic measures
  • On single isolated stimuli — the Kempster protocol requires the full stimulus set of sustained vowels, six CAPE-V sentences, and 20 seconds of spontaneous speech
  • On children under roughly age 5 without age-appropriate stimulus adaptation — the standard CAPE-V sentences are calibrated to school-age and adult speakers
  • For languages other than English without linguistic adaptation — Spanish, French, Portuguese, and Arabic CAPE-V adaptations exist in the literature and should be used for speakers of those languages

Why Perceptual Rating Still Matters in Voice Evaluation

Voice quality is a perceptual phenomenon. The listener's ear is the final arbiter of whether a patient sounds rough, breathy, strained, too high, too low, too loud, or too soft. The acoustic and aerodynamic measures that clinicians use alongside the perceptual rating (jitter, shimmer, CPP, MPT, s/z ratio, electroglottography) are ultimately validated against the perceptual judgment of expert clinicians. The American Speech-Language-Hearing Association Special Interest Group 3 (Voice and Voice Disorders) convened the consensus conference that produced the CAPE-V because the earlier GRBAS scale (Hirano 1981) had three shortcomings: documented test-retest and inter-rater reliability problems, an ordinal scale with only four discrete levels, and no standardised stimulus set. The CAPE-V was designed to address each of them.

Continuous scale. The 100 mm visual analog scale is a continuous measurement — the clinician can mark anywhere along the line — which means the CAPE-V captures smaller changes than an ordinal scale. Zempleni et al. (2018) and Helou et al. (2010) showed the smallest clinically meaningful change on CAPE-V Overall Severity is approximately 10 mm, which is roughly a 0.3 - 0.5 point change on the older GRBAS 0 - 3 scale and is therefore below the GRBAS resolution limit.
Standardised stimulus set. The Kempster (2009) protocol specifies sustained vowels, six CAPE-V sentences, and 20 seconds of spontaneous speech. Every clinician rates the same stimulus set so the ratings are directly comparable across visits, across clinicians, and across clinics.
Explicit anchor bands. The MI (mildly deviant, 10 - 35 mm), MO (moderately deviant, 36 - 60 mm), and SE (severely deviant, 61 - 100 mm) labels on the printed form give every clinician the same mental reference frame for every mm position.
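
The resolution comparison with GRBAS can be checked with simple arithmetic. The sketch below assumes a plain linear rescaling of the 100 mm line onto the 0 - 3 GRBAS range, which is an illustrative simplification and not a published conversion; the function name is hypothetical.

```python
def vas_change_to_grbas_points(
    mm_change: float,
    vas_length_mm: float = 100.0,
    grbas_max: float = 3.0,
) -> float:
    """Rescale a VAS change in mm onto the 0-3 GRBAS range.

    Assumption: a linear mapping between the two scales, used only
    to illustrate the resolution argument in the text.
    """
    return mm_change / vas_length_mm * grbas_max


# A 10 mm change maps to roughly 0.3 GRBAS points, well under the
# 1-point step that separates adjacent GRBAS levels.
grbas_equivalent = vas_change_to_grbas_points(10)
```

Under this linear assumption the result sits at the lower end of the 0.3 - 0.5 point range quoted above.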

Interpreting the Six CAPE-V Parameters

Overall Severity. The integrated clinician judgment of how dysphonic the voice sounds as a whole. This single parameter anchors the overall verdict of the CAPE-V. It is not the arithmetic mean of the other five parameters — it is a separate integrated impression that the clinician forms across the sustained vowels, the six sentences, and the spontaneous speech. Many clinicians find that Overall Severity is often slightly higher than the mean of the other five parameters because it captures "something is off" cues that do not map cleanly onto any single feature.
Roughness. Perceived irregularity in the voicing source — a raspy, grating, aperiodic quality most audible on sustained vowels. Roughness is acoustically correlated with elevated jitter, elevated shimmer, and reduced cepstral peak prominence (CPP). Common drivers include vocal-fold mass lesions (nodules, polyps, cysts, leukoplakia), surface scarring, and Reinke oedema.
Breathiness. Audible escape of air through an incompletely adducted glottis — a noisy, whispered, or hypofunctional quality. Breathiness is acoustically correlated with elevated noise-to-harmonic ratio and reduced H1-H2. Common drivers include vocal-fold paresis or paralysis, age-related glottal insufficiency (presbyphonia), post-surgical glottal gap, and psychogenic or functional aphonia.
Strain. Perceived effort — a squeezed, hyperfunctional, pressed quality. Strain is the hallmark perceptual feature of primary muscle tension dysphonia and is the parameter most responsive to voice therapy with resonant voice, flow-phonation, and manual laryngeal release techniques. Strain is also elevated in adductor spasmodic dysphonia, but on a CAPE-V rating it has a distinctive "squeeze-and-release" pattern rather than the consistent elevation of muscle tension dysphonia.
Pitch. Perceived deviation of fundamental frequency from age-, sex-, and culture-expected norms. Mark direction (too high or too low) on the printed form. Pitch too high is common in puberphonia, falsetto-register fixation, and anxiety-related hyperfunction. Pitch too low is common in Reinke oedema and vocal-fold mass lesions, and in speakers seeking voice feminisation in gender-affirming care, whose pitch was lowered by testosterone exposure.
Loudness. Perceived deviation of vocal intensity from culturally expected norms. Mark direction on the printed form. Loudness too high is common in hyperfunctional dysphonia, occupational voice over-use, and hearing-impaired speakers. Loudness too low is the hallmark perceptual feature of Parkinson disease hypophonia and responds robustly to Lee Silverman Voice Treatment (LSVT LOUD).

Pair CAPE-V with the Rest of the Voice Battery

The CAPE-V is one component of a complete voice evaluation. The Kempster (2009) paper and the ASHA Practice Portal on voice disorders both stress that the perceptual rating should be interpreted alongside the patient-reported voice handicap, the laryngoscopic exam, and a set of acoustic and aerodynamic measures.

Patient-reported voice handicap (VHI-10 or VHI-30). Rate the perceptual CAPE-V alongside the patient-reported Voice Handicap Index-10 (Rosen et al. 2004) at every voice clinic intake and at every progress visit. The two measures answer different questions — CAPE-V asks how the voice sounds to a trained listener, VHI-10 asks how the voice impact feels to the patient — and they can diverge, especially in occupational voice users (teachers, singers, broadcasters, attorneys, clergy, fitness instructors) who report high handicap on mildly deviant voices and in patients with long-standing dysphonia who have habituated to a severe voice and report low handicap.
Stroboscopic laryngeal examination. Every patient with a CAPE-V Overall Severity above the within-normal-limits band should have a stroboscopic laryngeal examination at intake. Moderate and severe CAPE-V scores are typically associated with visible structural or neurogenic findings — nodules, polyps, cysts, Reinke oedema, scarring, paresis, or paralysis — and the laryngoscopic exam is the single most important diagnostic step for ruling out lesions that require surgery before therapy.
Acoustic battery. Jitter, shimmer, cepstral peak prominence (CPP), noise-to-harmonic ratio (NHR), and the Acoustic Voice Quality Index (AVQI, Maryn et al. 2010) all correlate with specific CAPE-V parameters and provide an objective cross-check on the perceptual rating.
Aerodynamic battery. Maximum phonation time, s/z ratio, mean phonatory airflow, and subglottal pressure estimates complete the workup. A short MPT (< 15 seconds in adults) paired with an elevated CAPE-V breathiness score suggests glottal insufficiency; a short MPT paired with an elevated strain score suggests hyperfunction against a hypoadducted glottis.
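
The two MPT pairings described for the aerodynamic battery can be sketched as a simple rule check. The 15-second adult cut-off comes from the text; the 10 mm threshold for an "elevated" score (anything above the WNL band) is an assumption, and the function and constant names are hypothetical. The output is a prompt for clinical reasoning, not a diagnosis.

```python
from typing import List

SHORT_MPT_SECONDS = 15.0  # adult cut-off quoted in the text
ELEVATED_MM = 10.0        # assumption: "elevated" means above the WNL band


def interpret_mpt(mpt_seconds: float, breathiness_mm: float, strain_mm: float) -> List[str]:
    """Return the interpretive hints the text attaches to a short MPT."""
    notes: List[str] = []
    if mpt_seconds >= SHORT_MPT_SECONDS:
        return notes  # MPT is not short, so neither pairing applies
    if breathiness_mm >= ELEVATED_MM:
        notes.append("short MPT + elevated breathiness: consider glottal insufficiency")
    if strain_mm >= ELEVATED_MM:
        notes.append(
            "short MPT + elevated strain: consider hyperfunction "
            "against a hypoadducted glottis"
        )
    return notes
```

Both hints can fire for the same patient, which matches the text: a speaker with a short MPT may be both breathy and strained.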

Tracking Voice Therapy Progress with Serial CAPE-V

Serial CAPE-V ratings are a practical way to track voice therapy progress in a voice clinic. Rate the CAPE-V at intake and at every progress visit, typically every 2 - 4 weeks, and plot the Overall Severity score across visits. The clinically meaningful change (Zempleni et al. 2018) is approximately 10 mm on the Overall Severity VAS, so a drop of 10 mm or more between consecutive visits can be treated as a real change, while smaller drops are best interpreted as rating noise.
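
The visit-to-visit comparison can be sketched as follows. The 10 mm threshold is the one quoted in the text; applying the same threshold to increases (labelled "worsened") is an assumption, since the text discusses drops only. The function name and labels are hypothetical.

```python
from typing import List, Tuple

MEANINGFUL_CHANGE_MM = 10.0  # threshold quoted in the text


def label_changes(overall_severity_mm: List[float]) -> List[Tuple[float, str]]:
    """Label each consecutive-visit change in Overall Severity.

    Negative deltas are drops (improvement). Treating rises of
    10 mm or more as "worsened" is a symmetric assumption.
    """
    labelled = []
    for previous, current in zip(overall_severity_mm, overall_severity_mm[1:]):
        delta = current - previous
        if delta <= -MEANINGFUL_CHANGE_MM:
            label = "improved"
        elif delta >= MEANINGFUL_CHANGE_MM:
            label = "worsened"
        else:
            label = "within noise"
        labelled.append((delta, label))
    return labelled
```

For a series of 52, 45, 33, and 30 mm, the changes are -7, -12, and -3 mm, so only the second interval is labelled as an improvement.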

Discharge criteria. Most U.S. voice clinics use a discharge threshold of CAPE-V Overall Severity below 10 mm (within normal limits) alongside a VHI-10 below 11 and a patient-reported return to occupational voice use. Individualise the criteria for occupational voice users (teachers, broadcasters, singers), who may need an Overall Severity below 9 mm for the voice to meet the demands of their job.
Non-responders. A patient whose CAPE-V Overall Severity fails to drop by at least 10 mm across four progress visits is a non-responder. Re-evaluate the diagnosis, re-image the larynx, refer to laryngology for a second opinion, and consider whether the primary driver is structural (requires surgery), neurogenic (requires a different therapy approach such as LSVT LOUD for Parkinson disease hypophonia), or psychogenic (requires a referral to the voice clinic psychologist).
Documentation. Paste the CAPE-V mm scores and the Kempster anchor band into every voice-therapy progress note. The chart note should reference the CAPE-V Overall Severity score explicitly, the direction and magnitude of change since the previous visit, and the continued or revised voice therapy plan.
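
The discharge, non-responder, and documentation rules above can be sketched together. The thresholds are taken from the text; reading "across four progress visits" as the total drop from intake to the fourth progress visit is an assumption, and all function names and the note wording are hypothetical.

```python
from typing import List


def meets_discharge(
    overall_mm: float,
    vhi10: int,
    returned_to_voice_use: bool,
    overall_target_mm: float = 10.0,  # pass 9.0 for occupational voice users
) -> bool:
    """Discharge check: Overall Severity and VHI-10 below threshold, plus return to voice use."""
    return overall_mm < overall_target_mm and vhi10 < 11 and returned_to_voice_use


def is_non_responder(intake_mm: float, progress_mm: List[float]) -> bool:
    """Assumption: compare intake with the fourth progress visit."""
    if len(progress_mm) < 4:
        return False  # too few visits to judge
    return intake_mm - progress_mm[3] < 10.0


def progress_note(previous_mm: float, current_mm: float, band: str, plan: str) -> str:
    """Build a one-line note with score, band, change since last visit, and plan."""
    delta = current_mm - previous_mm
    if delta < 0:
        change = f"down {abs(delta):g} mm"
    elif delta > 0:
        change = f"up {delta:g} mm"
    else:
        change = "unchanged"
    return (
        f"CAPE-V Overall Severity {current_mm:g} mm ({band}), "
        f"{change} since previous visit. Plan: {plan}."
    )
```

The band label is passed in by the caller so the note stays consistent with whichever classification the clinic uses. A patient rated 50 mm at intake and 48, 47, 45, and 44 mm over four progress visits has dropped only 6 mm and would be flagged for re-evaluation.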

Frequently Asked Questions