Why Perceptual Rating Still Matters in Voice Evaluation
Voice quality is a perceptual phenomenon. The listener's ear is the final arbiter of whether a patient sounds rough, breathy, strained, too high, too low, too loud, or too soft, and the acoustic and aerodynamic measures that clinicians use alongside the perceptual rating (jitter, shimmer, CPP, MPT, s/z ratio, electroglottography) are ultimately validated against the perceptual judgment of expert clinicians. The American Speech-Language-Hearing Association Special Interest Group 3 (Voice and Voice Disorders) convened the consensus conference that produced the CAPE-V because the earlier GRBAS scale (Hirano 1981) had three limitations: documented test-retest and inter-rater reliability problems, an ordinal scale with only four discrete levels, and no standardised stimulus set. The CAPE-V was designed to address each of those problems.
Continuous scale. The 100 mm visual analog scale is continuous: the clinician can mark anywhere along the line, so the CAPE-V can register smaller changes than an ordinal scale. Zempleni et al. (2018) and Helou et al. (2010) showed the smallest clinically meaningful change on CAPE-V Overall Severity is approximately 10 mm, which corresponds to roughly 0.3 to 0.5 of a point on the older GRBAS 0 to 3 scale and is therefore below the one-point resolution of GRBAS.
Standardised stimulus set. The Kempster (2009) protocol specifies sustained vowels, six CAPE-V sentences, and 20 seconds of spontaneous speech. Every clinician rates the same stimulus set so the ratings are directly comparable across visits, across clinicians, and across clinics.
Explicit anchor bands. The MI (mildly deviant, 10 to 35 mm), MO (moderately deviant, 36 to 60 mm), and SE (severely deviant, 61 to 100 mm) labels on the printed form give every clinician the same mental reference frame for every position along the line.
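The band boundaries above can be written as a small lookup. The Python sketch below is illustrative only. The function name `cape_v_band` is hypothetical, the "WNL" label for marks below 10 mm is an assumption borrowed from the within-normal-limits usage elsewhere in this article, and placing fractional marks that fall between two bands (for example 35.5 mm) in the lower band is also an assumption.

```python
def cape_v_band(score_mm: float) -> str:
    """Map a 0-100 mm CAPE-V mark to the band labels described above."""
    if not 0 <= score_mm <= 100:
        raise ValueError("CAPE-V marks must lie between 0 and 100 mm")
    if score_mm < 10:
        return "WNL"  # assumption: below the MI band counts as within normal limits
    if score_mm < 36:
        return "MI"   # mildly deviant, 10 to 35 mm
    if score_mm < 61:
        return "MO"   # moderately deviant, 36 to 60 mm
    return "SE"       # severely deviant, 61 to 100 mm
```

For example, `cape_v_band(42)` falls in the moderately deviant band and returns `"MO"`.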
Interpreting the Six CAPE-V Parameters
Overall Severity. The integrated clinician judgment of how dysphonic the voice sounds as a whole. This single parameter anchors the overall verdict of the CAPE-V. It is not the arithmetic mean of the other five parameters; it is a separate integrated impression that the clinician forms across the sustained vowels, the six sentences, and the spontaneous speech. Many clinicians find that Overall Severity is slightly higher than the mean of the other five parameters because it captures "something is off" cues that do not map cleanly onto any single feature.
Roughness. Perceived irregularity in the voicing source — a raspy, grating, aperiodic quality most audible on sustained vowels. Roughness is acoustically correlated with elevated jitter, elevated shimmer, and reduced cepstral peak prominence (CPP). Common drivers include vocal-fold mass lesions (nodules, polyps, cysts, leukoplakia), surface scarring, and Reinke oedema.
Breathiness. Audible escape of air through an incompletely adducted glottis, heard as a noisy, whispered, or hypofunctional quality. Breathiness is acoustically correlated with elevated noise-to-harmonic ratio and elevated H1-H2 (a stronger first harmonic relative to the second). Common drivers include vocal-fold paresis or paralysis, age-related glottal insufficiency (presbyphonia), post-surgical glottal gap, and psychogenic or functional aphonia.
Strain. Perceived effort — a squeezed, hyperfunctional, pressed quality. Strain is the hallmark perceptual feature of primary muscle tension dysphonia and is the parameter most responsive to voice therapy with resonant voice, flow-phonation, and manual laryngeal release techniques. Strain is also elevated in adductor spasmodic dysphonia, but on a CAPE-V rating it has a distinctive "squeeze-and-release" pattern rather than the consistent elevation of muscle tension dysphonia.
Pitch. Perceived deviation of fundamental frequency from age-, sex-, and culture-expected norms. Mark direction (too high or too low) on the printed form. Pitch too high is common in puberphonia, falsetto-register fixation, and anxiety-related hyperfunction. Pitch too low is common in Reinke oedema and vocal-fold mass lesions, and after testosterone exposure, which is relevant for patients seeking voice feminisation in gender-affirming care.
Loudness. Perceived deviation of vocal intensity from culturally expected norms. Mark direction on the printed form. Loudness too high is common in hyperfunctional dysphonia, occupational voice over-use, and hearing-impaired speakers. Loudness too low is the hallmark perceptual feature of Parkinson disease hypophonia and responds robustly to Lee Silverman Voice Treatment (LSVT LOUD).
Pair CAPE-V with the Rest of the Voice Battery
The CAPE-V is one component of a complete voice evaluation. The Kempster (2009) paper and the ASHA Practice Portal on voice disorders both stress that the perceptual rating should be interpreted alongside the patient-reported voice handicap, the laryngoscopic exam, and a set of acoustic and aerodynamic measures.
Patient-reported voice handicap (VHI-10 or VHI-30). Rate the perceptual CAPE-V alongside the patient-reported Voice Handicap Index-10 (Rosen et al. 2004) at every voice clinic intake and at every progress visit. The two measures answer different questions: CAPE-V asks how the voice sounds to a trained listener, and VHI-10 asks how much the voice problem affects the patient. They can diverge in two directions. Occupational voice users (teachers, singers, broadcasters, attorneys, clergy, fitness instructors) often report high handicap on mildly deviant voices, while patients with long-standing dysphonia who have habituated to a severe voice may report low handicap.
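The two divergence patterns can be flagged with a simple rule. This is a minimal Python sketch under stated assumptions: the function name `cape_v_vhi_discordance` is hypothetical, a VHI-10 total of 11 or more is treated as elevated, and the 35 mm and 61 mm cut-offs are taken from the MI and SE bands described earlier. None of these cut-offs is a published discordance criterion.

```python
def cape_v_vhi_discordance(overall_severity_mm: float, vhi10_total: int) -> str:
    """Flag the two CAPE-V / VHI-10 divergence patterns (cut-offs are assumptions)."""
    if not 0 <= overall_severity_mm <= 100:
        raise ValueError("CAPE-V Overall Severity must lie between 0 and 100 mm")
    if not 0 <= vhi10_total <= 40:
        raise ValueError("VHI-10 totals range from 0 to 40")

    high_handicap = vhi10_total >= 11          # assumption: 11 or more is elevated
    mild_or_less = overall_severity_mm <= 35   # MI band or below
    severe = overall_severity_mm >= 61         # SE band

    if high_handicap and mild_or_less:
        return "high handicap, mildly deviant voice"
    if not high_handicap and severe:
        return "low handicap, severely deviant voice"
    return "concordant or indeterminate"
```

A flag from a rule like this is only a prompt to ask why the two measures disagree; it does not replace the clinical interview.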
Stroboscopic laryngeal examination. Every patient with a CAPE-V Overall Severity above the within-normal-limits band should have a stroboscopic laryngeal examination at intake. Moderate and severe CAPE-V scores are typically associated with visible structural or neurogenic findings — nodules, polyps, cysts, Reinke oedema, scarring, paresis, or paralysis — and the laryngoscopic exam is the single most important diagnostic step for ruling out lesions that require surgery before therapy.
Acoustic battery. Jitter, shimmer, cepstral peak prominence (CPP), noise-to-harmonic ratio (NHR), and the Acoustic Voice Quality Index (AVQI, Maryn et al. 2010) all correlate with specific CAPE-V parameters and provide an objective cross-check on the perceptual rating.
Aerodynamic battery. Maximum phonation time, s/z ratio, mean phonatory airflow, and subglottal pressure estimates complete the workup. A short MPT (< 15 seconds in adults) paired with an elevated CAPE-V breathiness score suggests glottal insufficiency; a short MPT paired with an elevated strain score suggests hyperfunction against a hypoadducted glottis.
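The pairing logic in the paragraph above can be sketched in Python. The function name `interpret_short_mpt` and the `elevated_mm` cut-off of 10 mm are assumptions made for illustration; the text does not define what counts as an "elevated" breathiness or strain score, so that parameter should be set by the clinician.

```python
from typing import List

def interpret_short_mpt(
    mpt_seconds: float,
    breathiness_mm: float,
    strain_mm: float,
    short_mpt_s: float = 15.0,   # adult cut-off quoted in the text
    elevated_mm: float = 10.0,   # assumption: 10 mm or more counts as elevated
) -> List[str]:
    """Return the interpretations suggested when a short MPT meets an elevated score."""
    findings: List[str] = []
    if mpt_seconds >= short_mpt_s:
        return findings  # MPT is not short, so neither pattern applies
    if breathiness_mm >= elevated_mm:
        findings.append("suggests glottal insufficiency")
    if strain_mm >= elevated_mm:
        findings.append("suggests hyperfunction against a hypoadducted glottis")
    return findings
```

Both patterns can be returned together, since a patient may present with elevated breathiness and elevated strain at the same visit.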
Tracking Voice Therapy Progress with Serial CAPE-V
The CAPE-V is the single best way to track voice therapy progress in a voice clinic. Rate the CAPE-V at intake and at every progress visit, typically every 2 to 4 weeks, and plot the Overall Severity score across visits. The clinically meaningful change (Zempleni et al. 2018) is approximately 10 mm on the Overall Severity VAS, so a drop of 10 mm or more between consecutive visits can be treated as a real change, while smaller drops should be interpreted as measurement noise.
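The visit-to-visit comparison can be sketched as follows. This Python example is illustrative: the name `classify_visit_changes` is hypothetical, and treating a rise of 10 mm or more as meaningful worsening is an assumption, since the text only discusses drops.

```python
from typing import List, Tuple

MCID_MM = 10.0  # approximate clinically meaningful change quoted in the text

def classify_visit_changes(
    overall_severity_mm: List[float], mcid_mm: float = MCID_MM
) -> List[Tuple[float, str]]:
    """Label the change between each pair of consecutive visits."""
    results = []
    for previous, current in zip(overall_severity_mm, overall_severity_mm[1:]):
        change = current - previous  # negative values mean the voice improved
        if change <= -mcid_mm:
            label = "meaningful improvement"
        elif change >= mcid_mm:
            label = "meaningful worsening"  # assumption: threshold applied symmetrically
        else:
            label = "within noise"
        results.append((change, label))
    return results
```

With scores of 62, 55, 41, and 43 mm across four visits, the changes are -7, -14, and +2 mm, so only the second interval is labelled a meaningful improvement.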
Discharge criteria. Most U.S. voice clinics use a discharge threshold of CAPE-V Overall Severity below 10 mm (within normal limits) alongside a VHI-10 below 11 and a patient-reported return to occupational voice use. Discharge criteria should be individualised for occupational voice users (teachers, broadcasters, singers), who require a CAPE-V Overall Severity below 9 mm for the voice to meet the demands of their job.
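The three criteria combine into a single check. This is a minimal Python sketch using the thresholds quoted above; the function name `meets_discharge_criteria` and its parameter names are hypothetical, and a real discharge decision rests on clinical judgment beyond these numbers.

```python
def meets_discharge_criteria(
    overall_severity_mm: float,
    vhi10_total: int,
    returned_to_occupational_use: bool,
    occupational_voice_user: bool = False,
) -> bool:
    """Apply the CAPE-V, VHI-10, and return-to-use criteria described above."""
    # Occupational voice users get the stricter 9 mm limit from the text.
    cape_v_limit_mm = 9.0 if occupational_voice_user else 10.0
    return (
        overall_severity_mm < cape_v_limit_mm
        and vhi10_total < 11
        and returned_to_occupational_use
    )
```

A score of 9.5 mm therefore passes for a general patient but not for a teacher or singer, provided the other two criteria are met.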
Non-responders. A patient whose CAPE-V Overall Severity fails to drop by at least 10 mm across four progress visits is a non-responder. Re-evaluate the diagnosis, re-image the larynx, refer to laryngology for a second opinion, and consider whether the primary driver is structural (requires surgery), neurogenic (requires a different therapy approach such as LSVT LOUD for Parkinson disease hypophonia), or psychogenic (requires a referral to the voice clinic psychologist).
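One way to operationalise the non-responder rule is sketched below in Python. It assumes the comparison is between the intake score and the score at the fourth progress visit, which is one reading of "across four progress visits"; the function name `is_non_responder` is hypothetical.

```python
from typing import Sequence

def is_non_responder(
    overall_severity_mm: Sequence[float],
    visits_required: int = 4,
    min_drop_mm: float = 10.0,
) -> bool:
    """overall_severity_mm[0] is the intake score; later entries are progress visits in order."""
    if len(overall_severity_mm) < visits_required + 1:
        return False  # too few progress visits to apply the rule yet
    intake = overall_severity_mm[0]
    at_checkpoint = overall_severity_mm[visits_required]
    # Assumption: compare intake against the fourth progress visit only.
    return (intake - at_checkpoint) < min_drop_mm
```

A patient scored 60, 58, 57, 59, and 56 mm has dropped only 4 mm by the fourth progress visit and would be flagged for re-evaluation.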
Documentation. Paste the CAPE-V mm scores and the Kempster anchor band into every voice-therapy progress note. The chart note should reference the CAPE-V Overall Severity score explicitly, the direction and magnitude of change since the previous visit, and the continued or revised voice therapy plan.
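A progress-note line containing those elements could be generated as follows. This Python sketch is illustrative only: the function name `format_progress_note` and the wording of the note are hypothetical, and the "WNL" label for scores below 10 mm, like the placement of fractional marks in the lower band, is an assumption.

```python
from bisect import bisect_right

def format_progress_note(current_mm: float, previous_mm: float, plan: str) -> str:
    """Build one chart-note line with the score, band, change, and plan."""
    # Band lower bounds: 10 mm starts MI, 36 mm starts MO, 61 mm starts SE.
    band = ("WNL", "MI", "MO", "SE")[bisect_right([10, 36, 61], current_mm)]
    change = current_mm - previous_mm
    if change < 0:
        direction = f"decreased by {abs(change):g} mm"
    elif change > 0:
        direction = f"increased by {change:g} mm"
    else:
        direction = "unchanged"
    return (
        f"CAPE-V Overall Severity: {current_mm:g} mm ({band}); "
        f"{direction} since previous visit. Plan: {plan}"
    )
```

Calling `format_progress_note(28, 41, "continue resonant voice therapy")` yields a line reporting 28 mm in the MI band and a 13 mm decrease since the previous visit.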