Pillar guide · Speech-Language Pathology · 17 min read

AI in Speech Therapy: What Actually Works in 2026 (and What Still Doesn’t)

The honest answer to "does AI work for speech therapy?" in 2026 is: yes, for exactly three tasks, and no, for almost everything else. This pillar walks through the three AI-adjacent stages of a typical SLP workflow — transcription, scoring, and drafting — and tells you which sub-tasks a commodity LLM handles well, which ones it fails at in ways that matter clinically, and which ones should still be firmly in the clinician’s hands until the tooling catches up.

1. The honest framing: three tasks, not a revolution

The easiest way to lose a working SLP’s attention in 2026 is to open with a sentence about how AI is going to transform speech therapy. It is not, at least not on the time horizon that matters to a clinician carrying a 60-student caseload this semester. What AI is going to do — and what it is already doing on the caseloads where it has been carefully adopted — is reallocate clinician time away from three specific high-volume, low-judgement tasks that consumed six to ten hours a week under the old workflow. Those three tasks are transcription, mechanical metric scoring, and first-draft paperwork. That is the entire honest list. Everything else in this article is detail on where those three tasks begin, where they end, and what happens when someone extends the list past them without thinking carefully.

This framing matters because the marketing framing around AI-for-SLP has been consistently sloppier than the clinical reality warrants. A tool that turns a one-hour audio file into a cleaned, speaker-diarised, punctuation-correct transcript in three minutes is genuinely useful; a tool that claims to generate a complete IEP goal from a parent questionnaire is not. A tool that reads a 50-utterance transcript and returns MLU, NDW, and PGU values that match a clinician’s hand-count to within 2% is a real time saver; a tool that claims to provide developmental diagnosis from a two-minute audio clip is an FDA-regulated medical device masquerading as a feature. The difference between the first and second items in each pair is the difference between an AI tool that belongs on an SLP’s desktop and an AI tool that should not be allowed near a caseload. The goal of this pillar is to help a clinician draw that line clearly on every vendor pitch they will see this year.

The concrete recommendation that comes out of this framing is narrow and unglamorous: adopt AI for transcription, adopt it for metric scoring, adopt it for first-draft paperwork, and do not adopt it for anything else until the regulatory and validation story for that task catches up. Everything in the rest of this article elaborates on the three tasks that are on the list and explains the three categories of tasks that are still firmly off the list: clinical judgement calls, diagnostic conclusions, and any task where the error mode of the model affects which intervention a child receives.

The 2026 list: what AI does well for SLP

(1) Speech-to-text transcription with speaker diarisation, (2) mechanical LSA metric scoring from a cleaned transcript, (3) first-draft paperwork from structured clinician-supplied data. Everything else — differential diagnosis, intervention planning, eligibility decisions, progress-monitoring judgement — stays in the clinician’s hands until the tooling is validated against clinical outcomes. This is the honest 2026 line.

2. Task 1: Transcription — where AI actually saves hours

The single most valuable thing an LLM-adjacent AI does for a speech-language pathologist in 2026 is turn an hour of audio into a cleaned, speaker-diarised transcript in under five minutes. This is the task where the underlying technology — large-scale ASR models trained on hundreds of thousands of hours of conversational speech — has matured to the point where the transcript out of a commodity model is not only accurate enough for clinical use but also cheaper and faster than any human-in-the-loop alternative. The working clinician’s experience is straightforward: upload a 30-minute therapy session recording, wait two to three minutes, and receive a transcript with speaker labels ("SLP", "Child"), utterance segmentation, punctuation, and filled-pause markers. The same process by hand takes an SLP between 90 minutes and four hours depending on audio quality and child age.

The accuracy profile of current transcription is well-documented enough to plan around. For adult-to-adult conversational speech in clean audio, word error rates are below 5% on current frontier ASR. For adult-to-child speech in clinical settings, word error rates for the adult are below 8% and word error rates for the child are between 15% and 30% depending on age, speech intelligibility, and background noise. The child word error rate is the number that matters clinically, because the child is the target of the language sample; a transcript with a 25% child word error rate will need clinician review on every utterance the child produced, which is still faster than hand-transcription but is not the zero-clinician-time experience the marketing sometimes implies. A clinician adopting AI transcription should plan on 10 to 20 minutes of reviewing and correcting the child’s utterances after receiving the AI output, not zero.

The practical workflow is: record the session on a high-quality handheld recorder or a dedicated clinical audio device (not a laptop microphone), upload the file to the AI transcription service, wait for the output, review the child’s utterances line by line and correct any misheard words, run the cleaned transcript through the LSA calculators on this site (MLU, NDW, PGU, DSS, and whatever other metrics the clinical question demands), and save the scored output to the child’s file. The entire pipeline from recording to scored metrics takes under 30 minutes for a clinician who has done it before, compared to four to six hours under the old hand-transcription workflow. That roughly tenfold time saving is the core value proposition of AI for SLP in 2026, and it is the one that most reliably converts into caseload capacity.

  • Record on a dedicated audio device, not a laptop microphone — input quality drives output quality.
  • Upload to an AI transcription service with speaker diarisation and utterance segmentation.
  • Plan on 10–20 minutes of reviewing and correcting the child’s utterances; do not skip this step.
  • Run the cleaned transcript through the LSA calculators on this site for MLU, NDW, PGU, and other metrics.
  • Pipeline total: under 30 minutes per session versus 4–6 hours under hand-transcription.

3. Task 2: LSA metric scoring — the math is already solved

The second place AI saves serious clinician time in 2026 is the mechanical scoring step that turns a cleaned transcript into metric values. This is the step that the free calculators on this site already automate without any AI at all — they are deterministic algorithms over the cleaned transcript that return MLU, NDW, TTR, MATTR, vocd-D, DSS, IPSyn, PGU, and a dozen other metrics in under a second. The interesting question is not whether AI is needed for this step (it is not), but how an AI-assisted scoring pipeline can wrap around these deterministic calculators to handle the small amount of judgement scoring that is still manual.
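As a toy illustration of why no AI is needed for this step, a deterministic MLU-in-words and NDW computation is a few lines of code. This is a hypothetical sketch, not the site's actual implementation, and it skips the morpheme-segmentation rules a real MLU-m calculator applies:

```python
def mlu_words(utterances):
    """Mean length of utterance in words: total words / total utterances."""
    if not utterances:
        return 0.0
    return sum(len(u.split()) for u in utterances) / len(utterances)

def ndw(utterances):
    """Number of different words (case-insensitive) across the whole sample."""
    return len({w.lower() for u in utterances for w in u.split()})

sample = ["the dog runs", "he is big", "dog runs fast"]
print(mlu_words(sample))  # 3.0 — nine words over three utterances
print(ndw(sample))        # 7 — the, dog, runs, he, is, big, fast
```

Run the same transcript through twice and the same numbers come out — which is exactly the reproducibility property a generative model cannot offer.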

The specific judgement calls that still require a clinician in a 2026 scoring workflow are narrow: distinguishing a mazing repetition from a substantive repetition in PGU scoring, deciding whether a novel morpheme is a production error or a dialectal feature, segmenting a long turn into C-units when the prosody is ambiguous, and determining whether an ungrammatical utterance is a production error or a comprehension indicator. Each of these calls is a place where a commodity LLM can draft a plausible answer but where clinician review is still mandatory, because the error mode of the model is unbounded and the downstream consequence is an IEP goal that could be wrongly calibrated. The workflow is not "AI scores and clinician rubber-stamps"; it is "deterministic calculator scores, AI drafts the judgement calls that remain, clinician reviews."

A concrete example: the PGU Calculator on this site takes a marked-up transcript where the clinician has tagged each utterance as grammatical or ungrammatical and returns the percentage of grammatical utterances. The AI-assisted version of the workflow uses a commodity LLM to draft the grammaticality tags on a first pass, then the clinician reviews the tagged transcript and corrects the 10% to 20% of tags the model got wrong, then the deterministic PGU Calculator returns the percentage. This cuts the hand-tagging step from 20 minutes to five, while keeping the clinician in control of the final tag every time. The same pattern works for DSS category tagging, IPSyn item tagging, and morpheme-in-obligatory-context tagging for Brown’s stage analysis. The deterministic calculator is always the final arbiter of the number; the AI shortens the tagging step that feeds the calculator.

  • Deterministic calculators — not LLMs — are the right tool for the final metric computation.
  • LLMs can draft the judgement-call tags (grammaticality, C-unit segmentation) that feed the calculators.
  • The clinician reviews and corrects the AI-drafted tags before the calculator runs.
  • Typical time saving: 20 minutes of hand-tagging reduced to 5 minutes of review-and-correct.
  • The deterministic calculator is always the final arbiter of the number that goes into the IEP.
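The tag-then-compute split above can be sketched in a few lines — a hypothetical illustration, not the PGU Calculator's actual code. The tags are whatever survives clinician review; the percentage itself is pure arithmetic:

```python
def pgu(tagged_utterances):
    """Percent grammatical utterances from (utterance, tag) pairs, tag in {"G", "U"}."""
    graded = [tag for _, tag in tagged_utterances if tag in ("G", "U")]
    if not graded:
        return 0.0
    return 100.0 * sum(tag == "G" for tag in graded) / len(graded)

# AI drafts the tags; the clinician corrects them; the deterministic function does the math.
reviewed = [
    ("the dog runs",  "G"),
    ("him go store",  "U"),
    ("she is happy",  "G"),
    ("want cookie",   "U"),
]
print(pgu(reviewed))  # 50.0 — two grammatical utterances out of four graded
```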

Deterministic vs generative scoring

For any number that is going to end up inside an IEP goal or a progress report, the final computation must be deterministic — a closed-form algorithm over the tagged transcript, not a generative model output. LLMs help with the tagging step; the calculator does the math. This is the 2026 rule that keeps AI-assisted scoring defensible in a due-process review.

4. Task 3: First-draft paperwork — reports, present-levels, and IEP goals

The third place AI saves real clinician time in 2026 is in drafting the paperwork that turns scored data into readable clinical prose: the evaluation report, the present-levels-of-performance paragraph, the SMART goal sentences for an IEP, the quarterly progress note to the parent, and the monthly data summary for the supervising teacher. Each of these documents follows a predictable template, pulls from a small number of structured data points, and has a cost-to-read-first-draft that is much lower than the cost-to-write-from-scratch. Those are exactly the conditions under which AI drafting saves the most time while preserving clinician judgement.

The working pattern is structured-in, prose-out. The clinician gives the model a small JSON or spreadsheet row: child name, age in years and months, date of sample, MLU-m, MLU-w, NDW, PGU, DSS composite, SUGAR age band, clinical interpretation. The model produces a paragraph in the clinician’s preferred format: "On 3/14/2026, a 50-utterance conversational language sample was elicited from Marcus (age 5;3) using the SUGAR protocol. MLU in morphemes was 3.4 (SUGAR age-5 mean 5.1, SD 0.9), placing him at −1.9 SD." The clinician reads the paragraph, edits the two or three sentences that need adjustment, and the paragraph is ready. Typical time saving: 20 minutes of writing reduced to five minutes of reviewing and editing, multiplied across the 15 to 25 paragraphs in a full evaluation report.

The IEP Goal Generator on this site is the canonical example of this pattern in the SLP space. It takes the metric values from the calculators, the clinician’s chosen goal area, and a small set of district-specific parameters, and drafts a SMART goal sentence in the format most US districts expect. The clinician reviews the draft, edits the mastery criterion or the progress-probe cadence, and accepts. The draft-and-edit pattern works because the goal sentences are structurally predictable (baseline, target, measurement method, mastery criterion) and because the downstream reader — the IEP team — expects a specific format that the model can learn. The pattern fails when the task is unstructured and the output cannot be template-validated, which is the boundary we will cross in the next section.

  • Structured-in, prose-out is the pattern that works: give the model data, get the first-draft paragraph.
  • Works for present-levels paragraphs, evaluation report sections, and SMART goal sentences.
  • Clinician edit time is typically 20–30% of the original write-from-scratch time.
  • The IEP Goal Generator is the canonical SLP example — structured baselines in, SMART goals out.
  • The pattern fails when the task is unstructured or when the output has no template to validate against.

5. Where AI still fails: diagnosis, eligibility, and clinical judgement

The three tasks above — transcription, metric scoring, and first-draft paperwork — are the entire honest list of places where AI is saving meaningful clinician time on a typical SLP caseload in 2026. Every other task in the workflow either still belongs to the clinician or still requires infrastructure that the current tooling does not provide. This section walks through the specific places where AI is marketed but where adoption is premature, and explains the error mode in each case so a clinician can recognise it when a vendor pitches around it.

The first place AI fails for SLP is differential diagnosis. Vendors will sometimes claim that an LLM can read a case history and produce a diagnostic impression ("expressive language disorder", "childhood apraxia of speech", "developmental language disorder with pragmatic features"). The clinical problem with this claim is that differential diagnosis in speech-language pathology is a regulated activity that requires a credentialed clinician, depends on data from standardised assessments the model has not run, and produces an outcome that drives eligibility decisions for special education services. The error mode of a commodity LLM on differential diagnosis is quiet confidence in a wrong answer that a credentialed clinician would not have reached, which is the worst possible error mode for a downstream eligibility decision. Diagnostic output from an LLM belongs in the "do not adopt" column for 2026, full stop.

The second place AI fails is eligibility determination. A child qualifies for school-based SLP services under IDEA when a multidisciplinary team determines that a communication impairment adversely affects educational performance. The word "team" is load-bearing: eligibility is a multi-person judgement that depends on data a model does not have, includes family context a model cannot evaluate, and has statutory due-process implications that a model cannot take responsibility for. An LLM that writes an eligibility recommendation is performing the wrong activity; the right role for the model is to summarise the data the team will use to make the decision, not to pre-empt the decision itself. The distinction matters because an eligibility recommendation from an "AI tool" carries a false sense of authority that can short-circuit the team process that IDEA actually requires.

The third place AI fails is intervention planning. Picking the right therapy approach for a specific child depends on dozens of clinical variables that do not compress into a prompt: the child’s response to prior intervention attempts, the family’s engagement capacity, the school’s scheduling constraints, the team’s clinical preferences, and the clinician’s own intuition about what is and is not worth trying this quarter. A model that recommends an intervention approach from a short prompt is not doing intervention planning; it is doing pattern-matching on training data that does not include the variables that actually matter. The honest framing is: the model can list the approaches that are commonly used for a presentation, which is useful as a reference; it cannot pick the right approach, which is still the clinician’s call.

  • Differential diagnosis: LLMs produce confident wrong answers; this is the worst error mode.
  • Eligibility determination: a multidisciplinary team activity with statutory due-process implications.
  • Intervention planning: depends on variables that do not compress into a prompt.
  • Progress-monitoring judgement: whether a flat trajectory means "revise the goal" or "keep going" is clinical.
  • Family communication and counselling: no substitute for the clinician’s face-to-face conversation.

6. The unglamorous constraints: cost, privacy, and HIPAA

Even within the three tasks where AI works well, two constraints shape which tools are actually adoptable on a real SLP caseload: cost and privacy. The cost side is the easier one. AI transcription at 2026 prices runs roughly $0.10 to $0.30 per minute of audio depending on the service and quality tier, which means a typical 30-minute therapy session costs $3 to $9 to transcribe. For a clinician seeing 20 students a week at one session each, that is $60 to $180 a week — $2,400 to $7,200 a year — which is well within a district technology budget but not within a zero-cost one. Districts that have not budgeted for this cost will push back, and the clinician is left with either a personal-budget workflow or a hand-transcription workflow. The honest framing is that AI transcription is not free and the budget conversation needs to happen before the workflow is built around it.

Privacy is the harder constraint. Therapy session audio is Protected Health Information under HIPAA and is also covered by FERPA in school settings. A transcription service that routes the audio through a general-purpose cloud endpoint is almost certainly not HIPAA-compliant by default, and most are not covered by a Business Associate Agreement with the school district. A clinician who uploads session audio to a consumer transcription service is creating a compliance exposure for the district even if the resulting transcript is clinically perfect. The practical rule is: use only services that offer a signed BAA for healthcare clients, use only services that explicitly advertise HIPAA-compliant infrastructure, and run the adoption decision past the district IT or compliance officer before the first session is uploaded. The same rule applies to any LLM used to draft IEP paperwork — if the prompt contains identifiable student data, the model endpoint needs to be HIPAA-compliant.

The cost-plus-privacy combination is the reason the most common AI-for-SLP adoption pattern in 2026 is a hybrid one: identifiable student data stays inside a HIPAA-compliant environment (which usually means a narrower set of tools than the general AI market) and de-identified metric data moves through the free calculators on this site for the deterministic scoring step. The clinician does not paste "Marcus, age 5;3" into a consumer LLM; the clinician pastes a de-identified transcript chunk into the MLU Calculator, reads the number, and then composes the present-levels paragraph in the district’s own HIPAA-compliant paperwork system. The split between identifiable and de-identified data is the most common 2026 compliance pattern, and it is the one that keeps the workflow defensible to a compliance review.

  • AI transcription: $0.10–$0.30 per minute; $2,400–$7,200 per clinician per year.
  • Privacy: session audio is PHI under HIPAA and FERPA in schools; BAA required.
  • Never upload identifiable audio to a consumer transcription service without a signed BAA.
  • Typical 2026 pattern: identifiable data stays in HIPAA-compliant systems; de-identified metric data uses free calculators.
  • Run the adoption decision past district IT or compliance before the first session is uploaded.

7. Failure modes a clinician should learn to spot

Every AI tool a school SLP considers in 2026 has a handful of specific failure modes that a clinician can learn to spot during the trial period. Knowing these failure modes before the tool is on the caseload prevents the most common adoption mistakes and turns the evaluation process from a vague "does this work?" into a checklist of specific probes the tool either passes or fails. The five failure modes below come up on every vendor trial and are worth running explicitly during any AI-for-SLP evaluation.

Failure mode one is the child-speech degradation. A transcription tool that claims 95% word-level accuracy is almost always reporting the adult side of the conversation; child speech in clinical audio is harder by 10 to 20 percentage points in every commercial benchmark. The probe is to upload a known child-only audio clip (with an existing hand-transcription) and compute the word error rate against the hand-transcription, not against the adult side. Any tool whose child WER is above 35% in clean audio is not ready for clinical adoption no matter how impressive the adult numbers look.
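The WER probe is easy to run yourself. A minimal word-level edit-distance implementation (a generic sketch, not any vendor's metric) compares the hand transcript against the ASR output:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0
    prev = list(range(len(hyp) + 1))          # distance row for an empty reference prefix
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,              # deletion
                         cur[j - 1] + 1,           # insertion
                         prev[j - 1] + (r != h))   # substitution (free if the words match)
        prev = cur
    return prev[-1] / len(ref)

hand = "the dog wants a cookie"   # clinician's hand-transcription of a child utterance
asr  = "the dog once a cookie"    # hypothetical ASR output
print(wer(hand, asr))  # 0.2 — one substitution out of five reference words
```

Run it over the whole child-only clip, not a single utterance, before trusting any vendor's advertised number.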

Failure mode two is the hallucinated utterance. Commodity LLMs used for drafting paperwork sometimes produce sentences that contain made-up numbers — an MLU value the clinician did not supply, a normative band the tool invented, a citation to a paper that does not exist. The probe is to run the tool on a structured-in prompt where every numeric value is known and then search the output for any numeric value that was not in the input. Any tool that produces a numeric value the clinician did not supply is failing and needs either a stricter prompt template or a different tool.
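The numeric-value probe can be automated with a crude scan — a hypothetical sketch that extracts every numeric token from the draft and flags any that never appeared in the structured input:

```python
import re

def unsupported_numbers(prompt, draft):
    """Return numeric tokens in the draft that never appeared in the prompt."""
    numbers = lambda text: set(re.findall(r"\d+(?:\.\d+)?", text))
    return numbers(draft) - numbers(prompt)

prompt = "MLU-m 3.4, NDW 112, PGU 58"
draft  = "MLU in morphemes was 3.4 (NDW 112; PGU 58%), consistent with the age-6 mean of 5.6."
print(unsupported_numbers(prompt, draft))  # {'6', '5.6'} — values the clinician never supplied
```

Any non-empty result means the draft needs a rewrite or a stricter prompt template before it goes anywhere near the child's file.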

Failure mode three is the diagnostic leak. A report-drafting tool sometimes produces a "clinical interpretation" sentence that crosses from data summary into diagnostic conclusion: "The pattern is consistent with a developmental language disorder." A clinician who supplied only raw metric values and normative comparisons did not authorise that sentence; the tool produced it from its training data. The probe is to give the tool a prompt that contains only metric values and normative references and then check whether the output contains any diagnostic conclusion the clinician did not explicitly author. Diagnostic leak is the most clinically dangerous of the five failure modes because the conclusion can end up in the child’s file.
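A keyword screen can flag the most obvious diagnostic leaks for review — emphatically a crude first filter, not a guarantee, since only the clinician's read catches a cleverly worded conclusion. The phrase list below is illustrative, not exhaustive:

```python
# Illustrative phrase list — extend with whatever conclusions your drafts tend to leak.
DIAGNOSTIC_PHRASES = [
    "consistent with", "diagnosis of", "meets criteria", "language disorder", "apraxia",
]

def diagnostic_flags(draft):
    """Return the diagnostic-sounding phrases found in a drafted paragraph."""
    lowered = draft.lower()
    return [phrase for phrase in DIAGNOSTIC_PHRASES if phrase in lowered]

draft = "The pattern is consistent with a developmental language disorder."
print(diagnostic_flags(draft))  # ['consistent with', 'language disorder'] — send back for review
```

An empty result means only that the crude screen found nothing; a non-empty result means the draft contains a conclusion the clinician must either author deliberately or delete.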

Failure modes four and five are the confidence hallucination and the template breakage. Confidence hallucination is when the tool produces an output that sounds authoritative but is based on thin evidence — the prose is polished enough that a clinician skimming the draft can miss that the underlying data does not support the claim. Template breakage is when the tool produces an output in the wrong format for the district’s IEP system — goal sentences that are missing the mastery criterion, present-levels paragraphs that skip the normative comparison. Both are fixable with a tighter prompt template, but both are reasons a clinician should review every draft carefully and not accept an output that "looks right" without checking the specific components the district expects.

  • Child-speech degradation: always test WER on child-only clips, not adult benchmarks.
  • Hallucinated utterance: scan every draft for numeric values the clinician did not supply.
  • Diagnostic leak: any diagnostic conclusion the clinician did not explicitly author is a failure.
  • Confidence hallucination: polished prose that is not supported by the underlying data.
  • Template breakage: missing mastery criteria, missing normative comparisons, wrong district format.

8. A realistic 2026 AI-assisted SLP workflow

Taking everything above together, here is what a realistic 2026 AI-assisted workflow looks like for a school SLP carrying a 60-student caseload across an elementary district. The workflow is calibrated for clinical realism, not for technology maximalism, and every step is named so a clinician can adopt it piece by piece without committing to a full tool suite on day one.

Step one is the session recording. Use a dedicated audio device (handheld recorder or a tablet with a good microphone), place it within two feet of the child, record in a quiet room, and save the file with a de-identified file name (date and session number, not child name). Step two is the HIPAA-compliant upload. Use only a transcription service with a signed BAA; if the district does not have one, stop here and use hand-transcription until IT approves a compliant vendor. Step three is the AI transcription itself, which takes three to five minutes for a 30-minute clip. Step four is the clinician review of the child’s utterances, which takes 10 to 20 minutes depending on intelligibility. Step five is the cleaned transcript going into the free LSA calculators on this site: MLU Calculator, Lexical Diversity Calculator, DSS or IPSyn Calculator as the clinical question requires, PGU Calculator for grammaticality. Each calculator returns a number in under a second from a de-identified transcript paste.

Step six is the cross-reference against the SUGAR Norms Lookup and Brown’s Stages Lookup to produce the clinical interpretation row. Step seven is the present-levels paragraph, which the clinician drafts using an AI tool inside the district’s HIPAA-compliant environment if one is available, or inside a structured template if not. Step eight is the IEP goal drafting using the IEP Goal Generator on this site (de-identified input) plus the clinician’s editing pass. Step nine is the progress-monitoring plan: the same calculators on a fresh sample at the six-month and year-end probes. Step ten is the parent communication, which is always the clinician, never the AI. The total time from recording to finished IEP paperwork is under 90 minutes per child, compared to the four to six hours the hand-transcription workflow required.

The structural reason this workflow works is that the AI handles the high-volume, low-judgement steps (transcription, calculator input formatting, first-draft paragraph generation) while the clinician handles every judgement call (child-utterance review, calculator interpretation, goal calibration, family communication). The AI is never the final arbiter of any number that goes into the IEP; the deterministic calculator is. The AI is never the author of any clinical conclusion; the clinician is. When the workflow follows this pattern, the time saving is real and the compliance posture is defensible. When the workflow blurs the line — AI drafting conclusions, AI writing directly to the IEP system, AI talking to families — the time saving turns into a compliance and clinical risk that the district should not accept.

  • Record on a dedicated audio device in a quiet room; de-identify file names.
  • Upload only to HIPAA-compliant services with a signed BAA.
  • Clinician reviews every child utterance after transcription — do not skip this step.
  • Run cleaned transcripts through deterministic calculators, not LLMs, for metric values.
  • AI drafts prose; clinician edits and owns every clinical conclusion.
  • Parent communication is always the clinician, never the AI.

9. Where ConductSpeech fits on the honest list

ConductSpeech is a tool built specifically for the three tasks on the 2026 honest list: it transcribes session audio with speaker diarisation inside a HIPAA-compliant environment, it feeds the cleaned transcript to the deterministic calculators that produce the metric values, and it drafts the present-levels paragraph, the SMART goal sentences, and the parent-facing progress note from the scored output. It does not produce diagnostic conclusions, it does not make eligibility recommendations, it does not replace the clinician’s review of the child’s utterances, and it does not talk to families. The positioning is deliberate and matches the honest framing of this pillar: ConductSpeech reallocates clinician time from the three high-volume low-judgement tasks to the clinical judgement tasks that are the reason the clinician exists in the first place.

The reason ConductSpeech can make this claim honestly is that the underlying LSA calculators — the ones on this site that run in a browser without any AI at all — are deterministic. The transcript goes in, the numbers come out, the same numbers come out every time. A reviewer can reproduce any number in the system by pasting the same transcript into the free calculator on this site. The AI sits on top of the deterministic calculators and handles the tagging, drafting, and formatting steps around them; it does not replace them. This architectural choice is the one that makes the system defensible in a due-process review, compliant with HIPAA and FERPA, and honest in its marketing claims. It is also the reason the free calculators on this site are not going anywhere — they are the load-bearing trust layer under everything ConductSpeech does.

For a clinician evaluating ConductSpeech against a competing tool, the five diagnostic questions are the ones this pillar has already laid out: (1) Does the tool have a signed BAA for HIPAA compliance? (2) Does the tool report child-speech word error rates separately from adult rates? (3) Does the tool use deterministic calculators for the final metric values, or does it use a generative model? (4) Does the tool produce diagnostic conclusions that the clinician did not author? (5) Does the tool let the clinician review and edit every draft before it enters the IEP system? ConductSpeech answers yes to (1), (2), and (5), uses deterministic calculators for (3), and does not produce diagnostic conclusions for (4). The honest framing — and the honest marketing — is that the tool saves meaningful clinician time on the three tasks where AI works well, and does not pretend to do the tasks where AI does not.

Free tools and reference pages

Every link below stays on conductscience.com. The free calculators are deterministic — paste the same transcript, get the same number every time. AI sits on top; the calculator is the final arbiter.

Free tools

MLU Calculator

Deterministic MLU-m and MLU-w computation from a cleaned transcript — the load-bearing calculator under any AI-assisted LSA workflow.


Lexical Diversity Calculator

NDW, TTR, MATTR, and vocd-D from a transcript paste — deterministic output that AI drafting tools can build present-levels paragraphs around.


PGU Calculator

Percent Grammatical Utterances from a tagged transcript — the school-age grammaticality metric that keeps AI-drafted goals defensible.


DSS Calculator

Developmental Sentence Score from a tagged transcript — deterministic syntactic complexity scoring for school-age IEP goals.


IPSyn Calculator

Scarborough (1990) 60-item productive syntax inventory — the second syntactic complexity option for school-age samples.


Brown's Stages Lookup

Maps an MLU value onto Brown’s five stages — the cross-reference AI-drafted morphology goals depend on.


SUGAR Norms Lookup

Pavelko & Owens age-banded mean and SD for MLU, NDW, and TPU — the cross-reference that turns raw metrics into clinical interpretations.


Language Sample Worksheet

Printable elicitation prompts and tally sheet — the upstream input to any AI-assisted scoring pipeline.


IEP Goal Generator

Drafts SMART goal sentences from deterministic metric values — the canonical example of the structured-in, prose-out AI drafting pattern.


Narrative Scoring Scheme Calculator

Heilmann (2010) seven-component narrative scoring — deterministic narrative metric for school-age IEP goals.


Conversation Turn Analyzer

Quantifies speaker balance and turn length from a turn-marked transcript — deterministic pragmatic metric.


Reading Grade Level Analyzer

Readability scores from a text paste — deterministic complement to AI drafting tools for literacy goals.


Caseload Workload Calculator

Quantifies the hours returned by AI-assisted workflow adoption — the cost-benefit input for the district adoption decision.

Open

Therapy Frequency Recommender

Recommends a weekly therapy dose from presenting concerns and severity — clinician-owned judgement, not an AI diagnostic conclusion.

Open

Frequently asked questions

Does AI actually work for speech therapy in 2026?
Yes, for exactly three tasks: transcription (turning session audio into a cleaned, speaker-diarised transcript), mechanical LSA metric scoring (feeding the cleaned transcript into deterministic calculators like MLU, NDW, and PGU), and first-draft paperwork (present-levels paragraphs, SMART goal sentences, and progress notes from structured metric values). Everything else — differential diagnosis, eligibility determination, intervention planning, clinical judgement calls — is still firmly in the clinician’s hands.
Can AI replace a speech-language pathologist?
No. AI reallocates clinician time from high-volume low-judgement tasks (transcription, scoring, first-draft paperwork) to the clinical judgement tasks that are the reason the clinician exists — differential diagnosis, eligibility decisions, intervention planning, family communication. An AI tool that claims to replace the clinician is either performing a regulated activity without authorisation or producing outputs that carry false authority the clinician would not endorse.
How accurate is AI transcription for child speech?
Word error rates for adult speech in clean audio are below 5% on current frontier ASR; word error rates for child speech in clinical audio are between 15% and 30% depending on age, intelligibility, and background noise. Plan on 10–20 minutes of clinician review and correction of the child’s utterances after receiving the AI output. The child WER is the number that matters clinically, not the advertised adult WER.
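WER is worth computing yourself during a vendor trial rather than trusting the advertised figure. It is the word-level edit distance between a reference transcript and the ASR output, divided by the reference length; a minimal sketch:

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions)
    divided by reference length, via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution/match
    return dp[len(ref)][len(hyp)] / len(ref)

wer("the dog ran home", "the dog run")   # 1 sub + 1 del over 4 words = 0.5
```

Run it on a hand-corrected reference for a known child-only clip; that number, not the adult benchmark, is the one to put in the trial report.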
Is AI transcription HIPAA-compliant for school settings?
Only if the service has a signed Business Associate Agreement with the district and runs on HIPAA-compliant infrastructure. Session audio is Protected Health Information under HIPAA and is also covered by FERPA in schools. Never upload identifiable session audio to a consumer transcription service without a BAA — it creates a compliance exposure for the district even if the resulting transcript is clinically perfect.
What does AI do well and what does it do poorly for SLP?
AI does well at transcription (commodity ASR is accurate and cheap), mechanical metric tagging (LLMs draft C-unit segmentation and grammaticality tags that clinicians review), and first-draft paperwork (structured-in, prose-out is the reliable pattern). AI does poorly at differential diagnosis (confident wrong answers), eligibility determination (a team activity with due-process implications), intervention planning (it depends on variables that do not fit in a prompt), and progress-monitoring judgement (deciding whether a flat trajectory means the goal should be revised is a clinical call).
Should the metric calculation be done by an LLM or by a deterministic calculator?
Always by a deterministic calculator for any number that will end up inside an IEP goal or a progress report. The calculator returns the same number every time from the same transcript, which is the property that makes the goal defensible in a due-process review. LLMs can help with the tagging step that feeds the calculator, but the calculator is the final arbiter of the number. The free calculators on this site are deterministic for exactly this reason.
What are the specific AI failure modes I should test for during a trial?
Five specific failure modes to probe explicitly: (1) child-speech WER degradation — test on known child-only clips, not adult benchmarks; (2) hallucinated utterance — check every draft for utterances or numeric values the clinician did not supply; (3) diagnostic leak — check every draft for diagnostic conclusions the clinician did not author; (4) confidence hallucination — polished prose that is not supported by the underlying data; (5) template breakage — missing mastery criteria or normative comparisons in the wrong district format.
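Failure mode (2) is mechanically checkable. A crude screen, sketched below under the assumption that the clinician's structured metric values are available as a list: flag any number in the draft that was not supplied. It over-flags (percentile labels and dates need whitelisting), which is the right direction of error here. The function name is illustrative.

```python
import re

def unsupported_numbers(draft, structured_values):
    """Return numbers appearing in an AI-drafted paragraph that are
    absent from the clinician-supplied structured values: a minimal
    screen for hallucinated metric values."""
    supplied = {str(v) for v in structured_values}
    found = re.findall(r"\d+(?:\.\d+)?", draft)
    return [n for n in found if n not in supplied]

unsupported_numbers(
    "MLU-m of 3.4 and NDW of 112 at the 10th percentile",
    [3.4, 112],
)   # flags "10": the clinician never supplied a percentile
```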
How much does AI transcription cost for a typical SLP caseload?
Roughly $0.10–$0.30 per minute of audio at 2026 prices. A typical 30-minute session costs $3–$9 to transcribe. A clinician seeing 20 students a week at one session each spends $60–$180 a week, or $2,400–$7,200 over a 40-week school year. This is within most district technology budgets but is not a zero-cost adoption — the budget conversation needs to happen before the workflow is built around it.
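The arithmetic behind those figures, as a one-function sketch (the function name and the 40-week school year are this sketch's assumptions):

```python
def annual_cost(rate_per_min, session_min, sessions_per_week,
                weeks_per_year=40):
    """Annual transcription spend for one clinician's caseload."""
    return rate_per_min * session_min * sessions_per_week * weeks_per_year

low = annual_cost(0.10, 30, 20)    # ~ $2,400 at the low per-minute rate
high = annual_cost(0.30, 30, 20)   # ~ $7,200 at the high per-minute rate
```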
Where does ConductSpeech fit into the honest 2026 AI-for-SLP picture?
ConductSpeech is built specifically for the three tasks on the honest list: HIPAA-compliant session transcription with speaker diarisation, feeding cleaned transcripts to deterministic LSA calculators for metric values, and drafting present-levels paragraphs and SMART goal sentences from the scored output. It does not produce diagnostic conclusions, does not make eligibility recommendations, does not replace clinician review of child utterances, and does not talk to families. The positioning matches the honest framing of this pillar.
What does a realistic AI-assisted SLP workflow look like?
Record on a dedicated audio device; upload to a HIPAA-compliant transcription service with a signed BAA; review the child's utterances (10–20 minutes); run the cleaned transcript through the free deterministic calculators on this site for MLU, NDW, PGU, DSS, and IPSyn; cross-reference the values against the SUGAR Norms and Brown's Stages lookups; draft the present-levels paragraph with structured-in, prose-out AI assistance; draft the SMART goals with the IEP Goal Generator; and re-run the same calculators on fresh samples at the six-month and year-end probes. Total time: under 90 minutes per child, compared with 4–6 hours under hand transcription.

References

  1. American Speech-Language-Hearing Association (2024). Artificial Intelligence in Speech-Language Pathology. ASHA Practice Portal.
  2. American Speech-Language-Hearing Association (2023). HIPAA for Speech-Language Pathologists and Audiologists. ASHA Practice Management.
  3. Individuals with Disabilities Education Act (IDEA), 34 CFR Parts 300 and 303. U.S. Department of Education.
  4. Family Educational Rights and Privacy Act (FERPA), 20 U.S.C. § 1232g; 34 CFR Part 99. U.S. Department of Education.
  5. Pavelko, S. L., & Owens, R. E. (2017). Sampling Utterances and Grammatical Analysis Revised (SUGAR): New normative values for language sample analysis measures. Language, Speech, and Hearing Services in Schools, 48(3), 197-215.
  6. Eisenberg, S. L., & Guo, L. (2013). Differentiating children with and without language impairment based on grammaticality. Language, Speech, and Hearing Services in Schools, 44(1), 20-31.
  7. Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2023). Robust speech recognition via large-scale weak supervision (Whisper). International Conference on Machine Learning.
  8. Koenecke, A., Nam, A., Lake, E., Nudell, J., Quartey, M., Mengesha, Z., Toups, C., Rickford, J. R., Jurafsky, D., & Goel, S. (2020). Racial disparities in automated speech recognition. Proceedings of the National Academy of Sciences, 117(14), 7684-7689.
  9. Scarborough, H. S. (1990). Index of productive syntax. Applied Psycholinguistics, 11(1), 1-22.
  10. Heilmann, J., Miller, J. F., Nockerts, A., & Dunaway, C. (2010). Properties of the Narrative Scoring Scheme using narrative retells in young school-age children. American Journal of Speech-Language Pathology, 19(2), 154-166.
  11. Brown, R. (1973). A First Language: The Early Stages. Cambridge, MA: Harvard University Press.
  12. Lee, L. L. (1974). Developmental Sentence Analysis: A Grammatical Assessment Procedure for Speech and Language Clinicians. Evanston, IL: Northwestern University Press.
  13. U.S. Department of Health and Human Services (2024). HIPAA for Professionals: Business Associate Contracts.

This article is a clinical-workflow reference, not legal, regulatory, or procurement advice. Before adopting any AI tool on a student caseload, confirm HIPAA and FERPA compliance with your district’s IT or compliance officer and review the vendor’s Business Associate Agreement.

Get the full analysis

AI where it works; clinician where it matters

ConductSpeech runs the three tasks on the honest 2026 list — HIPAA-compliant transcription, deterministic LSA scoring, and structured-in prose-out drafting — and keeps every clinical conclusion in the clinician’s hands. Free trial; no installation.

Get the full analysis with ConductSpeech