Overview
Fumo is a mobile speech therapy platform designed to support children with Childhood Apraxia of Speech (CAS).
Using AI-powered 3D avatars and real-time feedback, Fumo bridges the gap between weekly clinical sessions and daily speech practice, making therapy more engaging and accessible.
Results
From V1 to V2, my teammates and I redesigned Fumo. Through research, ideation, and countless iterations of prototyping, we transformed the application into a more accessible and engaging experience.
Between the two phases of usability testing, we saw an 86% increase in user engagement.
What is Childhood Apraxia of Speech (CAS)?
Childhood Apraxia of Speech (CAS) is a neurological speech disorder where the brain has trouble planning and coordinating the muscle movements needed for speech. Kids with CAS know what they want to say, but their brains struggle to send the right signals to their mouths, making it hard to speak clearly.
Context and Problem
Children with CAS often face delays in speech development due to motor planning difficulties. Access to therapy is limited. These are some of the problems that they face:
Hypothesis

A 3D avatar that acts as a speech coach will improve the accuracy and confidence of children practicing alone.

Real-time, emotionally expressive feedback improves engagement and retention in early learners.
Potential Solutions
Introducing the NVIDIA OMNIVERSE 3D MODEL
The avatar was designed in MetaHuman Creator and stylized to avoid the "uncanny valley."
Facial animation is powered by NVIDIA Audio2Face and TensorRT for near real-time lip sync from speech input.
Integration was done in Unreal Engine, where we built the animation logic that dynamically syncs mouth shapes to phonemes.
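To make the idea of syncing mouth shapes to phonemes concrete, here is a minimal sketch in Python. It is not the Unreal Engine or Audio2Face pipeline described above. The phoneme labels, the mouth-shape (viseme) names, and the `to_keyframes` helper are all hypothetical and only illustrate the mapping step.

```python
from dataclasses import dataclass

# Hypothetical phoneme -> viseme (mouth shape) table. A real pipeline would
# use the blendshape targets exposed by the animation tool instead.
PHONEME_TO_VISEME = {
    "p": "lips_closed", "b": "lips_closed", "m": "lips_closed",
    "f": "lip_to_teeth", "v": "lip_to_teeth",
    "aa": "jaw_open", "ae": "jaw_open",
    "uw": "lips_rounded", "ow": "lips_rounded",
    "iy": "lips_spread", "s": "lips_spread",
}


@dataclass
class Keyframe:
    time_s: float  # when the mouth shape should start, in seconds
    viseme: str    # which mouth shape to show


def to_keyframes(timed_phonemes):
    """Turn (phoneme, start_time) pairs into viseme keyframes.

    Consecutive phonemes that share a viseme are merged so the avatar
    does not re-trigger the same mouth shape. Unknown phonemes fall back
    to a neutral mouth.
    """
    keyframes = []
    for phoneme, start in timed_phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme, "neutral")
        if keyframes and keyframes[-1].viseme == viseme:
            continue
        keyframes.append(Keyframe(start, viseme))
    return keyframes


# Hypothetical timings for a short utterance.
frames = to_keyframes([("m", 0.00), ("aa", 0.08), ("m", 0.25), ("b", 0.30)])
```

Here the final "m" and "b" share a closed-lips shape, so they collapse into a single keyframe. The avatar would then blend between these keyframes over time.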
The development of the 3D model
The 3D avatar showing different human facial expressions and articulation, rendered in real time.



The 3D model displaying different emotions and prosodies
Restructuring the Learning Flow to Improve Engagement & Practice Quality
The Approach
Since direct testing with young children diagnosed with apraxia was not yet possible (due to the longer recruitment and ethical approval processes involved), we adapted our research methods:
Conducted pilot testing with older individuals with speech disorders and general users simulating speech therapy tasks
Analyzed session behavior patterns, completion rates, and points of drop-off
Reviewed secondary research on cognitive load theory
This gave us enough behavioral signals to identify flow pain points while we awaited broader access to the pediatric population.
The Challenge
Initial versions of the app followed a story-driven, emotion-later structure, where users engaged with storytelling modules before practicing articulation. However, early pilot observations with adult users, including individuals with speech challenges, revealed that the flow was cognitively demanding.
Users struggled to maintain focus through long, multi-step sessions, and articulation practice felt scattered rather than structured.
Given the known cognitive profiles of children with apraxia (sourced through secondary research and literature reviews), we hypothesized that a similar or even greater disengagement would occur with our target audience if the flow remained complex.
Due to session fatigue, users lost interest after 15–20 minutes.
Since words were pulled randomly from different emotional contexts, we hypothesized that children would be unable to build muscle memory or reinforce patterns.
Root Cause
I realized that the cognitive demand was still very high. Users had to constantly switch emotional contexts while also learning pronunciation, two taxing tasks at once.
Plus, the absence of a consistent speech path led to shallow learning and low engagement.
Old flow:
01
Story module first
02
4 practice words
03
Emotion-driven word repetition
04
Questions that prompt users to answer with the practice words
The Solution
One promising idea is to let kids earn a brief storytime after completing their articulation exercises. Research in child development and UX supports this strategy – showing that storytelling rewards can boost motivation, sustain engagement, and reduce cognitive overload.
Boosting Motivation with Story Rewards - Using a story as a reward is a form of positive reinforcement that leverages children’s love for stories to encourage task completion. Behavior experts note that offering a preferred activity (like storytime) after a less-preferred task increases the likelihood the child will do that task – a concept known as the Premack principle. In practice, this means a child knows “if I finish my speech practice, I get a story,” which can spark willingness and enthusiasm to complete the exercise. For example, one guide on child behavior suggests offering an extra bedtime story as a reward for tidying up or doing homework.
Reducing Cognitive Load with Story Breaks - Crucially, the story should feel like a relaxing reward, not more work. A design framework for gamified learning suggests using “lighter, more playful content” (e.g. a song or story) as a reward, explicitly to let students relax and recharge.
We redesigned the flow around consistency, repetition, and bite-sized focus. The new flow flipped the structure:
Practice-first, story-second: Now, users begin with focused articulation practice on a single target (e.g., a vowel or word cluster).
Catered progression: Rather than rotating emotions, the app now builds practice sessions around phoneme categories personalized to the child’s needs.
Shorter, progressive modules: Each unit is 10–15 minutes long, letting children practice in manageable bursts instead of enduring one long session.
Emotion as expression, not instruction: Emotional variety now supports prosody and tone, but only after mastery of the core sound.
Added session cards and visual progress checkpoints
Designed each module to be 10–15 minutes max
Shifted story elements to reward zones, acting as a playful cooldown rather than a starter
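The practice-first, story-second structure can be sketched as a small session planner. This is an illustration under stated assumptions, not Fumo's actual implementation. The `plan_session` helper, the two minutes per word, the three-minute story, and counting the story inside the 15-minute cap are all hypothetical choices.

```python
from dataclasses import dataclass, field

MAX_SESSION_MIN = 15  # upper bound taken from the 10–15 minute module length


@dataclass
class Session:
    target: str                                # single phoneme target
    steps: list = field(default_factory=list)  # ordered (kind, content) pairs
    minutes: int = 0                           # planned duration so far


def plan_session(target, words, minutes_per_word=2, story_minutes=3):
    """Build one practice-first session for a single phoneme target.

    Practice words are added until the time budget is used up, and the
    story is always appended last as the reward. The per-step durations
    are assumed values, not measured ones.
    """
    session = Session(target=target)
    budget = MAX_SESSION_MIN - story_minutes  # time left for practice
    for word in words:
        if session.minutes + minutes_per_word > budget:
            break
        session.steps.append(("practice", word))
        session.minutes += minutes_per_word
    session.steps.append(("story", f"reward story for '{target}'"))
    session.minutes += story_minutes
    return session


# Hypothetical word list for the /b/ sound.
session = plan_session("b", ["ball", "bee", "bus", "boat", "bed", "bag", "bug"])
```

Keeping the story as the final step, and reserving its time up front, means a long word list can never push the reward out of the session.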
New flow:
01
Practice-first, story-second
02
Single-target articulation session
03
Catered phoneme-based progression
04
10–15 min per module
Accessibility First Approach
Designing for children with speech and motor challenges meant accessibility had to be the foundation of our design.
Accessibility isn’t just about compliance; it’s about creating a learning environment where every child feels confident, capable, and encouraged. By building Fumo on an accessibility-first framework, we made sure that no child was left behind simply because the interface was too fast, too small, or too complex.
Accessibility was a design choice, not a feature. It was intentional and fundamental.
We focused on four key areas:
(fig: UI with labels)
Old Solution
(reduced video quality due to file size limitations)
Current Solution
(reduced video quality due to file size limitations)
Impact
40% reduction
in therapy-related expenses for families
3x more frequent practice
compared to traditional therapy-only routines
50% faster progress
reported by pilot users and therapists over a 6-week trial
86% increase in user engagement
between V1 and V2 (measured via task completion and session length)
Key Decisions & Challenges
What I Learned
Designing for children with speech challenges taught me the value of inclusive design not as a constraint, but as a creative advantage.
AI in healthcare needs to be transparent, reliable, and, most of all, emotionally intelligent.
Balancing clinical requirements with user engagement for children with special needs leads to a better product for everyone.