My contributions

Designing an end-to-end mobile experience

Visual Design, UI Design

Prototyping, User Testing

Overview

Fumo is a mobile speech therapy platform designed to support children with Childhood Apraxia of Speech (CAS).

Using AI-powered 3D avatars and real-time feedback, Fumo bridges the gap between weekly clinical sessions and daily speech practice, making therapy more engaging and accessible.

Results

From V1 to V2, my teammates and I redesigned Fumo. Through research, ideation, and many rounds of prototyping, we turned the application into a more accessible and engaging experience.

Between two phases of usability testing, we saw an 86% increase in user engagement.

What is Childhood Apraxia of Speech (CAS)?

Childhood Apraxia of Speech (CAS) is a neurological speech disorder where the brain has trouble planning and coordinating the muscle movements needed for speech. Kids with CAS know what they want to say, but their brains struggle to send the right signals to their mouths, making it hard to speak clearly.

Context and Problem

Children with CAS often face delays in speech development due to motor planning difficulties, and their access to therapy is limited. These are some of the problems they face:

Hypothesis

A 3D avatar that acts as a speech coach will improve the accuracy and confidence of children practicing alone.

Real-time, emotionally expressive feedback improves engagement and retention in early learners.

Potential Solutions

Introducing the NVIDIA Omniverse 3D Model

The avatar was designed in MetaHuman Creator and stylized to avoid the "uncanny valley."

Facial animation is powered by NVIDIA Audio2Face and TensorRT for near real-time lip sync from speech input.

We integrated these components in Unreal Engine, where we built the animation logic that dynamically syncs mouth shapes to phonemes.
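To show what syncing mouth shapes to phonemes involves, here is a minimal Python sketch of a phoneme-to-viseme lookup. It turns timed phonemes into mouth-shape keyframes. This is an illustrative assumption, not Fumo's Unreal Engine logic and not an NVIDIA API. Audio2Face derives facial animation from audio directly. The viseme names, the mapping table, and the `Keyframe` and `phonemes_to_keyframes` helpers are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical, simplified phoneme-to-viseme table (ARPAbet-style phonemes).
# Production systems use larger tables or learn the mapping from audio.
PHONEME_TO_VISEME = {
    "P": "lips_closed", "B": "lips_closed", "M": "lips_closed",
    "F": "lip_teeth", "V": "lip_teeth",
    "AA": "jaw_open", "AE": "jaw_open",
    "UW": "lips_rounded", "OW": "lips_rounded", "W": "lips_rounded",
    "IY": "lips_spread", "EH": "lips_spread",
}
DEFAULT_VISEME = "neutral"


@dataclass
class Keyframe:
    time_s: float  # when the mouth shape should peak, in seconds
    viseme: str    # which mouth shape to show


def phonemes_to_keyframes(timed_phonemes):
    """Convert (phoneme, start_s, end_s) tuples into viseme keyframes.

    Each keyframe sits at the midpoint of its phoneme. Consecutive phonemes
    that map to the same viseme are merged, so the shape is held instead of
    being re-triggered.
    """
    keyframes = []
    for phoneme, start, end in timed_phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme, DEFAULT_VISEME)
        if keyframes and keyframes[-1].viseme == viseme:
            continue  # same mouth shape as the previous keyframe: hold it
        keyframes.append(Keyframe(time_s=(start + end) / 2, viseme=viseme))
    return keyframes


if __name__ == "__main__":
    # "mama" as M AA M AA with hypothetical timings
    word = [("M", 0.0, 0.1), ("AA", 0.1, 0.3), ("M", 0.3, 0.4), ("AA", 0.4, 0.6)]
    for kf in phonemes_to_keyframes(word):
        print(f"{kf.time_s:.2f}s -> {kf.viseme}")
```

Merging repeated visemes is one simple way to keep the mouth from jittering on sounds that look identical, such as "p" followed by "b".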

Developing the 3D Model

The 3D avatar showing different human facial expressions and articulation, rendered in real time.

The 3D model displaying different emotions and prosodic patterns

Restructuring the Learning Flow to Improve Engagement & Practice Quality

The Approach

Since direct testing with young children diagnosed with apraxia was not yet possible (due to the longer recruitment and ethical approval processes involved), we adapted our research methods:

Conducted pilot testing with older individuals with speech disorders and general users simulating speech therapy tasks.

Analyzed session behavior patterns, completion rates, and points of drop-off.

Reviewed secondary research on cognitive load theory.

This gave us enough behavioral signals to identify flow pain points while we awaited broader access to the pediatric population.

The Challenge

Initial versions of the app followed a gamified, emotion-layered structure, where users played game modules in between articulation practice. However, early pilot observations with adult users, including individuals with speech challenges, revealed that the flow was cognitively demanding.

Users struggled to maintain focus through long, multi-step sessions, and articulation practice felt scattered rather than structured.

Given the known cognitive profiles of children with apraxia (drawn from secondary research and literature reviews), we hypothesized that similar or even greater disengagement would occur with our target audience if the flow remained complex.

Session fatigue set in quickly: users lost interest after 15–20 minutes.

Since words were pulled randomly from different emotional contexts, we hypothesized that children would be unable to build muscle memory or reinforce patterns.

Root Cause

I realized that the cognitive demand was still very high: users had to constantly switch emotional contexts while also learning pronunciation, two taxing tasks at once. On top of that, the games placed between exercises were distracting them!

Plus, the absence of a consistent speech path led to shallow learning and low engagement.

Old flow:

01

Story module first

02

4 practice words

03

Emotion-driven word repetition

04

Game built around the practice words

05

Questions that prompt the child to answer using the practice words

The Solution

One idea is to let kids earn a brief storytime after completing their articulation exercises. Research in child development and UX supports this strategy, showing that storytelling rewards can boost motivation, sustain engagement, and reduce cognitive overload.

Boosting Motivation with Story Rewards - Using a story as a reward is a form of positive reinforcement that leverages children’s love for stories to encourage task completion. Behavior experts note that offering a preferred activity (like storytime) after a less-preferred task increases the likelihood that the child will do that task, a concept known as the Premack principle. In practice, a child knows “if I finish my speech practice, I get a story,” which can spark willingness and enthusiasm to complete the exercise. For example, one guide on child behavior suggests offering an extra bedtime story as a reward for tidying up or doing homework.

Reducing Cognitive Load with Story Breaks - Crucially, the story should feel like a relaxing reward, not more work. A design framework for gamified learning suggests using “lighter, more playful content” (e.g. a song or story) as a reward, explicitly to let students relax and recharge.

We redesigned the flow around consistency, repetition, and bite-sized focus. The new flow flipped the structure:

Practice-first, story-second: Now, users begin with focused articulation practice on a single target (e.g., a vowel or word cluster).

Tailored progression: Rather than rotating emotions, the app now builds practice sessions around phoneme categories personalized to the child’s needs.

Shorter, progressive modules: Each unit is 10–15 minutes long, letting children practice in manageable bursts instead of enduring one long session.

Emotion as expression, not instruction: Emotional variety now supports prosody and tone, but only after the child has mastered the core sound.

Added session cards and visual progress checkpoints

Designed each module to be 10–15 minutes max

Shifted story elements to reward zones, acting as a playful reward rather than a starter

New flow:

01

Practice-first, story-second

02

Single-target articulation session

03

Tailored phoneme-based progression

04

10–15 min per module
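The new flow has three rules: practice comes first, the story unlocks only afterward (the Premack principle described above), and each module targets a single phoneme category within roughly 15 minutes. These rules can be sketched as a small session planner. This is a hypothetical Python illustration, not Fumo's implementation. The `PracticeItem` fields, the per-item time estimates, and the `plan_module` and `story_unlocked` helpers are assumptions.

```python
from dataclasses import dataclass, field

MAX_MODULE_MINUTES = 15  # upper end of the 10–15 minute module length


@dataclass
class PracticeItem:
    word: str
    phoneme_category: str  # e.g. a target sound or word cluster
    est_minutes: float     # hypothetical time estimate per item


@dataclass
class Module:
    target: str
    practice: list = field(default_factory=list)
    completed: set = field(default_factory=set)

    def story_unlocked(self) -> bool:
        # Practice-first, story-second: the story reward opens only
        # after every practice item in the module has been completed.
        required = {p.word for p in self.practice}
        return bool(required) and self.completed >= required


def plan_module(items, target, max_minutes=MAX_MODULE_MINUTES):
    """Build one single-target module that stays inside the time budget."""
    module = Module(target=target)
    total = 0.0
    for item in items:
        if item.phoneme_category != target:
            continue  # single-target session: skip other categories
        if total + item.est_minutes > max_minutes:
            break  # stop before the module exceeds its time budget
        module.practice.append(item)
        total += item.est_minutes
    return module


if __name__ == "__main__":
    bank = [
        PracticeItem("ball", "b", 3),
        PracticeItem("cat", "k", 3),
        PracticeItem("bee", "b", 4),
        PracticeItem("bubble", "b", 5),
        PracticeItem("baby", "b", 5),
    ]
    module = plan_module(bank, target="b")
    print([p.word for p in module.practice])  # ['ball', 'bee', 'bubble']
    print(module.story_unlocked())            # False
    module.completed.update(p.word for p in module.practice)
    print(module.story_unlocked())            # True
```

Gating the story on completion, instead of opening the session with it, keeps the reward after the less-preferred task. This mirrors the reordering from the old flow.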

Accessibility First Approach

Designing for children with speech and motor challenges meant accessibility had to be the foundation of our design.

Accessibility isn’t just about compliance; it’s about creating a learning environment where every child feels confident, capable, and encouraged. By building Fumo on an accessibility-first framework, we made sure that no child was left behind simply because the interface was too fast, too small, or too complex.

Accessibility was a design choice, not a feature. It was intentional and fundamental.

We focused on four key areas: