React Native · 2024-06-20 · 8 min read

Build a Speech-to-Text React Native App with Whisper RN

On-device speech recognition has been a hard problem for mobile developers for years. Cloud APIs add latency, cost, and privacy concerns — especially in healthcare and enterprise apps where audio may contain sensitive information. whisper.rn brings OpenAI's Whisper model directly onto the device, giving you fast, accurate, offline transcription without sending a single byte to an external server. I've used this in a telehealth project, and the accuracy of the base.en model genuinely surprised me.

Why Whisper RN Over Cloud APIs?

  • Privacy: Audio never leaves the device — essential for HIPAA-adjacent use cases.
  • Offline: Works without a network connection.
  • Cost: No per-minute billing.
  • Latency: On a modern iPhone or Android flagship, a 30-second clip transcribes in 3–6 seconds.

The trade-off is model size (the base model is ~140 MB) and slightly lower accuracy than Whisper Large running server-side. For most voice memo or note-taking use cases, base.en hits the sweet spot.

Installation

npm install whisper.rn
# or
yarn add whisper.rn

For iOS, run pod install inside the ios/ directory. On Android, the library ships pre-compiled native binaries, so no extra NDK configuration is needed as of whisper.rn 0.3+.

Also install react-native-audio-record for microphone capture:

npm install react-native-audio-record

Add microphone permissions. In android/app/src/main/AndroidManifest.xml:

<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" android:maxSdkVersion="28" />

Note that WRITE_EXTERNAL_STORAGE only applies on Android 9 (API 28) and below, and on Android 6+ you must also request RECORD_AUDIO at runtime (for example with PermissionsAndroid.request) before recording.

In ios/MyApp/Info.plist:

<key>NSMicrophoneUsageDescription</key>
<string>We need microphone access to transcribe your speech.</string>

Downloading the Whisper Model

whisper.rn accepts a local file path to the .bin model file. The cleanest approach is to download it on first launch and cache it in the app's document directory:

import RNFS from 'react-native-fs';

const MODEL_URL =
  'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin';
const MODEL_PATH = `${RNFS.DocumentDirectoryPath}/ggml-base.en.bin`;

export async function ensureModelDownloaded(onProgress) {
  const exists = await RNFS.exists(MODEL_PATH);
  if (exists) return MODEL_PATH;

  const { promise } = RNFS.downloadFile({
    fromUrl: MODEL_URL,
    toFile: MODEL_PATH,
    progress: (res) => {
      const pct = Math.round((res.bytesWritten / res.contentLength) * 100);
      onProgress?.(pct);
    },
  });

  // downloadFile resolves even on HTTP errors, so check the status code
  const { statusCode } = await promise;
  if (statusCode !== 200) {
    await RNFS.unlink(MODEL_PATH).catch(() => {});
    throw new Error(`Model download failed: HTTP ${statusCode}`);
  }

  return MODEL_PATH;
}
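A download can also be cut off mid-transfer, leaving a file that exists but fails to load. A minimal size sanity check helps; the threshold below is my own heuristic based on ggml-base.en.bin being roughly 140 MB, not a value from whisper.rn:

```javascript
// Rough size check for the downloaded model file. Anything far below the
// expected ~140 MB is almost certainly a truncated or failed download.
const MIN_MODEL_BYTES = 100 * 1024 * 1024; // assumed lower bound

function isPlausibleModelSize(sizeInBytes) {
  return Number.isFinite(sizeInBytes) && sizeInBytes >= MIN_MODEL_BYTES;
}

// Example: stat the cached file with RNFS and re-download if the check fails
// const { size } = await RNFS.stat(MODEL_PATH);
// if (!isPlausibleModelSize(Number(size))) { /* unlink + re-download */ }
```

Run this check before calling initWhisper; loading a truncated model produces a much less helpful native-side error.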

Recording Audio

Configure react-native-audio-record to output a 16 kHz mono WAV file — the exact format Whisper expects:

import AudioRecord from 'react-native-audio-record';

const AUDIO_OPTIONS = {
  sampleRate: 16000,
  channels: 1,
  bitsPerSample: 16,
  wavFile: 'recording.wav',
};

export function startRecording() {
  AudioRecord.init(AUDIO_OPTIONS);
  AudioRecord.start();
}

export async function stopRecording() {
  const filePath = await AudioRecord.stop();
  return filePath; // absolute path to the .wav file
}
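If transcription quality is mysteriously bad, the first thing to check is whether the WAV really is 16 kHz mono 16-bit. A quick header check can confirm it; this helper is illustrative and assumes the canonical 44-byte RIFF header that react-native-audio-record writes (files with extra chunks before "fmt " would need a real chunk walker):

```javascript
// Read the fields Whisper cares about from a canonical 44-byte WAV header.
function parseWavFormat(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  return {
    channels: view.getUint16(22, true),      // 1 = mono
    sampleRate: view.getUint32(24, true),    // 16000 expected
    bitsPerSample: view.getUint16(34, true), // 16 expected
  };
}

function isWhisperReady(fmt) {
  return fmt.channels === 1 && fmt.sampleRate === 16000 && fmt.bitsPerSample === 16;
}
```

Read the first 44 bytes of the recording (RNFS can read a byte range) and bail out early with a clear error instead of feeding Whisper audio in the wrong format.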

Running Transcription

import { initWhisper } from 'whisper.rn';

let whisperContext = null;

export async function loadWhisper(modelPath) {
  whisperContext = await initWhisper({ filePath: modelPath });
}

export async function transcribeFile(audioPath) {
  if (!whisperContext) throw new Error('Whisper not initialised');

  // transcribe() returns { stop, promise }; await the promise,
  // not the call itself, to get the result
  const { promise } = whisperContext.transcribe(audioPath, {
    language: 'en',
    maxLen: 1,             // split segments at word boundaries
    tokenTimestamps: true, // per-token timestamps in the segments array
  });
  const { result } = await promise;

  return result; // plain-text transcription string
}
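With maxLen and tokenTimestamps set, the resolved promise also carries a segments array (objects with text, t0, t1) alongside result. A sketch of turning those into caption-style lines, assuming whisper.cpp's convention of 10 ms timestamp units, which whisper.rn inherits:

```javascript
// Format a timestamp given in 10 ms units (centiseconds) as "mm:ss.cc".
function formatTimestamp(centis) {
  const totalSeconds = Math.floor(centis / 100);
  const mm = String(Math.floor(totalSeconds / 60)).padStart(2, '0');
  const ss = String(totalSeconds % 60).padStart(2, '0');
  const cc = String(centis % 100).padStart(2, '0');
  return `${mm}:${ss}.${cc}`;
}

// Render segments as one "[mm:ss.cc] text" line each.
function formatSegments(segments) {
  return segments
    .map((s) => `[${formatTimestamp(s.t0)}] ${s.text.trim()}`)
    .join('\n');
}
```

This is the basis for features like tappable transcripts that seek the original audio, or subtitle export.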

Building the UI

import React, { useState, useRef } from 'react';
import {
  View, Text, TouchableOpacity,
  ActivityIndicator, StyleSheet,
} from 'react-native';
import { startRecording, stopRecording } from './audio';
import { transcribeFile, loadWhisper } from './whisper';
import { ensureModelDownloaded } from './model';

export default function TranscriberScreen() {
  const [status, setStatus] = useState('idle');
  const [transcript, setTranscript] = useState('');
  const [downloadProgress, setDownloadProgress] = useState(0);
  const modelPath = useRef(null);

  async function handlePress() {
    if (status === 'idle') {
      if (!modelPath.current) {
        setStatus('downloading');
        modelPath.current = await ensureModelDownloaded(setDownloadProgress);
        await loadWhisper(modelPath.current);
      }
      setStatus('recording');
      startRecording();
    } else if (status === 'recording') {
      const audioPath = await stopRecording();
      setStatus('transcribing');
      try {
        const text = await transcribeFile(audioPath);
        setTranscript(text);
      } finally {
        setStatus('idle'); // never leave the UI stuck if transcription throws
      }
    }
  }

  const buttonLabel = {
    idle: 'Start Recording',
    downloading: `Downloading model ${downloadProgress}%`,
    recording: 'Stop & Transcribe',
    transcribing: 'Transcribing…',
  }[status];

  return (
    <View style={styles.container}>
      {status === 'transcribing' ? (
        <ActivityIndicator size="large" />
      ) : (
        <TouchableOpacity
          style={[styles.btn, status === 'recording' && styles.btnRecording]}
          onPress={handlePress}
          disabled={status === 'downloading' || status === 'transcribing'}
        >
          <Text style={styles.btnText}>{buttonLabel}</Text>
        </TouchableOpacity>
      )}
      {transcript ? <Text style={styles.transcript}>{transcript}</Text> : null}
    </View>
  );
}

Performance Tips

  • Initialise once: Call initWhisper at app startup (or after download), not before every transcription. Loading the model takes ~1–2 seconds.
  • Use base.en for English-only apps: It is 3x faster than small with minimal accuracy drop on clear speech.
  • Run on a background thread: whisper.rn handles this internally, but avoid blocking the JS thread with synchronous preprocessing.
  • Trim silence: Strip leading and trailing silence from the WAV before passing it to Whisper. A 5-second clip with 4 seconds of silence still takes as long as a genuine 5-second clip.
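The silence-trimming tip can be sketched as a pure function over the decoded 16-bit PCM samples (the WAV body after the header). The amplitude threshold of roughly 2% of full scale is an assumed heuristic, not a value from whisper.rn; tune it for your microphone and environment:

```javascript
// ~2% of the int16 full scale (32768); samples quieter than this
// at the edges of the clip are treated as silence.
const SILENCE_THRESHOLD = Math.floor(32768 * 0.02);

// Strip leading and trailing silence from 16-bit PCM samples.
function trimSilence(samples, threshold = SILENCE_THRESHOLD) {
  let start = 0;
  let end = samples.length;
  while (start < end && Math.abs(samples[start]) < threshold) start++;
  while (end > start && Math.abs(samples[end - 1]) < threshold) end--;
  return samples.subarray(start, end); // view, no copy
}
```

After trimming, write the samples back out as a 16 kHz mono WAV before handing the path to Whisper.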

Error Handling

// Alert comes from 'react-native'; RNFS and MODEL_PATH from the model snippet
try {
  const text = await transcribeFile(audioPath);
  setTranscript(text);
} catch (err) {
  if (err.message.includes('model')) {
    // Model file corrupted — delete and re-download
    await RNFS.unlink(MODEL_PATH);
    modelPath.current = null;
    Alert.alert('Model error', 'Please restart and try again.');
  } else {
    Alert.alert('Transcription failed', err.message);
  }
}
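In practice, the model download fails far more often than transcription does, especially on mobile networks. A plain retry-with-backoff utility (the names withRetry and baseDelayMs are mine, not from either library) is worth wrapping around ensureModelDownloaded:

```javascript
// Retry an async operation with exponential backoff: 1s, 2s, 4s, ...
async function withRetry(fn, { attempts = 3, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError; // surface the final failure to the caller
}

// Usage: const path = await withRetry(() => ensureModelDownloaded(setProgress));
```

Keep attempts low for user-initiated downloads so a dead connection fails fast enough to show a useful error.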

whisper.rn is one of those libraries that genuinely changes what's possible in a mobile app. Once you have on-device transcription running, features like searchable voice notes, real-time captions, and hands-free data entry become straightforward to build — all without a cloud dependency.