On-device speech recognition has been a hard problem for mobile developers for years. Cloud APIs add latency, cost, and privacy concerns — especially in healthcare and enterprise apps where audio may contain sensitive information. whisper.rn brings OpenAI's Whisper model directly onto the device, giving you fast, accurate, offline transcription without sending a single byte to an external server. I've used this in a telehealth project and the accuracy at the base.en model level genuinely surprised me.
Why whisper.rn Over Cloud APIs?
- Privacy: Audio never leaves the device — essential for HIPAA-adjacent use cases.
- Offline: Works without a network connection.
- Cost: No per-minute billing.
- Latency: On a modern iPhone or Android flagship, a 30-second clip transcribes in 3–6 seconds.
The trade-off is model size (the base model is ~140 MB) and slightly lower accuracy than Whisper Large running server-side. For most voice memo or note-taking use cases, base.en hits the sweet spot.
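If you want to choose a model based on available storage, that rule of thumb is easy to encode. The sizes below are approximate on-disk sizes of the English-only ggml models, and `pickModel` is a hypothetical helper, not part of whisper.rn:

```javascript
// Approximate on-disk sizes (MB) of the English-only ggml Whisper models.
const MODEL_SIZES_MB = { 'tiny.en': 75, 'base.en': 142, 'small.en': 466 };

// Pick the largest model that fits the given storage budget (MB),
// or null if even tiny.en won't fit.
function pickModel(freeMb) {
  const candidates = Object.entries(MODEL_SIZES_MB)
    .filter(([, size]) => size <= freeMb)
    .sort((a, b) => b[1] - a[1]); // largest first
  return candidates.length ? candidates[0][0] : null;
}
```

On a device with plenty of free space you could also let the user opt into `small.en` for better accuracy at the cost of load time.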
Installation
npm install whisper.rn
# or
yarn add whisper.rn
For iOS, run pod install inside the ios/ directory. On Android, the library ships pre-compiled native binaries, so no extra NDK configuration is needed as of whisper.rn 0.3+.
Also install react-native-audio-record for microphone capture:
npm install react-native-audio-record
Add microphone permissions. In android/app/src/main/AndroidManifest.xml:
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
In ios/MyApp/Info.plist:
<key>NSMicrophoneUsageDescription</key>
<string>We need microphone access to transcribe your speech.</string>
Downloading the Whisper Model
whisper.rn accepts a local file path to the .bin model file. The cleanest approach is to download it on first launch and cache it in the app's document directory. This example uses react-native-fs for file access, so install it first with npm install react-native-fs:
import RNFS from 'react-native-fs';

const MODEL_URL =
  'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin';
const MODEL_PATH = `${RNFS.DocumentDirectoryPath}/ggml-base.en.bin`;

export async function ensureModelDownloaded(onProgress) {
  const exists = await RNFS.exists(MODEL_PATH);
  if (exists) return MODEL_PATH;
  await RNFS.downloadFile({
    fromUrl: MODEL_URL,
    toFile: MODEL_PATH,
    progress: (res) => {
      const pct = Math.round((res.bytesWritten / res.contentLength) * 100);
      onProgress?.(pct);
    },
  }).promise;
  return MODEL_PATH;
}
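A bare percentage works, but byte counts often read better in a download UI. A small, dependency-free helper (hypothetical, not part of react-native-fs) that you could feed from the same progress callback:

```javascript
// Format a byte count as a human-readable string, e.g. 148897792 -> "142.0 MB".
function formatBytes(bytes) {
  const units = ['B', 'KB', 'MB', 'GB'];
  let value = bytes;
  let unit = 0;
  while (value >= 1024 && unit < units.length - 1) {
    value /= 1024;
    unit += 1;
  }
  return `${value.toFixed(1)} ${units[unit]}`;
}
```

Inside the progress callback you could then report `formatBytes(res.bytesWritten)` out of `formatBytes(res.contentLength)` instead of a percentage.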
Recording Audio
Configure react-native-audio-record to output a 16 kHz mono WAV file — the exact format Whisper expects:
import AudioRecord from 'react-native-audio-record';

const AUDIO_OPTIONS = {
  sampleRate: 16000,
  channels: 1,
  bitsPerSample: 16,
  wavFile: 'recording.wav',
};

export function startRecording() {
  AudioRecord.init(AUDIO_OPTIONS);
  AudioRecord.start();
}

export async function stopRecording() {
  const filePath = await AudioRecord.stop();
  return filePath; // absolute path to the .wav file
}
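If audio ever comes from somewhere other than the recorder configured above, it is worth sanity-checking the WAV header before handing the file to Whisper. This sketch assumes the canonical 44-byte RIFF header (which is what simple recorders typically write) and is not part of either library:

```javascript
// Parse { sampleRate, channels, bitsPerSample } from a canonical
// RIFF/WAVE header; throws if the buffer is not a WAV file.
function parseWavHeader(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  const tag = (offset) =>
    String.fromCharCode(
      view.getUint8(offset), view.getUint8(offset + 1),
      view.getUint8(offset + 2), view.getUint8(offset + 3),
    );
  if (tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    throw new Error('Not a WAV file');
  }
  return {
    channels: view.getUint16(22, true),      // all fields are little-endian
    sampleRate: view.getUint32(24, true),
    bitsPerSample: view.getUint16(34, true),
  };
}
```

A quick check that `sampleRate === 16000 && channels === 1` before transcription can turn a confusing garbled-output bug into a clear error message.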
Running Transcription
import { initWhisper } from 'whisper.rn';

let whisperContext = null;

export async function loadWhisper(modelPath) {
  whisperContext = await initWhisper({ filePath: modelPath });
}

export async function transcribeFile(audioPath) {
  if (!whisperContext) throw new Error('Whisper not initialised');
  // transcribe() returns { stop, promise } so a run can be cancelled;
  // await the promise to get the final output.
  const { promise } = whisperContext.transcribe(audioPath, {
    language: 'en',
    maxLen: 1,
    tokenTimestamps: true,
  });
  const { result } = await promise;
  return result; // plain-text transcription string
}
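Besides the plain-text result, Whisper also produces per-segment output. Assuming whisper.cpp's convention of timestamps in 10 ms units and segments shaped like `{ t0, t1, text }` (check the shape your whisper.rn version returns), you could turn them into SRT captions:

```javascript
// Convert a whisper.cpp timestamp (10 ms units) to "HH:MM:SS,mmm".
function toSrtTime(t) {
  const ms = t * 10;
  const pad = (n, w) => String(n).padStart(w, '0');
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms % 1000, 3)}`;
}

// Build an SRT caption file from segments of shape { t0, t1, text }.
function segmentsToSrt(segments) {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${toSrtTime(seg.t0)} --> ${toSrtTime(seg.t1)}\n${seg.text.trim()}`)
    .join('\n\n');
}
```

This is what makes features like real-time captions practical once the raw transcription is in place.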
Building the UI
import React, { useState, useRef } from 'react';
import {
  View, Text, TouchableOpacity,
  ActivityIndicator, StyleSheet,
} from 'react-native';
import { startRecording, stopRecording } from './audio';
import { transcribeFile, loadWhisper } from './whisper';
import { ensureModelDownloaded } from './model';

export default function TranscriberScreen() {
  const [status, setStatus] = useState('idle');
  const [transcript, setTranscript] = useState('');
  const [downloadProgress, setDownloadProgress] = useState(0);
  const modelPath = useRef(null);

  async function handlePress() {
    if (status === 'idle') {
      if (!modelPath.current) {
        setStatus('downloading');
        modelPath.current = await ensureModelDownloaded(setDownloadProgress);
        await loadWhisper(modelPath.current);
      }
      setStatus('recording');
      startRecording();
    } else if (status === 'recording') {
      const audioPath = await stopRecording();
      setStatus('transcribing');
      const text = await transcribeFile(audioPath);
      setTranscript(text);
      setStatus('idle');
    }
  }

  const buttonLabel = {
    idle: 'Start Recording',
    downloading: `Downloading model ${downloadProgress}%`,
    recording: 'Stop & Transcribe',
    transcribing: 'Transcribing…',
  }[status];

  return (
    <View style={styles.container}>
      {status === 'transcribing' ? (
        <ActivityIndicator size="large" />
      ) : (
        <TouchableOpacity
          style={[styles.btn, status === 'recording' && styles.btnRecording]}
          onPress={handlePress}
          disabled={status === 'downloading' || status === 'transcribing'}
        >
          <Text style={styles.btnText}>{buttonLabel}</Text>
        </TouchableOpacity>
      )}
      {transcript ? <Text style={styles.transcript}>{transcript}</Text> : null}
    </View>
  );
}
Performance Tips
- Initialise once: Call initWhisper at app startup (or after download), not before every transcription. Loading the model takes ~1–2 seconds.
- Use base.en for English-only apps: It is 3x faster than small with minimal accuracy drop on clear speech.
- Run on a background thread: whisper.rn handles this internally, but avoid blocking the JS thread with synchronous preprocessing.
- Trim silence: Strip leading and trailing silence from the WAV before passing it to Whisper. A 5-second clip with 4 seconds of silence still takes as long as a genuine 5-second clip.
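The trimming itself can be done on the raw PCM samples before the WAV is (re)written. A minimal amplitude-gate sketch (you would still need to rewrite the 44-byte WAV header afterwards, since the data length changes):

```javascript
// Trim leading and trailing samples whose amplitude stays below `threshold`.
// Samples are 16-bit signed PCM, so full scale is 32767; a threshold of ~500
// is a reasonable starting point for room noise. Returns a subarray view.
function trimSilence(samples, threshold = 500) {
  let start = 0;
  let end = samples.length;
  while (start < end && Math.abs(samples[start]) < threshold) start++;
  while (end > start && Math.abs(samples[end - 1]) < threshold) end--;
  return samples.subarray(start, end);
}
```

For speech with quiet passages in the middle, gate only the edges as above rather than every sample, or Whisper will receive unnaturally chopped audio.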
Error Handling
import { Alert } from 'react-native';

try {
  const text = await transcribeFile(audioPath);
  setTranscript(text);
} catch (err) {
  if (err.message.includes('model')) {
    // Model file corrupted — delete and re-download
    await RNFS.unlink(MODEL_PATH);
    modelPath.current = null;
    Alert.alert('Model error', 'Please restart and try again.');
  } else {
    Alert.alert('Transcription failed', err.message);
  }
}
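Matching on error strings is brittle, so it helps to centralise that mapping in one place. A sketch of a hypothetical classifier; the 'model' substring check mirrors the handler above and should be adjusted to whatever your whisper.rn version actually throws:

```javascript
// Map a low-level error to a user-facing action and alert title.
function classifyTranscriptionError(err) {
  const msg = String(err?.message ?? err);
  if (msg.includes('model')) {
    return { action: 'redownload', title: 'Model error' };
  }
  if (msg.includes('audio') || msg.includes('wav')) {
    return { action: 'rerecord', title: 'Audio error' };
  }
  return { action: 'report', title: 'Transcription failed' };
}
```

The catch block then switches on `action`, which keeps the recovery logic testable without a device.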
whisper.rn is one of those libraries that genuinely changes what's possible in a mobile app. Once you have on-device transcription running, features like searchable voice notes, real-time captions, and hands-free data entry become straightforward to build — all without a cloud dependency.

