Building DropVox: A Local AI Transcription App for Mac
The Problem
My wife sends voice messages. Long ones. Two minutes of Portuguese while I'm in a meeting, unable to listen. WhatsApp has no transcription. I needed to know if it was urgent or just a story about her day.
Existing solutions? Upload to some cloud service, wait, hope they don't store my audio forever. No thanks.
I wanted something local, fast, and always available. A weekend later, DropVox was born.
The Spark: A Quick CLI
It started with 20 WhatsApp audio files I needed transcribed. I threw together a Python script using OpenAI's Whisper:
import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.opus", language="pt")
print(result["text"])
It worked. Beautifully. The base model was surprisingly accurate for Portuguese, and it ran entirely on my MacBook. But opening a terminal every time? Too much friction.
The Vision: Menu Bar Always Ready
What if transcription was one click away? A menu bar app that:
- Lives in your menu bar, always there
- Opens a file picker with one click
- Transcribes and copies to clipboard automatically
- Shows a notification when done
No terminal. No extra windows. Just click, select, paste.
Day 1: Learning rumps
I discovered rumps, a Python framework for macOS menu bar apps. The API is delightfully simple:
import rumps
class DropVoxApp(rumps.App):
def __init__(self):
super().__init__("DropVox")
@rumps.clicked("Select Audio Files...")
def select_files(self, _):
# Handle file selection
pass
if __name__ == "__main__":
DropVoxApp().run()
That's it. A working menu bar app in 10 lines.
The Drag-and-Drop Dream (Deferred)
My original vision included a floating drop zone—drag files directly onto it. But implementing native drag-and-drop requires diving into PyObjC and AppKit. For an MVP, a file picker would do.
I used tkinter's file dialog, which works surprisingly well with rumps:
import tkinter as tk
from tkinter import filedialog
root = tk.Tk()
root.withdraw()
file_paths = filedialog.askopenfilenames(
title="Select Audio Files",
filetypes=[("Audio Files", "*.opus *.mp3 *.m4a *.wav")]
)
Not as elegant as drag-and-drop, but functional. Ship first, polish later.
Day 2: The Threading Challenge
Whisper transcription takes time—sometimes 30 seconds for a two-minute audio. The UI can't freeze. Threading to the rescue:
import threading
def select_files(self, _):
# ... file picker code ...
thread = threading.Thread(
target=self._transcribe_files,
args=(file_paths,),
daemon=True,
)
thread.start()
The menu bar title updates during transcription: "DropVox" → "[1/3]" → "[2/3]" → "DropVox". Simple progress indication without blocking.
The Polish: Notifications and Clipboard
macOS notifications via osascript:
import subprocess
def notify(title: str, message: str):
script = f'display notification "{message}" with title "{title}"'
subprocess.run(["osascript", "-e", script])
Clipboard via pyperclip:
import pyperclip
pyperclip.copy(transcription_text)
Now the workflow is: click → select → wait → paste. Four steps, zero terminal commands.
Model Selection
Whisper offers models from tiny to large. I added a submenu to switch between them:
| Model | Speed | Accuracy | Size |
|---|---|---|---|
| tiny | Fastest | Good | 39M |
| base | Fast | Better | 74M |
| small | Medium | Good | 244M |
| medium | Slow | Great | 769M |
| large | Slowest | Best | 1550M |
For WhatsApp messages, base is the sweet spot. Quick enough to not feel slow, accurate enough for everyday messages.
Going Further: A Real Product
What started as a weekend project evolved into something more. I built a dedicated website at dropvox.app with:
- PostHog Analytics - Tracking downloads and user engagement to understand what's working
- A Blog - SEO-focused content about audio transcription, privacy, and Whisper
- Full SEO - Structured data, sitemap, optimized meta tags for discoverability
- i18n Support - English and Portuguese for my primary audiences
The site itself is a Next.js 16 app with Tailwind CSS v4, deployed on Vercel. Clean, fast, and informative.
What I Learned
-
rumps is perfect for simple menu bar apps - If you need more, look at PyObjC or Swift.
-
tkinter file dialogs work everywhere - Cross-platform and plays nice with other frameworks.
-
Threading is essential for any processing - Never block the UI thread.
-
Whisper is remarkable - Running a speech-to-text model locally on a laptop, with this accuracy, felt like magic.
-
MVPs should be embarrassingly simple - File picker instead of drag-and-drop. Ship it.
-
Analytics matter - Even for side projects, understanding user behavior helps prioritize features.
What's Next
DropVox works. I use it daily. But the roadmap is tempting:
- Native drag-and-drop - A floating window you can drop files onto
- Swift rewrite - Better performance, true native feel
- Share Extension - Transcribe directly from WhatsApp's share menu
- App Store - Maybe, if there's demand
For now, it's open source: github.com/helsky-labs/dropvox
Try It
Download the latest release from dropvox.app or build from source:
git clone https://github.com/helsky-labs/dropvox.git
cd dropvox
python3 -m venv venv
source venv/bin/activate
pip install -e .
dropvox
The first run downloads the Whisper model (~140MB for base). After that, it's instant.
Building something? I'm always happy to chat. Find me on GitHub.