Building DropVox: A Local AI Transcription App for Mac

The Problem

My wife sends voice messages. Long ones. Two minutes of Portuguese while I'm in a meeting, unable to listen. WhatsApp has no transcription. I needed to know if it was urgent or just a story about her day.

Existing solutions? Upload to some cloud service, wait, hope they don't store my audio forever. No thanks.

I wanted something local, fast, and always available. A weekend later, DropVox was born.

The Spark: A Quick CLI

It started with 20 WhatsApp audio files I needed transcribed. I threw together a Python script using OpenAI's Whisper:

import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.opus", language="pt")
print(result["text"])

It worked. Beautifully. The base model was surprisingly accurate for Portuguese, and it ran entirely on my MacBook. But opening a terminal every time? Too much friction.
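For a batch like those 20 files, the same call can be wrapped in a loop. This is a minimal sketch, assuming all the files sit in one folder. `transcribe_folder` and its `transcribe` parameter are hypothetical names, not part of Whisper; the transcription function is passed in so the loop can be tried without loading a model.

```python
from pathlib import Path
from typing import Callable, Dict

# Extensions taken from the formats mentioned in this post
AUDIO_EXTENSIONS = {".opus", ".mp3", ".m4a", ".wav"}


def transcribe_folder(folder: str, transcribe: Callable[[str], str]) -> Dict[str, str]:
    """Return {file name: transcript} for every audio file in `folder`, sorted by name.

    `transcribe` is any callable that takes a file path and returns text
    (hypothetical hook; it is not a Whisper API).
    """
    results: Dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip sub-folders and anything that is not a known audio format
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            results[path.name] = transcribe(str(path)).strip()
    return results
```

With the `model` loaded in the script above, the callable could be `lambda p: model.transcribe(p, language="pt")["text"]`.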

The Vision: Menu Bar Always Ready

What if transcription was one click away? A menu bar app that:

Lives in your menu bar, always there
Opens a file picker with one click
Transcribes and copies to clipboard automatically
Shows a notification when done

No terminal. No extra windows. Just click, select, paste.

Day 1: Learning rumps

I discovered rumps, a Python framework for macOS menu bar apps. The API is delightfully simple:

import rumps

class DropVoxApp(rumps.App):
    def __init__(self):
        super().__init__("DropVox")

    @rumps.clicked("Select Audio Files...")
    def select_files(self, _):
        # Handle file selection
        pass

if __name__ == "__main__":
    DropVoxApp().run()

That's it. A working menu bar app in 10 lines.

The Drag-and-Drop Dream (Deferred)

My original vision included a floating drop zone—drag files directly onto it. But implementing native drag-and-drop requires diving into PyObjC and AppKit. For an MVP, a file picker would do.

I used tkinter's file dialog, which works surprisingly well with rumps:

import tkinter as tk
from tkinter import filedialog

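# Create a hidden root window so only the dialog itself is shown
root = tk.Tk()
root.withdraw()
file_paths = filedialog.askopenfilenames(
    title="Select Audio Files",
    filetypes=[("Audio Files", "*.opus *.mp3 *.m4a *.wav")]
)
# Dispose of the hidden window once the dialog has closed
root.destroy()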

Not as elegant as drag-and-drop, but functional. Ship first, polish later.

Day 2: The Threading Challenge

Whisper transcription takes time: sometimes 30 seconds for a two-minute clip. The UI can't freeze while that runs. Threading to the rescue:

import threading

def select_files(self, _):
    # ... file picker code ...

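    # Run the transcription off the main thread so the menu stays responsive.
    # daemon=True lets the app quit even if a transcription is still running.
    thread = threading.Thread(
        target=self._transcribe_files,
        args=(file_paths,),
        daemon=True,
    )
    thread.start()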

The menu bar title updates during transcription: "DropVox" → "[1/3]" → "[2/3]" → "DropVox". Simple progress indication without blocking.
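The progress updates could be driven by a worker along these lines. This is a sketch, not DropVox's actual `_transcribe_files`: `transcribe_with_progress` and `progress_title` are hypothetical names, `app` is assumed to be any object with a writable `title` attribute (as `rumps.App` has), and the sketch also shows a final "[3/3]" step before the title resets.

```python
from typing import Callable, List, Sequence


def progress_title(index: int, total: int) -> str:
    """Format the menu bar title for the file currently being processed."""
    return f"[{index}/{total}]"


def transcribe_with_progress(
    app,
    file_paths: Sequence[str],
    transcribe: Callable[[str], str],
    idle_title: str = "DropVox",
) -> List[str]:
    """Transcribe each file in order, updating app.title before each one.

    `app` only needs a writable `title` attribute; `transcribe` is a
    hypothetical hook that maps a file path to text.
    """
    texts: List[str] = []
    total = len(file_paths)
    try:
        for index, path in enumerate(file_paths, start=1):
            app.title = progress_title(index, total)
            texts.append(transcribe(path))
    finally:
        # Restore the idle title even if a transcription raises
        app.title = idle_title
    return texts
```

The function would be the `target` of the `threading.Thread` shown above, with the app instance passed in through `args`. The `finally` block keeps the menu bar from being stuck on a stale counter after an error.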

The Polish: Notifications and Clipboard

macOS notifications via osascript:

import subprocess

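def notify(title: str, message: str):
    # Escape backslashes and double quotes so the text cannot break
    # out of the AppleScript string literal
    def esc(text: str) -> str:
        return text.replace("\\", "\\\\").replace('"', '\\"')

    script = f'display notification "{esc(message)}" with title "{esc(title)}"'
    subprocess.run(["osascript", "-e", script])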

Clipboard via pyperclip:

import pyperclip
pyperclip.copy(transcription_text)

Now the workflow is: click → select → wait → paste. Four steps, zero terminal commands.
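When several files are selected at once, the transcripts have to be merged into a single string before they reach the clipboard. One possible approach is sketched below. `format_for_clipboard` is a hypothetical helper, and the layout (file name, then text) is an assumption, not necessarily what DropVox does.

```python
from pathlib import Path
from typing import Sequence, Tuple


def format_for_clipboard(results: Sequence[Tuple[str, str]]) -> str:
    """Merge (file path, transcript) pairs into one clipboard-ready string."""
    cleaned = [(Path(path).name, text.strip()) for path, text in results]
    if len(cleaned) == 1:
        # A single file needs no label: return just its text
        return cleaned[0][1]
    # Several files: label each transcript with its file name
    return "\n\n".join(f"{name}:\n{text}" for name, text in cleaned)
```

The returned string is what would be handed to `pyperclip.copy`.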

Model Selection

Whisper offers models from tiny to large. I added a submenu to switch between them:

Model	Speed	Accuracy	Parameters
tiny	Fastest	Good	39M
base	Fast	Better	74M
small	Medium	Very good	244M
medium	Slow	Great	769M
large	Slowest	Best	1550M

For WhatsApp messages, base is the sweet spot. Quick enough to not feel slow, accurate enough for everyday messages.
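Switching models from a submenu works best if each model is loaded only when first used and then kept in memory. This is a sketch under that assumption. `ModelCache` is a hypothetical class, and the loader is injected so the caching logic does not depend on Whisper being installed.

```python
from typing import Any, Callable, Dict


class ModelCache:
    """Remember the selected model name and load each model at most once."""

    def __init__(self, loader: Callable[[str], Any], default: str = "base"):
        self._loader = loader              # e.g. whisper.load_model
        self._models: Dict[str, Any] = {}  # model name -> loaded model
        self.current = default

    def select(self, name: str) -> None:
        """Called from the submenu; loading is deferred until get()."""
        self.current = name

    def get(self) -> Any:
        """Return the current model, loading it on first use."""
        if self.current not in self._models:
            self._models[self.current] = self._loader(self.current)
        return self._models[self.current]
```

With Whisper, this would be created as `ModelCache(whisper.load_model)`. Switching back to a model that was already used then costs nothing, at the price of keeping several models in memory.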

Going Further: A Real Product

What started as a weekend project evolved into something more. I built a dedicated website at dropvox.app with:

PostHog Analytics - Tracking downloads and user engagement to understand what's working
A Blog - SEO-focused content about audio transcription, privacy, and Whisper
Full SEO - Structured data, sitemap, optimized meta tags for discoverability
i18n Support - English and Portuguese for my primary audiences

The site itself is a Next.js 16 app with Tailwind CSS v4, deployed on Vercel. Clean, fast, and informative.

What I Learned

rumps is perfect for simple menu bar apps - If you need more, look at PyObjC or Swift.
tkinter file dialogs work everywhere - Cross-platform, and they play nicely with other frameworks.
Threading is essential for any processing - Never block the UI thread.
Whisper is remarkable - Running a speech-to-text model locally on a laptop, with this accuracy, felt like magic.
MVPs should be embarrassingly simple - File picker instead of drag-and-drop. Ship it.
Analytics matter - Even for side projects, understanding user behavior helps prioritize features.

What's Next

DropVox works. I use it daily. But the roadmap is tempting:

Native drag-and-drop - A floating window you can drop files onto
Swift rewrite - Better performance, true native feel
Share Extension - Transcribe directly from WhatsApp's share menu
App Store - Maybe, if there's demand

For now, it's open source: github.com/helsky-labs/dropvox

Try It

Download the latest release from dropvox.app or build from source:

git clone https://github.com/helsky-labs/dropvox.git
cd dropvox
python3 -m venv venv
source venv/bin/activate
pip install -e .
dropvox

The first run downloads the Whisper model (~140MB for base). After that, the model loads from a local cache and no further download is needed.
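To see whether that first-run download has already happened, the cache folder can be checked directly. This sketch assumes the default location used by the openai-whisper package (`~/.cache/whisper`, or under `XDG_CACHE_HOME` if that is set) and a file named after the model, such as `base.pt`. Both are assumptions that may differ between versions, and the large model in particular is stored under a versioned name. `default_cache_dir` and `is_model_cached` are hypothetical helpers.

```python
import os
from pathlib import Path
from typing import Optional


def default_cache_dir() -> Path:
    """Assumed default Whisper cache folder (not read from the library)."""
    base = os.getenv("XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache"))
    return Path(base) / "whisper"


def is_model_cached(name: str, cache_dir: Optional[Path] = None) -> bool:
    """Return True if a file called '<name>.pt' exists in the cache folder."""
    directory = Path(cache_dir) if cache_dir is not None else default_cache_dir()
    return (directory / f"{name}.pt").is_file()
```

A call such as `is_model_cached("base")` could decide whether to show a "downloading model" notification before the first transcription.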

Building something? I'm always happy to chat. Find me on GitHub.