Whisper Desktop: Free Speech-to-Text in Under 5 Minutes

A 5-minute quick start guide for Whisper Desktop, an easy-to-install Windows application for running the open-source Whisper transcription model from OpenAI.

Mike Wolfe

Apr 6, 2025 • 4 min read

I've set up an AI workflow managed by a custom Notion database template to crank out one of my Access User Group video recap articles in about 10 minutes, start to finish.

When I first started doing these recaps, I relied on YouTube's automatic transcriptions, which were tedious to copy and paste and, more importantly, were not always available.

Enter Whisper AI and the Whisper Desktop GUI.

Whisper AI

Whisper AI is a free and open-source project released by OpenAI in 2022 (back when the "Open" in "OpenAI" actually meant something).

Here's a description of the project from its GitHub readme:

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

The two best things about this software are that it is:

Free
Small enough to download and run on a standard computer

Official Installation Instructions

The official installation instructions for Whisper use Python to run the tool.
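
For reference, once the Python package and its dependencies are in place, a transcription script looks roughly like the sketch below. It assumes the openai-whisper package and ffmpeg are installed; the file name meeting.mp3 and the helper names to_clock and transcribe_to_lines are made up for illustration.

```python
def to_clock(seconds: float) -> str:
    """Convert an offset in seconds to H:MM:SS (fractions are dropped)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def transcribe_to_lines(audio_path: str, model_name: str = "medium") -> list[str]:
    """Transcribe an audio file and return one timestamped line per segment."""
    # Imported lazily so to_clock() works even without the package installed.
    import whisper  # assumption: installed with `pip install openai-whisper`

    model = whisper.load_model(model_name)  # downloads the model on first use
    result = model.transcribe(audio_path)   # dict with "text" and "segments"
    return [
        f"[{to_clock(segment['start'])}] {segment['text'].strip()}"
        for segment in result["segments"]
    ]


# Example usage (hypothetical file name):
#     for line in transcribe_to_lines("meeting.mp3"):
#         print(line)
```

The script itself is short; the friction is in getting Python, PyTorch, and ffmpeg installed before it will run.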

There are a couple problems with this:

The install requires command-line steps for each of several dependencies
As an interpreted language, Python is very slow to execute compared to compiled languages such as C++

The second point is usually not a big deal.

The typical tradeoff with Python is faster development in exchange for slower execution. With modern hardware and typical programming tasks, though, the raw performance difference is often measured in tens or hundreds of milliseconds. In other words, it's negligible as far as humans are concerned.

AI transcription of speech to text is the exception. It is very processor-intensive (CPU and GPU), so the performance gap becomes noticeable.

Whisper Desktop

Whisper Desktop solves both of these issues.

Installation involves downloading a single zip file and extracting a portable .exe
The C++-based tool runs more than twice as fast as the Python implementation

In the words of the tool's author:

Much faster than OpenAI’s implementation
On my desktop computer with GeForce 1080Ti GPU, medium model, 3:24 min speech took 45 seconds to transcribe with PyTorch and CUDA, but only 19 seconds with my implementation and DirectCompute.
Funfact: that’s 9.63 gigabytes runtime dependencies, versus 431 kilobytes Whisper.dll
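
To put the quoted numbers in perspective, here is a quick back-of-the-envelope calculation. It is only arithmetic on the figures quoted above and assumes decimal units (1 GB = 1,000,000 KB) for the size comparison; the variable names are my own.

```python
# Timings quoted above for the same clip with the medium model.
pytorch_seconds = 45        # PyTorch + CUDA
directcompute_seconds = 19  # Whisper Desktop + DirectCompute

speedup = pytorch_seconds / directcompute_seconds
print(f"Speedup: {speedup:.2f}x")

# Runtime footprint quoted above, assuming decimal units.
python_deps_kb = 9.63 * 1_000_000  # 9.63 GB of runtime dependencies
whisper_dll_kb = 431               # 431 KB Whisper.dll

size_ratio = python_deps_kb / whisper_dll_kb
print(f"The Python dependencies are about {size_ratio:,.0f} times larger")
```

That works out to a speedup of roughly 2.4x, with a runtime footprint more than 20,000 times smaller.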

Quick Start Guide

Follow the instructions here.

To give you an idea of just how easy it is to get started, I'll recreate the instructions here for your reference:

Download WhisperDesktop.zip from the “Releases” section of this repository, unpack the ZIP, and run WhisperDesktop.exe.
On the first screen it will ask you to download a model. [The author] recommends ggml-medium.bin (1.42GB in size), because [they’ve] mostly tested the software with that model.
The next screen allows to transcribe an audio file.
There’s another screen which allows to capture and transcribe or translate live audio from a microphone.

NOTE: I went with the author's suggestion to use the ggml-medium.bin model and I've been quite happy with it.

Performance Results

It takes just under one minute to transcribe ten minutes of audio on my home computer.

Here are the specs for the machine (which was near top of the line...about five years ago):

AMD Ryzen 9 3900X 12-Core processor (3.8 GHz)
32 GB RAM (3200 MHz)
AMD Radeon RX 5700 XT (8 GB VRAM)
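
If you want a rough idea of how long a longer recording would take, the sketch below extrapolates from the figure above (ten minutes of audio in about one minute, or roughly 10x real time). It assumes processing time scales linearly with audio length, which I have not verified; the function name and default factor are illustrative only.

```python
def estimate_transcription_minutes(audio_minutes: float,
                                   realtime_factor: float = 10.0) -> float:
    """Estimate processing time, assuming it scales linearly with audio length.

    realtime_factor is how many minutes of audio are processed per minute
    of wall-clock time (about 10 on the machine described above).
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_minutes / realtime_factor


for length in (10, 30, 60, 90):
    estimate = estimate_transcription_minutes(length)
    print(f"{length:>3} min of audio -> about {estimate:.0f} min to transcribe")
```

On different hardware, time a short clip first and plug your own factor into realtime_factor.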

Acknowledgements

Cover image generated by FLUX-pro-1.1-ultra