Last week Sam Altman announced Whisper v3 on stage at OpenAI’s Dev Day. Like many in the community, I was eager to see how the model performed. After all, at Deepgram we love all things voice AI, so we decided to take it for a spin.

This post shows how I got the model running and the results of my testing. Getting the test setup working was relatively straightforward; the results, however, held some surprises.

I’ll show the peculiarities up front, then walk through the full analysis.
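Before diving in, it helps to have a yardstick for comparing a model’s output against a ground-truth transcript. The post doesn’t spell out its scoring method, but word error rate (WER) is the standard metric for this kind of evaluation; here is a minimal sketch (the function name `wer` and the normalization are my own, for illustration):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # d[i][j] = minimum edits to turn the first j hypothesis words
    # into the first i reference words (Levenshtein distance).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,      # deletion
                d[i][j - 1] + 1,      # insertion
                d[i - 1][j - 1] + sub # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("i have one strider", "i have one strider"))  # → 0.0
print(wer("a b c d", "a x c d"))                        # → 0.25
```

A WER of 0.25 means one word in four was wrong; production evaluations usually add punctuation stripping and number normalization on top of this.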

🔍 The Peculiarities We Found

Peculiarity #1:

Start at 4:06 in this audio clip (the same one embedded above). This is one of the files we used in our testing.

At that moment in the audio, the ground-truth transcription reads “Yeah, I have one Strider XS9. That one’s from 2020. I’ve got two of the Fidgets XSR7s from 2019. And the player tablet is a V2090 that’s dated 2015.”

However, the Whisper-v3 transcript says: