Thinking About Text-to-Speech AI on Mac

Recently in the tech community, there’s been a lot of conversation around OpenAI’s Whisper engine, which transcribes audio recordings to text with a fairly high level of accuracy. Outside of that, the broader conversation about AI has been a mix of good and bad. But Whisper has been fascinating to watch, as people have found ways to run it locally on a Mac. Just the other day I even saw an app someone launched, called MacWhisper, which lets you take full advantage of Whisper without having to do anything in Terminal.

A screenshot of MacWhisper running on macOS, showing a transcription of a Mitch Hedberg stand-up performance.
This was one of the sections that the AI nailed! The mistakes were a lot less prevalent than I was expecting.

This got me thinking about the other possibilities. If you can run Whisper on macOS (and I don’t believe it’s even optimized for Apple Silicon yet), there have to be other AIs that could be run on the Mac. I know that AI-generated art is a very touchy subject right now, so I won’t even suggest that today, but I do have something else I’d be very interested in: text-to-speech.

It’s great that Whisper is able to take audio and transcribe it to text so well. In my testing, I used the basic version of MacWhisper (which has the smaller, less accurate model) and gave it a clip of Mitch Hedberg performing at Just for Laughs. It transcribed nearly everything correctly, struggling only with uncommon proper nouns (Hedberg, Yoplait, etc.) and the occasional moment where he mumbled or repeated something. I’d give it an 8/10, since the transcript needed only a quick cleanup pass to be ready for prime time.

Now let’s take those tools the other way. There are plenty of services online that convert text to speech, and they vary widely in how natural they sound. I have plans this year to add audio versions of every post I put up on this blog, starting with my posts from the beginning of this year, but I’m still settling on a method for this. Seeing how well Whisper can run on a Mac, though, I really don’t want an online solution. I want Whisper, but for generating speech from blog posts.
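For a sense of what a fully local workflow could look like today, here’s a minimal sketch that hands a plain-text post to the `say` command that ships with macOS. To be clear, this is only a baseline using the system voices, not the Whisper-quality speech model I’m hoping for. The function names and the file name `post.txt` are made up for illustration, and the "Samantha" voice in the usage note is assumed to be installed.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_say_command(text_file: str, audio_file: str,
                      voice: Optional[str] = None) -> List[str]:
    """Build the argument list for the macOS `say` tool.

    -f reads the text to speak from a file, -o writes the audio to a file
    instead of playing it, and -v picks a voice.
    """
    command = ["say", "-f", text_file, "-o", audio_file]
    if voice is not None:
        command += ["-v", voice]
    return command


def narrate_post(text_file: str, voice: Optional[str] = None) -> Path:
    """Render a text file to an AIFF file next to it (macOS only)."""
    audio_file = Path(text_file).with_suffix(".aiff")
    # check=True raises CalledProcessError if `say` exits with an error.
    subprocess.run(build_say_command(text_file, str(audio_file), voice),
                   check=True)
    return audio_file
```

On a Mac, calling `narrate_post("post.txt", voice="Samantha")` should leave a `post.aiff` beside the text file. Keeping the command builder separate from the function that runs it means the `say` call could later be swapped for a better local model without touching the rest of the workflow.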

I have a feeling we’re a lot closer to seeing this tech come into the world at a level that’s accessible to everyone. Yes, there are potential security concerns I’ve seen expressed over an effort that Microsoft is working on, but the unfortunate reality is that any technology brought into the world can and will be used by bad actors. We’re just going to need to find ways to grow as a society. Even without an easy and straightforward way to do this at home, deepfakes are already a thing, and I think the accessibility applications of something like this far outweigh the risks.
