Descript Changes Everything
When you listen to someone’s voice online, are you sure it’s a real person? Because there’s a chance it might be Descript.
Have you seen this Arthur C. Clarke quote about new technology?
“Any sufficiently advanced technology is indistinguishable from magic.”
It’s what people thought when they heard their first radio or experienced their first elevator.
It’s also how I felt the other day when I stumbled across something called Descript.
At first glance, Descript doesn’t sound that remarkable. It is a collaborative audio/video editor that includes a screen recorder, publishing and full multitrack editing.
This seems very neat and useful but it hardly sounds like magic.
But it is. Let me explain.
Imagine you just finished a clip of yourself talking. This could be a podcast, YouTube video, webinar or recorded message to a loved one on their birthday.
Upload the clip to Descript.
After a couple of minutes, Descript will hand you back a text version of everything you said. All of your words will appear in something that looks like a Word or Google document.
And I mean everything you said, including the uhs, hems, ahs, sighs, sneezes, pauses and the sound of your mom shouting that dinner is ready.
Now take this text and edit it just like you would a Word or Google document. Go ahead and delete all those uhs, hems, ahs, sighs, sneezes, pauses and of course mom calling you to dinner.
Okay, play that clip of yourself talking again. It will be as smooth as silk. All those ugly bits have disappeared and you sound like a pro.
Think about what just happened. You speak. This speech is reproduced as words on a page. You change the words on the page, and that changes how you spoke. Cool, right?
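I don't know how Descript does this under the hood, but the basic idea can be sketched in a few lines. Assume the transcription step gives every word a start and end time in the recording (that part is my assumption, as are the names Word, FILLERS and keep_segments, which are made up for illustration). Deleting a word from the text then just means skipping that slice of audio:

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds into the recording (assumed to come from transcription)
    end: float


# Hypothetical list of filler words to strip out of the transcript.
FILLERS = {"uh", "um", "ah", "hem"}


def keep_segments(words, fillers=FILLERS):
    """Return the (start, end) audio spans that survive once fillers are deleted."""
    segments = []
    for w in words:
        if w.text.lower().strip(".,!?") in fillers:
            continue  # deleting the word in the text drops its slice of audio
        if segments and abs(segments[-1][1] - w.start) < 1e-9:
            # This word starts exactly where the last kept span ends: extend it.
            segments[-1] = (segments[-1][0], w.end)
        else:
            segments.append((w.start, w.end))
    return segments


transcript = [
    Word("So", 0.0, 0.3),
    Word("uh", 0.3, 0.8),
    Word("welcome", 0.8, 1.3),
    Word("um", 1.3, 1.9),
    Word("back", 1.9, 2.2),
]

print(keep_segments(transcript))
```

Play back only the spans this returns, one after the other, and the uhs and ums are gone. A real editor would also have to smooth the joins between spans so the cuts aren't audible, which this sketch ignores.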
There’s more.
Let’s say you record a ten-minute audio clip and then realize that, right smack dab in the middle, you screwed up. Maybe you said “blah blah Google CEO Bill Gates” when you should have said “Google CEO Sundar Pichai.”
Okay, fire up Descript and get a text version of all your words. Delete “Bill Gates” and write in “Sundar Pichai.”
Now play that audio clip again.
Honest to God, you will hear yourself saying Sundar Pichai. Smooth as silk, and exactly as if you had said it in your very own voice.
To see it for yourself, check out their awesome video below (it’s actually one of the best product ads I’ve seen).
Okay, Descript isn’t magic.
It’s a sophisticated AI that listens to how you sound, and then learns to sound just like you.
But Descript’s technology is close enough to magic that a lot of very smart people are sitting up and taking notice.
The company raised $30 million in January and attracted a Who’s Who of tech personalities who put their own money into the company.
These include Shopify billionaire Tobi Lütke, AngelList co-founder Naval Ravikant, YouTuber Casey Neistat and Twitter co-founder Ev Williams, among others.
So you might be wondering at this point: who’s the genius behind Descript who’s generating all of this excitement?
Well, do you remember Groupon?
The Founder
Descript is led by Andrew Mason, the same person who founded Groupon in 2008 and famously received a $6 billion offer from Google just two years later.
But it turned out Mason was better at founding a company than actually running one. Groupon fell as quickly as it rose, and Mason exited the company in 2013 in the same quirky style in which he had run it.
“After four and a half intense and wonderful years as CEO of Groupon, I’ve decided that I’d like to spend more time with my family. Just kidding — I was fired today.”
After being fired from Groupon, Mason stayed true to his quirky form and almost immediately released an album called Hardly Workin’ (it’s still available on iTunes).
This all seems pretty bizarre until you realize that Mason isn’t your typical tech whiz kid who dropped out of Stanford engineering after six months to start a company.
He has a degree in music from Northwestern. He did drop out of school, but it was a master’s program in public policy. And when the bespectacled 6'4" prodigy finally did come into some money, he didn’t splash out on a Tesla or a loft in San Francisco’s Mission District like some other new millionaires.
He bought a Steinway piano so he could relax playing Bach.
Clearly, Mason isn’t a typical entrepreneur. Just as Descript isn’t a typical company.
It reminds me a lot of the last time a new technology felt like magic.
The Big Picture
In 2020, OpenAI, the artificial intelligence lab co-founded by Elon Musk, quietly introduced a text predictor called GPT-3.
Trained on hundreds of billions of words scraped from the Internet, GPT-3 is designed to predict the most plausible text that should follow a question or request.
It’s really, creepily good at what it does.
Look what it came up with when prompted to rewrite a passage from Harry Potter in the style of Ernest Hemingway:
“It was a cold day on Privet Drive. A child cried. Harry felt nothing. He was dryer than dust. He had been silent too long. He had not felt love. He had scarcely felt hate. Yet the Dementor’s Kiss killed nothing. Death didn’t leave him less dead than he had been a second before. It wasn’t about living or dying really. It wasn’t about death. It was about the Dark Lord, his creator, his engenderer. He was not resurrected. He was created again.”
And by Jane Austen:
“It is a truth universally acknowledged, that a broken Harry is in want of a book. This he knows to his cost; pressing that wretched nose against the window of a bookstore in Diagon Alley the last thing before he goes into hiding for several years, where upon, he goes straight to Azkaban.”
Now remember, this is a machine that’s writing all of this, and doing it within seconds of being asked.
When GPT-3 came out, all the simmering fears about AI boiled to the surface. People who were already worried about AI said here was yet more proof that artificial intelligence is going to replace humans.
The people who weren’t worried about AI disagreed.
No, they said, AI will simply help us do the menial stuff faster. It will give us more freedom to focus on the creative soft skills that machines can never replace.
But after seeing Descript, I wonder what the non-worriers will think.
To me, the Descript AI’s ability to reproduce speech feels a lot like it’s replacing a soft skill.
That’s because the way that we talk is fundamental to how others judge our personality. If we can say big words smoothly and quickly, we sound confident and intelligent. If we stumble, we don’t.
So what happens when a future Descript AI is able to adjust the timbre of our voice, add a purr to our words, or a positive uplift at the end of each sentence?
All these changes in tone, pace and inflection define so much of our personality.
Think about it. What happens when a machine can reproduce anyone’s voice so that it oozes charm, gravitas or charisma?
This feels like a glimpse into a very strange yet probable future where literally every voice we hear in podcasts, videos and movies will be created by a machine.
Let’s take it a bit further.
Put this future Descript AI next to a future version of GPT-3.
The future version of GPT does all the writing. And the future Descript AI does all the talking.
Where does that leave us?
Maybe there will be a human being somewhere at the very beginning of the creative process, providing an original premise or idea.
But what happens to all those other people in the world who are happily hammering out stories on keyboards and speaking into microphones?
It seems like those jobs might just disappear.
Like magic.