In 1962, famed science fiction writer Arthur C. Clarke was visiting a friend at Bell Labs, where he witnessed a demonstration in which an IBM 704 computer was used alongside a vocoder synthesizer to recreate the song Daisy Bell. Inspired by the demonstration, Clarke incorporated the moment into his book and film 2001: A Space Odyssey, in a scene where the HAL 9000 computer sings Daisy Bell as it is being disabled. This meeting of pop culture and technology has been referenced repeatedly ever since. In the television show American Horror Story, an android recreation sings Daisy Bell, and Microsoft’s personal assistant, Cortana, may sing Daisy Bell when asked to sing a song.

Text to speech, or speech synthesis, is the process of producing human speech using artificial methods. Speech synthesis has a long history that dates back to about 1000 AD, and it has grown steadily more popular as the technology has advanced. Today speech synthesis is used in industries from healthcare to finance, and by almost everyone with a smartphone, thanks to the rise of virtual assistants and accessibility apps. Text to speech systems have reshaped how work is done, doing things that could not have been imagined when they were first introduced.

Text to speech technology gets even more interesting in popular culture. Since Frankenstein in 1818, speech synthesis has been used in books, TV shows, and movies to depict robots, androids, and AI in different scenarios. This includes everything from malicious artificial intelligence takeovers (Dial F for Frankenstein, Battlestar Galactica, I, Robot) to more positive portrayals (Star Trek, WALL-E, Lost in Space). The use of AI in various forms of storytelling has in turn influenced real-life technologies, leading to a symbiotic relationship between technology and popular culture.

Speech synthesis has also had a large impact on the media and entertainment industry. As AI becomes more mainstream, media and entertainment companies are using it to streamline their creative processes. This includes the use of the technology in podcasting, filmmaking, audiobooks, and other areas.

Evolution of text to speech technology in the media

The introduction of AI (and by extension text to speech systems) into fiction is credited to Mary Shelley, the author of Frankenstein. Frankenstein, widely regarded as the first true science fiction book, tells the tale of a young scientist who builds a sentient creature. Written more than 200 years ago, the creature bears very little similarity to actual artificial intelligence today, but it is still referenced as the original depiction of artificial intelligence in a piece of media. In 1920, the Czech play Rossumovi Univerzální Roboti introduced the term “robot”, from the Slavonic word “robota”, which means forced labor. The word is still used today by both science fiction writers and technologists.

By the 1960s, science fiction depicting artificial intelligence had become a very popular sub-genre, with stories and movies like Lymphater’s Formula, Dial F for Frankenstein, Colossus, and eventually 2001: A Space Odyssey entering mainstream consciousness. By the 2000s, there were many iconic AI characters in books, short stories, and films, in media like Westworld, Battlestar Galactica, The Transformers, Terminator, and The Matrix. The rise of the genre also coincided with the golden age of speech synthesis systems. The first general English text to speech system was created in Japan in 1968 by Noriko Umeda and her colleagues, leading to an increase in research into commercial speech synthesis systems. Although artificial intelligence in the media is usually criticized for being unrealistic, real-life scientists, from Jaron Lanier to Yoshua Bengio, state that they have drawn inspiration from it, with Lanier saying that movies like 2001: A Space Odyssey and The Terminator have been especially influential.

The evolution of artificial intelligence in the real world has also influenced the way the creative industry works. Nowadays, AI is being used for editing movies, predicting the success of a piece of media, composing music, and even writing scripts for movies and TV shows.

Text to speech in film and television

AI voices and virtual assistants are abundant in film and television, where they are often depicted as near-future antagonists trying to take over the world. In these settings, they are usually portrayed as sentient, intelligent beings who turn on their creators. More positive portrayals cast AI voices as friendly companions or romantic interests. Together, these stories have introduced some iconic AI voices and virtual assistants to the general public, including Jarvis from the Iron Man and subsequent Avengers movies, the Star Trek computer, ARIIA in Eagle Eye, Samantha from Her, and PAL from the 2021 animated movie The Mitchells vs. the Machines.

Meanwhile, text to speech systems have also proven useful behind the scenes when making these movies and TV shows. With text to speech software, filmmakers and editors can convert written text into human speech, which is useful when creating voiceovers and subtitles. Using text to speech software can lower the cost of a production, making filmmaking more accessible for indie and newer filmmakers. Speech synthesis also creates an opportunity to feature unique voices and characters that might otherwise be limited by what human voices can do.

Text to speech in gaming

The video game industry is known for being at the forefront of innovation and technological advancement, and speech synthesis is no exception. Game developers are using text to speech systems to bring game characters to life, allowing them to interact with players in a more responsive manner. This gives gamers more input into world building and characterization, making for a more engaging experience. Last year, the game studio EA developed a patent for a system that would allow players to provide the voices for game characters. Players would be able to type the dialogue that they want their character to say, which would then go through a text to speech synthesizer. Speech synthesis is also used to voice game characters in different languages, making games more accessible to non-English-speaking communities.

As in filmmaking, incorporating speech synthesis systems into the game development process decreases the cost of creating a game, albeit for different reasons. Game development includes a drafting period in which storylines, characters, and animations are formed and finalized. This process usually requires a lot of changes to scripts and characters, which is where text to speech systems come in. Voicing these drafts with speech synthesis software makes game development cheaper and more efficient.

Text to speech and social media

Speech synthesis has been embraced wholeheartedly by social media users and influencers, who use the text to speech features built into various social media apps in their videos. The text to speech features on TikTok and Instagram allow users to voice their videos using a list of available voices ranging from robotic to soothing. This lets creators add context and explanations to their videos, replace their own voices for anonymity, or make their videos accessible to a larger audience. On Twitch, streamers can use Twitch’s text to speech donations feature, which lets a robot voice read each donor’s name and attached message.

On YouTube, creators are using text to speech software in more interesting ways. When going through the platform, you might discover one or two (or a thousand) videos using the voices of famous politicians and entertainers. These videos are mostly made for entertainment: one showcases American presidents playing UNO, while another shows Bill Gates and Socrates debating the benefits and threats of AI. Ironically, speech synthesis on social media platforms allows users and creators to communicate their thoughts and feelings in a more personal manner, while creating new and cheaper ways for them to tell stories.

Text to speech in personal entertainment

These days, speech produced by text to speech software sounds convincingly human, making it suitable for voicing audiobooks and podcasts. This means that a broader range of audiobooks can be produced at a lower cost by independent and new authors who do not have a lot of backing or funding. Although the use of speech synthesis in voicing audiobooks is still controversial (leading audiobook app, Audible, does not allow AI-voiced audiobooks on its platform), it provides an alternative for authors.

Text to speech systems are also used by virtual assistants, as seen in Apple’s Siri, Microsoft’s Cortana, and Amazon’s Alexa. These virtual assistants are used daily by over 123 million people in the US alone for tasks ranging from reading texts and email messages aloud to placing phone calls and taking dictation. For many people living with disabilities, virtual assistants are a game changer in terms of accessibility.

Conclusion

Text to speech systems have proven useful in various aspects of the media and entertainment industry, both in front of the screen and behind the scenes. Although there are some concerns about how onscreen depictions might influence real-life ideas of how AI or text to speech systems work, it is important to acknowledge the symbiotic relationship between science fiction and real-life scientists and engineers. Appreciating the difference between creativity and reality is important when talking about the abilities of artificial intelligence.

As text to speech technology advances, the way it is used in this industry will evolve with it. For now, speech synthesis is playing a huge role in the media and entertainment industry, helping to make it more accessible to newcomers and independent creators while pushing the limits of their imaginations.
