Audiobooks: Issues and Future Developments
Recent issues surrounding audiobooks
The voice services used in public announcements, news-reading services, and e-book devices rely on TTS (text-to-speech) technology, which converts written text into speech. Because it speaks without natural intonation and breaks sentences in odd places, listeners have to concentrate to follow what it is saying. Recently, as more people read books while traveling, e-books are increasingly being turned into audiobooks, and most people want to hear a human voice when they listen to one. The audiobook market is growing rapidly, but production time, cost, and technological limitations have slowed its progress. However, A.I. (artificial intelligence) voice synthesis technology that can process natural language, synthesize voices, and recognize context is now being applied, so listeners can feel as if a real voice actor is reading the story. If this technology becomes capable of expressing emotion, A.I. voice actors might be able to work in almost every area except audio dramas, which require reading between the lines. In short, A.I. is expected to meet listeners' subtle needs.
The publisher Woongjin Thinkbig, for example, has recently released two A.I.-based audiobooks: Smart Parenting for Smart Kids and The Prince by Niccolò Machiavelli. Another publisher, Daekyo Junior, has released an audiobook edition of the children's book Picky Kid in two versions: one recorded by a human voice actor and one produced with voice synthesis technology. The latter reads the story in the distinctive voice of an A.I. An official from the publisher said, “The voice synthesis version is a perfect reproduction of the actual actor's voice. Because our subscribers like it, we plan to use the technology to expand into overseas markets by producing versions in other languages such as English, Chinese, and Japanese, and to apply it to educational content and new broadcast content.”
The Old Man and the Sea read by Yoo In-Na
Meanwhile, in April 2021, the e-book service provider Millie showcased the industry's first audiobook service using A.I. voices. It offers five A.I. voices, chosen according to the type of book. The voices were developed to sound like real people by analyzing recordings of actual voice actors. An official from Millie said, “Some customer reviews even say it is easier to listen to because the pronunciation is clear. We have released 100 A.I. audiobooks and plan to add 500 titles every month starting this month.”
Possibility of releasing A.I. audiobooks
When casting a voice actor, publishers have had to coordinate schedules and set aside a considerable amount of time for recording. Short breaks were a must to keep the actor's voice in the best condition. While the average running time of a full audiobook is 7 hours, it takes about 20 to 30 hours to record one. Because voice actors record for only 3 to 4 hours a day to protect their voices, it usually takes about a week to finish a book. For example, the 192-page Kim Ji Young, Born 1982 (Minumsa) runs about 4.5 hours. Recording a book of that length takes at least 4 days, and turning the recording into a fully mastered audiobook takes another full day. In short, a 4.5-hour audiobook takes 5 days to make.
Now suppose that an A.I. voice actor takes their place. Kim Ji Young, Born 1982 contains 3,072 sentences in total. At about one minute per sentence, the whole book takes roughly 51 hours (3,072 minutes). Assuming an 8-hour workday, that is about 6 days for one person, about 2 days for three people, and roughly one day if six people share the work. An insider from the voice synthesis industry went further, claiming that if a celebrity records anywhere from 20 minutes to 3 or 4 hours of speech in total, the A.I. can analyze their voice, tone, accent, pronunciation, and speaking speed and generate any speech in that voice. The same person added that with A.I. technology, an audiobook can be produced in just 10 seconds regardless of the number of sentences.
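The production-time estimate above can be checked with a short calculation. This is a minimal sketch that reuses only the figures quoted in the text (3,072 sentences, about one minute per sentence, an 8-hour workday). It assumes the sentences can be split evenly among workers, and the function name `production_days` is hypothetical.

```python
SENTENCES = 3072          # sentences in Kim Ji Young, Born 1982 (figure from the text)
MINUTES_PER_SENTENCE = 1  # approximate time per sentence (figure from the text)
HOURS_PER_DAY = 8         # working day assumed in the text

# Total working time for the whole book, in hours (3,072 minutes)
total_hours = SENTENCES * MINUTES_PER_SENTENCE / 60


def production_days(people: int) -> float:
    """Working days needed if the sentences are divided evenly among `people` (an assumption)."""
    return total_hours / (HOURS_PER_DAY * people)


print(f"Total: {total_hours:.1f} hours")
for people in (1, 3, 6):
    print(f"{people} person(s): {production_days(people):.2f} days")
```

The exact results (about 6.4, 2.1, and 1.1 days) round to the estimates of about 6, 2, and 1 days given in the text. Real productions may not divide the work this evenly.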
As demand for audio content grows, tech companies, publishers, and platforms are paying attention to the growth potential of the audiobook market, particularly to opening it up with A.I. voice technology. The publishing industry welcomes A.I. technology but remains cautious. Until now, not every print book could be produced in audio; only the few titles most likely to turn a profit have been released as audiobooks. A.I. voice synthesis technology is likely to change this and could boost the audiobook market to the point where it stands shoulder to shoulder with the print book market. The publishing industry is also paying heed to the fact that audiobook sales have a positive effect on print book sales: survey results and statistics released by audiobook service providers show that many audiobook users tend to buy the original print book if they like the audio content. This is expected to create a positive synergy effect for the stagnant publishing market. It is no exaggeration to say that the success of the audiobook market hinges on how faithfully voice synthesis technology can reproduce the human voice.
Copyright law surrounding audiobooks
Because copyright is generally protected for 70 years after the author's death, anyone who wants to use a work within that period needs a license from the rights holder. To produce and distribute an audiobook, a publisher therefore needs to be granted the right to create a derivative work in addition to the exclusive publication right from the original rights holder. If the book is a translation, the publisher must also secure the relevant subsidiary rights.
Concerns surrounding A.I. voices and the need for ethical standards for A.I.
Some critics of A.I. voice technology warn that people could maliciously use celebrities' voices for deceptive purposes. Some worry about “deep voice,” the audio version of the “deepfake.” Voices synthesized with deep voice technology are so refined that ordinary listeners cannot tell them apart from real ones. To make matters worse, ordinary members of the public, not just celebrities, can become victims of such deepfakes. To prevent this, it is strongly recommended that technologies capable of detecting deep voice and other A.I.-synthesized speech be developed alongside the advancement of voice synthesis technology.
Written by Beatrice YongIn Lin (Publisher of Storytel South Korea)