Audiobooks: Issues and Future Developments

 

2021.05.03

 

 

Recent issues surrounding audiobooks

 

The voice services used in public announcements, news readers, and e-book devices rely on TTS (Text-to-Speech) technology, which converts written text into sound. Because conventional TTS lacks natural intonation and breaks sentences in odd places, listeners have to concentrate to keep track of what is being said. Recently, as more people read books while on the move, e-books are evolving into audiobooks, and most listeners want to hear a human voice. The audiobook market is growing rapidly, but production time, cost, and technological limitations have slowed its progress. Now, however, A.I. (artificial intelligence) voice synthesis technology that can process natural language, synthesize voices, and recognize context is being applied, so listeners can feel as if a real voice actor or actress is reading the story. If this technology becomes capable of expressing emotion, A.I. voice actors/actresses may be able to perform in almost every sector except audio dramas, which require reading between the lines. In short, A.I. will be able to meet listeners’ more delicate needs.

 

The application of the A.I. voice synthesis technology will make listeners feel as if a human voice actor/actress is reading the lines.

 

Publisher Woongjin Thinkbig has recently released two A.I.-based audiobooks: Smart Parenting for Smart Kids and The Prince by Niccolò Machiavelli. Another publisher, Daekyo Junior, has released an audiobook edition of the children’s book Picky Kid in two versions, one recorded by a human voice actor/actress and the other produced with voice synthesis technology. The latter reads the story in the distinctive voice of an A.I. An official from the publisher said, “The voice synthesis version is a perfect representation of the actual actor/actress’s voice. As our subscribers like it, we are planning to use the technology to expand into overseas markets by producing multi-language versions such as English, Chinese, and Japanese, adding to educational content and new broadcast content.”
Meanwhile, Naver, a South Korean online platform, is steadily developing “Clova Voice,” a voice technology on its A.I. platform Clova. Using A.I. voice synthesis, Naver has reproduced the voices of actress Yoo In-Na and newscaster Oh Sang-Jin and applied them to the voice news provided on Naver and to some of its audio content. The best example is The Old Man and the Sea, read in the voice of actress Yoo In-Na and released in 2018. The summary of the book, which has a running time of 3 hours and 42 minutes, has drawn more than 600 thousand views in total, which is comparable to selling more than 600 thousand physical copies. Notably, it was not recorded in the actress’s real voice but in an A.I. voice produced with voice synthesis technology. Unless you listen carefully, you will not notice that it was read by an A.I.; subscribers even praise it as easier to listen to because the pronunciation is very clear.

 

The Old Man and the Sea read by Yoo In-Na

 

Meanwhile, in April 2021, the e-book service provider Millie showcased an audiobook service using A.I. voices, a first in the industry. There are five A.I. voices, chosen according to the type of book. They were developed to resemble real human voices by analyzing recordings of actual voice actors/actresses. An official from Millie said, “There are even customer reviews saying that it is more convenient to listen to because the pronunciation is clear. We have released 100 A.I. audiobooks and are planning to add 500 titles every month, starting this month.”

 

Possibility of releasing A.I. audiobooks

 

To cast a voice actor/actress, publishers had to find a suitable slot in the actor’s schedule and spend a considerable amount of time recording. Short breaks between sessions were a must to keep the actor’s voice in the best condition. While the average running time of a full audiobook is 7 hours, it takes about 20 to 30 hours to record one. As voice actors/actresses record for only 3 to 4 hours a day to protect their voices, it usually takes about a week to finish a book. For example, the running time of the 192-page Kim Ji Young, Born 1982 (Minumsa) is about 4.5 hours. Recording a book of that length takes at least 4 days, and producing a fully mastered audiobook from the recording takes another whole day. In short, you need 5 days to make a 4.5-hour audiobook. Now suppose that an A.I. voice actor/actress takes the human’s place. Kim Ji Young, Born 1982 contains a total of 3,072 sentences. At about a minute to record each sentence, the book takes only about 51 hours to finish. That is about 6 days for one person working 8 hours a day, and about 2 days for three people under the same conditions. If six people take on the work, one day is enough to finish the book. A person from the voice synthesis industry went further, claiming that if a famous celebrity records anywhere from 20 minutes to 3-4 hours in total, the A.I. can analyze their voice, tone, accent, pronunciation, and speed and generate whatever speech is needed, and adding that with A.I. technology you can make one audiobook in just 10 seconds regardless of the number of sentences.
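The staffing arithmetic in the paragraph above can be checked with a short sketch. The figures (3,072 sentences, about one minute per sentence, 8-hour days) come from the article; the function names are illustrative, and the even split of work across people is an idealizing assumption rather than a description of any real production workflow.

```python
# Back-of-the-envelope check of the figures quoted in the article.
SENTENCES = 3072            # sentences in the book, per the article
MINUTES_PER_SENTENCE = 1.0  # "about a minute" per sentence, per the article
HOURS_PER_DAY = 8.0         # working day assumed in the article


def total_hours(sentences: int, minutes_per_sentence: float = MINUTES_PER_SENTENCE) -> float:
    """Total working hours if every sentence takes the same time."""
    return sentences * minutes_per_sentence / 60.0


def working_days(hours: float, people: int, hours_per_day: float = HOURS_PER_DAY) -> float:
    """Days needed when the work is split evenly (an idealization) across people."""
    return hours / (people * hours_per_day)


hours = total_hours(SENTENCES)
print(f"total: {hours:.1f} hours")
for people in (1, 3, 6):
    print(f"{people} people: {working_days(hours, people):.1f} days")
```

Unrounded, this gives 51.2 hours in total, or 6.4 days for one person, about 2.1 days for three, and about 1.1 days for six, which the article rounds to 51 hours and to 6, 2, and 1 days.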
In the previous issue, we took an in-depth look at a market in which more and more people are “listening” to books. According to Naver Audioclip, in the roughly one year from July 2018, when it began offering audiobooks, to September 2019, more than 100 thousand subscribers listened to 8,700 audiobooks, and cumulative sales reached 180 thousand titles. Another audiobook service platform, Millie, announced that its audiobook membership has grown 1.8-fold compared to two years ago and that the proportion is rising each year, with more than one-fifth of its entire membership (23.6%) now enjoying audiobooks. However, of 100 thousand e-book titles, only about one thousand, or one percent, have been made into audiobooks. While demand is rising, the audiobook industry has had difficulty securing a variety of content because of the time and cost of recording. A.I.-produced audiobooks, by contrast, are economically efficient, as a full audiobook can be finished in just one day. An official from an A.I. voice service provider said, “Compared to conventional audiobook production, where you have to spend millions of won to hire a voice actor/actress, A.I. can mass-produce a variety of content for just one-tenth of the cost.”

 

The voice synthesis technology is likely to upgrade audiobooks to the level of paper books in the near future.

 

As demand for audio content grows, tech companies, publishers, and platforms are paying attention to the growth potential of the audiobook market and are pioneering it with A.I. voice technology in particular. The publishing industry welcomes A.I. technology but remains cautious. Until now, not every print book could be produced in audio; only the few titles with a high chance of turning a profit have been released as audiobooks. A.I. voice synthesis technology is likely to lift the market to a level where it can stand shoulder to shoulder with the paper book market. It is no exaggeration to say that the audiobook market’s success hinges on how perfectly voice synthesis technology can reproduce the human voice. The publishing industry is also taking note of the fact that audiobook sales have a positive effect on paper book sales. Survey results and statistics announced by audiobook service providers show that many audiobook users tend to buy the original paper book if they like the audio content. This is expected to bring a positive synergy to the stagnant publishing market.

 

Copyright law surrounding audiobooks

 

In principle, copyright is protected for 70 years after the author’s death, so permission to license a work must be obtained from the original rights holder. To produce and offer an audiobook, you therefore need to be granted the right to create a derivative work, on top of the exclusive publication rights, from the original rights holder, and if it is a translated work, you also need to secure the subsidiary rights.
With regard to the voice, which accounts for the largest part of audiobook production, protecting narrators’ rights is drawing more attention as the market grows. In Korea, under Article 2, No. 4 of the Copyright Act, voice actors/actresses are defined as performers and thus hold neighboring rights. Therefore, any voice actor/actress who has narrated, read, or otherwise expressed an existing work is protected as a performer.
Then, will an A.I. voice actor/actress also have these neighboring rights? In principle, copyright is recognized only for “people,” meaning the humans who created the works. If voice providers, including voice actors/actresses whose voices have been used for an A.I. service, are not adequately protected, their position will be severely threatened. Voice actors/actresses take part in a wide variety of voice work, such as narrations, promotional videos, advertisements, voiceovers, radio/audio dramas, and animations, as well as audiobook recordings, in-game character voice-overs, and in-game narrations, which have recently been receiving public attention. According to Voices.com, an international voice acting company, the relevant global market is expected to reach 11.5 trillion won by 2025. Over the same period, the audiobook market is expected to reach 1.46 trillion won and the in-game voice acting market 345 billion won.
Public attention today is therefore focused on how the voice actor/actress market will be protected, how works created with A.I. technology will be protected, and who will be responsible for them. In December 2020, it was reported that the Korean government would preemptively enact relevant legislation in response to the advancement of A.I. technology. The law is expected to clarify legal responsibilities over social issues, including whether intellectual property rights will be recognized when an A.I. creates a work or develops a product, and to reduce disputes over compensation arising from A.I.

 

Concerns surrounding A.I. voices and the necessity of setting an ethical standard for A.I.

 

Some are critical of A.I. voice technology because celebrities’ voices can be maliciously used for deceptive purposes. Some worry about “deep voice,” the audio version of “deep fake.” Voices synthesized with deep voice technology are so refined that ordinary people cannot tell them apart from real ones. To make matters worse, even members of the general public can become victims of deep fake cases. To prevent this, it is strongly recommended that technologies capable of detecting deep voice and A.I.-synthesized speech be developed alongside the advancement of voice synthesis technology.
Furthermore, SBS, a Korean broadcaster, aired a TV show titled “Battle of the Century: A.I. vs. Human” on January 29, 2021, that made use of A.I. voice technology. Hearing singer Kim Bum-Soo’s famous song “I Miss You” sung in the A.I. voice of the late Kim Kwang-Seok shocked many of his fans and viewers. However, rapidly developing A.I. technology is a double-edged sword, and issues surrounding A.I. ethics have emerged as social problems. Professor Choi Byeong-Ho of the Human-inspired A.I. Research Center at Korea University pointed out, “Only a few A.I. service providers have an ethical standard. The Korean government and society should establish a system preemptively, but changes are made only after a relevant incident occurs. Abuse of A.I. technology takes place because technological development outpaces the establishment of systems. The A.I. ethical standard today is merely at the declarative level. It needs a systematic approach.” Above all, the personal rights and copyrights of the deceased are cited as the major issue, since personal rights and human dignity could be damaged if the deceased can be summoned at any time by technology for commercial purposes.
It is time to set an “A.I. ethical standard that takes humanity as the priority,” together with specific implementation guidelines and a change in public awareness. Applied alongside relevant laws and systems, advancing A.I. technology is expected to let audiobook and voice-based services lead the industry in an ideal direction.

 


Written by Beatrice YongIn Lin (Publisher of Storytel South Korea)


#Audiobooks#A.I.#Copyright law#deep voice