Friday, May 15, 2020

INFORMATIZED CAPTION ENHANCEMENT BASED ON IBM WATSON API AND SPEAKER PRONUNCIATION TIME-DB

Author :  Yong-Sik Choi

Affiliation :  Department of Computer Science and Engineering, Dongguk University

Country :  Korea

Category :  Computer Science & Information Technology

Volume, Issue, Month, Year :  8, 2, January, 2018

Abstract 

This paper aims to reduce the inaccuracy of existing informatized captions in noisy environments by using additional caption information. The IBM Watson API can automatically generate an informatized caption, including timing information and speaker ID information, from voice input. When the voice signal contains noise, however, the API's recognition results degrade and cause errors in the informatized caption. Such errors are especially common in movies, where background music and special sound effects accompany the speech. To reduce these errors, the original caption text and the voice information are entered at the same time, and the informatized caption that the IBM Watson API produces from the voice information is compared with the original text to automatically detect and correct the erroneous parts. In this process, each corrected word is converted into informatized caption form using a database of the average pronunciation time of each word for each speaker. In this way, more precise informatized captions can be generated on top of the IBM Watson API.
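The compare-and-correct step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the input format (a list of dicts with `word`, `start`, `end`, and `speaker`), the function name `correct_caption`, the `time_db` mapping keyed by `(speaker, word)`, and the `default_duration` fallback are all assumptions made for illustration, and the alignment uses Python's standard `difflib.SequenceMatcher` rather than whatever comparison method the paper uses.

```python
from difflib import SequenceMatcher


def correct_caption(recognized, reference_words, time_db, default_duration=0.3):
    """Align recognized words with the original caption text and fix mismatches.

    recognized:      list of dicts {"word", "start", "end", "speaker"}
                     (hypothetical shape, not the actual Watson response format)
    reference_words: list of words from the original caption text
    time_db:         dict mapping (speaker, lowercase word) -> average duration in seconds
    """
    rec = [item["word"].lower() for item in recognized]
    ref = [word.lower() for word in reference_words]
    matcher = SequenceMatcher(a=rec, b=ref, autojunk=False)

    corrected = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Recognition agrees with the original text: keep the measured timing.
            for offset in range(i2 - i1):
                item = dict(recognized[i1 + offset])
                item["word"] = reference_words[j1 + offset]
                corrected.append(item)
            continue
        if j1 == j2:
            # Recognized words with no counterpart in the original text are dropped.
            continue
        # Mismatch: take speaker and start time from the recognized span if there
        # is one, otherwise continue from the previously emitted word.
        if i1 < i2:
            speaker, start = recognized[i1]["speaker"], recognized[i1]["start"]
        elif corrected:
            speaker, start = corrected[-1]["speaker"], corrected[-1]["end"]
        else:
            speaker = recognized[0]["speaker"] if recognized else 0
            start = 0.0
        for word in reference_words[j1:j2]:
            # Estimate the end time from the speaker's average pronunciation time.
            duration = time_db.get((speaker, word.lower()), default_duration)
            corrected.append(
                {"word": word, "start": start, "end": start + duration, "speaker": speaker}
            )
            start += duration
    return corrected
```

For example, if "weather" is misrecognized as "whether", the sketch substitutes the original word and recomputes its end time from the speaker's entry in `time_db`. It does not shift the timestamps of the words that follow, so a fuller version would also need to resolve any overlaps that the re-timed word introduces.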

Keywords :  Informatized caption, Speaker Pronunciation Time, IBM Watson API, Speech to Text Translation

For More Details  :  https://airccj.org/CSCP/vol8/csit88211.pdf
