Project Reference: | ITP/030/23LP |
Project Title: | Cantonese-English code-switching transcription to formal text |
Hosting Institution: | LSCM R&D Centre (LSCM) |
Abstract: | With demand for Cantonese transcription in media liaison, public information, digital news and social media increasing in recent years, a stable and systematic transcription service is essential to relieve the burden on human transcribers and increase their productivity. Transcription has never been an easy task. Different individuals may interpret the material to be transcribed differently, resulting in large variations in the transcription results. What makes it worse is that Cantonese conversations occasionally include English, which increases the complexity of the transcription task. More importantly, the confidentiality of transcription and translation materials has to be addressed. External online services cannot be relied on under such circumstances, so the self-sufficiency of the service is of high importance. To overcome these difficulties and limitations, we propose a self-contained offline Cantonese automatic speech recognition system together with a Cantonese to written Chinese translation engine. Our research and development undertakings involve: (1) data collection of Cantonese audio and transcripts; (2) data collection of a Cantonese-written Chinese parallel corpus; (3) development of an automatic speech recognition system for Cantonese audio; (4) development of a Cantonese to written Chinese translation engine; (5) development of a user interface for the whole system. |
Project Coordinator: | Dr Chung Dak Shum |
Approved Funding Amount: | HK$ 2.78 M |
Project Period: | 30 Sep 2023 - 31 Mar 2025 |