Project Reference: | ITP/047/19LP |
Project Title: | Neural Machine Translation Engine - Confidential Documents with Domain Focus |
Hosting Institution: | LSCM R&D Centre (LSCM) |
Abstract: | Many business and legal document translations require both confidentiality and accuracy. Despite breakthrough advances in neural machine translation, public translation services such as Google Translate and Microsoft Translator cannot be used when the document being translated must not be seen by other parties. Moreover, most public translation services target general documents and do not meet the accuracy required for business and legal purposes. In this project, we propose to develop an indigenous neural machine translation engine that can be owned by individual organizations, thus keeping the documents to be translated fully confidential. The engine also supports domain-focused translation, which aims to improve translation accuracy. Our research and development undertakings involve: (1) the construction of a Machine Learning Development Environment, including a GPU farm and storage system, that provides services and tools for the machine learning development cycle; (2) the development of tools to automatically classify text data into different domains to enrich the context of the training data; (3) the development of tools to extract and pair up sentences in a parallel text corpus to enhance training effectiveness; (4) the extension of the Transformer architecture with domain adaptation techniques to handle domain-focused translation; (5) the building of a translation engine based on this architecture, which can improve in-domain translation. |
Project Coordinator: | Dr Chung-Dak Shum |
Approved Funding Amount: | HK$16.4 M |
Project Period: | 02 Jan 2020 - 31 Dec 2021 |