Neural Machine Translation Engine – Confidential Documents with Domain Focus
Many document translations in business and legal contexts require a high degree of confidentiality and accuracy. Despite the breakthroughs in neural machine translation, public translation services such as Google Translate and Microsoft Translator cannot be used when the document being translated must not be accessible to other parties. Moreover, most public translation services are designed for general documents and do not meet the accuracy requirements of business and legal use. In this project, we propose to develop an indigenous neural machine translation engine that individual organisations can own, so that the documents being translated remain fully confidential. The engine also addresses the usual problems of domain-focused translation in order to improve translation accuracy.
Our research and development will focus on: (1) the construction of a Machine Learning Development Environment, comprising a GPU farm and a storage system, that provides services and tools for the machine learning development cycle; (2) the development of tools to automatically classify text data into different domains to enrich the context of the training data; (3) the development of tools to extract and pair up sentences in parallel text corpora to enhance training effectiveness; (4) the exploration of different Transformer architectures, utilising their domain adaptation abilities to handle domain-focused translation; and (5) the building of a translation engine based on the chosen architecture to improve in-domain translation.
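
As an illustration of how classifying text into domains can enrich the context of the training data, the sketch below labels each source sentence with a domain and prepends that label as a tag token. This is only one common way to expose domain information to a translation model and is not necessarily the approach taken in this project. The keyword lists, the tag format and the function names `classify_domain` and `tag_source` are hypothetical; a real classifier would be trained on labelled data rather than rely on fixed keywords.

```python
import re
from collections import Counter

# Hypothetical keyword lists, for illustration only.
# A real system would learn domain cues from labelled text.
DOMAIN_KEYWORDS = {
    "legal": {"plaintiff", "defendant", "contract", "clause", "liability"},
    "business": {"revenue", "shareholder", "invoice", "profit", "merger"},
}


def classify_domain(sentence: str, default: str = "general") -> str:
    """Return the domain whose keywords occur most often, or `default`."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    counts = Counter()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        counts[domain] = sum(1 for token in tokens if token in keywords)
    domain, hits = counts.most_common(1)[0]
    return domain if hits > 0 else default


def tag_source(sentence: str) -> str:
    """Prepend a domain tag token (assumed format: <domain>) to the sentence."""
    return f"<{classify_domain(sentence)}> {sentence}"


if __name__ == "__main__":
    for text in (
        "The defendant breached the contract.",
        "Quarterly revenue and profit rose.",
        "The weather is pleasant today.",
    ):
        print(tag_source(text))
```

Sentences that match no keyword fall back to a `general` tag, so untagged and in-domain data can still be mixed in one training corpus.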