Project Reference: | ITP/049/23LP |
Project Title: | Application System Prototype for China HS Code Recommendation Automation with Large Language Model |
Hosting Institution: | LSCM R&D Centre (LSCM) |
Abstract: | This project aims to research how pretrained large language models (LLMs) can be used to build an automated system for China Harmonized System Code (HS code) recommendation. Accurate automation of HS code assignment can significantly reduce customs revenue loss, compliance errors and trade delays. The application can benefit customs authorities, shippers and brokers involved in cross-border trade. In particular, the project can enhance the competitiveness of local SMEs acting as trade intermediaries between Hong Kong and Mainland China. The project is relevant to Government initiatives to enhance local logistics and SME competitiveness in the import and export business, specifically Hong Kong’s role as “International Trade Center”, “International Shipping Center” and regional player in smart logistics per the Chief Executive’s 2022 Policy Address (sections 44, 47 and 49). In the past, the unstructured nature of HS code reference descriptions made automation tedious and painful with traditional methods. When testing pretrained LLMs' enhanced abilities to process nuanced queries, unstructured text, context and other information during retrieval augmentation, we found that certain pipelines allow LLMs to produce accurate HS codes from short user inputs. Based on pilot experimental testing, the best approach involves a step-by-step (two digits at a time) retrieval workflow with in-context text block augmentation, which shows much higher accuracy for the first four digits compared with standard machine learning classification methods (>92%). Based on querying the pretrained LLM, the decision rules from such in-context text blocks also match >90% of the 2022 HS Code Interpretation (China Customs, Volume 1, Chapters 1-3). The project proposes to test various prompting techniques and retrieval augmentation approaches, and to study multiple LLMs, including China-LLMs, to find the best performer(s) and optimize the pipeline accuracy for HS code derivation.
To allow the multistage HS Code recommendation pipeline to be automated and operated as a service, we propose the following engineering development for the prototype: [1] incorporate a prefix trie as a more efficient data structure for encoding hierarchical HS Codes, to reduce multistage retrieval response time; [2] use a vector database to reduce response time through more efficient storage and retrieval of data, given a practically usable context window of around 2000 tokens. Finally, a chat service API will be built to facilitate convenient user interactions. Besides meeting the accuracy, response time and rapid updatability objectives, we will submit a study report summarizing the optimized processing architecture as well as a comparison of three LLMs' performance and limitations for HS Code recommendation. |
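The prefix trie mentioned in the abstract can be pictured with a minimal sketch in Python. It is not the project's implementation: the class names `TrieNode` and `HSCodeTrie`, the `candidates` method and the sample entries are all hypothetical and exist only for illustration. Each trie level holds one two-digit chunk, so a two-digits-at-a-time workflow can fetch the candidate codes for the next stage by walking the already chosen prefix instead of scanning the full code list.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    """One node per two-digit chunk of an HS code (hypothetical layout)."""
    description: str = ""
    children: dict = field(default_factory=dict)


class HSCodeTrie:
    """Prefix trie keyed on fixed-width digit chunks (assumed width: 2)."""

    def __init__(self, step=2):
        self.step = step
        self.root = TrieNode()

    def insert(self, code, description):
        """Store a code such as '0101', creating intermediate nodes as needed."""
        if not code or len(code) % self.step != 0:
            raise ValueError(f"code length must be a multiple of {self.step}")
        node = self.root
        for i in range(0, len(code), self.step):
            chunk = code[i:i + self.step]
            node = node.children.setdefault(chunk, TrieNode())
        node.description = description

    def candidates(self, prefix=""):
        """Return {code: description} for the next level below `prefix`.

        An unknown or malformed prefix yields an empty dict.
        """
        node = self.root
        for i in range(0, len(prefix), self.step):
            node = node.children.get(prefix[i:i + self.step])
            if node is None:
                return {}
        return {
            prefix + chunk: child.description
            for chunk, child in sorted(node.children.items())
        }


# Sample entries for illustration only; a real system would load the
# full China HS code table from an authoritative source.
SAMPLE_CODES = [
    ("01", "Live animals"),
    ("02", "Meat and edible meat offal"),
    ("0101", "Live horses, asses, mules and hinnies"),
    ("0102", "Live bovine animals"),
    ("010121", "Horses: pure-bred breeding animals"),
]

trie = HSCodeTrie()
for code, description in SAMPLE_CODES:
    trie.insert(code, description)

# Stage 1 offers the chapters, stage 2 the headings under the chosen chapter.
stage_1 = trie.candidates("")
stage_2 = trie.candidates("01")
```

In a pipeline of this shape, the dictionary returned at each stage would be the short candidate list placed in the LLM's context, which keeps each prompt small relative to the limited context window the abstract mentions.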
Project Coordinator: | Dr Frank C H TONG |
Approved Funding Amount: | HK$ 2.71 M |
Project Period: | 1 Dec 2023 - 31 May 2025 |