Hybrid Machine Translation Using Malay-English Language Parallel Text Extraction From Comparable Text

Yeong, Yin Lai (2024) Hybrid Machine Translation Using Malay-English Language Parallel Text Extraction From Comparable Text. PhD thesis, Perpustakaan Hamzah Sendut.


Abstract

Machine translation (MT) investigates approaches to translating a text from a source language (SL) into a target language (TL). Parallel text is the essential resource for building the translation model of an MT system. A parallel text is a text together with its translation into one or more other languages. However, few parallel texts are freely available. Thus, three directions were explored in this thesis to improve translation quality despite the limited parallel text. First, we analysed the use of linguistic information in machine translation to compensate for the lack of training data. Second, we studied the problem of acquiring parallel text from comparable texts. Comparable texts are similar texts in different languages that may have been produced independently. Third, we investigated the architectures of statistical machine translation (SMT) and neural machine translation (NMT) in order to combine the strengths of both systems. This study was carried out on English-Malay machine translation in the news domain and the computer science domain. For the first problem, we propose using affixation and part-of-speech information to build a translation model. This improves the BLEU score from 13.40% to 15.41% using 315,194 parallel texts. For the second problem, we propose an algorithm that extracts parallel sentences and parallel fragments (subsentences) from comparable texts. The approach first finds matching comparable texts. A sentence aligner and a classifier are then used to align the sentences within each comparable text.
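The final step of the extraction approach, in which a classifier decides whether a candidate Malay-English sentence pair is parallel, can be illustrated with a minimal sketch. This is not the thesis's aligner or classifier. It assumes two simple features (a sentence length ratio and the fraction of source words with a dictionary translation in the target), a hypothetical toy bilingual lexicon (`LEXICON`), and hand-set logistic weights standing in for a trained model. The helper names `features`, `classify`, and `extract` are illustrative only.

```python
import math

# Hypothetical toy Malay -> English lexicon (assumption, for illustration only).
LEXICON = {
    "kucing": {"cat"}, "makan": {"eat", "eats"}, "ikan": {"fish"},
    "harga": {"price", "prices"}, "saham": {"stock", "shares"},
    "jatuh": {"fall", "fell"},
}

def features(src, tgt, lexicon):
    """Return (length ratio, dictionary coverage) for a candidate pair."""
    s, t = src.lower().split(), tgt.lower().split()
    if not s or not t:
        return 0.0, 0.0
    ratio = min(len(s), len(t)) / max(len(s), len(t))
    t_set = set(t)
    # Fraction of source words that have at least one translation in the target.
    covered = sum(1 for w in s if lexicon.get(w, set()) & t_set)
    return ratio, covered / len(s)

def classify(ratio, overlap, w=(2.0, 6.0), b=-4.0):
    """Logistic score; the weights are hand-set stand-ins for a trained classifier."""
    z = w[0] * ratio + w[1] * overlap + b
    return 1.0 / (1.0 + math.exp(-z))

def extract(src_sents, tgt_sents, lexicon, threshold=0.5):
    """For each source sentence, keep its best-scoring target if it passes the threshold."""
    pairs = []
    for s in src_sents:
        scored = [(classify(*features(s, t, lexicon)), t) for t in tgt_sents]
        prob, best = max(scored)
        if prob >= threshold:
            pairs.append((s, best, round(prob, 3)))
    return pairs

if __name__ == "__main__":
    malay = ["kucing makan ikan", "harga saham jatuh", "cuaca hari ini panas"]
    english = ["the cat eats fish", "stock prices fell today"]
    for pair in extract(malay, english, LEXICON):
        print(pair)
```

In this toy run, the first two Malay sentences are paired with their English translations, while the third ("cuaca hari ini panas") has no lexicon coverage in either candidate and is rejected. A real system would learn the weights from labelled sentence pairs and use many more features.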

Item Type: Thesis (PhD)
Subjects: Q Science > QA Mathematics > QA75.5-76.95 Electronic computers. Computer science
Divisions: Pusat Pengajian Sains Komputer (School of Computer Sciences) > Thesis
Depositing User: Mr Hasmizar Mansor
Date Deposited: 02 Mar 2026 02:52
Last Modified: 02 Mar 2026 02:52
URI: http://eprints.usm.my/id/eprint/63683
