Resource title

Syntactic reordering in statistical machine translation

Resource description

Reordering has been an important topic in statistical machine translation (SMT) as long as SMT has been around. State-of-the-art SMT systems such as Pharaoh (Koehn, 2004a) still employ a simplistic model of the reordering process to do non-local reordering. This model penalizes any reordering no matter the words. The reordering is only selected if it leads to a translation that looks like a much better sentence than the alternative.
Recent developments have, however, seen improvements in translation quality following from syntax-based reordering. One such development is the pre-translation approach that adjusts the source sentence to resemble target language word order prior to translation. This is done based on rules that are either manually created or automatically learned from word-aligned parallel corpora.
We introduce a novel approach to syntactic reordering. This approach provides better exploitation of the information in the reordering rules and eliminates problematic biases of previous approaches. Although the approach is examined within a pre-translation reordering framework, it easily extends to other frameworks. Our approach significantly outperforms a state-of-the-art phrase-based SMT system and previous approaches to pre-translation reordering, including (Li et al., 2007; Zhang et al., 2007b; Crego & Mariño, 2007). This is consistent both for a very close language pair, English-Danish, and a very distant language pair, English-Arabic.
We also propose automatic reordering rule learning based on a rich set of linguistic information. As opposed to most previous approaches that extract a large set of rules, our approach produces a small set of predominantly general rules. These provide a good reflection of the main reordering issues of a given language pair. We examine several parameters that may influence the quality of the rules learned.
Finally, we provide a new approach for improving automatic word alignment. This word alignment is used in the above task of automatically learning reordering rules. Our approach learns from hand-aligned data how to combine several automatic word alignments into one superior word alignment. The automatic word alignments are created from the same data, preprocessed with different tokenization schemes, thus utilizing the different strengths that different tokenization schemes exhibit in word alignment. We achieve a 38% error reduction for the automatic word alignment.

Resource author

Jakob Elming

Resource publisher

Resource publish date

Resource language

eng

Resource content type

application/pdf

Resource resource URL

http://hdl.handle.net/10398/7922

Resource license

Check the applicable license before adapting this resource. When adapting, give credit to the original author.