Resource title

Hierarchy-based Partition Models: Using Classification Hierarchies to

Resource description

We propose a novel machine learning technique that can be used to estimate probability distributions for categorical random variables that are equipped with a natural set of classification hierarchies, such as words equipped with word class hierarchies, wordnet hierarchies, and suffix and affix hierarchies. We evaluate the estimator on bigram language modelling with a hierarchy based on word suffixes, using English, Danish, and Finnish data from the Europarl corpus with training sets of up to 1–1.5 million words. The results show that the proposed estimator outperforms modified Kneser-Ney smoothing in terms of perplexity on unseen data. This suggests that important information is hidden in the classification hierarchies that we routinely use in computational linguistics, but that we are unable to utilize this information fully because our current statistical techniques are either based on simple counting models or designed for sample spaces with a distance metric, rather than sample spaces with a non-metric topology given by a classification hierarchy.

Keywords: machine learning; categorical variables; classification hierarchies; language modelling; statistical estimation

Resource author

Matthias Buch-Kromann, Martin Haulrich

Resource publisher

Resource publish date

Resource language

eng

Resource content type

application/pdf

Resource URL

http://hdl.handle.net/10398/8221

Resource license

Check the applicable license before adapting this resource. When adapting, give credit to the original author.