Resource title

Conditional complexity of compression for authorship attribution

Resource description

We introduce new stylometry tools based on the sliced conditional compression complexity of literary texts. The approach is inspired by the nearly optimal application of the incomputable Kolmogorov conditional complexity, which it presumably approximates. Whereas other stylometry tools can occasionally give very close values for different authors, our statistic is apparently strictly minimal for the true author, provided that the query and training texts are sufficiently large, the compressor is sufficiently good, and sampling bias is avoided (as in poll sampling). We tune the method and test its performance on attributing the Federalist papers (Madison vs. Hamilton). Our results confirm the attribution of the Federalist papers to Madison made by Mosteller and Wallace (1964) with a Naive Bayes classifier, as well as the same attribution obtained with alternative classifiers such as SVM and a second-order Markov model of language. We then apply our method to the attribution of the early poems from the Shakespeare Canon and of the continuation of Marlowe's poem 'Hero and Leander' ascribed to G. Chapman.
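
As a rough illustration of the idea in the description above, the conditional compression complexity of a query text given a training text can be approximated as the extra compressed length that the query adds when appended to the training text. The sketch below is a minimal one under stated assumptions: it uses Python's zlib as a stand-in compressor, a fixed slice length, and a plain average over slices. The function names, the slice length, and the averaging rule are hypothetical and are not taken from the paper, which this record does not describe in detail.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def conditional_complexity(query: bytes, training: bytes) -> int:
    """Approximate C(query | training) as |compress(training + query)| - |compress(training)|."""
    return compressed_size(training + query) - compressed_size(training)


def sliced_ccc(query: bytes, training: bytes, slice_len: int = 512) -> float:
    """Average conditional complexity over fixed-length slices of the query.

    The slicing scheme here is an assumption made for illustration only.
    """
    if not query:
        raise ValueError("query must be non-empty")
    slices = [query[i:i + slice_len] for i in range(0, len(query), slice_len)]
    return sum(conditional_complexity(s, training) for s in slices) / len(slices)


def attribute(query: bytes, corpora: dict, slice_len: int = 512) -> str:
    """Return the candidate author whose training text minimizes the statistic."""
    return min(corpora, key=lambda author: sliced_ccc(query, corpora[author], slice_len))


if __name__ == "__main__":
    # Toy corpora, invented purely to exercise the functions.
    corpora = {
        "A": b"the quick brown fox jumps over the lazy dog and runs far away " * 20,
        "B": b"colourless green ideas sleep furiously beneath a silent violet moon " * 20,
    }
    query = b"the lazy dog runs far away and the quick brown fox jumps over " * 3
    print(attribute(query, corpora))
```

Note that zlib only looks back over a 32 KB window, so with long training texts only the tail of the training text influences the result. A compressor with a longer context would be needed for texts of realistic size, in line with the requirement that the compressor be sufficiently good.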

Resource author

Mikhail B. Malyutov, Chammi Irosha Wickramasinghe, Sufeng Li

Resource license

Adapt according to the presented license agreement and reference the original author.