Sam Hames

Nyah!

A universal information theoretic approach to the identification of stopwords - s42256-019-0112-6.pdf

https://amaral.northwestern.edu/media/publication_pdfs/s42256-019-0112-6.pdf

Measures the "informativeness" of a word by comparing the word's conditional entropy of appearance against the entropy expected under a null model of randomly appearing words.

Words that are indistinguishable from randomly distributed words are candidates for uninformative words, or stopwords.
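
A rough sketch of the idea, not the paper's exact estimator. It assumes the entropy is taken over a word's distribution across documents, and that the null model can be approximated by shuffling all tokens across documents while keeping document lengths fixed. The names `word_entropy`, `informativeness` and `n_shuffles` are made up for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def word_entropy(docs):
    """Entropy (bits) of each word's distribution over documents."""
    totals = Counter()
    per_doc = []
    for doc in docs:
        counts = Counter(doc)
        per_doc.append(counts)
        totals.update(counts)
    entropy = {w: 0.0 for w in totals}
    for counts in per_doc:
        for w, n in counts.items():
            p = n / totals[w]  # share of w's occurrences found in this document
            entropy[w] -= p * math.log2(p)
    return entropy


def informativeness(docs, n_shuffles=20, seed=0):
    """Null-model entropy minus observed entropy, per word.

    Assumption: the null model is approximated by averaging over
    random shuffles of all tokens, preserving document lengths.
    Scores near zero (or below) suggest stopword candidates.
    """
    rng = random.Random(seed)
    observed = word_entropy(docs)
    tokens = [w for doc in docs for w in doc]
    lengths = [len(doc) for doc in docs]
    null_sum = defaultdict(float)
    for _ in range(n_shuffles):
        rng.shuffle(tokens)
        shuffled, start = [], 0
        for length in lengths:
            shuffled.append(tokens[start:start + length])
            start += length
        for w, h in word_entropy(shuffled).items():
            null_sum[w] += h
    return {w: null_sum[w] / n_shuffles - observed[w] for w in observed}
```

On a toy corpus where "the" is spread evenly over every document and "zebra" occurs in only one, "zebra" should score well above "the", whose score stays at or just below zero.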
