A universal information theoretic approach to the identification of stopwords - s42256-019-0112-6.pdf
https://amaral.northwestern.edu/media/publication_pdfs/s42256-019-0112-6.pdf
Measures the "informativeness" of a word by comparing the word's conditional entropy of appearance against a null model of randomly appearing words.
Words that are indistinguishable from randomly distributed words are candidates for uninformative words, or stopwords.
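A minimal sketch of the idea, not the paper's implementation. It assumes a corpus given as lists of tokens, measures each word's entropy over documents, and approximates the null model by shuffling all tokens across documents while keeping document lengths fixed. The names `conditional_entropy`, `word_entropies`, `informativeness` and the parameter `n_shuffles` are hypothetical, and the shuffle-based null is an assumption; the paper may define its null model differently.

```python
import math
import random
from collections import Counter


def conditional_entropy(doc_counts):
    """Entropy in bits of one word's occurrences spread over documents."""
    total = sum(doc_counts)
    return -sum((c / total) * math.log2(c / total) for c in doc_counts if c > 0)


def word_entropies(docs):
    """Map each word to the entropy of its distribution over documents."""
    per_word = {}
    for i, doc in enumerate(docs):
        for word, c in Counter(doc).items():
            per_word.setdefault(word, [0] * len(docs))[i] = c
    return {w: conditional_entropy(counts) for w, counts in per_word.items()}


def informativeness(docs, n_shuffles=20, seed=0):
    """Null-model entropy minus observed entropy, per word.

    Assumption: the null model is approximated by shuffling all tokens
    across documents (document lengths kept fixed) and averaging the
    resulting entropies over n_shuffles random shuffles.
    """
    rng = random.Random(seed)
    lengths = [len(d) for d in docs]
    tokens = [t for d in docs for t in d]
    observed = word_entropies(docs)
    null_sum = dict.fromkeys(observed, 0.0)
    for _ in range(n_shuffles):
        rng.shuffle(tokens)
        shuffled, start = [], 0
        for n in lengths:
            shuffled.append(tokens[start:start + n])
            start += n
        for w, h in word_entropies(shuffled).items():
            null_sum[w] += h
    # Scores near zero: the word looks randomly distributed (stopword candidate).
    return {w: null_sum[w] / n_shuffles - observed[w] for w in observed}
```

Under these assumptions, a word spread evenly over all documents scores close to zero, while a word concentrated in a few documents scores clearly above zero.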