Getting rid of the Chi-square and Log-likelihood tests for analysing vocabulary differences between corpora

Authors

  • Yves Bestgen, Université Catholique de Louvain

DOI:

https://doi.org/10.7203/qf.22.11299

Keywords:

lexical differences between corpora, resampling test, WordSmith Tools, British and American English

Abstract

Log-likelihood and Chi-square tests are probably the most popular statistical tests in corpus linguistics, especially in research that aims to describe lexical variation between corpora. However, this specific use of the Chi-square test is not valid, and it produces far too many significant results. This paper explains the source of the problem (i.e., the non-independence of the observations), why the usual solutions are not acceptable, and which kinds of statistical test should be used instead. A corpus analysis of the lexical differences between American and British English is then reported in order to demonstrate the problem and to confirm the adequacy of the proposed solution. The last section presents the commands that can be used with WordSmith Tools, a very popular program for corpus processing, to obtain the data needed for the appropriate tests, as well as an easy-to-use procedure in R, a free and easy-to-install statistical software package, that performs these tests.
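
The non-independence problem described in the abstract can be illustrated with a small sketch. It is not the paper's R procedure: it is written in Python purely for illustration, the counts are made up, and the function names (`chi_square_2x2`, `permutation_test`) are hypothetical. It assumes that a permutation test on per-text relative frequencies is one acceptable kind of resampling test, with texts rather than individual word tokens treated as the units of observation.

```python
import random

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def permutation_test(freqs_a, freqs_b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of mean per-text frequencies."""
    rng = random.Random(seed)
    n_a = len(freqs_a)
    observed = abs(sum(freqs_a) / n_a - sum(freqs_b) / len(freqs_b))
    pooled = list(freqs_a) + list(freqs_b)
    n_b = len(pooled) - n_a
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign whole texts to the two corpora at random
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: occurrences of one word in ten 1000-token texts per corpus.
# In corpus A the word is concentrated in a single text.
TEXT_SIZE = 1000
counts_a = [50, 0, 0, 0, 0, 0, 0, 0, 0, 0]
counts_b = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

# Chi-square on pooled counts: every token is treated as an independent observation.
word_a, word_b = sum(counts_a), sum(counts_b)
other_a = TEXT_SIZE * len(counts_a) - word_a
other_b = TEXT_SIZE * len(counts_b) - word_b
chi2 = chi_square_2x2(word_a, other_a, word_b, other_b)

# Permutation test on per-text relative frequencies: each text is one observation.
rel_a = [c / TEXT_SIZE for c in counts_a]
rel_b = [c / TEXT_SIZE for c in counts_b]
p_value = permutation_test(rel_a, rel_b)

print(f"chi-square on pooled counts: {chi2:.2f}")
print(f"permutation p-value on texts: {p_value:.3f}")
```

On these made-up counts the Chi-square statistic far exceeds 3.84, the critical value for one degree of freedom at the 0.05 level, whereas the permutation test gives no evidence of a difference between the two corpora, because the apparent difference comes from a single text.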

Author Biography

Yves Bestgen, Université Catholique de Louvain

Faculté de psychologie et des sciences de l'éducation

Published

2018-01-07

How to Cite

Bestgen, Y. (2018). Getting rid of the Chi-square and Log-likelihood tests for analysing vocabulary differences between corpora. Quaderns De Filologia - Estudis Lingüístics, 22(22), 33–56. https://doi.org/10.7203/qf.22.11299
