Language evolution 'in silico': From large-scale data to artificial agents creating languages from scratch

Thomas Brochhagen

doi:10.7203/metode.15.27692

Authors

Thomas Brochhagen Pompeu Fabra University https://orcid.org/0000-0002-0301-0541


Keywords:

language, evolution, artificial intelligence, typology, universals

Abstract

We all speak a language and have intuitions about it, from its vocabulary to the way its grammar puts words together. However, much remains to be understood about the processes that make language possible in the first place and those that shape its evolution. Recent computational advances have enabled us to address these issues from new angles. This article highlights methods and findings that the age of computation has given rise to, ranging from learning from large-scale data covering thousands of languages to the evolution of languages created by artificial intelligence.


Author Biography

Thomas Brochhagen, Pompeu Fabra University

Tenure-track professor in computational cognitive science at the Universitat Pompeu Fabra’s Department of Translation and Language Sciences (Spain). His research interests include language evolution, artificial intelligence, Bayesian models, and statistics.

References

Bouchacourt, D., & Baroni, M. (2018). How agents see things: On visual representations in an emergent language game. In E. Riloff, D. Chiang, J. Hockenmaier & J. Tsujii (Eds.), Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp. 981–985). Association for Computational Linguistics.

BigScience Workshop. (2023). BLOOM: A 176B-parameter open-access multilingual language model. arXiv. https://doi.org/10.48550/arxiv.2211.05100

Brochhagen, T., & Boleda, G. (2022). When do languages use the same word for different meanings? The Goldilocks principle in colexification. Cognition, 226, 105179. https://doi.org/10.1016/j.cognition.2022.105179

Brochhagen, T., Boleda, G., Gualdoni, E., & Xu, Y. (2023). From language development to language evolution: A unified view of human lexical creativity. Science, 381(6656), 431–436. https://doi.org/10.1126/science.ade7981

Chaabouni, R., Kharitonov, E., Dupoux, E., & Baroni, M. (2019). Anti-efficient encoding in emergent communication. In Proceedings of NeurIPS 2019 (33rd Conference on Neural Information Processing Systems) (pp. 6290–6300). Curran Associates.

Corballis, M. C. (2008). Not the last word. American Scientist, 96(1), 68–70.

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In IEEE Computer Vision and Pattern Recognition (CVPR) (pp. 248–255). https://doi.org/10.1109/CVPR.2009.5206848

Kemp, C., & Regier, T. (2012). Kinship categories across languages reflect general communicative principles. Science, 336(6084), 1049–1054. https://doi.org/10.1126/science.1218811

Lazaridou, A., & Baroni, M. (2020). Emergent multi-agent communication in the deep learning era. arXiv. https://doi.org/10.48550/arXiv.2006.02419

Rzymski, C., Tresoldi, T., Greenhill, S. J., Wu, M.-S., Schweikhard, N. E., Koptjevskaja-Tamm, M., Gast, V., Bodt, T. A., Hantgan, A., Kaiping, G. A., Chang, S., Lai, Y., Morozova, N., Arjava, H., Hübler, N., Koile, E., Pepper, S., Proos, M., Van Epps, B., ... List, J.-M. (2020). The database of cross-linguistic colexifications, reproducible analysis of cross-linguistic polysemies. Scientific Data, 7, 13. https://doi.org/10.1038/s41597-019-0341-x

Seifart, F., Paschen, L., & Stave, M. (2022). Language Documentation Reference Corpus (DoReCo) 1.2. [Archive material]. Leibniz-Zentrum Allgemeine Sprachwissenschaft & laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). https://doi.org/10.34847/nkl.7cbfq779

Passmore, S., Barth, W., Greenhill, S. J., Quinn, K., Sheard, C., Argyriou, P., Birchall, J., Bowern, C., Calladine, J., Deb, A., Diederen, A., Metsäranta, N. P., Araujo, L. H., Schembri, R., Hickey-Hall, J., Honkola, T., Mitchell, A., Poole, L., Rácz, P. M., ... Jordan, F. M. (2023). Kinbank: A global database of kinship terminology. PLOS ONE, 18(5), e0283218. https://doi.org/10.1371/journal.pone.0283218

Xu, Y., Duong, K., Malt, B. C., Jiang, S., & Srinivasan, M. (2020). Conceptual relations predict colexification across languages. Cognition, 201, 104280. https://doi.org/10.1016/j.cognition.2020.104280

Zaslavsky, N., Kemp, C., Regier, T., & Tishby, N. (2018). Efficient compression in color naming and its evolution. Proceedings of the National Academy of Sciences, 115(31), 7937–7942. https://doi.org/10.1073/pnas.1800521115
