Computational tools and spoken corpora design: an ongoing dialogue

Victoria Vázquez Rozas; Mario Barcala

doi:10.7203/caplletra.69.17270

Authors

Victoria Vázquez Rozas Universidade de Santiago de Compostela http://orcid.org/0000-0001-8155-669X
Mario Barcala NLPgo Technologies S.L. http://orcid.org/0000-0002-6736-2773

Keywords:

oral corpora, stand-off annotation, in-line annotation, segmentation, POS tagging

Abstract

The design of an oral corpus and the processes of recording, encoding and processing the materials in order to build a useful resource for linguistic analysis require numerous theoretical and methodological decisions. This article focuses on the stages of corpus construction that are most clearly conditioned by the computational processing needed to make the corpus functional. To match initial expectations to the real possibilities of using the tool, each feature we intend to encode must be weighed against the workload and the means required to encode it. It is therefore essential to take the available processing and exploitation options into account, as they have a crucial impact on decisions about the corpus's construction.

Drawing on experience gained in building the ESLORA corpus, the article examines some of the problems that arise when designing an oral corpus, such as the delicacy with which oral phenomena are represented, the segmentation of discourse, the coexistence of several tagging systems and the particularities of annotation in a bilingual or multilingual context.

Published

2020-10-07

How to Cite

Vázquez Rozas, V., & Barcala, M. (2020). Computational tools and spoken corpora design: an ongoing dialogue. Caplletra. Revista Internacional De Filologia, (69), 221–240. https://doi.org/10.7203/caplletra.69.17270

Issue

Caplletra 69 (tardor 2020)

Section

ARTICLES MONOGRÀFIC

License

Authors submitting work to Caplletra for publication must be the legitimate holders of the usage rights. This legitimacy must also extend to images, tables, diagrams and any other materials that complement the text, whether or not the authors created such material.

Copyright: on publishing their work in the journal, authors grant Caplletra. Revista Internacional de Filologia usage rights (reproduction, distribution and public communication) for both the printed and the electronic versions.

All work published in Caplletra is covered by the Creative Commons license type Attribution-NonCommercial-NoDerivatives 4.0 (CC BY-NC-ND 4.0).

RESPONSIBILITY

Caplletra. Revista Internacional de Filologia does not necessarily identify with the points of view expressed in the papers it publishes.

Caplletra. Revista Internacional de Filologia accepts no responsibility whatsoever for any possible infringement of intellectual property rights on the part of authors.
