The CLiGS textbox

The CLiGS textbox contains several corpora of literary texts in Romance languages. It was made available by the CLiGS junior research group.

The CLiGS textbox is a collection of corpora of literature in Romance languages. Originally, it was intended as an early-publication channel of the CLiGS junior research group. It started in 2015 on GitHub, where more documentation can be found.

All texts are in the public domain. The markup and metadata we have added are licensed under CC BY (Creative Commons Attribution).

Each collection includes a description of the text selection criteria, the available data formats, a suggested citation, etc. More information about the formal schemas (for example, the TEI schema referenced in all TEI files) can be found in the CLiGS reference repository.

In the context of the NFDI consortium Text+ and the development of a smoother import workflow, these corpora were also published in the TextGrid Repository in 2024. This new publication includes additional metadata (such as the Basic Classification), which is displayed as facets. Through this publication, the corpora now benefit from the features of the TextGrid Repository (recombination, browsing, conversion into other formats, long-term archiving, integration with other resources, access via APIs and Python libraries, etc.) and thus comply better with the FAIR principles.

Furthermore, this corpus is the first to use the genre entities from the GND as part of its metadata and facets. The different genre concepts in the Romance languages have been mapped to the terms drama, novel (roman in French, novela in Spanish, romanzo in Italian, romance in Portuguese) and novella (nouvelle in French, cuento in Spanish, novella in Italian). Although the mapping between languages is debatable (especially in the case of the Spanish cuento), we preferred to use fewer terms, even at the cost of some cultural specificity. The English and German forms of these terms follow the GND.
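The many-to-one genre mapping described above can be pictured as a simple lookup table. The following is a minimal sketch, not the workflow actually used for the TextGrid import: the names `GENRE_MAP` and `normalize_genre` are hypothetical, and only the language-specific labels listed above are included (no per-language labels for drama are given, so drama is left out). The shared term for the shorter-fiction category is written here as the English "novella".

```python
from typing import Optional

# Hypothetical lookup table built from the pairs listed above.
# Keys: (ISO 639-1 language code, language-specific genre label).
# Values: the shared genre term used for the facets.
GENRE_MAP: dict[tuple[str, str], str] = {
    ("fr", "roman"): "novel",
    ("es", "novela"): "novel",
    ("it", "romanzo"): "novel",
    ("pt", "romance"): "novel",
    ("fr", "nouvelle"): "novella",
    ("es", "cuento"): "novella",  # the debatable case mentioned above
    ("it", "novella"): "novella",
}


def normalize_genre(lang: str, label: str) -> Optional[str]:
    """Return the shared genre term for a language-specific label, or None if unmapped."""
    # Labels are compared case-insensitively and without surrounding whitespace.
    return GENRE_MAP.get((lang, label.strip().lower()))
```

For example, `normalize_genre("es", "Cuento")` yields `"novella"`, while an unlisted label such as Portuguese "conto" yields `None`. Collapsing several labels onto one key is exactly the trade-off the paragraph describes: fewer, comparable facet values at the expense of language-specific nuance.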

Related Publications

  • Schöch, Christof, José Calvo Tello, Ulrike Henny-Krahmer, and Stefanie Popp. 2019. “The CLiGS Textbox: Building and Using Collections of Literary Texts in Romance Languages Encoded in XML-TEI.” Journal of the Text Encoding Initiative. https://journals.openedition.org/jtei/2085.
  • Calvo Tello, José, Ulrike Henny-Krahmer, and Christof Schöch. 2018. “Textbox: análisis del léxico mediante corpus literarios.” In Historia del léxico español y Humanidades digitales, edited by Dolores Corbella Diaz, Alejandro Fajardo Aguirre, and Jutta Langenbacher-Liebgott, 223–51. Berlin: Peter Lang.

Important Publications of the Junior Research Group

  • Hesselbach, Robert, José Calvo Tello, Ulrike Henny-Krahmer, Christof Schöch, and Daniel Schlör, eds. 2024. Digital Stylistics in Romance Studies and Beyond. Heidelberg: Heidelberg University Press.
  • Henny-Krahmer, Ulrike. 2023. “Genre Analysis and Corpus Design: Nineteenth Century Spanish-American Novels (1830–1910).” Universität Würzburg. https://doi.org/10.25972/OPUS-31999.
  • Calvo Tello, José. 2021. The Novel in the Spanish Silver Age: A Digital Analysis of Genre Using Machine Learning. Digital Humanities Research 4. Bielefeld: transcript. https://www.transcript-verlag.de/978-3-8376-5925-2.