Monday, December 25, 2023

OnomOs Diachronic Corpus is available


The Czech National Corpus, hosted by Charles University in Prague, has recently released OnomOs. This diachronic corpus is the result of a collaborative effort by researchers from the Department of Czech Language at the Faculty of Arts, University of Ostrava, in a team led by Jaroslav David.


OnomOs is a diachronic corpus built from selected issues of the (Rudé) Právo newspaper, with a particular focus on proper names. One of its defining features is named entity annotation. This annotation follows the onomastic concept of proper names and adds a layer of linguistic detail to the corpus.

For those eager to explore OnomOs, detailed information and resources can be found on the Czech National Corpus website: OnomOs – Czech National Corpus. There you'll find details of the construction process, the onomastic approach to proprial units, and more. Thanks to the work of the team at the University of Ostrava, this diachronic corpus opens new avenues for studying how the use of proper names has changed over time. Researchers and language enthusiasts alike are invited to explore it.



The OnomOs corpus is a linguistically processed database of texts from the periodicals Rudé právo (published 1920–1995) and Právo (1995–present). It contains one issue from each decade in which (Rudé) Právo was published. The corpus includes only texts in which the language component dominates. It therefore excludes, for example, advertisements; cinema, theatre and radio programmes; some types of texts from the sports section (e.g. scoreboards and player rosters); comics; and crossword puzzles. The structure of the corpus is presented in more detail in Figure 1. In total, the corpus contains 255,149 tokens.
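The one-issue-per-decade design described above can be illustrated with a few lines of Python. This is only a sketch of the arithmetic, not the corpus builders' actual procedure: the function names are hypothetical, the end year of 2023 is taken from the date of this post, and treating 1995 as belonging to Právo is an assumption, since that year appears in both title ranges.

```python
def decade_of(year: int) -> int:
    """Return the first year of the decade containing `year` (1987 -> 1980)."""
    return year - year % 10


def title_for(year: int) -> str:
    """Return the newspaper title in use for a given year.

    Assumption: the boundary year 1995 is assigned to "Právo".
    """
    return "Rudé právo" if year < 1995 else "Právo"


def sampled_decades(first_year: int = 1920, last_year: int = 2023) -> list[int]:
    """List the decade slots between two years, one per decade.

    Assumption: last_year defaults to 2023, the year of this post.
    """
    return list(range(decade_of(first_year), decade_of(last_year) + 1, 10))


if __name__ == "__main__":
    for decade in sampled_decades():
        print(f"{decade}s: {title_for(decade)}")
```

Under these assumptions the loop prints one line per decade slot, labelled with the title in use at the start of that decade. Which issue was actually chosen for each decade is documented on the Czech National Corpus website, not here.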
