Wednesday, August 3, 2016

Hidden Markup – The Digital Work Environment of the “Digital Dictionary of Surnames in Germany”

Horn, Franziska, Sandra Denzer and Jörg Hambuch. “Hidden Markup — The Digital Work Environment of the "Digital Dictionary of Surnames in Germany".” Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5, 2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:10.4242/BalisageVol17.Horn01.

Balisage: The Markup Conference 2016, August 2 - 5, 2016


Designers are not users – that is one of the main principles one has to follow regarding project design according to Nielsen 2008. In this paper, we present the digital work environment Onodi that was designed for the project Digital Dictionary of Surnames in Germany (Digitales Familiennamenwörterbuch Deutschlands, DFD) in close collaboration with its users. Onodi’s three main components are the oXygen XML Editor, an eXist-db database and the content management system TYPO3. We developed a project-specific graphical user interface that enables the creation of texts with TEI-conformant markup without working directly with the XML code.
First, we will briefly present the Digital Dictionary of Surnames in Germany as a long-term project that records the entire inventory of surnames occurring in Germany. After briefly introducing the general structure of the work environment, we explain in detail the features that enable XML editing for technologically less-skilled users. Building on Denzer/Horn 2014, this paper presents three new features of the digital work environment: 1) integration of a Zotero literature database, currently with about 1700 entries, 2) copying and pasting of identifiers and 3) an automatic publication process.
This paper complements other publications on similar software, e.g. ediarum as a digital work environment for editing manuscripts (Dumont/Fechner 2014) and dictionary writing systems (e.g. Atkins/Rundell 2008; Abel/Klosa 2012). The insights into dictionary production presented here may offer points of connection for other text editions as well as dictionary projects.

Project Presentation

The Digital Dictionary of Surnames in Germany is a long-term project under the auspices of the Academy of Sciences and Literature Mainz and in collaboration with Technische Universität Darmstadt and the Johannes Gutenberg University (JGU). It began in 2012 and has a planned duration of 24 years. The aim of the project is to prepare the first-ever comprehensive digital dictionary of surnames which exist in Germany.[1]
In the database of the DFD, all current surnames (roughly 850 000) in Germany are lexicographically collated. The published articles, in the end about 200 000, provide information about their etymology and origin. Additionally, the surname entries provide information, for example, about the geographical distribution of the surname, the occurrence of the name in countries other than Germany, morphological or semantic variants and further reading on the topic.
Also available on the DFD website is information about aspects of the history and development of names in different countries, their (cultural-)historical context and general linguistic aspects. Furthermore, the dictionary is embedded in a research portal, which can be seen as a gateway to various projects and information related to the field of onomastic studies.
Previous surname dictionaries covered only a fraction of existing surnames and, for the most part, did not include foreign surnames. The DFD includes names of foreign origin as well. Information about the geographical distribution and localization of surnames, made possible by data mapping, provides further support for the etymological interpretation (Schmuck/Dräger 2008; Nübling/Kunze 2006). Thus, contradictory and outdated information from (older) existing dictionaries can be corrected. The new method of mapping surname data is made possible by a program that was developed for the forerunner project, German Surname Atlas (Deutscher Familiennamenatlas, DFA). The data are based on the telephone directory of the telecommunication company Deutsche Telekom, with the records dating from the year 2005. The software uses information about surnames, postal codes and name frequency to create a map.[2] The benefits of the new method can be illustrated by the plausible interpretation of names that were previously uninterpreted. The name Fixemer, for example, is not accounted for in the consulted German or in the relevant foreign-language dictionaries. Its geographical distribution in the Southern Palatinate and Saarland led to the discovery that the name is locational, derived from the French place name Fixem (French commune in the Moselle department, northeast of Thionville).
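To illustrate the kind of aggregation that such a map rests on, the following minimal Python sketch counts directory entries for one surname per postal-code region. It is not the DFA mapping software: the function name regional_counts, the grouping by the first two digits of the postal code and the sample records are all assumptions made for this illustration.

```python
from collections import Counter


def regional_counts(records, surname, prefix_len=2):
    """Count directory entries for one surname per postal-code region.

    records is an iterable of (surname, postal_code) pairs. Grouping by the
    first prefix_len digits of the postal code is an assumption made for
    this sketch, not a documented detail of the DFA software.
    """
    counts = Counter()
    for name, postal_code in records:
        if name == surname:
            counts[postal_code[:prefix_len]] += 1
    return dict(counts)


# Invented sample records, for illustration only (not taken from the directory data).
records = [
    ("Fixemer", "66111"),
    ("Fixemer", "66424"),
    ("Fixemer", "76829"),
    ("Müller", "10115"),
]

fixemer_regions = regional_counts(records, "Fixemer")
print(fixemer_regions)
```

A map can then be drawn by shading each region according to its count; a regional cluster of this kind is what pointed to the locational reading of Fixemer.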
The online dictionary registers names for which at least ten different telephone numbers are listed. In addition to these names, variants are included which have fewer than ten tokens but similar morphological or semantic features. The modular approach of lemma selection by frequency, variation and theme provides a diverse range of recorded names.
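The frequency and variation parts of this selection rule can be sketched in a few lines of Python. The threshold of ten comes from the text above; the function name select_lemmas, the variant_of mapping (a stand-in for however similarity to a basic name is actually recorded) and the sample figures are hypothetical, and selection by theme is not modelled.

```python
THRESHOLD = 10  # minimum number of distinct telephone numbers, as stated in the text


def select_lemmas(frequencies, variant_of):
    """Return the set of names to record in the dictionary.

    frequencies: dict mapping a surname to its number of distinct telephone numbers
    variant_of:  dict mapping a variant to its basic name (hypothetical annotation)
    """
    # Names that reach the frequency threshold are always included.
    frequent = {name for name, n in frequencies.items() if n >= THRESHOLD}
    # Rarer names are included only if they are annotated as variants of a frequent name.
    variants = {
        name
        for name, base in variant_of.items()
        if name in frequencies and name not in frequent and base in frequent
    }
    return frequent | variants


# Invented figures, for illustration only.
frequencies = {"Meier": 500, "Meiher": 6, "Quux": 3}
variant_of = {"Meiher": "Meier"}

selected = select_lemmas(frequencies, variant_of)
print(sorted(selected))
```

In this sketch a rare name such as the placeholder Quux is left out because it neither reaches the threshold nor is linked to a recorded basic name.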
Users of the DFD include the interested general public and academic researchers. Therefore, we provide a user-friendly website, which is free and accessible to everyone. An example of support provided for the layperson is the glossary, with explanations of technical terms used in the articles. The advanced search, currently under development, will provide new search possibilities for a diverse set of research questions. Based on the classification scheme developed for names, one can create lists of names belonging to the same categories. One can also compile names with common linguistic features (for example the Latin genitive suffix –i) or all annotated variants belonging to one basic name.
The DFD represents a resource that can be used by many research disciplines. Surnames offer a variety of information: they serve as a linguistic source, e.g. for dialectology, because they preserve historical forms of languages and language variations. Furthermore, migration researchers can use surnames to track population movements. One such finding can be mentioned here: in German cities and regions with a strong industrial sector – especially the automotive industry, such as Munich, Wolfsburg, Stuttgart or the Rhine-Main and Ruhr areas – one can find many names of South Italian origin like Russo, Esposito and Greco. However, surnames of North Italian origin, like Ferrari, are rare (Dräger 2011, p. 143).[3]

Figure 1: Map of Italian surnames in Germany

The map shows the distribution of the Italian surnames Russo, Esposito, Rossi, Greco and Ferrari within Germany. One can see clusters around the regions with a focus on (automotive) industry – Munich, Stuttgart, Wolfsburg, the Rhine-Main and Ruhr areas.
The information provided by surnames is also interesting for historians, e.g. with regard to the differentiation of occupational groups in the Middle Ages. For instance, the determinative elements in compounds of Müller (English meaning: miller), such as Freimüller, Bannmüller, Fronmüller and Hofmüller, reflect differences in social status within the feudal system. Frei- in the name Freimüller indicates that the miller was a free man, while the prefix Fron- in Fronmüller suggests that the miller had to pay a kind of rent to his feudal lord.

General Description of the Work Environment Onodi

The DFD uses an in-house dictionary writing system designed in close interaction with the users. The main concern is to provide a user-friendly environment which can be used to create a richly-annotated and searchable dictionary of surnames that adheres to modern standards of electronic publishing. This section of the paper will present the general structure and basic features of the work environment Onodi. It is based on a presentation by Denzer/Horn 2014, but puts more emphasis on designing a tool for technologically less-skilled users.
Onodi consists of three main components: the XML editor oXygen to enter and edit contents of the dictionary, an eXist-db database to deploy and maintain the articles, and the content management system (CMS) TYPO3 with a custom extension for publishing and searching. Additional modules are mapping software and the reference management software Zotero. The selected software reflects a preference for using established and mostly open-source software that can be accessed via interfaces for integration. The dictionary writing system Onodi is used successfully to produce and publish dictionary entries. This can be noted as a mark of effectiveness, which, besides efficiency and satisfaction, is one of the features of usability according to the standard EN ISO 9241.[4]
The digital work environment is designed to be adapted to the needs and skills of the lexicographers. This concerns, for instance, data modelling and the user interfaces for editing contents in the editor as well as in the CMS. TYPO3 provides a user interface and granular rights management. Consequently, users can update and edit contents for the website, e.g. news or the project description, relatively easily by themselves in the backend of the CMS. The interface for XML editing is based on a so-called oXygen framework which contains CSS files to format the XML files, XML schemas for validation, templates for new entries and project-specific preferences for editing such as adapted menu actions and a menu bar. As a result, a WYSIWYG mode is created that hides the markup.
Figure 2: A part of the editing interface in the oXygen XML Editor

The designations are as follows: a) menu for DFD annotations, b) menu bar for DFD annotations, c) button (here category) to add a new section, d) drop-down menu, e) real-time validation indicator, f) tree structure view (optional), g) access to database via WebDAV.
