Monday, November 10, 2014

Statistics from the other side

http://dmnes.wordpress.com/2014/11/08/statistics-from-the-other-side/



The last few months we’ve been posting statistics about how things look with a view towards the first edition of the Dictionary, but today I wanted to say a little bit about the other side of the statistics, all the data that we’ve collected that is not yet ready for publication.
Compared to the 480+ entries that are ready for publication, we have 1580+ that have been created but are still in the drafting/research phase, more than three times as many. These entries are sitting unfinished for any number of reasons:
  • Time. While entering individual citations (VNFs in editor-speak), whenever we reach a new name that doesn’t yet have an entry associated with its canonical form (CNF in editor-speak), we simply create a CNF file which is empty other than the name form itself, and continue entering data. Depending on the data set being transcribed, 25+ such entries could be generated over the course of an afternoon. Then, someone has to sit down, compare them to the CNF files already in existence (to make sure that each one isn’t actually a duplicate of something, just in a different spelling), collect etymological and usage information, and review everything to make sure that it’s accurate, that there are no typos, that the XML formatting is correct, etc.
  • Etymology. A lot of the names we deal with are ‘known’ quantities; their origin has been well established, and thus it’s just a matter of writing up the etymological information correctly. Others, though, are unique and puzzling, and a single citation or two is not sufficient for positive identification. For this, I can offer as examples the names Pelejana (Valencia, 1510), Persla (Brno, 1349), and Pevernel (Devon, 1599) (yes, I was working in “Pe-” last night…). These are likely tractable cases, but need to wait until further examples are collected before we can make headway with identification.
  • The intractable ones. Some, however, are going to be intractable: We fully expect that there will be names where all we can say is “This unique name of uncertain origin is found only in Italy in the early 14th C”, or the like. But, as with the names of (currently) uncertain origin noted above, we can’t make such a decision about a name too quickly.
  • Non-Latin alphabets (other than Greek). The Dictionary currently has no ready-for-publication entries which involve names of Hebrew origin, because we are still determining the best way to handle words written in that alphabet, in particular how to store the data and how to make sure it displays properly on the website. This means that, right now, a tremendously large number of names of Biblical origin are not yet ready.
  • Complex developments. In many cases, it’s rather straightforward to trace the development of a name through different time-periods and cultures, to confidently say, e.g., that Giovanni is a form of John. Other names are not so straightforward: Are Randal, Randolph, and Ranulph all distinct names? They are of the same etymological origin, which normally would cause us to group them together; but would someone looking for Ranulph think to look under Randal?
  • Names of cultural importance. For many names, providing the etymological information and some information about the use of the name by important royalty, saints, or popes is sufficient: The citations then speak for themselves in illustrating the spread of usage over time and space. But some names are, through their widespread use, important from a ‘cultural’ perspective, i.e., the perspective of anyone who is interested in the relationship between onomastics and social and personal identities. These names deserve greater comment, which, in turn, takes time to adequately compile, collate, and present. An example of such a name is John, whose popularity in pretty much every western European culture from the early 13th C on outstrips that of almost every other masculine name. (It is rare to find a data set where John and variants are not the most common name by a significant margin. I’ve always wondered about Ormskirk, Lancashire; in their 16th C baptismal register, Thomas just barely squeaks past John to be the most popular name.)
So what does all this mean? It means that the first edition is not going to contain a lot of names that people might expect to be in there (John most likely being one of them). But it also means that there is always room for further research, and that we are unlikely to reach the end of potential new entries any time soon. It also means that at some point down the line, we’ll be able to put together a “Does Anyone Know This Name?” page where lay users of the Dictionary as well as onomastic specialists can contribute their knowledge regarding identification and etymology of rare and unusual names.
It also means that we could stop collecting data now (though we won’t!) and spend the next two months solely doing research, and there’d still be potentially nearly 10,000 VNFs that could end up in an upcoming edition, since that is the number we currently have waiting for review by one of the editors, to ensure that the entry details are correct and that the entry for the corresponding CNF is ready for publication. And this just scratches the surface: There are hundreds of thousands of names out there waiting for us to catalogue them.