London EpiDoc Workshop 2014

I spent most of last week at the Institute of Classical Studies attending a Digital Epigraphy Workshop in TEI-EpiDoc organised by the Department of Digital Humanities, King’s College London and hosted by Gabriel Bodard, Simona Stoyanova, and Charlotte Tupman.  My own interest in epigraphy as a linguist lies in the data which I can extract from it to make historical and linguistic arguments, so if I can find a way to process that data more efficiently than by searching strings through the PHI Epigraphy database, or (dius fidius) manually scanning volume after volume of print corpora, I would very much like to take advantage of it.  Hence my interest in the workshop, although going into it with no prior experience of working with XML I wasn’t completely sure what I was getting myself into.  In the end, however, it was entirely worth the time invested.

Some of you may have heard of EpiDoc before, because workshops are advertised on the Liverpool Classics List at least once a year, but what exactly EpiDoc is may not be entirely clear, even within the more restricted community of Classicists who are also epigraphists, papyrologists, or those who use epigraphical or papyrological editions for their research.  Such people may have already come into contact with online corpora where EpiDoc was used for digitally presenting their respective editions.  Some examples are the AHRC-funded online Inscriptions of Aphrodisias Project, the Vindolanda Tablets Online, and the consolidated papyrological databases of

EpiDoc itself is a set of conventions for encoding epigraphic and papyrological documents in TEI (Text Encoding Initiative) XML for digital publication.  Its current guidelines and a number of ‘Gentle Introductions’ can be found here and here.  To unpack TEI-EpiDoc further in this blog post (although perhaps not quite as gently), TEI is a consortium that has been establishing SGML and XML standards for the encoding of literary and linguistic corpora since 1988, and XML is a standard markup language for encoding documents to be both machine and human readable.  Actually, documents once encoded in XML sometimes aren’t that human readable, but scripts can apply transformations to XML documents in order to produce more human-readable output in alternative formats such as HTML.  While TEI XML is broad in scope, subsuming all forms of documents from novels to medieval manuscripts, EpiDoc is a subset of TEI-compliant XML markup designed specifically to encode epigraphic (and, more recently, papyrological) documents in accordance with the various subsets of the Leiden conventions used in scholarly works.  It has been in development since the first version of the guidelines was published in 2000.

Let me give a concrete example of what EpiDoc can do on the representational level:  On this most basic level, as stated above, it encodes texts digitally with all the Leiden sigla you would find in a volume of the Corpus Inscriptionum Latinarum, Inscriptiones Graecae, or any other edition of inscriptions.  Here’s an example of a verse inscription from a corpus that I regularly use (IG IX,2 – Thessaly) that I’ll encode using EpiDoc, and show the various levels of output that can be transformed out of it:

IG IX,2 270; drawing after IG IX,2, reproduced from Guarducci (1967:367)

Using ‘anonymous block’ tags (<ab></ab>) we can set up a three-line inscription in the edition space of the XML file.  In the inscription the geminate liquids -ρρ- and -λλ- are written with only a single letter, so for the sake of normalising this in the interpretative edition we include the extra letters, which would normally be enclosed in angle brackets < >, between the XML tags <supplied reason="omitted">λ</supplied>.  Unclear letters, which we record with underdots in Leiden style, are enclosed in the tags <unclear>πί</unclear>.

<div type="edition" xml:space="preserve">
<ab>
<lb n="1"/>μνᾶμ’ ἐμι Πυ<supplied reason="omitted">ρ</supplied>ιάδα, hὸς οὐκ ἐ̄<unclear>πί</unclear>
<lb n="2"/>στατο φεύγε̄ν ἀ<supplied reason="omitted">λ</supplied>λ’ αὖθε πὲρ γᾶς
<lb n="3"/>τᾶσδε πο<supplied reason="omitted">λ</supplied>λὸν ἀριστεύο̄ν ἔθανε.
</ab>
</div>

The code above, when run through a basic EpiDoc XML > HTML stylesheet, produces the following edition:

μνᾶμ’ ἐμι Πυ<ρ>ιάδα, hὸς οὐκ ἐ̄π̣ί̣
στατο φεύγε̄ν ἀ<λ>λ’ αὖθε πὲρ γᾶς
τᾶσδε πο<λ>λὸν ἀριστεύο̄ν ἔθανε.
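To give a rough idea of what such a transformation does, here is a minimal Python sketch of my own. It is not the EpiDoc stylesheet itself (which is XSLT), and the function names `underdot`, `render` and `to_leiden` are invented for this illustration. It only handles the handful of tags used in this post and assumes the edition is not in an XML namespace.

```python
import unicodedata
import xml.etree.ElementTree as ET

UNDERDOT = "\u0323"  # combining dot below, the Leiden mark for unclear letters

EDITION = """<ab>
<lb n="1"/>μνᾶμ’ ἐμι Πυ<supplied reason="omitted">ρ</supplied>ιάδα, hὸς οὐκ ἐ̄<unclear>πί</unclear>
<lb n="2"/>στατο φεύγε̄ν ἀ<supplied reason="omitted">λ</supplied>λ’ αὖθε πὲρ γᾶς
<lb n="3"/>τᾶσδε πο<supplied reason="omitted">λ</supplied>λὸν ἀριστεύο̄ν ἔθανε.
</ab>"""


def underdot(text):
    """Put a dot under every base letter (skipping accents and spaces)."""
    return "".join(
        ch + UNDERDOT if unicodedata.combining(ch) == 0 and not ch.isspace() else ch
        for ch in text
    )


def render(el):
    """Render one element (without its tail) using Leiden sigla."""
    # Line breaks in the source file are layout only; <lb/> marks the real ones.
    parts = [(el.text or "").replace("\n", "")]
    for child in el:
        parts.append(render(child))
        parts.append((child.tail or "").replace("\n", ""))
    inner = "".join(parts)
    if el.tag == "lb":
        return "\n"
    if el.tag == "supplied":
        # omitted by the stonecutter: < >; anything else (e.g. lost): [ ]
        return f"<{inner}>" if el.get("reason") == "omitted" else f"[{inner}]"
    if el.tag == "unclear":
        return underdot(inner)
    if el.tag == "surplus":
        return "{" + inner + "}"
    return inner


def to_leiden(xml_text):
    """Parse an edition fragment and return its Leiden-style text."""
    return render(ET.fromstring(xml_text)).strip("\n")


print(to_leiden(EDITION))
```

The real stylesheets cover far more of the Leiden conventions and many output options; the point here is only that each sign in the printed edition corresponds to a tag that a program can find.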

Of course, this is a verse inscription consisting of two hexameters, so we can rewrite it to include markup that takes account of the verse lines and original line breaks:

<div type="edition" xml:space="preserve">
<lg met="hexameter">
<l n="1" met="hexameter">μνᾶμ’ ἐμι Πυ<supplied reason="omitted">ρ</supplied>ιάδα, hὸς οὐκ ἐ̄<supplied reason="lost">πί</supplied><lb/>στατο φεύγε̄ν</l>
<l n="2" met="hexameter">ἀ<supplied reason="omitted">λ</supplied>λ’ αὖθε πὲρ γᾶς <lb/> τᾶσδε πο<supplied reason="omitted">λ</supplied>λὸν ἀριστεύο̄ν ἔθανε.</l>
</lg>
</div>

Using the same XML > HTML stylesheet we can get the same edition as before with this markup, but if we enable the verse-lines parameter, we alternatively get the following output:

μνᾶμ’ ἐμι Πυ<ρ>ιάδα, hὸς οὐκ ἐ̄[πί]|στατο φεύγε̄ν
ἀ<λ>λ’ αὖθε πὲρ γᾶς | τᾶσδε πο<λ>λὸν ἀριστεύο̄ν ἔθανε.
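The verse-line view can be sketched in the same spirit. The following Python is again only my own illustration of the idea, not the stylesheet's actual logic: each <l> element becomes one output line, and each <lb/> inside it is shown as a vertical bar marking where the line on the stone breaks. The names `flatten` and `verse_lines` are hypothetical.

```python
import xml.etree.ElementTree as ET

VERSE = (
    '<lg met="hexameter">'
    '<l n="1" met="hexameter">μνᾶμ’ ἐμι Πυ<supplied reason="omitted">ρ</supplied>ιάδα, '
    'hὸς οὐκ ἐ̄<supplied reason="lost">πί</supplied><lb/>στατο φεύγε̄ν</l>'
    '<l n="2" met="hexameter">ἀ<supplied reason="omitted">λ</supplied>λ’ αὖθε πὲρ γᾶς '
    '<lb/> τᾶσδε πο<supplied reason="omitted">λ</supplied>λὸν ἀριστεύο̄ν ἔθανε.</l>'
    '</lg>'
)


def flatten(el, line_break):
    """Flatten an element to text, replacing <lb/> with the given marker."""
    if el.tag == "lb":
        return line_break
    text = el.text or ""
    for child in el:
        text += flatten(child, line_break) + (child.tail or "")
    if el.tag == "supplied":
        # omitted by the stonecutter: < >; anything else (e.g. lost): [ ]
        return f"<{text}>" if el.get("reason") == "omitted" else f"[{text}]"
    return text


def verse_lines(xml_text):
    """Return one string per <l>, with physical line breaks shown as '|'."""
    root = ET.fromstring(xml_text)
    return [flatten(line, "|") for line in root.iter("l")]


for line in verse_lines(VERSE):
    print(line)
```

Because the physical line breaks and the verse lines are both recorded in the same file, switching between the two layouts is a matter of which tags the program chooses to act on.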

Likewise, from the exact same encoding, the XML > HTML stylesheet can generate the diplomatic edition using a different output parameter:


Also, if this were an elegiac couplet, you could even set the metre attribute to met="elegiac", and the pentameter line would get the proper indentation.  For example:

[δέξ]ο ϝά[ν]αξ Κρονίδα{ι} Δ̣εῦ Ὀλύνπιε καλὸν ἄγαλμα
   hελέϝο[ι θυ]μο̃ι τοῖ<λ> Λακεδαιμονίοις

<div type="edition" xml:space="preserve">
<lg met="elegiac">
<l n="1" met="hexameter"><supplied reason="lost">δέξ</supplied>ο ϝά<supplied reason="lost">ν</supplied>αξ Κρονίδα<surplus>ι</surplus> <unclear>Δ</unclear>εῦ Ὀλύνπιε καλὸν ἄγαλμα</l>
<l n="2" met="pentameter">hελέϝο<supplied reason="lost">ι θυ</supplied>μο̃ι τοῖ<supplied reason="omitted">λ</supplied> Λακεδαιμονίοις</l>
</lg>
</div>

 There are lots of options for verse and metre tagging.  It’s a very cool feature.

Of course, this is just the technical production and presentation.  (For an example of how such output can look in an actual online corpus, see e.g. IRT 256a.)  Anyone could, after all, write plain HTML to publish inscriptions that human beings can read online.  The real usefulness of EpiDoc is that it encodes the text in XML, which makes the text not only capable of being transformed into something human-readable but also machine-readable as a whole.  This is the part that has made me an instant convert to digital epigraphy: because of the semantic tagging involved in XML, not just within the text but also in the lemma structure, a computer can read and process the corpus and sort the data by whatever criteria you have tagged.  That makes it a lot faster to search on specific parameters such as date, monument type, object material, find spot, current location, museum inventory number, alternative reference numbers in other corpora, language of the inscription, persons mentioned, locations mentioned, etc.  It also means that you can develop further tools that generate indices and search criteria from whatever you like on the basis of the semantic tagging.  Thus it radically cuts the time that you might otherwise have spent scanning by hand through inscription after inscription or papyrus after papyrus in print editions, or sorting through their indices.  An example of such complex indexing can be found in the EpiDoc digitisation of the Inscriptions of Roman Tripolitania; you’ll find similar things in other XML-based corpora of inscriptions.
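To make the point about searching and indexing concrete, here is a small Python sketch under heavy assumptions. The three records are invented placeholders, not real inscriptions, and the flat <inscription> layout is a simplification of my own: real EpiDoc files keep this metadata in the TEI header, in the TEI namespace, with a richer structure. The function names `search` and `index_persons` are likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented demo records in a simplified, hypothetical layout (not real data).
CORPUS = """<corpus>
<inscription id="demo-1">
  <material>marble</material><objectType>stele</objectType>
  <persName>Gaius</persName><persName>Lucius</persName>
</inscription>
<inscription id="demo-2">
  <material>limestone</material><objectType>base</objectType>
  <persName>Lucius</persName>
</inscription>
<inscription id="demo-3">
  <material>marble</material><objectType>base</objectType>
  <persName>Marcus</persName>
</inscription>
</corpus>"""


def search(root, **criteria):
    """Return the ids of records whose tagged values match every criterion."""
    return [
        rec.get("id")
        for rec in root.iter("inscription")
        if all(rec.findtext(tag) == value for tag, value in criteria.items())
    ]


def index_persons(root):
    """Build an alphabetical index: person name -> ids of records naming them."""
    index = defaultdict(list)
    for rec in root.iter("inscription"):
        for person in rec.iter("persName"):
            index[person.text].append(rec.get("id"))
    return dict(sorted(index.items()))


root = ET.fromstring(CORPUS)
print(search(root, material="marble"))
print(search(root, material="marble", objectType="base"))
print(index_persons(root))
```

An index of persons, places or materials is then just a by-product of the tagging, rather than something compiled by hand at the end of the editing process.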

There are a few limitations at present, but hopefully these will be resolved soon.  There is currently no readily available way for non-programmers to take a corpus of XML-encoded documents and cobble together a front-end to collate them and make them cross-searchable, but such a tool was recognised in the workshop as a desideratum and will hopefully see the light of day sometime in the future.  I would imagine that if such a tool were made available, EpiDoc could be greatly expanded as a tool not only for the publication of online corpora but also for individual researchers simply to organise and control the vast amount of data involved in, say, a Ph.D. project that makes extensive use of inscriptions or papyri.

In the meantime, it’s my opinion that the traditional paper-bound text corpus is unlikely to be entirely superseded, simply because it’s easier to cite something fixed in print than something mutable on the internet (we had a long and interesting conversation about this point at the pub after one of the workshop sessions).  Even so, it is clear to me that digital epigraphic and papyrological corpora are likely to become the future standard of how we produce our editions.*

*Note:  Producing corpora digitally does not necessarily exclude the continued production of print corpora by the same method, since the same encoding can be used for simultaneous online and print publication.  Closer to home, a similar situation is in the works here in the Faculty of Classics; the new Greek Lexicon Project was conceived from the start to achieve an integrated online and print publication, and is also being composed in XML.

About Matt Scarborough

न वदेद्यावनीं भाषां प्राणैः कण्ठगतैरपि । «One should not speak a Western language even to save one's life.»

One Response to London EpiDoc Workshop 2014

  1. mattitiahu says:

    Reblogged this on Memiyawanzi and commented:
    I just wrote this piece for Res Gerendae on the London EpiDoc workshop which I took part in last week. EpiDoc is an XML markup language for digital epigraphy and papyrology that has been gaining traction in the epigraphical community since the first release of its guidelines in 2000. Some of the regular readers here may also find this initiative interesting.
