Pottery drawings and the semantic web

It’s almost impossible to know how many potsherds archaeologists have drawn to date, but the number is no doubt well over one million (educated guesses welcome). When it comes to the standard question “How many of these drawings are on the web?”, however, there is going to be some disappointment, especially if you don’t count Google Books. In fact you should not count it, and I’m going to explain why in a moment.

In 2010, most drawings are still done by hand on a piece of paper, but it’s common to digitize them afterwards using vector graphics software like Autodesk AutoCAD™, Adobe Illustrator™ or Inkscape. There are some marked differences between CAD and a “graphics” program like Inkscape or Illustrator (below shortened as “I”):

  • “I” don’t manage measurement units; you can only refer to the size of the paper (e.g. A4)
  • CAD is very good at producing high-quality printouts, with fine tuning of line-widths and similar settings
  • CAD uses file formats that are understood almost only by CAD software (DXF, DWG)
  • “I” are capable of (natively) managing open file formats like SVG

There are many more differences, but these are the ones of interest here. I will now make a brief digression on why we make so many drawings and what purposes they serve from different points of view.

The first purpose of the drawing is to see the potsherd. This is true both because you look at it very carefully during the drawing process, and because the final result allows you and others to see it in a standardized layout, with some possibly hidden details clearly viewable on a clean surface. By no means am I saying that a drawing is the best representation of a potsherd, nor that it is the only one that should be considered for the purpose of dissemination and publication.

The second reason for drawing is being able to quickly go through an assemblage and develop typologies. Drawings make all the work easier and can be moved freely, while archaeological finds often cannot be moved from their storage place, for a number of reasons including lack of physical space, restrictions to movement imposed by conservation officers or even cultural heritage laws. Your drawings are a valuable digital copy of the assemblage, most probably together with a database. A digital copy of your drawings is one step further in the de-materialization of the archaeological assemblage (I’m going to write more about this issue soon).

The third and most prominent use of drawings is for publishing and dissemination of archaeological contexts, which follow de facto standards in each archaeological sub-domain, imposed either by custom or by editors. Some publications and excavation reports feature hundreds of drawings. Yet it seems that drawings are just passive illustrations that have no chance of being indexed, processed and disseminated in a proper format. I want something better than a raster image for my drawings, not just on my laptop but on the Web, too. Having your excavation publication in Google Books, or even as a downloadable PDF, is not what I’m envisioning here. Just like Tim Berners-Lee, I need raw data that I can build upon and play with.

Given these three assumptions, the obvious conclusion is that I need an open format for my data, and this translates to SVG. I’m not sure it’s the best format, but it’s certainly a decent one that has fairly good support both on the desktop and on the web and is not entirely obscure to the masses. Furthermore, SVG has been the subject of several digital experiments in archaeological publishing and dissemination. I realize that most efforts have been in the field of mapping and GIS, and I assume that the reason for this imbalance is that archaeologists doing GIS are, generally speaking, more tech-inclined than those doing ceramics. This is a recurring problem that underlies and causes much of the hyper-specialization we see today.

The only actual example of such an approach is Greek, Roman and Byzantine Pottery at Ilion (GRBP) by Sebastian Heath and Billur Tekkök. If you download the .tar.gz archive containing the entire website, you will find lots of SVG “source” files in the grbpottery/svg directory, along with their JPEG rasterizations. The SVG files are not used on the public website; they sit behind the scenes and were produced using Adobe Illustrator. With increasing support for SVG in major web browsers (including IE 9), it will be easy for GRBP to switch from JPEG to SVG if it turns out to be convenient.

Scale and units are provided in the GRBP drawings by a scale bar within the drawing itself, overcoming one (possibly serious) limitation of the SVG format: there is no straightforward way to use real measurement units. A second limitation is the orientation of the drawing: like most digital imaging, SVG has its origin in the top-left corner, whereas a bottom-left origin would be much more familiar to anyone who knows the Cartesian system. This topic is worth a separate discussion, but in the meantime you can take a look at Kotyle, a program to compute the capacity of ceramic vessels.
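To make the two limitations concrete, here is a minimal Python sketch of the conversion involved. It assumes you have measured two things from the drawing yourself: the height of the SVG canvas in user units, and how many user units the drawn scale bar covers per centimetre. The class name, field names and sample numbers are all hypothetical; this is not how GRBP or Kotyle do it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrawingFrame:
    """Maps SVG user units to centimetres (all values are hypothetical)."""

    height_units: float  # height of the SVG canvas, in user units
    units_per_cm: float  # user units covered by 1 cm of the drawn scale bar

    def to_cartesian_cm(self, x: float, y: float) -> tuple[float, float]:
        """Convert an SVG point (origin top-left, y pointing down)
        to centimetres with the origin bottom-left and y pointing up."""
        flipped_y = self.height_units - y
        return (x / self.units_per_cm, flipped_y / self.units_per_cm)


# Assumed example: a 400-unit-high canvas where the scale bar shows 20 units per cm.
frame = DrawingFrame(height_units=400.0, units_per_cm=20.0)
print(frame.to_cartesian_cm(100.0, 300.0))  # (5.0, 5.0)
```

The flip and the scaling are kept in one place on purpose, so that every point of a profile goes through the same conversion before any measurement is taken.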

Some considerations apply only if you have in mind a traditional publication work-flow, with a book at the end of the process. Instead, let’s assume we want to try out a new publishing process, geared towards the web. Since 2008, SVG has been able to include RDFa attributes. Period.

RDFa attributes mean annotations inside the drawing that are machine-readable ‒ I’m thinking about “rim” and “handle” getting their own attributes, just like an HTML p or span. I’m going to include the author name in the metadata, together with semantic links to the original context and the comparisons for the shape and decoration. This is something that needs to be done by hand, but could be done within a dedicated editor. Maybe directly on the web.
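As a rough illustration of what such an annotated drawing might look like, here is a small Python sketch that embeds a sample SVG and lists its RDFa attributes. Everything specific in it is an assumption made up for the example: the drawing and context URIs, the author name, the "pot:" vocabulary and the path data. Only Dublin Core Terms is a real vocabulary, and the placement of the attributes has not been checked against a validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated drawing: URIs, names and the "pot:" vocabulary are invented.
ANNOTATED_SVG = """\
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:dcterms="http://purl.org/dc/terms/"
     xmlns:pot="http://example.org/pottery#"
     viewBox="0 0 200 100"
     about="http://example.org/drawings/42">
  <title property="dcterms:title">Amphora rim fragment</title>
  <metadata property="dcterms:creator" content="A. N. Archaeologist"/>
  <metadata rel="dcterms:isPartOf" resource="http://example.org/contexts/1001"/>
  <g id="rim" typeof="pot:Rim">
    <path d="M 10 10 L 60 10 L 55 30 Z"/>
  </g>
  <g id="handle" typeof="pot:Handle">
    <path d="M 60 20 C 90 20, 90 60, 60 60"/>
  </g>
</svg>
"""


def list_annotations(svg_text: str) -> list[tuple[str, str, str]]:
    """Return (element name, RDFa attribute, value) triples in document order."""
    root = ET.fromstring(svg_text)
    found = []
    for element in root.iter():
        tag = element.tag.split("}")[-1]  # drop the "{namespace}" prefix
        for name in ("property", "rel", "typeof"):
            if name in element.attrib:
                found.append((tag, name, element.attrib[name]))
    return found


annotations = list_annotations(ANNOTATED_SVG)
for row in annotations:
    print(row)
```

The point is only that "rim" and "handle" become things a program can find, not just shapes a person can see.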

My idea for a sustainable digital work-flow of pottery drawings would thus be along these lines:

  • draw on paper, following traditional procedures
  • digitize paper drawing in Inkscape, using the GRBP model
  • add semantic annotations by hand with a text editor, including authorship and links to context and comparisons/typologies
  • publish a collection of SVG drawings, alongside their raster versions for users with legacy systems
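
Once a collection like this is published, indexing it becomes almost trivial. The following Python sketch walks a folder of SVG files and collects whatever RDFa "property" values it finds. The file names, the sample metadata and the function name are hypothetical, and the sample files are written to a temporary folder only so that the example runs on its own.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def index_drawings(folder: Path) -> dict[str, dict[str, str]]:
    """Return {file name: {RDFa property: value}} for every SVG in a folder."""
    index = {}
    for svg_path in sorted(folder.glob("*.svg")):
        root = ET.parse(svg_path).getroot()
        fields = {}
        for element in root.iter():
            prop = element.get("property")
            if prop is not None:
                # Prefer an explicit content attribute, else fall back to the text.
                fields[prop] = element.get("content") or (element.text or "").strip()
        index[svg_path.name] = fields
    return index


# Hypothetical sample drawing, reduced to its metadata.
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:dcterms="http://purl.org/dc/terms/">
  <title property="dcterms:title">{title}</title>
  <metadata property="dcterms:creator" content="{creator}"/>
</svg>"""

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "sherd-001.svg").write_text(
        SAMPLE.format(title="Rim fragment", creator="A. Author"), encoding="utf-8")
    (folder / "sherd-002.svg").write_text(
        SAMPLE.format(title="Handle", creator="B. Author"), encoding="utf-8")
    catalogue = index_drawings(folder)

for name, fields in catalogue.items():
    print(name, fields)
```

None of this is possible with a JPEG or a scanned page, which is the whole argument for keeping the annotations inside the drawing.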

This is all very simplistic, and that’s exactly why I’m publishing it here. All comments are welcome and will feed into the collaborative writing of a shared draft protocol.