From Photos and Negatives to Omeka and Flickr
A Step-by-Step Guide
Having acquired quite a number of glass plate negatives, celluloids and prints over the past few years for the Breslin Archive, with thousands more still to be scanned, we thought it might be useful for others to read a little bit about the process by which we go from negative to digital archive on a public service like Omeka or Flickr, including the injection of metadata along the way.
As the Omeka.net site says, “Omeka is a web publishing platform for sharing digital collections and creating media-rich online exhibits”. Flickr is a popular photo management and sharing platform. We use both for sharing photos online. You can try Omeka for free with up to 500 MB of storage, and the free version of Flickr allows you to store 1000 public images.
In this guide, we have attempted to use free systems so that anyone can replicate these steps. We carry out the process on a Mac, so some of the command line stuff is run in a Mac Terminal, but you could also do this on Linux or on a Windows PC with a Unix emulator such as Cygwin. The other tools we use below (ImageMagick, exiftool) are also free software.
Digitising Photos and Negatives
There have been various guides written about digitising photos and negatives (including glass plates) in the first place, and a good starting point is the minimum digitisation capture requirements from the ALCTS (ALA), along with blog posts such as this great overview on scanning and digitising from Kelly Miner who was an intern at Dublin City Council, this article from WebJunction/OCLC on the preservation of glass plate negatives, another guide on scanning that is now only available via the Wayback Machine, and this set of tips on housing glass plate negatives.
We have an Epson Perfection V800 photo scanner which works well, although there are other versions, newer and older. It has some negative holders for standard sizes (e.g., 35 mm negatives) and also a capital I-shaped negative guide for aligning odd-format negatives on the scanner bed. The scanner bed is about A4 size. We mostly use the default settings, but always in Professional Mode to fine tune some settings and to select the areas to be scanned after preview.
There are a few key choices, namely going for the best number of bits available for black and white or colour, and choosing the correct dots per inch. The ALCTS requirements above are a good starting point for minimum DPIs for various sizes, and you can often go a bit higher if you can afford the file size/storage increase (we tend to do so for glass plates). We first scan to an uncompressed TIFF file for local storage, and then do some further processing with ImageMagick to generate compressed JPEG/watermarked versions for use online (see below).
Don’t forget to scan negatives with the emulsion side up and away from the scanner bed (i.e., dull side facing up, glossy side down) to avoid damaging where the ‘information’ is stored.
Naming Files/Identifiers
For choice of file naming and data, this presentation on digital files/metadata from Sarah Gillis at the Worcester Art Museum is worth a look (along with the associated article here).
After various iterations, we’ve gone for something inspired by the above presentation but that also gives us a bit more flexibility in terms of future additions (it’s a bit like car registrations): 00000AAA-AAAAA000-D. You may need to use a batch file renamer like Transnomino if the scanning software limits you to a certain number of digits. The first five digits are just a sequence number reflecting the order in which the item was scanned into the archive. It would be easier if albums/collections were scanned in sequence, but they don’t have to be.
The next three letters are the type of photo source, e.g., CNG or GNG for celluloid or glass plate negative, BWP for black and white print, etc. The next eight characters are a five-character collection abbreviation (e.g., POOLE) followed by a three-digit identifier for the item in that collection. If you have more than 999 items, you could start another collection name like POOLB. We have some collections that span multiple albums, e.g., CAFEA, CAFEB, etc. If there is no collection, we sometimes use NOCOL followed by a number in sequence when scanning (e.g., NOCOL123).
Finally, we tag on a -D to show that this represents the digital copy of the physical item (as per Sarah Gillis’ suggestion). Here is an example file name: 00639CNG-CAFEE031-D.tif. We sometimes further tag on a -WM to indicate a watermarked version. We create this from the TIFF file using the script below, which produces a grayscale, normalised JPEG and a corresponding watermarked version resized to 2000 pixels wide:
# Run from the folder that contains uncompressed/ and the two logo files.
for f in uncompressed/*.tif
do
  echo "Converting $f"
  base="$(basename "$f" .tif)"
  # Grayscale, contrast-normalised JPEG at the original size
  convert "$f" -colorspace Gray -normalize "$base.jpg"
  # Temporary copy resized to 2000 pixels wide (height follows the aspect ratio)
  convert "$base.jpg" -resize 2000 "$base-W.jpg"
  # 1 if the scan is landscape (wider than tall), otherwise 0
  test=$(convert "$f" -format "%[fx:(w/h>1)?1:0]" info:)
  if [ "$test" -eq 1 ]; then
    composite -dissolve 25% -gravity SouthWest logo_landscape.jpg "$base-W.jpg" "$base-WM.jpg"
  else
    composite -dissolve 25% -gravity SouthWest logo_portrait.jpg "$base-W.jpg" "$base-WM.jpg"
  fi
  # Remove the temporary resized copy
  rm "$base-W.jpg"
done
The logo dimensions for watermarking are 1500 × 500 (portrait) and 1000 × 333 (landscape).
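As the file names carry a lot of meaning, it can be handy to check that a scan follows the 00000AAA-AAAAA000-D naming scheme described earlier before processing it. Below is a minimal sketch of such a check; the function name parse_scan_name is hypothetical, and the pattern assumes that collection abbreviations are always five capital letters and that only .tif and .jpg files are involved.

```shell
# Hypothetical helper: split a name such as 00639CNG-CAFEE031-D.tif into its
# parts, or fail if the name does not follow the scheme.
parse_scan_name() (
  name=$(basename "$1")
  # 5 digits, 3 letters, hyphen, 5 letters, 3 digits, -D, optional -WM, extension
  if ! printf '%s\n' "$name" |
    LC_ALL=C grep -Eq '^[0-9]{5}[A-Z]{3}-[A-Z]{5}[0-9]{3}-D(-WM)?\.(tif|jpg)$'; then
    echo "Unexpected file name: $name" >&2
    exit 1
  fi
  number=$(printf '%s\n' "$name" | cut -c1-5)        # scan sequence number
  type=$(printf '%s\n' "$name" | cut -c6-8)          # source type, e.g. CNG
  collection=$(printf '%s\n' "$name" | cut -c10-14)  # collection abbreviation
  item=$(printf '%s\n' "$name" | cut -c15-17)        # number within the collection
  echo "number=$number type=$type collection=$collection item=$item"
)

parse_scan_name "uncompressed/00639CNG-CAFEE031-D.tif"
# prints: number=00639 type=CNG collection=CAFEE item=031
```

A name that does not match (for example, one with a missing digit) makes the function return a non-zero status, so it can be used as a guard in a loop over a folder.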
We usually move files into three folders: uncompressed (TIFF, e.g. …-D.tif), compressed (…-D.jpg) and watermarked (…-D-WM.jpg), some with sub-folders. Now that we have watermarked versions, we create additional versions of these with embedded EXIF metadata. But first, to do that, we need to type each photo or negative’s associated metadata into an initial spreadsheet. We use this spreadsheet to help with the Omeka upload process as well.
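If you would rather not move the files by hand, the sorting into the three folders can be scripted. This is only a sketch under our own assumptions: the function names target_folder and sort_scans are hypothetical, it ignores sub-folders, and it expects to be run from the folder where the scans and generated JPEGs currently sit.

```shell
# Hypothetical helper: pick a destination folder from the file-name suffix.
target_folder() {
  case "$1" in
    *-D-WM.jpg) echo "watermarked" ;;
    *-D.jpg)    echo "compressed" ;;
    *-D.tif)    echo "uncompressed" ;;
    *)          return 1 ;;   # not one of our three kinds of file
  esac
}

# Hypothetical helper: move every matching file in the current folder.
sort_scans() {
  for f in *-D.tif *-D.jpg *-D-WM.jpg; do
    [ -e "$f" ] || continue              # skip patterns that matched nothing
    dir=$(target_folder "$f") || continue
    mkdir -p "$dir"
    mv -n "$f" "$dir/"                   # -n: never overwrite an existing file
  done
}
```

Calling sort_scans with no arguments in the working folder would then leave only the three folders (plus anything that does not follow the naming scheme).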
Spreadsheet for Mass Uploading to Omeka
Our metadata is stored in two ways. The first is a general spreadsheet that holds information on all of our scans (which become “Items” in Omeka), with headings that make it easy to export from Excel/Google Sheets to a .CSV file for mass importing into Omeka. The second is a sheet created later that holds tags for embedded EXIF metadata (i.e., stored within the JPEG files) for importing into Flickr.
We are using the hosted version of Omeka at Omeka.net, but this process should also work equally well for your own custom Omeka installations if using the CSV Import plugin. In Omeka, Items (“Still Images” or photos in our case) are stored in Collections.
Below is a table showing the headings in our spreadsheet. We use the commonly used Dublin Core Metadata Initiative (DCMI) semantic terms (dc:) which are easily mapped to fields used by Omeka’s CSV Import plugin, plus some custom fields of our own (jb:) which are used by us for indexing or to help generate other fields in Excel/Google Sheets. In the CSV Import, you can choose using dropdowns which imported (jb:/dc:) fields are supposed to map to the built-in Omeka fields.
jb:number
The number in sequence scanned, or the first five digits of the file name as above.
jb:type
The source item type, CNG, GNG, BWP, etc., which is usually a three-letter version of the still-image-item-type-metadata-original-format field below.
jb:collection_album
The collection and/or album name, e.g., POOLE, CAFEA, DLRLP, etc. A full-text version of this five-letter abbreviation is usually stored in Omeka’s “Collections” for each collection or album.
jb:collection_number
The number within the collection and/or album, three digits.
dc:title
The title of the image.
dc:description
The description of the image.
dc:creator
Who took the photograph, if known.
dc:source
Where did the photograph come from (book, newspaper, website, etc.), if known, e.g., “On An Irish Jaunting Car Through Donegal and Connemara”.
dc:date
The date the image was taken on, using YYYY-MM-DD format.
dc:rights
Copyright (public domain, Creative Commons, etc.).
dc:identifier
We use the identifier format described above in this field, e.g., 00018BWP-ONANI001-D. This is generated as a concatenation of the first four jb: fields above using an Excel/Google Sheets formula.
jb:location_streetview
If an exact location is known, we like to embed a Google Street View frame which is nice for then/now comparisons.
jb:location_url
Again, we use Google Maps for this, with URLs in the format https://www.google.com/maps/@52.140755,-8.2992043,14z so that we can easily extract the latitude and longitude coordinates for EXIF.
jb:location_name
A textual description of where this is located, e.g., Doolin, County Clare.
dc:coverage
We create this field from the previous two, hyperlinking the location name with the location URL. We also include the Google Street View if available as an iframe.
dc:publisher
For us, this is “the Breslin Archive”.
dc:subject
Keywords separated by semicolons.
still-image-item-type-metadata-original-format
Examples include Celluloid Negative, Glass Plate Negative, Black and White Print, etc.
still-image-item-type-metadata-physical-dimensions
Examples include 8 cm x 10 cm, 15 cm x 20 cm, etc.
jb:scan_date
The date the photograph was scanned, again using YYYY-MM-DD format. This is often close to when the photograph was made available online in the archive, and we sometimes include this in the copyright text (dc:rights) when a work is made available for the first time.
jb:image
We need to have a web accessible version of the (watermarked) photograph for Omeka to import. We use FTP/SCP (Cyberduck) to put it into a public directory, and this is the URL for the file.
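We generate dc:identifier and dc:coverage with spreadsheet formulas, but the same concatenations can be sketched on the command line if you prefer to build the CSV with a script. The function names below are hypothetical, and the anchor markup in make_coverage is only our assumption of what a hyperlinked location name could look like (the Street View iframe is left out).

```shell
# Hypothetical helper: remove leading zeros so that printf does not try to
# read a value such as 018 as an octal number.
strip_zeros() {
  n=$(printf '%s\n' "$1" | sed 's/^0*//')
  echo "${n:-0}"
}

# Hypothetical helper: build dc:identifier from the four jb: fields.
# $1 = jb:number, $2 = jb:type, $3 = jb:collection_album, $4 = jb:collection_number
make_identifier() {
  printf '%05d%s-%s%03d-D\n' "$(strip_zeros "$1")" "$2" "$3" "$(strip_zeros "$4")"
}

# Hypothetical helper: build dc:coverage as a hyperlinked location name.
# $1 = jb:location_name, $2 = jb:location_url
make_coverage() {
  printf '<a href="%s">%s</a>\n' "$2" "$1"
}

make_identifier 18 BWP ONANI 1
# prints: 00018BWP-ONANI001-D
make_coverage "Doolin, County Clare" "https://www.google.com/maps/@52.140755,-8.2992043,14z"
```

Padding with printf means the number columns can be typed either as plain numbers (18, 1) or as zero-padded text (00018, 001) and still give the same identifier.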
When we have typed up the above metadata into the spreadsheet, we can export it to a CSV file. We upload the watermarked items to a public space so that Omeka can access them for importing (see jb:image field above), and then import them using this CSV file via the Omeka CSV Import plugin. You can read these instructions on CSV Import for more.
When importing, you can select “Still Image” as the item type, choose whatever collection you have created previously to put them into (think “album”), mark items as public and featured too if you want, and choose semicolon as the tags delimiter.
On the field mapping screen, link all the dc: terms to their corresponding fields in Omeka (the names all match up), as well as the Still Image Original Format and Physical Dimensions fields. If you have used any HTML, you can tick the HTML checkbox for each necessary field (e.g., for dc:description, dc:coverage, dc:rights), tick the tags checkbox for dc:subject, and tick the files checkbox for jb:image as the file source to import.
Adding EXIF Metadata for Flickr Imports
We want to inject some EXIF, IPTC and GPS metadata into the watermarked items next. This is useful when importing into Flickr as it will pre-populate Flickr titles, descriptions, locations, etc. using this metadata, removing a lot of manual updating (as with Omeka in the previous step where the batch upload automatically populates each photo with its associated metadata).
Here are some general tips on EXIF and Flickr, and some more questions answered on EXIF and Flickr. Finally, you might also want to read about how to insert line breaks into an image description in EXIF. (If you have already uploaded images to Omeka and want to download the image metadata to a new CSV file using some Python scripts, try the Omeka API to CSV Script. You will need an API key, which can be found under the Users Admin page in Omeka. After running this script, you will have an items_output.csv file and a files_output.csv file with all the associated metadata from prior uploads.)
We use exiftool on the command line to inject the EXIF metadata into the watermarked JPEGs as a batch job. You can do this by passing data from another CSV file to exiftool using the following command, where the semicolon is to allow the keywords to be extracted (note that there’s no need for an equals sign after the -sep parameter), and watermarked as the last parameter is the folder where the files are stored/to be updated:
exiftool -csv=exiftool.csv -sep "; " watermarked
The headings in the exiftool.csv file are as follows (we use a corresponding exiftool.xls[x] to populate fields using various formulas described below, and then export this to a CSV):
SourceFile
The watermarked file location, e.g. watermarked/00040CNG-BROTC018-D-WM.jpg
xmp:Title
Corresponding to dc:title.
xmp:Description
Corresponding to dc:description. However, you can also concatenate a bunch of strings from the items_output sheet, e.g.
="Information: "&items_output.csv!$E2&"
Date: "&items_output.csv!$D2&"
Location: "&items_output.csv!$B2&"
Photographer: "&items_output.csv!$C2&"
Source: "&IF(items_output.csv!$I2<>"",items_output.csv!$I2,items_output.csv!$G2)&"
Ref.: "&items_output.csv!$F2&"
Link: "&SUBSTITUTE(items_output.csv!$W2, "/api/items", "/items/show")&"
License: "&items_output.csv!$H2&"
Format: "&items_output.csv!$K2&", "&items_output.csv!$L2
iptc:Keywords
Corresponding to dc:subject. Alternatively, if you don’t have any keywords, you can just use the space-separated words from the photo title, so you would need to remove any commas from the title using a formula such as =SUBSTITUTE(B2, ",","") and also use -sep " " in the exiftool command line.
xmp:gpslatitude
Extracted from the Google Maps URL, e.g., =IF(items_output.csv!$B2<>"", MID(items_output.csv!$B2, FIND("@", items_output.csv!$B2)+1, FIND(",", items_output.csv!$B2)-FIND("@", items_output.csv!$B2)-1), "")
xmp:gpslongitude
Extracted from the Google Maps URL, e.g., =IF(items_output.csv!$B2<>"", MID(items_output.csv!$B2, FIND(",", items_output.csv!$B2)+1, FIND("z", items_output.csv!$B2)-FIND(",", items_output.csv!$B2)-4), "")
exif:gpslatitude
As above.
exif:gpslongitude
As above.
exif:gpslatituderef
Extracted from the Google Maps URL, e.g., where E2 is xmp:gpslatitude you can use =IF(items_output.csv!$B2<>"", IF(VALUE(E2)<0, "S", "N"), "")
exif:gpslongituderef
Extracted from the Google Maps URL, e.g., where F2 is xmp:gpslongitude you can use =IF(items_output.csv!$B2<>"", IF(VALUE(F2)<0, "W", "E"), "")
exif:dateTimeOriginal
The date and time in the format YYYY:MM:DD HH:MM:SS, e.g., 1901:01:01 01:01:01.
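The spreadsheet formulas above for the GPS and date fields can also be mirrored in the shell, which may be useful if you want to check a value or build exiftool.csv with a script instead. This is a rough sketch: all of the function names are hypothetical, the URL is assumed to be in the Google Maps format shown earlier, and the default time of 00:00:00 is our own placeholder for photos where only the date is known.

```shell
# Latitude: the number between "@" and the first comma after it.
maps_lat() {
  printf '%s\n' "$1" | sed -n 's/.*@\(-\{0,1\}[0-9.]*\),.*/\1/p'
}

# Longitude: the number between the first and second commas after "@".
maps_lon() {
  printf '%s\n' "$1" | sed -n 's/.*@[^,]*,\(-\{0,1\}[0-9.]*\),.*/\1/p'
}

# Hemisphere references: negative latitude is South, negative longitude is West.
# An empty value (no location known) gives an empty reference.
lat_ref() { case "$1" in '') ;; -*) echo S ;; *) echo N ;; esac; }
lon_ref() { case "$1" in '') ;; -*) echo W ;; *) echo E ;; esac; }

# Convert a YYYY-MM-DD date to the EXIF form YYYY:MM:DD HH:MM:SS.
# $2 is an optional time of day; 00:00:00 is an assumed default.
exif_datetime() {
  printf '%s %s\n' "$(printf '%s\n' "$1" | sed 's/-/:/g')" "${2:-00:00:00}"
}

url='https://www.google.com/maps/@52.140755,-8.2992043,14z'
lat=$(maps_lat "$url")
lon=$(maps_lon "$url")
echo "$lat $(lat_ref "$lat") $lon $(lon_ref "$lon")"
# prints: 52.140755 N -8.2992043 W
exif_datetime 1901-01-01 01:01:01
# prints: 1901:01:01 01:01:01
```

Unlike the longitude formula, which relies on the zoom level being two digits long, the pattern here stops at the second comma, so it does not depend on the length of the zoom value.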
When you run the exiftool command above, it will rename the original files to have a suffix of .jpg_original, create new files with the same original file name (.jpg suffix), and you should get a response similar to this:
1 directories scanned
68 image files updated
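Once you are happy with the updated files, you may want to get the .jpg_original backups out of the watermarked folder so that they are not picked up by mistake later on. A minimal sketch of one way to do this is below; the function name stash_backups and the backup folder name are hypothetical. (exiftool also has its own -delete_original and -overwrite_original options if you would rather not keep the backups at all.)

```shell
# Hypothetical helper: move exiftool's *_original backups to another folder.
# $1 = folder that exiftool updated, $2 = folder to keep the backups in
stash_backups() {
  src=$1
  dest=$2
  mkdir -p "$dest"
  for f in "$src"/*.jpg_original; do
    [ -e "$f" ] || continue   # nothing to do if there are no backups
    mv "$f" "$dest/"
  done
}
```

For example, stash_backups watermarked watermarked_backups would leave only the metadata-enhanced JPEGs in the watermarked folder.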
In Flickr, it is simply a matter of dragging and dropping the now metadata-enhanced watermarked images into the Upload area, and you will see all photos, titles, descriptions and tags being automatically populated. Before you finalise the upload, you may want to change the licence type for the uploaded images, as well as choosing a photo album to upload to (new or existing).
Et voilà! Each photo on Flickr is pre-populated with a lot of metadata, including map location if the latitude and longitude are known, a rich hyperlinked description, tags, etc.
We hope you found this guide of interest, and welcome any comments or questions below, or via email at archive@breslin.org. You can visit the Breslin Archive at www.breslin.org.