Processing EXIF Data
The EXIF data were dumped using exiftool.
The resulting data were a lengthy series of records (one for each image file) that looked like this:
The data for each image begin at the “========” separator and the number of fields varies according to what information is available per image.
Getting the Data into R
The process of importing the data into R and transforming it into a workable structure took a little bit of work. Nothing too tricky though.
|Basically, here I loaded all of the data using readLines() which gave me a vector of strings. I then concatenated all of those strings into a single very, very long string. I sliced this up into blocks using the separator string “======== “ and then split the records in each block at the pipe symbol “||”. Finally, I found that the results had a few empty elements which I simply discarded.|
The resulting data looked like this:
The next step was to reformat the data in each of these records and consolidate into a data frame.
Note the use of the handy utility function setNames() which was used to avoid creating a temporary variable. The result is a list, each element of which is a sub-list with named fields. A typical element looks like
The next step was to concatenate all of these elements to form a single data frame. Normally I would do this using a combination of do.call() and rbind() but this will not work for the present case because the named lists for each of the images do not all contain the same set of fields. So, instead I used ldply(), which deals with this situation gracefully.
The final data, after a few more minor manipulations, is formatted as a neat data frame.
Plots and Analysis
There are quite a few photographs in the data.
The only sensible way to understand my photography habits is to produce some visualisations. The three elements in the exposure triangle are ISO, shutter speed (or exposure time) and aperture (or F-number). I try to always shoot at the lowest ISO, so the two variables of interest are shutter speed and aperture.
First let’s look at mosaic and association plots for these two variables.
Mosaic plots are a powerful way of understanding multivariate categorical data. The mosaic plot above indicates the relative frequency of photographs with particular combinations of shutter speed and aperture. The large blue block in the tenth row from the top indicates that there are many photographs at 1/60 second and F/2.8. Whereas the small blue block on the right of the top row indicates that there are relatively few photographs at 1 second and F/32. The colour shading in the plot indicates whether there are too many (blue) or too few (red) photographs with a given combination of shutter speed and aperture relative to the assumption that these two variables are independent [1,2]. Grey blocks are not inconsistent with the assumption of independence. Since there are a significant number of blue and red blocks, the data suggests that there is a significant relationship between shutter speed and aperture (which is just what one would expect!).
The association plot provides essentially the same information, with the area of a block being positive (blue and above the line) or negative (red and below the line) according to whether or not the observed data are greater than or less than what would be expected in the case of independence.
A somewhat simpler way of looking at these data is a heat map, which just gives the number of counts for each combination of shutter speed and aperture and does not make any model comparisons. It is presented on a regular grid, which makes it a little easier on the mind too (although it is appreciably weaker in terms of information content).
Again we can see that the overwhelming majority of my photographs have been taken at 1/60 second and F/2.8, which I am ashamed to say shows a distinct lack of imagination! (Note to self to remedy this situation).
And, finally, another heat map which shows how the number of photographs I have taken has evolved over time. There have been some very busy periods like the (southern hemisphere) summers of 2004/5 and 2007/8 when I went to Antarctica, and April/May 2007 when I visited Marion Island.
There are a lot of other interesting things that one might do with these data. For example, looking at the upper and lower limits of focus distance, but these will have to wait for another day.
 Zeileis, Achim, David Meyer, and Kurt Hornik. “Residual-based Shadings in vcd.”
 Visualizing contingency tables