satRday in Cape Town


We are planning to host one of the three inaugural satRday conferences in Cape Town during 2017. The R Consortium has committed to funding three of these events: one will be in Hungary, another will be somewhere in the USA and the third will be at an international destination. At present Cape Town is dicing it out with Monterrey (Mexico) for the third location. We just need your votes to make Cape Town’s plans a reality.

The satRday will probably happen in late February or early March 2017. This is the end of southern hemisphere Summer and the Cape is at its best, with glorious weather and the peak Summer tourist rush over. You could easily factor satRday into a vacation in sunny South Africa.

Why Cape Town?

Cape Town is literally the jewel of Southern Africa:

– Table Mountain (spectacular view, great hiking, cable car);
– Wine farms (too many to mention, but all within a short drive of the city and most offering free tastings);
– Boulders Beach (pristine beach in Simonstown with large colony of Jackass Penguins);
– Camps Bay, Muizenberg and many other idyllic beaches;
– Robben Island (return boat trip across Table Bay, tour of the Maximum Security Prison, and a bus tour of the Island);
– the Victoria & Alfred Waterfront and the Two Oceans Aquarium;
– the Kirstenbosch National Botanical Garden; and
– lots, lots more.

Did I mention the wine and Table Mountain? Ah, yes, I did.

This is what the weather looked like in Cape Town at the end of February 2016: temperatures around 20 to 25 °C (70 to 80 Fahrenheit), light breezes and zero precipitation.


For International Tourists

Cape Town is well connected with the rest of the World. There are direct flights to Cape Town International Airport from Amsterdam, Buenos Aires, Doha, Dubai, Frankfurt, Istanbul, London and Munich.

The exchange rate is extremely favourable (see below), making South Africa rather affordable for the international traveller. A healthy meal will cost you around 100 ZAR and a great bottle of wine can be had for about the same price. Decent accommodation with sweeping views of the sea or mountain costs between 700 and 1000 ZAR per night, but you can find more affordable (or more lavish) options.


Public transport is a little sparse, but Uber will take you anywhere you need to go.

We know that there are some security concerns about South Africa, but Cape Town is a very safe city. The two venues that we are considering for the conference are secure and easy to access:

– the campus of the University of Cape Town;
– the trendy suburb of Green Point, close to the Waterfront.

For useRs from the USA, Cape Town is somewhat further away than Monterrey, but it’s a trip you won’t regret making. For European useRs it’s closer than Mexico and there is no time zone change.

We look forward to hosting you next year at satRday in Cape Town, the Mother City. Please vote for Cape Town now.



R Saturday [satRday] in Cape Town


I put in a proposal to host an R Saturday [satRday] in Cape Town next year. The R Consortium has committed to funding three of these events: one will be in Hungary, another will be somewhere in the USA and the third will be elsewhere in the world. The voting has opened for the location of these events.

Cast your vote for Cape Town here.


The image above reflects the results on the morning of 12 May 2016. You can find the current poll results here.

International Open Data Day

As part of International Open Data Day we spent the morning with a bunch of like-minded people poring over some open Census South Africa data. Excellent initiative, @opendatadurban! I’m very excited to see where this is all going and look forward to contributing to the journey!


The data above show the distribution of ages in a segment of the South African population who have either no schooling (blue) or have completed Grade 12 (orange). Click the image to access the interactive plot. Being a subset of the complete data set, this does not tell the full story, but it’s still a rather interesting view on the state of education in South Africa.

R, HDF5 Data and Lightning

I used to spend an inordinate amount of time digging through lightning data. These data came from a number of sources, the World Wide Lightning Location Network (WWLLN) and LIS/OTD being the most common. I recently needed to work with some Hierarchical Data Format (HDF) data. HDF is something of a niche format and, since that was the format used for the LIS/OTD data, I went to review those old scripts. It was very pleasant rediscovering work I did some time ago.


The Optical Transient Detector (OTD) and Lightning Imaging Sensor (LIS) were instruments for detecting lightning discharges from Low Earth Orbit. OTD was launched in 1995 on the MicroLab-1 satellite into a near-polar orbit with inclination 70°. OTD achieved global (spatial) coverage for the period May 1995 to April 2000 with roughly 60% uptime. LIS was an instrument on the TRMM satellite, launched into a 35° inclination orbit during 1997. Data from LIS were thus confined to more tropical latitudes. The TRMM mission only ended in April 2015.

The seminal work using data from OTD, Global frequency and distribution of lightning as observed from space by the Optical Transient Detector, was published by Hugh Christian and his collaborators in 2003. It’s open access and well worth a read if you are interested in where and when lightning happens across the Earth’s surface.

Preprocessing HDF4 to HDF5

The LIS/OTD data are available as HDF4 files. To load them into R I first converted to HDF5 using a tool from the h5utils suite:

$ h5fromh4 -d lrfc LISOTD_LRFC_V2.3.2014.hdf

Loading HDF5 in R

Support for HDF5 in R appears to have evolved appreciably in recent years. I originally used the hdf5 package, then some time later transitioned to the h5r package. Neither of these appears on CRAN at present. Current support for HDF5 is via the h5 package. This package depends on the HDF5 C++ library, which I needed to grab.

$ sudo apt-get install libhdf5-dev

Then, back in R I installed and loaded the h5 package.

> install.packages("h5")
> library(h5)

Ready to roll!

Low Resolution Full Climatology

The next step was to interrogate the contents of one of the HDF files. A given file may contain multiple data sets (this is part of the “hierarchical” nature of HDF), so we’ll check on what data sets are packed into one of those files. Let’s look at the Low Resolution Full Climatology (LRFC) data.

> file = h5file(name = "data/LISOTD_LRFC_V2.3.2014.h5", mode = "r")
> dataset = list.datasets(file)
> cat("datasets:", dataset, "\n")
datasets: /lrfc

Just a single data set, but that’s by design: we only extracted one using h5fromh4 above. What are the characteristics of that data set?

> print(file[dataset])
DataSet 'lrfc' (72 x 144)
type: numeric
chunksize: NA
maxdim: 72 x 144

It contains numerical data and has dimensions 72 by 144, which means that it has been projected onto a latitude/longitude grid with 2.5° resolution. We’ll just go ahead and read in those data.

> lrfc = readDataSet(file[dataset])
> class(lrfc)
[1] "matrix"
> dim(lrfc)
[1]  72 144

That wasn’t too hard. And it’s not much more complicated if there are multiple data sets per file.
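The 72 by 144 shape is enough to reconstruct the grid coordinates. Here's a minimal sketch in base R that computes the cell centres. It assumes that rows run from south to north and columns from west to east, starting at 90° S and 180° W. That ordering is my assumption for illustration, so check it against the documentation for the file before relying on it.

```r
# Dimensions reported for the lrfc data set.
n_lat <- 72
n_lon <- 144

# Cell size in degrees: 180 / 72 = 360 / 144 = 2.5.
res <- 180 / n_lat

# Cell centres, assuming the grid starts at 90 S and 180 W (an assumption).
lat <- -90 + res * (seq_len(n_lat) - 0.5)
lon <- -180 + res * (seq_len(n_lon) - 0.5)

range(lat)
range(lon)
```

If the rows turn out to run from north to south instead, reversing lat with rev() is all that's needed.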

Below is a ggplot showing the annualised distribution of lightning across the Earth’s surface. It’s apparent that most lightning occurs over land in the tropics, with the highest concentration in Central Africa. The units of the colour scale are flashes per square km per year. Higher resolution data can be found in the HRFC file (High Resolution Full Climatology), but the LRFC is quite sufficient to get a flavour of the data.
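To get from a matrix to something ggplot can draw, the data need to be in long format with one row per grid cell. The sketch below is one way to do that, not necessarily how the map above was produced. It uses a random stand-in matrix so that it runs without the HDF5 file, and it assumes rows run south to north and columns west to east.

```r
# Stand-in for the 72 x 144 matrix returned by readDataSet(); the values are
# random and purely illustrative, not real flash rates.
set.seed(1)
lrfc <- matrix(runif(72 * 144), nrow = 72, ncol = 144)

# Cell centres, assuming rows run south to north and columns west to east.
lat <- -90 + 2.5 * (seq_len(nrow(lrfc)) - 0.5)
lon <- -180 + 2.5 * (seq_len(ncol(lrfc)) - 0.5)

# expand.grid() varies its first argument fastest, which matches the
# column-major order in which as.vector() unrolls the matrix.
flashes <- expand.grid(lat = lat, lon = lon)
flashes$rate <- as.vector(lrfc)

head(flashes)

# Build the plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(flashes, aes(x = lon, y = lat, fill = rate)) +
    geom_raster() +
    coord_fixed() +
    labs(x = "Longitude", y = "Latitude", fill = "Flash rate")
  # print(p) draws the map.
}
```

The important detail is the argument order in expand.grid(): swapping lat and lon there would silently scramble the map.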


Low Resolution Annual Climatology

The Low Resolution Annual Climatology (LRAC) data have the same spatial resolution as LRFC but the data are broken down by day of year. This allows us to see how lightning activity varies at a particular location during the course of a year.

> file = h5file(name = "data/LISOTD_LRAC_V2.3.2014.h5", mode = "r")
> dataset = list.datasets(file)
> lrac = readDataSet(file[dataset])
> dim(lrac)
[1]  72 144 365

The data are now packed into a three-dimensional array, where the first two dimensions are spatial (as for LRFC) and the third dimension corresponds to day of year.

We’ll look at two specific grid cells, one centred at 28.75° S 28.75° E (near the northern border of Lesotho, which according to the plot above is a region of relatively intense lightning activity) and the other at 31.25° S 31.25° E (in the Indian Ocean, just off the coast of Margate, South Africa). The annualised time series are plotted below. Lesotho has a clear annual cycle, with peak lightning activity during the Summer months but extending well into Spring and Autumn. There is very little lightning activity in Lesotho during Winter due to extremely dry and cold conditions. The cell over the Indian Ocean has a relatively high level of sustained lightning activity throughout the year. This is due to the presence of the warm Agulhas Current flowing down the east coast of South Africa. We wrote a paper about this, Processes driving thunderstorms over the Agulhas Current. It’s also open access, so go ahead and check it out.
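Pulling the time series for a single cell out of the array comes down to finding the right row and column and then slicing along the third dimension. Here's a rough sketch using a random stand-in array, so it runs without the HDF5 file. The helper nearest_cell() is hypothetical, and the grid orientation (rows south to north, columns west to east) is again an assumption.

```r
# Stand-in for the 72 x 144 x 365 array returned by readDataSet(); random
# values, purely illustrative.
set.seed(1)
lrac <- array(runif(72 * 144 * 365), dim = c(72, 144, 365))

# Cell centres, assuming rows run south to north and columns west to east.
lat <- -90 + 2.5 * (seq_len(72) - 0.5)
lon <- -180 + 2.5 * (seq_len(144) - 0.5)

# Hypothetical helper: index of the cell whose centre is nearest to a value.
nearest_cell <- function(centres, value) which.min(abs(centres - value))

# Southern latitudes are negative, eastern longitudes positive.
i <- nearest_cell(lat, -28.75)
j <- nearest_cell(lon, 28.75)

# One value per day of year for the cell near Lesotho.
lesotho <- lrac[i, j, ]
length(lesotho)
```

The same two lines with -31.25 and 31.25 give the series for the cell over the Indian Ocean.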


Although the data in the plot above are rather jagged, with some aggregation and mild filtering they become pleasingly smooth and regular. We observed that you could actually fit the resulting curves rather well with just a pair of sinusoids. That work was documented in A harmonic model for the temporal variation of lightning activity over Africa. Like the other two papers, it’s open access. Enjoy (if that sort of thing is your cup of tea).
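For a feel of what fitting a pair of sinusoids looks like in practice, here's a small sketch on synthetic data. It is not the procedure from the paper: I'm assuming the two sinusoids are an annual and a semi-annual harmonic, the smoothing is a plain 7-day moving average, and all of the coefficients are invented for illustration.

```r
# Synthetic daily series: an annual and a semi-annual sinusoid plus noise.
# The coefficients are invented for illustration.
set.seed(42)
day <- 1:365
omega <- 2 * pi / 365
rate <- 10 + 6 * cos(omega * day) + 2 * sin(2 * omega * day) +
  rnorm(length(day), sd = 1)

# Mild smoothing with a centred 7-day moving average; the ends become NA.
smooth <- as.numeric(stats::filter(rate, rep(1 / 7, 7), sides = 2))

# Linear regression on sine and cosine terms at the annual and semi-annual
# frequencies; lm() drops the NA rows by default.
fit <- lm(smooth ~ cos(omega * day) + sin(omega * day) +
            cos(2 * omega * day) + sin(2 * omega * day))

round(coef(fit), 2)
```

Because the model is linear in the sine and cosine coefficients, ordinary least squares is enough and no iterative fitting is required. The amplitude and phase of each harmonic can be recovered from its cosine and sine coefficients afterwards.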


My original intention with this post was to show how to handle HDF data in R. But in retrospect it has achieved a second objective, showing that it’s possible to do some meaningful Science with data that’s in the public domain.

Ira Glass on the Creative Process

Nobody tells this to people who are beginners, I wish someone told me. All of us who do creative work, we get into it because we have good taste. But there is this gap. For the first couple years you make stuff, it’s just not that good. It’s trying to be good, it has potential, but it’s not. But your taste, the thing that got you into the game, is still killer. And your taste is why your work disappoints you. A lot of people never get past this phase, they quit. Most people I know who do interesting, creative work went through years of this. We know our work doesn’t have this special thing that we want it to have. We all go through this. And if you are just starting out or you are still in this phase, you gotta know it’s normal and the most important thing you can do is do a lot of work. Put yourself on a deadline so that every week you will finish one story. It is only by going through a volume of work that you will close that gap, and your work will be as good as your ambitions. And I took longer to figure out how to do this than anyone I’ve ever met. It’s gonna take awhile. It’s normal to take awhile. You’ve just gotta fight your way through.

Ira Glass