Categorically Variable

Only search Categorically Variable.

Largest Volcanoes in Recorded History (and other statistics)

Around 199 years ago the largest volcano in recorded history, Mount Tambora, erupted, spewing an enormous volume of molten rock and ash into the atmosphere and onto the surrounding land.

 Read more

Earthquakes: Magnitude / Depth Chart

I am working on a project related to secondary effects of earthquakes. To guide me in the analysis I need a chart showing the location, magnitude and depth of recent earthquakes. There are a host of such charts available already, but since I had the required data on hand, it seemed like a good idea to take a stab at it myself.

 Read more

Daylight Saving Effect on Financial Indices

Does the transition to and from Daylight Saving Time (DST) have a (significant) effect on the stock market?

 Read more

Filtering Data with L1 Regularisation

A few days ago I posted about Filtering Data with L2 Regularisation. Today I am going to explore the other filtering technique described in the paper by Tung-Lam Dao.

 Read more

Filtering Data with L2 Regularisation

I have just finished reading Momentum Strategies with L1 Filter by Tung-Lam Dao. The smoothing results presented in this paper are interesting and I thought it would be cool to implement the L1 and L2 filtering schemes in R. We’ll start with the L2 scheme here because it has an exact solution and I will follow up with the L1 scheme later on.

 Read more

Real Time Lightning Data

A map of real time lightning data from the World Wide Lightning Location Network (WWLLN) is now available here.

 Read more

How Much Time to Conceive?

This morning my wife presented me with a rather interesting statistic: a healthy couple has a 25% chance of conception every month [1], and that this should result in a 75% to 85% chance of conception after a year. This sounded rather interesting and it occurred to me that it really can’t be that simple. There are surely a lot of variables which influence this probability. Certainly age should be a factor and, after a short search, I found some more age-specific information which indicated that for a woman in her thirties, the probability is only around 15% [2,3].

 Read more

Processing EXIF Data

I got quite inspired by the EXIF with R post on the Timely Portfolio blog and decided to do a similar analysis with my photographic database.

 Read more

Updated Comrades Winners Chart

On Friday I received my copy of The Official Results Brochure for the 2013 Comrades Marathon. Always makes for a diverting half an hour’s reading. And the tables at the front provide some very interesting statistics. Seemed like a good opportunity to update my Chart of Comrades Winners.

 Read more

Contour And Density Layers With Ggmap

 Read more

Amy Cuddy: Your body language shapes who you are...

… but be careful with your plots: they might be misinterpreted.

Amy Cuddy gives a great talk. Provided me with lots to think about and I will happily confess that I have struck a few power poses (but only after ensuring that I am quite alone)!

 Read more

Deriving a Priority Queue from a Plain Vanilla Queue

Following up on my recent post about implementing a queue as a reference class, I am going to derive a Priority Queue class.

 Read more

Implementing a Queue as a Reference Class

I am working on a simulation for an Automatic Repeat-reQuest (ARQ) algorithm. After trying various options, I concluded that I would need an implementation of a queue to make this problem tractable. R does not have a native queue data structure, so this seemed like a good opportunity to implement one and learn something about Reference Classes in the process.

The Implementation

We use setRefClass() to create a generator function which will create objects of the Queue class.

Queue <- setRefClass(Class = "Queue",
                     fields = list(
                       name = "character",
                       data = "list"
                     ),
                     methods = list(
                       size = function() {
                         'Returns the number of items in the queue.'
                         return(length(data))
                       },
                       #
                       push = function(item) {
                         'Inserts element at back of the queue.'
                         data[[size()+1]] <<- item
                       },
                       #
                       pop = function() {
                         'Removes and returns head of queue (or raises error if queue is empty).'
                         if (size() == 0) stop("queue is empty!")
                         value <- data[[1]]
                         data[[1]] <<- NULL
                         value
                       },
                       #
                       poll = function() {
                         'Removes and returns head of queue (or NULL if queue is empty).'
                         if (size() == 0) return(NULL)
                         else pop()
                       },
                       #
                       peek = function(pos = c(1)) {
                         'Returns (but does not remove) specified positions in queue (or NULL if any one of them is not available).'
                         if (size() < max(pos)) return(NULL)
                         #
                         if (length(pos) == 1) return(data[[pos]])
                         else return(data[pos])
                       },
                       initialize=function(...) {
                         callSuper(...)
                         #
                         # Initialise fields here (place holder)...
                         #
                         .self
                       }
                     )
)

Methods of the Generator Function

Let’s look at some of the methods for the generator function. The first of these is new(), which will create a fresh instance of the class. We will return to more examples of this later.

> Queue$new()
Reference class object of class "Queue"
Field "name":
character(0)
Field "data":
list()

Next we have methods() and fields() which, not surprisingly, return vectors of the names of the class’s methods and data fields respectively.

> Queue$methods()
 [1] "callSuper"    "copy"         "export"       "field"        "getClass"     "getName"      "getRefClass" 
 [8] "import"       "initFields"   "initialize"   "peek"         "poll"         "pop"          "push"        
[15] "setName"      "show"         "size"         "trace"        "untrace"      "usingMethods"
> Queue$fields()
       name        data 
"character"      "list" 

Lastly the help() method which, by default, returns some general information about the class.

> Queue$help()
Usage:  $help(topic) where topic is the name of a method (quoted or not)
The definition of class Queue follows.
Reference Class "Queue":

Class fields:
                          
Name:       name      data
Class: character      list

 Class Methods:  
    "callSuper", "copy", "export", "field", "getClass", "getName", "getRefClass", "import", "initFields", "initialize", 
"peek", "poll", "pop", "push", "setName", "show", "size", "trace", "untrace", "usingMethods"


 Reference Superclasses:  
    "envRefClass"

However, if provided with a method name as an argument, it will provide a help string for the specifed method. These help strings are embedded in the class definition using a syntax which will be familiar to Python programmers.

> Queue$help(push)
Call:
$push(item)


Inserts element at back of the queue.

Creating Objects (and some Reference Class Features)

Each Queue object can be assigned a name. I am not sure whether this will be required, but it seemed like a handy feature to have. This can be done after the object has been created…

> q1 <- Queue$new()
> q1$name <- "test queue"
> q1$name
[1] "test queue"
> 
> q1
Reference class object of class "Queue"
Field "name":
[1] "test queue"
Field "data":
list()

… or during the creation process.

> q2 <- Queue$new(name = "another test") > q2$name
[1] "another test"

We have seen that the name field can be accessed using the dollar operator. We can also use the accessors() method for the generator function to create methods for reading and writing to the name field.

> Queue$accessors("name")
>
> q2$getName()
[1] "another test"
> q2$setName("a new name")
> q2$getName()
[1] "a new name"

Standard Queue Functionality

Let’s get down to the actual queue functionality. First we add elements to the queue using the push() method.

> q2$push("item number one")
> q2$push(2)
> q2$push("third item")
> q2$size()
[1] 3

The queue now contains three items (two strings and a number). The items are stored in the data field. Ideally this should not be directly accessible, but the Reference Class implementation does not provide for private data. So you can access the items directly (but this defeats the object of a queue!).

> q2$data
[[1]]
[1] "item number one"
[[2]]
[1] 2
[[3]]
[1] "third item"

Next we can start to access and remove items from the queue. The first way to do this uses pop(), which returns the item at the head of the queue and at the same time removes it from the queue.

> q2$pop()
[1] "item number one"
> q2$pop()
[1] 2
> q2$size()
[1] 1

What if we just want to have a look at the first item in the queue but not remove it? Use peek().

> q2$peek()
[1] "third item"
> q2$size()
[1] 1

Next we have poll(), which is very much like pop() in that it removes the first item from the queue. But, as we will see in a moment, it behaves differently when the queue is empty.

> q2$poll()
[1] "third item"
> q2$size()
[1] 0

So we have ejected all three items from the queue and we should expect the behaviour of these functions to be a little different now. pop() generates an error message: obviously we cannot remove an item from the queue if it is empty!

> try(q2$pop())
Error in q2$pop() : queue is empty!

This might not be the best behaviour for harmonious code, so peek() and poll() more elegantly indicate that that the queue is exhausted by returning NULL.

> q2$peek()
NULL
> q2$poll()
NULL

Non-Standard Queue Functionality

I am going to need to be able to take a look at multiple items in the queue, so the peek() method accepts an argument which specifies item locations. To illustrate, first let’s push a number of items onto a queue.

> q3 <- Queue$new(name = "letter queue") >
> sapply(letters, function(n) {q3$push(n)})
a b c d e f g h i j k l m n o p q r s t u v w x y z
"a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
>
> q3$size()
[1] 26

As we have already seen, the default peek() behaviour is just to take a look at the item at the head of the queue.

> q3$peek()
[1] "a"

But we can also choose items further down the queue as well as ranges of items.

> q3$peek(2)
[1] "b"
> q3$peek(4:7)
[[1]]
[1] "d"
[[2]]
[1] "e"
[[3]]
[1] "f"
[[4]]
[1] "g"

If the requested range of locations extends beyond the end of the queue then the method returns NULL as before.

> q3$peek(24:28)
NULL

Conclusion

That is all the infrastructure I will need for my ARQ project. Well, almost. I would actually prefer a Priority Queue. I’ll be back with the implementation of that in a day or two. In the meantime though, if anyone has any ideas for additional functionality or has suggestions for how this code might be improved, I would be very pleased to hear them.

This code is now published in the liqueueR package.

 Read more

Lightning Activity Predictions For Single Buoy Moorings

A short talk that I gave at the LIGHTS 2013 Conference (Johannesburg, 12 September 2013). The slides are a little short on text because I like the audience to hear the content rather than read it. The objective with this project was to develop a model which would predict the occurrence of lightning in the vicinity of a Single Buoy Mooring (SBM). Analysis and visualisations were all done in R. I used data from the World Wide Lightning Location Network (WWLLN) and considered four possible models: Neural Network, Conditional Inference Tree, Support Vector Machine and Random Forest. Of the four, Random Forests produced the best performance. The preliminary results from the Random Forests model are very promising: there is good agreement between predicted and observed lightning occurrence in the vicinity of the SBM.

 Read more

Iterators In R

 Read more

Introduction to Fractals

A short while ago I was contracted to write a short piece entitled “Introduction to Fractals”. The article can be found here. Admittedly it is hard to do justice to the topic in less than 1000 words.

 Read more

Percolation Threshold: Including Next-Nearest Neighbours

In my previous post about estimating the Percolation Threshold on a square lattice, I only considered flow from a given cell to its four nearest neighbours. It is a relatively simple matter to extend the recursive flow algorithm to include other configurations as well.

Malarz and Galam (2005) considered the problem of percolation on a square lattice for various ranges of neighbor links. Below is their illustration of (a) nearest neighbour “NN” and (b) next-nearest neighbour “NNN” links.

Implementing Next-Nearest Neighbours

There were at least two options for modifying the flow function to accommodate different configurations of neighbours: either they could be given as a function parameter or defined as a global variable. Although the first option is the best from an encapsulation perspective, it does make the recursive calls more laborious. I got lazy and went with the second option.

> neighbours = c("NN")
> 
> flow <- function(g, i = NA, j = NA) {
+   # -> Cycle through cells in top row
+   #
+   if (is.na(i) || is.na(j)) {
+     for (j in 1:ncol(g)) {
+       g = flow(g, 1, j)
+     }
+     return(g)
+   }
+   #
+   # -> Check specific cell
+   #
+   if (i < 1 || i > nrow(g) || j < 1 || j > ncol(g)) return(g)
+   #
+   if (g[i,j] == OCCUPIED || g[i,j] == FLOW) return(g)
+   #
+   g[i,j] = FLOW
+   
+   if ("NN" %in% neighbours) {
+     g = flow(g, i+1, j)           # down
+     g = flow(g, i-1, j)           # up
+     g = flow(g, i, j+1)           # right
+     g = flow(g, i, j-1)           # left
+   }
+   if ("NNN" %in% neighbours) {
+     g = flow(g, i+1, j+1)         # down+right
+     g = flow(g, i-1, j+1)         # up+right
+     g = flow(g, i+1, j-1)         # down+left
+     g = flow(g, i-1, j-1)         # up+left
+   }
+   
+   g
+ }

We will add a third example grid to illustrate the effect.

> set.seed(1)
> g1 = create.grid(12, 0.6)
> g2 = create.grid(12, 0.4)
> g3 = create.grid(12, 0.5)

Generating the compound plot with two different values of the global variable required a little thought. But fortunately the scoping rules in R allow for a rather nice implementation.

> grid.arrange(
+   {neighbours = c("NN"); visualise.grid(flow(g3))},
+   {neighbours = c("NN", "NNN"); visualise.grid(flow(g3))},
+   ncol = 2)

Here we have two plots for the same grid showing (left) NN and (right) NN+NNN percolation. Including the possibility of “diagonal” percolation extends the range of cells that are reachable and this grid, which does not percolate with just NN, does support percolation with NN+NNN.

No modifications are required to the percolation function.

> neighbours = c("NN")
> 
> percolates(g1)
[1] FALSE
> percolates(g2)
[1] TRUE
> percolates(g3)
[1] FALSE
> 
> neighbours = c("NN", "NNN")
> 
> percolates(g1)
[1] FALSE
> percolates(g2)
[1] TRUE
> percolates(g3)
[1] TRUE

Effect on Percolation Threshold

Finally we can see the effect of including next-nearest neighbours on the percolation threshold. We perform the same logistic fit as previously.

> (pcrit = - logistic.fit$coefficients[1] / logistic.fit$coefficients[2])
(Intercept) 
    0.59107 
> 1-pcrit
(Intercept) 
    0.40893

This agrees well with the result for NN+NNN from Table 1 in Malarz and Galam (2005).

A Larger Lattice

Finally, let’s look at a larger lattice at the percolation threshold.

> set.seed(1)
>
> g4 = create.grid(100, pcrit)
>
> percolates(g4)
[1] TRUE

References

  • Malarz, K., & Galam, S. (2005). Square-lattice site percolation at increasing ranges of neighbor bonds. Physical Review E, 71(1), 016125. doi:10.1103/PhysRevE.71.016125.
 Read more

Percolation Threshold on a Square Lattice

Manfred Schroeder touches on the topic of percolation a number of times in his encyclopaedic book on fractals (Schroeder, M. (1991) Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise. Percolation has numerous practical applications, the most interesting of which (from my perspective) is the flow of hot water through ground coffee! The problem of percolation can be posed as follows: suppose that a liquid is poured onto a solid block of some substance. If the substance is porous then it is possible for the liquid to seep through the pores and make it all the way to the bottom of the block. Whether or not this happens is determined by the connectivity of the pores within the substance. If it is extremely porous then it is very likely that there will be an open path of pores connecting the top to the bottom and the liquid will flow freely. If, on the other hand, the porosity is low then such a path may not exist. Evidently there is a critical porosity threshold which divides these two regimes.

 Read more

Eric Berlow and Sean Gourley: Mapping Ideas Worth Spreading

 Read more

Plotting Times of Discrete Events

I recently enjoyed reading <blockquote>O’Hara, R. B., & Kotze, D. J. (2010). Do not log-transform count data. Methods in Ecology and Evolution, 1(2), 118–122. doi:10.1111/j.2041-210X.2010.00021.x.</blockquote>

 Read more

Categorically Variable