satRday in Cape Town

The second satRday (and first satRday on African soil) will happen in Cape Town on 18 February 2017. It’s going to be a one-day celebration of R.

satrday-keynote-speakers

We have a trio of phenomenal keynote speakers (Hilary Parker, Jenny Bryan and Julia Silge) who will be giving inspiring talks at the conference and also conducting workshops prior to the conference. There will be numerous other talks from local and international speakers covering a variety of topics relating to R.

Registration is open and early bird prices are available until 23 December 2016. Submit a talk proposal and you could join the lineup of R luminaries (in addition to getting a free ticket to the conference!).

Simple School Maths Problem

A simple problem sent through to me by one of my running friends:

There are 6 red cards and 1 black card in a box. Busi and Khanha take turns to draw a card at random from the box, with Busi being the first one to draw. The first person who draws the black card will win the game (assume that the game can go on indefinitely). If the cards are drawn with replacement, determine the probability that Khanha will win, showing all working.

The problem was posed to matric school pupils and allocated 7 marks (which translates into 7 minutes).

Per Game Analysis

Every time somebody plays the game they have a 1 in 7 chance of winning. The fact that the cards are drawn with replacement means that every time the game is played the odds are precisely the same.

Series of Games

Busi plays first. On her first try she has a 1/7 probability of winning.

Khanha plays next. Her probability of winning is 6/7 * 1/7, where 6/7 is the probability that Busi did not win previously and 1/7 is the probability that Khanha wins on her first try.

The next time that Busi plays her probability of winning is 6/7 * 6/7 * 1/7, where the first 6/7 is the probability that she did not win on her first try and the second 6/7 is the probability that Khanha didn’t win on the previous round either.

The process continues…

In the end the probability that Busi wins is

1/7 + (6/7 * 6/7) * 1/7 + (6/7 * 6/7)^2 * 1/7 + (6/7 * 6/7)^3 * 1/7 + …

This is an infinite geometric series. We’ll simplify it a bit:

1/7 * [1 + (6/7 * 6/7) + (6/7 * 6/7)^2 +  (6/7 * 6/7)^3 + …]
= 1/7 * [1 + r + r^2 + r^3 + …]
= 1/7 * [1 / (1-r)]
= 1/7 * [49/13]
= 7/13
= 0.5384615

where r = 6/7 * 6/7 = 36/49.

What about the probability that Khanha wins? By similar reasoning this is

6/7 * 1/7 + (6/7 * 6/7) * 6/7 * 1/7 + (6/7 * 6/7)^2 * 6/7 * 1/7 + (6/7 * 6/7)^3 * 6/7 * 1/7 + …
= 6/7 * 1/7 * [1 + (6/7 * 6/7) + (6/7 * 6/7)^2 + (6/7 * 6/7)^3 + …]
= 6/49 * [49/13]
= 6/13
= 0.4615385

Importantly those two probabilities sum to one: 0.5384615 + 0.4615385 = 1.

The required answer would be 0.4615385. The calculation for Busi would not be necessary, but I’ve included it for completeness.
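As a quick sanity check, here is a minimal R sketch that evaluates the two geometric series in closed form and then estimates Khanha's probability by simulation. The variable names, the seed and the number of simulated games are my own choices for illustration; they are not part of the original problem.

```r
# Probability of drawing the black card on any single draw.
p <- 1 / 7
# Probability that both players miss in one full round (the common ratio).
r <- (1 - p)^2

# Closed-form sums of the two geometric series.
p.busi <- p / (1 - r)
p.khanha <- (1 - p) * p / (1 - r)
c(busi = p.busi, khanha = p.khanha)

# Monte Carlo check. rgeom() gives the number of failed draws before the
# first success, so adding 1 gives the draw on which the black card appears.
set.seed(13)
draws <- rgeom(100000, prob = p) + 1
# Busi takes the odd-numbered draws and Khanha the even-numbered ones.
p.khanha.sim <- mean(draws %% 2 == 0)
p.khanha.sim
```

The simulated proportion should land close to 6/13, although it will wobble a little from seed to seed.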

Conclusion

Although either player has the same chance of winning on any single draw, Busi has a greater chance of winning overall simply because she plays before her opponent. By the same token, playing second puts Khanha at a slight disadvantage. If both players played at the same time (for example, each drawing from their own box) then the probability would be 0.5 for both of them.

Note that Busi’s edge gets smaller as the number of red cards in the box increases. This is because her probability of winning on every game gets smaller and so the “first play” advantage weakens.
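To see how quickly the edge fades, the same geometric series can be evaluated for different numbers of red cards. This is only a sketch: first_player_wins() is a hypothetical helper of my own, and it assumes there is always exactly one black card in the box.

```r
# Probability that the first player wins when the box holds n.red red cards
# and a single black card, with cards drawn with replacement.
first_player_wins <- function(n.red) {
  p <- 1 / (n.red + 1)
  p / (1 - (1 - p)^2)
}

n.red <- c(1, 6, 20, 100)
data.frame(
  n.red,
  busi = first_player_wins(n.red),
  khanha = 1 - first_player_wins(n.red)
)
```

With 6 red cards this reproduces the 7/13 from above, and as the number of red cards grows both probabilities drift towards 0.5.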

It seems like a fairly challenging problem for matric maths. Especially for only 7 marks. Having said that, the fact that they are attacking these sorts of problems in school maths is great. We never did anything this practical when I was at school.

Django: Hosting Multiple Sites with NGINX


nginx-logo

I have a droplet on DigitalOcean which I use to host a few projects in development. Since none of these projects are at a stage where I’ve registered individual domains, it makes sense to simply serve them via different ports on the same server.

Setting this kind of configuration up on NGINX turns out to be pretty simple. Edit the virtual host configuration file, which will probably be found in /etc/nginx/sites-enabled/.

Upstream Modules

First construct two (or more!) upstream modules.

upstream app1 {
    server 127.0.0.1:9000 fail_timeout=0;
}

upstream app2 {
    server 127.0.0.1:9001 fail_timeout=0;
}

Above we’ve listed two upstream services which NGINX will be proxying. The first corresponds to a server which accepts requests on port 9000, while the second is listening on port 9001. In principle each of the upstream sections can include one or more server records. If there is more than one then requests will be assigned to servers in a round robin fashion.

To make this a little more concrete, you might have two separate Django projects, app1 which is running on port 9000 and app2 on port 9001.

~/app1$ nohup python3 manage.py runserver localhost:9000 &
~/app2$ nohup python3 manage.py runserver localhost:9001 &

Since these projects are in development I’m still running the Django development server.

Server Blocks

Now we need to construct server blocks for each of the upstream modules. These blocks determine how an external request is mapped to an internal service. Suppose, for example, that we wanted to expose app1 on ports 80 and 8000. The mapping between the external port and the upstream service happens via the proxy_pass directive.

server {
    listen 80 default_server;
    listen 8000;

    client_max_body_size 4G;
    server_name _;

    keepalive_timeout 5;

    location /media  {
        alias /home/django/media;
    }

    location /static {
        alias /home/django/staticfiles; 
    }

    location / {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_pass http://app1;
    }
}

Note that only one server block can be designated as default_server for a given address and port.

We’d then have a second entry which, for example, exposes app2 on port 81.

server {
    listen 81;

    server_name _;

    location / {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_pass http://app2;
    }
}

A similar configuration could be used to host separate subdomains.

Conclusion

After making changes to the configuration file you’ll need to restart the server.

# service nginx restart

If my server was located at 114.132.8.215 then I’d be able to access the two sites as http://114.132.8.215:80/ (or http://114.132.8.215:8000/) and http://114.132.8.215:81/ respectively.

I don’t for a moment pretend to be an expert with NGINX. Far from it, in fact. However, the above setup works for me in a testing environment. Once I’d figured out the plumbing it was pretty simple to implement. For more detailed background on the general principles behind using NGINX read this.

satRday Cape Town: Call for Submissions

satrday-cape-town-banner

satRday Cape Town will happen on 18 February 2017 at Workshop 17, Victoria & Alfred Waterfront, Cape Town, South Africa.

Keynotes and Workshops

We have a trio of fantastic keynote speakers: Hilary Parker, Jennifer Bryan and Julia Silge, who’ll be dazzling you on the day as well as presenting workshops on the two days prior to the satRday.

Call for Submissions

We’re accepting submissions in four categories:

  • Workshop [90 min],
  • Lightning Talk [5 min],
  • Standard Talk [20 min] and
  • Poster.

Submit your proposals here. The deadline is 16 December, but there’s no reason to wait for the last minute: send us your ideas right now so that we can add you to the killer programme.

Registration

Register for the conference and workshops here. The tickets are affordable and you’re going to get extraordinary value for money.

fast-neural-style: Real-Time Style Transfer

I followed up a reference to fast-neural-style from Twitter and spent a glorious hour experimenting with this code. Very cool stuff indeed. It’s documented in Perceptual Losses for Real-Time Style Transfer and Super-Resolution by Justin Johnson, Alexandre Alahi and Fei-Fei Li.

The basic idea is to use feed-forward convolutional neural networks to generate image transformations. The networks are trained using perceptual loss functions and effectively apply style transfer.

What is “style transfer”? You’ll see in a moment.

As a test image I’ve used my Twitter banner, which I’ve felt for a while was a little bland. It could definitely benefit from additional style.

twitter-banner

What about applying the style of van Gogh’s The Starry Night?

twitter-banner-1

That’s pretty cool. A little repetitive, perhaps, but that’s probably due to the lack of structure in some areas of the input image.

How about the style of Picasso’s La Muse?

twitter-banner-2

Again, rather nice, but a little too repetitive for my liking. I can certainly imagine some input images on which this would work well.

Here’s another take on La Muse but this time using instance normalisation.

twitter-banner-6

Repetition vanished.

What about using some abstract contemporary art for styling?

twitter-banner-4

That’s rather trippy, but I like it.

Using a mosaic for style creates an interesting effect. You can see how the segments of the mosaic are echoed in the sky.

twitter-banner-7

Finally using Munch’s The Scream. The result is dark and foreboding and I just love it.

twitter-banner-8

Maybe it’s just my hardware, but these transformations were not quite a “real-time” process. Nevertheless, the results were worth the wait. I certainly now have multiple viable options for an updated Twitter header image.

Related Projects

If you’re interested in these sorts of projects (and, hey, honestly who wouldn’t be?) then you might also like these:

Fitting a Statistical Distribution to Sampled Data

I’m generally not too interested in fitting analytical distributions to my data. With large enough samples (which I am normally fortunate enough to have!) I can safely assume normality for most statistics of interest.

Recently I had a relatively small chunk of data and finding a decent analytical approximation was important. So I had a look at the tools available in R for addressing this problem. The fitdistrplus package seemed like a good option. Here’s a sample workflow.

Create some Data

To have something to work with, generate 1000 samples from a log-normal distribution.

> N <- 1000
> 
> # Fix the seed so that the samples are reproducible.
> set.seed(37)
> x <- rlnorm(N, meanlog = 0, sdlog = 0.5)

Skewness-Kurtosis Plot

Load up the package and generate a skewness-kurtosis plot.

> library(fitdistrplus)
> 
> descdist(x)
summary statistics
------
min:  0.2391517   max:  6.735326 
median:  0.9831923 
mean:  1.128276 
estimated sd:  0.6239416 
estimated skewness:  2.137708 
estimated kurtosis:  12.91741

There’s nothing magical in those summary statistics, but the plot is most revealing. The data are represented by the blue point. Various distributions are represented by symbols, lines and shaded areas.

cullen-frey-plot

We can see that our data point is close to the log-normal curve (no surprises there!), which suggests that the log-normal is the most plausible candidate distribution.
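To make the axes of that plot a little more concrete: it positions the sample by the square of its skewness and its kurtosis, and compares that point with the values implied by each candidate distribution. Below is a rough base R sketch of both sides of that comparison for the log-normal case. The helper functions are my own and use simple moment-based (biased) estimators, whereas descdist() reports unbiased estimates by default as far as I know, so the numbers will differ slightly from the output above.

```r
# Moment-based (biased) sample skewness and kurtosis.
sample_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
sample_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

# Theoretical skewness and kurtosis of a log-normal distribution.
# Both depend only on sdlog.
lnorm_skewness <- function(sdlog) {
  w <- exp(sdlog^2)
  (w + 2) * sqrt(w - 1)
}
lnorm_kurtosis <- function(sdlog) {
  w <- exp(sdlog^2)
  w^4 + 2 * w^3 + 3 * w^2 - 3
}

# Same sample as generated earlier.
set.seed(37)
x <- rlnorm(1000, meanlog = 0, sdlog = 0.5)

c(skewness = sample_skewness(x), kurtosis = sample_kurtosis(x))
c(skewness = lnorm_skewness(0.5), kurtosis = lnorm_kurtosis(0.5))
```

Estimates of higher moments are noisy for heavy-tailed samples, so the sample values should not be expected to match the theoretical ones exactly.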

We don’t need to take this at face value though because we can fit a few distributions and compare the results.

Fitting Distributions

We’ll start out by fitting a log-normal distribution using fitdist().

> fit.lnorm <- fitdist(x, "lnorm")
> fit.lnorm
Fitting of the distribution ' lnorm ' by maximum likelihood 
Parameters:
            estimate Std. Error
meanlog -0.009199794 0.01606564
sdlog    0.508040297 0.01135993
> plot(fit.lnorm)

fitdist-lnorm

The quantile-quantile plot indicates that, as expected, a log-normal distribution gives a pretty good representation of our data. We can compare this to the results of fitting a normal distribution, where we see that there is significant divergence of the tails of the quantile-quantile plot.

fitdist-norm

Comparing Distributions

If we fit a selection of plausible distributions then we can objectively evaluate the quality of those fits.
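The comparison that follows assumes that fit objects named fit.exp, fit.gamma, fit.logis and fit.norm exist alongside fit.lnorm. Their creation isn't shown, so here is a sketch of one plausible way to produce them, relying on the default starting values in fitdist(). The seed and sample are simply repeated from earlier so that the block runs by itself.

```r
library(fitdistrplus)

# Same sample as generated earlier.
set.seed(37)
x <- rlnorm(1000, meanlog = 0, sdlog = 0.5)

# Fit each candidate distribution by maximum likelihood (the default method).
fit.exp   <- fitdist(x, "exp")
fit.gamma <- fitdist(x, "gamma")
fit.logis <- fitdist(x, "logis")
fit.norm  <- fitdist(x, "norm")

# Diagnostic plots for the normal fit, for comparison with the log-normal.
plot(fit.norm)
```

One thing to watch: the ls() pattern used in the comparison picks up any global object whose name contains "fit.", so running that code a second time would probably also try to treat fit.metrics itself as a fit.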

> fit.metrics <- lapply(ls(pattern = "fit\\."), function(variable) {
+   fit = get(variable, envir = .GlobalEnv)
+   with(fit, data.frame(name = variable, aic, loglik))
+ })
> do.call(rbind, fit.metrics)
       name      aic     loglik
1   fit.exp 2243.382 -1120.6909
2 fit.gamma 1517.887  -756.9436
3 fit.lnorm 1469.088  -732.5442
4 fit.logis 1737.104  -866.5520
5  fit.norm 1897.480  -946.7398

According to these data the log-normal distribution is the optimal fit: smallest AIC and largest log-likelihood.

Of course, with real (as opposed to simulated) data, the situation will probably not be as clear cut. But with these tools it’s generally possible to select a suitable distribution and estimate its parameters.

Talks about Bots

Seth Juarez and Matt Winkler having an informal chat about bots.

Matt Winkler talking about Bots as the Next UX: Expanding Your Apps with Conversation at the Microsoft Machine Learning & Data Science Summit (2016).

At the confluence of the rise in messaging applications, advances in text and language processing, and mobile form factors, bots are emerging as a key area of innovation and excitement. Bots (or conversation agents) are rapidly becoming an integral part of your digital experience: they are as vital a way for people to interact with a service or application as is a web site or a mobile experience. Developers writing bots all face the same problems: bots require basic I/O, they must have language and dialog skills, and they must connect to people, preferably in any conversation experience and language a person chooses. This code-heavy talk focuses on how to solve these problems using the Microsoft Bot Framework, a set of tools and services to easily build bots and add them to any application. We’ll cover use cases and customer case studies for enhancing an application with a bot, and how to build a bot, focusing on each of the key problems: how to integrate with various messaging services, how to connect to users, and how to process language to understand the user’s intent. At the end of this talk, developers will be equipped to get started adding bots to their applications, understanding both the fundamental concepts as well as the details of getting started using the Bot Framework.