Simple School Maths Problem

A simple problem sent through to me by one of my running friends:

There are 6 red cards and 1 black card in a box. Busi and Khanya take turns to draw a card at random from the box, with Busi being the first one to draw. The first person who draws the black card will win the game (assume that the game can go on indefinitely). If the cards are drawn with replacement, determine the probability that Khanya will win, showing all working.

The problem was posed to matric school pupils and allocated 7 marks (which translates into 7 minutes).

Per Game Analysis

Every time somebody plays the game they have a 1 in 7 chance of winning. The fact that the cards are drawn with replacement means that every time the game is played the odds are precisely the same.

Series of Games

Busi plays first. On her first try she has a 1/7 probability of winning.

Khanya plays next. Her probability of winning is 6/7 * 1/7, where 6/7 is the probability that Busi did not win previously and 1/7 is the probability that Khanya wins on her first try.

The next time that Busi plays her probability of winning is 6/7 * 6/7 * 1/7, where the first 6/7 is the probability that she did not win on her first try and the second 6/7 is the probability that Khanya didn’t win on the previous round either.

The process continues…

In the end the probability that Busi wins is

1/7 + (6/7 * 6/7) * 1/7 + (6/7 * 6/7)^2 * 1/7 + (6/7 * 6/7)^3 * 1/7 + …

This is an infinite geometric series. We’ll simplify it a bit:

1/7 * [1 + (6/7 * 6/7) + (6/7 * 6/7)^2 +  (6/7 * 6/7)^3 + …]
= 1/7 * [1 + r + r^2 + r^3 + …]
= 1/7 * [1 / (1-r)]
= 1/7 * [49/13]
= 7/13
≈ 0.5384615

where r = 6/7 * 6/7 = 36/49.
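As a numerical sanity check, the series can be summed directly and compared with the closed form (a minimal Python sketch; the variable names are mine):

```python
# Per-draw probability of the black card, and per-round "nobody won" probability.
p = 1 / 7
r = (6 / 7) ** 2

# Busi wins on attempt k+1 with probability r^k * p.
partial = sum(p * r**k for k in range(1000))  # truncated series
closed = p / (1 - r)                          # geometric series closed form

print(closed)                        # 7/13 ≈ 0.5385
print(abs(partial - closed) < 1e-9)  # True: the truncated sum agrees
```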

What about the probability that Khanya wins? By similar reasoning this is

6/7 * 1/7 + (6/7 * 6/7) * 6/7 * 1/7 + (6/7 * 6/7)^2 * 6/7 * 1/7 + (6/7 * 6/7)^3 * 6/7 * 1/7 + …
= 6/7 * 1/7 * [1 + (6/7 * 6/7) + (6/7 * 6/7)^2 + (6/7 * 6/7)^3 + …]
= 6/49 * [49/13]
= 6/13
≈ 0.4615385

Importantly those two probabilities sum to one: 0.5384615 + 0.4615385 = 1.

The required answer is 6/13 ≈ 0.4615385. The calculation for Busi is not strictly necessary, but I’ve included it for completeness.
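The exact answer can also be sanity-checked with a quick simulation (a sketch using only Python’s standard library; the function and variable names are my own):

```python
import random

def second_player_wins(p=1/7):
    """Play one game; return True if the second player draws the black card first."""
    while True:
        if random.random() < p:   # Busi draws the black card
            return False
        if random.random() < p:   # Khanya draws the black card
            return True

random.seed(7)
games = 100_000
rate = sum(second_player_wins() for _ in range(games)) / games
print(rate)  # ≈ 0.46, close to the theoretical 6/13 = 0.4615...
```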


Although both players have exactly the same chance of winning on any single draw, Busi has a greater chance of winning overall simply by virtue of drawing first. By the same token, drawing second puts Khanya at a slight disadvantage. If both players drew simultaneously (for example, each from her own box) then neither would have an edge.

Note that Busi’s edge gets smaller as the number of red cards in the box increases: her probability of winning on any single draw shrinks, and so the “first play” advantage weakens.
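This is easy to check from the closed form: with n red cards the per-draw win probability is p = 1/(n+1), and the same geometric series gives P(Busi wins) = p / (1 - (1-p)^2), which simplifies to (n+1)/(2n+1). A short sketch (names are my own) shows the edge shrinking towards 1/2:

```python
def p_first_player(n_red):
    """Probability that the first player wins, with n_red red cards and 1 black card."""
    p = 1 / (n_red + 1)            # per-draw chance of the black card
    return p / (1 - (1 - p) ** 2)  # sum of the geometric series

for n in (1, 6, 20, 100):
    print(n, round(p_first_player(n), 4))
# with 6 red cards this is 7/13 ≈ 0.5385; by 100 red cards the edge is only ≈ 0.5025
```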

It seems like a fairly challenging problem for matric maths, especially for only 7 marks. Having said that, the fact that they are attacking these sorts of problems in school maths is great. We never did anything this practical when I was at school.

satRday Cape Town: Call for Submissions


satRday Cape Town will happen on 18 February 2017 at Workshop 17, Victoria & Alfred Waterfront, Cape Town, South Africa.

Keynotes and Workshops

We have a trio of fantastic keynote speakers: Hilary Parker, Jennifer Bryan and Julia Silge, who’ll be dazzling you on the day as well as presenting workshops on the two days prior to the satRday.

Call for Submissions

We’re accepting submissions in four categories:

  • Workshop [90 min],
  • Lightning Talk [5 min],
  • Standard Talk [20 min] and
  • Poster.

Submit your proposals here. The deadline is 16 December, but there’s no reason to wait until the last minute: send us your ideas right now so that we can add you to the killer programme.


Register for the conference and workshops here. The tickets are affordable and you’re going to get extraordinary value for money.

fast-neural-style: Real-Time Style Transfer

I followed up a reference to fast-neural-style from Twitter and spent a glorious hour experimenting with this code. Very cool stuff indeed. It’s documented in Perceptual Losses for Real-Time Style Transfer and Super-Resolution by Justin Johnson, Alexandre Alahi and Fei-Fei Li.

The basic idea is to use feed-forward convolutional neural networks to generate image transformations. The networks are trained using perceptual loss functions and effectively apply style transfer.

What is “style transfer”? You’ll see in a moment.

As a test image I’ve used my Twitter banner, which I’ve felt for a while was a little bland. It could definitely benefit from additional style.


What about applying the style of van Gogh’s The Starry Night?


That’s pretty cool. A little repetitive, perhaps, but that’s probably due to the lack of structure in some areas of the input image.

How about the style of Picasso’s La Muse?


Again, rather nice, but a little too repetitive for my liking. I can certainly imagine some input images on which this would work well.

Here’s another take on La Muse but this time using instance normalisation.


Repetition vanished.

What about using some abstract contemporary art for styling?


That’s rather trippy, but I like it.

Using a mosaic for style creates an interesting effect. You can see how the segments of the mosaic are echoed in the sky.


Finally, using Munch’s The Scream. The result is dark and foreboding and I just love it.


Maybe it’s just my hardware, but these transformations were not quite a “real-time” process. Nevertheless, the results were worth the wait. I certainly now have multiple viable options for an updated Twitter header image.

Related Projects

If you’re interested in these sorts of projects (and, hey, honestly who wouldn’t be?) then you might also like these:

Fitting a Statistical Distribution to Sampled Data

I’m generally not too interested in fitting analytical distributions to my data. With large enough samples (which I am normally fortunate enough to have!) I can safely assume normality for most statistics of interest.

Recently I had a relatively small chunk of data and finding a decent analytical approximation was important. So I had a look at the tools available in R for addressing this problem. The fitdistrplus package seemed like a good option. Here’s a sample workflow.

Create some Data

To have something to work with, generate 1000 samples from a log-normal distribution.

> N <- 1000
> set.seed(37)
> #
> x <- rlnorm(N, meanlog = 0, sdlog = 0.5)

Skewness-Kurtosis Plot

Load up the package and generate a skewness-kurtosis plot.

> library(fitdistrplus)
> descdist(x)
summary statistics
min:  0.2391517   max:  6.735326 
median:  0.9831923 
mean:  1.128276 
estimated sd:  0.6239416 
estimated skewness:  2.137708 
estimated kurtosis:  12.91741

There’s nothing magical in those summary statistics, but the plot is most revealing. The data are represented by the blue point. Various distributions are represented by symbols, lines and shaded areas.


We can see that our data point lies close to the log-normal curve (no surprises there!), which suggests that the log-normal is the most likely candidate distribution.

We don’t need to take this at face value though because we can fit a few distributions and compare the results.

Fitting Distributions

We’ll start out by fitting a log-normal distribution using fitdist().

> fit.lnorm = fitdist(x, "lnorm")
> fit.lnorm
Fitting of the distribution ' lnorm ' by maximum likelihood 
            estimate Std. Error
meanlog -0.009199794 0.01606564
sdlog    0.508040297 0.01135993
> plot(fit.lnorm)


The quantile-quantile plot indicates that, as expected, a log-normal distribution gives a pretty good representation of our data. We can compare this to the results of fitting a normal distribution, where we see that there is significant divergence of the tails of the quantile-quantile plot.


Comparing Distributions

If we fit a selection of plausible distributions (the exponential, gamma, logistic and normal fits below were created with fitdist() in the same way as the log-normal fit above) then we can objectively compare the quality of those fits.

> fit.metrics <- lapply(ls(pattern = "fit\\."), function(variable) {
+   fit = get(variable, envir = .GlobalEnv)
+   with(fit, data.frame(name = variable, aic, loglik))
+ })
> do.call(rbind, fit.metrics)
       name      aic     loglik
1   fit.exp 2243.382 -1120.6909
2 fit.gamma 1517.887  -756.9436
3 fit.lnorm 1469.088  -732.5442
4 fit.logis 1737.104  -866.5520
5  fit.norm 1897.480  -946.7398

According to these data the log-normal distribution is the optimal fit: smallest AIC and largest log-likelihood.

Of course, with real (as opposed to simulated) data, the situation will probably not be as clear cut. But with these tools it’s generally possible to select an appropriate distribution and derive appropriate parameters.
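The same idea carries over to other environments. As an illustration, here is a comparable AIC comparison sketched in Python using only the standard library and the closed-form maximum likelihood estimates for the normal and log-normal (the function names are my own; this is not the fitdistrplus workflow above):

```python
import math, random

random.seed(37)
x = [random.lognormvariate(0, 0.5) for _ in range(1000)]

def aic_normal(data):
    """AIC of a normal fit via closed-form MLE (mu = mean, sigma^2 = mean sq. deviation)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((v - mu) ** 2 for v in data) / n
    loglik = sum(-0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
                 for v in data)
    return 2 * 2 - 2 * loglik          # two fitted parameters

def aic_lognormal(data):
    """AIC of a log-normal fit: fit a normal to log(data), include the Jacobian term."""
    logs = [math.log(v) for v in data]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n
    loglik = sum(-math.log(v) - 0.5 * math.log(2 * math.pi * var)
                 - (math.log(v) - mu) ** 2 / (2 * var) for v in data)
    return 2 * 2 - 2 * loglik

print(aic_lognormal(x) < aic_normal(x))  # True: log-normal fits these data better
```

As with the R results above, the log-normal comes out with the distinctly lower AIC on this simulated sample.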

Talks about Bots

Seth Juarez and Matt Winkler having an informal chat about bots.

Matt Winkler talking about Bots as the Next UX: Expanding Your Apps with Conversation at the Microsoft Machine Learning & Data Science Summit (2016).

At the confluence of the rise in messaging applications, advances in text and language processing, and mobile form factors, bots are emerging as a key area of innovation and excitement. Bots (or conversation agents) are rapidly becoming an integral part of your digital experience: they are as vital a way for people to interact with a service or application as is a web site or a mobile experience. Developers writing bots all face the same problems: bots require basic I/O, they must have language and dialog skills, and they must connect to people, preferably in any conversation experience and language a person chooses. This code-heavy talk focuses on how to solve these problems using the Microsoft Bot Framework, a set of tools and services to easily build bots and add them to any application. We’ll cover use cases and customer case studies for enhancing an application with a bot, and how to build a bot, focusing on each of the key problems: how to integrate with various messaging services, how to connect to users, and how to process language to understand the user’s intent. At the end of this talk, developers will be equipped to get started adding bots to their applications, understanding both the fundamental concepts as well as the details of getting started using the Bot Framework.

Rafal Lukawiecki – Putting Science into the Business of Data Science

A talk by Rafal Lukawiecki at the Microsoft Machine Learning & Data Science Summit (2016).

Data science relies on the scientific method of reasoning to help make business decisions based on analytics. Let Rafal explain how his customers apply the trusted processes and the principles of hypothesis testing with machine learning and statistics towards solving their day-to-day, practical business problems. Rafal will speak from his 10 years of experience in data mining and statistics, using the Microsoft data platform for high-value customer identification, recommendation and gap analysis, customer paths and acquisition modelling, price optimization and other forms of advanced analytics. He will share how this work leads to measurable results that not only make his customers more successful but also keep the data scientist true to his principles. This session will also help you understand how to start and justify data science projects in ways that business decision makers find attractive, and how to do it all with the help of the Microsoft platform.

Python: First Steps with MongoDB

I’m busy working my way through Kyle Banker’s MongoDB in Action. Much of the example code in the book is given in Ruby. Despite the fact that I’d love to learn more about Ruby, for the moment it makes more sense for me to follow along with Python.


MongoDB Installation

If you haven’t already installed MongoDB, now is the time to do it! On a Debian Linux system the installation is very simple.

$ sudo apt install mongodb

Python Package Installation

Next install PyMongo, the Python driver for MongoDB.

$ pip3 install pymongo

Check that the install was successful.

>>> import pymongo
>>> pymongo.version

Detailed documentation for PyMongo can be found here.

Creating a Client

To start interacting with the MongoDB server we need to instantiate a MongoClient.

>>> client = pymongo.MongoClient()

This will connect to localhost using the default port. Alternative values for host and port can be specified.

Connect to a Database

Next we connect to a particular database called test. If the database does not yet exist then it will be created.

>>> db = client.test

Create a Collection

A database will hold one or more collections of documents. We’ll create a users collection.

>>> users = db.users
>>> users
Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True),
'test'), 'users')

As mentioned in the documentation, MongoDB is lazy about the creation of databases and collections. Neither the database nor collection is actually created until data are written to them.

Working with Documents

As you would expect, MongoDB caters for the four basic CRUD operations.


Documents are represented as dictionaries in Python. We’ll create a couple of light user profiles.

>>> smith = {"last_name": "Smith", "age": 30}
>>> jones = {"last_name": "Jones", "age": 40}

We use the insert_one() method to store each document in the collection.

>>> users.insert_one(smith)
<pymongo.results.InsertOneResult object at 0x7f57d36d9678>

Each document is allocated a unique identifier which can be accessed via the inserted_id attribute.

>>> jones_id = users.insert_one(jones).inserted_id
>>> jones_id

Although these identifiers look pretty random, there is actually a well defined structure. The first 8 characters (4 bytes) are a timestamp, followed by a 6 character machine identifier, then a 4 character process identifier and finally a 6 character counter.
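That layout means an identifier can be unpacked with simple string slicing (a sketch for the classic ObjectId format described above; note that newer MongoDB versions replace the machine and process fields with random bytes):

```python
from datetime import datetime, timezone

def decode_object_id(oid):
    """Split a 24-character hex ObjectId into its four classic fields."""
    return {
        "timestamp": datetime.fromtimestamp(int(oid[:8], 16), tz=timezone.utc),
        "machine": oid[8:14],          # 3-byte machine identifier
        "process": oid[14:18],         # 2-byte process identifier
        "counter": int(oid[18:], 16),  # 3-byte insertion counter
    }

print(decode_object_id("57ea4acfad4b2a1378640b41")["timestamp"])
# the timestamp embedded in this example identifier falls in September 2016
```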

We can verify that the collection has been created.

>>> db.collection_names()
['users', 'system.indexes']

There’s also an insert_many() method which can be used to simultaneously insert multiple documents.


The find_one() method can be used to search the collection. As its name implies it only returns a single document.

>>> users.find_one({"last_name": "Smith"})
{'_id': ObjectId('57ea4acfad4b2a1378640b41'), 'age': 30, 'last_name': 'Smith'}
>>> users.find_one({"_id": jones_id})
{'_id': ObjectId('57ea4adfad4b2a1378640b42'), 'age': 40, 'last_name': 'Jones'}

A more general query can be made using the find() method which, rather than returning a document, returns a cursor which can be used to iterate over the results. With our minimal collection this doesn’t seem very useful, but a cursor really comes into its own with a massive collection.

>>> users.find({"last_name": "Smith"})
<pymongo.cursor.Cursor object at 0x7f57d77fe3c8>
>>> users.find({"age": {"$gt": 20}})
<pymongo.cursor.Cursor object at 0x7f57d77fe8d0>

A cursor is an iterable and can be used to neatly access the query results.

>>> cursor = users.find({"age": {"$gt": 20}})
>>> for user in cursor:
...     user["last_name"]

Operations like count() and sort() can be applied to the results returned by find().


The update() method is used to modify existing documents. Two documents are passed as arguments to update(): the first is used to match the documents to which the change should be applied and the second gives the details of the change.

>>> users.update({"last_name": "Smith"}, {"$set": {"city": "Durban"}})
{'updatedExisting': True, 'nModified': 1, 'n': 1, 'ok': 1}

The example above uses the $set modifier. There are a number of other modifiers available like $inc, $mul, $rename and $unset.

By default the update is only applied to the first matching record. The change can be applied to all matching records by specifying multi = True.


Deleting records happens via the remove() method with an argument which specifies which records are to be deleted.

>>> users.remove({"age": {"$gte": 40}})
{'n': 1, 'ok': 1}


Well those are the basic operations. Nothing too scary. I’ll be back with the Python implementation of the Twitter archival sample application.