#MonthOfJulia Day 27: Distributions
Today I’m looking at the Distributions package. Let’s get things rolling by loading it up.
There’s some overlap between the functionality in Distributions and what we saw yesterday in the StatsFuns package. So, instead of looking at functions to evaluate various aspects of PDFs and CDFs, we’ll focus on sampling from distributions and calculating summary statistics.
Julia has native support for sampling from a uniform distribution. We’ve seen this before, but here’s a reminder.
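As a quick reminder of what that looks like, here is a minimal sketch assuming a recent Julia (1.x). The seed value is arbitrary and only there to make the run repeatable.

```julia
using Random

Random.seed!(27)   # arbitrary seed, purely for repeatability

x = rand()         # one Float64 drawn uniformly from [0, 1)
xs = rand(5)       # a vector of five such draws

println(x)
println(xs)
```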
What if you need to generate samples from a more exotic distribution? The Normal distribution, although not particularly exotic, seems like a natural place to start. The Distributions package exposes a type for each supported distribution. For the Normal distribution the type is appropriately named Normal. It’s derived from Distribution with characteristics Univariate and Continuous.
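One way to poke at the type from the REPL is sketched below. It assumes a recent Julia, where supertype() and fieldnames() are the relevant introspection functions. The field names are an implementation detail of the package, so treat them as something that could change between releases.

```julia
using Distributions

# The abstract type that Normal derives from.
normal_parent = supertype(Normal)
println(normal_parent)

# Normal is both univariate and continuous.
is_univariate_continuous = Normal <: Distribution{Univariate,Continuous}
println(is_univariate_continuous)

# Names of the fields stored in a Normal object.
fields = fieldnames(Normal)
println(fields)
```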
The constructor accepts two parameters: mean (μ) and standard deviation (σ). We’ll instantiate a Normal object with mean 1 and standard deviation 3.
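Something along these lines should do it. The variable name d1 is just my choice for this sketch.

```julia
using Distributions

d1 = Normal(1.0, 3.0)   # mean μ = 1, standard deviation σ = 3

# Query the object to confirm it holds what we asked for.
println(params(d1))     # the (μ, σ) tuple
println(mean(d1))
println(std(d1))
println(var(d1))        # variance is σ squared
```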
Thanks to the wonders of multiple dispatch we are then able to generate samples from this object with the rand() function.
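Here is a rough sketch of sampling from the distribution object. The seed and the sample size of 10,000 are arbitrary choices, and d1 is rebuilt here so that the snippet runs on its own.

```julia
using Distributions, Random, Statistics

Random.seed!(27)           # arbitrary seed, purely for repeatability

d1 = Normal(1.0, 3.0)

x = rand(d1)               # a single sample
xs = rand(d1, 10_000)      # a vector of 10,000 samples

# The sample statistics should land close to μ = 1 and σ = 3.
println(mean(xs))
println(std(xs))
```

The same rand() that produced uniform samples earlier is now dispatching on the type of its first argument.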
We’ll use Gadfly to generate a histogram to validate that the samples are reasonable. They look pretty good.
There are functions like pdf(), cdf(), logpdf() and logcdf() which allow the density and cumulative distribution functions of our distribution object to be evaluated at particular points. Check those out. We’re moving on to truncating a portion of the distribution, leaving a Truncated distribution object.
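A minimal sketch of both ideas follows. The truncation bounds of -4 and 6 are values I picked for illustration. I’m also assuming a reasonably recent release of Distributions, where the lowercase truncated() function is the recommended way to build a Truncated object.

```julia
using Distributions

d1 = Normal(1.0, 3.0)

# Evaluate the density and cumulative distribution at a point.
println(pdf(d1, 1.0))      # density at the mean
println(cdf(d1, 1.0))      # half the probability mass lies below the mean
println(logpdf(d1, 1.0))
println(logcdf(d1, 1.0))

# Keep only the portion of the distribution between -4 and 6.
d2 = truncated(d1, -4.0, 6.0)

println(minimum(d2))       # lower bound of the support
println(maximum(d2))       # upper bound of the support
println(pdf(d2, 7.0))      # zero density outside the bounds
```

Inside the bounds the truncated density is scaled up relative to the original, so that it still integrates to one.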
Again we can use Gadfly to get an idea of what this looks like. This time we’ll plot the actual PDF rather than a histogram of samples.
The Distributions package implements an extensive selection of other continuous distributions, like Exponential, Gamma and Weibull, as well as discrete ones like Poisson. The basic interface for each of these is consistent with what we’ve seen for Normal above, although there are some methods which are specific to some distributions.
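To give a feel for that consistency, here is a small sketch. The parameter values are arbitrary picks of mine. Note that Exponential takes a scale parameter, while Gamma and Weibull take a shape followed by a scale.

```julia
using Distributions

d_exp = Exponential(2.0)       # scale 2
d_gamma = Gamma(2.0, 1.5)      # shape 2, scale 1.5
d_weibull = Weibull(1.5, 2.0)  # shape 1.5, scale 2

# The same generic functions work across all three.
for d in (d_exp, d_gamma, d_weibull)
    println(typeof(d), ": mean = ", mean(d), ", var = ", var(d))
end

# Some methods only make sense for particular distributions.
println(shape(d_gamma))
println(scale(d_gamma))
println(rate(d_exp))           # rate is the reciprocal of the scale
```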
Let’s look at a discrete distribution, using a Bernoulli distribution with success rate of 25% as an example.
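A sketch of what that might look like is below. The name d4 and the seed are arbitrary. In recent releases the samples come back as Bool values, but older versions may return integers instead.

```julia
using Distributions, Random, Statistics

Random.seed!(27)            # arbitrary seed, purely for repeatability

d4 = Bernoulli(0.25)        # success probability of 25%

println(rand(d4, 10))       # ten individual trials

println(succprob(d4))       # probability of success
println(failprob(d4))       # probability of failure
println(mean(d4))

# The fraction of successes in a large sample should be close to 0.25.
trials = rand(d4, 10_000)
println(mean(trials))
```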
What about a Binomial distribution? Suppose that we have a success rate of 25% per trial and want to sample the number of successes in a batch of 100 trials.
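Roughly like this, where each sample is the number of successes in one batch of 100 trials. The name d5, the seed and the number of batches are arbitrary choices for the sketch.

```julia
using Distributions, Random, Statistics

Random.seed!(27)            # arbitrary seed, purely for repeatability

d5 = Binomial(100, 0.25)    # 100 trials, 25% success rate per trial

println(rand(d5, 10))       # successes in each of ten batches

println(mean(d5))           # expected successes per batch: 100 * 0.25
println(var(d5))            # 100 * 0.25 * 0.75

# The average over many batches should be close to 25.
batches = rand(d5, 10_000)
println(mean(batches))
```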
Finally let’s look at an example of fitting a distribution to a collection of samples using Maximum Likelihood.
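Here is a minimal sketch using fit_mle(), applied to a fresh batch of samples from a Normal distribution with mean 1 and standard deviation 3. The seed and sample size are arbitrary, so the fitted values will differ slightly from run to run if you change them.

```julia
using Distributions, Random

Random.seed!(27)                  # arbitrary seed, purely for repeatability

xs = rand(Normal(1.0, 3.0), 10_000)

# Fit a Normal distribution to the samples by Maximum Likelihood.
fitted = fit_mle(Normal, xs)

μ_hat, σ_hat = params(fitted)     # estimated mean and standard deviation
println(μ_hat)
println(σ_hat)
```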
Yup, those values are in pretty good agreement with the mean and standard deviation we specified for our Normal object originally.
That’s it for today. There’s more to the Distributions package though. Check out my GitHub repository to see other examples which didn’t make it into today’s post.