JuliaStats is a meta-project which consolidates various packages related to statistics and machine learning in Julia.

Julia already has some built-in support for statistical operations, so additional packages are not strictly necessary. However, they do increase the scope and ease of possible operations (as we’ll see below). Let’s kick off by loading all the packages that we’ll be looking at today.
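As a quick reminder of what is available without any extra packages, here is a minimal sketch. It assumes a recent Julia (1.x), where these functions live in the `Statistics` standard library rather than in `Base`; the sample data are made up.

```julia
# Assumes Julia 1.x, where these functions are in the Statistics standard library.
using Statistics

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up sample data

x_mean   = mean(x)     # arithmetic mean
x_median = median(x)   # middle value of the sorted data
x_var    = var(x)      # sample variance (divides by n - 1)
x_std    = std(x)      # sample standard deviation

println((x_mean, x_median, x_var, x_std))
```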

StatsBase

The documentation for StatsBase can be found here. As the package name implies, it provides support for basic statistical operations in Julia.

High level summary statistics are generated by summarystats().
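A minimal sketch of `summarystats()` on a made-up vector. It assumes StatsBase is installed; the field names used here (`mean`, `median`, `min`, `max`) are the ones I expect on the returned object, and newer releases may expose more.

```julia
using StatsBase

x = [1.0, 2.0, 3.0, 4.0, 10.0]   # made-up sample data

s = summarystats(x)   # returns a summary object
println(s)            # prints the whole summary

# Individual statistics are available as fields.
println(s.mean)
println(s.median)
println((s.min, s.max))
```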

Weighted versions of the mean, variance and standard deviation are implemented. There are also geometric and harmonic means.
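Something along these lines shows the weighted statistics and the two alternative means. The data and weights are invented for illustration, and `corrected=false` is passed explicitly on the assumption that the installed StatsBase version wants the bias-correction choice spelled out for generic weights.

```julia
using Statistics, StatsBase

x = [1.0, 2.0, 3.0, 4.0]
w = weights([1.0, 1.0, 1.0, 3.0])   # last observation counts three times as much

wmean = mean(x, w)                   # weighted mean
wvar  = var(x, w; corrected=false)   # weighted variance, no bias correction
wstd  = std(x, w; corrected=false)   # weighted standard deviation

y  = [1.0, 2.0, 4.0]
gm = geomean(y)    # geometric mean: (1 * 2 * 4)^(1/3)
hm = harmmean(y)   # harmonic mean: 3 / (1/1 + 1/2 + 1/4)

println((wmean, wvar, wstd, gm, hm))
```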

There’s a weighted median as well as functions for calculating quantiles.
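A short sketch of the median and quantile functions, again on made-up data. The exact value of the weighted median depends on the interpolation rule used by the installed StatsBase version, so no specific number is claimed for it here.

```julia
using Statistics, StatsBase

x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = weights([1.0, 1.0, 1.0, 1.0, 10.0])   # heavy weight on the largest value

plain_median    = median(x)      # unweighted median
weighted_median = median(x, w)   # pulled towards 5.0 by the weights

quartiles = quantile(x, [0.25, 0.5, 0.75])   # quartiles of the data
five_num  = nquantile(x, 4)                   # min, quartiles and max in one call

println((plain_median, weighted_median))
println(quartiles)
println(five_num)
```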

Sampling from a population is also catered for, with a range of algorithms which can be applied to the sampling procedure.
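Sampling might look roughly like this. The population, weights and seed are arbitrary choices for the example, and the sketch sticks to the high-level `sample()` function rather than picking a specific algorithm.

```julia
using Random, StatsBase

rng = MersenneTwister(1234)   # fixed seed so the run is repeatable
population = collect(1:10)

with_repl    = sample(rng, population, 5)                  # with replacement (default)
without_repl = sample(rng, population, 5; replace=false)   # no repeated elements

# Weighted sampling: larger weights are drawn more often.
w = weights([0.1, 0.1, 0.8])
weighted = sample(rng, ["a", "b", "c"], w, 100)

# countmap tallies how often each value was drawn.
println(countmap(weighted))
```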

There’s also functionality for empirical estimation of distributions from histograms and a range of other interesting and useful goodies.
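For instance, a histogram and an empirical CDF can be fitted to a small made-up sample as sketched below. The bin edges are an arbitrary choice, and the sketch assumes the default of bins closed on the left.

```julia
using StatsBase

x = [0.5, 1.5, 1.7, 2.2, 4.9]   # made-up sample data

# Histogram with unit-width bins from 0 to 5 (bins closed on the left by default).
h = fit(Histogram, x, 0.0:1.0:5.0)
println(h.weights)   # counts per bin

# Empirical CDF: F(t) is the fraction of observations <= t.
F = ecdf(x)
println(F(2.0))
```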

StatsFuns

The StatsFuns package provides constants and functions for statistical computing. The constants are by no means essential but certainly very handy. Take, for example, twoπ and sqrt2.
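A tiny sketch of the two constants in use, assuming the installed StatsFuns version still exports them under these names (in recent releases they may be re-exported from another package). The radius and side length are made up.

```julia
using StatsFuns

println(twoπ)    # the constant 2π
println(sqrt2)   # the constant √2

# They behave like ordinary numbers in arithmetic.
circumference = twoπ * 1.5   # circle of radius 1.5
diagonal      = sqrt2 * 3.0  # diagonal of a square with side 3

println((circumference, diagonal))
```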

There are some mildly exotic mathematical functions available like logistic, logit and softmax.
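These three functions can be tried out as follows. The inputs are arbitrary, and the sketch assumes all three names are available after `using StatsFuns`.

```julia
using StatsFuns

p = logistic(0.0)         # 1 / (1 + exp(-x))
z = logit(0.75)           # log(p / (1 - p)), the inverse of logistic
roundtrip = logistic(z)   # should recover 0.75

scores = [1.0, 2.0, 3.0]
probs  = softmax(scores)  # exp.(scores) ./ sum(exp.(scores))

println((p, z, roundtrip))
println(probs)
```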

Finally, there is a suite of functions relating to various statistical distributions. The functions for the Normal distribution are illustrated below, but there are functions for the Beta, Binomial, Gamma and Hypergeometric distributions and many others. The function naming convention is consistent across all distributions.
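A sketch of the Normal functions, plus one Binomial call to show the shared naming pattern (distribution prefix followed by `pdf`, `cdf`, `invcdf` and so on). The argument order used here, parameters first and the evaluation point last, is what I expect from StatsFuns; the numbers are arbitrary.

```julia
using StatsFuns

pdf0   = normpdf(0.0)               # standard Normal density at 0
cdf0   = normcdf(0.0)               # P(Z <= 0)
q975   = norminvcdf(0.975)          # 97.5% quantile of the standard Normal
pdf_ms = normpdf(10.0, 2.0, 10.0)   # density of Normal(μ = 10, σ = 2) at x = 10

b = binompdf(10, 0.5, 5)            # P(X = 5) for X ~ Binomial(10, 0.5)

println((pdf0, cdf0, q975, pdf_ms, b))
```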

StreamStats

Finally, the StreamStats package supports calculating online statistics for a stream of data which is being continuously updated.
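To make the idea of an online statistic concrete, here is a plain-Julia sketch of a running mean and variance using Welford's algorithm. It is not the StreamStats API: `RunningStats`, `update!`, `running_mean` and `running_var` are hypothetical names invented for this example, and the sketch avoids the package itself since it may not install on recent Julia versions.

```julia
# Plain-Julia sketch of an online mean/variance (Welford's algorithm).
# RunningStats and its helpers are hypothetical names, not the StreamStats API.
mutable struct RunningStats
    n::Int          # number of observations seen so far
    mean::Float64   # running mean
    m2::Float64     # running sum of squared deviations from the mean
end

RunningStats() = RunningStats(0, 0.0, 0.0)

# Fold one new observation into the running state.
function update!(s::RunningStats, x::Real)
    s.n += 1
    delta = x - s.mean
    s.mean += delta / s.n
    s.m2 += delta * (x - s.mean)
    return s
end

running_mean(s::RunningStats) = s.mean
running_var(s::RunningStats) = s.n > 1 ? s.m2 / (s.n - 1) : NaN

stats = RunningStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # pretend these arrive one at a time
    update!(stats, x)
end

println((running_mean(stats), running_var(stats)))
```

Each update uses only the previous state and the new value, so the full stream never has to be held in memory.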

In addition to the mean and variance illustrated above, the package also supports online versions of min() and max(), and can be used to generate incremental confidence intervals for Bernoulli and Poisson processes.

That’s it for today. Check out the full code on GitHub and watch the video below.