Machine Learning: Classification


  • k-Nearest Neighbours (kNN)
    • How it works
    • Finding a good value for k
    • Importance of normalising data
  • Naive Bayes
    • Background on Bayesian Methods
    • Probabilistic model
    • Flavours of Naive Bayes
    • Laplace smoothing
    • Document Classifier
  • Model Evaluation
    • Confusion Matrix
    • Accuracy
    • Recall / Sensitivity
    • Precision
    • Specificity
    • Positive/Negative Predictive Value
    • F Measure
    • ROC and AUC
  • Costs of Errors
  • Decision Trees
    • Recursive Partitioning algorithm
    • Pruning
    • Model parameters (preventing underfitting and overfitting)
    • A variation: Conditional Inference Trees
  • Support Vector Machine
    • Maximum Margin Classifiers
    • Support Vector Classifiers
    • The Kernel Trick
    • Non-Linear Boundaries
      • Polynomial Kernel
      • Radial Kernel
  • Unbalanced Data
    • Oversampling
    • Undersampling
    • Synthetic Data Generation

Prior Knowledge

We assume that participants have prior experience with R, ideally having completed both the Introduction to R and Data Wrangling courses.


Training Philosophy

Our training emphasises practical skills. Although you'll learn concepts and theory, you'll also see how they are applied in the real world. We work through examples and exercises based on real datasets.


All you'll need is a computer with a browser and a decent internet connection. We'll be using an online development environment. This means that you can focus on learning and not on solving technical problems.

Of course, we are happy to help you get your local environment set up too! You can start by following these instructions.


The training package includes access to
  • our online development environment and
  • detailed course material (slides and scripts).
