Machine Learning



1. Introduction

Exegetic Analytics is a Data Science consultancy specialising in data acquisition and augmentation, data preparation, predictive analytics and machine learning. Our consultants are based in Durban and Cape Town and we engage with clients all over the world. Our products and services are used across a wide range of industries, including Aerospace, Education, Finance, Food and Food Delivery, Politics, Security and Transport.

Exegetic Analytics also offers training, with experienced and knowledgeable facilitators. Our courses focus on practical applications, working through examples and exercises based on real-world datasets.

All of our training packages include access to:

  • our online development environment and
  • detailed course material, which participants can continue to access after the training has concluded.

For more information about what we do, you can refer to our website.

These are some of the companies that have benefitted from our training:

Take a look at our full list of courses to see what other training we have on offer.

Contact Us

If this proposal is of interest to you, or you would like to hear more about what we do, you can get in touch on +27 73 805 7439.

2. Course Description


Duration: 3 days
Who should attend? The course is aimed at students, academics and professionals who want to use Machine Learning to build models and make predictions.
Objectives: Machine Learning is a big (and rather hot!) topic. In this course you’ll learn how to apply Machine Learning to two types of problems: Classification and Regression. Although Machine Learning models are often treated as black boxes, you’ll learn (in an unthreatening, low-math way) how these models work. You’ll also learn how to prepare your data appropriately, how to build and test a model, and how to generate predictions.
Outcomes: Participants will be able to build Classification and Regression models on real-world data. They will understand how the models work and how to interpret model predictions.
Requirements: Participants are assumed to have prior exposure to R, or at least to programming of some variety. Ideally, participants should have completed the Data Wrangling and Visualisation modules.

Return to our list of courses.

3. Course Outline

Day 1: Classification

  • Introduction
    • What is Machine Learning?
    • Regression and Classification
      • Concepts of Accuracy
      • Residuals
      • Best fit and least squares
    • Model Optimisation (Underfitting and Overfitting)
  • Classification
    • k-Nearest Neighbours (kNN)
      • How it works
      • Finding a good value for k
      • Importance of normalising data
    • Naive Bayes
      • Background on Bayesian Methods
      • Probabilistic model
      • Flavours of Naive Bayes
      • Laplace smoothing
      • Document Classifier
    • Model Evaluation
      • Confusion Matrix
      • Accuracy
      • Recall / Sensitivity
      • Precision
      • Specificity
      • Positive/Negative Predictive Value
      • F Measure
      • ROC and AUC
    • Costs of Errors
    • Data Preparation
      • Transformations (log(), sqrt() and Box-Cox)
      • Missing Data
      • Unbalanced Data
        • Oversampling
        • Undersampling
        • Synthetic Data Generation
      • {recipes}
    • Decision Trees
      • Recursive Partitioning algorithm
      • Pruning
      • Model parameters (preventing underfitting and overfitting)
      • A variation: Conditional Inference Trees
    • Support Vector Machine
      • Maximum Margin Classifiers
      • Support Vector Classifiers
      • The Kernel Trick
      • Non-Linear Boundaries
        • Polynomial Kernel
        • Radial Kernel

Day 2: Linear Models & Dimension Reduction

  • Linear Models
    • Motivating Example
    • k-Nearest Neighbours
    • Linear Regression
      • Assumptions
      • Multiple regression
      • Model evaluation (RMSE, MAE and MPE)
      • Categorical variables
        • One-Hot Encoding (low-cardinality variables)
        • Target Encoding (high-cardinality variables)
      • Formulae
        • Simple Formulae
        • Interactions
      • Example: Prostate Cancer with Interactions
      • Polynomial regression
      • LOESS
    • Validating Model Assumptions
      • Fit Diagnostics
    • Using {broom}
    • Logistic Regression
      • Odds, Log Odds and the Logit Function
      • Example: Synthetic Data
      • Thresholding and classification
      • Principle of Parsimony
      • Multicollinearity
      • Example: Myopia Data
      • Beyond binary: One-versus-rest models

Day 3: Caret, Validation & Ensembles

  • Validation
    • Why Validation?
    • k-Fold Cross-Validation
    • Repeated Cross-Validation
    • Leave-One-Out Cross-Validation
    • Bootstrap
    • Model Tuning / Parameter Selection
  • Using {caret}
    • Pre-processing
      • Dealing with missing data
      • Handling unbalanced data
    • Train/test splitting
    • Feature importance and feature selection
    • Model evaluation (using cross validation and bootstrapping)
    • Model tuning
  • Ensembles
    • The Idea: “Wisdom of the Crowd”
    • Homogeneous and Heterogeneous Ensembles
    • Bagging
      • Random Forests
  • Machine Learning at Scale
    • Building many models (automation)

Day 4: Feature Selection & Penalised Regression

  • Feature Importance
  • Feature Selection
    • Stepwise (forward selection and backward elimination)
  • Penalised Regression
    • Lasso
    • Ridge Regression

Day 5: Unsupervised Learning

  • Clustering
  • Dimension Reduction
    • PCA
    • Linear Discriminant Analysis

4. Training Philosophy

Our training emphasises practical skills. So, although you'll be learning concepts and theory, you'll see how everything is applied in the real world. We will work through examples and exercises based on real datasets.


All you'll need is a computer with a browser and a decent internet connection. We'll be using an online development environment. This means that you can focus on learning and not on solving technical problems.

Of course, we are happy to help you get your local environment set up too! You can start by following these instructions.


The training package includes access to:

  • our online development environment and
  • detailed course material (slides and scripts).
