Data Wrangling


1. Introduction

Exegetic Analytics is a Data Science consultancy specialising in data acquisition and augmentation, data preparation, predictive analytics and machine learning. Our consultants are based in Durban and Cape Town and we engage with clients all over the world. Our products and services are used across a range of industries, including Aerospace, Education, Finance, Food, Food Delivery, Politics, Security and Transport.

Exegetic Analytics also offers training, with experienced and knowledgeable facilitators. Our courses focus on practical applications, working through examples and exercises based on real-world datasets.

All of our training packages include access to:

  • our online development environment and
  • detailed course material, which participants can continue to access after the training has concluded.

For more information about what we do, you can refer to our website.

These are some of the companies that have benefitted from our training:

Take a look at our full list of courses to see what other training we have on offer.

Contact Us

If this proposal is of interest to you, or you would like to hear more about what we do, you can get in touch on +27 73 805 7439.

2. Course Description


Duration: 1 day
Who should attend? The course is aimed at students, academics and professionals who conduct data analysis using other tools like Excel. It is assumed that participants already have some familiarity with R.
Objectives: R is a phenomenal tool for working with data. Laying out the steps in an analysis as a script makes the analysis repeatable and allows it to be version controlled. One of the first steps in any analysis is the preparation of the data. Three R packages (dplyr, tidyr and purrr) expose a wide range of functionality to aid in the data preparation process.
Outcomes: Participants will be able to apply a selection of “tidy” operations to data using the dplyr, tidyr and purrr packages.

Return to our list of courses.

3. Course Outline

  • Loading data from files
  • Manipulating Data Frames with dplyr
    • Selecting columns with select()
    • Filtering rows with filter()
    • Sorting with arrange()
    • Adding and changing columns with mutate() and transmute()
    • Aggregation with group_by() and summarise()
    • Assorted other functions from dplyr
  • Pivoting with tidyr
    • Long versus wide data formats
    • Going wide with spread()
    • Getting long with gather()
    • Splitting compound columns using separate() and extract()
    • Explicit missing values with complete()
    • Handling columns of data frames with nest() and unnest()
  • Functional programming with purrr
    • Mapping functions of a single variable with map()
    • Mapping to a specific data type
    • Mapping functions of two variables with map2()
    • Mapping functions with many variables using pmap()
    • Leveraging side effects with walk(), walk2() and pwalk()
    • Repeating with delays using insistently() and slowly()
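
To give a flavour of the operations listed in the outline, here is a minimal sketch in R. It assumes that the dplyr, tidyr and purrr packages are installed. The `sales` data frame and its column names are invented purely for illustration and are not taken from the course material.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical data: quarterly sales per region (illustrative only).
sales <- tibble(
  region  = c("North", "North", "South", "South"),
  quarter = c("Q1", "Q2", "Q1", "Q2"),
  units   = c(10, 15, 8, 12),
  price   = c(2.5, 2.5, 3.0, 3.0)
)

# dplyr: add a column, filter rows, aggregate by group and sort.
revenue_by_region <- sales %>%
  mutate(revenue = units * price) %>%
  filter(units > 8) %>%
  group_by(region) %>%
  summarise(total_revenue = sum(revenue)) %>%
  arrange(desc(total_revenue))

# tidyr: go wide with spread(), then get long again with gather().
wide <- sales %>%
  select(region, quarter, units) %>%
  spread(quarter, units)

long <- wide %>%
  gather(quarter, units, -region)

# purrr: map a function of a single variable, returning a double vector.
squares <- map_dbl(1:3, ~ .x^2)

# purrr: map a function of two variables over paired vectors.
revenue <- map2_dbl(sales$units, sales$price, ~ .x * .y)
```

Each step is written as part of a script, so the whole preparation can be rerun on new data and kept under version control.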


Training Philosophy

Our training emphasises practical skills. So, although you'll be learning concepts and theory, you'll see how everything is applied in the real world. We will work through examples and exercises based on real datasets.


All you'll need is a computer with a browser and a decent internet connection. We'll be using an online development environment. This means that you can focus on learning and not on solving technical problems.

Of course, we are happy to help you get your local environment set up too! You can start by following these instructions.


The training package includes access to:
  • our online development environment and
  • detailed course material (slides and scripts).
