Web Scraping

1. Introduction

Exegetic Analytics is a Data Science consultancy specialising in data acquisition and augmentation, data preparation, predictive analytics and machine learning. Our consultants are based in Durban and Cape Town, and we engage with clients all over the world. Our products and services are used by a wide range of industries, including Aerospace, Education, Finance, Food (including Food Delivery), Politics, Security and Transport.

Exegetic Analytics also offers training, with experienced and knowledgeable facilitators. Our courses focus on practical applications, working through examples and exercises based on real-world datasets.

All of our training packages include access to:

  • our online development environment and
  • detailed course material, which participants can continue to access after the training has concluded.

For more information about what we do, you can refer to our website.

These are some of the companies that have benefitted from our training:

Take a look at our full list of courses to see what other training we have on offer.

Contact Us

If this proposal is of interest to you, or if you would like to hear more about what we do, you can get in touch at training@exegetic.biz or +27 73 805 7439.

2. Course Description

There’s a wealth of data available on the internet which can be used for data augmentation or to create entirely new datasets.


Duration: 2 days
Objectives: In this course you’ll learn how to use Python to scrape data from websites selectively, systematically and automatically.
Outcomes: You’ll know

  • the basics of HTML and CSS;
  • how to parse an HTML document and extract data using Beautiful Soup;
  • how to build a spider to traverse an entire website using Scrapy;
  • how to drive a browser using Selenium and its Python bindings; and
  • how to store scraped data as CSV or JSON.
Requirements: You should have experience with Python. Familiarity with HTML and CSS will be an advantage.
  • Get a recent version of Google Chrome or Firefox.
  • Get a recent version of Python.
    • Install the following packages: requests, numpy, pandas, beautifulsoup4 and selenium.
  • Install a VNC client.
  • Install Docker and pull the following images:
    • selenium/standalone-firefox:3.14
    • selenium/standalone-chrome-debug:3.14
  • Go through this fun CSS tutorial.
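The last outcome, storing scraped data as CSV or JSON, needs nothing beyond Python's standard library. The sketch below assumes the scraped records are already a list of dictionaries; the `items` sample data and the helper names `to_csv` and `to_json` are hypothetical, chosen for illustration only.

```python
import csv
import io
import json

# Hypothetical records standing in for data scraped from a page.
items = [
    {"name": "Widget", "price": 9.99},
    {"name": "Gadget", "price": 4.5},
]

def to_csv(records):
    """Return records (a list of dicts) as CSV text with a header row."""
    if not records:
        return ""
    buffer = io.StringIO()
    # Column order follows the keys of the first record.
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

def to_json(records):
    """Return records as a pretty-printed JSON array."""
    return json.dumps(records, indent=2)

csv_text = to_csv(items)
json_text = to_json(items)
```

To write to disk instead, open the file with `open("items.csv", "w", newline="", encoding="utf-8")` and pass it to `csv.DictWriter` in place of the `StringIO` buffer; the `csv` module documentation recommends `newline=""` for files it writes. If the data is already in a pandas DataFrame, `DataFrame.to_csv()` and `DataFrame.to_json()` do the same job.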

Return to our list of courses.

3. Course Outline

  • Introduction to Web Scraping
    • Structure of a web page
    • Selectors
      • CSS
      • XPath
    • Browser tools
  • HTTP
    • Requests
    • Status codes
  • requests
    • GET request
    • The response
      • Status code
      • Headers
      • Content
    • Query strings
    • Request headers
    • Authentication
    • Performance
    • Timeouts
    • Sessions
  • Beautiful Soup — Parsing HTML
    • BeautifulSoup object
    • Tag object
      • Name
      • Text
      • Attributes
    • Navigating HTML
      • Using tag names
      • select() and select_one()
      • Parents, siblings, descendants and children
      • Various forms of find()
  • Selenium — Scraping dynamic sites
    • Selenium on Docker
      • Chrome
      • Firefox
      • Debug images and VNC connections
    • Navigation
    • Locating elements
    • Waiting (explicit and implicit waits)
    • Screenshots
    • Cookies
  • Scrapy — Creating a Spider
    • What is a spider?
    • Scrapy shell
    • Project
    • Writing spiders
      • Spider class
      • Selectors
      • Navigation
      • Gathering links
      • Extracting and storing data
      • Following links and recursion
    • Spider patterns
      • Using sitemap.xml
    • Using Selenium
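The Scrapy topics above (gathering links, following links and recursion) rest on a simple loop: fetch a page, extract its links, and queue the ones not yet seen. The sketch below illustrates that loop with the standard library only; it is not Scrapy's API, which manages scheduling, duplicate filtering and concurrency for you. The `fetch` callable and the in-memory `SITE` dictionary are hypothetical stand-ins for real HTTP requests, and restricting the crawl to the start URL's host is an assumption made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(page_url, html):
    """Return the page's links resolved to absolute URLs."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(page_url, href) for href in parser.links]

def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl that stays on the start URL's host.

    fetch(url) is a hypothetical callable that returns HTML text,
    or None when the page cannot be retrieved.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        for link in extract_links(url, html):
            # Skip links already queued and links to other hosts.
            if link not in seen and urlparse(link).netloc == host:
                seen.add(link)
                queue.append(link)
    return visited

# A tiny in-memory "website" so the sketch runs without network access.
SITE = {
    "https://example.com/": '<a href="/a">A</a> <a href="https://other.org/">Out</a>',
    "https://example.com/a": '<a href="/b">B</a> <a href="/">Home</a>',
    "https://example.com/b": '<a href="/a">A</a>',
}

pages = crawl("https://example.com/", SITE.get)
```

In a live scraper, `fetch` might wrap `requests.get(url, timeout=10)` and return `response.text` for a successful response. With Scrapy, the equivalent step is yielding new requests (for example via `response.follow()`) from a spider's `parse()` method and letting Scrapy's built-in duplicate filter discard URLs it has already seen.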

Training Philosophy

Our training emphasises practical skills: although you'll be learning concepts and theory, you'll see how everything is applied in the real world. We will work through examples and exercises based on real datasets.


All you'll need is a computer with a browser and a decent internet connection. We'll be using an online development environment. This means that you can focus on learning and not on solving technical problems.

Of course, we are happy to help you get your local environment set up too! You can start by following these instructions.


The training package includes access to:
  • our online development environment and
  • detailed course material (slides and scripts).