Indicative Syllabus

Each learning unit builds on the previous ones, so units should be studied in the order presented.

Unit 1

Core DS/AI Concepts and Tools

This unit will cover the baseline knowledge you need before getting into the advanced coding aspects of DS/AI. We will discuss concepts and terminology related to data and data solutions. You will learn the fundamentals of computer science with “drag & drop blocks”. To become acquainted with the basic computational concepts of problem solving, logical thinking, and cause and effect in the context of modelling, you will be introduced to fundamental programming operations, control structures, data types and more. We will finish by configuring your working environment and installing the software necessary for your future data projects.

What you will learn:

  • To describe forms, formats, and lifecycles of data and how these vary across disciplines
  • To represent programming logic in pseudocode and/or flowcharts
  • Basic concepts such as data type and index
  • To use Boolean expressions in if-then structures for making decisions
  • To use looping structures and terminate loops correctly
  • To create your own customised functions
  • To install and run R/RStudio
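The building blocks listed above can be combined into a single short piece of R. The following is a hypothetical example (the function name and pass mark are invented for illustration) showing a custom function, a Boolean if-then decision, and a loop working together:

```r
# A hypothetical function combining the unit's building blocks:
# a custom function, a for-loop, and an if-then decision.
classify_scores <- function(scores, pass_mark = 50) {
  labels <- character(length(scores))   # pre-allocate a character vector
  for (i in seq_along(scores)) {        # loop over each index
    if (scores[i] >= pass_mark) {       # Boolean expression in an if-then
      labels[i] <- "pass"
    } else {
      labels[i] <- "fail"
    }
  }
  labels
}

classify_scores(c(35, 72, 50))   # -> "fail" "pass" "pass"
```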

Unit 2

R in RStudio: syntax and working environment

In this module you will set up the working environment and clear the first big hurdle of importing data: you will learn how to do it properly with a command in R. You will learn how to use the RStudio IDE for R, from customising RStudio to navigating files. You will learn good workflow habits and practices for an R project. Once you are comfortable with the RStudio working environment, you will move on to mastering the key features of the R language and its syntax.

What you will learn:

  • Basic use of R/RStudio console
  • Good habits for workflow
  • Data structures, variables, and basic data collection techniques
  • Inputting and importing different data types
  • R environment: record keeping
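As a taste of importing data with a single command, the sketch below writes a small example CSV file and reads it back with `read.csv()`; the file name and columns are placeholders, not part of the course materials:

```r
# Create a small example CSV, then import it with a single R command.
# The column names and values here are purely illustrative.
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Ada,72", "Bo,35"), tmp)

scores <- read.csv(tmp, stringsAsFactors = FALSE)
str(scores)    # inspect the structure of the imported data frame
head(scores)   # preview the first few rows
```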

Unit 3

Git in GitHub: Basic Shell Script

Version control has become an essential tool for keeping track of changes when working on data projects, as well as for collaborating. RStudio supports working with Git, an open-source distributed version control system, which is easy to use when combined with GitHub, a web-based Git repository hosting service. In this unit you will expand your data working environment, install Git and connect to GitHub through RStudio. You will learn how to use Git to track changes to your work over time, and how to “branch” your project into a separate copy that can be developed in isolation from the main code or merged back into the main trunk. You will discover why GitHub is useful for reproducible analysis and collaboration with others.

What you will learn:

  • To explain the difference between Git and GitHub
  • To communicate with others on GitHub
  • To create a new repository and clone an existing one on GitHub
  • To use shell commands for basic version control with Git
  • To fork, branch and collaborate using GitHub
  • Good habits and practice of integrating Git and GitHub into your R project workflow

Unit 4

Basic Statistical Concepts: Data Classification, Summary Statistics and DA Methodology

As you become more comfortable with the RStudio working environment, you will move on to mastering the key features of the R language through an introduction to fundamental statistical concepts. In this module you will learn the fundamental concepts of statistical modelling, starting with exploring data using appropriate plots and descriptive statistics, and moving on to the inferential statistics of parameter estimation and hypothesis testing. You will learn the process of generating conclusions about a population from a noisy sample. By navigating the assumptions and tools of statistical inference, you will learn how to draw conclusions from data to generate new knowledge.

What you will learn:

  • The concept of statistical distribution
  • How to explore and visualise different data types using base R
  • How to adopt common data-analysis methodology when exploring relationships within data
  • The basics of inferential statistics: making valid generalisations from sample data
  • Basic tools for making statistical inferences
  • To conduct exploratory data analysis and interpret and report its outcomes appropriately
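As a small illustration of the journey from descriptive to inferential statistics, the sketch below summarises a simulated sample in base R and then runs a one-sample t-test; the data are invented purely for illustration:

```r
# Exploring a sample and running a simple hypothesis test in base R.
# The heights below are simulated purely for illustration.
set.seed(42)
heights <- rnorm(100, mean = 170, sd = 10)   # a noisy sample of 100 values

mean(heights)   # point estimate of the population mean
sd(heights)     # spread of the sample
hist(heights)   # quick look at the sample's distribution

# One-sample t-test: is a population mean of 170 consistent with the data?
t.test(heights, mu = 170)
```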

Unit 5

Data Wrangling

In this unit you will learn some of the fundamental techniques for data exploration and transformation using the dplyr package. This tidyverse package makes your exploration intuitive to write and easy to read. You will learn dplyr’s key verbs for data manipulation, which will help you uncover and shape the information within your data into a form that is easy to turn into informative plots.

What you will learn:

  • dplyr’s key data manipulation verbs: select, mutate, filter, arrange and summarise/summarize
  • To aggregate data by groups
  • To chain data manipulation operations using the pipe operator
  • To create summary statistics for a dataset
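A sketch of the five key verbs chained with the pipe operator, using R's built-in mtcars dataset (the dplyr package is assumed to be installed; the derived column is illustrative):

```r
library(dplyr)

# Chain dplyr's key verbs with the pipe to summarise fuel efficiency by group.
cyl_summary <- mtcars %>%
  select(mpg, cyl) %>%                  # keep only the columns we need
  filter(cyl != 6) %>%                  # drop the 6-cylinder cars
  mutate(kpl = mpg * 0.4251) %>%        # derive km-per-litre from mpg
  group_by(cyl) %>%                     # aggregate by cylinder count
  summarise(mean_kpl = mean(kpl)) %>%   # one summary row per group
  arrange(desc(mean_kpl))               # order groups by fuel efficiency

cyl_summary
```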

Unit 6

Data Visualisation

In this unit you will be introduced to the fundamental principles behind effective data visualisation. Using the grammar-of-graphics plotting concepts implemented in the ggplot2 package, you will be able to create meaningful exploratory plots. You will develop an understanding of how to think about the data transformations and summaries needed to produce an informative visualisation. You will learn how to create static and interactive maps with geolocated data using the most popular packages in the R GIS community: simple features (sf) and leaflet.

What you will learn:

  • Basic principles of effective data visualisation
  • To specify ggplot2 building blocks and combine them to create a graphical display
  • About the philosophy that guides ggplot2: grammatical elements (layers) and aesthetic mapping
  • To visualise data with maps
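A minimal sketch of ggplot2's building blocks, again using the built-in mtcars dataset (ggplot2 is assumed to be installed): data and aesthetic mappings are declared once, then layers are added with `+`:

```r
library(ggplot2)

# Data + aesthetic mappings + layers combined into one graphical display.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +                             # a points layer
  geom_smooth(method = "lm", se = FALSE) +   # a fitted-line layer per group
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       colour = "Cylinders")

p   # printing the object draws the plot
```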

Unit 7

Reproducible Reporting

This unit will give you an appreciation of R programming as a tool for reproducible reporting. Adopting open and transparent approaches in your DS projects is integral to genuinely rigorous analysis and adds to the value of your work. In view of this, your code, data and algorithms should be open to scrutiny by, and connection with, the wider DS community. R Markdown and knitr make it easy to intermingle code and text to generate compelling reports and presentations that are never out of date.

What you will learn:

  • To store your data-analysis methods, results and interpretation in one place
  • To update your data-analysis methodology and automate the updating of results
  • To write your DS report text and code in R Markdown
  • To generate reproducible DS reports that display your code and results
  • To assemble your DS report as an HTML, PDF or Word document using the rmarkdown package
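A hypothetical R Markdown source file sketching how text and code are intermingled; the title and chunk contents are placeholders:

````markdown
---
title: "Example Report"
output: html_document
---

The summary below is recomputed from the data every time the report is
knitted, so the text and results can never drift out of date.

```{r mean-score}
scores <- c(35, 72, 50)
mean(scores)
```
````

Rendering the file, for example with `rmarkdown::render("report.Rmd")` or the Knit button in RStudio, runs the code chunks and produces the finished HTML document.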

Unit 8

Case Study

This unit provides much-needed hands-on experience. You will work on a real-life data problem that involves a complete data analysis, from data import to communication of results, with emphasis on reproducibility and transparency. You will develop your project openly on GitHub and contribute to a fellow student’s project, communicating via GitHub Issues; your fellow student will in turn review and accept your contributions.

What you will learn:

  • To integrate different aspects of DS knowledge and skills
  • To collaborate with others when working on a DS project
  • To gain exposure to an open source development process
  • To build personal DS portfolios