Foundations for Machine Learning
Learn the #1 skill required to succeed as a machine learning engineer or data scientist
Math and Statistics
Unlike other math and statistics courses, this foundations series is built from the ground up to boost your understanding of machine learning principles.Â
-
Linear Algebra for Machine Learning
Available On-DemandNow available on-demand. The course includes the following modules
- Intro to Linear Algebra
- Linear Algebra II: Matrix Operations
Over the course of studying this topic, you’ll:
- Understand the fundamentals of linear algebra, a ubiquitous approach for solving for unknowns within high-dimensional spaces.
- Develop a geometric intuition of what’s going on beneath the hood of machine learning algorithms, including those used for deep learning.
- Be able to more intimately grasp the details of machine learning papers as well as all of the other subjects that underlie ML, including calculus, statistics, and
- Develop a geometric intuition of what’s going on beneath the hood of machine learning algorithms, including those used for deep learning.
- Be able to more intimately grasp the details of machine learning papers as well as all of the other subjects that underlie ML, including calculus, statistics, and optimization algorithms.
- Reduce the dimensionality of complex spaces down to their most informative elements with techniques such as eigendecomposition, singular value decomposition, and principal components analysis
Available On-Demand Linear Algebra for Machine Learning
This topic, Intro to Linear Algebra, is the first in the Machine Learning Foundations series. It is essential because linear algebra lies at the heart of most machine learning approaches and is especially predominant in deep learning, the branch of ML at the forefront of today’s artificial intelligence advances.
-
Calculus for Machine Learning
Available On-DemandNow available on-demand. The course includes the following modules
- Calculus I: Limits & Derivatives
- Calculus II: Partial Derivatives & Integrals
Over the course of studying this topic, you’ll:
- Develop an understanding of what’s going on beneath the hood of machine learning algorithms, including those used for deep learning.
- Be able to more intimately grasp the details of machine learning papers as well as many of the other subjects that underlie ML, including partial-derivative calculus, statistics, and optimization algorithms.
- Compute the derivatives of functions, including by using AutoDiff in the popular TensorFlow 2 and PyTorch libraries.
- Develop an understanding of what’s going on beneath the hood of machine learning algorithms, including those used for deep learning.
- Be able to grasp the details of the partial-derivative, multivariate calculus that is common in machine learning papers as well as many in other subjects that underlie ML, including information theory and optimization algorithms.
- Use integral calculus to determine the area under any given curve, a recurring task in ML applied, for example, to evaluate model performance by calculating the ROC AUC metric.
Available On-Demand Calculus for Machine Learning
This topic, Calculus I: Limits & Derivatives, introduces the mathematical field of calculus -- the study of rates of change -- from the ground up. It is essential because computing derivatives via differentiation is the basis of optimizing most machine learning algorithms, including those used in deep learning
-
Probability and Statistics
Available On-DemandNow available on-demand. The course includes the following module
- Probability and Information Theory
- Intro to Statistics
Over the course of studying this topic, you’ll:
- Develop an understanding of what’s going on beneath the hood of predictive statistical models and machine learning algorithms, including those used for deep learning.
- Understand the appropriate variable type and probability distribution for representing a given class of data, as well as the standard techniques for assessing the relationships between distributions.
- Apply information theory to quantify the proportion of valuable signal that’s present amongst the noise of a given probability distribution.
- Develop an understanding of what’s going on beneath the hood of predictive statistical models and machine learning algorithms, including those used for deep learning.
- Hypothesize about and critically evaluate the inputs and outputs of machine learning algorithms using essential statistical tools such as the t-test, ANOVA, and R-squared.
- Use historical data to predict the future using regression models that take advantage of frequentist statistical theory (for smaller data sets) and modern machine learning theory (for larger data sets), including why we may want to consider applying deep learning to a given problem.
Available On-Demand Probability and Statistics
Probability & Information Theory introduces the mathematical fields that enable us to quantify uncertainty as well as to make predictions despite uncertainty. These fields are essential because ML algorithms are both trained by imperfect data and deployed into noisy, real-world scenarios.
-
Computer Science
Available On-DemandNow available on-demand. The course includes the following modules
- Algorithms and Data Structures
- Optimization
Over the course of studying this topic, you’ll:
- Use “Big O” notation to characterize the time efficiency and space efficiency of a given algorithm, enabling you to select or devise the most sensible approach for tackling a particular machine learning problem with the hardware resources available to you.
- Get acquainted with the entire range of the most widely-used Python data structures, including list-, dictionary-, tree-, and graph-based structures.
- Develop an understanding of all of the essential algorithms for working with data, including those for searching, sorting, hashing, and traversing.
- Discover how the statistical and machine learning approaches to optimization differ, and why you would select one or the other for a given problem you’re solving.
- Find out how the extremely versatile (stochastic) gradient descent optimization algorithm works, including how to apply it — from a low, in-depth level as well as from a high, abstracted level — within the most popular deep learning libraries, Tensorflow and PyTorch
- Get acquainted with the “fancy” optimizers that are available for advanced machine learning approaches (e.g., deep learning) and when you should consider using them.
Available On-Demand Computer Science
This session, Algorithms & Data Structures, introduces the most important computer science topics for machine learning, enabling you to design and deploy computationally efficient data models.
Meet Your Instructor : Dr. Jon KrohnÂ
Jon Krohn is Chief Data Scientist at the machine learning company, untapt. He authored the 2019 book Deep Learning Illustrated, an instant #1 bestseller that was translated into six languages. Jon is renowned for his compelling lectures, which he offers in-person at Columbia University, New York University, and the NYC Data Science Academy. Jon holds a Ph.D. in Neuroscience from Oxford and has been publishing on machine learning In leading academic journals since 2010; his papers have been cited over a thousand times.
Student Testimonials
How It Works
The foundations series is available on demand.
Each course is available on-demand as soon as you register.
Study the courses in order or skip the subjects you are already know.
Each course includes exercises to improve learning outcomes.
Coding demos allow you to learn hands-on skills.
Learn at your own pace. Courses can be taken alongside additional Ai+ courses.
Interactive Sessions
Hands-On Coding Demos
Learning Comprehension Exercises
What You Will Learn
Not only will you learn the core mathematical concepts, but you will also learn how they are applied to machine learning. In addition, you will learn to apply your knowledge using some of the key machine learning and deep learning platforms, such as Tensorflow and PyTorch.
Linear Algebra
Data Structures for Algebra
What Linear Algebra Is, A Brief History of Algebra
Vectors and Vector Transposition
Norms and Unit Vectors
Basis, Orthogonal, and Orthonormal Vectors
Arrays in NumPy, Matrices
Tensors in TensorFlow and PyTorch
Common Tensor Operations
Tensors, Scalars
Tensor Transposition
Basic Tensor Arithmetic
Reduction
The Dot Product
Solving Linear Systems
Matrix Properties
The Frobenius Norm
Matrix Multiplication
Symmetric and Identity Matrices
Matrix Inversion
Diagonal Matrices
Orthogonal Matrices
Eigendecomposition
Eigenvectors
Eigenvalues
Matrix Determinants
Matrix Decomposition
Application of Eigendecomposition
Matrix Operations for Machine Learning
Singular Value Decomposition (SVD)
The Moore-Penrose Pseudoinverse
The Trace Operator
Principal Component Analysis (PCA): A Simple Machine Learning Algorithm
Resources for Further Study of Linear Algebra
Calculus
Limits
What Calculus is
A Brief History of Calculus
The Method of Exhaustion
Matrix Decomposition
Application of Eigendecomposition
Computing Derivatives with Differentiation
The Delta Method
Basic Derivative Properties
The Power Rule
The Sum Rule
The Product Rule
The Quotient Rule & The Chain Rule
Automatic Differentiation
AutoDiff with Pytorch
AutoDiff with TensorFlow 2
Relating Differentiation to Machine Learning
Cost (or Loss) Functions
The Future: Differentiable Programming
Gradients Applied to Machine Learning
Partial Derivatives of Multivariate Functions
The Partial-Derivative Chain Rule
Cost (or Loss) Functions
Gradients
Gradient Descent
Backpropagation
Higher-Order Partial Derivatives
 Integrals
Binary Classification
The Confusion Matrix
The Receiver-Operating Characteristic (ROC) Curve
Calculating Integrals Manually
Numeric Integration with Python
Finding the Area Under the ROC Curve
Resources for Further Study of Calculus
Probability and Statistics
Introduction to Probability
What Probability Theory Is
Applications of Probability to Machine Learning
Discrete vs Continuous Variables
Probability Density FunctionÂ
Expected Value
Measures of Central Tendency
Quantiles: Quartiles, Deciles, and Percentiles
Measures of Dispersion:
Covariance and Correlation
Marginal and Conditional Probabilities
Distribution in Machine Learning
Uniforms
Gaussian: Normal and Standard Normal
The Central Limit Theorem
Log-Normal
Binominal and Multinomial
Poisson
Mixture Distributions
Preprocessing Data for Model Input
 Information Theory
What Information Theory Is
Self-Information
Nats, Bits and Shannons
Shannon and Differential Entropy
Kullback-Leibler Divergence
Cross-Entropy
Frequentist Statistics
Frequentist vs Bayesian Statistics
Review of Relevant Probability Theory
Z-scores and Outliers
P-values
Comparing Means with t-tests
Confidence Intervals
ANOVA: Analysis of Variance
Pearson Correlation Coefficient
R-Squared CoefficientÂ
Correlation vs Causation
Multiple Comparisons
Regression
Features: Independent vs Dependent Variables
Linear Regression to Predict Continuous Values
Fitting a Line to Points on a Cartesian Plane
Ordinary Least Squares
Logistic Regression to Predict Categories
(Deep) ML vs Frequentist Statistics
 Bayesian Statistics
When to Use Bayesian Statistics
Prior Probabilities
Bayes’ Theorem
PyMC3 Notebook
Resources for Further Study of Probability and Statistics
Computer Science
Introduction to Data Structures and Algorithms
Introduction to Data Structures
Introduction to Computer Algorithms
A Brief History of Data
A Brief History of Algorithms
“Big O” Notation for Time and Space Complexity
Lists and Dictionaries
List-Based Data Structures: Arrays, Linked Lists, Stacks, Queues, and Deques
Searching and Sorting: Binary, Bubble, Merge, and Quick
Set-Based Data Structures: Maps and Dictionaries
Tables, Load Factors, and Maps
Trees and Graphs
Trees: Decision Trees, Random Forests, and Gradient-Boosting (XGBoost)
Graphs: Terminology, Directed Acyclic Graphs (DAGs)
Resources for Further Study of Data Structures & Algorithms
The Machine Learning Approach to Optimization & Fancy Deep Learning Optimizers
The Statistical Approach to Regression: Ordinary Least Squares
When Statistical Approaches to Optimization Break Down
The Machine Learning Solution
A Layer of Artificial Neurons in PyTorch
Jacobian Matrices
Hessian Matrices and Second-Order Optimization
Momentum
Nesterov Momentum
AdaGrad, AdaDelta, RMSProp, Adam, Nadam
Training a Deep Neural Net
Resources for Further Study
Gradient Descent
Objective Functions
Cost / Loss / Error Functions
Minimizing Cost with Gradient Descent
Learning Rate
Critical Points, incl. Saddle Points
Gradient Descent from Scratch with PyTorch
The Global Minimum and Local Minima
Mini-Batches and Stochastic Gradient Descent (SGD)
Learning Rate Scheduling
Maximizing Reward with Gradient Ascent
Prerequisites
Programming: All code demos will be in Python, so experience with it, or another object-oriented programming language, would be helpful for following along with the code examples.
Mathematics: Familiarity with secondary school-level mathematics will make the class easier to follow. If you are comfortable dealing with quantitative information — such as understanding charts and rearranging simple equations — then you should be well prepared to follow along with all the mathematics. Â
Learn More

On-Demand
Machine Learning Foundations: Linear Algebra
Through the measured exposition of theory paired with interactive examples, you’ll develop an understanding of how linear algebra is used to solve for unknown values in high-dimensional spaces, thereby enabling machines to recognize patterns and make predictions.

On-Demand
Machine Learning Fundamentals – Calculus
Through the measured exposition of theory paired with interactive examples, you’ll develop a working understanding of how calculus is used to compute limits and differentiate functions. You’ll also learn how to apply automatic differentiation within the popular TensorFlow 2 and PyTorch machine learning libraries.Â

On-Demand
Machine Learning Foundations: Probability and Statistics
Through the measured exposition of theory paired with interactive examples, you’ll develop a working understanding of variables, probability distributions, metrics for assessing distributions, and graphical models. You’ll also learn how to use information theory to measure how much meaningful signal there is within some given data.Â

On-Demand
Machine Learning Foundations: Computer Science
Through the measured exposition of theory paired with interactive examples, you’ll develop a working understanding of all of the essential data structures across the list, dictionary, tree, and graph families. You’ll also learn the key algorithms for working with these structures, including those for searching, sorting, hashing, and traversing data.
The #1 Machine Learning Mini-Bootcamp
