12 PM EST*


Price: $147

Regular price $210, discounted 30%

  • 4-hour immersive session

  • Hands-on training with Q&A

  • Recording available on-demand

  • Certificate of Completion


Subscribe and get an additional 10% to 35% off ALL live training sessions


Meet Your Instructor

Ankur Patel

Ankur Patel is the co-founder & Head of Data at Glean, an AI-powered spend intelligence solution for managing vendor spend, and the co-founder of Mellow, a fully managed machine learning platform for SMBs. He is an applied machine learning specialist in both unsupervised learning and natural language processing, and he is the author of Hands-on Unsupervised Learning Using Python: How to Build Applied Machine Learning Solutions from Unlabeled Data and Applied Natural Language Processing in the Enterprise: Teaching Machines to Read, Write, and Understand. Prior to founding Glean and Mellow, Ankur led data science and machine learning teams at several startups including 7Park Data, ThetaRay, and R-Squared Macro and was the lead emerging markets trader at Bridgewater Associates. He is a graduate of Princeton University and currently resides in New York City.

Why Enroll?

By the end of the course, participants will be able to:

  • Understand how unsupervised learning fits into the machine learning ecosystem

  • Apply linear and non-linear dimensionality reduction and evaluate results

  • Find patterns in data with zero or few labels and efficiently label data when only a few labels are available

  • Understand the main clustering algorithms

  • Apply clustering to group segmentation

Course Overview

In the first part, we will explore one of the core concepts in unsupervised learning: dimensionality reduction. Dimensionality reduction serves two main purposes. First, it reduces the computational complexity of working with very large datasets. Second, it removes the non-relevant information in a dataset, surfacing the information that matters most. We will use dimensionality reduction algorithms to build an anomaly detection system; specifically, we will build a system to detect credit card fraud without using any labels. Anomaly detection systems are widely used in industry today to detect all types of rare events, such as fraud (e.g., credit card, wire, cyber, insurance), crime (e.g., hacking, money laundering, and drug, arms, and human trafficking), and adverse events (e.g., financial market meltdowns, cardiac events, and spikes in online traffic).

In the second part, we will explore another core concept in unsupervised learning: clustering. Clustering segments entities (e.g., users) into distinct and homogeneous groups such that members of a group are very similar to one another but distinctly different from members of other groups. This group segmentation requires no labels whatsoever and instead relies on separating entities based on behavior. For example, via clustering, online shoppers could be grouped into budget-conscious shoppers, high-end shoppers, frequent shoppers, seasonal shoppers, technophiles, audiophiles, sneakerheads, back-to-school shoppers, young parents, senior citizens, and millennials. To perform clustering well, good feature engineering is required. In this course, we will explore loan applications, perform feature engineering, and segment users based on their potential creditworthiness. We will also explore how clustering allows efficient labeling, turning unlabeled problems into labeled ones and opening up the realm of semi-supervised learning.

Course Outline

Lesson 1. Introduction to Unsupervised Learning

  • How unsupervised learning fits into the machine learning ecosystem
  • Common problems in machine learning: insufficient labeled data, curse of dimensionality, and outliers

Lesson 2. Introduction to Dimensionality Reduction

  • Motivation for dimensionality reduction: reduce computational complexity of large data, remove non-relevant information and surface salient information, perform anomaly detection, perform clustering
  • Linear dimensionality reduction algorithms
  • Non-linear dimensionality reduction algorithms

Lesson 3. Application: Anomaly Detection

  • Introduce use case: credit card fraud detection
  • Explore and prepare the data
  • Define evaluation function
  • Apply linear dimensionality reduction and evaluate results
  • Apply non-linear dimensionality reduction and evaluate results
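
To give a flavor of the approach in this lesson, here is a minimal sketch of label-free anomaly scoring with linear dimensionality reduction. It is an illustration under stated assumptions, not the course's actual notebook: the data is synthetic rather than real credit card transactions, PCA is written directly with NumPy's SVD (in practice scikit-learn's `PCA` class would typically be used), and the function name `pca_anomaly_scores` is hypothetical. The idea is that rows which are poorly reconstructed from a few principal components do not follow the dominant structure of the data and are candidates for review.

```python
import numpy as np

def pca_anomaly_scores(X, n_components):
    """Score each row by its squared reconstruction error after PCA."""
    mean = X.mean(axis=0)
    X_centered = X - mean
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Project onto the top components, then map back to the original space.
    X_reconstructed = X_centered @ components.T @ components
    return np.sum((X_centered - X_reconstructed) ** 2, axis=1)

rng = np.random.default_rng(0)

# Synthetic stand-in for transaction data (an assumption for illustration):
# normal rows lie close to a 2-D plane embedded in 10-D space.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
normal = latent @ mixing + 0.05 * rng.normal(size=(500, 10))

# A handful of rows that do not follow that structure.
anomalies = rng.normal(scale=3.0, size=(5, 10))
X = np.vstack([normal, anomalies])

scores = pca_anomaly_scores(X, n_components=2)
# Indices of the five highest-scoring rows, the candidates to review first.
top5 = np.argsort(scores)[-5:]
```

No labels are used to compute the scores. When some labels do exist, they can be held back and used only to evaluate how well the highest-scoring rows line up with known fraud.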

Lesson 4. Introduction to Clustering

  • Why clustering is needed: the real-world motivation
  • How to find patterns in data with zero or few labels
  • How to efficiently label data when only a few labels are available
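
One simple way to picture efficient labeling is to let a few hand-labeled points speak for their whole cluster. The sketch below is an assumption about how this could be done, not the method taught in the course; the function `propagate_labels` and the example data are hypothetical. It takes cluster assignments from any clustering algorithm and gives every point the majority label among the labeled members of its cluster.

```python
from collections import Counter, defaultdict

def propagate_labels(cluster_ids, known_labels):
    """Give every point the majority label of the labeled points in its cluster.

    cluster_ids: list of cluster assignments, one per data point.
    known_labels: dict mapping a point's index to its hand-assigned label.
    Points in clusters with no labeled member get None.
    """
    votes = defaultdict(Counter)
    for index, label in known_labels.items():
        votes[cluster_ids[index]][label] += 1
    # Pick the most frequent hand-assigned label within each cluster.
    cluster_label = {c: counts.most_common(1)[0][0] for c, counts in votes.items()}
    return [cluster_label.get(c) for c in cluster_ids]

# Hypothetical example: 8 loan applicants in 3 clusters, only 2 labeled by hand.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2]
known_labels = {0: "low risk", 4: "high risk"}
labels = propagate_labels(cluster_ids, known_labels)
```

Clusters without any labeled member stay unlabeled (`None`), which points to where the next hand-assigned label would be most useful.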

Lesson 5. Overview of Clustering Algorithms

  • K-Means
  • Hierarchical clustering
  • Apply to MNIST and Fashion MNIST datasets
  • Visualize clusters and evaluate results
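
As a rough illustration of what K-Means does, the sketch below implements its two alternating steps (assign each point to the nearest center, then move each center to the mean of its points) in plain NumPy. Several things here are assumptions for illustration: the data is three synthetic 2-D blobs rather than MNIST, the farthest-point initialization is a simple deterministic stand-in for the smarter initializations found in libraries, and purity is just one possible evaluation measure. In practice scikit-learn's `KMeans` class would typically be used.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm with farthest-point initialization."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centers chosen so far.
    centers = [X[0]]
    for _ in range(k - 1):
        dists = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dists)])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assignments = sq_dists.argmin(axis=1)
        # Update step: each center moves to the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[assignments == j].mean(axis=0) if np.any(assignments == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assignments, centers

def purity(assignments, true_labels):
    """Fraction of points that carry the majority true label of their cluster."""
    total = 0
    for c in np.unique(assignments):
        members = true_labels[assignments == c]
        total += np.bincount(members).max()
    return total / len(true_labels)

rng = np.random.default_rng(0)
# Three well-separated synthetic blobs of 100 points each.
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
true_labels = np.repeat(np.arange(3), 100)
X = blob_centers[true_labels] + rng.normal(scale=0.5, size=(300, 2))

assignments, centers = kmeans(X, k=3)
score = purity(assignments, true_labels)
```

The true labels are used only to evaluate the result; the clustering itself never sees them. On data this cleanly separated the clusters should match the blobs, whereas real datasets such as MNIST are much harder and usually benefit from dimensionality reduction first.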

Lesson 6. Application: Group Segmentation

  • Introduce use case: loan applications
  • Explore and prepare the data
  • Define evaluation function
  • Apply clustering algorithms and evaluate results

Key Details

MAY 11TH, 2021

Python coding experience and familiarity with pandas, numpy, and scikit-learn would be helpful.

An understanding of basic machine learning concepts, including supervised learning, and experience with deep learning and frameworks such as TensorFlow or PyTorch are a plus.

Upcoming Live Training

May 18th

Deep Unsupervised Learning: Autoencoders, Semi-supervised Learning, and Generative Models

In this course, we will explore one of the core concepts in unsupervised learning, autoencoders, and introduce semi-supervised learning. We will build unsupervised, supervised, and semi-supervised (using autoencoders) credit card fraud detection systems. First, we will employ a purely unsupervised approach, without the use of any labels. Next, we will employ a supervised approach on a partially labeled dataset. Finally, we will apply autoencoders (an unsupervised learning technique) to the partially labeled dataset and combine this with a supervised approach, building a semi-supervised solution. To conclude, we will compare and contrast the results of all three approaches.

Open Data Science

Ai+ | ODSC
One Broadway, 14th Floor
Cambridge, MA 02142
