LIVE TRAINING: April 20th




Price: $147

Regular price $210, discounted 30%

  • 4-hour immersive session

  • Hands-on training with Q&A

  • Recording available on-demand

  • Certificate of Completion


Subscribe and get an additional 10% to 35% off ALL live training sessions


Meet Your Instructor

Aric LaBarr, PhD

Dr. Aric LaBarr is an Associate Professor in the Institute for Advanced Analytics, home of the nation’s first Master of Science in Analytics degree program. He is passionate about helping people solve challenges using their data. At the Institute he helps design an innovative program that prepares a modern workforce to communicate wisely and to handle a data-driven future. He teaches courses in predictive modeling, forecasting, simulation, financial analytics, and risk management. Previously, he was Director and Senior Scientist at Elder Research, where he mentored and led a team of data scientists and software engineers. As director of the Raleigh, NC office, he worked closely with clients and partners to solve problems in banking, consumer packaged goods, healthcare, and government. Dr. LaBarr holds a B.S. in economics, as well as a B.S., M.S., and Ph.D. in statistics, all from NC State University.

Why Enroll?

By the end of the course, participants will be able to:

  • Use network analysis to create features for fraud models, such as centrality and connectivity
  • Properly oversample or undersample a rare-event data set, and use synthetic sampling techniques such as SMOTE
  • Build a supervised fraud classification model using one of the following: logistic regression, tree-based algorithms, or naive Bayes models
  • Build a supervised NOT-fraud classification model using one of the above techniques
  • Interpret a complicated model using LIME

Course Overview

The Association of Certified Fraud Examiners (ACFE) consistently estimates that organizations lose approximately 5% of their revenues to fraud. Based on world GDP estimates, this would amount to roughly $3 to $4 trillion annually.

Fraud is one of the most interesting problems to try to solve because the people in your data are trying not to be found. Data science techniques are now at the forefront of the fight against these criminals. This course outlines the typical fraud framework at an organization and where data science can play a role. It also lays out how to build an analytically advanced fraud system. Moving beyond simple rules and anomaly detection, these supervised and unsupervised approaches to fraud modeling will help an organization combat the ever-present problem of fraud.

These fraud modeling approaches can also be used in other industries to help organizations find unique customers or problems that might exist in their current systems.

Course Outline

1. Review of Fraud

The Problem of Fraud – How can we analytically define fraud? Fraud has important characteristics that put the modeling and identification of fraud in better perspective.
Detection and Prevention – The two biggest pieces that any holistic fraud solution should have are detection of previous instances of fraud and prevention of new instances. This section also defines the typical fraud identification process in organizations.
Analytical Solution – Now that we know what fraud is and how organizations are structured to deal with it, we introduce the analytical approaches that help an organization mature in detecting and preventing fraud.

2. Data Preparation

Review of Feature Engineering – The best way to glean information from data is to develop good features that help detect and identify fraud. We discuss and develop strategies for building good features for anomaly detection. We also briefly review RFM (recency, frequency, monetary) features and categorical feature creation.
Introduction of Network Approaches – When generating features, we can also incorporate the ideas of network analysis to our modeling framework. Who people are connected to could play a major role in detecting instances of fraud as well as complex fraud rings.
Obtaining Labeled Data – The hardest part about modeling fraud is obtaining labeled cases of fraud. In this section we will talk about using anomaly models, subject matter experts, and/or unsupervised techniques to obtain labels for suspected fraud.
Sampling Concerns – Fraud is typically, and hopefully, a rare event at a company. However, this rarity poses problems for modeling. In this section we cover oversampling and undersampling to account for the rare-event modeling problem. We also introduce the Synthetic Minority Oversampling TEchnique (SMOTE).
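
As a rough illustration of the sampling ideas above, the following sketch shows random undersampling, random oversampling, and a simplified SMOTE-style interpolation using only the Python standard library. The function names, the `ratio` and `k` parameters, and the tuple-of-numbers row format are assumptions made for this example; it is not the course's material, and a production workflow would more likely rely on a dedicated library.

```python
import random


def split_by_label(rows, labels):
    """Separate rows into fraud (label 1) and non-fraud (label 0)."""
    fraud = [r for r, y in zip(rows, labels) if y == 1]
    legit = [r for r, y in zip(rows, labels) if y == 0]
    return fraud, legit


def undersample(rows, labels, ratio=1.0, seed=0):
    """Keep every fraud row and a random subset of the non-fraud rows.

    `ratio` (hypothetical parameter) is the number of non-fraud rows
    kept per fraud row.
    """
    rng = random.Random(seed)
    fraud, legit = split_by_label(rows, labels)
    n_keep = min(len(legit), int(len(fraud) * ratio))
    kept = rng.sample(legit, n_keep)
    return fraud + kept, [1] * len(fraud) + [0] * n_keep


def oversample(rows, labels, seed=0):
    """Duplicate fraud rows at random until both classes are the same size."""
    rng = random.Random(seed)
    fraud, legit = split_by_label(rows, labels)
    extra = [rng.choice(fraud) for _ in range(max(0, len(legit) - len(fraud)))]
    new_rows = legit + fraud + extra
    new_labels = [0] * len(legit) + [1] * (len(fraud) + len(extra))
    return new_rows, new_labels


def smote_like(fraud_rows, n_new, k=2, seed=0):
    """Create synthetic fraud rows between a fraud row and a nearby fraud row.

    Simplified version of the SMOTE idea: pick a fraud row, pick one of its
    k nearest fraud neighbors, and interpolate a random fraction of the way.
    """
    if len(fraud_rows) < 2:
        raise ValueError("need at least two fraud rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(fraud_rows)
        others = [r for r in fraud_rows if r is not base]
        # Sort the other fraud rows by squared Euclidean distance to base.
        others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # fraction of the way from base toward neighbor
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic
```

Undersampling discards information from the majority class, while oversampling and synthetic sampling risk overfitting to the few fraud cases available, so whichever is used should be applied to the training data only.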

3. Supervised Fraud Models

Classification Scoring – This section reviews the concepts of classification models and how they are used to rank and score observations for fraud.
Logistic Regression – This section reviews logistic regression, a more statistically based and interpretable model for fraud detection.
Tree-Based Algorithms – This section covers tree-based models. It starts with decision trees and their generalization to random forests. We then introduce gradient boosting approaches without going into too much mathematical detail.
Naive Bayes Model – The naive Bayes model is a great model to use for fraud detection. This section introduces the main ideas and uses for the naive Bayes model.
Supervised NOT-Fraud Model and Model Evaluation – The previous techniques all focus on previous instances of fraud and detecting those again. Here we talk about the important process of identifying new instances of fraud we haven’t seen before using the NOT-Fraud model and combining it with the fraud model. We also discuss how to properly evaluate your fraud models.
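
To make the ideas of ranking scored observations and evaluating a fraud model concrete, here is a minimal sketch of precision and recall among the top-k highest-scored cases. The outline does not say which evaluation metrics the course uses, so this choice is an assumption, as are the function name and the example inputs.

```python
def precision_recall_at_k(scores, labels, k):
    """Rank observations by score and evaluate the k highest-scored ones.

    scores: model scores, higher meaning more likely fraud.
    labels: 1 for confirmed fraud, 0 otherwise.
    Returns (precision, recall) for the top-k ranked observations.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    hits = sum(y for _, y in top)       # fraud cases caught in the top k
    total_fraud = sum(labels)           # all fraud cases in the data
    precision = hits / len(top) if top else 0.0
    recall = hits / total_fraud if total_fraud else 0.0
    return precision, recall


# Hypothetical scores and labels for five observations.
example_scores = [0.9, 0.8, 0.3, 0.2, 0.1]
example_labels = [1, 0, 1, 0, 0]
precision, recall = precision_recall_at_k(example_scores, example_labels, k=2)
```

Evaluating only the top of the ranking reflects the fact that investigators can usually review a limited number of cases, so a model is judged by how much fraud it places in that reviewed set.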

4. Clustering and Implementation

Clustering of Scored Observations – Once you have an idea about which observations don’t look like either previous instances of fraud or not fraud, how do you isolate and investigate these? We need to use clustering to help isolate groups of observations that might identify new types of fraud.
Interpretability – The people who investigate cases of fraud are typically not the data scientists who build the models. We therefore need to make sure our models are interpretable and easy for investigators to use, through scorecards or Local Interpretable Model-agnostic Explanations (LIME).
Long-term Fraud Strategy – To wrap up our discussion of fraud, we talk about how to keep integrating these pieces into the broader fraud framework at a company. We also talk about how to evaluate an entire fraud system, not just the models themselves, after the models have been introduced.
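
The clustering step described above can be sketched as follows. The outline does not name a clustering algorithm, so plain k-means is used here purely as an illustration. Representing each observation as a pair of fraud and NOT-fraud scores, and supplying the starting centers by hand, are both assumptions of this example.

```python
import math


def kmeans(points, centers, n_iter=10):
    """Plain k-means: assign points to the nearest center, then move centers.

    points:  list of equal-length numeric tuples.
    centers: initial cluster centers (hypothetical starting guesses).
    Returns (assignments, centers) after convergence or n_iter rounds.
    """
    centers = [tuple(c) for c in centers]
    assignments = []
    for _ in range(n_iter):
        # Assignment step: index of the closest center for each point.
        assignments = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(old)  # leave an empty cluster's center in place
        if new_centers == centers:
            break  # centers stopped moving, so assignments are stable
        centers = new_centers
    return assignments, centers


# Hypothetical (fraud score, NOT-fraud score) pairs for unexplained observations.
unexplained = [(0.10, 0.10), (0.12, 0.08), (0.40, 0.45), (0.42, 0.40)]
groups, final_centers = kmeans(unexplained, centers=[(0.0, 0.0), (0.5, 0.5)])
```

Each resulting group can then be handed to investigators as a candidate pattern to review, rather than as a list of unrelated individual cases.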

Open Data Science

Ai+ | ODSC
One Broadway, 14th Floor
Cambridge, MA 02142
