Brian McCarthy
Michael Chui
Michael Chui is a partner of the McKinsey Global Institute (MGI) and is based in McKinsey’s San Francisco office; Vishnu Kamalnath is an associate partner in the Boston office; and Brian McCarthy is a partner in the Atlanta office. The authors would like to thank Eric Bolme, Rafael Fernandes, Akshay Penmatcha, Abhay Raj, and Akshar Vashist for their contributions to this content.
Michael Chui is a partner of the McKinsey Global Institute (MGI) and is based in McKinsey’s San Francisco office; Vishnu Kamalnath is an expert in McKinsey’s North America Knowledge Center; and Brian McCarthy is a partner in the Atlanta office. The authors would like to thank Rafael Fernandes for his contribution to this content.
About the authors
Deep learning
Machine learning
Major models
An executive’s guide to AI
Staying ahead in the accelerating artificial-intelligence race requires executives to make nimble, informed decisions about where and how to employ AI in their business. One way to prepare to act quickly: know the AI essentials presented in this guide.
Major types
Timeline: Why AI now?
Artificial intelligence
Confused about artificial intelligence? We’re here to help.
Model explainability
Updated in 2020
Machine learning provides predictions and prescriptions
Types of analytics (in order of increasing complexity)
Descriptive: Describe what happened. Employed heavily across all industries.
Predictive (focus of machine learning): Anticipate what will happen (inherently probabilistic). Employed in data-driven organizations as a key source of insight.
Prescriptive (focus of machine learning): Provide recommendations on what to do to achieve goals. Employed heavily by leading data and Internet companies.
Most recent advances in AI have been achieved by applying machine learning to very large data sets. Machine-learning algorithms detect patterns and learn how to make predictions and recommendations by processing data and experiences, rather than by receiving explicit programming instruction. The algorithms also adapt in response to new data and experiences to improve efficacy over time.
We’ve listed some of the most commonly used algorithms today; this list is not intended to be exhaustive. Additionally, a number of different models can often solve the same business problem. Conversely, the nature of an available data set often precludes using a model typically employed to solve a particular problem. For these reasons, the sample business use cases below are meant only to be illustrative of the types of problems these models can solve.
Predict whether registered users will be willing or not to pay a particular price for a product
Predict the probability that a patient joins a healthcare program
Business use cases
Model in which artificial neurons (software-based calculators) make up an input layer, one or more hidden layers where calculations take place, and an output layer. It can be used to classify data or find the relationship between variables in regression problems.
Simple neural network
Predict the price of cars based on their characteristics (eg, age and mileage)
Forecast product demand and inventory levels
Classification or regression technique that generates decision trees sequentially, where each tree focuses on correcting the errors coming from the previous tree model. The final output is a combination of the results from all trees
Gradient-boosting trees
Simple, low-cost way to classify images (eg, recognize land usage from satellite images for climate-change models). Achieves lower accuracy than deep learning
Detect fraudulent activity in credit-card transactions. Achieves lower accuracy than deep learning
Classification or regression technique that uses a multitude of models to come up with a decision but weighs them based on their accuracy in predicting the outcome
AdaBoost
Predict power usage in an electrical-distribution grid
Predict call volume in call centers for staffing decisions
Classification or regression model that improves the accuracy of a simple decision tree by generating multiple decision trees and combining their predictions: by majority vote for classification, where the output is a discrete variable (eg, either black, white, or red), or by averaging for regression, where the output is a continuous variable (eg, age)
Random forest
Predict how likely someone is to click on an online ad
Use customer data to predict how much people are willing to spend on housing
Predict how many patients a hospital will need to serve in a time period
A technique that’s typically used for classification but can be transformed to perform regression. It draws a division between classes that’s as wide as possible. It also can be generalized to solve nonlinear problems.
Support vector machine
Create classifiers to filter spam emails
Analyze sentiment to assess product perception in the market
Classification technique that applies Bayes’ theorem, which allows the probability of an event to be calculated based on knowledge of factors that might affect that event (eg, if an email contains the word “money,” then the probability of it being spam is high)
Naive Bayes
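The spam example can be worked through numerically with Bayes’ theorem. The sketch below is illustrative only: the function name and all three input probabilities are made-up assumptions, not figures from this guide.

```python
def posterior_spam(p_spam, p_word_given_spam, p_word_given_ham):
    """Return P(spam | word) using Bayes' theorem."""
    p_ham = 1.0 - p_spam
    # Total probability of seeing the word in any email, spam or not.
    p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    return p_word_given_spam * p_spam / p_word

# Hypothetical inputs: 20% of emails are spam; "money" appears in
# 40% of spam emails and in 2% of legitimate emails.
p = posterior_spam(p_spam=0.2, p_word_given_spam=0.4, p_word_given_ham=0.02)
print(round(p, 3))  # 0.833
```

A full naive Bayes classifier repeats this calculation across many words, treating each word as independent of the others (the “naive” assumption).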
Provide a decision framework for hiring new employees
Understand product attributes that make a product most likely to be purchased
Mostly used when an interpretable model is needed.
Highly interpretable classification or regression model that splits data-feature values into branches at decision nodes (eg, if a feature is a color, each possible color becomes a new branch) until a final decision output is made
Decision tree
Predict a sales lead’s likelihood of closing
Predict client churn
Upgrades a logistic regression to deal with nonlinear problems: those in which changes to the value of input variables do not result in proportional changes to the output variables
Linear/quadratic discriminant analysis
Predict if a skin lesion is benign or malignant based on its characteristics (size, shape, color, etc)
Classify customers based on how likely they are to repay a loan
A model with some similarities to linear regression that’s used for classification tasks, meaning the output variable is binary (eg, only black or white) rather than continuous (eg, an infinite list of potential colors)
Logistic regression
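How a logistic regression arrives at a binary output can be sketched in a few lines. The coefficients and the two input features here are hand-picked, hypothetical values chosen for illustration; in practice they would be learned from labeled data.

```python
import math

def predict_probability(features, weights, bias):
    """Squash a weighted sum of the inputs into a probability between 0 and 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def classify(features, weights, bias, threshold=0.5):
    """Turn the probability into a binary output (1 = repays, 0 = defaults)."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0

# Hypothetical coefficients for [income in $10k, number of missed payments].
weights, bias = [0.3, -1.2], -1.0
good_risk = classify([8.0, 0.0], weights, bias)  # high income, no missed payments
poor_risk = classify([3.0, 2.0], weights, bias)  # lower income, two missed payments
```

The threshold of 0.5 is a common default, not a requirement; a lender might raise it to approve only applicants with a higher predicted probability of repayment.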
Optimize price points and estimate product-price elasticities
Understand product-sales drivers such as competition prices, distribution, advertisement, etc
Highly interpretable, standard method for modeling the past relationship between independent input variables and dependent output variables (which can have an infinite number of values) to help predict future values of the output variables
Linear regression
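As a minimal sketch of the idea, the following fits a straight line to past data with ordinary least squares and uses it to predict a new value. The price and sales figures are invented for illustration, and the function name is an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up history: price charged and units sold at that price.
prices = [10.0, 12.0, 14.0, 16.0]
units = [50.0, 40.0, 30.0, 20.0]
intercept, slope = fit_line(prices, units)
forecast = intercept + slope * 13.0  # predicted units sold at a new price
```

The fitted slope is what makes the model interpretable: here it says each one-unit price increase is associated with five fewer units sold.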
How it works
1. A human labels the input data (eg, in the case of predicting housing prices, labels the input data as “time of year,” “interest rates,” etc) and defines the output variable (eg, housing prices)
2. The algorithm is trained on the data to find the connection between the input variables and the output
3. Once training is complete (typically when the algorithm is sufficiently accurate), the algorithm is applied to new data
When to use it
You know how to classify the input data and the type of behavior you want to predict, but you need the algorithm to calculate it for you on new data
What it is
An algorithm uses training data and feedback from humans to learn the relationship of given inputs to a given output (eg, how the inputs “time of year” and “interest rates” predict housing prices)
Reinforcement learning
Unsupervised learning
Machine Learning - Major Types
Supervised learning
The regions highlighted in dark blue on the right-hand image are the pixels contributing most to this image being classified as a rabbit.
Example
Heat maps, primarily used for image-classification models, in which the degree of heat in a region on the image corresponds to the level of impact that region has on predicting what the image shows.
Saliency Maps
The Y axis shows the relative importance of each feature to income level, while the X axis shows the correlation to income level (eg, age has a high impact on predicting high income and high age correlates with high income).
SHAP uses a game theory approach to score each feature based on its contribution to the output after considering the feature’s interaction with all other features. Unlike LIME, SHAP offers a global explanation of features by considering all possible feature interactions as well as individual observation-level explanations.
SHapley Additive exPlanations (SHAP)
The probability of a high income is driven by marital status, age, years of work experience, and hours worked per week, while the probability of income being low is driven by occupation and education.
LIME studies how variations of input data affect the output, enabling it to list features affecting the output and score their level of impact.
Local interpretable model-agnostic explanations (LIME)
Plots that trace how the output changes as a variable is changed.
Partial dependence plots (PDP)
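The calculation behind a partial dependence plot can be sketched as follows. One common approach, assumed here, is to set the feature of interest to each value on a grid for every row in the data set and average the model’s predictions. The `toy_model` function is a hypothetical stand-in for a trained model, and the data rows are invented.

```python
def partial_dependence(model, rows, feature, grid):
    """For each grid value, overwrite `feature` in every row and
    average the model's predictions."""
    curve = []
    for value in grid:
        predictions = [model({**row, feature: value}) for row in rows]
        curve.append((value, sum(predictions) / len(predictions)))
    return curve

# Hypothetical stand-in for a trained model: a score that rises with age
# up to 50 and with hours worked per week.
def toy_model(row):
    return 0.01 * min(row["age"], 50) + 0.005 * row["hours"]

rows = [{"age": 30, "hours": 40}, {"age": 45, "hours": 20}]
curve = partial_dependence(toy_model, rows, "age", grid=[20, 35, 50, 65])
```

Plotting the pairs in `curve` gives the trace: in this toy case the average score climbs with age until 50 and then flattens.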
Once the model is built, these techniques look at all the data features to determine how important each one was in determining the model’s output. Each method (eg, LIME, SHAP) has its own special way of looking at the model and gleaning its underlying methodology.
In situations where a complex machine-learning model is used and decision makers need to understand the reason behind the model’s output or recommendations (eg, to explain to a customer why a loan application was rejected).
XAI, short for “explainable artificial intelligence,” refers to a set of techniques that help show how a machine-learning algorithm comes up with a set of outputs (eg, predictions). It allows humans to better understand (and trust) outputs from complex, “black-box” models and helps decision makers know the relevant features (data characteristics) driving the output.
Techniques
XAI
Generally, as age increases, the probability of income being high increases, with a steep increase between ages 35 to 50 (with all other features at their average values).
We’ve listed some of the most commonly used techniques today; this list is not intended to be exhaustive.
Group similar customers together and recommend next best product for them to buy
Detect patterns in spread of a pandemic
An approach to representing data with high dimensionality in a two-dimensional chart to more easily visualize interesting patterns in the data
Manifold learning
Recommend news articles a reader might want to read based on the article she or he is reading
Recommend what movies consumers should view based on preferences of other customers with similar attributes
Often uses cluster behavior prediction to identify the important data necessary for making a recommendation
Recommender system
Inform product usage/development by grouping customers mentioning keywords in social-media data
Cluster loyalty-card customers into progressively more microsegmented groups
Splits or aggregates clusters along a hierarchical tree to form a classification system
Hierarchical clustering
Segment employees based on likelihood of attrition
Segment customers to better assign marketing campaigns using less-distinct customer characteristics (eg, product preferences)
A generalization of k-means clustering that provides more flexibility in the size and shape of groups (clusters)
Gaussian mixture model
Segment employees based on likelihood of attrition
Segment customers into groups by distinct characteristics (eg, age group), for instance, to better assign marketing campaigns or prevent churn
Puts data into a number of groups (k) that each contain data with similar characteristics (as determined by the model, not in advance by humans)
K-means clustering
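A minimal sketch of how k-means forms its groups, using one-dimensional data for simplicity. It alternates between assigning each value to its nearest group center and moving each center to the mean of its group. The ages and the starting centers are assumptions chosen for illustration; real implementations usually pick starting centers automatically.

```python
def k_means_1d(values, centers, iterations=10):
    """Cluster one-dimensional data by repeated assign-and-update steps."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical customer ages, grouped into k = 3 segments.
ages = [22, 25, 27, 41, 44, 46, 63, 65, 70]
centers, clusters = k_means_1d(ages, centers=[20, 40, 60])
```

No one tells the algorithm where the age bands should fall; the three segments emerge from the data.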
Detect fake reviews and opinions on social media
Reduce noise in a medical image (eg, MRI) to analyze it more accurately
A type of neural network that can be used to represent data efficiently by removing unnecessary information. The representation can be thought of as a compressed version of the original data.
Autoencoder
1. The algorithm receives unlabeled data (eg, a set of data describing customer journeys on a website)
2. It infers a structure from the data
3. The algorithm identifies groups of data that exhibit similar behavior (eg, forms clusters of customers that exhibit similar buying behaviors)
You do not know how to classify the data, and you want the algorithm to find patterns and classify the data for you
An algorithm explores input data without being given an explicit output variable (eg, explores customer demographic data to identify patterns)
Predict brain-tumor progression
The sample business use cases below are meant only to be illustrative of the types of problems these models can solve.
Optimize the driving behavior of self-driving cars
Optimize pricing in real time for an online auction of a product with limited supply
Stock and pick inventory using robots
Balance the load of electricity grids in varying demand cycles
Optimize the trading strategy for an options-trading portfolio
1. The algorithm takes an action on the environment (eg, makes a trade in a financial portfolio)
2. It receives a reward if the action brings the machine a step closer to maximizing the total rewards available (eg, the highest total return on the portfolio)
3. The algorithm optimizes for the best series of actions by correcting itself over time
Diagram labels: Algorithm, Action, Environment, State, Reward
You don’t have a lot of training data; you cannot clearly define the ideal end state; or the only way to learn about the environment is to interact with it
An algorithm learns to perform a task simply by trying to maximize rewards it receives for its actions (eg, maximizes points it receives for increasing returns of an investment portfolio)
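The act, receive a reward, and self-correct loop can be sketched with a simple strategy known as epsilon-greedy, which is one of many possible approaches and is an assumption here rather than a method named in this guide. The environment is hypothetical and deliberately simplified: each action always pays the same fixed reward and there is no changing state.

```python
import random

def learn_best_action(rewards, steps=200, epsilon=0.1, seed=0):
    """Learn by trial and error which action earns the highest average reward."""
    rng = random.Random(seed)
    actions = list(rewards)
    totals = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}

    def average(action):
        return totals[action] / counts[action]

    for step in range(steps):
        if step < len(actions):
            action = actions[step]              # try every action once
        elif rng.random() < epsilon:
            action = rng.choice(actions)        # occasionally explore
        else:
            action = max(actions, key=average)  # otherwise use the best so far
        reward = rewards[action]                # the environment returns a reward
        totals[action] += reward
        counts[action] += 1
    return max(actions, key=average), counts

# Hypothetical environment: made-up fixed rewards for three trading actions.
best, counts = learn_best_action({"buy": 1.0, "hold": 0.2, "sell": -0.5})
```

The algorithm is never told which action is best. It discovers that from the rewards alone and then chooses that action most of the time.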
Deep learning can often outperform traditional methods
% reduction in error rate achieved by deep learning vs traditional methods
Image classification: 41%
Voice recognition: 25%
Facial recognition: 27%
Deep learning is a type of machine learning that can process a wider range of data resources, requires less data preprocessing by humans, and can often produce more accurate results than traditional machine-learning approaches (although it requires a larger amount of data to do so). In deep learning, interconnected layers of software-based calculators known as “neurons” form a neural network. The network can ingest vast amounts of input data and process them through multiple layers that learn increasingly complex features of the data at each layer. The network can then make a determination about the data, learn if its determination is correct, and use what it has learned to make determinations about new data. For example, once it learns what an object looks like, it can recognize the object in a new image.
Detect defective products on a production line through images
Understand customer brand perception and usage through images
Detect a company logo in social media to better understand joint marketing opportunities (eg, pairing of brands in one product)
Diagnose diseases from medical scans
1. The convolutional neural network (CNN) receives an image (for example, of the letter “A”) that it processes as a collection of pixels
2. In the hidden layers, it identifies unique features (for example, the individual lines that make up “A”)
3. The CNN can now classify a different image as the letter “A” if it finds in it the unique features previously identified as making up the letter
Image classification: Human
Feature extraction and mapping
Input
When you have an unstructured data set (eg, images) and you need to infer information from it
A multilayered neural network with a special architecture designed to extract increasingly complex features of the data at each layer to determine the output
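The feature-extraction step can be sketched as a small filter sliding across the pixels of an image. The filter values below are hand-written, hypothetical numbers chosen to respond to vertical lines; a real CNN learns its filter values during training and stacks many such filters across layers.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and return
    the feature map of weighted sums."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]

# A 4x4 "image" containing one vertical stroke in column 1.
image = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
# Hypothetical 3x3 filter that scores highly on vertical lines.
vertical_line = [
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1],
]
feature_map = convolve2d(image, vertical_line)
print(feature_map)  # [[6, -3], [6, -3]]
```

The high values in the first column of the feature map mark where the filter is centered on the stroke; the negative values mark where it is not.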
Recurrent neural network
Convolutional neural network
Transformer
Generative adversarial network (GAN)
Power chatbots that can address more nuanced customer needs and inquiries
Generate captions for images
Assess the likelihood that a credit-card transaction is fraudulent
Track visual changes to an area after a disaster to assess potential damage claims (in conjunction with CNNs)
Provide language translation
Generate analyst reports for securities traders
Inputs: start-of-sentence command “<\s>,” “Are,” “you,” “free”
1. A recurrent neural network (RNN) neuron receives a command that indicates the start of a sentence
2. The neuron receives the word “Are” and then outputs a vector of numbers that feeds back into the neuron to help it “remember” that it received “Are” (and that it received it first). The same process occurs when it receives “you” and “free,” with the state of the neuron updating upon receiving each word
3. After receiving “free,” the neuron assigns a probability to every word in the English vocabulary that could complete the sentence. If trained well, the RNN will assign the word “tomorrow” one of the highest probabilities and will choose it to complete the sentence
Output: probability distribution of possible last word, with “tomorrow” assigned the highest probability
Predicting the next word in the sentence “Are you free ______?”
Other neural-network architectures assume all inputs are independent from one another. But this assumption doesn’t work well for some tasks. Take, for example, the task of predicting the next word in a sentence: it’s easier to predict the next word if several words that came before are known
Output layer
Context nodes
Hidden layer
Input layer
When you are working with time-series data or sequences (eg, audio recordings or text)
A multilayered neural network that can store information in context nodes, allowing it to learn data sequences and output a number or another sequence
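How stored context lets a recurrent network “remember” a sequence can be sketched with a single recurrent unit. The two weights and the numeric codes standing in for words are hypothetical; a real RNN learns its weights and works with vectors rather than single numbers.

```python
import math

def run_rnn(inputs, w_input=0.5, w_state=0.8):
    """Feed a sequence through one recurrent unit and return the final state."""
    state = 0.0  # the "memory" carried forward between inputs
    for x in inputs:
        # The new state mixes the current input with the previous state.
        state = math.tanh(w_input * x + w_state * state)
    return state

# Hypothetical numeric codes for three words, fed in two different orders.
forward = run_rnn([1.0, 2.0, 3.0])
backward = run_rnn([3.0, 2.0, 1.0])
```

The same three inputs produce different final states depending on their order, which is what allows the network to treat “Are you free” differently from “free you Are.”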
DEEP LEARNING - MODELS & BUSINESS USE CASES
Multiple attention heads analyze the elements of the question and sentence at the same time. They then output a vector of numbers that help the network learn which parts of the context and question are related.
Though transformers have been most broadly applied to language-processing tasks to date, their ability to learn deep dependencies in the data makes them effective for other types of sequential data, such as time series and videos
A neural network that uses special mechanisms called “attention heads” to help it understand what each word means when used in a particular context
An attention head learns that the question is asking about food.
An attention head learns that the sentiment toward the food in the input text was likely positive.
The transformer calculates probabilities for possible answers to the question. In this case, the highest probability is for “yes.”
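How an attention head decides which words relate to a question can be sketched with scaled dot-product attention, the calculation commonly used inside transformers. Everything below is illustrative: the three words, their two-dimensional vectors, and the query vector are hypothetical, whereas a real model learns these values and uses many more dimensions.

```python
import math

def softmax(scores):
    """Convert raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output

# Hypothetical vectors for three words in a sentence.
words = ["the", "pasta", "delicious"]
keys = [[0.1, 0.0], [1.0, 0.2], [0.3, 1.0]]
values = keys
food_query = [2.0, 0.0]  # stands in for a question about food
weights, output = attend(food_query, keys, values)
most_relevant = words[weights.index(max(weights))]
```

The word whose vector best matches the query receives the largest weight, so its information dominates the vector of numbers the head passes on.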
Develop more realistic chatbots
Parse text (blogs, reviews, tweets, etc.) to better understand customer sentiment
The generator works to produce a synthetic version of the underlying data (eg, replicate an image of a person) that is good enough to “fool” the discriminator. The discriminator works to distinguish between genuine and synthetic data. At the start, the generator creates random patterns that the discriminator can easily distinguish from “real” data, but it produces more representative data with each attempt. Over time, the generator will get very good at generating synthetic data that can “fool” the discriminator.
When insufficient amounts of data are available to train an algorithmic model, GANs can be used to create new, synthetic data that is representative of actual data. They can also identify new potential cyberattack vectors or fraudulent credit-card transactions. Because the generator is constantly trying to make new representations of the real data with slight variations, it could generate new types of possible attacks or fraudulent transactions that weren’t previously known.
A combination of two networks, a generator and a discriminator, that compete against each other to perform a task, which eventually results in better performance of the required task.
Generate synthetic images or other data that has limited availability
Simulate cyber attacks on IT systems
Generate potential ideas for fashion design
Generate images or audio from text
Simulate a product launch in a new geography
AI is typically defined as the ability of a machine to perform cognitive functions we associate with human minds, such as perceiving, reasoning, learning, interacting with the environment, problem solving, and even exercising creativity. Examples of technologies that enable AI to solve business problems are robotics and autonomous vehicles, computer vision, language, virtual agents, and machine learning.