MBA Alumni Lifelong Learning Series, “Business Analytics, Leveraging the Power of Data”

Female Speaker: Let’s go ahead
and talk about what we’re going to do tonight. We’re very fortunate
to have someone that I don’t really
even need to introduce. Professor Saby Mitra is senior
associate dean of programs here at Scheller and a professor
of information technology. And I think most of you
all know him. He has been here
for 20 plus years. His areas of expertise include
IT security, ecommerce, IT governance, IT infrastructures, design,
digital marketing—Wow!— some other ones
and business analytics which is what we’re going to be
talking about tonight in his presentation
on Leveraging the Power of Data. Professor Mitra was faculty
director of the executive MBA program here
from 2007 to 2013. In his current role, he oversees
all of our MBA programs. Because of this, I’ve had the opportunity
to work very closely with him. And so I can tell you
three things that I know
personally about him. First, all of his students
love him and love his teaching. So they’re always like,
“If everybody could do it like Saby does, it would be great.” The second
is he is a wise leader. He’s helped us
make good decisions. And the third is that
he really cares about Scheller and especially
about the MBA programs. So, I’ll let him tell you
a little bit more about himself, but please welcome
[inaudible]. Saby Mitra: Thank you. [applause] Thank you, Cynthia.
Is this too loud? Perfect? OK. It’s really great to see
so many familiar faces after such a long time; delighted that you guys
are here. You know and I am fully aware
that I am standing in between you and alcohol
which you saw outside— [laughter] —so I will try and keep
this entertaining and move along
as quickly as possible. Also, one of the things
that we do, especially in the full-time
and the evening MBA, I know there are some folks here
from the evening MBA; saw a few also from the full-time. And Brian, who is standing
at the back there, our associate dean started
this tradition— we do
this one-minute introduction, so every student
needs to have a slide and they have exactly
one minute. And Julia, who is standing
somewhere behind there, times. And then you have to
introduce yourself in one minute. You know,
it’s a great tradition. So I’m going to start with
that one-minute introduction that I use for the full-time
and the evening MBA, and here it is. So I’m Saby Mitra. I’m the senior associate
dean of programs at the Scheller College. The number 26
that you see up there, young as I may look,
that’s not my age; that’s the number of years
that I’ve been at Georgia Tech. I, as Cynthia mentioned,
I love Georgia Tech. I love the MBA programs. That’s a picture of me
and my wife at our daughter’s graduation. I’m really happy
she’s off our payroll now. [laughter] On the right are things
that really excite me— My research
using data and analytics especially in the area
of information security and electronic commerce. I’ve had the opportunity
to work with several companies
in that area, and we have the data expertise
but we don’t have the data; and the companies
have the data, and they can sometimes
benefit from our expertise. So it’s a real great match.
I’m a history buff. I love to travel.
I’m into road biking. I’m not great at it,
but I love to do it. And I’m also a pretty good cook; that’s a picture of us
at our Thanksgiving dinner, although I’ve become vegetarian
in the last few years and I’m actually dreading
traveling to China. For those of you
who do travel there, you know it is not the most
vegetarian-friendly place. And I’ve been
threatening my wife that I’m going to cook
a tofurky for Thanksgiving; she’s not too happy
about that, either. So that’s my one-minute
introduction. Here’s what we’re going
to talk about today. So I’ll start off
with—the session is on leveraging
the power of data, and we do want to keep it really interactive—I have
a set of slides; we don’t want to get
stuck on the slides. So we first focus on Can you really get
competitive advantage from data? Can analytics give you
competitive advantage? And so that’s the first question
we’ll try and answer. And then the second is,
you know, you’re here
for a session like this, you do want to see
some of the methods and some of the cool tools
and techniques that are available
that companies are using today, so I’ll try and give you
a broad overview of that and look at some of
the applications. And then we’ll sort of
finish off with a discussion on how do you implement
the insights that you get from analytics. And then, once again, you know, let’s keep this
really interactive so that we can have
more of a dialogue than just me talking here. As I said, I’m fully aware
that the real purpose of this is for you guys to come
and network and have some fun, and, you know,
I’ll try to keep it as entertaining as possible
given the topic. Most of you are familiar with
the Gartner Hype Curve, right? I mean there’s the Peak of
Inflated Expectations, Trough of Disillusionment, and then there is
more of a steady increase in the use of the technology and the benefit that businesses
get out of the technology. So here’s my question to you: Where do you think we are in
the Analytics Hype Cycle today? What do you think?
Are we at the peak? OK. Student: [inaudible] Saby Mitra: Sort of here,
on the way to the trough? OK.
Any other thoughts? Student: [inaudible] Saby Mitra: Actual use
and actual value. You know here’s kind of—and this
is just a survey. So this appeared in 2017
in Sloan Management Review. It was done actually by one of our Ph.D.
students, Sam Ransbotham. He does a lot of these surveys
and use cases. And you can see that
the percentage of organizations reporting that analytics creates
a competitive advantage, you know, sort of reached
a peak somewhere in 2012, 2013. There was a dip, and there seems
to be an uptick again, kind of related to what Brandon,
you mentioned, that the companies are starting
to see value from analytics. So what do you think given this
are we—is this the trough or do we expect
another trough going forward? Student: [inaudible] Saby Mitra: Right.
That’s a great point. I do feel that there is
more data available; the tools are getting better; the technology
is getting better. And so that’s not
where the problem is. You know, perhaps
there is another trough coming for a different reason which is that the production
of analytics has been great, you know, all the tools
are getting better; we have more data scientists
who are trained. But the use of analytics, the consumption of analytics
to make business decisions, I’m not so sure that it has
really percolated the mass, the companies, so that
they can actually get value from the insights that analytics
is able to generate. So if there is a trough
coming—and I don’t know— I am not great
at predicting the future— but if there is a trough coming, then it’s probably more because
of the consumption of analytics rather than the production
of analytics. So you’re right that
the tools are getting better, but is management
and are regular employees able to consume
the insights that analytics
is able to generate? If there is a downturn
in this curve, it’s probably
coming from that area. Student: It just occurred to me
that all the companies that are analyzing this data
to tell us about points of data are compiled by companies
that work in big data, so— Saby Mitra: Self-fulfilling? Student: [continues] —I believe
there probably is a skew— I don’t know one way or another—
but there’s probably some bias, just a
[inaudible] because the people
working on these reports here are experts in data
so they, perhaps, see advantages at the different time scales
than everybody else. Saby Mitra: Right, no,
absolutely. That’s where the consumption
of analytics comes in, right? I mean so the production
of analytics is great; we have better technology,
better algorithms, all of that is great;
better data scientists. But is the rest
of the organization able to consume
the insights that are generated and make actual business change
to get the value from it? And that’s where
the doubt comes in. Yes— Student: [inaudible] Saby Mitra: And so
the organizations are not really structured
right to get advantage. One of the things we will talk
about towards the end is implementation, you know, how do you
actually implement the insights. And that’s a really,
really important point. All right. So let’s start
with that first question: Can data really be a source
of competitive advantage? What do you think? Student: [inaudible] Saby Mitra: Right. But they’re moving more
to the content side now, right? And if they move
to the content side, will data be able
to really give them that same level of advantage? Because once they become
a content generator, they’re more like a, you
know, entertainment company, and will data be able
to drive the shows they produce? Is that really going
to be valuable? That’s a question
that needs to be answered. Yes— Student: [inaudible] Saby Mitra: Sure. But, you know, real creativity
might not come from the data, and it might come from
just the script writers and the actors and all that. So it’s a little debatable
whether they’ll be able to translate that advantage
from data into actual advantage when they become
more of a content company. But it’s a good idea, to me, to answer this question
looking at it from a little bit
of a theoretical lens. And you guys will remember
from your strategy class this idea of a resource-based
view of the firm. Do you guys remember this
from your strategy class at all? And the basic theory
is that you can’t get— you can only get
competitive advantage from a resource—and a resource
may be a physical resource or intellectual resource; it may be a capability
that you own, that the company owns
that’s valuable; that’s rare—
not everyone has it— that’s inimitable
and nontransferable. So you need to have something
which others don’t have. That’s the only way you can
get competitive advantage according to the theory. So if you apply this
to the data side, I think there are really
three questions. The first is Do I have valuable
data that others do not have? And that’s really driven
by your business model. You know, Netflix or Facebook
generates a lot of data because of their business model. Because Facebook is based
on user-generated data, so they generate data
as part of their business model. Amazon generates a lot of data
because it’s online versus, let’s say, a Walmart
which may not be online. So what data you have
which others don’t have is really driven
by your business model. So that’s one part. The second is How well do I use that data
to generate actionable insights? And that is driven
by the analytics functions, all the data scientists
and all the people you have in your organization that can leverage that data
to generate actionable insights. So that’s the second part. And then the third part
is implementation which is How well do I translate
the insights into actual actions
that have a business impact? And that’s driven
by the whole organization. So where do you think
the big gap is? Where do—where is
the big opportunity, the big gap in analytics? Student: [inaudible] Saby Mitra: OK, in general,
for most companies, where should
they be putting their effort? Student: Opportunities common
to the whole organization because as consultants, they
don’t have to know how to expand [inaudible]. Saby Mitra: Right. So let me phrase the question
a little bit differently. Which is the hardest here?
Which is the hardest? Yes— Students: [inaudible] Saby Mitra: OK.
So that’s certainly really hard because it requires
the whole organization; requires a mindset change
in the whole organization. Anything else? Student: I think the data
[unclear] a lot of people
apply the data already put some
sort of filter on it, and they’re
selling it to you. Saby Mitra: So, that’s
an important point because this is generated by your—it’s driven
by your business model. Some business models
generate data; some other business models don’t
generate that amount of data. And you always hit privacy
limits out here because of what
you can use the data for. You know, if you— Student: [inaudible] Saby Mitra: Say that again. Student: [inaudible] Saby Mitra: I wouldn’t. I [unclear]
the analytics function. Student: Generate insights
like this actual [inaudible]. Saby Mitra: Right. Right.
Absolutely. So this is something
that you can work on. I do not want to,
you know, to minimize
the second bucket there, but that’s something
you can work on. You know, we have
lots of data scientists that lot of
companies—every university is coming up with
a data science program. You look at us, we have maybe,
I think, 3000 or 2000 people in our online M.S.
analytics program and quite a large number
of students in our physical M.S.
analytics programs, and every university
is doing that. So I think we are
making headway out here. This becomes hard
because of privacy limitations. Because you have to change
your business model to collect more data, it’s hard
to move the needle out here. And this becomes hard because it’s the whole
organization that’s involved. So all of those three buckets
are important but probably,
in my view, and we can argue till the cows
come home about this, but probably
the biggest problem today is on the implementation
and, perhaps, related to the actionable
part of these insights, and also because
of privacy limitations in the use and collection
of data for companies. But if you believe
the resource-based view, you must have some data that others don’t have
to get advantage from it. You must be able
to analyze that data, and you must be able
to implement the insights. So we’ll follow this structure
in the presentation and let’s start first
with the data. And I want you guys to do
a quick thought exercise for me, and take three companies
all related. So if you’re on
this side of the room, I want you guys
to focus on Walmart. Walmart is the world’s,
as you know, the world’s largest retailer. They have about half
a trillion dollars in revenue. What type of data do you think
Walmart has on its consumers? So think about that if you’re
on this side of the room. If you’re in the middle here,
think about Amazon. You all know it’s the world’s
largest online retailer. They have about
half of Walmart’s revenue, but $232 billion;
they’re pretty large. What type of data do you think
Amazon has on its consumers? And if you’re on this side,
think about P&G. P&G is the largest FMCG company
in the U.S. They have about $65 billion
in revenue. What sort of data do you think
P&G or any FMCG company, perhaps Coke,
what type of data do you think they have
on their customers? Let’s take just
a couple of minutes. You know, wherever you’re
sitting just think about this. Talk amongst yourselves, and we’ll come back
and do a debrief. And think about—compare the data
that each of these companies have in terms of volume,
in terms of the variety of data, and the speed with which
they receive the data, and who has the data advantage
here. [students discussing] All right, let’s take
another two minutes. [students discussing] All right, guys. Let’s come back again.
All right. OK. I know you guys
are having a lot of fun but let’s— yeah, there you go.
Do it again. [loud whistle] There you go.
Perfect. OK, let’s start first
with Amazon. So what sort of data
do you think Amazon has about all of us? Student: Everything.
Saby Mitra: Everything. [laughter] They don’t quite have
everything but— Student: [inaudible] Saby Mitra: OK. All right, guys,
one sec. Yes— Student: [inaudible] …they also know a lot
of online transactions. They have [inaudible]. They also know household—how
many people do you have and what they are going to
[inaudible]. What they don’t have is
how is this translating to the offline transactions and therefore would be
really hard to figure out what is going on in-store so they can actually
bring those categories up. Saby Mitra: OK, very good.
Anyone else? Yes— Student: Just taking it from our
conversation a step further, they have data about things
I [unclear] purchase. Saby Mitra: Right. Exactly.
Very good, very good. Yes— Student: One thing that Amazon
has is on their cloud system, they run more companies
than anybody else, and they route all of the data
in all these companies. So if they know
where you’re browsing Amazon [inaudible], they have
your network data and see who else you’re seeing.
It’s all in their cloud system. Saby Mitra: Yeah. I mean, you know,
there is a little bit of security/privacy issue there in being able
to leverage the data that you’re hosting for others
on your cloud platform, right? Of course AWS is
the largest or maybe, you know, neck-and-neck now
with Microsoft. Microsoft has done a great
turnaround on the cloud space. Student: [inaudible] on what they said previously—
on the front end of that, they’re looking most people
go to Amazon and they’re doing
the price shopping there. That’s when the offline
transactions [inaudible] and trying to get lower
and do the comparable. What Amazon should be doing
with that is looking at that, looking at it as the front end
of the buying curve finding ways to narrow
that gap between front end
of the buying curve and then closing
on the sale. Using that data, they can
pull up the seasonality, get a [inaudible] ahead of it, and start
doing predictive analysis for the customer’s
next purchase. But then they’ll be able
to capture more of that and bring the pricing down. Saby Mitra: Great point.
Yes— Student: Amazon has been
traditionally playing with the price game, but now they are in that
position with the type of data and the budget and behavior
of the customer, they can start
jacking up the prices. They have already
started doing it. And they can actually
play on [unclear] which is where they are
sitting on right now. Like if I buy a single product
like a mouse, let’s say, and if that exact same mouse
is available at, say, Walmart, and Amazon
has a potential and a power to sell me as a Prime member, sell the exact same product
for $0.50 more and I would gladly pay $0.50
more because that saves— Saby Mitra: Of the convenience. Student: [inaudible] Saby Mitra: Let’s take one more,
yes, on the back— Student: Yeah, so we
[inaudible] how much data they have
on both the [inaudible]. Saby Mitra: Very good point, because so many people sell
on their platform, right? So many retailers
sell on their platform. And I’ll show you—I’ll give you
an example of how they’re trying
to use that data. Student: If you bought it
through Amazon, it still shows you bought
[inaudible]. Saby Mitra: Right.
All right. So, you know, if you look
at what Amazon has, they have product
search data; so if you search on a product,
they know that. Clickstream data—how you browse
through their networks, their website,
what products you look at before you buy a specific
product, they know that. What order you search;
they know that information. They have
multi-retailer sales data; a lot of retailers
sell through their portal, so they have that information. They have market basket data; so what products people
tend to buy together, they have that information. Individual purchase history; so for me,
everything that I’ve ever bought from Walmart—from Amazon—they
know that information; they keep it
for a long time. So that individual
purchase history, that they can identify me perfectly through
my Prime membership. They have individual
characteristics—where I live, you know, what’s the income
in the zip code I live— they have all
that information as well. And they have
consumer preferences based on how I search
for products. And even the reviews that I put
in for different products, they have that information
which they can use to understand what consumers
are really looking for when they buy a product. Let’s go to Walmart. So who is—you guys
were doing Walmart, right? So what sort of data
do you think Walmart has? And leave aside [unclear], which is a pretty small part
of their business today. So what sort of data
does Walmart have? Student: One thing
we talked about, they have a longer history
and they’ve got a lot of it. It’s not as—they
don’t get nearly as specific to the individual; it’s more of a democratic
based type of information. Saby Mitra: That’s a great point
because, you know, they can’t always identify you as an individual
because you might buy with cash and you might buy
with different credit cards. So they can’t really
pinpoint you as an individual as well as Amazon can, right? So they can do that analysis
of more of an aggregate store-level,
market segment-level. That’s a great point.
Yes— Student: I think
one of the things [inaudible] to the consumer based
on how that rollback [inaudible]
sales of that product [inaudible]. Saby Mitra: Right. Right. So if they roll back a price,
how much more did it sell? Sure.
Anything else? So, you know,
if I look at the Walmart, they have single retailer sales. It’s obviously
a single retailer. No one selling directly
as—they’re not a platform; they’re not a portal for other
retailers to sell through. They have limited
market basket information. I mean every time
I buy, they can’t really identify me individually, and they can’t look
at other products I have bought because I may not have
bought it through a Walmart; I may have bought it
through another retailer. I don’t buy everything
through Walmart. So they have market basket, but somewhat more limited
than Amazon. And they have limited
individual purchase history, longer time period because they’ve been
in business for longer, but it’s hard for them
to identify me, as you mentioned, as a specific individual
in their data. Now let’s move to P&G
or any FMCG company that’s selling through Walmart
or through Amazon. So P&G, what sort of data
do you think they have? Student: [inaudible]
Student: [inaudible] Saby Mitra: Say that again. Student: They have
a large market basket. They can see everything
from the [inaudible] what’s going on down
the [inaudible]. Saby Mitra: Right. Absolutely
for their product category, they’re selling
through multiple routes and they have much
wider—for their product category—much wider data
than just Amazon. Student: [inaudible] Saby Mitra: And that’s
exactly right. So they do have
store-level sales, you know, different stores,
as you mentioned, how much they are selling.
They have consumer trends. They do a lot of analysis
on consumer trends; what sort of products,
you know, what sort of features would consumers like
in their products. So they do focus groups.
They do other research. They buy data. They also buy third party data from various agencies
to figure out, you know, what sort of new
product enhancements they should make. Now if you look at this,
does it seem that Amazon
has an advantage to you in terms of the data
that they have? Student: Yes and no. Saby Mitra: Why “no?” Student: No is because
there are other data providers— second party,
third party data providers who can complement whatever their
[inaudible]. But they have [inaudible]
if they can go back and
[inaudible]. Saby Mitra: OK. So those guys, P&G, can actually buy some data
from other third party providers like Nielsen,
for example. Yes— Student: [inaudible]
important information [inaudible] P&G actually has a lot
of consumer loyalty programs to programs that go in
and supplement the data that will be equal
to individual purchase history and things like that. So they’ve got
additional supplements through that as well. Saby Mitra: It’s hard to get
consumers to use the loyalty programs. So if you go to a Kroger,
for example, they would love you
to use the card. They give you all kinds
of discounts if you use it. Why? Because they want that data
to figure out who you are, what you are buying. CVS, you know, prints out coupons
which are a long list of coupons as you might have seen
whenever you go to CVS. Yes— Student: I would think
that exact thing [inaudible] volume of transactions knowing
people and being able to see how people react to those. Saby Mitra: Right. And, you know,
the key point here is that every company
may have a segment of the data. Walmart, Amazon comes
pretty close to having a large amount
of data about you, but it doesn’t have everything. So if you look at what data
you really need, what is complete data,
what data you really need to generate
an accurate consumer profile, you can sort of put it
in four buckets. One is, of course, the Purchase
and Usage Environment. So this is the customer
context—who are they; what do they do; what’s
going on in their lives; what are the big events
in their lives; who are their influences; where do they go
for information. That’s nothing about
their purchase behavior; it’s the general context. And perhaps
customer demographics, social media—those
are sort of the places that you might find
a lot of that information. If you look at—the next category
would be Desired Experience. That is, you know, what does the
customer seek from the offering? What are their needs,
especially unmet needs? And Search and Clickstream Data— so what features
do people search for when they are looking
for a product? What product characteristics
are most likely to be clicked on when they’re browsing
a web page? That might give a lot
of valuable information about what customers
are really interested in. Then Beliefs and Associations—
so what are their perceptions? What are customer
perceptions—product perceptions, brand perceptions? And review and social media data might be one source
of that information. And then, of course, the actual
Purchase Behavior— what’s the customer
consideration set? What are the products
that they consider when they bought this product
and what product did they actually
ultimately buy? That’s actually
the transaction data. So good news, of course,
is that no single company has all this information but, you know, you might have
heard—and actually it was in the news a couple of days
back during the weekend, and it’s been rumored
for a while that Amazon is trying to
get into banking as well. They already do
a lot of product financing especially
in Third World countries where the people who are
selling through their platform, they might not have
the money themselves, so they actually finance
many of the retailers who are selling
through their portals. Many of the delivery companies that are actually delivering
the products, they finance. They’re also getting
into payments. That’s well known. But they’re also thinking about, at least according
to the article, of setting up a bank because
one element of all of this data is the customer
transaction profile and their financial transactions
which they don’t have. So if you take
a Bank of America, they probably have much
more information about me and what I spend on. Student: So I work for a company [inaudible] and also we have marketing data
that we sell [inaudible]. But we can hardly
really sell that. We can only do an aggregation and let them know
what is the cross [unclear] purchases
[inaudible]. Saby Mitra: You can sell
the insights but not the actual data itself. Student: Exactly.
[inaudible]. Saby Mitra: Sure, but they’re thinking
about getting into banking and maybe Jeff Bezos,
one day, will have a trillion dollars
to spare to buy Facebook. And when that happens,
if they have a bank and they also have
the social media and Facebook, they will probably have
a complete customer profile and consumer profile and know everything there
is to know about me. But until that happens,
the good news, I think, is that not a single company has access to your entire
consumer profile. Now let’s get to—
in the remaining time, let’s get to the actual methods
and analytics methods that are becoming
popular today. You know there’s been a lot
of progress in the methods and the technology
to analyze data, so I wanted to give you
a sort of broad overview of all the different methods
that are available today to analyze data
and how they’re being used. And I don’t just want
to give you a dump of all the methods—
that’s going to be very boring. I think I have about 25 minutes
or so, hopefully, since we started
a little bit late. So I don’t want to give you
just a dump of all the methods, but to sort of put it
in a framework that helps you retain
this information. So here’s kind of my framework
of thinking about it. And this is completely,
you know, from my head, if you will. So we’ve always done business
intelligence for many, many years.
Nothing new there. You know, we’ve reported
on what’s happened in the past. We have had alerts
and dashboards on what’s happening today,
what’s happening now. And we’ve even done things,
simple forecasting, to predict what sales
are going to be in the next quarter and so on. Nothing new there; we’ve done
that for many, many years. What’s changed today?
What’s different today? Student: [inaudible] Saby Mitra: There is a lot
more data. There is cheaper storage. There’s faster processing,
and there’s better technology which basically means
that you now have access, as you mentioned, to a
wider source of integrated data, both internal and external. You also have
more granular data. At a much lower level
of granularity, you can track a person rather
than at the segment level, and longer time horizons of data
that you can store. So what this has
led to is that, if you think today in terms
of business analytics, it’s not just about reporting
what happened in the past, but also understanding
why it happened. So I can use data
to understand what was the impact
of my actions on my outcomes. I call that,
and the industry calls that, diagnostic analytics. In terms of the—in terms
of the future, instead of just
simple forecasting at the aggregate level, you can now predict
how specific actors will behave. You can say, “This customer
is likely to churn” or “That machine
is likely to fail.” So you can do the prediction,
not at the aggregate level, but at a specific actor level because of the granularity
of the data that you have. And instead of just showing
what’s happening today, you can also use models
to optimize my actions— so, how can I optimally allocate
resources to maximize profit? What price
will maximize profit? So you can use algorithms
to figure out, not just report
what’s happening, but what can I do
to actually optimize the actions
that I take today. And that’s called
prescriptive analytics. So diagnostic, predictive,
and prescriptive— those are kind of
the three terms. Now let’s look at the methods
that fall into these categories. So in terms of diagnostic,
you know, we look very quickly at impact analysis
and pilots and experiments, and I’m not going to spend
a whole lot of time in the details there,
in the weeds there. So the whole focus here
is improving decisions. How can I use data to improve
the decisions that I make? So if I invested
in something, what was the impact
of that on my outcomes? That’s the focus there. Then in terms of predictive, it’s about
understanding behavior. So typically the sort of things
that are done there are associations—
so if I buy this product, what other product
am I likely to buy? If I have this behavior, what other behavior am I
likely to have as well? So those are the associations. Classification—using data
to understand which customers
are likely to churn versus not likely to churn; which machines are likely to
fail versus not likely to fail. That’s trying to use past data
to classify incoming data. Which customer is likely
to default on a loan versus not default on a loan? Using past data and the
characteristics of the customer coming in, can I classify them
into these two buckets? And then clustering which is can
I segment my customers based on observable
characteristics? So that’s the predictive part. And then the prescriptive
is using optimization and simulation
to sort of understand what optimal decisions
I should be making today. It’s at a very high level. That’s kind of the basic
classification of the methods that you’re likely
to see in analytics. And I’ll go through
a very quick sort of overview
of each of these, so let’s start
with the first one. Let’s start with
diagnostic analytics. And here’s a sort of
real example. So I don’t know
if you saw this. This was about maybe
a couple of years back actually. Well, one and a half years back, there was an article
in The Wall Street Journal which said that P&G
is cutting about $100 million in digital advertising, because they felt that it had
no effect on their sales. I’ve always wondered
whether all these ads that we send out or our
executive MBA and evening MBA and all that, whether it has
any effect or not. It does?
I don’t know. I always—you know, I’d like to analyze the data
to see whether it has an effect. P&G actually believed
that. They got some data that either said
it was in a bad place, that it was shown on a bad website,
or it was not effective. It didn’t really improve sales.
So here’s my question to you: How can P&G evaluate the impact
of digital ad spending? How would you do that? How would you evaluate the
impact of digital ad spending? How would you use data
to analyze that? Student: [inaudible] Saby Mitra: OK.
So that’s kind of the basic idea. So suppose P&G ran a digital ad
campaign for specific products in a few markets
during the last month of 2015. I’m just taking 2015
as an example here. So this is what the data
would look like. So you have for
every product market, a product market
may be Tide in Atlanta. That’s a product market.
So did I run a digital ad? One if I did,
zero if I did not. And then you have the sales
in those product markets and you might have
other information which also might affect sales. So things like you know
how much was the rainfall, what’s the average income
in that market, those kinds of things. And you can see
the different product markets for that month of December. This is 2015.
That’s the data. And I’ve just shown 10 but there
may be many other products and maybe thousands
of product markets. So going back to your
analytical tools days, hopefully that was the first
course in your executive MBA or evening MBA
or the fulltime that was the first required
core class that you did. You run a regression with sales
as your dependent variable and you’re really interested in seeing the effect
of this digital ad. So here’s kind of the results. And the sales
are in thousands. So this is actually
$7.4 million. So what does this tell you?
So it says digital ads 1424 and it’s positive
and significant. If you remember from
your regression analysis, what does that say? So the exam question,
what does that say? What does that say? Student: [inaudible] Saby Mitra: Say that again.
Yeah. Fundamentally, it says
that in the markets where you ran the digital ads
you had $1.4 million higher sales
than in product markets where you did not run
the digital ad. Now do you have to be careful
about that analysis? Why? OK. So let’s say you spend
$400,000 on the digital ad campaign. So then you have a $1 million
extra, right, from this? Is that good? Student: [inaudible] Saby Mitra: That’s
a great point. So what you’re getting to
is that you may have chosen to run digital ads
in attractive markets. What if I chose to run
digital ads in these markets where sales are high anyway? Student: [inaudible] Saby Mitra: Right.
Right. You typically run your ads
in December to get more holiday sales. But the point is that
you might have chosen to run digital ads
in those markets where sales are expected
to be high and therefore it’s not
the impact of the digital ad. It’s really the way
you’ve chosen to run the ads. Student: So if I were
running this, what I would think
is not to base it on sales but instead to base
on periods of sales— Saby Mitra: Very good, very good.
So that’s the basic idea here. If you have sales over time, you can actually do a better job
of impact analysis, because now I have, you know,
for the same product market, I have different months,
different years, and I have the sales, and now I can really look at
before and after analysis. Some years I did not run
in the same product market. Some years I did run
a digital ad in that market. So what was the impact?
And if you have that— and I don’t want to get into
how you would do that. It’s actually fairly easy
to include these other dummy
variables to do that. But then you might run
and you’ll see that the actual effect of
digital ads is still positive, but it’s less. And what this shows is, compared to the years
where I did not run a digital ad in the same product market, versus when I ran a digital ad
in the same product market, what was the extra sales. And this is a better way
of analyzing your decision. The point that I wanted to make
is this first one here, which is you always have
to be careful about variables that you have not included that may be correlated
with the focal variable. So if I just wanted to see
the impact of does an MBA lead to higher wages,
I’m sure that does, but what if I just looked
at people who have and did not have an MBA, you know, you guys all came here
on the weekends, gave up your weekends,
gave up your evenings, you know, took two years off. You are motivated differently and, therefore,
your higher wages may just be a result
of your motivation and hard work and not the result of your MBA,
right? But if I could compare
what happened to your wages before and after, for the same person,
that’s a better analysis. So that’s the idea here. The other thing
that you have to be careful about
is the focal variable. If it’s correlated
with factors that are illegal
or unethical to use, you’ve got to be
careful of that. So I was working with a bank and one of the things
they were looking at was they have hundreds
of thousands of résumés that are submitted and they did this analysis
for their existing employees and look at their—
the people’s résumé and their job performance
and tried to relate factors from the résumé
to job performance. One of the things they found,
for example, you know, just the example here,
is golf: people who play golf
have higher performance. Now, that seems an innocuous
variable to use. No problem. You know, maybe I will screen
all the hundred thousand résumés I get based on whether
they play golf or not. But you don’t know whether that
variable, golf, is correlated with other variables
like race, like income, which might not be legal
or ethical to use. So that’s something that
you have to be careful about when you’re doing this. And then finally, you know,
delayed impact. So I may have run the ad now, and the impact on sales
was not right now. It may be later on. That’s not captured
in my analysis. So lots of pitfalls that you have to
sort of think through logically whenever you’re trying
to use data to understand what impact
something had on another, on your outcome variable. All right,
let me move to an idea. I told you I’m going to go
through this fairly quickly. I’m going to skip
over this part here and come back
if I have the time. But I get to predictive
analytics. So we’ve looked at diagnostic. You’re trying to understand
the impact from historical data
on outcomes. Let’s move
to predictive analytics. The most common example
that all of you are aware of is when you go to Amazon, they give you lots
of recommendations. You’ve seen
these recommendations when you go to buy products
at Amazon. I’m not sure if you knew. I thought this was
a really interesting statistic that 35 percent of Amazon revenue comes from products
they recommend. That’s a huge number.
Think about it. 35 percent of their revenue
comes from products they show as recommendations
on their webpage. So they must be doing
a really good job of predicting
what you are likely to buy. How do they do that? The basic approach that they use
is called market basket analysis or association rules, and essentially
they analyze transactions, thousands of transactions,
which look at— and try to form rules
which look something like this: Left-hand side implies
right-hand side. That means if somebody
bought a camera, they’re more likely to buy
a camera bag, things like that. Left-hand side
is a group of items. Right-hand side
is another group of items and they’re trying to derive,
based on the millions or billions
of records that they have, what people buy together,
and the key thing to note— not everyone gives this example: diapers and Friday
evening implies beer. I don’t know why. Why would you buy beer if you buy diapers
and it’s Friday evening? Student: [inaudible] Saby Mitra: Yeah.
Very, very good. Yeah.
Very, very good point. So that I was going to come
to that in just a sec. That’s a great point. You know, how do you really try
to get to that causality? That if you buy this you’re
really likely to buy that. It may be that—you know,
if you go to a grocery store, what are the two things
that you always buy? Milk and bread, right?
You always buy. So someone might look at that
set of transactions and say, hey, if someone’s buying milk,
suggest bread. But that’s just because
you always buy milk and you always buy bread and, therefore, that association
will, you know, normally come up if you tried
to do this type of analysis. So what they do is—there are
just an exponential number of rules that can be generated
from the data. For those of you
who are mathematically oriented, it’s actually 3
to the power of K and K is the number of items
which are there. So if you take any retailer, 3 to the power of hundred
is a very large number. I think it’s like 3 to the
power of 20 is like 3 billion. 3 to the power of hundred
is unimaginable. And a retailer sells tens
of thousands of products. So 3 to the power
of tens of thousands is just an unimaginable number. So what they do—
I try to answer your question, is they look at three measures,
which is one is what percentage of the transactions
have these two items— the left-hand side
and the right-hand side. Given every transaction
that has the left-hand side, what percentage of them
also have the right-hand side? That’s called the confidence. And then this lift is trying to get to what do you
expect by chance. If I buy milk always
and I buy bread always, just by chance, many of the transactions
will have both milk and bread. There is no causality there. So there is another factor and we don’t want to get into
how do you calculate this, which is compared to chance,
what’s the lift in probability that if you buy item X you’re
also likely to buy item Y? So that’s kind of how they’re
trying to get to the causality. It’s not real causality yet,
but it’s better than what you would observe
by just chance itself. So where would you use this?
What do you think? So recommendations is one. Where else do you think Amazon
would use this information? Obviously, recommendations
we are all aware of. If you buy something, they show you something else
based on these rules. Where else? Student: [inaudible] Saby Mitra: Exactly.
Exactly. So things which people
tend to buy together, you might want to stock
your warehouses and they’re using it
to improve their operations so that it becomes much quicker
to pick the products. I don’t see Cynthia here. I’m assuming I have till
10 minutes past 7, right? Oh, Cynthia, you’re there. Sorry, I didn’t see you.
Is that OK? Cynthia: Yes.
Is that OK, guys? Students: Yes.
Saby Mitra: All right. All right,
let’s do one more, and this is a very common method
which is used. It’s called classification
and the idea here is can I— based on past data, can I arrange incoming data
into predefined classes? So for example,
can I classify credit applicants as low, medium, or high risk? Can I classify customers
as loyal or likely to leave? Can I classify emails
as legitimate or spam based on my analysis
of prior data? You may have heard many of these
terms here. They’re all basic methods to do classification—logistic
regression, decision trees. I’m going to show you one
and I’ll run— I’ll show you the results
on a dataset and ask you to interpret that. So we’re going to look
at decision tree analysis as one of the methods, but they all basically
do the same thing. They’re trying to classify incoming data
into one of two groups. That’s fundamentally
what they’re trying to do. So the basic method
that you follow is old data, your past data,
you divide into a training set on which you train
your algorithm and the test set once you’ve trained
the algorithm to test your algorithm to see
whether it’s doing well or not. And then after you’ve done that,
you know, you want to check the accuracy
on the test data set. If it works
OK, then you apply the model to your
incoming data to classify them. So you train based
on your past data, you test based on your past data
that you have kept separate, and then once your algorithm
is trained when incoming data comes in you apply it to the incoming data
to classify into. Are they likely to churn?
Not likely to churn? Are they likely to default?
Not likely to default, et cetera? So here’s kind of what the— once again,
this is a method that’s used. Here’s kind of what
the output looks like. I apologize for this. I’m going to give you
a pretty morbid example here. It’s the Titanic data set. Titanic, as you know,
very few people survived. Actually,
38 percent of the folks on the Titanic survived. So this is actual data from 891
passengers on the Titanic. And I just use this data set and ran the algorithm to see
what it generates just to see. So I’m trying to predict
this variable here, whether they survived or not. And I have data about
a lot of other things. You know, what class
they were traveling on, sex, male or female, age, whether they had siblings
or spouses, number of siblings and spouses traveling with them,
number of parents and children, the fare they paid,
the port they embarked in. And I just ran the algorithm,
let it loose on those data to see what the algorithm
thinks would predict survival. Same method you can use
to predict whether it’s a spam or not spam,
whether it’s default or not likely to default,
the same idea. And here’s the tree
that it generated: Now, let me explain
what this means. So it says here that most people
did not survive. That’s 0. 38 percent is the survival and this is
all the people are in this node. Now,
when you break it up by sex, so if you were male,
you did not survive. You had a 19 percent chance
of survival. And 65 percent were male. If you were female, you had
a 74 percent chance of survival. So you survived.
It’s more than 50 percent. 35 percent were female. If you were male
and your age was greater than 6.5—that’s this
one here—you did not survive. You had a 17 percent chance
of survival. And if you were less than
6.5—that’s here—you had a pretty good chance
of survival, 79 percent. Likewise for female
if the fare class—the class they were traveling on is 1 or
2—first or second class—they had a pretty high chance
of survival, 93 percent chance. If they were traveling
in third class, then depending on the fare
they paid, there is a difference
in the survival. How would you interpret—this
is real data. I just ran the algorithm. The algorithm is dumb.
It doesn’t—it just runs. How would you interpret this?
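A minimal sketch of the train, test, and classify steps with a small decision tree, in plain Python. This is not the software or the data used in the talk: the three 0/1 features and the labeling rule are invented for illustration and are not derived from the Titanic records, and the tree builder is a bare-bones greedy splitter on Gini impurity, a common splitting criterion.

```python
from itertools import product


def gini(rows):
    """Gini impurity of the 0/1 labels in rows (each row is (features, label))."""
    if not rows:
        return 0.0
    p = sum(label for _, label in rows) / len(rows)
    return 2 * p * (1 - p)


def build_tree(rows, features, depth=3):
    """Greedily split on the feature that gives the lowest weighted Gini impurity."""
    share = sum(label for _, label in rows) / len(rows)
    leaf = {"predict": int(share >= 0.5), "share": share}
    if depth == 0 or share in (0.0, 1.0):
        return leaf
    best = None
    for f in features:
        no = [r for r in rows if r[0][f] == 0]
        yes = [r for r in rows if r[0][f] == 1]
        if not no or not yes:
            continue  # this feature does not separate anything at this node
        score = (len(no) * gini(no) + len(yes) * gini(yes)) / len(rows)
        if best is None or score < best[0]:
            best = (score, f, no, yes)
    if best is None:
        return leaf
    _, f, no, yes = best
    return {"feature": f,
            "no": build_tree(no, features, depth - 1),
            "yes": build_tree(yes, features, depth - 1)}


def predict(tree, x):
    """Walk from the root to a leaf and return that leaf's majority class."""
    while "feature" in tree:
        tree = tree["yes"] if x[tree["feature"]] == 1 else tree["no"]
    return tree["predict"]


FEATURES = ["female", "child", "first_class"]


def made_up_label(x):
    # Invented rule for illustration only; NOT derived from the Titanic data.
    return int(x["child"] == 1 or (x["female"] == 1 and x["first_class"] == 1))


rows = []
for _ in range(4):
    for values in product([0, 1], repeat=3):
        x = dict(zip(FEATURES, values))
        rows.append((x, made_up_label(x)))

train, test = rows[:24], rows[24:]  # hold out the last 25% as the test set
tree = build_tree(train, FEATURES)
accuracy = sum(predict(tree, x) == y for x, y in test) / len(test)
print(tree["feature"], accuracy)
```

Because the invented labels contain no noise and the test rows repeat combinations seen in training, the accuracy here is trivially perfect. On real data the held-out accuracy is the number that tells you whether the model is worth applying to incoming records.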
What does this say? Go ahead. [students murmuring
indistinctly] Rich woman?
What’s the interpretation? Yes— Student: Others looked
after kids. Saby Mitra: OK. That’s not
quite the interpretation. What else? Student: Women and children
first— [inaudible] Saby Mitra: Exactly, people who
got up on the boats survived. People who didn’t get
on the boats did not survive. That’s the human interpretation
that you make from this. The algorithm is incapable
of making that interpretation. That’s where our understanding,
our knowledge comes in, to make that interpretation
that, you know, people who got up
on the boats survived. People who did not get
on the boats did not survive. You were more likely
to get on the boats if you were women and children. And also, if you were a woman, then depending on the class
of travel you were more or less likely
to get up on the boats. That’s the human interpretation. It’s a real good example to show that the algorithm
stops at one place. It cannot give you
more insights. It’s us who has to interpret
this data to really figure out
what’s going on. And then you build your model based on your interpretation
of the data. If you think about
predictive analytics— and I think I have
two more slides. If you think about
predictive analytics, once you run all your models, your decision tree or whatever
algorithms you’re using, that’s going to tell you what are the factors
that are important. This analysis was
very insightful to tell me what factors were important
to predict survival. And then you take those factors
to develop a scoring model that predicts your outcome, based on the incoming data
that’s coming in. So this is a well-known example. I’m sure most of you are aware of this Target example
of trying to predict— When you have a baby,
you spend a lot of money, and it would be great
for retailers to predict when you’re going
to have a baby, because then they can
send you coupons, right? I mean, that’s really
important for them. So when they ran
all their models, they found that there were about 25 products that,
when analyzed together, they could
assign a pregnancy score, which was very accurate, to predict whether someone’s
going to have a baby or not. Now, of course, if you know
the rest of the story, you know what happened here. Can anyone explain?
Does anyone—Yes— Student: The father complained
to Target that they were sending things
incorrectly and then the daughter—
[indistinct] Saby Mitra: That’s right. Yes.
All right. And then the final one to talk
about is prescriptive analytics. So we talked about
diagnostic impact analysis, predictive classification. We talked about decision trees
and association rules, and then the last one
is prescriptive analytics. And this has to do
with using models to try and make the best decision. And this is an example: I worked with a financial
services company, a small financial services
company that really turned around
their business using analytics. What they essentially
did was they developed a really good loan default
prediction model based on data. They could really identify what’s the probability
of default for every person, every applicant that comes
in based on past data. Decision trees. Very, very good model to predict
the probability of default. Now, they also have market
segment characteristics. So for every different—
this financial services company gave loans to different segments
of the market: rural, urban, small business,
personal loans, various other types of loans. And for each of those markets, they had pretty good information
about characteristics. So for each of these market
segments they could predict, using their loan default
prediction model, what the probability
of default was. The interest rates
were also different in each of these
market segments. You could charge more
to a consumer in a rural area than in an urban area,
things like that. So the interest rate
they could charge was different. There are also
regulatory requirements that they had to meet. There is also risk tolerance, because just because
the interest rate is higher in a market segment,
the risk may also be higher. So they have to balance that. There was a risk tolerance that
they had to take into account. And then there are
underwriting guidelines as well. So the model that they developed
was, how do I allocate my capital
that I have to lend out? How do I allocate it
between these different segments so that I maximize
expected returns, subject
to regulatory requirements, the risk tolerance limits, as well as underwriting
guidelines? And for those of you who
are familiar with optimization, you know, that’s
a classic optimization problem. So framing it in that way presents it as
an optimization problem. There are many
different algorithms that are available to help
you solve that problem. And the advantage of making
decisions this way, rather than by gut feel, is that you can really take
a whole lot of factors into account in making
a better capital allocation to maximize your returns, subject to all the constraints
that you have. That’s the basic idea. I will close with
one thought here. And this has to do
with the implementation. This is my last slide here. So if I look at why the company
was able to use this analytics to really turn
themselves around, I think there are
three things that they did: The first is vision
and communications. Strong CEO
and top management support, that this is the way
we are going to do business. We are going
to use analytics. This is how we are
going to run our business. So that was one big element. The second is what I call
“process integration.” There wasn’t a choice
which was left for salespeople. So the risk, the rate
that is quoted, is automated. You put in the data
and your information, the models predict what
the probability of default is, and based on that, the system is
going to quote an interest rate. The salesperson
has very little leeway in setting the interest rate. So it was built into
their whole process. How they allocated capital
was based on the model. So because it’s integrated
with the process, there was no leeway. Maybe 90 percent
they did this way, maybe there are 10 percent of cases
that need a deeper look, a human element to look at. And then finally,
you have to have incentives. If you want a change
in behavior, you have to pay
real careful attention to the incentives that people
have in making that change. And that was another big thing that they really paid
a whole lot of attention to. So I’m going to stop here. Any closing thoughts
on any of this or any of the topics
that we talked about? I know I’m standing
in between you and alcohol, so I’m fully aware of that. But if you want any of
the slides—I saw many of you taking pictures—I’ll
be happy to send that to you. If you send me an email—or
maybe I’ll give it to Cynthia and she can send it out. Student: Thank you. [applause]
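The capital allocation problem from the prescriptive analytics example can be sketched in Python. All figures are hypothetical, since the talk does not give the company's rates, default probabilities, or limits. The sketch also assumes a default loses the whole principal, and it replaces a proper optimization solver with a brute-force search over allocations in 5-point steps, which is only workable because there are three segments.

```python
# All figures are hypothetical; the talk does not give the company's numbers.
# name: (interest rate, default probability in percent, cap as percent of capital)
segments = {
    "rural": (0.14, 6, 60),
    "urban": (0.08, 2, 60),
    "small_business": (0.11, 4, 60),
}
MAX_AVG_DEFAULT_PCT = 4  # risk tolerance on the portfolio-weighted default probability
STEP = 5                 # try allocations in 5-percentage-point increments


def expected_return(rate, default_pct):
    """Expected return per dollar lent, assuming a default loses the whole principal."""
    p = default_pct / 100
    return (1 - p) * rate - p


names = list(segments)
best_value, best_alloc = None, None
for rural in range(0, 101, STEP):
    for urban in range(0, 101 - rural, STEP):
        alloc = dict(zip(names, (rural, urban, 100 - rural - urban)))
        within_caps = all(alloc[n] <= segments[n][2] for n in names)
        # Integer arithmetic keeps the risk check exact at the boundary.
        within_risk = (sum(alloc[n] * segments[n][1] for n in names)
                       <= MAX_AVG_DEFAULT_PCT * 100)
        if not (within_caps and within_risk):
            continue
        value = sum(alloc[n] / 100 * expected_return(*segments[n][:2]) for n in names)
        if best_value is None or value > best_value:
            best_value, best_alloc = value, alloc

print(best_alloc, round(best_value, 4))
```

With these made-up inputs the highest-rate segment does not get the most capital: the risk limit forces every rural dollar to be matched by an urban dollar, and the cap on small business does the rest. That interplay of constraints is what is hard to get right by gut feel.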
