
Anomaly detection using machine learning in Azure Stream Analytics | Azure Friday

August 22, 2019


>>Hey friends. You’ve probably seen demos where an IoT sensor detects the failure of some system component. But what if you could detect an anomaly before it becomes a failure? Krishna from the Azure Big Data Team is here to show us how to detect an anomaly in real time using machine learning functions in Azure Stream Analytics, today on Azure Friday. [MUSIC]
>>Hey folks. I’m Scott, and it’s Azure Friday. I’m here with Krishna Mamidipaka from Azure Stream Analytics. How are you, sir?
>>I’m doing very well. Thank you, Scott.
>>I love it when people bring hardware. Whatever you’re going to do, it’s going to be great because there are blinking lights happening.
>>Great. Awesome.
>>So, this is a piece of hardware, and you’re going to show me how I can detect an anomaly with this, but it’s not going to be very hard.
>>It’s not going to be very hard. It’s going to be very simple and very interesting.
>>Okay. So, this data is being pushed into Azure Stream Analytics. I’m assuming that there’s a world where I have millions of these, like a fleet of these, right?
>>Absolutely. That’s how our customers do it. They have hundreds of sensors attached to machines and components that they do not want to fail, and they want to detect early signs of failure. That’s where real-time anomaly detection really comes into play.
>>This is all being put into Azure Stream Analytics.
>>Absolutely. Azure Stream Analytics is a fully managed PaaS service on Azure that helps you define your real-time analytics algorithms using a very simple SQL-like language, and it is really cheap. You can get started at just about $0.11 per hour.
>>Seriously?
>>Yeah.
>>So, even if I don’t have millions of these, and I just have a few dozen, I could set this up for my house.
>>Absolutely. A lot of our enthusiasts actually do that.
>>Really?
>>Yeah.
>>I like that. So, here’s a general overview of Azure Stream Analytics.
>>Yeah. It’s very easy to get started. We let you use a very simple SQL-like language. We let you integrate with about 15 or 16 services on Azure with just a few configuration parameters, with absolutely no need to write any code. We stand by a three-nines (99.9%) SLA, and it’s enterprise-ready, compliant with all kinds of government and industry regulations.
>>Wow.
>>So, it’s a really easy system and solution for you to use, deploy, and benefit from.
>>Okay. Fantastic. So, what is a hot path?
>>Hot path analytics. Typically, there are two very different ways of doing analytics. One is analyzing an event in real time, as it happens, and the other is batch analytics. In hot path analytics, you intercept events while they are on the wire so that you can react and respond to things happening in your ambient environment within a matter of a few seconds.
>>Wow. So, this isn’t about learning about a problem at midnight when the batch runs.
>>Exactly.
>>It’s about learning about it ASAP.
>>Absolutely. Correct.
>>Okay. Wow. You’ve got all these different devices in all different places, and the data can come from anywhere.
>>Data can come from your applications and your devices. We have customers who are managing hundreds and thousands of sensors and pieces of equipment and want to understand the current status of that machinery and those devices. The way they do it is they send the data to IoT Hub, or, if a customer wants to understand the nature of the applications they are running, they send the data to Event Hubs. Stream Analytics helps you ingest the data from IoT Hub or Event Hubs. Then, we do the real-time processing on the data. We write it to targets like SQL Database, SQL Data Warehouse, or a data lake, you name it, for longer-term retention and so that you can do batch analytics on it later. Or you can power your real-time dashboards using the Push API in Power BI. We can also help you trigger actions downstream. What happens if you detect an anomaly? What happens when you detect something that is just about to fail? You want to trigger an action. Maybe you want to create a ServiceNow ticket. Maybe you want to send an email or a text alert. We can help you orchestrate all those kinds of things really easily with Azure Stream Analytics.
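The hot-path flow described above (ingest each event, process it in real time, keep it for later batch analytics, and trigger a downstream action) can be sketched in a few lines of Python. This is a minimal illustration, not Stream Analytics code: the `HotPathPipeline` class, its fixed threshold rule, and the `on_alert` callback are hypothetical stand-ins for the query and for outputs such as a ServiceNow ticket, an email, or a text alert.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HotPathPipeline:
    """Hypothetical stand-in for ingest -> real-time query -> outputs."""
    threshold: float                    # illustrative rule: alert when a value exceeds this
    on_alert: Callable[[Dict], None]    # downstream action (ticket, email, text alert, ...)
    archive: List[Dict] = field(default_factory=list)  # cold storage for later batch analytics

    def process(self, event: Dict) -> None:
        self.archive.append(event)        # keep every event for batch analytics later
        if event["value"] > self.threshold:
            self.on_alert(event)          # react immediately, while the event is "on the wire"


alerts: List[Dict] = []
pipeline = HotPathPipeline(threshold=50.0, on_alert=alerts.append)
for reading in [{"t": 0, "value": 12.0}, {"t": 1, "value": 73.5}, {"t": 2, "value": 14.2}]:
    pipeline.process(reading)

print(len(pipeline.archive), [a["t"] for a in alerts])  # 3 [1]
```

The alert fires while the loop is still running, which is the point of hot-path analytics; the archived list is what a batch job would read later.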
>>That’s fantastic. Even though we’re going to be talking about anomaly detection and some of the new functions, this is a great slide that says so much about the power of the service. Once you get your data into Stream Analytics, wherever it’s coming from, the things you can do are just legion.
>>Absolutely.
>>Fantastic. All right. What’s next?
>>So, today, I’m very happy to announce that we have machine learning-based anomaly detection models built into Stream Analytics. We can help you detect anomalies without you needing to write a machine learning algorithm or bring your own model. We have embedded models into the codebase of Azure Stream Analytics that can be invoked with simple function calls. So, what are the advantages of this? There are many. The first one is low cost for you. You don’t need machine learning scientists writing anomaly detection algorithms, and you don’t have to pay for a separate machine learning service. All of that is embedded in Azure Stream Analytics. The whole complexity of building and training your model is abstracted into a simple function call from Stream Analytics. In addition, you can move faster, because you are not writing or training your model. There are many advantages to a paradigm like this.
>>So, how is this possible, though, when there are so many different kinds, so many flavors, of data? To put it simply, the y-axis is always different depending on what kind of data you’re looking at. Data from a device like this could be X, Y, Z data, or heat, or blood sugar information, or heart rate. There are so many different kinds of data there. How can you detect an anomaly and make a machine learning algorithm that works for all flavors of data?
>>That’s a great question. The machine learning-based anomaly detection model in Stream Analytics is an unsupervised learning model. That means the model doesn’t come with any pre-training. It learns from the data it starts to read. So, you can tell the model to learn from a certain number of events before it actually starts scoring. As you said, every system in nature has a different data distribution. So, it is essential that the model comes to you without any preconceived notion of what the data will look like. It learns from the data it is seeing, and then it starts scoring the next set of data points for you.
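The “learn first, then score” behavior can be illustrated with a small unsupervised detector. This is a minimal sketch under stated assumptions: a rolling z-score stands in for the Stream Analytics model, whose internals the session doesn’t describe, and `WarmUpDetector`, `history_size`, and the threshold of 3 are hypothetical names and values. The detector has no pre-training; it returns `None` until it has seen `history_size` events, then scores each new event against what it has learned so far.

```python
import statistics
from collections import deque
from typing import Deque, List, Optional


class WarmUpDetector:
    """Unsupervised sketch: no pre-training, it learns only from the stream it reads.

    A rolling z-score stands in for the real model (an assumption for illustration).
    """

    def __init__(self, history_size: int, threshold: float = 3.0) -> None:
        self.history: Deque[float] = deque(maxlen=history_size)  # sliding window of recent values
        self.threshold = threshold

    def score(self, value: float) -> Optional[bool]:
        """Return None while still learning, then True/False for each new event."""
        if len(self.history) < self.history.maxlen:
            self.history.append(value)  # warm-up: learn only, don't score yet
            return None
        mean = statistics.fmean(self.history)
        stdev = statistics.pstdev(self.history) or 1e-9  # guard against perfectly flat data
        is_anomaly = abs(value - mean) / stdev > self.threshold
        self.history.append(value)  # keep learning after scoring (oldest value drops out)
        return is_anomaly


detector = WarmUpDetector(history_size=5)
results: List[Optional[bool]] = [detector.score(v) for v in [10, 11, 9, 10, 10, 10.5, 40]]
print(results)  # [None, None, None, None, None, False, True]
```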
>>That makes a lot of sense. That’s great to know, but you’re still getting me 80 percent of the way there and saving me lots of time and lots of work that I would have had to do myself.
>>Exactly. When we looked at IoT and Stream Analytics customers who are using some kind of machine learning algorithm in their pipeline, it was very evident that about 70 percent of them were doing some kind of anomaly detection. That’s why we decided to make their lives really simple by leveraging the richness of Microsoft and Microsoft Research and embedding those algorithms into Stream Analytics.
>>Very cool. All right.
>>So, in Stream Analytics, we are equipped to detect two very broad sets of anomalies. One is temporary anomalies, commonly known as spikes and dips, and the other is persistent anomalies, which are slowly increasing or slowly decreasing trends. Think about a VM that you might be monitoring in your data center: a memory leak in a VM creeps up on you very slowly. Those are also the kinds of anomalies we are equipped to track, not just temporary spikes and dips, using the algorithms and the functions that we have.
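The difference between the two families of anomalies shows up even with two deliberately simple checks. This sketch assumes a rolling z-score as the spike-and-dip check and a before/after mean comparison as a crude trend check; neither is the algorithm Stream Analytics uses, and `spike_flags`, `level_shift`, and the synthetic series are hypothetical.

```python
import statistics
from typing import List


def spike_flags(values: List[float], window: int, threshold: float) -> List[int]:
    """Spike/dip style check: flag points far from the mean of the preceding `window` points."""
    flags = []
    for i, v in enumerate(values):
        if i < window:
            flags.append(0)  # not enough history yet
            continue
        past = values[i - window:i]
        stdev = statistics.pstdev(past) or 1e-9
        flags.append(int(abs(v - statistics.fmean(past)) / stdev > threshold))
    return flags


def level_shift(values: List[float], window: int) -> float:
    """Crude trend check: mean of the last `window` points minus mean of the first `window`."""
    return statistics.fmean(values[-window:]) - statistics.fmean(values[:window])


# A slow upward drift (like a memory leak) with small alternating noise.
drift = [50 + 0.05 * i + (0.5 if i % 2 else -0.5) for i in range(200)]
# A flat baseline with the same noise and a single spike at index 150.
spiky = [50 + (0.5 if i % 2 else -0.5) for i in range(200)]
spiky[150] += 10

print(sum(spike_flags(drift, 20, 3.0)), round(level_shift(drift, 20), 2))  # 0 9.0
print(sum(spike_flags(spiky, 20, 3.0)), round(level_shift(spiky, 20), 2))  # 1 0.0
```

The per-point check catches the single spike but never fires on the slow drift, because each new point is only slightly above its recent neighbors; the before/after comparison exposes the drift but is blind to the spike. That is one intuition for treating temporary and persistent anomalies as separate detection problems.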
>>Now, I mentioned this to you a little bit before, and I’ll say it again now: I’m diabetic, and I have a number of IoT devices that help me manage my blood sugar. Both of those kinds of anomalies matter to me. A temporary anomaly, like high blood sugar from eating food, is something I’d like to know about. I’ve got all this data in Azure. A persistent anomaly: has my blood sugar been creeping up on me over the last year without my noticing? I’ve been trying to train my own models, but I’d love to be able to do it with Stream Analytics.
>>Yeah. We should definitely try that.
>>Fantastic. So, you’re prepared to do both. Those are functions that are available to me that I can call?
>>Exactly.
>>Okay.
>>To your point, this is how simple the function signature for running your anomaly detection algorithm is. For example, let’s take the change point algorithm, which helps you detect these slow-trending anomalies. The scalar expression is the field on which you want to run anomaly detection. It could be temperature, it could be pressure, it could be X, Y, Z coordinates, you name it. It could be a field that comes directly from your data, or a field that you computed as part of your query. Confidence level: this is very important, because it tells the model how sensitive it should be to certain types of anomalies. The higher the confidence level, the fewer events will be flagged as anomalies; the lower the confidence level, the more anomalies the model will flag for you.
>>That makes sense. So, getting in a car crash is different from going over a speed bump.
>>Exactly.
>>Okay.
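To illustrate why a higher confidence level flags fewer events, here is a sketch that turns a confidence percentage into a two-sided cutoff on a z-score. The Gaussian mapping is an assumption made only for illustration; the session doesn’t say how Stream Analytics converts confidence into a decision, and `z_threshold`, `count_flagged`, and the sample scores are hypothetical.

```python
from statistics import NormalDist
from typing import List


def z_threshold(confidence_pct: float) -> float:
    """Two-sided z cutoff for a confidence percentage (for example, 95 -> about 1.96)."""
    alpha = 1 - confidence_pct / 100
    return NormalDist().inv_cdf(1 - alpha / 2)


def count_flagged(z_scores: List[float], confidence_pct: float) -> int:
    """How many scores fall beyond the cutoff implied by the confidence level."""
    cutoff = z_threshold(confidence_pct)
    return sum(1 for z in z_scores if abs(z) > cutoff)


z_scores = [0.3, -1.2, 1.8, -2.1, 2.4, 2.9, -3.5]
for conf in (90, 95, 99):
    print(conf, round(z_threshold(conf), 2), count_flagged(z_scores, conf))
# prints: "90 1.64 5", then "95 1.96 4", then "99 2.58 2"
```

Raising the confidence level pushes the cutoff outward, so only the most extreme readings (the car crash, not the speed bump) survive.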
>>History size. We follow what we call a sliding window. History size is the number of events you want to pack into that sliding window, and it also tells the model how many events to learn from before it actually starts scoring the events in the next window.
>>Okay.
>>Then, you can use the common Stream Analytics development model; for example, you can use PARTITION BY. This is really important because, let’s say, in a single query you are monitoring multiple devices, and each of these devices could have a natively different data distribution. You want a different model to be trained, and a different model to be executing, on each stream of events managed in the same Stream Analytics query. Using PARTITION BY in the query enables this kind of behavior.
>>Okay. So, then, if I wanted to find out about potential change point anomalies in my blood sugar over a month or over a year, I can partition these things differently and have them learn separately.
>>Exactly.
>>Fantastic.
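The effect of PARTITION BY, one independently learned model per device, can be mimicked by keeping a separate sliding window per key. In this minimal sketch the device IDs, readings, `detect_per_device` helper, and rolling z-score detector are all hypothetical; the point is only that each key learns from its own events.

```python
import statistics
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple


def detect_per_device(events: Iterable[Tuple[str, float]], history_size: int,
                      threshold: float = 3.0) -> List[Tuple[str, float]]:
    """Keep an independent sliding window per device key, in the spirit of PARTITION BY."""
    windows: Dict[str, Deque[float]] = defaultdict(lambda: deque(maxlen=history_size))
    anomalies: List[Tuple[str, float]] = []
    for device_id, value in events:
        window = windows[device_id]        # each device learns only from its own events
        if len(window) == history_size:    # score only once this device's window is full
            stdev = statistics.pstdev(window) or 1e-9
            if abs(value - statistics.fmean(window)) / stdev > threshold:
                anomalies.append((device_id, value))
        window.append(value)
    return anomalies


events: List[Tuple[str, float]] = []
for i in range(10):
    events.append(("fridge", 4.0 + (0.1 if i % 2 else -0.1)))  # fridge hovers around 4
    events.append(("oven", 200.0 + (1.0 if i % 2 else -1.0)))  # oven hovers around 200
events.append(("fridge", 9.0))   # unusual for a fridge
events.append(("oven", 201.0))   # ordinary for an oven

print(detect_per_device(events, history_size=10))                           # [('fridge', 9.0)]
print(detect_per_device([("all", v) for _, v in events], history_size=10))  # []
```

With per-device windows, the fridge reading of 9.0 stands out against the fridge’s own history while the oven reading of 201.0 does not. When both devices share one window, the mixed distribution is so wide that neither reading is flagged.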
>>Duration is the window, not in terms of number of events but in terms of time. Ideally, the best practice is to have parity between the history size we discussed, which is in terms of number of events, and the window size. You can anticipate how many events you normally expect to see in that time window, and then try to strike that parity between the history size and the window size, which is time-based.
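The parity rule of thumb is simple arithmetic: the number of events you expect in the time window is the event rate multiplied by the window duration. A small sketch, with hypothetical helper names:

```python
import math


def suggested_history_size(events_per_second: float, duration_seconds: float) -> int:
    """Events expected in the time window, so history size and duration are in parity."""
    return math.ceil(events_per_second * duration_seconds)


def implied_rate(history_size: int, duration_seconds: float) -> float:
    """Event rate at which a given history size exactly fills the time window."""
    return history_size / duration_seconds


print(suggested_history_size(5, 120))    # 600
print(round(implied_rate(500, 120), 2))  # 4.17
```

For the settings used in the demo later in the session (500 events, 120 seconds), the two parameters are in parity at roughly 4.17 events per second.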
>>Fantastic. This seems hugely powerful.
>>It is. When the model runs, when the function runs, you get two different types of output. One is IsAnomaly, which is zero or one. One indicates that an anomaly has been detected; zero means there is no anomaly. The other is the score. Because this is a machine learning algorithm, it generates a score for every event that is processed, and we expose that score for every event.
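Each output carries the two fields just described. In the Stream Analytics query language they are typically read out of the function’s record output with `GetRecordPropertyValue`; in this Python sketch a plain dictionary stands in for that record. `DetectionResult`, `flatten`, the timestamps, and the score values are hypothetical, and the sketch makes no claim about how the score is computed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DetectionResult:
    """Flattened view of one event's output: IsAnomaly (0 or 1) plus its Score."""
    time: str
    is_anomaly: int
    score: float


def flatten(time: str, output_record: Dict[str, float]) -> DetectionResult:
    # Pull the two named fields out of the record, much as a query would project
    # them into separate columns for a dashboard.
    return DetectionResult(time, int(output_record["IsAnomaly"]), float(output_record["Score"]))


rows = [
    flatten("10:00:01", {"IsAnomaly": 0, "Score": 0.42}),   # invented values
    flatten("10:00:02", {"IsAnomaly": 1, "Score": 0.001}),  # invented values
]
anomalous_times: List[str] = [r.time for r in rows if r.is_anomaly == 1]
print(anomalous_times)  # ['10:00:02']
```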
>>Okay. So, I’ve been staring at this device. It’s been blinking at me. Are you going to show me something cool?
>>Yeah. So, this is an MXChip device, which is widely used in IoT circles. You can buy it online. We also have an SDK for it that you can program using Visual Studio. In fact, Microsoft has an SDK for you to develop your cool projects using this MXChip. The good thing is that the MXChip gives you various types of sensors, such as temperature and humidity sensors. What we are going to use today is the accelerometer, which simply tracks X, Y, and Z coordinates on this particular sensor. What I have on the screen here is the website where you can get the SDK and buy this MXChip.
>>Okay. So, can we see the real data that’s coming into this and how it’s being analyzed?
>>Absolutely. What you have on the screen are three different parameters being tracked in Power BI. One is the z-axis displacement, as tracked by the MXChip. Then, down below, we have anomaly, yes or no, with one indicating that there is an anomaly and zero indicating no anomaly, and then the score associated with every event that the model is processing.
>>Okay.
>>So, what I will do now is just move this MXChip a little bit in order to disrupt the X, Y, Z coordinates. You can think of this as similar to a vibration felt in a turbine, where you may want to slow the turbine down or stop it. So, I’m just going to move it a little bit and rest it back where it was.
>>Okay.
>>Within a few seconds, the data gets processed by Stream Analytics, and you will see that there has been a z-axis displacement and an anomaly was duly detected by the machine learning algorithm.
>>It took only a couple of seconds to figure that out.
>>Exactly.
>>[inaudible] Sorry. I just want to see what that would do. So, that information takes one or two seconds. The accelerometer data is sent to Stream Analytics, then it goes from Stream Analytics into Power BI, and there it is.
>>Exactly.
>>So, what does that function look like? What does that code look like?
>>The function is really simple. As you can see, we are getting the time and the z-axis value from the device. Then, this is where the magic really happens. I’m using the spike and dip algorithm, monitoring the z-axis. I’m indicating a 99 percent confidence level, telling the model that for something to be an anomaly, the model needs to be at least 99 percent confident. Five hundred is the number of events I want the model to learn from before it starts scoring. I’m suggesting a time window of 120 seconds.
>>That’s pretty cool.
>>That’s all.
>>That’s all you had to do?
>>That’s all I had to do.
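The demo’s settings (spike-and-dip detection on the z-axis, 99 percent confidence, 500 events of history, a 120-second window) can be mirrored in a Python sketch. In the Stream Analytics query itself, the call looks roughly like `AnomalyDetection_SpikeAndDip(z, 99, 500, 'spikesanddips') OVER (LIMIT DURATION(second, 120))`, where the field name `z` is hypothetical; check the Stream Analytics documentation for the exact signature. Below, a rolling z-score over a window bounded by both event count and time stands in for the real model, and `SpikeAndDipSketch`, its parameters, and the synthetic readings are hypothetical.

```python
from collections import deque
from statistics import NormalDist, fmean, pstdev
from typing import Deque, Dict, Tuple


class SpikeAndDipSketch:
    """Illustrative stand-in for spike-and-dip detection with the demo's parameters.

    Not the Stream Analytics algorithm: a rolling z-score over a window bounded by
    both event count (history size) and time (duration) is used instead.
    """

    def __init__(self, confidence_pct: float = 99, history_size: int = 500,
                 duration_s: float = 120) -> None:
        alpha = 1 - confidence_pct / 100
        self.cutoff = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided z cutoff (~2.58 at 99%)
        self.history_size = history_size
        self.duration_s = duration_s
        self.window: Deque[Tuple[float, float]] = deque(maxlen=history_size)  # (time, value)

    def process(self, t: float, z: float) -> Dict[str, float]:
        # Forget events that fall outside the time window.
        while self.window and t - self.window[0][0] > self.duration_s:
            self.window.popleft()
        result = {"IsAnomaly": 0, "Score": 0.0}
        if len(self.window) == self.history_size:  # still learning until history is full
            values = [v for _, v in self.window]
            score = abs(z - fmean(values)) / (pstdev(values) or 1e-9)
            result = {"IsAnomaly": int(score > self.cutoff), "Score": score}
        self.window.append((t, z))
        return result


readings = [0.01 if i % 2 else -0.01 for i in range(600)]  # resting jitter on the z-axis
readings[550] = 0.5                                         # the "shake"

fast = SpikeAndDipSketch()  # events every 0.2 s (5 per second)
flags_5hz = [i for i, z in enumerate(readings) if fast.process(i * 0.2, z)["IsAnomaly"]]

slow = SpikeAndDipSketch()  # events every 0.5 s (2 per second)
flags_2hz = [i for i, z in enumerate(readings) if slow.process(i * 0.5, z)["IsAnomaly"]]

print(flags_5hz, flags_2hz)  # [550] []
```

At 5 events per second, the 500-event history fills within the 120-second window and the shake at index 550 is flagged. At 2 events per second, only about 240 events fit in 120 seconds, so the history never fills and nothing is scored, which is the mismatch the history-size and duration parity advice guards against.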
Then, I’m just going to push my output into a Power BI dashboard.
>>Then, when you do your SELECT there, that’s basically projecting that data, and then you use Power BI to get your real-time dashboard?
>>That’s it.
>>This is the kind of thing I can put together in a fun weekend.
>>Exactly.
>>This is fantastic. So, where do I go to learn about this, and how do I get started immediately? Because I love this.
>>Absolutely. The documentation for Stream Analytics is really the right place. Just go to the documentation, search for anomaly detection, and all the functions we have are well described and defined there for you to use.
>>This is fantastic. Thank you so much. I am learning all about anomaly detection and some of these great machine learning algorithms that are built into Azure Stream Analytics today on Azure Friday. [MUSIC]
