Friday, July 6, 2018

Cloud Applications: The case for real-time monitoring

Most cloud applications depend on tools like Splunk or its competitors for monitoring. These tools have become an integral part of any cloud application in production. Here I am going to talk about situations in which these tools are not sufficient to provide the type of monitoring that you want.
There are situations when you want to monitor specific workflows. For example, you may want to monitor a customer order as it proceeds through different steps, or watch what a specific customer, or a set of customers, is doing in your application. You may also want to monitor all the requests coming in from a particular IP address in real time for debugging purposes.
Most organizations use something like a Trace Id or Tracker Id to tie together all the logs that belong to a specific request, but there are issues with that. These ids can be effective for requests that originate from an external interaction, such as a REST endpoint call or a Kafka message, but you cannot synchronize them with the background activities that your system may be performing to complete the workflow.
Tools like Splunk go through a process of log creation and parsing, and that is a batch process, so they are really not very effective for real-time monitoring. I believe we need a tool that can help us with real-time monitoring effectively without becoming an overhead for the application itself. I call these services contextual monitoring services.
The idea of this service is pretty simple. As developers write applications, they add logs. Depending on what is of interest, developers can push some of these logs to the contextual monitoring service. The only difference is that these logs are tied to a context, which may be a User Id, Order Id, Partner Id, IP Address or any other identifier that you wish to use. We will not log all requests through this mechanism, only the small subset of requests that you are interested in.
The diagram below describes a mechanism through which we can implement a system for real-time monitoring of the applications.


Figure: Basic Architecture of Real-time Monitoring
The real-time monitoring system needs to have the following basic functions.
Figure: Real-time monitoring use case
We need the ability to create and delete a context through REST calls. We don't want to log all the messages all the time. A message will only be logged when a context exists for the particular id against which it is being logged.
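As a rough illustration of this rule, here is a minimal in-memory sketch. The class name `ContextRegistry`, its method names and the REST routes mentioned in the comments are all assumptions made for this example; the real service would sit behind REST endpoints and a proper store.

```python
from dataclasses import dataclass, field


@dataclass
class ContextRegistry:
    """In-memory stand-in for the context store behind the REST calls."""

    active: set = field(default_factory=set)
    messages: list = field(default_factory=list)

    def create_context(self, context_id: str) -> None:
        # Would back a call such as POST /contexts/{id} (hypothetical route).
        self.active.add(context_id)

    def delete_context(self, context_id: str) -> None:
        # Would back a call such as DELETE /contexts/{id} (hypothetical route).
        self.active.discard(context_id)

    def log(self, context_id: str, message: str) -> bool:
        # Record the message only when a context exists for this id.
        if context_id not in self.active:
            return False
        self.messages.append((context_id, message))
        return True


registry = ContextRegistry()
registry.log("order-42", "payment received")  # dropped: no context yet
registry.create_context("order-42")
registry.log("order-42", "order packed")      # recorded
registry.delete_context("order-42")
registry.log("order-42", "order shipped")     # dropped: context was deleted
print(registry.messages)
```

Only the message logged while the context existed is kept, which is what keeps the overhead on the application small.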
We also need to support the case of a context being related to another context. Let's understand it with an example. Take the case of a retail marketplace. When a supplier updates the status of an order to shipped, this information only has an order id as context. But it also needs to reach the user who placed the order. So when a context exists for a user id, the status of that user's order is logged as well.
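The marketplace example can be sketched as follows. This is only one possible reading, assuming that relationships are one-directional (order to user) and that only direct relationships are followed; the names `relate`, `log_message` and `captured` are hypothetical.

```python
active_contexts = set()   # ids that currently have a context
related = {}              # source id -> ids of contexts it is related to
captured = []             # (receiving context, source id, message)


def relate(source_id, target_id):
    # Declare that messages logged against source_id also belong to target_id.
    related.setdefault(source_id, set()).add(target_id)


def log_message(context_id, message):
    # Deliver to the id itself and to directly related ids, but only
    # where a context is currently active.
    candidates = {context_id} | related.get(context_id, set())
    targets = sorted(candidates & active_contexts)
    for target in targets:
        captured.append((target, context_id, message))
    return len(targets)


# The supplier only knows the order id; the user context still sees the update.
active_contexts.add("user-7")
relate("order-42", "user-7")
delivered = log_message("order-42", "status changed to shipped")
ignored = log_message("order-99", "status changed to shipped")
print(captured)
```

Here the update for `order-42` is captured under `user-7` even though no context was created for the order itself, while `order-99` has no active or related context and is ignored.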
The following diagram shows how the context is used. You can create a context, create a relationship, destroy a context and log messages.
Figure: Using Context

The following diagram describes how the messages are logged.
Figure: Logging message
Here we intend to use Kafka as the backend for logging the messages. This lets us keep the messages in Kafka for a period of time, so that they remain available to us when we need them.
This service will help us build functionality for taking real-time actions based on the behavior of the application.
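A minimal sketch of the publishing side might look like the following. It assumes a producer object that exposes `send(topic, key=..., value=...)`, which is the shape of `KafkaProducer.send` in the kafka-python client; the topic name `contextual-monitoring`, the record fields and the class names are assumptions for illustration. An in-memory producer stands in for Kafka so the example runs without a broker.

```python
import json
import time


class ContextLogPublisher:
    """Publishes context-tagged log messages through a Kafka-style producer."""

    def __init__(self, producer, topic="contextual-monitoring"):
        self.producer = producer
        self.topic = topic  # hypothetical topic name

    def publish(self, context_id, message):
        record = {"context_id": context_id, "message": message, "ts": time.time()}
        # Keying by context id means Kafka's default partitioner sends one
        # context's messages to the same partition, which preserves their order.
        self.producer.send(
            self.topic,
            key=context_id.encode("utf-8"),
            value=json.dumps(record).encode("utf-8"),
        )
        return record


class InMemoryProducer:
    """Stand-in for a real Kafka producer; it only remembers what was sent."""

    def __init__(self):
        self.sent = []

    def send(self, topic, key=None, value=None):
        self.sent.append((topic, key, value))


producer = InMemoryProducer()
publisher = ContextLogPublisher(producer)
publisher.publish("user-7", "order-42 status changed to shipped")
print(producer.sent[0][0], producer.sent[0][1])
```

With a real broker, how long the messages stay available is governed by the topic's retention settings (for example `retention.ms`).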