DevOps Architecture – DevOps Monitoring


DevOps Monitoring Introduction

This article describes how application and infrastructure architecture can contribute to the selection and application of the DevOps monitoring tooling.

DevOps Monitoring Definitions

Monitoring layer model:

There are thousands of monitor tools from which a choice can be made for the monitoring of the deployment pipeline and in production.

Event blacklist:

Events that should be signaled by the monitor tooling are listed on the event blacklist. This means that an alert occurs when the event takes place. The operations team can then take action.

Event whitelist:

Events that are not relevant are placed on the whitelist.
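
As a minimal sketch of how these two lists could drive alerting, assume that every event carries a string ID. The IDs, the `Action` enum and the `classify` helper below are hypothetical and not taken from any particular monitoring tool.

```python
from enum import Enum


class Action(Enum):
    ALERT = "alert"      # event is on the blacklist: signal the operations team
    IGNORE = "ignore"    # event is on the whitelist: not relevant
    UNKNOWN = "unknown"  # event is on neither list and still needs a decision


# Hypothetical event IDs, for illustration only.
EVENT_BLACKLIST = {"APP-0001", "APP-0002"}
EVENT_WHITELIST = {"APP-0100"}


def classify(event_id: str) -> Action:
    """Decide what the monitor should do with an incoming event."""
    if event_id in EVENT_BLACKLIST:
        return Action.ALERT
    if event_id in EVENT_WHITELIST:
        return Action.IGNORE
    return Action.UNKNOWN
```

Returning a separate `UNKNOWN` value is a design choice of this sketch: it makes events that appear on neither list visible instead of silently dropping them.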

False negative:

The application does not raise an error on an event, while in reality an incorrect situation has occurred.

False positive:

The application raises an error that causes an exception, while in fact there is no error.

DevOps Monitoring Concepts

Monitoring layer model

Figure 1 gives an overview of the various monitor functions that can be used to monitor a service.

Figure 1, Monitor Layer Model [BEST 2016].

F1. System Monitoring (Element Monitoring)

The basis of the monitor layer model is system monitoring, which measures all application and infrastructure elements. System monitoring is distinguished from other forms of monitoring by its elementary level, which is why it is also referred to as element monitoring. Only individual application and infrastructure components are measured, without activating them. This monitoring is essential for the technical management of the infrastructure. System monitoring is made up of the following monitoring functions.

F1.1 Service Monitoring

The monitoring of both infrastructure services and application services. The services are measured with a management protocol, usually a Simple Network Management Protocol (SNMP) get.

F1.2 Resource Monitoring

Monitoring the resource behaviour of a single component, such as Central Processing Unit (CPU), memory and network bandwidth usage.

F1.3 Built-in Monitoring

Built-in monitoring is a monitor facility that is programmed as part of the application. The added value of this monitor facility is that it measures application runtime counters that are not visible outside the application.

F1.4 Event Monitoring

Event monitoring covers events from all infrastructure components and infrastructure services, as well as application events. The events can usually be read from local log files such as the application log, security log and error log.

Table 1, System Monitoring.
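
Resource monitoring (F1.2) essentially compares measured values against limits. The sketch below illustrates only that comparison. The resource names, the percentages and the `find_breaches` helper are assumptions made for this example, and how the readings are collected (an agent, SNMP, an operating system API) is left out.

```python
from typing import Dict, List


def find_breaches(readings: Dict[str, float], limits: Dict[str, float]) -> List[str]:
    """Return the names of the resources whose reading exceeds its limit."""
    return sorted(
        name
        for name, value in readings.items()
        if name in limits and value > limits[name]
    )


# Hypothetical utilisation percentages for one component.
readings = {"cpu": 93.0, "memory": 61.5, "network": 40.0}
limits = {"cpu": 90.0, "memory": 80.0, "network": 75.0}

print(find_breaches(readings, limits))  # prints ['cpu']
```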

F2. Application Monitoring

Application monitoring provides an image of one individual application (subsystem) or service by measuring the underlying system (system monitoring), infrastructure services and application interfaces. This level is important for application management because it displays all relevant information about one application in one view. Application monitoring consists of the system monitoring functions as well as the following monitoring functions.

F2.1 Application Interface Monitoring

Application interface monitoring is specifically intended for application administrators, so that they can determine at a glance whether an application is operating as expected. It is therefore necessary to determine per application which infrastructure components, infrastructure services and application services are important. In addition, measurements are made at the application level to verify the interaction between the infrastructure and the application code.

F2.2 Infrastructure Service Monitoring

Even if an infrastructure service is “live”, this does not mean that it is also working properly or even accessible. To establish that the service is working, a dynamic test can be performed at the infrastructure level in which the service is called.

Table 2, Application Monitoring.
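
The dynamic test described under F2.2 can be sketched as a probe that actually calls the service and checks the reply, rather than only checking that a process is running. The example assumes a plain TCP service that answers a request with a recognisable reply. The `probe_service` function and its parameters are hypothetical, and a real service would need its own protocol-specific request.

```python
import socket


def probe_service(host: str, port: int, request: bytes,
                  expected_prefix: bytes, timeout: float = 2.0) -> bool:
    """Call the service and report whether it answered as expected."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(request)
            reply = conn.recv(1024)
    except OSError:
        # Connection refused, unreachable host or timeout: the service is not usable.
        return False
    return reply.startswith(expected_prefix)
```

A probe like this distinguishes "the port is open" from "the service gives a sensible answer", which is the difference the text draws between a service being live and a service working properly.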

F3. Information System Monitoring

To perform a user function, multiple applications are often required. These are not always separately identifiable to the user. In order to make agreements with users at this level, the information system monitoring layer is defined in addition to the application monitoring layer. An information system is essentially a chain of applications. By measuring an information system end-to-end (E2E), it is possible to monitor as closely as possible from the user perspective. These measurements are usually performed by walking through the transaction paths, which is why monitoring at this level is also referred to as transaction monitoring.

E2E infrastructure measurements are also performed to support this measurement. E2E infrastructure monitoring shows whether there is connectivity between the various components of the infrastructure. This is especially relevant for geographically separated locations and large computing centers with multiple logical networks (Local Area Network (LAN), Wide Area Network (WAN), and so forth).

In addition to the application monitoring functions, information system monitoring comprises the following functions.

F3.1 Information System E2E-monitoring

The availability of an application does not yet mean that the user can work. Applications are often coupled into complete chains (information systems).

F3.2 Infrastructure E2E-monitoring

Infrastructure E2E monitoring measures the entire infrastructure layer. This monitoring is often used for geographically separated units (WAN).

F3.3 Infrastructure Domain Monitoring

If there are multiple vendors, or the network is subdivided into physically or logically separated domains, it is often desirable to monitor these domains separately in terms of availability, etc. Different standards may also apply to these domains, such as the DeMilitarized Zone (DMZ) domain versus the office automation network.

Table 3, Information System Monitoring.

F4. Chain Monitoring

The chain monitoring is aimed at giving an image of the performance of the total chain as a whole from different perspectives:

  • The user’s perception;
  • The performance that the business process requires;
  • The information processing by the chain.

In addition to the information system monitoring, the chain monitoring includes the following functionalities.

F4.1 Information Stream E2E-monitoring

At this level, the monitoring of information flows is recognized. With an E2E information flow measurement, no ICT service is measured, but purely the information flows of the business processes.

F4.2 Business Process E2E-monitoring

This feature measures the business process from front to back based on the related information systems.

F4.3 End User E2E-monitoring

At this level, service provisioning as experienced by the user is measured. An image of the service provisioning is obtained from an end user perspective (see chapter IV.7).

Table 4, Chain Monitoring.

DevOps Monitoring Best Practices

The DevOps process includes the phase “monitor”. Monitoring a service is essential for DevOps. The monitor function is aimed at monitoring the correct functioning of the service and proactively detecting incidents. In addition, the monitor service enables the measurement of the Service Level Agreement (SLA) norms. However, realizing a good monitor service is no easy task. This article describes what should be considered in the DevOps process to provide a good monitoring facility.

Monitor principles:

There are a number of principles applicable to ensure that the monitor provision is properly used.

Monitor functionality

DevOps monitoring functionality must correspond with the service level agreement.

Correlation

In addition to E2E monitoring, system monitoring must also be applied.

Monitor classification

For each class of objects to be monitored, the same monitoring tool should be used wherever possible.

Table 5, Monitor principles.

DevOps Monitoring tools and SLAs

The choice of a DevOps monitoring tool in production depends on the content of the SLA. For example, if an SLA only formulates performance indicators and norms at the level of a hardware platform and network components, then a system monitoring tool is sufficient to monitor the SLA. However, if an SLA describes performance indicators at the information system level, then monitoring will need to be adjusted to that level.

Some tools do this by considering an information system as a sum of underlying objects and defining the availability, capacity, etc. as a function of the sum of the parts. Other tools monitor the information system itself, for example by monitoring the flow of transactions. Matching the DevOps monitoring tool functionality to the level at which SLA norms are defined provides reliable SLA reporting.
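
To illustrate the first approach, the sketch below derives the availability of an information system from its parts. It assumes that the system is a serial chain (every component is needed) and that components fail independently, in which case the chain availability is the product of the component availabilities. The article does not prescribe this function; the function names, component figures and SLA norm are made up for the example.

```python
from math import prod
from typing import Iterable


def chain_availability(component_availabilities: Iterable[float]) -> float:
    """Availability of a serial chain, assuming independent component failures."""
    return prod(component_availabilities)


def meets_norm(component_availabilities: Iterable[float], sla_norm: float) -> bool:
    """Check the derived chain availability against an SLA norm."""
    return chain_availability(component_availabilities) >= sla_norm


# Hypothetical availabilities of a web server, an application server and a database.
components = [0.999, 0.995, 0.99]
print(round(chain_availability(components), 6))  # prints 0.984065
print(meets_norm(components, sla_norm=0.99))     # prints False
```

The example shows why the level of the SLA matters: every component on its own reaches at least 99%, while the chain as a whole stays below a 99% norm.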

DevOps process:

Table 6 provides an overview of the DevOps monitoring activities per DevOps phase and monitor function.

  

DevOps phases (table columns): Plan, Code, Build, Test, Release, Operate, Monitor.

Per monitor function, the activities are listed in phase order; phases without an activity are omitted.

F1.1 Service Monitoring: tool selection; system test; deploy; react on exception.
F1.2 Resource Monitoring: tool selection; system test; deploy; react on exception.
F1.3 Built-in Monitoring: program monitor; unit test; system test; deploy; react on exception.
F1.4 Event Monitoring: tool selection; program exception; unit test; system test; deploy; react on exception.
F2.1 Application Interface Monitoring: program exception; integration test; deploy; react on exception.
F2.2 Infrastructure Service Monitoring: tool selection; system test; deploy; react on exception.
F3.1 Information System E2E-monitoring: tool selection; program dummy transaction; EUX test (FAT/UAT); deploy; react on exception.
F3.2 Infrastructure E2E-monitoring: tool selection; infra E2E test (PAT); deploy; react on exception.
F3.3 Infrastructure Domain Monitoring: tool selection; infra domain test (PAT); deploy; react on exception.
F4.1 Information Stream E2E-monitoring: tool selection; label information; information E2E test (FAT/UAT); deploy; react on exception.
F4.2 Business Process E2E-monitoring: tool selection; label / status information; E2E test (FAT/UAT); deploy; react on exception.
F4.3 End User E2E-monitoring: tool selection; label information; RUM test (FAT/UAT); deploy; react on exception.

Table 6, Monitor principles.

Plan:

The plan phase of the DevOps process must take into account the costs and time involved in adjusting the monitor tooling. Freeware tools also take time to configure. Often, however, monitoring facilities are centrally organized and a new Agile project can make use of existing facilities. The built-in monitoring and the application interface monitoring, however, need to be programmed. This has to be budgeted in the plan phase.

Code:

During programming, it must be known how exceptions are captured by the monitor facility. Writing an event to a log file is not good enough: a defined event format must be used that can be interpreted by the monitor. The easiest way is to define Standard Rules & Guidelines (SRG). A Standard must be met, a Rule may be waived if there is a valid reason, and a Guideline is a recommendation. The following best practice standards apply to building a proper DevOps monitoring facility while programming:

S1. Each event has a unique number
S2. Each event refers to the software configuration item that has made the exception
S3. Each event has a severity code assigned
S4. Each event also defines the recovery action
S5. Each new event will be registered on the product backlog of the OPS team
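
Standards S1 to S4 can be enforced by giving every event a fixed structure. The following is a minimal sketch under assumptions: the field names, the severity codes and the JSON-per-line log format are chosen for illustration and are not prescribed by the SRG. S5 (registering new events on the product backlog) is a process step and is not modelled here.

```python
import json
from dataclasses import asdict, dataclass

# Assumed severity codes; a real SRG would define its own set.
ALLOWED_SEVERITIES = {"INFO", "WARNING", "CRITICAL"}


@dataclass(frozen=True)
class MonitorEvent:
    event_id: str         # S1: unique event number
    config_item: str      # S2: software configuration item that raised the exception
    severity: str         # S3: severity code
    recovery_action: str  # S4: recovery action
    message: str = ""

    def __post_init__(self) -> None:
        if self.severity not in ALLOWED_SEVERITIES:
            raise ValueError(f"unknown severity code: {self.severity!r}")

    def to_log_line(self) -> str:
        """Serialize as one JSON object per line so that a monitor can parse it."""
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical event, for illustration only.
event = MonitorEvent(
    event_id="APP-0001",
    config_item="billing-service",
    severity="CRITICAL",
    recovery_action="restart billing-service",
    message="database connection lost",
)
print(event.to_log_line())
```

A machine-readable format of this kind is what allows the monitor, rather than a person reading log files, to interpret the event.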

During the build phase, it should be known which monitor functions are applicable. The service desk and operations team are important stakeholders to be involved.

The following aspects are important during programming:

  • Built-in monitoring must be programmed. Importantly, maintenance of this feature may require new deployments of the application.
  • Event monitoring requires that events are programmed in the correct places in the application. The most important places are where an external call is made to another module, another application or an infrastructure service.
  • Application interface monitoring requires knowledge of how the application communicates with other applications. An example is the implementation of a check to determine whether two applications use the same version of a communication protocol.
  • E2E Information Systems Monitoring (EUX) can best take place based on both read and write / mutation transactions. For write and mutation transactions, however, it must be ensured that these transactions do not contaminate the information administration. A dummy account can be used for this. However, this account must be isolated throughout the application and its abuse must be prevented.
  • Information flow E2E monitoring requires that data formats are measurable. This means that information items must be labelled. This labelling should preferably be included in the programming phase.
  • Business process E2E monitoring requires that the application labels information, so that it can be determined which part of which business process a transaction belongs to.
  • End User E2E Monitoring (RUM) requires labelling to determine which network packet contains a particular type of transaction.
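
The event monitoring aspect above, programming events where an external call is made, could look roughly as follows. This is a sketch under assumptions: the `monitored_call` decorator, the event ID, the target name and the log format are hypothetical, and the decorated function is only a stand-in for a real call to another application.

```python
import functools
import logging

logger = logging.getLogger("monitor.events")


def monitored_call(event_id: str, target: str):
    """Decorator: log a structured event when an external call fails, then re-raise."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                logger.error("event_id=%s target=%s error=%s", event_id, target, exc)
                raise
        return wrapper
    return decorator


@monitored_call(event_id="APP-0002", target="payment-gateway")
def call_payment_gateway(amount: float) -> str:
    # Stand-in for a real call to another application or infrastructure service.
    if amount <= 0:
        raise ValueError("amount must be positive")
    return "accepted"
```

Re-raising the exception after logging keeps the application's own error handling unchanged; the decorator only makes sure that the monitor sees the event.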

Build:

In the build phase only unit test cases can be used, because otherwise the requirement of a maximum of 5 minutes for a build process cannot be met. Therefore, the complete DevOps monitoring facility can only be tested later in the deployment pipeline. This means that most aspects of the monitor function cannot be tested until a check-in has taken place. It is therefore important to take as many measures as possible in the code phase to prevent defects.

Based on a source code check, the build must verify that the format in which each exception is programmed complies with the SRG. The build must also verify that each programmed error message is present in either the event blacklist or the event whitelist.
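
A build step of this kind might be sketched as a scan of the source text for event IDs, followed by a comparison with both lists. The ID convention (three capital letters, a hyphen and four digits) and the `unlisted_event_ids` helper are assumptions for this example; a real check would follow the format defined in the SRG.

```python
import re
from typing import Iterable, Set

# Assumed convention: event IDs look like "APP-0001" wherever they appear in source code.
EVENT_ID_PATTERN = re.compile(r"\b[A-Z]{3}-\d{4}\b")


def unlisted_event_ids(sources: Iterable[str],
                       blacklist: Set[str],
                       whitelist: Set[str]) -> Set[str]:
    """Return the event IDs found in the sources that are on neither list."""
    found: Set[str] = set()
    for text in sources:
        found.update(EVENT_ID_PATTERN.findall(text))
    return found - blacklist - whitelist


# Hypothetical source fragment, for illustration only.
source = 'log_event("APP-0001")\nlog_event("APP-0300")\n'
missing = unlisted_event_ids([source], blacklist={"APP-0001"}, whitelist={"APP-0100"})
print(missing)  # prints {'APP-0300'}
```

The build can then fail when the returned set is not empty, so that an unclassified event never reaches the deployment pipeline.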

In the code phase, the programmer must define the unit tests that will be used in the build phase to detect errors in the exception handling. Both false positives and false negatives must be tested.
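
A minimal sketch of such unit tests is given below. It covers both directions: an event that is raised although nothing is wrong, and a real fault that passes without an event. The function under test, `validate_order`, and the event ID are hypothetical.

```python
import unittest
from typing import List


def validate_order(quantity: int) -> List[str]:
    """Hypothetical function under test: returns the IDs of the events it raises."""
    if quantity <= 0:
        return ["APP-0003"]  # an invalid quantity must be reported
    return []


class ExceptionHandlingTest(unittest.TestCase):
    def test_no_event_for_valid_input(self):
        # Guards against an event being raised although nothing is wrong.
        self.assertEqual(validate_order(5), [])

    def test_event_for_invalid_input(self):
        # Guards against a real fault passing without an event.
        self.assertEqual(validate_order(0), ["APP-0003"])
```

The tests can be run with `python -m unittest` as part of the build, which keeps them within the scope of fast unit tests.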

Test:

Only after the object code has been created in the build phase and the source code has been checked into the baseline is it possible to use system integration tests and system tests to determine whether the monitor facility detects exceptions at this level.

It is often said that the automated acceptance tests in the deployment pipeline should only cover the happy path. An important exception to this statement is the set of acceptance tests that exercise all exceptions on the event blacklist.

Release:

A successful deployment of application and infrastructure components should also include the deployment of adjustments to the monitor facility. Otherwise, new events will not be recognized. Deploying a monitor tool adjustment is usually straightforward. The regression testing of the monitor facility, however, is a challenge. In practice, very little time is spent on this aspect. As a result, DevOps monitoring facilities are often less effective than originally intended.

Additionally, the scripting in the deployment pipeline should raise exceptions when an automatic deployment is not successful. Monitor facilities must therefore also monitor the deployment pipeline, not only for availability but also for exceptions.
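
One way to give pipeline scripting its own exceptions is to wrap every deployment step and turn a failure into an event. The sketch below assumes that steps are external commands and that a non-zero exit code means failure. The `run_deploy_step` helper, the event ID `DEP-0001` and the event fields are hypothetical.

```python
import subprocess
import sys
from typing import List, Optional


def run_deploy_step(name: str, command: List[str]) -> Optional[dict]:
    """Run one pipeline step; return an event dict if it fails, otherwise None."""
    try:
        result = subprocess.run(command, capture_output=True, text=True)
    except OSError as exc:
        # For example: the command does not exist on this machine.
        detail = str(exc)
    else:
        if result.returncode == 0:
            return None
        detail = result.stderr.strip() or f"exit code {result.returncode}"
    return {
        "event_id": "DEP-0001",  # hypothetical ID for "deployment step failed"
        "config_item": name,
        "severity": "CRITICAL",
        "detail": detail,
    }


# A step that fails on purpose, using the current Python interpreter as the command.
event = run_deploy_step("smoke-check", [sys.executable, "-c", "import sys; sys.exit(3)"])
print(event["detail"])  # prints exit code 3
```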

Operate:

Keeping a system up and running usually means recovering from a service deviation. Therefore, the operations team is a key stakeholder in determining which aspects should be monitored. Deviations can often be resolved automatically. Thanks to the collaboration of the Dev and Ops teams, it has become possible to provide maximum support here.
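
Automatic resolution can be sketched as a lookup from event ID to recovery action, with escalation to the operations team as the fallback. The event ID, the `restart_service` action and the `handle_event` helper are hypothetical; a real recovery action would call the platform's own tooling.

```python
from typing import Callable, Dict


def restart_service() -> str:
    # Stand-in for a real recovery action such as restarting a service.
    return "service restarted"


# Hypothetical mapping from event ID to an automated recovery action.
RECOVERY_ACTIONS: Dict[str, Callable[[], str]] = {
    "APP-0001": restart_service,
}


def handle_event(event_id: str) -> str:
    """Run the automated recovery action if one is defined, otherwise escalate."""
    action = RECOVERY_ACTIONS.get(event_id)
    if action is None:
        return "escalated to operations team"
    return action()
```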

Monitor:

Monitoring a service applies not only to production but throughout the DTAP (Development, Test, Acceptance, Production) cycle. An important point that must not be overlooked is that in practice more and more DevOps monitoring tools are used side by side. They should communicate well with each other. This collaboration of tools can impact the code phase, because the communication between monitor tools is often based on the format of events as they are programmed in the code phase.

More information

Related Books:

DevOps Best Practices, ISBN: 9789492618078
Agile Service Management with Scrum, ISBN: 9789071501807

Related Article:

Service Management

Publisher: ITpedia