This article describes how application and infrastructure architecture can contribute to the selection and application of DevOps monitoring tooling.
There are thousands of monitoring tools to choose from, both for monitoring the deployment pipeline and for monitoring production.
Events that should be signaled by the monitor tooling are listed on the event blacklist. This means that an alert occurs when the event takes place. The operations team can then take action.
Events that are not relevant are placed on the whitelist.
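The blacklist and whitelist described above can be sketched as a simple lookup. This is a minimal illustration, not a feature of any particular monitoring tool: the event identifiers, the `classify` function and the `UNKNOWN` outcome for events that appear on neither list are assumptions made for the example.

```python
from enum import Enum


class Action(Enum):
    ALERT = "alert"      # event is on the blacklist: signal the operations team
    IGNORE = "ignore"    # event is on the whitelist: not relevant
    UNKNOWN = "unknown"  # assumption: event is on neither list and still needs classifying


# Hypothetical event identifiers, for illustration only.
EVENT_BLACKLIST = {"DB-0001", "APP-0042"}
EVENT_WHITELIST = {"APP-0100"}


def classify(event_id: str) -> Action:
    """Decide what the monitor tooling should do with an incoming event."""
    if event_id in EVENT_BLACKLIST:
        return Action.ALERT
    if event_id in EVENT_WHITELIST:
        return Action.IGNORE
    return Action.UNKNOWN
```

Treating unlisted events as a separate outcome makes it visible when an event has been programmed but never classified.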
False positive: the application raises an error that causes an exception, while in fact there is no error.
False negative: the application does not raise an error on an event, while in reality an incorrect situation has occurred.
Figure 1 gives an overview of the various monitor functions that can be used to monitor a service.
The basis of the monitor layer model is system monitoring, which measures all application and infrastructure elements. System monitoring is distinguished from other forms of monitoring by its elementary level, which is why it is also referred to as element monitoring. Only individual application and infrastructure components are measured, without activating them. This monitoring is essential for the technical management of the infrastructure. System monitoring is made up of the following monitoring functions.
F1.1 | Service Monitoring | The monitoring of both infrastructure services and application services. The services are measured with a management protocol, usually a Simple Network Management Protocol (SNMP) get. |
F1.2 | Resource Monitoring | Monitoring the resource behaviour of a single component, such as Central Processing Unit (CPU), memory and network bandwidth usage. |
F1.3 | Built-in Monitoring | Built-in monitoring is a monitor facility that is programmed as part of the application. Its added value is that it measures application runtime counters that are not visible outside the application. |
F1.4 | Event Monitoring | Event monitoring covers events from all infrastructure components and infrastructure services, as well as application events. The events can usually be read from local log files such as the application log, security log and error log. |
Table 1, System Monitoring.
Application monitoring provides an image of one individual application (subsystem) or service by measuring the underlying system (system monitoring), infrastructure services and application interfaces. This level is important for application management because it displays all relevant information about one application in one view. Application monitoring consists of the system monitor functions as well as the following monitor functions.
F2.1 | Application Interface Monitoring | Application interface monitoring is specifically intended for application administrators, so that they can determine at a glance whether an application is operating as expected. It must therefore be determined per application which infrastructure components, infrastructure services and application services are important. In addition, measurements are made at the application level to verify the interaction between the infrastructure and the application code. |
F2.2 | Infrastructure Service Monitoring | Even if an infrastructure service is “live”, this does not mean that it is working properly or is even accessible. To establish that the service is working, a dynamic test can be performed at the infrastructure level in which the service is called. |
Table 2, Application Monitoring.
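The dynamic test mentioned for infrastructure service monitoring (F2.2) could look roughly like the sketch below. It only checks that a TCP connection to the service can be opened, which establishes accessibility but not correct behaviour; a fuller test would also issue a real request. The function name and parameters are assumptions for illustration.

```python
import socket


def service_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the service can be opened.

    A process that is "live" but not listening, or not reachable over the
    network, fails this check even though a process-level check would pass.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

A monitor could call this periodically and raise a blacklisted event when it returns `False`.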
To perform a user function, multiple applications are required. The user cannot always identify these separately. To make agreements with users at this level, the information system monitor layer is defined in addition to the application monitoring layer. An information system is essentially a chain of applications. Measuring an information system end-to-end (E2E) makes it possible to monitor as closely as possible to the user perspective. These measurements are usually performed by walking through the transaction paths, so monitoring at this level is also referred to as transaction monitoring.
E2E infrastructure measurements are also performed to support this measurement. E2E infrastructure monitoring shows whether there is connectivity between the various components of the infrastructure. This is especially relevant for geographically separated locations and large computing centers with multiple logical networks, such as Local Area Networks (LAN) and Wide Area Networks (WAN).
In addition to application monitoring, information system monitoring consists of the following functions.
F3.1 | Information System E2E-monitoring | The availability of an application does not yet mean that the user can work. Often applications are coupled to complete chains (information systems). |
F3.2 | Infrastructure E2E-monitoring | Infrastructure E2E monitoring measures the entire infrastructure layer. It is often used for geographically separated units connected over a WAN. |
F3.3 | Infrastructure Domain monitoring | If there are multiple vendors, or the network is subdivided into physically or logically separated domains, it is often desirable to monitor these domains separately in terms of availability, etc. Different standards may also apply to these domains, for example the DeMilitarized Zone (DMZ) domain versus the office automation network. |
Table 3, Information System Monitoring.
Chain monitoring aims to give an image of the performance of the chain as a whole from different perspectives.
In addition to information system monitoring, chain monitoring includes the following functions.
F4.1 | Information Stream E2E-monitoring | This level monitors information flows. An E2E information flow measurement does not measure an ICT service, but purely the information flows of the business processes. |
F4.2 | Business Process E2E-monitoring | This function measures the business process from front to back, based on the related information systems. |
F4.3 | End User E2E-monitoring | At this level, service provisioning as experienced by the user is measured. This gives an image of the service provisioning from an end-user perspective (see chapter IV.7). |
Table 4, Chain Monitoring.
The DevOps process includes the phase “monitor”. Monitoring a service is essential for DevOps. The monitor function is aimed at monitoring the correct functioning of the service and proactively detecting incidents. In addition, the monitor service enables the measurement of the SLA norms. However, realizing a good monitor service is no easy task. This article describes what should be considered in the DevOps process to provide a good monitoring facility.
A number of principles apply to ensure that the monitor facility is used properly.
Monitor | DevOps monitoring functionality must correspond with the service level agreement. |
Correlation | In addition to E2E monitoring, system monitoring must also be applied. |
Monitor | For each class of objects to be monitored, the same monitoring tool should be used wherever possible. |
Table 5, Monitor principles.
The choice of a DevOps monitoring tool in production depends on the content of the SLA. For example, if an SLA only formulates performance indicators and norms at the level of a hardware platform and network components, then a system monitoring tool is enough to monitor the SLA. However, if an SLA describes performance indicators at the information system level, then monitoring needs to be adjusted to that level.
Some tools do this by considering an information system as the sum of its underlying objects and defining availability, capacity, etc. as a function of those parts. Other tools monitor the information system itself, for example by monitoring the flow of transactions. Matching the functionality of the DevOps monitoring tool to the level at which the SLA norms are defined provides reliable SLA reporting.
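As an illustration of defining availability as a function of the underlying parts, the sketch below multiplies component availabilities. It assumes that every component is required for the information system to work and that components fail independently; real tools may use a different model. The component names and figures are hypothetical.

```python
from math import prod


def chain_availability(component_availabilities):
    """Availability of a system in which every component is required.

    Assumes independent failures, so the individual availabilities
    (fractions between 0 and 1) are multiplied.
    """
    return prod(component_availabilities)


# Hypothetical measured availabilities per component.
parts = {"web server": 0.999, "application server": 0.998, "database": 0.995}
system = chain_availability(parts.values())
print(f"{system:.4f}")  # prints 0.9920
```

The result is lower than the availability of any single part, which is why an SLA norm at the information system level cannot simply be copied from the component norms.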
Table 6 provides an overview of DevOps monitoring activities per DevOps phase and monitor function.
ID | Monitor function | Plan | Code | Build | Test | Release | Operate | Monitor |
F1.1 | Service Monitoring | Tool selection | – | – | System | Deploy | – | React on exception |
F1.2 | Resource Monitoring | Tool selection | – | – | System | Deploy | – | React on exception |
F1.3 | Built-in Monitoring | – | Program monitor | Unit test | System | Deploy | – | React on exception |
F1.4 | Event Monitoring | Tool selection | Program exception | Unit test | System | Deploy | – | React on exception |
F2.1 | Application Interface Monitoring | – | Program exception | – | Integration | Deploy | – | React on exception |
F2.2 | Infrastructure Service Monitoring | Tool selection | – | – | System | Deploy | – | React on exception |
F3.1 | Information System E2E-monitoring | Tool selection | Program | – | EUX test (FAT/UAT) | Deploy | – | React on exception |
F3.2 | Infrastructure E2E-monitoring | Tool selection | – | – | Infra E2E (PAT) | Deploy | – | React on exception |
F3.3 | Infrastructure Domain Monitoring | Tool selection | – | – | Infra (PAT) | Deploy | – | React on exception |
F4.1 | Information Stream E2E-monitoring | Tool selection | Label | – | Information (FAT/UAT) | Deploy | – | React on exception |
F4.2 | Business Process E2E-monitoring | Tool selection | Label / | – | E2E test (FAT/UAT) | Deploy | – | React on exception |
F4.3 | End User E2E-monitoring | Tool selection | Label | – | RUM test | Deploy | – | React on exception |
Table 6, DevOps monitoring per DevOps phase and monitor function.
The plan phase of the DevOps process must take into account the cost and time involved in adjusting the monitor tooling. Freeware tools also take time to configure. Often, however, monitoring facilities are centrally organized and a new Agile project can make use of existing facilities. The built-in monitoring and application interface monitoring, however, need to be programmed. This has to be budgeted in the plan phase.
During programming, it must be known how exceptions are captured by the monitor facility. Simply writing an event to a log file is not good enough: a defined event format must be used that the monitor can interpret. The easiest way is to define Standard Rules & Guidelines (SRG). A Standard must be met, a Rule may be waived if there is a valid reason, and a Guideline is a recommendation. The following best-practice standards apply to building a proper DevOps monitoring facility while programming:
S1. Each event has a unique number
S2. Each event refers to the software configuration item that has made the exception
S3. Each event has a severity code assigned
S4. Each event also defines the recovery action
S5. Each new event will be registered on the product backlog of the OPS team
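Standards S1 to S4 can be expressed as a fixed event format that the monitor can parse. The sketch below is one possible shape, assuming JSON log lines; the field names, the severity codes and the `validate` helper are illustrative assumptions, not a format prescribed by any tool. S5 is a process step and is not covered here.

```python
import json
from dataclasses import dataclass, asdict
from typing import List

# Assumed severity codes, for illustration only.
ALLOWED_SEVERITIES = {"INFO", "WARNING", "ERROR", "CRITICAL"}


@dataclass(frozen=True)
class MonitorEvent:
    event_id: str         # S1: unique event number
    config_item: str      # S2: software configuration item that raised the exception
    severity: str         # S3: severity code
    recovery_action: str  # S4: recovery action
    message: str = ""

    def to_log_line(self) -> str:
        """Serialize the event as one JSON line that a monitor can interpret."""
        return json.dumps(asdict(self), sort_keys=True)


def validate(event: MonitorEvent) -> List[str]:
    """Return the standards that a single event violates (empty list if none).

    Uniqueness of the event number across events (S1) needs a registry
    and is not checked here.
    """
    problems = []
    if not event.event_id:
        problems.append("S1: event number is missing")
    if not event.config_item:
        problems.append("S2: configuration item is missing")
    if event.severity not in ALLOWED_SEVERITIES:
        problems.append("S3: unknown severity code")
    if not event.recovery_action:
        problems.append("S4: recovery action is missing")
    return problems
```

For example, `MonitorEvent("APP-0042", "order-service", "ERROR", "Restart the order queue consumer").to_log_line()` yields a single line with all four mandatory fields.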
During the build phase, it should be known which monitor functions are applicable. The service desk and operations team are important stakeholders to be involved.
The following aspects are important during programming:
In the build phase, only unit test cases can be used, because otherwise the requirement of a maximum of 5 minutes for a build process cannot be met. Therefore, the complete DevOps monitoring facility can only be tested later in the deployment pipeline. This means that most aspects of the monitor function cannot be tested until a check-in has taken place. It is therefore important to take as many measures as possible in the code phase to prevent defects.
The build must check, by means of a source code check, that the format in which each exception is programmed complies with the SRG. During the build, it must also be verified that each programmed error message is present on either the event blacklist or the event whitelist.
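A build-time check of this kind could be sketched as a source scan. The example assumes that events are raised through a hypothetical `raise_event("<ID>", ...)` call and that event numbers look like `APP-0042`; both conventions are assumptions for illustration and would follow from the project's own SRG.

```python
import re

# Assumed convention: events are raised as raise_event("APP-0042", ...).
EVENT_CALL = re.compile(r'raise_event\(\s*["\']([A-Z]+-\d{4})["\']')


def unlisted_events(source_text, blacklist, whitelist):
    """Return event numbers found in the source that are on neither list."""
    found = set(EVENT_CALL.findall(source_text))
    return sorted(found - set(blacklist) - set(whitelist))


def check_build(source_text, blacklist, whitelist):
    """Fail the build when an event has not been classified."""
    missing = unlisted_events(source_text, blacklist, whitelist)
    if missing:
        raise SystemExit("Unclassified events: " + ", ".join(missing))
```

Because it only scans text, a check like this fits within a short build and does not need the application to run.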
In the code phase, the programmer must define the unit tests that will be used in the build phase to detect errors in the exception handling. Both false positives and false negatives must be tested.
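Unit tests for both cases might look like the sketch below. The `check_disk_usage` function, its threshold and the event number `DISK-0001` are hypothetical stand-ins for real exception handling code.

```python
import unittest


def check_disk_usage(percent_used, threshold=90.0):
    """Hypothetical check: return an event number when usage exceeds the threshold."""
    if percent_used > threshold:
        return "DISK-0001"
    return None


class ExceptionHandlingTest(unittest.TestCase):
    def test_no_false_positive(self):
        # A healthy situation must not raise an event.
        self.assertIsNone(check_disk_usage(50.0))

    def test_no_false_negative(self):
        # An incorrect situation must raise the expected event.
        self.assertEqual(check_disk_usage(95.0), "DISK-0001")


def run_suite():
    """Run the tests programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExceptionHandlingTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Tests like these run in milliseconds, so they fit in the build phase without endangering a short build time.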
Only after the object code has been created in the build phase and the source code has been checked into the baseline is it possible to use system integration tests and system tests to determine whether the monitor facility detects exceptions at this level.
It is often said that the automated acceptance tests in the deployment pipeline should only cover the happy path. An important exception to this statement is the set of acceptance tests that test all exceptions on the blacklist.
A successful deployment of application and infrastructure components should also include the deployment of adjustments to the monitor facility. Otherwise, new events will not be recognized. Deploying a monitor tool adjustment is usually straightforward. However, regression testing of the monitor facility is a challenge. In practice, very little time is spent on this aspect. As a result, DevOps monitoring facilities are often less effective than originally intended.
Additionally, the scripting in the deployment pipeline should raise exceptions when an automatic deployment is not successful. Monitor facilities must therefore also monitor the deployment pipeline, not only for availability but also for exceptions.
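A deployment script that reports its own failures could be sketched as follows. The wrapper runs one deployment step and emits an event when the step fails; the event fields, the event number `DEPLOY-0001` and the `emit_event` callback are assumptions for illustration.

```python
import subprocess
import sys


def run_deploy_step(name, command, emit_event):
    """Run one deployment step and emit an event if it fails.

    `emit_event` is any callable that forwards the event to the monitor
    facility, for example by writing it to a monitored log file.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        emit_event({
            "event_id": "DEPLOY-0001",  # hypothetical event number
            "config_item": name,
            "severity": "ERROR",
            "message": result.stderr.strip() or f"exit code {result.returncode}",
        })
        return False
    return True


if __name__ == "__main__":
    # Example: a step that fails with exit code 3 produces one event.
    run_deploy_step("demo", [sys.executable, "-c", "import sys; sys.exit(3)"], print)
```

Passing the emitter in as a parameter keeps the script independent of the monitor tool that happens to be in use.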
Keeping a system up and running usually means recovering from service deviations. Therefore, the operations team is a key stakeholder in determining which aspects should be monitored. Deviations can often be resolved automatically. The collaboration of Dev and Ops teams makes it possible to provide maximum support here.
Monitoring a service applies not only to production but throughout the D-T-A-P (Development, Test, Acceptance, Production) cycle. An important point that must not be overlooked is that, in practice, more and more DevOps monitoring tools are used side by side. They should communicate well with each other. This collaboration of tools can affect the code phase, because the communication between monitor tools is often based on the format of events as they are programmed in the code phase.
DevOps Best Practices, ISBN: 9789492618078
Agile Service Management with Scrum, ISBN: 9789071501807