NSD (National Settlement Depository) is part of the Moscow Exchange group. It is a technology company that handles all exchange settlements and most over-the-counter settlements, servicing both Russian and foreign securities. NSD’s payment system is designated as nationally significant: a default by the company would amount to a default of the whole country. If the depository stopped working, all business activity would be frozen. The company’s specialists therefore bear great responsibility for the reliable operation of the IT infrastructure.
“Our job is not to solve problems but to prevent them. Monitoring technological services is extremely important for business continuity, and it is one of the key areas of our work,” says Denis Zelting, Head of Monitoring Development at NSD.
In the past, the company used many local monitoring systems, each created independently by the specialists responsible for a particular set of IT resources, but there was no single point of control. Meanwhile, the services keep evolving, and the more complex they become, the harder it is to account for every factor that affects performance. Both routine recurring problems (lack of disk space, password expiration, etc.) and rather rare events must be tracked. Most importantly, the person who can solve a potential problem must be informed: the owner of the particular server, disk or application. There were no major difficulties, but raising the maturity of IT management had become an urgent requirement.
NSD uses Micro Focus Operations Bridge (OpsB) as an umbrella monitoring system. Event data is collected both directly from monitored objects and from the local monitoring systems. The incoming information is then aggregated, the impact of each event on business processes is assessed, and notifications are issued centrally.
“We needed centralized monitoring of our IT resources. Ideally, we would give up the small, self-written monitoring tools, but so far that has been difficult: no single solution can cover every object directly, especially since in many cases we are dealing with systems developed in-house,” Zelting explains.
OpsB does not replace these systems; it integrates them, complementing and enriching the information they provide. For example, CMDB data on resource owners and on the relationships between components is critical for the company.
To the business, this purely infrastructural project was presented as an opportunity to increase customer satisfaction: most failures would be detected by the company’s own staff rather than through customer complaints. The project also reduces the number of registered incidents, because events detected by the monitoring system are not counted in the overall incident statistics.
“There are many monitoring systems on the market, including free ones, and they can cope with detecting failures. The real problems begin when you have to decide who should receive this information and what to do when there is too much of it. In other words, you need to set up filters and eliminate repetitions,” says Zelting. Data from different systems has to be combined and organized, the valuable information identified, and that information passed on to the right recipients. Another essential factor for the project’s success was the availability of Russian partners able to implement the system, as well as vendor support in Russian. Few large system vendors meet these requirements. In terms of functionality, Micro Focus was comparable to other vendors, but it surpassed them on all other parameters.
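To illustrate what “organizing filters and eliminating repetitions” can mean in practice, here is a minimal sketch in Python. It is not how OpsB implements event correlation; the `Event` fields, the severity levels and the `filter_and_deduplicate` function are all hypothetical, chosen only to show the general idea: drop events below a severity threshold, then collapse repeated reports of the same problem on the same resource (possibly arriving from several local systems) into a single event with an occurrence count.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical severity scale; a real system would have its own.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


@dataclass(frozen=True)
class Event:
    source: str    # hypothetical: local monitoring system that reported the event
    resource: str  # hypothetical: server, disk or application name
    check: str     # hypothetical: e.g. "disk_space", "password_expiry"
    severity: str  # one of the keys of SEVERITY_RANK


def filter_and_deduplicate(
    events: Iterable[Event], min_severity: str = "warning"
) -> list[tuple[Event, int]]:
    """Drop events below min_severity and collapse repeats of the same
    (resource, check) pair into one representative event plus a count.

    The representative is the most severe instance seen; results keep
    the order in which each distinct problem was first reported.
    """
    threshold = SEVERITY_RANK[min_severity]
    grouped: dict[tuple[str, str], tuple[Event, int]] = {}
    for ev in events:
        if SEVERITY_RANK[ev.severity] < threshold:
            continue  # filter: ignore low-severity noise
        key = (ev.resource, ev.check)
        if key in grouped:
            kept, count = grouped[key]
            # deduplicate: keep the most severe instance as representative
            if SEVERITY_RANK[ev.severity] > SEVERITY_RANK[kept.severity]:
                kept = ev
            grouped[key] = (kept, count + 1)
        else:
            grouped[key] = (ev, 1)
    return list(grouped.values())
```

For example, three reports about low disk space on a hypothetical server `db01`, coming from two different local systems, reduce to one critical event with a count of 3, while an informational heartbeat event is filtered out entirely. Keeping the count rather than discarding repeats preserves the signal that a problem keeps recurring.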
The product offers a very wide range of functionality, and NSD has put many of its features to use. Because the solution is critical to the company, it was also made fault-tolerant: the OpsB core runs on 12 servers, and with additional services the total number of servers involved exceeds 20.
Centralizing the system and automating many processes was essential, because the purpose of monitoring is to detect failures instantly and notify the responsible employees quickly.
“Without such a solution, it is in principle impossible to track all failures. If 90% of the information you get is garbage, it is very difficult to work with. If 99% of it is garbage, work becomes impossible. Now we can enrich this body of data and increase the share of useful information,” Zelting emphasizes.
Since OpsB not only “sees” what is broken but also “knows” who owns the resource, each incident is automatically assigned to the right workgroup, which reduces the load on first-line support. Faster incident resolution matters less here; the most tangible benefit is the reduced number of operators required.
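The owner-based assignment described above can be sketched roughly as follows. This is only an illustration of the idea, not OpsB’s actual mechanism or API: the CMDB extracts (`CMDB_OWNERS`, `CMDB_PARENT`), the workgroup names and `assign_workgroup` are all hypothetical. The function looks up the owner of the affected resource and, if the resource itself has no owner recorded, walks up its CMDB relationships (for example from a disk to the server it belongs to) before falling back to a first-line queue.

```python
from typing import Optional

# Hypothetical CMDB extract: configuration item -> owning workgroup.
CMDB_OWNERS = {
    "db01": "DBA team",
    "settlement-app": "Settlement applications team",
}

# Hypothetical CMDB relationships: component -> parent configuration item.
CMDB_PARENT = {
    "db01:/data": "db01",               # a disk belongs to a server
    "settlement-api": "settlement-app",  # an API belongs to an application
}

# Hypothetical queue used when no owner can be found.
FALLBACK_GROUP = "First-line support"


def assign_workgroup(resource: str, max_hops: int = 5) -> str:
    """Return the workgroup that owns `resource`.

    If the resource has no owner of its own, follow parent relationships
    up to `max_hops` times (which also guards against cycles in the data);
    if still nothing is found, return the fallback first-line queue.
    """
    current: Optional[str] = resource
    for _ in range(max_hops):
        if current is None:
            break
        if current in CMDB_OWNERS:
            return CMDB_OWNERS[current]
        current = CMDB_PARENT.get(current)
    return FALLBACK_GROUP
```

With this data, an event on the disk `db01:/data` is routed to the DBA team, and only events on resources missing from the CMDB land in the first-line queue. That is the sense in which accurate ownership data in the CMDB translates directly into less manual dispatching.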
Despite the growing number of clients and the increasing complexity of IT systems, the number of incidents can be kept stable, which is an indicator of increased effectiveness. According to internal estimates, having the monitoring system in place improves results by about 30%.
“80–90% of infrastructure failures are detected by the monitoring system and never reach the incident stage. The ability to identify problems before anyone notices them and before they turn into incidents is a huge advantage. All of NSD’s routine scheduled operations are carried out outside working hours. If something goes wrong, the monitoring system flags it right away, and the problem is solved before the next business day,” Zelting concludes.