Sercompe Business Technology provides essential cloud services to approximately 60 enterprise customers, supporting approximately 50,000 users in total. It is therefore important that the underlying IT infrastructure of the Joinville, Brazil, company delivers reliable service with predictably high performance. But with a complex IT environment that includes more than 2,000 virtual machines and 1 petabyte (one million gigabytes) of managed data, network administrators had to sort through all the data and alerts to figure out what was happening when something went wrong. It was also hard to make sure network and storage capacity were where they needed to be, or to know when the next upgrade was due.
To help untangle the complexity and increase the efficiency of its support engineers, Sercompe invested in an artificial intelligence operations (AIOps) platform, which uses AI to find the root cause of problems and warn IT administrators before small problems become big ones. Now, according to cloud product manager Rafael Cardoso, the AIOps system does the bulk of the IT infrastructure management, a big benefit over older manual methods.
“Finding out when I needed more space or capacity, it used to be a mess. We needed to get information from a lot of different points when planning,” Cardoso said. “I now have a complete view of the infrastructure and visualization from the virtual machines to the last disk in the rack.” AIOps gives visibility across the entire environment.
Before implementing the technology, Cardoso was where countless other organizations find themselves: trapped in a complex web of IT systems, with interdependencies among layers of hardware, virtualization, middleware, and finally applications. Any disruption or downtime can lead to tedious manual troubleshooting and, ultimately, a negative impact on the business: for example, a website that goes down and angers customers.
AIOps platforms help IT managers automate IT operations by using AI to provide at-a-glance information on how the infrastructure is performing: which areas are running well and which are at risk of causing a downtime event. Gartner is credited with coining the term AIOps in 2016; it describes a broad category of tools designed to overcome the limitations of traditional monitoring tools. These platforms use self-learning algorithms to automate everyday tasks and understand the workings of the systems they monitor. They pull insights from performance data to identify and monitor anomalous behavior across IT infrastructure and applications.
Market research firm BCC Research estimates the global market for AIOps will grow from $3 billion in 2021 to $9.4 billion in 2026, at a compound annual growth rate of 26%. The growing AIOps adoption rate is being driven by digital business transformation and the need to move from reactive responses to infrastructure issues to proactive action.
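As a quick check on the arithmetic behind the market figures above, the short Python sketch below computes the compound annual growth rate implied by growth from $3 billion to $9.4 billion over the five years from 2021 to 2026. The function name `cagr` is illustrative only and does not come from BCC Research.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.26 means 26%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Market size in billions of dollars, taken from the estimate quoted above.
implied = cagr(3.0, 9.4, 2026 - 2021)
print(f"Implied CAGR: {implied:.1%}")
```

The result comes out just under 26%, which is consistent with the rounded rate quoted in the estimate.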
“With data volumes reaching or exceeding gigabytes per minute across dozens of different domains, it is no longer possible for humans to manually analyze the data,” Gartner analysts wrote. Systematic adoption of AI speeds up insights and increases proactiveness.
According to Mark Esposito, director of learning at automation technology company Nexus FrontierTech, the term “AIOps” evolved from “DevOps,” the culture and practice of software engineering that integrates operations and software development. “The idea is to advocate for automation and monitoring at all stages, from building software to managing infrastructure,” says Esposito. Recent innovation in this area includes the use of predictive analytics to anticipate and resolve problems before they can affect IT operations.
AIOps helps infrastructure fade into the background
IT and network administrators harried by exploding data volumes and increasing complexity could use the help, says Saurabh Kulkarni, head of engineering and product management at Hewlett Packard Enterprise. Kulkarni works on HPE InfoSight, a cloud-based AIOps platform for proactively managing data center systems.
“IT admins spend a lot of time planning their work, planning their deployment, adding new nodes, compute, storage and all. And when something goes wrong in the infrastructure, it’s very difficult to debug those problems manually,” said Kulkarni. “AIOps uses machine learning algorithms to look at patterns, examine repetitive behaviors, and learn from them to make quick recommendations to users.” In addition to the storage nodes, every part of the IT infrastructure sends a separate alert so issues can be resolved quickly.
The InfoSight system collects data from all devices in a customer’s environment and then correlates that data with data from HPE customers with similar IT environments. The system can identify a potential problem so it can be quickly resolved, and if the problem recurs, a fix can be applied automatically. In addition, the system sends an alert so that IT teams can resolve the issue quickly, Kulkarni adds. Take the case of a storage controller that fails because it has lost power. Instead of assuming the problem is only storage related, the AIOps platform examines the entire infrastructure, all the way up to the application layer, to determine the root cause.
“The system monitors performance and can see anomalies. We have algorithms that are constantly running in the background to detect any unusual behavior and alert customers before problems occur,” Kulkarni said. The philosophy behind InfoSight is to “make infrastructure disappear” by bringing IT systems and all telemetry data into a single pane of glass. Looking at a huge set of data, administrators can quickly figure out what’s going on with the infrastructure.
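To make the idea of background anomaly detection concrete, here is a minimal Python sketch that flags a telemetry sample when it deviates sharply from the samples just before it. It is an illustration under stated assumptions, not HPE InfoSight's actual algorithm: the rolling z-score technique, the function name `find_anomalies`, the window size, the threshold, and the latency readings are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate sharply from the preceding window."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale for comparison; skip it.
            continue
        # Flag the sample if it lies more than `threshold` standard
        # deviations away from the mean of the preceding window.
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical storage latency readings in milliseconds, with one spike.
latency_ms = [10.0, 11.0, 10.5, 9.8, 10.2, 10.1, 10.4, 48.0, 10.3, 10.0]
print(find_anomalies(latency_ms))
```

With these made-up readings, only the spike at index 7 is flagged. A production platform would presumably draw on far richer models and on data from many devices, as the article describes, but the principle of comparing current behavior with a learned baseline is the same.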
Kulkarni recalls the difficulties of managing a large IT environment from past jobs. “I had to manage a large data set, call a lot of different vendors, and be on hold for hours trying to figure out the problems,” he said. “Sometimes it took us days to understand what was really going on.”
By automating data collection and mining large volumes of data to understand root causes, AIOps enables companies to reallocate core personnel, including IT administrators, storage administrators, and network administrators, to consolidate roles as infrastructure is simplified, and to spend more time ensuring application performance. “In the past, companies used to have multiple roles and different departments that handled different things. So even when deploying a new storage area, five different administrators each had to do their job,” said Kulkarni. But with AIOps, AI handles the majority of the work automatically, so IT and support staff can spend their time on more strategic initiatives, increasing efficiency and, in the case of businesses that provide technical support to customers, improving profit margins. For example, Sercompe’s Cardoso was able to reduce the average time his support engineers spent on customer calls, which reflects a better customer experience as well as greater efficiency.
This content is produced by Insights, the custom content arm of MIT Technology Review. It was not written by the editorial board of the MIT Technology Review.