AIOps: Artificial intelligence helps build and run artificial intelligence

Lately, the concept of applying artificial intelligence to IT operations (AIOps) has been taking hold. Gartner defines AIOps as the combination of AI, big data and ML to manage primary IT operations functions, “including availability and performance monitoring, event correlation and analysis, and IT service management and automation.” Gartner has predicted that large enterprises' exclusive use of AIOps and digital experience monitoring tools to monitor applications and infrastructure will rise from 5% in 2018 to 30% in 2023.

In recent months, the dispersal of corporate teams to home offices may have accelerated the recognition that IT needs to run in lights-out mode as much as possible. But the implications of AIOps go well beyond the COVID-19 crisis. With so much IT activity directed toward developing, scaling and supporting AI across the enterprise, we’re at a point at which AI will help us build, deploy and manage our next generation of AI.

What can AIOps do for an enterprise IT shop? Jessica Rockwood, VP of engineering for IBM Watson, counted the ways in a recent post, which coincided with IBM’s announcement of enhanced AIOps capabilities on its Watson platform:

  • “Collects data from a heterogeneous array of sources across the IT infrastructure, from performance alerts to incident tickets. This data can be used to enable cost reductions and help achieve improved productivity by recognizing a specific time of day when demand on IT resources is low, and shifting compute resources automatically.” 
  • “If automatic adjustments are not desired, data can be displayed in a visual format that provides IT operations managers or Site Reliability Engineers with recommended courses of action, and explains the rationale behind those recommendations.” 
  • “Automates tasks such as shifting traffic from one router to another, freeing up space on a drive, or restarting an application.”
  • “AI systems can also be trained to self-correct so IT managers and their teams can spend their time on higher-value work, while simultaneously getting full visibility into the enterprise’s operations.”
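
The first capability above, recognizing the time of day when demand is low, can be illustrated with a minimal sketch. It assumes utilization readings arrive as (hour, utilization) pairs; the function name, the data shape and the sample values are hypothetical and are not drawn from IBM's platform.

```python
from collections import defaultdict
from statistics import mean


def quietest_hour(samples):
    """Return the hour of day (0-23) with the lowest mean utilization.

    `samples` is an iterable of (hour, utilization) pairs. Both the data
    shape and this function are illustrative assumptions.
    """
    by_hour = defaultdict(list)
    for hour, utilization in samples:
        by_hour[hour].append(utilization)
    # Average each hour's readings, then pick the hour with the smallest mean.
    return min(by_hour, key=lambda h: mean(by_hour[h]))


# Made-up readings: utilization as a fraction of capacity.
samples = [(2, 0.10), (2, 0.14), (9, 0.80), (9, 0.75), (14, 0.60), (22, 0.30)]
print(quietest_hour(samples))  # prints 2
```

A real platform would weigh far more signals than a single average, but the idea is the same: find the quiet window, then schedule resource shifts inside it.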

What goes into an AIOps platform? Sameer Padhye, Bishnu Nayak and Enzo Signore explore the essential building blocks in their ebook, AIOps for Dummies:

  • Open data ingestion: An AIOps platform collects data that may include “operational insights such as faults, logs, performance metrics, log alerts, tickets, and more. The ability to ingest data from the most diverse data sources is critical because it allows for an accurate, real-time view of all the moving parts across hybrid IT environments.” 
  • Auto-discovery: “Businesses need an auto-discovery process that automatically collects data across all infrastructure and application domains – including on-premises, virtualized, and cloud deployments. Auto-discovery also identifies all infrastructure devices, the running applications, and the resulting business transactions.”
  • Correlation: “The AIOps platform correlates this data in a contextual form. It needs to determine the relationships between infrastructure elements, between an application and its infrastructure, and between the business transactions and the applications.” 
  • Visualization: Visualization enables IT operations to “quickly pinpoint issues and take corrective actions.”
  • Machine learning: “AIOps solutions use supervised and unsupervised machine learning to determine patterns of events in a time series. They also detect anomalies from expected behaviors and thresholds and predict outages and performance issues.”
  • Automation: Automation delivers ROI “by automating human IT operations tasks, reducing significant operating expenses, and expediting innovation. It also reduces MTTR and can improve customer satisfaction.”
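
To make the machine learning building block more concrete, here is a minimal sketch of one simple way to flag anomalies in a time series: comparing each new reading with the mean and standard deviation of a trailing window. This is a basic statistical stand-in, not the method the ebook's authors or any AIOps vendor describe; the function name, window size, threshold and latency values are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(series, window=5, threshold=3.0):
    """Return indices of readings that sit more than `threshold` standard
    deviations away from the mean of the preceding `window` readings."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to measure against; skip it.
            continue
        if abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up response times in milliseconds, with one obvious spike.
latency_ms = [100, 102, 99, 101, 100, 103, 98, 250, 101, 100]
print(find_anomalies(latency_ms))  # prints [7]
```

Only the spike at index 7 is flagged. The readings that follow it are not, because the spike inflates the trailing window's standard deviation, which is one reason production systems tend to use more robust models than a plain rolling average.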

The COVID-19 crisis has put pressure on IT leaders to cut spending and to find ways to do a lot more with a lot less. At the same time, there’s insatiable demand for AI-driven approaches that will inevitably tax IT infrastructures. For example, last year, Gartner analysts found that AI projects are multiplying. The average number of AI projects in place was four, but respondents expected to add six more projects in the next 12 months, and another 15 within the next three years. Ironically, AI itself is providing a way to sustain the infrastructure needed for the growing volume of AI initiatives.