Empowering VMware Landscapes with AIOps

VMware has been at the forefront of modern IT infrastructure for a very long time. After it introduced solutions like VMware Server and Workstation in the early 2000s, its reputation grew tremendously among businesses looking to upgrade their IT infrastructure. VMware has since expanded its offering by moving into public and private cloud, and has brought sophisticated automation and management tools that simplify IT processes within organizations.

The technology world is not static; it is constantly changing to provide better IT solutions in line with the growing and diverse demands of organizations across the world. The newest wave revolves around IT operations and supporting the business services that depend on those IT environments. AIOps platforms find their origin largely in the world that VMware has created – a world built on software-defined IT infrastructure that is capable of modifying itself according to need. This world consists of components that change and move at a rapid pace, and keeping up with these changes requires newer approaches to operating environments. AIOps solutions are emerging as the ideal way to run IT operations without reliance on static service models or fragile systems. The AIOps framework promises optimal utilization of skills and effort, targeted at delivering maximum value.

In order to make the most of AIOps tools, it is important that they be used in ways that can complement the existing VMware infrastructure strategy. Here are a few of those:

Software-defined is the way to go

SDx adoption is uneven, but it is here and making its mark. That uneven distribution is itself a problem: physical network infrastructure still needs to be managed alongside aspects of VMware SDN. To get the most out of VMware NFV/SDN, it is important to have a thorough overview combining all these aspects. By investing in an AIOps solution, you get a unified view across the different infrastructure types. This helps you not only identify problems faster but also align IT operations resources to deal with them before they interfere with the service you provide to your users – which is the ultimate objective of investing in any IT solution.

Integrated service-related view across the infrastructure

Not many IT organizations can afford to use only one technology across the board. Every organization carries decisions made before switching to AIOps, and those past IT decisions can have a strong bearing on how easy or difficult the transition is. Beyond managing virtual network and compute resources, organizations have their work cut out managing the physical counterparts of these as well. On top of that, there is a public cloud and applications to manage too.

Having an overview of the performance and availability of services that depend on all these different types of infrastructure is very important. That said, this unified view should not depend on time-consuming manual work to enter service definitions at every point of change, and it should update at the speed of the infrastructure itself. Whether your IT infrastructure can support software-defined frameworks depends a lot on minimal or no reliance on static models. AIOps can bring isolated data sources into a unified overview of services, allowing IT operations teams to make the most of their time and focus only on what matters.

Automation is the key

You have to detect issues early if you want to reduce incident duration – that’s a fact. But there is little point in detecting issues early if you cannot resolve them faster. AIOps tools connect with third-party automation tools, as well as those that come with VMware, to give operators a common set of authorized actions to diagnose and resolve issues. With no separate automation tools and actions for different people, everyone gets to use the best tools available. This helps IT operations teams deliver the desired outcomes, such as faster service restoration.

No-risk virtual desktops

There is no denying the benefits of virtual desktops. However, the virtual route has disadvantages as well. A virtual desktop setup forms a chain of potential failure points, any of which can have a huge impact on the service delivered to end-users. The risk comes from the different VDI chain links being owned by different teams. If support teams don’t look beyond their area of specialization and don’t communicate with each other, outages can last much longer. AIOps can detect developing issues early and provide context for the entire problem across the VDI chain. This helps the different support teams collaborate and reach a resolution faster, saving end-users from disruption.

Collaboration across service teams

VMware admins have little trouble getting a clear overview of the infrastructure they work on. The struggle is visibility and collaboration across different teams. The problem with this lack of collaboration is that issues go unresolved: when raised, they simply move from one team to another while their status remains open. AIOps can improve the issue resolution rate and bring down resolution time considerably. It does this by associating events with their respective data sources and aligning each issue with the team that holds the expertise to troubleshoot that particular type of issue. AIOps also facilitates collaboration between different teams to fast-track issue resolution.
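As a minimal sketch of the event-to-team alignment described above (the source-to-team mapping, event fields, and team names here are hypothetical illustrations, not any real product’s API):

```python
# Hypothetical sketch: route an event to the team that owns its data
# source. In a real AIOps platform this mapping would come from
# discovered topology or configuration, not a hard-coded dictionary.
ROUTING = {
    "vcenter": "virtualization-team",
    "nsx": "network-team",
    "storage-array": "storage-team",
}

def route_event(event, default="l1-ops"):
    """Assign an event to the team owning its data source, else a default queue."""
    team = ROUTING.get(event["source"], default)
    return {**event, "assigned_team": team}

event = {"id": 101, "source": "nsx", "message": "edge node heartbeat lost"}
print(route_event(event)["assigned_team"])  # network-team
```

Events from unrecognized sources fall back to a default queue, which is where collaboration between teams would typically kick in.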

Transform your Azure Ecosystem with AIOps to Increase Operational Efficiency

The cloud is now a primary destination for SMEs and large enterprises alike, and Microsoft’s Azure is considered one of the preferred IaaS and PaaS services for most business organizations.

As Artificial Intelligence and Machine Learning change the digital way of life, AIOps is set to uplift cloud services and make operations easy for the IT industry. It provides users with a broad range of benefits, including better customer experience, service quality assurance, and a productivity boost.

Why Does Your Organization Need AIOps With the Microsoft Azure Ecosystem?

As cloud usage grows, businesses face problems managing their cloud infrastructure. AIOps for Azure provides better efficiency with the help of AI-driven software, ensuring smoother operations.

By running AI operations and ML on Microsoft Azure, organizations can benefit in many ways. Some of these are:

Efficient and Cost-Effective Infrastructure

Microsoft Azure helps lower the overall cost of a business when enabled with AIOps and MLOps, making the Azure cloud a strong choice for Machine Learning Operations and Artificial Intelligence Operations.

Edge Computing

Edge processing aims to bring data resources closer to the users, thus improving the overall performance of the cloud infrastructure. It also helps reduce cost and increase processing capacity simultaneously.

Pre-Trained Machine Learning Models

The Microsoft Azure platform offers pre-trained models that can be customized for tailor-made processing of a company’s workloads. Many ML programs can be used as models through MicrosoftML for Python and MicrosoftML for R for various functions.

Manage Your Azure Infrastructure Easily With AIOps

Microsoft Azure is a reliable cloud service that manages data efficiently. As the cloud keeps growing and becomes more complex with each passing day, it needs more developers and engineers to keep it stable. Keeping pace with the constantly evolving cloud would be much easier with a solution that makes data-based decisions automatically.

Not only does this save your organization’s staff a lot of time, it also makes the process more efficient. AIOps and machine learning help streamline the process and assist engineers in taking actions based on insights from existing data.

AIOps is built on self-monitoring and requires minimal human intervention. Automation of services ensures improved service quality, reliability, availability, and performance.

Azure cloud professionals are no longer required to investigate repetitive processes and manually operate the infrastructure. Instead, they use AI and ML engineering. AI operations can work independently, and people can use their time to focus on solving bigger problems and building new functions.

Design Your Own Growth Path by Systemizing Your Operations With AIOps

The AIOps framework can contribute in several ways. The major elements are explained below.

  • Extensive and Diversified IT Data: AIOps brings together data from IT operations management and IT service management. Combining data from different sources helps accelerate root cause identification and enables automation at the same time.
  • Big Data Platform: Big data is at the center of an AIOps platform. As data is collected from different sources, it must be compiled together to support next-level analytics. AIOps aggregates big data and makes it accessible for use in real time.
  • Machine Learning: Analyzing big data is not possible by humans alone. ML automates the analysis of new and diversified data at a speed that is unachievable without the AIOps framework.
  • Observation: This is the merging of traditional ITOM data with other, non-ITOM data to enable new models and correlations. Combining AIOps with real-time processing makes root cause identification easier.
  • Engagement: The traditional service domain gains bi-directional communication to support data analysis, auto-creating documentation for audit while maintaining compliance. AIOps helps with cognitive classification, routing, and intelligence across user touchpoints.
  • Act: This is the final stop of the AIOps strategy – the codification of human knowledge into automation. It helps automate analysis, workflow, and documentation for further action.

What Does the Future Have in Store for IT Operations?

Artificial Intelligence for IT operations is bringing a continuous change in the cloud business. In no time, adopting the AIOps way will become a necessity.

  • Accelerate Digital Transformation: Sooner rather than later, businesses will be able to offer data-driven experiences with the help of AIOps. Migrating system after system won’t be a hassle, as most of the monotonous work will be handled by automation. This way, businesses can transform digitally with ease and remain relevant.
  • Solutions to Various Challenges: When humans spend time performing basic calculations, a lot of time and energy is wasted, and there is always a chance of human error. By empowering developers with actionable insights, AIOps will make solving problems hassle-free, replacing many traditional monitoring tools.
  • Finding Issues Automatically: A faster and more efficient way to improve customer satisfaction is ensuring there are no problems with your service or product – but that is challenging. With AIOps solutions, identifying issues and mitigating them becomes far easier. AIOps will play an essential role in troubleshooting workloads and in understanding and predicting customer needs in today’s competitive environment, eliminating the need for a dedicated team to solve simple issues.

How Does AIOps Transform a Business?

1. Digitization of Routine Practices

The AIOps architecture helps digitize routine practices, like user requests, while processing and fulfilling them automatically. It can even evaluate whether an alert requires action and if all the supporting data is under normal parameters.

2. Recognizing Serious Issues Faster and More Accurately

There is always a chance of human error when looking out for threats, which may lead to an unusual download being ignored. AIOps tools can solve this problem easily by running an antimalware function through the system automatically, whenever required.

3. AIOps Streamline the Interactions Between Data Center Groups and Various Teams

AIOps shares all relevant data with each IT group and provides the operations team with what it requires. Manually meeting and exchanging data is no longer required, as AIOps monitors data for each team to streamline the interactions between all groups.

Conclusion

With the help of Microsoft Azure, the value of companies in this ecosystem is scaling upward. To conclude, AIOps can rightly be described as the infusion of AI into cloud operations. When properly implemented, it reduces the time and attention demanded of an organization’s IT staff.

AIOps open-source tools allow Azure cloud professionals to observe multiple systems and resources. With better ML capabilities, the software can find the root cause of a problem and accelerate troubleshooting by suggesting the right remedies for unusual issues in an IT organization running on Microsoft Azure.

Large Language Models: A Leap in the World of Language AI

At Google’s latest annual developer conference, Google I/O, CEO Sundar Pichai announced their latest breakthrough, called “Language Model for Dialogue Applications” or LaMDA. LaMDA is a language AI technology that can chat about any topic. That’s something even a normal chatbot can do – so what makes LaMDA special?

Modern conversational agents or chatbots follow a narrow, pre-defined conversational path, while LaMDA can engage in a free-flowing, open-ended conversation just like humans. Google plans to integrate this new technology with its search engine as well as other software like its voice assistant, Workspace, Gmail, etc., so that people can retrieve any kind of information, in any format (text, visual, or audio), from Google’s suite of products. LaMDA is an example of what is known as a Large Language Model (LLM).

Introduction and Capabilities

What is a language model (LM)? A language model is a statistical and probabilistic tool that determines the probability of a given sequence of words occurring in a sentence. Simply put, it is a tool trained to predict the next word in a sentence – much like text-message autocomplete. Where weather models predict the 7-day forecast, language models try to find patterns in human language, one of computer science’s most difficult puzzles, as languages are ever-changing and adaptable.
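The next-word idea can be illustrated with a toy bigram model: count which word follows which in a small corpus and turn the counts into probabilities. Real LLMs learn such patterns with neural networks over billions of parameters, but the underlying question – how likely is each candidate next word? – is the same. The corpus below is purely illustrative.

```python
from collections import Counter, defaultdict

# Toy bigram language model: count adjacent word pairs in a tiny corpus,
# then estimate the probability of each possible next word.
corpus = "the cat sat on the mat the cat ate the fish".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return each candidate next word with its estimated probability."""
    counts = bigrams[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(predict_next("the"))  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```

Autocomplete on a phone works on the same principle, just with vastly more data and context than a single preceding word.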

A language model is called a large language model when it is trained on an enormous amount of data. Other examples of LLMs are Google’s BERT and OpenAI’s GPT-2 and GPT-3. GPT-3 is the largest language model known at the time of writing, with 175 billion parameters, trained on 570 gigabytes of text. These models have capabilities ranging from writing a simple essay to generating complex computer code – all with limited to no supervision.

Limitations and Impact on Society

As exciting as this technology may sound, it has some alarming shortcomings.

1. Bias: Studies have shown that these models are embedded with racist, sexist, and discriminatory ideas. They can also encourage people towards genocide, self-harm, and child sexual abuse. Google is already using an LLM for its search engine, and that model is rooted in bias. Since Google is not only used as a primary knowledge base by the general public but also provides an information infrastructure for universities and other institutions, such biased results can have very harmful consequences.

2. Environmental impact: LLMs also have an outsize impact on the environment, as training them emits shockingly large amounts of carbon dioxide – equivalent to nearly five times the lifetime emissions of an average car, including the manufacturing of the car.

3. Misinformation: Experts have also warned about the mass production of misinformation through these models; because of the models’ fluency, people can be misled into thinking the output was produced by humans. Some models have also excelled at writing convincing fake news articles.

4. Mishandling negative data: The world speaks many languages that are not prioritized by Silicon Valley. These languages are unaccounted for in mainstream language technologies, so the communities that speak them are affected the most. When a platform automates its content moderation with an LLM that cannot handle these languages, the model struggles to control misinformation. During extraordinary situations, like a riot, the amount of harmful content coming in is huge, and this ends up creating a hostile digital environment. The problem does not end there: when fake news, hate speech, and other toxic text is not filtered out, it is used as training data for the next generation of LLMs, and those toxic linguistic patterns are then parroted back onto the internet.

Further Research for Better Models

Despite all these challenges, very little research is being done to understand how this technology can affect us or how better LLMs can be designed. In fact, the few big companies that have the required resources to train and maintain LLMs refuse or show no interest in investigating them. But it’s not just Google that is planning to use this technology. Facebook has developed its own LLMs for translation and content moderation while Microsoft has exclusively licensed GPT-3. Many startups have also started creating products and services based on these models.

While the big tech giants are trying to create private and mostly inaccessible models that cannot be used for research, a New York-based startup, called Hugging Face, is leading a research workshop to build an open-source LLM that will serve as a shared resource for the scientific community and can be used to learn more about the capabilities and limitations of these models. This one-year-long research (from May 2021 to May 2022) called the ‘Summer of Language Models 21’ (in short ‘BigScience’) has more than 500 researchers from around the world working together on a volunteer basis.

The collaborative is divided into multiple working groups, each investigating different aspects of model development. One of the groups will work on calculating the model’s environmental impact, while another will focus on responsible ways of sourcing the training data, free from toxic language. One working group is dedicated to the model’s multilingual character including minority language coverage. To start with, the team has selected eight language families which include English, Chinese, Arabic, Indic (including Hindi and Urdu), and Bantu (including Swahili).

Hopefully, the BigScience project will help produce better tools and practices for building and deploying LLMs responsibly. The enthusiasm around large language models cannot be curbed, but it can surely be nudged in a direction with fewer shortcomings. Soon enough, all our digital communications – be it emails, search results, or social media posts – will be filtered using LLMs. These large language models are the next frontier for artificial intelligence.

About the Author –

Priyanka Pandey

Priyanka is a software engineer at GAVS with a passion for content writing. She is a feminist and is vocal about equality and inclusivity. She believes in the cycle of learning, unlearning and relearning. She likes to spend her free time baking, writing and reading articles especially about new technologies and social issues.

AIOps for Service Reliability Engineering (SRE)

Data is the single most accountable yet siloed component within any IT infrastructure. According to a Gartner report, an average enterprise IT infrastructure generates up to 3 times more IT operational data with each passing year. Large businesses find themselves challenged by frequent unplanned downtime of their services, high IT issue resolution times, and consequently poor user experience caused by inefficient management of this data overload, reactive IT operations, and other reasons such as:

  • Traditional legacy systems that do not scale
  • Siloed environments preventing unified visibility into IT landscape
  • Unattended warning signs due to alert fatigue
  • Lack of advanced tools to intelligently identify root causes of cross-tier events
  • Multiple hand-offs that require manual intervention affecting problem remediation workflow

Managing data and automation with AIOps

The surge of AI in IT operations or AIOps is helping bridge the gap between the need for meaningful insights and human intervention, to ensure service reliability and business growth. AIOps is fast becoming a critical need since effective management of the humongous data volumes has surpassed human capabilities. AIOps is powered by AI/ML algorithms that enable automatic discovery of infra & applications, 360° observability into the entire IT environment, noise reduction, anomaly detection, predictive and prescriptive analytics, and automatic incident triage and remediation!

AIOps provides clear insights into application & infrastructure performance and user experience, and alerts IT on potential outages or performance degradation. AIOps delivers a single, intelligent, and automated layer of intelligence across all IT operations, enabling proactive & autonomous IT operations, improved operational efficiencies through reduction of manual effort/fatigue/errors, and improved user experience as predictive & prescriptive analytics drive consistent service levels.

The Need for AIOps for SRE

SRE mandates that the IT team always stay ahead of IT outages and proactively resolve incidents before they impact users. However, even the most mature teams face challenges due to rapidly increasing data volumes and expanding IT boundaries created by modern technologies such as cloud and IoT. SRE teams face challenges such as lack of visibility and technology fragmentation while executing these tasks in real time.

SRE teams have started to leverage AI capabilities to detect & analyze patterns in the data, eliminate noise & gain meaningful insights from current & historical data. As AIOps enters the SRE realm, it has enabled accelerated and automated incident management and resolution. With AI at the core, SRE teams can now redirect their time towards strategic initiatives and focus on delivering high value to users.

Transform SRE with AIOps

SREs are moving towards AIOps to achieve these main goals:

  • Improved visibility across the organization’s remote & distributed systems
  • Reduced response time through automation
  • Prevention of incidents through proactive operations

The AIOps platform ZIF™ from GAVS allows enterprises focused on digital transformation to become proactive with IT incidents, by delivering AI-led predictions and auto-remediation. ZIF is a unified platform with a centralized NOC, powered by AI-led capabilities for automatic environment discovery, going beyond monitoring to observability, predictive & prescriptive analytics, and automation & self-remediation, enabling outcomes such as:

  • Elimination of digital dirt
  • IT team empowered with end-to-end visibility
  • Breaking away the silos in IT infrastructure systems and operations
  • Intuitive visualization of application health and user experience from the digital delivery chain
  • Increasing precision in intelligent root cause analysis, helping drastically cut mean time to resolution (MTTR)
  • ML algorithms for continuous learning from the environment driving huge improvements with time
  • Zero-touch automation across the spectrum of services, including delivery of cloud-native applications, traditional mainframes, and process workflows

The future of AIOps

Gartner predicts a rapidly growing market size from USD 1.5 billion in 2020. Gartner also claims that the future of IT operations cannot operate without AIOps due to these four main drivers:

  • Redundancy of traditional approaches to handling IT complexities
  • The proliferation of IoT devices, mobile applications & devices, APIs
  • Lack of infrastructure to support IT events that require immediate action
  • Growth of third-party services and cloud infrastructure

AIOps has a strong role in five major areas — anomaly detection, event correlation and advanced data analysis, performance analysis, automation, and IT service management. However, to get the most out of AIOps, it is crucial to choose the right AIOps platform, as selecting the right partner is critical to the success of such an important org initiative. Gartner recommends prioritizing vendors based on their ability to address challenges, data ingestion & analysis, storage & access, and process automation capabilities. We believe ZIF is that AIOps solution for you! For more on ZIF, please visit www.zif.ai.

Introduction to Shift Left Testing

Abdul Riyaz

Never stop until the very end.

The above statement encapsulates the essence of Shift Left Testing.

Quality Assurance should keep up the momentum of testing throughout the end-to-end flow. This ensures Quicker Delivery and a Quality Product, and drives Increased Revenue with higher Profitability. It also helps transform the software development process. Let me elucidate how.

Traditional Testing vs Shift Left Testing

For several decades, Software Development followed the Waterfall Model, in which each phase depends on the deliverables of the previous phase. Over time, the Agile method provided a much better delivery pattern and reduced delivery timelines for projects. In the Agile model, testing is a continuous process that starts at the beginning of a project. If we follow the traditional way of testing only after development, it eventually results in a longer timeline than imagined.

Hence, it is important to start the testing process in parallel with the development cycle, using techniques such as ‘Business-Driven Development’ to make it more effective and shorten delivery timelines. To keep Shift Left Testing intact, the AUT (Application Under Test) should be tested in an automated way. Many proven test automation tools are available today that address this purpose well.


End-to-End Testing Applied over Shifting Left!

Software Testing can be predominantly classified into 3 categories – Unit, Integration, and End-to-End Testing. Traditionally, not all testing shifts left from unit tests toward system tests, but Shift Left Testing changes that. Unit Testing straightforwardly tests basic units of code, while End-to-End Testing validates the final product from the customer’s / user’s perspective. Bringing End-to-End testing to the left gives better visibility of the code and its impact on the entire product during the development cycle itself.

ML (Machine Learning) can best support a shift left of testing towards design and development through continuous testing, visual testing, API coverage, scalable tests and extendable coverage, predictive analytics, and code-less automation.


First Time Right & Quality on Time

Shift Left Testing not only reduces delivery timelines, it also rules out last-minute defects: software flaws and conditions are identified during the development cycle and fixed there, which eventually results in “First Time Right”. The chance of leaking a defect is much lower, and the time development and testing teams spend fixing and retesting the product is also reduced, thereby increasing productivity for “Quality on Time”.

I would like to refer to a research finding by the Ponemon Institute: vulnerabilities detected early in the development process cost around $80 on average to fix, but the same vulnerabilities may cost around $7,600 to fix if detected after they have moved into production.


The Shift left approach emphasizes the need for developers to concentrate on quality from the early stages of a software build, rather than waiting for errors and bugs to be found late in the SDLC.

Machine Learning vs AI vs Shift Left Testing

There are opportunities to leverage ML methods to optimize the continuous integration of an application under test (AUT), beginning almost instantaneously. Making machine learning work is a comparatively small feat; feeding it the right data and the right algorithm is the tough task. In our evolving AI world, gathering data from testing is straightforward. What remains intangible is making practical use of all this data within a reasonable time. A specific instance is the ability to recognize patterns formed within test automation cycles. Why is this important? Patterns are present in the way design specifications change and in the methods programmers use to implement those specifications. Patterns also show up in the results of load testing, performance testing, and functional testing.

ML algorithms are great at pattern recognition. But to make pattern recognition possible, human developers must determine which features in the data might express valuable patterns. Collecting and wrangling the data into a solid form, and knowing which of the many ML algorithms to feed it into, is critical to success.
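As a toy illustration of that feature-engineering step (a hand-rolled heuristic, not a production ML pipeline – the test names and results are invented): here the chosen feature is the number of pass/fail flips across runs, which separates flaky tests from stable ones.

```python
# Illustrative pattern recognition over test-automation history:
# tests that flip between pass and fail are flagged as flaky.
# A real ML pipeline would extract richer features (duration,
# environment, code churn) and feed them to a trained model.
history = {
    "test_login":    ["pass", "pass", "pass", "pass"],
    "test_checkout": ["pass", "fail", "pass", "fail"],
    "test_search":   ["fail", "fail", "fail", "fail"],
}

def classify(results):
    """Label a test from its pass/fail sequence using flip count as the feature."""
    flips = sum(a != b for a, b in zip(results, results[1:]))
    if flips >= 2:
        return "flaky"
    return "stable-fail" if results[-1] == "fail" else "stable-pass"

labels = {name: classify(r) for name, r in history.items()}
print(labels)
```

The point is that the human decided which feature (flip count) exposes the pattern; the algorithm merely applies it at scale.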

Many organizations are striving to adopt shift left in their development process; testing and automation are no longer just QA activities. This indicates that the roles of dedicated developers or testers are fading away. Change is challenging, but there are a few things every team can do to make this shift effective: train developers to take responsibility for testing, add code review quality checks, make testers aware of the code, use the same tools across roles, and always begin with testability in mind.

Shifting left gives a greater ability to automate testing. Test automation provides some critical benefits:

  • Fewer human errors
  • Improved test coverage (running multiple tests at the same time)
  • QA engineers freed up for involvement and innovation beyond day-to-day activities
  • Fewer or no production defects
  • A seamless product development and testing model

Introducing and practicing Shift Left Testing improves the efficiency, effectiveness, and coverage of the testing scope in a software product, which helps delivery and productivity.

About the Author –

Riyaz heads the QA Function for all the IP Projects in GAVS. He has vast experience in managing teams across different domains such as Telecom, Banking, Insurance, Retail, Enterprise, Healthcare etc.

Outside of his professional role, Riyaz enjoys playing cricket and is interested in traveling and exploring things. He is passionate about fitness and bodybuilding and is fascinated by technology.

Anomaly Detection in AIOps

Vimalraj Subash

Before we get into anomalies, let us understand what AIOps is and its role in IT operations. Artificial Intelligence for IT Operations is, simply put, the monitoring and analysis of the large volumes of data generated by IT platforms using Artificial Intelligence and Machine Learning. It helps enterprises with event correlation and root cause analysis to enable faster resolution. Anomalies are probably inevitable, and this is where enough experience and talent is needed to take them to closure.

Let us simplify the significance of anomalies and how they can be identified, flagged, and resolved.

What are anomalies?

Anomalies are instances when performance metrics deviate from normal, expected behavior. There are several ways in which this can occur. However, we'll be focusing on identifying such anomalies using thresholds.

How are they flagged?

With current monitoring systems, anomalies are flagged based on static thresholds: constant values that define the upper limit of normal behavior. For example, CPU usage may be considered anomalous when it rises above a threshold set at 85%. When anomalies are detected, alerts are sent out for the operations team to inspect.
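As a sketch, flagging against a static threshold is just a comparison with a fixed limit; the 85% value and the sample readings below are illustrative, not taken from any particular monitoring tool:

```python
# Illustrative static-threshold check; the limit and readings are
# example values, not from a specific monitoring product.
CPU_THRESHOLD = 85.0  # static upper limit for "normal" CPU usage

def check_cpu(readings, threshold=CPU_THRESHOLD):
    """Return the readings that breach the static threshold."""
    return [r for r in readings if r > threshold]

alerts = check_cpu([42.0, 87.5, 63.1, 91.2])
print(alerts)  # [87.5, 91.2]
```

Each breaching reading would typically become an alert routed to the operations team.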

Why is it important?

Monitoring the health of servers is necessary to ensure the efficient allocation of resources. An unexpected spike or drop in a performance metric such as CPU usage might be the sign of a resource constraint. These problems need to be addressed by the operations team promptly; failing to do so may cause the applications that depend on those servers to fail.

So, what are thresholds, how are they significant?

Thresholds are the limits of acceptable performance. Any value that breaches a threshold is surfaced as an alert and should be investigated and resolved at the earliest. Note that thresholds are set at the tool level, so whenever one is breached, an alert is generated. If set manually, these thresholds can be adjusted based on demand.

There are two types of thresholds:

  1. Static monitoring thresholds: These are fixed values indicating the limits of acceptable performance.
  2. Dynamic monitoring thresholds: These thresholds adapt over time, and this is what an intelligent IT monitoring tool provides. It learns the normal range (both a high and a low threshold) for each point in a day, week, month, and so on. For instance, a dynamic system will know that high CPU utilization is normal during a backup window, while the same utilization is abnormal at other times.
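A dynamic threshold can be sketched by learning a separate baseline for each hour of the day, for example mean plus three standard deviations over historical data. The sample history below (high CPU during a 2 a.m. backup window, low CPU at 2 p.m.) is invented for illustration:

```python
# Sketch of a per-hour dynamic threshold (mean + 3 standard deviations
# learned from history). All sample figures are illustrative.
import statistics
from collections import defaultdict

def learn_thresholds(history):
    """history: iterable of (hour, cpu_pct) -> {hour: upper_limit}."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {h: statistics.mean(v) + 3 * statistics.pstdev(v)
            for h, v in by_hour.items()}

def is_anomaly(hour, value, limits):
    """Flag a reading only if it exceeds the learned limit for that hour."""
    return value > limits[hour]

# High CPU is routine during the 2 a.m. backup, rare at 2 p.m.
history = [(2, v) for v in (88, 90, 92, 89)] + [(14, v) for v in (35, 40, 38, 37)]
limits = learn_thresholds(history)

print(is_anomaly(2, 91, limits))   # False: normal during the backup window
print(is_anomaly(14, 91, limits))  # True: anomalous at midday
```

The same 91% reading is treated differently depending on when it occurs, which is exactly the behavior described above.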

Are there no disadvantages in the threshold way of identifying alerts?

This is definitely not the case. Like most things, it has its fair share of problems. The static-threshold approach in particular has disadvantages, while those of dynamic thresholds are minimal. With the appropriate domain knowledge, however, there are ways to overcome most of them.

Consider this scenario. Imagine a CPU threshold set at 85%. Normally, anything that breaches it is flagged as an anomaly and generates an alert. Now suppose that running at 85% is in fact normal behavior for a particular Virtual Machine (VM). The monitoring tool will generate alerts continuously until the value drops below the threshold. Left unattended, this produces a chain of false positives, and the flood of false alerts may cause the team to miss the actual issue. This can disrupt the entire IT platform and create unnecessary workload for the team; if the platform goes down, it leads to downtime and loss for our clients.

As mentioned, there are ways to overcome this with domain knowledge. Every organization has its own trade secrets for preventing it. With the right knowledge, this behavior can be identified and swiftly resolved.

What do we do now? Should anomalies be resolved?

Of course, anomalies should be resolved at the earliest to prevent the platform from being jeopardized. There are many methods and machine learning techniques for this. Broadly, there are two major machine learning paradigms, Supervised Learning and Unsupervised Learning, and plenty of articles on the internet cover them in depth. In this article, we'll discuss one unsupervised learning technique: Isolation Forest.

Isolation Forest

The algorithm isolates observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature.

The way the algorithm constructs the separation is by first building isolation trees, or random decision trees; the anomaly score is then calculated from the path length required to isolate an observation. Anomalous points tend to be isolated in far fewer splits than normal points, so in a scored scatter plot they stand clearly apart from the dense cluster of normal observations.

Anomaly detection allows you to detect abnormal patterns and take appropriate action. One can use anomaly-detection tools to monitor any data source and identify unusual behavior quickly. It is good practice to research methods to determine the best organizational fit: check with clients, understand their requirements, and tune the algorithms accordingly, hitting the sweet spot that builds a lasting relationship between organization and client.
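To make the intuition concrete, here is a toy, pure-Python rendition of the idea (not a production implementation; a library such as scikit-learn's IsolationForest would be the usual choice). Each "tree" repeatedly splits the data at a random value, and the average depth at which a point ends up alone is its isolation score; anomalies are isolated soonest:

```python
# Toy 1-D isolation-forest sketch: anomalies need fewer random splits
# to end up alone, so they get shorter average path lengths.
import random

def path_length(x, sample, depth=0, limit=8):
    """Depth at which x is isolated by random splits of `sample`."""
    if depth >= limit or len(sample) <= 1:
        return depth
    lo, hi = min(sample), max(sample)
    if lo == hi:
        return depth
    split = random.uniform(lo, hi)
    # Keep only the values on the same side of the split as x.
    side = [v for v in sample if (v < split) == (x < split)]
    return path_length(x, side, depth + 1, limit)

def isolation_score(x, data, n_trees=100):
    """Average isolation depth over many trees (lower = more anomalous)."""
    return sum(path_length(x, data) for _ in range(n_trees)) / n_trees

random.seed(0)
cpu = [48, 50, 52, 49, 51, 50, 47, 53, 50, 95]  # 95% is the outlier
scores = {v: isolation_score(v, cpu) for v in set(cpu)}
print(min(scores, key=scores.get))  # the outlier, 95, has the shortest path
```

The outlier at 95% is typically cut off from the cluster by the very first random split, which is why its average path length is the shortest.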

Zero Incident Framework™, as the name suggests, focuses on driving organizations toward zero incidents. With the knowledge we have accumulated over the years, its anomaly detection has been made as robust as possible, delivering strong outcomes.

About the Author –

Vimalraj is a seasoned Data Scientist who works with vast data sets to break down information, gather relevant points, and solve advanced business problems. He has over 8 years of experience in the Analytics domain and is currently a lead consultant at GAVS.

Reimagining ITSM Metrics

Rama Vani Periasamy

In an IT organization, what is measured as success? Predominantly, it is Key Performance Indicators, internally focused metrics, SLAs, and other numbers. Why don't we shift our performance reporting towards the 'value' delivered to our customers, alongside the contractually agreed service levels? The success of any IT operation comes from defining what it can do to deliver value, and publishing the value actually delivered is the best way to celebrate that success.

It has been a concern that people in service management treat value as trivial and often don't report any real information about the work they do. In other words, the value they have created goes unreported, and the focus lies only on SLA-driven metrics and contractual obligations. This may be because they are more comfortable with the conventional way of demonstrating SLA targets achieved. Eventually, this prevents a business partner from playing a more strategic role.

"Watermelon reporting" is a phrase used to describe a service provider's performance reporting. The SLA reports show that the provider has adhered to the agreed service levels and met all contractual targets; it looks 'green' on the outside, just like a watermelon. However, the level of service perceived by the consumer does not reflect the 'green' status reported (it might actually be 'red', like the inside of a watermelon), and the provider continues to report on metrics that do not address the pain points.

This misses the whole point about understanding what success really means to a consumer. We tend to overlook valuable data: the data that shows how the organization, as a service provider, is delivering value and helping the customer achieve their business goals.

The challenge here is that often consumers have underdeveloped, ambiguous and conflicting ideas about what they want and need. It is therefore imperative to discover the users’ unarticulated needs and translate them into requirements.

For a service provider, a meaningful way of reporting success would focus on outcomes rather than outputs, very much in line with ITIL 4. This creates a demand for better reporting and analysis of delivery, performance, customer success, and value created.

Consider a healthcare provider: the reduced time spent retrieving a patient's history during surgery can be a key business metric, while the number of incidents created or the number of successful changes may be secondary. As a service provider, understanding how their services support such business metrics adds meaning to the service delivered and enables value co-creation.

It is vital that a strong communication avenue is established between the customer and the service provider teams to understand the context of the customer's business. To a large extent, this helps the service provider teams prioritize what they do based on what is critical to the success of the customer or service consumer. More importantly, this enables the provider to become a true partner to their customers.

Taking the service desk as an example: service desk engineers fix printers and laptops and reset passwords. These activities may not provide business value directly, but they help mitigate loss or disruption to the service consumer's business activities. The other principal part of service desk activity is responding to service requests, which is very much an area where the business value delivered to customers can be measured using ITSM.

Easier said than done; but what business value should be reported, and how? Here are some examples that are good enough to get started.

1. Productivity
Assume that every time a laptop problem is fixed within the SLA, it allows the customer to get back to work and be productive. Value can be measured here as the cost avoided, considering the employee's cost per hour, the employee's downtime, and the time spent by the IT team fixing the laptop.
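As a hypothetical worked example of this calculation (all rates and durations are invented figures):

```python
# Hypothetical productivity-cost calculation for one laptop incident.
# All figures are invented for illustration.
employee_cost_per_hour = 60.0  # fully loaded cost of the affected employee
downtime_hours = 1.5           # time the employee could not work
it_cost_per_hour = 45.0        # cost of the engineer doing the fix
fix_hours = 0.5                # engineer time spent on the fix

incident_cost = (employee_cost_per_hour * downtime_hours
                 + it_cost_per_hour * fix_hours)
print(incident_cost)  # 112.5
```

Summed across incidents, this kind of figure turns "laptops fixed within SLA" into a business-value statement: the productivity cost avoided.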

How long does it take for the service provider to give a new employee what they need to be productive? This measure of how long it takes to get people set up with the required resources, and whether this lead time matches the level of agility the business requires, equates to business value.

2. Continual Service Improvement (CSI)

Measuring value becomes meaningless when there is no CSI. Measuring the cost of fixing an incident plus the associated loss of productivity, and then identifying what needs to be done to reduce those costs or avoid the incidents altogether, is where CSI comes into play.

Here are some key takeaways:

  • Make reporting meaningful by demonstrating the value delivered and co-created, uplifting your operations to a more strategic level.
  • Speak to your customers to capture their requirements in terms of value and enable value co-creation as partners.
  • Your report may wind up in the trash not because you have reported the wrong metrics, but because it reports data that is of little importance to your audience.

Reporting value may seem challenging, and it really is. But that’s not the real problem. Keep reporting your SLA and metrics but add more insights to it. Keep an eye on your outcomes and prevent your IT service operations from turning into a watermelon!

About the Author –

Rama is a part of the Quality Assurance group, passionate about ITSM. She loves reading and traveling.
To break the monotony of life and to share her interest in books and travel, she blogs and curates at www.kindleandkompass.com

The Hands that Rock the Cradle, also Crack the Code

Sumit Ganguli

On February 18, 2021, I was attending a video conference, my laptop perched on my standing desk, while I furtively stole glances at the TV in my study. I was excitedly keeping up with the Perseverance rover that was about to land on Mars. I was mesmerized by the space odyssey and nervous about the 'seven minutes of terror', when the engineers overseeing the landing would be unable to guide or direct Perseverance because of the time it takes for a signal to travel between Earth and Mars. The rover would have to perform the landing by itself, with no human guidance involved.

During this time, I thought I saw a masked lady with a ‘bindi’ on her forehead at the NASA control room who was, in her well-modulated American accented voice, giving us a live update of the Rover.

And since that day, Swati Mohan has been all over the news. We have got to know that Mohan was born in Bengaluru, Karnataka, India, and emigrated to the United States when she was one year old. She became interested in space upon seeing Star Trek at age 9. She studied Mechanical and Aerospace Engineering at Cornell University, and did her master’s degree and Ph.D. in Aeronautics and Astronautics at Massachusetts Institute of Technology.

Swati Mohan is the lead for Guidance, Navigation, and Control (GN&C) Operations for the Mars 2020 mission. She led the attitude control system of Mars 2020 during operations and was the lead systems engineer throughout development. She played a pivotal part in the landing, which was rather tricky.

This led me to ruminate about women and how they have challenged stereotypes and status quo to blaze the trail, especially in STEM.

I have been fascinated ever since I learned that the first programmer in the world was a woman, and a daughter of the famed poet Lord Byron, no less. Augusta Ada King, Countess of Lovelace, née Byron, was born in 1815 and was the only legitimate child of the poet Lord Byron and his wife Annabella.

As a teenager, Ada's prodigious mathematical talents led her to the British mathematician Charles Babbage, who became her mentor. Babbage is known as 'the father of computers'. Ada translated an article on the Analytical Engine, which she supplemented with an elaborate set of notes, simply called Notes. These notes contain what many consider to be the first computer program, that is, an algorithm designed to be carried out by a machine. As a result, she is often regarded as the first computer programmer.

Six women were associated with the programming of the first computer, ENIAC: Frances "Betty" Snyder Holberton, Betty "Jean" Jennings Bartik, Kathleen McNulty Mauchly Antonelli, Marlyn Wescoff Meltzer, Ruth Lichterman Teitelbaum, and Frances Bilas Spence. They had no documentation and no schematics to work with. There was no programming language and no operating system; the women had to figure out what the computer was, and then break a complicated mathematical problem down into the very small steps the ENIAC could perform. They physically hand-wired the machine, using switches, cables, and digit trays to route data and program pulses, a complicated and arduous task. These six women were, in effect, the first programmers of the world's first general-purpose electronic computer.

The story goes that on February 14, 1946, the ENIAC was announced as a modern marvel in the US. There was praise and publicity for the Moore School of Electrical Engineering at the University of Pennsylvania, and Eckert and Mauchly, the inventors of ENIAC, were heralded as geniuses. However, none of the key programmers, all of them women, were introduced at the event. Some of the women appeared in photographs later, but everyone assumed they were just models, placed perfunctorily to embellish the photograph.

One of the six programmers, Betty Holberton went on to invent the first sort routine and help design the first commercial computers, the UNIVAC and the BINAC, alongside Jean Jennings. These were the first commercial mainframe computers in the world.

It behooves us to walk down the pages of history and read about women who had during their time decided to #choosetochallenge and celebrate the likes of Swati Mohan who have grown tall on the shoulders of the first women programmers.

About the Author –

Sumit brings over 20 years of rich experience in the international IT and BPO sectors. Prior to GAVS, he served as a member of the Governing Council at a publicly-traded (NASDAQ) IT and BPO company for over six years, where he led strategic consulting, IP and M&A operations.

He has managed global sales and handled several strategic accounts for the company. He has an Advanced Professional Certificate (APC) in Finance from Stern School of Management, NYU, and is a Post Graduate in Management from IIM. He has attended the Owners President Management Program (OPM 52) at HBS and is pursuing a Doctorate in Business Administration at the LeBow School of Business, Drexel University.

He has served as an Adjunct Professor at Rutgers State University, New Jersey teaching International Business. He speaks at various industry forums and is involved in philanthropic initiatives like Artha Forum.

5 Leadership Lessons from the Pandemic to Kickstart your Technology Career in 2021

Jane Aboyoun, CIO, SCO Family of Services

Life is not without its ironies. While the pandemic turbo-charged our dependence on technology for day-to-day activities like never before, it also clarified the importance, as a leader, of being thoughtful and strategic, of taking a step back before leaping into the fray. Here are 5 lessons that helped me navigate the COVID crisis that I believe we can all benefit from carrying forward into 2021 and beyond.

  1.  Slow Down to Speed Up

The necessity of responding effectively to COVID-19 as a Tech Chief compelled me to use my expertise to quickly identify technology solutions that would have an impact for my clients.  While responsiveness in an uncertain climate is essential, it’s actually a strong technology foundation that allows agility and creates ballast for organizations looking to gain competitive advantage in uncertain times.  

Lesson #1 is therefore that while it may not be as inspiring as the latest app, focusing on the “blocking and tackling” and building a strong technology foundation enables agility and re-invention.  As a CIO, I constantly balance possible change opportunities with the readiness of my clients to accept that change. Knowing how far to push my clients is a key part of my role. Just because a technology is available, doesn’t always mean it’s right for them.  Always consider how a new technology fits within the foundation.

  2. Don’t Reinvent the Wheel

My role as the CTO of the New York Public Library proved to be a great training ground in managing the complexity of upgrading infrastructure, moving applications to the cloud, and building a digital repository. I devised a three-part strategy for the transformation. First, I had to upgrade the aging infrastructure. Second, I had to move the infrastructure and the applications into the cloud, to improve our resiliency, security, and functionality. The third was to figure out how to preserve the library's physical assets, which were deteriorating with age. We decided to digitize the assets to preserve them permanently. Within 5 years, the repository held over a petabyte of assets and was continuing to grow. The result was a world-class computing environment that moved a beloved, trusted public city library into the digital 21st century, accessible to future generations. Lesson #2: the secret to our success at NYPL was that the technology platforms and applications we used were all developed by best-of-breed providers. We recognized that we were in the data business rather than the R&D business and, as such, didn't build anything ourselves. Instead, we took pride in working with and learning from industry leaders.

  3. Future-Proof Your Thinking

The pace of change is so much more rapid than it was even five years ago. Being able to recognize that the landscape is evolving, pivot at speed, and adopt new technology within the organization is now an essential skillset for technology leaders. I am personally excited about the 'internet of things' (IoT) and the data being collected at the edge, which will be enhanced by 5G capabilities. Also, AI and ML are on the cusp of a 'next level' leap. I think there are many good applications for them; we just need to figure out how to use them responsibly. Lesson #3 is that as technology leaders we need to be constantly looking around corners and remain open-minded and curious about what's next. It is important for all leaders and aspiring leaders to ask questions and challenge the status quo.

  4. The Human Factor Remains a Top Priority

New technology comes with its own set of challenges. I believe the issues of privacy and security are the most pressing. Data is being collected everywhere and has often proved to be more valuable than the platform it sits on. Hence, it is paramount to understand evolving data and privacy standards, as well as how to secure data and identify breaches. Then there are also moral and ethical issues around AI. While the opportunities are limitless, it is of utmost importance that we maintain our moral and democratic compass and apply technology in a way that benefits society. Lesson #4 is that while it's challenging to get the balance between innovation, opportunity, and ethics right, it's a battle worth fighting.

  5. Facts Matter – Strive for Balance

Another issue for me is information overload. Knowing what is real and what isn't has never been more important. This is where go-to trusted news and academic sources come into play. Two influencers I follow are Dan Fagella from EMERJ and Bernard Marr. Both focus on AI, and it's motivating to hear and read what they have to say. I also read the technology review from MIT and listen to several technology podcasts. Lesson #5 is that it's critical to continue to seek knowledge and to make a point of agnostically learning from other technologists, business people, and vendors. Doing your own research and triangulation in the age of 'alternative facts' ensures that you stay informed and relevant and are able to separate fact from fiction.

In summary, as we enter the ‘Next Normal’, I anticipate that the pace of change will be faster than ever.  However, it’s important to remember that it’s not technology that leads the way, it’s people.  Staying in touch with technology trends and solutions is obviously important, but so is staying in touch with your values and humanity.  At the end of the day, technology is just an enabler and it’s the human values we apply to it that make the difference in how impactful it will be.

About the Author –

Jane Aboyoun is the Chief Information Officer at SCO Family of Services, a non-profit agency that helps New Yorkers build a strong foundation for the future. In this role, Jane is responsible for leading SCO’s technology strategy, and managing the agency’s technology services to support business applications, architecture, data, engineering, and computing infrastructure.

As an accomplished CIO / CTO, Jane has spent 20 years in the C-suite in a variety of senior technology leadership roles for global, world-class brands such as Nestlé Foods, KPMG, Estēe Lauder Companies, Walt Disney Company, and the New York Public Library.

Work Life Balance is Passé – Five Atomic Habits of Women who #ChooseToChallenge

Padma Ravichandran

"The goal is not to read a book; the goal is to become a reader. The goal is not to run a marathon; the goal is to become a runner," says James Clear in his book Atomic Habits. This idea, that identity emerges out of habits, made me ponder the atomic habits of working women, especially the ones who say it is not difficult to have it all. With the onset of the pandemic, social media saw a surge of people sharing a typical workday, and organizations started recognizing the power of the authentic self; what we had attempted to fathom for years happened seamlessly: work-life integration. But those who know how to Lean In and #ChooseToChallenge have realized that work-life balance is passé, and have been focusing on atomic habits to create work-life harmony.

As we march into the month of International Women’s Day with this year’s theme of #ChooseToChallenge, here are some conscious habits that I have observed, and got inspired by in Women who Lean In –

  1. Have a vision of what you want to be – and align it with your purpose and values.

Thinking long term to stay in the game requires a focus on values. With the power of visualization, hurdles are easier to surmount; your mind stays aligned to your vision and crosses the challenges that come in the way. Women who #ChooseToChallenge focus on the traits that make them successful at work, such as organizing skills, team collaboration, and transparency, which also help them be 'successful' parents! Sometimes we must find the model that aligns with our purpose, innovate a little, and ask for specifics. This not only helps build trust but also enables one to create an impact.

  2. Know how to focus, when at work.

Women who #ChooseToChallenge always strive to have an internal positive monologue where work brings intrinsic joy. When we structure our day for success, prioritizing automatically falls in place. Knowing how not to take a bad day home, or vice versa takes endurance and unwavering focus. One of the key tips to staying focused is to recharge oneself. Despite the structured rituals and planning, ensuring there are pockets of freedom, where you can invest in your personal development, kindles more innovation.

  3. Understand the power of relationships.

It is not just about understanding and investing in the power of relationships at work, but in all spheres of life. Purpose-driven organizations do not take a command-and-control approach to work; they focus on nurturing relationships, encourage everyone to bring their most authentic self to work, and enable you to find the right anchors and mentors. This allows one to ask for direction and keep rebalancing. It can even mean collaborating with your kids' teachers, setting meaningful expectations with partners, or having honest conversations with co-workers, in a spirit of respect and trust, which in turn builds a valued community of support.

  4. Define self-care, more broadly.

When one chooses to challenge, the buck doesn't stop at taking care of health and fitness; self-care extends to emotions, environment, relationships, time, and resources, because it contributes to enhanced creativity, faster learning, a sharper memory, and of course elevated moods, all of which affect workplace performance. Self-care at work could be surrounding ourselves with inspiring and supportive people or updating our workspace with inspiring artwork.

  5. Present yourself authentically.

When choosing to challenge, perhaps the status quo, women are mindful that a perfect equilibrium is not possible, and they know how and where to get help when one aspect takes center stage. We all intuitively know our authentic self, but sometimes we shield it even from ourselves; it takes courage to be authentic. Learn to say no respectfully, and step away if something is veering you off your authentic self. When we are our authentic selves, it is easy to have conversations with key stakeholders on where we need help and to navigate forward to pursue what we care about most in every aspect of our life.

Work, Self, Home, and Community are not separate chambers with different identities. Attempting to integrate the aspects and the different roles we play in each, by focusing on the larger purpose helps us to be more engaged and productive in all the segments of life.

Reference 

www.hbr.org

About the Author –

Padma Ravichandran is part of the Talent Management Team and is intrigued by organization culture and behaviors at the workplace that impact employee experience. She is also passionate about driving meaningful initiatives to enable women to Lean In, along with her fellow Sheroes. She enjoys reading books, journaling, yoga, and learning more about life through the eyes of her 8-year-old son.