Observability versus Monitoring

Sri Chaganty

“Observability” has become a key trend in Service Reliability Engineering practice.  One of the recommendations from Gartner’s latest Market Guide for IT Infrastructure Monitoring Tools released in January 2020 says, “Contextualize data that ITIM tools collect from highly modular IT architectures by using AIOps to manage other sources, such as observability metrics from cloud-native monitoring tools.”

Like so many other terms in software engineering, ‘observability’ is a term borrowed from an older physical discipline: in this case, control systems engineering. Let me use the definition of observability from control theory in Wikipedia: “observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.”

Observability is gaining attention in the software world because of its effectiveness at enabling engineers to deliver excellent customer experiences with software despite the complexity of the modern digital enterprise.

When we blew up the monolith into many services, we lost the ability to step through our code with a debugger: it now hops the network.  Monitoring tools are still coming to grips with this seismic shift.

How is observability different from monitoring?

Monitoring requires you to know what you care about before you know you care about it. Observability allows you to understand your entire system and how it fits together, and then use that information to discover what specifically you should care about when it’s most important.

Monitoring requires you to already know what normal is. Observability allows discovery of different types of ‘normal’ by looking at how the system behaves, over time, in different circumstances.

Monitoring asks the same questions over and over again. Is the CPU usage under 80%? Is memory usage under 75%? Is the latency under 500ms? This is valuable information, but monitoring is only useful for known problems.

Observability, on the other hand, is about asking different questions almost all the time. You discover new things.

Metrics do not equal observability.

What Questions Can Observability Answer?

Below are sample questions that can be addressed by an effective observability solution:

  • Why is x broken?
  • What services does my service depend on — and what services are dependent on my service?
  • Why has performance degraded over the past quarter?
  • What changed? Why?
  • What logs should we look at right now?
  • What is system performance like for our most important customers?
  • What SLO should we set?
  • Are we out of SLO?
  • What did my service look like at time point x?
  • What was the relationship between my service and x at time point y?
  • What was the relationship of attributes across the system before we deployed? What’s it like now?
  • What is most likely contributing to latency right now? What is most likely not?
  • Are these performance optimizations on the critical path?

About the Author –

Sri is a Serial Entrepreneur with over 30 years’ experience delivering creative, client-centric, value-driven solutions for bootstrapped and venture-backed startups.

Business with a Heart

Balaji Uppili

People and technology are converging like never before, as the world is gripped by COVID-19. Just a few months ago, nobody could have predicted or foreseen the way businesses are having to work today. As we were strategizing on corporate governance, digital transformation, and the best of resiliency plans to ensure business continuity, no one ever anticipated the scale and enormity of COVID-19.

Today, it has become obvious that COVID-19 has brought about the convergence of technology and humanity, and that it can change the way businesses work and function. While we as leaders have been thinking largely about business outcomes, this pandemic has triggered a more humane approach, and that approach is here to stay. The humane approach will be the differentiator and will prove to be the winner.

There is no doubt that this pandemic has brought an urgent need to accelerate our digital capabilities. With the focus on strong IT infrastructure and remote working, workforces were able to transition to working from home and meeting through video conferencing. Surprisingly, this has increased the humane aspect of business relations: it has now become alright for both parties to see children, spouses, or pets in meeting backgrounds, and that in itself has broken down huge barriers and formalities. It is refreshing to see the emerging empathy that grows stronger with every meeting, increasing collaboration and communication.

It is becoming increasingly clear that we have overlooked the important factor of how people have been showing up to work. Suddenly it is more visible that people have equally strong roles within the family. When we see parents having to home-school their children, or attend to other care obligations, we are viewing their personal lives and are able to empathize with them more. We are seeing the impact that business can have on people and their personal lives, and this is a never-before opportunity for leaders to put our people first.

And with customers being the center of every business, not being able to hold in-person meetings has warranted newer ways to collaborate and has strengthened customer-centricity initiatives even more. It has become evident that no matter how much we as leaders think about automating operations, it is human connections that run businesses successfully.

Many things have been unraveled: important business imperatives like the criticality of clean workspace compliance, and the fact that offshoring thousands of miles away is not a compromise but a very cost-effective and efficient way of getting things done. Productivity has also increased; the work done so far has had a positive impact of at least 20%, or even more in certain situations. As boundaries and barriers are broken, the rules about who should work on something and when they should work on it have become less rigid. Employees are less regimented about time. Virtual crowd outsourcing has become the norm: you throw an idea at a group of people and whoever has the ability and the bandwidth to handle the task takes care of it, instead of a formal task assignment, and this highlights the fungibility of people.

All in all, the reset in execution processes and the introduction of a much more humane approach are here to stay, and they make the new norm even more exciting.

About the Author –

Balaji has over 25 years of experience in the IT industry, across multiple verticals. His enthusiasm, energy, and client focus are a rare gift, and he plays a key role in bringing new clients into GAVS. Balaji heads the Delivery department and passionately works on customer delight. He says work is worship for him, and he enjoys watching cricket, listening to classical music, and visiting temples.

JAVA – Cache Management

Sivaprakash Krishnan

This article explores the various Java caching technologies that can play critical roles in improving application performance.

What is Cache Management?

A cache is a temporary, high-speed memory buffer that stores the most frequently used data, such as live transactions and logical datasets. This greatly improves the performance of an application, as reads and writes happen in the memory buffer, reducing retrieval time and load on the primary source. Implementing and maintaining a cache in any Java enterprise application is important.

  • The client-side cache is used to temporarily store static data transmitted over the network from the server, to avoid unnecessary calls to the server.
  • The server-side cache could be a query cache, CDN cache or a proxy cache where the data is stored in the respective servers instead of temporarily storing it on the browser.

Adoption of the right caching technique and tools allows the programmer to focus on the implementation of business logic, leaving backend complexities like cache expiration, mutual exclusion, spooling, and cache consistency to the frameworks and tools.

Caching should be designed specifically for the environment, considering single/multiple JVMs and clusters. Given below are multiple scenarios where caching can be used to improve performance.

1. In-process Cache – The in-process/local cache is the simplest cache, where the cache store is effectively an object accessed inside the application process. It is much faster than any cache accessed over a network and is strictly available only to the process that hosts it. A minimal sketch is shown after the list below.

  • If the application is deployed only in one node, then in-process caching is the right candidate to store frequently accessed data with fast data access.
  • If the in-process cache is to be deployed in multiple instances of the application, then keeping data in-sync across all instances could be a challenge and cause data inconsistency.
  • An in-process cache can bring down the performance of any application where the server memory is limited and shared. In such cases, the garbage collector will be invoked more often to clean up objects, which may lead to performance overhead.
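
Below is a minimal, illustrative sketch of an in-process cache (not taken from any framework): a time-aware map held inside the JVM. The class and field names (InProcessCache, ttlMillis) are hypothetical.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Minimal in-process cache sketch: entries live inside the application's own JVM heap.
    // Names are illustrative, not from any caching library.
    public class InProcessCache<K, V> {

        private static final class Entry<V> {
            final V value;
            final long expiresAt;
            Entry(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
        }

        private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();
        private final long ttlMillis;

        public InProcessCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

        public void put(K key, V value) {
            store.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
        }

        public V get(K key) {
            Entry<V> e = store.get(key);
            if (e == null) return null;
            if (e.expiresAt < System.currentTimeMillis()) {  // expired: evict lazily on read
                store.remove(key);
                return null;
            }
            return e.value;
        }
    }

Because the cache lives on the same heap as the application, a lookup is just a method call; that is also why the memory and garbage-collection caveats above apply to it.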

In-Memory Distributed Cache

Distributed caches are built externally to an application; they support reads/writes to and from data repositories, keep frequently accessed data in RAM, and avoid continuously fetching data from the data source. Such caches can be deployed on a cluster of multiple nodes, forming a single logical view. A sketch follows the list below.

  • In-memory distributed cache is suitable for applications running on multiple clusters where performance is key. Data inconsistency and shared memory aren’t matters of concern, as a distributed cache is deployed in the cluster as a single logical state.
  • As inter-process communication is required to access the cache over a network, latency, failures, and object serialization are overheads that could degrade performance.
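
A hedged sketch of an in-memory distributed cache, assuming the Hazelcast library (which is commonly positioned both as a distributed cache and as an in-memory data grid); the map name "orders" is illustrative:

    import java.util.Map;

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;

    // Every node that runs this code joins the same cluster and sees one logical map,
    // partitioned across all cluster members.
    public class DistributedCacheExample {
        public static void main(String[] args) {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();

            Map<String, String> orders = hz.getMap("orders");  // distributed map
            orders.put("order-1001", "CONFIRMED");
            System.out.println(orders.get("order-1001"));

            hz.shutdown();
        }
    }

Reads and writes go over the network whenever the owning partition sits on another member, which is where the serialization and latency overheads mentioned above come from.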

2. In-memory database

An in-memory database (IMDB) stores data in main memory instead of on disk to produce quicker response times. Queries are executed directly on the dataset stored in memory, avoiding frequent reads/writes to disk, which provides better throughput and faster response times. It provides a configurable data persistence mechanism to avoid data loss.

Redis is an open-source in-memory data structure store used as a database, cache, and message broker. It offers data replication, different levels of persistence, high availability (HA), and automatic partitioning that improves read/write performance.

Replacing the RDBMS with an in-memory database will improve the performance of an application without changing the application layer.
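
A hedged sketch of read-through caching against Redis, assuming the Jedis client and a Redis instance on the default local port; the key format and the loadFromDatabase helper are illustrative:

    import redis.clients.jedis.Jedis;

    // Look the value up in Redis first; on a miss, load it from the primary source
    // and cache it with a time-to-live of 300 seconds.
    public class RedisCacheExample {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                String key = "customer:42";
                String cached = jedis.get(key);           // null if the key is absent
                if (cached == null) {
                    String value = loadFromDatabase(42);  // stand-in for an expensive query
                    jedis.setex(key, 300, value);
                    cached = value;
                }
                System.out.println(cached);
            }
        }

        private static String loadFromDatabase(int id) {
            return "customer-record-" + id;
        }
    }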

3. In-Memory Data Grid

An in-memory data grid (IMDG) is a data structure that resides entirely in RAM and is distributed among multiple servers.

Key features

  • Parallel computation of the data in memory
  • Search, aggregation, and sorting of the data in memory
  • Transaction management in memory
  • Event-handling

Cache Use Cases

There are use cases where a specific caching approach should be adopted to improve the performance of the application.

1. Application Cache

The application cache stores web content that can be accessed offline. Application owners/developers have the flexibility to configure what to cache and make it available for offline users. It has the following advantages:

  • Offline browsing
  • Quicker retrieval of data
  • Reduced load on servers

2. Level 1 (L1) Cache

This is the default transactional cache per session. It can be managed by any Java Persistence API (JPA) implementation or object-relational mapping (ORM) tool.

The L1 cache stores entities that fall under a specific session; the entries are cleared once the session is closed. If there are multiple transactions inside one session, entities from all these transactions are stored.
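
A minimal sketch of the L1 cache in action with plain JPA (the javax.persistence namespace is assumed; newer versions use jakarta.persistence). The persistence unit name "app-unit" and the Customer entity are illustrative:

    import javax.persistence.Entity;
    import javax.persistence.EntityManager;
    import javax.persistence.EntityManagerFactory;
    import javax.persistence.Id;
    import javax.persistence.Persistence;

    public class L1CacheExample {
        public static void main(String[] args) {
            EntityManagerFactory emf = Persistence.createEntityManagerFactory("app-unit");
            EntityManager em = emf.createEntityManager();

            Customer first  = em.find(Customer.class, 1L);  // hits the database
            Customer second = em.find(Customer.class, 1L);  // served from the session (L1) cache

            System.out.println(first == second);            // true: same managed instance

            em.close();   // closing the session clears its L1 cache
            emf.close();
        }
    }

    @Entity
    class Customer {
        @Id
        Long id;
        String name;
    }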

3. Level 2 (L2) Cache

The L2 cache can be configured to provide custom caches that hold data for all entities to be cached. It is configured at the session-factory level and exists as long as the session factory is available. Data cached at this level can be shared by the following (a configuration sketch follows the list):

  • Sessions in an application.
  • Applications on the same servers with the same database.
  • Application clusters running on multiple nodes but pointing to the same database.
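
A minimal configuration sketch for the L2 cache, assuming Hibernate as the JPA provider; the Product entity and the property values in the comments are illustrative:

    import javax.persistence.Cacheable;
    import javax.persistence.Entity;
    import javax.persistence.Id;

    import org.hibernate.annotations.Cache;
    import org.hibernate.annotations.CacheConcurrencyStrategy;

    // Also requires, in persistence.xml or hibernate.cfg.xml (values are illustrative):
    //   hibernate.cache.use_second_level_cache = true
    //   hibernate.cache.region.factory_class   = <the chosen cache provider's region factory>
    @Entity
    @Cacheable
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
    public class Product {

        @Id
        private Long id;

        private String name;

        // getters/setters omitted for brevity
    }

With this in place, loading the same Product from different sessions created by the same session factory can be served from the shared L2 region instead of the database.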

4. Proxy / Load balancer cache

Enabling this reduces the load on application servers. When similar content is queried/requested frequently, the proxy serves the content from its cache rather than routing the request back to the application servers.

When a dataset is requested for the first time, the proxy saves the response from the application server to a disk cache and uses it to respond to subsequent client requests without having to route them back to the application server. Apache, NGINX, and F5 support proxy caching.

5. Hybrid Cache

A hybrid cache is a combination of JPA/ORM frameworks and open-source caching services. It is used in applications where response time is a key factor.

Caching Design Considerations

  • Data loading/updating
  • Performance/memory size
  • Eviction policy
  • Concurrency
  • Cache statistics

1. Data Loading/Updating

Data loading into a cache is an important design decision to maintain consistency across all cached content. The following approaches can be considered to load data:

  • Using default function/configuration provided by JPA and ORM frameworks to load/update data.
  • Implementing key-value maps using open-source cache APIs.
  • Programmatically loading entities through automatic or explicit insertion.
  • Loading data from an external application through synchronous or asynchronous communication.

2. Performance/Memory Size

Resource configuration is an important factor in achieving the performance SLA. Available memory and CPU architecture play a vital role in application performance. Available memory has a direct impact on garbage collection performance. More GC cycles can bring down the performance.

3. Eviction Policy

An eviction policy enables a cache to ensure that its size doesn’t exceed the maximum limit. The eviction algorithm decides which elements can be removed from the cache, depending on the configured eviction policy, thereby creating space for new datasets.

There are various popular eviction algorithms used in cache solutions (a minimal LRU sketch follows the list):

  • Least Recently Used (LRU)
  • Least Frequently Used (LFU)
  • First In, First Out (FIFO)
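
A minimal LRU sketch using the JDK's LinkedHashMap in access-order mode; the class name LruCache is illustrative:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // When the map grows past maxEntries, the least recently used entry is evicted.
    public class LruCache<K, V> extends LinkedHashMap<K, V> {

        private final int maxEntries;

        public LruCache(int maxEntries) {
            super(16, 0.75f, true);   // accessOrder = true reorders entries on get()
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
        }
    }

An instance such as new LruCache<String, String>(1000) behaves like an ordinary Map and silently drops the least recently used entry once the limit is crossed.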

4. Concurrency

Concurrency is a common issue in enterprise applications. It can create conflicts and leave the system in an inconsistent state. It can occur when multiple clients try to update the same data object at the same time, for example during a cache refresh. A common solution is to use a lock, but this may affect performance. Hence, optimization techniques should be considered.
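
One lightweight optimization, sketched below with the JDK's ConcurrentHashMap, is to make the refresh atomic per key so that concurrent misses do not trigger duplicate loads; the loadFromSource helper is illustrative:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // computeIfAbsent runs the loader at most once per key, even when many
    // threads miss on the same key at the same time.
    public class ConcurrentRefreshExample {

        private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

        public static String getConfig(String key) {
            return CACHE.computeIfAbsent(key, k -> loadFromSource(k));
        }

        private static String loadFromSource(String key) {
            return "value-for-" + key;   // stand-in for an expensive lookup
        }
    }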

5. Cache Statistics

Cache statistics are used to identify the health of the cache and provide insights about its behavior and performance. The following attributes can be used (a short sketch of reading them follows the list):

  • Hit count: Indicates the number of times the cache lookup has returned a cached value.
  • Miss count: Indicates the number of times the cache lookup has returned null, a newly loaded value, or an uncached value.
  • Load success count: Indicates the number of times the cache lookup has successfully loaded a new value.
  • Total load time: Indicates the time spent (in nanoseconds) loading new values.
  • Load exception count: Indicates the number of exceptions thrown while loading an entry.
  • Eviction count: Indicates the number of entries evicted from the cache.
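
A short sketch of reading these attributes, assuming the Guava caching library; the loader and key names are illustrative:

    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheLoader;
    import com.google.common.cache.CacheStats;
    import com.google.common.cache.LoadingCache;

    // recordStats() must be enabled, otherwise all counters stay at zero.
    public class CacheStatsExample {
        public static void main(String[] args) throws Exception {
            LoadingCache<String, String> cache = CacheBuilder.newBuilder()
                    .maximumSize(1_000)
                    .recordStats()
                    .build(CacheLoader.from(key -> "value-for-" + key));

            cache.get("a");   // miss + load
            cache.get("a");   // hit

            CacheStats stats = cache.stats();
            System.out.println("hits=" + stats.hitCount()
                    + " misses=" + stats.missCount()
                    + " loadSuccess=" + stats.loadSuccessCount()
                    + " loadExceptions=" + stats.loadExceptionCount()
                    + " totalLoadTimeNs=" + stats.totalLoadTime()
                    + " evictions=" + stats.evictionCount());
        }
    }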

Various Caching Solutions

There are various Java caching solutions available — the right choice depends on the use case.

At GAVS, we focus on building a strong foundation of coding practices. We encourage and implement the “Design First, Code Later” principle and “Design-Oriented Coding Practices” to bring design thinking and an engineering mindset to building stronger solutions.

We have been training and mentoring our talent on cutting-edge JAVA technologies, building reusable frameworks, templates, and solutions in major areas like Security, DevOps, Migration, Performance, etc. Our objective is to “Partner with customers to realize business benefits through effective adoption of cutting-edge JAVA technologies, thereby enabling customer success”.

About the Author –

Sivaprakash is a solutions architect with strong solutioning and design skills. He is a seasoned expert in JAVA, Big Data, DevOps, Cloud, Containers, and Microservices. He has successfully designed and implemented a stable monitoring platform for ZIF. He has also designed and driven Cloud assessment/migration, enterprise BRMS, and IoT-based solutions for many of our customers. At present, his focus is on building ‘ZIF Business’, a new-generation AIOps platform aligned to business outcomes.

Hyperautomation

Bindu Vijayan

According to Gartner, “Hyper-automation refers to an approach in which organizations rapidly identify and automate as many business processes as possible. It involves the use of a combination of technology tools, including but not limited to machine learning, packaged software and automation tools to deliver work”. Gartner also lists hyperautomation among the year’s top 10 strategic technology trends.

It is expected that by 2024, organizations will be able to lower their operational costs by 30% by combining hyperautomation technologies with redesigned operational processes. According to Coherent Market Insights, “Hyper Automation Market will Surpass US$ 23.7 Billion by the end of 2027. The global hyper automation market was valued at US$ 4.2 Billion in 2017 and is expected to exhibit a CAGR of 18.9% over the forecast period (2019-2027).”

How it works

To put it simply, hyperautomation uses AI to dramatically enhance automation technologies and augment human capabilities. The spectrum of tools it uses, such as Robotic Process Automation (RPA), Machine Learning (ML), and Artificial Intelligence (AI), all functioning in sync to automate complex business processes (even those that once called for inputs from SMEs), makes it a powerful tool for organisations in their digital transformation journey.

Hyperautomation brings robotic intelligence into the traditional automation process and enhances the completion of processes, making them more efficient, faster, and error-free. By combining AI tools with RPA, the technology can automate almost any repetitive task; it automates the automation by identifying business processes and creating bots to automate them. It calls for different technologies to be leveraged, which means the businesses investing in it should have the right tools, and the tools should be interoperable. The main feature of hyperautomation is that it merges several forms of automation that work seamlessly together, so a hyperautomation strategy can consist of RPA, AI, advanced analytics, intelligent business management, and so on. With RPA, bots are programmed to get into software, manipulate data, and respond to prompts. RPA can be as complex as handling multiple systems through several transactions, or as simple as copying information from applications. Combined with the concept of Process Automation or Business Process Automation, which enables the management of processes across systems, it can help streamline processes to increase business performance.

The tool or the platform should be easy to use and, importantly, scalable; investing in a platform that can integrate with the existing systems is crucial. The selection of the right tools is what Gartner calls “architecting for hyperautomation.”

Impact of hyperautomation

Hyperautomation has huge potential to impact the speed of digital transformation for businesses, given that it automates complex work which is usually dependent on inputs from humans. With the work moved to intelligent digital workers (RPA with AI) that can perform repetitive tasks endlessly, human performance is augmented. These digital workers can then become real game-changers with their efficiency and capability to connect to multiple business applications, discover processes, work with voluminous data, and analyse it in order to arrive at decisions for further/new automation.

Being able to leverage previously inaccessible data and processes and automate them often results in the creation of a digital twin of the organization (DTO): virtual models of every physical asset and process in the organization. Sensors and other devices monitor digital twins to gather vital information on their condition, and insights are gathered regarding their health and performance. The more data there is, the smarter the systems get, and they are able to provide sharp insights that can thwart problems, help businesses make informed decisions on new services/products, and in general enable informed assessments. Having a DTO throws light on hitherto unknown interactions between functions and processes, and on how they can drive value and business opportunities. That is powerful: you get to see the business outcome it brings in as it happens, or the negative effect it causes, and that sort of intelligence within the organization is a powerful tool for making very informed decisions.

Hyperautomation is the future, an unavoidable market state

“Hyperautomation is an unavoidable market state in which organizations must rapidly identify and automate all possible business processes.” – Gartner

It is interesting to note that some companies are coming up with no-code automation. Creating tools that can be easily used even by those who cannot read or write code can be a major advantage. For example, if employees are able to automate the multiple processes that they are responsible for, hyperautomation can help get more done at a much faster pace, sparing time for them to get involved in planning and strategy. This brings more flexibility and agility within teams, as automation can be managed by the teams for the processes that they are involved in.

Conclusion

With hyperautomation, it becomes easy for companies to actually see the ROI they are realizing from the processes that have been automated, with clear visibility of the time and money saved. Hyperautomation enables seamless communication between different data systems, providing organizations with flexibility and digital agility. Businesses enjoy the advantages of increased productivity, quality output, greater compliance, better insights, advanced analytics, and of course automated processes. It allows machines to gain real insights into business processes and understand them well enough to make significant improvements.

“Organizations need the ability to reconfigure operations and supporting processes in response to evolving needs and competitive threats in the market. A hyperautomated future state can only be achieved through hyper agile working practices and tools.”  – Gartner

Assess Your Organization’s Maturity in Adopting AIOps

Anoop Aravindakshan

Artificial Intelligence for IT Operations (AIOps) is being adopted by organizations to deliver tangible business outcomes. These outcomes have a direct impact on companies’ revenue and customer satisfaction.

A survey from AIOps Exchange 2019 reports that 84% of the business owners who participated confirmed that they are actively evaluating AIOps for adoption in their organizations.

So, is AIOps just automation? Absolutely NOT!

Artificial Intelligence for IT Operations implies the implementation of truly autonomous Artificial Intelligence in ITOps, which needs to be adopted as an organization-wide strategy. Organizations will have to assess their existing landscape and processes, and decide where to start. That is the only way to achieve a true implementation of AIOps.

Every organization trying to evaluate AIOps as a strategy should read through this article to understand their current maturity, and then move forward to reach the pinnacle of Artificial Intelligence in IT Operations.

The primary success factor in adopting AIOps is derived from the Business Outcomes the organization is trying to achieve by implementing AIOps – that is the only way to calculate ROI.

There are 4 levels of maturity in AIOps adoption. Based on our experience in developing an AIOps platform and implementing it across multiple industries, we have arrived at these 4 levels. Assessing an organization against each of these levels helps in achieving the goal of TRUE Artificial Intelligence in IT Operations.

Level 1: Knee-jerk

Events and logs are generated in silos and collected from various applications and devices in the infrastructure. These are used to generate alerts that are passed on to command centres for escalation as per the defined SOPs (standard operating procedures). The engineering teams work in silos, unaware of the business impact that these alerts could potentially create. Here, operations are very reactive, which could cost the organization millions of dollars.

Level 2: Unified

All events, logs, and alerts are integrated into one central location. ITSM processes are unified. This helps in breaking silos, and engineering teams are better prepared to tackle business impacts. SOPs have been adjusted since the process is unified, but this is still reactive incident management.

Level 3: Intelligent

Machine Learning algorithms (either supervised or unsupervised) have been implemented on the unified data to derive insights. There are baseline metrics that are calibrated and used as a reference for future events. With more data, the metrics get richer. The IT operations team can correlate incidents/events with business impacts by leveraging AI & ML. If the Mean Time To Resolve (MTTR) an incident has been reduced by automated identification of the root cause, then the organization has attained level 3 maturity in AIOps.

Level 4: Predictive & Autonomous

The pinnacle of AIOps is level 4. If incidents and performance degradation of applications can be predicted by leveraging Artificial Intelligence, it implies improved application availability. Autonomous remediation bots can be triggered spontaneously based on the predictive insights, to fix incidents that are prone to happen in the enterprise. Level 4 is a paradigm shift in IT operations – moving operations entirely from being reactive, to becoming proactive.

Conclusion

As IT operations teams move up each level, the essential goal to keep in mind is the long-term strategy to be attained by adopting AIOps. Artificial Intelligence has matured over the past few decades, and it is up to AIOps platforms to embrace it effectively. While choosing an AIOps platform, measure the maturity of the platform’s artificial intelligence coefficient.

About the Author:

An evangelist of the Zero Incident Framework™, Anoop has been a part of the product engineering team for a long time and has recently forayed into product marketing. He has over 14 years of experience in Information Technology across various verticals, which include Banking, Healthcare, Aerospace, Manufacturing, CRM, Gaming, and Mobile.

Creating Purposeful Corporations, In pursuit of Conscious Capitalism

Sumit Ganguli

“More than 8 million metric tons of plastic leak into the ocean every year, so building infrastructure that stops plastic before it gets into the ocean is key to solving this issue,” said H. Fisk Johnson, Chairman and CEO of SC Johnson. SC Johnson, an industry-leading manufacturer of household consumer brands, has launched a global partnership to stop plastic waste from entering the ocean and fight poverty.

In August 2019, 42 years after its inception, Business Roundtable, which has periodically issued Principles of Corporate Governance with an emphasis on serving shareholders, released a new Statement on the Purpose of a Corporation. This new statement was signed by 181 CEOs who have committed to lead their companies to benefit all stakeholders – customers, employees, suppliers, communities, and shareholders. Jamie Dimon, Chairman and CEO of JPMorgan Chase & Co., is the Chairman of Business Roundtable. He went on to say, “The American dream is alive, but fraying. Major employers are investing in their workers and communities because they know it is the only way to be successful over the long term. These modernized principles reflect the business community’s unwavering commitment to continue to push for an economy that serves all Americans.”

Today the definition of corporate purpose seems to be changing. Companies are now focused on the environment and all the stakeholders. According to a Harvard Business School survey, there is growing ambivalence about a capitalism that only promotes the pursuit of wealth.

But this is a far cry from when we were growing up in India as youths in the 1980s. Our definition of personal success was to expeditiously acquire wealth. Most of us who were studying Engineering, Medicine, or pursuing other professional degrees were all looking for a job that would sustain us and support our immediate family. The other option was to emigrate to America or other developed countries for further studies and make a life here – to celebrate Capitalism in all its glory.

In India, we were quite steeped in religious festivals and rituals. We attended Baal Mandir and had moral science in school, but the concepts of Service, Altruism, Seva, and Sharing were largely platitudes; they were not a part of our daily lives. There was an inbuilt cynicism about charity, and we never felt that when we grew up, we would need to think about the greater good of society.

And that is where Conscious Capitalism comes in. Instead of espousing Ayn Rand’s version of scorched-earth capitalism, “Selfishness is a Virtue”, or blindly following Gordon Gekko’s “Greed is good”, the media, parents, teachers, and influence makers could promote and ingrain in the youth, students, and people at large that there is merit in wealth creation, but it can be infused with altruism. We could celebrate the successful who also share. This could dispel the notion that charity and the sharing of wealth are only for the rich and the famous.

America gets criticized for many things around the world, but the world often overlooks the fact that the largest amount of charity and donations has come from the USA. The Bill & Melinda Gates Foundation, Warren Buffett, Larry Ellison of Oracle (who has pledged a significant portion of his wealth to the Bill & Melinda Gates Foundation), Mark Zuckerberg of Facebook, and many others have absolutely embraced the concept of Conscious Capitalism for their corporations. But what would really broaden the pyramid would be early entrepreneurs and upcoming executives also engaging in sharing and giving, not waiting till they reach the pinnacle of success. We cannot expect only governmental initiatives to support the underprivileged. We need to celebrate Conscious Capitalism and the entrepreneurs and business leaders who are pursuing their dreams and also sharing some portion of their wealth with society.

At GAVS and through the Private Equity firm Basil Partners we are privileged to have been involved in an initiative to nurture and support a small isolated village named Ramanwadi in Maharashtra, through a project named Venu Madhuri (www.venumadhuri.org).  The volunteers involved in supporting this small village have brought success in several areas of rural development and the small hamlet is inching towards self-sufficiency.

Basil Partners, along with Apar Industries, seed-funded the midday meal program (www.annamrita.org) that feeds almost 1.26 million school students per day in Mumbai, and has promoted the Bhakti Vedanta Hospital in Mumbai.

These are all very humble efforts compared to some of the massive projects undertaken by the largest of groups and individuals. However, they all make a difference. I truly believe that we need to internalize some of the credo and values espoused by H. Fisk Johnson and the work companies like SC Johnson are doing, and emulate Azim Premji, Satya Nadella, and many others. They are the true ambassadors of Conscious Capitalism and are creating purposeful corporations.