Disaster Recovery for Modern Digital IT

A Disaster Recovery strategy includes the policies, tools, and processes for recovering data and restoring systems in the event of a disruption. The cause of disruption could be natural, like earthquakes or floods, or man-made, like power outages, hardware failures, terror attacks or cybercrimes. The aim of Disaster Recovery (DR) is to enable rapid recovery from the disaster, to minimize data loss, the extent of damage, and disruption to the business. DR is often confused with Business Continuity Planning (BCP). While BCP ensures restoration of the entire business, DR is a subset of that, with a focus on IT infrastructure, applications, and data.

IT disasters come at the cost of lost revenue, tarnished brand image, lowered customer confidence and even legal issues relating to data privacy and compliance. The impact can be so debilitating that some companies never fully recover from it. With the average cost of IT downtime running to thousands of dollars per minute, it goes without saying that an enterprise-grade disaster recovery strategy is a must-have.

Why do companies neglect this need?

In spite of the obvious consequences of a disaster, many organizations shy away from investing in a DR strategy due to the associated expenditure. Without a clear view of the ROI, these organizations choose to live with the risk of catastrophic disruption. They instead make do with just data backup plans, or secure only some of the most critical elements of their IT landscape.

Why is Disaster Recovery different today?

The ripple effects of modern digital infrastructure have forced an evolution in DR strategies. Traditional Disaster Recovery methods are being overhauled to cater to the new hybrid IT infrastructure environment. Some influencing factors:

The modern IT Landscape

Infrastructure – Today’s IT environment is distributed across on-premise data centers, colocation facilities, public/private cloud, as-a-service offerings and edge locations. Traditional data centers are losing their prominence and now share the landscape with these modern alternatives. This trend has significant advantages such as reduced CapEx in establishing data centers, reduced latency because data is closer to the user, and highly dynamic scalability.

Data – Adding to the complexity of modern digital infrastructure is the exponential growth in data from varied sources and of disparate types like big data, mobile data, streaming content, data from the cloud, social media, edge locations, IoT, to name a few.

Applications – The need for agility has triggered the shift away from monolith applications towards microservices that typically use containers to provide their execution environment. Containers are ephemeral and so scale, shrink, disappear or move between nodes based on demand.

While innovation in IT helps digital transformation in unimaginable ways, it also makes it that much harder for IT teams to formulate a disaster recovery strategy for today’s IT landscape that is distributed, mobile, elastic and transient.

Cybercrimes

Cybercrimes are becoming increasingly prevalent and are a big threat to organizations. Modern technologies fuel increasing sophistication in malware and ransomware. As their complexity increases, they are becoming harder to even detect while they lie low and do their harm quietly inside the environment. By the time they are detected, the damage is done and it’s too late. DR strategies are also constantly challenged by the lucrative underworld of ransomware.

Solution Strategies for Disaster Recovery

On-Premise DR: This is the traditional option, and it translates to heavy upfront investments in the facility and its security, infrastructure including network connectivity, firewalls and load balancers, resources to scale as needed, manpower, test drills, ongoing management and maintenance, software licensing costs, periodic upgrades for ongoing compatibility with the production environment, and much more.

A comprehensive DR strategy involves piecing together a complex puzzle. Due to the staggering costs and time involved in provisioning and managing infrastructure for duplicate storage and compute, companies are asking themselves if it is really worth the investment, and are starting to explore more OpEx-based solutions. They are discovering that the cloud may be the answer to this challenge of evolving infrastructure, offering cost-effective, top-notch resiliency.

Cloud-based DR: The easy availability of public cloud infrastructure and services, with affordable monthly subscription plans and pay-per-use rates, has caused an organic switch to the cloud for storage, infrastructure and as-a-Service (aaS) needs. To complement this, replication techniques have also evolved to enable cloud replication. With backup on the cloud, the recovery environment needs to be paid for only when used, in the event of a disaster!

Since maintaining the DR site is the vendor’s responsibility, it reduces the complexity in managing the DR site and the associated operating expenses as well. Most DR requirements are intrinsically built into cloud solutions: redundancy, advanced networks, bandwidth, scalability, security & compliance. These can be availed on demand, as necessitated by the environment and recovery objectives. These features have made it feasible for even small businesses to acquire DR capabilities.

Disaster Recovery-as-a-Service (DRaaS), which is fast gaining popularity, is a DR offering on the cloud, where the vendor manages the replication, failover and failback mechanisms needed for recovery, based on an SLA-driven service contract.

On the flip side, as cloud adoption becomes more and more prevalent, there are also signs of a reverse drain back to on-premise! Over time, customers are noticing that they are bombarded by hefty cloud usage bills, way more than what they had bargained for. There is a steep learning curve in assimilating the nuances of new cloud technologies and the innumerable options they offer. It is critical for organizations to clearly evaluate their needs, narrow down reliable vendors with mature offerings, understand their feature sets and billing nitty-gritty, and finalize the best fit for their recovery goals. So, it is Cloud, but with Caution!

Integrating DR with the Application: Frank Jablonski, VP of Global Marketing, SIOS Technology Corp, predicts that applications will soon have Disaster Recovery architected into their core, as a value-add. Cloud-native implementations will leverage the resiliency features of the cloud to deliver this value.

The Proactive Approach

Needless to say, investing in a proactive approach for disaster prevention will help mitigate the chances of a disaster in the first place. One sure-fire way to optimize IT infrastructure performance, prevent certain types of disasters and enhance business service continuity is to use AI-augmented ITOps platforms to manage the IT environment. GAVS’ AIOps platform, Zero Incident Framework™ (ZIF) has modules powered by Advanced Machine Learning to Discover, Monitor, Analyze, Predict, and Remediate, helping organizations drive towards a Zero Incident Enterprise™. For more information, please visit the ZIF website.

Understanding Reinforcement Learning in five minutes

Gireesh Sreedhar KP

Reinforcement learning (RL) is an area of Machine Learning (ML) concerned with taking suitable actions to maximize reward in a given situation. The goal of reinforcement learning algorithms is to find the best possible action to take in a specific situation. Just like the human brain, the agent is rewarded for good choices, penalized for bad choices, and learns from each choice. RL tries to mimic the way humans learn new things: not from a teacher, but via interaction with the environment. In the end, the RL agent learns to achieve a goal in an uncertain, potentially complex environment.

Understanding Reinforcement Learning

How does one learn cycling? How does a baby learn to walk? How do we become better at doing something with more practice? Let us explore learning to cycle to illustrate the idea behind RL.

Did somebody tell you how to cycle, or give you steps to follow? Or did you learn it by spending hours watching videos of people cycling? All these will surely give you an idea about cycling; but will it be enough to actually get you cycling? The answer is no. You learn to cycle only by cycling (action), through trial and error (practice), going through positive experiences (positive rewards) and negative experiences (negative rewards or punishments), before getting your balance and control right (maximum reward or best outcome). This analogy of how our brain learns cycling applies to reinforcement learning. Through trials, errors, and rewards, it finds the best course of action.

Components of Reinforcement Learning

The major components of RL are detailed below (a minimal code sketch follows the list):

  • Agent: The agent is the part of RL that takes actions, receives rewards for those actions and gets a new environment state as a result of the action taken. In the cycling analogy, the agent is the human brain that decides what action to take and gets rewarded (falling is negative and riding is positive).
  • Environment: The environment represents the outside world (only the relevant part of the world the agent needs to know about to take actions) that interacts with the agent. In the cycling analogy, the environment is the cycling track and the objects seen by the rider.
  • State: The state is the condition or position the agent is currently in. In the cycling analogy, it would be the speed of the cycle, the tilt of the handle, the tilt of the cycle, etc.
  • Action: What the agent does while interacting with the environment is referred to as an action. In the cycling analogy, it would be to pedal harder (if the decision is to increase speed), apply brakes (if the decision is to reduce speed), tilt the handle, tilt the body, etc.
  • Rewards: The reward is an indicator to the agent of how good or bad the action taken was. In the cycling analogy, it can be +1 for not falling, -10 for hitting obstacles and -100 for falling; the rewards for outcomes (+1, -10, -100) are defined while building the RL agent. Since the agent wants to maximize rewards, it avoids hitting obstacles and always tries to avoid falling.
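
To make the agent-environment loop concrete, here is a minimal tabular Q-learning sketch in Python. The toy "cycling" environment, its states, actions and reward values (+1 for progress, -100 for a fall, +10 for the goal) are illustrative assumptions, not part of any production implementation.

```python
import random

# A toy "cycling" environment: states 0..4, reaching state 4 means balance achieved.
# Action 0 = pedal cautiously, action 1 = pedal hard. The states, transition rule and
# reward values (+1 progress, -100 fall, +10 goal) are illustrative assumptions.
N_STATES, N_ACTIONS = 5, 2

def step(state, action):
    """Return (next_state, reward, done) for one interaction with the environment."""
    if action == 1 and random.random() < 0.3:   # pedalling hard sometimes causes a fall
        return 0, -100, True
    next_state = min(state + 1, N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 10, True             # goal reached: maximum reward
    return next_state, 1, False                 # made progress: small positive reward

# Tabular Q-learning: the agent estimates the value of each action in each state.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.9, 0.2           # learning rate, discount, exploration rate

for episode in range(2000):
    state, done = 0, False
    while not done:
        # Epsilon-greedy choice: mostly exploit what was learned, sometimes explore.
        if random.random() < epsilon:
            action = random.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # Update the estimate towards the reward plus the discounted future value.
        best_next = 0.0 if done else max(Q[next_state])
        Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
        state = next_state

print(Q)  # learned action values: higher values indicate better actions per state
```

With these assumed rewards, the agent gradually learns to prefer the cautious action, which is the analogue of the rider learning not to fall.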

Characteristics of Reinforcement Learning

Instead of simply scanning datasets to find a mathematical equation that can reproduce historical outcomes, as other Machine Learning techniques do, reinforcement learning focuses on discovering the optimal actions that will lead to the desired outcome.

There are no supervisors to guide the model on how well it is doing. The RL agent gets a scalar reward and tries to figure out how good the action was.

Feedback is delayed. The agent gets an instant reward for an action; however, the long-term effect of the action is known only later. Just like a move in chess may seem good at the time it is made, but may turn out to be a bad long-term move as the game progresses.

Time matters (sequential). People who are familiar with supervised and unsupervised learning will know that the sequence in which data is used for training does not matter for the outcome. However, for RL, since the action and reward at the current state influence future states and actions, the time and sequence of the data matter.

The action taken affects the subsequent data the RL agent receives.

Why Reinforcement Learning

The types of problems that reinforcement learning solves are simply beyond human capabilities. They are even beyond the solving capabilities of other ML techniques. Besides, RL eliminates the need for pre-existing training data, as the agent learns by interacting with the environment. This is a great advantage in solving problems where data availability or data collection is an issue.

Reinforcement Learning applications

RL is the darling of ML researchers right now. It is advancing at an incredible pace to solve business and industrial problems, and is garnering a lot of attention due to its potential. Going forward, RL will be core to organizations’ AI strategies.

Reinforcement Learning at GAVS

Reinforcement Learning is core to GAVS’ AI strategy and is being actively pursued to power the IP-led AIOps platform, Zero Incident Framework™ (ZIF). We had our first success with RL by developing an RL agent for automated log rotation in servers.

About the Author:

Gireesh is part of the projects run in collaboration with IIT Madras for developing AI solutions and algorithms. His interests include Data Science, Machine Learning, Financial Markets, and Geopolitics. He believes that he is competing against himself to become better than who he was yesterday. He aspires to become a well-recognized subject matter expert in the field of Artificial Intelligence.

My Journey at GAVS

Nitu Singh

I completed my education and started my professional journey at Ranchi (a city in eastern India). Through GAVS, I got the opportunity to travel down to the south of India, and experience the renowned hospitality of Chennai. And on the very first day, I was taken in by the city’s warmth. 

I received a warm welcome and was put up at the company guest house. The food in this part of India is a stark contrast to the kind of food I have eaten all my life, even though I had been exposed to it during brief stints in two other South Indian cities. It didn’t take me long to get used to it. Another major change was the local language. I had no clue how to engage in informal conversations without knowing the language. But the people around helped me ease into making this my home. I soon moved into a lake-side apartment, not far from my office.

I was overwhelmed by the warm welcome on my first day at GAVS. Apart from the regular onboarding activities, the team made sure all the new joiners felt comfortable, a reassurance that I had made the right choice in accepting this opportunity. I was introduced to the RITE values of GAVS and gradually understood that they are something every GAVSian swears by. RITE stands for Respect, Integrity, Trust and Empathy; values that are practiced in all ways and forms here.

I was recruited for Talent Acquisition, but had very little experience in it. I could not have come this far without the guidance and support of my manager, department head and my teammates. Continuous collaborative sessions and an open-door policy followed by the leaders got me going and thus increased my confidence. 

I loved the fact that I got the opportunity to work on various customer engagements and teams. The Talent Acquisition team is strategically integrated with the Customer Success team here at GAVS. I was particularly intrigued by the role of Customer Success Managers (CSMs) as they have a lot of responsibility in these engagements. It’s amazing how much I got to learn while working with them. Frequent discussions with the CSMs and my leadership team helped me understand the broader goals of individual customer engagements; which is significant to success in a role like mine. I also had opportunities to participate directly in customer discussions and understand their objectives and goals and align our processes and approach to meet and exceed those. 

My biggest challenge came in the middle of 2019 when I was given a crucial healthcare customer. We had to significantly ramp up our hiring in very little time. Our team spent long days and even some weekends at the office to meet client requirements. We went that extra mile and worked hard as a team to overcome all the obstacles. We did some creative branding and communication both internally and externally and that boosted our pace. Every step of the way, GAVSians stood up to support and that was a heartening sight when we were going through some serious challenges. 

The various training programs like ‘customer centricity’, ‘business communication’, and ‘accountability and ownership’ helped me further expand my thinking and modify my approach to day-to-day tasks and challenges. But the real game-changers were the strategic learning initiatives. We underwent rigorous coaching and assessment in Business Communication conducted by the British Council. I learnt the nuances of communication and how to use them effectively. No wonder I can articulate my thoughts and present them to you all now!

I participated in a Customer-Centric Leadership Program completely driven by senior professors of the premier B-school, Great Lakes Institute of Management (GLIM). The sessions ranged from self-awareness, interpersonal skills, and handling intellectual and emotional intelligence, to optimizing and rationalizing processes for excellence and broader aspects of various leadership styles. How do we become self-aware, and how do we make things happen? This allowed me to hone my inherent leadership skills and cultivate some more.

We also participated in a rigorous training program on Leadership and Critical Thinking conducted by IIT-Madras. This program was all about critical thinking and systems design especially in an ambiguous environment coupled with humane approaches to the same. The program also covered aspects of randomness and ambiguity of data, especially while making critical decisions and applying it to day-to-day management. There were some interesting sessions on strategy and business models and the art of negotiation. How many people get the opportunity to partake in such tailored programs from such premier institutes? We did, and I am thankful to GAVS for this golden opportunity, as clichéd as it may sound. I am sure all of this and my experience at GAVS will help me greatly in not just my professional but also my personal growth. 

The women’s role model series at GAVS has also helped me along my journey. Listening to the inspirational stories of eminent women who’ve broken the glass ceiling and paved their own paths makes me feel I can do it too.

I always thought that recruitment is a thankless job but my time at GAVS changed that perception. I am currently working for a crucial client and I’m responsible for hiring for some key strategic positions. I get noticed and appreciated for the work that I do. The practice at GAVS of giving out Thank You cards and other spot recognitions is also very motivating. 

The climate of Chennai did make me think twice about continuing in this city. But considering all that this city and GAVS has given me, I wouldn’t give it up for the world!

About the Author:

Nitu started her career as a teacher in Ranchi. After her Masters in HR, she joined GAVS as a Talent Specialist. She aspires to continue to grow in HR. Nitu likes to travel, read and listen to music in her free time.

Evolutionary Transitions and the Move to The Next Age™

Kerrie Hoffman

We are living in amazing times! It’s a time of transition and great transformation. As business continues to accelerate, all companies have a choice to either keep up with the transformation or experience an increased level of friction in their quest to serve the customer. Choosing the path of transformation will ignite the move into the Digital Flow of Business – and there’s a lot less friction in the Digital Flow.

History of the Ages

I mentioned we are living in a time of transition and great transformation. Looking back at human history, we find descriptions of eras, ages, and revolutions. Here are some interesting points about this history. Everyone is familiar with the Hunting and Gathering Era, which started almost 2 million years ago[1]; the Agricultural Age, which started around 10,000 BC[2]; and the Industrial Age, which started in the mid-to-late 1700s[3]. But have you thought about the impetus for the start of each age?

In my research, I have found the impetus for moving between the ages is a trigger of some sort, followed by a significant change in the way business is conducted:

•      In the move into the Agricultural Age, we went from hunting and gathering to stationary farming with primitive irrigation. Stationary farming with irrigation was the trigger. This was a big change, where people started to settle in villages and cities and sell their goods and services. A completely new way of living and working.

•      In the Industrial Revolution, we moved from small groups working with their hands to large organized departments working with machines. Automation of manual work with machines was the trigger. This is where corporations were eventually born. Another significant transformation in the way we live and work.

•      And in The Next Age™, we are fundamentally changing everything about the way we are used to working in the Industrial Age. The Next Age™ started around the year 2000 when early-adopting technology companies started developing platforms with Next Age architected technology. The year exponential technology became mainstream and the move accelerated was 2007.

The Year 2007

Here’s an interesting side note on the year 2007. Many people are familiar with the book The World is Flat, by Thomas Friedman. He also wrote the book Thank You for Being Late. Chapter 2 is titled “What the Hell Happened in 2007”. I recommend reading the entire chapter/book, but here’s a sneak peek at some of the things conceived and launched in 2007: the iPhone, Hadoop, GitHub, Android and the Kindle; Twitter and Facebook took off; AT&T’s software-defined network; Airbnb was conceived; internet users crossed the 1-billion mark; Watson began to be built; Intel created new materials; the clean power industry was getting started; the cost of DNA sequencing began to shift dramatically; and more!

Signs of the Move to The Next Age™

Moving from one age to the next requires a trigger followed by transformation. Transformation in this context is defined as a substantial and dramatic change in operations, processes, and structures to run a business.

Another sign we are in transition to The Next Age™ is the speed of business. Business is accelerating, we all feel this daily. Have you ever stopped to wonder why? This acceleration is largely due to rapid development of new technologies. In fact, technology is now being released at an exponential rate. Exponential growth in technology is the trigger for the move to The Next Age™. The challenge is, the adoption of these new technologies is lagging.

Early on in Thomas Friedman’s book Thank You for Being Late, he shows a chart by Eric Teller. The chart shows that technology is already on the exponential part of its curve, whereas human adaptability is not; in fact, it has fallen behind and created a gap.

The impact this has on our businesses is profound. Do you have things in your business which used to work really well, and now not so much?  Why is this the case?

The Time to Transform is Now
It’s important we move now into adopting the transformation needed to keep our businesses relevant and growing strong. Interestingly, small business is moving somewhat organically into the next age partly out of necessity and partly because small businesses are comprised of small, nimble teams.

To fill the gap between exponential technology growth and human adaptability, it’s important to change the way we work. The reality is companies need to exit the Industrial Age and enter The Next Age™. Here are some of the key ways to change the way you work in The Next Age™:

  • Practice Extreme Customer Centricity
  • Work in small, end to end knowledgeable teams focused on customer micro-segments
  • Adopt a Digitally Expanded Mindset
  • Master the Digital Flow Framework

No matter the size of your business, companies in The Next Age™ work in small end-to-end knowledgeable teams with an extreme focus on the customer. For large companies and enterprises, this is a huge change requiring the breakdown of traditional siloed departments and micro-segmentation of the customer base. Extreme Customer Centricity means everyone understands the customer at a very deep level with strategy and process in place to solve customers’ issues, even when not part of the traditional product and service offerings of your company.

The opportunity is immense. Since small businesses naturally operate this way, there is an opportunity for mid and large size companies to learn from and be serviced by small companies. Of course, this may require a mindset change in large enterprises. Beyond the mindset change required for businesses of all sizes to work together, is the need to adopt a Digitally Expanded Mindset. 

There are 5 aspects to a Digitally Expanded Mindset:

  1. Behaviors and attitudes that see possibility in the digital era
  2. A belief in the power of technology
  3. An abundance mentality
  4. Comfort with ambiguity
  5. A growth outlook

You can read more about the 5 aspects of a Digitally Expanded Mindset in the article Digital Mindset: 5 Aspects that Drive Digital Transformation[4], published in the August 2019 edition of enGAge.

The transformation from one age to the next is no small feat. The best way to approach the transformation is to think of it as a journey where you break down the changes into several steps. There are 3 areas that need to be addressed: Talent, Operations, and Technology. The Digital Flow Framework™ breaks these areas down in detail. You can read more in the Forbes article Business Transformation Part 1: The Journey from Traditional Business to Digital Business[5].

Lessons from History
There is a lot to learn from the transition we made to the Industrial Revolution, starting centuries ago. The term ‘Industrial Revolution’ was popularized by the English economic historian Arnold Toynbee in the second half of the 1800s. He used the term to describe Britain’s economic development starting in 1760 – so it was named the Industrial Revolution nearly 100 years after it started[6]. This first Industrial Revolution was dominated by Britain, which innovated first and adapted faster. Many other countries fell significantly behind from an economic standpoint. Britain’s vast economic development of the time created wealth and global significance. The second and third industrial

About the Author:

Kerrie is passionate about business transformation and getting as many companies as possible on their journey to The Next Age™. Kerrie is a #1 Bestselling Business Author and CEO of Hoffman Digital, an ecosystem of companies “Igniting the Human Experience at Work”. This includes Strategic Advisor at GAVS, Partner at Get Digital Velocity, and Digital Advisor at FocalPoint Business Coaching.

Design Thinking on Big Data Architecture for AI/ML platforms

Bargunan Somasundaram

The core of any organization is its data – the crown jewels. Gone are the days when ‘locking up the crown jewels’ sustained businesses. Big data enables unlocking big insights to find newer business opportunities. Big data is the megatrend because it turns data into information, information into insights, and insights are the business! The arrival of big data has also rung the death knell for broad customer segmentation, since data accumulated over time can now be analyzed down to the individual customer. Analyzing this data uncovers hidden patterns, unknown correlations, market trends, customer preferences, and other useful information. These help in making sound decisions at the macro level and in focusing on smaller areas of opportunity that might have gone unnoticed. The bigger the data, the more there is to focus on, eliminating marketing myopia.

“If you torture the data long enough, it will confess.” – Ronald H. Coase

The Roman Census Approach for AI/ML platforms

Applying Big Data analytics in any business is never a cakewalk. To harness the value of big data, a robust Data Processing Architecture must be designed. One of the cornerstones of big data architecture is its processing, referred to as the ‘Roman census approach’. This approach, architecturally, is at the core of the functioning of big data. This approach enables big data architecture to accommodate the processing of almost unlimited amounts of data.

About 2000 years ago, the Romans decided that they wanted to tax everyone in the Roman Empire. But to do so, they had to have a census. They quickly figured out that trying to get every person in the Roman Empire to march through the gates of Rome to be counted was an impossibility. There were Roman citizens across the known world of that time. Trying to transport everyone on ships, carts and donkeys to and from the city of Rome was next to impossible.

So, the Romans realized that creating a census where the processing (i.e. counting) was done centrally was not going to work. They solved the problem by creating a body of census takers and sending them all over the Roman Empire. The results of the census were then tabulated centrally in Rome.

In the same way, the work is sent to the data, rather than trying to send the data to a central location and doing all the work in one place. By distributing the processing, the Romans solved the problem of creating a census over a large, diverse population.

“Distribute the processing, not the data”

When the amount of data to be processed is humongous, the processing is not centralized; instead, it is distributed. In doing so, it becomes easy to service the processing over an effectively unlimited amount of data. A well-architected big data platform runs its AI/ML/DL algorithms in a distributed fashion on the data.
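
As a minimal illustration of "distribute the processing, not the data", here is a hedged Python sketch that mimics the Roman census: each worker counts its own partition locally and only the small results travel to a central tabulation step. The partitions and the counting rule are illustrative assumptions.

```python
from multiprocessing import Pool

# Three "provinces" of data, each held (conceptually) on a different node.
partitions = [
    ["alice", "bob", "carol"],
    ["dave", "eve"],
    ["frank", "grace", "heidi", "ivan"],
]

def local_census(partition):
    """Work is shipped to the data: each worker counts its own partition."""
    return len(partition)

if __name__ == "__main__":
    with Pool(processes=len(partitions)) as pool:
        local_counts = pool.map(local_census, partitions)   # distributed counting
    total = sum(local_counts)                                # central tabulation, like Rome
    print(f"local counts: {local_counts}, total population: {total}")
```

This is, in spirit, what frameworks like MapReduce and Spark do across cluster nodes: the computation moves to where the data lives, and only compact results are shipped back for aggregation.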

The common components of big data architecture for AI/ML

Data sources

All big data solutions start with one or more data sources.  This can include data from relational databases, data from real-time sources (such as IoT devices), social media data like Twitter streams, LinkedIn updates, real-time user tracking clickstream logs, web server logs, network logs, storage logs, device logs, among others.

Real-time message ingestion

This layer is the first step for the data coming from variable sources to start its journey. Data here is prioritized and categorized, which makes it flow smoothly in further layers. The big data architecture must include a way to capture and store real-time messages for stream processing. This might be a simple data store, where incoming messages are dropped into Hadoop or caught in Kafka for processing. However, many solutions need a message ingestion store to act as a buffer for messages, and to support scale-out processing, reliable delivery, and other message queuing semantics. This portion of a streaming architecture is often referred to as stream buffering. Options include Azure Event Hubs, Azure IoT Hub, Apache Kafka, Apache Flume, Apache Nifi and Elasticsearch with Logstash.
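
To make the ingestion step concrete, here is a minimal sketch using the kafka-python client to drop a message into a Kafka topic acting as the stream buffer. The broker address, topic name and event payload are illustrative assumptions, and a running Kafka broker is required for the snippet to work.

```python
import json
from kafka import KafkaProducer  # pip install kafka-python

# Broker address, topic name and the event payload below are illustrative assumptions.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# A device event dropped into the ingestion buffer for downstream stream processing.
event = {"device_id": "sensor-42", "metric": "temperature", "value": 71.3}
producer.send("iot-events", value=event)   # buffered, asynchronous send
producer.flush()                           # block until the message is actually delivered
```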

Data storage

A robust, linearly scalable data store is needed for an all-encompassing big data platform, since this layer is at the receiving end of the big data. It receives data from the various data sources and stores it in the most appropriate manner. This layer can even change the format of the data per the requirements of the system. For example, batch processing data is generally stored in a distributed file storage system such as HDFS, which can store high-volume data in different formats. On the other hand, structured data can be stored in a traditional RDBMS. It all depends on the format of the data and the purpose.

Apache HBase, Apache Cassandra, MongoDB, Neo4j, CouchDB, Riak, Apache Hive, Azure Cosmos DB, etc. are some of the data stores that could be employed in a big data architecture.

Batch processing

Often the data sets are so large that the big data architecture must process data files using long-running batch jobs to filter, aggregate, and prepare the data for advanced analytics. Usually these jobs involve reading source files, processing them, and writing the output to new files.

The datasets in batch processing are typically:

  • bounded: batch datasets represent a finite collection of data
  • persistent: data is almost always backed by some type of permanent storage
  • large: batch operations are often the only option for processing extremely large sets of data

Batch processing is well-suited for calculations where access to a complete set of records is required. For instance, when calculating totals and averages, datasets must be treated holistically instead of as a collection of individual records. These operations require that state be maintained for the duration of the calculations. Batch processing works well in situations where there isn’t any need for real-time analytics results, and when it is more important to process large volumes of information than it is to get quick analytics results.

Apache Hadoop and its MapReduce processing engine offer a well-tested batch processing model that is best suited for handling large data sets where time is not a significant factor. The Apache Spark framework is the new kid on the block, and can run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Scala, Java, Python and R are the supported languages for Spark.
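
Below is a minimal PySpark sketch of such a batch job: read source files, aggregate holistically, and write the prepared output for downstream analytics. The input path, column names and output location are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# File paths, schema and column names are illustrative assumptions.
spark = SparkSession.builder.appName("batch-aggregation-sketch").getOrCreate()

# A long-running batch job: read source files, aggregate over the full dataset,
# and write the output to new files for the analytical data store.
orders = spark.read.option("header", True).csv("hdfs:///data/orders/*.csv")
daily_totals = (
    orders.withColumn("amount", F.col("amount").cast("double"))
          .groupBy("order_date")
          .agg(F.sum("amount").alias("total_amount"),
               F.avg("amount").alias("avg_amount"))
)
daily_totals.write.mode("overwrite").parquet("hdfs:///data/daily_totals/")
spark.stop()
```

In practice a job like this would be packaged and submitted to the cluster with spark-submit, so the aggregation runs in a distributed fashion close to the data.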

Stream processing

Stream processing is the golden key to generating analytics results in real time. It allows data to be processed as it arrives, so that conditions can be detected quickly from the moment the data is received. Stream processing lets you feed data into analytics tools as soon as it is generated and get instant analytics results. There are multiple open source stream processing platforms such as Apache Kafka, Confluent KSQL, Spark Streaming, Apache Beam, Apache Flink, Apache Storm, Apache Samza, etc.
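
As one hedged example of this layer, the sketch below uses Spark Structured Streaming to read the ingested Kafka topic and maintain per-minute event counts. The broker address, topic name and window size are illustrative assumptions carried over from the ingestion sketch.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Broker address, topic and window size are illustrative assumptions.
# Running this needs the spark-sql-kafka connector package on the Spark classpath.
spark = SparkSession.builder.appName("stream-processing-sketch").getOrCreate()

# Treat the Kafka topic as an unbounded table of incoming events.
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("subscribe", "iot-events")
         .load()
)

# Rolling one-minute counts, computed continuously as events arrive.
counts = events.groupBy(F.window(F.col("timestamp"), "1 minute")).count()

query = (
    counts.writeStream.outputMode("complete")
          .format("console")   # in practice the sink would be an analytical data store
          .start()
)
query.awaitTermination()
```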

Analytical data store

After batch or stream processing, the generated analytical data must be stored in a data store. Operational Data Systems, consisting largely of transactional data, are built for quicker updates. Analytical Data Systems, which are intended for decision making, are built for more efficient analysis. Thus, analytical data is not one-size-fits-all by any stretch of the imagination! Analytical data is best stored in a data system designed for heavy aggregation, data mining, and ad hoc queries, called an Online Analytical Processing (OLAP) system. Many big data solutions prepare data for analysis and then serve the processed data in a structured format that can be queried using analytical tools. Some of these stores are Apache Druid, Apache Hive, Azure Synapse Analytics, Elasticsearch, Apache Solr, Amazon Redshift, among others.

Analysis and reporting

The goal of most big data solutions is to provide insights into the data through analysis and reporting. To empower users to analyze the data, the architecture may include a data modeling layer, such as a multidimensional OLAP cube, or a custom UI. It might also support self-service BI, using the modeling and visualization technologies in Microsoft Power BI, Microsoft Excel, Qlik Sense, etc. Analysis and reporting can also take the form of interactive data exploration by data scientists or data analysts. A noteworthy reporting tool is Apache Zeppelin, a web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and other languages.

Orchestration

Most big data solutions consist of repeated data processing operations, encapsulated in workflows, that transform source data, move data between multiple sources and sinks, load the processed data into an analytical data store, or push the results straight to a report or dashboard. To automate these workflows, orchestration technologies like Azkaban, Luigi, Azure Data Factory, Apache Oozie and Apache Sqoop can be employed.
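
As a hedged illustration of orchestration, here is a minimal sketch using Luigi, one of the tools listed above, to chain an extract step into a load step. The task names, file targets and the trivial "transform" are hypothetical placeholders for real workflow steps.

```python
import luigi  # pip install luigi

# Task names, file targets and the transformation itself are illustrative assumptions.
class ExtractRawData(luigi.Task):
    def output(self):
        return luigi.LocalTarget("raw_events.csv")

    def run(self):
        with self.output().open("w") as f:
            f.write("device,value\nsensor-42,71.3\n")   # stand-in for a real extract step

class LoadToAnalyticalStore(luigi.Task):
    def requires(self):
        return ExtractRawData()                          # declares the workflow dependency

    def output(self):
        return luigi.LocalTarget("analytics_ready.csv")

    def run(self):
        with self.input().open() as src, self.output().open("w") as dst:
            dst.write(src.read().upper())                # stand-in for a real transform/load

if __name__ == "__main__":
    luigi.build([LoadToAnalyticalStore()], local_scheduler=True)
```

Running the script builds the dependency graph and executes only the tasks whose outputs do not already exist, which is the essence of workflow orchestration.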

“Without big data analytics, companies are blind and deaf, wandering out onto the web like deer on a freeway.” – Geoffrey Moore

Big data is a broad, rapidly evolving topic. While it is not well-suited for all types of computing, many organizations are turning to big data for certain types of workloads and using it to supplement their existing analysis and business tools. Big data systems are uniquely suited for finding patterns like correlations, predictions and anomalies, and for providing insight into behaviors that are impossible to find through conventional means. By correctly designing the architecture that deals with big data, organizations can gain incredible value from data that is already available.

Monitoring Microservices and Containers

Sivaprakash Krishnan

Padmapriya Sridhar

Monitoring applications and infrastructure is a critical part of IT Operations. Among other things, monitoring provides alerts on failures, alerts on deteriorations that could potentially lead to failures, and performance data that can be analysed to gain insights. AI-led IT Ops Platforms like ZIF use such data from their monitoring component to deliver pattern recognition-based predictions and proactive remediation, leading to improved availability, system performance and hence better user experience.

The shift away from monolith applications towards microservices has posed a formidable challenge for monitoring tools. Let’s first take a quick look at what microservices are, to understand better the complications in monitoring them.

Monoliths vs Microservices

A single application (monolith) is split into a number of modular services called microservices, each of which typically caters to one capability of the application. These microservices are loosely coupled, can communicate with each other and can be deployed independently.

Quite likely the trigger for this architecture was the need for agility. Since microservices are stand-alone modules, they can follow their own build/deploy cycles enabling rapid scaling and deployments. They usually have a small codebase which aids easy maintainability and quick recovery from issues. The modularity of these microservices gives complete autonomy over the design, implementation and technology stack used to build them.

Microservices run inside containers that provide their execution environment. Although microservices could also be run in virtual machines (VMs), containers are preferred since they are comparatively lightweight, sharing the host’s operating system, unlike VMs. Docker and CoreOS rkt are a couple of commonly used container solutions, while Kubernetes, Docker Swarm, and Apache Mesos are popular container orchestration platforms. The image below depicts microservices for hiring, performance appraisal, rewards & recognition, payroll, analytics and the like linked together to deliver the HR function.

Challenges in Monitoring Microservices and Containers

Since all good things come at a cost, you are probably wondering what it is here… well, the flip side to this evolutionary architecture is increased complexity! These are some contributing factors:

Exponential increase in the number of objects: With each application replaced by multiple microservices, 360-degree visibility and observability into all the services, their interdependencies, their containers/VMs, communication channels, workflows and the like can become very elusive. When one service goes down, the environment gets flooded with notifications not just from the service that is down, but from all services dependent on it as well. Sifting through this cascade of alerts, eliminating noise and zeroing in on the crux of the problem becomes a nightmare.

Shared Responsibility: Since processes are fragmented and the responsibility for their execution, like for instance a customer ordering a product online, is shared amongst the services, the basic assumptions of traditional monitoring methods are challenged. The lack of a simple linear path, the need to collate data from different services for each process, and the inability to map a client request to a single transaction because of the number of services involved make performance tracking that much more difficult.

Design Differences: Due to the design/implementation autonomy that microservices enjoy, they could come with huge design differences and be implemented using different technology stacks. They might use open source or third-party software that makes it difficult to instrument their code, which in turn affects their monitoring.

Elasticity and Transience: Elastic landscapes, where infrastructure scales or collapses based on demand and instances appear and disappear dynamically, have changed the game for monitoring tools. They need to be updated to handle elastic environments, be container-aware and stay in step with the provisioning layer. A couple of interesting aspects to handle are recognizing the difference between an instance that is down and an instance that is no longer available, and the fact that data from instances that are no longer alive continues to have value for analysis of operational efficiency or past performance.

Mobility: This is another dimension of dynamic infrastructure where objects don’t necessarily stay in the same place; they might be moved between data centers or clouds for better load balancing, maintenance needs or outages. The monitoring layer needs to arm itself with new strategies to handle moving targets.

Resource Abstraction: Microservices deployed in containers do not have a direct relationship with their host or the underlying operating system. This abstraction is what helps seamless migration between hosts but comes at the expense of complicating monitoring.

Communication over the network: The many moving parts of distributed applications rely completely on network communication. Consequently, the increase in network traffic puts a heavy strain on network resources necessitating intensive network monitoring and a focused effort to maintain network health.

What needs to be measured

This is a high-level laundry list of what needs to be done/measured while monitoring microservices and their containers.

Auto-discovery of containers and microservices:

As we’ve seen, monitoring microservices in a containerized world is a whole new ball game. In the highly distributed, dynamic infra environment where ephemeral containers scale, shrink and move between nodes on demand, traditional monitoring methods using agents to get information will not work. The monitoring system needs to automatically discover and track the creation/destruction of containers and explore services running in them.
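
As a hedged sketch of such auto-discovery, the snippet below uses the Docker SDK for Python to list the containers currently running on a host and then watch start/die events so ephemeral containers can be tracked as they come and go. It assumes access to the local Docker socket and is only one of several possible discovery mechanisms (orchestrator APIs are another).

```python
import docker  # pip install docker

# A minimal discovery pass: list what is currently running, then watch lifecycle events.
client = docker.from_env()  # assumes access to the local Docker socket

for container in client.containers.list():
    # Labels often carry service metadata set by the orchestration platform.
    print(container.name, container.status, container.image.tags, container.labels)

# Stream container lifecycle events (this loop runs indefinitely) so the monitoring
# layer can track ephemeral containers as they are created and destroyed.
for event in client.events(decode=True):
    if event.get("Type") == "container" and event.get("Action") in ("start", "die"):
        print(event["Action"], event["Actor"]["Attributes"].get("name"))
```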

Microservices:

  • Availability and performance of individual services
  • Host and infrastructure metrics
  • Microservice metrics
  • APIs and API transactions
    • Ensure API transactions are available and stable
    • Isolate problematic transactions and endpoints
  • Dependency mapping and correlation
  • Features relating to traditional APM

Containers:

  • Detailed information relating to each container
  • Health of clusters, master and slave nodes
  • Number of clusters
  • Nodes per cluster
  • Containers per cluster
  • Performance of the core Docker engine
  • Performance of container instances

Things to consider while adapting to the new IT landscape

Granularity and Aggregation: With the increase in the number of objects in the system, it is important to first understand the performance target of what’s being measured – for instance, if a service targets 99% uptime yearly (which already allows over three days of downtime a year), polling it every minute would be overkill. Based on this, data granularity needs to be set prudently for each aspect measured, and can be aggregated where appropriate. This is to prevent data inundation that could overwhelm the monitoring module and drive up the costs associated with data collection, storage, and management.

Monitor Containers: The USP of containers is the abstraction they provide to microservices, encapsulating and shielding them from the details of the host or operating system. While this makes microservices portable, it makes them hard to reach for monitoring. Two recommended solutions are to instrument the microservice code to generate stats and/or traces for all actions (which can also be used for distributed tracing), and to gather container activity information through host operating system instrumentation.
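
For the first of those options, one common (though by no means the only) way to instrument microservice code is with a metrics client such as prometheus_client. The sketch below is a hypothetical example: the metric names, port and simulated workload are assumptions, and it is not a description of how ZIF itself instruments services.

```python
import random
import time
from prometheus_client import Counter, Histogram, start_http_server  # pip install prometheus_client

# Metric names, port and the simulated workload are illustrative assumptions.
REQUESTS = Counter("orders_requests_total", "Total requests handled by the orders microservice")
LATENCY = Histogram("orders_request_latency_seconds", "Request latency of the orders microservice")

@LATENCY.time()                               # records how long each call takes
def handle_request():
    REQUESTS.inc()                            # counts every request
    time.sleep(random.uniform(0.01, 0.2))     # stand-in for real work

if __name__ == "__main__":
    start_http_server(8000)                   # exposes /metrics for a scraper to collect
    while True:
        handle_request()
```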

Track Services through the Container Orchestration Platform: While we could obtain container-level data from the host kernel, it wouldn’t give us holistic information about the service since there could be several containers that constitute a service. Container-native monitoring solutions could use metadata from the container orchestration platform by drilling into appropriate layers of the platform to obtain service-level metrics. 
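
As a hedged example of pulling service-level data from the orchestration platform, the snippet below uses the official Kubernetes Python client to list the pods behind a hypothetical "orders" service by label and report how many replicas are actually running. The namespace and label selector are illustrative assumptions.

```python
from kubernetes import client, config  # pip install kubernetes

# Namespace and label selector are illustrative assumptions; a service's pods are
# typically tied together by labels applied by the orchestration platform.
config.load_kube_config()              # or config.load_incluster_config() inside a cluster
v1 = client.CoreV1Api()

pods = v1.list_namespaced_pod(namespace="default", label_selector="app=orders")
for pod in pods.items:
    print(pod.metadata.name, pod.status.phase, pod.spec.node_name)

# Service-level view: how many replicas are actually running right now.
running = sum(1 for p in pods.items if p.status.phase == "Running")
print(f"orders service: {running}/{len(pods.items)} pods running")
```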

Adapt to dynamic IT landscapes: As mentioned earlier, today’s IT landscape is dynamically provisioned, elastic and characterized by mobile and transient objects. Monitoring systems themselves need to be elastic and deployable across multiple locations to cater to distributed systems and leverage native monitoring solutions for private clouds.

API Monitoring: Monitoring APIs can provide a wealth of information in the black-box world of containers. Tracking API calls from the different entities (microservices, container solution, container orchestration platform, provisioning system, host kernel) can help extract meaningful information and make sense of the fickle environment.
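
A minimal sketch of API monitoring is an availability and latency probe like the one below. The endpoint URL is a hypothetical placeholder; real monitoring would also track error rates, payload validity and trends over time.

```python
import time
import requests  # pip install requests

# The endpoint URL and timeout are illustrative assumptions.
ENDPOINT = "http://orders.example.internal/health"

def probe(url, timeout=2.0):
    """Call an API endpoint and return (ok, latency_seconds)."""
    start = time.monotonic()
    try:
        response = requests.get(url, timeout=timeout)
        return response.status_code == 200, time.monotonic() - start
    except requests.RequestException:
        return False, time.monotonic() - start

if __name__ == "__main__":
    ok, latency = probe(ENDPOINT)
    print(f"healthy={ok} latency={latency:.3f}s")
```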

Watch this space for more on Monitoring and other IT Ops topics. You can find our blog on Monitoring for Success here, which gives an overview of the Monitor component of GAVS’ AIOps Platform, Zero Incident Framework™ (ZIF). You can Request a Demo or Watch how ZIF works here.

About the Author:

Priya is part of the Marketing team at GAVS. She is passionate about Technology, Indian Classical Arts, Travel and Yoga. She aspires to become a Yoga Instructor some day!

About the Author:

Siva is a long-timer at GAVS and has been with the company for close to 15 years. He started his career as a developer and is now an architect with a strong technology background in Java, Big Data, DevOps, Cloud, Containers and Microservices. He has successfully designed and created a stable monitoring platform for ZIF, and has designed and driven cloud assessment and migration, enterprise BRMS and IoT-based solutions for many of our customers. He is currently focused on building ZIF 4.0, a new-gen business-oriented tech ops platform.

CCPA Relevance in Healthcare

The California Consumer Privacy Act (CCPA) is a state statute intended to enhance consumer protection and data privacy rights of the residents of California, United States. It is widely considered one of the most sweeping consumer privacy laws, giving Californians the strongest data privacy rights in the U.S.

The focus of this article is CCPA as it applies to healthcare. Let’s take a quick look at what CCPA is and then move on to its relevance for healthcare entities. CCPA is applicable to any for-profit organization – regardless of whether it physically operates out of California – that interacts with, does business with and/or collects, processes or monetizes personal information of California residents AND meets at least one of these criteria: has annual gross revenue in excess of $25 million USD; collects or transacts with the personal information of 50,000 or more California consumers, households, or devices; or earns 50% or more of its annual revenue by monetizing such data. CCPA also empowers California consumers with rights to complete ownership, control, and security of their personal information, and imposes stringent new responsibilities on businesses to enable these rights for their consumers.

Impact on Healthcare Companies

Companies directly or indirectly involved in the healthcare sector and dealing with medical information are regulated by the Confidentiality of Medical Information Act (CMIA) and the Health Insurance Portability and Accountability Act (HIPAA). CCPA does not supersede these laws and does not apply to ‘Medical Information (MI)’ as defined by CMIA, or to ‘Protected Health Information (PHI)’ as defined by HIPAA. CCPA also excludes de-identified data and information collected by federally-funded clinical trials, since such research studies are regulated by the ‘Common Rule’.

The focus of the CCPA is ‘Personal Information (PI)’, which means information that “identifies, relates to, describes, is capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household.” PI refers to data including but not limited to personal identifiers such as name, address, phone numbers, email IDs, social security number; personal details relating to education, employment, family, and finances; biometric information, geolocation, consumer activity like purchase history and product preferences; and internet activity.

So, if CCPA only regulates personal information, are healthcare companies that are already in compliance with CMIA and HIPAA safe? Is there anything else they need to do?

Well, there is a lot that needs to be done! It only means that such companies should continue to comply with those rules when handling Medical Information as defined by the CMIA, or Protected Health Information as defined by HIPAA. They will still need to adhere to CCPA regulations for personal data that falls outside of MI and PHI. This will include employee personal information routinely obtained and processed by the company’s HR; data collected from websites, health apps, health devices and events; clinical studies that are not funded by the federal government; and information of a CCPA-covered entity that is handled by a non-profit affiliate, to give a few examples.

There are several possibilities – some not so apparent – even in healthcare entities, for personal data collection and handling that would fall under the purview of CCPA. They need to take stock of the different avenues through which they might be obtaining or handling such data and prioritize CCPA compliance. Otherwise, with the stringent CCPA regulations, they could quickly find themselves embroiled in class action lawsuits (which, by the way, do not require proof of damage to the plaintiff) in the case of data breaches, or facing statutory penalties of up to $7,500 for each violation.

The good news is that since CCPA carves out a significant chunk of data that healthcare companies/those involved in healthcare-related functions collect and process, entities that are already complying with HIPAA and CMIA are well into the CCPA compliance journey. A peek into the kind of data CMIA & HIPAA regulate will help gauge what other data needs to be taken care of. 

CMIA protects the confidentiality of Medical Information (MI) which is “individually identifiable information, in electronic or physical form, in possession of or derived from a provider of health care, health care service plan, pharmaceutical company, or contractor regarding a patient’s medical history, mental or physical condition, or treatment.”

HIPAA regulates how healthcare providers, health plans, and healthcare clearinghouses, referred to as ‘covered entities’, can use and disclose Protected Health Information (PHI), and requires these entities to enable protection of data privacy. PHI refers to individually identifiable medical information such as medical records, medical bills, lab tests, scans and the like. This also covers PHI in electronic form (ePHI). The privacy and security rules of HIPAA also apply to ‘business associates’ who provide services to the ‘covered entities’ that involve the use or disclosure of PHI.

Two other types of data that are CCPA exempt are Research Data & De-Identified Data. As mentioned above, the ‘Common Rule’ applies only to federally-funded research studies, and the CCPA does not provide much clarity on exemption status for data from clinical trials that are not federally-funded.

And, although the CCPA does not apply to de-identified data, the definitions of de-identified data of HIPAA and CCPA slightly differ which makes it quite likely that de-identified data by HIPAA standards may not qualify under CCPA standards and therefore would not be exempt from CCPA regulations.

Compliance Approach

Taking measures to ensure compliance with regulations is cumbersome and labour-intensive, especially in a constantly evolving regulatory environment. Using this opportunity for a proactive, well-thought-out approach to comprehensive enterprise-wide data security and governance will be strategically wise, since it will minimize the need to overhaul policies and processes with each new regulation.

The most crucial step is a thorough assessment of the following:   

  • Policies, procedures, workflows, entities relating to/involved in data collection, sharing and processing, in order to arrive at clear enterprise-wide data mapping; to determine what data, data activities, data policies would fall under the scope of CCPA; and to identify gaps and decide on prioritized action items for compliance.
  • Business processes, contracts, terms of agreement with affiliates, partners and third-party entities the company does business with, to understand CCPA applicability. In some cases, HIPAA and CMIA may be applicable to only the healthcare-related business units, subjecting other business units to CCPA compliance.
  • Current data handling methods, not just their privacy & security. CCPA dictates that companies have mechanisms in place to cater to the consumer rights to request all information relating to the personal data collected about them, to opt out of the sale of their data, and to have their data deleted by the organization (which will extend to third parties doing business with this organization as well).

Consumer Consent Management

With CCPA giving full ownership and control of personal data back to its owners, consent management mechanisms become the pivot of a successful compliance strategy. An effective mechanism will ensure proper administration and enforcement of consumer authorizations.

Considering the limitations of current market solutions for data privacy and security, GAVS has come up with its Blockchain-based Rhodium Framework (patent pending) for Customer Master Data Management and Compliance with Data Privacy Laws like CCPA.

You can get more details on CCPA in general and GAVS’ solution for true CCPA Compliance in our White Paper, Blockchain Solution for CCPA Compliance.