GAVS – Global IT Consulting

Why Scala for Big data and Machine Learning?

Aug 14, 2019

In this blog post

  • Scala as Language for Frameworks.
  • Scala packs the punch of both Functional and object-oriented programming.
  • About that static typing system:
  • Concise programming with scala
  • Scala equivalent of reversing the list.
  • Streams processing in real-time
  • Plethora of Machine learning Libraries and evolving communities

By Bargunan Somasundaram

Before delving into why Scala suits big data and machine learning, let me explain what these terms mean and how they are related.

Big data, machine learning, statistics, and statistical machine learning are terms that have been surfacing across the IT world recently. We are in a new era in which huge amounts of data are generated every second. According to forbes.com, at our current pace 2.5 quintillion bytes of data are created each day, and that pace is accelerating with the growth of IoT. Insights drawn from this data lead to new business. The process of analyzing this large volume of structured, semi-structured, and unstructured data and extracting information from it to generate insights is called big data.

If, with the help of artificial intelligence and algorithms, a system can automatically learn and improve at generating insights from data, without explicit intervention or rule-based programming, that is called machine learning.

We are in the midst of a data revolution, which has given rise to completely new data formats and databases of unprecedented scale. This enormous growth in data, together with the ability to analyze it and extract insights from it, is what ties big data to machine learning.

As a rule of thumb, the accuracy of pattern finding, data mining, or knowledge discovery by a machine learning algorithm depends on the volume of data the algorithm has processed: the more data, the more learning.

Python and R are the prominent programming languages for machine learning and data science. Scala is now climbing the ladder fast thanks to the rising use of Apache Spark.

Scala as Language for Frameworks.

Some of the frameworks that rule the roost in the big data world are:

  • Apache Spark - a unified analytics engine for big data processing, with features such as SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming.

Apache Spark, built on Scala, has gained a lot of recognition and is widely used in production. The Resilient Distributed Dataset (RDD) is the fundamental data structure of Spark; its key properties are that it is immutable, distributed, lazily evaluated, and cacheable.
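Three of those RDD properties (immutable, lazily evaluated, cacheable) can be loosely illustrated with nothing but the Scala standard library. The sketch below is an analogy, not Spark code: it is not distributed, and the names LazyPipeline, mapCalls, pipeline and cached are made up for this example.

```scala
object LazyPipeline {
  // Counts how often the mapping function actually runs.
  var mapCalls = 0

  // Immutable source: the transformations below never modify it.
  val source: List[Int] = List(1, 2, 3, 4)

  // `view` makes map/filter lazy, so building the pipeline does no work yet.
  val pipeline = source.view
    .map { n => mapCalls += 1; n * n }
    .filter(_ % 2 == 0)

  // `lazy val` computes the result on first access and then keeps it,
  // loosely analogous to caching a computed dataset.
  lazy val cached: List[Int] = pipeline.toList

  def main(args: Array[String]): Unit = {
    println(s"after building the pipeline: mapCalls = $mapCalls")
    println(s"first access: $cached, mapCalls = $mapCalls")
    println(s"second access: $cached, mapCalls = $mapCalls")
  }
}
```

The mapping function runs only when the result is first requested, and the second access reuses the stored list instead of recomputing it.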

  • Apache Kafka - a distributed streaming platform for handling real-time data feeds.

Written in Java and Scala, Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system. It works in combination with Apache Storm, Apache HBase and Apache Spark for real-time analysis and rendering of streaming data.

  • Apache Samza - a stream processing framework developed in Scala.

Apache Samza uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management. Samza is similar to Apache Storm but is easier to operate. Samza stream processing jobs can be written in Scala.

  • Scalding - a Scala API for Cascading, an abstraction over MapReduce

Built on top of Cascading, a Java library that abstracts Hadoop MapReduce, Scalding simplifies writing MapReduce jobs in Scala. Scalding is comparable to Pig, while offering tight integration with Scala.

  • Apache Flink — a framework for distributed stream and batch data processing

Flink’s core is a hybrid (Real-Time Streaming + Batch) distributed data processing engine written in Java and Scala. Flink contains several APIs for batch processing (DataSet API), real-time streaming (DataStream API) and relational queries (Table API) and also domain-specific libraries for machine learning (FlinkML — pure Scala), complex event processing (CEP) and graph processing (Gelly).

  • Akka — a concurrent framework for building distributed applications

Akka is an actor-based, message-driven runtime for managing concurrency, elasticity and resilience on the JVM, and it supports both Java and Scala. Akka uses the Actor Model, which is well suited to highly scalable and concurrent systems.

Scala packs the punch of both Functional and object-oriented programming.

Scala is a JVM language, and one of its biggest advantages is its support for both object-oriented and functional programming. Both programming approaches aim to create readable, bug-free code, but they go about it in very different ways. Where object-oriented programming combines data structures with the actions you want to perform on them, functional programming keeps the two separate.

Each approach has its advantages. For many people, the object-oriented paradigm makes intuitive sense, and combining behaviors with the data structures they’ll interact with can make it easy to figure out what’s going on in an unfamiliar codebase. At the same time, functional programming’s preference for cleanly separated and immutable data structures and discrete behaviors often allows you to do more with less code. Functional programming makes heavy use of lambda expressions. The point of all lambdas is deferred execution. After all, if you wanted to execute some code right now, you’d do that, without wrapping it inside a lambda.

There are many reasons for executing code later, such as

  • Running the code in a separate thread
  • Running the code multiple times
  • Running the code at the right point in an algorithm (for example, the comparison operation in sorting)
  • Running the code when something happens (a button was clicked, data has arrived, and so on)
  • Running the code only when necessary

It is a good idea to think through what you want to achieve when you set out programming with lambdas. Let us look at a simple example. Suppose you log an event:

logger.info("x: " + x + ", y: " + y);

What happens if the log level is set to suppress INFO messages? The message string is computed and passed to the info method, which then decides to throw it away. Wouldn’t it be nicer if the string concatenation only happened when necessary? Running code only when necessary is a use case for lambdas.
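In Scala, one way to get this behavior is a by-name parameter, which defers evaluation of the argument much like wrapping it in a lambda. The following is a minimal sketch: the Logger class here is a hypothetical stand-in written for this example, not the API of any real logging library.

```scala
object LazyLogging {
  // Hypothetical minimal logger, for illustration only.
  final class Logger(infoEnabled: Boolean) {
    // Records the messages that were actually logged, newest first.
    var written: List[String] = Nil

    // `msg: => String` is a by-name parameter: the argument expression is
    // evaluated only if and when `msg` is referenced in the method body.
    def info(msg: => String): Unit =
      if (infoEnabled) written = msg :: written
  }

  def main(args: Array[String]): Unit = {
    val (x, y) = (1, 2)
    var evaluations = 0
    def describe(): String = { evaluations += 1; "x: " + x + ", y: " + y }

    val quiet = new Logger(infoEnabled = false)
    quiet.info(describe())   // INFO suppressed: describe() is never called

    val verbose = new Logger(infoEnabled = true)
    verbose.info(describe()) // INFO enabled: describe() is called once

    println(s"evaluations = $evaluations, logged = ${verbose.written}")
  }
}
```

The call site looks the same as the eager version, but the concatenation is skipped entirely whenever INFO is disabled.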

Scala is a fully-fledged OOP language, and it’s possible to write highly elegant and expressive programs without even touching its functional attributes. But for those who are curious about functional programming, Scala provides a rich set of collection operations (like map and reduce), higher-order functions, and a strong static typing system.
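As a small illustration of the collection operations and higher-order functions mentioned above, here is a sketch using only the standard library. The object name, the sample data and the applyTwice helper are invented for this example.

```scala
object CollectionOps {
  // A higher-order function: it takes another function as a parameter.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  val readings: List[Int] = List(3, 1, 4, 1, 5)

  val doubled: List[Int] = readings.map(_ * 2)    // transform every element
  val total: Int         = readings.reduce(_ + _) // combine all elements into one value
  val large: List[Int]   = readings.filter(_ > 2) // keep elements matching a predicate

  def main(args: Array[String]): Unit = {
    println(doubled)               // List(6, 2, 8, 2, 10)
    println(total)                 // 14
    println(large)                 // List(3, 4, 5)
    println(applyTwice(_ + 10, 1)) // 21
  }
}
```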

About that static typing system:

Where many other modern programming languages are dynamically typed, Scala checks types at compile time, meaning that many trivial but costly bugs can be caught at compile time rather than in production. At the same time, Scala has a highly sophisticated type system, meaning that developers can enjoy the security of compile-time type-checking without having to worry about specifying every type every time.
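A short sketch of what that looks like in practice: apart from the case class fields, no types are written out below, yet the compiler still knows and checks all of them. The Reading case class and the sample values are assumptions made for this example.

```scala
object TypeInference {
  final case class Reading(sensor: String, value: Double)

  // Inferred as List[Reading]
  val readings = List(Reading("a", 1.5), Reading("b", 2.5))

  // Inferred as Map[String, Double]
  val bySensor = readings.map(r => r.sensor -> r.value).toMap

  // Inferred as Double
  val total = readings.map(_.value).sum

  // The following line, if uncommented, is rejected at compile time
  // with a type mismatch, because `total` is a Double:
  // val wrong: Int = total

  def main(args: Array[String]): Unit = {
    println(bySensor)
    println(total)
  }
}
```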

Concise programming with scala

  1. The Scala programming language is concise. A loop can often be replaced by a single method call, which makes the code significantly less verbose than standard Java. In addition, its statically typed and functional nature makes it type-safe.

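For example, Java code to reverse each string in a list:

    // Assumes java.util.List and java.util.ArrayList are imported.
    List<String> reversedValues = new ArrayList<String>();
    for (String n : nameList) {
        // String has no reverse() method in Java, so go through StringBuilder.
        reversedValues.add(new StringBuilder(n).reverse().toString());
    }
    return reversedValues;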

Scala equivalent of reversing the list.

    for (n <- nameList) yield n.reverse

or, more concisely:

    nameList.map(_.reverse)

  • Pattern matching mechanism - the second most used feature of Scala, which lets you match on any sort of data with a first-match policy.
  • The ability to use functions as values and to reuse utility functions.
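The two points above can be sketched together in a few lines. In this example the Event types, the describe function and the small string utilities are all hypothetical names chosen for illustration.

```scala
object MatchingAndFunctions {
  sealed trait Event
  final case class Click(x: Int, y: Int) extends Event
  final case class KeyPress(key: Char) extends Event

  // Cases are tried from top to bottom; the first one that matches wins.
  def describe(value: Any): String = value match {
    case 0               => "zero"
    case n: Int if n < 0 => "negative int"
    case _: Int          => "positive int"
    case Click(x, y)     => s"click at ($x, $y)"
    case KeyPress(k)     => s"key $k"
    case _               => "something else"
  }

  // Functions stored in values, then combined into a reusable utility.
  val trim: String => String  = _.trim
  val upper: String => String = _.toUpperCase
  val normalize: String => String = trim andThen upper

  // A function value that reverses every string in a list.
  val reverseAll: List[String] => List[String] = _.map(_.reverse)

  def main(args: Array[String]): Unit = {
    println(describe(0))
    println(describe(-3))
    println(describe(Click(1, 2)))
    println(normalize("  scala "))
    println(reverseAll(List("abc", "de")))
  }
}
```

Because `case 0` is listed before the two `Int` cases, zero never reaches them, which is the first-match policy at work.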

Streams processing in real-time

While Hadoop MapReduce can process and generate large datasets in parallel, it has been criticized for its inability to handle real-time stream processing. Spark gives Scala an edge over other programming languages for processing streams in real time, and it has made Scala a language of choice for fast data processing.

Plethora of Machine learning Libraries and evolving communities

Even though Scala’s libraries are not as comprehensive as those of Python or R, they provide a solid foundation for big data projects. Awesome Machine Learning, a curated list of machine learning frameworks, libraries and software covering several languages, lists useful Scala libraries and tools for machine learning, data analysis, data visualization, and NLP. In addition, Typelevel provides several helpful libraries and extensions to Scala.

The following are a few of the most widely used machine learning and data analysis libraries:

  1. Saddle — a high-performance data manipulation library (strongly influenced by the pandas library for Python)
  2. ScalaNLP — a suite of different libraries, including Breeze (set of libraries for machine learning and numerical computing) and Epic (high-performance statistical parser and structured prediction library).
  3. Apache Spark MLlib — machine learning library for Scala, Java, Python, and R
  4. Apache PredictionIO — a machine learning server based on Apache Spark, HBase and Spray that can be installed as a full machine learning stack
  5. DEEPLEARNING4J — a distributed deep-learning library for Java and Scala
  6. Scala-datatable and Framian — for data frames and data tables

Scala has an active community that is expanding rapidly. According to the KDnuggets Analytics/Data Science 2016 Software Poll, Scala was among the tools with the highest growth.

It is also well represented on Stack Overflow, in addition to its large community on GitHub and Reddit.

About the Author: I’m an open source lover and a Java enthusiast. It’s my passion to share my knowledge by writing about my experience with them. I believe “Gaining knowledge is the first step to wisdom and sharing it is the first step to humanity.”



Copyright © 2022, GAVS Technologies.
