Manojkumar Rajendran

BBC’s ‘Monty Python’, a comedy series that first aired in the late 1960s, was a huge hit. The Python programming language, released in the early 1990s, turned out to be a huge hit too in the software fraternity. The reasons run into a long list: dynamic typing, cross-platform portability, enforced readability of code, and a faster development turnaround.

Python was conceived by a Dutch programmer, Guido van Rossum, who started working on it during his Christmas holidays in 1989.

The language has climbed steadily since 2014, owing to its popularity in the data science and AI domains, as the Google Trends report in Exhibit 1 shows. No wonder Python has risen to 3rd position in the latest TIOBE programming index (Exhibit 2).

Exhibit 1: Google Trends Report

Exhibit 2: TIOBE index

Python is used across a surprisingly wide array of domains and industries. It powers parts of popular applications such as YouTube, Dropbox and BitTorrent. NASA has used it in space shuttle mission design, and Python also played a part in the data analysis at CERN that led to the discovery of the Higgs boson, the so-called ‘God particle’. The NSA, the top US security agency, has reportedly used it for cryptography, thanks to its rich set of modules. It has also been used by entertainment companies such as Disney, Sony and DreamWorks to develop games and movies.

Now that data is becoming ‘big’, programmers are turning to Python for web scraping and sentiment analysis. Think of Big Data, and one of the first technologies that comes to a programmer’s mind for processing it (ETL and data mining) is Python.

Learning Python is fun. Thanks to an innovative project called Jupyter, which lets you run code interactively in a notebook, even someone who is just getting their feet wet in programming can pick up the concepts quickly.

Python combines features of scripting languages such as Tcl, Perl and Scheme with those of systems programming languages such as C++, C and Java, which makes it easy both to write and to run.

Show a Java program and a Python script to a novice programmer, and they will most likely find the Python code more readable. Python enforces indentation: blocks are delimited by whitespace rather than braces, which is why Python code rarely looks ‘ugly’. The source code is first converted to platform-independent bytecode, which makes Python a cross-platform language. Unlike with C and C++, there is no separate compile-and-link step before you run a program, which makes life easier for software developers.
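To see what enforced indentation means in practice, here is a minimal sketch. The function name classify and the deliberately broken snippet are made up for illustration; compile() is the built-in that turns source text into bytecode.

```python
def classify(number):
    # The indented lines form the function body; no braces are needed.
    if number % 2 == 0:
        return "even"
    return "odd"


# Source whose block is not indented is rejected before it ever runs.
bad_source = "if True:\nprint('not indented')\n"
try:
    compile(bad_source, "<example>", "exec")
    indentation_enforced = False
except IndentationError:
    indentation_enforced = True

print(classify(4), classify(7), indentation_enforced)  # even odd True
```

Because the layout is part of the syntax, two programmers writing the same logic tend to end up with code that looks much the same.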

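Let’s draw a comparison between Python and C++. The former is generally described as an interpreted language, while the latter is a compiled one. C++ follows a two-stage model, with a separate build step followed by execution, while Python scripts skip the separate build step and are compiled to bytecode automatically when they run.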

In C++, you use a compiler that converts your source code into machine code and produces an executable. The executable is a separate file that can then be run as a stand-alone program.

Exhibit 3

This process outputs actual machine instructions for the specific processor and operating system it’s built for. As shown in Exhibit 4, you’d have to recompile your program separately for Windows, Mac, and Linux:

Exhibit 4

You’ll likely need to modify your C++ code to run on those different systems as well.

Python, on the other hand, uses a different process. Remember that you’ll be looking at CPython, which is written in C and is the standard implementation of the language. Unless you’re doing something special, this is the Python you’re running. Other implementations exist, such as Jython (a Java implementation) and IronPython (a .NET implementation); CPython is generally regarded as the faster choice for typical workloads.

The Python interpreter runs each time you execute your program. It compiles your source, just as the C++ compiler does. The difference is that Python compiles to bytecode instead of native machine code. Bytecode is the native instruction code for the Python virtual machine. To speed up subsequent runs of your program, Python stores the bytecode of imported modules in .pyc files:

Exhibit 5
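If you are curious what that bytecode looks like, the standard library's dis module can display it. The sketch below assumes CPython; the function add is a made-up example, and the exact instruction names vary between Python versions, so none are listed here.

```python
import dis


def add(a, b):
    return a + b


# Every function carries its compiled bytecode as a bytes object.
raw_bytecode = add.__code__.co_code

# dis translates those raw bytes into readable instruction names.
instruction_names = [ins.opname for ins in dis.get_instructions(add)]

print(type(raw_bytecode).__name__, len(raw_bytecode))
print(instruction_names)
```

Calling dis.dis(add) prints the same instructions as a formatted listing, which is handy when exploring interactively.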

If you’re using Python 2, you’ll find these files next to the .py files. In Python 3, you’ll find them in a __pycache__ directory. Python 2 and 3 are the two major release lines of Python, and 2.x reaches end of life in 2020. Python 3 is the preferred version among developers, thanks to its newer features and optimizations. At the time of writing, the latest Python version is 3.7.
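The .pyc location described above can be checked with the standard library's py_compile module, which performs the same compilation that an import triggers behind the scenes. This is a sketch assuming Python 3 with default settings; the module name hello.py is hypothetical and lives in a temporary directory.

```python
import py_compile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "hello.py"  # hypothetical module
    source.write_text("print('hello')\n")

    # Compile the source to bytecode and get back the path of the .pyc file.
    pyc_path = Path(py_compile.compile(str(source)))

    cache_dir_name = pyc_path.parent.name
    pyc_suffix = pyc_path.suffix
    pyc_exists = pyc_path.exists()

print(cache_dir_name, pyc_suffix, pyc_exists)
```

On a default setup the directory name comes out as __pycache__, and the file name also records which interpreter version produced the bytecode.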

The generated bytecode doesn’t run natively on your processor. Instead, it’s run by the Python virtual machine. This is similar to the Java virtual machine or the .NET Common Language Runtime. The first run of your code includes a compilation step. The resulting bytecode is then interpreted to run on your specific hardware.

Exhibit 6

If the program hasn’t been changed, each subsequent run will skip the compilation step and interpret the previously compiled bytecode:

Exhibit 7

Interpreting code is going to be slower than running native code directly on the hardware. So why does Python work that way? Well, interpreting the code in a virtual machine means that only the virtual machine needs to be compiled for a specific operating system on a specific processor. All the Python code it runs will run on any machine that has Python.

Another feature of this cross-platform support is that Python’s extensive standard library is written to work on all operating systems.

Using pathlib (a module in the Python standard library), for example, manages path separators for you whether you’re on Windows, Mac, or Linux. The developers of those libraries spent a lot of time making them portable, so you don’t need to worry about it in your Python program!
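Here is a small sketch of that separator handling. The path components are made up for illustration. PurePosixPath and PureWindowsPath are used only so that both styles can be shown on any machine; in everyday code you would simply use Path.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

parts = ("data", "reports", "summary.csv")  # hypothetical path components

# Path picks the right flavour for whatever system the script runs on.
local_path = Path(*parts)

# The "pure" classes render the same components in each system's style.
posix_style = str(PurePosixPath(*parts))
windows_style = str(PureWindowsPath(*parts))

# The / operator joins components without hard-coding a separator.
joined = Path("data") / "reports" / "summary.csv"

print(posix_style)           # data/reports/summary.csv
print(windows_style)         # data\reports\summary.csv
print(joined == local_path)  # True
```

The same script therefore produces a valid local path on each platform without a single hard-coded slash.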

A guiding idea in Python is that “everything is an object”, much as in Linux “everything is a file”. Because even the core data types are objects, you can use their attributes and methods to solve problems. Every object, whatever its data type, comes with its own set of attributes.
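A quick way to convince yourself of this is to poke at a few values. In the sketch below, greet is a hypothetical helper; everything else is built in.

```python
def greet(name):
    """Return a greeting (hypothetical helper for illustration)."""
    return f"Hello, {name}"


# Numbers, strings, lists, functions and even types are all objects.
samples = [42, "text", [1, 2], greet, int]
all_objects = all(isinstance(item, object) for item in samples)

# Each type brings its own attributes and methods.
bits = (42).bit_length()   # integers know their size in bits
shout = "text".upper()     # strings know how to change case
func_name = greet.__name__  # functions carry their own name
func_doc = greet.__doc__    # ...and their documentation

print(all_objects, bits, shout, func_name)  # True 6 TEXT greet
```

Running dir() on any of these values lists the full set of attributes that its type provides.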

Python can interact with most major databases, including SQL databases such as Sybase, Oracle and MySQL, and NoSQL databases such as MongoDB and CouchDB. In fact, Python’s ‘dictionary’ data structure is well suited to working with a NoSQL database such as MongoDB, which stores documents as sets of key-value pairs.

Web frameworks written in Python, such as Flask and Django, make it faster to build and deploy web applications. Python is also employed to process unstructured data, or ‘Big Data’, and for business analytics; notable uses include web scraping, sentiment analysis, data science and text mining. It is used alongside the R language in statistical modeling, helped by visualization libraries such as Seaborn, Bokeh and Pygal. If you’re used to working with Excel, Python’s higher-level data structures offer a far more efficient way to manipulate and analyze data.
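To make the database points above a little more concrete, here is a sketch that uses only the standard library. An in-memory SQLite database stands in for a real SQL server, and a plain dictionary serialized to JSON stands in for a document that a NoSQL store such as MongoDB would hold. Nothing here talks to MongoDB itself, which would need a separate driver, and the table, names and values are made up.

```python
import json
import sqlite3

# SQL side: an in-memory SQLite database stands in for a real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, language TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("Ada", "Python"))
row = conn.execute("SELECT name, language FROM users").fetchone()
conn.close()

# Document side: a dictionary maps naturally onto a key-value document.
user_doc = {"name": row[0], "languages": [row[1]], "active": True}
as_json = json.dumps(user_doc, sort_keys=True)
restored = json.loads(as_json)

print(row)      # ('Ada', 'Python')
print(as_json)  # {"active": true, "languages": ["Python"], "name": "Ada"}
```

The round trip through JSON returns an identical dictionary, which is why dictionaries feel so natural when working with document stores.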

Python is also a glue language: it facilitates component integration with many other languages. Code written in C or C++ can be called through extension modules or the standard library’s ctypes module, and .NET code through projects such as IronPython. NumPy, one of the packages on PyPI, is a common bridge for numerical work, because its arrays can be passed to compiled C code. PyPI itself is a growing repository of roughly two hundred thousand packages, so developers can check PyPI before setting out to write their own code. There are also active Python communities ready to answer questions.
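One simple way to watch Python act as glue is the standard library's ctypes module, which calls functions that live in compiled C code. The sketch below assumes you are running CPython: it calls Py_GetVersion, a function in the interpreter's own C API, because that is available without installing or locating any other library. Calling your own C or C++ library works along the same lines but needs that library to be built and loaded first.

```python
import ctypes
import sys

# ctypes.pythonapi exposes the C functions of the running CPython interpreter.
get_version = ctypes.pythonapi.Py_GetVersion

# Tell ctypes that the C function returns a C string (char *).
get_version.restype = ctypes.c_char_p

# Call the C function and decode the returned bytes into a Python string.
version_from_c = get_version().decode()

print(version_from_c)
print(version_from_c == sys.version)
```

Declaring restype matters: without it, ctypes assumes the C function returns an int and the result would be meaningless.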

Companies of all sizes and in all areas, from the biggest investment banks to the smallest social and mobile web app startups, are using Python to run their business and manage their data, helped by its OSI-approved open source license and the fact that it can be used for free. Python is no longer just an option but a de facto standard for programmers and data scientists.