Scala vs Python

If you are wondering whether you should learn Scala or Python, or both, you might want to read this.

Scala is a statically typed language, which means that the type of every variable is known at compile time (the programmer declares it, or the compiler infers it). Python, on the contrary, is dynamically typed: types are attached to values and checked only at run time, so a variable needs no type declaration. Defining a variable is therefore a bit quicker in Python than in Scala.
If you really need to constrain the domain of your variables (in math, for instance), you should probably go for static typing: type checking is possible in Python, but it has to be written by hand and runs at run time, which makes the code more verbose and slower.
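
To make the difference concrete, here is a minimal Python sketch. The function names `mean_unchecked` and `mean_checked` are hypothetical and chosen only for illustration. It shows that a Python name can be rebound to a value of another type, and what a hand-written runtime check on a numeric domain might look like.

```python
# A name in Python is not tied to one type: it can be rebound freely.
x = 42           # x refers to an int
x = "forty-two"  # now x refers to a str; no error is raised


def mean_unchecked(values):
    """Average without any check: bad input only fails when the code runs."""
    return sum(values) / len(values)


def mean_checked(values):
    """Same computation with an explicit runtime check on the domain."""
    for v in values:
        # bool is a subclass of int, so it is excluded explicitly
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise TypeError(f"expected a number, got {type(v).__name__}")
    return sum(values) / len(values)
```

The checked version is longer and pays for the loop on every call. In Scala, a signature such as `def mean(values: Seq[Double]): Double` would let the compiler reject a bad call before the program runs.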

Scala is both functional and object-oriented (OO). Python, often thought of as a procedural and OO language, is also reasonably well equipped for functional programming: it was not designed for it, but it offers tools such as lambda, map(), filter() and reduce() (found in the functools module in Python 3) that work in that style.
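
As a small illustration of these tools, the sketch below chains filter(), map() and reduce() over a list. The data and variable names are made up for the example.

```python
from functools import reduce  # reduce() lives in functools in Python 3

numbers = [1, 2, 3, 4, 5, 6]

# Keep the even numbers, square them, then sum the squares.
evens = list(filter(lambda n: n % 2 == 0, numbers))
squares = list(map(lambda n: n * n, evens))
total = reduce(lambda acc, n: acc + n, squares, 0)

print(evens, squares, total)
```

In everyday Python the same pipeline is often written as a comprehension, `sum(n * n for n in numbers if n % 2 == 0)`, which hints that the functional style is available but not the default idiom.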

Scala can also use Java libraries, as can many other JVM languages such as Clojure and Groovy. Learning Scala will expose you to more concepts than learning Python, but Python has a lot of great libraries maintained by a great community. You can do almost anything in both Python and Scala, but a given task will probably take you longer in Scala than in Python, even though Scala is less verbose than Java.

Scala is very fast: only about 2 to 3 times slower than C, whereas Python would be about 50 times slower. Note that these numbers are rough orders of magnitude and depend heavily on what you actually do.

Scala does not have as many data science libraries and tools as Python.
From Python, which interfaces with many languages, it is always possible to run R code; Scala did not get that far. For exploration purposes, for instance, R libraries can quickly help to visualize metrics and dimensions. For production purposes, where concurrent access and computing efficiency matter, Scala would be the way to go. Keep in mind that although Python has better libraries for machine learning (ML) and natural language processing (NLP), it was not designed for big data, whereas Scala is big data oriented. Spark MLlib has fewer algorithms, but they are built to scale to big data.

Scala is well suited to distributed systems: Apache Spark is written in Scala, and Storm runs on the JVM as well (it is written mostly in Clojure and Java). Performance with Scala is usually better than with traditional data science languages like Python and R. Scala integrates well with the big data ecosystem, which is mostly JVM based. There are frameworks built on top of Java libraries, like Scalding (Cascading), Summingbird (Scalding and Storm), Scrunch (Crunch) and Flink (Java core with a Scala API), and others built from scratch that interface with JVM systems, like Spark and Kafka. The Scala APIs are usually more flexible than Hadoop streaming with Python, PySpark or Python bolts in Storm, since you have direct access to the underlying API. There is also a wide range of data storage solutions that are built for, or work well with, the JVM, like Cassandra or HBase. Another great benefit is the functional paradigm, which fits well with the MapReduce and big data model.
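
The fit between the functional style and the MapReduce model can be sketched in plain Python, without any cluster. This is only an illustration of the idea on a word count: the helper names `map_phase`, `shuffle` and `reduce_phase` are hypothetical and are not part of the Spark or Hadoop APIs.

```python
from collections import defaultdict
from functools import reduce


def map_phase(line):
    """Emit one (word, 1) pair per word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Combine all the values of one key into a single count."""
    return reduce(lambda a, b: a + b, values)


lines = ["Scala and Python", "Python and Spark"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = {word: reduce_phase(values) for word, values in shuffle(pairs).items()}
print(counts)
```

Because `map_phase` and `reduce_phase` are pure functions with no shared state, a framework is free to run them on different machines and in any order, which is why the functional paradigm maps so naturally onto distributed processing.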

What about web frameworks? Scala has Play and Python has Django. Django is the older project and is used in companies like Instagram and Pinterest. LinkedIn has migrated part of its stack to the Play framework. Twitter also moved from Rails to Scala for its technical stack. Hue, the web user interface for Hadoop developed by Cloudera, is based on Django. Play is often preferred in enterprise settings for its computing efficiency. If you want to get some more insight about Django vs Play, go here.

Want to know more about Python vs Scala? Go here.

Written by Jean-Baptiste Poullet

Data analyst – consultant – freelancer
Expert in Bigdata
Founder of RBelgium – R community in Belgium
Owner of the company Stat’Rgy
Contact me at jeanbaptistepoullet@statrgy.com
