The best language for data scientists/engineers?!

I recently answered this question from a friend, and I would like to share my answer with you. The original question was:

As a data scientist or data engineer, what is the best language for your job, Java or Python,
especially in the NLP and text mining field?

It mainly depends on your goals. If you are building a component in a system built on top of the JVM, Java would be the better option. Many NLP libraries are written in C at the core, so there are sometimes multiple wrappers for a single library in different languages, or a common language-neutral interface (a REST endpoint, for example).

If you are doing a side project, list your requirements, explore the available tools, and choose the language that fits best with the tools you pick.

In the NLP field, I'm aware of Apache Lucene, the Stanford NLP tools, MALLET and Carrot2, and they are all written in Java.

I'm sure there are a lot of NLP tools for Python users too; the one I can recall is NLTK.

You also shouldn't limit yourself to NLP-specific tools. For example, you may need a classifier, and in that case a general machine learning library like Spark MLlib or Apache Mahout may help.
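To make that point concrete, here is a minimal sketch of why a text classifier is just a general machine learning problem once the text is turned into word counts. It is a toy multinomial naive Bayes written with the Python standard library only, standing in for what a general ML library would give you out of the box. The function names, the whitespace tokenizer and the four training sentences are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


def train(samples):
    """Collect the counts a multinomial naive Bayes needs.

    samples is a list of (text, label) pairs.
    """
    label_counts = Counter()            # documents seen per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Log prior of the label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, only to show the shape of the problem.
TRAINING = [
    ("great library easy to use", "positive"),
    ("love the clear documentation", "positive"),
    ("terrible api hard to use", "negative"),
    ("hate the confusing documentation", "negative"),
]

model = train(TRAINING)
print(predict(model, "easy and clear"))
```

Nothing in the classifier itself knows about language. Only `tokenize` is text-specific, which is why a general-purpose library can take over everything after that step.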

To wrap up: as a data engineer, I recommend knowing Java even if your day-to-day work isn't in it, as most of the big data tools are written in JVM languages (Java, Scala).

Python is also used heavily, especially for data exploration and prototyping, so it is a good tool to have under your belt.

In the end, it depends mainly on the problem, the environment you are working in, the skills of your teammates, and other things :)
