The remote-work startups' wave

Two months ago, I posted about why I think the current situation will change many people's perspective on remote work. These days we are witnessing a dramatic change coming into effect, with companies like Twitter and Facebook spearheading the trend and questioning many of the things we used…

Your absolute guide to managing Hadoop Logging Configurations

Hadoop has become an essential component of many companies' infrastructure. There are different distributions maintained and managed by different companies like Cloudera, Databricks and AWS. The distribution managed by AWS is named EMR. This distribution is supposedly fully managed by AWS (though not everything is). One of the things that…

My reflections on Careem Deal with Uber

I still remember how happy I was to see Souq acquired in 2017, anticipating the potential this would bring to the region. Now, the acquisition of Careem will create a ripple effect that will change the whole region. It won't just open the door for a…

How to become a Machine Learning Engineer ?

Recently I posted a job opening for a machine learning engineer position on my team at Careem. I got a lot of questions from people who are curious to know more about the position itself, how to prepare for it and what kinds of problems it deals with. Data Science vs…

[Clean Code] Clean Code Debate !

In this series of posts, I try to share with you some of my highlights from the chapters of Clean Code. Lots of people argue about the necessity of applying clean code principles and whether spending time on refactoring is an important step of the software development process or just a waste…

Why Visualization is important in Data Science ?

Our minds can usually comprehend pictures much faster than anything else. Data visualization can be used for two purposes: 1- Illustration & Presentation: Minard's visualization of Napoleon's 1812 march, a cartographic depiction of numerical data on a map of Napoleon's disastrous losses suffered during…

Using Spark For Data Exploration

Spark is actively supported by the Apache open source community, and it is used in production by many well-known companies. In this post, the focus is on productionizing Apache Spark. I will discuss the use cases of Spark and how to enable each of them in a production environment.…

Productionizing Apache Spark (Data Pipelines)

Apache Spark On Production (for Data Pipelines). This is the second post about running Spark in production; you can read the first post from here. In the first post, we talked briefly about Spark, then discussed the data exploration use case and compared the different available tools. In…