Exploring Trends in Data Science in 2017

Kaggle recently opened up the results of its “State of Data Science and Machine Learning” survey of its users, and I thought it’d be interesting to start exploring them. The data provides a fascinating look into the current state of data science by country and industry, but filtering by several dimensions at once reduces the sample size considerably (for example, when comparing metrics for people with similar experience in a given country and a given industry).
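To make the sample-size point concrete, here is a minimal sketch of how the row count shrinks as filters stack up. The column names and the rows are made up for illustration; they are not the survey's actual schema or data.

```python
def sample_size_after_filters(rows, filters):
    """Apply equality filters one at a time and record how many rows survive each."""
    remaining = list(rows)
    sizes = [("all", len(remaining))]
    for column, wanted in filters:
        remaining = [r for r in remaining if r.get(column) == wanted]
        sizes.append((f"{column}={wanted}", len(remaining)))
    return sizes


# Toy rows with hypothetical column names (not real survey responses).
toy_rows = [
    {"country": "X", "industry": "tech", "tenure": "3-5"},
    {"country": "X", "industry": "tech", "tenure": "1-2"},
    {"country": "X", "industry": "finance", "tenure": "3-5"},
    {"country": "Y", "industry": "tech", "tenure": "3-5"},
]
toy_filters = [("country", "X"), ("industry", "tech"), ("tenure", "3-5")]

sizes = sample_size_after_filters(toy_rows, toy_filters)
for label, n in sizes:
    print(f"{label}: {n} rows")
```

Each extra dimension can only keep or shrink the group, which is why the narrow comparisons later in this post rest on small counts.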

The US is the place to be

If we look at salaries reported in USD, GBP and EUR, we can see a picture of which countries tend to have the highest compensations for data science roles.


With a median of $110k, the US has one of the highest median salaries, while the UK reports a median compensation of £50k. The Nordics appear to be lucrative as well, with reported median salaries in the 70–90k range. Countries such as South Africa and Australia have small sample sizes, so their compensation figures are likely inflated.
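One way to keep small samples from misleading the comparison is to report the count next to each median and flag thin groups. The sketch below assumes (group, value) pairs and an arbitrary cutoff, min_n, that I picked for illustration; the rows are toy values, not survey data.

```python
from collections import defaultdict
from statistics import median


def median_by_group(rows, min_n=30):
    """Return {group: (median, n, is_reliable)} for an iterable of (group, value) pairs."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {
        g: (median(vals), len(vals), len(vals) >= min_n)
        for g, vals in groups.items()
    }


# Toy rows, made up for illustration only.
toy_salaries = [("A", 100), ("A", 120), ("A", 110), ("B", 300)]
summary = median_by_group(toy_salaries, min_n=3)

for group, (med, n, reliable) in sorted(summary.items()):
    note = "" if reliable else " (small sample, treat with caution)"
    print(f"{group}: median={med}, n={n}{note}")
```

Here group B looks like the top earner, but it rests on a single response, which is the same pattern I suspect behind the South Africa and Australia figures.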

A fairer workplace?

Only 6 women working full-time in the US tech industry responded that they have 6–10+ years of tenure using code to analyze data. The bins with the most men and women are 1–2 years and 3–5 years, and both skew male.


The industries in which women are most prevalent are non-profit and academia, which also have the lowest median incomes for both men and women.


For US respondents with a Master’s degree and 3–5 years of experience, the most equal industries appear to be Mix of Fields, Non-Profit, and then Internet-based. The most unequal industries appear to be Financial, Other, and Manufacturing (where the gap actually favors women).


Breaking into the industry

63% of respondents suggest new data scientists focus on learning Python, while 24% voted for R. Interestingly, only 4% said to focus on SQL. Respondents could choose only one option; a ranked vote might have allowed a more flexible interpretation.

One thing that surprised me is that 3% of respondents suggested new data scientists spend their first programming-language learning time on some variant of C (C/C++/C#).
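Shares like the ones above come from a simple tally of single-choice answers. A minimal sketch, using placeholder option names and made-up votes rather than the survey responses:

```python
from collections import Counter


def vote_shares(votes):
    """Return {option: percent of all votes}, ordered from most to least popular."""
    counts = Counter(votes)
    total = sum(counts.values())
    return {option: round(100 * n / total, 1) for option, n in counts.most_common()}


# Toy single-choice votes, made up for illustration only.
toy_votes = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
shares = vote_shares(toy_votes)

for option, pct in shares.items():
    print(f"{option}: {pct}%")
```

Because each respondent contributes exactly one vote, a language that is everyone's second choice gets no credit here, which is the limitation a ranked vote would address.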


One thing we can start to see is that this is a lucrative field that spans an array of industries. The field is relatively young, with quite a few respondents between 25 and 35. It also appears that men and women with equal education and tenure within a given industry still do not earn similar incomes.


Your typical data scientist in the US is 30–35 years old, male, works in the tech industry making around $100k, and studied computer science as an undergrad. They’ve likely been doing this for around 3–5 years and read the KDNuggets blog regularly.
