Hmm. That sounds like my Ph.D., in which I used artificial neural networks (ANNs) on a medical regression problem and spent half my time investigating the data and how best to represent it to the ANN. In fact, it does fall under this definition, and it is a fairly typical problem domain and approach for a data science project, i.e. one tackled via machine learning techniques. I should also add at this point that black-box machine learning techniques can be considered equivalent to a variety of statistical techniques. There is nothing magic going on, but they do provide an alternative path to generating results, one that requires less mathematical or statistical expertise from the analyst.
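To make the equivalence point concrete, here is a minimal sketch in Python. It is not the network from my Ph.D.; it assumes the simplest possible case, a single linear neuron trained by gradient descent on squared error, and compares it with closed-form least-squares regression. The data, learning rate and iteration count are all invented for illustration.

```python
# Toy data, invented for illustration: y is roughly 2*x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
n = len(xs)

# Statistical route: closed-form ordinary least squares.
mean_x = sum(xs) / n
mean_y = sum(ys) / n
ols_slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
ols_intercept = mean_y - ols_slope * mean_x

# Machine learning route: one linear neuron (weight and bias) trained by
# gradient descent on mean squared error. Hyperparameters are assumptions.
weight, bias = 0.0, 0.0
learning_rate = 0.05
for _ in range(5000):
    errors = [weight * x + bias - y for x, y in zip(xs, ys)]
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2 * sum(errors) / n
    weight -= learning_rate * grad_w
    bias -= learning_rate * grad_b

print(f"OLS:    slope={ols_slope:.3f}, intercept={ols_intercept:.3f}")
print(f"Neuron: slope={weight:.3f}, intercept={bias:.3f}")
```

Both routes minimise the same squared-error objective, so they settle on the same line. Larger networks with non-linear units no longer have a closed-form counterpart this tidy, which is where the "less expertise required" trade-off comes in.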

Given this, could it be said that ‘data scientist’ = ‘statistician’? Yes, it could, but the detail is a little more nuanced. Other phrases used in this space are ‘data mining’, ‘knowledge discovery’ and ‘big data’. To a large extent these are just different buzzwords that have been used at different times, but they share the same underlying process.

Take, for example, the steps of the Cross Industry Standard Process for Data Mining (CRISP-DM), published in 2000:

- Business understanding – identify project objective
- Data understanding – collect and review data
- Data preparation – select and cleanse data
- Modeling – manipulate data and draw conclusions
- Evaluation – evaluate model and conclusions
- Deployment – apply conclusions to business

As you may have surmised, a very similar process applies to today’s data scientists.
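As a rough illustration, the six CRISP-DM steps can be walked through on a toy problem in a few lines of Python. This is only a sketch: the business objective, the records, and the names `fit_line` and `predict_sales` are all hypothetical, and a real project would spend far longer on the first three steps.

```python
# Business understanding: (hypothetical) objective is to estimate sales
# from advertising spend.

# Data understanding: collect and review the data. None marks a missing
# measurement. All values are invented for illustration.
raw_records = [
    (1.0, 12.0), (2.0, 14.1), (3.0, None), (4.0, 18.2),
    (5.0, 19.9), (None, 21.0), (6.0, 22.1), (7.0, 24.0),
]

# Data preparation: select and cleanse, here by dropping incomplete records.
clean = [(x, y) for x, y in raw_records if x is not None and y is not None]

# Hold back the last two clean records so the model can be evaluated on
# data it has not seen.
train, test = clean[:-2], clean[-2:]


def fit_line(points):
    """Least-squares slope and intercept for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sum(
        (x - mean_x) ** 2 for x, _ in points
    )
    return slope, mean_y - slope * mean_x


# Modeling: fit a straight line to the training records.
slope, intercept = fit_line(train)

# Evaluation: mean absolute error on the held-back records.
mae = sum(abs(slope * x + intercept - y) for x, y in test) / len(test)
print(f"Mean absolute error on held-back data: {mae:.3f}")


# Deployment: expose the fitted model for use by the business.
def predict_sales(spend):
    return slope * spend + intercept
```

Note how little of the code is the modelling step itself; most of it is understanding, cleaning and checking, which matches where the time tends to go in practice.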

What kinds of problems are common targets? Very briefly: typically predictive modelling, using either machine learning or statistical modelling, and the problems will usually be one of the following types:

- Yes/no classification, e.g. is this an animal?
- Value estimation (regression), a typical supervised learning problem, e.g. what will the weight of an unborn baby be?
- Grouping of observations (clustering), a typical unsupervised learning problem
- Recommender systems, e.g. recommending a film based on previous viewing habits
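The difference between supervised and unsupervised learning in the list above can be sketched in Python using nothing but the standard library. The feature values and labels are invented, and the two techniques chosen here (1-nearest-neighbour for yes/no classification, a two-centre k-means for clustering) are just simple stand-ins. Regression and recommender systems are left out for brevity.

```python
# Hypothetical one-dimensional feature values, invented for illustration.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]

# Supervised: a known yes/no label accompanies each training value.
labels = [False, False, False, True, True, True]


def classify(value):
    """1-nearest-neighbour: copy the label of the closest training value."""
    nearest = min(range(len(values)), key=lambda i: abs(values[i] - value))
    return labels[nearest]


def two_means(data, iterations=10):
    """Unsupervised: split unlabelled values into two groups (k-means, k=2)."""
    centres = [min(data), max(data)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in data:
            # Assign each value to its nearest centre.
            index = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            groups[index].append(v)
        # Move each centre to the mean of its group (keep it if empty).
        centres = [
            sum(g) / len(g) if g else c for g, c in zip(groups, centres)
        ]
    return groups


print(classify(4.5))      # uses the labels
print(two_means(values))  # never sees the labels
```

The classifier can only answer because someone supplied the labels up front. The clustering routine finds the same two groups from the values alone, but it cannot say what either group means.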

Do you have a data set in need of analysis? Drop us an email!

Data Science and Machine Learning Essentials

https://mva.microsoft.com/en-US/training-courses/data-science-and-machine-learning-essentials-14100?l=UyhoTxWdB_3505050723

Advanced | Published: 02 November 2015

Instructor(s): Steve Elston and Cynthia Rudin