What is data science?

Data science is the analysis of data through coding, in an attempt to uncover insights (about anything) that aid decision making. A large portion of this effort is obtaining, cleaning and manipulating data with sophisticated algorithms and data structures. This typically must be done before a statistical analysis can be attempted, which is usually the smaller (but no less important) part.

Insights can come from aggregating, visualising and reporting on the data (type 1). Some insist, however, that for it to be ‘true’ data science (type 2) there must be statistical analysis (machine learning or otherwise). Type 2 is concerned with predicting the future (predictive analytics). This kind of prediction underpins artificial intelligence, which sits at the forefront of technology.

What roles does it relate to?

There are various other definitions offered by many professionals in the field. The term is new and as such still unrefined. Some consider a data scientist to be not an individual, but a team. It typically requires a wide and deep skill set, as it often overlaps heavily with the following roles:

  • Engineer
  • Scientist
  • Statistician
  • Mathematician
  • Developer
  • Computer Scientist

Synonyms for ‘data scientist’ also appear frequently; some are listed below. Some people consider these to be roles within a data science team, while others believe they are separate roles with some crossover.

  • Data Analyst
  • Data Engineer
  • Data Developer
  • Decision Scientist
  • Risk Analyst
  • Quantitative Modeller (Quant)

What is the power of data science?

It has the power to do great things in an abundance of applications. The only constraints are obtaining enough relevant data, having creative minds, and applying the right tools. In short, its power is the ability to help the world make better decisions by using data.

Some specific business-related examples are listed below. Data science in a business context overlaps with the definition of business intelligence.

  • Increasing the number and retention of users of your product through web analytics
  • Gaining insight from customer feedback at scale by using sentiment analysis
  • Knowing which customers will respond favourably to an offer if contacted (uplift modelling)
  • Knowing how to price your products for optimal returns
  • Detecting fraudulent transactions
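As an illustration of the uplift modelling item, the sketch below estimates uplift in its simplest form: the response rate of contacted customers minus the response rate of customers who were not contacted, per segment. The campaign log, segment names and function name are hypothetical, and a real uplift model would typically be fitted per customer rather than computed as a raw difference of rates.

```python
from collections import defaultdict

# Hypothetical campaign log: (segment, was_contacted, responded).
# The segments and outcomes are invented for illustration.
campaign = [
    ("new", True, 1), ("new", True, 1), ("new", True, 0), ("new", True, 0),
    ("new", False, 1), ("new", False, 0), ("new", False, 0), ("new", False, 0),
    ("loyal", True, 1), ("loyal", True, 1),
    ("loyal", False, 1), ("loyal", False, 1),
]

def uplift_by_segment(rows):
    """Response rate when contacted minus response rate when not contacted.

    Assumes every segment has at least one contacted and one
    non-contacted customer; otherwise the division would fail.
    """
    outcomes = defaultdict(lambda: {True: [], False: []})
    for segment, contacted, responded in rows:
        outcomes[segment][contacted].append(responded)
    return {
        segment: sum(groups[True]) / len(groups[True])
        - sum(groups[False]) / len(groups[False])
        for segment, groups in outcomes.items()
    }

uplift = uplift_by_segment(campaign)
print(uplift)
```

In this made-up data the ‘loyal’ segment responds whether or not it is contacted, so its uplift is zero, while contacting the ‘new’ segment raises its response rate. The point of uplift modelling is to spend the contact budget on the second kind of customer.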

What is the link between data science and technology?

The outputs of a successful data science project are recommendations for decisions. These decisions can be acted upon by either humans or technology. In either case, these decisions drive technology forward in a remarkable way, typically with a far greater impact than improvements made without them.

Here are some examples:

  • Predicting which films will move you
    • Decision by human: watching the films that have been recommended.
    • Decision by technology: sending a list of ways for users to access the films.
  • Predicting accurately which medical scans relate to early signs of diseases
    • Decision by human: patient’s doctor taking preventative steps to mitigate risks of diseases.
    • Decision by technology: sending personalised information to patients to help them reduce their risks.
  • Self-driving vehicles
    • Decision by human: allowing the vehicle to transport them safely to a destination they’ve entered.
    • Decision by technology: deciding when to slow down, speed up or apply the brakes, and which route to take.

Is it the science of data?

No. Data is meaningless unless applied in a specific domain, such as medicine, physics, business or economics. The science potentially lies within the domain, but not in the data itself. The advancement of algorithms for the manipulation, storage and analysis of data falls within the realms of computer science and statistics.

Is it even science?

That’s debatable.

A lot of the pioneering professionals given the title ‘Data Scientist’ have come from academic and scientific fields. They have spent a lot of time analysing data for their domains. Their careers outside of academia required meticulous analysis of data, similar to traditional science itself. I suspect this has contributed to the profession using the word ‘science’. Science is the organisation of knowledge in the form of testable explanations and predictions about the universe. To do this, one also needs to be good with data. So, considering these points, yes, it can be a science.

As a middle ground, perhaps ‘data science’ is the evolution of the word ‘science’. Data is core to the work of all scientists. However, when the word science was popularised, data and data technologies were scarce. That is no longer the case. Analysing data in abundance is a completely new challenge when compared with analysing it in scarcity.

Could we argue that it is not science?

Yes. Science in the traditional sense is the study of the universe and the attempt to understand it through causation (cause and effect). For example, if I drop a ball on Earth, it will fall with an acceleration of about 9.81 m/s² due to the force of gravity. For the most part, the words science and causation are synonymous. Within data science, however, when a correlation (not causation) is found in the form of patterns within the data, this is often good enough to be used for a decision. The decision is made without caring what caused what in the pattern. In some useful cases, causation is still being sought by professionals in science and data science, but it is hard. The pay-off simply isn’t as high when compared with correlation.
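One common way to measure such a pattern is the Pearson correlation coefficient. The sketch below computes it from scratch; the weekly figures and variable names are invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - x_bar) ** 2 for x in xs))
    spread_y = sqrt(sum((y - y_bar) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical weekly figures, invented for illustration.
ice_cream_sales = [20, 35, 50, 65, 80]
sunburn_cases = [2, 4, 5, 8, 9]

r = pearson(ice_cream_sales, sunburn_cases)
print(round(r, 3))  # close to 1: the two series rise together
```

The coefficient is symmetric, so swapping the two series gives the same value. It says that the series move together, not that one causes the other. In this made-up case, neither does, yet the pattern could still be good enough to decide when to stock more sunscreen.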

Furthermore, causation is only short-term knowledge. Every theory of the universe is merely that: a theory. It can, and probably will, be disproved by another theory in the future (there are many historical examples of this). Perhaps the universe is far too complicated for us ever to analyse it accurately with causation. Hence, if science is largely concerned with causation, then data science is not science, since it is largely not concerned with causation.

 
