Mistrust, Misunderstanding & Misinformation

In this final instalment of The Restraint of the Data Science Beast series, I’ll be tackling the proliferation of exaggeration and hype throughout the industry – and the broader community – promoted by self-professed experts, certain vendors and start-ups, especially those in the realm of Artificial Intelligence.

I’d like to begin by addressing some common myths:

  • No, you don’t NEED Data Science, but in the age of data, you’ll find that it greatly helps you achieve your strategic goals. First, however, you must build a Data Culture, and for that you’ll most likely need senior executive support (think CDO, but one with relevant technical credibility),
  • Likewise, no, you don’t necessarily need AI. You can use traditional analytics and fairly ‘simple’ aspects of Machine Learning, for little or no cost, to achieve great success – depending on your data of course,
  • Not all Data Scientists are created equal – see the earlier article in this series about having the right qualified people in the right roles,
  • No, you generally cannot run a Data Science practice without a suitably qualified and technical manager/leader,
  • AI is NOT intelligent. Yes, some great advances have been made in certain areas, and the resulting systems are impressive for certain classes of problems, but there’s nothing ‘intelligent’ about them – not in the human sense. Further, they’re devoid of abstraction and reasoning, can’t develop mental models, don’t generalise well, lack interpretability, and are prone to adversarial attacks. We’re a long way off from achieving Artificial General Intelligence (AGI) ie systems that truly mimic our own intelligence and learning capabilities. Until then, we’ll have to make do with glorified curve fitting,
  • No, Deep Learning is not the panacea for solving all problems,
  • No, it’s unfortunately not always easy to establish a Data Science capability with guaranteed and immediate results – there are too many variables at play ie people, data, tech, problem scope and definition,
  • Well-defined problems that are WORTH solving, ARE solvable, and have measurable outcomes are NOT always easy to find, and
  • Yes, fear of the unknown, fear of change, and a lack of appetite for risk will hold you and your organisation back. Educate yourself and your staff in Data Literacy, so you can comprehend and extract value from data and findings, and increase your analytics maturity.

Even more troubling are those pundits who use Data Science/AI to lend credibility to the answers they want, for example by seeking out only the data that supports their views. This abuse is the complete antithesis of the true scientific endeavour of the discipline, and greatly discredits both its use and the integrity of all genuine Data Scientists.

The reasons for this are not always malevolent or self-serving. More often than not, it’s the result of ignorance, insufficient training and education, and poor leadership and management. Examples include:

  • As a result of laziness, some practitioners use the data that’s easiest to obtain, rather than the data that’s best suited to, and representative of, the problem at hand,
  • Scientific rigour is sometimes missing, and isn’t held to the same standard as is common in other disciplines, such as mathematics, statistics, actuarial science and engineering,
  • Some practitioners simply lack enough understanding of the underlying mathematical, statistical and computational concepts,
  • There can be too much focus on finding A solution rather than the MOST SUITABLE solution,
  • Not enough consideration is given to the bigger picture ie the overarching aims of the organisation – remember, Data Science, and tech more broadly, should not be done in isolation, or without due governance and oversight,
  • Many less-educated practitioners prefer a brute-force approach to problem solving, rather than a systematic, logical and repeatable process, and
  • Some people get excited about a new approach (eg the latest Deep Learning algorithm), and try to force an ill-fitting, inefficient or unsuitable solution onto a problem.

To all practising Data Scientists, I leave you with these immortal words from two of the giants of Quantitative Finance (which haven’t lost their relevance a decade on), to help you maintain credibility in your work, and to guide you on your journey ahead:

The Modellers’ Hippocratic Oath
I will remember that I didn’t make the world, and it doesn’t satisfy my equations,
Though I will use models boldly to estimate value, I will not be overly impressed by mathematics,
I will never sacrifice reality for elegance without explaining why I have done so,
Nor will I give the people who use my model false comfort about its accuracy. Instead, I will make explicit its assumptions and oversights, and
I understand that my work may have enormous effects on society and the economy, many of them beyond my comprehension.
Emanuel Derman & Paul Wilmott, 2009

And to my fellow Data Science and Analytics leaders, it’s up to us to help educate the misguided, and keep the hype mongers in check, for the integrity of our discipline depends on it.