An Educated Guess

In Part 1 of this series, I outlined the ‘what’, ‘why’ and ‘how’ of Data Science, and discussed some of the key drivers for its recent popularity. The rampant growth in the use of data and analytics across most industries and domains has resulted in unprecedented demand for Data Scientists. As an educator and analytics leader, it’s been exciting to watch the rise in popularity and respect of this discipline. Some worrying trends, however, have emerged.

The surge in popularity of Data Science has awakened in many an interest in data analysis, modelling and coding, but it’s also attracted self-proclaimed gurus who are desperate to cash in on those looking for a shortcut into the discipline.

There are also many who profess that anyone can become a successful and respected Data Scientist with little or no formal training in mathematics, statistics and computer science, and that they can help others become self-educated professionals. Many of these charlatans don’t have sufficient training and experience themselves, so how can they possibly educate others?

From an organisational perspective, the technical credibility of Data Scientists and their leaders is vital.

What many people don’t openly discuss is that their Data Scientists sometimes lack sufficient and appropriate training and skills, i.e. they’re simply not up to the task, and the impact this has on the organisation. This not only reflects poorly on the manager and their team, but also greatly inhibits their ability to add value to the organisation, as a result of unrealised potential.

My point is twofold, employing the old Descartes adage:

  1. “I think I need a Data Scientist, therefore I do”
    • As an employer, you need to be able to vet Data Scientists to make sure they have sufficient training and education for the task at hand. To do this, you need to have leaders and managers who are qualified to make such judgements, and have technical credibility themselves.
  2. “I think I’m a Data Scientist, therefore I am”
    • As an aspiring Data Scientist, there simply is no shortcut. A formal education is essential. Be aware, though, that from the perspective of many potential employers, MOOCs and similar courses are considered great for supplementing your education as a Data Scientist, but not sufficient for establishing your abilities and credibility. There’s simply too much variation in their quality, and too little rigour, compared to established and traditional forms of education. In addition, theoretical studies are just the beginning of your journey. You truly start to progress and deepen your knowledge once you’re working on ‘real’ problems, ideally as part of a team of qualified and experienced colleagues, with a supportive and credible manager.

A question that I’m often asked is: should I do a PhD?

For anyone interested in pursuing a career in research or academia, postgraduate studies are basically mandatory. Beyond this, there’s also a strong appetite in industry and government for professionals who hold PhDs, especially in the most popular organisations. Simply put, doctoral studies show that you’ve attained a level of technical competency that’s difficult to match, let alone achieve, in industry. Given the competition for talented candidates and the nature of certain roles, having a PhD will offer you opportunities that you wouldn’t otherwise have.

From a manager’s perspective, here’s what I usually look for when recruiting for a Data Science team:

  • For senior and research-specific roles, I prefer PhD-trained staff, as many of my projects have a strong research component, with a fair degree of conceptual and technical complexity. I also sometimes need staff who are experts in a certain field.
  • With regard to degrees, I focus more on the fundamental and transferable skills that a potential candidate possesses, i.e. mathematics, statistics and computer science.
  • I value staff who question the status quo, never stop asking ‘Why’, are team players, and are willing to learn and grow.
  • I find that the best teams are interdisciplinary, with complementary skills. Often the Data Science capability is not standalone, but rather an internal part of the enterprise analytics function, so it’s vital to have Data Scientists, Analytics Engineers and DataOps specialists, to ensure an efficient overall pipeline.
  • With regard to coding, I don’t care so much about knowledge of a specific language, but rather an understanding of different paradigms, and a deep understanding of at least one of them. Put it this way: if you’re a great C++ programmer but have no experience in Python, which may be needed for a particular project, I have much more confidence in you becoming a proficient Python coder than I would in someone with little or no coding experience who may have done some introductory Python courses.
  • The more senior the position, the more I focus on people and communication skills, as I expect the person to be front-of-house, talking to stakeholders, representing the team, and helping supervise and mentor staff.