Interview with Eugene Dubossarsky (Data Science Pioneer)

There are successful, influential and inspirational Data Scientists, and then there’s Eugene Dubossarsky!

The Data Science community is incredibly lucky to have him share his wisdom and insights, distilled from over a decade in the field.

His responses are invaluable, and you’re guaranteed to learn from, and be inspired by, this interview:

You’ve been running the highly successful Presciient training business for a decade now (teaching R, Introduction to Data Science, Machine Learning etc. in face-to-face workshops), have recently started a new consultancy, Advantage Data, are Chief Data Scientist for the global analytics consultancy AlphaZetta, and are involved in other ventures. What are some of the motivations and inspirations that drive your success?

I have a two-year-old. Do you know how expensive daycare, nappies and “Peppa Pig” merchandise are these days? More seriously – some of these things are in their early days. Let’s assess success in three years or so. I can, however, tell you that when an opportunity comes along to work with great people who energise me, to be supported in the things I find tedious or am just not so good at, and to concentrate on what I am good at: how can I say no?

With regard to the training: I love finding simple, intuitive explanations for complex things, and sharing them with others. I also like to see people grow and flourish, and get a buzz when it is partly due to my help. So training was a natural fit.

The combination is simple: do what you love, what you are good at, and what there is demand for. And do it with people that support, energise and inspire you. Not easy to find, but worth taking years to look for.

You were doing Data Science before some aspiring and junior Data Scientists were born and when R was known as S! Over the course of your amazing career, how have you seen the field change, and what coming changes should we be prepared for?

You’re asking me to read the tea leaves. All I can say is… Be prepared for change. Be prepared to be uncomfortable. Be prepared to give up notions of “career path”, job security and a life plan. Stay curious, embrace uncertainty and wander (semi-!) aimlessly into as many new experiences as possible.

There are any number of social, economic, technological changes happening RIGHT NOW. Everyone from your grandparents down has grown up in a rapidly accelerating social experiment.

So I don’t know what changes will happen and when. That’s the nature of the beast.

I can however tell you what I am preparing for:

  • Economic changes making all but the best data scientists obsolete, but the best ones more in demand than ever;
  • A world where value matters, and budgets are not there for spending (actually repeating the last point in a different way);
  • A data science ecosystem with intelligent, demanding use of data science by organisations (repeating myself a second time!);
  • Quantum computing and quantum machine learning;
  • A freelance model of work, where the value is measured in dollars (or some other KPI) rather than hours, and everyone is independent. Hello AlphaZetta;
  • A global model of consulting: I want 2/3 of my clients / customers to be from outside Australia, ideally from outside the English-speaking world. We’re getting there;
  • Data science as a secret weapon: I have been selling data science as a service or product my whole life. The most satisfying application of data science is however the one where there is no sales / influence bottleneck – this only exists when you use data science as a secret weapon to sell something else, or when you use it for sports betting and financial trading;
  • Bayesian modelling will take over data science – don’t know how soon, but this is the most general and correct way to do statistics. It’s only a matter of time;
  • The importance of non-customer data; and
  • Data valuation to assess the total value of organisations.

I’ll also tell you what I am betting on NOT happening:

  • Mandatory certification of data scientists;
  • Software that can do data science, so data scientists are obsolete; and
  • Sustained economic growth in Australia for another 26 years.

You’re the founder of the Data Science Sydney Meetup, which I assume is one of the largest Data Science Meetup groups in the world. Why is it so popular, why do you enjoy running it, and what importance do you place on Meetups in general, both for hosts and attendees?

I enjoy running DSS for the same reason I enjoy running my courses. Where else can I meet such a diverse, interesting and inspiring bunch of people? I met a good chunk of my friendship circle through Data Science Sydney or my courses. This says something about how much those mean to me, or that I don’t get out much.

For beginner and intermediate practitioners, meetups are a must. If you aren’t networking, you are missing out on a vital experience in your professional development.

You’re the creator of the fantastic R interactive data visualization package ggraptR, and use R extensively, but you also teach Python through your Presciient courses. R vs Python, when and why?

When asked this question by beginners I say that this is the wrong question for two reasons. The first is that if they think that learning a language is hard, wait until they face having to learn statistics. Which I think they would need to do properly.

While there are plenty of jobs right now for people whose stats skills are minimal or non-existent, I don’t think that this trend will continue into the future. If the required skill set for a data scientist is just a couple of online courses with a focus on coding, and so many are drawn to the field due to its prestige and high pay, then we can expect more stringent and educated requirements from employers in the future, and stats will be king.

The other reason this is not a good question: you should learn both. There are things in R that Python does not have, and probably will not for quite a while, and vice versa.

Python is the tool for what Computer Science and Engineering are best at: Deep Learning, as well as Speech, Text and Image analytics and Signal processing.

Meanwhile, R is better for stats in general. In particular, it is much better for forecasting, largely thanks to the work of Rob Hyndman, and more generally for non-i.i.d. regression analysis, singular value decomposition and network analysis, just to name a few, as well as many other things that aren’t in the standard machine learning toolkit. It is also a terrific data processing / analysis tool thanks to the efforts of Hadley Wickham and the “tidy” vision that he continues to spearhead.

While Python is the more commonly used tool, this doesn’t mean R is diminished. It makes R ELITE.

What are some of the major challenges you’ve faced in your career, and how have you overcome them?

The biggest challenge was figuring out how to continue to work in the field given its immaturity, and to continue to enjoy it. Three hours in my favourite cafe in the Old City of Chiang Mai did the trick. I figured out a business model that has continued to serve me well.

What are some of the biggest challenges limiting the uptake of analytics within organisations, in order to truly lead to better decision making using data?

The most important real challenge is environmental: does the organisation exist in an environment where it must make good decisions or perish, and does this pressure feed directly into the short-term incentives of senior management?

Anything else is not conducive to effective analytics in organisations.

The biggest manifestation of this challenge is people with big budgets but agendas other than wanting to make better decisions, because decisions don’t really matter and there is plenty of spare money floating around.

This challenge splits into several parts:

  1. People whose main skill is getting noticed, creating momentum and being associated with high-status buzzwords, throwing money at something they don’t actually care about and have no real interest in using; and
  2. Process / deterministic outcome driven functions (such as many IT functions) taking charge of data science, and running it like a deterministic process.

Uptake is not always a good thing.

The second biggest challenge is the rather large uptake of analytics across large organisations on the dysfunctional basis above. I would rather there be less uptake, but that the uptake happened for the right reasons.

When you’re not running your many ventures, consulting, mentoring, competing in Kaggle, hosting Meetups etc, what are some of your research areas of interest?

Last year the main focus was on Sports Betting, which led to much R&D in non-standard machine learning techniques and related methods. I wish I could tell you more but… trade secret! I also did a fair bit of study into Bayesian modelling, particularly regression, and into causal impact analysis.

These continue in the current year, although I am now focused on Quantum Computing and Quantum Machine Learning. I also have an ongoing software R&D project on collective forecasting / strategic decision making.

What are some tips you’d like to share with Data Scientists looking to build their careers, both those entering the field, and those more established and looking to further their career and grow their influence?

Learn statistics and econometrics. Yes, it’s hard. But this is the skill that makes you stand out. Also, it’s the language that data speaks. If you aren’t listening to the data, you aren’t a data scientist.

Network. Have passion projects. Present on them.

Find work where your efforts matter, and your value is understood and appreciated. Assess every job interview on that basis (a video of a talk on this topic is coming up very soon).

Don’t stop learning.

Disclaimer: The opinions expressed by Eugene represent his own personal views and not those of his employer.
