It was an absolute treat to interview Matthew, who’s a huge inspiration to the Data Science community, and to have him share his invaluable insights and advice.
He was also kind enough to offer an incredibly helpful list of resources for aspiring and junior Data Scientists.
You’re one of the most famous Data Scientists on LinkedIn, and a huge inspiration to many of us. You’re the editor of the influential and highly popular KDnuggets and a Machine Learning researcher who prolifically shares thoughts and research with the Data Science community. What’s your motivation, and who and what inspires you most?
Well, that’s very nice of you to say.
I have an intrinsic motivation to learn, with which many other people in this space can likely identify. I don’t really have down time, and I tend to fill any opportunity with learning something new or trying to enhance my understanding of something I already think I know.
I get motivation for writing and trying out specific things from so many of the talented and interesting people writing, sharing, and explaining machine learning and data science material all around the web, from papers to tutorials to blog posts to videos and beyond.
I also want to explicitly point out that so much of what I share on LinkedIn, Twitter, and elsewhere is written by a vast array of individuals who are not me (those contributing to KDnuggets as well as others), and so while people may find that I share something useful or inspirational, they should keep in mind that I am not necessarily the original source, and should direct their thanks appropriately.
How have you seen the field change during your time as editor of KDnuggets, and what are some of the more exciting trends you see emerging?
I have been with KDnuggets for the better part of 3 years now, but I started my journey into data science prior to that, when it was mostly still called data mining, and data science was a fairly new term. In that context, I can say that data science has changed quite a bit, both in name and in settling what is and is not part of its domain.
As far as the more exciting trends emerging, my greatest interests at the moment lie in automated machine learning, machine learning pipelines and workflows, natural language processing, text mining, neural networks, and, perhaps oddly, data preparation. It stands to reason that I find trends in these areas the most exciting, including things like automated architecture optimization using genetic algorithms, and practical uses of NLP. I’m really interested in practical aspects of machine learning, so implementations of recent papers and concepts are always fun.
What are some of your most interesting areas of research?
Again, my main interests lie in automated machine learning, NLP, and practical machine learning. Likely more telling than what my interests ARE is what my interests are NOT. While I have no disdain for it, I have never been particularly interested in computer vision, at least not in applying it myself. Given that so much of what neural networks have been able to do over the past 5 or so years has focused on CV, this takes a lot off the top right away.
What does a typical work day look like for you?
Given that my main duties are related to the day-to-day operations of editing a website, alongside Gregory Piatetsky, KDnuggets’ President and Editor in Chief, quite a bit of my time is taken up with tasks specific to that: liaising with authors, reading, editing, writing, etc.
I generally start the day surveying relevant tutorials, posts, news, etc. from around the web, to get a feel for what’s new, even if I don’t read much in detail right away. Throughout the day I find that whatever has caught my attention and stayed with me since the morning draws me back for a more in-depth look.
I try to spend some time most days on personal projects or research, some of which I want to write about, some of which is for unrelated reasons.
I tend to always have a course of some sort on the go, with varying degrees of devotion, be it a MOOC or some other open courseware available from around the web, and so I try to spend some time each day progressing.
Most importantly, however, the day is heavily punctuated by coffee.
As a Data Scientist, are there any types of projects that you particularly enjoy working on?
Well, ever since starting to learn about and study machine learning, I have been fascinated by “prediction.” My interests as a “data scientist” would be almost strictly machine learning-related, and venture out into other facets of data science only so far as they support this endeavour. For example, I have no expertise in Redis, my SQL (though once strong) has fallen out of practice over the years, and I don’t possess any special data visualization skills. However, I’ll learn what I need to in order to pull data from a data store for use in some machine learning project, I will brush up on my SQL skills in order to create a table to extract into a CSV file, and I will read documentation to figure out how to implement whatever visualization is needed to help me better understand results.
I mostly enjoy working on tasks which focus on text data, be it trying to make sense of natural language or performing some text mining. I also particularly enjoy working on model optimization, and — again, perhaps oddly — curating, preparing, and preprocessing data.
Are there any particular challenges you face in your role, and how do you overcome them?
I will focus on my tasks related to research and personal projects and writing, as opposed to the minutiae of website editing.
I would like to believe that I am resourceful enough to find solutions via the web and a few trusted people with whom I converse for pretty much any technical problem I could come across. More often than not, the situations which pose the greatest challenge are those of a more analytical nature, one of the biggest being: “What question am I trying to find the answer to?” Given a bunch of data, and an understanding of it, trying to figure out ‘exactly’ how to leverage it is often a difficult question. I find the best approach to overcoming these types of problems is consciously thinking about them periodically, and then focusing on something else while the subconscious works. Exercise and fresh air are helpful here.
With the huge growth of Data Science globally, and the proliferation of many trends and fads, what do you feel are the core elements that aspiring and junior Data Scientists need to focus on to ensure a successful and enjoyable career?
My first advice to newcomers to data science is to drop the idea of being a “data scientist” and to dramatically narrow their focus. I would suggest they try to figure out what about data science attracted them in the first place (data management, prediction, problem-solving…), which should help sort out which avenues they should pursue.
For instance, when I first learned of data mining, I was enthralled by its “predictive powers.” After discovering all of the other avenues of “data science,” I first started to believe that you needed to learn everything in order to be effective, and that being a so-called unicorn was a worthy goal. When I realized this was silly, and that pursuing many of these skills was out of duty and not interest, I went back to basics and focused on the predictive aspect of data science, which is, of course, made possible by machine learning.
In a machine learning context, I am a jack of all trades and master of none; I’m always learning, and I’m OK with that. If I had a broad focus of learning ‘all’ of the aspects of data science, my understanding of any particular aspect would be much more shallow, likely to the point of not being of any use to anyone.
So, I reiterate: find your lane and stay in it, for the most part.
What are some tips you’d like to share with your many followers? Are there any specific resources, blogs, influencers, etc. they should follow to help them on their journey to become happy and successful Data Scientists?
First off, I would really recommend this little site called KDnuggets.
Beyond that, my favourite top resources tend to change regularly, but some of my mainstays include (in no particular order, and incomplete):
- Kaggle’s blog is a no-brainer…
- … as is ODSC’s
- … and Towards Data Science
- Jeremy Howard and Rachel Thomas have a fantastic collection of video material for neural networks, machine learning, and linear algebra, with a focus on the practical, at fast.ai
- Google Research Blog
- Sebastian’s Twitter is a great resource, as is his website, and his book with Vahid Mirjalili is the best on the subject as well, IMO
- Denny’s WildML blog is really great
- Adrian Rosebrock’s computer vision site is a great resource, especially for someone like me who is interested in keeping up on what’s happening in CV but not especially interested in implementing it myself (though his resources are presented in a way that makes me think I might want to do so)
- Sebastian writes about machine learning, deep learning, NLP, and startups
- Vered maintains a blog focused on natural language understanding
- Sujit Pal – NLP, machine learning, algorithms, and more
- Hal is an NLP researcher, and his blog is a definite must
Disclaimer: The opinions expressed by Matthew represent his own personal views and not those of his employer.