I recently had to replace a wheel on a trailer, which I use on my property to move around firewood, firefighting equipment, and the like. The issue was that the original wheel was a non-standard size. To fit the new wheel, I first had to cut and drill out the old axle, then fabricate a new one, and finally weld it in place. This quickly became an exercise in problem solving, which got me thinking: how different is this to Data Science?

As I was welding – and inhaling noxious fumes* – I contemplated the similarities between a jack of all trades and a Data Scientist. I realised that they have a lot more in common than first appears:

  • The tools of the trade – Looking around my workshop, I’m amazed, and somewhat appalled, by all the tools I’ve accumulated over the years – some large, some small, and some never to be used again. The more you do yourself, the more you realise that there’s a special tool waiting to be purchased to meet your need. If you want to create a thread on a steel post to mount a bracket, well, you’re going to need a tap and die set for that. Maybe you want to join two pieces of wood together, and strengthen the joint beyond just glue. There’s a biscuit/domino joiner waiting for you…

    Let’s be honest, a lot of things can be achieved with a relatively small number of tools, and lots of creativity and hard work. However, when you’re time-poor, you sometimes just want to get the job done quickly and easily – and of course, that costs money (and storage space!).

    However, it’s not always about the quickest and easiest solution. I sometimes want to feel more connected with what I’m doing (but not TOO connected – don’t get too close to that circular saw, nail gun or plasma cutter!). Cutting a piece of Tasmanian Oak by hand, with a traditional Japanese wood saw, is a different, and more pleasurable, experience than reaching for the drop/track saw.

    The same can be said for work as a Data Scientist. How many R and Python libraries and packages do you use each day? How many different applications are running on your laptop now? How many databases, systems, and other tools are you required to use for your role?

    Much of Data Science these days relies on using Machine Learning – a method that allows the machine to do the heavy lifting. This is in contrast to the analytical ‘hand tools’ I used early in my career, ie traditional mathematical, statistical and quantitative analysis methods to solve problems. Rather than just throwing data at a model, which could then try to find a solution in a relatively brute-force manner, I had to derive the underlying mathematical equation by hand (literally – using pencil and paper), and then think about how to solve it. If I couldn’t get away with an analytical or semi-analytical solution (the preferred option for a mathematician, as it provides deeper insights), only then would I resort to a numerical solution – which sometimes felt like cheating (I want to solve it, not a computer!).

  • New skills & Improvisation – As a property owner, and living out of town, you quickly learn to be relatively self-sufficient – at least when it comes to repairing/making things ie you learn to improvise.

    Over the years, I’ve had to amass a number of skills, as I can’t always find someone to help when I need them – that’s if they’re willing to travel out my way. Some I picked up simply through trial and error, others via Google, and a few via specialist courses. If a fallen tree blocks my track in a storm, a water pump stops working, or a fence falls down and I don’t want my livestock on the road (or a neighbour’s on my property), then it’s my problem to solve immediately. Apart from the array of aforementioned tools (and unfortunately each new problem often necessitates the purchase of new tools), a new set of skills is required each time.

    Is this really any different to working as a Data Scientist? After all, you begin by building a set of fundamental technical skills, and as you move around into new industries and domains, and progress in seniority, you quickly develop new skills, as you’re faced with new challenges and opportunities. Looking back at my career, some of the key skills I’ve had to develop include: speaking and writing skills (eg public speaking and writing project/funding proposals), leadership and staff management, project management, stakeholder engagement and sales (yes – at the very least, how to sell yourself and your skills).

    There have also been numerous occasions where I’ve had to resort to improvisation to solve a problem at hand. Maybe I only have Excel easily available on a Client’s site to quickly mock up a solution, or possibly IT couldn’t install a package I need, so I’ve had to resort to another method that will be good enough etc…

  • Speaking the right language – When I need to buy a new specialist tool, or talk to an expert about a potential solution to a tricky and unique problem, I have to speak in a language that they understand. One day I may be talking to a cabinet maker about mortise and tenon joints, another day I may be speaking to a metal fabricator about MIG vs gasless flux-cored vs stick welding, and the next day I may be debating Stihl vs Husky chainsaws with a neighbour, and how to fell a large tree near a shed.

    Each conversation requires specific language and jargon to reach a common understanding. Isn’t that what we do as Data Scientists each day? In my ‘day-job’, I’m constantly adjusting my language, and level of detail, depending on who I’m speaking to. Presenting a keynote at an international conference is different to giving a lecture to students. Writing this article vs a chapter in my new book is different again.

    One of the most important lessons I’ve learned in my career is the value of effective communication. I’ve had to learn to be clear, concise and unambiguous. This holds true for life beyond Data Science too. Speaking to a plumber about taps means something altogether different than conversing with a machinist about taps – remember my earlier example about tap and die sets 😉

  • Fail fast, fail cheap, learn heaps – When building a new internal sliding barn door recently, I was faced with the issue of having to enlarge a hole that I’d already drilled, to accommodate some new bolts. The problem was that the drill bit I had uses a pilot bit to register itself. But with an existing hole, there’s nothing for the pilot bit to cut into. To get around this, I first tested an idea on a scrap piece of wood, which worked by using another piece of sacrificial wood clamped to it (there are actually other ways around this problem too). Once I was confident in the method, I applied it to the door – which luckily worked!

    There are cross-overs here with Data Science. I find that an agile, iterative approach to large-scale projects is often effective. By working closely with your client/business collaborators, starting small (such as via a proof-of-concept), and testing and getting feedback regularly, you’re more likely to find the solution, and more importantly, define the actual problem in the first place. I’ve also found it useful to sometimes test a major model in silent mode first, before a live release. Then there are pre-mortems, and other such approaches.

    It’s all about testing ideas in a rapid and flexible way, and quickly learning from any mistakes to refine your question and solution.

  • Confidence and perseverance – If someone had told me a decade ago that I would be doing half the things I do today, I wouldn’t have believed them. I simply didn’t think I was capable of most of them – after all, my training is in mathematics and computer science, not wood & metal work (and no, I’m definitely not good enough to leave my day-job). Over time, I’ve developed confidence in my abilities, and look forward to acquiring new skills and learning new techniques.

    Maybe you’re starting out in Data Science, or transitioning from another field. The road ahead may be daunting, but there is so much support out there, and a wealth of resources available to assist you. Focus on the theory, then get stuck into the practice ie hack away at some coding, attempt some challenging problems, dabble with Kaggle, and start playing around with some interesting datasets to see what you can discover. I suggest finding yourself some great people to work with/learn from, and if you’re lucky, a good mentor.

So go on, get your hands dirty, and have fun. Remember, sometimes chasing a rabbet down a hole is a good thing – it just depends on the type of rabbet 😉

* I was actually wearing a respirator – remember, safety first!