Data Science: The Truth is (Not Always) Out There

I’ve been working as a Data Scientist for a while now, and have been fortunate enough to work in a number of different industries, including retail and investment banking, funds management and hedge funds, insurance, and State and Federal Government.

As well as staying ‘hands-on’ with the tools while managing and leading Data Science teams and projects, I spend my time working with Government Departments and Agencies, industry, academia and start-ups to help them build successful Data Science practices and solve challenging problems.

This has given me a unique perspective, and upon reflection, I’ve identified a list of 10 hidden challenges that Data Scientists often face. I offer some advice on how to overcome these in order to really boost your career to the next level, rather than allowing them to block your progress and learning:

1. Where’s the Data?

The data can be difficult to source and get access to, interpret, understand and amalgamate with other sources

These difficulties can exist for a number of reasons. Sometimes the data stewards or business unit don’t want to relinquish control of the data. Sometimes there are multiple layers of approvals and bureaucracy to navigate before IT will grant access. Sometimes the data is accessible, but the difficulty lies in making sense of it – the systems/business rules are complex and the documentation may be non-existent. In rarer cases, the organisation just doesn’t have any meaningful data – not all organisations are data-driven.

One of the key strategies to overcome these sorts of issues is to build strong working relationships with the data custodians, and data domain experts as soon as possible. Engage with them, involve them, get them interested in what you’re doing and explain why you were brought on to add value to the organisation. Discuss how you may also be able to make their lives easier. This may include automating reporting and monitoring, or providing additional analysis to gain further insights.

Another important tip is to be persistent and not give up when requesting access to data. Try to speak to real people, rather than waiting in a queue of support tickets.

2. Snakes & Ladders

Multiple layers of hierarchy and rules can make it difficult to get access to the right people, and ultimately the data, which can limit an understanding of the business problems you’re trying to solve

This can leave you feeling isolated and disconnected from the business, and make it difficult to get your work in front of stakeholders.

Once again, developing productive working relationships with your stakeholders is imperative in helping break down these walls. However, it can be difficult to do this if your manager limits how much ‘face time’ you can have with your stakeholders/clients. 

It’s also important to stay connected with other analysts/Data Scientists within the organisation and to try to source your information/business understanding from them, as they often face the same challenges as you. A great way to do this is to set up informal lunch-time sessions, where you all get together to share ideas, work on problems and network – to broaden your contacts both within and outside the organisation. Remember, when it comes to career progression, networking is vital.

3. Dirty (Data) Dancing

Most of your time will be spent preparing the data required for analysis and modelling, understanding it, dealing with missing/nonsense values and generally ‘wrangling the data’

This is simply part of life for most Data Scientists and can be very valuable as it forces us to really understand the data. But don’t forget, you always need to first understand the business and the problem you’re trying to solve, before you try to understand the data. It’s in that context that the data really starts to make sense, and when you begin to identify novel solutions and exciting ideas.
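To make ‘wrangling’ a little more concrete, here is a minimal sketch of the kind of first-pass check that often starts the process: counting missing values per field and flagging values that can’t be right. The records, the field names, the list of ‘missing’ markers and the plausible age range are all made up for illustration; in practice they would come from the business and the data custodians.

```python
from collections import Counter

# Hypothetical records; field names and values are invented for illustration.
records = [
    {"customer_id": "C001", "age": "34", "postcode": "2000"},
    {"customer_id": "C002", "age": "", "postcode": "3000"},
    {"customer_id": "C003", "age": "-1", "postcode": "N/A"},
    {"customer_id": "C004", "age": "212", "postcode": "4000"},
]

# Assumed markers for "no value"; real systems often have their own codes.
MISSING_MARKERS = {"", "n/a", "null", "none"}


def is_missing(value):
    """Return True if the value is absent or one of the assumed missing markers."""
    return value is None or str(value).strip().lower() in MISSING_MARKERS


def profile(rows, min_age=0, max_age=120):
    """Count missing values per field and list IDs with an implausible age."""
    missing = Counter()
    implausible_age = []
    for row in rows:
        for field, value in row.items():
            if is_missing(value):
                missing[field] += 1
        age = row.get("age")
        if not is_missing(age) and not min_age <= int(age) <= max_age:
            implausible_age.append(row["customer_id"])
    return dict(missing), implausible_age


missing_counts, odd_ages = profile(records)
print(missing_counts)  # fields with at least one missing value
print(odd_ages)        # customer IDs whose age falls outside the assumed range
```

A summary like this is only a starting point. Whether an age of -1 is an error or a deliberate ‘unknown’ code is exactly the sort of question to take back to the people who know the system.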

Once you’ve developed a working relationship with your stakeholders/data owners/IT, you need to leverage this relationship to validate your understanding of the data with them, and to determine if certain solutions will work for them. For instance, if you determine that a predictive model needs to be developed, you need to find out how they plan to use it. Are false positives or false negatives more problematic in the business context? What is the appetite for risk from the business users who will use the predictive model?
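One way to turn the false positive versus false negative question into something you can discuss with stakeholders is to put a rough cost on each type of error and see which decision threshold minimises the total. The sketch below assumes a handful of made-up scored cases and made-up costs; the function names are my own, and a real project would use a proper validation set and costs agreed with the business.

```python
# Hypothetical scored cases: (predicted probability, actual outcome).
scored = [
    (0.95, 1), (0.80, 1), (0.65, 0), (0.55, 1),
    (0.40, 0), (0.30, 1), (0.20, 0), (0.05, 0),
]


def total_cost(threshold, cases, cost_fp, cost_fn):
    """Sum the cost of false positives and false negatives at a given threshold."""
    cost = 0
    for prob, actual in cases:
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and actual == 0:
            cost += cost_fp  # flagged a case that was actually fine
        elif predicted == 0 and actual == 1:
            cost += cost_fn  # missed a case that mattered
    return cost


def best_threshold(cases, cost_fp, cost_fn, candidates):
    """Pick the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(t, cases, cost_fp, cost_fn))


candidates = [0.1, 0.25, 0.5, 0.75, 0.9]

# Assumed scenario A: missing a real case is ten times worse than a false alarm.
print(best_threshold(scored, cost_fp=1, cost_fn=10, candidates=candidates))

# Assumed scenario B: a false alarm is ten times worse than a missed case.
print(best_threshold(scored, cost_fp=10, cost_fn=1, candidates=candidates))
```

With these invented numbers, the first scenario favours a low threshold (flag more cases) and the second a high one (flag fewer), which is the point: the same model can behave very differently depending on the risk appetite of the people using it.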

Throughout the whole project, make sure to keep them involved. This joint ownership will help them trust you and your work, and will help cement your relationship for future work.

4. Here’s the Solution, Now Find me a Problem

You won’t always be given a clear problem to work on. Rather, your employer will want a specific capability developed and it will be your job to go find problems to solve with it

This can be challenging, especially for those Data Scientists just starting out, but is quite a common scenario. For instance, someone may want to justify using a particular piece of software within the organisation and it may be your job to find a problem to solve with it. In this case, the problem must make sense, be achievable and it must be made clear why the software is the best solution for it.

In order to find suitable problems to work on, it’s important to not only develop a strong understanding of the business and working connections with your stakeholders, but to also understand their pain points and strategic goals, and identify clear ways that you can add value, create efficiency, save money, and so forth.

5. Minority Report

Your role may end up being a glorified reporting function

Reporting and BI functions are required by most organisations. Sometimes organisations look to build a Data Science and Machine Learning capability, while basics like reporting, data extraction and data matching, and a data pipeline aren’t in place – so Data Scientists are pulled in to meet these needs.

Unless you specifically want to work in BI, try to determine the analytics maturity of an organisation before you join. Ask questions about the number of Data Science teams they have, the size of the teams, and what each team is responsible for.

If you are ultimately responsible for reporting, remember that problems that look mundane can often provide interesting technical challenges, provided there is support for trialling innovative approaches to solve them. Analytic functionality can be added to reports, allowing the user to drill down, and to do what-if analysis and forecasting. This often proves vital in helping the consumers of the reports gain valuable insights that they can’t easily get from a static report.

Always keep in mind that there may be scope for developing your own models to answer business questions, which can enrich data sources that are then visualised using reporting tools.

6. You’re so Vain

Beware of vanity Data Science! 

The real issue with such roles is that there isn’t enough support, or any real need from the employer’s perspective for having you on-board, apart from adding credibility to their business

First and foremost, try to determine if this is the case before joining. Try to find out if there’s support for analytics from senior management, how many other analysts work there and what they are doing. Most importantly, find out what their staff churn rate is for Data Scientists/analysts.

In addition, you need to work hard to educate and show the value of analytics. You can do this by building prototypes and working with as many business units as possible to identify problems to solve.

Networking throughout the organisation is very helpful in order to make yourself and your skills well known. You can do this by offering to be seconded to teams you know may have valuable data and interesting challenges. You can also offer to do a ‘road show’ whereby you give a short talk/presentation/demo to other business units of what you have done/can do (ideally try to make your examples directly applicable to their business) or attend their team meetings/stand-ups. You’ll discover that once they know a Data Scientist is around, they’ll be coming to you to seek help with their problems.

7. Hard & Soft

You may not have hardware and software that’s fit-for-purpose, or access to the tools you need, or have the tools and data on the same machine

This usually results from security restrictions, especially in the Government sector. Given the different classification levels of data sets, and machines that are connected either to the internet or only to a local network, it can be almost impossible to have the open source tools you want and the data on the same machine.

This can be tricky to solve, and sometimes simply involves educating the decision makers about the tools that are needed for Data Science. Sometimes the issue is that R/Python/etc are available for analysis and modelling, but the machine on which the data resides is not connected to the internet, and hence cannot have ad hoc packages downloaded onto it.

To work around such restrictions, you can sometimes use a development environment (physical or virtual), but this doesn’t solve the issue if the model needs to go into production. It’s important to have senior level support, and a good relationship with IT, to help overcome these challenges.

8. Making it Happen!

Your manager may not have enough power/authority to actually make decisions

Remember, it’s your job to turn data into actionable insights, and without the relevant support (unless you’re very senior) this can be almost impossible.

To help overcome this, it’s often beneficial to clearly show how valuable your work can be to the organisation. This can be achieved by looking for quick wins and proofs of concept that focus on challenges faced by senior management, in particular by those who are actually empowered to make decisions!

Once again, to achieve this, you need a good understanding of the business, the problems you’re solving and the organisation’s strategic goals. It also helps to be well known throughout the organisation, which can be achieved through networking and having support from stakeholders you’ve previously worked with and assisted.

9. Discovery vs Delivery

Analytics is NOT IT!

If your role reports to IT then it often tends to be run in the same way, i.e. as a set of clear processes to be delivered within a specific time frame, following well-known rules. Such views greatly restrict the ‘science’ aspect of Data Science, which is about research and discovery. This also limits creativity and the ability to truly add value.

Overcoming this often involves educating people about how different the two are, and that research is not a linear activity with predetermined outcomes. It’s important to have support from management to help change perceptions, and to know how to successfully embed Data Science within an organisation.

The more senior you become, the more time you spend educating and changing perceptions.

10. Shallow Waters

“You’re here to work, not to do another PhD”

This is a piece of advice given to me by my manager early in my career. It was a lesson in pragmatism.

There are times when you want to research a topic further, delve into the details of a model to extract that little bit more accuracy, or simply explore alternative approaches. However, given time constraints, looming deadlines, new work building up and limited staff to assist, it can be almost impossible.

It’s important from early in your career to develop an ability to work efficiently to meet high demands, whilst still retaining curiosity and the desire to delve deeper and learn more. This can mean doing further research in your own time, or trying to set up a practice such as a ‘Research Week’ in your team or organisation. This gives your team a week per year, for instance, to set aside their BAU work and investigate some innovative new ideas. This can lead to more time and resources being allocated to pursue them, and hopefully make them actionable.

Good luck!

 
