Jason Widjaja is not only Associate Director of Data Science for MSD, a global pharmaceutical company, but he’s also the world’s top analytics writer on Quora.
In this interview, Jason shares an incredible amount of insight and advice, including highlighting the importance of the business side of Data Science, in order to build and run a successful Data Science team.
He also shares his deep knowledge of AI, and thoughts on where it’s headed.
Data Scientists will also find some incredibly helpful advice on building their careers.
You’re Associate Director of Global Data Science for MSD, a global pharmaceutical company. Can you please tell us a bit about your role, including what a typical day looks like for you, any key successes you’re most proud of, and what you enjoy most about it?
I am currently based in Singapore and lead a growing team of 12 data scientists in MSD (known as Merck in the US) in a global data science unit. As we matured, teams organically took on different specializations, and my team’s focus is on AI and data products while other teams look into decision science. The basic distinction is data science in support of human decisions, or machine decisions. In practice this translates into more work in areas like machine learning, NLP, computer vision and intelligent agents. We also take a ‘product’ lens to our work in terms of maximizing reuse and thus have a product prototyping skillset.
There is no typical day – my work consists of an equal mix of ‘business and tech’. Sometimes I am down in the detail looking at individual product features, exploring new data sets or overseeing project delivery. Other times, I do ‘client facing’ work scoping out potential projects and products with our internal stakeholders, and other times I do more ‘strategic’ work of looking into capability building and thinking of the arc of the team. However, if I were to pick one, I think the most practically useful thing I do is shaping the environment to make it conducive for AI and data science to flourish.
You’re currently the world’s top analytics writer on Quora. Congratulations! What’s your motivation, and general views, on supporting the growing Data Science community, including mentorship and advocating and supporting women in tech?
While I am technically trained, I have a deep interest in the human side of data science as well as a love for reading and writing that drives the hundreds of hours I spend on Quora. One of the tragedies I see in the industry is that many data science departments are frustrating places to work in. There are many facets to the problem – executives leave ‘talent on the table’ by not knowing how to hire, manage, and utilize data scientists. New entrants struggle with understanding the landscape distorted by media bias and overused buzzwords. Meanwhile, recruitment agents and overworked HR are faced with a wall of technical jargon and cheap signalling.
My writing in Quora was driven by simply wanting to help fix this. I get dozens of unsolicited messages from graduates and even recruitment agencies struggling with the situation, and I was spending so much time answering similar questions individually that I thought it would be easier to answer them on Quora instead.
Other facets of advocacy beyond analytics mostly come from my experience and convictions in building high performing teams. Part of this is creating environments of diverse and independent thinking, and promoting gender diversity is a part of that, though I would regard other aspects of diversity as important as well. We are quite fortunate to have some progressive HR leaders that we partner with as well so there is a healthy cycle of real life experience and execution followed by reflection to process it.
What are some of the biggest challenges you’ve faced in your career and how have you overcome them?
Probably the most challenging and painful experience in my career was after my MBA when I took up my first job in Australia. It was in management consulting and not exactly what I expected. Like many graduates, I was drawn to promises of rapid learning, being around smart people, and creating impact, and I was willing to work hard to achieve that. But at the same time I was introverted, technically minded and soft spoken. I also preferred to avoid crowds and disliked small talk. Taken together, I was not quite the ideal persona for work that basically required talking all day, in an environment where small talk and socialising were at a premium.
I soon received feedback that I was ‘not adding enough value’, was too quiet during meetings, and did not have enough ‘executive presence’. This ultimately led me to be marked low enough to be in danger of being let go. I was also put on unpopular projects. It was a demoralizing experience, as I was coming from a place where I was used to being looked up to due to my good work and high grades. I think that broke my ‘ego barrier’ and forced me to take a long and hard look at myself. It was painful but I took on feedback and started to rework myself from scratch, to the extent that I recorded work meetings and replayed them in my free time to understand the perspectives of each person, and prepared for simple questions like how I spent my weekend. Eventually, my performance crept up to an acceptable level but it took two years.
The whole episode was very humbling. I am not sure I have fully ‘overcome’ all these things, but the experience changed me and I am a lot more comfortable putting my point of view ‘out there’ now and in particular speaking up for the overlooked technical workers who often have no seat at a table of homogeneous business executives. I also started thinking about the type of environments needed for diverse teams to flourish in the workplace, the various biases that cut out capable but less socially adept leaders, and how I could create that space. I also changed the way I network quite drastically – rather than trying to vie for the time of a popular person in a noisy room, I try to engage them at a different time as I prefer 1:1s. Nowadays after events I mostly head straight for the food and coffee 🙂
What are your views on successfully building and managing a Data Science team?
In building a data science team I have a few simple philosophies.
The first is ‘data science for data scientists’. Data scientists should be utilized first and foremost in the areas they are trained in. These are areas such as pioneering new applications of statistical and machine learning modeling to test if the use case is a viable one, or exploring new data sets and doing data mining to produce new views and insights that effect change in the business. And less of things like data management, infrastructure management and the like. The corollary to this is that sometimes we don’t need armies of data scientists, but rather a strong supporting cast of 1) engaged sponsors, 2) helpful engineers, and 3) a conducive operating culture. This may be hard to swallow for some executives who just want to be associated with cool ideas like AI but are not really willing to change to make it happen.
The second is thinking through the way we frame an ideal data science team. I think too many people have taken on the Conway Venn diagram of ‘stats, programming and business’ and tried mapping their HR processes to what is inherently a cross disciplinary field. The result is teams that either cost too much, lack depth, or end up being underutilized. I have two mental pictures when I think of building a data science team. The middle of the Conway Venn diagram brings to mind a team of commandos – similar and strong across many aspects. However, I find a more useful picture to bring to mind is a superhero team more like the Avengers – distinct individuals each bringing their unique strengths and quirks. It is certainly more challenging to manage, but magic when it comes together.
Thirdly, I think having some flexibility in shaping the work is very important – one of the culprits of ‘talent on the table’ is needlessly narrow job scopes and organizational silos. I think that a good starting point when meeting a great data scientist is not to bring out a hefty job description and try to cram them into a box, but rather to ask ‘how can I best wrap the job around you to maximize what you can offer and grow into?’
And finally, I think having a healthy environment of practical peer-learning is important. One of the biggest advantages I have over the best solo data scientists in the world is the ability to offer peer learning between experienced data scientists with different skill sets. But this does not happen automatically. Creating this environment is hard at the start, like hand crafting each edge between nodes in a network graph. But once a tipping point of connections is reached, the whole graph starts to connect on its own.
Deep Learning and AI have advanced so far in the past decade. How would you describe the current state, including applications and limitations, and what emerging trends and technologies excite you most?
I should preface this answer by saying that most of the interest in AI is centered around machine learning and deep learning, and there is a lot of AI that is not machine learning, and likewise a lot of machine learning that is not deep learning. I do expect this to change eventually, with other aspects of AI such as reasoning and knowledge representation emerging more strongly, but at the moment that is being drowned out by seemingly every other company with the *possibility* of applying machine learning into one process mentioning that they are ‘AI-powered’.
That said, regardless of what one’s working definition of AI is, it is currently developing on many fronts at an accelerated rate, including model advances, purpose built hardware, open data sets, scalable infrastructure and also general awareness and openness. Perhaps most importantly, reusable components and levels of programming abstraction are making it easier to build and deploy models in meaningful applications.
This bears some resemblance to web development, and I am expecting a similar evolution – I have little doubt that large portions of AI capability will eventually be democratised in the same way websites can now be built without programming knowledge. However, and again similar to web development – for more sophisticated, bespoke work, practitioners will continue to thrive. Unlike building websites however, AI comes with a collection of ethical issues which are only starting to see the light, and it is important to initiate and participate in these conversations.
In terms of the frontier of deep learning, my brother runs lauretta.io, which implements IoT and cutting edge computer vision algorithms, so we were understandably interested in Geoff Hinton’s recent work on capsule networks. It addresses a limit in how objects are represented by capturing an object’s spatial characteristics, such as which way it is facing, in the way the data is encoded. This also decreases the training data required, though whether the tradeoff in practical implementation is worth it is still an open question.
Further into the future, I am conscious that the vast majority of AI work today is anchored around human beings’ conscious reasoning, and ultimate breakthroughs towards general purpose learners may well come from taking a very different approach that tries to approximate ‘intuition’ rather than ‘intelligence’. I recently read ‘Superintelligence’ by Nick Bostrom, and I am currently reading through Pedro Domingos’s ‘The Master Algorithm’. It is a fascinating space.
Given these advances, we also need to consider and be aware of bias in models and ethics. What are your thoughts on this?
Model bias is only one of many issues around AI. But it is inherent in machine learning because in the world of machine learning, ‘you are what you eat’, and the biased models that research uncovers are often just learning patterns drawn from real-life data. It is also a particularly striking issue because it forces us into the uncomfortable place of coming to terms with our own bias and ethics. This is not something I commonly see even in the biggest and most reputable companies and projects – there is usually a ‘business side’ and a ‘technical side’, or a ‘front end’ and a ‘back end’. But hardly an ‘ethics side’.
I think the difficulty in tackling this problem comes from bridging the worlds involved. One minute you are looking at ten lines of code sampling a dataset and training a model, and by the same action you may have just taken an ethical position on affirmative action or discrimination against a minority. However, the people who build and test these systems rarely get ‘ethical outcome requirements’ alongside ‘business requirements’.
I hope this will change, and I hope it will be done in a self-governing way as much as possible rather than have heavy handed regulation or legislation.
You’re an ex-management consulting manager. What were some of your notable challenges and successes in the role, and what are some key takeaways you’d like to share?
My main takeaway from management consulting came from having a front seat on large transformation programs and seeing how hard it is to make organizations change. I think management consulting work would benefit a lot from various tools and agents playing an augmentation role. I can certainly think of many consulting tasks that are actually ‘data science lite’ tasks, like working with data, building visualizations and arranging them in presentations, where a one click first draft would save a lot of time.
The flip side is that a group of data scientists talking among ourselves would likely severely underestimate the resistance of an organization to adapt to a data driven way of work, and overestimate the interest others would have in data science. Empathy and collaboration are almost timeless traits that we sorely need more of.
What areas of research are of particular interest to you?
Personally, I am particularly interested in modelling people and human interactions. This is a particularly tricky area, both technically and ethically, but also one which holds a lot of promise. Ever since ‘big data’ was in vogue years ago, I have maintained that a lot of societal dysfunction is rooted in the way we take the complexity and uniqueness of human beings, and reduce it to a crude number, often something like grades. To me the promise of big data was always in reducing people less. Connected to this is the strange phenomenon of modeling people as individuals, whereas both research and personal experience repeatedly bear out that we are deeply affected by the environment and people around us. Using new tools to attack this age old problem in ways never seen before excites me.
What successes and accomplishments in your career are you most proud of? What are some of the key learnings you’ve developed so far?
I am quite team oriented and get a lot out of seeing teams come together and getting opportunities to utilize their full skill set on genuinely valuable work. Commercially we have paid for ourselves many times over through a mix of cost avoidance and direct impact. More broadly we are maturing the company’s data science and AI capability. Some of our work squarely affects health outcomes at a macro level and that is pretty cool. Some highlights were our contribution to reducing Hepatitis C in Australia, while on the technical front we deployed multiple open source machine learning models in production completely in house. We are also on a strong growth arc, and are on track to pioneer a portfolio of in-house AI and data products with the potential to increase productivity substantially across the organisation.
In terms of lessons learnt, I see many gaps and many opportunities. I am learning how being technically or academically ‘right’ is not enough. I see the gap between the thinkers, the doers, and the talkers, how rare it is to have all of that in one person, and how important it is that they work together. I see how certain environments bias and favor certain types of people and the need to unwind that to unlock latent potential. I see that one of the bottlenecks in realizing value is often not in the analysis itself but in the environment – such as politics and incentives – and doing one without the other is setting up for failure. And I also see that a lot of work currently billed as ‘glamorous’ is actually quite technically simple and can be improved by either data science or automation. And finally, as we race towards making our tools sharper I see a need to make them safer as well.
What tips and advice would you like to offer your fellow Data Scientists, including aspiring and junior Data Scientists?
I would generally encourage data scientists to be mindful of operating in isolation and hanging out with only other data scientists, and rather seek to understand and mature the government or company context they work in. There is an endless stream of technical skills to learn, but while this is important, the practical bottlenecks to finding good data science jobs and companies getting value from data are often less technical and more organizational.
I think what we currently have within the industry is a signalling problem – signalling is getting easier, and reading signals is getting harder. Today it is too easy to get “data science key words” on your resume. Coding schools have intensives that claim to develop data scientists in 6–12 weeks. Some online portals like Coursera and edX claim to equip learners with data science skills in 4–5 weeks, often with the backing of well-known universities. Udemy and Lynda both offer an introduction to machine learning course in less than one day. And this will only continue at an accelerated pace, because it is probably easier to make money from teaching data science than from actually making it work in practice. So from the candidate’s perspective, signalling is now easy.
On the hiring side, I can say with data-backed confidence that most HR managers would not know the difference between a 4 hour course and a 1 year course if it is not explicitly stated. To make it worse, data science and machine learning are ‘hot’ organizational topics so there are often political power struggles around being associated with these functions. This results in even hiring managers not being qualified to hire data scientists. So from the hiring perspective, not many can read signals well.
The net result is data scientists having to wrestle with broken recruitment systems and data science organizations. (And to be fair, this applies beyond data science as well.)
So as advice, I would encourage data scientists to get to know their communities and educate their peers outside the data science industry. Try your best to get in front of hiring managers and your future colleagues, and invest your time in raising the level of understanding of the *consumers* of data science. As a result, you might find the right people with the right opportunities.
Disclaimer: The opinions expressed by Jason represent his own personal views and not those of his employer.