In the last decade or so, data analytics and data science have proliferated into all industries. Data is more plentiful than ever, and specialists are needed to dredge the data lake for nuggets of insight. I’ve seen my college classmates who studied everything from economics to biophysics to insect behavior switch from academic research into a “data science” role. The industries they landed in ranged from video games to medical research to tourism.
Despite having the same title, these roles are not interchangeable. While the hard skills are the same – statistics, programming, and presentation – the object of study varies a lot. Domain expertise is critical to formulating hypotheses and interpreting data. Moreover, the skills are less effective when split. It may be tempting for a subject matter expert to hire a statistics whiz to handle all the technical drudgery, but do not separate the theory from the evidence.
The Scientific Method
At their core, analysts and data scientists are simply applying a data-heavy version of the scientific method. Let’s walk through a condensed version of the steps and see where domain expertise plays a role:
Form a Hypothesis
- Observation
- Question
- Hypothesis

Collect and Interpret Data
- Experiment
- Analysis

Make an Impact
- Conclusion
Form a Hypothesis
Asking the right questions directs data work towards the most important targets. A good hypothesis is plausibly true, empirically verifiable, and impactful.
Plausibly True
Hypotheses should be uncertain. Asking a question that you already know the answer to is a waste of time. “Will players spend more money if the game loads slower?” No, obviously not. Next. (A new-hire analyst at Riot actually proposed that theory.) Domain expertise is important here to filter out solved problems. A data worker without domain expertise requires heavy guidance from an expert partner.
Making observations is key to forming plausibly true hypotheses. Experts who have been in the field for a long time will naturally accrue observations. However, experts also accumulate conventional wisdom, which can be false or misleading. Without a skeptical and scientific eye, domain experts can propagate misinformation and remain blind to contradictory evidence.
Empirically Verifiable
Hypotheses need to be verifiable or falsifiable. If the question is unanswerable, then don’t expend effort investigating it. Many departments create value in intangible and difficult-to-verify ways, such as brand marketing. When the results come back ambiguous, stakeholders tend to interpret them in the most favorable way.
For example, as a young analyst at Riot, I got a request from the higher-ups to measure the impact of esports on League of Legends revenue. There were so many correlations and confounds that I struggled to find any solid methodology. I tried a variety of approaches that gave wildly different results, and ended up reporting that esports boosted LoL revenue by somewhere between $1 million and $100 million a year. That 100x range is quite absurd, but I didn’t feel comfortable with a more precise answer. I don’t know where my report went, but I did hear that high-end “$100 million” number quoted back to me later. It seems whoever commissioned the report took full advantage of the ambiguity.
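For the curious, here is a minimal sketch of why the range came out so wide. Every number and scenario below is hypothetical (none are Riot’s actual figures); the point is that each attribution assumption implies a very different revenue impact, and the data alone can’t tell you which assumption is right.

```python
# Hypothetical sensitivity sketch; all figures are illustrative, not real Riot numbers.
# Each scenario encodes a different causal assumption about how esports drives spending,
# and the implied impact swings across two orders of magnitude.

annual_revenue = 1_500_000_000  # assumed annual LoL revenue in USD (illustrative)

scenarios = {
    "only spend from players demonstrably acquired via esports": 0.001,
    "spend lift among existing players who watch esports": 0.01,
    "all spend correlated with esports viewership": 0.07,
}

for assumption, share in scenarios.items():
    print(f"{assumption}: ~${annual_revenue * share / 1e6:,.0f}M per year")
```

Running this prints roughly $1.5M, $15M, and $105M per year for the three assumptions, which is exactly the kind of spread an honest analyst ends up reporting when the methodology can’t be pinned down.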
Impactful
Hypotheses, after being verified or falsified, need to be impactful. The new knowledge gained should help make better decisions. To this end, the research director needs to understand what decisions are being made and how. The best scientists ask the right questions. More on this later.
Collect and Interpret Data
This step is where the technical skills of programming and statistics shine. This is what people picture when they hire a data scientist. Domain knowledge is less critical here, though it certainly speeds up the investigation and reduces communication burdens.
Collecting the right data requires knowing what data is worth collecting. An outsider data scientist could easily pull the wrong variables, grabbing useless information while omitting the relevant bits. A research director could simply list out all the variables they want to capture, but that burdens them with foresight and planning; presumably they hired someone to reduce their workload.
Context is key to interpreting data. For example, most video games present players’ rank (e.g. bronze, silver, gold) ostensibly as an indicator of player skill, but in reality many ranking systems have a grind-to-progress component where players can rise in rank without improving in skill.
The degree of grind-to-progress varies quite a lot between games. In hardcore PC esports like LoL and Counter-Strike, there is some grind-to-progress, but rank is mostly an indicator of skill. Mobile games usually have a much larger grind-to-progress component, and rank may reflect total playtime or dollars spent more than skill. An outsider analyst might find a relationship between player rank and spending in Brawl Stars and conclude that user acquisition should target skillful strategy gamers like chess players because they will spend more money.
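A toy simulation makes the trap concrete. Everything below is hypothetical (simulated numbers, not real Brawl Stars data): if rank is driven mostly by playtime, and spending also scales with playtime, then rank correlates strongly with spend even though skill has no effect on spend at all.

```python
# Toy simulation (hypothetical numbers, not real game data): rank looks predictive
# of spend only because both are driven by playtime, not skill.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

playtime = rng.exponential(scale=50, size=n)   # total hours played
skill = rng.normal(size=n)                     # latent skill, independent of playtime

# Grind-to-progress ranking: mostly playtime, a little skill
rank = 0.8 * (playtime / 50) + 0.2 * skill + rng.normal(scale=0.3, size=n)
# Spending scales with playtime, not skill
spend = 2.0 * playtime + rng.normal(scale=20, size=n)

print("corr(rank, spend): ", round(np.corrcoef(rank, spend)[0, 1], 2))   # strong
print("corr(skill, spend):", round(np.corrcoef(skill, spend)[0, 1], 2))  # near zero
```

The naive read is “high-rank players spend more, so target skilled players”; the correct read is that, in this game, rank is mostly a playtime proxy.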
Of course, a product expert would point out this misinterpretation of “rank” and allow the outsider analyst to correct the mistake. But this iteration loop requires time and communication, and the whole process is faster if the domain expertise and technical skills are combined in one person.
Make an Impact
Knowledge from data is used to make decisions. The more it improves decision-making, the more impactful it is. If the researcher is intimately familiar with the domain and involved in decision-making, they can home in on particular insights and make specific recommendations. However, if those processes are separated, researchers often end up making useless recommendations.
I ran into this problem during my brief stint at Flexport, a freight forwarding company. I was an expert in games and consumer tech, but a total novice in B2B logistics. I often found patterns in the data and would try to propose new ideas, only to find my insights were useless. “Those contracts are locked in months in advance,” “Ocean carriers don’t operate that way,” “Clients don’t care about X, they care about Y,” etc. My lack of domain knowledge prevented me from investigating impactful questions.
Hiring for Domain Knowledge
The traditional way to hire for domain expertise is to simply hire people with work experience at similar companies. This certainly works, and it is the most common method, though it does come with inbreeding risk. Hiring like-minded people can create an echo chamber, which may damage your game (as Novati writes about here). If your company is trying to innovate by breaking convention, bringing in the old guard may hamper that innovation.
Requiring relevant company experience also limits your labor pool. There’s a lot of potential talent out there, and only a fraction of it is already in the industry. Games are unusual in their consumer popularity and their superfans. Talented people with no games industry experience may still be intimately familiar with your game or genre. The same cannot be said for stuffier, drier industries like B2B logistics services.
Tangentially, I think the dryness of freight forwarding contributed to Flexport’s loose hiring and poor retention. Flexport branded itself as the upstart San Francisco tech disruptor of logistics, much as Uber was to taxis. But logistics is not a sexy industry. In its search for tech talent, Flexport hired plenty of tech workers who were inexperienced in and uninterested in logistics. They would inevitably be disoriented, slow to onboard, and quick to jump to another job. The average employee tenure was about two years.
One easy question that reveals a lot about a candidate’s domain knowledge is the blue sky question: if you joined our company as a data scientist and could do anything you wanted, what would you do? If the candidate has worked in the domain before, they should have a bucket list of projects they wanted to do but never had the bandwidth for. If they are an industry outsider but passionate about games, they should have plenty of curious questions on hand. If they can generate hypotheses with minimal context and prompting, then they likely have a solid foundation of domain knowledge to work from.
"For example, as a young analyst at Riot, I got a request from the higher-ups to measure the impact of esports on League of Legends revenue. There are so many correlations and confounds, I struggled to find any solid methodology. "
I think this deserves it's own article! How do you deal with poor data requests from the top. And if you do commit to the analysis, how do you provide the proper caveats when presenting the results such that the exec (in your example) doesn't go, "woah! 100 MM is a lot of money!". Thanks for the great article, Eric