How to become a research-oriented data scientist
The work that our team does at Our World in Data (OWID) has become a lot more visible recently. As a result, I receive more frequent emails asking for advice on working with us or on growing one’s skills toward similar positions. This post summarizes the skills needed to join our data team at OWID or an “OWID-like” organization.
What we pay attention to
Data wrangling is our data team’s fundamental work at OWID. It is thus an essential skill to master if you’re considering joining us. You will need to be fluent in the use of pandas or a similar package in R (dplyr, data.table). But our entire data pipeline relies on Python, so we strongly prefer data scientists who use this language.
The fact that this is our core skill should make it clear what we do, but also what we don’t do. Our team believes that machine learning isn’t the most urgent next step toward solving most of the world’s largest problems. We work with datasets that are small by industry standards. And our tech stack is not at the cutting edge of data science and cloud services. Instead, we want to provide the world with the cleanest, most reliable, best-documented datasets on crucial problems.
Other necessary skills
It’s not only the technical skills for data wrangling that are essential. Research-oriented data analysis implies using data to understand the world and help others do the same. The “expert data wrangler” presented above would thus also need the following:
- good “data judgement” (attention to detail, thoughtful tradeoffs between data quantity and quality, careful and systematic thinking in situations where there is no perfect solution);
- very good knowledge of data visualization principles and good practices;
- a good understanding of our work at Our World in Data and our mission;
- experience with version control systems (we rely heavily on GitHub);
- a basic ability to use a terminal and bash commands;
- fluency in English;
- strong experience with importing, transforming, and maintaining datasets for other users.
This last skill can seem difficult to showcase if you’ve recently graduated. But its presence here doesn’t mean you need to have worked for a large company. We love to hear from people who maintain open-source datasets on important subjects. We also highly value applications from candidates who have worked with some of our key sources (WDI, SDG, UNWPP, GBD, etc.).
If the skills listed above make up the trunk of a tree, secondary skills are the branches. You don’t need to grow all these branches to work well at OWID, but our data scientists tend to be proficient in at least one of them:
- strong knowledge of statistics;
- strong knowledge of programming;
- strong knowledge of academic research, ability to understand publications, experience with science communication;
- experience with developing, maintaining, and documenting large public datasets.
Beyond the skills that are useful to perform well in any job, here are the ones that are the most important for what we do:
- extreme attention to detail;
- being able to assess which data is accurate and insightful and which is not;
- recognizing shared behaviors and patterns that provide solutions to data problems;
- intellectual curiosity, openness to new ideas;
- interest in learning about novel research topics;
- flexibility to receive feedback, learn from new evidence, and change one’s mind;
- ability and drive to work without supervision;
- proactivity, assertiveness.
What we don’t particularly pay attention to
This section is only relevant to our work at OWID. Other organizations, including very similar ones, may need staff who fit the profiles described here.
There are a few things that aren’t on our list of criteria, although some people think that they are:
- having a Ph.D. (this isn’t necessary to join our data team);
- strong experience with machine learning, big data, cloud services, etc.;
- knowledge of many programming languages. In fact, “Python and nothing else” is a much better profile for us than “everything but Python.”
This doesn’t mean you won’t find people on our team with these characteristics. Half of our current data scientists have a Ph.D. Half used to develop machine learning models in previous jobs. All of us know languages other than Python. But none of us joined OWID because of these things. Instead, these characteristics are merely correlated with the skills we are looking for. People with a Ph.D. often understand academic research very well. People who have worked on ML models tend to have strong knowledge of statistics. And people who know many programming languages also tend to be experienced developers.
Even though projects and work experience are the best way to build your CV, I remain a big fan of book learning. Reading these five books is a great way to sharpen many of these skills at once:
- Hans Rosling, Factfulness
- Nate Silver, The Signal and the Noise
- Philip E. Tetlock, Superforecasting
- Edward R. Tufte, The Visual Display of Quantitative Information
- Julia Galef, The Scout Mindset
Where to work
Opportunities at OWID
Even though our team has grown a lot, we still rarely open new positions. When we do, you can find them on our Jobs page. You can also follow us on Twitter (OWID, myself) or LinkedIn, where we will usually advertise new positions multiple times.
Opportunities outside OWID
If you want to work at an organization similar to OWID, I recommend following the 80,000 Hours job board. 80,000 Hours is a nonprofit that provides free advice and support to help you have a greater impact with your career. Opportunities listed on the job board are always interesting; many are related to data and research. If you’re on Twitter, you should also follow @effective_jobs.
While you’re honing your skills or looking for a job, I’d strongly recommend checking whether your city has an existing “Data for Good” or “Tech for Good” community. Meetup is the platform they use most often. These groups are typically made up of volunteers who use their skills to work on interesting projects for a few weeks or months alongside their day job. While the projects vary in how important the problems they tackle are, joining one is generally a good way to meet like-minded people, get more experience, and enrich your portfolio.