Data Science in the Developing World

I was fascinated by this recent 80,000 Hours interview with Ofir Reich on using data science to end poverty. Ofir is an experienced data scientist working at the Center for Effective Global Action.

I found the interview inspiring, yet practical. I’ve had aspirations to work in global health as a data scientist for some time now, and I’d strongly encourage anyone else interested in the topic to listen to the entire interview.

The interview covers a wide range of topics. Some topics raised points I found particularly interesting, and it’s those that I’ve tried to highlight below.

Variety of data science projects

At the time of the interview, Ofir's organisation was working on three primary projects.

Targeting tax avoiders in India by applying machine learning to tax returns.
Turning cutting-edge knowledge into practical insights that can be sent to poor farmers in developing countries by text message.
Investigating payment systems for teachers in Afghanistan that are cheaper and more effective than the current one.

I was surprised by the breadth of these projects. I would have instinctively expected a focus on one country or on one area. Perhaps the range of projects is an indicator that a number of projects failed in the incubation phase.

One interesting thing is that all of these projects are scalable. It is easy to imagine a successful project in one of these areas generalising across countries or across subject groups.

Another interesting thing is to consider the cost effectiveness of each project. Ofir makes the point that even if a project doesn’t have a high overall impact, its cost effectiveness may be very high. Take sending text messages, which are so cheap as to be almost free. If the messages have even a small positive impact, their low price makes them very cost effective.
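To make the arithmetic concrete, here is a minimal sketch of the comparison. Every number in it is made up for illustration and none comes from the interview. The `Intervention` class, the "benefit units" and the in-person training comparator are all my own assumptions.

```python
from dataclasses import dataclass


@dataclass
class Intervention:
    name: str
    cost_per_person: float     # hypothetical, in US dollars
    benefit_per_person: float  # hypothetical, in arbitrary "benefit units"

    def cost_effectiveness(self) -> float:
        """Benefit delivered per dollar spent."""
        return self.benefit_per_person / self.cost_per_person


# Illustrative figures only: a cheap, low-impact intervention
# versus an expensive, high-impact one.
text_messages = Intervention("text messages", cost_per_person=0.10, benefit_per_person=1.0)
in_person_training = Intervention("in-person training", cost_per_person=50.0, benefit_per_person=100.0)

for item in (text_messages, in_person_training):
    print(f"{item.name}: {item.cost_effectiveness():.1f} benefit units per dollar")
```

With these invented numbers the text messages deliver a hundredth of the benefit per person, yet roughly five times the benefit per dollar.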

All three projects are based on collaboration with local governments, which can be good (you can potentially have a huge impact across the whole country) or bad (corruption can cripple your project).

Developing world, developing data

To do data science and machine learning you must have data. Funny, that.

Unfortunately the developing world has much less data than the developed world, partly because people there use much less technology than we do. There are fewer digital devices (phones, cameras, sensors, sound recordings, security footage) and also less transactional data, since people in developing countries are more likely to pay in cash and leave no paper trail. Technology is simply less pervasive in society than we are accustomed to in the Western world.

I mean, when was the last time you climbed a tree just to get a bar of cell phone reception?

This is the biggest obstacle to using data science as a weapon against global poverty. Large-scale datasets are the lifeblood of data science, and we are more likely to find them in the richer developing countries than the poorer ones. This restricts us to working primarily in the countries with more data, which paradoxically are the ones that need our help least. Meanwhile the poorer countries cannot generate data because they lack the technology to do so, and they can only get the technology once they escape their poverty, which we can’t help them with. A vicious feedback loop.

When are data scientists redundant?

How do you know whether, as a data scientist, you should be working on a particular project? One thing Ofir emphasises is that there are legitimate options other than direct work. Earning to give at a hedge fund, as a quantitative trader or even just as a data scientist could have more impact if the direct work would mean working on the wrong problem.

The value of manual labour can be a nice heuristic for your usefulness. Some problems can be solved using brute force alone. Put it this way: your salary as a data scientist could pay for a lot of cheap local labour, and if you are less effective than the locals that money would hire, that is a good indication you should not be working on the project.
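The heuristic boils down to a simple comparison, sketched below. The function names, the salary, the local wage and the "output" units are all hypothetical placeholders of mine, not figures from the interview.

```python
def workers_affordable(salary: float, local_wage: float) -> float:
    """Number of local workers one data scientist's salary could pay for."""
    if local_wage <= 0:
        raise ValueError("local_wage must be positive")
    return salary / local_wage


def data_scientist_worth_it(
    scientist_output: float,
    output_per_worker: float,
    salary: float,
    local_wage: float,
) -> bool:
    """True if the data scientist achieves more than the labour the same money buys."""
    manual_output = workers_affordable(salary, local_wage) * output_per_worker
    return scientist_output > manual_output


# Hypothetical annual figures, chosen only to illustrate the comparison.
SALARY = 100_000.0
LOCAL_WAGE = 2_000.0

print(workers_affordable(SALARY, LOCAL_WAGE))  # 50.0
print(data_scientist_worth_it(30.0, 1.0, SALARY, LOCAL_WAGE))  # False
print(data_scientist_worth_it(80.0, 1.0, SALARY, LOCAL_WAGE))  # True
```

The hard part in practice is estimating the two output figures in the same units. The sketch only shows where those estimates plug in.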

Working at a not-for-profit

Many data scientists want to work for a not-for-profit and make their difference that way. As a rule, when choosing a not-for-profit to work for, consider whether it relies on donations and, if possible, pick one that does not.

Not-for-profits with funding and not-for-profits that rely on donations are two different beasts. The ones with funding have some degree of independence, don’t have to pander to their donors, and are allowed to fail. The ones that rely on donations have to appease those who gave them money and are under pressure to hide failures, lest the donors take their money away. Organisations that are allowed to fail can take more risks, and you are much more likely to be effective in such a place.

An alternative to working for a not-for-profit, Ofir suggests, is to just follow the data. Look for where there are large systems and large data sets and go there. That tends to mean governmental work, but it could also be at large not-for-profits or elsewhere. This would be one way to potentially make a big difference.

In summary

The interview covered many more points than I have here, but these were the ones that I found most interesting. Especially interesting was the debunking of the myth, heard particularly loudly within tech bubbles, that data is going to solve all our problems.

Data science in the developing world is only going to grow. There will undoubtedly be great innovations here over the next few years and I am very excited to see the advances and their applications to human lives.