Justin Arenstein is the founder and chief executive of Code for Africa, Africa’s largest network of civic tech and open data groups, which brings together investigative reporters, fact checking organizations, and data scientists. He is based in Tbilisi, Georgia.
This is an extended cut of the interview from The Truth is Out There that has been edited for ease of reading.
Who is actually spreading disinformation?
Whether it is around climate or religion, or reproductive issues and gender rights, you can hire sophisticated, agile teams who build campaigns for you, including by creating fake accounts that reference each other and appear as coherent communities. They suck people in and build quite a bit of momentum.
In South Africa, we’ve seen the development of these kinds of networks using xenophobia as a rallying call, where South Africans believe that Nigerians and Kenyans and Zimbabweans and Mozambicans are taking jobs that belong to them. They hit on many of the same trigger points you’ll hear in conversations in the US or in Eastern Europe. They use a playbook that has been proven to work elsewhere, but they dress it in local language, and often generate manipulated media to support local claims.
We’re talking about tens of millions of dollars, and thousands of people employed in Africa and elsewhere, who craft material and amplify it. It’s a large industry, and I think the only way we’re ever going to bring it under control is to stop thinking of it in terms of good against evil, or of one ideological bloc versus another. We need to take a step back and map the economy that underlies disinformation. Just as you have a military-industrial complex, there is a disinformation-industrial complex. And the only way we’re going to defang it is by demonetizing it.
What are some of the techniques you use to expose disinformation?
Trying to fact check everything is whack-a-mole. It’s very worthwhile and we need to do it, but it does not scale, because we cannot operate at the scale that these machine-driven hate campaigns do.
We are the largest organization in Africa doing disinformation work, and we are only 93 people. We operate across 21 countries, and in those 21 countries there are probably half a billion people. Our fact checking team, which is only 30 people, currently works with 174 newsrooms across Africa to produce maybe 2,000 fact checks a year, which is not a big number. This has an outsized impact, in that Facebook can label or remove over 6 million posts per year based on those 2,000 fact checks. Even then, it’s still a drop in the ocean given the vast quantity of misinformation circulating online.
The machine learning and natural language processing tools we use don’t just track the use of specific terms; they seek to understand the emerging narratives or conversations, the zeitgeist that prepares people to be susceptible to the conspiracies that take hold of societies. We would be blind to all of that if we didn’t have machine learning to analyze millions of online articles and help us spot trends or outliers.
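To make that concrete, here is a minimal sketch, in Python, of what “spotting trends or outliers” in a stream of articles could look like. It is an illustration only, with assumed function names, tokenization, and a z-score threshold; it is not Code for Africa’s actual pipeline.

```python
# Minimal sketch (not Code for Africa's actual tooling): flag terms whose
# frequency in the latest week spikes well above their historical baseline,
# a crude stand-in for "spotting trends or outliers" in media coverage.
from collections import Counter, defaultdict
from statistics import mean, stdev

def weekly_term_counts(articles):
    """articles: iterable of (iso_week, text) pairs -> {week: Counter(term)}."""
    counts = defaultdict(Counter)
    for week, text in articles:
        counts[week].update(tok.lower() for tok in text.split() if len(tok) > 3)
    return counts

def spiking_terms(counts, latest_week, z_threshold=3.0):
    """Return terms whose count in latest_week is more than z_threshold
    standard deviations above their mean in earlier weeks (needs >= 3 weeks)."""
    history_weeks = sorted(w for w in counts if w < latest_week)
    vocabulary = set().union(*(counts[w] for w in counts))
    spikes = {}
    for term in vocabulary:
        history = [counts[w][term] for w in history_weeks]
        if len(history) < 3:
            continue
        mu, sigma = mean(history), stdev(history)
        latest = counts[latest_week][term]
        if sigma > 0 and (latest - mu) / sigma > z_threshold:
            spikes[term] = latest
    return spikes
```

A real system would work on embeddings and narratives rather than raw word counts, but the shape of the problem, comparing current activity against a learned baseline, is the same.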
We’ve built on top of an awesome toolkit called Media Cloud that was developed at MIT many years ago, and has been used for tracking media reporting across the world, including with a very big focus on election reporting and coverage. We helped spin up our own version of that, which enables us to track media across different sectors.
We’ve also tapped into increasingly more affordable machine learning tools to map individuals and amplification networks, so we can start to understand how governments and lobbyists are putting together bot armies or troll farms that either amplify the kinds of content or voices that they want, or attack, intimidate, and persecute the kinds of voices that are not in support of their position. We couldn’t do that if we didn’t have these tools.
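As a rough illustration of what mapping an amplification network can involve, the sketch below links accounts that post the same URL within a short time window; tightly connected clusters are a starting signal for possible coordination, not proof of it. The function names, the time window, and the threshold are all assumptions, not the team’s actual methodology.

```python
# Minimal sketch (an illustration, not Code for Africa's actual tooling):
# build a co-amplification graph where two accounts are linked whenever they
# post the same URL within a short time window.
from collections import defaultdict
from itertools import combinations

def co_amplification_edges(posts, window_seconds=300):
    """posts: iterable of (account_id, url, unix_timestamp).
    Returns {(account_a, account_b): number of near-simultaneous shared URLs}."""
    by_url = defaultdict(list)
    for account, url, ts in posts:
        by_url[url].append((ts, account))
    edges = defaultdict(int)
    for url, shares in by_url.items():
        shares.sort()  # order by timestamp
        for (t1, a1), (t2, a2) in combinations(shares, 2):
            if a1 != a2 and abs(t2 - t1) <= window_seconds:
                edges[tuple(sorted((a1, a2)))] += 1
    return edges

def suspicious_pairs(edges, min_shared=10):
    """Account pairs that co-amplified at least min_shared URLs."""
    return {pair: n for pair, n in edges.items() if n >= min_shared}
```

In practice these graphs are fed into more sophisticated clustering and human review, since organic fan communities can also share the same links in bursts.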
The algorithms that drive many of these tools are often engineered for Northern audiences, Northern issues, or even just Northern ways of communicating. And by North, I mean Europe and North America. So a lot of what we do is try to reverse engineer what those algorithms do, and make them more appropriate to the worlds we operate in. You communicate differently on SMS, for example, or on pure text-based messaging platforms, than you do on a platform where you can also share embedded rich media, like video or audio content. So understanding some of that communication culture is important for calibrating AI tools.
How else are you using AI?
Our team in Cape Town is called Civic Signal, and they are a machine intelligence research team. They explore AI technologies to figure out what they’re capable of in ‘the real world’, by which we mean places like Africa, where there is not a large amount of data to train your model on. A machine learning brain is only as good as the foundational training it’s had. And if there is not enough information to train it on, you’re not going to have very clever machine intelligence. So the team in Cape Town is starting to build some of these large training datasets. They do that by hoovering up certain types of comments and media reporting, from credible media as well as less trustworthy media, so that you can compare. And then we also scoop up content from government communicators. We’ve built a very large structured data portal. It’s the largest on the continent.
We’re creating the raw material that AI tools need to train themselves on. Unstructured data is super important, but you also need a bit of a knowledge graph, or structured data: telling an algorithm that this piece of information relates to that data point in a particular way, and placing a value on that relationship. What the AI team in Cape Town does is often quite laborious and manual, and not actually done by machines. It’s a bunch of human analysts putting stuff together so we can feed it into algorithms. We also work with colleagues elsewhere in the world who are pushing human rights focused machine learning, across multiple North American universities. We learn from collaborating with each other, because you really don’t want to try to build everything yourself from scratch. So a large part of what the team in Cape Town does is find savvy partnerships and common interests, and then build on those.
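For readers unfamiliar with knowledge graphs, the structure being described, a relationship between two data points with a value placed on it, can be as simple as the sketch below. The field names and confidence scoring are assumptions for illustration, not Code for Africa’s actual schema.

```python
# Minimal sketch (assumed structure, not Code for Africa's actual schema):
# a knowledge graph entry stating how one data point relates to another,
# with a value (here, a confidence score) placed on that relationship.
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    subject: str        # e.g. an article, account, or claim identifier
    predicate: str      # the type of relationship, e.g. "amplifies", "cites"
    obj: str            # the related entity
    confidence: float   # analyst- or model-assigned strength of the link

def to_triples(relations, min_confidence=0.5):
    """Export relations above a confidence threshold as (s, p, o) triples,
    the form most graph and machine learning tooling expects for training."""
    return [(r.subject, r.predicate, r.obj)
            for r in relations if r.confidence >= min_confidence]

# Example: a hypothetical, hand-labelled link produced by a human analyst.
example = Relation("account:12345", "amplifies", "article:health-claim", 0.9)
```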
What should social media companies be doing to fight this problem at scale?
There are no simple solutions to the challenges they’re facing. Language is one big issue. It’s very difficult to police content if you don’t understand what is being said. In a country like Nigeria, there are more than 200 languages. Ethiopia has similar levels of complexity. In South Africa, there are 11 official languages. If the early warning machine learning algorithms that try to catch a lot of this abuse can’t understand those languages, then it’s difficult. Human moderation is what the large social media companies fall back on to try to teach the algorithms. But that’s not enough. And it exposes people to horrific content.
I mean, this is glib, but there should be more collaboration. The platforms should be doing more sharing and more joint problem solving, to address a problem that ultimately they’ve created. That means everything from creating standards, to sharing watch lists, to having threat disruption teams from different platforms all sharing notes, so that there’s common intelligence to act on.
Second, they’ll never be able to be everywhere, all the time. So they need to figure out productive and sustainable ways of collaborating with a wider circle of groups: with the media, with fact checkers, with political watchdog ecosystems, all of whom come with slightly different agendas and mandates. That is healthy, because otherwise there’s group-think. I think not enough of that has happened.
Some platforms are collaborating with global networks of fact checkers, which by its nature is elitist. You’ve got to adhere to certain standards for transparency and resource limits, to ensure that you can do the work, but that excludes almost everyone. Some fact checkers, for example, get access to algorithmic tools that surface potential misinformation, which the public or academic researchers wouldn’t have access to. I think we need better transparency into what that access is and what the rationale is for how it is controlled. Sometimes it’s to protect the private data of users, but we could push back if we thought those limits were set too conservatively. We could have an informed public debate and discussion, because we do need to balance privacy with the ability to analyze and root out abusers. We need to figure out ways of creating layered interactions, where grassroots media and people operating at the village level also have access to resources, techniques and AI tools, so their work can cascade up and feed into ever more complex organizations.
Yet another thing is responsiveness. For anyone who’s ever tried to report disinformation or abuse on social media, if you hear back from a real human instead of a bot, you’re extremely lucky. Platforms need to make better efforts to actually review and respond when people report abuses. We’re starting to see some experiments around this. Again, this is a scale issue. Platforms need to understand that they can’t do this on their own.
Mozilla has taken reasonable steps to ensure the accuracy of the statements made during the interview, but the words and opinions presented here are ascribed entirely to the interviewee.
Photo of Justin Arenstein by Mohamed Nanabhay on Flickr (CC-BY), 2016