Transcript: The Truth is Out There (EP4)

This is the transcript of Mozilla’s IRL podcast episode, The Truth is Out There, from August 29, 2022 (IRL Season 06, Episode 04).

Bridget Todd:

Have you ever shared something online that turned out not to be true? I have. In fact, not that long ago, I shared a photo of an adorable horse named Sugar, who supposedly pretends to be asleep to avoid being ridden. That photo was shared over 40,000 times on Twitter. And I thought, “Well, it must be true.” But as it turns out, Sugar was just taking a nap. OK, so mistakes happen, but misinformation at scale can also be really dangerous, especially when it relates to our democracies. Online disinformation has reached new highs because it’s not just a human problem, it’s an algorithmic problem, too. One that’s exploited by murky political forces. In your country and in mine, in all of our elections, in different languages, in different political contexts. Are platforms doing enough to fight it?

Sahar Massachi:

These companies have not allocated enough resources to protecting all elections at once, much less societies between elections. Because if you’re paying attention to a country three weeks before the vote happens, the die is cast. Malicious people have spent a year creating soccer fan pages that they slowly politicize.

Bridget Todd:

That’s Sahar Massachi. He used to be a data engineer on Facebook’s integrity team, working on elections. We’ll hear more from him in a bit. I’m Bridget Todd, and this is IRL, an original podcast from Mozilla, the nonprofit behind Firefox. This season, five episodes on the perils and promise of artificial intelligence, for the internet and in real life. We’re meeting AI builders and policy folks who make AI more trustworthy in this special season that doubles as Mozilla’s 2022 Internet Health Report. This time it’s AI, elections and disinformation. How can we create healthier information ecosystems?

Bridget Todd:

Tracking disinformation is about unraveling mysteries. It’s about spotting clues and patterns that can lead you to its source. Let’s meet someone who works on dismantling disinformation across more than 20 countries. And no, it’s not an employee of Twitter, Facebook, or TikTok. It’s Justin Arenstein, the founder and chief executive of Code for Africa. This is Africa’s largest network of civic tech and open data groups. It includes investigative reporters, fact checking groups and data scientists.

Justin:

Whether it be things around climate or around religion or around kind of reproductive issues and gender rights, you can hire sophisticated, small, agile teams, who then are able to build campaigns for you, including creating fake accounts that kind of reference each other and look like and function like coherent communities, and suck people in and build quite a bit of momentum.

Bridget Todd:

Justin is from South Africa. He’s currently based in Tbilisi, Georgia. He wants us to understand that there is a global industry of disinformation for hire, worth tens of millions of dollars. Like a well-oiled machine, these networks make fake content and lots of it. More importantly, they create artificial surges of attention on topics. They use bots and networks of social media accounts. They post content in a way that’s designed to game social media algorithms, which in turn amplify these messages. Eventually humans and media organizations begin to genuinely engage with it, and this is how social media is weaponized.

Justin:

Like you have a military industrial complex, there is a kind of disinformation industrial complex, and the only way we’re going to defang it is by demonetizing it.

Bridget Todd:

As disinformation networks become more powerful, they influence the messages we hear and how we interact with one another, disrupting democratic discourse. Drawing them out of the shadows and taking down their networks threatens their business model. It shines a light on those who benefit politically from misinformation.

Justin:

In South Africa, we’ve seen the development of these kinds of networks using xenophobia as kind of a rallying call, where South Africans believe that Nigerians and Kenyans and Zimbabweans and Mozambicans are taking jobs that belong to them, and hitting on very many of the same trigger points that you’ll hear in conversations in the US, in Eastern Europe, or elsewhere. They’re using a playbook that has been proven to work elsewhere. They’re dressing it in local language and often generating manipulated media to support local claims.

Bridget Todd:

Justin says disinformation adds fuel to the fire in countries where there are already electoral coups, religious insurgencies and foreign mercenaries.

Justin:

Trying to fact check everything is whack-a-mole. It’s very worthwhile and we need to do it, but it does not scale because we cannot operate at the same level that these machine-generated hate machines do.

Bridget Todd:

Code for Africa coordinates fact checking teams in more than 170 newsrooms across Africa. Their journalism becomes training data for machine learning tools.

Justin:

We operate across 21 countries. In those 21 countries, it’s probably half a billion people, and we are the largest organization in this space in Africa doing disinformation work. And we are only 93 people. And we have an outsized impact. I mean, our fact checking team, which is only 30 people, produces maybe 2,000 fact checks a year, which is not a big number. But that in turn has a multiplier impact: Facebook alone labels or removes over 6 million posts per year based on those 2,000 fact checks.

Bridget Todd:

Justin acknowledges that this is just a drop in a sea of disinformation. But why is a small nonprofit in Africa working to clean up the platforms of the world’s richest internet companies?

Justin:

What should social media companies be doing to fight this problem at scale? I mean, this is glib, but there should be more collaboration, not just between social media companies themselves or technology companies. The platforms should be doing more sharing and more joint problem solving to solve a problem that ultimately they’ve created.

Bridget Todd:

So these big platforms should be collaborating with each other, sharing intelligence and keeping disinformation from spreading from one channel to another.

Justin:

They’ll never be able to be everywhere all the time. And so they need to figure out productive, sustainable ways of collaborating with a wider set or circle of watchdogs. In the media, in fact checking, in kind of political watchdog ecosystems.

Bridget Todd:

That last point is important. African countries are diverse in cultures and spoken languages. In Nigeria alone, over 500 languages are spoken. Code for Africa’s team members speak local languages. They understand local history and geopolitics. They’re building giant data sets of media content from the region, and they use AI to supercharge their work.

Justin:

We use machine learning tools and natural language processing tools not just to track the use of specific terms, but to understand emerging narratives or conversations, and the zeitgeist that prepares people to be susceptible to these tropes or these conspiracies that take hold of societies. We would be blind to all of that if we didn’t have, kind of, machine learning to be able to analyze millions of online articles and help us spot the trends or the outliers. So we use a lot of tools.
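
To give a concrete, if simplified, sense of what that kind of trend-spotting can involve, here is a minimal sketch that flags terms whose mention counts in the latest day's articles spike well above their recent average. It illustrates the general technique only; the thresholds, data shapes, and example data are assumptions for this sketch, not Code for Africa's actual tooling.

```python
from collections import Counter
from statistics import mean, stdev

def spiking_terms(daily_articles, window=7, min_sigma=3.0):
    """daily_articles: one list of tokenized articles per day, oldest first."""
    counts_per_day = [Counter(tok for article in day for tok in article)
                      for day in daily_articles]
    history = counts_per_day[:-1][-window:]  # recent days before today
    today = counts_per_day[-1]
    spikes = {}
    for term, count in today.items():
        past = [day_counts[term] for day_counts in history]  # Counter gives 0 if absent
        if len(past) < 2:
            continue
        mu, sigma = mean(past), stdev(past)
        if count > mu + min_sigma * max(sigma, 1.0):  # guard against zero variance
            spikes[term] = count
    return spikes

# Tiny hypothetical example: "visa" suddenly dominates the latest day's articles.
days = [[["election", "visa"]], [["election"]], [["visa"] * 20 + ["election"]]]
print(spiking_terms(days, window=2))  # {'visa': 20}
```

A production system would work over millions of articles in many languages and use far richer models, but the underlying idea of comparing today's signal against a recent baseline to spot outliers is the same.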

Bridget Todd:

Some big platforms do collaborate with and even pay local organizations. But Justin says only the largest fact checking networks are invited by platforms to partner with them. He believes the work Code for Africa and other researchers do, developing and sharing AI tools, can create a positive ripple effect to empower smaller groups.

Justin:

We need to figure out ways of cascading down these ecosystems and creating layered interactions where grassroots media have access to resources and to techniques and tools and their work can feed into an ever cascading up series of more complex organizations who can start using AI and the kinds of technologies that will never be in the reach of people operating down at village level.

Bridget Todd:

Justin gives big platforms like Facebook credit for the work they’re doing on disinformation, even though he thinks they could be doing more.

Justin:

They at least are doing things that we can see and we can criticize. The problems are the closed channels, the Signals, the WhatsApps, where it’s invisible, the dark social.

Bridget Todd:

Dark social. That sounds much spookier than what I see in my WhatsApp chats with my family. But messaging apps are key vectors of political and electoral disinformation in many countries. Some are end to end encrypted. That makes them more secure, but it also makes them harder to study from the outside, especially when companies are not transparent.

Tarunima Prabhakar:

Part of what was happening in 2018 was that there was a lot of attention on Facebook and Twitter. And WhatsApp had somehow still not come to global attention, but for many of us in India and also in other countries that was the dominant platform.

Bridget Todd:

That’s Tarunima Prabhakar. She’s the research lead and co-founder of the open source project, Tattle. That’s a community of technologists and researchers in India. They build machine learning tools and data sets to understand and respond to misinformation. They began their work three years ago, trying to crack a puzzle. How could they help people verify information on WhatsApp when the company wasn’t?

Tarunima Prabhakar:

On WhatsApp, a lot of the content is in audiovisual format. You don’t necessarily have URLs. You just have images and voice recordings that are shared on the platform. We knew that if you wanted to do any automation, be it around linking it to a fact check report or linking it to content that has been shared in the past, you had to be able to work with the language that content was being shared in as well as the modality, which is whether it’s image, video, or audio. So that’s where we started with building the sort of machine learning tools to just process the content and the modality and the languages that are used in India. And then we also started archiving a lot of this content. So we started archiving content that was circulated on chat apps. We started creating a data set of content that had been fact checked in India.

Bridget Todd:

They collected those fact checks using a technique called scraping, and the WhatsApp messages were from large, public groups. Then they developed a searchable repository for cross-referencing chat messages with the fact checks. In the process of doing this work, they were stunned by the close connection between hate speech and misinformation.
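
As a rough illustration of what that cross-referencing step can look like, here is a minimal sketch that compares an incoming chat message against a small, hypothetical archive of fact-checked claims using simple text similarity. Tattle's actual tools also handle images, audio, and multiple Indian languages; the archive entries, URLs, and threshold below are made up for the example.

```python
from difflib import SequenceMatcher

# Hypothetical archive entries: (claim text, verdict, source URL).
FACT_CHECKS = [
    ("Drinking hot water cures the virus", "False", "https://example.org/check/1"),
    ("Polling date has been moved to next month", "False", "https://example.org/check/2"),
]

def find_fact_checks(message, threshold=0.6):
    """Return archived fact checks whose claim closely resembles the message."""
    msg = message.lower().strip()
    matches = []
    for claim, verdict, url in FACT_CHECKS:
        score = SequenceMatcher(None, msg, claim.lower()).ratio()
        if score >= threshold:
            matches.append((round(score, 2), claim, verdict, url))
    return sorted(matches, reverse=True)

print(find_fact_checks("Drinking hot water cures the virus!! Forward to everyone"))
```

In practice this lookup would use multilingual embeddings or perceptual hashes for images rather than character-level similarity, but the pipeline shape, archive the fact checks and search them for near-matches to new messages, is the same.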

Tarunima Prabhakar:

We were collecting, scraping data from some of these other social media platforms. We were just sort of shocked at how much hate speech there was on these platforms and were thinking about why platforms had not addressed these issues. But also, before we opened some of this data set for, say, research or journalistic storytelling, we wanted to filter out the hate speech. So we found ourselves in a place where we needed to do content moderation before releasing this data, and we didn’t have the tools to do it.

Bridget Todd:

So Tattle’s work expanded to include tools and plugins for moderating content in their own open data sets in Hindi, Tamil, and Indian English. But why haven’t the platforms addressed these issues? With so much knowledge and research into hate speech and disinformation, why is there still so much of it? One reason is language.

Tarunima Prabhakar:

Our experience with most platforms is that they’re just not geared to handle Indian languages very well.

Bridget Todd:

Let’s talk about content moderators; the people whose job it is to see the most gruesome content on the internet. Part of their job involves labeling content so it becomes training data for automated content moderation. This is important work. We know from Facebook whistleblowers like Frances Haugen, that only a small proportion of moderation happens outside of the US, even though 90% of Facebook users live elsewhere. In other words, most languages are under-resourced. And even for English, there are many different dialects.

Tarunima Prabhakar:

Even though a lot of Indian social media users will use English, they will use it in a very distinct way with a lot of mixing and matching with regional languages and using words in ways that you wouldn’t use them in American English.

Bridget Todd:

Here’s a quick question. Which countries have the most Facebook users? The answer is India, the US and Indonesia. But the number of users in a place doesn’t always mean that a platform is going to be more accountable to them. Things that are done to protect elections in one place never happen in others. And that’s true for all platforms.

Sahar Massachi:

Should it be the case that a company has at least a million people speaking a language that no one in the company understands? At the very least, you need to have some people who understand the language and are paying attention to it as their job.

Bridget Todd:

Sahar Massachi is the co-founder of the nonprofit Integrity Institute in the US. It’s a new member organization for people who work on integrity teams at social media platforms. Sahar worked at Facebook as a data engineer and developed technical tools to protect elections by showing accurate information to voters. He also looked at real-time dashboards in what they called a war room to identify spikes in misinformation. These can be caused by bots.

Sahar Massachi:

It’s well known that the reshare button is really dangerous or the retweet button. And if a thing is reshared many times, it’s very likely to be bad. You could imagine putting in place speed bumps for that, or ways to sort of make it harder to quote tweet a quote tweet, or retweet a retweet of a retweet. That’s not technically very hard. The real challenge is arguing about why you should be allowed to launch it. And almost 90% of the job could be just sort of diplomacy around being allowed to do the rest of your job.
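
A reshare "speed bump" of the kind Sahar describes can be sketched in a few lines: measure how deep the reshare chain already is, and add friction once it crosses a threshold. The depth limit, data model, and friction action below are hypothetical illustrations, not how Facebook, Twitter, or any other platform actually implements this.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Post:
    author: str
    reshare_of: Optional["Post"] = None  # the post this one reshares, if any

def reshare_depth(post: Post) -> int:
    """How many reshare hops separate this post from the original."""
    depth = 0
    while post.reshare_of is not None:
        post = post.reshare_of
        depth += 1
    return depth

def on_reshare_clicked(post: Post, max_frictionless_depth: int = 2) -> str:
    """Decide whether to reshare immediately or show a friction prompt first."""
    if reshare_depth(post) >= max_frictionless_depth:
        return "show_confirmation_prompt"  # hypothetical UI action
    return "reshare_immediately"

original = Post("alice")
deep_chain = Post("carol", reshare_of=Post("bob", reshare_of=original))
print(on_reshare_clicked(original))    # reshare_immediately
print(on_reshare_clicked(deep_chain))  # show_confirmation_prompt
```

As the quote suggests, the hard part is not this logic; it is winning the internal argument to ship it.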

Bridget Todd:

By diplomacy, Sahar means they need to negotiate for changes. Integrity teams are considered cost centers, at odds with other teams focused on growth.

Sahar Massachi:

Your role is to be that inconvenient person who points out why the easy thing or the growth making thing might not be a good idea. It’s a really kind of awkward position to be in.

Bridget Todd:

In a company like Facebook, a lot of different teams work on policies and moderation. Integrity work is just one piece of the puzzle.

Sahar Massachi:

I think it’s fair to say that in general, integrity teams are newer and less well resourced than we would like.

Bridget Todd:

Sahar says he’s especially proud of the work his team at Facebook did on the 2018 Brazilian presidential election and US midterm elections. He acknowledges that there were missteps, but he describes intense periods of work where they grew in terms of skills and capacity.

Sahar Massachi:

We are pulling 14-hour shifts in this windowless, stinky room. You would build a tool on day one, show up on day two for your shift, and someone had upgraded it. By day five, someone had built a whole new tool that did a better job. By day seven, someone might’ve actually written up documentation about how to use it.

Bridget Todd:

Sahar said it was rewarding when different parts of the company would pull together.

Sahar Massachi:

Different teams around the company really cared about it. And you were able to pull them in and say, “We really need you to teach us how to use this tool that you built so that we can use it. Or we really need you to change your app or your product temporarily so that it is safer.” And they would do it because they really wanted to do the right thing.

Bridget Todd:

Companies are secretive about what they do or don’t do to fight disinformation. And Sahar says this makes collaboration with outside groups difficult, even when they could really help. He hopes the Integrity Institute will serve as a bridge to the inside.

Sahar Massachi:

Every company’s probably different. One way that companies are different is in the way that they think about the outside world and how comfortable they are for their workers to talk to the outside world. And a level of paranoia or being locked down in one company really can surprise you if you come from a different company. Part of what the Integrity Institute is trying to do is really speak to that and fix that. And we say that we’re representing integrity workers so that we are the one place where NGOs and academics and the rest of the world can come talk to us. And then we’ll sort of disseminate it to the workers or the integrity professionals who are our members into their day jobs.

Bridget Todd:

Let me introduce you to one more person. Raashi Saxena coordinates global contributions to a crowdsourced data set of online hate speech called Hatebase, through an initiative called the Citizen Linguist Lab. This is all run by the Sentinel Project in Canada, but Raashi works remotely from Bangalore, India.

Raashi:

The Citizen Linguist Lab is really for anyone across the board that wants to contribute and amplify and augment our database. We, in many cases, might lack the social, cultural and linguistic context of things and who better to contribute than the locals who live in the particular setting.

Bridget Todd:

Hatebase now covers 98 languages spoken across many countries. They work with hundreds of universities to research the impact of hate speech and misinformation, particularly leading up to elections. Raashi explains the connection like this.

Raashi:

Hate speech loads the gun, but misinformation pulls the trigger. Hate speech in itself might not contribute to offline violence, but it kind of sets the tone and the environment of background hostility towards a particular community or ethnicity. And then rumors in the form of malignant information, harmful information that circulate around that can perhaps lead to election violence.

Bridget Todd:

The Sentinel Project has documented this dynamic in numerous countries: Kenya, the Democratic Republic of the Congo, South Sudan, Sri Lanka, and Myanmar. Their mission is to prevent mass atrocities through early warning and cooperation. The global repository of hate speech enables them to record automated sightings of offensive terms across the internet, nearly a million of them. They label terms in a way that lets them take the temperature of conflicts.

Raashi:

Every contributor is called a citizen linguist, and they can also help by offering their assessment of the offensiveness of a particular term, which is then calculated with all other inputs. This helps us to sort of crowdsource sentiment analysis as one part of the output of the system. So the offensiveness rating kind of helps us to understand the social and political environment.
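
As a simplified illustration of how those crowdsourced ratings can be combined, here is a minimal sketch that averages individual offensiveness assessments for each term. The 0-100 scale, the plain average, and the example terms are assumptions for this sketch, not Hatebase's actual formula.

```python
from statistics import mean

def aggregate_offensiveness(ratings):
    """Combine individual 0-100 offensiveness assessments into one score."""
    if not ratings:
        return None
    return round(mean(ratings), 1)

# Hypothetical citizen linguist ratings for two terms.
term_ratings = {"term_a": [90, 85, 100, 70], "term_b": [20, 35, 10]}
scores = {term: aggregate_offensiveness(r) for term, r in term_ratings.items()}
print(scores)  # higher average scores suggest a more hostile climate around a term
```

Tracking how such scores shift over time, and where, is one way a repository like this can help take the temperature of a conflict ahead of an election.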

Bridget Todd:

The data can be accessed by local and human rights groups free of charge, while big platforms can pay for access. It’s a resource to help moderate online conversations. It’s designed to keep people safer in real life.

Bridget Todd:

With important elections taking place around the world, it is vital that platforms get a handle on disinformation. This isn’t something that any one company can handle alone. It can’t be solved in secrecy with content moderation algorithms or underpaid and unprotected moderators. We need companies to practice meaningful transparency so they can collaborate better with each other and local groups. This would empower researchers to uncover harmful disinformation that transcends platforms, languages, and media ecosystems. And those policies that platforms create for transparency and safety during elections, they shouldn’t just apply in some countries. They should apply everywhere. This is IRL, an original podcast from Mozilla, the nonprofit behind Firefox. This season of IRL doubles as the Internet Health Report. You can read more about our guests and AI by visiting internethealthreport.org. I’m Bridget Todd. Thanks so much for listening. For more on what can be done, look up Mozilla’s minimum election standards for platforms.