Transcript: AI from Above (EP3)

This is a transcript of Mozilla’s IRL podcast episode, AI from Above, from August 15, 2022 (IRL Season 06, Episode 03).

Bridget Todd:

Maps are powerful. Since long before the internet, people have used them to make sense of the world. Once, to plan a backpacking trip, I even made a map of all the best secret swimming holes I grew up splashing in across Virginia.

Bridget Todd:

But maps don’t always tell the same story. There’s a satellite above the clouds right now, photographing your neighborhood. An aerial picture can tell a thousand stories, especially with a little help from machine learning and optical recognition. But who gets to tell those stories? What happens if your entire neighborhood is invisible or misrepresented?

Astha Kapoor:

You will not, say for instance, have lampposts or sidewalks, or the conditions of your roads would be terrible, because you, as a community, as a space, are not visible to either the government or to private sector and community providers.

Bridget Todd:

That’s Astha Kapoor. She’s the co-founder of an institute in India where she researches data and artificial intelligence. We’ll come back to Astha in a bit.

Bridget Todd:

I’m Bridget Todd, and this is IRL, an original podcast from the nonprofit Mozilla. This season, five episodes on the perils and promise of artificial intelligence. We’ll meet AI builders and policy folks who are making AI more trustworthy in real life. This season also doubles as Mozilla’s internet health report.

Bridget Todd:

Today, AI and Spatial Justice. Huge geospatial data sets based on satellite images, cell phone location data, and more are combined with AI systems. These are controlled by giant corporations, governments, and defense organizations. The data feeds into their decisions, from border control, to where a hospital should go, to how many internet cables should be on your street. So who exactly controls what goes into those big data sets? Who draws the lines on the maps of tomorrow?

Bridget Todd:

We’re in a roadside market in Tembisa, the second largest township in South Africa, home to more than half a million people. It’s morning. People are dodging traffic to buy fruits and vegetables. It’s a close-up scene of life in South Africa. But right now, we are here to tell the story of an AI researcher who is mapping the spatial legacy of Apartheid. Like many other townships, Tembisa was established when Black Africans were forcibly relocated during the Apartheid era. A lot has changed here since then, but the area is still marked by deep poverty, wastewater problems, and electricity outages.

Bridget Todd:

Now, shift your perspective to a bird’s eye view of Tembisa. You’ll see a tight grid of streets and a dense population with little green space. But just next door is the city of Kempton Park. From above, we see large homes on leafy tree lined streets beside luxurious golf courses. It’s like another world, yet it’s right next door.

Raesetje Sefala:

As much as Apartheid has ended, the visual characteristics of townships are still the same.

Bridget Todd:

That’s computer vision researcher Raesetje Sefala. As someone who grew up in South Africa, she knows that where people live can keep them from economic opportunities. It determines access to education, healthcare, and jobs. I’ve also seen this in Chicago, and where I live in the Washington DC area, where you’ll find some of the wealthiest neighborhoods in the United States sitting right next door to extreme poverty. You see it in Brazil and lots of other places outside of South Africa. But often, the spatial dimension of inequality is overlooked. Even when it’s not documented in official data sets, it exists. It’s spatial injustice.

Bridget Todd:

So how can we apply AI to this problem? From her home base in Johannesburg, Raesetje is using satellite imagery and AI to identify wealthy and less wealthy neighborhoods. This creates useful data about the disparity of resources. Data that can be used to advocate for change. Because usually in South Africa, the injustice gets glossed over through zoning categories. For example, both Tembisa and its wealthy suburban neighbor are called Formal Residential Areas in the eyes of the government, despite the huge differences.

Raesetje Sefala:

If you go to a public hospital and you’re waiting for a doctor, for example, you could even wait the whole night, because there’s a whole lot of people who are also waiting for a doctor. This is a hospital in the township, but then if you go to a public hospital in a suburb, it’s a different experience altogether. You get to see the doctor, service is different because there’s not as many people as compared to hospitals in townships and places like that.

Bridget Todd:

Armed with her expertise in computer vision and machine learning, Raesetje set out to clearly identify disparities between neighborhoods. She assembled a lot of data: satellite images from the South African National Space Agency, and a building data set from a public utility company. Then she started labeling buildings and neighborhoods with input from local grad students.

Raesetje Sefala:

Putting together data sets like that, now we can redefine the concept of a neighborhood, to actually define that it’s an occupied place with a certain land use characteristic. At the end, if you give it an image it hasn’t seen before, it can predict what the neighborhood type most likely is. So we used models that were already trained on other tasks and have been doing well. We just took the algorithms and then we trained them on our datasets.

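[Editor’s note: What Raesetje describes here is transfer learning: taking an image classifier pretrained on another task and fine-tuning it on labeled satellite tiles. The sketch below is purely illustrative, not the project’s actual code; the tile folder, class names, and hyperparameters are all assumptions.]

```python
# Illustrative only: fine-tune an ImageNet-pretrained classifier to
# label satellite tiles by neighborhood type. Paths, class names, and
# hyperparameters are assumptions, not details of the real project.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Assumed layout: tiles/<class_name>/*.png, with hypothetical classes
# such as "township", "suburb", "industrial", "vacant_land".
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
dataset = datasets.ImageFolder("tiles", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

# Start from a model "already trained on other tasks" (here ImageNet)
# and replace its final layer to predict neighborhood types instead.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(dataset.classes))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

# Given a tile the model has never seen, predict the most likely
# neighborhood type.
model.eval()
with torch.no_grad():
    tile, _ = dataset[0]
    pred = model(tile.unsqueeze(0)).argmax(dim=1).item()
    print(dataset.classes[pred])
```
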
Bridget Todd:

Raesetje’s dataset enables townships to be seen for what they are, in the eyes of their citizens.

Raesetje Sefala:

Evaluating whether or not that new growth that you see on the side is actually a suburb versus a township requires some domain knowledge. If you don’t have local knowledge, especially on these nuanced problems that we’re trying to solve, it can be very difficult to evaluate.

Bridget Todd:

Raesetje herself grew up in Lebowakgomo, a township about three hours north of Johannesburg. Using AI, Raesetje found that more than 2 million of the 12 million buildings in South Africa are in townships. Raesetje is a research fellow at the Distributed AI Research Institute, DAIR for short. It was founded by Timnit Gebru to do independent, community-rooted AI research. Timnit is the former co-lead of Google’s AI ethics team who was fired in 2020. She’s a vocal critic of big tech’s influence on AI. Raesetje’s research paper, co-authored by Timnit, is among the first featured by the institute.

Raesetje Sefala:

I chose not to work with a big tech company because we don’t solve the same problems. We don’t have the same mandate and I would honestly want to change it for other people who come from the same place that I come from. It wasn’t a great experience growing up in a neighborhood type, like a township, mainly because of the public services that are offered.

Bridget Todd:

Raesetje and DAIR understand the power of their data sets and maps. And power of this kind carries risks for the people whose homes are labeled. That’s why there’s an application form to access the data set. They want to keep it out of the hands of companies that could use it to discriminate. Let’s say a bank wanted to use the data.

Raesetje Sefala:

They check which neighborhood you are living in currently, or which neighborhood you want to buy in, and then they determine the interest rate accordingly. Insurance companies do that for cars as well. So now we are saying, townships are here, we are identifying them. And then automatically they can put a risk score on a person who is inquiring about a certain product, and just marginalize them or treat them unfairly, just because they’re from that neighborhood.

Bridget Todd:

If you live in the United States like I do, for instance, you know the way someone born and raised in Baltimore County will be treated differently from someone raised just down the road in wealthy Potomac. This would be a problem anywhere, not just in South Africa. Weighing the risks and the benefits of research is something that DAIR says it aims to do for every project from the start. When it comes to AI, we could all wish this happened more often.

Bridget Todd:

Too much data, or too little. Who controls your data? It’s a deeply human question, especially when it comes to a place you call home, or even the language you speak. Indigenous AI communities are at the forefront of thinking about how to unlock power through controlling their own data.

Michael Running Wolf:

I am Michael Running Wolf Junior, and I am an AI researcher working for Northeastern University.

Bridget Todd:

Michael Running Wolf is a software engineer and ethicist. He’s the founder of Indigenous in AI. Michael is based in Canada. He says protecting information about places plays an important role in the protection of Indigenous land and communities. When we use a term like spatial data, it can be easy to lose sight of the human reality of what makes this information valuable to a community.

Michael Running Wolf:

It’s sort of in conflict with this new movement of open data, where people want to assemble large data sets, and Indigenous communities have been very resistant to publicizing these important archeological sites, because once they become known, it’s really hard to protect them. And this affects everything: economy, ecology, people’s relationship to their religion, their spirituality, and of course their language and heritage.

Bridget Todd:

Knowing how much data to share, and on which terms, weighing the risks and the benefits, is a balancing act. Michael is an advocate for Indigenous data sovereignty. He says Indigenous communities shouldn’t just hand over their data for companies to extract value from. For the benefits of AI to be felt in Indigenous communities, Michael says they need to be working with their spatial data themselves.

Michael Running Wolf:

I myself am a computer scientist, and I’m happy to know that there are a few others out there starting to build land resource tooling. For instance, one of my friends worked for the tribe to build a digitized map of important resources on our land, for tracking and also for maintenance. And it just wouldn’t exist without us Indigenous engineers, simply because no one else is interested in doing this tooling. If we didn’t exist, it wouldn’t happen.

Bridget Todd:

Open source software lowers the barriers to access for doing this kind of work. In Michael’s case, he’s been able to build tools that help to revitalize endangered Indigenous languages, including the Wakashan family of languages. Michael’s goal is to pave the way for more Indigenous automatic speech recognition. And then, to see it used with immersive technology that integrates with physical environments. You could put on a virtual reality headset and travel to the uncolonized prairie, or explore a city of today with a phone camera, narrated by an ancestor in augmented reality. Speech recognition could also be used for language learning apps or voice assistants built by Indigenous communities, services that are usually the domain of tech companies. Michael believes communities can be the guardians of their data, even while they put it to work in the world.

Michael Running Wolf:

If you go into a community with a mindset that data is just a resource, a monetary valuable thing, you’re fundamentally harming the community. And you’re also diminishing the value of this data.

Astha Kapoor:

I think that we do need to think more critically about questions of harm, questions of exclusion, of bias that come from geospatial data.

Bridget Todd:

That’s Astha Kapoor again. She’s the co-founder of the Aapti Institute, a public research firm that studies the intersection of tech and society. Astha says the missing details in some maps are a reflection of which people and places are prioritized by Big Tech. The question for all of us is: what kind of data futures do we want to see?

Astha Kapoor:

There are parts of the city that are not mapped in India, or in Bangalore where I live. Google Maps is just not able to generate that kind of data from these densely populated cities where it can’t gather data. And that obviously means that this community does not have the benefit of maps. And as we know, maps have a huge amount of community value, of being able to navigate, of being able to reveal small businesses, et cetera. And that is a burden that certain parts of the city will have to navigate, because they’re undiscoverable in a certain sense.

Bridget Todd:

Being invisible on the world’s most used mapping systems can be devastating if your government relies on them to make decisions.

Astha Kapoor:

There has to be a push from the community. And then of course you need institutional mechanisms. So you potentially need nonprofits that may want to mobilize that need of the community, and have them volunteer to start collecting data. There are apps that can do it. There is software that allows communities to do it. And then of course, you need to take that up and plug it into existing data sets, to make that experience visible to the government or whoever else.

Bridget Todd:

Astha says that first, communities have to fight to be included in those larger maps. And then, they also need to make sure that their data is protected and that their privacy is respected by both corporate and state actors. It’s a very fine balance.

Astha Kapoor:

It’s a less regulated space because all our efforts to safeguard privacy have been focused on personal data protection so far. And I think that this is a huge policy gap that needs to be addressed soon. And also what’s interesting is that the lens of protection is much less when it comes to non-personal data. It is usually the lens of how do we unlock the value, which is all good, because there’s a huge amount of value that needs to be unlocked. But I think that we do need to think of how communities can participate in drawing value, as well as reflecting their own concerns and safeguarding their own interests.

Bridget Todd:

So Astha is suggesting an arrangement where communities can be part of the mapping systems to the extent that they want to be, and on their own terms. She advocates for an intermediary stewardship layer between the map creator and the community. This would be a layer of oversight of the data that represents the interests of that community.

Astha Kapoor:

There’s immense value in sharing it. We just need to share it in a way that is controlled. That is regulated. That is involving communities and being directed by communities in a certain way. And they should have much more say in how the data is being collected. And for that you need these data stewards in a certain sense.

Bridget Todd:

There’s a lot of attention paid to collecting data about places, but not so much on the ethics of how to use that data.

Denise McKenzie:

There’s a real danger in mapping, when you create derived products, that if you haven’t trained your machine learning well enough, you are basically going to rub humans out of existence, or rub types of trees out of existence, or take animals out of existence, because you haven’t detected them with your machine, with your artificial intelligence.

Bridget Todd:

Denise McKenzie is an expert on geospatial data. She’s the community and ethics partner at a nonprofit mapping organization called PLACE. They make detailed digital maps that governments can use for land and infrastructure management. These are custom maps created with respect for local values.

Denise McKenzie:

So there’s this really fantastic phrase that gets used in the geospatial world to describe data: whether the data is fit for purpose. Data’s hard. Data is hard. You almost never get the perfect data set that you want for exactly what you need. So we so often grab data because it exists and try to mold it. You’re always hearing people with data going, oh, I had to clean the data, and I had to structure it the way I needed to structure it, and I had to do all this work to it to make it do what I needed it to do. And so what I would say to AI professionals is: you have to think carefully about how much manipulation you are doing to this data set before you run your algorithms over the top of it.

Bridget Todd:

Gathering data that is fit for purpose is a critical part of ensuring that the outputs of machine learning are trustworthy. But data isn’t something to think of as separate from the people it represents.

Denise McKenzie:

And when you collect data, look at the background I come from: I’m Australian, I live in the UK. That certainly doesn’t mean that if I fly into South Africa, I have any real sense of what it is to be a local in South Africa. So would I therefore be a good data collector in that space? Probably not. You’re much better off finding somebody local, in country, to go do that. But we’ve seen time and time again around the world, many organizations flying in the experts, so to speak, to do the data collection. And it’s been shown time and time again that they will often miss nuances about local populations that really need to be collected.

Bridget Todd:

Supporting governments to create their own data sets and maps is part of the work Denise does at PLACE. They do the tech work, using drones to gather hyper-local imagery. They share the images with governments to help inform decisions about public infrastructure, bypassing big technology firms and international agencies.

Denise McKenzie:

The main pilot we’ve done at this point is in Côte d’Ivoire. So really the examples we’ve got at this stage are the government’s, I guess, enthusiasm, and realizing that actually this is a source of data for them that gives them such insight into their own population. There are incredible images within the data sets that we’ve got from there that show things like your ability to identify the air conditioners on the tops of buildings. And so you think, from an AI perspective, well, if you can teach your AI to do that, you can actually calculate for a particular area of the city the energy draw, the size capacity, et cetera, of all the air conditioning on the tops of roofs, and then calculate the energy consumption.

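[Editor’s note: Denise is sketching a simple pipeline: detect air-conditioning units in aerial imagery, count them per city zone, then multiply by an assumed per-unit power draw. A hypothetical back-of-the-envelope version in Python, where the detection records and the wattage figure are invented for illustration, not taken from PLACE’s data:]

```python
# Illustrative only: turn hypothetical rooftop-AC detections into a
# rough per-zone energy estimate. The detection format and the
# per-unit wattage are assumptions, not figures from PLACE.
from collections import Counter

# Assume an object detector already emitted one record per detected
# air conditioner: (zone_id, detection_confidence).
detections = [
    ("zone_a", 0.91), ("zone_a", 0.84), ("zone_a", 0.55),
    ("zone_b", 0.95), ("zone_b", 0.72),
]

CONFIDENCE_THRESHOLD = 0.7  # discard uncertain detections
WATTS_PER_UNIT = 1500       # assumed average draw of one AC unit

counts = Counter(zone for zone, conf in detections
                 if conf >= CONFIDENCE_THRESHOLD)

for zone, n in sorted(counts.items()):
    # Upper bound: peak draw if every detected unit ran at once.
    print(f"{zone}: {n} units, ~{n * WATTS_PER_UNIT / 1000:.1f} kW peak")
```
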
Bridget Todd:

Aerial photos are also being used in Côte d’Ivoire to help the government and communities prepare for, and even avoid, flooding disasters. That could ultimately save lives.

Denise McKenzie:

Maps are ultimately how we all make sense of the world. We use them every day. Maps are an incredible tool in understanding why we as humans do what we do.

Bridget Todd:

When AI is applied to solving a problem in real life, the output may be considered factual, even if the data or machine learning model wasn’t right for it. It becomes a truth that is used to reinforce power and dominance. People and places can be made invisible and their reality denied. And this doesn’t just apply to geospatial data. Time and time again, we see there is very little space for people to contest algorithms they think are wrong, even when it directly affects them. Sure, mapping is about numbers and coordinates and images and data sets. But by insisting that people should have the means and the rights to use AI to define their own reality, we can start imagining trustworthy AI that is less extractive, more decolonial. We can talk about equity as well as efficiency when it comes to technology. Here’s Raesetje from South Africa.

Raesetje Sefala:

This is the one major thing that we should all be thinking about: the people behind the data points, and the people creating these algorithms.

Bridget Todd:

This is IRL, an original podcast from Mozilla, the nonprofit behind Firefox. Follow us and come back in two weeks. This season of IRL is Mozilla’s annual internet health report. To learn more about the people, research, and data behind the stories, come find us at internethealthreport.org. I’m Bridget Todd. Thanks for listening.