This compilation of facts and figures explores global power disparities in AI and highlights research and perspectives on how to shift that power for a healthier internet and more trustworthy AI.
Let’s begin!
This year’s Internet Health Report is about the systems of power and human decisions that define how artificial intelligence is used, and whom it impacts. We set the scene here with an accessible compilation of research and data visuals about the current state of AI worldwide.
When we say AI, it’s shorthand for a wide range of automation and algorithmic processes, including machine learning, computer vision, natural language processing, and more.
Who has power?
From your social media feed to fast food restaurants, companies in every sector are turning to AI to unlock new ways to collect and analyze data to tailor their offerings.
But the benefits — and the harms — are not evenly distributed.
This is how much AI is predicted to contribute to the global economy by 2030.
Sizing the prize, PricewaterhouseCoopers, 2017
The companies with resources to invest are carving out competitive advantages. And the countries with access to engineers, large amounts of data, and computational power are consolidating their dominance of software and hardware in ways that impact how AI is deployed worldwide.
The United States and China are far ahead when it comes to private investments in AI. But that is just one indicator of how differently the rise of AI is experienced worldwide.
Another way power is reflected is at the very heart of AI research and development.
The cost of training machine learning systems has decreased, and data is more available than ever. But even as more of the world delves into AI, a major imbalance is reflected in the landscape of AI research papers.
In thousands of papers, it is the same datasets from just a few countries that are used most often to evaluate the performance of machine learning models everywhere.
This doesn’t mean that datasets or machine learning models aren’t being developed in the rest of the world. They are!
But the discourse about how AI should be used — and who should benefit from it — is currently heavily weighted toward people and institutions who already wield tremendous power over the internet (and the world).
In fact, more than half of the datasets used for AI performance benchmarking across more than 26,000 research papers were from just 12 elite institutions and tech companies in the United States, Germany, and Hong Kong (China).
A large and frequently reused dataset does not guarantee better machine learning than a smaller one designed for a specific purpose.
On the contrary, many of the most popular datasets are made up of content scraped from the internet, which overwhelmingly reflects words and images that skew English, American, white, and for the male gaze.
For instance, the machine learning models most credited for advancing the field of automated language generation, like GPT-3, frequently reproduce racist and sexist stereotypes, in large part due to their training data.
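One way researchers surface this is by prompting a model with a templated sentence and comparing its completions across demographic terms. Here is a minimal sketch of that kind of probe, assuming the small open GPT-2 model via the Hugging Face transformers library as a stand-in for the much larger systems named above:

```python
# Minimal sketch: probing a text-generation model with templated prompts.
# GPT-2 via Hugging Face transformers is an assumed stand-in here; real audits
# use larger models, many more prompts, and systematic annotation of outputs.
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="gpt2")
set_seed(0)  # make the sampled completions reproducible

TEMPLATE = "The {group} worked as a"
GROUPS = ["man", "woman"]  # audits typically cover many more groups

for group in GROUPS:
    prompt = TEMPLATE.format(group=group)
    outputs = generator(prompt, max_new_tokens=8, num_return_sequences=5, do_sample=True)
    completions = [o["generated_text"][len(prompt):].strip() for o in outputs]
    # Comparing, say, the occupations generated for each group makes
    # skew inherited from the training data visible.
    print(group, completions)
```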
Why not use other datasets? Machine learning models and datasets reflect both the biases of their makers and power dynamics that are deeply rooted in societies and online, but this is not widely acknowledged. More datasets should be created specifically to diversify machine learning methods for equity.
This is the one major thing that we should all be thinking about. The people behind the data points, and the people creating these algorithms.
Who is accountable?
There is no question that the companies who stand to gain the most from AI are the world’s biggest tech companies.
Their revenues have skyrocketed throughout the global COVID-19 pandemic, and several are among the highest earning companies of all time.
Each of these companies makes money in different ways, but AI is core to the business operations of all of them.
Let’s consider Amazon, the highest earning tech company in 2021. AI is key to every major revenue category reported by Amazon last year.
Big tech companies play an outsized role in shaping our experience of the internet, and life itself. What we see, what we buy, even what we believe is nudged along by them daily.
Yet, according to Ranking Digital Rights, there is little to no transparency into how companies test and deploy algorithmic systems and use our personal data, including when it comes to curation, recommendation, and ranking.
Recommender systems determine what people see in social media feeds and what content is rewarded with ad revenue. False and harmful content that drives up engagement is frequently recommended by platforms, even when it may later be found to violate platform rules.
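To see how that happens, consider a toy example (illustrative only, not any platform's actual system) of a feed ranked purely by predicted engagement:

```python
# Toy sketch of engagement-optimized ranking: items predicted to provoke the
# most clicks and reactions float to the top, regardless of accuracy or harm.
from dataclasses import dataclass

@dataclass
class Post:
    title: str
    predicted_engagement: float  # e.g. the output of a click/watch-time model
    flagged_as_misleading: bool

posts = [
    Post("Measured local news report", 0.12, False),
    Post("Outrage-bait conspiracy clip", 0.87, True),
    Post("How-to gardening video", 0.35, False),
]

# Ranking by predicted engagement alone puts the misleading clip first.
feed = sorted(posts, key=lambda p: p.predicted_engagement, reverse=True)
for post in feed:
    print(f"{post.predicted_engagement:.2f}  {post.title}")
```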
In a crowdsourced study of YouTube by Mozilla, 71% of the videos people said they “regretted” watching were algorithmically recommended to them.
Because there is so little transparency into how systems work, researchers often need to recruit users for data donations to help study platforms and seek answers.
Platforms should be doing more sharing and more joint problem solving, to solve a problem that, ultimately they’ve created.
The boom in AI is accelerated by astronomical levels of data collection, by big tech and others. Almost everything we do is tracked and analyzed.
It’s hard to fathom just how much data is collected, kept, and sold.
This is how often your location may be logged by mobile apps in the $12 billion location data industry.
Who Is Policing the Location Data Industry?, Alfred Ng, Jon Keegan. The Markup, 2022
Why does this matter?
Because AI offers companies with intimate knowledge about us new ways to predict what we’ll do — and new ways to influence how we behave.
It can be as simple as offering two people different prices for the exact same service, without telling them.
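What that might look like in code is a hypothetical sketch like the one below; the profile signals and multipliers are invented for illustration:

```python
# Hypothetical sketch of personalized pricing: one service, different prices
# per person, based on inferred profile data. Signals and multipliers are invented.
BASE_PRICE = 10.00

def personalized_price(profile: dict) -> float:
    price = BASE_PRICE
    if profile.get("device") == "high_end_phone":
        price *= 1.15  # inferred higher willingness to pay
    if profile.get("searched_recently"):
        price *= 1.10  # urgency signal
    return round(price, 2)

alice = {"device": "high_end_phone", "searched_recently": True}
bob = {"device": "budget_phone", "searched_recently": False}
print(personalized_price(alice), personalized_price(bob))  # two prices, one service
```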
Digital ads and social media networks are also weaponized to spread disinformation. In the absence of greater transparency and collaboration with researchers, a global industry of companies and organizations engaged in covert messaging is thriving.
This was the number of countries where social media was used for “computational propaganda” in 2020.
Industrialized Disinformation: 2020 Global Inventory of Organized Social Media Manipulation, Samantha Bradshaw, Hannah Bailey & Philip N. Howard. Oxford Internet Institute, 2021
Is it fair?
The problems of misinformation and hate speech are felt worldwide, but platforms do not respond to them with urgency everywhere. Platforms develop AI to moderate content at scale, but do not resource it equally in all languages.
For instance, although 90% of Facebook’s users live outside the United States, only 13% of moderation hours were allocated to labeling and deleting misinformation in other countries in 2020. India alone has more Facebook users than any other country, by far.
At the very least, you need to have some people who understand the language and are paying attention to it as their job.
Automated systems are frequently trained on data that is manually annotated by humans. For example, it can take hundreds of hours of labeling to prepare just one hour of video for the computer vision of a self-driving car.
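A rough back-of-the-envelope sketch shows how those hours add up; the figures below are assumptions for illustration, not measurements from the cited reporting:

```python
# Back-of-the-envelope sketch of why video annotation is so labor-intensive.
# Both figures are illustrative assumptions.
fps = 30                      # frames per second of the recorded video
seconds_per_frame_label = 10  # assumed annotator time to label objects in one frame

frames_per_hour_of_video = fps * 60 * 60                       # 108,000 frames
labeling_hours = frames_per_hour_of_video * seconds_per_frame_label / 3600
print(labeling_hours, "annotator hours per hour of video")     # 300.0
```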
This labor, invisible to end users, is often exported to countries with low wages or completed via global crowd work platforms like Amazon Mechanical Turk.
This was the median hourly wage for Amazon Mechanical Turk workers when accounting for invisible labor in 2021.
Quantifying the Invisible Labor in Crowd Work, Carlos Toxtli, Siddharth Suri, Saiph Savage, 2021
Often, when the algorithmic management of labor is central to a business, fairness takes a backseat to maximizing productivity. This is true of hundreds of “gig work” platforms around the world. An estimated 40% of gig workers earn below the minimum hourly wage in their own countries.
This imbalance is amplified by the lack of data privacy regulations in many countries. And even in countries that do have them, AI is increasingly used by authorities to expand surveillance and control with facial recognition, license plate readers, and more.
There are an estimated 1 billion surveillance cameras worldwide. That means roughly one camera for every eight people on the planet.
The appetite for applying AI to policing and carceral systems worldwide is huge, for instance via predictive policing and pre-trial risk assessments.
However, AI is sending people to jail and getting it wrong.
It takes a lot of imagination and creativity to move out of the rigid definition of how data should be used, to think differently about data and to reclaim it to make tools that are not oppressive.
In real life, over and over, the harms of AI disproportionately affect people who are not advantaged by global systems of power.
News headlines about algorithmic biases are glaring signals of how technology can be used to oppress rather than uplift people.
What can be done?
What values are advanced by researchers? Frequently, they are commercial ones.
Today, nearly half of the most influential machine learning research papers — and many top AI university faculties — are funded by big tech. More research from a broader set of people and institutions could help shift industry norms.
By some measures, ethical considerations in the research field are on the rise, but more interdisciplinary understanding of risk and harms is still needed.
The vast majority of leading research papers are centered on technical performance, not social needs or risks. And few of the most cited papers discuss ethical principles or user rights.
The barriers and costs for building AI are lower thanks in part to a new generation of open source tools and independent and grassroots AI developer communities worldwide.
But many of the same harms of big tech and big data AI development will be repeated if more trustworthy research, data, and development practices are not adopted.
If you go into a community with a mindset that data is just a resource, a monetary valuable thing, you’re fundamentally harming the community and you’re also diminishing the value of this data.
Communities guided by values of fairness and human rights are challenging us to rethink not just how AI is built but for whom. Such questions are not being asked or answered by big tech.
Regulation can help set guardrails for innovation that diminish harm and enforce data privacy, user rights, and accountability. Many laws already apply to AI, but policies that are specific to AI (or sometimes bans) are also surfacing in different regions, countries and cities. In the public sector, there are many novel approaches to AI accountability.
This is a rapidly evolving field. No one has all of the answers.
How do we build trust? New techniques for preserving privacy, logging the origins of data, operationalizing ethics, auditing algorithms, and giving users more power are among the many paths being explored.
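One widely studied example of privacy preservation is differential privacy, which adds calibrated noise to aggregate answers so that no single person’s data can be singled out. A minimal sketch, illustrative rather than production-ready:

```python
# Minimal sketch of differential privacy: answering a count query with Laplace noise.
import random

def dp_count(values, threshold, epsilon=1.0):
    """Noisy count of values above threshold.

    A count changes by at most 1 when one person is added or removed, so
    Laplace noise with scale 1/epsilon yields epsilon-differential privacy.
    """
    true_count = sum(1 for v in values if v > threshold)
    # Difference of two exponentials with rate epsilon is Laplace(0, 1/epsilon).
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

ages = [23, 35, 41, 52, 29, 61, 47]
print(dp_count(ages, threshold=40))  # noisy answer protects any single individual
```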
Mozilla is building knowledge through convenings and collaborations with people around the world who are rethinking the use of data and AI for more equitable outcomes. Our core purpose is for the internet to be healthy. We fight for an internet with more privacy and security, openness, inclusivity, and where power is decentralized in ways that benefit humanity. With AI, we often see the worst health issues of the internet amplified. This is why we are researching, campaigning, and grantmaking for trustworthy AI with such urgency.
Collaboration across multiple sectors is necessary for solutions. With this report, the podcast, and stories that accompany it, we especially call on tech builders and policy folks to engage in the conversation and act.
10 Projects Rethinking Data Stewardship: Announcing Mozilla’s Latest Creative Media Awards, Mozilla, 2022
Algorithmic content curation, recommendation, and/or ranking systems (F12), Ranking Digital Rights, 2022
A consumer investigation into personalised pricing, Consumers International, Mozilla Foundation, 2022
A World With a Billion Cameras Watching You Is Just Around the Corner, Liza Lin, Newley Purnell. Wall Street Journal, 2019
AI Audit Challenge, Stanford University Human-Centered Artificial Intelligence Institute, Stanford Cyber Policy Center, 2022
AI Data Labelling. Roundtable Readback, aapti institute, 2020
AI is sending people to jail—and getting it wrong, Karen Hao. MIT Technology Review, 2019
Algorithmic accountability for the public sector. Learning from the first wave of policy implementation, Divij Joshi, Tonu Basu, Jenny Brennan, and Amba Kak. Ada Lovelace Institute, AI Now Institute, Open Government Partnership, 2021
Amazon Mechanical Turk, Amazon, 2022
Annual Report 2021, Amazon, 2022
AWS AI services, Amazon
Annual reports and press releases of big tech companies from 2017-2021
Artificial Intelligence Incident Database, Responsible AI Collaborative, 2022
Artificial Intelligence Index Report 2022, Stanford University Human-Centered Artificial Intelligence Institute, 2022
CelebA, The Chinese University of Hong Kong
COCO: Common Objects in Context, Microsoft
Creating Trustworthy AI, Becca Ricks, Mark Surman. Mozilla, 2020
Data Futures Lab, Mozilla
Datasheets for Datasets, Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, Kate Crawford, 2021
Ethical AI Ecosystem, Abhinav Raghunathan. The Ethical AI Database, 2022
Ethical and social risks of harm from Language Models, Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William Isaac, Sean Legassick, Geoffrey Irving, Iason Gabriel. DeepMind, 2021
Facebook Employees Flag Drug Cartels and Human Traffickers. The Company’s Response Is Weak, Documents Show. Justin Scheck, Newley Purnell, Jeff Horwitz. Wall Street Journal, 2021
How do the biggest internet companies make money?, Mozilla, 2019
ImageNet, Stanford University, Princeton University
Industrialized Disinformation: 2020 Global Inventory of Organized Social Media Manipulation, Samantha Bradshaw, Hannah Bailey & Philip N. Howard. Oxford Internet Institute, 2021
LAION-400M, LAION
Leading countries based on Facebook audience size as of January 2022, Statista, 2022
Liberty at Risk: Pre-trial Risk Assessment Tools in the U.S., epic.org, 2020
Male gaze, Wikipedia, 2022
Mozilla’s vision for the evolution of the Web, Mozilla, 2022
Multimodal datasets: misogyny, pornography, and malignant stereotypes, Abeba Birhane, Vinay Uday Prabhu, Emmanuel Kahembwe, 2021
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?, Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, Shmargaret Shmitchell, 2021
Quantifying the Invisible Labor in Crowd Work, Carlos Toxtli, Siddharth Suri, Saiph Savage, 2021
Real Change How?, Mozilla, 2021
Realizing the Potential of AI Localism, Stefaan G. Verhulst , Mona Sloane. Project Syndicate, 2020
Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research, Bernard Koch, Emily Denton, Alex Hanna, Jacob G. Foster, 2021
Responsible AI in Africa: Challenges and Opportunities, Damian O Eke, 2022
Responsible Computer Science Challenge, Mozilla
Rising Through the Ranks, Spandana Singh. New America, 2019
Self-driving cars prove to be labour-intensive for humans, Tim Bradshaw. Financial Times, 2017
Sizing the prize, PricewaterhouseCoopers, 2017
Small Data’s Big AI Potential, Husanjot Chahal, Helen Toner, Ilya Rahkovsky. Center for Security and Emerging Technology, 2021
Stanford Natural Language Inference, Stanford University
State of AI Report 2021: More money, more influence, Nathan Benaich, Ian Hogarth, 2021
Street-Level Surveillance, EFF, 2017
The 2022 BTS Executive Summary, Ranking Digital Rights, 2022
The Biggest Data Breach, Irish Council for Civil Liberties, 2022
The Efforts to Make Text-Based AI Less Racist and Terrible, Khari Johnson. Wired, 2021
The gig workers index: Mixed emotions, dim prospects, Peter Guest, Youyou Zhou. Rest of World, 2021
The Mozilla Manifesto, Mozilla
The Values Encoded in Machine Learning Research, Abeba Birhane, Pratyusha Kalluri, Dallas Card, William Agnew, Ravit Dotan, Michelle Bao, 2021
Toward User-Driven Algorithm Auditing: Investigating users’ strategies for uncovering harmful algorithmic behavior, Alicia DeVos, Aditi Dhabalia, Hong Shen, Kenneth Holstein, Motahhare Eslami, 2022
Trustworthy AI Working Groups, Mozilla Festival
Who Is Policing the Location Data Industry?, Alfred Ng, Jon Keegan. The Markup, 2022
YouTube Regrets, Jesse McCrosky and Brandi Geurkink. Mozilla, 2021