AI Literacy

Bias

AI output depends entirely on its inputs:

  • the prompt it is fed
  • the dataset used for training
  • the engineers, individuals, and companies who build the tool

This can result in explicit and implicit bias.

Facts to know:

  1. To “train” the system, generative AI ingests enormous amounts of training data from across the internet.
  2. Using the internet as training data means generative AI can replicate the biases, stereotypes, and hate speech found on the web.
  3. As of January 2024, about 52% of content on the internet is in English, so this English-language skew is built into the system through its training data.
  4. The AI workforce is also homogeneous: about 70% of people working in AI are male (World Economic Forum, 2023 Global Gender Gap Report), and the majority are white (Georgetown University, The US AI Workforce: Analyzing Current Supply and Growth, January 2024).
  5. As a result, there have been numerous documented cases of algorithmic bias.

While this does not mean that content generated by AI has no value, users should be aware of the possibility of bias influencing AI output.

Privacy

  1. There are ongoing privacy concerns and uncertainties about how AI systems harvest personal data from users.
  2. Some of this personal information, like phone numbers, is voluntarily given by the user. However, users may not realize that the system is also harvesting information like the user’s IP address and their activity while using the service.
  3. This is an important consideration when using AI in an educational context, as some students may not feel comfortable having their personal information tracked and saved.
  4. Additionally, OpenAI may share aggregated personal information with third parties to analyze usage of ChatGPT. While this information is only shared in aggregate after being de-identified (i.e., stripped of data that could identify users), users should be aware that they no longer control their personal information once it is provided to a system like ChatGPT.

Environmental Impact

  1. AI is typically associated with virtuality and the cloud, yet these systems rely on vast physical infrastructures that span the globe and require tremendous amounts of natural resources, including energy, water, and rare earth minerals.
  2. Generating an image typically consumes far more energy than generating text.
  3. Texas has been, and may continue to be, a focal point for data center construction and use.

Labor Issues

AI still needs human intervention to function properly, but this necessary labor is often hidden.

  1. For example, ChatGPT uses prompts entered by users to train its models. Because these prompts also help train the models behind its paid subscription tiers, many consider this unpaid labor.
  2. Taylor & Francis recently signed a $10 million deal to provide Microsoft with access to data from approximately 3,000 scholarly journals. Authors in those journals were not consulted or compensated for the use of their articles.
  3. Some argue that using scholarly research to train generative AI will result in better AI tools, but authors have expressed concern about how their work will be used, including whether use by AI tools will negatively impact their citation counts.
  4. In a more extreme case, investigative journalists discovered that OpenAI paid workers in Kenya, Uganda, and India only $1–$2 per hour to review data for disturbing, graphic, and violent content.
  5. Anthropic (the maker of Claude) is involved in an ongoing lawsuit over the use of copyrighted works as AI training data.