Business & Tech Counsel


Artificial Intelligence and Data Privacy

by Jessica E. Brown, Esq.


Legislators are increasingly regulating artificial intelligence through privacy statutes and tech regulations. Article 22 of the EU’s General Data Protection Regulation governs automated decision-making, requiring companies both to provide notice where automated decisions are made and to allow human review of those decisions. Draft legislation, specifically the American Data Privacy and Protection Act, would require large data holders to conduct algorithmic impact assessments where processing personal data poses a risk of harm to an individual or group. Additionally, companies that develop AI must comply with the laws of the jurisdictions where the AI is used, not just where it is developed. Privacy is further implicated by generative AI, which uses neural networks to identify patterns and structures within existing data.

The Federal Trade Commission is investigating OpenAI, one of the leading providers of large language models, over concerns that ChatGPT can publish false information about individuals. Privacy regulators across Europe and Canada have expressed concern about the use of personal data to train large language models, as well as individuals’ ability to block the use of their data for this purpose. A suit is pending in California regarding the scraping of personal data to train generative AI engines. On the regulatory front, the EU is contemplating an AI Act that would govern the development and use of AI systems from conception to market. In the U.S., the White House and other federal agencies are examining an overall national AI strategy and AI governance more generally. States and localities are also considering and enacting measures. As AI grows in importance, governments will want to ensure that companies are weighing the risks and benefits of the technology.

AI necessarily implicates ethical considerations, and companies need to consider the following key principles. People trust companies and brands that protect the data they collect by providing proper security and letting people know how their data will be used. People may be less willing to share data in the absence of such protections, which could impede AI development because AI runs on data. Companies that want to sell data across the U.S. and globally must comply with the legal regimes in each market. But even where regulation does not exist, companies can take steps to build trust.

Transparency in how data is used also builds trust with customers. Companies should make efforts to protect privacy by de-identifying, or anonymizing, data. In that arena, tokenization is key because it can pass along essential attributes while shielding the identities of individuals, which in turn reduces the risk of data loss or breach. Identities can be re-associated when the content generated by the large language model is returned. And companies should have the right contractual terms in place and be comfortable with their privacy programs and controls.
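To make the tokenization idea concrete, here is a minimal illustrative sketch (the class name, token format, and sample text are all hypothetical, not drawn from any particular vendor's product): personal identifiers are swapped for opaque tokens before text leaves the company, and re-associated only when the model's output comes back.

```python
import uuid

class TokenVault:
    """Hypothetical sketch: map identities to opaque tokens and back."""

    def __init__(self):
        self._forward = {}   # identity -> token
        self._reverse = {}   # token -> identity

    def tokenize(self, identity: str) -> str:
        # Reuse the same token for a repeated identity so attributes
        # remain linkable without revealing who the person is.
        if identity not in self._forward:
            token = f"TOKEN_{uuid.uuid4().hex[:8]}"
            self._forward[identity] = token
            self._reverse[token] = identity
        return self._forward[identity]

    def detokenize(self, text: str) -> str:
        # Re-associate identities once the generated content returns.
        for token, identity in self._reverse.items():
            text = text.replace(token, identity)
        return text

vault = TokenVault()
prompt = f"Summarize the account history for {vault.tokenize('Jane Doe')}."
# ... the prompt is sent to the model; no real name leaves the company ...
response = f"{vault.tokenize('Jane Doe')} has two open accounts."
print(vault.detokenize(response))  # prints "Jane Doe has two open accounts."
```

Real tokenization systems add encryption, access controls, and audit logging around the vault; the sketch shows only the pass-through-and-shield mechanic described above.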

Data is essential to AI technology, so data quality is essential to high-value outputs.  “Clean” data that is free of errors is important, as is representativeness, meaning the data used to develop the AI system needs to reflect the communities on which it will have an effect.  This includes eliminating or ameliorating potential bias.  As such, protected characteristics should not be a part of the data the AI system acts upon.  Then, where appropriate, companies should conduct bias studies to test whether the algorithm exhibits bias in some way, and make adjustments as needed.  This is not only the right thing to do; failing to do so could expose companies to tort liability and to claims under anti-discrimination laws such as Title VII of the Civil Rights Act in the case of employment discrimination.
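One common starting point for such a bias study, sketched below with illustrative made-up data, is the "four-fifths rule" used in U.S. employment contexts: compare selection rates across groups, and treat a ratio below 0.8 as a flag for potential adverse impact warranting further review.

```python
# Hypothetical bias study sketch using the four-fifths (80%) rule.
def selection_rate(outcomes):
    """Fraction of a group selected (1 = selected, 0 = not)."""
    return sum(outcomes) / len(outcomes)

def disparate_impact_ratio(group_a, group_b):
    """Ratio of the lower selection rate to the higher one."""
    rate_a, rate_b = selection_rate(group_a), selection_rate(group_b)
    return min(rate_a, rate_b) / max(rate_a, rate_b)

# Illustrative algorithm outcomes for two demographic groups.
group_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # 70% selected
group_b = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]  # 40% selected

ratio = disparate_impact_ratio(group_a, group_b)
print(f"Disparate impact ratio: {ratio:.2f}")
if ratio < 0.8:
    print("Potential adverse impact: review the model and its inputs.")
```

The four-fifths rule is a screening heuristic, not a legal safe harbor; a flagged result is a prompt for deeper statistical analysis and, as the paragraph above notes, for adjusting the model or its training data.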

The best way to address the above items is a strong AI governance program, which requires thinking about what the system is trying to do and the data used to derive results. AI’s full potential can be realized in a way that benefits all with governance that embodies transparency, privacy, ethical design, and human oversight.