In the digital era, data is the new gold, driving advances in fields ranging from healthcare to marketing. At the heart of this revolution is Artificial Intelligence (AI), a technology that thrives on vast quantities of information. However, as AI systems become increasingly integrated into our daily lives, they also encounter a wealth of sensitive data: information about our personal identities, health records, financial details, and more. The handling of this sensitive data by AI raises critical questions about privacy, security, and trust. As AI experts, we find it imperative to explore the intricate balance between harnessing the power of AI for societal benefit and safeguarding the sanctity of individual privacy. This article examines why we should not include sensitive data in AI training, a topic that affects each one of us.
What counts as sensitive data? People often talk about privacy without being sure exactly what it covers. The EU's General Data Protection Regulation (GDPR) offers a definition in its list of special categories of personal data:
“Processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation shall be prohibited.” (GDPR, Article 9)
In an ordinary interaction between people, each person can decide what personal information to share with others. This is not the case with AI and other automated data-collection systems. Although such systems may ask for consent before collecting data, few people read consent notices word for word. It is therefore difficult to keep private information undisclosed, which can compromise people's security interests.
However, there is an argument that the sensitive data gathered by AI may be anonymized, meaning that those who review the data cannot link it to a specific person. A second reassurance is that sensitive data stored within an AI system may never be accessed by anyone else. Nevertheless, as long as someone could access that data, the technology raises a risk of violating people's privacy, even if it does not violate privacy outright. Sensitive data can also be used to build a profile of someone, who may then be refused a loan or mortgage on the basis of it.
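The gap between raw and anonymized records matters in practice. A minimal sketch of pseudonymization, with illustrative field names and a made-up salt (not a standard scheme), replaces the direct identifier with a salted hash and drops sensitive fields before a record could enter a training set:

```python
import hashlib

def pseudonymize(record, salt, drop_fields=("health_status", "ethnicity")):
    """Drop sensitive fields and replace the direct identifier with a salted hash.

    Caveat: pseudonymized data can still be re-identified by linking quasi-
    identifiers (e.g. postcode plus birth date), so this alone is not anonymization.
    """
    cleaned = {k: v for k, v in record.items() if k not in drop_fields}
    user_id = cleaned.pop("user_id")
    cleaned["pseudonym"] = hashlib.sha256(
        (salt + str(user_id)).encode()
    ).hexdigest()
    return cleaned

record = {"user_id": 42, "postcode": "SW1A", "health_status": "diabetic"}
safe = pseudonymize(record, salt="per-project-secret")
```

As the docstring notes, this is exactly why "anonymous" is a weaker guarantee than it sounds: the remaining quasi-identifiers can often be linked back to a person.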
The integration of sensitive data into the development and training of AI systems significantly amplifies the risk of data breaches, posing a grave threat to individual security. When AI systems process and analyze this type of data, they often become vulnerable to a myriad of cybersecurity threats. A famous example is the 2018 Cambridge Analytica–Facebook scandal. This data science company used AI to harvest data from social media profiles, identify key individuals, and target them with political advertising in an attempt to influence their behavior in several recent elections. Cambridge Analytica obtained information on millions of users without consent and then quietly manipulated their social media feeds. Facebook later confirmed that the data of up to 87 million users may have been improperly shared.
The vulnerabilities of data stem from a range of factors, including weak security protocols that fail to shield the data effectively, insufficient encryption that leaves data exposed during transit and storage, and inadequate monitoring systems that fail to detect breaches in a timely manner. Furthermore, the presence of lax access controls and the potential for internal threats magnify these risks, increasing the likelihood of unauthorized access and misuse of sensitive information. The consequences of such breaches are dire, ranging from identity theft and significant financial losses to broader societal harms. In light of these profound risks, it is imperative to reconsider the use of sensitive data in AI development and training. The need to protect individual privacy and prevent potential harm far outweighs the perceived benefits of utilizing such data in AI systems.
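One of the failures named above, lax access controls, can be made concrete with a small sketch. The roles, permission names, and deny-by-default policy here are illustrative assumptions, not a reference implementation:

```python
from functools import wraps

# Deny by default: a role grants only the permissions explicitly listed.
ROLE_PERMISSIONS = {
    "clinician": {"read_phi"},
    "analyst": set(),  # analysts should see only de-identified data
}

def requires(permission):
    """Refuse the call unless the caller's role grants the permission."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(role, *args, **kwargs):
            if permission not in ROLE_PERMISSIONS.get(role, set()):
                raise PermissionError(f"role {role!r} lacks {permission!r}")
            return fn(role, *args, **kwargs)
        return wrapper
    return decorator

@requires("read_phi")
def read_patient_record(role, patient_id):
    return {"patient_id": patient_id}  # stand-in for a real record fetch
```

The point of the deny-by-default design is that forgetting to grant a permission fails closed; many real breaches trace back to systems that failed open instead.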
While advancements in data security and encryption have indeed improved the safekeeping of sensitive data in AI systems, these measures are not infallible. Cybersecurity threats are evolving rapidly, outpacing many existing security measures. A report by Cybersecurity Ventures projected that cybercrime damages would reach $6 trillion annually by 2021, indicating the persistent and growing threat of data breaches. Furthermore, the reliance on sensitive data for AI effectiveness, especially in critical fields like healthcare, raises ethical concerns. The use of real patient data, while beneficial for accuracy, must contend with issues of consent and potential biases in datasets, which can lead to skewed AI outcomes. For instance, a study in Nature Medicine highlighted the risk of bias in medical AI systems, suggesting that these systems may not generalize well across different populations.
Trust and Acceptance
The use of sensitive data in AI systems raises profound privacy concerns and fears of misuse, creating a significant barrier to public trust and acceptance. People are increasingly alarmed by the potential invasion of privacy, as personal details like health records and financial information could be exposed or exploited without their consent. This anxiety is compounded by the opaque nature of AI algorithms, which leaves individuals feeling powerless and uninformed about how their data is used or protected. High-profile incidents of data breaches and leaks further exacerbate these fears, vividly illustrating the vulnerabilities in even the most advanced systems. Ethical and legal issues also come to the fore, with growing skepticism about whether AI developers are respecting individual rights and adhering to regulations, especially regarding sensitive data. Moreover, the potential for AI to perpetuate biases and discrimination, particularly when trained on sensitive data, amplifies public apprehension. These collective concerns underscore the need for a cautious approach in AI development. To foster public trust and ensure ethical practice, it is imperative to steer clear of using sensitive data in AI systems, prioritizing privacy and the rights of individuals in the pursuit of technological advancement.
While it is true that sensitive data can improve the accuracy of AI systems, accuracy alone does not secure public trust, and its use is not without risk. A study by the MIT Media Lab found that facial-recognition software demonstrates clear biases based on the data it is trained on, often misidentifying women and people of color. This indicates that even with detailed sensitive data, AI systems can propagate and amplify biases if the data is not representative or the algorithm is not designed to handle diversity. The claimed improvement in accuracy is therefore not universally applicable and cannot, by itself, improve public trust.
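Findings like the MIT Media Lab study rest on a simple kind of audit: comparing error rates per demographic group rather than reporting one aggregate accuracy. A minimal sketch, with illustrative group labels and counts chosen only to show the shape of such a gap:

```python
from collections import defaultdict

def error_rate_by_group(predictions):
    """predictions: iterable of (group, correct: bool) pairs."""
    totals, errors = defaultdict(int), defaultdict(int)
    for group, correct in predictions:
        totals[group] += 1
        if not correct:
            errors[group] += 1
    return {g: errors[g] / totals[g] for g in totals}

# Hypothetical audit data: 1 error in 100 for one group, 35 in 100 for another.
results = (
    [("lighter-skinned men", True)] * 99 + [("lighter-skinned men", False)]
    + [("darker-skinned women", True)] * 65 + [("darker-skinned women", False)] * 35
)
rates = error_rate_by_group(results)
```

A gap of this size is invisible in overall accuracy (here 82%), which is precisely why aggregate metrics can hide subgroup harm.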
The exploration of sensitive data usage in AI systems reveals a complex landscape where the potential for innovation is often overshadowed by significant privacy, security, and ethical concerns. The privacy issues, heightened security risks like the Cambridge Analytica scandal, the challenge of maintaining public trust, and the undeniable presence of biases in AI systems underscore the need for a more cautious approach in AI development. Given these concerns, it becomes evident that the use of sensitive data in developing or training AI systems should be reconsidered. Sensitive data, while valuable, poses too great a risk to individual privacy and security to justify its use in AI contexts. The pursuit of technological advancement should not compromise the sanctity of personal data. Therefore, developing AI systems without relying on sensitive data becomes not just a technical challenge but a moral imperative. This approach aligns with a vision of AI development that prioritizes the protection of individual rights and upholds the highest standards of privacy and ethical responsibility. By steering clear of using sensitive data in AI systems, we can strive to harness AI’s capabilities while ensuring it serves the public good without infringing on personal privacy and security.
Soifer, E., & Elliott, D. (2014). Nonstandard observers and the nature of privacy. Social Theory and Practice, 185–206.
Chan, R. The Cambridge Analytica whistleblower explains how the firm used Facebook data to sway elections. Business Insider. Archived January 29, 2021.
Kozlowska, H. (2018). The Cambridge Analytica scandal affected 87 million people, Facebook says.
Morgan, S. (2020). Global cybercrime damages predicted to reach $6 trillion annually by 2021. Cybercrime Magazine.
Seyyed-Kalantari, L., Zhang, H., McDermott, M. B. A., et al. (2021). Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations. Nature Medicine, 27, 2176–2182.
Goode, L. Facial recognition software is biased towards white men, researcher finds. MIT Media Lab.
Elliott, D., & Soifer, E. (2022). AI technologies, privacy, and security. Frontiers in Artificial Intelligence, 5. https://doi.org/10.3389/frai.2022.826737