An Artificial Intelligence (AI) model is, by nature, a data devourer. It is trained on billions of pieces of information available on the internet and, in many cases, also on the content that users themselves type directly. This continuous interaction means that every time you converse with the AI, the model learns more about how people communicate and, potentially, about you in particular. The simple act of interacting already constitutes an act of exposure that many overlook.
“People often do not realize that by sending information to an LLM (Large Language Model), they are exposing data to an environment that does not differentiate between sensitive and common content. Many users copy code, contracts, internal records, or personal data believing they are in a private space, but the model merely processes what it receives, and the providers may retain this data for themselves, for AI learning, in addition to technical logs for audit and security,” warns Pollyne Zunino, Deputy Coordinator of the SWAT Team at Apura and an expert in Cybercrime Investigation, Electronic Fraud, and Digital Intelligence.
The survey conducted by the Apura team sheds light on an increasingly common trap we fail to see: the innocent delivery of sensitive information to systems that were not designed to keep it safe.
And real cases illustrate the risk. One of the most frequent involves developers who send code snippets for optimization, unaware that they have left embedded access tokens, internal URLs, or temporary credentials within. Even if the model responds effectively, the damage is already done — meaning that confidential data has been transmitted, processed, and potentially logged on the platform. And once the information has been used for AI learning, it may eventually become part of a response for other users of the same LLM service. Whether it's a token, a CPF (Brazilian ID number), a piece of a contract, or a strategic pipeline, the logic is the same: what goes into the model becomes part of the AI and no longer returns to the user's control.
In companies, the scenario is even more critical. The ease of use and the spontaneous, unregulated adoption of AI tools by employees creates an environment known as Shadow AI's, a parallel and invisible ecosystem where corporate data circulates outside the protection layers designed to safeguard it.
Customer information, proprietary code, strategic plans, confidential contracts, and critical assets: everything can be copied, pasted, and sent to an external platform without any risk assessment.
Unapproved tools create gaps that go unnoticed by traditional cybersecurity defense systems, such as DLP, SIEM, and EDR, turning external AI models into potential leakage channels.
“Providers like OpenAI, Google, and Anthropic, to name a few, have privacy policies that limit the use of personal data and differentiate treatment between API and web interface,” explains Zunino. “Typically, they indicate that they do not use data sent via API to train models, although they may retain operational information for security.”.
In the realm of open source — a set of software, tools, systems, and communities whose source code is open and can be viewed, modified, enhanced, and distributed by anyone — protection falls entirely on whoever hosts and operates the model. And often, this hosting is not prepared or structured to ensure adequate security.
Apura emphasizes that cybercriminals are very aware of these facts. “Today, specialized groups exploit everything from configuration flaws in corporate models to involuntary leaks in logs, repositories, and internal instances,” explains the Apura Cyber Intelligence expert.
Techniques such as model inversion, membership inference and prompt injection. allow for extracting sensitive patterns, re-identifying users, manipulating model behavior, and reconstructing originally confidential data. “In other words, the criminal no longer needs to breach the network. They just need to access what was leaked through AI prompts,” reinforces Pollyne.
How to protect yourself
The expert emphasizes: “AI is not your diary. It is not your confidential email inbox. Before pasting any content, the question should be: ‘If this were to leak, would I be comfortable?’”.
Among the main guidelines:
• never insert sensitive personal or corporate data;
• strictly follow internal cybersecurity policies;
• prioritize AI tools approved by your company's technology and security team;
• adopt local models and autonomous agents operated within the company's own infrastructure.
“Local LLMs eliminate sending data to third parties and facilitate compliance with privacy legislation such as LGPD and GDPR. Furthermore, they enable advanced automations, with autonomous browsers, data extraction, and report generation, without compromising privacy,” she explains.
Apura, a reference in Cyber Threat Intelligence (CTI), has been closely monitoring the evolution of this risk ecosystem and mapping how criminals incorporate AI into every phase of an attack.
“We monitor open sources, communities, and infrastructures where criminals share leaked corporate prompts, sensitive artifacts, and new model exploitation techniques,” states Pollyne Zunino. “This work identifies involuntary exposures and also how malicious groups use AI to automate social engineering, vulnerability scanning, spear phishing and the production of more sophisticated malicious artifacts.”.
The expert concludes by stating: “AI is learning all the time, and if you don't pay attention, it can learn much more than it should.”

