LLMs & AI Privacy: What Every User Should Know

The short version

1.Your prompts may train future models. Unless you opt out or use an enterprise plan, your queries can be used for model improvement.

2.Never paste real personal data. SSNs, passwords, financial account details, and confidential business information don’t belong in any LLM prompt.

3.AI-powered phishing is real. Spear-phishing emails now arrive grammatically perfect and hyper-personalized. The “obviously fake” tell is gone.

4.Voice cloning takes 10 seconds. Anyone with a short recording of your voice can generate convincing impersonations. Set a family safe word now.

5.Local models eliminate cloud risk. Ollama runs capable open-source models entirely on your machine - zero data leaves your device.

6.Opt out of training on every platform. 30 seconds per platform. Do it right now before you continue reading.

How AI platforms actually handle your data

The data policies of AI platforms are opaque by design - buried in terms of service most users never read. I went through each one. Here’s the practical reality.

Storage

Most platforms retain conversations server-side, associated with your account, for weeks to months. This isn’t just for your history - it enables abuse monitoring, model evaluation, and (unless you opt out) training data collection.

Training

On free and standard paid tiers, conversations may be reviewed by human contractors and used to fine-tune future models. This is how the “human feedback” in RLHF works at scale. Your chat about tax strategy might train the next version.

Breach exposure

Any data held server-side is accessible in a breach or government subpoena. The more sensitive the prompts, the higher the consequence. OpenAI was breached in 2023. These companies are high-value targets.

What should never appear in an AI prompt

Social Security numbers or government IDs
Passwords, API keys, or authentication tokens
Financial account numbers or card details
Medical records or diagnoses
Confidential client or business information
Full names combined with addresses, phone numbers, or birthdates

The practical test: if this information appeared in a data breach, would it cause real harm? If yes, don’t paste it into a chatbot.

Prompt hygiene: using AI without oversharing Habit change

Most of the risk from AI tools isn’t malicious - it’s accidental oversharing. I’ve built these habits into my daily use and they add almost no friction.

Use placeholders for sensitive details

Instead of “My SSN is 123-45-6789, help me fill out this tax form” - try “I’m filling out Form 1040 and need to enter a 9-digit identification number in Box 3. What format is expected?” The AI doesn’t need your actual data to help with the task.

Clear conversation history regularly

Most platforms let you delete individual conversations or your full history. Deleted conversations are typically removed from training pipelines, though they may remain in backup systems for a short window. I clear sensitive chats the same day. Make it a routine.

Use work AI for work, personal AI for personal

Don’t paste your employer’s confidential code into a personal ChatGPT account. If your company has an enterprise AI contract, use that - it comes with data processing agreements that personal accounts don’t have. When in doubt, ask IT first.

Disable training opt-in on every platform

ChatGPT: Settings → Data Controls → toggle off “Improve the model for everyone.”
Gemini: myaccount.google.com → Data & Privacy → Gemini Apps Activity → toggle off.
Claude: Settings → Privacy → toggle off training use.
Takes 90 seconds total. Do it now.

Platform data policies at a glance

Platform	Trains on free tier?	Opt-out?	Retention	Enterprise zero-retention?
ChatGPT (OpenAI)	Yes (opt-out available)	Yes	30 days after deletion	Yes (Enterprise)
Claude (Anthropic)	Yes (opt-out available)	Yes	Up to 2 years	Yes (Teams/Enterprise)
Gemini (Google)	Yes, via Google account activity	Yes	18 months (default)	Yes (Workspace)
Copilot (Microsoft)	Depends on account type	Yes (privacy dashboard)	Varies by product	Yes (M365 E3/E5)
Ollama (local)	No — runs entirely offline	N/A	No cloud retention	N/A

Policies update frequently. Verify current terms on each platform’s privacy settings page before handling sensitive data.

The voice cloning threat you probably haven’t taken seriously

This is the AI privacy threat most people underestimate. Commercial voice cloning tools - available to anyone with a credit card - can produce convincing audio impersonations from as little as 3-10 seconds of source audio. That’s shorter than a typical voicemail greeting.

I tested one of these tools on a recording of myself from a podcast. The result was indistinguishable to my family members. If I hadn’t told them it was fake, they wouldn’t have known.

Where attackers source audio

Attack surface

Social media videos, YouTube, TikTok, LinkedIn posts, voicemail greetings, podcast appearances, and conference recordings. If your voice is on the internet, it can be cloned.

Who gets targeted

Social engineering

Family members receive calls from a cloned voice claiming to be in an emergency. Executives’ voices are cloned for business email compromise. Financial institutions receive cloned voices attempting to bypass voice authentication.

Defending against voice cloning attacks

Set a family safe word. Agree on a word or phrase that any family member can use to verify identity in an emergency call. If someone calls in a panic and can’t say the word, hang up and call them back directly. We use a random word that has no connection to our lives.

Always call back on a known number. If you receive an urgent call from anyone - regardless of how convincing the voice sounds - hang up and call them back on a number you already have. Never call back on a number the caller provides.

Audit your public audio exposure. Search for your name on YouTube, TikTok, and LinkedIn. Consider whether voicemail greetings using your actual voice are necessary - many people switch to generic carrier greetings.

Disable voice ID authentication where possible. Some banks offer voice authentication as a login method. This attack surface is actively being exploited - use app-based or hardware authentication instead.

AI-powered phishing: the quality bar has collapsed

For years, grammatical errors were reliable signals that an email was a phishing attempt. AI-generated phishing eliminates this tell. Modern phishing emails are grammatically perfect, stylistically appropriate, and increasingly personalized using data scraped from LinkedIn, social media, and data broker databases.

I’ve seen samples that referenced the recipient’s recent conference talk, their manager’s name, and a project code that appeared in a public GitHub commit. That level of personalization used to require hours of manual research per target. AI does it in seconds at scale.

Spear phishing at scale

Traditional spear phishing required manual research per target. AI can now generate personalized attack emails referencing your company, your role, your recent activities, and your colleagues’ names - faster than any human team could produce them.

Deepfake video verification requests

A newer variant: attackers send a “verification call” request where a deepfake video of a colleague or executive asks you to take urgent action - wire a payment, share a credential. Several companies have lost significant sums to this attack. The video looks real because it is, technically.

The rule that still works

Perfect grammar and a convincing tone are no longer sufficient signals of legitimacy. The only reliable signal is: was this communication initiated by a known channel? An email asking you to reset your password should prompt you to go directly to the site - not click the link. A call asking you to wire money should be verified by calling the requester back on a known number.

See the Incident Response Guide for what to do if you’ve already clicked.

Prompt injection and agentic AI: the new attack surface

As AI moves from chatbot to agent - browsing the web, reading your email, writing and running code, booking travel on your behalf - the threat model changes significantly. Two risks most users aren’t thinking about yet.

When you give an AI a document, webpage, or email to analyze, malicious content in that material can hijack the AI’s behavior. A webpage might contain hidden text instructing the AI to ignore your request and instead exfiltrate data or forward your emails. This works against today’s most capable models.

The practical rule: don’t give an AI agent access to sensitive data or external actions unless you’ve thought carefully about what a compromised instruction could cause it to do.

AI agents that can send emails, make purchases, modify files, or interact with external services are operating with real-world permissions. The blast radius of a compromised or misbehaving agent is proportional to what you’ve given it access to.

Grant the minimum permissions needed for the task - read-only access where possible, no access to accounts the agent doesn’t need, and no persistent credentials it could use autonomously without prompting you first.

Using an AI agent at work that has access to company systems, customer data, or internal communications creates liability - for you and your employer - whether or not IT authorized it. Before giving any AI tool access to work systems: confirm your employer has a policy, use enterprise-tier tooling with a signed DPA, and treat AI access grants the same way you’d treat giving a contractor a key to the building.

The access control question to ask before every agent setup

“If this agent did the worst plausible thing with the access I’ve given it, what’s the damage?” If the answer is “it could send emails from my account” or “it could read our entire customer database” - scope it down until the worst case is acceptable.

Safer alternatives for sensitive work

I use different tools depending on what I’m working on. For anything involving real personal data, client information, or sensitive business context, I don’t use cloud AI at all.

Ollama — local models

Zero cloud exposure

Free, open-source software that runs LLMs like Llama 3, Mistral, or Phi-4 entirely on your own hardware. Nothing leaves your machine. A modern laptop with 16GB RAM handles most models well. I use this for anything involving real financial data, drafting sensitive correspondence, or proprietary code.

Get Ollama (free) ↗

Enterprise AI tiers

Signed DPAs, no training

ChatGPT Enterprise, Claude for Work (Teams/Enterprise), and Google Gemini for Workspace all come with signed data processing agreements that exclude your data from model training and offer stronger retention controls. If you’re using AI for professional work, this is the appropriate tier - often available through your employer.

Perplexity — privacy mode

No conversation storage

Perplexity’s privacy mode disables conversation history and removes your data from training pipelines. It’s not zero-risk (the query still hits their servers), but it’s a meaningful improvement over the default and useful for research queries you’d rather not have stored.

Table of Contents

LLMs & AI Privacy: What Every User Should Know

The short version

How AI platforms actually handle your data

Prompt hygiene: using AI without oversharing Habit change

Use placeholders for sensitive details

Clear conversation history regularly

Use work AI for work, personal AI for personal

Disable training opt-in on every platform

Platform data policies at a glance

The voice cloning threat you probably haven’t taken seriously

Where attackers source audio

Who gets targeted

Defending against voice cloning attacks

AI-powered phishing: the quality bar has collapsed

Spear phishing at scale

Deepfake video verification requests

Prompt injection and agentic AI: the new attack surface

Prompt injection

Scope what your AI can do

Your responsibility as an employee

Safer alternatives for sensitive work

Ollama — local models

Enterprise AI tiers

Perplexity — privacy mode

AI privacy action plan

Your AI privacy setup

Frequently asked questions

Written by Jason

The short version

How AI platforms actually handle your data

Prompt hygiene: using AI without oversharing Habit change

Use placeholders for sensitive details

Clear conversation history regularly

Use work AI for work, personal AI for personal

Disable training opt-in on every platform

Platform data policies at a glance

The voice cloning threat you probably haven’t taken seriously

Where attackers source audio

Who gets targeted

Defending against voice cloning attacks

AI-powered phishing: the quality bar has collapsed

Spear phishing at scale

Deepfake video verification requests

Prompt injection and agentic AI: the new attack surface

Prompt injection

Scope what your AI can do

Your responsibility as an employee

Safer alternatives for sensitive work

Ollama — local models

Enterprise AI tiers

Perplexity — privacy mode

AI privacy action plan

Your AI privacy setup

AI privacy changes fast. I'll track it for you.

Frequently asked questions

Written by Jason

Advanced Privacy & Security