Why you should NOT upload client data to ChatGPT
When you use ChatGPT, GPT-4, or any public cloud AI service, the data you input may be:
- Processed on servers outside the EU (typically in the USA).
- Used to improve the model (unless you use the API with explicit opt-out).
- Accessible to employees of the provider company during security audits.
- Stored for indeterminate periods in service logs.
This makes the lawyer the data controller (or processor) under GDPR, with all corresponding obligations.
The real problem: professional secrecy + data protection
The lawyer's professional secrecy duty (Art. 542.3 LOPJ and Art. 32 EGA in Spain; similar provisions exist across EU jurisdictions) requires that client information not be disclosed to third parties without consent. By uploading data to a cloud LLM:
- You disclose information to a third party (the AI provider).
- You have no guarantee it won't be used for other purposes.
- You may be transferring data outside the EEA without adequate legal basis.
- Client consent doesn't necessarily cover this use.
GDPR applied to AI use in law firms
Relevant principles (Art. 5 GDPR)
| Principle | AI implication |
|---|---|
| Data minimization | Only input data strictly necessary for the query |
| Purpose limitation | Client data is for their legal matter, not model training |
| Integrity and confidentiality | You must ensure data doesn't leak through third-party services |
| Accountability | You must be able to demonstrate you take adequate measures |
Do you need a DPIA (Data Protection Impact Assessment)?
Under Art. 35 GDPR, you likely need a DPIA if:
- You process special categories of data (health, criminal records).
- The processing is systematic and large-scale.
- You use new technologies that may pose a high risk.
Generative AI applied to legal client data meets all three criteria in many cases.
Practical obligations
- Data processing agreement (Art. 28 GDPR) with the AI provider.
- Record of processing activities that includes AI use.
- Data subject information (Art. 13-14 GDPR): clients must know you use AI.
- International transfer assessment: Do data leave the EEA?
- Security measures: encryption, anonymization, pseudonymization.
Anonymization techniques for legal AI
Before entering any case data into an AI tool, you must anonymize it. Main techniques:
1. Pseudonymization
Replace identifying data with codes:
- "John Smith" → "Party A" or "[CLAIMANT]"
- "23 Main Street, London" → "[ADDRESS_1]"
- "ID 12345678-A" → "[ID_REDACTED]"
Advantage: Maintains text coherence (you can follow the case logic).
Limitation: Not full anonymization; it remains reversible for anyone who holds the mapping table.
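The replacement scheme above can be sketched in Python. This is a minimal illustration, not a vetted redaction tool: the regex patterns for IDs, emails, and phone numbers are assumptions for demonstration, and real case files also contain names and addresses that only NER-based detection catches reliably.

```python
import re

# Illustrative patterns only; real matters need entity recognition
# for names and addresses, which regexes alone cannot provide.
PATTERNS = {
    "ID": re.compile(r"\b\d{8}-?[A-Z]\b"),                # DNI-style ID numbers
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"(?<!\w)\+?\d[\d\s-]{7,}\d\b"),
}

def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace each match with a numbered placeholder; return text + mapping.

    Keeping the mapping table makes this pseudonymization (Art. 4(5) GDPR),
    NOT anonymization: store the table separately from the redacted text,
    and never send it to the AI tool.
    """
    mapping: dict[str, str] = {}   # placeholder -> original value
    counters: dict[str, int] = {}

    for label, pattern in PATTERNS.items():
        def repl(m: re.Match, label: str = label) -> str:
            value = m.group(0)
            for placeholder, original in mapping.items():
                if original == value:          # reuse placeholder for repeats
                    return placeholder
            counters[label] = counters.get(label, 0) + 1
            placeholder = f"[{label}_{counters[label]}]"
            mapping[placeholder] = value
            return placeholder
        text = pattern.sub(repl, text)
    return text, mapping
```

Because repeated values reuse the same placeholder, the redacted text keeps its internal coherence ("[ID_1]" always refers to the same person), which is exactly the advantage pseudonymization offers over plain deletion.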
2. True anonymization
Completely remove data that allows identification:
- Remove names, IDs, addresses, phone numbers, emails.
- Generalize: "47-year-old man from Madrid" → "middle-aged person from a large city".
- Remove unique data: case file number, vehicle registration.
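The removal-plus-generalization approach can be sketched as follows. The patterns and the age-band rule are illustrative assumptions; a human review of the output remains essential before anything leaves the firm.

```python
import re

def anonymize(text: str) -> str:
    """Strip direct identifiers and generalize quasi-identifiers.

    Unlike pseudonymization, no mapping table is kept: the result is
    meant to be irreversible. Patterns are illustrative, not exhaustive.
    """
    # 1. Remove direct identifiers outright (no placeholders, no mapping).
    text = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "", text)       # emails
    text = re.sub(r"\b\d{8}-?[A-Z]\b", "", text)                  # ID numbers
    text = re.sub(r"(?<!\w)\+?\d[\d\s-]{7,}\d\b", "", text)       # phone numbers

    # 2. Generalize: exact age -> ten-year age band.
    def age_band(m: re.Match) -> str:
        low = (int(m.group(1)) // 10) * 10
        return f"person aged {low}-{low + 9}"
    text = re.sub(r"\b(\d{1,3})-year-old (?:man|woman|person)\b", age_band, text)

    # 3. Tidy up whitespace left behind by the removals.
    return re.sub(r"[ \t]{2,}", " ", text).strip()
```

Note the difference from the pseudonymization sketch: matches are deleted or generalized, never mapped, so the operation cannot be reversed from the output alone.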
3. Synthetic data
Generate fictional data that maintains the statistical properties of the case without revealing real data:
- Use coherent fictional names.
- Substitute real amounts with approximate ones.
- Change dates while maintaining relative deadlines.
When to use each technique
| Technique | When to use | Example |
|---|---|---|
| Pseudonymization | Internal analysis where you need coherence | Preparing procedural strategy with AI |
| True anonymization | Sharing with cloud tools | Asking ChatGPT about a type of case |
| Synthetic data | Training, demos, testing | Training the team in legal AI use |
Own infrastructure vs public cloud
Model A: Public cloud (ChatGPT, Claude API, etc.)
- Pros: Easy to use, always updated, no maintenance.
- Cons: Data leaves your control, possible training on your data, complex GDPR compliance.
- Suitable for: Generic queries without real data, training, brainstorming.
Model B: API with data processing agreement
- Pros: Training opt-out, DPA contract, better control.
- Cons: Data still leaves your infrastructure, possible international transfer.
- Suitable for: Professional use with pseudonymized data and signed DPA.
Model C: Own infrastructure / on-premise
- Pros: Total control, data never leaves, simplified GDPR compliance.
- Cons: Requires investment in hardware/infrastructure, own models may be less capable.
- Suitable for: Large firms with very sensitive data and infrastructure budget.
Model D: Specialized legal tool with EU infrastructure
- Pros: Combines model quality with controlled infrastructure, DPA included, designed for compliance.
- Cons: Subscription cost, provider dependency.
- Suitable for: Most firms wanting to use AI professionally and safely.
Case study: data breach in a law firm
Scenario
A lawyer from a mid-size firm copies the full text of a complaint (with names, IDs, addresses, bank details of the claimant) and pastes it into ChatGPT to ask for a summary.
Potential consequences
- GDPR infringement (Art. 83): fine of up to €20 million or 4% of annual worldwide turnover, whichever is higher.
- Breach of professional secrecy: possible disciplinary proceedings from the Bar Association.
- Civil liability: if the client discovers the leak, they can claim damages.
- Reputational damage: loss of client trust and damage to the firm's brand.
How it should have been done
- Pseudonymize the text before entering it.
- Use a legal tool with DPA and EU infrastructure.
- Verify the record of processing activities includes AI use.
- Inform the client that AI tools are used in the firm (clause in the service agreement).
Module summary
| Concept | Key takeaway |
|---|---|
| Public cloud data | Never upload real client data to ChatGPT/GPT-4 without anonymizing |
| GDPR | Using AI with personal data requires legal basis, DPA, and records |
| Anonymization | Pseudonymize at minimum; fully anonymize for public cloud |
| Professional secrecy | Extends to AI tool usage: the lawyer is always responsible |
| Infrastructure | Prefer tools with EU infrastructure and signed DPA |