25 min · Sofía

Privacy, Data and Anonymization

GDPR, own infrastructure vs public cloud, data breach case study.

Why you should NOT upload client data to ChatGPT

When you use ChatGPT, GPT-4, or any public cloud AI service, the data you input may be:

  1. Processed on servers outside the EU (typically USA).
  2. Used to improve the model (unless you use the API with explicit opt-out).
  3. Accessible to employees of the provider company during security audits.
  4. Stored for indeterminate periods in service logs.

This makes the lawyer the data controller (or processor) under GDPR, with all corresponding obligations.

The real problem: professional secrecy + data protection

The lawyer's duty of professional secrecy (Art. 542.3 LOPJ and Art. 32 EGA in Spain; similar provisions exist across EU jurisdictions) requires that client information not be disclosed to third parties without consent. By uploading data to a cloud LLM:

  • You disclose information to a third party (the AI provider).
  • You have no guarantee it won't be used for other purposes.
  • You may be transferring data outside the EEA without adequate legal basis.
  • Client consent doesn't necessarily cover this use.

GDPR applied to AI use in law firms

Relevant principles (Art. 5 GDPR)

| Principle | AI implication |
| --- | --- |
| Data minimization | Only input data strictly necessary for the query |
| Purpose limitation | Client data is for their legal matter, not model training |
| Integrity and confidentiality | You must ensure data doesn't leak through third-party services |
| Accountability | You must be able to demonstrate you take adequate measures |

Do you need a DPIA (Data Protection Impact Assessment)?

Under Art. 35 GDPR, you likely need a DPIA if:

  • You process special categories of data (health, criminal records).
  • The processing is systematic and large-scale.
  • You use new technologies that may pose a high risk.

Generative AI applied to legal client data meets all three criteria in many cases.

Practical obligations

  1. Data processing agreement (Art. 28 GDPR) with the AI provider.
  2. Record of processing activities that includes AI use.
  3. Data subject information (Art. 13-14 GDPR): clients must know you use AI.
  4. International transfer assessment: Do data leave the EEA?
  5. Security measures: encryption, anonymization, pseudonymization.

Anonymization techniques for legal AI

Before entering any case data into an AI tool, you must anonymize it. Main techniques:

1. Pseudonymization

Replace identifying data with codes:

  • "John Smith" → "Party A" or "[CLAIMANT]"
  • "23 Main Street, London" → "[ADDRESS_1]"
  • "ID 12345678-A" → "[ID_REDACTED]"

Advantage: maintains text coherence, so you can still follow the logic of the case. Limitation: this is not full anonymization; anyone holding the mapping table can reverse it.
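The substitution above can be sketched in code. This is a minimal, hypothetical example: the patterns and placeholder labels are illustrative assumptions, not a production PII detector. Real matter files need NER-based tooling plus human review.

```python
import re

def pseudonymize(text, rules):
    """Replace identifiers with placeholder codes and keep a mapping
    table so the substitution stays reversible internally."""
    mapping = {}
    for pattern, placeholder in rules:
        for match in re.findall(pattern, text):
            mapping[placeholder] = match  # keep this table OUT of the AI tool
        text = re.sub(pattern, placeholder, text)
    return text, mapping

# Illustrative rules only; a real deployment needs far broader coverage.
RULES = [
    (r"John Smith", "[CLAIMANT]"),
    (r"23 Main Street, London", "[ADDRESS_1]"),
    (r"\d{8}-[A-Z]", "[ID_REDACTED]"),  # Spanish DNI-style identifier
]

safe, table = pseudonymize("John Smith (ID 12345678-A), 23 Main Street, London.", RULES)
# safe  == "[CLAIMANT] (ID [ID_REDACTED]), [ADDRESS_1]."
# table == {"[CLAIMANT]": "John Smith", ...}
```

The mapping table is the reversibility risk the limitation above describes: it must stay inside the firm, never alongside the pseudonymized text.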

2. True anonymization

Completely remove data that allows identification:

  • Remove names, IDs, addresses, phone numbers, emails.
  • Generalize: "47-year-old man from Madrid" → "middle-aged person from a large city".
  • Remove unique data: case file number, vehicle registration.
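Generalization, the second bullet above, can be sketched as simple bucketing functions. The age buckets and city list here are illustrative assumptions; note also that combinations of quasi-identifiers (age + city + profession) can still re-identify a person even when each field alone looks harmless.

```python
# A sketch of generalization, one building block of true anonymization.
def generalize_age(age):
    if age < 30:
        return "young adult"
    if age < 60:
        return "middle-aged person"
    return "older person"

def generalize_city(city):
    # Assumed list of "large" cities for the example.
    large_cities = {"Madrid", "Barcelona", "London", "Paris"}
    return "a large city" if city in large_cities else "a small municipality"

description = f"{generalize_age(47)} from {generalize_city('Madrid')}"
# description == "middle-aged person from a large city"
```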

3. Synthetic data

Generate fictional data that maintains the statistical properties of the case without revealing real data:

  • Use coherent fictional names.
  • Substitute real amounts with approximate ones.
  • Change dates while maintaining relative deadlines.
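The date-shifting idea in the last bullet can be sketched as follows: every date in a matter is moved by the same random offset, so the absolute dates become fictional while the relative deadlines (the legally meaningful intervals) are preserved. The function and seed are illustrative, not part of any standard tool.

```python
from datetime import date, timedelta
import random

def shift_dates(dates, seed=None):
    """Shift every date by one shared random offset, preserving intervals."""
    rng = random.Random(seed)
    offset = timedelta(days=rng.randint(-365, 365))
    return [d + offset for d in dates]

filing = date(2023, 3, 1)
response_deadline = date(2023, 3, 21)        # 20 calendar days after filing
shifted = shift_dates([filing, response_deadline], seed=42)
assert (shifted[1] - shifted[0]).days == 20  # the interval survives the shift
```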

When to use each technique

| Technique | When to use | Example |
| --- | --- | --- |
| Pseudonymization | Internal analysis where you need coherence | Preparing procedural strategy with AI |
| True anonymization | Sharing with cloud tools | Asking ChatGPT about a type of case |
| Synthetic data | Training, demos, testing | Training the team in legal AI use |

Own infrastructure vs public cloud

Model A: Public cloud (ChatGPT, Claude API, etc.)

  • Pros: Easy to use, always updated, no maintenance.
  • Cons: Data leaves your control, possible training on your data, complex GDPR compliance.
  • Suitable for: Generic queries without real data, training, brainstorming.

Model B: API with data processing agreement

  • Pros: Training opt-out, DPA contract, better control.
  • Cons: Data still leaves your infrastructure, possible international transfer.
  • Suitable for: Professional use with pseudonymized data and signed DPA.

Model C: Own infrastructure / on-premise

  • Pros: Total control, data never leaves, simplified GDPR compliance.
  • Cons: Requires investment in hardware/infrastructure, own models may be less capable.
  • Suitable for: Large firms with very sensitive data and infrastructure budget.

Model D: Specialized legal tool with EU infrastructure

  • Pros: Combines model quality with controlled infrastructure, DPA included, designed for compliance.
  • Cons: Subscription cost, provider dependency.
  • Suitable for: Most firms wanting to use AI professionally and safely.

Case study: data breach in a law firm

Scenario

A lawyer from a mid-size firm copies the full text of a complaint (with names, IDs, addresses, bank details of the claimant) and pastes it into ChatGPT to ask for a summary.

Potential consequences

  1. GDPR infringement (Art. 83): fines of up to €20 million or 4% of annual worldwide turnover, whichever is higher.
  2. Breach of professional secrecy: possible disciplinary proceedings from the Bar Association.
  3. Civil liability: if the client discovers the leak, they can claim damages.
  4. Reputational damage: loss of client trust and damage to the firm's brand.

How it should have been done

  1. Pseudonymize the text before entering it.
  2. Use a legal tool with DPA and EU infrastructure.
  3. Verify the record of processing activities includes AI use.
  4. Inform the client that AI tools are used in the firm (clause in the service agreement).

Module summary

| Concept | Key takeaway |
| --- | --- |
| Public cloud data | Never upload real client data to ChatGPT/GPT-4 without anonymizing |
| GDPR | Using AI with personal data requires legal basis, DPA, and records |
| Anonymization | Pseudonymize at minimum; fully anonymize for public cloud |
| Professional secrecy | Extends to AI tool usage: the lawyer is always responsible |
| Infrastructure | Prefer tools with EU infrastructure and signed DPA |


Module quiz

  1. Which GDPR article regulates the relationship with data processors?
  2. What does automatic anonymization do?
  3. Where is data processed in a legal tool with its own EU infrastructure?
  4. Can OpenAI use your ChatGPT data to train its models?
  5. What was the consequence for the firm that leaked data by using general-purpose AI?
