From 5ca022074dcdfa9d41632fb10d71ef0f5b99a197 Mon Sep 17 00:00:00 2001
From: Kyle J Turpin
Date: Sat, 23 May 2026 17:48:24 -0500
Subject: [PATCH] feat: Document personal data handling and lawful basis in architecture and open questions

---
 docs/ARCHITECTURE.md   | 93 +++++++++++++++++++++++++++++++++++++++++-
 docs/OPEN_QUESTIONS.md |  9 ++--
 2 files changed, 95 insertions(+), 7 deletions(-)

diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index 2c1afc3..8609bdb 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -234,8 +234,97 @@ FeDIY is designed with privacy as a first-class concern. The architecture must s

 ### Lawful Basis for Processing

-- Account data is processed on the basis of contract (the user's agreement to the terms of service).
-- Where consent is required (e.g. non-essential cookies, marketing communications), it is obtained explicitly, recorded, and revocable.
+### Personal Data Register
+
+All personal data collected by FeDIY instances is documented here. Every field has a stated purpose and lawful basis. No field is collected without both.
+
+#### Required account fields (processed under contract)
+
+| Field | What is stored | Purpose | Retention |
+|---|---|---|---|
+| Email address | Plaintext (normalised) | Account recovery, notifications, operator contact | Duration of account; deleted on erasure |
+| Password | Argon2id hash only — plaintext never stored or logged | Authentication | Duration of account; deleted on erasure |
+| Handle (`@user@instance`) | Plaintext; URL-safe string | AP actor identity, addressability, federation | Duration of account; old handles redirect to new after a change; deleted on erasure |
+| Display name | Plaintext; user-chosen, pseudonym allowed | Human-readable identity in UI and AP objects | Duration of account; deleted on erasure |
+| Minimum age verified | Boolean (`true`/`false`) — raw date of birth is **not stored** | Compliance with COPPA/GDPR Art. 8 age gate; raw DOB is used once at registration to derive this flag and then discarded | Duration of account; deleted on erasure |
+| Account creation timestamp | UTC timestamp | Audit, legal compliance | Duration of account; may be retained in anonymised moderation records after deletion |
+
+#### Optional profile fields (processed under contract — user chose to provide them)
+
+| Field | Purpose | Retention |
+|---|---|---|
+| Bio / about text | Public self-description | Duration of account; deleted on erasure |
+| Avatar image | Visual identity in UI and AP actor object | Duration of account; deleted on erasure |
+| Header / banner image | Profile page decoration | Duration of account; deleted on erasure |
+| Location (free text, not geocoded) | Community context; user-declared, not verified | Duration of account; deleted on erasure |
+| Preferred crafts / interests | Discovery and personalisation | Duration of account; deleted on erasure |
+| Pronouns | Respectful interaction | Duration of account; deleted on erasure |
+| External links (website, social profiles) | Attribution and cross-platform identity | Duration of account; deleted on erasure |
+| Preferred locale | UI language and formatting | Duration of account; deleted on erasure |
+| Display preferences (font, size, spacing, contrast) | Reading comfort and accessibility | Duration of account; deleted on erasure |
+
+#### Session and authentication data (processed under contract)
+
+| Field | What is stored | Purpose | Retention |
+|---|---|---|---|
+| Session token | Opaque cryptographic token (server-side record) | Authenticated API access | Purged on logout; purged on expiry; all tokens purged on account deletion |
+| Token expiry | Timestamp | Session lifecycle management | Purged with token |
+| Security event log | Timestamp + account ID + event type (login, logout, failed login, password change) — **no IP address** | Audit trail for account security events | Short retention (30 days); purged on account deletion |
+
+#### IP addresses
+
+**IP addresses are never written to persistent storage.** They are present in the request context during processing and discarded when the request completes. Brute-force and abuse detection uses in-memory rate limiting scoped to the running process, not a persistent IP log.
+
+This is a deliberate data minimisation decision. The privacy notice must state it explicitly as a feature of the platform's approach.
+
+#### Content data (processed under contract)
+
+| Data | Notes | Retention |
+|---|---|---|
+| Published projects (all fields, media, steps) | Publicly visible; federated to AP peers | Tombstoned on deletion (not fully erased if referenced by others); see Right to Erasure section |
+| Draft projects | Private; never federated; not visible to other users | Fully deleted immediately on account deletion or on user request |
+| Media attachments | Images, files uploaded to the instance | Deleted with the content they belong to |
+| Tags, materials, tools associated with projects | Part of the project record | Same lifecycle as the project |
+
+#### Interaction data (processed under contract)
+
+| Data | Notes | Retention |
+|---|---|---|
+| Follows (outgoing and incoming) | Social graph; federated as AP Follow/Accept activities | Deleted on account deletion; unfollow activity sent to peers |
+| Likes / favourites | Interaction record | Deleted on account deletion |
+| Bookmarks and personal collections | Private by default | Deleted on account deletion; included in data export |
+| Block list | Personal moderation data | Deleted on account deletion; included in data export |
+| Mute list | Personal moderation data | Deleted on account deletion; included in data export |
+| Keyword filters | Personal moderation data | Deleted on account deletion; included in data export |
+
+#### Moderation and safety records (lawful basis: legal obligation / legitimate interest)
+
+| Data | Notes | Retention |
+|---|---|---|
+| Reports filed by a user | User's own report history | Included in data export; deleted on account deletion (report content retained in anonymised form) |
+| Moderation actions taken against a user | Actions, outcomes, dates | Anonymised at account deletion time (personal identifiers removed, safety record retained for legal compliance) |
+| CSAM / NCII violation records | Anonymised record of confirmed violations and reports filed | Retained for operator's legal compliance obligations regardless of account status |
+
+#### Federated / remote actor data (lawful basis: legitimate interest — necessary to operate the federation)
+
+| Data | Notes | Retention |
+|---|---|---|
+| Remote actor profile cache | Handle, display name, AP actor URL, public key | Retained while needed for federation processing; purged when a `Delete` activity is received for the actor |
+| Received AP activities | Cached copies of federated content from remote users | Retained while operationally needed; purged on receipt of `Delete` |
+
+#### Analytics (lawful basis: legitimate interest — only if truly anonymised)
+
+- First-party analytics are in scope, but must be **truly aggregate and anonymised** — not pseudonymised per-user event streams.
+- Aggregate statistics (daily active users, most-viewed projects, search term frequency) that cannot be linked to any individual are not personal data under GDPR and require no consent.
+- If per-user behavioural events are ever collected — even temporarily before aggregation — they become personal data at the point of collection and require explicit consent.
+- The default configuration ships with analytics **off**. Operators enable it and are responsible for ensuring their approach stays within the anonymised boundary or obtains the required consent.
+
+### Lawful Basis Summary
+
+- **Contract**: required account fields, optional profile fields, session data, content data, interaction data.
+- **Legal obligation**: age verification, CSAM/NCII reporting records, moderation records for legal compliance.
+- **Legitimate interest**: federated actor cache, security event log, truly anonymised analytics.
+- **Consent**: per-user behavioural analytics if ever collected; any future non-essential processing not covered above.
 - Processing logs record the lawful basis used for each data category.

 ### Right to Access (Article 15)
diff --git a/docs/OPEN_QUESTIONS.md b/docs/OPEN_QUESTIONS.md
index 9b85d68..7442258 100644
--- a/docs/OPEN_QUESTIONS.md
+++ b/docs/OPEN_QUESTIONS.md
@@ -16,6 +16,8 @@ Each question is tagged with the phase it blocks or most affects: [P0], [P1], [P
 - Materials are also extensible entities. The core material record (name, quantity, unit) can carry domain-specific extension payloads using the same mechanism as project extensions. Community-defined material type schemas (e.g. yarn, filament, PCB component) can be layered on without modifying the core model.
 - **Database: PostgreSQL is the primary persistence target.** JSONB, native full-text search, transactional consistency under concurrent federation fan-in, and broad managed hosting support make it the clear long-term fit. The persistence layer is behind a repository abstraction (trait-based interfaces), which keeps business logic independent of the database driver and leaves SQLite viable as a future lightweight self-hosting option without requiring changes to domain logic. See [ADR TBD: Persistence Layer Architecture].
 - **Baseline content prohibitions (hardcoded, not operator-configurable):** CSAM, doxxing, and non-consensual intimate imagery (NCII) are prohibited on any FeDIY instance regardless of operator policy. The guiding principle is **consent**: minors cannot consent to sexual content; individuals have not consented to having their private identifying information published; subjects of intimate imagery have not consented to its distribution. Enforcement is in-code as far as technically feasible (hash-matching for CSAM and NCII; upload-time pattern detection and mandatory human-review tooling for doxxing). Weapons, violence, and similar dual-use content are **not** hardcoded prohibitions — legitimate DIY projects (fireworks, blacksmithing, knife-making) are indistinguishable at the software level and are handled by operator content policy and community moderation.
+- **Baseline content prohibitions (hardcoded, not operator-configurable):** CSAM, doxxing, and non-consensual intimate imagery (NCII) are prohibited on any FeDIY instance regardless of operator policy. The guiding principle is **consent**: minors cannot consent to sexual content; individuals have not consented to having their private identifying information published; subjects of intimate imagery have not consented to its distribution. Enforcement is in-code as far as technically feasible (hash-matching for CSAM and NCII; upload-time pattern detection and mandatory human-review tooling for doxxing). Weapons, violence, and similar dual-use content are **not** hardcoded prohibitions — legitimate DIY projects (fireworks, blacksmithing, knife-making) are indistinguishable at the software level and are handled by operator content policy and community moderation.
+- **Personal data register (Q38):** Full register in ARCHITECTURE.md. Required registration fields: email, password hash, handle, display name, minimum-age-verified boolean (raw DOB discarded after age check). IP addresses never stored — ephemeral only. Optional profile fields (bio, avatar, header image, location, preferred crafts, pronouns, external links, locale, display preferences) all under contract. Analytics must be truly aggregate/anonymised — per-user event streams require consent. Handles are changeable with a redirect from old to new URL.

 ## Upfront Clarification Plan (P0 -> Early P1)

@@ -372,12 +374,9 @@ Decision: CSAM, doxxing, and NCII are hardcoded prohibitions enforced in code as
 ## Privacy and Legal Compliance

 **Q38 [P0/P1]** What personal data does FeDIY collect and what is the lawful basis for each category?
+**Q38 [P0/P1]** ~~What personal data does FeDIY collect and what is the lawful basis for each category?~~ **RESOLVED — see Resolved Decisions and ARCHITECTURE.md Personal Data Register.**

-- Account registration data (email, display name, password hash): processed under contract. What is the minimum required set?
-- IP address and access logs: is there a documented retention window and purge policy before Phase 1 launches?
-- Content data (projects, drafts, media): processed under contract. Drafts are private; what is the data model distinction between draft and published that ensures privacy?
-- Session and authentication tokens: what is the TTL and purge-on-logout policy?
-- Does the platform ever use personal data for purposes beyond what is needed to operate the service (e.g. analytics, recommendations)? If so, is consent obtained?
+Decision: Full personal data register documented in ARCHITECTURE.md. Required registration fields: email, password hash, handle, display name, minimum-age-verified boolean (raw DOB discarded after age check). IP never stored. Optional profile fields under contract. Analytics must be truly aggregate/anonymised. Handles changeable with redirect.

 **Q39 [P1]** What does the right-to-access (GDPR Article 15) export look like?
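
The IP address rule in the Personal Data Register (brute-force detection via in-memory rate limiting scoped to the running process, with no persistent IP log) could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name `InMemoryRateLimiter`, its parameters, and the sliding-window strategy are hypothetical and are not taken from the FeDIY codebase, and Python is used purely for readability.

```python
import time
from collections import defaultdict, deque


class InMemoryRateLimiter:
    """Hypothetical sliding-window limiter; all state lives in process memory.

    Nothing is written to disk or a database, so the recorded attempts
    disappear when the process exits.
    """

    def __init__(self, max_attempts, window_seconds, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._attempts = defaultdict(deque)  # client IP -> attempt times

    def _drop_expired(self, attempts, now):
        # Remove attempt timestamps that have fallen out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()

    def allow(self, client_ip):
        """Return True if this attempt is within the limit, else False."""
        now = self._clock()
        attempts = self._attempts[client_ip]
        self._drop_expired(attempts, now)
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True

    def sweep(self):
        """Forget every IP whose attempts have all expired."""
        now = self._clock()
        for ip in list(self._attempts):
            attempts = self._attempts[ip]
            self._drop_expired(attempts, now)
            if not attempts:
                del self._attempts[ip]
```

In this sketch a periodic call to `sweep()` keeps an address in memory only for as long as the window requires. A multi-process deployment would need a different design, since each process sees only its own counters.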