Matsubaa 55d9db5c6c feat: Enhance architecture and roadmap documentation with material extensibility and persistence layer details

2026-05-23 17:40:20 -05:00

FeDIY Architecture (Planning Baseline)

This document captures current architectural intent. It is a planning artifact and should be refined as implementation decisions mature.

System Boundaries

Primary system responsibilities:

Expose a stable HTTP API as the primary interface for all clients.
Host and render DIY project content through a bundled web UI that consumes the public API.
Manage local accounts and authorization.
Federate selected objects and activities using ActivityPub.
Enforce local moderation policy for both local and remote content.

Out of scope for early phases:

Rich social graph features beyond project-oriented interactions.
Highly customized recommendation systems.

Core Domains

Identity and actors.
Project authoring and publishing.
Federation transport and protocol translation.
Moderation and trust policy.
Search and discovery.

Project Domain Baseline

Current product definition:

A project is a defined work process.
Core project structure includes materials/ingredients, required tools, and ordered step-by-step instructions.
Steps may include embedded media hosted on this instance or linked from external sources.
Projects may include external canonical links (for example homepage, repository, or source publication).
Explicit project versioning is preferred and will be part of the domain model.
The project model is composable: a minimal core plus optional domain-specific extensions.
Domain-specific detail (for example knitting patterns/yarns, 3D print profiles/STLs, electronics BoM data) should be representable without being mandatory for all instances.
First-party FeDIY focuses on a stable extension mechanism rather than implementing every niche schema directly.
Materials are also an extensible entity: the core material record captures display name, quantity, and unit; domain-specific attributes (yarn weight, fibre content, filament diameter/material, wood species/grade, electronics component value/package) are carried in extension payloads on the material entry, using the same extension mechanism as project-level extensions.
A federated material catalog is a long-term goal: community-defined material types and shared taxonomy entries could be federated as ActivityPub objects, allowing instances to reference a common vocabulary without requiring central authority.
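
To make the "core fields plus extension payloads" idea concrete, here is a minimal sketch of how the project and material records described above might be shaped. The type names, the namespaced extension id, and the choice to hold payloads as raw JSON text are assumptions for illustration only; the actual model is still to be decided.

```rust
use std::collections::BTreeMap;

/// Extension payloads keyed by an extension id (the id format is hypothetical).
/// Payloads are held as raw JSON text in this sketch; a real implementation
/// would more likely use a structured JSON value type.
pub type Extensions = BTreeMap<String, String>;

/// Core material record: display name, quantity, and unit, plus optional extensions.
#[derive(Debug, Clone, PartialEq)]
pub struct Material {
    pub display_name: String,
    pub quantity: f64,
    pub unit: String,
    pub extensions: Extensions,
}

/// One ordered instruction step with optional embedded or linked media.
#[derive(Debug, Clone, PartialEq)]
pub struct Step {
    pub instructions: String,
    pub media_urls: Vec<String>,
}

/// Minimal core project; domain-specific detail lives in `extensions`.
#[derive(Debug, Clone, PartialEq)]
pub struct Project {
    pub title: String,
    pub version: u32,
    pub materials: Vec<Material>,
    pub tools: Vec<String>,
    pub steps: Vec<Step>,
    pub canonical_links: Vec<String>,
    pub extensions: Extensions,
}

impl Material {
    /// Builds a core-only material with no extension payloads.
    pub fn new(display_name: &str, quantity: f64, unit: &str) -> Self {
        Material {
            display_name: display_name.to_string(),
            quantity,
            unit: unit.to_string(),
            extensions: Extensions::new(),
        }
    }

    /// Attaches a domain-specific payload under the given extension id.
    pub fn with_extension(mut self, id: &str, payload_json: &str) -> Self {
        self.extensions.insert(id.to_string(), payload_json.to_string());
        self
    }
}
```

A knitting instance could then attach yarn weight to a material with `with_extension`, while an instance that does not know that extension still reads the name, quantity, and unit.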

Client and Front-End Strategy

The server exposes a stable, documented HTTP API as its primary interface. The bundled web UI is a first-party client of that same API — it receives no privileged server-side access that a third-party client could not also use.

Principles:

API-first: every user-facing capability is reachable through the public API before the bundled UI uses it.
No server-side rendering shortcuts: the bundled UI must not depend on server-internal state or bypass the API layer.
Third-party clients are first-class: authentication, content negotiation, and capability negotiation must work the same way for any client origin.
Content negotiation: the server responds to Accept: application/activity+json (or application/ld+json with the ActivityStreams profile) on federation endpoints and to Accept: application/json on the API, so that a single URL namespace can serve both.
CORS: cross-origin API access is supported for read-only public resources, with authenticated endpoints requiring explicit opt-in by the instance operator.
Bundled UI is optional at deploy time: an instance operator should be able to run the backend and serve their own front-end without modifying server code.
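
The content negotiation principle above could be sketched roughly as follows. The `Representation` enum and `negotiate` function are hypothetical names, and the sketch deliberately ignores q-values and media type parameters, which a real implementation would need to weigh.

```rust
/// Which representation of a resource the client asked for.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Representation {
    ActivityPub,
    ApiJson,
}

/// Picks a representation from an Accept header value.
/// Simplification: the first recognised media type wins; q-values and
/// parameters such as `profile=` are ignored.
pub fn negotiate(accept: &str) -> Option<Representation> {
    for item in accept.split(',') {
        // Keep only the media type, dropping any ";param=value" suffix.
        let media_type = item
            .split(';')
            .next()
            .unwrap_or("")
            .trim()
            .to_ascii_lowercase();
        match media_type.as_str() {
            "application/activity+json" | "application/ld+json" => {
                return Some(Representation::ActivityPub)
            }
            "application/json" => return Some(Representation::ApiJson),
            _ => {}
        }
    }
    // Nothing recognised: the caller decides on a default or a 406 response.
    None
}
```

Returning `None` rather than a default keeps the fallback decision (serve the API form, or reject with 406 Not Acceptable) in the routing layer.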

API surface areas:

Public read API: browse projects, actors, and search without authentication.
Authenticated user API: authoring, account management, preferences.
Moderation API: moderation actions and audit review, scoped by role.
Federation endpoints: ActivityPub inbox/outbox per the ActivityPub specification.
Well-known endpoints: WebFinger, NodeInfo, and instance metadata.

Bundled web UI:

Delivered as static assets served from the same origin.
Communicates only through the public API.
Treated as the reference client for API usability validation.
Should degrade gracefully, where practical, when JavaScript is unavailable.
Must support per-user display preferences: font choice (including dyslexia-friendly fonts), size, line spacing, and contrast settings persisted to the user's account.
Must not hardcode fonts or layout in ways that prevent user override.

Logical Components

Public HTTP API layer (routing, auth middleware, content negotiation, CORS).
Domain service layer for project and account workflows.
Federation layer (inbox/outbox, signing, verification, retries).
Persistence layer for local and federated records.
Moderation policy engine and audit log.
Bundled web UI (static asset delivery).
Background workers for delivery, indexing, and maintenance tasks.

Data and Object Strategy

Separate local canonical records from imported federated records.
Preserve source metadata for remote content and actor provenance.
Track object lifecycle states to support idempotent federation processing.
Persist project data as core fields plus extension payloads so instances can tailor domain detail without fragmenting the base model.
Material entries within a project carry the same extension payload structure as the project itself; domain-specific material attributes are co-located with the material entry rather than encoded in project-level fields.

Persistence Layer Architecture

Database

PostgreSQL is the primary persistence target.

Reasons:

JSONB: ActivityPub objects and extension payloads are stored as structured JSON with PostgreSQL's JSONB operators. Extension payloads can be queried, indexed, and validated without an external document store.
Native full-text search: PostgreSQL's built-in FTS with tsvector/tsquery eliminates an external search service for Phase 1. Language-specific configurations (stemming, stop words) are available per-column.
Transactional consistency under federation load: federation fan-in (many incoming AP activities from many peers) involves concurrent writes. PostgreSQL's MVCC concurrency model and row-level locking handle this safely. SQLite's single-writer model would be a bottleneck under the same load.
Broad managed hosting support: PostgreSQL is offered as a managed service by every major cloud platform and by many hosting providers, which keeps operational effort low and lowers the barrier for instance operators.

SQLite is not in scope for Phase 1 but is explicitly not ruled out as a future lightweight self-hosting option (see Repository Abstraction below).

Repository Abstraction

No business logic or domain service code queries the database directly. All persistence operations go through a repository interface layer:

Each domain aggregate (project, account, actor, moderation record, etc.) has a corresponding repository interface defined as a trait.
The domain service layer depends only on those traits, not on any database library types.
The PostgreSQL implementation of each trait is the only first-party implementation.
A SQLite implementation could be added in a future phase by implementing the same traits with SQLite-dialect queries — zero changes to domain logic or API handlers would be required.
The repository layer is also the natural seam for test doubles: domain logic tests can use an in-memory implementation of the traits without a running database.
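
A minimal sketch of this seam is shown below. All names (`ProjectRecord`, `ProjectRepository`, `InMemoryProjectRepository`, `rename_project`) are hypothetical, and the trait is synchronous only to keep the example dependency-free; the real traits are expected to be async.

```rust
use std::collections::HashMap;

/// Simplified aggregate used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct ProjectRecord {
    pub id: u64,
    pub title: String,
}

#[derive(Debug, PartialEq)]
pub enum RepoError {
    NotFound,
}

/// Persistence contract the domain layer depends on.
/// No database library types appear in the signatures.
pub trait ProjectRepository {
    fn get(&self, id: u64) -> Result<ProjectRecord, RepoError>;
    fn save(&mut self, project: ProjectRecord) -> Result<(), RepoError>;
}

/// In-memory test double; a PostgreSQL-backed type would implement the same trait.
#[derive(Default)]
pub struct InMemoryProjectRepository {
    rows: HashMap<u64, ProjectRecord>,
}

impl ProjectRepository for InMemoryProjectRepository {
    fn get(&self, id: u64) -> Result<ProjectRecord, RepoError> {
        self.rows.get(&id).cloned().ok_or(RepoError::NotFound)
    }

    fn save(&mut self, project: ProjectRecord) -> Result<(), RepoError> {
        self.rows.insert(project.id, project);
        Ok(())
    }
}

/// Domain service function: written against the trait, not a concrete store.
pub fn rename_project<R: ProjectRepository>(
    repo: &mut R,
    id: u64,
    new_title: &str,
) -> Result<ProjectRecord, RepoError> {
    let mut project = repo.get(id)?;
    project.title = new_title.trim().to_string();
    repo.save(project.clone())?;
    Ok(project)
}
```

Because `rename_project` is generic over the trait, the same function runs unchanged against the in-memory double in tests and against whichever database-backed implementation is wired in at startup.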

Query Library

The query library choice is deferred to the implementation ADR, but the constraints are:

Must support async execution.
Must support both PostgreSQL and SQLite dialects (to keep the SQLite future option open).
Compile-time query checking is strongly preferred to catch SQL errors before runtime.
sqlx satisfies all three constraints and is the expected choice; the final decision will be recorded in the ADR.

Migration Strategy

See Q28. Database migrations are run as part of application startup (or a separate migrate subcommand) and are versioned, idempotent, and checked into source control alongside the schema they produce.
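
As an illustration of "versioned and idempotent", the sketch below applies only the migrations whose version has not yet been recorded, in ascending order, so running it a second time does nothing. The `Migration` struct, the `migrate` function, and the `execute` callback are hypothetical stand-ins; in practice the migration tooling of the chosen query library would likely fill this role.

```rust
use std::collections::BTreeSet;

/// One versioned schema change, checked into source control.
pub struct Migration {
    pub version: u32,
    pub name: &'static str,
    pub sql: &'static str,
}

/// Returns the migrations that have not been applied yet, lowest version first.
pub fn pending<'a>(all: &'a [Migration], applied: &BTreeSet<u32>) -> Vec<&'a Migration> {
    let mut todo: Vec<&Migration> = all
        .iter()
        .filter(|m| !applied.contains(&m.version))
        .collect();
    todo.sort_by_key(|m| m.version);
    todo
}

/// Runs every pending migration through `execute` and records its version.
/// Returns how many migrations were applied; a second run returns 0.
pub fn migrate(
    all: &[Migration],
    applied: &mut BTreeSet<u32>,
    mut execute: impl FnMut(&str),
) -> usize {
    let todo = pending(all, applied);
    let count = todo.len();
    for migration in todo {
        execute(migration.sql);
        applied.insert(migration.version);
    }
    count
}
```

In a real deployment the `applied` set would live in a bookkeeping table and each migration would run inside a transaction together with the bookkeeping insert.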

Federation Strategy

Start with a narrow, explicit ActivityPub profile.
Prefer strict validation and clear rejection reasons over permissive parsing.
Use replay-safe request validation and deterministic retry behavior.
Maintain an interop test matrix for protocol behaviors.
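
Two of the points above, replay-safe validation and deterministic retries, can be sketched as follows. The `Inbox` type, the rejection strings, and the 30-second base and 6-hour cap are assumptions chosen for illustration, not decided values.

```rust
use std::collections::HashSet;
use std::time::Duration;

#[derive(Debug, PartialEq, Eq)]
pub enum InboxOutcome {
    Accepted,
    /// The activity id was already processed; safe to acknowledge and skip.
    Duplicate,
    /// Strict validation failed; the reason is reported to the sender.
    Rejected(&'static str),
}

/// Remembers processed activity ids so redelivered or replayed activities are not applied twice.
#[derive(Default)]
pub struct Inbox {
    seen_ids: HashSet<String>,
}

impl Inbox {
    pub fn receive(&mut self, activity_id: &str, signature_valid: bool) -> InboxOutcome {
        if !signature_valid {
            return InboxOutcome::Rejected("invalid HTTP signature");
        }
        if activity_id.is_empty() {
            return InboxOutcome::Rejected("missing activity id");
        }
        // `insert` returns false when the id was already present.
        if !self.seen_ids.insert(activity_id.to_string()) {
            return InboxOutcome::Duplicate;
        }
        InboxOutcome::Accepted
    }
}

/// Deterministic exponential backoff for outbound delivery: 30 s, 60 s, 120 s, ...
/// capped at 6 hours. No random jitter, so the schedule is reproducible in tests.
pub fn retry_delay(attempt: u32) -> Duration {
    const BASE_SECS: u64 = 30;
    const MAX_SECS: u64 = 6 * 60 * 60;
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    Duration::from_secs(BASE_SECS.saturating_mul(factor).min(MAX_SECS))
}
```

A production inbox would persist the seen ids (and expire them) rather than hold them in memory, and would also check the signed Date header window as part of replay protection.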

Content Integrity

Certain categories of content are prohibited on any FeDIY instance regardless of operator configuration. These are not moderation policy — they are non-negotiable constraints enforced at the software level. The guiding principle is consent: the prohibited categories share the property that no legitimate consent to the content's creation or publication can exist.

Hardcoded Prohibitions

Category	Rationale	Enforcement approach
Child Sexual Abuse Material (CSAM)	Minors cannot consent to sexual content; production is abuse	Perceptual hash-matching against NCMEC hash database on every upload
Non-Consensual Intimate Imagery (NCII)	Subject has not consented to distribution	Hash-matching against StopNCII or equivalent database on every upload
Doxxing	Individual has not consented to publication of their private identifying information	Upload-time pattern detection (phone, address, government ID formats) as a signal; mandatory human-review flagging pipeline; rapid takedown tooling

What Is Not Hardcoded

Weapons, violence, and dual-use content are not hardcoded prohibitions. Legitimate DIY projects — fireworks, blacksmithing, blade-smithing, pyrotechnics, casting — can be indistinguishable from prohibited content at the software level. This category is handled by operator content policy and community moderation tools, not by the platform software.

Hash-Matching Infrastructure

All media uploaded to a FeDIY instance is processed through the hash-matching pipeline before storage is confirmed. A match results in rejection of the upload and triggers the reporting workflow.
Hash databases are not bundled with the software. Operators must configure the integration (for example an NCMEC hash list, Microsoft PhotoDNA, or an equivalent service) before media upload is enabled. The application refuses to enable media upload without a configured hash-matching endpoint.
Hash-matching is performed locally on the server; media content is never transmitted to a third-party hash service. Only the computed hash is compared.
FeDIY provides clear integration documentation and a test mode for operators to verify their configuration before going live.

Doxxing Detection

Upload-time and submission-time scanning checks text content for patterns consistent with personal identifying information: phone number formats, postal address patterns, government ID number patterns (country-configurable), and combinations that together identify an individual.
Pattern matching is a signal, not a block: false positives (a project step referencing a phone socket) must not prevent legitimate content. Matched content is flagged for moderator review, not auto-rejected.
All instances must have at least one active moderator account to receive flagged content alerts before registrations are opened.
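
The "signal, not a block" rule could look roughly like this. The heuristic (ten or more digits separated only by common phone punctuation) and the names `Signal`, `Decision`, and `review_submission` are illustrative assumptions; real detection would be country-configurable and cover more pattern types.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Signal {
    PhoneLikeNumber,
}

/// Submissions are never rejected by pattern matching alone.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Accept,
    AcceptAndFlag(Vec<Signal>),
}

/// Crude heuristic: at least 10 digits in a run broken only by phone punctuation.
fn has_phone_like_number(text: &str) -> bool {
    let mut digits = 0;
    for ch in text.chars() {
        if ch.is_ascii_digit() {
            digits += 1;
            if digits >= 10 {
                return true;
            }
        } else if !matches!(ch, ' ' | '-' | '.' | '(' | ')' | '+') {
            // Any other character ends the current candidate number.
            digits = 0;
        }
    }
    false
}

/// Collects signals and routes matches to moderator review instead of blocking.
pub fn review_submission(text: &str) -> Decision {
    let mut signals = Vec::new();
    if has_phone_like_number(text) {
        signals.push(Signal::PhoneLikeNumber);
    }
    if signals.is_empty() {
        Decision::Accept
    } else {
        Decision::AcceptAndFlag(signals)
    }
}
```

A step that mentions a "6.35 mm phone socket" produces no signal here, while a full phone number is accepted but queued for a moderator.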

Reporting and Legal Obligations

When a CSAM match is confirmed by a moderator, the operator is required to report to the relevant national authority (NCMEC CyberTipline in the US, IWF in the UK, etc.). FeDIY provides a reporting workflow and documentation; the legal obligation rests with the instance operator as the data controller and platform host.
Confirmed NCII matches follow the StopNCII (or equivalent) removal workflow; the operator notifies the subject where possible.
The platform stores a minimal, anonymised record of confirmed violations and reports for the operator's legal compliance purposes.

Moderation and Safety Strategy

Instance and Moderator-Level Controls

Local policy is authoritative for what is visible on this instance.
Policy controls exist at three levels: object, actor, and instance.
Moderation actions produce auditable events.
Appeals and reversal policy should be documented before broad federation rollout.

User-Level Personal Moderation

Users have unilateral, moderator-independent tools to protect themselves from bad actors. These do not require moderator approval and take effect immediately for the acting user:

Block users: block specific local or remote accounts. A blocked account cannot interact with the user's content and the user does not see the blocked account's content.
Block instances: block an entire remote instance. The user sees no content from that instance, and its users cannot interact with the user.
Mute users: suppress a user's content from the user's feed without full blocking.
Keyword and wildcard filtering: filter content containing specific words, phrases, or wildcard patterns from the user's feed and notifications. Filters operate client-side or server-side at the user's preference.
No moderator gate: user-level blocks, mutes, and filters are the user's own data. Instance moderators cannot prevent a user from protecting themselves.
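
Keyword and wildcard filtering can be sketched with a small matcher. This version supports only `*` (any run of characters), matches case-insensitively against individual words, and does not handle multi-word phrases; the function names are hypothetical.

```rust
/// Returns true if `text` matches `pattern`, where `*` matches any run of characters.
pub fn wildcard_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0usize, 0usize);
    let mut star: Option<usize> = None; // position of the most recent '*' in the pattern
    let mut mark = 0usize; // text position that '*' is currently trying to absorb up to
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some(pi);
            mark = ti;
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some(s) = star {
            // Backtrack: let the last '*' absorb one more character.
            pi = s + 1;
            mark += 1;
            ti = mark;
        } else {
            return false;
        }
    }
    // Trailing '*' in the pattern may match the empty string.
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

/// True if any word of `text` matches any of the user's filter patterns.
pub fn is_filtered(filters: &[&str], text: &str) -> bool {
    let lowered = text.to_lowercase();
    lowered
        .split(|c: char| !c.is_alphanumeric())
        .filter(|word| !word.is_empty())
        .any(|word| {
            filters
                .iter()
                .any(|f| wildcard_match(&f.to_lowercase(), word))
        })
}
```

Since the matcher is a pure function, the same logic could run server-side or be mirrored in a client, in line with the user's choice of where filters apply.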

Users may choose to share their personal moderation and curation data with others:

Exportable/importable block lists: users can export their block list in a portable format (JSON-LD or similar) and share it. Others can import it in whole or selectively.
Subscribable block lists: a user may publish a block list as a live resource. Subscribers can opt in to apply it automatically or review updates manually.
Recommendation lists: users can publish curated lists of projects, accounts, or tags they recommend. Lists are first-class objects with their own AP identity.
Personal collections and bookmarks: users can organize saved projects into named collections and optionally make those collections public or shareable.
Community-defined lists: groups of users (communities, instances) can collaboratively maintain shared lists for moderation or discovery purposes.
Attribution and transparency: shared lists carry attribution to the list maintainer. Subscribers know whose judgment they are trusting.

Localization and Internationalization Strategy

FeDIY is designed to be usable across languages and locales from the beginning:

Locale-aware API: all user-facing strings are externalized and never hardcoded in the API layer. The API surfaces language/locale metadata about content (project language, author locale) to allow clients to filter, translate, or display appropriately.
Content language tagging: projects carry a declared language tag (BCP 47). Instances may restrict accepted content languages or accept all. Federated objects include language metadata.
UI string externalization: all bundled UI strings are stored in locale files (e.g., JSON or gettext PO format) from the start. The architecture does not permit hardcoded display strings.
RTL layout support: layout must support right-to-left scripts (Arabic, Hebrew, Persian, etc.) from the first UI design pass. CSS logical properties are preferred over physical ones.
Locale-sensitive formatting: dates, times, numbers, and units use locale-aware formatting rather than hardcoded conventions.
Community translation: translation files are exposed in the repository so that the community can contribute translations without touching application code.
Locale as a user preference: authenticated users can set their preferred locale, stored in their account. Unauthenticated users can set locale via browser Accept-Language or an explicit UI control.
Search and indexing: full-text search configurations are locale-aware where the persistence layer supports it (e.g., PostgreSQL language dictionaries).
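
Locale selection could be resolved along these lines. The precedence (account preference, then Accept-Language, then an instance default) and the fallback from a regional tag such as fr-CA to its primary subtag are assumptions of this sketch, as are the function names.

```rust
/// Finds a supported locale for `tag`: exact match first, then the primary subtag.
fn match_supported<'a>(tag: &str, supported: &[&'a str]) -> Option<&'a str> {
    let tag = tag.trim();
    if let Some(hit) = supported.iter().find(|s| s.eq_ignore_ascii_case(tag)) {
        return Some(*hit);
    }
    let primary = tag.split('-').next().unwrap_or("");
    supported
        .iter()
        .find(|s| s.eq_ignore_ascii_case(primary))
        .copied()
}

/// Parses an Accept-Language header into (tag, q) pairs, highest q first.
fn parse_accept_language(header: &str) -> Vec<(String, f32)> {
    let mut out: Vec<(String, f32)> = header
        .split(',')
        .filter_map(|part| {
            let mut pieces = part.split(';');
            let tag = pieces.next()?.trim();
            if tag.is_empty() || tag == "*" {
                return None;
            }
            // A missing or unparsable q-value counts as 1.0.
            let q = pieces
                .find_map(|p| p.trim().strip_prefix("q="))
                .and_then(|v| v.parse::<f32>().ok())
                .unwrap_or(1.0);
            Some((tag.to_string(), q))
        })
        .collect();
    // Stable sort keeps header order among equal q-values.
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    out
}

/// Account preference wins, then the browser header, then the instance default.
pub fn resolve_locale(
    account_pref: Option<&str>,
    accept_language: Option<&str>,
    supported: &[&str],
    default: &str,
) -> String {
    if let Some(hit) = account_pref.and_then(|p| match_supported(p, supported)) {
        return hit.to_string();
    }
    if let Some(header) = accept_language {
        for (tag, q) in parse_accept_language(header) {
            if q <= 0.0 {
                continue; // q=0 means "not acceptable"
            }
            if let Some(hit) = match_supported(&tag, supported) {
                return hit.to_string();
            }
        }
    }
    default.to_string()
}
```

An explicit UI control for unauthenticated users would simply feed its value in as the first argument.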

Non-Functional Requirements

Accessibility: designed from the start to be inclusive for people with sensory and motor differences. Includes WCAG 2.1 AA minimum, text-to-speech hooks, alt text for all visual content, captions for audio, keyboard navigation, and support for assistive technologies.
Localization: all UI strings externalized and locale-ready from day one; RTL layout support; content language tagging; community-translatable.
Reading comfort: users can override fonts (including dyslexia-friendly typefaces), size, line spacing, and contrast. No system-level or application-level font lock-in.
Reliability: resilient delivery and retry strategy for federated traffic.
Security: signature verification, key management, least-privilege defaults, and safe CORS policy.
Performance: predictable latency for local reads and bounded queues for remote events.
Operability: metrics, logs, and runbooks for incident response.
API stability: public API changes follow a deprecation policy; breaking changes require a version increment.

Privacy and Legal Compliance Strategy

FeDIY is designed with privacy as a first-class concern. The architecture must satisfy GDPR (and equivalent data-protection regulations) by design, not by retrofit.

Data Minimisation and Purpose Limitation

Collect only the personal data required for the platform to function.
Every personal data field must have a documented purpose. Fields without a clear purpose are not collected.
Access to personal data is scoped to the minimum needed for each system component.

Lawful Basis for Processing

Account data is processed on the basis of contract (the user's agreement to the terms of service).
Where consent is required (e.g. non-essential cookies, marketing communications), it is obtained explicitly, recorded, and revocable.
Processing logs record the lawful basis used for each data category.

Right to Access (Article 15)

Users can request a full export of all personal data held about them via a self-service API endpoint.
The export is machine-readable (JSON), human-readable, and covers: account data, published and draft content, interactions (follows, likes, bookmarks), moderation history affecting the user, and session/audit records.
Federated data about the user held by remote instances is outside the local instance's control; exports note this explicitly.

Right to Erasure / Right to be Forgotten — Technical Design

GDPR Article 17 (and equivalent laws in other jurisdictions) defines erasure at two levels. FeDIY must handle both:

Level 1 — Local erasure (the account deletion step):

Users initiate deletion via a self-service workflow with no moderator or admin involvement required.
Local erasure covers: account credentials, profile fields, email address, private/draft content, session tokens, IP logs beyond the retention window, and interaction data (follows, likes, bookmarks, block lists) that is not itself public.
Draft projects (never published) are deleted immediately.
Deleted account data is fully purged within a defined maximum window (e.g. 30 days), not merely soft-deleted. The purge window is disclosed in the privacy notice.
A deletion-in-progress state is visible to the user while processing completes.

Level 2 — Propagation to third-party controllers (GDPR Art. 17(2)):

GDPR Article 17(2) requires that where a controller has made personal data public, it must take reasonable steps to inform other controllers processing that data of the erasure request. For FeDIY, this means federated instances.
FeDIY fulfils this via ActivityPub Delete activities sent to all known federation peers that have received activities from this actor.
Delete is sent for: the actor object itself and all known published objects (projects, activities).
Delivery is best-effort with retry. FeDIY cannot compel remote instances to comply — this is a known limitation of the federated model, and the privacy notice must state it clearly.
A log of which peers received the Delete activity (and delivery status) is retained for the operator's accountability records. This log is separate from the user's personal data and may be kept for legal compliance.
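
For orientation, the Delete activities described above might be built like this. The function names are hypothetical and the JSON is hand-assembled only to keep the sketch dependency-free; a real activity would also carry an `id` and addressing fields and would be produced by a JSON library, then signed and queued per peer inbox.

```rust
/// Encodes a string as a JSON string literal (quotes, backslashes, control characters).
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Builds a minimal ActivityStreams Delete activity for one object.
pub fn delete_activity(actor: &str, object: &str) -> String {
    format!(
        r#"{{"@context":"https://www.w3.org/ns/activitystreams","type":"Delete","actor":{},"object":{}}}"#,
        json_string(actor),
        json_string(object)
    )
}

/// One Delete per published object, followed by a Delete for the actor itself.
/// The ordering (objects first, actor last) is an assumption of this sketch.
pub fn deletes_for_account(actor: &str, published: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = published
        .iter()
        .map(|object| delete_activity(actor, object))
        .collect();
    out.push(delete_activity(actor, actor));
    out
}
```

Each generated activity would be paired with a per-peer delivery record so that the accountability log mentioned above can show which peers acknowledged it.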

Tombstoning vs. full erasure of public content:

Publicly published content that is referenced by other objects (e.g. a project that has been boosted or commented on by other users) may be replaced by an ActivityPub Tombstone rather than silently deleted. The Tombstone preserves object identity without reproducing the content or the author's personal data.
The privacy notice and the deletion UI must clearly explain: (a) the distinction between draft/private data (deleted) and published content (tombstoned), and (b) the federation propagation limitation.
A user who wants all traces removed must be told honestly what FeDIY can and cannot control.

Permitted retentions (GDPR Art. 17(3) exceptions):

Moderation and safety records may be retained in anonymised form where necessary for legal compliance (e.g. CSAM reporting obligations, abuse records). These are anonymised at deletion time: the personal identifiers are removed but the safety record is kept.
Legal hold: if an account is under active investigation or legal proceedings, deletion may be suspended for the duration. The user must be informed of the hold.
Financial transaction records (where applicable) must comply with applicable tax and accounting retention laws, which may be longer than the general data window.

RTBF for individual content items (not full account deletion):

A user may request deletion of specific published content (a project, a comment) without deleting their entire account.
The same tombstone/propagation logic applies per object.
This supports the GDPR right to withdraw consent for a specific published item without requiring full account closure.

Multi-Jurisdiction Compliance Matrix

FeDIY is an open, self-hostable, federated platform. Instance operators may be subject to different legal regimes depending on where they operate and where their users are located. The architecture must support compliance across the primary regulatory regimes; instance operators are responsible for customising behaviour to their jurisdiction.

Jurisdiction	Law	Erasure/RTBF right	Deletion SLA	Notes
EU / EEA	GDPR Art. 17	Yes — right to erasure + propagation	Without undue delay; typically ≤30 days	Includes Art. 17(2) propagation obligation
UK	UK GDPR (post-Brexit)	Yes — equivalent to EU GDPR	Same as EU GDPR	Applies to UK users/controllers
California, US	CCPA/CPRA	Yes — right to delete	45 days (extendable to 90)	Opt-out of sale/sharing; right to correct
Virginia, US	VCDPA	Yes — right to delete	45 days	Opt-out of targeted advertising
Colorado, US	CPA	Yes	45 days	Global Privacy Control must be honored
Connecticut, US	CTDPA	Yes	45 days
Texas, US	TDPSA	Yes	45 days	Applies broadly to controllers
Other US states	Various (and growing)	Varies	Typically 45–90 days	Architecture must be jurisdiction-agnostic
Brazil	LGPD Art. 18	Yes — right to deletion	Reasonable time	Applies to data of Brazilian residents
Canada	PIPEDA / C-27	Limited currently; C-27 will add explicit right	Bill C-27 pending	Design for upcoming CPPA
Australia	Privacy Act 1988 (AU)	Limited; reform ongoing	Varies	Design to accommodate

Architectural implications:

Jurisdiction-agnostic deletion workflow: the erasure workflow is the same regardless of the user's jurisdiction; the operator configures any jurisdiction-specific behaviour (SLA, exceptions, notices) via instance settings.
Configurable SLA timers: instance operators can set the deletion SLA window (e.g. 30 days for GDPR, 45 days for CCPA) via configuration. The default should satisfy the strictest common requirement.
Opt-out signals: the architecture must support Global Privacy Control (GPC) header signals, which California (and others) require businesses to honor. The API and UI must process GPC as an opt-out of data sale/sharing.
No data sale: FeDIY does not sell user data; this simplifies compliance but opt-out infrastructure may still be needed for operators who use analytics or advertising services.
Right to correct: both GDPR (Art. 16) and CCPA/CPRA provide this right. Self-service profile and content editing satisfies it.
Response time tracking: the system must be capable of recording when a deletion request was received, what was deleted, and when the purge completed, to support operator accountability obligations under multiple laws.
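
The configurable SLA timer and response time tracking could be modelled as below. The type and field names are hypothetical, and the 30-day default reflects the "strictest common requirement" mentioned above rather than a settled value.

```rust
use std::time::{Duration, SystemTime};

const DAY: Duration = Duration::from_secs(24 * 60 * 60);

/// Operator-configurable erasure settings.
pub struct ErasureConfig {
    pub sla: Duration,
}

impl Default for ErasureConfig {
    /// Defaults to 30 days, the strictest window among those listed above.
    fn default() -> Self {
        ErasureConfig { sla: DAY * 30 }
    }
}

/// Accountability record for one deletion request.
pub struct DeletionRequest {
    pub received_at: SystemTime,
    /// Data categories purged so far (category names are illustrative).
    pub deleted_categories: Vec<String>,
    pub purged_at: Option<SystemTime>,
}

impl DeletionRequest {
    /// Latest moment by which the purge must have completed.
    pub fn deadline(&self, cfg: &ErasureConfig) -> SystemTime {
        self.received_at + cfg.sla
    }

    /// True if the purge finished late, or is still open past the deadline.
    pub fn is_overdue(&self, cfg: &ErasureConfig, now: SystemTime) -> bool {
        match self.purged_at {
            Some(done) => done > self.deadline(cfg),
            None => now > self.deadline(cfg),
        }
    }
}
```

An operator subject only to a 45-day regime could raise `sla`, while the record itself stays the same across jurisdictions.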

Right to Rectification (Article 16)

Users can correct or update their account profile data at any time without requiring moderator involvement.
Updates to personal data propagate to federated instances via ActivityPub Update activities.

Right to Data Portability (Article 20)

The data export (see Right to Access) is in a portable, interoperable format.
Where possible, exported project data can be imported into another FeDIY instance or compatible platform.

Data Retention Limits

A configurable retention policy governs how long the following categories are kept:

Authentication logs and failed login records: short retention (e.g. 30–90 days) unless a security incident requires longer.
Session tokens: purged on logout and on expiry.
IP address logs: not stored beyond what is operationally required; never logged against content in a way that persists indefinitely.
Deleted account data: fully purged within a defined window after the deletion request is processed (e.g. 30 days), except for legally required anonymised moderation records.

Instance operators must be able to configure retention windows to meet their local legal obligations.

Federated Data and Third-Party Instances

When a user deletes their account, a best-effort Delete activity is sent to known federated peers. FeDIY cannot compel remote instances to comply.
The privacy notice must clearly explain that content shared via federation may persist on remote instances beyond FeDIY's control.
FeDIY does not store personal data about remote users beyond what is strictly required to process incoming activities.

Children's Privacy

FeDIY does not knowingly collect personal data from children under the age of 13 (or the applicable age in the user's jurisdiction).
Age verification approach (self-declaration, parental consent, or age-gate) must be defined before Phase 1 launch.

Privacy Notice

A machine-readable and human-readable privacy notice must be published at a well-known URL before any public instance accepts registrations.
The notice describes: what data is collected, why, how long it is retained, how to exercise rights, and contact information for the data controller.
Instance operators are data controllers for their own instances. FeDIY provides a template notice but operators are responsible for customising it to their jurisdiction.

Architecture Decision Practice

Decisions are captured as ADRs in docs/adrs/.
Each ADR includes: context, options considered, decision, and consequences.
See ADR 0001: API-First with Bundled Web UI.
