feat: Enhance architecture and roadmap documentation with material extensibility and persistence layer details

2026-05-23 17:40:20 -05:00
parent 941a9da928
commit 55d9db5c6c
3 changed files with 110 additions and 10 deletions
@@ -37,6 +37,8 @@ Current product definition:
 - The project model is composable: a minimal core plus optional domain-specific extensions.
 - Domain-specific detail (for example knitting patterns/yarns, 3D print profiles/STLs, electronics BoM data) should be representable without being mandatory for all instances.
 - First-party FeDIY focuses on a stable extension mechanism rather than implementing every niche schema directly.
+- Materials are also an extensible entity: the core material record captures display name, quantity, and unit; domain-specific attributes (yarn weight, fibre content, filament diameter/material, wood species/grade, electronics component value/package) are carried in extension payloads on the material entry, using the same extension mechanism as project-level extensions.
+- A federated material catalog is a long-term goal: community-defined material types and shared taxonomy entries could be federated as ActivityPub objects, allowing instances to reference a common vocabulary without requiring a central authority.

 ## Client and Front-End Strategy

@@ -84,6 +86,45 @@ Bundled web UI:
 - Preserve source metadata for remote content and actor provenance.
 - Track object lifecycle states to support idempotent federation processing.
 - Persist project data as core fields plus extension payloads so instances can tailor domain detail without fragmenting the base model.
+- Material entries within a project carry the same extension payload structure as the project itself; domain-specific material attributes are co-located with the material entry rather than encoded in project-level fields.
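The material-entry shape described in this list could be modeled roughly as follows. This is a minimal sketch, not the project's schema: the type names, the extension identifier `knitting.yarn`, and the use of string-to-string maps in place of JSON payloads are all assumptions made for illustration. The three core fields (display name, quantity, unit) come from the product definition.

```rust
use std::collections::BTreeMap;

/// Hypothetical payload type: attribute name -> value.
/// A real implementation would more likely hold a JSON value.
pub type ExtensionPayload = BTreeMap<String, String>;

/// Core material record plus optional, co-located extension payloads.
#[derive(Debug, Clone, PartialEq)]
pub struct Material {
    pub display_name: String,
    pub quantity: f64,
    pub unit: String,
    /// Keyed by a hypothetical extension identifier, e.g. "knitting.yarn".
    pub extensions: BTreeMap<String, ExtensionPayload>,
}

impl Material {
    /// Builds a material with core fields only; no extension is mandatory.
    pub fn new(display_name: &str, quantity: f64, unit: &str) -> Self {
        Material {
            display_name: display_name.to_string(),
            quantity,
            unit: unit.to_string(),
            extensions: BTreeMap::new(),
        }
    }

    /// Attaches (or replaces) one domain-specific payload on this entry.
    pub fn with_extension(mut self, extension_id: &str, attrs: &[(&str, &str)]) -> Self {
        let payload: ExtensionPayload = attrs
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect();
        self.extensions.insert(extension_id.to_string(), payload);
        self
    }

    /// Looks up one attribute; returns None when the extension or
    /// the attribute is absent.
    pub fn extension_attr(&self, extension_id: &str, attr: &str) -> Option<&str> {
        self.extensions.get(extension_id)?.get(attr).map(String::as_str)
    }
}
```

A material without any extension stays valid, which mirrors the requirement that domain detail is representable without being mandatory.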
+
+## Persistence Layer Architecture
+
+### Database
+
+PostgreSQL is the primary persistence target.
+
+Reasons:
+
+- **JSONB**: ActivityPub objects and extension payloads are stored in JSONB columns and queried with PostgreSQL's JSONB operators. Extension payloads can be queried, indexed (for example with GIN indexes), and constrained without an external document store.
+- **Native full-text search**: PostgreSQL's built-in FTS with `tsvector`/`tsquery` removes the need for an external search service in Phase 1. Language-specific text search configurations (stemming, stop words) can be chosen per column or per query.
+- **Transactional consistency under federation load**: federation fan-in (many incoming AP activities from many peers) involves concurrent writes. PostgreSQL's MVCC concurrency model and row-level locking handle this safely. SQLite's single-writer model would be a bottleneck under the same load.
+- **Broad managed hosting support**: PostgreSQL is offered as a managed service by the major cloud platforms and many hosting providers, which keeps operational effort low and lowers the barrier for instance operators.
+
+SQLite is **not in scope for Phase 1** but is explicitly not ruled out as a future lightweight self-hosting option (see Repository Abstraction below).
+
+### Repository Abstraction
+
+No business logic or domain service code queries the database directly. All persistence operations go through a repository interface layer:
+
+- Each domain aggregate (project, account, actor, moderation record, etc.) has a corresponding repository interface defined as a trait.
+- The domain service layer depends only on those traits, not on any database library types.
+- The PostgreSQL implementation of each trait is the only first-party implementation.
+- A SQLite implementation could be added in a future phase by implementing the same traits with SQLite-dialect queries; the intent is that domain logic and API handlers would not need to change.
+- The repository layer is also the natural seam for test doubles: domain logic tests can use an in-memory implementation of the traits without a running database.
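The repository seam described above could be sketched as follows. This is a minimal illustration under stated assumptions: `Project`, `RepoError`, `ProjectRepository`, `InMemoryProjectRepository`, and `rename_project` are hypothetical names, and the trait is synchronous to keep the example dependency-free, whereas the real traits would be async per the Query Library constraints.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical, heavily reduced project aggregate.
#[derive(Debug, Clone, PartialEq)]
pub struct Project {
    pub id: u64,
    pub title: String,
}

/// Hypothetical error type shared by all repository implementations.
#[derive(Debug, PartialEq)]
pub enum RepoError {
    NotFound,
}

/// Repository interface: the only persistence surface domain code sees.
pub trait ProjectRepository {
    fn save(&self, project: Project) -> Result<(), RepoError>;
    fn find_by_id(&self, id: u64) -> Result<Project, RepoError>;
}

/// In-memory test double; a PostgreSQL-backed type would implement
/// the same trait.
#[derive(Default)]
pub struct InMemoryProjectRepository {
    rows: Mutex<HashMap<u64, Project>>,
}

impl ProjectRepository for InMemoryProjectRepository {
    fn save(&self, project: Project) -> Result<(), RepoError> {
        self.rows.lock().unwrap().insert(project.id, project);
        Ok(())
    }

    fn find_by_id(&self, id: u64) -> Result<Project, RepoError> {
        self.rows
            .lock()
            .unwrap()
            .get(&id)
            .cloned()
            .ok_or(RepoError::NotFound)
    }
}

/// Domain service function: depends on the trait, not on any database type.
pub fn rename_project(
    repo: &dyn ProjectRepository,
    id: u64,
    new_title: &str,
) -> Result<Project, RepoError> {
    let mut project = repo.find_by_id(id)?;
    project.title = new_title.to_string();
    repo.save(project.clone())?;
    Ok(project)
}
```

Because `rename_project` only sees `&dyn ProjectRepository`, a domain test can pass the in-memory double and never start a database.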
+
+### Query Library
+
+The query library choice is deferred to the implementation ADR, but the constraints are:
+
+- Must support async execution.
+- Must support both PostgreSQL and SQLite dialects (to keep the SQLite future option open).
+- Compile-time query checking is strongly preferred to catch SQL errors before runtime.
+- `sqlx` satisfies all three constraints and is the expected choice, but the final decision will be recorded in the ADR.
+
+### Migration Strategy
+
+See Q28. Database migrations are run as part of application startup (or a separate migrate subcommand) and are versioned, idempotent, and checked into source control alongside the schema they produce.

 ## Federation Strategy

@@ -92,6 +133,41 @@ Bundled web UI:
 - Use replay-safe request validation and deterministic retry behavior.
 - Maintain an interop test matrix for protocol behaviors.

+## Content Integrity
+
+Certain categories of content are prohibited on any FeDIY instance regardless of operator configuration. These are not moderation policy — they are non-negotiable constraints enforced at the software level. The guiding principle is **consent**: the prohibited categories share the property that no legitimate consent to the content's creation or publication can exist.
+
+### Hardcoded Prohibitions
+
+| Category | Rationale | Enforcement approach |
+|---|---|---|
+| Child Sexual Abuse Material (CSAM) | Minors cannot consent to sexual content; production is abuse | Perceptual hash-matching against NCMEC hash database on every upload |
+| Non-Consensual Intimate Imagery (NCII) | Subject has not consented to distribution | Hash-matching against StopNCII or equivalent database on every upload |
+| Doxxing | Individual has not consented to publication of their private identifying information | Upload-time pattern detection (phone, address, government ID formats) as a signal; mandatory human-review flagging pipeline; rapid takedown tooling |
+
+### What Is Not Hardcoded
+
+Weapons, violence, and dual-use content are **not** hardcoded prohibitions. Legitimate DIY projects (fireworks, blacksmithing, blade-smithing, pyrotechnics, casting) can be indistinguishable from genuinely harmful content at the software level. This category is handled by operator content policy and community moderation tools, not by the platform software.
+
+### Hash-Matching Infrastructure
+
+- All media uploaded to a FeDIY instance is processed through the hash-matching pipeline **before** storage is confirmed. A match results in rejection of the upload and triggers the reporting workflow.
+- Hash databases are not bundled with the software. Operators must configure the integration (for example Microsoft PhotoDNA, NCMEC hash lists, or an equivalent) before media upload is enabled. The application refuses to enable media upload without a configured hash-matching endpoint.
+- Hashes are computed locally on the server; media content is never transmitted to a third-party hash service. Only the computed hash is compared against the configured hash source.
+- FeDIY provides clear integration documentation and a test mode for operators to verify their configuration before going live.
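The two gating rules in this list (no upload without a configured matcher, and no storage before the hash check passes) could be sketched as below. All names here (`HashMatcher`, `MediaStore`, `enable_media_upload`, `UploadError`) are hypothetical, and the matcher is an abstract trait: the sketch does not model any real hash service API or the reporting workflow itself, only a counter standing in for it.

```rust
/// Hypothetical abstraction over a configured hash-matching integration.
pub trait HashMatcher {
    /// Returns true when the locally computed hash of `media`
    /// matches a known entry.
    fn matches_known_hash(&self, media: &[u8]) -> bool;
}

#[derive(Debug, PartialEq)]
pub enum UploadError {
    /// Media upload cannot be enabled without a matcher.
    HashMatchingNotConfigured,
    /// The upload matched and was never stored.
    RejectedByHashMatch,
}

/// Media storage that can only exist once a matcher is configured.
pub struct MediaStore<M: HashMatcher> {
    matcher: M,
    stored: Vec<Vec<u8>>,
    pending_reports: usize,
}

/// Refuses to construct the store when no matcher is configured.
pub fn enable_media_upload<M: HashMatcher>(
    matcher: Option<M>,
) -> Result<MediaStore<M>, UploadError> {
    match matcher {
        Some(matcher) => Ok(MediaStore {
            matcher,
            stored: Vec::new(),
            pending_reports: 0,
        }),
        None => Err(UploadError::HashMatchingNotConfigured),
    }
}

impl<M: HashMatcher> MediaStore<M> {
    /// Runs the hash check before storage is confirmed.
    /// Returns the index of the stored item on success.
    pub fn upload(&mut self, media: Vec<u8>) -> Result<usize, UploadError> {
        if self.matcher.matches_known_hash(&media) {
            // Stand-in for triggering the reporting workflow.
            self.pending_reports += 1;
            return Err(UploadError::RejectedByHashMatch);
        }
        self.stored.push(media);
        Ok(self.stored.len() - 1)
    }

    pub fn stored_count(&self) -> usize {
        self.stored.len()
    }

    pub fn pending_reports(&self) -> usize {
        self.pending_reports
    }
}
```

Making the matcher a constructor requirement means the "refuses to enable media upload" rule is enforced by the type rather than by a runtime flag that could be skipped.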
+
+### Doxxing Detection
+
+- Upload-time and submission-time scanning checks text content for patterns consistent with personal identifying information: phone number formats, postal address patterns, government ID number patterns (country-configurable), and combinations that together identify an individual.
+- Pattern matching is a signal, not a block: false positives (a project step referencing a phone socket) must not prevent legitimate content. Matched content is flagged for moderator review, not auto-rejected.
+- Before registrations are opened, every instance must have at least one active moderator account to receive flagged-content alerts.
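The "signal, not a block" rule above could look like the following sketch. It is deliberately crude and rests on assumptions: the names `Signal`, `ReviewDecision`, `scan_for_phone_numbers`, and `classify` are hypothetical, only phone-like digit runs are detected, and the 10-digit threshold is an arbitrary illustrative value. The point is the shape of the result: content is either clean or flagged for review, and there is no rejection outcome.

```rust
/// Hypothetical threshold: digit runs shorter than this are ignored.
const MIN_PHONE_DIGITS: usize = 10;

#[derive(Debug, PartialEq)]
pub enum Signal {
    PossiblePhoneNumber(String),
}

/// There is intentionally no "Reject" variant: a pattern match only
/// routes content to a moderator.
#[derive(Debug, PartialEq)]
pub enum ReviewDecision {
    Clean,
    FlagForReview(Vec<Signal>),
}

/// Collects runs of phone-like characters that contain enough digits.
pub fn scan_for_phone_numbers(text: &str) -> Vec<Signal> {
    let mut signals = Vec::new();
    let mut run = String::new();
    let mut digits = 0usize;
    // The trailing newline acts as a sentinel that flushes the last run.
    for ch in text.chars().chain(std::iter::once('\n')) {
        if ch.is_ascii_digit() || matches!(ch, ' ' | '-' | '(' | ')' | '+' | '.') {
            if ch.is_ascii_digit() {
                digits += 1;
            }
            run.push(ch);
        } else {
            if digits >= MIN_PHONE_DIGITS {
                signals.push(Signal::PossiblePhoneNumber(run.trim().to_string()));
            }
            run.clear();
            digits = 0;
        }
    }
    signals
}

/// Turns raw signals into a moderation decision without ever blocking.
pub fn classify(text: &str) -> ReviewDecision {
    let signals = scan_for_phone_numbers(text);
    if signals.is_empty() {
        ReviewDecision::Clean
    } else {
        ReviewDecision::FlagForReview(signals)
    }
}
```

A heuristic this simple will produce false positives (for example a long row of measurements), which is exactly why the outcome is a review flag rather than a rejection.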
+
+### Reporting and Legal Obligations
+
+- When a CSAM match is confirmed by a moderator, the operator is required to report to the relevant national authority (NCMEC CyberTipline in the US, IWF in the UK, etc.). FeDIY provides a reporting workflow and documentation; the legal obligation rests with the instance operator as the data controller and platform host.
+- Confirmed NCII matches follow the StopNCII (or similar) removal workflow; the operator notifies the subject where possible.
+- The platform stores a minimal, anonymised record of confirmed violations and reports for the operator's legal compliance purposes.
+
 ## Moderation and Safety Strategy

 ### Instance and Moderator-Level Controls