0058: User Domain Model and PII Strategy
Date: 2025-12-21
Status: Accepted
Context
The user-directory-service is responsible for managing the lifecycle of users across multiple Identity Providers (IdPs). A critical architectural challenge is defining the local data model for a "User" and determining how much Personally Identifiable Information (PII) should be stored within the Citadel platform versus remaining exclusively in the upstream IdP.
Storing PII (Name, Email, Phone) locally creates:
- Compliance Liability: Increases the scope for GDPR/CCPA compliance.
- Synchronization Complexity: Requires keeping local data in sync with the IdP (the source of truth).
However, not storing any user data locally creates:
- Performance Issues: Listing users requires querying the IdP API, which is often slow and rate-limited.
- Referential Integrity Issues: Other services (e.g., book-keeper) need a stable foreign key (created_by_user_id) to reference users. IdP identifiers (sub) can change if the IdP is migrated or if the user is re-created.
- Join Complexity: It becomes impossible to perform SQL joins between business entities (e.g., Invoices) and Users.
Decision
We will adopt the Hybrid / Pointer Model (Option C) for the User Domain.
- Internal "Pointer" Entity: The user-directory-service will maintain a local User entity that acts as a stable pointer to the external identity.
  - id (UUID): The internal, immutable primary key used by all other Citadel services.
  - external_id (String): The unique identifier from the IdP (e.g., sub, oid, uid). This is configurable per adapter.
  - idp_id (String): The routing key identifying which IdP adapter owns this user (e.g., keycloak-default, rauth-staging).
  - tenant_id (UUID): The tenant context for this user.
- Pragmatic PII Caching: We will operate in Pragmatic Mode. We will store minimal PII (specifically email and full_name) in the local database.
  - Purpose: This is strictly a read-only cache to enable performant UI displays (e.g., "Created by John Doe") and efficient searching/filtering within the Admin Portal without hammering the IdP API.
  - Source of Truth: The IdP remains the absolute source of truth. Authentication and profile updates must happen at the IdP.
  - Synchronization: The cache is updated during login (via token claims) or via webhooks from the IdP.
- Configurable Subject Claim: The specific claim used to populate external_id must be configurable per adapter, as different IdPs use different fields (e.g., Auth0 uses sub, Azure AD uses oid).
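The pointer entity, the login-time cache refresh, and the per-adapter subject claim can be illustrated together. The following is a minimal sketch, not the service's actual implementation: AdapterConfig, UserDirectory, upsert_from_claims, the email_claim and name_claim settings, and the in-memory dictionary standing in for the database are all hypothetical names introduced for illustration. It also assumes that the pair (idp_id, external_id) uniquely identifies a user.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional, Tuple


@dataclass
class User:
    """Local pointer entity; field names follow the decision text."""
    external_id: str                  # identifier issued by the IdP
    idp_id: str                       # routing key of the owning adapter
    tenant_id: uuid.UUID
    email: Optional[str] = None       # cached PII, for display and search only
    full_name: Optional[str] = None   # cached PII, for display and search only
    id: uuid.UUID = field(default_factory=uuid.uuid4)  # internal key, assigned once


@dataclass(frozen=True)
class AdapterConfig:
    """Hypothetical per-adapter settings (names are assumptions)."""
    idp_id: str
    subject_claim: str = "sub"   # e.g. "oid" for an Azure AD adapter
    email_claim: str = "email"
    name_claim: str = "name"


class UserDirectory:
    """In-memory stand-in for the user table (assumption for this sketch)."""

    def __init__(self) -> None:
        self._by_external: Dict[Tuple[str, str], User] = {}

    def upsert_from_claims(
        self, cfg: AdapterConfig, tenant_id: uuid.UUID, claims: Mapping[str, str]
    ) -> User:
        # The claim that feeds external_id is chosen by the adapter config.
        external_id = claims[cfg.subject_claim]
        key = (cfg.idp_id, external_id)
        user = self._by_external.get(key)
        if user is None:
            # First login: create the pointer; its internal id is generated here.
            user = User(external_id=external_id, idp_id=cfg.idp_id, tenant_id=tenant_id)
            self._by_external[key] = user
        # Refresh the cached PII; keep the old value when a claim is absent.
        user.email = claims.get(cfg.email_claim, user.email)
        user.full_name = claims.get(cfg.name_claim, user.full_name)
        return user
```

In this sketch a second login with changed claims updates email and full_name but returns the same internal id, which is the property downstream foreign keys depend on. A webhook handler could call the same method with the payload mapped into a claims-like dictionary.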
Consequences
Positive
- Stable References: Downstream services can rely on a stable, internal UUID (user_id) that never changes, even if the upstream IdP is swapped.
- Performance: "List Users" screens in the Admin UI can be served directly from the local database with pagination and filtering, avoiding slow IdP API calls.
- SQL Joins: Enables efficient queries like "Show all invoices created by users with email domain @acme.com".
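The email-domain query mentioned under SQL Joins might look like the sketch below. It uses an in-memory SQLite database purely for illustration. The table layout, the amount_cents column, the sample rows, and the invoices_by_email_domain helper are assumptions, and the sketch further assumes that the users and invoices tables are reachable from the same database connection.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id        TEXT PRIMARY KEY,   -- internal UUID, stored as text
        email     TEXT,               -- cached PII
        full_name TEXT                -- cached PII
    );
    CREATE TABLE invoices (
        id                 INTEGER PRIMARY KEY,
        amount_cents       INTEGER NOT NULL,
        created_by_user_id TEXT NOT NULL REFERENCES users(id)
    );
    """
)

# Illustrative sample data only.
alice, bob = str(uuid.uuid4()), str(uuid.uuid4())
conn.executemany(
    "INSERT INTO users (id, email, full_name) VALUES (?, ?, ?)",
    [
        (alice, "alice@acme.com", "Alice Example"),
        (bob, "bob@other.example", "Bob Example"),
    ],
)
conn.executemany(
    "INSERT INTO invoices (amount_cents, created_by_user_id) VALUES (?, ?)",
    [(1200, alice), (450, bob), (9900, alice)],
)


def invoices_by_email_domain(db: sqlite3.Connection, domain: str):
    """Return (invoice id, amount, creator name) for creators in the domain."""
    # Caller is expected to pass a plain domain; LIKE wildcards are not escaped.
    pattern = "%@" + domain
    return db.execute(
        """
        SELECT i.id, i.amount_cents, u.full_name
        FROM invoices AS i
        JOIN users AS u ON u.id = i.created_by_user_id
        WHERE u.email LIKE ?
        ORDER BY i.id
        """,
        (pattern,),
    ).fetchall()


rows = invoices_by_email_domain(conn, "acme.com")
```

Because the join key is the internal id rather than the IdP's identifier, a query of this shape would keep working after an IdP migration, provided the cached email column has been refreshed.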
Negative
- Data Duplication: We are duplicating PII, which requires strict access controls on the user-directory-service database.
- Sync Latency: The local cache might be slightly stale if a user updates their profile in the IdP and the webhook fails or hasn't arrived yet. This is an acceptable trade-off for display purposes.