V0.2 tenant workspace

Provider reliability

Provider Health

A monitoring room for payment rails, identity, messaging, webhooks, APIs, database health, and the recovery actions that keep operators from guessing.

Operational

5

Provider surfaces that are currently healthy enough to rely on.

Degraded

3

Providers with visible warnings or incomplete readiness.

Blocked

1

Provider events or systems that should block automation.

Signal

63%

Average provider confidence across signals.

payments

2 signals · Blocked: 0 · Watch: 2 · Signal: 58%

Inspect provider logs, queues, deployment context, and recent user-facing failures

webhook

3 signals · Blocked: 1 · Watch: 0 · Signal: 47%

Verify idempotency, payload hash, retry policy, and owner alerting

api

4 signals · Blocked: 0 · Watch: 1 · Signal: 71%

Keep monitoring cadence and evidence freshness

identity

1 signal · Blocked: 0 · Watch: 0 · Signal: 92%

Deepen role-aware session usage across more surfaces

messaging

1 signal · Blocked: 0 · Watch: 0 · Signal: 32%

Use for recovery, lead follow-up, and event comms

reporting

0 signals · Blocked: 0 · Watch: 0 · Signal: 0%

No provider in this lane yet

infrastructure

1 signal · Blocked: 0 · Watch: 0 · Signal: 95%

Keep monitoring cadence and evidence freshness
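
The lane figures above appear to be rollups of the individual entries in the provider signal queue. The following is a minimal sketch of that rollup, not the product's actual implementation. It assumes that "Watch" counts degraded signals, that lane confidence is the mean score rounded half up, and that planned signals count toward neither Blocked nor Watch; the names `Signal`, `lane_rollup`, and `average_score` are hypothetical. The scores and statuses are copied from the queue on this page.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str
    lane: str    # e.g. "payments", "webhook", "api"
    status: str  # "operational", "degraded", "planned", or "blocked"
    score: int   # signal confidence, 0-100


# Values copied from the provider signal queue on this page.
SIGNALS = [
    Signal("Postgres database", "infrastructure", "operational", 95),
    Signal("AI decision log", "api", "operational", 95),
    Signal("Better Auth", "identity", "operational", 92),
    Signal("Health endpoint", "api", "operational", 90),
    Signal("booking.changed", "webhook", "operational", 86),
    Signal("Members API", "api", "degraded", 63),
    Signal("Payment webhooks", "payments", "degraded", 61),
    Signal("Payment rails", "payments", "degraded", 54),
    Signal("payment.failed", "webhook", "planned", 38),
    Signal("Bookings API", "api", "planned", 35),
    Signal("Resend", "messaging", "planned", 32),
    Signal("ai.decision.logged", "webhook", "blocked", 18),
]


def average_score(signals) -> int:
    """Mean score rounded half up (assumed rounding rule); 0 for an empty lane."""
    signals = list(signals)
    if not signals:
        return 0
    mean = sum(s.score for s in signals) / len(signals)
    return math.floor(mean + 0.5)


def lane_rollup(signals) -> dict:
    """Group signals by lane and compute the per-lane tile values."""
    lanes = defaultdict(list)
    for s in signals:
        lanes[s.lane].append(s)
    return {
        lane: {
            "signals": len(items),
            "blocked": sum(1 for s in items if s.status == "blocked"),
            # Assumption: "Watch" counts degraded signals only.
            "watch": sum(1 for s in items if s.status == "degraded"),
            "confidence": average_score(items),
        }
        for lane, items in lanes.items()
    }
```

Under these assumptions the sketch reproduces the tiles shown here: `lane_rollup(SIGNALS)["payments"]` gives 2 signals, 0 blocked, 2 watch, and 58% confidence, and `average_score(SIGNALS)` gives the headline 63.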

Provider signal queue

Ranked by signal strength so weak integrations, failed webhooks, and incomplete contracts become visible before members feel them.

Postgres database

infrastructure · Owner: Platform

Status: operational · Signal: 95% · Last signal: Just now

Core product read/write availability

Recovery action

Keep monitoring cadence and evidence freshness

Escalation

Check health endpoint

Compare recent deploy and provider status

Open incident timeline

Communicate owner-safe status

Evidence

System: Postgres database

Status: healthy

Core read/write path is available, which keeps the operational app usable.


AI decision log

api · Owner: Operations

Status: operational · Signal: 95% · Last signal: Today

Automation trust, review safety, and owner visibility

Recovery action

Keep monitoring cadence and evidence freshness

Escalation

Check health endpoint

Compare recent deploy and provider status

Open incident timeline

Communicate owner-safe status

Evidence

System: AI decision log

Status: healthy

AI surfaces need logging and reasoning visibility to stay trustworthy as automation grows.


Better Auth

identity · Owner: Platform

Status: operational · Signal: 92% · Last signal: Connection expected to be usable

Login, staff access, family switching, and audit attribution

Recovery action

Deepen role-aware session usage across more surfaces

Escalation

Confirm provider owner

Check secrets/config

Review recent webhook/API failures

Create owner-visible incident if user-facing

Evidence

Category: identity

Status: live

The identity foundation is there; the next lift is richer role and household behavior on top of it.


Health endpoint

api · Owner: Platform

Status: operational · Signal: 90% · Last signal: Live API surface

Consumer: Coolify and operations

Recovery action

Run health smoke and inspect contract drift

Escalation

Identify consumer

Validate auth scope

Smoke route/contract

Document breaking-change risk

Evidence

Consumer: Coolify and operations

Status: live

The existing health surface is the start of a broader platform operations API.


booking.changed

webhook · Owner: Operations

Status: operational · Signal: 86% · Last signal: Delivery active

Destination: Waitlist and member timeline

Recovery action

Verify idempotency, payload hash, retry policy, and owner alerting

Escalation

Inspect latest delivery

Check idempotency key

Replay safe payload

Escalate if timeline or money state can drift

Evidence

Destination: Waitlist and member timeline

Status: active

Booking changes are the glue between schedule pressure, member experience, and attendance operations.
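
The recovery action for this webhook asks operators to verify idempotency and the payload hash. The following is a minimal sketch of what such a check could look like, assuming each delivery carries an idempotency key and a JSON payload. `DeliveryLog`, `payload_hash`, and the three return values are hypothetical names for illustration, and the in-memory dictionary stands in for whatever durable store the real system uses.

```python
import hashlib
import json


def payload_hash(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DeliveryLog:
    """In-memory stand-in for a durable delivery table (assumption)."""

    def __init__(self) -> None:
        self._seen = {}  # idempotency key -> payload hash

    def accept(self, idempotency_key: str, payload: dict) -> str:
        """Classify a delivery as 'process', 'duplicate', or 'conflict'."""
        digest = payload_hash(payload)
        previous = self._seen.get(idempotency_key)
        if previous is None:
            # First time this key is seen: record it and process the event.
            self._seen[idempotency_key] = digest
            return "process"
        if previous == digest:
            # Same key, same content: a safe replay that can be skipped.
            return "duplicate"
        # Same key, different content: timeline or money state could drift.
        return "conflict"
```

In this sketch a "conflict" result corresponds to the last escalation step above: the same key arriving with different content is the case where timeline or money state can drift, so it is escalated instead of replayed.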


Members API

api · Owner: Platform

Status: degraded · Signal: 63% · Last signal: Contract documented

Consumer: Migration and integrations

Recovery action

Finish auth scope, examples, audit event, and rate-limit definition

Escalation

Identify consumer

Validate auth scope

Smoke route/contract

Document breaking-change risk

Evidence

Consumer: Migration and integrations

Status: documented

External access should follow the same tenant and audit expectations as internal actions.


Payment webhooks

payments · Owner: Operations

Status: degraded · Signal: 61% · Last signal: 5 minutes ago

Money state, recovery workflows, and member trust

Recovery action

Inspect provider logs, queues, deployment context, and recent user-facing failures

Escalation

Check health endpoint

Compare recent deploy and provider status

Open incident timeline

Communicate owner-safe status

Evidence

System: Payment webhooks

Status: watch

Payment rails should be visible to owners before failed webhooks become mystery billing states.


Payment rails

payments · Owner: Billing

Status: degraded · Signal: 54% · Last signal: Operational decision required

Billing, dunning, refunds, receipts, and owner trust

Recovery action

Decide real retry/recovery ownership across providers

Escalation

Confirm provider owner

Check secrets/config

Review recent webhook/API failures

Create owner-visible incident if user-facing

Evidence

Category: payments

Status: needs attention

Billing depth is where the difference between a convincing shell and a truly operational system becomes obvious.


payment.failed

webhook · Owner: Billing

Status: planned · Signal: 38% · Last signal: Delivery not yet wired

Destination: Recovery workflow

Recovery action

Verify idempotency, payload hash, retry policy, and owner alerting

Escalation

Inspect latest delivery

Check idempotency key

Replay safe payload

Escalate if timeline or money state can drift

Evidence

Destination: Recovery workflow

Status: planned

Failed payments should trigger recovery, audit, and communication context without becoming noisy.


Bookings API

api · Owner: Platform

Status: planned · Signal: 35% · Last signal: Contract planned

Consumer: Member app and partner widgets

Recovery action

Finish auth scope, examples, audit event, and rate-limit definition

Escalation

Identify consumer

Validate auth scope

Smoke route/contract

Document breaking-change risk

Evidence

Consumer: Member app and partner widgets

Status: planned

Booking APIs need capacity, waitlist, family, and cancellation rules baked in from the beginning.


Resend

messaging · Owner: Growth

Status: planned · Signal: 32% · Last signal: Not connected yet

Lead nurture, recovery, reminders, and incident comms

Recovery action

Use for recovery, lead follow-up, and event comms

Escalation

Confirm provider owner

Check secrets/config

Review recent webhook/API failures

Create owner-visible incident if user-facing

Evidence

Category: messaging

Status: planned

Messaging is already becoming central to the product, so a clearer integration surface helps the roadmap feel coherent.


ai.decision.logged

webhook · Owner: AI review

Status: blocked · Signal: 18% · Last signal: Delivery failing

Destination: Audit and owner review

Recovery action

Pause automation, inspect failed payloads, and define retry/dead-letter behavior

Escalation

Inspect latest delivery

Check idempotency key

Replay safe payload

Escalate if timeline or money state can drift

Evidence

Destination: Audit and owner review

Status: failing

AI events need extra visibility because trust breaks quickly when automation feels untraceable.
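
The recovery action for this webhook calls for defined retry and dead-letter behavior. The page does not specify what that behavior should be, so the following is only a sketch of one common shape: a bounded number of attempts with exponential backoff, after which the payload is parked in a dead-letter list for inspection. `deliver_with_retry`, its parameters, and the default values are hypothetical.

```python
import time
from typing import Callable, Dict, List


def deliver_with_retry(
    payload: Dict,
    send: Callable[[Dict], None],
    dead_letters: List[Dict],
    max_attempts: int = 3,       # assumed default, not from the page
    base_delay: float = 0.5,     # assumed default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to deliver a payload; park it in dead_letters if every attempt fails."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            send(payload)
            return True
        except Exception as exc:  # a real handler would catch narrower errors
            last_error = str(exc)
            if attempt < max_attempts:
                # Exponential backoff between attempts: 0.5s, 1s, 2s, ...
                sleep(base_delay * 2 ** (attempt - 1))
    # Out of attempts: keep the payload and the last error for later inspection.
    dead_letters.append(
        {"payload": payload, "attempts": max_attempts, "last_error": last_error}
    )
    return False
```

Passing `sleep` as a parameter keeps the function testable without real delays. Entries in `dead_letters` are what an operator would look at during the "inspect failed payloads" step.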


Priority recovery

The providers that should drive the next operator action.

Members API

Finish auth scope, examples, audit event, and rate-limit definition

Payment webhooks

Inspect provider logs, queues, deployment context, and recent user-facing failures

Payment rails

Decide real retry/recovery ownership across providers

ai.decision.logged

Pause automation, inspect failed payloads, and define retry/dead-letter behavior
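
The four entries above are exactly the degraded and blocked entries of the provider signal queue, in queue order. A minimal sketch of that selection follows, assuming the rule really is "status is degraded or blocked" and that planned providers are excluded. The page does not state the rule, and `priority_recovery` is a hypothetical name. Only a subset of the queue is included to keep the example short.

```python
# (name, status, signal score, recovery action), copied from the queue on this page.
QUEUE = [
    ("Postgres database", "operational", 95,
     "Keep monitoring cadence and evidence freshness"),
    ("Members API", "degraded", 63,
     "Finish auth scope, examples, audit event, and rate-limit definition"),
    ("Payment webhooks", "degraded", 61,
     "Inspect provider logs, queues, deployment context, and recent user-facing failures"),
    ("Payment rails", "degraded", 54,
     "Decide real retry/recovery ownership across providers"),
    ("payment.failed", "planned", 38,
     "Verify idempotency, payload hash, retry policy, and owner alerting"),
    ("ai.decision.logged", "blocked", 18,
     "Pause automation, inspect failed payloads, and define retry/dead-letter behavior"),
]

# Assumption: only these statuses should drive the next operator action.
NEEDS_RECOVERY = {"degraded", "blocked"}


def priority_recovery(queue):
    """Return (name, action) pairs for providers needing recovery, in queue order."""
    return [
        (name, action)
        for name, status, _score, action in queue
        if status in NEEDS_RECOVERY
    ]
```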

Monitoring runbook

The minimum loop for provider health once real providers are wired.

Observe

Capture status, latency, last success, last failure, payload count, queue depth, and provider incident notes.

Alert

Warn owners only when a signal is actionable: billing drift, blocked login, failed delivery, or user-facing outage.

Recover

Replay idempotent events, pause unsafe automations, hold billing sends, and document rollback options.

Close

Attach evidence, timeline, prevention task, and owner-safe explanation before marking healthy.
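
The Alert and Close steps of the runbook above can be expressed as two small guards. This is a sketch under stated assumptions: the signal kinds are the four actionable cases the runbook names, rendered as hypothetical string identifiers, and an incident is modeled as a plain dictionary with hypothetical field names.

```python
# The four actionable cases named in the Alert step, as hypothetical identifiers.
ACTIONABLE_KINDS = frozenset(
    {"billing_drift", "blocked_login", "failed_delivery", "user_facing_outage"}
)

# The four items the Close step requires before marking a provider healthy.
CLOSE_REQUIREMENTS = ("evidence", "timeline", "prevention_task", "owner_explanation")


def should_alert(signal_kind: str) -> bool:
    """Warn owners only for signal kinds the runbook treats as actionable."""
    return signal_kind in ACTIONABLE_KINDS


def missing_before_close(incident: dict) -> list:
    """List the required items that are absent or empty on an incident."""
    return [field for field in CLOSE_REQUIREMENTS if not incident.get(field)]
```

An incident would be marked healthy only when `missing_before_close` returns an empty list. Returning the missing fields, not just a boolean, lets the interface tell the operator what is still outstanding.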

Durable tables next

provider_checks

Provider, check type, status, latency, last error, last success, and cadence.

provider_incidents

Impact, owner, linked provider, timeline, comms, recovery action, and prevention task.

provider_alert_rules

Threshold, severity, notification route, quiet hours, and suppress-until behavior.
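
One possible shape for these three tables is sketched below, using Python's built-in `sqlite3` module so it runs without a database server. The product itself lists Postgres as its database, so types would differ in practice. Column names, types, the `id` keys, and the storage of timestamps as text are all assumptions derived from the field lists above, not a confirmed schema.

```python
import sqlite3

# Assumed schema: one column per field named in the descriptions above.
SCHEMA = """
CREATE TABLE provider_checks (
    id              INTEGER PRIMARY KEY,
    provider        TEXT NOT NULL,
    check_type      TEXT NOT NULL,
    status          TEXT NOT NULL,
    latency_ms      INTEGER,
    last_error      TEXT,
    last_success_at TEXT,              -- ISO 8601 timestamp
    cadence_seconds INTEGER NOT NULL
);

CREATE TABLE provider_incidents (
    id              INTEGER PRIMARY KEY,
    provider        TEXT NOT NULL,     -- linked provider
    impact          TEXT NOT NULL,
    owner           TEXT NOT NULL,
    timeline        TEXT,
    comms           TEXT,
    recovery_action TEXT,
    prevention_task TEXT
);

CREATE TABLE provider_alert_rules (
    id                 INTEGER PRIMARY KEY,
    threshold          INTEGER NOT NULL,
    severity           TEXT NOT NULL,
    notification_route TEXT NOT NULL,
    quiet_hours        TEXT,
    suppress_until     TEXT            -- ISO 8601 timestamp
);
"""


def create_tables(conn: sqlite3.Connection) -> None:
    """Create the three provider health tables on the given connection."""
    conn.executescript(SCHEMA)


# Usage: an in-memory database with one check row. The check type and
# cadence here are illustrative placeholders, not values from the page.
conn = sqlite3.connect(":memory:")
create_tables(conn)
conn.execute(
    "INSERT INTO provider_checks (provider, check_type, status, cadence_seconds) "
    "VALUES (?, ?, ?, ?)",
    ("Postgres database", "health", "operational", 60),
)
```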