Platform Ops
One operating room for integrations, API contracts, webhook delivery, provider health, incidents, and owner-facing reliability.
Blocked: 2
Platform items that need incident-level attention.
Watch: 4
Systems or contracts that deserve review soon.
Planned: 3
Planned contracts and providers not yet live.
Pressure: 45%
Average operational pressure across the platform.
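The summary cards read as aggregates over the reliability queue. Two of them can be checked against it: the 15 severity scores in the queue sum to 677, and 677 / 15 ≈ 45.1, which matches the 45% Pressure card; three items carry the status `planned`, which matches the Planned card. A minimal sketch, assuming Pressure is the rounded mean severity and Planned counts `planned` statuses. The `QueueItem` shape and function names are hypothetical, and the page does not say how Blocked and Watch are derived, so they are left out.

```typescript
// Hypothetical shape of one row in the reliability queue.
interface QueueItem {
  name: string;
  severity: number; // 0-100, as shown on each card
  status?: string; // only set where the card shows "Status:"
}

// Assumption: "Pressure" is the mean severity, rounded to a whole percent.
function pressure(items: QueueItem[]): number {
  if (items.length === 0) return 0;
  const total = items.reduce((sum, item) => sum + item.severity, 0);
  return Math.round(total / items.length);
}

// Assumption: "Planned" counts items whose status is exactly "planned".
function plannedCount(items: QueueItem[]): number {
  return items.filter((item) => item.status === "planned").length;
}

// Severities and statuses as they appear in the reliability queue on this page.
const queue: QueueItem[] = [
  { name: "ai.decision.logged", severity: 95, status: "failing" },
  { name: "Member discomfort during recovery call", severity: 90 },
  { name: "Payment rails", severity: 72, status: "needs attention" },
  { name: "Payment webhooks", severity: 68, status: "watch" },
  { name: "Late pickup confusion after kids' class", severity: 62 },
  { name: "Bookings API", severity: 52, status: "planned" },
  { name: "payment.failed", severity: 48, status: "planned" },
  { name: "Resend", severity: 45, status: "planned" },
  { name: "Members API", severity: 35, status: "documented" },
  { name: "Small mat tear spotted near wall edge", severity: 28 },
  { name: "booking.changed", severity: 22, status: "active" },
  { name: "Better Auth", severity: 18, status: "live" },
  { name: "Postgres database", severity: 15, status: "healthy" },
  { name: "AI decision log", severity: 15, status: "healthy" },
  { name: "Health endpoint", severity: 12, status: "live" },
];

console.log(pressure(queue), plannedCount(queue)); // 45 3
```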
Integrations: 3 records
API surfaces: 3 records
Webhooks: 3 records
Admin health: 3 records
Incidents: 3 records
Reliability queue
Ranked by severity, user impact, and whether a provider or contract can drift.
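The page names three ranking criteria but not how they combine. The order shown below is consistent with severity alone, so one plausible reading is a lexicographic sort with user impact and drift risk as tie-breakers. A sketch under that assumption; the `userFacing` and `canDrift` flags, and the flag values in the sample, are hypothetical.

```typescript
// Hypothetical fields; the page only names the three ranking criteria.
interface RankedItem {
  name: string;
  severity: number; // 0-100
  userFacing: boolean; // does a failure reach members or staff directly?
  canDrift: boolean; // can a provider or contract change underneath us?
}

// Assumed key order: severity first, then user impact, then drift risk, then name.
function compareItems(a: RankedItem, b: RankedItem): number {
  if (a.severity !== b.severity) return b.severity - a.severity;
  if (a.userFacing !== b.userFacing) return a.userFacing ? -1 : 1;
  if (a.canDrift !== b.canDrift) return a.canDrift ? -1 : 1;
  return a.name.localeCompare(b.name);
}

function rankQueue(items: RankedItem[]): RankedItem[] {
  // Copy before sorting so the caller's array is left untouched.
  return [...items].sort(compareItems);
}

const ranked = rankQueue([
  { name: "AI decision log", severity: 15, userFacing: false, canDrift: true },
  { name: "ai.decision.logged", severity: 95, userFacing: false, canDrift: true },
  { name: "Postgres database", severity: 15, userFacing: true, canDrift: false },
]);
console.log(ranked.map((item) => item.name));
```

With equal severities (Postgres database and AI decision log both sit at 15%), the user-facing item ranks first under this assumed ordering.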
ai.decision.logged
webhook · Owner: Owner
Severity: 95% · Runbook: 5 steps
Open incident and retry/dead-letter review
Evidence
Destination: Audit and owner review
Status: failing
AI events need extra visibility because trust breaks quickly when automation feels untraceable.
Runbook
Check latest delivery
Inspect destination
Verify idempotency key
Confirm retry policy
Escalate if failing
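The "Verify idempotency key" step matters because retries deliver the same event more than once. A minimal sketch of an idempotent consumer, assuming each delivery carries an `idempotencyKey` field; the `Delivery` shape and `IdempotentConsumer` class are hypothetical, and the in-memory `Set` stands in for durable storage.

```typescript
// Hypothetical delivery envelope; assumes each event carries an idempotency key.
interface Delivery {
  event: string; // e.g. "ai.decision.logged"
  idempotencyKey: string;
  payload: unknown;
}

type Outcome = "processed" | "duplicate";

// Remembers processed keys so a redelivered event runs its side effect only once.
class IdempotentConsumer {
  // In-memory stand-in; a real store would persist keys (e.g. behind a unique index).
  private readonly seen = new Set<string>();
  private readonly handle: (delivery: Delivery) => void;

  constructor(handle: (delivery: Delivery) => void) {
    this.handle = handle;
  }

  receive(delivery: Delivery): Outcome {
    if (this.seen.has(delivery.idempotencyKey)) return "duplicate";
    this.handle(delivery);
    // Record the key only after the handler succeeds, so a thrown error leaves room for a retry.
    this.seen.add(delivery.idempotencyKey);
    return "processed";
  }
}

const handled: string[] = [];
const consumer = new IdempotentConsumer((d) => handled.push(d.idempotencyKey));
const retried: Delivery = { event: "ai.decision.logged", idempotencyKey: "evt_001", payload: {} };
console.log(consumer.receive(retried), consumer.receive(retried), handled.length);
// processed duplicate 1
```

Because the key is recorded after the handler runs, two concurrent deliveries could both pass the check; a durable store would close that gap with an atomic insert-if-absent.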
Member discomfort during recovery call
incident · Owner: Owner
Severity: 90% · Runbook: 5 steps
Owner review and follow-up plan
Evidence
Area: Billing
Reported: This week
Language and timing need review so the recovery workflow feels supportive rather than abrupt.
Runbook
Confirm impact
Assign owner
Record timeline
Decide communication
Close with prevention note
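All three incident cards share this five-step runbook, ending in a prevention note. A sketch that enforces the listed order and refuses to close without a prevention note; both rules are assumptions read from the step names, and `IncidentProgress`, `completeStep`, and `isClosed` are hypothetical.

```typescript
// The five incident runbook steps, in the order the page lists them.
const INCIDENT_STEPS = [
  "Confirm impact",
  "Assign owner",
  "Record timeline",
  "Decide communication",
  "Close with prevention note",
] as const;

type IncidentStep = (typeof INCIDENT_STEPS)[number];

interface IncidentProgress {
  title: string;
  completed: IncidentStep[];
  preventionNote?: string;
}

// Assumption: steps run strictly in order, and closing needs a non-empty prevention note.
function completeStep(inc: IncidentProgress, step: IncidentStep, preventionNote?: string): IncidentProgress {
  const expected: IncidentStep | undefined = INCIDENT_STEPS[inc.completed.length];
  if (expected === undefined) throw new Error(`"${inc.title}" is already closed`);
  if (step !== expected) throw new Error(`Expected "${expected}", got "${step}"`);
  if (step === "Close with prevention note" && !preventionNote?.trim()) {
    throw new Error("Closing requires a prevention note");
  }
  return {
    ...inc,
    completed: [...inc.completed, step],
    preventionNote: preventionNote ?? inc.preventionNote,
  };
}

function isClosed(inc: IncidentProgress): boolean {
  return inc.completed.length === INCIDENT_STEPS.length;
}

let mat: IncidentProgress = { title: "Small mat tear spotted near wall edge", completed: [] };
for (const step of INCIDENT_STEPS.slice(0, 4)) mat = completeStep(mat, step);
mat = completeStep(mat, "Close with prevention note", "Patch before the weekend seminar block.");
console.log(isClosed(mat)); // true
```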
Payment rails
integration · Owner: Billing
Severity: 72% · Runbook: 4 steps
Decide real retry/recovery ownership across providers
Evidence
Category: payments
Status: needs attention
Billing depth is where the gap between a convincing shell and a truly operational system becomes obvious.
Runbook
Confirm owner
Check provider credentials
Verify event/audit path
Record rollback contact
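The integration runbook is a four-item readiness check. A sketch that reports which items are still open for one integration, assuming credentials are read from an environment-style map; `IntegrationRecord`, `runbookGaps`, and the variable name `RESEND_API_KEY` are hypothetical.

```typescript
// Hypothetical record; fields mirror the four integration runbook steps.
interface IntegrationRecord {
  name: string;
  owner?: string;
  requiredCredentials: string[]; // hypothetical secret names, e.g. env vars
  auditEvent?: string; // hypothetical event the integration writes to the audit path
  rollbackContact?: string;
}

// Returns one entry per runbook step that is not yet satisfied, in runbook order.
function runbookGaps(rec: IntegrationRecord, env: Record<string, string | undefined>): string[] {
  const gaps: string[] = [];
  if (!rec.owner?.trim()) gaps.push("Confirm owner");
  // A credential counts as missing when it is absent or blank.
  const missing = rec.requiredCredentials.filter((name) => !env[name]?.trim());
  if (missing.length > 0) gaps.push(`Check provider credentials: missing ${missing.join(", ")}`);
  if (!rec.auditEvent) gaps.push("Verify event/audit path");
  if (!rec.rollbackContact) gaps.push("Record rollback contact");
  return gaps;
}

const resend: IntegrationRecord = {
  name: "Resend",
  owner: "Comms",
  requiredCredentials: ["RESEND_API_KEY"],
};
// Three gaps: credentials, audit path, rollback contact.
console.log(runbookGaps(resend, {}));
```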
Payment webhooks
health · Owner: Admin
Severity: 68% · Runbook: 4 steps
Review provider, queue, and recent deployment context
Evidence
Last checked: 5 minutes ago
Status: watch
Payment rails should be visible to owners before failed webhooks turn into unexplained billing states.
Runbook
Check health endpoint
Review latest deployment
Inspect provider logs
Create incident if user-facing
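The health cards pair a status with a next action: "Keep monitoring" for the healthy checks and a provider, queue, and deployment review for the `watch` check, with the runbook ending in "Create incident if user-facing". A sketch of that decision, assuming a third `failing` status and a hypothetical `userFacing` flag, neither of which appears on the health cards.

```typescript
// "failing" is an assumed third status; the health cards only show "healthy" and "watch".
type HealthStatus = "healthy" | "watch" | "failing";

interface HealthCheck {
  name: string;
  status: HealthStatus;
  userFacing: boolean; // hypothetical: would an outage reach members or staff?
}

// Assumption: healthy checks keep monitoring, failing user-facing checks open an incident,
// and everything else gets the review shown on the Payment webhooks card.
function nextAction(check: HealthCheck): string {
  if (check.status === "healthy") return "Keep monitoring";
  if (check.status === "failing" && check.userFacing) return "Create incident";
  return "Review provider, queue, and recent deployment context";
}

console.log(nextAction({ name: "Payment webhooks", status: "watch", userFacing: true }));
```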
Late pickup confusion after kids' class
incident · Owner: Front desk
Severity: 62% · Runbook: 5 steps
Close with evidence or schedule review
Evidence
Area: Kids finish
Reported: Yesterday
No safety breach, but the handoff between coach and guardian permissions needs tightening.
Runbook
Confirm impact
Assign owner
Record timeline
Decide communication
Close with prevention note
Bookings API
api · Owner: Platform
Severity: 52% · Runbook: 5 steps
Define auth, rate limits, payloads, and audit contract
Evidence
Consumer: Member app and partner widgets
Status: planned
Booking APIs need capacity, waitlist, family, and cancellation rules baked in from the beginning.
Runbook
Define consumer
Document auth scope
Add payload examples
Map audit event
Smoke health path
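The API runbook and its next action ("Define auth, rate limits, payloads, and audit contract") describe one contract whose parts should ship together. A sketch that lists the missing parts of a contract; the `ApiContract` shape, the `rateLimitPerMinute` field, and the "Set rate limit" label are hypothetical, and the Bookings API values only reflect that it is still planned.

```typescript
// Hypothetical contract record; fields mirror the API runbook and next action.
interface ApiContract {
  name: string;
  consumer: string;
  authScopes: string[];
  rateLimitPerMinute?: number;
  payloadExamples: unknown[];
  auditEvent?: string;
}

// Lists which parts of the contract are still missing, in runbook order.
function contractGaps(c: ApiContract): string[] {
  const gaps: string[] = [];
  if (!c.consumer.trim()) gaps.push("Define consumer");
  if (c.authScopes.length === 0) gaps.push("Document auth scope");
  if (c.rateLimitPerMinute === undefined) gaps.push("Set rate limit");
  if (c.payloadExamples.length === 0) gaps.push("Add payload examples");
  if (!c.auditEvent) gaps.push("Map audit event");
  return gaps;
}

const bookingsApi: ApiContract = {
  name: "Bookings API",
  consumer: "Member app and partner widgets",
  authScopes: [],
  payloadExamples: [],
};
console.log(contractGaps(bookingsApi).length); // 4
```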
payment.failed
webhook · Owner: Platform
Severity: 48% · Runbook: 5 steps
Confirm delivery, retry, and idempotency policy
Evidence
Destination: Recovery workflow
Status: planned
Failed payments should trigger recovery, audit, and communication context without becoming noisy.
Runbook
Check latest delivery
Inspect destination
Verify idempotency key
Confirm retry policy
Escalate if failing
Resend
integration · Owner: Comms
Severity: 45% · Runbook: 4 steps
Use for recovery, lead follow-up, and event comms
Evidence
Category: messaging
Status: planned
Messaging is becoming central to the product, so a clearer integration surface keeps the roadmap coherent.
Runbook
Confirm owner
Check provider credentials
Verify event/audit path
Record rollback contact
Members API
api · Owner: Platform
Severity: 35% · Runbook: 5 steps
Define auth, rate limits, payloads, and audit contract
Evidence
Consumer: Migration and integrations
Status: documented
External access should follow the same tenant and audit expectations as internal actions.
Runbook
Define consumer
Document auth scope
Add payload examples
Map audit event
Smoke health path
Small mat tear spotted near wall edge
incident · Owner: Facility lead
Severity: 28% · Runbook: 5 steps
Close with evidence or schedule review
Evidence
Area: Main mats
Reported: Today
Contained for now, but should be patched before the weekend seminar block.
Runbook
Confirm impact
Assign owner
Record timeline
Decide communication
Close with prevention note
booking.changed
webhook · Owner: Platform
Severity: 22% · Runbook: 5 steps
Confirm delivery, retry, and idempotency policy
Evidence
Destination: Waitlist and member timeline
Status: active
Booking changes are the glue between schedule pressure, member experience, and attendance operations.
Runbook
Check latest delivery
Inspect destination
Verify idempotency key
Confirm retry policy
Escalate if failing
Better Auth
integration · Owner: Admin
Severity: 18% · Runbook: 4 steps
Deepen role-aware session usage across more surfaces
Evidence
Category: identity
Status: live
The identity foundation is there; the next lift is richer role and household behavior on top of it.
Runbook
Confirm owner
Check provider credentials
Verify event/audit path
Record rollback contact
Postgres database
health · Owner: Admin
Severity: 15% · Runbook: 4 steps
Keep monitoring
Evidence
Last checked: Just now
Status: healthy
Core read/write path is available, which keeps the operational app usable.
Runbook
Check health endpoint
Review latest deployment
Inspect provider logs
Create incident if user-facing
AI decision log
health · Owner: Admin
Severity: 15% · Runbook: 4 steps
Keep monitoring
Evidence
Last checked: Today
Status: healthy
AI surfaces need logging and reasoning visibility to stay trustworthy as automation grows.
Runbook
Check health endpoint
Review latest deployment
Inspect provider logs
Create incident if user-facing
Health endpoint
api · Owner: Platform
Severity: 12% · Runbook: 5 steps
Monitor usage and contract drift
Evidence
Consumer: Coolify and operations
Status: live
The existing health surface is the start of a broader platform operations API.
Runbook
Define consumer
Document auth scope
Add payload examples
Map audit event
Smoke health path
Runbook coverage
What this platform slice makes explicit before real providers are wired in more deeply.
API contracts
Consumers, auth scopes, payload examples, rate limits, and audit events need to travel together.
Webhook delivery
Every event needs delivery status, retry policy, idempotency, dead-letter behavior, and owner escalation.
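Retry policy and dead-letter behavior can be expressed as a small decision made after each attempt. A minimal sketch using capped exponential backoff; the `RetryPolicy` numbers, `afterAttempt`, and the `DeliveryState` shape are assumptions, not values from this page.

```typescript
// Hypothetical retry policy; the page requires one per event but does not fix the numbers.
interface RetryPolicy {
  maxAttempts: number; // total attempts, including the first
  baseDelayMs: number;
  maxDelayMs: number;
}

type DeliveryState =
  | { kind: "delivered"; attempts: number }
  | { kind: "retry"; attempts: number; nextDelayMs: number }
  | { kind: "dead_letter"; attempts: number };

// Exponential backoff: base, 2*base, 4*base, ... capped at maxDelayMs.
function backoffMs(policy: RetryPolicy, attempt: number): number {
  return Math.min(policy.maxDelayMs, policy.baseDelayMs * 2 ** (attempt - 1));
}

// Decide what happens after attempt number `attempts` finishes.
function afterAttempt(policy: RetryPolicy, attempts: number, succeeded: boolean): DeliveryState {
  if (succeeded) return { kind: "delivered", attempts };
  if (attempts >= policy.maxAttempts) return { kind: "dead_letter", attempts };
  return { kind: "retry", attempts, nextDelayMs: backoffMs(policy, attempts) };
}

const policy: RetryPolicy = { maxAttempts: 5, baseDelayMs: 1_000, maxDelayMs: 60_000 };
console.log(JSON.stringify(afterAttempt(policy, 3, false)));
// {"kind":"retry","attempts":3,"nextDelayMs":4000}
```

Production schedulers usually add random jitter so many failing deliveries do not retry in lockstep; it is omitted here to keep the schedule easy to read.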
Provider health
Health checks should link to recent deploys, provider logs, queue status, and incident creation.
Incident response
Operational incidents need timeline, owner, communication, closure, and prevention notes.
Production hardening backlog
The next steps that would make this durable rather than backed by demo data.
provider_health_checks
Persist provider status, latency, last error, and check cadence.
webhook_deliveries
Track delivery attempts, retry windows, payload hash, idempotency key, and dead-letter state.
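A row in the proposed `webhook_deliveries` table might look like the sketch below. The field names and the choice of SHA-256 for the payload hash are assumptions; `createHash` is Node's built-in `node:crypto` API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape for the proposed webhook_deliveries table.
interface WebhookDeliveryRow {
  event: string;
  idempotencyKey: string;
  payloadHash: string; // hex SHA-256 of the serialized payload
  attempts: number;
  nextRetryAt: Date | null; // null once delivered or dead-lettered
  deadLettered: boolean;
}

// Hash the exact serialized body that is sent, so a replay can be checked against the original.
function payloadHash(body: string): string {
  return createHash("sha256").update(body, "utf8").digest("hex");
}

function newDeliveryRow(event: string, idempotencyKey: string, body: string, now: Date): WebhookDeliveryRow {
  return {
    event,
    idempotencyKey,
    payloadHash: payloadHash(body),
    attempts: 0,
    nextRetryAt: now, // eligible for its first attempt immediately
    deadLettered: false,
  };
}

const row = newDeliveryRow("payment.failed", "evt_042", '{"member":"m_1"}', new Date());
console.log(row.payloadHash.length, row.attempts); // 64 0
```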
api_consumers
Store consumer apps, scopes, keys, owner, rate limits, and audit contract.
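Stored scopes and rate limits only help if requests are checked against them. A sketch of a per-consumer gate using fixed one-minute windows; the `ApiConsumer` shape, the scope names, and the `ConsumerGate` class are hypothetical, and the in-memory map stands in for shared state.

```typescript
// Hypothetical row shape for the proposed api_consumers table.
interface ApiConsumer {
  app: string;
  owner: string;
  scopes: Set<string>;
  rateLimitPerMinute: number;
}

type Decision = { allowed: true } | { allowed: false; reason: "missing_scope" | "rate_limited" };

// Fixed one-minute windows per consumer app; in-memory stand-in for shared counters.
class ConsumerGate {
  private windows = new Map<string, { windowStart: number; count: number }>();

  check(consumer: ApiConsumer, requiredScope: string, nowMs: number): Decision {
    if (!consumer.scopes.has(requiredScope)) return { allowed: false, reason: "missing_scope" };
    const windowStart = Math.floor(nowMs / 60_000) * 60_000;
    const current = this.windows.get(consumer.app);
    // A new window resets the count.
    const count = current && current.windowStart === windowStart ? current.count : 0;
    if (count >= consumer.rateLimitPerMinute) return { allowed: false, reason: "rate_limited" };
    this.windows.set(consumer.app, { windowStart, count: count + 1 });
    return { allowed: true };
  }
}

const gate = new ConsumerGate();
const widget: ApiConsumer = {
  app: "partner-widget",
  owner: "Platform",
  scopes: new Set(["bookings:read"]),
  rateLimitPerMinute: 2,
};
console.log(gate.check(widget, "bookings:read", 0).allowed); // true
```

Fixed windows are the simplest option; they allow short bursts at window boundaries, which a sliding window or token bucket would smooth out.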
incident_timelines
Record impact, owner, action log, communications, and prevention follow-up.