Uptime — Last 30 Days
Overall Uptime: 99.98%

Incident History
Some customers experienced delayed webhook deliveries due to connection pool exhaustion in our delivery workers. No webhooks were lost; all pending deliveries were processed after the fix was deployed. A sketch of the pool behavior involved follows the timeline below.
14:12 UTC — Monitoring detected elevated p99 latency on webhook delivery queue.
14:25 UTC — Root cause identified: connection pool limit reached due to a slow downstream dependency.
14:41 UTC — Connection pool limits increased and stale connections recycled. Latency returning to normal.
15:03 UTC — All systems nominal. Backlog fully drained. Incident resolved.
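
For context, here is a minimal sketch, assuming a Python worker built on the requests library, of how a bounded connection pool interacts with a slow downstream dependency. The function names, pool sizes, and timeout are illustrative assumptions, not our actual delivery worker implementation.

    import requests
    from requests.adapters import HTTPAdapter

    # Illustrative only: with pool_block=True, once pool_maxsize connections are
    # busy, additional deliveries wait for a free connection. A slow downstream
    # endpoint therefore shows up as pool exhaustion and rising p99 latency,
    # which is the pattern described in this incident; raising the pool limit
    # was the mitigation.
    def build_delivery_session(pool_size: int = 50) -> requests.Session:
        session = requests.Session()
        adapter = HTTPAdapter(
            pool_connections=pool_size,  # host pools kept by the adapter
            pool_maxsize=pool_size,      # max concurrent connections per host
            pool_block=True,             # wait for a free connection instead of opening extras
        )
        session.mount("https://", adapter)
        return session

    def deliver_webhook(session: requests.Session, url: str, payload: dict) -> int:
        # A per-request timeout bounds how long a slow endpoint can hold a pooled connection.
        response = session.post(url, json=payload, timeout=5.0)
        return response.status_code
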
Users in the EU region experienced intermittent 502 errors when loading dashboards. The issue was caused by a misconfigured load balancer rule deployed during a routine infrastructure update. An illustrative sketch of this failure mode follows the timeline below.
09:47 UTC — Reports received of dashboard load failures from EU-based accounts.
10:02 UTC — Issue traced to a load balancer routing rule that was incorrectly updated during a scheduled maintenance window.
10:18 UTC — Routing rule reverted. Dashboards loading normally for all regions.
10:30 UTC — Monitoring confirmed full recovery. Post-incident review scheduled.
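
For illustration, the hypothetical Python sketch below shows the class of error involved: a region-to-pool routing table in which one rule points at a pool that is missing or unhealthy, which surfaces to users as 502s. The table, pool names, and validation helper are assumptions for this example, not our actual load balancer configuration.

    # Hypothetical routing table and pre-deploy check; all names are illustrative.
    ROUTING_RULES = {
        "eu": "dashboards-eu-pool",
        "us": "dashboards-us-pool",
        "apac": "dashboards-apac-pool",
    }

    # Pools the load balancer can actually route to and considers healthy.
    HEALTHY_POOLS = {"dashboards-eu-pool", "dashboards-us-pool", "dashboards-apac-pool"}

    def unroutable_regions(rules: dict, healthy_pools: set) -> list:
        """Return regions whose rule targets a pool that is missing or unhealthy."""
        return [region for region, pool in rules.items() if pool not in healthy_pools]

    if __name__ == "__main__":
        bad = unroutable_regions(ROUTING_RULES, HEALTHY_POOLS)
        if bad:
            # A rule update like the one in this incident would be caught here
            # before rollout instead of surfacing as 502s for those regions.
            raise SystemExit(f"refusing to deploy routing rules: unroutable regions {bad}")
        print("routing rules OK")

A check of this kind is the reason the post-incident review was scheduled: validating rules before rollout, rather than reverting after user impact, is the preventive change under discussion.
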
A small percentage of SDK event submissions timed out due to a DNS resolution issue with one of our upstream providers. Events were buffered client-side and retried successfully once the issue was resolved. A sketch of this buffer-and-retry behavior follows the timeline below.
20:15 UTC — Automated alerts triggered for increased timeout rates on SDK ingest endpoints.
20:28 UTC — Identified intermittent DNS resolution failures for a third-party certificate validation service.
20:45 UTC — Switched to backup DNS resolver and deployed cached certificate validation. Timeout rates dropped to baseline.
21:00 UTC — Incident resolved. An audit of client-side retry logs confirmed no data loss.
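
For illustration, here is a minimal sketch of the client-side buffer-and-retry behavior described above, assuming a Python client. The class, method, and parameter names are hypothetical, not the actual SDK API.

    import random
    import time
    from collections import deque

    # Hypothetical sketch of client-side buffering with retry and exponential
    # backoff; not the actual SDK implementation.
    class EventBuffer:
        def __init__(self, send_fn, max_attempts: int = 6, base_delay_s: float = 0.5):
            self._pending = deque()          # events not yet acknowledged by the server
            self._send_fn = send_fn          # callable that submits one event; raises on timeout
            self._max_attempts = max_attempts
            self._base_delay_s = base_delay_s

        def submit(self, event: dict) -> None:
            # Buffer first, then flush, so a transient outage never drops the event.
            self._pending.append(event)
            self.flush()

        def flush(self) -> None:
            while self._pending:
                event = self._pending[0]
                for attempt in range(self._max_attempts):
                    try:
                        self._send_fn(event)
                        self._pending.popleft()   # only discard after a successful send
                        break
                    except (TimeoutError, ConnectionError):
                        # Exponential backoff with jitter between attempts.
                        time.sleep(self._base_delay_s * (2 ** attempt) + random.uniform(0, 0.1))
                else:
                    # Still failing after max_attempts: keep everything buffered
                    # and let a later flush retry.
                    return

Because an event is only removed from the buffer after a successful send, an outage like this one delays delivery rather than losing data, which is why the retry-log audit could confirm no data loss.
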