Why Webhooks Are Hard
Webhooks seem simple:
✅ Receive a POST
✅ Process it
✅ Acknowledge with 200
…but reality is messier:
Networks fail
Clients disconnect
Providers retry
Duplicate events arrive
Events arrive out of order
If you don’t design for this, you’ll have double-processed events or missed data.
Typical Webhook Risks
Event duplication (providers may retry on 5xx or timeout)
Out-of-order events (especially if providers retry aggressively)
Partial processing (e.g. your DB update succeeded, but your response was lost)
No retry (some providers do not retry at all, so you have to accept and acknowledge the event on the first delivery)
So you need:
✅ Fast acknowledgment
✅ Idempotency to deduplicate
✅ Safe retry strategies
✅ Durable storage
Architecture Pattern for Resilient Webhooks
✅ Accept event
✅ Validate (auth, signatures, etc)
✅ Store the event in a durable queue or database
✅ Immediately respond 200 (so the provider stops retrying)
✅ Process the event asynchronously
✅ Enforce idempotency on processing
✅ Retry your internal processing if it fails
Let’s Build One in Node.js
Assume an Express server with Stripe webhooks (but this pattern is universal).
Quick Setup
Create server.js:
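Here is a minimal sketch of what server.js might contain. The article assumes Express; to stay dependency-free, this version uses Node's built-in http module instead. The /webhook path and the handleWebhook and createWebhookServer names are illustrative assumptions. The key point carries over to Express (where express.raw({ type: 'application/json' }) plays the same role): keep the raw request bytes, because signature checks run over the exact payload the provider sent.

```javascript
// server.js (sketch): dependency-free stand-in for the Express app.
const http = require('node:http');

// Hypothetical helper: decides the HTTP response for one delivery.
// It is a pure function, so it can be tested without opening a socket.
function handleWebhook(method, url, rawBody) {
  if (method !== 'POST' || url !== '/webhook') {
    return { status: 404, body: 'not found' };
  }
  try {
    const event = JSON.parse(rawBody.toString('utf8'));
    if (typeof event.id !== 'string') {
      return { status: 400, body: 'missing event id' };
    }
    return { status: 200, body: 'ok', event };
  } catch {
    // Malformed payload: reject without leaking a stack trace.
    return { status: 400, body: 'invalid payload' };
  }
}

function createWebhookServer() {
  return http.createServer((req, res) => {
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => {
      // Keep the raw bytes: signature checks need the unparsed body.
      const rawBody = Buffer.concat(chunks);
      const result = handleWebhook(req.method, req.url, rawBody);
      res.writeHead(result.status, { 'Content-Type': 'text/plain' });
      res.end(result.body);
    });
  });
}
```

To try it locally, something like createWebhookServer().listen(3000) would start the server; the port is an arbitrary choice.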
Securely Validate the Webhook
Stripe signs events. Validate the signature:
✅ This immediately returns 200 so Stripe won’t retry
✅ Prefer a durable queue in production (e.g. Redis with persistence enabled, RabbitMQ, SQS)
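As a rough sketch of what the signature check does under the hood: Stripe sends a Stripe-Signature header containing a timestamp (t=) and one or more v1= signatures, each an HMAC-SHA256 of "timestamp.rawBody" keyed with the endpoint secret. The function names, the 300-second tolerance default, and the injectable nowSec parameter below are assumptions for illustration. In a real app you would more likely call stripe.webhooks.constructEvent(rawBody, signatureHeader, endpointSecret) from the official library, which throws on a bad signature.

```javascript
const crypto = require('node:crypto');

// Parse "t=1700000000,v1=abc...,v1=def..." into its parts.
function parseSignatureHeader(header) {
  const parsed = { timestamp: null, signatures: [] };
  for (const part of String(header || '').split(',')) {
    const idx = part.indexOf('=');
    if (idx === -1) continue;
    const key = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (key === 't') parsed.timestamp = value; // keep as string for signing
    if (key === 'v1') parsed.signatures.push(value);
  }
  return parsed;
}

// HMAC-SHA256 over "<timestamp>.<raw body>", hex encoded.
function sign(secret, timestamp, rawBody) {
  return crypto.createHmac('sha256', secret)
    .update(`${timestamp}.`)
    .update(rawBody)
    .digest('hex');
}

function verifySignature(rawBody, header, secret, toleranceSec = 300,
                         nowSec = Math.floor(Date.now() / 1000)) {
  const { timestamp, signatures } = parseSignatureHeader(header);
  if (timestamp === null || signatures.length === 0) return false;
  const ts = Number(timestamp);
  // Reject stale or malformed timestamps to limit replay attacks.
  if (!Number.isFinite(ts) || Math.abs(nowSec - ts) > toleranceSec) return false;

  const expected = Buffer.from(sign(secret, timestamp, rawBody), 'utf8');
  return signatures.some((candidate) => {
    const given = Buffer.from(candidate, 'utf8');
    // timingSafeEqual throws on length mismatch, so compare lengths first.
    return given.length === expected.length &&
      crypto.timingSafeEqual(given, expected);
  });
}
```

In the route handler, a failed check would return 400 before anything is stored or queued; only verified events get the 200.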
Storing and Processing
Imagine a simple job queue (for illustration):
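One way such a queue could look, as a sketch only: the InMemoryQueue class and its method names are made up for illustration, and the handler is assumed to be synchronous to keep the example short. Jobs held in memory are lost if the process dies, which is exactly why a durable queue is the better choice in production.

```javascript
// In-memory job queue, for illustration only (not durable).
class InMemoryQueue {
  constructor(handler) {
    this.handler = handler; // function called once per job
    this.jobs = [];         // events waiting to be processed
    this.failed = [];       // events whose handler threw
  }

  // Called from the HTTP handler: cheap, so the 200 can go out right away.
  enqueue(event) {
    this.jobs.push(event);
    return this.jobs.length;
  }

  // Called by a worker loop, outside the request/response cycle.
  // Returns how many jobs were processed successfully.
  drain() {
    let processed = 0;
    while (this.jobs.length > 0) {
      const event = this.jobs.shift();
      try {
        this.handler(event);
        processed += 1;
      } catch (err) {
        // Keep the failure so it can be retried or inspected later.
        const message = err && err.message ? err.message : String(err);
        this.failed.push({ event, error: message });
      }
    }
    return processed;
  }
}
```

In the webhook route you would call queue.enqueue(event) and respond 200 straight away; a timer such as setInterval(() => queue.drain(), 1000) could play the part of the worker.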
Enforcing Idempotency
Most webhook events have an id. Store it to avoid replay.
Example with a simple SQLite store:
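The sketch below shows the claim-once idea with a Map standing in for the SQLite table, so it runs without a database driver. The ProcessedEventStore and processOnce names are hypothetical. With SQLite, a PRIMARY KEY on the event id plus INSERT OR IGNORE gives the same behaviour and, unlike the Map, survives restarts.

```javascript
// Stand-in for a table such as:
//   CREATE TABLE processed_events (id TEXT PRIMARY KEY, processed_at INTEGER);
// A Map is NOT durable; it only illustrates the claim-once logic.
class ProcessedEventStore {
  constructor() {
    this.rows = new Map();
  }

  // Returns true only for the first caller to claim this event id.
  claim(eventId, now = Date.now()) {
    if (this.rows.has(eventId)) return false;
    this.rows.set(eventId, now);
    return true;
  }

  // Undo a claim so the event can be retried after a failure.
  release(eventId) {
    this.rows.delete(eventId);
  }
}

// Run the business logic at most once per event id.
function processOnce(store, event, handleBusinessLogic) {
  if (!store.claim(event.id)) return 'duplicate';
  try {
    handleBusinessLogic(event);
    return 'processed';
  } catch (err) {
    // Release the claim so a later retry can run the handler again.
    store.release(event.id);
    throw err;
  }
}
```

Releasing the claim on failure is a design choice: it favours retrying over skipping, which means handleBusinessLogic itself should tolerate being re-run after a partial failure.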
Handling Failures
If handleBusinessLogic fails, store the event with a status of pending
Use a cron job or worker to retry “pending” events later
Use exponential backoff for internal retries
Monitor “stuck” events with metrics
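The retry steps above might be sketched like this. The record shape ({ id, status, attempts, nextAttemptAt, payload }), the "dead" status, and the three constants are assumptions chosen for illustration, not values any provider prescribes.

```javascript
const BASE_DELAY_MS = 1000;          // assumed delay before the first retry
const MAX_DELAY_MS = 60 * 60 * 1000; // assumed cap: one hour
const MAX_ATTEMPTS = 5;              // assumed limit before giving up

// Delay doubles with each failed attempt: 1s, 2s, 4s, ... up to the cap.
function backoffDelayMs(attempts) {
  return Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** (attempts - 1));
}

// One pass of the retry worker over stored event records.
// Records are mutated in place and returned for convenience.
function retryPending(records, handleBusinessLogic, now = Date.now()) {
  for (const record of records) {
    // Skip finished events and events whose backoff has not elapsed.
    if (record.status !== 'pending' || record.nextAttemptAt > now) continue;
    try {
      handleBusinessLogic(record.payload);
      record.status = 'done';
    } catch (err) {
      record.attempts += 1;
      if (record.attempts >= MAX_ATTEMPTS) {
        // Stop retrying; this is the "stuck" event to surface in metrics.
        record.status = 'dead';
      } else {
        record.nextAttemptAt = now + backoffDelayMs(record.attempts);
      }
    }
  }
  return records;
}
```

A cron job or setInterval could call retryPending on whatever the database returns for status = pending; counting records that end up "dead" gives a simple metric to alert on.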
Testing Your Webhooks
✅ Replay events with Stripe CLI or Postman
✅ Test duplicate deliveries
✅ Simulate network timeouts
✅ Confirm 200 responses within 3 seconds
✅ Test poison-message logic (bad payloads)
Best Practices Checklist
✅ Validate signatures (ALWAYS)
✅ Immediately ack with 200
✅ Store events in a durable queue
✅ Process events asynchronously
✅ Enforce idempotency with event IDs
✅ Monitor failures, alert on retry loops
✅ Apply exponential backoff
✅ Rate-limit or throttle abusive retries
Extra for High Security
Rotate webhook secrets
Log suspicious IPs
Validate timestamps to prevent replay attacks
Don’t expose stack traces in error responses
Use HTTPS only
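For secret rotation, one common approach is to accept signatures made with either the old or the new secret during an overlap window, then drop the old one. The sketch below assumes a generic scheme where the signature is a hex HMAC-SHA256 of the raw body; that scheme and the function names are illustrative, not any specific provider's format.

```javascript
const crypto = require('node:crypto');

// Hex HMAC-SHA256 of the raw body (assumed generic signing scheme).
function hmacHex(secret, rawBody) {
  return crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
}

// Accept a signature made with any currently valid secret, so the old
// secret keeps working during the overlap window of a rotation.
function verifyWithRotation(rawBody, signatureHex, activeSecrets) {
  const given = Buffer.from(String(signatureHex), 'utf8');
  return activeSecrets.some((secret) => {
    const expected = Buffer.from(hmacHex(secret, rawBody), 'utf8');
    // timingSafeEqual throws on length mismatch, so compare lengths first.
    return given.length === expected.length &&
      crypto.timingSafeEqual(given, expected);
  });
}
```

Once the provider is confirmed to sign with the new secret, removing the old one from activeSecrets completes the rotation.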
Conclusion
A robust webhook handler is an asynchronous, durable, idempotent processor, not a single synchronous POST handler. If you follow the patterns above, you’ll protect your customers from data loss, duplicates, and subtle fraud.