Why your Paddle webhook handler should accept 5-minute skew, not 5 seconds
We shipped a Paddle Billing webhook handler with a 5-second default skew on signature verification. It looked tight and defensive. It was actually too tight to survive Paddle's own retry policy plus normal network latency, and would have rejected legitimate retries as "replay attacks." Replay protection still works fine at 5 minutes. Here's the math.
The setup
Paddle Billing signs every webhook with the format:
Paddle-Signature: ts=1714250000;h1=<hex-hmac>
The HMAC is over <ts>:<rawBody> with the notification destination's secret. Verification has three checks:
- The h1 hex matches the recomputed HMAC.
- Constant-time comparison via hmac.Equal.
- The timestamp is within an acceptable skew window, to prevent replay.
The first two are about authenticity. The third is about replay protection: if an attacker captures a valid request, they shouldn't be able to resubmit it days later. The skew window is the lifetime of a captured request. Tighter is better — to a point.
What we shipped
func VerifySignature(rawBody []byte, header, secret string, maxSkew time.Duration) error {
	// ...
	if maxSkew <= 0 {
		maxSkew = 5 * time.Second
	}
	if delta := time.Since(tsTime); delta > maxSkew || delta < -maxSkew {
		return fmt.Errorf("paddle: signature timestamp out of range (%s)", delta)
	}
	// ...
}
Five seconds. Looked great. Lined up with HMAC patterns from OAuth-style request signing where 5–10 seconds is normal.
What broke
Paddle retries failed webhooks. The first retry happens within seconds, but back-off pushes later retries out: 30 seconds, then minutes. If our server returns 5xx once — say, mid-deploy — the retry timestamp is the original event timestamp, not the retry's send time.
ts on the signature is the moment Paddle first tried to deliver the webhook. By the time it reaches us on retry, that timestamp is 30 seconds, 2 minutes, or 10 minutes old. The 5-second skew rejects every one of those as "out of range."
On top of retries, regular network latency through a load balancer plus TLS plus a busy app server can eat 1–3 seconds. Add small clock drift between Paddle's senders and our box. We were operating with almost no margin.
The bug shape: subscription.created arrives, we 5xx because Postgres is restarting from a deploy, Paddle retries 30 seconds later, our verification rejects the retry as a replay, the org never gets the active flag. The customer's checkout looks "succeeded" on Paddle's side; on ours they're still stuck on the free plan.
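To make the arithmetic concrete, here is a small sketch that runs the ages mentioned above through the same comparison VerifySignature uses. withinSkew is a hypothetical helper, and the ages are the illustrative figures from the text, not a published Paddle retry schedule.

```go
package main

import (
	"fmt"
	"time"
)

// withinSkew mirrors the range check in VerifySignature, where age is
// "now minus the signed ts" and may be negative if the sender's clock is ahead.
func withinSkew(age, maxSkew time.Duration) bool {
	return age <= maxSkew && age >= -maxSkew
}

func main() {
	// Ordinary latency first, then the retry ages quoted in the text.
	ages := []time.Duration{
		2 * time.Second,
		30 * time.Second,
		2 * time.Minute,
		10 * time.Minute,
	}
	for _, age := range ages {
		fmt.Printf("age %-6s  5s window: %-5t  5m window: %t\n",
			age, withinSkew(age, 5*time.Second), withinSkew(age, 5*time.Minute))
	}
}
```

Under the 5-second window only the 2-second case passes. Any window is a cutoff, though: by the same check, a retry whose timestamp is already 10 minutes old falls outside a 5-minute window as well, so the default is worth sizing against how long you expect to be unavailable.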
The fix
// maxSkew limits replay; pass zero to use the 5-minute default. Paddle
// retries failed webhooks, so a tight skew (e.g. 5s) will reject legitimate
// retries.
if maxSkew <= 0 {
	maxSkew = 5 * time.Minute
}
Five minutes. Sixty times wider. Sounds bad for replay protection. Walk the threat model:
- An attacker needs to capture a valid signed envelope. That envelope was sent over TLS. To capture it, they need either a MITM cert (game over for everything) or a compromised reverse-proxy log.
- They have to replay it. Once. Because the second time, our ON CONFLICT (provider_event_id) DO NOTHING short-circuits duplicates.
- The replay has to land within 5 minutes of the original send. Not 5 minutes from now, but 5 minutes from when Paddle signed it.
So the actual replay-attack budget at 5 minutes is: capture an envelope (very hard), replay it before the legitimate copy arrives at our server (must beat Paddle by network latency), and have it processed before the first arrival creates the idempotency row. The window is about the round-trip time of a Paddle → us request. Tightening to 5 seconds bought no real replay protection but cost us correctness under retries.
The test that catches this
The bug only shows up when the timestamp is older than the skew but the signature is still valid. The test:
func TestVerifySignature_AcceptsWithinDefaultSkew(t *testing.T) {
	body := []byte(`{}`)
	// 30 seconds behind: within the default 5-minute skew, well outside the
	// old 5-second window.
	ts := time.Now().Add(-30 * time.Second)
	header := validSignature(t, body, "secret", ts)
	if err := VerifySignature(body, header, "secret", 0); err != nil {
		t.Fatalf("expected accept within skew, got %v", err)
	}
}

func TestVerifySignature_RejectsReplayPastSkew(t *testing.T) {
	body := []byte(`{}`)
	old := time.Now().Add(-10 * time.Minute) // outside 5-minute window
	header := validSignature(t, body, "secret", old)
	if err := VerifySignature(body, header, "secret", 0); err == nil {
		t.Fatal("expected out-of-range error")
	}
}
Both pass. Idempotency picks up the rest.
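The tests call a validSignature helper that isn't shown. A plausible version is sketched below, assuming HMAC-SHA256 over <ts>:<rawBody> as described in the setup; signedHeader is a hypothetical pure function split out so the signing logic can be checked without a *testing.T. In a real repository both would live in a _test.go file rather than in package main.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"testing"
	"time"
)

// signedHeader builds "ts=<unix>;h1=<hex>" for the given body and secret.
// SHA-256 is an assumption; the post only says "HMAC".
func signedHeader(body []byte, secret string, unix int64) string {
	ts := strconv.FormatInt(unix, 10)
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(ts + ":"))
	mac.Write(body)
	return "ts=" + ts + ";h1=" + hex.EncodeToString(mac.Sum(nil))
}

// validSignature matches the call shape used in the tests above.
func validSignature(t *testing.T, body []byte, secret string, ts time.Time) string {
	t.Helper()
	return signedHeader(body, secret, ts.Unix())
}

func main() {
	// Prints a header for a fixed timestamp; the h1 value depends on the secret.
	fmt.Println(signedHeader([]byte(`{}`), "secret", 1714250000))
}
```

Signing with an arbitrary timestamp is what lets the tests produce a request that is authentic but old, which is exactly the case the 5-second default mishandled.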
What I'd actually optimize for
Defense in depth on webhook handling looks like this in practice:
- HMAC over raw body. Read the bytes, verify, then parse JSON. Parsing first and verifying the re-marshalled JSON breaks the signature, because key order and whitespace can change even when the values are identical.
- Idempotency at the database level. Unique constraint on provider_event_id, ON CONFLICT DO NOTHING. A successful retry returns 200 without re-running side effects.
- Reasonable skew. Wide enough to absorb retries and network jitter, narrow enough that captured-and-replayed envelopes have to beat the legitimate copy.
- Body size limit. Bound the read to 1 MiB so a malformed sender can't pin a goroutine streaming forever.
The skew is the one knob where tighter is worse. Get the other three right, and 5 minutes is a comfortable default.