Why your Paddle webhook handler should accept 5-minute skew, not 5 seconds

We shipped a Paddle Billing webhook handler with a 5-second default skew on signature verification. It looked tight and defensive. It was actually too tight to survive Paddle's own retry policy plus normal network latency, and would have rejected legitimate retries as "replay attacks." Replay protection still works fine at 5 minutes. Here's the math.

The setup

Paddle Billing signs every webhook with the format:

text
Paddle-Signature: ts=1714250000;h1=<hex-hmac>

The HMAC is computed over <ts>:<rawBody>, keyed with the notification destination's secret. Verification has three parts:

  • The h1 hex matches the HMAC we recompute from the raw body.
  • That comparison is constant-time, via hmac.Equal.
  • The timestamp is within an acceptable skew window, to prevent replay.

The first two are about authenticity. The third is about replay protection: if an attacker captures a valid request, they shouldn't be able to resubmit it days later. The skew window is the lifetime of a captured request. Tighter is better — to a point.

What we shipped

go
func VerifySignature(rawBody []byte, header, secret string, maxSkew time.Duration) error {
    // ...
    if maxSkew <= 0 {
        maxSkew = 5 * time.Second
    }
    if delta := time.Since(tsTime); delta > maxSkew || delta < -maxSkew {
        return fmt.Errorf("paddle: signature timestamp out of range (%s)", delta)
    }
    // ...
}

Five seconds. Looked great. Lined up with HMAC patterns from OAuth-style request signing where 5–10 seconds is normal.

What broke

Paddle retries failed webhooks. The first retry happens within seconds, but back-off pushes later retries out: 30 seconds, then minutes. If our server returns a 5xx once (say, mid-deploy), the retry still carries the timestamp of the first delivery attempt, not the retry's send time.

Paddle re-sends the same signed envelope. The ts on the signature is the moment Paddle first tried to deliver the webhook. By the time it reaches us on retry, that timestamp is 30 seconds, 2 minutes, or 10 minutes old. The 5-second skew rejects every one of those as "out of range."

On top of retries, regular network latency through a load balancer plus TLS plus a busy app server can eat 1–3 seconds. Add small clock drift between Paddle's senders and our box. We were operating with almost no margin.

The bug shape: subscription.created arrives, we 5xx because Postgres is restarting from a deploy, Paddle retries 30 seconds later, our verification rejects the retry as a replay, the org never gets the active flag. The customer's checkout looks "succeeded" on Paddle's side; on ours they're still stuck on the free plan.

The fix

go
// maxSkew limits replay; pass zero to use the 5-minute default. Paddle
// retries failed webhooks, so a tight skew (e.g. 5s) will reject legitimate
// retries.
if maxSkew <= 0 {
    maxSkew = 5 * time.Minute
}

Five minutes. Sixty times wider. Sounds bad for replay protection. Walk the threat model:

  • An attacker needs to capture a valid signed envelope. That envelope was sent over TLS. To capture it, they need either a MITM cert (game over for everything) or a compromised reverse-proxy log.
  • They have to replay it. Once. Because the second time, our ON CONFLICT (provider_event_id) DO NOTHING short-circuits duplicates.
  • The replay has to land within 5 minutes of the original send. Not 5 minutes from now — 5 minutes from when Paddle signed it.

So the actual replay-attack budget at 5 minutes is: capture an envelope (very hard), replay it before the legitimate copy arrives at our server (must beat Paddle by network latency), and have it processed before the first arrival creates the idempotency row. The window is about the round-trip time of a Paddle → us request. Tightening to 5 seconds bought no real replay protection but cost us correctness under retries.
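The idempotency step is what shrinks the replay budget, so it is worth seeing in isolation. The sketch below is an in-memory stand-in for a table with a unique constraint on provider_event_id, not real SQL; `eventLog`, `insertOnce`, and `handle` are hypothetical names chosen for illustration.

```go
package main

import "sync"

// eventLog stands in for a table with a unique constraint on
// provider_event_id.
type eventLog struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newEventLog() *eventLog {
	return &eventLog{seen: make(map[string]struct{})}
}

// insertOnce mimics INSERT ... ON CONFLICT (provider_event_id) DO NOTHING.
// It reports true only for the first insert of a given event ID.
func (l *eventLog) insertOnce(eventID string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if _, dup := l.seen[eventID]; dup {
		return false
	}
	l.seen[eventID] = struct{}{}
	return true
}

// handle runs apply only the first time eventID is seen. Duplicates,
// whether legitimate retries or replays, are acknowledged without
// re-running side effects.
func (l *eventLog) handle(eventID string, apply func()) bool {
	if !l.insertOnce(eventID) {
		return false
	}
	apply()
	return true
}
```

In a real handler the insert and the side effects would normally share one database transaction, so that a crash between them cannot leave an event recorded but not applied.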

The test that catches this

The bug only shows up when the timestamp is older than the skew window but the signature is still valid. Two tests pin that down:

go
func TestVerifySignature_AcceptsWithinDefaultSkew(t *testing.T) {
    body := []byte(`{}`)
    // 30 seconds behind — within default 5-minute skew, well outside the
    // old 5-second window.
    ts := time.Now().Add(-30 * time.Second)
    header := validSignature(t, body, "secret", ts)
    if err := VerifySignature(body, header, "secret", 0); err != nil {
        t.Fatalf("expected accept within skew, got %v", err)
    }
}

func TestVerifySignature_RejectsReplayPastSkew(t *testing.T) {
    body := []byte(`{}`)
    old := time.Now().Add(-10 * time.Minute) // outside 5-minute window
    header := validSignature(t, body, "secret", old)
    if err := VerifySignature(body, header, "secret", 0); err == nil {
        t.Fatal("expected out-of-range error")
    }
}

Both pass. Idempotency picks up the rest.

What I'd actually optimize for

Defense in depth on webhook handling looks like this in practice:

  • HMAC over raw body. Read the bytes, verify, then parse JSON. Parsing first and verifying the re-marshalled output breaks the signature check even when the values are identical, because key order and whitespace can change.
  • Idempotency at the database level. Unique constraint on provider_event_id, ON CONFLICT DO NOTHING. A duplicate delivery returns 200 without re-running side effects.
  • Reasonable skew. Wide enough to absorb retries and network jitter, narrow enough that captured-and-replayed envelopes have to beat the legitimate copy.
  • Body size limit. Bound the read to 1 MiB so a malformed sender can't pin a goroutine streaming forever.

The skew is the one knob where tighter is worse. Get the other three right, and 5 minutes is a comfortable default.

Why this matters if you embed analytics in your SaaS
Billing webhook bugs don't show up in your error tracker — they show up in a customer's churn email three weeks later. The signature that "looks more secure" by being tighter is also the signature that silently rejects the retry that confirms the customer's upgrade. Webhooks fail closed by default; that means the failure is invisible to you and visible to them.