Architecture · 11 min read

Designing Multi-Tenant SMS Recovery: Credits, Queues, and Consent for Cross-Border E-Commerce

Alex Mercer
Lead Backend Engineer

Most SMS cart recovery guides assume one country, one tenant, one payment provider. That works for a single Shopify store selling domestically. It falls apart the moment you build a platform where hundreds of merchants sell across borders — each with different carrier costs, timezone constraints, and regulatory obligations.

We designed an SMS recovery system that handles all of this from day one. This article walks through the architecture: per-country credit billing, timezone-aware quiet hours, a 4-layer GDPR consent stack, and an 8-step BullMQ sending pipeline. We use European e-commerce as the running example — it has the hardest constraints (GDPR, carrier cost variance, timezone spread). But the patterns apply to any cross-border SaaS, including US-to-LATAM, APAC, or global.

Key Decisions at a Glance

Decision | What We Chose | Why
Billing model | Credit packs with country multipliers | Carrier costs vary 3x across borders
Phone numbers | Dedicated per-tenant ($1/mo) | Isolation + deliverability
Consent | 4-Layer Consent Stack | GDPR requires demonstrable consent management
Credit safety | Reserve-Confirm Credit Pattern | Prevents overdraft under concurrent sends
Send timing | 45 min post-abandonment | Supplements email step 1, respects recipient timezone
Queue | BullMQ delayed jobs | Re-queueable on quiet hours, deterministic job IDs

Why Is SMS Cart Recovery Harder for Cross-Border E-Commerce?

Cart abandonment hovers around 70% across the industry. Email recovery catches some of those carts, but commonly cited figures put SMS open rates near 98%, against roughly 20% for email. Adding SMS as a supplementary channel, not a replacement, reaches customers who never open the recovery email.

The complication: when your platform serves merchants across Lithuania, Germany, Finland, Poland, and France, a single SMS pipeline faces problems that a domestic setup never encounters:

  • Carrier cost asymmetry — sending an SMS to Lithuania costs roughly 1 credit (~€0.03). To Germany, it costs 3 credits (~€0.09). A flat per-message rate either overcharges low-cost countries or loses money on expensive ones.
  • GDPR consent requirements — a checkbox alone doesn't satisfy GDPR. You need demonstrable opt-in records, a working unsubscribe mechanism, and an audit trail of when consent was granted or revoked.
  • Timezone-aware sending — a Lithuanian merchant with Finnish customers needs to respect Finnish quiet hours, not Lithuanian ones. A "don't send at night" rule needs to know whose night.
  • Multiple payment providers — SMS consent validation must run before any payment flow. In our case, that means both Stripe and MakeCommerce (Baltic bank links) — the validation logic can't be coupled to one provider.

SMS supplements email recovery. It fires 15 minutes after the first recovery email (45 minutes post-abandonment), doesn't replace any email step, and uses a separate queue with independent rate limits.

Architecture Overview

The system spans three multi-tenant boundaries: shop-scoped configuration, org-scoped credits, and session-scoped consent. Here's the flow:

Cart Abandonment Cron (every 5 min)
  │
  ├─ Session has SMS consent + phone?
  ├─ Shop has SMS enabled?
  ├─ Org has enough credits for this country?
  ├─ Feature gate: PRO tier or higher?
  │
  └─▶ Schedule BullMQ delayed job (45min post-abandonment)
        │
        ├─ Pre-send checks (8 steps — see below)
        ├─ Reserve credits (SELECT FOR UPDATE)
        ├─ Send via Twilio
        │
        └─▶ Twilio Delivery Webhook
              │
              ├─ delivered → confirm credit reservation
              ├─ failed/undelivered → release credits (refund)
              └─ unsubscribed → clear consent on session

Every arrow crosses a tenant isolation boundary. Shop A's SMS config and customer sessions are invisible to Shop B, and an organization's credit pool is invisible to every other organization. Shop scoping is enforced by Prisma Client Extensions that automatically scope every query to the active shop.
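The scheduling step in the flow above can be sketched as a pure function. This is a minimal sketch, not the production code: `planSmsJob`, `SmsJobPlan`, and the `sms-recovery-` ID format are hypothetical names chosen for illustration. Only the 45-minute delay and the idea of deterministic job IDs come from the article.

```typescript
// Hypothetical helper: the ID format and names are illustrative only.
const SMS_DELAY_AFTER_ABANDONMENT_MS = 45 * 60 * 1000; // 45 minutes, per the article

interface SmsJobPlan {
  jobId: string;   // deterministic: one session always maps to one job ID
  delayMs: number; // how long the queue should hold the job
}

function planSmsJob(sessionId: string, abandonedAt: Date, now: Date): SmsJobPlan {
  const sendAt = abandonedAt.getTime() + SMS_DELAY_AFTER_ABANDONMENT_MS;
  return {
    // No ':' in the ID, since BullMQ uses it as a Redis key separator.
    jobId: `sms-recovery-${sessionId}`,
    // The cron may notice the session late, so never return a negative delay.
    delayMs: Math.max(0, sendAt - now.getTime()),
  };
}

// Cart abandoned at 12:00, cron notices it at 12:05.
const examplePlan = planSmsJob(
  'sess_123',
  new Date('2024-05-01T12:00:00Z'),
  new Date('2024-05-01T12:05:00Z'),
);
console.log(examplePlan.jobId, examplePlan.delayMs / 60000); // delay in minutes
```

With BullMQ, these two values would map to the `jobId` and `delay` options of `queue.add`. Because the cron runs every 5 minutes and will see the same abandoned session repeatedly, a deterministic ID is what lets the queue treat the repeat as a duplicate rather than scheduling a second SMS.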

How to Build a Credit Billing System That Can't Overdraft

A message to Lithuania costs 1 credit. To Germany, 3. To the US, 1. To Switzerland, 3. These multipliers reflect actual carrier per-segment pricing from Twilio — we mapped 23 countries to credit costs:

// Country credit multipliers — reflects actual carrier costs per SMS segment
const SMS_COUNTRY_MULTIPLIERS: Record<string, number> = {
  US: 1, CA: 1, GB: 2, LT: 1, LV: 1, EE: 1, FI: 2,
  PL: 2, IE: 2, NL: 2, BE: 2, IT: 2, ES: 2, PT: 2,
  DE: 3, FR: 3, AT: 3, CH: 3, AU: 3, NZ: 3,
  SE: 2, DK: 2, NO: 2,
};
const SMS_DEFAULT_MULTIPLIER = 3; // Unknown countries get worst-case pricing
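A lookup over that map might look like the sketch below. `creditsForSms` is a hypothetical helper, the map is a six-country subset of the one above, and charging the multiplier once per segment is an assumption based on the "per SMS segment" wording.

```typescript
// Subset of the article's multiplier map, repeated so the sketch is self-contained.
const MULTIPLIERS: Record<string, number> = { LT: 1, US: 1, GB: 2, PL: 2, DE: 3, CH: 3 };
const DEFAULT_MULTIPLIER = 3; // unknown countries get worst-case pricing

// Hypothetical helper. Assumption: credits are charged per segment.
function creditsForSms(countryCode: string, segments: number = 1): number {
  if (!Number.isInteger(segments) || segments < 1) {
    throw new RangeError('segments must be a positive integer');
  }
  const multiplier = MULTIPLIERS[countryCode.toUpperCase()] ?? DEFAULT_MULTIPLIER;
  return multiplier * segments;
}

console.log(creditsForSms('DE'));    // known country
console.log(creditsForSms('BR'));    // not in the map: falls back to the default
console.log(creditsForSms('GB', 2)); // a two-segment message
```

Defaulting unknown countries to the highest multiplier means a gap in the map costs the merchant slightly more instead of costing the platform money.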

Credits belong to the organization, not individual shops. An organization might own three shops — all sharing one credit pool. This simplifies billing (one Stripe checkout, one ledger) at the cost of one high-volume shop potentially draining credits from siblings. Acceptable at our scale; enterprise customers could get per-shop pools later.

The credit system uses two tables: OrgSmsCreditBalance (current balance, single row per org) and OrgSmsCreditLedger (append-only audit trail with entry types: PURCHASE, TIER_REFILL, SEND_RESERVE, SEND_CONFIRM, SEND_REFUND, DELIVERY_REFUND). Three credit packs are available: Starter (100 credits, €3), Standard (500, €12.50), Large (2000, €40).
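The pack prices translate into a per-credit and per-message cost with simple division. The sketch below works through that arithmetic; `CreditPack`, `eurPerCredit`, and `eurPerSms` are illustrative names, and only the pack sizes and prices come from the article.

```typescript
interface CreditPack {
  name: string;
  credits: number;
  priceEur: number;
}

const STARTER: CreditPack = { name: 'Starter', credits: 100, priceEur: 3 };
const STANDARD: CreditPack = { name: 'Standard', credits: 500, priceEur: 12.5 };
const LARGE: CreditPack = { name: 'Large', credits: 2000, priceEur: 40 };
const CREDIT_PACKS: CreditPack[] = [STARTER, STANDARD, LARGE];

function eurPerCredit(pack: CreditPack): number {
  return pack.priceEur / pack.credits;
}

// Effective price of one SMS segment for a given country multiplier.
function eurPerSms(pack: CreditPack, multiplier: number): number {
  return eurPerCredit(pack) * multiplier;
}

for (const pack of CREDIT_PACKS) {
  const perCredit = eurPerCredit(pack).toFixed(3);
  const perGermanSms = eurPerSms(pack, 3).toFixed(3);
  console.log(`${pack.name}: EUR ${perCredit} per credit, EUR ${perGermanSms} per 3-credit SMS`);
}
```

A credit costs €0.03 on Starter, €0.025 on Standard, and €0.02 on Large, so a 3-credit message to Germany ranges from €0.09 down to €0.06 depending on the pack.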

The Reserve-Confirm Credit Pattern

This is the hardest concurrency problem in the system. Two merchants in the same org trigger SMS sends simultaneously. Without locking, both read "50 credits available" and both spend 30 — the org ends up at -10 credits.

The solution: reserve credits atomically before the external API call, confirm on success, release on failure. The reservation window is the blast radius of a crash.

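import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

// Minimal error type so the snippet stands alone
class InsufficientCreditsError extends Error {
  constructor(readonly orgId: string, readonly requested: number) {
    super(`Org ${orgId} cannot cover ${requested} SMS credits`);
  }
}

// Reserve credits: SELECT ... FOR UPDATE serializes concurrent reservations per org
async function reserveSmsCredits(orgId: string, credits: number) {
  return prisma.$transaction(async (tx) => {
    // Lock the balance row; concurrent transactions for this org wait here
    const rows = await tx.$queryRaw<{ available_credits: number }[]>`
      SELECT available_credits FROM org_sms_credit_balances
      WHERE organization_id = ${orgId}
      FOR UPDATE
    `;

    // A missing row means the org has no balance yet; treat it as zero
    const available = rows[0]?.available_credits ?? 0;
    if (available < credits) {
      throw new InsufficientCreditsError(orgId, credits);
    }

    // Deduct and create the ledger entry in the same transaction
    await tx.orgSmsCreditBalance.update({
      where: { organizationId: orgId },
      data: { availableCredits: { decrement: credits } },
    });

    return tx.orgSmsCreditLedger.create({
      data: {
        organizationId: orgId,
        entryType: 'SEND_RESERVE',
        credits: -credits,
        status: 'RESERVED',
      },
    });
  });
}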

If Twilio accepts the message, we confirm the reservation (update status to CONFIRMED). If Twilio rejects it, we release the credits immediately. If the process crashes mid-send, nothing confirms or releases the reservation, so a cleanup cron finds stale reservations older than 30 minutes and returns them to the pool.
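The full reservation lifecycle (reserve, confirm, release, stale cleanup) can be modeled in memory to make the state transitions explicit. This is a sketch under stated assumptions: `InMemoryCreditPool` and its method names are hypothetical, and a single-threaded in-memory class shows only the state machine, not the row locking that the database version depends on.

```typescript
type ReservationStatus = 'RESERVED' | 'CONFIRMED' | 'RELEASED';

interface Reservation {
  id: number;
  credits: number;
  status: ReservationStatus;
  reservedAtMs: number;
}

const STALE_AFTER_MS = 30 * 60 * 1000; // 30 minutes, per the article

class InMemoryCreditPool {
  private reservations = new Map<number, Reservation>();
  private nextId = 1;

  constructor(public available: number) {}

  reserve(credits: number, nowMs: number): Reservation {
    if (credits > this.available) throw new Error('insufficient credits');
    this.available -= credits;
    const reservation: Reservation = {
      id: this.nextId++, credits, status: 'RESERVED', reservedAtMs: nowMs,
    };
    this.reservations.set(reservation.id, reservation);
    return reservation;
  }

  // The provider accepted the message: the credits are spent for good.
  confirm(id: number): void {
    const r = this.reservations.get(id);
    if (r && r.status === 'RESERVED') r.status = 'CONFIRMED';
  }

  // The provider rejected the message: return the credits to the pool.
  release(id: number): void {
    const r = this.reservations.get(id);
    if (!r || r.status !== 'RESERVED') return; // never refund twice
    r.status = 'RELEASED';
    this.available += r.credits;
  }

  // Cleanup cron: release reservations that were never confirmed or released.
  releaseStale(nowMs: number): number {
    let released = 0;
    this.reservations.forEach((r) => {
      if (r.status === 'RESERVED' && nowMs - r.reservedAtMs > STALE_AFTER_MS) {
        this.release(r.id);
        released++;
      }
    });
    return released;
  }
}
```

Release only acts on reservations that are still RESERVED, so a late webhook or a second cron pass cannot refund the same credits twice.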

This pattern generalizes beyond SMS — we use the same Reserve-Confirm approach for AI token billing. Any SaaS with consumable credits and concurrent usage needs something like this.

How to Manage Per-Tenant Phone Numbers in a Multi-Tenant SaaS

Each shop gets a dedicated Twilio phone number ($1/month). Why not shared short codes? Three reasons: deliverability (carriers trust dedicated numbers more), isolation (one shop's spam complaints can't tank another's reputation), and debugging (each number maps to exactly one tenant).

The number lifecycle has four states: provision on enable → active during normal operation → 30-day hold on disable (prevents accidental number loss if a merchant toggles SMS off temporarily) → release via a daily cron at 3AM that checks for numbers past their hold date.
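The lifecycle can be written down as a small state machine. In this sketch the state names, the `ShopNumber` shape, and all three functions are hypothetical. Restoring the same number when a merchant re-enables SMS during the hold is an assumption drawn from the stated purpose of the hold.

```typescript
type NumberState = 'PROVISIONING' | 'ACTIVE' | 'HOLD' | 'RELEASED';

interface ShopNumber {
  phone: string;
  state: NumberState;
  holdUntil: Date | null; // set only while the number is on hold
}

const HOLD_MS = 30 * 24 * 60 * 60 * 1000; // 30-day hold, per the article

function disableSms(n: ShopNumber, now: Date): ShopNumber {
  if (n.state !== 'ACTIVE') return n;
  return { ...n, state: 'HOLD', holdUntil: new Date(now.getTime() + HOLD_MS) };
}

// Assumption: re-enabling during the hold restores the same number.
function reenableSms(n: ShopNumber): ShopNumber {
  if (n.state !== 'HOLD') return n;
  return { ...n, state: 'ACTIVE', holdUntil: null };
}

// What the daily release cron would select.
function dueForRelease(numbers: ShopNumber[], now: Date): ShopNumber[] {
  return numbers.filter(
    (n) => n.state === 'HOLD' && n.holdUntil !== null && n.holdUntil.getTime() <= now.getTime(),
  );
}

const activeNumber: ShopNumber = { phone: '+37060000000', state: 'ACTIVE', holdUntil: null };
const heldNumber = disableSms(activeNumber, new Date('2024-01-01T00:00:00Z'));
console.log(heldNumber.state, heldNumber.holdUntil?.toISOString());
```

Keeping the release decision in a pure filter makes the cron trivial to test: it only needs the current time and the list of numbers.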

European long codes offer good delivery rates with higher latency than US domestic. Design for async — never block checkout on SMS. The queue handles everything after the customer leaves the checkout page.

How to Handle SMS Quiet Hours Across Timezones with BullMQ

We enforce a conservative 10AM–7PM send window. But whose 10AM? A Lithuanian merchant (UTC+2) selling to a Finnish customer (UTC+2) and a German customer (UTC+1) needs the quiet hours check to use the customer's timezone, not the server's or the merchant's.

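// Country → standard-time UTC offset for quiet hours enforcement.
// Limitation: fixed offsets ignore daylight saving time, and multi-zone
// countries (US, CA, AU) collapse to a single representative offset.
const SMS_COUNTRY_UTC_OFFSETS: Record<string, number> = {
  US: -5, CA: -5, GB: 0, IE: 0, PT: 0,
  DE: 1, FR: 1, NL: 1, BE: 1, AT: 1, CH: 1,
  IT: 1, ES: 1, SE: 1, DK: 1, NO: 1, PL: 1,
  LT: 2, LV: 2, EE: 2, FI: 2,
  AU: 10, NZ: 12,
};

// `now` is injectable so the check can be unit-tested at a fixed time
function isInSendWindow(country: string, now: Date = new Date()): boolean {
  const offset = SMS_COUNTRY_UTC_OFFSETS[country] ?? 0; // unknown country: fall back to UTC
  const recipientHour = (now.getUTCHours() + offset + 24) % 24;
  return recipientHour >= 10 && recipientHour < 19; // 10AM–7PM local
}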

When the BullMQ worker picks up a job and finds the recipient is in quiet hours, it doesn't discard the job — it re-queues it with a calculated delay until the next send window opens. BullMQ's delayed job mechanism handles this natively; we just compute the milliseconds until 10AM in the recipient's timezone.
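The delay calculation might look like the sketch below. `msUntilSendWindow` is a hypothetical helper that takes the recipient's UTC offset in whole hours, using the same fixed-offset simplification as the country map.

```typescript
const WINDOW_START_HOUR = 10; // 10AM local
const WINDOW_END_HOUR = 19;   // 7PM local
const HOUR_MS = 60 * 60 * 1000;

// Returns 0 if the window is open, otherwise the milliseconds until it next opens.
function msUntilSendWindow(utcOffsetHours: number, now: Date): number {
  // Shift the clock so the UTC getters read as the recipient's local time.
  const local = new Date(now.getTime() + utcOffsetHours * HOUR_MS);
  const hour = local.getUTCHours();
  if (hour >= WINDOW_START_HOUR && hour < WINDOW_END_HOUR) return 0;

  // Next opening: today at 10:00 if it is still early, otherwise tomorrow.
  const nextOpen = new Date(local.getTime());
  nextOpen.setUTCHours(WINDOW_START_HOUR, 0, 0, 0);
  if (hour >= WINDOW_END_HOUR) nextOpen.setUTCDate(nextOpen.getUTCDate() + 1);
  return nextOpen.getTime() - local.getTime();
}

// 22:30 in Finland (UTC+2): the window reopens in 11.5 hours.
const finnishDelayMs = msUntilSendWindow(2, new Date('2024-03-01T20:30:00Z'));
console.log(finnishDelayMs / HOUR_MS);
```

The result can be passed as the `delay` when the job is re-queued. Setting the date through `setUTCDate` lets the `Date` object handle month and year rollovers.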

Why 7PM and not 9PM? SMS is intrusive. An 8PM "you forgot your cart" text feels like spam. The 7PM cutoff respects the customer's evening. This is a brand decision as much as a technical one — configurable per-shop if merchants want different windows.

Design tradeoff: Country-level UTC offsets are a simplification. Germany and Spain share UTC+1 but Spain's social clock runs roughly an hour later. We accepted this imprecision — city-level would require IP geolocation, adding complexity and privacy concerns (storing precise location data creates its own GDPR headaches).
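One possible refinement, which is not what the system described here does, is to map each country to an IANA time zone and let the runtime's `Intl.DateTimeFormat` compute the local hour. That keeps the country-level granularity while accounting for daylight saving time. The `COUNTRY_TIME_ZONES` map and `localHour` function are hypothetical, and the map still collapses multi-zone countries to one zone.

```typescript
// Hypothetical mapping: one representative IANA zone per country.
const COUNTRY_TIME_ZONES: Record<string, string> = {
  DE: 'Europe/Berlin',
  FI: 'Europe/Helsinki',
  LT: 'Europe/Vilnius',
  GB: 'Europe/London',
};

function localHour(country: string, at: Date): number {
  const timeZone = COUNTRY_TIME_ZONES[country] ?? 'UTC'; // unknown country: fall back to UTC
  const parts = new Intl.DateTimeFormat('en-GB', {
    timeZone,
    hour: 'numeric',
    hour12: false,
  }).formatToParts(at);
  const hourPart = parts.find((p) => p.type === 'hour');
  if (!hourPart) throw new Error('formatter returned no hour part');
  // Some runtimes render midnight as "24" in 24-hour mode, so normalize.
  return Number(hourPart.value) % 24;
}

// The same UTC instant lands on different local hours in summer and winter.
console.log(localHour('DE', new Date('2024-07-01T08:00:00Z'))); // summer time
console.log(localHour('DE', new Date('2024-01-15T08:00:00Z'))); // standard time
```

With fixed offsets, 08:00 UTC is always 09:00 in Germany. With zone data it is 10:00 in July, which is the difference between sending inside and outside the window.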

How to Make SMS Cart Recovery GDPR Compliant

GDPR compliance for recovery SMS isn't a checkbox; it's an architecture. We use four independent enforcement layers, which we call the 4-Layer Consent Stack. Each layer blocks unauthorized sends on its own, so a failure at any single layer cannot result in an unauthorized message.

Layer 1: Frontend — Explicit Opt-In

The SMS consent checkbox is unchecked by default. No dark patterns, no pre-checked boxes. When checked, the phone number field is validated client-side with libphonenumber-js before the form can submit. The user must actively choose to receive SMS.

Layer 2: Backend — Pre-Payment Validation

A shared validateSmsConsent() function runs before payment creation in both the Stripe and MakeCommerce checkout handlers. If consent is true but the phone is missing or fails E.164 parsing, the API returns a 400 error. This prevents paying payment provider fees for a request we'd reject anyway.

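import { parsePhoneNumberFromString } from 'libphonenumber-js';

// Minimal error type so the snippet stands alone
class ValidationError extends Error {}

// Shared validation: runs before PaymentIntent/MC transaction creation
function validateSmsConsent(consent: boolean, phone?: string): void {
  if (!consent) return; // No consent = no validation needed

  const trimmed = phone?.trim();
  if (!trimmed) {
    throw new ValidationError('Phone number is required when SMS consent is enabled');
  }

  // Assumes parsing goes through libphonenumber-js, as on the frontend.
  // Without a default country, only international (+...) numbers parse.
  const parsed = parsePhoneNumberFromString(trimmed);
  if (!parsed?.isValid()) {
    throw new ValidationError('Invalid phone number for SMS consent');
  }
}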
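To show how one validator can sit in front of two payment providers, here is a self-contained sketch. The wrapper `withSmsConsentCheck`, the request and result shapes, and the two handler names are hypothetical. The E.164 regular expression is a simplified stand-in for libphonenumber-js so the example has no dependencies.

```typescript
class ValidationError extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, ValidationError.prototype); // keeps instanceof reliable
  }
}

// Stand-in for libphonenumber-js: a purely syntactic E.164 check.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;

function validateSmsConsent(consent: boolean, phone?: string): void {
  if (!consent) return;
  const trimmed = phone?.trim();
  if (!trimmed) throw new ValidationError('Phone number is required when SMS consent is enabled');
  if (!E164_PATTERN.test(trimmed)) throw new ValidationError('Invalid phone number for SMS consent');
}

interface CheckoutRequest { smsConsent: boolean; phone?: string; }
interface HttpResult { status: number; body: string; }

// Wraps any provider-specific payment creator with the shared consent check.
function withSmsConsentCheck(createPayment: (req: CheckoutRequest) => string) {
  return (req: CheckoutRequest): HttpResult => {
    try {
      validateSmsConsent(req.smsConsent, req.phone);
    } catch (err) {
      if (err instanceof ValidationError) return { status: 400, body: err.message };
      throw err;
    }
    return { status: 200, body: createPayment(req) }; // provider is only called after validation
  };
}

let paymentsCreated = 0;
const stripeCheckout = withSmsConsentCheck(() => { paymentsCreated++; return 'stripe-payment'; });
const makeCommerceCheckout = withSmsConsentCheck(() => { paymentsCreated++; return 'mc-transaction'; });
```

Because the check runs before `createPayment`, a request with consent but no usable phone number returns 400 without the provider ever being called.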

Layer 3: Carrier — Network-Level STOP

Twilio handles STOP at the carrier level automatically. When a customer replies STOP, the carrier blocks all future messages from that sender number. Zero code needed on our side — but also zero visibility into whether a block is active.

Layer 4: Database — Webhook Consent Clearing

Twilio fires an "unsubscribed" delivery status webhook when a carrier-level STOP occurs. We catch this webhook, look up the customer session via the SMS message record, and set consentSmsRecovery = false plus a consentUpdatedAt timestamp. This prevents scheduling future SMS even if the carrier block is somehow bypassed, and creates an auditable consent trail.
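The consent-clearing step can be sketched with in-memory maps standing in for the database. `clearConsentOnUnsubscribe` and the record shapes are hypothetical. The "unsubscribed" status string follows the description above and should be checked against the provider's actual callback payloads before relying on it.

```typescript
interface Session {
  id: string;
  consentSmsRecovery: boolean;
  consentUpdatedAt: Date | null;
}

interface SmsMessageRecord {
  sid: string;       // provider message ID
  sessionId: string; // session that triggered the send
}

// Returns true if consent was cleared, false if the webhook was ignored.
function clearConsentOnUnsubscribe(
  status: string,
  messageSid: string,
  messages: Map<string, SmsMessageRecord>,
  sessions: Map<string, Session>,
  now: Date,
): boolean {
  if (status !== 'unsubscribed') return false;
  const message = messages.get(messageSid);
  if (!message) return false; // unknown message: nothing to update
  const session = sessions.get(message.sessionId);
  if (!session) return false;
  // Webhooks can be retried; keep the original revocation timestamp.
  if (!session.consentSmsRecovery) return false;
  session.consentSmsRecovery = false;
  session.consentUpdatedAt = now;
  return true;
}

const messagesBySid = new Map<string, SmsMessageRecord>([
  ['SM1', { sid: 'SM1', sessionId: 'sess_1' }],
]);
const sessionsById = new Map<string, Session>([
  ['sess_1', { id: 'sess_1', consentSmsRecovery: true, consentUpdatedAt: null }],
]);
```

Making the handler idempotent matters for the audit trail: a retried webhook should not overwrite the moment consent was actually revoked.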

Why all four layers? GDPR requires demonstrable consent management. A single-layer approach leaves gaps. The carrier layer (Layer 3) is outside your control — you need your own record of consent state. The consentUpdatedAt timestamp on every state change means you can prove exactly when consent was granted or revoked.

This stack applies to any SMS messaging under GDPR: order confirmations, shipping updates, marketing SMS. Substitute TCPA rules for the US and CASL for Canada, and the layers still hold. The specific regulations change; the defense-in-depth architecture doesn't.

Building a Defensive SMS Sending Pipeline with BullMQ

The BullMQ worker processes one SMS job at a time. Every step is a reason not to send — the pipeline is designed to be maximally conservative. Better to skip a valid send than to fire one unauthorized message.

  1. The customer may have checked out. Query the session — if it's no longer abandoned (order completed since scheduling), discard the job silently.
  2. The merchant may have disabled SMS. Check ShopSmsConfig.enabled — merchants can toggle SMS off mid-cycle. Respect it immediately.
  3. The customer may have revoked consent. Re-check consentSmsRecovery and phone validity. The 4-Layer Consent Stack catches stale consent here.
  4. Runaway sends are possible. Rate limit: 1 SMS per phone per 24 hours, 100 per shop per cron cycle. Prevents a bug from becoming a spam incident.
  5. It might be 3AM for the customer. Quiet hours check using country-UTC offset. If outside the 10AM–7PM window, re-queue with a delay until the window opens.
  6. The org may have run out of credits. Reserve-Confirm: atomic SELECT FOR UPDATE. Insufficient credits → skip, don't fail the job. The merchant sees a "low credits" warning in their dashboard.
  7. The message must be localized. Select the i18n template matching the session's locale (7 supported: en, lt, de, fr, es, pl, ru). Interpolate the shop name and a recovery URL signed with an HMAC token. The URL includes &step=cart_recovery_sms for conversion attribution.
  8. Twilio may reject or fail. Call the Twilio API. On success: confirm the credit reservation. On failure: release credits immediately, record the failure status on the SmsMessage record. A delivery webhook will arrive later for final status — but we don't wait for it.
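Steps 1 through 6 above amount to an ordered series of guard clauses, each with its own outcome. The sketch below models them as a pure function over a precomputed context. `SendContext`, `Verdict`, and `preSendChecks` are hypothetical names, and the plain credit comparison stands in for the atomic reservation that the real pipeline performs at step 6.

```typescript
type Verdict =
  | { action: 'send' }
  | { action: 'skip'; reason: string }
  | { action: 'requeue'; delayMs: number };

interface SendContext {
  sessionStillAbandoned: boolean; // step 1
  shopSmsEnabled: boolean;        // step 2
  hasValidConsent: boolean;       // step 3: consent flag plus phone validity
  smsToPhoneLast24h: number;      // step 4
  msUntilSendWindow: number;      // step 5: 0 means the window is open
  availableCredits: number;       // step 6
  creditsNeeded: number;
}

// Checks run in the article's order; the first failing guard decides the outcome.
function preSendChecks(ctx: SendContext): Verdict {
  if (!ctx.sessionStillAbandoned) return { action: 'skip', reason: 'session recovered' };
  if (!ctx.shopSmsEnabled) return { action: 'skip', reason: 'sms disabled' };
  if (!ctx.hasValidConsent) return { action: 'skip', reason: 'no consent' };
  if (ctx.smsToPhoneLast24h >= 1) return { action: 'skip', reason: 'rate limited' };
  if (ctx.msUntilSendWindow > 0) return { action: 'requeue', delayMs: ctx.msUntilSendWindow };
  if (ctx.availableCredits < ctx.creditsNeeded) return { action: 'skip', reason: 'insufficient credits' };
  return { action: 'send' };
}

const readyToSend: SendContext = {
  sessionStillAbandoned: true,
  shopSmsEnabled: true,
  hasValidConsent: true,
  smsToPhoneLast24h: 0,
  msUntilSendWindow: 0,
  availableCredits: 10,
  creditsNeeded: 3,
};
console.log(preSendChecks(readyToSend).action);
console.log(preSendChecks({ ...readyToSend, msUntilSendWindow: 60000 }).action);
```

Returning a verdict instead of throwing keeps "skip" distinct from "fail": a skipped job is a correct outcome, so it should not trigger the queue's retry logic.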

Design Tradeoffs: What We Chose and Why

Every architecture encodes tradeoffs. Here are ours, stated explicitly so you can evaluate whether they fit your constraints:

  • Credits vs. metered billing. Credits are simpler (no real-time usage tracking infrastructure), create natural purchase moments in the merchant dashboard, and merchants understand "200 credits left" better than "$0.045 per segment next month." The tradeoff: merchants may over-buy or under-buy. We mitigate with three pack sizes and a monthly tier allocation (PRO: 100/mo, ULTRA: 500/mo).
  • Dedicated numbers vs. shared short codes. Dedicated costs $1/month per tenant but gives full isolation and better deliverability. Shared short codes are cheaper but risk one tenant's spam complaints tanking deliverability for everyone. For a multi-tenant platform, isolation wins.
  • 45-minute delay after abandonment. Based on industry benchmarks for recovery timing. Too soon feels like surveillance ("how did they know I left?"). Too late and the customer has moved on. The 45-minute mark places SMS 15 minutes after the first recovery email — close enough to reinforce it, far enough to not feel like a barrage. This is a tunable constant, not a hard-coded value.
  • Org-scoped vs. shop-scoped credits. Organizations with multiple shops share one credit pool. Simpler billing, one Stripe checkout, one ledger. The tradeoff: one high-volume shop can drain credits from its siblings. Acceptable at most scales; enterprise customers with this problem can get per-shop pools as a future enhancement.

What This Enables

The system handles N tenants from day one — no per-shop infrastructure provisioning beyond the initial Twilio number. SMS supplements email recovery without replacing it: one more channel, one more chance to recover abandoned revenue.

More importantly, the Reserve-Confirm Credit Pattern and the 4-Layer Consent Stack are reusable beyond SMS. Order confirmation texts, shipping updates, restock alerts, promotional messaging — each uses the same credit pool, the same consent checks, and the same queue infrastructure. The SMS recovery system is the first consumer, not the only one.

Frequently Asked Questions

How much does SMS cart recovery cost per country?

It depends on the carrier. A message to the US or Canada costs 1 credit (~€0.03 on our Starter pack). To the UK or Poland, 2 credits (~€0.06). To Germany, France, or Switzerland, 3 credits (~€0.09). We mapped 23 countries to credit multipliers that reflect actual Twilio per-segment pricing. Unknown countries default to 3x (worst-case) to avoid undercharging.

Is SMS cart recovery GDPR compliant?

It can be, if you implement proper consent management. A single checkbox is necessary but not sufficient. You need: explicit opt-in (unchecked by default), server-side consent validation, a carrier-level STOP mechanism, and a database-level consent record with timestamps. Our 4-Layer Consent Stack implements all four. The key requirement: you must be able to prove when consent was granted and when it was revoked.

When should you send an SMS vs. an email for cart recovery?

Don't choose; send both. Email step 1 fires first. SMS follows at 45 minutes post-abandonment, which is 15 minutes after email step 1. They use separate queues and separate rate limits. If the customer recovers from the email, the SMS job checks session status before sending and silently discards itself. The two channels reinforce each other without duplicating effort.

How do you prevent SMS credit overdraft in a multi-tenant system?

Use the Reserve-Confirm Credit Pattern: lock the credit balance row with SELECT FOR UPDATE before every send attempt, deduct credits inside the same transaction, and release them on failure. A stale-reservation cleanup cron catches edge cases where the process crashes between reservation and confirmation. The lock window is milliseconds — long enough to prevent concurrent overdraft, short enough to not affect throughput.
