Business Continuity and Disaster Recovery Plan

Owner: HailBytes CTO function. Last reviewed: 2026-05-10. Last tested: First tabletop exercise scheduled 2026-Q3 (not yet executed). Review cadence: Annual, plus after any significant change in scope.

Audience: Procurement reviewers, enterprise security architects, customer continuity assessors.

Purpose: Document the threat scenarios HailBytes plans for, the customer-continuity options that survive a HailBytes incident or a HailBytes-vanishing scenario, and the HailBytes-side recovery procedures for build, release, and support.


1. Scope and the BYOC framing

Because HailBytes ASM and SAT run inside the customer’s own cloud account (byoc-architecture.md), the BCP/DR analysis is split between:

  • Customer-side continuity. The customer operates the live data plane. Most “is my deployment still working” scenarios are the customer’s to handle, with HailBytes-provided artifacts to make it possible.
  • HailBytes-side continuity. What customers depend on HailBytes to keep doing: building and signing new releases, distributing them through the public container registry and Marketplaces, providing support, and continuing to exist as a counterparty to the contract.

The plan below treats both. The structural good news is that the customer-side data plane is decoupled from HailBytes’ own operations — a HailBytes outage does not interrupt a running customer deployment.

2. Threat scenarios

For each scenario the plan answers: what is the customer’s continuity posture, what is the HailBytes-side recovery posture, what are the dependencies.

2.1 HailBytes corporate incident — minor

Examples: brief outage of hailbytes.com, support hub downtime, ticket-handling delays.

  • Customer impact: None on the running data plane. New deployments and new image pulls may be delayed if the marketing site is the route a prospect was using; existing customers pulling images from ghcr.io are unaffected.
  • HailBytes-side recovery: standard incident response on the affected subsystem; estimated MTTR < 8 hours for marketing-site and support-hub class incidents.

2.2 HailBytes corporate incident — major

Examples: extended HailBytes corporate outage; HailBytes’ own GitHub organization compromised; HailBytes’ own AWS or Azure account compromised.

  • Customer impact: Existing deployments continue to run. The customer cannot pull new releases during the incident. Customer-elected integrations (Slack, SIEM, etc.) are not affected — they do not route through HailBytes.
  • HailBytes-side recovery: see §3 (GitHub-side) and §4 (cloud-account-side).
  • Customer recommendation during incident: continue running the last-known-good release; do not pull new images until HailBytes publishes “all clear” with a signed advisory.

2.3 Supply-chain compromise of HailBytes’ build pipeline

Examples: an attacker gains commit access to a HailBytes repository, modifies CI to ship a backdoored container image, or compromises the Sigstore keyless signing flow.

  • Detection: today, primary detection happens customer-side. A customer running the documented cosign verify command (security-evidence-package.md §3) will see a verification failure if the signing identity is anomalous, or a certificate-identity claim that does not match the expected GitHub workflow path. HailBytes-side, the Sigstore Rekor transparency log records every signing event publicly, so a post-incident query against Rekor identifies the range of potentially-poisoned images even without proactive monitoring. Active Rekor reconciliation automation is not currently planned; the customer-side cosign verify gate is the primary mitigation and the forensic Rekor log is the secondary.
  • Customer-side mitigation: the same cosign verify gate (security-evidence-package.md §3) fails closed when the signing identity is unexpected. Customers running pinned image digests (recommended, documented in the install guide) are not silently moved to a backdoored image even if :latest is poisoned.
  • HailBytes-side response:
    1. Revoke the compromised OIDC credentials immediately at GitHub.
    2. Identify the range of images potentially poisoned by querying Rekor for the signing-identity range.
    3. Yank the affected images from ghcr.io and post a signed advisory listing affected digests.
    4. Re-build from a clean commit, re-sign, re-publish.
    5. Notify all customers via the security mailing list within 24 hours of confirmed compromise.
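The customer-side verification gate referenced above can be sketched as follows. The certificate-identity pattern and image reference are illustrative assumptions; security-evidence-package.md §3 carries the authoritative command and values.

```shell
# Verify a release image's keyless signature before deploying it.
# The identity regexp and image reference below are illustrative;
# use the exact values from security-evidence-package.md §3.
cosign verify \
  --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
  --certificate-identity-regexp \
    '^https://github\.com/hailbytes/[^/]+/\.github/workflows/.+$' \
  ghcr.io/hailbytes/asm@sha256:<pinned-digest>
```

A non-zero exit, or a certificate-identity claim that does not match the expected GitHub workflow path, is the verification failure described above and should block the pull.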

2.4 Cloud-provider outage affecting Marketplace distribution

Examples: extended AWS Marketplace or Azure Marketplace outage; container registry outage at GitHub.

  • Customer impact: Existing customer deployments are unaffected. New customer deployments require waiting out the cloud-provider outage on the affected marketplace; customers with engineering capacity can build their own Marketplace-equivalent image from source using the open-source Packer template (marketplace/packer/hailbytes-asm.pkr.hcl). In practice this scenario rarely produces customer-visible impact, because the typical deployment path is Marketplace VM artifacts rather than direct ghcr.io pulls during normal operation — the VM image carries the containers it needs.
  • HailBytes-side recovery: wait for cloud-provider recovery and communicate the ETA via the status page once the provider publishes one. The build-from-source path above serves as a partial degradation path for new deployments.

2.5 Customer-tenant ransomware

This is the customer’s incident; it appears here because the HailBytes contract obliges some level of support.

  • Customer-side: the customer’s incident response process applies, including their own backup restoration and forensic procedures.
  • HailBytes-side: support-hour rebuild assistance under the support contract — guidance on re-deploying from Marketplace, restoring from PostgreSQL dump, etc. HailBytes does not (and structurally cannot) hold a backup of the customer’s data, so HailBytes cannot restore data that the customer did not back up themselves.
  • Customer recommendation: documented in the hardening guide — daily PostgreSQL snapshots, 30-day retention, quarterly restore test, off-account snapshot copy for ransomware-resistance.
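The snapshot recommendation above can be sketched as a daily job plus a quarterly drill. Database name, paths, bucket, and the AWS profile below are illustrative assumptions; the hardening guide is authoritative.

```shell
# Daily logical backup with 30-day retention and an off-account copy.
# Database name, paths, bucket, and profile are illustrative.
STAMP=$(date +%Y%m%d)
pg_dump --format=custom --file="/backups/hailbytes-${STAMP}.dump" hailbytes
find /backups -name 'hailbytes-*.dump' -mtime +30 -delete

# Ransomware resistance: copy the snapshot to a bucket in a *different*
# cloud account, so tenant-account compromise cannot reach it.
aws s3 cp "/backups/hailbytes-${STAMP}.dump" \
  "s3://<off-account-bucket>/hailbytes/" --profile backup-account

# Quarterly restore drill: load into a scratch database and sanity-check.
createdb restore_test
pg_restore --dbname=restore_test "/backups/hailbytes-${STAMP}.dump"
psql -d restore_test -c 'SELECT count(*) FROM <a-known-table>;'
```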

2.6 Key-person loss at HailBytes

Covered in detail in key-person-succession.md. Summary: production access map and continuity-of-relationships plan ensure no single person’s loss breaks customer support or release pipeline operations within the recovery windows below.

2.7 HailBytes ceases to exist

The hardest scenario, and the one BYOC structurally protects against best.

  • Customer continuity: see §5 (“Customer continuity under a HailBytes-vanishing scenario”) below. Summary: the customer’s deployment continues to function indefinitely without HailBytes intervention; the customer can operate without HailBytes for at least 90 days using documented procedures, and indefinitely with degraded ability to receive new releases.

3. GitHub-side recovery (build and release pipeline)

The release pipeline lives in GitHub. A pipeline-compromise scenario or a HailBytes-organization compromise scenario requires the following steps:

  1. Containment. Revoke all OIDC federation tokens; rotate all GitHub PAT tokens held by HailBytes for automation; suspend Actions in the affected repositories.
  2. Forensics. Pull Actions workflow run logs for the past 30 days. Reconcile every signed image (Rekor log) against an expected-signer ledger maintained by the security function.
  3. Restoration. Restore from the last clean commit. The release pipeline workflows (.github/workflows/build.yml, .github/workflows/ci.yml) are version-controlled and recoverable from any local working copy. Signing identity is re-established by configuring the new OIDC trust policy in Fulcio (Sigstore) for the new GitHub workflow path.
  4. Communication. Signed security advisory to the customer mailing list and on the marketing site within 24 hours of containment confirmation.

Expected restoration window from incident detection to “new releases can ship again”: 5 business days, dominated by forensic time, not technical recovery.
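The Rekor reconciliation in step 2 can be sketched with cosign itself, which checks transparency-log inclusion during verification. The identity pattern and digest are illustrative assumptions, and the JSON field path may differ across cosign versions.

```shell
# Verify a published image and surface its transparency-log index for
# comparison against the expected-signer ledger. Identity regexp and
# digest are placeholders; JSON layout varies by cosign version.
cosign verify \
  --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
  --certificate-identity-regexp '^https://github\.com/hailbytes/' \
  --output json \
  ghcr.io/hailbytes/asm@sha256:<digest> \
  | jq '.[].optional.Bundle.Payload.logIndex'
```

Repeating this over every published digest bounds the set of images whose signing events fall inside the suspect window.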

4. Cloud-account-side recovery (HailBytes’ own infrastructure)

HailBytes operates a small footprint of cloud infrastructure (marketing site on Cloudflare, Support Hub on Cloudflare Workers/Pages with KV/D1 storage, billing back-office, build VMs in AWS and Azure). Recovery steps:

  1. Containment. Rotate cloud account root credentials; revoke all IAM keys; reset SSO. Disable any third-party integrations connected to the compromised account.
  2. Restoration. Marketing site is statically built from this hailbytes-static repository and deployed via Cloudflare Pages; rebuild from a clean source commit. The Support Hub is a HailBytes-built application running on Cloudflare Workers + Pages with KV/D1 as its storage backend; the application code is version-controlled (rebuildable from source in the same Cloudflare account), and persistent state is restored from the most recent KV/D1 snapshot. Build VMs are recreated by Packer from version-controlled templates.
  3. Audit. Reconcile Marketplace listings (AWS and Azure) against expected published versions; revoke any unexpected listing.
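The KV/D1 snapshot-and-restore in step 2 can be sketched with Wrangler. The database and namespace names are illustrative assumptions, and command syntax varies across Wrangler versions, so confirm against `wrangler d1 --help` before relying on this.

```shell
# Snapshot the Support Hub's D1 state (run on the snapshot cadence).
wrangler d1 export support-hub --remote --output="d1-$(date +%Y%m%d).sql"

# Restore: replay the most recent snapshot into a freshly provisioned
# database in the recovered (or new) Cloudflare account.
wrangler d1 execute support-hub --remote --file="d1-20260510.sql"

# KV namespaces: bulk-load key-value pairs from the latest KV snapshot.
wrangler kv bulk put restored-keys.json --namespace-id=<namespace-id>
```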

Expected restoration window: 24–72 hours depending on the subsystem.

5. Customer continuity under a HailBytes-vanishing scenario

The strongest BCP/DR claim HailBytes can make to a customer is: even if HailBytes ceases to exist tomorrow, your deployment continues to function and you can operate it for an extended period without HailBytes. The mechanisms below make that concrete.

5.1 Last-known-good container images on public registry

All HailBytes ASM and SAT container images are published to ghcr.io/hailbytes/* as public images. They are not gated by license check or HailBytes-controlled authentication. A customer who has pulled a release continues to have access to that release as long as GitHub continues to host the registry, even if HailBytes is not present to issue new releases.

HailBytes’ commitment to customers: at any contract end (including involuntary dissolution), the last-known-good images remain pullable; HailBytes will not unpublish images that customers depend on.

5.2 IaC reproducibility

The product is buildable from source. The Packer template (marketplace/packer/hailbytes-asm.pkr.hcl), provisioning scripts, hardening script, and Docker-Compose files are all in the public repositories under MIT-style licensing — see the LICENSE file in each repo for exact terms. A customer with engineering capacity can build their own Marketplace-equivalent image from source.
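Under those assumptions, a build-from-source run looks roughly like the following. The repository URL is a placeholder, and any required variables (region, instance type, and so on) are defined in the template's own variable block.

```shell
# Rebuild a Marketplace-equivalent image from the public template.
# Repository URL is a placeholder; consult the template's variable
# block for required -var inputs.
git clone <hailbytes-asm-repo-url>
cd <repo>/marketplace/packer
packer init .                              # fetch required plugins
packer validate hailbytes-asm.pkr.hcl      # catch template errors early
packer build hailbytes-asm.pkr.hcl         # produce the VM image
```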

5.3 Source-code escrow

Status: offered on request for enterprise customers. The current source is public open-source (MIT-style) for ASM and SAT, so a traditional escrow arrangement is more conservative than the situation requires — the source code is already accessible to the customer. For customers whose procurement process requires a formal escrow agreement regardless, HailBytes will sign one with a reputable third-party escrow provider, with the customer bearing the contract-administration fees.

5.4 Operating without HailBytes for 90+ days

A customer with the deployment artifacts and the open-source repositories can:

  • Continue running the deployed version indefinitely; nothing in the running stack reaches out to HailBytes.
  • Apply security patches by rebuilding affected containers from source (Poetry / Go module dependencies are pinned and pullable from public package indexes).
  • Add features or fix bugs by modifying the open-source code.
  • Use the open-source community channel (GitHub issues, community Discord) for peer support.

The deployment has no time-based expiry; there is no license-server heartbeat to fail.
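Since this plan repeatedly recommends pinned image digests (§2.3), a customer can spot-check that mechanically. A minimal POSIX-shell sketch, assuming a standard Compose file layout:

```shell
#!/bin/sh
# Fail if any image in a Compose file is referenced by a mutable tag
# instead of an immutable digest (image@sha256:...).
check_pinned() {
  unpinned=$(grep -E '^[[:space:]]*image:' "$1" | grep -v '@sha256:' || true)
  if [ -n "$unpinned" ]; then
    printf 'UNPINNED image references:\n%s\n' "$unpinned"
    return 1
  fi
  echo "all images pinned by digest"
}
```

`check_pinned docker-compose.yml` exits non-zero on any `:latest`-style reference, which makes it easy to wire into a deployment gate.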

5.5 What the customer loses without HailBytes

Honestly:

  • Vendor support (response-time SLA, escalation to engineering).
  • Pre-built Marketplace images (the customer would need to build their own from the open-source repo).
  • Pre-vetted release artifacts with Trust Pack archives.
  • Coordinated security advisories.

These are real losses. The point of §5 is that they are degradation, not service interruption.

6. Customer continuity under a HailBytes-incident scenario

A specific case worth calling out: HailBytes has a corporate-side security incident, but is not dissolved.

  • The customer’s deployment continues to function (§2.2).
  • The customer should pause pulling new images until HailBytes publishes a signed all-clear advisory.
  • The customer should subscribe to HailBytes’ security mailing list before an incident occurs; the mailing list is the primary notification channel.

7. Test cadence

  • Annual tabletop exercise. Scenarios in §2.1, §2.2, §2.3, §2.4, §2.6. Run by David McHale (CTO function), with Lost Rabbit Digital as external facilitator for the first (2026) exercise.
  • First tabletop exercise scheduled: 2026-Q3.
  • Customer-side restore drill recommendation: customers should run PostgreSQL restore from snapshot quarterly. The hardening guide documents the procedure.
  • HailBytes pipeline-recovery dry run: once before 2026-Q3 to validate §3 mechanics.

8. Communication plan

  • Internal: CTO function declares incident; assigns IC; opens an active-incident channel in HailBytes’ private collaboration tooling (separate from any public community channel).
  • External, customers: security mailing list (signed advisory) + status page at hailbytes.com/status/ + direct email to enterprise security contacts on record.
  • External, regulators: ANPD and EU supervisory authorities for incidents meeting LGPD Art. 48 / GDPR Art. 33 thresholds. See lgpd-compliance.md §5 and §13 for the threshold analysis.

9. Recovery time and recovery point objectives

These are HailBytes’ target windows for HailBytes-operated systems and represent operating goals, not contractual obligations unless incorporated into a signed agreement. Customer-side RTO/RPO is the customer’s responsibility (see §2.5 on snapshots).

System | RTO | RPO | Notes
--- | --- | --- | ---
ghcr.io/hailbytes/* (third-party operated; GitHub) | Inherits GitHub’s | Inherits GitHub’s | Last-known-good images stay pullable through any HailBytes incident
Build pipeline (.github/workflows/*) | 5 business days from incident detection (worst case: compromise scenario) | ~ 0 (source is version-controlled; rebuild from a clean commit) | §3
Marketing site (hailbytes.com) | 24 hours | ~ 0 (rebuilt from hailbytes-static repo) | Statically generated
Support Hub (HailBytes-built on Cloudflare Workers/Pages, KV/D1 storage) | 24 hours | Depends on KV/D1 snapshot cadence (target: hourly snapshots, 7-day retention) | Application code rebuildable from source; persistent state restored from most recent snapshot
Customer deployments | Customer-managed | Customer-managed | HailBytes’ RTO/RPO does not apply

Cross-references: key-person-succession.md for §2.6; byoc-architecture.md for the structural premise that customer deployments survive HailBytes incidents; security-evidence-package.md §3 for the image-signature verification a customer would use during §2.3.