Production Engineering

Overview

The goal of this document is to outline strict processes that those with access to PRODUCTION systems MUST follow.

It is intended to mitigate most classes of known threats while still allowing for high productivity via compartmentalization.

Production access that exposes the personal data or assets of others is to be taken very seriously, and assigned only to those whose tasks cannot be performed without it.

These are the rules we wish to meet and model at #! and further recommend to those who manage targeted production systems.

Assumptions

  1. All of our screens are visible to an adversary
  2. All of our keyboards are logging to an adversary
  3. Any firmware/bootloaders not verified on every boot are compromised
  4. Any host OS with network access is compromised
  5. Any guest OS used for any purpose other than prod access is compromised
  6. At least one member of the PRODUCTION team is always compromised
  7. At least one maintainer of third party code we depend on is compromised

Requirements

  1. A PRODUCTION ENGINEER SHOULD have the following:
    • A clear set of tasks that can’t be completed without access
    • Experience in Red/Blue team, CTF, or documented CVE discoveries
    • A demonstrated working knowledge of:
      • Low level unix understanding. E.g. /proc, filesystems, kernel modules
      • Low level debugging techniques. E.g. strace, inotify, /proc, LD_PRELOAD
      • Linux kernel security features. E.g. seccomp, pam, cgroups, permissions
      • Common attack classes. E.g. XSS, buffer overflows, social engineering
    • A majority passed interview panel with current PRODUCTION access team
    • An extensive background check clean of any evidence of dishonesty
    • Training on secret coercion and duress protocols shared with team
  2. A PRODUCTION ENGINEER MUST NOT expose a CRITICAL SECRET to:
    • a screen
    • a keyboard
    • an unsupervised peer
    • an internet-connected OS other than the destination that requires it
  3. A recommended ENTROPY SOURCE MUST be used to generate a CRITICAL SECRET
  4. An OS that accesses PRIVILEGED SYSTEMS MUST NOT be used for anything else
  5. Any OS without verified boot from firmware up MUST NOT be trusted
  6. Manual PRIVILEGED SYSTEM mutations MUST be approved, witnessed, and recorded
  7. PRIVILEGED SYSTEM mutations MUST be automated and repeatable via code
  8. Any code without SECURITY REVIEW MUST NOT be trusted
  9. Any code we cannot review ourselves as desired MUST NOT be trusted
  10. PRIVILEGED SYSTEM access requires physical interaction with an approved HSM

Implementation

Tools

  • HARDENED WORKSTATION
  • PERSONAL HSM
  • ENTROPY SOURCE

Setup

Install QubesOS

  • Enable full disk encryption
  • Set up Verified Boot
    • Choose “factory reset” after QubesOS install in boot options
    • Sign with PERSONAL HSM
    • Verify every boot by inserting PERSONAL HSM and observing green LED
  • Set up FDE password in hardware device via PERSONAL HSM
  • Set up Challenge Response authentication with PERSONAL HSM
    • PERSONAL HSM + password to unlock screen
    • Automatically lock screen when PERSONAL HSM removed
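
The challenge-response step above can be illustrated with a minimal sketch. It assumes an HMAC-based scheme, which is one common way such devices implement challenge-response; the function names are hypothetical, and `hsm_respond` is a software stand-in for an operation that would really run inside the PERSONAL HSM. For simplicity the verifier here holds a copy of the secret, which a real deployment may avoid.

```python
import hashlib
import hmac
import secrets


def new_challenge(num_bytes: int = 32) -> bytes:
    """Host side: generate a fresh random challenge for each unlock attempt."""
    return secrets.token_bytes(num_bytes)


def hsm_respond(device_secret: bytes, challenge: bytes) -> bytes:
    """Stand-in for the PERSONAL HSM (assumption: HMAC-SHA256).

    On real hardware this computation happens on the device and
    device_secret never leaves it.
    """
    return hmac.new(device_secret, challenge, hashlib.sha256).digest()


def verify_response(expected_secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Verifier side: recompute the response and compare in constant time."""
    expected = hmac.new(expected_secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)
```

Because every challenge is fresh, a response captured by a keylogger or screen observer is of no use for a later unlock attempt, which fits the assumptions listed at the top of this document.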

Configure Qubes

Vault
  • MUST NOT have internet access
  • Example use cases:
    • Personal GPG keychain management
    • Bulk encrypting/decrypting documents
    • Provisioning OTP devices
Production
  • MUST have internet access limited to only bastion servers
  • MUST manage all credentials via individually approved hardware decryption
    • password-store via PERSONAL HSM or mooltipass
  • SHOULD use whonix as the network gateway
  • Used only to reach PRODUCTION bastion servers and systems beyond them
Work
  • MUST have internet access limited to organization needs and partners
  • MUST manage all credentials via individually approved hardware decryption
    • password-store via PERSONAL HSM or mooltipass
  • SHOULD use whonix as the network gateway
  • Example use cases:
    • Read only access to AWS panel
    • Observe Kibana logs
    • Check organization email and chat
Personal
  • No internet access limits
  • SHOULD only be used for personal services
  • SHOULD use whonix as the network gateway
  • Example use cases:
    • Check personal email / chat
    • Personal finance
Development
  • No internet access limits
  • SHOULD only be used for development
  • SHOULD use whonix as the network gateway
  • SHOULD manage all credentials via individually approved hardware decryption
    • password-store via PERSONAL HSM
  • Example use cases:
    • Read online documentation
    • Authoring code
    • Submitting PRs
    • Doing code review
Disposable
  • SHOULD only be used for development
  • SHOULD use whonix as the network gateway
  • Example use cases:
    • Pentesting
    • Testing untrusted code or applications
    • Competitive research
    • Explore dark web portals for data dumps
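
The network rules for each qube can also be written down as data, which makes them easier to review and test. The following is only a sketch: the destination classes ("bastion", "organization", "partner") are hypothetical labels, not Qubes firewall syntax, and the Disposable qube is left out because no network limit is stated for it above.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class QubePolicy:
    name: str
    # None means "no internet access limits";
    # an empty set means "no network access at all".
    allowed_destinations: Optional[FrozenSet[str]]


# Hypothetical encoding of the rules listed above.
POLICIES: Dict[str, QubePolicy] = {
    "vault": QubePolicy("vault", frozenset()),
    "production": QubePolicy("production", frozenset({"bastion"})),
    "work": QubePolicy("work", frozenset({"organization", "partner"})),
    "personal": QubePolicy("personal", None),
    "development": QubePolicy("development", None),
}


def may_connect(qube: str, destination_class: str) -> bool:
    """Return True if the named qube is allowed to reach this class of destination."""
    policy = POLICIES[qube]
    if policy.allowed_destinations is None:
        return True
    return destination_class in policy.allowed_destinations
```

For example, `may_connect("production", "bastion")` is allowed, while `may_connect("vault", "bastion")` is not.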

Setup Keychain

  • Follow “Advanced” GPG setup guide in Vault Qube
    • Daily driver PERSONAL HSM holds subkeys
    • Separate locked-away PERSONAL HSM holds master key
    • Separate locked-away flash drive holds pubkeys and encrypted private keys

Workflow

The following describes the workflow for taking signed code/artifacts that have stabilized in the DEVELOPMENT environment and are ready to enter the pipeline towards production.

Local

  • PRODUCTION ENGINEER or Software Engineer
    • Authors/tests changes to INFRASTRUCTURE REPOSITORY in LOCAL environment
    • makes feature branch
    • makes one or more commits to feature branch
    • signs all commits with PERSONAL HSM
    • Optional: Squashes and re-signs sets of commits as desired
    • Submits code to peer for review
  • PRODUCTION ENGINEER
    • Verifies changes work as intended in “local” environment
    • Verifies changes have solid health checks and recovery practices
    • Merges reviewed branch into master branch with signed merge commit
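
Before merging, a reviewer may want to confirm that every commit on the branch really is signed. A minimal sketch, assuming `git` is on the PATH and the signers' public keys are already trusted in the local keyring; it relies on the `%G?` placeholder of `git log`, which prints `G` for a good signature and other letters (such as `N` for none or `B` for bad) otherwise. The function names are hypothetical, and rejecting every status other than `G` is a deliberately strict assumption.

```python
import subprocess
from typing import List


def unsigned_or_untrusted(log_output: str) -> List[str]:
    """Return the commit hashes whose signature status is anything other than 'G'.

    Expects one "<hash> <status>" pair per line, as produced by
    `git log --format="%H %G?"`.
    """
    rejected = []
    for line in log_output.splitlines():
        if not line.strip():
            continue
        commit, status = line.split()
        if status != "G":
            rejected.append(commit)
    return rejected


def check_range(rev_range: str) -> List[str]:
    """Run git over a revision range (e.g. "master..feature") and list rejected commits."""
    out = subprocess.run(
        ["git", "log", "--format=%H %G?", rev_range],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return unsigned_or_untrusted(out)
```

An empty result from `check_range` means every commit in the range carried a good signature from a trusted key.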

Staging

  • PRODUCTION ENGINEER #1
    • Copies desired changes from LOCAL templates to STAGING templates
    • makes feature branch
    • makes one or more commits to feature branch
    • signs all commits with PERSONAL HSM
    • Optional: Squashes and re-signs sets of commits as desired
    • Submits code to peer for review
  • PRODUCTION ENGINEER #2
    • Optional: significant changes/migrations are tested in “sandbox” env.
    • Merges reviewed branch into master branch with signed merge commit
  • PRODUCTION ENGINEER #1
    • Logs into STAGING toolbox
    • Deploys changes from STAGING “toolbox”

Production

  • PRODUCTION ENGINEER #1
    • Copies desired changes from STAGING templates to PRODUCTION templates
    • makes feature branch
    • makes one or more commits to feature branch
    • signs all commits with PERSONAL HSM
    • Optional: Squashes and re-signs sets of commits as desired
    • Submits code to peer for review
  • PRODUCTION ENGINEER #2
    • Optional: significant changes/migrations are tested in “sandbox” env.
    • Merges reviewed branch into master branch with signed merge commit
  • 2+ PRODUCTION ENGINEERs
    • Log into PRODUCTION toolbox via PRODUCTION bastion
    • Deploy changes from PRODUCTION “toolbox” with witness
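
The production steps above amount to a small set of checks that could be enforced before a deploy is allowed to run. The sketch below is illustrative only: the `ProductionDeploy` record and its field names are hypothetical, and it encodes just three rules from this section (distinct author and merger, a signed merge commit, and at least two engineers present).

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ProductionDeploy:
    author: str                # PRODUCTION ENGINEER #1, who submitted the change
    merger: str                # PRODUCTION ENGINEER #2, who merged it
    merge_commit_signed: bool  # whether the merge commit carries a valid signature
    operators: FrozenSet[str]  # engineers logged into the PRODUCTION toolbox


def deploy_violations(deploy: ProductionDeploy) -> List[str]:
    """Return a list of rule violations; an empty list means the deploy may proceed."""
    violations = []
    if deploy.author == deploy.merger:
        violations.append("author and merger must be different engineers")
    if not deploy.merge_commit_signed:
        violations.append("merge commit must be signed")
    if len(deploy.operators) < 2:
        violations.append("at least two engineers must deploy together")
    return violations
```

Returning the full list of violations, rather than stopping at the first, gives the witness a complete record of why a deploy was refused.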

Emergency Changes

  • 2+ PRODUCTION ENGINEERs
    • Log into PRODUCTION toolbox via PRODUCTION bastion
    • Deploy live fixes from PRODUCTION “toolbox” with witness
  • Regular Production process is followed from here

Changes

Amendments to this document, including the Exceptions section, may be made with a majority vote of current members of the Production Engineering team.

All changes must be via cryptographically signed commits by current Production Engineering team members to limit risk of social engineering.

Direct orders from your superiors that conflict with this document should be considered a product of duress, and thus respectfully ignored.

Appendix

Glossary

MUST, MUST NOT, SHOULD, SHOULD NOT, MAY

These key words correspond to their IETF definitions per RFC 2119.

INFRASTRUCTURE REPOSITORY

This repo, which is where all infrastructure-as-code gets integrated via direct templates or submodules as appropriate.

PRODUCTION

Deployment environment that faces the public internet and is consumed by end users.

STAGING

Internally facing environment that is identical to PRODUCTION and will normally be one release ahead. Intended for use by contributors to test our services and deployment process before we deploy to any public facing environments.

DEVELOPMENT

Environment where changes are rapidly integrated, with or without code review, to aid integration and development.

This environment is never trusted.

LOCAL

Environment designed to run in virtual machines on the workstation of every engineer. Designed to behave as closely as possible to our production environment so engineers can rapidly test changes and catch issues early without waiting on longer round trips to deploy to DEVELOPMENT.

This environment is intended to be the hand-off point between unprivileged contributors and the PRODUCTION ENGINEER team.

SECURITY REVIEW

We consider code to be suitably reviewed if it meets the following criteria:

  • Validated by member of the PRODUCTION ENGINEERING team to do what is expected
  • Audited for vulnerabilities by an approved security reviewer
Approved security reviewers
  • PRODUCTION ENGINEERING team
  • Doyensec
  • Cure53
  • Arch Linux
  • Bitcoin Core
  • Canonical
  • Cloud Native Computing Foundation
  • CoreOS
  • QubesOS
  • Raptor Engineering
  • Fedora Foundation
  • FreeBSD Foundation
  • OpenBSD Foundation
  • Gentoo Foundation
  • Google
  • Guardian Project
  • Hashicorp
  • Inverse Path
  • Linux Foundation
  • The Debian Project
  • Tor Project
  • ZCash Foundation

HARDENED WORKSTATION

A workstation that will come in contact with production write access must meet the following standards:

  • Requires a PERSONAL HSM to perform firmware/boot integrity attestation on every boot
  • Open firmware (with potential exception of Management Engine blob)
  • CPU Management Engine disabled or removed
  • Physical switches for microphone, webcam, wireless, and bluetooth
  • PS/2 interface for Keyboard/touchpad to mitigate USB spoof/crosstalk attacks
Recommended devices
  • Librem 15
  • Librem 13
  • Insurgo PrivacyBeast X230
  • Raptor Computing Blackbird
  • Raptor Computing Talos II

ENTROPY SOURCE

Good entropy sources should always be impossible to predict for a human. These are also typically called a True Random Number Generator or TRNG.

In the event that we cannot -prove- an entropy source is impossible to predict, multiple unrelated entropy sources must be used and combined with each other.

A given string of entropy must:

  • Be at least 256 bits long
  • Be whitened so there are no statistically significant patterns
    • sha3, xor-encrypt-xor, etc
Approved entropy sources
  • Infinite Noise
  • Built-in hardware RNG in a PERSONAL HSM
  • Webcam
  • Microphone
  • Thermal resistor
  • Dice
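
The combining and whitening requirements above can be sketched as follows. This is a minimal illustration, assuming SHA3-256 as the whitening function and raw byte samples already collected from two or more unrelated sources; the function name and the per-sample length check are assumptions. Note that hashing only removes patterns: the output holds no more entropy than the inputs actually contained.

```python
import hashlib
from typing import Iterable

MIN_BITS = 256  # minimum length required of a string of entropy


def combine_entropy(samples: Iterable[bytes]) -> bytes:
    """Combine samples from unrelated sources into one whitened 256-bit string."""
    samples = list(samples)
    if len(samples) < 2:
        raise ValueError("need samples from at least two unrelated sources")
    hasher = hashlib.sha3_256()
    for sample in samples:
        # Assumption: each raw sample must itself be at least 256 bits long.
        if len(sample) * 8 < MIN_BITS:
            raise ValueError("each sample must be at least 256 bits long")
        # Length-prefix each sample so boundaries between sources are unambiguous.
        hasher.update(len(sample).to_bytes(8, "big"))
        hasher.update(sample)
    return hasher.digest()
```

As long as at least one of the inputs is unpredictable to an adversary, the combined output should be as well, which is the reason for mixing multiple sources when no single one can be proven sound.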

PERSONAL HSM

Small form factor HSM capable of doing common required cryptographic operations such as GnuPG smartcard emulation, TOTP, challenge/response, etc.

The following devices are recommended for each respective use case.

WebAuthN / U2F
  • Yubikey 5+
  • u2f-zero
  • Nitrokey
  • OnlyKey
  • MacOS Touchbar
  • ChromeOS Fingerprint ID
Password Management
  • Yubikey 5+
  • Trezor Model T
  • Ledger Nano X
  • Mooltipass
Encryption/Decryption/SSH
  • Yubikey 5+
  • Trezor Model T
  • Ledger Nano X
Firmware Attestation
  • Librem Key
  • Nitrokey

PRODUCTION ENGINEER

One who has limited or complete access to detailed financial data or documents of our customers, as well as any access that might allow mutation or movement of their assets.

PRIVILEGED SYSTEM

Any system that has any level of access beyond that which is provided to all members of the engineering team.

CRITICAL SECRET

Any secret that has partial or complete power to move customer assets.

If a given computer -ever- has PRODUCTION access, then secrets used to manage that system, such as login and full disk encryption credentials, are in scope.