New Service Launch Checklist

Overview

Use this checklist when launching a new microservice, application, or infrastructure component.

Phase 1: Planning & Design

  • Service name decided (follow naming convention: service-name)
  • Tech stack approved (document in project README)
  • Architecture reviewed (create architecture diagram)
  • Data storage needs identified (database, cache, storage)
  • Third-party dependencies listed (APIs, services, licenses)
  • Resource estimates documented (CPU, memory, storage, costs)
  • Security requirements defined (authentication, authorization, data sensitivity)
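
The naming convention above is given only as the placeholder service-name. The sketch below shows one way a name could be checked before a repository or DNS record is created. It assumes the convention means lowercase kebab-case. The function name is_valid_service_name and the 63-character cap (the DNS label limit, relevant because the name is reused in hostnames) are illustrative assumptions, not part of the checklist.

```python
import re

# Assumed convention: lowercase words joined by single hyphens, e.g. "billing-api".
_NAME_RE = re.compile(r"[a-z][a-z0-9]*(-[a-z0-9]+)*")

# DNS labels are limited to 63 characters; the service name is reused in hostnames.
MAX_LEN = 63


def is_valid_service_name(name: str) -> bool:
    """Return True if name matches the assumed kebab-case convention."""
    return len(name) <= MAX_LEN and _NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("billing-api", "Billing_API", "billing--api"):
        print(candidate, is_valid_service_name(candidate))
```

If the team's real convention differs (for example a required prefix), only the regular expression needs to change.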

Phase 2: Repository Setup

  • GitHub repository created

    • Name: company-org/service-name
    • Visibility: Private
    • Initialize with README
  • Team access configured

    • @all-engineering → Read
    • @<your-team> → Write
    • @devops-team → Admin
  • Branch protection enabled on main:

    • Require PR before merge
    • Require 1 approval
    • Require CI checks to pass
    • No force pushes
    • No deletions
  • Repository secrets added:

    • ANSIBLE_VAULT_PASSWORD_STAGING
    • ANSIBLE_VAULT_PASSWORD_PROD
    • Any service-specific API keys
  • README.md created with:

    • Service description
    • Setup instructions
    • Development guide
    • Architecture overview
    • API documentation (if applicable)
  • CODEOWNERS file added (optional):

    * @company-org/your-team
    /ansible/ @company-org/devops-team

Phase 3: CI/CD Setup

  • CI workflow created (.github/workflows/ci.yml):

    • Build
    • Lint
    • Unit tests
    • Integration tests
  • Security scanning workflow (.github/workflows/security.yml):

    • Gitleaks (secret scanning)
    • Dependency scanning
    • SAST (Semgrep or similar)
  • Docker build workflow (if applicable):

    • Build Docker image
    • Scan with Trivy
    • Push to registry
  • Deployment workflows:

    • deploy-staging.yml (auto-deploy on merge to main)
    • deploy-production.yml (manual trigger with approval)
  • Status checks configured (required before merge)

Phase 4: Infrastructure

  • Development environment provisioned:

    • Servers added to inventory/development/hosts.yml
    • Ansible playbook created/updated
    • Services running and accessible
  • Staging environment provisioned:

    • Servers added to inventory/staging/hosts.yml
    • DNS configured: service-name.staging.company.com
    • SSL certificate provisioned
    • Monitoring configured
  • Production infrastructure planned (don't provision yet):

    • Server requirements documented
    • Networking planned (VPC, subnets, firewall)
    • Load balancing (if needed)
    • Backup strategy defined
  • Ansible role created (if new service type):

    roles/service-name/
    ├── tasks/main.yml
    ├── handlers/main.yml
    ├── templates/
    ├── files/
    └── defaults/main.yml
  • Database provisioned (if needed):

    • Database created in dev/staging
    • User and permissions configured
    • Connection string in Ansible Vault
    • Migrations tested

Phase 5: Security & Secrets

  • Secrets management configured:

    • Ansible Vault files created for each environment
    • GitHub Secrets configured for CI/CD
    • No secrets committed to Git (verified)
  • Pre-commit hooks enabled:

    # .pre-commit-config.yaml
    repos:
      - repo: https://github.com/gitleaks/gitleaks
        rev: v8.18.0
        hooks:
          - id: gitleaks
  • Authentication implemented:

    • Keycloak SSO (if user-facing)
    • API authentication (if API service)
    • Service-to-service auth
  • Authorization implemented:

    • Role-based access control
    • Permission checks on sensitive operations
  • Input validation on all endpoints

  • Rate limiting configured (if API)
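
The checklist does not say how rate limiting should be implemented. As one possible illustration, here is a minimal in-process token bucket. The class name TokenBucket and its parameters are hypothetical. The sketch is single-process and not thread-safe, and many teams enforce limits at a gateway or reverse proxy instead.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available and report whether the request may proceed."""
        now = self._clock()
        # Refill according to elapsed time, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=5.0, capacity=10)
    allowed = sum(bucket.allow() for _ in range(20))
    print(f"{allowed} of 20 immediate requests allowed")
```

A request that is not allowed would typically be answered with HTTP 429 (Too Many Requests).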

Phase 6: Observability

  • Logging configured:

    • Structured logging (JSON format)
    • Appropriate log levels
    • Sensitive data not logged
    • Logs shipped to central logging system
  • Metrics exposed:

    • Application metrics (requests, errors, latency)
    • Resource metrics (CPU, memory)
    • Business metrics (if applicable)
  • Health check endpoint implemented:

    GET /health → 200 OK
  • Monitoring dashboards created:

    • Service health dashboard
    • Performance metrics
    • Error rates
  • Alerts configured:

    • Service down alert
    • High error rate alert
    • Performance degradation alert
    • Resource exhaustion alert
  • Alerts route to Rocket.Chat #alerts channel
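
To make the logging and health check items above concrete, the following sketch uses only the Python standard library. It assumes a Python service. The names JsonFormatter, make_logger, and HealthHandler, the JSON field names, and port 8080 are illustrative choices, not requirements of this checklist. A real service would normally use its web framework's routing and its existing logging setup.

```python
import json
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def make_logger(name: str = "service-name") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


log = make_logger()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # GET /health returns 200 with a small JSON body; anything else returns 404.
        if self.path == "/health":
            status, payload = 200, {"status": "ok"}
        else:
            status, payload = 404, {"error": "not found"}
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Route the default access log through the JSON logger.
        log.info(format % args)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), HealthHandler).serve_forever()
```

Keeping the health response small and free of internal details fits the "Sensitive data not logged" item, since the endpoint is often reachable by monitoring systems outside the service.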

Phase 7: Documentation

  • Internal documentation in Docusaurus:

    • Service overview in docs/03-projects/service-name.md
    • Architecture diagram
    • API documentation
    • Runbooks for common operations
  • Deployment documentation:

    • How to deploy
    • How to rollback
    • Configuration options
    • Environment variables
  • Troubleshooting guide:

    • Common errors and solutions
    • Where to find logs
    • Who to contact for help
  • Announced in #engineering channel:

    New service launched: service-name
    Purpose: [brief description]
    Docs: [link]
    Staging: https://service-name.staging.company.com
    Questions: Ask in #engineering or tag @your-team

Phase 8: Testing & Validation

  • Unit tests written (>80% coverage target)

  • Integration tests written

  • End-to-end tests (if applicable)

  • Load testing performed:

    • Expected load tested
    • Performance benchmarks recorded
    • Bottlenecks identified and addressed
  • Security testing:

    • Checked against the OWASP Top 10
    • Authentication/authorization tested
    • Input validation tested
    • Dependency vulnerabilities scanned
  • Staging tested thoroughly before production

Phase 9: Production Deployment

Proceed only after every item above is complete:

  • Production infrastructure provisioned:

    • Servers created in DigitalOcean
    • Added to inventory/production/hosts.yml
    • Provisioned via Ansible
  • DNS configured:

    • service-name.company.com points to load balancer/server
    • SSL certificate configured
  • Production secrets configured:

    • Ansible Vault production secrets added
    • GitHub Secrets for production added
  • Monitoring verified in production

  • Alerts tested (trigger test alert, verify delivery)

  • Backup configured and tested:

    • Database backups (if applicable)
    • Backup restoration tested
  • Initial production deployment:

    • Triggered via GitHub Actions
    • Health checks passing
    • Monitoring data flowing
    • No errors in logs
  • Production announcement:

    🚀 service-name is now in production!

    URL: https://service-name.company.com
    Status: https://service-name.company.com/health
    Monitoring: [link to dashboard]
    On-call: @your-team
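
The "Health checks passing" item can be verified by hand or scripted. The sketch below polls the /health URL from the announcement template until it returns 200 or the attempts run out. The function names, the retry count, and the delay are assumptions chosen for illustration. The fetch and sleep parameters are injectable only so the logic can be tested without network access.

```python
import sys
import time
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails entirely."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered with a 4xx/5xx status
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, and similar


def wait_until_healthy(url, attempts=10, delay=3.0,
                       fetch=fetch_status, sleep=time.sleep) -> bool:
    """Poll url until it returns 200; give up after `attempts` tries."""
    for attempt in range(1, attempts + 1):
        if fetch(url) == 200:
            return True
        if attempt < attempts:
            sleep(delay)
    return False


if __name__ == "__main__":
    ok = wait_until_healthy("https://service-name.company.com/health")
    sys.exit(0 if ok else 1)
```

The non-zero exit code on failure lets a deployment workflow step fail when the service never becomes healthy.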

Phase 10: Post-Launch

  • Monitor closely for the first 24-48 hours

  • Incident response plan documented:

    • Who's on-call for this service
    • Escalation path
    • Common issues and fixes
  • Performance baseline established:

    • Normal CPU/memory usage
    • Typical request rates
    • Expected error rates
  • Post-launch review scheduled (1 week after launch):

    • What went well
    • What could be improved
    • Action items for next service
  • Documentation updated based on real-world usage
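
A performance baseline is easier to compare against later if it is recorded as a few numbers. The sketch below summarizes one sample window into median latency, 95th-percentile latency, and error rate. The function name summarize_baseline, the choice of these three metrics, and the input shape are assumptions. In practice the values would usually come from the monitoring system rather than raw samples.

```python
import statistics


def summarize_baseline(latencies_ms, total_requests, error_count):
    """Summarize a sample window into baseline numbers for later comparison."""
    if len(latencies_ms) < 2 or total_requests <= 0:
        raise ValueError("need at least two latency samples and one request")
    # quantiles(n=100) returns the 99 cut points p1..p99; index 94 is p95.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": cuts[94],
        "error_rate": error_count / total_requests,
    }


if __name__ == "__main__":
    samples = [12.0, 15.0, 11.0, 40.0, 13.0, 14.0, 90.0, 12.5]
    print(summarize_baseline(samples, total_requests=1000, error_count=4))
```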

Ongoing Maintenance

After launch, ensure:

  • Regular dependency updates (Dependabot)
  • Security patches applied promptly
  • Monitoring dashboards reviewed regularly
  • Runbooks kept up to date
  • Team trained on service operations

Getting Help

Questions about this checklist?

  • Ask in #devops channel
  • Tag @devops-team
  • Review other services in docs/03-projects/ for examples

Need approval for something?

  • Infrastructure costs → DevOps lead
  • Production access → Manager + DevOps
  • Third-party services → Budget owner