New Service Launch Checklist

Overview

Use this checklist when launching a new microservice, application, or infrastructure component.

Phase 1: Planning & Design

Service name decided (follow naming convention: service-name)
Tech stack approved (document in project README)
Architecture reviewed (create architecture diagram)
Data storage needs identified (database, cache, storage)
Third-party dependencies listed (APIs, services, licenses)
Resource estimates documented (CPU, memory, storage, costs)
Security requirements defined (authentication, authorization, data sensitivity)
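
The naming convention above can be checked automatically before a repository or DNS record is created. The sketch below assumes the convention means lowercase kebab-case starting with a letter; the checklist only gives `service-name` as the pattern, so the exact rule, the `is_valid_service_name` helper, and the example names are illustrative. The 63-character cap is the DNS label limit, which matters because the name is reused in hostnames such as service-name.staging.company.com.

```python
import re

# Assumed convention: lowercase kebab-case, starting with a letter,
# e.g. "billing-api". Adjust the pattern to the team's actual rule.
SERVICE_NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(-[a-z0-9]+)*$")


def is_valid_service_name(name: str, max_length: int = 63) -> bool:
    """Return True if name is kebab-case and fits in a single DNS label."""
    if len(name) > max_length:
        return False
    return SERVICE_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("billing-api", "Billing_API", "api-"):
        print(candidate, is_valid_service_name(candidate))
```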

Phase 2: Repository Setup

GitHub repository created
- Name: company-org/service-name
- Visibility: Private
- Initialize with README
Team access configured
- @all-engineering → Read
- @<your-team> → Write
- @devops-team → Admin
Branch protection enabled on main:
- Require PR before merge
- Require 1 approval
- Require CI checks to pass
- No force pushes
- No deletions
Repository secrets added:
- ANSIBLE_VAULT_PASSWORD_STAGING
- ANSIBLE_VAULT_PASSWORD_PROD
- Any service-specific API keys
README.md created with:
- Service description
- Setup instructions
- Development guide
- Architecture overview
- API documentation (if applicable)
CODEOWNERS file added (optional):
```
* @your-team
/ansible/ @devops-team
```

Phase 3: CI/CD Setup

CI workflow created (.github/workflows/ci.yml):
- Build
- Lint
- Unit tests
- Integration tests
Security scanning workflow (.github/workflows/security.yml):
- Gitleaks (secret scanning)
- Dependency scanning
- SAST (Semgrep or similar)
Docker build workflow (if applicable):
- Build Docker image
- Scan with Trivy
- Push to registry
Deployment workflows:
- deploy-staging.yml (auto-deploy on merge to main)
- deploy-production.yml (manual trigger with approval)
Status checks configured (required before merge)

Phase 4: Infrastructure

Development environment provisioned:
- Servers added to inventory/development/hosts.yml
- Ansible playbook created/updated
- Services running and accessible
Staging environment provisioned:
- Servers added to inventory/staging/hosts.yml
- DNS configured: service-name.staging.company.com
- SSL certificate provisioned
- Monitoring configured
Production infrastructure planned (don't provision yet):
- Server requirements documented
- Networking planned (VPC, subnets, firewall)
- Load balancing (if needed)
- Backup strategy defined

Ansible role created (if new service type):

```
roles/service-name/
├── tasks/main.yml
├── handlers/main.yml
├── templates/
├── files/
└── defaults/main.yml
```

Database provisioned (if needed):
- Database created in dev/staging
- User and permissions configured
- Connection string in Ansible Vault
- Migrations tested

Phase 5: Security & Secrets

Secrets management configured:
- Ansible Vault files created for each environment
- GitHub Secrets configured for CI/CD
- No secrets committed to Git (verified)

Pre-commit hooks enabled:

```
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.0
    hooks:
      - id: gitleaks
```

Authentication implemented:
- Keycloak SSO (if user-facing)
- API authentication (if API service)
- Service-to-service auth
Authorization implemented:
- Role-based access control
- Permission checks on sensitive operations
Input validation on all endpoints
Rate limiting configured (if API)
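
For the rate limiting item, one common approach is a token bucket. The sketch below is a minimal in-process version and is not taken from any existing service. The `TokenBucket` class, its parameters, and the injectable `clock` are assumptions made for illustration. A real deployment would more likely enforce limits at the gateway or in a shared store so that they hold across instances.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available and report whether the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill according to elapsed time, never exceeding the bucket capacity.
        self._tokens = min(float(self.capacity), self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A handler would call `allow()` once per request and respond with HTTP 429 when it returns `False`.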

Phase 6: Observability

Logging configured:
- Structured logging (JSON format)
- Appropriate log levels
- Sensitive data not logged
- Logs shipped to central logging system
Metrics exposed:
- Application metrics (requests, errors, latency)
- Resource metrics (CPU, memory)
- Business metrics (if applicable)
Health check endpoint implemented:
```
GET /health → 200 OK
```
Monitoring dashboards created:
- Service health dashboard
- Performance metrics
- Error rates
Alerts configured:
- Service down alert
- High error rate alert
- Performance degradation alert
- Resource exhaustion alert
Alerts route to Rocket.Chat #alerts channel
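
The logging items above (structured JSON, sensitive data not logged) can be sketched with the standard `logging` module. The `JsonFormatter` class, the `context` attribute, and the `SENSITIVE_KEYS` set are hypothetical names chosen for this example. The list of sensitive keys in particular is an assumption and should come from the service's own data classification.

```python
import json
import logging

# Hypothetical set of field names to redact; extend it per service.
SENSITIVE_KEYS = {"password", "token", "api_key", "authorization"}


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        # Optional structured fields passed via `extra={"context": {...}}`.
        context = getattr(record, "context", {})
        redacted = {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
            for key, value in context.items()
        }
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "context": redacted,
        }
        return json.dumps(payload, default=str)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("service-name")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("login attempt", extra={"context": {"user": "alice", "password": "secret"}})
```

Redaction here only covers keys in the `context` dictionary; it does not scan free-text messages, so secrets should never be interpolated into the message itself.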

Phase 7: Documentation

Internal documentation in Docusaurus:
- Service overview in docs/03-projects/service-name.md
- Architecture diagram
- API documentation
- Runbooks for common operations
Deployment documentation:
- How to deploy
- How to rollback
- Configuration options
- Environment variables
Troubleshooting guide:
- Common errors and solutions
- Where to find logs
- Who to contact for help

Announced in #engineering channel:

```
New service launched: service-name
Purpose: [brief description]
Docs: [link]
Staging: https://service-name.staging.company.com
Questions: Ask in #engineering or tag @your-team
```

Phase 8: Testing & Validation

Unit tests written (>80% coverage target)
Integration tests written
End-to-end tests (if applicable)
Load testing performed:
- Expected load tested
- Performance benchmarks recorded
- Bottlenecks identified and addressed
Security testing:
- Service checked against the OWASP Top 10 risk categories
- Authentication/authorization tested
- Input validation tested
- Dependency vulnerabilities scanned
Staging tested thoroughly before production
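
For the "Performance benchmarks recorded" item, latency percentiles are usually more informative than an average. The sketch below summarises raw latency samples from a load test. The `latency_summary` helper and the choice of p50/p95/p99 are assumptions, since the checklist does not prescribe which figures to record.

```python
import statistics


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Summarise latency samples (milliseconds) as p50/p95/p99 plus the maximum."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns 99 cut points; index k-1 holds the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(samples_ms),
    }
```

Storing this summary alongside the load profile that produced it gives the post-launch baseline something concrete to compare against.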

Phase 9: Production Deployment

Start this phase only after every item above is complete:

Production infrastructure provisioned:
- Servers created in DigitalOcean
- Added to inventory/production/hosts.yml
- Provisioned via Ansible
DNS configured:
- service-name.company.com points to load balancer/server
- SSL certificate configured
Production secrets configured:
- Ansible Vault production secrets added
- GitHub Secrets for production added
Monitoring verified in production
Alerts tested (trigger test alert, verify delivery)
Backup configured and tested:
- Database backups (if applicable)
- Backup restoration tested
Initial production deployment:
- Triggered via GitHub Actions
- Health checks passing
- Monitoring data flowing
- No errors in logs
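
The "Health checks passing" item can be verified with a small polling script after the deployment workflow finishes. This is a sketch under assumptions: the `http_probe` and `wait_until_healthy` helpers, the retry count, and the delay are all illustrative, and it treats only HTTP 200 from the health endpoint as healthy, following the `GET /health → 200 OK` contract in Phase 6.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # non-2xx responses still carry a status code
    except OSError:
        return 0  # covers URLError, connection errors, and timeouts


def wait_until_healthy(
    probe: Callable[[], int],
    attempts: int = 10,
    delay_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe until it returns 200 or the attempts are exhausted."""
    for attempt in range(attempts):
        if probe() == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

For example, `wait_until_healthy(lambda: http_probe("https://service-name.company.com/health"))` returns `True` once the service reports healthy and `False` if it never does within the retry budget.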

Production announcement:

```
🚀 service-name is now in production!

URL: https://service-name.company.com
Status: https://service-name.company.com/health
Monitoring: [link to dashboard]
On-call: @your-team
```

Phase 10: Post-Launch

Monitor closely for the first 24-48 hours
Incident response plan documented:
- Who's on-call for this service
- Escalation path
- Common issues and fixes
Performance baseline established:
- Normal CPU/memory usage
- Typical request rates
- Expected error rates
Post-launch review scheduled (1 week after launch):
- What went well
- What could be improved
- Action items for next service
Documentation updated based on real-world usage

Ongoing Maintenance

After launch, ensure:

Regular dependency updates (Dependabot)
Security patches applied promptly
Monitoring dashboards reviewed regularly
Runbooks kept up-to-date
Team trained on service operations

Getting Help

Questions about this checklist?

Ask in #devops channel
Tag @devops-team
Review other services in docs/03-projects/ for examples

Need approval for something?

Infrastructure costs → DevOps lead
Production access → Manager + DevOps
Third-party services → Budget owner
