New Service Launch Checklist
Overview
Use this checklist when launching a new microservice, application, or infrastructure component.
Phase 1: Planning & Design
- Service name decided (follow naming convention: service-name)
- Tech stack approved (document in project README)
- Architecture reviewed (create architecture diagram)
- Data storage needs identified (database, cache, storage)
- Third-party dependencies listed (APIs, services, licenses)
- Resources estimated (CPU, memory, storage, costs)
- Security requirements defined (authentication, authorization, data sensitivity)
Phase 2: Repository Setup
- GitHub repository created
  - Name: company-org/service-name
  - Visibility: Private
  - Initialize with README
- Team access configured:
  - @all-engineering → Read
  - @<your-team> → Write
  - @devops-team → Admin
- Branch protection enabled on main:
  - Require PR before merge
  - Require 1 approval
  - Require CI checks to pass
  - No force pushes
  - No deletions
- Repository secrets added:
  - ANSIBLE_VAULT_PASSWORD_STAGING
  - ANSIBLE_VAULT_PASSWORD_PROD
  - Any service-specific API keys
- README.md created with:
  - Service description
  - Setup instructions
  - Development guide
  - Architecture overview
  - API documentation (if applicable)
- CODEOWNERS file added (optional):

      * @your-team
      /ansible/ @devops-team
Phase 3: CI/CD Setup
- CI workflow created (.github/workflows/ci.yml):
  - Build
  - Lint
  - Unit tests
  - Integration tests
- Security scanning workflow (.github/workflows/security.yml):
  - Gitleaks (secret scanning)
  - Dependency scanning
  - SAST (Semgrep or similar)
- Docker build workflow (if applicable):
  - Build Docker image
  - Scan with Trivy
  - Push to registry
- Deployment workflows:
  - deploy-staging.yml (auto-deploy on merge to main)
  - deploy-production.yml (manual trigger with approval)
- Status checks configured (required before merge)
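
As a starting point, the CI workflow listed in this phase might look like the sketch below. It is only an illustration: the `make` targets (`build`, `lint`, `test-unit`, `test-integration`) are hypothetical placeholders for whatever commands the service's tech stack actually uses, and the triggers are an assumption rather than a documented convention.

```yaml
# .github/workflows/ci.yml
# Sketch only: the make targets below are hypothetical placeholders.
name: CI

on:
  pull_request:          # run on every PR so the checks can gate merges
  push:
    branches: [main]     # run again after merge to main

jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Build
        run: make build

      - name: Lint
        run: make lint

      - name: Unit tests
        run: make test-unit

      - name: Integration tests
        run: make test-integration
```

Once this workflow has run at least once, its job (`ci` here) should be selectable as a required status check in the branch protection settings for main.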
Phase 4: Infrastructure
- Development environment provisioned:
  - Servers added to inventory/development/hosts.yml
  - Ansible playbook created/updated
  - Services running and accessible
- Staging environment provisioned:
  - Servers added to inventory/staging/hosts.yml
  - DNS configured: service-name.staging.company.com
  - SSL certificate provisioned
  - Monitoring configured
- Production infrastructure planned (don't provision yet):
  - Server requirements documented
  - Networking planned (VPC, subnets, firewall)
  - Load balancing (if needed)
  - Backup strategy defined
- Ansible role created (if new service type):

      roles/service-name/
      ├── tasks/main.yml
      ├── handlers/main.yml
      ├── templates/
      ├── files/
      └── defaults/main.yml

- Database provisioned (if needed):
  - Database created in dev/staging
  - User and permissions configured
  - Connection string in Ansible Vault
  - Migrations tested
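
The inventory entries mentioned in this phase could take roughly the following shape in Ansible's YAML inventory format. The group name `service_name`, the host alias, the IP address, and the `ansible_user` value are all assumptions made for illustration; use whatever the existing inventories already follow.

```yaml
# inventory/staging/hosts.yml
# Sketch only: group name, host alias, IP, and user are hypothetical.
all:
  children:
    service_name:                      # underscores avoid Ansible's warning about hyphens in group names
      hosts:
        staging-service-name-01:       # hypothetical host alias
          ansible_host: 10.0.0.10      # placeholder address
          ansible_user: deploy         # placeholder SSH user
```

Keeping one group per service means a playbook can target `hosts: service_name` and run unchanged against the development, staging, and production inventories.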
Phase 5: Security & Secrets
- Secrets management configured:
  - Ansible Vault files created for each environment
  - GitHub Secrets configured for CI/CD
  - No secrets committed to Git (verified)
- Pre-commit hooks enabled:

      # .pre-commit-config.yaml
      repos:
        - repo: https://github.com/gitleaks/gitleaks
          rev: v8.18.0
          hooks:
            - id: gitleaks

- Authentication implemented:
  - Keycloak SSO (if user-facing)
  - API authentication (if API service)
  - Service-to-service auth
- Authorization implemented:
  - Role-based access control
  - Permission checks on sensitive operations
- Input validation on all endpoints
- Rate limiting configured (if API)
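
Pre-commit hooks can be bypassed locally, so running the same scan in CI is a useful backstop. The sketch below shows one possible shape for the security.yml workflow from Phase 3, covering Gitleaks and Semgrep only. It assumes the `gitleaks/gitleaks-action` action and the `semgrep/semgrep` container image; the Semgrep ruleset (`--config auto`) is a placeholder choice, and the Gitleaks action may require a license key for organization accounts, so check its documentation before relying on it.

```yaml
# .github/workflows/security.yml
# Sketch only: tool choices and rulesets are assumptions, not team policy.
name: Security

on:
  pull_request:
  push:
    branches: [main]

jobs:
  gitleaks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0               # full history so earlier commits are scanned too
      - uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

  semgrep:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
      - name: Run Semgrep (SAST)
        run: semgrep scan --config auto --error   # --error fails the job when findings exist
```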
Phase 6: Observability
- Logging configured:
  - Structured logging (JSON format)
  - Appropriate log levels
  - Sensitive data not logged
  - Logs shipped to central logging system
- Metrics exposed:
  - Application metrics (requests, errors, latency)
  - Resource metrics (CPU, memory)
  - Business metrics (if applicable)
- Health check endpoint implemented:

      GET /health → 200 OK

- Monitoring dashboards created:
  - Service health dashboard
  - Performance metrics
  - Error rates
- Alerts configured:
  - Service down alert
  - High error rate alert
  - Performance degradation alert
  - Resource exhaustion alert
- Alerts route to Rocket.Chat #alerts channel
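
This checklist does not name a monitoring stack. Assuming Prometheus-style alerting rules, the "service down" and "high error rate" alerts could be sketched as below. The job label `service-name`, the metric `http_requests_total` with a `status` label, and the thresholds are all hypothetical and would need to match what the service actually exports.

```yaml
# Sketch only: assumes Prometheus; metric names, labels, and thresholds are hypothetical.
groups:
  - name: service-name-alerts
    rules:
      - alert: ServiceDown
        expr: up{job="service-name"} == 0
        for: 2m                          # avoid paging on a single failed scrape
        labels:
          severity: critical
        annotations:
          summary: "service-name is not responding to scrapes"

      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{job="service-name",status=~"5.."}[5m]))
            /
          sum(rate(http_requests_total{job="service-name"}[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "More than 5% of requests to service-name are failing"
```

Delivery to the Rocket.Chat #alerts channel is configured separately in the alert router and is not shown here.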
Phase 7: Documentation
- Internal documentation in Docusaurus:
  - Service overview in docs/03-projects/service-name.md
  - Architecture diagram
  - API documentation
  - Runbooks for common operations
- Deployment documentation:
  - How to deploy
  - How to rollback
  - Configuration options
  - Environment variables
- Troubleshooting guide:
  - Common errors and solutions
  - Where to find logs
  - Who to contact for help
- Announced in #engineering channel:

      New service launched: service-name
      Purpose: [brief description]
      Docs: [link]
      Staging: https://service-name.staging.company.com
      Questions: Ask in #engineering or tag @your-team
Phase 8: Testing & Validation
- Unit tests written (>80% coverage target)
- Integration tests written
- End-to-end tests (if applicable)
- Load testing performed:
  - Expected load tested
  - Performance benchmarks recorded
  - Bottlenecks identified and addressed
- Security testing:
  - OWASP Top 10 verified
  - Authentication/authorization tested
  - Input validation tested
  - Dependency vulnerabilities scanned
- Staging tested thoroughly before production
Phase 9: Production Deployment
Only after everything above is complete:
- Production infrastructure provisioned:
  - Servers created in DigitalOcean
  - Added to inventory/production/hosts.yml
  - Provisioned via Ansible
- DNS configured:
  - service-name.company.com points to load balancer/server
  - SSL certificate configured
- Production secrets configured:
  - Ansible Vault production secrets added
  - GitHub Secrets for production added
- Monitoring verified in production
- Alerts tested (trigger test alert, verify delivery)
- Backup configured and tested:
  - Database backups (if applicable)
  - Backup restoration tested
- Initial production deployment:
  - Triggered via GitHub Actions
  - Health checks passing
  - Monitoring data flowing
  - No errors in logs
- Production announcement:

      🚀 service-name is now in production!
      URL: https://service-name.company.com
      Status: https://service-name.company.com/health
      Monitoring: [link to dashboard]
      On-call: @your-team
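
One way to get the "manual trigger with approval" behaviour for deploy-production.yml is a `workflow_dispatch` trigger combined with a GitHub environment that has required reviewers. The sketch below makes several assumptions: an environment named `production` exists with reviewers configured, Ansible is available on the runner, SSH access to the servers is set up elsewhere, and the playbook path `playbooks/service-name.yml` is hypothetical.

```yaml
# .github/workflows/deploy-production.yml
# Sketch only: environment name, playbook path, and SSH setup are assumptions.
name: Deploy to production

on:
  workflow_dispatch:                   # manual trigger only

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production            # job pauses here until a required reviewer approves
    steps:
      - uses: actions/checkout@v4

      - name: Run Ansible playbook
        env:
          ANSIBLE_VAULT_PASSWORD: ${{ secrets.ANSIBLE_VAULT_PASSWORD_PROD }}
        run: |
          # Write the vault password to a temporary file for this run only
          printf '%s' "$ANSIBLE_VAULT_PASSWORD" > .vault_pass
          ansible-playbook -i inventory/production/hosts.yml \
            playbooks/service-name.yml --vault-password-file .vault_pass
          rm -f .vault_pass

      - name: Verify health check
        # --fail makes curl exit non-zero on HTTP errors, failing the job
        run: curl --fail --silent --show-error https://service-name.company.com/health
```

The final step turns "Health checks passing" into something the workflow enforces rather than a manual check.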
Phase 10: Post-Launch
- Monitor closely for first 24-48 hours
- Incident response plan documented:
  - Who's on-call for this service
  - Escalation path
  - Common issues and fixes
- Performance baseline established:
  - Normal CPU/memory usage
  - Typical request rates
  - Expected error rates
- Post-launch review scheduled (1 week after launch):
  - What went well
  - What could be improved
  - Action items for next service
- Documentation updated based on real-world usage
Ongoing Maintenance
After launch, ensure:
- Regular dependency updates (Dependabot)
- Security patches applied promptly
- Monitoring dashboards reviewed regularly
- Runbooks kept up-to-date
- Team trained on service operations
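
For the Dependabot item, a minimal configuration might look like this. It covers only GitHub Actions and Docker updates; the weekly schedule is an assumption, and an entry for the service's own package ecosystem (for example `npm` or `pip`) would need to be added once the tech stack is known.

```yaml
# .github/dependabot.yml
# Sketch only: ecosystems and schedule are assumptions; add the service's own package manager.
version: 2
updates:
  - package-ecosystem: "github-actions"   # keeps workflow action versions current
    directory: "/"
    schedule:
      interval: "weekly"

  - package-ecosystem: "docker"           # only relevant if the repo has a Dockerfile
    directory: "/"
    schedule:
      interval: "weekly"
```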
Getting Help
Questions about this checklist?
- Ask in #devops channel
- Tag @devops-team
- Review other services in docs/03-projects/ for examples
Need approval for something?
- Infrastructure costs → DevOps lead
- Production access → Manager + DevOps
- Third-party services → Budget owner