# Production Checklist
A step-by-step checklist for hardening your evm-cloud deployment before running in production.
## Remote State Backend
Terraform state contains all variable values, including secrets, in plaintext. Never use local state for production.
### S3 + KMS + DynamoDB
```hcl
# backend.tf (add to your example directory)
terraform {
  backend "s3" {
    bucket         = "myorg-terraform-state"
    key            = "evm-cloud/production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    kms_key_id     = "alias/terraform-state"
    dynamodb_table = "terraform-locks"
  }
}
```

Create the prerequisite resources:
```bash
# S3 bucket with versioning
aws s3api create-bucket --bucket myorg-terraform-state --region us-east-1
aws s3api put-bucket-versioning --bucket myorg-terraform-state \
  --versioning-configuration Status=Enabled

# KMS key for encryption
aws kms create-alias --alias-name alias/terraform-state \
  --target-key-id $(aws kms create-key --query KeyMetadata.KeyId --output text)

# DynamoDB table for state locking
aws dynamodb create-table \
  --table-name terraform-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
```

After adding the backend configuration, migrate existing state:
```bash
terraform init -migrate-state
```

## Environment Isolation
Use separate state files (or separate directories) for each environment. Never share Terraform state across dev/staging/production.
### Option A: Separate directories
```text
environments/
  dev/
    main.tf               # source = "../../"
    dev.tfvars
    secrets.auto.tfvars
    backend.tf            # key = "evm-cloud/dev/terraform.tfstate"
  staging/
    main.tf
    staging.tfvars
    secrets.auto.tfvars
    backend.tf            # key = "evm-cloud/staging/terraform.tfstate"
  production/
    main.tf
    production.tfvars
    secrets.auto.tfvars
    backend.tf            # key = "evm-cloud/production/terraform.tfstate"
```

### Option B: Terraform workspaces
```bash
terraform workspace new production
terraform workspace select production
terraform apply -var-file=production.tfvars
```

Option A is recommended -- it provides stronger isolation and makes it impossible to accidentally apply a dev config to production.
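With Option A, day-to-day operations reduce to running Terraform from the right directory. A minimal wrapper sketch, assuming the `environments/<env>` layout shown above (the script name `apply.sh` is illustrative, not part of evm-cloud):

```shell
#!/usr/bin/env bash
# Hypothetical helper: plan and apply exactly one environment from its own directory.
set -euo pipefail

ENV="${1:?usage: ./apply.sh <dev|staging|production>}"
case "$ENV" in
  dev|staging|production) ;;                      # only known environments
  *) echo "unknown environment: $ENV" >&2; exit 1 ;;
esac

cd "environments/$ENV"                            # each env has its own backend.tf + tfvars
terraform init                                    # picks up that env's remote state key
terraform plan -var-file="$ENV.tfvars" -out=tfplan
terraform apply tfplan                            # apply only the reviewed plan
```

Because each directory pins its own backend `key`, a wrapper like this cannot apply dev variables against the production state.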
## CI/CD Gates
### On every pull request
Run `make qa` to catch formatting issues, validation errors, linting violations, and security misconfigurations:
```bash
make qa
# Runs: fmt-check + validate + lint (tflint) + security (checkov)
```

### Before merge
Run `make verify` to validate that all examples plan successfully:
```bash
make verify
# Runs: qa + plan for all examples against LocalStack
```

### Plan-then-apply workflow
Never run `terraform apply` directly in CI. Generate a plan artifact, review it, then apply the specific plan file:
```bash
# CI: generate plan
terraform plan -var-file=production.tfvars -out=tfplan
# Upload tfplan as a build artifact

# After human review: apply the exact plan
terraform apply tfplan
```

This ensures the applied changes match exactly what was reviewed -- no surprises from concurrent state changes.
## Instance Sizing
Use t3.medium or larger for production. The default t3.medium (4 GB RAM) exactly fits eRPC (1 GB) + rindexer (2 GB) + OS (1 GB), leaving no headroom -- size up for heavier workloads:
| Workload | Recommended Instance | Memory Config |
|---|---|---|
| Single chain, few contracts | t3.medium (4 GB) | rindexer 2g, eRPC 1g |
| Single chain, many contracts | t3.large (8 GB) | rindexer 4g, eRPC 2g |
| Multi-chain or backfill | t3.xlarge (16 GB) | rindexer 8g, eRPC 2g |
Monitor resource usage to right-size:
```bash
# EC2: check Docker container memory
ssh ubuntu@<ip> "sudo docker stats --no-stream"

# K8s: check pod resource usage
kubectl top pods
```

See Variable Reference -- Instance Sizing for the full sizing table.
## NAT Gateway
Enable the NAT gateway for production if your workloads run in private subnets and need outbound internet access (for RPC calls, ClickHouse Cloud connections, etc.):
```hcl
network_enable_nat_gateway = true
```

Warning: NAT gateways add approximately $35/month plus data transfer charges. Skip for dev environments where instances can use public subnets.
## Database Backup
### Managed PostgreSQL (RDS)
Set the backup retention period:
```hcl
# production.tfvars
postgres_backup_retention = 30  # days (default is 7)
```

RDS handles automated daily backups and point-in-time recovery. Verify backups are running:
```bash
aws rds describe-db-instances \
  --query "DBInstances[?DBInstanceIdentifier=='evm-cloud-prod'].{Retention:BackupRetentionPeriod,LatestRestore:LatestRestorableTime}"
```

### ClickHouse (BYODB)
If using ClickHouse Cloud, backups are managed by the service. For self-hosted ClickHouse, configure backups on the ClickHouse side -- evm-cloud does not manage external database backups.
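For the self-hosted case, ClickHouse's built-in `BACKUP` command can write to S3. A hedged sketch, assuming ClickHouse 22.10 or later and a database named `rindexer` (substitute your own database name, bucket URL, and credentials -- none of these are managed by evm-cloud):

```shell
# Illustrative only: back up one database to S3 via clickhouse-client.
clickhouse-client --query "
  BACKUP DATABASE rindexer
  TO S3('https://my-backups.s3.amazonaws.com/clickhouse/rindexer', '<access-key>', '<secret-key>')
"
```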
## Secrets Rotation
To rotate database credentials or other secrets:
1. Update the password in `secrets.auto.tfvars` (or your `TF_VAR_*` environment variable)
2. Run `terraform apply`
3. Verify the new credentials are propagated:
| Engine | Post-rotation Step |
|---|---|
| EC2 | SSH in, run `pull-secrets.sh`, then `docker compose restart` |
| EKS (terraform) | Terraform updates the K8s Secret and triggers rollout automatically |
| EKS (external) | Re-run `deployers/eks/deploy.sh` |
| k3s | Re-run `deployers/k3s/deploy.sh` |
| Bare metal | Terraform re-provisions `.env` via SSH automatically |
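As a concrete example, the EC2 row above amounts to a short command sequence. A sketch, assuming the compose file location from the Monitoring section; the exact location of `pull-secrets.sh` on the instance is deployment-specific, so the path placeholder below is illustrative:

```shell
# 1. Apply the new secret value from your workstation or CI
terraform apply -var-file=production.tfvars

# 2. On the instance: re-fetch secrets, then restart the containers
ssh ubuntu@<ip> "sudo <path-to>/pull-secrets.sh && \
  sudo docker compose -f /opt/evm-cloud/docker-compose.yml restart"
```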
Set the Secrets Manager recovery window appropriately:
```hcl
# production.tfvars
ec2_secret_recovery_window_in_days = 30  # default is 7, use 0 only for dev
```

See Secrets Management for the full secrets lifecycle.
## Destroy Safety
### Kubernetes deployments (k3s, EKS external)
Always remove workloads before destroying infrastructure:
```bash
# Step 1: Remove workloads
./deployers/k3s/teardown.sh handoff.json
# or: helm uninstall rindexer erpc

# Step 2: Destroy infrastructure
terraform destroy -var-file=production.tfvars
```

Skipping step 1 will not leak AWS resources (the EC2 instance is terminated along with everything on it), but Helm release state and any persistent volumes will be lost without a clean shutdown.
### All deployments
Before running `terraform destroy` in production:
1. Generate a destroy plan and review it:

   ```bash
   terraform plan -destroy -var-file=production.tfvars -out=destroy-plan
   # Review the plan carefully
   terraform apply destroy-plan
   ```

2. Verify no unexpected resources are being destroyed (especially RDS instances or EBS volumes)
3. Confirm database backups are current before destroying any database resources
Warning: Never run `terraform destroy` in production without reviewing the destroy plan first. RDS instances have deletion protection enabled by default, but other resources do not.
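Before destroying, it is worth confirming deletion protection is still on and taking a final snapshot that survives instance deletion. A hedged example with the AWS CLI, reusing the `evm-cloud-prod` identifier from the backup check above (adjust to your instance name):

```shell
# Confirm deletion protection is still enabled (expect: true)
aws rds describe-db-instances \
  --db-instance-identifier evm-cloud-prod \
  --query "DBInstances[0].DeletionProtection"

# Take a final manual snapshot; manual snapshots persist after the instance is deleted
aws rds create-db-snapshot \
  --db-instance-identifier evm-cloud-prod \
  --db-snapshot-identifier "evm-cloud-prod-final-$(date +%Y%m%d)"
```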
## Monitoring
### EC2
Terraform creates a CloudWatch log group (available in the handoff as `artifacts.cloudwatch_log_group`). View logs:
```bash
# From the instance
ssh ubuntu@<ip> "sudo docker compose -f /opt/evm-cloud/docker-compose.yml logs rindexer --tail 100"
ssh ubuntu@<ip> "sudo docker compose -f /opt/evm-cloud/docker-compose.yml logs erpc --tail 100"

# Via CloudWatch (if configured)
aws logs tail /evm-cloud/my-indexer --follow
```

### Kubernetes (k3s, EKS)
```bash
kubectl logs -l app=rindexer --tail=100
kubectl logs -l app=erpc --tail=100

# Watch for restarts
kubectl get pods -w
```

### Health checks
Verify the indexer is making progress:
```bash
# rindexer health endpoint
curl http://<ip>:3001/health

# eRPC health endpoint
curl http://<ip>:4000/health
```

Note: A future release will include a Prometheus + Grafana monitoring addon for dashboards and alerting. For now, use CloudWatch (EC2) or standard Kubernetes monitoring tooling.
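The two curl checks can also be combined into a small watchdog suitable for cron or a CI smoke test. A minimal sketch under the same assumptions (ports 3001 and 4000, `<ip>` to be replaced with your instance address):

```shell
#!/usr/bin/env bash
# Exit nonzero if either service stops answering its health endpoint.
status=0
for url in "http://<ip>:3001/health" "http://<ip>:4000/health"; do
  if curl -fsS --max-time 5 "$url" > /dev/null; then
    echo "OK   $url"
  else
    echo "FAIL $url"
    status=1
  fi
done
exit $status
```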
## Summary Checklist
Use this as a pre-launch checklist:
- Remote state backend configured (S3 + KMS + DynamoDB)
- Environment isolation in place (separate directories or workspaces)
- CI runs `make qa` on every PR
- Plan-then-apply workflow enforced (no direct `terraform apply` in CI)
- Instance sized for workload (`t3.medium` minimum, monitor and adjust)
- NAT gateway enabled if private subnets need outbound access
- Database backup retention set (`postgres_backup_retention = 30`)
- Secrets Manager recovery window set to 30 days
- `secrets.auto.tfvars` and `handoff.json` are in `.gitignore`
- Destroy procedure documented (workload teardown before `terraform destroy`)
- Monitoring and log access verified
- Health check endpoints reachable
## Related Pages
- Secrets Management -- Full secrets lifecycle and rotation
- Updating Configuration -- Post-deploy config changes
- Two-Phase Deployment -- k3s teardown procedure
- Variable Reference -- All configuration options with defaults
- Getting Started -- First deployment walkthrough