bregman-arie / sre-checklist
- Thursday, January 26, 2023, 00:38:04
A checklist for anyone practicing Site Reliability Engineering
Note: these checklists are opinionated. They are based on my own opinion and experience and are not universal truth (duh!)
Team - Responsibilities - Skills
Production - Requirements - Provisioning
Git - CI
Cloud - Provisioning
Kubernetes - Scaling
Some will argue that the following skills and topics shouldn't be optional and are a 100% must, but in my view they very much depend on your responsibilities as an SRE and on your work environment. For example, if your environment consists mostly of bare-metal (physical) servers and there are no containers, why would containers be a must for you?
GitOps is adopted for the GitOps agents themselves, not only for app-related code and infrastructure
DON'T install ArgoCD with kubectl commands
The Helm chart is not recommended, as it lags behind official ArgoCD releases
Consider:
Think about:
The interesting topic of ensuring your environment can withstand unexpected disruptions
TODO: insert a list of steps to go towards the process of establishing and integrating chaos engineering in your environments
terraform apply)
Separate directory for each environment (staging, production, ...)
Separate backend for each environment (as you don't want to share the same authentication and access controls across all environments)
Separate directory in each environment for each component and application
    staging/
      networking/
      applications/
      databases/
    production/
      networking/
      applications/
        web-app-1/
        web-app-2/
      databases/
        mongo/
        mysql/
    global/
      user_access_management/
    modules/
      applications/
        web-app-1/...
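As a minimal sketch of the "separate backend for each environment" point, each component directory could carry its own backend block. This assumes an S3 backend; the file location, bucket name, key, and region below are hypothetical and not taken from the original checklist.

```hcl
# staging/networking/backend.tf (hypothetical location)
terraform {
  backend "s3" {
    # One state bucket per environment, so access controls are not shared
    # between staging and production (bucket name is an assumption).
    bucket = "example-tfstate-staging"

    # One state file per component inside the environment.
    key    = "networking/terraform.tfstate"
    region = "us-east-1"
  }
}
```

The production copy would point at a different bucket, which lets you grant access to each environment's state independently.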
You don't put everything in main.tf
Use locals
Use for_each instead of count. count is very limited when modifying lists, because it relies on list position/index, while for_each is map- or set-based, so it relies on specific keys
Use default_tags. This will help you manage tags at scale
Use path references (e.g. path.module, path.root)
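The recommendations above can be sketched together in one small configuration. This is an illustration only: it assumes the AWS provider, and the bucket names, tag values, and policy file path are hypothetical. With count, removing an element from the middle of a list shifts the index of every later element, so Terraform plans changes for resources you did not touch; with for_each, each resource is tracked by its key.

```hcl
provider "aws" {
  region = "us-east-1"

  # default_tags are merged into the tags of resources that support tagging,
  # so they do not have to be repeated on every resource.
  default_tags {
    tags = {
      Environment = "staging"
      ManagedBy   = "terraform"
    }
  }
}

locals {
  # A set of keys: removing one entry only affects that entry's resource.
  bucket_names = toset(["logs", "artifacts", "backups"])
}

resource "aws_s3_bucket" "this" {
  for_each = local.bucket_names

  # each.key is the set element, so instances are addressed as
  # aws_s3_bucket.this["logs"] rather than by list position.
  bucket = "example-${each.key}"
}

output "policy_file" {
  # path.module is the directory of the module containing this expression,
  # which keeps file references independent of where terraform is run.
  value = "${path.module}/policies/bucket.json"
}
```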
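    # Look up the image by name instead of hardcoding its ID
    data "cloud_image" "image_data" {
      provider = ...
      filter {
        name   = "name"
        values = ["some-image-you-would-like-to-use"]
      }
    }

    # Reference the ID returned by the data source
    resource "some_instance" "instance" {
      image = data.cloud_image.image_data.id
    }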
Basics!
Provider Credentials
    - uses: aws-actions/configure-aws-credentials@v1
      with:
        role-to-assume: arn:aws:iam::someIamRole
        aws-region: ...
Use a CircleCI Context and then specify it in your CircleCI config file:
    context:
      - some-context
Secrets in Terraform Configuration
Mark variables as sensitive (sensitive = true) and pass values with environment variables
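A minimal sketch of that pattern, assuming a hypothetical variable named db_password: Terraform reads any environment variable prefixed with TF_VAR_ as the value of the matching input variable, so the secret never has to appear in the configuration or in a .tfvars file.

```hcl
variable "db_password" {
  description = "Database password, supplied from the environment"
  type        = string

  # Redacts the value in plan and apply output.
  sensitive = true
}

# Set the value in the shell or CI job before running terraform, for example:
#   export TF_VAR_db_password=...
```

Keep in mind that sensitive only hides the value in CLI output; it is still written to the state file, so the backend that stores the state needs its own access controls.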
Logos credits can be found here