SquareOps

Terraform Best Practices — at scale




Introduction

Terraform is a powerful infrastructure automation tool that lets teams manage infrastructure as code. Learning Terraform may seem easy at first, but deploying architectures at scale can be a daunting task, even for experienced professionals.

Here are a few tips and tricks we follow at SquareOps that have proven useful in the long run for managing large-scale infrastructure with Terraform.

Forget .tfvars, sometimes !!!

We try to leverage the true power of Terraform local values. The easiest way to get started is to create a directory for each of your environments in your Terraform Git repository, e.g. env/staging and env/production. For example, this VPC reference file uses local variables (Terraform-eks). The benefit? You avoid declaring each and every input variable, and you manage every configuration value in one place in Git.
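A minimal sketch of the idea, assuming a hypothetical env/staging/main.tf. The names, region, and CIDR range below are illustrative placeholders and are not taken from the Terraform-eks repository:

```hcl
# env/staging/main.tf (hypothetical file; all values are placeholders)
locals {
  environment = "staging"
  region      = "us-east-1"
  name        = "demo-${local.environment}"
  vpc_cidr    = "10.10.0.0/16"

  # Tags shared by every resource in this environment
  tags = {
    Environment = local.environment
    ManagedBy   = "terraform"
  }
}

provider "aws" {
  region = local.region
}

resource "aws_vpc" "this" {
  cidr_block = local.vpc_cidr

  # Merge the shared tags with a resource-specific Name tag
  tags = merge(local.tags, { Name = local.name })
}
```

Because every value lives in the locals block, a reviewer sees the whole environment definition in one file, and no separate .tfvars file has to be kept in sync.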

Write modules

Modules need to be independent, reusable pieces of code. We create custom modules on top of publicly available base modules. A good example is our network module: it uses the public VPC module published by AWS and then creates an EC2 instance for Pritunl VPN, so the resulting module can create a VPC with a VPN appliance. Example: Terraform-aws-vpc
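A wrapper module along these lines might look like the sketch below. It is not the actual Terraform-aws-vpc code: the registry source terraform-aws-modules/vpc/aws, the version constraint, the instance type, and all variable names are assumptions chosen for illustration.

```hcl
# modules/network/main.tf (hypothetical wrapper module)
variable "name" { type = string }
variable "cidr" { type = string }
variable "azs" { type = list(string) }
variable "public_subnets" { type = list(string) }
variable "private_subnets" { type = list(string) }
variable "vpn_ami_id" { type = string } # AMI for the VPN host, supplied by the caller

# Base layer: a public VPC module (assumed source and version)
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name            = var.name
  cidr            = var.cidr
  azs             = var.azs
  public_subnets  = var.public_subnets
  private_subnets = var.private_subnets
}

# Custom layer: an EC2 instance that will host the VPN appliance
resource "aws_instance" "vpn" {
  ami           = var.vpn_ami_id
  instance_type = "t3.small"
  subnet_id     = module.vpc.public_subnets[0]

  tags = { Name = "${var.name}-vpn" }
}

output "vpc_id" {
  value = module.vpc.vpc_id
}

output "private_subnets" {
  value = module.vpc.private_subnets
}
```

Callers only see the wrapper's inputs and outputs, so the base module can be upgraded or swapped without touching every environment.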

Follow a consistent directory structure

Using a consistent directory structure is essential for maintaining a clean and organized Terraform project. You should structure your Terraform code in a way that is easy to understand and navigate. A common directory structure for Terraform projects includes:

  • main.tf: contains the core infrastructure configuration.
  • variables.tf: contains input variables that can be passed to the main configuration.
  • outputs.tf: contains output values that can be referenced by other modules or configurations.
  • modules/: contains reusable modules that can be used in the main configuration.
  • providers.tf: contains provider configuration.
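As a rough illustration of how the files in this layout relate to each other, the fragments below are shown together with comments marking which file each would live in. The module path ./modules/network, its vpc_id output, and the variable name are hypothetical.

```hcl
# variables.tf: inputs accepted by this configuration
variable "environment" {
  description = "Name of the environment being deployed"
  type        = string
}

# main.tf: core configuration, wiring inputs into modules
module "network" {
  source      = "./modules/network" # hypothetical local module
  environment = var.environment
}

# outputs.tf: values exposed to callers and to remote state readers
output "vpc_id" {
  description = "ID of the VPC created by the network module"
  value       = module.network.vpc_id
}
```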

Use control flags

Use flags in module code to customize your architecture. Referring to the same Pritunl VPN example, we have a variable named deploy_vpn (e.g. deploy_vpn = true). If we are deploying a development VPC or a network just for a proof of concept, we do not need NAT Gateways or a VPN appliance, so we can disable them.
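One common way to implement such a flag is the count meta-argument. The sketch below assumes the deploy_vpn variable from the text; the other variable names and the instance type are placeholders.

```hcl
variable "deploy_vpn" {
  description = "Whether to create the VPN appliance"
  type        = bool
  default     = true
}

variable "vpn_ami_id" { type = string }       # hypothetical input
variable "public_subnet_id" { type = string } # hypothetical input

resource "aws_instance" "vpn" {
  # Zero instances are created when the flag is false
  count = var.deploy_vpn ? 1 : 0

  ami           = var.vpn_ami_id
  instance_type = "t3.small"
  subnet_id     = var.public_subnet_id
}

output "vpn_instance_id" {
  # one() returns null when the list is empty, so the output stays valid either way
  value = one(aws_instance.vpn[*].id)
}
```

A proof-of-concept environment can then pass deploy_vpn = false and reuse the same module unchanged.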

Refer to remote state

Even if you start as a two-person team, or a one-person army, it is advisable to use remote state. It also makes sense to read the outputs of other modules from remote state. For example, when you plan to deploy an RDS instance, you can get the VPC ID and subnet information from the remote state of the network module. This way your infrastructure deployments become loosely coupled.
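The RDS case could be sketched as follows, assuming the network state lives in an S3 backend and exposes outputs named vpc_id and private_subnets. The bucket, key, and resource names are placeholders.

```hcl
# Read the outputs published by the network configuration
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state" # placeholder bucket
    key    = "staging/network/terraform.tfstate"
    region = "us-east-1"
  }
}

# Place the database in the private subnets created elsewhere
resource "aws_db_subnet_group" "this" {
  name       = "staging-rds"
  subnet_ids = data.terraform_remote_state.network.outputs.private_subnets
}

resource "aws_security_group" "rds" {
  name   = "staging-rds"
  vpc_id = data.terraform_remote_state.network.outputs.vpc_id
}
```

The database configuration never imports the network module directly; it depends only on the published outputs.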

Pre-commit hooks

Git pre-commit hooks are a great saviour when it comes to maintaining coding standards in your IaC repo. Our pre-commit hook configuration takes care of:

  • Terraform linting using tflint
  • Generating/updating README files using terraform-docs
  • Formatting Terraform code using terraform fmt (it's an obsession 🙂)
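The linting step reads its rules from a .tflint.hcl file at the repository root. The rule selection below is only an assumption about what a team might enable, not our actual configuration:

```hcl
# .tflint.hcl (illustrative rule selection)
plugin "terraform" {
  enabled = true
  preset  = "recommended"
}

# Enforce consistent naming for resources, variables, and outputs
rule "terraform_naming_convention" {
  enabled = true
}

# Require a description on every variable
rule "terraform_documented_variables" {
  enabled = true
}
```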

PR Based workflow for development

When all infrastructure definitions are stored in version control, it becomes easy to implement an infrastructure change management process. For any change, create a branch from the mainline branch, make the changes, and have them reviewed via a pull request before they are approved and applied. Refer to the Pipeline Workflow section here for a more detailed walkthrough.

Static Code Analysis

Use Terraform static code analysis tools such as tfsec to spot potential misconfigurations in code before it is used to deploy resources to the cloud.

Cost Projections

Cost is paramount to any deployment, and organizations often pay for what they don't need (or use).

A tool like Infracost can be integrated into your infrastructure deployment pipeline. It can generate cost projections for any new deployment, or for changes to an existing deployment.

Conclusion

Terraform is easy to get started with, but not at all easy to do the right way. These techniques are drawn from our real-life challenges and experience delivering 100+ architectures with Terraform.

Happy Terraforming !!

Frequently asked questions

What are key Terraform best practices at scale?

Modularize code, use remote backends for state storage, keep everything in version control, and separate environments with dedicated directories or workspaces. Leverage Terraform Cloud for team collaboration and state management.

How should I organize Terraform code at scale?

Break down code into reusable modules (e.g., networking, compute). Keep environment-specific configurations separate, and use a layered structure to minimize duplication.

How do I manage Terraform state files at scale?

Use remote backends such as S3 (with state locking via DynamoDB) or Google Cloud Storage (which locks state natively). Keep separate state files for different environments and projects to avoid conflicts.
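For the S3 case, a backend block might look like this sketch. The bucket, key, and table names are placeholders, and both the bucket and the DynamoDB table must already exist before terraform init is run.

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state" # placeholder bucket
    key            = "staging/network/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "example-terraform-locks" # placeholder lock table
    encrypt        = true
  }
}
```

Giving each environment and project its own key keeps the state files separate while sharing one bucket.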

What are best practices for collaboration at scale?

Use version control (e.g., Git), enforce pull requests for reviews, and utilize Terraform Cloud/Enterprise for collaboration tools like RBAC and versioned plans.

How do I handle secrets management at scale?

Use secure secret managers like HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault. Avoid hardcoding secrets and mark sensitive outputs to protect them.
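As an illustration with AWS Secrets Manager, the sketch below reads a secret at plan time and marks the values that carry it as sensitive. The secret path and variable name are hypothetical.

```hcl
# Read an existing secret instead of hardcoding it
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "example/staging/db-password" # placeholder secret name
}

# Values passed in from outside can be marked sensitive as well
variable "api_token" {
  type      = string
  sensitive = true
}

output "db_password" {
  value     = data.aws_secretsmanager_secret_version.db.secret_string
  sensitive = true # redacted in plan and apply output
}
```

Note that sensitive only hides values in CLI output. They are still written to the state file, so the state backend itself needs to be encrypted and access-controlled.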

How do I handle provider limitations at scale?

Break down large plans, configure timeouts/retry logic, and use Terraform Cloud for parallel runs to manage resource provisioning efficiently.
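Timeouts and retries are configured per resource and per provider. The following sketch uses the AWS provider's max_retries argument and the timeouts block of aws_db_instance; the identifiers, sizes, and durations are arbitrary examples.

```hcl
provider "aws" {
  region      = "us-east-1"
  max_retries = 10 # cap on retries for throttled or failed API calls
}

variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "this" {
  identifier          = "example-db"
  engine              = "postgres"
  instance_class      = "db.t3.medium"
  allocated_storage   = 50
  username            = "app"
  password            = var.db_password
  skip_final_snapshot = true

  # Allow slow operations more time before Terraform gives up
  timeouts {
    create = "90m"
    delete = "60m"
  }
}
```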

What is the role of versioning in Terraform at scale?

Use provider version constraints to ensure consistency, and regularly update Terraform and provider versions to avoid compatibility issues.
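A typical set of constraints could look like the block below. The version numbers are examples only and should be chosen to match what your code has been tested against.

```hcl
terraform {
  # Constrain the Terraform CLI version itself
  required_version = ">= 1.5.0, < 2.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # any 5.x release, but not 6.0
    }
  }
}
```

Committing the generated .terraform.lock.hcl file alongside this block pins the exact provider builds that every teammate and pipeline run will use.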

How do I optimize Terraform performance at scale?

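Split large state files into smaller, environment-specific configurations and use terraform plan to preview changes. The -target option can apply changes to specific resources, but Terraform's documentation recommends reserving it for exceptional situations rather than routine use.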

How do I detect drift at scale?

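Run terraform plan regularly in CI/CD pipelines to surface drift, use terraform plan -refresh-only (or terraform apply -refresh-only) to sync state with real resources in place of the deprecated terraform refresh command, and use Terraform Cloud for automated drift detection.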

How do I manage Terraform dependencies at scale?

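Rely on the implicit dependencies created by resource references where possible, and use depends_on for explicit ordering only when Terraform cannot infer the relationship. Break infrastructure into independent modules and avoid circular dependencies to keep configurations modular and maintainable.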
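A case where depends_on is genuinely needed is an instance whose software requires an IAM policy at boot, even though the instance never references that policy directly. All names below are placeholders.

```hcl
variable "ami_id" { type = string } # hypothetical input

resource "aws_iam_role" "app" {
  name = "example-app"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy" "app_s3" {
  name = "example-app-s3"
  role = aws_iam_role.app.id # implicit dependency on the role
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject"]
      Resource = "*"
    }]
  })
}

resource "aws_iam_instance_profile" "app" {
  name = "example-app"
  role = aws_iam_role.app.name
}

resource "aws_instance" "app" {
  ami                  = var.ami_id
  instance_type        = "t3.micro"
  iam_instance_profile = aws_iam_instance_profile.app.name

  # The instance does not reference the policy, so the ordering must be stated
  depends_on = [aws_iam_role_policy.app_s3]
}
```

Every other ordering in this example is inferred from references, which is why only one depends_on is required.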
