Infrastructure as Code (Terraform)
Providers, resources, and state (and why state/remote state matters), modules for reuse, and plan vs apply.
Terraform manages infrastructure as code: you declare the resources you want in HCL, and Terraform figures out the API calls to create, change, or destroy them to match. The concept that makes this safe — and the one interviewers always probe — is state: Terraform’s record of what it has already built, which it diffs against your code to compute the minimal change. Get state, and plan/apply, remote backends, drift, and modules all fall out of it.
plan vs apply — the two-step that makes it safe
The workflow that prevents you from accidentally destroying production: terraform plan computes and shows the diff (what will be created, changed, destroyed) without touching anything; terraform apply executes that plan after you confirm. You review the plan like a code diff before it runs.
| Command | Does | Touches infra? | Reads / writes state |
|---|---|---|---|
| `terraform init` | Downloads providers, configures the backend | No | Initializes the backend |
| `terraform plan` | Diffs code vs state vs reality, prints the change | No | Reads state (refreshes) |
| `terraform apply` | Executes the planned create/update/destroy | Yes | Writes/updates state |
| `terraform destroy` | Tears down everything in state | Yes | Empties state |
A first resource
A minimal config: configure a provider, then declare a resource. Terraform reads *.tf files in the directory as one config.
```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

variable "env" {
  type        = string
  description = "Environment name, e.g. staging or prod"
}

resource "aws_s3_bucket" "assets" {
  bucket = "myapp-assets-${var.env}"

  tags = {
    Environment = var.env
    ManagedBy   = "terraform"
  }
}

output "bucket_name" {
  value = aws_s3_bucket.assets.bucket
}
```
`terraform plan -var env=staging` shows it will create one bucket named `myapp-assets-staging`; `apply` creates it and records its real ID in state. Change the tags and re-plan: Terraform shows an in-place update, not a destroy-and-recreate, because it diffs the new desired attributes against what state says exists. The output exposes the bucket name to the CLI or to a parent module.
Remote state — why it matters
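State is the heart of Terraform. The default local `terraform.tfstate` file is fine for solo experiments but dangerous for a team: it lives on one machine and is easy to lose, it can hold secrets in plaintext, and nothing stops two people from applying against it at once. A remote backend keeps state in shared storage (S3, Azure Blob, GCS, Terraform Cloud), and a backend with locking serializes concurrent runs.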
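A minimal sketch of a remote backend, assuming AWS S3 for storage and a DynamoDB table for locking. The bucket and table names are hypothetical and must already exist; backend blocks cannot reference variables, so every setting is a literal.

```hcl
terraform {
  backend "s3" {
    bucket  = "myapp-terraform-state"     # hypothetical, pre-created state bucket
    key     = "staging/terraform.tfstate" # object path for this config's state
    region  = "us-east-1"
    encrypt = true                        # server-side encryption of state at rest

    # Hypothetical lock table; its partition key must be a string attribute named "LockID"
    dynamodb_table = "terraform-locks"
  }
}
```

After adding or changing a backend block, rerun `terraform init` so Terraform configures the backend (it offers to copy existing local state across). Recent Terraform releases also support S3-native locking with `use_lockfile = true` in place of the DynamoDB table; check the backend docs for the version you run.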
Modules for reuse
When you need the same stack across staging and prod, don’t copy-paste — extract a module (a directory of resources exposing input variables and outputs) and instantiate it with different inputs:
```hcl
module "staging_network" {
  source     = "./modules/vpc"
  cidr_block = "10.0.0.0/16"
  env        = "staging"
}

module "prod_network" {
  source     = "./modules/vpc"
  cidr_block = "10.1.0.0/16"
  env        = "prod"
}
```
One vetted module, two environments, no drift between them — the same DRY benefit modules give in application code.
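For reference, a hypothetical sketch of what `./modules/vpc` might contain for those two calls to work: the two input variables they set, one resource, and an output a caller could read as `module.prod_network.vpc_id`. The file layout and the `vpc_id` output name are assumptions, not part of the original example.

```hcl
# modules/vpc/main.tf (hypothetical module contents)

variable "cidr_block" {
  type        = string
  description = "CIDR range for the VPC"
}

variable "env" {
  type        = string
  description = "Environment name used in tags"
}

resource "aws_vpc" "this" {
  cidr_block = var.cidr_block

  tags = {
    Name        = "${var.env}-vpc"
    Environment = var.env
  }
}

# Exposed to callers as module.<name>.vpc_id
output "vpc_id" {
  value = aws_vpc.this.id
}
```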
Interview questions
What gets asked on this topic, with how to approach each question, the follow-ups interviewers push on, and the trap.
What is the Terraform state file, and why does it matter so much?
State is Terraform's record (`terraform.tfstate`, a JSON file) mapping each resource in your config to the real-world object it created: IDs, attributes, and metadata. Terraform needs it to know what it already manages, so on the next `plan` it can diff your desired config against reality and compute the minimal set of changes.
Without state, Terraform could not tell the difference between 'create a new resource' and 'this resource already exists, just update it', and it would have no way to know what to destroy. State also caches attribute values and tracks dependencies. Because it can contain sensitive values (passwords, keys) in plaintext, it must be protected, which leads straight into remote state.
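To make the plaintext point concrete, a minimal sketch assuming the `hashicorp/random` provider: `sensitive = true` only hides the value from CLI output, while the generated password is still written to state, so anyone who can read the state file can read it.

```hcl
resource "random_password" "db" {
  length  = 24
  special = true
}

output "db_password" {
  value     = random_password.db.result
  sensitive = true # redacted in plan/apply output, but stored in state in plaintext
}
```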
Follow-ups they push on:
- Why can't Terraform just query the cloud provider instead of keeping state?
- Why is committing `tfstate` to a git repo dangerous?
Red flag: Treating state as a disposable cache or committing it to git. It can hold secrets in plaintext, and a lost or corrupt state file orphans real infrastructure that Terraform no longer recognizes.
Source: Terraform docs (State)
What is remote state and state locking, and what problem do they solve on a team?
Local state lives on one engineer's laptop — useless for a team and easy to lose. Remote state stores the state file in a shared backend (S3, Azure Blob, GCS, Terraform Cloud) so everyone reads and writes the same source of truth, and sensitive state is not scattered across machines.
State locking prevents two people from running `apply` against the same state at the same time. Backends acquire a lock (e.g. S3 with a DynamoDB lock table, or native locking in Terraform Cloud) for the duration of the operation; a second concurrent `apply` fails to acquire the lock (or waits, if given `-lock-timeout`) instead of proceeding. Without locking, two simultaneous applies race on the state file: both start from the same state, and whichever writes last overwrites the other's changes, leaving Terraform's view inconsistent with reality.
Follow-ups they push on:
- What corrupts the state if two engineers apply at the same time without a lock?
- How do you implement locking with an S3 backend?
Red flag: Using a shared remote backend without locking. Concurrent applies race on the state file and corrupt it, after which plans no longer match reality.
Source: Terraform docs (Backends and remote state)
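For the S3-plus-DynamoDB pattern, the lock table is ordinary infrastructure, usually created once in a separate bootstrap config. A minimal sketch; the table name is hypothetical and must match whatever the backend's `dynamodb_table` setting names.

```hcl
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks" # hypothetical; must match the backend's dynamodb_table
  billing_mode = "PAY_PER_REQUEST" # on-demand billing; lock traffic is tiny
  hash_key     = "LockID"          # the S3 backend expects this exact string key

  attribute {
    name = "LockID"
    type = "S"
  }
}
```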
What is the difference between `terraform plan` and `terraform apply`?
`plan` is a dry run: Terraform refreshes state, compares your desired configuration against the current state, and prints the exact set of actions it would take (what gets created, updated in place, replaced as destroy+create, or destroyed) without changing anything. It is your review-before-you-touch-prod safety check, and you can save it to a file with `terraform plan -out=<file>`.
`apply` executes those changes against the real providers and then writes the new state. If you pass a saved plan file, `apply` runs exactly that plan with no surprises; without one, `apply` computes and shows the plan again and asks for confirmation. The senior habit is to always read the plan output (especially anything marked for replacement or destruction) before approving an apply.
Follow-ups they push on:
- What does it mean when a plan shows a resource will be replaced rather than updated in place?
- Why apply a saved plan file in automation?
Red flag: Running `apply -auto-approve` in CI without reviewing the plan. You can silently destroy and recreate a stateful resource (like a database) that a config change forced to be replaced.
Source: Terraform docs (terraform plan / apply)
What are Terraform modules and why do you use them?
A module is a reusable, parameterized bundle of Terraform resources — a directory with input variables, resources, and outputs. Instead of copy-pasting the same 200 lines to stand up a VPC or a service in dev, staging, and prod, you write it once as a module and call it three times with different inputs.
The payoff is DRY infrastructure, consistency (every environment provisions the same way), and an interface boundary: callers only deal with the module's variables and outputs, not its internals. Every Terraform config has an implicit root module; you compose it from child modules (your own, or versioned modules from the registry). The trap is over-abstracting too early — wrap something in a module once you actually have repetition, not speculatively.
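A short sketch of calling a versioned registry module, assuming the public `terraform-aws-modules/vpc/aws` module; the input values are illustrative. The `version` constraint limits which releases `terraform init` may download, so an upstream change can't silently alter your infrastructure.

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # any 5.x release, never 6.0

  name = "staging"
  cidr = "10.0.0.0/16"
}
```

The `version` argument applies only to registry sources; a local-path module like `./modules/vpc` is versioned along with the repo that contains it.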
Follow-ups they push on:
- How do you pass data in and out of a module?
- How do you pin a module to a specific version and why?
Red flag: Over-modularizing on day one. Wrapping a single-use resource in a deeply nested module hierarchy adds indirection without the reuse that justifies it.
Source: Terraform docs (Modules)
What is configuration drift, and how do you detect and reconcile it in Terraform?
Drift is when the real infrastructure no longer matches what Terraform's state/config says — typically because someone made a change by hand in the cloud console ('ClickOps') outside Terraform.
Detection: `terraform plan` refreshes state against the provider and shows the divergence as changes it wants to make; a `plan` that proposes changes you did not author is drift. Reconcile in one of two directions: bring the real resource back in line by re-applying your config, or, if the manual change is desirable, update the Terraform config to match (and apply). For resources created outside Terraform, `terraform import` brings them under management.
The durable fix is process: make Terraform the single source of truth, restrict console write access, and run `plan` in CI on a schedule to catch drift early.
Follow-ups they push on:
- How does a scheduled `plan` in CI help you catch drift?
- When would you update the config to match reality instead of reverting reality?
Red flag: Letting people make changes in the cloud console alongside Terraform. The next apply silently reverts their manual fix (or vice versa), and the two views of reality keep fighting.
Source: Terraform docs (Manage resource drift)
What is the difference between a Terraform provider and a resource?
A provider is a plugin that teaches Terraform how to talk to a specific platform's API: `aws`, `google`, `azurerm`, `cloudflare`, `kubernetes`. You configure it once (region, credentials), and it exposes the set of resource and data-source types for that platform.
A resource is a single managed object you declare: `resource "aws_s3_bucket" "assets" { ... }` describes one bucket. The provider knows how to create, read, update, and delete that resource type via the platform's API. So the provider is the integration layer; resources are the things you actually provision through it. A data source is the read-only sibling: it looks up existing infrastructure without managing it.
Follow-ups they push on:
- How is a data source different from a resource?
- Can one Terraform config use multiple providers at once?
Red flag: Confusing a resource with a data source. A resource is created and managed by Terraform; a data source only reads existing infrastructure and never creates anything.
Source: Terraform docs (Providers)
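A minimal sketch showing the three pieces together: the `aws` provider, a `data` source that looks up an existing AMI without managing it, and a `resource` that Terraform creates and owns. The name filter assumes Canonical's Ubuntu 22.04 images, and `099720109477` is Canonical's AMI-publishing account.

```hcl
# Provider: the plugin plus its configuration (region, credentials)
provider "aws" {
  region = "us-east-1"
}

# Data source: read-only lookup; Terraform never creates or destroys the AMI
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

# Resource: a managed object whose lifecycle Terraform tracks in state
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
}
```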
Why is Infrastructure as Code better than clicking through a cloud console, and what is the difference between declarative and imperative IaC?
IaC makes infrastructure versioned, reviewable, and reproducible. Config lives in git, so changes go through pull requests and code review, you have an audit trail, you can roll back, and you can stand up an identical environment on demand instead of relying on someone remembering which buttons they clicked. It greatly reduces configuration drift and snowflake servers.
Declarative vs imperative: declarative (Terraform) means you describe the desired end state and the tool figures out the steps and the diff to get there; apply it twice and nothing extra happens (idempotent). Imperative (a shell or SDK script) means you spell out the steps to take, and re-running can double-create or fail because the script does not reason about current state. Terraform is declarative, which is why `plan` can show you precisely what will change before anything happens.
Follow-ups they push on:
- Why does declarative IaC give you idempotency for free?
- How does putting infra in git change your change-management process?
Red flag: Describing Terraform as a script that 'runs commands to build infra'. That is the imperative mental model; Terraform reconciles toward a declared end state and is idempotent.
Source: Terraform docs (What is Terraform / intro)
How do you bring an existing, manually-created cloud resource under Terraform management?
You import it. Terraform's state knows nothing about resources it didn't create, so you have to tell it. The two-part move: (1) write a matching `resource` block in your config for the existing object, then (2) bring it into state, either with the CLI command `terraform import <resource_address> <real_id>` or, in Terraform 1.5 and later, an `import` block that does it as part of `plan`/`apply` (and can draft config for you via `terraform plan -generate-config-out=<file>`).
The critical detail interviewers probe: importing only updates state; it does not make your configuration match. If your hand-written resource block doesn't match the real object's settings, the very next `plan` will propose changes to 'fix' the real resource back to your (incomplete) config. So after importing you run `plan` and iterate on the config until the plan is clean (no changes); that confirms config, state, and reality all agree.
This is also how you remediate drift or ClickOps: adopt the orphaned resource instead of destroying and recreating it.
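A minimal sketch of the config-driven route (import blocks need Terraform 1.5 or later), assuming a hand-created bucket named `legacy-assets-bucket`; that name and the resource address are hypothetical. For S3 buckets the import ID is the bucket name.

```hcl
# Tells plan/apply to adopt the existing bucket into state at this address
import {
  to = aws_s3_bucket.legacy
  id = "legacy-assets-bucket" # hypothetical bucket created by hand
}

# Hand-written config for the same bucket; refine it until plan shows no changes
resource "aws_s3_bucket" "legacy" {
  bucket = "legacy-assets-bucket"
}
```

The CLI equivalent is `terraform import aws_s3_bucket.legacy legacy-assets-bucket`. Alternatively, keep only the `import` block and run `terraform plan -generate-config-out=generated.tf` to draft the resource block, then review the generated file like any other code.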
What a strong answer covers:
- Terraform doesn't manage anything it didn't create; you must import existing resources into state.
- Two steps: write a matching `resource` block, then `terraform import` (or an `import {}` block).
- Import updates state only; it does not fix your config to match the real resource.
- Iterate until `plan` shows no changes, proving config, state, and reality agree.
- It's the safe way to adopt ClickOps/orphaned resources without destroy-and-recreate.
Quick self-check: After `terraform import` of an existing bucket, the next `plan` wants to modify it. Why?
Answer: import never writes config, so any mismatch between your resource block and the real bucket shows up as a proposed change until you align the block. A normal import doesn't corrupt state; the diff comes from the config/real-world mismatch.
Follow-ups they push on:
- Why does a fresh import often produce a plan that wants to change the resource?
- What's the difference between the CLI `import` command and an `import` block?
- How does import help you fix drift without recreating infrastructure?
Red flag: Running `terraform import` and assuming you're done. Import only writes state, not config, so a mismatched resource block makes the next apply try to 'correct' the real resource; you must get a clean plan first.
Source: Terraform docs (Import existing resources)
How do you manage multiple environments (dev / staging / prod) in Terraform, and why are workspaces often the wrong tool?
The common pattern is separate state per environment with a shared module. You write the infrastructure once as a module, then have a thin per-environment root config (`environments/prod`, `environments/staging`) that calls the module with different variables (instance sizes, counts) and, crucially, its own backend and state file. This isolates blast radius: a bad `apply` in staging can't touch prod's state.
Terraform workspaces let one config switch between multiple state files (`default`, `dev`, `prod`) without copying code. They're tempting for environments but are usually the wrong fit: they share the same backend and code, it's easy to run `apply` against the wrong workspace by accident (there is no separate credentials or approval boundary), and they don't capture genuinely different configs well. They're better suited to short-lived, near-identical parallel copies (e.g. per-feature-branch ephemeral envs).
Senior answer: isolate prod with its own state, backend, and credentials; use modules for DRY; reserve workspaces for ephemeral, structurally identical environments.
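A sketch of one thin per-environment root config under that layout. The module path, its inputs (`env`, `instance_type`, `instance_count`), and the state bucket name are all hypothetical; `environments/staging` would be a near-copy with its own backend (ideally its own bucket and credentials) and smaller values.

```hcl
# environments/prod/main.tf (hypothetical layout)
terraform {
  backend "s3" {
    bucket = "myapp-terraform-state-prod" # prod-only state bucket
    key    = "prod/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = "us-east-1"
}

module "app" {
  source = "../../modules/app" # shared module, written once

  env            = "prod"
  instance_type  = "m5.large"
  instance_count = 3
}
```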
What a strong answer covers:
- Default pattern: one shared module plus thin per-env root configs with separate state/backends.
- Separate state per env isolates blast radius; staging mistakes can't corrupt prod.
- Workspaces swap state files on one config/backend: convenient, but no real isolation boundary.
- Workspace risk: applying to the wrong environment with no separate credentials/approval.
- Use workspaces for ephemeral, identical envs; use separate state+backend for dev/staging/prod.
Follow-ups they push on:
- Why does sharing a backend across environments via workspaces increase risk?
- How do modules keep multi-environment configs DRY?
- When are workspaces genuinely the right tool?
Red flag: Using a single workspace-switched config for prod and staging. One fat-fingered `terraform workspace select` and an `apply` hits the wrong environment, with no separate backend or credential boundary to stop it.
Source: Terraform docs (Workspaces)
What is the difference between `count` and `for_each` for creating multiple resources, and why does it matter for state?
Both create multiple instances of a resource, but they key the instances differently in state, and that's the whole game.
`count` produces a list indexed by integer position: `resource[0]`, `resource[1]`. `for_each` produces a map keyed by a stable string: `resource["web"]`, `resource["db"]`.
The trap with `count`: because instances are positional, removing an item from the middle of the list shifts every later index, so Terraform thinks those resources changed identity and proposes to destroy and recreate them. With `for_each`, each instance is bound to its own key, so deleting one only affects that one; the rest stay put.
Guidance: use `count` for N identical, order-independent copies (or a simple on/off toggle, `count = var.enabled ? 1 : 0`); use `for_each` whenever you iterate over a set/map of distinct things (named buckets, subnets per AZ) so that adding or removing one doesn't churn the others.
What a strong answer covers:
- `count` → list indexed by integer position; `for_each` → map keyed by a stable string.
- Removing a middle `count` element shifts later indices, forcing destroy/recreate of unrelated resources.
- `for_each` binds each instance to its key, so add/remove touches only that instance.
- Use `count` for N identical copies or an on/off toggle (`count = var.enabled ? 1 : 0`).
- Use `for_each` for a set/map of distinct named things (buckets, subnets per AZ).
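A minimal sketch of both patterns; the bucket names are hypothetical (real S3 names must be globally unique). With `for_each`, instances are addressed as `aws_s3_bucket.named["logs"]`, so dropping `"assets"` from the set plans a destroy for that one bucket only, while `count` appears here only as an on/off toggle.

```hcl
variable "bucket_names" {
  type    = set(string)
  default = ["logs", "assets", "backups"]
}

# for_each: each instance keyed by its name, e.g. aws_s3_bucket.named["logs"]
resource "aws_s3_bucket" "named" {
  for_each = var.bucket_names
  bucket   = "myapp-${each.key}" # hypothetical naming scheme
}

variable "enable_audit_bucket" {
  type    = bool
  default = false
}

# count as a toggle: zero or one instance, addressed as aws_s3_bucket.audit[0]
resource "aws_s3_bucket" "audit" {
  count  = var.enable_audit_bucket ? 1 : 0
  bucket = "myapp-audit"
}
```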
Quick self-check: You manage 5 distinct named S3 buckets and sometimes remove one from the middle. Which is safer?
Answer: `for_each` keyed by bucket name. Each bucket is bound to its key, so removing one doesn't disturb the others' state addresses. `count` over a list shifts later indices and makes Terraform recreate unrelated buckets; five separate resource blocks work but aren't DRY and defeat the purpose of iterating.
Follow-ups they push on:
- Why does deleting the first of three `count` resources recreate the other two?
- When is `count` still the right choice over `for_each`?
- How do you reference a specific instance under each approach?
Red flag: Using `count` over a list of distinct named resources. Removing or reordering an element shifts every later index, so Terraform destroys and recreates resources you never intended to touch; `for_each` keyed by name avoids the churn.
Source: Terraform docs (The for_each meta-argument)
Why is `terraform destroy` (or an accidental resource replacement) so dangerous, and how do you guard against it?
Terraform faithfully executes the declared end state, including deletion. The danger is that a config change can force a replace (destroy + create) of a resource you assumed would update in place: changing an attribute the provider marks 'ForceNew' (an EC2 instance's AMI, a database's engine, a subnet's CIDR block) makes Terraform plan to destroy the old object and create a new one. On a stateful resource like a production database, that's data loss executed by a routine-looking apply.
Guards, layered: (1) read the plan; anything showing `-/+` (destroy and then create) or `# forces replacement` is a red flag, so never `-auto-approve` blindly. (2) Add `lifecycle { prevent_destroy = true }` on critical resources so Terraform errors out rather than destroying them. (3) Use `create_before_destroy` where a replacement is acceptable but downtime isn't. (4) Take backups and enable deletion protection on the cloud side as a last line. (5) For stateful data stores, often manage them outside the same Terraform lifecycle as ephemeral compute.
The trick being tested: knowing that 'update' can silently mean 'replace', and that the plan output is your safety check.
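A sketch of the config-level guards, assuming a PostgreSQL RDS instance and an EC2 web server; identifiers, sizes, and variable names are hypothetical. `deletion_protection` is the AWS-side guard and `prevent_destroy` the Terraform-side one.

```hcl
variable "db_password" {
  type      = string
  sensitive = true
}

variable "web_ami_id" {
  type = string # hypothetical: AMI supplied by the caller
}

resource "aws_db_instance" "prod" {
  identifier          = "myapp-prod-db" # hypothetical
  engine              = "postgres"
  instance_class      = "db.t3.medium"
  allocated_storage   = 50
  username            = "app"
  password            = var.db_password
  deletion_protection = true # cloud-side guard: AWS rejects delete calls

  lifecycle {
    prevent_destroy = true # Terraform-side guard: a plan that destroys this errors out
  }
}

resource "aws_instance" "web" {
  ami           = var.web_ami_id # changing the AMI forces replacement
  instance_type = "t3.micro"

  lifecycle {
    create_before_destroy = true # build the replacement before removing the old one
  }
}
```

`prevent_destroy` only protects while the resource block stays in the config; deleting the block removes the guard along with it.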
What a strong answer covers:
- A config change to a ForceNew attribute makes Terraform destroy + recreate: potential data loss.
- The plan shows it as `-/+` / `# forces replacement`; that's your red flag to stop.
- `lifecycle { prevent_destroy = true }` makes Terraform refuse to destroy critical resources.
- `create_before_destroy` avoids downtime when a replace is genuinely acceptable.
- Layer cloud-side deletion protection / backups; manage stateful stores apart from ephemeral compute.
Follow-ups they push on:
- How do you tell from a plan that a resource will be replaced rather than updated in place?
- What does `prevent_destroy` actually do when a destroy is attempted?
- Why separate a production database's lifecycle from your app's Terraform?
Red flag: Approving a plan without noticing a `# forces replacement` on a stateful resource. Terraform will dutifully destroy the production database and create a fresh empty one, and `apply` doesn't ask 'are you sure this is a DB?'.
Source: Terraform docs (The lifecycle meta-argument)
What are input variables, outputs, and locals in Terraform, and how do they differ?
They're the three ways data flows through a config. Input variables (`variable`) are the parameters a module accepts from its caller, the public 'function arguments' (region, instance size), set via `.tfvars` files, CLI flags, or environment variables, and typed/validated. Outputs (`output`) are the values a module exposes back to its caller or the CLI, the 'return values' (a created VPC's ID, a load balancer's DNS name) that other modules consume. Locals (`locals`) are named intermediate expressions used *inside* a config to avoid repetition: computed once, referenced as `local.name`, never settable from outside.
The mental model: variables are inputs (caller → module), outputs are results (module → caller), locals are private helpers (internal only). This is exactly what makes a module a clean interface: callers only touch its variables and outputs, never its internals.
A practical note: mark sensitive variables/outputs `sensitive = true` so Terraform redacts them in plan/apply output. Redaction is display-only; the values are still stored in plaintext in state.
What a strong answer covers:
- Variables: a module's input parameters (caller → module), typed and validatable.
- Outputs: values a module returns (module → caller / CLI), consumed by other modules.
- Locals: private named expressions, computed once, used internally to avoid repetition.
- Together, variables + outputs form a module's clean public interface; locals stay internal.
- Use `sensitive = true` to redact secret variables/outputs from logs.
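A minimal sketch of all three working together; the names are illustrative, and the `validation` block shows what 'typed and validatable' looks like in practice.

```hcl
# Input variable: set by the caller (.tfvars, -var, or TF_VAR_env)
variable "env" {
  type = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.env)
    error_message = "The env variable must be one of: dev, staging, prod."
  }
}

# Locals: internal helpers, computed once, not settable from outside
locals {
  name_prefix = "myapp-${var.env}"
  common_tags = {
    Environment = var.env
    ManagedBy   = "terraform"
  }
}

resource "aws_s3_bucket" "logs" {
  bucket = "${local.name_prefix}-logs"
  tags   = local.common_tags
}

# Output: returned to the caller or printed by the CLI
output "logs_bucket_arn" {
  value = aws_s3_bucket.logs.arn
}
```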
Follow-ups they push on:
- Why can't a local be set from outside the module?
- How does one module consume another module's output?
- When would you mark a variable or output `sensitive`?
Red flag: Confusing locals with variables. A local is a computed internal helper that callers can't override, while a variable is the external input; using a local where you needed a configurable input makes the module non-parameterizable.
Source: Terraform docs (Variables and outputs)
How does Terraform decide the order to create resources? What are implicit vs explicit dependencies?
Terraform builds a dependency graph from your config and creates/updates/destroys resources in the order that graph implies, parallelizing wherever there's no dependency between resources. You rarely specify order yourself.
Implicit dependencies are inferred from references: if a security group rule uses `aws_vpc.main.id`, Terraform knows the VPC must exist first, because the rule reads an attribute of the VPC. This is the idiomatic, preferred way: wire resources together by referencing each other's attributes and the ordering falls out automatically (and correctly, including on destroy, which runs in reverse).
Explicit dependencies use `depends_on` to force an ordering Terraform can't infer, typically when there's a *hidden* relationship not expressed through a reference (e.g. an app needs an IAM policy attached before it runs, but doesn't reference the attachment's attributes). Use `depends_on` sparingly; over-using it usually means you should have referenced the attribute instead.
What a strong answer covers:
- Terraform builds a dependency graph and parallelizes independent resources automatically.
- Implicit deps: inferred from attribute references (`aws_vpc.main.id`), the idiomatic way.
- Referencing attributes gets ordering right for create *and* destroy (reverse order) for free.
- Explicit deps (`depends_on`): force an order for a hidden relationship not expressed by a reference.
- Use `depends_on` sparingly; usually a missing attribute reference is the real fix.
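A minimal sketch of both kinds, built around the IAM scenario in the answer; resource names are hypothetical and the AMI ID is left as an input. The security group depends on the VPC implicitly through `aws_vpc.main.id`. The instance references the instance profile (also implicit) but never the policy attachment, so `depends_on` ensures the permissions exist before it boots.

```hcl
variable "ami_id" {
  type = string # hypothetical: AMI supplied by the caller
}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# Implicit: reading aws_vpc.main.id orders the VPC first (and destroys it last)
resource "aws_security_group" "web" {
  name   = "web"
  vpc_id = aws_vpc.main.id
}

resource "aws_iam_role" "app" {
  name = "app-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "app_s3" {
  role       = aws_iam_role.app.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
}

resource "aws_iam_instance_profile" "app" {
  name = "app-profile"
  role = aws_iam_role.app.name
}

resource "aws_instance" "app" {
  ami                  = var.ami_id
  instance_type        = "t3.micro"
  iam_instance_profile = aws_iam_instance_profile.app.name

  # Explicit: nothing above references the attachment, so declare the hidden dependency
  depends_on = [aws_iam_role_policy_attachment.app_s3]
}
```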
Follow-ups they push on:
- Why is an implicit dependency via attribute reference preferred over `depends_on`?
- Give an example where `depends_on` is genuinely necessary.
- How does the graph handle destroy ordering?
Red flag: Sprinkling `depends_on` everywhere to 'be safe'. It serializes resources that could run in parallel and hides the real relationships; reference the attribute you depend on and let Terraform infer the order.
Source: Terraform docs (Resource dependencies)