6.2.4 ★ core [J][A] 13 interview Q's

Infrastructure as Code (Terraform)

Providers, resources, and state (and why state/remote state matters), modules for reuse, and plan vs apply.

Terraform manages infrastructure as code: you declare the resources you want in HCL, and Terraform works out the API calls to create, change, or destroy them to match. The concept that makes this safe, and the one interviewers most often probe, is state: Terraform's record of what it has already built, which it compares against your code and the live infrastructure to compute the changes needed. Once you understand state, plan/apply, remote backends, drift, and modules all follow from it.

Key vocabulary

Provider: A plugin that teaches Terraform how to talk to a specific platform's API — aws, google, azurerm, kubernetes, cloudflare. You configure it (region, credentials) once; it exposes the resource types you can declare.
Resource: A single managed piece of infrastructure — an EC2 instance, an S3 bucket, a DNS record. You declare its desired attributes; Terraform creates and tracks it. The core building block of every config.
State: Terraform's JSON record mapping your declared resources to the real-world objects it created (IDs, attributes). On every run it diffs code vs state vs reality to plan the smallest change. Lose the state and Terraform forgets what it owns.
Remote state: Storing the state file in a shared backend (S3 + DynamoDB lock, Terraform Cloud, GCS) instead of locally — so a team shares one source of truth and a state lock prevents two people applying at once and corrupting it.
Module: A reusable, parameterized bundle of resources (inputs → resources → outputs) — the function/package of Terraform. Write a vpc module once, instantiate it per environment with different variables.
Drift: When the real infrastructure no longer matches state — usually because someone changed a Terraform-managed resource by hand in the console. The next plan tries to revert it; the discipline is to change managed resources only through Terraform.

plan vs apply — the two-step that makes it safe

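The workflow that prevents you from accidentally destroying production: terraform plan computes and shows the diff (what will be created, changed, destroyed) without touching anything; terraform apply executes the changes after you confirm. Run without a saved plan file, apply computes a fresh plan and asks for approval; given a file saved with plan -out, it executes exactly that plan. You review the plan like a code diff before it runs.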

FIG 1 · the three-way diff Terraform computes the change by reconciling three pictures of the world: your code (desired), the state file (last-known), and the live API (reality). plan prints the diff; apply executes it and rewrites state.

Command	Does	Touches infra?	Reads / writes state
`init`	Downloads providers, configures the backend	No	Initializes the backend
`plan`	Diffs code vs state vs reality, prints the change	No	Reads state (refreshes)
`apply`	Executes the planned create/update/destroy	Yes	Writes/updates state
`destroy`	Tears down everything in state	Yes	Empties state

Always read the plan before you apply — it's the dry run that tells you what apply will really do.

A first resource

A minimal config: configure a provider, then declare a resource. Terraform reads *.tf files in the directory as one config.

An S3 bucket, parameterized with a variable

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

variable "env" {
  type        = string
  description = "Environment name, e.g. staging or prod"
}

resource "aws_s3_bucket" "assets" {
  bucket = "myapp-assets-${var.env}"

  tags = {
    Environment = var.env
    ManagedBy   = "terraform"
  }
}

output "bucket_name" {
  value = aws_s3_bucket.assets.bucket
}

terraform plan -var env=staging shows it will create one bucket named myapp-assets-staging; apply creates it and records its real ID in state. Change the tags and re-plan — Terraform shows an in-place update, not a destroy-and-recreate, because it diffs the new desired attributes against what state says exists. The output exposes the bucket name to the CLI or to a parent module.
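One way to make the env input safer is a validation block that rejects unexpected values at plan time. The sketch below is a hedged variant of the variable declaration above; the allow-list of staging and prod is an assumption for illustration, not something the original config specifies.

```hcl
variable "env" {
  type        = string
  description = "Environment name, e.g. staging or prod"

  validation {
    # Assumed allow-list; extend it to match your real environments.
    condition     = contains(["staging", "prod"], var.env)
    error_message = "The env value must be \"staging\" or \"prod\"."
  }
}
```

With this in place, a typo such as -var env=stagng should fail during plan instead of producing a bucket with an unintended name.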

Remote state — why it matters

State is the heart of Terraform, and the local default terraform.tfstate file is fine for solo experiments but dangerous for a team.

Local state on a team = corruption + leaked secrets

Two hazards with a local state file. First, concurrency: if two engineers run apply against the same infrastructure with their own copies of state, they overwrite each other’s record and Terraform loses track of what exists — orphaned or duplicated resources follow. Second, state contains sensitive values in plaintext (DB passwords, generated keys), so a terraform.tfstate committed to Git is a credential leak.

The fix is remote state with locking — e.g. an S3 backend plus a DynamoDB lock table:

terraform {
  backend "s3" {
    bucket         = "myapp-tfstate"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "tf-locks"   # lock table; Terraform holds a lock while it plans or applies
    encrypt        = true
  }
}

Now state lives in one encrypted, shared place; the lock guarantees only one apply mutates it at a time, and everyone plans against the same truth.
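The lock table named in the backend block has to exist before the backend can use it. A minimal sketch of that table follows, assuming it is created from a separate bootstrap config (that split is an assumption; the table name simply mirrors the backend block above). The S3 backend expects a string partition key named LockID.

```hcl
# Assumed to live in a separate bootstrap config, applied once before any
# config that points its backend at this table.
resource "aws_dynamodb_table" "tf_locks" {
  name         = "tf-locks" # must match dynamodb_table in the backend block
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # the S3 backend looks for this exact key name

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Newer Terraform releases also document an S3-native lock file option, so it is worth checking the S3 backend docs for the version you run before committing to the DynamoDB approach.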

Modules for reuse

When you need the same stack across staging and prod, don’t copy-paste — extract a module (a directory of resources exposing input variables and outputs) and instantiate it with different inputs:

module "staging_network" {
  source     = "./modules/vpc"
  cidr_block = "10.0.0.0/16"
  env        = "staging"
}

module "prod_network" {
  source     = "./modules/vpc"
  cidr_block = "10.1.0.0/16"
  env        = "prod"
}

One vetted module, two environments, and no copy-paste divergence between them: the same DRY benefit that functions and packages give in application code.
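For context, here is a hedged sketch of what the ./modules/vpc directory called above might contain. The file split and the vpc_id output are illustrative assumptions; only the cidr_block and env inputs come from the module calls shown.

```hcl
# modules/vpc/variables.tf: the module's inputs
variable "cidr_block" {
  type        = string
  description = "CIDR range for the VPC"
}

variable "env" {
  type        = string
  description = "Environment name used in tags"
}

# modules/vpc/main.tf: the resources the module manages
resource "aws_vpc" "this" {
  cidr_block = var.cidr_block

  tags = {
    Name        = "myapp-${var.env}"
    Environment = var.env
  }
}

# modules/vpc/outputs.tf: values returned to the caller
output "vpc_id" {
  value = aws_vpc.this.id
}
```

A caller would then read the result as module.staging_network.vpc_id or module.prod_network.vpc_id.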

The Friday-afternoon plan that read '- 47 to destroy' HashiCorp

The discipline of reading the plan before applying exists because of a recurring, expensive failure mode: an innocent-looking code change that Terraform interprets as destroy-and-recreate rather than update-in-place. Certain attribute changes — renaming a resource, changing a field the provider marks as “forces replacement” (an instance’s AZ, a bucket’s name, a DB’s engine) — make Terraform plan to delete the live resource and build a new one. On a stateful resource like a database or a load balancer with a stable IP, that’s an outage or data loss, not a tweak. The plan output spells it out (-/+ destroy and then create replacement, and a summary line like Plan: 1 to add, 0 to change, 1 to destroy), but only if you actually read it. Teams that skip straight to terraform apply -auto-approve in a hurry are the ones who learn this the hard way. The habit that saves you: treat every plan like a pull-request diff — scan the summary line, look for any destroy, and never auto-approve an apply you haven’t read.

read the writeup ↗ developer.hashicorp.com
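Renaming a resource is one of the replacement triggers mentioned above, and Terraform 1.1 and later offer a moved block to avoid it. The sketch below assumes the earlier S3 example (including its env variable) and a hypothetical new name, static_assets.

```hcl
# The bucket resource from the earlier example, renamed from "assets"
# to the hypothetical "static_assets".
resource "aws_s3_bucket" "static_assets" {
  bucket = "myapp-assets-${var.env}"
}

# Tells Terraform the object already in state has a new address,
# so the rename is not planned as destroy-and-recreate.
moved {
  from = aws_s3_bucket.assets
  to   = aws_s3_bucket.static_assets
}
```

The plan should then report the address change rather than a destroy; it is still worth reading the summary line to confirm.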

01 Learning objectives


02 Curated reading

Terraform — Intro & docs
essential 20m — Core concepts: providers, state, modules, plan/apply.

03 Knowledge check

Knowledge check · 1 question · pass ≥ 70%

01 · medium
Terraform state matters because it:

04 Interview questions


What gets asked on this topic: how to approach each question, the follow-ups, and the trap. Company tags are best-effort & sourced.

Commonly asked · mid · concept · very common
What is the Terraform state file, and why does it matter so much?
State is Terraform's record (terraform.tfstate, JSON) mapping each resource in your config to the real-world object it created — IDs, attributes, and metadata. Terraform needs it to know what it already manages, so on the next plan it can diff your desired config against reality and compute the minimal set of changes.
Without state, Terraform could not tell the difference between 'create a new resource' and 'this resource already exists, just update it', and it would have no way to know what to destroy. State also caches attribute values and tracks dependencies. Because it can contain sensitive values (passwords, keys) in plaintext, it must be protected — which leads straight into remote state.
Follow-ups they push on
- Why can't Terraform just query the cloud provider instead of keeping state?
- Why is committing tfstate to a git repo dangerous?
Red flag Treating state as a disposable cache or committing it to git — it can hold secrets in plaintext, and a lost/corrupt state file orphans real infrastructure that Terraform no longer recognizes.
source: Terraform docs — State ↗
Commonly asked · senior · concept · very common
What is remote state and state locking, and what problem do they solve on a team?
Local state lives on one engineer's laptop — useless for a team and easy to lose. Remote state stores the state file in a shared backend (S3, Azure Blob, GCS, Terraform Cloud) so everyone reads and writes the same source of truth, and sensitive state is not scattered across machines.
State locking prevents two people from running apply against the same state at the same time. Backends acquire a lock (e.g. S3 with a DynamoDB lock table, or native locking in Terraform Cloud) for the duration of the operation; by default a second concurrent run fails immediately with a lock error, or waits and retries if you pass -lock-timeout. Without locking, two simultaneous applies interleave writes and corrupt the state file, leaving Terraform's view inconsistent with reality.
Follow-ups they push on
- What corrupts the state if two engineers apply at the same time without a lock?
- How do you implement locking with an S3 backend?
Red flag Using a shared remote backend without locking — concurrent applies race on the state file and corrupt it, after which plans no longer match reality.
source: Terraform docs — Backends and remote state ↗
Commonly asked · junior · concept · very common
What is the difference between `terraform plan` and `terraform apply`?
plan is a dry run: Terraform refreshes state, compares your desired configuration against the current state, and prints the exact set of actions it would take — what gets created, updated in place, replaced (destroy+create), or destroyed — without changing anything. It is your review-before-you-touch-prod safety check, and you can save it to a file.
apply executes those changes against the real providers and then writes the new state. If you pass a saved plan file, apply runs exactly that plan with no surprises; without one, apply shows the plan again and asks for confirmation. The senior habit is to always read the plan output (especially anything marked for replacement/destruction) before approving an apply.
Follow-ups they push on
- What does it mean when a plan shows a resource will be replaced rather than updated in place?
- Why apply a saved plan file in automation?
Red flag Running `apply -auto-approve` in CI without reviewing the plan — you can silently destroy and recreate a stateful resource (like a database) that a config change forced to be replaced.
source: Terraform docs — terraform plan / apply ↗
Commonly asked · mid · concept · common
What are Terraform modules and why do you use them?
A module is a reusable, parameterized bundle of Terraform resources — a directory with input variables, resources, and outputs. Instead of copy-pasting the same 200 lines to stand up a VPC or a service in dev, staging, and prod, you write it once as a module and call it three times with different inputs.
The payoff is DRY infrastructure, consistency (every environment provisions the same way), and an interface boundary: callers only deal with the module's variables and outputs, not its internals. Every Terraform config has an implicit root module; you compose it from child modules (your own, or versioned modules from the registry). The trap is over-abstracting too early — wrap something in a module once you actually have repetition, not speculatively.
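As an illustration of a versioned registry module, the sketch below calls a public VPC module with a pinned version constraint. The module choice, the version constraint, and the input values are assumptions for the example; check the module's registry page for its current inputs.

```hcl
module "vpc" {
  # Public registry module; the constraint below is illustrative.
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  # Hypothetical inputs for a staging network.
  name = "myapp-staging"
  cidr = "10.0.0.0/16"
}
```

Pinning the version means a new module release cannot change your plan until you deliberately raise the constraint and review the resulting diff.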
Follow-ups they push on
- How do you pass data in and out of a module?
- How do you pin a module to a specific version and why?
Red flag Over-modularizing on day one — wrapping a single-use resource in a deeply nested module hierarchy adds indirection without the reuse that justifies it.
source: Terraform docs — Modules ↗
Commonly asked · senior · concept · common
What is configuration drift, and how do you detect and reconcile it in Terraform?
Drift is when the real infrastructure no longer matches what Terraform's state/config says — typically because someone made a change by hand in the cloud console ('ClickOps') outside Terraform.
Detection: terraform plan refreshes state against the provider and shows the divergence as changes it wants to make; a plan that proposes changes you did not author is drift. Reconcile in one of two directions: bring the real resource back in line by re-applying your config, or, if the manual change is desirable, update the Terraform config to match (and apply). For resources created outside Terraform, terraform import brings them under management.
The durable fix is process: make Terraform the single source of truth, restrict console write access, and run plan in CI on a schedule to catch drift early.
Follow-ups they push on
- How does a scheduled `plan` in CI help you catch drift?
- When would you update the config to match reality instead of reverting reality?
Red flag Letting people make changes in the cloud console alongside Terraform — the next apply silently reverts their manual fix (or vice versa), and the two views of reality keep fighting.
source: Terraform docs — Manage resource drift ↗
Commonly asked · junior · concept · common
What is the difference between a Terraform provider and a resource?
A provider is a plugin that teaches Terraform how to talk to a specific platform's API — aws, google, azurerm, cloudflare, kubernetes. You configure it once (region, credentials), and it exposes the set of resource and data-source types for that platform.
A resource is a single managed object you declare — resource "aws_s3_bucket" "assets" { ... } describes one bucket. The provider knows how to create, read, update, and delete that resource type via the platform's API. So: the provider is the integration layer; resources are the things you actually provision through it. A data source is the read-only sibling — it looks up existing infrastructure without managing it.
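The resource versus data source distinction can be sketched side by side. This assumes an AWS provider is already configured and that the account has a default VPC; the security group name is hypothetical.

```hcl
# Data source: looks up the account's existing default VPC; creates nothing.
data "aws_vpc" "default" {
  default = true
}

# Resource: Terraform creates, tracks, and can destroy this security group.
resource "aws_security_group" "web" {
  name   = "web-sg" # hypothetical name
  vpc_id = data.aws_vpc.default.id
}
```

Running destroy on this config would remove the security group but leave the default VPC untouched, because the VPC was only read, never managed.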
Follow-ups they push on
- How is a data source different from a resource?
- Can one Terraform config use multiple providers at once?
Red flag Confusing a resource with a data source — a resource is created and managed by Terraform; a data source only reads existing infrastructure and never creates anything.
source: Terraform docs — Providers ↗
Commonly asked · mid · concept · common
Why is Infrastructure as Code better than clicking through a cloud console, and what is the difference between declarative and imperative IaC?
IaC makes infrastructure versioned, reviewable, and reproducible. Config lives in git, so changes go through pull requests and code review, you have an audit trail, you can roll back, and you can stand up an identical environment on demand instead of relying on someone remembering which buttons they clicked. It greatly reduces configuration drift and snowflake servers, though manual console changes can still introduce drift.
Declarative vs imperative: declarative (Terraform) means you describe the desired end state and the tool figures out the steps and the diff to get there — apply it twice and nothing extra happens (idempotent). Imperative (a shell/SDK script) means you spell out the steps to take, and re-running can double-create or fail because it does not reason about current state. Terraform is declarative, which is why plan can show you precisely what will change before anything happens.
Follow-ups they push on
- Why does declarative IaC give you idempotency for free?
- How does putting infra in git change your change-management process?
Red flag Describing Terraform as a script that 'runs commands to build infra' — that is the imperative mental model; Terraform reconciles toward a declared end state and is idempotent.
source: Terraform docs — What is Terraform / intro ↗
Commonly asked · senior · concept · common
How do you bring an existing, manually-created cloud resource under Terraform management?
You import it — Terraform's state knows nothing about resources it didn't create, so you have to tell it. The two-part move: (1) write a matching resource block in your config for the existing object, then (2) bring it into state, either with the CLI terraform import <resource_address> <real_id> or, in modern Terraform, an import block that does it as part of plan/apply (and can even generate config).
The critical detail interviewers probe: importing only updates state, it does not write your configuration. If your hand-written resource block doesn't match the real object's settings, the very next plan will propose changes to 'fix' the real resource back to your (incomplete) config. So after importing you run plan and iterate on the config until the plan is clean (no changes) — that confirms config, state, and reality all agree.
This is also how you remediate drift / ClickOps: adopt the orphaned resource instead of destroying and recreating it.
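The two-part move can be sketched with an import block, available in Terraform 1.5 and later. The bucket name and the resource label legacy are hypothetical; for S3 buckets the import ID is the bucket name.

```hcl
# Step 1: a resource block describing the existing bucket.
resource "aws_s3_bucket" "legacy" {
  bucket = "myapp-legacy-uploads" # hypothetical existing bucket
}

# Step 2: ask Terraform to adopt the real bucket into state on the next apply.
import {
  to = aws_s3_bucket.legacy
  id = "myapp-legacy-uploads"
}
```

The next plan should list the bucket as an import rather than a create; any other proposed changes indicate the resource block does not yet match the real bucket's settings.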
What a strong answer covers
- Terraform ignores anything it didn't create — you must import existing resources into state.
- Two steps: write a matching resource block, then terraform import (or an import {} block).
- The CLI import command updates state only; it does not generate or fix your config.
- Iterate until plan shows no changes, proving config + state + reality agree.
- It's the safe way to adopt ClickOps/orphaned resources without destroy-and-recreate.
Quick self-check
After `terraform import` of an existing bucket, the next `plan` wants to modify it. Why?
Follow-ups they push on
- Why does a fresh import often produce a plan that wants to change the resource?
- What's the difference between the CLI `import` command and an `import` block?
- How does import help you fix drift without recreating infrastructure?
Red flag Running `terraform import` and assuming you're done — import only writes state, not config, so a mismatched resource block makes the next apply try to 'correct' the real resource; you must get a clean plan first.
source: Terraform docs — Import existing resources ↗
Commonly asked · senior · concept · common
How do you manage multiple environments (dev / staging / prod) in Terraform, and why are workspaces often the wrong tool?
The common pattern: separate state per environment with a shared module. You write the infrastructure once as a module, then have a thin per-environment root config (environments/prod, environments/staging) that calls the module with different variables (instance sizes, counts) and, crucially, its own backend/state file. This isolates blast radius: a bad apply in staging can't touch prod's state.
Terraform workspaces let one config switch between multiple state files (default, dev, prod) without copying code. They're tempting for environments but are usually the wrong fit: they share the same backend and code, it's easy to run apply against the wrong workspace by accident (no separate credentials/approval boundary), and they don't capture genuinely different configs well. They're better suited to short-lived, near-identical parallel copies (e.g. per-feature-branch ephemeral envs).
Senior answer: isolate prod with its own state, backend, and credentials; use modules for DRY; reserve workspaces for ephemeral, structurally-identical environments.
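A thin per-environment root config might look like the sketch below. The directory layout, the prod-only state bucket name, and the module path are assumptions for illustration.

```hcl
# environments/prod/main.tf (hypothetical layout)
terraform {
  backend "s3" {
    bucket = "myapp-tfstate-prod" # assumed: a state bucket used only by prod
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

# Same shared module as every other environment, with prod-specific inputs.
module "network" {
  source     = "../../modules/vpc"
  cidr_block = "10.1.0.0/16"
  env        = "prod"
}
```

environments/staging would hold a near-identical file with its own backend bucket or key and its own inputs, so an apply run from one directory cannot write to the other's state.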
What a strong answer covers
- Default pattern: one shared module + thin per-env root configs with separate state/backends.
- Separate state per env isolates blast radius — staging mistakes can't corrupt prod.
- Workspaces swap state files on one config/backend — convenient but no real isolation boundary.
- Workspace risk: applying to the wrong environment with no separate credentials/approval.
- Use workspaces for ephemeral, identical envs; use separate state+backend for dev/staging/prod.
Follow-ups they push on
- Why does sharing a backend across environments via workspaces increase risk?
- How do modules keep multi-environment configs DRY?
- When are workspaces genuinely the right tool?
Red flag Using a single workspace-switched config for prod and staging — one fat-fingered `terraform workspace select` and an `apply` hits the wrong environment, with no separate backend or credential boundary to stop it.
source: Terraform docs — Workspaces ↗
Commonly asked · senior · concept · occasional
What is the difference between `count` and `for_each` for creating multiple resources, and why does it matter for state?
Both create multiple instances of a resource, but they key the instances differently in state, and that's the whole game. count produces a list indexed by integer position — resource[0], resource[1]. for_each produces a map keyed by a stable string — resource["web"], resource["db"].
The trap with count: because instances are positional, removing an item from the middle of the list shifts every later index, so Terraform thinks those resources changed identity and proposes to destroy-and-recreate them. With for_each, each instance is bound to its own key, so deleting one only affects that one — the rest stay put.
Guidance: use count for N identical, order-independent copies (or a simple on/off toggle, count = var.enabled ? 1 : 0); use for_each whenever you iterate over a set/map of distinct things (named buckets, subnets per AZ) so that adding or removing one doesn't churn the others.
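Both uses from the guidance above can be sketched in one config. The variable name and bucket names are hypothetical.

```hcl
variable "enable_logs_bucket" {
  type    = bool
  default = false
}

# count as an on/off toggle: zero or one instance,
# addressed as aws_s3_bucket.logs[0] when enabled.
resource "aws_s3_bucket" "logs" {
  count  = var.enable_logs_bucket ? 1 : 0
  bucket = "myapp-logs"
}

# for_each over distinct names: each instance is keyed by its name,
# e.g. aws_s3_bucket.named["uploads"].
resource "aws_s3_bucket" "named" {
  for_each = toset(["assets", "uploads", "backups"])
  bucket   = "myapp-${each.key}"
}
```

Removing "uploads" from the set should plan a destroy for aws_s3_bucket.named["uploads"] only; the other two keys keep their addresses.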
What a strong answer covers
- count → list indexed by integer position; for_each → map keyed by a stable string.
- Removing a middle count element shifts later indices, forcing destroy/recreate of unrelated resources.
- for_each binds each instance to its key, so add/remove touches only that instance.
- Use count for N identical copies or an on/off toggle (count = enabled ? 1 : 0).
- Use for_each for a set/map of distinct named things (buckets, subnets per AZ).
Quick self-check
You manage 5 distinct named S3 buckets and sometimes remove one from the middle. Which is safer?
Follow-ups they push on
- Why does deleting the first of three `count` resources recreate the other two?
- When is `count` still the right choice over `for_each`?
- How do you reference a specific instance under each approach?
Red flag Using `count` over a list of distinct named resources — removing or reordering an element shifts every later index, so Terraform destroys and recreates resources you never intended to touch; `for_each` keyed by name avoids the churn.
source: Terraform docs — The for_each meta-argument ↗
Commonly asked · senior · trick · occasional
Why is `terraform destroy` (or an accidental resource replacement) so dangerous, and how do you guard against it?
Terraform faithfully executes the declared end state, including deletion. The danger is that a config change can force a replace (destroy + create) of a resource you assumed would update in place: changing an attribute marked 'ForceNew' (an EC2 instance's AMI or subnet, a database's engine) makes Terraform plan to destroy the old object and create a new one. On a stateful resource like a production database, that's data loss executed by a routine-looking apply.
Guards, layered: (1) read the plan — anything showing -/+ destroy and then create or # forces replacement is a red flag, never -auto-approve blindly. (2) Add lifecycle { prevent_destroy = true } on critical resources so Terraform errors out rather than destroying them. (3) Use create_before_destroy where a replacement is acceptable but downtime isn't. (4) Take backups / enable deletion protection on the cloud side as a last line. (5) For stateful data stores, often manage them outside the same Terraform lifecycle as ephemeral compute.
The trick being tested: knowing that 'update' can silently mean 'replace', and that the plan output is your safety check.
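Guards (2) through (4) can be sketched on concrete resources. The identifiers, sizes, and names are hypothetical; the point is where the lifecycle block and the cloud-side flag sit.

```hcl
variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "main" {
  identifier          = "myapp-prod" # hypothetical
  engine              = "postgres"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  username            = "app"
  password            = var.db_password
  deletion_protection = true # cloud-side guard, enforced by AWS

  lifecycle {
    # Any plan that would destroy this resource fails with an error instead.
    prevent_destroy = true
  }
}

resource "aws_security_group" "app" {
  name_prefix = "myapp-" # a prefix lets the replacement coexist with the old group

  lifecycle {
    # On replacement, build the new group before deleting the old one.
    create_before_destroy = true
  }
}
```

One caveat worth knowing: prevent_destroy only applies while the resource block is still in the config, so deleting the whole block removes the guard along with it. That is one reason to keep the cloud-side protection as well.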
What a strong answer covers
- A config change to a ForceNew attribute makes Terraform destroy + recreate — potential data loss.
- The plan shows it as -/+ / # forces replacement — that's your red flag to stop.
- lifecycle { prevent_destroy = true } makes Terraform refuse to destroy critical resources.
- create_before_destroy avoids downtime when a replace is genuinely acceptable.
- Layer cloud-side deletion protection / backups; manage stateful stores apart from ephemeral compute.
Follow-ups they push on
- How do you tell from a plan that a resource will be replaced rather than updated in place?
- What does `prevent_destroy` actually do when a destroy is attempted?
- Why separate a production database's lifecycle from your app's Terraform?
Red flag Approving a plan without noticing a `# forces replacement` on a stateful resource — Terraform will dutifully destroy the production database and create a fresh empty one, and `apply` doesn't ask 'are you sure this is a DB?'.
source: Terraform docs — The lifecycle meta-argument ↗
Commonly asked · mid · concept · occasional
What are input variables, outputs, and locals in Terraform, and how do they differ?
They're the three ways data flows through a config. Input variables (variable) are the parameters a module accepts from its caller — the public 'function arguments' (region, instance size), set via .tfvars, CLI flags, or env vars, and typed/validated. Outputs (output) are the values a module exposes back to its caller or the CLI — the 'return values' (a created VPC's ID, a load balancer's DNS name) that other modules consume. Locals (locals) are named intermediate expressions used *inside* a config to avoid repetition — computed once, referenced as local.name, never settable from outside.
The mental model: variables are inputs (caller → module), outputs are results (module → caller), locals are private helpers (internal only). This is exactly what makes a module a clean interface: callers only touch its variables and outputs, never its internals.
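A practical note: mark sensitive variables/outputs sensitive = true so Terraform redacts them in plan/apply output. That only hides them from the CLI display; the values are still stored in plaintext in state, so the state file itself must still be protected.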
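The three constructs can be seen together in one small config. This is a minimal sketch with hypothetical names, assuming an AWS provider is configured.

```hcl
# Input variables: set by the caller (tfvars, -var, or TF_VAR_ env vars).
variable "env" {
  type = string
}

variable "api_token" {
  type      = string
  sensitive = true # redacted in plan/apply output
}

# Locals: private helpers, referenced as local.<name>, not settable from outside.
locals {
  name_prefix = "myapp-${var.env}"
  common_tags = {
    Environment = var.env
    ManagedBy   = "terraform"
  }
}

resource "aws_s3_bucket" "assets" {
  bucket = "${local.name_prefix}-assets"
  tags   = local.common_tags
}

# Output: the value handed back to the caller or printed by the CLI.
output "bucket_name" {
  value = aws_s3_bucket.assets.bucket
}
```

A parent module calling this one would read the result as module.<name>.bucket_name, and could set env but not name_prefix.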
What a strong answer covers
- Variables: a module's input parameters (caller → module), typed and validatable.
- Outputs: values a module returns (module → caller / CLI), consumed by other modules.
- Locals: private named expressions, computed once, used internally to avoid repetition.
- Together, variables + outputs form a module's clean public interface; locals stay internal.
- Use sensitive = true to redact secret variables/outputs from logs.
Follow-ups they push on
- Why can't a local be set from outside the module?
- How does one module consume another module's output?
- When would you mark a variable or output `sensitive`?
Red flag Confusing locals with variables — a local is a computed internal helper that callers can't override, while a variable is the external input; using a local where you needed a configurable input makes the module non-parameterizable.
source: Terraform docs — Variables and outputs ↗
Commonly asked · mid · concept · occasional
How does Terraform decide the order to create resources? What are implicit vs explicit dependencies?
Terraform builds a dependency graph from your config and creates/updates/destroys resources in the order that graph implies, parallelizing wherever there's no dependency between resources. You rarely specify order yourself.
Implicit dependencies are inferred from references: if a security group rule uses aws_vpc.main.id, Terraform knows the VPC must exist first, because the rule reads an attribute of the VPC. This is the idiomatic, preferred way — wire resources together by referencing each other's attributes and the ordering falls out automatically (and correctly, including on destroy, which runs in reverse).
Explicit dependencies use depends_on to force an ordering Terraform can't infer — typically when there's a *hidden* relationship not expressed through a reference (e.g. an app needs an IAM policy attached before it runs, but doesn't reference the attachment's attributes). Use depends_on sparingly; over-using it usually means you should have referenced the attribute instead.
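Both kinds of dependency can be shown in one sketch. The AMI variable, bucket name, and the idea that the instance reads the bucket at boot are assumptions made for illustration.

```hcl
variable "ami_id" {
  type        = string
  description = "AMI for the app instance (hypothetical input)"
}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# Implicit dependency: referencing aws_vpc.main.id orders the subnet after the VPC.
resource "aws_subnet" "app" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}

resource "aws_s3_bucket" "config" {
  bucket = "myapp-boot-config" # hypothetical
}

# Explicit dependency: the instance is assumed to read the bucket at boot,
# but no argument references it, so Terraform cannot infer the ordering.
resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.app.id # implicit dependency on the subnet

  depends_on = [aws_s3_bucket.config]
}
```

Without the depends_on line, Terraform would be free to create the bucket and the instance in parallel, since nothing in the instance's arguments points at the bucket.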
What a strong answer covers
- Terraform builds a dependency graph and parallelizes independent resources automatically.
- Implicit deps: inferred from attribute references (aws_vpc.main.id) — the idiomatic way.
- Referencing attributes gets ordering right for create *and* destroy (reverse order) for free.
- Explicit deps (depends_on): force an order for a hidden relationship not expressed by a reference.
- Use depends_on sparingly — usually a missing attribute reference is the real fix.
Follow-ups they push on
- Why is an implicit dependency via attribute reference preferred over `depends_on`?
- Give an example where `depends_on` is genuinely necessary.
- How does the graph handle destroy ordering?
Red flag Sprinkling `depends_on` everywhere to 'be safe' — it serializes resources that could run in parallel and hides the real relationships; reference the attribute you depend on and let Terraform infer the order.
source: Terraform docs — Resource dependencies ↗