
Infrastructure as Code (Terraform)

Providers, resources, and state (and why state/remote state matters), modules for reuse, and plan vs apply.

Terraform manages infrastructure as code: you declare the resources you want in HCL, and Terraform figures out the API calls to create, change, or destroy them to match. The concept that makes this safe — and the one interviewers always probe — is state: Terraform’s record of what it has already built, which it diffs against your code to compute the minimal change. Get state, and plan/apply, remote backends, drift, and modules all fall out of it.

plan vs apply — the two-step that makes it safe

The workflow that prevents you from accidentally destroying production: terraform plan computes and shows the diff (what will be created, changed, destroyed) without touching anything; terraform apply executes that plan after you confirm. You review the plan like a code diff before it runs.

[Figure: three boxes feed into a plan step: *.tf code (desired state), terraform.tfstate (last-known state), and the live cloud API (reality). plan produces a three-way diff (+ create, ~ update, - destroy) without making changes, to be reviewed like a code diff. On confirmation, apply mutates the cloud and rewrites the state file together.]
FIG 1 · the three-way diff Terraform computes the change by reconciling three pictures of the world: your code (desired), the state file (last-known), and the live API (reality). plan prints the diff; apply executes it and rewrites state.
| Command | Does | Touches infra? | Reads / writes state |
|---|---|---|---|
| init | Downloads providers, configures the backend | No | Initializes the backend |
| plan | Diffs code vs state vs reality, prints the change | No | Reads state (refreshes) |
| apply | Executes the planned create/update/destroy | Yes | Writes/updates state |
| destroy | Tears down everything in state | Yes | Empties state |
Always read the plan before you apply — it's the dry run that tells you what apply will really do.

A first resource

A minimal config: configure a provider, then declare a resource. Terraform reads *.tf files in the directory as one config.

An S3 bucket, parameterized with a variable
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

variable "env" {
  type        = string
  description = "Environment name, e.g. staging or prod"
}

resource "aws_s3_bucket" "assets" {
  bucket = "myapp-assets-${var.env}"

  tags = {
    Environment = var.env
    ManagedBy   = "terraform"
  }
}

output "bucket_name" {
  value = aws_s3_bucket.assets.bucket
}

terraform plan -var env=staging shows it will create one bucket named myapp-assets-staging; apply creates it and records its real ID in state. Change the tags and re-plan — Terraform shows an in-place update, not a destroy-and-recreate, because it diffs the new desired attributes against what state says exists. The output exposes the bucket name to the CLI or to a parent module.

Remote state — why it matters

State is the heart of Terraform, and the local default terraform.tfstate file is fine for solo experiments but dangerous for a team.
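
The usual remedy is a shared backend with locking, so the whole team reads and writes one state file and concurrent runs cannot collide. A minimal sketch, assuming an S3 bucket and a DynamoDB lock table that already exist; the bucket, key, and table names are hypothetical placeholders:

```hcl
# Hypothetical names throughout. The bucket and lock table must exist
# before `terraform init` can configure this backend.
terraform {
  backend "s3" {
    bucket         = "myapp-terraform-state"      # assumed bucket name
    key            = "staging/terraform.tfstate"  # path of the state object
    region         = "us-east-1"
    encrypt        = true                         # encrypt state at rest
    dynamodb_table = "myapp-terraform-locks"      # assumed lock table
  }
}
```

After adding or changing a backend block, terraform init is what connects the working directory to it. Recent Terraform releases also offer S3-native locking as an alternative to the DynamoDB table, so check the backend documentation for the version you run.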

Modules for reuse

When you need the same stack across staging and prod, don’t copy-paste — extract a module (a directory of resources exposing input variables and outputs) and instantiate it with different inputs:

module "staging_network" {
  source     = "./modules/vpc"
  cidr_block = "10.0.0.0/16"
  env        = "staging"
}

module "prod_network" {
  source     = "./modules/vpc"
  cidr_block = "10.1.0.0/16"
  env        = "prod"
}

One vetted module, two environments, no drift between them — the same DRY benefit modules give in application code.
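
For illustration, the ./modules/vpc directory those calls point at might look roughly like this. The file layout, the resource name, and the vpc_id output are assumptions for the sketch; only the cidr_block and env inputs come from the calls above:

```hcl
# modules/vpc/variables.tf (hypothetical layout)
variable "cidr_block" {
  type        = string
  description = "CIDR range for the VPC"
}

variable "env" {
  type        = string
  description = "Environment name, used for naming and tagging"
}

# modules/vpc/main.tf
resource "aws_vpc" "this" {
  cidr_block = var.cidr_block

  tags = {
    Name        = "myapp-${var.env}"
    Environment = var.env
  }
}

# modules/vpc/outputs.tf
output "vpc_id" {
  value = aws_vpc.this.id
}
```

A caller would then read the result as module.staging_network.vpc_id or module.prod_network.vpc_id, without knowing anything else about the module's internals.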

Interview questions

What gets asked on this topic: for each question, how to approach it and the trap to avoid.

  • [mid · concept · very common] What is the Terraform state file, and why does it matter so much?

    State is Terraform's record (terraform.tfstate, JSON) mapping each resource in your config to the real-world object it created — IDs, attributes, and metadata. Terraform needs it to know what it already manages, so on the next plan it can diff your desired config against reality and compute the minimal set of changes.

    Without state, Terraform could not tell the difference between 'create a new resource' and 'this resource already exists, just update it', and it would have no way to know what to destroy. State also caches attribute values and tracks dependencies. Because it can contain sensitive values (passwords, keys) in plaintext, it must be protected — which leads straight into remote state.

    Red flag Treating state as a disposable cache or committing it to git — it can hold secrets in plaintext, and a lost/corrupt state file orphans real infrastructure that Terraform no longer recognizes.

    source: Terraform docs (State)
  • [senior · concept · very common] What is remote state and state locking, and what problem do they solve on a team?

    Local state lives on one engineer's laptop — useless for a team and easy to lose. Remote state stores the state file in a shared backend (S3, Azure Blob, GCS, Terraform Cloud) so everyone reads and writes the same source of truth, and sensitive state is not scattered across machines.

    State locking prevents two people from running apply against the same state at the same time. Backends acquire a lock (e.g. S3 with a DynamoDB lock table, or native locking in Terraform Cloud) for the duration of the operation; a second concurrent apply fails with a lock error by default, or waits for the lock to release if you pass -lock-timeout. Without locking, two simultaneous applies can interleave writes and corrupt the state file, leaving Terraform's view inconsistent with reality.

    Red flag Using a shared remote backend without locking — concurrent applies race on the state file and corrupt it, after which plans no longer match reality.

    source: Terraform docs (Backends and remote state)
  • [junior · concept · very common] What is the difference between `terraform plan` and `terraform apply`?

    plan is a dry run: Terraform refreshes state, compares your desired configuration against the current state, and prints the exact set of actions it would take — what gets created, updated in place, replaced (destroy+create), or destroyed — without changing anything. It is your review-before-you-touch-prod safety check, and you can save it to a file.

    apply executes those changes against the real providers and then writes the new state. If you pass a saved plan file, apply runs exactly that plan with no surprises; without one, apply shows the plan again and asks for confirmation. The senior habit is to always read the plan output (especially anything marked for replacement/destruction) before approving an apply.

    Red flag Running `apply -auto-approve` in CI without reviewing the plan — you can silently destroy and recreate a stateful resource (like a database) that a config change forced to be replaced.

    source: Terraform docs (terraform plan / apply)
  • [mid · concept · common] What are Terraform modules and why do you use them?

    A module is a reusable, parameterized bundle of Terraform resources — a directory with input variables, resources, and outputs. Instead of copy-pasting the same 200 lines to stand up a VPC or a service in dev, staging, and prod, you write it once as a module and call it three times with different inputs.

    The payoff is DRY infrastructure, consistency (every environment provisions the same way), and an interface boundary: callers only deal with the module's variables and outputs, not its internals. Every Terraform config has an implicit root module; you compose it from child modules (your own, or versioned modules from the registry). The trap is over-abstracting too early — wrap something in a module once you actually have repetition, not speculatively.

    Red flag Over-modularizing on day one — wrapping a single-use resource in a deeply nested module hierarchy adds indirection without the reuse that justifies it.

    source: Terraform docs (Modules)
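
    As a sketch of the "versioned modules from the registry" case mentioned above, a root module might call a community VPC module like this. The input names reflect that module's interface as I understand it, and the names and CIDR ranges are hypothetical, so check the registry page for the version you pin:

    ```hcl
    module "vpc" {
      source  = "terraform-aws-modules/vpc/aws"
      version = "~> 5.0" # pin the version so upgrades are deliberate

      name = "myapp-staging" # hypothetical name
      cidr = "10.0.0.0/16"

      azs             = ["us-east-1a", "us-east-1b"]
      private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
    }

    # The root module only touches the child module's outputs.
    output "vpc_id" {
      value = module.vpc.vpc_id
    }
    ```

    The version constraint is the part that differs from a local ./modules path: a registry module changes underneath you unless you pin it.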
  • [senior · concept · common] What is configuration drift, and how do you detect and reconcile it in Terraform?

    Drift is when the real infrastructure no longer matches what Terraform's state/config says — typically because someone made a change by hand in the cloud console ('ClickOps') outside Terraform.

    Detection: terraform plan refreshes state against the provider and shows the divergence as changes it wants to make; a plan that proposes changes you did not author is drift. Reconcile in one of two directions: bring the real resource back in line by re-applying your config, or, if the manual change is desirable, update the Terraform config to match (and apply). For resources created outside Terraform, terraform import brings them under management.

    The durable fix is process: make Terraform the single source of truth, restrict console write access, and run plan in CI on a schedule to catch drift early.

    Red flag Letting people make changes in the cloud console alongside Terraform — the next apply silently reverts their manual fix (or vice versa), and the two views of reality keep fighting.

    source: Terraform docs (Manage resource drift)
  • [junior · concept · common] What is the difference between a Terraform provider and a resource?

    A provider is a plugin that teaches Terraform how to talk to a specific platform's API — aws, google, azurerm, cloudflare, kubernetes. You configure it once (region, credentials), and it exposes the set of resource and data-source types for that platform.

    A resource is a single managed object you declare — resource "aws_s3_bucket" "assets" { ... } describes one bucket. The provider knows how to create, read, update, and delete that resource type via the platform's API. So: the provider is the integration layer; resources are the things you actually provision through it. A data source is the read-only sibling — it looks up existing infrastructure without managing it.

    Red flag Confusing a resource with a data source — a resource is created and managed by Terraform; a data source only reads existing infrastructure and never creates anything.

    source: Terraform docs (Providers)
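
    To make the three roles concrete, here is a small sketch with one provider, one data source, and one resource side by side. The security group name and the open HTTPS rule are illustrative assumptions, not a recommendation:

    ```hcl
    # Provider: the plugin that talks to the AWS API, configured once.
    provider "aws" {
      region = "us-east-1"
    }

    # Data source: looks up the account's existing default VPC. Read-only;
    # Terraform never creates, changes, or destroys it.
    data "aws_vpc" "default" {
      default = true
    }

    # Resource: an object Terraform creates and manages through the provider.
    resource "aws_security_group" "web" {
      name   = "myapp-web" # hypothetical name
      vpc_id = data.aws_vpc.default.id

      ingress {
        from_port   = 443
        to_port     = 443
        protocol    = "tcp"
        cidr_blocks = ["0.0.0.0/0"]
      }
    }
    ```

    Running destroy on this config would remove the security group but leave the default VPC untouched, because only the resource is under management.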
  • [mid · concept · common] Why is Infrastructure as Code better than clicking through a cloud console, and what is the difference between declarative and imperative IaC?

    IaC makes infrastructure versioned, reviewable, and reproducible. Config lives in git, so changes go through pull requests and code review, you have an audit trail, you can roll back, and you can stand up an identical environment on demand instead of relying on someone remembering which buttons they clicked. It greatly reduces configuration drift and snowflake servers, provided changes actually go through the code rather than the console.

    Declarative vs imperative: declarative (Terraform) means you describe the desired end state and the tool figures out the steps and the diff to get there — apply it twice and nothing extra happens (idempotent). Imperative (a shell/SDK script) means you spell out the steps to take, and re-running can double-create or fail because it does not reason about current state. Terraform is declarative, which is why plan can show you precisely what will change before anything happens.

    Red flag Describing Terraform as a script that 'runs commands to build infra' — that is the imperative mental model; Terraform reconciles toward a declared end state and is idempotent.

    source: Terraform docs (What is Terraform / intro)
  • [senior · concept · common] How do you bring an existing, manually-created cloud resource under Terraform management?

    You import it — Terraform's state knows nothing about resources it didn't create, so you have to tell it. The two-part move: (1) write a matching resource block in your config for the existing object, then (2) bring it into state, either with the CLI terraform import <resource_address> <real_id> or, in modern Terraform, an import block that does it as part of plan/apply (and can even generate config).

    The critical detail interviewers probe: the terraform import command only updates state; it does not write your configuration. If your hand-written resource block doesn't match the real object's settings, the very next plan will propose changes to 'fix' the real resource back to your (incomplete) config. So after importing, run plan and iterate on the config until the plan is clean (no changes), which confirms that config, state, and reality all agree.

    This is also how you remediate drift / ClickOps: adopt the orphaned resource instead of destroying and recreating it.

    What a strong answer covers
    • Terraform ignores anything it didn't create — you must import existing resources into state.

    • Two steps: write a matching resource block, then terraform import (or an import {} block).

    • The terraform import command updates state only; it does not generate or fix your config.

    • Iterate until plan shows no changes, proving config + state + reality agree.

    • It's the safe way to adopt ClickOps/orphaned resources without destroy-and-recreate.

    Quick self-check

    After `terraform import` of an existing bucket, the next `plan` wants to modify it. Why?

    Red flag Running `terraform import` and assuming you're done — import only writes state, not config, so a mismatched resource block makes the next apply try to 'correct' the real resource; you must get a clean plan first.

    source: Terraform docs (Import existing resources)
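
    The import-block route described above can be sketched as follows. This assumes Terraform 1.5 or later, and the bucket name is a hypothetical placeholder for a bucket that already exists:

    ```hcl
    # Tells Terraform to adopt the existing bucket into state during the
    # next plan/apply instead of trying to create it.
    import {
      to = aws_s3_bucket.legacy
      id = "legacy-assets-bucket" # the real bucket's name (placeholder here)
    }

    # The matching resource block. It must describe the bucket as it
    # really is, or the next plan will propose changes.
    resource "aws_s3_bucket" "legacy" {
      bucket = "legacy-assets-bucket"
    }
    ```

    With this in place, plan should report the bucket as imported with no other changes. If it proposes modifications, the resource block does not yet match reality and needs another pass. Alternatively, you can omit the resource block and run plan with -generate-config-out to have Terraform draft it for review.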
  • [senior · concept · common] How do you manage multiple environments (dev / staging / prod) in Terraform, and why are workspaces often the wrong tool?

    The common pattern is separate state per environment with a shared module. You write the infrastructure once as a module, then have a thin per-environment root config (environments/prod, environments/staging) that calls the module with different variables (instance sizes, counts) and, crucially, its own backend/state file. This isolates blast radius: a bad apply in staging can't touch prod's state.

    Terraform workspaces let one config switch between multiple state files (default, dev, prod) without copying code. They're tempting for environments but are usually the wrong fit: they share the same backend and code, it's easy to run apply against the wrong workspace by accident (no separate credentials/approval boundary), and they don't capture genuinely different configs well. They're better suited to short-lived, near-identical parallel copies (e.g. per-feature-branch ephemeral envs).

    Senior answer: isolate prod with its own state, backend, and credentials; use modules for DRY; reserve workspaces for ephemeral, structurally-identical environments.

    What a strong answer covers
    • Default pattern: one shared module + thin per-env root configs with separate state/backends.

    • Separate state per env isolates blast radius — staging mistakes can't corrupt prod.

    • Workspaces swap state files on one config/backend — convenient but no real isolation boundary.

    • Workspace risk: applying to the wrong environment with no separate credentials/approval.

    • Use workspaces for ephemeral, identical envs; use separate state+backend for dev/staging/prod.

    Red flag Using a single workspace-switched config for prod and staging — one fat-fingered `terraform workspace select` and an `apply` hits the wrong environment, with no separate backend or credential boundary to stop it.

    source: Terraform docs (Workspaces)
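
    A thin per-environment root config of the kind described above might look like this sketch. The directory layout, state bucket name, and module path are assumptions; the point is that prod gets its own backend and only passes inputs to the shared module:

    ```hcl
    # environments/prod/main.tf (hypothetical layout and names)
    terraform {
      backend "s3" {
        bucket = "myapp-tfstate-prod" # separate from the staging state bucket
        key    = "network/terraform.tfstate"
        region = "us-east-1"
      }
    }

    # The same shared module that staging calls, with prod inputs.
    module "network" {
      source     = "../../modules/vpc"
      cidr_block = "10.1.0.0/16"
      env        = "prod"
    }
    ```

    environments/staging/main.tf would be the same file with a different backend bucket and inputs, typically run with credentials that cannot reach prod.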
  • [senior · concept · occasional] What is the difference between `count` and `for_each` for creating multiple resources, and why does it matter for state?

    Both create multiple instances of a resource, but they key the instances differently in state, and that's the whole game. count produces a list indexed by integer position — resource[0], resource[1]. for_each produces a map keyed by a stable string — resource["web"], resource["db"].

    The trap with count: because instances are positional, removing an item from the middle of the list shifts every later index, so Terraform thinks those resources changed identity and proposes to destroy-and-recreate them. With for_each, each instance is bound to its own key, so deleting one only affects that one — the rest stay put.

    Guidance: use count for N identical, order-independent copies (or a simple on/off toggle, count = var.enabled ? 1 : 0); use for_each whenever you iterate over a set/map of distinct things (named buckets, subnets per AZ) so that adding or removing one doesn't churn the others.

    What a strong answer covers
    • count → list indexed by integer position; for_each → map keyed by a stable string.

    • Removing a middle count element shifts later indices, forcing destroy/recreate of unrelated resources.

    • for_each binds each instance to its key, so add/remove touches only that instance.

    • Use count for N identical copies or an on/off toggle (count = var.enabled ? 1 : 0).

    • Use for_each for a set/map of distinct named things (buckets, subnets per AZ).

    Quick self-check

    You manage 5 distinct named S3 buckets and sometimes remove one from the middle. Which is safer?

    Red flag Using `count` over a list of distinct named resources — removing or reordering an element shifts every later index, so Terraform destroys and recreates resources you never intended to touch; `for_each` keyed by name avoids the churn.

    source: Terraform docs (The for_each meta-argument)
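
    The difference in state addresses is easier to see side by side. In this sketch the variable and the bucket name prefixes are hypothetical, and the two resources use different prefixes only so that both could exist at once:

    ```hcl
    variable "bucket_names" {
      type    = list(string)
      default = ["logs", "assets", "backups"]
    }

    # count: addresses are aws_s3_bucket.by_index[0], [1], [2].
    # Removing "assets" moves "backups" from [2] to [1], so Terraform
    # plans to change [1] and destroy [2].
    resource "aws_s3_bucket" "by_index" {
      count  = length(var.bucket_names)
      bucket = "myapp-idx-${var.bucket_names[count.index]}"
    }

    # for_each: addresses are aws_s3_bucket.by_name["logs"], ["assets"],
    # ["backups"]. Removing "assets" destroys only that one instance.
    resource "aws_s3_bucket" "by_name" {
      for_each = toset(var.bucket_names)
      bucket   = "myapp-${each.key}"
    }
    ```

    Because a bucket's name cannot be changed in place, the positional shift under count turns into a destroy-and-recreate of the "backups" bucket, which is exactly the churn the card warns about.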
  • [senior · trick · occasional] Why is `terraform destroy` (or an accidental resource replacement) so dangerous, and how do you guard against it?

    Terraform faithfully executes the declared end state, including deletion. The danger is that a config change can force a replace (destroy + create) of a resource you assumed would update in place: changing an attribute marked 'ForceNew' (an EC2 instance's AMI or subnet, a database's engine) makes Terraform plan to destroy the old object and create a new one. On a stateful resource like a production database, that's data loss executed by a routine-looking apply.

    Guards, layered: (1) Read the plan: anything showing -/+ (destroy and then create) or # forces replacement is a red flag, so never -auto-approve blindly. (2) Add lifecycle { prevent_destroy = true } on critical resources so Terraform errors out rather than destroying them. (3) Use create_before_destroy where a replacement is acceptable but downtime isn't. (4) Take backups and enable deletion protection on the cloud side as a last line of defense. (5) For stateful data stores, consider managing them separately from ephemeral compute, in their own configuration and state.

    The trick being tested: knowing that 'update' can silently mean 'replace', and that the plan output is your safety check.

    What a strong answer covers
    • A config change to a ForceNew attribute makes Terraform destroy + recreate — potential data loss.

    • The plan marks it with -/+ or # forces replacement; that's your red flag to stop.

    • lifecycle { prevent_destroy = true } makes Terraform refuse to destroy critical resources.

    • create_before_destroy avoids downtime when a replace is genuinely acceptable.

    • Layer cloud-side deletion protection / backups; manage stateful stores apart from ephemeral compute.

    Red flag Approving a plan without noticing a `# forces replacement` on a stateful resource — Terraform will dutifully destroy the production database and create a fresh empty one, and `apply` doesn't ask 'are you sure this is a DB?'.

    source: Terraform docs (The lifecycle meta-argument)
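
    Guards (2) through (4) can be sketched in config. The identifiers, sizes, and variables below are hypothetical, and a real database would need more settings than shown:

    ```hcl
    variable "db_password" {
      type      = string
      sensitive = true
    }

    variable "ami_id" { type = string }

    resource "aws_db_instance" "main" {
      identifier          = "myapp-prod" # hypothetical identifier
      engine              = "postgres"
      instance_class      = "db.t3.micro"
      allocated_storage   = 20
      username            = "app"
      password            = var.db_password
      deletion_protection = true # cloud-side guard, enforced by AWS

      lifecycle {
        prevent_destroy = true # any plan that would destroy this errors out
      }
    }

    resource "aws_instance" "web" {
      ami           = var.ami_id
      instance_type = "t3.micro"

      lifecycle {
        create_before_destroy = true # build the replacement first
      }
    }
    ```

    Note that prevent_destroy only protects the resource while the block is still in the config: deleting the whole resource block removes the guard along with it, which is why the cloud-side deletion protection is worth layering on top.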
  • [mid · concept · occasional] What are input variables, outputs, and locals in Terraform, and how do they differ?

    They're the three ways data flows through a config. Input variables (variable) are the parameters a module accepts from its caller — the public 'function arguments' (region, instance size), set via .tfvars, CLI flags, or env vars, and typed/validated. Outputs (output) are the values a module exposes back to its caller or the CLI — the 'return values' (a created VPC's ID, a load balancer's DNS name) that other modules consume. Locals (locals) are named intermediate expressions used *inside* a config to avoid repetition — computed once, referenced as local.name, never settable from outside.

    The mental model: variables are inputs (caller → module), outputs are results (module → caller), locals are private helpers (internal only). This is exactly what makes a module a clean interface: callers only touch its variables and outputs, never its internals.

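    A practical note: mark secret variables and outputs with sensitive = true so Terraform redacts them in plan/apply output. The values are still stored in plaintext in state, so the state file itself must be protected.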

    What a strong answer covers
    • Variables: a module's input parameters (caller → module), typed and validatable.

    • Outputs: values a module returns (module → caller / CLI), consumed by other modules.

    • Locals: private named expressions, computed once, used internally to avoid repetition.

    • Together, variables + outputs form a module's clean public interface; locals stay internal.

    • Use sensitive = true to redact secret variables/outputs from logs.

    Red flag Confusing locals with variables — a local is a computed internal helper that callers can't override, while a variable is the external input; using a local where you needed a configurable input makes the module non-parameterizable.

    source: Terraform docs (Variables and outputs)
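
    A compact sketch showing all three in one config. The variable names, allowed environment values, and bucket naming are illustrative assumptions:

    ```hcl
    # Input variable: set by the caller, typed and validated.
    variable "env" {
      type        = string
      description = "Environment name"

      validation {
        condition     = contains(["dev", "staging", "prod"], var.env)
        error_message = "env must be one of: dev, staging, prod."
      }
    }

    # Sensitive input: redacted in plan/apply output.
    variable "api_token" {
      type      = string
      sensitive = true
    }

    # Locals: private helpers, not settable from outside.
    locals {
      name_prefix = "myapp-${var.env}"
      common_tags = {
        Environment = var.env
        ManagedBy   = "terraform"
      }
    }

    resource "aws_s3_bucket" "assets" {
      bucket = "${local.name_prefix}-assets"
      tags   = local.common_tags
    }

    # Output: the value handed back to the caller or the CLI.
    output "bucket_name" {
      value = aws_s3_bucket.assets.bucket
    }
    ```

    A caller can change env but has no way to override name_prefix, which is the distinction the red flag above is about.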
  • [mid · concept · occasional] How does Terraform decide the order to create resources? What are implicit vs explicit dependencies?

    Terraform builds a dependency graph from your config and creates/updates/destroys resources in the order that graph implies, parallelizing wherever there's no dependency between resources. You rarely specify order yourself.

    Implicit dependencies are inferred from references: if a security group rule uses aws_vpc.main.id, Terraform knows the VPC must exist first, because the rule reads an attribute of the VPC. This is the idiomatic, preferred way — wire resources together by referencing each other's attributes and the ordering falls out automatically (and correctly, including on destroy, which runs in reverse).

    Explicit dependencies use depends_on to force an ordering Terraform can't infer — typically when there's a *hidden* relationship not expressed through a reference (e.g. an app needs an IAM policy attached before it runs, but doesn't reference the attachment's attributes). Use depends_on sparingly; over-using it usually means you should have referenced the attribute instead.

    What a strong answer covers
    • Terraform builds a dependency graph and parallelizes independent resources automatically.

    • Implicit deps: inferred from attribute references (aws_vpc.main.id) — the idiomatic way.

    • Referencing attributes gets ordering right for create *and* destroy (reverse order) for free.

    • Explicit deps (depends_on): force an order for a hidden relationship not expressed by a reference.

    • Use depends_on sparingly — usually a missing attribute reference is the real fix.

    Red flag Sprinkling `depends_on` everywhere to 'be safe' — it serializes resources that could run in parallel and hides the real relationships; reference the attribute you depend on and let Terraform infer the order.

    source: Terraform docs (Resource dependencies)
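
    Both kinds of dependency in one sketch. The bucket name and the ami_id variable are hypothetical, and the hidden relationship assumed here is that software on the instance uses the bucket at runtime:

    ```hcl
    variable "ami_id" { type = string }

    resource "aws_vpc" "main" {
      cidr_block = "10.0.0.0/16"
    }

    # Implicit: referencing aws_vpc.main.id orders the subnet after the VPC.
    resource "aws_subnet" "app" {
      vpc_id     = aws_vpc.main.id
      cidr_block = "10.0.1.0/24"
    }

    resource "aws_s3_bucket" "uploads" {
      bucket = "myapp-uploads-example" # hypothetical name
    }

    # Explicit: no argument below references the bucket, so Terraform
    # cannot infer that it must exist first. depends_on states it.
    resource "aws_instance" "app" {
      ami           = var.ami_id
      instance_type = "t3.micro"
      subnet_id     = aws_subnet.app.id # implicit dependency on the subnet

      depends_on = [aws_s3_bucket.uploads]
    }
    ```

    The VPC and the bucket have no relationship to each other, so Terraform is free to create them in parallel; only the instance waits for both chains to finish.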