Reviewing Terraform’s Basics — Part 2

On Data Sources, Variables, Outputs, Locals, and Loops in Terraform

13 min read · Jun 22, 2024

Data sources let Terraform reference external resources or stored information that are not defined in the Terraform configuration.

A data source block begins with the keyword data, followed by the ‘data source type’ definition — similar to how Resource blocks are defined.

The data source type is split at the first underscore (_): the part before it is the provider name, and the part after it is the data source type offered by that provider. For example, local_file belongs to the local provider, and aws_availability_zones belongs to the aws provider.

After declaring the data source type, a unique name is assigned. Like resource names, this name acts as an identifier for the same type and must not be duplicated.

Following the name, configuration arguments for the data source type are declared within curly braces { }. Even for types that don’t require arguments, the curly braces must still be included.

The method of referencing a target read by a data source is distinguished from resources by prefixing it with ‘data’. Attribute values can be accessed as follows.

data "local_file" "abc" {
  filename = "${path.module}/abc.txt"
}

echo "t101 study - 2week" > abc.txt

# 
terraform init && terraform plan && terraform apply -auto-approve
terraform state list

terraform console
> 
data.local_file.abc
...
data.local_file.abc.filename
data.local_file.abc.content
data.local_file.abc.id
exit

#
# Declare the data source
data "aws_availability_zones" "available" {
  state = "available"
}
resource "aws_subnet" "primary" {
  availability_zone = data.aws_availability_zones.available.names[0]
  # e.g. ap-northeast-2a
}
resource "aws_subnet" "secondary" {
  availability_zone = data.aws_availability_zones.available.names[1]
  # e.g. ap-northeast-2b
}

Conduct a VPC resource creation exercise using the list of available availability zones within the above region, or perform any practice using a data source.
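As a starting point for that exercise, here is a minimal sketch of a complete configuration. The subnet blocks shown earlier omit the required vpc_id and cidr_block arguments, so this version fills them in. The resource name "example" and the 10.0.0.0/16 range are assumptions chosen for illustration, not values from the original study material.

```hcl
# Sketch only: the VPC name and CIDR range are illustrative assumptions.
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "primary" {
  vpc_id = aws_vpc.example.id
  # cidrsubnet() carves a /24 out of the /16: 10.0.0.0/24
  cidr_block        = cidrsubnet(aws_vpc.example.cidr_block, 8, 0)
  availability_zone = data.aws_availability_zones.available.names[0]
}

resource "aws_subnet" "secondary" {
  vpc_id = aws_vpc.example.id
  # Second /24 in the same VPC: 10.0.1.0/24
  cidr_block        = cidrsubnet(aws_vpc.example.cidr_block, 8, 1)
  availability_zone = data.aws_availability_zones.available.names[1]
}
```

Because the availability zone names come from the data source, the same code should work in any region that has at least two available zones.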

resource "local_file" "abc" {
 content = "123!"
 filename = "${path.module}/abc.txt"
}
data "local_file" "abc" {
 filename = local_file.abc.filename
}
resource "local_file" "def" {
 content = data.local_file.abc.content
 filename = "${path.module}/def.txt"
}
 
#
terraform apply -auto-approve
terraform state list
# Check files
ls *.txt
diff abc.txt def.txt
# Check graph
terraform graph > graph.dot
# Terraform console: Verify data source reference
echo "data.local_file.abc.content" | terraform console
# Created files have different permissions? Why?
ls -l

Input variables are designed to define attribute values necessary for infrastructure configuration, allowing the creation of multiple infrastructures without changing the code.

Variables are composed of blocks that start with the keyword ‘variable’. The name value following the variable block must be unique across all variable declarations within the same module, and this name is referenced in other parts of the code.

variable "<name>" {
  <argument> = <value>
}

variable "image_id" {
  type = string
}

The following arguments are available when defining variables:

default: The value used when no value is passed by any other method. If there is no default and no value is supplied, Terraform prompts for it interactively.
type: Defines the allowed value types for the variable. Options include string, number, bool, list, map, set, object, tuple. If no type is specified, it’s considered as type ‘any’.
description: Provides a description of the input variable.
validation: Defines validation rules by adding constraints to the variable declaration.
sensitive: Indicates that the variable value is sensitive and limits its exposure in Terraform’s output (for sensitive data like passwords).
nullable: Specifies whether the variable may be null (true by default).
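To see several of these arguments together, here is a small sketch. The variable name db_password is hypothetical and only serves to illustrate how description, sensitive, and nullable combine.

```hcl
# Hypothetical variable, for illustration only.
variable "db_password" {
  type        = string
  description = "Password for the database user"
  sensitive   = true  # hide the value in plan and apply output
  nullable    = false # reject null, so a real value must be supplied
}
```

With no default set, Terraform would prompt for this value unless it is supplied by one of the input methods described later in this article.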

variable "string" {
  type        = string
  description = "var String"
  default     = "myString"
}

variable "number" {
  type    = number
  default = 123
}

variable "boolean" {
  default = true
}

variable "list" {
  default = [
    "google",
    "vmware",
    "amazon",
    "microsoft"
  ]
}

output "list_index_0" {
  value = var.list[0]
}

output "list_all" {
  value = [
    for name in var.list : upper(name)
  ]
}

variable "map" { # map keys are displayed in lexical order
  default = {
    aws   = "amazon",
    azure = "microsoft",
    gcp   = "google"
  }
}

variable "set" { # set elements are displayed in lexical order
  type = set(string)
  default = [
    "google",
    "vmware",
    "amazon",
    "microsoft"
  ]
}

variable "object" {
  type = object({ name = string, age = number })
  default = {
    name = "abc"
    age  = 12
  }
}

variable "tuple" {
  type    = tuple([string, number, bool])
  default = ["abc", 123, true]
}

variable "ingress_rules" { # optional ( >= terraform 1.3.0)
  type = list(object({
    port        = number,
    description = optional(string),
    protocol    = optional(string, "tcp"),
  }))
  default = [
    { port = 80, description = "web" },
    { port = 53, protocol = "udp" },
  ]
}

Validation

In addition to specifying input variable types, custom validations are possible. Within the variable block, a validation block is used. The condition specified in this block must return either true or false. The error_message defines the message to be output when the condition value results in false.

The regex function applies a regular expression to a string and returns the matching substring; if nothing matches, it raises an error. Wrapping the call in the can function turns that error into false, so the condition fails cleanly for values that don’t match the pattern.

variable "image_id" {
  type        = string
  description = "The id of the machine image (AMI) to use for the server."

  validation {
    condition     = length(var.image_id) > 4
    error_message = "The image_id value must be longer than 4 characters."
  }

  validation {
    # regex(...) fails if it cannot find a match
    condition     = can(regex("^ami-", var.image_id))
    error_message = "The image_id value must start with \"ami-\"."
  }
}

#
terraform apply -auto-approve
var.image_id
  The id of the machine image (AMI) to use for the server.

  Enter a value: ami-
...
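Validation conditions are not limited to regular expressions. As a further sketch, the contains function can restrict a variable to a fixed set of values. The variable name environment and its allowed values are assumptions made up for this example.

```hcl
# Hypothetical variable: restricts input to a known set of values.
variable "environment" {
  type        = string
  description = "Deployment environment"

  validation {
    # contains(list, value) returns true when value is in the list
    condition     = contains(["dev", "stage", "prod"], var.environment)
    error_message = "The environment value must be one of: dev, stage, prod."
  }
}
```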

Variable Reference

Variables are referenced within the code using the syntax var.<name>.

variable "my_password" {}

resource "local_file" "abc" {
  content  = var.my_password
  filename = "${path.module}/abc.txt"
}

terraform init -upgrade && terraform apply -auto-approve
var.my_password
  Enter a value: qwe123

Handling Sensitive Variables

Input variables can be declared sensitive. Because a default value is set here, no input prompt appears, and the value referenced by the resource is concealed as (sensitive value) in the plan and apply output.

variable "my_password" {
  default   = "password"
  sensitive = true
}

resource "local_file" "abc" {
  content  = var.my_password
  filename = "${path.module}/abc.txt"
}

# Content not visible in output section!
terraform apply -auto-approve
...
 ~ content              = (sensitive value)
...

terraform state show local_file.abc
echo "local_file.abc.content" | terraform console
(sensitive value)

# Check the result file
cat abc.txt ; echo

# Check terraform.tfstate file: Click terraform.tfstate in VSCode to verify
cat terraform.tfstate | grep '"content":'
# "content": "password",
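One related point: if an output references a sensitive variable, Terraform reports an error unless the output is also marked sensitive. A minimal sketch, assuming the my_password variable declared above (the output name is my own choice):

```hcl
# Without sensitive = true, Terraform refuses to plan this output.
output "my_password" {
  value     = var.my_password
  sensitive = true
}
```

As the grep on terraform.tfstate shows, marking a value as sensitive only hides it from CLI output; the value is still stored in plain text in the state file.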

Variable Input Methods and Priorities

Variables make Terraform code reusable: values are passed in from outside rather than hardcoded, so the same code can be used without modification. True to the term “input variables,” users can supply the desired values when provisioning runs.

variable "my_var" {}

resource "local_file" "abc" {
  content  = var.my_var
  filename = "${path.module}/abc.txt"
}

Terraform applies a precedence order based on how a variable’s value is supplied. This can be used effectively to define variables differently in local environments versus build server environments, or to pass external values to variables when configuring provisioning pipelines.

The diagram above illustrates the precedence hierarchy for variable definitions in Terraform, arranged from highest to lowest priority. At the top, command-line options using -var and -var-file take precedence, followed by auto-loaded variable files (*.auto.tfvars or *.auto.tfvars.json) processed in lexical order of their filenames. Next in line are variables defined in terraform.tfvars.json, then those in terraform.tfvars, then environment variables. Default values declared in the variable block itself apply only when none of these methods supplies a value.

This precedence order means that in cases of conflicting variable definitions, those higher in the list will override those lower down, providing a clear structure for how Terraform resolves variable values across different input methods.

Let’s examine each level, from lowest to highest precedence:

1. Default Values in Variable Blocks

The lowest precedence is given to default values defined within variable blocks in your Terraform configuration files:

variable "my_var" {
 default = "var2"
}

2. Environment Variables

Terraform recognizes environment variables named TF_VAR_ followed by the variable name.

export TF_VAR_my_var=var3
terraform apply -auto-approve

3. terraform.tfvars File

Variables defined in a `terraform.tfvars` file in the root module directory take precedence over environment variables:

echo 'my_var="var4"' > terraform.tfvars
terraform apply -auto-approve

4. *.auto.tfvars Files

Terraform automatically loads files ending in `.auto.tfvars` or `.auto.tfvars.json`. These files are processed in lexical order of their filenames, and later files override earlier ones, so in the example below the value from b.auto.tfvars wins:

echo 'my_var="var5_a"' > a.auto.tfvars
echo 'my_var="var5_b"' > b.auto.tfvars
terraform apply -auto-approve

5. CLI -var and -var-file Options

The highest precedence is given to variables set directly on the command line using `-var` or `-var-file` options:

terraform apply -auto-approve -var=my_var=var7
terraform apply -auto-approve -var=my_var=var7 -var=my_var=var8

When multiple `-var` or `-var-file` options are used, the last value specified takes precedence.
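The commands above only exercise -var. For -var-file, the values live in a separate variable definitions file. Here is a sketch of such a file; the file name prod.tfvars and the value var9 are hypothetical.

```hcl
# prod.tfvars (hypothetical file name)
# Not auto-loaded, because the name does not end in .auto.tfvars.
# Pass it explicitly from the shell:
#   terraform apply -auto-approve -var-file=prod.tfvars
my_var = "var9"
```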

Practice: Deploying a Custom VPC with EC2 Instance Using Terraform on AWS

Step 1: Setting Up the Project

First, we create a new directory for our project and initialize the necessary files.

mkdir my-vpc-ec2
cd my-vpc-ec2
touch vpc.tf sg.tf ec2.tf

Step 2: Configuring the VPC

In the vpc.tf file, we define our custom VPC with the following components:

VPC resource with DNS options enabled
Two subnets in different availability zones
Internet Gateway
Route table with a default route to the Internet Gateway

resource "aws_vpc" "myvpc" {
  cidr_block           = "10.10.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true
  tags = {
    Name = "t101-study"
  }
}

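resource "aws_subnet" "mysubnet1" {
  vpc_id            = aws_vpc.myvpc.id
  cidr_block        = "10.10.1.0/24"
  availability_zone = "ap-northeast-2a"
  tags = {
    Name = "t101-subnet1"
  }
}

# Second subnet in a different availability zone (CIDR and AZ are assumed values)
resource "aws_subnet" "mysubnet2" {
  vpc_id            = aws_vpc.myvpc.id
  cidr_block        = "10.10.2.0/24"
  availability_zone = "ap-northeast-2c"
  tags = {
    Name = "t101-subnet2"
  }
}

resource "aws_internet_gateway" "myigw" {
  vpc_id = aws_vpc.myvpc.id
  tags = {
    Name = "t101-igw"
  }
}

# Route table referenced by the default route below (Name tag is an assumed value)
resource "aws_route_table" "myrt" {
  vpc_id = aws_vpc.myvpc.id
  tags = {
    Name = "t101-rt"
  }
}

# Associate both subnets with the route table
resource "aws_route_table_association" "myrtassociation1" {
  subnet_id      = aws_subnet.mysubnet1.id
  route_table_id = aws_route_table.myrt.id
}

resource "aws_route_table_association" "myrtassociation2" {
  subnet_id      = aws_subnet.mysubnet2.id
  route_table_id = aws_route_table.myrt.id
}

resource "aws_route" "mydefaultroute" {
  route_table_id         = aws_route_table.myrt.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.myigw.id
}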

Step 3: Creating a Security Group

In the sg.tf file, we define a security group to control inbound and outbound traffic:

resource "aws_security_group" "mysg" {
  vpc_id      = aws_vpc.myvpc.id
  name        = "T101 SG"
  description = "T101 Study SG"
}

resource "aws_security_group_rule" "mysginbound" {
  type              = "ingress"
  from_port         = 80
  to_port           = 80
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.mysg.id
}
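The rule above only covers inbound traffic. When Terraform manages a security group, it removes the allow-all egress rule that AWS normally creates by default, so outbound access has to be declared explicitly if the instance needs it (for example, to download packages in a user data script). A minimal sketch of such a rule, where the resource name mysgoutbound is my own assumption:

```hcl
# Assumed resource name; allows all outbound traffic from the security group.
resource "aws_security_group_rule" "mysgoutbound" {
  type              = "egress"
  from_port         = 0
  to_port           = 0
  protocol          = "-1" # "-1" means all protocols
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.mysg.id
}
```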

Step 4: Launching an EC2 Instance

In the ec2.tf file, we define an EC2 instance using the latest Amazon Linux 2 AMI:

data "aws_ami" "my_amazonlinux2" {
  most_recent = true
  filter {
    name   = "owner-alias"
    values = ["amazon"]
  }
  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-ebs"]
  }
  owners = ["amazon"]
}

resource "aws_instance" "myec2" {
  ami                         = data.aws_ami.my_amazonlinux2.id
  instance_type               = "t2.micro"
  associate_public_ip_address = true
  vpc_security_group_ids      = [aws_security_group.mysg.id]
  subnet_id                   = aws_subnet.mysubnet1.id
  user_data                   = <<-EOF
              #!/bin/bash
              # ... (user data script to set up a simple web server)
              EOF
  tags = {
    Name = "t101-myec2"
  }
}

Step 5: Deploying and Verifying

To deploy the infrastructure:

terraform init
terraform plan
terraform apply -auto-approve

terraform state list
terraform state show aws_vpc.myvpc
terraform state show aws_instance.myec2

MYIP=$(terraform output -raw myec2_public_ip)
curl http://$MYIP/
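The terraform output command above reads an output named myec2_public_ip, which the files shown so far do not declare. A minimal sketch of that output, assuming it is added to ec2.tf:

```hcl
# Exposes the instance's public IP so "terraform output -raw myec2_public_ip" works.
output "myec2_public_ip" {
  value       = aws_instance.myec2.public_ip
  description = "The public IP of the instance"
}
```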

Local Values

Local values in Terraform are defined within the code and cannot be input externally during execution. They are accessible only within the module where they are declared, offering a scope-limited alternative to variables.

Local values are declared in blocks that start with the ‘locals’ keyword. Here’s an example.

locals {
  name    = "terraform"
  content = "${var.prefix} ${local.name}"
  my_info = {
    age    = 20
    region = "KR"
  }
  my_nums = [1, 2, 3, 4, 5]
}

Local values can be declared multiple times in the same file or across multiple files.
Local value names must be unique within a module.
Local values can reference various types of values, including constants, resource attributes, and variable values.
Local values are referenced using the syntax ‘local.<name>’.

resource "local_file" "abc" {
  content  = local.content
  filename = "${path.module}/abc.txt"
}

// main.tf
variable "prefix" {
  default = "hello"
}

locals {
  name = "terraform"
}

resource "local_file" "abc" {
  content  = local.content
  filename = "${path.module}/abc.txt"
}

// sub.tf
locals {
  content = "${var.prefix} ${local.name}"
}

In this example, I have split the local value definitions across two files, demonstrating how locals can be referenced across different files within the same module.

Locals cannot be set from outside, but the variables they reference can. For instance, setting ‘var.prefix’ in a .tfvars file as follows changes the value of the local ‘content’ and therefore the contents of the generated file.

prefix = "t101-study"

Practice: Creating AWS Identity and Access Management (IAM) Users with Terraform

Notice how the code below uses local values (local.name and local.team) to set the name and tags for our IAM users. This approach promotes code reusability and makes it easier to manage common values across multiple resources.

provider "aws" {
  region = "ap-northeast-2"
}

locals {
  name = "mytest"
  team = {
    group = "dev"
  }
}

resource "aws_iam_user" "myiamuser1" {
  name = "${local.name}1"
  tags = local.team
}

resource "aws_iam_user" "myiamuser2" {
  name = "${local.name}2"
  tags = local.team
}

Let’s verify the deployment after applying the configurations:

terraform init
terraform apply -auto-approve

terraform state list

terraform state show aws_iam_user.myiamuser1
terraform state show aws_iam_user.myiamuser2

aws iam list-users | jq

# cleanup
terraform destroy -auto-approve -target=aws_iam_user.myiamuser1
terraform destroy -auto-approve

Terraform Outputs

Terraform outputs are a way to extract and display specific values from your Terraform-managed infrastructure. They serve several key purposes: displaying important information after Terraform operations, sharing data between modules, and exposing specific values to other Terraform configurations.

In this example, I am creating a local file and then outputting its ID and absolute path. The abspath function is used to convert the relative path to an absolute path.

output "instance_ip_addr" {
  value = "http://10.1.1"
}

resource "local_file" "abc" {
  content  = "abc123"
  filename = "${path.module}/abc.txt"
}

output "file_id" {
  value = local_file.abc.id
}

output "file_abspath" {
  value = abspath(local_file.abc.filename)
}

When you run terraform plan, Terraform can predict some output values, but not all. In the plan output below, file_abspath is known beforehand, but file_id will only be known after the resource is created.

Changes to Outputs:
  + file_abspath = "/Users/sigridjineth/project/abc.txt"
  + file_id      = (known after apply)

Loops in Terraform

The ‘count’ meta-argument in Terraform allows you to create multiple instances of a resource or module from a single block of code. When you include a ‘count’ argument with a whole-number value in a resource or module block, Terraform creates that many instances of the resource or module.

It creates multiple instances of a resource or module.
The ‘count.index’ value starts at 0 and increments by 1 for each iteration.

In this example, I am creating 5 local files. However, there’s a catch — all files have the same name, so only one file will actually exist. This highlights an important consideration when using ‘count’: ensure unique identifiers for resources.

resource "local_file" "abc" {
  count    = 5
  content  = "abc"
  filename = "${path.module}/abc.txt"
}

output "filecontent" {
  value = local_file.abc[*].content
}

output "fileid" {
  value = local_file.abc[*].id
}

output "filename" {
  value = local_file.abc[*].filename
}

# Let's modify our code to create unique files.
resource "local_file" "abc" {
  count    = 5
  content  = "This is filename abc${count.index}.txt"
  filename = "${path.module}/abc${count.index}.txt"
}

output "fileid" {
  value = local_file.abc[*].id
}

output "filename" {
  value = local_file.abc[*].filename
}

output "filecontent" {
  value = local_file.abc[*].content
}

# Often, you'll want the count to be dynamic based on some input.
variable "names" {
  type    = list(string)
  default = ["a", "b", "c"]
}

resource "local_file" "abc" {
  count    = length(var.names)
  content  = "abc"
  filename = "${path.module}/abc-${var.names[count.index]}.txt"
}

resource "local_file" "def" {
  count    = length(var.names)
  content  = local_file.abc[count.index].content
  filename = "${path.module}/def-${element(var.names, count.index)}.txt"
}

-----

variable "user_names" {
  description = "Create IAM users with these names"
  type        = list(string)
  default     = ["gasida", "akbun", "hyungwook"]
}

resource "aws_iam_user" "myiam" {
  count = length(var.user_names)
  name  = var.user_names[count.index]
}

Terraform allows you to output specific values after resource creation. For example:

output "first_arn" {
  value       = aws_iam_user.myiam[0].arn
  description = "The ARN for the first user"
}

output "all_arns" {
  value       = aws_iam_user.myiam[*].arn
  description = "The ARNs for all users"
}

Let’s look at a more complex example of creating VPC subnets. This configuration allows for the creation of multiple subnets with different CIDR blocks, availability zones, and tags, all defined through variables.

variable "vpc_cidr" {
  type = string
}

variable "subnets" {
  type = list(object({
    cidr = string
    az = string
    tags = map(string)
  }))
}

resource "aws_vpc" "main" {
  cidr_block = var.vpc_cidr
  tags = {
    Name = "terraform VPC"
  }
}

resource "aws_subnet" "main" {
  count             = length(var.subnets) # here!
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.subnets[count.index].cidr # here!
  availability_zone = var.subnets[count.index].az # here!
  tags              = var.subnets[count.index].tags # here!
}
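To make the variable shapes concrete, here is a sketch of a terraform.tfvars file that would satisfy these declarations. The CIDR blocks match the four subnets used in the scenario later in this article; the VPC range, availability zones, and tag names are assumptions for illustration.

```hcl
# terraform.tfvars (illustrative values; VPC range, AZs, and tags are assumptions)
vpc_cidr = "192.168.0.0/16"

subnets = [
  { cidr = "192.168.1.0/24", az = "ap-northeast-2a", tags = { Name = "subnet-1" } },
  { cidr = "192.168.2.0/24", az = "ap-northeast-2b", tags = { Name = "subnet-2" } },
  { cidr = "192.168.3.0/24", az = "ap-northeast-2c", tags = { Name = "subnet-3" } },
  { cidr = "192.168.4.0/24", az = "ap-northeast-2d", tags = { Name = "subnet-4" } },
]
```

With these values, count evaluates to 4 and the subnets are addressed as aws_subnet.main[0] through aws_subnet.main[3].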

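Alternatively, the CIDR blocks, availability zones, and tags can be passed as separate lists, with one map of tags per subnet.

variable "subnet_cidr" {
  type = list(string) # assumed type, inferred from the element() calls below
}

variable "subnet_az" {
  type = list(string) # assumed type, inferred from the element() calls below
}

variable "subnet_tag" {
  type = list(map(string))
}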

resource "aws_subnet" "main" {
  count             = length(var.subnet_cidr)
  vpc_id            = aws_vpc.main.id
  cidr_block        = element(var.subnet_cidr, count.index)
  availability_zone = element(var.subnet_az, count.index)
  
  tags = element(var.subnet_tag, count.index)
}

Scenario: Unexpected EC2 Instance Replacement

Let’s consider a scenario where a Terraform configuration creates a VPC, subnets, and an EC2 instance. When we remove a subnet from the var.subnets list (particularly if it’s not the last one), Terraform may unexpectedly replace the EC2 instance. This happens because the instance references the subnet by its index (aws_subnet.main[1].id), and that index points to a different subnet once the list changes. The initial configuration looks like this.

resource "aws_vpc" "main" {
  cidr_block = var.vpc_cidr
  tags = {
    Name = "terraform VPC"
  }
}

resource "aws_subnet" "main" {
  count             = length(var.subnets)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.subnets[count.index].cidr
  availability_zone = var.subnets[count.index].az
  tags              = var.subnets[count.index].tags
}

resource "aws_instance" "server" {
  ami           = "ami-0e8bd0820b6e1360b"
  instance_type = "t4g.nano"
  subnet_id     = aws_subnet.main[1].id
  tags = {
    Name = "Terraform demo"
  }
}

This issue occurs because we’re using index-based access (aws_subnet.main[1]) to reference the subnet. When the subnet list changes, the indices shift, causing Terraform to think it needs to place the EC2 instance in a different subnet.

There are 4 subnets defined, with CIDR blocks ranging from 192.168.1.0/24 to 192.168.4.0/24.
Each subnet has an index (0 to 3) based on its position in the list.
The (Korean) diagram above shows what happens when the first subnet (192.168.1.0/24) is removed.
All subsequent subnets shift down by one index. For example, 192.168.2.0/24 moves from index 1 to index 0.
An EC2 instance is configured to use the subnet at index 1 (aws_subnet.main[1].id).
Initially, this points to the 192.168.2.0/24 subnet.
After removing the first subnet, index 1 now points to 192.168.3.0/24.

Avoid hardcoded indices when referencing resources created with count. A more robust approach is to look up the subnet by an attribute (such as its CIDR block) rather than by its position in the list.

resource "aws_instance" "server" {
  ami           = "ami-0e8bd0820b6e1360b"
  instance_type = "t4g.nano"
  subnet_id     = element(aws_subnet.main[*].id, index(aws_subnet.main[*].cidr_block, "192.168.2.0/24"))
  tags = {
    Name = "Terraform demo"
  }
}
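One caveat with the lookup above: under count, removing a list element still shifts the resource addresses of the subnets themselves, so the subnets may be re-created even though the instance finds the right CIDR. An alternative that this article does not otherwise cover is the for_each meta-argument, which keys each instance by a string instead of a position. The following is only a sketch of that idea, assuming the same var.subnets variable and aws_vpc.main resource as above; it would replace the count-based subnet and instance blocks rather than sit alongside them.

```hcl
# Sketch: key each subnet by its CIDR so addresses do not depend on list position.
resource "aws_subnet" "main" {
  # for_each needs a map or a set of strings, so build a map keyed by CIDR.
  for_each = { for s in var.subnets : s.cidr => s }

  vpc_id            = aws_vpc.main.id
  cidr_block        = each.value.cidr
  availability_zone = each.value.az
  tags              = each.value.tags
}

resource "aws_instance" "server" {
  ami           = "ami-0e8bd0820b6e1360b"
  instance_type = "t4g.nano"
  # Reference the subnet by key; removing other subnets leaves this address unchanged.
  subnet_id = aws_subnet.main["192.168.2.0/24"].id
  tags = {
    Name = "Terraform demo"
  }
}
```

With this layout, removing 192.168.1.0/24 from the list should only destroy aws_subnet.main["192.168.1.0/24"], leaving the other subnets and the instance untouched.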

Written by Sigrid Jin