Dynamic VPC Module in Terraform 0.12

TL;DR Just show me the code!

This is a Dynamic VPC Module that builds a redundant network architecture in AWS based on structured input using for_each and for constructs. It will build a VPC with private and public subnets per AZ with the proper routing and labeling.

Here is the related VPC network diagram for visual reference (example_vpc).

Preface

I was planning on releasing this blog post shortly after Terraform 0.12 was released, but there were unexpected delays. Now Terraform 0.13 is GA and the patterns I'm presenting here will change significantly, but I hope this post still helps others with their module development.

I'm also attempting to accommodate a beginner to intermediate audience, so the general terminology I use is more about how I think about Terraform than about how Terraform works. Because of that, I may make incorrect assumptions and my design will probably leave areas for improvement.

Intro

This Dynamic VPC Module is a demonstration of how to build flexible module behavior using the for_each meta-argument and for expressions to orchestrate resource building more fluidly within a single module, while exploring basic network design. I'm a big fan of utilizing nested modules to compose bigger architecture, but this post is not about module composition patterns. Terraform has some docs on that and it's definitely worth a read!

Terraform 0.12 supports building resources with the for_each meta-argument and for expressions. This allows us to design modules with minimal input requirements while providing flexible DAG behavior based on the object structures being passed in.

There is no shortage of highly configurable VPC modules online, but most of them aren't using the new 0.12 feature set. Even the official AWS VPC module has minimal for_each usage (probably by design to support various use cases, and pre-0.12 we only had count). It has a big interface with lots of internal logic that supports many different configurations, which makes it hard to follow.

In general, I prefer my Terraform modules to have specific behavior out of the box. Like the TL;DR says, I want a VPC module that builds a private and public subnet per AZ with the proper routing and labeling, and I'd like to control which AZs get built with an AZ-to-subnet configuration map.

Note: In the code examples, ... means code has been omitted in between; it's not the Terraform expansion (ellipsis) symbol.

The Old Way

count

Previously, in Terraform 0.11, we could only build resource lists with count. Notice that the third octet of the subnet CIDR is generated from the index of each resource object in the list.

variable = "cidr_block" {
  default = "10.0.0.0/16"
}

variable "azs" {
  type = list
  default = ["a", "b", "c"]
}
...

resource "aws_subnet" "private" {
  count                   = length(var.azs)

  vpc_id                  = aws_vpc.vpc.id
  availability_zone       = format("%s%s", local.region_name, var.azs[count.index])
  cidr_block              = cidrsubnet(var.cidr_block, 8, count.index)
  map_public_ip_on_launch = false
}
...

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}

This would give us an indexed list of aws_subnet.private resource objects, similar to this abstract example:

aws_subnet.private = [
  {
    az     = "us-east-1a",
    subnet = "10.0.0.0/24"
  },
  {
    az     = "us-east-1b",
    subnet = "10.0.1.0/24"
  },
  {
    az     = "us-east-1c",
    subnet = "10.0.2.0/24"
  }
]

When resource creation depends on the index of a list, the configuration is effectively static because any change in ordering causes the list to be recomputed. That forces new resources, which is what we don't want.

Continuing the abstract example:

# a, b, and c are objects
aws_subnet.private = [a, b, c]
=> remove b subnet
aws_subnet.private = [a, c]
tf plan => (forced to rebuild resources based on the index change for c)

If c were removed from the end of the list, only the c-related resources would be deleted; the other resources would not be affected because their indexes would not have changed.

However, if b were removed, then the index for c would change from 2 to 1, causing the subnet to be recomputed with a different CIDR and forcing new resources.

The for_each meta-argument replaces count in most cases, letting us iterate over more flexible data structures like maps and sets and avoiding this destructive re-ordering behavior. However, count is still useful when I want simple toggle logic on a resource (e.g. count = var.enabled ? 1 : 0 or count = length(var.some_list) > 0 ? 1 : 0).
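
For example, here's a minimal sketch of the toggle pattern (the variable and resource names are hypothetical, not part of the module):

variable "build_bastion_eip" {
  type    = bool
  default = false
}

# Creates zero or one EIP depending on the toggle.
resource "aws_eip" "bastion" {
  count = var.build_bastion_eip ? 1 : 0

  vpc = true
}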

A New Way

for_each

The for_each meta-argument can iterate over a static map/set to create a resource map/set. However, it can't directly iterate over a resource map/set generated at runtime, though there are exceptions to this using for.

variable "azs" {
 type = map(number)
 default = {
   a = 1
   b = 2
   c = 3
 }
}
...
...

resource "aws_subnet" "private" {
  for_each = var.azs

  vpc_id                  = aws_vpc.vpc.id
  availability_zone       = format("%s%s", local.region_name, each.key)
  cidr_block              = cidrsubnet(var.cidr_block, 8, each.value)
  map_public_ip_on_launch = false
}
...

Now, we’ll get an aws_subnet.private resource map with the same keys as the static map that was passed into for_each.

Abstract example:

aws_subnet.private = {
  a = {
    az     = "us-east-1a",
    subnet = "10.0.0.0/24"
  },
  b = {
    az     = "us-east-1b",
    subnet = "10.0.1.0/24"
  },
  c = {
    az     = "us-east-1c",
    subnet = "10.0.2.0/24"
  }
}

for

for expressions are a way to transform a static map/list/set or a runtime-generated resource map/list into a new map or list. They're similar to Python dict/list comprehensions, with different syntax.

It's important to know that the splat operator [*] is shorthand for a for expression, and splat works on lists only.

We can iterate over the aws_subnet.private resource list created with count and create a new list containing the private subnet IDs via the splat operator:

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}

This is equivalent to:

output "private_subnet_ids" {
  value = [ for subnet in aws_subnet.private : subnet.id ]
}

Both will give us:

[ "private-subnet-id-1", "private-subnet-id-2", "private-subnet-id-3" ]

Here we’ll output a new map of AZs associated to private subnet IDs:

output "private_subnet_ids" {
  value = { for az, subnet in aws_subnet.private : az => subnet.id }
}

This gives us:

{
  a = "private-subnet-id-1"
  b = "private-subnet-id-2"
  c = "private-subnet-id-3"
}

You can also use for as a filter.

Filter a list of objects based on an attribute:

locals {
  list_of_objects = [
    { type = "typeA" },
    { type = "typeB" },
    { type = "typeC" }
  ]
}

output "filtered_list_of_objects_based_on_type" {
  value = [ for o in local.list_of_objects : o if o.type == "typeA" ]
}
=>
[ {type = "typeA"} ]

Filter a map by value:

locals {
  some_map = {
    key1 = "value1"
    key2 = "value2"
    key3 = "value1"
  }
}

output "filtered_map_value" {
value = { for k, v in local.some_map : k => v if v == "value1" }
}
=>
{
  key1 = "value1"
  key3 = "value1"
}

for can be used just about anywhere.
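
For instance, here's a small sketch (hypothetical names, not from the module) of a for expression used inline in a resource argument:

locals {
  teams = ["platform", "network"]
}

resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"

  # for used inline in an argument to build a tags map from a list.
  tags = { for t in local.teams : "team-${t}" => "enabled" }
}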


Update:

In the rest of this for section I stated that passing resources directly into for_each was not allowed. Well, I was wrong and you can do this.

I assumed it wasn't possible because I was hitting an error related to this:

The keys of the map must be known values, or you will get an error saying that for_each has dependencies that cannot be determined before apply.

So there was some initial misunderstanding.

However, now that I know it’s possible I don’t think it necessarily means that you should.

I use the pre-built map pattern which still holds strong (see the next Dynamically Generated Resources section).

But it’s important to know the edge cases and when to use them.
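
For reference, here's a hedged sketch of what passing a resource map directly into for_each looks like. It works in this case because the map's keys (the AZ letters) are known before apply:

resource "aws_route_table_association" "private" {
  # Iterate directly over the runtime-generated resource map.
  for_each = aws_subnet.private

  subnet_id      = each.value.id
  route_table_id = aws_route_table.private[each.key].id
}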

I also wanted to reflect on another statement where I didn't recommend using for on a resource to pass into for_each, but I was wrong again.

Looking back, I would strongly recommend doing this because it helps transform resources into the different data structures you need.


I've seen for used to generate a new map from a runtime-generated resource map as input into for_each, which can get around this limitation:

resource "aws_route_table_association" "private" {
  for_each = aws_subnet.private #<=== NOT ALLOWED
  ...
}

Here's a complicated example to show how far you can push the use of for, leading to unnecessary complexity (not recommended):

...

resource "aws_route_table_association" "private" {
  for_each = { for az, subnet in aws_subnet.private: az => subnet.id }

  route_table_id = lookup(
    { for az, route_table in aws_route_table.private : az => route_table.id },
    each.key
  )
  subnet_id = lookup(
    { for az, subnet in aws_subnet.private : az => subnet.id },
    each.key
  )
}
...

I don't recommend building resources by using for to generate a new map from a runtime-generated resource map as input into for_each. Not only does this make reasoning about the resource more difficult, but the resource creation now directly depends on another resource's state to build itself, which leaves us with less flexibility and control when state changes upstream. There's also an unnecessary amount of for iteration when setting attribute values via the lookup function. This can all be simplified with a different pattern.

Dynamically Generated Resources

The complicated for example above can be simplified by using a shared static map that for_each iterates over for related resources, and by using functions to access the resource maps generated at runtime.

variable "azs" {
 type = map(number)
 default = {
   a = 1
   b = 2
   c = 3
 }
}
...
...

resource "aws_route_table_association" "private" {
  for_each = var.azs

  route_table_id = lookup(aws_route_table.private, each.key).id
  subnet_id      = lookup(aws_subnet.private, each.key).id
}
...

This pattern allows us to dynamically generate those in-between resources (i.e. association resources). They link themselves together by looking up the same keys as the other related resource maps; this is the advantage of using the shared static map in for_each.
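
As another illustration of the same pattern (a sketch with assumed resource names like aws_subnet.public, not the module's exact code), one EIP and NAT Gateway per AZ can be keyed off the same var.azs map so they line up with the public subnets:

resource "aws_eip" "nat" {
  for_each = var.azs

  vpc = true
}

resource "aws_nat_gateway" "this" {
  for_each = var.azs

  # Link the NAT Gateway to the EIP and public subnet that share its AZ key.
  allocation_id = lookup(aws_eip.nat, each.key).id
  subnet_id     = lookup(aws_subnet.public, each.key).id
}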

Bringing It Together

Now that we can dynamically generate resources, in theory we should be able to remove AZ b from the AZ map, which will remove all resources related to it while not affecting resources related to the other AZs.

var.azs = {
  a = 1
  b = 2
  c = 3
}
=> remove b subnet
var.azs = {
  a = 1
  c = 3
}
tf apply => (successful, no conflicts)

This works as expected. Now we should be able to add AZ b back in:

var.azs = {
  a = 1
  c = 3
}
=> add b subnet back in
var.azs = {
  a = 1
  b = 2
  c = 3
}
tf plan => (unexpected changes to resources related to AZs a and c)

It seems that when the b subnet is added back into the map, somewhere in the dependency chain certain resource maps want to re-compute their attribute values, some of which force new resources, like route_table_id and subnet_id in an association resource:

...

resource "aws_route_table_association" "private" {
  for_each = var.azs

  route_table_id = lookup(aws_route_table.private, each.key).id
  subnet_id      = lookup(aws_subnet.private, each.key).id
}
...

It appears that an attribute which forces a new resource assumes its value will change (even if the value stays the same) during resource creation or modification while iterating with for_each.

We can address this issue with ignore_changes in a lifecycle meta-argument block. For every dynamically generated resource, we'll ignore the attributes that are being set.

...

resource "aws_route_table_association" "private" {
  for_each = var.azs

  route_table_id = lookup(aws_route_table.private, each.key).id
  subnet_id      = lookup(aws_subnet.private, each.key).id

  lifecycle {
    ignore_changes = [subnet_id, route_table_id]
  }
}
...

Setting ignore_changes for subnet_id and route_table_id is OK because we know the re-computed values will be the same as before, since we are not changing the adjacent resources in the resource map. So we'll ignore changes to those attributes that want to force a new resource.

It's advised to use ignore_changes sparingly because the module will no longer detect state drift on those attributes if changes are made in the AWS console.

Now we can add AZ b back in without affecting the other resources.

var.azs = {
  a = 1
  c = 3
}
=> add AZ b back in
var.azs = {
  a = 1
  b = 2
  c = 3
}
tf apply => (successful, no conflicts)

Using the VPC module

This configuration will create a VPC in the us-east-1 region with a NAT Gateway per AZ and routing for each private and public subnet. Every taggable resource will have proper naming including environment, region, and AZ. Everything is in main.tf and variables.tf because I wanted less focus on the directory structure.

Also, I like being explicit about passing an aliased provider into the module. It makes it easier to identify which region or account I'm applying module resources to.

terraform {
  required_version = "~> 0.12.6"
  required_providers {
    aws = "~> 2.70.0"
  }
}

# base provider
provider "aws" {
  region = "us-east-1"
}

provider "aws" {
  region = "us-east-1"
  alias  = "use1"
}

variable "region_az_short_names" {
  description = "Region and AZ names mapped to short naming conventions for labeling"
  type = map(string)

  default = {
    us-east-1  = "use1"
    us-east-1a = "use1a"
    us-east-1b = "use1b"
    us-east-1c = "use1c"
    us-west-2  = "usw2"
    us-west-2a = "usw2a"
    us-west-2b = "usw2b"
    us-west-2c = "usw2c"
  }
}

module "stage_use1_vpc" {
  source = "git@github.com:JudeQuintana/terraform-modules.git//networking/dynamic_vpc?ref=v1.0.4"

  providers = {
    aws = aws.use1
  }

  env_prefix            = "stage"
  region_az_short_names = var.region_az_short_names
  cidr_block            = "10.0.0.0/16"
  azs = {
    a = 1
    b = 2
    c = 3
  }
}

Note: The third octet of the private subnets corresponds to the values in the var.azs map. The third octet of the public subnets is n + 32.

AZ  Resource        Subnet CIDR   Routing
a   private subnet  10.0.1.0/24   traffic routes out NAT Gateway in AZ a
a   public subnet   10.0.33.0/24  traffic routes out IGW
b   private subnet  10.0.2.0/24   traffic routes out NAT Gateway in AZ b
b   public subnet   10.0.34.0/24  traffic routes out IGW
c   private subnet  10.0.3.0/24   traffic routes out NAT Gateway in AZ c
c   public subnet   10.0.35.0/24  traffic routes out IGW
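
The third octet offsets noted above could be computed with cidrsubnet, for example (a sketch of an assumed implementation, not necessarily the module's exact code):

locals {
  azs = { a = 1, b = 2, c = 3 }

  # Private subnets: 10.0.1.0/24, 10.0.2.0/24, 10.0.3.0/24
  private_cidrs = { for az, n in local.azs : az => cidrsubnet("10.0.0.0/16", 8, n) }

  # Public subnets: 10.0.33.0/24, 10.0.34.0/24, 10.0.35.0/24
  public_cidrs = { for az, n in local.azs : az => cidrsubnet("10.0.0.0/16", 8, n + 32) }
}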

Again, here is the related VPC network diagram for visual reference (example_vpc).

Caveats

This VPC module is more of a learning exercise, and it does generate resources that cost money (i.e. NAT Gateways and EIPs). When it comes to scaling out networks via peer links, it's best practice to segment your network tiers with their own subnets per AZ (i.e. private app subnet, public load balancer subnet, etc). Network segmentation makes it easier to configure security groups across the VPC peer links because you can't share security group IDs across VPCs, only subnets!
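
For example, here's a rough sketch (hypothetical names and CIDRs) of a security group rule written against a peer VPC's subnet CIDR instead of a security group ID:

resource "aws_security_group_rule" "allow_https_from_peer_app_subnet" {
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["10.1.1.0/24"] # the peer VPC's private app subnet
  security_group_id = aws_security_group.app.id
}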

Closing Thoughts

I hope this shows how far you can get with dynamic behavior within a single module. The new Terraform 0.13 feature set forces me to reconsider my initial designs. I'm planning to refactor this VPC module into a sub-component design to experiment with for_each on modules. This will open up even more module composability and abstraction options, so I'm looking forward to the future.

~jq1

Feedback

What did you think about this post? jude@jq1.io