Maps of Objects in Terraform

May 29, 2022

Maps of objects are my go-to types for any repeatable resource in Terraform.

When creating more than one resource, maps of objects allow for a minimal footprint and remove the need to create multiple resources of the same type and/or multiple list(<type>)/map(<type>) variables.

Terraform variables are usually defined like this:

variable "a" {
    type    = string
    default = "testing"
}

variable "b" {
    type    = number
    default = 123
}

which work for small, simple resources that consume them:

resource "example_resource" "example" {
    argument_1 = var.a
    argument_2 = var.b
}

However, this starts to get unwieldy when doing something like this:

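data "aws_ami" "ubuntu" {
    most_recent = true
    # 099720109477 is Canonical's AWS account; limiting owners avoids matching look-alike AMIs
    owners      = ["099720109477"]
    filter {
        name = "name"
        values = [
            "ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"
        ]
    }
}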

resource "aws_instance" "instance1" {
    ami                 = data.aws_ami.ubuntu.id
    instance_type       = var.instance_1_type
    availability_zone   = var.instance_1_az
    ...
}

resource "aws_instance" "instance2" {
    ami                 = data.aws_ami.ubuntu.id
    instance_type       = var.instance_2_type
    availability_zone   = var.instance_2_az
    ...
}

resource "aws_instance" "instance3" {
    ami                 = data.aws_ami.ubuntu.id
    instance_type       = var.instance_3_type
    availability_zone   = var.instance_3_az
    ...
}

where multiple resources have different values for the same argument. Every new instance means another resource block and another set of variables, so this approach is not DRY (Don’t Repeat Yourself) at all.

A small improvement could be made by utilizing a list(string) variable:

variable "instance_types" {
    type = list(string)
    default = [
        "t3a.large",
        "t3a.xlarge",
        "t3a.2xlarge"
    ]
}

and then referencing each value in the corresponding resources like so:

resource "aws_instance" "instance1" {
    ...
    instance_type = var.instance_types[0]
    ...
}

resource "aws_instance" "instance2" {
    ...
    instance_type = var.instance_types[1]
    ...
}

resource "aws_instance" "instance3" {
    ...
    instance_type = var.instance_types[2]
    ...
}

This approach is better, but which instance type actually corresponds to which instance? The only link is the position in the list, and nothing in variables.tf ties an index to a specific instance. Lists can be ordered incorrectly and may not be defined in the order they are intended to be consumed.

While it is possible to make the list example more concise by doing:

resource "aws_instance" "instance" {
    count           = length(var.instance_types)
    ...
    instance_type   = var.instance_types[count.index]
    ...
}

accommodating other variables (like availability_zone) could be problematic because nothing guarantees that the lists are ordered consistently with each other. For instance, the availability_zone variable could be defined as ["us-west-2a", "us-west-2c", "us-west-2b"], and var.availability_zone[1] would then place instance2 in us-west-2c rather than the intended us-west-2b. This specific case would not cause any issues, but more complicated resources might function improperly or not at all if this is not taken into account.

A possible real world example of this could be which subnets are associated with an EKS node group – a given EKS node group could potentially have private subnet “A” and public subnet “C” associated with it which may not cause issues immediately but very well could down the line.
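
To make the mismatch concrete, here is a minimal sketch of the count approach with two parallel lists. The availability_zones variable name and its default are assumptions for illustration, and the ami reference reuses the aws_ami data source from earlier.

variable "instance_types" {
    type    = list(string)
    default = ["t3a.large", "t3a.xlarge", "t3a.2xlarge"]
}

variable "availability_zones" {
    # Hypothetical companion list; nothing enforces that it lines up with instance_types
    type    = list(string)
    default = ["us-west-2a", "us-west-2c", "us-west-2b"]
}

resource "aws_instance" "instance" {
    count             = length(var.instance_types)
    ami               = data.aws_ami.ubuntu.id
    instance_type     = var.instance_types[count.index]
    # Index 1 pairs "t3a.xlarge" with "us-west-2c" purely by position
    availability_zone = var.availability_zones[count.index]
}

The pairing exists only by position. Removing or reordering an element in either list also shifts every later index, so Terraform would plan changes for instances that were never meant to be touched.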

After the list, a further improvement could be to use a map(string) variable (similar to the ubiquitous tags argument/variable):

variable "instance_types" {
    type = map(string)
    default = {
        instance1 = "t3a.xlarge"
        instance2 = "t3a.2xlarge"
        instance3 = "t3a.large"
    }
}

Note that the ordering of instance types here is purposefully different from the list(string) example.

Each resource then references its own key to get its value:

resource "aws_instance" "instance1" {
    ...
    instance_type = var.instance_types["instance1"]
    ...
}

resource "aws_instance" "instance2" {
    ...
    instance_type = var.instance_types["instance2"]
    ...
}

resource "aws_instance" "instance3" {
    ...
    instance_type = var.instance_types["instance3"]
    ...
}

The availability_zone argument would work the same – var.availability_zones["instanceX"]

The ordering of keys within the variable does not matter here, which is an improvement over the list; we can also tell which instance specifically uses which instance type (or availability zone).

This certainly works for one or two variables, but defining many variables that each have this structure has a tendency to sprawl, not to mention having to add ["instanceX"] everywhere that requires it.
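
For comparison, here is a sketch of how far plain map(string) variables can be pushed with for_each. It assumes a second, hypothetical availability_zones variable that uses exactly the same keys as instance_types.

variable "instance_types" {
    type = map(string)
    default = {
        instance1 = "t3a.xlarge"
        instance2 = "t3a.2xlarge"
        instance3 = "t3a.large"
    }
}

variable "availability_zones" {
    # Hypothetical: these keys have to be kept in sync with instance_types by hand
    type = map(string)
    default = {
        instance1 = "us-west-2a"
        instance2 = "us-west-2b"
        instance3 = "us-west-2c"
    }
}

resource "aws_instance" "instance" {
    for_each          = var.instance_types
    ami               = data.aws_ami.ubuntu.id
    instance_type     = each.value
    # Look up the matching entry in the second map by the same key
    availability_zone = var.availability_zones[each.key]
}

This removes the duplicated resource blocks, but every additional argument still needs its own map, and a key missing from any one of them makes the plan fail.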

How can this be improved upon? Enter the map of objects:

variable "instances" {
    type = map(object({
        instance_type       = string
        availability_zone   = string
    }))
    default = {
        instance1 = {
            instance_type       = "t3a.xlarge"
            availability_zone   = "us-west-2a"
        }
        instance2 = {
            instance_type       = "t3a.2xlarge"
            availability_zone   = "us-west-2b"
        }
        instance3 = {
            instance_type       = "t3a.large"
            availability_zone   = "us-west-2c"
        }
    }
}

Defining a map of objects requires more syntax overhead on the variable side, but the end result in the resource/module-caller side is much cleaner. This is also where the for_each argument comes in:

resource "aws_instance" "instance" {
    for_each            = var.instances
    ...
    ami                 = data.aws_ami.ubuntu.id
    instance_type       = each.value.instance_type
    availability_zone   = each.value.availability_zone
    ...
}

And that’s it! Much cleaner. Only one aws_instance resource needs to be defined and the inputs are consolidated within a single variable. When a resource needs additional inputs, the type definition and each object in the map have to be updated, but that is a change to one variable rather than to several resources and several variables.
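
One way to soften the cost of adding inputs, assuming Terraform 1.3 or later, is to mark new attributes as optional with a default so that existing entries do not have to change. The monitoring attribute below is an illustrative addition, not part of the variable defined above.

variable "instances" {
    type = map(object({
        instance_type     = string
        availability_zone = string
        # optional(type, default) needs Terraform 1.3+; entries that omit it get false
        monitoring        = optional(bool, false)
    }))
    default = {
        instance1 = {
            instance_type     = "t3a.xlarge"
            availability_zone = "us-west-2a"
        }
        instance2 = {
            instance_type     = "t3a.2xlarge"
            availability_zone = "us-west-2b"
            monitoring        = true
        }
    }
}

resource "aws_instance" "instance" {
    for_each          = var.instances
    ami               = data.aws_ami.ubuntu.id
    instance_type     = each.value.instance_type
    availability_zone = each.value.availability_zone
    monitoring        = each.value.monitoring
}

Individual instances are then addressed by their map key, for example aws_instance.instance["instance2"].id.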

This approach also works for modules (assuming Terraform 0.13 or later, which added for_each on module blocks, and a module that declares the matching input variables):

module "instance_module" {
    for_each    = var.instances
    # Local module paths must start with ./ or ../
    source      = "./path/to/instances_module"

    instance_type       = each.value.instance_type
    availability_zone   = each.value.availability_zone
}
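
On the module side, nothing special is needed beyond input variables that match what the caller passes in. Here is a minimal sketch of what such a module might contain. The resource name "this" and the in-module AMI lookup are assumptions, since the module itself is not shown here.

variable "instance_type" {
    # Declared inside the module, typically in its variables.tf
    type = string
}

variable "availability_zone" {
    type = string
}

resource "aws_instance" "this" {
    # Assumes the module defines its own aws_ami data source like the one shown earlier
    ami               = data.aws_ami.ubuntu.id
    instance_type     = var.instance_type
    availability_zone = var.availability_zone
}

Each module instance is then addressed by its map key, for example module.instance_module["instance1"].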

This approach has proved invaluable, especially when creating resources like:

  • S3 buckets
  • EC2 instances
  • EKS node groups
  • IAM resources
  • Aurora clusters and other RDS resources
  • Vault policies, secrets engines, and auth backends

to name several (though this list is hardly exhaustive).

Maps of objects can support any type (at least within reason, a nested map of objects is not something I’ve tested…), so something like this will work:

variable "map_of_objects" {
    type = map(object({
        a = string
        b = list(string)
    }))
    default = {}
}

as will this:

variable "map_of_objects" {
    type = map(object({
        a = set(string)
        b = list(number)
        c = list(map(string))
    }))
    default = {}
}

For those who are curious, a list(map(string)) can be used to represent parameters for an Aurora cluster parameter group.
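
As a sketch of that idea, a list(map(string)) can feed a dynamic block on aws_rds_cluster_parameter_group. The variable name, the "app-cluster" key, the family, and the parameter values are illustrative assumptions.

variable "cluster_parameter_groups" {
    type = map(object({
        family     = string
        parameters = list(map(string))
    }))
    default = {
        "app-cluster" = {
            family = "aurora-mysql8.0"
            parameters = [
                { name = "character_set_server", value = "utf8mb4" },
                { name = "character_set_client", value = "utf8mb4", apply_method = "immediate" }
            ]
        }
    }
}

resource "aws_rds_cluster_parameter_group" "this" {
    for_each = var.cluster_parameter_groups
    name     = each.key
    family   = each.value.family

    dynamic "parameter" {
        # One parameter block is generated per map in the list
        for_each = each.value.parameters
        content {
            name         = parameter.value["name"]
            value        = parameter.value["value"]
            # Use "immediate" when the map does not set apply_method
            apply_method = lookup(parameter.value, "apply_method", "immediate")
        }
    }
}

Because each parameter is a map(string), every value has to be written as a string, which matches how the parameter block takes its arguments.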

With how flexible and adaptable this pattern is, creating a variable as a map of objects and defining resources/modules to consume it from the start makes sense for most use cases unless a resource is a one-off. Additionally, migrating existing resources to this pattern pays off later, because adding a new instance of the resource becomes a matter of adding one entry to the map.
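
For resources that already exist, a migration can avoid destroy-and-recreate plans by telling Terraform about the new addresses. Here is a minimal sketch, assuming Terraform 1.1 or later and the resource names used earlier in this post:

moved {
    # The old standalone resource becomes the "instance1" entry of the for_each resource
    from = aws_instance.instance1
    to   = aws_instance.instance["instance1"]
}

moved {
    from = aws_instance.instance2
    to   = aws_instance.instance["instance2"]
}

moved {
    from = aws_instance.instance3
    to   = aws_instance.instance["instance3"]
}

On older versions, terraform state mv achieves the same result from the command line.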