Use Case

You are using Terraform to deploy AWS EC2 instances and EMR clusters, and you want to spread them randomly across your subnets.

The Issue

The AWS provider has no direct way to say "give me a random subnet." You can look up a set of subnet ids or a single subnet, but neither helps distribute your workload.

The Solution

Use the random_id resource and some basic modulo math to select a subnet at random. To check that this gives an even distribution across all the subnets, I ran a few tests and was happy with the results.

Assuming you have 5 subnets, the results for 1000 random ids were:

"x=0", occured 208 times for 20%
"x=1", occured 205 times for 20%
"x=2", occured 194 times for 19%
"x=3", occured 192 times for 19%
"x=4", occured 201 times for 20%

You can find my test code and run the numbers yourself in my terraform-tips-and-workarounds GitHub repo. I ran this on a MacBook with a Core i7 processor. If you run the test and don't get an even distribution, let me know so I can update the test results.
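The test code itself lives in the repo mentioned above. The following is only a minimal sketch of how a similar check could be written in Terraform, not that code. It assumes 5 subnets and 1000 samples to mirror the numbers above, and the names sample, subnet_count, sample_indexes, and distribution are illustrative.

```hcl
# Sketch only: generate 1000 random ids and count how often each
# modulo result appears. All names here are illustrative.
resource "random_id" "sample" {
  count       = 1000
  byte_length = 2
}

locals {
  subnet_count = 5 # assumed number of subnets

  # One index per sample, each in the range 0..subnet_count-1.
  sample_indexes = [for r in random_id.sample : r.dec % local.subnet_count]

  # Map of "x=<index>" to the number of samples that landed on it.
  distribution = {
    for i in range(local.subnet_count) :
    "x=${i}" => length([for x in local.sample_indexes : x if x == i])
  }
}

output "distribution" {
  value = local.distribution
}
```

Because this uses only the random provider, it should apply without any AWS credentials. The counts will differ on every fresh run.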

Implementation

I'm going to review the Terraform segment by segment. You can access the source, which is ready to deploy, in my tips-tricks-workarounds GitHub repo.

List of subnets

First you need to get the list of subnets. This is done in 2 steps.

  1. Query for the default VPC. If you don't want to use your default VPC, look at the filter and tags options on aws_vpc to select the VPC dynamically.
  2. Get the subnet ids for the default VPC. If you don't want to use all the subnets, you can use the filter and tags options on aws_subnet_ids, similar to aws_vpc.

data "aws_vpc" "default" {
  default = true
}

data "aws_subnet_ids" "current" {
  vpc_id = data.aws_vpc.default.id
}
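If you are not using the default VPC, the tags option mentioned above could be used along these lines. This is a sketch under assumptions: the tag keys and values (Name = "my-app-vpc", Tier = "private") and the data source names selected and private are hypothetical and need to match how your own VPC and subnets are tagged.

```hcl
# Select a specific VPC by tag instead of the default one.
data "aws_vpc" "selected" {
  tags = {
    Name = "my-app-vpc" # hypothetical tag value
  }
}

# Only return subnets in that VPC that carry a given tag.
data "aws_subnet_ids" "private" {
  vpc_id = data.aws_vpc.selected.id

  tags = {
    Tier = "private" # hypothetical tag key and value
  }
}
```

The rest of the walkthrough would then read from data.aws_subnet_ids.private.ids instead of data.aws_subnet_ids.current.ids. On recent AWS provider releases, where aws_subnet_ids has been superseded by the aws_subnets data source, the same idea should carry over using a filter on vpc-id.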

AMI

We need an AMI to deploy an EC2 instance. Here I am just querying for the latest release of the Amazon Linux 2 AMI.

data "aws_ami" "current" {
  most_recent = true

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  # Use Amazon Linux 2 AMI (HVM) SSD Volume Type
  name_regex = "^amzn2-ami-hvm-.*x86_64-gp2"
  owners     = ["137112412989"] # Amazon
}

Random Time

This is step 1 of the magic. We need to generate a random number. Note that you will need a separate random_id for each instance or EMR cluster you are deploying.

resource "random_id" "index" {
  # 2 random bytes give a decimal value between 0 and 65535
  byte_length = 2
}

The Math

  1. subnet_ids_list: Convert the subnet ids from a set to a list so we can access them by index.
  2. subnet_ids_random_index: Generate the random index from our random number. The % operator performs the modulo calculation, so the result always falls between 0 and the number of subnets minus 1. If you are not familiar with modulo, check out the Wikipedia "Modulo operation" article.
  3. instance_subnet_id: Use the random index to select a subnet id at random.

And poof, there is your magic in action.

locals {
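  # Sets cannot be indexed, so convert the set of ids to a list
  subnet_ids_list = tolist(data.aws_subnet_ids.current.ids)

  # Map the random number onto a valid list index
  subnet_ids_random_index = random_id.index.dec % length(local.subnet_ids_list)

  # Pick the subnet id at that index
  instance_subnet_id = local.subnet_ids_list[local.subnet_ids_random_index]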
}
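To make the arithmetic concrete, here is a small worked sketch with fixed values in place of the real data. The subnet ids and the number 40003 are made up for illustration. With byte_length = 2 the real value can be anything from 0 to 65535.

```hcl
locals {
  # Hypothetical subnet ids standing in for the real list.
  example_subnets = ["subnet-aaa", "subnet-bbb", "subnet-ccc", "subnet-ddd", "subnet-eee"]

  # Stand-in for random_id.index.dec.
  example_random = 40003

  # 40003 % 5 = 3, since 40000 is divisible by 5.
  example_index = local.example_random % length(local.example_subnets)

  # Index 3 is the fourth element: "subnet-ddd".
  example_subnet = local.example_subnets[local.example_index]
}
```

Whatever the random value is, the modulo result stays within the bounds of the list, so the lookup cannot go out of range as long as the list has at least one subnet.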

Using It

Now you have a random subnet id that you can use in your aws_instance. Also note the ignore_changes setting, which ensures that you don't accidentally destroy and recreate the instance on a future run. Without it, that would happen only if the set of subnets in the VPC changed, for example when a new subnet is added.

resource "aws_instance" "instance" {
  ami           = data.aws_ami.current.id
  instance_type = "t3.micro"

  subnet_id = local.instance_subnet_id

  lifecycle {
    ignore_changes = [subnet_id]
  }

  tags = {
    Name = "random_subnet_test"
  }
}
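As noted in the Random Time section, each instance needs its own random_id. One way to do that for several instances is to use count on both resources. This is a sketch rather than part of the original code: the variable instance_count, its default of 3, and the names per_instance, per_instance_subnet_ids, and spread are assumptions, and it relies on data.aws_ami.current and local.subnet_ids_list from the sections above.

```hcl
variable "instance_count" {
  type    = number
  default = 3 # hypothetical number of instances
}

# One random id per instance.
resource "random_id" "per_instance" {
  count       = var.instance_count
  byte_length = 2
}

locals {
  # One randomly chosen subnet id per instance, in the same order.
  per_instance_subnet_ids = [
    for r in random_id.per_instance :
    local.subnet_ids_list[r.dec % length(local.subnet_ids_list)]
  ]
}

resource "aws_instance" "spread" {
  count         = var.instance_count
  ami           = data.aws_ami.current.id
  instance_type = "t3.micro"

  subnet_id = local.per_instance_subnet_ids[count.index]

  lifecycle {
    ignore_changes = [subnet_id]
  }

  tags = {
    Name = "random_subnet_test_${count.index}"
  }
}
```

With only a handful of instances, two of them can still land in the same subnet. The even spread shown in the test results only emerges over many picks.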