Programmatic Terraform config manipulation, Semgrep's autofix, and an example of OSS contribution

A story of cloud, automation, but mostly just contributing to open source - in several acts

Prologue - The TL;DR

The long and the short of it is that Semgrep's autofix experiment feels like a decent way to programmatically manipulate Terraform configurations, along with many other things.

Check out the screenshot below of a local run, or the Semgrep playground example chair6:terraform-autofix-example, to see how this might work when checking EC2 instance metadata configuration.

Semgrep Autofix Screenshot

But if you only read this tl;dr, then you'll miss out on knowing why I was looking at this in the first place, and - perhaps more interestingly - one example of what contributing to open source looks like.

Act 1 - The Setup

Somebody on a security-related Slack somewhere was asking how folks programmatically manipulate Terraform configurations, and there weren't many useful answers beyond "write something custom to interact with the HCL directly".

Mulling this over, I remembered that Semgrep had recently added an autofix experiment, as well as beta Terraform language support along with some example Terraform-related rules. Seemed like it could be an option?

Coming up with an arbitrary use case ("I want to programmatically check that all my aws_instance resources have a metadata_options block set, and add it if not."), I started digging in. (For more information about why you might want to configure IMDSv2 metadata options for your EC2 instances, see the AWS blog article that introduced it.)

Act 2 - The Problem

I pip install semgrep'd in my working virtualenv, and soon ran into the first snag... the pip-installable package doesn't currently have an ARM64 build of semgrep-core (Semgrep's OCaml engine) that'll work on my Apple M1. According to a quick search, this is a known issue and the workaround is to brew install or use Rosetta, but I don't use Homebrew (MacPorts is still a thing) and would prefer not to use Rosetta.

According to my possibly-incomplete notes, the steps I took to get this going under MacPorts, based somewhat on semgrep-core documentation, were:

$ sudo port install ocaml opam tree-sitter
$ opam init && source ~/.bash_profile
$ git clone https://github.com/returntocorp/semgrep
$ cd semgrep
$ make dev-setup
$ make build-core

👍, semgrep is installed and working now (which would've been much more straightforward without the M1 complication).

I threw together a simple Terraform configuration to target:

resource "aws_instance" "example1" {
  ami = "ami-005e54dee72cc1d01"
  instance_type = "t2.micro"
}

resource "aws_instance" "example2" {
  ami = "ami-005e54dee72cc1d02"
  instance_type = "t2.micro"
}

... then, after some trial-and-error, a Semgrep rule that fit the use case:

rules:
- id: ec2-instance-metadata-options
  languages:
  - terraform
  message: EC2 instance does not set metadata options
  severity: WARNING
  patterns:
  - pattern-inside: |
      resource "aws_instance" "$RESNAME" {
      ...
      }
  - pattern-not-inside: |
      resource "aws_instance" "..." {
        ...
        metadata_options {
          ...
        }
        ...
      }
  fix-regex:
    # we do a greedy match to get all but the last }, then add the metadata_options{} block before it
    regex: (.*)\}
    replacement: |
      \1
        metadata_options {
          http_tokens = "required"
        }
      }
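To see what that regex actually does to a single finding, here is a small Python sketch that applies the same pattern and replacement with `re.sub` to the example1 block. It assumes fix-regex behaves like Python's `re.sub` with default flags, running over exactly the matched text. That is my assumption, not documented Semgrep behaviour. Under that assumption `.` does not match newlines. The "greedy" match therefore only lands on the line holding the closing brace, which explains the blank line that appears before the inserted block.

```python
import re

# The example1 resource, as the text matched by the rule
# (assumption: the fix-regex runs over exactly this matched text).
block = (
    'resource "aws_instance" "example1" {\n'
    '  ami           = "ami-005e54dee72cc1d01"\n'
    '  instance_type = "t2.micro"\n'
    "}"
)

# Same regex and replacement as the rule; YAML's `|` scalar keeps one trailing newline.
regex = r"(.*)\}"
replacement = '\\1\n  metadata_options {\n    http_tokens = "required"\n  }\n}\n'

# Without re.DOTALL, `.` stops at newlines, so the only match is the final
# "}" line, where group 1 captures an empty string.
fixed = re.sub(regex, replacement, block)
print(fixed)

# The fix turns a 4-line block into an 8-line one.
print(len(block.splitlines()), "->", len(fixed.splitlines()))
```

Each finding therefore grows the file by four lines. That starts to matter as soon as more than one fix is applied to the same file.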

The Semgrep run without autofix enabled looked good!

$ semgrep -c ec2-instance-metadata-options.yml main.tf
Running 1 rules...
main.tf
rule:ec2-instance-metadata-options: EC2 instance does not set metadata options
autofix: s/(.*)\}/\1
  metadata_options {
    http_tokens = "required"
  }
}
/g
1:resource "aws_instance" "example1" {
2:  ami           = "ami-005e54dee72cc1d01"
3:  instance_type = "t2.micro"
4:}
--------------------------------------------------------------------------------
autofix: s/(.*)\}/\1
  metadata_options {
    http_tokens = "required"
  }
}
/g
6:resource "aws_instance" "example2" {
7:  ami           = "ami-005e54dee72cc1d02"
8:  instance_type = "t2.micro"
9:}
ran 1 rules on 1 files: 2 findings

But running it with --autofix resulted in a somewhat messed-up Terraform configuration:

resource "aws_instance" "example1" {
  ami           = "ami-005e54dee72cc1d01"
  instance_type = "t2.micro"

  metadata_options {
    http_tokens = "required"

  metadata_options {
    http_tokens = "required"
  }
}


  metadata_options {
    http_tokens = "required"
  }
}

resource "aws_instance" "example2" {
  ami           = "ami-005e54dee72cc1d02"
  instance_type = "t2.micro"
}

It seemed like autofix wasn't taking line-number changes into account when making multi-line fixes: the fix meant for example2 had landed inside example1, and example2 was left untouched.
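One quick way to check that hunch is to apply only the first fix. Then look at what now sits at the line numbers originally reported for the second finding (lines 6 through 9). The illustrative Python sketch below holds the file as a list of lines. It is not Semgrep's actual code.

```python
# main.tf as a list of lines (1-indexed line N is original[N - 1]).
original = [
    'resource "aws_instance" "example1" {',
    '  ami           = "ami-005e54dee72cc1d01"',
    '  instance_type = "t2.micro"',
    "}",
    "",
    'resource "aws_instance" "example2" {',
    '  ami           = "ami-005e54dee72cc1d02"',
    '  instance_type = "t2.micro"',
    "}",
]

# The first fix replaces lines 1-4 with the same block plus metadata_options.
fixed_block = original[0:3] + [
    "",
    "  metadata_options {",
    '    http_tokens = "required"',
    "  }",
    "}",
]
modified = fixed_block + original[4:]

# The second finding was reported at original lines 6-9. Reusing those
# numbers unchanged selects text inside example1's new block.
stale_target = modified[5:9]
print(stale_target)

# example2 has actually moved down by the 4 lines the first fix added.
print(modified.index('resource "aws_instance" "example2" {') + 1)  # 1-indexed
```

The stale range contains example1's `  }` and `}` lines. Running the rule's regex over them would explain the extra metadata_options blocks inside example1 in the output above.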

I swung on over to Semgrep's home on GitHub, and found a similar-sounding issue. I could have just added a comment or created another issue and moved on, but figured I might dig a little deeper and see what a fix might involve.

Act 3 - The Resolution

I started by taking a look at Semgrep's GitHub repository, https://github.com/returntocorp/semgrep - specifically the CONTRIBUTING.md, which pointed me to a nice set of contributor docs at https://semgrep.dev/docs/contributing/contributing-code/. (I'd actually already found this set of docs when I was figuring out how to build semgrep-core for the M1.)

The Python project layout felt pretty standard. I found the code of interest in autofix.py, and saw that my assumption was correct - the autofix logic used the line numbers from the original set of rule matches, rather than tracking deltas or offsets between individual fixes. (I also found a solid set of e2e tests, 😍.)

I pulled out the example above (Terraform configuration and Semgrep rule with autofix component) and added it to the e2e tests. When I first ran the tests there was no expected output to match against, but the test framework has a handy --snapshot-update option to generate it. The new test passed at first, since the generated snapshot simply recorded the current (broken) output. Once I corrected the expected output, I had a failing test to work against.

The next step took some trial-and-error, and some working through the autofix logic. I started by adding a variable to track the line-count delta between the original and fixed code for each proposed fix, and applying the accumulated delta to subsequent fixes. That got me to a passing test, but I wanted something to double-check the extended autofix logic against. I went back to the similar-sounding issue, grabbed its example input / rule, and success! The code that corrected autofix's results for my issue also corrected the results for the related issue.
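The core idea can be sketched in a few lines of Python. Keep a running line offset, shift each fix's original line numbers by it, then add the difference between lines inserted and lines removed. The `Fix` type, `apply_fixes` and `add_metadata` are hypothetical names for illustration. Semgrep's autofix.py is structured differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fix:
    start_line: int        # 1-indexed, inclusive, in the ORIGINAL file
    end_line: int          # 1-indexed, inclusive, in the ORIGINAL file
    new_lines: List[str]   # replacement text for that range


def apply_fixes(lines: List[str], fixes: List[Fix]) -> List[str]:
    """Apply fixes top-to-bottom, shifting each by earlier size changes."""
    result = list(lines)
    line_offset = 0  # net lines added (negative: removed) by fixes so far
    for fix in sorted(fixes, key=lambda f: f.start_line):
        start = fix.start_line - 1 + line_offset  # 0-indexed start in `result`
        end = fix.end_line + line_offset          # exclusive end in `result`
        result[start:end] = fix.new_lines
        old_len = fix.end_line - fix.start_line + 1
        line_offset += len(fix.new_lines) - old_len
    return result


def add_metadata(block: List[str]) -> List[str]:
    """Mimic the rule's fix: metadata_options before the closing brace."""
    return block[:-1] + ["", "  metadata_options {",
                         '    http_tokens = "required"', "  }", "}"]


main_tf = [
    'resource "aws_instance" "example1" {',
    '  ami           = "ami-005e54dee72cc1d01"',
    '  instance_type = "t2.micro"',
    "}",
    "",
    'resource "aws_instance" "example2" {',
    '  ami           = "ami-005e54dee72cc1d02"',
    '  instance_type = "t2.micro"',
    "}",
]
# Both findings use line numbers reported against the original file.
fixes = [Fix(1, 4, add_metadata(main_tf[0:4])),
         Fix(6, 9, add_metadata(main_tf[5:9]))]
print("\n".join(apply_fixes(main_tf, fixes)))
```

With the offset applied, the second fix targets lines 10 through 13 of the partly fixed file. Those lines are where example2 actually lives by then.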

On looking through the project's GitHub issues again, I found another related issue. This one was caused by multiple fixes on the same line, which pointed to a problem with column tracking rather than line tracking. I set up an e2e test using the example input / rule from this issue, corrected the snapshot output, and had a failing test. After adding column tracking to the code, I was able to get this test passing as well.
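Column tracking follows the same pattern within a single line. Once a fix changes a line's length, any later fix on that same line has to shift its start and end columns by the accumulated difference. The hypothetical Python sketch below shows this. `ColumnFix` and `apply_same_line_fixes` are illustrative names, not Semgrep's API, and the one-line input is made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnFix:
    start_col: int    # 0-indexed, in the ORIGINAL line
    end_col: int      # exclusive, in the ORIGINAL line
    replacement: str


def apply_same_line_fixes(line: str, fixes: List[ColumnFix]) -> str:
    """Apply several fixes to one line, left to right."""
    col_offset = 0  # net characters added (or removed) by earlier fixes on this line
    for fix in sorted(fixes, key=lambda f: f.start_col):
        start = fix.start_col + col_offset
        end = fix.end_col + col_offset
        line = line[:start] + fix.replacement + line[end:]
        col_offset += len(fix.replacement) - (fix.end_col - fix.start_col)
    return line


# Two matches on the same line, with spans computed against the original text.
original = "x = foo(a) + foo(b)"
fixes = [ColumnFix(4, 7, "bar_baz"), ColumnFix(13, 16, "bar_baz")]
print(apply_same_line_fixes(original, fixes))
```

Without `col_offset`, the second fix would overwrite columns 13 to 16 of the already-modified line. That would mangle the `(a) +` region instead of replacing the second `foo`.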

I refactored the somewhat-hacky 🤪 changes I had into a form that better fit the existing codebase, specifically adding a FileOffsets object to track the line offset, column offset, and active line for each file targeted by autofix. I'd also ignored semgrep's pre-commit requirement up until now, so I did some iterative runs of that while I cleaned up the type declarations, code formatting, and a few other things. I kept running the e2e tests throughout this process to make sure I hadn't broken anything.
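The Python sketch below shows one way a per-file offsets object could combine the two. Only the three tracked values (line offset, column offset, active line) come from the description above. The rest is my own hypothetical interpretation, not the code that was merged. That includes the meaning of "active line" (the original line where the previous fix ended), the `to_current`/`record`/`apply_fix` helpers, and the dictionary keyed by path.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Tuple

Pos = Tuple[int, int]  # 0-indexed (line, column); end columns are exclusive


@dataclass
class FileOffsets:
    line_offset: int = 0   # current line = original line + line_offset
    col_offset: int = 0    # extra column shift that applies on `active_line`
    active_line: int = -1  # original line where the previous fix ended (-1: none yet)

    def to_current(self, line: int, col: int) -> Pos:
        """Map an original position into the partly fixed file."""
        if line == self.active_line:
            col += self.col_offset
        return line + self.line_offset, col

    def record(self, start: Pos, end: Pos, new_text: str) -> None:
        """Update offsets after original [start, end) became new_text."""
        cur_line, cur_col = self.to_current(*start)
        newlines = new_text.count("\n")
        if newlines == 0:
            end_col_now = cur_col + len(new_text)
        else:
            end_col_now = len(new_text) - new_text.rfind("\n") - 1
        # Whatever followed `end` on its original line now starts at
        # (cur_line + newlines, end_col_now) in the current text.
        self.line_offset = cur_line + newlines - end[0]
        self.col_offset = end_col_now - end[1]
        self.active_line = end[0]


def apply_fix(lines: List[str], offsets: FileOffsets,
              start: Pos, end: Pos, new_text: str) -> None:
    """Splice new_text into `lines` in place, using original coordinates."""
    sl, sc = offsets.to_current(*start)
    el, ec = offsets.to_current(*end)
    merged = lines[sl][:sc] + new_text + lines[el][ec:]
    lines[sl:el + 1] = merged.split("\n")
    offsets.record(start, end, new_text)  # uses the pre-fix offsets internally


# One FileOffsets per target file, created on first use.
offsets_by_file: DefaultDict[str, FileOffsets] = defaultdict(FileOffsets)

lines = ["a = foo + foo", "b = 1", "d = 4"]
fixes = [  # (start, end, replacement) in ORIGINAL coordinates, top-to-bottom
    ((0, 4), (0, 7), "bar_baz"),
    ((0, 10), (0, 13), "bar_baz"),  # same line: needs the column offset
    ((1, 4), (1, 5), "1\nc = 2"),   # adds a line
    ((2, 0), (2, 1), "e"),          # later line: needs the line offset
]
for start, end, new_text in fixes:
    apply_fix(lines, offsets_by_file["example.txt"], start, end, new_text)
print(lines)
```

Keying the offsets by file path means fixes in one file never shift positions in another. The column offset only applies while later fixes stay on the line where the previous fix ended.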

After running the tests locally one last time, I made several commits against my local repository/branch, then pushed it up to GitHub. I first created an issue describing my particular problem, then fairly soon afterward created the PR. Hey, it passed the tests in CI first time!

Within 12-24 hours of submitting the PR, the Semgrep team jumped in with a few comments about making sure the changes I proposed were clearly documented. I made a few minor tweaks / commits to address their feedback and 🎉🎉🎉, the PR was approved / merged.

Epilogue

The improved autofix went out with Semgrep's 0.77.0 release and 💥is now pip-installable💥. Go try it out! I suspect this autofix feature will have quite a range of uses. Your usage of autofix doesn't have to actually completely fix an issue... you could just use it to comment/flag or partially address something that needs further attention.

The autofix-ed Terraform configuration for the example above is correct when run with this new Semgrep release:

resource "aws_instance" "example1" {
  ami = "ami-005e54dee72cc1d01"
  instance_type = "t2.micro"

  metadata_options {
    http_tokens = "required"
  }
}

resource "aws_instance" "example2" {
  ami = "ami-005e54dee72cc1d02"
  instance_type = "t2.micro"

  metadata_options {
    http_tokens = "required"
  }
}

I should note that one thing I did do here was get straight to working on a fix, and assume that my contribution would be welcome. What I could (should?) have done was create a GitHub issue describing my issue and get the Semgrep team engaged before I did any work on the fix, but I wanted to explore the issue myself regardless and figured I wouldn't mind too much if they decided they didn't want the contribution.

The fix is not perfect by any means. It assumes that fixes are applied top-to-bottom in a single pass: semgrep will not come back and re-parse or re-fix a file or line. This approach may need to be revisited or extended to support more complex, overlapping autofix cases in future.
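A single top-to-bottom pass is only safe when no fix starts before the previous one ends. The hypothetical Python sketch below makes that condition concrete by partitioning fix spans into ones a single pass can apply and ones that overlap an earlier span. Semgrep is not claimed to do this; `split_overlapping` is an illustrative name.

```python
from typing import List, Tuple

# ((start_line, start_col), (end_line, end_col)), 0-indexed, end exclusive
Span = Tuple[Tuple[int, int], Tuple[int, int]]


def split_overlapping(spans: List[Span]) -> Tuple[List[Span], List[Span]]:
    """Split spans into ones safe for one top-to-bottom pass and overlapping ones."""
    safe: List[Span] = []
    skipped: List[Span] = []
    for span in sorted(spans):  # tuples compare by line, then column
        if safe and span[0] < safe[-1][1]:  # starts before the last kept span ends
            skipped.append(span)
        else:
            safe.append(span)
    return safe, skipped


safe, skipped = split_overlapping([((5, 0), (8, 1)), ((0, 0), (3, 1)), ((2, 0), (2, 5))])
print(safe)
print(skipped)
```

Spans that merely touch, where one ends exactly where the next begins, are still treated as safe. Anything in `skipped` would need a re-parse and a second pass.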

Thanks to the Semgrep team for a useful tool, and for the quick turnaround on the PR!

Posted on: Sun 19 December 2021

Category: tech – Tags: tech, security