Every codebase has its own coding style guide. At some point, it was decided how code should be indented, aligned, shaped and organized. (In some cases, the decision was “Do whatever you want.”) Developers tend to have strong feelings about the “right” way of formatting code and discussions on style are always in danger of devolving into unproductive arguments – unless you use Go, in which case the discussion is just “Use gofmt.”

Style Enforcement

I have seen three approaches to maintaining coding style consistency in a codebase:

  1. What’s a style guide? These repos have no consistency between files, or even within the same file. Sometimes lines are wrapped at 80 characters, sometimes they run past the edge of your editor windows, even when you’re full screen. Diffs in these repos have a lot of noise due to whitespace changes – if you’re unlucky, devs don’t even use consistent line endings!

  2. Scout’s honor Everyone on the team (in theory) knows and agrees what the coding style should be, but there’s no enforcement. Someone may try to offer style feedback on open PRs from time to time, but they’re bound to miss things (who is going to take the time to manually figure out if a line is wrapped at 80 or 82 characters?) and will likely tire of being “the pedantic one.”

  3. Red builds By adding a style check step to CI, you ensure that the codebase adheres to the style guide, but in a sub-optimal way. First of all, the feedback loop is terrible. You see a red build, click over to the CI server, scroll through miles of output expecting to find a test failure and then discover you’re red because you didn’t format your hash correctly? That gets old quick. Additionally, nobody else on the team realizes you have style failures, they just think you’re writing code that can’t pass the tests. Finally, this approach is a non-starter for mature codebases with thousands of violations, unless you want all builds to be red as you bring the entire codebase into compliance.

Unlike Python, the Ruby community does not have a dictated style guide. Each Ruby codebase, then, has it’s own unique coding style. But once established, once you’ve agreed on proper spacing for block definitions, once you’ve beaten back the heathens who prefer tabs over spaces, how do you ensure everyone adheres to the guide? How do you detect the subversives who refuse to properly configure their editors?

Enter Rubocop. Rubocop is an open source gem that will statically analyze your codebase and report places where code has deviated from the unofficial style guide. Rubocop is easily customizable, allowing you to skip or modify checks for the parts of the style guide you disagree with.

I’ve long thought there’s a better approach to enforcing style guides and used Geekon, Groupon’s semi-annual hackfest, to build RubocopHQ.

A New Approach

I think of RubocopHQ as Rubocop-as-a-Service. Once a repository is properly configured, RubocopHQ checks all PRs for style violations and posts only those violations introduced by the PR as in-line comments.

RubocopHQ in action

RubocopHQ in action

Once all violations have been fixed (or if the PR didn’t introduce any in the first place,) RubocopHQ posts an “all clear” message to the PR, so you can merge with confidence.

All clear

All clear

This approach addresses a lot of the problems discussed above. Feedback comes quickly and is surfaced at the source of the violation. Publically available comments help to educate and remind the rest of the team what style rules to follow. And for codebases with numerous violations, RubocopHQ helps you gradually bring the codebase into conformity with the style guide. You’ll clean up the parts of the codebase that are in active development, and can punt on the file nobody has edited for 3 years, if you want.

Technical Details

The architecture of RubocopHQ wound up being pretty straightforward. A Sinatra server sits inside a Docker container running on a VM in Groupon’s data center. Each PR action triggers a webhook from our Github Enterprise (GHE) installation to the Sinatra server. Upon receiving the hook, Sinatra launches a new Docker container that handles running Rubocop and posting violations back to GHE. The Rubocop container clones the repository, checks out the necessary commit and tells Rubocop to spit out violations using its JSON formatter. Finally, RubocopHQ checks each violation against the PR’s diff, and only adds a comment if the PR actually changed the line containing the violation.

I was able to get everything coded up and automated by the end of the day on Wednesday and then refactored a bit on Thursday. There were a few funs problems to solve, but thankfully nothing unsurmountable.

Calling Docker From Within Docker

The Sinatra server runs inside a Docker container and launches a new Docker container for each Rubocop run. My Docker image is based on the official Ruby 2.2 image, which runs Debian. I added lines to my Dockerfile following the official instructions for installing Docker on Debian, mounted /var/run/docker.sock as a volume when I launched the server container and everything just worked. The only tricky part was making sure I installed the same version of Docker that runs on the VM, to avoid API mismatch errors.

Translating Line Numbers To Diff Positions

The Github API requires you to specify the position in the diff where a comment should be placed, not the line number. This was surprising at first, but actually makes sense. There are two sets of line numbers in the diff, one for the old version of the file and one for the new version. Using position simplifies their API, but complicates my code.

I was only concerned with lines added by the PR, which allowed for a simpler implementation. My code captures the output of git diff {base_sha}...{head_sha} and splits it into diff ranges (denoted by lines like @@ -157,6 +157,70 @@ ...). Each range tracks the range of line numbers in the updated file (in this case, lines 157-226) and knows its offset within the file diff. The process of finding an offset, then, is just:

  1. Initialize position to the range’s offset
  2. Initialize line_number to the range’s starting line number
  3. Iterate over all lines in the range. For each line,
    1. Increment position
    2. If the line doesn’t start with “-“ (that is, it’s a context line or an addition,) increment line_number
    3. If line_number matches the given line number, return position

As it turns out, I needed almost identical code to determine if the PR modified a given line in a file, so translating to positions wound up not costing me much.