We use Rubocop at work to enforce a consistent style, which has largely been a good thing for the codebase. Unfortunately, my preferred Ruby style doesn’t overlap perfectly with the enforced style (the codebase prefers single-quoted strings and hash rockets), and I quickly grew tired of CI builds failing due to style violations. Instead of changing my habits, I decided to write a pre-commit git hook that would prevent me from committing changes with style violations.

tl;dr Clone my Git repo, symlink the rubocop-pre-commit script to pre-commit in the .git/hooks directory of a repo you care about, and you won’t be able to commit files with Rubocop offenses:

bschmeck@osprey
[master] ~/src/example_repo > git ci -m "Example for a blog post"
Running rubocop against spec/example_repo/example_spec.rb
Rubocop found 13 offenses.  Aborting commit.
spec/example_repo/example_spec.rb:9 Metrics/LineLength: Line is too long. [125/100]

Git hooks are scripts that git runs before or after an operation, letting you modify or respond to that operation. This blog, for example, is published by a git hook that runs after changes are pushed to a repo on the server. Crucially, if a pre-commit hook exits with a non-zero value, git will abort the commit. For more detailed information about git hooks, the official documentation is here.

Our git hook needs to follow a few basic steps:

  1. Figure out which files to inspect with Rubocop
  2. Run Rubocop against those files and capture the results
  3. Abort the commit if there are any violations
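
As a rough sketch of how those three steps fit together, the control flow can be reduced to a single method that returns the exit status. The run_hook method and its find_files and inspect parameters are hypothetical names used only for illustration; they are not part of the actual script, and each step is passed in as a callable so the flow can be read without git or Rubocop installed.

```ruby
# Hypothetical skeleton of the hook's control flow (not the real script).
# find_files: callable returning the list of files to inspect (step 1)
# inspect:    callable returning the number of offenses found (step 2)
def run_hook(find_files:, inspect:)
  files = find_files.call
  return 0 if files.empty?      # nothing staged that needs checking

  offenses = inspect.call(files)
  offenses.zero? ? 0 : 1        # step 3: non-zero makes git abort the commit
end

# Stand-in callables: pretend one file is staged and it has one offense.
status = run_hook(
  find_files: -> { ["lib/example.rb"] },
  inspect: ->(files) { files.length }
)
puts "hook would exit with #{status}"
```

A real hook would finish with exit(run_hook(...)) so that git sees the status.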

I had some familiarity with both inspecting git output and programmatically running Rubocop after building RubocopHQ about a year ago. I didn’t wind up reusing any of the code, but was able to reuse a lot of the concepts.

Step One: What to Inspect?

Running Rubocop against the whole repo takes over 10 seconds, which is too slow to perform each time git commit runs. That’s also mostly unnecessary work, since the only files that need to be inspected are those that are staged for the commit. To figure out what those files are, the hook processes the output of git diff --cached --name-status.

class GitDiff
  def self.status_lines
    output = `git diff --cached --name-status`
    output.lines.map{|l| GitStatusLine.from(l) }.compact
  end
end

The diff output lists the status of all staged files, one per line, with the first character on the line (called a filter character) describing how the file was changed (added, modified, renamed, etc.). GitDiff::status_lines wraps each line of the diff output in a GitStatusLine value object and ignores lines that don’t start with a known filter character.

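class GitStatusLine
  # Filter characters reported by git diff --name-status
  STATUSES = {
    "A" => :added,
    "B" => :pairing_broken,
    "C" => :copied,
    "D" => :deleted,
    "M" => :modified,
    "R" => :renamed,
    "T" => :type_changed,
    "U" => :unmerged,
    "X" => :unknown
  }

  # A character class needs no "|" separators; joining with "|" would
  # also match a literal pipe at the start of the line.
  STATUS_LINE_REGEX = /^[#{STATUSES.keys.join}]\s+/

  def self.is_status_line?(line)
    line =~ STATUS_LINE_REGEX
  end

  # Returns nil for lines that do not start with a known filter character
  def self.from(line)
    return nil unless is_status_line?(line)
    new(line.chomp)
  end

  attr_reader :status
  def initialize(line)
    @line = line
    @status = STATUSES[@line[0]]
  end
end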
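
To make the format concrete, here is a small standalone sketch that parses a hand-written sample of git diff --cached --name-status output (a status letter, a tab, then the path). The sample text, the STATUS_NAMES subset, and the parse_status_line helper are illustrative assumptions, not code from the hook. Note that renames are reported with a similarity score (for example R100), so a pattern that expects whitespace right after the letter skips them, which is harmless here because only added and modified files matter.

```ruby
# Subset of the status letters, enough for the sample below (illustrative).
STATUS_NAMES = { "A" => :added, "M" => :modified, "D" => :deleted }.freeze

# Hypothetical helper: returns [status, path], or nil for unrecognised lines.
def parse_status_line(line)
  match = line.chomp.match(/\A([AMD])\t(.+)\z/)
  return nil unless match

  [STATUS_NAMES[match[1]], match[2]]
end

# Hand-written sample output, not captured from a real repository.
sample = "A\tlib/widget.rb\n" \
         "M\tspec/widget_spec.rb\n" \
         "D\tlib/old_widget.rb\n" \
         "R100\tlib/a.rb\tlib/b.rb\n"

parsed = sample.lines.map { |line| parse_status_line(line) }.compact
parsed.each { |status, path| puts "#{status}: #{path}" }
```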

Because we’re only concerned with files that were added or modified, GitStatusLine objects provide predicate methods that allow us to detect each file’s status.

class GitStatusLine
  ...
  def added?
    status == :added
  end

  def modified?
    status == :modified
  end
end

Using those predicates, the (poorly-named) changed_files method returns pathnames for all files we need to inspect:

def changed_files
  GitDiff.status_lines
    .select{|f| f.added? || f.modified? }
    .map(&:pathname)
    .select(&:ruby?)
end

Pathnames are wrapped in another value object, GitStatusPathname, which implements a #ruby? predicate:

class GitStatusPathname
  def initialize(filename)
    @path = Pathname.new(filename)
  end

  def to_s
    @path.to_s
  end

  def ruby?
    @path.extname == ".rb"
  end
end
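
The excerpts above don't show how a status line produces its pathname. One possible sketch, assuming the path is everything after the first tab, is below; path_from_status_line is a hypothetical helper, not the method from the script. It also shows a consequence of checking the extension: Ruby files without a .rb suffix, such as a Rakefile, are not inspected.

```ruby
require "pathname"

# Hypothetical helper: assumes "<letter>\t<path>" and returns the path part.
def path_from_status_line(line)
  Pathname.new(line.chomp.split("\t", 2).last)
end

# Hand-written status lines for illustration.
lines = ["A\tlib/widget.rb", "M\tRakefile", "M\tREADME.md"]

ruby_paths = lines
  .map { |line| path_from_status_line(line) }
  .select { |path| path.extname == ".rb" }

puts ruby_paths.map(&:to_s).inspect # prints ["lib/widget.rb"]
```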

Step Two: Checking for Violations

After generating the list of all added and modified Ruby files, the script runs Rubocop against them and captures Rubocop’s output:

require "open3" # provides Open3.capture3

file_list = changed_files.join(" ")
if file_list.empty?
  puts "No files to check."
  exit(0)
end

puts "Running rubocop against #{file_list}"
# capture3 returns stdout, stderr, and the process status
output, err, status = Open3.capture3("bin/rubocop --format json --force-exclusion --display-cop-names --auto-correct #{file_list}")

unless err.empty?
  puts "Error executing Rubocop."
  puts err
  exit(1)
end

The flags passed to Rubocop are:

  • --format json to generate machine-readable output
  • --force-exclusion to obey file exclusion rules in .rubocop.yml, which are normally ignored when specifying files to inspect via the command line
  • --display-cop-names to show the full name of cops that detect violations, necessary for disabling cops
  • --auto-correct to automatically fix violations, where possible
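
One variation worth considering: Open3.capture3 also accepts the program and each argument as separate strings, which bypasses the shell so that a path containing spaces stays a single argument. The sketch below is an assumption about how the command could be assembled, not how the script does it; rubocop_command and FLAGS are hypothetical names, and the Ruby interpreter stands in for Rubocop so the example runs without it.

```ruby
require "open3"
require "rbconfig"

# The same flags as above, one array element per argument.
FLAGS = %w[--format json --force-exclusion --display-cop-names --auto-correct].freeze

# Hypothetical helper: builds an argument list instead of one shell string.
def rubocop_command(files)
  ["bin/rubocop", *FLAGS, *files.map(&:to_s)]
end

files = ["lib/my file.rb", "lib/widget.rb"]
command = rubocop_command(files)
puts command.inspect

# Stand-in for Rubocop: a child Ruby process that reports how many
# arguments it received. The path with a space arrives as one argument.
out, _err, status = Open3.capture3(RbConfig.ruby, "-e", "puts ARGV.length", *files)
puts "child saw #{out.strip} arguments (success: #{status.success?})"
```

In the hook itself the call would be Open3.capture3(*rubocop_command(changed_files)).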

Step Three: Communicating Success or Failure

If Rubocop doesn’t find any offenses, the script exits with a value of 0, signaling to git that everything is fine and the commit can proceed:

require "json" # provides JSON.parse

json = JSON.parse(output)

offenses = json["summary"]["offense_count"]
exit(0) if offenses == 0

If there were offenses, though, the script iterates through them all, printing out details for any offense that Rubocop was unable to automatically correct. The script then exits with a value of 1, causing git to abort the commit. This happens even if all offenses were corrected, because those changes remain unstaged:

puts "Rubocop found #{offenses} offense#{"s" if offenses > 1}.  Aborting commit."
clean = true
json["files"].each do |file|
  path = file["path"]
  uncorrected = file["offenses"].reject{|offense| offense["corrected"] }
  clean &&= uncorrected.empty?

  uncorrected.each do |offense|
    line = offense["location"]["line"]
    message = offense["message"]
    puts "#{path}:#{line} #{message}"
  end
end
puts "All offenses corrected." if clean

exit(1)
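
To see what that loop operates on, here is a standalone sketch that runs the same filtering over a hand-written JSON report. The sample contains only the keys the hook reads (real Rubocop output has more fields), its messages are illustrative, and uncorrected_lines is a hypothetical helper name.

```ruby
require "json"

# Hand-written sample limited to the keys used by the hook.
SAMPLE_REPORT = <<~REPORT
  {
    "files": [
      {
        "path": "spec/example_repo/example_spec.rb",
        "offenses": [
          { "message": "Metrics/LineLength: Line is too long. [125/100]",
            "corrected": false, "location": { "line": 9 } },
          { "message": "Style/StringLiterals: Prefer single-quoted strings.",
            "corrected": true, "location": { "line": 3 } }
        ]
      }
    ],
    "summary": { "offense_count": 2 }
  }
REPORT

# Hypothetical helper: one "path:line message" string per uncorrected offense.
def uncorrected_lines(report)
  report["files"].flat_map do |file|
    file["offenses"]
      .reject { |offense| offense["corrected"] }
      .map { |offense| "#{file["path"]}:#{offense["location"]["line"]} #{offense["message"]}" }
  end
end

report = JSON.parse(SAMPLE_REPORT)
remaining = uncorrected_lines(report)
puts remaining
```

Only the line-length offense is printed, since the other one is marked as corrected.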

The unstaged corrections highlight a loophole in the script: Rubocop has no notion of what is staged versus unstaged. It merely checks the contents of the file on disk. If you have staged changes that contain violations, then fix the violations but do not stage those fixes, the hook will allow your commit to proceed, even though the staged changes contain violations. Don’t do that.
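
One way to at least notice that situation, sketched here as an assumption rather than something the hook does, is to compare the staged file list with the list of files that have unstaged edits and warn about any overlap. The partially_staged helper is hypothetical, and the sample lists are hand-written so the example runs outside a repository.

```ruby
# Hypothetical helper: paths that are staged but also have unstaged edits.
def partially_staged(staged, unstaged)
  staged & unstaged
end

# In a real hook the lists could come from git, for example:
#   staged   = `git diff --cached --name-only`.lines.map(&:chomp)
#   unstaged = `git diff --name-only`.lines.map(&:chomp)
# Hand-written stand-ins are used here instead.
staged = ["lib/widget.rb", "spec/widget_spec.rb"]
unstaged = ["lib/widget.rb", "README.md"]

risky = partially_staged(staged, unstaged)
unless risky.empty?
  puts "Warning: staged files with unstaged edits: #{risky.join(", ")}"
end
```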

If you’re interested in using the hook, the full source is available under the MIT License on GitHub.