git ls-files | xargs cat | entropy.rb | sort | tail -n20

One of our engineers came up with a useful script to grab all unique lines from the history of the repository and sort them according to entropy. This helps to lift any access keys or passwords which may have been committed at any point to the top.

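That’s roughly what the command line above does. Strictly speaking, git ls-files lists only the files currently tracked, so the pipeline scans their present contents rather than the full history, and it does not remove duplicate lines.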

Here’s entropy.rb:

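#!/usr/bin/env ruby

# Shannon entropy of a string, in bits per character.
def shannon_entropy(s)
  # Count how often each character occurs (floats, so the division below is not integer division).
  d = {}
  s.each_char do |c|
    d[c] ||= 0.0
    d[c] += 1
  end

  # Subtract p * log2(p) for each character's relative frequency p.
  res = 0.0
  d.each_value do |v|
    freq = v / s.length
    res -= freq * Math.log2(freq)
  end

  res
end

if __FILE__ == $0
  $stdin.each_line do |line|
    # Drop the trailing newline so it is not counted as a character.
    line = line.chomp
    e = shannon_entropy(line)
    puts format("%.4f\t%s", e, line)
  end
end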
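
The quoted description talks about unique lines, which the pipeline at the top does not enforce, and plain sort compares the scores as text (this only matters once a score reaches 10). Below is a minimal sketch of doing the deduplication and numeric ranking in Ruby itself. The helper name top_entropy_lines and its parameters are hypothetical, not part of the original script, and the entropy function is repeated in compact form so the block runs by itself.

```ruby
require "set"

# Same measure as entropy.rb, in compact form (Enumerable#tally needs Ruby 2.7+).
# Uses the equivalent form sum(p * log2(1/p)).
def shannon_entropy(s)
  s.each_char.tally.each_value.sum(0.0) do |count|
    freq = count.to_f / s.length
    freq * Math.log2(1 / freq)
  end
end

# Hypothetical helper: read lines from io, skip duplicates, and return the
# n highest-entropy lines as [score, line] pairs, highest score first.
def top_entropy_lines(io, n = 20)
  seen = Set.new
  scored = []
  io.each_line do |raw|
    line = raw.chomp
    next unless seen.add?(line) # add? returns nil if the line was already seen
    scored << [shannon_entropy(line), line]
  end
  scored.max_by(n) { |score, _| score }
end
```

To use it in place of sort and tail, one could print each pair from top_entropy_lines($stdin) with the same format string as above. Keeping every unique line in memory is a trade-off that is fine for a typical repository but may not suit a very large one.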

The comment quoted at the top is from a Hacker News thread about a recent disclosure of (very few) private repositories on GitHub.

Another comment in the same thread points out that Shannon entropy was the measure used for the ranking, which I then ported to Ruby.
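
For a feel of the numbers: the score is in bits per character, computed as the sum of p * log2(1/p) over each distinct character, where p is that character's share of the line. A small sketch with a few hand-checkable cases follows; the sample strings are made up for illustration.

```ruby
# Shannon entropy in bits per character: sum of p * log2(1/p) per distinct character.
def shannon_entropy(s)
  counts = Hash.new(0)
  s.each_char { |c| counts[c] += 1 }
  counts.each_value.sum(0.0) do |count|
    freq = count.to_f / s.length
    freq * Math.log2(1 / freq)
  end
end

# Made-up samples with 1, 2, 4 and 16 equally frequent characters.
samples = ["aaaaaaaa", "abababab", "abcdabcd", "0123456789abcdef"]
samples.each { |s| puts format("%.4f\t%s", shannon_entropy(s), s) }
# 0.0000  aaaaaaaa
# 1.0000  abababab
# 2.0000  abcdabcd
# 4.0000  0123456789abcdef
```

A line made of k equally frequent characters scores log2(k), so repetitive lines land near the bottom while random-looking strings such as keys and tokens tend to score higher.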

And now, you can search for “interesting” lines in your repository. Have fun with what you find! :)