git ls-files | xargs cat | entropy.rb | sort | tail -n20

One of our engineers came up with a useful script to grab all unique lines from the history of the repository and sort them according to entropy. This helps to lift any access keys or passwords which may have been committed at any point to the top.

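That’s roughly what the command line above does. Note that git ls-files only lists the files currently tracked, so this version scans the present state of the repository rather than its full history, and it does not remove duplicate lines.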
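To get closer to the quoted description (history, unique lines), one option is to filter the output of git log -p before scoring. The following is only a minimal sketch of such a filter: the name unique_added_lines and the sample diff are made up for illustration. It assumes unified diff input, so an added line whose own content starts with "++ " would be mistaken for a file header and skipped.

```ruby
require "set"

# Hypothetical helper (not part of the original script): returns each distinct
# added line from unified diff text, in order of first appearance.
def unique_added_lines(diff_lines)
  seen = Set.new
  diff_lines.each_with_object([]) do |raw, out|
    line = raw.chomp
    next unless line.start_with?("+")   # keep added lines only
    next if line.start_with?("+++ ")    # skip the "+++ b/file" header
    content = line[1..-1]               # drop the leading "+"
    out << content if seen.add?(content) # add? returns nil for repeats
  end
end

# Made-up sample diff, just to show the behaviour.
sample = [
  "diff --git a/config.rb b/config.rb",
  "--- a/config.rb",
  "+++ b/config.rb",
  "@@ -1,2 +1,3 @@",
  ' HOST = "example.org"',
  "-PORT = 80",
  "+PORT = 8080",
  '+TOKEN = "not-a-real-token"',
  '+TOKEN = "not-a-real-token"'
]

puts unique_added_lines(sample)
# To run it against a real repository, replace the sample with:
#   puts unique_added_lines($stdin.each_line)
```

With the sample swapped for $stdin and the file saved as, say, added_lines.rb (a hypothetical name), the pipeline could look like this: git log -p --all | ruby added_lines.rb | ruby entropy.rb | sort -n | tail -n20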

Here’s entropy.rb:

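#!/usr/bin/env ruby

# Shannon entropy of a string, in bits per character.
def shannon_entropy(s)
  # Count how often each character occurs (floats, so the division below is not integer division).
  d = {}
  s.each_char do |c|
    d[c] ||= 0.0
    d[c] += 1
  end

  # Sum -p * log2(p) over the character frequencies.
  res = 0.0
  d.each_value do |v|
    freq = v / s.length
    res -= freq * (Math.log(freq) / Math.log(2))
  end
  res
end

if __FILE__ == $0
  $stdin.each_line do |line|
    line = line.chomp # don't count the trailing newline as a character
    e = shannon_entropy(line)
    puts format("%.4f\t%s", e, line)
  end
end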

The comment is from a Hacker News thread about a recent disclosure of (very few) private repositories on GitHub.

Another comment in the same thread points out that Shannon entropy was used for the ranking. I then ported that calculation to Ruby.
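For a feel of what the score means: it is H = -Σ p·log2(p) over the character frequencies p of a line, in bits per character, so a line made of k equally frequent distinct characters scores log2(k). The sketch below restates the function in a slightly more compact form (Hash.new(0) and Math.log2 are my substitutions, not the original code) and runs it on three toy strings.

```ruby
# Compact restatement of the entropy function, for illustration only.
def shannon_entropy(s)
  counts = Hash.new(0)
  s.each_char { |c| counts[c] += 1 }

  res = 0.0
  counts.each_value do |n|
    freq = n.to_f / s.length
    res -= freq * Math.log2(freq)
  end
  res
end

# One repeated character, two alternating characters, eight distinct characters.
%w[aaaaaaaa abababab abcdefgh].each do |s|
  puts format("%.4f\t%s", shannon_entropy(s), s)
end
# prints:
# 0.0000	aaaaaaaa
# 1.0000	abababab
# 3.0000	abcdefgh
```

Random-looking keys use many different characters about equally often, which is why they tend to end up near the top of the sorted output, while repetitive lines sink to the bottom.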

And now, you can search for “interesting” lines in your repository. Have fun with what you find! :)