git ls-files -z | xargs -0 cat | sort -u | entropy.rb | sort -n | tail -n20
“One of our engineers came up with a useful script to grab all unique lines from the history of the repository and sort them by entropy. This lifts any access keys or passwords that may have been committed at any point to the top.”
That’s roughly what the command line above does, except that it only reads the files in the current checkout, not the full history.
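To scan the history as the quoted comment describes, one option is to feed entropy.rb every line that was ever added in any commit. This is a minimal Ruby sketch under that assumption: `added_lines` is a hypothetical helper name (not from the original post), and it relies on `git log -p --all` printing each commit's patch as a unified diff, where added lines start with `+`.

```ruby
# Hypothetical helper (not part of the original post): return every unique
# line added in unified-diff text, such as the output of `git log -p --all`.
def added_lines(diff_text)
  result = []
  # scrub replaces invalid byte sequences so binary junk doesn't raise.
  diff_text.scrub.each_line do |line|
    # Added lines start with "+"; "+++ b/path" is a file header, not content.
    next unless line.start_with?("+")
    next if line.start_with?("+++")
    # Drop the leading "+" and the trailing newline.
    result << line[1..-1].chomp
  end
  result.uniq
end

# Example wiring (assumes `git` is on PATH and you run it inside the repo):
#   history = IO.popen(["git", "log", "-p", "--all"], &:read)
#   puts added_lines(history)
```

If a script prints that result, the rest of the pipeline stays the same: pipe its output into entropy.rb, then through `sort -n | tail -n20`. Skipping lines that start with `+++` is a heuristic; an added line whose own content begins with `++` is dropped as well.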
Here’s entropy.rb:
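#!/usr/bin/env ruby

# Shannon entropy of a string, in bits per character.
def shannon_entropy(s)
  # Count how often each character occurs.
  counts = Hash.new(0)
  s.each_char { |c| counts[c] += 1 }

  # H = -sum(p * log2(p)) over the relative frequency p of each character.
  res = 0.0
  counts.each_value do |n|
    freq = n.to_f / s.length
    res -= freq * Math.log2(freq)
  end
  res
end

if __FILE__ == $0
  # Print "<entropy>\t<line>" for every line on standard input.
  $stdin.each_line do |line|
    # Replace invalid bytes (e.g. from binary files) and drop the newline,
    # so the trailing "\n" is not counted as a character.
    line = line.scrub.chomp
    puts format("%.4f\t%s", shannon_entropy(line), line)
  end
end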
The comment is from a Hacker News thread about a recent disclosure of (very few) private repositories on GitHub.
Another comment in the same thread pointed out that Shannon entropy was used for the scoring, so I ported an implementation of it to Ruby.
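For reference, this is the quantity `shannon_entropy` in entropy.rb computes for a line $s$, where $n_c$ is how often character $c$ occurs in it, with a few worked values:

```latex
H(s) = -\sum_{c} p_c \log_2 p_c, \qquad p_c = \frac{n_c}{|s|}

\texttt{"aaaa"}:\; p_a = 1 \;\Rightarrow\; H = -1 \cdot \log_2 1 = 0
\texttt{"aabb"}:\; p_a = p_b = \tfrac{1}{2} \;\Rightarrow\; H = -2 \cdot \tfrac{1}{2}\log_2\tfrac{1}{2} = 1
\texttt{"abcd"}:\; p_a = p_b = p_c = p_d = \tfrac{1}{4} \;\Rightarrow\; H = -4 \cdot \tfrac{1}{4}\log_2\tfrac{1}{4} = 2
```

In general a line of length $|s|$ can score at most $\log_2 |s|$ bits, reached when every character is distinct. So a line only rises to the top when it is fairly long and spreads its characters evenly over many symbols, which is what randomly generated keys and tokens tend to look like.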
And now, you can search for “interesting” lines in your repository. Have fun with what you find! :)