Apr 01
ACK vs. GREP
by Alex Chee | Article

I had been using grep to search through files until ACK came along. I wondered whether it was better to filter large (5GB) logs with ACK or GREP. Is ACK really better than GREP?

Setup

I used my MacBook Pro for these benchmarks.

  • Processor: 2.33GHz Intel Core 2 Duo
  • Memory: 3GB
  • HD: 160GB 5400 RPM

I ran three tests: searching a single file, searching piped input, and searching a Rails project with many subdirectories for a particular word.

I generated a log with 100000 lines of random text using irb:


pids = ['12345', '54321', '55555', '65432', '88888']
chars = 'abcdefghjkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ123456789 '
File.open('blog_log.txt', 'w') do |f|
  100000.times do
    # Prefix each line with a timestamp and a random pid, then 500 random characters
    f.write("#{Time.now} rails[#{pids[rand(pids.size)]}] ")
    500.times { f.write(chars[rand(chars.size)].chr) }
    f.write("\n")
  end
end

Then I ran sed, awk, grep, and ack on the log file, searching for ‘rails[12345]’. I also tailed the log and piped it to each tool, because sometimes I only want the tail end of a log.
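The commands in the Results section escape the square brackets in the pattern. A minimal Ruby sketch of why that matters: an unescaped [12345] is a character class that matches a single digit, not the literal text. The sample line below is made up for illustration, and Ruby's regex engine stands in for the tools here; the same character-class rule applies in grep, sed, awk, and ack.

```ruby
# Hypothetical log line in the same shape as the generated log.
sample = 'Wed Apr 01 10:00:00 -0700 2009 rails[12345] abcDEF'

escaped   = /rails\[12345\]/ # the literal text "rails[12345]"
unescaped = /rails[12345]/   # "rails" followed by ONE character out of 1, 2, 3, 4, 5

puts "escaped vs. log line:   #{!(sample =~ escaped).nil?}"
puts "unescaped vs. log line: #{!(sample =~ unescaped).nil?}"
puts "unescaped vs. 'rails3': #{!('rails3' =~ unescaped).nil?}"
```

Without the backslashes the pattern would skip every line of the log, because "rails" is always followed by "[" rather than a digit.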

I also had a project with about 29900 files. I ran grep and ack on it to find any references to ‘app_helper’ in my files.

I ran each command three times with the time command, redirecting output to /dev/null so that printing results to the terminal would not affect the timing, and averaged the real (wall-clock) times.
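The averaging was done by hand from the output of time. As a rough sketch of how it could be automated, the hypothetical helper below (average_real_time is not part of the original benchmark) runs a command several times and averages the wall-clock time. It uses Ruby's Benchmark.realtime instead of the shell's time, so each measurement also includes the cost of spawning a shell.

```ruby
require 'benchmark'

# Run a shell command `runs` times with stdout discarded and return
# the mean wall-clock ("real") time in seconds.
def average_real_time(command, runs = 3)
  times = Array.new(runs) do
    Benchmark.realtime { system("#{command} > /dev/null") }
  end
  times.inject(0.0) { |sum, t| sum + t } / runs
end

# `true` is a placeholder that exits immediately. Substitute one of the
# benchmarked commands, for example: grep 'rails\[12345\]' blog_log.txt
puts format('%.4fs', average_real_time('true'))
```

For a piped command, the redirect applies to the last stage of the pipeline, which matches how the commands in the tests are written.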

Results

Test 1
Commands:


time grep 'rails\[12345\]' blog_log.txt > /dev/null
time awk '/rails\[12345\]/' blog_log.txt > /dev/null
time ack 'rails\[12345\]' blog_log.txt > /dev/null
time sed -n "/rails\[12345\]/ p" blog_log.txt > /dev/null

ACK      GREP     SED      AWK
2.5820s  0.0960s  0.0920s  1.8317s

SED was the fastest, followed by GREP, AWK, and finally ACK. ACK was about 27 times slower than GREP.

Test 2
Commands:


time tail -n 50000 blog_log.txt | grep 'rails\[12345\]' > /dev/null
time tail -n 50000 blog_log.txt | awk '/rails\[12345\]/' > /dev/null
time tail -n 50000 blog_log.txt | ack 'rails\[12345\]' > /dev/null
time tail -n 50000 blog_log.txt | sed -n "/rails\[12345\]/ p" > /dev/null

ACK      GREP     SED      AWK
1.2047s  0.1873s  0.2463s  0.8587s

This time GREP was the fastest, then SED, AWK, and ACK. This doesn’t look good for ACK, until…

Test 3
Commands:


time grep -r app_helper . > /dev/null
time ack app_helper > /dev/null

ACK      GREP
1.0773s  4.5027s

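The winner here was ACK: it was about 4 times faster than GREP. This is probably because ack skips version control directories and other files it does not consider source code by default, while grep -r reads every file under the directory.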
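The ack-versus-grep ratios quoted in the results can be recomputed from the reported averages. A small sketch, where the RESULTS hash and the compare helper are names introduced here for illustration and the figures are copied from the three result tables:

```ruby
# Average real times in seconds for ack and grep, copied from the result tables.
RESULTS = {
  'Test 1' => { 'ack' => 2.5820, 'grep' => 0.0960 },
  'Test 2' => { 'ack' => 1.2047, 'grep' => 0.1873 },
  'Test 3' => { 'ack' => 1.0773, 'grep' => 4.5027 }
}

# Return [faster_tool, factor], where factor is slower time / faster time.
def compare(times)
  faster, slower = times.sort_by { |_tool, secs| secs }
  [faster[0], slower[1] / faster[1]]
end

RESULTS.each do |test, times|
  tool, factor = compare(times)
  puts format('%s: %s is %.1fx faster', test, tool, factor)
end
```

With three runs per command these factors are rough indications, not precise measurements.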

The Winner

Neither, it’s a tie! Use grep and sed for processing single files and streams. Use ACK for searching a directory tree of source files for a specific pattern.

2 Comments
April 2, 2009

I’m surprised that ack was that much slower in your first case. Still, your point of “use grep when speed of execution is the most important factor” is a good one.

April 2, 2009

I was surprised about it too, but I guess it’s because the Linux tools are written in C and meant to work with the OS’s streams, while ack is a general Perl script that is at the mercy of the Perl interpreter. Ack’s performance will probably vary on different systems.