ACK vs. GREP

I used grep for searching through files until ACK came along. I wondered whether it is better to filter large (5GB) logs with ACK or GREP. Is ACK really better than GREP?

Setup

I used my MacBook Pro for these benchmarks.

  • Processor: 2.33GHz Intel Core 2 Duo
  • Memory: 3GB
  • HD: 160GB 5400 RPM

I ran three tests: searching a single file, searching piped input, and searching a Rails project with many subdirectories for a particular word.

I generated a log with 100000 lines of random text using irb:


# Write 100,000 log lines: timestamp, a random "rails[pid]" tag, then 500 random characters.
pids  = ['12345', '54321', '55555', '65432', '88888']
chars = 'abcdefghjkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ123456789 '
File.open('blog_log.txt', 'w') do |f|
  100000.times do
    f.write("#{Time.now} rails[#{pids[rand(pids.size)]}] ")
    500.times { f.write(chars[rand(chars.size), 1]) }
    f.write("\n")
  end
end

Then I ran sed, awk, grep, and ack on the log file, searching for ‘rails[12345]’. I also tailed the log and piped it to those tools, because sometimes I only want the tail end of a log.

I also had a project with about 29900 files. I ran grep and ack on it to find any references to ‘app_helper’ in my files.

I ran each command three times with the time command and averaged the real (wall-clock) times, redirecting output to /dev/null to suppress it.
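The averaging step could also be scripted. Here is a rough sketch in Ruby; the helper name average_real_time is made up for illustration and is not what I used for the numbers below. It measures wall-clock time from inside Ruby, which includes the cost of spawning a shell, so its figures will not match the shell's time output exactly.

```ruby
require 'benchmark'

# Hypothetical helper: run a shell command `runs` times and return the
# mean wall-clock ("real") time in seconds. Standard output is sent to
# /dev/null, as in the benchmarks described above.
def average_real_time(command, runs = 3)
  times = Array.new(runs) do
    Benchmark.realtime { system("#{command} > /dev/null") }
  end
  times.sum / runs
end

# Example usage (assumes blog_log.txt exists in the current directory):
# puts average_real_time("grep 'rails\\[12345\\]' blog_log.txt")
```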

Results

Test 1
Commands:


time grep 'rails\[12345\]' blog_log.txt > /dev/null
time awk '/rails\[12345\]/' blog_log.txt > /dev/null
time ack 'rails\[12345\]' blog_log.txt > /dev/null
time sed -n "/rails\[12345\]/ p" blog_log.txt > /dev/null

ACK      GREP     SED      AWK
2.5820s  0.0960s  0.0920s  1.8317s

SED was the fastest, followed by GREP, AWK, and finally ACK. ACK was roughly 27 times slower than GREP (2.5820s vs. 0.0960s).

Test 2
Commands:


time tail -n 50000 blog_log.txt | grep 'rails\[12345\]' > /dev/null
time tail -n 50000 blog_log.txt | awk '/rails\[12345\]/' > /dev/null
time tail -n 50000 blog_log.txt | ack 'rails\[12345\]' > /dev/null
time tail -n 50000 blog_log.txt | sed -n "/rails\[12345\]/ p" > /dev/null

ACK      GREP     SED      AWK
1.2047s  0.1873s  0.2463s  0.8587s

This time GREP was the fastest, then SED, AWK, and ACK. This doesn’t look good for ACK, until…

Test 3
Commands:


time grep -r app_helper . > /dev/null
time ack app_helper > /dev/null

ACK      GREP
1.0773s  4.5027s

The winner here was ACK: it was about 4 times faster than GREP.
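To put the three tests side by side, the ack-to-grep ratios can be worked out from the averages in the tables. This is only a small sketch of the arithmetic; the helper name ack_to_grep_ratio and the test labels are invented for illustration.

```ruby
# Average real times in seconds, copied from the three result tables.
results = {
  'Test 1 (single file)'    => { :ack => 2.5820, :grep => 0.0960 },
  'Test 2 (piped input)'    => { :ack => 1.2047, :grep => 0.1873 },
  'Test 3 (directory tree)' => { :ack => 1.0773, :grep => 4.5027 }
}

# Hypothetical helper: how many times longer ack took than grep.
# A value above 1 means grep was faster; below 1 means ack was faster.
def ack_to_grep_ratio(times)
  (times[:ack] / times[:grep]).round(2)
end

results.each do |name, times|
  puts "#{name}: ack took #{ack_to_grep_ratio(times)}x as long as grep"
end
```

For Test 3 the inverse ratio, 4.5027 / 1.0773, comes out to about 4.18, which is where the "4 times faster" figure comes from.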

The Winner

Neither; it’s a tie! Use grep and sed for processing single files and streams. Use ACK for searching a directory tree of files for a specific pattern.

2 Comments

  1. Posted April 2, 2009 at 1:54 pm | Permalink

    I’m surprised that ack was that much slower in your first case. Still, your point of “use grep when speed of execution is the most important factor” is a good one.

  2. Posted April 2, 2009 at 7:11 pm | Permalink

    I was surprised about it too, but I guess it’s because the Linux tools are compiled C programs meant to work with the OS’s streams, while ack is a general Perl script that is at the mercy of the Perl interpreter. Ack’s performance will probably vary on different systems.
