ACK vs. GREP
I had been using grep to search through files until ACK came along. I wondered whether ACK or GREP is the better choice for filtering large (5GB) logs. Is ACK really better than GREP?
Setup
I used my MacBook Pro for these benchmarks.
- Processor: 2.33GHz Intel Core 2 Duo
- Memory: 3GB
- HD: 160GB 5400 RPM
I ran three tests: searching a single file, searching piped input, and searching a Rails project with many subdirectories for a particular word.
I generated a log with 100000 lines of random text using irb:
pids  = ['12345', '54321', '55555', '65432', '88888']
chars = 'abcdefghjkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ123456789 '
File.open('blog_log.txt', 'w') do |f|
  100000.times do
    f.write("#{Time.now} rails[#{pids[rand(pids.size)]}] ")  # timestamp and a random pid
    500.times { f.write(chars[rand(chars.size)].chr) }        # 500 random characters
    f.write("\n")
  end
end
Then I ran sed, awk, grep, and ack on the log file, searching for ‘rails[12345]’. I also piped the output of tail into each of those tools, because sometimes I only want the tail end of a log.
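A note on the pattern: the brackets in ‘rails[12345]’ have to be escaped in every tool, because an unescaped [12345] is a character class that matches a single digit. A minimal Ruby sketch of the difference follows; the sample line is made up for illustration and is not taken from the generated log.

```ruby
# Hypothetical sample line in the same shape as the generated log.
line = 'Mon Jan 05 10:00:00 rails[12345] abc def'

unescaped = /rails[12345]/     # "rails" followed by ONE of the digits 1-5
escaped   = /rails\[12345\]/   # the literal text "rails[12345]"

puts line =~ unescaped ? 'unescaped: match' : 'unescaped: no match'
puts line =~ escaped   ? 'escaped: match'   : 'escaped: no match'

# Regexp.escape builds the escaped form from a literal string.
puts Regexp.escape('rails[12345]')
```

Only the escaped pattern matches the sample line, which is why the commands in the results use `rails\[12345\]`.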
I also had a project with about 29900 files. I ran grep and ack on it to find any references to ‘app_helper’ in my files.
I ran each command 3 times with the time command and averaged the real times, redirecting output to /dev/null so that printing the matches would not affect the timings.
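The measuring procedure can be sketched in Ruby roughly as follows. This is an illustration, not the script behind the numbers in this post: the helper name average_real_time is made up, and it measures wall-clock time with Benchmark.realtime instead of calling the time command, so it includes the cost of spawning a shell.

```ruby
require 'benchmark'

# Hypothetical helper: run a shell command `runs` times and return the
# mean wall-clock ("real") time in seconds. Output goes to /dev/null.
def average_real_time(command, runs = 3)
  times = Array.new(runs) do
    Benchmark.realtime { system("#{command} > /dev/null 2>&1") }
  end
  times.inject(:+) / runs
end

# Example run, assuming blog_log.txt exists and the tools are installed.
if __FILE__ == $PROGRAM_NAME
  {
    'grep' => %q(grep 'rails\[12345\]' blog_log.txt),
    'awk'  => %q(awk '/rails\[12345\]/' blog_log.txt),
    'ack'  => %q(ack 'rails\[12345\]' blog_log.txt),
    'sed'  => %q(sed -n "/rails\[12345\]/ p" blog_log.txt)
  }.each do |name, cmd|
    printf("%-4s %.4fs\n", name, average_real_time(cmd))
  end
end
```

Three runs is a small sample, so numbers from a sketch like this will vary with disk caching and whatever else the machine is doing.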
Results
Test 1
Commands:
time grep 'rails\[12345\]' blog_log.txt > /dev/null
time awk '/rails\[12345\]/' blog_log.txt > /dev/null
time ack 'rails\[12345\]' blog_log.txt > /dev/null
time sed -n "/rails\[12345\]/ p" blog_log.txt > /dev/null
| ACK | GREP | SED | AWK |
|---|---|---|---|
| 2.5820s | 0.0960s | 0.0920s | 1.8317s |
SED was the fastest, followed closely by GREP, then AWK, and finally ACK. ACK was about 27 times slower than GREP.
Test 2
Commands:
time tail -n 50000 blog_log.txt | grep 'rails\[12345\]' > /dev/null
time tail -n 50000 blog_log.txt | awk '/rails\[12345\]/' > /dev/null
time tail -n 50000 blog_log.txt | ack 'rails\[12345\]' > /dev/null
time tail -n 50000 blog_log.txt | sed -n "/rails\[12345\]/ p" > /dev/null
| ACK | GREP | SED | AWK |
|---|---|---|---|
| 1.2047s | 0.1873s | 0.2463s | 0.8587s |
This time GREP was the fastest, then SED, AWK, and ACK. This doesn’t look good for ACK, until…
Test 3
Commands:
time grep -r app_helper . > /dev/null
time ack app_helper > /dev/null
| ACK | GREP |
|---|---|
| 1.0773s | 4.5027s |
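The winner here was ACK: it was about 4 times faster than GREP. A likely reason is that ack skips version-control directories and some other files by default, whereas grep -r reads every file under the directory.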
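To put the three tests side by side, the ACK-to-GREP ratios can be computed from the averaged times in the tables. In this small Ruby sketch only the numbers come from the tables; the hash layout and labels are illustrative.

```ruby
# Averaged real times in seconds, copied from the result tables.
results = {
  'Test 1 (single file)'  => { :ack => 2.5820, :grep => 0.0960 },
  'Test 2 (piped input)'  => { :ack => 1.2047, :grep => 0.1873 },
  'Test 3 (project tree)' => { :ack => 1.0773, :grep => 4.5027 }
}

# A ratio above 1 means ACK was slower than GREP; below 1 means it was faster.
ratios = results.map { |name, t| [name, (t[:ack] / t[:grep]).round(2)] }

ratios.each { |name, ratio| puts "#{name}: ACK/GREP = #{ratio}" }
```

The ratio is far above 1 for the single file, smaller for the piped input, and below 1 only for the project tree.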
The Winner
Neither, it’s a tie! Use grep and sed for processing single files and streams. Use ACK for searching through a directory tree of files for a specific pattern.
2 Comments
I’m surprised that ack was that much slower in your first case. Still, your point of “use grep when speed of execution is the most important factor” is a good one.
I was surprised about it too, but I guess it’s because the Unix tools are compiled C programs meant to work with the OS’s streams, while ack is a general Perl script that is at the mercy of the Perl interpreter. Ack’s performance will probably vary on different systems.