Grepping in Ruby

You can’t call yourself a programmer if you have never used grep (or diff, or wget, or regular expressions, for that matter). I couldn’t call myself a programmer until my fourth year of college, by the way.

Still, grep has a pretty big manual page, and as a casual grep user I have never seen the need to memorize it. The one and only grep command I use and abuse is
grep -Rn "my pattern" .
which, for people not familiar with it, means “search for my pattern in all files under the current directory, recursively, and print the line number of every match as well”. Seriously, this might be the one and only use I have for grep now.

What about more complicated tasks, involving regular expressions and the like?

Since my job involves Ruby and Rails, there’s basically no need for me to use grep anymore. I mean, my tasks don’t involve real-time text line extractions; from time to time, though, they involve writing scripts that extract specific patterns from some log file.

Here’s where Ruby is my best friend. And given that I’ve had a bit of trouble putting together the bits and pieces found on the internet, I’m sticking it all together here:

Here’s a Ruby program that behaves like grep and searches for all occurrences of a pattern in a given file, printing out the line number as well:


file_name = "foo.bar"
line_number = 0

File.open(file_name) do |fp|
  fp.each_line do |line|
    line_number += 1
    # line keeps its trailing newline, so print needs no extra "\n"
    print file_name, ":", line_number, ":", line if line =~ /hello world/
  end
end

The trick here is the line =~ /hello world/ part; it searches the current line for the pattern hello world, and gives back nil when the pattern is not there.
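If you want to poke at =~ without touching a real file, here is a minimal sketch. The grep_lines helper and the sample.txt label are names I made up for illustration, and StringIO just stands in for an open file:

```ruby
require "stringio"

# Hypothetical helper: collects "label:lineno:line" strings for every line
# of io that matches pattern, roughly what grep -n would print.
def grep_lines(io, pattern, label)
  matches = []
  io.each_line.with_index(1) do |line, number|
    matches << "#{label}:#{number}:#{line.chomp}" if line =~ pattern
  end
  matches
end

# =~ returns the index where the match starts, or nil when nothing matches.
p("say hello world" =~ /hello world/)  # prints 4
p("goodbye" =~ /hello world/)          # prints nil

# StringIO stands in for a real file here.
sample = StringIO.new("hello world\nnothing here\nwell, hello world again\n")
puts grep_lines(sample, /hello world/, "sample.txt")
```

Since nil is falsy and any index (even 0) is truthy in Ruby, the result of =~ can go straight into an if, which is what the program above relies on.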

What if you want to find something more elaborate? And print just the matched parts, without the extra cruft from the rest of the line?

Then you have to search for a more complex regular expression: the program stays the same, except for the pattern to match and what gets printed. For instance:


file_name = "foo.bar"
File.open(file_name) do |fp|
  fp.each_line do |line|
    if line =~ /(hello|dude)(.*)/
      # .* stops before the newline, so use puts to end each result line
      puts "#{$1}#{$2}"
    end
  end
end

then for the input

hello man
Joe has some apples
Hey, Kate, hello you fox
and the cool man said "dude, you rock"

we’d get the output

hello man
hello you fox
dude, you rock"

See the trick? The basic thing to know here is that the parts of the pattern enclosed in round parentheses () are each given a number, and we can later refer to what they matched by that number ($1, $2, etc.). Also, the .* pattern means “any run of characters, up to the end of the line”, while the | symbol means OR. So we’d read this pattern search as: if the line contains hello OR dude, followed by ANYTHING, then print those things and nothing else.
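The $1 and $2 variables are set by whatever match ran last, which can get confusing in longer scripts. As an alternative, String#match hands back a MatchData object you can keep around, and groups can be given names with (?<name>...). A small sketch, where extract_greeting and the group names word and rest are my own inventions for illustration:

```ruby
# Hypothetical helper: returns the matched word plus whatever follows it,
# or nil when the line contains neither "hello" nor "dude".
def extract_greeting(line)
  match = line.match(/(?<word>hello|dude)(?<rest>.*)/)
  return nil unless match

  # Named groups are read with symbols; match[1] and match[2] work too.
  match[:word] + match[:rest]
end

p extract_greeting("Hey, Kate, hello you fox\n")  # prints "hello you fox"
p extract_greeting("Joe has some apples\n")       # prints nil
```

The trailing newline never shows up in the result because . does not match a newline unless the pattern uses the m option.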

Well, those are the basics of regular expression matching in Ruby. For more tutorials on the subject, go over here: http://www.regular-expressions.info/ruby.html or here: http://www.ruby-doc.org/core/classes/Regexp.html.

This article was just an intro that I hope might help with a quick start on the subject; in my humble opinion, the stuff here should be more than enough for most daily Ruby tasks.

