Ruby, iOS, and Other Development

A place to share useful code snippets, ideas, and techniques

All code in posted articles shall be considered public domain unless otherwise noted.
Comments remain the property of their authors.

2006-06-07

Ungreedy Regular Expressions in Ruby

I was recently working on a script to condense or pretty-print CSS. Condensing is straightforward, but pretty-printing involves preserving comments while sorting style directives within rules. (For those who aren't familiar with CSS, its comments are delimited by /* and */ just like in C.) Matching comments, even multiline ones, is easy as long as you can make your regular expressions ungreedy. The naïve, greedy regex /\/\*.*\*\//m (note the m option at the end, which sets the multiline option for the Regexp so that . also matches newlines) will not stop at the end of the first comment; it matches everything from the beginning of the first comment to the end of the last comment, including all the uncommented code in between. This is clearly wrong, and the problem is that * and + are greedy (i.e. each matches as much text as it can).
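To see the greedy behavior concretely, here is a minimal sketch. The sample stylesheet and the variable names (css, greedy, greedy_match) are made up for illustration and are not taken from the original script.

```ruby
# Hypothetical sample stylesheet with two comments and two rules.
css = "/* first */\na { color: red; }\n/* second */\nb { color: blue; }\n"

# The greedy regex from the paragraph above.
greedy = /\/\*.*\*\//m

# String#[] with a Regexp returns the matched text (or nil if nothing matches).
greedy_match = css[greedy]

puts greedy_match
# Prints:
# /* first */
# a { color: red; }
# /* second */
```

A single match swallows both comments along with the rule for a that sits between them.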

If greedy matching is the problem, how do we make it ungreedy? Ruby takes a page from Perl's regular expressions: whereas * (and +) is the greedy version, *? (and +?) is the ungreedy version, which matches as little text as it can. Thus our problem regex becomes /\/\*.*?\*\//m and works as desired.
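As a quick check, the ungreedy regex can be run over a small sample with String#scan. This is only a sketch: the sample stylesheet and the names css, ungreedy, and comments are assumptions for illustration, not part of the original script.

```ruby
# Hypothetical sample stylesheet; the first comment spans two lines.
css = "/* first\n   comment */\na { color: red; }\n/* second */\nb { color: blue; }\n"

# The ungreedy regex: .*? stops at the first */ it can reach.
# The m option lets . match the newline inside the first comment.
ungreedy = /\/\*.*?\*\//m

# With no capture groups, String#scan returns an array of matched strings.
comments = css.scan(ungreedy)

p comments
# => ["/* first\n   comment */", "/* second */"]
```

Each comment comes back as its own match, and the rules between them are left alone.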

This may not be quite as significant as previous posts, but it's really handy to know when you need it.
