Don’t worry, this is not about greedy bankers, megalomaniac corporations or corrupt politicians, although I would love to have a simple fix for these at hand, too.

Simple Example

By eagerness I mean a simple yet important and all too often unknown concept of regular expressions, usually called greediness elsewhere. Take this Ruby example; it yields the same results in other languages such as JavaScript or PHP:

"Hello Ruby friend".sub(/^(.*)e/,  'X')   # => "Xnd"
"Hello Ruby friend".sub(/^(.*?)e/, 'X')   # => "Xllo Ruby friend"

By default, the * quantifier is eager, which means it eats its way through the string as far as it can, so the match only ends at the last e. If you want it to go no further than the first e, you have to put it in non-eager mode, and the question mark behind the * quantifier does just that.
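To make the difference visible, here is a small sketch that looks at what the capture group itself holds in both modes. It reuses the string from above; the variable names are only for illustration.

```ruby
# Compare what the capture group holds in eager and non-eager mode.
text = "Hello Ruby friend"

eager = text.match(/^(.*)e/)    # .* runs as far as possible, then gives back
lazy  = text.match(/^(.*?)e/)   # .*? stops as soon as an e can follow

eager_capture = eager[1]   # => "Hello Ruby fri"
lazy_capture  = lazy[1]    # => "H"

puts "eager: #{eager_capture.inspect}"
puts "lazy:  #{lazy_capture.inspect}"
```

The eager group swallows everything up to the e in "friend", while the non-eager group is content with the single "H" in front of the first e.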

The eagerness question mark can be placed behind any quantifier, but it only pays off behind variable-length quantifiers such as *, + and {n,m}.
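As a quick illustration for + and {n,m}, the following sketch runs eager and non-eager variants against a made-up string of digits (the sample string and variable names are assumptions, not part of the original example):

```ruby
# Eager vs. non-eager behaviour of + and {n,m} on a hypothetical sample string.
digits = "1234567"

plus_eager  = digits[/\d+/]       # => "1234567"  (as many digits as possible)
plus_lazy   = digits[/\d+?/]      # => "1"        (as few as possible, at least one)
range_eager = digits[/\d{2,4}/]   # => "1234"     (up to the maximum of four)
range_lazy  = digits[/\d{2,4}?/]  # => "12"       (only the minimum of two)

puts [plus_eager, plus_lazy, range_eager, range_lazy].inspect
```

In each pair the non-eager variant settles for the minimum the quantifier allows, because nothing after it in the pattern forces it to take more.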

“Fine, dude, but can we use this eagerness thing for anything real?”

Real World Examples

Yes, we can! Controlling eagerness comes in handy when you isolate pieces of a string, a common task if you deal with less structured data sources such as webpage content snippets like this one:

<div class="product">Click <a href="/order">here</a> to order Supertool Deluxe [version 1.2.9] now</div>

A regex to isolate the product and version could look as follows:

s =  '<div class="product">Click <a href="/order">here</a> to order Supertool Deluxe [version 1.2.9] now</div>'
s =~ /<div class="product">.*order ([^\[]+)\[version ([\d\.]+)\]/
$1   # => "Supertool Deluxe "
$2   # => "1.2.9"

Well, it works somehow, but you’d have to strip that trailing space from the product name, and the expression is somewhat hard to read. Worse, if you bump into a beta version such as “1.3.0 beta” or a product called the “Ultimate order tool”, you’re screwed:

s =  '<div class="product">Click <a href="/order">here</a> to order Ultimate order tool [version 0.2] now</div>'
s =~ /<div class="product">.*order ([^\[]+)\[version ([\d\.]+)\]/
$1   # => "tool "
$2   # => "0.2"
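The beta case fails in a different way: the pattern does not match at all. A minimal sketch, assuming a snippet with the hypothetical version string “1.3.0 beta” and the same character-class approach as above:

```ruby
# The version group only allows digits and dots, so " beta" before the
# closing bracket makes the whole match fail. The sample string is made up.
s = '<div class="product">Click <a href="/order">here</a> to order Supertool Deluxe [version 1.3.0 beta] now</div>'
m = s.match(/<div class="product">.*order ([^\[]+)\[version ([\d\.]+)\]/)

puts m.inspect   # prints "nil"
```

Since `[\d\.]+` can only consume “1.3.0”, the following `\]` never finds its closing bracket and no amount of backtracking helps.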

Here’s an alternative approach using non-eager quantifiers:

s =  '<div class="product">Click <a href="/order">here</a> to order Ultimate order tool [version 0.3 beta] now</div>'
s =~ /<div class="product">.*?order (.+?) \[version (.+?)\]/
$1   # => "Ultimate order tool"
$2   # => "0.3 beta"
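If you need this more than once, the non-eager pattern can be wrapped in a small helper. The following is only a sketch: the constant `PRODUCT_PATTERN`, the method `extract_product` and the third sample string are names and data made up for illustration.

```ruby
# Hypothetical helper around the non-eager pattern from above.
PRODUCT_PATTERN = /<div class="product">.*?order (.+?) \[version (.+?)\]/

# Returns a hash with :name and :version, or nil if the snippet does not match.
def extract_product(html)
  match = PRODUCT_PATTERN.match(html)
  match && { name: match[1], version: match[2] }
end

samples = [
  '<div class="product">Click <a href="/order">here</a> to order Supertool Deluxe [version 1.2.9] now</div>',
  '<div class="product">Click <a href="/order">here</a> to order Ultimate order tool [version 0.3 beta] now</div>',
  '<div class="other">Nothing to order here</div>'
]

results = samples.map { |snippet| extract_product(snippet) }
results.each { |result| puts result.inspect }
```

The first two snippets yield the product name without a trailing space and the complete version string, the third one yields nil because it is not a product snippet.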

If you liked this tip, you might want to take a look at the Regex in a Nutshell Cheat Sheet as well.

(Sven Schwyn)
