TextMate and regular expressions

I’ve been trying to learn regular expressions for years, but never had a good use for them, because the tools that used them were so obscure. Now with TextMate, I have plenty of uses.

In the blogging bundle, the preferences format was:

Blog Name http://www.blogurl.com/

The regular expression for parsing that was:

^(.+?)\s+(https?:\/\/.+)
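To see what the pattern pulls out of a preferences line, here is a minimal sketch using Python's re module. Python is only my choice for illustration, not necessarily what the bundle runs, and parse_blog_line is a hypothetical helper name. The part after // is .+ (any characters). Python numbers capture groups the same way, so group(1) and group(2) play the roles of $1 and $2.

```python
import re

# The preferences pattern; "\/" is a harmless escape in Python, kept so the
# pattern reads the same as in the post.
BLOG_LINE = re.compile(r"^(.+?)\s+(https?:\/\/.+)")


def parse_blog_line(line):
    """Return (name, url) for a preferences line, or None if it doesn't match.

    parse_blog_line is a hypothetical helper for illustration only.
    """
    match = BLOG_LINE.match(line)
    if match is None:
        return None
    # group(1) corresponds to $1 (the blog name), group(2) to $2 (the URL).
    return match.group(1), match.group(2)


print(parse_blog_line("Blog Name http://www.blogurl.com/"))
# ('Blog Name', 'http://www.blogurl.com/')
print(parse_blog_line("no url on this line"))
# None
```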

I’ll break this down, mainly as an exercise for myself (talking it through helps me understand it); others are welcome to chime in. Each step starts with the piece of the pattern it describes.

  • ^ – Start at the beginning of the line.
  • .+? – Grab at least one character, reluctantly (lazily): take as few characters as possible, adding more only until the rest of the pattern can match.
  • (.+?) – The parentheses capture whatever is matched inside them into the variable $1.
  • \s+ – Find one or more whitespace characters (space, tab, and so on). With 0 of them, the pattern fails.
  • http – followed by http
  • s? – followed by an optional s (the ? means 0 or 1 times)
  • : – followed by a colon
  • \/ – followed by a / (languages such as Ruby and Perl write regexes as /.../, so a literal / is escaped with \)
  • \/ – followed by a second /
  • .+ – followed by any characters, at least one of them
  • (https?:\/\/.+) – The second set of parentheses captures the whole URL into $2.
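The reluctant .+? matters most when the name and URL are separated by more than one space or tab. Here is a small Python comparison. It is only an illustration, using a made-up sample line with three spaces before the URL. The greedy .+ keeps the extra whitespace in the name. The reluctant .+? stops as early as it can and lets \s+ take all of it:

```python
import re

line = "Blog Name   http://www.blogurl.com/"  # three spaces before the URL

greedy = re.match(r"^(.+)\s+(https?:\/\/.+)", line)
lazy = re.match(r"^(.+?)\s+(https?:\/\/.+)", line)

# Greedy .+ grabs the whole line, then backs off only until \s+ can match
# a single space, so two trailing spaces stay in group 1.
print(repr(greedy.group(1)))  # 'Blog Name  '
# Lazy .+? grows one character at a time and stops as soon as the rest matches.
print(repr(lazy.group(1)))    # 'Blog Name'
```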

Whew! That is a lot of stuff. OK, but I wanted to add an optional timeout value: spaces or tabs followed by a number. Here is what I came up with:

/^(.+?)\s+(https?:\/\/\S+)\s*(\d+)?/
  • \S+ – Here, I changed the .+, which was overly aggressive, to \S+, which means one or more non-whitespace characters. The .+ would run to the end of the line and swallow the timeout into the URL.
  • \s* – followed by whitespace, 0 or more. It has to be 0 or more, or a line without a timeout would fail.
  • \d+ – followed by one or more digits
  • (\d+) – capture those into a third variable, $3
  • (\d+)? – the final ? makes the whole digit group optional (0 or 1 times)
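Both changes can be checked directly. This Python sketch is an illustration, not the bundle's code. Swapping \S+ back to .+ lets the URL group run to the end of the line and swallow the timeout. Making the whitespace \s+ instead of \s* stops a line without a timeout from matching at all:

```python
import re

with_timeout = "Blog Name http://www.blogurl.com/ 60"
without_timeout = "Blog Name http://www.blogurl.com/"

final = r"^(.+?)\s+(https?:\/\/\S+)\s*(\d+)?"
dot_url = r"^(.+?)\s+(https?:\/\/.+)\s*(\d+)?"       # .+ instead of \S+
required_ws = r"^(.+?)\s+(https?:\/\/\S+)\s+(\d+)?"  # \s+ instead of \s*

# .+ runs to the end of the line, so the timeout ends up inside the URL
# and the optional group 3 matches nothing.
print(re.match(dot_url, with_timeout).groups())
# ('Blog Name', 'http://www.blogurl.com/ 60', None)

# With \S+ the URL stops at the space and group 3 gets the digits.
print(re.match(final, with_timeout).groups())
# ('Blog Name', 'http://www.blogurl.com/', '60')

# Requiring whitespace after the URL fails when nothing follows it.
print(re.match(required_ws, without_timeout))
# None
```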

That’s it. Now both of the following lines are valid blog entry lines:

Blog Name http://www.blogurl.com/

Blog Name http://www.blogurl.com/ 60
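Here is a short Python sketch that runs the final pattern over both example lines. The parse_blog_entry name is hypothetical. Returning None when no timeout is given is also my assumption for illustration; the post doesn't say what the bundle does when the number is missing.

```python
import re

BLOG_ENTRY = re.compile(r"^(.+?)\s+(https?:\/\/\S+)\s*(\d+)?")


def parse_blog_entry(line):
    """Return (name, url, timeout), or None if the line doesn't match.

    Hypothetical helper: timeout is an int when the line ends with digits,
    otherwise None (an assumed default, not taken from the bundle).
    """
    match = BLOG_ENTRY.match(line)
    if match is None:
        return None
    name, url, timeout = match.groups()  # timeout is None if (\d+)? matched nothing
    if timeout is not None:
        timeout = int(timeout)
    return name, url, timeout


for line in ("Blog Name http://www.blogurl.com/",
             "Blog Name http://www.blogurl.com/ 60"):
    print(parse_blog_entry(line))
# ('Blog Name', 'http://www.blogurl.com/', None)
# ('Blog Name', 'http://www.blogurl.com/', 60)
```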

Thanks to Digi on #mac for the assistance.