ybits (why bits? because.)

One of the greatest things about TextMate is its powerful scripting engine, which includes native (non-string) support for regular expressions as well as support for shell interpolations. The ability to understand and use these mechanisms in snippets and commands makes the TextMate experience second to none.

In this post, I concentrate on regular expressions. In particular: captured groups, backreferences and conditionals — what they are, how to use them, and why. I also assume you have a basic understanding of what regex is and what the syntax looks like. If you don’t, read up first.

What are captured groups, backreferences and conditionals?

  • A captured group is a subpattern whose match is stored so it can be referenced later, either in the same expression or in the replacement. It is created by wrapping the subpattern in parentheses.

    Captured groups are either implicitly numbered or explicitly named. The numbering is very simple: every opening paren begins a new group (parens that open a non-capturing construct such as (?:...) don’t count). To figure out the number of a nested group, start from 1 and count each opening paren from left to right until you reach the group you’re looking for. That’s its number.

    I deal only with numbered groups in the following examples.

  • A backreference is a section of an expression that refers back to what a preceding captured group matched, for example \1 inside the pattern or $1 inside the replacement.

  • A conditional is a section of an expression that tests whether a captured group matched and, depending on the result, applies one portion of the expression or another.
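To make the three definitions concrete outside of TextMate, here is a minimal sketch in Python. Python's re module is only a stand-in for TextMate's regex engine, and the sample strings are made up for illustration. Python has no conditional insertion in replacement strings, so the conditional below lives in the pattern itself; treat it as a different flavor of the same idea rather than TextMate syntax.

```python
import re

# Captured groups: count opening parens from left to right.
# Group 1 is the outer pair, group 2 the lowercase letter, group 3 the capital.
pair = re.compile(r"(([a-z])([A-Z]))")
m = pair.search("zendDb")
print(m.group(1), m.group(2), m.group(3))  # dD d D

# Backreference: \1 must match exactly what group 1 captured,
# so this pattern finds a doubled lowercase letter.
doubled = re.compile(r"([a-z])\1")
print(doubled.search("letter").group(0))  # tt

# Conditional: (?(1)") means "if group 1 matched, require a closing quote".
# The word may be quoted on both sides or on neither side.
quoted = re.compile(r'^(")?[a-z]+(?(1)")$')
print(bool(quoted.match('"abc"')), bool(quoted.match('abc')), bool(quoted.match('"abc')))
```

The last line should show that a balanced or unquoted word is accepted while a word with only an opening quote is rejected.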

How do they work?

The best way to learn this is to work through an actual example. The following is taken directly from the model snippet in the Zend Framework TextMate Bundle.

class ${1:${TM_FILENAME/\.[^.]+$//}} extends ${2:Zend_Db_Table_Abstract}
{
	${3:protected \$_name = '${4:${1/^([A-Z])|(([a-z])([A-Z]))/(?1:\l$1)(?2:$3_\l$4)/g}}';}
	$0
}

The regex on line 1 isn’t really relevant here; it just takes the file name and removes the extension. The relevant part is on line 3, which uses captured groups, backreferences and conditionals to convert the camel-cased value of $1 from line 1 to underscore notation.
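If you want to poke at the line 1 regex outside of TextMate, a rough Python equivalent looks like this. The function name strip_extension and the file names are hypothetical, chosen only for illustration; the pattern itself is the one from the snippet.

```python
import re

def strip_extension(filename):
    # Same pattern as the snippet: a dot followed by one or more
    # non-dot characters, anchored at the end of the string.
    # Replacing the match with "" removes the last extension only.
    return re.sub(r"\.[^.]+$", "", filename)

print(strip_extension("UserAccount.php"))  # UserAccount
print(strip_extension("Makefile"))         # Makefile (no dot, nothing removed)
```

Because the pattern is anchored with $ and excludes dots, a name with several dots loses only its final extension.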

Specifically, this expression:
=> /^([A-Z])|(([a-z])([A-Z]))/(?1:\l$1)(?2:$3_\l$4)/g

There are two rules to convert a string from camel case to underscore:

  1. If the first letter is capital, change it to lowercase.
  2. For every occurrence of a lowercase letter followed by a capital letter, put an underscore between them and convert the capital to lowercase.

The example regex implements these rules. Translated into English, it reads like this:

  • Match the first letter if it is a capital. Store the matched letter as group 1.
    => ^([A-Z])
  • Match each combination of a lowercase letter followed by a capital letter. Store the whole pair as group 2, the lowercase letter as group 3 and the capital letter as group 4.
    => (([a-z])([A-Z]))
  • If group 1 matched, replace it with its lowercase form.
    => (?1:\l$1)
  • If group 2 matched, output group 3, then an underscore, then group 4 converted to lowercase.
    => (?2:$3_\l$4)

Inside a conditional, the replacement refers to a captured group using dollar-sign notation:
=> (?1:\l$1)

This says “if the first group matched, insert its lowercase value.” The \l escape lowercases only the next character, which is all that’s needed here because group 1 holds a single letter.
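The whole transformation can be sketched in Python to see the rules run end to end. This is only an approximation of what TextMate does: Python has no (?1:...) conditional insertion, so a callback function checks which groups matched instead. The names camel_to_underscore and _replace are hypothetical, and re.sub replaces every match by default, which plays the role of the g option in the snippet.

```python
import re

# Same pattern as the snippet: a leading capital (group 1), or a
# lowercase letter followed by a capital (groups 2, 3 and 4).
PATTERN = re.compile(r"^([A-Z])|(([a-z])([A-Z]))")

def _replace(match):
    out = ""
    if match.group(1) is not None:
        # Mirrors (?1:\l$1): lowercase the leading capital.
        out += match.group(1).lower()
    if match.group(2) is not None:
        # Mirrors (?2:$3_\l$4): keep the lowercase letter, add an
        # underscore, then lowercase the capital.
        out += match.group(3) + "_" + match.group(4).lower()
    return out

def camel_to_underscore(name):
    # re.sub replaces all matches, like the /g option in TextMate.
    return PATTERN.sub(_replace, name)

print(camel_to_underscore("UserAccount"))  # user_account
print(camel_to_underscore("ZendDbTable"))  # zend_db_table
```

One thing the sketch makes easy to see: the two rules only split where a lowercase letter meets a capital, so a run of consecutive capitals is left alone apart from the first letter.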

Hopefully this is a gentle enough introduction to these concepts and shows how they can be applied to TextMate scripting. There are hundreds of more complicated examples living in your TextMate bundles. Look at them. Play with them. Learn from them.