Avoiding Perl Capture Variables
2010-02-17
In a recent conversation with an acquaintance of mine, he pointed out that Perl is very hard to read. I initially took offense, as I loved the language during the relatively short time I coded in it, but I quickly realized that he was right. The price of Perl’s flexibility is that it tends to encourage unreadable code. You basically have to learn how to write Perl in such a way that others can clearly understand what’s going on, and to this end, Perl Best Practices by Damian Conway was an awesome book. I think the challenge of writing readable Perl made me a better programmer because it forced me to try to see my code from a different perspective.
Anyway, here’s one technique that I used when dealing with regexes that capture matches. Perl’s numbered capture variables are non-semantic: they convey nothing but the order of the captured matches. Here’s an example:
my $string = "eugene kashida tyler kashida";
my $re = qr/(eugene)[\s\w]+(tyler)/xms;   # qr// compiles the pattern; a bare /.../ would match against $_ right away
$string =~ $re;
At this point, the captured values are available in the capture variables $1 and $2, respectively. To improve readability and reduce the chance of introducing bugs, immediately unpack these values into variables with meaningful names:
my ($dad, $son) = ($1, $2);
Here's a technique that lets you skip the handling of the numbered variables completely:
my ($dad, $son) = $string =~ $re;
If a regex match is executed in list context, a successful match returns the list of captured strings. This also sidesteps the problem where, if the match fails, the numbered capture variables still contain values from the last successful match: a failed match in list context returns an empty list, so $dad and $son are simply left undefined. Whichever approach you use, always check that the match succeeded.
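To make that last point concrete, here is a minimal sketch of one way to combine the list-context assignment with a success check. The helper name parse_names and the non-matching input string are hypothetical, made up for illustration. The second half shows the stale-value behaviour of $1 after a failed match.

```perl
use strict;
use warnings;

my $re = qr/(eugene)[\s\w]+(tyler)/xms;

# Hypothetical helper (not from any library): returns the two captures,
# or an empty list when the match fails.
sub parse_names {
    my ($string) = @_;

    # A list assignment evaluated as a condition yields the number of
    # elements on its right-hand side, so a failed match (empty list) is false.
    if ( my ($dad, $son) = $string =~ $re ) {
        return ($dad, $son);
    }
    return;
}

my @hit  = parse_names('eugene kashida tyler kashida');
my @miss = parse_names('no names here');

print "matched: @hit\n" if @hit;      # prints "matched: eugene tyler"
print "no match\n"      if !@miss;

# The stale-value problem with numbered variables:
my $first_ok  = ( 'eugene kashida tyler kashida' =~ $re );  # succeeds, sets $1
my $second_ok = ( 'no names here' =~ $re );                 # fails, $1 is left alone
my $stale     = $1;                                         # still 'eugene'

print "second match failed, but \$1 is still '$stale'\n" if !$second_ok;
```

Note that this idiom relies on the pattern having capture groups. A pattern with no captures returns the list (1) on success, so the condition still works but there is nothing meaningful to unpack.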
Ah, Perl! How I miss you so!