[Courses] [Perl] Part 8: The Special Variable $_

Dan dan at cellectivity.com
Sat May 21 01:10:51 EST 2005


LinuxChix Perl Course Part 8: The Special Variable $_

1) Introduction
2) An Implied Variable
3) $_ in While Loops
4) Back to the Original Example
5) Exercises
6) Answers to Previous Exercises
7) Acknowledgements
8) Licensing


          ----------------------------------------

1) Introduction

If you've read real-life Perl code before, you may remember being
puzzled by parts that read like this:

  while ( <STDIN> ) {
    s/foo/bar/;
    print;
  }

Most of this already looks familiar from the previous few weeks. But
this code is confusing because it doesn't look like any variables are
assigned or changed. We're now going to unravel this mystery.

          ----------------------------------------

2) An Implied Variable

We've seen that Perl has some funny variables, like "$<" for your
UID. But by far the most useful of these is "$_" (pronounced
"dollar-underscore").

This variable is passed as an understood parameter to many functions
and operators if an expected parameter is not explicitly passed.
Examples:

  chomp;       # Equivalent to chomp($_);
  print;       # Equivalent to print $_;

"$_" is also the understood subject of "m//" and "s///":

  s/foo/bar/;             # Equivalent to $_ =~ s/foo/bar/
  if ( /baz/ ) { ... }    # Equivalent to $_ =~ m/baz/

By the way, despite the "magic" nature of "$_", it can still be
assigned to or from just like any other variable.
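
As a small sketch of these points (the sample string is made up for
illustration), the following assigns to "$_" directly and then lets
"chomp", "s///", "m//" and "print" pick it up by default:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sample data; "$_" is assigned to like any other variable.
$_ = "foo and baz\n";

chomp;                               # same as chomp($_)
s/foo/bar/;                          # same as $_ =~ s/foo/bar/
my $found = /baz/ ? "yes" : "no";    # same as $_ =~ m/baz/

my $copy = $_;                       # "$_" can be assigned from, too
print;                               # same as print $_
print "\n";
print "found baz: $found\n";
```

This should print "bar and baz" followed by "found baz: yes".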

          ----------------------------------------

3) $_ in While Loops

For any file handle "HANDLE", the following are equivalent:

  while ( <HANDLE> ) { ... }
      # ... is equivalent to ...
  while ( defined( $_ = <HANDLE> ) ) { ... }

Note that this trick only works if "<HANDLE>" is the entire condition
of a while loop. So the following do NOT set "$_":

  <HANDLE>;                  # NOT equivalent to $_ = <HANDLE>
  if ( <HANDLE> ) { ... }    # NOT equivalent to $_ = <HANDLE>
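
Here is a minimal sketch of the difference. It assumes some made-up
input and reads it through an in-memory file handle ("$fh"), so it
needs no external file:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical input, read through an in-memory file handle.
my $text = "one\ntwo\nthree\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

$_ = "untouched";
<$fh>;                     # reads and discards "one\n"; "$_" is not set
my $after_read = $_;       # still "untouched"

my @seen;
while ( <$fh> ) {          # here each line IS assigned to "$_"
  chomp;
  push @seen, $_;
}
close($fh);

print "after plain read: $after_read\n";
print "lines seen in loop: @seen\n";
```

The plain read leaves "$_" alone, so the first line printed ends in
"untouched". The loop only sees "two" and "three", because the plain
read has already consumed the first line.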

          ----------------------------------------

4) Back to the Original Example

After this explanation, the example we started with looks a lot
clearer.

  while ( <STDIN> ) {   # while ( defined( $_ = <STDIN> ) ) {
    s/foo/bar/;         #   $_ =~ s/foo/bar/;
    print;              #   print $_;
  }                     # }

          ----------------------------------------

5) Exercises

a) Modify a recent program that we've written to use "$_" instead of
a variable. Try to avoid explicitly naming "$_".

b) Another "implied" variable is "ARGV", which is the default
argument to the diamond operator "<>". Try running the following
program to find out what it does.

  #!/usr/bin/perl -w
  use strict;
  
  while ( <> ) {
    print;
  }

Hint: try specifying several file names on the command line.

          ----------------------------------------

6) Answers to Previous Exercises

a) The following program changes "Quirrel" to "Lockhart":

  #!/usr/bin/perl -w
  use strict;
  
  while ( defined(my $line = <STDIN>) ) {
    $line =~ s/Quirrel/Lockhart/g;
    print $line;
  }

To simplify things, we don't worry about false matches. However,
there is a small chance that we'll run into trouble if there's a
discussion of squirrels. The answer to the next exercise explains how
to avoid false matches.

b) Rather than just explaining the solution to this one, I'm going to
go through the reasoning to find the solution.

When we try to convert "dead" to "metabolically different", we want
to match the whole word "dead", but not words that contain it. So we
do something like this:

  $line =~ s/ dead / metabolically different /g;

The spaces around the word avoid false matches like "deaden", but
they also mean that the word won't be replaced if it's followed by a
comma or some other punctuation. So we want a line like this:

  $line =~ s/[^A-Za-z]dead[^A-Za-z]/metabolically different/g;

But that causes the characters before and after the word to be
replaced as well, so we have to include them on the right side of the
substitution:

  $line =~
    s/([^A-Za-z])dead([^A-Za-z])/${1}metabolically different$2/g;

Oh dear: now we won't match a line that begins or ends with "dead"
because our pattern requires a character before and after the word.
So we'll have to code some special cases:

  $line =~
    s/([^A-Za-z])dead([^A-Za-z])/${1}metabolically different$2/g;
  $line =~ s/^dead([^A-Za-z])/metabolically different$1/g;
  $line =~ s/([^A-Za-z])dead$/${1}metabolically different/g;
  $line =~ s/^dead$/metabolically different/g;

Relax: it gets better now. The point of this whole mess was to bring
up some useful Perl functionality. We can avoid having to write
"[^A-Za-z]" all the time by using \W instead. \w (lowercase) matches
a "word character" (letter, digit or underscore). \W (uppercase)
matches anything that's not \w. So we can write this instead:

  $line =~ s/(\W)dead(\W)/${1}metabolically different$2/g;
  # ... and so on

But an even better solution is to use the word boundary anchor: \b.
This handy little guy matches the gap between two characters:
between a \w and a \W (in either order) or between a \w and the
beginning or end of the string. It's actually zero-length (it matches
a position, not a character), so we don't have to worry about
replacing it.

So we can do everything in one substitution:

  $line =~ s/\bdead\b/metabolically different/g;

And the whole program looks like this:

  #!/usr/bin/perl -w
  use strict;
  
  while ( defined(my $line = <STDIN>) ) {
    $line =~ s/\bdead\b/metabolically different/g;
    print $line;
  }
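
To check the \b version against the tricky cases discussed above,
here is a small sketch that runs it over a few made-up lines instead
of STDIN:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical test lines: "dead" at the start, before punctuation,
# at the end, and inside longer words that must stay unchanged.
my @lines = (
  "dead calm",
  "He is dead, Jim.",
  "Not quite dead",
  "deaden the deadline",
);

my @changed;
foreach my $line ( @lines ) {
  my $copy = $line;
  $copy =~ s/\bdead\b/metabolically different/g;
  push @changed, $copy;
}

print "$_\n" foreach @changed;
```

The first three lines should have "dead" replaced, and the last line
should come out unchanged.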

c) To change U.S. spelling to U.K. spelling, we use \b as in the
previous exercise.

  #!/usr/bin/perl -w
  use strict;
  
  while ( defined(my $line = <STDIN>) ) {
    $line =~ s/ize\b/ise/g;
    print $line;
  }

Here is a more advanced version that takes into account variations
like "realised" and "desensitising". It also avoids changing the
spelling of the words "size", "resize" and "downsize", which are
spelt the same way in the U.K. as in America. The optional prefix is
written as "((?:re|down)?)" so that "$1" is an empty string, rather
than undefined, when there is no prefix; otherwise "-w" would warn
about an uninitialized value:

  #!/usr/bin/perl -w
  use strict;
  
  while ( defined(my $line = <STDIN>) ) {
    $line =~ s/iz(e|es|ed|ing)\b/is$1/g;
    $line =~ s/\b((?:re|down)?)sis(e|es|ed|ing)\b/$1siz$2/g;
    print $line;
  }
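
As a quick check, the sketch below wraps the two substitutions in a
helper and tries them on a few sample words. The name "to_uk" and the
word list are made up for this example. It writes the optional prefix
as a group that always captures, so "$1" is never undefined under
"-w":

```perl
#!/usr/bin/perl -w
use strict;

# to_uk() is a hypothetical helper name used only for this sketch.
sub to_uk {
  my ($text) = @_;
  $text =~ s/iz(e|es|ed|ing)\b/is$1/g;
  # ((?:re|down)?) always captures, so $1 is "" when there is no prefix.
  $text =~ s/\b((?:re|down)?)sis(e|es|ed|ing)\b/$1siz$2/g;
  return $text;
}

foreach my $word ( qw(realized desensitizing size resized downsizing) ) {
  print "$word -> ", to_uk($word), "\n";
}
```

The first two words should come out as "realised" and
"desensitising", and the last three should come out unchanged.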

d) The "e" option causes the replacement part of the substitution to
be evaluated as Perl code, and the result of that code is used as the
replacement text. You can use this to cause "s///" to call an
arbitrary function on every match.
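
For instance, here is a minimal sketch. The function "shout" and the
sample line are invented for this example:

```perl
#!/usr/bin/perl -w
use strict;

# shout() is a hypothetical function invented for this sketch.
sub shout {
  my ($word) = @_;
  return uc($word) . "!";
}

my $line = "the dead parrot is dead";

# With "e", the right side is Perl code: shout() is called once per
# match and its return value becomes the replacement text.
$line =~ s/\b(dead)\b/shout($1)/ge;

print "$line\n";
```

This should print "the DEAD! parrot is DEAD!". Without the "e", the
replacement would be the literal text "shout(dead)".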

          ----------------------------------------

7) Acknowledgements

A big thank you to Jacinta Richardson for suggestions and
proofreading. More advanced Perl users might want to check out the
free material from Perl Training Australia
<http://www.perltraining.com.au/>, which she is a part of.

Other contributors include Meryll Larkin.

          ----------------------------------------

8) Licensing

This course (i.e., all parts of it) is copyright 2003-2005 by Dan
Richter and Alice Wood, and is released under the same license as
Perl itself (Artistic License or GPL, your choice). This is the
license of choice to make it easy for other people to integrate your
Perl code/documentation into their own projects. It is not generally
used in projects unrelated to Perl.


