[Courses] [Perl] Part 5: The m// Operator

Dan dan at cellectivity.com
Sat Apr 30 00:19:05 EST 2005


LinuxChix Perl Course Part 5: The m// Operator

1) Introduction
2) An Example Program
3) Perl Regular Expressions
4) The Many Forms of m//
5) Exercises
6) Answers to Previous Exercises
7) Acknowledgements
8) Licensing


          ----------------------------------------

1) Introduction

Now we're getting into the truly interesting parts of Perl. In my
opinion the m// operator is one of the two most fun things about the
language. (The other is the s/// operator, which we'll get to in two
weeks.) These two operators alone make the language worth learning.
If you remember nothing else, remember these two operators.

          ----------------------------------------

2) An Example Program

Try executing the following program, then typing a few lines of
input. (Hint: try entering a line containing "foo".)

  #!/usr/bin/perl -w
  use strict;
  
  while ( defined( my $line = <STDIN> ) ) {
    if ( $line =~ m/foo/ ) {
      print "You entered a line containing 'foo'.\n";
    }
    else {
      print "You entered a line that doesn't contain 'foo'.\n";
    }
  }

Notice that the line with m// does not use a single or double equals
sign, but rather an operator that we haven't seen before: an equals
sign followed by a tilde (=~). This is the binding operator; it tells
m// which string to search.

This example doesn't do justice to the power of m//, because m//
actually does regular expression matching. So let's look at Perl
regular expressions in a little more detail.

          ----------------------------------------

3) Perl Regular Expressions

I'm going to assume that you have some familiarity with regular
expressions. If you don't, feel free to ask the list.

Perl adds wildcards and capabilities to the regular expression syntax
provided by some other tools. This makes Perl regular expressions
somewhat nonstandard, though people like them so much that they've
been built into several other languages (such as PHP) as extensions.
But remember that what works in Perl isn't necessarily going to work
in grep or some other tool.

In addition to the standard wildcards ("." for any character, etc.),
Perl adds these:
\s    white space (space, tab, newline, etc.)
\S    anything but white space
\w    word character (letter, digit, or underscore)
\W    non-word character
\d    digit: equivalent to [0-9]
\D    non-digit
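
To make the list above concrete, here is a small sketch that tests one
string against three of these wildcards. The sample text and the
variable names are made up for illustration.

```perl
#!/usr/bin/perl -w
use strict;

my $line = "Room_42";   # made-up sample text

my $has_digit    = 0;
my $has_space    = 0;
my $has_non_word = 0;

# \d matches any single digit; "4" qualifies.
if ( $line =~ m/\d/ ) { $has_digit = 1; }

# \s matches white space; this string has none.
if ( $line =~ m/\s/ ) { $has_space = 1; }

# \W matches anything that is not a word character. Letters, digits
# and the underscore are all word characters, so nothing matches here.
if ( $line =~ m/\W/ ) { $has_non_word = 1; }

print "digit: $has_digit, white space: $has_space, non-word: $has_non_word\n";
```

Changing the underscore in $line to a space should flip the last two
results, because a space is both white space and a non-word character.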

Perl also supports the usual repetition operators:
*     zero or more
+     one or more
?     zero or one
{n}   exactly "n" instances
{m,n} between "m" and "n" instances (inclusive)
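
Here is a minimal sketch of a few of these repetition operators at
work. The phone-number-like sample text is invented for the example.

```perl
#!/usr/bin/perl -w
use strict;

my $text = "Call 555-1234 now";   # made-up sample text

my $three_digits    = 0;
my $five_digits     = 0;
my $optional_hyphen = 0;

# \d{3} needs three digits in a row; "555" satisfies it.
if ( $text =~ m/\d{3}/ ) { $three_digits = 1; }

# \d{5} needs five digits in a row. The hyphen interrupts the run,
# so this one does not match.
if ( $text =~ m/\d{5}/ ) { $five_digits = 1; }

# -? makes the hyphen optional, so this pattern would accept
# "5551234" as well as "555-1234".
if ( $text =~ m/\d{3}-?\d{4}/ ) { $optional_hyphen = 1; }

print "three: $three_digits, five: $five_digits, optional: $optional_hyphen\n";
```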

Here are the characters that have special meaning in Perl regular
expressions (I hope I got them all):
  . * + ? | ^ $ @ ( ) [ ] { } \
  and the delimiter (usually slash)
You escape a character (make it lose its special meaning) by
preceding it with a backslash.
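
As a quick illustration of escaping, the sketch below compares an
unescaped dot with an escaped one. The sample string is made up.

```perl
#!/usr/bin/perl -w
use strict;

my $text = "Total: 3x50";   # made-up sample text, note the "x"

my $unescaped = 0;
my $escaped   = 0;

# An unescaped "." stands for any character, so "3.50" also
# matches "3x50".
if ( $text =~ m/3.50/ ) { $unescaped = 1; }

# With a backslash the dot loses its special meaning and matches
# only a literal dot, which this string does not contain.
if ( $text =~ m/3\.50/ ) { $escaped = 1; }

print "unescaped: $unescaped, escaped: $escaped\n";
```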

Here are some quick examples of regular expressions in Perl:

  if ( $x =~ m/[a-z]{4}/ ) { ... }    # 4 lowercase letters in a row
  if ( $x =~ m/[A-Za-z]{4}/ ) { ... } # 4 letters in a row (any case)
  if ( $x =~ m/\@/ ) { ... }          # e-mail address (see below)

That last example reminds us of an important point: a match succeeds
if any part of the string matches the pattern, so m/\@/ accepts every
string that contains an "@", not just e-mail addresses. If the regular
expression must match the entire string, use ^ and $, like this:

  if ( $x =~ m/abc/ ) { ... }    # matches "abc" and "xabcx"
  if ( $x =~ m/^abc$/ ) { ... }  # matches only "abc" (or "abc\n")
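
The difference is easy to check with a short sketch (sample strings
invented here). It also shows one detail worth knowing when you match
lines read from <STDIN>: $ matches at the very end of the string or
just before a newline at the end, so a line that still has its newline
attached can match an anchored pattern.

```perl
#!/usr/bin/perl -w
use strict;

my $inside = "xabcx";   # "abc" surrounded by other characters
my $line   = "abc\n";   # "abc" with a trailing newline, as <STDIN> returns it

my $partial      = 0;
my $anchored     = 0;
my $with_newline = 0;

# Without anchors, a match anywhere in the string is enough.
if ( $inside =~ m/abc/ ) { $partial = 1; }

# With ^ and $, the extra "x" characters prevent a match.
if ( $inside =~ m/^abc$/ ) { $anchored = 1; }

# $ tolerates one newline at the end of the string.
if ( $line =~ m/^abc$/ ) { $with_newline = 1; }

print "partial: $partial, anchored: $anchored, with newline: $with_newline\n";
```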

          ----------------------------------------

4) The Many Forms of m//

In our examples we have been using a slash as a delimiter, but the
delimiter can be almost any punctuation. In other words, all of the
following are equivalent (they match one or more capital letters):

  if ( $x =~ m/[A-Z]+/ ) { ... }
  if ( $x =~ m:[A-Z]+: ) { ... }
  if ( $x =~ m![A-Z]+! ) { ... }
  if ( $x =~ m#[A-Z]+# ) { ... }
  if ( $x =~ m<[A-Z]+> ) { ... }

Choosing a different delimiter avoids the need to escape slashes in
the pattern. However, if you do use a slash as the delimiter, you can
leave out the "m":

  if ( $x =~ /[A-Z]+/ ) { ... }
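
Where an alternative delimiter really pays off is a pattern that is
full of slashes, such as a file path. The sketch below writes the same
test twice; the path is just an example value.

```perl
#!/usr/bin/perl -w
use strict;

my $path = "/usr/local/bin/perl";   # example value

my $with_slashes = 0;
my $with_hash    = 0;
my $in_home      = 0;

# With slash delimiters, every slash in the pattern must be escaped.
if ( $path =~ m/^\/usr\/local\// ) { $with_slashes = 1; }

# With "#" as the delimiter, the same pattern needs no backslashes.
if ( $path =~ m#^/usr/local/# ) { $with_hash = 1; }

# A different directory, to show a pattern that does not match.
if ( $path =~ m#^/home/# ) { $in_home = 1; }

print "slashes: $with_slashes, hash: $with_hash, home: $in_home\n";
```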

          ----------------------------------------

5) Exercises

a) Write a program that reads /etc/passwd and outputs the line
corresponding to your account.

b) Write a program that reads a C++ file and outputs all the #include
statements. Assume that no include statement spans multiple lines,
but consider that it might be indented.

          ----------------------------------------

6) Answers to Previous Exercises

a) The following program counts the number of lines in a file.

  #!/usr/bin/perl -w
  use strict;
  
  open MY_INPUT, "< file1.txt" or die "Couldn't open input file: $!";
  
  my $count = 0;
  while ( defined( <MY_INPUT> ) ) {
    $count++;
  }
  
  close MY_INPUT;
  
  print "The file has $count lines.\n";

b) The following program copies only non-blank lines from one file to
another.

  #!/usr/bin/perl -w
  use strict;
  
  open MY_INPUT,  "< file1.txt"
    or die "Couldn't open input file: $!";
  open MY_OUTPUT, "> file2.txt"
    or die "Couldn't open output file: $!";
  
  while ( defined( my $line = <MY_INPUT> ) ) {
    if ( $line ne "\n" ) {
      print MY_OUTPUT $line;
    }
  }
  
  close MY_INPUT;
  close MY_OUTPUT;

c) If you don't specify "<" or ">" in the open statement, the file
will be opened for reading. But you should get in the habit of
specifying a "<" anyway, because you'll often be reading a file that
the user passed on the command line. You don't want to erase a
valuable file just because the user accidentally (or maliciously)
preceded the filename with a ">".

d) The reason we must use "defined" to determine if we have reached
the end of the file is that a few strings evaluate to false: the empty
string and the string "0". For example, if the very last line of a
file is the digit zero without a newline, we might skip over it if we
don't test it using "defined". That's the kind of bug that will take
you forever to find! (Perl adds the "defined" test for you when the
condition of a "while" is nothing but an assignment from <>, but
writing it out makes the intent clear.)
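
The sketch below shows the problem without needing a file. The
variable $last_line stands in for what <> would return for a final
line consisting of a zero with no newline.

```perl
#!/usr/bin/perl -w
use strict;

my $last_line    = "0";     # a final line with no newline
my $normal_line  = "0\n";   # the same digit followed by a newline

my $plain_test   = 0;
my $defined_test = 0;
my $newline_test = 0;

# The string "0" is false, so a plain truth test rejects it.
if ( $last_line ) { $plain_test = 1; }

# "defined" only asks whether there is a value at all.
if ( defined( $last_line ) ) { $defined_test = 1; }

# "0\n" is a different string from "0" and counts as true.
if ( $normal_line ) { $newline_test = 1; }

print "plain: $plain_test, defined: $defined_test, newline: $newline_test\n";
```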

e) A vertical bar causes "open" to execute a program instead of
opening a file. If the vertical bar comes before the program name,
you can use "print" to write to the program's standard input. If the
vertical bar comes after the program name, you can use the <>
operator to read the program's output.

f) If you set $/ to undef, <> will continue reading until it reaches
the end of the file (which doesn't make sense with STDIN, unless you
pipe the input from a file). This is called "slurp mode".
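
Here is a minimal sketch of slurp mode. So that it runs without a
file on disk, it opens a filehandle on a string; this three-argument
form of "open" with a reference to a scalar has not been covered in
this course and is used here only as a convenience (it needs Perl 5.8
or later). The sample data is made up.

```perl
#!/usr/bin/perl -w
use strict;

my $data = "first line\nsecond line\nthird line\n";   # made-up data

# Assumption: an in-memory filehandle stands in for a real file.
open MY_INPUT, "<", \$data or die "Couldn't open in-memory file: $!";

# With the default $/ (a newline), <> returns one line.
my $first = <MY_INPUT>;

# With $/ undefined, <> returns everything that is left.
$/ = undef;
my $rest = <MY_INPUT>;

# Put $/ back so later reads behave normally.
$/ = "\n";

close MY_INPUT;

print "First read: $first";
print "Second read (slurped):\n$rest";
```

The second read returns both remaining lines as a single string.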

          ----------------------------------------

7) Acknowledgements

A big thank you to Jacinta Richardson for suggestions and
proofreading. More advanced Perl users might want to check out the
free material from Perl Training Australia
<http://www.perltraining.com.au/>, which she is a part of.

Other contributors include Meryll Larkin.

          ----------------------------------------

8) Licensing

This course (i.e., all parts of it) is copyright 2003-2005 by Dan
Richter and Alice Wood, and is released under the same license as
Perl itself (Artistic License or GPL, your choice). This is the
license of choice to make it easy for other people to integrate your
Perl code/documentation into their own projects. It is not generally
used in projects unrelated to Perl.



