[Courses] [Perl] Part 9: Simple File Access

Tue Sep 16 10:28:05 EST 2003

LinuxChix Perl Course Part 9: Simple File Access

1) Introduction
2) The "die" Command
3) Simple File Access
4) Exercise
5) Answers to Previous Exercises
6) Past Information
7) Credits
8) Licensing

             -----------------------------------

1) Introduction

As we will see shortly, Perl uses the "open" command to read from and write to
files. The "open" command is very powerful and is not limited to files, but we
won't see that until next week.

             -----------------------------------

2) The "die" Command

The "die" command does exactly that: it kills your program. Example:

   die;

Of course, you'll usually want to make the command conditional and give a
helpful error message:

   if ( $there_was_an_error ) {
     die "There was an error. Sorry. Aborting.";
   }

If you run this code (with $there_was_an_error set to a true value), you will
see:

   There was an error. Sorry. Aborting. at test.pl line 7.

Of course, if you try it yourself, "test.pl" will be replaced by the file in
which you did the test. Perl is trying to be helpful by telling you where the
program died. But if you don't want that additional information, put a newline
on the end of the message string:

   die "There was an error. Sorry. Aborting.\n";   # No file+line information.

There are two additional subtleties to the "die" command:
a) it causes the message to be sent to standard error, not standard out, so
    the user will see it even when redirecting the output to a file.
b) it sets the program's exit code to indicate that an error occurred. A
    program such as a shell script can detect this error code and react
    accordingly.

Sometimes it's helpful to include the last error message, which is stored in
the special variable "$!". We'll see more about this in a moment.

Obviously, "die" should only be used when something has prevented the program
from doing whatever it was supposed to do. To end a program without an error,
use "exit".
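To see the difference in exit codes, here is a small sketch that runs two
one-line child programs, one ending with "die" and one ending with "exit". It
uses "system", the special variable "$^X" (the path of the running Perl
interpreter) and "$?" (the status of the last child program), none of which
this course has covered yet. The variable names are made up for illustration.

```perl
#!/usr/bin/perl -w
use strict;

# $^X holds the path of the Perl interpreter running this script, so each
# system() call below runs a one-line child program with the same perl.

# Child 1 ends with "die". Its message appears on standard error.
system($^X, "-e", 'die "Child 1: something went wrong.\n"');
my $die_status = $? >> 8;      # exit code of child 1

# Child 2 ends normally with "exit".
system($^X, "-e", 'exit 0');
my $exit_status = $? >> 8;     # exit code of child 2

print "die  gave exit code $die_status\n";
print "exit gave exit code $exit_status\n";
```

The exact non-zero code that "die" produces can vary from system to system, so
a calling script should only test whether the code is zero or not.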

             -----------------------------------

3) Simple File Access

Here's a simple program to copy one file to another:

   #!/usr/bin/perl -w
   use strict;

   open MY_INPUT_FILE, "< in.txt"   or die "Can't read in.txt: $!";
   open MY_OUTPUT_FILE, "> out.txt" or die "Can't write to out.txt: $!";

   while ( defined ( my $line = <MY_INPUT_FILE> ) ) {
     print MY_OUTPUT_FILE $line;
   }

   close MY_INPUT_FILE;
   close MY_OUTPUT_FILE;

Let's explain this from the top down.

First, we can see that the "open" command in this example takes two parameters:
a file handle and a file name. You'll notice that the file handle
(MY_INPUT_FILE in this example) isn't a normal scalar because it doesn't begin
with a dollar sign. (We didn't have to declare it with "my" either.) It also
doesn't have quotes around it. Plain text like this that isn't surrounded by
explanitory punctuation (like quotes) is called a "bareword". We'll be seeing
barewords a little more in the future. We actually could have put quotes around
it (in this "open" command at least), but by convention we don't.

The "file handle" is simply a way of referring to a file once you've opened it.
In C, file descriptors are plain integers; in Perl, a file handle like this
one is a name. By convention we write file handles in all CAPS.

The file name in this case is "in.txt". The "<" in front of it means that the
file is being opened in read mode (so we will be able to read from it but not
write to it). The white space after the "<" won't cause any problems because
"open" strips all white space from the beginning and end of the file name. As
we will see later, you can use a command called "sysopen" if you want to open
files whose names begin or end with whitespace characters.

After opening the file, we check whether the "open" succeeds using "or die
...". Since Perl uses "short-circuit" boolean evaluation, the "die" command
will only be executed if "open" returns false (failure). This means that if the
file doesn't exist or can't be read, we'll know it right away. It's important
to detect problems right away; otherwise we'll be scratching our heads
wondering why the read command fails.
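To see what "open" reports when it fails, here is a minimal sketch that tests
the result with "if" instead of "or die", so the program keeps running. The
file name is hypothetical, chosen because it almost certainly doesn't exist.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical file name that should not exist in the current directory.
my $file_name  = "no_such_file_for_this_demo.txt";
my $error_text = "";

if ( open MISSING_FILE, "< $file_name" ) {
  print "Surprise: $file_name exists.\n";
  close MISSING_FILE;
} else {
  # "open" returned false, and $! now holds the reason.
  $error_text = "$!";
  print "Can't read $file_name: $error_text\n";
}
```

The exact wording of the message in "$!" comes from the operating system.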

The command to open the output file is just like the command to open the input
file, except that it uses ">" instead of "<". If we wanted to append to the
file instead of overwriting it, we would use ">>".
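Here is a short sketch of the difference between ">" and ">>". It writes one
line, appends a second, and then reads the file back to count the lines. The
file name "append_demo.txt" is just a scratch file invented for this example.

```perl
#!/usr/bin/perl -w
use strict;

my $file_name = "append_demo.txt";   # hypothetical scratch file

# ">" creates the file, or empties it if it already exists.
open DEMO_FILE, "> $file_name" or die "Can't write to $file_name: $!";
print DEMO_FILE "first line\n";
close DEMO_FILE;

# ">>" keeps what is already there and adds to the end.
open DEMO_FILE, ">> $file_name" or die "Can't append to $file_name: $!";
print DEMO_FILE "second line\n";
close DEMO_FILE;

# Read the file back and count its lines.
my $line_count = 0;
open DEMO_FILE, "< $file_name" or die "Can't read $file_name: $!";
while ( defined ( my $line = <DEMO_FILE> ) ) {
  $line_count++;
  print "Line $line_count: $line";
}
close DEMO_FILE;

unlink $file_name;   # remove the scratch file
```

If you change the ">>" to ">", the first line is lost and only one line is
read back.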

In the next line, we can see that we can read from the file by surrounding its
file handle with "<>". We've seen this before: remember "<STDIN>"? In that case
the file handle was "STDIN", but the same syntax can be applied to any file
handle.

The line containing "print" is a little different from any "print" statement
we've seen before because it includes a file handle. This causes "print" to
output the text to the file handle rather than standard output (STDOUT). Also
note that there's no comma between the file handle and the text to be printed.

Finally, we close both files. If we forget to close a file, Perl does a good
job of cleaning up after our sloppiness, but it's still best to close the file
explicitly.
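One reason to close explicitly is that "close" returns true or false just like
"open", so it can be checked in the same way. Output is usually buffered, so
some problems (a full disk, for example) may only show up when the file is
closed. A minimal sketch, using an invented scratch file name and the "-s"
file test (which gives a file's size and which we haven't covered yet):

```perl
#!/usr/bin/perl -w
use strict;

my $file_name = "close_demo.txt";   # hypothetical scratch file

open MY_OUTPUT_FILE, "> $file_name" or die "Can't write to $file_name: $!";
print MY_OUTPUT_FILE "Some text.\n";

# "close" returns true on success, so it can be checked just like "open".
close MY_OUTPUT_FILE or die "Can't close $file_name: $!";

# Once the file is closed, everything we printed has been written out.
my $size_in_bytes = -s $file_name;
print "$file_name holds $size_in_bytes bytes.\n";

unlink $file_name;   # remove the scratch file
```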

Note that all built-in Perl functions/commands can be written with or without
parentheses, so we could have written the above program like this:

   open(MY_INPUT_FILE, "< in.txt")   or die "Can't read in.txt: $!";
   open(MY_OUTPUT_FILE, "> out.txt") or die "Can't write to out.txt: $!";

   while ( defined ( my $line = <MY_INPUT_FILE> ) ) {
     print(MY_OUTPUT_FILE $line);
   }

   close(MY_INPUT_FILE);
   close(MY_OUTPUT_FILE);

This has no effect whatsoever on the way the program functions. As usual, There
Is More Than One Way To Do It. Just remember that an opening parenthesis right
after a function name (with or without a space) encloses all of that function's
arguments, so "print (1+2)*3;" prints 3, not 9.
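To see how parentheses mark out a function's arguments, here is a small sketch
using the built-in "length", which returns the number of characters in a string
(this course has not covered it yet). The variable names are made up for
illustration.

```perl
#!/usr/bin/perl -w
use strict;

# Without parentheses, "length" takes the whole expression to its right.
my $without_parens = length "ab" . "cd";    # length of "abcd"

# With parentheses, "length" takes only what is inside them; the rest of
# the expression is then applied to the result.
my $with_parens = length("ab") . "cd";      # "2" followed by "cd"

print "$without_parens\n";   # prints 4
print "$with_parens\n";      # prints 2cd
```

When in doubt, adding the parentheses makes it clear which arguments belong to
which function.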

As we've seen, a prefix (such as "<" or ">") in front of the file name
indicates whether the file is for reading or writing. Here is a list of such
prefixes:
   Prefix  Meaning
     <     read-only
     >     write-only, erasing the file if it already exists
     >>    write-only, appending to end of file
     +<    read+write (if you're not sure which read+write to use, it's this)
     +>    read+write, erasing the file when it's opened
     +>>   read+write, appending data to the end of the file
     |     execute command (also works as a suffix; more about this
           next week)

Additionally, a file that is just called "-" refers to standard input (if you
read from it) or standard output (if you write to it). This is a convention
used in many Unix commands.

This raises the question: what do you do if you want to open a file whose name
begins with one of these characters or with white space? Perl includes a more
traditional command called "sysopen" which assumes that the file name you give
it is exactly the file name you want. After opening the file, you can operate
on the file handle exactly as if you had used "open" (i.e., there is no
"sysclose"). We won't say any more here because in practice "sysopen" is rarely
used, but it is available if you want it. To learn more about it: "perldoc -f
sysopen".
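For the curious, here is a minimal sketch of "sysopen" creating a file whose
name begins with a space. Instead of a prefix, "sysopen" takes a set of flags;
the standard "Fcntl" module supplies them as constants. The file name and the
variable names are invented for this example.

```perl
#!/usr/bin/perl -w
use strict;
use Fcntl;   # supplies the O_WRONLY, O_CREAT and O_TRUNC constants

# Hypothetical file name that begins with a space, which "open" would strip.
my $file_name = " leading space.txt";

# O_WRONLY: write-only; O_CREAT: create the file if it is missing;
# O_TRUNC: empty it if it already exists (together, like the ">" prefix).
sysopen SPACE_FILE, $file_name, O_WRONLY | O_CREAT | O_TRUNC
  or die "Can't write to '$file_name': $!";
print SPACE_FILE "Hello\n";
close SPACE_FILE;

# "-e" tests whether a file with exactly this name exists.
my $was_created = ( -e $file_name ) ? 1 : 0;
print "Created '$file_name'\n" if $was_created;

unlink $file_name;   # remove the scratch file
```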

             -----------------------------------

4) Exercise

One of the previous exercises (one solution to which is found below) was to
"translate" the American "ize" into the British "ise". Modify this program to
read its input from a file and write its output to another file.

             -----------------------------------

5) Answers to Previous Exercises

a) Here is a program that replaces the word "dead" with "metabolically
different":

   #!/usr/bin/perl -w
   use strict;

   while ( defined(my $line = <STDIN>) ) {
     $line =~ s/\bdead\b/metabolically different/g;
     print $line;
   }

b) Here is a very basic program that replaces "ize" with "ise":

   #!/usr/bin/perl -w
   use strict;

   while ( defined(my $line = <STDIN>) ) {
     $line =~ s/ize\b/ise/g;
     print $line;
   }

Here is a more advanced version that takes into account variations like
"realised" and "desensitising". It also avoids changing the spelling of the
words "size", "resize" and "downsize", which are spelled the same way in
England and America:

   #!/usr/bin/perl -w
   use strict;

   while ( defined(my $line = <STDIN>) ) {
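     # Change "ize", "ized", "izes" and "izing" endings to the "is" spelling.
     $line =~ s/iz(e|ed|es|ing)\b/is$1/g;
     # Change "size" words back. The empty alternative keeps $1 defined.
     $line =~ s/\b(re|down|)sis(e|ed|es|ing)\b/$1siz$2/g;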
     print $line;
   }
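To check the two-step approach without typing at the keyboard, here is a sketch
that applies the substitutions to a fixed sample line instead of reading
<STDIN>. The sample words are invented for illustration. The patterns here
accept "ed" endings as well, and use an empty alternative in "(re|down|)" so
that $1 is always defined.

```perl
#!/usr/bin/perl -w
use strict;

my $line = "realize realized organizes desensitizing size resize downsized\n";

# Step 1: change every "iz" ending to "is", including in the "size" words.
$line =~ s/iz(e|ed|es|ing)\b/is$1/g;

# Step 2: change the "size" words back. The empty alternative in
# (re|down|) keeps $1 defined when there is no prefix.
$line =~ s/\b(re|down|)sis(e|ed|es|ing)\b/$1siz$2/g;

print $line;
```

The "size" words are changed in step 1 and restored in step 2, which is
simpler than trying to exclude them from the first pattern.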

             -----------------------------------

6) Past Information

Part 1: Getting Started
         http://linuxchix.org/pipermail/courses/2003-March/001147.html

Part 2: Scalar Data
         http://linuxchix.org/pipermail/courses/2003-March/001153.html

Part 3: User Input
         http://linuxchix.org/pipermail/courses/2003-April/001170.html

Part 4: Control Structures
         http://linuxchix.org/pipermail/courses/2003-April/001184.html

Part 4.5, a review with a little new information at the end:
         http://linuxchix.org/pipermail/courses/2003-July/001297.html

Part 5: The "tr///" Operator
         http://linuxchix.org/pipermail/courses/2003-July/001302.html

Part 6: The "m//" Operator
         http://linuxchix.org/pipermail/courses/2003-August/001305.html

Part 7: More About "m//"
         http://linuxchix.org/pipermail/courses/2003-August/001322.html

Part 8: The "s///" Operator
         http://linuxchix.org/pipermail/courses/2003-August/001330.html

             -----------------------------------

7) Credits

Works cited:
a) "man perlop"
b) "man perlopentut"
c) Kirrily Robert, Paul Fenwick and Jacinta Richardson's "Intermediate Perl",
    which you can find (along with their "Introduction to Perl") at:
    http://www.perltraining.com.au/notes.html

Thanks to Jacinta Richardson for fact checking.

             -----------------------------------

8) Licensing

This course (i.e., all parts of it) is copyright 2003 by Alice Wood and Dan
Richter, and is released under the same license as Perl itself (Artistic
License or GPL, your choice). This is the license of choice to make it easy
for other people to integrate your Perl code/documentation into their own
projects. It is not generally used in projects unrelated to Perl.