[Courses] [python] Lesson 4: Modules and command-line arguments

Fri Jul 8 20:56:44 UTC 2011

Today's lesson will go into some details of building a larger program.

But first, a couple of digressions:

============ Comments ===================

Python's comment character is a hash mark, #. Any time you see #,
anything after it on the line isn't part of the program. It's just a
note to yourself or to anyone else who reads your program.

So comments can be on their own line:

# Print the "99 bottles of beer" song

or they can be on the same line with some Python code:

for i in range(99, 0, -1) :     # Loop downward from 99
    print i, "bottles of beer"  # print the next line of the song

As you write longer programs, it's good to add comments for
anything that might be confusing or hard to read.

=============== Shebang! ============================

Another digression, on how to run your programs more easily. You may have
seen some students including this as the first line of their solutions:

#!/usr/bin/env python

That's a special line that tells the operating system to run this file
with python. Once the file is marked executable (chmod +x, covered below),
you can run the program simply by typing the filename; no need to say
"python progname".

#! is called a "shebang" because # is sometimes pronounced "hash" (or
"sharp") and ! is pronounced "bang", and "shebang" is more fun to say
than "hash-bang" or "sharp-bang".  You'll see them in scripts of all
languages -- python, perl, ruby, bash etc.  If you want to know more
about the "/usr/bin/env" part, see Wikipedia:
http://en.wikipedia.org/wiki/Shebang_%28Unix%29#Portability

If your system uses python3 by default, this is one of those places where
you should say python2 or python2.7 or whatever instead of just python.

If you're on Windows, a shebang line won't help you, but it doesn't hurt to
include one in your programs so they'll be easier to run on Linux and Mac.

============ Importing modules ===================

Okay, let's get back to actual programming.

One of Python's great advantages is that it comes with a huge slew of
built-in libraries, called "modules", to help you do nearly anything.
Want to write a web browser or a video editor, or search Twitter, or
figure out where Jupiter is in the sky? There's a Python module to help.

Randall Munroe (of XKCD) thinks so too: http://xkcd.com/353/

Okay, so how do you use a module? Simple: once you find the module
you want, just say "import modulename" at the top of your program.
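
For instance, here's a tiny sketch using the math module, which comes with
Python. The choice of math is just mine for illustration, and I've written
print with parentheses so the example runs under both Python 2 and Python 3.

```python
import math

# After the import, everything in the module is available as math.something
root = math.sqrt(16)
print("The square root of 16 is " + str(root))
```

Notice that you always say math.sqrt, not just sqrt: the module name goes
in front of whatever you're using from it.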

======== The "sys" module, for command-line arguments ==========

One module you'll use a lot is the sys module, because that's the one
that lets you get command-line arguments.

Here's a program that just prints any arguments you give it:

#! /usr/bin/env python

import sys

for arg in sys.argv :
    print arg

Save that to a file -- I called my program "args" (notice I didn't use
a .py extension, though Windows users might need it). Make it executable:

chmod +x args

Then run it a few times, with different command-line arguments. (The ./
tells the shell to look in the current directory, and it shows up in the
program name too.)

$ ./args 1 2 3
./args
1
2
3
$ ./args hello, world
./args
hello,
world

When you import the module called sys, you automatically get a
variable called sys.argv that gives you all the command-line arguments
the user typed -- including the program name.

Of course, most of the time you'll want to loop over all the
*other* arguments but not the program name -- you'd want to print
hello, and world, not args and hello, and world.

So how do you get around that? sys.argv is a list so you can use
slices.  Remember slices from lesson 3? In this case you want all the
arguments starting with number 1 (0 is the program name).  That's
sys.argv[1:]. So you can change the program so the loop reads:

for arg in sys.argv[1:] :
    print arg

and you'll get exactly what you need: it will print all the arguments
but not the program name.
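
If you want to play with the slice without running anything from the
command line, here's a small sketch. The list fake_argv and the name
"myprog" are made up; they just stand in for whatever sys.argv would hold.

```python
# A made-up list standing in for sys.argv after running: myprog hello, world
fake_argv = ["myprog", "hello,", "world"]

program_name = fake_argv[0]     # item 0 is the program name
real_args = fake_argv[1:]       # everything from item 1 to the end

for arg in real_args:
    print(arg)
```

If there are no arguments at all, the slice is just an empty list, so the
loop quietly does nothing.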

================= String to int conversion ====================

sys.argv is a list, but each element is a string. So if you need to
use one as a number -- for instance, in range(0, sys.argv[1]) -- you
have to convert it to an integer first with int(sys.argv[1]). If you
need a floating point number, use float() rather than int().

import sys

num = int(sys.argv[1])
for i in range(0, num):
    print i
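
Here's a short sketch of both conversions. The variables count_arg and
price_arg are made up; they're ordinary strings standing in for values that
would normally come from sys.argv.

```python
# Pretend these came from sys.argv; command-line arguments are always strings
count_arg = "5"
price_arg = "2.5"

count = int(count_arg)      # the integer 5
price = float(price_arg)    # the floating point number 2.5

print(count + 1)            # arithmetic works now that it's a number
print(count * price)
```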

==================== Reading files ========================

A lot of the time, when you're passing arguments to a program, they're
names of files you want to open.

For instance, remember our word count program? Wouldn't that be a lot
more useful if you could give it a filename, and it would print the
number of words in that file?

Here's how you would read from a file in Python: let's say you wanted
to read from that "args" program you just wrote (and you're still in
that directory).

file = open("args")
for line in file :
    print "Read a line:", line
file.close()

open(filename) gets you a file object. If you loop over it (for line in
file), it reads the file line by line, giving you each line as a string.

It's always a good idea to close files after your program is finished
with them. When your program finishes running, Python will close the
files anyway. But eventually, when you write bigger programs, they might
run for a long time and open a whole bunch of files, and leaving them
all open could cause problems (it's like when Firefox has been running
for days and it just grows bigger and bigger). So it's good to get in
the habit right at the beginning.
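
As an aside, Python (2.6 and later) also has a "with" statement that closes
the file for you. This is a different technique from calling close()
yourself, and the sketch below is only an illustration: the throwaway file
it creates, and the names filename, infile and lines_read, are all made up
so the example can run on its own.

```python
import os
import tempfile

# Make a small throwaway file so this example has something to read
handle, filename = tempfile.mkstemp()
os.close(handle)
out = open(filename, "w")
out.write("first line\nsecond line\n")
out.close()

# "with" closes the file automatically when the indented block ends
lines_read = []
with open(filename) as infile:
    for line in infile:
        lines_read.append(line)

print(infile.closed)    # True: no explicit close() was needed
os.remove(filename)     # clean up the throwaway file
```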

Of course, you don't necessarily want to read your "args" file: you
want to read whatever file the user suggested. If you're only reading
one file, that's argument number 1 (remember, 0 is the program name):

import sys

file = open(sys.argv[1])
for line in file :
    print "Read a line:", line
file.close()

========================= Homework ============================

1. With the little example I gave earlier, the one that used
   num = int(sys.argv[1]):
   if you run it and don't give an argument, you'll get an error.
   Why? Can you think of a way to check whether the user forgot to
   supply an argument, and print an error message if so?

2. Write a program that takes a filename and prints the number of
   lines in the file. (You can check its results with wc -l filename.)

3. How would you extend this so that you can count lines in multiple
   files, not just one? So you could say
   $ mylinecounter file1 file2 file3

4. Here's a harder problem, an exercise in debugging (which is a big
   part of programming, sadly):

   a. Write a program that counts words in a file (or multiple files,
      if you prefer). Use the same split() and len() you used in
      lesson 2.

   b. Compare the number of words from your program to what wc -w gives.
      (If you're on a platform that doesn't have wc, run it on a small
      file and count by hand.) Are the answers the same?

   c. Here's the debugging part: why aren't they the same?
      (You don't have to fix it: just figure out the problem.)

      Hint: if you're splitting each line into a list, try printing
      the list to see what's in it. In python, if you have a list
      called words, you can just say print words -- you don't have to
      do anything fancy like you would in some languages.

   d. OPTIONAL, harder: fix the problems and make your word count
      program give the same answer as wc -w.

      Hint 1: one Python function that will come in handy is strip():
      it strips off any leading and trailing whitespace (spaces, tabs
      and newlines). So if you have a string
      s = "     hello, world     ", then s.strip() would give you
      "hello, world".

      By the way, I haven't mentioned Python's documentation, but
      most Python modules have excellent online docs. Here's strip():
      http://docs.python.org/library/string.html#string.strip

      Hint 2: If you're inside a loop, say, looping over lines, and
      you decide you don't care about this line, you can skip to the
      next one by saying:
          continue
      For instance, in a loop where you don't care about negative numbers:
      for i in list_of_numbers :
          if i < 0 :
              continue
          do_stuff_for_positive_numbers(i)

      You can break out of a loop completely with: break

      Don't drive yourself too crazy trying to get an exact match
      with wc. There are some special cases where splitting at spaces
      might not give the same answer as wc -w, and there are some
      other Python modules (specifically re, regular expressions)
      that can do a better job. The purpose of this exercise is to
      give you a taste of debugging, fixing problems as you find them
      and thinking about what special cases might arise.
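
If you'd like to see Hints 1 and 2 in action before tackling the homework,
here's a small sketch. The data is made up, it uses append() to add items
to a list, and it deliberately doesn't solve the word count problem.

```python
# strip() removes whitespace from both ends but leaves the middle alone
s = "     hello, world     \n"
stripped = s.strip()
print("[" + stripped + "]")     # brackets make the edges visible

# continue skips the rest of this pass; break leaves the loop entirely
list_of_numbers = [3, -1, 4, -5, 0, 9]
kept = []
for i in list_of_numbers:
    if i < 0:
        continue        # ignore negative numbers
    if i == 0:
        break           # stop at the first zero
    kept.append(i)
print(kept)
```

The 9 never gets looked at, because break ends the loop as soon as it
reaches the 0.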