[Courses] [Ruby] Lesson 0: Installing, References, and your first homework assignment

Akkana Peck akkana at shallowsky.com
Thu Nov 10 07:12:21 EST 2005


Hi, Laurel! Thanks for running the course.

I'm a longtime programmer in lots of languages. My two favorites
are C and Python; I suppose because they're simple, compact and
efficient. I've used Ruby and Rails a little bit, but I want to
learn more about Ruby because although I can get by in it, I
don't feel like I understand it as well as I'd like to.

I expect most of my Ruby use will be for Rails, but who knows?
Maybe the course will convince me that I should use it instead of
Python for writing standalone scripts. And every language has
strengths and weaknesses, so even if I don't switch to Ruby as
my main scripting language, I'm sure there will be times when it's
the best tool for the job.

Laurel Fan writes:
> Any modern Linux distribution has ruby package(s).  It might even
> already be installed.  (To check, try 'ruby --version' in a shell). If

I'm currently running Ubuntu "Hoary Hedgehog". Although it offers
Ruby packages (which are probably fine for this course), I didn't
have much luck getting Rails and Gems to work, even after adding the
backports from Breezy. Also, the built-in Ruby didn't install a
"ruby" command so I had to make a symlink (I notice someone else
just asked about that on newchix recently).

I ended up building Ruby and Gems from source, then using gems to
install Rails. That worked fine and I've had no problem with the
setup. I suspect this all works out of the box on Breezy.

For my homework assignment, I rewrote in Ruby a little program that
I wrote last week for spam filtering. I suddenly started getting a
lot more spam with unprintable characters in the subject, so I
wanted a program to count how many characters in a set of
strings were printable vs. unprintable. I tried at first to write it
in Python, but it turned out Python doesn't have any easy way to do
the equivalent of C's "isprint" test. Eventually I gave up and wrote
it in C, which was really easy.

After some googling, it turns out Ruby doesn't have a built-in
isprint() either. :-( But I did find a way to delete all unprintable
characters from a string, so I wrote it using that. (I think Python has
that too, so I could probably go back and write it in Python the same way.)
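For comparison with the C version, here is a minimal sketch of a per-character, isprint()-style test in Ruby. It assumes a modern Ruby (2.4 or later, for Regexp#match?), and isprint? and count_printable are hypothetical helper names, not built-ins. The test itself relies on the same POSIX class, [:print:], that the program below uses.

```ruby
# isprint? is a hypothetical helper, not a Ruby built-in.
# It reports whether a one-character string is printable, using the
# POSIX bracket expression [[:print:]].
def isprint?(ch)
  # \A and \z anchor the pattern, so the string must consist of
  # exactly one printable character.
  /\A[[:print:]]\z/.match?(ch)
end

# Count printable characters one at a time, the way a C loop would
# call isprint() on each char.
def count_printable(str)
  str.each_char.count { |c| isprint?(c) }
end

puts isprint?("a")                   # prints true
puts isprint?("\a")                  # prints false (BEL is a control char)
puts count_printable("Hi\x01there")  # prints 7
```

Unlike C's isprint(), which depends on the current locale, this follows Ruby's regex rules for the string's encoding, so in a UTF-8 string a letter such as "é" counts as printable.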

#!/usr/local/bin/ruby

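# This program counts the printable and unprintable characters
# in its command-line arguments.

# SmartString is a String with extra analysis methods.
class SmartString < String
  # Returns [printable_count, unprintable_count] for this string.
  def print_unprint()
    # Delete every character that is not printable; what's left is printable.
    toprint = self.gsub(/[^[:print:]]/, '')
    return [toprint.length, self.length - toprint.length]
  end
end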

# main: loop over each input word

total_printable = 0
total_unprintable = 0
ARGV.each do |word|
  p_u = SmartString.new(word).print_unprint()
  total_printable += p_u[0]
  total_unprintable += p_u[1]
end

print "Total: ", total_printable, " printable, ",
      total_unprintable, " unprintable\n"

You said to analyze part of the code, so I'll analyze the class
SmartString. I made it a new class that inherits from the normal
String class, so that I could use all the built-in string methods.
Eventually I'd probably want to add some other tests (e.g. comparing
how many punctuation marks and digits there are to the number of
letters, maybe checking word length), but right now the only new
method is print_unprint, which returns an array of two items: the
number of printable characters in the string and the number of
unprintable characters.
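The extra tests mentioned above could be sketched as one more SmartString method. This is a hypothetical illustration, not part of the program: the method name char_profile and the choice of POSIX classes ([:alpha:], [:digit:], [:punct:]) are assumptions, and a modern Ruby is assumed for the hash syntax.

```ruby
# Hypothetical extension: count a few kinds of characters at once.
class SmartString < String
  # Returns a Hash of counts for several POSIX character classes.
  def char_profile
    {
      letters:     scan(/[[:alpha:]]/).length,
      digits:      scan(/[[:digit:]]/).length,
      punctuation: scan(/[[:punct:]]/).length,
      unprintable: scan(/[^[:print:]]/).length
    }
  end
end

profile = SmartString.new("Buy n0w!!! 50% off").char_profile
p profile  # letters 8, digits 3, punctuation 4, unprintable 0
```

In the real script, char_profile would simply go inside the existing SmartString class next to print_unprint.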

It does that by using gsub, which does a global substitution over the
string, to replace every unprintable character with '' (i.e. to delete
it). Whatever is left is printable, so the unprintable count is the
original length minus the new length. It turns out Ruby has a character class
called [:print:] (the character classes are listed on p. 72 of the
second edition of Programming Ruby) so [^[:print:]] matches any
character that's not printable. (I found that snippet by googling.
There are lots of useful Ruby snippets on the web if you google for
terms related to what you're trying to do.)
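The deletion trick can be checked against a more direct count. In this sketch (helper names hypothetical, modern Ruby assumed), count_by_deleting mirrors print_unprint, while count_by_scanning uses String#scan to collect the unprintable characters and count them. Both should return the same pair.

```ruby
# Matches any single character that is not printable.
UNPRINTABLE = /[^[:print:]]/

# Same idea as print_unprint: delete the unprintables, compare lengths.
def count_by_deleting(str)
  kept = str.gsub(UNPRINTABLE, '')
  [kept.length, str.length - kept.length]
end

# Alternative: collect the unprintables directly and count them.
def count_by_scanning(str)
  bad = str.scan(UNPRINTABLE).length
  [str.length - bad, bad]
end

# ESC (0x1b) and DEL (0x7f) are the only unprintable characters here.
sample = "Re:\x1b[1m FREE \x7f"
p count_by_deleting(sample)  # => [12, 2]
p count_by_scanning(sample)  # => [12, 2]
```

The scan version avoids building a second copy of the string, though for short subject lines the difference hardly matters.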

I confess I'm not 100% clear on the two sets of brackets. The outer
set says "match any one character in this group" (with the ^ right
after the opening bracket flipping it to "any character NOT in this
group"), and I grok that. But the inner set with the colons, [:...:],
seems to be something you always put around character class names,
and I'm not comfortable enough yet with the syntax to be sure why.
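A small sketch of how the two layers fit together, assuming a modern Ruby: in POSIX-style regexes, [:print:] is a class name that is only recognized inside a bracket expression, so the outer brackets build the set and [:print:] is one member of it. The variable text is just sample input.

```ruby
text = "ab\tc"

# One class inside a set: match any printable character.
p text.scan(/[[:print:]]/)     # => ["a", "b", "c"]

# A leading ^ negates the set: match any character that is NOT printable.
p text.scan(/[^[:print:]]/)    # => ["\t"]

# The outer set can combine a class with other members,
# here "printable characters plus tab".
p text.scan(/[[:print:]\t]/)   # => ["a", "b", "\t", "c"]

# Without the outer brackets, /[:print:]/ would just be an ordinary set
# of the characters : p r i n t, which is why the class always appears
# wrapped in a second pair of brackets.
```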

The return in print_unprint uses [ ] to build up a two-element array
on the fly, so it can return both the printable and unprintable counts.
Then the caller can index the printable count with [0] and the
unprintable count with [1]. If I were actually using this for a
spam filter, instead of printing the count I'd exit with a nonzero
status if there were too many unprintables.
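A sketch of that spam-filter variant, under stated assumptions: the 20% threshold and the helper names spam_status and MAX_UNPRINTABLE_RATIO are hypothetical, and print_unprint is redefined here as a plain method so the block stands alone. It also shows Ruby's multiple assignment, which unpacks the two-element array instead of indexing it with [0] and [1].

```ruby
# Hypothetical threshold: flag input when more than 20% of its
# characters are unprintable.
MAX_UNPRINTABLE_RATIO = 0.2

# Standalone version of print_unprint: [printable, unprintable].
def print_unprint(str)
  kept = str.gsub(/[^[:print:]]/, '')
  [kept.length, str.length - kept.length]
end

# Returns 1 (failure) if too many characters are unprintable, else 0.
def spam_status(words)
  total_printable = 0
  total_unprintable = 0
  words.each do |word|
    # Multiple assignment unpacks the two-element array.
    printable, unprintable = print_unprint(word)
    total_printable += printable
    total_unprintable += unprintable
  end
  total = total_printable + total_unprintable
  return 0 if total.zero?
  total_unprintable.fdiv(total) > MAX_UNPRINTABLE_RATIO ? 1 : 0
end
```

A script built this way would end with `exit spam_status(ARGV)`, so a mail filter could act on the exit status.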

	...Akkana

