[Courses] [Ruby] Lesson 0: Installing, References, and your first homework assignment

Laurel Fan laurel.fan at gmail.com
Thu Nov 17 10:28:02 EST 2005


On 11/9/05, Akkana Peck <akkana at shallowsky.com> wrote:
> Hi!

Hi Akkana!

> For my homework assignment, I rewrote in Ruby a little program that
> I wrote last week for spam filtering. I suddenly started getting a
> lot more spam with unprintable characters in the subject, so I
> wanted a program to calculate how many characters in a set of
> strings were printable vs. unprintable. I tried at first to write it
> in Python, but it turned out Python doesn't have any easy way to do
> the equivalent of C's "isprint" test. Eventually I gave up and wrote
> it in C, which was really easy.
>
> Googling, it turns out Ruby doesn't have a way to do isprint()
> either. :-( But I did find a way to delete all unprintable characters
> from a string, so I wrote it using that. (I think Python has that too,
> so I could probably go back and write it in Python the same way.)
>
> #!/usr/local/bin/ruby
>
> # This program checks its runtime arguments for number of
> # printable and unprintable characters.
>
> class SmartString < String
>   def print_unprint()
>     toprint = self.gsub(/[^[:print:]]/, '')
>     return [toprint.length, self.length - toprint.length]
>   end
> end
>
> # main: loop over each input word
>
> total_printable = 0
> total_unprintable = 0
> ARGV.each do |word|
>   p_u = SmartString.new(word).print_unprint()
>   total_printable += p_u[0]
>   total_unprintable += p_u[1]
> end
>
> print "Total: ", total_printable, " printable, ",
>       total_unprintable, " unprintable\n"
>
> You said to analyze part of the code, so I'll analyze the class
> SmartString. I made it a new class that inherits from the normal
> String class, so that I could use all the built-in string methods.
> Eventually I'd probably want to add some other tests (e.g. checking
> how much punctuation and how many numbers there are compared to letters, maybe
> checking word length) but right now the only new method is a
> function called print_unprint that returns an array of two items:
> the number of printable characters in the string, and the number of
> unprintable characters.

You could have done this by extending the String class itself (in the
context of your program only, of course).  We'll learn how later.
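For the curious, here is a minimal sketch of what that might look like. Reopening the built-in String class adds the method to every string in the running program, so neither the SmartString subclass nor the SmartString.new wrapper is needed. The method body is Akkana's; the surrounding loop is just one possible arrangement.

```ruby
# Reopen the built-in String class and add a method to it.
# This affects every String in this running program (and any
# libraries it loads), not just strings we create ourselves.
class String
  # Returns [printable_count, unprintable_count].
  def print_unprint
    printable = gsub(/[^[:print:]]/, '')
    [printable.length, length - printable.length]
  end
end

total_printable = 0
total_unprintable = 0
ARGV.each do |word|
  # Each command-line argument is already a String, so it has the method.
  printable, unprintable = word.print_unprint
  total_printable += printable
  total_unprintable += unprintable
end

puts "Total: #{total_printable} printable, #{total_unprintable} unprintable"
```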

> It does that by replacing (using gsub, which does a global
> substitution over the string) any unprintable character in the string
> with '' (i.e. deleting it). It turns out Ruby has a character class
> called [:print:] (the character classes are listed on p. 72 of the
> second edition of Programming Ruby) so [^[:print:]] matches any
> character that's not printable. (I found that snippet by googling.
> There are lots of useful Ruby snippets on the web if you google for
> terms related to what you're trying to do.)
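A small illustration of why the "global" part matters: sub replaces only the first match, while gsub replaces every one, which is what deleting all unprintable characters needs. The sample string is made up.

```ruby
s = "a\tb\tc"

# sub replaces only the first unprintable character (the first tab)...
p s.sub(/[^[:print:]]/, '')    # => "ab\tc"

# ...while gsub replaces every match, so all the tabs are deleted.
p s.gsub(/[^[:print:]]/, '')   # => "abc"
```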

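The character classes supported by Ruby are actually standard POSIX
character classes, and I think they work in other places like Perl and
grep too.  (Not everywhere, though: Python's re module doesn't support
them, and Java spells [:print:] as \p{Print}.)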

> I confess I'm not 100% clear on the two sets of brackets: the outside
> set says "for this regular expression, use any character in this
> group" and I grok that, but the inner set with the colons, [::],
> seems to be something you always put around character classes
> but I'm not comfortable enough yet with the syntax to be sure why.

It's just part of the regular expression syntax: a name like [:print:]
is only recognized inside an ordinary bracket expression [...].  The
colons make the difference between "the printable-character class"
(what C's isprint tests) and "match the characters p, r, i, n, t"
really obvious.  I don't know of any reason they chose that exact
syntax, but I'm not a regex expert.  The character classes are pretty
standard.
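One practical payoff of the two-level brackets is that a named class is just one member of an ordinary bracket expression, so it can be mixed with other characters or negated with a leading ^. A quick sketch with a made-up string:

```ruby
s = "a1 b2_\t!"

# Outer brackets: the bracket expression. Inner [:digit:]: a named class.
p s.scan(/[[:digit:]]/)    # => ["1", "2"]

# The named class can share the brackets with other characters...
p s.scan(/[[:digit:]_]/)   # => ["1", "2", "_"]

# ...or be negated, as in Akkana's program. The space counts as printable.
p s.scan(/[^[:print:]]/)   # => ["\t"]
```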

> The return in print_unprint uses [ ] to build up a two-element array
> on the fly, so it can return both the printable and unprintable counts.
> Then the caller can index the printable count with [0] and the
> unprintable count with [1]. If I were actually using this for a
> spam filter, instead of printing the count I'd exit with a nonzero
> status if there were too many unprintables.
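Two Ruby idioms fit here. A returned array can be unpacked straight into variables with parallel assignment, which reads a bit better than indexing with [0] and [1]. And exit takes a status code, so the spam-filter version might look roughly like this sketch. The MAX_UNPRINTABLE threshold, the too_many_unprintables? helper, and the choice of exit status 1 are all hypothetical, not part of Akkana's program.

```ruby
# Hypothetical cutoff: how many unprintable characters we tolerate.
MAX_UNPRINTABLE = 5

# Same counting logic as print_unprint, written as a plain method.
def print_unprint(str)
  printable = str.gsub(/[^[:print:]]/, '')
  [printable.length, str.length - printable.length]
end

# Hypothetical helper: true when the words contain too many unprintables.
def too_many_unprintables?(words)
  total = 0
  words.each do |word|
    # Parallel assignment unpacks the returned two-element array.
    _printable, unprintable = print_unprint(word)
    total += unprintable
  end
  total > MAX_UNPRINTABLE
end

# Exit with status 1 ("looks like spam") when over the limit; otherwise
# the script simply ends, which exits with status 0.
exit 1 if too_many_unprintables?(ARGV)
```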

Looks good, and good analysis!  In the more-than-one-way department,
you could have used the String#scan method, documented here:
http://www.ruby-doc.org/core/classes/String.html#M001414

You would want something like:
input.scan(/[[:print:]]/).length

scan returns an array of matches, so to count the matches we're
calling length on the array it returns.
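Put together, a scan-based version of print_unprint might look like the sketch below; it counts matches instead of deleting characters. The sample string (an escape sequence plus a bell character) is made up.

```ruby
# Count printable and unprintable characters with String#scan.
# scan returns an array of every match, so its length is the count.
def print_unprint(str)
  printable = str.scan(/[[:print:]]/).length
  [printable, str.length - printable]
end

# "\e" (escape) and "\x07" (bell) are the two unprintable characters here.
p print_unprint("Re: \e[1mWIN\x07")   # => [10, 2]
```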

String#count also does something close to what you want, but it won't
work directly here because it takes tr-style character sets (like
"a-z") rather than a regular expression.  See
http://www.ruby-doc.org/core/classes/String.html#M001414
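That said, for plain ASCII text the tr-style range " -~" (space through tilde, 0x20 to 0x7E) covers exactly the printable characters, so a rough count-based version is possible. This is a sketch under that ASCII-only assumption: a printable non-ASCII character such as "é" falls outside the range and would be counted as unprintable.

```ruby
s = "hi\tthere\e"

# Count characters in the range space (0x20) through ~ (0x7E).
printable = s.count(" -~")
unprintable = s.length - printable

p [printable, unprintable]   # => [7, 2]
```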

One of Ruby's philosophies is that common tasks should be easy, so the
built-in library provides methods for them (even when doing it the
"normal" way would take only 3 lines).  There are lots of useful
little methods like this scattered around the built-in classes.

--
Laurel Fan
http://dreadnought.gorgorg.org

