[prog] perl string question

k.clair k at klerp.net
Thu Feb 27 20:44:04 EST 2003


Hello...

I'm just going to do the string bit and not the printing bit because the
string bit is long enough...

well, you have a couple of options:

--you can still use split:
    since you know that the string has 4 colons, you know that split
    will break it into 5 pieces:
    ($one, $two, $three, $four, $five) = split(/:/, $yourstring);

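    then you will of course need to put the colons back into the strings
    you print (FILE here is a filehandle you've already opened for
    writing).  $five still starts with the space that followed
    "Message-ID:", so don't add another one:

    print FILE "$one:$two:$three:\n";
    print FILE "$four:$five\n";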
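Perl's split also takes an optional third argument, LIMIT, which caps how many fields it returns. With a limit of 4, everything after the third colon stays together in the last field, colons and all, so nothing has to be glued back together except the first three pieces. A minimal sketch of that variant (the sub name split_after_third_colon is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: split a header line after its third colon.
# The third argument to split (LIMIT = 4) means "return at most four
# fields", so the fourth field keeps everything after the third colon,
# including the colon in "Message-ID:".
sub split_after_third_colon {
    my ($line) = @_;
    my ($one, $two, $three, $rest) = split /:/, $line, 4;
    return unless defined $rest;    # fewer than three colons: nothing to split
    return ("$one:$two:$three:", $rest);
}

my $line = '^~00002439:0000007179:000016:Message-ID: <3E33C3BD.21CD8BB6 at att.net>';
my ($first, $second) = split_after_third_colon($line);

# Print the two parts on separate lines.  To write to a file instead,
# print to a filehandle you opened for writing, e.g. print {$fh} "$first\n";
print "$first\n$second\n";
```

For the sample line this prints "^~00002439:0000007179:000016:" on one line and "Message-ID: <3E33C3BD.21CD8BB6 at att.net>" on the next, which is the layout mc asked for.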

--you can use a regular expression:

    quantifiers in perl regular expressions, like + and *, are "greedy",
    which means they match as much of the string as they can while still
    letting the whole regex match, so--

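    $string = '^~00002439:0000007179:000016:Message-ID: <3E33C3BD.21CD8BB6 at att.net>';
    if ($string =~ /^(.*:)(\s.*)$/) {
        $first_part = $1;
            ### $first_part is "^~00002439:0000007179:000016:Message-ID:"
        $second_part = $2;
            ### $second_part is " <3E33C3BD.21CD8BB6 at att.net>"
    }
        ### only use $1 and $2 after a match that succeeded; if the match
        ### fails they still hold whatever the last successful match left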

    what's going on here is--
    -- the parentheses in the regex tell perl to remember the part of
    the string that matched the pattern inside them.  perl stores those
    pieces in the special variables $1, $2, $3, etc., numbered in the
    order of the opening parentheses in the regex.
    -- the ^ at the beginning of the regex means that perl should start
    at the beginning of the string (which it would anyway but i always
    include it for good measure... sometimes it really is necessary)
    -- the $ at the end of the regex means the match has to run all the
    way to the end of the string (or to just before a newline at the
    very end)
    -- the . means to match any character except a newline (spaces
    included)
    -- the * means to match the pattern before it (which in this case is
    the .) zero or more times, as many times as possible, so .* means to
    match as long a run of characters as it can
    -- \s matches a single whitespace character (a space, tab, newline,
    etc.)
    -- so perl starts at the beginning of the string, and the greedy .*
    first swallows the whole line.  then perl backs up one character at a
    time until the rest of the regex (a colon, then whitespace, then
    anything up to the end) can match.  backing up from the end, the
    first colon it reaches is the one after "Message-ID", and that one is
    followed by a space, so the match succeeds right there.  the earlier
    colons (after "00002439", "0000007179" and "000016") have no space
    after them, so they couldn't have matched \s anyway.  if the regex had
    no \s at all, it would still split at that last colon, because the
    greedy .* keeps as many characters as it can while still letting the
    rest of the regex match.  (if you leave the \s in and the line has no
    whitespace after any colon, though, the match simply fails.)
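To see the greediness for yourself, here's a small sketch comparing a greedy .* with the non-greedy .*? on the same line, with and without the \s after the colon. The variable names are made up for the example:

```perl
use strict;
use warnings;

my $string = '^~00002439:0000007179:000016:Message-ID: <3E33C3BD.21CD8BB6 at att.net>';

# Greedy: .* first takes the whole string, then gives characters back
# until a colon can match, so it stops at the LAST colon in the line.
my ($greedy) = $string =~ /^(.*:)/;

# Non-greedy: .*? starts with as few characters as possible and takes
# more only when it has to, so it stops at the FIRST colon.
my ($lazy) = $string =~ /^(.*?:)/;

# Requiring whitespace after the colon rules out every colon except the
# one after "Message-ID", so both forms end up at the same place.
my ($greedy_ws) = $string =~ /^(.*:)\s/;
my ($lazy_ws)   = $string =~ /^(.*?:)\s/;

print "greedy:          $greedy\n";
print "non-greedy:      $lazy\n";
print "greedy + \\s:     $greedy_ws\n";
print "non-greedy + \\s: $lazy_ws\n";
```

On the sample line the greedy form captures "^~00002439:0000007179:000016:Message-ID:", the non-greedy form only "^~00002439:", and the two \s versions both capture the same thing as the greedy one.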

    well that was a bit of a crash course in regular expressions! let me
    know if you have any questions about that!!

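    one thing to watch: this regex splits at the last colon that's
    followed by a space, so "Message-ID:" ends up in $first_part instead
    of starting the second line the way your example output does.  to
    break the line after the third colon you need a pattern that counts
    colons.  in general you'd want a regex that's a little more specific
    anyway, so that erratic data in your file doesn't match by accident,
    but that should get you started...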
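Here's a minimal sketch of a more specific pattern, one that breaks the line right after its third colon the way mc's example output does, plus a loop that copies a whole file through it. The sub names split_header and rewrite_headers and the sample text are hypothetical; the in-memory filehandle just stands in for a real input file.

```perl
use strict;
use warnings;

# Hypothetical helper: split a header line right after its third colon.
#   ^               start of the line
#   (?:[^:]*:){3}   three times: a run of non-colon characters, then a colon
#   (.*)$           whatever is left, up to the end of the line
# Because [^:] can't match a colon, the first group can't run past the
# third colon, however the rest of the line looks.
sub split_header {
    my ($line) = @_;
    return ($1, $2) if $line =~ /^((?:[^:]*:){3})(.*)$/;
    return;    # fewer than three colons: no match, return an empty list
}

# Copy lines from $in to $out, putting each split line's two parts on
# separate lines; lines with fewer than three colons pass through as-is.
sub rewrite_headers {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        my @parts = split_header($line);
        if (@parts) {
            print {$out} "$parts[0]\n$parts[1]\n";
        } else {
            print {$out} "$line\n";
        }
    }
}

# Example run with made-up input held in a string (an in-memory filehandle)
# and output to the screen.  For real files you'd use something like
#   open my $in,  '<', 'headers.txt' or die "can't read headers.txt: $!";
#   open my $out, '>', 'working.txt' or die "can't write working.txt: $!";
my $sample = "^~00002439:0000007179:000016:Message-ID: <3E33C3BD.21CD8BB6 at att.net>\n"
           . "Subject: perl string question\n";
open my $sample_in, '<', \$sample or die "can't open in-memory input: $!";
rewrite_headers($sample_in, \*STDOUT);
close $sample_in;
```

Checking the match before using $1 and $2 (the `if` on the regex) means a line that doesn't look like a numbered header is copied through unchanged instead of picking up leftovers from an earlier match.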

hope that helps a little,
kristina
    

On Thu, Feb 27, 2003 at 05:22:51PM -0700, mc wrote:
- Hi all!  I am writing a perl script that eventually will return a file
- that can be used to pull some posting statistics from an nntp server. 
- Amazing since I am teaching myself perl at the same time (can't wait for
- the course to start here!)
- 
- Anyway.  I think I have made good progress in that I have the output
- file to the point where I have all the header information I need for
- each post on the server.  My problem is that the first line of the
- headers contains two bits of info, and I would like to separate them. 
- An example:
- 
- ^~00002439:0000007179:000016:Message-ID: <3E33C3BD.21CD8BB6 at att.net>
- 
- I would like to output this to my working file so it ends up as:
- 
- ^~00002439:0000007179:000016:
- Message-ID: <3E33C3BD.21CD8BB6 at att.net>
- 
- I thought I could do this with the split function, but since I only want
- to split it at the third ":" I am not sure how to write that.  I am also
- lost as to how then to print the results to my file.  I am sure it is a
- print statement, just a bit lost :)
- 
- Once I get this figured out, I am pretty sure I have the next part
- figured out and am almost done. 
- 
- Any ideas or pointers to the best way to do this would be very much
- appreciated!!!
- 
- 
- -- 
- mc
- I haven't lost my mind,
- It is backed up on disk somewhere.
- 4M
- 
- _______________________________________________
- Programming mailing list
- Programming at linuxchix.org
- http://mailman.linuxchix.org/mailman/listinfo/programming

### my gpg key can be found here:
http://www.klerp.net/gpgkey
lynx --dump --source http://www.klerp.net/gpgkey | gpg --import
Key fingerprint = 6B2F AB26 A8A9 DE4D 91FD  8E93 7A6B 387A 2795 714B