[prog] perl question
mc
mcgonzalez at att.net
Fri Jul 25 12:16:34 EST 2003
I am writing a Perl script that takes an mbox-format news server file
and extracts only some of the headers into a tab-separated file that can
be sent to Windows. I think most of it is OK, but the import is not
working properly, and I think it is because I have a few extraneous
carriage returns/line feeds. I have tried various chomps in various
places, and I am stuck because it is still not working :)
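A likely cause, assuming the spool files have DOS/Windows CRLF line endings (the raw files are not shown, so this is a guess): chomp removes only the input record separator $/, which is "\n" by default, so a CRLF line keeps its trailing "\r" no matter where the chomp goes. A small self-contained sketch comparing chomp with a substitution that strips both characters:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A header line as it would be read from a file with CRLF endings.
my $line = "Subject: perl question\r\n";

# chomp removes only $/ (normally "\n"), so the "\r" survives.
my $chomped = $line;
chomp $chomped;
print 'after chomp: ', ($chomped =~ /\r\z/ ? 'still ends in CR' : 'clean'), "\n";

# Removing any run of trailing CR/LF characters handles both LF and CRLF.
my $stripped = $line;
$stripped =~ s/[\r\n]+\z//;
print 'after s///: ', ($stripped =~ /\r\z/ ? 'still ends in CR' : 'clean'), "\n";
```

The stray "\r" is invisible in most editors but still ends up inside the last field of each row, which is one way a Windows import can go wrong.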
My script so far:
#!/usr/bin/perl -w
# Create one big file from all db files
open(STUFF, '>dbfile');
@ALLFLOW = glob('db_*');
foreach $Filename (@ALLFLOW) {
    open(IN, $Filename);
    print STUFF <IN>;
    close(IN);
}
close(STUFF);
# Create a file of just the headers
open(FULL, "dbfile");
open(HEAD, '>>headfile');
while (<FULL>) {
    $ThisLine = $_;
    if ($ThisLine =~ /^\^\~/) {
        ($one, $two, $three, $four, $five) = split(/:/, $ThisLine);
        print HEAD "$four: $five";
    }
    elsif ($ThisLine =~ /Message-ID:/) { print HEAD $ThisLine }
    elsif ($ThisLine =~ /From:/) { print HEAD $ThisLine }
    elsif ($ThisLine =~ /Subject:/) { print HEAD $ThisLine }
    elsif ($ThisLine =~ /^Date:/) { print HEAD $ThisLine }
    elsif ($ThisLine =~ /NNTP-Posting-Host:/) { print HEAD $ThisLine }
    elsif ($ThisLine =~ /^Newsgroups:/) { print HEAD $ThisLine }
    elsif ($ThisLine =~ /User-Agent:/) { print HEAD $ThisLine }
    elsif ($ThisLine =~ /HTTP-Posting-Host:/) { print HEAD $ThisLine }
    elsif ($ThisLine =~ /HTTP-User-Agent:/) { print HEAD $ThisLine }
    elsif ($ThisLine =~ /X-Newsreader:/) { print HEAD $ThisLine }
    elsif ($ThisLine =~ /X-Mailer:/) { print HEAD $ThisLine }
    elsif ($ThisLine =~ /Xref:/) { print HEAD "$ThisLine\n" }
}
close(FULL);
close(HEAD);
# Create list in order, then add it to the end of a csv file with print.
# First, state the array, then print it as the first line of the tab-separated file.
open(HEAD, "headfile");
@headers = (
    'Message-ID', 'From', 'Subject', 'Date', 'NNTP-Posting-Host',
    'HTTP-Posting-Host', 'Newsgroups', 'Newsreader'
);
open(HEADOUT, '>headers.csv');
print HEADOUT "$headers[0]\t$headers[1]\t$headers[2]\t$headers[3]\t" .
    "$headers[4]\t$headers[5]\t$headers[6]\t$headers[7]\n";
close(HEADOUT);
# Now do the testing, order the headers, and add them to the output file
while (<HEAD>) {
    $ThisLine = $_;
    if ($ThisLine =~ /Message-ID:/) {
        ($one, $two) = split(/:/, $ThisLine);
        chomp($two);
        $headers[0] = $two;
    }
    elsif ($ThisLine =~ /From:/) {
        ($one, $two) = split(/:/, $ThisLine);
        chomp($two);
        $headers[1] = $two;
    }
    elsif ($ThisLine =~ /Subject:/) {
        ($one, $two) = split(/:/, $ThisLine);
        chomp($two);
        $headers[2] = $two;
    }
    elsif ($ThisLine =~ /^Date:/) {
        ($one, $two) = split(/:/, $ThisLine);
        chomp($two);
        $headers[3] = $two;
    }
    elsif ($ThisLine =~ /NNTP-Posting-Host:/) {
        ($one, $two) = split(/:/, $ThisLine);
        chomp($two);
        $headers[4] = $two;
    }
    elsif ($ThisLine =~ /HTTP-Posting-Host:/) {
        ($one, $two) = split(/:/, $ThisLine);
        chomp($two);
        $headers[5] = $two;
    }
    elsif ($ThisLine =~ /^Newsgroups:/) {
        ($one, $two) = split(/:/, $ThisLine);
        chomp($two);
        $headers[6] = $two;
    }
    elsif ($ThisLine =~ /User-Agent:/) {
        ($one, $two) = split(/:/, $ThisLine);
        chomp($two);
        $headers[7] = $two;
    }
    elsif ($ThisLine =~ /HTTP-User-Agent:/) {
        ($one, $two) = split(/:/, $ThisLine);
        chomp($two);
        $headers[7] = $two;
    }
    elsif ($ThisLine =~ /X-Newsreader:/) {
        ($one, $two) = split(/:/, $ThisLine);
        chomp($two);
        $headers[7] = $two;
    }
    elsif ($ThisLine =~ /X-Mailer:/) {
        ($one, $two) = split(/:/, $ThisLine);
        chomp($two);
        $headers[7] = $two;
    }
    elsif ($ThisLine =~ /Xref:/) {
        open(HEADOUT, '>>headers.csv');
        print HEADOUT "$headers[0]\t$headers[1]\t$headers[2]\t$headers[3]\t" .
            "$headers[4]\t$headers[5]\t$headers[6]\t$headers[7]\n";
        close(HEADOUT);
    }
}
close(HEAD);
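Two more things in the last part that can corrupt rows independently of line endings. First, split(/:/, $ThisLine) cuts at every colon, so "Date: Fri, 25 Jul 2003 12:16:34 -0500" leaves $two as " Fri, 25 Jul 2003 12", and a "Re:" subject is truncated the same way. Second, @headers is never reset, so a message that lacks a header inherits the previous message's value (or, for the first message, the column label). A minimal sketch of that part under these assumptions: it splits on the first colon only (split with a LIMIT of 2), strips LF or CRLF, and collects each message in a fresh hash. The headers_to_tsv and %column_for names are hypothetical; the column order follows the @headers list above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Output columns, in the same order as the original @headers list.
my @columns = ('Message-ID', 'From', 'Subject', 'Date', 'NNTP-Posting-Host',
               'HTTP-Posting-Host', 'Newsgroups', 'Newsreader');

# Hypothetical mapping from lower-cased header name to output column.
# Several client headers all feed the single "Newsreader" column.
my %column_for = (
    'message-id'        => 'Message-ID',
    'from'              => 'From',
    'subject'           => 'Subject',
    'date'              => 'Date',
    'nntp-posting-host' => 'NNTP-Posting-Host',
    'http-posting-host' => 'HTTP-Posting-Host',
    'newsgroups'        => 'Newsgroups',
    'user-agent'        => 'Newsreader',
    'http-user-agent'   => 'Newsreader',
    'x-newsreader'      => 'Newsreader',
    'x-mailer'          => 'Newsreader',
);

# Read "Name: value" lines from $in and write a tab-separated header row,
# then one row per message, emitted when the message's Xref: line is seen.
sub headers_to_tsv {
    my ($in, $out) = @_;
    print {$out} join("\t", @columns), "\n";
    my %record;                                        # values for the current message
    while (my $line = <$in>) {
        $line =~ s/[\r\n]+\z//;                        # strip LF or CRLF
        my ($name, $value) = split /:\s*/, $line, 2;   # split on the FIRST colon only
        next unless defined $value;                    # blank or colon-less line
        $name = lc $name;
        if ($name eq 'xref') {
            print {$out} join("\t", map { $record{$_} // '' } @columns), "\n";
            %record = ();                              # do not leak values into the next row
        }
        elsif (my $col = $column_for{$name}) {
            $value =~ tr/\t/ /;                        # a tab inside a value would shift columns
            $record{$col} = $value;
        }
    }
}

if (@ARGV) {
    open my $in,  '<', $ARGV[0]      or die "Cannot open $ARGV[0]: $!";
    open my $out, '>', 'headers.csv' or die "Cannot write headers.csv: $!";
    headers_to_tsv($in, $out);
    close $out or die "Cannot close headers.csv: $!";
}
```

Run it as, for example, perl headers_to_tsv.pl headfile (the script name is made up); with no argument it does nothing. Missing headers come out as empty fields instead of stale values.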
I am 98.7% sure the breakdown is in the second part, where I take the
dbfile and produce a file of just the headers. If I put a chomp right
after the line is read in (i.e. right after $ThisLine = $_), the script
finishes with no output in my final csv file, and no errors.
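A plausible explanation for that symptom, reading the code above (not verified against the real data): once $ThisLine is chomped, the print HEAD statements no longer write a newline, except for the Xref: line, so headfile ends up with one long line per message. In the third part each of those lines matches /Message-ID:/ first, the elsif chain stops there, and the Xref: branch that writes a row never runs. A sketch of the second part that strips whatever line ending came in and adds exactly one "\n" on the way out, assuming the same header list; the filter_headers name is hypothetical, the patterns are anchored with ^ (an assumption the original only makes for Date: and Newsgroups:), and the special /^\^\~/ branch is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Header names kept by the second part, anchored at the start of the line.
my $wanted = qr/^(?:Message-ID|From|Subject|Date|NNTP-Posting-Host|Newsgroups
                  |User-Agent|HTTP-Posting-Host|HTTP-User-Agent|X-Newsreader
                  |X-Mailer|Xref):/ix;

# Copy wanted header lines from $in to $out, one header per line.
# The input ending (LF or CRLF) is stripped and exactly one "\n" is written
# back, so the next stage can still read the file line by line.
sub filter_headers {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        $line =~ s/[\r\n]+\z//;
        next unless $line =~ $wanted;
        print {$out} "$line\n";
        print {$out} "\n" if $line =~ /^Xref:/i;   # blank line after each message, as before
    }
}

if (@ARGV) {
    open my $in,  '<',  $ARGV[0]   or die "Cannot open $ARGV[0]: $!";
    open my $out, '>>', 'headfile' or die "Cannot append to headfile: $!";
    filter_headers($in, $out);
    close $out or die "Cannot close headfile: $!";
}
```

The key design point is that stripping and re-adding the newline happen together: chomping on input without printing "\n" on output is what merges the headers into one line.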
I created a file in gedit and cut and pasted some of the raw headers into
it. When I use that as the input file for the second part, the script runs
perfectly and I get exactly the output I expect.
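That would fit the CRLF theory: a file saved from gedit normally has plain LF endings, while the real dbfile may contain "\r" characters that did not survive the copy and paste. One way to check, sketched as a hypothetical helper (count_line_endings is a made-up name), is to count lines ending in CRLF versus a bare LF:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count lines that end in CRLF versus a bare LF on an already-open handle.
# A final line with no newline at all is not counted.
sub count_line_endings {
    my ($fh) = @_;
    my ($crlf, $lf) = (0, 0);
    while (my $line = <$fh>) {
        if    ($line =~ /\r\n\z/) { $crlf++ }
        elsif ($line =~ /\n\z/)   { $lf++ }
    }
    return ($crlf, $lf);
}

if (@ARGV) {
    my $path = $ARGV[0];
    # :raw keeps a Windows perl from translating CRLF to "\n" while reading
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my ($crlf, $lf) = count_line_endings($fh);
    close $fh;
    print "$path: $crlf CRLF lines, $lf LF-only lines\n";
}
```

For example, perl check_eol.pl dbfile (the script name is made up). A non-zero CRLF count for dbfile and zero for the gedit file would explain why only the pasted file works.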
Am I making any sense here????? Any help would be most appreciated :)
--
mc
I haven't lost my mind,
It is backed up on disk somewhere.
More information about the Programming mailing list