[prog] perl question

mc mcgonzalez at att.net
Fri Jul 25 12:16:34 EST 2003


I am writing a Perl script that will take an mbox-type news server file
and extract only some of the headers into a tab-separated file that can
be sent to Windows.  I think I have most of it OK, but I am not getting
the import to work properly, and I think it is because I may have a few
extraneous carriage returns/line feeds.  I have tried various chomps in
various places, and I am stuck because it is still not working :)
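The carriage-return suspicion is easy to check in isolation. chomp removes only a trailing copy of the input record separator $/, which is "\n" by default on Unix, so a line that ends in "\r\n" still ends in "\r" after chomp. This sketch assumes the input really has DOS-style "\r\n" endings; strip_eol is a hypothetical helper name, not part of the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# chomp() removes only a trailing "\n" (the default $/).
# A line that arrived as "...\r\n" keeps its "\r" afterwards.
my $line = "Subject: perl question\r\n";
chomp(my $chomped = $line);
printf "after chomp: %s\n", ($chomped =~ /\r\z/ ? 'still ends in CR' : 'clean');

# Hypothetical helper: strip any trailing run of "\r" and "\n" characters,
# which covers "\n", "\r\n" and a stray "\r" on the last line.
sub strip_eol {
    my ($s) = @_;
    $s =~ s/[\r\n]+\z//;
    return $s;
}

printf "after strip_eol: %s\n", (strip_eol($line) =~ /\r/ ? 'still has CR' : 'clean');
```

If the first line of output reports a leftover CR, chomp alone cannot clean these files, and each line needs the "\r" removed as well.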

My script so far:
#!/usr/bin/perl -w

#Create one big file from all db files

open(STUFF, '>dbfile');
@ALLFLOW=glob('db_*');
foreach $Filename(@ALLFLOW) {
	open(IN, $Filename);
	print STUFF <IN>;
	close (IN);
}
close (STUFF);


# create a file of just the headers

open(FULL, "dbfile");
open (HEAD, '>>headfile');
while (<FULL>){
	$ThisLine = $_;

	if ($ThisLine =~ /^\^\~/)
		{($one, $two, $three, $four, $five) = split(/:/, $ThisLine);
    		print HEAD "$four: $five";
		}

	elsif ($ThisLine =~ /Message-ID:/)
		{print HEAD $ThisLine}
	
	elsif ($ThisLine =~ /From:/)
		{print HEAD $ThisLine}

	elsif ($ThisLine =~ /Subject:/)
		{print HEAD $ThisLine}

	elsif ($ThisLine =~ /^Date:/)
		{print HEAD $ThisLine}

	elsif ($ThisLine =~ /NNTP-Posting-Host:/)
		{print HEAD $ThisLine}

	elsif ($ThisLine =~ /^Newsgroups:/)
		{print HEAD $ThisLine}

	elsif ($ThisLine =~ /User-Agent:/)
		{print HEAD $ThisLine}

	elsif ($ThisLine =~ /HTTP-Posting-Host:/)
		{print HEAD $ThisLine}

	elsif ($ThisLine =~ /HTTP-User-Agent:/)
		{print HEAD $ThisLine}

	elsif ($ThisLine =~ /X-Newsreader:/)
		{print HEAD $ThisLine}

	elsif ($ThisLine =~ /X-Mailer:/)
		{print HEAD $ThisLine}

	elsif ($ThisLine =~ /Xref:/)
		{print HEAD "$ThisLine\n"}

	}

close(FULL);
close(HEAD);

#Create list in order then add it to the end of a csv file with print
#First, state the array, then print it as first line of tab separated file

open(HEAD, "headfile");

@headers=(
	'Message-ID', 'From', 'Subject', 'Date', 'NNTP-Posting-Host',
	'HTTP-Posting-Host', 'Newsgroups', 'Newsreader'
	);

open(HEADOUT, '>headers.csv');
print HEADOUT "$headers[0]\t$headers[1]\t$headers[2]\t$headers[3]\t" . 
  "$headers[4]\t$headers[5]\t$headers[6]\t$headers[7]\n";

close(HEADOUT);

#Now do the testing and order the headers and add to output file

while (<HEAD>) {
	$ThisLine = $_;

	if ($ThisLine =~ /Message-ID:/)
		{($one, $two) = split(/:/,$ThisLine);
		chomp($two);
		$headers[0] = $two;
		}

	elsif ($ThisLine =~ /From:/)
		{($one, $two) = split(/:/,$ThisLine);
		chomp($two);
		$headers[1] = $two;
		}

	elsif ($ThisLine =~ /Subject:/)
		{($one, $two) = split(/:/,$ThisLine);
		chomp($two);
		$headers[2] = $two;
		}

	elsif ($ThisLine =~ /^Date:/)
		{($one, $two) = split(/:/,$ThisLine);
		chomp($two);
		$headers[3] = $two;
		}

	elsif ($ThisLine =~ /NNTP-Posting-Host:/)
		{($one, $two) = split(/:/,$ThisLine);
                chomp($two);
		$headers[4] = $two;
		}

	elsif ($ThisLine =~ /HTTP-Posting-Host:/)
		{($one, $two) = split(/:/,$ThisLine);
                chomp($two);
		$headers[5] = $two;
		}

	elsif ($ThisLine =~ /^Newsgroups:/)
		{($one, $two) = split(/:/,$ThisLine);
                chomp($two);
		$headers[6] = $two;
		}

	elsif ($ThisLine =~ /User-Agent:/)
		{($one, $two) = split(/:/,$ThisLine);
                chomp($two);
		$headers[7] = $two;
		}

	elsif ($ThisLine =~ /HTTP-User-Agent:/)
		{($one, $two) = split(/:/,$ThisLine);
                chomp($two);
		$headers[7] = $two;
		}

	elsif ($ThisLine =~ /X-Newsreader:/)
		{($one, $two) = split(/:/,$ThisLine);
                chomp($two);
		$headers[7] = $two;
		}

	elsif ($ThisLine =~ /X-Mailer:/)
		{($one, $two) = split(/:/,$ThisLine);
                chomp($two);
		$headers[7] = $two;
		}

	elsif ($ThisLine =~ /Xref:/)
		{open(HEADOUT, '>>headers.csv');
		print HEADOUT "$headers[0]\t$headers[1]\t$headers[2]\t$headers[3]\t" .
		  "$headers[4]\t$headers[5]\t$headers[6]\t$headers[7]\n";
		close(HEADOUT);
		}
	}
close(HEAD);
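One possible way to fold the two header passes into a single loop is sketched here. It is a sketch under assumptions, not a drop-in replacement: headers_to_tsv, %column_for and the sample message are hypothetical, and, like the original, it treats Xref as the last header of each message and ignores the ^~ lines. It changes a few things that may matter: line endings are stripped with s/[\r\n]+\z// instead of chomp; each line is split only at its first colon (the original split(/:/) cuts a Date value such as "Fri, 25 Jul 2003 12:16:34" off at "12"); header names are matched only at the start of a line; and the fields are cleared after every message, whereas the original @headers keeps values from the previous message (or the column titles, for the first one) whenever a header is missing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Output columns, in the same order as the original @headers array.
my @columns = ('Message-ID', 'From', 'Subject', 'Date', 'NNTP-Posting-Host',
               'HTTP-Posting-Host', 'Newsgroups', 'Newsreader');

# Wanted header name (lower-cased) => output column.
# Several client headers all feed the single 'Newsreader' column.
my %column_for = (
    'message-id'        => 'Message-ID',
    'from'              => 'From',
    'subject'           => 'Subject',
    'date'              => 'Date',
    'nntp-posting-host' => 'NNTP-Posting-Host',
    'http-posting-host' => 'HTTP-Posting-Host',
    'newsgroups'        => 'Newsgroups',
    'user-agent'        => 'Newsreader',
    'http-user-agent'   => 'Newsreader',
    'x-newsreader'      => 'Newsreader',
    'x-mailer'          => 'Newsreader',
);

# Read header lines from $in and write one tab-separated row per message to $out.
sub headers_to_tsv {
    my ($in, $out) = @_;
    print {$out} join("\t", @columns), "\n";
    my %row;
    while (my $line = <$in>) {
        $line =~ s/[\r\n]+\z//;                      # drop "\n" and any "\r" before it
        next unless $line =~ /^([\w-]+):\s*(.*)$/;   # "Name: value" at the start of the line
        my ($name, $value) = (lc $1, $2);            # name stops at the first colon only
        $value =~ tr/\t/ /;                          # a tab inside a value would shift columns
        if ($name eq 'xref') {                       # as in the original, Xref ends a message
            print {$out} join("\t", map { defined $row{$_} ? $row{$_} : '' } @columns), "\n";
            %row = ();                               # do not carry values into the next message
        }
        elsif (exists $column_for{$name}) {
            $row{ $column_for{$name} } = $value;
        }
    }
}

# Small in-memory sample (hypothetical data) so the sketch runs on its own.
my $sample = "Message-ID: <abc\@example.com>\r\n"
           . "From: mc <mc\@example.net>\r\n"
           . "Date: Fri, 25 Jul 2003 12:16:34 EST\r\n"
           . "Xref: news.example.com misc.test:1\r\n";
open(my $in, '<', \$sample) or die "cannot open sample: $!";
headers_to_tsv($in, \*STDOUT);
close($in);
```

To run it on the real data, open dbfile for reading and headers.csv for writing and pass those two handles instead of the sample. A body line that happens to look like "Name: value" can still be picked up, just as in the original.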

I am 98.7% sure my breakdown is in the second part, where I take the
dbfile and get out a file of just headers.  If I put in a chomp right
after each line is read in (i.e., right after $ThisLine = $_), the
script runs without writing any output to my final CSV file, but also
without any errors.
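A possible explanation for the empty CSV, assuming the chomp went into the loop that writes headfile: once "\n" is removed, every header of a message is printed onto one physical line, and only the Xref branch writes a "\n" back. When the last loop reads that combined line, /Message-ID:/ matches first, so the elsif chain never reaches the Xref branch and no row is written. with_chomp and with_strip are hypothetical helpers that simulate both approaches on in-memory strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Header lines as they might appear in dbfile; the "\r\n" endings are an assumption.
my @raw = ("Message-ID: <m1\@x>\r\n", "Subject: hi\r\n", "Xref: host g:1\r\n");

# Mirrors the second part with chomp added: chomp removes only "\n",
# and only the Xref branch writes a "\n" back.
sub with_chomp {
    my $out = '';
    for my $line (@_) {
        chomp(my $t = $line);
        $out .= ($t =~ /Xref:/) ? "$t\n" : $t;
    }
    return $out;
}

# Alternative: remove the whole line ending, then always write "\n" explicitly.
sub with_strip {
    my $out = '';
    for my $line (@_) {
        (my $t = $line) =~ s/[\r\n]+\z//;
        $out .= "$t\n";
    }
    return $out;
}

my @one   = split /\n/, with_chomp(@raw);
my @three = split /\n/, with_strip(@raw);
printf "with chomp: %d line(s); with strip: %d line(s)\n", scalar @one, scalar @three;
```

In the script itself the equivalent change would be to strip the ending once and then print "$ThisLine\n" in every branch, rather than printing the chomped line as-is.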


I created a file in gedit and pasted some of the raw headers into it.
When I use that as my input file for the second part, the script runs
perfectly and I get exactly the output I expect.
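Since the hand-pasted gedit file works and the real dbfile does not, comparing their line endings directly may confirm the carriage-return theory. This sketch counts CRLF and LF-only lines in each file named on the command line; count_line_endings is a hypothetical name, and the binmode call is deliberate so that Perl performs no newline translation while reading.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how many lines in a file end in "\r\n" versus a bare "\n".
sub count_line_endings {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    binmode($fh);                     # read bytes as-is, no newline translation
    my ($crlf, $lf) = (0, 0);
    while (my $line = <$fh>) {
        if    ($line =~ /\r\n\z/) { $crlf++ }
        elsif ($line =~ /\n\z/)   { $lf++ }
    }
    close($fh);
    return ($crlf, $lf);
}

for my $path (@ARGV) {
    my ($crlf, $lf) = count_line_endings($path);
    print "$path: $crlf CRLF line(s), $lf LF-only line(s)\n";
}
```

For example, perl count_eol.pl dbfile pasted.txt (the script name is made up). A large CRLF count for dbfile and none for the gedit file would point straight at the "\r" characters.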

Am I making any sense here?  Any help would be most appreciated :)



-- 
mc
I haven't lost my mind,
It is backed up on disk somewhere.
9M



More information about the Programming mailing list