[prog] Introduction and Perl: Flat File DB Query Question

Sabine Konhaeuser sjmk at gmx.net
Wed Mar 15 06:07:28 EST 2006


Hi Everyone,

I just subscribed to this list. My name is Sabine and I work as a web 
programmer in a design firm, where I'm responsible for most things technical. 
My main programming language is now PHP, but once in a while I have to dabble 
in Perl, and that is where my question comes in.

A client of ours uses a flat-file database for a zip code locator (a flat text 
file for store locations, a DBM file for zip codes, and a Perl script we 
inherited). Besides the fact that some areas are getting very slow because of 
large files, things have worked OK so far. Now, as the number of locations 
grows, we have run into a problem: locations with the same zip code (but 
different addresses) are displayed several times.

Example: zip code 12345 appears 2 times in the locator list. When I search for 
a zip code, let's say 12345, both locations show up twice. If this zip code 
were present 3 times, then each location would be displayed 3 times on the 
results page. 

Here is a typical result. Companies 1 and 2 share the same zip code:

Name: Company 1
Location: Street A, City A, 12345
Distance: 5 miles

Name: Company 2
Location: Street B, City A, 12345
Distance: 5 miles

Name: Company 1
Location: Street A, City A, 12345
Distance: 5 miles

Name: Company 2
Location: Street B, City A, 12345
Distance: 5 miles

Name: Company 3
Location: Street D, City A, 12346
Distance: 15 miles

Name: Company 2
Location: Street E, City A, 12347
Distance: 20 miles


Both locations (1 and 2) are displayed twice instead of just once.

Here is the code:

# getting the list of zip codes fitting the allowed radius
# Convert to radians...

$lat_origin = ($lat_origin*3.14159)/180;
$long_origin = ($long_origin*3.14159)/180;


# Search
$counter = 0;
$centers_found = 0;
$j = 0;

# get zip codes
open(INPUTFILE, '<', $rc_data) or die "Can't open $rc_data: $!\n";
while (<INPUTFILE>) { #begin while
    chomp;    # strip only the newline; chop removes any last character
    ($rc_name, $street_address, $city, $state, $rc_zip) = split (/\|/);

    ($rc_lat_long) = $zipdb{$rc_zip};
    ($lat_destination, $long_destination) = split (/\|/, $rc_lat_long);

    #convert to radians...

    $long_destination = ($long_destination*3.14159)/180;
    $lat_destination = ($lat_destination*3.14159)/180;

    #find the angle between the two locations...

    $angle = sin($lat_destination)*sin($lat_origin) +
        cos($lat_destination)*cos($lat_origin)*cos($long_origin-$long_destination);
    # the above 2 lines are on one line

    if ($angle > .999999) {
        $angle = .999999;
    }

    # Now find great circle distance from between the two locations...

    $total=atan2(sqrt(1 - $angle * $angle), $angle);
    $distance=int(($total*180/3.14159) * 72);

    $dist[$j] = $distance;
    $zip[$j] = $rc_zip;

    if ($distance <= 100) {
        $centers_found++;
    }

    $j++;

} # end while

close(INPUTFILE);

# Now sort by increasing distance...

for ($i = 0; $i < $#dist; $i++) {

    $min = $dist[$i];
    $min_index = $i;

    for ($j = $i+1; $j <= $#dist; $j++) {

        if ($dist[$j] < $min) {

            $min = $dist[$j];
            $min_index = $j;

            }
    } # end for j

    $temp_dist = $dist[$i];
    $dist[$i] = $dist[$min_index];
    $dist[$min_index] = $temp_dist;

    $temp_zip = $zip[$i];
    $zip[$i] = $zip[$min_index];
    $zip[$min_index] = $temp_zip;

} # end for i

# now the printout of results

if ($centers_found == 0) {
	# some message
}
else {
	$centers_found--;
}


for ($j = 0; $j <= $centers_found; $j ++) {
    open(INPUTFILE, '<', $rc_data) or die "Can't open $rc_data: $!\n";
    while (<INPUTFILE>) { #begin while
        chomp;    # strip only the newline; chop removes any last character
        ($rc_name, $street_address, $city, $state, $rc_zip, $email, $phone1,
            $phone2, $fax) = split (/\|/);
# the above 2 lines are on one line
        if ($rc_zip eq $zip[$j]) { 
          # show the dealers that match the zip/distance
          # prints out the locations
        } # end if
    } # end while
    close (INPUTFILE);

} # end for

dbmclose(%zipdb);

end of code
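For what it's worth, the hand-written selection sort above can probably be 
replaced by Perl's built-in sort. A minimal sketch, assuming the parallel 
@dist/@zip arrays are filled in as in the first loop (the sample values here 
are made up): sort the list of indices by distance, then reorder both arrays 
with that one index list so every distance stays paired with its zip code.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the @dist/@zip arrays that the
# first loop builds, one entry per line of $rc_data.
my @dist = (15, 5, 20, 5);
my @zip  = ('12346', '12345', '12347', '12345');

# Sort the indices by distance, then apply the same index order to both
# arrays so each distance keeps its matching zip code.
my @order = sort { $dist[$a] <=> $dist[$b] } 0 .. $#dist;
@dist = @dist[@order];
@zip  = @zip[@order];

print "@zip\n";    # 12345 12345 12346 12347
```

Besides being shorter, this should scale better on large files than the two 
nested loops (O(n log n) instead of O(n^2) comparisons).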

$rc_data is the pipe-delimited text file that holds the store locations.
%zipdb is the hash bound to the zip code DBM file; each value holds a zip 
code's latitude and longitude, separated by a pipe.
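The distance step in the first loop is the spherical law of cosines. Pulled 
out into a subroutine it might look like the sketch below; the name 
great_circle_miles is made up, and it keeps the script's factor of 72 miles per 
degree (one degree of a great circle on Earth is closer to 69 miles). One thing 
worth noticing: clamping $angle to .999999, as the script does, makes two 
points with identical coordinates come out as int(5.83) = 5 miles, which 
probably explains the "Distance: 5 miles" lines for zip 12345 in the sample 
output. The sketch clamps to 1 instead, so identical points give (nearly) 0.

```perl
use strict;
use warnings;

# Assumption: 72 miles per degree of arc, the same factor the script uses.
use constant PI               => 4 * atan2(1, 1);
use constant MILES_PER_DEGREE => 72;

sub deg2rad { return $_[0] * PI / 180 }

# Great-circle distance via the spherical law of cosines, as in the
# script's loop; inputs are latitude/longitude in decimal degrees.
sub great_circle_miles {
    my ($lat1, $lon1, $lat2, $lon2) = map { deg2rad($_) } @_;
    my $cos_angle = sin($lat1) * sin($lat2)
                  + cos($lat1) * cos($lat2) * cos($lon1 - $lon2);
    # Clamp only floating-point overshoot outside [-1, 1].
    $cos_angle = 1  if $cos_angle > 1;
    $cos_angle = -1 if $cos_angle < -1;
    # atan2(sqrt(1 - c^2), c) is acos(c), the angle between the points.
    my $angle = atan2(sqrt(1 - $cos_angle * $cos_angle), $cos_angle);
    return $angle * 180 / PI * MILES_PER_DEGREE;
}

printf "%.1f\n", great_circle_miles(0, 0, 1, 0);    # 72.0
```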

Some lines might be broken by the email client. I made a note where this 
happened.

What happens is that both locations (1 and 2 in the example above) are 
displayed when j=0 and again when j=1, but I want each location displayed 
only once, not twice.
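The doubling can be reproduced in isolation. This small sketch uses made-up 
records and mimics the two passes: @zip gets one entry per location, so 12345 
is in it twice, and the print loop then prints every location with that zip 
code once for each of those entries.

```perl
use strict;
use warnings;

# Made-up records: two locations share zip 12345.
my @records = (
    [ 'Company 1', '12345' ],
    [ 'Company 2', '12345' ],
    [ 'Company 3', '12346' ],
);

# One @zip entry per record, like the first loop in the script.
my @zip = map { $_->[1] } @records;

# Mimic the print loop: for each @zip entry, scan all records and
# count the ones whose zip code matches.
my %times_printed;
for my $z (@zip) {
    for my $rec (@records) {
        $times_printed{ $rec->[0] }++ if $rec->[1] eq $z;
    }
}

printf "%s: %d\n", $_, $times_printed{$_} for sort keys %times_printed;
# Company 1: 2
# Company 2: 2
# Company 3: 1
```

With n locations in one zip code, each is printed n times, so that zip code 
produces n * n lines.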

My question now is: how do I keep this code from showing duplicates? In SQL I 
would use DISTINCT. Is there an equivalent to DISTINCT for results from flat 
text files? I could not find anything online. Unfortunately, a database-driven 
approach is not an option for this client at this time.

I hope that all makes sense. 
Thanks in advance.

-- 
Sabine Konhaeuser

