[prog] Introduction and Perl: Flat File DB Query Question
Sabine Konhaeuser
sjmk at gmx.net
Wed Mar 15 06:07:28 EST 2006
Hi Everyone,
I just subscribed to this list. My name is Sabine and I work as a web
programmer in a Design firm. I'm pretty much responsible for most things
technical there. My main programming language is now PHP, but once in a while
I have to dabble in Perl. And this is where my question fits in.
A client of ours uses a flat file database for a zip locator (flat text file
for store locations, dbm for zip codes, and a Perl script we inherited).
Apart from the fact that some areas are getting very slow because of the large
files, things have worked OK so far. Now that the number of locations is
growing, we have run into a problem: locations that share a zip code (but have
different addresses) are displayed several times.
Example: zip code 12345 appears 2 times in the locator list. When I search for
a zip code, let's say 12345, both locations show up twice. If this zip code
were present 3 times, then each location would be displayed 3 times on the
results page.
Here is a typical result. Companies 1 and 2 share the same zip code:
Name: Company 1
Location: Street A, City A, 12345
Distance: 5 miles
Name: Company 2
Location: Street B, City A, 12345
Distance: 5 miles
Name: Company 1
Location: Street A, City A, 12345
Distance: 5 miles
Name: Company 2
Location: Street B, City A, 12345
Distance: 5 miles
Name: Company 3
Location: Street D, City A, 12346
Distance: 15 miles
Name: Company 2
Location: Street E, City A, 12347
Distance: 20 miles
Both locations (1 and 2) are displayed twice instead of just once.
Here is the code:
# getting the list of zip codes fitting the allowed radius
# Convert to radians...
$lat_origin = ($lat_origin*3.14159)/180;
$long_origin = ($long_origin*3.14159)/180;
# Search
$counter = 0;
$centers_found = 0;
$j = 0;
# get zip codes
open (INPUTFILE, "$rc_data") || die print STDOUT "Can't open $rc_data\n";
while (<INPUTFILE>) { #begin while
    chomp;
    ($rc_name, $street_address, $city, $state, $rc_zip) = split (/\|/);
    ($rc_lat_long) = $zipdb{$rc_zip};
    ($lat_destination, $long_destination) = split (/\|/, $rc_lat_long);
    #convert to radians...
    $long_destination = ($long_destination*3.14159)/180;
    $lat_destination = ($lat_destination*3.14159)/180;
    #find the angle between the two locations...
    $angle = sin($lat_destination)*sin($lat_origin) +
        cos($lat_destination)*cos($lat_origin)*cos($long_origin-$long_destination);
    # the above 2 lines are on one line
    if ($angle > .999999) {
        $angle = ".999999";
    }
    # Now find great circle distance between the two locations...
    $total = atan2(sqrt(1 - $angle * $angle), $angle);
    $distance = int(($total*180/3.14159) * 72);
    $dist[$j] = $distance;
    $zip[$j] = $rc_zip;
    if ($distance <= 100) {
        $centers_found++;
    }
    $j++;
} # end while
close(INPUTFILE);
# Now sort by increasing distance...
for ($i = 0; $i < $#dist; $i++) {
    $min = $dist[$i];
    $min_index = $i;
    for ($j = $i+1; $j <= $#dist; $j++) {
        if ($dist[$j] < $min) {
            $min = $dist[$j];
            $min_index = $j;
        }
    } # end for j
    $temp_dist = $dist[$i];
    $dist[$i] = $dist[$min_index];
    $dist[$min_index] = $temp_dist;
    $temp_zip = $zip[$i];
    $zip[$i] = $zip[$min_index];
    $zip[$min_index] = $temp_zip;
} # end for i
# now the printout of results
if ($centers_found == 0) {
    # some message
}
else {
    $centers_found--;
}
for ($j = 0; $j <= $centers_found; $j++) {
    open (INPUTFILE, "$rc_data") || die print STDOUT "Can't open $rc_data\n";
    while (<INPUTFILE>) { #begin while
        chomp;
        ($rc_name, $street_address, $city, $state, $rc_zip, $email, $phone1,
            $phone2, $fax) = split (/\|/);
        # the above 2 lines are on one line
        if ($rc_zip eq $zip[$j]) {
            # show the dealers that match the zip/distance
            # prints out the locations
        } # end if
    } # end while
    close (INPUTFILE);
} # end for
dbmclose(%zipdb);
end of code
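For reference, the distance part of the search loop (the $angle, $total and
$distance lines) can be pulled out into a small sub. This is a sketch, not
part of the inherited script: the sub name great_circle_miles is
hypothetical, and it deliberately keeps the script's own constants (pi as
3.14159, the .999999 clamp, and 72 miles per degree of arc).

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the script's formula (not in the original).
my $PI = 3.14159;    # the script's approximation of pi

sub great_circle_miles {
    # Arguments: lat1, long1, lat2, long2 in decimal degrees.
    my ($lat1, $lon1, $lat2, $lon2) = map { $_ * $PI / 180 } @_;

    # Spherical law of cosines: cosine of the angle between the points.
    my $cos_angle = sin($lat1) * sin($lat2)
                  + cos($lat1) * cos($lat2) * cos($lon1 - $lon2);

    # Same clamp as the script, so sqrt() never sees a negative number.
    $cos_angle = 0.999999 if $cos_angle > 0.999999;

    # atan2(sin, cos) recovers the angle in radians.
    my $angle = atan2(sqrt(1 - $cos_angle * $cos_angle), $cos_angle);

    # Radians -> degrees -> miles, truncated with int() like the script.
    return int(($angle * 180 / $PI) * 72);
}

print great_circle_miles(40, -75, 40, -75), "\n";    # prints 5
```

One side effect of the .999999 clamp is that two identical coordinates come
out as 5 miles rather than 0, which is consistent with stores in the searched
zip code being listed at "Distance: 5 miles".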
$rc_data is the text file that holds the store locations, delimited with pipes.
%zipdb is the hash bound to the zip code dbm file; it maps each zip code to its
latitude and longitude (stored as "lat|long").
Some lines might be broken by the email client. I made a note where this
happened.
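The hand-written selection sort above reorders @dist and @zip in parallel. As
a sketch, the same ordering can be produced with Perl's built-in sort over the
array indices plus array slices; the sample values below are hypothetical.
Note that the sorted @zip still holds one entry per store line, so a zip code
shared by two stores appears twice.

```perl
use strict;
use warnings;

# Hypothetical sample data in the script's parallel-array layout:
# $dist[$k] is the distance for store line $k, $zip[$k] its zip code.
my @dist = (20, 5, 15, 5);
my @zip  = ('12347', '12345', '12346', '12345');

# Sort the indices numerically by distance, then reorder both arrays
# with the same index list so each distance stays paired with its zip.
my @order = sort { $dist[$a] <=> $dist[$b] } 0 .. $#dist;
@dist = @dist[@order];
@zip  = @zip[@order];

print "@zip\n";    # 12345 12345 12346 12347
```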
What happens is that both locations (1 and 2 from the example above) are
displayed when j=0 and again when j=1, but I want each location displayed
only once.
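A stripped-down, hypothetical reproduction of the printing loop shows where
the repeats come from: @zip has one entry per store, so 12345 occurs at j=0
and at j=1, and each pass rescans every store and prints all of those with
that zip. The store list here stands in for $rc_data and is made up.

```perl
use strict;
use warnings;

# Hypothetical stores (name, zip), standing in for lines of $rc_data.
my @stores = (
    ['Company 1', '12345'],
    ['Company 2', '12345'],
    ['Company 3', '12346'],
);

# Sorted zip list as the search loop builds it: one entry per store.
my @zip = ('12345', '12345', '12346');

# Mimic the printing loop: for each zip entry, scan all stores and
# collect every store whose zip matches.
my @shown;
for my $z (@zip) {
    for my $store (@stores) {
        push @shown, $store->[0] if $store->[1] eq $z;
    }
}

print join(', ', @shown), "\n";
# Company 1, Company 2, Company 1, Company 2, Company 3
```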
My question now is: how do I force this code not to show duplicates? In SQL I
use DISTINCT. Is there an equivalent of DISTINCT when getting results from
flat text files? I could not find anything online. Unfortunately, a
DB-driven approach is not an option for this client at this time.
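One common Perl idiom that behaves like DISTINCT is a grep over a %seen hash,
which keeps only the first occurrence of each value and preserves order. A
minimal sketch, assuming the @dist/@zip arrays from the search loop; the sub
name distinct and the sample values are hypothetical. Because $centers_found
counts stores rather than zip codes, this sketch filters by the 100-mile
radius first and then loops over the distinct zips instead of using
$centers_found as the bound.

```perl
use strict;
use warnings;

# DISTINCT-like filter: %seen counts occurrences, and !$seen{$_}++ is
# true only the first time a given value is encountered.
sub distinct {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Hypothetical sorted search results (parallel arrays as in the script).
my @dist = (5, 5, 15, 120);
my @zip  = ('12345', '12345', '12346', '12347');

# Keep zips within the radius, then remove repeats.
my @in_range   = map { $zip[$_] } grep { $dist[$_] <= 100 } 0 .. $#zip;
my @unique_zip = distinct(@in_range);

print "@unique_zip\n";    # 12345 12346

# The printing loop would then run once per distinct zip, e.g.
# for my $z (@unique_zip) { ... print stores whose zip eq $z ... }
```

Newer releases of List::Util also ship a uniq function that does the same job.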
I hope that all makes sense.
Thanks in advance.
--
Sabine Konhaeuser