[prog] Introduction and Perl: Flat File DB Query Question

Katherine Spice katherine at coruscant.demon.co.uk
Wed Mar 15 09:37:24 EST 2006


Hi Sabine,

I've reposted your code below with line numbers so that it's easier to 
reference bits.

In English, as far as I can tell:

A.  The script opens up the datafile and reads it line by line.
B.  As it does this it builds two arrays, with one element in each 
for each line of the source file. One contains all the zip codes and the 
other contains the distance for each zip code. These arrays are 'linked' by 
index, e.g. the distance of the zip code at position 4 is stored in 
position 4 of the distance array. The link is implied; it is not made 
programmatically.
C.  The zip array is sorted by the corresponding distances (the distance 
array is reordered alongside it).
D.  For each zip code up to the number of centers found, the script 
reopens the datafile and reads it line by line. Each line whose zip 
matches the current zip code is printed.
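To make the 'linked by index' idea concrete, here's a small sketch of steps B and C. It uses made-up zips and distances, not your data. Sorting a list of indices, rather than each array on its own, keeps the two arrays in step. That is the same job the hand-written selection sort in lines 54 to 79 does.

```perl
use strict;
use warnings;

# Made-up example values: position $i in @zip goes with position $i in @dist.
my @zip  = (12347, 12345, 12346);
my @dist = (80, 12, 45);

# Sort the list of indices by distance, then use it to reorder both
# arrays at once, so the positional link survives the sort.
my @order       = sort { $dist[$a] <=> $dist[$b] } 0 .. $#dist;
my @sorted_zip  = @zip[@order];
my @sorted_dist = @dist[@order];

print "$sorted_zip[$_]: $sorted_dist[$_] miles\n" for 0 .. $#sorted_zip;
```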

The problem is that at stage B (ln 53), given your data above, you are 
left with a zip array containing (12345, 12345, 12346, 12347).

This means that at stage D (ln 91), when this array is looped through, 
the first element is 12345. The file is opened and read: Company 1 
matches, so it is output, and Company 2 matches, so it is output. For the 
second element (also 12345), the file is opened and read again, and 
Company 1 and Company 2 are both output a second time.
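Here's the doubling in isolation. This is a small sketch with hypothetical data. An in-memory list stands in for the data file, and the distance cut-off is ignored. Two companies share 12345, so the zip array holds it twice, and the stage D loop prints both companies twice.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the data file: name|street|city|state|zip
my @lines = (
    'Company 1|1 Main St|Town|ST|12345',
    'Company 2|2 Main St|Town|ST|12345',
    'Company 3|3 Main St|Town|ST|12346',
    'Company 4|4 Main St|Town|ST|12347',
);

# Stage B: one zip per line, so 12345 ends up in the array twice.
my @zip = map { my @fields = split /\|/, $_; $fields[4] } @lines;

# Stage D (ignoring the distance cut-off): for each zip, scan every line.
my @printed;
for my $z (@zip) {
    for my $line (@lines) {
        my ($name, $rc_zip) = (split /\|/, $line)[0, 4];
        push @printed, $name if $rc_zip eq $z;
    }
}

print "$_\n" for @printed;    # Company 1 and Company 2 each appear twice
```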

The least invasive fix is to make sure the zip array contains each zip 
code only once. You can do this by adding the code below at line 18. What 
we want to do is ask whether the newly read zip code ($rc_zip) is already 
in the array of zips (@zip). If it is, we just want to go on to the next 
line, which is what 'next;' does. However, doing this means we never reach 
line 44, which increments the counter if the distance is 100 or less. So 
(at ln 18) add


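# look for $rc_zip among the zips already stored in @zip
$k = 0;
$found = '';
foreach $comparezip (@zip) {
         if ($comparezip eq $rc_zip) {
                 $found = 'yes';
                 last;
         }
         $k++;
}

# seen before: reuse its stored distance, then skip to the next line
if ($found eq 'yes') {
	$comparedist = $dist[$k];
	if ($comparedist <= 100) {
		$centers_found++;
	}
	next;
}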

This checks whether this zip has been seen before and, if so, saves its 
index ($k). It then uses $k to get the already calculated distance, 
increments $centers_found if necessary, and jumps to the next line in 
the source file.
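You could also swap the foreach/$k scan for a single hash lookup. This is a minimal sketch on made-up zip/distance pairs. It assumes a hash I've called %index_of (it isn't in your script) that maps each zip to its position in @zip. Whether to bump $centers_found for a repeat is left exactly as in the patch.

```perl
use strict;
use warnings;

# Made-up zip/distance pairs in file order; 12345 appears twice.
my @records = ([12345, 12], [12345, 12], [12346, 45], [12347, 150]);

my (@zip, @dist);
my %index_of;    # hypothetical: zip => its position in @zip

for my $rec (@records) {
    my ($rc_zip, $distance) = @$rec;

    # One hash lookup replaces the foreach/$k scan over @zip.
    if (exists $index_of{$rc_zip}) {
        my $comparedist = $dist[ $index_of{$rc_zip} ];    # distance stored earlier
        # ... update $centers_found from $comparedist here, as in the patch ...
        next;
    }

    # First time this zip has been seen: store it and remember where.
    push @zip,  $rc_zip;
    push @dist, $distance;
    $index_of{$rc_zip} = $#zip;
}

print "@zip\n";    # prints "12345 12346 12347"
```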

However, I've got to say that this is pretty hacky and quite nasty! If 
you have time, I'd recommend redoing this so that the zip and distance 
are stored in a hash, which links them as far as the program is 
concerned. This would avoid all the indexing ickiness, and would also let 
you do away with most of lines 54 to 90 (replaced with a 'sort' on the 
hash). Finally, reopening and rereading the file for each matching center 
(ln 92) is deeply inefficient: if your source file has 1000 lines and 
there are 10 matches, you're reading 10,000 lines. This will only get 
worse as your files grow, and it is quite possibly the culprit for poor 
performance. If you want more detail about any of this, please ask. I just 
didn't want to go all out on it if you're not interested.
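For what it's worth, here's a rough sketch of the hash-based version. It's only an outline, and it makes some assumptions. An in-memory string stands in for the data file. %distance_to holds made-up distances where your script would compute them from %zipdb, as in lines 19 to 39. The file is read once and each line is filed under its zip code. Sorting the hash keys by distance then replaces lines 54 to 90.

```perl
use strict;
use warnings;

# Stand-in for the data file ($rc_data): name|street|city|state|zip
my $data = <<'END';
Company 1|1 Main St|Town|ST|12345
Company 2|2 Main St|Town|ST|12345
Company 3|3 Main St|Town|ST|12346
Company 4|4 Main St|Town|ST|12347
END

# Made-up distances in miles; the real script computes these from %zipdb.
my %distance_to = (12345 => 12, 12346 => 45, 12347 => 150);
my $max_miles   = 100;

# Read the file once, filing each line under its zip code.
my %lines_for;    # zip => [ matching lines ]
open my $fh, '<', \$data or die "Can't open data: $!\n";
while (my $line = <$fh>) {
    chomp $line;
    my $rc_zip = (split /\|/, $line)[4];
    push @{ $lines_for{$rc_zip} }, $line;
}
close $fh;

# Hash keys are unique, so each zip appears once; keep the nearby ones
# and sort them by increasing distance.
my @nearby = sort { $distance_to{$a} <=> $distance_to{$b} }
             grep { $distance_to{$_} <= $max_miles } keys %lines_for;

for my $zip (@nearby) {
    print "$_ ($distance_to{$zip} miles)\n" for @{ $lines_for{$zip} };
}
```

Because hash keys are unique, the duplicate-zip problem goes away on its own, and scalar(@nearby) gives the number of nearby zip codes.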

I hope this helps,
Katherine


  1 # getting the list of zip codes fitting the allowed radius
  2 # Convert to radians...
  3
  4 $lat_origin = ($lat_origin*3.14159)/180;
  5 $long_origin = ($long_origin*3.14159)/180;
  6
  7
  8 # Search
  9 $counter = 0;
10 $centers_found = 0;
11 $j = 0;
12
13 # get zip codes
14 open (INPUTFILE, "$rc_data") || die print STDOUT "Can't open $rc_data\n";
15 while (<INPUTFILE>) { #begin while
16 chomp;   # chomp strips only a trailing newline; chop always drops the last character
17 ($rc_name, $street_address, $city, $state, $rc_zip) = split (/\|/);
18
19 ($rc_lat_long) = $zipdb{$rc_zip};
20 ($lat_destination, $long_destination) = split (/\|/, $rc_lat_long);
21
22 #convert to radians...
23
24 $long_destination = ($long_destination*3.14159)/180;
25 $lat_destination = ($lat_destination*3.14159)/180;
26
27 #find the angle between the two locations...
28
29 $angle = sin($lat_destination)*sin($lat_origin) +
30 cos($lat_destination)*cos($lat_origin)*cos($long_origin-$long_destination);
31 # the above 2 lines are on one line
32
33 if ($angle>.999999){
34 $angle=".999999";}
35
36 # Now find great circle distance from between the two locations...
37
38 $total=atan2(sqrt(1 - $angle * $angle), $angle);
39 $distance=int(($total*180/3.14159) * 72);
40
41 $dist[$j] = $distance;
42 $zip[$j] = $rc_zip;
43
44 if ($distance <= 100) {
45    $centers_found++;
46 }
47
48 $j++;
49
50 } # end while
51
52 close(INPUTFILE);
53
54 # Now sort by increasing distance...
55
56 for ($i = 0; $i < $#dist; $i++) {
57
58 $min = $dist[$i];
59 $min_index = $i;
60
61 for ($j = $i+1; $j <= $#dist; $j++) {
62
63    if ($dist[$j] < $min) {
64
65   $min = $dist[$j];
66   $min_index = $j;
67
68   }
69 } # end for j
70
71 $temp_dist = $dist[$i];
72 $dist[$i] = $dist[$min_index];
73 $dist[$min_index] = $temp_dist;
74
75 $temp_zip = $zip[$i];
76 $zip[$i] = $zip[$min_index];
77 $zip[$min_index] = $temp_zip;
78
79 } # end for i
80
81 # now the printout of results
82
83 if ($centers_found == 0) {
84    # some message
85 }
86 else {
87    $centers_found--;
88 }
89
90
91 for ($j = 0; $j <= $centers_found; $j ++) {
92 open (INPUTFILE, "$rc_data") || die print STDOUT "Can't open $rc_data\n";
93 while (<INPUTFILE>) { #begin while
94    chomp;
95    ($rc_name, $street_address, $city, $state, $rc_zip, $email, $phone1,
96 $phone2, $fax) = split (/\|/);
97 # the above 2 lines are on one line
98    if ($rc_zip eq $zip[$j]) {
99  		# show the dealers that match the zip/distance
100 		# prints out the locations
101    } # end if
102 } # end while
103 close (INPUTFILE);
104
105 } # end for
106
107 dbmclose(%zipdb);




