File Coverage

File:blib/lib/Geo/Coder/Free.pm
Coverage:32.2%

linestmtbrancondsubpodtimecode
1package Geo::Coder::Free;
2
3# TODO: Don't have Maxmind as a separate database
4# TODO: Rename openaddresses.sql as geo_coder_free.sql
5# TODO: Consider Data::Dumper::Names instead of Data::Dumper
6# TODO: use the cache to store common queries
7
8
8
8
8
437845
4
76
use strict;
9
8
8
8
11
3
119
use warnings;
10
11# use lib '.';
12
13
8
8
8
14
6
165
use Carp;
14
8
8
8
1012
16647
134
use Config::Auto;
15
8
8
8
25
6
175
use Data::Dumper;
16
8
8
8
1681
339146
114
use Geo::Coder::Abbreviations;
17
8
8
8
1407
13
129
use Geo::Coder::Free::Local;
18
8
8
8
1481
31
152
use Geo::Coder::Free::MaxMind;
19
8
8
8
1406
17
148
use Geo::Coder::Free::OpenAddresses;
20
8
8
8
23
7
61
use Locale::US;
21
8
8
8
11
5
60
use Object::Configure;
22
8
8
8
12
6
121
use Params::Get;
23
8
8
8
11
4
12733
use Scalar::Util;
24
25 - 33
=head1 NAME

Geo::Coder::Free - Provides a Geo-Coding functionality using free databases

=head1 VERSION

Version 0.41

=cut
34
35our $VERSION = '0.41';
36
37our $alternatives;
38our $abbreviations;
39
40sub _abbreviate($);
41sub _normalize($);
42
43 - 104
=head1 DESCRIPTION

C<Geo::Coder::Free> translates addresses into latitude and longitude coordinates using a local C<SQLite> database built from free databases such as
L<https://spelunker.whosonfirst.org/>,
L<https://maxmind.com>,
L<https://github.com/dr5hn/countries-states-cities-database>,
L<https://openaddresses.io/>, and
L<https://openstreetmap.org>.
The module is designed to be flexible,
importing the data into the database,
and supporting both command-line and programmatic usage.
The module includes methods for geocoding (translating addresses to coordinates) and reverse geocoding (translating coordinates to addresses),
though the latter is not fully implemented.
It also provides utilities for handling common address formats and abbreviations,
and it includes a sample CGI script for a web-based geocoding service.
The module is intended for use in applications requiring geocoding without relying on paid or rate-limited online services,
and it supports customization through environment variables and optional database downloads.

The cgi-bin directory contains a simple DIY Geo-Coding website.

    cgi-bin/page.fcgi page=query q=1600+Pennsylvania+Avenue+NW+Washington+DC+USA

The sample website is currently down while I look for a new host.
When it's back up you will be able to use this to test it.

    curl 'https://geocode.nigelhorne.com/cgi-bin/page.fcgi?page=query&q=1600+Pennsylvania+Avenue+NW+Washington+DC+USA'

=head1 SYNOPSIS

    use Geo::Coder::Free;

    my $geo_coder = Geo::Coder::Free->new();
    my $location = $geo_coder->geocode(location => 'Ramsgate, Kent, UK');

    print 'Latitude: ', $location->lat(), "\n";
    print 'Longitude: ', $location->long(), "\n";

    # Use a local download of http://results.openaddresses.io/ and https://www.whosonfirst.org/
    my $openaddr_geo_coder = Geo::Coder::Free->new(openaddr => $ENV{'OPENADDR_HOME'});
    $location = $openaddr_geo_coder->geocode(location => '1600 Pennsylvania Avenue NW, Washington DC, USA');

    print 'Latitude: ', $location->lat(), "\n";
    print 'Longitude: ', $location->long(), "\n";

=head1 METHODS

=head2 new

    $geo_coder = Geo::Coder::Free->new();

Takes one optional parameter, openaddr, which is the base directory of
the OpenAddresses data from L<http://results.openaddresses.io>,
and Who's On First data from L<https://whosonfirst.org>.

Takes one optional parameter, directory,
which tells the object where to find the MaxMind and GeoNames files admin1db,
admin2.db and cities.[sql|csv.gz].
If that parameter isn't given,
the module will attempt to find the databases,
but that can't be guaranteed to work.

=cut
105
106sub new {
107
13
1
207842
        my $class = shift;
108
109        # Handle hash or hashref arguments
110
13
26
        my $params = Params::Get::get_params(undef, \@_) || {};
111
112
13
195
        if(!defined($class)) {
113
1
1
1
2
                if((scalar keys %{$params}) > 0) {
114                        # Using Geo::Coder::Free->new not Geo::Coder::Free::new
115
0
0
                        carp(__PACKAGE__, ' use ->new() not ::new() to instantiate');
116
0
0
                        return;
117                }
118
119                # FIXME: this only works when no arguments are given
120
1
2
                $class = __PACKAGE__;
121        } elsif(Scalar::Util::blessed($class)) {
122                # clone the given object
123
2
2
2
0
5
9
                return bless { %{$class}, %{$params} }, ref($class);
124        }
125
126
11
17
        if(!$alternatives) {
127
2
3
                my $keep = $/;
128
2
4
                local $/ = undef;
129
2
17
                my $data = <DATA>;
130
2
3
                $/ = $keep;
131
132
2
16
                $alternatives = Config::Auto->new(source => $data)->parse();
133
2
10
3275
182
                while(my ($key, $value) = (each %{$alternatives})) {
134
8
8
5
13
                        $alternatives->{$key} = join(', ', @{$value});
135                }
136        }
137
11
28
        $params = Object::Configure::configure($class, $params);
138        my $rc = {
139
11
11
38629
56
                %{$params},
140                maxmind => Geo::Coder::Free::MaxMind->new($params),
141                alternatives => $alternatives
142        };
143
144
11
3582
        if((!defined $params->{'openaddr'}) && $ENV{'OPENADDR_HOME'}) {
145
2
3
                $params->{'openaddr'} = $ENV{'OPENADDR_HOME'};
146        }
147
148
11
18
        if($params->{'openaddr'}) {
149
4
4
4
21
                $rc->{'openaddr'} = Geo::Coder::Free::OpenAddresses->new('id' => 'md5', %{$params});
150        }
151
9
14
        if(my $cache = $params->{'cache'}) {
152
2
3
                $rc->{'cache'} = $cache;
153        }
154
155
9
36
        return bless $rc, $class;
156}
157
158 - 174
=head2 geocode

    $location = $geo_coder->geocode(location => $location);

    print 'Latitude: ', $location->{'latitude'}, "\n";
    print 'Longitude: ', $location->{'longitude'}, "\n";

    # TODO:
    # @locations = $geo_coder->geocode('Portland, USA');
    # diag 'There are Portlands in ', join (', ', map { $_->{'state'} } @locations);

    # Note that this yields many false positives and isn't useable yet
    my @matches = $geo_coder->geocode(scantext => 'arbitrary text', region => 'US');

    @matches = $geo_coder->geocode(scantext => 'arbitrary text', region => 'GB', ignore_words => [ 'foo', 'bar' ]);

=cut
175
176# List of words that scantext should ignore
177my %common_words = (
178        'a' => 1,
179        'an' => 1,
180        'age' => 1,
181        'and' => 1,
182        'at' => 1,
183        'be' => 1,
184        'by' => 1,
185        'cross' => 1,
186        'for' => 1,
187        'how' => 1,
188        'i' => 1,
189        'in' => 1,
190        'is' => 1,
191        'more' => 1,
192        'of' => 1,
193        'on' => 1,
194        'or' => 1,
195        'over' => 1,
196        'pm' => 1,
197        'road' => 1,
198        'she' => 1,
199        'side' => 1,
200        'some' => 1,
201        'to' => 1,
202        'the' => 1,
203        'was' => 1,
204        'with' => 1,
205);
206
207sub geocode {
208
6
1
7730
        my $self = shift;
209
6
2
        my %params;
210
211        # Try hard to support whatever API that the user wants to use
212
6
25
        if(!ref($self)) {
213
0
0
                if(scalar(@_)) {
214
0
0
                        return(__PACKAGE__->new()->geocode(@_));
215                } elsif(!defined($self)) {
216                        # Geo::Coder::Free->geocode()
217
0
0
                        Carp::croak('Usage: ', __PACKAGE__, '::geocode(location => $location|scantext => $text)');
218                } elsif($self eq __PACKAGE__) {
219
0
0
                        Carp::croak("Usage: $self", '::geocode(location => $location|scantext => $text)');
220                }
221
0
0
                return(__PACKAGE__->new()->geocode($self));
222        } elsif(ref($self) eq 'HASH') {
223
0
0
                return(__PACKAGE__->new()->geocode($self));
224        } elsif(ref($_[0]) eq 'HASH') {
225
0
0
0
0
                %params = %{$_[0]};
226        # } elsif(ref($_[0]) && (ref($_[0] !~ /::/))) {
227        } elsif(ref($_[0])) {
228
0
0
                Carp::croak('Usage: ', __PACKAGE__, '::geocode(location => $location|scantext => $text)');
229        } elsif(scalar(@_) && (scalar(@_) % 2 == 0)) {
230
2
2
                %params = @_;
231        } else {
232
4
4
                $params{'location'} = shift;
233        }
234
235        # Fail when the input is just a set of numbers
236
6
19
        if(defined($params{'location'}) && ($params{'location'} !~ /\D/)) {
237
2
8
                Carp::croak('Usage: ', __PACKAGE__, ": invalid location to geocode(), $params{location}") if(length($params{'location'}));
238
1
3
                return undef;
239        } elsif(defined($params{'scantext'}) && ($params{'scantext'} !~ /\D/)) {
240
1
2
                Carp::croak('Usage: ', __PACKAGE__, ": invalid scantext to geocode(), $params{scantext}") if(length($params{'scantext'}));
241
1
2
                return undef;
242        }
243
244
3
4
        if($self->{'openaddr'}) {
245
0
0
                if(my $scantext = $params{'scantext'}) {
246
0
0
                        return if($self->{'scantext_misses'}{$scantext});
247
0
0
                        $self->{'local'} ||= Geo::Coder::Free::Local->new();
248                        my @matches = grep defined, (
249                                $self->{'local'}->geocode($scantext),
250                                $self->{'openaddr'}->geocode($scantext),
251
0
0
                                $self->{'maxmind'}->geocode($scantext)
252                        );
253
0
0
                        if(scalar(@matches)) {
254                                # ::diag(__LINE__, Data::Dumper->Dump([\@matches]));
255
0
0
                                return @matches;
256                        }
257
0
0
                        my $region = $params{'region'};
258
259
0
0
                        my %ignore_words;
260
0
0
                        if($params{'ignore_words'}) {
261
0
0
0
0
0
0
                                %ignore_words = map { lc($_) => 1 } @{$params{'ignore_words'}};
262                        }
263
264
0
0
                        %ignore_words = (%ignore_words, %common_words);
265
266
0
0
                        my @rc;
267
0
0
                        @matches = _find_word_triplets($scantext, \%ignore_words);
268
269
0
0
                        foreach my $place (@matches) {
270
0
0
                                my $location = $region ? "$place, $region" : $place;
271
0
0
                                next if($self->{'scantext_misses'}{$location});
272                                my @res = grep defined, (
273
0
0
                                        $self->{'openaddr'}->geocode($location),
274                                        # $self->{'maxmind'}->geocode($location)
275                                );
276
0
0
                                foreach my $entry(@res) {
277
0
0
                                        $entry->{'location'} = $location;
278
0
0
                                        $entry->{'text'} = $scantext;
279
0
0
                                        $entry->{'confidence'} = 0.8;
280                                }
281
0
0
                                if(scalar(@res) && !wantarray) {
282                                        # ::diag(__LINE__, Data::Dumper->Dump([\@res]));
283
0
0
                                        return $res[0];
284                                }
285
0
0
                                if(scalar(@res)) {
286
0
0
                                        push @rc, @res;
287                                } else {
288
0
0
                                        $self->{'scantext_misses'}{$location} = 1;
289                                }
290                        }
291
0
0
                        if(scalar(@rc)) {
292                                # ::diag(__LINE__, Data::Dumper->Dump([\@rc]));
293
0
0
                                return @rc;
294                        }
295
0
0
                        @matches = _find_word_duplets($scantext, \%ignore_words);
296
297
0
0
                        foreach my $place (@matches) {
298
0
0
                                my $location = $region ? "$place, $region" : $place;
299
0
0
                                next if($self->{'scantext_misses'}{$location});
300                                my @res = grep defined, (
301
0
0
                                        $self->{'openaddr'}->geocode($location),
302                                        # $self->{'maxmind'}->geocode($location)
303                                );
304
0
0
                                foreach my $entry(@res) {
305
0
0
                                        $entry->{'location'} = $location;
306
0
0
                                        $entry->{'text'} = $scantext;
307
0
0
                                        $entry->{'confidence'} = 0.7;
308                                }
309
0
0
                                if(scalar(@res) && !wantarray) {
310                                        # ::diag(__LINE__, Data::Dumper->Dump([\@res]));
311
0
0
                                        return $res[0];
312                                }
313
0
0
                                if(scalar(@res)) {
314
0
0
                                        push @rc, @res;
315                                } else {
316
0
0
                                        $self->{'scantext_misses'}{$location} = 1;
317                                }
318                        }
319
0
0
                        if(scalar(@rc)) {
320                                # ::diag(__LINE__, Data::Dumper->Dump([\@rc]));
321
0
0
                                return @rc;
322                        }
323
324                        # Regular expression to match different formats of places
325                        # This rediculous regex is from Chatgpt
326                        #       OpenAI. (2025). ChatGPT [Large language model]. https://chatgpt.com
327
328                        # FIXME: Doesn't find the place in this "She was born May 21, 1937 in Noblesville, IN.";
329
0
0
                        @matches = $scantext =~ /\b(?:\d+\s+)?(?:[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\.?),\s*(?:[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*(?:,\s*[A-Z]{2,})*)\b/g;
330
331
0
0
                        my @places;
332
0
0
                        foreach my $match (@matches) {
333
0
0
                                push @places, $match if(defined $match && $match ne '');
334                        }
335
336                        # ::diag($scantext);
337                        # ::diag(join(';', @places)) if(scalar(@places));
338
339
0
0
                        foreach my $place (@places) {
340
0
0
                                my $location = $region ? "$place, $region" : $place;
341
0
0
                                next if($self->{'scantext_misses'}{$location});
342                                my @res = grep defined, (
343
0
0
                                        $self->{'openaddr'}->geocode($location),
344                                        # $self->{'maxmind'}->geocode($location)
345                                );
346
0
0
                                foreach my $entry(@res) {
347
0
0
                                        $entry->{'location'} = $location;
348
0
0
                                        $entry->{'text'} = $scantext;
349
0
0
                                        $entry->{'confidence'} = 0.7;
350                                }
351
0
0
                                if(scalar(@res) && !wantarray) {
352                                        # ::diag(__LINE__, Data::Dumper->Dump([\@res]));
353
0
0
                                        return $res[0];
354                                }
355
0
0
                                if(scalar(@res)) {
356
0
0
                                        push @rc, @res;
357                                } else {
358
0
0
                                        $self->{'scantext_misses'}{$location} = 1;
359                                }
360                        }
361
0
0
                        if(scalar(@rc)) {
362                                # ::diag(__LINE__, Data::Dumper->Dump([\@rc]));
363
0
0
                                return @rc;
364                        }
365
366
0
0
                        if($region) {
367
0
0
                                if($region eq 'GB') {
368
0
0
                                        my @candidates = _find_gb_addresses($scantext);
369                                        # ::diag(Data::Dumper->new([\@candidates])->Dump());
370
0
0
                                        if(scalar(@candidates)) {
371
0
0
                                                my @gb;
372
0
0
                                                foreach my $candidate(@candidates) {
373                                                        # ::diag(__LINE__, ": $candidate");
374
0
0
                                                        next if(exists($ignore_words{lc($candidate)}));
375                                                        my @res = grep defined, (
376
0
0
                                                                $self->{'openaddr'}->geocode("$candidate, GB"),
377                                                                # $self->{'maxmind'}->geocode("$candidate, GB")
378                                                        );
379
0
0
                                                        push @gb, @res if(scalar(@res));
380                                                }
381
0
0
                                                return @gb if(scalar(@gb));
382                                        }
383                                } elsif($region eq 'US') {
384
0
0
                                        my @candidates = _find_us_addresses($scantext);
385                                        # ::diag(Data::Dumper->new([\@candidates])->Dump());
386
0
0
                                        if(scalar(@candidates)) {
387
0
0
                                                my @us;
388
0
0
                                                foreach my $candidate(@candidates) {
389                                                        # ::diag(__LINE__, ": $candidate");
390
0
0
                                                        next if(exists($ignore_words{lc($candidate)}));
391                                                        my @res = grep defined, (
392
0
0
                                                                $self->{'openaddr'}->geocode("$candidate, US"),
393                                                                # $self->{'maxmind'}->geocode("$candidate, US")
394                                                        );
395
0
0
                                                        push @us, @res if(scalar(@res));
396                                                }
397
0
0
                                                return @us if(scalar(@us));
398                                        }
399                                } elsif($region eq 'Canada') {
400
0
0
                                        my @candidates = _find_ca_addresses($scantext);
401                                        # ::diag(Data::Dumper->new([\@candidates])->Dump());
402
0
0
                                        if(scalar(@candidates)) {
403
0
0
                                                my @ca;
404
0
0
                                                foreach my $candidate(@candidates) {
405                                                        # ::diag(__LINE__, ": $candidate");
406
0
0
                                                        next if(exists($ignore_words{lc($candidate)}));
407                                                        my @res = grep defined, (
408
0
0
                                                                $self->{'openaddr'}->geocode("$candidate, Canada"),
409                                                                # $self->{'maxmind'}->geocode("$candidate, Canada")
410                                                        );
411
0
0
                                                        push @ca, @res if(scalar(@res));
412                                                }
413
0
0
                                                return @ca if(scalar(@ca));
414                                        }
415                                }
416                        }
417
0
0
                        $self->{'scantext_misses'}{$scantext} = 1;
418
0
0
                        return;
419                }
420
0
0
                if(wantarray) {
421
0
0
                        my @rc = $self->{'openaddr'}->geocode(\%params);
422
0
0
                        if(scalar(@rc)) {
423
0
0
                                return @rc if(scalar(@rc) && $rc[0]);
424                        }
425
0
0
                        $self->{'local'} ||= Geo::Coder::Free::Local->new();
426
0
0
                        @rc = $self->{'local'}->geocode(\%params);
427
428
0
0
                        return @rc if(scalar(@rc) && $rc[0]);
429                } else {        # !wantarray
430
0
0
                        if(my $rc = $self->{'openaddr'}->geocode(\%params)) {
431
0
0
                                return $rc;
432                        }
433
0
0
                        $self->{'local'} ||= Geo::Coder::Free::Local->new();
434
0
0
                        if(my $rc = $self->{'local'}->geocode(\%params)) {
435
0
0
                                return $rc;
436                        }
437                }
438
0
0
                if((!$params{'scantext'}) && (my $alternatives = $self->{'alternatives'})) {
439                        # Try some alternatives, would be nice to read this from somewhere on line
440
0
0
                        my $location = $params{'location'};
441
0
0
0
0
                        while (my($key, $value) = each %{$alternatives}) {
442
0
0
                                if($location =~ $key) {
443                                        # ::diag("$key=>$value");
444
0
0
                                        my $keep = $location;
445
0
0
                                        $location =~ s/$key/$value/;
446
0
0
                                        $params{'location'} = $location;
447
0
0
                                        if(my $rc = $self->geocode(\%params)) {
448
0
0
                                                return $rc;
449                                        }
450                                        # Try without the commas, for "Tyne and Wear"
451
0
0
                                        if($value =~ /, /) {
452
0
0
                                                my $string = $value;
453
0
0
                                                $string =~ s/,//g;
454
0
0
                                                $location = $keep;
455
0
0
                                                $location =~ s/$key/$string/;
456
0
0
                                                $params{'location'} = $location;
457
0
0
                                                if(my $rc = $self->geocode(\%params)) {
458
0
0
                                                        return $rc;
459                                                }
460                                        }
461                                }
462                        }
463                }
464        }
465
466        # FIXME: scantext only works if OPENADDR_HOME is set
467
3
4
        if($params{'location'}) {
468
1
2
                if(wantarray) {
469
0
0
                        my @rc = $self->{'maxmind'}->geocode(\%params);
470
0
0
                        return @rc;
471                }
472
1
4
                return $self->{'maxmind'}->geocode(\%params);
473        }
474
2
2
        if(!$params{'scantext'}) {
475
1
5
                Carp::croak('Usage: geocode(location => $location|scantext => $text)');
476        }
477
1
2
        return;
478}
479
480# Find all sets of 3 consecutive words in a string
481# Example usage
482# my $input_string = "apple, banana orange,grape, melon";
483# my @result = find_word_triplets($input_string);
484# print join("\n", @result), "\n";
485sub _find_word_triplets
486{
487
0
0
        my ($text, $remove_words) = @_;
488
489        # Normalize spaces and commas
490
0
0
        $text =~ s/[,]+/ /g;    # Replace commas with spaces
491
0
0
        $text =~ s/\s+/ /g;     # Normalize multiple spaces
492
0
0
        $text =~ s/^\s+|\s+$//g; # Trim leading/trailing spaces
493
494        # my @words = split /\s+/, $text;
495
0
0
0
0
        my @words = grep { !/^\d+$/ && !$remove_words->{lc($_)} } split /\s+/, $text; # Remove numeric words and unwanted words
496
0
0
        my @triplets;
497
498
0
0
        for my $i (0 .. $#words - 2) {
499
0
0
                push @triplets, "$words[$i], $words[$i+1], $words[$i+2]";
500        }
501
502
0
0
        return @triplets;
503}
504
505# Find all sets of 2 consecutive words in a string
506sub _find_word_duplets
507{
508
0
0
        my ($text, $remove_words) = @_;
509
510        # Normalize spaces and commas
511
0
0
        $text =~ s/[,]+/ /g;    # Replace commas with spaces
512
0
0
        $text =~ s/\s+/ /g;     # Normalize multiple spaces
513
0
0
        $text =~ s/^\s+|\s+$//g; # Trim leading/trailing spaces
514
515        # my @words = split /\s+/, $text;
516
0
0
0
0
        my @words = grep { !/^\d+$/ && !$remove_words->{$_} } split /\s+/, $text; # Remove numeric words and unwanted words
517
0
0
        my @duplets;
518
519
0
0
        for my $i (0 .. $#words - 1) {
520
0
0
                push @duplets, "$words[$i], $words[$i+1]";
521        }
522
523
0
0
        return @duplets;
524}
525
526# Function to find all possible US addresses in a string
527sub _find_us_addresses {
528
0
0
        my $text = shift;
529
0
0
        my @addresses;
530
531        # Regular expression to match U.S.-style addresses
532
0
0
        my $address_regex = qr/
533                \b                    # Word boundary
534                (\d{1,5})       # Street number: 1 to 5 digits
535                \s+     # Space
536                ([A-Za-z0-9\s]+?)       # Street name (alphanumeric, allows spaces)
537                \s+     # Space
538                (Avenue|Ave\.?|Boulevard|Blvd\.?|Road|Rd\.?|Lane|Ln\.?|Drive|Dr\.?|Street|St\.?) # Street type
539                (\s+[A-Za-z]{2})?       # Optional directional suffix (NW, NE, etc.)
540                ,\s*    # Comma and optional spaces
541                ([A-Za-z\s]+)   # City name
542                ,\s*    # Comma and optional spaces
543                ([A-Z]{2})      # State abbreviation
544                \s*     # Optional spaces
545                (\d{5}(-\d{4})?)?       # Optional ZIP code
546                \b      # Word boundary
547        /x;
548
549        # Find all matches
550
0
0
        while ($text =~ /$address_regex/g) {
551
0
0
                push @addresses, $&;        # Capture the full match
552        }
553
554
0
0
        return @addresses;
555}
556
557# Function to find all possible British addresses in a string
558sub _find_gb_addresses {
559
0
0
        my $text = shift;
560
0
0
        my @addresses;
561
562        # Regular expression to match British-style addresses
563
0
0
        my $address_regex = qr/
564                \b                                     # Word boundary
565                (\d{1,5}|\w[\w\s'-]+)       # House number or name (e.g., "123", "The White House")
566                \s+                                      # Space
567                ([A-Za-z0-9\s'-]+)                       # Street name (alphanumeric with spaces, hyphens, or apostrophes)
568                \s*,?\s*                                 # Optional comma and spaces
569                ([A-Za-z\s'-]+)                          # Locality or district name (optional, but typically a valid name)
570                \s*,?\s*                                 # Optional comma and spaces
571                ([A-Za-z\s'-]+)                          # Town or city name
572                \s*,?\s*                                 # Optional comma and spaces
573                ([A-Za-z\s'-]+)                         # County name
574                # \s*,?\s*                                 # Optional comma and spaces
575                # ([A-Z]{1,2}[0-9R][0-9A-Z]?\s[0-9][ABD-HJLNP-UW-Z]{2}),        # Optional postcode (e.g., "SW1A 1AA", "EC1A 1BB")
576                \b                                       # Word boundary
577        /x;
578
579        # Find all matches
580
0
0
        while ($text =~ /$address_regex/g) {
581
0
0
                my $address = $&;
582
0
0
                $address =~ s/[,\s]+$//;
583
0
0
                push @addresses, $address;      # Capture the full match
584        }
585
586
0
0
        return @addresses;
587}
588
589# Function to find all possible Canadian addresses in a string
590sub _find_ca_addresses {
591
0
0
        my $text = shift;
592
0
0
        my @addresses;
593
594        # Regular expression to match Canadian-style addresses
595
0
0
        my $address_regex = qr/
596                \b                                # Word boundary
597                (\d{1,5})                         # Street number: 1 to 5 digits
598                \s+                               # Space
599                ([A-Za-z0-9\s]+?)                 # Street name (alphanumeric, allows spaces)
600                \s+                               # Space
601                (Avenue|Ave\.?|Boulevard|Blvd\.?|Road|Rd\.?|Lane|Ln\.?|Drive|Dr\.?|Street|St\.?|Circle|Crescent|Cres\.?) # Street type
602                \s*,\s*                           # Comma and optional spaces
603                ([A-Za-z\s]+)                     # City name (allows multi-word names)
604                \s*,\s*                           # Comma and optional spaces
605                ([A-Z]{2})                        # Province abbreviation (e.g., ON, QC, BC)
606                \s*,?\s*                          # Optional comma and spaces
607                ([A-Z]\d[A-Z]\s?\d[A-Z]\d)?     # Optional Canadian postal code (e.g., A1A 1A1)
608                \b                                # Word boundary
609        /x;
610
611        # Find all matches
612
0
0
        while ($text =~ /$address_regex/g) {
613
0
0
                push @addresses, $&; # Capture the full match
614        }
615
616
0
0
        return @addresses;
617}
618
619 - 625
=head2 reverse_geocode

    $location = $geocoder->reverse_geocode(latlng => '37.778907,-122.39732');

To be done.

=cut
626
627sub reverse_geocode {
628
1
1
2044
        my $self = shift;
629
1
1
        my %params;
630
631        # Try hard to support whatever API that the user wants to use
632
1
7
        if(!ref($self)) {
633
0
0
                if(scalar(@_)) {
634
0
0
                        return(__PACKAGE__->new()->reverse_geocode(@_));
635                } elsif(!defined($self)) {
636                        # Geo::Coder::Free->reverse_geocode()
637
0
0
                        Carp::croak('Usage: ', __PACKAGE__, '::reverse_geocode(latlng => "$lat,$long")');
638                } elsif($self eq __PACKAGE__) {
639
0
0
                        Carp::croak("Usage: $self", '::reverse_geocode(latlng => "$lat,$long")');
640                }
641
0
0
                return(__PACKAGE__->new()->reverse_geocode($self));
642        } elsif(ref($self) eq 'HASH') {
643
0
0
                return(__PACKAGE__->new()->reverse_geocode($self));
644        } elsif(ref($_[0]) eq 'HASH') {
645
0
0
0
0
                %params = %{$_[0]};
646        # } elsif(ref($_[0]) && (ref($_[0] !~ /::/))) {
647        } elsif(ref($_[0])) {
648
0
0
                Carp::croak('Usage: ', __PACKAGE__, '::reverse_geocode(latlng => "$lat,$long")');
649        } elsif(scalar(@_) && (scalar(@_) % 2 == 0)) {
650
1
5
                %params = @_;
651        } else {
652
0
0
                $params{'latlng'} = shift;
653        }
654
655        # The drivers don't yet support it
656
1
3
        if($self->{'openaddr'}) {
657
0
0
                if(wantarray) {
658
0
0
                        my @rc = $self->{'openaddr'}->reverse_geocode(\%params);
659
0
0
                        return @rc;
660                } elsif(my $rc = $self->{'openaddr'}->reverse_geocode(\%params)) {
661
0
0
                        return $rc;
662                }
663        }
664
665
1
2
        if($params{'latlng'}) {
666
0
0
                if(wantarray) {
667
0
0
                        my @rc = $self->{'maxmind'}->reverse_geocode(\%params);
668
0
0
                        return @rc;
669                }
670
0
0
                return $self->{'maxmind'}->reverse_geocode(\%params);
671        }
672
673
1
4
        Carp::croak('Reverse lookup is not yet supported');
674}
675
676 - 680
=head2  ua

Does nothing, here for compatibility with other Geo-Coders

=cut
681
682sub ua
683
1
{
684}
685
686 - 692
=head2 run

You can also run this module from the command line:

    perl lib/Geo/Coder/Free.pm 1600 Pennsylvania Avenue NW, Washington DC

=cut
693
694__PACKAGE__->run(@ARGV) unless caller();
695
696sub run {
697
0
1
0
        require Data::Dumper;
698
699
0
0
        my $class = shift;
700
701
0
0
        my $location = join(' ', @_);
702
703
0
0
        my @rc;
704
0
0
        if($ENV{'OPENADDR_HOME'}) {
705
0
0
                @rc = $class->new(directory => $ENV{'OPENADDR_HOME'})->geocode($location);
706        } else {
707
0
0
                @rc = $class->new()->geocode($location);
708        }
709
710
0
0
        die "$0: geocoding failed" unless(scalar(@rc));
711
712
0
0
        print Data::Dumper->new([\@rc])->Dump();
713}
714
715sub _normalize($) {
716
1
2
        my $street = shift;
717
718
1
8
        $abbreviations ||= Geo::Coder::Abbreviations->new();
719
720
1
7390
        $street = uc($street);
721
1
13
        if($street =~ /(.+)\s+(.+)\s+(.+)/) {
722
1
1
                my $a;
723
1
5
                if((lc($2) ne 'cross') && ($a = $abbreviations->abbreviate($2))) {
724
0
0
                        $street = "$1 $a $3";
725                } elsif($a = $abbreviations->abbreviate($3)) {
726
1
9
                        $street = "$1 $2 $a";
727                }
728        } elsif($street =~ /(.+)\s(.+)$/) {
729
0
0
                if(my $a = $abbreviations->abbreviate($2)) {
730
0
0
                        $street = "$1 $a";
731                }
732        }
733
1
2
        $street =~ s/^0+//;     # Turn 04th St into 4th St
734
1
2
        return $street;
735}
736
737sub _abbreviate($) {
738
6
46
        my $type = uc(shift);
739
740
6
13
        $abbreviations ||= Geo::Coder::Abbreviations->new();
741
742
6
22
        if(my $rc = $abbreviations->abbreviate($type)) {
743
6
39
                return $rc;
744        }
745
0
        return $type;
746}
747
748 - 942
=head1 GETTING STARTED

To download, import and setup the local database,
before running "make", but after running "perl Makefile.PL", run these instructions.

Optionally set the environment variable OPENADDR_HOME to point to an empty directory and download the data from L<http://results.openaddresses.io> into that directory; and
optionally set the environment variable WHOSONFIRST_HOME to point to an empty directory and download the data using L<https://github.com/nigelhorne/NJH-Snippets/blob/master/bin/wof-clone>.
The script bin/download_databases (see below) will do those for you.
You do not need to download the MaxMind data, that will be downloaded automatically.

You will need to create the database used by Geo::Coder::Free.

Install L<App::csv2sqlite> and L<https://github.com/nigelhorne/NJH-Snippets>.
Run bin/create_sqlite - converts the Maxmind "cities" database from CSV to SQLite.

To use with MariaDB,
set MARIADB_SERVER="$hostname;$port" and
MARIADB_USER="$user;$password" (TODO: username/password should be asked for)
The code will use a database called geo_code_free which will be deleted
if it exists.
$user should only need to privileges to DROP, CREATE, SELECT, INSERT, CREATE and INDEX
on that database. If you've set DEBUG mode in createdatabase.PL, or are playing
with REPLACE instead of INSERT, you'll also need DELETE privileges - but non-developers
don't need to have that.

Optional steps to download and install large databases.
This will take a long time and use a lot of disc space, be clear that this is what you want.
In the bin directory there are some helper scripts to do this.
You will need to tailor them to your set up, but that's not that hard as the
scripts are trivial.

=over 4

=item 1

C<mkdir $WHOSONFIRST_HOME; cd $WHOSONFIRST_HOME> run wof-clone from NJH-Snippets.

This can take a long time because it contains lots of directories which filesystem drivers
seem to take a long time to navigate (at least my EXT4 and ZFS systems do).

=item 2

Install L<https://github.com/dr5hn/countries-states-cities-database.git> into $DR5HN_HOME.
This data contains cities only,
so it's not used if OSM_HOME is set,
since the latter is much more comprehensive.
Also, only Australia, Canada and the US is imported, as the UK data is difficult to parse.

=item 3

Run bin/download_databases - this will download the WhosOnFirst, Openaddr,
Open Street Map and dr5hn databases.
Open Street Map now uses PBF files,
so you will need to C<apt instsall osmium_tool> first.
Check the values of OSM_HOME, OPENADDR_HOME,
DR5HN_HOME and WHOSONFIRST_HOME within that script,
you may wish to change them.
The Makefile.PL file will download the MaxMind database for you, as that is not optional.

=item 4

Run bin/create_db - this creates the database used by G:C:F using the data you've just downloaded
The database is called openaddr.sql,
even though it does include all of the above data.
That's historical before I added the WhosOnFirst database.
The names are a bit of a mess because of that.
I should rename it to geo-coder-free.sql, even though it doesn't contain the Maxmind data.

=back

Now you're ready to run "make"
(note that the download_databases script may have done that for you,
but you'll want to check).

See the comment at the start of createdatabase.PL for further reading.

=head1 MORE INFORMATION

I've written a few Perl related Genealogy programs including gedcom (L<https://github.com/nigelhorne/gedcom>)
and ged2site (L<https://github.com/nigelhorne/ged2site>).
One of the things that these do is to check the validity of your family tree, and one of those tasks is to verify place-names.
Of course places do change names and spelling becomes more consistent over the years, but the vast majority remain the same.
Enough of a majority to computerise the verification.
Unfortunately all of the on-line services have one problem or another - most either charge for large number of access, or throttle the number of look-ups.
Even my modest tree, just over 2000 people, reaches those limits.

There are, however, a number of free databases that can be used, including MaxMind, GeoNames, OpenAddresses and WhosOnFirst.
The objective of L<Geo::Coder::Free> (L<https://github.com/nigelhorne/Geo-Coder-Free>)
is to create a database of those databases and to create a search engine either through a local copy of the database or through an on-line website.
Both are in their early days, but I have examples which do surprisingly well.

The local copy of the database is built using the createdatabase.PL script which is bundled with G:C:F.
That script creates a single SQLite file from downloaded copies of the databases listed above, to create the database you will need
to first install L<App::csv2sqlite>.
If REDIS_SERVER is set, the data are also stored on a Redis Server.
Running 'make' will download GeoNames and MaxMind, but OpenAddresses and WhosOnFirst need to be downloaded manually if you decide to use them - they are treated as optional by G:C:F.

The sample website at L<https://geocode.nigelhorne.com/> is down at the moment while I look for a new host.
The source code for that site is included in the G:C:F distribution.

=head1 BUGS

Some lookups fail at the moments, if you find one please file a bug report.

The MaxMind data only contains cities.
The OpenAddresses data doesn't cover the globe.

Can't parse and handle "London, England".

It would be great to have a set-up wizard to create the database.

The various scripts in NJH-Snippets ought to be in this module.

=head1 SEE ALSO

=over 4

=item * L<Test Dashboard|https://nigelhorne.github.io/Geo-Coder-Free/coverage/>

=back

L<https://openaddresses.io/>,
L<https://www.maxmind.com/en/home>,
L<https://www.geonames.org/>,
L<https://raw.githubusercontent.com/dr5hn/countries-states-cities-database/master/countries%2Bstates%2Bcities.json>,
L<https://www.whosonfirst.org/> and
L<https://github.com/nigelhorne/vwf>.

L<Geo::Coder::Free::Local>,
L<Geo::Coder::Free::Maxmind>,
L<Geo::Coder::Free::OpenAddresses>.

See L<Geo::Coder::Free::OpenAddresses> for instructions creating the SQLite database from
L<http://results.openaddresses.io/>.

=head1 AUTHOR

Nigel Horne, C<< <njh@nigelhorne.com> >>

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.

=head1 SUPPORT

This module is provided as-is without any warranty.

You can find documentation for this module with the perldoc command.

    perldoc Geo::Coder::Free

You can also look for information at:

=over 4

=item * MetaCPAN

L<https://metacpan.org/release/Geo-Coder-Free>

=item * RT: CPAN's request tracker

L<https://rt.cpan.org/NoAuth/Bugs.html?Dist=Geo-Coder-Free>

=item * CPANTS

L<http://cpants.cpanauthors.org/dist/Geo-Coder-Free>

=item * CPAN Testers' Matrix

L<http://matrix.cpantesters.org/?dist=Geo-Coder-Free>

=item * CPAN Testers Dependencies

L<http://deps.cpantesters.org/?module=Geo::Coder::Free>

=item * Search CPAN

L<http://search.cpan.org/dist/Geo-Coder-Free/>

=back

=head1 LICENSE AND COPYRIGHT

Copyright 2017-2026 Nigel Horne.

The program code is released under the following licence: GPL for personal use on a single computer.
All other users (including Commercial, Charity, Educational, Government)
must apply in writing for a licence for use from Nigel Horne at `<njh at nigelhorne.com>`.

This product uses GeoLite2 data created by MaxMind, available from
L<https://www.maxmind.com/en/home>. See their website for licensing information.

This product uses data from Who's on First.
See L<https://github.com/whosonfirst-data/whosonfirst-data/blob/master/LICENSE.md> for licensing information.

=cut
943
9441;
945
946# Common mappings allowing looser lookups
947# Would be nice to read this from somewhere on-line
948# See also lib/Geo/Coder/Free/Local.pm