File Coverage

File:blib/lib/Database/Abstraction.pm
Coverage:64.1%

linestmtbrancondsubtimecode
1package Database::Abstraction;
2
3# Author Nigel Horne: njh@nigelhorne.com
4# Copyright (C) 2015-2025, Nigel Horne
5
6# Usage is subject to licence terms.
7# The licence terms of this software are as follows:
8# Personal single user, single computer use: GPL2
9# All other users (for example, Commercial, Charity, Educational, Government)
10#       must apply in writing for a licence for use from Nigel Horne at the
11#       above e-mail.
12
13# TODO: Switch "entry" to off by default, and enable by passing 'entry'
14#       though that wouldn't be so nice for AUTOLOAD
15# TODO: support a directory hierarchy of databases
16# TODO: consider returning an object or array of objects, rather than hashes
17# TODO: Add redis database - could be of use for Geo::Coder::Free
18#       use select() to select a database - use the table arg
19#       new(database => 'redis://servername');
20# TODO: Add a "key" property, defaulting to "entry", which would be the name of the key
21# TODO: The maximum number to return should be tuneable (as a LIMIT)
22# TODO: Add full CRUD support
23# TODO: It would be better for the default sep_char to be ',' rather than '!'
24# FIXME:        t/xml.t fails in slurping mode
25# TODO: Other databases e.g., Redis, noSQL, remote databases such as MySQL, PostgresSQL
26# TODO: The no_entry/entry terminology is confusing.  Replace with no_id/id_column
27# TODO: Add support for DBM::Deep
28
29
16
16
16
2098336
15
344
use warnings;
30
16
16
16
29
11
140
use strict;
31
32
16
16
16
2415
7330
31
use boolean;
33
16
16
16
602
19
371
use Carp;
34
16
16
16
27
14
248
use Data::Dumper;
35
16
16
16
3145
43833
390
use Data::Reuse;
36
16
16
16
3347
213017
2007
use DBD::SQLite::Constants qw/:file_open/;      # For SQLITE_OPEN_READONLY
37
16
16
16
77
18
1931
use Fcntl;      # For O_RDONLY
38
16
16
16
41
19
257
use File::Spec;
39
16
16
16
2980
3611
349
use File::pfopen 0.03;  # For $mode and list context
40
16
16
16
5433
101544
647
use File::Temp;
41
16
16
16
3232
413632
310
use Log::Abstraction 0.24;
42
16
16
16
3121
6255
264
use Object::Configure 0.12;
43
16
16
16
35
74
223
use Params::Get 0.13;
44# use Error::Simple;    # A nice idea to use this, but it doesn't play well with "use lib"
45
16
16
16
47
31
328
use Scalar::Util;
46
47our %defaults;
48
16
16
16
26
29
53312
use constant    DEFAULT_MAX_SLURP_SIZE => 16 * 1024; # CSV files <= than this size are read into memory
49
50 - 58
=head1 NAME

Database::Abstraction - Read-only Database Abstraction Layer (ORM)

=head1 VERSION

Version 0.32

=cut
59
60our $VERSION = '0.32';
61
62 - 192
=head1 DESCRIPTION

C<Database::Abstraction> is a read-only database abstraction layer (ORM) for Perl,
designed to provide a simple interface for accessing and querying various types of databases such as CSV, XML, and SQLite without the need to write SQL queries.
It promotes code maintainability by abstracting database access logic into a single interface,
allowing users to switch between different storage formats seamlessly.
The module supports caching for performance optimization,
flexible logging for debugging and monitoring,
and includes features like the AUTOLOAD method for convenient access to database columns.
By handling numerous database and file formats,
C<Database::Abstraction> adds versatility and simplifies the management of read-intensive applications.

=head1 SYNOPSIS

Abstract class giving read-only access to CSV,
XML,
BerkeleyDB and SQLite databases via Perl without writing any SQL,
using caching for performance optimization.

The module promotes code maintainability by abstracting database access logic into a single interface.
Users can switch between different storage formats without changing application logic.
The ability to handle numerous database and file formats adds versatility and makes it useful for a variety of applications.

It's a simple ORM like interface which,
for all of its simplicity,
allows you to do a lot of the heavy lifting of simple database operations without any SQL.
It offers functionalities like opening the database and fetching data based on various criteria.

Built-in support for flexible and configurable caching improves performance for read-intensive applications.

Supports logging to debug and monitor database operations.

Look for databases in $directory in this order:

=over 4

=item 1 C<SQLite>

File ends with .sql

=item 2 C<PSV>

Pipe separated file, file ends with .psv

=item 3 C<CSV>

File ends with .csv or .db, can be gzipped. Note the default sep_char is '!' not ','

=item 4 C<XML>

File ends with .xml

=item 5 C<BerkeleyDB>

File ends with .db

=back

The AUTOLOAD feature allows for convenient access to database columns using method calls.
It hides the complexity of querying the underlying data storage.

If the table has a key column,
entries are keyed on that and sorts are based on it.
To turn that off, pass 'no_entry' to the constructor, for legacy
reasons it's enabled by default.
The key column's default name is 'entry', but it can be overridden by the 'id' parameter.

Arrays are made read-only before being returned.
To disable that, pass C<no_fixate> to the constructor.

CSV files that are not no_entry can have empty lines or comment lines starting with '#',
to make them more readable.

=head1 EXAMPLE

If the file /var/dat/foo.csv contains something like:

    "customer_id","name"
    "plugh","John"
    "xyzzy","Jane"

Create a driver for the file in .../Database/foo.pm:

    package Database::foo;

    use Database::Abstraction;

    our @ISA = ('Database::Abstraction');

    # Regular CSV: There is no entry column and the separators are commas
    sub new
    {
        my $class = shift;
        my %args = (ref($_[0]) eq 'HASH') ? %{$_[0]} : @_;

        return $class->SUPER::new(no_entry => 1, sep_char => ',', %args);
    }

You can then use this code to access the data via the driver:

    # Opens the file, e.g. /var/dat/foo.csv
    my $foo = Database::foo->new(directory => '/var/dat');

    # Prints "John"
    print 'Customer name ', $foo->name(customer_id => 'plugh'), "\n";

    # Prints:
    #  $VAR1 = {
    #     'customer_id' => 'xyzzy',
    #     'name' => 'Jane'
    #  };
    my $row = $foo->fetchrow_hashref(customer_id => 'xyzzy');
    print Data::Dumper->new([$row])->Dump();

=head1 SUBROUTINES/METHODS

=head2 init

Initializes the abstraction class and its subclasses with optional arguments for configuration.

    Database::Abstraction::init(directory => '../data');

See the documentation for new to see what variables can be set.

Returns a reference to a hash of the current values.
Therefore when given with no arguments you can get the current default values:

    my $defaults = Database::Abstraction::init();
    print $defaults->{'directory'}, "\n";

=cut
193
194# Subroutine to initialize with args
195sub init
196{
197
46
491857
        if(my $params = Params::Get::get_params(undef, @_)) {
198
44
512
                if(($params->{'expires_in'} && !$params->{'cache_duration'})) {
199                        # Compatibility with CHI
200
1
6
                        $params->{'cache_duration'} = $params->{'expires_in'};
201                }
202
203
44
44
49
102
                %defaults = (%defaults, %{$params});
204
44
165
                $defaults{'cache_duration'} ||= '1 hour';
205        }
206
207
46
1057
        return \%defaults
208}
209
210 - 220
=head2 import

The module can be initialised by the C<use> directive.

    use Database::Abstraction 'directory' => '/etc/data';

or

    use Database::Abstraction { 'directory' => '/etc/data' };

=cut
221
222sub import
223{
224
38
2253
        my $pkg = shift;
225
226
38
87
        if((scalar(@_) % 2) == 0) {
227
38
43
                my %h = @_;
228
38
73
                init(Object::Configure::configure($pkg, \%h));
229        } elsif((scalar(@_) == 1) && (ref($_[0]) eq 'HASH')) {
230
0
0
                init(Object::Configure::configure($pkg, $_[0]));
231        } elsif(scalar(@_) > 0) {    # >= 3 would also work here
232
0
0
                init(\@_);
233        }
234}
235
236 - 315
=head2 new

Create an object to point to a read-only database.

Arguments:

Takes different argument formats (hash or positional)

=over 4

=item * C<auto_load>

Enable/disable the AUTOLOAD feature.
The default is to have it enabled.

=item * C<cache>

Place to store results

=item * C<cache_duration>

How long to store results in the cache (default is 1 hour).

=item * C<config_file>

Points to a configuration file which contains the parameters to C<new()>.
The file can be in any common format including C<YAML>, C<XML>, and C<INI>.
This allows the parameters to be set at run time.

=item * C<expires_in>

Synonym of C<cache_duration>, for compatibility with C<CHI>.

=item * C<dbname>

The prefix of name of the database file (default is name of the table).
The database will be held in a file such as $dbname.csv.

=item * C<directory>

Where the database file is held.
If only one argument is given to C<new()>, it is taken to be C<directory>.

=item * C<filename>

Filename containing the data.
When not given,
the filename is derived from the tablename
which in turn comes from the class name.

=item * C<logger>

Takes an optional parameter logger, which is used for warnings and traces.
Can be an object that understands warn() and trace() messages,
such as a L<Log::Log4perl> or L<Log::Any> object,
a reference to code,
or a filename.

=item * C<max_slurp_size>

CSV/PSV/XML files smaller than this are held in a HASH in RAM (default is 16K),
falling back to SQL on larger data sets.
Setting this value to 0 will turn this feature off,
thus forcing SQL to be used to access the database

=back

If the arguments are not set, tries to take from class level defaults.

Checks for abstract class usage.

Slurp mode assumes that the key column (entry) is unique.
If it isn't, searches will be incomplete.
Turn off slurp mode on those databases,
by setting a low value for max_slurp_size.

Clones existing objects with or without modifications.
Uses Carp::carp to log warnings for incorrect usage or potential mistakes.

=cut
316
317sub new {
318
35
98508
        my $class = shift;
319
35
33
        my %args;
320
321        # Handle hash or hashref arguments
322
35
112
        if((scalar(@_) == 1) && !ref($_[0])) {
323
4
7
                $args{'directory'} = $_[0];
324        } elsif(my $params = Params::Get::get_params(undef, @_)) {
325
24
24
271
39
                %args = %{$params};
326        }
327
328
35
155
        if(!defined($class)) {
329
0
0
                if((scalar keys %args) > 0) {
330                        # Using Database::Abstraction->new(), not Database::Abstraction::new()
331
0
0
                        carp(__PACKAGE__, ' use ->new() not ::new() to instantiate');
332
0
0
                        return;
333                }
334                # FIXME: this only works when no arguments are given
335
0
0
                $class = __PACKAGE__;
336        } elsif($class eq __PACKAGE__) {
337
1
13
                croak("$class: abstract class");
338        } elsif(Scalar::Util::blessed($class)) {
339                # If $class is an object, clone it with new arguments
340
2
2
2
9
                return bless { %{$class}, %args }, ref($class);
341        }
342
343        # Load the configuration from a config file, if provided
344
32
32
30
59
        %args = %{Object::Configure::configure($class, \%args)};
345
346        # Validate logger object has required methods
347
32
54664
        if(defined $args{'logger'}) {
348
32
230
                unless(Scalar::Util::blessed($args{'logger'}) && $args{'logger'}->can('info') && $args{'logger'}->can('error')) {
349
0
0
                        Carp::croak("Logger must be an object with info() and error() methods");
350                }
351        }
352
353
32
82
        croak("$class: where are the files?") unless($args{'directory'} || $defaults{'directory'});
354
355
31
205
        croak("$class: ", $args{'directory'} || $defaults{'directory'}, ' is not a directory') unless(-d ($args{'directory'} || $defaults{'directory'}));
356
357        # init(\%args);
358
359        # return bless {
360                # logger => $args{'logger'} || $logger,
361                # directory => $args{'directory'} || $directory,     # The directory containing the tables in XML, SQLite or CSV format
362                # cache => $args{'cache'} || $cache,
363                # cache_duration => $args{'cache_duration'} || $cache_duration || '1 hour',
364                # table => $args{'table'},   # The name of the file containing the table, defaults to the class name
365                # no_entry => $args{'no_entry'} || 0,
366        # }, $class;
367
368        # Re-seen keys take precedence, so defaults come first
369        # print STDERR ">>>>>>>>>>>>>>>>>>>\n";
370        # print STDERR __LINE__, "\n";
371        # print STDERR $args{'id'} || 'undef';
372        # print STDERR "\n";
373
30
156
        return bless {
374                no_entry => 0,
375                no_fixate => 0,
376                id => 'entry',
377                cache_duration => '1 hour',
378                max_slurp_size => DEFAULT_MAX_SLURP_SIZE,
379                %defaults,
380                %args,
381        }, $class;
382}
383
384 - 388
=head2  set_logger

Sets the class, code reference, or file that will be used for logging.

=cut
389
390sub set_logger
391{
392
5
1622
        my $self = shift;
393
5
8
        my $params = Params::Get::get_params('logger', @_);
394
395
4
46
        if(my $logger = ($params->{'logger'})) {
396
4
7
                if(Scalar::Util::blessed($logger)) {
397
2
7
                        $self->{'logger'} = $logger;
398                } else {
399
2
4
                        $self->{'logger'} = Log::Abstraction->new($logger);
400                }
401
4
40
                return $self;
402        }
403
0
0
        Carp::croak('Usage: set_logger(logger => $logger)')
404}
405
406# Open the database connection based on the specified type (e.g., SQLite, CSV).
407# Read the data into memory or establish a connection to the database file.
408# column_names allows the column names to be overridden on CSV files
409
410sub _open
411{
412
18
71
        if(!UNIVERSAL::isa((caller)[0], __PACKAGE__)) {
413
0
0
                Carp::croak('Illegal Operation: This method can only be called by a subclass');
414        }
415
416
18
194
        my $self = shift;
417
18
43
        my $params = Params::Get::get_params(undef, @_);
418
419
18
221
        $params->{'sep_char'} ||= $self->{'sep_char'} ? $self->{'sep_char'} : '!';
420
18
48
        my $max_slurp_size = $params->{'max_slurp_size'} || $self->{'max_slurp_size'};
421
422
18
46
        my $table = $self->{'table'} || ref($self);
423
18
42
        $table =~ s/.*:://;
424
425
18
61
        $self->_trace(ref($self), ": _open $table");
426
427
18
736
        return if($self->{$table});
428
429        # Read in the database
430
18
15
        my $dbh;
431
432
18
32
        my $dir = $self->{'directory'} || $defaults{'directory'};
433
18
74
        my $dbname = $self->{'dbname'} || $defaults{'dbname'} || $table;
434
18
109
        my $slurp_file = File::Spec->catfile($dir, "$dbname.sql");
435
436
18
58
        $self->_debug("_open: try to open $slurp_file");
437
438        # Look at various places to find the file and derive the file type from the file's name
439
18
650
        if(-r $slurp_file) {
440                # SQLite file
441
0
0
                require DBI && DBI->import() unless DBI->can('connect');
442
443
0
0
                $dbh = DBI->connect("dbi:SQLite:dbname=$slurp_file", undef, undef, {
444                        sqlite_open_flags => SQLITE_OPEN_READONLY,
445                });
446        }
447
18
143
        if($dbh) {
448
0
0
                $dbh->do('PRAGMA synchronous = OFF');
449
0
0
                $dbh->do('PRAGMA cache_size = -4096');       # Use 4MB cache - negative = KB)
450
0
0
                $dbh->do('PRAGMA journal_mode = OFF');       # Read-only, no journal needed
451
0
0
                $dbh->do('PRAGMA temp_store = MEMORY');      # Store temp data in RAM
452
0
0
                $dbh->do('PRAGMA mmap_size = 1048576');      # Use 1MB memory-mapped I/O
453
0
0
                $dbh->sqlite_busy_timeout(100000);   # 10s
454
0
0
                $self->_debug("read in $table from SQLite $slurp_file");
455
0
0
                $self->{'type'} = 'DBI';
456        } elsif($self->_is_berkeley_db(File::Spec->catfile($dir, "$dbname.db"))) {
457
0
0
                $self->_debug("$table is a BerkeleyDB file");
458
0
0
                $self->{'type'} = 'BerkeleyDB';
459        } else {
460
18
20
                my $fin;
461
18
55
                ($fin, $slurp_file) = File::pfopen::pfopen($dir, $dbname, 'csv.gz:db.gz', '<');
462
18
619
                if(defined($slurp_file) && (-r $slurp_file)) {
463
0
0
                        require Gzip::Faster;
464
0
0
                        Gzip::Faster->import();
465
466
0
0
                        close($fin);
467
0
0
                        $fin = File::Temp->new(SUFFIX => '.csv', UNLINK => 0);
468
0
0
                        print $fin gunzip_file($slurp_file);
469
0
0
                        $slurp_file = $fin->filename();
470
0
0
                        $self->{'temp'} = $slurp_file;
471                } else {
472
18
42
                        ($fin, $slurp_file) = File::pfopen::pfopen($dir, $dbname, 'psv', '<');
473
18
335
                        if(defined($fin)) {
474                                # Pipe separated file
475
4
8
                                $params->{'sep_char'} = '|';
476                        } else {
477                                # CSV file
478
14
31
                                ($fin, $slurp_file) = File::pfopen::pfopen($dir, $dbname, 'csv:db', '<');
479                        }
480                }
481
18
374
                if(my $filename = $self->{'filename'} || $defaults{'filename'}) {
482
1
3
                        $self->_debug("Looking for $filename in $dir");
483
1
14
                        $slurp_file = File::Spec->catfile($dir, $filename);
484                }
485
18
104
                if(defined($slurp_file) && (-r $slurp_file)) {
486
15
60
                        close($fin) if(defined($fin));
487
15
17
                        my $sep_char = $params->{'sep_char'};
488
489
15
43
                        $self->_debug(__LINE__, ' of ', __PACKAGE__, ": slurp_file = $slurp_file, sep_char = $sep_char");
490
491
15
593
                        if($params->{'column_names'}) {
492                                $dbh = DBI->connect("dbi:CSV:db_name=$slurp_file", undef, undef,
493                                        {
494                                                csv_sep_char => $sep_char,
495                                                csv_tables => {
496                                                        $table => {
497
1
10
                                                                col_names => $params->{'column_names'},
498                                                        },
499                                                },
500                                        }
501                                );
502                        } else {
503
14
82
                                $dbh = DBI->connect("dbi:CSV:db_name=$slurp_file", undef, undef, { csv_sep_char => $sep_char});
504                        }
505
15
626682
                        $dbh->{'RaiseError'} = 1;
506
507
15
107
                        $self->_debug("read in $table from CSV $slurp_file");
508
509
15
774
                        $dbh->{csv_tables}->{$table} = {
510                                allow_loose_quotes => 1,
511                                blank_is_undef => 1,
512                                empty_is_undef => 1,
513                                binary => 1,
514                                f_file => $slurp_file,
515                                escape_char => '\\',
516                                sep_char => $sep_char,
517                                # Don't do this, causes "Bizarre copy of HASH
518                                #       in scalar assignment in error_diag
519                                #       RT121127
520                                # auto_diag => 1,
521                                auto_diag => 0,
522                                # Don't do this, it causes "Attempt to free unreferenced scalar"
523                                # callbacks => {
524                                        # after_parse => sub {
525                                                # my ($csv, @rows) = @_;
526                                                # my @rc;
527                                                # foreach my $row(@rows) {
528                                                        # if($row->[0] !~ /^#/) {
529                                                                # push @rc, $row;
530                                                        # }
531                                                # }
532                                                # return @rc;
533                                        # }
534                                # }
535                        };
536
537                        # my %options = (
538                                # allow_loose_quotes => 1,
539                                # blank_is_undef => 1,
540                                # empty_is_undef => 1,
541                                # binary => 1,
542                                # f_file => $slurp_file,
543                                # escape_char => '\\',
544                                # sep_char => $sep_char,
545                        # );
546
547                        # $dbh->{csv_tables}->{$table} = \%options;
548                        # delete $options{f_file};
549
550                        # require Text::CSV::Slurp;
551                        # Text::CSV::Slurp->import();
552                        # $self->{'data'} = Text::CSV::Slurp->load(file => $slurp_file, %options);
553
554                        # Can't slurp when we want to use our own column names as Text::xSV::Slurp has no way to override the names
555                        # FIXME: Text::xSV::Slurp can't cope well with quotes in field contents
556
15
12811
                        if(((-s $slurp_file) <= $max_slurp_size) && !$params->{'column_names'}) {
557
13
61
                                if((-s $slurp_file) == 0) {
558                                        # Empty file
559
0
0
                                        $self->{'data'} = ();
560                                } else {
561
13
1963
                                        require Text::xSV::Slurp;
562
13
29104
                                        Text::xSV::Slurp->import();
563
564
13
68
                                        $self->_debug('slurp in');
565
566
13
673
                                        my $dataref = xsv_slurp(
567                                                shape => 'aoh',
568                                                text_csv => {
569                                                        sep_char => $sep_char,
570                                                        allow_loose_quotes => 1,
571                                                        blank_is_undef => 1,
572                                                        empty_is_undef => 1,
573                                                        binary => 1,
574                                                        escape_char => '\\',
575                                                },
576                                                # string => \join('', grep(!/^\s*(#|$)/, <DATA>))
577                                                file => $slurp_file
578                                        );
579
580                                        # Ignore blank lines or lines starting with # in the CSV file
581
13
44
65
13
2486
62
93
14
                                        my @data = grep { $_->{$self->{'id'}} !~ /^\s*#/ } grep { defined($_->{$self->{'id'}}) } @{$dataref};
582
583
13
29
                                        if($self->{'no_entry'}) {
584                                                # Not keyed, will need to scan each entry
585
3
8
                                                $self->{'data'} = @data;
586                                        } else {
587                                                # keyed on the $self->{'id'} (default: "entry") column
588                                                # while(my $d = shift @data) {
589                                                        # $self->{'data'}->{$d->{$self->{'id'}}} = $d;
590                                                # }
591                                                # Build hash directly from the filtered array, better to use map to avoid data copy
592                                                # and enclose in { } to ensure it's a hash ref
593
10
38
9
59
                                                $self->{'data'} = { map { $_->{$self->{'id'}} => $_ } @data };
594                                        }
595                                }
596                        }
597
15
48
                        $self->{'type'} = 'CSV';
598                } else {
599
3
10
                        $slurp_file = File::Spec->catfile($dir, "$dbname.xml");
600
3
16
                        if(-r $slurp_file) {
601
2
10
                                if((-s $slurp_file) <= $max_slurp_size) {
602
1
350
                                        require XML::Simple;
603
1
3780
                                        XML::Simple->import();
604
605
1
32
                                        my $xml = XMLin($slurp_file);
606
1
1
18295
3
                                        my @keys = keys %{$xml};
607
1
1
                                        my $key = $keys[0];
608
1
1
                                        my @data;
609
1
2
                                        if(ref($xml->{$key}) eq 'ARRAY') {
610
1
1
12
2
                                                @data = @{$xml->{$key}};
611                                        } elsif(ref($xml) eq 'ARRAY') {
612
0
0
0
0
                                                @data = @{$xml};
613                                        } elsif((ref($xml) eq 'HASH') && !$self->{'no_entry'}) {
614
0
0
0
0
                                                if(scalar(keys %{$xml}) == 1) {
615
0
0
                                                        if($xml->{$table}) {
616
0
0
                                                                @data = $xml->{$table};
617                                                        } else {
618
0
0
                                                                die 'TODO: import arbitrary XML with "entry" field';
619                                                        }
620                                                } else {
621
0
0
                                                        die 'TODO: import arbitrary XML (differnt number of keys)';
622                                                }
623                                        } else {
624
0
0
                                                die 'TODO: import arbitrary XML, cannot currently handle ', ref($xml);
625                                        }
626
1
2
                                        $self->{'data'} = ();
627
1
3
                                        if($self->{'no_entry'}) {
628                                                # Not keyed, will need to scan each entry
629
0
0
                                                my $i = 0;
630
0
0
                                                foreach my $d(@data) {
631
0
0
                                                        $self->{'data'}->{$i++} = $d;
632                                                }
633                                        } else {
634                                                # keyed on the $self->{'id'} (default: "entry") column
635
1
2
                                                foreach my $d(@data) {
636
5
8
                                                        $self->{'data'}->{$d->{$self->{'id'}}} = $d;
637                                                }
638                                        }
639                                } else {
640
1
5
                                        $dbh = DBI->connect('dbi:XMLSimple(RaiseError=>1):');
641
1
88735
                                        $dbh->{'RaiseError'} = 1;
642
1
12
                                        $self->_debug("read in $table from XML $slurp_file");
643
1
24
                                        $dbh->func($table, 'XML', $slurp_file, 'xmlsimple_import');
644                                }
645                        } else {
646                                # throw Error(-file => "$dir/$table");
647
1
4
                                $self->_fatal("Can't find a file called '$dbname' for the table $table in $dir");
648                        }
649
2
9
                        $self->{'type'} = 'XML';
650                }
651        }
652
653
17
0
81
0
        Data::Reuse::fixate(%{$self->{'data'}}) if($self->{'data'} && (ref($self->{'data'} eq 'HASH')));
654
655
17
45
        $self->{$table} = $dbh;
656
17
129
        my @statb = stat($slurp_file);
657
17
31
        $self->{'_updated'} = $statb[9];
658
659
17
36
        return $self;
660}
661
662 - 674
=head2  selectall_arrayref

Returns a reference to an array of hash references of all the data meeting
the given criteria.

Note that since this returns an array ref,
optimisations such as "LIMIT 1" will not be used.

Use caching if that is available.

Returns undef if there are no matches.

=cut
675
676sub selectall_arrayref {
677
8
673
        my $self = shift;
678
8
7
        my $params;
679
680
8
23
        if($self->{'no_entry'}) {
681
1
3
                $params = Params::Get::get_params(undef, \@_);
682        } elsif(scalar(@_)) {
683
4
10
                $params = Params::Get::get_params('entry', @_);
684        }
685
686
8
68
        if($self->{'berkeley'}) {
687
0
0
                $self->_fatal(ref($self), ': selectall_arrayref is meaningless on a NoSQL database');
688        }
689
690
8
18
        my $table = $self->_open_table($params);
691
692
8
13
        if($self->{'data'}) {
693
4
4
3
7
                if(scalar(keys %{$params}) == 0) {
694
2
7
                        $self->_trace("$table: selectall_arrayref fast track return");
695
2
34
                        if(ref($self->{'data'}) eq 'HASH') {
696                                # $self->{'data'} looks like this:
697                                #       key1 => {
698                                #               entry => key1,
699                                #               field1 => value1,
700                                #               field2 => value2
701                                #       }, key2 => {
702                                #               entry => key2,
703                                #               field1 => valuea,
704                                #               field2 => valueb
705                                #       }
706
2
2
3
6
                                $self->_debug("$table: returning ", scalar keys %{$self->{'data'}}, ' entries');
707
2
2
22
3
                                if(scalar keys %{$self->{'data'}} <= 10) {
708
2
9
                                        $self->_debug(Dumper($self->{'data'}));
709                                }
710
2
23
                                my @rc;
711
2
2
3
4
                                foreach my $k (keys %{$self->{'data'}}) {
712
9
9
                                        push @rc, $self->{'data'}->{$k};
713                                }
714
2
9
                                return Return::Set::set_return(\@rc, { type => 'arrayref' });
715                        }
716
0
0
                        return Return::Set::set_return($self->{'data'}, { type => 'arrayref'});
717                        # my @rc = values %{$self->{'data'}};
718                        # return @rc;
719
2
6
                } elsif((scalar(keys %{$params}) == 1) && defined($params->{'entry'}) && !$self->{'no_entry'}) {
720
1
3
                        return Return::Set::set_return([$self->{'data'}->{$params->{'entry'}}], { type => 'arrayref' });
721                }
722        }
723
724
5
5
        my $query;
725
5
4
        my $done_where = 0;
726
727
5
17
        if(($self->{'type'} eq 'CSV') && !$self->{no_entry}) {
728
3
2
                $query = "SELECT * FROM $table WHERE entry IS NOT NULL AND entry NOT LIKE '#%'";
729
3
2
                $done_where = 1;
730        } else {
731
2
2
                $query = "SELECT * FROM $table";
732        }
733
734
5
5
        my @query_args;
735
5
5
4
10
        foreach my $c1(sort keys(%{$params})) { # sort so that the key is always the same
736
3
2
                my $arg = $params->{$c1};
737
3
10
                if(ref($arg)) {
738                        # throw Error::Simple("$query: argument is not a string: " . ref($arg));
739
0
0
                        $self->_fatal("selectall_arrayref(): $query: argument is not a string: ", ref($arg));
740                }
741
3
5
                if(!defined($arg)) {
742
0
0
                        my @call_details = caller(0);
743                        # throw Error::Simple("$query: value for $c1 is not defined in call from " .
744                                # $call_details[2] . ' of ' . $call_details[1]);
745
0
0
                        Carp::croak("$query: value for $c1 is not defined in call from ",
746                                $call_details[2], ' of ', $call_details[1]);
747                }
748
749
3
3
                my $keyword;
750
3
4
                if($done_where) {
751
3
3
                        $keyword = 'AND';
752                } else {
753
0
0
                        $keyword = 'WHERE';
754
0
0
                        $done_where = 1;
755                }
756
3
6
                if($arg =~ /[%_]/) {
757
0
0
                        $query .= " $keyword $c1 LIKE ?";
758                } else {
759
3
4
                        $query .= " $keyword $c1 = ?";
760                }
761
3
4
                push @query_args, $arg;
762        }
763
5
9
        if(!$self->{no_entry}) {
764
4
6
                $query .= ' ORDER BY ' . $self->{'id'};
765        }
766
767
5
8
        if(defined($query_args[0])) {
768
3
7
                $self->_debug("selectall_arrayref $query: ", join(', ', @query_args));
769        } else {
770
2
4
                $self->_debug("selectall_arrayref $query");
771        }
772
773
5
87
        my $key;
774        my $c;
775
5
8
        if($c = $self->{cache}) {
776
0
0
                $key = ref($self) . "::$query array";
777
0
0
                if(defined($query_args[0])) {
778
0
0
                        $key .= ' ' . join(', ', @query_args);
779                }
780
0
0
                if(my $rc = $c->get($key)) {
781
0
0
                        $self->_debug('cache HIT');
782
0
0
                        return $rc;     # We stored a ref to the array
783
784                        # This use of a temporary variable is to avoid
785                        #       "Implicit scalar context for array in return"
786                        # my @rc = @{$rc};
787                        # return @rc;
788                }
789
0
0
                $self->_debug('cache MISS');
790        } else {
791
5
7
                $self->_debug('cache not used');
792        }
793
794
5
70
        if(my $sth = $self->{$table}->prepare($query)) {
795
5
6298
                $sth->execute(@query_args) ||
796                        # throw Error::Simple("$query: @query_args");
797                        croak("$query: @query_args");
798
799
5
9280
                my $rc;
800
5
33
                while(my $href = $sth->fetchrow_hashref()) {
801
27
27
27
661
47
32
                        push @{$rc}, $href if(scalar keys %{$href});
802                }
803
5
104
                if($c) {
804
0
0
                        $c->set($key, $rc, $self->{'cache_duration'});    # Store a ref to the array
805                }
806
807
5
5
8
13
                Data::Reuse::fixate(@{$rc}) if(!$self->{'no_fixate'});
808
809
5
805
                return $rc;
810        }
811
0
0
        $self->_warn("selectall_array failure on $query: @query_args");
812        # throw Error::Simple("$query: @query_args");
813
0
0
        croak("$query: @query_args");
814
815        # my @rc = grep { defined $_ } $self->selectall_array(@_);
816
817        # return if(scalar(@rc) == 0);
818
819        # Data::Reuse::fixate(@rc) if(!$self->{'no_fixate'});
820        # return \@rc;
821}
822
823 - 827
=head2  selectall_hashref

Deprecated misleading legacy name for selectall_arrayref.

=cut
828
829sub selectall_hashref
830{
831
6
1388
        my $self = shift;
832
6
19
        return $self->selectall_arrayref(@_);
833}
834
835 - 844
=head2  selectall_array

Similar to selectall_array but returns an array of hash references.

Con:    Copies more data around than selectall_arrayref
Pro:    Better determination of list vs scalar mode than selectall_arrayref by setting "LIMIT 1"

TODO:   Remove duplicated code

=cut
845
846sub selectall_array
847{
848
7
7
        my $self = shift;
849
850
7
20
        if($self->{'berkeley'}) {
851
0
0
                Carp::croak(ref($self), ': selectall_array is meaningless on a NoSQL database');
852        }
853
854
7
18
        my $params = Params::Get::get_params(undef, \@_);
855
7
92
        my $table = $self->_open_table($params);
856
857
7
15
        if($self->{'data'}) {
858
4
4
4
10
                if(scalar(keys %{$params}) == 0) {
859
3
8
                        $self->_trace("$table: selectall_array fast track return");
860
3
50
                        if(ref($self->{'data'}) eq 'HASH') {
861
3
3
3
9
                                return values %{$self->{'data'}};
862                        }
863
0
0
0
0
                        return @{$self->{'data'}};
864                        # my @rc = values %{$self->{'data'}};
865                        # return @rc;
866
1
6
                } elsif((scalar(keys %{$params}) == 1) && defined($params->{'entry'}) && !$self->{'no_entry'}) {
867
0
0
                        return $self->{'data'}->{$params->{'entry'}};
868                }
869        }
870
871
4
4
        my $query;
872
4
4
        my $done_where = 0;
873
874
4
12
        if(($self->{'type'} eq 'CSV') && !$self->{no_entry}) {
875
2
2
                $query = "SELECT * FROM $table WHERE entry IS NOT NULL AND entry NOT LIKE '#%'";
876
2
2
                $done_where = 1;
877        } else {
878
2
3
                $query = "SELECT * FROM $table";
879        }
880
881
4
3
        my @query_args;
882
4
4
4
9
        foreach my $c1(sort keys(%{$params})) { # sort so that the key is always the same
883
3
3
                my $arg = $params->{$c1};
884
3
5
                if(ref($arg)) {
885                        # throw Error::Simple("$query: argument is not a string: " . ref($arg));
886
0
0
                        $self->_fatal("selectall_array(): $query: argument is not a string: ", ref($arg));
887                }
888
3
5
                if(!defined($arg)) {
889
1
1
                        my @call_details = caller(0);
890                        # throw Error::Simple("$query: value for $c1 is not defined in call from " .
891                                # $call_details[2] . ' of ' . $call_details[1]);
892
1
13
                        Carp::croak("$query: value for $c1 is not defined in call from ",
893                                $call_details[2], ' of ', $call_details[1]);
894                }
895
896
2
2
                my $keyword;
897
2
3
                if($done_where) {
898
1
1
                        $keyword = 'AND';
899                } else {
900
1
1
                        $keyword = 'WHERE';
901
1
1
                        $done_where = 1;
902                }
903
2
4
                if($arg =~ /[%_]/) {
904
0
0
                        $query .= " $keyword $c1 LIKE ?";
905                } else {
906
2
3
                        $query .= " $keyword $c1 = ?";
907                }
908
2
3
                push @query_args, $arg;
909        }
910
3
22
        if(!$self->{no_entry}) {
911
2
5
                $query .= ' ORDER BY ' . $self->{'id'};
912        }
913
3
3
        if(!wantarray) {
914
0
0
                $query .= ' LIMIT 1';
915        }
916
917
3
4
        if(defined($query_args[0])) {
918
2
6
                $self->_debug("selectall_array $query: ", join(', ', @query_args));
919        } else {
920
1
2
                $self->_debug("selectall_array $query");
921        }
922
923
3
54
        my $key;
924        my $c;
925
3
7
        if($c = $self->{cache}) {
926
0
0
                $key = ref($self) . '::' . $query;
927
0
0
                if(wantarray) {
928
0
0
                        $key .= ' array';
929                }
930
0
0
                if(defined($query_args[0])) {
931
0
0
                        $key .= ' ' . join(', ', @query_args);
932                }
933
0
0
                if(my $rc = $c->get($key)) {
934
0
0
                        $self->_debug('cache HIT');
935
0
0
0
0
                        return wantarray ? @{$rc} : $rc;        # We stored a ref to the array
936
937                        # This use of a temporary variable is to avoid
938                        #       "Implicit scalar context for array in return"
939                        # my @rc = @{$rc};
940                        # return @rc;
941                }
942
0
0
                $self->_debug('cache MISS');
943        } else {
944
3
6
                $self->_debug('cache not used');
945        }
946
947
3
44
        if(my $sth = $self->{$table}->prepare($query)) {
948
3
3714
                $sth->execute(@query_args) ||
949                        # throw Error::Simple("$query: @query_args");
950                        croak("$query: @query_args");
951
952
3
6420
                my $rc;
953
3
16
                while(my $href = $sth->fetchrow_hashref()) {
954
21
511
                        return $href if(!wantarray);    # FIXME: Doesn't store in the cache
955
21
21
13
38
                        push @{$rc}, $href;
956                }
957
3
66
                if($c) {
958
0
0
                        $c->set($key, $rc, $self->{'cache_duration'});    # Store a ref to the array
959                }
960
961
3
9
                if($rc) {
962
3
3
5
16
                        Data::Reuse::fixate(@{$rc}) if(!$self->{'no_fixate'});
963
3
3
625
14
                        return @{$rc};
964                }
965
0
0
                return;
966        }
967
0
0
        $self->_warn("selectall_array failure on $query: @query_args");
968        # throw Error::Simple("$query: @query_args");
969
0
0
        croak("$query: @query_args");
970}
971
972 - 976
=head2  selectall_hash

Deprecated misleading legacy name for selectall_array.

=cut
977
978sub selectall_hash
979{
980
7
1873
        my $self = shift;
981
7
29
        return $self->selectall_array(@_);
982}
983
984 - 988
=head2  count

Return the number items/rows matching the given criteria

=cut
989
990sub count
991{
992
5
831
        my $self = shift;
993
994
5
9
        if($self->{'berkeley'}) {
995
0
0
                Carp::croak(ref($self), ': count is meaningless on a NoSQL database');
996        }
997
998
5
13
        my $params = Params::Get::get_params(undef, \@_);
999
5
54
        my $table = $self->_open_table($params);
1000
1001
5
9
        if($self->{'data'}) {
1002
3
3
2
6
                if(scalar(keys %{$params}) == 0) {
1003
2
5
                        $self->_trace("$table: count fast track return");
1004
2
25
                        if(ref($self->{'data'}) eq 'HASH') {
1005
2
2
2
6
                                return scalar keys %{$self->{'data'}};
1006                        }
1007
0
0
0
0
                        return scalar @{$self->{'data'}};
1008
1
4
                } elsif((scalar(keys %{$params}) == 1) && defined($params->{'entry'}) && !$self->{'no_entry'}) {
1009
0
0
                        return $self->{'data'}->{$params->{'entry'}} ? 1 : 0;
1010                }
1011        }
1012
1013
3
4
        my $query;
1014
3
3
        my $done_where = 0;
1015
1016
3
10
        if(($self->{'type'} eq 'CSV') && !$self->{no_entry}) {
1017
1
1
                $query = "SELECT COUNT(*) FROM $table WHERE entry IS NOT NULL AND entry NOT LIKE '#%'";
1018
1
0
                $done_where = 1;
1019        } elsif($self->{no_entry}) {
1020
2
3
                $query = "SELECT COUNT(*) FROM $table";
1021        } else {
1022
0
0
                $query = "SELECT COUNT(entry) FROM $table";
1023        }
1024
1025
3
4
        my @query_args;
1026
3
3
3
6
        foreach my $c1(sort keys(%{$params})) { # sort so that the key is always the same
1027
1
2
                my $arg = $params->{$c1};
1028
1
1
                if(ref($arg)) {
1029                        # throw Error::Simple("$query: argument is not a string: " . ref($arg));
1030
0
0
                        $self->_fatal("count(): $query: argument is not a string: ", ref($arg));
1031                }
1032
1
2
                if(!defined($arg)) {
1033
0
0
                        my @call_details = caller(0);
1034                        # throw Error::Simple("$query: value for $c1 is not defined in call from " .
1035                                # $call_details[2] . ' of ' . $call_details[1]);
1036
0
0
                        Carp::croak("$query: value for $c1 is not defined in call from ",
1037                                $call_details[2], ' of ', $call_details[1]);
1038                }
1039
1040
1
1
                my $keyword;
1041
1
1
                if($done_where) {
1042
1
2
                        $keyword = 'AND';
1043                } else {
1044
0
0
                        $keyword = 'WHERE';
1045
0
0
                        $done_where = 1;
1046                }
1047
1
2
                if($arg =~ /[%_]/) {
1048
0
0
                        $query .= " $keyword $c1 LIKE ?";
1049                } else {
1050
1
1
                        $query .= " $keyword $c1 = ?";
1051                }
1052
1
2
                push @query_args, $arg;
1053        }
1054
3
6
        if(!$self->{no_entry}) {
1055
1
1
                $query .= ' ORDER BY ' . $self->{'id'};
1056        }
1057
1058
3
5
        if(defined($query_args[0])) {
1059
1
3
                $self->_debug("count $query: ", join(', ', @query_args));
1060        } else {
1061
2
4
                $self->_debug("count $query");
1062        }
1063
1064
3
60
        my $key;
1065        my $c;
1066
3
8
        if($c = $self->{cache}) {
1067
0
0
                $key = ref($self) . '::' . $query;
1068
0
0
                $key =~ s/COUNT\((.+?)\)/$1/;
1069
0
0
                $key .= ' array';
1070
0
0
                if(defined($query_args[0])) {
1071
0
0
                        $key .= ' ' . join(', ', @query_args);
1072                }
1073
0
0
                if(my $rc = $c->get($key)) {
1074                        # Unlikely
1075
0
0
                        $self->_debug('cache HIT');
1076
0
0
0
0
                        return scalar @{$rc};   # We stored a ref to the array
1077                }
1078
0
0
                $self->_debug('cache MISS');
1079        } else {
1080
3
3
                $self->_debug('cache not used');
1081        }
1082
1083
3
40
        if(my $sth = $self->{$table}->prepare($query)) {
1084
3
1878
                $sth->execute(@query_args) ||
1085                        # throw Error::Simple("$query: @query_args");
1086                        croak("$query: @query_args");
1087
1088
3
3986
                my $count = $sth->fetchrow_arrayref()->[0];
1089
1090
3
61
                $sth->finish();
1091
1092
3
14
                return $count;
1093        }
1094
0
0
        $self->_warn("count failure on $query: @query_args");
1095        # throw Error::Simple("$query: @query_args");
1096
0
0
        croak("$query: @query_args");
1097}
1098
1099 - 1108
=head2  fetchrow_hashref

Returns a hash reference for a single row in a table.

Special argument: table: determines the table to read from if not the default,
which is worked out from the class name

When no_entry is not set allow just one argument to be given: the entry value.

=cut
1109
1110sub fetchrow_hashref {
1111
8
2071
        my $self = shift;
1112
1113
8
47
        $self->_trace('Entering fetchrow_hashref');
1114
1115
8
116
        my $params;
1116
1117
8
19
        if(!$self->{'no_entry'}) {
1118
7
21
                $params = Params::Get::get_params('entry', @_);
1119        } else {
1120
1
3
                $params = Params::Get::get_params(undef, @_);
1121        }
1122
1123
8
130
        my $table = $self->_open_table($params);
1124
1125        # ::diag($self->{'type'});
1126
8
7
25
26
        if($self->{'data'} && (!$self->{'no_entry'}) && (scalar keys(%{$params}) == 1) && defined($params->{'entry'})) {
1127
4
6
                $self->_debug('Fast return from slurped data');
1128
4
55
                return $self->{'data'}->{$params->{'entry'}};
1129        }
1130
1131
4
11
        if($self->{'berkeley'}) {
1132                # print STDERR ">>>>>>>>>>>>\n";
1133                # ::diag(Data::Dumper->new([$self->{'berkeley'}])->Dump());
1134
0
0
0
0
                if((!$self->{'no_entry'}) && (scalar keys(%{$params}) == 1) && defined($params->{'entry'})) {
1135
0
0
                        return { entry => $self->{'berkeley'}->{$params->{'entry'}} };
1136                }
1137
0
0
                my $id = $self->{'id'};
1138
0
0
0
0
                if($self->{'no_entry'} && (scalar keys(%{$params}) == 1) && defined($id) && defined($params->{$id})) {
1139
0
0
                        if(my $rc = $self->{'berkeley'}->{$params->{$id}}) {
1140
0
0
                                return { $params->{$id} => $rc }  # Return key->value as a hash pair
1141                        }
1142
0
0
                        return;
1143                }
1144
0
0
                Carp::croak(ref($self), ': fetchrow_hashref is meaningless on a NoSQL database');
1145        }
1146
1147
4
4
        my $query = 'SELECT * FROM ';
1148
4
10
        if(my $t = delete $params->{'table'}) {
1149
0
0
                $query .= $t;
1150        } else {
1151
4
6
                $query .= $table;
1152        }
1153
4
4
        my $done_where = 0;
1154
1155
4
14
        if(($self->{'type'} eq 'CSV') && !$self->{no_entry}) {
1156
3
5
                $query .= ' WHERE ' . $self->{'id'} . ' IS NOT NULL AND ' . $self->{'id'} . " NOT LIKE '#%'";
1157
3
2
                $done_where = 1;
1158        }
1159
4
4
        my @query_args;
1160
4
4
5
8
        foreach my $c1(sort keys(%{$params})) { # sort so that the key is always the same
1161
4
8
                if(my $arg = $params->{$c1}) {
1162
3
3
                        my $keyword;
1163
1164
3
5
                        if(ref($arg)) {
1165                                # throw Error::Simple("$query: argument is not a string: " . ref($arg));
1166
0
0
                                $self->_fatal("fetchrow_hash(): $query: argument is not a string: ", ref($arg));
1167                        }
1168
1169
3
4
                        if($done_where) {
1170
2
2
                                $keyword = 'AND';
1171                        } else {
1172
1
1
                                $keyword = 'WHERE';
1173
1
1
                                $done_where = 1;
1174                        }
1175
3
7
                        if($arg =~ /[%_]/) {
1176
0
0
                                $query .= " $keyword $c1 LIKE ?";
1177                        } else {
1178
3
5
                                $query .= " $keyword $c1 = ?";
1179                        }
1180
3
4
                        push @query_args, $arg;
1181                } elsif(!defined($arg)) {
1182
1
4
                        my @call_details = caller(0);
1183                        # throw Error::Simple("$query: value for $c1 is not defined in call from " .
1184                                # $call_details[2] . ' of ' . $call_details[1]);
1185
1
8
                        Carp::croak("$query: value for $c1 is not defined in call from ",
1186                                $call_details[2], ' of ', $call_details[1]);
1187                }
1188        }
1189        # $query .= ' ORDER BY entry LIMIT 1';
1190
3
3
        $query .= ' LIMIT 1';
1191
3
8
        if(defined($query_args[0])) {
1192
3
5
                my @call_details = caller(0);
1193
3
33
                $self->_debug("fetchrow_hashref $query: ", join(', ', @query_args),
1194                        ' called from ', $call_details[2], ' of ', $call_details[1]);
1195        } else {
1196
0
0
                $self->_debug("fetchrow_hashref $query");
1197        }
1198
3
42
        my $key = ref($self) . '::';
1199
3
3
        if(defined($query_args[0])) {
1200
3
6
                if(wantarray) {
1201
0
0
                        $key .= 'array ';
1202                }
1203
3
6
                $key .= "fetchrow $query " . join(', ', @query_args);
1204        } else {
1205
0
0
                $key .= "fetchrow $query";
1206        }
1207
3
3
        my $c;
1208
3
6
        if($c = $self->{cache}) {
1209
0
0
                if(my $rc = $c->get($key)) {
1210
0
0
                        if(wantarray) {
1211
0
0
                                if(ref($rc) eq 'ARRAY') {
1212
0
0
0
0
                                        return @{$rc};  # We stored a ref to the array
1213                                }
1214                        } else {
1215
0
0
                                return $rc;
1216                        }
1217                }
1218        }
1219
1220
3
38
        my $sth = $self->{$table}->prepare($query) or die $self->{$table}->errstr();
1221        # $sth->execute(@query_args) || throw Error::Simple("$query: @query_args");
1222
3
10925
        $sth->execute(@query_args) || croak("$query: @query_args");
1223
3
4459
        my $rc = $sth->fetchrow_hashref();
1224
3
150
        if($c) {
1225
0
0
                if($rc) {
1226
0
0
                        $self->_debug("stash $key=>$rc in the cache for ", $self->{'cache_duration'});
1227
0
0
                        $self->_debug("returns ", Data::Dumper->new([$rc])->Dump());
1228                } else {
1229
0
0
                        $self->_debug("Stash $key=>undef in the cache for ", $self->{'cache_duration'});
1230                }
1231
0
0
                $c->set($key, $rc, $self->{'cache_duration'});
1232        }
1233
3
25
        return $rc;
1234}
1235
1236 - 1249
=head2  execute

Execute the given SQL query on the database.
In an array context, returns an array of hash refs,
in a scalar context returns a hash of the first row

On CSV tables without no_entry, it may help to add
"WHERE entry IS NOT NULL AND entry NOT LIKE '#%'"
to the query.

If the data have been slurped,
this will still work by accessing that actual database.

=cut
1250
1251sub execute
1252{
1253
2
1197
        my $self = shift;
1254
1255
2
6
        if($self->{'berkeley'}) {
1256
0
0
                Carp::croak(ref($self), ': execute is meaningless on a NoSQL database');
1257        }
1258
1259
2
7
        my $args = Params::Get::get_params('query', @_);
1260        # Ensure the 'query' parameter is provided
1261        Carp::croak(__PACKAGE__, ': Usage: execute(query => $query)')
1262
2
32
                unless defined $args->{'query'};
1263
1264
2
5
        my $table = $self->_open_table($args);
1265
1266
2
3
        my $query = $args->{'query'};
1267
1268        # Append "FROM <table>" if missing
1269
2
8
        $query .= " FROM $table" unless $query =~ /\sFROM\s/i;
1270
1271        # Log the query if a logger is available
1272
2
7
        $self->_debug("execute $query");
1273
1274        # Prepare and execute the query
1275
2
56
        my $sth = $self->{$table}->prepare($query);
1276
2
1505
        $sth->execute() or croak($query);    # Die with the query in case of error
1277
1278        # Fetch the results
1279
2
1946
        my @results;
1280
2
12
        while (my $row = $sth->fetchrow_hashref()) {
1281                # Return a single hashref if scalar context is expected
1282
6
159
                return $row unless wantarray;
1283
6
14
                push @results, $row;
1284        }
1285
1286        # Return all rows as an array in list context
1287
2
40
        return @results;
1288}
1289
1290 - 1294
=head2 updated

Returns the timestamp of the last database update.

=cut
1295
1296sub updated {
1297
1
488
        my $self = shift;
1298
1299
1
3
        return $self->{'_updated'};
1300}
1301
1302 - 1323
=head2 AUTOLOAD

Directly access a database column.

Returns all entries in a column, a single entry based on criteria.
Uses cached data if available.

Returns an array of the matches,
or only the first when called in scalar context

If the database has a column called "entry" you can do a quick lookup with

    my $value = $foo->column('123'); # where "column" is the value you're after

    my @entries = $foo->entry();
    print 'There are ', scalar(@entries), " entries in the database\n";

Set distinct or unique to 1 if you're after a unique list.

Throws an error in slurp mode when an invalid column name is given.

=cut
1324
1325sub AUTOLOAD {
1326
30
7108
        our $AUTOLOAD;
1327
30
121
        my ($column) = $AUTOLOAD =~ /::(\w+)$/;
1328
1329
30
64
        return if($column eq 'DESTROY');
1330
1331
30
67
        my $self = shift or return;
1332
1333
30
52
        Carp::croak(__PACKAGE__, ": Unknown column $column") if(!ref($self));
1334
1335        # Allow the AUTOLOAD feature to be disabled
1336
30
66
        Carp::croak(__PACKAGE__, ": Unknown column $column") if(exists($self->{'auto_load'}) && boolean($self->{'auto_load'})->isFalse());
1337
1338        # Validate column name - only allow safe column name
1339
30
78
        Carp::croak(__PACKAGE__, ": Invalid column name: $column") unless $column =~ /^[a-zA-Z_][a-zA-Z0-9_]*$/;
1340
1341
30
69
        my $table = $self->_open_table();
1342
1343
29
27
        my %params;
1344
29
80
        if(ref($_[0]) eq 'HASH') {
1345
2
2
2
4
                %params = %{$_[0]};
1346        } elsif((scalar(@_) % 2) == 0) {
1347
11
16
                %params = @_;
1348        } elsif(scalar(@_) == 1) {
1349                # Don't error on key-value databases, since there's no idea of columns
1350
16
49
                if($self->{'no_entry'} && !$self->{'berkeley'}) {
1351
0
0
                        Carp::croak(ref($self), "::($_[0]): ", $self->{'id'}, ' is not a column');
1352                }
1353
16
24
                $params{'entry'} = shift;
1354        }
1355
1356
29
48
        if($self->{'berkeley'}) {
1357
0
0
                if(my $id = $self->{'id'}) {
1358
0
0
                        return $self->{'berkeley'}->{$params{$id}};
1359                }
1360
0
0
                return $self->{'berkeley'}->{$params{'entry'}};
1361        }
1362
1363
29
41
        croak('Where did the data come from?') if(!defined($self->{'type'}));
1364
29
26
        my $query;
1365
29
22
        my $done_where = 0;
1366
29
69
        my $distinct = delete($params{'distinct'}) || delete($params{'unique'});
1367
1368
29
56
        if(wantarray && !$distinct) {
1369
5
17
                if(((scalar keys %params) == 0) && (my $data = $self->{'data'})) {
1370                        # Return all the entries in the column
1371
3
12
3
3
19
5
                        return map { $_->{$column} } values %{$data};
1372                }
1373
2
10
                if(($self->{'type'} eq 'CSV') && !$self->{no_entry}) {
1374
1
3
                        $query = "SELECT $column FROM $table WHERE " . $self->{'id'} . " IS NOT NULL AND entry NOT LIKE '#%'";
1375
1
1
                        $done_where = 1;
1376                } else {
1377
1
2
                        $query = "SELECT $column FROM $table";
1378                }
1379        } else {
1380
24
36
                if(my $data = $self->{'data'}) {
1381                        # The data has been read in using Text::xSV::Slurp,
1382                        #       so no need to do any SQL
1383
18
22
                        $self->_debug('AUTOLOAD using slurped data');
1384
18
419
                        if($self->{'no_entry'}) {
1385
0
0
                                $self->_debug('no_entry is set');
1386
0
0
                                my ($key, $value) = %params;
1387
0
0
                                if(defined($key)) {
1388
0
0
                                        $self->_debug("key = $key, value = $value, column = $column");
1389
0
0
0
0
                                        foreach my $row(@{$data}) {
1390
0
0
                                                if(defined($row->{$key}) && ($row->{$key} eq $value) && (my $rc = $row->{$column})) {
1391
0
0
                                                        if(defined($rc)) {
1392
0
0
                                                                $self->_trace(__LINE__, ": AUTOLOAD $key: return '$rc' from slurped data");
1393                                                        } else {
1394
0
0
                                                                $self->_trace(__LINE__, ": AUTOLOAD $key: return undef from slurped data");
1395                                                        }
1396
0
0
                                                        return $rc
1397                                                }
1398                                        }
1399
0
0
                                        $self->_debug('not found in slurped data');
1400                                }
1401                        } elsif(((scalar keys %params) == 1) && defined(my $key = $params{'entry'})) {
1402                                # Look up the key
1403
1404                                # This weird code is to stop the data hash becoming polluted with empty
1405                                #       values as we look things up
1406                                # my $rc = $data->{$key}->{$column};
1407
14
12
                                my $rc;
1408
14
20
                                if(defined(my $hash = $data->{$key})) {
1409                                        # Look up the key
1410
10
15
                                        if(!exists($hash->{$column})) {
1411
1
13
                                                Carp::croak(__PACKAGE__, ": There is no column $column in $table");
1412                                        }
1413
9
11
                                        $rc = $hash->{$column};
1414                                }
1415
13
14
                                if(defined($rc)) {
1416
8
34
                                        $self->_trace(__LINE__, ": AUTOLOAD $key: return '$rc' from slurped data");
1417                                } else {
1418
5
10
                                        $self->_trace(__LINE__, ": AUTOLOAD $key: return undef from slurped data");
1419                                }
1420
13
273
                                return $rc
1421                        } elsif((scalar keys %params) == 0) {
1422
2
5
                                if(wantarray) {
1423
2
4
                                        if($distinct) {
1424                                                # https://stackoverflow.com/questions/7651/how-do-i-remove-duplicate-items-from-an-array-in-perl
1425
2
7
7
2
3
10
8
3
                                                my %h = map { $_, 1 } map { $_->{$column} } values %{$data};
1426
2
8
                                                return keys %h;
1427                                        }
1428
0
0
0
0
0
0
                                        return map { $_->{$column} } values %{$data}
1429                                }
1430                                # FIXME - this works but really isn't the right way to do it
1431
0
0
0
0
                                foreach my $v (values %{$data}) {
1432
0
0
                                        return $v->{$column}
1433                                }
1434                        } else {
1435                                # It's keyed, but we're not querying off it
1436
2
4
                                my ($key, $value) = %params;
1437
2
2
2
4
                                foreach my $row (values %{$data}) {
1438
3
14
                                        if(defined($row->{$key}) && ($row->{$key} eq $value) && (my $rc = $row->{$column})) {
1439
2
4
                                                if(defined($rc)) {
1440
2
4
                                                        $self->_trace(__LINE__, ": AUTOLOAD $key: return '$rc' from slurped data");
1441                                                } else {
1442
0
0
                                                        $self->_trace(__LINE__, ": AUTOLOAD $key: return undef from slurped data");
1443                                                }
1444
2
43
                                                return $rc
1445                                        }
1446                                }
1447                        }
1448                        return
1449
0
0
                }
1450                # Data has not been slurped in
1451
6
19
                if(($self->{'type'} eq 'CSV') && !$self->{no_entry}) {
1452
0
0
                        $query = "SELECT DISTINCT $column FROM $table WHERE " . $self->{'id'} . " IS NOT NULL AND entry NOT LIKE '#%'";
1453
0
0
                        $done_where = 1;
1454                } else {
1455
6
9
                        $query = "SELECT DISTINCT $column FROM $table";
1456                }
1457        }
1458
8
5
        my @args;
1459
8
21
        while(my ($key, $value) = each %params) {
1460
6
16
                $self->_debug(__PACKAGE__, ": AUTOLOAD adding key/value pair $key=>$value");
1461
6
384
                if(defined($value)) {
1462
6
27
                        if($done_where) {
1463
1
2
                                $query .= " AND $key = ?";
1464                        } else {
1465
5
8
                                $query .= " WHERE $key = ?";
1466
5
6
                                $done_where = 1;
1467                        }
1468
6
14
                        push @args, $value;
1469                } else {
1470
0
0
                        $self->_debug("AUTOLOAD params $key isn't defined");
1471
0
0
                        if($done_where) {
1472
0
0
                                $query .= " AND $key IS NULL";
1473                        } else {
1474
0
0
                                $query .= " WHERE $key IS NULL";
1475
0
0
                                $done_where = 1;
1476                        }
1477                }
1478        }
1479
8
12
        if(wantarray) {
1480
3
5
                $query .= " ORDER BY $column";
1481        } else {
1482
5
5
                $query .= ' LIMIT 1';
1483        }
1484
8
22
        if(scalar(@args) && $args[0]) {
1485
6
25
                $self->_debug("AUTOLOAD $query: ", join(', ', @args));
1486        } else {
1487
2
5
                $self->_debug("AUTOLOAD $query");
1488        }
1489
8
389
        my $cache;
1490
8
12
        my $key = ref($self) . '::';
1491
8
16
        if($cache = $self->{cache}) {
1492
0
0
                if(wantarray) {
1493
0
0
                        $key .= 'array ';
1494                }
1495
0
0
                if(defined($args[0])) {
1496
0
0
                        $key .= "fetchrow $query " . join(', ', @args);
1497                } else {
1498
0
0
                        $key .= "fetchrow $query";
1499                }
1500
0
0
                if(my $rc = $cache->get($key)) {
1501
0
0
                        $self->_debug('cache HIT');
1502
0
0
0
0
                        return wantarray ? @{$rc} : $rc;        # We stored a ref to the array
1503                }
1504
0
0
                $self->_debug('cache MISS');
1505        } else {
1506
8
10
                $self->_debug('cache not used');
1507        }
1508        # my $sth = $self->{$table}->prepare($query) || throw Error::Simple($query);
1509
8
405
        my $sth = $self->{$table}->prepare($query) || croak($query);
1510        # $sth->execute(@args) || throw Error::Simple($query);
1511
8
15683
        $sth->execute(@args) || croak($query);
1512
1513
8
20125
        if(wantarray) {
1514
3
10
3
3
204
31
                my @rc = map { $_->[0] } @{$sth->fetchall_arrayref()};
1515
3
5
                if($cache) {
1516
0
0
                        $cache->set($key, \@rc, $self->{'cache_duration'});       # Store a ref to the array
1517                }
1518
3
15
                Data::Reuse::fixate(@rc) if(!$self->{'no_fixate'});
1519
3
211
                return @rc;
1520        }
1521
5
43
        my $rc = $sth->fetchrow_array();     # Return the first match only
1522
5
111
        $sth->finish();
1523
5
21
        if($cache) {
1524
0
0
                return $cache->set($key, $rc, $self->{'cache_duration'});
1525        }
1526
5
42
        return $rc;
1527}
1528
1529sub DESTROY
1530{
1531
31
9323
        if(defined($^V) && ($^V ge 'v5.14.0')) {
1532
31
67
                return if ${^GLOBAL_PHASE} eq 'DESTRUCT';       # >= 5.14.0 only
1533        }
1534
31
44
        my $self = shift;
1535
1536        # Clean up temporary file
1537
31
51
        if($self->{'temp'}) {
1538
0
0
                unlink delete $self->{'temp'};
1539        }
1540
1541        # Clean up database handles
1542
31
66
        my $table_name = $self->{'table'} || ref($self);
1543
31
86
        $table_name =~ s/.*:://;
1544
1545
31
57
        if(my $dbh = delete $self->{$table_name}) {
1546
15
165
                $dbh->disconnect() if $dbh->can('disconnect');
1547
15
654
                $dbh->finish() if $dbh->can('finish');
1548        }
1549
1550        # Clean up Berkeley DB
1551
31
40
        if($self->{'berkeley'}) {
1552
0
0
                eval {
1553
0
0
0
0
                        untie %{$self->{'berkeley'}};
1554                };
1555
0
0
                delete $self->{'berkeley'};
1556        }
1557
1558        # Clear all other attributes to break potential circular references
1559
31
64
        foreach my $key (keys %$self) {
1560
294
823
                delete $self->{$key};
1561        }
1562}
1563
1564# Determine the table and open the database
1565sub _open_table
1566{
1567
60
73
        my($self, $params) = @_;
1568
1569        # Get table name (remove package name prefix if present)
1570
60
230
        my $table = $params->{'table'} || $self->{'table'} || ref($self);
1571
60
150
        $table =~ s/.*:://;
1572
1573        # Open a connection if it's not already open
1574
60
179
        $self->_open() if((!$self->{$table}) && (!$self->{'data'}));
1575
1576
59
92
        return $table;
1577}
1578
1579# Determine whether a given file is a valid Berkeley DB file.
1580# It combines a fast preliminary check with a more thorough validation step for accuracy.
1581# It looks for the magic number at both byte 0 and byte 12
1582# TODO: Combine _db_0 and _db_12 as they are very similar routines
1583sub _is_berkeley_db {
1584
18
32
        my ($self, $file) = @_;
1585
1586        # Step 1: Check magic number
1587
18
243
        open my $fh, '<', $file or return 0;
1588
0
0
        binmode $fh;
1589
1590
0
0
        my $is_db = (($self->_is_berkeley_db_0($fh)) || ($self->_is_berkeley_db_12($fh)));
1591
0
0
        close $fh;
1592
1593
0
0
        if($is_db) {
1594                # Step 2: Attempt to open as Berkeley DB
1595
1596
0
0
                require DB_File && DB_File->import();
1597
1598
0
0
                my %bdb;
1599
0
0
                if(tie %bdb, 'DB_File', $file, O_RDONLY, 0644, $DB_File::DB_HASH) {
1600                        # untie %db;
1601
0
0
                        $self->{'berkeley'} = \%bdb;
1602
0
0
                        return 1;       # Successfully identified as a Berkeley DB file
1603                }
1604        }
1605
0
0
        return 0;
1606}
1607
1608# Determine whether a given file is a valid Berkeley DB file.
1609# It combines a fast preliminary check with a more thorough validation step for accuracy.
1610sub _is_berkeley_db_0
1611{
1612
0
0
        my ($self, $fh) = @_;
1613
1614        # Read the first 4 bytes (magic number)
1615
0
0
        read($fh, my $magic_bytes, 4) == 4 or return 0;
1616
1617        # Unpack both big-endian and little-endian values
1618
0
0
        my $magic_be = unpack('N', $magic_bytes);       # Big-endian
1619
0
0
        my $magic_le = unpack('V', $magic_bytes);       # Little-endian
1620
1621        # Known Berkeley DB magic numbers (in both endian formats)
1622
0
0
0
0
        my %known_magic = map { $_ => 1 } (
1623                0x00061561,     # Btree
1624                0x00053162,     # Hash
1625                0x00042253,     # Queue
1626                0x00052444,     # Recno
1627        );
1628
1629
0
0
        return($known_magic{$magic_be} || $known_magic{$magic_le});
1630}
1631
1632sub _is_berkeley_db_12
1633{
1634
0
0
        my ($self, $fh) = @_;
1635
0
0
        my $header;
1636
1637
0
0
        seek $fh, 12, 0 or return 0;
1638
0
0
        read($fh, $header, 4) or return 0;
1639
1640
0
0
        $header = substr(unpack('H*', $header), 0, 4);
1641
1642        # Berkeley DB magic numbers
1643
0
0
        return($header eq '6115' || $header eq '1561'); # Btree
1644}
1645
1646# Log and remember a message
1647sub _log
1648{
1649
187
228
        my ($self, $level, @messages) = @_;
1650
1651        # FIXME: add caller's function
1652        # if(($level eq 'warn') || ($level eq 'notice')) {
1653
187
187
139
511
                push @{$self->{'messages'}}, { level => $level, message => join('', grep defined, @messages) };
1654        # }
1655
1656
187
377
        if(scalar(@messages) && (my $logger = $self->{'logger'})) {
1657
187
497
                $self->{'logger'}->$level(join('', grep defined, @messages));
1658        }
1659}
1660
1661sub _debug {
1662
138
263
        my $self = shift;
1663
138
200
        $self->_log('debug', @_);
1664}
1665
1666sub _trace {
1667
48
43
        my $self = shift;
1668
48
91
        $self->_log('trace', @_);
1669}
1670
1671# Emit a warning message somewhere
1672sub _warn {
1673
0
0
        my $self = shift;
1674
0
0
        my $params = Params::Get::get_params('warning', @_);
1675
1676
0
0
        $self->_log('warn', $params->{'warning'});
1677
0
0
        Carp::carp(join('', grep defined, $params->{'warning'}));
1678}
1679
1680# Die
1681sub _fatal {
1682
1
1
        my $self = shift;
1683
1
1
        my $params = Params::Get::get_params('warning', @_);
1684
1685
1
10
        $self->_log('error', $params->{'warning'});
1686
1
250
        Carp::croak(join('', grep defined, $params->{'warning'}));
1687}
1688
1689 - 1741
=head1 AUTHOR

Nigel Horne, C<< <njh at nigelhorne.com> >>

=head1 SUPPORT

This module is provided as-is without any warranty.

Please report any bugs or feature requests to C<bug-database-abstraction at rt.cpan.org>,
or through the web interface at
L<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Database-Abstraction>.
I will be notified, and then you'll
automatically be notified of progress on your bug as I make changes.

=head1 BUGS

The default delimiter for CSV files is set to '!', not ',' for historical reasons.
I really ought to fix that.

It would be nice for the key column to be called key, not entry,
however key's a reserved word in SQL.

The no_entry parameter should be no_id.

XML slurping is hard,
so if XML fails for you on a small file force non-slurping mode with

    $foo = MyPackageName::Database::Foo->new({
        directory => '/var/dat',
        max_slurp_size => 0  # force to not use slurp and therefore to use SQL
    });

=head1 SEE ALSO

=over 4

=item * Test coverage report: L<https://nigelhorne.github.io/Database-Abstraction/coverage/>

=back

=head1 LICENSE AND COPYRIGHT

Copyright 2015-2025 Nigel Horne.

This program is released under the following licence: GPL2.
Usage is subject to licence terms.
The licence terms of this software are as follows:
Personal single user, single computer use: GPL2
All other users (for example, Commercial, Charity, Educational, Government)
must apply in writing for a licence for use from Nigel Horne at the
above e-mail.

=cut
1742
17431;