lib/Schema/Validator.pm

Structural Coverage (Approximate)

TER1 (Statement): 100.00%
TER2 (Branch): 100.00%
TER3 (LCSAJ): 100.0% (21/21)
Approximate LCSAJ segments: 49

LCSAJ Legend

● Covered: this LCSAJ path was executed during testing.

● Not covered: this LCSAJ path was never executed. These are the paths to focus on.

Multiple dots on a line indicate that multiple control-flow paths begin at that line. Hovering over any dot shows:

        start → end → jump
        

Uncovered paths show [NOT COVERED] in the tooltip.

Mutant Testing Legend

Survived (tests missed this)
Killed (tests detected this)
No mutation
    1: package Schema::Validator;
    2: 
    3: # ---------------------------------------------------------------------------
    4: # Schema::Validator -- ISO 8601 datetime validation and Schema.org vocabulary
    5: # loading.  Purely functional; all symbols are opt-in via import list.
    6: # ---------------------------------------------------------------------------
    7: 
    8: use strict;
    9: use warnings;
   10: use autodie qw(:all);
   11: 
   12: use Carp qw(carp croak);
   13: use DateTime::Format::ISO8601;
   14: use Encode qw(decode encode);
   15: use File::Spec;
   16: use JSON::MaybeXS qw(decode_json);
   17: use LWP::UserAgent;
   18: use Params::Get qw(get_params);
   19: use Params::Validate::Strict qw(validate_strict);
   20: use Readonly;
   21: use Scalar::Util qw(reftype);
   22: 
   23: use base 'Exporter';
   24: 
   25: # Only these two symbols may be imported by callers via 'use ... qw(...)'.
   26: our @EXPORT_OK = qw(is_valid_datetime load_dynamic_vocabulary);
   27: 
   28: our $VERSION = '0.03';
   29: 
   30: # ---------------------------------------------------------------------------
   31: # Package globals: both are populated as a side-effect of
   32: # load_dynamic_vocabulary().  Callers may read them after that call.
   33: # ---------------------------------------------------------------------------
   34: 
   35: # rdfs:Class items from the Schema.org JSON-LD graph, keyed by class label
   36: our %dynamic_schema;
   37: 
   38: # rdf:Property items from the Schema.org JSON-LD graph, keyed by property label
   39: our %dynamic_properties;
   40: 
   41: # ===========================================================================
   42: # CONSTANTS
   43: # ===========================================================================
   44: # All magic strings and numbers are confined here; nothing below uses bare
   45: # literals.  Every constant mirrors a key in %config so runtime overrides
   46: # are possible without re-opening the Readonly namespace.
   47: # ---------------------------------------------------------------------------
   48: 
   49: # Default cache directory: $CACHEDIR env var if set, otherwise the system
   50: # temporary directory.  Evaluated once at module load time.
   51: Readonly::Scalar my $DEFAULT_CACHE_DIR =>
   52: 	(defined $ENV{CACHEDIR} && length $ENV{CACHEDIR})
   53: 		? $ENV{CACHEDIR}
   54: 		: File::Spec->tmpdir();
   55: 
   56: # Default cache filename -- stored in $DEFAULT_CACHE_DIR, never in CWD.
   57: Readonly::Scalar my $DEFAULT_CACHE_FILE =>
   58: 	File::Spec->catfile($DEFAULT_CACHE_DIR, 'schemaorg_dynamic_vocabulary.jsonld');
   59: 
   60: # 86400 == 60 * 60 * 24: cache is considered fresh for one full day.
   61: Readonly::Scalar my $DEFAULT_CACHE_DURATION => 86_400;
   62: 
   63: # Canonical URL for the Schema.org full vocabulary in JSON-LD format.
   64: Readonly::Scalar my $DEFAULT_VOCAB_URL => 'https://schema.org/version/latest/schemaorg-current-https.jsonld';
   65: 
   66: # HTTP timeout for the vocabulary download request, in seconds.
   67: Readonly::Scalar my $DEFAULT_UA_TIMEOUT => 30;
   68: 
   69: # JSON-LD structural keys and RDF type labels used when traversing @graph.
   70: Readonly::Scalar my $AT_GRAPH        => '@graph';
   71: Readonly::Scalar my $RDF_CLASS       => 'rdfs:Class';
   72: Readonly::Scalar my $RDF_PROPERTY    => 'rdf:Property';
   73: Readonly::Scalar my $RDFS_LABEL      => 'rdfs:label';
   74: Readonly::Scalar my $RDFS_LABEL_FULL => 'http://www.w3.org/2000/01/rdf-schema#label';
   75: 
   76: # ===========================================================================
   77: # CONFIGURATION
   78: # ===========================================================================
   79: # Callers may override any key before calling an exported function, or inject
   80: # a full replacement via Object::Configure->configure('Schema::Validator', \%h).
   81: # ---------------------------------------------------------------------------
   82: our %config = (
   83: 	cache_file     => $DEFAULT_CACHE_FILE,
   84: 	cache_duration => $DEFAULT_CACHE_DURATION,
   85: 	vocab_url      => $DEFAULT_VOCAB_URL,
   86: 	ua_timeout     => $DEFAULT_UA_TIMEOUT,
   87: );
   88: 
   89: # ===========================================================================
   90: # PUBLIC INTERFACE (POD + code)
   91: # ===========================================================================
   92: 
   93: =head1 NAME
   94: 
   95: Schema::Validator - Tools for validating and loading Schema.org vocabulary definitions
   96: 
   97: =head1 VERSION
   98: 
   99: Version 0.03
  100: 
  101: =head1 SYNOPSIS
  102: 
  103:     use Schema::Validator qw(is_valid_datetime load_dynamic_vocabulary);
  104: 
  105:     # Validate a date or datetime string
  106:     if (is_valid_datetime('2024-11-14')) {
  107:         print "Valid date\n";
  108:     }
  109: 
  110:     # Load and query the Schema.org vocabulary
  111:     my $classes = load_dynamic_vocabulary();
  112:     if (exists $classes->{'Person'}) {
  113:         print "Person class is defined\n";
  114:     }
  115: 
  116:     # Override a config value for a single call
  117:     $classes = load_dynamic_vocabulary(ua_timeout => 60);
  118: 
  119: =head1 DESCRIPTION
  120: 
  121: C<Schema::Validator> provides two utilities for working with Schema.org
  122: structured data:
  123: 
  124: =over 4
  125: 
  126: =item * L</is_valid_datetime> -- validates a string against the ISO 8601
  127: date/datetime subset used by Schema.org.
  128: 
  129: =item * L</load_dynamic_vocabulary> -- downloads (and caches for 24 hours)
  130: the full Schema.org JSON-LD vocabulary and exposes all class and property
  131: definitions as a hashref and via package globals.
  132: 
  133: =back
  134: 
  135: =head2 Configuration
  136: 
  137: Runtime behaviour is controlled by the package-level C<%Schema::Validator::config>
  138: hash.  Supported keys and their defaults:
  139: 
  140:     cache_file     => "$CACHEDIR/schemaorg_dynamic_vocabulary.jsonld"  # or tmpdir
  141:     cache_duration => 86400                                          # seconds
  142:     vocab_url      => 'https://schema.org/.../schemaorg-current-https.jsonld'
  143:     ua_timeout     => 30                                             # seconds
  144: 
  145: Override any key before calling an exported function:
  146: 
  147:     $Schema::Validator::config{ua_timeout} = 60;
  148: 
  149: Or supply a complete replacement via L<Object::Configure>:
  150: 
  151:     Object::Configure->configure('Schema::Validator', \%my_config);
  152: 
  153: =head1 PACKAGE VARIABLES
  154: 
  155: =head2 %dynamic_schema
  156: 
  157: Package hash keyed by Schema.org class label (e.g. C<Person>, C<Event>).
  158: Values are the raw item hashrefs from the JSON-LD C<@graph> array.
  159: Populated as a side-effect of L</load_dynamic_vocabulary>.
  160: 
  161: =head2 %dynamic_properties
  162: 
  163: Package hash keyed by Schema.org property label (e.g. C<name>, C<startDate>).
  164: Values are the raw item hashrefs from the JSON-LD C<@graph> array.
  165: Populated as a side-effect of L</load_dynamic_vocabulary>.
  166: 
  167: =head1 FUNCTIONS
  168: 
  169: =head2 is_valid_datetime
  170: 
  171: =head3 PURPOSE
  172: 
  173: Tests whether a scalar string conforms to one of the ISO 8601
  174: date or datetime formats accepted by Schema.org:
  175: 
  176:     YYYY-MM-DD               (date only)
  177:     YYYY-MM-DDTHH:MM         (T separator, no seconds)
  178:     YYYY-MM-DD HH:MM         (space separator, no seconds)
  179:     YYYY-MM-DDTHH:MM:SS      (T separator, with seconds)
  180:     YYYY-MM-DD HH:MM:SS      (space separator, with seconds)
  181: 
  182: Optional timezone designators (C<Z>, C<+HH:MM>, C<-HH:MM>) are B<accepted>.
  183: Calendar sanity B<is> enforced: out-of-range values (e.g. month 99) are B<rejected>.
  184: 
  185: =head3 ARGUMENTS
  186: 
  187: =over 4
  188: 
  189: =item * C<string> (required, scalar) -- the candidate string to test.
  190: Both positional (C<is_valid_datetime('2024-11-14')>) and named
  191: (C<is_valid_datetime(string =E<gt> '2024-11-14')>) calling conventions
  192: are accepted.
  193: 
  194: =back
  195: 
  196: =head3 RETURNS
  197: 
  198: C<1> if the string is in a supported format; C<0> otherwise.
  199: Returns C<0> for C<undef> or an empty string without throwing.
  200: 
  201: =head3 SIDE EFFECTS
  202: 
  203: None.
  204: 
  205: =head3 NOTES
  206: 
  207: Delegates to C<DateTime::Format::ISO8601->parse_datetime()> for semantic
  208: validation, so out-of-range values (e.g. month 99) are rejected.
  209: The space-separator variant (C<YYYY-MM-DD HH:MM>) is normalised to a T
  210: separator before parsing since the module requires strict ISO 8601.
  211: Timezone designators (C<Z>, C<+HH:MM>, C<-HH:MM>) are accepted.
  212: 
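The space-separator normalisation described in the NOTES above can be tried in isolation. This is a minimal sketch, assuming a hypothetical helper named C<normalise_separator> that is not part of the module. It reuses the substitution from C<is_valid_datetime> and performs no calendar validation of its own:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Schema::Validator): rewrite the single
# space between a leading YYYY-MM-DD date and the time as 'T'.
# Strings in any other layout are returned unchanged.
sub normalise_separator {
	my ($string) = @_;
	(my $normalised = $string) =~ s/^(\d{4}-\d{2}-\d{2}) (?=\d{2}:)/$1T/;
	return $normalised;
}

print normalise_separator('2024-11-14 15:30'), "\n";     # 2024-11-14T15:30
print normalise_separator('2024-11-14T15:30:00'), "\n";  # 2024-11-14T15:30:00
print normalise_separator('28/06/2025'), "\n";           # 28/06/2025
```

Only a leading date followed by one space and a two-digit hour is rewritten, so input in other layouts reaches the parser untouched and is accepted or rejected there.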
  213: =head3 EXAMPLE
  214: 
  215:     use Schema::Validator qw(is_valid_datetime);
  216: 
  217:     is_valid_datetime('2024-11-14');                 # 1
  218:     is_valid_datetime('2024-11-14T15:30:00');        # 1
  219:     is_valid_datetime('2024-11-14 15:30');           # 1  (space sep normalised)
  220:     is_valid_datetime('2024-11-14T15:30:00Z');       # 1  (UTC timezone)
  221:     is_valid_datetime('2024-11-14T15:30:00+01:00');  # 1  (offset timezone)
  222:     is_valid_datetime('2024-99-01');                 # 0  (invalid month)
  223:     is_valid_datetime('28/06/2025');                 # 0
  224:     is_valid_datetime(undef);                        # 0  (no exception)
  225:     is_valid_datetime('');                           # 0  (no exception)
  226: 
  227:     # Named calling convention
  228:     is_valid_datetime(string => '2024-11-14');       # 1
  229: 
  230: =head3 API SPECIFICATION
  231: 
  232: =head4 Input (Params::Validate::Strict)
  233: 
  234:     {
  235:         string => {
  236:             type     => 'string',
  237:             optional => 0,
  238:         },
  239:     }
  240: 
  241: =head4 Output (Return::Set)
  242: 
  243:     {
  244:         type        => 'boolean',
  245:         description => '1 (valid) or 0 (invalid, undef, or empty input)'
  246:     }
  247: 
  248: =cut
  249: 
  250: sub is_valid_datetime {
  251: 	# Accept both positional (is_valid_datetime($s)) and named
  252: 	# (is_valid_datetime(string => $s)) calling conventions.
  253: 	# Validate: value must be a scalar or undef (undef returns 0 cleanly below).
  254: 	my $p = validate_strict(
  255: 		input  => get_params('string', \@_),
  256: 		schema => { 'string' => { type => 'string', optional => 0 } },
  257: 	);
  258: 
  259: 	my $string = $p->{string};
  260: 
  261: 	# Treat undef or empty string as invalid without throwing.
  262: 	return 0 unless defined $string && length $string;

					
Mutants (Total: 2, Killed: 0, Survived: 2)
  263: 
  264: 	# Normalise the space-separator variant to T before handing off to the
  265: 	# module, which requires strict ISO 8601 (T separator only).
  266: 	(my $normalised = $string) =~ s/^(\d{4}-\d{2}-\d{2}) (?=\d{2}:)/$1T/;
  267: 
  268: 	# Delegate to DateTime::Format::ISO8601 for full semantic validation;
  269: 	# a truthy (DateTime) object means valid, undef/$@ means invalid.
  270: 	return eval { DateTime::Format::ISO8601->parse_datetime($normalised) } ? 1 : 0;
  271: }
  272: 
  273: # ===========================================================================
  274: 
  275: =head2 load_dynamic_vocabulary
  276: 
  277: =head3 PURPOSE
  278: 
  279: Downloads the complete Schema.org vocabulary from the official JSON-LD
  280: endpoint, parses it into class and property lookup tables, caches the raw
  281: JSON-LD locally, and returns the class table as a hashref.
  282: 
  283: The cache is considered fresh for C<cache_duration> seconds (default 24 hours).
  284: On network failure the function falls back to a stale cache rather than
  285: returning an empty result, and emits a C<carp> warning.
  286: 
  287: =head3 ARGUMENTS
  288: 
  289: All arguments are optional; defaults come from C<%Schema::Validator::config>.
  290: 
  291: =over 4
  292: 
  293: =item * C<cache_file> (optional, scalar) -- path to the local cache file.
  294: Defaults to C<$config{cache_file}>: C<$CACHEDIR/schemaorg_dynamic_vocabulary.jsonld>
  295: if C<$ENV{CACHEDIR}> is set, otherwise C<File::Spec-E<gt>tmpdir()> is used.
  296: 
  297: =item * C<cache_duration> (optional, scalar) -- cache validity window in seconds.
  298: Defaults to C<$config{cache_duration}>.
  299: 
  300: =item * C<vocab_url> (optional, scalar) -- URL of the JSON-LD vocabulary endpoint.
  301: Defaults to C<$config{vocab_url}>.
  302: 
  303: =item * C<ua_timeout> (optional, scalar) -- LWP::UserAgent timeout in seconds.
  304: Defaults to C<$config{ua_timeout}>.
  305: 
  306: =back
  307: 
  308: Both zero-argument and named calling conventions are supported:
  309: 
  310:     load_dynamic_vocabulary();
  311:     load_dynamic_vocabulary(ua_timeout => 60);
  312: 
  313: =head3 RETURNS
  314: 
  315: A hashref mapping class labels (e.g. C<'Person'>) to their raw JSON-LD
  316: definition hashrefs from the C<@graph> array.
  317: 
  318: Returns an empty hashref C<{}> on all failure paths (network unreachable,
  319: no cache, JSON parse error).  Never throws on these paths; bad argument types C<croak> (see L</ERROR HANDLING>).
  320: 
  321: =head3 SIDE EFFECTS
  322: 
  323: =over 4
  324: 
  325: =item * Populates C<%Schema::Validator::dynamic_schema> with class definitions.
  326: 
  327: =item * Populates C<%Schema::Validator::dynamic_properties> with property definitions.
  328: 
  329: =item * Creates or updates the local cache file on a successful download.
  330: 
  331: =item * Emits C<carp> warnings on network failures, I/O errors, or JSON
  332: parse errors.
  333: 
  334: =back
  335: 
  336: =head3 NOTES
  337: 
  338: The default cache directory is determined once at module load time: the
  339: C<$CACHEDIR> environment variable is used if set; otherwise C<File::Spec-E<gt>tmpdir()>
  340: is used (typically C</tmp> on Unix).  Override for the session with:
  341: 
  342:     $Schema::Validator::config{cache_file} = '/my/path/vocab.jsonld';
  343: 
  344: The C<bin/validate-schema> CLI tool imports this function from the module and
  345: uses C<cache_file =E<gt> $path> to store its cache under C<~/.cache/schema_validator/>.
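
The default cache location described above can be reproduced outside the module. This is a minimal sketch, assuming a hypothetical helper C<default_cache_file> and an example directory C</var/cache/myapp>, neither of which comes from the module. Unlike the module, which evaluates the default once at load time, the helper re-reads C<$ENV{CACHEDIR}> on every call:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of Schema::Validator): prefer a non-empty
# $ENV{CACHEDIR}, otherwise fall back to the system temporary directory.
sub default_cache_file {
	my $dir = (defined $ENV{CACHEDIR} && length $ENV{CACHEDIR})
		? $ENV{CACHEDIR}
		: File::Spec->tmpdir();
	return File::Spec->catfile($dir, 'schemaorg_dynamic_vocabulary.jsonld');
}

{
	# Example directory only; any writable path would do.
	local $ENV{CACHEDIR} = '/var/cache/myapp';
	print default_cache_file(), "\n";
}

{
	# An empty value is treated the same as an unset one.
	local $ENV{CACHEDIR} = '';
	print default_cache_file(), "\n";
}
```

The printed paths depend on the platform, since C<File::Spec> chooses the directory separator and the temporary directory.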
  346: 
  347: =head3 EXAMPLE
  348: 
  349:     use Schema::Validator qw(load_dynamic_vocabulary);
  350: 
  351:     my $classes = load_dynamic_vocabulary();
  352:     printf "%d classes loaded\n", scalar keys %{$classes};
  353: 
  354:     # Check for a specific class in the returned hashref
  355:     print "Has Person\n" if exists $classes->{'Person'};
  356: 
  357:     # Or query the package globals directly after the call
  358:     Schema::Validator::load_dynamic_vocabulary();
  359:     my @names = sort keys %Schema::Validator::dynamic_schema;
  360: 
  361: =head3 API SPECIFICATION
  362: 
  363: =head4 Input (Params::Validate::Strict)
  364: 
  365:     {
  366:         cache_file     => { type => 'string',  optional => 1 },
  367:         cache_duration => { type => 'integer', optional => 1 },
  368:         vocab_url      => { type => 'string',  optional => 1 },
  369:         ua_timeout     => { type => 'integer', optional => 1 },
  370:     }
  371: 
  372: =head4 Output (Return::Set)
  373: 
  374:     {
  375:         type => 'hashref',
  376:         description => 'class-label => JSON-LD item hashref'
  377:         # ON_FAILURE => 'empty hashref {}; never throws'
  378:         # SIDE_EFFECTS => 'populates %dynamic_schema and %dynamic_properties'
  379:     }
  380: 
  381: =cut
  382: 
  383: sub load_dynamic_vocabulary {
● 384 → 387 → 400  ● 384 → 387 → 0
  384: 	my $params;
  385: 
  386: 	# Validate types of any supplied overrides (all are optional scalars).
  387: 	if(scalar(@_)) {
Mutants (Total: 1, Killed: 0, Survived: 1)
  388: 		$params = validate_strict(
  389: 			input => get_params(undef, \@_),
  390: 			schema => {
  391: 				cache_file     => { type => 'string',  optional => 1 },
  392: 				cache_duration => { type => 'integer', optional => 1 },
  393: 				vocab_url      => { type => 'string',  optional => 1 },
  394: 				ua_timeout     => { type => 'integer', optional => 1 },
  395: 			}
  396: 		);
  397: 	}
  398: 
  399: 	# Merge caller overrides with module-level configuration defaults.
● 400 → 409 → 415  ● 400 → 409 → 0
  400: 	my $cache_file     = $params->{cache_file}     // $config{cache_file};
  401: 	my $cache_duration = $params->{cache_duration} // $config{cache_duration};
  402: 	my $vocab_url      = $params->{vocab_url}      // $config{vocab_url};
  403: 	my $ua_timeout     = $params->{ua_timeout}     // $config{ua_timeout};
  404: 
  405: 	my $content;
  406: 
  407: 	# Attempt to read a fresh cache file: it must exist and have been
  408: 	# modified less than $cache_duration seconds ago.
  409: 	if (-e $cache_file && (time - (stat($cache_file))[9] < $cache_duration)) {
Mutants (Total: 4, Killed: 0, Survived: 4)
  410: 		eval { $content = _slurp_file($cache_file) };
  411: 		carp "Could not read cache '$cache_file': $@" if $@;
  412: 	}
  413: 
  414: 	# If no usable content yet, try to download the vocabulary.
● 415 → 415 → 436  ● 415 → 415 → 0
  415: 	unless (defined $content) {
Mutants (Total: 1, Killed: 0, Survived: 1)
  416: 		$content = _fetch_url($vocab_url, $ua_timeout);
  417: 
  418: 		if (defined $content) {
Mutants (Total: 1, Killed: 0, Survived: 1)
  419: 			# Persist the download to the cache (best-effort; warn, do not die).
  420: 			eval { _spit_file($cache_file, $content) };
  421: 			carp "Could not write cache '$cache_file': $@" if $@;
  422: 		} else {
  423: 			# Network failed; fall back to a stale cache if one exists.
  424: 			if (-e $cache_file) {
Mutants (Total: 1, Killed: 0, Survived: 1)
  425: 				eval { $content = _slurp_file($cache_file) };
  426: 				if ($@) {
Mutants (Total: 1, Killed: 0, Survived: 1)
  427: 					carp "Could not read stale cache '$cache_file': $@";
  428: 				} else {
  429: 					carp "Network unavailable; using stale cache '$cache_file'";
  430: 				}
  431: 			}
  432: 		}
  433: 	}
  434: 
  435: 	# All content-acquisition strategies failed; return empty result.
● 436 → 436 → 442  ● 436 → 436 → 0
  436: 	unless (defined $content) {
Mutants (Total: 1, Killed: 0, Survived: 1)
  437: 		carp 'load_dynamic_vocabulary: no vocabulary content available';
  438: 		return {};
  439: 	}
  440: 
  441: 	# Parse the JSON; treat errors as non-fatal warnings.
● 442 → 443 → 451  ● 442 → 443 → 0
  442: 	my $data = eval { decode_json($content) };
  443: 	if ($@) {
Mutants (Total: 1, Killed: 0, Survived: 1)
  444: 		carp "Failed to parse vocabulary JSON: $@";
  445: 		return {};
  446: 	}
  447: 
  448: 	# Guard against decode_json returning a non-object (e.g. a JSON array,
  449: 	# a bare number, or any other non-hash type).  Calling exists on a
  450: 	# non-hashref dies; catching it here keeps the "never throws" contract.
● 451 → 451 → 457  ● 451 → 451 → 0
  451: 	unless (ref($data) eq 'HASH') {
Mutants (Total: 1, Killed: 0, Survived: 1)
  452: 		carp "Vocabulary JSON is not a JSON object";
  453: 		return {};
  454: 	}
  455: 
  456: 	# Confirm the expected JSON-LD graph structure is present.
● 457 → 457 → 463  ● 457 → 457 → 0
  457: 	unless (exists $data->{$AT_GRAPH} && ref($data->{$AT_GRAPH}) eq 'ARRAY') {
Mutants (Total: 1, Killed: 0, Survived: 1)
  458: 		carp "Vocabulary JSON is missing the '\@graph' array";
  459: 		return {};
  460: 	}
  461: 
  462: 	# Delegate parsing to the internal graph processor.
● 463 → 477 → 0
  463: 	my ($classes, $props) = _parse_graph($data->{$AT_GRAPH});
  464: 
  465: 	# Populate package globals as documented side-effects.
  466: 	%dynamic_schema     = %{$classes};
  467: 	%dynamic_properties = %{$props};
  468: 
  469: 	# Report the result count via carp (informational, not an error).
  470: 	carp sprintf(
  471: 		'Dynamic vocabulary loaded: %d classes, %d properties',
  472: 		scalar(keys %dynamic_schema),
  473: 		scalar(keys %dynamic_properties),
  474: 	);
  475: 
  476: 	# Return the class hashref; callers needing properties use the global.
  477: 	return $classes;
Mutants (Total: 2, Killed: 0, Survived: 2)
  478: }
  479: 
  480: # ===========================================================================
  481: # INTERNAL HELPERS
  482: # All routines below begin with _ and are not part of the public API.
  483: # ===========================================================================
  484: 
  485: # ---------------------------------------------------------------------------
  486: # _slurp_file($path)
  487: #
  488: # Purpose:  Read the complete contents of a file into a scalar.
  489: # Entry:    $path is a path to an existing, readable file.
  490: # Returns:  The file contents as a scalar string.
  491: # Side fx:  None beyond reading the file.
  492: # Notes:    autodie causes open/close to throw on failure; callers should
  493: #           wrap in eval { } and handle $@ if a non-fatal path is needed.
  494: # ---------------------------------------------------------------------------
  495: sub _slurp_file {
  496: 	my ($path) = @_;
  497: 
  498: 	# Open the file; autodie will throw if this fails.
  499: 	open my $fh, '<', $path;
  500: 
  501: 	# Temporarily undefine $/ to read the whole file in one operation.
  502: 	local $/;
  503: 	my $content = <$fh>;
  504: 
  505: 	close $fh;
  506: 	return $content;
Mutants (Total: 2, Killed: 0, Survived: 2)
  507: }
  508: 
  509: # ---------------------------------------------------------------------------
  510: # _spit_file($path, $content)
  511: #
  512: # Purpose:  Write a scalar string to a file, creating or truncating it.
  513: # Entry:    $path is a writable path; $content is a defined scalar.
  514: # Returns:  1 on success.
  515: # Side fx:  Creates or overwrites $path.
  516: # Notes:    autodie causes open/close to throw on failure; wrap in eval
  517: #           when the write is non-critical (e.g. cache population).
  518: # ---------------------------------------------------------------------------
  519: sub _spit_file {
  520: 	my ($path, $content) = @_;
  521: 
  522: 	# Open for writing; autodie throws on permission or path errors.
  523: 	open my $fh, '>', $path;
  524: 	print $fh $content;
  525: 	close $fh;
  526: 
  527: 	return 1;
Mutants (Total: 2, Killed: 0, Survived: 2)
  528: }
  529: 
  530: # ---------------------------------------------------------------------------
  531: # _fetch_url($url, $timeout)
  532: #
  533: # Purpose:  Perform an HTTP GET and return the decoded response body.
  534: # Entry:    $url is a valid absolute HTTP/HTTPS URL; $timeout is a positive
  535: #           integer (seconds).
  536: # Returns:  Decoded response content on success; undef on HTTP error.
  537: # Side fx:  Network I/O; emits carp on non-success HTTP status.
  538: # Notes:    Transport-level errors (DNS failure, TLS error) may propagate as
  539: #           exceptions from LWP::UserAgent; callers should wrap in eval if
  540: #           they need a guaranteed non-throwing call.
  541: # ---------------------------------------------------------------------------
  542: sub _fetch_url {
● 543 → 550 → 555  ● 543 → 550 → 0
  543: 	my ($url, $timeout) = @_;
  544: 
  545: 	# Build a minimal UA; timeout prevents indefinite hangs.
  546: 	my $ua  = LWP::UserAgent->new(timeout => $timeout);
  547: 	my $res = $ua->get($url);
  548: 
  549: 	# Treat any non-2xx status as a soft failure so callers can try fallbacks.
  550: 	unless ($res->is_success) {
Mutants (Total: 1, Killed: 0, Survived: 1)
  551: 		carp "Failed to fetch '$url': ", $res->status_line;
  552: 		return;
  553: 	}
  554: 
● 555 → 555 → 0
  555: 	return $res->decoded_content;
Mutants (Total: 2, Killed: 0, Survived: 2)
  556: }
  557: 
  558: # ---------------------------------------------------------------------------
  559: # _extract_label($item)
  560: #
  561: # Purpose:  Extract the rdfs:label string from a JSON-LD graph item hashref.
  562: # Entry:    $item is a hashref that may contain 'rdfs:label' or the full
  563: #           URI equivalent key.
  564: # Returns:  The label as a plain string, or undef if no label is found.
  565: # Side fx:  None.
  566: # Notes:    Schema.org JSON-LD may represent the label as a scalar string or
  567: #           as an array (for multi-language entries); this function always
  568: #           returns the first (or only) value.
  569: # ---------------------------------------------------------------------------
  570: sub _extract_label {
  571: 	my ($item) = @_;
  572: 
  573: 	# Try the compact key first; fall back to the full RDF URI form.
  574: 	my $label = $item->{$RDFS_LABEL} // $item->{$RDFS_LABEL_FULL};
  575: 	return unless defined $label;
  576: 
  577: 	# If the label is multi-valued, take the first entry.
  578: 	return ref($label) eq 'ARRAY' ? $label->[0] : $label;
  579: }
  580: 
  581: # ---------------------------------------------------------------------------
  582: # _parse_graph(\@graph)
  583: #
  584: # Purpose:  Iterate over a JSON-LD @graph array and partition items into
  585: #           Schema.org class definitions and property definitions.
  586: # Entry:    $graph_ref is an arrayref of item hashrefs as decoded from the
  587: #           Schema.org JSON-LD vocabulary.
  588: # Returns:  Two hashrefs: (\%classes, \%properties), each keyed by label.
  589: #           Items are also indexed by the short name extracted from their
  590: #           @id URI so that both 'MusicEvent' and its label resolve correctly.
  591: # Side fx:  None.
  592: # Notes:    Items with no recognisable label or @type are silently skipped.
  593: #           The @id short-name index uses //= so the label always wins if
  594: #           it differs.
  595: # ---------------------------------------------------------------------------
  596: sub _parse_graph {
● 597 → 602 → 637  ● 597 → 602 → 0
  597: 	my ($graph_ref) = @_;
  598: 
  599: 	my (%classes, %props);
  600: 
  601: 	# Iterate every item in the JSON-LD graph array.
  602: 	for my $item (@{$graph_ref}) {
  603: 
  604: 		# Skip items that do not declare an RDF type.
  605: 		next unless exists $item->{'@type'};
  606: 		my $item_type = $item->{'@type'};
  607: 
  608: 		# Normalise @type: the spec allows either a scalar or an array.
  609: 		my @types = ref($item_type) eq 'ARRAY' ? @{$item_type} : ($item_type);
  610: 
  611: 		# Extract the human-readable label; skip items with none.
  612: 		my $label = _extract_label($item) or next;
  613: 
  614: 		# Index rdfs:Class items under their label and their @id short name.
  615: 		if (grep { $_ eq $RDF_CLASS } @types) {
Mutants (Total: 1, Killed: 0, Survived: 1)
  616: 			$classes{$label} = $item;
  617: 
  618: 			# Secondary index by short URI fragment (e.g. 'MusicGroup').
  619: 			if (my $id = $item->{'@id'}) {
Mutants (Total: 1, Killed: 0, Survived: 1)
  620: 				(my $short = $id) =~ s{.*/}{};
  621: 				$classes{$short} //= $item;
  622: 			}
  623: 		}
  624: 
  625: 		# Index rdf:Property items under their label and @id short name.
  626: 		if (grep { $_ eq $RDF_PROPERTY } @types) {
Mutants (Total: 1, Killed: 0, Survived: 1)
  627: 			$props{$label} = $item;
  628: 
  629: 			# Secondary index by short URI fragment (e.g. 'startDate').
  630: 			if (my $id = $item->{'@id'}) {
Mutants (Total: 1, Killed: 0, Survived: 1)
  631: 				(my $short = $id) =~ s{.*/}{};
  632: 				$props{$short} //= $item;
  633: 			}
  634: 		}
  635: 	}
  636: 
● 637 → 637 → 0
  637: 	return (\%classes, \%props);
  638: }
  639: 
  640: # ===========================================================================
  641: # END OF MODULE POD
  642: # ===========================================================================
  643: 
  644: =encoding utf-8
  645: 
  646: =head1 FILES
  647: 
  648: =head2 schemaorg_dynamic_vocabulary.jsonld
  649: 
  650: Cache file written to C<$CACHEDIR> (if set) or the system temporary directory
  651: (C<File::Spec-E<gt>tmpdir()>), unless overridden via C<$config{cache_file}>.
  652: Contains the downloaded Schema.org vocabulary in JSON-LD format.  Refreshed
  653: when older than C<$config{cache_duration}> seconds.
  654: 
  655: =head1 ERROR HANDLING
  656: 
  657: The module uses C<carp> rather than C<die> for recoverable failures:
  658: 
  659: =over 4
  660: 
  661: =item * Failed HTTP requests emit C<carp> and trigger the stale-cache fallback.
  662: 
  663: =item * JSON parse errors emit C<carp> and return C<{}>.
  664: 
  665: =item * File I/O errors emit C<carp>; the download path is attempted next.
  666: 
  667: =item * C<croak> is reserved for programmer errors (bad argument types).
  668: 
  669: =back
  670: 
  671: =head1 BUGS
  672: 
  673: =over 4
  674: 
  675: =item * Cache invalidation is time-based only; no checksum or version check.
  676: 
  677: =back
  678: 
  679: =head1 SEE ALSO
  680: 
  681: =over 4
  682: 
  683: =item * L<Test Dashboard|https://nigelhorne.github.io/Schema-Validator/coverage/>
  684: 
  685: =back
  686: 
  687: =head1 REPOSITORY
  688: 
  689: L<https://github.com/nigelhorne/schema-validator>
  690: 
  691: =head2 FORMAL SPECIFICATION
  692: 
  693: =head3 is_valid_datetime
  694: 
  695:     Let CHAR denote the set of all Unicode code points and
  696:     DIGIT = { c : CHAR | c in {'0'..'9'} }.
  697:     Let seqN(N, S) = { s : seq S | #s = N }.
  698: 
  699:     YEAR     ≜ seqN(4, DIGIT)
  700:     MONTH    ≜ seqN(2, DIGIT)
  701:     DAY      ≜ seqN(2, DIGIT)
  702:     HOUR     ≜ seqN(2, DIGIT)
  703:     MINUTE   ≜ seqN(2, DIGIT)
  704:     SECOND   ≜ seqN(2, DIGIT)
  705:     SEP      ≜ { 'T', ' ' }
  706: 
  707:     DATE     ≜ { d : seq CHAR | ∃ y ∈ YEAR; mo ∈ MONTH; dy ∈ DAY
  708:                  • d = y ⌢ ⟨'-'⟩ ⌢ mo ⌢ ⟨'-'⟩ ⌢ dy }
  709: 
  710:     HHMM     ≜ { t : seq CHAR | ∃ h ∈ HOUR; m ∈ MINUTE
  711:                  • t = h ⌢ ⟨':'⟩ ⌢ m }
  712: 
  713:     HHMMSS   ≜ { t : seq CHAR | ∃ h ∈ HOUR; m ∈ MINUTE; s ∈ SECOND
  714:                  • t = h ⌢ ⟨':'⟩ ⌢ m ⌢ ⟨':'⟩ ⌢ s }
  715: 
  716:     TIMEFRAG ≜ { tf : seq CHAR | ∃ sep ∈ SEP; hm ∈ (HHMM ∪ HHMMSS)
  717:                  • tf = ⟨sep⟩ ⌢ hm }
  718: 
  719:     DATETIME ≜ DATE ∪ { dt : seq CHAR | ∃ d ∈ DATE; tf ∈ TIMEFRAG
  720:                  • dt = d ⌢ tf }
  721: 
  722:     ──────────────────────────────────────────────────────────────
  723:     IsValidDatetime
  724:     ──────────────────────────────────────────────────────────────
  725:     str?    : seq CHAR
  726:     result! : B
  727:     ──────────────────────────────────────────────────────────────
  728:     result! ⟺ str? ∈ DATETIME
  729:     ──────────────────────────────────────────────────────────────
  730: 
  731: =head3 load_dynamic_vocabulary
  732: 
  733:     Let FILE, DUR, URL be the resolved config values.
  734:     Let now : N be the current UNIX epoch time.
  735:     Let mtime : PATH -> N map a path to its last-modification time.
  736:     Let readable, writeable : PATH -> B be filesystem predicates.
  737:     Let reachable : URL -> B test HTTP reachability.
  738:     Let slurp : PATH -> seq CHAR and spit : PATH x seq CHAR -> 1.
  739:     Let fetch : URL x N -> seq CHAR (second arg is timeout).
  740:     Let decode_json : seq CHAR -> ITEM.
  741:     Let label : ITEM -> (LABEL | {}) extract rdfs:label.
  742:     Let types : ITEM -> P TYPE extract @type values.
  743: 
  744:     FRESH ≜ ( -e(FILE) ) ∧ ( (now - mtime(FILE)) < DUR )
  745: 
  746:     ──────────────────────────────────────────────────────────────────────
  747:     LoadDynamicVocabulary
  748:     ──────────────────────────────────────────────────────────────────────
  749:     ΔVocabularyStore
  750:     cache_file?     : PATH
  751:     cache_duration? : N
  752:     vocab_url?      : URL
  753:     ua_timeout?     : N
  754:     result!         : CLASS_LABEL ⇸ ITEM
  755:     ──────────────────────────────────────────────────────────────────────
  756:     content : seq CHAR
  757: 
  758:     FRESH ∧ readable(cache_file?)
  759:         ⇒ content = slurp(cache_file?)
  760: 
  761:     ¬FRESH ∧ reachable(vocab_url?)
  762:         ⇒ content = fetch(vocab_url?, ua_timeout?)
  763:         ∧ ( writeable(cache_file?) ⇒ spit(cache_file?, content) )
  764: 
  765:     ¬FRESH ∧ ¬reachable(vocab_url?) ∧ -e(cache_file?)
  766:         ⇒ content = slurp(cache_file?)
  767: 
  768:     graph ≜ (decode_json content)[AT_GRAPH]
  769: 
  770:     dynamic_schema' =
  771:         { item ∈ graph | RDF_CLASS ∈ types(item) ∧ label(item) ≠ ∅
  772:           • label(item) ↦ item }
  773: 
  774:     dynamic_properties' =
  775:         { item ∈ graph | RDF_PROPERTY ∈ types(item) ∧ label(item) ≠ ∅
  776:           • label(item) ↦ item }
  777: 
  778:     result! = dynamic_schema'
  779:     ──────────────────────────────────────────────────────────────────────
  780: 
  781: =head1 AUTHOR
  782: 
  783: Nigel Horne, C<< <njh at nigelhorne.com> >>
  784: 
  785: =head1 LICENCE AND COPYRIGHT
  786: 
  787: Copyright 2025-2026 Nigel Horne.
  788: 
  789: Usage is subject to the GPL2 licence terms.
  790: If you use it,
  791: please let me know.
  792: 
  793: =cut
  794: 
  795: 1;
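
The FRESH predicate in the formal specification of load_dynamic_vocabulary is a plain existence-and-age test. This is a minimal sketch, assuming a hypothetical helper named is_cache_fresh that is not part of the module; it uses a single stat call in place of the separate -e test:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not part of Schema::Validator): true when $path
# exists and was last modified less than $max_age seconds ago.
sub is_cache_fresh {
	my ($path, $max_age) = @_;

	# stat returns an empty list for a missing file, so it is never fresh.
	my @st = stat($path) or return 0;

	# Element 9 of the stat list is the last-modification time (mtime).
	return (time() - $st[9]) < $max_age ? 1 : 0;
}

# Usage: a newly created file is fresh; back-dating its mtime makes it stale.
my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;
print is_cache_fresh($file, 86_400), "\n";    # 1

my $two_days_ago = time() - 2 * 86_400;
utime $two_days_ago, $two_days_ago, $file;
print is_cache_fresh($file, 86_400), "\n";    # 0
```

Reading the modification time from the same stat call that establishes existence avoids a second filesystem lookup; whether that matters for a once-a-day cache is a judgment call.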