File Coverage

File:blib/lib/App/Test/Generator/SchemaExtractor.pm
Coverage:78.3%

linestmtbrancondsubtimecode
1package App::Test::Generator::SchemaExtractor;
2
3
30
30
30
1445423
23
401
use strict;
4
30
30
30
52
25
553
use warnings;
5
30
30
30
4664
158972
57
use autodie qw(:all);
6
7
30
30
30
213567
36
444
use App::Test::Generator::Model::Method;
8
30
30
30
3992
36
470
use App::Test::Generator::Analyzer::Complexity;
9
30
30
30
3859
38
436
use App::Test::Generator::Analyzer::Return;
10
30
30
30
3649
34
413
use App::Test::Generator::Analyzer::ReturnMeta;
11
30
30
30
3933
40
476
use App::Test::Generator::Analyzer::SideEffect;
12
13
30
30
30
63
21
563
use Carp qw(carp croak);
14
30
30
30
5026
1787799
472
use PPI;
15
30
30
30
4785
388731
424
use Pod::Simple::Text;
16
30
30
30
121
27
1042
use File::Basename;
17
30
30
30
64
22
557
use File::Path qw(make_path);
18
30
30
30
4580
92499
519
use Params::Get;
19
30
30
30
5928
121456
484
use Safe;
20
30
30
30
81
22
518
use Scalar::Util qw(looks_like_number);
21
30
30
30
3489
27592
682
use YAML::XS;
22
30
30
30
5500
34631
719
use IPC::Open3;
23
30
30
30
70
23
778
use JSON::MaybeXS qw(encode_json decode_json);
24
30
30
30
54
26
458
use Readonly;
25
30
30
30
49
26
668319
use Symbol qw(gensym);
26
27# --------------------------------------------------
28# Confidence score thresholds for input and output analysis
29# --------------------------------------------------
30Readonly my $CONFIDENCE_HIGH_THRESHOLD   => 60;
31Readonly my $CONFIDENCE_MEDIUM_THRESHOLD => 35;
32Readonly my $CONFIDENCE_LOW_THRESHOLD    => 15;
33
34# --------------------------------------------------
35# Confidence level label strings
36# --------------------------------------------------
37Readonly my $LEVEL_HIGH     => 'high';
38Readonly my $LEVEL_MEDIUM   => 'medium';
39Readonly my $LEVEL_LOW      => 'low';
40Readonly my $LEVEL_VERY_LOW => 'very_low';
41Readonly my $LEVEL_NONE     => 'none';
42
43# --------------------------------------------------
44# Analysis limits
45# --------------------------------------------------
46Readonly my $DEFAULT_MAX_PARAMETERS     => 20;
47Readonly my $DEFAULT_CONFIDENCE_THRESH  => 0.5;
48Readonly my $POD_WALK_LIMIT             => 200;
49Readonly my $SIGNATURE_TIMEOUT_SECS     => 3;
50Readonly my $MEMORY_LIMIT_BYTES         => 50_000_000;
51
52# --------------------------------------------------
53# Numeric boundary values for test hint generation
54# --------------------------------------------------
55Readonly my $INT32_MAX => 2_147_483_647;
56
57# --------------------------------------------------
58# Boolean return score thresholds
59# --------------------------------------------------
60Readonly my $BOOLEAN_SCORE_THRESHOLD => 30;
61
62 - 70
=head1 NAME

App::Test::Generator::SchemaExtractor - Extract test schemas from Perl modules

=head1 VERSION

Version 0.41

=cut
71
72our $VERSION = '0.41';
73
74 - 1334
=head1 SYNOPSIS

        use App::Test::Generator::SchemaExtractor;

        my $extractor = App::Test::Generator::SchemaExtractor->new(
                input_file => 'lib/MyModule.pm',
                output_dir => 'schemas/',
                verbose => 1,
        );

        my $schemas = $extractor->extract_all();

=head1 DESCRIPTION

App::Test::Generator::SchemaExtractor analyzes Perl modules and generates
structured YAML schema files suitable for automated test generation by L<App::Test::Generator>.
This module employs
static analysis techniques to infer parameter types, constraints, and
method behaviors directly from your source code.

=head2 Analysis Methods

The extractor combines multiple analysis approaches for a comprehensive schema generation:

=over 4

=item * B<POD Documentation Analysis>

Parses embedded documentation to extract:
  - Parameter names, types, and descriptions from =head2 sections
  - Method signatures with positional parameters
  - Return value specifications from "Returns:" sections
  - Constraints (ranges, patterns, required/optional status)
  - Semantic type detection (email, URL, filename)

=item * B<Code Pattern Detection>

Analyzes source code using PPI to identify:
  - Method signatures and parameter extraction patterns
  - Type validation (ref(), isa(), blessed())
  - Constraint patterns (length checks, numeric comparisons, regex matches)
  - Return statement analysis and value type inference
  - Object instantiation requirements and accessor methods

=item * B<Signature Analysis>

Examines method declarations for:
  - Parameter names and positional information
  - Instance vs. class method detection
  - Method modifiers (Moose-style before/after/around)
  - Various parameter declaration styles (shift, @_ assignment)

=item * B<Heuristic Inference>

Applies Perl-specific domain knowledge:
  - Boolean return detection from method names (is_*, has_*, can_*)
  - Common Perl idioms and coding patterns
  - Context awareness (scalar vs list, wantarray usage)
  - Object-oriented patterns (constructors, accessors, chaining)

=back

=head2 Generated Schema Structure

The extracted schemas follow this YAML structure:

    function: method_name
    module: Package::Name
    input:
      param1:
        type: string
        min: 3
        max: 50
        optional: 0
        position: 0
      param2:
        type: integer
        min: 0
        max: 100
        optional: 1
        position: 1
    output:
      type: boolean
      value: 1
    new: Package::Name # if object instantiation required
    config:
      test_empty: 1
      test_nuls: 0
      test_undef: 0
      test_non_ascii: 0

=head2 Advanced Detection Capabilities

=over 4

=item * B<Accessor Method Detection>

Automatically identifies getter, setter, and combined accessor methods
by analyzing common patterns like C<return $self-E<gt>{property}> and
C<$self-E<gt>{property} = $value>.

=item * B<Params::Get Integration>

Recognises parameters extracted via C<Params::Get::get_params('key', \@_)>,
treating the quoted key as a named parameter equivalent to a traditional
C<my ($self, $key) = @_> signature.  This prevents false positives from
C<--strict-pod> when the method body never declares an explicit C<$key>
variable.

=item * B<Direct-Index Self Style>

Recognises C<my $self = $_[0]> as a valid method-invocant pattern.  Parameters
at C<$_[1]>, C<$_[2]>, etc. are extracted as positional parameters.  Without
this, the signature fallback would incorrectly pick up C<my (...) = @_> from
inner closures defined in the method body and treat those variables as the
outer method's parameters.

=item * B<Boolean Return Inference>

Detects boolean-returning methods through multiple signals:
  - Method name patterns (is_*, has_*, can_*)
  - Return patterns (consistent 1/0 returns)
  - POD descriptions ("returns true on success")
  - Ternary operators with boolean results

=item * B<Context Awareness>

Identifies methods that use C<wantarray> and can return different
values in scalar vs list context.

=item * B<Object Lifecycle Management>

Detects instance methods requiring object instantiation and
automatically adds the C<new> field to schemas.

=item * B<Enhanced Object Detection>

The extractor includes sophisticated object detection capabilities that go beyond simple instance method identification:

=over 4

=item * B<Factory Method Recognition>

Automatically identifies methods that create and return object instances, such as methods named C<create_*>, C<make_*>, C<build_*>, or C<get_*>. Factory methods are correctly classified as class methods that don't require pre-existing objects for testing.

=item * B<Singleton Pattern Detection>

Recognizes singleton patterns through multiple signals: method names like C<instance> or C<get_instance>, static variables holding instance references, lazy initialization patterns (C<$instance ||= new()>), and consistent return of the same instance variable.

=item * B<Constructor Parameter Analysis>

Examines C<new> methods to determine required and optional parameters, validation requirements, and default values. This enables test generators to provide appropriate constructor arguments when object instantiation is needed.

=item * B<Inheritance Relationship Handling>

Detects parent classes through C<use parent>, C<use base>, and C<@ISA> declarations. Identifies when methods use C<SUPER::> calls and determines whether the current class or a parent class constructor should be used for object instantiation.

=item * B<External Object Dependency Detection>

Identifies when methods create or depend on objects from other classes, enabling proper test setup with mock objects or real dependencies.

=back

These enhancements ensure that generated test schemas accurately reflect the object-oriented structure of the code, leading to more meaningful and effective test generation.

=back

=head2 Confidence Scoring

Each generated schema includes detailed confidence assessments:

=over 4

=item * B<High Confidence>

Multiple independent analysis sources converge on consistent,
well-constrained parameters with explicit validation logic and
comprehensive documentation.

=item * B<Medium Confidence>

Reasonable evidence from code patterns or partial documentation,
but may lack comprehensive constraints or have some ambiguities.

=item * B<Low Confidence>

Minimal evidence - primarily based on naming conventions,
default assumptions, or single-source analysis.

=item * B<Very Low Confidence>

Barely any detectable signals - schema should be thoroughly
reviewed before use in test generation.

=back

=head2 Use Cases

=over 4

=item * B<Automated Test Generation>

Generate comprehensive test suites with L<App::Test::Generator> using
extracted schemas as input. The schemas provide the necessary structure
for generating both positive and negative test cases.

=item * B<API Documentation Generation>

Supplement existing documentation with automatically inferred interface
specifications, parameter requirements, and return types.

=item * B<Code Quality Assessment>

Identify methods with poor documentation, inconsistent parameter handling,
or unclear interfaces that may benefit from refactoring.

=item * B<Refactoring Assistance>

Detect method dependencies, object instantiation requirements, and
parameter usage patterns to inform refactoring decisions.

=item * B<Legacy Code Analysis>

Quickly understand the interface contracts of legacy Perl codebases
without extensive manual code reading.

=back

=head2 Integration with Testing Ecosystem

The generated schemas are specifically designed to work with the
L<App::Test::Generator> ecosystem:

    # Extract schemas from your module
    my $extractor = App::Test::Generator::SchemaExtractor->new(...);
    my $schemas = $extractor->extract_all();

    # Use with test generator (typically as separate steps)
    # fuzz-harness-generator -r schemas/method_name.yml

=head2 Limitations and Considerations

=over 4

=item * B<Dynamic Code Patterns>

Highly dynamic code (string evals, AUTOLOAD, symbolic references)
may not be fully detected by static analysis.

=item * B<Complex Validation Logic>

Sophisticated validation involving multiple parameters or external
dependencies may require manual schema refinement.

=item * B<Confidence Heuristics>

Confidence scores are based on heuristics and should be reviewed
by developers familiar with the codebase.

=item * B<Perl Idiom Recognition>

Some Perl-specific idioms may require custom pattern recognition
beyond the built-in detectors.

=item * B<Documentation Dependency>

Analysis quality improves significantly with comprehensive POD
documentation following consistent patterns.

=back

=head2 Best Practices for Optimal Results

=over 4

=item * B<Comprehensive POD Documentation>

Write detailed POD with explicit parameter documentation using
consistent patterns like C<$param - type (constraints), description>.

=item * B<Consistent Coding Patterns>

Use consistent parameter validation patterns and method signatures
throughout your codebase.

=item * B<Schema Review Process>

Review and refine automatically generated schemas, particularly
those with low confidence scores.

=item * B<Descriptive Naming>

Use descriptive method and parameter names that clearly indicate
purpose and expected types.

=item * B<Progressive Enhancement>

Start with automatically generated schemas and progressively
refine them based on test results and code understanding.

=back

The module is particularly valuable for large codebases where manual schema
creation would be prohibitively time-consuming, and for maintaining test
coverage as code evolves through continuous integration pipelines.

=head2 Advanced Type Detection

The schema extractor includes enhanced type detection capabilities that identify specialized Perl types beyond basic strings and integers.
L<DateTime> and L<Time::Piece> objects are detected through isa() checks and method call patterns, while date strings (ISO 8601, YYYY-MM-DD) and UNIX timestamps are recognized through regex validation and numeric range checks.
File handles and file paths are identified via I/O operations and file test operators, coderefs are detected through ref() checks and invocation patterns, and enum-like parameters are extracted from validation code including regex patterns (C</^(a|b|c)$/>), hash lookups, grep statements, and if/elsif chains.
These detected types are preserved in the generated YAML schemas with appropriate semantic annotations, enabling test generators to create more accurate and meaningful test cases.

=head3 Example Advanced Type Schema

For a method like:

    sub process_event {
        my ($self, $timestamp, $status, $callback) = @_;
        croak unless $timestamp > 1000000000;
        croak unless $status =~ /^(active|pending|complete)$/;
        croak unless ref($callback) eq 'CODE';
        $callback->($timestamp, $status);
    }

The extractor generates:

    ---
    function: process_event
    module: MyModule
    input:
      timestamp:
        type: integer
        # min: 0
        # max: 2147483647
        position: 0
        _note: Unix timestamp
        semantic: unix_timestamp
      status:
        type: string
        enum:
          - active
          - pending
          - complete
        position: 1
        _note: 'Must be one of: active, pending, complete'
      callback:
        type: coderef
        position: 2
        _note: 'CODE reference - provide sub { } in tests'

=head1 RELATIONSHIP DETECTION

The schema extractor detects relationships and dependencies between parameters,
enabling more sophisticated validation and test generation.

=head2 Relationship Types

=over 4

=item * B<mutually_exclusive>

Parameters that cannot be used together.

    die if $file && $content;  # Can't specify both

Generated schema:

    relationships:
      - type: mutually_exclusive
        params: [file, content]
        description: Cannot specify both file and content

=item * B<required_group>

At least one parameter from the group must be specified (OR logic).

    die unless $id || $name;  # Must provide one

Generated schema:

    relationships:
      - type: required_group
        params: [id, name]
        logic: or
        description: Must specify either id or name

=item * B<conditional_requirement>

If one parameter is specified, another becomes required (IF-THEN logic).

    die if $async && !$callback;  # async requires callback

Generated schema:

    relationships:
      - type: conditional_requirement
        if: async
        then_required: callback
        description: When async is specified, callback is required

=item * B<dependency>

One parameter depends on another being present.

    die "Port requires host" if $port && !$host;

Generated schema:

    relationships:
      - type: dependency
        param: port
        requires: host
        description: port requires host to be specified

=item * B<value_constraint>

Specific value requirements between parameters.

    die if $ssl && $port != 443;  # ssl requires port 443

Generated schema:

    relationships:
      - type: value_constraint
        if: ssl
        then: port
        operator: ==
        value: 443
        description: When ssl is specified, port must equal 443

=item * B<value_conditional>

Parameter required when another has a specific value.

    die if $mode eq 'secure' && !$key;

Generated schema:

    relationships:
      - type: value_conditional
        if: mode
        equals: secure
        then_required: key
        description: When mode equals 'secure', key is required

=back

=head2 Default Value Extraction

The extractor comprehensively extracts default values from both code and POD documentation:

=head3 Code Pattern Recognition

Extracts defaults from multiple Perl idioms:

=over 4

=item * Logical OR operator: C<$param = $param || 'default'>

=item * Defined-or operator: C<$param //= 'default'>

=item * Ternary operator: C<$param = defined $param ? $param : 'default'>

=item * Unless conditional: C<$param = 'default' unless defined $param>

=item * Chained defaults: C<$param = $param || $self->{_default} || 'fallback'>

=item * Multi-line patterns: C<$param = {} unless $param>

=back

=head3 POD Pattern Recognition

Extracts defaults from documentation:

=over 4

=item * Standard format: C<Default: 'value'>

=item * Alternative format: C<Defaults to: 'value'>

=item * Inline format: C<Optional, default: 'value'>

=item * Parameter lists: C<$param - type, default 'value'>

=back

=head3 Value Processing

Properly handles:

=over 4

=item * String literals with quotes and escape sequences

=item * Numeric values (integers and floats)

=item * Boolean values (true/false converted to 1/0)

=item * Empty data structures ([] and {})

=item * Special values (undef, __PACKAGE__)

=item * Complex expressions (preserved as-is when unevaluatable)

=item * Quote operators (q{}, qq{}, qw{})

=back

=head3 Type Inference

When a parameter has a default value but no explicit type annotation,
the type is automatically inferred from the default:

    $options = {}        # inferred as hashref
    $items = []          # inferred as arrayref
    $count = 42          # inferred as integer
    $ratio = 3.14        # inferred as number
    $enabled = 1         # inferred as boolean

=head2 Context-Aware Return Analysis

The extractor provides comprehensive analysis of method return behavior,
including context sensitivity, error handling conventions, and method chaining patterns.

When a method's POD contains a C<=head4 Output> block in
L<Params::Validate::Strict> schema format, the C<type> declared there is
used as the authoritative output type and takes precedence over all
heuristic code analysis:

    =head4 Output

        {
            type => 'hashref',
        }

This is the recommended way to document methods whose return type would
otherwise be misidentified (e.g. a method that returns C<$self-E<gt>{cache}>
where the cache happens to hold a hashref).

Using parentheses as the outer container emits C<type: array>, indicating a
list-returning method.  L<App::Test::Generator> 0.39+ (with L<Test::Returns>
0.03+) captures these results in list context automatically:

    =head4 Output

        (
            {
                type => 'hashref',
            },
            ...
        )

=head3 List vs Scalar Context Detection

Automatically detects methods that return different values based on calling context:

    sub get_items {
        my $self = $_[0];
        return wantarray ? @items : scalar(@items);
    }

Detection captures:

=over 4

=item * C<_context_aware> flag - Method uses wantarray

=item * C<_list_context> - Type returned in list context (e.g., 'array')

=item * C<_scalar_context> - Type returned in scalar context (e.g., 'integer')

=back

Recognizes both ternary operator patterns and conditional return patterns.

=head3 Void Context Methods

Identifies methods that don't return meaningful values:

=over 4

=item * Setters (C<set_*> methods)

=item * Mutators (C<add_*, remove_*, delete_*, clear_*, reset_*, update_*>)

=item * Loggers (C<log, debug, warn, error, info>)

=item * Methods with only empty returns

=back

Example:

    sub set_name {
        my ($self, $name) = @_;
        $self->{name} = $name;
        return;  # Void context
    }

Sets C<_void_context> flag and C<type =E<gt> 'void'>.

=head3 Method Chaining Detection

Identifies chainable methods that return C<$self> for fluent interfaces:

    sub set_width {
        my ($self, $width) = @_;
        $self->{width} = $width;
        return $self;  # Chainable
    }

Detection provides:

=over 4

=item * C<_returns_self> - Returns invocant for chaining

=item * C<class> - The class name being returned

=back

Also detects chaining documentation in POD (keywords: "chainable", "fluent interface",
"returns self", "method chaining").

=head3 Error Return Conventions

Analyzes how methods signal errors:

B<Pattern Detection:>

=over 4

=item * C<undef_on_error> - Explicit C<return undef if/unless condition>

=item * C<implicit_undef> - Bare C<return if/unless condition>

=item * C<empty_list> - C<return ()> for list context errors

=item * C<zero_on_error> - Returns 0/false for boolean error indication

=item * C<exception_handling> - Uses eval blocks with error checking

=back

B<Example Analysis:>

    sub fetch_user {
        my ($self, $id) = @_;

        return undef unless $id;        # undef_on_error
        return undef if $id < 0;        # undef_on_error

        return $self->{users}{$id};
    }

Results in:

    _error_return: 'undef'
    _success_failure_pattern: 1
    _error_handling: {
        undef_on_error: ['$id', '$id < 0']
    }

B<Success/Failure Pattern:>

Methods that return different types for success vs. failure are flagged with
C<_success_failure_pattern>. Common patterns:

=over 4

=item * Returns value on success, undef on failure

=item * Returns true on success, false on failure

=item * Returns data on success, empty list on failure

=back

=head3 Success Indicator Detection

Methods that always return true (typically for side effects):

    sub update_status {
        my ($self, $status) = @_;
        $self->{status} = $status;
        return 1;  # Success indicator
    }

Sets C<_success_indicator> flag when method consistently returns 1.

=head3 Schema Output

Enhanced return analysis adds these fields to method schemas:

    output:
      type: boolean              # Inferred return type
      _context_aware: 1           # Uses wantarray
      _list_context:
        type: array
      _scalar_context:
        type: integer
      _returns_self: 1               # Returns $self
      _void_context: 1            # No meaningful return
      _success_indicator: 1       # Always returns true
      _error_return: undef        # How errors are signaled
      _success_failure_pattern: 1 # Mixed return types
      _error_handling:            # Detailed error patterns
        undef_on_error: [...]
        exception_handling: 1

This comprehensive analysis enables:

=over 4

=item * Better test generation (testing both contexts, error paths)

=item * Documentation generation (clear error conventions)

=item * API design validation (consistent error handling)

=item * Contract specification (precise return behavior)

=back

=head2 Example

For a method like:

    sub connect {
        my ($self, $host, $port, $ssl, $file, $content) = @_;

        die if $file && $content;                    # mutually exclusive
        die unless $host || $file;                   # required group
        die "Port requires host" if $port && !$host; # dependency
        die if $ssl && $port != 443;                 # value constraint

        # ... connection logic
    }

The extractor generates:

    relationships:
      - type: mutually_exclusive
        params: [file, content]
        description: Cannot specify both file and content
      - type: required_group
        params: [host, file]
        logic: or
        description: Must specify either host or file
      - type: dependency
        param: port
        requires: host
        description: port requires host to be specified
      - type: value_constraint
        if: ssl
        then: port
        operator: ==
        value: 443
        description: When ssl is specified, port must equal 443

=head1 MODERN PERL FEATURES

This module adds support for:

=head2 Subroutine Signatures (Perl 5.20+)

    sub connect($host, $port = 3306, %options) {
        ...
    }

Extracts: required params, optional params with defaults, slurpy params

=head2 Type Constraints (Perl 5.36+)

    sub calculate($x :Int, $y :Num) {
        ...
    }

Recognizes: Int, Num, Str, Bool, ArrayRef, HashRef, custom classes

=head3 Subroutine Attributes

    sub get_value :lvalue :Returns(Int) {
        ...
    }

Detects: :lvalue, :method, :Returns(Type), custom attributes

=head2 Postfix Dereferencing (Perl 5.20+)

    my @array = $arrayref->@*;
    my %hash = $hashref->%*;
    my @slice = $arrayref->@[1,3,5];

Tracks usage of modern dereferencing syntax

=head2 Field Declarations (Perl 5.38+)

    field $host :param = 'localhost';
    field $port :param(port_number) = 3306;
    field $logger :param :isa(Log::Any);

Extracts fields and maps them to parameters

=head2 Modern Perl Features Support

The schema extractor supports modern Perl syntax introduced in versions 5.20, 5.36, and 5.38+.

=head3 Subroutine Signatures (Perl 5.20+)

Automatically extracts parameters from native Perl signatures:

    use feature 'signatures';

    sub connect($host, $port = 3306, $database = undef) {
        ...
    }

Extracted schema includes:

=over 4

=item * Parameter positions

=item * Optional vs required parameters

=item * Default values from signature

=item * Slurpy parameters (@array, %hash)

=back

B<Example:>

    # Signature with defaults
    sub process($file, %options) { ... }

    # Extracts:
    # $file: position 0, required
    # %options: position 1, optional, slurpy hash

=head3 Type Constraints in Signatures (Perl 5.36+)

Recognizes type constraints in signature parameters:

    sub calculate($x :Int, $y :Num, $name :Str = "result") {
        return $x + $y;
    }

Supported constraint types:

=over 4

=item * C<:Int, :Integer> -> integer

=item * C<:Num, :Number> -> number

=item * C<:Str, :String> -> string

=item * C<:Bool, :Boolean> -> boolean

=item * C<:ArrayRef, :Array> -> arrayref

=item * C<:HashRef, :Hash> -> hashref

=item * C<:ClassName> -> object with isa constraint

=back

Type constraints are combined with defaults when both are present.

=head3 Subroutine Attributes

Extracts and documents subroutine attributes:

    sub get_value :lvalue {
        my $self = shift;
        return $self->{value};
    }

    sub calculate :Returns(Int) :method {
        my ($self, $x, $y) = @_;
        return $x + $y;
    }

Recognized attributes stored in C<_attributes> field:

=over 4

=item * C<:lvalue> - Method can be assigned to

=item * C<:method> - Explicitly marked as method

=item * C<:Returns(Type)> - Declares return type

=item * Custom attributes with values: C<:MyAttr(value)>

=back

=head3 Postfix Dereferencing (Perl 5.20+)

Detects usage of postfix dereferencing syntax:

    use feature 'postderef';

    sub process_array {
        my ($self, $arrayref) = @_;
        my @array = $arrayref->@*;        # Array dereference
        my @slice = $arrayref->@[1,3,5];  # Array slice
        return @array;
    }

    sub process_hash {
        my ($self, $hashref) = @_;
        my %hash = $hashref->%*;          # Hash dereference
        return keys %hash;
    }

Tracked features stored in C<_modern_features>:

=over 4

=item * C<array_deref> - Uses C<-E<gt>@*>

=item * C<hash_deref> - Uses C<-E<gt>%*>

=item * C<scalar_deref> - Uses C<-E<gt>$*>

=item * C<code_deref> - Uses C<-E<gt>&*>

=item * C<array_slice> - Uses C<-E<gt>@[...]>

=item * C<hash_slice> - Uses C<-E<gt>%{...}>

=back

=head3 Field Declarations (Perl 5.38+)

Extracts field declarations from class syntax and maps them to method parameters:

    use feature 'class';

    class DatabaseConnection {
        field $host :param = 'localhost';
        field $port :param = 3306;
        field $username :param(user);
        field $password :param;
        field $logger :param :isa(Log::Any);

        method connect() {
            # Fields available as instance variables
        }
    }

Field attributes:

=over 4

=item * C<:param> - Field is a constructor parameter (uses field name)

=item * C<:param(name)> - Field maps to parameter with different name

=item * C<:isa(Class)> - Type constraint for the field

=item * Default values in field declarations

=back

Extracted schema includes both field information in C<_fields> and merged parameter
information in C<input>, allowing proper validation of class constructors.

=head3 Mixed Modern and Traditional Syntax

The extractor handles code that mixes modern and traditional syntax:

    sub modern($x, $y = 5) {
        # Modern signature with default
    }

    sub traditional {
        my ($self, $x, $y) = @_;
        $y //= 5;  # Traditional default in code
        # Both extract same parameter information
    }

Priority order for parameter information:

=over 4

=item 1. Signature declarations (highest priority)

=item 2. Field declarations (for class methods)

=item 3. POD documentation

=item 4. Code analysis (lowest priority)

=back

This ensures that explicit declarations in signatures take precedence over
inferred information from code analysis.

=head3 Backwards Compatibility

All modern Perl feature detection is optional and automatic:

=over 4

=item * Traditional C<sub> declarations continue to work

=item * Code without modern features extracts parameters as before

=item * Modern features are additive - they enhance rather than replace existing extraction

=item * Schemas include C<_source> field indicating where parameter info came from

=back

=head2 _yamltest_hints

Each method schema returned by L</extract_all> now optionally includes a
C<_yamltest_hints> key, which provides guidance for automated test generation
based on the code analysis.

This is intended to help L<App::Test::Generator> create meaningful tests,
including boundary and invalid input cases, without manually specifying them.

The structure is a hashref with the following keys:

=over 4

=item * boundary_values

An arrayref of numeric values that represent boundaries detected from
comparisons in the code. These are derived from literals in statements
like C<$x < 0> or C<$y >= 255>. The generator can use these to create
boundary tests.

Example:

    _yamltest_hints:
      boundary_values: [0, 1, 100, 255]

=item * invalid_inputs

An arrayref of values that are likely to be rejected by the method,
based on checks like C<defined>, empty strings, or numeric validations.

Example:

    _yamltest_hints:
      invalid_inputs: [undef, '', -1]

=item * equivalence_classes

An arrayref intended to capture detected equivalence classes or patterns
among inputs. Currently this is empty by default, but future enhancements
may populate it based on detected input groupings.

Example:

    _yamltest_hints:
      equivalence_classes: []

=back

=head3 Usage

When calling C<extract_all>, each method schema will include
C<_yamltest_hints> if any hints were detected:

    my $schemas = $extractor->extract_all;
    my $hints  = $schemas->{example_method}->{_yamltest_hints};

You can then feed these hints into automated test generators to produce
negative tests, boundary tests, and parameter-specific test cases.

=head3 Notes

=over 4

=item * Hints are inferred heuristically from code and validation statements.

=item * Not all inputs are guaranteed to be detected; the feature is additive
and will never remove information from the schema.

=item * Currently, equivalence classes are not populated, but the field exists
for future extension.

=item * Boundary and invalid input hints are deduplicated to avoid repeated
test values.

=back

=head3 Examples

Given a method like:

    sub example {
        my ($x) = @_;
        die "negative" if $x < 0;
        return unless defined($x);
        return $x * 2;
    }

After running:

    my $extractor = App::Test::Generator::SchemaExtractor->new(
        input_file => 'TestHints.pm',
        output_dir => '/tmp',
        quiet    => 1,
    );

    my $schemas = $extractor->extract_all;

The schema for the method "example" will include:

    $schemas->{example} = {
        function => 'example',
        _confidence => {
            input  => 'unknown',
            output => 'unknown',
        },
        input => {
            x => {
                type     => 'scalar',
                optional => 0,
            }
        },
        output => {
            type => 'scalar',
        },
        _yamltest_hints => {
            boundary_values => [0, 1],
            invalid_inputs  => [undef, -1],
            equivalence_classes => [],
        },
        _notes => '...',
        _analysis => {
            input_confidence  => 'low',
            output_confidence => 'unknown',
            confidence_factors => {
                input  => {...},
                output => {...},
            },
            overall_confidence => 'low',
        },
        _fields => {},
        _modern_features => {},
        _attributes => {},
    };

=head1 METHODS

=head2 new

Construct a new SchemaExtractor for a given Perl source file.

    my $extractor = App::Test::Generator::SchemaExtractor->new(
        input_file           => 'lib/MyModule.pm',  # Required
        output_dir           => 'schemas/',         # Optional - only needed if writing schemas
        verbose              => 1,                  # Default: 0
        include_private      => 1,                  # Default: 0
        max_parameters       => 50,                 # Default: 20
        confidence_threshold => 0.7,               # Default: 0.5
        strict_pod           => 0|1|2,              # Default: 0 (off)
        allow_signature_exec => 1,                  # Default: 0 (off)
    );

=head3 Arguments

=over 4

=item * C<$input_file>

Path to the Perl source file to analyse. Required. Must exist on disk.

=item * C<output_dir>

Directory to write generated schema YAML files. Optional - only
required if C<_write_schema> will be called. Callers passing
C<no_write =E<gt> 1> to C<extract_all> do not need to supply it.

=item * C<verbose>

Print progress messages to stdout during analysis. Optional, default 0.

=item * C<include_private>

Include methods whose names begin with C<_> in the analysis. Optional,
default 0. Methods whose name begins with C<_new>, C<_init>, or
C<_build> are always included regardless of this setting (a prefix
match, so e.g. C<_build_attribute> and C<_init_logger> qualify too,
matching common Moose builder/initializer naming conventions).

=item * C<max_parameters>

Safety limit on the number of parameters analysed per method to prevent
runaway processing on pathological code. Optional, default 20.

=item * C<confidence_threshold>

Minimum confidence score (0.0-1.0) below which a schema is marked with
C<_low_confidence =E<gt> 1>. Optional, default 0.5.

=item * C<strict_pod>

Controls POD/code agreement validation. C<0> disables validation,
C<1> emits warnings, C<2> croaks on first disagreement. Also accepts
the strings C<off>, C<warn>, and C<fatal>. Optional, default 0.

=item * C<allow_signature_exec>

Opt-in flag allowing extraction of parameter types from a
L<Type::Params> C<signature_for()> declaration. This requires actually
running the C<signature_for> expression (sliced from the target
module's own source) in a forked C<perl -T> process, since
L<Type::Params> types are runtime objects that cannot be introspected
statically. Every other extraction path in this module is static
(L<PPI>-only) analysis that never executes any of the target module's
code; this is the one exception. Optional, default 0 (the
C<signature_for> path is silently skipped, with a warning under
C<verbose>, when off). Only enable this for modules whose code you
already trust enough to execute.

=back

=head3 Returns

A blessed hashref. Croaks if C<input_file> is missing or does not
exist on disk.

=head3 Side effects

Reads and parses the input file using L<PPI> at construction time.

=head3 API specification

=head4 input

    {
        input_file           => { type => SCALAR },
        output_dir           => { type => SCALAR,  optional => 1 },
        verbose              => { type => SCALAR,  optional => 1 },
        include_private      => { type => SCALAR,  optional => 1 },
        max_parameters       => { type => SCALAR,  optional => 1 },
        confidence_threshold => { type => SCALAR,  optional => 1 },
        strict_pod           => { type => SCALAR,  optional => 1 },
        allow_signature_exec => { type => SCALAR,  optional => 1 },
    }

=head4 output

    {
        type => OBJECT,
        isa  => 'App::Test::Generator::SchemaExtractor',
    }

=cut
1335
1336sub new {
1337
453
3024312
        my $class = shift;
1338
1339        # Handle hash or hashref arguments
1340
453
869
        my $params = Params::Get::get_params('input_file', @_) || {};
1341
1342
450
5874
        croak(__PACKAGE__, ': input_file required') unless exists $params->{input_file};
1343
1344        my $self = {
1345                input_file => $params->{input_file},
1346                # output_dir is optional — only required if _write_schema will be called.
1347                # Callers using extract_all(no_write => 1) do not need to supply it.
1348                output_dir => $params->{output_dir},
1349                verbose => $params->{verbose} // 0,
1350                include_private => $params->{include_private} // 0,       # include _private methods
1351                confidence_threshold => $params->{confidence_threshold} // $DEFAULT_CONFIDENCE_THRESH,
1352                max_parameters       => $params->{max_parameters}       // $DEFAULT_MAX_PARAMETERS,       # safety limit
1353                strict_pod => _validate_strictness_level($params->{strict_pod}),  # Enable strict POD checking
1354
450
1946
                allow_signature_exec => $params->{allow_signature_exec} // 0,     # opt-in: execute Type::Params signature_for() exprs from the target module
1355        };
1356
1357        # Validate input file exists
1358
450
3338
        unless (-f $self->{input_file}) {
1359
3
52
                croak(__PACKAGE__, ": Input file '$self->{input_file}' does not exist");
1360        }
1361
1362
447
988
        return bless $self, $class;
1363}
1364
1365 - 1438
=head2 extract_all

Extract schemas for all qualifying methods in the module and return
them as a hashref.

    my $schemas = $extractor->extract_all();

    # Suppress writing .yml files to disk
    my $schemas = $extractor->extract_all(no_write => 1);

=head3 Arguments

=over 4

=item * C<no_write>

When true, schema files are not written to C<output_dir>. The returned
hashref is still fully populated. Useful when the caller wants to
inspect or augment schemas before deciding whether to write them.
Optional, default 0.

=back

=head3 Returns

A hashref mapping method name strings to schema hashrefs. Each schema
contains at minimum the keys C<function>, C<module>, C<input>,
C<output>, and C<_analysis>. See L</Generated Schema Structure> for
the full structure.

=head3 Side effects

Parses the input file with L<PPI>. Writes one YAML file per method to
C<output_dir> unless C<no_write> is set. Creates C<output_dir> if it
does not exist and writing is enabled.

=head3 Notes

Private methods (names beginning with C<_>) are excluded unless
C<include_private =E<gt> 1> was passed to C<new>. Duplicate method
names are deduplicated with a warning logged to stdout in verbose mode.

POD/code agreement validation is applied if C<strict_pod> was set in
C<new>. At level 2 (fatal), the first disagreement causes an immediate
croak.

=head3 API specification

=head4 input

    {
        self     => { type => OBJECT, isa => 'App::Test::Generator::SchemaExtractor' },
        no_write => { type => SCALAR, optional => 1 },
    }

=head4 output

    {
        type => HASHREF,
        keys => {
            '*' => {
                type => HASHREF,
                keys => {
                    function  => { type => SCALAR  },
                    module    => { type => SCALAR  },
                    input     => { type => HASHREF },
                    output    => { type => HASHREF },
                    _analysis => { type => HASHREF },
                },
            },
        },
    }

=cut
1439
1440sub extract_all {
1441
87
2602
        my $self = shift;
1442
87
146
        my $params = Params::Get::get_params(undef, @_) || {};
1443
1444
87
1023
        $self->_log("Parsing $self->{input_file}...");
1445
87
182
        $self->_log('Strict POD mode: ' . (qw(off warn fatal))[$self->{strict_pod}]);
1446
1447
87
289
        my $document = PPI::Document->new($self->{input_file}) or die "Failed to parse $self->{input_file}: $!";
1448
1449        # Store document for later use
1450
87
694810
        $self->{_document} = $document;
1451
1452
87
229
        my $package_name = $self->_extract_package_name($document);
1453
87
783
        $self->{_package_name} //= $package_name;
1454
87
184
        $self->_log("Package: $package_name");
1455
1456
87
121
        my $methods = $self->_find_methods($document);
1457
87
188
        $self->_log('Found ' . scalar(@$methods) . ' methods (pre-dedup)');
1458
1459
87
71
        my %schemas;
1460
87
87
58
85
        foreach my $method (@{$methods}) {
1461
286
464
                $self->_log("\nAnalyzing method: $method->{name}");
1462
1463
286
351
                my $schema = $self->_analyze_method($method);
1464
285
364
                $schemas{$method->{name}} = $schema;
1465
285
289
                $schema->{'module'} = $package_name;
1466
1467                # Write individual schema file
1468                # Only write schema files if no_write is not set
1469
285
434
                $self->_write_schema($method->{name}, $schema) unless $params->{no_write};
1470        }
1471
1472
86
303
        return \%schemas;
1473}
1474
1475# --------------------------------------------------
1476# _extract_package_name
1477#
1478# Purpose:    Extract the Perl package name from a
1479#             PPI document, or from the cached value
1480#             stored at construction time.
1481#
1482# Entry:      $document - a PPI::Document, or undef
1483#                         to use $self->{_document}.
1484#
1485# Exit:       Returns the package namespace string,
1486#             or an empty string if no package
1487#             statement is found.
1488#
1489# Side effects: Stores the package name in
1490#               $self->{_package_name} if not already
1491#               set.
1492#
1493# Notes:      Croaks if more than one package
1494#             declaration is found — multi-package
1495#             files are not supported.
1496# --------------------------------------------------
1497sub _extract_package_name {
1498
102
3243
        my ($self, $document) = @_;
1499
1500
102
124
        if(!defined($document)) {
1501
13
13
                $document = $self->{_document};
1502        }
1503
102
174
        my $pkgs = $document->find('PPI::Statement::Package') || [];
1504
102
179755
        if(@$pkgs == 0) {
1505
8
24
                my $package_stmt = $document->find_first('PPI::Statement::Package');
1506
8
9257
                return $package_stmt ? $package_stmt->namespace() : '';
1507        }
1508
94
121
        croak('More than one package declaration found') if @$pkgs > 1;
1509
94
265
        $self->{_package_name} //= $pkgs->[0]->namespace();
1510
94
1078
        return $pkgs->[0]->namespace();
1511}
1512
1513# --------------------------------------------------
1514# _find_methods
1515#
1516# Purpose:    Locate all subroutine and method
1517#             declarations in a PPI document,
1518#             including Moose-style method modifiers
1519#             and Perl 5.38 class/method syntax.
1520#
1521# Entry:      $document - a PPI::Document.
1522#
1523# Exit:       Returns an arrayref of method hashrefs,
1524#             each containing: name, node, body, pod,
1525#             type, and optionally modifier, class,
1526#             and fields keys.
1527#             Private methods (names beginning with
1528#             _) are excluded unless include_private
1529#             was set in new(), except for _new,
1530#             _init, and _build which are always
1531#             included.
1532#
1533# Side effects: Logs progress and warnings to stdout
1534#               when verbose is set.
1535#
1536# Notes:      Duplicate method names are silently
1537#             deduplicated — the second occurrence
1538#             is dropped with a verbose warning.
1539#             Class/method detection is regex-based
1540#             and may misbehave on complex code.
1541# --------------------------------------------------
1542sub _find_methods {
1543
93
34247
        my ($self, $document) = @_;
1544
1545
93
109
        my $subs = $document->find('PPI::Statement::Sub') || [];
1546
93
156336
        my $sub_decls = $document->find('PPI::Statement') || [];
1547
1548
93
155104
        my @methods;
1549
93
125
        foreach my $sub (@$subs) {
1550
308
11985
                my $name = $sub->name();
1551
1552
308
6495
                next unless defined $name;      # Skip anonymous routines
1553
308
355
                next if $name =~ /^(BEGIN|END|DESTROY|AUTOLOAD|CHECK|INIT|UNITCHECK)$/;
1554
1555                # Skip private methods unless explicitly included, or they're special
1556
306
360
                if ($name =~ /^_/ && $name !~ /^_(new|init|build)/) {
1557
10
20
                        next unless $self->{include_private};
1558                }
1559
1560                # Get the POD before this sub
1561
299
316
                my $pod = $self->_extract_pod_before($sub);
1562
1563
299
329
                push @methods, {
1564                        name => $name,
1565                        node => $sub,
1566                        body => $sub->content(),
1567                        pod => $pod,
1568                        type => 'sub',
1569                };
1570        }
1571
1572        # Look for class { method } syntax (Perl 5.38+)
1573
93
6665
        my $content = $document->content();
1574
93
20851
        if ($content =~ /\bclass\b/) {
1575
18
31
                $self->_log('  Detecting class/method syntax...');
1576
18
30
                $self->_extract_class_methods($content, \@methods);
1577        }
1578
1579        # Process method modifiers (Moose)
1580
93
90
        foreach my $decl (@$sub_decls) {
1581
1599
1217
                my $content = $decl->content;
1582
1599
35743
                if ($content =~ /^\s*(before|after|around)\s+['"]?(\w+)['"]?\b/) {
1583
0
0
                        my ($modifier, $method_name) = ($1, $2);
1584
0
0
                        my $full_name = "${modifier}_$method_name";
1585
1586                        # Look for the actual sub definition that follows
1587
0
0
                        my $next_sib = $decl->next_sibling;
1588
0
0
                        while ($next_sib && !$next_sib->isa('PPI::Statement::Sub')) {
1589
0
0
                                $next_sib = $next_sib->next_sibling;
1590                        }
1591
1592
0
0
                        if ($next_sib && $next_sib->isa('PPI::Statement::Sub')) {
1593
0
0
                                my $pod = $self->_extract_pod_before($decl); # POD might be before modifier
1594
0
0
                                push @methods, {
1595                                        name => $full_name,
1596                                        node => $next_sib,
1597                                        body => $next_sib->content,
1598                                        pod => $pod,
1599                                        type => 'modifier',
1600                                        original_method => $method_name,
1601                                        modifier => $modifier,
1602                                };
1603
0
0
                                $self->_log("  Found method modifier: $full_name");
1604                        }
1605                }
1606        }
1607
1608        # Prevent silent duplicate method overwrites
1609
93
81
        my %seen;
1610        @methods = grep {
1611
93
300
100
225
                my $n = $_->{name};
1612
300
354
                if ($seen{$n}++) {
1613
1
2
                        $self->_log("  WARNING: duplicate method '$n' ignored");
1614
1
2
                        0;
1615                } else {
1616
299
260
                        1;
1617                }
1618        } @methods;
1619
1620
93
188
        return \@methods;
1621}
1622
1623# --------------------------------------------------
1624# _extract_class_methods
1625#
1626# Purpose:    Extract method declarations from
1627#             Perl 5.38 class { method {} } syntax
1628#             by regex-based scanning of the class
1629#             body content.
1630#
1631# Entry:      $content - full document source string.
1632#             $methods - arrayref to push discovered
1633#                        method hashrefs onto
1634#                        (modified in place).
1635#
1636# Exit:       Returns nothing. Appends to $methods.
1637#
1638# Side effects: Logs class and method discoveries
1639#               to stdout when verbose is set.
1640#
1641# Notes:      This is experimental — regex-based
1642#             class body parsing may misbehave on
1643#             complex or nested class declarations.
1644#             Class body boundaries are tracked by
1645#             simple brace counting, which will
1646#             fail on unbalanced braces in strings
1647#             or heredocs.
1648# --------------------------------------------------
1649sub _extract_class_methods {
1650
19
28
        my ($self, $content, $methods) = @_;
1651
1652        # EXPERIMENTAL: regex-based parsing, may misbehave on complex code
1653
1654        # Simple pattern: find "class Name {" blocks
1655        # This won't handle all edge cases but will work for simple classes
1656
19
75
        while ($content =~ /class\s+(\w+)\s*\{/g) {
1657
2
2
                my $class_name = $1;
1658
2
2
                my $start_pos = pos($content);
1659
1660                # Find the matching closing brace. $start_pos is just after the
1661                # opening '{' consumed by the regex above, so back up one
1662                # character to hand the brace itself to extract_bracketed.
1663
2
7
                require Text::Balanced;
1664
2
8
                my $extracted = Text::Balanced::extract_bracketed(substr($content, $start_pos - 1), '{}');
1665
1666
2
684
                next unless defined $extracted; # unbalanced braces, skip class
1667
1668
2
3
                my $class_body = substr($extracted, 1, length($extracted) - 2);
1669
1670
2
6
                $self->_log("  Found class $class_name");
1671
1672                # Extract field declarations from class
1673
2
5
                my $fields = $self->_extract_field_declarations($class_body);
1674
1675                # Find methods in the class body
1676
2
8
                while ($class_body =~ /method\s+(\w+)\s*(\([^)]*\))?\s*\{/g) {
1677
2
6
                        my ($method_name, $sig_with_parens) = ($1, $2 || '()');
1678
1679                        # Skip private unless configured
1680
2
4
                        if ($method_name =~ /^_/ && $method_name !~ /^_(new|init|build)/) {
1681
0
0
                                next unless $self->{include_private};
1682                        }
1683
1684                        # Reconstruct as sub for analysis
1685
2
2
                        my $signature = $sig_with_parens;
1686
2
4
                        $signature =~ s/^\(//;
1687
2
4
                        $signature =~ s/\)$//;
1688
1689                        # Build a fake sub declaration
1690
2
3
                        my $fake_sub = "sub $method_name($signature) { }";
1691
1692
2
8
                        push @$methods, {
1693                                name => $method_name,
1694                                node => undef,
1695                                body => $fake_sub,   # Just the signature for now
1696                                is_stub => 1,
1697                                pod => '',
1698                                type => 'method',
1699                                class => $class_name,
1700                                fields => $fields,
1701                        };
1702
1703
2
4
                        $self->_log("  Found method $method_name in class $class_name");
1704                }
1705        }
1706}
1707
1708# --------------------------------------------------
1709# _extract_pod_before
1710#
1711# Purpose:    Collect the POD documentation that
1712#             appears immediately before a
1713#             subroutine in the PPI document, by
1714#             walking backwards through siblings.
1715#
1716# Entry:      $sub - a PPI node (typically a
1717#                    PPI::Statement::Sub).
1718#
1719# Exit:       Returns a string containing all POD
1720#             content found before the sub, with
1721#             inline parameter comments converted
1722#             to =item format. Returns an empty
1723#             string if no POD is found.
1724#
1725# Side effects: None.
1726#
1727# Notes:      Stops walking backwards on the first
1728#             non-POD, non-whitespace, non-separator,
1729#             non-include node encountered.
1730#             Walking is capped at $POD_WALK_LIMIT
1731#             steps to prevent runaway processing
1732#             on pathological documents.
1733# --------------------------------------------------
1734sub _extract_pod_before {
1735
301
3411
        my ($self, $sub) = @_;
1736
1737
301
233
        my $pod = '';
1738
301
340
        my $current = $sub->previous_sibling();
1739
301
3447
        my $seen_code = 0;
1740
301
201
        my $steps = 0;
1741
1742        # Walk backwards collecting POD.
1743        # Stop after the first pod token so that a =cut before =head1 METHODS
1744        # prevents class-level POD from being mistaken for method-specific POD.
1745
301
546
        while($current && $steps++ < $POD_WALK_LIMIT) {
1746
745
6578
                if ($current->isa('PPI::Token::Pod')) {
1747
109
119
                        $pod = $current->content() . $pod;
1748
109
209
                        last;   # Only take the immediately adjacent pod block
1749                } elsif ($current->isa('PPI::Token::Comment')) {
1750                        # Include comments that might contain parameter info
1751
6
7
                        my $comment = $current->content();
1752
6
13
                        if ($comment =~ /#\s*(?:param|arg|input)\s+\$(\w+)\s*:\s*(.+)/i) {
1753
0
0
                                $pod .= "=item \$$1\n$2\n\n";
1754                        }
1755                } elsif ($current->isa('PPI::Token::Whitespace') ||
1756                         $current->isa('PPI::Token::Separator')) {
1757                        # Skip whitespace and separators
1758                } elsif ($current->isa('PPI::Statement::Include')) {
1759                        # allow 'use strict', 'use warnings' between POD and sub
1760                } else {
1761                        # Hit non-POD, non-whitespace - stop
1762
188
129
                        last;
1763                }
1764
448
425
                $current = $current->previous_sibling();
1765        }
1766
1767
301
294
        return $pod;
1768}
1769
1770# --------------------------------------------------
1771# _analyze_method
1772#
1773# Purpose:    Perform full multi-source analysis of
1774#             a single method and produce a complete
1775#             schema hashref, combining POD analysis,
1776#             code pattern detection, signature
1777#             analysis, validator schema extraction,
1778#             confidence scoring, relationship
1779#             detection, and modern Perl feature
1780#             extraction.
1781#
1782# Entry:      $method - a method hashref as produced
1783#                       by _find_methods, containing
1784#                       at minimum: name, body, pod.
1785#
1786# Exit:       Returns a schema hashref containing:
1787#             function, input, output, _confidence,
1788#             _analysis, _notes, and optionally:
1789#             new, accessor, relationships,
1790#             _yamltest_hints, _attributes,
1791#             _modern_features, _fields, _model,
1792#             _low_confidence.
1793#
1794# Side effects: Logs progress to stdout when verbose
1795#               is set. May carp or croak if
1796#               strict_pod is enabled and POD/code
1797#               disagreements are found.
1798#
1799# Notes:      This is the central analysis entry
1800#             point — it orchestrates all other
1801#             analysis helpers and merges their
1802#             results. The non-invasive reasoning
1803#             layer (Model::Method, Analyzer::*)
1804#             runs after the main schema is built
1805#             and attaches metadata only.
1806# --------------------------------------------------
1807sub _analyze_method {
1808
286
233
        my ($self, $method) = @_;
1809
286
249
        my $code = $method->{body};
1810
286
224
        my $pod = $method->{pod};
1811
1812        # Extract modern features
1813
286
369
        my $attributes = $self->_extract_subroutine_attributes($code);
1814
286
306
        my $postfix_derefs = $self->_analyze_postfix_dereferencing($code);
1815
286
313
        my $fields = $self->_extract_field_declarations($code);
1816
1817        # If this method came from a class, use those field declarations
1818
286
1
294
2
        if ($method->{fields} && keys %{$method->{fields}}) {
1819
1
2
                $fields = $method->{fields};
1820        }
1821
1822        my $schema = {
1823                function => $method->{name},
1824
286
761
                _confidence => {
1825                        'input' => {},
1826                        'output' => {}
1827                },
1828                input => {},
1829                output => {},
1830                setup => undef,
1831                transforms => {},
1832        };
1833
1834        # Analyze different sources
1835
286
382
        my $pod_params = $self->_analyze_pod($pod);
1836
286
350
        my $code_params = $self->_analyze_code($code, $method);
1837
1838        # Validate POD/code agreement if strict mode is enabled.
1839        # Skip when there is no POD at all — strict_pod checks accuracy of
1840        # existing documentation, not whether every method is documented.
1841
286
379
        if ($self->{strict_pod} && $pod) {
1842                my @validation_errors = $self->_validate_pod_code_agreement(
1843                        $pod_params,
1844                        $code_params,
1845                        $method->{name},
1846                        {
1847
10
28
                                ignore_self => 1,
1848                                allow_renames => 1,
1849                        }
1850                );
1851
1852
10
17
                if (@validation_errors) {
1853
9
18
                        my $error_msg = "POD/Code disagreement in method '$method->{name}':\n  " .
1854                                join("\n  ", @validation_errors);
1855
1856                        # Add to schema for reference even if we croak
1857
9
14
                        $schema->{_pod_validation_errors} = \@validation_errors;
1858
1859                        # Either croak immediately or log based on configuration
1860
9
14
                        if($self->{strict_pod} == 2) {       # 2 = fatal errors
1861
1
7
                                croak("[POD STRICT] $error_msg");
1862                        } else {        # 1 = warnings
1863
8
404
                                carp("[POD STRICT] $error_msg");
1864                                # Continue with analysis, but mark as problematic
1865
8
3745
                                $schema->{_pod_disagreement} = 1;
1866                        }
1867                }
1868
9
11
                $schema->{_strict_pod_level} = $self->{strict_pod};
1869        }
1870
1871
285
329
        my $validator_params = $self->_extract_validator_schema($code);
1872
1873
285
281
        if ($validator_params) {
1874
5
7
                $schema->{input} = $validator_params->{input};
1875
5
6
                $schema->{input_style} = 'hash';
1876
5
12
                $schema->{_confidence}{input} = { 'factors' => [ 'Determined from validator' ], 'level' => 'high' };
1877                $schema->{_analysis}{confidence_factors}{input} = [
1878
5
11
                        'Input schema extracted from validator'
1879                ];
1880        } else {
1881                # Merge field declarations into code_params before merging analyses
1882
280
352
                if (keys %$fields) {
1883
1
2
                        $self->_merge_field_declarations($code_params, $fields);
1884                }
1885
1886                # Merge analyses
1887
280
335
                $schema->{input} = $self->_merge_parameter_analyses(
1888                        $pod_params,
1889                        $code_params,
1890                );
1891        }
1892
1893# ----------------------------------------
1894# Legacy Output Analysis (unchanged)
1895# ----------------------------------------
1896
1897$schema->{output} = $self->_analyze_output(
1898    $method->{pod},
1899    $method->{body},
1900    $method->{name}
1901
285
450
);
1902
1903
1904        # Detect accessor methods
1905
285
432
        $self->_detect_accessor_methods($method, $schema);
1906
1907        # Detect if this is an instance method that needs object instantiation
1908        # Constructors never require object instantiation
1909
285
397
        my $needs_object = $self->_needs_object_instantiation($method->{name}, $method->{body}, $method);
1910
285
499
        if($method->{name} ne 'new' && $needs_object) {
1911
126
172
                $schema->{new} = $needs_object;
1912
126
181
                $self->_log("  NEW: Method requires object instantiation: $needs_object");
1913        }
1914
1915        # Calculate confidences
1916
285
321
        my $input_confidence = $schema->{_confidence}{'input'};
1917
285
262
        if(!ref($input_confidence)) {
1918
0
0
                $input_confidence = $schema->{_confidence}{'input'} = $self->_calculate_input_confidence($schema->{input});
1919        }
1920
285
398
        my $output_confidence = $schema->{_confidence}{'output'} = $self->_calculate_output_confidence($schema->{output});
1921
1922        # Add metadata
1923
285
367
        $schema->{_notes} = $self->_generate_notes($schema->{input});
1924
1925        # Add analytics
1926
285
550
        $schema->{_analysis} ||= {};
1927
285
282
        $schema->{_analysis}{input_confidence} = $input_confidence->{level};
1928
285
305
        $schema->{_analysis}{output_confidence} = $output_confidence->{level};
1929
285
490
        $schema->{_analysis}{confidence_factors} ||= {};
1930
285
514
        $schema->{_analysis}{confidence_factors}{input} ||= $input_confidence->{factors};
1931
285
495
        $schema->{_analysis}{confidence_factors}{output} ||= $output_confidence->{factors};
1932
1933
285
221
        foreach my $mode('input', 'output') {
1934
570
552
                $self->_set_defaults($schema, $mode);
1935        }
1936
1937        # Optionally store detailed per-parameter analysis
1938
285
339
        if ($input_confidence->{per_parameter}) {
1939
0
0
                $schema->{_analysis}{per_parameter_scores} = $input_confidence->{per_parameter};
1940        }
1941
1942        # Calculate overall confidence (for backward compatibility)
1943
285
214
        my $input_level = $input_confidence->{level};
1944
285
206
        my $output_level = $output_confidence->{level};
1945
1946
285
504
        my %level_rank = (
1947                none => 0,
1948                very_low => 1,
1949                low => 2,
1950                medium => 3,
1951                high => 4
1952        );
1953
1954        # Overall is the lower of input and output
1955
285
477
        $input_level //= 'none';
1956
285
296
        $output_level //= 'none';
1957
285
317
        my $overall = $level_rank{$input_level} < $level_rank{$output_level} ? $input_level : $output_level;
1958
1959
285
246
        $schema->{_analysis}{overall_confidence} = $overall;
1960
1961        # Analyze parameter relationships
1962
285
281
        my $relationships = $self->_analyze_relationships($method);
1963
285
285
292
314
        if ($relationships && @{$relationships}) {
1964
7
8
                $schema->{relationships} = $relationships;
1965
7
11
                $self->_log("  Found " . scalar(@$relationships) . " parameter relationships");
1966        }
1967
1968        # Store modern feature info in schema
1969
285
307
        $schema->{_attributes} = $attributes if keys %$attributes;
1970
285
289
        $schema->{_modern_features}{postfix_dereferencing} = $postfix_derefs if keys %$postfix_derefs;
1971
285
325
        $schema->{_fields} = $fields if keys %$fields;
1972
1973        # Store class info if this is a class method
1974
285
297
        if ($method->{class}) {
1975
1
1
                $schema->{_class} = $method->{class};
1976        }
1977
1978
285
377
        my $hints = $self->_extract_test_hints($method, $schema);
1979
285
375
        $self->_extract_pod_examples($pod, $hints);
1980
1981
285
230
        for my $k (qw(boundary_values invalid_inputs valid_inputs equivalence_classes)) {
1982
1140
633
                my %seen;
1983                $hints->{$k} = [
1984
44
78
                        grep { !$seen{ defined $_ ? $_ : '__undef__' }++ }
1985
1140
1140
625
1288
                        @{ $hints->{$k} }
1986                ];
1987        }
1988
1989        # --------------------------------------------------
1990        # YAML test hints: numeric boundaries
1991        # --------------------------------------------------
1992
285
310
        if ($self->_method_has_numeric_intent($schema)) {
1993
143
284
                $schema->{_yamltest_hints} ||= {};
1994
1995                # Do not override existing hints
1996
143
243
                $schema->{_yamltest_hints}{boundary_values} ||= [];
1997
1998
0
0
                my %seen = map { (defined $_ ? $_ : '__undef__') => 1 }
1999
143
143
108
152
                        @{ $schema->{_yamltest_hints}{boundary_values} };
2000
2001
143
143
107
139
                foreach my $v (@{ $self->_numeric_boundary_values }) {
2002
714
519
                        my $key = defined $v ? $v : '__undef__';
2003
714
714
684
596
                        push @{ $schema->{_yamltest_hints}{boundary_values} }, $v unless $seen{$key}++;
2004                }
2005
2006
143
162
                $self->_log('  HINTS: Added numeric boundary values');
2007        }
2008
2009
285
301
        if (keys %$hints) {
2010
285
401
                $schema->{_yamltest_hints} ||= {};
2011
285
267
                foreach my $k (keys %$hints) {
2012                        $schema->{_yamltest_hints}{$k} = $hints->{$k}
2013
1140
1177
                        unless exists $schema->{_yamltest_hints}{$k};
2014                }
2015        }
2016
2017
285
470
        if(($level_rank{$overall} < $level_rank{$LEVEL_MEDIUM}) &&
2018           ($level_rank{$overall} < ($self->{confidence_threshold} * 4))) {
2019
238
964
                $schema->{_low_confidence} = 1
2020        }
2021
2022        # ----------------------------------------
2023        # Non-invasive reasoning layer
2024        # ----------------------------------------
2025
2026        my $method_model = App::Test::Generator::Model::Method->new(
2027                name => $method->{name},
2028                source => $method->{body},
2029
285
1017
        );
2030
2031
285
569
        my $return_analyzer = App::Test::Generator::Analyzer::Return->new();
2032
285
383
        $return_analyzer->analyze($method_model);
2033
2034        # Let model learn from finalized schema
2035
285
293
        if ($schema->{output}) {
2036
285
390
                $method_model->absorb_legacy_output($schema->{output});
2037        }
2038
2039
285
397
        $method_model->resolve_return_type();
2040
285
355
        $method_model->resolve_classification();
2041
285
330
        $method_model->resolve_confidence();
2042
2043        # Attach only metadata
2044        $schema->{_model} = {
2045
285
314
                classification => $method_model->classification,
2046                confidence => $method_model->confidence,
2047        };
2048
2049        # ----------------------------------------
2050        # Return Meta Analysis (Non-invasive)
2051        # ----------------------------------------
2052
2053
285
608
        my $meta = App::Test::Generator::Analyzer::ReturnMeta->new();
2054
285
342
        my $analysis = $meta->analyze($schema);
2055
2056
285
267
        $schema->{_analysis}{stability_score} = $analysis->{stability_score};
2057
285
277
        $schema->{_analysis}{consistency_score} = $analysis->{consistency_score};
2058
285
245
        $schema->{_analysis}{risk_flags} = $analysis->{risk_flags};
2059
2060        # ----------------------------------------
2061        # Side Effect Analysis (Non-invasive)
2062        # ----------------------------------------
2063
2064
285
514
        my $se = App::Test::Generator::Analyzer::SideEffect->new();
2065
2066
285
304
        my $effects = $se->analyze($method);
2067
2068
285
272
        $schema->{_analysis}{side_effects} = $effects;
2069
2070        # ----------------------------------------
2071        # Complexity Analysis (Non-invasive)
2072        # ----------------------------------------
2073
2074
285
480
        my $cx = App::Test::Generator::Analyzer::Complexity->new();
2075
285
302
        my $complexity = $cx->analyze($method);
2076
2077
285
293
        $schema->{_analysis}{complexity} = $complexity;
2078
2079
285
1651
        return $schema;
2080}
2081
2082# --------------------------------------------------
2083# _method_has_numeric_intent
2084#
2085# Purpose:    Determine whether a method schema
2086#             has numeric intent — either a numeric
2087#             output type or at least one required
2088#             numeric input parameter — to decide
2089#             whether to add standard numeric
2090#             boundary hint values.
2091#
2092# Entry:      $schema - schema hashref as built by
2093#                       _analyze_method.
2094#
2095# Exit:       Returns 1 if numeric intent is
2096#             detected, 0 otherwise.
2097#
2098# Side effects: None.
2099# --------------------------------------------------
2100sub _method_has_numeric_intent {
2101
289
259
        my ($self, $schema) = @_;
2102
2103        # Numeric output
2104
289
875
        return 1 if ($schema->{output} && $schema->{output}{type} && $schema->{output}{type} =~ /^(number|integer)$/);
2105
2106        # Numeric inputs
2107
158
158
126
267
        foreach my $p (values %{ $schema->{input} || {} }) {
2108
133
137
                next if $p->{optional};
2109
49
140
                return 1 if ($p->{type} && $p->{type} =~ /^(number|integer)$/);
2110        }
2111
2112
143
158
        return 0;
2113}
2114
2115# --------------------------------------------------
2116# _numeric_boundary_values
2117#
2118# Purpose:    Return the standard set of numeric
2119#             boundary values used as test hints
2120#             for methods with numeric intent.
2121#
2122# Entry:      None.
2123#
2124# Exit:       Returns an arrayref of boundary
2125#             values: [-1, 0, 1, 2, 100].
2126#
2127# Side effects: None.
2128# --------------------------------------------------
2129sub _numeric_boundary_values {
2130
143
187
        return [ -1, 0, 1, 2, 100 ];
2131}
2132
2133# --------------------------------------------------
2134# _detect_accessor_methods
2135#
2136# Purpose:    Detect whether a method is a getter,
2137#             setter, or combined getter/setter
2138#             accessor by analysing assignment and
2139#             return patterns involving $self->{...}.
2140#
2141# Entry:      $method - method hashref containing
2142#                       at minimum 'body' and
2143#                       optionally 'pod'.
2144#             $schema - schema hashref (modified
2145#                       in place).
2146#
2147# Exit:       Returns nothing. Modifies $schema in
2148#             place, setting accessor, input,
2149#             input_style, output, and _confidence
2150#             keys as appropriate.
2151#
2152# Side effects: Croaks if a getter/setter has more
2153#               than one argument, or if a setter
2154#               returns non-self data.
2155#               Logs detections to stdout when
2156#               verbose is set.
2157#
2158# Notes:      Four accessor patterns are detected
2159#             in order: (1) combined getter/setter
2160#             with shift, (2) combined getter/setter
2161#             with validated input, (3) getter only,
2162#             (4) setter that returns $self. Methods
2163#             accessing multiple $self fields are
2164#             skipped immediately.
2165# --------------------------------------------------
2166sub _detect_accessor_methods {
2167
288
287
        my ($self, $method, $schema) = @_;
2168
2169
288
243
        my $body = $method->{body};
2170
2171        # Normalize whitespace for regex sanity
2172
288
192
        my $code = $body;
2173
288
1317
        $code =~ s/\s+/ /g;
2174
2175        # If a method touches more than one $self->{...}, it’s not an accessor.
2176
288
205
        my %fields_seen;
2177
288
415
        while ($code =~ /\$self\s*->\s*\{\s*['"]?([^}'"]+)['"]?\s*\}/g) {
2178
73
113
                $fields_seen{$1}++;
2179        }
2180
288
275
        if (keys(%fields_seen) > 1) {
2181
3
4
                $self->_log("  Skipping accessor detection: multiple fields accessed");
2182
3
4
                return;
2183        }
2184
2185        # -------------------------------
2186        # Getter/Setter combo
2187        # -------------------------------
2188
285
1141
        if (
2189                # Require get/set of the same property
2190                $code =~ /\$self\s*->\s*\{\s*['"]?([^}'"]+)['"]?\s*\}\s*=\s*shift\s*;/ &&
2191                $code =~ /return\s+\$self\s*->\s*\{\s*['"]?\Q$1\E['"]?\s*\}\s*;/ &&
2192                $code =~ /if\s*\(\s*\@_/
2193        ) {
2194
0
0
                my $property = $1;
2195
2196
0
0
                if(!defined($property)) {
2197
0
0
                        if($code =~ /\$self\s*->\s*\{\s*['"]?([^}'"]+)['"]?\s*\}\s*=\s*shift\s*;/) {
2198
0
0
                                $property = $1;
2199                        }
2200                }
2201
2202                $schema->{accessor} = {
2203
0
0
                        type => 'getset',
2204                        property => $property,
2205                };
2206
2207
0
0
                $self->_log("  Detected getter/setter accessor for property: $property");
2208
2209
0
0
                $schema->{input} ||= { value => { type => 'string', optional => 1 } };
2210
2211
0
0
                $schema->{input_style} = 'hash';
2212
2213                $schema->{_confidence}{input} = {
2214
0
0
                        level => 'high',
2215                        factors => ['Detected combined getter/setter accessor'],
2216                };
2217
0
0
                if (my $pod = $method->{pod}) {
2218
0
0
                        if ($pod =~ /\b(LWP::UserAgent(::\w+)*)\b/) {
2219
0
0
                                my $class = $1;
2220                                $schema->{output} = {
2221
0
0
                                        type => 'object',
2222                                        isa => $class,
2223                                };
2224
0
0
                                $schema->{input}{$property} = {
2225                                        type => 'object',
2226                                        isa => $class,
2227                                        optional => 1,
2228                                };
2229
2230                                $schema->{_confidence}{output} = {
2231
0
0
                                        level => 'high',
2232                                        factors => ['POD specifies UserAgent object'],
2233                                };
2234                        }
2235                }
2236        } elsif($code =~ /if\s*\(\s*(?:\@_|[\$]\w+)/ &&
2237            $code =~ /\$self\s*->\s*\{\s*['"]?([^}'"]+)['"]?\s*\}\s*=\s*(?:shift|\@_|\$_\[\d+\]|\$\w+)\b/x &&
2238            $code =~ /return\b/
2239        ) {
2240                # -------------------------------
2241                # Getter/Setter (validated input)
2242                # -------------------------------
2243
4
4
                my $property = $1;
2244
2245
4
7
                if(!defined($property)) {
2246
4
9
                        if($code =~ /\$self\s*->\s*\{\s*['"]?([^}'"]+)['"]?\s*\}\s*=/) {
2247
4
4
                                $property = $1;
2248                        }
2249                }
2250
4
8
                if ($code =~ /validate_strict/) {
2251
2
2
2
3
                        push @{ $schema->{_confidence}{input}{factors} }, 'Setter uses Params::Validate::Strict';
2252                } else {
2253                        # ---------------------------------------
2254                        # Detect object input via blessed($arg)
2255                        # ---------------------------------------
2256
2
3
                        if ($code =~ /blessed\s*\(\s*\$(\w+)\s*\)/) {
2257
0
0
                                my $param = $1;
2258
2259
0
0
                                $self->_log("  Detected object input via blessed(\$$param)");
2260
2261                                $schema->{input} = {
2262
0
0
                                        $param => {
2263                                                type => 'object',
2264                                                optional => 1,
2265                                        }
2266                                };
2267
2268                                $schema->{_confidence}{input} = {
2269
0
0
                                        level   => 'high',
2270                                        factors => ['Input validated by Scalar::Util::blessed'],
2271                                };
2272                        } else {
2273                                # fallback ONLY if nothing known
2274                                $schema->{input} ||= {
2275
2
2
                                        value => { type => 'string', optional => 1 },
2276                                };
2277                        }
2278                };
2279                $schema->{accessor} = {
2280
4
9
                        type => 'getset',
2281                        property => $property,
2282                };
2283
2284
4
6
                $self->_log("  Detected getter/setter accessor for property: $property");
2285
4
5
                if (my $pod = $method->{pod}) {
2286
1
2
                        if ($pod =~ /\b(LWP::UserAgent(::\w+)*)\b/) {
2287
0
0
                                my $class = $1;
2288                                $schema->{output} = {
2289
0
0
                                        type => 'object',
2290                                        isa => $class,
2291                                };
2292
0
0
                                $schema->{input}{$property} = {
2293                                        type => 'object',
2294                                        isa => $class,
2295                                        optional => 1,
2296                                };
2297
2298                                $schema->{_confidence}{output} = {
2299
0
0
                                        level => 'high',
2300                                        factors => ['POD specifies UserAgent object'],
2301                                };
2302                        }
2303                }
2304
4
7
                if(ref($schema->{input}) eq 'HASH') {
2305
4
4
3
6
                        if(scalar keys(%{$schema->{input}}) > 1) {
2306
0
0
                                croak(__PACKAGE__, ': A getset accessor function can have at most one argument');
2307                        }
2308                }
2309
4
5
                $schema->{input}->{$property}->{position} = 0;
2310        } elsif ($code =~ /return\s+\$self\s*->\s*\{\s*['"]?([^}'"]+)['"]?\s*\}\s*;/) {
2311                # -------------------------------
2312                # Getter
2313                # -------------------------------
2314
20
27
                my $property = $1;
2315
2316                # Don't flag mutators like
2317                # sub foo {
2318                    # my $self = shift;
2319                    # $self->{bar} = shift;
2320                    # return $self->{bar};
2321                # }
2322                # Only exclude if the property is being set FROM EXTERNAL INPUT
2323
20
869
                if($code !~ /\$self\s*->\s*\{\s*['"]?\Q$property\E['"]?\s*\}\s*=\s*(?:shift|\$\w+\s*=\s*shift|\@_|\$_\[\d+\])/) {
2324
19
45
                        my @returns = $code =~ /return\b/g;
2325
19
321
                        my @self_returns = $code =~ /return\s+\$self\s*->\s*\{\s*['"]?\Q$property\E['"]?\s*\}/g;
2326                        # it's a getter
2327
19
28
                        if (scalar(@returns) == scalar(@self_returns)) {
2328                                # all returns are returning $self->{$property}, so it's a getter
2329                                $schema->{accessor} = {
2330
17
46
                                        type => 'getter',
2331                                        property => $property,
2332                                };
2333
2334
17
32
                                $self->_log("  Detected getter accessor for property: $property");
2335
2336                                $schema->{_confidence}{output} = {
2337
17
57
                                        level => 'high',
2338                                        factors => ['Detected getter method'],
2339                                };
2340
17
35
                                delete $schema->{input};
2341                        }
2342                }
2343        } elsif (
2344                $code =~ /return\s+\$self\b/ &&
2345                $code =~ /\$self\s*->\s*\{\s*['"]?([^}'"]+)['"]?\s*\}\s*=\s*\$(\w+)\s*;/
2346        ) {
2347                # -------------------------------
2348                # Setter
2349                # -------------------------------
2350
5
9
                my ($property, $param) = ($1, $2);
2351
2352                $schema->{accessor} = {
2353
5
14
                        type => 'setter',
2354                        property => $property,
2355                        param => $param,
2356                };
2357
2358
5
11
                $self->_log("  Detected setter accessor for property: $property");
2359
2360                $schema->{input} = {
2361
5
12
                        $param => { type => 'string' }, # safe default
2362                };
2363
5
8
                $schema->{input_style} = 'hash';
2364
2365                $schema->{_confidence}{input} = {
2366
5
9
                        level => 'high',
2367                        factors => ['Detected setter/accessor method'],
2368                };
2369
5
8
                if($schema->{output}{_returns_self}) {
2370
4
7
                        if($schema->{output}{type} ne 'object') {
2371
0
0
                                croak 'Setter can not return data other than $self';
2372                        }
2373
4
7
                        if($schema->{output}{isa} ne $self->{_package_name}) {
2374
0
0
                                croak 'Setter can not return data other than $self';
2375                        }
2376
1
2
                } elsif(scalar(keys %{$schema->{output}}) != 0) {
2377                        $self->_analysis_error(
2378                                method  => $method->{name},
2379
0
0
                                message => "Setter cannot return data",
2380                        );
2381                }
2382        }
2383
2384
285
383
        if(exists($schema->{accessor})) {
2385
26
112
                if($schema->{accessor}{type} && $schema->{accessor}{type} =~ /setter|getset/ && $schema->{input}) {
2386
9
9
9
12
                        for my $param (keys %{ $schema->{input} }) {
2387
10
11
                                my $in = $schema->{input}{$param};
2388
2389
10
19
                                if ($in->{type} && ($in->{type} eq 'object')) {
2390                                        $schema->{output} = {
2391                                                type => 'object',
2392
2
4
                                                ($in->{isa} ? (isa => $in->{isa}) : ()),
2393                                        };
2394
2395                                        $schema->{_confidence}{output} = {
2396
2
5
                                                level => 'high',
2397                                                factors => ['Output type propagated from setter input'],
2398                                        };
2399                                }
2400                        }
2401                }
2402
2403
26
171
                if($schema->{accessor}{type} && $schema->{accessor}{property} && ($schema->{accessor}{type} =~ /getter|getset/) &&
2404                   ((!defined($schema->{output}{type})) || ($schema->{output}{type} eq 'string'))) {
2405
18
30
                        if (my $pod = $method->{pod}) {
2406                                # POD says "UserAgent object"
2407
9
26
                                if ($pod =~ /\bUser[- ]?Agent\b.*\bobject\b/i) {
2408
1
1
                                        $schema->{output}{type} = 'object';
2409
1
2
                                        $schema->{output}{isa} = 'LWP::UserAgent';
2410
2411
1
1
0
2
                                        push @{ $schema->{_confidence}{output}{factors} }, 'POD indicates UserAgent object';
2412
2413
1
2
                                        $schema->{_confidence}{output}{level} = 'high';
2414                                }
2415                        }
2416                }
2417        }
2418}
2419
2420# --------------------------------------------------
2421# _analysis_error
2422#
2423# Purpose:    Report a fatal analysis error with
2424#             module, method, and file context,
2425#             then croak.
2426#
2427# Entry:      Named args:
2428#               method  - method name string.
2429#               message - error description string.
2430#
2431# Exit:       Does not return — always croaks.
2432#
2433# Side effects: None beyond the croak.
2434# --------------------------------------------------
2435sub _analysis_error {
2436
2
4778
        my ($self, %args) = @_;
2437
2438
2
9
        my $method = $args{method} // 'UNKNOWN';
2439
2
7
        my $msg = $args{message} // 'Analysis error';
2440
2441
2
35
        my $module = $self->{_package_name} // 'UNKNOWN';
2442
2
4
        my $file   = $self->{input_file} // 'UNKNOWN';
2443
2444
2
22
        croak join "\n",
2445                $msg,
2446                "  Module: $module",
2447                "  Method: $method",
2448                "  File:   $file",
2449        '';
2450}
2451
2452# --------------------------------------------------
2453# _extract_validator_schema
2454#
2455# Purpose:    Try each supported validator extractor
2456#             in priority order and return the first
2457#             schema that yields a non-empty input
2458#             spec. Used to detect explicit
2459#             parameter validation declarations
2460#             before falling back to heuristic
2461#             code analysis.
2462#
2463# Entry:      $code - method body source string.
2464#
2465# Exit:       Returns a schema hashref on success,
2466#             or undef if no supported validator
2467#             call is detected.
2468#
2469# Side effects: None.
2470#
2471# Notes:      Extractors tried in order:
2472#             Params::Validate::Strict,
2473#             Params::Validate,
2474#             MooseX::Params::Validate,
2475#             Type::Params.
2476# --------------------------------------------------
2477sub _extract_validator_schema {
2478
290
263
        my ($self, $code) = @_;
2479
2480
290
271
        for my $extractor ('_extract_pvs_schema', '_extract_pv_schema', '_extract_moosex_params_schema', '_extract_type_params_schema') {
2481
1145
1359
                my $res = $self->$extractor($code);
2482
1145
7
3215
30
                return $res if ($res && ref($res) eq 'HASH' && keys %{ $res->{input} || {} });
2483        }
2484
2485
283
255
        return;
2486}
2487
2488# --------------------------------------------------
2489# _parse_schema_hash
2490#
2491# Purpose:    Parse a PPI block node representing
2492#             a validator schema hash literal and
2493#             return a normalised schema structure
2494#             suitable for use as input spec.
2495#
2496# Entry:      $hash - a PPI node with a children()
2497#                     method, typically a
2498#                     PPI::Structure::Block from
2499#                     a validate_strict call.
2500#
2501# Exit:       Returns a hashref with keys:
2502#               input       - hashref of param specs
2503#               input_style - 'hash'
2504#               _confidence - confidence hashref
2505#             or undef if parsing fails.
2506#
2507# Side effects: None.
2508# --------------------------------------------------
2509sub _parse_schema_hash {
2510
1
908
        my ($self, $hash) = @_;
2511
2512
1
1
        my %result;
2513
2514
1
4
        for my $child ($hash->children) {
2515                # skip whitespace and operators
2516
1
9
                if ($child->isa('PPI::Statement') || $child->isa('PPI::Statement::Expression')) {
2517
0
0
                        my ($key, $val);
2518
2519                        my @tokens = grep {
2520
0
0
0
0
                                !$_->isa('PPI::Token::Whitespace') &&
2521                                !$_->isa('PPI::Token::Operator')
2522                        } $child->children;
2523
2524
0
0
                        for (my $i = 0; $i < @tokens - 1; $i++) {
2525
0
0
                                if(($tokens[$i]->isa('PPI::Token::Word') || $tokens[$i]->isa('PPI::Token::Quote')) &&
2526                                   $tokens[$i+1]->isa('PPI::Structure::Constructor')) {
2527
0
0
                                        $key = $tokens[$i]->content;
2528
0
0
                                        $key =~ s/^['"]|['"]$//g;
2529
0
0
                                        $val = $tokens[$i+1];
2530
0
0
                                        last;
2531                                }
2532                        }
2533
2534
0
0
                        next unless $key && $val;
2535
2536
0
0
                        my %param;
2537
0
0
                        for my $inner ($val->children) {
2538
0
0
                                next unless $inner->isa('PPI::Statement') || $inner->isa('PPI::Statement::Expression');
2539
2540                                my ($k, undef, $v) = grep {
2541
0
0
0
0
                                        !$_->isa('PPI::Token::Whitespace') &&
2542                                        !$_->isa('PPI::Token::Operator')
2543                                } $inner->children;
2544
2545
0
0
                                next unless $k && $v;
2546
2547
0
0
                                my $keyname = $k->content;
2548
0
0
                                my $value = $v->can('content') ? $v->content : undef;
2549
0
0
                                $value =~ s/^['"]|['"]$//g if defined $value;
2550
2551
0
0
                                if ($keyname eq 'type') {
2552
0
0
                                        $param{type} = lc($value);
2553                                } elsif ($keyname eq 'optional') {
2554
0
0
                                        $param{optional} = $value ? 1 : 0;
2555                                } elsif ($keyname =~ /^(min|max)$/ && looks_like_number($value)) {
2556
0
0
                                        $param{$keyname} = 0 + $value;
2557                                } elsif ($keyname eq 'matches') {
2558
0
0
                                        $param{matches} = qr/$value/;
2559                                }
2560                        }
2561
2562
0
0
                        $param{type} //= 'string';
2563
0
0
                        $param{optional} //= 0;
2564
2565
0
0
                        $result{$key} = \%param;
2566                }
2567        }
2568
2569        return {
2570
1
5
                input => \%result,
2571                input_style => 'hash',
2572                _confidence => {
2573                        input => {
2574                                level => 'high',
2575                                factors => ['Input schema extracted from validator'],
2576                        },
2577                },
2578        };
2579}
2580
2581# --------------------------------------------------
2582# _ppi
2583#
2584# Purpose:    Return a PPI::Document for a code
2585#             string, using a per-instance cache
2586#             to avoid re-parsing the same string
2587#             multiple times during a single
2588#             analysis pass.
2589#
2590# Entry:      $code - either a string of Perl source
2591#                     code, or an object that
2592#                     already has a find() method
2593#                     (returned as-is).
2594#
2595# Exit:       Returns a PPI::Document, or the
2596#             original object if it already
2597#             supports find().
2598#
2599# Side effects: Populates $self->{_ppi_cache}.
2600# --------------------------------------------------
2601sub _ppi {
2602
29
7236
        my ($self, $code) = @_;
2603
2604
29
55
        return $code if ref($code) && $code->can('find');
2605
2606
28
73
        $self->{_ppi_cache} ||= {};
2607
28
114
        return $self->{_ppi_cache}{$code} //= PPI::Document->new(\$code);
2608}
2609
2610# --------------------------------------------------
2611# _extract_pvs_schema
2612#
2613# Purpose:    Detect and extract a parameter schema
2614#             from a Params::Validate::Strict
2615#             validate_strict() call in the method
2616#             body.
2617#
2618# Entry:      $code - method body source string.
2619#
2620# Exit:       Returns a schema hashref with input,
2621#             style, and source keys on success,
2622#             or undef if no validate_strict call
2623#             is found or parsing fails.
2624#
2625# Side effects: None.
2626# --------------------------------------------------
2627sub _extract_pvs_schema {
2628
298
274
        my ($self, $code) = @_;
2629
2630
298
418
        return unless $code =~ /\bvalidate_strict\s*\(/;
2631
2632
9
13
        my $doc = $self->_ppi($code) or return;
2633
2634        my $calls = $doc->find(sub {
2635
954
4396
                $_[1]->isa('PPI::Token::Word') && ($_[1]->content eq 'validate_strict' || $_[1]->content eq 'Params::Validate::Strict::validate_strict')
2636
9
44369
        }) or return;
2637
2638
9
62
        for my $call (@$calls) {
2639
9
46
                my $list = $call->parent();
2640
9
174
                while ($list && !$list->isa('PPI::Structure::List')) {
2641
40
210
                        $list = $list->parent();
2642                }
2643
9
22
                if(!defined($list)) {
2644
9
17
                        my $next = $call->next_sibling();
2645
9
193
                        next unless defined $next;
2646
9
13
                        if($next->content() =~ /schema\s*=>\s*(\{(?:[^{}]|\{(?:[^{}]|\{[^{}]*\})*\})*\})/s) {
2647
7
915
                                my $schema_text = $1;
2648
7
25
                                my $compartment = Safe->new();
2649
7
2972
                                $compartment->permit_only(qw(:base_core :base_mem :base_orig));
2650
2651
7
30
                                my $schema_str = "my \$schema = $schema_text";
2652
7
10
                                my $schema = $compartment->reval($schema_str);
2653
7
7
2055
11
                                if(scalar keys %{$schema}) {
2654                                        return {
2655
7
22
                                                input => $schema,
2656                                                style => 'hash',
2657                                                source => 'validator'
2658                                        }
2659                                }
2660                        }
2661                }
2662
2
103
                next unless $list;
2663
2664
0
0
0
0
                my ($schema_block) = grep { $_->isa('PPI::Structure::Block') } $list->children;
2665
2666
0
0
                next unless $schema_block;
2667
2668
0
0
                my $schema = $self->_extract_schema_hash_from_block($schema_block);
2669
0
0
                return $self->_normalize_validator_schema($schema) if $schema;
2670        }
2671
2672
2
16
        return;
2673}
2674
2675# --------------------------------------------------
2676# _extract_pv_schema
2677#
2678# Purpose:    Detect and extract a parameter schema
2679#             from a Params::Validate validate()
2680#             call in the method body.
2681#
2682# Entry:      $code - method body source string.
2683#
2684# Exit:       Returns a schema hashref with input,
2685#             style, and source keys on success,
2686#             or undef if no validate() call is
2687#             found or parsing fails.
2688#
2689# Side effects: None.
2690# --------------------------------------------------
2691sub _extract_pv_schema {
2692
296
268
        my ($self, $code) = @_;
2693
2694
296
389
        return unless $code =~ /\bvalidate\s*\(/;
2695
2696
8
12
        my $doc = $self->_ppi($code) or return;
2697
2698        my $calls = $doc->find(sub {
2699
689
3266
                $_[1]->isa('PPI::Token::Word') && ($_[1]->content eq 'validate' || $_[1]->content eq 'Params::Validate::validate')
2700
8
32135
        }) or return;
2701
2702
8
54
        for my $call (@$calls) {
2703
8
15
                my $list = $call->parent;
2704
8
129
                while ($list && !$list->isa('PPI::Structure::List')) {
2705
30
147
                        $list = $list->parent;
2706                }
2707
8
18
                if(!defined($list)) {
2708
8
14
                        my $next = $call->next_sibling();
2709
8
184
                        my ($arglist, $schema_text) = $self->_parse_pv_call($next);
2710
2711
8
11
                        if($schema_text) {
2712
8
19
                                my $compartment = Safe->new();
2713
8
3190
                                $compartment->permit_only(qw(:base_core :base_mem :base_orig));
2714
2715
8
29
                                my $schema_str = "my \$schema = $schema_text";
2716
8
11
                                my $schema = $compartment->reval($schema_str);
2717
2718
8
8
2577
34
                                if(scalar keys %{$schema}) {
2719
7
7
7
8
                                        foreach my $arg(keys %{$schema}) {
2720
12
9
                                                my $field = $schema->{$arg};
2721
12
14
                                                if(my $type = $field->{'type'}) {
2722
12
17
                                                        if($type eq 'ARRAYREF') {
2723
1
1
                                                                $field->{'type'} = 'arrayref';
2724                                                        } elsif($type eq 'SCALAR') {
2725
11
10
                                                                $field->{'type'} = 'string';
2726                                                        }
2727                                                }
2728
12
28
                                                delete $field->{'callbacks'};
2729                                        }
2730
2731                                        return {
2732
7
21
                                                input => $schema,
2733                                                style => 'hash',
2734                                                source => 'validator'
2735                                        }
2736                                }
2737                        }
2738                }
2739
1
45
                next unless $list;
2740
2741
0
0
0
0
                my ($schema_block) = grep { $_->isa('PPI::Structure::Block') } $list->children;
2742
2743
0
0
                next unless $schema_block;
2744
2745
0
0
                my $schema = $self->_extract_schema_hash_from_block($schema_block);
2746
0
0
                return $self->_normalize_validator_schema($schema) if $schema;
2747        }
2748
2749
1
2
        return;
2750}
2751
2752# --------------------------------------------------
2753# _parse_pv_call
2754#
2755# Purpose:    Split a Params::Validate call argument
2756#             string into its two components: the
2757#             first argument (typically \@_) and
2758#             the schema hash string.
2759#
2760# Entry:      $string - the raw argument string
2761#                       from the validate() call,
2762#                       including outer parentheses.
2763#
2764# Exit:       Returns a two-element list:
2765#               ($first_arg, $hash_str)
2766#             or an empty list if no comma is found
2767#             at brace depth zero (malformed call).
2768#
2769# Side effects: None.
2770# --------------------------------------------------
2771sub _parse_pv_call {
2772
21
29
        my ($self, $string) = @_;
2773
2774        # Remove outer parentheses and whitespace
2775
21
30
        $string =~ s/^\s*\(\s*//;
2776
21
2697
        $string =~ s/\s*\)\s*$//;
2777
2778        # Find the first comma at brace-depth 0, jumping over each balanced
2779        # {...} block in one step via extract_bracketed rather than
2780        # counting depth character by character
2781
21
686
        require Text::Balanced;
2782
21
8187
        my $rest = $string;
2783
21
33
        my $comma_pos = 0;
2784
21
16
        my $found_comma = 0;
2785
2786
21
27
        while (length $rest) {
2787
93
81
                if (substr($rest, 0, 1) eq '{') {
2788                        # extract_bracketed advances $rest past the extracted block
2789                        # in place, so $rest must not be re-truncated afterwards
2790
1
3
                        my $extracted = Text::Balanced::extract_bracketed($rest, '{}');
2791
1
166
                        return unless defined $extracted;       # Broken source code
2792
1
1
                        $comma_pos += length $extracted;
2793
1
2
                        next;
2794                }
2795
92
80
                if (substr($rest, 0, 1) eq ',') {
2796
20
13
                        $found_comma = 1;
2797
20
17
                        last;
2798                }
2799
72
44
                $comma_pos++;
2800
72
62
                $rest = substr($rest, 1);
2801        }
2802
2803
21
21
        return unless $found_comma;
2804
2805
20
16
        my $first_arg = substr($string, 0, $comma_pos);
2806
20
23
        my $hash_str = substr($string, $comma_pos + 1);
2807
2808        # Trim whitespace
2809
20
35
        $first_arg =~ s/^\s+|\s+$//g;
2810
20
137
        $hash_str =~ s/^\s+|\s+$//g;
2811
2812
20
29
        return ($first_arg, $hash_str);
2813}
2814
2815# --------------------------------------------------
2816# _extract_moosex_params_schema
2817#
2818# Purpose:    Detect and extract a parameter schema
2819#             from a MooseX::Params::Validate
2820#             validated_hash() call in the method
2821#             body.
2822#
2823# Entry:      $code - method body source string.
2824#
2825# Exit:       Returns a schema hashref with input,
2826#             style, and source keys on success,
2827#             or undef if no validated_hash() call
2828#             is found or parsing fails.
2829#
2830# Side effects: None.
2831# --------------------------------------------------
2832sub _extract_moosex_params_schema
2833{
2834
295
279
        my ($self, $code) = @_;
2835
2836
295
340
        return unless $code =~ /\bvalidated_hash\s*\(/;
2837
2838
9
13
        my $doc = $self->_ppi($code) or return;
2839
2840        my $calls = $doc->find(sub {
2841
742
3336
                $_[1]->isa('PPI::Token::Word') && ($_[1]->content eq 'validated_hash')
2842
9
32942
        }) or return;
2843
2844
9
55
        for my $call (@$calls) {
2845
9
18
                my $list = $call->parent();
2846
9
212
                while ($list && !$list->isa('PPI::Structure::List')) {
2847
36
241
                        $list = $list->parent;
2848                }
2849
9
34
                if(!defined($list)) {
2850
9
16
                        my $next = $call->next_sibling();
2851
9
273
                        my ($arglist, $schema_text) = $self->_parse_pv_call($next);
2852
2853
9
11
                        if($schema_text) {
2854
9
20
                                my $compartment = Safe->new();
2855
9
3600
                                $compartment->permit_only(qw(:base_core :base_mem :base_orig));
2856
2857
9
32
                                my $schema_str = "my \$schema = { $schema_text }";
2858
9
22
                                $schema_str =~ s/ArrayRef\[(.+?)\]/arrayref, element_type => $1/g;
2859
9
11
                                my $schema = $compartment->reval($schema_str);
2860
2861
9
9
2507
11
                                if(scalar keys %{$schema}) {
2862
9
9
8
10
                                        foreach my $arg(keys %{$schema}) {
2863
14
13
                                                my $field = $schema->{$arg};
2864
14
13
                                                if(my $isa = delete $field->{'isa'}) {
2865
14
14
                                                        $field->{'type'} = $isa;
2866                                                }
2867
14
15
                                                if(exists($field->{'required'})) {
2868
11
10
                                                        my $required = delete $field->{'required'};
2869
11
12
                                                        $field->{'optional'} = $required ? 0 : 1;
2870                                                } else {
2871
3
2
                                                        $field->{'optional'} = 1;
2872                                                }
2873
14
23
                                                if(ref($field->{'default'}) eq 'CODE') {
2874
2
23
                                                        delete $field->{'default'};  # TODO
2875                                                }
2876                                        }
2877
2878
9
9
7
9
                                        foreach my $arg(keys %{$schema}) {
2879
14
9
                                                my $field = $schema->{$arg};
2880
14
14
                                                if(my $type = $field->{'type'}) {
2881
14
17
                                                        if($type eq 'ARRAYREF') {
2882
0
0
                                                                $field->{'type'} = 'arrayref';
2883                                                        } elsif($type eq 'SCALAR') {
2884
0
0
                                                                $field->{'type'} = 'string';
2885                                                        }
2886                                                }
2887
14
13
                                                delete $field->{'callbacks'};
2888                                        }
2889
2890                                        return {
2891
9
24
                                                input => $schema,
2892                                                style => 'hash',
2893                                                source => 'validator'
2894                                        }
2895                                }
2896                        }
2897                }
2898
0
0
                next unless $list;
2899
2900
0
0
0
0
                my ($schema_block) = grep { $_->isa('PPI::Structure::Block') } $list->children;
2901
2902
0
0
                next unless $schema_block;
2903
2904
0
0
                my $schema = $self->_extract_schema_hash_from_block($schema_block);
2905
0
0
                return $self->_normalize_validator_schema($schema) if $schema;
2906        }
2907
2908
0
0
        return;
2909}
2910
2911# --------------------------------------------------
2912# _extract_schema_hash_from_block
2913#
2914# Purpose:    Extract a parameter schema hashref from
2915#             a PPI::Structure::Block node representing
2916#             the schema argument to a validator call
2917#             such as validate_strict({ ... }).
2918#
2919# Entry:      $block - a PPI::Structure::Block node.
2920#
2921# Exit:       Returns a hashref of parameter name to
2922#             spec hashref, or undef if parsing fails.
2923#
2924# Side effects: None.
2925#
2926# Notes:      Delegates to _parse_schema_hash which
2927#             expects a PPI node with a children()
2928#             method. This method exists to provide
2929#             a clear semantic name at the call site.
2930# --------------------------------------------------
2931sub _extract_schema_hash_from_block {
2932
2
8
        my ($self, $block) = @_;
2933
2934
2
8
        return unless $block && $block->can('children');
2935
2936
0
0
        my $result = $self->_parse_schema_hash($block);
2937
2938
0
0
        return unless $result && ref($result) eq 'HASH' && $result->{input};
2939
2940
0
0
        return $result->{input};
2941}
2942
2943# --------------------------------------------------
2944# _normalize_validator_schema
2945#
2946# Purpose:    Normalise a raw validator schema
2947#             hashref (as extracted from PPI) into
2948#             the standard input spec format used
2949#             throughout the extractor.
2950#
2951# Entry:      $schema - hashref of parameter name
2952#                       to raw spec hashref, as
2953#                       produced by
2954#                       _extract_schema_hash_from_block.
2955#
2956# Exit:       Returns a hashref with keys:
2957#               input_style - 'hash'
2958#               input       - normalised param specs
2959#             Each param spec gains an explicit
2960#             optional key and _source / _type_confidence
2961#             metadata.
2962#
2963# Side effects: None.
2964# --------------------------------------------------
2965sub _normalize_validator_schema {
2966
5
19
        my ($self, $schema) = @_;
2967
2968
5
5
        my %input;
2969
2970
5
6
        for my $name (keys %$schema) {
2971
8
6
                my $spec = $schema->{$name};
2972
2973                $input{$name} = {
2974                        %$spec,
2975
8
22
                        optional => exists $spec->{optional} ? $spec->{optional} : 0,
2976                        _source => 'validator',
2977                        _type_confidence => 'high',
2978                };
2979        }
2980
2981        return {
2982
5
9
                input_style => 'hash',
2983                input => \%input,
2984        };
2985}
2986
2987# --------------------------------------------------
2988# _extract_type_params_schema
2989#
2990# Purpose:    Detect and extract a parameter schema
2991#             from a Type::Params signature_for()
2992#             declaration for the current method,
2993#             located in the module-level document.
2994#
2995# Entry:      $code - method body source string
2996#                     (used to extract the function
2997#                     name for lookup).
2998#
2999# Exit:       Returns a schema hashref on success,
3000#             or undef if no signature_for
3001#             declaration is found or compilation
3002#             fails.
3003#
3004# Side effects: May fork a child process to compile
3005#               the signature in isolation.
3006# --------------------------------------------------
3007sub _extract_type_params_schema {
3008
286
5024
        my ($self, $code) = @_;
3009
3010
286
305
        my $function = $self->_extract_function_name($code) or return;
3011
3012
285
696
        my $doc = $self->{_document} or return;
3013
282
304
        my $stmt = $self->_find_signature_statement($doc, $function) or return;
3014
3015
2
12
        my $signature_expr = $self->_extract_signature_expression($stmt, $function) or return;
3016
3017
2
7
        my $meta = $self->_compile_signature_isolated($function, $signature_expr) or return;
3018
3019
1
7
        return $self->_build_schema_from_meta($meta);
3020}
3021
3022# --------------------------------------------------
3023# _extract_function_name
3024#
3025# Purpose:    Extract the subroutine name from the
3026#             start of a method body string, used
3027#             to look up its Type::Params signature.
3028#
3029# Entry:      $code - method body source string.
3030#
3031# Exit:       Returns the subroutine name string,
3032#             or undef if no 'sub name' declaration
3033#             is found.
3034#
3035# Side effects: None.
3036# --------------------------------------------------
3037sub _extract_function_name {
3038
287
1558
        my ($self, $code) = @_;
3039
287
880
        return $1 if $code =~ /^\s*sub\s+([a-zA-Z0-9_]+)/;
3040
3
9
        return;
3041}
3042
3043# --------------------------------------------------
3044# _find_signature_statement
3045#
3046# Purpose:    Search a PPI document for a
3047#             signature_for statement that
3048#             corresponds to a named function.
3049#
3050# Entry:      $doc      - PPI::Document to search.
3051#             $function - function name string.
3052#
3053# Exit:       Returns the matching PPI::Statement
3054#             node, or undef if none is found.
3055#
3056# Side effects: None.
3057# --------------------------------------------------
3058sub _find_signature_statement {
3059
282
4234
        my ($self, $doc, $function) = @_;
3060
3061        my $statements = $doc->find(
3062                sub {
3063
213478
1302048
                        $_[1]->isa('PPI::Statement') && $_[1]->content =~ /^\s*signature_for\b/
3064                }
3065
282
669
        ) or return;
3066
3067
2
14
        foreach my $stmt (@$statements) {
3068
2
3
                my $content = $stmt->content;
3069
2
131
                if ($content =~ /^\s*signature_for\s+\Q$function\E\b/) {
3070
2
4
                        return $stmt;
3071                }
3072        }
3073
3074
0
0
        return;
3075}
3076
3077# --------------------------------------------------
3078# _extract_signature_expression
3079#
3080# Purpose:    Extract the Type::Params signature
3081#             expression (everything after =>) from
3082#             a signature_for statement node.
3083#
3084# Entry:      $stmt     - PPI::Statement node.
3085#             $function - function name string,
3086#                         used in the match pattern.
3087#
3088# Exit:       Returns the signature expression
3089#             string, or undef if the pattern
3090#             does not match.
3091#
3092# Side effects: None.
3093# --------------------------------------------------
3094sub _extract_signature_expression {
3095
3
2157
        my ($self, $stmt, $function) = @_;
3096
3097
3
12
        my $content = $stmt->content;
3098
3099
3
171
        if ($content =~ /^\s*signature_for\s+\Q$function\E\s*=>\s*(.+?);?\s*$/s) {
3100
2
5
                return $1;
3101        }
3102
3103
1
1
        return;
3104}
3105
3106# --------------------------------------------------
3107# _compile_signature_isolated
3108#
3109# Purpose:    Compile and evaluate a Type::Params
3110#             signature expression in an isolated
3111#             process to extract parameter metadata
3112#             without polluting the current process.
3113#
3114#             Only runs when the caller passed
3115#             allow_signature_exec => 1 to new().
3116#             Extracting parameter types from a
3117#             Type::Params signature_for() declaration
3118#             requires actually building the type
3119#             objects at runtime -- there is no purely
3120#             static way to do it -- so this is real
3121#             execution of an excerpt of the target
3122#             module's own source. Every other code
3123#             path in this module is static (PPI-only)
3124#             analysis that never runs the target's
3125#             code, so this one feature must be opted
3126#             into explicitly rather than triggered
3127#             implicitly by extract_all().
3128#
3129#             A Safe compartment was previously tried
3130#             first as a "fast path" before falling
3131#             back to this subprocess unconditionally.
3132#             It was removed: Type::Params and
3133#             Types::Common pull in XS modules (e.g.
3134#             B.pm via Type::Params), and Safe cannot
3135#             host XS/dynamic loading at all, so the
3136#             compartment never succeeded for any real
3137#             signature_for() declaration -- it was
3138#             dead code that gave a false impression of
3139#             sandboxing while every real call fell
3140#             through to the unconditional subprocess
3141#             below.
3142#
3143# Entry:      $function        - function name string.
3144#             $signature_expr  - Type::Params
3145#                                signature expression
3146#                                string.
3147#
3148# Exit:       Returns a decoded JSON hashref
3149#             containing parameters and returns
3150#             metadata on success.
3151#             Returns undef without running anything if
3152#             allow_signature_exec was not enabled.
3153#             Croaks on unsafe expressions, timeout,
3154#             or compile errors.
3155#
3156# Side effects: May fork a child process with a
3157#               memory limit applied via
3158#               BSD::Resource if available.
3159#               Memory limiting is best-effort and
3160#               silently skipped on platforms where
3161#               BSD::Resource is unavailable.
3162# --------------------------------------------------
3163sub _compile_signature_isolated {
3164
2
5
        my ($self, $function, $signature_expr) = @_;
3165
3166
2
2
        unless ($self->{allow_signature_exec}) {
3167                carp "Skipping Type::Params signature_for($function) extraction: ",
3168                        'allow_signature_exec => 1 was not passed to new() ',
3169                        '(this would execute code from the target module)'
3170
2
3
                        if $self->{verbose};
3171
2
7
                return;
3172        }
3173
3174        # Remove comments
3175
0
0
        $signature_expr =~ s/#.*$//mg;
3176
3177        # Reject obviously dangerous constructs. This is defense in depth
3178        # only, not a real security boundary -- it is a denylist of literal
3179        # tokens and cannot catch e.g. a symbolic-ref call built by string
3180        # concatenation. The actual control here is the allow_signature_exec
3181        # opt-in above: this code must never run against a module the caller
3182        # has not already decided to trust enough to execute.
3183
0
0
        if ($signature_expr =~ /\b(?:system|exec|open|fork|require|do|eval|qx)\b/) {
3184
0
0
                die 'Unsafe signature expression';
3185        }
3186
3187
0
0
        if ($signature_expr =~ /[`{};]/) {
3188
0
0
                die "Unsafe signature expression";
3189        }
3190
3191
0
0
        my $payload = <<'PERL';
3192use strict;
3193use warnings;
3194use Type::Params -sigs;
3195use Types::Common -types;
3196use JSON::MaybeXS;
3197
3198# Stub sub so Perl can parse it
3199sub FUNCTION_NAME {}
3200
3201# Create the Type::Params signature object
3202my $sig = signature_for FUNCTION_NAME => SIGNATURE_EXPR;
3203
3204# Extract parameters
3205my @sig_params = @{ $sig->parameters || [] };
3206my $pos = 0;
3207my @params;
3208
3209# if ($sig->method) {
3210    # The $self value
3211    # push @params, {
3212        # name     => 'arg0',
3213        # optional => 0,
3214        # position => $pos++,
3215    # };
3216# }
3217
3218for my $p (@sig_params) {
3219        push @params, {
3220                name => "arg$pos",
3221                optional => $p->optional ? 1 : 0,
3222                position => $pos,
3223                type => $p->type->name
3224        };
3225        $pos++;
3226}
3227
3228# Extract return type
3229my $returns;
3230if (my $r = $sig->returns_scalar) {
3231        $returns = {
3232                context => 'scalar',
3233                type => $r ? $r->name : 'unknown',
3234        };
3235} elsif ($r = $sig->returns_list) {
3236        $returns = {
3237                context => 'list',
3238                type => $r ? $r->name : 'unknown',
3239        };
3240}
3241
3242print encode_json({
3243        parameters => \@params,
3244        returns => $returns,
3245});
3246PERL
3247
3248        # Substitute function name and signature expression
3249
0
0
        $payload =~ s/FUNCTION_NAME/$function/g;
3250
0
0
        $payload =~ s/SIGNATURE_EXPR/$signature_expr/;
3251
3252        # Run in an isolated Perl process
3253
0
0
        my ($wtr, $rdr, $err) = (undef, undef, gensym);
3254
0
0
        local %ENV;
3255
3256        # Apply memory limit if BSD::Resource is available.
3257        # This module is Unix-only and not available on Windows,
3258        # so we guard the call and skip silently if not present.
3259
0
0
        eval {
3260
0
0
                require BSD::Resource;
3261
0
0
                BSD::Resource::setrlimit(
3262                        BSD::Resource::RLIMIT_AS(),
3263                        $MEMORY_LIMIT_BYTES,
3264                        $MEMORY_LIMIT_BYTES
3265                );
3266        };
3267        # Ignore failure — resource limiting is best-effort only
3268
3269
0
0
        my $pid = open3($wtr, $rdr, $err, $^X, '-T');
3270
3271
0
0
        print $wtr $payload;
3272
0
0
        close $wtr;
3273
3274
0
0
0
0
        local $SIG{ALRM} = sub { croak 'Signature compile timeout' };
3275
0
0
0
0
        eval { alarm($SIGNATURE_TIMEOUT_SECS) };        # no-op on Windows
3276
3277
0
0
0
0
0
0
        my $stdout = do { local $/; <$rdr> };
3278
0
0
0
0
0
0
        my $stderr = do { local $/; <$err> };
3279
3280
0
0
0
0
        eval { alarm 0 };
3281
3282
0
0
        waitpid($pid, 0);
3283
3284
0
0
        if ($stderr && length $stderr) {
3285
0
0
                croak "Error compiling signature:\n$stderr";
3286        }
3287
3288
0
0
        return decode_json($stdout);
3289}
3290
3291# --------------------------------------------------
3292# _build_schema_from_meta
3293#
3294# Purpose:    Convert the parameter and return type
3295#             metadata produced by
3296#             _compile_signature_isolated into a
3297#             standard schema hashref.
3298#
3299# Entry:      $meta - hashref with 'parameters'
3300#                     arrayref and optional
3301#                     'returns' hashref, as decoded
3302#                     from the isolated compile
3303#                     JSON output.
3304#
3305# Exit:       Returns a schema hashref with input,
3306#             output, style, source, _notes, and
3307#             _confidence keys.
3308#
3309# Side effects: None.
3310#
3311# Notes:      Unknown Type::Params type names are
3312#             mapped to 'string' with a note added
3313#             and confidence downgraded to 'medium'.
3314# --------------------------------------------------
3315sub _build_schema_from_meta {
3316
8
35
        my ($self, $meta) = @_;
3317
3318
8
21
        my %type_map = (
3319                Num => 'number',
3320                Int => 'integer',
3321                Str => 'string',
3322                Bool => 'boolean',
3323                Object  => 'object',
3324                ArrayRef => 'array',
3325                HashRef  => 'object',
3326        );
3327
3328
8
5
        my $input;
3329
8
7
        my $position = 0;
3330
8
6
        my $confidence = 'high';
3331
8
9
        my @notes = ('Type::Params detected');
3332
3333
8
8
5
12
        foreach my $p (@{ $meta->{parameters} || [] }) {
3334
8
13
                my $type = $type_map{ $p->{type} } // 'string';
3335
3336
8
9
                if (!exists $type_map{$p->{type}}) {
3337
1
2
                        push @notes, "Unknown type $p->{type}, defaulting to string";
3338
1
1
                        $confidence = 'medium';
3339                }
3340
3341                $input->{"arg$position"} = {
3342                        type => $type,
3343                        position => $position,
3344
8
18
                        optional => $p->{optional} ? 1 : 0,
3345                };
3346
3347
8
8
                $position++;
3348        }
3349
3350
8
6
        my $output;
3351
3352
8
9
        if (my $ret = $meta->{returns}) {
3353
2
3
                my $type = $type_map{ $ret->{type} } // 'string';
3354
3355
2
2
                if (!exists $type_map{$ret->{type}}) {
3356
0
0
                        push @notes, "Unknown return type $ret->{type}, defaulting to string";
3357
0
0
                        $confidence = 'medium';
3358                }
3359
3360                $output = {
3361
2
5
                        type => $type,
3362                        "_$ret->{context}_context" => { type => $type },
3363                };
3364        }
3365
3366        return {
3367
8
27
                input  => $input,
3368                output => $output,
3369                style  => 'hash',
3370                source => 'validator',
3371                _notes => \@notes,
3372                _confidence => {
3373                        input => $confidence,
3374                },
3375        };
3376}
3377
3378# --------------------------------------------------
3379# _analyze_pod
3380#
3381# Purpose:    Parse POD documentation for a method
3382#             and extract parameter names, types,
3383#             constraints, and optionality from
3384#             multiple POD patterns.
3385#
3386# Entry:      $pod - string of POD content as
3387#                    returned by _extract_pod_before.
3388#                    May be undef or empty.
3389#
3390# Exit:       Returns a hashref of parameter name
3391#             to parameter spec hashref. Returns an
3392#             empty hashref if no POD is provided
3393#             or no parameters are found.
3394#
3395# Side effects: Carps when a semantic type is
3396#               detected, advising the caller to
3397#               set config->properties.
3398#               Logs progress to stdout when
3399#               verbose is set.
3400#
3401# Notes:      Three pattern strategies are tried
3402#             in order: (1) named Parameters section,
3403#             (2) inline $name - type format,
3404#             (3) =over/=item list. Parameters found
3405#             earlier take precedence over later
3406#             discoveries. Default values from POD
3407#             are merged in last.
3408# --------------------------------------------------
3409sub _analyze_pod {
3410
297
284
        my ($self, $pod) = @_;
3411
3412
297
325
        return {} unless $pod;
3413
3414
115
75
        my %params;
3415
115
80
        my $position_counter = 0;
3416
3417        # Check for positional arguments in method signature
3418        # Pattern: =head2 method_name($arg1, $arg2, $arg3)
3419
115
231
        if ($pod =~ /=head2\s+\w+\s*\(([^)]+)\)/s) {
3420
35
37
                my $sig = $1;
3421                # Extract parameter names in order
3422
35
74
                my @sig_params = $sig =~ /\$(\w+)/g;
3423
3424                # Skip $self or $class
3425
35
84
                shift @sig_params if @sig_params && $sig_params[0] =~ /^(self|class)$/i;
3426
3427                # Assign positions
3428
35
31
                foreach my $param (@sig_params) {
3429
47
112
                        $params{$param}{position} //= $position_counter;
3430
47
78
                        $self->_log("  POD: $param has position $params{$param}{position}");
3431
47
45
                        $position_counter++;
3432                }
3433        }
3434
3435
115
179
        $self->_log("  POD: Found $position_counter unnamed parameters to add to the position list");
3436
3437        # Pattern 1: Parse line-by-line in Parameters section
3438        # First, extract the Parameters section
3439
115
77
        my $param_section;
3440
115
831
        if($pod =~ /(?:Parameters?|Arguments?|Inputs?):?\s*\n((?:\s*\$.*\n)+)/si) {
3441
30
32
                $param_section = $1;
3442        } elsif ($pod =~ /^=head\d+\s+(?:Parameters?|Arguments?|Inputs?)\b.*?\n(.*?)(?=^=head|\Z)/msi) {
3443
25
45
                $param_section = $1;
3444        }
3445
115
123
        if($param_section) {
3446
55
45
                my $param_order = 0;
3447
3448
55
73
                $self->_log("  POD: Scan for named parameters in '$param_section'");
3449                # Now parse each line that starts with $varname
3450
55
99
                foreach my $line (split /\n/, $param_section) {
3451
282
246
                        if ($line =~ /C<\$(\w+)>\s*\((Required|Mandatory)\)/i) {
3452
0
0
                                $params{$1}{optional} = 0;
3453
0
0
                                $self->_log("  POD: $1 marked required from item header");
3454                        }
3455
3456                        # Match: $name - type (constraints), description
3457                        # or:   $name - type, description
3458                        # or:   $name - type
3459
282
447
                        if(($line =~ /^\s*\$(\w+)\s*-\s*(\w+)(?:\s*\(([^)]+)\))?\s*,?\s*(.*)$/i) ||
3460                           ($line =~ /^\s*C<\$(\w+)>\s*-\s*(\w+)(?:\s*\(([^)]+)\))?\s*,?\s*(.*)$/i)) {
3461
43
88
                                my ($name, $type, $constraint, $desc) = ($1, lc($2), $3, $4);
3462
3463                                # Clean up
3464
43
101
                                $desc =~ s/^\s+|\s+$//g if $desc;
3465
3466                                # Skip common non-parameters
3467
43
56
                                next if $name =~ /^(self|class|return|returns?)$/i;
3468
3469
42
77
                                $params{$name} ||= { _source => 'pod' };
3470
3471                                # If we haven't already assigned a position from the signature, use order in Parameters section
3472
42
86
                                unless (exists $params{$name}{position}) {
3473
16
14
                                        $params{$name}{position} = $param_order++;
3474
16
25
                                        $self->_log("  POD: $name has position $params{$name}{position} (from Parameters order)");
3475                                }
3476
3477                                # Normalize type names
3478
42
39
                                $type = 'integer' if $type eq 'int';
3479
42
62
                                $type = 'number' if $type eq 'num' || $type eq 'float';
3480
42
29
                                $type = 'boolean' if $type eq 'bool';
3481
42
43
                                $type = 'arrayref' if $type eq 'array';
3482
42
32
                                $type = 'hashref' if $type eq 'hash';
3483
3484
42
42
                                $params{$name}{type} = $type;
3485
3486                                # Parse constraints
3487
42
36
                                if($constraint) {
3488
7
13
                                        $self->_parse_constraints($params{$name}, $constraint);
3489                                }
3490
3491                                # Check for optional/required in description OR constraint.
3492                                # Use word boundaries to avoid matching "optionally" as "optional".
3493
42
71
                                my $full_text = ($constraint || '') . ' ' . ($desc || '');
3494
42
89
                                if ($full_text =~ /\boptional\b/i) {
3495
6
7
                                        $params{$name}{optional} = 1;
3496
6
9
                                        $self->_log("  POD: $name marked as optional");
3497                                } elsif ($full_text =~ /required|mandatory/i) {
3498
2
4
                                        $params{$name}{optional} = 0;
3499
2
4
                                        $self->_log("  POD: $name marked as required");
3500                                }
3501
3502                                # Detect semantic types:
3503
42
66
                                if ($desc =~ /\b(email|url|uri|path|filename)\b/i) {
3504                                        # TODO: ensure properties is set to 1 in $config
3505
2
56
                                        carp('Manually set config->properties to 1 in ', $self->{'input_file'});
3506
2
1375
                                        $params{$name}{semantic} = lc($1);
3507                                }
3508
3509                                # Look for regex patterns
3510
42
63
                                if ($desc && $desc =~ m{matches?\s+(/[^/]+/|qr/.+?/)}i) {
3511
1
2
                                        $params{$name}{matches} = $1;
3512                                }
3513
3514
42
87
                                $self->_log("  POD: Found parameter '$name' in parameters section, type=$type" .
3515                                                ($constraint ? " ($constraint)" : '') .
3516                                                ($desc ? " - $desc" : ''));
3517                        }
3518                }
3519        }
3520
3521        # Pattern 2: Also try the inline format in case Parameters: section wasn't found
3522
115
239
        while ($pod =~ /\$(\w+)\s*-\s*(string|integer|int|number|num|float|boolean|bool|arrayref|array|hashref|hash|object|any)(?:\s*\(([^)]+)\))?\s*,?\s*(.*)$/gim) {
3523
29
65
                my ($name, $type, $constraint, $desc) = ($1, lc($2), $3, $4);
3524
3525                # Only process if we haven't already found this param in the Parameters section
3526
29
50
                next if exists $params{$name};
3527
3528                # Clean up description - remove leading/trailing whitespace
3529
2
7
                $desc =~ s/^\s+|\s+$//g if $desc;
3530
3531                # Skip common words that aren't parameters
3532
2
4
                next if $name =~ /^(self|class|return|returns?)$/i;
3533
3534
1
4
                $params{$name} ||= { _source => 'pod' };
3535
3536                # Normalize type names
3537
1
2
                $type = 'integer' if $type eq 'int';
3538
1
3
                $type = 'number' if $type eq 'num' || $type eq 'float';
3539
1
1
                $type = 'boolean' if $type eq 'bool';
3540
1
1
                $type = 'arrayref' if $type eq 'array';
3541
1
1
                $type = 'hashref' if $type eq 'hash';
3542
3543
1
2
                $params{$name}{type} = $type;
3544
3545                # Parse constraints
3546
1
2
                if ($constraint) {
3547
0
0
                        $self->_parse_constraints($params{$name}, $constraint);
3548                }
3549
3550                # Check for optional/required in description.
3551                # Use word boundaries to avoid matching "optionally" as "optional".
3552
1
1
                if ($desc) {
3553
1
5
                        if ($desc =~ /\boptional\b/i) {
3554
0
0
                                $params{$name}{optional} = 1;
3555                        } elsif ($desc =~ /required|mandatory/i) {
3556
0
0
                                $params{$name}{optional} = 0;
3557                        }
3558
3559                        # Look for regex patterns in description
3560
1
2
                        if ($desc =~ m{matches?\s+(/[^/]+/|qr/.+?/)}i) {
3561
0
0
                                $params{$name}{matches} = $1;
3562                        }
3563                }
3564
3565
1
4
                $self->_log("  POD: Found parameter '$name' in the inline documentation, type=$type" .
3566                                        ($constraint ? " ($constraint)" : ''));
3567        }
3568
3569        # Pattern 3: Parse =over /=item list (supports bullets and C<>)
3570
115
479
        while ($pod =~ /=item\s+(?:\*\s*)?(?:C<)?\$(\w+)\b(?:>)?\s*(?:-.*)?\n?(.*?)(?==item|\=back|\=head)/sig) {
3571
11
11
                my $name = $1;
3572
11
15
                my $desc = $2;
3573
3574                # Never allow empty or undefined parameter names
3575
11
26
                next unless defined $name && length $name;
3576
3577
11
15
                $desc =~ s/^\s+|\s+$//g;
3578
3579                # Skip common non-parameters
3580
11
17
                next if $name =~ /^(self|class|return|returns?)$/i;
3581
3582
11
42
                $params{$name} ||= { _source => 'pod' };
3583
3584                # Explicit typed form only:
3585                #       $param - type (constraints)
3586
11
14
                if ($desc =~ /^\s*(string|integer|int|number|num|float|boolean|bool|array|arrayref|hash|hashref|any)\b(?:\s*\(([^)]+)\))?/i) {
3587
0
0
                        my $type = lc($1);
3588
0
0
                        my $constraint = $2;
3589
3590                        # Normalize type names
3591
0
0
                        $type = 'integer' if $type eq 'int';
3592
0
0
                        $type = 'number' if $type eq 'num' || $type eq 'float';
3593
0
0
                        $type = 'boolean' if $type eq 'bool';
3594
0
0
                        $type = 'arrayref' if $type eq 'array';
3595
0
0
                        $type = 'hashref' if $type eq 'hash';
3596
3597
0
0
                        $params{$name}{type} = $type;
3598
3599
0
0
                        if ($constraint) {
3600
0
0
                                $self->_parse_constraints($params{$name}, $constraint);
3601                        }
3602
3603
0
0
                        $self->_log("  POD: Explicit type '$type' for $name");
3604                } else {
3605                        # Heuristic inference from description text
3606
11
34
                        if ($desc =~ /\bstring\b/i) {
3607
0
0
                                $params{$name}{type} = 'string';
3608                        } elsif ($desc =~ /\b(int|integer)\b/i) {
3609
0
0
                                $params{$name}{type} = 'integer';
3610                        } elsif ($desc =~ /\b(num|number|float)\b/i) {
3611
0
0
                                $params{$name}{type} = 'number';
3612                        } elsif ($desc =~ /\b(bool|boolean)\b/i) {
3613
0
0
                                $params{$name}{type} = 'boolean';
3614                        }
3615                }
3616
3617                # Check for optional/required in description.
3618                # Use word boundaries to avoid matching "optionally" as "optional".
3619
11
21
                if ($desc =~ /\boptional\b/i) {
3620
0
0
                        $params{$name}{optional} = 1;
3621                } elsif ($desc =~ /required|mandatory/i) {
3622
1
2
                        $params{$name}{optional} = 0;
3623                }
3624
3625                # Look for regex patterns
3626
11
14
                if ($desc =~ m{matches?\s+(/[^/]+/|qr/.+?/)}i) {
3627
0
0
                        $params{$name}{matches} = $1;
3628                }
3629
3630
11
15
                $self->_log("  POD: Found parameter '$name' from =item list");
3631        }
3632
3633        # Extract default values from POD
3634
115
161
        my $pod_defaults = $self->_extract_defaults_from_pod($pod);
3635
115
162
        foreach my $param (keys %$pod_defaults) {
3636
10
7
                if (exists $params{$param}) {
3637
10
7
                        $params{$param}{_default} = $pod_defaults->{$param};
3638
10
11
                        $params{$param}{optional} = 1 unless defined $params{$param}{optional};
3639                        $self->_log(sprintf("  POD: %s has default value: %s",
3640                                $param,
3641
10
18
                                defined($pod_defaults->{$param}) ? $pod_defaults->{$param} : 'undef'
3642                        ));
3643                }
3644        }
3645
3646        # Default undocumented optionality: documented params are REQUIRED unless stated otherwise
3647
115
126
        for my $name (keys %params) {
3648
74
106
                next if $name =~ /^(self|class)$/i;
3649
3650                # TODO: if optionality was never explicitly set, assume required.
3651                # Currently disabled as it breaks some schemas — revisit in a future pass.
3652                # if (!exists $params{$name}{optional}) {
3653                        # $params{$name}{optional} = 0;
3654                        # $self->_log("  POD: $name assumed required (no optional/default specified)");
3655                # }
3656        }
3657
3658        # Pattern 0: =head3|4 Input formal spec — highest-priority type source.
3659        # Runs last so positional matching can use positions set by earlier patterns.
3660        # Accepts positional array format: [ {type=>'...'}, ... ]
3661        # and named hash format:           { name => {type=>'...'}, ... }
3662
115
304
        if ($pod =~ /=head[34]\s+Input\b(.*?)(?==head|\z)/si) {
3663
7
23
                my $block = $1;
3664
7
11
                $block =~ s/\A\s+//;
3665
3666
7
14
                if ($block =~ /\A\[/) {
3667                        # Positional format: each {…} maps to the param at that array index.
3668
5
5
                        my $idx = 0;
3669
5
11
                        while ($block =~ /\{([^}]*)\}/g) {
3670
5
5
                                my $spec = $1;
3671
5
3
5
6
                                my ($name) = grep { ($params{$_}{position} // -1) == $idx }
3672                                             keys %params;
3673
5
6
                                if (defined $name) {
3674
3
2
                                        $params{$name}{_from_input_spec} = 1;
3675
3
4
                                        if (my $t = $self->_map_formal_input_type($spec)) {
3676
3
4
                                                $params{$name}{type} = $t;
3677
3
4
                                                $self->_log("  POD: $name type '$t' from =head Input (positional $idx)");
3678                                        }
3679
3
5
                                        if ($spec =~ /\boptional\s*=>\s*(0|1)/i) {
3680
0
0
                                                $params{$name}{optional} = $1 + 0;
3681                                        }
3682                                }
3683
5
8
                                $idx++;
3684                        }
3685                } elsif ($block =~ /\A\{/) {
3686                        # Named format: each 'name => {…}' entry maps directly by name.
3687
1
3
                        while ($block =~ /\b(\w+)\s*=>\s*\{([^}]*)\}/g) {
3688
2
4
                                my ($name, $spec) = ($1, $2);
3689
2
3
                                next if $name =~ /^(self|class)$/i;
3690
2
5
                                $params{$name} //= { _source => 'pod' };
3691
2
3
                                $params{$name}{_from_input_spec} = 1;
3692
2
2
                                if (my $t = $self->_map_formal_input_type($spec)) {
3693
2
3
                                        $params{$name}{type} = $t;
3694
2
3
                                        $self->_log("  POD: $name type '$t' from =head Input (named)");
3695                                }
3696
2
7
                                if ($spec =~ /\boptional\s*=>\s*(0|1)/i) {
3697
1
2
                                        $params{$name}{optional} = $1 + 0;
3698                                }
3699                        }
3700                        # A named-format Input spec signals a hash/named API.  Positional
3701                        # info from signature analysis is not meaningful here and causes
3702                        # "param X missing position" errors when params are mixed.
3703
1
3
                        delete $params{$_}{position} for keys %params;
3704                }
3705        }
3706
3707
115
167
        return \%params;
3708}
3709
3710# --------------------------------------------------
3711# _map_formal_input_type
3712#
3713# Purpose:    Extract and normalise the type string
3714#             from a parameter spec fragment such as
3715#             "type => 'scalar | scalarref'".
3716#             Handles union types by returning the
3717#             canonical ATG type for the first
3718#             recognised alternative.
3719#
3720# Entry:      $spec - text content of a { } block
3721#                     from a =head3|4 Input spec.
3722#
3723# Exit:       Canonical type string, or undef when
3724#             no 'type' key is present or the value
3725#             is not a recognised type name.
3726# --------------------------------------------------
3727sub _map_formal_input_type {
3728
10
1699
        my ($self, $spec) = @_;
3729
10
36
        return undef unless $spec =~ /\btype\s*=>\s*['"]([^'"]+)['"]/i;
3730
9
14
        my $raw = lc($1);
3731
9
15
        $raw =~ s/\s+//g;
3732
3733
9
76
        my %map = (
3734                scalar    => 'string',
3735                scalarref => 'string',
3736                str       => 'string',
3737                string    => 'string',
3738                int       => 'integer',
3739                integer   => 'integer',
3740                num       => 'number',
3741                number    => 'number',
3742                float     => 'number',
3743                bool      => 'boolean',
3744                boolean   => 'boolean',
3745                array     => 'arrayref',
3746                arrayref  => 'arrayref',
3747                hash      => 'hashref',
3748                hashref   => 'hashref',
3749                object    => 'object',
3750                any       => 'any',
3751                undef     => 'undef',
3752                coderef   => 'coderef',
3753        );
3754
3755
9
14
        for my $t (split /\|/, $raw) {
3756
9
33
                return $map{$t} if exists $map{$t};
3757        }
3758
1
4
        return undef;
3759}
3760
3761# --------------------------------------------------
3762# _analyze_output
3763#
3764# Purpose:    Orchestrate analysis of a method's
3765#             return value by combining POD return
3766#             section parsing, code return statement
3767#             analysis, boolean detection, context
3768#             detection, void detection, chaining
3769#             detection, and error convention
3770#             detection.
3771#
3772# Entry:      $pod         - POD string for the method.
3773#             $code        - method body source string.
3774#             $method_name - name of the method being
3775#                            analysed, used for
3776#                            boolean heuristics.
3777#
3778# Exit:       Returns a hashref describing the
3779#             output type and behaviour, or an empty
3780#             hashref if nothing could be determined.
3781#             Keys include: type, value, isa, and
3782#             various _* metadata keys.
3783#
3784# Side effects: Logs progress to stdout when
3785#               verbose is set.
3786# --------------------------------------------------
3787sub _analyze_output {
3788
287
3005
        my ($self, $pod, $code, $method_name) = @_;
3789
3790
287
207
        my %output;
3791
3792
287
405
        $self->_analyze_output_from_pod(\%output, $pod);
3793
287
416
        $self->_analyze_output_from_code(\%output, $code, $method_name);
3794
287
432
        $self->_enhance_boolean_detection(\%output, $pod, $code, $method_name);
3795
287
410
        $self->_detect_list_context(\%output, $code);
3796
287
379
        $self->_detect_void_context(\%output, $code, $method_name);
3797
287
377
        $self->_detect_chaining_pattern(\%output, $code);
3798
287
427
        $self->_detect_error_conventions(\%output, $code);
3799
3800
287
488
        $self->_validate_output(\%output) if keys %output;
3801
3802        # Don't return empty output
3803
287
437
        return (keys %output) ? \%output : {};
3804}
3805
3806# --------------------------------------------------
3807# _analyze_output_from_pod
3808#
3809# Purpose:    Parse the POD documentation for a
3810#             method's return value and populate
3811#             an output hashref with type, value,
3812#             and behaviour information.
3813#
3814# Entry:      $output - hashref to populate
3815#                       (modified in place).
3816#             $pod    - POD string for the method.
3817#
3818# Exit:       Returns nothing. Modifies $output
3819#             in place.
3820#
3821# Side effects: Logs detections to stdout when
3822#               verbose is set.
3823#
3824# Notes:      Two patterns are tried: (1) a
3825#             'Returns:' section of up to 3 lines,
3826#             and (2) an inline 'returns X' phrase.
3827#             The section pattern takes precedence.
3828# --------------------------------------------------
3829sub _analyze_output_from_pod {
3830
290
256
        my ($self, $output, $pod) = @_;
3831
290
3190
262
2784
        my %VALID_OUTPUT_TYPES = map { $_ => 1 }
3832                qw(string integer number float boolean arrayref hashref object coderef void undef);
3833
3834
290
487
        if ($pod) {
3835                # Pattern 0: =head4 Output formal spec (highest priority — explicit over heuristic)
3836                # The outer container shape determines the return type:
3837                #   (...)  â€” list/array of items
3838                #   [...]  â€” arrayref  (bare [] = empty/void, skip)
3839                #   {...}  â€” hashref spec; look for type => inside, or isa => for object
3840
110
250
                if($pod =~ /=head4\s+Output\b(.*?)(?==head|\z)/si) {
3841
3
4
                        my $block = $1;
3842
3
6
                        $block =~ s/^\s+//;
3843
3
7
                        if($block =~ /^\(/) {
3844
2
3
                                $output->{type} = 'array';
3845
2
3
                                $self->_log("  OUTPUT: type 'array' from =head4 Output list notation");
3846                        } elsif($block =~ /^\[/) {
3847
0
0
                                unless($block =~ /^\[\s*\]/) {
3848
0
0
                                        $output->{type} = 'arrayref';
3849
0
0
                                        $self->_log("  OUTPUT: type 'arrayref' from =head4 Output arrayref notation");
3850                                }
3851                        } elsif($block =~ /^\{/) {
3852
1
4
                                if($block =~ /type\s*=>\s*['"]?(\w[\w:]*?)['"]?\s*[,}]/i) {
3853
1
2
                                        my $type = lc($1);
3854
1
2
                                        $type = 'hashref'  if $type eq 'hash';
3855
1
1
                                        $type = 'arrayref' if $type eq 'array';
3856
1
2
                                        if($VALID_OUTPUT_TYPES{$type}) {
3857
1
1
                                                $output->{type} = $type;
3858
1
2
                                                $self->_log("  OUTPUT: type '$type' from =head4 Output formal spec");
3859                                        } elsif($block =~ /\bisa\s*=>/) {
3860
0
0
                                                $output->{type} = 'object';
3861
0
0
                                                $self->_log("  OUTPUT: type 'object' from =head4 Output isa spec");
3862                                        }
3863                                } elsif($block =~ /\bisa\s*=>/) {
3864
0
0
                                        $output->{type} = 'object';
3865
0
0
                                        $self->_log("  OUTPUT: type 'object' from =head4 Output isa spec");
3866                                }
3867                        }
3868                }
3869
3870                # Pattern 1: Returns: section
3871                # Up to 3 lines
3872
110
151
                if ($pod =~ /Returns?:\s+([^\n]+(?:\n[^\n]+){0,2})/si) {
3873
8
12
                        my $returns_desc = $1;
3874
8
21
                        $returns_desc =~ s/^\s+|\s+$//g;
3875
3876
8
16
                        $self->_log("  OUTPUT: Found Returns section: $returns_desc");
3877
3878                        # Try to infer type from description (skip if Pattern 0 already set type)
3879
8
115
                        if (!$output->{type} && $returns_desc =~ /\b(string|text)\b/i) {
3880
1
2
                                $output->{type} = 'string';
3881                        } elsif (!$output->{type} && $returns_desc =~ /\b(integer|int|count)\b/i) {
3882
1
2
                                $output->{type} = 'integer';
3883                        } elsif (!$output->{type} && $returns_desc =~ /\b(float|decimal|number)\b/i) {
3884
0
0
                                $output->{type} = 'number';
3885                        } elsif (!$output->{type} && $returns_desc =~ /\b(boolean|true|false)\b/i) {
3886
1
2
                                $output->{type} = 'boolean';
3887                        } elsif (!$output->{type} && $returns_desc =~ /\b(array|list)\b/i) {
3888
0
0
                                $output->{type} = 'arrayref';
3889                        } elsif (!$output->{type} && $returns_desc =~ /\b(hash|hashref|dictionary)\b/i) {
3890
0
0
                                $output->{type} = 'hashref';
3891                        } elsif (!$output->{type} && $returns_desc =~ /\b(object|instance)\b/i) {
3892
2
4
                                $output->{type} = 'object';
3893                        } elsif (!$output->{type} && $returns_desc =~ /\bundef\b/i) {
3894
0
0
                                $output->{type} = 'undef';
3895                        }
3896
3897                        # Look for specific values
3898
8
20
                        if ($returns_desc =~ /\b1\s+(?:on\s+success|if\s+successful)\b/i) {
3899
1
2
                                $output->{value} = 1;
3900
1
2
                                if(defined($output->{'type'}) && ($output->{type} eq 'scalar')) {
3901
0
0
                                        $output->{type} = 'boolean';
3902                                } else {
3903
1
3
                                        $output->{type} ||= 'boolean';
3904                                }
3905
1
2
                                $self->_log("  OUTPUT: Returns 1 on success");
3906                        } elsif ($returns_desc =~ /\b0\s+(?:on\s+failure|if\s+fail)\b/i) {
3907
0
0
                                $output->{alt_value} = 0;
3908                        } elsif ($returns_desc =~ /dies\s+on\s+(?:error|failure)/i) {
3909
0
0
                                $output->{_STATUS} = 'LIVES';
3910
0
0
                                $self->_log('  OUTPUT: Should not die on success');
3911                        }
3912
8
16
                        if ($returns_desc =~ /\b(true|false)\b/i) {
3913
1
2
                                $output->{type} ||= 'boolean';
3914                        }
3915
8
12
                        if ($returns_desc =~ /\bundef\b/i) {
3916
0
0
                                $output->{optional} = 1;
3917                        }
3918                }
3919
3920                # Pattern 2: Inline "returns X"
3921
110
375
                if((!$output->{type}) && ($pod =~ /returns?\s+(?:an?\s+)?(\w+)/i)) {
3922
30
45
                        my $type = lc($1);
3923
3924
30
56
                        $type = 'boolean' if $type =~ /^(true|false|bool)$/;
3925                        # Skip if it's just a number (like "returns 1")
3926
30
39
                        $type = 'integer' if $type eq 'int';
3927
30
48
                        $type = 'number' if $type =~ /^(num|float)$/;
3928
30
31
                        $type = 'arrayref' if $type eq 'array';
3929
30
28
                        $type = 'hashref' if $type eq 'hash';
3930
3931
30
41
                        if($type =~ /^\d+$/) {
3932
0
0
                                if($type eq '1' || $type eq '0') {
3933                                        # Try hard to guess if the result is a boolean
3934
0
0
                                        if($pod =~ /1 on success.+0 (on|if) /i) {
3935
0
0
                                                $type = 'boolean';
3936                                        } elsif($pod =~ /return 0 .+ 1 on success/) {
3937
0
0
                                                $type = 'boolean';
3938                                        } else {
3939
0
0
                                                $type = 'integer';
3940                                        }
3941                                } else {
3942
0
0
                                        $type = 'integer';
3943                                }
3944                        }
3945
3946
30
34
                        $type = 'arrayref' if !$type && $pod =~ /returns?\s+.+\slist\b/i;
3947                        # $output->{type} = $type if $type && $type !~ /^\d+$/;
3948
30
36
                        if ($VALID_OUTPUT_TYPES{$type}) {
3949
9
11
                                $output->{type} = $type;
3950
9
14
                                $self->_log("  OUTPUT: Inferred type from POD: $type");
3951                        } else {
3952
21
33
                                $self->_log("  OUTPUT: POD return type '$type' is not a valid type, ignoring");
3953                        }
3954                }
3955        }
3956}
3957
3958# --------------------------------------------------
3959# _extract_defaults_from_pod
3960#
3961# Purpose:    Extract default values for parameters
3962#             from POD documentation using multiple
3963#             pattern strategies.
3964#
3965# Entry:      $pod - POD string for the method.
3966#                    May be undef or empty.
3967#
3968# Exit:       Returns a hashref of parameter name
3969#             to cleaned default value. Returns an
3970#             empty hashref if no POD is provided
3971#             or no defaults are found.
3972#
3973# Side effects: None.
3974#
3975# Notes:      Three strategies are tried: (1) lines
3976#             containing 'Default:' or 'Defaults to:',
3977#             (2) lines containing 'Optional, default',
3978#             (3) inline $name - type, default value
3979#             format. Parameter names are inferred
3980#             by scanning backwards from the default
3981#             phrase to the nearest $variable.
3982# --------------------------------------------------
3983sub _extract_defaults_from_pod {
3984
121
2457
        my ($self, $pod) = @_;
3985
3986
121
110
        return {} unless $pod;
3987
3988
120
105
        my %defaults;
3989
3990        # Pattern 1: Default: 'value' or Defaults to: 'value'
3991
120
204
        while ($pod =~ /(?:Default(?:s? to)?|default(?:s? to)?)[:]\s*([^\n\r]+)/gi) {
3992
19
19
                my $default_text = $1;
3993
19
12
                my $match_pos = pos($pod);
3994
19
27
                $default_text =~ s/^\s+|\s+$//g;
3995
3996                # Look backwards in the POD to find the parameter name
3997
19
17
                my $context = substr($pod, 0, $match_pos);
3998
19
36
                my @param_matches = ($context =~ /\$(\w+)/g);
3999
19
19
                my $param = $param_matches[-1] if @param_matches;  # Last parameter before default
4000
4001
19
33
                if ($param) {
4002                        # Always clean the default value - let _clean_default_value handle everything
4003
19
18
                        if ($default_text =~ /(\w+)\s*=\s*(.+)$/) {
4004                                # Has explicit param = value format in the default text
4005
0
0
                                my ($p, $value) = ($1, $2);
4006
0
0
                                $defaults{$p} = $self->_clean_default_value($value);
4007                        } else {
4008                                # Just a value, associate with the found param
4009
19
20
                                $defaults{$param} = $self->_clean_default_value($default_text, 0);  # NOT from code
4010                        }
4011                }
4012        }
4013
4014        # Pattern 2: Optional, default 'value'
4015
120
181
        while ($pod =~ /Optional(?:,)?\s+(?:default|value)\s*[:=]?\s*([^\n\r,;]+)/gi) {
4016
6
8
                my $default_text = $1;
4017
6
5
                my $match_pos = pos($pod);
4018
6
11
                $default_text =~ s/^\s+|\s+$//g;
4019
4020                # Look backwards for parameter name
4021
6
6
                my $context = substr($pod, 0, $match_pos);
4022
6
13
                my @param_matches = ($context =~ /\$(\w+)/g);
4023
6
8
                if (@param_matches) {
4024
6
6
                        my $param = $param_matches[-1];  # Last parameter before the default
4025
6
4
                        $defaults{$param} = $self->_clean_default_value($default_text, 0);
4026                }
4027        }
4028
4029        # Pattern 3: In parameter descriptions: $param - type, default 'value'
4030
120
284
        while ($pod =~ /\$(\w+)\s*-\s*\w+(?:\([^)]*\))?[,\s]+default\s+['"]?([^'",\n]+)['"]?/gi) {
4031
1
2
                my ($param, $value) = ($1, $2);
4032
1
2
                $defaults{$param} = $self->_clean_default_value($value, 0);
4033        }
4034
4035
120
119
        return \%defaults;
4036}
4037
4038# --------------------------------------------------
4039# _analyze_output_from_code
4040#
4041# Purpose:    Analyse return statements in a method
4042#             body to infer the output type by
4043#             counting and classifying each return
4044#             expression.
4045#
4046# Entry:      $output      - hashref to populate
4047#                            (modified in place).
4048#             $code        - method body source string.
4049#             $method_name - method name string.
4050#
4051# Exit:       Returns nothing. Modifies $output
4052#             in place.
4053#
4054# Side effects: Logs detections to stdout when
4055#               verbose is set.
4056# --------------------------------------------------
4057sub _analyze_output_from_code
4058{
4059
289
270
        my ($self, $output, $code, $method_name) = @_;
4060
4061
289
263
        if ($code) {
4062                # Early boolean detection - check for consistent 1/0 returns
4063
289
767
                my @all_returns = $code =~ /return\s+([^;]+);/g;
4064
289
274
                if (@all_returns) {
4065
271
203
                        my $boolean_count = 0;
4066
271
211
                        my $total_count = scalar(@all_returns);
4067
4068
271
239
                        foreach my $ret (@all_returns) {
4069
297
596
                                $ret =~ s/^\s+|\s+$//g;
4070                                # Match 0 or 1, even with conditions
4071
297
397
                                $boolean_count++ if ($ret =~ /^(?:0|1)(?:\s|$)/);
4072                        }
4073
4074                        # If most returns are 0 or 1, strongly suggest boolean
4075
271
329
                        if ($boolean_count >= 2 && $boolean_count >= $total_count * 0.8) {
4076
3
4
                                unless ($output->{type}) {
4077
3
6
                                        $output->{type} = 'boolean';
4078
3
6
                                        $self->_log("  OUTPUT: Early detection - $boolean_count/$total_count returns are 0/1, setting boolean");
4079                                }
4080                        }
4081                }
4082
4083
289
194
                my @return_statements;
4084
4085
289
1082
                if ($code =~ /return\s+bless\s*\{[^}]*\}\s*,\s*['"]?(\w+)['"]?/s) {
4086                        # Detect blessed refs
4087
2
3
                        $output->{type} = 'object';
4088
2
3
                        if($method_name eq 'new') {
4089                                # If we found the new() method, the object we're returning should be a sensible one
4090
0
0
                                if($self->{_document} && (my $package_stmt = $self->{_document}->find_first('PPI::Statement::Package'))) {
4091
0
0
                                        $output->{isa} = $package_stmt->namespace();
4092
0
0
                                        $self->{_package_name} //= $output->{isa};
4093                                }
4094                        } else {
4095
2
3
                                $output->{isa} = $1;
4096                        }
4097
2
5
                        $self->_log("  OUTPUT: Bless found, inferring type from code is $output->{isa}");
4098                } elsif ($code =~ /return\s+bless/s) {
4099
14
20
                        $output->{type} = 'object';
4100
14
21
                        if($method_name eq 'new') {
4101
13
16
                                $output->{isa} = $self->_extract_package_name();
4102
13
175
                                $self->_log("  OUTPUT: Bless found, inferring type from code is $output->{isa}");
4103                        } else {
4104
1
2
                                $self->_log('  OUTPUT: Bless found, inferring type from code is object');
4105                        }
4106                } elsif ($code =~ /return\s*\(\s*[^)]+\s*,\s*[^)]+\s*\)\s*;/) {
4107                        # Detect array context returns - must end with semicolon to be actual return
4108
1
1
                        $output->{type} = 'array';   # Not arrayref - actual array
4109
1
1
                        $self->_log('  OUTPUT: Found array contect return');
4110                } elsif ($code =~ /return\s+bless[^,]+,\s*__PACKAGE__/) {
4111                        # Detect: bless {}, __PACKAGE__
4112
0
0
                        $output->{type} = 'object';
4113                        # Get package name from the extractor's stored document
4114
0
0
                        if ($self->{_document}) {
4115
0
0
                                my $pkg = $self->{_document}->find_first('PPI::Statement::Package');
4116
0
0
                                $output->{isa} = $pkg ? $pkg->namespace : 'UNKNOWN';
4117
0
0
                                $self->_log('  OUTPUT: Object blessed into __PACKAGE__: ' . ($output->{isa} || 'UNKNOWN'));
4118
0
0
                                $self->{_package_name} //= $output->{isa};
4119                        }
4120                } elsif ($code =~ /return\s*\(([^)]+)\)/) {
4121
1
1
                        my $content = $1;
4122
1
2
                        if ($content =~ /,/) {  # Has comma = multiple values
4123
0
0
                                $output->{type} = 'array';
4124                        }
4125                } elsif ($code =~ /return\s+\$self\s*;/ && $code =~ /\$self\s*->\s*\{[^}]+\}\s*=/) {
4126                        # Returns $self for chaining
4127
6
7
                        $output->{type} = 'object';
4128
6
11
                        if ($self->{_document}) {
4129
6
11
                                my $pkg = $self->{_document}->find_first('PPI::Statement::Package');
4130
6
501
                                $output->{isa} = $pkg ? $pkg->namespace : 'UNKNOWN';
4131
6
84
                                $self->_log('  OUTPUT: Object chained into __PACKAGE__: ' . ($output->{isa} || 'UNKNOWN'));
4132
6
9
                                $self->{_package_name} //= $output->{isa};
4133                        }
4134                }
4135
4136                # Find all return statements
4137
289
587
                while ($code =~ /return\s+([^;]+);/g) {
4138
297
326
                        my $return_expr = $1;
4139
297
398
                        push @return_statements, $return_expr;
4140                }
4141
4142
289
254
                if (@return_statements) {
4143
271
461
                        $self->_log('  OUTPUT: Found ' . scalar(@return_statements) . ' return statement(s)');
4144
4145                        # Analyze return patterns
4146
271
222
                        my %return_types;
4147
4148
271
255
                        if($output->{'type'}) {
4149
38
51
                                $return_types{$output->{'type'}} += 3;       # Add weighting to what's already been found
4150                        }
4151
271
169
                        my $min;
4152
271
222
                        foreach my $ret (@return_statements) {
4153
297
491
                                $ret =~ s/^\s+|\s+$//g;
4154
4155                                # Literal values
4156
297
1609
                                if ($ret eq '1' || $ret eq '0') {
4157
26
59
                                        $return_types{boolean}++;
4158                                } elsif ($ret =~ /^['"]/) {
4159
16
23
                                        $return_types{string}++;
4160                                } elsif ($ret =~ /^-?\d+$/) {
4161
103
118
                                        $return_types{integer}++;
4162                                } elsif ($ret =~ /^-?\d+\.\d+$/) {
4163
0
0
                                        $return_types{number}++;
4164                                } elsif ($ret eq 'undef') {
4165
1
2
                                        $return_types{undef}++;
4166                                } elsif ($ret =~ /^\[/) {
4167                                # Data structures
4168
0
0
                                        $return_types{arrayref}++;
4169                                } elsif ($ret =~ /^\{/) {
4170
1
1
                                        $return_types{hashref}++;
4171                                } elsif ($ret =~ m{
4172                                        # Numeric expressions (heuristic, medium confidence)
4173                                        # Don't match ->
4174                                    (?:
4175                                        \+ | -\b | \* | / | %
4176                                      | \+\+ | --
4177                                    )
4178                                }x) {
4179
42
62
                                        $return_types{number} += 2;
4180                                } elsif ($ret =~ /\|\|\s*\d+\b/) {
4181                                        # Logical-or fallback with numeric literal (e.g. $x || 200)
4182
0
0
                                        $return_types{integer} += 2;
4183
0
0
                                        $self->_log("  OUTPUT: Numeric fallback expression detected");
4184                                } elsif($ret =~ /^length[\s\(]/) {
4185
0
0
                                        $return_types{integer}++;
4186
0
0
                                        $min = 0;
4187                                } elsif($ret =~ /^pos[\s\(]/) {
4188
0
0
                                        $return_types{integer}++;
4189
0
0
                                        $min = 0;
4190                                } elsif($ret =~ /^index[\s\(]/) {
4191
0
0
                                        $return_types{integer}++;
4192
0
0
                                        $min = -1;
4193                                } elsif($ret =~ /^rindex[\s\(]/) {
4194
0
0
                                        $return_types{integer}++;
4195
0
0
                                        $min = -1;
4196                                } elsif($ret =~ /^ord[\s\(]/) {
4197
0
0
                                        $return_types{integer}++;
4198                                } elsif ($ret =~ /=/ && $ret =~ /\$\w+/) {
4199                                        # Assignment returning a value (e.g. $self->{status} = $status)
4200                                        # If assignment involves a numeric literal or variable, assume numeric intent
4201
2
5
                                        if ($ret =~ /\b\d+\b/) {
4202
0
0
                                                $return_types{integer} += 2;
4203
0
0
                                                $self->_log("  OUTPUT: Assignment with numeric value detected");
4204                                        } else {
4205
2
3
                                                $return_types{scalar}++;
4206                                        }
4207                                }
4208                                # Variables/expressions
4209                                elsif ($ret =~ /\$\w+/) {
4210
94
247
                                        if ($ret =~ /\\\@/) {
4211
0
0
                                                $return_types{arrayref}++;
4212                                        } elsif ($ret =~ /\\\%/) {
4213
0
0
                                                $return_types{hashref}++;
4214                                        } elsif ($ret =~ /bless/) {
4215
1
2
                                                $return_types{object} += 2;     # Heigher weight
4216                                        } elsif ($ret =~ /^\{[^}]*\}$/) {
4217
0
0
                                                $return_types{hashref}++;
4218                                        } elsif ($ret =~ /^\[[^\]]*\]$/) {
4219
0
0
                                                $return_types{arrayref}++;
4220                                        } else {
4221
93
131
                                                $return_types{scalar}++;
4222                                        }
4223                                }
4224                        }
4225
4226                        # Determine most common return type
4227
271
266
                        if (keys %return_types) {
4228
267
40
375
69
                                my ($most_common) = sort { $return_types{$b} <=> $return_types{$a} } keys %return_types;
4229                                # Prefer integer over scalar if numeric returns dominate
4230
267
369
                                if ($return_types{integer} && (!$return_types{string})) {
4231
101
115
                                        if (!$output->{type} || $output->{type} eq 'scalar') {
4232
101
99
                                                $output->{type} = 'integer';
4233
101
115
                                                $self->_log("  OUTPUT: Numeric returns dominate, forcing integer");
4234
101
188
                                                $output->{_type_confidence} ||= 'low';
4235
101
85
                                                if(defined($min)) {
4236
0
0
                                                        $output->{min} = $min;
4237                                                }
4238                                        }
4239                                }
4240
267
278
                                unless ($output->{type}) {
4241
128
134
                                        $output->{type} = $most_common;
4242
4243                                        # Assign confidence for inferred numeric expressions
4244
128
140
                                        if ($most_common eq 'number') {
4245
29
74
                                                $output->{_type_confidence} ||= 'medium';
4246
29
34
                                                if(defined($min)) {
4247
0
0
                                                        $output->{min} = $min;
4248                                                }
4249                                        }
4250
4251
128
154
                                        $self->_log("  OUTPUT: Inferred type from code: $most_common");
4252                                }
4253                        }
4254
4255                        # Check for consistent single value returns
4256
271
696
                        if (@return_statements == 1 && $return_statements[0] eq '1') {
4257
21
26
                                $output->{value} = 1;
4258
21
47
                                $output->{type} = 'boolean' if !$output->{type} || $output->{type} eq 'scalar';
4259
21
44
                                $self->_log("  OUTPUT: Type already set to '$output->{type}', overriding with boolean") if($output->{'type'});
4260                        }
4261                } else {
4262                        # No explicit return - might return nothing or implicit undef
4263
18
24
                        $self->_log("  OUTPUT: No explicit return statement found");
4264                }
4265        }
4266}
4267
4268# --------------------------------------------------
4269# _enhance_boolean_detection
4270#
4271# Purpose:    Apply additional boolean-specific
4272#             detection heuristics using a weighted
4273#             scoring system, to override weak
4274#             type assignments when there is strong
4275#             evidence of a boolean return.
4276#
4277# Entry:      $output      - output hashref
4278#                            (modified in place).
4279#             $pod         - POD string.
4280#             $code        - method body source string.
4281#             $method_name - method name string.
4282#
4283# Exit:       Returns nothing. Modifies $output
4284#             in place, setting type to 'boolean'
4285#             if the score reaches
4286#             $BOOLEAN_SCORE_THRESHOLD.
4287#
4288# Side effects: Logs scoring details to stdout when
4289#               verbose is set.
4290#
4291# Notes:      Only fires when output type is
4292#             not yet set or is 'unknown'. Does not
4293#             override explicitly set types.
4294# --------------------------------------------------
4295sub _enhance_boolean_detection {
4296
288
323
        my ($self, $output, $pod, $code, $method_name) = @_;
4297
4298
288
191
        my $boolean_score = 0;  # Track evidence for boolean return
4299
4300
288
553
        return unless !$output->{type} || $output->{type} eq 'unknown';
4301
4302        # Look for stronger boolean indicators
4303
24
41
        if ($pod && !$output->{type}) {
4304                # Common boolean return patterns in POD
4305
3
7
                if ($pod =~ /returns?\s+(true|false|true|false|1|0)\s+(?:on|for|upon)\s+(success|failure|error|valid|invalid)/i) {
4306
0
0
                        $boolean_score += 30;
4307
0
0
                        $self->_log('  OUTPUT: Strong boolean indicator in POD (+30)');
4308                }
4309
4310                # Check for method names that suggest boolean returns
4311
3
11
                if ($pod =~ /(?:method|sub)\s+(\w+)/) {
4312
0
0
                        my $inferred_method_name = $1;
4313
0
0
                        if ($inferred_method_name =~ /^(is_|has_|can_|should_|contains_|exists_)/) {
4314
0
0
                                $boolean_score += 20;
4315
0
0
                                $self->_log("  OUTPUT: Inferred method name '$inferred_method_name' suggests boolean return (+20)");
4316                        }
4317                }
4318        }
4319
4320        # Analyze code for boolean patterns
4321
24
25
        if ($code) {
4322                # Count boolean return idioms
4323
24
40
                my $true_returns = () = $code =~ /return\s+1\s*;/g;
4324
24
31
                my $false_returns = () = $code =~ /return\s+0\s*;/g;
4325
4326
24
46
                if ($true_returns + $false_returns >= 2) {
4327
0
0
                        $boolean_score += 40;
4328
0
0
                        $self->_log('  OUTPUT: Multiple 1/0 returns suggest boolean (+40)');
4329                } elsif ($true_returns + $false_returns == 1) {
4330
2
1
                        $boolean_score += 10;
4331
2
2
                        $self->_log('  OUTPUT: Single 1/0 return (+10)');
4332                }
4333
4334                # Ternary operators that return booleans
4335
24
41
                if ($code =~ /return\s+(?:\w+\s*[!=]=\s*\w+|\w+\s*>\s*\w+|\w+\s*<\s*\w+)\s*\?\s*(?:1|0)\s*:\s*(?:1|0)/) {
4336
0
0
                        $boolean_score += 25;
4337
0
0
                        $self->_log('  OUTPUT: Ternary with 1/0 suggests boolean (+25)');
4338                }
4339
4340                # Check for common boolean method patterns
4341
24
41
                if ($code =~ /return\s+[!\$\@\%]/) {
4342                        # Returns negation or existence check
4343
0
0
                        $boolean_score += 15;
4344
0
0
                        $self->_log('  OUTPUT: Returns negation/existence check (+15)');
4345                }
4346        }
4347
4348        # Check method name for boolean indicators
4349
24
24
        if ($method_name) {
4350
24
40
                if ($method_name =~ /^(is_|has_|can_|should_|contains_|exists_|check_|verify_|validate_)/) {
4351
2
1
                        $boolean_score += 25;
4352
2
4
                        $self->_log("  OUTPUT: Method name '$method_name' suggests boolean return (+25)");
4353                }
4354
24
33
                if ($method_name =~ /_ok$/) {
4355
0
0
                        $boolean_score += 30;
4356
0
0
                        $self->_log("  OUTPUT: Method name '$method_name' ends with '_ok' (+30)");
4357                }
4358        }
4359
4360        # Apply boolean type if we have strong evidence
4361        # Override weak type assignments (like 'array' from false positive)
4362
24
42
        if($boolean_score >= $BOOLEAN_SCORE_THRESHOLD) {
4363
2
8
                if (!$output->{type} || $output->{type} eq 'scalar' || $output->{type} eq 'array' || $output->{type} eq 'undef') {
4364
2
4
                        my $old_type = $output->{type} || 'none';
4365
2
2
                        $output->{type} = 'boolean';
4366
2
4
                        $self->_log("  OUTPUT: Boolean score $boolean_score >= $BOOLEAN_SCORE_THRESHOLD, setting type to boolean (was: $old_type)");
4367                }
4368        }
4369}
4370
4371# --------------------------------------------------
4372# _detect_list_context
4373#
4374# Purpose:    Detect methods that return different
4375#             values depending on calling context
4376#             via wantarray, and methods that
4377#             return explicit lists.
4378#
4379# Entry:      $output - output hashref (modified
4380#                       in place).
4381#             $code   - method body source string.
4382#
4383# Exit:       Returns nothing. Modifies $output
4384#             in place, setting _context_aware,
4385#             _list_context, _scalar_context,
4386#             _list_return, and/or type keys.
4387#
4388# Side effects: Logs detections to stdout when
4389#               verbose is set.
4390# --------------------------------------------------
4391sub _detect_list_context {
4392
288
263
        my ($self, $output, $code) = @_;
4393
288
262
        return unless $code;
4394
4395        # Check for wantarray usage
4396
288
336
        if ($code =~ /wantarray/) {
4397
5
4
                $output->{_context_aware} = 1;
4398
5
15
                $self->_log('  OUTPUT: Method uses wantarray - context sensitive');
4399
4400                # Debug: show what we're matching against
4401
5
11
                if ($code =~ /(wantarray[^;]+;)/s) {
4402
4
9
                        $self->_log("  DEBUG wantarray line: $1");
4403                }
4404
4405
5
28
                if ($code =~ /wantarray\s*\?\s*\(([^)]+)\)\s*:\s*([^;]+)/s) {
4406                        # Pattern 1: wantarray ? (list, items) : scalar_value (with parens)
4407
1
2
                        my ($list_return, $scalar_return) = ($1, $2);
4408
1
3
                        $self->_log("  DEBUG list (with parens): [$list_return], scalar: [$scalar_return]");
4409
4410
1
1
                        $output->{_list_context} = $self->_infer_type_from_expression($list_return);
4411
1
2
                        $output->{_scalar_context} = $self->_infer_type_from_expression($scalar_return);
4412
1
1
                        $self->_log('  OUTPUT: Detected context-dependent returns (parenthesized)');
4413                } elsif ($code =~ /wantarray\s*\?\s*([^:]+?)\s*:\s*([^;]+)/s) {
4414                        # Pattern 2: wantarray ? @array : scalar (no parens around list)
4415
3
4
                        my ($list_return, $scalar_return) = ($1, $2);
4416                        # Clean up
4417
3
7
                        $list_return =~ s/^\s+|\s+$//g;
4418
3
6
                        $scalar_return =~ s/^\s+|\s+$//g;
4419
4420
3
7
                        $self->_log("  DEBUG list (no parens): [$list_return], scalar: [$scalar_return]");
4421
4422
3
6
                        $output->{_list_context} = $self->_infer_type_from_expression($list_return);
4423
3
5
                        $output->{_scalar_context} = $self->_infer_type_from_expression($scalar_return);
4424
3
4
                        $self->_log('  OUTPUT: Detected context-dependent returns (non-parenthesized)');
4425                } elsif ($code =~ /return[^;]*unless\s+wantarray.*?return\s*\(([^)]+)\)/s) {
4426                        # Pattern 3: return unless wantarray; return (list);
4427
1
2
                        $output->{_list_context} = { type => 'array' };
4428
1
1
                        $self->_log('  OUTPUT: Detected list context return after wantarray check');
4429                }
4430        }
4431
4432        # Detect explicit list returns (multiple values in parentheses)
4433        # Avoid false positives from function calls
4434
288
394
        if ($code =~ /return\s*\(\s*([^)]+)\s*\)\s*;/) {
4435
4
5
                my $content = $1;
4436
4437                # Count commas outside of nested structures, jumping over each
4438                # balanced bracketed block in one step via extract_bracketed
4439
4
334
                require Text::Balanced;
4440
4
4192
                my $comma_count = 0;
4441
4
4
                my $rest = $content;
4442
4
5
                while (length $rest) {
4443
44
48
                        if (substr($rest, 0, 1) =~ /[(\[{]/) {
4444
5
7
                                my $extracted = Text::Balanced::extract_bracketed($rest, '(){}[]');
4445
5
349
                                last unless defined $extracted; # Unbalanced brackets
4446
5
6
                                next;
4447                        }
4448
39
29
                        $comma_count++ if substr($rest, 0, 1) eq ',';
4449
39
29
                        $rest = substr($rest, 1);
4450                }
4451
4452
4
12
                if ($comma_count > 0 && $content !~ /\b(?:bless|new)\b/) {
4453                        # Multiple values returned
4454
3
8
                        unless ($output->{type} && $output->{type} eq 'boolean') {
4455
3
4
                                $output->{type} = 'array';
4456
3
4
                                $output->{_list_return} = $comma_count + 1;
4457
3
6
                                $self->_log('  OUTPUT: Returns list of ' . ($comma_count + 1) . ' values');
4458                        }
4459                }
4460        }
4461}
4462
4463# --------------------------------------------------
4464# _detect_void_context
4465#
4466# Purpose:    Detect methods that return nothing
4467#             meaningful (void context), methods
4468#             that always return 1 as a success
4469#             indicator, and methods whose name
4470#             suggests void context (setters,
4471#             mutators, loggers).
4472#
4473# Entry:      $output      - output hashref
4474#                            (modified in place).
4475#             $code        - method body source string.
4476#             $method_name - method name string.
4477#
4478# Exit:       Returns nothing. Modifies $output
4479#             in place, setting _void_context,
4480#             _success_indicator, and/or type.
4481#
4482# Side effects: Logs detections to stdout when
4483#               verbose is set.
4484# --------------------------------------------------
4485sub _detect_void_context {
4486
288
256
        my ($self, $output, $code, $method_name) = @_;
4487
288
253
        return unless $code;
4488
4489
288
368
        $self->_log("  DEBUG _detect_void_context called for $method_name");
4490
4491        # Methods that typically don't return meaningful values
4492
288
941
        my $void_patterns = {
4493                'setter' => qr/^set_\w+$/,
4494                'mutator' => qr/^(?:add|remove|delete|clear|reset|update)_/,
4495                'logger' => qr/^(?:log|debug|warn|error|info)$/,
4496                'printer' => qr/^(?:print|say|dump)_/,
4497        };
4498
4499        # Check if method name suggests void context
4500
288
344
        foreach my $type (keys %$void_patterns) {
4501
1134
1755
                if ($method_name =~ $void_patterns->{$type}) {
4502
11
12
                        $output->{_void_context_hint} = $type;
4503
11
20
                        $self->_log("  OUTPUT: Method name suggests $type (typically void context)");
4504
11
11
                        last;
4505                }
4506        }
4507
4508        # Analyze return statements
4509
288
570
        my @returns = $code =~ /return\s*([^;]*);/g;
4510
4511
288
415
        $self->_log('  DEBUG Found ' . scalar(@returns) . ' return statements');
4512
4513        # Count different return patterns
4514
288
236
        my $no_value_returns = 0;
4515
288
187
        my $true_returns = 0;
4516
288
222
        my $self_returns = 0;
4517
4518
288
223
        foreach my $ret (@returns) {
4519
300
482
                $ret =~ s/^\s+|\s+$//g;
4520
300
365
                $self->_log("  DEBUG return value: [$ret]");
4521
300
297
                $no_value_returns++ if $ret eq '';
4522
300
331
                $no_value_returns++ if($ret =~ /^(if|unless)\s/);
4523
300
289
                $true_returns++ if $ret eq '1';
4524
300
253
                $self_returns++ if $ret eq '$self';
4525
300
306
                if ($ret =~ /\?\s*1\s*:\s*0\b/) {
4526                        # Strong boolean signal: ternary returning 1/0
4527
9
8
                        $true_returns++;
4528                        # $self->_log("  OUTPUT: Ternary 1:0 return detected, treating as boolean (+40)");
4529
9
10
                        $self->_log('  OUTPUT: Ternary 1:0 return detected, treating as boolean');
4530                }
4531        }
4532
4533
288
239
        my $total_returns = scalar(@returns);
4534
4535
288
425
        $self->_log("  DEBUG no_value=$no_value_returns, true=$true_returns, self=$self_returns, total=$total_returns");
4536
4537        # Void context indicators
4538
288
899
        if ($no_value_returns > 0 && $no_value_returns == $total_returns) {
4539
6
6
                $output->{_void_context} = 1;
4540
6
6
                $output->{type} = 'void';  # This should override any previous type
4541
6
6
                $self->_log('  OUTPUT: All returns are empty - void context method');
4542        } elsif ($true_returns > 0 && $true_returns == $total_returns && $total_returns >= 1) {
4543                # Methods that always return true (success indicator)
4544
31
37
                $output->{_success_indicator} = 1;
4545                # Don't override type if already set to boolean
4546
31
68
                unless ($output->{type} && $output->{type} eq 'boolean') {
4547
2
2
                        $output->{type} = 'boolean';
4548                }
4549
31
31
                $self->_log('  OUTPUT: Always returns 1 - success indicator pattern');
4550        }
4551}
4552
4553# --------------------------------------------------
4554# _detect_chaining_pattern
4555#
4556# Purpose:    Detect methods that return $self for
4557#             fluent interface chaining, by counting
4558#             the proportion of return statements
4559#             that return $self.
4560#
4561# Entry:      $output - output hashref (modified
4562#                       in place).
4563#             $code   - method body source string.
4564#
4565# Exit:       Returns nothing. Modifies $output
4566#             in place, setting type to 'object',
4567#             _returns_self to 1, and isa to the
4568#             current package name when the
4569#             proportion of $self returns is >= 0.8.
4570#
4571# Side effects: Logs detection to stdout when
4572#               verbose is set.
4573# --------------------------------------------------
4574sub _detect_chaining_pattern {
4575
287
257
        my ($self, $output, $code) = @_;
4576
287
256
        return unless $code;
4577
4578        # Count returns of $self
4579
287
197
        my $self_returns = 0;
4580
287
162
        my $total_returns = 0;
4581
4582
287
523
        while ($code =~ /return\s+([^;]+);/g) {
4583
294
269
                my $ret = $1;
4584
294
424
                $ret =~ s/^\s+|\s+$//g;
4585
294
187
                $total_returns++;
4586
294
383
                $self_returns++ if $ret eq '$self';
4587        }
4588
4589        # If most/all returns are $self, it's a chaining method
4590
287
368
        if ($self_returns > 0 && $total_returns > 0) {
4591
8
10
                my $ratio = $self_returns / $total_returns;
4592
4593
8
8
                if ($ratio >= 0.8) {
4594
6
5
                        $output->{type} = 'object';
4595
6
6
                        $output->{_returns_self} = 1;
4596
4597                        # Get the class name
4598
6
11
                        if ($self->{_document}) {
4599
5
8
                                my $pkg = $self->{_document}->find_first('PPI::Statement::Package');
4600
5
424
                                $output->{isa} = $pkg ? $pkg->namespace : 'UNKNOWN';
4601
5
68
                                $self->{_package_name} //= $output->{isa};
4602                        }
4603
4604
6
12
                        $self->_log("  OUTPUT: Chainable method - returns \$self ($self_returns/$total_returns returns)");
4605                }
4606        }
4607}
4608
4609# --------------------------------------------------
4610# _detect_error_conventions
4611#
4612# Purpose:    Analyse how a method signals errors
4613#             by detecting patterns such as
4614#             'return undef if', implicit bare
4615#             returns, empty list returns, 0/1
4616#             boolean error patterns, and eval
4617#             exception handling.
4618#
4619# Entry:      $output - output hashref (modified
4620#                       in place).
4621#             $code   - method body source string.
4622#
4623# Exit:       Returns nothing. Modifies $output
4624#             in place, setting _error_handling,
4625#             _error_return, and
4626#             _success_failure_pattern keys.
4627#
4628# Side effects: Logs detections to stdout when
4629#               verbose is set.
4630# --------------------------------------------------
4631sub _detect_error_conventions {
4632
288
253
        my ($self, $output, $code) = @_;
4633
4634
288
240
        return unless $code;
4635
4636
288
300
        $self->_log('  DEBUG _detect_error_conventions called');
4637
4638
288
185
        my %error_patterns;
4639
4640        # Pattern 1: return undef if/unless condition
4641
288
362
        while ($code =~ /return\s+undef\s+(?:if|unless)\s+([^;]+);/g) {
4642
7
7
4
10
                push @{$error_patterns{undef_on_error}}, $1;
4643
7
22
                $self->_log("  DEBUG Found 'return undef' pattern");
4644        }
4645
4646        # Pattern 2: return if/unless (implicit undef)
4647
288
399
        while ($code =~ /return\s+(?:if|unless)\s+([^;]+);/g) {
4648
7
7
5
11
                push @{$error_patterns{implicit_undef}}, $1;
4649
7
9
                $self->_log("  DEBUG Found implicit undef pattern");
4650        }
4651
4652        # Pattern 3: return () - matches with or without conditions
4653
288
318
        if ($code =~ /return\s*\(\s*\)\s*(?:if|unless|;)/) {
4654
3
4
                $error_patterns{empty_list} = 1;
4655
3
4
                $self->_log("  DEBUG Found empty list return");
4656        }
4657
4658        # Pattern 4: return 0/1 pattern (indicates boolean with error handling)
4659
288
195
        my $zero_returns = 0;
4660
288
186
        my $one_returns = 0;
4661        # Match "return 0" or "return 1" followed by anything (condition or semicolon)
4662
288
490
        while ($code =~ /return\s+(0|1)\s*(?:;|if|unless)/g) {
4663
29
38
                if ($1 eq '0') {
4664
3
5
                        $zero_returns++;
4665                } else {
4666
26
30
                        $one_returns++;
4667                }
4668        }
4669
4670
288
287
        if ($zero_returns > 0 && $one_returns > 0) {
4671
2
2
                $error_patterns{zero_on_error} = 1;
4672
2
4
                $self->_log("  DEBUG Found 0/1 return pattern ($zero_returns zeros, $one_returns ones)");
4673        }
4674
4675        # Pattern 5: Exception handling with eval
4676
288
317
        if ($code =~ /eval\s*\{/) {
4677                # Check if there's error handling after eval
4678
3
12
                if ($code =~ /eval\s*\{.*?\}[^}]*(?:if\s*\(\s*\$\@|catch|return\s+undef)/s) {
4679
3
3
                        $error_patterns{exception_handling} = 1;
4680
3
3
                        $self->_log('  DEBUG Found exception handling with eval');
4681                }
4682        }
4683
4684        # Detect success/failure return pattern
4685
288
431
        my @all_returns = $code =~ /return\s+([^;]+);/g;
4686
288
298
258
357
        my $has_undef = grep { /^\s*undef\s*(?:if|unless|$)/ } @all_returns;
4687
288
298
210
602
        my $has_value = grep { !/^\s*undef\s*$/ && !/^\s*$/ } @all_returns;
4688
4689
288
305
        if ($has_undef && $has_value && scalar(@all_returns) >= 2) {
4690
6
59
                $output->{_success_failure_pattern} = 1;
4691
6
6
                $self->_log("  OUTPUT: Uses success/failure return pattern");
4692        }
4693
4694        # Store error conventions in output
4695
288
263
        if(scalar(keys %error_patterns)) {
4696
17
21
                $output->{_error_handling} = \%error_patterns;
4697
4698                # Determine primary error convention
4699
17
48
                if ($error_patterns{undef_on_error}) {
4700
5
7
                        $output->{_error_return} = 'undef';
4701
5
3
                        $self->_log("  OUTPUT: Returns undef on error");
4702                } elsif ($error_patterns{implicit_undef}) {
4703
6
7
                        $output->{_error_return} = 'undef';
4704
6
6
                        $self->_log("  OUTPUT: Returns implicit undef on error");
4705                } elsif ($error_patterns{empty_list}) {
4706
3
5
                        $output->{_error_return} = 'empty_list';
4707
3
5
                        $self->_log("  OUTPUT: Returns empty list on error");
4708                } elsif ($error_patterns{zero_on_error}) {
4709
2
2
                        $output->{_error_return} = 'false';
4710
2
4
                        $self->_log("  OUTPUT: Returns 0/false on error");
4711                }
4712
4713
17
29
                if ($error_patterns{exception_handling}) {
4714
3
4
                        $self->_log("  OUTPUT: Has exception handling");
4715                }
4716        } else {
4717
271
313
                delete $output->{_error_handling};
4718        }
4719}
4720
4721# --------------------------------------------------
4722# _infer_type_from_expression
4723#
4724# Purpose:    Infer the data type of a return
4725#             expression string by matching it
4726#             against common Perl literal and
4727#             variable patterns.
4728#
4729# Entry:      $expr - return expression string,
4730#                     trimmed of leading and
4731#                     trailing whitespace.
4732#                     May be undef.
4733#
4734# Exit:       Returns a type hashref of the form
4735#             { type => '...' } and optionally
4736#             { min => N }. Defaults to
4737#             { type => 'scalar' } when no
4738#             pattern matches.
4739#
4740# Side effects: None.
4741# --------------------------------------------------
4742sub _infer_type_from_expression {
4743
37
638
        my ($self, $expr) = @_;
4744
4745
37
44
        return { type => 'scalar' } unless defined $expr;
4746
4747
35
72
        $expr =~ s/^\s+|\s+$//g;
4748
4749        # Check for multiple comma-separated values (indicates array/list)
4750
35
46
        if ($expr =~ /,/) {
4751
5
618
                require Text::Balanced;
4752
5
8098
                my $comma_count = 0;
4753
5
5
                my $rest = $expr;
4754
5
8
                while (length $rest) {
4755
42
40
                        if (substr($rest, 0, 1) =~ /[(\[{]/) {
4756
5
6
                                my $extracted = Text::Balanced::extract_bracketed($rest, '(){}[]');
4757
5
317
                                last unless defined $extracted; # Unbalanced brackets
4758
5
7
                                next;
4759                        }
4760
37
31
                        $comma_count++ if substr($rest, 0, 1) eq ',';
4761
37
31
                        $rest = substr($rest, 1);
4762                }
4763
4764
5
7
                if ($comma_count > 0) {
4765
3
7
                        return { type => 'array' };
4766                }
4767        }
4768
4769        # Check for @ prefix (array)
4770
32
88
        if ($expr =~ /^\@\w+/ || $expr =~ /^qw\(/ || $expr =~ /^\@\{/) {
4771
6
16
                return { type => 'array' };
4772        }
4773
4774        # Check for scalar() function - returns count
4775
26
33
        if ($expr =~ /scalar\s*\(/) {
4776
4
8
                return { type => 'integer', min => 0 };
4777        }
4778
4779        # Check for array reference
4780
22
37
        if ($expr =~ /^\[/ || $expr =~ /^\\\@/) {
4781
4
10
                return { type => 'arrayref' };
4782        }
4783
4784        # Check for hash reference
4785
18
32
        if ($expr =~ /^\{/ || $expr =~ /^\\\%/) {
4786
3
6
                return { type => 'hashref' };
4787        }
4788
4789        # Check for hash
4790
15
26
        if ($expr =~ /^\%\w+/ || $expr =~ /^\%\{/) {
4791
0
0
                return { type => 'hash' };
4792        }
4793
4794        # Check for strings
4795
15
31
        if ($expr =~ /^['"]/ || $expr =~ /['"]$/) {
4796
2
6
                return { type => 'string' };
4797        }
4798
4799        # Check for booleans first — must come before the integer check
4800        # since /^-?\d+$/ would otherwise match 0 and 1 as integers
4801
13
16
        if($expr =~ /^[01]$/) {
4802
4
11
                return { type => 'boolean' };
4803        }
4804
4805        # Check for integers
4806
9
19
        if($expr =~ /^-?\d+$/) {
4807
4
10
                return { type => 'integer' };
4808        }
4809
4810
5
9
        if ($expr =~ /^-?\d+\.\d+$/) {
4811
2
6
                return { type => 'number' };
4812        }
4813
4814        # Check for objects
4815
3
6
        if ($expr =~ /bless/) {
4816
0
0
                return { type => 'object' };
4817        }
4818
4819
3
14
        if($expr =~ /\blength\s*\(/) {
4820
2
4
                return { type => 'integer', min => 0 };
4821        }
4822
4823        # Default to scalar
4824
1
2
        return { type => 'scalar' };
4825}
4826
4827# --------------------------------------------------
4828# _detect_chaining_from_pod
4829#
4830# Purpose:    Check POD documentation for explicit
4831#             indications that a method is chainable
4832#             or part of a fluent interface.
4833#
4834# Entry:      $output - output hashref (modified
4835#                       in place).
4836#             $pod    - POD string for the method.
4837#
4838# Exit:       Returns nothing. Sets _returns_self
4839#             in $output if chaining keywords are
4840#             found.
4841#
4842# Side effects: Logs detection to stdout when
4843#               verbose is set.
4844# --------------------------------------------------
4845sub _detect_chaining_from_pod {
4846
5
17
        my ($self, $output, $pod) = @_;
4847
5
6
        return unless $pod;
4848
4849        # Look for explicit chaining documentation
4850
4
21
        if ($pod =~ /returns?\s+(?:\$)?self\b/i ||
4851                $pod =~ /chainable/i ||
4852                $pod =~ /fluent\s+interface/i ||
4853                $pod =~ /method\s+chaining/i) {
4854
4855
3
4
                $output->{_returns_self} = 1;
4856
3
3
                $self->_log("  OUTPUT: POD indicates chainable/fluent interface");
4857        }
4858}
4859
4860# --------------------------------------------------
4861# _validate_output
4862#
4863# Purpose:    Apply basic sanity checks to the
4864#             assembled output hashref and warn
4865#             about suspicious type combinations,
4866#             normalising clearly invalid types to
4867#             'string'.
4868#
4869# Entry:      $output - output hashref (modified
4870#                       in place).
4871#
4872# Exit:       Returns nothing. May modify type key
4873#             in $output. Logs warnings to stdout
4874#             when verbose is set.
4875#
4876# Side effects: None.
4877# --------------------------------------------------
4878sub _validate_output {
4879
279
2396
        my ($self, $output) = @_;
4880
4881        # Warn about suspicious combinations
4882
279
525
        if (defined $output->{type} && $output->{type} eq 'boolean' && !defined($output->{value})) {
4883
13
19
                $self->_log('  WARNING Boolean type without value - may want to set value: 1');
4884        }
4885
279
322
        if ($output->{value} && defined $output->{type} && $output->{type} ne 'boolean') {
4886
0
0
                $self->_log("  WARNING Value set but type is not boolean: $output->{type}");
4887        }
4888
279
2511
293
2092
        my %valid_types = map { $_ => 1 } qw(string integer number boolean array arrayref hashref object void);
4889
279
402
        if(exists $output->{type}) {
4890
275
443
                if(!$valid_types{$output->{type}}) {
4891
60
121
                        $self->_log("  WARNING Output value type is unknown: '$output->{type}', setting to string");
4892
60
105
                        $output->{type} = 'string';
4893                }
4894        }
4895}
4896
4897# --------------------------------------------------
4898# _parse_constraints
4899#
4900# Purpose:    Parse a constraint string extracted
4901#             from POD documentation and populate
4902#             min, max, or other constraint fields
4903#             in a parameter hashref.
4904#
4905# Entry:      $param      - hashref for the parameter
4906#                           being annotated (modified
4907#                           in place).
4908#             $constraint - the constraint string,
4909#                           e.g. '3-50', 'positive',
4910#                           '>= 0', 'min 3'.
4911#
4912# Exit:       Returns nothing. Modifies $param in
4913#             place by setting min and/or max keys.
4914#
4915# Side effects: Logs min/max values to stdout when
4916#               verbose is set.
4917# --------------------------------------------------
4918sub _parse_constraints {
4919
22
2906
        my ($self, $param, $constraint) = @_;
4920
4921        # Range: "3-50" or "1-100 chars"
4922
22
107
        if ($constraint =~ /(\d+)\s*-\s*(\d+)/) {
4923
8
20
                $param->{min} = $1;
4924
8
11
                $param->{max} = $2;
4925        }
4926        elsif ($constraint =~ /(\d+)\s*\.\.\s*(\d+)/) {
4927                # Range: 0..19
4928
2
5
                $param->{min} = $1;
4929
2
23
                $param->{max} = $2;
4930        }
4931        # Minimum: "min 3" or "at least 5"
4932        elsif ($constraint =~ /(?:min|minimum|at least)\s*(\d+)/i) {
4933
4
6
                $param->{min} = $1;
4934        }
4935        # Maximum: "max 50" or "up to 100"
4936        elsif ($constraint =~ /(?:max|maximum|up to)\s*(\d+)/i) {
4937
3
4
                $param->{max} = $1;
4938        }
4939        # Positive
4940        elsif ($constraint =~ /positive/i) {
4941
2
6
                $param->{min} = 1 if $param->{type} && $param->{type} eq 'integer';
4942
2
5
                $param->{min} = 0.01 if $param->{type} && $param->{type} eq 'number';
4943        }
4944        # Non-negative
4945        elsif ($constraint =~ /non-negative/i) {
4946
2
2
                $param->{min} = 0;
4947        } elsif($constraint =~ /(.+)?\s(.+)/) {
4948
0
0
                my ($op, $val) = ($1, $2);
4949
0
0
                if(looks_like_number($val)) {
4950
0
0
                        if ($op eq '<') {
4951
0
0
                                $param->{max} = $val - 1;
4952                        } elsif ($op eq '<=') {
4953
0
0
                                $param->{max} = $val;
4954                        } elsif ($op eq '>') {
4955
0
0
                                $param->{min} = $val + 1;
4956                        } elsif ($op eq '>=') {
4957
0
0
                                $param->{min} = $val;
4958                        }
4959                }
4960        }
4961
4962
22
28
        if(defined($param->{max})) {
4963
13
21
                $self->_log("  Set max to $param->{max}");
4964        }
4965
22
25
        if(defined($param->{min})) {
4966
18
28
                $self->_log("  Set min to $param->{min}");
4967        }
4968}
4969
4970# --------------------------------------------------
4971# _analyze_code
4972#
4973# Purpose:    Analyse a method's source code using
4974#             pattern matching to infer parameter
4975#             names, types, constraints, defaults,
4976#             and optionality. Orchestrates all
4977#             per-parameter code analysis helpers.
4978#
4979# Entry:      $code   - method body source string.
4980#             $method - method hashref (used for
4981#                       constructor-specific logic
4982#                       when extracting parameters
4983#                       from @_ patterns).
4984#
4985# Exit:       Returns a hashref of parameter name
4986#             to parameter spec hashref, with as
4987#             much type and constraint information
4988#             as could be inferred from the code.
4989#
4990# Side effects: Logs progress and warnings to stdout
4991#               when verbose is set.
4992#
4993# Notes:      Analysis is capped at max_parameters
4994#             to prevent runaway processing on
4995#             pathological methods. Falls back to
4996#             classic @_ extraction if signature
4997#             extraction found no parameters.
4998# --------------------------------------------------
4999sub _analyze_code {
5000
288
1904
        my ($self, $code, $method) = @_;
5001
5002
288
180
        my %params;
5003
5004        # Safety check - limit parameter analysis to prevent runaway processing
5005
288
222
        my $param_count = 0;
5006
5007        # Extract parameter names from various signature styles
5008
288
416
        $self->_extract_parameters_from_signature(\%params, $code);
5009
5010        # Params::Get: get_params('key', \@_) passes the param name as a string,
5011        # not as a $var in the signature, so run this unconditionally as a second
5012        # pass after the early-returning signature parsers have finished.
5013
288
346
        if($code =~ /Params::Get/) {
5014
2
3
                my $pos = scalar keys %params;
5015
2
5
                while($code =~ /get_params\s*\(\s*['"](\w+)['"]/g) {
5016
2
2
                        my $name = $1;
5017
2
2
                        next if $name =~ /^(self|class)$/i;
5018
2
7
                        $params{$name} //= { _source => 'code', position => $pos++ };
5019
2
2
                        $self->_log("  CODE: Found Params::Get parameter '$name'");
5020                }
5021        }
5022
5023
288
441
        $self->_extract_defaults_from_code(\%params, $code, $method);
5024
5025        # Infer types from defaults
5026
288
308
        foreach my $param (keys %params) {
5027
185
238
                if ($params{$param}{_default} && !$params{$param}{type}) {
5028
18
12
                        my $default = $params{$param}{_default};
5029
18
24
                        if (ref($default) eq 'HASH') {
5030
2
2
                                $params{$param}{type} = 'hashref';
5031
2
2
                                $self->_log("  CODE: $param type inferred as hashref from default");
5032                        } elsif (ref($default) eq 'ARRAY') {
5033
1
2
                                $params{$param}{type} = 'arrayref';
5034
1
1
                                $self->_log("  CODE: $param type inferred as arrayref from default");
5035                        }
5036                }
5037        }
5038
5039
288
431
        if($code =~ /(croak|die)\(.*\)\s+if\s*\(\s*scalar\(\@_\)\s*<\s*(\d+)\s*\)/s) {
5040
0
0
                my $required_count = $2;
5041
0
0
0
0
                my @param_names = sort { $params{$a}{position} <=> $params{$b}{position} } keys %params;
5042
0
0
                for my $i (0 .. $required_count-1) {
5043
0
0
                        $params{$param_names[$i]}{optional} = 0;
5044
0
0
                        $self->_log("  CODE: $param_names[$i] marked required due to croak scalar check");
5045                }
5046        } elsif ($code =~ /(croak|die)\(.*\)\s+if\s*\(\s*scalar\(\@_\)\s*==\s*(0)\s*\)/s) {
5047
0
0
                foreach my $param (keys %params) {
5048
0
0
                        $params{$param}{optional} = 0;
5049
0
0
                        $self->_log("  CODE: $param: all parameters are required due to 'scalar(@_) == 0' check");
5050                }
5051        }
5052
5053        # Analyze each parameter (with safety limit)
5054
288
308
        foreach my $param (keys %params) {
5055
185
248
                if ($param_count++ > $self->{max_parameters}) {
5056
0
0
                        $self->_log("  WARNING: Max parameters ($self->{max_parameters}) exceeded, skipping remaining");
5057
0
0
                        last;
5058                }
5059
5060
185
171
                my $p = \$params{$param};
5061
5062
185
273
                $self->_analyze_parameter_type($p, $param, $code);
5063
185
393
                $self->_analyze_parameter_constraints($p, $param, $code);
5064
185
365
                $self->_analyze_parameter_validation($p, $param, $code);
5065
185
273
                $self->_analyze_advanced_types($p, $param, $code);
5066
5067                # Defined checks
5068
185
2407
                if ($code =~ /defined\s*\(\s*\$$param\s*\)/) {
5069
1
2
                        $$p->{optional} = 0;
5070
1
2
                        $self->_log("  CODE: $param is required (defined check)");
5071                }
5072
5073                # Determine optional/required and numeric type from code
5074
185
9932
                if ($code =~ /\s*\$$param\s*(?:\/\/|\|\|)=/) {
5075                        # e.g. $var //= 5; or $var ||= 5;
5076
7
6
                        $$p->{optional} = 1;
5077
7
12
                        $self->_log("  CODE: $param is optional (default value assigned in code)");
5078                } elsif ($code =~ /\s*\$$param\s*(?:[\+\-\*\%]|\/(?!\/)|(?:\+\+)|(?:--)|(?:[\+\-\*\%]=|\/(?!\/)=)|\+\$|\$[+-])/ ) {
5079                        # Covers arithmetic usage:
5080                        # $x + $param, $param++, $param--, $x += $param, $x -= $param, etc.
5081
36
48
                        $$p->{optional} = 0;
5082
36
51
                        $$p->{type} //= 'number';
5083
36
67
                        $self->_log("  CODE: $param is required (used in arithmetic context)");
5084                } elsif ($code =~ /\$\b$param\b\s*(?:\+0|\*1)/) {
5085                        # Forces numeric context, e.g., "$param + 0" or "$param * 1"
5086
0
0
                        $$p->{optional} = 0;
5087
0
0
                        $$p->{type} //= 'number';
5088
0
0
                        $self->_log("  CODE: $param is required (numeric context)");
5089                }
5090
5091                # Required parameter checks (undef causes error)
5092
5093                # Style 1: block form
5094
185
5419
                if ($code =~ /if\s*\(\s*!\s*defined\s*\(\s*\$$param\s*\)\s*\)\s*\{([^}]+)\}/s) {
5095
0
0
                        my $block = $1;
5096
0
0
                        if ($block =~ /\b(croak|die|confess)\b/) {
5097
0
0
                                $$p->{optional} = 0;
5098
0
0
                                $self->_log("  CODE: $param is required (undef causes error)");
5099                        }
5100                }
5101
5102                # Style 2: postfix unless
5103
185
5876
                if ($code =~ /\b(croak|die|confess)\b[^;]*\bunless\s+defined\s*\(\s*\$$param\s*\)/) {
5104
0
0
                        $$p->{optional} = 0;
5105
0
0
                        $self->_log("  CODE: $param is required (postfix undef check)");
5106                }
5107
5108                # Exists checks for hash keys
5109
185
2324
                if ($code =~ /exists\s*\(\s*\$$param\s*\)/) {
5110
0
0
                        $$p->{type} = 'hashkey';
5111
0
0
                        $self->_log("  CODE: $param is a hash key");
5112                }
5113
5114                # Scalar context for arrays
5115
185
2345
                if ($code =~ /scalar\s*\(\s*\@?\$$param\s*\)/) {
5116
0
0
                        $$p->{type} = 'array';
5117
0
0
                        $self->_log("  CODE: $param used in scalar context (array)");
5118                }
5119
5120
185
252
                $self->_extract_error_constraints($p, $param, $code);
5121        }
5122
5123
288
332
        return \%params;
5124}
5125
5126# --------------------------------------------------
5127# _analyze_parameter_type
5128#
5129# Purpose:    Infer the type of a single parameter
5130#             from ref() checks, isa() calls,
5131#             bless patterns, array/hash operations,
5132#             and numeric operator usage in the
5133#             method body.
5134#
5135# Entry:      $p_ref - reference to the parameter
5136#                      hashref (modified in place
5137#                      via the referenced hash).
5138#             $param - parameter name string.
5139#             $code  - method body source string.
5140#
5141# Exit:       Returns nothing. Modifies the
5142#             referenced parameter hashref.
5143#
5144# Side effects: Logs detections to stdout when
5145#               verbose is set.
5146# --------------------------------------------------
5147sub _analyze_parameter_type {
5148
189
196
        my ($self, $p_ref, $param, $code) = @_;
5149
189
147
        my $p = $$p_ref;
5150
5151        # Type inference from ref() checks
5152
189
14186
        if ($code =~ /ref\s*\(\s*\$$param\s*\)\s*eq\s*['"](ARRAY|HASH|SCALAR)['"]/gi) {
5153
4
7
                my $reftype = lc($1);
5154
4
12
                $p->{type} = $reftype eq 'array' ? 'arrayref' :
5155                                         $reftype eq 'hash' ? 'hashref' :
5156                                         'scalar';
5157
4
9
                $self->_log("  CODE: $param is $p->{type} (ref check)");
5158        }
5159        # ISA checks for objects
5160        elsif ($code =~ /\$$param\s*->\s*isa\s*\(\s*['"]([^'"]+)['"]\s*\)/i) {
5161
3
4
                $p->{type} = 'object';
5162
3
5
                $p->{isa} = $1;
5163
3
8
                $self->_log("  CODE: $param is object of class $1");
5164        }
5165        # Blessed references
5166        elsif ($code =~ /bless\s+.*\$$param/) {
5167
0
0
                $p->{type} = 'object';
5168
0
0
                $self->_log("  CODE: $param is blessed object");
5169        }
5170        # Array/hash operations
5171
189
338
        if (!$p->{type}) {
5172
163
6191
                if ($code =~ /\@\{\s*\$$param\s*\}/ || $code =~ /push\s*\(\s*\@?\$$param/) {
5173
0
0
                        $p->{type} = 'arrayref';
5174                } elsif ($code =~ /\%\{\s*\$$param\s*\}/ || $code =~ /\$$param\s*->\s*\{/) {
5175
0
0
                        $p->{type} = 'hashref';
5176                }
5177        }
5178
5179        # Infer type from the default value if type is unknown
5180
189
380
        if (!$p->{type} && exists $p->{_default}) {
5181
20
16
                my $default = $p->{_default};
5182
20
28
                if (ref($default) eq 'HASH') {
5183
0
0
                        $p->{type} = 'hashref';
5184
0
0
                        $self->_log("  CODE: $param type inferred as hashref from default");
5185                } elsif (ref($default) eq 'ARRAY') {
5186
0
0
                        $p->{type} = 'arrayref';
5187
0
0
                        $self->_log("  CODE: $param type inferred as arrayref from default");
5188                }
5189        }
5190
5191        # ------------------------------------------------------------
5192        # Heuristic numeric inference (low confidence)
5193        # ------------------------------------------------------------
5194
189
223
        if (!$p->{type}) {
5195                # An explicit looks_like_number($param) check is a direct
5196                # numeric-type assertion by the author, stronger evidence than
5197                # incidental arithmetic adjacency (e.g. $param is only ever
5198                # used inside a defined-or default before the arithmetic, so
5199                # the arithmetic-operator check below never sees $param itself
5200                # next to an operator).
5201
163
11600
                if ($code =~ /\blooks_like_number\s*\(\s*\$$param\s*\)/) {
5202
2
3
                        $p->{type} = 'number';
5203
2
2
                        $p->{_type_confidence} = 'heuristic';
5204
2
2
                        $self->_log("  CODE: $param inferred as number (looks_like_number check)");
5205                }
5206                # Numeric operators: + - * / % **
5207                # Use \/(?!\/) to exclude // (defined-or) from matching as division.
5208                elsif (
5209                        $code =~ /\$$param\s*(?:[\+\-\*\%]|\/(?!\/))/ ||
5210                        $code =~ /(?:[\+\-\*\%]|\/(?!\/))\s*\$$param/ ||
5211                        $code =~ /\bint\s*\(\s*\$$param\s*\)/ ||
5212                        $code =~ /\babs\s*\(\s*\$$param\s*\)/
5213                ) {
5214
47
61
                        $p->{type} = 'number';
5215
47
106
                        $p->{_type_confidence} = 'heuristic';
5216
47
74
                        $self->_log("  CODE: $param inferred as number (numeric operator)");
5217                }
5218                # Numeric comparison
5219                elsif (
5220                        $code =~ /\$$param\s*(?:==|!=|<=|>=|<|>)/ ||
5221                        $code =~ /(?:==|!=|<=|>=|<|>)\s*\$$param/
5222                ) {
5223
15
19
                        $p->{type} = 'number';
5224
15
17
                        $p->{_type_confidence} = 'heuristic';
5225
15
28
                        $self->_log("  CODE: $param inferred as number (numeric comparison)");
5226                }
5227        }
5228}
5229
5230# --------------------------------------------------
5231# _analyze_advanced_types
5232#
5233# Purpose:    Apply enhanced type detection to a
5234#             single parameter, checking for
5235#             DateTime objects, file handles,
5236#             coderefs, and enum-like constraints
5237#             beyond what basic type inference
5238#             can determine.
5239#
5240# Entry:      $p_ref - reference to the parameter
5241#                      hashref (modified in place
5242#                      via the referenced hash).
5243#             $param - the parameter name string.
5244#             $code  - method body source string.
5245#
5246# Exit:       Returns nothing. Modifies the
5247#             referenced parameter hashref in place.
5248#
5249# Side effects: Logs detections to stdout when
5250#               verbose is set.
5251#
5252# Notes:      Delegates to four specialised
5253#             detectors: _detect_datetime_type,
5254#             _detect_filehandle_type,
5255#             _detect_coderef_type, and
5256#             _detect_enum_type. Each detector
5257#             returns early on first match so
5258#             detectors are implicitly prioritised
5259#             in that order.
5260# --------------------------------------------------
5261sub _analyze_advanced_types {
5262
186
1706
        my ($self, $p_ref, $param, $code) = @_;
5263
5264        # Dereference once to get the hash reference
5265
186
157
        my $p = $$p_ref;
5266
5267        # Now pass the dereferenced hash to the detection methods
5268
186
237
        $self->_detect_datetime_type($p, $param, $code);
5269
186
328
        $self->_detect_filehandle_type($p, $param, $code);
5270
186
334
        $self->_detect_coderef_type($p, $param, $code);
5271
186
236
        $self->_detect_enum_type($p, $param, $code);
5272}
5273
5274# --------------------------------------------------
5275# _detect_datetime_type
5276#
5277# Purpose:    Detect DateTime objects, Time::Piece
5278#             objects, date strings, ISO 8601
5279#             strings, and UNIX timestamps by
5280#             analysing code patterns involving
5281#             the parameter.
5282#
5283# Entry:      $p     - parameter hashref (modified
5284#                      in place).
5285#             $param - parameter name string.
5286#             $code  - method body source string.
5287#
5288# Exit:       Returns nothing. Modifies $p in place,
5289#             setting type, isa, semantic, min,
5290#             matches, and/or format keys.
5291#             Returns immediately on first match.
5292#
5293# Side effects: Logs detections to stdout when
5294#               verbose is set.
5295# --------------------------------------------------
5296sub _detect_datetime_type {
5297
187
180
        my ($self, $p, $param, $code) = @_;
5298
5299        # Validate param is just a simple word
5300
187
456
        return unless defined $param && $param =~ /^\w+$/;
5301
5302        # DateTime object detection via isa/UNIVERSAL checks
5303
187
5423
        if ($code =~ /\$$param\s*->\s*isa\s*\(\s*['"]DateTime['"]\s*\)/i) {
5304
2
3
                $p->{type} = 'object';
5305
2
3
                $p->{isa} = 'DateTime';
5306
2
2
                $p->{semantic} = 'datetime_object';
5307
2
4
                $self->_log("  ADVANCED: $param is DateTime object");
5308
2
3
                return;
5309        }
5310
5311        # Check for DateTime method calls
5312
185
3160
        if ($code =~ /\$$param\s*->\s*(ymd|dmy|mdy|hms|iso8601|epoch|strftime)/) {
5313
1
2
                $p->{type} = 'object';
5314
1
2
                $p->{isa} = 'DateTime';
5315
1
1
                $p->{semantic} = 'datetime_object';
5316
1
2
                $self->_log("  ADVANCED: $param uses DateTime methods");
5317
1
1
                return;
5318        }
5319
5320        # Time::Piece detection
5321
184
7132
        if ($code =~ /\$$param\s*->\s*isa\s*\(\s*['"]Time::Piece['"]\s*\)/i ||
5322            $code =~ /\$$param\s*->\s*(strftime|epoch|year|mon|mday)/) {
5323
0
0
                $p->{type} = 'object';
5324
0
0
                $p->{isa} = 'Time::Piece';
5325
0
0
                $p->{semantic} = 'timepiece_object';
5326
0
0
                $self->_log("  ADVANCED: $param is Time::Piece object");
5327
0
0
                return;
5328        }
5329
5330        # String date/time patterns via regex matching
5331
184
1979
        if ($code =~ /\$$param\s*=~\s*\/.*?\\d\{4\}.*?\\d\{2\}.*?\\d\{2\}/) {
5332
1
1
                $p->{type} = 'string';
5333
1
1
                $p->{semantic} = 'date_string';
5334
1
2
                $p->{format} = 'YYYY-MM-DD or similar';
5335
1
3
                $self->_log("  ADVANCED: $param validated as date string pattern");
5336
1
1
                return;
5337        }
5338
5339        # ISO 8601 date pattern
5340
183
2022
        if ($code =~ /\$$param\s*=~\s*\/.*?[Tt].*?[Zz].*?\//) {
5341
1
2
                $p->{type} = 'string';
5342
1
1
                $p->{semantic} = 'iso8601_string';
5343
1
1
                $p->{matches} = '/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z?$/';
5344
1
3
                $self->_log("  ADVANCED: $param validated as ISO 8601 datetime");
5345
1
1
                return;
5346        }
5347
5348        # UNIX timestamp detection (numeric with specific range)
5349
182
6436
        if ($code =~ /\$$param\s*>\s*\d{9,}/ || # UNIX timestamps are 10+ digits
5350            $code =~ /time\(\s*\)\s*-\s*\$$param/ ||
5351            $code =~ /\$$param\s*-\s*time\(\s*\)/) {
5352
2
3
                $p->{type} = 'integer';
5353
2
3
                $p->{semantic} = 'unix_timestamp';
5354
2
2
                $p->{min} = 0;
5355
2
4
                $self->_log("  ADVANCED: $param appears to be UNIX timestamp");
5356
2
5
                return;
5357        }
5358
5359        # Date parsing with strptime or similar
5360
180
6041
        if ($code =~ /strptime\s*\(\s*\$$param/ ||
5361            $code =~ /DateTime::Format::\w+\s*->\s*parse_datetime\s*\(\s*\$$param/) {
5362
0
0
                $p->{type} = 'string';
5363
0
0
                $p->{semantic} = 'datetime_parseable';
5364
0
0
                $self->_log("  ADVANCED: $param is parsed as datetime");
5365
0
0
                return;
5366        }
5367}
5368
5369# --------------------------------------------------
5370# _detect_filehandle_type
5371#
5372# Purpose:    Detect file handle parameters and
5373#             file path string parameters by
5374#             analysing I/O operations, file test
5375#             operators, and path manipulation
5376#             patterns involving the parameter.
5377#
5378# Entry:      $p     - parameter hashref (modified
5379#                      in place).
5380#             $param - parameter name string.
5381#             $code  - method body source string.
5382#
5383# Exit:       Returns nothing. Modifies $p in place,
5384#             setting type, isa, and semantic keys.
5385#             Returns immediately on first match.
5386#
5387# Side effects: Logs detections to stdout when
5388#               verbose is set.
5389# --------------------------------------------------
5390sub _detect_filehandle_type {
5391
187
192
        my ($self, $p, $param, $code) = @_;
5392
5393
187
393
        return unless defined $param && $param =~ /^\w+$/;
5394
5395        # File handle operations
5396
187
4307
        if ($code =~ /(?:open|close|read|print|say|sysread|syswrite)\s*\(?\s*\$$param/) {
5397
2
4
                $p->{type} = 'object';
5398
2
2
                $p->{isa} = 'IO::Handle';
5399
2
3
                $p->{semantic} = 'filehandle';
5400
2
5
                $self->_log("  ADVANCED: $param is a file handle");
5401
2
3
                return;
5402        }
5403
5404        # Filehandle-specific operations
5405
185
3108
        if ($code =~ /\$$param\s*->\s*(readline|getline|print|say|close|flush|autoflush)/) {
5406
1
1
                $p->{type} = 'object';
5407
1
2
                $p->{isa} = 'IO::Handle';
5408
1
1
                $p->{semantic} = 'filehandle';
5409
1
2
                $self->_log("  ADVANCED: $param uses filehandle methods");
5410
1
2
                return;
5411        }
5412
5413        # File test operators
5414
184
2242
        if ($code =~ /(?:-[frwxoOeszlpSbctugkTBMAC])\s+\$$param/) {
5415
2
4
                $p->{type} = 'string';
5416
2
3
                $p->{semantic} = 'filepath';
5417
2
5
                $self->_log("  ADVANCED: $param is tested as file path");
5418
2
4
                return;
5419        }
5420
5421        # File::Spec operations or path manipulation
5422
182
6560
        if ($code =~ /File::(?:Spec|Basename)::\w+\s*\(\s*\$$param/ ||
5423            $code =~ /(?:basename|dirname|fileparse)\s*\(\s*\$$param/) {
5424
0
0
                $p->{type} = 'string';
5425
0
0
                $p->{semantic} = 'filepath';
5426
0
0
                $self->_log("  ADVANCED: $param manipulated as file path");
5427
0
0
                return;
5428        }
5429
5430        # Path validation patterns
5431        # Only match a literal path assigned or defaulted to this variable
5432
182
355
        if(defined $p->{_default} && $p->{_default} =~ m{^([A-Za-z]:\\|/|\./|\.\./)}) {
5433
0
0
                $p->{type} = 'string';
5434
0
0
                $p->{semantic} = 'filepath';
5435
0
0
                $self->_log("  ADVANCED: $param default looks like a path");
5436
0
0
                return;
5437        }
5438
5439        # IO::File detection
5440
182
7193
        if ($code =~ /\$$param\s*->\s*isa\s*\(\s*['"]IO::File['"]\s*\)/ ||
5441            $code =~ /IO::File\s*->\s*new\s*\(\s*\$$param/) {
5442
0
0
                $p->{type} = 'object';
5443
0
0
                $p->{isa} = 'IO::File';
5444
0
0
                $p->{semantic} = 'filehandle';
5445
0
0
                $self->_log("  ADVANCED: $param is IO::File object");
5446
0
0
                return;
5447        }
5448}
5449
5450# --------------------------------------------------
5451# _detect_coderef_type
5452#
5453# Purpose:    Detect coderef and callback parameters
5454#             by analysing ref() checks, invocation
5455#             patterns, and parameter naming
5456#             conventions.
5457#
5458# Entry:      $p     - parameter hashref (modified
5459#                      in place).
5460#             $param - parameter name string.
5461#             $code  - method body source string.
5462#
5463# Exit:       Returns nothing. Modifies $p in place,
5464#             setting type and semantic keys.
5465#             Returns immediately on first match.
5466#
5467# Side effects: Logs detections to stdout when
5468#               verbose is set.
5469# --------------------------------------------------
5470sub _detect_coderef_type {
5471
188
207
        my ($self, $p, $param, $code) = @_;
5472
5473
188
399
        return unless defined $param && $param =~ /^\w+$/;
5474
5475        # ref() check for CODE
5476
188
5039
        if ($code =~ /ref\s*\(\s*\$$param\s*\)\s*eq\s*['"]CODE['"]/i) {
5477
2
4
                $p->{type} = 'coderef';
5478
2
2
                $p->{semantic} = 'callback';
5479
2
6
                $self->_log("  ADVANCED: $param is coderef (ref check)");
5480
2
42
                return;
5481        }
5482
5483        # Invocation as coderef - note the escaped @ in \@_
5484
186
6671
        if ($code =~ /\$$param\s*->\s*\(/ ||
5485            $code =~ /\$$param\s*->\s*\(\s*\@_\s*\)/ ||
5486            $code =~ /&\s*\{\s*\$$param\s*\}/) {
5487
2
3
                $p->{type} = 'coderef';
5488
2
3
                $p->{semantic} = 'callback';
5489
2
4
                $self->_log("  ADVANCED: $param invoked as coderef");
5490
2
2
                return;
5491        }
5492
5493        # Parameter name suggests callback
5494
184
333
        if ($param =~ /^(?:callback|cb|handler|sub|code|fn|func|on_\w+)$/i) {
5495
2
2
                $p->{type} = 'coderef';
5496
2
3
                $p->{semantic} = 'callback';
5497
2
4
                $self->_log("  ADVANCED: $param name suggests coderef");
5498
2
4
                return;
5499        }
5500
5501        # Blessed coderef (unusual but valid)
5502
182
2382
        if ($code =~ /blessed\s*\(\s*\$$param\s*\)/ &&
5503            $code =~ /ref\s*\(\s*\$$param\s*\)\s*eq\s*['"]CODE['"]/i) {
5504
0
0
                $p->{type} = 'object';
5505
0
0
                $p->{isa} = 'blessed_coderef';
5506
0
0
                $p->{semantic} = 'callback';
5507
0
0
                $self->_log("  ADVANCED: $param is blessed coderef");
5508
0
0
                return;
5509        }
5510}
5511
5512# --------------------------------------------------
5513# _detect_enum_type
5514#
5515# Purpose:    Detect enum-like parameters whose
5516#             valid values are a fixed set, by
5517#             analysing validation patterns
5518#             including regex alternations, hash
5519#             lookups, grep checks, given/when,
5520#             if/elsif chains, and smart match.
5521#
5522# Entry:      $p     - parameter hashref (modified
5523#                      in place).
5524#             $param - parameter name string.
5525#             $code  - method body source string.
5526#
5527# Exit:       Returns nothing. Modifies $p in place,
5528#             setting type, enum, and semantic keys.
5529#             Returns immediately on first match.
5530#
5531# Side effects: Logs detections to stdout when
5532#               verbose is set.
5533#
5534# Notes:      Requires at least 3 if/elsif branches
5535#             for pattern 5 to avoid false positives
5536#             from ordinary conditional code.
5537# --------------------------------------------------
5538sub _detect_enum_type {
5539
188
185
        my ($self, $p, $param, $code) = @_;
5540
5541
188
383
        return unless defined $param && $param =~ /^\w+$/;
5542
5543        # Pattern 1: die/croak unless value is in list
5544        # die 'Invalid status' unless $status =~ /^(active|inactive|pending)$/;
5545
188
2823
        if ($code =~ /unless\s+\$$param\s*=~\s*\/\^?\(([^)]+)\)/) {
5546
3
5
                my $values = $1;
5547
3
40
                my @enum_values = split(/\|/, $values);
5548
3
8
                $p->{type} = 'string' unless $p->{type};
5549
3
6
                $p->{enum} = \@enum_values;
5550
3
3
                $p->{semantic} = 'enum';
5551
3
9
                $self->_log("  ADVANCED: $param is enum with values: " . join(', ', @enum_values));
5552
3
6
                return;
5553        }
5554
5555        # Pattern 2: Hash lookup for validation
5556        # my %valid = map { $_ => 1 } qw(red green blue);
5557        # die unless $valid{$param};
5558
185
263
        if ($code =~ /\%(\w+)\s*=.*?qw\s*[\(\[<{]([^)\]>}]+)[\)\]>}]/) {
5559
3
5
                my $hash_name = $1;
5560
3
4
                my $values_str = $2;
5561
3
48
                if (defined $values_str && $code =~ /\$$hash_name\s*\{\s*\$$param\s*\}/) {
5562
2
4
                        my @enum_values = split(/\s+/, $values_str);
5563
2
5
                        $p->{type} = 'string' unless $p->{type};
5564
2
2
                        $p->{enum} = \@enum_values;
5565
2
3
                        $p->{semantic} = 'enum';
5566
2
8
                        $self->_log("  ADVANCED: $param validated via hash lookup: " . join(', ', @enum_values));
5567
2
4
                        return;
5568                }
5569        }
5570
5571        # Pattern 3: Array grep validation
5572        # die unless grep { $_ eq $param } qw(foo bar baz);
5573
183
4307
        if ($code =~ /grep\s*\{[^}]*\$$param[^}]*\}\s*qw\s*[\(\[<{]([^)\]>}]+)[\)\]>}]/) {
5574
1
1
                my $values_str = $1;
5575
1
2
                my @enum_values = split(/\s+/, $values_str);
5576
1
2
                $p->{type} = 'string' unless $p->{type};
5577
1
2
                $p->{enum} = \@enum_values;
5578
1
1
                $p->{semantic} = 'enum';
5579
1
4
                $self->_log("  ADVANCED: $param validated via grep: " . join(', ', @enum_values));
5580
1
2
                return;
5581        }
5582
5583        # Pattern 4: Given/when (Perl 5.10+)
5584
182
2311
        if ($code =~ /given\s*\(\s*\$$param\s*\)/) {
5585
0
0
                my @enum_values;
5586
0
0
                while ($code =~ /when\s*\(\s*['"]([^'"]+)['"]\s*\)/g) {
5587
0
0
                        push @enum_values, $1;
5588                }
5589
0
0
                if (@enum_values >= 2) {
5590
0
0
                        $p->{type} = 'string' unless $p->{type};
5591
0
0
                        $p->{enum} = \@enum_values;
5592
0
0
                        $p->{semantic} = 'enum';
5593
0
0
                        $self->_log("  ADVANCED: $param has enum values from given/when: " .
5594                                   join(', ', @enum_values));
5595
0
0
                        return;
5596                }
5597        }
5598
5599        # Pattern 5: Multiple if/elsif checking specific values
5600
182
153
        my @if_values;
5601
182
4874
        while ($code =~ /if\s*\(\s*\$$param\s*eq\s*['"]([^'"]+)['"]\s*\)/g) {
5602
7
18
                push @if_values, $1;
5603        }
5604
182
4799
        while ($code =~ /elsif\s*\(\s*\$$param\s*eq\s*['"]([^'"]+)['"]\s*\)/g) {
5605
5
11
                push @if_values, $1;
5606        }
5607
182
239
        if (@if_values >= 3) {
5608
2
5
                $p->{type} = 'string' unless $p->{type};
5609
2
3
                $p->{enum} = \@if_values;
5610
2
3
                $p->{semantic} = 'enum';
5611
2
7
                $self->_log("  ADVANCED: $param appears to be enum from if/elsif: " .
5612                           join(', ', @if_values));
5613
2
3
                return;
5614        }
5615
5616        # Pattern 6: Smart match (~~) with array
5617
180
5808
        if ($code =~ /\$$param\s*~~\s*\[([^\]]+)\]/ ||
5618            $code =~ /\$$param\s*~~\s*qw\s*[\(\[<{]([^)\]>}]+)[\)\]>}]/) {
5619
0
0
                my $values_str = $1;
5620
0
0
                my @enum_values;
5621
0
0
                if ($values_str =~ /['"]/) {
5622
0
0
                        @enum_values = $values_str =~ /['"](.*?)['"]/g;
5623                } else {
5624
0
0
                        @enum_values = split(/\s+/, $values_str);
5625                }
5626
0
0
                if (@enum_values) {
5627
0
0
                        $p->{type} = 'string' unless $p->{type};
5628
0
0
                        $p->{enum} = \@enum_values;
5629
0
0
                        $p->{semantic} = 'enum';
5630
0
0
                        $self->_log("  ADVANCED: $param validated with smart match: " .
5631                                   join(', ', @enum_values));
5632
0
0
                        return;
5633                }
5634        }
5635}
5636
5637# --------------------------------------------------
5638# _extract_error_constraints
5639#
5640# Purpose:    Extract invalid-value constraints and
5641#             error messages from die/croak patterns
5642#             referencing a specific parameter, and
5643#             infer numeric bounds from comparisons
5644#             with literals.
5645#
5646# Entry:      $p_ref - reference to the parameter
5647#                      hashref (modified in place).
5648#             $param - parameter name string.
5649#             $code  - method body source string.
5650#
5651# Exit:       Returns nothing. May add _invalid,
5652#             _errors, min, and/or max to the
5653#             referenced parameter hashref.
5654#
5655# Side effects: Logs detections to stdout when
5656#               verbose is set.
5657# --------------------------------------------------
5658sub _extract_error_constraints {
5659
188
232
        my ($self, $p, $param, $code) = @_;
5660
5661        # Look for die/croak/confess with a condition involving this param
5662
188
355
        while ($code =~ /
5663                (?:die|croak|confess)       # error call
5664                \s*
5665                (?:
5666                        ["']([^"']+)["']        # captured error message
5667                |
5668                        q[qw]?\s*[\(\[]([^)\]]+)[\)\]]  # q(), qq(), qw()
5669                )?
5670                \s*
5671                if\s+
5672                (.+?)                      # condition
5673                \s*;
5674        /gsx) {
5675
5676
33
52
                my $message = $1 || $2;
5677
33
29
                my $condition = $3;
5678
5679                # Only keep conditions that reference this parameter
5680
33
132
                next unless $condition =~ /\$$param\b/;
5681
5682                # Initialize storage
5683
22
54
                $$p->{_invalid} ||= [];
5684
22
46
                $$p->{_errors}  ||= [];
5685
5686                # Normalize condition (strip surrounding parens)
5687
22
48
                $condition =~ s/^\(|\)$//g;
5688
22
79
                $condition =~ s/\s+/ /g;
5689
5690                # Try to extract a meaningful invalid constraint
5691
22
20
                my $constraint;
5692
5693                # Examples:
5694                #   $age <= 0
5695                #   $x eq ''
5696                #   length($s) < 3
5697
22
912
                if ($condition =~ /\$$param\s*([!<>=]=?|eq|ne|lt|gt|le|ge)\s*(.+)/) {
5698
8
15
                        $constraint = "$1 $2";
5699                }
5700                elsif ($condition =~ /length\s*\(\s*\$$param\s*\)\s*([<>=!]+)\s*(\d+)/) {
5701
1
2
                        $constraint = "length $1 $2";
5702                }
5703                elsif ($condition =~ /\$$param\s*==\s*0/) {
5704
0
0
                        $constraint = '== 0';
5705                }
5706
5707                # Store results
5708
22
9
30
13
                push @{ $$p->{_invalid} }, $constraint if $constraint;
5709
22
17
23
21
                push @{ $$p->{_errors}  }, $message if defined $message;
5710
5711
22
46
                $self->_log(
5712                        "  ERROR: $param invalid when [$condition]" .
5713                        (defined $message ? " => '$message'" : '')
5714                );
5715        }
5716
5717        # Numeric comparison with literal
5718
188
3493
        if ($code =~ /\b\Q$param\E\s*(<=|<|>=|>)\s*(-?\d+)/) {
5719
16
31
                my ($op, $num) = ($1, $2);
5720
5721                # Mark required
5722
16
25
                $$p->{optional} = 0;
5723
5724
16
47
                if ($op eq '<=') {
5725
1
5
                        $$p->{min} = $num + 1;
5726                } elsif ($op eq '<') {
5727
4
7
                        $$p->{min} = $num;
5728                } elsif ($op eq '>=') {
5729
1
2
                        $$p->{max} = $num - 1;
5730                } elsif ($op eq '>') {
5731
10
16
                        $$p->{max} = $num;
5732                }
5733
5734
16
32
                $self->_log("  ERROR: $param normalized constraint from '$op $num'");
5735        }
5736}
5737
5738# --------------------------------------------------
5739# _extract_parameters_from_signature
5740#
5741# Purpose:    Extract parameter names and positions
5742#             from a method's signature, trying
5743#             modern Perl subroutine signatures
5744#             first and falling back to traditional
5745#             @_ extraction styles.
5746#
5747# Entry:      $params - hashref to populate with
5748#                       parameter specs (modified
5749#                       in place).
5750#             $code   - method body source string.
5751#
5752# Exit:       Returns nothing. Populates $params.
5753#
5754# Side effects: Logs detections to stdout when
5755#               verbose is set.
5756#
5757# Notes:      Three traditional styles are
5758#             supported: (1) my ($self, ...) = @_,
5759#             (2) my $self = shift; my $x = shift,
5760#             (3) my $x = $_[N]. $self and $class
5761#             are always excluded from the returned
5762#             parameters.
5763# --------------------------------------------------
5764sub _extract_parameters_from_signature {
5765
582
522
        my ($self, $params, $code) = @_;
5766
5767        # Modern Style: Subroutine signatures with attributes
5768        # Handle multi-line signatures
5769        # sub foo :attr1 :attr2(val) (
5770        #     $self,
5771        #     $x :Type,
5772        #     $y = default
5773        # ) { }
5774
5775        # Try to match signature after attributes
5776        # Look for the parameter list - it's the last (...) before the opening brace
5777        # that contains sigils ($, %, @)
5778
582
1400
        if ($code =~ /sub\s+\w+\s*(?::\w+(?:\([^)]*\))?\s*)*\(((?:[^()]|\([^)]*\))*)\)\s*\{/s) {
5779
28
29
                my $potential_sig = $1;
5780
5781                # Check if this looks like parameters (has sigils)
5782
28
43
                if ($potential_sig =~ /[\$\%\@]/) {
5783
24
32
                        $self->_log("  SIG: Found modern signature: ($potential_sig)");
5784
24
36
                        $self->_parse_modern_signature($params, $potential_sig);
5785
24
22
                        return;
5786                }
5787        }
5788
5789        # Direct-index style: my $self = $_[0];  my $arg = $_[1]; ...
5790        # Must be checked before Style 1 to avoid matching @_ inside closures
5791        # defined in the body of a method that uses this style.
5792
558
610
        if($code =~ /my\s+\$(?:self|class)\s*=\s*\$_\[0\]/) {
5793
14
12
                my $pos = 0;
5794
14
35
                while($code =~ /my\s+\$(\w+)\s*=\s*\$_\[(\d+)\]/g) {
5795
14
15
                        my $name = $1;
5796
14
33
                        next if $name =~ /^(self|class)$/i;
5797
0
0
                        $params->{$name} //= { _source => 'code', optional => 1, position => $pos++ };
5798
0
0
                        $self->_log("  CODE: Found direct-index parameter '\$$name' at \$_[$2]");
5799                }
5800
14
15
                return;
5801        }
5802
5803        # Traditional Style 1: my ($self, $arg1, $arg2) = @_;
5804
544
913
        if ($code =~ /my\s*\(\s*([^)]+)\)\s*=\s*\@_/s) {
5805
271
289
                my $sig = $1;
5806
271
199
                my $pos = 0;
5807
5808
271
504
                while ($sig =~ /\$(\w+)/g) {
5809
574
447
                        my $name = $1;
5810
5811
574
815
                        next if $name =~ /^(self|class)$/i;
5812
5813
317
796
                        $params->{$name} //= {
5814                                _source => 'code',
5815                                optional => 1,
5816                        };
5817
5818
317
400
                        $params->{$name}{position} = $pos unless exists $params->{$name}{position};
5819
5820
317
344
                        $pos++;
5821                }
5822
271
307
                return;
5823        } elsif ($code =~ /my\s+\$self\s*=\s*shift/) {
5824                # Traditional Style 2: my $self = shift; my $arg1 = shift;
5825
22
15
                my @shifts;
5826
22
48
                while ($code =~ /my\s+\$(\w+)\s*=\s*shift/g) {
5827
27
50
                        push @shifts, $1;
5828                }
5829
22
53
                shift @shifts if @shifts && $shifts[0] =~ /^(self|class)$/i;
5830
22
21
                my $pos = 0;
5831
22
16
                foreach my $param (@shifts) {
5832
5
18
                        $params->{$param} ||= { _source => 'code', optional => 1, position => $pos++ };
5833                }
5834
22
25
                return;
5835        }
5836
5837        # Traditional Style 3: Function parameters (no $self)
5838
251
228
        if ($code =~ /my\s*\(\s*([^)]+)\)\s*=\s*\@_/s) {
5839
0
0
                my $sig = $1;
5840
0
0
                my @param_names = $sig =~ /\$(\w+)/g;
5841
0
0
                my $pos = 0;
5842
0
0
                foreach my $param (@param_names) {
5843
0
0
                        next if $param =~ /^(self|class)$/i;
5844
0
0
                        $params->{$param} ||= { _source => 'code', optional => 1, position => $pos++ };
5845                }
5846        }
5847
5848        # De-duplicate
5849
251
160
        my %seen;
5850
251
265
        foreach my $param (keys %$params) {
5851
0
0
                if ($seen{$param}++) {
5852
0
0
                        $self->_log("  WARNING: Duplicate parameter '$param' found");
5853                }
5854        }
5855}
5856
5857# --------------------------------------------------
5858# _parse_modern_signature
5859#
5860# Purpose:    Parse a Perl 5.20+ subroutine
5861#             signature string into individual
5862#             parameter specs, respecting nested
5863#             structures when splitting on commas.
5864#
5865# Entry:      $params - hashref to populate
5866#                       (modified in place).
5867#             $sig    - signature string with outer
5868#                       parentheses already removed.
5869#
5870# Exit:       Returns nothing. Populates $params
5871#             via _parse_signature_parameter.
5872#
5873# Side effects: Logs parsing details to stdout when
5874#               verbose is set.
5875# --------------------------------------------------
5876sub _parse_modern_signature {
5877
26
3298
        my ($self, $params, $sig) = @_;
5878
5879
26
35
        $self->_log("  DEBUG: Parsing signature: [$sig]");
5880
5881        # Split signature by commas, but respect nested structures (e.g. a
5882        # default value containing a hashref/arrayref literal)
5883
26
1105
        require Text::Balanced;
5884
26
12280
        my @parts;
5885
26
21
        my $current = '';
5886
26
22
        my $rest = $sig;
5887
5888
26
27
        while (length $rest) {
5889
743
581
                if (substr($rest, 0, 1) =~ /[(\[{]/) {
5890                        # extract_bracketed advances $rest past the extracted block
5891                        # in place, so $rest must not be re-truncated afterwards
5892
3
5
                        my $extracted = Text::Balanced::extract_bracketed($rest, '(){}[]');
5893
3
358
                        last unless defined $extracted; # Unbalanced brackets
5894
3
2
                        $current .= $extracted;
5895
3
5
                        next;
5896                }
5897
740
600
                if (substr($rest, 0, 1) eq ',') {
5898
47
67
                        push @parts, $current;
5899
47
27
                        $current = '';
5900
47
37
                        $rest = substr($rest, 1);
5901
47
43
                        next;
5902                }
5903
693
425
                $current .= substr($rest, 0, 1);
5904
693
516
                $rest = substr($rest, 1);
5905        }
5906
26
61
        push @parts, $current if $current =~ /\S/;
5907
5908
26
17
        my $position = 0;
5909
5910
26
24
        foreach my $part (@parts) {
5911
73
168
                $part =~ s/^\s+|\s+$//g;
5912
5913                # Skip empty parts
5914
73
67
                next unless $part;
5915
5916                # Parse different parameter types
5917
73
63
                my $param_info = $self->_parse_signature_parameter($part, $position);
5918
5919
73
70
                if ($param_info) {
5920
73
52
                        my $name = $param_info->{name};
5921
5922                        # Skip self/class
5923
73
87
                        if ($name =~ /^(self|class)$/i) {
5924
18
24
                                next;
5925                        }
5926
5927
55
54
                        $params->{$name} = $param_info;
5928                        $self->_log("  SIG: $name has position $position" .
5929                                ($param_info->{optional} ? ' (optional)' : '') .
5930
55
123
                                ($param_info->{_default} ? ", default: $param_info->{_default}" : ''));
5931
55
73
                        $position++;
5932                }
5933        }
5934}
5935
5936# --------------------------------------------------
5937# _parse_signature_parameter
5938#
5939# Purpose:    Parse a single parameter declaration
5940#             from a modern Perl signature, handling
5941#             type constraints, default values,
5942#             plain scalars, and slurpy array/hash
5943#             parameters.
5944#
5945# Entry:      $part     - a single parameter string
5946#                         (one comma-separated
5947#                         element from the signature).
5948#             $position - zero-based position index
5949#                         of this parameter.
5950#
5951# Exit:       Returns a parameter info hashref on
5952#             success, or undef if the string does
5953#             not match any known pattern.
5954#
5955# Side effects: None.
5956#
5957# Notes:      Six patterns are tried in order:
5958#             (1) :Type with default,
5959#             (2) :Type without default,
5960#             (3) default without type,
5961#             (4) plain $name,
5962#             (5) slurpy @name,
5963#             (6) slurpy %name.
5964# --------------------------------------------------
5965sub _parse_signature_parameter {
5966
86
5192
        my ($self, $part, $position) = @_;
5967
5968
86
115
        my %info = (
5969                _source => 'signature',
5970                position => $position,
5971                optional => 0,
5972        );
5973
5974        # Pattern 1: Type constraint WITH default: $name :Type = default
5975
86
231
        if ($part =~ /^\$(\w+)\s*:\s*(\w+)\s*=\s*(.+)$/s) {
5976
7
10
                my ($name, $constraint, $default) = ($1, $2, $3);
5977
7
12
                $default =~ s/^\s+|\s+$//g;
5978
5979
7
6
                $info{name} = $name;
5980
7
5
                $info{optional} = 1;
5981
7
8
                $info{_default} = $self->_clean_default_value($default, 1);
5982
5983                # Apply type constraint
5984
7
18
                if ($constraint =~ /^(Int|Integer)$/i) {
5985
2
3
                        $info{type} = 'integer';
5986                } elsif ($constraint =~ /^(Num|Number)$/i) {
5987
3
3
                        $info{type} = 'number';
5988                } elsif ($constraint =~ /^(Str|String)$/i) {
5989
2
2
                        $info{type} = 'string';
5990                } elsif ($constraint =~ /^(Bool|Boolean)$/i) {
5991
0
0
                        $info{type} = 'boolean';
5992                } elsif ($constraint =~ /^(Array|ArrayRef)$/i) {
5993
0
0
                        $info{type} = 'arrayref';
5994                } elsif ($constraint =~ /^(Hash|HashRef)$/i) {
5995
0
0
                        $info{type} = 'hashref';
5996                } else {
5997
0
0
                        $info{type} = 'object';
5998
0
0
                        $info{isa} = $constraint;
5999                }
6000
6001
7
8
                return \%info;
6002        } elsif ($part =~ /^\$(\w+)\s*:\s*(\w+)\s*$/s) {
6003                # Pattern 2: Type constraint WITHOUT default: $name :Type
6004
14
20
                my ($name, $constraint) = ($1, $2);
6005
14
15
                $info{name} = $name;
6006
14
12
                $info{optional} = 0;
6007
6008                # Apply type constraint (same as above)
6009
14
39
                if ($constraint =~ /^(Int|Integer)$/i) {
6010
4
5
                        $info{type} = 'integer';
6011                } elsif ($constraint =~ /^(Num|Number)$/i) {
6012
2
2
                        $info{type} = 'number';
6013                } elsif ($constraint =~ /^(Str|String)$/i) {
6014
2
3
                        $info{type} = 'string';
6015                } elsif ($constraint =~ /^(Bool|Boolean)$/i) {
6016
2
3
                        $info{type} = 'boolean';
6017                } elsif ($constraint =~ /^(Array|ArrayRef)$/i) {
6018
2
2
                        $info{type} = 'arrayref';
6019                } elsif ($constraint =~ /^(Hash|HashRef)$/i) {
6020
0
0
                        $info{type} = 'hashref';
6021                } else {
6022
2
2
                        $info{type} = 'object';
6023
2
2
                        $info{isa} = $constraint;
6024                }
6025
6026
14
42
                return \%info;
6027        } elsif ($part =~ /^\$(\w+)\s*=\s*(.+)$/s) {
6028                # Pattern 3: Default WITHOUT type: $name = default
6029
16
25
                my ($name, $default) = ($1, $2);
6030
16
26
                $default =~ s/^\s+|\s+$//g;
6031
6032
16
17
        $info{name} = $name;
6033
16
11
        $info{optional} = 1;
6034
16
21
        $info{_default} = $self->_clean_default_value($default, 1);
6035
16
49
        $info{type} = $self->_infer_type_from_default($info{_default}) if $self->can('_infer_type_from_default');
6036
6037
16
20
        return \%info;
6038        }
6039
6040    # Pattern 4: Plain parameter: $name
6041    elsif ($part =~ /^\$(\w+)$/s) {
6042
39
48
        $info{name} = $1;
6043
39
34
        $info{optional} = 0;
6044
39
42
        return \%info;
6045    }
6046
6047    # Pattern 5: Array parameter: @name
6048    elsif ($part =~ /^\@(\w+)$/s) {
6049
4
6
        $info{name} = $1;
6050
4
3
        $info{type} = 'array';
6051
4
5
        $info{slurpy} = 1;
6052
4
4
        $info{optional} = 1;
6053
4
6
        return \%info;
6054    }
6055
6056    # Pattern 6: Hash parameter: %name
6057    elsif ($part =~ /^\%(\w+)$/s) {
6058
4
7
        $info{name} = $1;
6059
4
5
        $info{type} = 'hash';
6060
4
4
        $info{slurpy} = 1;
6061
4
3
        $info{optional} = 1;
6062
4
8
        return \%info;
6063    }
6064
6065
2
5
        return undef;
6066}
6067
6068# --------------------------------------------------
6069# _infer_type_from_default
6070#
6071# Purpose:    Infer a parameter type from its
6072#             default value when no explicit type
6073#             annotation is available.
6074#
6075# Entry:      $default - the cleaned default value
6076#                        scalar, hashref, or
6077#                        arrayref. May be undef.
6078#
6079# Exit:       Returns a type string ('hashref',
6080#             'arrayref', 'integer', 'number',
6081#             'boolean', 'string'), or undef if
6082#             $default is undef.
6083#
6084# Side effects: None.
6085# --------------------------------------------------
6086sub _infer_type_from_default {
6087
33
51
        my ($self, $default) = @_;
6088
6089
33
37
        return undef unless defined $default;
6090
6091
29
124
        if (ref($default) eq 'HASH') {
6092
2
5
                return 'hashref';
6093        } elsif (ref($default) eq 'ARRAY') {
6094
2
4
                return 'arrayref';
6095        } elsif ($default =~ /^-?\d+$/) {
6096
16
26
                return 'integer';
6097        } elsif ($default =~ /^-?\d+\.\d+$/) {
6098
2
4
                return 'number';
6099        } elsif ($default eq '1' || $default eq '0') {
6100
0
0
                return 'boolean';
6101        } else {
6102
7
15
                return 'string';
6103        }
6104}
6105
6106# --------------------------------------------------
6107# _extract_subroutine_attributes
6108#
6109# Purpose:    Extract Perl subroutine attributes
6110#             (e.g. :lvalue, :method, :Returns(Int))
6111#             from a method's source string.
6112#
6113# Entry:      $code - method body source string.
6114#
6115# Exit:       Returns a hashref of attribute name
6116#             to value (1 for flag-only attributes,
6117#             the attribute argument string for
6118#             attributes with values).
6119#             Returns an empty hashref if no
6120#             attributes are found.
6121#
6122# Side effects: Logs detections to stdout when
6123#               verbose is set.
6124# --------------------------------------------------
6125sub _extract_subroutine_attributes {
6126
293
1808
        my ($self, $code) = @_;
6127
6128
293
200
        my %attributes;
6129
6130        # Extract all attributes from the sub declaration
6131        # Attributes are :name or :name(value) between sub name and either ( or {
6132        # Pattern: sub name ATTRIBUTES ( params ) { }
6133        # or:      sub name ATTRIBUTES { }
6134
6135        # First, find the attributes section (everything between sub name and ( or { )
6136
293
220
        my $attr_section = '';
6137
6138
293
465
        if($code =~ /sub\s+\w+\s+((?::\w+(?:\([^)]*\))?\s*)+)/s) {
6139
9
11
                $attr_section = $1;
6140        }
6141
6142        # Parse individual attributes from the section
6143
293
279
        if($attr_section) {
6144
9
26
                while($attr_section =~ /:(\w+)(?:\(([^)]*)\))?/g) {
6145
11
18
                        my ($name, $value) = ($1, $2);
6146
6147
11
17
                        if (defined $value && $value ne '') {
6148
5
7
                                $attributes{$name} = $value;
6149
5
11
                                $self->_log("  ATTR: Found attribute :$name($value)");
6150                        } else {
6151
6
8
                                $attributes{$name} = 1;
6152
6
7
                                $self->_log("  ATTR: Found attribute :$name");
6153                        }
6154                }
6155        }
6156
6157        # Process common attributes
6158
293
297
        if ($attributes{Returns}) {
6159
4
3
                my $return_type = $attributes{Returns};
6160
4
6
                if ($return_type ne '1') {  # Only log if it's an actual type, not just the flag
6161
4
5
                        $self->_log("  ATTR: Method declares return type: $return_type");
6162                }
6163        }
6164
6165
293
301
        if ($attributes{lvalue}) {
6166
4
4
                $self->_log("  ATTR: Method is lvalue (can be assigned to)");
6167        }
6168
6169
293
265
        if ($attributes{method}) {
6170
2
2
                $self->_log('  ATTR: Method explicitly marked as :method');
6171        }
6172
6173
293
297
        return \%attributes;
6174}
6175
6176# --------------------------------------------------
6177# _analyze_postfix_dereferencing
6178#
6179# Purpose:    Detect usage of Perl 5.20+ postfix
6180#             dereferencing syntax in a method body
6181#             and record which dereference forms
6182#             are used.
6183#
6184# Entry:      $code - method body source string.
6185#
6186# Exit:       Returns a hashref whose keys are
6187#             dereference form names (array_deref,
6188#             hash_deref, scalar_deref, code_deref,
6189#             array_slice, hash_slice) with value 1
6190#             when detected.
6191#             Returns an empty hashref if no
6192#             postfix dereferencing is found.
6193#
6194# Side effects: Logs detections to stdout when
6195#               verbose is set.
6196# --------------------------------------------------
6197sub _analyze_postfix_dereferencing {
6198
295
1191
        my ($self, $code) = @_;
6199
6200
295
196
        my %derefs;
6201
6202        # Array dereference: $ref->@*
6203
295
473
        if ($code =~ /\$\w+\s*->\s*\@\*/) {
6204
4
7
                $derefs{array_deref} = 1;
6205
4
4
                $self->_log("  MODERN: Uses postfix array dereferencing (->@*)");
6206        }
6207
6208        # Hash dereference: $ref->%*
6209
295
393
        if ($code =~ /\$\w+\s*->\s*\%\*/) {
6210
3
5
                $derefs{hash_deref} = 1;
6211
3
5
                $self->_log("  MODERN: Uses postfix hash dereferencing (->%*)");
6212        }
6213
6214        # Scalar dereference: $ref->$*
6215
295
396
        if ($code =~ /\$\w+\s*->\s*\$\*/) {
6216
2
3
                $derefs{scalar_deref} = 1;
6217
2
3
                $self->_log('  MODERN: Uses postfix scalar dereferencing (->$*)');
6218        }
6219
6220        # Code dereference: $ref->&*
6221
295
385
        if ($code =~ /\$\w+\s*->\s*\&\*/) {
6222
1
1
                $derefs{code_deref} = 1;
6223
1
1
                $self->_log("  MODERN: Uses postfix code dereferencing (->&*)");
6224        }
6225
6226        # Array element: $ref->@[0,2,4]
6227
295
391
        if ($code =~ /\$\w+\s*->\s*\@\[/) {
6228
2
3
                $derefs{array_slice} = 1;
6229
2
2
                $self->_log("  MODERN: Uses postfix array slice (->@[...])");
6230        }
6231
6232        # Hash element: $ref->%{key1,key2}
6233
295
384
        if ($code =~ /\$\w+\s*->\s*\%\{/) {
6234
2
3
                $derefs{hash_slice} = 1;
6235
2
2
                $self->_log("  MODERN: Uses postfix hash slice (->%{...})");
6236        }
6237
6238
295
247
        return \%derefs;
6239}
6240
6241# --------------------------------------------------
6242# _extract_field_declarations
6243#
6244# Purpose:    Extract Perl 5.38 field declarations
6245#             from a class body or method source
6246#             string, capturing field names,
6247#             :param attributes, default values,
6248#             and :isa type constraints.
6249#
6250# Entry:      $code - source string potentially
6251#                     containing 'field $name ...'
6252#                     declarations.
6253#
6254# Exit:       Returns a hashref of field name to
6255#             field_info hashref. Returns an empty
6256#             hashref if no field declarations
6257#             are found.
6258#
6259# Side effects: Logs detections to stdout when
6260#               verbose is set.
6261# --------------------------------------------------
6262sub _extract_field_declarations {
6263
296
2234
        my ($self, $code) = @_;
6264
6265
296
196
        my %fields;
6266
6267        # Pattern: field $name :param;
6268        # Pattern: field $name :param(name);
6269        # Pattern: field $name = default;
6270        # More lenient pattern to catch various formats
6271
296
393
        while ($code =~ /^\s*field\s+\$(\w+)\s*([^;]*);/gm) {
6272
12
17
                my ($name, $modifiers) = ($1, $2);
6273
6274
12
21
                $self->_log("  FIELD: Found field \$$name with modifiers: [$modifiers]");
6275
6276
12
16
                my %field_info = (
6277                        name => $name,
6278                        _source => 'field'
6279                );
6280
6281                # Check for :param attribute
6282
12
24
                if ($modifiers =~ /:param(?:\(([^)]+)\))?/) {
6283
11
12
                        $field_info{is_param} = 1;
6284
6285
11
12
                        if (defined $1) {
6286                                # Explicit parameter name
6287
2
3
                                $field_info{param_name} = $1;
6288                        } else {
6289                                # Implicit - field name is param name
6290
9
9
                                $field_info{param_name} = $name;
6291                        }
6292
6293
11
13
                        $self->_log("  FIELD: $name maps to parameter: $field_info{param_name}");
6294                }
6295
6296        # Check for default value - must come before type constraint check
6297
12
18
        if ($modifiers =~ /=\s*([^:;]+)(?::|;|$)/) {
6298
4
5
                my $default = $1;
6299
4
4
                $default =~ s/\s+$//;
6300
4
7
                $field_info{_default} = $self->_clean_default_value($default, 1);
6301
4
5
                $field_info{optional} = 1;
6302
4
13
                $self->_log("  FIELD: $name has default: " . (defined $field_info{_default} ? $field_info{_default} : 'undef'));
6303        }
6304
6305        # Check for type constraints
6306
12
18
        if ($modifiers =~ /:isa\(([^)]+)\)/) {
6307
3
4
            $field_info{isa} = $1;
6308
3
4
            $field_info{type} = 'object';
6309
3
8
            $self->_log("  FIELD: $name has type constraint: $1");
6310        }
6311
6312
12
22
                $fields{$name} = \%field_info;
6313        }
6314
6315
296
250
        return \%fields;
6316}
6317
6318# --------------------------------------------------
6319# _merge_field_declarations
6320#
6321# Purpose:    Integrate Perl 5.38 field declarations
6322#             that carry the :param attribute into
6323#             the code parameter hashref, so they
6324#             appear as constructor parameters in
6325#             the generated schema.
6326#
6327# Entry:      $params - hashref of parameters
6328#                       extracted from code analysis
6329#                       (modified in place).
6330#             $fields - hashref of field declarations
6331#                       as returned by
6332#                       _extract_field_declarations.
6333#
6334# Exit:       Returns nothing. Modifies $params
6335#             in place.
6336#
6337# Side effects: Logs merges to stdout when verbose
6338#               is set.
6339#
6340# Notes:      Only fields with is_param => 1 are
6341#             merged. The param_name key in the
6342#             field (which may differ from the
6343#             field name if :param(name) was used)
6344#             determines the parameter key.
6345# --------------------------------------------------
6346sub _merge_field_declarations {
6347
6
31
        my ($self, $params, $fields) = @_;
6348
6349
6
7
        foreach my $field_name (keys %$fields) {
6350
11
10
                my $field = $fields->{$field_name};
6351
6352                # Only process fields that are parameters
6353
11
11
                next unless $field->{is_param};
6354
6355
9
10
                my $param_name = $field->{param_name};
6356
6357                # Create or update parameter info
6358
9
18
                $params->{$param_name} ||= {};
6359
9
8
                my $p = $params->{$param_name};
6360
6361                # Merge field information into parameter
6362
9
11
                $p->{_source} = 'field' unless $p->{_source};
6363
9
11
                $p->{field_name} = $field_name if $field_name ne $param_name;
6364
6365
9
8
                if ($field->{_default}) {
6366
3
4
                        $p->{_default} = $field->{_default};
6367
3
4
                        $p->{optional} = 1;
6368                }
6369
6370
9
8
                if ($field->{isa}) {
6371
2
3
                        $p->{isa} = $field->{isa};
6372
2
2
                        $p->{type} = 'object';
6373                }
6374
6375
9
12
                $self->_log("  MERGED: Field $field_name -> parameter $param_name");
6376        }
6377}
6378
6379# --------------------------------------------------
6380# _extract_defaults_from_code
6381#
6382# Purpose:    Scan a method body for default value
6383#             assignment patterns and populate the
6384#             optional and _default fields of
6385#             known parameters.
6386#
6387# Entry:      $params - hashref of parameters
6388#                       (modified in place).
6389#             $code   - method body source string.
6390#             $method - method hashref, used for
6391#                       constructor-specific
6392#                       exclusions of $class and
6393#                       $self.
6394#
6395# Exit:       Returns nothing. Modifies $params
6396#             in place.
6397#
6398# Side effects: Logs detections to stdout when
6399#               verbose is set.
6400#
6401# Notes:      Eight default patterns are tried.
6402#             Only parameters already present in
6403#             $params are updated — this method
6404#             does not add new parameters.
6405#             Falls back to extracting all @_
6406#             assignments if $params is empty
6407#             after the main pass.
6408# --------------------------------------------------
6409sub _extract_defaults_from_code {
6410
293
296
        my ($self, $params, $code, $method) = @_;
6411
6412        # Pattern 1: my $param = value;
6413
293
485
        while ($code =~ /my\s+\$(\w+)\s*=\s*([^;]+);/g) {
6414
34
72
                my ($param, $value) = ($1, $2);
6415
34
73
                next unless exists $params->{$param};
6416
1
1
                next if $value =~ /->/;      # deref/method call, not a default value
6417
6418
1
3
                $params->{$param}{_default} = $self->_clean_default_value($value, 1);
6419
1
1
                $params->{$param}{optional} = 1;
6420
1
3
                $self->_log("  CODE: $param has default: " . $self->_format_default($params->{$param}{_default}));
6421        }
6422
6423        # Pattern 2: $param = value unless defined $param;
6424
293
437
        while ($code =~ /\$(\w+)\s*=\s*([^;]+?)\s+unless\s+(?:defined\s+)?\$\1/g) {
6425
4
7
                my ($param, $value) = ($1, $2);
6426
4
6
                next unless exists $params->{$param};
6427
6428
4
6
                $params->{$param}{_default} = $self->_clean_default_value($value, 1);
6429
4
3
                $params->{$param}{optional} = 1;
6430
4
9
                $self->_log("  CODE: $param has default (unless): " . $self->_format_default($params->{$param}{_default}));
6431        }
6432
6433        # Pattern 3: $param = value unless $param;
6434
293
415
        while ($code =~ /\$(\w+)\s*=\s*([^;]+?)\s+unless\s+\$\1/g) {
6435
1
1
                my ($param, $value) = ($1, $2);
6436
1
1
                next unless exists $params->{$param};
6437
6438
1
45
                $params->{$param}{_default} = $self->_clean_default_value($value, 1);
6439
1
1
                $params->{$param}{optional} = 1;
6440
1
2
                $self->_log("  CODE: $param has default (unless): " . $self->_format_default($params->{$param}{_default}));
6441        }
6442
6443        # Pattern 4: $param = $param || 'default';
6444
293
332
        while ($code =~ /\$(\w+)\s*=\s*\$\1\s*\|\|\s*([^;]+);/g) {
6445
8
12
                my ($param, $value) = ($1, $2);
6446
8
8
                next unless exists $params->{$param};
6447
6448
8
8
                $params->{$param}{_default} = $self->_clean_default_value($value, 1);
6449
8
9
                $params->{$param}{optional} = 1;
6450
8
12
                $self->_log("  CODE: $param has default (||): " . $self->_format_default($params->{$param}{_default}));
6451        }
6452
6453        # Pattern 5: $param ||= 'default';
6454
293
316
        while ($code =~ /\$(\w+)\s*\|\|=\s*([^;]+);/g) {
6455
2
4
                my ($param, $value) = ($1, $2);
6456
2
3
                next unless exists $params->{$param};
6457
6458
2
4
                $params->{$param}{_default} = $self->_clean_default_value($value, 1);
6459
2
2
                $params->{$param}{optional} = 1;
6460
2
7
                $self->_log("  CODE: $param has default (||=): " . $self->_format_default($params->{$param}{_default}));
6461        }
6462
6463        # Pattern 6: $param //= 'default';
6464
293
335
        while ($code =~ /\$(\w+)\s*\/\/=\s*([^;]+);/g) {
6465
8
30
                my ($param, $value) = ($1, $2);
6466
8
11
                next unless exists $params->{$param};  # Using -> because $params is a reference
6467
6468
7
9
                $params->{$param}{_default} = $self->_clean_default_value($value, 1);
6469
6470
7
8
                $params->{$param}{optional} = 1;
6471
7
16
                $self->_log("  CODE: $param has default (//=): " . $self->_format_default($params->{$param}{_default}));
6472        }
6473
6474        # Pattern 7: $param = defined $param ? $param : 'default';
6475
293
370
        while ($code =~ /\$(\w+)\s*=\s*defined\s+\$\1\s*\?\s*\$\1\s*:\s*([^;]+);/g) {
6476
4
5
                my ($param, $value) = ($1, $2);
6477
6478                # Create param entry if it doesn't exist
6479
4
5
                $params->{$param} ||= {};
6480
6481
4
4
                my $cleaned = $self->_clean_default_value($value, 1);
6482
6483
4
4
                $params->{$param}{_default} = $cleaned;
6484
4
4
                $params->{$param}{optional} = 1;
6485
4
4
                $self->_log("  CODE: $param has default (ternary): " . $self->_format_default($params->{$param}{_default}));
6486        }
6487
6488        # Pattern 8: $param = $args{param} || 'default';
6489
293
366
        while ($code =~ /\$(\w+)\s*=\s*\$args\{['"]?\w+['"]?\}\s*\|\|\s*([^;]+);/g) {
6490
0
0
                my ($param, $value) = ($1, $2);
6491
0
0
                next unless exists $params->{$param};
6492
6493
0
0
                $params->{$param}{_default} = $self->_clean_default_value($value, 1);
6494
0
0
                $params->{$param}{optional} = 1;
6495
0
0
                $self->_log("  CODE: $param has default (from args): " . $self->_format_default($params->{$param}{_default}));
6496        }
6497
6498        # Pattern for non-empty hashref
6499
293
313
        while ($code =~ /\$(\w+)\s*\|\|=\s*\{[^}]+\}/gs) {
6500
1
1
                my $param = $1;
6501
1
1
                next unless exists $params->{$param};
6502
6503                # Return empty hashref as placeholder (can't evaluate complex hashrefs)
6504
1
2
                $params->{$param}{_default} = {};
6505
1
1
                $params->{$param}{optional} = 1;
6506
1
1
                $self->_log("  CODE: $param has hashref default (||=)");
6507        }
6508
6509        # Fallback: extract parameters from classic Perl body styles
6510        # Only run if signature extraction found nothing AND the code does not use
6511        # the direct-index ($_[0]) style — that style is used for no-param methods
6512        # whose empty %params would otherwise trigger this fallback and pick up
6513        # my (...) = @_ from inner closures as if they were method params.
6514        # TODO:  On constructors, use $class to help to determine the output type
6515
293
169
        if (!keys %{$params} && $code !~ /my\s+\$(?:self|class)\s*=\s*\$_\[0\]/) {
6516
156
107
                my $position = 0;
6517
6518                # Style 1: my ($a, $b) = @_;
6519
156
190
                while ($code =~ /my\s*\(\s*([^)]+)\s*\)\s*=\s*\@_/g) {
6520
24
70
                        my @vars = $1 =~ /\$(\w+)/g;
6521
24
26
                        foreach my $var (@vars) {
6522
24
96
                                if(($var eq 'class') && ($position == 0) && ($method->{name} eq 'new')) {
6523                                        # Don't include "class" in the variable names of the constructor
6524
13
19
                                        delete $params->{'class'};
6525                                } elsif(($var eq 'self') && ($position == 0) && ($method->{name} ne 'new')) {
6526                                        # Don't include "self" in the variable names
6527
11
20
                                        delete $params->{'self'};
6528                                } else {
6529
0
0
                                        $params->{$var} ||= { position => $position++ };
6530
0
0
                                        $self->_log("  CODE: $var extracted from \@_ list assignment");
6531                                }
6532                        }
6533                }
6534
6535                # Style 2: my $x = shift;
6536
156
175
                while ($code =~ /my\s+\$(\w+)\s*=\s*shift\b/g) {
6537
7
6
                        my $var = $1;
6538
7
30
                        if(($var eq 'class') && ($position == 0) && ($method->{name} eq 'new')) {
6539                                # Don't include "class" in the variable names of the constructor
6540
0
0
                                delete $params->{'class'};
6541                        } elsif(($var eq 'self') && ($position == 0) && ($method->{name} ne 'new')) {
6542                                # Don't include "self" in the variable names
6543
7
11
                                delete $params->{'self'};
6544                        } else {
6545
0
0
                                $params->{$var} ||= { position => $position++ };
6546
0
0
                                $self->_log("  CODE: $var is extracted from shift");
6547                        }
6548                }
6549
6550                # Style 3: my $x = $_[0];
6551
156
206
                while ($code =~ /my\s+\$(\w+)\s*=\s*\$_\[(\d+)\]/g) {
6552
0
0
                        my ($var, $index) = ($1, $2);
6553
0
0
                        if(($var ne 'class') || ($position > 0) || ($method->{name} ne 'new')) {
6554
0
0
                                $params->{$var} ||= { position => $index };
6555
0
0
                                $self->_log("  CODE: $var is extracted from \$_\[$index\]");
6556                        }
6557                }
6558        }
6559}
6560
6561# --------------------------------------------------
6562# _format_default
6563#
6564# Purpose:    Format a default value for display
6565#             in verbose log output.
6566#
6567# Entry:      $default - the default value to
6568#                        format. May be undef,
6569#                        a scalar, a hashref, or
6570#                        an arrayref.
6571#
6572# Exit:       Returns a display string: 'undef'
6573#             for undef, 'HASH ref' / 'ARRAY ref'
6574#             for references, or the value itself
6575#             for scalars.
6576#
6577# Side effects: None.
6578# --------------------------------------------------
6579sub _format_default {
6580
32
35
        my ($self, $default) = @_;
6581
32
27
        return 'undef' unless defined $default;
6582
30
37
        return ref($default) . ' ref' if ref($default);
6583
25
34
        return $default;
6584}
6585
6586# --------------------------------------------------
6587# _analyze_parameter_constraints
6588#
6589# Purpose:    Infer min, max, and regex match
6590#             constraints for a single parameter
6591#             from length checks, numeric
6592#             comparisons, and regex match
6593#             patterns in the method body.
6594#
6595# Entry:      $p_ref - reference to the parameter
6596#                      hashref (modified in place).
6597#             $param - parameter name string.
6598#             $code  - method body source string.
6599#
6600# Exit:       Returns nothing. Modifies the
6601#             referenced parameter hashref.
6602#
6603# Side effects: Logs detections to stdout when
6604#               verbose is set.
6605#
6606# Notes:      Numeric comparisons that appear
6607#             inside die/croak guard conditions
6608#             are excluded to avoid inferring
6609#             invalid-input ranges as valid
6610#             constraints.
6611# --------------------------------------------------
6612sub _analyze_parameter_constraints {
6613
188
205
        my ($self, $p_ref, $param, $code) = @_;
6614
188
134
        my $p = $$p_ref;
6615
6616        # Do not treat comparisons inside die/croak/confess as valid constraints
6617
188
117
        my $guarded = 0;
6618
188
3725
                if ($code =~ /(die|croak|confess)\b[^{;]*\bif\b[^{;]*\$$param\b/s) {
6619
22
18
                $guarded = 1;
6620        }
6621
6622        # Length checks for strings
6623
188
4345
        if ($code =~ /length\s*\(\s*\$$param\s*\)\s*([<>]=?)\s*(\d+)/) {
6624
4
9
                my ($op, $val) = ($1, $2);
6625
4
13
                $p->{type} ||= 'string';
6626
4
9
                if ($op eq '<') {
6627
2
4
                        $p->{max} = $val - 1;
6628                } elsif ($op eq '<=') {
6629
0
0
                        $p->{max} = $val;
6630                } elsif ($op eq '>') {
6631
1
2
                        $p->{min} = $val + 1;
6632                } elsif ($op eq '>=') {
6633
1
2
                        $p->{min} = $val;
6634                }
6635
4
8
                $self->_log("  CODE: $param length constraint $op $val");
6636        }
6637
6638        # Numeric range checks (only if NOT part of error guard)
6639
188
3382
        if (
6640                !$guarded
6641                && $code =~ /\$$param\s*([<>]=?)\s*([+-]?(?:\d+\.?\d*|\.\d+))/
6642        ) {
6643
11
19
                my ($op, $val) = ($1, $2);
6644
11
16
                $p->{type} ||= looks_like_number($val) ? 'number' : 'integer';
6645
6646
11
35
                if ($op eq '<' || $op eq '<=') {
6647                        # Only set max if it tightens the range
6648
1
2
                        my $max = ($op eq '<') ? $val - 1 : $val;
6649
1
3
                        $p->{max} = $max if !defined($p->{max}) || $max < $p->{max};
6650                } elsif ($op eq '>' || $op eq '>=') {
6651
10
15
                        my $min = ($op eq '>') ? $val + 1 : $val;
6652
10
28
                        $p->{min} = $min if !defined($p->{min}) || $min > $p->{min};
6653                }
6654        }
6655
6656        # Regex pattern matching with better capture
6657
188
10045
        if ($code =~ /\$$param\s*=~\s*((?:qr?\/[^\/]+\/|\$[\w:]+|\$\{\w+\}))/) {
6658
1
2
                my $pattern = $1;
6659
1
4
                $p->{type} ||= 'string';
6660
6661                # Clean up the pattern if it's a straightforward regex
6662
1
2
                if ($pattern =~ /^qr?\/([^\/]+)\/$/) {
6663
1
3
                        $p->{matches} = "/$1/";
6664                } else {
6665
0
0
                        $p->{matches} = $pattern;
6666                }
6667
1
3
                $self->_log("  CODE: $param matches pattern: $p->{matches}");
6668        }
6669}
6670
6671# --------------------------------------------------
6672# _analyze_parameter_validation
6673#
6674# Purpose:    Determine optionality and extract
6675#             default values for a single parameter
6676#             by analysing explicit required checks
6677#             (die/croak unless defined) and default
6678#             assignment patterns in the method body.
6679#
6680# Entry:      $p_ref - reference to the parameter
6681#                      hashref (modified in place).
6682#             $param - parameter name string.
6683#             $code  - method body source string.
6684#
6685# Exit:       Returns nothing. Modifies the
6686#             referenced parameter hashref.
6687#
6688# Side effects: Logs detections to stdout when
6689#               verbose is set.
6690#
6691# Notes:      Explicit required checks take highest
6692#             priority and override any default
6693#             value detected earlier.
6694# --------------------------------------------------
6695sub _analyze_parameter_validation {
6696
189
253
        my ($self, $p_ref, $param, $code) = @_;
6697
189
142
        my $p = $$p_ref;
6698
6699        # Required/optional checks
6700
189
127
        my $is_required = 0;
6701
6702        # Die/croak if not defined
6703
189
4701
        if ($code =~ /(?:die|croak|confess)\s+[^;]*unless\s+(?:defined\s+)?\$$param/s) {
6704
25
22
                $is_required = 1;
6705        }
6706
6707        # Extract default values with the new method
6708
189
299
        my $default_value = $self->_extract_default_value($param, $code);
6709
189
238
        if (defined $default_value && !exists $p->{_default}) {
6710
3
3
                $p->{optional} = 1;
6711
3
4
                $p->{_default} = $default_value;
6712
6713                # Try to infer type from default value if not already set
6714
3
5
                unless ($p->{type}) {
6715
3
6
                        if (looks_like_number($default_value)) {
6716
3
5
                                $p->{type} = $default_value =~ /\./ ? 'number' : 'integer';
6717                        } elsif (ref($default_value) eq 'ARRAY') {
6718
0
0
                                $p->{type} = 'arrayref';
6719                        } elsif (ref($default_value) eq 'HASH') {
6720
0
0
                                $p->{type} = 'hashref';
6721                        } elsif ($default_value eq 'undef') {
6722
0
0
                                $p->{type} = 'scalar';       # undef can be any scalar
6723                        } elsif (defined $default_value && !ref($default_value)) {
6724
0
0
                                $p->{type} = 'string';
6725                        }
6726                }
6727
6728
3
9
                $self->_log("  CODE: $param has default value: " . (ref($default_value) ? ref($default_value) . ' ref' : $default_value));
6729        }
6730
6731        # Also check for simple default assignment without condition
6732        # Pattern: $param = 'value';
6733
189
3530
        if (!$default_value && !exists $p->{_default} && $code =~ /\$$param\s*=\s*([^;{}]+?)(?:\s*[;}])/s) {
6734
6
10
                my $assignment = $1;
6735                # Make sure it's not part of a larger expression
6736
6
33
                if ($assignment !~ /\$$param/ && $assignment !~ /^shift/) {
6737
6
6
                        my $possible_default = $assignment;
6738
6
9
                        $possible_default =~ s/\s*;\s*$//;
6739
6
12
                        $possible_default = $self->_clean_default_value($possible_default);
6740
6
8
                        if (defined $possible_default) {
6741
6
8
                                $p->{_default} = $possible_default;
6742
6
9
                                $p->{optional} = 1;
6743
6
12
                                $self->_log("  CODE: $param has unconditional default: $possible_default");
6744                        }
6745                }
6746        }
6747
6748        # Explicit required check overrides default detection
6749
189
315
        if ($is_required) {
6750
25
27
                $p->{optional} = 0;
6751
25
34
                delete $p->{_default} if exists $p->{_default};
6752
25
42
                $self->_log("  CODE: $param is required (validation check)");
6753        }
6754}
6755
6756# --------------------------------------------------
6757# _merge_parameter_analyses
6758#
6759# Purpose:    Merge parameter information from POD,
6760#             code, and signature analysis into a
6761#             single authoritative parameter hashref
6762#             for each parameter.
6763#
6764# Entry:      $pod - hashref of parameters from POD
6765#                    analysis.
6766#             $code - hashref of parameters from
6767#                     code analysis.
6768#             $sig  - hashref of parameters from
6769#                     signature analysis (optional,
6770#                     defaults to empty hashref).
6771#
6772# Exit:       Returns a merged hashref of parameter
6773#             name to spec hashref. Each spec has
6774#             all available information combined,
6775#             with POD taking highest priority,
6776#             code second, and signature filling
6777#             remaining gaps.
6778#
6779# Side effects: Logs merged parameter details to
6780#               stdout when verbose is set.
6781#
6782# Notes:      Position is determined by majority
6783#             vote across all sources, with the
6784#             lowest position winning ties. Optional
6785#             status is determined by
6786#             _determine_optional_status. Internal
6787#             _source keys are stripped from the
6788#             merged result.
6789# --------------------------------------------------
6790sub _merge_parameter_analyses {
6791
286
344
        my ($self, $pod, $code, $sig) = @_;
6792
6793
286
190
        my %merged;
6794
6795        # Start with all parameters from all sources
6796
286
254
407
310
        my %all_params = map { $_ => 1 } (keys %$pod, keys %$code, keys %$sig);
6797
6798
286
331
        foreach my $param (keys %all_params) {
6799
188
199
                my $p = $merged{$param} = {};
6800
6801                # Collect position from all sources
6802
188
128
                my @positions;
6803
188
679
                push @positions, $pod->{$param}{position} if $pod->{$param} && defined $pod->{$param}{position};
6804
188
213
                push @positions, $sig->{$param}{position} if $sig->{$param} && defined $sig->{$param}{position};
6805
188
378
                push @positions, $code->{$param}{position} if $code->{$param} && defined $code->{$param}{position};
6806
6807                # Use the most common position, or lowest if tie
6808
188
170
                if (@positions) {
6809
182
119
                        my %pos_count;
6810
182
337
                        $pos_count{$_}++ for @positions;
6811
182
0
200
0
                        my ($best_pos) = sort { $pos_count{$b} <=> $pos_count{$a} || $a <=> $b } keys %pos_count;
6812
182
342
                        $p->{position} = $best_pos unless(exists($p->{position}));
6813                }
6814
6815                # POD has highest priority for type info and explicit declarations
6816
188
174
                if ($pod->{$param}) {
6817
69
69
54
125
                        %$p = (%$p, %{$pod->{$param}});
6818                }
6819
6820                # Code analysis adds concrete evidence (but doesn't override POD explicit types)
6821
188
178
                if ($code->{$param}) {
6822
185
185
129
246
                        foreach my $key (keys %{$code->{$param}}) {
6823
863
659
                                next if $key eq '_source';
6824
682
524
                                next if $key eq 'position';
6825
6826                                # Only override if POD didn't provide this info or it's a stronger signal
6827
502
321
                                my $from_pod = exists $pod->{$param};
6828
502
540
                                if (!exists $p->{$key} ||
6829                                   ($key eq 'type' && $from_pod && $p->{type} eq 'string' &&
6830                                   $code->{$param}{$key} ne 'string')) {
6831
455
420
                                        $p->{$key} = $code->{$param}{$key};
6832                                }
6833                        }
6834                }
6835
6836                # Signature fills in remaining gaps
6837
188
198
                if ($sig->{$param}) {
6838
0
0
0
0
                        foreach my $key (keys %{$sig->{$param}}) {
6839
0
0
                                next if $key eq '_source';
6840
0
0
                                next if $key eq 'position';
6841
0
0
                                $p->{$key} //= $sig->{$param}{$key};
6842                        }
6843                }
6844
6845                # Handle optional field with better logic
6846
188
319
                $self->_determine_optional_status($p, $pod->{$param}, $code->{$param});
6847
6848                # Clean up internal fields
6849
188
217
                delete $p->{_source};
6850        }
6851
6852        # Debug logging
6853
286
316
        if ($self->{verbose}) {
6854
2
1
4
3
                foreach my $param (sort { ($merged{$a}{position} || 999) <=> ($merged{$b}{position} || 999) } keys %merged) {
6855
2
2
                        my $p = $merged{$param};
6856                        $self->_log("  MERGED $param: " .
6857                                        'pos=' . ($p->{position} || 'none') .
6858                                        ", type=" . ($p->{type} || 'none') .
6859
2
5
                                        ", optional=" . (defined($p->{optional}) ? $p->{optional} : 'undef'));
6860                }
6861        }
6862
6863
286
509
        return \%merged;
6864}
6865
6866# --------------------------------------------------
6867# _determine_optional_status
6868#
6869# Purpose:    Set the optional field on a merged
6870#             parameter spec based on evidence from
6871#             POD and code analysis, with POD taking
6872#             highest priority.
6873#
6874# Entry:      $merged_param - the merged parameter
6875#                             hashref (modified in
6876#                             place).
6877#             $pod_param    - parameter spec from
6878#                             POD analysis, or undef.
6879#             $code_param   - parameter spec from
6880#                             code analysis, or undef.
6881#
6882# Exit:       Returns nothing. Sets or leaves
6883#             $merged_param->{optional}.
6884#
6885# Side effects: None.
6886# --------------------------------------------------
6887sub _determine_optional_status {
6888
194
1185
        my ($self, $merged_param, $pod_param, $code_param) = @_;
6889
6890
194
203
        my $pod_optional = $pod_param ? $pod_param->{optional} : undef;
6891
194
186
        my $code_optional = $code_param ? $code_param->{optional} : undef;
6892
6893        # Explicit POD declaration wins
6894
194
224
        if (defined $pod_optional) {
6895
16
18
                $merged_param->{optional} = $pod_optional;
6896        }
6897        # Code validation evidence
6898        elsif (defined $code_optional) {
6899
169
168
                $merged_param->{optional} = $code_optional;
6900        }
6901        # Default: if we have any info about the param, assume required
6902        elsif (keys %$merged_param > 0) {
6903
8
11
                $merged_param->{optional} = 0;
6904        }
6905        # Otherwise leave undef (unknown)
6906}
6907
6908
6909# --------------------------------------------------
6910# _calculate_input_confidence
6911#
6912# Purpose:    Calculate a confidence score and level
6913#             for the input parameter analysis,
6914#             based on how much type, constraint,
6915#             and semantic information was inferred
6916#             for each parameter.
6917#
6918# Entry:      $params - hashref of merged parameter
6919#                       specs as produced by
6920#                       _merge_parameter_analyses.
6921#
6922# Exit:       Returns a hashref with keys:
6923#               level         - one of: none,
6924#                               very_low, low,
6925#                               medium, high
6926#               score         - numeric average
6927#                               across all params
6928#               factors       - arrayref of
6929#                               human-readable
6930#                               factor strings
6931#               per_parameter - hashref of per-
6932#                               parameter score
6933#                               and factor detail
6934#             Returns { level => 'none', ... } if
6935#             no parameters were found.
6936#
6937# Side effects: None.
6938# --------------------------------------------------
6939sub _calculate_input_confidence {
6940
80
1319
        my ($self, $params) = @_;
6941
6942
80
63
        my @factors;  # Track all confidence factors
6943
6944
80
123
        return { level => 'none', factors => ['No parameters found'] } unless keys %$params;
6945
6946
58
49
        my $total_score = 0;
6947
58
42
        my $count = 0;
6948
58
41
        my %param_details;      # Store per-parameter analysis
6949
6950
58
71
        foreach my $param (keys %$params) {
6951
88
74
                my $p = $params->{$param};
6952
88
51
                my $score = 0;
6953
88
57
                my @param_factors;
6954
6955                # Type information
6956
88
85
                if ($p->{type}) {
6957
87
207
                        if ($p->{type} eq 'string' && ($p->{min} || $p->{max} || $p->{matches})) {
6958
6
8
                                $score += 25;
6959
6
7
                                push @param_factors, "Type: constrained string (+25)";
6960                        } elsif ($p->{type} eq 'string') {
6961
17
14
                                $score += 10;
6962
17
18
                                push @param_factors, "Type: plain string (+10)";
6963                        } else {
6964
64
94
                                $score += 30;
6965
64
79
                                push @param_factors, "Type: $p->{type} (+30)";
6966                        }
6967                } else {
6968
1
1
                        push @param_factors, "No type information (-0)";
6969                }
6970
6971                # Constraints
6972
88
95
                if (defined $p->{min}) {
6973
9
7
                        $score += 15;
6974
9
11
                        push @param_factors, 'Has min constraint (+15)';
6975                }
6976
88
92
                if (defined $p->{max}) {
6977
8
7
                        $score += 15;
6978
8
55
                        push @param_factors, "Has max constraint (+15)";
6979                }
6980
88
83
                if (defined $p->{optional}) {
6981
85
65
                        $score += 20;
6982
85
55
                        push @param_factors, "Optional/required explicitly defined (+20)";
6983                }
6984
88
86
                if ($p->{matches}) {
6985
2
3
                        $score += 20;
6986
2
2
                        push @param_factors, 'Has regex pattern constraint (+20)';
6987                }
6988
88
97
                if ($p->{isa}) {
6989
5
4
                        $score += 25;
6990
5
6
                        push @param_factors, "Specific class constraint: $p->{isa} (+25)";
6991                }
6992
6993                # Position information
6994
88
126
                if (defined $p->{position}) {
6995
85
50
                        $score += 10;
6996
85
93
                        push @param_factors, "Position defined: $p->{position} (+10)";
6997                }
6998
6999                # Default value
7000
88
85
                if (exists $p->{_default}) {
7001
21
14
                        $score += 10;
7002
21
16
                        push @param_factors, "Has default value (+10)";
7003                }
7004
7005                # Semantic information
7006
88
82
                if ($p->{semantic}) {
7007
15
13
                        $score += 15;
7008
15
11
                        push @param_factors, "Semantic type: $p->{semantic} (+15)";
7009                }
7010
7011
88
120
                $param_details{$param} = {
7012                        score => $score,
7013                        factors => \@param_factors
7014                };
7015
7016
88
62
                $total_score += $score;
7017
88
82
                $count++;
7018        }
7019
7020
58
136
        my $avg = $count ? ($total_score / $count) : 0;
7021
7022        # Build summary factors
7023
58
131
        push @factors, sprintf("Analyzed %d parameter%s", $count, $count == 1 ? '' : 's');
7024
58
266
        push @factors, sprintf("Average confidence score: %.1f", $avg);
7025
7026        # Add top contributing factors
7027
58
39
93
50
        my @sorted_params = sort { $param_details{$b}{score} <=> $param_details{$a}{score} } keys %param_details;
7028
7029
58
66
        if (@sorted_params) {
7030
58
52
                my $highest = $sorted_params[0];
7031
58
56
                my $highest_score = $param_details{$highest}{score};
7032
58
92
                push @factors, sprintf("Highest scoring parameter: \$$highest (score: %d)", $highest_score);
7033
7034
58
68
                if (@sorted_params > 1) {
7035
20
15
                        my $lowest = $sorted_params[-1];
7036
20
19
                        my $lowest_score = $param_details{$lowest}{score};
7037
20
30
                        push @factors, sprintf("Lowest scoring parameter: \$$lowest (score: %d)", $lowest_score);
7038                }
7039        }
7040
7041        # Determine confidence level
7042
58
44
        my $level;
7043
58
91
        if ($avg >= $CONFIDENCE_HIGH_THRESHOLD) {
7044
43
137
                $level = $LEVEL_HIGH;
7045
43
100
                push @factors, "High confidence: comprehensive type and constraint information";
7046        } elsif ($avg >= $CONFIDENCE_MEDIUM_THRESHOLD) {
7047
12
54
                $level = $LEVEL_MEDIUM;
7048
12
26
                push @factors, "Medium confidence: some type or constraint information present";
7049        } elsif ($avg >= $CONFIDENCE_LOW_THRESHOLD) {
7050
1
11
                $level = $LEVEL_LOW;
7051
1
5
                push @factors, "Low confidence: minimal type information";
7052        } else {
7053
2
18
                $level = $LEVEL_VERY_LOW;
7054
2
4
                push @factors, "Very low confidence: little to no type information";
7055        }
7056
7057        return {
7058
58
180
                level => $level,
7059                score => $avg,
7060                factors => \@factors,
7061                per_parameter => \%param_details
7062        };
7063}
7064
7065# --------------------------------------------------
7066# _calculate_output_confidence
7067#
7068# Purpose:    Calculate a confidence score and level
7069#             for the output analysis based on how
7070#             much return type, value, class,
7071#             context, and error convention
7072#             information was determined.
7073#
7074# Entry:      $output - the output hashref as built
7075#                       by _analyze_output.
7076#
7077# Exit:       Returns a hashref with keys:
7078#               level   - one of: none, very_low,
7079#                         low, medium, high
7080#               score   - numeric confidence score
7081#               factors - arrayref of factor strings
7082#             Returns { level => 'none', ... } if
7083#             output is empty.
7084#
7085# Side effects: None.
7086# --------------------------------------------------
7087sub _calculate_output_confidence {
7088
296
928
        my ($self, $output) = @_;
7089
7090
296
187
        my @factors;
7091
7092
296
314
        return { level => 'none', factors => ['No return information found'] } unless keys %$output;
7093
7094
279
198
        my $score = 0;
7095
7096        # Type information
7097
279
268
        if ($output->{type}) {
7098
274
196
                $score += 30;
7099
274
371
                push @factors, "Return type defined: $output->{type} (+30)";
7100        } else {
7101
5
7
                push @factors, 'No return type information (-0)';
7102        }
7103
7104        # Specific value known
7105
279
312
        if (defined $output->{value}) {
7106
23
20
                $score += 30;
7107
23
30
                push @factors, "Specific return value: $output->{value} (+30)";
7108        }
7109
7110        # Class information for objects
7111
279
265
        if ($output->{isa}) {
7112
22
21
                $score += 30;
7113
22
27
                push @factors, "Returns specific class: $output->{isa} (+30)";
7114        }
7115
7116        # Context-aware returns
7117
279
250
        if ($output->{_context_aware}) {
7118
4
1
                $score += 20;
7119
4
4
                push @factors, "Context-aware return (wantarray) (+20)";
7120
7121
4
5
                if ($output->{_list_context}) {
7122
4
4
                        push @factors, "  List context: $output->{_list_context}{type}";
7123                }
7124
4
4
                if ($output->{_scalar_context}) {
7125
3
3
                        push @factors, "  Scalar context: $output->{_scalar_context}{type}";
7126                }
7127        }
7128
7129        # Error handling information
7130
279
277
        if ($output->{_error_return}) {
7131
13
7
                $score += 15;
7132
13
15
                push @factors, "Error return convention documented: $output->{_error_return} (+15)";
7133        }
7134
7135        # Success/failure pattern
7136
279
259
        if ($output->{_success_failure_pattern}) {
7137
3
3
                $score += 10;
7138
3
3
                push @factors, 'Success/failure pattern detected (+10)';
7139        }
7140
7141        # Chainable methods
7142
279
265
        if ($output->{_returns_self}) {
7143
6
4
                $score += 15;
7144
6
7
                push @factors, "Chainable method (fluent interface) (+15)";
7145        }
7146
7147        # Void context
7148
279
262
        if ($output->{_void_context}) {
7149
5
5
                $score += 20;
7150
5
5
                push @factors, "Void context method (no meaningful return) (+20)";
7151        }
7152
7153        # Exception handling
7154
279
274
        if ($output->{_error_handling} && $output->{_error_handling}{exception_handling}) {
7155
2
2
                $score += 10;
7156
2
2
                push @factors, 'Exception handling present (+10)';
7157        }
7158
7159
279
472
        push @factors, sprintf("Total output confidence score: %d", $score);
7160
7161        # Determine confidence level
7162
279
198
        my $level;
7163
279
510
        if ($score >= $CONFIDENCE_HIGH_THRESHOLD) {
7164
48
138
                $level = $LEVEL_HIGH;
7165
48
108
                push @factors, "High confidence: detailed return type and behavior";
7166        } elsif ($score >= $CONFIDENCE_MEDIUM_THRESHOLD) {
7167
15
72
                $level = $LEVEL_MEDIUM;
7168
15
32
                push @factors, "Medium confidence: return type defined";
7169        } elsif ($score >= $CONFIDENCE_LOW_THRESHOLD) {
7170
215
1268
                $level = $LEVEL_LOW;
7171
215
460
                push @factors, "Low confidence: minimal return information";
7172        } else {
7173
1
6
                $level = $LEVEL_VERY_LOW;
7174
1
3
                push @factors, 'Very low confidence: little return information';
7175        }
7176
7177        return {
7178
279
714
                level => $level,
7179                score => $score,
7180                factors => \@factors
7181        };
7182}
7183
7184# --------------------------------------------------
7185# _generate_confidence_report
7186#
7187# Purpose:    Generate a human-readable text report
7188#             of all confidence factors for a
7189#             schema, for debugging and review
7190#             purposes.
7191#
7192# Entry:      $schema - schema hashref containing
7193#                       a populated _analysis key.
7194#
7195# Exit:       Returns a multi-line string report,
7196#             or nothing if $schema->{_analysis}
7197#             is absent.
7198#
7199# Side effects: None.
7200# --------------------------------------------------
7201sub _generate_confidence_report
7202{
7203
3
14
        my ($self, $schema) = @_;
7204
7205
3
6
        return unless $schema->{_analysis};
7206
7207
2
2
        my $analysis = $schema->{_analysis};
7208
2
1
        my @report;
7209
7210
2
5
        push @report, "Confidence Analysis for " . ($schema->{method_name} || 'method');
7211
2
3
        push @report, '=' x 60;
7212
2
1
        push @report, '';
7213
7214
2
3
        push @report, "Overall Confidence: " . uc($analysis->{overall_confidence});
7215
2
2
        push @report, '';
7216
7217
2
3
        if ($analysis->{confidence_factors}{input}) {
7218                push @report, (
7219                        "Input Parameters:",
7220                         "  Confidence Level: " . uc($analysis->{input_confidence})
7221
2
3
                );
7222
2
2
2
2
                foreach my $factor (@{$analysis->{confidence_factors}{input}}) {
7223
2
3
                        push @report, "  - $factor";
7224                }
7225
2
2
                push @report, '';
7226        }
7227
7228
2
3
        if ($analysis->{confidence_factors}{output}) {
7229                push @report, 'Return Value:',
7230
2
2
                        "  Confidence Level: " . uc($analysis->{output_confidence});
7231
2
2
2
2
                foreach my $factor (@{$analysis->{confidence_factors}{output}}) {
7232
2
3
                        push @report, "  - $factor";
7233                }
7234
2
2
                push @report, '';
7235        }
7236
7237
2
2
        if ($analysis->{per_parameter_scores}) {
7238
0
0
                push @report, 'Per-Parameter Analysis:';
7239
0
0
0
0
                foreach my $param (sort keys %{$analysis->{per_parameter_scores}}) {
7240
0
0
                        my $details = $analysis->{per_parameter_scores}{$param};
7241
0
0
                        push @report, "  \$$param (score: $details->{score}):";
7242
0
0
0
0
                        foreach my $factor (@{$details->{factors}}) {
7243
0
0
                                push @report, "    - $factor";
7244                        }
7245                }
7246
0
0
                push @report, '';
7247        }
7248
7249
2
7
        return join("\n", @report);
7250}
7251
7252# --------------------------------------------------
7253# _generate_notes
7254#
7255# Purpose:    Generate human-readable advisory notes
7256#             about parameters whose type or
7257#             optionality could not be determined,
7258#             to guide manual schema review.
7259#
7260# Entry:      $params - hashref of merged parameter
7261#                       specs.
7262#
7263# Exit:       Returns an arrayref of note strings.
7264#             Returns an empty arrayref if all
7265#             parameters have known types and
7266#             optionality.
7267#
7268# Side effects: None.
7269# --------------------------------------------------
7270sub _generate_notes {
7271
290
1327
        my ($self, $params) = @_;
7272
7273
290
194
        my @notes;
7274
7275
290
389
        foreach my $param (keys %$params) {
7276
190
152
                my $p = $params->{$param};
7277
7278
190
205
                unless ($p->{type}) {
7279
57
59
                        push @notes, "$param: type unknown - please review - will set to 'string' as a default";
7280                }
7281
7282
190
210
                unless (defined $p->{optional}) {
7283
11
14
                        push @notes, "$param: optional status unknown";
7284                        # Don't automatically set - let it be undef if we don't know
7285                }
7286        }
7287
7288
290
363
        return \@notes;
7289}
7290
7291# --------------------------------------------------
7292# _set_defaults
7293#
7294# Purpose:    Apply default type values to any
7295#             parameters in a schema mode (input
7296#             or output) whose type was not set
7297#             during analysis, setting them to
7298#             'string' as a conservative fallback.
7299#
7300# Entry:      $schema - the schema hashref being
7301#                       built by _analyze_method.
7302#             $mode   - either 'input' or 'output'.
7303#
7304# Exit:       Returns nothing. Modifies $schema in
7305#             place by setting type => 'string' on
7306#             any parameter that lacks a type, and
7307#             downgrading input confidence to 'low'.
7308#
7309# Side effects: Logs type defaulting to stdout when
7310#               verbose is set.
7311#
7312# Notes:      Called after all analysis is complete
7313#             so that genuine type unknowns can be
7314#             distinguished from analysis gaps.
7315# --------------------------------------------------
7316sub _set_defaults {
7317
572
443
        my ($self, $schema, $mode) = @_;
7318
7319
572
410
        my $params = $schema->{$mode};
7320
7321
572
597
        foreach my $param (keys %$params) {
7322
716
515
                my $p = $params->{$param};
7323
7324
716
751
                next unless(ref($p) eq 'HASH');
7325
209
222
                unless ($p->{type}) {
7326
71
126
                        $self->_log("  DEBUG {$mode}{$param}: Setting to 'string' as a default");
7327
71
64
                        $p->{'type'} = 'string';
7328
71
82
                        $schema->{_confidence}{$mode}->{level} = 'low';   # Setting a default means it's a guess
7329                }
7330        }
7331}
7332
7333# --------------------------------------------------
7334# _analyze_relationships
7335#
7336# Purpose:    Detect inter-parameter relationships
7337#             in a method's source code, including
7338#             mutually exclusive parameters, required
7339#             groups, conditional requirements,
7340#             dependencies, and value-based
7341#             constraints.
7342#
7343# Entry:      $method - method hashref containing
7344#                       at minimum a 'body' key
7345#                       with the source string.
7346#
7347# Exit:       Returns an arrayref of relationship
7348#             hashrefs. Returns an empty arrayref
7349#             if no parameters or no relationships
7350#             are found.
7351#
7352# Side effects: Logs detections to stdout when
7353#               verbose is set.
7354#
7355# Notes:      Parameter names are extracted via
7356#             _extract_parameters_from_signature, so
7357#             every style it supports -- my (...) =
7358#             @_, shift-style (my $x = shift), direct-
7359#             index ($_[N]), and modern signatures --
7360#             is analysed for relationships, not just
7361#             the my (...) = @_ list-assignment form.
7362# --------------------------------------------------
7363sub _analyze_relationships {
7364
291
250
        my ($self, $method) = @_;
7365
7366
291
250
        my $code = $method->{body};
7367
291
207
        my @relationships;
7368
7369        # Extract all parameter names from the method, using the same
7370        # multi-style detection used for schema population so shift-style
7371        # and modern-signature methods get relationship analysis too
7372        my %params;
7373
291
376
        $self->_extract_parameters_from_signature(\%params, $code);
7374
291
76
367
116
        my @param_names = sort { $params{$a}{position} <=> $params{$b}{position} } keys %params;
7375
7376
291
358
        return [] unless @param_names;
7377
7378        # Detect mutually exclusive parameters
7379
125
125
91
175
        push @relationships, @{$self->_detect_mutually_exclusive($code, \@param_names)};
7380
7381        # Detect required groups (OR logic)
7382
125
125
103
168
        push @relationships, @{$self->_detect_required_groups($code, \@param_names)};
7383
7384        # Detect conditional requirements (IF-THEN)
7385
125
125
94
151
        push @relationships, @{$self->_detect_conditional_requirements($code, \@param_names)};
7386
7387        # Detect dependencies
7388
125
125
94
169
        push @relationships, @{$self->_detect_dependencies($code, \@param_names)};
7389
7390        # Detect value-based constraints
7391
125
125
98
153
        push @relationships, @{$self->_detect_value_constraints($code, \@param_names)};
7392
7393        # Deduplicate relationships
7394
125
162
        my @unique = $self->_deduplicate_relationships(\@relationships);
7395
7396
125
242
        return \@unique;
7397}
7398
7399# --------------------------------------------------
7400# _deduplicate_relationships
7401#
7402# Purpose:    Remove duplicate relationship entries
7403#             from the relationships list by
7404#             computing a canonical signature for
7405#             each relationship type.
7406#
7407# Entry:      $relationships - arrayref of
7408#                              relationship hashrefs.
7409#
7410# Exit:       Returns a deduplicated list of
7411#             relationship hashrefs.
7412#
7413# Side effects: None.
7414# --------------------------------------------------
7415sub _deduplicate_relationships {
7416
129
1050
        my ($self, $relationships) = @_;
7417
7418
129
90
        my @unique;
7419        my %seen;
7420
7421
129
114
        foreach my $rel (@$relationships) {
7422                # Create a signature for this relationship
7423
31
19
                my $sig;
7424
31
46
                if ($rel->{type} eq 'mutually_exclusive') {
7425
13
13
8
23
                        $sig = join(':', 'mutex', sort @{$rel->{params}});
7426                } elsif ($rel->{type} eq 'required_group') {
7427
5
5
5
11
                        $sig = join(':', 'reqgroup', sort @{$rel->{params}});
7428                } elsif ($rel->{type} eq 'conditional_requirement') {
7429
7
8
                        $sig = join(':', 'condreq', $rel->{if}, $rel->{then_required});
7430                } elsif ($rel->{type} eq 'dependency') {
7431
3
3
                        $sig = join(':', 'dep', $rel->{param}, $rel->{requires});
7432                } elsif ($rel->{type} eq 'value_constraint') {
7433
2
3
                        $sig = join(':', 'valcon', $rel->{if}, $rel->{then}, $rel->{operator}, $rel->{value});
7434                } elsif ($rel->{type} eq 'value_conditional') {
7435
1
2
                        $sig = join(':', 'valcond', $rel->{if}, $rel->{equals}, $rel->{then_required});
7436                } else {
7437
0
0
                        $sig = join(':', $rel->{type}, %$rel);
7438                }
7439
7440
31
69
                unless ($seen{$sig}++) {
7441
25
21
                        push @unique, $rel;
7442                }
7443        }
7444
7445
129
155
        return @unique;
7446}
7447
7448# --------------------------------------------------
7449# _detect_mutually_exclusive
7450#
7451# Purpose:    Detect pairs of parameters that cannot
7452#             be specified together, by searching
7453#             for die/croak/confess patterns
7454#             that fire when both are truthy.
7455#
7456# Entry:      $code        - method body source string.
7457#             $param_names - arrayref of parameter
7458#                            name strings.
7459#
7460# Exit:       Returns an arrayref of relationship
7461#             hashrefs of type 'mutually_exclusive'.
7462#             Returns an empty arrayref if none found.
7463#
7464# Side effects: Logs detections to stdout when
7465#               verbose is set.
7466# --------------------------------------------------
7467sub _detect_mutually_exclusive {
7468
130
1416
        my ($self, $code, $param_names) = @_;
7469
7470
130
84
        my @relationships;
7471
7472        # Pattern 1: die/croak if $x && $y
7473        # Look for: die/croak ... if $param1 && $param2
7474
130
128
        foreach my $param1 (@$param_names) {
7475
196
161
                foreach my $param2 (@$param_names) {
7476
392
413
                        next if $param1 eq $param2;
7477
7478                        # Check various patterns
7479
196
8665
                        if ($code =~ /(?:die|croak|confess)[^;]*if\s+\$$param1\s+&&\s+\$$param2/ ||
7480                            $code =~ /(?:die|croak|confess)[^;]*if\s+\$$param2\s+&&\s+\$$param1/) {
7481
7482                                # Avoid duplicates (param1,param2 vs param2,param1)
7483
22
20
                                my $found_reverse = 0;
7484
22
24
                                foreach my $rel (@relationships) {
7485
13
34
                                        if ($rel->{type} eq 'mutually_exclusive' &&
7486                                            (($rel->{params}[0] eq $param2 && $rel->{params}[1] eq $param1))) {
7487
11
9
                                                $found_reverse = 1;
7488
11
9
                                                last;
7489                                        }
7490                                }
7491
7492
22
33
                                next if $found_reverse;
7493
7494
11
27
                                push @relationships, {
7495                                        type => 'mutually_exclusive',
7496                                        params => [$param1, $param2],
7497                                        description => "Cannot specify both $param1 and $param2"
7498                                };
7499
7500
11
19
                                $self->_log("  RELATIONSHIP: $param1 and $param2 are mutually exclusive");
7501                        }
7502
7503                        # Pattern 2: die "Cannot specify both X and Y"
7504
185
16003
                        if ($code =~ /(?:die|croak|confess)\s+['"](Cannot|Can't)[^'"]*both[^'"]*$param1[^'"]*$param2/i ||
7505                            $code =~ /(?:die|croak|confess)\s+['"](Cannot|Can't)[^'"]*both[^'"]*$param2[^'"]*$param1/i) {
7506
7507
1
1
                                my $found_reverse = 0;
7508
1
2
                                foreach my $rel (@relationships) {
7509
1
3
                                        if ($rel->{type} eq 'mutually_exclusive' &&
7510                                            (($rel->{params}[0] eq $param2 && $rel->{params}[1] eq $param1))) {
7511
0
0
                                                $found_reverse = 1;
7512
0
0
                                                last;
7513                                        }
7514                                }
7515
7516
1
1
                                next if $found_reverse;
7517
7518
1
3
                                push @relationships, {
7519                                        type => 'mutually_exclusive',
7520                                        params => [$param1, $param2],
7521                                        description => "Cannot specify both $param1 and $param2"
7522                                };
7523
7524
1
2
                                $self->_log("  RELATIONSHIP: $param1 and $param2 are mutually exclusive (from error message)");
7525                        }
7526                }
7527        }
7528
7529
130
164
        return \@relationships;
7530}
7531
7532# --------------------------------------------------
7533# _detect_required_groups
7534#
7535# Purpose:    Detect parameter groups where at least
7536#             one parameter must be specified (OR
7537#             logic), by searching for die/croak
7538#             patterns that fire unless any of the
7539#             group is truthy.
7540#
7541# Entry:      $code        - method body source string.
7542#             $param_names - arrayref of parameter
7543#                            name strings.
7544#
7545# Exit:       Returns an arrayref of relationship
7546#             hashrefs of type 'required_group'.
7547#             Returns an empty arrayref if none found.
7548#
7549# Side effects: Logs detections to stdout when
7550#               verbose is set.
7551# --------------------------------------------------
7552sub _detect_required_groups {
7553
128
1072
        my ($self, $code, $param_names) = @_;
7554
7555
128
86
        my @relationships;
7556
7557        # Pattern 1: die/croak unless $x || $y
7558
128
129
        foreach my $param1 (@$param_names) {
7559
192
150
                foreach my $param2 (@$param_names) {
7560
384
363
                        next if $param1 eq $param2;
7561
7562
192
7799
                        if ($code =~ /(?:die|croak|confess)[^;]*unless\s+\$$param1\s+\|\|\s+\$$param2/ ||
7563                            $code =~ /(?:die|croak|confess)[^;]*unless\s+\$$param2\s+\|\|\s+\$$param1/) {
7564
7565                                # Avoid duplicates
7566
10
11
                                my $found_reverse = 0;
7567
10
11
                                foreach my $rel (@relationships) {
7568
5
20
                                        if ($rel->{type} eq 'required_group' &&
7569                                            (($rel->{params}[0] eq $param2 && $rel->{params}[1] eq $param1))) {
7570
5
5
                                                $found_reverse = 1;
7571
5
4
                                                last;
7572                                        }
7573                                }
7574
7575
10
18
                                next if $found_reverse;
7576
7577
5
15
                                push @relationships, {
7578                                        type => 'required_group',
7579                                        params => [$param1, $param2],
7580                                        logic => 'or',
7581                                        description => "Must specify either $param1 or $param2"
7582                                };
7583
7584
5
23
                                $self->_log("  RELATIONSHIP: Must specify either $param1 or $param2");
7585                        }
7586
7587                        # Pattern 2: die "Must specify either X or Y"
7588
187
16374
                        if ($code =~ /(?:die|croak|confess)\s+['"]Must\s+specify\s+either[^'"]*$param1[^'"]*or[^'"]*$param2/i ||
7589                            $code =~ /(?:die|croak|confess)\s+['"]Must\s+specify\s+either[^'"]*$param2[^'"]*or[^'"]*$param1/i) {
7590
7591
1
1
                                my $found_reverse = 0;
7592
1
2
                                foreach my $rel (@relationships) {
7593
1
3
                                        if ($rel->{type} eq 'required_group' &&
7594                                            (($rel->{params}[0] eq $param2 && $rel->{params}[1] eq $param1))) {
7595
0
0
                                                $found_reverse = 1;
7596
0
0
                                                last;
7597                                        }
7598                                }
7599
7600
1
1
                                next if $found_reverse;
7601
7602
1
3
                                push @relationships, {
7603                                        type => 'required_group',
7604                                        params => [$param1, $param2],
7605                                        logic => 'or',
7606                                        description => "Must specify either $param1 or $param2"
7607                                };
7608
7609
1
2
                                $self->_log("  RELATIONSHIP: Must specify either $param1 or $param2 (from error message)");
7610                        }
7611                }
7612        }
7613
7614
128
133
        return \@relationships;
7615}
7616
7617# --------------------------------------------------
7618# _detect_conditional_requirements
7619#
7620# Purpose:    Detect IF-THEN parameter relationships
7621#             where one parameter being present
7622#             makes another required, by searching
7623#             for die/croak patterns of the form
7624#             'die if $x && !$y'.
7625#
7626# Entry:      $code        - method body source string.
7627#             $param_names - arrayref of parameter
7628#                            name strings.
7629#
7630# Exit:       Returns an arrayref of relationship
7631#             hashrefs of type
7632#             'conditional_requirement'.
7633#             Returns an empty arrayref if none found.
7634#
7635# Side effects: Logs detections to stdout when
7636#               verbose is set.
7637# --------------------------------------------------
7638sub _detect_conditional_requirements {
7639
128
1647
        my ($self, $code, $param_names) = @_;
7640
7641
128
90
        my @relationships;
7642
7643
128
99
        foreach my $param1 (@$param_names) {
7644
192
138
                foreach my $param2 (@$param_names) {
7645
384
353
                        next if $param1 eq $param2;
7646
7647                        # Pattern 1: die if $x && !$y  (if x then y required)
7648
192
3918
                        if ($code =~ /(?:die|croak|confess)[^;]*if\s+\$$param1\s+&&\s+!\$$param2/) {
7649
5
17
                                push @relationships, {
7650                                        type => 'conditional_requirement',
7651                                        if => $param1,
7652                                        then_required => $param2,
7653                                        description => "When $param1 is specified, $param2 is required"
7654                                };
7655
7656
5
9
                                $self->_log("  RELATIONSHIP: $param1 requires $param2");
7657                        }
7658
7659                        # Pattern 2: die if $x && !defined($y)
7660
192
5710
                        if ($code =~ /(?:die|croak|confess)[^;]*if\s+\$$param1\s+&&\s+!defined\s*\(\s*\$$param2\s*\)/) {
7661
0
0
                                push @relationships, {
7662                                        type => 'conditional_requirement',
7663                                        if => $param1,
7664                                        then_required => $param2,
7665                                        description => "When $param1 is specified, $param2 is required"
7666                                };
7667
7668
0
0
                                $self->_log("  RELATIONSHIP: $param1 requires $param2 (defined check)");
7669                        }
7670
7671                        # Pattern 3: Error message "X requires Y"
7672
192
8154
                        if ($code =~ /(?:die|croak|confess)\s+['"]\w*$param1[^'"]*requires[^'"]*$param2/i) {
7673
4
8
                                push @relationships, {
7674                                        type => 'conditional_requirement',
7675                                        if => $param1,
7676                                        then_required => $param2,
7677                                        description => "When $param1 is specified, $param2 is required"
7678                                };
7679
7680
4
5
                                $self->_log("  RELATIONSHIP: $param1 requires $param2 (from error message)");
7681                        }
7682                }
7683        }
7684
7685
128
139
        return \@relationships;
7686}
7687
7688# --------------------------------------------------
7689# _detect_dependencies
7690#
7691# Purpose:    Detect simple parameter dependencies
7692#             where one parameter requires another
7693#             to also be present, by combining
7694#             error message pattern matching with
7695#             code condition matching.
7696#
7697# Entry:      $code        - method body source string.
7698#             $param_names - arrayref of parameter
7699#                            name strings.
7700#
7701# Exit:       Returns an arrayref of relationship
7702#             hashrefs of type 'dependency'.
7703#             Returns an empty arrayref if none found.
7704#
7705# Side effects: Logs detections to stdout when
7706#               verbose is set.
7707# --------------------------------------------------
7708sub _detect_dependencies {
7709
129
27480
        my ($self, $code, $param_names) = @_;
7710
7711
129
90
        my @relationships;
7712
7713
129
119
        foreach my $param1 (@$param_names) {
7714
194
154
                foreach my $param2 (@$param_names) {
7715
388
357
                        next if $param1 eq $param2;
7716
7717                        # Pattern 1: Error message mentions "X requires Y" AND code checks $x && !$y
7718                        # Split into two checks to be more flexible
7719
194
7762
                        if (($code =~ /(?:die|croak|confess)\s+['"]\w*$param1[^'"]*requires[^'"]*$param2/i) &&
7720                            ($code =~ /if\s+\$$param1\s+&&\s+!\$$param2/)) {
7721
7722
6
22
                                push @relationships, {
7723                                        type => 'dependency',
7724                                        param => $param1,
7725                                        requires => $param2,
7726                                        description => "$param1 requires $param2 to be specified"
7727                                };
7728
7729
6
10
                                $self->_log("  RELATIONSHIP: $param1 depends on $param2");
7730                        }
7731                }
7732        }
7733
7734
129
115
        return \@relationships;
7735}
7736
7737# --------------------------------------------------
7738# _detect_value_constraints
7739#
7740# Purpose:    Detect value-based constraints between
7741#             parameters, such as 'if $ssl then
7742#             $port must equal 443' or 'if $mode
7743#             eq secure then $key is required'.
7744#
7745# Entry:      $code        - method body source string.
7746#             $param_names - arrayref of parameter
7747#                            name strings.
7748#
7749# Exit:       Returns an arrayref of relationship
7750#             hashrefs of type 'value_constraint'
7751#             or 'value_conditional'.
7752#             Returns an empty arrayref if none found.
7753#
7754# Side effects: Logs detections to stdout when
7755#               verbose is set.
7756# --------------------------------------------------
7757sub _detect_value_constraints {
7758
126
127
        my ($self, $code, $param_names) = @_;
7759
7760
126
89
        my @relationships;
7761
7762
126
103
        foreach my $param1 (@$param_names) {
7763
188
142
                foreach my $param2 (@$param_names) {
7764
376
325
                        next if $param1 eq $param2;
7765
7766                        # Pattern 1: die if $x && $y != value
7767
188
5271
                        if ($code =~ /(?:die|croak|confess)[^;]*if\s+\$$param1\s+&&\s+\$$param2\s*!=\s*(\d+)/) {
7768
3
5
                                my $value = $1;
7769
3
11
                                push @relationships, {
7770                                        type => 'value_constraint',
7771                                        if => $param1,
7772                                        then => $param2,
7773                                        operator => '==',
7774                                        value => $value,
7775                                        description => "When $param1 is specified, $param2 must equal $value"
7776                                };
7777
7778
3
6
                                $self->_log("  RELATIONSHIP: $param1 requires $param2 == $value");
7779                        }
7780
7781                        # Pattern 2: die if $x && $y < value
7782
188
5136
                        if ($code =~ /(?:die|croak|confess)[^;]*if\s+\$$param1\s+&&\s+\$$param2\s*<\s*(\d+)/) {
7783
0
0
                                my $value = $1;
7784
0
0
                                push @relationships, {
7785                                        type => 'value_constraint',
7786                                        if => $param1,
7787                                        then => $param2,
7788                                        operator => '>=',
7789                                        value => $value,
7790                                        description => "When $param1 is specified, $param2 must be >= $value"
7791                                };
7792
7793
0
0
                                $self->_log("  RELATIONSHIP: $param1 requires $param2 >= $value");
7794                        }
7795
7796                        # Pattern 3: die if $x eq 'value' && !$y
7797
188
6595
                        if ($code =~ /(?:die|croak|confess)[^;]*if\s+\$$param1\s+eq\s+['"]([^'"]+)['"]\s+&&\s+!\$$param2/) {
7798
1
2
                                my $value = $1;
7799
1
4
                                push @relationships, {
7800                                        type => 'value_conditional',
7801                                        if => $param1,
7802                                        equals => $value,
7803                                        then_required => $param2,
7804                                        description => "When $param1 equals '$value', $param2 is required"
7805                                };
7806
7807
1
2
                                $self->_log("  RELATIONSHIP: $param1='$value' requires $param2");
7808                        }
7809                }
7810        }
7811
7812
126
120
        return \@relationships;
7813}
7814
7815# Write a single method schema to a YAML file in output_dir.
7816#
7817# Entry:      $method_name is a non-empty string; $schema is a hashref.
7818# Exit:       YAML file written to output_dir/$method_name.yml.
7819# Side effects: Creates output_dir if it does not exist.
7820# Notes:      Croaks if output_dir was not set in new().
7821
7822sub _write_schema {
7823
105
5011
        my ($self, $method_name, $schema) = @_;
7824
7825        # output_dir is required here — croak early with a clear message
7826        # rather than letting make_path fail with a cryptic error
7827
105
139
        croak(__PACKAGE__, ': output_dir must be provided to new() when writing schema files') unless defined $self->{output_dir};
7828
7829
104
1318
        make_path($self->{output_dir}) unless -d $self->{output_dir};
7830
7831
104
127
        my $filename = "$self->{output_dir}/${method_name}.yml";
7832
7833        # Configure YAML::XS to not quote numeric strings
7834
104
101
        local $YAML::XS::QuoteNumericStrings = 0;
7835
7836        # Extract package name for module field
7837
104
107
        my $package_name = '';
7838
104
209
        if ($self->{_document}) {
7839
103
155
                my $package_stmt = $self->{_document}->find_first('PPI::Statement::Package');
7840
103
17229
                $package_name = $package_stmt ? $package_stmt->namespace : '';
7841
103
1256
                $self->{_package_name} //= $package_name;
7842        }
7843
7844        # Clean up schema for output - use the format expected by App::Test::Generator::Template
7845
104
367
        my $output = {
7846                function => $method_name,
7847                module => $package_name,
7848                config => {
7849                        close_stdin => 0,
7850                        dedup => 1,
7851                        test_nuls => 0,
7852                        test_undef => 0,
7853                        test_empty => 1,
7854                        test_non_ascii => 0,
7855                        test_security => 0
7856                }
7857        };
7858
7859        # Process input parameters with advanced type handling
7860
104
129
        if($schema->{'input'}) {
7861
101
101
67
125
                if(scalar(keys %{$schema->{'input'}})) {
7862
84
110
                        $output->{'input'} = {};
7863
7864
84
84
61
130
                        foreach my $param_name (keys %{$schema->{'input'}}) {
7865
141
120
                                my $param = $schema->{'input'}{$param_name};
7866
141
136
                                if($param->{name}) {
7867
24
22
                                        my $name = delete $param->{name};
7868
24
21
                                        if($name ne $param_name) {
7869                                                # Sanity check
7870
0
0
                                                croak("BUG: Parameter name - expected $param_name, got $name");
7871                                        }
7872                                }
7873
141
153
                                my $cleaned_param = $self->_serialize_parameter_for_yaml($param);
7874
141
150
                                $output->{'input'}{$param_name} = $cleaned_param;
7875                        }
7876
7877                        # If some params have positions and others don't, treat the whole
7878                        # input as a named (hash) API and strip all positions.  Mixed
7879                        # position state arises when a named-API method also happens to
7880                        # have a Params::Get positional-key call alongside =head4 Input
7881                        # named-block params that carry no position.
7882
84
141
84
71
198
114
                        my @with_pos    = grep { defined $output->{input}{$_}{position} } keys %{$output->{input}};
7883
84
141
84
65
140
81
                        my @without_pos = grep { !defined $output->{input}{$_}{position} } keys %{$output->{input}};
7884
84
176
                        if (@with_pos && @without_pos) {
7885
0
0
                                delete $output->{input}{$_}{position} for @with_pos;
7886                        }
7887                } else {
7888
17
20
                        delete $output->{input};
7889                }
7890        }
7891
7892        # Process output
7893
104
104
162
147
        if($schema->{'output'} && (scalar(keys %{$schema->{'output'}}))) {
7894
104
103
157
166
                if((ref($schema->{output}{_error_handling}) eq 'HASH') && (scalar(keys %{$schema->{output}{_error_handling}}) == 0)) {
7895
90
80
                        delete $schema->{output}{_error_handling};
7896                }
7897
104
126
                $output->{'output'} = $schema->{'output'};
7898        }
7899
7900
104
210
        if($schema->{'output'}{'type'} && ($schema->{'output'}{'type'} eq 'scalar')) {
7901
0
0
                $schema->{'output'}{'type'} = 'string';
7902
0
0
                $schema->{_confidence}{output}->{level} = 'low';  # A guess
7903        }
7904
7905        # Add 'new' field if object instantiation is needed
7906
104
110
        if ($schema->{new}) {
7907                # TODO: consider allowing parent class packages up the ISA chain
7908
76
124
                if(ref($schema->{new}) || ($schema->{new} eq $package_name)) {
7909
75
108
                        $output->{new} = $schema->{new} eq $package_name ? undef : $schema->{'new'};
7910                } else {
7911
1
3
                        $self->_log("  NEW: Don't use $schema->{new} for object insantiation");
7912
1
1
                        delete $schema->{new};
7913
1
1
                        delete $output->{new};
7914                }
7915        }
7916
7917
104
155
        if(!defined($schema->{_confidence}{input}->{level})) {
7918
73
111
                $schema->{_confidence}{input} = $self->_calculate_input_confidence($schema->{input});
7919        }
7920
104
174
        if(!defined($schema->{_confidence}{output}->{level})) {
7921
1
3
                $schema->{_confidence}{output} = $self->_calculate_output_confidence($schema->{output});
7922        }
7923
7924        # Add relationships if detected
7925
104
7
138
11
        if ($schema->{relationships} && @{$schema->{relationships}}) {
7926
7
5
                $output->{relationships} = $schema->{relationships};
7927        }
7928
7929
104
6
120
9
        if($schema->{accessor} && scalar(keys %{$schema->{accessor}})) {
7930
6
8
                $output->{accessor} = $schema->{accessor};
7931        }
7932
7933
104
191
        open my $fh, '>', $filename;
7934
104
28525
        print $fh YAML::XS::Dump($output);
7935
104
453
        print $fh $self->_generate_schema_comments($schema, $method_name);
7936
104
172
        close $fh;
7937
7938        my $rel_info = $schema->{relationships} ?
7939
104
7
11440
11
                ' [' . scalar(@{$schema->{relationships}}) . ' relationships]' : '';
7940        $self->_log("  Wrote: $filename (input confidence: $schema->{_confidence}{input}->{level})" .
7941
104
302
                                ($schema->{new} ? " [requires: $schema->{new}]" : '') . $rel_info);
7942}
7943
7944# --------------------------------------------------
7945# _generate_schema_comments
7946#
7947# Purpose:    Generate the YAML comment block
7948#             appended to the end of each written
7949#             schema file, containing provenance,
7950#             confidence levels, parameter type
7951#             notes, relationship summaries, and
7952#             warnings about types requiring
7953#             special test setup.
7954#
7955# Entry:      $schema      - the schema hashref as
7956#                            built by _analyze_method.
7957#             $method_name - the method name string,
7958#                            used in the fuzz
7959#                            command hint.
7960#
7961# Exit:       Returns a string of YAML comment lines
7962#             beginning with a blank line and ending
7963#             with a trailing newline.
7964#
7965# Side effects: None.
7966# --------------------------------------------------
7967sub _generate_schema_comments {
7968
109
179
        my ($self, $schema, $method_name) = @_;
7969
7970
109
83
        my @comments;
7971
7972
109
158
        push @comments, '';
7973
109
142
        push @comments, '# Generated by ' . ref($self);
7974
109
184
        push @comments, "# Run: fuzz-harness-generator -r $self->{output_dir}/${method_name}.yml";
7975
109
118
        push @comments, '#';
7976
109
136
        push @comments, "# Input confidence: $schema->{_confidence}{input}->{level}";
7977
109
147
        push @comments, "# Output confidence: $schema->{_confidence}{output}->{level}";
7978
7979        # Add notes about parameters
7980
109
143
        if ($schema->{input}) {
7981
101
77
                my @param_notes;
7982
101
101
73
190
                foreach my $param_name (sort keys %{$schema->{input}}) {
7983
141
115
                        my $p = $schema->{input}{$param_name};
7984
7985
141
144
                        if ($p->{semantic}) {
7986
16
19
                                push @param_notes, "$param_name: $p->{semantic}";
7987                        }
7988
7989
141
117
                        if ($p->{enum}) {
7990
4
4
4
6
                                push @param_notes, "$param_name: enum with " . scalar(@{$p->{enum}}) . " values";
7991                        }
7992
7993
141
187
                        if ($p->{isa}) {
7994
6
8
                                push @param_notes, "$param_name: requires $p->{isa} object";
7995                        }
7996                }
7997
7998
101
124
                if (@param_notes) {
7999
18
14
                        push @comments, '#';
8000
18
15
                        push @comments, '# Parameter types detected:';
8001
18
14
                        foreach my $note (@param_notes) {
8002
26
68
                                push @comments, "#   - $note";
8003                        }
8004                }
8005        }
8006
8007        # Add relationship notes
8008
109
8
141
11
        if ($schema->{relationships} && @{$schema->{relationships}}) {
8009
8
8
                push @comments, (
8010                        '#',
8011                        '# Parameter relationships detected:'
8012                );
8013
8
8
5
9
                foreach my $rel (@{$schema->{relationships}}) {
8014
17
17
                        my $desc = $rel->{description} || _format_relationship($rel);
8015
17
16
                        push @comments, "#   - $desc";
8016                }
8017        }
8018
8019        # Add general notes
8020
109
108
135
157
        if ($schema->{_notes} && scalar(@{$schema->{_notes}})) {
8021
30
24
                push @comments, '#';
8022
30
23
                push @comments, '# Notes:';
8023
30
30
24
31
                foreach my $note (@{$schema->{_notes}}) {
8024
46
43
                        push @comments, "#   - $note";
8025                }
8026        }
8027
8028
109
121
        if($schema->{_analysis}) {
8029
103
108
                push @comments, (
8030                        '#',
8031                        '# Analysis:',
8032                        '# TODO:',
8033                );
8034                # confidence_factors:
8035                #   input:
8036                #   - No parameters found
8037                #   output:
8038                #   - 'Return type defined: object (+30)'
8039                #   - 'Total output confidence score: 30'
8040                #   - 'Medium confidence: return type defined'
8041                #   input_confidence: none
8042                #   output_confidence: medium
8043                #   overall_confidence: none
8044        }
8045
8046        # Add warnings for complex types
8047
109
71
        my @warnings;
8048
109
105
        if ($schema->{input}) {
8049
101
101
73
105
                foreach my $param_name (keys %{$schema->{input}}) {
8050
141
104
                        my $p = $schema->{input}{$param_name};
8051
8052
141
218
                        if ($p->{type} && $p->{type} eq 'coderef') {
8053
3
3
                                push @warnings, "Parameter '$param_name' is a coderef - you'll need to provide a sub {} in tests";
8054                        }
8055
8056
141
170
                        if ($p->{semantic} && $p->{semantic} eq 'filehandle') {
8057
2
4
                                push @warnings, "Parameter '$param_name' is a filehandle - consider using IO::String or mock";
8058                        }
8059
8060
141
197
                        if ($p->{isa} && $p->{isa} =~ /DateTime/) {
8061
2
2
                                push @warnings, "Parameter '$param_name' requires DateTime - ensure DateTime is loaded";
8062                        }
8063                }
8064        }
8065
8066
109
111
        if (@warnings) {
8067
7
6
                push @comments, '#';
8068
7
6
                push @comments, '# WARNINGS - Manual test setup may be required:';
8069
7
4
                foreach my $warning (@warnings) {
8070
7
8
                        push @comments, "#   ! $warning";
8071                }
8072        }
8073
8074
109
91
        push @comments, '';
8075
8076
109
298
        return join("\n", @comments);
8077}
8078
8079# --------------------------------------------------
8080# _serialize_parameter_for_yaml
8081#
8082# Purpose:    Convert a parameter spec hashref into
8083#             a cleaned, YAML-serialisable form
8084#             suitable for App::Test::Generator
8085#             consumption, handling semantic type
8086#             mappings, enum values, and object
8087#             class annotations.
8088#
8089# Entry:      $param - parameter spec hashref as
8090#                      produced by the merge and
8091#                      analysis pipeline.
8092#
8093# Exit:       Returns a new hashref containing only
8094#             the fields App::Test::Generator
8095#             understands, with internal _ keys
8096#             and semantic keys removed or converted.
8097#
8098# Side effects: None.
8099#
8100# Notes:      Semantic types are mapped to
8101#             appropriate base types with additional
8102#             constraint and note fields.
8103#             The original $param hashref is not
8104#             modified.
8105# --------------------------------------------------
8106sub _serialize_parameter_for_yaml {
8107
156
3771
        my ($self, $param) = @_;
8108
8109
156
106
        my %cleaned;
8110
8111        # Copy basic fields that App::Test::Generator expects
8112
156
135
        foreach my $field (qw(type position optional min max matches default)) {
8113
1092
1090
                $cleaned{$field} = $param->{$field} if defined $param->{$field};
8114        }
8115
8116        # Handle advanced type mappings
8117
156
168
        if(my $semantic = $param->{semantic}) {
8118
23
84
                if ($semantic eq 'datetime_object') {
8119                        # DateTime objects: test generator needs to know how to create them
8120
2
2
                        $cleaned{type} = 'object';
8121
2
3
                        $cleaned{isa} = $param->{isa} || 'DateTime';
8122
2
3
                        $cleaned{_note} = 'Requires DateTime object';
8123                } elsif ($semantic eq 'timepiece_object') {
8124
0
0
                        $cleaned{type} = 'object';
8125
0
0
                        $cleaned{isa} = $param->{isa} || 'Time::Piece';
8126
0
0
                        $cleaned{_note} = 'Requires Time::Piece object';
8127                } elsif ($semantic eq 'date_string') {
8128                        # Date strings: provide regex pattern
8129
1
2
                        $cleaned{type} = 'string';
8130
1
3
                        $cleaned{matches} ||= '/^\d{4}-\d{2}-\d{2}$/';
8131
1
2
                        $cleaned{_example} = '2024-12-12';
8132                } elsif ($semantic eq 'iso8601_string') {
8133
1
1
                        $cleaned{type} = 'string';
8134
1
2
                        $cleaned{matches} ||= '/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z?$/';
8135
1
1
                        $cleaned{_example} = '2024-12-12T10:30:00Z';
8136                } elsif ($semantic eq 'unix_timestamp') {
8137
3
4
                        $cleaned{type} = 'integer';
8138
3
12
                        $cleaned{min} ||= 0;
8139
3
9
                        $cleaned{max} ||= $INT32_MAX;   # 32-bit max
8140
3
9
                        $cleaned{_note} = 'UNIX timestamp';
8141                } elsif ($semantic eq 'datetime_parseable') {
8142
0
0
                        $cleaned{type} = 'string';
8143
0
0
                        $cleaned{_note} = 'Must be parseable as datetime';
8144                } elsif ($semantic eq 'filehandle') {
8145                        # File handles: special handling needed
8146
2
2
                        $cleaned{type} = 'object';
8147
2
4
                        $cleaned{isa} = $param->{isa} || 'IO::Handle';
8148
2
2
                        $cleaned{_note} = 'File handle - may need mock in tests';
8149                } elsif ($semantic eq 'filepath') {
8150                        # File paths: string with path pattern
8151
2
3
                        $cleaned{type} = 'string';
8152
2
7
                        $cleaned{matches} ||= '/^[\\w\\/.\\-_]+$/';
8153
2
3
                        $cleaned{_note} = 'File path';
8154                } elsif ($semantic eq 'callback') {
8155                        # Coderefs: mark as special type
8156
5
8
                        $cleaned{type} = 'coderef';
8157
5
6
                        $cleaned{_note} = 'CODE reference - provide sub { } in tests';
8158                } elsif ($semantic eq 'enum') {
8159                        # Enum: keep as string but add valid values
8160
4
4
                        $cleaned{type} = 'string';
8161
4
11
                        if ($param->{enum} && ref($param->{enum}) eq 'ARRAY') {
8162
4
3
                                $cleaned{enum} = $param->{enum};
8163
4
4
4
7
                                $cleaned{_note} = 'Must be one of: ' . join(', ', @{$param->{enum}});
8164                        }
8165                }
8166        }
8167
8168        # Handle memberof even if not marked with semantic.
8169        # enum and memberof are mutually exclusive — only set memberof when enum
8170        # is not already being output (avoids the "has both" validation error).
8171
156
194
        if($param->{enum} && ref($param->{enum}) eq 'ARRAY' && !$cleaned{enum}) {
8172
2
3
                $cleaned{memberof} = $param->{enum};
8173        }
8174
156
196
        if($param->{memberof} && ref($param->{memberof}) eq 'ARRAY') {
8175
0
0
                $cleaned{memberof} = $param->{memberof};
8176        }
8177
8178        # Handle object class
8179
156
174
        if ($param->{isa} && !$cleaned{isa}) {
8180
4
5
                $cleaned{isa} = $param->{isa};
8181        }
8182
8183        # Add format hints where available
8184
156
142
        if ($param->{format}) {
8185
1
1
                $cleaned{_format} = $param->{format};
8186        }
8187
8188        # Remove internal fields
8189
156
113
        delete $cleaned{_source};
8190
156
106
        delete $cleaned{_from_input_spec};
8191
156
127
        delete $cleaned{semantic};
8192
8193
156
144
        return \%cleaned;
8194}
8195
8196# --------------------------------------------------
8197# _format_relationship
8198#
8199# Purpose:    Format a relationship hashref as a
8200#             short human-readable description
8201#             string for use in YAML comments.
8202#
8203# Entry:      $rel - relationship hashref as
8204#                    produced by the relationship
8205#                    detection methods.
8206#
8207# Exit:       Returns a description string.
8208#             Returns 'Unknown relationship' for
8209#             unrecognised types.
8210#
8211# Side effects: None.
8212# --------------------------------------------------
8213sub _format_relationship {
8214
14
8310
        my $rel = $_[0];
8215
8216
14
34
        if ($rel->{type} eq 'mutually_exclusive') {
8217
3
3
4
10
                return 'Mutually exclusive: ' . join(', ', @{$rel->{params}});
8218        } elsif ($rel->{type} eq 'required_group') {
8219
2
2
3
5
                return "Required group (OR): " . join(', ', @{$rel->{params}});
8220        } elsif ($rel->{type} eq 'conditional_requirement') {
8221
2
6
                return "If $rel->{if} then $rel->{then_required} required";
8222        } elsif ($rel->{type} eq 'dependency') {
8223
3
8
                return "$rel->{param} depends on $rel->{requires}";
8224        } elsif ($rel->{type} eq 'value_constraint') {
8225
2
6
                return "If $rel->{if} then $rel->{then} $rel->{operator} $rel->{value}";
8226        } elsif ($rel->{type} eq 'value_conditional') {
8227
1
3
                return "If $rel->{if}='$rel->{equals}' then $rel->{then_required} required";
8228        }
8229
1
2
        return 'Unknown relationship';
8230}
8231
8232# --------------------------------------------------
8233# _needs_object_instantiation
8234#
8235# Purpose:    Determine whether a method requires
8236#             an object to be instantiated before
8237#             it can be called, and if so return
8238#             the package name to instantiate.
8239#
8240# Entry:      $method_name - name of the method.
8241#             $method_body - method source string.
8242#             $method_info - method hashref from
8243#                            _find_methods (optional,
8244#                            for backward compat).
8245#
8246# Exit:       Returns the package name string if
8247#             object instantiation is required.
8248#             Returns undef if the method is a
8249#             constructor, factory, singleton, or
8250#             pure class method.
8251#
8252# Side effects: Logs analysis decisions to stdout
8253#               when verbose is set.
8254#
8255# Notes:      Orchestrates five detection sub-steps:
8256#             factory detection, singleton detection,
8257#             instance method detection, inheritance
8258#             check, and constructor requirements.
8259#             Instance method detection overrides
8260#             factory detection when both fire.
8261# --------------------------------------------------
8262sub _needs_object_instantiation {
8263
290
1696
        my ($self, $method_name, $method_body, $method_info) = @_;
8264
8265        # Allow method_info to be optional for backward compatibility
8266
290
717
        $method_info ||= {};
8267
8268
290
264
        my $doc = $self->{_document};
8269
290
537
        return undef unless $doc;
8270
8271        # Get the current package name
8272
290
419
        my $package_stmt = $doc->find_first('PPI::Statement::Package');
8273
290
33781
        my $current_package = $package_stmt ? $package_stmt->namespace : 'UNKNOWN';
8274
290
3602
        $self->{_package_name} //= $current_package;
8275
8276        # Initialize result structure
8277
290
1093
        my $result = {
8278                package => $current_package,
8279                needs_object => 0,
8280                type => 'unknown',
8281                details => {},
8282                constructor_params => undef,
8283        };
8284
8285        # Track whether we should explicitly skip object instantiation
8286
290
234
        my $skip_object = 0;
8287
8288        # Skip constructors and destructors
8289
290
355
        if ($method_name eq 'new') {
8290
13
22
                $self->_log("  OBJECT: Constructor '$method_name' detected; skipping instantiation analysis");
8291
13
29
                return undef;
8292        }
8293
277
490
        if($method_name =~ /^(create|build|construct|init|DESTROY)$/i) {
8294
0
0
                $skip_object = 1;
8295        }
8296
8297        # 1. Check for factory methods that return instances
8298
277
399
        my $is_factory = $self->_detect_factory_method(
8299                $method_name, $method_body, $current_package, $method_info
8300        );
8301
8302        # 2. Check for singleton patterns
8303
277
352
        my $is_singleton = $self->_detect_singleton_pattern($method_name, $method_body);
8304
277
282
        if ($is_singleton) {
8305
1
1
                $result->{needs_object} = 0; # Singleton methods return the singleton instance
8306
1
2
                $result->{type} = 'singleton_accessor';
8307
1
1
                $result->{details} = $is_singleton;
8308
1
2
                $self->_log("  OBJECT: Detected singleton accessor '$method_name'");
8309                # Singleton accessors typically don't need object creation in tests
8310                # as they're called on the class, not instance
8311
1
1
                $skip_object = 1;
8312        }
8313
8314        # 3. Check if this is an instance method that needs an object
8315
277
305
        my $is_instance_method = $self->_detect_instance_method($method_name, $method_body);
8316
277
424
        if ($is_instance_method &&
8317            ($is_instance_method->{explicit_self} ||
8318             $is_instance_method->{shift_self} ||
8319             $is_instance_method->{accesses_object_data} ||
8320             ($is_instance_method->{calls_instance_methods} &&
8321              scalar @{$is_instance_method->{calls_instance_methods}}))) {
8322
8323                # Instance-only methods override factory detection
8324
129
138
                if ($is_factory) {
8325
3
6
                        $self->_log(
8326                                "  OBJECT: Instance-only method '$method_name' overrides factory detection"
8327                        );
8328                }
8329
8330
129
124
                $result->{needs_object} = 1;
8331
129
117
                $result->{type} = 'instance_method';
8332
129
124
                $result->{details} = $is_instance_method;
8333
8334                # 4. Check for inheritance - if parent class constructor should be used
8335
129
153
                my $inheritance_info = $self->_check_inheritance_for_constructor(
8336                        $current_package, $method_body
8337                );
8338
129
175
                if ($inheritance_info && $inheritance_info->{use_parent_constructor}) {
8339
0
0
                        $result->{package} = $inheritance_info->{parent_class};
8340
0
0
                        $result->{details}{inheritance} = $inheritance_info;
8341
0
0
                        $self->_log(
8342                                "  OBJECT: Method '$method_name' uses parent class constructor: $inheritance_info->{parent_class}"
8343                        );
8344                }
8345
8346                # 5. Check if constructor needs specific parameters
8347                my $constructor_needs = $self->_detect_constructor_requirements(
8348                        $current_package, $result->{package}
8349
129
191
                );
8350
129
140
                if ($constructor_needs) {
8351
1
1
                        $result->{constructor_params} = $constructor_needs;
8352
1
1
                        $result->{details}{constructor_requirements} = $constructor_needs;
8353
1
2
                        $self->_log(
8354                                "  OBJECT: Constructor for $result->{package} requires parameters"
8355                        );
8356                }
8357
8358                # Return the package name (or parent package) that needs instantiation
8359
129
351
                return $result->{package};
8360        }
8361
8362        # 6. Check for class methods that might need objects from other classes
8363
148
180
        my $needs_other_object = $self->_detect_external_object_dependency($method_body);
8364
148
138
        if ($needs_other_object) {
8365
0
0
                $result->{needs_object} = 1;
8366
0
0
                $result->{type} = 'external_dependency';
8367                $result->{package} = $needs_other_object->{package}
8368
0
0
                        if $needs_other_object->{package};
8369
0
0
                $result->{details} = $needs_other_object;
8370
8371
0
0
                $self->_log(
8372                        "  OBJECT: Method '$method_name' depends on external object: $needs_other_object->{package}"
8373                );
8374
0
0
                return $result->{package} if $result->{package};
8375        }
8376
8377        # Factory method only if NOT instance-based
8378
148
148
        if ($is_factory && !$skip_object) {
8379
1
4
                $result->{needs_object} = 0;
8380
1
1
                $result->{type} = 'factory';
8381
1
1
                $result->{details} = $is_factory;
8382                $self->_log(
8383                        "  OBJECT: Detected factory method '$method_name' returns $is_factory->{returns_class} objects"
8384
1
3
                ) if $is_factory->{returns_class};
8385        }
8386
8387
148
233
        return undef;
8388}
8389
8390# --------------------------------------------------
8391# _detect_factory_method
8392#
8393# Purpose:    Detect whether a method is a factory
8394#             that creates and returns object
8395#             instances rather than operating on
8396#             an existing instance.
8397#
8398# Entry:      $method_name     - method name string.
8399#             $method_body     - method source string.
8400#             $current_package - current package name.
8401#             $method_info     - method hashref
8402#                                (optional).
8403#
8404# Exit:       Returns a factory_info hashref on
8405#             detection, or undef if the method
8406#             is not a factory.
8407#             The hashref includes: returns_class,
8408#             confidence, and one of:
8409#             returns_blessed, returns_new,
8410#             returns_factory_result, pod_hint.
8411#
8412# Side effects: None.
8413# --------------------------------------------------
8414sub _detect_factory_method {
8415
283
12752
        my ($self, $method_name, $method_body, $current_package, $method_info) = @_;
8416
8417
283
188
        my %factory_info;
8418
8419        # Check method name patterns
8420
283
353
        if ($method_name =~ /^(create_|make_|build_|get_)/i) {
8421
9
44
                $factory_info{name_pattern} = 1;
8422        }
8423
8424        # Look for object creation patterns in the method body
8425
283
277
        if ($method_body) {
8426                # Pattern 1: Returns a blessed reference
8427
283
682
                if ($method_body =~ /return\s+bless\s*\{[^}]*\},\s*['"]?(\w+(?:::\w+)*|\$\w+)['"]?/s ||
8428                        $method_body =~ /bless\s*\{[^}]*\},\s*['"]?(\w+(?:::\w+)*|\$\w+)['"]?.*return/s) {
8429
5
9
                        my $class_name = $1;
8430
8431                        # Handle variable class names
8432
5
12
                        if ($class_name =~ /^\$(class|self|package)$/) {
8433
2
3
                                $factory_info{returns_class} = $current_package;
8434                        } elsif ($class_name =~ /^\$/) {
8435
0
0
                                $factory_info{returns_class} = 'VARIABLE';      # Unknown variable
8436                        } else {
8437
3
4
                                $factory_info{returns_class} = $class_name;
8438                        }
8439
8440
5
7
                        $factory_info{returns_blessed} = 1;
8441
5
7
                        $factory_info{confidence} = 'high';
8442
5
6
                        return \%factory_info;
8443                }
8444
8445                # Pattern 2: Returns ->new() call on class or $self
8446
278
550
                if ($method_body =~ /return\s+([\$\w:]+)->new\(/s ||
8447                        $method_body =~ /([\$\w:]+)->new\(.*return/s) {
8448
5
7
                        my $target = $1;
8449
8450                        # Determine what class is being instantiated
8451
5
25
                        if ($target eq '$self' || $target eq 'shift' || $target =~ /^\$/) {
8452
0
0
                                $factory_info{returns_class} = $current_package;
8453
0
0
                                $factory_info{self_new} = 1;
8454                        } elsif ($target =~ /::/) {
8455
1
2
                                $factory_info{returns_class} = $target;
8456
1
2
                                $factory_info{external_class} = 1;
8457                        } else {
8458
4
8
                                $factory_info{returns_class} = $target;
8459                        }
8460
8461
5
5
                        $factory_info{returns_new} = 1;
8462
5
7
                        $factory_info{confidence} = 'medium';
8463
5
11
                        return \%factory_info;
8464                }
8465
8466                # Pattern 3: Returns an object from another factory method
8467
273
784
                if ($method_body =~ /return\s+([\$\w:]+)->(create_|make_|build_|get_)/i ||
8468                        $method_body =~ /([\$\w:]+)->(create_|make_|build_|get_).*return/si) {
8469
0
0
                        $factory_info{returns_factory_result} = 1;
8470
0
0
                        $factory_info{confidence} = 'low';
8471
0
0
                        return \%factory_info;
8472                }
8473        }
8474
8475        # Check for return type hints in POD if available
8476
273
651
        if ($method_info && ref($method_info) eq 'HASH' && $method_info->{pod}) {
8477
94
75
                my $pod = $method_info->{pod};
8478
94
230
                if ($pod =~ /returns?\s+(?:an?\s+)?(object|instance|new\s+\w+)/i) {
8479
0
0
                        $factory_info{pod_hint} = 1;
8480
0
0
                        $factory_info{confidence} = 'low';
8481
0
0
                        return \%factory_info;
8482                }
8483        }
8484
8485
273
314
        return undef;
8486}
8487
8488# --------------------------------------------------
8489# _detect_singleton_pattern
8490#
8491# Purpose:    Detect singleton accessor methods
8492#             that return a shared instance rather
8493#             than creating a new object, by
8494#             checking the method name and body
8495#             for singleton patterns.
8496#
8497# Entry:      $method_name - method name string.
8498#             $method_body - method source string.
8499#
8500# Exit:       Returns a singleton_info hashref on
8501#             detection (always contains at least
8502#             name_pattern => 1), or undef if the
8503#             method name does not match the
8504#             singleton accessor pattern.
8505#
8506# Side effects: None.
8507#
8508# Notes:      Only fires for methods named
8509#             instance, get_instance, singleton,
8510#             or shared_instance. Methods not
8511#             matching these names always return
8512#             undef regardless of body content.
8513# --------------------------------------------------
8514sub _detect_singleton_pattern {
8515
286
1024
        my ($self, $method_name, $method_body) = @_;
8516
8517        # Check method name patterns
8518
286
437
        return undef unless $method_name =~ /^(instance|get_instance|singleton|shared_instance)$/i;
8519
8520
7
9
        my %singleton_info = (
8521                name_pattern => 1,
8522        );
8523
8524        # Look for singleton patterns in code
8525
7
12
        if ($method_body) {
8526                # Pattern 1: Static/state variable holding instance
8527
7
29
                if ($method_body =~ /(?:my\s+)?(?:our\s+)?\$(?:instance|_instance|singleton)\b/s ||
8528                        $method_body =~ /state\s+\$(?:instance|_instance|singleton)\b/s) {
8529
5
8
                        $singleton_info{static_variable} = 1;
8530
5
7
                        $singleton_info{confidence} = 'high';
8531                }
8532
8533                # Pattern 2: Returns $instance if defined (with better regex)
8534
7
26
                if ($method_body =~ /return\s+\$instance\s+if\s+(?:defined\s+)?\$instance/ ||
8535                        $method_body =~ /unless\s+\$instance.*?=\s*.*?new/) {
8536
1
1
                        $singleton_info{returns_instance} = 1;
8537
1
2
                        $singleton_info{confidence} = 'high';
8538                }
8539
8540                # Pattern 3: ||= new() pattern (with better regex)
8541
7
24
                if ($method_body =~ /\$instance\s*\|\|=\s*.*?new/ ||
8542                        $method_body =~ /\$instance\s*=\s*.*?new\s+unless\s+(?:defined\s+)?\$instance/) {
8543
3
4
                        $singleton_info{lazy_initialization} = 1;
8544
3
4
                        $singleton_info{confidence} = 'medium';
8545                }
8546
8547                # Pattern 4: Direct return of $instance variable
8548
7
16
                if ($method_body =~ /return\s+\$instance;/) {
8549
4
5
                        $singleton_info{returns_instance} = 1;
8550
4
7
                        $singleton_info{confidence} = 'high' unless $singleton_info{confidence};
8551                }
8552        }
8553
8554
7
15
        return \%singleton_info if keys %singleton_info > 0; # Need at least name pattern
8555
8556
0
0
        return undef;
8557}
8558
8559# --------------------------------------------------
8560# _detect_instance_method
8561#
8562# Purpose:    Detect whether a method is an
8563#             instance method that requires a
8564#             blessed object ($self) to be called,
8565#             through multiple detection patterns
8566#             of varying confidence.
8567#
8568# Entry:      $method_name - method name string.
8569#             $method_body - method source string.
8570#
8571# Exit:       Returns an instance_info hashref if
8572#             any instance method signal is found.
8573#             Returns undef if no signals are
8574#             detected.
8575#             The hashref may contain: explicit_self,
8576#             shift_self, uses_self,
8577#             accesses_object_data,
8578#             calls_instance_methods,
8579#             private_method, and confidence.
8580#
8581# Side effects: None.
8582# --------------------------------------------------
8583sub _detect_instance_method {
8584
285
2616
        my ($self, $method_name, $method_body) = @_;
8585
8586
285
178
        my %instance_info;
8587
8588        # Pattern 1: my ($self, ...) = @_;
8589
285
633
        if ($method_body =~ /my\s*\(\s*\$self\s*[,)]/) {
8590
113
123
                $instance_info{explicit_self} = 1;
8591
113
115
                $instance_info{confidence} = 'high';
8592        }
8593
8594        # Pattern 1b: my $self = $_[0];  (direct-index style)
8595        elsif ($method_body =~ /my\s+\$self\s*=\s*\$_\[0\]/) {
8596
7
8
                $instance_info{explicit_self} = 1;
8597
7
8
                $instance_info{confidence} = 'high';
8598        }
8599
8600        # Pattern 2: my $self = shift;
8601        elsif ($method_body =~ /my\s+\$self\s*=\s*shift/) {
8602
13
17
                $instance_info{shift_self} = 1;
8603
13
14
                $instance_info{confidence} = 'high';
8604        }
8605
8606        # Pattern 3: Uses $self->something (including hash/array access)
8607        # This catches $self->{value} and $self->[0] as well as $self->method()
8608        elsif ($method_body =~ /\$self\s*->\s*(\w+|[\{\[])/) {
8609
2
3
                $instance_info{uses_self} = 1;
8610
2
3
                $instance_info{confidence} = 'medium';
8611        }
8612
8613        # Pattern 4: Accesses object data: $self->{...}, $self->[...]
8614
285
374
        if ($method_body =~ /\$self\s*->\s*[\{\[]/) {
8615
50
56
                $instance_info{accesses_object_data} = 1;
8616
50
81
                $instance_info{confidence} = 'high' unless $instance_info{confidence} eq 'high';
8617        }
8618
8619        # Pattern 5: Calls other instance methods on $self
8620
285
336
        if ($method_body =~ /\$self\s*->\s*(\w+)\s*\(/s) {
8621
5
7
                $instance_info{calls_instance_methods} = [];
8622
5
13
                while ($method_body =~ /\$self\s*->\s*(\w+)\s*\(/g) {
8623
5
5
2
12
                        push @{$instance_info{calls_instance_methods}}, $1;
8624                }
8625
5
5
3
8
                $instance_info{confidence} = 'high' if @{$instance_info{calls_instance_methods}};
8626        }
8627
8628        # Pattern 6: Method name suggests instance method (not perfect but helpful)
8629
285
309
        if ($method_name =~ /^_/ && $method_name !~ /^_new/) {
8630                # Private methods are usually instance methods
8631
3
3
                $instance_info{private_method} = 1;
8632
3
5
                $instance_info{confidence} = 'low' unless exists $instance_info{confidence};
8633        }
8634
8635
285
317
        return \%instance_info if keys %instance_info;
8636
147
132
        return undef;
8637}
8638
8639# --------------------------------------------------
8640# _check_inheritance_for_constructor
8641#
8642# Purpose:    Determine whether the current package
8643#             uses an inherited constructor from a
8644#             parent class, by examining use parent,
8645#             use base, and @ISA declarations.
8646#
8647# Entry:      $current_package - current package
8648#                                name string.
8649#             $method_body     - method source string
8650#                                (checked for SUPER::
8651#                                calls).
8652#
8653# Exit:       Returns an inheritance_info hashref
8654#             if any inheritance information is
8655#             found, or undef otherwise.
8656#             The hashref may contain:
8657#             parent_statements, isa_array,
8658#             uses_super, calls_super_new,
8659#             has_own_constructor,
8660#             use_parent_constructor, parent_class.
8661#
8662# Side effects: None.
8663# --------------------------------------------------
8664sub _check_inheritance_for_constructor {
8665
132
12746
        my ($self, $current_package, $method_body) = @_;
8666
8667
132
110
        my $doc = $self->{_document};
8668
132
177
        return undef unless $doc;
8669
8670
131
124
        my %inheritance_info;
8671
8672        # 1. Look for parent/base statements
8673        my @parent_classes;
8674
8675        # Find all 'use parent' or 'use base' statements
8676
131
170
        my $includes = $doc->find('PPI::Statement::Include') || [];
8677
131
491749
        foreach my $inc (@$includes) {
8678
255
246
                my $content = $inc->content;
8679
255
2416
                if ($content =~ /use\s+(parent|base)\s+['"]?([\w:]+)['"]?/) {
8680
4
4
                        push @parent_classes, $2;
8681
4
7
                        $inheritance_info{parent_statements} = \@parent_classes;
8682                }
8683                # Also check for multiple parents: use parent qw(Class1 Class2)
8684
255
298
                if ($content =~ /use\s+(parent|base)\s+qw?[\(\[]?(.+?)[\)\]]?;/) {
8685
0
0
                        my $parents = $2;
8686
0
0
                        my @multi_parents = split /\s+/, $parents;
8687
0
0
                        push @parent_classes, @multi_parents;
8688
0
0
                        $inheritance_info{parent_statements} = \@parent_classes;
8689                }
8690        }
8691
8692        # 2. Look for @ISA assignments (with or without 'our')
8693
131
158
        my $isas = $doc->find('PPI::Statement::Variable') || [];
8694
131
488791
        foreach my $isa (@$isas) {
8695
805
744
                my $content = $isa->content();
8696                # Match both "our @ISA = qw(...)" and "@ISA = qw(...)"
8697
805
20209
                if ($content =~ /(?:our\s+)?\@ISA\s*=\s*qw?[\(\[]?(.+?)[\)\]]?/) {
8698
0
0
                        my $parents = $1;
8699
0
0
                        my @isa_parents = split(/\s+/, $parents);
8700
0
0
                        push @parent_classes, @isa_parents;
8701
0
0
                        $inheritance_info{isa_array} = \@isa_parents;
8702                }
8703        }
8704
8705        # Also look for @ISA in regular statements
8706
131
153
        my $statements = $doc->find('PPI::Statement') || [];
8707
131
489986
        foreach my $stmt (@$statements) {
8708
4946
3824
                my $content = $stmt->content;
8709
4946
120067
                if ($content =~ /\@ISA\s*=\s*qw?[\(\[]?(.+?)[\)\]]?/) {
8710
1
2
                        my $parents = $1;
8711
1
1
                        my @isa_parents = split(/\s+/, $parents);
8712
1
1
                        push @parent_classes, @isa_parents;
8713
1
2
                        $inheritance_info{isa_array} = \@isa_parents;
8714                }
8715        }
8716
8717        # 3. Check if method uses SUPER:: calls
8718
131
290
        if ($method_body && $method_body =~ /SUPER::/) {
8719
2
4
                $inheritance_info{uses_super} = 1;
8720
2
4
                if ($method_body =~ /SUPER::new/) {
8721
0
0
                        $inheritance_info{calls_super_new} = 1;
8722                }
8723        }
8724
8725        # 4. Check if current package has its own new method
8726        my $has_own_new = $doc->find(sub {
8727
47676
215497
                $_[1]->isa('PPI::Statement::Sub') &&
8728                $_[1]->name eq 'new'
8729
131
297
        });
8730
8731
131
812
        if ($has_own_new) {
8732
32
42
                $inheritance_info{has_own_constructor} = 1;
8733        } elsif (@parent_classes) {
8734                # No own constructor, but has parents - might need parent constructor
8735
1
3
                $inheritance_info{use_parent_constructor} = 1;
8736
1
2
                $inheritance_info{parent_class} = $parent_classes[0];   # Use first parent
8737        }
8738
8739
131
196
        return \%inheritance_info if keys %inheritance_info;
8740
98
211
        return undef;
8741}
8742
8743# --------------------------------------------------
8744# _detect_constructor_requirements
8745#
8746# Purpose:    Analyse the new() method of the
8747#             current or target package to determine
8748#             what parameters the constructor
8749#             requires, including required and
8750#             optional parameters and their defaults.
8751#
8752# Entry:      $current_package - the package being
8753#                                analysed.
8754#             $target_package  - the package whose
8755#                                constructor will
8756#                                be called (may
8757#                                differ from current
8758#                                for inherited
8759#                                constructors).
8760#
8761# Exit:       Returns a requirements hashref on
8762#             success, or undef if no new() method
8763#             is found. For external classes,
8764#             returns a minimal hashref with
8765#             external_class => 1.
8766#
8767# Side effects: None.
8768# --------------------------------------------------
8769sub _detect_constructor_requirements {
8770
133
16204
        my ($self, $current_package, $target_package) = @_;
8771
8772
133
119
        my $doc = $self->{_document};
8773
133
180
        return undef unless $doc;
8774
8775        # If target is different from current, we can't analyze it
8776        # (external class, parent class in different file)
8777
133
154
        if ($target_package ne $current_package) {
8778                return {
8779
1
7
                        external_class => 1,
8780                        package => $target_package,
8781                        note => "Constructor for external class $target_package - parameters unknown"
8782                };
8783        }
8784
8785        # Find the new method in current package
8786        my $new_method = $doc->find_first(sub {
8787
40837
182012
                $_[1]->isa('PPI::Statement::Sub') &&
8788                $_[1]->name eq 'new'
8789
132
253
        });
8790
8791
132
1439
        return undef unless $new_method;
8792
8793
33
23
        my %requirements;
8794
8795        # Get method body
8796
33
34
        my $body = $new_method->content;
8797
8798        # Look for parameter extraction patterns - handle both $self and $class
8799
33
2872
        if ($body =~ /my\s*\(\s*\$(self|class)\s*,\s*(.+?)\)\s*=\s*\@_/s) {
8800
32
36
                my $params = $2;
8801
32
40
                my @param_names = $params =~ /\$(\w+)/g;
8802
8803
32
37
                if (@param_names) {
8804
3
6
                        $requirements{parameters} = \@param_names;
8805
3
4
                        $requirements{parameter_count} = scalar @param_names;
8806                }
8807        }
8808
8809        # Look for shift patterns
8810
33
22
        my @shift_params;
8811
33
50
        while ($body =~ /my\s+\$(\w+)\s*=\s*shift/g) {
8812
0
0
                push @shift_params, $1;
8813        }
8814        # Remove $self or $class if present
8815
33
0
33
0
        @shift_params = grep { $_ !~ /^(self|class)$/i } @shift_params;
8816
8817
33
32
        if (@shift_params) {
8818
0
0
                $requirements{parameters} = \@shift_params;
8819
0
0
                $requirements{parameter_count} = scalar @shift_params;
8820
0
0
                $requirements{shift_pattern} = 1;
8821        }
8822
8823        # Look for validation of parameters (more flexible pattern)
8824
33
24
        my @required_params;
8825
33
58
        if ($body =~ /croak.*unless.*(?:defined\s+)?\$(\w+)/g) {
8826
2
3
                push @required_params, $1;
8827        }
8828
33
53
        if ($body =~ /die.*unless.*(?:defined\s+)?\$(\w+)/g) {
8829
1
1
                push @required_params, $1;
8830        }
8831
8832
33
32
        if (@required_params) {
8833
3
5
                $requirements{required_parameters} = \@required_params;
8834        }
8835
8836        # Look for default values (optional parameters)
8837
33
24
        my @optional_params;
8838        my %default_values;
8839
8840        # Use the new _extract_default_value method
8841        # Check for each parameter in the constructor body
8842
33
48
        if ($requirements{parameters}) {
8843
3
3
3
4
                foreach my $param (@{$requirements{parameters}}) {
8844
5
12
                        my $default = $self->_extract_default_value($param, $body);
8845
5
7
                        if (defined $default) {
8846
2
3
                                push @optional_params, $param;
8847
2
3
                                $default_values{$param} = $default;
8848                        }
8849                }
8850        }
8851
8852
33
30
        if (@optional_params) {
8853
2
4
                $requirements{optional_parameters} = \@optional_params;
8854
2
3
                $requirements{default_values} = \%default_values;
8855        }
8856
8857
33
36
        return \%requirements if keys %requirements;
8858
30
45
        return undef;
8859}
8860
8861
8862# --------------------------------------------------
8863# _detect_external_object_dependency
8864#
8865# Purpose:    Detect whether a method creates or
8866#             depends on objects from classes other
8867#             than the current package, by scanning
8868#             for ->new() calls on named classes
8869#             and method calls on typed variables.
8870#
8871# Entry:      $method_body - method source string.
8872#                            May be undef.
8873#
8874# Exit:       Returns a dependency_info hashref if
8875#             external object usage is found, or
8876#             undef otherwise.
8877#             The hashref may contain:
8878#             creates_objects (arrayref of class
8879#             names), uses_objects (arrayref of
8880#             class names), and package (the primary
8881#             dependency class).
8882#
8883# Side effects: None.
8884# --------------------------------------------------
8885sub _detect_external_object_dependency {
8886
154
4550
        my ($self, $method_body) = @_;
8887
8888
154
126
        return undef unless $method_body;
8889
8890
153
103
        my %dependency_info;
8891
8892        # Pattern 1: Creates objects of other classes with ->new() or ->create()
8893        # Reset pos for global match
8894
153
204
        pos($method_body) = 0;
8895
153
266
        while ($method_body =~ /(\w+(?:::\w+)*)->(?:new|create)\(/g) {
8896
5
6
                my $class = $1;
8897
5
18
                next if $class eq 'main' || $class eq '__PACKAGE__' || $class =~ /^\$/;
8898
4
4
3
18
                push @{$dependency_info{creates_objects}}, $class;
8899        }
8900
8901
153
169
        if ($dependency_info{creates_objects}) {
8902                # Remove duplicates
8903
3
2
                my %seen;
8904
3
4
3
3
9
3
                $dependency_info{creates_objects} = [grep { !$seen{$_}++ } @{$dependency_info{creates_objects}}];
8905
3
5
                $dependency_info{package} = $dependency_info{creates_objects}[0];
8906        }
8907
8908        # Pattern 2: Calls methods on objects from other classes
8909
153
163
        if ($method_body =~ /\$(\w+)->\w+\(/) {
8910
3
2
                my %object_vars;
8911                # Reset pos for global match — the if check above used a
8912                # non-/g match so it cannot have advanced pos, but the while
8913                # loop's own /g matches still need to start from the beginning.
8914
3
3
                pos($method_body) = 0;
8915
3
8
                while ($method_body =~ /\$(\w+)->\w+\(/g) {
8916
4
9
                        $object_vars{$1}++;
8917                }
8918
8919                # Try to determine type of object variables
8920
3
1
                my @object_classes;
8921
3
6
                foreach my $var (keys %object_vars) {
8922                        # Look for type declarations or assignments
8923
4
236
                        if ($method_body =~ /my\s+\$$var\s*=\s*(\w+(?:::\w+)+)->(?:new|create)/) {
8924
3
8
                                push @object_classes, $1;
8925                        } elsif ($method_body =~ /my\s+\$$var\s*=\s*(\w+(?:::\w+)+)->/) {
8926
0
0
                                push @object_classes, $1;
8927                        }
8928                }
8929
8930
3
5
                if (@object_classes) {
8931
3
3
                        $dependency_info{uses_objects} = \@object_classes;
8932
3
6
                        $dependency_info{package} = $object_classes[0] unless $dependency_info{package};
8933                }
8934        }
8935
8936        # Pattern 3: Receives objects as parameters (type hints in comments/POD)
8937        # This would need integration with parameter analysis
8938
8939
153
168
        return \%dependency_info if keys %dependency_info;
8940
149
145
        return undef;
8941}
8942
8943# --------------------------------------------------
8944# _get_parent_class
8945#
8946# Purpose:    Find the first parent class of the
8947#             current package by searching the
8948#             PPI document for use parent, use base,
8949#             or our @ISA declarations.
8950#
8951# Entry:      None (operates on $self->{_document}).
8952#
8953# Exit:       Returns the parent class name string,
8954#             or undef if no parent is found.
8955#
8956# Side effects: None.
8957# --------------------------------------------------
8958sub _get_parent_class {
8959
2
3425
        my $self = $_[0];
8960
8961
2
2
        my $doc = $self->{_document};
8962
2
3
        return unless $doc;
8963
8964        # Look for use parent statements
8965        my $parent_stmt = $doc->find_first(sub {
8966
68
344
                $_[1]->isa('PPI::Statement::Include') &&
8967                $_[1]->type eq 'use' &&
8968                $_[1]->module =~ /^(parent|base)$/ &&
8969                $_[1]->arguments =~ /['"](\w+(?:::\w+)*)['"]/
8970
2
5
        });
8971
2
14
        if ($parent_stmt) {
8972
0
0
                my $parent = $1;
8973
0
0
                return $parent;
8974        }
8975
8976        # Look for @ISA assignment
8977        my $isa_stmt = $doc->find_first(sub {
8978
68
450
                $_[1]->isa('PPI::Statement') &&
8979                $_[1]->content =~ /our\s+\@ISA\s*=\s*\(\s*['"](\w+(?:::\w+)*)['"]\s*\)/
8980
2
4
        });
8981
2
13
        if ($isa_stmt && $isa_stmt->content =~ /['"](\w+(?:::\w+)*)['"]/) {
8982
0
0
                return $1;
8983        }
8984
8985
2
2
        return;
8986}
8987
8988# --------------------------------------------------
8989# _get_class_for_instance_method
8990#
8991# Purpose:    Determine which class should be used
8992#             for object instantiation when testing
8993#             an instance method, preferring the
8994#             current package if it has a new()
8995#             method, falling back to the parent
8996#             class otherwise.
8997#
8998# Entry:      None (operates on $self->{_document}).
8999#
9000# Exit:       Returns the package name string to
9001#             use for instantiation. Returns
9002#             'UNKNOWN_PACKAGE' if no package
9003#             statement is found.
9004#
9005# Side effects: Stores the package name in
9006#               $self->{_package_name} if not
9007#               already set.
9008# --------------------------------------------------
9009sub _get_class_for_instance_method {
9010
2
3183
        my $self = $_[0];
9011
9012        # Get the current package
9013
2
3
        my $doc = $self->{_document};
9014
2
2
        my $package_stmt = $doc->find_first('PPI::Statement::Package');
9015
2
357
        return 'UNKNOWN_PACKAGE' unless $package_stmt;
9016
1
2
        my $package_name = $package_stmt->namespace;
9017
1
13
        $self->{_package_name} //= $package_name;
9018
9019        # Check if the current package has a 'new' method
9020        my $has_new = $doc->find(sub {
9021
46
270
                $_[1]->isa('PPI::Statement::Sub') && $_[1]->name eq 'new'
9022
1
3
        });
9023
9024
1
6
        if ($has_new) {
9025
1
2
                return $package_name;
9026        }
9027
9028        # Otherwise, try to get the parent class
9029
0
0
        my $parent = $self->_get_parent_class();
9030
0
0
        return $parent if $parent;
9031
9032        # Fallback to current package
9033
0
0
        return $package_name;
9034}
9035
9036# --------------------------------------------------
9037# _extract_default_value
9038#
9039# Purpose:    Extract a default value for a named
9040#             parameter from a method body by
9041#             matching multiple common Perl default
9042#             assignment idioms.
9043#
9044# Entry:      $param - parameter name string.
9045#             $code  - method body source string.
9046#
9047# Exit:       Returns the cleaned default value
9048#             scalar on success, or undef if no
9049#             default assignment pattern is found.
9050#
9051# Side effects: None.
9052#
9053# Notes:      Eight patterns are tried in order:
9054#             ||, //=, defined ternary, unless
9055#             defined, ||=, //, multi-line if
9056#             !defined, unless defined block.
9057#             Comment lines are stripped from the
9058#             code before matching to avoid false
9059#             positives. Delegates to
9060#             _clean_default_value for value
9061#             normalisation.
9062# --------------------------------------------------
9063sub _extract_default_value {
9064
212
11010
        my ($self, $param, $code) = @_;
9065
9066
212
361
        return undef unless $param && $code;
9067
9068        # Clean up the code for easier pattern matching
9069        # Remove comments to avoid false positives
9070
209
151
        my $clean_code = $code;
9071
209
295
        $clean_code =~ s/#.*$//gm;
9072
209
2073
        $clean_code =~ s/^\s+|\s+$//g;
9073
9074        # Pattern 1: $param = $param || 'default_value'
9075        # Also handles: $param = $arg || 'default'
9076
209
6662
        if ($clean_code =~ /\$$param\s*=\s*(?:\$$param|\$[a-zA-Z_]\w*)\s*\|\|\s*([^;]+)/) {
9077
12
19
                my $default = $1;
9078
12
9
                $default =~ s/\s*;\s*$//;
9079
12
12
                $default = $self->_clean_default_value($default);
9080
12
30
                return $default if defined $default;
9081        }
9082
9083        # Pattern 2: $param //= 'default_value'
9084
197
2283
        if ($clean_code =~ /\$$param\s*\/\/=\s*([^;]+)/) {
9085
10
12
                my $default = $1;
9086
10
12
                $default =~ s/\s*;\s*$//;
9087
10
12
                $default = $self->_clean_default_value($default);
9088
10
28
                return $default if defined $default;
9089        }
9090
9091        # Pattern 3: $param = defined $param ? $param : 'default'
9092        # Also handles: $param = defined $arg ? $arg : 'default'
9093
188
8898
        if ($clean_code =~ /\$$param\s*=\s*defined\s+(?:\$$param|\$[a-zA-Z_]\w*)\s*\?\s*(?:\$$param|\$[a-zA-Z_]\w*)\s*:\s*([^;]+)/) {
9094
6
9
                my $default = $1;
9095
6
6
                $default =~ s/\s*;\s*$//;
9096
6
6
                $default = $self->_clean_default_value($default);
9097
6
17
                return $default if defined $default;
9098        }
9099
9100        # Pattern 4: $param = 'default' unless defined $param;
9101
182
5908
        if ($clean_code =~ /\$$param\s*=\s*([^;]+?)\s+unless\s+defined\s+(?:\$$param|\$[a-zA-Z_]\w*)/) {
9102
3
3
                my $default = $1;
9103
3
4
                $default = $self->_clean_default_value($default);
9104
3
10
                return $default if defined $default;
9105        }
9106
9107        # Pattern 5: $param ||= 'default'
9108
179
2038
        if ($clean_code =~ /\$$param\s*\|\|=\s*([^;]+)/) {
9109
4
18
                my $default = $1;
9110
4
7
                $default =~ s/\s*;\s*$//;
9111
4
6
                $default = $self->_clean_default_value($default);
9112
4
14
                return $default if defined $default;
9113        }
9114
9115        # Pattern 6: $param = $arg // 'default'
9116
175
4819
        if ($clean_code =~ /\$$param\s*=\s*(?:\$$param|\$[a-zA-Z_]\w*)\s*\/\/\s*([^;]+)/) {
9117
2
2
                my $default = $1;
9118
2
2
                $default =~ s/\s*;\s*$//;
9119
2
3
                $default = $self->_clean_default_value($default);
9120
2
5
                return $default if defined $default;
9121        }
9122
9123        # Pattern 7: Multi-line: if (!defined $param) { $param = 'default'; }
9124
174
4763
        if ($clean_code =~ /if\s*\(\s*!defined\s+\$$param\s*\)\s*\{[^}]*\$$param\s*=\s*([^;]+)/s) {
9125
1
2
                my $default = $1;
9126
1
1
                $default =~ s/\s*;\s*$//;
9127
1
1
                $default = $self->_clean_default_value($default);
9128
1
4
                return $default if defined $default;
9129        }
9130
9131        # Pattern 8: unless (defined $param) { $param = 'default'; }
9132
173
4600
        if ($clean_code =~ /unless\s*\(\s*defined\s+\$$param\s*\)\s*\{[^}]*\$$param\s*=\s*([^;]+)/s) {
9133
1
1
                my $default = $1;
9134
1
2
                $default =~ s/\s*;\s*$//;
9135
1
1
                $default = $self->_clean_default_value($default);
9136
1
4
                return $default if defined $default;
9137        }
9138
9139
172
434
        return undef;
9140}
9141
9142# --------------------------------------------------
9143# _extract_test_hints
9144#
9145# Purpose:    Extract structured test hints from
9146#             a method's code and schema, including
9147#             boundary values, invalid inputs, and
9148#             valid input examples from POD.
9149#
9150# Entry:      $method - method hashref.
9151#             $schema - schema hashref as built so
9152#                       far by _analyze_method.
9153#
9154# Exit:       Returns a hints hashref with keys:
9155#             boundary_values, invalid_inputs,
9156#             equivalence_classes, valid_inputs.
9157#             Keys with empty arrays are deleted
9158#             before returning.
9159#
9160# Side effects: None.
9161# --------------------------------------------------
9162sub _extract_test_hints {
9163
288
271
        my ($self, $method, $schema) = @_;
9164
9165
288
538
        my %hints = (
9166                boundary_values => [],
9167                invalid_inputs => [],
9168                equivalence_classes => [],
9169                valid_inputs => [],
9170        );
9171
9172
288
244
        my $code = $method->{body};
9173
288
286
        return {} unless $code;
9174
9175
287
376
        $self->_extract_invalid_input_hints($code, \%hints);
9176
287
360
        $self->_extract_boundary_value_hints($code, \%hints);
9177
9178        # prune empties
9179
287
400
        for my $k (keys %hints) {
9180
1148
1148
627
1149
                delete $hints{$k} unless @{$hints{$k}};
9181        }
9182
9183
287
326
        return \%hints;
9184}
9185
9186# --------------------------------------------------
9187# _extract_invalid_input_hints
9188#
9189# Purpose:    Detect likely invalid input values
9190#             from a method body by looking for
9191#             defined checks, empty string checks,
9192#             and negative number checks.
9193#
9194# Entry:      $code  - method body source string.
9195#             $hints - hints hashref (modified in
9196#                      place via invalid_inputs key).
9197#
9198# Exit:       Returns nothing. Appends to
9199#             $hints->{invalid_inputs}.
9200#
9201# Side effects: None.
9202# --------------------------------------------------
9203sub _extract_invalid_input_hints {
9204
294
927
        my ($self, $code, $hints) = @_;
9205
9206        # undef invalid
9207
294
350
        if ($code =~ /defined\s*\(\s*\$/) {
9208
5
5
4
8
                push @{ $hints->{invalid_inputs} }, 'undef';
9209        }
9210
9211        # empty string invalid
9212
294
546
        if ($code =~ /\beq\s*''/ || $code =~ /\blength\s*\(/) {
9213
5
5
5
7
                push @{ $hints->{invalid_inputs} }, '';
9214        }
9215
9216        # negative number invalid
9217
294
331
        if ($code =~ /\$\w+\s*<\s*0/) {
9218
7
7
8
10
                push @{ $hints->{invalid_inputs} }, -1;
9219        }
9220}
9221
9222# --------------------------------------------------
9223# _extract_boundary_value_hints
9224#
9225# Purpose:    Extract numeric boundary values from
9226#             comparison operators in a method body,
9227#             adding both the boundary value and
9228#             the value one step either side.
9229#
9230# Entry:      $code  - method body source string.
9231#             $hints - hints hashref (modified in
9232#                      place via boundary_values key).
9233#
9234# Exit:       Returns nothing. Appends to and
9235#             deduplicates $hints->{boundary_values}.
9236#
9237# Side effects: None.
9238# --------------------------------------------------
9239sub _extract_boundary_value_hints {
9240
292
921
        my ($self, $code, $hints) = @_;
9241
9242
292
637
        while ($code =~ /\$\w+\s*(<=|<|>=|>)\s*(\d+)/g) {
9243
24
37
                my ($op, $n) = ($1, $2);
9244
9245
24
49
                if ($op eq '<') {
9246
11
11
12
29
                        push @{ $hints->{boundary_values} }, $n, $n+1;
9247                } elsif ($op eq '<=') {
9248
2
2
1
5
                        push @{ $hints->{boundary_values} }, $n, $n+1;
9249                } elsif ($op eq '>') {
9250
9
9
7
24
                        push @{ $hints->{boundary_values} }, $n, $n-1;
9251                } elsif ($op eq '>=') {
9252
2
2
2
6
                        push @{ $hints->{boundary_values} }, $n, $n-1;
9253                }
9254        }
9255
9256        # Remove duplicates
9257
292
196
        my %seen;
9258
292
48
292
184
89
427
        $hints->{boundary_values} = [ grep { !$seen{$_}++ } @{ $hints->{boundary_values} } ];
9259}
9260
9261# --------------------------------------------------
9262# _extract_pod_examples
9263#
9264# Purpose:    Extract example method call patterns
9265#             from a method's SYNOPSIS POD section
9266#             and add them as valid_inputs hints.
9267#
9268# Entry:      $pod   - POD string for the method.
9269#                      May be undef.
9270#             $hints - hints hashref (modified in
9271#                      place via valid_inputs key).
9272#
9273# Exit:       Returns $hints. Appends to
9274#             $hints->{valid_inputs}.
9275#
9276# Side effects: Logs the number of examples found
9277#               to stdout when verbose is set.
9278# --------------------------------------------------
9279sub _extract_pod_examples {
9280
288
276
        my ($self, $pod, $hints) = @_;
9281
9282
288
320
        return $hints unless $pod;
9283
9284
106
85
        my @examples;
9285
9286        # Extract SYNOPSIS
9287
106
162
        return $hints unless $pod =~ /=head2\s+SYNOPSIS\s*(.+?)(?=\n=head|\z)/s;
9288
4
5
        my $synopsis = $1;
9289
9290        # Constructor examples: ->wilma(foo => 'bar', count => 5)
9291
4
17
        while ($synopsis =~ /->([a-z_0-9A-Z]+)\s*\(\s*(.*?)\s*\)/sg) {
9292
4
5
                my ($method, $args) = ($1, $2);
9293
4
3
                my %kv;
9294
9295
4
9
                while ($args =~ /(\w+)\s*=>\s*(?:'([^']*)'|"([^"]*)"|(\d+))/g) {
9296
7
6
                        my $key = $1;
9297
7
14
                        my $val = defined $2 ? $2 : defined $3 ? $3 : $4;
9298
7
15
                        $kv{$key} = $val;
9299                }
9300
9301
4
13
                push @examples, {
9302                        style => 'named',
9303                        source => 'pod',
9304                        args => \%kv,
9305                        function => $method, # TODO: add a sanity check this is what we expect
9306                } if %kv;
9307        }
9308
9309
4
4
        unless(scalar(@examples)) {
9310                # Positional calls: func($a, $b)
9311
1
5
                while ($synopsis =~ /\b(\w+)\s*\(\s*(.*?)\s*\)/sg) {
9312
1
2
                        my ($func, $argstr) = ($1, $2);
9313
9314                        # next if $func eq 'new';       # already handled
9315
9316
1
2
3
5
                        my @args = map { s/^\s+|\s+$//gr } split /\s*,\s*/, $argstr;
9317
9318
1
1
                        next unless @args;
9319
9320
1
3
                        push @examples, {
9321                                style   => 'positional',
9322                                source  => 'pod',
9323                                function => $func,
9324                                args    => \@args,
9325                        };
9326                }
9327        }
9328
9329
4
4
        if (scalar(@examples)) {
9330
4
11
                $hints->{valid_inputs} ||= [];
9331
4
4
4
3
                push @{ $hints->{valid_inputs} }, @examples;
9332
9333
4
7
                $self->_log("  POD: extracted " . scalar(@examples) . " example call(s)");
9334        }
9335
9336
4
6
        for my $k (qw(boundary_values invalid_inputs valid_inputs equivalence_classes)) {
9337
16
20
                $hints->{$k} //= [];
9338        }
9339
9340
4
5
        return $hints;
9341}
9342
9343# --------------------------------------------------
9344# _clean_default_value
9345#
9346# Purpose:    Normalise a raw default value string
9347#             extracted from code or POD into a
9348#             clean Perl scalar, handling quoted
9349#             strings, numeric literals, boolean
9350#             keywords, empty containers, and
9351#             undef.
9352#
9353# Entry:      $value     - raw value string.
9354#                          May be undef.
9355#             $from_code - true if the value was
9356#                          extracted from source
9357#                          code (affects escape
9358#                          sequence handling).
9359#
9360# Exit:       Returns the cleaned value:
9361#               undef   for undef or unparseable
9362#               {}      for empty hashrefs
9363#               []      for empty arrayrefs
9364#               integer for whole numbers
9365#               float   for decimal numbers
9366#               1 or 0  for boolean keywords
9367#               string  for everything else
9368#
9369# Side effects: None.
9370# --------------------------------------------------
9371sub _clean_default_value {
9372
170
188
        my ($self, $value, $from_code) = @_;
9373
9374
170
166
        return unless defined $value;
9375
9376        # Remove leading/trailing whitespace
9377
168
272
        $value =~ s/^\s+|\s+$//g;
9378
9379        # Remove parenthetical notes like "(no password)" only if there's content before them
9380
168
147
        $value =~ s/(\S+)\s*\([^)]+\)\s*$/$1/;
9381
168
209
        $value =~ s/^\s+|\s+$//g;
9382
9383        # Handle chained || or // operators - extract the rightmost value
9384
168
291
        if ($value =~ /\|\||\/{2}/) {
9385
7
27
                my @parts = split(/\s*(?:\|\||\/{2})\s*/, $value);
9386
7
7
                $value = $parts[-1];
9387
7
10
                $value =~ s/^\s+|\s+$//g;
9388        }
9389
9390        # Remove trailing semicolon if present
9391
168
124
        $value =~ s/;\s*$//;
9392
9393        # Handle q{}, qq{}, qw{} quotes
9394
168
269
        if ($value =~ /^qq?\{(.*?)\}$/s) {
9395
3
4
                $value = $1;
9396        } elsif ($value =~ /^qw\{(.*?)\}$/s) {
9397
0
0
                $value = $1;
9398        } elsif ($value =~ /^q[qwx]?\s*([^a-zA-Z0-9\{\[])(.*?)\1$/s) {
9399
0
0
                $value = $2;
9400        }
9401
9402        # Handle quoted strings
9403
168
210
        if ($value =~ /^(['"])(.*)\1$/s) {
9404
48
45
                $value = $2;
9405
9406
48
48
                if ($from_code) {
9407                        # In regex captures from source code, escape sequences are doubled
9408                        # \\n in capture needs to become \n for the test
9409
17
18
                        $value =~ s/\\\\/\\/g;
9410                }
9411
9412                # Only unescape the quote characters themselves
9413
48
35
                $value =~ s/\\"/"/g;
9414
48
34
                $value =~ s/\\'/'/g;
9415
9416                # If NOT from code (i.e., from POD), interpret escape sequences
9417
48
46
                unless ($from_code) {
9418
31
21
                        $value =~ s/\\n/\n/g;
9419
31
24
                        $value =~ s/\\r/\r/g;
9420
31
21
                        $value =~ s/\\t/\t/g;
9421
31
25
                        $value =~ s/\\\\/\\/g;
9422                }
9423        }
9424
9425        # Sometimes trailing ) is left on
9426
168
167
        if($value !~ /^\(/) {
9427
166
124
                $value =~ s/\)$//;
9428        }
9429
9430        # Handle Perl empty hash (must be before numeric/boolean checks)
9431
168
155
        if ($value =~ /^\{\s*\}$/) {
9432
6
27
                return {};
9433        }
9434
9435        # Handle Perl empty list/array
9436
162
152
        if ($value =~ /^\[\s*\]$/) {
9437
6
9
                return [];
9438        }
9439
9440        # Handle numeric values
9441
156
250
        if ($value =~ /^-?\d+(?:\.\d+)?$/) {
9442
68
66
                if ($value =~ /\./) {
9443
11
35
                        return $value + 0;
9444                } else {
9445
57
105
                        return int($value);
9446                }
9447        }
9448
9449        # Handle boolean keywords
9450
88
126
        if ($value =~ /^(true|false)$/i) {
9451
9
33
                return lc($1) eq 'true' ? 1 : 0;
9452        }
9453
9454        # Handle Perl boolean constants
9455
79
93
        if ($value eq '1') {
9456
0
0
                return 1;
9457        } elsif ($value eq '0') {
9458
0
0
                return 0;
9459        }
9460
9461        # Handle undef
9462
79
72
        if ($value eq 'undef') {
9463
12
17
                return undef;
9464        }
9465
9466        # Handle __PACKAGE__ and similar constants
9467
67
66
        if ($value eq '__PACKAGE__') {
9468
1
1
                return '__PACKAGE__';
9469        }
9470
9471        # Remove surrounding parentheses
9472
66
54
        $value =~ s/^\((.+)\)$/$1/;
9473
9474        # Handle expressions we can't evaluate
9475
66
115
        if ($value =~ /^\$[a-zA-Z_]/ || $value =~ /\(.*\)/) {
9476
2
8
                return if($value =~ /^\$|\@|\%/);       # The default is a value, so who knows its type?
9477                # return $value;
9478        }
9479
9480
66
130
        return $value;
9481}
9482
9483# --------------------------------------------------
9484# _validate_pod_code_agreement
9485#
9486# Purpose:    Compare POD parameter documentation
9487#             against code-inferred parameters and
9488#             return a list of disagreements when
9489#             strict_pod mode is enabled.
9490#
9491# Entry:      $pod_params  - hashref of parameters
9492#                            from POD analysis.
9493#             $code_params - hashref of parameters
9494#                            from code analysis.
9495#             $method_name - method name string,
9496#                            used for context in
9497#                            error messages.
9498#
9499# Exit:       Returns a list of disagreement
9500#             strings. Returns an empty list if
9501#             all parameters agree.
9502#
9503# Side effects: None.
9504#
9505# Notes:      Type mismatches are classified as
9506#             either 'compatible' (e.g. integer vs
9507#             number) or 'incompatible' via
9508#             _types_are_compatible. $self and
9509#             $class are excluded from undocumented
9510#             parameter warnings in appropriate
9511#             context.
9512# --------------------------------------------------
9513sub _validate_pod_code_agreement {
9514
17
50
        my ($self, $pod_params, $code_params, $method_name) = @_;
9515
9516
17
18
        my @errors;
9517
9518        # Get all parameter names from both sources
9519
17
27
28
39
        my %all_params = map { $_ => 1 } (keys %$pod_params, keys %$code_params);
9520
9521
17
33
        foreach my $param (sort keys %all_params) {
9522
23
38
                my $pod = $pod_params->{$param} || {};
9523
23
32
                my $code = $code_params->{$param} || {};
9524
9525                # Params from a =head3|4 Input formal spec are the authoritative API
9526                # definition — they are exempt from POD/code disagreement checks since
9527                # the spec takes precedence over heuristic code analysis.
9528
23
28
                next if $pod->{_from_input_spec};
9529
9530                # Check if parameter exists in both
9531
23
86
                if (exists $pod_params->{$param} && !exists $code_params->{$param}) {
9532
3
3
                        push @errors, "Parameter '\$$param' documented in POD but not found in code signature";
9533
3
8
                        next;
9534                }
9535
9536
20
38
                if(!exists $pod_params->{$param} && exists $code_params->{$param}) {
9537
16
18
                        if($param eq 'class') {
9538                                # $class is the class invocant, not a user-facing parameter
9539
1
2
                                next;
9540                        }
9541
15
20
                        if($param eq 'self') {
9542                                # $self is the instance invocant, not a user-facing parameter
9543
1
1
                                next;
9544                        }
9545
14
19
                        push @errors, "Parameter '\$$param' found in code but not documented in POD";
9546
14
17
                        next;
9547                }
9548
9549                # Compare types if both exist
9550
4
12
                if ($pod->{type} && $code->{type} && $pod->{type} ne $code->{type}) {
9551
2
4
                        if (!$self->_types_are_compatible($pod->{type}, $code->{type})) {
9552
1
3
                                push @errors, "Type mismatch for '\$$param': POD says '$pod->{type}', code suggests '$code->{type}' (incompatible)";
9553                        } else {
9554
1
2
                                push @errors, "Type difference for '\$$param': POD says '$pod->{type}', code suggests '$code->{type}' (compatible)";
9555                        }
9556                }
9557
9558                # Compare optional status if both exist
9559
4
8
                if (exists $pod->{optional} && exists $code->{optional} &&
9560                        $pod->{optional} != $code->{optional}) {
9561
0
0
                        my $pod_status = $pod->{optional} ? 'optional' : 'required';
9562
0
0
                        my $code_status = $code->{optional} ? 'optional' : 'required';
9563
0
0
                        push @errors, "Optional status mismatch for '\$$param': POD says '$pod_status', code suggests '$code_status'";
9564                }
9565
9566                # Check constraints (min/max)
9567
4
9
                if (defined $pod->{min} && defined $code->{min} && $pod->{min} != $code->{min}) {
9568
0
0
                        push @errors, "Min constraint mismatch for '\$$param': POD says '$pod->{min}', code suggests '$code->{min}'";
9569                }
9570
9571
4
6
                if (defined $pod->{max} && defined $code->{max} && $pod->{max} != $code->{max}) {
9572
0
0
                        push @errors, "Max constraint mismatch for '\$$param': POD says '$pod->{max}', code suggests '$code->{max}'";
9573                }
9574
9575                # Check regex patterns
9576
4
7
                if ($pod->{matches} && $code->{matches} && $pod->{matches} ne $code->{matches}) {
9577
0
0
                        push @errors, "Pattern mismatch for '\$$param': POD says '$pod->{matches}', code suggests '$code->{matches}'";
9578                }
9579        }
9580
9581        # Return errors (empty array if no errors)
9582
17
49
        return @errors;
9583}
9584
9585# --------------------------------------------------
9586# _validate_strictness_level
9587#
9588# Purpose:    Validate and normalise the strict_pod
9589#             option value accepted by new() into
9590#             an integer level: 0 (off), 1 (warn),
9591#             or 2 (fatal).
9592#
9593# Entry:      $val - the raw value passed to
9594#                    strict_pod in new(). May be
9595#                    undef, a number, or a string.
9596#
9597# Exit:       Returns 0, 1, or 2.
9598#             Croaks if the value is not recognised.
9599#
9600# Side effects: None.
9601# --------------------------------------------------
9602sub _validate_strictness_level {
9603
468
12111
        my $val = $_[0];
9604
9605
468
1325
        return 0 unless defined $val;
9606
9607        # Numeric
9608
33
89
        return 0 if $val =~ /^(0|off|none)$/i;
9609
27
84
        return 1 if $val =~ /^(1|warn|warning)$/i;
9610
12
39
        return 2 if $val =~ /^(2|fatal|die|error)$/i;
9611
9612
2
10
        croak("Invalid value for --strict-pod: '$val' (use off|warn|fatal)");
9613}
9614
9615# --------------------------------------------------
9616# _types_are_compatible
9617#
9618# Purpose:    Determine whether two type strings
9619#             are compatible for POD/code agreement
9620#             checking, allowing semantically
9621#             equivalent types (e.g. 'integer' and
9622#             'number') to coexist without
9623#             triggering a strict POD warning.
9624#
9625# Entry:      $pod_type  - type string from POD.
9626#             $code_type - type string from code.
9627#
9628# Exit:       Returns 1 if compatible, 0 otherwise.
9629#
9630# Side effects: None.
9631# --------------------------------------------------
9632sub _types_are_compatible {
9633
20
31
        my ($self, $pod_type, $code_type) = @_;
9634
9635        # Exact match is always compatible
9636
20
31
        return 1 if $pod_type eq $code_type;
9637
9638        # Define compatibility matrix
9639
15
43
        my %compatible_types = (
9640                'integer' => ['number', 'scalar'],
9641                'number' => ['scalar'],
9642                'string' => ['scalar'],
9643                'scalar' => ['string', 'integer', 'number'],
9644                'arrayref' => ['array'],
9645                'hashref' => ['hash'],
9646        );
9647
9648        # Check if code_type is compatible with pod_type
9649
15
20
        if (my $allowed = $compatible_types{$pod_type}) {
9650
13
17
11
39
                return grep { $_ eq $code_type } @$allowed;
9651        }
9652
9653        # Check if pod_type is compatible with code_type
9654
2
5
        if (my $allowed = $compatible_types{$code_type}) {
9655
2
2
2
7
                return grep { $_ eq $pod_type } @$allowed;
9656        }
9657
9658
0
0
        return 0;       # Not compatible
9659}
9660
9661 - 9710
=head2 generate_pod_validation_report

Generate a human-readable report of all POD/code disagreements found
across a set of extracted schemas.

    my $schemas = $extractor->extract_all(no_write => 1);
    my $report  = $extractor->generate_pod_validation_report($schemas);
    print $report;

=head3 Arguments

=over 4

=item * C<$schemas>

A hashref of method name to schema hashref as returned by
C<extract_all>. Required.

=back

=head3 Returns

A string containing the full validation report, or a single line
confirming all methods passed if no disagreements were found.

=head3 Side effects

None.

=head3 Notes

Only methods whose schemas contain a C<_pod_validation_errors> key
(populated when C<strict_pod> is 1 or 2) appear in the report. If
C<strict_pod> was 0 when C<extract_all> was called, this method will
always return the all-passed message.

=head3 API specification

=head4 input

    {
        self    => { type => OBJECT,  isa => 'App::Test::Generator::SchemaExtractor' },
        schemas => { type => HASHREF },
    }

=head4 output

    { type => SCALAR }

=cut
9711
9712sub generate_pod_validation_report {
9713
17
1979
        my ($self, $schemas) = @_;
9714
9715
17
11
        my @reports;
9716
17
34
        foreach my $method_name (sort keys %$schemas) {
9717
26
21
                my $schema = $schemas->{$method_name};
9718
9719
26
34
                if (my $errors = $schema->{_pod_validation_errors}) {
9720
16
18
                        push @reports, "Method: $method_name";
9721
16
31
                        push @reports, "  Severity: " . ($schema->{_pod_disagreement} ? 'warning' : 'fatal');
9722
16
14
                        push @reports, "  Errors:";
9723
16
16
17
22
                        push @reports, map { "    - $_" } @$errors;
9724
16
18
                        push @reports, '';
9725                }
9726        }
9727
9728
17
21
        if (@reports) {
9729
11
28
                return join("\n", "POD/Code Validation Report:", '=' x 40, '', @reports);
9730        } else {
9731
6
8
                return 'POD/Code Validation: All methods passed consistency checks.';
9732        }
9733}
9734
9735 - 9739
=head2 _log

Log a message if verbose mode is on.

=cut
9740
9741sub _log {
9742
4245
5950
        my($self, $msg) = @_;
9743
9744
4245
5135
        print "$msg\n" if $self->{verbose};
9745}
9746
9747 - 9784
=head1 NOTES

This is pre-pre-alpha proof of concept code.
Nevertheless,
it is useful for creating a template which you can modify to create a working schema to pass into L<App::Test::Generator>.

=head1 TODO

Extend C<=head4 Input> parsing to cover the C<enum>/C<memberof> constraint
synonym (union types, e.g. C<scalar | scalarref>, are already handled by
C<_map_formal_input_type>).

=head1 SEE ALSO

=over 4

=item * L<App::Test::Generator> - Generate fuzz and corpus-driven test harnesses

Output from this module serves as input to that module.
So with well-documented code, you can automatically create your tests.

=item * L<App::Test::Generator::Template> - Template of the file of tests created by C<App::Test::Generator>

=back

=head1 AUTHOR

Nigel Horne, C<< <njh at nigelhorne.com> >>

=head1 LICENCE AND COPYRIGHT

Copyright 2025-2026 Nigel Horne.

Usage is subject to GPL2 licence terms.
If you use it,
please let me know.

=cut
9785
97861;