lib/Email/Abuse/Investigator.pm

Structural Coverage (Approximate)

TER1 (Statement): 85.80%
TER2 (Branch): 72.05%
TER3 (LCSAJ): 96.2% (230/239)
Approximate LCSAJ segments: 663

LCSAJ Legend

● Covered — this LCSAJ path was executed during testing.

● Not covered — this LCSAJ path was never executed. These are the paths to focus on.

Multiple dots on a line indicate that multiple control-flow paths begin at that line. Hovering over any dot shows:

        start → end → jump

start – first line of the linear sequence
end – last line before control flow changes
jump – line execution jumps to next

Uncovered paths show [NOT COVERED] in the tooltip.

Mutant Testing Legend

Survived (tests missed this) Killed (tests detected this) No mutation

1: package Email::Abuse::Investigator; 2: 3: use strict; 4: use warnings; 5: use autodie qw(:all); 6: use Time::Piece; 7: 8: =head1 NAME 9: 10: Email::Abuse::Investigator - Analyse spam email to identify originating hosts, 11: hosted URLs, and suspicious domains 12: 13: =head1 VERSION 14: 15: Version 0.08 16: 17: =cut 18: 19: our $VERSION = '0.08'; 20: 21: =head1 SYNOPSIS 22: 23: use Email::Abuse::Investigator; 24: 25: my $analyser = Email::Abuse::Investigator->new( verbose => 1 ); 26: $analyser->parse_email($raw_email_text); 27: 28: # Originating IP and its network owner 29: my $origin = $analyser->originating_ip(); 30: 31: # All HTTP/HTTPS URLs found in the body 32: my @urls = $analyser->embedded_urls(); 33: 34: # All domains extracted from mailto: links and bare addresses in the body 35: my @mdoms = $analyser->mailto_domains(); 36: 37: # All domains mentioned anywhere (union of the above) 38: my @adoms = $analyser->all_domains(); 39: 40: # Full printable report 41: print $analyser->report(); 42: 43: =head1 DESCRIPTION 44: 45: C<Email::Abuse::Investigator> examines the raw source of a spam/phishing e-mail 46: and answers the questions manual abuse investigators ask: 47: 48: =over 4 49: 50: =item 1. Where did the message really come from? 51: 52: Walks the C<Received:> chain, skips private/trusted IPs, and identifies the 53: first external hop. Enriches with rDNS, WHOIS/RDAP org name and abuse 54: contact. Both IPv4 and IPv6 addresses are supported. 55: 56: =item 2. Who hosts the advertised web sites? 57: 58: Extracts every C<http://> and C<https://> URL from both plain-text and HTML 59: parts, resolves each hostname to an IP, and looks up the network owner. 60: 61: =item 3. Who owns the reply-to / contact domains? 62: 63: Extracts domains from C<mailto:> links, bare e-mail addresses in the body, 64: the C<From:>/C<Reply-To:>/C<Sender:>/C<Return-Path:> headers, C<DKIM-Signature: d=> 65: (the signing domain), C<List-Unsubscribe:> (the ESP or bulk-sender domain), and the 66: C<Message-ID:> domain. For each unique domain it gathers: 67: 68: =over 8 69: 70: =item * Domain registrar and registrant (WHOIS) 71: 72: =item * Web-hosting IP and network owner (A record -> RDAP) 73: 74: =item * Mail-hosting IP and network owner (MX record -> RDAP) 75: 76: =item * DNS nameserver operator (NS record -> RDAP) 77: 78: =item * Whether the domain was recently registered (potential flag) 79: 80: =back 81: 82: =back 83: 84: =head1 REQUIRED MODULES 85: 86: The following modules are mandatory: 87: 88: Readonly::Values::Months 89: Socket (core since Perl 5) 90: IO::Socket::INET (core since Perl 5) 91: MIME::QuotedPrint (core since Perl 5.8) 92: MIME::Base64 (core since Perl 5.8) 93: 94: The following are optional but strongly recommended: 95: 96: Net::DNS -- enables MX, NS, AAAA record lookups 97: LWP::UserAgent -- enables RDAP (faster and richer than raw WHOIS) 98: HTML::LinkExtor -- enables structural HTML link extraction 99: CHI -- enables cross-message IP/domain result caching 100: IO::Socket::IP -- enables IPv6 WHOIS connections 101: 102: =cut 103: 104: use Carp qw(croak carp); 105: use IO::Select; 106: use IO::Socket::INET; 107: use MIME::QuotedPrint qw( decode_qp ); 108: use MIME::Base64 qw( decode_base64 ); 109: use Object::Configure; 110: use Params::Get; 111: use Params::Validate::Strict; 112: use Readonly; 113: use Readonly::Values::Months; 114: use Socket qw( inet_aton inet_ntoa AF_INET ); 115: 116: # ----------------------------------------------------------------------- 117: # Optional modules -- gracefully degraded when absent 118: # ----------------------------------------------------------------------- 119: 120: # Net::DNS enables MX, NS, AAAA lookups; falls back to gethostbyname 121: my $HAS_NET_DNS; 122: 123: # LWP::UserAgent enables RDAP queries; falls back to raw WHOIS 124: my $HAS_LWP; 125: my $HAS_CONN_CACHE; 126: 127: # HTML::LinkExtor enables structural HTML link extraction 128: my $HAS_HTML_LINKEXTOR; 129: 130: # CHI enables a persistent cross-message cache for IP/domain data 131: my $HAS_CHI; 132: 133: # IO::Socket::IP provides dual-stack (IPv4+IPv6) socket support 134: my $HAS_IO_SOCKET_IP; 135: 136: # Domain::PublicSuffix enables accurate eTLD+1 normalisation 137: my $HAS_PUBLIC_SUFFIX; 138: 139: # AnyEvent::DNS enables parallel DNS queries 140: my $HAS_ANYEVENT_DNS; 141: 142: BEGIN { 143: $HAS_NET_DNS = eval { require Net::DNS; 1 }; 144: $HAS_LWP = eval { require LWP::UserAgent; 1 }; 145: $HAS_CONN_CACHE = eval { require LWP::ConnCache; 1 }; 146: $HAS_HTML_LINKEXTOR= eval { require HTML::LinkExtor; 1 }; 147: $HAS_CHI = eval { require CHI; 1 }; 148: $HAS_IO_SOCKET_IP = eval { require IO::Socket::IP; 1 }; 149: $HAS_PUBLIC_SUFFIX = eval { require Domain::PublicSuffix; 1 }; 150: $HAS_ANYEVENT_DNS = eval { require AnyEvent::DNS; 1 }; 151: } 152: 153: # ----------------------------------------------------------------------- 154: # Constants -- all magic numbers and strings live here 155: # ----------------------------------------------------------------------- 156: 157: # WHOIS protocol port (IANA-assigned) 158: Readonly::Scalar my $WHOIS_PORT => 43; 159: 160: # Bytes to read per sysread() call from a WHOIS socket 161: Readonly::Scalar my $WHOIS_READ_CHUNK => 4096; 162: 163: # Maximum WHOIS response bytes stored in whois_raw (keep reports compact) 164: Readonly::Scalar my $WHOIS_RAW_MAX => 2048; 165: 166: # Maximum multipart nesting depth (recursion guard -- RFC 2046 has no limit 167: # but we cap it to prevent stack exhaustion on crafted messages) 168: Readonly::Scalar my $MAX_MULTIPART_DEPTH => 20; 169: 170: # Number of days before registration that triggers recently_registered flag 171: Readonly::Scalar my $RECENT_REG_DAYS => 180; 172: 173: # Number of days ahead of expiry that triggers domain_expires_soon flag 174: Readonly::Scalar my $EXPIRY_WARN_DAYS => 30; 175: 176: # Seconds in a day -- used in date arithmetic throughout 177: Readonly::Scalar my $SECS_PER_DAY => 86400; 178: 179: # Suspicious date window: dates outside +/- 7 days raise a flag 180: Readonly::Scalar my $DATE_SKEW_DAYS => 7; 181: 182: # Maximum positive timezone offset in minutes (+14:00 = Line Islands) 183: Readonly::Scalar my $TZ_MAX_POS_MINS => 840; 184: 185: # Maximum negative timezone offset in minutes (-12:00 = Baker Island) 186: Readonly::Scalar my $TZ_MAX_NEG_MINS => 720; 187: 188: # High-risk score threshold 189: Readonly::Scalar my $SCORE_HIGH => 9; 190: 191: # Medium-risk score threshold 192: Readonly::Scalar my $SCORE_MEDIUM => 5; 193: 194: # Low-risk score threshold 195: Readonly::Scalar my $SCORE_LOW => 2; 196: 197: # Flag severity weights (contribute to the numeric risk score) 198: Readonly::Hash my %FLAG_WEIGHT => ( 199: HIGH => 3, 200: MEDIUM => 2, 201: LOW => 1, 202: INFO => 0, 203: ); 204: 205: # Maximum merged-role display string length before summarisation kicks in 206: Readonly::Scalar my $ROLE_MAX_LEN => 80; 207: 208: # CHI cache TTL in seconds (1 hour -- IP allocations change slowly) 209: Readonly::Scalar my $CACHE_TTL_SECS => 3600; 210: 211: # Default constructor timeout for network operations (seconds) 212: Readonly::Scalar my $DEFAULT_TIMEOUT => 10; 213: 214: # Maximum role string length before truncation 215: Readonly::Scalar my $ROLE_WRAP_LEN => 66; 216: 217: # ----------------------------------------------------------------------- 218: # Private ranges -- IPs that are never actionable abuse targets 219: # ----------------------------------------------------------------------- 220: 221: # Both IPv4 and IPv6 private/reserved ranges. Each entry is a compiled 222: # regex; _is_private() iterates over them and returns true on first match. 223: my @PRIVATE_RANGES = ( 224: # IPv4 ranges 225: qr/^0\./, # 0.0.0.0/8 this-network (RFC 1122) 226: qr/^127\./, # 127.0.0.0/8 loopback 227: qr/^10\./, # 10.0.0.0/8 RFC 1918 228: qr/^192\.168\./, # 192.168.0.0/16 RFC 1918 229: qr/^172\.(?:1[6-9]|2\d|3[01])\./, # 172.16.0.0/12 RFC 1918 230: qr/^169\.254\./, # 169.254.0.0/16 link-local 231: qr/^100\.(?:6[4-9]|[7-9]\d|1(?:[01]\d|2[0-7]))\./, # 100.64.0.0/10 CGN (RFC 6598) 232: qr/^192\.0\.0\./, # 192.0.0.0/24 IETF protocol (RFC 6890) 233: qr/^192\.0\.2\./, # 192.0.2.0/24 TEST-NET-1 (RFC 5737) 234: qr/^198\.51\.100\./, # 198.51.100.0/24 TEST-NET-2 (RFC 5737) 235: qr/^203\.0\.113\./, # 203.0.113.0/24 TEST-NET-3 (RFC 5737) 236: qr/^255\./, # 255.0.0.0/8 broadcast 237: # IPv6 ranges 238: qr/^::1$/, # IPv6 loopback 239: qr/^fe80:/i, # IPv6 link-local (fe80::/10) 240: qr/^fc/i, # IPv6 ULA fc00::/7 241: qr/^fd/i, # IPv6 ULA fd00::/8 242: qr/^2001:db8:/i, # IPv6 documentation range (RFC 3849) 243: qr/^64:ff9b:/i, # IPv6 NAT64 well-known prefix 244: ); 245: 246: # Priority-ordered patterns for extracting IPs from Received: headers. 247: # Covers bracketed IPv4, bracketed IPv6, parenthesised address, and bare dotted-quad. 248: my @RECEIVED_IP_RE = ( 249: qr/\[\s*([\d.]+)\s*\]/, # [1.2.3.4] 250: qr/\[\s*([0-9a-fA-F:]+)\s*\]/, # [IPv6 address] 251: qr/$\s*[\w.-]*\s*\[?\s*([\d.]+)\s*\]?\s*$/, # (hostname [1.2.3.4]) 252: qr/from\s+[\w.-]+\s+([\d.]+)/, # from hostname addr 253: qr/([\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3})/, # bare dotted-quad fallback 254: ); 255: 256: # ----------------------------------------------------------------------- 257: # Default configuration -- overridable via Object::Configure 258: # ----------------------------------------------------------------------- 259: 260: # Object::Configure may overlay 261: # values from a file before new() uses them. Use Readonly for constants 262: # that should never be overridden at runtime. 263: 264: # ----------------------------------------------------------------------- 265: # Trusted domains (infrastructure -- never report these as abuse targets) 266: # Can be overrideen at runtime by Object::Configure 267: # ----------------------------------------------------------------------- 268: 269: my %TRUSTED_DOMAINS = map { $_ => 1 } qw( 270: gmail.com googlemail.com yahoo.com outlook.com hotmail.com 271: google.com microsoft.com apple.com amazon.com 272: googlegroups.com groups.google.com 273: w3.org 274: fedex.com ups.com dhl.com usps.com royalmail.com 275: ); 276: 277: # ----------------------------------------------------------------------- 278: # URL shortener domains (real destination is hidden behind these) 279: # ----------------------------------------------------------------------- 280: 281: my %URL_SHORTENERS = map { $_ => 1 } qw( 282: bit.ly bitly.com tinyurl.com t.co ow.ly 283: goo.gl is.gd buff.ly ift.tt dlvr.it 284: short.link rebrand.ly tiny.cc cutt.ly rb.gy 285: shorturl.at bl.ink smarturl.it yourls.org clicky.me 286: snip.ly adf.ly bc.vc lnkd.in fb.me 287: youtu.be 288: ); 289: 290: # ----------------------------------------------------------------------- 291: # Well-known provider abuse contacts 292: # Can be overrideen at runtime by Object::Configure 293: # ----------------------------------------------------------------------- 294: 295: # Curated table of provider abuse contacts. Entries with only a 'form' 296: # key (no 'email') require web-form submission; abuse_contacts() suppresses 297: # email addresses for those providers and form_contacts() surfaces them. 298: my %PROVIDER_ABUSE = ( 299: # Google / Gmail 300: 'google.com' => { email => 'abuse@google.com', note => 'Also report Gmail accounts via https://support.google.com/mail/contact/abuse' }, 301: 'gmail.com' => { email => 'abuse@google.com', note => 'Report Gmail spam via https://support.google.com/mail/contact/abuse' }, 302: 'googlemail.com' => { email => 'abuse@google.com', note => 'Report via https://support.google.com/mail/contact/abuse' }, 303: '1e100.net' => { email => 'abuse@google.com', note => 'Google infrastructure' }, 304: 'blogspot.com' => { email => 'abuse@google.com', note => 'Blogger/Blogspot -- report via https://support.google.com/blogger/answer/76315' }, 305: 'blogger.com' => { email => 'abuse@google.com', note => 'Blogger platform abuse' }, 306: 'sites.google.com' => { email => 'abuse@google.com', note => 'Google Sites hosted content' }, 307: # Microsoft 308: 'microsoft.com' => { email => 'abuse@microsoft.com', note => 'Also report via https://www.microsoft.com/en-us/wdsi/support/report-unsafe-site' }, 309: 'outlook.com' => { email => 'abuse@microsoft.com', note => 'Report Outlook spam: https://support.microsoft.com/en-us/office/report-phishing' }, 310: 'hotmail.com' => { email => 'abuse@microsoft.com', note => 'Report via https://support.microsoft.com/en-us/office/report-phishing' }, 311: 'live.com' => { email => 'abuse@microsoft.com', note => 'Microsoft consumer mail' }, 312: 'office365.com' => { email => 'abuse@microsoft.com', note => 'Microsoft 365 infrastructure' }, 313: 'protection.outlook.com' => { email => 'abuse@microsoft.com', note => 'Microsoft EOP gateway' }, 314: # Yahoo 315: 'yahoo.com' => { email => 'abuse@yahoo-inc.com', note => 'Also use https://io.help.yahoo.com/contact/index' }, 316: 'yahoo.co.uk' => { email => 'abuse@yahoo-inc.com', note => 'Yahoo UK' }, 317: # Apple 318: 'apple.com' => { email => 'reportphishing@apple.com', note => 'iCloud / Apple Mail abuse' }, 319: 'icloud.com' => { email => 'reportphishing@apple.com', note => 'iCloud abuse' }, 320: 'me.com' => { email => 'reportphishing@apple.com', note => 'Apple legacy mail' }, 321: # Amazon / AWS 322: 'amazon.com' => { email => 'abuse@amazonaws.com', note => 'Also https://aws.amazon.com/forms/report-abuse' }, 323: 'amazonaws.com' => { email => 'abuse@amazonaws.com', note => 'AWS abuse form: https://aws.amazon.com/forms/report-abuse' }, 324: 'amazonses.com' => { email => 'abuse@amazonaws.com', note => 'Amazon SES sending infrastructure' }, 325: # Cloudflare 326: 'cloudflare.com' => { email => 'abuse@cloudflare.com', note => 'Report via https://www.cloudflare.com/abuse/' }, 327: # Fastly / Akamai 328: 'fastly.net' => { email => 'abuse@fastly.com', note => 'Fastly CDN' }, 329: 'akamai.com' => { email => 'abuse@akamai.com', note => 'Akamai CDN' }, 330: 'akamaitechnologies.com' => { email => 'abuse@akamai.com', note => 'Akamai CDN' }, 331: # Namecheap 332: 'namecheap.com' => { email => 'abuse@namecheap.com', note => 'Registrar abuse' }, 333: # GoDaddy -- web form only; email bounces 334: 'godaddy.com' => { 335: form => 'https://supportcenter.godaddy.com/AbuseReport', 336: form_paste => 'Select the abuse type (spam, phishing, malware etc). ' 337: . 'Enter the domain name in the Domain field. ' 338: . 'Paste the originating IP, risk flags, and the relevant ' 339: . 'Received: headers from the report below.', 340: form_upload => 'Take a screenshot of the report as a .png or .jpg, ' 341: . 'or export it as a .pdf.', 342: note => 'Registrar/host -- email reports not monitored, use web form', 343: }, 344: # SendGrid / Twilio 345: 'sendgrid.net' => { email => 'abuse@sendgrid.com', note => 'ESP -- include full headers' }, 346: 'sendgrid.com' => { email => 'abuse@sendgrid.com', note => 'ESP -- include full headers' }, 347: # Mailchimp / Mandrill 348: 'mailchimp.com' => { email => 'abuse@mailchimp.com', note => 'ESP abuse' }, 349: 'mandrillapp.com' => { email => 'abuse@mailchimp.com', note => 'Mandrill transactional ESP' }, 350: # OVH 351: 'ovh.net' => { email => 'abuse@ovh.net', note => 'OVH hosting' }, 352: 'ovh.com' => { email => 'abuse@ovh.com', note => 'OVH hosting' }, 353: # Hetzner 354: 'hetzner.com' => { email => 'abuse@hetzner.com', note => 'Hetzner hosting' }, 355: # Digital Ocean 356: 'digitalocean.com' => { email => 'abuse@digitalocean.com',note => 'DO abuse form: https://www.digitalocean.com/company/contact/#abuse' }, 357: # Linode / Akamai 358: 'linode.com' => { email => 'abuse@linode.com', note => 'Linode/Akamai Cloud' }, 359: # Constant Contact 360: 'constantcontact.com' => { email => 'abuse@constantcontact.com', note => 'ESP abuse' }, 361: 'r.constantcontact.com' => { email => 'abuse@constantcontact.com', note => 'Constant Contact sending infrastructure' }, 362: # HubSpot 363: 'hubspot.com' => { email => 'abuse@hubspot.com', note => 'ESP abuse' }, 364: 'hs-analytics.net' => { email => 'abuse@hubspot.com', note => 'HubSpot analytics infrastructure' }, 365: # Campaign Monitor 366: 'createsend.com' => { email => 'abuse@campaignmonitor.com', note => 'Campaign Monitor ESP' }, 367: 'cmail20.com' => { email => 'abuse@campaignmonitor.com', note => 'Campaign Monitor sending infrastructure' }, 368: # Klaviyo 369: 'klaviyo.com' => { email => 'abuse@klaviyo.com', note => 'ESP abuse' }, 370: # Brevo (formerly Sendinblue) 371: 'sendinblue.com' => { email => 'abuse@sendinblue.com', note => 'ESP abuse' }, 372: 'brevo.com' => { email => 'abuse@brevo.com', note => 'ESP abuse' }, 373: # Mailgun 374: 'mailgun.com' => { email => 'abuse@mailgun.com', note => 'ESP abuse' }, 375: 'mailgun.org' => { email => 'abuse@mailgun.com', note => 'Mailgun sending infrastructure' }, 376: # Postmark 377: 'postmarkapp.com' => { email => 'abuse@postmarkapp.com', note => 'ESP abuse' }, 378: # WordPress.com 379: 'wordpress.com' => { email => 'abuse@wordpress.com', note => 'WordPress.com hosted blog -- report via https://en.wordpress.com/abuse/' }, 380: 'wp.com' => { email => 'abuse@wordpress.com', note => 'WordPress.com short domain' }, 381: # Substack 382: 'substack.com' => { email => 'abuse@substack.com', note => 'Substack newsletter platform abuse' }, 383: # ActiveCampaign 384: 'activecampaign.com' => { email => 'abuse@activecampaign.com', note => 'ActiveCampaign ESP' }, 385: 'ac-tinker.com' => { email => 'abuse@activecampaign.com', note => 'ActiveCampaign tracking infrastructure' }, 386: # Salesforce Marketing Cloud 387: 'salesforce.com' => { email => 'abuse@salesforce.com', note => 'Salesforce Marketing Cloud / ExactTarget ESP' }, 388: 'mc.salesforce.com' => { email => 'abuse@salesforce.com', note => 'Salesforce Marketing Cloud sending infrastructure' }, 389: 'exacttarget.com' => { email => 'abuse@salesforce.com', note => 'ExactTarget / Salesforce Marketing Cloud ESP' }, 390: 'et.exacttarget.com' => { email => 'abuse@salesforce.com', note => 'ExactTarget sending infrastructure' }, 391: # Vultr 392: 'vultr.com' => { email => 'abuse@vultr.com', note => 'Vultr hosting' }, 393: # Contabo 394: 'contabo.com' => { email => 'abuse@contabo.com', note => 'Contabo hosting' }, 395: # Leaseweb 396: 'leaseweb.com' => { email => 'abuse@leaseweb.com', note => 'Leaseweb hosting' }, 397: # M247 398: 'm247.com' => { email => 'abuse@m247.com', note => 'M247 hosting' }, 399: # MarkMonitor -- web form only 400: 'markmonitor.com' => { 401: form => 'https://corp.markmonitor.com/domain/ui/abuse-report', 402: form_paste => 'Complete all fields including the domain name and your ' 403: . 'description of the abuse. Paste the originating IP, ' 404: . 'risk flags, and the relevant Received: headers from the ' 405: . 'report below.', 406: form_upload => 'Take a screenshot of the report as a .png or .jpg, or export it as a .pdf. MarkMonitor does not accept .eml files.', 407: note => 'Brand-protection registrar -- email reports not processed', 408: }, 409: # URL shortener operators 410: 'is.gd' => { email => 'abuse@is.gd', note => 'URL shortener -- report via https://is.gd/contact.php' }, 411: 'bitly.com' => { email => 'abuse@bitly.com', note => 'URL shortener abuse' }, 412: 'bit.ly' => { email => 'abuse@bitly.com', note => 'URL shortener abuse' }, 413: 'tinyurl.com' => { email => 'abuse@tinyurl.com', note => 'URL shortener abuse' }, 414: 'ow.ly' => { email => 'abuse@hootsuite.com', note => 'Hootsuite URL shortener' }, 415: 'buff.ly' => { email => 'abuse@buffer.com', note => 'Buffer URL shortener' }, 416: 'rb.gy' => { email => 'abuse@rb.gy', note => 'URL shortener abuse' }, 417: 'cutt.ly' => { email => 'abuse@cutt.ly', note => 'URL shortener abuse' }, 418: 'shorturl.at' => { email => 'abuse@shorturl.at', note => 'URL shortener abuse' }, 419: # Dynadot -- web form only 420: 'dynadot.com' => { 421: form => 'https://www.dynadot.com/report-abuse', 422: form_paste => 'Complete all fields including the domain name and your ' 423: . 'description of the abuse. Paste the originating IP, ' 424: . 'risk flags, and the relevant Received: headers from the ' 425: . 'report below.', 426: form_upload => 'Take a screenshot of the report as a .png or .jpg, ' 427: . 'or export it as a .pdf.', 428: note => 'Registrar -- email reports not monitored, use web form', 429: }, 430: # Global Domain Group -- web form only 431: 'globaldomaingroup.com' => { 432: form => 'https://globaldomaingroup.com/report-abuse', 433: form_paste => 'Complete all fields including the domain name and your ' 434: . 'description of the abuse. Paste the originating IP, ' 435: . 'risk flags, and the relevant Received: headers from the ' 436: . 'report below.', 437: form_upload => 'Attach the original spam message as an .eml file.', 438: note => 'Registrar -- email reports explicitly not accepted', 439: }, 440: # TPG / Internode (Australia) 441: 'tpgi.com.au' => { email => 'abuse@tpg.com.au', note => 'TPG Telecom Australia' }, 442: 'tpg.com.au' => { email => 'abuse@tpg.com.au', note => 'TPG Telecom Australia' }, 443: 'internode.on.net' => { email => 'abuse@internode.on.net',note => 'Internode Australia' }, 444: ); 445: 446: # ----------------------------------------------------------------------- 447: # Constructor 448: # ----------------------------------------------------------------------- 449: 450: =head1 METHODS 451: 452: =head2 new( %options ) 453: 454: Constructs and returns a new C<Email::Abuse::Investigator> analyser object. The 455: object is stateless until C<parse_email()> is called; all analysis results 456: are stored on the object and retrieved via the public accessor methods 457: documented below. 458: 459: A single object may be reused for multiple emails by calling C<parse_email()> 460: again: all per-message cached state from the previous message is discarded 461: automatically. Cross-message IP and domain lookup results are retained 462: in a shared CHI cache (if C<CHI> is installed) to avoid redundant network 463: queries across messages processed in the same process. 464: 465: =head3 Usage 466: 467: # Minimal -- all options take safe defaults 468: my $analyser = Email::Abuse::Investigator->new(); 469: 470: # With options 471: my $analyser = Email::Abuse::Investigator->new( 472: timeout => 15, 473: trusted_relays => ['203.0.113.0/24', '10.0.0.0/8'], 474: verbose => 0, 475: ); 476: 477: $analyser->parse_email($raw_rfc2822_text); 478: my $origin = $analyser->originating_ip(); 479: my @urls = $analyser->embedded_urls(); 480: my @domains = $analyser->mailto_domains(); 481: my $risk = $analyser->risk_assessment(); 482: my @contacts = $analyser->abuse_contacts(); 483: print $analyser->report(); 484: 485: =head3 Arguments 486: 487: All arguments are optional named parameters passed as a flat key-value list. 488: 489: =over 4 490: 491: =item C<timeout> (integer, default 10) 492: 493: Maximum seconds to wait for any single network operation. Set to 0 to 494: disable timeouts (not recommended for production use). 495: 496: =item C<trusted_relays> (arrayref of strings, default []) 497: 498: IP addresses or CIDR blocks to skip during Received: chain analysis. 499: Each element may be an exact IPv4 address (C<'192.0.2.1'>) or a CIDR 500: block (C<'192.0.2.0/24'>). 501: 502: =item C<verbose> (boolean, default 0) 503: 504: When true, diagnostic messages are written to STDERR. 505: 506: =back 507: 508: =head3 Returns 509: 510: A blessed C<Email::Abuse::Investigator> object. No network I/O is performed 511: during construction. 512: 513: =head3 Side Effects 514: 515: If C<CHI> is installed, a shared in-memory cache is initialised (or 516: re-used if a cache was already created by a prior call to C<new()>). 517: This cache persists for the lifetime of the process. 518: 519: =head3 Notes 520: 521: =over 4 522: 523: =item * 524: 525: Unknown option keys are silently ignored. 526: 527: =item * 528: 529: The object is not thread-safe. Use a separate object per thread. 530: 531: =item * 532: 533: WHOIS read timeouts use C<IO::Select> rather than C<alarm()>, so they 534: work correctly on Windows and in threaded Perl interpreters. 535: 536: =back 537: 538: =head3 API Specification 539: 540: =head4 Input 541: 542: { 543: timeout => { 544: type => 'integer', 545: optional => 1, 546: min => 0, 547: default => 10, 548: }, 549: trusted_relays => { 550: type => 'arrayref', 551: element_type => 'string', 552: optional => 1, 553: default => [], 554: }, 555: verbose => { 556: type => 'boolean', 557: optional => 1, 558: default => 0, 559: }, 560: } 561: 562: =head4 Output 563: 564: { 565: type => 'Email::Abuse::Investigator', 566: isa => 'Email::Abuse::Investigator', 567: } 568: 569: =head3 FORMAL SPECIFICATION 570: 571: -- Z notation (simplified) 572: new == [ 573: timeout : N; 574: trusted_relays : seq STRING; 575: verbose : BOOL; 576: _raw : STRING; 577: _headers : seq (STRING x STRING); 578: _origin? : IP_INFO | undefined; 579: _urls? : seq URL_INFO | undefined; 580: _risk? : RISK_INFO | undefined 581: ] 582: pre: timeout >= 0 583: post: self.timeout = params.timeout /\ self._raw = '' 584: 585: =cut 586: 587: # Class-level cross-message CHI cache (shared across all instances). 588: # Populated lazily on first call to new() when CHI is available. 589: my $_cache; 590: 591: sub new { ●592 → 619 → 628●592 → 619 → 0 592: my $class = shift; 593: 594: # Accept hash or hashref arguments uniformly 595: my $params = Params::Validate::Strict::validate_strict({ 596: args => Params::Get::get_params(undef, \@_) || {}, 597: schema => { 598: timeout => { 599: 'type' => 'integer', 600: 'optional' => 1, 601: 'min' => 0, 602: }, 603: trusted_relays => { 604: 'type' => 'arrayref', 605: 'element_type' => 'string', 606: 'optional' => 1, 607: }, 608: verbose => { 609: 'type' => 'boolean', 610: 'optional' => 1, 611: }, 612: }, 613: }); 614: 615: # Merge in any file-based configuration via Object::Configure 616: $params = Object::Configure::configure($class, $params); 617: 618: # Initialise the cross-message CHI cache on first construction

Mutants (Total: 1, Killed: 1, Survived: 0)

620: $_cache = CHI->new( 621: driver => 'Memory', 622: global => 1, 623: expires_in => $CACHE_TTL_SECS, 624: ); 625: } 626: 627: # Build and bless the object with default slot values ●[NOT COVERED] 628 → 628 → 0 628: return bless { 629: timeout => $DEFAULT_TIMEOUT, 630: trusted_relays => [], 631: verbose => 0, 632: _raw => '', 633: _headers => [], 634: _body_plain => '', 635: _body_html => '', 636: _received => [], 637: _origin => undef, 638: _urls => undef, # lazy-computed by embedded_urls() 639: _mailto_domains=> undef, # lazy-computed by mailto_domains() 640: _domain_info => {}, # per-message domain analysis cache 641: _sending_sw => [], # X-Mailer / X-PHP-Originating-Script etc. 642: _rcvd_tracking => [], # per-hop tracking IDs from Received: headers 643: %{$params}, # Override the defaults with Object:Configure and the values passed in 644: }, $class; 645: } 646: 647: # ----------------------------------------------------------------------- 648: # Public: parse 649: # ----------------------------------------------------------------------- 650: 651: =head2 parse_email( $text ) 652: 653: Feeds a raw RFC 2822 email message to the analyser and prepares it for 654: subsequent interrogation. This is the only method that must be called 655: before any other public method. 656: 657: If the same object is used for a second message, calling C<parse_email()> 658: again completely replaces all per-message state from the first message. 659: The cross-message CHI cache is B<not> flushed; IP and domain lookups 660: cached from prior messages are retained. 661: 662: =head3 Usage 663: 664: my $raw = do { local $/; <STDIN> }; 665: $analyser->parse_email($raw); 666: 667: # Scalar reference (avoids copying large messages) 668: $analyser->parse_email(\$raw); 669: 670: # Chained 671: my $analyser = Email::Abuse::Investigator->new()->parse_email($raw); 672: 673: =head3 Arguments 674: 675: =over 4 676: 677: =item C<$text> (scalar or scalar reference, required) 678: 679: Complete raw RFC 2822 email message, including all headers and the body. 680: Both LF-only and CRLF line endings are accepted. 681: 682: =back 683: 684: =head3 Returns 685: 686: The object itself (C<$self>), enabling method chaining. 687: 688: =head3 Side Effects 689: 690: Parses headers, decodes the body (quoted-printable, base64, multipart), 691: extracts sending-software fingerprints, and populates per-hop tracking 692: data. All previously computed lazy results are discarded. 693: 694: =head3 Notes 695: 696: =over 4 697: 698: =item * 699: 700: If C<$text> is empty or contains no header/body separator, all public 701: methods will return empty/safe values. 702: 703: =item * 704: 705: Decoding errors in base64 or quoted-printable payloads are silenced; raw 706: bytes are used in place of correct output to prevent exceptions. 707: 708: =back 709: 710: =head3 API Specification 711: 712: =head4 Input 713: 714: [ 715: { 716: type => 'scalar | scalarref', 717: }, 718: ] 719: 720: =head4 Output 721: 722: { 723: type => 'Email::Abuse::Investigator', 724: isa => 'Email::Abuse::Investigator', 725: } 726: 727: =head3 FORMAL SPECIFICATION 728: 729: -- Z notation 730: parse_email == [ 731: Delta Email::Abuse::Investigator; 732: text? : STRING | ref STRING 733: ] 734: pre: defined text? 735: post: self._raw = deref(text?) /\ 736: self._origin = undefined /\ 737: self._urls = undefined /\ 738: self._risk = undefined 739: 740: =cut 741: 742: # TODO: Allow a Mail::Message object to be passed in 743: sub parse_email { 744: my $self = shift; 745: 746: # Accept both positional string and named 'text' argument 747: my $args = Params::Get::get_params('text', \@_); 748: my $text = $args->{text}; 749: 750: # Dereference if a scalar reference was supplied 751: $text = $$text if ref $text eq 'SCALAR'; 752: 753: # Sanitise: strip control characters that could affect terminal output. 754: # Keep \t (tabs in headers), \n (line endings), \r (CRLF mail format). 755: $text =~ s/[^\x09\x0A\x0D\x20-\x7E\x80-\xFF]//g if defined $text; 756: 757: # Store the sanitised raw text for later reproduction in reports 758: $self->{_raw} = $text // ''; 759: 760: # Invalidate all per-message lazy caches 761: $self->{_origin} = undef; 762: $self->{_urls} = undef; 763: $self->{_mailto_domains} = undef; 764: $self->{_domain_info} = {}; 765: $self->{_risk} = undef; 766: $self->{_auth_results} = undef; 767: $self->{_sending_sw} = []; 768: $self->{_rcvd_tracking} = []; 769: 770: # Perform synchronous header/body parsing (no network I/O) 771: $self->_split_message($text) if defined $text && $text =~ /\S/; 772: return $self;

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_772_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_772_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

773: } 774: 775: # ----------------------------------------------------------------------- 776: # Public: originating host 777: # ----------------------------------------------------------------------- 778: 779: =head2 originating_ip() 780: 781: Identifies the IP address of the machine that originally injected the 782: message into the mail system by walking the C<Received:> chain, skipping 783: private/trusted hops, and enriching the first external hop with rDNS, 784: WHOIS/RDAP organisation name, abuse contact, and country code. 785: 786: Both IPv4 and IPv6 addresses are extracted and evaluated. 787: 788: The result is cached; subsequent calls return the same hashref without 789: repeating network I/O. 790: 791: =head3 Usage 792: 793: my $orig = $analyser->originating_ip(); 794: if (defined $orig) { 795: printf "Origin: %s (%s)\n", $orig->{ip}, $orig->{rdns}; 796: printf "Owner: %s\n", $orig->{org}; 797: } 798: 799: =head3 Arguments 800: 801: None. C<parse_email()> must have been called first. 802: 803: =head3 Returns 804: 805: A hashref with keys C<ip>, C<rdns>, C<org>, C<abuse>, C<confidence>, 806: C<note>, and C<country> (may be undef). Returns C<undef> if no suitable 807: originating IP can be determined. 808: 809: =head3 Side Effects 810: 811: On first call: one PTR lookup and one RDAP/WHOIS query. Results are cached 812: in the object and in the cross-message CHI cache (if available). 813: 814: =head3 Notes 815: 816: Only the first (oldest) external IP in the chain is reported. See 817: C<received_trail()> for the full chain. 818: 819: =head3 API Specification 820: 821: =head4 Input 822: 823: [] 824: 825: =head4 Output 826: 827: { 828: type => 'hashref | undef', 829: keys => { 830: ip => { type => 'scalar', regex => qr/[\d.:a-fA-F]/ }, 831: rdns => { type => 'scalar' }, 832: org => { type => 'scalar' }, 833: abuse => { type => 'scalar' }, 834: confidence => { type => 'scalar', regex => qr/^(?:high|medium|low)$/ }, 835: note => { type => 'scalar' }, 836: country => { type => 'scalar', optional => 1 }, 837: }, 838: } 839: 840: =head3 FORMAL SPECIFICATION 841: 842: -- Z notation 843: originating_ip == [ 844: Xi Email::Abuse::Investigator; 845: result! : IP_INFO | undefined 846: ] 847: pre: self._raw /= '' 848: post: result! = self._origin /\ 849: (result! /= undefined => result!.ip in EXTERNAL_IPS) 850: 851: =cut 852: 853: sub originating_ip { 854: my $self = $_[0]; 855: 856: # Return the cached result if we already have it 857: $self->{_origin} //= $self->_find_origin();

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_858_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_858_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

859: } 860: 861: # ----------------------------------------------------------------------- 862: # Public: HTTP/HTTPS URLs 863: # ----------------------------------------------------------------------- 864: 865: =head2 embedded_urls() 866: 867: Extracts every HTTP and HTTPS URL from the message body and enriches each 868: one with the hosting IP address, network organisation name, abuse contact, 869: and country code. Both IPv4 and IPv6 host addresses are supported. 870: 871: URL extraction runs across both plain-text and HTML body parts. DNS 872: lookups for each unique hostname are optionally parallelised via 873: C<AnyEvent::DNS> if that module is installed. 874: 875: The result is cached; subsequent calls return the same list without 876: repeating network I/O. 877: 878: =head3 Usage 879: 880: my @urls = $analyser->embedded_urls(); 881: for my $u (@urls) { 882: printf "URL: %s host: %s org: %s\n", 883: $u->{url}, $u->{host}, $u->{org}; 884: } 885: 886: =head3 Arguments 887: 888: None. C<parse_email()> must have been called first. 889: 890: =head3 Returns 891: 892: A list of hashrefs, one per unique URL, in first-seen order. Returns an 893: empty list if no HTTP/HTTPS URLs are present. Each hashref has keys 894: C<url>, C<host>, C<ip>, C<org>, C<abuse>, C<country>. 895: 896: =head3 Side Effects 897: 898: Per unique hostname: one A/AAAA lookup and one RDAP/WHOIS query. Results 899: are cached in the object and in the cross-message CHI cache. 900: 901: =head3 Notes 902: 903: Only C<http://> and C<https://> URLs are extracted. URL shortener hosts 904: are included in the returned list (they are flagged by C<risk_assessment()>). 905: 906: =head3 API Specification 907: 908: =head4 Input 909: 910: [] 911: 912: =head4 Output 913: 914: ( 915: { 916: type => 'hashref', 917: keys => { 918: url => { type => 'scalar', regex => qr{^https?://}i }, 919: host => { type => 'scalar' }, 920: ip => { type => 'scalar' }, 921: org => { type => 'scalar' }, 922: abuse => { type => 'scalar' }, 923: country => { type => 'scalar', optional => 1 }, 924: }, 925: }, 926: ... 927: ) 928: 929: =head3 FORMAL SPECIFICATION 930: 931: -- Z notation 932: embedded_urls == [ 933: Xi Email::Abuse::Investigator; 934: result! : seq URL_INFO 935: ] 936: pre: self._raw /= '' 937: post: result! = self._urls /\ 938: forall u : result! @ u.url =~ m{^https?://}i 939: 940: =cut 941: 942: sub embedded_urls { 943: my $self = $_[0]; 944: 945: $self->{_urls} //= $self->_extract_and_resolve_urls(); 946: return @{ $self->{_urls} }; 947: } 948: 949: # ----------------------------------------------------------------------- 950: # Public: mailto / reply-to / from domains 951: # ----------------------------------------------------------------------- 952: 953: =head2 mailto_domains() 954: 955: Identifies every domain associated with the message as a contact, reply, 956: or delivery address, then runs a full intelligence pipeline on each one 957: (A record, MX, NS, WHOIS) to determine hosting and registration details. 958: 959: The result is cached; subsequent calls return the same list without 960: repeating network I/O. 961: 962: =head3 Usage 963: 964: my @domains = $analyser->mailto_domains(); 965: for my $d (@domains) { 966: printf "Domain: %s registrar: %s\n", 967: $d->{domain}, $d->{registrar} // 'unknown'; 968: } 969: 970: =head3 Arguments 971: 972: None. C<parse_email()> must have been called first. 973: 974: =head3 Returns 975: 976: A list of hashrefs, one per unique domain. See the main POD for the full 977: set of possible keys. Returns an empty list if no qualifying domains are 978: found. 979: 980: =head3 Side Effects 981: 982: Per unique domain: up to three A lookups, one MX lookup, one NS lookup, 983: and two WHOIS queries. Results are cached in the object and in the 984: cross-message CHI cache. 985: 986: =head3 Notes 987: 988: MX and NS lookups require C<Net::DNS>. Without it those keys are absent 989: from every returned hashref. 990: 991: =head3 API Specification 992: 993: =head4 Input 994: 995: [] 996: 997: =head4 Output 998: 999: ( 1000: { 1001: type => 'hashref', 1002: keys => { 1003: domain => { type => 'scalar' }, 1004: source => { type => 'scalar' }, 1005: # All other keys optional -- see main POD 1006: }, 1007: }, 1008: ... 1009: ) 1010: 1011: =head3 FORMAL SPECIFICATION 1012: 1013: -- Z notation 1014: mailto_domains == [ 1015: Xi Email::Abuse::Investigator; 1016: result! : seq DOMAIN_INFO 1017: ] 1018: pre: self._raw /= '' 1019: post: result! = self._mailto_domains /\ 1020: forall d : result! @ d.domain =~ /\.[a-zA-Z]{2,}$/ 1021: 1022: =cut 1023: 1024: sub mailto_domains { 1025: my $self = $_[0]; 1026: 1027: $self->{_mailto_domains} //= $self->_extract_and_analyse_domains(); 1028: return @{ $self->{_mailto_domains} }; 1029: } 1030: 1031: =head2 all_domains() 1032: 1033: Returns the deduplicated union of every registrable domain seen anywhere 1034: in the message -- URL hosts from C<embedded_urls()> and contact domains 1035: from C<mailto_domains()> -- normalised to eTLD+1 form. 1036: 1037: Triggers C<embedded_urls()> and C<mailto_domains()> lazily. 1038: 1039: =head3 Usage 1040: 1041: my @domains = $analyser->all_domains(); 1042: print "$_\n" for @domains; 1043: 1044: =head3 Arguments 1045: 1046: None. 1047: 1048: =head3 Returns 1049: 1050: A list of plain strings (registrable domain names), lower-cased, no 1051: duplicates, in first-seen order. 1052: 1053: =head3 Side Effects 1054: 1055: Triggers C<embedded_urls()> and C<mailto_domains()> if not already cached. 1056: 1057: =head3 Notes 1058: 1059: Normalisation to eTLD+1 uses C<Domain::PublicSuffix> if installed, falling 1060: back to a built-in heuristic otherwise. 1061: 1062: =head3 API Specification 1063: 1064: =head4 Input 1065: 1066: [] 1067: 1068: =head4 Output 1069: 1070: ( 1071: { type => 'scalar', regex => qr/^[a-z0-9][a-z0-9.-]*\.[a-z]{2,}$/ }, 1072: ... 1073: ) 1074: 1075: =head3 FORMAL SPECIFICATION 1076: 1077: -- Z notation 1078: all_domains == [ 1079: Xi Email::Abuse::Investigator; 1080: result! : seq STRING 1081: ] 1082: post: result! = deduplicate( 1083: map(_registrable, url_hosts union mailto_domains) 1084: ) 1085: 1086: =cut 1087: 1088: sub all_domains { ●1089 → 1093 → 1099●1089 → 1093 → 0 1089: my $self = $_[0]; 1090: my (%seen, @out); 1091: 1092: # Collect registrable domains from URL hosts first 1093: for my $u ($self->embedded_urls()) { 1094: my $dom = _registrable($u->{host}); 1095: push @out, $dom if $dom && !$seen{$dom}++; 1096: } 1097: 1098: # Then from contact domains (normalise subdomains to registrable parent) ●1099 → 1099 → 1103●1099 → 1099 → 0 1099: for my $d ($self->mailto_domains()) { 1100: my $dom = _registrable($d->{domain}) // $d->{domain}; 1101: push @out, $dom if $dom && !$seen{$dom}++; 1102: }

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_1103_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_1103_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

1104: } 1105: 1106: =head2 unresolved_contacts() 1107: 1108: Returns a list of domains and URL hosts found in the message for which no 1109: abuse contact could be determined. Useful for surfacing parties that may 1110: warrant manual investigation. 1111: 1112: =head3 Usage 1113: 1114: my @unresolved = $analyser->unresolved_contacts(); 1115: for my $u (@unresolved) { 1116: printf "Unresolved: %s (%s) via %s\n", 1117: $u->{domain}, $u->{type}, $u->{source}; 1118: } 1119: 1120: =head3 Arguments 1121: 1122: None. 1123: 1124: =head3 Returns 1125: 1126: A list of hashrefs, each with keys C<domain>, C<type> (C<'url_host'> or 1127: C<'domain'>), and C<source> (where the domain was found). 1128: 1129: =head3 Side Effects 1130: 1131: Triggers C<embedded_urls()>, C<mailto_domains()>, C<abuse_contacts()>, 1132: and C<form_contacts()> if not already cached. 1133: 1134: =head3 Notes 1135: 1136: Domains sourced only from spoofable sending headers (C<From:>, 1137: C<Return-Path:>, C<Sender:>) are excluded. 1138: 1139: =head3 API Specification 1140: 1141: =head4 Input 1142: 1143: [] 1144: 1145: =head4 Output 1146: 1147: ( 1148: { 1149: type => 'hashref', 1150: keys => { 1151: domain => { type => 'scalar' }, 1152: type => { type => 'scalar', regex => qr/^(?:url_host|domain)$/ }, 1153: source => { type => 'scalar' }, 1154: }, 1155: }, 1156: ... 1157: ) 1158: 1159: =head3 FORMAL SPECIFICATION 1160: 1161: -- Z notation 1162: unresolved_contacts == [ 1163: Xi Email::Abuse::Investigator; 1164: result! : seq UNRESOLVED_INFO 1165: ] 1166: post: forall u : result! @ 1167: u.domain not_in covered_domains(abuse_contacts, form_contacts) 1168: 1169: =cut 1170: 1171: sub unresolved_contacts { ●1172 → 1176 → 1186●1172 → 1176 → 0 1172: my $self = $_[0]; 1173: 1174: # Build a set of domains already covered by email or form contacts 1175: my %covered; 1176: for my $c ($self->abuse_contacts(), $self->form_contacts()) { 1177: my $dom = $c->{form_domain};

Mutants (Total: 1, Killed: 1, Survived: 0)

1179: # Extract domain from abuse email address 1180: ($dom) = ($c->{address} // '') =~ /\@([\w.-]+)/; 1181: } 1182: $covered{lc $dom}++ if $dom; 1183: } 1184: 1185: # Also mark URL hosts that already have a resolved abuse address ●1186 → 1186 → 1191●1186 → 1186 → 0 1186: for my $u ($self->embedded_urls()) { 1187: (my $bare = lc $u->{host}) =~ s/^www\.//; 1188: $covered{$bare}++ if $u->{abuse} && $u->{abuse} ne '(unknown)'; 1189: } 1190: ●1191 → 1194 → 1206●1191 → 1194 → 0 1191: my (@out, %seen); 1192: 1193: # Check URL hosts first 1194: for my $u ($self->embedded_urls()) { 1195: (my $bare = lc $u->{host}) =~ s/^www\.//; 1196: next if $covered{$bare}; 1197: next if $seen{"url:$bare"}++; 1198: push @out, { 1199: domain => $u->{host}, 1200: type => 'url_host', 1201: source => 'URL in body', 1202: }; 1203: } 1204: 1205: # Then check contact domains, skipping spoofable-header-only sources ●1206 → 1206 → 1219●1206 → 1206 → 0 1206: for my $d ($self->mailto_domains()) { 1207: my $dom = $d->{domain}; 1208: my $source = $d->{source} // ''; 1209: next if $source =~ /^(?:From:|Return-Path:|Sender:) header$/; 1210: next if $covered{lc $dom}; 1211: next if $seen{"dom:$dom"}++; 1212: push @out, { 1213: domain => $dom, 1214: type => 'domain', 1215: source => $source, 1216: }; 1217: } 1218: ●1219 → 1219 → 0 1219: return @out;

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_1219_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_1219_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

1220: } 1221: 1222: # ----------------------------------------------------------------------- 1223: # Public: sending software fingerprint 1224: # ----------------------------------------------------------------------- 1225: 1226: =head2 sending_software() 1227: 1228: Returns information extracted from headers that identify the software or 1229: server-side infrastructure used to compose or inject the message. Headers 1230: such as C<X-PHP-Originating-Script> reveal the exact PHP script and Unix 1231: account responsible on shared-hosting platforms. 1232: 1233: Data is extracted during C<parse_email()> with no network I/O. 1234: 1235: =head3 Usage 1236: 1237: my @sw = $analyser->sending_software(); 1238: for my $s (@sw) { 1239: printf "%-30s : %s\n", $s->{header}, $s->{value}; 1240: } 1241: 1242: =head3 Arguments 1243: 1244: None. C<parse_email()> must have been called first. 1245: 1246: =head3 Returns 1247: 1248: A list of hashrefs in alphabetical header-name order. Returns an empty 1249: list if none of the watched headers are present. Each hashref has keys 1250: C<header>, C<value>, and C<note>. 1251: 1252: =head3 Side Effects 1253: 1254: None. Data is pre-collected during C<parse_email()>. 1255: 1256: =head3 Notes 1257: 1258: Header names are lower-cased. Header values are stored verbatim. 1259: 1260: =head3 API Specification 1261: 1262: =head4 Input 1263: 1264: [] 1265: 1266: =head4 Output 1267: 1268: ( 1269: { 1270: type => 'hashref', 1271: keys => { 1272: header => { type => 'scalar' }, 1273: value => { type => 'scalar' }, 1274: note => { type => 'scalar' }, 1275: }, 1276: }, 1277: ... 1278: ) 1279: 1280: =head3 FORMAL SPECIFICATION 1281: 1282: -- Z notation 1283: sending_software == [ 1284: Xi Email::Abuse::Investigator; 1285: result! : seq SW_INFO 1286: ] 1287: post: result! = self._sending_sw 1288: 1289: =cut 1290: 1291: sub sending_software { 1292: my $self = $_[0]; 1293: 1294: return @{ $self->{_sending_sw} }; 1295: } 1296: 1297: # ----------------------------------------------------------------------- 1298: # Public: per-hop tracking IDs 1299: # ----------------------------------------------------------------------- 1300: 1301: =head2 received_trail() 1302: 1303: Returns per-hop tracking data extracted from the C<Received:> header chain: 1304: the IP address, envelope recipient address, and server session ID for each 1305: relay. ISP postmasters use these identifiers to locate the SMTP session in 1306: their logs. 1307: 1308: =head3 Usage 1309: 1310: my @trail = $analyser->received_trail(); 1311: for my $hop (@trail) { 1312: printf "IP: %s ID: %s\n", 1313: $hop->{ip} // '?', $hop->{id} // '?'; 1314: } 1315: 1316: =head3 Arguments 1317: 1318: None. C<parse_email()> must have been called first. 1319: 1320: =head3 Returns 1321: 1322: A list of hashrefs in oldest-first order. Returns an empty list if no 1323: C<Received:> headers are present or none yielded extractable data. Each 1324: hashref has keys C<received>, C<ip> (may be undef), C<for> (may be undef), 1325: C<id> (may be undef). 1326: 1327: =head3 Side Effects 1328: 1329: None. Data is pre-collected during C<parse_email()>. 1330: 1331: =head3 Notes 1332: 1333: Private IPs are NOT filtered here; all IPs including RFC 1918 addresses 1334: are returned as found. Filtering is applied only by C<originating_ip()>. 1335: 1336: =head3 API Specification 1337: 1338: =head4 Input 1339: 1340: [] 1341: 1342: =head4 Output 1343: 1344: ( 1345: { 1346: type => 'hashref', 1347: keys => { 1348: received => { type => 'scalar' }, 1349: ip => { type => 'scalar', optional => 1 }, 1350: for => { type => 'scalar', optional => 1 }, 1351: id => { type => 'scalar', optional => 1 }, 1352: }, 1353: }, 1354: ... 1355: ) 1356: 1357: =head3 FORMAL SPECIFICATION 1358: 1359: -- Z notation 1360: received_trail == [ 1361: Xi Email::Abuse::Investigator; 1362: result! : seq HOP_INFO 1363: ] 1364: post: result! = self._rcvd_tracking 1365: 1366: =cut 1367: 1368: sub received_trail { 1369: my $self = $_[0]; 1370: 1371: return @{ $self->{_rcvd_tracking} }; 1372: } 1373: 1374: # ----------------------------------------------------------------------- 1375: # Public: risk assessment 1376: # ----------------------------------------------------------------------- 1377: 1378: =head2 risk_assessment() 1379: 1380: Evaluates the message against heuristic checks and returns an overall risk 1381: level, a weighted numeric score, and a list of every specific red flag. 1382: 1383: The assessment covers five categories: originating IP, email authentication, 1384: Date: header validity, identity/header consistency, and URL/domain properties. 1385: 1386: The result is cached; subsequent calls return the same hashref without 1387: repeating any analysis. 1388: 1389: =head3 Usage 1390: 1391: my $risk = $analyser->risk_assessment(); 1392: printf "Risk: %s (score: %d)\n", $risk->{level}, $risk->{score}; 1393: for my $f (@{ $risk->{flags} }) { 1394: printf " [%s] %s\n", $f->{severity}, $f->{detail}; 1395: } 1396: 1397: =head3 Arguments 1398: 1399: None. C<parse_email()> must have been called first. 1400: 1401: =head3 Returns 1402: 1403: A hashref with keys C<level> (HIGH/MEDIUM/LOW/INFO), C<score> (integer), 1404: and C<flags> (arrayref of hashrefs with C<severity>, C<flag>, C<detail>). 1405: 1406: =head3 Side Effects 1407: 1408: Triggers C<originating_ip()>, C<embedded_urls()>, and C<mailto_domains()> 1409: if not already cached. 1410: 1411: =head3 Notes 1412: 1413: Scores: HIGH >= 9, MEDIUM >= 5, LOW >= 2, INFO < 2. 1414: Flag weights: HIGH=3, MEDIUM=2, LOW=1, INFO=0. 1415: 1416: =head3 API Specification 1417: 1418: =head4 Input 1419: 1420: [] 1421: 1422: =head4 Output 1423: 1424: { 1425: type => 'hashref', 1426: keys => { 1427: level => { type => 'scalar', regex => qr/^(?:HIGH|MEDIUM|LOW|INFO)$/ }, 1428: score => { type => 'scalar', regex => qr/^\d+$/ }, 1429: flags => { type => 'arrayref' }, 1430: }, 1431: } 1432: 1433: =head3 FORMAL SPECIFICATION 1434: 1435: -- Z notation 1436: risk_assessment == [ 1437: Xi Email::Abuse::Investigator; 1438: result! : RISK_INFO 1439: ] 1440: post: result!.score = sum({ w(f.severity) | f in result!.flags }) /\ 1441: result!.level = classify(result!.score) 1442: where: 1443: w(HIGH) = 3; w(MEDIUM) = 2; w(LOW) = 1; w(INFO) = 0 1444: classify(s) = HIGH if s >= 9 1445: | MEDIUM if s >= 5 1446: | LOW if s >= 2 1447: | INFO otherwise 1448: 1449: =cut 1450: 1451: sub risk_assessment { ●1452 → 1468 → 1501●1452 → 1468 → 0 1452: my $self = $_[0]; 1453:

Mutants (Total: 2, Killed: 2, Survived: 0)

1455: 1456: my (@flags, $score); 1457: $score = 0; 1458: 1459: # Closure to record a flag and accumulate its weight 1460: my $flag = sub { 1461: my ($severity, $name, $detail) = @_; 1462: $score += $FLAG_WEIGHT{$severity} // 1; 1463: push @flags, { severity => $severity, flag => $name, detail => $detail }; 1464: }; 1465: 1466: # --- Group 1: Originating IP checks --- 1467: my $orig = $self->originating_ip(); 1468: if ($orig) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1468_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1469: # Residential / broadband rDNS patterns suggest a compromised host

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1470_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1471: \d+[-_.]\d+[-_.]\d+[-_.]\d+ # dotted-quad in rDNS 1472: | (?:dsl|adsl|cable|broad|dial|dynamic|dhcp|ppp| 1473: residential|cust|home|pool|client|user| 1474: static\d|host\d) 1475: /xi) { 1476: $flag->('HIGH', 'residential_sending_ip', 1477: "Sending IP $orig->{ip} rDNS '$orig->{rdns}' looks like a broadband/residential line, not a legitimate mail server"); 1478: } 1479: 1480: # Absence of rDNS is a strong spam indicator

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1481_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1482: $flag->('HIGH', 'no_reverse_dns', 1483: "Sending IP $orig->{ip} has no reverse DNS -- legitimate mail servers always have rDNS"); 1484: } 1485: 1486: # Low-confidence origin means the IP came from an unverifiable header

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1487_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1488: $flag->('MEDIUM', 'low_confidence_origin', 1489: "Originating IP taken from unverified header ($orig->{note})"); 1490: } 1491: 1492: # Statistically high-volume spam countries (informational only)

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1493_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1494: $flag->('INFO', 'high_spam_country', 1495: 'Sending IP is in ' . _country_name($orig->{country}) . 1496: " ($orig->{country}) -- statistically high spam volume country"); 1497: } 1498: } 1499: 1500: # --- Group 2: Email authentication checks --- ●1501 → 1502 → 1514●1501 → 1502 → 0 1501: my $auth = $self->_parse_auth_results_cached();

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1502_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1503_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1504: $flag->('HIGH', 'spf_fail', 1505: "SPF result: $auth->{spf} -- sending IP not authorised by domain's SPF record"); 1506: } elsif ($auth->{spf} =~ /^softfail/i) { 1507: $flag->('MEDIUM', 'spf_softfail', 1508: "SPF result: softfail (~all) -- sending IP not explicitly authorised"); 1509: } elsif ($auth->{spf} !~ /^pass/i) { 1510: $flag->('HIGH', 'spf_fail', 1511: "SPF result: $auth->{spf} -- sending IP not authorised"); 1512: } 1513: }

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1514_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1515: $flag->('HIGH', 'dkim_fail', 1516: "DKIM result: $auth->{dkim} -- message signature invalid or absent"); 1517: }

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1518_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1519: $flag->('HIGH', 'dmarc_fail', "DMARC result: $auth->{dmarc}"); 1520: } 1521: 1522: # DKIM signing domain vs From: domain mismatch check

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1523_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1524: my ($from_domain) = ($self->_header_value('from') // '') =~ /\@([\w.-]+)/;

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1525_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1526: my $reg_dkim = _registrable($auth->{dkim_domain}) // $auth->{dkim_domain}; 1527: my $reg_from = _registrable(lc $from_domain) // lc $from_domain;

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1528_4: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1529: # Passing DKIM with a different domain is normal for ESPs

Mutants (Total: 1, Killed: 1, Survived: 0)

1531: $flag->('INFO', 'dkim_domain_mismatch', 1532: "DKIM signed by '$auth->{dkim_domain}' but From: domain is '$from_domain'" 1533: . ' -- message sent via third-party sender (normal for bulk/ESP mail)'); 1534: } else { 1535: # Failing DKIM plus mismatched domain is more suspicious 1536: $flag->('MEDIUM', 'dkim_domain_mismatch', 1537: "DKIM signed by '$auth->{dkim_domain}' but From: domain is '$from_domain'" 1538: . ' and DKIM did not pass -- possible impersonation'); 1539: } 1540: } 1541: } 1542: } 1543: 1544: # --- Group 3: Date: header checks --- ●1545 → 1546 → 1579●1545 → 1546 → 0 1545: my $date_raw = $self->_header_value('date'); 1546: if (!$date_raw || $date_raw !~ /\S/) {

Mutants (Total: 1, Killed: 1, Survived: 0)

1547: $flag->('MEDIUM', 'missing_date', 1548: 'No Date: header -- violates RFC 5322; common in spam'); 1549: } else { 1550: # Check for an implausible timezone offset (outside real-world bounds) 1551: if ($date_raw =~ /([+-])(\d{2})(\d{2})\s*$/) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1551_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1552: my ($sign, $hh, $mm) = ($1, $2, $3); 1553: my $offset_mins = $hh * 60 + $mm;

Mutants (Total: 3, Killed: 0, Survived: 3)

NUM_BOUNDARY_1554_26_>: Numeric boundary flip >= to >
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1554_26_<: Numeric boundary flip >= to <
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1554_26_<=: Numeric boundary flip >= to <=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```

Mutants (Total: 3, Killed: 0, Survived: 3)

NUM_BOUNDARY_1555_38_<: Numeric boundary flip > to <
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1555_38_>=: Numeric boundary flip > to >=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1555_38_<=: Numeric boundary flip > to <=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```

Mutants (Total: 3, Killed: 0, Survived: 3)

NUM_BOUNDARY_1556_38_<: Numeric boundary flip > to <
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1556_38_>=: Numeric boundary flip > to >=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1556_38_<=: Numeric boundary flip > to <=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```

Mutants (Total: 1, Killed: 1, Survived: 0)

1558: $flag->('MEDIUM', 'implausible_timezone', 1559: "Date: '$date_raw' contains an implausible timezone offset " 1560: . "($sign$hh$mm) -- header is likely forged"); 1561: } 1562: } 1563: 1564: # Check for dates more than DATE_SKEW_DAYS outside the analysis window 1565: my $date_epoch = _parse_rfc2822_date($date_raw); 1566: if (defined $date_epoch) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1566_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1567: my $delta = time() - $date_epoch;

Mutants (Total: 4, Killed: 0, Survived: 4)

NUM_BOUNDARY_1568_15_<: Numeric boundary flip > to <
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1568_15_>=: Numeric boundary flip > to >=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1568_15_<=: Numeric boundary flip > to <=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
COND_INV_1568_4: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1569: $flag->('LOW', 'suspicious_date', 1570: "Date: '$date_raw' is more than $DATE_SKEW_DAYS days in the past");

Mutants (Total: 3, Killed: 0, Survived: 3)

NUM_BOUNDARY_1571_20_>: Numeric boundary flip < to >
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1571_20_<=: Numeric boundary flip < to <=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1571_20_>=: Numeric boundary flip < to >=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```

1572: $flag->('LOW', 'suspicious_date', 1573: "Date: '$date_raw' is more than $DATE_SKEW_DAYS days in the future"); 1574: } 1575: } 1576: } 1577: 1578: # --- Group 4: Header identity checks --- ●1579 → 1583 → 1599●1579 → 1583 → 0 1579: my $from_raw = $self->_header_value('from') // ''; 1580: my $from_decoded = $self->_decode_mime_words($from_raw); 1581: 1582: # Display-name domain spoofing: "PayPal paypal.com" <phish@evil.example>

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1583_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1584: my ($display, $addr) = ($1, $2); 1585: while ($display =~ /\b([\w-]+\.(?:com|net|org|io|co|uk|au|gov|edu))\b/gi) { 1586: my $disp_domain = lc $1; 1587: my ($addr_domain) = $addr =~ /\@([\w.-]+)/; 1588: $addr_domain = lc($addr_domain // ''); 1589: my $reg_disp = _registrable($disp_domain); 1590: my $reg_addr = _registrable($addr_domain);

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1591_4: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1592: $flag->('HIGH', 'display_name_domain_spoof', 1593: "From: display name mentions '$disp_domain' but actual address is <$addr>"); 1594: } 1595: } 1596: } 1597: 1598: # Free webmail sender flag (no corporate infrastructure)

Mutants (Total: 1, Killed: 1, Survived: 0)

1600: || $from_raw =~ /\@mail\.ru(?:[\s>]|$)/i) { 1601: $flag->('MEDIUM', 'free_webmail_sender', 1602: "Message sent from free webmail address ($from_raw)"); 1603: } 1604: 1605: # Reply-To differs from From: -- replies harvested by different address ●1606 → 1607 → 1617●1606 → 1607 → 0 1606: my $reply_to = $self->_header_value('reply-to'); 1607: if ($reply_to) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1607_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1608: my ($from_addr) = $from_raw =~ /([\w.+%-]+\@[\w.-]+)/; 1609: my ($reply_addr) = $reply_to =~ /([\w.+%-]+\@[\w.-]+)/;

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1610_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1611: $flag->('MEDIUM', 'reply_to_differs_from_from', 1612: "Reply-To ($reply_addr) differs from From: ($from_addr)"); 1613: } 1614: } 1615: 1616: # Undisclosed or absent To: header ●1617 → 1618 → 1624●1617 → 1618 → 0 1617: my $to = $self->_header_value('to') // '';

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1618_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1619: $flag->('MEDIUM', 'undisclosed_recipients', 1620: "To: header is '$to' -- message was bulk-sent with hidden recipient list"); 1621: } 1622: 1623: # MIME-encoded Subject (potential filter evasion) ●1624 → 1625 → 1632●1624 → 1625 → 0 1624: my $subj_raw = $self->_header_value('subject') // '';

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1625_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1626: $flag->('LOW', 'encoded_subject', 1627: "Subject line is MIME-encoded: '$subj_raw' (decoded: '" 1628: . $self->_decode_mime_words($subj_raw) . "')"); 1629: } 1630: 1631: # --- Group 5: URL and domain checks --- ●1632 → 1633 → 1653●1632 → 1633 → 0 1632: my (%shortener_seen, %url_host_seen); 1633: for my $u ($self->embedded_urls()) { 1634: # Skip trusted infrastructure -- these are not spam indicators 1635: my $bare = lc $u->{host}; 1636: $bare =~ s/^www\.//; 1637: next if $self->{trusted_domains}->{$bare}; 1638: next if $TRUSTED_DOMAINS{$bare}; 1639: 1640: # URL shortener hides real destination

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1641_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1642: $flag->('MEDIUM', 'url_shortener', 1643: "$u->{host} is a URL shortener -- the real destination is hidden"); 1644: } 1645: # Plain HTTP provides no encryption

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1646_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1647: $flag->('LOW', 'http_not_https', 1648: "$u->{host} linked over plain HTTP -- no encryption"); 1649: } 1650: } 1651: 1652: # Domain-level checks against contact/reply domains ●1653 → 1653 → 1689●1653 → 1653 → 0 1653: for my $d ($self->mailto_domains()) { 1654: # Recently registered domain is a common phishing indicator

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1655_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1656: $flag->('HIGH', 'recently_registered_domain', 1657: "$d->{domain} was registered $d->{registered} (less than ${\$RECENT_REG_DAYS} days ago)"); 1658: } 1659: 1660: # Domain expiry checks

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1661_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

Mutants (Total: 1, Killed: 1, Survived: 0)

1663: my $now = time(); 1664: my $remaining = $exp - $now; 1665: if ($remaining > 0 && $remaining < $EXPIRY_WARN_DAYS * $SECS_PER_DAY) {

Mutants (Total: 7, Killed: 0, Survived: 7)

NUM_BOUNDARY_1665_20_<: Numeric boundary flip > to <
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1665_20_>=: Numeric boundary flip > to >=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1665_20_<=: Numeric boundary flip > to <=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1665_38_>: Numeric boundary flip < to >
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1665_38_<=: Numeric boundary flip < to <=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1665_38_>=: Numeric boundary flip < to >=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
COND_INV_1665_5: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1666: $flag->('HIGH', 'domain_expires_soon', 1667: "$d->{domain} expires $d->{expires} -- may be a throwaway domain");

Mutants (Total: 3, Killed: 0, Survived: 3)

NUM_BOUNDARY_1668_25_<: Numeric boundary flip <= to <
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1668_25_>: Numeric boundary flip <= to >
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1668_25_>=: Numeric boundary flip <= to >=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```

1669: $flag->('HIGH', 'domain_expired', 1670: "$d->{domain} expired $d->{expires} -- domain has lapsed"); 1671: } 1672: } 1673: } 1674: 1675: # Lookalike domain check (brand name in a non-brand domain) 1676: for my $brand (qw(paypal apple google amazon microsoft netflix ebay 1677: instagram facebook twitter linkedin bankofamerica 1678: wellsfargo chase barclays hsbc lloyds santander)) {

Mutants (Total: 1, Killed: 1, Survived: 0)

1680: $d->{domain} !~ /^\Q$brand\E\.(?:com|co\.uk|net|org)$/) { 1681: $flag->('HIGH', 'lookalike_domain', 1682: "$d->{domain} contains brand name '$brand' but is not the real domain -- possible phishing"); 1683: last; 1684: } 1685: } 1686: } 1687: 1688: # Determine overall risk level from accumulated score ●1689 → 1695 → 0 1689: my $level = $score >= $SCORE_HIGH ? 'HIGH'

Mutants (Total: 3, Killed: 0, Survived: 3)

NUM_BOUNDARY_1689_21_>: Numeric boundary flip >= to >
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1689_21_<: Numeric boundary flip >= to <
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1689_21_<=: Numeric boundary flip >= to <=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```

Mutants (Total: 3, Killed: 0, Survived: 3)

NUM_BOUNDARY_1690_21_>: Numeric boundary flip >= to >
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1690_21_<: Numeric boundary flip >= to <
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1690_21_<=: Numeric boundary flip >= to <=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```

Mutants (Total: 3, Killed: 0, Survived: 3)

NUM_BOUNDARY_1691_21_>: Numeric boundary flip >= to >
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1691_21_<: Numeric boundary flip >= to <
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1691_21_<=: Numeric boundary flip >= to <=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```

1692: : 'INFO'; 1693: 1694: $self->{_risk} = { level => $level, score => $score, flags => \@flags };

Mutants (Total: 2, Killed: 2, Survived: 0)

1696: } 1697: 1698: # ----------------------------------------------------------------------- 1699: # Public: abuse report text 1700: # ----------------------------------------------------------------------- 1701: 1702: =head2 abuse_report_text() 1703: 1704: Produces a compact, plain-text string suitable for sending as the body of 1705: an abuse report email. It summarises risk level, red flags, originating IP, 1706: abuse contacts, and original message headers. The message body is omitted 1707: to keep the report concise. 1708: 1709: Use C<abuse_contacts()> to get the recipient addresses and this method for 1710: the body text. 1711: 1712: =head3 Usage 1713: 1714: my $text = $analyser->abuse_report_text(); 1715: my @contacts = $analyser->abuse_contacts(); 1716: for my $c (@contacts) { 1717: send_email(to => $c->{address}, body => $text); 1718: } 1719: 1720: =head3 Arguments 1721: 1722: None. C<parse_email()> must have been called first. 1723: 1724: =head3 Returns 1725: 1726: A plain scalar string, newline-terminated, Unix line endings. Never empty 1727: or undef. 1728: 1729: =head3 Side Effects 1730: 1731: Calls C<risk_assessment()>, C<originating_ip()>, and C<abuse_contacts()> 1732: if not already cached. 1733: 1734: =head3 Notes 1735: 1736: Output text is sanitised: control characters that could affect terminal or 1737: HTML rendering are stripped from all user-derived content before inclusion. 1738: 1739: =head3 API Specification 1740: 1741: =head4 Input 1742: 1743: [] 1744: 1745: =head4 Output 1746: 1747: { type => 'scalar' } 1748: 1749: =head3 FORMAL SPECIFICATION 1750: 1751: -- Z notation 1752: abuse_report_text == [ 1753: Xi Email::Abuse::Investigator; 1754: result! : STRING 1755: ] 1756: post: result! /= '' /\ result! ends_with '\n' 1757: 1758: =cut 1759: 1760: sub abuse_report_text { ●1761 → 1773 → 1782●1761 → 1773 → 0 1761: my $self = $_[0]; 1762: my @out; 1763: 1764: push @out, 'This is an automated abuse report generated by Email::Abuse::Investigator.', 1765: 'Please investigate the following spam/phishing message.', 1766: ''; 1767: 1768: my $risk = $self->risk_assessment(); 1769: push @out, "RISK LEVEL: $risk->{level} (score: $risk->{score})", 1770: ''; 1771: 1772: # List each red flag with its severity prefix 1773: if (@{ $risk->{flags} }) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1773_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1774: push @out, 'RED FLAGS IDENTIFIED:'; 1775: for my $f (@{ $risk->{flags} }) { 1776: push @out, " [$f->{severity}] " . _sanitise_output($f->{detail}); 1777: } 1778: push @out, ''; 1779: } 1780: 1781: # Originating IP summary block ●1782 → 1783 → 1790●1782 → 1783 → 0 1782: my $orig = $self->originating_ip();

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1783_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1784: push @out, 'ORIGINATING IP: ' . _sanitise_output("$orig->{ip} ($orig->{rdns})"), 1785: 'NETWORK OWNER: ' . _sanitise_output($orig->{org}), 1786: ''; 1787: } 1788: 1789: # Email abuse contacts ●1790 → 1791 → 1798●1790 → 1791 → 0 1790: my @contacts = $self->abuse_contacts();

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1791_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1792: push @out, 'ABUSE CONTACTS:'; 1793: push @out, ' ' . _sanitise_output("$_->{address} ($_->{role})") for @contacts; 1794: push @out, ''; 1795: } 1796: 1797: # Web-form contacts (providers that reject email)

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1798_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1799: push @out, 'WEB-FORM REPORTS REQUIRED:', 1800: ' The following parties do not accept email -- submit manually:'; 1801: for my $c (@form_cs) { 1802: push @out, " [$c->{role}]", 1803: ' Form : ' . _sanitise_output($c->{form}); 1804: push @out, ' Domain : ' . _sanitise_output($c->{form_domain}) if $c->{form_domain}; 1805: push @out, ' Paste : ' . _sanitise_output($c->{form_paste}) if $c->{form_paste}; 1806: push @out, ' Upload : ' . _sanitise_output($c->{form_upload}) if $c->{form_upload}; 1807: } 1808: push @out, ''; 1809: } 1810: 1811: # Separator and raw headers (body excluded for brevity) ●1812 → 1816 → 1819●1812 → 1816 → 0 1812: push @out, '-' x 72, 1813: 'ORIGINAL MESSAGE HEADERS:', 1814: '-' x 72; 1815: 1816: for my $h (@{ $self->{_headers} }) { 1817: push @out, _sanitise_output("$h->{name}: $h->{value}"); 1818: } ●1819 → 1821 → 0 1819: push @out, ''; 1820: 1821: return join("\n", @out); 1822: } 1823: 1824: # ----------------------------------------------------------------------- 1825: # Public: abuse contacts 1826: # ----------------------------------------------------------------------- 1827: 1828: =head2 abuse_contacts() 1829: 1830: Collates the complete set of parties that should receive an abuse report: 1831: the sending ISP, URL host operators, contact domain web/mail/DNS/registrar 1832: contacts, account providers identified from key headers, the DKIM signer, 1833: and the ESP identified via List-Unsubscribe. 1834: 1835: Addresses are deduplicated globally; if the same address is found via 1836: multiple routes, a single entry is kept and role strings are merged. 1837: 1838: =head3 Usage 1839: 1840: my @contacts = $analyser->abuse_contacts(); 1841: my @addrs = map { $_->{address} } @contacts; 1842: 1843: =head3 Arguments 1844: 1845: None. C<parse_email()> must have been called first. 1846: 1847: =head3 Returns 1848: 1849: A list of hashrefs, one per unique abuse address, in discovery order. 1850: Each hashref has keys C<role>, C<roles> (arrayref), C<address>, C<note>, 1851: C<via>. Returns an empty list if no contacts can be determined. 1852: 1853: =head3 Side Effects 1854: 1855: Triggers C<originating_ip()>, C<embedded_urls()>, and C<mailto_domains()> 1856: if not already cached. 1857: 1858: =head3 Notes 1859: 1860: The result is not independently cached; each call recomputes the contact 1861: list from the cached results of the underlying methods. 1862: 1863: =head3 API Specification 1864: 1865: =head4 Input 1866: 1867: [] 1868: 1869: =head4 Output 1870: 1871: ( 1872: { 1873: type => 'hashref', 1874: keys => { 1875: role => { type => 'scalar' }, 1876: roles => { type => 'arrayref' }, 1877: address => { type => 'scalar', regex => qr/\@/ }, 1878: note => { type => 'scalar' }, 1879: via => { type => 'scalar', regex => qr/^(?:provider-table|ip-whois|domain-whois)$/ }, 1880: }, 1881: }, 1882: ... 1883: ) 1884: 1885: =head3 FORMAL SPECIFICATION 1886: 1887: -- Z notation 1888: abuse_contacts == [ 1889: Xi Email::Abuse::Investigator; 1890: result! : seq CONTACT_INFO 1891: ] 1892: post: forall c : result! @ c.address contains '@' /\ 1893: forall c1, c2 : result! @ c1 /= c2 => c1.address /= c2.address 1894: 1895: =cut 1896: 1897: sub abuse_contacts { ●1898 → 1951 → 1972●1898 → 1951 → 0 1898: my $self = $_[0]; 1899: 1900: my (@contacts, %seen_idx); 1901: 1902: # Inner closure: add one contact entry, merging roles for duplicate addresses 1903: my $add = sub { 1904: my (%args) = @_; 1905: my $addr = lc($args{address} // ''); 1906: return unless $addr && $addr =~ /\@/; 1907: 1908: # Suppress addresses belonging to form-only providers (no email accepted)

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1909_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1910: my $dom = $1; 1911: my $pa = $self->_provider_abuse_for_host($dom); 1912: return if $pa && $pa->{form} && !$pa->{email}; 1913: } 1914:

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1915_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1916: # Merge the new role into the existing entry 1917: my $entry = $contacts[ $seen_idx{$addr} ]; 1918: push @{ $entry->{roles} }, $args{role}; 1919: 1920: # Collapse repeated role labels to avoid unreadable strings 1921: my (%role_counts, @ordered_roles); 1922: for my $r (@{ $entry->{roles} }) { 1923: push @ordered_roles, $r unless $role_counts{$r}++; 1924: } 1925: my @display = map {

Mutants (Total: 3, Killed: 0, Survived: 3)

NUM_BOUNDARY_1926_22_<: Numeric boundary flip > to <
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1926_22_>=: Numeric boundary flip > to >=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1926_22_<=: Numeric boundary flip > to <=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```

1927: } @ordered_roles; 1928: my $joined = join(' and ', @display); 1929: 1930: # Summarise if the merged string is too long to read

Mutants (Total: 4, Killed: 1, Survived: 3)

NUM_BOUNDARY_1931_24_<: Numeric boundary flip > to <
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1931_24_>=: Numeric boundary flip > to >=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_1931_24_<=: Numeric boundary flip > to <=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```

1932: my @short = map { 1933: (my $s = $_) =~ s/[:(\d].*//; 1934: $s =~ s/\s+$//; 1935: $s; 1936: } @display; 1937: $joined = scalar(@display) . ' routes: ' . join(', ', @short); 1938: } 1939: $entry->{role} = $joined; 1940: return; 1941: } 1942: 1943: # First time seeing this address -- record and store 1944: $seen_idx{$addr} = scalar @contacts; 1945: $args{roles} = [ $args{role} ]; 1946: push @contacts, \%args; 1947: }; 1948: 1949: # Route 1 -- Sending ISP (originating IP) 1950: my $orig = $self->originating_ip();

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1951_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1952: my $pa = $self->_provider_abuse_for_ip($orig->{ip}, $orig->{rdns});

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1953_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1954: $add->( 1955: role => 'Sending ISP', 1956: address => $pa->{email}, 1957: note => "$orig->{ip} ($orig->{rdns}) -- $pa->{note}", 1958: via => 'provider-table', 1959: ); 1960: }

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1961_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1962: $add->( 1963: role => 'Sending ISP', 1964: address => $orig->{abuse}, 1965: note => "Network owner of originating IP $orig->{ip} ($orig->{org})", 1966: via => 'ip-whois', 1967: ); 1968: } 1969: } 1970: 1971: # Route 2 -- URL hosts ●1972 → 1973 → 2000●1972 → 1973 → 0 1972: my %url_host_seen; 1973: for my $u ($self->embedded_urls()) { 1974: next if $url_host_seen{ $u->{host} }++; 1975: my $bare_host = lc $u->{host}; 1976: $bare_host =~ s/^www\.//; 1977: # Skip trusted infrastructure (Google, W3C, etc.) 1978: next if $self->{trusted_domains}->{$bare_host}; 1979: next if $TRUSTED_DOMAINS{$bare_host}; 1980: my $pa = $self->_provider_abuse_for_host($u->{host});

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1981_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1982: $add->( 1983: role => "URL host: $u->{host}", 1984: address => $pa->{email}, 1985: note => "$u->{host} -- $pa->{note}", 1986: via => 'provider-table', 1987: ); 1988: }

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_1989_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

1990: $add->( 1991: role => "URL host: $u->{host}", 1992: address => $u->{abuse}, 1993: note => "Hosting $u->{host} ($u->{ip}, $u->{org})", 1994: via => 'ip-whois', 1995: ); 1996: } 1997: } 1998: 1999: # Route 3 -- Contact domain hosting and registration ●2000 → 2000 → 2067●2000 → 2000 → 0 2000: for my $d ($self->mailto_domains()) { 2001: my $dom = $d->{domain}; 2002: 2003: # Web host contact

Mutants (Total: 1, Killed: 1, Survived: 0)

2005: my $pa = $self->_provider_abuse_for_host($dom); 2006: if ($pa) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2006_4: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2007: $add->(role => "Web host of $dom", address => $pa->{email}, 2008: note => $pa->{note}, via => 'provider-table'); 2009: } 2010: $add->( 2011: role => "Web host of $dom", 2012: address => $d->{web_abuse}, 2013: note => sprintf('Hosting %s (%s, %s)', 2014: $dom // '(unknown domain)', 2015: $d->{web_ip} // '(unknown IP)', 2016: $d->{web_org} // '(unknown org)'), 2017: via => 'ip-whois', 2018: ); 2019: } 2020: 2021: # MX (mail host) contact

Mutants (Total: 1, Killed: 1, Survived: 0)

2023: $add->( 2024: role => "Mail host (MX) for $dom", 2025: address => $d->{mx_abuse}, 2026: note => sprintf('MX %s (%s, %s)', 2027: $d->{mx_host} // '(unknown host)', 2028: $d->{mx_ip} // '(unknown IP)', 2029: $d->{mx_org} // '(unknown org)'), 2030: via => 'ip-whois', 2031: ); 2032: } 2033: 2034: # NS (DNS host) contact 2035: if ($d->{ns_abuse}) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2035_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2036: $add->( 2037: role => "DNS host (NS) for $dom", 2038: address => $d->{ns_abuse}, 2039: note => sprintf('NS %s (%s, %s)', 2040: $d->{ns_host} // '(unknown host)', 2041: $d->{ns_ip} // '(unknown IP)', 2042: $d->{ns_org} // '(unknown org)'), 2043: via => 'ip-whois', 2044: ); 2045: } 2046: 2047: # Domain registrar (skip if domain only seen in spoofable headers)

Mutants (Total: 1, Killed: 1, Survived: 0)

2049: my $spoofable_only = 2050: $d->{source} =~ /^(?:From:|Return-Path:|Sender:) header$/ && 2051: !scalar(grep { 2052: $_->{host} && 2053: _registrable($_->{host}) eq (_registrable($dom) // $dom) 2054: } $self->embedded_urls()); 2055: unless ($spoofable_only) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2055_4: Invert condition unless to if
MEDIUM: Add tests asserting both true and false outcomes

2056: $add->( 2057: role => "Domain registrar for $dom", 2058: address => $d->{registrar_abuse}, 2059: note => 'Registrar: ' . ($d->{registrar} // '(unknown)'), 2060: via => 'domain-whois', 2061: ); 2062: } 2063: } 2064: } 2065: 2066: # Route 4 -- From:/Reply-To:/Return-Path:/Sender: account provider ●2067 → 2067 → 2092●2067 → 2067 → 0 2067: for my $hname (qw(from reply-to return-path sender)) { 2068: my $val = $self->_header_value($hname) // next; 2069: 2070: # Extract addr-spec from angle-bracket form to avoid display-name @-signs 2071: my $addr_spec = ($val =~ /<([^>]*)>\s*$/) ? $1 : $val; 2072: my ($addr_domain) = $addr_spec =~ /\@([\w.-]+)/; 2073: next unless $addr_domain; 2074: 2075: # Skip SRS-rewritten forwarder addresses (not the real sender) 2076: next if $addr_spec =~ /\+SRS[0-9]?=/i; 2077: 2078: my $pa = $self->_provider_abuse_for_host($addr_domain);

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2079_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2080: my $role_addr = $addr_spec =~ /\@/ ? $addr_spec : $val; 2081: $role_addr =~ s/^\s+|\s+$//g; 2082: $add->( 2083: role => "Account provider ($hname: $role_addr)", 2084: address => $pa->{email}, 2085: note => $pa->{note}, 2086: via => 'provider-table', 2087: ); 2088: } 2089: } 2090: 2091: # Route 5 -- DKIM signing organisation ●2092 → 2093 → 2106●2092 → 2093 → 0 2092: my $auth = $self->_parse_auth_results_cached();

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2093_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2094: my $pa = $self->_provider_abuse_for_host($auth->{dkim_domain});

Mutants (Total: 1, Killed: 1, Survived: 0)

2096: $add->( 2097: role => "DKIM signer: $auth->{dkim_domain}", 2098: address => $pa->{email}, 2099: note => $pa->{note}, 2100: via => 'provider-table', 2101: ); 2102: } 2103: } 2104: 2105: # Route 6 -- List-Unsubscribe ESP domain ●2106 → 2107 → 2130●2106 → 2107 → 0 2106: my $unsub = $self->_header_value('list-unsubscribe'); 2107: if ($unsub) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2107_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2108: my @unsub_domains; 2109: while ($unsub =~ m{https?://([^/:?\s>]+)}gi) { 2110: push @unsub_domains, lc $1; 2111: } 2112: while ($unsub =~ m{mailto:[^@\s>]+\@([\w.-]+)}gi) { 2113: push @unsub_domains, lc $1; 2114: } 2115: my %unsub_seen; 2116: for my $dom (grep { !$unsub_seen{$_}++ } @unsub_domains) { 2117: my $pa = $self->_provider_abuse_for_host($dom);

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2118_4: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2119: $add->( 2120: role => "ESP / bulk sender (List-Unsubscribe: $dom)", 2121: address => $pa->{email}, 2122: note => "$pa->{note} -- responsible for this bulk delivery", 2123: via => 'provider-table', 2124: ); 2125: } 2126: } 2127: } 2128: 2129: # Route 7 -- Reply addresses embedded in the message body ●2130 → 2132 → 2146●2130 → 2132 → 0 2130: my %body_addr_seen; 2131: my $combined_body = $self->{_body_plain} . "\n" . $self->{_body_html}; 2132: for my $addr_dom ($self->_domains_from_text($combined_body)) { 2133: next if $body_addr_seen{$addr_dom}++; 2134: my $pa = $self->_provider_abuse_for_host($addr_dom); 2135: next unless $pa && $pa->{email}; 2136: my ($example_addr) = $combined_body =~ /(\S+\@\Q$addr_dom\E)/i; 2137: $example_addr //= "\@$addr_dom"; 2138: $add->( 2139: role => "Reply address in body ($example_addr)", 2140: address => $pa->{email}, 2141: note => $pa->{note}, 2142: via => 'provider-table', 2143: ); 2144: } 2145:

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_2146_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_2146_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

2147: } 2148: 2149: # ----------------------------------------------------------------------- 2150: # Public: form contacts (providers that require web-form submission) 2151: # ----------------------------------------------------------------------- 2152: 2153: =head2 form_contacts() 2154: 2155: Returns the list of parties that require abuse reports via a web form 2156: rather than email. These are providers whose C<%PROVIDER_ABUSE> entry 2157: has a C<form> key. Each hashref includes the form URL, paste 2158: instructions, upload instructions, and the discovery role. 2159: 2160: =head3 Usage 2161: 2162: my @forms = $analyser->form_contacts(); 2163: for my $c (@forms) { 2164: printf "Open: %s\n", $c->{form}; 2165: } 2166: 2167: =head3 Arguments 2168: 2169: None. C<parse_email()> must have been called first. 2170: 2171: =head3 Returns 2172: 2173: A list of hashrefs, one per unique form contact. Each hashref has keys 2174: C<form>, C<role>, C<note>, C<form_paste> (optional), C<form_upload> 2175: (optional), and C<via>. Returns an empty list if no form contacts are found. 2176: 2177: =head3 Side Effects 2178: 2179: Triggers C<originating_ip()>, C<embedded_urls()>, and C<mailto_domains()> 2180: if not already cached. 2181: 2182: =head3 Notes 2183: 2184: Deduplication is by form URL. 2185: 2186: =head3 API Specification 2187: 2188: =head4 Input 2189: 2190: [] 2191: 2192: =head4 Output 2193: 2194: ( 2195: { 2196: type => 'hashref', 2197: keys => { 2198: form => { type => 'scalar', regex => qr{^https?://} }, 2199: role => { type => 'scalar' }, 2200: note => { type => 'scalar' }, 2201: form_paste => { type => 'scalar', optional => 1 }, 2202: form_upload => { type => 'scalar', optional => 1 }, 2203: via => { type => 'scalar' }, 2204: }, 2205: }, 2206: ... 2207: ) 2208: 2209: =head3 FORMAL SPECIFICATION 2210: 2211: -- Z notation 2212: form_contacts == [ 2213: Xi Email::Abuse::Investigator; 2214: result! : seq FORM_CONTACT_INFO 2215: ] 2216: post: forall c : result! @ c.form =~ m{^https?://} /\ 2217: forall c1, c2 : result! @ c1 /= c2 => c1.form /= c2.form 2218: 2219: =cut 2220: 2221: sub form_contacts { ●2222 → 2237 → 2252●2222 → 2237 → 0 2222: my $self = $_[0]; 2223: 2224: my (@contacts, %seen); 2225: 2226: # Inner closure: add one form-contact entry, deduplicating by form URL 2227: my $add = sub { 2228: my (%args) = @_; 2229: my $form = $args{form} // ''; 2230: return unless $form; 2231: return if $seen{$form}++; 2232: push @contacts, \%args; 2233: }; 2234: 2235: # Route 1 -- Sending ISP 2236: my $orig = $self->originating_ip();

Mutants (Total: 1, Killed: 1, Survived: 0)

2238: my $pa = $self->_provider_abuse_for_ip($orig->{ip}, $orig->{rdns}); 2239: if ($pa && $pa->{form}) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2239_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2240: $add->( 2241: role => 'Sending ISP', 2242: form => $pa->{form}, 2243: note => $pa->{note} // '', 2244: form_paste => $pa->{form_paste} // '', 2245: form_upload => $pa->{form_upload} // '', 2246: via => 'provider-table', 2247: ); 2248: } 2249: } 2250: 2251: # Route 2 -- URL hosts ●2252 → 2253 → 2270●2252 → 2253 → 0 2252: my %url_host_seen; 2253: for my $u ($self->embedded_urls()) { 2254: next if $url_host_seen{ $u->{host} }++; 2255: my $pa = $self->_provider_abuse_for_host($u->{host});

Mutants (Total: 1, Killed: 1, Survived: 0)

2257: $add->( 2258: role => "URL host: $u->{host}", 2259: form => $pa->{form}, 2260: form_domain => $u->{host}, 2261: note => $pa->{note} // '', 2262: form_paste => $pa->{form_paste} // '', 2263: form_upload => $pa->{form_upload} // '', 2264: via => 'provider-table', 2265: ); 2266: } 2267: } 2268: 2269: # Route 3 -- Contact domains (web host + registrar) ●2270 → 2270 → 2304●2270 → 2270 → 0 2270: for my $d ($self->mailto_domains()) { 2271: my $dom = $d->{domain}; 2272: my $pa = $self->_provider_abuse_for_host($dom); 2273: if ($pa && $pa->{form}) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2273_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2274: $add->( 2275: role => "Web host of $dom", 2276: form => $pa->{form}, 2277: form_domain => $dom, 2278: note => $pa->{note} // '', 2279: form_paste => $pa->{form_paste} // '', 2280: form_upload => $pa->{form_upload} // '', 2281: via => 'provider-table', 2282: ); 2283: } 2284: 2285: # Registrar identified via WHOIS -- check for form-only registrar

Mutants (Total: 1, Killed: 1, Survived: 0)

2287: my $reg_domain = lc $1; 2288: my $rpa = $self->_provider_abuse_for_host($reg_domain); 2289: if ($rpa && $rpa->{form}) {

Mutants (Total: 1, Killed: 1, Survived: 0)

2290: $add->( 2291: role => "Domain registrar for $dom (web form only)", 2292: form => $rpa->{form}, 2293: form_domain => $dom, 2294: note => $rpa->{note} // '', 2295: form_paste => $rpa->{form_paste} // '', 2296: form_upload => $rpa->{form_upload} // '', 2297: via => 'provider-table', 2298: ); 2299: } 2300: } 2301: } 2302: 2303: # Route 4 -- Account provider headers ●2304 → 2304 → 2327●2304 → 2304 → 0 2304: for my $hname (qw(from reply-to return-path sender)) { 2305: my $val = $self->_header_value($hname) // next; 2306: my $addr_spec = ($val =~ /<([^>]*)>\s*$/) ? $1 : $val; 2307: my ($addr_domain) = $addr_spec =~ /\@([\w.-]+)/; 2308: next unless $addr_domain; 2309: # Skip SRS forwarder rewrite addresses 2310: next if $addr_spec =~ /\+SRS[0-9]?=/i; 2311: my $pa = $self->_provider_abuse_for_host($addr_domain); 2312: if ($pa && $pa->{form}) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2312_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2313: my $role_addr = $addr_spec =~ /@/ ? $addr_spec : $val; 2314: $role_addr =~ s/^\s+|\s+$//g; 2315: $add->( 2316: role => "Account provider ($hname: $role_addr)", 2317: form => $pa->{form}, 2318: note => $pa->{note} // '', 2319: form_paste => $pa->{form_paste} // '', 2320: form_upload => $pa->{form_upload} // '', 2321: via => 'provider-table', 2322: ); 2323: } 2324: } 2325: 2326: # Route 5 -- DKIM signer ●2327 → 2328 → 2343●2327 → 2328 → 0 2327: my $auth = $self->_parse_auth_results_cached();

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2328_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2329: my $pa = $self->_provider_abuse_for_host($auth->{dkim_domain});

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2330_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2331: $add->( 2332: role => "DKIM signer: $auth->{dkim_domain}", 2333: form => $pa->{form}, 2334: note => $pa->{note} // '', 2335: form_paste => $pa->{form_paste} // '', 2336: form_upload => $pa->{form_upload} // '', 2337: via => 'provider-table', 2338: ); 2339: } 2340: } 2341: 2342: # Route 6 -- List-Unsubscribe ESP domains ●2343 → 2344 → 2364●2343 → 2344 → 0 2343: my $unsub = $self->_header_value('list-unsubscribe');

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2344_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2345: my @unsub_domains; 2346: while ($unsub =~ m{https?://([^/:?\s>]+)}gi) { push @unsub_domains, lc $1 } 2347: while ($unsub =~ m{mailto:[^@\s>]+\@([\w.-]+)}gi) { push @unsub_domains, lc $1 } 2348: my %useen; 2349: for my $dom (grep { !$useen{$_}++ } @unsub_domains) { 2350: my $pa = $self->_provider_abuse_for_host($dom);

Mutants (Total: 1, Killed: 1, Survived: 0)

2352: $add->( 2353: role => "ESP / bulk sender (List-Unsubscribe: $dom)", 2354: form => $pa->{form}, 2355: note => $pa->{note} // '', 2356: form_paste => $pa->{form_paste} // '', 2357: form_upload => $pa->{form_upload} // '', 2358: via => 'provider-table', 2359: ); 2360: } 2361: } 2362: } 2363: ●2364 → 2364 → 0 2364: return @contacts;

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_2364_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_2364_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

2365: } 2366: 2367: # ----------------------------------------------------------------------- 2368: # Public: full analyst report 2369: # ----------------------------------------------------------------------- 2370: 2371: =head2 report() 2372: 2373: Produces a comprehensive, analyst-facing plain-text report covering all 2374: findings: envelope fields, risk assessment, originating host, sending 2375: software, received chain tracking IDs, embedded URLs, contact domain 2376: intelligence, and recommended abuse contacts. 2377: 2378: Use C<report()> for human review or ticketing systems. Use 2379: C<abuse_report_text()> for sending to ISP abuse desks. 2380: 2381: =head3 Usage 2382: 2383: print $analyser->report(); 2384: 2385: open my $fh, '>', 'report.txt' or croak "Cannot open: $!"; 2386: print $fh $analyser->report(); 2387: close $fh; 2388: 2389: =head3 Arguments 2390: 2391: None. C<parse_email()> must have been called first. 2392: 2393: =head3 Returns 2394: 2395: A plain scalar string, newline-terminated, Unix line endings. Never empty 2396: or undef. 2397: 2398: =head3 Side Effects 2399: 2400: Triggers all analysis methods if not already cached. 2401: 2402: =head3 Notes 2403: 2404: The report is idempotent: calling it multiple times on the same object 2405: always returns an identical string. All user-derived content is sanitised 2406: before output. 2407: 2408: =head3 API Specification 2409: 2410: =head4 Input 2411: 2412: [] 2413: 2414: =head4 Output 2415: 2416: { type => 'scalar' } 2417: 2418: =head3 FORMAL SPECIFICATION 2419: 2420: -- Z notation 2421: report == [ 2422: Xi Email::Abuse::Investigator; 2423: result! : STRING 2424: ] 2425: post: result! /= '' /\ result! ends_with '\n' 2426: 2427: =cut 2428: 2429: sub report { ●2430 → 2441 → 2449●2430 → 2441 → 0 2430: my $self = $_[0]; 2431: 2432: my @out; 2433: 2434: # Banner header 2435: push @out, '=' x 72; 2436: push @out, " Email::Abuse::Investigator Report (v$VERSION)"; 2437: push @out, '=' x 72; 2438: push @out, ''; 2439: 2440: # Envelope summary -- decode MIME encoded-words for readability 2441: for my $f (qw(from reply-to return-path subject date message-id)) { 2442: my $v = $self->_header_value($f); 2443: next unless defined $v; 2444: my $decoded = $self->_decode_mime_words($v); 2445: my $label = ucfirst($f); 2446: push @out, sprintf(' %-14s : %s', $label, 2447: _sanitise_output($decoded ne $v ? "$decoded [encoded: $v]" : $v)); 2448: } ●2449 → 2454 → 2461●2449 → 2454 → 0 2449: push @out, ''; 2450: 2451: # Risk assessment section 2452: my $risk = $self->risk_assessment(); 2453: push @out, "[ RISK ASSESSMENT: $risk->{level} (score: $risk->{score}) ]";

Mutants (Total: 1, Killed: 1, Survived: 0)

2455: for my $f (@{ $risk->{flags} }) { 2456: push @out, " [$f->{severity}] " . _sanitise_output($f->{detail}); 2457: } 2458: } else { 2459: push @out, ' (no specific red flags detected)'; 2460: } ●2461 → 2466 → 2477●2461 → 2466 → 0 2461: push @out, ''; 2462: 2463: # Originating host section 2464: push @out, '[ ORIGINATING HOST ]'; 2465: my $orig = $self->originating_ip(); 2466: if ($orig) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2466_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2467: push @out, ' IP : ' . _sanitise_output($orig->{ip}); 2468: push @out, ' Reverse DNS : ' . _sanitise_output($orig->{rdns}) if $orig->{rdns}; 2469: push @out, ' Country : ' . _sanitise_output($orig->{country}) if $orig->{country}; 2470: push @out, ' Organisation : ' . _sanitise_output($orig->{org}) if $orig->{org}; 2471: push @out, ' Abuse addr : ' . _sanitise_output($orig->{abuse}) if $orig->{abuse}; 2472: push @out, " Confidence : $orig->{confidence}"; 2473: push @out, ' Note : ' . _sanitise_output($orig->{note}) if $orig->{note}; 2474: } else { 2475: push @out, ' (could not determine originating IP)'; 2476: } ●2477 → 2481 → 2491●2477 → 2481 → 0 2477: push @out, ''; 2478: 2479: # Sending software section (omitted if none found) 2480: my @sw = $self->sending_software();

Mutants (Total: 1, Killed: 1, Survived: 0)

2482: push @out, '[ SENDING SOFTWARE / INFRASTRUCTURE CLUES ]'; 2483: for my $s (@sw) { 2484: push @out, sprintf(' %-14s : %s', $s->{header}, _sanitise_output($s->{value})); 2485: push @out, " Note : $s->{note}"; 2486: push @out, ''; 2487: } 2488: } 2489: 2490: # Received chain tracking IDs (only hops with id or for are shown) ●2491 → 2493 → 2506●2491 → 2493 → 0 2491: my @trail = grep { defined $_->{id} || defined $_->{for} } 2492: $self->received_trail(); 2493: if (@trail) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2493_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2494: push @out, '[ RECEIVED CHAIN TRACKING IDs ]'; 2495: push @out, ' (Supply these to the relevant ISP abuse team to trace the session)'; 2496: push @out, ''; 2497: for my $hop (@trail) { 2498: push @out, ' IP : ' . (_sanitise_output($hop->{ip}) // '(unknown)'); 2499: push @out, ' Envelope for : ' . _sanitise_output($hop->{for}) if $hop->{for}; 2500: push @out, ' Server ID : ' . _sanitise_output($hop->{id}) if $hop->{id}; 2501: push @out, ''; 2502: } 2503: } 2504: 2505: # Embedded URLs section -- grouped by hostname ●2506 → 2508 → 2551●2506 → 2508 → 0 2506: push @out, '[ EMBEDDED HTTP/HTTPS URLs ]'; 2507: my @urls = $self->embedded_urls();

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2508_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2509: my (%host_order, %host_meta, %host_paths); 2510: my $seq = 0; 2511: for my $u (@urls) { 2512: my $h = $u->{host};

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2513_4: Invert condition unless to if
MEDIUM: Add tests asserting both true and false outcomes

2514: $host_order{$h} = $seq++; 2515: $host_meta{$h} = { 2516: ip => $u->{ip}, 2517: org => $u->{org}, 2518: abuse => $u->{abuse}, 2519: country => $u->{country}, 2520: }; 2521: } 2522: push @{ $host_paths{$h} }, $u->{url}; 2523: } 2524: 2525: # Output each host group in first-seen order 2526: for my $h (sort { $host_order{$a} <=> $host_order{$b} } keys %host_order) { 2527: my $m = $host_meta{$h}; 2528: my $bare = lc $h; $bare =~ s/^www\.//; 2529: push @out, ' Host : ' . _sanitise_output($h) . 2530: (($URL_SHORTENERS{$bare} || $self->{url_shorteners}->{$bare}) 2531: ? ' *** URL SHORTENER -- real destination hidden ***' : ''); 2532: push @out, ' IP : ' . _sanitise_output($m->{ip}) if $m->{ip}; 2533: push @out, ' Country : ' . _sanitise_output($m->{country}) if $m->{country}; 2534: push @out, ' Organisation : ' . _sanitise_output($m->{org}) if $m->{org}; 2535: push @out, ' Abuse addr : ' . _sanitise_output($m->{abuse}) if $m->{abuse}; 2536: my @paths = @{ $host_paths{$h} };

Mutants (Total: 2, Killed: 0, Survived: 2)

NUM_BOUNDARY_2537_15_!=: Numeric boundary flip == to !=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
COND_INV_2537_4: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2538: push @out, ' URL : ' . _sanitise_output($paths[0]); 2539: } else { 2540: push @out, ' URLs (' . scalar(@paths) . ') :'; 2541: push @out, ' ' . _sanitise_output($_) for @paths; 2542: } 2543: push @out, ''; 2544: } 2545: } else { 2546: push @out, ' (none found)'; 2547: push @out, ''; 2548: } 2549: 2550: # Contact / reply-to domains section ●2551 → 2553 → 2593●2551 → 2553 → 0 2551: push @out, '[ CONTACT / REPLY-TO DOMAINS ]'; 2552: my @mdoms = $self->mailto_domains();

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2553_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2554: for my $d (@mdoms) { 2555: push @out, ' Domain : ' . _sanitise_output($d->{domain}); 2556: push @out, ' Found in : ' . _sanitise_output($d->{source});

Mutants (Total: 1, Killed: 1, Survived: 0)

2558: push @out, ' *** WARNING: RECENTLY REGISTERED - possible phishing domain ***'; 2559: } 2560: push @out, ' Registered : ' . $d->{registered} if $d->{registered}; 2561: push @out, ' Expires : ' . $d->{expires} if $d->{expires}; 2562: push @out, ' Registrar : ' . _sanitise_output($d->{registrar}) if $d->{registrar}; 2563: push @out, ' Reg. abuse : ' . _sanitise_output($d->{registrar_abuse}) if $d->{registrar_abuse}; 2564: if ($d->{web_ip}) {

Mutants (Total: 1, Killed: 1, Survived: 0)

2565: push @out, ' Web host IP : ' . _sanitise_output($d->{web_ip}); 2566: push @out, ' Web host org : ' . _sanitise_output($d->{web_org}) if $d->{web_org}; 2567: push @out, ' Web abuse : ' . _sanitise_output($d->{web_abuse}) if $d->{web_abuse}; 2568: } else { 2569: push @out, ' Web host : (no A record / unreachable)'; 2570: } 2571: if ($d->{mx_host}) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2571_4: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2572: push @out, ' MX host : ' . _sanitise_output($d->{mx_host}); 2573: push @out, ' MX IP : ' . _sanitise_output($d->{mx_ip}) if $d->{mx_ip}; 2574: push @out, ' MX org : ' . _sanitise_output($d->{mx_org}) if $d->{mx_org}; 2575: push @out, ' MX abuse : ' . _sanitise_output($d->{mx_abuse}) if $d->{mx_abuse}; 2576: } else { 2577: push @out, ' MX host : (none found)'; 2578: }

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2579_4: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2580: push @out, ' NS host : ' . _sanitise_output($d->{ns_host}); 2581: push @out, ' NS IP : ' . _sanitise_output($d->{ns_ip}) if $d->{ns_ip}; 2582: push @out, ' NS org : ' . _sanitise_output($d->{ns_org}) if $d->{ns_org}; 2583: push @out, ' NS abuse : ' . _sanitise_output($d->{ns_abuse}) if $d->{ns_abuse}; 2584: } 2585: push @out, ''; 2586: } 2587: } else { 2588: push @out, ' (none found)'; 2589: push @out, ''; 2590: } 2591: 2592: # Abuse contacts summary ●2593 → 2595 → 2609●2593 → 2595 → 0 2593: push @out, '[ WHERE TO SEND ABUSE REPORTS ]'; 2594: my @contacts = $self->abuse_contacts();

Mutants (Total: 1, Killed: 1, Survived: 0)

2596: for my $c (@contacts) { 2597: push @out, ' Role : ' . _sanitise_output($c->{role}); 2598: push @out, ' Send to : ' . _sanitise_output($c->{address}); 2599: push @out, ' Note : ' . _sanitise_output($c->{note}) if $c->{note}; 2600: push @out, " Discovered : $c->{via}"; 2601: push @out, ''; 2602: } 2603: } else { 2604: push @out, ' (no abuse contacts could be determined)'; 2605: push @out, ''; 2606: } 2607: 2608: # Web-form contacts (providers that require manual form submission) ●2609 → 2610 → 2642●2609 → 2610 → 0 2609: my @form_cs = $self->form_contacts(); 2610: if (@form_cs) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2610_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2611: push @out, '[ WHERE TO FILE WEB-FORM REPORTS ]'; 2612: push @out, ' The following parties require manual submission via a web form.'; 2613: push @out, ' Open each URL in a browser, then follow the instructions below it.'; 2614: push @out, ''; 2615: for my $c (@form_cs) { 2616: push @out, ' Role : ' . _sanitise_output($c->{role}); 2617: push @out, ' Form URL : ' . _sanitise_output($c->{form}); 2618: push @out, ' Domain/URL : ' . _sanitise_output($c->{form_domain}) if $c->{form_domain}; 2619: push @out, ' Note : ' . _sanitise_output($c->{note}) if $c->{note};

Mutants (Total: 1, Killed: 1, Survived: 0)

2621: # Word-wrap the paste hint at ROLE_WRAP_LEN characters 2622: my $hint = $c->{form_paste}; 2623: my @words = split /\s+/, $hint; 2624: my (@lines, $line); 2625: for my $w (@words) { 2626: if (defined $line && length("$line $w") > $ROLE_WRAP_LEN) {

Mutants (Total: 4, Killed: 0, Survived: 4)

NUM_BOUNDARY_2626_46_<: Numeric boundary flip > to <
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_2626_46_>=: Numeric boundary flip > to >=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_2626_46_<=: Numeric boundary flip > to <=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
COND_INV_2626_6: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2627: push @lines, $line; 2628: $line = $w; 2629: } else { 2630: $line = defined $line ? "$line $w" : $w; 2631: } 2632: } 2633: push @lines, $line if defined $line; 2634: push @out, ' Paste : ' . shift @lines if @lines; 2635: push @out, ' ' . $_ for @lines; 2636: } 2637: push @out, ' Upload : ' . _sanitise_output($c->{form_upload}) if $c->{form_upload}; 2638: push @out, ''; 2639: } 2640: } 2641: ●2642 → 2643 → 0 2642: push @out, '=' x 72; 2643: return join("\n", @out) . "\n"; 2644: } 2645: 2646: # ----------------------------------------------------------------------- 2647: # Private: output sanitisation 2648: # ----------------------------------------------------------------------- 2649: 2650: # _sanitise_output( $str ) -> $str 2651: # 2652: # Purpose: 2653: # Strip control characters that could affect terminal rendering or HTML 2654: # injection from any string that will appear in a report or abuse email. 2655: # Preserves printable ASCII, high-bytes (for UTF-8 content), tabs, and 2656: # line endings. 2657: # 2658: # Entry criteria: 2659: # $str -- a defined or undef scalar. 2660: # 2661: # Exit status: 2662: # Returns the sanitised string, or the empty string if $str is undef. 2663: # 2664: # Notes: 2665: # Only strips C0 control characters below 0x20 (except \t) and the DEL 2666: # character (0x7F). High bytes (0x80-0xFF) are preserved because they 2667: # form valid UTF-8 multi-byte sequences in headers and body text. 2668: 2669: sub _sanitise_output { 2670: my ($str) = @_;

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_2671_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_2671_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

2672: # Remove C0 controls (except tab) and DEL 2673: $str =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]//g;

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_2674_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_2674_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

2675: } 2676: 2677: # ----------------------------------------------------------------------- 2678: # Private: message parsing 2679: # ----------------------------------------------------------------------- 2680: 2681: # _split_message( $text ) 2682: # 2683: # Purpose: 2684: # Split a raw RFC 2822 email into headers and body, parse all headers, 2685: # decode the body (including multipart), extract sending-software 2686: # fingerprints, and populate per-hop tracking data. 2687: # 2688: # Entry criteria: 2689: # $text -- defined scalar, already dereferenced by parse_email(). 2690: # $self->{_sending_sw} and $self->{_rcvd_tracking} reset to [] by caller. 2691: # 2692: # Exit status: 2693: # Returns undef silently if the header block is empty/whitespace-only. 2694: # Otherwise all results are communicated via side effects on $self. 2695: # 2696: # Side effects: 2697: # Populates _headers, _received, _body_plain, _body_html, _sending_sw, 2698: # and _rcvd_tracking. 2699: # 2700: # Notes: 2701: # Delegates to _decode_multipart() for multipart/* content types. 2702: # Lines not matching the header pattern are silently discarded. 2703: # Boundary extraction uses a simple regex; missing boundary causes the 2704: # body to be skipped silently. 2705: 2706: sub _split_message { ●2707 → 2720 → 2725●2707 → 2720 → 0 2707: my ($self, $text) = @_; 2708: 2709: # Split at the first blank line (RFC 2822 header/body separator) 2710: my ($header_block, $body_raw) = split /\r?\n\r?\n/, $text, 2; 2711: 2712: return unless defined $header_block && $header_block =~ /\S/; 2713: $body_raw //= ''; 2714: 2715: # Unfold RFC 2822 continuation lines (s2.2.3) 2716: $header_block =~ s/\r?\n([ \t]+)/ $1/g; 2717: 2718: # Parse each header line into a { name, value } pair 2719: my @headers; 2720: for my $line (split /\r?\n/, $header_block) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2721_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2722: push @headers, { name => lc($1), value => $2 }; 2723: } 2724: } ●2725 → 2740 → 2750●2725 → 2740 → 0 2725: $self->{_headers} = \@headers; 2726: 2727: # Collect all Received: header values (most-recent first, as in message) 2728: $self->{_received} = [ 2729: map { $_->{value} } 2730: grep { $_->{name} eq 'received' } @headers 2731: ]; 2732: 2733: # Determine content type and transfer encoding from top-level headers 2734: my ($ct_h) = grep { $_->{name} eq 'content-type' } @headers; 2735: my ($cte_h) = grep { $_->{name} eq 'content-transfer-encoding' } @headers; 2736: my $ct = defined $ct_h ? $ct_h->{value} : ''; 2737: my $cte = defined $cte_h ? $cte_h->{value} : ''; 2738: 2739: # Decode multipart or single-part body as appropriate

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2740_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2741: my ($boundary) = $ct =~ /boundary="?([^";]+)"?/i; 2742: # Pass depth=0 to enforce the MAX_MULTIPART_DEPTH recursion guard 2743: $self->_decode_multipart($body_raw, $boundary, 0) if $boundary; 2744: } else { 2745: my $decoded = $self->_decode_body($body_raw, $cte);

Mutants (Total: 1, Killed: 1, Survived: 0)

2747: else { $self->{_body_plain} = $decoded } 2748: } 2749: ●2750 → 2764 → 2776●2750 → 2764 → 0 2750: $self->_debug(sprintf 'Parsed %d headers, %d Received lines', 2751: scalar @headers, scalar @{ $self->{_received} }); 2752: 2753: # --- Sending software fingerprints --- 2754: # These headers identify the mailer or shared-hosting script that sent 2755: # the message; invaluable for shared-hosting abuse reports. 2756: my %sw_notes = ( 2757: 'x-php-originating-script' => 'PHP script on shared hosting -- report to hosting abuse team', 2758: 'x-source' => 'Source file on shared hosting -- report to hosting abuse team', 2759: 'x-source-host' => 'Sending hostname injected by shared hosting provider', 2760: 'x-source-args' => 'Command-line args injected by shared hosting provider', 2761: 'x-mailer' => 'Email client or bulk-mailer identifier', 2762: 'user-agent' => 'Email client identifier', 2763: ); 2764: for my $sw_hdr (sort keys %sw_notes) { 2765: my ($h) = grep { $_->{name} eq $sw_hdr } @headers; 2766: next unless $h; 2767: push @{ $self->{_sending_sw} }, { 2768: header => $sw_hdr, 2769: value => $h->{value}, 2770: note => $sw_notes{$sw_hdr}, 2771: }; 2772: } 2773: 2774: # --- Per-hop tracking IDs from Received: chain --- 2775: # Walk oldest-first (reverse) so _rcvd_tracking is oldest-first ●2776 → 2776 → 0 2776: for my $rcvd (reverse @{ $self->{_received} }) { 2777: my $ip = $self->_extract_ip_from_received($rcvd); 2778: my ($for_addr) = $rcvd =~ /\bfor\s+<?([^\s>]+\@[\w.-]+\.[\w]+)>?/i; 2779: my ($srv_id) = $rcvd =~ /\bid\s+([\w.-]+)/i; 2780: # Skip hops with no actionable tracking data 2781: next unless defined $ip || defined $for_addr || defined $srv_id; 2782: push @{ $self->{_rcvd_tracking} }, { 2783: received => $rcvd, 2784: ip => $ip, 2785: for => $for_addr, 2786: id => $srv_id, 2787: }; 2788: } 2789: } 2790: 2791: # _decode_multipart( $body, $boundary, $depth ) 2792: # 2793: # Purpose: 2794: # Recursively split a MIME multipart body on its boundary and decode each 2795: # text/plain and text/html part. Nested multipart/* containers are 2796: # recursed into up to MAX_MULTIPART_DEPTH levels deep. 2797: # 2798: # Entry criteria: 2799: # $body -- the raw body text of the multipart container. 2800: # $boundary -- the boundary string from the Content-Type header. 2801: # $depth -- current recursion depth (starts at 0 from _split_message). 2802: # 2803: # Exit status: 2804: # Returns undef if $depth >= MAX_MULTIPART_DEPTH (recursion guard). 2805: # Otherwise all results via side effects. 2806: # 2807: # Side effects: 2808: # Appends decoded text to $self->{_body_plain} and $self->{_body_html}. 2809: # 2810: # Notes: 2811: # Whitespace-only MIME segments between boundaries are silently skipped. 2812: # Decoding errors are silenced; raw bytes are used as fallback. 2813: 2814: sub _decode_multipart { ●2815 → 2820 → 2827●2815 → 2820 → 0 2815: my ($self, $body, $boundary, $depth) = @_; 2816: $depth //= 0; 2817: 2818: # Enforce the recursion depth limit to prevent stack exhaustion on 2819: # pathological crafted messages with deeply nested multipart structures. 2820: if ($depth >= $MAX_MULTIPART_DEPTH) {

Mutants (Total: 4, Killed: 4, Survived: 0)

2821: Carp::carp 'Email::Abuse::Investigator: multipart nesting depth limit', 2822: "($MAX_MULTIPART_DEPTH) exceeded; stopping recursion"; 2823: return; 2824: } 2825: 2826: # Split on the boundary marker; the (?:--)? suffix handles closing boundary ●2827 → 2829 → 0 2827: my @parts = split /--\Q$boundary\E(?:--)?/, $body; 2828: 2829: for my $part (@parts) { 2830: # Skip whitespace-only segments between boundaries 2831: next unless $part =~ /\S/; 2832: 2833: $part =~ s/^\r?\n//; 2834: 2835: # Each MIME part has its own headers separated from body by a blank line 2836: my ($phdr_block, $pbody) = split /\r?\n\r?\n/, $part, 2; 2837: next unless defined $pbody; 2838: 2839: # Unfold continuation header lines within this part 2840: $phdr_block =~ s/\r?\n([ \t]+)/ $1/g; 2841: 2842: # Parse this part's headers into a simple hash 2843: my %phdr; 2844: for my $line (split /\r?\n/, $phdr_block) { 2845: $phdr{ lc($1) } = $2 if $line =~ /^([\w-]+)\s*:\s*(.*)/; 2846: } 2847: 2848: my $pct = $phdr{'content-type'} // ''; 2849: my $pcte = $phdr{'content-transfer-encoding'} // ''; 2850: 2851: # Nested multipart/* must be recursed into; without this URLs in 2852: # multipart/alternative inside multipart/mixed would be missed. 2853: if ($pct =~ /multipart/i) {

Mutants (Total: 1, Killed: 1, Survived: 0)

2854: my ($inner_boundary) = $pct =~ /boundary\s*=\s*"?([^";]+)"?/i; 2855: if ($inner_boundary) {

Mutants (Total: 1, Killed: 1, Survived: 0)

2856: $inner_boundary =~ s/\s+$//; 2857: # Increment depth counter for the recursion guard 2858: $self->_decode_multipart($pbody, $inner_boundary, $depth + 1); 2859: } 2860: next; 2861: } 2862: 2863: # Decode transfer encoding and accumulate by content type 2864: my $decoded = $self->_decode_body($pbody, $pcte); 2865: if ($pct =~ /text\/html/i) { $self->{_body_html} .= $decoded }

Mutants (Total: 1, Killed: 1, Survived: 0)

2866: elsif ($pct =~ /text/i || !$pct) { $self->{_body_plain} .= $decoded } 2867: } 2868: } 2869: 2870: # _decode_body( $body, $cte ) -> string 2871: # 2872: # Purpose: 2873: # Decode a MIME body part according to its Content-Transfer-Encoding. 2874: # 2875: # Entry criteria: 2876: # $body -- raw body string (may be undef). 2877: # $cte -- Content-Transfer-Encoding value string (may be undef). 2878: # 2879: # Exit status: 2880: # Returns the decoded string, or the original string if the encoding is 2881: # 7bit/8bit/binary or unrecognised. 2882: # 2883: # Notes: 2884: # decode_qp and decode_base64 are imported from MIME:: modules; errors 2885: # from malformed content are silenced by the eval wrappers they provide. 2886: 2887: sub _decode_body { 2888: my ($self, $body, $cte) = @_; 2889: $cte //= ''; 2890: return decode_qp($body) if $cte =~ /quoted-printable/i; 2891: return decode_base64($body) if $cte =~ /base64/i; 2892: return $body // '';

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_2892_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_2892_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

2893: } 2894: 2895: # ----------------------------------------------------------------------- 2896: # Private: Received-chain -> originating IP 2897: # ----------------------------------------------------------------------- 2898: 2899: # _find_origin() 2900: # 2901: # Purpose: 2902: # Walk the Received: chain (oldest-first) to find the first external IP, 2903: # or fall back to X-Originating-IP. Enrich with rDNS and WHOIS. 2904: # 2905: # Entry criteria: 2906: # $self->{_received} populated by _split_message(). 2907: # $self->{trusted_relays} set by new(). 2908: # 2909: # Exit status: 2910: # Returns { ip, rdns, org, abuse, country, confidence, note } on success. 2911: # Returns undef if no usable IP can be identified. 2912: # 2913: # Side effects: 2914: # Network I/O via _enrich_ip(): one PTR lookup, one RDAP/WHOIS query. 2915: # Results are also stored in the CHI cross-message cache if available. 2916: # 2917: # Notes: 2918: # confidence 'high' = 2+ distinct external IPs; 2919: # 'medium' = exactly one external IP; 2920: # 'low' = taken from X-Originating-IP. 2921: 2922: sub _find_origin { ●2923 → 2928 → 2936●2923 → 2928 → 0 2923: my $self = $_[0]; 2924: 2925: my @candidates; 2926: 2927: # Walk oldest-first (reverse) to collect external IPs 2928: for my $hdr (reverse @{ $self->{_received} }) { 2929: my $ip = $self->_extract_ip_from_received($hdr) // next; 2930: next if $self->_is_private($ip); 2931: next if $self->_is_trusted($ip); 2932: push @candidates, $ip; 2933: } 2934: 2935: # Fall back to X-Originating-IP if no external IPs in Received: chain

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2936_2: Invert condition unless to if
MEDIUM: Add tests asserting both true and false outcomes

2937: my $xoip = $self->_header_value('x-originating-ip');

Mutants (Total: 1, Killed: 1, Survived: 0)

2939: $xoip =~ s/[\[\]\s]//g; 2940: return $self->_enrich_ip($xoip, 'low',

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_2940_4: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_2940_4: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

2941: 'Taken from X-Originating-IP (webmail, unverified)') 2942: unless $self->_is_private($xoip); 2943: }

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_2944_3: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_2944_3: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

2945: } 2946: 2947: # Report the oldest (first) external IP; confidence depends on count

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_2948_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_2948_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

2949: $candidates[0],

Mutants (Total: 3, Killed: 3, Survived: 0)

2951: 'First external hop in Received: chain', 2952: ); 2953: } 2954: 2955: # _extract_ip_from_received( $hdr ) -> ipv4_or_ipv6_string | undef 2956: # 2957: # Purpose: 2958: # Extract the most-significant IP address from a raw Received: header 2959: # value, trying patterns in priority order. Supports both IPv4 dotted- 2960: # quad and IPv6 bracket notation. 2961: # 2962: # Entry criteria: 2963: # $hdr -- a defined Received: header value string. 2964: # 2965: # Exit status: 2966: # Returns the IP string on success, undef if no IP can be extracted. 2967: # 2968: # Notes: 2969: # IPv4 addresses are validated (all octets <= 255). 2970: # IPv6 addresses are returned as-is if they contain colons. 2971: 2972: sub _extract_ip_from_received { ●2973 → 2974 → 2987●2973 → 2974 → 0 2973: my ($self, $hdr) = @_; 2974: for my $re (@RECEIVED_IP_RE) { 2975: if ($hdr =~ $re) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_2975_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

2976: my $ip = $1; 2977: 2978: # Accept IPv6 addresses (contain colons) without further validation

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_2979_4: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_2979_4: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

2980: 2981: # Validate IPv4 format and octet range 2982: next unless $ip =~ /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/;

Mutants (Total: 3, Killed: 3, Survived: 0)

2984: return $ip;

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_2984_4: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_2984_4: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

2985: } 2986: }

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_2987_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_2987_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

2988: } 2989: 2990: # _is_private( $ip ) -> bool 2991: # 2992: # Purpose: 2993: # Test whether an IP address falls in any private, reserved, or special- 2994: # use range (IPv4 or IPv6) that should never be reported as a spam origin. 2995: # 2996: # Entry criteria: 2997: # $ip -- a scalar IP string (IPv4 or IPv6); may be undef. 2998: # 2999: # Exit status: 3000: # Returns 1 (true) if the IP is private/reserved, 0 (false) otherwise. 3001: # Returns 1 for undef or empty strings. 3002: # 3003: # Notes: 3004: # Uses the module-level @PRIVATE_RANGES array of pre-compiled regexes. 3005: # Covers all ranges listed in RFC 1122, 1918, 5737, 6598, and RFC 4193. 3006: 3007: sub _is_private { ●3008 → 3010 → 3011●3008 → 3010 → 0 3008: my ($self, $ip) = @_;

Mutants (Total: 2, Killed: 2, Survived: 0)

3010: for my $re (@PRIVATE_RANGES) { return 1 if $ip =~ $re }

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_3010_33: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_3010_33: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

Mutants (Total: 2, Killed: 2, Survived: 0)

3012: } 3013: 3014: # _is_trusted( $ip ) -> bool 3015: # 3016: # Purpose: 3017: # Test whether an IP address matches any entry in the caller-supplied 3018: # trusted_relays list (exact IP or CIDR block). 3019: # 3020: # Entry criteria: 3021: # $ip -- a defined IPv4 address string. 3022: # $self->{trusted_relays} -- arrayref of exact IPs or CIDR strings. 3023: # 3024: # Exit status: 3025: # Returns 1 (true) if the IP matches any trusted relay, 0 otherwise. 3026: 3027: sub _is_trusted { ●3028 → 3029 → 3032●3028 → 3029 → 0 3028: my ($self, $ip) = @_; 3029: for my $cidr (@{ $self->{trusted_relays} }) { 3030: return 1 if $self->_ip_in_cidr($ip, $cidr);

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_3030_3: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_3030_3: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

3031: }

Mutants (Total: 2, Killed: 2, Survived: 0)

3033: } 3034: 3035: # ----------------------------------------------------------------------- 3036: # Private: HTTP/HTTPS URL extraction and resolution 3037: # ----------------------------------------------------------------------- 3038: 3039: # _extract_and_resolve_urls() -> arrayref of url hashrefs 3040: # 3041: # Purpose: 3042: # Extract all HTTP/HTTPS URLs from the decoded body, resolve each unique 3043: # hostname to an IP, and enrich with WHOIS/RDAP data. Optionally uses 3044: # AnyEvent::DNS to parallelise the DNS resolution step. 3045: # 3046: # Entry criteria: 3047: # $self->{_body_plain} and $self->{_body_html} populated by _split_message(). 3048: # 3049: # Exit status: 3050: # Returns an arrayref of url hashrefs (possibly empty). 3051: # 3052: # Side effects: 3053: # Network I/O per unique hostname: one A/AAAA lookup, one RDAP/WHOIS. 3054: # Results stored in the CHI cross-message cache if available. 3055: 3056: sub _extract_and_resolve_urls { ●3057 → 3067 → 3073●3057 → 3067 → 0 3057: my $self = $_[0]; 3058: my (%url_seen, %host_cache); 3059: my @results; 3060: my $combined = $self->{_body_plain} . "\n" . $self->{_body_html}; 3061: 3062: # Collect unique URLs from body 3063: my @urls = grep { !$url_seen{$_}++ } $self->_extract_http_urls($combined); 3064: 3065: # Extract unique hostnames for parallel DNS resolution 3066: my %hostname_needed; 3067: for my $url (@urls) { 3068: my ($host) = $url =~ m{https?://([^/:?\s#]+)}i; 3069: $hostname_needed{$host}++ if $host; 3070: } 3071: 3072: # Parallelise DNS lookups if AnyEvent::DNS is available ●3073 → 3073 → 3078●3073 → 3073 → 0 3073: if ($HAS_ANYEVENT_DNS && scalar(keys %hostname_needed) > 1) {

Mutants (Total: 4, Killed: 0, Survived: 4)

NUM_BOUNDARY_3073_57_<: Numeric boundary flip > to <
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_3073_57_>=: Numeric boundary flip > to >=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_3073_57_<=: Numeric boundary flip > to <=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
COND_INV_3073_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3074: $self->_parallel_resolve_hosts(\%hostname_needed, \%host_cache); 3075: } 3076: 3077: # Process each URL: resolve hostname and WHOIS-enrich ●3078 → 3078 → 3116●3078 → 3078 → 0 3078: for my $url (@urls) { 3079: my ($host) = $url =~ m{https?://([^/:?\s#]+)}i; 3080: next unless $host; 3081: 3082: # Resolve and WHOIS once per unique hostname, then cache the result

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3083_3: Invert condition unless to if
MEDIUM: Add tests asserting both true and false outcomes

3084: # Check the cross-message CHI cache first 3085: my $cached = $_cache ? $_cache->get("url:$host") : undef;

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3086_4: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3087: $host_cache{$host} = $cached; 3088: } else { 3089: my $ip = $self->_resolve_host($host) // '(unresolved)'; 3090: my $whois = $ip ne '(unresolved)' 3091: ? $self->_whois_ip($ip) 3092: : {}; 3093: 3094: # Fall back to domain WHOIS if IP lookup returned nothing

Mutants (Total: 1, Killed: 1, Survived: 0)

3096: my $reg = _registrable($host) // $host; 3097: my $dw = $self->_parse_domain_whois_abuse($reg); 3098: $whois = $dw if $dw->{abuse}; 3099: } 3100: 3101: my $entry = { 3102: ip => $ip, 3103: org => $whois->{org} // '(unknown)', 3104: abuse => $whois->{abuse} // '(unknown)', 3105: country => $whois->{country} // undef, 3106: }; 3107: $host_cache{$host} = $entry; 3108: 3109: # Store in cross-message cache for reuse across messages 3110: $_cache->set("url:$host", $entry) if $_cache; 3111: } 3112: } 3113: 3114: push @results, { url => $url, host => $host, %{ $host_cache{$host} } }; 3115: } ●3116 → 3116 → 0 3116: return \@results; 3117: } 3118: 3119: # _parallel_resolve_hosts( \%hostnames, \%cache ) 3120: # 3121: # Purpose: 3122: # Resolve multiple hostnames to IPs in parallel using AnyEvent::DNS. 3123: # Populates the cache with resolved IPs so the sequential loop in 3124: # _extract_and_resolve_urls() can skip the DNS step for pre-resolved hosts. 3125: # 3126: # Entry criteria: 3127: # $hostnames_ref -- hashref keyed by hostname (values ignored). 3128: # $cache_ref -- hashref to populate with { ip => '...' } results. 3129: # AnyEvent::DNS must be installed ($HAS_ANYEVENT_DNS is true). 3130: # 3131: # Exit status: 3132: # Returns undef; all results written to %$cache_ref via side effects. 3133: # 3134: # Notes: 3135: # Errors (NXDOMAIN, timeout) are silently swallowed; the sequential 3136: # resolution loop will return '(unresolved)' for those hosts. 3137: 3138: sub _parallel_resolve_hosts { ●3139 → 3146 → 3163●3139 → 3146 → 0 3139: my ($self, $hostnames_ref, $cache_ref) = @_; 3140: return unless $HAS_ANYEVENT_DNS; 3141: 3142: # Build an AnyEvent condvar to wait for all lookups to complete 3143: my $cv = AnyEvent->condvar; 3144: my $pending = scalar keys %$hostnames_ref; 3145: 3146: for my $host (keys %$hostnames_ref) { 3147: # Fire an async A (and AAAA) query for each hostname 3148: AnyEvent::DNS::resolve( 3149: $host, 'A', 3150: sub { 3151: my @answers = @_; 3152: if (@answers) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3152_5: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3153: # Cache the first A record result 3154: $cache_ref->{$host} = { ip => $answers[0][4] }; 3155: } 3156: # Decrement the pending counter; signal when all done

Mutants (Total: 3, Killed: 0, Survived: 3)

NUM_BOUNDARY_3157_29_<: Numeric boundary flip <= to <
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_3157_29_>: Numeric boundary flip <= to >
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_3157_29_>=: Numeric boundary flip <= to >=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```

3158: }, 3159: ); 3160: } 3161: 3162: # Block until all DNS queries complete (subject to AnyEvent's own timeouts) ●[NOT COVERED] 3163 → 3163 → 0 3163: $cv->recv; 3164: } 3165: 3166: # _extract_http_urls( $body ) -> list of url strings 3167: # 3168: # Purpose: 3169: # Extract all HTTP and HTTPS URLs from a body string, using both 3170: # structural HTML parsing (if HTML::LinkExtor is available) and a 3171: # plain-text regex pass. Deduplicates and strips trailing punctuation. 3172: # 3173: # Entry criteria: 3174: # $body -- combined plain+HTML body string. 3175: # 3176: # Exit status: 3177: # Returns a list of URL strings (possibly empty), deduplicated. 3178: 3179: sub _extract_http_urls { ●3180 → 3184 → 3201●3180 → 3184 → 0 3180: my ($self, $body) = @_; 3181: my @urls; 3182: 3183: # Structural HTML link extraction (handles quoted attributes correctly)

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3184_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3185: my $p = HTML::LinkExtor->new(sub { 3186: my ($tag, %attrs) = @_; 3187: for my $attr (qw(href src action)) { 3188: my $val = $attrs{$attr} // '';

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3189_5: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3190: push @urls, $val; 3191: } elsif ($val =~ m{^//[\w.-]}) { 3192: # Protocol-relative -- assume https 3193: push @urls, 'https:' . $val; 3194: } 3195: } 3196: }); 3197: $p->parse($body); 3198: } 3199: 3200: # Plain-text regex pass for bare URLs not in HTML attributes ●3201 → 3201 → 3206●3201 → 3201 → 0 3201: while ($body =~ m{(https?://[^\s<>"'\)\]]+)}gi) { 3202: push @urls, $1; 3203: } 3204: 3205: # Protocol-relative URLs not caught above ●3206 → 3206 → 3211●3206 → 3206 → 0 3206: while ($body =~ m{(?:^|[\s"'=])(//[\w.-][^\s<>"'\)\]]*)}gim) { 3207: push @urls, 'https:' . $1; 3208: } 3209: 3210: # Deduplicate and strip trailing punctuation ●3211 → 3214 → 0 3211: my %seen; 3212: my @all = grep { !$seen{$_}++ } @urls; 3213: s/[.,;:!?\)>\]]+$// for @all;

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_3214_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_3214_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

3215: } 3216: 3217: # ----------------------------------------------------------------------- 3218: # Private: domain extraction and full analysis 3219: # ----------------------------------------------------------------------- 3220: 3221: # _extract_and_analyse_domains() -> arrayref of domain hashrefs 3222: # 3223: # Purpose: 3224: # Collect all non-infrastructure contact domains from headers and body, 3225: # run the full domain intelligence pipeline on each, and return an arrayref 3226: # suitable for storage in $self->{_mailto_domains}. 3227: # 3228: # Entry criteria: 3229: # _split_message() must have been called. 3230: # 3231: # Exit status: 3232: # Always returns an arrayref; never undef. 3233: # 3234: # Side effects: 3235: # Network I/O per domain via _analyse_domain(). 3236: # Results stored in $self->{_domain_info} and CHI cache. 3237: 3238: sub _extract_and_analyse_domains { ●3239 → 3244 → 3254●3239 → 3244 → 0 3239: my $self = $_[0]; 3240: my (%seen, @domains_with_source); 3241: 3242: # Build a set of recipient domains to exclude (victims, not senders) 3243: my %recipient_domains; 3244: for my $hname (qw(to cc)) { 3245: my $val = $self->_header_value($hname) // next; 3246: for my $dom ($self->_domains_from_text($val)) { 3247: my $reg = _registrable($dom) // $dom; 3248: $recipient_domains{$dom}++; 3249: $recipient_domains{$reg}++; 3250: } 3251: } 3252: 3253: # Also exclude domains from Received: "for" envelope recipients ●3254 → 3254 → 3263●3254 → 3254 → 0 3254: for my $hop (@{ $self->{_rcvd_tracking} }) { 3255: next unless $hop->{for} && $hop->{for} =~ /\@([\w.-]+)/; 3256: my $dom = lc $1; 3257: my $reg = _registrable($dom) // $dom; 3258: $recipient_domains{$dom}++; 3259: $recipient_domains{$reg}++; 3260: } 3261: 3262: # Inner closure: record a domain if it passes all filters ●3263 → 3285 → 3292●3263 → 3285 → 0 3263: my $record = sub { 3264: my ($dom, $source) = @_; 3265: $dom = lc $dom; 3266: $dom =~ s/\.$//; 3267: next if $self->{trusted_domains}->{$dom}; 3268: return if $TRUSTED_DOMAINS{$dom}; 3269: return if $recipient_domains{$dom}; 3270: return if $recipient_domains{ _registrable($dom) // $dom }; 3271: # Discard non-routable hostnames (single-label, pseudo-TLDs, etc.) 3272: return unless $dom =~ /\.[a-zA-Z]{2,}$/; 3273: return if $dom =~ /\.(?:local|internal|lan|localdomain|arpa)$/i; 3274: return if $seen{$dom}++; 3275: push @domains_with_source, { domain => $dom, source => $source }; 3276: }; 3277: 3278: # Collect from standard sender/reply headers 3279: my %header_sources = ( 3280: 'from' => 'From: header', 3281: 'reply-to' => 'Reply-To: header', 3282: 'return-path' => 'Return-Path: header', 3283: 'sender' => 'Sender: header', 3284: ); 3285: for my $hname (sort keys %header_sources) { 3286: my $val = $self->_header_value($hname) // next; 3287: $record->($_, $header_sources{$hname}) 3288: for $self->_domains_from_text($val); 3289: } 3290: 3291: # Message-ID domain often reveals the real bulk-sending platform ●3292 → 3293 → 3301●3292 → 3293 → 0 3292: my $mid = $self->_header_value('message-id');

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3293_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3294: my $mid_dom = lc $1; 3295: my $mid_reg = _registrable($mid_dom) // $mid_dom; 3296: $record->($mid_dom, 'Message-ID: header') 3297: unless $TRUSTED_DOMAINS{$mid_dom} || $TRUSTED_DOMAINS{$mid_reg} || $self->{trusted_domains}->{$mid_dom} || $self->{trusted_domains}->{$mid_reg}; 3298: } 3299: 3300: # DKIM signing domain(s) -- the organisation that vouches for the message ●3301 → 3302 → 3307●3301 → 3302 → 0 3301: my $auth = $self->_parse_auth_results_cached(); 3302: for my $dkim_d (@{ $auth->{dkim_domains} // [] }) { 3303: $record->($dkim_d, 'DKIM-Signature: d= (signing domain)'); 3304: } 3305: 3306: # List-Unsubscribe identifies the ESP or bulk sender ●3307 → 3308 → 3318●3307 → 3308 → 0 3307: my $unsub = $self->_header_value('list-unsubscribe');

Mutants (Total: 1, Killed: 1, Survived: 0)

3309: while ($unsub =~ m{https?://([^/:?\s>]+)}gi) { 3310: $record->(lc $1, 'List-Unsubscribe: header'); 3311: } 3312: while ($unsub =~ m{mailto:[^@\s>]+\@([\w.-]+)}gi) { 3313: $record->(lc $1, 'List-Unsubscribe: header'); 3314: } 3315: } 3316: 3317: # Body email addresses (mailto: and bare user@domain forms) ●3318 → 3324 → 3328●3318 → 3324 → 0 3318: my $combined = $self->{_body_plain} . "\n" . $self->{_body_html}; 3319: $record->($_, 'email address / mailto in body') 3320: for $self->_domains_from_text($combined); 3321: 3322: # Run the full intelligence pipeline on each collected domain 3323: my @results; 3324: for my $entry (@domains_with_source) { 3325: my $info = $self->_analyse_domain($entry->{domain}); 3326: push @results, { %$entry, %$info }; 3327: } ●3328 → 3328 → 0 3328: return \@results; 3329: } 3330: 3331: # _domains_from_text( $text ) -> list of domain strings 3332: # 3333: # Purpose: 3334: # Extract unique domain names from mailto: links and bare user@domain 3335: # addresses in a block of text. 3336: # 3337: # Entry criteria: 3338: # $text -- a defined scalar of decoded body or header text. 3339: # 3340: # Exit status: 3341: # Returns a list of lower-cased domain strings (possibly empty). 3342: 3343: sub _domains_from_text { ●3344 → 3348 → 3354●3344 → 3348 → 0 3344: my ($self, $text) = @_; 3345: my (%seen, @out); 3346: 3347: # mailto: links (including HTML-entity-encoded @ signs from QP) 3348: while ($text =~ /mailto:(?:[^@\s<>"]+)@([\w.-]+)/gi) { 3349: my $dom = lc $1; $dom =~ s/\.$//; 3350: push @out, $dom unless $seen{$dom}++; 3351: } 3352: 3353: # Bare user@domain patterns ●3354 → 3354 → 3358●3354 → 3354 → 0 3354: while ($text =~ /\b[\w.+%-]+@([\w.-]+\.[a-zA-Z]{2,})\b/g) { 3355: my $dom = lc $1; $dom =~ s/\.$//; 3356: push @out, $dom unless $seen{$dom}++; 3357: } ●3358 → 3358 → 0 3358: return @out;

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_3358_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_3358_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

3359: } 3360: 3361: # _analyse_domain( $domain ) -> hashref 3362: # 3363: # Purpose: 3364: # Run the complete intelligence pipeline for a single domain: A record 3365: # (web hosting), MX record (mail hosting), NS record (DNS hosting), 3366: # and WHOIS (registrar, creation/expiry dates, abuse contact). 3367: # Each IP is enriched via RDAP/WHOIS. Results are cached per domain 3368: # in $self->{_domain_info} and in the CHI cross-message cache. 3369: # 3370: # Entry criteria: 3371: # $domain -- lower-cased, no trailing dot, not in TRUSTED_DOMAINS. 3372: # $self->{timeout} used for all network operations. 3373: # 3374: # Exit status: 3375: # Always returns a hashref reference; never undef; may be empty ({}). 3376: # Possible keys: web_ip, web_org, web_abuse, mx_host, mx_ip, mx_org, 3377: # mx_abuse, ns_host, ns_ip, ns_org, ns_abuse, registrar, 3378: # registrar_abuse, registered, expires, recently_registered, whois_raw. 3379: # 3380: # Side effects: 3381: # Network I/O; writes result to $self->{_domain_info}{$domain} and CHI. 3382: # 3383: # Notes: 3384: # MX/NS lookups require Net::DNS; absent without it. 3385: # recently_registered is set to 1 (not 0) when the threshold is met. 3386: # whois_raw is truncated to WHOIS_RAW_MAX bytes. 3387: 3388: sub _analyse_domain { ●3389 → 3396 → 3404●3389 → 3396 → 0 3389: my ($self, $domain) = @_; 3390: 3391: # Return the per-message cached result if already analysed

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_3392_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_3392_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

3393: if $self->{_domain_info}{$domain}; 3394: 3395: # Check the cross-message CHI cache before hitting the network

Mutants (Total: 1, Killed: 1, Survived: 0)

3397: my $cached = $_cache->get("dom:$domain"); 3398: if ($cached) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3398_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3399: $self->{_domain_info}{$domain} = $cached;

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_3400_4: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_3400_4: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

3401: } 3402: } 3403: ●3404 → 3409 → 3417●3404 → 3409 → 0 3404: $self->_debug("Analysing domain: $domain"); 3405: my %info; 3406: 3407: # --- A record -> web hosting IP --- 3408: my $web_ip = $self->_resolve_host($domain);

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3409_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3410: $info{web_ip} = $web_ip; 3411: my $w = $self->_whois_ip($web_ip); 3412: $info{web_org} = $w->{org} if $w->{org}; 3413: $info{web_abuse} = $w->{abuse} if $w->{abuse}; 3414: } 3415: 3416: # MX and NS lookups require Net::DNS

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3417_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3418: my $res = Net::DNS::Resolver->new( 3419: tcp_timeout => $self->{timeout}, 3420: udp_timeout => $self->{timeout}, 3421: ); 3422: 3423: # --- MX record -> mail hosting --- 3424: my $mxq = $res->search($domain, 'MX');

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3425_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3426: my ($best) = sort { $a->preference <=> $b->preference } 3427: grep { $_->type eq 'MX' } $mxq->answer;

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3428_4: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3429: (my $mx_host = lc $best->exchange) =~ s/\.$//; 3430: $info{mx_host} = $mx_host; 3431: my $mx_ip = $self->_resolve_host($mx_host);

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3432_5: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3433: $info{mx_ip} = $mx_ip; 3434: my $mw = $self->_whois_ip($mx_ip); 3435: $info{mx_org} = $mw->{org} if $mw->{org}; 3436: $info{mx_abuse} = $mw->{abuse} if $mw->{abuse}; 3437: } 3438: } 3439: } 3440: 3441: # --- NS record -> DNS hosting --- 3442: my $nsq = $res->search($domain, 'NS');

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3443_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3444: my ($first) = grep { $_->type eq 'NS' } $nsq->answer;

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3445_4: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3446: (my $ns_host = lc $first->nsdname) =~ s/\.$//; 3447: $info{ns_host} = $ns_host; 3448: my $ns_ip = $self->_resolve_host($ns_host);

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3449_5: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3450: $info{ns_ip} = $ns_ip; 3451: my $nw = $self->_whois_ip($ns_ip); 3452: $info{ns_org} = $nw->{org} if $nw->{org}; 3453: $info{ns_abuse} = $nw->{abuse} if $nw->{abuse}; 3454: } 3455: } 3456: } 3457: } 3458: 3459: # --- Domain WHOIS -> registrar + dates --- ●3460 → 3461 → 3513●3460 → 3461 → 0 3460: my $domain_whois = $self->_domain_whois($domain);

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3461_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3462: # Truncate raw WHOIS for storage but parse structured fields from full text 3463: $info{whois_raw} = substr($domain_whois, 0, $WHOIS_RAW_MAX); 3464: 3465: # Registrar name

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3466_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3467: ($info{registrar} = $1) =~ s/\s+$//; 3468: } 3469: 3470: # Registrar abuse contact email (try multiple field names) 3471: for my $pat ( 3472: qr/Registrar Abuse Contact Email:\s*(\S+@\S+)/i, 3473: qr/Abuse Contact Email:\s*(\S+@\S+)/i, 3474: qr/abuse-contact:\s*(\S+@\S+)/i, 3475: ) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3476_4: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3477: ($info{registrar_abuse} = $1) =~ s/\s+$//; 3478: } 3479: } 3480: 3481: # Domain creation date (multiple registrar field name variations) 3482: for my $pat ( 3483: qr/Creation Date:\s*(\S+)/i, 3484: qr/Created(?:\s+On)?:\s*(\S+)/i, 3485: qr/Registration Time:\s*(\S+)/i, 3486: qr/^registered:\s*(\S+)/im, 3487: ) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3488_4: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3489: ($info{registered} = $1) =~ s/[TZ].*//; 3490: } 3491: } 3492: 3493: # Domain expiry date 3494: for my $pat ( 3495: qr/Registry Expiry Date:\s*(\S+)/i, 3496: qr/Expir(?:y|ation)(?: Date)?:\s*(\S+)/i, 3497: qr/paid-till:\s*(\S+)/i, 3498: ) {

Mutants (Total: 1, Killed: 1, Survived: 0)

3500: ($info{expires} = $1) =~ s/[TZ].*//; 3501: } 3502: } 3503: 3504: # Flag recently-registered domains (< RECENT_REG_DAYS old) 3505: if ($info{registered}) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3505_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3506: my $epoch = $self->_parse_date_to_epoch($info{registered}); 3507: $info{recently_registered} = 1

Mutants (Total: 3, Killed: 0, Survived: 3)

NUM_BOUNDARY_3508_36_>: Numeric boundary flip < to >
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_3508_36_<=: Numeric boundary flip < to <=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_3508_36_>=: Numeric boundary flip < to >=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```

3509: } 3510: } 3511: 3512: # Store in per-message and cross-message caches ●3513 → 3516 → 0 3513: $self->{_domain_info}{$domain} = \%info; 3514: $_cache->set("dom:$domain", \%info) if $_cache; 3515: 3516: return \%info; 3517: } 3518: 3519: # ----------------------------------------------------------------------- 3520: # Private: DNS helpers 3521: # ----------------------------------------------------------------------- 3522: 3523: # _resolve_host( $host ) -> ip_string | undef 3524: # 3525: # Purpose: 3526: # Resolve a hostname to an IPv4 (or IPv6) address. Uses Net::DNS for 3527: # both A and AAAA queries when available; falls back to inet_aton for 3528: # pure IPv4 resolution. 3529: # 3530: # Entry criteria: 3531: # $host -- hostname string or already-numeric IP. 3532: # 3533: # Exit status: 3534: # Returns the first resolved IP string, or undef on failure. 3535: # 3536: # Notes: 3537: # When the input is already a dotted-quad IPv4 it is returned immediately. 3538: # AAAA records are tried if the A query fails and Net::DNS is available. 3539: 3540: sub _resolve_host { ●3541 → 3545 → 3550●3541 → 3545 → 0 3541: my ($self, $host) = @_;

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_3542_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_3542_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

3543: 3544: # Check the CHI cache before hitting DNS

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3545_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3546: my $cached_ip = $_cache->get("resolve:$host");

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_3547_3: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_3547_3: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

3548: } 3549: ●3550 → 3552 → 3581●3550 → 3552 → 0 3550: my $ip; 3551:

Mutants (Total: 1, Killed: 1, Survived: 0)

3553: my $res = Net::DNS::Resolver->new( 3554: tcp_timeout => $self->{timeout}, 3555: udp_timeout => $self->{timeout}, 3556: ); 3557: 3558: # Try A record first, then AAAA for IPv6 3559: for my $type (qw(A AAAA)) { 3560: my $query = $res->search($host, $type); 3561: if ($query) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3561_4: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3562: for my $rr ($query->answer) {

Mutants (Total: 1, Killed: 1, Survived: 0)

3564: $ip = $rr->address; 3565: last; 3566: } elsif ($rr->type eq 'AAAA') { 3567: $ip = $rr->address; 3568: last; 3569: } 3570: } 3571: } 3572: last if defined $ip; 3573: } 3574: } else { 3575: # Fallback: gethostbyname (IPv4 only) 3576: my $packed = eval { inet_aton($host) }; 3577: $ip = $packed ? inet_ntoa($packed) : undef; 3578: } 3579: 3580: # Cache the result (including undef as '' to avoid repeated failed lookups) ●3581 → 3581 → 3585●3581 → 3581 → 0 3581: if ($_cache) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3581_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3582: $_cache->set("resolve:$host", $ip // ''); 3583: } 3584:

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_3585_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_3585_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

3586: } 3587: 3588: # _reverse_dns( $ip ) -> hostname | undef 3589: # 3590: # Purpose: 3591: # Perform a PTR (reverse DNS) lookup for an IP address. Supports both 3592: # IPv4 and IPv6 via Net::DNS when available; falls back to gethostbyaddr. 3593: # 3594: # Entry criteria: 3595: # $ip -- a defined IPv4 or IPv6 address string. 3596: # 3597: # Exit status: 3598: # Returns the PTR hostname string, or undef if no record exists. 3599: 3600: sub _reverse_dns { ●3601 → 3604 → 3616●3601 → 3604 → 0 3601: my ($self, $ip) = @_;

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_3602_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_3602_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

3603:

Mutants (Total: 1, Killed: 1, Survived: 0)

3605: my $res = Net::DNS::Resolver->new(tcp_timeout => $self->{timeout}); 3606: my $query = $res->search($ip, 'PTR'); 3607: if ($query) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3607_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3608: for my $rr ($query->answer) {

Mutants (Total: 2, Killed: 2, Survived: 0)

3610: } 3611: } 3612: return undef;

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_3612_3: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_3612_3: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

3613: } 3614: 3615: # Fallback for IPv4 only ●3616 → 3616 → 0 3616: return scalar gethostbyaddr(inet_aton($ip), AF_INET); 3617: } 3618: 3619: # ----------------------------------------------------------------------- 3620: # Private: WHOIS / RDAP 3621: # ----------------------------------------------------------------------- 3622: 3623: # _whois_ip( $ip ) -> hashref 3624: # 3625: # Purpose: 3626: # Enrich an IP address with organisation name, abuse contact, and country 3627: # code. Tries RDAP first (if LWP is available), then falls back to raw 3628: # WHOIS via IANA referral. Results are cached in CHI if available. 3629: # 3630: # Entry criteria: 3631: # $ip -- a defined IPv4 or IPv6 address string. 3632: # 3633: # Exit status: 3634: # Returns { org, abuse, country } hashref; keys absent when unknown. 3635: 3636: sub _whois_ip { ●3637 → 3640 → 3645●3637 → 3640 → 0 3637: my ($self, $ip) = @_; 3638: 3639: # Check CHI cache before going to the network

Mutants (Total: 1, Killed: 1, Survived: 0)

3641: my $cached = $_cache->get("whois_ip:$ip"); 3642: return $cached if $cached;

Mutants (Total: 2, Killed: 2, Survived: 0)

3643: } 3644: ●3645 → 3648 → 3658●3645 → 3648 → 0 3645: my $result = $HAS_LWP ? $self->_rdap_lookup($ip) : {}; 3646: 3647: # Fall back to raw WHOIS if RDAP returned no organisation 3648: unless ($result->{org}) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3648_2: Invert condition unless to if
MEDIUM: Add tests asserting both true and false outcomes

3649: my $raw = $self->_raw_whois($ip, 'whois.iana.org');

Mutants (Total: 1, Killed: 1, Survived: 0)

3651: my ($ref) = $raw =~ /whois:\s*([\w.-]+)/i; 3652: my $detail = $ref ? $self->_raw_whois($ip, $ref) : $raw; 3653: $result = $self->_parse_whois_text($detail) if $detail; 3654: } 3655: } 3656: 3657: # Cache the enrichment result ●3658 → 3660 → 0 3658: $_cache->set("whois_ip:$ip", $result) if $_cache && $result; 3659: 3660: return $result;

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_3660_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_3660_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

3661: } 3662: 3663: # _domain_whois( $domain ) -> raw_whois_string | undef 3664: # 3665: # Purpose: 3666: # Perform a two-step WHOIS lookup for a domain: first ask IANA for the 3667: # TLD's authoritative WHOIS server, then query that server. 3668: # 3669: # Entry criteria: 3670: # $domain -- a lower-cased domain name string. 3671: # 3672: # Exit status: 3673: # Returns the raw WHOIS response string, or undef on failure. 3674: 3675: sub _domain_whois { 3676: my ($self, $domain) = @_; 3677: my $iana = $self->_raw_whois($domain, 'whois.iana.org') // return undef; 3678: my ($server) = $iana =~ /whois:\s*([\w.-]+)/i;

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_3679_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_3679_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_3680_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_3680_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

3681: } 3682: 3683: # _parse_domain_whois_abuse( $domain ) -> hashref 3684: # 3685: # Purpose: 3686: # Lightweight domain WHOIS lookup to extract only registrar name and 3687: # abuse contact. Used as a fallback in _extract_and_resolve_urls() when 3688: # a URL host cannot be resolved to an IP. 3689: # 3690: # Entry criteria: 3691: # $domain -- a registrable domain name string. 3692: # 3693: # Exit status: 3694: # Returns { org, abuse } hashref; empty hashref on failure. 3695: 3696: sub _parse_domain_whois_abuse { ●3697 → 3700 → 3704●3697 → 3700 → 0 3697: my ($self, $domain) = @_; 3698: my $raw = $self->_domain_whois($domain) // return {}; 3699: my %info;

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3700_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3701: ($info{org} = $1) =~ s/\s+$//; 3702: } 3703: # Try multiple field name patterns for the abuse email ●3704 → 3704 → 3713●3704 → 3704 → 0 3704: for my $pat ( 3705: qr/Registrar Abuse Contact Email:\s*(\S+\@\S+)/i, 3706: qr/Abuse Contact Email:\s*(\S+\@\S+)/i, 3707: qr/abuse-contact:\s*(\S+\@\S+)/i, 3708: ) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3709_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3710: ($info{abuse} = $1) =~ s/\s+$//; 3711: } 3712: } ●3713 → 3713 → 0 3713: return \%info; 3714: } 3715: 3716: # _rdap_lookup( $ip ) -> hashref 3717: # 3718: # Purpose: 3719: # Query the ARIN RDAP API for IP block ownership information. RDAP is 3720: # preferred over raw WHOIS because it returns structured JSON. 3721: # 3722: # Entry criteria: 3723: # $ip -- a defined IPv4 or IPv6 address string. 3724: # LWP::UserAgent must be installed. 3725: # 3726: # Exit status: 3727: # Returns { org, abuse, country } hashref; empty hashref on failure. 3728: 3729: sub _rdap_lookup { ●[NOT COVERED] 3730 → 3734 → 3752●[NOT COVERED] 3730 → 3734 → 0 3730: my ($self, $ip) = @_; 3731: return {} unless $HAS_LWP; 3732: 3733: my $ua = $self->{ua};

Mutants (Total: 1, Killed: 1, Survived: 0)

3735: $ua = LWP::UserAgent->new( 3736: timeout => $self->{timeout}, 3737: agent => "Email-Abuse-Investigator/$VERSION", 3738: ); 3739: 3740: if($HAS_CONN_CACHE) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3740_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3741: my $conn_cache = LWP::ConnCache->new(); 3742: $conn_cache->total_capacity(10); 3743: $ua->conn_cache($conn_cache); 3744: } 3745: 3746: $ua->env_proxy(1); 3747: $self->{ua} = $ua; 3748: } 3749: 3750: # Use the ARIN RDAP endpoint; it covers the ARIN region and redirects 3751: # for RIPE/APNIC/LACNIC/AfriNIC allocations. ●[NOT COVERED] 3752 → 3763 → 3770●[NOT COVERED] 3752 → 3763 → 0 3752: my $res = eval { $ua->get("https://rdap.arin.net/registry/ip/$ip") }; 3753: return {} unless $res && $res->is_success(); 3754: 3755: my $j = $res->decoded_content(); 3756: my %info; 3757: 3758: # Extract organisation name from the JSON response 3759: $info{org} = $1 if $j =~ /"name"\s*:\s*"([^"]+)"/; 3760: $info{handle} = $1 if $j =~ /"handle"\s*:\s*"([^"]+)"/; 3761: 3762: # Extract abuse email from the vcardArray contact block

Mutants (Total: 1, Killed: 1, Survived: 0)

3764: $info{abuse} = $1; 3765: } elsif ($j =~ /"email"\s*:\s*"([^@"]+@[^"]+)"/) { 3766: $info{abuse} = $1; 3767: } 3768: 3769: # Country code from the network's country field ●[NOT COVERED] 3770 → 3772 → 0 3770: $info{country} = $1 if $j =~ /"country"\s*:\s*"([A-Z]{2})"/; 3771: 3772: return \%info; 3773: } 3774: 3775: # _raw_whois( $query, $server ) -> string | undef 3776: # 3777: # Purpose: 3778: # Open a TCP connection to a WHOIS server on port 43, send the query, 3779: # and return the full response as a string. Uses IO::Select for read 3780: # timeouts so that alarm() is never needed (alarm() is unreliable on 3781: # Windows and in threaded Perl). Supports IPv6 WHOIS servers via 3782: # IO::Socket::IP when that module is available. 3783: # 3784: # Entry criteria: 3785: # $query -- the domain name or IP to query (defined, non-empty). 3786: # $server -- the WHOIS server hostname (default: 'whois.iana.org'). 3787: # $self->{timeout} -- seconds used for connect and per-read waits. 3788: # 3789: # Exit status: 3790: # Returns the raw WHOIS response string, or undef on connection/write failure. 3791: # 3792: # Notes: 3793: # Uses IO::Socket::IP (dual-stack) when available, falling back to 3794: # IO::Socket::INET (IPv4 only) otherwise. The IO::Select loop reads 3795: # until the server closes the connection or the per-read timeout expires. 3796: 3797: sub _raw_whois { ●3798 → 3826 → 3838●3798 → 3826 → 0 3798: my ($self, $query, $server) = @_; 3799: $server //= 'whois.iana.org'; 3800: $self->_debug("WHOIS $server -> $query"); 3801: 3802: # Choose the socket class based on what is installed. 3803: # IO::Socket::IP supports both IPv4 and IPv6 WHOIS servers. 3804: my $sock_class = $HAS_IO_SOCKET_IP ? 'IO::Socket::IP' : 'IO::Socket::INET'; 3805: 3806: # Attempt TCP connection to port 43 on the WHOIS server 3807: my $sock = eval { 3808: $sock_class->new( 3809: PeerAddr => $server, 3810: PeerPort => $WHOIS_PORT, 3811: Proto => 'tcp', 3812: Timeout => $self->{timeout}, 3813: ); 3814: }; 3815: return undef unless $sock;

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_3815_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_3815_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

3816: 3817: # Send the WHOIS query in wire format (CRLF-terminated per RFC 3912)

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_3818_53: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_3818_53: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

3819: 3820: # Use IO::Select to implement per-read timeouts without alarm() 3821: my $sel = IO::Select->new($sock); 3822: my $response = ''; 3823: my $buf = ''; 3824: 3825: # Read until EOF (server closes) or timeout 3826: while ($sel->can_read($self->{timeout})) { 3827: # Wrap in eval to catch 'Connection reset by peer' thrown by Fatal/autodie 3828: my $n = eval { sysread($sock, $buf, $WHOIS_READ_CHUNK) }; 3829:

Mutants (Total: 4, Killed: 1, Survived: 3)

NUM_BOUNDARY_3830_31_<: Numeric boundary flip <= to <
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_3830_31_>: Numeric boundary flip <= to >
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_3830_31_>=: Numeric boundary flip <= to >=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```

3831: $self->_debug("WHOIS read failed: $@") if $@; 3832: last; 3833: }

Mutants (Total: 3, Killed: 0, Survived: 3)

NUM_BOUNDARY_3834_32_<: Numeric boundary flip > to <
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_3834_32_>=: Numeric boundary flip > to >=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_3834_32_<=: Numeric boundary flip > to <=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```

3835: $response .= $buf; 3836: } 3837: ●3838 → 3839 → 0 3838: $sock->close();

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_3839_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_3839_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

3840: } 3841: 3842: # _parse_whois_text( $text ) -> hashref 3843: # 3844: # Purpose: 3845: # Parse a raw WHOIS IP block response to extract organisation name, 3846: # abuse contact email, and country code. 3847: # 3848: # Entry criteria: 3849: # $text -- a defined WHOIS response string. 3850: # 3851: # Exit status: 3852: # Returns { org, abuse, country } hashref; keys absent when not found. 3853: 3854: sub _parse_whois_text { ●3855 → 3860 → 3870●3855 → 3860 → 0 3855: my ($self, $text) = @_; 3856: return {} unless $text; 3857: my %info; 3858: 3859: # Try multiple field names for the organisation name 3860: for my $pat ( 3861: qr/^OrgName:\s*(.+)/mi, qr/^org-name:\s*(.+)/mi, 3862: qr/^owner:\s*(.+)/mi, qr/^descr:\s*(.+)/mi, 3863: ) {

Mutants (Total: 1, Killed: 1, Survived: 0)

3865: ($info{org} = $1) =~ s/\s+$//; 3866: } 3867: } 3868: 3869: # Try multiple field names for the abuse email ●3870 → 3870 → 3880●3870 → 3870 → 0 3870: for my $pat ( 3871: qr/OrgAbuseEmail:\s*(\S+@\S+)/mi, 3872: qr/abuse-mailbox:\s*(\S+@\S+)/mi, 3873: ) { 3874: if (!$info{abuse} && $text =~ $pat) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3874_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3875: ($info{abuse} = $1) =~ s/\s+$//; 3876: } 3877: } 3878: 3879: # Last-resort: any abuse@ address in the response ●3880 → 3883 → 3886●3880 → 3883 → 0 3880: $info{abuse} //= $1 if $text =~ /(abuse\@[\w.-]+)/i; 3881: 3882: # Country code (case-insensitive match, normalised to uppercase)

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3883_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3884: $info{country} = uc $1; 3885: } ●3886 → 3886 → 0 3886: return \%info; 3887: } 3888: 3889: # ----------------------------------------------------------------------- 3890: # Private: authentication results parsing 3891: # ----------------------------------------------------------------------- 3892: 3893: # _parse_auth_results_cached() -> hashref 3894: # 3895: # Purpose: 3896: # Parse the Authentication-Results: header(s) from the message once, 3897: # cache the result, and return it. Extracts SPF, DKIM, DMARC, ARC 3898: # results and the DKIM signing domain(s). 3899: # 3900: # Entry criteria: 3901: # $self->{_headers} populated by _split_message(). 3902: # 3903: # Exit status: 3904: # Returns { spf, dkim, dmarc, arc, dkim_domain, dkim_domains } hashref. 3905: # Keys absent when the corresponding header or field is not present. 3906: 3907: sub _parse_auth_results_cached { ●3908 → 3927 → 3933●3908 → 3927 → 0 3908: my $self = $_[0];

Mutants (Total: 2, Killed: 2, Survived: 0)

3910: 3911: my %auth; 3912: 3913: # Concatenate all Authentication-Results: header values 3914: my $raw = join('; ', 3915: map { $_->{value} } 3916: grep { $_->{name} eq 'authentication-results' } 3917: @{ $self->{_headers} } 3918: ); 3919: 3920: # Extract individual authentication mechanism results 3921: $auth{spf} = $1 if $raw =~ /\bspf=(\S+)/i; 3922: $auth{dkim} = $1 if $raw =~ /\bdkim=(\S+)/i; 3923: $auth{dmarc} = $1 if $raw =~ /\bdmarc=(\S+)/i; 3924: $auth{arc} = $1 if $raw =~ /\barc=(\S+)/i; 3925: 3926: # Strip trailing punctuation captured by the greedy \S+ 3927: for my $k (qw(spf dkim dmarc arc)) { 3928: $auth{$k} =~ s/[;,\s]+$// if defined $auth{$k}; 3929: } 3930: 3931: # Extract DKIM signing domains from all DKIM-Signature: d= tags. 3932: # Prefer the first domain that matches the provider table (identifies ESP). ●3933 → 3934 → 3940●3933 → 3934 → 0 3933: my @dkim_domains; 3934: for my $h (grep { $_->{name} eq 'dkim-signature' } @{ $self->{_headers} }) { 3935: if ($h->{value} =~ /\bd=([^;,\s]+)/) {

Mutants (Total: 1, Killed: 1, Survived: 0)

3936: push @dkim_domains, lc $1; 3937: } 3938: } 3939: ●3940 → 3940 → 3953●3940 → 3940 → 0 3940: if (@dkim_domains) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3940_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3941: # Check if any signing domain matches a known provider 3942: my $preferred; 3943: for my $d (@dkim_domains) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_3944_4: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

3945: $preferred = $d; 3946: last; 3947: } 3948: } 3949: $auth{dkim_domain} = $preferred // $dkim_domains[0]; 3950: $auth{dkim_domains} = \@dkim_domains; 3951: } 3952: ●3953 → 3954 → 0 3953: $self->{_auth_results} = \%auth; 3954: return \%auth; 3955: } 3956: 3957: # ----------------------------------------------------------------------- 3958: # Private: provider-table lookups 3959: # ----------------------------------------------------------------------- 3960: 3961: # _provider_abuse_for_host( $host ) -> hashref | undef 3962: # 3963: # Purpose: 3964: # Look up a hostname (and each of its parent domains, stripping one label 3965: # at a time from the left) in the %PROVIDER_ABUSE table. 3966: # 3967: # Entry criteria: 3968: # $host -- a defined hostname or domain string. 3969: # 3970: # Exit status: 3971: # Returns the %PROVIDER_ABUSE entry hashref on match, undef otherwise. 3972: 3973: sub _provider_abuse_for_host { ●3974 → 3977 → 3982●3974 → 3977 → 0 3974: my ($self, $host) = @_; 3975: $host = lc $host; 3976: # Strip successive subdomains until we find a match or exhaust labels 3977: while ($host =~ /\./) {

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_3978_3: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_3978_3: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

3979: return $PROVIDER_ABUSE{$host} if $PROVIDER_ABUSE{$host}; 3980: $host =~ s/^[^.]+\.//; 3981: }

Mutants (Total: 2, Killed: 2, Survived: 0)

3983: } 3984: 3985: # _provider_abuse_for_ip( $ip, $rdns ) -> hashref | undef 3986: # 3987: # Purpose: 3988: # Look up an IP's reverse-DNS hostname in the %PROVIDER_ABUSE table to 3989: # identify well-known provider networks by rDNS pattern. 3990: # 3991: # Entry criteria: 3992: # $ip -- IPv4 or IPv6 address string (used as fallback if $rdns absent). 3993: # $rdns -- optional rDNS hostname string. 3994: # 3995: # Exit status: 3996: # Returns the %PROVIDER_ABUSE entry on match, undef otherwise. 3997: 3998: sub _provider_abuse_for_ip { 3999: my ($self, $ip, $rdns) = @_; 4000: return $self->_provider_abuse_for_host($rdns) if $rdns;

Mutants (Total: 2, Killed: 2, Survived: 0)

4001: return undef;

Mutants (Total: 2, Killed: 2, Survived: 0)

4002: } 4003: 4004: # ----------------------------------------------------------------------- 4005: # Private: eTLD+1 normalisation 4006: # ----------------------------------------------------------------------- 4007: 4008: # _registrable( $host ) -> string | undef 4009: # 4010: # Purpose: 4011: # Return the registrable eTLD+1 form of a hostname. Uses 4012: # Domain::PublicSuffix when installed for accurate results; falls back 4013: # to a built-in heuristic for the common two-letter ccTLD+2 pattern. 4014: # 4015: # Entry criteria: 4016: # $host -- a hostname string (may include subdomains). 4017: # 4018: # Exit status: 4019: # Returns the registrable domain string, or undef for single-label 4020: # hostnames (e.g. 'localhost'). 4021: # 4022: # Notes: 4023: # The heuristic handles co.uk, com.au, net.jp, org.nz etc. but not 4024: # uncommon second-level delegations like ltd.uk or plc.uk. 4025: 4026: sub _registrable { ●4027 → 4031 → 4038●4027 → 4031 → 0 4027: my ($host) = @_; 4028: return undef unless $host && $host =~ /\./;

Mutants (Total: 2, Killed: 2, Survived: 0)

4029: 4030: # Use Domain::PublicSuffix for accurate PSL-based normalisation 4031: if ($HAS_PUBLIC_SUFFIX) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_4031_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

4032: my $psl = Domain::PublicSuffix->new(); 4033: my $root = $psl->get_root_domain(lc $host);

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_4034_3: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_4034_3: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

4035: } 4036: 4037: # Built-in heuristic fallback ●4038 → 4042 → 4046●4038 → 4042 → 0 4038: my @labels = split /\./, lc $host;

Mutants (Total: 5, Killed: 0, Survived: 5)

NUM_BOUNDARY_4039_26_<: Numeric boundary flip <= to <
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_4039_26_>: Numeric boundary flip <= to >
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_4039_26_>=: Numeric boundary flip <= to >=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
BOOL_NEGATE_4039_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_4039_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

4040: 4041: # Detect common ccTLD second-level patterns (e.g. co.uk, com.au)

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_4042_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

4043: $labels[-2] =~ /^(?:co|com|net|org|gov|edu|ac|me)$/) { 4044: return join('.', @labels[-3..-1]); 4045: } ●4046 → 4046 → 0 4046: return join('.', @labels[-2..-1]); 4047: } 4048: 4049: # ----------------------------------------------------------------------- 4050: # Private: utilities 4051: # ----------------------------------------------------------------------- 4052: 4053: # _enrich_ip( $ip, $confidence, $note ) -> origin hashref 4054: # 4055: # Purpose: 4056: # Perform rDNS and WHOIS/RDAP for a single IP and package the results 4057: # into the standard origin hashref returned by originating_ip(). 4058: # 4059: # Entry criteria: 4060: # $ip -- a defined, non-private IPv4 or IPv6 address string. 4061: # $confidence -- 'high', 'medium', or 'low'. 4062: # $note -- human-readable explanation of why this IP was chosen. 4063: # 4064: # Exit status: 4065: # Returns { ip, rdns, org, abuse, country, confidence, note } hashref. 4066: 4067: sub _enrich_ip { 4068: my ($self, $ip, $confidence, $note) = @_; 4069: my $rdns = $self->_reverse_dns($ip); 4070: my $whois = $self->_whois_ip($ip); 4071: return { 4072: ip => $ip, 4073: rdns => $rdns // '(no reverse DNS)', 4074: org => $whois->{org} // '(unknown)', 4075: abuse => $whois->{abuse} // '(unknown)', 4076: country => $whois->{country} // undef, 4077: confidence => $confidence, 4078: note => $note, 4079: }; 4080: } 4081: 4082: # _header_value( $name ) -> value_string | undef 4083: # 4084: # Purpose: 4085: # Return the value of the first header matching the given lower-cased 4086: # header name. 4087: # 4088: # Entry criteria: 4089: # $name -- a lower-cased header name string. 4090: # $self->{_headers} populated by _split_message(). 4091: # 4092: # Exit status: 4093: # Returns the value string, or undef if the header is not present. 4094: 4095: sub _header_value { ●4096 → 4097 → 4100●4096 → 4097 → 0 4096: my ($self, $name) = @_; 4097: for my $h (@{ $self->{_headers} }) {

Mutants (Total: 2, Killed: 2, Survived: 0)

4099: } ●4100 → 4100 → 0 4100: return undef;

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_4100_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_4100_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

4101: } 4102: 4103: # _ip_in_cidr( $ip, $cidr ) -> bool 4104: # 4105: # Purpose: 4106: # Test whether an IPv4 address falls within a CIDR block or is an exact 4107: # match (when $cidr contains no '/' separator). 4108: # 4109: # Entry criteria: 4110: # $ip -- a defined dotted-quad IPv4 address string. 4111: # $cidr -- a CIDR string like '10.0.0.0/8' or an exact IP. 4112: # 4113: # Exit status: 4114: # Returns 1 (true) if the IP is within the CIDR block, 0 otherwise. 4115: 4116: sub _ip_in_cidr { 4117: my ($self, $ip, $cidr) = @_;

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_4118_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_4118_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

4119: my ($net_addr, $prefix) = split m{/}, $cidr;

Mutants (Total: 5, Killed: 0, Survived: 5)

NUM_BOUNDARY_4120_67_<: Numeric boundary flip <= to <
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_4120_67_>: Numeric boundary flip <= to >
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
NUM_BOUNDARY_4120_67_>=: Numeric boundary flip <= to >=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```
BOOL_NEGATE_4120_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_4120_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

4121: 4122: # Compute the network mask and compare masked network addresses 4123: my $mask = ~0 << (32 - $prefix); 4124: my $net_n = unpack 'N', (inet_aton($net_addr) // return 0); 4125: my $ip_n = unpack 'N', (inet_aton($ip) // return 0);

Mutants (Total: 1, Killed: 0, Survived: 1)

NUM_BOUNDARY_4126_25_!=: Numeric boundary flip == to !=
HIGH: Likely missing edge-case test (boundary value)
🧪 Suggested Test
```
# Boundary test suggestion
is( func(VALUE_AT_BOUNDARY), EXPECTED, 'Test boundary behaviour' );
```

4127: } 4128: 4129: # _decode_mime_words( $str ) -> decoded_string 4130: # 4131: # Purpose: 4132: # Decode MIME encoded-words (=?charset?B/Q?...?=) in a header value 4133: # string for human-readable display in reports. 4134: # 4135: # Entry criteria: 4136: # $str -- a defined header value string; may be undef. 4137: # 4138: # Exit status: 4139: # Returns the decoded string, or '' if $str is undef. 4140: 4141: sub _decode_mime_words { 4142: my ($self, $str) = @_;

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_4143_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_4143_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

4144: # Replace each encoded-word with its decoded equivalent 4145: $str =~ s/=\?([^?]+)\?([BbQq])\?([^?]*)\?=/_decode_ew($1,$2,$3)/ge;

Mutants (Total: 2, Killed: 2, Survived: 0)

4147: } 4148: 4149: # _decode_ew( $charset, $enc, $text ) -> decoded_bytes 4150: # 4151: # Purpose: 4152: # Decode a single MIME encoded-word component (base64 or quoted-printable). 4153: # 4154: # Notes: 4155: # Non-UTF-8 charsets return raw bytes; good enough for display-name spoof 4156: # detection which only needs ASCII matching. 4157: 4158: sub _decode_ew { ●4159 → 4161 → 4168●4159 → 4161 → 0 4159: my ($charset, $enc, $text) = @_; 4160: my $raw; 4161: if (uc($enc) eq 'B') {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_4161_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

4162: $raw = decode_base64($text); 4163: } else { 4164: # Quoted-printable encoded-word uses underscore for space 4165: $text =~ s/_/ /g; 4166: $raw = decode_qp($text); 4167: }

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_4168_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_4168_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

4169: } 4170: 4171: # _parse_date_to_epoch( $str ) -> epoch_int | undef 4172: # 4173: # Purpose: 4174: # Parse common WHOIS date strings to a Unix epoch integer. 4175: # Handles YYYY-MM-DD, YYYY-MM-DDThh:mm:ssZ, and DD-Mon-YYYY formats. 4176: # 4177: # Entry criteria: 4178: # $str -- a defined date string; may be undef. 4179: # 4180: # Exit status: 4181: # Returns epoch integer on success, undef if the string cannot be parsed. 4182: 4183: sub _parse_date_to_epoch { ●4184 → 4191 → 4207●4184 → 4191 → 0 4184: my ($self, $str) = @_;

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_4185_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_4185_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

4186: 4187: # Clean the string of trailing whitespace/newlines 4188: $str =~ s/^\s+|\s+$//g; 4189: 4190: # Guard Regex: Validates the strict YYYY-MM-DDThh:mm:ssZ format

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_4191_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

4192: # Parse the string 4193: # We use 'strptime' to create a Time::Piece object. 4194: # The 'Z' indicates UTC (Zulu time). 4195: my $epoch = eval { 4196: my $t = Time::Piece->strptime($1, '%Y-%m-%dT%H:%M:%S'); 4197: 4198: # Return seconds since the epoch 4199: # Time::Piece handles the timezone offset internally when calling ->epoch 4200: 4201: # strptime returns a local time object. 4202: # We must subtract the local timezone offset to get the true UTC epoch.

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_4203_4: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_4203_4: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

4204: };

Mutants (Total: 2, Killed: 2, Survived: 0)

4206: } ●4207 → 4209 → 4213●4207 → 4209 → 0 4207: my ($y, $m, $d); 4208: 4209: if ($str =~ /^(\d{4})-(\d{2})-(\d{2})/) { ($y,$m,$d)=($1,$2,$3) }

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_4209_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

4210: elsif ($str =~ /^(\d{2})-([A-Za-z]{3})-(\d{4})/) { ($d,$m,$y)=($1,$Readonly::Values::Months::months{lc$2}//0,$3) } 4211: elsif ($str =~ /^(\d{2})\/(\d{2})\/(\d{4})/) { ($m,$d,$y)=($1,$2,$3) } 4212:

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_4213_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_4213_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

4214:

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_4215_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

4216: return eval { Time::Local::timegm(0,0,0,$d,$m-1,$y-1900) }; 4217: } 4218: # Approximate fallback without Time::Local ●[NOT COVERED] 4219 → 4219 → 0 4219: return ($y-1970)*365.25*$SECS_PER_DAY + ($m-1)*30.5*$SECS_PER_DAY + ($d-1)*$SECS_PER_DAY; 4220: } 4221: 4222: # _parse_rfc2822_date( $str ) -> epoch_int | undef 4223: # 4224: # Purpose: 4225: # Parse an RFC 2822 Date: header value to a Unix epoch integer. 4226: # Timezone offsets are intentionally ignored; the function returns a 4227: # UTC-equivalent value. For the 7-day suspicious_date window the 4228: # maximum error is ~14 hours, well within the tolerance. 4229: # 4230: # Entry criteria: 4231: # $str -- a defined Date: header value string. 4232: # 4233: # Exit status: 4234: # Returns epoch integer on success, undef if the string cannot be parsed. 4235: 4236: sub _parse_rfc2822_date { ●4237 → 4241 → 4249●4237 → 4241 → 0 4237: my ($str) = @_;

Mutants (Total: 2, Killed: 2, Survived: 0)

4239: 4240: # Match: DD Mon YYYY HH:MM:SS (timezone offset ignored) 4241: if ($str =~ /(\d{1,2})\s+([A-Za-z]{3})\s+(\d{4})\s+(\d{2}):(\d{2}):(\d{2})/) {

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_4241_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

4242: my ($d, $m, $y, $H, $M, $S) = 4243: ($1, $Readonly::Values::Months::months{ lc $2 } // 0, $3, $4, $5, $6);

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_4244_3: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_4244_3: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_4245_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

4246: return eval { Time::Local::timegm($S, $M, $H, $d, $m - 1, $y - 1900) }; 4247: } 4248: }

Mutants (Total: 2, Killed: 0, Survived: 2)

BOOL_NEGATE_4249_2: Negate boolean return expression
MEDIUM: Add tests asserting both true and false outcomes
🧪 Suggested Test
```
# Boolean branch test suggestion
ok( !func(INPUT), 'Verify boolean branch behaviour' );
```
RETURN_UNDEF_4249_2: Replace return expression with undef
LOW: Mutation survived but impact may be minor
🧪 Suggested Test
```
# Return value assertion
is( func(INPUT), EXPECTED, 'Verify correct return value' );
```

4250: } 4251: 4252: # _country_name( $cc ) -> country_name_string 4253: # 4254: # Purpose: 4255: # Return a human-readable country name for a two-letter ISO 3166-1 4256: # alpha-2 country code. Only the small set of statistically high-volume 4257: # spam-originating countries is covered; other codes are returned as-is. 4258: # 4259: # Entry criteria: 4260: # $cc -- a two-letter uppercase country code string. 4261: # 4262: # Exit status: 4263: # Returns the country name string, or the code itself if not in the table. 4264: 4265: sub _country_name { 4266: my ($cc) = @_; 4267: my %names = ( 4268: CN => 'China', RU => 'Russia', NG => 'Nigeria', 4269: VN => 'Vietnam', IN => 'India', PK => 'Pakistan', 4270: BD => 'Bangladesh', 4271: ); 4272: return $names{$cc} // $cc; 4273: } 4274: 4275: # _debug( $msg ) 4276: # 4277: # Purpose: 4278: # Write a diagnostic message to STDERR when verbose mode is enabled. 4279: # 4280: # Entry criteria: 4281: # $msg -- a defined message string. 4282: # 4283: # Notes: 4284: # Messages are prefixed with the class name for easy grepping. 4285: 4286: sub _debug { ●4287 → 4289 → 0 4287: my ($self, $msg) = @_; 4288:

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_4289_2: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

Mutants (Total: 1, Killed: 0, Survived: 1)

COND_INV_4290_3: Invert condition if to unless
MEDIUM: Add tests asserting both true and false outcomes

4291: $logger->debug("[Email::Abuse::Investigator] $msg"); 4292: } else { 4293: print STDERR "[Email::Abuse::Investigator] $msg\n"; 4294: } 4295: } 4296: } 4297: 4298: 1; 4299: 4300: __END__ 4301: 4302: =head1 ALGORITHM: DOMAIN INTELLIGENCE PIPELINE 4303: 4304: For each unique non-infrastructure domain found in the email, the module 4305: runs the following pipeline: 4306: 4307: Domain name 4308: | 4309: +-- A/AAAA record --> web hosting IP --> RDAP --> org + abuse contact 4310: | 4311: +-- MX record --> mail server hostname --> A --> RDAP --> org + abuse 4312: | 4313: +-- NS record --> nameserver hostname --> A --> RDAP --> org + abuse 4314: | 4315: +-- WHOIS (TLD whois server via IANA referral) 4316: +-- Registrar name + abuse contact 4317: +-- Creation date (-> recently-registered flag if < 180 days) 4318: +-- Expiry date (-> expires-soon or expired flags) 4319: 4320: Domains are collected from: 4321: 4322: From:/Reply-To:/Sender:/Return-Path: headers 4323: DKIM-Signature: d= (signing domain) 4324: List-Unsubscribe: (ESP / bulk sender domain) 4325: Message-ID: (often reveals real sending platform) 4326: mailto: links and bare addresses in the body 4327: 4328: =head1 CACHING 4329: 4330: Two levels of caching are used: 4331: 4332: =over 4 4333: 4334: =item Per-message cache (C<$self-E<gt>{_domain_info}>) 4335: 4336: Stores domain analysis results for the lifetime of one C<parse_email()> 4337: call. Invalidated by each call to C<parse_email()>. 4338: 4339: =item Cross-message cache (CHI Memory driver, if C<CHI> is installed) 4340: 4341: Stores IP WHOIS, DNS resolution, and domain analysis results across all 4342: messages processed by the same process. TTL is one hour. Prevents 4343: redundant WHOIS queries for infrastructure that appears in multiple 4344: messages in the same run (e.g. a sending ISP seen in 500 spam messages). 4345: 4346: =back 4347: 4348: =head1 IPV6 SUPPORT 4349: 4350: IPv6 addresses are extracted from C<Received:> headers using bracketed 4351: notation (C<[2001:db8::1]>). They are tested against the private range 4352: list (which covers ::1, fe80::/10, fc00::/7, fd00::/8, and the 4353: documentation range 2001:db8::/32) and passed through C<_whois_ip()> and 4354: C<_rdap_lookup()> in the same way as IPv4 addresses. 4355: 4356: C<_resolve_host()> attempts both A and AAAA lookups when C<Net::DNS> is 4357: installed. C<_raw_whois()> uses C<IO::Socket::IP> for dual-stack WHOIS 4358: connections when that module is installed. 4359: 4360: =head1 SEE ALSO 4361: 4362: =over 4 4363: 4364: =item * L<Configure an Object at Runtime|Object::Configure> 4365: 4366: The provider_abuse, trusted_domains and url_shorteners tables can all be overridden at runtime 4367: 4368: =item * L<Test Dashboard|https://nigelhorne.github.io/Email-Abuse-Investigator/coverage/> 4369: 4370: =item * L<ARIN RDAP|https://rdap.arin.net/> 4371: 4372: =item * L<Net::DNS>, L<LWP::UserAgent>, L<HTML::LinkExtor> 4373: 4374: =item * L<CHI>, L<AnyEvent::DNS>, L<IO::Socket::IP>, L<Domain::PublicSuffix> 4375: 4376: =back 4377: 4378: =head1 REPOSITORY 4379: 4380: L<https://github.com/nigelhorne/Email-Abuse-Investigator> 4381: 4382: =head1 SUPPORT 4383: 4384: This module is provided as-is without any warranty. 4385: 4386: Please report any bugs or feature requests to C<bug-email-abuse-investigator at rt.cpan.org>, 4387: or through the web interface at 4388: L<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Email-Abuse-Investigator>. 4389: I will be notified, and then you'll 4390: automatically be notified of progress on your bug as I make changes. 4391: 4392: You can find documentation for this module with the perldoc command. 4393: 4394: perldoc Email::Abuse::Investigator 4395: 4396: You can also look for information at: 4397: 4398: =over 4 4399: 4400: =item * MetaCPAN 4401: 4402: L<https://metacpan.org/dist/Email-Abuse-Investigator> 4403: 4404: =item * RT: CPAN's request tracker 4405: 4406: L<https://rt.cpan.org/NoAuth/Bugs.html?Dist=Email-Abuse-Investigator> 4407: 4408: =item * CPAN Testers' Matrix 4409: 4410: L<http://matrix.cpantesters.org/?dist=Email-Abuse-Investigator> 4411: 4412: =item * CPAN Testers Dependencies 4413: 4414: L<http://deps.cpantesters.org/?module=Email-Abuse-Investigator> 4415: 4416: =back 4417: 4418: =head1 LICENCE AND COPYRIGHT 4419: 4420: Copyright 2026 Nigel Horne. 4421: 4422: Usage is subject to GPL2 licence terms. 4423: If you use it, 4424: please let me know. 4425: 4426: =cut 4427: 4428: 1;