🔍 Code Extractor

function extract_document_code_v1

Maturity: 43

Extracts a structured document code (e.g., 2.13.4.3.3.2) from a filename using regex pattern matching.

File: /tf/active/vicechatdev/mailsearch/enhanced_document_comparison.py
Lines: 29-34
Complexity: simple

Purpose

This function parses a filename and extracts a hierarchical document code written as dot-separated numbers. It is intended for document management workflows in which files are named with embedded classification codes. The matching itself is delegated to CODE_PATTERN, a compiled regular expression defined at module level; the function returns capture group 1 of the first match found anywhere in the string.
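The actual CODE_PATTERN in enhanced_document_comparison.py is not shown in this excerpt. As a sketch, assuming a pattern that matches runs of digits joined by at least one dot, a verbose-mode regex makes each part explicit and shows which filenames would and would not yield a code. Note that under this assumed pattern, version strings such as "v1.0" are matched as well.

```python
import re
from typing import Optional

# ASSUMPTION: the real CODE_PATTERN is not shown in the source; this one
# matches digit runs joined by dots, with at least one dot.
CODE_PATTERN = re.compile(
    r"""
    (              # group 1: the whole document code
      \d+          # first segment, e.g. "2"
      (?:\.\d+)+   # one or more ".<digits>" segments, e.g. ".13.4"
    )
    """,
    re.VERBOSE,
)


def extract_document_code(filename: str) -> Optional[str]:
    match = CODE_PATTERN.search(filename)
    return match.group(1) if match else None


print(extract_document_code("2.13.4.3.3.2 Validation Plan.docx"))  # 2.13.4.3.3.2
print(extract_document_code("plan_2.13.pdf"))     # 2.13
print(extract_document_code("scan_2024.pdf"))     # None (no dot between digits)
print(extract_document_code("report_v1.0.txt"))   # 1.0 (version numbers match too)
```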

Source Code

def extract_document_code(filename: str) -> Optional[str]:
    """Extract document code from filename (e.g., 2.13.4.3.3.2)"""
    match = CODE_PATTERN.search(filename)
    if match:
        return match.group(1)
    return None

Parameters

Name      Type  Default  Kind
filename  str   -        positional_or_keyword

Parameter Details

filename: The filename, with or without a directory path, to search for a document code such as '2.13.4.3.3.2'. Any string is accepted; if it contains no match for CODE_PATTERN, the function returns None. Because the search scans the entire string, dotted numbers in directory names can be matched before the code in the filename itself.
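To illustrate the full-path caveat, here is a minimal sketch assuming the same illustrative pattern as the Usage Example (the module's real CODE_PATTERN may differ). The helper extract_code_from_basename is hypothetical, not part of the source; it restricts the search to the final path component. os.path.basename splits on the separators of the platform running the code.

```python
import os
import re
from typing import Optional

# Illustrative pattern; the module's real CODE_PATTERN may differ.
CODE_PATTERN = re.compile(r"(\d+(?:\.\d+)+)")


def extract_document_code(filename: str) -> Optional[str]:
    match = CODE_PATTERN.search(filename)
    return match.group(1) if match else None


def extract_code_from_basename(path: str) -> Optional[str]:
    """Hypothetical variant: search only the final path component."""
    return extract_document_code(os.path.basename(path))


path = "/archive/release_3.1/report_2.13.4.pdf"
print(extract_document_code(path))       # 3.1 (taken from the directory name)
print(extract_code_from_basename(path))  # 2.13.4
```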

Return Value

Type: Optional[str]

The extracted document code (e.g., '2.13.4.3.3.2'), taken from capture group 1 of the leftmost match, or None if the filename contains no match. Only the first code is returned, even when the string contains several.
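Because the function uses search, only the leftmost code is returned. A sketch of the difference, assuming the illustrative pattern from the Usage Example; extract_all_codes is a hypothetical helper built on the standard Pattern.findall, which returns the text of the single capture group for each match.

```python
import re
from typing import List, Optional

CODE_PATTERN = re.compile(r"(\d+(?:\.\d+)+)")  # illustrative pattern


def extract_document_code(filename: str) -> Optional[str]:
    match = CODE_PATTERN.search(filename)
    return match.group(1) if match else None


def extract_all_codes(filename: str) -> List[str]:
    """Hypothetical helper: every code in the string, left to right."""
    # With exactly one capture group, findall returns that group's text.
    return CODE_PATTERN.findall(filename)


name = "diff_1.2.7_vs_1.2.9.pdf"
print(extract_document_code(name))  # 1.2.7
print(extract_all_codes(name))      # ['1.2.7', '1.2.9']
```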

Dependencies

  • re

Required Imports

import re
from typing import Optional

Usage Example

import re
from typing import Optional

# Illustrative CODE_PATTERN; the real pattern in enhanced_document_comparison.py is not shown and may differ
CODE_PATTERN = re.compile(r'(\d+(?:\.\d+)+)')

def extract_document_code(filename: str) -> Optional[str]:
    match = CODE_PATTERN.search(filename)
    if match:
        return match.group(1)
    return None

# Example usage
filename1 = 'document_2.13.4.3.3.2_final.pdf'
code1 = extract_document_code(filename1)
print(code1)  # prints: 2.13.4.3.3.2

filename2 = 'report_without_code.pdf'
code2 = extract_document_code(filename2)
print(code2)  # prints: None

filename3 = '/path/to/file_1.2.3.pdf'
code3 = extract_document_code(filename3)
print(code3)  # prints: 1.2.3

Best Practices

  • Define CODE_PATTERN as a module-level constant; the name is looked up when the function is called, so a missing definition raises NameError at call time rather than at import
  • The function reads match.group(1), so CODE_PATTERN must contain at least one capturing group; without one, a successful match raises IndexError
  • Consider validating the extracted code format if your application requires specific number of segments or value ranges
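  • Check for None before using the result; calling string methods on a None return raises AttributeError
  • Compile CODE_PATTERN once at module level rather than inside the function. The re module caches recently compiled patterns, so the speed gain is usually modest, but a single module-level pattern avoids repeated compilation lookups and keeps the pattern defined in one place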
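The practices above can be combined in a short sketch. It assumes the illustrative pattern from the Usage Example; is_valid_code, its segment limits, and describe are hypothetical names for illustration, not rules from the source project. Pattern.groups is the standard attribute giving the number of capture groups, which lets a misconfigured pattern fail at import time.

```python
import re
from typing import Optional

CODE_PATTERN = re.compile(r"(\d+(?:\.\d+)+)")  # illustrative pattern

# Fail at import time rather than with IndexError on the first match.
if CODE_PATTERN.groups < 1:
    raise ValueError("CODE_PATTERN must define at least one capture group")


def extract_document_code(filename: str) -> Optional[str]:
    match = CODE_PATTERN.search(filename)
    return match.group(1) if match else None


def is_valid_code(code: str, min_segments: int = 2, max_segments: int = 8) -> bool:
    """Hypothetical check; segment limits are placeholders, not project rules."""
    parts = code.split(".")
    return min_segments <= len(parts) <= max_segments


def describe(filename: str) -> str:
    code = extract_document_code(filename)
    if code is None:  # handle the no-match case explicitly
        return f"{filename}: no document code"
    if not is_valid_code(code, min_segments=3):
        return f"{filename}: code {code} has too few segments"
    return f"{filename}: code {code}"


for name in ["a_2.13.4.pdf", "b_1.2.pdf", "notes.pdf"]:
    print(describe(name))
```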

Similar Components

Components with related functionality, ranked by AI-powered semantic similarity:

  • function extract_document_code 84.1% similar

    Extracts a structured document code (e.g., '4.5.38.2') from a filename using regex pattern matching.

    From: /tf/active/vicechatdev/mailsearch/compare_documents.py
  • function extract_code_parts 73.3% similar

    Splits a document code string into its component parts using a period (.) as the delimiter.

    From: /tf/active/vicechatdev/mailsearch/copy_signed_documents.py
  • function has_wuxi_coding_v1 58.5% similar

    Validates whether a filename starts with a Wuxi coding pattern, which consists of numbers separated by dots (e.g., '2.13.4.1.2').

    From: /tf/active/vicechatdev/mailsearch/upload_non_wuxi_coded.py
  • function has_wuxi_coding 56.8% similar

    Validates whether a filename starts with a Wuxi coding pattern consisting of dot-separated numeric segments (e.g., '2.13.4.1.2').

    From: /tf/active/vicechatdev/mailsearch/copy_signed_documents.py
  • function scan_output_folder_v1 56.0% similar

    Scans a specified folder for PDF documents with embedded codes in their filenames, extracting metadata and signature information for each coded document found.

    From: /tf/active/vicechatdev/mailsearch/enhanced_document_comparison.py
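Once codes are extracted and split on periods (as extract_code_parts does), comparing the parts as integers orders codes hierarchically, whereas plain string sorting places '2.13' before '2.9'. The helper below is a hypothetical sketch; the actual signature and return type of extract_code_parts are not shown here.

```python
from typing import List, Tuple


def code_sort_key(code: str) -> Tuple[int, ...]:
    """Hypothetical helper: compare codes segment by segment as integers."""
    return tuple(int(part) for part in code.split("."))


codes: List[str] = ["2.13.4", "2.9.1", "2.13", "10.1"]
print(sorted(codes))                     # ['10.1', '2.13', '2.13.4', '2.9.1']
print(sorted(codes, key=code_sort_key))  # ['2.9.1', '2.13', '2.13.4', '10.1']
```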