extract_document_code - Code Extractor

function extract_document_code

Maturity: 53

Extracts a structured document code (e.g., '4.5.38.2') from a filename using regex pattern matching.

File:
/tf/active/vicechatdev/mailsearch/compare_documents.py

Lines:
27 - 40

Complexity:
simple

Purpose

This function parses a filename and returns the hierarchical document code at its start, such as '4.5.38.2' in '4.5.38.2 Document Name.pdf'. Numeric prefixes like this are common in document management systems that organize and categorize files by code. Because the function returns None when no code is found instead of raising an exception, it can be used directly in filtering and validation workflows.
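As an illustration of the filtering use case, the sketch below indexes a list of filenames by document code and silently skips files that have no code. The CODE_PATTERN shown is an assumption (the real pattern in compare_documents.py is not shown), and index_by_code is a hypothetical helper, not part of the original module.

```python
import re
from typing import Optional

# Assumed pattern: the actual CODE_PATTERN in compare_documents.py may differ
CODE_PATTERN = re.compile(r'^(\d+(?:\.\d+)+)')

def extract_document_code(filename: str) -> Optional[str]:
    match = CODE_PATTERN.match(filename)
    return match.group(1) if match else None

def index_by_code(filenames: list[str]) -> dict[str, str]:
    """Hypothetical helper: map document code -> filename, skipping uncoded files."""
    index: dict[str, str] = {}
    for name in filenames:
        code = extract_document_code(name)
        if code is not None:  # None means "no code", so the file is filtered out
            index[code] = name
    return index

files = ['4.5.38.2 Document Name.pdf', 'Notes.txt', '1.2 Overview.docx']
print(index_by_code(files))
```

Note that if two files share the same code, the later one overwrites the earlier one in this simple index.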

Source Code

def extract_document_code(filename: str) -> Optional[str]:
    """
    Extract document code from filename (e.g., '4.5.38.2' from '4.5.38.2 Document Name.pdf')
    
    Args:
        filename: The filename to extract code from
        
    Returns:
        Document code or None if no code found
    """
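    # CODE_PATTERN is a compiled regex defined at module level in compare_documents.py (not shown here)
    match = CODE_PATTERN.match(filename)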
    if match:
        return match.group(1)
    return None

Parameters

Name	Type	Default	Kind
`filename`	str	-	positional_or_keyword

Parameter Details

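filename: The filename string from which to extract the document code, expected to begin with a numeric code (e.g., '4.5.38.2 Document Name.pdf'). Because CODE_PATTERN.match() only matches at the start of the string, passing a full path such as '/path/to/1.2.3 Report.docx' normally returns None; pass the bare filename (basename) instead. There are no length or format constraints, but the argument must be a str: passing None or another non-string type raises a TypeError.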
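To handle full paths, one option is to strip the directory part before calling the function. The sketch below uses pathlib.PurePath(...).name for that; extract_code_from_path is a hypothetical wrapper, and CODE_PATTERN is the same assumed pattern used in the usage example, not necessarily the module's real one.

```python
import re
from pathlib import PurePath
from typing import Optional

CODE_PATTERN = re.compile(r'^(\d+(?:\.\d+)+)')  # assumed pattern

def extract_document_code(filename: str) -> Optional[str]:
    match = CODE_PATTERN.match(filename)
    return match.group(1) if match else None

def extract_code_from_path(path: str) -> Optional[str]:
    """Hypothetical wrapper: drop directories so the anchored match sees the filename."""
    return extract_document_code(PurePath(path).name)

print(extract_document_code('/path/to/1.2.3 Report.docx'))   # None
print(extract_code_from_path('/path/to/1.2.3 Report.docx'))  # 1.2.3
```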

Return Value

Type: Optional[str]

Returns the extracted document code as a string (e.g., '4.5.38.2') when CODE_PATTERN matches, or None when it does not. The returned value is the first capturing group of the match (match.group(1)), which is not necessarily the entire matched text.
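The distinction between group(1) and the full match matters if the pattern consumes more than the code itself. The sketch below uses a hypothetical pattern that also swallows the separator after the code; only the capturing group holds the clean code.

```python
import re

# Hypothetical pattern: captures the code, then also consumes a trailing separator
PATTERN_WITH_SEP = re.compile(r'^(\d+(?:\.\d+)+)[\s_-]+')

m = PATTERN_WITH_SEP.match('4.5.38.2 Document Name.pdf')
print(repr(m.group(0)))  # '4.5.38.2 '  (whole match, includes the space)
print(repr(m.group(1)))  # '4.5.38.2'   (first capturing group only)
```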

Dependencies

re

Required Imports

import re
from typing import Optional

Usage Example

import re
from typing import Optional

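# Assumed definition of CODE_PATTERN: the actual pattern in compare_documents.py is not shown and may differ.
# It must exist at module level before extract_document_code() is called.
CODE_PATTERN = re.compile(r'^(\d+(?:\.\d+)+)')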

def extract_document_code(filename: str) -> Optional[str]:
    match = CODE_PATTERN.match(filename)
    if match:
        return match.group(1)
    return None

# Example usage
filename1 = '4.5.38.2 Document Name.pdf'
code1 = extract_document_code(filename1)
print(code1)  # Output: 4.5.38.2

filename2 = 'Document Without Code.pdf'
code2 = extract_document_code(filename2)
print(code2)  # Output: None

filename3 = '/path/to/1.2.3 Report.docx'
code3 = extract_document_code(filename3)
print(code3)  # Output: None (pattern matches start of string, not basename)
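The assumed pattern above has a few edge cases worth knowing. The checks below apply it to some hypothetical filenames; they describe this example pattern only, and the module's real CODE_PATTERN may behave differently.

```python
import re
from typing import Optional

CODE_PATTERN = re.compile(r'^(\d+(?:\.\d+)+)')  # same assumed pattern as in the usage example

def extract_document_code(filename: str) -> Optional[str]:
    match = CODE_PATTERN.match(filename)
    return match.group(1) if match else None

cases = {
    '12 Report.pdf': None,              # a single number has no dot, so no match
    '4.5.38.2_Report.pdf': '4.5.38.2',  # any non-digit after the code ends it
    '4.5.38.2.pdf': '4.5.38.2',         # '.pdf' is not digits, so it is excluded
    '4.5. Draft.pdf': '4.5',            # a trailing dot is not part of the code
    ' 4.5 Leading space.pdf': None,     # match() is anchored at index 0
}

for name, expected in cases.items():
    result = extract_document_code(name)
    print(f'{name!r}: {result!r}')
    assert result == expected
```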

Best Practices

Ensure CODE_PATTERN is defined as a module-level constant before calling this function
The function expects CODE_PATTERN to match from the start of the filename string - if processing full paths, extract the basename first using os.path.basename() or Path().name
The regex pattern must contain at least one capturing group; otherwise match.group(1) raises IndexError ("no such group")
Handle the None return value appropriately in calling code to avoid NoneType errors
Consider validating the extracted code format if specific hierarchical structures are required
For batch processing, compile the regex pattern once at module level rather than inside the function for better performance
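Following the advice on validating code format, the sketch below accepts only codes with exactly four numeric levels and sorts them level by level. Both the four-level rule and the helper names (is_four_level_code, code_sort_key) are hypothetical; adapt them to whatever hierarchy your documents actually use. Plain string sorting is shown for contrast, since it places 4.5.10 before 4.5.9.

```python
import re
from typing import Optional

# Hypothetical rule: a "full" code has exactly four numeric levels, e.g. 4.5.38.2
FOUR_LEVEL_CODE = re.compile(r'\d+\.\d+\.\d+\.\d+')

def is_four_level_code(code: Optional[str]) -> bool:
    """Return True only for non-None codes with exactly four levels."""
    return code is not None and FOUR_LEVEL_CODE.fullmatch(code) is not None

def code_sort_key(code: str) -> tuple[int, ...]:
    """Compare numerically per level, so 4.5.10 sorts after 4.5.9."""
    return tuple(int(part) for part in code.split('.'))

codes = ['4.5.10.1', '4.5.9.3', '4.5', None]
valid = [c for c in codes if is_four_level_code(c)]
print(sorted(valid, key=code_sort_key))  # ['4.5.9.3', '4.5.10.1']
print(sorted(valid))                     # ['4.5.10.1', '4.5.9.3'] (plain string order)
```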

Similar Components

AI-powered semantic similarity - components with related functionality:

function extract_document_code_v1 84.1% similar

Extracts a structured document code (e.g., 2.13.4.3.3.2) from a filename using regex pattern matching.
From: /tf/active/vicechatdev/mailsearch/enhanced_document_comparison.py
function extract_code_parts 58.2% similar

Splits a document code string into its component parts using a period (.) as the delimiter.
From: /tf/active/vicechatdev/mailsearch/copy_signed_documents.py
function get_document_type_code 53.8% similar

Retrieves a document type code from a dictionary lookup using the provided document type name, returning the name itself if no mapping exists.
From: /tf/active/vicechatdev/CDocs/settings_prod.py
function fuzzy_match_filename 53.1% similar

Calculates a fuzzy match similarity score between two filenames by comparing them after normalization, using exact matching, substring containment, and word overlap techniques.
From: /tf/active/vicechatdev/mailsearch/compare_documents.py
function is_valid_document_file 51.3% similar

Validates whether a given filename has an extension corresponding to a supported document type by checking against a predefined list of valid document extensions.
From: /tf/active/vicechatdev/CDocs/utils/__init__.py
