extract_document_code_v1 - Code Extractor

function extract_document_code_v1

Maturity: 43

Extracts a structured document code (e.g., 2.13.4.3.3.2) from a filename using regex pattern matching.

File:
/tf/active/vicechatdev/mailsearch/enhanced_document_comparison.py

Lines:
29 - 34

Complexity:
simple

Purpose

This function parses a filename and extracts a hierarchical document code written as dot-separated numbers (for example, 2.13.4.3.3.2). It suits document management workflows where files carry classification codes embedded in their names. It relies on a predefined, module-level CODE_PATTERN regex and returns the first capture group of the first match.

Source Code

def extract_document_code(filename: str) -> Optional[str]:
    """Extract document code from filename (e.g., 2.13.4.3.3.2)"""
    match = CODE_PATTERN.search(filename)
    if match:
        return match.group(1)
    return None

Parameters

Name	Type	Default	Kind
`filename`	str	-	positional_or_keyword

Parameter Details

filename: The string to search, either a bare filename or a full path. It is expected to contain a dotted numeric pattern such as '2.13.4.3.3.2'; there are no other format constraints. The whole string is searched, so when a full path is passed, the first match may come from a directory name rather than the file name, depending on CODE_PATTERN.
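The effect of passing a full path can be sketched as follows. This is a minimal illustration, not the module's code: the CODE_PATTERN shown is an assumed pattern (the real one is defined in enhanced_document_comparison.py), and extract_code_from_basename is a hypothetical wrapper introduced only for this example.

```python
import os
import re
from typing import Optional

# Assumed pattern for illustration; the module defines its own CODE_PATTERN.
CODE_PATTERN = re.compile(r'(\d+(?:\.\d+)+)')

def extract_document_code(filename: str) -> Optional[str]:
    """Return the first dotted numeric code found anywhere in the string."""
    match = CODE_PATTERN.search(filename)
    return match.group(1) if match else None

def extract_code_from_basename(path: str) -> Optional[str]:
    """Hypothetical wrapper: drop the directories so only the file name is searched."""
    return extract_document_code(os.path.basename(path))

path = '/archive/v1.2/report_2.13.4.pdf'
print(extract_document_code(path))       # 1.2 (matched in the directory name)
print(extract_code_from_basename(path))  # 2.13.4
```

If directory names in your data never contain dotted numbers, the wrapper is unnecessary; it only matters when the leftmost match could fall outside the file name.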

Return Value

Type: Optional[str]

Returns the extracted document code as a string (e.g., '2.13.4.3.3.2') when CODE_PATTERN matches, or None when no match exists in the filename. The returned value is the first capture group of the first match.
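One way a caller might handle the Optional[str] result is to skip files without a code while grouping the rest. The sketch below assumes an illustrative CODE_PATTERN, and group_by_code is a hypothetical helper that does not appear in the source module.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Assumed pattern for illustration; the module defines its own CODE_PATTERN.
CODE_PATTERN = re.compile(r'(\d+(?:\.\d+)+)')

def extract_document_code(filename: str) -> Optional[str]:
    match = CODE_PATTERN.search(filename)
    return match.group(1) if match else None

def group_by_code(filenames: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: map each document code to the files that carry it."""
    grouped = defaultdict(list)
    for name in filenames:
        code = extract_document_code(name)
        if code is None:
            continue  # no code in this filename, so skip it explicitly
        grouped[code].append(name)
    return dict(grouped)

files = ['2.13.4_draft.pdf', '2.13.4_signed.pdf', 'notes.txt']
print(group_by_code(files))
# {'2.13.4': ['2.13.4_draft.pdf', '2.13.4_signed.pdf']}
```

Checking `code is None` rather than truthiness keeps the intent explicit and matches the function's documented return contract.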

Dependencies

re

Required Imports

import re
from typing import Optional

Usage Example

import re
from typing import Optional

# Illustrative CODE_PATTERN (an assumption; the module defines its own pattern)
CODE_PATTERN = re.compile(r'(\d+(?:\.\d+)+)')

def extract_document_code(filename: str) -> Optional[str]:
    match = CODE_PATTERN.search(filename)
    if match:
        return match.group(1)
    return None

# Example usage
filename1 = 'document_2.13.4.3.3.2_final.pdf'
code1 = extract_document_code(filename1)
print(code1)  # Output: 2.13.4.3.3.2

filename2 = 'report_without_code.pdf'
code2 = extract_document_code(filename2)
print(code2)  # Output: None

filename3 = '/path/to/file_1.2.3.pdf'
code3 = extract_document_code(filename3)
print(code3)  # Output: 1.2.3

Best Practices

Define CODE_PATTERN as a module-level constant, compiled once with re.compile, before calling this function. Compiling it once also avoids repeated work when processing many files.
The function calls match.group(1), so CODE_PATTERN must contain at least one capture group; without one, the call raises IndexError.
Validate the extracted code if your application requires a specific number of segments or particular value ranges.
Handle the None return value in calling code to avoid errors when no code is found.
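The validation suggested above could look like the following sketch. The helper name is_valid_code and the segment bounds are hypothetical, chosen only for illustration; adjust them to whatever your coding scheme actually requires.

```python
from typing import Optional

def is_valid_code(code: Optional[str],
                  min_segments: int = 2,
                  max_segments: int = 6) -> bool:
    """Hypothetical check: code must be dot-separated ASCII digits within a segment range."""
    if code is None:
        return False  # nothing was extracted from the filename
    segments = code.split('.')
    if not (min_segments <= len(segments) <= max_segments):
        return False
    # Every segment must be a non-empty run of ASCII digits.
    return all(s.isascii() and s.isdigit() for s in segments)

print(is_valid_code('2.13.4.3.3.2'))   # True
print(is_valid_code('1.2.3.4.5.6.7'))  # False (seven segments exceeds the assumed maximum)
print(is_valid_code(None))             # False
```

Accepting None directly lets the result of extract_document_code be passed in without a separate check.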

Similar Components

AI-powered semantic similarity - components with related functionality:

function extract_document_code 84.1% similar

Extracts a structured document code (e.g., '4.5.38.2') from a filename using regex pattern matching.
From: /tf/active/vicechatdev/mailsearch/compare_documents.py

function extract_code_parts 73.3% similar

Splits a document code string into its component parts using a period (.) as the delimiter.
From: /tf/active/vicechatdev/mailsearch/copy_signed_documents.py

function has_wuxi_coding_v1 58.5% similar

Validates whether a filename starts with a Wuxi coding pattern, which consists of numbers separated by dots (e.g., '2.13.4.1.2').
From: /tf/active/vicechatdev/mailsearch/upload_non_wuxi_coded.py

function has_wuxi_coding 56.8% similar

Validates whether a filename starts with a Wuxi coding pattern consisting of dot-separated numeric segments (e.g., '2.13.4.1.2').
From: /tf/active/vicechatdev/mailsearch/copy_signed_documents.py

function scan_output_folder_v1 56.0% similar

Scans a specified folder for PDF documents with embedded codes in their filenames, extracting metadata and signature information for each coded document found.
From: /tf/active/vicechatdev/mailsearch/enhanced_document_comparison.py
