function extract_document_code_v1
Extracts a structured document code (e.g., 2.13.4.3.3.2) from a filename using regex pattern matching.
/tf/active/vicechatdev/mailsearch/enhanced_document_comparison.py
29 - 34
simple
Purpose
This function parses a filename and extracts a hierarchical document code that follows a dotted numeric pattern, such as 2.13.4.3.3.2. It suits document management workflows where files are named with embedded classification codes. The function relies on a pre-defined CODE_PATTERN regex, defined outside the function body, and calls CODE_PATTERN.search, so it returns the first match found anywhere in the filename string.
Source Code
def extract_document_code(filename: str) -> Optional[str]:
    """Extract document code from filename (e.g., 2.13.4.3.3.2)"""
    match = CODE_PATTERN.search(filename)
    if match:
        return match.group(1)
    return None
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| filename | str | - | positional_or_keyword |
Parameter Details
filename: The string from which to extract the document code. It may be a bare filename or a full path, and is expected to contain a dotted numeric pattern such as '2.13.4.3.3.2'. Any string is accepted. Because the whole string is searched, a matching pattern in a directory name is returned before one in the filename itself.
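When full paths are passed in, one way to keep directory names from being matched is to search only the final path component. The sketch below assumes the same CODE_PATTERN as the Usage Example section, which may differ from the pattern in the real module. The helper name extract_code_from_basename is hypothetical and not part of the documented source.

```python
import os
import re
from typing import Optional

# Assumed pattern, copied from the usage example; the real CODE_PATTERN may differ.
CODE_PATTERN = re.compile(r'(\d+(?:\.\d+)+)')


def extract_code_from_basename(path: str) -> Optional[str]:
    """Hypothetical helper: search only the last path component."""
    basename = os.path.basename(path)
    match = CODE_PATTERN.search(basename)
    return match.group(1) if match else None


path = '/archive/v1.2/file_3.4.5.pdf'

# Searching the full path picks up the directory name first.
print(CODE_PATTERN.search(path).group(1))  # 1.2

# Searching only the basename returns the code embedded in the filename.
print(extract_code_from_basename(path))    # 3.4.5
```

Whether this is the right behavior depends on whether directory names are expected to carry codes in your file layout.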
Return Value
Type: Optional[str]
Returns the extracted document code as a string (e.g., '2.13.4.3.3.2') if CODE_PATTERN matches, or None if no matching pattern exists in the filename. The returned code is the first capture group of the first match.
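Since only the first match is returned, a filename that contains several dotted numbers (a version tag plus a document code, for example) may yield the wrong one. The following sketch lists every candidate with findall and picks the one with the most segments. It assumes the CODE_PATTERN from the Usage Example section. The helper names find_all_codes and pick_longest_code are hypothetical, and "most segments wins" is only one possible selection rule.

```python
import re
from typing import List, Optional

# Assumed pattern, copied from the usage example; the real CODE_PATTERN may differ.
CODE_PATTERN = re.compile(r'(\d+(?:\.\d+)+)')


def find_all_codes(filename: str) -> List[str]:
    """Hypothetical helper: return every dotted numeric code, in order."""
    # With exactly one capture group, findall returns a list of that group's text.
    return CODE_PATTERN.findall(filename)


def pick_longest_code(filename: str) -> Optional[str]:
    """Hypothetical helper: prefer the candidate with the most segments."""
    codes = find_all_codes(filename)
    if not codes:
        return None
    # On a tie, max() keeps the earliest candidate.
    return max(codes, key=lambda code: code.count('.'))


name = 'v1.0_doc_2.13.4.pdf'
print(find_all_codes(name))     # ['1.0', '2.13.4']
print(pick_longest_code(name))  # 2.13.4
```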
Dependencies
re
Required Imports
import re
from typing import Optional
Usage Example
import re
from typing import Optional

# Define the required CODE_PATTERN (example pattern: two or more dot-separated numbers)
CODE_PATTERN = re.compile(r'(\d+(?:\.\d+)+)')

def extract_document_code(filename: str) -> Optional[str]:
    match = CODE_PATTERN.search(filename)
    if match:
        return match.group(1)
    return None

# Example usage
filename1 = 'document_2.13.4.3.3.2_final.pdf'
code1 = extract_document_code(filename1)
print(code1)  # Output: 2.13.4.3.3.2

filename2 = 'report_without_code.pdf'
code2 = extract_document_code(filename2)
print(code2)  # Output: None

filename3 = '/path/to/file_1.2.3.pdf'
code3 = extract_document_code(filename3)
print(code3)  # Output: 1.2.3
Best Practices
- Define CODE_PATTERN as a module-level constant, compiled once with re.compile, before calling this function; this also avoids recompiling the pattern when processing many files
- The function calls match.group(1), so CODE_PATTERN must contain at least one capture group; without one, group(1) raises IndexError
- Validate the extracted code if your application requires a specific number of segments or value ranges
- Check for a None return value in calling code before using the result, to avoid errors such as AttributeError or TypeError
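The validation and None-handling advice above can be sketched as follows. The CODE_PATTERN is the one assumed in the Usage Example section. The is_valid_code helper and its segment limits (2 to 6) are hypothetical and should be adapted to your own coding scheme.

```python
import re
from typing import Optional

# Assumed pattern, copied from the usage example; the real CODE_PATTERN may differ.
CODE_PATTERN = re.compile(r'(\d+(?:\.\d+)+)')


def extract_document_code(filename: str) -> Optional[str]:
    match = CODE_PATTERN.search(filename)
    if match:
        return match.group(1)
    return None


def is_valid_code(code: Optional[str], min_segments: int = 2,
                  max_segments: int = 6) -> bool:
    """Hypothetical check: accept codes whose segment count is in range."""
    if code is None:
        return False
    segments = code.split('.')
    return min_segments <= len(segments) <= max_segments


for name in ['document_2.13.4.3.3.2_final.pdf', 'report_without_code.pdf']:
    code = extract_document_code(name)
    if code is None:
        # Handle the missing-code case explicitly instead of using the result.
        print(f'{name}: no document code found')
    elif is_valid_code(code):
        print(f'{name}: {code}')
    else:
        print(f'{name}: unexpected code format {code}')
```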
Similar Components
AI-powered semantic similarity - components with related functionality:
- function extract_document_code 84.1% similar
- function extract_code_parts 73.3% similar
- function has_wuxi_coding_v1 58.5% similar
- function has_wuxi_coding 56.8% similar
- function scan_output_folder_v1 56.0% similar