function process_directory
Processes all files matching a specified pattern in a directory, applying date fixes to each file and providing a summary of results.
/tf/active/vicechatdev/mailsearch/fix_file_dates.py
120 - 162
moderate
Purpose
This function batch-processes files in a directory tree. It collects every file matching a glob pattern (by default, PDFs whose names end in _fully_signed.pdf), calls fix_file_dates on each one in sorted order, prints an error message for any file that raises an exception, and finishes with a summary of total, successful, and failed files. It supports both recursive and non-recursive traversal, and its dry_run flag is passed through to fix_file_dates so a run can be previewed without modifying any files.
Source Code
def process_directory(directory, pattern="*_fully_signed.pdf", dry_run=False, recursive=True):
    """Process all matching files in a directory"""
    base_path = Path(directory)
    if not base_path.exists():
        print(f"Error: Directory {directory} does not exist")
        return
    # Find all matching files
    if recursive:
        files = list(base_path.rglob(pattern))
    else:
        files = list(base_path.glob(pattern))
    if not files:
        print(f"No files matching pattern '{pattern}' found in {directory}")
        return
    print(f"Found {len(files)} files matching '{pattern}'")
    print("=" * 80)
    success_count = 0
    error_count = 0
    for filepath in sorted(files):
        try:
            if fix_file_dates(str(filepath), dry_run):
                success_count += 1
            else:
                error_count += 1
        except Exception as e:
            print(f"\n{filepath}")
            print(f" ✗ Error: {e}")
            error_count += 1
    print("\n" + "=" * 80)
    print("SUMMARY")
    print("=" * 80)
    print(f"Total files: {len(files)}")
    print(f"Successful: {success_count}")
    print(f"Errors: {error_count}")
    if dry_run:
        print("\n(Dry run - no files were modified)")
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| directory | - | - | positional_or_keyword |
| pattern | - | '*_fully_signed.pdf' | positional_or_keyword |
| dry_run | - | False | positional_or_keyword |
| recursive | - | True | positional_or_keyword |
Parameter Details
directory: String or path-like object specifying the directory to search for files. The path must exist; if it does not, the function prints an error message and returns without processing anything.
pattern: Glob pattern string to match files. Defaults to '*_fully_signed.pdf'. Supports standard glob wildcards like * and ?. Examples: '*.pdf', 'report_*.txt', '**/*.csv'
dry_run: Boolean flag indicating whether to run in simulation mode. When True, no actual file modifications are made. Defaults to False.
recursive: Boolean flag controlling whether to search subdirectories. When True, uses Path.rglob to search the directory and all of its subdirectories; when False, uses Path.glob to search only the top level of the given directory. Defaults to True.
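The difference between the two search modes can be seen on a small temporary directory tree. The sketch below mirrors only the file-discovery step; `find_matches` and the file names are hypothetical names introduced for illustration and are not part of the original module.

```python
import tempfile
from pathlib import Path


def find_matches(directory, pattern="*_fully_signed.pdf", recursive=True):
    """Hypothetical helper mirroring the discovery step: rglob or glob."""
    base_path = Path(directory)
    matches = base_path.rglob(pattern) if recursive else base_path.glob(pattern)
    return sorted(matches)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "archive").mkdir()
    (root / "a_fully_signed.pdf").touch()              # top level, matches
    (root / "archive" / "b_fully_signed.pdf").touch()  # subdirectory, matches
    (root / "draft.pdf").touch()                       # does not match the pattern

    recursive_names = [p.name for p in find_matches(root, recursive=True)]
    top_level_names = [p.name for p in find_matches(root, recursive=False)]

print("recursive:", recursive_names)
print("top level:", top_level_names)
```

With recursive=True both signed PDFs are found; with recursive=False only the one at the top level of the directory is returned.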
Return Value
Returns None. The function produces side effects by calling fix_file_dates on matching files and prints progress information and summary statistics to stdout.
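Because the counts are only printed, a caller that needs them programmatically would have to wrap the loop itself. The following is a minimal sketch of one way to do that, not the module's own API: `Summary`, `count_results`, and `fake_fixer` are hypothetical names, and `fake_fixer` stands in for fix_file_dates.

```python
from pathlib import Path
from typing import Callable, Iterable, NamedTuple


class Summary(NamedTuple):
    """Hypothetical result type holding the same three numbers the summary prints."""
    total: int
    successful: int
    errors: int


def count_results(files: Iterable[Path],
                  fixer: Callable[[str, bool], bool],
                  dry_run: bool = False) -> Summary:
    """Apply fixer to each file, counting a False result or an exception as an error."""
    ordered = sorted(files)
    successful = 0
    errors = 0
    for filepath in ordered:
        try:
            if fixer(str(filepath), dry_run):
                successful += 1
            else:
                errors += 1
        except Exception:
            # One failing file must not stop the remaining files.
            errors += 1
    return Summary(len(ordered), successful, errors)


def fake_fixer(filepath: str, dry_run: bool) -> bool:
    """Stand-in for fix_file_dates: raises for one file, reports failure for another."""
    if filepath.endswith("broken.pdf"):
        raise OSError("cannot read file")
    return not filepath.endswith("skipped.pdf")


summary = count_results(
    [Path("ok.pdf"), Path("skipped.pdf"), Path("broken.pdf")],
    fake_fixer,
    dry_run=True,
)
print(summary)
```

The counting rules match the source code: a falsy return value and a raised exception both increment the error count, and processing continues with the next file.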
Dependencies
- pathlib
- subprocess
- argparse
Required Imports
from pathlib import Path
Usage Example
from pathlib import Path
# Define or import the fix_file_dates function
def fix_file_dates(filepath, dry_run=False):
    # Implementation that processes the file
    print(f"Processing {filepath}")
    return True
# Process all fully signed PDFs in a directory recursively
process_directory('/path/to/documents', pattern='*_fully_signed.pdf', dry_run=False, recursive=True)
# Dry run to preview what would be processed
process_directory('/path/to/documents', pattern='*.pdf', dry_run=True, recursive=False)
# Process only current directory with custom pattern
process_directory('/path/to/reports', pattern='report_*.txt', dry_run=False, recursive=False)
Best Practices
- Always test with dry_run=True first to preview which files will be processed before making actual changes
- Ensure the fix_file_dates function is properly defined or imported before calling this function
- The function expects fix_file_dates to return a boolean indicating success/failure
- Use appropriate glob patterns to avoid processing unintended files
- Be cautious with recursive=True on large directory structures as it may process many files
- The function handles exceptions per file, so one file error won't stop processing of remaining files
- Monitor the console output for real-time progress and error messages
- Check the summary statistics at the end to verify processing results
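The dry-run and boolean-return expectations above can be illustrated with a stand-in fixer. This is only a sketch of the contract that process_directory relies on: the real fix_file_dates is not shown in this section, and the behavior here (setting the modification time to a fixed, arbitrary timestamp) is an assumption made purely for illustration.

```python
import os
import tempfile
from pathlib import Path

TARGET_MTIME = 1_700_000_000    # arbitrary timestamp chosen for this sketch
ORIGINAL_MTIME = 1_000_000_000  # known starting value so the change is observable


def fix_file_dates(filepath, dry_run=False):
    """Hypothetical stand-in: returns True on success, False on failure."""
    path = Path(filepath)
    if not path.is_file():
        print(f"  Not a file: {path}")
        return False
    if dry_run:
        # Report what would happen, but leave the file untouched.
        print(f"  Would set mtime of {path.name} to {TARGET_MTIME}")
        return True
    os.utime(path, (TARGET_MTIME, TARGET_MTIME))
    return True


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "contract_fully_signed.pdf"
    sample.touch()
    os.utime(sample, (ORIGINAL_MTIME, ORIGINAL_MTIME))

    dry_ok = fix_file_dates(str(sample), dry_run=True)
    mtime_after_dry_run = int(sample.stat().st_mtime)

    real_ok = fix_file_dates(str(sample), dry_run=False)
    mtime_after_real_run = int(sample.stat().st_mtime)

    missing_ok = fix_file_dates(str(Path(tmp) / "missing.pdf"))

print(dry_ok, mtime_after_dry_run)
print(real_ok, mtime_after_real_run)
print(missing_ok)
```

Running the dry run first leaves the file's timestamp unchanged while still reporting success, which is the behavior process_directory assumes when it prints "(Dry run - no files were modified)".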
Similar Components
AI-powered semantic similarity - components with related functionality:
- function main_v112 (57.1% similar)
- function test_enhanced_pdf_processing (55.1% similar)
- function test_document_processing (54.5% similar)
- function main_v1 (54.3% similar)
- function scan_output_folder_v1 (53.7% similar)