🔍 Code Extractor

function process_directory

Maturity: 46

Processes all files matching a specified pattern in a directory, applying date fixes to each file and providing a summary of results.

File: /tf/active/vicechatdev/mailsearch/fix_file_dates.py
Lines: 120 - 162
Complexity: moderate

Purpose

This function batch-processes files in a directory. It collects every file matching a glob pattern (by default '*_fully_signed.pdf'), calls fix_file_dates on each one in sorted order, counts successes and failures, and prints a summary. Per-file progress output comes from fix_file_dates itself; process_directory prints the match count, any exception raised for a file, and the final totals. It supports both recursive and non-recursive traversal, and passes a dry_run flag through to fix_file_dates so a run can be previewed without modifying files.

Source Code

def process_directory(directory, pattern="*_fully_signed.pdf", dry_run=False, recursive=True):
    """Process all matching files in a directory"""
    base_path = Path(directory)
    
    if not base_path.exists():
        print(f"Error: Directory {directory} does not exist")
        return
    
    # Find all matching files
    if recursive:
        files = list(base_path.rglob(pattern))
    else:
        files = list(base_path.glob(pattern))
    
    if not files:
        print(f"No files matching pattern '{pattern}' found in {directory}")
        return
    
    print(f"Found {len(files)} files matching '{pattern}'")
    print("=" * 80)
    
    success_count = 0
    error_count = 0
    
    for filepath in sorted(files):
        try:
            if fix_file_dates(str(filepath), dry_run):
                success_count += 1
            else:
                error_count += 1
        except Exception as e:
            print(f"\n{filepath}")
            print(f"  ✗ Error: {e}")
            error_count += 1
    
    print("\n" + "=" * 80)
    print("SUMMARY")
    print("=" * 80)
    print(f"Total files: {len(files)}")
    print(f"Successful: {success_count}")
    print(f"Errors: {error_count}")
    if dry_run:
        print("\n(Dry run - no files were modified)")

Parameters

Name       Type  Default               Kind
directory  -     -                     positional_or_keyword
pattern    -     '*_fully_signed.pdf'  positional_or_keyword
dry_run    -     False                 positional_or_keyword
recursive  -     True                  positional_or_keyword

Parameter Details

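directory: String or path-like object specifying the directory to search for files. If the path does not exist, the function prints an error and returns without processing anything. The check is Path.exists(), not Path.is_dir(), so a path to a regular file passes the check and typically just yields no matches.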

pattern: Glob pattern string to match files. Defaults to '*_fully_signed.pdf'. Supports standard glob wildcards like * and ?. Examples: '*.pdf', 'report_*.txt', '**/*.csv'
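As a rough way to check a pattern against sample filenames before running it on real data, the sketch below uses PurePosixPath.match from the standard library. This is a related matching API, not the rglob/glob call the function uses, so treat it as an approximation; the helper name matches and the sample filenames are made up for illustration.

```python
from pathlib import PurePosixPath

def matches(name, pattern):
    """Return True if the filename matches the glob-style pattern."""
    return PurePosixPath(name).match(pattern)

# Hypothetical filenames checked against patterns from the parameter description
results = {
    "signed": matches("contract_fully_signed.pdf", "*_fully_signed.pdf"),
    "unsigned": matches("contract_draft.pdf", "*_fully_signed.pdf"),
    "report": matches("report_2024.txt", "report_*.txt"),
    "no_underscore": matches("report.txt", "report_*.txt"),
}

for label, matched in results.items():
    print(f"{label}: {matched}")
```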

dry_run: Boolean flag indicating whether to run in simulation mode. When True, no actual file modifications are made. Defaults to False.

recursive: Boolean flag controlling whether to search subdirectories. When True, uses Path.rglob to search the directory and everything beneath it; when False, uses Path.glob to search only the top level of the given directory. Defaults to True.
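The difference between the two search modes can be seen in a small sketch that mirrors the file-discovery step on a temporary directory. The helper name find_matches and the sample files are assumptions made for this illustration; only the rglob/glob choice comes from the source.

```python
import tempfile
from pathlib import Path

def find_matches(directory, pattern, recursive=True):
    """Mirror the discovery step: rglob when recursive, glob otherwise."""
    base = Path(directory)
    found = base.rglob(pattern) if recursive else base.glob(pattern)
    return sorted(found)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a_fully_signed.pdf").touch()          # top-level match
    (root / "sub" / "b_fully_signed.pdf").touch()  # match in a subdirectory
    (root / "notes.txt").touch()                   # never matches

    recursive_names = [p.name for p in find_matches(root, "*_fully_signed.pdf", recursive=True)]
    top_level_names = [p.name for p in find_matches(root, "*_fully_signed.pdf", recursive=False)]

print(recursive_names)
print(top_level_names)
```

With recursive=False the file under sub/ is skipped, which is the safer choice when only one folder should be touched.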

Return Value

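Returns None in every case, including the early exits when the directory does not exist or no files match. Results are reported only through stdout: the function calls fix_file_dates on each matching file and prints the match count, per-file errors, and summary statistics. The success and error counts are not returned to the caller.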
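Because the counts are only printed, a caller that needs them programmatically has to capture stdout. The sketch below shows one way to do that with contextlib.redirect_stdout; run_and_parse_summary and fake_process_directory are hypothetical names, and the stand-in simply prints summary lines in the same format as the source so the example runs on its own.

```python
import io
import re
from contextlib import redirect_stdout

def run_and_parse_summary(func, *args, **kwargs):
    """Call func, capture what it prints, and extract the summary counts."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func(*args, **kwargs)
    output = buffer.getvalue()

    counts = {}
    for label in ("Total files", "Successful", "Errors"):
        match = re.search(rf"^{label}: (\d+)$", output, flags=re.MULTILINE)
        if match:
            counts[label] = int(match.group(1))
    return counts, output

# Stand-in for process_directory (assumption: same summary line format)
def fake_process_directory(directory):
    print("SUMMARY")
    print("Total files: 3")
    print("Successful: 2")
    print("Errors: 1")

counts, captured = run_and_parse_summary(fake_process_directory, "/path/to/documents")
print(counts)
```

An empty counts dictionary would indicate one of the early exits, since no summary is printed in those cases.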

Dependencies

  • pathlib
  • subprocess
  • argparse

Required Imports

from pathlib import Path

Usage Example

# Assumes the mailsearch directory (which contains fix_file_dates.py) is on sys.path.
# process_directory calls the fix_file_dates function defined in that same module,
# so no separate definition or stub is needed here.
from fix_file_dates import process_directory

# Process all fully signed PDFs in a directory recursively
process_directory('/path/to/documents', pattern='*_fully_signed.pdf', dry_run=False, recursive=True)

# Dry run to preview what would be processed
process_directory('/path/to/documents', pattern='*.pdf', dry_run=True, recursive=False)

# Process only current directory with custom pattern
process_directory('/path/to/reports', pattern='report_*.txt', dry_run=False, recursive=False)

Best Practices

  • Always test with dry_run=True first to preview which files will be processed before making actual changes
  • Ensure the fix_file_dates function is properly defined or imported before calling this function
  • The function expects fix_file_dates to return a boolean indicating success/failure
  • Use appropriate glob patterns to avoid processing unintended files
  • Be cautious with recursive=True on large directory structures as it may process many files
  • The function handles exceptions per file, so one file error won't stop processing of remaining files
  • Monitor the console output for real-time progress and error messages
  • Check the summary statistics at the end to verify processing results
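The contract that process_directory relies on (accept a path string and a dry_run flag, return True on success and False on failure) can be sketched with a stand-in. This is not the real fix_file_dates, whose implementation is not shown here; fix_file_dates_stub and its target_timestamp parameter are hypothetical and exist only to show how dry_run leaves files untouched.

```python
import os
import tempfile
from pathlib import Path

def fix_file_dates_stub(filepath, dry_run=False, target_timestamp=0.0):
    """Hypothetical stand-in: set atime/mtime to target_timestamp, return True on success."""
    path = Path(filepath)
    if not path.is_file():
        print(f"{filepath}\n  Error: not a file")
        return False
    if dry_run:
        # Report the intended change without touching the file
        print(f"{filepath}\n  (dry run) would set timestamps to {target_timestamp}")
        return True
    os.utime(path, (target_timestamp, target_timestamp))
    return True

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "x_fully_signed.pdf"
    sample.touch()
    os.utime(sample, (1_600_000_000, 1_600_000_000))  # known starting timestamp

    dry_ok = fix_file_dates_stub(str(sample), dry_run=True, target_timestamp=1_000_000_000)
    mtime_after_dry_run = int(sample.stat().st_mtime)

    real_ok = fix_file_dates_stub(str(sample), dry_run=False, target_timestamp=1_000_000_000)
    mtime_after_real_run = int(sample.stat().st_mtime)

    missing_ok = fix_file_dates_stub(str(Path(tmp) / "missing.pdf"))

print(dry_ok, mtime_after_dry_run)
print(real_ok, mtime_after_real_run)
print(missing_ok)
```

Returning False rather than raising for an expected failure keeps that file in the error count without triggering the exception branch in process_directory.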

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function main_v112 57.1% similar

    Entry point function that parses command-line arguments to fix file timestamps by setting them to the oldest date found, either for a single file or recursively through a directory.

    From: /tf/active/vicechatdev/mailsearch/fix_file_dates.py
  • function test_enhanced_pdf_processing 55.1% similar

    A comprehensive test function that validates PDF processing capabilities, including text extraction, cleaning, chunking, and table detection across multiple PDF processing libraries.

    From: /tf/active/vicechatdev/vice_ai/test_enhanced_pdf.py
  • function test_document_processing 54.5% similar

    A test function that validates document processing functionality by creating a test PDF file, processing it through a DocumentProcessor, and verifying the extraction results or error handling.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_implementation.py
  • function main_v1 54.3% similar

    Main execution function that processes and copies document files from an output directory to target folders based on document codes, with support for dry-run and test modes.

    From: /tf/active/vicechatdev/mailsearch/copy_signed_documents.py
  • function scan_output_folder_v1 53.7% similar

    Scans a specified folder for PDF documents with embedded codes in their filenames, extracting metadata and signature information for each coded document found.

    From: /tf/active/vicechatdev/mailsearch/enhanced_document_comparison.py