Code Extractor

function test_multiple_files

Maturity: 45

A test function that validates the extraction of text content from multiple document files using a DocumentExtractor instance, displaying extraction results and simulating combined content processing.

File: /tf/active/vicechatdev/leexi/test_multiple_files.py
Lines: 15 - 71
Complexity: moderate

Purpose

This function serves as a test harness for the DocumentExtractor class, specifically testing its ability to extract text from multiple files (markdown and text formats). It verifies file existence, extracts content from each file individually, displays extraction statistics, and simulates how the extracted content would be combined for processing by an LLM. The function provides detailed console output for debugging and validation purposes.
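
The combining step described above can be isolated into a small pure function, which is easier to check than console output. This is a minimal sketch, not the module's actual code: `combine_sections` is a hypothetical helper name, and it assumes each extracted text is kept alongside the path it came from.

```python
from pathlib import Path


def combine_sections(extracted):
    """Join (file_path, content) pairs into one labelled string.

    Hypothetical helper for illustration; not part of the original module.
    """
    sections = [
        f"=== {Path(file_path).name} ===\n{content}\n"
        for file_path, content in extracted
    ]
    return "\n".join(sections)


# Sample pairs standing in for real extraction results
pairs = [
    ("test_files/previous_report_1.md", "# Report 1"),
    ("test_files/previous_report_2.txt", "Report 2"),
]
combined = combine_sections(pairs)
print(combined)
```

Pairing each path with its content means the section header always names the file the text came from, regardless of how many files yielded content.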

Source Code

def test_multiple_files():
    """Test the previous reports extraction with multiple files"""
    
    # Initialize extractor
    extractor = DocumentExtractor()
    
    print("Multiple Files Previous Reports Test")
    print("=" * 50)
    
    # Test files
    test_files = [
        "test_files/previous_report_1.md",
        "test_files/previous_report_2.txt"
    ]
    
    # Check if test files exist
    for file_path in test_files:
        if not os.path.exists(file_path):
            print(f"Test file not found: {file_path}")
            return
    
    print(f"Testing with {len(test_files)} files:")
    for file_path in test_files:
        print(f"  - {file_path}")
    print()
    
    # Test extraction from each file, keeping each path paired with its content
    extracted = []
    for file_path in test_files:
        print(f"Extracting from: {file_path}")
        try:
            content = extractor.extract_text(file_path)
            if content:
                print(f"✓ Successfully extracted {len(content)} characters")
                extracted.append((file_path, content))
                print(f"Preview: {content[:100]}...")
            else:
                print("✗ No content extracted")
        except Exception as e:
            print(f"✗ Error: {e}")
        print("-" * 40)
    
    # Simulate the combined extraction process
    if extracted:
        print("\nSimulating combined extraction for LLM:")
        combined_content = []
        # Use the paired path so the header stays correct if an earlier file failed
        for file_path, content in extracted:
            file_name = Path(file_path).name
            combined_content.append(f"=== {file_name} ===\n{content}\n")
        
        full_content = "\n".join(combined_content)
        print(f"Total combined content: {len(full_content)} characters")
        print(f"Combined preview:\n{full_content[:500]}...")
        
        print("\n✓ Multiple file extraction simulation successful!")
    else:
        print("\n✗ No content extracted from any files")

Return Value

This function does not return a value (it implicitly returns None). It works entirely through side effects, printing test results and extraction status to the console. If any listed test file is missing, it prints a message and returns before extracting anything.
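
Because the result is only printed, a caller that wants to inspect it has to capture standard output. The sketch below shows one way to do that with the standard library; `noisy_check` is a hypothetical stand-in for a print-only function such as this one, used so the example runs without the real extractor.

```python
import contextlib
import io


def noisy_check():
    """Stand-in for a print-only test function (hypothetical)."""
    print("Testing with 2 files:")
    print("Multiple file extraction simulation successful!")


# Redirect stdout into an in-memory buffer for the duration of the call
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    result = noisy_check()

captured = buffer.getvalue()
```

After the call, `result` is None and `captured` holds everything the function printed, which can then be searched for the success or failure messages.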

Dependencies

  • os
  • sys
  • pathlib
  • document_extractor

Required Imports

import os
import sys
from pathlib import Path
from document_extractor import DocumentExtractor

Usage Example

import os

# Assumes test_multiple_files.py and document_extractor.py are on the import path
from test_multiple_files import test_multiple_files

# Ensure test files exist before running
os.makedirs('test_files', exist_ok=True)

# Create sample test files
with open('test_files/previous_report_1.md', 'w') as f:
    f.write('# Previous Report 1\n\nThis is a sample markdown report.')

with open('test_files/previous_report_2.txt', 'w') as f:
    f.write('Previous Report 2\n\nThis is a sample text report.')

# Run the test
test_multiple_files()

# Expected output:
# Multiple Files Previous Reports Test
# ==================================================
# Testing with 2 files:
#   - test_files/previous_report_1.md
#   - test_files/previous_report_2.txt
# ...
# ✓ Multiple file extraction simulation successful!

Best Practices

  • Ensure the test_files directory and required test files exist before calling this function
  • The function expects specific file paths; modify the test_files list if testing with different files
  • This is a test function and should not be used in production code; it's designed for development and validation
  • The function prints directly to console; consider redirecting output or capturing it if running in automated test suites
  • Error handling is basic; the function continues processing remaining files even if one fails
  • The function returns at the first missing test file without extracting anything, so verify every path before execution
  • Consider wrapping this function in a proper test framework (pytest, unittest) for better integration with CI/CD pipelines
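
As a rough illustration of the last point, the same checks can be expressed as assertions in a standard-library `unittest` case. This is a sketch under stated assumptions: `StubExtractor` is a hypothetical stand-in that only reads plain text, so the example runs without `document_extractor`; in a real suite the actual `DocumentExtractor` would take its place.

```python
import tempfile
import unittest
from pathlib import Path


class StubExtractor:
    """Hypothetical stand-in for DocumentExtractor: reads plain text only."""

    def extract_text(self, file_path):
        return Path(file_path).read_text(encoding="utf-8")


class MultipleFilesTest(unittest.TestCase):
    def setUp(self):
        # Create throwaway sample files instead of relying on test_files/
        self.tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self.tmp.cleanup)
        root = Path(self.tmp.name)
        self.files = [root / "previous_report_1.md", root / "previous_report_2.txt"]
        self.files[0].write_text("# Previous Report 1\n", encoding="utf-8")
        self.files[1].write_text("Previous Report 2\n", encoding="utf-8")

    def test_each_file_yields_content(self):
        extractor = StubExtractor()
        for path in self.files:
            with self.subTest(path=path.name):
                self.assertTrue(extractor.extract_text(str(path)))


# Run the case programmatically; a test runner would normally discover it
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MultipleFilesTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Using temporary files removes the dependency on a pre-populated test_files directory, and failed assertions are reported by the runner instead of being buried in console output.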

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function test_document_extractor 81.5% similar

    A test function that validates the DocumentExtractor class by testing file type support detection, text extraction from various document formats, and error handling.

    From: /tf/active/vicechatdev/leexi/test_document_extractor.py
  • function test_mixed_previous_reports 79.4% similar

    A test function that validates the DocumentExtractor's ability to extract text content from multiple file formats (text and markdown) and combine them into a unified previous reports summary.

    From: /tf/active/vicechatdev/leexi/test_enhanced_reports.py
  • class DocumentExtractor 62.9% similar

    A document text extraction class that supports multiple file formats including Word, PowerPoint, PDF, and plain text files, with automatic format detection and conversion capabilities.

    From: /tf/active/vicechatdev/leexi/document_extractor.py
  • function test_attendee_extraction 56.9% similar

    A test function that validates the attendee extraction logic of the EnhancedMeetingMinutesGenerator by parsing a meeting transcript and displaying extracted metadata including speakers, date, and duration.

    From: /tf/active/vicechatdev/leexi/test_attendee_extraction.py
  • function test_attendee_extraction_comprehensive 56.5% similar

    A comprehensive test function that validates the attendee extraction logic from meeting transcripts, comparing actual speakers versus mentioned names, and demonstrating integration with meeting minutes generation.

    From: /tf/active/vicechatdev/leexi/test_attendee_comprehensive.py