function test_multiple_files
A test function that validates the extraction of text content from multiple document files using a DocumentExtractor instance, displaying extraction results and simulating combined content processing.
/tf/active/vicechatdev/leexi/test_multiple_files.py
15 - 71
moderate
Purpose
This function serves as a test harness for the DocumentExtractor class, specifically testing its ability to extract text from multiple files (markdown and text formats). It verifies file existence, extracts content from each file individually, displays extraction statistics, and simulates how the extracted content would be combined for processing by an LLM. The function provides detailed console output for debugging and validation purposes.
Source Code
def test_multiple_files():
"""Test the previous reports extraction with multiple files"""
# Initialize extractor
extractor = DocumentExtractor()
print("Multiple Files Previous Reports Test")
print("=" * 50)
# Test files
test_files = [
"test_files/previous_report_1.md",
"test_files/previous_report_2.txt"
]
# Check if test files exist
for file_path in test_files:
if not os.path.exists(file_path):
print(f"Test file not found: {file_path}")
return
print(f"Testing with {len(test_files)} files:")
for file_path in test_files:
print(f" - {file_path}")
print()
# Test extraction from each file
extracted_contents = []
for file_path in test_files:
print(f"Extracting from: {file_path}")
try:
content = extractor.extract_text(file_path)
if content:
print(f"ā Successfully extracted {len(content)} characters")
extracted_contents.append(content)
print(f"Preview: {content[:100]}...")
else:
print("ā No content extracted")
except Exception as e:
print(f"ā Error: {str(e)}")
print("-" * 40)
# Simulate the combined extraction process
if extracted_contents:
print("\nSimulating combined extraction for LLM:")
combined_content = []
for i, content in enumerate(extracted_contents):
file_name = Path(test_files[i]).name
combined_content.append(f"=== {file_name} ===\n{content}\n")
full_content = "\n".join(combined_content)
print(f"Total combined content: {len(full_content)} characters")
print(f"Combined preview:\n{full_content[:500]}...")
print("\nā Multiple file extraction simulation successful!")
else:
print("\nā No content extracted from any files")
Return Value
This function does not return any value (implicitly returns None). It performs its operations through side effects, primarily printing test results and extraction status to the console. The function may exit early if test files are not found.
Dependencies
ossyspathlibdocument_extractor
Required Imports
import os
import sys
from pathlib import Path
from document_extractor import DocumentExtractor
Usage Example
import os
import sys
from pathlib import Path
from document_extractor import DocumentExtractor
# Ensure test files exist before running
os.makedirs('test_files', exist_ok=True)
# Create sample test files
with open('test_files/previous_report_1.md', 'w') as f:
f.write('# Previous Report 1\n\nThis is a sample markdown report.')
with open('test_files/previous_report_2.txt', 'w') as f:
f.write('Previous Report 2\n\nThis is a sample text report.')
# Run the test
test_multiple_files()
# Expected output:
# Multiple Files Previous Reports Test
# ==================================================
# Testing with 2 files:
# - test_files/previous_report_1.md
# - test_files/previous_report_2.txt
# ...
# ā Multiple file extraction simulation successful!
Best Practices
- Ensure the test_files directory and required test files exist before calling this function
- The function expects specific file paths; modify the test_files list if testing with different files
- This is a test function and should not be used in production code; it's designed for development and validation
- The function prints directly to console; consider redirecting output or capturing it if running in automated test suites
- Error handling is basic; the function continues processing remaining files even if one fails
- The function performs early return if test files don't exist, so always verify file paths before execution
- Consider wrapping this function in a proper test framework (pytest, unittest) for better integration with CI/CD pipelines
Tags
Similar Components
AI-powered semantic similarity - components with related functionality:
-
function test_document_extractor 81.5% similar
-
function test_mixed_previous_reports 79.4% similar
-
class DocumentExtractor 62.9% similar
-
function test_attendee_extraction 56.9% similar
-
function test_attendee_extraction_comprehensive 56.5% similar