merge_pdfs_v1 - Code Extractor

function merge_pdfs_v1

Maturity: 49

Merges multiple PDF files into a single output PDF, using PyMuPDF when it is installed and PyPDF2 otherwise. Returns the output path on success and None on failure.

File:
/tf/active/vicechatdev/msg_to_eml.py

Lines:
412 - 474

Complexity:
moderate

Purpose

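This function combines multiple PDF files into one consolidated PDF document. It filters out input paths that do not exist or point to empty files, then merges the remaining files with PyMuPDF (fitz). PyPDF2 is used as a fallback only when fitz cannot be imported; a runtime failure inside PyMuPDF does not trigger the fallback. A single valid input is simply copied to the output path. If an individual PDF fails to merge, the function logs a warning, skips that file, and continues with the rest. Note that the source defines the function as merge_pdfs, although this page lists it as merge_pdfs_v1.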

Source Code

def merge_pdfs(input_paths, output_path):
    """Merge multiple PDF files with better error handling"""
    try:
        # Filter out non-existent and empty (0-byte) files
        valid_paths = [path for path in input_paths if os.path.exists(path) and os.path.getsize(path) > 0]
        
        if not valid_paths:
            logger.error("No valid PDF files to merge")
            return None
            
        if len(valid_paths) == 1:
            # Just copy the single file
            shutil.copy2(valid_paths[0], output_path)
            return output_path
        
        # Try PyMuPDF first, as it's commonly used and more robust
        try:
            import fitz
            
            # Create output PDF
            output_pdf = fitz.open()
            
            # Add each input PDF
            for input_path in valid_paths:
                try:
                    # Context manager closes the input document after insertion
                    with fitz.open(input_path) as pdf:
                        output_pdf.insert_pdf(pdf)
                except Exception as e:
                    logger.warning(f"Problem with PDF {input_path}: {str(e)}")
                    continue
            
            # Save merged PDF
            output_pdf.save(output_path)
            output_pdf.close()
            
            return output_path
            
        except ImportError:
            # Fall back to using PyPDF2
            try:
                from PyPDF2 import PdfMerger
                
                merger = PdfMerger()
                
                for input_path in valid_paths:
                    try:
                        merger.append(input_path)
                    except Exception as e:
                        logger.warning(f"Problem with PDF {input_path}: {str(e)}")
                        continue
                
                merger.write(output_path)
                merger.close()
                
                return output_path
            except ImportError:
                logger.error("No PDF merging library available. Install PyMuPDF or PyPDF2.")
                return None
            
    except Exception as e:
        logger.error(f"Error merging PDFs: {str(e)}")
        logger.error(traceback.format_exc())
        return None

Parameters

Name	Type	Default	Kind
`input_paths`	-	-	positional_or_keyword
`output_path`	-	-	positional_or_keyword

Parameter Details

input_paths: A list or iterable of file paths (strings) pointing to PDF files to be merged. The function will filter out non-existent files and empty files (size 0 bytes) automatically. Order in the list determines the order in the merged output.
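Because the filtering is silent, a caller may want to know in advance which inputs would be dropped. The following is a minimal sketch that applies the same existence and size test as the function; `split_valid_paths` is a hypothetical helper name and is not part of the original module.

```python
import os


def split_valid_paths(input_paths):
    """Return (valid, skipped) using the same test as merge_pdfs."""
    valid, skipped = [], []
    for path in input_paths:
        # Same condition as the list comprehension in merge_pdfs:
        # the file must exist and be larger than 0 bytes.
        if os.path.exists(path) and os.path.getsize(path) > 0:
            valid.append(path)
        else:
            skipped.append(path)
    return valid, skipped
```

Calling this helper before the merge lets the application report skipped files itself instead of discovering them from a shorter merged document.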

output_path: A string giving the file path where the merged PDF should be saved, including the filename and .pdf extension. The parent directory must already exist and be writable; the function does not create it.
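Since the function does not create missing directories, one option is to prepare the parent directory before calling it. This is a minimal sketch using only the standard library; `ensure_parent_dir` is a hypothetical helper name, not something the original module provides.

```python
import os


def ensure_parent_dir(output_path):
    """Create the parent directory of output_path if it is missing."""
    # abspath handles bare filenames, whose dirname would otherwise be "".
    parent = os.path.dirname(os.path.abspath(output_path))
    os.makedirs(parent, exist_ok=True)
    return parent
```

With `exist_ok=True` the call is safe to repeat, so it can run unconditionally before every merge.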

Return Value

Returns the output_path (string) if the merge operation succeeds, or None if the operation fails (no valid input files, missing libraries, or other errors). The returned path confirms the location of the successfully created merged PDF.

Dependencies

os
shutil
traceback
fitz (PyMuPDF)
PyPDF2
logging

Required Imports

import os
import shutil
import traceback
import logging

Conditional/Optional Imports

These imports are only needed under specific conditions:

import fitz

Condition: Primary PDF merging library (PyMuPDF). Used first if available. Install with: pip install PyMuPDF

Optional

from PyPDF2 import PdfMerger

Condition: Fallback PDF merging library. Used only if PyMuPDF (fitz) is not available. Install with: pip install PyPDF2

Optional
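To find out ahead of time which library the function would pick, the import order can be mirrored without actually importing either package. This is a sketch under assumptions: `available_pdf_backend` is a hypothetical helper, and it only checks that a module can be located, not that it imports cleanly or exposes `PdfMerger`.

```python
import importlib.util


def available_pdf_backend():
    """Return the module name merge_pdfs would try first, or None."""
    # Checked in the same order as the function: PyMuPDF (fitz), then PyPDF2.
    for module_name in ("fitz", "PyPDF2"):
        if importlib.util.find_spec(module_name) is not None:
            return module_name
    return None
```

A None result corresponds to the case where the function logs "No PDF merging library available" and returns None.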

Usage Example

import os
import shutil
import traceback
import logging

# Setup logger
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# Install dependencies first:
# pip install PyMuPDF
# or
# pip install PyPDF2

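# Paste the merge_pdfs definition from the Source Code section here.
# The stub below is only a placeholder: as written it returns None,
# so the check further down would report failure.
def merge_pdfs(input_paths, output_path):
    pass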

# Example usage
input_files = ['document1.pdf', 'document2.pdf', 'document3.pdf']
output_file = 'merged_output.pdf'

result = merge_pdfs(input_files, output_file)

if result:
    print(f'Successfully merged PDFs to: {result}')
else:
    print('Failed to merge PDFs')

Best Practices

Ensure at least one PDF library (PyMuPDF or PyPDF2) is installed before calling this function
Always check the return value: None indicates failure, a path string indicates that an output file was written
The function logs a warning for each PDF that fails to merge and continues with the remaining files, so a returned path does not guarantee that every input was included
Input files are validated automatically: non-existent or empty files are filtered out without raising an error
When only one valid input remains, the function copies it to the output path instead of merging
PyMuPDF (fitz) is tried first; PyPDF2 is used only when importing fitz raises ImportError
A module-level `logger` must be defined in the module that contains the function, because the function references it as a global
The output directory must exist before calling this function
The function catches exceptions internally and returns None, so application-level handling should branch on the return value and consult the log for details
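Because per-file failures are reported only through log warnings, a caller that needs to detect a partial merge can capture those warnings with a temporary logging handler. The sketch below uses only the standard `logging` module. `WarningCollector` is a hypothetical class, and the logger name "msg_to_eml" is an assumption, since the source does not show how `logger` is created.

```python
import logging


class WarningCollector(logging.Handler):
    """Collect WARNING-and-above messages emitted while attached."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


# Assumed logger name; use whichever logger merge_pdfs actually writes to.
logger = logging.getLogger("msg_to_eml")
collector = WarningCollector()
logger.addHandler(collector)
try:
    # A real call would go here: result = merge_pdfs(input_files, output_file)
    # The line below stands in for a warning the function might log.
    logger.warning("Problem with PDF broken.pdf: example")
finally:
    logger.removeHandler(collector)

# True when at least one warning (for example, a skipped file) was logged.
skipped_something = bool(collector.messages)
```

If `skipped_something` is true after a real call, the merged file exists but may be missing one or more inputs.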

Similar Components

AI-powered semantic similarity - components with related functionality:

function merge_pdfs 72.0% similar

Merges multiple PDF files into a single consolidated PDF document by delegating to a PDFManipulator instance.
From: /tf/active/vicechatdev/CDocs/utils/pdf_utils.py

class DocumentMerger 69.8% similar

A class that merges PDF documents with audit trail pages, combining an original PDF with an audit page and updating metadata to reflect the audit process.
From: /tf/active/vicechatdev/document_auditor/src/document_merger.py

class MultiPagePDFProcessor 53.9% similar

A class for processing multi-page PDF documents with context-aware analysis, OCR, and summarization capabilities.
From: /tf/active/vicechatdev/e-ink-llm/multi_page_processor.py

function test_enhanced_pdf_processing 51.0% similar

A comprehensive test function that validates PDF processing capabilities, including text extraction, cleaning, chunking, and table detection across multiple PDF processing libraries.
From: /tf/active/vicechatdev/vice_ai/test_enhanced_pdf.py

function eml_to_pdf 50.2% similar

Converts an .eml email file to PDF format, including the email body and all attachments merged into a single PDF document.
From: /tf/active/vicechatdev/msg_to_eml.py
