function msg_to_eml

Maturity: 47

Converts Microsoft Outlook .msg files to standard .eml format, preserving email headers, body content (plain text and HTML), and attachments.

File: /tf/active/vicechatdev/msg_to_eml.py
Lines: 43 - 150
Complexity: moderate

Purpose

This function converts a proprietary Microsoft Outlook .msg file into the standard .eml format. It copies the From, To, Cc, Bcc, Subject, Date, and Message-ID headers, the message body (plain text, plus HTML when available), and file attachments. Missing or malformed data is handled with fallbacks: a missing date, or one that is not a datetime-like object, becomes the current local time; empty attachments are skipped with a warning; an attachment with no detectable MIME type is written as application/octet-stream; and a failure on one attachment is logged without aborting the conversion. Any other exception is logged with a traceback and the function returns False. It is useful for email migration, archival, or integration with email systems that do not accept the .msg format.
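
As one example of these fallbacks, the Date header logic can be isolated into a small helper. This is a minimal sketch: the name date_header is hypothetical and does not appear in the original module, and it assumes the incoming value is either a datetime or something without a timestamp() method.

```python
from datetime import datetime, timezone
from email.utils import formatdate, parsedate_to_datetime


def date_header(value):
    """Return an RFC 2822 date string, falling back to the current time.

    Hypothetical helper mirroring the try/except in msg_to_eml.
    """
    try:
        # datetime objects expose timestamp(); strings and None do not
        return formatdate(value.timestamp(), localtime=True)
    except (AttributeError, TypeError):
        return formatdate(localtime=True)


sent = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
header = date_header(sent)
# The header is rendered in the local timezone but denotes the same instant
print(parsedate_to_datetime(header) == sent)
```

Because the fallback silently substitutes the conversion time, an archived message with a damaged date will appear to have been sent when it was converted.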

Source Code

def msg_to_eml(msg_path, eml_path):
    """Convert a .msg file to .eml format preserving all content and features"""
    try:
        # Check if input file exists
        if not os.path.exists(msg_path):
            logger.error(f"Input file not found: {msg_path}")
            return False
        
        # Load the .msg file
        logger.info(f"Opening .msg file: {msg_path}")
        msg = extract_msg.Message(msg_path)

        # Create a new EmailMessage object
        eml = EmailMessage()

        # Fill the basic headers
        eml['From'] = parse_email_address(msg.sender)
        eml['To'] = parse_email_address(msg.to)
        if msg.cc:
            eml['Cc'] = parse_email_address(msg.cc)
        if hasattr(msg, 'bcc') and msg.bcc:
            eml['Bcc'] = parse_email_address(msg.bcc)
        eml['Subject'] = msg.subject or ''
        
        # Add date header
        if hasattr(msg, 'date') and msg.date:
            try:
                eml['Date'] = formatdate(msg.date.timestamp(), localtime=True)
            except (AttributeError, TypeError):
                # Fallback to current date if there's an issue with the date format
                eml['Date'] = formatdate(localtime=True)
        else:
            eml['Date'] = formatdate(localtime=True)
            
        # Add message ID and other headers if available
        if hasattr(msg, 'message_id') and msg.message_id:
            eml['Message-ID'] = msg.message_id

        # Handle body content - prefer HTML if available
        body_text = msg.body or ''
        html_body = None
        
        # Properly handle HTML body extraction
        if hasattr(msg, 'htmlBody') and msg.htmlBody:
            html_body = msg.htmlBody
        elif hasattr(msg, 'html') and msg.html:
            html_body = msg.html
        
        # Always set the plain-text part first; add HTML as an alternative if present
        eml.set_content(body_text, subtype='plain')
        if html_body:
            if isinstance(html_body, bytes):
                # Bytes content requires an explicit maintype and subtype
                eml.add_alternative(html_body, maintype='text', subtype='html')
            else:
                # str content is always text/*, so only the subtype is accepted
                eml.add_alternative(html_body, subtype='html')

        # Handle attachments
        logger.info(f"Processing {len(msg.attachments)} attachments")
        for attachment in msg.attachments:
            try:
                # Get filename (prefer long name if available)
                filename = None
                if hasattr(attachment, 'longFilename') and attachment.longFilename:
                    filename = attachment.longFilename
                elif hasattr(attachment, 'shortFilename') and attachment.shortFilename:
                    filename = attachment.shortFilename
                else:
                    filename = 'attachment'
                    
                # Get attachment data
                data = attachment.data
                if not data:
                    logger.warning(f"Skipping empty attachment: {filename}")
                    continue

                # Determine content type
                content_type = None
                if hasattr(attachment, 'mimetype') and attachment.mimetype:
                    content_type = attachment.mimetype
                else:
                    # Guess MIME type from filename
                    content_type, _ = mimetypes.guess_type(filename)
                    
                if content_type:
                    maintype, subtype = content_type.split('/', 1)
                else:
                    maintype, subtype = 'application', 'octet-stream'

                # Add the attachment with explicit maintype and subtype
                eml.add_attachment(data, maintype=maintype, subtype=subtype, filename=filename)
                logger.info(f"Added attachment: {filename} ({maintype}/{subtype})")
                
            except Exception as e:
                logger.error(f"Error processing attachment: {str(e)}")
                # Continue with next attachment even if this one fails

        # Write the EML file
        with open(eml_path, 'wb') as f:
            f.write(eml.as_bytes())

        # Release the handle that extract_msg keeps on the source file
        msg.close()

        logger.info(f"Successfully converted '{msg_path}' to '{eml_path}'")
        return True
        
    except Exception as e:
        logger.error(f"Error converting {msg_path} to EML: {str(e)}")
        # Print more detailed error information for debugging
        logger.error(traceback.format_exc())
        return False
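
The content-type decision in the attachment loop can be read as a three-step fallback: declared type, then a guess from the filename, then application/octet-stream. The sketch below isolates that logic. The name split_content_type is hypothetical, and unlike the loop above it also falls back when a declared type contains no slash, rather than raising and skipping the attachment.

```python
import mimetypes


def split_content_type(declared, filename):
    """Return (maintype, subtype) for an attachment.

    Hypothetical helper: prefers the declared MIME type, then a guess
    based on the filename, then a generic binary type.
    """
    content_type = declared or mimetypes.guess_type(filename)[0]
    if content_type and "/" in content_type:
        maintype, subtype = content_type.split("/", 1)
        return maintype, subtype
    return "application", "octet-stream"


print(split_content_type("image/png", "scan.bin"))
print(split_content_type(None, "report.pdf"))
print(split_content_type(None, "attachment"))
```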

Parameters

Name      Type  Default  Kind
msg_path  -     -        positional_or_keyword
eml_path  -     -        positional_or_keyword

Parameter Details

msg_path: String path to the input .msg file to be converted. Must be a valid file path pointing to an existing Microsoft Outlook .msg file. The function checks for file existence before processing.

eml_path: String path where the output .eml file will be saved. Should include the desired filename with the .eml extension. The parent directory must already exist and be writable; the function does not create it. If the file exists, it will be overwritten.
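
Since the function opens eml_path directly for writing, a caller may want to derive the output path and create its directory beforehand. A minimal sketch, assuming the output should keep the input's base name; default_eml_path is a hypothetical helper, not part of the original module.

```python
from pathlib import Path


def default_eml_path(msg_path, out_dir):
    """Build '<out_dir>/<name>.eml' for a .msg path, creating out_dir if needed.

    Hypothetical helper for preparing the eml_path argument.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / Path(msg_path).with_suffix(".eml").name


# Example call (creates the 'converted' directory if it is missing):
# eml_path = default_eml_path("inbox/report.msg", "converted")
# msg_to_eml("inbox/report.msg", str(eml_path))
```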

Return Value

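Returns a boolean value: True if the conversion completed and the .eml file was written, False if the input file was not found or any exception was raised during conversion (parsing errors, write errors, etc.). A True result does not guarantee that every attachment was included: empty attachments are skipped and per-attachment errors are logged without failing the conversion. Errors are logged via the logger object.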
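
Because True only indicates that a file was written, it can be worth parsing the result back to confirm what it contains. The sketch below uses only the standard library; summarize_eml is a hypothetical name, and the sample message is built in place purely to demonstrate the check.

```python
import os
import tempfile
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def summarize_eml(eml_path):
    """Parse an .eml file and report its subject, HTML presence, and attachments."""
    with open(eml_path, "rb") as f:
        parsed = BytesParser(policy=policy.default).parse(f)
    return {
        "subject": str(parsed.get("Subject", "")),
        "has_html": parsed.get_body(preferencelist=("html",)) is not None,
        "attachments": [part.get_filename() for part in parsed.iter_attachments()],
    }


# Build a small sample message shaped like the converter's output
sample = EmailMessage()
sample["Subject"] = "Quarterly report"
sample.set_content("plain body")
sample.add_alternative("<p>html body</p>", subtype="html")
sample.add_attachment(b"\x00\x01", maintype="application",
                      subtype="octet-stream", filename="data.bin")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.eml")
    with open(path, "wb") as f:
        f.write(sample.as_bytes())
    summary = summarize_eml(path)

print(summary)
```

Comparing the attachment list against the count logged during conversion is one way to notice attachments that were skipped.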

Dependencies

  • extract_msg
  • os
  • mimetypes
  • logging
  • email
  • traceback

Required Imports

import extract_msg
import os
import mimetypes
import logging
import traceback
from email.message import EmailMessage
from email.utils import formatdate

Usage Example

import logging
import os
from email.message import EmailMessage
from email.utils import formatdate
import extract_msg
import mimetypes
import traceback

# Setup logger
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# Define parse_email_address helper function
def parse_email_address(address):
    """Helper to parse email addresses"""
    if not address:
        return ''
    return address

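# Convert a .msg file to .eml.
# This assumes msg_to_eml is defined earlier in this same module: it looks up
# logger and parse_email_address as module-level globals.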
msg_file = 'path/to/email.msg'
eml_file = 'path/to/output.eml'

success = msg_to_eml(msg_file, eml_file)

if success:
    print(f'Successfully converted {msg_file} to {eml_file}')
else:
    print('Conversion failed. Check logs for details.')
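
The pass-through parse_email_address above is a placeholder. Outlook often separates multiple recipients with semicolons, which standard email headers do not use as a separator. The sketch below shows one possible normalization with the standard library. It is an assumption about what a fuller helper might do, not the original module's implementation, and it does not handle unquoted display names that themselves contain commas or semicolons.

```python
from email.utils import formataddr, getaddresses


def normalize_recipients(raw):
    """Turn an Outlook-style recipient string into a comma-separated header value.

    Hypothetical stand-in for parse_email_address.
    """
    if not raw:
        return ""
    # Treat semicolons as separators, then let the stdlib split the list
    pairs = getaddresses([raw.replace(";", ",")])
    # Drop entries where no address could be extracted
    return ", ".join(formataddr(pair) for pair in pairs if pair[1])


print(normalize_recipients("John Doe <john@example.com>; jane@example.com"))
```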

Best Practices

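  • Define 'parse_email_address' in the same module as msg_to_eml; the function looks the helper up as a module-level global, so defining it only in the calling script is not enough when msg_to_eml is imported from another file
  • Configure logging before calling the function; it writes to a module-level 'logger' object, which must also exist in the same module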
  • Verify that the input .msg file is not corrupted and is a valid Microsoft Outlook message file
  • Ensure sufficient disk space is available for the output .eml file, especially when dealing with large attachments
  • The function handles missing or malformed data gracefully with fallbacks, but review logs for warnings about skipped content
  • For batch conversions, check the boolean return value of each call; the function catches its own exceptions and returns False, so one failed file will not stop the loop, but it will go unnoticed unless the result is checked
  • The function preserves both HTML and plain text versions of emails when available, which is best practice for email compatibility
  • The attachment's own 'mimetype' attribute is used when present; otherwise the type is guessed from the filename with mimetypes.guess_type, falling back to 'application/octet-stream' if the guess fails
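
The batch-conversion advice can be sketched as a small driver that records which files failed. The converter is passed in as a parameter so the loop can be exercised without extract_msg installed; convert_directory is a hypothetical name, and in real use the convert argument would be msg_to_eml.

```python
from pathlib import Path


def convert_directory(src_dir, dst_dir, convert):
    """Run convert(msg_path, eml_path) for every .msg file in src_dir.

    Returns (succeeded, failed) lists of source filenames. The convert
    callable is expected to return True or False, as msg_to_eml does.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    succeeded, failed = [], []
    for msg_path in sorted(Path(src_dir).glob("*.msg")):
        eml_path = dst / msg_path.with_suffix(".eml").name
        ok = convert(str(msg_path), str(eml_path))
        (succeeded if ok else failed).append(msg_path.name)
    return succeeded, failed


# Example call, assuming msg_to_eml is available in this module:
# done, errors = convert_directory("inbox", "converted", msg_to_eml)
```

Collecting the failures rather than stopping at the first one lets a long run finish and leaves a list of files to inspect against the log afterwards.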

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function msg_to_eml_alternative 90.4% similar

    Converts Microsoft Outlook .msg files to .eml (email) format using extract_msg library, with support for headers, body content (plain text and HTML), and attachments.

    From: /tf/active/vicechatdev/msg_to_eml.py
  • function msg_to_pdf_improved 85.4% similar

    Converts a Microsoft Outlook .msg file to PDF format using EML as an intermediate format for improved reliability, with fallback to direct conversion if needed.

    From: /tf/active/vicechatdev/msg_to_eml.py
  • function msg_to_pdf 81.9% similar

    Converts a Microsoft Outlook .msg email file to a single PDF document, including the email body and all attachments merged together.

    From: /tf/active/vicechatdev/msg_to_eml.py
  • function generate_html_from_msg 67.2% similar

    Converts an email message object into a formatted HTML representation with styling, headers, body content, and attachment information.

    From: /tf/active/vicechatdev/msg_to_eml.py
  • function generate_simple_html_from_eml 65.3% similar

    Converts an email.message.Message object into a clean, styled HTML representation with embedded inline images and attachment listings.

    From: /tf/active/vicechatdev/msg_to_eml.py