function generate_html_from_msg

Maturity: 51

Converts an email message object into a formatted HTML representation with styling, headers, body content, and attachment information.

File: /tf/active/vicechatdev/msg_to_eml.py
Lines: 476 - 645
Complexity: moderate

Purpose

This function generates a complete, styled HTML document from an email message object (typically one produced by the extract_msg library). It renders the HTML body when one is available and falls back to the plain-text body otherwise, includes email metadata (subject, sender, recipients, date), and lists attachment filenames. In the plain-text path it converts URLs to hyperlinks and preserves line breaks. The output is suitable for viewing in web browsers or converting to other formats like PDF.
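
As a rough illustration of the HTML-versus-plain-text handling described above, the sketch below isolates the body lookup order. The helper name pick_body and the SimpleNamespace stand-ins are hypothetical and exist only for this example; the real function performs the lookup inline and reads msg.body directly.

```python
from types import SimpleNamespace


def pick_body(msg):
    """Return ("html", content) or ("text", content) for a message-like object.

    Hypothetical helper: it follows the lookup order used in the source code
    (htmlBody, then html, then body) but is not part of msg_to_eml.py.
    """
    for attr in ("htmlBody", "html"):
        content = getattr(msg, attr, None)
        if content:
            return "html", content
    # Fall back to plain text, with the same placeholder the function uses
    return "text", getattr(msg, "body", None) or "(No content)"


# Stand-in objects; a real caller would pass an extract_msg message instead
html_mail = SimpleNamespace(htmlBody=b"<p>Hi</p>", body="Hi")
text_mail = SimpleNamespace(htmlBody=None, body="Hi")
empty_mail = SimpleNamespace(body=None)

print(pick_body(html_mail))
print(pick_body(text_mail))
print(pick_body(empty_mail))
```

An empty or None HTML body counts as absent, so a message with htmlBody=None still renders through the plain-text path.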

Source Code

def generate_html_from_msg(msg, include_headers=True):
    """Generate HTML representation of email message with cleaner formatting"""
    html_parts = []
    
    # Add CSS styling with more robust formatting
    html_parts.append("""
    <html>
    <head>
    <meta charset="utf-8">
    <style>
        body { 
            font-family: Arial, sans-serif; 
            line-height: 1.6; 
            color: #333; 
            max-width: 800px;
            margin: 0 auto;
            padding: 20px;
        }
        .header { 
            background-color: #f5f5f5; 
            padding: 15px; 
            border-bottom: 1px solid #ddd; 
            margin-bottom: 20px; 
            border-radius: 4px;
        }
        .header h1 { 
            margin: 0 0 10px 0; 
            padding: 0; 
            color: #444; 
            font-size: 22px;
            font-weight: bold;
        }
        .meta { 
            margin: 10px 0; 
            color: #666; 
            font-size: 14px;
        }
        .meta p {
            margin: 5px 0;
        }
        .meta strong { 
            color: #333; 
        }
        .body { 
            padding: 10px 0; 
            border-top: 1px solid #eee;
        }
        .attachments { 
            margin-top: 20px; 
            padding-top: 10px; 
            border-top: 1px solid #eee;
        }
        .attachment { 
            background-color: #f9f9f9; 
            padding: 8px; 
            margin-bottom: 5px; 
            border-left: 3px solid #ccc;
        }
        pre {
            white-space: pre-wrap;
            font-family: inherit;
            margin: 0;
        }
        blockquote {
            border-left: 3px solid #ddd;
            padding-left: 10px;
            color: #555;
            margin: 10px 0 10px 20px;
        }
    </style>
    </head>
    <body>
    """)
    
    # Header section with email metadata
    if include_headers:
        # Escape header values so text such as "Name <addr@example.com>" is shown literally
        import html
        html_parts.append("<div class='header'>")
        html_parts.append(f"<h1>{html.escape(str(msg.subject or '(No Subject)'))}</h1>")
        html_parts.append("<div class='meta'>")
        html_parts.append(f"<p><strong>From:</strong> {html.escape(str(msg.sender))}</p>")
        html_parts.append(f"<p><strong>To:</strong> {html.escape(str(msg.to))}</p>")
        
        if msg.cc:
            html_parts.append(f"<p><strong>CC:</strong> {html.escape(str(msg.cc))}</p>")
        
        if hasattr(msg, 'date') and msg.date:
            html_parts.append(f"<p><strong>Date:</strong> {html.escape(str(msg.date))}</p>")
        
        html_parts.append("</div>") # Close meta
        html_parts.append("</div>") # Close header
    
    # Body content - prefer HTML if available
    html_parts.append(f"<div class='body'>")
    
    # Get HTML body content if available, otherwise use plain text
    body_html = None
    if hasattr(msg, 'htmlBody') and msg.htmlBody:
        body_html = msg.htmlBody
    elif hasattr(msg, 'html') and msg.html:
        body_html = msg.html
    
    if body_html:
        # Clean up HTML body - ensuring proper string type
        if isinstance(body_html, bytes):
            try:
                clean_html = body_html.decode('utf-8')
            except UnicodeDecodeError:
                # latin-1 maps every byte to a character, so this fallback cannot fail
                clean_html = body_html.decode('latin-1')
        else:
            clean_html = str(body_html)
            
        # Clean up HTML content - replace problematic tags and attributes
        import re
        
        # Replace charset if needed and replace problematic elements
        clean_html = clean_html.replace('charset="us-ascii"', 'charset="utf-8"')
        
        # Remove potentially problematic CSS that might mess up rendering
        clean_html = re.sub(r'<style[^>]*>.*?</style>', '', clean_html, flags=re.DOTALL | re.IGNORECASE)
        
        # Simplify complex tables if present (case-insensitive, like the check)
        if '<table' in clean_html.lower():
            clean_html = re.sub(r'<table[^>]*>', '<table border="1" cellpadding="4" style="border-collapse:collapse">', clean_html, flags=re.IGNORECASE)
        
        # Wrap fragments that have no <body> tag in a <div> container
        if '<body' not in clean_html.lower():
            clean_html = f"<div>{clean_html}</div>"
            
        html_parts.append(clean_html)
    else:
        # Convert plain text to HTML with proper line breaks and formatting
        import html
        import re
        body_text = msg.body or '(No content)'
        
        # Split on URLs so text and links can be escaped separately;
        # with one capturing group, odd-indexed parts are the matched URLs
        url_pattern = r'(https?://[^\s<>"]+|www\.[^\s<>"]+)'
        parts = re.split(url_pattern, body_text)
        for i, part in enumerate(parts):
            if i % 2:
                # Bare www. links need a scheme to work as absolute hrefs
                href = part if part.startswith('http') else f'http://{part}'
                parts[i] = f'<a href="{html.escape(href)}">{html.escape(part, quote=False)}</a>'
            else:
                parts[i] = html.escape(part, quote=False)
        body_text = ''.join(parts)
        
        # Preserve line breaks properly
        body_text = body_text.replace('\r\n', '\n').replace('\n', '<br>\n')
        
        html_parts.append(f"<pre>{body_text}</pre>")
    
    html_parts.append(f"</div>") # Close body
    
    # Add attachment info section
    if len(msg.attachments) > 0:
        html_parts.append(f"<div class='attachments'>")
        html_parts.append(f"<h2>Attachments ({len(msg.attachments)})</h2>")
        
        for attachment in msg.attachments:
            filename = getattr(attachment, 'longFilename', None) or getattr(attachment, 'shortFilename', None) or 'attachment'
            html_parts.append(f"<div class='attachment'>")
            html_parts.append(f"<p><strong>{filename}</strong></p>")
            html_parts.append(f"</div>")
            
        html_parts.append(f"</div>") # Close attachments
    
    html_parts.append("</body></html>")
    
    return "\n".join(html_parts)

Parameters

Name             Type  Default  Kind
msg              -     -        positional_or_keyword
include_headers  -     True     positional_or_keyword

Parameter Details

msg: An email message object (typically from the extract_msg library). The function reads subject, sender, to, cc, body, and attachments directly, so these attributes must exist, although cc and body may be None or empty. date, htmlBody, and html are optional and are checked with hasattr() before use. attachments must support len() and iteration.
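
To check a message-like object against these expectations before calling the function, something along the following lines may help. It is a minimal sketch: missing_attributes and the two attribute tuples are hypothetical names introduced here, and the stand-in object is built with SimpleNamespace instead of extract_msg.

```python
from types import SimpleNamespace

# Attributes the source code reads without a hasattr() guard
HEADER_ATTRS = ("subject", "sender", "to", "cc")  # read only when include_headers=True
ALWAYS_ATTRS = ("attachments",)                   # read on every call


def missing_attributes(msg, include_headers=True):
    """List attribute names whose absence would raise AttributeError."""
    needed = ALWAYS_ATTRS + (HEADER_ATTRS if include_headers else ())
    missing = [name for name in needed if not hasattr(msg, name)]
    # body is only read when neither htmlBody nor html holds content
    has_html = bool(getattr(msg, "htmlBody", None) or getattr(msg, "html", None))
    if not has_html and not hasattr(msg, "body"):
        missing.append("body")
    return missing


# Stand-in message that lacks the cc attribute
stub = SimpleNamespace(
    subject="Hi",
    sender="a@example.com",
    to="b@example.com",
    body="text",
    attachments=[],
)

print(missing_attributes(stub))
print(missing_attributes(stub, include_headers=False))
```

An empty list means the call should not fail on attribute access; it says nothing about whether the values themselves are sensible.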

include_headers: Boolean flag (default: True) that controls whether to include the email header section (subject, from, to, cc, date) in the generated HTML. Set to False to omit the header block; the body and the attachment list are still generated.

Return Value

Returns a str containing a complete HTML document with embedded CSS styling. The document includes a header section with email metadata (if include_headers=True), the email body content (either the cleaned HTML body or formatted plain text), and, when the message has attachments, a section listing their filenames. The document declares charset utf-8 in its <meta> tag, so encode it as UTF-8 when writing it to disk.
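
For the formatted plain-text case, one way to combine HTML escaping with link conversion so that the two steps do not interfere is sketched below. The helper name linkify_plain_text is hypothetical; the URL pattern is the one used in the source code, and prefixing bare www. links with http:// is an assumption of this sketch.

```python
import html
import re

# Same URL pattern as the source code
URL_PATTERN = re.compile(r'(https?://[^\s<>"]+|www\.[^\s<>"]+)')


def linkify_plain_text(text):
    """Escape plain text and wrap URLs in anchors (hypothetical helper)."""
    # With one capturing group, split() alternates text, URL, text, URL, ...
    parts = URL_PATTERN.split(text)
    out = []
    for index, part in enumerate(parts):
        if index % 2:
            # Assumption: bare www. links get an http:// scheme
            href = part if part.startswith("http") else "http://" + part
            out.append('<a href="{}">{}</a>'.format(
                html.escape(href), html.escape(part, quote=False)))
        else:
            out.append(html.escape(part, quote=False))
    return "".join(out)


print(linkify_plain_text("See https://example.com/a?x=1&y=2 <now>"))
# prints: See <a href="https://example.com/a?x=1&amp;y=2">https://example.com/a?x=1&amp;y=2</a> &lt;now&gt;
```

Escaping each piece separately keeps the generated anchor tags intact while still neutralizing angle brackets and ampersands in the surrounding text.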

Dependencies

  • extract_msg
  • html
  • re

Required Imports

import html
import re

Conditional/Optional Imports

These imports are only needed under specific conditions:

import extract_msg

Condition: Required to create the msg object that is passed as parameter

Required (conditional)

Usage Example

import extract_msg

# Assumes msg_to_eml.py (the file listed above) is importable from the working directory
from msg_to_eml import generate_html_from_msg

# Load an email message from .msg file
msg = extract_msg.Message('path/to/email.msg')
try:
    # Generate HTML with headers
    html_output = generate_html_from_msg(msg, include_headers=True)

    # Save to file
    with open('email_output.html', 'w', encoding='utf-8') as f:
        f.write(html_output)

    # Or generate without the header block (body and attachment list only)
    html_body_only = generate_html_from_msg(msg, include_headers=False)
finally:
    # Clean up, even if generation fails
    msg.close()

Best Practices

  • Make sure the msg object exposes subject, sender, to, cc, body, and attachments before calling this function; they are read without a hasattr() guard
  • Bodies supplied as bytes are decoded as UTF-8 first and latin-1 second. latin-1 accepts any byte sequence, so decoding never raises, but text in other encodings may come out garbled; decode such bodies yourself before calling
  • The function handles both HTML and plain text email bodies automatically, preferring HTML (htmlBody, then html) when available
  • For plain text emails, http(s):// and www. URLs are wrapped in anchor tags
  • The generated HTML uses a centered layout with a max-width of 800px for better readability
  • Attachment filenames are listed, but the attachment files themselves are not embedded in the HTML
  • Only <style> blocks are stripped from original HTML emails; inline style attributes are kept, and each opening <table> tag is replaced with a plain bordered one
  • Always close the msg object after use to free resources: msg.close()
  • cc, date, and body may be None or empty, and date may be absent altogether; a missing subject renders as "(No Subject)", while a sender or to value of None renders as the literal text "None"
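
The decoding and style-stripping behaviour noted in the list above can be sketched in isolation as follows. Both helper names are hypothetical, and this sketch matches style tags case-insensitively, which is a choice made here for illustration.

```python
import re

# Matches whole <style>...</style> blocks, across lines and in any letter case
STYLE_BLOCK = re.compile(r"<style[^>]*>.*?</style>", flags=re.DOTALL | re.IGNORECASE)


def decode_html_body(raw):
    """Return raw as str, trying UTF-8 first and latin-1 second."""
    if isinstance(raw, bytes):
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # latin-1 assigns a character to every byte value, so this cannot fail
            return raw.decode("latin-1")
    return str(raw)


def strip_style_blocks(markup):
    """Remove <style> blocks; inline style attributes are left alone."""
    return STYLE_BLOCK.sub("", markup)


# \xe9 is "é" in latin-1 but an invalid sequence here in UTF-8
raw_body = b"<STYLE>p{color:red}</STYLE><p style='x'>caf\xe9</p>"
cleaned = strip_style_blocks(decode_html_body(raw_body))
print(cleaned)
```

Because the latin-1 fallback always succeeds, a body in some other legacy encoding decodes without error but may display the wrong characters.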

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function generate_simple_html_from_eml 83.1% similar

    Converts an email.message.Message object into a clean, styled HTML representation with embedded inline images and attachment listings.

    From: /tf/active/vicechatdev/msg_to_eml.py
  • function msg_to_eml 67.2% similar

    Converts Microsoft Outlook .msg files to standard .eml format, preserving email headers, body content (plain text and HTML), and attachments.

    From: /tf/active/vicechatdev/msg_to_eml.py
  • function msg_to_eml_alternative 66.4% similar

    Converts Microsoft Outlook .msg files to .eml (email) format using extract_msg library, with support for headers, body content (plain text and HTML), and attachments.

    From: /tf/active/vicechatdev/msg_to_eml.py
  • function msg_to_pdf 65.6% similar

    Converts a Microsoft Outlook .msg email file to a single PDF document, including the email body and all attachments merged together.

    From: /tf/active/vicechatdev/msg_to_eml.py
  • function html_to_pdf 63.6% similar

    Converts HTML content to a PDF file using ReportLab with intelligent parsing of email-formatted HTML, including metadata extraction, body content processing, and attachment information.

    From: /tf/active/vicechatdev/msg_to_eml.py