function generate_simple_html_from_eml

Maturity: 45

Converts an email.message.Message object into a clean, styled HTML representation with embedded inline images and attachment listings.

File: /tf/active/vicechatdev/msg_to_eml.py
Lines: 994 - 1148
Complexity: moderate

Purpose

This function generates a comprehensive HTML view of an email message, preserving formatting, embedding inline images as base64 data URIs, and listing attachments. It handles both multipart and simple email structures, preferring HTML content when available but gracefully falling back to plain text. The output includes email headers (From, To, Cc, Date), subject, body content with inline images, and a list of attachments. This is useful for displaying emails in web interfaces, generating email previews, or creating standalone HTML archives of email messages.
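
The HTML-first, plain-text-fallback choice described above can be illustrated with the standard library alone. This is a minimal sketch, not the function's own code: it assumes the message was parsed with `policy=email.policy.default` (so it is an `EmailMessage`) and relies on `get_body()`. The helper name `pick_body` and the sample message are hypothetical.

```python
from email import message_from_string, policy

def pick_body(msg):
    """Return (subtype, text) for the preferred body part: HTML first, then plain text."""
    part = msg.get_body(preferencelist=("html", "plain"))
    if part is None:
        return None, ""
    # get_content() undoes the transfer encoding and charset for text parts
    return part.get_content_subtype(), part.get_content()

# Hypothetical two-part message offering both a plain and an HTML version
RAW = """\
From: sender@example.com
To: recipient@example.com
Subject: Demo
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/plain; charset="utf-8"

Plain version
--b1
Content-Type: text/html; charset="utf-8"

<p>HTML version</p>
--b1--
"""

msg = message_from_string(RAW, policy=policy.default)
kind, text = pick_body(msg)
print(kind, text.strip())
```

Swapping the order of `preferencelist` would make plain text win instead, which may suit contexts where rendering untrusted HTML is undesirable.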

Source Code

def generate_simple_html_from_eml(msg):
    """Generate cleaner, more reliable HTML from an email.message.Message object, including inline images."""
    import html
    import base64

    html_parts = []

    # Start with a clean, simple HTML template
    html_parts.append("""
    <!DOCTYPE html>
    <html>
    <head>
        <meta charset="utf-8">
        <style>
            body { 
                font-family: Arial, sans-serif; 
                line-height: 1.5;
                margin: 20px;
                color: #333;
            }
            .header {
                margin-bottom: 20px;
                padding-bottom: 10px;
                border-bottom: 1px solid #ddd;
            }
            .header h2 { 
                margin: 0 0 10px 0;
                color: #444;
            }
            .meta {
                margin: 10px 0;
                font-size: 14px;
            }
            .meta div { margin: 5px 0; }
            .meta strong { color: #333; }
            .body { padding: 10px 0; }
            .attachments {
                margin-top: 15px;
                padding-top: 10px;
                border-top: 1px solid #eee;
            }
            .attachment {
                background-color: #f5f5f5;
                padding: 8px;
                margin-bottom: 5px;
                border-left: 3px solid #ddd;
            }
        </style>
    </head>
    <body>
    """)

    # Add header with subject
    subject = msg.get('Subject', '(No Subject)')
    html_parts.append(f'<div class="header"><h2>{html.escape(subject)}</h2>')

    # Add metadata
    html_parts.append('<div class="meta">')
    for header in ['From', 'To', 'Cc', 'Date']:
        if msg.get(header):
            html_parts.append(f'<div><strong>{header}:</strong> {html.escape(msg.get(header, ""))}</div>')
    html_parts.append('</div></div>')

    # Add body content
    html_parts.append('<div class="body">')

    # Find the best part to display (prefer HTML, then text)
    body_html = None
    body_text = None
    cid_images = {}

    # Extract inline images (Content-ID based)
    if msg.is_multipart():
        for part in msg.walk():
            content_type = part.get_content_type()
            disposition = part.get('Content-Disposition', '')

            # Handle inline images
            if content_type.startswith('image') and 'inline' in disposition:
                cid = part.get('Content-ID', '').strip('<>')
                img_data = part.get_payload(decode=True)
                if img_data:
                    img_type = content_type.split('/', 1)[1]
                    img_b64 = base64.b64encode(img_data).decode('ascii')
                    cid_images[f'cid:{cid}'] = f'data:image/{img_type};base64,{img_b64}'

            # Skip attachments
            if 'attachment' in disposition:
                continue

            # Get the payload
            payload = part.get_payload(decode=True)
            if not payload:
                continue

            charset = part.get_content_charset() or 'utf-8'
            try:
                decoded_payload = payload.decode(charset, errors='replace')
            except (LookupError, ValueError):
                # Unknown or malformed charset label: fall back to UTF-8
                decoded_payload = payload.decode('utf-8', errors='replace')

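            # Keep the first HTML part, but continue walking: inline images
            # usually come after the HTML part (multipart/related layout),
            # so breaking here would leave cid_images empty
            if content_type == 'text/html' and body_html is None:
                body_html = decoded_payload
            elif content_type == 'text/plain' and not body_text:
                body_text = decoded_payload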
    else:
        # Not multipart, just get the payload
        payload = msg.get_payload(decode=True)
        if payload:
            charset = msg.get_content_charset() or 'utf-8'
            try:
                decoded_payload = payload.decode(charset, errors='replace')
            except (LookupError, ValueError):
                # Unknown or malformed charset label: fall back to UTF-8
                decoded_payload = payload.decode('utf-8', errors='replace')

            if msg.get_content_type() == 'text/html':
                body_html = decoded_payload
            else:
                body_text = decoded_payload

    # Use HTML content if available, otherwise convert plain text to HTML
    if body_html:
        # Replace inline image references with embedded base64 data
        for cid_url, data_url in cid_images.items():
            body_html = body_html.replace(f'src="{cid_url}"', f'src="{data_url}"')

        html_parts.append(body_html)
    elif body_text:
        # Escape the plain text; <pre> with pre-wrap already preserves line
        # breaks, so adding <br> as well would double the line spacing
        html_body = html.escape(body_text)
        html_parts.append(f'<pre style="white-space: pre-wrap; font-family: inherit;">{html_body}</pre>')
    else:
        html_parts.append('<p>(No content)</p>')

    html_parts.append('</div>')

    # Add attachment list
    attachments = []
    if msg.is_multipart():
        for part in msg.walk():
            if part.get_content_disposition() == 'attachment':
                filename = part.get_filename()
                if filename:
                    attachments.append(filename)

    if attachments:
        html_parts.append(f'<div class="attachments"><h3>Attachments ({len(attachments)})</h3>')
        for attachment in attachments:
            html_parts.append(f'<div class="attachment">{html.escape(attachment)}</div>')
        html_parts.append('</div>')

    html_parts.append('</body></html>')
    return "\n".join(html_parts)

Parameters

Name Type Default Kind
msg - - positional_or_keyword

Parameter Details

msg: An email.message.Message object (or compatible EmailMessage object) representing a parsed email. This should be obtained from parsing an .eml file or MIME message using Python's email module. The object should contain email headers, body content, and potentially multipart structures with attachments and inline images.

Return Value

Returns a string containing a complete HTML document. It includes embedded CSS styling, email metadata (Subject, From, To, Cc, Date), the email body (the original HTML, or plain text escaped and wrapped for display), inline images embedded as base64 data URIs, and a list of attachment filenames. The result can be saved to a file or opened in a web browser; it is self-contained unless the email's own HTML references remote resources.
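
The base64 data URIs mentioned above follow a simple pattern, sketched here for illustration. The helper name `to_data_uri` is hypothetical, and the sample bytes are just the 8-byte PNG signature rather than a real image.

```python
import base64

def to_data_uri(mime_type, data):
    """Encode raw bytes as a base64 data URI usable in an <img src=...> attribute."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

sample = b"\x89PNG\r\n\x1a\n"  # PNG file signature only, not a full image
uri = to_data_uri("image/png", sample)
print(uri)
print(len(sample), "bytes ->", len(uri.split(",", 1)[1]), "base64 characters")
```

Base64 encodes every 3 bytes as 4 characters, so embedded images add roughly a third to their original size in the resulting HTML.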

Dependencies

  • html
  • base64

Required Imports

import html
import base64
import email

Usage Example

import email  # html and base64 are imported inside the function itself

# Parse an email from a file (binary mode, so each part's declared charset is honored)
with open('example.eml', 'rb') as f:
    msg = email.message_from_binary_file(f)

# Generate HTML representation
html_output = generate_simple_html_from_eml(msg)

# Save to file
with open('email_output.html', 'w', encoding='utf-8') as f:
    f.write(html_output)

# Or parse from string
eml_string = '''From: sender@example.com
To: recipient@example.com
Subject: Test Email
Content-Type: text/plain

This is a test email.'''
msg = email.message_from_string(eml_string)
html_output = generate_simple_html_from_eml(msg)
print(html_output)
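
To exercise the function on something richer than a single text part, a test message can be built in memory. The sketch below assumes the `EmailMessage` API from the standard library; the addresses, the Content-ID `logo1`, the file name `notes.bin`, and the fake image bytes are all made up for illustration.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg["Subject"] = "Inline image test"
msg.set_content("Plain-text fallback")
msg.add_alternative('<p>Logo: <img src="cid:logo1"></p>', subtype="html")

# Attach a fake inline image to the HTML alternative, referenced by Content-ID
html_part = msg.get_payload()[1]
html_part.add_related(b"\x89PNG\r\n\x1a\n", maintype="image", subtype="png",
                      cid="<logo1>", disposition="inline")

# Add a regular attachment; this converts the message to multipart/mixed
msg.add_attachment(b"hello", maintype="application", subtype="octet-stream",
                   filename="notes.bin")

# List every part in walk() order: (content type, disposition, filename)
parts = [(p.get_content_type(), p.get_content_disposition(), p.get_filename())
         for p in msg.walk()]
for entry in parts:
    print(entry)
```

The resulting `msg` can be passed to `generate_simple_html_from_eml(msg)` directly. In this layout the image part comes after the HTML part in `walk()` order, which is worth keeping in mind when checking that inline images are picked up.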

Best Practices

  • Ensure the input msg parameter is a properly parsed email.message.Message object using email.message_from_file() or email.message_from_string()
  • The function handles character encoding errors gracefully with 'replace' mode, but ensure source emails are properly encoded when possible
  • Inline images are embedded as base64 data URIs, which can significantly increase HTML file size for emails with many or large images
  • The function only lists attachment filenames; it does not embed or extract attachment content
  • HTML content from emails is inserted directly without sanitization, so sanitize it first or render it in a sandboxed context when displaying untrusted email
  • The function prefers HTML content over plain text when both are available in multipart emails
  • Content-ID (cid:) references in HTML emails are replaced with embedded base64 data URIs only when they appear as double-quoted src="cid:..." attributes and the image part's Content-Disposition contains inline
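
The Content-ID replacement noted in the last bullet relies on an exact string match. A more tolerant variant might use a regular expression that accepts either quote style and any letter case for `src`. This is only a sketch of that idea, not the function's behavior; the helper name `embed_cid_images` and the sample inputs are hypothetical.

```python
import re

def embed_cid_images(body_html, cid_to_data_uri):
    """Replace src="cid:..." or src='cid:...' references with data URIs.

    cid_to_data_uri maps a bare Content-ID (no angle brackets) to a data URI.
    References to unknown Content-IDs are left untouched.
    """
    pattern = re.compile(r"""src\s*=\s*(["'])cid:([^"']+)\1""", re.IGNORECASE)

    def _swap(match):
        quote, cid = match.group(1), match.group(2)
        data_uri = cid_to_data_uri.get(cid)
        if data_uri is None:
            return match.group(0)  # keep the original attribute as-is
        return f"src={quote}{data_uri}{quote}"

    return pattern.sub(_swap, body_html)

html_in = "<img src='cid:logo1'> <img SRC=\"cid:missing\">"
html_out = embed_cid_images(html_in, {"logo1": "data:image/png;base64,iVBORw0KGgo="})
print(html_out)
```

A regular expression is still not an HTML parser, so unquoted attributes or unusual markup would need a real parser instead.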

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function generate_html_from_msg 83.1% similar

    Converts an email message object into a formatted HTML representation with styling, headers, body content, and attachment information.

    From: /tf/active/vicechatdev/msg_to_eml.py
  • function msg_to_eml_alternative 66.9% similar

    Converts Microsoft Outlook .msg files to .eml (email) format using extract_msg library, with support for headers, body content (plain text and HTML), and attachments.

    From: /tf/active/vicechatdev/msg_to_eml.py
  • function msg_to_eml 65.3% similar

    Converts Microsoft Outlook .msg files to standard .eml format, preserving email headers, body content (plain text and HTML), and attachments.

    From: /tf/active/vicechatdev/msg_to_eml.py
  • function eml_to_pdf 62.5% similar

    Converts an .eml email file to PDF format, including the email body and all attachments merged into a single PDF document.

    From: /tf/active/vicechatdev/msg_to_eml.py
  • function html_to_pdf 57.1% similar

    Converts HTML content to a PDF file using ReportLab with intelligent parsing of email-formatted HTML, including metadata extraction, body content processing, and attachment information.

    From: /tf/active/vicechatdev/msg_to_eml.py