function msg_to_eml_alternative
Converts Microsoft Outlook .msg files to .eml (email) format using the extract_msg library, with support for headers, body content (plain text and HTML), and attachments.
/tf/active/vicechatdev/msg_to_eml.py
152 - 259
complex
Purpose
This function provides an alternative way to convert .msg files to .eml format. It first calls extract_msg's built-in save_email method if the Message object exposes one. Otherwise it builds the EML file manually as a MIME multipart/mixed message containing the email headers, the plain text and HTML bodies, and base64-encoded attachments. This is useful for email migration, archival, or working with systems that require the standard .eml format instead of the proprietary .msg format.
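For comparison, a similar MIME layout can be produced with Python's standard email package instead of writing headers and boundaries by hand. This is a minimal sketch and not the function's own implementation; build_eml, its parameters, and the sample addresses are hypothetical names used only for illustration.

```python
from email.message import EmailMessage
from email.utils import formatdate


def build_eml(sender, to, subject, body, html=None, attachments=()):
    """Build an EML-style message (hypothetical helper, not part of msg_to_eml.py).

    attachments is an iterable of (filename, data_bytes, mime_type) tuples.
    """
    message = EmailMessage()
    message["From"] = sender
    message["To"] = to
    message["Subject"] = subject
    message["Date"] = formatdate(localtime=True)

    # Plain text body first; adding an HTML alternative turns the body
    # into a multipart/alternative container.
    message.set_content(body or "")
    if html:
        message.add_alternative(html, subtype="html")

    # Each attachment is base64-encoded by the email package, and the
    # message becomes multipart/mixed once the first one is added.
    for filename, data, mime_type in attachments:
        maintype, _, subtype = mime_type.partition("/")
        message.add_attachment(
            data, maintype=maintype, subtype=subtype, filename=filename
        )

    return message


eml = build_eml(
    "alice@example.com",
    "bob@example.com",
    "Quarterly report",
    "See attached.",
    html="<p>See attached.</p>",
    attachments=[("report.bin", b"\x00\x01\x02", "application/octet-stream")],
)
eml_bytes = eml.as_bytes()  # bytes suitable for writing to an .eml file
```

Unlike the documented function, which writes the text and HTML bodies as sibling parts of multipart/mixed, this sketch nests them in multipart/alternative, and it leaves header and body encoding to the email package.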
Source Code
def msg_to_eml_alternative(msg_path, eml_path):
    """Alternative conversion approach using extract_msg's built-in functionality"""
    try:
        if not os.path.exists(msg_path):
            logger.error(f"Input file not found: {msg_path}")
            return False
        # Load the .msg file
        logger.info(f"Using alternative conversion method for: {msg_path}")
        msg = extract_msg.Message(msg_path)
        # Try direct raw EML content extraction if available
        if hasattr(msg, 'save_email'):
            msg.save_email(eml_path)
            logger.info(f"Successfully converted '{msg_path}' to '{eml_path}' using built-in save_email")
            return True
        # Use extract_msg's built-in properties to manually create the EML
        with open(eml_path, 'w', encoding='utf-8') as f:
            # Write basic headers
            f.write(f"From: {msg.sender}\n")
            f.write(f"To: {msg.to}\n")
            if msg.cc:
                f.write(f"Cc: {msg.cc}\n")
            f.write(f"Subject: {msg.subject or ''}\n")
            # Add date
            if hasattr(msg, 'date') and msg.date:
                try:
                    f.write(f"Date: {msg.date}\n")
                except:
                    f.write(f"Date: {formatdate(localtime=True)}\n")
            else:
                f.write(f"Date: {formatdate(localtime=True)}\n")
            # Add content type header for MIME message
            f.write("MIME-Version: 1.0\n")
            # Create a simple multipart message
            boundary = "----=_NextPart_" + os.urandom(16).hex()
            f.write(f'Content-Type: multipart/mixed; boundary="{boundary}"\n\n')
            # Add message separator
            f.write(f"--{boundary}\n")
            # Add plain text body
            f.write('Content-Type: text/plain; charset="utf-8"\n')
            f.write('Content-Transfer-Encoding: quoted-printable\n\n')
            f.write(msg.body or '')
            f.write(f"\n\n--{boundary}\n")
            # Add HTML body if available
            html_content = None
            if hasattr(msg, 'htmlBody') and msg.htmlBody:
                html_content = msg.htmlBody
            elif hasattr(msg, 'html') and msg.html:
                html_content = msg.html
            if html_content:
                f.write('Content-Type: text/html; charset="utf-8"\n')
                f.write('Content-Transfer-Encoding: quoted-printable\n\n')
                f.write(html_content)
                f.write(f"\n\n--{boundary}\n")
            # Add attachments
            for attachment in msg.attachments:
                try:
                    # Get filename
                    filename = getattr(attachment, 'longFilename', None) or getattr(attachment, 'shortFilename', None) or 'attachment'
                    # Determine content type
                    content_type = None
                    if hasattr(attachment, 'mimetype') and attachment.mimetype:
                        content_type = attachment.mimetype
                    else:
                        content_type, _ = mimetypes.guess_type(filename)
                        if not content_type:
                            content_type = 'application/octet-stream'
                    # Write attachment headers
                    f.write(f'Content-Type: {content_type}; name="{filename}"\n')
                    f.write('Content-Transfer-Encoding: base64\n')
                    f.write(f'Content-Disposition: attachment; filename="{filename}"\n\n')
                    # Write base64 encoded attachment data
                    import base64
                    if attachment.data:
                        encoded_data = base64.b64encode(attachment.data).decode('ascii')
                        # Write in chunks of 76 characters for proper base64 format
                        for i in range(0, len(encoded_data), 76):
                            f.write(encoded_data[i:i+76] + '\n')
                    f.write(f"\n--{boundary}\n")
                except Exception as e:
                    logger.error(f"Error processing attachment {filename}: {str(e)}")
            # Close the multipart message
            f.write(f"--{boundary}--\n")
        logger.info(f"Successfully converted '{msg_path}' to '{eml_path}' using manual alternative method")
        return True
    except Exception as e:
        logger.error(f"Error in alternative conversion of {msg_path} to EML: {str(e)}")
        logger.error(traceback.format_exc())
        return False
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| msg_path | - | - | positional_or_keyword |
| eml_path | - | - | positional_or_keyword |
Parameter Details
msg_path: String path to the input Microsoft Outlook .msg file to be converted. Must be a valid file path that exists on the filesystem.
eml_path: String path where the output .eml file will be saved. The directory must exist and be writable. If the file exists, it will be overwritten.
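Because the output directory has to exist before the function is called, a caller may want a small pre-flight step. The sketch below is one way to do that; prepare_paths is a hypothetical helper, not part of msg_to_eml.py, and the temporary files stand in for real paths.

```python
import tempfile
from pathlib import Path


def prepare_paths(msg_path, eml_path):
    """Validate the input path and create the output directory (hypothetical helper)."""
    source = Path(msg_path)
    target = Path(eml_path)
    if not source.is_file():
        raise FileNotFoundError(f"Input .msg file not found: {source}")
    # Create the output directory up front, since the documented function
    # expects it to exist already.
    target.parent.mkdir(parents=True, exist_ok=True)
    return str(source), str(target)


# Demonstration with a temporary directory and an empty placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "mail.msg"
    sample.write_bytes(b"")  # placeholder standing in for a real .msg file
    src, dst = prepare_paths(sample, Path(tmp) / "out" / "mail.eml")
    output_dir_created = Path(dst).parent.is_dir()
```

The returned strings can then be passed as msg_path and eml_path.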
Return Value
Returns a boolean value: True if the conversion was successful (either through built-in save_email or manual construction), False if any error occurred during the conversion process (file not found, parsing errors, write errors, etc.).
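Since failures are reported through the return value rather than an exception, batch callers need to collect results explicitly. The following sketch shows one possible pattern; convert_many and fake_converter are hypothetical names, and fake_converter only stands in for msg_to_eml_alternative so the example runs without extract_msg.

```python
def convert_many(pairs, converter):
    """Run converter(msg_path, eml_path) for each pair and split results by outcome.

    converter is any callable with the same contract as msg_to_eml_alternative:
    it returns True on success and False on failure instead of raising.
    """
    succeeded, failed = [], []
    for msg_path, eml_path in pairs:
        if converter(msg_path, eml_path):
            succeeded.append(msg_path)
        else:
            failed.append(msg_path)
    return succeeded, failed


# Stand-in converter for demonstration only; in real use, pass
# msg_to_eml_alternative as the converter argument.
def fake_converter(msg_path, eml_path):
    return msg_path.endswith(".msg")


ok, bad = convert_many(
    [("a.msg", "a.eml"), ("notes.txt", "notes.eml")],
    fake_converter,
)
```

The details of each failure are available only in the log output, so the failed list is best read alongside the logger's error messages.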
Dependencies
- extract_msg
- os
- mimetypes
- logging
- email
- traceback
- base64
Required Imports
import extract_msg
import os
import mimetypes
import logging
import traceback
from email.utils import formatdate
Conditional/Optional Imports
These imports are only needed under specific conditions:
import base64
Condition: only when processing attachments in the manual conversion method
Required (conditional)
Usage Example
import extract_msg
import os
import mimetypes
import logging
import traceback
from email.utils import formatdate
# Setup logger
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
# Convert a .msg file to .eml
msg_file = '/path/to/email.msg'
eml_file = '/path/to/output.eml'
success = msg_to_eml_alternative(msg_file, eml_file)
if success:
    print(f'Successfully converted {msg_file} to {eml_file}')
else:
    print(f'Conversion failed for {msg_file}')
Best Practices
- Ensure the logger object is properly configured before calling this function
- Verify that the input .msg file exists and is readable before calling
- Ensure the output directory for eml_path exists and has write permissions
- Handle the boolean return value to determine if conversion succeeded
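- The output file is opened with UTF-8 encoding and header values are written as-is; non-ASCII header text is not RFC 2047-encoded, so some mail clients may display it incorrectly
- The manual path declares Content-Transfer-Encoding: quoted-printable for the text and HTML parts but writes the body text unencoded, so strict parsers may misread body lines that contain "="
- The function creates MIME multipart/mixed messages with a random boundary string
- Attachments are base64-encoded and wrapped at 76 characters per line, the limit set by RFC 2045
- If the .msg file has both plain text and HTML bodies, both are written as separate parts of the multipart/mixed message
- Errors are caught and logged rather than raised, so always check the return value
- The function tries extract_msg's save_email method first and uses manual construction only when that method is not present; if save_email exists but fails, the function returns False without falling back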
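The source code above labels the text parts as quoted-printable but writes them unencoded, while attachments are base64-encoded in 76-character lines. The sketch below shows how both encodings could be produced with the standard library. It is an illustration under stated assumptions, not the function's code: encode_quoted_printable and wrap_base64 are hypothetical helpers.

```python
import base64
import quopri


def encode_quoted_printable(text):
    """Encode text as quoted-printable so the body matches its declared header."""
    return quopri.encodestring(text.encode("utf-8")).decode("ascii")


def wrap_base64(data, width=76):
    """Base64-encode bytes and split the result into lines of at most width characters."""
    encoded = base64.b64encode(data).decode("ascii")
    return [encoded[i:i + width] for i in range(0, len(encoded), width)]


# "=" and non-ASCII bytes are escaped as =XX sequences in quoted-printable.
qp_body = encode_quoted_printable("Total = 100 \u20ac")

# 100 input bytes produce 136 base64 characters, so two lines at width 76.
b64_lines = wrap_base64(bytes(range(100)))
```

Decoding qp_body with quopri.decodestring and joining b64_lines before base64.b64decode recovers the original inputs, which is a quick way to check a generated part.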
Similar Components
AI-powered semantic similarity - components with related functionality:
- function msg_to_eml 90.4% similar
- function msg_to_pdf_improved 80.0% similar
- function msg_to_pdf 75.5% similar
- function generate_simple_html_from_eml 66.9% similar
- function generate_html_from_msg 66.4% similar