function extract_previous_reports_summary
Extracts and summarizes key information from previous meeting report files using document extraction and OpenAI's GPT-4o-mini model to provide context for upcoming meetings.
/tf/active/vicechatdev/leexi/app.py
51 - 128
complex
Purpose
This function processes previous meeting report files in various formats, extracts their text content, and uses an LLM to generate a structured summary covering action items, decisions, ongoing projects, stakeholders, issues, and deadlines. The summary gives a new meeting continuity and context by surfacing relevant information from past discussions.
Source Code
def extract_previous_reports_summary(file_paths):
    """Extract key information from previous meeting reports using document extractor and LLM"""
    if not file_paths:
        return ""
    try:
        # Use a lightweight model for preprocessing
        import openai
        client = openai.OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
        combined_content = []
        # Extract content from each file using the document extractor
        for file_path in file_paths:
            try:
                logger.info(f"Extracting content from: {file_path}")
                # Use document extractor to get text content
                extracted_text = doc_extractor.extract_text(file_path)
                if extracted_text:
                    file_name = Path(file_path).name
                    combined_content.append(f"=== {file_name} ===\n{extracted_text}\n")
                else:
                    logger.warning(f"No text extracted from: {file_path}")
            except Exception as file_error:
                logger.error(f"Error processing file {file_path}: {str(file_error)}")
                # Try to read as plain text as fallback
                try:
                    with open(file_path, 'r', encoding='utf-8') as f:
                        content = f.read()
                    combined_content.append(f"=== {Path(file_path).name} ===\n{content}\n")
                except Exception as fallback_error:
                    logger.error(f"Fallback text extraction also failed for {file_path}: {str(fallback_error)}")
        if not combined_content:
            return "No content could be extracted from previous reports."
        full_content = "\n".join(combined_content)
        # Limit content to avoid token limits (roughly 8000 characters = ~2000 tokens)
        if len(full_content) > 8000:
            full_content = full_content[:8000] + "\n... (content truncated)"
        # Create preprocessing prompt
        preprocessing_prompt = f"""You are an expert meeting analyst. Extract key information from the following previous meeting reports to provide context for a new meeting.
Focus on extracting:
1. Outstanding action items and their current status
2. Previous decisions that may impact current discussions
3. Ongoing projects and their timelines
4. Key stakeholders and their roles
5. Critical issues requiring follow-up
6. Important dates and deadlines
Provide a concise summary (max 800 words) that will help contextualize a new meeting discussion.
Previous Meeting Reports:
{full_content}
Generate a structured summary focusing on continuity and context for the upcoming meeting."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # Use smaller model for preprocessing
            messages=[
                {"role": "system", "content": "You are an expert meeting analyst who extracts key contextual information from previous meeting reports."},
                {"role": "user", "content": preprocessing_prompt}
            ],
            max_tokens=1000,
            temperature=0.2
        )
        return response.choices[0].message.content
    except Exception as e:
        logger.error(f"Error extracting previous reports summary: {str(e)}")
        return f"Error processing previous reports: {str(e)}"
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| file_paths | - | - | positional_or_keyword |
Parameter Details
file_paths: A list of file path strings pointing to previous meeting report documents. Can be empty/None. Supports various document formats that the DocumentExtractor can handle (PDF, DOCX, TXT, etc.). If empty or None, returns an empty string immediately.
Return Value
Returns a string, which is one of the following:
- A structured summary of key information from previous reports, covering action items, decisions, projects, stakeholders, issues, and deadlines. The prompt asks for at most 800 words, and the response is capped at 1000 tokens.
- An empty string if file_paths is empty or None.
- 'No content could be extracted from previous reports.' if no text could be extracted from any file.
- An error message starting with 'Error processing previous reports:' if an exception occurs outside the per-file handling, for example during the API call.
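Because every outcome, including failure, comes back as a plain string, a caller has to inspect the text to tell the cases apart. The following is a minimal sketch of such a check, assuming the sentinel strings in the source code stay unchanged. The helper `classify_summary_result` and its status labels are hypothetical names introduced here for illustration; they are not part of app.py.

```python
# Sentinel strings copied from the function's return statements.
NO_CONTENT_MESSAGE = "No content could be extracted from previous reports."
ERROR_PREFIX = "Error processing previous reports:"


def classify_summary_result(result):
    """Map the returned value to 'empty', 'no_content', 'error', or 'summary'.

    Hypothetical helper, not part of the documented module.
    """
    if not result:
        # Empty string (no file_paths given) or a missing message content.
        return "empty"
    if result == NO_CONTENT_MESSAGE:
        # No text could be extracted from any file.
        return "no_content"
    if result.startswith(ERROR_PREFIX):
        # An exception was caught and turned into a message.
        return "error"
    # Anything else is assumed to be the LLM-generated summary.
    return "summary"
```

A caller could branch on the label, for instance skipping the context section of a new report unless the label is 'summary'.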
Dependencies
- openai
- pathlib
- os
- logging
Required Imports
import os
from pathlib import Path
import logging
Conditional/Optional Imports
These imports are only needed under specific conditions:
import openai
Condition: Imported inside the function, so it is needed only when file_paths is non-empty and the function goes on to call the OpenAI API
Required (conditional)
Usage Example
import os
import logging
from pathlib import Path
from document_extractor import DocumentExtractor
# Setup required dependencies
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
doc_extractor = DocumentExtractor()
# Set API key
os.environ['OPENAI_API_KEY'] = 'your-api-key-here'
# Use the function
file_paths = [
    '/path/to/meeting_report_1.pdf',
    '/path/to/meeting_report_2.docx',
    '/path/to/meeting_notes.txt'
]
summary = extract_previous_reports_summary(file_paths)
print(summary)
# Handle empty case
empty_summary = extract_previous_reports_summary([])
print(empty_summary) # Returns empty string
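The example above relies on the API key being set and the openai package being installed. If either is missing, the function does not raise; it returns an error string. A small preflight check can surface these problems earlier. This is a sketch only: `missing_prerequisites` is a hypothetical helper, and it checks just the two prerequisites named in this document.

```python
import importlib.util
import os


def missing_prerequisites(env=None):
    """Return a list of problems; an empty list means both checks passed.

    Hypothetical helper; `env` defaults to os.environ and can be replaced
    with a plain dict for testing.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # find_spec returns None when a top-level package cannot be located.
    if importlib.util.find_spec("openai") is None:
        problems.append("the openai package is not installed")
    return problems
```

Calling `missing_prerequisites()` before `extract_previous_reports_summary(file_paths)` lets the application report a configuration problem directly instead of parsing it out of the returned string.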
Best Practices
- Ensure the OPENAI_API_KEY environment variable is set before calling this function
- Initialize both 'logger' and 'doc_extractor' objects in module scope before using this function
- Be aware that the combined text of all files is truncated at 8000 characters to avoid token limits, so content from later files may be cut off or dropped entirely
- The function uses gpt-4o-mini model which incurs API costs per call
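- The function does not raise on failure; it returns a message string instead, so check the result for the 'Error processing previous reports:' prefix and the 'No content could be extracted from previous reports.' message before using it as a summary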
- The function has fallback mechanisms: if doc_extractor fails, it attempts plain text reading
- The function makes a single OpenAI API call per invocation regardless of how many files are passed; consider rate limiting when calling it repeatedly
- The function processes files sequentially, so large numbers of files may take time
- Temperature is set to 0.2 for more deterministic outputs
- Max tokens is limited to 1000 for the summary response
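The 8000-character cap is applied to the combined text, so a long first report can crowd out everything after it. One possible alternative, which is not what the function currently does, is to give each file an equal share of the budget before joining. The sketch below reuses the `=== name ===` header format and the truncation marker from the source code; `build_budgeted_content` and its parameters are hypothetical.

```python
def build_budgeted_content(sections, max_chars=8000):
    """Join (file_name, text) pairs, giving each file an equal share of max_chars.

    The share applies to the extracted text only. Headers and truncation
    markers are added on top, so the result can slightly exceed max_chars.
    """
    if not sections:
        return ""
    per_file = max_chars // len(sections)
    parts = []
    for file_name, text in sections:
        if len(text) > per_file:
            # Cut this file's text to its share and mark the cut.
            text = text[:per_file] + "\n... (content truncated)"
        parts.append(f"=== {file_name} ===\n{text}\n")
    return "\n".join(parts)


# A long report no longer pushes the short one out of the prompt.
reports = [("a.txt", "A" * 10000), ("b.txt", "short note")]
combined = build_budgeted_content(reports, max_chars=8000)
```

The trade-off is that a short file leaves part of its share unused; redistributing unused budget to longer files would be a further refinement.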
Similar Components
AI-powered semantic similarity - components with related functionality:
- function test_mixed_previous_reports 67.4% similar
- class MeetingMinutesGenerator 64.0% similar
- class MeetingMinutesGenerator_v1 63.3% similar
- function main_v15 62.3% similar
- function main_v6 61.4% similar