function test_attendee_extraction_comprehensive
A comprehensive test function that validates the attendee extraction logic from meeting transcripts, comparing actual speakers versus mentioned names, and demonstrating integration with meeting minutes generation.
/tf/active/vicechatdev/leexi/test_attendee_comprehensive.py
12 - 104
moderate
Purpose
This test function exercises the EnhancedMeetingMinutesGenerator's ability to identify actual meeting attendees (speakers) from a transcript while filtering out people merely mentioned in conversation, generic speaker labels, and meeting room systems. It prints the extracted attendees, a per-speaker frequency analysis, and the attendee line produced by the full meeting minutes generation pipeline. The function contains no assertions, so the results are meant to be inspected manually.
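The filtering rules described above can be illustrated with a minimal sketch. This is not the generator's actual implementation: the function name `filter_attendees`, the `ROOM_KEYWORDS` list, and the `min_turns` threshold are all assumptions made for illustration, and the sample labels are invented.

```python
import re
from collections import Counter

# Generic diarization labels such as "Speaker 1" are not real attendees.
GENERIC_LABEL = re.compile(r"^Speaker \d+$")

# Hypothetical keywords for meeting-room systems; adjust to your own data.
ROOM_KEYWORDS = ("meeting room", "conference room")


def filter_attendees(speaker_labels, min_turns=2):
    """Return labels that look like real attendees, most frequent first."""
    counts = Counter(label.strip() for label in speaker_labels if label.strip())
    attendees = []
    for label, turns in counts.most_common():
        if GENERIC_LABEL.match(label):
            continue  # generic speaker label
        if any(keyword in label.lower() for keyword in ROOM_KEYWORDS):
            continue  # meeting room system
        if turns < min_turns:
            continue  # too few turns to count as a consistent speaker
        attendees.append(label)
    return attendees


# Invented speaker labels, one per speaking turn.
labels = ["Alice", "Bob", "Speaker 1", "Alice",
          "Meeting Room 3", "Bob", "Carol", "Meeting Room 3"]
print(filter_attendees(labels))
```

Names that only occur inside the spoken text never reach this function, because only the labels that open a speaking turn are counted.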
Source Code
def test_attendee_extraction_comprehensive():
    """Test the improved attendee extraction with before/after comparison"""
    print("🧪 COMPREHENSIVE ATTENDEE EXTRACTION TEST")
    print("=" * 50)

    # Initialize generator
    generator = EnhancedMeetingMinutesGenerator(model='gpt-4o')

    # Test with actual transcript
    transcript_path = '/tf/active/leexi/leexi-20250618-transcript-development_team_meeting.md'

    try:
        # Load transcript
        with open(transcript_path, 'r', encoding='utf-8') as f:
            transcript = f.read()

        # Extract metadata with improved logic
        metadata = generator.parse_transcript_metadata(transcript)

        print(f"📄 Transcript: {transcript_path}")
        print(f"📅 Meeting Date: {metadata['date']}")
        print(f"⏱️ Duration: {metadata['duration']}")
        print(f"👥 Actual Speakers Found: {len(metadata['speakers'])}")

        print("\n✅ IMPROVED ATTENDEE LIST:")
        for i, speaker in enumerate(metadata['speakers'], 1):
            print(f" {i}. {speaker}")

        # Show what was EXCLUDED (mentioned but not speakers)
        print("\n🚫 CORRECTLY EXCLUDED (mentioned in conversation but not actual speakers):")
        excluded_names = [
            "Jean", "Koen", "Julie", "Vincent", "Javier", "Juana", "Mike",
            "Pascal", "Manu", "Frank", "Daniel", "Ksenia", "Wim", "Morgan"
        ]
        for name in excluded_names:
            if name in transcript:
                print(f" - {name} (mentioned in conversation)")

        # Show analysis of speaking patterns
        print("\n📊 SPEAKING PATTERN ANALYSIS:")
        speaker_pattern = r'^(.+) at \d+[h:]?\d*[:\-]\d+ - \d+[h:]?\d*[:\-]\d+'
        import re
        speaker_counts = {}
        for line in transcript.split('\n'):
            line = line.strip()
            if not line:
                continue
            match = re.match(speaker_pattern, line)
            if match:
                speaker = match.group(1).strip()
                if speaker and not re.match(r'^Speaker \d+$', speaker):
                    speaker_counts[speaker] = speaker_counts.get(speaker, 0) + 1

        print(" Speaker frequencies:")
        for speaker, count in sorted(speaker_counts.items(), key=lambda x: x[1], reverse=True):
            status = "✅ INCLUDED" if speaker in metadata['speakers'] else "🚫 EXCLUDED"
            print(f" {speaker}: {count} times - {status}")

        print("\n💡 IMPROVEMENT SUMMARY:")
        print(" - Only actual speakers are included as attendees")
        print(" - People mentioned in conversation are correctly excluded")
        print(" - Generic speakers (Speaker 1, Speaker 2) are filtered out")
        print(" - Meeting room systems are filtered out")
        print(" - Frequency analysis ensures consistent speakers")

        # Test with a small sample generation
        print("\n🎯 TESTING INTEGRATION WITH MEETING MINUTES GENERATION...")

        # Generate a brief summary to test attendee integration
        brief_minutes = generator.generate_meeting_minutes_with_config(
            transcript=transcript[:2000],  # Use first 2000 chars for speed
            meeting_title="Test Meeting - Attendee Extraction",
            detail_level="concise",
            rigor_level="standard",
            action_focus="standard",
            output_style="professional"
        )

        # Extract attendee line from generated minutes
        for line in brief_minutes.split('\n'):
            if 'Attendees:' in line:
                print(f" Generated attendees: {line.strip()}")
                break

        print("\n✅ ATTENDEE EXTRACTION TEST COMPLETED SUCCESSFULLY!")

    except Exception as e:
        print(f"❌ Error: {e}")
        import traceback
        traceback.print_exc()
Return Value
This function does not return any value (implicitly returns None). It prints comprehensive test results to stdout, including extracted attendees, excluded names, speaking pattern analysis, and integration test results. If an error occurs, it prints the error message and traceback.
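Because the function returns None and reports only through print, a caller that wants to inspect the results has to capture stdout. The sketch below shows one way to do that with the standard library. The helper `capture_stdout` and the stand-in `demo_report` are hypothetical names introduced for illustration; in practice you would pass `test_attendee_extraction_comprehensive` instead of `demo_report`.

```python
import contextlib
import io


def capture_stdout(func, *args, **kwargs):
    """Call func and return (return_value, text_printed_to_stdout)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


def demo_report():
    # Stand-in for a print-only test function (hypothetical).
    print("Actual Speakers Found: 3")


result, output = capture_stdout(demo_report)
print(result)          # the function's return value
print(output.strip())  # everything it printed
```

Note that `traceback.print_exc()` writes to stderr by default, so the traceback printed on failure is not captured this way; only the "Error:" line goes to stdout.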
Dependencies
- enhanced_meeting_minutes_generator
- json
- re
- traceback
- sys
Required Imports
import sys
from enhanced_meeting_minutes_generator import EnhancedMeetingMinutesGenerator
import json
import re
import traceback
Usage Example
# Ensure the transcript file exists at the expected path
# Set up OpenAI API key in environment
import os
os.environ['OPENAI_API_KEY'] = 'your-api-key-here'
# Import the test function; this assumes test_attendee_comprehensive.py is on the import path
# (the module imports its own dependencies, so nothing else is needed here)
from test_attendee_comprehensive import test_attendee_extraction_comprehensive
# Run the comprehensive test
test_attendee_extraction_comprehensive()
# Expected output includes:
# - Number of actual speakers found
# - List of included attendees
# - List of correctly excluded names
# - Speaking pattern analysis with frequencies
# - Integration test with brief minutes generation
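The two prerequisites named in the usage example, a readable transcript file and an OpenAI API key in the environment, can be checked before the test is started. The following is a minimal sketch of such a check; the helper `preflight_problems` is a hypothetical name and is not part of the documented module.

```python
import os
from pathlib import Path


def preflight_problems(transcript_path, env=None):
    """Return a list of human-readable problems; an empty list means ready to run."""
    env = os.environ if env is None else env
    problems = []
    if not Path(transcript_path).is_file():
        problems.append(f"transcript not found: {transcript_path}")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


# Example with a path that does not exist and an empty environment.
missing = preflight_problems("/no/such/transcript.md", env={})
for problem in missing:
    print(problem)
```

Running the test only when the returned list is empty avoids spending time on a run that would fail at the first file read or API call.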
Best Practices
- Ensure the transcript file path is correct and accessible before running the test
- The function expects a specific transcript format with speaker timestamps (e.g., 'Speaker at HH:MM - HH:MM')
- This is a test function designed for development/validation, not for production use
- The function uses a hardcoded list of excluded names specific to the test transcript - modify this list for different transcripts
- The test truncates the transcript to 2000 characters for the integration test to improve speed
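- The function catches every exception, prints the message and traceback, and does not re-raise, so a test runner will report a pass even when extraction or generation fails; check the printed output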
- The function provides visual indicators (emojis) for better readability of test results
- Consider redirecting output to a log file for automated testing scenarios
- The speaker pattern regex assumes a specific timestamp format - adjust if your transcript format differs
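To see which line formats the speaker pattern accepts, it can be tried against a few sample lines. The regular expression below is copied from the source code; the helper `speaker_from_line` and the sample lines are invented for illustration and are not taken from the real transcript.

```python
import re

# Pattern copied from the test function's source code.
SPEAKER_PATTERN = re.compile(r'^(.+) at \d+[h:]?\d*[:\-]\d+ - \d+[h:]?\d*[:\-]\d+')


def speaker_from_line(line):
    """Return the speaker label if the line opens a speaking turn, else None."""
    match = SPEAKER_PATTERN.match(line.strip())
    return match.group(1).strip() if match else None


# Invented sample lines covering matching and non-matching formats.
samples = [
    "Alice at 00:05 - 00:12",        # minutes:seconds range
    "Bob at 1h02:15 - 1h03:40",      # hour marker with 'h'
    "Alice mentioned Bob at lunch",  # a mention, not a turn header
    "[00:05] Alice: hello",          # a different transcript format
]
for sample in samples:
    print(f"{sample!r} -> {speaker_from_line(sample)!r}")
```

If a transcript uses a layout like the last sample, the pattern finds no speakers at all, which is a quick signal that the expression needs adjusting.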
Similar Components
AI-powered semantic similarity - components with related functionality:
- function test_attendee_extraction (89.0% similar)
- function test_mixed_previous_reports (62.6% similar)
- function extract_previous_reports_summary (57.1% similar)
- function handle_potential_truncation (57.1% similar)
- function test_multiple_files (56.5% similar)