๐Ÿ” Code Extractor

function test_attendee_extraction_comprehensive

Maturity: 49

A comprehensive test function that exercises attendee extraction from meeting transcripts, compares actual speakers against names that are only mentioned in conversation, and demonstrates integration with meeting minutes generation.

File: /tf/active/vicechatdev/leexi/test_attendee_comprehensive.py
Lines: 12 - 104
Complexity: moderate

Purpose

This test function checks that EnhancedMeetingMinutesGenerator identifies actual meeting attendees (speakers) from a transcript while filtering out people merely mentioned in conversation, generic speaker labels, and meeting room systems. It prints a per-speaker frequency analysis and demonstrates how attendee extraction integrates with the full meeting minutes generation pipeline.
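The filtering described above can be illustrated with a minimal sketch. This is not the generator's actual implementation: `extract_speakers`, `ROOM_KEYWORDS`, and the `min_turns` threshold are hypothetical names chosen for illustration, and only the speaker-line regex and the `Speaker N` filter are taken from the test source.

```python
import re
from collections import Counter

# Speaker-line pattern copied from the test, e.g. "Alice Smith at 0:05 - 0:42".
SPEAKER_LINE = re.compile(r'^(.+) at \d+[h:]?\d*[:\-]\d+ - \d+[h:]?\d*[:\-]\d+')
# Generic diarization labels such as "Speaker 1".
GENERIC_LABEL = re.compile(r'^Speaker \d+$')
# Hypothetical keywords for meeting room systems (an assumption, not from the generator).
ROOM_KEYWORDS = ('room', 'conference')


def extract_speakers(transcript, min_turns=2):
    """Return names that head at least `min_turns` speaker lines."""
    turns = Counter()
    for line in transcript.splitlines():
        match = SPEAKER_LINE.match(line.strip())
        if not match:
            continue  # body text: names mentioned here are never counted
        name = match.group(1).strip()
        if GENERIC_LABEL.match(name):
            continue  # drop "Speaker 1", "Speaker 2", ...
        if any(word in name.lower() for word in ROOM_KEYWORDS):
            continue  # drop labels that look like room systems
        turns[name] += 1
    return [name for name, count in turns.most_common() if count >= min_turns]


sample = """\
Alice Smith at 0:05 - 0:42
I talked to Koen about the release.
Bob Jones at 0:42 - 1:10
Koen said it can wait.
Speaker 1 at 1:10 - 1:15
Okay.
Meeting Room 3 at 1:15 - 1:20
(background noise)
Alice Smith at 1:20 - 2:00
Fine, let's move on.
Bob Jones at 2:00 - 2:30
Agreed.
"""

print(extract_speakers(sample))
```

In this sample, "Koen" appears only in body text, so the name is never counted even though it occurs twice. The keyword check is deliberately crude and would also reject a person whose name contains "room"; a real implementation would need a more careful rule.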

Source Code

def test_attendee_extraction_comprehensive():
    """Test the improved attendee extraction with before/after comparison"""
    
    print("🧪 COMPREHENSIVE ATTENDEE EXTRACTION TEST")
    print("=" * 50)
    
    # Initialize generator
    generator = EnhancedMeetingMinutesGenerator(model='gpt-4o')
    
    # Test with actual transcript
    transcript_path = '/tf/active/leexi/leexi-20250618-transcript-development_team_meeting.md'
    
    try:
        # Load transcript
        with open(transcript_path, 'r', encoding='utf-8') as f:
            transcript = f.read()
        
        # Extract metadata with improved logic
        metadata = generator.parse_transcript_metadata(transcript)
        
        print(f"📄 Transcript: {transcript_path}")
        print(f"📅 Meeting Date: {metadata['date']}")
        print(f"⏱️ Duration: {metadata['duration']}")
        print(f"👥 Actual Speakers Found: {len(metadata['speakers'])}")

        print("\n✅ IMPROVED ATTENDEE LIST:")
        for i, speaker in enumerate(metadata['speakers'], 1):
            print(f"  {i}. {speaker}")
        
        # Show what was EXCLUDED (mentioned but not speakers)
        print("\n🚫 CORRECTLY EXCLUDED (mentioned in conversation but not actual speakers):")
        excluded_names = [
            "Jean", "Koen", "Julie", "Vincent", "Javier", "Juana", "Mike", 
            "Pascal", "Manu", "Frank", "Daniel", "Ksenia", "Wim", "Morgan"
        ]
        
        for name in excluded_names:
            if name in transcript:
                print(f"  - {name} (mentioned in conversation)")
        
        # Show analysis of speaking patterns
        print("\n📊 SPEAKING PATTERN ANALYSIS:")
        speaker_pattern = r'^(.+) at \d+[h:]?\d*[:\-]\d+ - \d+[h:]?\d*[:\-]\d+'
        import re
        speaker_counts = {}
        
        for line in transcript.split('\n'):
            line = line.strip()
            if not line:
                continue
            match = re.match(speaker_pattern, line)
            if match:
                speaker = match.group(1).strip()
                if speaker and not re.match(r'^Speaker \d+$', speaker):
                    speaker_counts[speaker] = speaker_counts.get(speaker, 0) + 1
        
        print("  Speaker frequencies:")
        for speaker, count in sorted(speaker_counts.items(), key=lambda x: x[1], reverse=True):
            status = "✅ INCLUDED" if speaker in metadata['speakers'] else "🚫 EXCLUDED"
            print(f"    {speaker}: {count} times - {status}")
        
        print("\n💡 IMPROVEMENT SUMMARY:")
        print("  - Only actual speakers are included as attendees")
        print("  - People mentioned in conversation are correctly excluded")
        print("  - Generic speakers (Speaker 1, Speaker 2) are filtered out")
        print("  - Meeting room systems are filtered out")
        print("  - Frequency analysis ensures consistent speakers")
        
        # Test with a small sample generation
        print("\n🎯 TESTING INTEGRATION WITH MEETING MINUTES GENERATION...")
        
        # Generate a brief summary to test attendee integration
        brief_minutes = generator.generate_meeting_minutes_with_config(
            transcript=transcript[:2000],  # Use first 2000 chars for speed
            meeting_title="Test Meeting - Attendee Extraction",
            detail_level="concise",
            rigor_level="standard",
            action_focus="standard",
            output_style="professional"
        )
        
        # Extract attendee line from generated minutes
        for line in brief_minutes.split('\n'):
            if 'Attendees:' in line:
                print(f"  Generated attendees: {line.strip()}")
                break
        
        print("\n✅ ATTENDEE EXTRACTION TEST COMPLETED SUCCESSFULLY!")

    except Exception as e:
        print(f"❌ Error: {e}")
        import traceback
        traceback.print_exc()

Return Value

This function returns no value (it implicitly returns None). It prints its results to stdout: the extracted attendees, the excluded names, the speaking pattern analysis, and the integration test output. Any exception is caught inside the function; the error message is printed to stdout and the traceback is written to stderr by traceback.print_exc().
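Because the function only prints, a caller that wants to check the result has to inspect the printed text. The sketch below shows one way to do that with the standard library's `contextlib.redirect_stdout`. `run_and_capture` and `fake_report` are hypothetical helpers; `fake_report` stands in for `test_attendee_extraction_comprehensive` so the example runs without a transcript or API key.

```python
import contextlib
import io


def run_and_capture(func):
    """Call func() and return everything it printed to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func()
    return buffer.getvalue()


def fake_report():
    # Hypothetical stand-in for test_attendee_extraction_comprehensive.
    print("Actual Speakers Found: 2")
    print("ATTENDEE EXTRACTION TEST COMPLETED SUCCESSFULLY!")


output = run_and_capture(fake_report)

# Turn the printed report into real pass/fail checks.
assert "COMPLETED SUCCESSFULLY" in output
assert "Error:" not in output
```

Only stdout is captured here. The traceback goes to stderr, so it would need `contextlib.redirect_stderr` as well if it should be recorded.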

Dependencies

  • enhanced_meeting_minutes_generator
  • json
  • re
  • traceback
  • sys

Required Imports

import sys
from enhanced_meeting_minutes_generator import EnhancedMeetingMinutesGenerator
import json
import re
import traceback

Usage Example

# Ensure the transcript file exists at the path hardcoded in the test
# Set the OpenAI API key in the environment (placeholder value shown)
import os
os.environ['OPENAI_API_KEY'] = 'your-api-key-here'

# Import the test from its module (test_attendee_comprehensive.py);
# this assumes the leexi directory is the working directory or on sys.path
from test_attendee_comprehensive import test_attendee_extraction_comprehensive

# Run the comprehensive test
test_attendee_extraction_comprehensive()

# Expected output includes:
# - Number of actual speakers found
# - List of included attendees
# - List of correctly excluded names
# - Speaking pattern analysis with frequencies
# - Integration test with brief minutes generation
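Both prerequisites named in the usage example (the transcript file and the API key) can be checked before the test runs. The following is a minimal sketch; `missing_prerequisites` is a hypothetical helper, and it assumes the generator reads the key from the `OPENAI_API_KEY` environment variable as the usage example suggests.

```python
import os
from pathlib import Path


def missing_prerequisites(transcript_path, env_var="OPENAI_API_KEY"):
    """Return a list of human-readable problems; an empty list means ready to run."""
    problems = []
    if not Path(transcript_path).is_file():
        problems.append(f"transcript not found: {transcript_path}")
    if not os.environ.get(env_var):
        problems.append(f"environment variable {env_var} is not set")
    return problems


# Path copied from the test source.
problems = missing_prerequisites(
    "/tf/active/leexi/leexi-20250618-transcript-development_team_meeting.md"
)
for problem in problems:
    print(f"Skipping test: {problem}")
```

Reporting the problems up front gives a clearer message than the generic error printed by the test's own exception handler.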

Best Practices

  • Ensure the transcript file path is correct and accessible before running the test
  • The function expects a specific transcript format with speaker timestamps (e.g., 'Speaker at HH:MM - HH:MM')
  • This is a test function designed for development/validation, not for production use
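  • The function uses a hardcoded list of excluded names specific to the test transcript - modify this list for different transcripts. Names are matched with a plain substring check (name in transcript), so a short name can also match inside a longer word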
  • The test truncates the transcript to 2000 characters for the integration test to improve speed
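  • Error handling catches every exception and prints the traceback instead of re-raising, which helps debugging but means a runner such as pytest will still report the test as passed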
  • The function provides visual indicators (emojis) for better readability of test results
  • Consider redirecting output to a log file for automated testing scenarios
  • The speaker pattern regex assumes a specific timestamp format - adjust if your transcript format differs
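Two of the points above, the timestamp format and the name matching, can be checked quickly before pointing the test at a new transcript. The sketch below reuses the regex from the test source; the sample lines and the `is_mentioned` helper are hypothetical and only illustrate the idea.

```python
import re

# Pattern copied from the test source.
SPEAKER_PATTERN = re.compile(r'^(.+) at \d+[h:]?\d*[:\-]\d+ - \d+[h:]?\d*[:\-]\d+')

# Hypothetical header lines in a few plausible formats.
candidates = [
    "Alice Smith at 0:05 - 0:42",
    "Alice Smith at 1h02:30 - 1h03:10",
    "Alice Smith at 00:05:10 - 00:05:42",
    "Alice Smith [00:05]",
    "Alice Smith: hello",
]
for line in candidates:
    match = SPEAKER_PATTERN.match(line)
    print(f"{line!r:45} -> {match.group(1) if match else 'no match'}")


def is_mentioned(name, text):
    """Whole-word check, a stricter alternative to `name in text`."""
    return re.search(rf'\b{re.escape(name)}\b', text) is not None


# A substring check matches "Manu" inside "Manual"; the whole-word check does not.
print("Manu" in "See the Manual", is_mentioned("Manu", "See the Manual"))
```

If the headers of a new transcript print "no match", the pattern needs adjusting before the frequency analysis can report anything for that file.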

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function test_attendee_extraction 89.0% similar

    A test function that validates the attendee extraction logic of the EnhancedMeetingMinutesGenerator by parsing a meeting transcript and displaying extracted metadata including speakers, date, and duration.

    From: /tf/active/vicechatdev/leexi/test_attendee_extraction.py
  • function test_mixed_previous_reports 62.6% similar

    A test function that validates the DocumentExtractor's ability to extract text content from multiple file formats (text and markdown) and combine them into a unified previous reports summary.

    From: /tf/active/vicechatdev/leexi/test_enhanced_reports.py
  • function extract_previous_reports_summary 57.1% similar

    Extracts and summarizes key information from previous meeting report files using document extraction and OpenAI's GPT-4o-mini model to provide context for upcoming meetings.

    From: /tf/active/vicechatdev/leexi/app.py
  • function handle_potential_truncation 57.1% similar

    Detects and handles truncated meeting minutes by comparing agenda items to discussion sections, then attempts regeneration with enhanced instructions to ensure completeness.

    From: /tf/active/vicechatdev/leexi/app.py
  • function test_multiple_files 56.5% similar

    A test function that validates the extraction of text content from multiple document files using a DocumentExtractor instance, displaying extraction results and simulating combined content processing.

    From: /tf/active/vicechatdev/leexi/test_multiple_files.py