class EnhancedMeetingMinutesGenerator

Maturity: 15

A class that generates professional meeting minutes from a meeting transcript and the accompanying PowerPoint presentation, using GPT-4o, Azure OpenAI, or Gemini.

File: /tf/active/vicechatdev/leexi/enhanced_meeting_minutes_generator.py
Lines: 213 - 1143
Complexity: moderate

Purpose

Generates structured meeting minutes by combining a meeting transcript with the PowerPoint presentation shown during the meeting. The class extracts meeting metadata (date, duration, and actual speakers), formats slide and table content, builds structured prompts, and calls GPT-4o, Azure OpenAI, or Gemini to produce the minutes. It supports automatic continuation when output is truncated, configurable rigor/detail/action/style settings, token-limit checks, and output-quality validation, and can save the result to disk.

Source Code

class EnhancedMeetingMinutesGenerator:
    def __init__(self, model: str = "gpt-4o", api_key: Optional[str] = None):
        """Initialize the enhanced generator with specified model and API key."""
        self.model = model.lower()
        self.ppt_processor = PowerPointProcessor()
        

        if self.model == "gpt-4o":
            if not OPENAI_AVAILABLE:
                raise Exception("OpenAI library not installed. Run: pip install openai")
            if not api_key:
                api_key = os.getenv('OPENAI_API_KEY')
            if not api_key:
                raise Exception("OpenAI API key not provided")
            self.client = openai.OpenAI(api_key=api_key)
            
        elif self.model == "azure-gpt-4o":
            if not OPENAI_AVAILABLE:
                raise Exception("OpenAI library not installed. Run: pip install openai")
            azure_endpoint = os.getenv('AZURE_OPENAI_ENDPOINT')
            azure_api_key = api_key or os.getenv('AZURE_OPENAI_API_KEY')
            
            if not azure_endpoint or not azure_api_key:
                raise Exception("Azure OpenAI endpoint and API key must be provided")
            
            # Ensure the endpoint has the correct format
            if azure_endpoint.endswith('/'):
                azure_endpoint = azure_endpoint.rstrip('/')
            
            logger.info(f"Azure OpenAI endpoint: {azure_endpoint}")
            logger.info(f"Azure API key length: {len(azure_api_key)}")
                
            self.client = openai.AzureOpenAI(
                api_key=azure_api_key,
                api_version="2024-08-01-preview",
                azure_endpoint=azure_endpoint
            )
            
            logger.info(f"Azure OpenAI client initialized with base URL: {self.client._base_url}")
            
        elif self.model == "gemini":
            if not GEMINI_AVAILABLE:
                raise Exception("Google Generative AI library not installed. Run: pip install google-generativeai")
            if not api_key:
                api_key = os.getenv('GEMINI_API_KEY')
            if not api_key:
                raise Exception("Gemini API key not provided")
            genai.configure(api_key=api_key)
            self.client = genai.GenerativeModel('gemini-2.0-flash-exp')
            
        else:
            raise Exception(f"Unsupported model: {model}. Choose 'gpt-4o', 'azure-gpt-4o', or 'gemini'")
        
    def load_transcript(self, file_path: str) -> str:
        """Load transcript from file."""
        try:
            with open(file_path, 'r', encoding='utf-8') as file:
                return file.read()
        except Exception as e:
            raise Exception(f"Error loading transcript: {e}")
    
    def process_powerpoint_content(self, ppt_path: str) -> Dict:
        """Process PowerPoint presentation and extract content."""
        if not os.path.exists(ppt_path):
            logger.warning(f"PowerPoint file not found: {ppt_path}")
            return {"text_chunks": [], "table_chunks": []}
        
        return self.ppt_processor.process_powerpoint(ppt_path)
    
    def parse_transcript_metadata(self, transcript: str) -> Dict[str, str]:
        """Extract meeting metadata from transcript."""
        # Extract date from filename or content
        date_match = re.search(r'(\d{4}-\d{2}-\d{2}|\d{8})', transcript)
        meeting_date = date_match.group(1) if date_match else datetime.now().strftime('%Y-%m-%d')
        
        # Extract actual speakers (people who spoke, not just mentioned)
        speaker_pattern = r'^(.+) at \d+[h:]?\d*[:\-]\d+ - \d+[h:]?\d*[:\-]\d+'
        speakers = set()
        speaker_frequency = {}
        
        for line in transcript.split('\n'):
            line = line.strip()
            if not line:
                continue
            match = re.match(speaker_pattern, line)
            if match:
                speaker = match.group(1).strip()
                # Filter out generic speaker names, meeting rooms, and system messages
                if (speaker and 
                    not re.match(r'^Speaker \d+$', speaker) and 
                    not speaker.startswith('Vice Lln Level') and
                    not speaker.startswith('Meeting Room') and
                    not speaker.startswith('System') and
                    # Filter out very short/generic names that might be misidentified
                    len(speaker) > 2 and
                    # Check if it looks like a real name (has letters, not just numbers/symbols)
                    re.search(r'[A-Za-z]', speaker)):
                    speakers.add(speaker)
                    speaker_frequency[speaker] = speaker_frequency.get(speaker, 0) + 1
        
        # Only include speakers who spoke multiple times (reduces false positives from mentions)
        # But keep speakers with full names (First Last) even if they spoke only once
        actual_speakers = set()
        for speaker in speakers:
            # Keep if they spoke multiple times OR have a full name (space indicates first/last name)
            if (speaker_frequency[speaker] > 1 or 
                (' ' in speaker and len(speaker.split()) >= 2)):
                actual_speakers.add(speaker)
        
        return {
            'date': meeting_date,
            'speakers': sorted(list(actual_speakers)),  # Sort for consistent output
            'duration': self._extract_duration(transcript)
        }
    
    def _extract_duration(self, transcript: str) -> str:
        """Extract meeting duration from transcript."""
        time_pattern = r'(\d+[h:]?\d+[:\-]\d+)'
        times = re.findall(time_pattern, transcript)
        if len(times) >= 2:
            start_time = times[0]
            end_time = times[-1]
            return f"{start_time} - {end_time}"
        return "Duration not available"
    
    def format_powerpoint_content(self, ppt_content: Dict) -> str:
        """Format PowerPoint content for inclusion in the prompt."""
        if not ppt_content:
            return "No PowerPoint content available."
        
        formatted_content = []
        
        # Add text chunks
        text_chunks = ppt_content.get("text_chunks", [])
        if text_chunks:
            formatted_content.append("**PRESENTATION SLIDES:**")
            for chunk in text_chunks:
                slide_title = chunk[0]
                slide_content = chunk[1]
                formatted_content.append(f"\n{slide_title}")
                formatted_content.append(f"{slide_content}")
        
        # Add table chunks
        table_chunks = ppt_content.get("table_chunks", [])
        if table_chunks:
            formatted_content.append("\n**PRESENTATION TABLES:**")
            for chunk in table_chunks:
                table_title = chunk[0]
                table_content = chunk[1]
                formatted_content.append(f"\n{table_title}")
                formatted_content.append(f"{table_content}")
        
        return "\n".join(formatted_content)
    
    def generate_enhanced_meeting_minutes_gpt4o(self, transcript: str, ppt_content: Dict = None, 
                                              meeting_title: str = "Development Team Meeting") -> str:
        """Generate enhanced meeting minutes using GPT-4o."""
        
        metadata = self.parse_transcript_metadata(transcript)
        ppt_formatted = self.format_powerpoint_content(ppt_content) if ppt_content else ""
        
        prompt = self._create_enhanced_prompt(transcript, metadata, ppt_formatted, meeting_title)
        
        try:
            # Use appropriate model name for the platform
            model_name = "OneCo-gpt" if self.model == "azure-gpt-4o" else "gpt-4o"
            
            response = self.client.chat.completions.create(
                model=model_name,
                messages=[
                    {
                        "role": "system", 
                        "content": "You are an expert pharmaceutical industry meeting secretary who creates clear, professional meeting minutes from both transcripts and presentation materials. Focus on extracting key decisions, action items, and technical discussions while maintaining professional pharmaceutical industry terminology. Integrate presentation content with spoken discussions to create comprehensive minutes that capture the full context of vaccine development meetings."
                    },
                    {"role": "user", "content": prompt}
                ],
                max_tokens=16384,  # Use correct GPT-4o limit
                temperature=0.3
            )
            
            return response.choices[0].message.content
        
        except Exception as e:
            raise Exception(f"Error generating enhanced meeting minutes with GPT-4o: {e}")
    
    def generate_enhanced_meeting_minutes_gemini(self, transcript: str, ppt_content: Dict = None, 
                                               meeting_title: str = "Development Team Meeting") -> str:
        """Generate enhanced meeting minutes using Gemini."""
        
        metadata = self.parse_transcript_metadata(transcript)
        ppt_formatted = self.format_powerpoint_content(ppt_content) if ppt_content else ""
        
        prompt = self._create_enhanced_prompt(transcript, metadata, ppt_formatted, meeting_title)
        
        try:
            response = self.client.generate_content(
                prompt,
                generation_config=genai.types.GenerationConfig(
                    max_output_tokens=8192,  # Use conservative Gemini limit
                    temperature=0.3,
                )
            )
            
            return response.text
        
        except Exception as e:
            raise Exception(f"Error generating enhanced meeting minutes with Gemini: {e}")
    
    def _create_enhanced_prompt(self, transcript: str, metadata: Dict, ppt_formatted: str, meeting_title: str) -> str:
        """Create the enhanced prompt for LLM processing."""
        return f"""You are an expert meeting secretary tasked with creating professional meeting minutes from both a meeting transcript and the PowerPoint presentation that was shown during the meeting.

Transform the following materials into well-structured meeting minutes with these sections:

## Meeting Information
- **Title:** {meeting_title}
- **Date:** {metadata['date']}
- **Duration:** {metadata['duration']}
- **Attendees:** {', '.join(metadata['speakers']) if metadata['speakers'] else 'Multiple participants (see transcript)'}

## Executive Summary
Provide a brief overview of the meeting's main purpose and key outcomes (2-3 sentences), integrating both discussion and presentation highlights.

## Meeting Agenda
Based on the topics discussed in the transcript and presented in the PowerPoint, create a structured agenda with numbered items:
1. [Main topic 1 - e.g., Preclinical Publications & IP Updates]
2. [Main topic 2 - e.g., Clinical Development Plan Review]
3. [Main topic 3 - e.g., Study Design Modifications]
4. [Additional topics as identified from transcript and presentation]

## Presentation Overview
- Summary of the key points covered in the PowerPoint presentation
- Main slides and their content
- Important data, charts, or tables presented
- How presentation content supports meeting discussions

## Meeting Discussion by Agenda Item

### 1. [Agenda Item Title]
**Summary:** [2-3 sentence summary of the key discussion points, decisions, and outcomes for this agenda item, integrating presentation content where relevant]

**Key Points:**
- [Bullet point 1 - discussion and/or presentation content]
- [Bullet point 2 - discussion and/or presentation content]
- [Additional relevant details from both sources]

**Decisions Made:**
- [Specific decision 1 with rationale if provided]
- [Specific decision 2 with rationale if provided]

### 2. [Agenda Item Title]
**Summary:** [2-3 sentence summary of the key discussion points, decisions, and outcomes for this agenda item, integrating presentation content where relevant]

**Key Points:**
- [Bullet point 1 - discussion and/or presentation content]
- [Bullet point 2 - discussion and/or presentation content]
- [Additional relevant details from both sources]

**Decisions Made:**
- [Specific decision 1 with rationale if provided]
- [Specific decision 2 with rationale if provided]

[Continue this pattern for all agenda items identified]

## Action Items

| Priority | Action Item | Responsible Party | Deadline | Status | Notes |
|----------|-------------|-------------------|----------|---------|-------|
| High | [Action description] | [Name/Team] | [Date/Timeline] | Open | [Additional context] |
| Medium | [Action description] | [Name/Team] | [Date/Timeline] | Open | [Additional context] |
| Low | [Action description] | [Name/Team] | TBD | Open | [Additional context] |

## Next Steps & Follow-up Meetings
- **Upcoming Meetings:** [List scheduled meetings with dates/purposes]
- **Outstanding Issues:** [Items requiring further discussion or resolution]
- **Future Planning:** [Long-term items or considerations]

## Technical Specifications & Key Data
- **Dosages & Formulations:** [Key technical specifications discussed]
- **Study Parameters:** [Important study design elements]
- **Timeline Impacts:** [Critical dates and dependencies]
- **Presentation Data:** [Key technical data from slides/tables]

**Instructions for Processing:**
1. First, analyze both the transcript and presentation content to identify 4-6 main agenda topics based on discussion flow and presentation structure
2. Create a logical agenda structure that reflects the meeting's natural progression and presentation flow
3. Organize all content under the appropriate agenda items with chapter-like summaries that integrate both transcript and presentation content
4. Extract action items and format them in a clear table with priority levels (High/Medium/Low)
5. Clean up conversational language into professional pharmaceutical industry language
6. Ignore technical difficulties, off-topic chatter, and multilingual fragments
7. Use clear, professional pharmaceutical industry language
8. Be specific about dosages, study phases, and technical details from both sources
9. Maintain context of vaccine development discussions
10. Ensure each agenda chapter has a concise summary plus detailed key points that seamlessly integrate spoken discussion with presentation content
11. Reference specific slides or presentation elements when relevant to discussions
12. Cross-reference presentation content with transcript discussions

**MEETING TRANSCRIPT:**
{transcript}

**POWERPOINT PRESENTATION CONTENT:**
{ppt_formatted}

Generate comprehensive meeting minutes following the structure above, focusing on the pharmaceutical development context and organizing content by agenda chapters that effectively combine both the spoken discussion and the presented material."""

    def generate_enhanced_meeting_minutes_with_context(self, transcript: str, ppt_content: Dict = None, 
                                                     meeting_title: str = "Development Team Meeting",
                                                     previous_reports_summary: str = "",
                                                     user_instructions: str = "") -> str:
        """Generate enhanced meeting minutes with additional context from previous reports and user instructions."""
        
        metadata = self.parse_transcript_metadata(transcript)
        ppt_formatted = self.format_powerpoint_content(ppt_content) if ppt_content else ""
        
        prompt = self._create_enhanced_prompt_with_context(
            transcript, metadata, ppt_formatted, meeting_title, 
            previous_reports_summary, user_instructions
        )
        
        if self.model == "gpt-4o":
            return self._generate_with_gpt4o(prompt)
        elif self.model == "azure-gpt-4o":
            return self._generate_with_gpt4o(prompt)  # Use same method as regular GPT-4o
        elif self.model == "gemini":
            return self._generate_with_gemini(prompt)
        else:
            raise Exception(f"Unknown model: {self.model}")
    
    def _generate_with_gpt4o(self, prompt: str) -> str:
        """Generate content using GPT-4o with automatic continuation for longer outputs"""
        try:
            # Check token limits and adjust output accordingly
            token_info = self._check_token_limits(prompt)
            if not token_info["fits"]:
                logger.warning(f"Prompt ({token_info['prompt_tokens']} tokens) may exceed model limits")
            
            # Use actual GPT-4o completion token limit
            max_output_tokens = min(16384, token_info.get("recommended_output_limit", 16384))
            logger.info(f"Using {max_output_tokens} max output tokens for GPT-4o")
            
            # First attempt - let the model decide output length (no max_tokens)
            try:
                # Use appropriate model name for the platform
                model_name = "OneCo-gpt" if self.model == "azure-gpt-4o" else "gpt-4o"
                
                response = self.client.chat.completions.create(
                    model=model_name,
                    messages=[
                        {
                            "role": "system", 
                            "content": """You are an expert pharmaceutical meeting secretary creating comprehensive, detailed meeting minutes from source materials. 

CORE PRINCIPLES:
1. **COMPLETENESS**: Generate ALL agenda items identified with rich, detailed content for each section
2. **COMPREHENSIVE COVERAGE**: Provide substantial content under each agenda item - expand on discussions, context, and implications  
3. **NATURAL SUMMARIZATION**: Summarize and rephrase content naturally while preserving meaning and key information
4. **ACTION & DECISION FOCUS**: Actively identify and clearly articulate all decisions made and action items, even if implicit
5. **BALANCED DETAIL**: Each agenda section should be substantive with multiple key points and thorough coverage

Create detailed, professional minutes that capture the full richness of the meeting discussion."""
                        },
                        {"role": "user", "content": prompt}
                    ],
                    temperature=0.3  # Increased for more natural, flowing output
                )
                
                result = response.choices[0].message.content
                
                # Check if output appears complete (has action items section which should be at the end)
                if "## Action Items" in result and not result.strip().endswith("..."):
                    return result
                else:
                    logger.warning("Output may be incomplete, attempting continuation...")
                    return self._continue_generation_gpt4o(prompt, result)
                    
            except Exception as first_attempt_error:
                logger.warning(f"First attempt without max_tokens failed: {first_attempt_error}")
                
                # Fallback: Use max_tokens with continuation
                model_name = "OneCo-gpt" if self.model == "azure-gpt-4o" else "gpt-4o"
                
                response = self.client.chat.completions.create(
                    model=model_name,
                    messages=[
                        {
                            "role": "system", 
                            "content": """You are an expert pharmaceutical meeting secretary creating comprehensive, detailed meeting minutes from source materials. 

CORE PRINCIPLES:
1. **COMPLETENESS**: Generate ALL agenda items identified with rich, detailed content for each section
2. **COMPREHENSIVE COVERAGE**: Provide substantial content under each agenda item - expand on discussions, context, and implications
3. **NATURAL SUMMARIZATION**: Summarize and rephrase content naturally while preserving meaning and key information
4. **ACTION & DECISION FOCUS**: Actively identify and clearly articulate all decisions made and action items, even if implicit
5. **BALANCED DETAIL**: Each agenda section should be substantive with multiple key points and thorough coverage

Create detailed, professional minutes that capture the full richness of the meeting discussion."""
                        },
                        {"role": "user", "content": prompt}
                    ],
                    max_tokens=max_output_tokens,
                    temperature=0.3  # Increased for more natural, flowing output
                )
                
                result = response.choices[0].message.content
                
                # Check if we need continuation
                if response.choices[0].finish_reason == "length":
                    logger.info("Output was truncated due to token limit, attempting continuation...")
                    return self._continue_generation_gpt4o(prompt, result)
                
                return result
        
        except Exception as e:
            raise Exception(f"Error generating enhanced meeting minutes with GPT-4o: {e}")
    
    def _continue_generation_gpt4o(self, original_prompt: str, partial_result: str) -> str:
        """Continue generation for GPT-4o when output was truncated"""
        try:
            # Create continuation prompt
            continuation_prompt = f"""You are continuing to generate meeting minutes. Here's what you've generated so far:

{partial_result}

CRITICAL: Continue from where you left off and complete ALL remaining sections. Focus on:
1. Complete any truncated agenda item discussions
2. Include all remaining agenda items that weren't covered
3. Complete the Action Items table
4. Include Next Steps & Follow-up section
5. Include Technical Specifications & Key Data section

Continue writing from where the above content ends. Do not repeat any content that's already been written."""

            model_name = "OneCo-gpt" if self.model == "azure-gpt-4o" else "gpt-4o"
            
            response = self.client.chat.completions.create(
                model=model_name,
                messages=[
                    {
                        "role": "system",
                        "content": "You are continuing to write comprehensive meeting minutes. Complete all remaining sections with detailed, natural content. Focus on substance and completeness rather than rigid adherence to exact formats."
                    },
                    {"role": "user", "content": continuation_prompt}
                ],
                temperature=0.3  # Increased for more natural continuation
            )
            
            continuation = response.choices[0].message.content
            
            # Combine the results
            combined_result = partial_result.rstrip() + "\n\n" + continuation.lstrip()
            
            logger.info("Successfully continued generation")
            return combined_result
            
        except Exception as e:
            logger.error(f"Continuation failed: {e}")
            # Return partial result with warning
            return partial_result + "\n\n## Generation Warning\n⚠️ Output may be incomplete due to token limits. Manual review recommended."
    
    def _generate_with_gemini(self, prompt: str) -> str:
        """Generate content using Gemini with automatic continuation for longer outputs"""
        try:
            # Check token limits and adjust output accordingly
            token_info = self._check_token_limits(prompt)
            if not token_info["fits"]:
                logger.warning(f"Prompt ({token_info['prompt_tokens']} tokens) may exceed model limits")
            
            # Use conservative Gemini output limit
            max_output_tokens = min(8192, token_info.get("recommended_output_limit", 8192))
            logger.info(f"Using {max_output_tokens} max output tokens for Gemini")
            
            # First attempt - let the model decide output length (no max_output_tokens)
            try:
                response = self.client.generate_content(
                    prompt,
                    generation_config=genai.types.GenerationConfig(
                        temperature=0.3,  # Increased for more natural output
                    )
                )
                
                result = response.text
                
                # Check if output appears complete
                if "## Action Items" in result and not result.strip().endswith("..."):
                    return result
                else:
                    logger.warning("Output may be incomplete, attempting continuation...")
                    return self._continue_generation_gemini(prompt, result)
                    
            except Exception as first_attempt_error:
                logger.warning(f"First attempt without max_output_tokens failed: {first_attempt_error}")
                
                # Fallback: Use max_output_tokens with continuation
                response = self.client.generate_content(
                    prompt,
                    generation_config=genai.types.GenerationConfig(
                        max_output_tokens=max_output_tokens,
                        temperature=0.3,  # Increased for more natural output
                    )
                )
                
                result = response.text
                
                # Check if we need continuation (Gemini doesn't have finish_reason like OpenAI)
                if len(result) > max_output_tokens * 3:  # Rough check for truncation
                    logger.info("Output may be truncated, attempting continuation...")
                    return self._continue_generation_gemini(prompt, result)
                
                return result
        
        except Exception as e:
            raise Exception(f"Error generating enhanced meeting minutes with Gemini: {e}")
    
    def _continue_generation_gemini(self, original_prompt: str, partial_result: str) -> str:
        """Continue generation for Gemini when output was truncated"""
        try:
            # Create continuation prompt
            continuation_prompt = f"""You are continuing to generate meeting minutes. Here's what you've generated so far:

{partial_result}

CRITICAL: Continue from where you left off and complete ALL remaining sections. Focus on:
1. Complete any truncated agenda item discussions
2. Include all remaining agenda items that weren't covered
3. Complete the Action Items table
4. Include Next Steps & Follow-up section
5. Include Technical Specifications & Key Data section

Continue writing from where the above content ends. Do not repeat any content that's already been written."""

            response = self.client.generate_content(
                continuation_prompt,
                generation_config=genai.types.GenerationConfig(
                    temperature=0.3,  # Increased for more natural continuation
                )
            )
            
            continuation = response.text
            
            # Combine the results
            combined_result = partial_result.rstrip() + "\n\n" + continuation.lstrip()
            
            logger.info("Successfully continued generation")
            return combined_result
            
        except Exception as e:
            logger.error(f"Continuation failed: {e}")
            # Return partial result with warning
            return partial_result + "\n\n## Generation Warning\n⚠️ Output may be incomplete due to token limits. Manual review recommended."
    
    def generate_meeting_minutes_with_config(self, transcript: str, ppt_content: Dict = None, 
                                           meeting_title: str = "Development Team Meeting",
                                           previous_reports_summary: str = "",
                                           user_instructions: str = "",
                                           rigor_level: str = "balanced",
                                           detail_level: str = "comprehensive", 
                                           action_focus: str = "standard",
                                           output_style: str = "professional") -> str:
        """Generate meeting minutes with configuration-based prompt variants"""
        
        metadata = self.parse_transcript_metadata(transcript)
        ppt_formatted = self.format_powerpoint_content(ppt_content) if ppt_content else ""
        
        # Create configuration-based prompt
        prompt = self._create_configurable_prompt(
            transcript, metadata, ppt_formatted, meeting_title, 
            previous_reports_summary, user_instructions,
            rigor_level, detail_level, action_focus, output_style
        )
        
        if self.model == "gpt-4o":
            return self._generate_with_gpt4o(prompt)
        elif self.model == "azure-gpt-4o":
            return self._generate_with_gpt4o(prompt)  # Use same method as regular GPT-4o
        elif self.model == "gemini":
            return self._generate_with_gemini(prompt)
        else:
            raise Exception(f"Unknown model: {self.model}")
    
    def _create_configurable_prompt(self, transcript: str, metadata: Dict, ppt_formatted: str, 
                                  meeting_title: str, previous_reports_summary: str, 
                                  user_instructions: str, rigor_level: str, detail_level: str,
                                  action_focus: str, output_style: str) -> str:
        """Create a prompt based on the selected configuration options"""
        
        # Build configuration-specific system message
        system_config = self._build_system_config(rigor_level, detail_level, action_focus, output_style)
        
        # Build context section
        context_section = ""
        if previous_reports_summary:
            context_section = f"""
## CONTEXT FROM PREVIOUS MEETINGS:
{previous_reports_summary}

**Context Usage Guidelines:**
- Use this context to provide relevant background and check for continuity
- Cross-reference action items for status updates when explicitly mentioned
- Note any conflicts between current and previous meeting information
"""
        
        # Build user instructions section
        instructions_section = ""
        if user_instructions:
            instructions_section = f"""
## SPECIFIC USER INSTRUCTIONS:
{user_instructions}

**Implementation Notes:**
- Incorporate these instructions while maintaining the selected rigor and detail levels
- Balance user requirements with the chosen configuration settings
"""
        
        # Get style-specific template
        template = self._get_style_template(output_style, detail_level, metadata, meeting_title)
        
        return f"""{system_config}

{context_section}

{instructions_section}

{template}

**MEETING TRANSCRIPT:**
{transcript}

**POWERPOINT PRESENTATION CONTENT:**
{ppt_formatted}

Generate meeting minutes following the structure above with the specified configuration settings."""
    
    def _build_system_config(self, rigor_level: str, detail_level: str, action_focus: str, output_style: str) -> str:
        """Build system configuration based on selected options"""
        
        # Rigor level configurations
        rigor_configs = {
            "flexible": "Prioritize readability and natural flow. Summarize and paraphrase content freely while preserving key meaning.",
            "balanced": "Balance accuracy with natural summarization. Use source materials as foundation while creating flowing, professional content.",
            "standard": "Balance accuracy with natural summarization. Use source materials as foundation while creating flowing, professional content.",  # Same as balanced
            "strict": "Maintain high accuracy with extensive cross-checking. Use close paraphrasing and mark unclear areas.",
            "forensic": "Maximum precision required. Use exact quotes for commitments and decisions. Mark all assumptions and uncertainties."
        }
        
        # Detail level configurations
        detail_configs = {
            "executive": "Provide high-level summary focus with essential decisions and outcomes only.",
            "concise": "Include essential information with key points and main decisions clearly stated.",
            "detailed": "Provide substantial content with comprehensive key points and thorough decision documentation.",
            "comprehensive": "Generate rich, detailed content with full context, implications, and extensive coverage of all topics."
        }
        
        # Action focus configurations
        action_configs = {
            "conservative": "Identify only clear, unambiguous action items and decisions explicitly stated in the materials.",
            "standard": "Capture explicit actions and decisions, including reasonably clear commitments and next steps.",
            "aggressive": "Actively identify implicit actions, commitments, and potential follow-up items from discussion context.",
            "contextual": "Extract actions with full decision context, including rationale, dependencies, and background information."
        }
        
        # Output style configurations
        style_configs = {
            "professional": "Use standard business meeting language with clear, professional tone and structure.",
            "technical": "Employ industry-specific terminology and technical precision appropriate for specialist audiences.",
            "formal": "Apply corporate governance style with formal language and structured documentation standards.",
            "narrative": "Create story-like flow with contextual connections and natural progression between topics."
        }
        
        return f"""You are an expert meeting secretary creating professional meeting minutes with the following configuration:

**RIGOR LEVEL - {rigor_level.upper()}:**
{rigor_configs[rigor_level]}

**DETAIL LEVEL - {detail_level.upper()}:**
{detail_configs[detail_level]}

**ACTION FOCUS - {action_focus.upper()}:**
{action_configs[action_focus]}

**OUTPUT STYLE - {output_style.upper()}:**
{style_configs[output_style]}

**IMPORTANT INSTRUCTIONS:**
1. Use ONLY the attendees specified in the template - do not add names mentioned in conversation
2. The attendee list has been pre-extracted from actual speakers, not people mentioned in discussion
3. Follow the exact structure and information provided in the template
4. Do not override meeting metadata (date, duration, attendees) with your own interpretation

Create meeting minutes that adhere to these specifications while maintaining professional quality and completeness."""
    
    def _get_style_template(self, output_style: str, detail_level: str, metadata: Dict, meeting_title: str) -> str:
        """Get the appropriate template based on style and detail level"""
        
        # Note: "executive" is defined as a detail_level elsewhere in this class;
        # this branch only fires if it is passed explicitly as the output_style.
        if output_style == "executive":
            return self._get_executive_template(metadata, meeting_title)
        elif output_style == "narrative":
            return self._get_narrative_template(metadata, meeting_title)
        elif output_style == "formal":
            return self._get_formal_template(metadata, meeting_title)
        else:
            return self._get_standard_template(detail_level, metadata, meeting_title)
    
    def _get_standard_template(self, detail_level: str, metadata: Dict, meeting_title: str) -> str:
        """Get standard template with detail level adjustments"""
        
        if detail_level == "executive":
            key_points_guidance = "- [2-3 highest priority items only]"
            decisions_guidance = "- [Critical decisions only]"
        elif detail_level == "concise":
            key_points_guidance = "- [Essential points with brief explanations]"
            decisions_guidance = "- [Main decisions with basic rationale]"
        elif detail_level == "detailed":
            key_points_guidance = "- [Comprehensive points with context and implications]"
            decisions_guidance = "- [Detailed decisions with full rationale and process]"
        else:  # comprehensive
            key_points_guidance = "- [Rich, detailed points with full context, background, and implications]"
            decisions_guidance = "- [Thorough decision documentation with rationale, alternatives considered, and implementation notes]"
        
        attendees_list = ', '.join(metadata['speakers']) if metadata['speakers'] else 'Multiple participants (see transcript)'
        
        return f"""## Meeting Information
- **Title:** {meeting_title}
- **Date:** {metadata['date']}
- **Duration:** {metadata['duration']}
- **Attendees:** {attendees_list}

## Executive Summary
[2-3 sentence overview of meeting purpose and key outcomes]

## Meeting Agenda
[Numbered list of all topics discussed]

## Presentation Overview
[Summary of PowerPoint content and integration with discussions]

## Meeting Discussion by Agenda Item

### 1. [Agenda Item Title]
**Summary:** [Overview of key discussion points and outcomes]

**Key Points:**
{key_points_guidance}

**Decisions Made:**
{decisions_guidance}

[Repeat for all agenda items]

## Action Items
[Detailed table with responsibilities, deadlines, and context]

## Next Steps & Follow-up
[Future planning and outstanding issues]

## Technical Specifications & Key Data
[Technical details and specifications discussed]"""
    
    def _get_executive_template(self, metadata: Dict, meeting_title: str) -> str:
        """Get executive summary focused template"""
        attendees_list = ', '.join(metadata['speakers']) if metadata['speakers'] else 'Multiple participants (see transcript)'
        
        return f"""## Executive Meeting Summary
- **Meeting:** {meeting_title} - {metadata['date']}
- **Attendees:** {attendees_list}
- **Key Decisions:** [Top 3-5 critical decisions]
- **Action Items:** [Priority actions with owners and deadlines]
- **Outstanding Issues:** [Items requiring follow-up]
- **Next Steps:** [Immediate next steps and upcoming meetings]

## Critical Discussion Points
[Brief overview of main topics with focus on decisions and outcomes]

## Priority Action Items
[Condensed action item list with high/medium/low priority classification]"""
    
    def _get_narrative_template(self, metadata: Dict, meeting_title: str) -> str:
        """Get narrative flow template"""
        attendees_list = ', '.join(metadata['speakers']) if metadata['speakers'] else 'Multiple participants (see transcript)'
        
        return f"""## Meeting Overview
**Title:** {meeting_title}
**Date:** {metadata['date']}
**Duration:** {metadata['duration']}
**Attendees:** {attendees_list}

[Natural narrative describing the meeting flow and context]

## Discussion Flow
[Story-like progression through topics with contextual connections]

### Opening Discussion
[How the meeting began and initial topics]

### Main Topics Covered
[Natural flow through agenda items with connections and context]

### Decision Points
[Key moments where decisions were made, with background and rationale]

### Action Planning
[How action items emerged from discussions]

## Meeting Outcomes
[Natural summary of what was accomplished and next steps]

## Action Items and Follow-up
[Detailed action items with context of how they emerged]"""
    
    def _get_formal_template(self, metadata: Dict, meeting_title: str) -> str:
        """Get formal corporate governance template"""
        attendees_list = ', '.join(metadata['speakers']) if metadata['speakers'] else 'Multiple participants (see transcript)'
        
        return f"""## MEETING MINUTES

**Meeting Title:** {meeting_title}
**Date and Time:** {metadata['date']} - {metadata['duration']}
**Location/Platform:** [Meeting location]
**Attendees:** {attendees_list}
**Meeting Called to Order:** [Time]

## AGENDA ITEMS

### Item 1: [Agenda Item]
**Presented by:** [Presenter]
**Discussion:** [Formal summary of discussion]
**Resolution:** [Formal decisions and resolutions]
**Vote/Consensus:** [Decision process]

[Repeat for all agenda items]

## MOTIONS AND RESOLUTIONS
[Formal list of motions, seconds, and voting outcomes]

## ACTION ITEMS AND ASSIGNMENTS
[Formal table with specific assignments and accountability]

## ADJOURNMENT
**Meeting Adjourned:** [Time]
**Next Meeting:** [Date and agenda items]

**Minutes Prepared by:** [Secretary/Recorder]
**Date of Preparation:** [Date]"""

    def save_minutes(self, minutes: str, output_path: str):
        """Save meeting minutes to file."""
        try:
            with open(output_path, 'w', encoding='utf-8') as file:
                file.write(minutes)
            print(f"Enhanced meeting minutes saved to: {output_path}")
        except Exception as e:
            raise Exception(f"Error saving meeting minutes: {e}")
    
    def _validate_output_quality(self, minutes: str, transcript: str, ppt_content: str) -> str:
        """Validate the generated minutes for completeness and quality"""
        
        validation_notes = []
        
        # Check for agenda completeness
        if "## Meeting Agenda" in minutes and "## Meeting Discussion by Agenda Item" in minutes:
            agenda_section = minutes.split("## Meeting Agenda")[1].split("##")[0]
            agenda_items = [line.strip() for line in agenda_section.split('\n') if line.strip().startswith(('1.', '2.', '3.', '4.', '5.', '6.', '7.', '8.', '9.'))]
            
            discussion_section = minutes.split("## Meeting Discussion by Agenda Item")[1].split("## Action Items")[0] if "## Action Items" in minutes else minutes.split("## Meeting Discussion by Agenda Item")[1]
            discussion_items = [line.strip() for line in discussion_section.split('\n') if line.strip().startswith('###')]
            
            if len(agenda_items) > len(discussion_items):
                validation_notes.append(f"⚠️ NOTICE: Agenda lists {len(agenda_items)} items but {len(discussion_items)} discussion sections found.")
                validation_notes.append("Consider regenerating if important topics appear to be missing.")
        
        # Check for content richness
        if "## Meeting Discussion by Agenda Item" in minutes:
            discussion_section = minutes.split("## Meeting Discussion by Agenda Item")[1].split("## Action Items")[0] if "## Action Items" in minutes else minutes.split("## Meeting Discussion by Agenda Item")[1]
            avg_section_length = len(discussion_section) // max(1, len([line for line in discussion_section.split('\n') if line.strip().startswith('###')]))
            
            if avg_section_length < 500:  # Less than ~500 characters per agenda item
                validation_notes.append("💡 TIP: Content appears condensed. Consider regenerating with instructions for more detailed coverage.")
        
        # Check for action items presence
        if "## Action Items" in minutes:
            action_section = minutes.split("## Action Items")[1].split("##")[0]
            if len(action_section.strip()) < 100:
                validation_notes.append("💡 TIP: Few action items identified. Ensure all commitments and follow-ups are captured.")
        
        # Add helpful validation notes if any issues found
        if validation_notes:
            validation_section = "\n\n## Quality Assessment Notes\n"
            for note in validation_notes:
                validation_section += f"- {note}\n"
            
            # Add general quality tips
            validation_section += "\n**Quality Enhancement Tips:**\n"
            validation_section += "- Use regeneration with specific instructions for more detailed coverage\n"
            validation_section += "- Request focus on specific agenda items that need more detail\n"
            validation_section += "- Consider breaking complex meetings into focused sessions\n"
            
            minutes += validation_section
        
        return minutes

    def _estimate_tokens(self, text: str) -> int:
        """Rough estimation of token count for text (approximately 4 characters per token)"""
        return len(text) // 4
    
    def _check_token_limits(self, prompt: str) -> Dict[str, int]:
        """Check if prompt fits within model token limits and provide recommendations"""
        prompt_tokens = self._estimate_tokens(prompt)
        
        if self.model in ["gpt-4o", "azure-gpt-4o"]:
            total_limit = 128000  # Input limit
            max_completion_tokens = 16384  # Actual GPT-4o completion limit
            recommended_output_limit = min(max_completion_tokens, total_limit - prompt_tokens - 1000)
            return {
                "prompt_tokens": prompt_tokens,
                "total_limit": total_limit,
                "max_completion_tokens": max_completion_tokens,
                "recommended_output_limit": max(1000, recommended_output_limit),  # Ensure minimum
                "fits": prompt_tokens < total_limit - 5000  # Need room for output
            }
        elif self.model == "gemini":
            total_limit = 1048576  # 1M+ tokens
            max_completion_tokens = 8192  # Conservative limit for Gemini
            recommended_output_limit = min(max_completion_tokens, total_limit - prompt_tokens - 1000)
            return {
                "prompt_tokens": prompt_tokens,
                "total_limit": total_limit,
                "max_completion_tokens": max_completion_tokens,
                "recommended_output_limit": max(1000, recommended_output_limit),
                "fits": prompt_tokens < total_limit - 10000  # Need room for large output
            }
        else:
            return {"prompt_tokens": prompt_tokens, "fits": False}

Parameters

Name     Type            Default     Kind
model    str             "gpt-4o"    keyword
api_key  Optional[str]   None        keyword

Parameter Details

model: LLM backend to use: "gpt-4o", "azure-gpt-4o", or "gemini".

api_key: API key for the selected backend. If omitted, the constructor falls back to the OPENAI_API_KEY, AZURE_OPENAI_API_KEY, or GEMINI_API_KEY environment variable.

Return Value

Instantiation returns an EnhancedMeetingMinutesGenerator configured with the selected model client.

Class Interface

Methods

__init__(self, model, api_key)

Purpose: Initialize the enhanced generator with specified model and API key.

Parameters:

  • model: Type: str
  • api_key: Type: Optional[str]

Returns: None
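
A minimal construction sketch (assumption: the relevant environment variables are already set; the constructor falls back to them when api_key is omitted, as shown in the source):

gen_openai = EnhancedMeetingMinutesGenerator(model="gpt-4o")        # uses OPENAI_API_KEY
gen_azure = EnhancedMeetingMinutesGenerator(model="azure-gpt-4o")   # uses AZURE_OPENAI_ENDPOINT + AZURE_OPENAI_API_KEY
gen_gemini = EnhancedMeetingMinutesGenerator(model="gemini")        # uses GEMINI_API_KEY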

load_transcript(self, file_path) -> str

Purpose: Load transcript from file.

Parameters:

  • file_path: Type: str

Returns: Returns str

process_powerpoint_content(self, ppt_path) -> Dict

Purpose: Process PowerPoint presentation and extract content.

Parameters:

  • ppt_path: Type: str

Returns: Returns Dict

parse_transcript_metadata(self, transcript) -> Dict[str, str]

Purpose: Extract meeting metadata from transcript.

Parameters:

  • transcript: Type: str

Returns: Returns Dict[str, str]
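
A small sketch of the metadata extraction, assuming `generator` is an instance created as above and that transcript lines follow the "<speaker> at <start> - <end>" pattern the parser matches (the sample text below is hypothetical):

sample = """Alice Martin at 00:01 - 00:45
Good morning, let's start with the clinical update.
Bob Keller at 00:46 - 01:30
Thanks Alice, the Phase 1 enrollment is on track.
"""

meta = generator.parse_transcript_metadata(sample)
# meta["speakers"] -> ["Alice Martin", "Bob Keller"]  (full names are kept even after a single turn)
# meta["duration"] -> "00:01 - 01:30"                 (first and last timestamps found)
# meta["date"]     -> today's date, since the sample contains no YYYY-MM-DD string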

_extract_duration(self, transcript) -> str

Purpose: Extract meeting duration from transcript.

Parameters:

  • transcript: Type: str

Returns: Returns str

format_powerpoint_content(self, ppt_content) -> str

Purpose: Format PowerPoint content for inclusion in the prompt.

Parameters:

  • ppt_content: Type: Dict

Returns: Returns str
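
A sketch of the dictionary shape the formatter expects, matching what process_powerpoint_content returns in the source (each chunk indexed as a (title, content) pair; the slide titles and content below are hypothetical):

ppt_content = {
    "text_chunks": [
        ("Slide 2: Clinical Development Plan", "Phase 1 design overview\nDose escalation cohorts"),
    ],
    "table_chunks": [
        ("Slide 5: Enrollment Status", "Site | Enrolled | Target\nBrussels | 12 | 20"),
    ],
}

print(generator.format_powerpoint_content(ppt_content))
# Prints the **PRESENTATION SLIDES:** section followed by **PRESENTATION TABLES:**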

generate_enhanced_meeting_minutes_gpt4o(self, transcript, ppt_content, meeting_title) -> str

Purpose: Generate enhanced meeting minutes using GPT-4o.

Parameters:

  • transcript: Type: str
  • ppt_content: Type: Dict
  • meeting_title: Type: str

Returns: Returns str

generate_enhanced_meeting_minutes_gemini(self, transcript, ppt_content, meeting_title) -> str

Purpose: Generate enhanced meeting minutes using Gemini.

Parameters:

  • transcript: Type: str
  • ppt_content: Type: Dict
  • meeting_title: Type: str

Returns: Returns str

_create_enhanced_prompt(self, transcript, metadata, ppt_formatted, meeting_title) -> str

Purpose: Create the enhanced prompt for LLM processing.

Parameters:

  • transcript: Type: str
  • metadata: Type: Dict
  • ppt_formatted: Type: str
  • meeting_title: Type: str

Returns: Returns str

generate_enhanced_meeting_minutes_with_context(self, transcript, ppt_content, meeting_title, previous_reports_summary, user_instructions) -> str

Purpose: Generate enhanced meeting minutes with additional context from previous reports and user instructions.

Parameters:

  • transcript: Type: str
  • ppt_content: Type: Dict
  • meeting_title: Type: str
  • previous_reports_summary: Type: str
  • user_instructions: Type: str

Returns: Returns str
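
A hedged call sketch; the title, previous-report summary, and instruction strings below are placeholders:

minutes = generator.generate_enhanced_meeting_minutes_with_context(
    transcript=transcript_text,
    ppt_content=ppt_content,
    meeting_title="Development Team Meeting",
    previous_reports_summary="Last meeting agreed to finalize the Phase 1 protocol.",
    user_instructions="Highlight any changes to the dosing schedule.",
)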

_generate_with_gpt4o(self, prompt) -> str

Purpose: Generate content using GPT-4o with automatic continuation for longer outputs

Parameters:

  • prompt: Type: str

Returns: Returns str

_continue_generation_gpt4o(self, original_prompt, partial_result) -> str

Purpose: Continue generation for GPT-4o when output was truncated

Parameters:

  • original_prompt: Type: str
  • partial_result: Type: str

Returns: Returns str

_generate_with_gemini(self, prompt) -> str

Purpose: Generate content using Gemini with automatic continuation for longer outputs

Parameters:

  • prompt: Type: str

Returns: Returns str

_continue_generation_gemini(self, original_prompt, partial_result) -> str

Purpose: Continue generation for Gemini when output was truncated

Parameters:

  • original_prompt: Type: str
  • partial_result: Type: str

Returns: Returns str

generate_meeting_minutes_with_config(self, transcript, ppt_content, meeting_title, previous_reports_summary, user_instructions, rigor_level, detail_level, action_focus, output_style) -> str

Purpose: Generate meeting minutes with configuration-based prompt variants

Parameters:

  • transcript: Type: str
  • ppt_content: Type: Dict
  • meeting_title: Type: str
  • previous_reports_summary: Type: str
  • user_instructions: Type: str
  • rigor_level: Type: str
  • detail_level: Type: str
  • action_focus: Type: str
  • output_style: Type: str

Returns: Returns str
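
A configuration sketch; the option values in the comments are the ones recognized by _build_system_config in the source:

minutes = generator.generate_meeting_minutes_with_config(
    transcript=transcript_text,
    ppt_content=ppt_content,
    meeting_title="Development Team Meeting",
    rigor_level="strict",        # flexible | balanced | standard | strict | forensic
    detail_level="detailed",     # executive | concise | detailed | comprehensive
    action_focus="aggressive",   # conservative | standard | aggressive | contextual
    output_style="technical",    # professional | technical | formal | narrative
)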

_create_configurable_prompt(self, transcript, metadata, ppt_formatted, meeting_title, previous_reports_summary, user_instructions, rigor_level, detail_level, action_focus, output_style) -> str

Purpose: Create a prompt based on the selected configuration options

Parameters:

  • transcript: Type: str
  • metadata: Type: Dict
  • ppt_formatted: Type: str
  • meeting_title: Type: str
  • previous_reports_summary: Type: str
  • user_instructions: Type: str
  • rigor_level: Type: str
  • detail_level: Type: str
  • action_focus: Type: str
  • output_style: Type: str

Returns: Returns str

_build_system_config(self, rigor_level, detail_level, action_focus, output_style) -> str

Purpose: Build system configuration based on selected options

Parameters:

  • rigor_level: Type: str
  • detail_level: Type: str
  • action_focus: Type: str
  • output_style: Type: str

Returns: Returns str

_get_style_template(self, output_style, detail_level, metadata, meeting_title) -> str

Purpose: Get the appropriate template based on style and detail level

Parameters:

  • output_style: Type: str
  • detail_level: Type: str
  • metadata: Type: Dict
  • meeting_title: Type: str

Returns: Returns str

_get_standard_template(self, detail_level, metadata, meeting_title) -> str

Purpose: Get standard template with detail level adjustments

Parameters:

  • detail_level: Type: str
  • metadata: Type: Dict
  • meeting_title: Type: str

Returns: Returns str

_get_executive_template(self, metadata, meeting_title) -> str

Purpose: Get executive summary focused template

Parameters:

  • metadata: Type: Dict
  • meeting_title: Type: str

Returns: Returns str

_get_narrative_template(self, metadata, meeting_title) -> str

Purpose: Get narrative flow template

Parameters:

  • metadata: Type: Dict
  • meeting_title: Type: str

Returns: Returns str

_get_formal_template(self, metadata, meeting_title) -> str

Purpose: Get formal corporate governance template

Parameters:

  • metadata: Type: Dict
  • meeting_title: Type: str

Returns: Returns str

save_minutes(self, minutes, output_path)

Purpose: Save meeting minutes to file.

Parameters:

  • minutes: Type: str
  • output_path: Type: str

Returns: None

_validate_output_quality(self, minutes, transcript, ppt_content) -> str

Purpose: Validate the generated minutes for completeness and quality

Parameters:

  • minutes: Type: str
  • transcript: Type: str
  • ppt_content: Type: str

Returns: Returns str

_estimate_tokens(self, text) -> int

Purpose: Rough estimation of token count for text (approximately 4 characters per token)

Parameters:

  • text: Type: str

Returns: Returns int

_check_token_limits(self, prompt) -> Dict[str, int]

Purpose: Check if prompt fits within model token limits and provide recommendations

Parameters:

  • prompt: Type: str

Returns: Returns Dict[str, int]
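
A quick sketch of the estimation arithmetic for the GPT-4o path (roughly 4 characters per token, 128,000-token input limit, 16,384 completion tokens), assuming generator.model is "gpt-4o" or "azure-gpt-4o":

prompt = "x" * 400_000                       # ~100,000 estimated tokens (len // 4)
info = generator._check_token_limits(prompt)
# info["prompt_tokens"]            -> 100000
# info["recommended_output_limit"] -> min(16384, 128000 - 100000 - 1000) = 16384
# info["fits"]                     -> True   (100000 < 128000 - 5000)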

Required Imports

import os
import re
from datetime import datetime
from typing import Dict, List, Optional

# Imported at module level behind availability flags (see OPENAI_AVAILABLE / GEMINI_AVAILABLE):
# import openai
# import google.generativeai as genai
# The class also relies on a module-level logger and on the PowerPointProcessor class.

Usage Example

# Example usage:
# generator = EnhancedMeetingMinutesGenerator(model="gpt-4o")
# minutes = generator.generate_enhanced_meeting_minutes_gpt4o(transcript_text, ppt_content)
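
A fuller end-to-end sketch, assuming OPENAI_API_KEY is set in the environment; the file paths are hypothetical:

generator = EnhancedMeetingMinutesGenerator(model="gpt-4o")

transcript_text = generator.load_transcript("meeting_2024-03-15.txt")
ppt_content = generator.process_powerpoint_content("meeting_2024-03-15.pptx")

minutes = generator.generate_enhanced_meeting_minutes_gpt4o(
    transcript_text,
    ppt_content,
    meeting_title="Development Team Meeting",
)
generator.save_minutes(minutes, "meeting_minutes_2024-03-15.md")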

Similar Components

AI-powered semantic similarity - components with related functionality:

  • class MeetingMinutesGenerator 62.3% similar

    A class that generates professional meeting minutes from meeting transcripts using OpenAI's GPT-4o model, with capabilities to parse metadata, extract action items, and format output.

    From: /tf/active/vicechatdev/meeting_minutes_generator.py
  • class MeetingMinutesGenerator_v1 60.3% similar

    A class that generates professional meeting minutes from meeting transcripts using either OpenAI's GPT-4o or Google's Gemini AI models.

    From: /tf/active/vicechatdev/advanced_meeting_minutes_generator.py
  • function regenerate_minutes 52.9% similar

    Flask route handler that regenerates meeting minutes from a previous session using modified instructions, model selection, and configuration parameters.

    From: /tf/active/vicechatdev/leexi/app.py
  • class DocxMerger 52.6% similar

    A class for merging Word (.docx) documents.

    From: /tf/active/vicechatdev/word_merge.py
  • function test_attendee_extraction 52.1% similar

    A test function that validates the attendee extraction logic of the EnhancedMeetingMinutesGenerator by parsing a meeting transcript and displaying extracted metadata including speakers, date, and duration.

    From: /tf/active/vicechatdev/leexi/test_attendee_extraction.py