
function extract_total_references

Maturity: 43

Extracts the total count of references from markdown-formatted content by first checking for a header line with the total, then falling back to manually counting reference entries.

File: /tf/active/vicechatdev/enhanced_word_converter_fixed.py
Lines: 73 - 88
Complexity: simple

Purpose

This function parses markdown documents that contain bibliographic references and returns the number of references present. It works in two stages: it first looks for a line that begins with '**Total References**:' and returns the integer that follows; if no such line is found or its value cannot be parsed, it counts the lines that start with '**[' and contain ']**'. This is useful for document processing pipelines that need to validate or report on reference counts in markdown-formatted academic or technical documents.

Source Code

def extract_total_references(markdown_content):
    """Extract total number of references from the markdown content"""
    lines = markdown_content.split('\n')
    for line in lines:
        if line.startswith('**Total References**:'):
            try:
                return int(line.split(':')[1].strip())
            except:
                pass
    
    # Count references manually if not found in header
    ref_count = 0
    for line in lines:
        if line.startswith('**[') and ']**' in line:
            ref_count += 1
    return ref_count

Parameters

Name              Type  Default  Kind
markdown_content  -     -        positional_or_keyword

Parameter Details

markdown_content: A string containing markdown-formatted text. It is expected to contain either a header line '**Total References**: N', where N is an integer, or reference entries that each begin a line with '**[reference_id]**'. The header must start at the first character of its line. Multi-line content is split on newline characters.

Return Value

Returns an integer representing the total number of references found in the markdown content. If a '**Total References**:' header is found and successfully parsed, returns that value. Otherwise, returns the count of lines matching the reference pattern '**[...]**'. Returns 0 if no references are found.
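
The return paths described above can be checked with a few edge cases. The snippet below is a minimal sketch: it repeats the documented function so it runs standalone, and the input strings are made up for illustration.

```python
def extract_total_references(markdown_content):
    """Copy of the documented function, repeated so this snippet runs standalone."""
    lines = markdown_content.split('\n')
    for line in lines:
        if line.startswith('**Total References**:'):
            try:
                return int(line.split(':')[1].strip())
            except:
                pass
    ref_count = 0
    for line in lines:
        if line.startswith('**[') and ']**' in line:
            ref_count += 1
    return ref_count


# Header says 5 but only two entries exist: the header value is returned as-is.
header_wins = extract_total_references(
    "**Total References**: 5\n**[1]** First entry.\n**[2]** Second entry."
)

# Header value is not an integer, so parsing fails and entries are counted.
fallback = extract_total_references(
    "**Total References**: three\n**[1]** First entry.\n**[2]** Second entry."
)

# An indented header does not match startswith(), so entries are counted.
indented = extract_total_references(
    "  **Total References**: 5\n**[1]** First entry."
)

print(header_wins, fallback, indented)  # 5 2 1
```

Note that in the first case the function never looks at the entries, so a stale header goes unnoticed.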

Usage Example

# Example 1: Markdown with explicit total
markdown_with_header = '''
**Total References**: 3

**[1]** Smith, J. (2020). Example Paper.
**[2]** Doe, J. (2021). Another Paper.
**[3]** Brown, A. (2022). Third Paper.
'''

total = extract_total_references(markdown_with_header)
print(f"Total references: {total}")  # Output: Total references: 3

# Example 2: Markdown without explicit total (manual count)
markdown_without_header = '''
**[1]** Smith, J. (2020). Example Paper.
**[2]** Doe, J. (2021). Another Paper.
'''

total = extract_total_references(markdown_without_header)
print(f"Total references: {total}")  # Output: Total references: 2

# Example 3: Empty or no references
empty_markdown = "Some text without references"
total = extract_total_references(empty_markdown)
print(f"Total references: {total}")  # Output: Total references: 0

Best Practices

  • Ensure markdown_content is a string; pass an empty string '' instead of None, because calling .split() on None raises AttributeError
  • The function uses a broad exception handler (bare except) which may hide parsing errors; consider logging when the header parsing fails
  • Reference format must match exactly '**[' at line start and ']**' somewhere in the line for manual counting to work
  • The function assumes each reference starts its own line; references that appear mid-line are not counted, and several references on one line count as one
  • If the '**Total References**:' header exists but contains non-numeric values, the function silently falls back to manual counting
  • Consider validating that the header count matches the manual count for data integrity in production use
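
Several of the points above (None input, logging on a failed header parse, comparing the header with the manual count) could be combined in a stricter wrapper. The following is a sketch only, not part of the original module: the name count_references_checked, its tuple return value, and the log messages are assumptions made for illustration.

```python
import logging

logger = logging.getLogger(__name__)

HEADER_PREFIX = "**Total References**:"


def count_references_checked(markdown_content):
    """Hypothetical variant of extract_total_references.

    Returns (total, manual_count). `total` follows the same precedence as
    the original (header first, then manual count); `manual_count` is always
    the number of entry lines, so callers can compare the two.
    """
    if markdown_content is None:
        markdown_content = ""  # tolerate None instead of raising AttributeError
    lines = markdown_content.split("\n")

    header_count = None
    for line in lines:
        if line.startswith(HEADER_PREFIX):
            try:
                # Same parsing step as the original function.
                header_count = int(line.split(":")[1].strip())
                break
            except ValueError:
                # Narrow exception plus a log entry instead of a silent pass.
                logger.warning("Unparsable reference header: %r", line)

    # Same entry test as the original: '**[' at line start, ']**' anywhere after.
    manual_count = sum(
        1 for line in lines if line.startswith("**[") and "]**" in line
    )

    if header_count is None:
        return manual_count, manual_count
    if header_count != manual_count:
        logger.warning(
            "Header reports %d references but %d entries were found",
            header_count,
            manual_count,
        )
    return header_count, manual_count


total, counted = count_references_checked(
    "**Total References**: 5\n**[1]** First entry.\n**[2]** Second entry."
)
print(total, counted)  # 5 2
```

Returning both numbers leaves the decision about a mismatch to the caller; a pipeline that should fail on inconsistent input could raise an exception at that point instead of logging.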

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function parse_references_section 66.7% similar

    Parses a formatted references section string and extracts structured data including reference numbers, sources, and content previews using regular expressions.

    From: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py
  • function extract_warranty_data_improved 47.7% similar

    Parses markdown-formatted warranty documentation to extract structured warranty data including IDs, titles, sections, disclosure text, and reference citations.

    From: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py
  • class ReferenceManager 47.1% similar

    Manages document references for inline citation and bibliography generation in a RAG (Retrieval-Augmented Generation) system.

    From: /tf/active/vicechatdev/fixed_project_victoria_generator.py
  • class ReferenceManager_v1 47.1% similar

    Manages document references for inline citation and bibliography generation, tracking documents and generating formatted citations and bibliographies.

    From: /tf/active/vicechatdev/improved_project_victoria_generator.py
  • function extract_warranty_sections 46.6% similar

    Parses markdown content to extract warranty section headers, returning a list of dictionaries containing section IDs and titles for table of contents generation.

    From: /tf/active/vicechatdev/enhanced_word_converter_fixed.py