
function test_markdown_link_parsing

Maturity: 42

A test function that exercises markdown link parsing: it extracts the link text and URL from a markdown link in the format produced by the Quill editor, then percent-encodes the URL, which contains special characters.

File:
/tf/active/vicechatdev/test_complex_hyperlink.py
Lines:
50 - 80
Complexity:
simple

Purpose

This function is a print-based smoke test for markdown links whose URLs contain special characters (&, commas, '+' signs, percent-encoded spaces, and a URL fragment). It splits the markdown link syntax with a regular expression, extracts the link text and URL, and percent-encodes everything after the domain while leaving the delimiters /, ?, &, =, :, # and existing % escapes unchanged. It makes no assertions, so the result has to be checked by reading the printed output.
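The splitting step relies on how re.split treats capturing groups. The following is a minimal sketch of that behavior; the sample string and the name LINK_PATTERN are hypothetical and do not come from the test file.

```python
import re

# Same pattern as in the test: group 1 is the link text, group 2 is the URL.
LINK_PATTERN = re.compile(r'\[([^\]]+)\]\(([^)]+)\)')

# Hypothetical input, not taken from the test file.
sample = "See [report v2](https://example.com/files/report v2.xlsx) for details"

# Because the pattern has two capturing groups, split() interleaves the
# captured text and URL with the surrounding, unmatched text.
parts = LINK_PATTERN.split(sample)
print(parts)
# ['See ', 'report v2', 'https://example.com/files/report v2.xlsx', ' for details']

text, url = parts[1], parts[2]

# With no link in the input, split() returns the whole string as one element,
# which is why the test checks len(link_parts) >= 3 before indexing.
no_link = LINK_PATTERN.split("no link here")
print(no_link)
# ['no link here']
```

When the input is exactly one link, as in the test, the first and last elements are empty strings and the list has four entries.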

Source Code

def test_markdown_link_parsing():
    """Test markdown link parsing with complex URLs"""
    print("\nTesting markdown link parsing...")
    
    # Test the exact format that would come from Quill editor
    markdown_text = "[3.5.1 Cost model for WBPK022&K024,K034_20240624.xlsx](https://filecloud.vicebio.com/ui/core/index.html?filter=3.5.1+Cost+model+for+WBPK022&K024,K034_20240624.xlsx#expl-tabl./SHARED/vicebio_shares/Wuxi/3%20WO-CO%20&%20invoice%20plan/3.5%20Cost%20Model/)"
    
    print(f"Input markdown: {markdown_text}")
    
    import re
    # Test URL extraction
    link_parts = re.split(r'\[([^\]]+)\]\(([^)]+)\)', markdown_text)
    print(f"Parsed parts: {link_parts}")
    
    if len(link_parts) >= 3:
        text = link_parts[1]
        url = link_parts[2] 
        print(f"Extracted text: '{text}'")
        print(f"Extracted URL: '{url}'")
        
        # Test URL encoding
        import urllib.parse
        if '://' in url:
            scheme_and_domain, path_part = url.split('://', 1)
            if '/' in path_part:
                domain, path = path_part.split('/', 1)
                encoded_path = urllib.parse.quote(path, safe='/?&=:#%')
                clean_url = f"{scheme_and_domain}://{domain}/{encoded_path}"
                print(f"Cleaned URL: '{clean_url}'")
    
    print("✅ URL parsing test completed")

Return Value

This function does not return any value (implicitly returns None). It prints test results and status messages to stdout, including the input markdown, parsed parts, extracted text and URL, and the cleaned/encoded URL.
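Because nothing is returned, a test runner cannot check the outcome automatically. One possible restructuring, sketched below under assumptions, moves the logic into a helper that returns its results so a test can assert on them. The helper name parse_markdown_link and the sample URL are hypothetical, and the sketch uses re.search in place of the original re.split.

```python
import re
import urllib.parse


def parse_markdown_link(markdown_text):
    """Hypothetical helper: return (text, cleaned_url), or None if no link is found."""
    match = re.search(r'\[([^\]]+)\]\(([^)]+)\)', markdown_text)
    if match is None:
        return None
    text, url = match.group(1), match.group(2)
    if '://' in url:
        scheme, rest = url.split('://', 1)
        if '/' in rest:
            domain, path = rest.split('/', 1)
            # Same safe set as the original test function.
            encoded = urllib.parse.quote(path, safe='/?&=:#%')
            url = f"{scheme}://{domain}/{encoded}"
    return text, url


def test_parse_markdown_link():
    # Hypothetical sample link: the space and comma in the path get encoded.
    result = parse_markdown_link(
        "[a b](https://example.com/files/a b,c.xlsx?x=1&y=2#frag)"
    )
    assert result == ("a b", "https://example.com/files/a%20b%2Cc.xlsx?x=1&y=2#frag")
    # Input without a link is reported explicitly instead of being skipped.
    assert parse_markdown_link("plain text") is None


test_parse_markdown_link()
```

With this shape, a failure surfaces as an AssertionError rather than as output that someone has to read.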

Dependencies

  • re
  • urllib.parse

Required Imports

import re
import urllib.parse

Usage Example

import re
import urllib.parse

def test_markdown_link_parsing():
    """Test markdown link parsing with complex URLs"""
    print("\nTesting markdown link parsing...")
    
    markdown_text = "[3.5.1 Cost model for WBPK022&K024,K034_20240624.xlsx](https://filecloud.vicebio.com/ui/core/index.html?filter=3.5.1+Cost+model+for+WBPK022&K024,K034_20240624.xlsx#expl-tabl./SHARED/vicebio_shares/Wuxi/3%20WO-CO%20&%20invoice%20plan/3.5%20Cost%20Model/)"
    
    print(f"Input markdown: {markdown_text}")
    
    link_parts = re.split(r'\[([^\]]+)\]\(([^)]+)\)', markdown_text)
    print(f"Parsed parts: {link_parts}")
    
    if len(link_parts) >= 3:
        text = link_parts[1]
        url = link_parts[2] 
        print(f"Extracted text: '{text}'")
        print(f"Extracted URL: '{url}'")
        
        if '://' in url:
            scheme_and_domain, path_part = url.split('://', 1)
            if '/' in path_part:
                domain, path = path_part.split('/', 1)
                encoded_path = urllib.parse.quote(path, safe='/?&=:#%')
                clean_url = f"{scheme_and_domain}://{domain}/{encoded_path}"
                print(f"Cleaned URL: '{clean_url}'")
    
    print("✅ URL parsing test completed")

# Run the test
test_markdown_link_parsing()

Best Practices

  • This is a test function meant for validation purposes, not production use
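  • The regex pattern r'\[([^\]]+)\]\(([^)]+)\)' assumes well-formed markdown links: it does not handle nested brackets or escaped characters in the link text, and it ends the URL at the first ')' it finds, so URLs containing parentheses are truncated
  • The safe set passed to urllib.parse.quote ('/?&=:#%') leaves URL delimiters and existing percent-escapes untouched, but '+' and ',' are not in it, so they are encoded as %2B and %2C; adjust the set if those characters must survive in the query string
  • URLs without a '://' separator, or with no '/' after the domain, are skipped silently: no cleaned URL is printed, yet the completion message still appears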
  • For production code, consider using a dedicated markdown parsing library instead of regex
  • The function prints directly to stdout; consider using logging or returning results for better testability
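
As an illustration of adjusting the safe characters, the sketch below encodes the path, query, and fragment separately using urllib.parse.urlsplit and urlunsplit, which are part of the standard library. The helper name encode_url_components, the per-component safe sets, and the sample URL are assumptions chosen for this example, not something taken from the test file.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_components(url):
    """Hypothetical helper: percent-encode path, query and fragment separately."""
    parts = urlsplit(url)
    # Each component gets its own safe set (an assumption for this sketch).
    # '%' stays safe so existing escapes such as %20 are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&+,%")
    fragment = quote(parts.fragment, safe="/&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


# Hypothetical URL with spaces, '+' and ',' in different components.
url = "https://example.com/my files/index.html?filter=a+b,c&x=1#tab./dir one/"

component_wise = encode_url_components(url)
print(component_wise)
# https://example.com/my%20files/index.html?filter=a+b,c&x=1#tab./dir%20one/

# For comparison, the single quote() call used in the test encodes '+' and ','.
single_pass = quote(url.split("://", 1)[1].split("/", 1)[1], safe="/?&=:#%")
print(single_pass)
# my%20files/index.html?filter=a%2Bb%2Cc&x=1#tab./dir%20one/
```

Keeping '%' in the safe set avoids double-encoding, at the cost of leaving a literal '%' that is not part of an escape sequence unencoded.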

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function test_complex_url_hyperlink 64.7% similar

    A test function that validates the creation of Word documents with complex FileCloud URLs containing special characters, query parameters, and URL fragments as clickable hyperlinks.

    From: /tf/active/vicechatdev/test_complex_hyperlink.py
  • function test_fixes 55.2% similar

    A comprehensive test function that validates email template rendering and CDocs application link presence in a document management system's email notification templates.

    From: /tf/active/vicechatdev/test_comprehensive_fixes.py
  • function test_document_extractor 53.7% similar

    A test function that validates the DocumentExtractor class by testing file type support detection, text extraction from various document formats, and error handling.

    From: /tf/active/vicechatdev/leexi/test_document_extractor.py
  • function test_multiple_files 52.8% similar

    A test function that validates the extraction of text content from multiple document files using a DocumentExtractor instance, displaying extraction results and simulating combined content processing.

    From: /tf/active/vicechatdev/leexi/test_multiple_files.py
  • function test_mixed_previous_reports 51.9% similar

    A test function that validates the DocumentExtractor's ability to extract text content from multiple file formats (text and markdown) and combine them into a unified previous reports summary.

    From: /tf/active/vicechatdev/leexi/test_enhanced_reports.py