
function show_critical_errors

Maturity: 45

Displays critical data quality errors in treatment records, focusing on date anomalies including 1900 dates, extreme future dates, and extreme past dates relative to flock lifecycles.

File: /tf/active/vicechatdev/data_quality_dashboard.py
Lines: 193 - 223
Complexity: moderate

Purpose

This function analyzes treatment records and reports urgent date-related errors that require immediate attention. It reports three categories: (1) treatments in before_start whose 'AdministeredDate' falls in the year 1900 (labeled in the output as placeholder 1900-01-01 dates; the check is on the year only, not the exact date), (2) treatments recorded more than 365 days after the flock end date, and (3) treatments recorded more than 1000 days before the flock start date. For category 1 it lists each affected flock with its treatment count; for categories 2 and 3 it lists the five most extreme cases.

Source Code

def show_critical_errors(before_start, after_end, severe_cases):
    """Show critical data errors that need immediate attention."""
    print("\nCRITICAL DATA ERRORS (URGENT)")
    print("-" * 40)
    
    # 1900 date errors
    errors_1900 = before_start[before_start['AdministeredDate'].dt.year == 1900]
    print(f"1. Treatments with 1900-01-01 dates: {len(errors_1900)}")
    if len(errors_1900) > 0:
        print("   Affected flocks:")
        for flock in errors_1900['FlockCD'].unique():
            count = len(errors_1900[errors_1900['FlockCD'] == flock])
            print(f"     {flock}: {count} treatments")
    
    # Extreme future dates
    extreme_future = after_end[after_end['DaysAfterEnd'] > 365]
    print(f"\n2. Treatments >1 year after flock end: {len(extreme_future)}")
    if len(extreme_future) > 0:
        print("   Most extreme cases:")
        top_extreme = extreme_future.nlargest(5, 'DaysAfterEnd')
        for _, row in top_extreme.iterrows():
            print(f"     {row['FlockCD']}: {row['DaysAfterEnd']:.0f} days after end")
    
    # Extreme past dates
    extreme_past = before_start[before_start['DaysBeforeStart'] > 1000]
    print(f"\n3. Treatments >1000 days before flock start: {len(extreme_past)}")
    if len(extreme_past) > 0:
        print("   Most extreme cases:")
        top_extreme = extreme_past.nlargest(5, 'DaysBeforeStart')
        for _, row in top_extreme.iterrows():
            print(f"     {row['FlockCD']}: {row['DaysBeforeStart']:.0f} days before start")

Parameters

Name          Type  Default  Kind
before_start  -     -        positional_or_keyword
after_end     -     -        positional_or_keyword
severe_cases  -     -        positional_or_keyword

Parameter Details

before_start: A pandas DataFrame containing treatment records that occurred before their associated flock start dates. Must include columns: 'AdministeredDate' (datetime), 'FlockCD' (flock identifier), and 'DaysBeforeStart' (numeric indicating days before flock start).

after_end: A pandas DataFrame containing treatment records that occurred after their associated flock end dates. Must include columns: 'FlockCD' (flock identifier) and 'DaysAfterEnd' (numeric indicating days after flock end).

severe_cases: A pandas DataFrame containing severe data quality cases. This parameter is accepted but not currently used in the function implementation, suggesting it may be reserved for future functionality or backward compatibility.
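The column requirements above can be checked before the call. The following is a minimal sketch, not part of the source file: check_inputs is a hypothetical helper name, and it only covers the columns and dtype listed in the parameter details.

```python
import pandas as pd

# Column requirements as described in the parameter details above.
REQUIRED_BEFORE_START = ("AdministeredDate", "FlockCD", "DaysBeforeStart")
REQUIRED_AFTER_END = ("FlockCD", "DaysAfterEnd")


def check_inputs(before_start: pd.DataFrame, after_end: pd.DataFrame) -> list:
    """Hypothetical helper: return a list of problems (empty if inputs look usable)."""
    problems = []
    for name, frame, required in (
        ("before_start", before_start, REQUIRED_BEFORE_START),
        ("after_end", after_end, REQUIRED_AFTER_END),
    ):
        missing = [col for col in required if col not in frame.columns]
        if missing:
            problems.append(f"{name} is missing columns: {missing}")
    # The .dt accessor used by the function only works on datetime-like columns.
    if "AdministeredDate" in before_start.columns:
        if not pd.api.types.is_datetime64_any_dtype(before_start["AdministeredDate"]):
            problems.append("before_start['AdministeredDate'] is not a datetime column")
    return problems
```

An empty list means the inputs match the documented shape; otherwise each entry names the DataFrame and the specific problem, which can be logged or raised as the caller prefers.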

Return Value

This function does not return any value (returns None implicitly). It outputs formatted text directly to stdout displaying three categories of critical errors with counts, affected flocks, and detailed examples of the most extreme cases.
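Because the report goes to stdout and nothing is returned, callers that need the text (for a log file or a test) have to capture it. The sketch below uses the standard library's contextlib.redirect_stdout; capture_stdout is a hypothetical helper, and _demo_report is a stand-in for show_critical_errors that exists only to keep the example self-contained.

```python
import contextlib
import io


def capture_stdout(func, *args, **kwargs) -> str:
    """Hypothetical helper: run func and return whatever it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


def _demo_report(count):
    # Stand-in for show_critical_errors, used only for illustration.
    print(f"1. Treatments with 1900-01-01 dates: {count}")


text = capture_stdout(_demo_report, 3)
```

With the real function, the equivalent call would be capture_stdout(show_critical_errors, before_start, after_end, severe_cases).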

Dependencies

  • pandas

Required Imports

import pandas as pd

Usage Example

import pandas as pd

# Create sample data
before_start = pd.DataFrame({
    'FlockCD': ['F001', 'F001', 'F002'],
    'AdministeredDate': pd.to_datetime(['1900-01-01', '2020-01-01', '2019-06-01']),
    'DaysBeforeStart': [8000, 1500, 1200]
})

after_end = pd.DataFrame({
    'FlockCD': ['F003', 'F004'],
    'DaysAfterEnd': [400, 500]
})

severe_cases = pd.DataFrame()  # Not used but required parameter

# Display critical errors
show_critical_errors(before_start, after_end, severe_cases)

Best Practices

  • Ensure 'AdministeredDate' column is converted to pandas datetime type before calling this function
  • Pre-filter DataFrames to only include records with date anomalies to improve performance
  • The 'severe_cases' parameter is currently unused; consider removing it or implementing its functionality
  • Consider redirecting output to a log file for production environments instead of printing to stdout
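  • Validate that required columns exist in input DataFrames before calling: a missing column raises KeyError, and a non-datetime 'AdministeredDate' column raises AttributeError when the .dt accessor is used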
  • The function assumes DaysBeforeStart and DaysAfterEnd are already calculated; ensure these columns exist
  • For large datasets, consider adding pagination or limiting the number of displayed results
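Since the function expects DaysBeforeStart and DaysAfterEnd to exist already, one way to derive them is sketched below. This is an assumption-laden illustration: add_day_offsets is a hypothetical helper, and the 'FlockStartDate' and 'FlockEndDate' column names are invented for the example and do not come from the source file.

```python
import pandas as pd


def add_day_offsets(treatments: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add DaysBeforeStart and DaysAfterEnd columns.

    Assumes datetime columns 'AdministeredDate', 'FlockStartDate' and
    'FlockEndDate'; the last two names are illustrative only.
    """
    out = treatments.copy()
    # Positive when the treatment precedes the flock start.
    out["DaysBeforeStart"] = (out["FlockStartDate"] - out["AdministeredDate"]).dt.days
    # Positive when the treatment follows the flock end.
    out["DaysAfterEnd"] = (out["AdministeredDate"] - out["FlockEndDate"]).dt.days
    return out


treatments = pd.DataFrame({
    "FlockCD": ["F001", "F002"],
    "AdministeredDate": pd.to_datetime(["2019-12-22", "2021-03-01"]),
    "FlockStartDate": pd.to_datetime(["2020-01-01", "2020-01-01"]),
    "FlockEndDate": pd.to_datetime(["2020-03-01", "2020-03-01"]),
})

with_offsets = add_day_offsets(treatments)
# Pre-filter to anomalous rows only, as the function's inputs expect.
before_start = with_offsets[with_offsets["DaysBeforeStart"] > 0]
after_end = with_offsets[with_offsets["DaysAfterEnd"] > 0]
```

The two filtered frames then carry the columns that show_critical_errors reads, and the pre-filtering keeps rows without timing anomalies out of the report.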

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function analyze_temporal_trends 74.7% similar

    Analyzes and prints temporal trends in timing issues for treatments that occur before flock start dates or after flock end dates, breaking down occurrences by year and month.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function generate_action_report 74.2% similar

    Generates a comprehensive corrective action report for data quality issues in treatment records, categorizing actions by urgency and providing impact assessment.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function show_problematic_flocks 74.1% similar

    Analyzes and displays problematic flocks from a dataset by identifying those with systematic timing issues in their treatment records, categorizing them by severity and volume.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function quick_clean 72.8% similar

    Cleans flock data by identifying and removing flocks that have treatment records with timing inconsistencies (treatments administered outside the flock's start/end date range).

    From: /tf/active/vicechatdev/quick_cleaner.py
  • function create_data_quality_dashboard 72.1% similar

    Creates an interactive command-line dashboard for analyzing data quality issues in treatment timing data, specifically focusing on treatments administered outside of flock lifecycle dates.

    From: /tf/active/vicechatdev/data_quality_dashboard.py