function generate_action_report

Maturity: 44

Generates a comprehensive corrective action report for data quality issues in treatment records, categorizing actions by urgency and providing impact assessment.

File: /tf/active/vicechatdev/data_quality_dashboard.py
Lines: 323 - 373
Complexity: moderate

Purpose

This function analyzes data quality issues in veterinary treatment records and prints a formatted report to the console. The report groups corrective actions into immediate, short-term, and long-term tiers, covering invalid dates (particularly 1900-01-01 errors), flocks in which every treatment has a timing issue, and treatments recorded more than a year after the flock end date. Only the counts, the list of affected flocks, and the error rate are computed from the inputs; the remaining recommendations and the impact assessment are fixed text.

Source Code

def generate_action_report(before_start, after_end, severe_cases, flocks_issues):
    """Generate a corrective action report."""
    print("\nCORRECTIVE ACTION REPORT")
    print("=" * 40)
    
    # Immediate actions
    print("IMMEDIATE ACTIONS (Within 1 week):")
    
    # 1900 date fixes
    errors_1900 = before_start[before_start['AdministeredDate'].dt.year == 1900]
    if len(errors_1900) > 0:
        print(f"1. Fix {len(errors_1900)} treatments with 1900-01-01 dates")
        print("   Action: Update AdministeredDate to correct values")
        print("   Affected flocks:")
        for flock in errors_1900['FlockCD'].unique():
            print(f"     - {flock}")
    
    # Perfect timing issue flocks
    perfect_issues = flocks_issues[flocks_issues['TimingIssueRate'] == 1.0]
    print(f"\n2. Review {len(perfect_issues)} flocks with 100% timing issues")
    print("   Action: Check if flock dates or treatment dates are incorrect")
    
    # Short-term actions
    print("\nSHORT-TERM ACTIONS (Within 1 month):")
    print("1. Implement data validation rules")
    print("   - Treatment dates must be within flock lifespan ± 7 days")
    print("   - Flag dates before 2000 or after current date + 1 year")
    print("   - Require confirmation for treatments outside normal range")
    
    extreme_future = after_end[after_end['DaysAfterEnd'] > 365]
    if len(extreme_future) > 0:
        print(f"\n2. Review {len(extreme_future)} treatments >1 year after flock end")
        print("   Action: Verify if these are data entry errors")
    
    # Long-term actions
    print("\nLONG-TERM ACTIONS (Within 3 months):")
    print("1. Implement automated data quality monitoring")
    print("2. Create monthly data quality reports")
    print("3. Train staff on proper date entry procedures")
    print("4. Review and update data entry interfaces")
    
    # Cost/benefit analysis
    total_issues = len(before_start) + len(after_end)
    total_treatments = 247640  # From previous analysis
    error_rate = (total_issues / total_treatments) * 100
    
    print("\nIMPACT ASSESSMENT:")
    print(f"- Current error rate: {error_rate:.2f}% of all treatments")
    print("- Data quality improvement potential: High")
    print("- Estimated effort: Medium (primarily data validation setup)")
    print("- Business impact: Improved analysis accuracy and regulatory compliance")
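
The short-term validation rules that the report prints (dates within the flock lifespan ± 7 days, nothing before 2000 or more than a year ahead) are only recommendations in the source; the function does not enforce them. A minimal sketch of how they might be checked follows. The helper name flag_suspect_dates and the columns FlockStart and FlockEnd are assumptions made for illustration and do not appear in the original module.

```python
import pandas as pd

def flag_suspect_dates(treatments, today=None, tolerance_days=7):
    """Hypothetical helper: flag rows that break the rules printed in the report.

    Assumes datetime columns 'AdministeredDate', 'FlockStart' and 'FlockEnd'
    (the last two names are illustrative, not taken from the source).
    """
    today = pd.Timestamp(today) if today is not None else pd.Timestamp.today().normalize()
    tolerance = pd.Timedelta(days=tolerance_days)
    date = treatments["AdministeredDate"]

    flagged = treatments.copy()
    # Rule 1: the treatment date must fall within the flock lifespan +/- tolerance.
    flagged["OutsideLifespan"] = (
        (date < treatments["FlockStart"] - tolerance)
        | (date > treatments["FlockEnd"] + tolerance)
    )
    # Rule 2: dates before 2000 or more than one year past today are implausible.
    flagged["ImplausibleDate"] = (
        (date < pd.Timestamp("2000-01-01"))
        | (date > today + pd.DateOffset(years=1))
    )
    return flagged

sample = pd.DataFrame({
    "AdministeredDate": pd.to_datetime(["1900-01-01", "2020-01-15", "2020-03-20"]),
    "FlockStart": pd.to_datetime(["2020-01-01"] * 3),
    "FlockEnd": pd.to_datetime(["2020-03-01"] * 3),
})
flagged = flag_suspect_dates(sample, today="2020-06-01")
print(flagged[["AdministeredDate", "OutsideLifespan", "ImplausibleDate"]])
```

Passing today explicitly keeps the check reproducible; the third rule in the report (requiring confirmation) is a user-interface concern and is left out of this sketch.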

Parameters

Name            Type   Default   Kind
before_start    -      -         positional_or_keyword
after_end       -      -         positional_or_keyword
severe_cases    -      -         positional_or_keyword
flocks_issues   -      -         positional_or_keyword

Parameter Details

before_start: A pandas DataFrame containing treatment records that occurred before the flock start date. Must have columns 'AdministeredDate' (datetime) and 'FlockCD' (flock identifier). Used to identify and report on pre-start timing issues.

after_end: A pandas DataFrame containing treatment records that occurred after the flock end date. Must have a 'DaysAfterEnd' column (numeric) indicating how many days after flock end the treatment occurred. Used to identify post-end timing issues and extreme future dates.

severe_cases: A pandas DataFrame containing severe data quality cases. This parameter is accepted but not currently used in the function implementation.

flocks_issues: A pandas DataFrame containing flock-level summary statistics. Must have a 'TimingIssueRate' column (float between 0 and 1) giving the proportion of each flock's treatments that have timing issues. Used to identify flocks whose rate is exactly 1.0 (100%).
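
Because the function indexes these columns directly, a missing column or a non-datetime 'AdministeredDate' would raise partway through the printed report. The sketch below shows one way the requirements above could be checked up front. The helper check_report_inputs is hypothetical and not part of the original module.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype

def check_report_inputs(before_start, after_end, flocks_issues):
    """Hypothetical pre-flight check; returns a list of problem descriptions."""
    problems = []
    required = {
        "before_start": (before_start, ["AdministeredDate", "FlockCD"]),
        "after_end": (after_end, ["DaysAfterEnd"]),
        "flocks_issues": (flocks_issues, ["TimingIssueRate"]),
    }
    # Column presence, as listed under Parameter Details.
    for name, (frame, columns) in required.items():
        for column in columns:
            if column not in frame.columns:
                problems.append(f"{name} is missing column '{column}'")

    # Dtype checks for the columns the function computes on.
    if "AdministeredDate" in before_start.columns and not is_datetime64_any_dtype(
        before_start["AdministeredDate"]
    ):
        problems.append("before_start['AdministeredDate'] is not a datetime column")
    if "DaysAfterEnd" in after_end.columns and not is_numeric_dtype(
        after_end["DaysAfterEnd"]
    ):
        problems.append("after_end['DaysAfterEnd'] is not numeric")
    return problems

# String dates (not parsed with pd.to_datetime) are reported as a problem.
unparsed = pd.DataFrame({"AdministeredDate": ["1900-01-01"], "FlockCD": ["FLOCK001"]})
print(check_report_inputs(unparsed, pd.DataFrame({"DaysAfterEnd": [10]}),
                          pd.DataFrame({"TimingIssueRate": [1.0]})))
```

Returning a list rather than raising lets the caller decide whether to abort or to report every problem at once.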

Return Value

This function returns None. It produces output by printing a formatted report directly to the console (stdout). The report includes sections for immediate actions, short-term actions, long-term actions, and impact assessment.
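
Since the report only goes to stdout, a caller that needs the text (for a log file or a test) has to capture it. A minimal sketch using the standard library's contextlib.redirect_stdout is shown here; capture_report is a hypothetical wrapper, and demo_report is a stand-in for generate_action_report so the snippet runs on its own.

```python
import io
from contextlib import redirect_stdout

def capture_report(report_func, *args, **kwargs):
    """Run a printing function and return everything it wrote to stdout."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        report_func(*args, **kwargs)
    return buffer.getvalue()

def demo_report(title):
    """Stand-in for generate_action_report; prints a header in the same style."""
    print(f"\n{title}")
    print("=" * 40)

text = capture_report(demo_report, "CORRECTIVE ACTION REPORT")
print(text)
```

With the real function, the call would look like capture_report(generate_action_report, before_start, after_end, severe_cases, flocks_issues).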

Dependencies

  • pandas

Required Imports

import pandas as pd

Usage Example

import pandas as pd

# Assumes data_quality_dashboard.py is on the import path
from data_quality_dashboard import generate_action_report

# Create sample data
before_start = pd.DataFrame({
    'AdministeredDate': pd.to_datetime(['1900-01-01', '2020-01-15', '1900-01-01']),
    'FlockCD': ['FLOCK001', 'FLOCK002', 'FLOCK003']
})

after_end = pd.DataFrame({
    'DaysAfterEnd': [10, 400, 500],
    'FlockCD': ['FLOCK004', 'FLOCK005', 'FLOCK006']
})

severe_cases = pd.DataFrame()  # Not used but required

flocks_issues = pd.DataFrame({
    'FlockCD': ['FLOCK001', 'FLOCK002', 'FLOCK003'],
    'TimingIssueRate': [1.0, 0.5, 1.0]
})

# Generate the report
generate_action_report(before_start, after_end, severe_cases, flocks_issues)

# Output will be printed to console with formatted sections

Best Practices

  • Ensure all input DataFrames have the required columns with correct data types before calling this function
  • The 'AdministeredDate' column in before_start must be datetime type for year extraction to work
  • The hardcoded total_treatments value (247,640) should be parameterized for reusability across different datasets
  • Consider redirecting output to a file or returning a string instead of printing directly for better testability
  • The severe_cases parameter is unused and could be removed or implemented in future versions
  • Validate that DaysAfterEnd values are numeric to avoid errors in the extreme_future calculation
  • This function is designed for console output; consider creating a version that returns structured data for programmatic use
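
Several of the points above (parameterizing the treatment total, dropping the unused argument, returning data instead of printing) could be combined in a companion function. The sketch below is one possible shape, not the module's actual design; summarize_actions and its dictionary keys are hypothetical names, while the filters mirror those in the source code.

```python
import pandas as pd

def summarize_actions(before_start, after_end, flocks_issues, total_treatments):
    """Hypothetical variant that returns the report's figures instead of printing."""
    if total_treatments <= 0:
        raise ValueError("total_treatments must be positive")

    # Same filters as generate_action_report.
    errors_1900 = before_start[before_start["AdministeredDate"].dt.year == 1900]
    perfect_issues = flocks_issues[flocks_issues["TimingIssueRate"] == 1.0]
    extreme_future = after_end[after_end["DaysAfterEnd"] > 365]

    total_issues = len(before_start) + len(after_end)
    return {
        "errors_1900": len(errors_1900),
        "flocks_with_1900_dates": sorted(set(errors_1900["FlockCD"])),
        "flocks_all_timing_issues": len(perfect_issues),
        "treatments_over_1y_after_end": len(extreme_future),
        "error_rate_pct": total_issues / total_treatments * 100,
    }

before_start = pd.DataFrame({
    "AdministeredDate": pd.to_datetime(["1900-01-01", "2020-01-15", "1900-01-01"]),
    "FlockCD": ["FLOCK001", "FLOCK002", "FLOCK003"],
})
after_end = pd.DataFrame({"DaysAfterEnd": [10, 400, 500]})
flocks_issues = pd.DataFrame({"TimingIssueRate": [1.0, 0.5, 1.0]})

# total_treatments is supplied by the caller rather than hardcoded.
summary = summarize_actions(before_start, after_end, flocks_issues, total_treatments=600)
print(summary)
```

A printing wrapper could then format this dictionary, which keeps the calculations testable without capturing console output.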

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function show_critical_errors 74.2% similar

    Displays critical data quality errors in treatment records, focusing on date anomalies including 1900 dates, extreme future dates, and extreme past dates relative to flock lifecycles.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function create_data_quality_dashboard 70.8% similar

    Creates an interactive command-line dashboard for analyzing data quality issues in treatment timing data, specifically focusing on treatments administered outside of flock lifecycle dates.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function create_data_quality_dashboard_v1 68.4% similar

    Creates an interactive data quality dashboard for analyzing treatment timing issues in poultry flock management data by loading and processing CSV files containing timing anomalies.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function show_problematic_flocks 67.8% similar

    Analyzes and displays problematic flocks from a dataset by identifying those with systematic timing issues in their treatment records, categorizing them by severity and volume.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function analyze_temporal_trends 66.6% similar

    Analyzes and prints temporal trends in timing issues for treatments that occur before flock start dates or after flock end dates, breaking down occurrences by year and month.

    From: /tf/active/vicechatdev/data_quality_dashboard.py