function generate_action_report
Generates a comprehensive corrective action report for data quality issues in treatment records, categorizing actions by urgency and providing impact assessment.
/tf/active/vicechatdev/data_quality_dashboard.py
323 - 373
moderate
Purpose
This function analyzes data quality issues in veterinary treatment records and prints a formatted console report. It groups corrective actions into immediate (within 1 week), short-term (within 1 month), and long-term (within 3 months) steps. These cover timing issues, invalid dates (particularly 1900-01-01 errors), and treatments recorded outside flock lifespans. The report lists the flocks affected by 1900-dated records, counts the flocks with a 100% timing issue rate and the treatments recorded more than a year after flock end, and closes with an overall error rate and a fixed impact assessment to guide data quality improvement efforts.
Source Code
def generate_action_report(before_start, after_end, severe_cases, flocks_issues):
    """Generate a corrective action report."""
    print("\nCORRECTIVE ACTION REPORT")
    print("=" * 40)

    # Immediate actions
    print("IMMEDIATE ACTIONS (Within 1 week):")

    # 1900 date fixes
    errors_1900 = before_start[before_start['AdministeredDate'].dt.year == 1900]
    if len(errors_1900) > 0:
        print(f"1. Fix {len(errors_1900)} treatments with 1900-01-01 dates")
        print(" Action: Update AdministeredDate to correct values")
        print(" Affected flocks:")
        for flock in errors_1900['FlockCD'].unique():
            print(f" - {flock}")

    # Perfect timing issue flocks
    perfect_issues = flocks_issues[flocks_issues['TimingIssueRate'] == 1.0]
    print(f"\n2. Review {len(perfect_issues)} flocks with 100% timing issues")
    print(" Action: Check if flock dates or treatment dates are incorrect")

    # Short-term actions
    print(f"\nSHORT-TERM ACTIONS (Within 1 month):")
    print("1. Implement data validation rules")
    print(" - Treatment dates must be within flock lifespan ± 7 days")
    print(" - Flag dates before 2000 or after current date + 1 year")
    print(" - Require confirmation for treatments outside normal range")

    extreme_future = after_end[after_end['DaysAfterEnd'] > 365]
    if len(extreme_future) > 0:
        print(f"\n2. Review {len(extreme_future)} treatments >1 year after flock end")
        print(" Action: Verify if these are data entry errors")

    # Long-term actions
    print(f"\nLONG-TERM ACTIONS (Within 3 months):")
    print("1. Implement automated data quality monitoring")
    print("2. Create monthly data quality reports")
    print("3. Train staff on proper date entry procedures")
    print("4. Review and update data entry interfaces")

    # Cost/benefit analysis
    total_issues = len(before_start) + len(after_end)
    total_treatments = 247640  # From previous analysis
    error_rate = (total_issues / total_treatments) * 100
    print(f"\nIMPACT ASSESSMENT:")
    print(f"- Current error rate: {error_rate:.2f}% of all treatments")
    print(f"- Data quality improvement potential: High")
    print(f"- Estimated effort: Medium (primarily data validation setup)")
    print(f"- Business impact: Improved analysis accuracy and regulatory compliance")
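The validation rules that the report recommends under its short-term actions could be sketched as a row-level check. This is only an illustration and is not part of data_quality_dashboard.py: the helper name `flag_suspect_dates` and the columns `FlockStartDate` and `FlockEndDate` are assumptions, since the source shows only `AdministeredDate`.

```python
import pandas as pd


def flag_suspect_dates(treatments, today=None, tolerance_days=7):
    """Return a copy of `treatments` with two boolean flag columns added.

    Hypothetical helper: assumes datetime columns 'AdministeredDate',
    'FlockStartDate' and 'FlockEndDate' (the last two are assumed names).
    """
    today = pd.Timestamp.today().normalize() if today is None else pd.Timestamp(today)
    tolerance = pd.Timedelta(days=tolerance_days)
    out = treatments.copy()

    # Rule 1: the treatment date must fall within the flock lifespan +/- tolerance.
    out['OutsideLifespan'] = (
        (out['AdministeredDate'] < out['FlockStartDate'] - tolerance)
        | (out['AdministeredDate'] > out['FlockEndDate'] + tolerance)
    )

    # Rule 2: flag dates before 2000 or more than one year past `today`.
    out['ImplausibleDate'] = (
        (out['AdministeredDate'] < pd.Timestamp('2000-01-01'))
        | (out['AdministeredDate'] > today + pd.DateOffset(years=1))
    )
    return out


sample = pd.DataFrame({
    'AdministeredDate': pd.to_datetime(['1900-01-01', '2020-01-15', '2020-03-20']),
    'FlockStartDate': pd.to_datetime(['2020-01-01', '2020-01-01', '2020-01-01']),
    'FlockEndDate': pd.to_datetime(['2020-03-01', '2020-03-01', '2020-03-01']),
})
flagged = flag_suspect_dates(sample, today='2021-01-01')
print(flagged[['OutsideLifespan', 'ImplausibleDate']])
```

Passing `today` explicitly keeps the check reproducible in tests. The third rule in the report (requiring confirmation for out-of-range treatments) is a data entry workflow concern and is not modeled here.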
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| before_start | - | - | positional_or_keyword |
| after_end | - | - | positional_or_keyword |
| severe_cases | - | - | positional_or_keyword |
| flocks_issues | - | - | positional_or_keyword |
Parameter Details
before_start: A pandas DataFrame of treatment records administered before the flock start date. Must have the columns 'AdministeredDate' (datetime) and 'FlockCD' (flock identifier). Used to find records dated in 1900 and to list the affected flocks; its row count also feeds the error rate.
after_end: A pandas DataFrame of treatment records administered after the flock end date. Must have a 'DaysAfterEnd' column (numeric) giving the number of days between flock end and the treatment. Used to count treatments more than 365 days past flock end; its row count also feeds the error rate.
severe_cases: A pandas DataFrame containing severe data quality cases. This parameter is accepted but not currently used in the function implementation.
flocks_issues: A pandas DataFrame of flock-level summary statistics. Must have a 'TimingIssueRate' column (float between 0 and 1) giving the proportion of a flock's treatments with timing issues. The function selects flocks whose rate equals exactly 1.0, that is, a 100% timing issue rate.
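The column requirements above could be checked before calling the function. The sketch below is one possible pre-flight check; the helper name `check_report_inputs` and the `REQUIRED_COLUMNS` mapping are hypothetical and do not exist in the source module.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype

# Required columns per argument, taken from the parameter details.
REQUIRED_COLUMNS = {
    'before_start': ['AdministeredDate', 'FlockCD'],
    'after_end': ['DaysAfterEnd'],
    'flocks_issues': ['TimingIssueRate'],
}


def check_report_inputs(before_start, after_end, flocks_issues):
    """Hypothetical pre-flight check; returns a list of problem descriptions."""
    frames = {
        'before_start': before_start,
        'after_end': after_end,
        'flocks_issues': flocks_issues,
    }
    problems = []
    for name, frame in frames.items():
        for column in REQUIRED_COLUMNS[name]:
            if column not in frame.columns:
                problems.append(f"{name} is missing column '{column}'")

    # The .dt.year accessor only works on datetime-typed columns.
    if ('AdministeredDate' in before_start.columns
            and not is_datetime64_any_dtype(before_start['AdministeredDate'])):
        problems.append("before_start['AdministeredDate'] is not datetime")

    # The > 365 comparison expects numeric day counts.
    if ('DaysAfterEnd' in after_end.columns
            and not is_numeric_dtype(after_end['DaysAfterEnd'])):
        problems.append("after_end['DaysAfterEnd'] is not numeric")
    return problems


# Dates left as strings and a missing rate column both get reported.
bad_before = pd.DataFrame({'AdministeredDate': ['1900-01-01'], 'FlockCD': ['FLOCK001']})
ok_after = pd.DataFrame({'DaysAfterEnd': [10, 400]})
no_rate = pd.DataFrame({'FlockCD': ['FLOCK001']})
issues = check_report_inputs(bad_before, ok_after, no_rate)
print(issues)
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue at once; raising an exception would be an equally reasonable design.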
Return Value
This function returns None. It produces output by printing a formatted report directly to the console (stdout). The report includes sections for immediate actions, short-term actions, long-term actions, and impact assessment.
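Because the report goes to stdout rather than being returned, a caller who needs the text (for a test or a log file) can capture it with the standard library. The sketch below uses `contextlib.redirect_stdout`; `capture_output` and `demo_report` are hypothetical names, and `demo_report` is only a stand-in that prints the same two header lines as the real function.

```python
import io
from contextlib import redirect_stdout


def capture_output(func, *args, **kwargs):
    """Call `func` and return everything it printed to stdout as a string."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


def demo_report():
    # Stand-in for generate_action_report; prints the same two header lines.
    print("\nCORRECTIVE ACTION REPORT")
    print("=" * 40)


text = capture_output(demo_report)
print(text.splitlines()[1])  # CORRECTIVE ACTION REPORT
```

With the real function, the call would be `capture_output(generate_action_report, before_start, after_end, severe_cases, flocks_issues)`.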
Dependencies
pandas
Required Imports
import pandas as pd
Usage Example
import pandas as pd

# Assumption: data_quality_dashboard.py is on the import path
from data_quality_dashboard import generate_action_report

# Create sample data
before_start = pd.DataFrame({
    'AdministeredDate': pd.to_datetime(['1900-01-01', '2020-01-15', '1900-01-01']),
    'FlockCD': ['FLOCK001', 'FLOCK002', 'FLOCK003']
})
after_end = pd.DataFrame({
    'DaysAfterEnd': [10, 400, 500],
    'FlockCD': ['FLOCK004', 'FLOCK005', 'FLOCK006']
})
severe_cases = pd.DataFrame()  # Not used but required
flocks_issues = pd.DataFrame({
    'FlockCD': ['FLOCK001', 'FLOCK002', 'FLOCK003'],
    'TimingIssueRate': [1.0, 0.5, 1.0]
})

# Generate the report
generate_action_report(before_start, after_end, severe_cases, flocks_issues)
# Output will be printed to console with formatted sections
Best Practices
- Ensure all input DataFrames have the required columns with correct data types before calling this function
- The 'AdministeredDate' column in before_start must be datetime type for year extraction to work
- The hardcoded total_treatments value (247,640) should be parameterized for reusability across different datasets
- Consider redirecting output to a file or returning a string instead of printing directly for better testability
- The severe_cases parameter is unused and could be removed or implemented in future versions
- Validate that DaysAfterEnd values are numeric to avoid errors in the extreme_future calculation
- This function is designed for console output; consider creating a version that returns structured data for programmatic use
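Two of the suggestions above, parameterizing the treatment total and returning structured data, could be combined along the following lines. This is a sketch under assumptions, not the module's code: `summarize_impact` is a hypothetical helper that reproduces only the error-rate arithmetic from the impact assessment.

```python
def summarize_impact(before_start, after_end, total_treatments):
    """Hypothetical helper: return the impact figures instead of printing them."""
    if total_treatments <= 0:
        raise ValueError("total_treatments must be positive")
    total_issues = len(before_start) + len(after_end)
    return {
        'total_issues': total_issues,
        'total_treatments': total_treatments,
        'error_rate_pct': total_issues / total_treatments * 100,
    }


# Plain lists stand in for the DataFrames here; only len() is used.
summary = summarize_impact(
    before_start=[1, 2, 3],
    after_end=[4, 5, 6],
    total_treatments=247640,
)
print(f"{summary['error_rate_pct']:.4f}%")  # 0.0024%
```

A printing wrapper could then format this dictionary, which keeps the calculation testable without capturing console output.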
Similar Components
AI-powered semantic similarity - components with related functionality:
- function show_critical_errors (74.2% similar)
- function create_data_quality_dashboard (70.8% similar)
- function create_data_quality_dashboard_v1 (68.4% similar)
- function show_problematic_flocks (67.8% similar)
- function analyze_temporal_trends (66.6% similar)