function show_critical_errors
Displays critical data quality errors in treatment records, focusing on date anomalies including 1900 dates, extreme future dates, and extreme past dates relative to flock lifecycles.
/tf/active/vicechatdev/data_quality_dashboard.py
193 - 223
moderate
Purpose
This function analyzes treatment records and prints the date-related errors that need immediate attention. It reports three categories: (1) treatments in before_start whose 'AdministeredDate' falls in the year 1900 (typically the placeholder 1900-01-01), (2) treatments recorded more than 365 days after the flock end date, and (3) treatments recorded more than 1000 days before the flock start date. For category 1 it lists the number of treatments per affected flock; for categories 2 and 3 it lists the five most extreme cases.
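The three categories correspond to three row filters. The following is a minimal sketch of those filters that returns counts instead of printing them; count_critical_errors and the dictionary keys are hypothetical names chosen for illustration and are not part of the dashboard module.

```python
import pandas as pd


def count_critical_errors(before_start: pd.DataFrame, after_end: pd.DataFrame) -> dict:
    """Hypothetical helper: return the three category counts instead of printing."""
    # Category 1: administered date falls in the year 1900
    errors_1900 = before_start[before_start["AdministeredDate"].dt.year == 1900]
    # Category 2: more than 365 days after the flock end date
    extreme_future = after_end[after_end["DaysAfterEnd"] > 365]
    # Category 3: more than 1000 days before the flock start date
    extreme_past = before_start[before_start["DaysBeforeStart"] > 1000]
    return {
        "dates_in_1900": len(errors_1900),
        "over_365_days_after_end": len(extreme_future),
        "over_1000_days_before_start": len(extreme_past),
    }


# Illustrative data only
before_start = pd.DataFrame({
    "FlockCD": ["F001", "F001", "F002"],
    "AdministeredDate": pd.to_datetime(["1900-01-01", "2020-01-01", "2019-06-01"]),
    "DaysBeforeStart": [8000, 1500, 900],
})
after_end = pd.DataFrame({"FlockCD": ["F003", "F004"], "DaysAfterEnd": [400, 120]})

counts = count_critical_errors(before_start, after_end)
print(counts)
```

Note that a single row can fall into both category 1 and category 3, since both filters read from before_start.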
Source Code
def show_critical_errors(before_start, after_end, severe_cases):
    """Show critical data errors that need immediate attention."""
    print("\nCRITICAL DATA ERRORS (URGENT)")
    print("-" * 40)

    # 1900 date errors
    errors_1900 = before_start[before_start['AdministeredDate'].dt.year == 1900]
    print(f"1. Treatments with 1900-01-01 dates: {len(errors_1900)}")
    if len(errors_1900) > 0:
        print(" Affected flocks:")
        for flock in errors_1900['FlockCD'].unique():
            count = len(errors_1900[errors_1900['FlockCD'] == flock])
            print(f" {flock}: {count} treatments")

    # Extreme future dates
    extreme_future = after_end[after_end['DaysAfterEnd'] > 365]
    print(f"\n2. Treatments >1 year after flock end: {len(extreme_future)}")
    if len(extreme_future) > 0:
        print(" Most extreme cases:")
        top_extreme = extreme_future.nlargest(5, 'DaysAfterEnd')
        for _, row in top_extreme.iterrows():
            print(f" {row['FlockCD']}: {row['DaysAfterEnd']:.0f} days after end")

    # Extreme past dates
    extreme_past = before_start[before_start['DaysBeforeStart'] > 1000]
    print(f"\n3. Treatments >1000 days before flock start: {len(extreme_past)}")
    if len(extreme_past) > 0:
        print(" Most extreme cases:")
        top_extreme = extreme_past.nlargest(5, 'DaysBeforeStart')
        for _, row in top_extreme.iterrows():
            print(f" {row['FlockCD']}: {row['DaysBeforeStart']:.0f} days before start")
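The per-flock loop in the first section re-filters the DataFrame once per flock. One possible alternative, shown here only as a sketch and not as the module's implementation, is to count all flocks in a single grouped pass. Passing sort=False keeps the flocks in order of first appearance, which matches the order produced by unique(). The sample data is illustrative.

```python
import pandas as pd

# Illustrative rows that already passed the year-1900 filter
errors_1900 = pd.DataFrame({
    "FlockCD": ["F002", "F001", "F002"],
    "AdministeredDate": pd.to_datetime(["1900-01-01"] * 3),
})

# One pass over the data: number of rows per flock, in first-appearance order
per_flock = errors_1900.groupby("FlockCD", sort=False).size()

for flock, count in per_flock.items():
    print(f" {flock}: {count} treatments")
```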
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| before_start | - | - | positional_or_keyword |
| after_end | - | - | positional_or_keyword |
| severe_cases | - | - | positional_or_keyword |
Parameter Details
before_start: A pandas DataFrame containing treatment records that occurred before their associated flock start dates. Must include columns: 'AdministeredDate' (datetime), 'FlockCD' (flock identifier), and 'DaysBeforeStart' (numeric indicating days before flock start).
after_end: A pandas DataFrame containing treatment records that occurred after their associated flock end dates. Must include columns: 'FlockCD' (flock identifier) and 'DaysAfterEnd' (numeric indicating days after flock end).
severe_cases: A pandas DataFrame containing severe data quality cases. The function body never reads this parameter, but because it has no default value an argument must still be supplied; an empty DataFrame is sufficient.
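The documentation does not show how before_start and after_end are produced. The sketch below illustrates one way they could be derived, assuming a treatments table and a flocks table joined on 'FlockCD'. The flocks table and its 'StartDate' and 'EndDate' columns are assumptions made for this example; only 'FlockCD', 'AdministeredDate', 'DaysBeforeStart', and 'DaysAfterEnd' come from the function itself.

```python
import pandas as pd

# Illustrative inputs; StartDate and EndDate are assumed column names
treatments = pd.DataFrame({
    "FlockCD": ["F001", "F001", "F002"],
    "AdministeredDate": pd.to_datetime(["1900-01-01", "2021-03-10", "2023-01-15"]),
})
flocks = pd.DataFrame({
    "FlockCD": ["F001", "F002"],
    "StartDate": pd.to_datetime(["2021-03-01", "2021-06-01"]),
    "EndDate": pd.to_datetime(["2021-09-01", "2021-12-01"]),
})

# Attach each flock's lifecycle dates to its treatments
merged = treatments.merge(flocks, on="FlockCD", how="inner")

# Treatments dated before the flock started, with the gap in whole days
before_start = merged[merged["AdministeredDate"] < merged["StartDate"]].copy()
before_start["DaysBeforeStart"] = (
    before_start["StartDate"] - before_start["AdministeredDate"]
).dt.days

# Treatments dated after the flock ended, with the gap in whole days
after_end = merged[merged["AdministeredDate"] > merged["EndDate"]].copy()
after_end["DaysAfterEnd"] = (
    after_end["AdministeredDate"] - after_end["EndDate"]
).dt.days

print(before_start[["FlockCD", "DaysBeforeStart"]])
print(after_end[["FlockCD", "DaysAfterEnd"]])
```

Frames built this way carry the columns that show_critical_errors reads, and 'AdministeredDate' is already a datetime column.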
Return Value
This function does not return any value (returns None implicitly). It outputs formatted text directly to stdout displaying three categories of critical errors with counts, affected flocks, and detailed examples of the most extreme cases.
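Because the report goes to stdout only, a caller that needs the text (for a log file or a test assertion) has to capture it. A minimal sketch using the standard library's contextlib.redirect_stdout follows; capture_output is a hypothetical helper, and demo_report is a stand-in for show_critical_errors so that the example runs on its own.

```python
import io
from contextlib import redirect_stdout


def capture_output(func, *args, **kwargs) -> str:
    """Hypothetical helper: call func and return everything it printed to stdout."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


def demo_report(n_errors):
    # Stand-in for show_critical_errors, used only to keep this sketch self-contained
    print("\nCRITICAL DATA ERRORS (URGENT)")
    print("-" * 40)
    print(f"1. Treatments with 1900-01-01 dates: {n_errors}")


text = capture_output(demo_report, 3)
print(text)
```

With the real function, the call would take the form capture_output(show_critical_errors, before_start, after_end, severe_cases).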
Dependencies
pandas
Required Imports
import pandas as pd
Usage Example
import pandas as pd

# Create sample data
before_start = pd.DataFrame({
    'FlockCD': ['F001', 'F001', 'F002'],
    'AdministeredDate': pd.to_datetime(['1900-01-01', '2020-01-01', '2019-06-01']),
    'DaysBeforeStart': [8000, 1500, 1200]
})
after_end = pd.DataFrame({
    'FlockCD': ['F003', 'F004'],
    'DaysAfterEnd': [400, 500]
})
severe_cases = pd.DataFrame()  # Not used but required parameter

# Display critical errors
show_critical_errors(before_start, after_end, severe_cases)
Best Practices
- Ensure 'AdministeredDate' column is converted to pandas datetime type before calling this function
- Pre-filter DataFrames to only include records with date anomalies to improve performance
- The 'severe_cases' parameter is currently unused; consider removing it or implementing its functionality
- Consider redirecting output to a log file for production environments instead of printing to stdout
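- Validate that required columns exist in input DataFrames before calling: a missing column raises KeyError, and a non-datetime 'AdministeredDate' column raises AttributeError when the .dt accessor is used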
- The function assumes DaysBeforeStart and DaysAfterEnd are already calculated; ensure these columns exist
- For large datasets, consider adding pagination or limiting the number of displayed results
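The validation advice above can be sketched as a pre-flight check that runs before show_critical_errors. validate_inputs is a hypothetical helper, not part of the module; the required column lists are taken from the Parameter Details section, and is_datetime64_any_dtype is the pandas type check used for the date column.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def validate_inputs(before_start: pd.DataFrame, after_end: pd.DataFrame) -> list:
    """Hypothetical pre-flight check: return a list of problems (empty if none)."""
    problems = []
    required = {
        "before_start": (before_start, ["AdministeredDate", "FlockCD", "DaysBeforeStart"]),
        "after_end": (after_end, ["FlockCD", "DaysAfterEnd"]),
    }
    # Report every missing column per frame
    for name, (frame, columns) in required.items():
        missing = [col for col in columns if col not in frame.columns]
        if missing:
            problems.append(f"{name} is missing columns: {missing}")
    # The .dt accessor only works on datetime columns
    if "AdministeredDate" in before_start.columns and not is_datetime64_any_dtype(
        before_start["AdministeredDate"]
    ):
        problems.append("before_start['AdministeredDate'] is not a datetime column")
    return problems


# Illustrative bad inputs: dates stored as strings, and a missing column
bad_before = pd.DataFrame({
    "FlockCD": ["F001"],
    "AdministeredDate": ["1900-01-01"],
    "DaysBeforeStart": [8000],
})
bad_after = pd.DataFrame({"FlockCD": ["F003"]})

problems = validate_inputs(bad_before, bad_after)
for problem in problems:
    print(problem)
```

Returning a list rather than raising on the first problem lets the caller report every issue at once; raising an exception instead would be an equally reasonable design.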
Similar Components
AI-powered semantic similarity - components with related functionality:
- function analyze_temporal_trends 74.7% similar
- function generate_action_report 74.2% similar
- function show_problematic_flocks 74.1% similar
- function quick_clean 72.8% similar
- function create_data_quality_dashboard 72.1% similar