function show_problematic_flocks
Prints a report of the most problematic flocks in a flock-level dataset: those whose treatment records all have timing issues, and high-volume flocks with a significant share of timing issues.
/tf/active/vicechatdev/data_quality_dashboard.py
267 - 295
simple
Purpose
This function prints a diagnostic report for data quality analysis in livestock/poultry management systems. Working from pre-computed, flock-level timing statistics, it lists two groups: flocks where every treatment has a timing issue (a 100% rate, which usually points to a systematic data entry error) and high-volume flocks (10 or more treatments) where more than 10% but fewer than 100% of treatments are affected. Each group is limited to the 10 flocks with the most treatments, which helps data managers prioritize data cleaning efforts.
Source Code
def show_problematic_flocks(flocks_issues):
    """Show the most problematic flocks."""
    print("\nMOST PROBLEMATIC FLOCKS")
    print("-" * 40)

    # Flocks with 100% timing issues
    perfect_issues = flocks_issues[flocks_issues['TimingIssueRate'] == 1.0]
    print(f"Flocks with 100% timing issues: {len(perfect_issues)}")
    print("(These likely have systematic data entry errors)")

    if len(perfect_issues) > 0:
        print("\nTop 10 flocks with 100% issues (by treatment count):")
        top_perfect = perfect_issues.nlargest(10, 'TotalTreatments')
        for _, flock in top_perfect.iterrows():
            print(f" {flock['FlockCD']}: {flock['TotalTreatments']} treatments, {flock['Type']} type")

    # Flocks with partial issues but high volume
    partial_issues = flocks_issues[
        (flocks_issues['TimingIssueRate'] < 1.0) &
        (flocks_issues['TimingIssueRate'] > 0.1) &
        (flocks_issues['TotalTreatments'] >= 10)
    ]

    if len(partial_issues) > 0:
        print(f"\nHigh-volume flocks with significant timing issues (10+ treatments, >10% issues):")
        top_partial = partial_issues.nlargest(10, 'TotalTreatments')
        for _, flock in top_partial.iterrows():
            rate = flock['TimingIssueRate'] * 100
            print(f" {flock['FlockCD']}: {rate:.1f}% issues ({flock['TimingIssueCount']}/{flock['TotalTreatments']} treatments)")
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| flocks_issues | - | - | positional_or_keyword |
Parameter Details
flocks_issues: A pandas DataFrame containing flock-level aggregated data with the following expected columns: 'FlockCD' (flock identifier), 'TimingIssueRate' (float between 0-1 representing proportion of treatments with timing issues), 'TotalTreatments' (integer count of total treatments), 'TimingIssueCount' (integer count of treatments with timing issues), and 'Type' (string indicating flock type). The DataFrame should be pre-computed with timing issue statistics for each flock.
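The source does not show how this DataFrame is built. As a minimal sketch, assuming treatment-level records that carry `FlockCD`, `Type`, and a boolean `HasTimingIssue` flag per row (the flag name, the input layout, and the helper `build_flocks_issues` are all hypothetical), the expected columns could be derived with a pandas groupby:

```python
import pandas as pd


def build_flocks_issues(treatments: pd.DataFrame) -> pd.DataFrame:
    """Aggregate treatment-level rows into one row per flock.

    Assumed (hypothetical) input columns: 'FlockCD', 'Type', and a
    boolean 'HasTimingIssue' flag for each treatment record.
    """
    grouped = treatments.groupby("FlockCD", as_index=False).agg(
        Type=("Type", "first"),                     # one type per flock assumed
        TotalTreatments=("HasTimingIssue", "count"),  # non-null flags per flock
        TimingIssueCount=("HasTimingIssue", "sum"),   # True counts as 1
    )
    # Fraction in [0, 1], not a percentage
    grouped["TimingIssueRate"] = (
        grouped["TimingIssueCount"] / grouped["TotalTreatments"]
    )
    return grouped


treatments = pd.DataFrame({
    "FlockCD": ["F001", "F001", "F002", "F002", "F002", "F002"],
    "Type": ["Broiler", "Broiler", "Layer", "Layer", "Layer", "Layer"],
    "HasTimingIssue": [True, True, True, False, False, False],
})
flocks_issues = build_flocks_issues(treatments)
```

Computing the rate as a ratio of two counts means a flock whose treatments are all flagged gets exactly 1.0, which is what the function's `== 1.0` comparison relies on. A rate that has been rounded or stored as a percentage would not match that filter.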
Return Value
This function returns None. It produces console output displaying: (1) count and details of flocks with 100% timing issues, showing top 10 by treatment count, (2) high-volume flocks (10+ treatments) with significant timing issues (>10% but <100%), showing top 10 by treatment count with their issue rates and counts.
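Because nothing is returned, callers that need the selected rows have to repeat the filtering themselves. The sketch below applies the same thresholds as the source code but returns the two groups as DataFrames; `select_problematic_flocks` and its `top_n` parameter are hypothetical names, not part of the original module.

```python
import pandas as pd


def select_problematic_flocks(flocks_issues: pd.DataFrame, top_n: int = 10):
    """Return (systematic, partial) DataFrames using the report's filters."""
    rate = flocks_issues["TimingIssueRate"]

    # Flocks where every treatment has a timing issue
    systematic = flocks_issues[rate == 1.0].nlargest(top_n, "TotalTreatments")

    # High-volume flocks with more than 10% but fewer than 100% issues
    partial = flocks_issues[
        (rate < 1.0) & (rate > 0.1) & (flocks_issues["TotalTreatments"] >= 10)
    ].nlargest(top_n, "TotalTreatments")

    return systematic, partial


flocks_issues = pd.DataFrame({
    "FlockCD": ["F001", "F002", "F003", "F004"],
    "TimingIssueRate": [1.0, 0.5, 0.05, 0.5],
    "TotalTreatments": [25, 30, 100, 4],
    "TimingIssueCount": [25, 15, 5, 2],
    "Type": ["Broiler", "Layer", "Broiler", "Layer"],
})
systematic, partial = select_problematic_flocks(flocks_issues)
```

In this sample, F003 is excluded because its rate is at or below 10%, and F004 because it has fewer than 10 treatments.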
Dependencies
pandas
Required Imports
import pandas as pd
Usage Example
import pandas as pd

# Create sample flocks_issues DataFrame
# (TimingIssueRate equals TimingIssueCount / TotalTreatments for every row)
flocks_issues = pd.DataFrame({
    'FlockCD': ['F001', 'F002', 'F003', 'F004', 'F005'],
    'TimingIssueRate': [1.0, 1.0, 0.5, 0.14, 0.05],
    'TotalTreatments': [25, 15, 30, 50, 100],
    'TimingIssueCount': [25, 15, 15, 7, 5],
    'Type': ['Broiler', 'Layer', 'Broiler', 'Layer', 'Broiler']
})

# Display problematic flocks report
show_problematic_flocks(flocks_issues)
Best Practices
- Ensure the input DataFrame contains all required columns before calling this function
- Pre-compute timing issue statistics (TimingIssueRate, TimingIssueCount) before passing to this function
- Use this function as part of a larger data quality pipeline to identify data entry problems
- The function assumes TimingIssueRate is a float between 0 and 1 (not a percentage)
- Consider redirecting output to a file for logging purposes in production environments
- This is a display-only function; capture the output or modify the function if you need to programmatically access the results
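Two of the practices above, checking the input and capturing the printed report, can be sketched with the standard library and pandas. The helpers `check_flocks_issues` and `capture_report` are hypothetical and not part of the original module; the stand-in report function is only there to keep the example self-contained.

```python
import contextlib
import io

import pandas as pd

REQUIRED_COLUMNS = {
    "FlockCD", "TimingIssueRate", "TotalTreatments", "TimingIssueCount", "Type",
}


def check_flocks_issues(df: pd.DataFrame) -> None:
    """Raise ValueError if columns are missing or rates fall outside [0, 1]."""
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    # between() is inclusive on both ends; NaN rates also fail this check
    if not df["TimingIssueRate"].between(0, 1).all():
        raise ValueError("TimingIssueRate must be a fraction between 0 and 1")


def capture_report(report_func, *args, **kwargs) -> str:
    """Run a print-based report function and return what it wrote to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        report_func(*args, **kwargs)
    return buffer.getvalue()


# Stand-in report; in practice pass show_problematic_flocks instead.
def _demo_report(df):
    print(f"{len(df)} flocks")


demo = pd.DataFrame({
    "FlockCD": ["F001"],
    "TimingIssueRate": [1.0],
    "TotalTreatments": [25],
    "TimingIssueCount": [25],
    "Type": ["Broiler"],
})
check_flocks_issues(demo)
text = capture_report(_demo_report, demo)
```

The captured string can then be written to a log file or attached to a data quality summary, with the function itself left unchanged.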
Similar Components
AI-powered semantic similarity - components with related functionality:
- function analyze_flock_type_patterns (76.1% similar)
- function quick_clean (75.0% similar)
- function show_critical_errors (74.1% similar)
- function analyze_temporal_trends (73.1% similar)
- function create_data_quality_dashboard (72.9% similar)