
function show_problematic_flocks

Maturity: 42

Prints a report of the most problematic flocks in a pre-aggregated dataset: flocks whose treatment records all have timing issues (100% issue rate), and high-volume flocks with a partial but significant issue rate.

File: /tf/active/vicechatdev/data_quality_dashboard.py
Lines: 267 - 295
Complexity: simple

Purpose

This function provides a diagnostic report for data quality analysis in livestock/poultry management systems. Working from flock-level timing statistics, it highlights two groups: flocks where every treatment has a timing issue (a 100% rate, which usually points to a systematic data entry error) and high-volume flocks (10 or more treatments) where more than 10% but fewer than 100% of treatments are affected. Within each group it lists up to 10 flocks, ordered by treatment count, so data managers can prioritize cleaning effort.

Source Code

def show_problematic_flocks(flocks_issues):
    """Show the most problematic flocks."""
    print("\nMOST PROBLEMATIC FLOCKS")
    print("-" * 40)
    
    # Flocks with 100% timing issues
    perfect_issues = flocks_issues[flocks_issues['TimingIssueRate'] == 1.0]
    print(f"Flocks with 100% timing issues: {len(perfect_issues)}")
    print("(These likely have systematic data entry errors)")
    
    if len(perfect_issues) > 0:
        print("\nTop 10 flocks with 100% issues (by treatment count):")
        top_perfect = perfect_issues.nlargest(10, 'TotalTreatments')
        for _, flock in top_perfect.iterrows():
            print(f"  {flock['FlockCD']}: {flock['TotalTreatments']} treatments, {flock['Type']} type")
    
    # Flocks with partial issues but high volume
    partial_issues = flocks_issues[
        (flocks_issues['TimingIssueRate'] < 1.0) & 
        (flocks_issues['TimingIssueRate'] > 0.1) &
        (flocks_issues['TotalTreatments'] >= 10)
    ]
    
    if len(partial_issues) > 0:
        print("\nHigh-volume flocks with significant timing issues (10+ treatments, >10% issues):")
        top_partial = partial_issues.nlargest(10, 'TotalTreatments')
        for _, flock in top_partial.iterrows():
            rate = flock['TimingIssueRate'] * 100
            print(f"  {flock['FlockCD']}: {rate:.1f}% issues ({flock['TimingIssueCount']}/{flock['TotalTreatments']} treatments)")

Parameters

Name           Type  Default  Kind
flocks_issues  -     -        positional_or_keyword

Parameter Details

flocks_issues: A pandas DataFrame containing flock-level aggregated data with the following expected columns: 'FlockCD' (flock identifier), 'TimingIssueRate' (float between 0-1 representing proportion of treatments with timing issues), 'TotalTreatments' (integer count of total treatments), 'TimingIssueCount' (integer count of treatments with timing issues), and 'Type' (string indicating flock type). The DataFrame should be pre-computed with timing issue statistics for each flock.
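
The source does not show how this DataFrame is built. As a minimal sketch, assuming treatment-level records with one row per treatment and a hypothetical boolean column 'HasTimingIssue' (not named in the original), the expected columns could be derived with a groupby:

```python
import pandas as pd

# Hypothetical treatment-level records: one row per treatment.
# 'HasTimingIssue' is an assumed flag column, not part of the documented function.
treatments = pd.DataFrame({
    "FlockCD": ["F001", "F001", "F002", "F002", "F002", "F002"],
    "Type": ["Broiler", "Broiler", "Layer", "Layer", "Layer", "Layer"],
    "HasTimingIssue": [True, True, True, False, False, False],
})

# Aggregate to one row per flock with the columns the function expects.
flocks_issues = (
    treatments.groupby("FlockCD")
    .agg(
        Type=("Type", "first"),                    # flock type (assumed constant per flock)
        TotalTreatments=("HasTimingIssue", "count"),  # number of treatment rows
        TimingIssueCount=("HasTimingIssue", "sum"),   # True values count as 1
    )
    .reset_index()
)

# Rate as a proportion between 0 and 1, not a percentage.
flocks_issues["TimingIssueRate"] = (
    flocks_issues["TimingIssueCount"] / flocks_issues["TotalTreatments"]
)
```

Deriving the rate from the two counts keeps 'TimingIssueRate' consistent with 'TimingIssueCount' and 'TotalTreatments', which the report prints side by side.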

Return Value

This function returns None. It prints a report to the console containing: (1) the number of flocks with 100% timing issues and, if any exist, the top 10 of them by treatment count with their type; (2) if any exist, the top 10 high-volume flocks (10+ treatments) with significant timing issues (>10% but <100%) by treatment count, with their issue rates and counts.
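
Because the report only goes to standard output, one way to get it as a string is to redirect stdout while the function runs. The sketch below uses the standard library's contextlib.redirect_stdout; capture_printed and demo_report are hypothetical names, and demo_report is only a stand-in for show_problematic_flocks so that the example runs on its own.

```python
import contextlib
import io


def capture_printed(func, *args, **kwargs):
    """Run func and return everything it printed to stdout as one string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


def demo_report(count):
    """Stand-in printer; in practice pass show_problematic_flocks instead."""
    print("MOST PROBLEMATIC FLOCKS")
    print(f"Flocks with 100% timing issues: {count}")


text = capture_printed(demo_report, 2)
```

With the real function, the call would presumably look like capture_printed(show_problematic_flocks, flocks_issues), and the returned text could then be written to a log file.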

Dependencies

  • pandas

Required Imports

import pandas as pd

Usage Example

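import pandas as pd

# Assumes data_quality_dashboard.py is on the import path
from data_quality_dashboard import show_problematic_flocks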

# Create sample flocks_issues DataFrame
flocks_issues = pd.DataFrame({
    'FlockCD': ['F001', 'F002', 'F003', 'F004', 'F005'],
    'TimingIssueRate': [1.0, 1.0, 0.5, 0.14, 0.05],
    'TotalTreatments': [25, 15, 30, 50, 100],
    'TimingIssueCount': [25, 15, 15, 7, 5],
    'Type': ['Broiler', 'Layer', 'Broiler', 'Layer', 'Broiler']
})

# Display problematic flocks report
show_problematic_flocks(flocks_issues)

Best Practices

  • Ensure the input DataFrame contains all required columns before calling this function
  • Pre-compute timing issue statistics (TimingIssueRate, TimingIssueCount) before passing to this function
  • Use this function as part of a larger data quality pipeline to identify data entry problems
  • The function assumes TimingIssueRate is a float between 0 and 1 (not a percentage)
  • Consider redirecting output to a file for logging purposes in production environments
  • This is a display-only function; capture the output or modify the function if you need to programmatically access the results
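
For programmatic access, the same selection logic can be separated from the printing. The following is a sketch of one possible variant, not part of the original module: select_problematic_flocks, REQUIRED_COLUMNS, and the keyword parameters are hypothetical names, with defaults that mirror the thresholds hard-coded in the source.

```python
import pandas as pd

# Columns the documented function reads; the constant name is an assumption.
REQUIRED_COLUMNS = [
    "FlockCD", "TimingIssueRate", "TotalTreatments", "TimingIssueCount", "Type",
]


def select_problematic_flocks(flocks_issues, top_n=10, min_treatments=10, min_rate=0.1):
    """Return (systematic, partial) DataFrames instead of printing them."""
    missing = [c for c in REQUIRED_COLUMNS if c not in flocks_issues.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    rate = flocks_issues["TimingIssueRate"]

    # Flocks where every treatment has a timing issue.
    systematic = flocks_issues[rate == 1.0].nlargest(top_n, "TotalTreatments")

    # High-volume flocks with a partial but significant issue rate.
    partial = flocks_issues[
        (rate < 1.0)
        & (rate > min_rate)
        & (flocks_issues["TotalTreatments"] >= min_treatments)
    ].nlargest(top_n, "TotalTreatments")

    return systematic, partial


sample = pd.DataFrame({
    "FlockCD": ["F001", "F002", "F003", "F004", "F005"],
    "TimingIssueRate": [1.0, 1.0, 0.5, 0.14, 0.05],
    "TotalTreatments": [25, 15, 30, 50, 100],
    "TimingIssueCount": [25, 15, 15, 7, 5],
    "Type": ["Broiler", "Layer", "Broiler", "Layer", "Broiler"],
})

systematic, partial = select_problematic_flocks(sample)
```

Returning the two DataFrames leaves formatting to the caller, and the up-front column check turns a missing column into a clear ValueError rather than a KeyError partway through the report.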

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function analyze_flock_type_patterns 76.1% similar

    Analyzes and prints timing pattern statistics for flock data by categorizing issues that occur before start time and after end time, grouped by flock type.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function quick_clean 75.0% similar

    Cleans flock data by identifying and removing flocks that have treatment records with timing inconsistencies (treatments administered outside the flock's start/end date range).

    From: /tf/active/vicechatdev/quick_cleaner.py
  • function show_critical_errors 74.1% similar

    Displays critical data quality errors in treatment records, focusing on date anomalies including 1900 dates, extreme future dates, and extreme past dates relative to flock lifecycles.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function analyze_temporal_trends 73.1% similar

    Analyzes and prints temporal trends in timing issues for treatments that occur before flock start dates or after flock end dates, breaking down occurrences by year and month.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function create_data_quality_dashboard 72.9% similar

    Creates an interactive command-line dashboard for analyzing data quality issues in treatment timing data, specifically focusing on treatments administered outside of flock lifecycle dates.

    From: /tf/active/vicechatdev/data_quality_dashboard.py