šŸ” Code Extractor

function create_data_quality_dashboard

Maturity: 46

Creates an interactive command-line dashboard for analyzing data quality issues in treatment timing data, focusing on treatments administered outside flock lifecycle dates.

File:
/tf/active/vicechatdev/data_quality_dashboard.py
Lines:
104 - 191
Complexity:
complex

Purpose

This function provides a comprehensive interactive menu system for data quality analysis of poultry treatment timing data. It allows users to select datasets (original, cleaned, or comparison mode), load pre-analyzed timing issue data, and navigate through various analytical views including critical errors, flock type patterns, product analysis, problematic flocks, temporal trends, and corrective action reports. The dashboard is designed to help identify and understand data quality issues where treatments are recorded before flock start dates or after flock end dates.
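The dashboard depends on helper functions defined elsewhere in the same module. A minimal sketch of the interfaces it appears to assume, based on how it calls them (the names come from the source; the bodies and return shapes here are hypothetical):

```python
# Hypothetical stubs for the helpers the dashboard calls.
# The dashboard treats a None return from select_dataset() as "cancel",
# and reads dataset_data['type'] plus, in compare mode, the two flock frames.

def select_dataset():
    """Prompt for a dataset; return 'original', 'cleaned', 'compare', or None."""
    choice = input("Dataset (1=original, 2=cleaned, 3=compare, q=quit): ").strip()
    mapping = {'1': 'original', '2': 'cleaned', '3': 'compare'}
    return mapping.get(choice)  # unknown input -> None, dashboard returns early

def load_analysis_data(dataset_choice):
    """Return a dict with at least a 'type' key; compare mode also carries
    the original and cleaned flock data used by compare_datasets()."""
    data = {'type': dataset_choice}
    if dataset_choice == 'compare':
        data['original_flocks'] = None  # real code would load DataFrames here
        data['cleaned_flocks'] = None
    return data
```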

Source Code

def create_data_quality_dashboard():
    """Create an interactive data quality dashboard."""
    
    print("TREATMENT TIMING DATA QUALITY DASHBOARD")
    print("=" * 50)
    
    # Dataset selection
    dataset_choice = select_dataset()
    if dataset_choice is None:
        return
    
    # Load dataset-specific data
    dataset_data = load_analysis_data(dataset_choice)
    
    # Load timing analysis results
    try:
        before_start = pd.read_csv("/tf/active/timing_analysis_output/treatments_before_start.csv")
        after_end = pd.read_csv("/tf/active/timing_analysis_output/treatments_after_end.csv")
        severe_cases = pd.read_csv("/tf/active/timing_analysis_output/severe_timing_issues.csv")
        flocks_issues = pd.read_csv("/tf/active/timing_analysis_output/flocks_with_timing_issues.csv")
        
        # Convert dates
        before_start['AdministeredDate'] = pd.to_datetime(before_start['AdministeredDate'])
        before_start['StartDate'] = pd.to_datetime(before_start['StartDate'])
        after_end['AdministeredDate'] = pd.to_datetime(after_end['AdministeredDate'])
        after_end['EndDate'] = pd.to_datetime(after_end['EndDate'])
        
    except FileNotFoundError:
        print("Error: Analysis output files not found. Please run the main analysis first.")
        return
    
    # Add dataset context to menu
    dataset_type = dataset_data['type']
    if dataset_type == 'compare':
        print(f"\nšŸ” COMPARISON MODE - Analyzing both original and cleaned datasets")
    else:
        print(f"\nšŸ“Š ANALYZING {dataset_type.upper()} DATASET")
    
    while True:
        print(f"\nDATA QUALITY MENU ({dataset_type} dataset):")
        print("1. Show critical data errors (urgent fixes needed)")
        print("2. Analyze timing patterns by flock type")
        print("3. Product analysis for problematic treatments")
        print("4. Show most problematic flocks")
        print("5. Temporal trend analysis")
        print("6. Generate corrective action report")
        if dataset_type == 'compare':
            print("7. Dataset comparison analysis")
            print("8. Change dataset selection")
            print("9. Exit")
        else:
            print("7. Change dataset selection")
            print("8. Exit")
        
        if dataset_type == 'compare':
            choice_range = "1-9"
            exit_choice = '9'
            change_choice = '8'
        else:
            choice_range = "1-8"
            exit_choice = '8'
            change_choice = '7'
        
        choice = input(f"\nSelect option ({choice_range}): ").strip()
        
        if choice == '1':
            show_critical_errors(before_start, after_end, severe_cases)
        elif choice == '2':
            analyze_flock_type_patterns(before_start, after_end)
        elif choice == '3':
            analyze_problematic_products(severe_cases)
        elif choice == '4':
            show_problematic_flocks(flocks_issues)
        elif choice == '5':
            analyze_temporal_trends(before_start, after_end)
        elif choice == '6':
            generate_action_report(before_start, after_end, severe_cases, flocks_issues)
        elif choice == '7' and dataset_type == 'compare':
            compare_datasets(dataset_data['original_flocks'], dataset_data['cleaned_flocks'])
        elif choice == change_choice:
            print("Returning to dataset selection...")
            create_data_quality_dashboard()  # Restart with new dataset selection
            break
        elif choice == exit_choice:
            print("Exiting dashboard...")
            break
        else:
            print(f"Invalid choice. Please select {choice_range}.")

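Note that on "Change dataset selection" the function calls itself and then breaks, so each dataset switch adds a stack frame. A hypothetical iterative wrapper (not part of the source) avoids that growth by having the body report whether a restart was requested:

```python
# Sketch: drive restarts with a loop instead of recursion.
# run_once stands in for a refactored dashboard body that returns
# True when the user changed datasets and False on exit.

def run_dashboard_loop(run_once):
    """Re-invoke run_once() until it reports an exit, without recursion."""
    while run_once():
        pass

# Example session: the body asks to restart twice, then exits.
calls = iter([True, True, False])
run_dashboard_loop(lambda: next(calls))
```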
Return Value

Returns None. The function operates through side effects: it prints output to the console and calls other analysis functions. It exits when the user selects the exit option or when required data files are not found.

Dependencies

  • pandas
  • matplotlib
  • seaborn
  • datetime
  • warnings
  • os

Required Imports

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
import os

Usage Example

# Ensure all required analysis output files exist and helper functions are defined
# Then simply call the dashboard function:

create_data_quality_dashboard()

# The function will:
# 1. Prompt user to select a dataset (original/cleaned/compare)
# 2. Load the timing analysis results from CSV files
# 3. Display an interactive menu with options 1-8 (or 1-9 in comparison mode)
# 4. Execute selected analysis functions based on user input
# 5. Continue until user selects exit option
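Because the dashboard blocks on input(), it cannot be exercised in automated runs as-is. One way to smoke-test an input()-driven menu like this is to script the responses with unittest.mock; the tiny menu below is a stand-in, not the real function:

```python
# Sketch: scripting answers to an input()-driven menu for testing.
from unittest import mock

def tiny_menu():
    """A minimal stand-in for the dashboard's menu loop."""
    while True:
        choice = input("Select option (1-2): ").strip()
        if choice == '1':
            print("Running analysis...")
        elif choice == '2':
            print("Exiting dashboard...")
            break
        else:
            print("Invalid choice. Please select 1-2.")

# Each call to input() consumes the next scripted answer in order:
# run option 1, hit the invalid-choice branch, then exit.
with mock.patch('builtins.input', side_effect=['1', 'x', '2']):
    tiny_menu()
```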

Best Practices

  • Ensure all required CSV files exist in '/tf/active/timing_analysis_output/' directory before calling this function
  • Run the main timing analysis first to generate the required input files
  • All helper functions (select_dataset, load_analysis_data, show_critical_errors, etc.) must be defined in the same scope
  • The function restarts itself recursively when the dataset selection changes - be aware of potential stack depth growth in very long sessions
  • Date columns in CSV files should be in a format parseable by pd.to_datetime()
  • The function expects specific column names in the CSV files: 'AdministeredDate', 'StartDate', 'EndDate'
  • User input is stripped but not extensively validated - ensure proper error handling in called functions
  • The dashboard operates in a blocking loop until user exits - not suitable for automated/batch processing
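The first two practices can be enforced up front. A hedged pre-flight check (the file names match those read in the source; the helper itself is an assumption, not part of the module):

```python
# Sketch: verify the analysis outputs exist before launching the dashboard,
# so users get one clear message instead of a mid-run FileNotFoundError.
import os

REQUIRED_FILES = [
    "treatments_before_start.csv",
    "treatments_after_end.csv",
    "severe_timing_issues.csv",
    "flocks_with_timing_issues.csv",
]

def missing_analysis_files(output_dir="/tf/active/timing_analysis_output"):
    """Return the required CSVs that are absent from output_dir."""
    return [name for name in REQUIRED_FILES
            if not os.path.exists(os.path.join(output_dir, name))]

missing = missing_analysis_files()
if missing:
    print("Run the main timing analysis first; missing:", ", ".join(missing))
```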

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function create_data_quality_dashboard_v1 93.7% similar

    Creates an interactive data quality dashboard for analyzing treatment timing issues in poultry flock management data by loading and processing CSV files containing timing anomalies.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function analyze_temporal_trends 74.9% similar

    Analyzes and prints temporal trends in timing issues for treatments that occur before flock start dates or after flock end dates, breaking down occurrences by year and month.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function quick_clean 73.7% similar

    Cleans flock data by identifying and removing flocks that have treatment records with timing inconsistencies (treatments administered outside the flock's start/end date range).

    From: /tf/active/vicechatdev/quick_cleaner.py
  • function show_problematic_flocks 72.9% similar

    Analyzes and displays problematic flocks from a dataset by identifying those with systematic timing issues in their treatment records, categorizing them by severity and volume.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function show_critical_errors 72.1% similar

    Displays critical data quality errors in treatment records, focusing on date anomalies including 1900 dates, extreme future dates, and extreme past dates relative to flock lifecycles.

    From: /tf/active/vicechatdev/data_quality_dashboard.py