
function load_analysis_data

Maturity: 42

Loads CSV dataset(s) into pandas DataFrames based on dataset configuration, supporting both single dataset and comparison (two-dataset) modes.

File: /tf/active/vicechatdev/data_quality_dashboard.py
Lines: 56 - 74
Complexity: simple

Purpose

This function is the data loader for analysis workflows that work with either a single dataset or a pair of datasets to compare (original vs. cleaned). It chooses the loading path from the dataset type and returns a dictionary containing the loaded DataFrame(s) together with that type, so an analysis pipeline can load data according to user selection or configuration.

Source Code

def load_analysis_data(dataset_info):
    """Load analysis data based on dataset selection."""
    if dataset_info['type'] == 'compare':
        print("Loading data for comparison analysis...")
        # Load both datasets for comparison
        original_flocks = pd.read_csv(dataset_info['original'])
        cleaned_flocks = pd.read_csv(dataset_info['cleaned'])
        return {
            'original_flocks': original_flocks,
            'cleaned_flocks': cleaned_flocks,
            'type': 'compare'
        }
    else:
        print(f"Loading {dataset_info['type']} dataset...")
        flocks = pd.read_csv(dataset_info['path'])
        return {
            'flocks': flocks,
            'type': dataset_info['type']
        }

Parameters

Name          Type  Default  Kind
dataset_info  -     -        positional_or_keyword

Parameter Details

dataset_info: A dictionary containing the dataset configuration. It must include a 'type' key. If type is 'compare', it must also have 'original' and 'cleaned' keys holding the paths of the two CSV files. Any other type value is treated as a single dataset and requires a 'path' key holding the path of one CSV file. Examples: {'type': 'compare', 'original': 'data/original.csv', 'cleaned': 'data/cleaned.csv'} or {'type': 'single', 'path': 'data/dataset.csv'}
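
The key requirements above can be checked before any file is read. The following is a minimal sketch only; validate_dataset_info is a hypothetical helper that is not part of data_quality_dashboard.py, and raising ValueError is an assumption about how a caller might want failures reported.

```python
def validate_dataset_info(dataset_info):
    """Hypothetical helper: raise ValueError if a required key is missing."""
    if 'type' not in dataset_info:
        raise ValueError("dataset_info must include a 'type' key")
    # 'compare' needs two paths; every other type needs a single 'path'
    if dataset_info['type'] == 'compare':
        required = ('original', 'cleaned')
    else:
        required = ('path',)
    missing = [key for key in required if key not in dataset_info]
    if missing:
        raise ValueError(f"dataset_info is missing required key(s): {missing}")
    return dataset_info
```

Because the helper returns its argument unchanged, it could be chained as load_analysis_data(validate_dataset_info(config)).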

Return Value

Returns a dictionary whose structure depends on the dataset type. For the 'compare' type: {'original_flocks': DataFrame, 'cleaned_flocks': DataFrame, 'type': 'compare'}. For any other type: {'flocks': DataFrame, 'type': <dataset_type>}. The DataFrames hold the loaded CSV data, and 'type' repeats the value of dataset_info['type'] so callers can tell which structure they received.
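
Since the two result shapes use different keys, downstream code has to branch on 'type'. One way to keep that branch in a single place is sketched below; frames_from_result is a hypothetical helper, and the 'original' and 'cleaned' labels it produces are illustrative choices, not names used by the source module.

```python
def frames_from_result(result):
    """Hypothetical helper: map a label to each DataFrame in a loader result."""
    if result['type'] == 'compare':
        # Comparison mode carries two DataFrames under fixed keys
        return {
            'original': result['original_flocks'],
            'cleaned': result['cleaned_flocks'],
        }
    # Single-dataset mode carries one DataFrame, labeled by its type
    return {result['type']: result['flocks']}
```

A caller can then iterate over frames_from_result(result).items() without caring which mode was selected.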

Dependencies

  • pandas

Required Imports

import pandas as pd

Usage Example

import pandas as pd

# Assumes load_analysis_data is already defined or imported in this scope

# Example 1: Load comparison datasets
dataset_config = {
    'type': 'compare',
    'original': 'data/original_flocks.csv',
    'cleaned': 'data/cleaned_flocks.csv'
}
result = load_analysis_data(dataset_config)
original_df = result['original_flocks']
cleaned_df = result['cleaned_flocks']
print(f"Loaded {len(original_df)} original and {len(cleaned_df)} cleaned records")

# Example 2: Load single dataset
dataset_config = {
    'type': 'production',
    'path': 'data/production_data.csv'
}
result = load_analysis_data(dataset_config)
flocks_df = result['flocks']
print(f"Loaded {len(flocks_df)} records of type {result['type']}")

Best Practices

  • Ensure dataset_info dictionary has the correct structure with required keys before calling this function
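  • Wrap function calls in try-except blocks to handle FileNotFoundError, KeyError (a required key missing from dataset_info), and pandas parsing errors such as pandas.errors.ParserError and pandas.errors.EmptyDataError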
  • Validate that CSV files exist and are accessible before calling this function
  • The function does not inspect the loaded data; consider checking for malformed content or missing columns in the returned DataFrames before analysis
  • The function prints status messages to stdout; redirect or capture if logging is needed
  • For large datasets, consider memory implications of loading multiple DataFrames simultaneously in 'compare' mode
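
Several of these practices can be combined in a thin wrapper. The sketch below is one possible approach, not the module's own behavior: load_safely and its loader parameter are hypothetical, returning None on failure is an assumed convention, and the loader's print() output is forwarded to the logging module via contextlib.redirect_stdout.

```python
import contextlib
import io
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def load_safely(loader, dataset_info):
    """Hypothetical wrapper: call loader(dataset_info); return None on failure."""
    captured = io.StringIO()
    try:
        # Capture the loader's print() output instead of writing to stdout
        with contextlib.redirect_stdout(captured):
            result = loader(dataset_info)
    except KeyError as exc:
        logger.error("dataset_info is missing key %s", exc)
        return None
    except FileNotFoundError as exc:
        logger.error("CSV file not found: %s", exc)
        return None
    except (pd.errors.EmptyDataError, pd.errors.ParserError) as exc:
        logger.error("Could not parse CSV: %s", exc)
        return None
    finally:
        # Forward any captured status messages to the logger
        for line in captured.getvalue().splitlines():
            logger.info(line)
    return result

# Intended use, assuming load_analysis_data is in scope:
# result = load_safely(load_analysis_data,
#                      {'type': 'production', 'path': 'data/production_data.csv'})
```

Passing the loader in as an argument keeps the wrapper independent of where load_analysis_data is defined; callers must check for None before using the result.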

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function compare_datasets 60.8% similar

    Analyzes and compares two pandas DataFrames containing flock data (original vs cleaned), printing detailed statistics about removed records, type distributions, and impact assessment.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function select_dataset 56.8% similar

    Interactive command-line function that prompts users to select between original, cleaned, or comparison of flock datasets for analysis.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function create_data_quality_dashboard_v1 45.5% similar

    Creates an interactive data quality dashboard for analyzing treatment timing issues in poultry flock management data by loading and processing CSV files containing timing anomalies.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function create_data_quality_dashboard 44.2% similar

    Creates an interactive command-line dashboard for analyzing data quality issues in treatment timing data, specifically focusing on treatments administered outside of flock lifecycle dates.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function create_csv_report 43.9% similar

    Creates two CSV reports (summary and detailed) from warranty data, writing warranty information to files with different levels of detail.

    From: /tf/active/vicechatdev/convert_disclosures_to_table.py