function load_analysis_data
Loads CSV dataset(s) into pandas DataFrames based on dataset configuration, supporting both single dataset and comparison (two-dataset) modes.
/tf/active/vicechatdev/data_quality_dashboard.py
56 - 74
simple
Purpose
This function loads data for analysis workflows that work with either a single dataset or a pair of datasets to compare (original vs. cleaned). It branches on the dataset type and returns a dictionary holding the loaded DataFrame(s) together with the dataset type, so analysis pipelines can load data according to user selection or configuration.
Source Code
def load_analysis_data(dataset_info):
    """Load analysis data based on dataset selection."""
    if dataset_info['type'] == 'compare':
        print("Loading data for comparison analysis...")
        # Load both datasets for comparison
        original_flocks = pd.read_csv(dataset_info['original'])
        cleaned_flocks = pd.read_csv(dataset_info['cleaned'])
        return {
            'original_flocks': original_flocks,
            'cleaned_flocks': cleaned_flocks,
            'type': 'compare'
        }
    else:
        print(f"Loading {dataset_info['type']} dataset...")
        flocks = pd.read_csv(dataset_info['path'])
        return {
            'flocks': flocks,
            'type': dataset_info['type']
        }
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| dataset_info | - | - | positional_or_keyword |
Parameter Details
dataset_info: A dictionary containing the dataset configuration. It must include a 'type' key. If type is 'compare', it must also have 'original' and 'cleaned' keys holding file paths to CSV files. Any other type value is treated as a single dataset and requires a 'path' key holding the file path to one CSV file. Example: {'type': 'compare', 'original': 'data/original.csv', 'cleaned': 'data/cleaned.csv'} or {'type': 'single', 'path': 'data/dataset.csv'}
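The key requirements described above can be checked before any file is read. The following is a minimal sketch; `validate_dataset_info` is a hypothetical helper that is not part of the documented module, and it only mirrors the branching visible in the source code (only 'compare' is treated specially).

```python
def validate_dataset_info(dataset_info):
    """Hypothetical helper: check that dataset_info has the keys the loader reads."""
    if not isinstance(dataset_info, dict):
        raise TypeError("dataset_info must be a dict")
    if "type" not in dataset_info:
        raise ValueError("dataset_info is missing the 'type' key")

    # Mirror load_analysis_data: 'compare' needs two paths, anything else needs one.
    if dataset_info["type"] == "compare":
        required = ("original", "cleaned")
    else:
        required = ("path",)

    missing = [key for key in required if key not in dataset_info]
    if missing:
        raise ValueError(
            f"dataset_info of type {dataset_info['type']!r} is missing keys: {missing}"
        )
    return dataset_info
```

Calling `validate_dataset_info(dataset_config)` before `load_analysis_data(dataset_config)` turns a bare `KeyError` from inside the loader into a message that names the missing keys.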
Return Value
Returns a dictionary with different structures based on dataset type. For 'compare' type: {'original_flocks': DataFrame, 'cleaned_flocks': DataFrame, 'type': 'compare'}. For other types: {'flocks': DataFrame, 'type': <dataset_type>}. The DataFrames contain the loaded CSV data, and 'type' indicates the dataset configuration used.
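Because the result has two different shapes, calling code has to branch on 'type' before touching the DataFrames. One way to hide that branch is sketched below; `iter_frames` is a hypothetical helper, and the plain lists in the demo are stand-ins for DataFrames so the snippet runs without any CSV files.

```python
def iter_frames(result):
    """Hypothetical helper: yield (label, frame) pairs for either result shape."""
    if result["type"] == "compare":
        yield "original", result["original_flocks"]
        yield "cleaned", result["cleaned_flocks"]
    else:
        # Single-dataset mode: label the frame with the dataset type.
        yield result["type"], result["flocks"]


# Demo with stand-in data (lists instead of DataFrames).
demo_result = {
    "original_flocks": [1, 2, 3],
    "cleaned_flocks": [1, 2],
    "type": "compare",
}
for label, frame in iter_frames(demo_result):
    print(f"{label}: {len(frame)} records")
```

With a real result from `load_analysis_data`, the same loop works unchanged, since `len()` on a DataFrame returns its row count.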
Dependencies
pandas
Required Imports
import pandas as pd
Usage Example
import pandas as pd

# Example 1: Load comparison datasets
dataset_config = {
    'type': 'compare',
    'original': 'data/original_flocks.csv',
    'cleaned': 'data/cleaned_flocks.csv'
}
result = load_analysis_data(dataset_config)
original_df = result['original_flocks']
cleaned_df = result['cleaned_flocks']
print(f"Loaded {len(original_df)} original and {len(cleaned_df)} cleaned records")

# Example 2: Load single dataset
dataset_config = {
    'type': 'production',
    'path': 'data/production_data.csv'
}
result = load_analysis_data(dataset_config)
flocks_df = result['flocks']
print(f"Loaded {len(flocks_df)} records of type {result['type']}")
Best Practices
- Ensure dataset_info dictionary has the correct structure with required keys before calling this function
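- Wrap function calls in try-except blocks to handle FileNotFoundError, KeyError (missing configuration keys), and pandas parsing errors such as pandas.errors.ParserError or pandas.errors.EmptyDataError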
- Validate that CSV files exist and are accessible before calling this function
- Consider adding error handling for malformed CSV files or missing columns
- The function prints status messages to stdout; redirect or capture if logging is needed
- For large datasets, consider memory implications of loading multiple DataFrames simultaneously in 'compare' mode
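Two of the practices above, checking that the files exist and capturing the printed status messages, can be combined in a thin wrapper. This is a sketch under stated assumptions: `missing_csv_paths`, `load_with_logging`, and the logger name `analysis_loader` are all hypothetical, and the loader is passed in as an argument so the wrapper does not depend on how the module is imported. It does not handle pandas parsing errors.

```python
import contextlib
import io
import logging
from pathlib import Path

logger = logging.getLogger("analysis_loader")  # hypothetical logger name


def missing_csv_paths(dataset_info):
    """Return the configured CSV paths that do not exist on disk."""
    if dataset_info.get("type") == "compare":
        keys = ("original", "cleaned")
    else:
        keys = ("path",)
    return [
        dataset_info[key]
        for key in keys
        if key in dataset_info and not Path(dataset_info[key]).is_file()
    ]


def load_with_logging(loader, dataset_info):
    """Call loader(dataset_info) and forward whatever it prints to the logger."""
    missing = missing_csv_paths(dataset_info)
    if missing:
        raise FileNotFoundError(f"CSV file(s) not found: {missing}")

    # Capture stdout only for the duration of the load.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = loader(dataset_info)

    for line in buffer.getvalue().splitlines():
        logger.info(line)
    return result
```

A call would look like `result = load_with_logging(load_analysis_data, dataset_config)`. Note that if the loader raises, the captured output is discarded in this sketch; a fuller version might log the buffer in a `finally` block.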
Similar Components
AI-powered semantic similarity - components with related functionality:
- function compare_datasets 60.8% similar
- function select_dataset 56.8% similar
- function create_data_quality_dashboard_v1 45.5% similar
- function create_data_quality_dashboard 44.2% similar
- function create_csv_report 43.9% similar