function main
Command-line interface function that orchestrates pattern-based extraction of poultry flock data, including data loading, pattern classification, geocoding, and export functionality.
/tf/active/vicechatdev/pattern_based_extraction.py
505 - 622
complex
Purpose
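This is the main entry point for a pattern-based poultry data extraction tool. It parses command-line arguments and extracts flock data by In-Ovo usage pattern (sequential, concurrent, mixed, or all). It filters records by start date and exports the results to CSV files.

The extraction runs in steps: load and filter the base data, identify mixed farms, classify each farm's pattern, extract and enrich the matching flocks, and export them. Only --geocoded-data reaches PatternBasedExtractor, as its geocoded_file argument. The other geocoding and map options (--skip-geocoding, --cache-only, --create-map, --map-output, --use-clustering) are parsed and echoed in the run summary, but the source below does not pass them to any extractor method.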
Source Code
def main():
    """Main function for pattern-based extraction."""
    parser = argparse.ArgumentParser(description='Pattern-Based Poultry Data Extraction')
    parser.add_argument('--pattern', type=str, required=True,
                        choices=['sequential', 'concurrent', 'mixed', 'all'],
                        help='In-Ovo usage pattern to extract')
    parser.add_argument('--output', type=str, default=None,
                        help='Output CSV filename (default: auto-generated)')
    parser.add_argument('--sample-size', type=int, default=None,
                        help='Number of flocks to sample (default: extract all)')
    parser.add_argument('--geocoded-data', type=str, default=None,
                        help='Path to geocoded data file for coordinate enrichment')
    parser.add_argument('--data-dir', type=str, default='/tf/active/pehestat_data',
                        help='Directory containing Pehestat data files')
    parser.add_argument('--skip-geocoding', action='store_true',
                        help='Skip geocoding and map generation')
    parser.add_argument('--cache-only', action='store_true',
                        help='Use geocoding cache only (no API calls)')
    parser.add_argument('--create-map', action='store_true',
                        help='Create interactive map (requires geocoding)')
    parser.add_argument('--map-output', type=str, default=None,
                        help='Output map filename (default: auto-generated)')
    parser.add_argument('--use-clustering', action='store_true',
                        help='Enable marker clustering on the map')
    parser.add_argument('--start-date', type=str, default='2020-01-01',
                        help='Start date filter (YYYY-MM-DD, default: 2020-01-01)')
    args = parser.parse_args()

    print("=" * 80)
    print("PATTERN-BASED POULTRY DATA EXTRACTION")
    print("=" * 80)
    print(f"Target pattern: {args.pattern}")
    print(f"Start date filter: {args.start_date}")
    print(f"Sample size: {'All flocks' if args.sample_size is None else f'{args.sample_size:,} flocks'}")
    print(f"Data directory: {args.data_dir}")
    if args.geocoded_data:
        print(f"Geocoded data: {args.geocoded_data}")
    if not args.skip_geocoding:
        if args.cache_only:
            print("Geocoding: Cache-only mode (no API calls)")
        else:
            print("Geocoding: Full mode (includes API calls if needed)")
        if args.create_map:
            print("Map generation: Enabled")
    else:
        print("Geocoding: Disabled")
    print("=" * 80)

    try:
        # Initialize extractor
        extractor = PatternBasedExtractor(
            data_dir=args.data_dir,
            geocoded_file=args.geocoded_data
        )

        # Load and filter base data
        flocks_df = extractor.load_and_filter_base_data(start_date=args.start_date)

        # Identify mixed farms
        mixed_farms_df = extractor.identify_mixed_farms(flocks_df)
        if len(mixed_farms_df) == 0:
            print("No mixed farms found! Cannot proceed with pattern extraction.")
            return

        # Classify farm patterns
        patterns_df = extractor.classify_farm_patterns(flocks_df, mixed_farms_df)
        if len(patterns_df) == 0:
            print("No farm patterns could be classified! Cannot proceed.")
            return

        # Extract flocks by pattern
        if args.pattern == 'all':
            # Extract all patterns
            for pattern in ['sequential', 'concurrent', 'mixed']:
                pattern_flocks = extractor.extract_flocks_by_pattern(
                    pattern, flocks_df, patterns_df, args.sample_size
                )
                if len(pattern_flocks) > 0:
                    # Enrich data
                    enriched_flocks = extractor.enrich_flock_data(pattern_flocks)
                    # Export results
                    output_file = args.output
                    if output_file and args.pattern == 'all':
                        # Modify filename for each pattern
                        base, ext = os.path.splitext(output_file)
                        output_file = f"{base}_{pattern}{ext}"
                    extractor.export_results(enriched_flocks, pattern, output_file)
        else:
            # Extract specific pattern
            pattern_flocks = extractor.extract_flocks_by_pattern(
                args.pattern, flocks_df, patterns_df, args.sample_size
            )
            if len(pattern_flocks) == 0:
                print(f"No flocks found for pattern '{args.pattern}'!")
                return
            # Enrich data
            enriched_flocks = extractor.enrich_flock_data(pattern_flocks)
            # Export results
            extractor.export_results(enriched_flocks, args.pattern, args.output)

        print("\n✅ Pattern-based extraction completed successfully!")

    except Exception as e:
        print(f"\n❌ Error during pattern-based extraction: {e}")
        import traceback
        traceback.print_exc()
        return 1

    return 0
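The sketch below replays main()'s sequence of extractor calls against a hypothetical StubExtractor, so the control flow can be seen in isolation. The stub works on plain Python lists and dicts. StubExtractor and run_pipeline are illustrative names, not part of the source. Only the method names mirror the calls main() makes; the stub's filtering and "mixed farm" logic is invented.

The sketch reproduces two behaviors worth knowing:
- With --pattern all, a pattern with no flocks is skipped silently.
- A single requested pattern with no flocks ends the run early with None.

```python
import os
from typing import Dict, List, Optional, Tuple


class StubExtractor:
    """Hypothetical stand-in for PatternBasedExtractor. Method names mirror the
    calls made by main(); the data handling is invented for illustration."""

    def __init__(self, flocks: List[Dict], farm_patterns: Dict[str, str]):
        self.flocks = flocks                # each flock: {"farm": ..., "date": "YYYY-MM-DD"}
        self.farm_patterns = farm_patterns  # farm id -> 'sequential' | 'concurrent' | 'mixed'
        self.exports: List[Tuple[str, int, Optional[str]]] = []  # one entry per export call

    def load_and_filter_base_data(self, start_date: str) -> List[Dict]:
        # ISO-formatted dates compare correctly as plain strings
        return [f for f in self.flocks if f["date"] >= start_date]

    def identify_mixed_farms(self, flocks: List[Dict]) -> List[str]:
        # Stub rule: a farm counts as "mixed" if it has a known pattern
        return sorted({f["farm"] for f in flocks if f["farm"] in self.farm_patterns})

    def classify_farm_patterns(self, flocks: List[Dict], mixed_farms: List[str]) -> Dict[str, str]:
        return {farm: self.farm_patterns[farm] for farm in mixed_farms}

    def extract_flocks_by_pattern(self, pattern, flocks, patterns, sample_size):
        selected = [f for f in flocks if patterns.get(f["farm"]) == pattern]
        return selected if sample_size is None else selected[:sample_size]

    def enrich_flock_data(self, flocks: List[Dict]) -> List[Dict]:
        return [dict(f, enriched=True) for f in flocks]

    def export_results(self, flocks, pattern, output_file):
        self.exports.append((pattern, len(flocks), output_file))


def run_pipeline(extractor, pattern: str, start_date: str = "2020-01-01",
                 sample_size: Optional[int] = None,
                 output: Optional[str] = None) -> Optional[int]:
    """Same control flow as main(), without argument parsing or console output."""
    try:
        flocks = extractor.load_and_filter_base_data(start_date=start_date)
        mixed = extractor.identify_mixed_farms(flocks)
        if len(mixed) == 0:
            return None  # early stop, like main()'s bare `return`
        patterns = extractor.classify_farm_patterns(flocks, mixed)
        if len(patterns) == 0:
            return None
        targets = ["sequential", "concurrent", "mixed"] if pattern == "all" else [pattern]
        for p in targets:
            selected = extractor.extract_flocks_by_pattern(p, flocks, patterns, sample_size)
            if len(selected) == 0:
                if pattern == "all":
                    continue  # 'all' skips empty patterns without a message
                return None   # a single empty pattern ends the run early
            out = output
            if out and pattern == "all":
                base, ext = os.path.splitext(out)
                out = f"{base}_{p}{ext}"
            extractor.export_results(extractor.enrich_flock_data(selected), p, out)
    except Exception:
        return 1
    return 0


demo = StubExtractor(
    flocks=[
        {"farm": "A", "date": "2021-03-01"},
        {"farm": "A", "date": "2019-05-01"},  # removed by the 2020-01-01 start-date filter
        {"farm": "B", "date": "2022-07-15"},
    ],
    farm_patterns={"A": "sequential", "B": "concurrent"},
)
print(run_pipeline(demo, "all", output="flocks.csv"))  # 0
print(demo.exports)
# [('sequential', 1, 'flocks_sequential.csv'), ('concurrent', 1, 'flocks_concurrent.csv')]
```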
Return Value
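Returns an integer exit code: 0 on successful completion and 1 if an exception is raised during extraction. If the run stops early, the function implicitly returns None. This happens when no mixed farms are found, no farm patterns can be classified, or a single requested pattern has no flocks.

Because the script passes the return value to sys.exit(), None yields exit status 0, the same as success. Invalid command-line arguments never reach these paths: argparse exits with status 2 while parsing them. The function's main effects are side effects (CSV exports and console output) rather than returned data.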
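The snippet below checks how Python turns each of main()'s possible return values into a process exit status when the value is passed to sys.exit(). The helper name exit_status is hypothetical.

```python
import subprocess
import sys


def exit_status(return_value: str) -> int:
    """Run sys.exit(<return_value>) in a fresh interpreter and return its exit status."""
    code = f"import sys; sys.exit({return_value})"
    return subprocess.run([sys.executable, "-c", code]).returncode


for value in ("0", "1", "None"):
    print(f"sys.exit({value}) -> exit status {exit_status(value)}")
# sys.exit(0) -> exit status 0
# sys.exit(1) -> exit status 1
# sys.exit(None) -> exit status 0
```

In a shell wrapper, a non-zero status therefore signals only an exception or an argument error. To detect an empty run, check for the output CSV or the console messages.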
Dependencies
argparse, os, sys, pandas, numpy, datetime, typing, traceback, matched_sample_analysis, extractor
Required Imports
import os
import sys
import pandas as pd
import numpy as np
import argparse
from datetime import datetime
from typing import Dict, List, Optional, Tuple
from matched_sample_analysis import MatchedSampleAnalyzer
from extractor import PehestatDataExtractor
Conditional/Optional Imports
These imports are only needed under specific conditions:
import traceback
Condition: only used in exception handling blocks for detailed error reporting
Required (conditional)
Usage Example
# Run from command line:
# Extract sequential pattern flocks from 2020 onwards
python script.py --pattern sequential --start-date 2020-01-01 --output sequential_flocks.csv
# Extract all patterns with sampling and geocoding
python script.py --pattern all --sample-size 1000 --geocoded-data geocoded.csv --create-map
# Extract concurrent pattern with cache-only geocoding
python script.py --pattern concurrent --cache-only --data-dir /custom/path
# Script entry point (bottom of the module); main()'s return value becomes the exit status:
if __name__ == '__main__':
sys.exit(main())
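The usage commands can be checked without running an extraction by giving the parser an explicit argument list. This sketch makes two assumptions:
- It uses a hypothetical build_parser() refactor; the source builds the parser inline in main().
- It adds a hypothetical valid_date type for --start-date. The original accepts any string, so this is a suggested hardening rather than existing behavior.

Flags and defaults mirror the source; help strings are omitted for brevity.

```python
import argparse
from datetime import datetime


def valid_date(text: str) -> str:
    """Hypothetical validator: accept only YYYY-MM-DD, but keep the value a string
    because main() passes args.start_date on as a string."""
    datetime.strptime(text, "%Y-%m-%d")  # raises ValueError, which argparse reports as an error
    return text


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical refactor of the parser built inside main()."""
    p = argparse.ArgumentParser(description="Pattern-Based Poultry Data Extraction")
    p.add_argument("--pattern", type=str, required=True,
                   choices=["sequential", "concurrent", "mixed", "all"])
    p.add_argument("--output", type=str, default=None)
    p.add_argument("--sample-size", type=int, default=None)
    p.add_argument("--geocoded-data", type=str, default=None)
    p.add_argument("--data-dir", type=str, default="/tf/active/pehestat_data")
    p.add_argument("--skip-geocoding", action="store_true")
    p.add_argument("--cache-only", action="store_true")
    p.add_argument("--create-map", action="store_true")
    p.add_argument("--map-output", type=str, default=None)
    p.add_argument("--use-clustering", action="store_true")
    # String defaults are passed through `type`, so the default is validated too
    p.add_argument("--start-date", type=valid_date, default="2020-01-01")
    return p


# Second usage example from above; dashes in flag names become underscores
args = build_parser().parse_args(
    ["--pattern", "all", "--sample-size", "1000",
     "--geocoded-data", "geocoded.csv", "--create-map"]
)
print(args.pattern, args.sample_size, args.geocoded_data, args.create_map, args.start_date)
# all 1000 geocoded.csv True 2020-01-01
```

An unknown pattern such as `--pattern weekly`, or a date like `--start-date 2020-13-01`, makes parse_args() exit with status 2.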
Best Practices
- Always pass --pattern; it is required, and argparse exits with status 2 if it is missing
- Use --start-date to filter data to relevant time periods and improve performance
- When extracting all patterns, provide a base output filename; the function will automatically append pattern names
- Use the --cache-only flag to avoid API rate limits when geocoding repeatedly; note that in the source shown, main() parses this flag but does not forward it to the extractor
- Set --sample-size for testing or when working with large datasets to reduce processing time
- Check console output for data availability messages before expecting output files
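- The function returns 0 on success and 1 on an exception, but early stops return None, which sys.exit() also reports as status 0; in shell scripts and CI/CD pipelines, do not treat status 0 alone as proof that output files were written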
- Ensure PatternBasedExtractor class is properly defined with methods: load_and_filter_base_data, identify_mixed_farms, classify_farm_patterns, extract_flocks_by_pattern, enrich_flock_data, export_results
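- If no mixed farms or no classifiable farm patterns are found, the function prints a message and returns early without writing any files
- Use --skip-geocoding when coordinates are not needed; in the source shown, main() only reports this flag in the run summary and does not forward it to the extractor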
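The naming rule for `--pattern all` exports can be isolated as below. per_pattern_filename is a hypothetical helper that repeats the os.path.splitext logic from main(). An unset or empty --output is passed through unchanged; the filename is then left to export_results, which auto-generates one according to the --output help text.

```python
import os
from typing import Optional


def per_pattern_filename(output: Optional[str], pattern: str) -> Optional[str]:
    """Hypothetical helper mirroring main(): insert '_<pattern>' before the extension."""
    if not output:
        return output  # None/empty: export_results chooses the name
    base, ext = os.path.splitext(output)  # splits at the last dot only
    return f"{base}_{pattern}{ext}"


for p in ("sequential", "concurrent", "mixed"):
    print(per_pattern_filename("results/flocks.csv", p))
# results/flocks_sequential.csv
# results/flocks_concurrent.csv
# results/flocks_mixed.csv
```

Because splitext() splits at the last dot, a name like data.v2.csv becomes data.v2_mixed.csv, and a name without an extension simply gets the suffix appended.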
Similar Components
AI-powered semantic similarity - components with related functionality:
- class PatternBasedExtractor (65.4% similar)
- function analyze_flock_type_patterns (59.4% similar)
- function show_problematic_flocks (57.8% similar)
- function create_data_quality_dashboard (55.5% similar)
- function select_dataset (55.2% similar)