🔍 Code Extractor

function analyze_logs

Maturity: 46

Parses and analyzes log files to extract synchronization statistics, error counts, and performance metrics for a specified time period.

File: /tf/active/vicechatdev/SPFCsync/monitor.py
Lines: 36-110
Complexity: moderate

Purpose

This function reads a log file and analyzes entries within a specified time window (default 24 hours) to generate comprehensive statistics about synchronization operations between SharePoint and FileCloud. It tracks sync cycles, file operations (uploads, updates, skips), error/warning counts, cycle completion times, and calculates average performance metrics. The function is designed for monitoring and troubleshooting file synchronization processes.
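
The exact log layout depends on how the sync process is configured, but a hypothetical excerpt matching the timestamp format and message patterns this function looks for might be:

2025-05-01 02:00:00 - INFO - Starting SharePoint to FileCloud synchronization
2025-05-01 02:03:12 - INFO - Synchronization completed: {'new_uploads': 5, 'updated_files': 2, 'skipped_files': 140}
2025-05-01 02:10:00 - WARNING - Skipping file with unsupported name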

Source Code

def analyze_logs(log_file, hours=24):
    """Analyze logs for the specified time period."""
    if not os.path.exists(log_file):
        print(f"Log file {log_file} not found")
        return
    
    cutoff_time = datetime.now() - timedelta(hours=hours)
    
    stats = {
        'total_lines': 0,
        'sync_cycles': 0,
        'new_uploads': 0,
        'updated_files': 0,
        'skipped_files': 0,
        'errors': 0,
        'warnings': 0,
        'last_sync': None,
        'avg_cycle_time': 0,
        'cycle_times': []
    }
    
    cycle_start_time = None
    
    try:
        with open(log_file, 'r') as f:
            for line in f:
                stats['total_lines'] += 1
                parsed = parse_log_line(line)
                
                if not parsed or parsed['timestamp'] < cutoff_time:
                    continue
                
                message = parsed['message']
                level = parsed['level']
                
                # Count log levels
                if level == 'ERROR':
                    stats['errors'] += 1
                elif level == 'WARNING':
                    stats['warnings'] += 1
                
                # Track sync cycles
                if 'Starting SharePoint to FileCloud synchronization' in message:
                    cycle_start_time = parsed['timestamp']
                elif 'Synchronization completed' in message and cycle_start_time:
                    stats['sync_cycles'] += 1
                    stats['last_sync'] = parsed['timestamp']
                    cycle_time = (parsed['timestamp'] - cycle_start_time).total_seconds()
                    stats['cycle_times'].append(cycle_time)
                    cycle_start_time = None  # reset so a repeated completion message isn't counted twice
                
                # Extract sync statistics
                if 'new_uploads' in message:
                    match = re.search(r"'new_uploads': (\d+)", message)
                    if match:
                        stats['new_uploads'] += int(match.group(1))
                
                if 'updated_files' in message:
                    match = re.search(r"'updated_files': (\d+)", message)
                    if match:
                        stats['updated_files'] += int(match.group(1))
                
                if 'skipped_files' in message:
                    match = re.search(r"'skipped_files': (\d+)", message)
                    if match:
                        stats['skipped_files'] += int(match.group(1))
    
    except Exception as e:
        print(f"Error reading log file: {e}")
        return
    
    # Calculate average cycle time
    if stats['cycle_times']:
        stats['avg_cycle_time'] = sum(stats['cycle_times']) / len(stats['cycle_times'])
    
    return stats

Parameters

Name      Type          Default  Kind
log_file  str           -        positional_or_keyword
hours     int or float  24       positional_or_keyword

Parameter Details

log_file: String path to the log file to be analyzed. Must be a valid file path. If the file doesn't exist, the function prints an error message and returns None.

hours: Integer or float specifying the time window in hours to analyze from the current time backwards. Default is 24 hours. Only log entries within this time period will be included in the analysis. Must be a positive number.
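
Because hours is passed directly to timedelta(hours=hours), fractional values also work; for example (the path is illustrative):

recent = analyze_logs('/var/log/sync.log', hours=0.5)     # last 30 minutes
weekly = analyze_logs('/var/log/sync.log', hours=24 * 7)  # last 7 days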

Return Value

Returns a dictionary of log statistics, or None if the log file doesn't exist or an error occurs while reading it. The dictionary contains:

  • 'total_lines' (int): total lines read, including lines outside the time window
  • 'sync_cycles' (int): completed synchronization cycles
  • 'new_uploads' (int): total new files uploaded
  • 'updated_files' (int): total files updated
  • 'skipped_files' (int): total files skipped
  • 'errors' (int): count of ERROR-level log entries
  • 'warnings' (int): count of WARNING-level log entries
  • 'last_sync' (datetime or None): timestamp of the last completed sync
  • 'avg_cycle_time' (float): average cycle time in seconds, 0 if no cycles completed
  • 'cycle_times' (list of float): individual cycle times in seconds
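
For illustration, a returned dictionary might look like this (all values are made up):

{
    'total_lines': 18423,
    'sync_cycles': 96,
    'new_uploads': 37,
    'updated_files': 12,
    'skipped_files': 14210,
    'errors': 0,
    'warnings': 3,
    'last_sync': datetime(2025, 5, 1, 23, 45, 10),
    'avg_cycle_time': 142.7,
    'cycle_times': [151.2, 138.9]  # one entry per completed cycle
}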

Dependencies

  • os
  • datetime
  • re

Required Imports

import os
from datetime import datetime, timedelta
import re

Usage Example

import os
from datetime import datetime, timedelta
import re

# Assuming parse_log_line is available; a minimal implementation for
# "YYYY-MM-DD HH:MM:SS - LEVEL - message" lines:
def parse_log_line(line):
    parts = line.strip().split(' - ', 2)  # maxsplit keeps ' - ' inside messages intact
    if len(parts) == 3:
        try:
            timestamp = datetime.strptime(parts[0], '%Y-%m-%d %H:%M:%S')
        except ValueError:
            return None  # line doesn't start with a timestamp
        return {'timestamp': timestamp, 'level': parts[1].strip(), 'message': parts[2]}
    return None

# Analyze logs from the last 24 hours
stats = analyze_logs('/var/log/sync.log', hours=24)

if stats:
    print(f"Total sync cycles: {stats['sync_cycles']}")
    print(f"New uploads: {stats['new_uploads']}")
    print(f"Errors: {stats['errors']}")
    print(f"Average cycle time: {stats['avg_cycle_time']:.2f} seconds")
    if stats['last_sync']:
        print(f"Last sync: {stats['last_sync']}")

# Analyze logs from the last 48 hours
stats_48h = analyze_logs('/var/log/sync.log', hours=48)
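
The returned statistics also lend themselves to simple health checks; a minimal sketch (the two-hour threshold is an assumption, not from the source):

if stats is None:
    print("ALERT: log file missing or unreadable")
elif stats['last_sync'] is None:
    print("ALERT: no completed sync cycles in the analyzed window")
elif datetime.now() - stats['last_sync'] > timedelta(hours=2):  # assumed staleness threshold
    print(f"ALERT: last completed sync was at {stats['last_sync']}")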

Best Practices

  • Ensure the parse_log_line() helper function is defined before calling analyze_logs()
  • The log file should follow a consistent format with timestamps, log levels, and structured messages
  • For large log files, note that the function reads line by line (so memory use stays modest) but scans every line on each call, so runtime grows with file size; consider rotating logs
  • The function silently skips log entries outside the time window, so verify the 'hours' parameter matches your analysis needs
  • Check if the return value is None before accessing statistics to handle missing files or errors gracefully
  • The function expects specific message patterns for sync operations; ensure your logging format matches these patterns
  • Regex patterns for extracting statistics assume dictionary-style formatting in log messages (e.g., "'new_uploads': 5"); see the logging sketch after this list
  • Cycle time calculation requires both start and completion messages to be present in the logs within the time window
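
As a minimal sketch of emitting compatible log lines with Python's standard logging module (the logger name, file path, and summary contents are assumptions, not taken from the source):

import logging

# The format below reproduces the "timestamp - LEVEL - message" layout that
# parse_log_line in the usage example expects.
logging.basicConfig(
    filename='/var/log/sync.log',
    format='%(asctime)s - %(levelname)s - %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
    level=logging.INFO,
)
logger = logging.getLogger('spfcsync')  # hypothetical logger name

logger.info('Starting SharePoint to FileCloud synchronization')
# ... run the sync, collecting counters ...
summary = {'new_uploads': 5, 'updated_files': 2, 'skipped_files': 140}
# Logging the dict verbatim produces the "'new_uploads': 5" style the regexes match.
logger.info('Synchronization completed: %s', summary)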

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function main_v11 (70.5% similar)
    Executes a diagnostic analysis for file synchronization issues, analyzes missing files, and saves the results to a JSON file.
    From: /tf/active/vicechatdev/SPFCsync/deep_diagnostics.py

  • class SyncDiagnostics (68.3% similar)
    A diagnostic class that analyzes and reports on synchronization issues between SharePoint and FileCloud, identifying missing files and root causes of sync failures.
    From: /tf/active/vicechatdev/SPFCsync/deep_diagnostics.py

  • function main_v18 (68.3% similar)
    Command-line interface entry point for monitoring SharePoint to FileCloud synchronization logs, providing status analysis, log tailing, and real-time watching capabilities.
    From: /tf/active/vicechatdev/SPFCsync/monitor.py

  • function print_status (66.6% similar)
    Prints a formatted status report for SharePoint to FileCloud synchronization operations, displaying sync statistics, timing information, and health indicators.
    From: /tf/active/vicechatdev/SPFCsync/monitor.py

  • function dry_run_test (63.0% similar)
    Performs a dry run test of SharePoint to FileCloud synchronization, analyzing up to a specified number of documents without actually transferring files.
    From: /tf/active/vicechatdev/SPFCsync/dry_run_test.py