function analyze_logs
Parses and analyzes log files to extract synchronization statistics, error counts, and performance metrics for a specified time period.
File: /tf/active/vicechatdev/SPFCsync/monitor.py
Lines: 36-110
Complexity: moderate
Purpose
This function reads a log file and analyzes entries within a specified time window (default 24 hours) to generate comprehensive statistics about synchronization operations between SharePoint and FileCloud. It tracks sync cycles, file operations (uploads, updates, skips), error/warning counts, cycle completion times, and calculates average performance metrics. The function is designed for monitoring and troubleshooting file synchronization processes.
Source Code
```python
def analyze_logs(log_file, hours=24):
    """Analyze logs for the specified time period."""
    if not os.path.exists(log_file):
        print(f"Log file {log_file} not found")
        return

    cutoff_time = datetime.now() - timedelta(hours=hours)

    stats = {
        'total_lines': 0,
        'sync_cycles': 0,
        'new_uploads': 0,
        'updated_files': 0,
        'skipped_files': 0,
        'errors': 0,
        'warnings': 0,
        'last_sync': None,
        'avg_cycle_time': 0,
        'cycle_times': []
    }

    cycle_start_time = None

    try:
        with open(log_file, 'r') as f:
            for line in f:
                stats['total_lines'] += 1
                parsed = parse_log_line(line)
                if not parsed or parsed['timestamp'] < cutoff_time:
                    continue

                message = parsed['message']
                level = parsed['level']

                # Count log levels
                if level == 'ERROR':
                    stats['errors'] += 1
                elif level == 'WARNING':
                    stats['warnings'] += 1

                # Track sync cycles
                if 'Starting SharePoint to FileCloud synchronization' in message:
                    cycle_start_time = parsed['timestamp']
                elif 'Synchronization completed' in message and cycle_start_time:
                    stats['sync_cycles'] += 1
                    stats['last_sync'] = parsed['timestamp']
                    cycle_time = (parsed['timestamp'] - cycle_start_time).total_seconds()
                    stats['cycle_times'].append(cycle_time)

                # Extract sync statistics
                if 'new_uploads' in message:
                    match = re.search(r"'new_uploads': (\d+)", message)
                    if match:
                        stats['new_uploads'] += int(match.group(1))
                if 'updated_files' in message:
                    match = re.search(r"'updated_files': (\d+)", message)
                    if match:
                        stats['updated_files'] += int(match.group(1))
                if 'skipped_files' in message:
                    match = re.search(r"'skipped_files': (\d+)", message)
                    if match:
                        stats['skipped_files'] += int(match.group(1))
    except Exception as e:
        print(f"Error reading log file: {e}")
        return

    # Calculate average cycle time
    if stats['cycle_times']:
        stats['avg_cycle_time'] = sum(stats['cycle_times']) / len(stats['cycle_times'])

    return stats
```
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| log_file | - | - | positional_or_keyword |
| hours | - | 24 | positional_or_keyword |
Parameter Details
log_file: String path to the log file to be analyzed. Must be a valid file path. If the file doesn't exist, the function prints an error message and returns None.
hours: Integer or float specifying the time window in hours to analyze from the current time backwards. Default is 24 hours. Only log entries within this time period will be included in the analysis. Must be a positive number.
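The trailing time-window filter that the hours parameter controls can be sketched in isolation (within_window and the fixed now value are hypothetical, used only to make the example deterministic; the function itself uses datetime.now()):

```python
from datetime import datetime, timedelta

def within_window(timestamp, hours=24, now=None):
    """Return True if timestamp falls inside the trailing analysis window."""
    now = now or datetime.now()
    return timestamp >= now - timedelta(hours=hours)

# Fix 'now' so the example is reproducible
now = datetime(2024, 1, 2, 12, 0, 0)
print(within_window(datetime(2024, 1, 2, 0, 0, 0), hours=24, now=now))   # inside the window
print(within_window(datetime(2024, 1, 1, 0, 0, 0), hours=24, now=now))   # outside the window
```

Entries that fail this check are skipped silently, exactly as in the `parsed['timestamp'] < cutoff_time` test in the source.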
Return Value
Returns a dictionary of log statistics, or None if the log file doesn't exist or an error occurs while reading it. Keys:
- 'total_lines' (int): total lines read
- 'sync_cycles' (int): completed synchronization cycles
- 'new_uploads' (int): total new files uploaded
- 'updated_files' (int): total files updated
- 'skipped_files' (int): total files skipped
- 'errors' (int): count of ERROR-level entries
- 'warnings' (int): count of WARNING-level entries
- 'last_sync' (datetime or None): timestamp of the last completed sync
- 'avg_cycle_time' (float): average cycle time in seconds, 0 if no cycles completed
- 'cycle_times' (list of float): individual cycle times in seconds
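The avg_cycle_time value is derived from cycle_times with the same guard the function uses; the snippet below mirrors that step (the cycle times here are invented for illustration):

```python
# Hypothetical per-cycle durations in seconds, as collected in stats['cycle_times']
cycle_times = [12.0, 18.0]

# Average only when at least one cycle completed; otherwise keep the default 0
avg_cycle_time = sum(cycle_times) / len(cycle_times) if cycle_times else 0
print(avg_cycle_time)  # 15.0
```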
Dependencies
os, datetime, re
Required Imports
```python
import os
from datetime import datetime, timedelta
import re
```
Usage Example
```python
import os
from datetime import datetime, timedelta
import re

# Assuming a parse_log_line() helper is defined, for example:
def parse_log_line(line):
    # Example implementation for "timestamp - LEVEL - message" formatted lines
    parts = line.split(' - ')
    if len(parts) >= 3:
        timestamp = datetime.strptime(parts[0], '%Y-%m-%d %H:%M:%S')
        level = parts[1]
        message = parts[2]
        return {'timestamp': timestamp, 'level': level, 'message': message}
    return None

# Analyze logs from the last 24 hours
stats = analyze_logs('/var/log/sync.log', hours=24)
if stats:
    print(f"Total sync cycles: {stats['sync_cycles']}")
    print(f"New uploads: {stats['new_uploads']}")
    print(f"Errors: {stats['errors']}")
    print(f"Average cycle time: {stats['avg_cycle_time']:.2f} seconds")
    if stats['last_sync']:
        print(f"Last sync: {stats['last_sync']}")

# Analyze logs from the last 48 hours
stats_48h = analyze_logs('/var/log/sync.log', hours=48)
```
Best Practices
- Ensure the parse_log_line() helper function is defined before calling analyze_logs()
- The log file should follow a consistent format with timestamps, log levels, and structured messages
- The function streams the file line by line, so memory use stays low even for large logs, but runtime grows with file size since every line is read on each call
- The function silently skips log entries outside the time window, so verify the 'hours' parameter matches your analysis needs
- Check if the return value is None before accessing statistics to handle missing files or errors gracefully
- The function expects specific message patterns for sync operations; ensure your logging format matches these patterns
- Regex patterns for extracting statistics assume dictionary-style formatting in log messages (e.g., "'new_uploads': 5")
- Cycle time calculation requires both start and completion messages to be present in the logs within the time window
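The dict-style message format mentioned in the last two bullets can be verified against the same regex patterns the function uses (the message string below is hypothetical; match it to your actual logger output):

```python
import re

# Hypothetical completion message in the "'key': value" style the regexes expect
message = "Synchronization completed: {'new_uploads': 5, 'updated_files': 2, 'skipped_files': 17}"

counts = {}
for key in ('new_uploads', 'updated_files', 'skipped_files'):
    match = re.search(rf"'{key}': (\d+)", message)
    if match:
        counts[key] = int(match.group(1))

print(counts)  # {'new_uploads': 5, 'updated_files': 2, 'skipped_files': 17}
```

If your sync code logs its summary dict via Python's repr (e.g. logger.info(f"Synchronization completed: {summary}")), this format falls out naturally.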
Similar Components
AI-powered semantic similarity - components with related functionality:
- function main_v11 (70.5% similar)
- class SyncDiagnostics (68.3% similar)
- function main_v18 (68.3% similar)
- function print_status (66.6% similar)
- function dry_run_test (63.0% similar)