
class OneDriveProcessor

Maturity: 48

OneDriveProcessor is a class that monitors a OneDrive folder for new files, processes them using an E-Ink LLM Assistant, and uploads the results back to OneDrive.

File: /tf/active/vicechatdev/e-ink-llm/onedrive_client.py
Lines: 510 - 625
Complexity: moderate

Purpose

This class provides automated file processing integration with OneDrive for the E-Ink LLM Assistant. It polls a specified OneDrive folder for new files (PDFs and images), downloads each one, processes it through the LLM assistant, and uploads the result to an output folder. It supports a configurable polling interval, automatic creation of the watch and output folders, and optional deletion of processed input files.

Source Code

class OneDriveProcessor:
    """OneDrive file processor for E-Ink LLM Assistant"""
    
    def __init__(self, onedrive_config: Dict[str, Any], api_key: str):
        """
        Initialize OneDrive processor
        
        Args:
            onedrive_config: OneDrive configuration dictionary
            api_key: OpenAI API key
        """
        self.client = OneDriveClient(onedrive_config)
        self.api_key = api_key
        self.config = onedrive_config
        
        # Configuration
        self.watch_folder = onedrive_config.get('watch_folder_path', '/E-Ink LLM Input')
        self.output_folder = onedrive_config.get('output_folder_path', '/E-Ink LLM Output')
        self.poll_interval = onedrive_config.get('poll_interval', 60)
        self.processed_files = set()
        
        # Supported file types
        self.supported_extensions = ['.pdf', '.jpg', '.jpeg', '.png', '.gif', '.bmp', '.tiff', '.webp']
        
        print(f"📁 OneDrive watch folder: {self.watch_folder}")
        print(f"📁 OneDrive output folder: {self.output_folder}")
    
    async def initialize(self) -> bool:
        """Initialize OneDrive connection"""
        success = await self.client.authenticate()
        if success:
            # Ensure folders exist
            await self.client.create_folder(self.watch_folder)
            await self.client.create_folder(self.output_folder)
        return success
    
    async def start_watching(self) -> None:
        """Start watching OneDrive folder for new files"""
        if not await self.initialize():
            print("❌ Failed to initialize OneDrive connection")
            return
        
        print(f"👀 Watching OneDrive folder: {self.watch_folder}")
        print(f"⏱️ Poll interval: {self.poll_interval} seconds")
        print("🛑 Press Ctrl+C to stop")
        
        try:
            while True:
                await self._check_for_new_files()
                await asyncio.sleep(self.poll_interval)
                
        except KeyboardInterrupt:
            print("\n🛑 OneDrive watching stopped")
    
    async def _check_for_new_files(self) -> None:
        """Check for new files in OneDrive watch folder"""
        try:
            files = await self.client.list_files_in_folder(
                self.watch_folder, 
                self.supported_extensions
            )
            
            new_files = [f for f in files if f['id'] not in self.processed_files]
            
            if new_files:
                print(f"🔍 Found {len(new_files)} new files in OneDrive")
                
                for file_info in new_files:
                    await self._process_file(file_info)
                    self.processed_files.add(file_info['id'])
            
        except Exception as e:
            print(f"❌ Error checking for new files: {e}")
    
    async def _process_file(self, file_info: Dict[str, Any]) -> None:
        """Process a single file from OneDrive"""
        print(f"📄 Processing OneDrive file: {file_info['name']}")
        
        try:
            # Create temporary directory for processing
            temp_dir = Path("temp_onedrive")
            temp_dir.mkdir(exist_ok=True)
            
            # Download file
            local_input_path = temp_dir / file_info['name']
            if not await self.client.download_file(file_info, str(local_input_path)):
                return
            
            # Process with E-Ink LLM
            from processor import process_single_file
            result_path = await process_single_file(str(local_input_path), self.api_key)
            
            if result_path:
                # Upload result to OneDrive
                result_file = Path(result_path)
                upload_success = await self.client.upload_file(
                    str(result_file),
                    self.output_folder,
                    result_file.name
                )
                
                if upload_success:
                    print(f"✅ Processed and uploaded: {file_info['name']} -> {result_file.name}")
                    
                    # Optional: delete original file from input folder
                    if self.config.get('delete_after_processing', False):
                        await self.client.delete_file(file_info)
                
                # Clean up local files
                local_input_path.unlink(missing_ok=True)
                result_file.unlink(missing_ok=True)
            else:
                print(f"❌ Failed to process: {file_info['name']}")
                
        except Exception as e:
            print(f"❌ Error processing {file_info['name']}: {e}")

Parameters

Name Type Default Kind
bases - -

Parameter Details

onedrive_config: Dictionary containing OneDrive configuration settings. Expected keys include: 'watch_folder_path' (default: '/E-Ink LLM Input'), 'output_folder_path' (default: '/E-Ink LLM Output'), 'poll_interval' (default: 60 seconds), 'delete_after_processing' (boolean, optional). Also contains authentication credentials passed to OneDriveClient.

api_key: OpenAI API key string used for processing files through the LLM. Required for the process_single_file function to work.
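The defaults listed for onedrive_config can be illustrated with a minimal sketch. The helper name resolve_settings is hypothetical and not part of onedrive_client.py; it only mirrors the dict.get() lookups shown in the constructor. Credential keys are left out because their exact names depend on OneDriveClient.

```python
from typing import Any, Dict


def resolve_settings(onedrive_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: apply the documented defaults to a config dict."""
    return {
        "watch_folder": onedrive_config.get("watch_folder_path", "/E-Ink LLM Input"),
        "output_folder": onedrive_config.get("output_folder_path", "/E-Ink LLM Output"),
        "poll_interval": onedrive_config.get("poll_interval", 60),
        "delete_after_processing": onedrive_config.get("delete_after_processing", False),
    }


# Only poll_interval is overridden; every other key falls back to its default.
settings = resolve_settings({"poll_interval": 30})
print(settings)
```

Because every key is optional, a configuration that carries only credentials should still yield usable folder paths and a 60-second poll interval.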

Return Value

The constructor returns an instance of OneDriveProcessor. The initialize() method returns a boolean indicating whether the OneDrive connection succeeded. The start_watching() method returns None; it returns immediately if initialization fails and otherwise runs until interrupted. The internal methods _check_for_new_files() and _process_file() return None and act through side effects (file processing and uploads); both catch exceptions and print them rather than raising.

Class Interface

Methods

__init__(self, onedrive_config: Dict[str, Any], api_key: str)

Purpose: Initialize the OneDrive processor with configuration and API credentials

Parameters:

  • onedrive_config: Dictionary containing OneDrive settings and authentication credentials
  • api_key: OpenAI API key for LLM processing

Returns: None (constructor)

async initialize(self) -> bool

Purpose: Authenticate with OneDrive and ensure required folders exist

Returns: Boolean indicating whether initialization was successful

async start_watching(self) -> None

Purpose: Start continuous monitoring of OneDrive folder for new files to process

Returns: None (returns immediately if initialization fails; otherwise runs until KeyboardInterrupt)

async _check_for_new_files(self) -> None

Purpose: Check OneDrive watch folder for new files and process any found

Returns: None (performs side effects: processes files and updates processed_files set)
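The de-duplication step in _check_for_new_files() can be sketched in isolation. The function select_new_files and the sample listing are illustrative stand-ins; the real method gets its listing from OneDriveClient.list_files_in_folder().

```python
from typing import Any, Dict, List, Set


def select_new_files(files: List[Dict[str, Any]], processed_ids: Set[str]) -> List[Dict[str, Any]]:
    """Return only the entries whose 'id' has not been seen before."""
    return [f for f in files if f["id"] not in processed_ids]


processed: Set[str] = {"a1"}
listing = [
    {"id": "a1", "name": "old.pdf"},
    {"id": "b2", "name": "new.png"},
]

new_files = select_new_files(listing, processed)
for file_info in new_files:
    # As in the source, the ID is recorded once the file has been handled.
    processed.add(file_info["id"])

print([f["name"] for f in new_files])
```

A second poll over the same listing selects nothing, because both IDs are now in the set.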

async _process_file(self, file_info: Dict[str, Any]) -> None

Purpose: Download a file from OneDrive, process it through LLM, and upload result

Parameters:

  • file_info: Dictionary containing file metadata including 'id', 'name', and download information

Returns: None (performs side effects: downloads, processes, uploads files)
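As an alternative to the fixed 'temp_onedrive' directory used by _process_file(), the download step could run inside a tempfile.TemporaryDirectory, which is removed even when processing fails. This is a sketch of that swapped-in technique, not the source's behavior; stage_download is a hypothetical name, and the bytes payload stands in for the real download.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def stage_download(name: str, payload: bytes) -> Tuple[int, bool]:
    """Write a payload into a throwaway directory and report what happened.

    Returns the number of bytes written and whether the directory still
    exists after the context manager has exited.
    """
    with tempfile.TemporaryDirectory(prefix="onedrive_") as tmp:
        tmp_path = Path(tmp)
        # Keep only the final path component so a remote name cannot
        # escape the temporary directory.
        local_input = tmp_path / Path(name).name
        local_input.write_bytes(payload)
        size = local_input.stat().st_size
    return size, tmp_path.exists()


size, still_exists = stage_download("../scan.pdf", b"abc")
print(size, still_exists)
```

The context manager takes over the cleanup that the source does with explicit unlink() calls, so nothing is left behind on an exception path.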

Attributes

Name | Type | Description | Scope
client | OneDriveClient | OneDrive client instance for API interactions | instance
api_key | str | OpenAI API key for LLM processing | instance
config | Dict[str, Any] | Full OneDrive configuration dictionary | instance
watch_folder | str | OneDrive folder path to monitor for new files (default: '/E-Ink LLM Input') | instance
output_folder | str | OneDrive folder path where processed files are uploaded (default: '/E-Ink LLM Output') | instance
poll_interval | int | Seconds between checks for new files (default: 60) | instance
processed_files | set | Set of file IDs that have already been processed to avoid reprocessing | instance
supported_extensions | List[str] | List of file extensions that can be processed: ['.pdf', '.jpg', '.jpeg', '.png', '.gif', '.bmp', '.tiff', '.webp'] | instance
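How OneDriveClient.list_files_in_folder() applies supported_extensions is not shown in this section. The sketch below assumes a case-insensitive match on the file suffix; is_supported is a hypothetical helper.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = ['.pdf', '.jpg', '.jpeg', '.png', '.gif', '.bmp', '.tiff', '.webp']


def is_supported(filename: str) -> bool:
    """Assumed rule: compare the lower-cased final suffix against the list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ("Scan.PDF", "photo.jpeg", "notes.docx"):
    print(name, is_supported(name))
```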

Dependencies

  • msal
  • requests
  • asyncio
  • pathlib

Required Imports

import os
import json
import time
import asyncio
from pathlib import Path
from typing import Dict, List, Optional, Any
import hashlib
import msal
import requests
from datetime import datetime, timedelta

Conditional/Optional Imports

These imports are only needed under specific conditions:

from processor import process_single_file

Condition: Required when _process_file method is called to process downloaded files through the E-Ink LLM Assistant

Required (conditional)
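The deferred import inside _process_file() means a missing processor module only surfaces when the first file is processed. A caller who wants a clearer failure could resolve the function up front, as in this sketch. load_callable is a hypothetical helper, and json.dumps stands in for process_single_file so that the example runs without the project installed.

```python
import importlib
from typing import Any, Callable


def load_callable(module_name: str, attr_name: str) -> Callable[..., Any]:
    """Import a module by name and return one of its callables."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(f"Optional module '{module_name}' is not importable") from exc
    func = getattr(module, attr_name)
    if not callable(func):
        raise TypeError(f"{module_name}.{attr_name} is not callable")
    return func


# In the real project this would be load_callable("processor", "process_single_file").
dumps = load_callable("json", "dumps")
print(dumps({"ok": True}))
```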

Usage Example

import asyncio
from onedrive_client import OneDriveProcessor

# Configuration
onedrive_config = {
    'client_id': 'your-client-id',
    'client_secret': 'your-client-secret',
    'tenant_id': 'your-tenant-id',
    'watch_folder_path': '/E-Ink LLM Input',
    'output_folder_path': '/E-Ink LLM Output',
    'poll_interval': 60,
    'delete_after_processing': False
}
api_key = 'your-openai-api-key'

# Create processor instance
processor = OneDriveProcessor(onedrive_config, api_key)

# Start watching (runs indefinitely)
async def main():
    await processor.start_watching()

asyncio.run(main())

Best Practices

  • Always call initialize() or start_watching() (which calls initialize internally) before attempting to process files
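  • The class tracks handled file IDs in the in-memory processed_files set to avoid reprocessing; the set is not persisted, so files still in the watch folder are picked up again after a restart, and a file is added to the set even when its processing fails, so it is not retried within the same run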
  • Use start_watching() for continuous monitoring; it handles initialization automatically
  • Ensure OneDriveClient is properly configured with valid authentication credentials before instantiation
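  • The class writes temporary files to a 'temp_onedrive' directory relative to the current working directory; the downloaded input and the result file are deleted only once processing returns a result path, and the directory itself is not removed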
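  • start_watching() catches KeyboardInterrupt itself and prints a stop message; when running under asyncio.run(), also handle KeyboardInterrupt around the asyncio.run() call, because on Python 3.11+ Ctrl+C cancels the main task and the exception is re-raised from asyncio.run()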
  • The poll_interval should be set appropriately to balance responsiveness and API rate limits
  • Supported file extensions are hardcoded but can be modified by accessing the supported_extensions attribute
  • Set delete_after_processing to True in config only if you want original files removed after successful processing
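The shutdown advice above can be sketched with a stand-in loop. watch_loop is a hypothetical replacement for start_watching() that polls a fixed number of times so the example terminates; the KeyboardInterrupt handling around asyncio.run() is the part that carries over.

```python
import asyncio


async def watch_loop(poll_interval: float, max_polls: int) -> int:
    """Stand-in for start_watching(): poll a fixed number of times."""
    polls = 0
    try:
        while polls < max_polls:
            polls += 1  # a real loop would check the watch folder here
            await asyncio.sleep(poll_interval)
    except asyncio.CancelledError:
        # On Python 3.11+, Ctrl+C under asyncio.run() cancels the main task,
        # so per-task cleanup belongs here before re-raising.
        print("watch loop cancelled")
        raise
    return polls


def run_watcher() -> int:
    """Run the loop and turn Ctrl+C into a clean exit."""
    try:
        return asyncio.run(watch_loop(0.01, 3))
    except KeyboardInterrupt:
        print("OneDrive watching stopped")
        return -1


result = run_watcher()
print(result)
```

Catching the interrupt at the asyncio.run() call site keeps the shutdown message in one place whichever coroutine was suspended when Ctrl+C arrived.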

Similar Components

AI-powered semantic similarity - components with related functionality:

  • class MixedCloudProcessor 68.9% similar

    A cloud integration processor that monitors both OneDrive and reMarkable Cloud for input PDF files, processes them through an API, and manages file synchronization between cloud services.

    From: /tf/active/vicechatdev/e-ink-llm/mixed_cloud_processor.py
  • class OneDriveClient 61.1% similar

    A comprehensive Microsoft OneDrive client that uses the Microsoft Graph API to authenticate and perform file operations (upload, download, list, delete) on OneDrive storage.

    From: /tf/active/vicechatdev/e-ink-llm/onedrive_client.py
  • class RemarkableEInkProcessor 60.9% similar

    Enhanced E-Ink LLM Processor that extends EInkLLMProcessor with reMarkable Cloud integration, enabling file processing from both local directories and reMarkable Cloud storage.

    From: /tf/active/vicechatdev/e-ink-llm/remarkable_processor.py
  • class EInkLLMProcessor 56.7% similar

    Main processor class that handles the complete workflow

    From: /tf/active/vicechatdev/e-ink-llm/processor.py
  • class DocumentProcessor_v2 55.7% similar

    A document processing class that extracts text from PDF and Word documents using llmsherpa as the primary method with fallback support for PyPDF2, pdfplumber, and python-docx.

    From: /tf/active/vicechatdev/contract_validity_analyzer/utils/document_processor_old.py