
function get_invoice_pagedata

Maturity: 44

Retrieves the pagedata content of a specific Poulpharm invoice stored in the reMarkable cloud service: it authenticates, walks the document hierarchy from the root index down to the invoice, and downloads the invoice's pagedata component.

File:
/tf/active/vicechatdev/e-ink-llm/cloudtest/extract_invoice_pagedata.py
Lines:
11 - 91
Complexity:
moderate

Purpose

This function fetches the pagedata of one hardcoded Poulpharm invoice (UUID: cf2a3833-4a8f-4004-ab8d-8dc3c5f561bc) from reMarkable's cloud storage. It authenticates through RemarkableAuth, requests the current root hash, downloads the root index, locates the invoice line by UUID, downloads the invoice's docSchema, finds the line containing '.pagedata', and downloads that component's content. Along the way it prints the docSchema lines and the pagedata content to stdout, so it is mainly useful as a diagnostic step for inspecting the page-level data of a reMarkable document before further processing or analysis.
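Both lookups described above (invoice in the root index, pagedata in the docSchema) follow the same pattern: skip the first line, find the first line that contains a marker, and take the first colon-separated field as the hash. A minimal sketch of that pattern follows. The helper name `find_entry_hash` and the sample index text are hypothetical; only the "hash first, at least five fields" shape comes from the function's own parsing, and nothing is assumed about the meaning of the other fields.

```python
from typing import Optional


def find_entry_hash(index_text: str, marker: str, min_fields: int = 5) -> Optional[str]:
    """Return the first field of the first index line that contains `marker`.

    Mirrors the parsing in get_invoice_pagedata: the first line is treated as
    a version header and skipped, and matching lines with fewer than
    `min_fields` colon-separated fields are ignored.
    """
    lines = index_text.strip().split("\n")
    for line in lines[1:]:  # skip the version header
        if marker not in line:
            continue
        parts = line.split(":")
        if len(parts) >= min_fields:
            return parts[0]
    return None


# Made-up sample data for illustration only; real index lines may differ.
SAMPLE_INDEX = "\n".join([
    "3",
    "aaa111:0:doc-1.content:0:10",
    "bbb222:0:doc-1.pagedata:0:6",
])
```

Calling `find_entry_hash(SAMPLE_INDEX, ".pagedata")` on the sample returns the first field of the matching line, and a marker that appears nowhere yields `None`, which corresponds to the "not found" branches in the function.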

Source Code

def get_invoice_pagedata():
    """Get pagedata from Poulpharm invoice"""
    auth = RemarkableAuth()
    session = auth.get_authenticated_session()
    
    if not session:
        raise RuntimeError("Failed to authenticate")
    
    # Poulpharm invoice UUID
    invoice_uuid = "cf2a3833-4a8f-4004-ab8d-8dc3c5f561bc"
    
    # Get root info
    root_response = session.get("https://eu.tectonic.remarkable.com/sync/v4/root")
    root_response.raise_for_status()
    root_data = root_response.json()
    
    # Get root content
    root_content_response = session.get(f"https://eu.tectonic.remarkable.com/sync/v3/files/{root_data['hash']}")
    root_content_response.raise_for_status()
    root_content = root_content_response.text
    
    # Find invoice in root
    lines = root_content.strip().split('\n')
    invoice_hash = None
    
    for line in lines[1:]:  # Skip version header
        if invoice_uuid in line:
            parts = line.split(':')
            if len(parts) >= 5:
                invoice_hash = parts[0]
                break
    
    if not invoice_hash:
        print(f"❌ Invoice not found in root")
        return None
    
    print(f"✅ Found invoice hash: {invoice_hash}")
    
    # Get invoice docSchema
    doc_response = session.get(f"https://eu.tectonic.remarkable.com/sync/v3/files/{invoice_hash}")
    doc_response.raise_for_status()
    doc_content = doc_response.text
    doc_lines = doc_content.strip().split('\n')
    
    print(f"📄 Invoice docSchema:")
    for i, line in enumerate(doc_lines):
        print(f"   Line {i}: {line}")
    
    # Find pagedata component
    pagedata_hash = None
    pagedata_line = None
    
    for line in doc_lines[1:]:  # Skip version
        if ':' in line and '.pagedata' in line:
            parts = line.split(':')
            if len(parts) >= 5:
                pagedata_hash = parts[0]
                pagedata_line = line
                break
    
    if not pagedata_hash:
        print(f"❌ Pagedata not found in invoice")
        return None
    
    print(f"✅ Found pagedata hash: {pagedata_hash}")
    print(f"✅ Pagedata line: {pagedata_line}")
    
    # Get pagedata content
    pagedata_response = session.get(f"https://eu.tectonic.remarkable.com/sync/v3/files/{pagedata_hash}")
    pagedata_response.raise_for_status()
    pagedata_content = pagedata_response.text
    
    print(f"📄 Pagedata content:")
    print(f"   Size: {len(pagedata_content)} bytes")
    print(f"   Content: {repr(pagedata_content)}")
    
    return {
        'hash': pagedata_hash,
        'content': pagedata_content,
        'line': pagedata_line
    }

Return Value

Returns a dictionary with three keys if successful: 'hash' (string containing the pagedata component's hash identifier), 'content' (string containing the raw pagedata content), and 'line' (string containing the full docSchema line for the pagedata component). Returns None if the invoice is not found in the root index or if no pagedata component is found within the invoice. If authentication fails, the function raises RuntimeError rather than returning None, and any HTTP error from the four requests propagates from raise_for_status().
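Because the source raises RuntimeError on a failed authentication and lets raise_for_status() errors propagate, a caller has three outcomes to handle: a dictionary, None, or an exception. The sketch below shows one way to separate them. The wrapper name `summarize_pagedata_call` is hypothetical, and catching OSError assumes the session is a requests session, whose exceptions derive from OSError.

```python
from typing import Callable, Optional


def summarize_pagedata_call(fetch: Callable[[], Optional[dict]]) -> str:
    """Call `fetch` and turn each outcome into a short status string."""
    try:
        result = fetch()
    except RuntimeError as exc:
        # Raised by get_invoice_pagedata when no authenticated session is available.
        return f"auth error: {exc}"
    except OSError as exc:
        # Assumption: requests-style exceptions, which subclass OSError,
        # so this covers HTTP errors and network failures.
        return f"request error: {exc}"
    if result is None:
        # The invoice or its pagedata entry was not found.
        return "not found"
    return f"ok: hash={result['hash']}, {len(result['content'])} chars"
```

In practice the wrapper would be called as `summarize_pagedata_call(get_invoice_pagedata)`; passing the function in as a callable also makes the handling easy to exercise without network access.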

Dependencies

  • requests
  • auth

Required Imports

import json
from auth import RemarkableAuth

Usage Example

# Assumes extract_invoice_pagedata.py is on the import path
from extract_invoice_pagedata import get_invoice_pagedata

# RemarkableAuth (used internally) must be configured with valid credentials.
# A failed authentication raises RuntimeError; HTTP failures raise from raise_for_status().
result = get_invoice_pagedata()

if result:
    print(f"Pagedata hash: {result['hash']}")
    print(f"Content size: {len(result['content'])} characters")
    print(f"DocSchema line: {result['line']}")
    # Process the pagedata content
    pagedata_content = result['content']
else:
    print("Invoice or pagedata entry not found")

Best Practices

  • The invoice UUID is hardcoded, making this function specific to one document. Consider parameterizing the UUID for reusability.
  • The function prints debug information to stdout. Consider using a logging framework for production use.
  • Error handling could be improved: HTTP errors are raised by raise_for_status() but never caught, and a failed authentication raises RuntimeError, so callers must handle both or the script will crash.
  • The function assumes a specific format for the docSchema (colon-separated values). Changes to the API format could break parsing.
  • Consider adding retry logic for network requests to handle transient failures.
  • The session object should ideally be reused across multiple calls rather than creating a new authenticated session each time.
  • Add timeout parameters to HTTP requests to prevent hanging on slow connections.
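Several of the points above (logging instead of print, timeouts, retries, session reuse) can be combined in one small helper. The following is a minimal sketch, not the project's implementation: `fetch_text` and its parameters are hypothetical, the session is assumed to expose a requests-style `get(url, timeout=...)`, and it retries on any OSError for brevity, whereas a real version would probably retry only on transient failures.

```python
import logging
import time

logger = logging.getLogger(__name__)


def fetch_text(session, url, timeout=10.0, attempts=3, backoff=0.5):
    """GET `url` with a timeout, retrying on OSError, and return the body text.

    `session` is any object with a requests-style `get(url, timeout=...)`
    method, so one authenticated session can be passed in and reused.
    """
    for attempt in range(1, attempts + 1):
        try:
            response = session.get(url, timeout=timeout)
            response.raise_for_status()
            return response.text
        except OSError as exc:  # assumption: requests' exceptions subclass OSError
            logger.warning("GET %s failed (attempt %d/%d): %s", url, attempt, attempts, exc)
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            time.sleep(backoff * attempt)  # simple linear backoff between attempts
```

With a helper like this, each of the three file downloads in the function could become a single `fetch_text(session, url)` call, and the session would be created once by the caller instead of inside the function.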

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function find_invoice_uuid 70.1% similar

    Searches through all documents in a Remarkable cloud storage account to find documents with 'invoice' in their name and prints their UUIDs.

    From: /tf/active/vicechatdev/e-ink-llm/cloudtest/find_invoice_uuid.py
  • function analyze_pylontech_document 54.0% similar

    Performs deep forensic analysis of a specific Pylontech document stored in reMarkable Cloud, examining all document components (content, metadata, pagedata, PDF) to identify patterns and differences between app-uploaded and API-uploaded documents.

    From: /tf/active/vicechatdev/e-ink-llm/cloudtest/analyze_pylontech_details.py
  • function main_v113 48.8% similar

    Analyzes and compares .content files for PDF documents stored in reMarkable cloud storage, identifying differences between working and non-working documents.

    From: /tf/active/vicechatdev/e-ink-llm/cloudtest/analyze_content_files.py
  • function process_single_remarkable_file 48.4% similar

    Asynchronously processes a single document from reMarkable Cloud by downloading it, processing it through an e-ink LLM processor, and returning the path to the generated response PDF.

    From: /tf/active/vicechatdev/e-ink-llm/remarkable_processor.py
  • function show_current_root 48.3% similar

    Fetches and displays the current root.docSchema from the reMarkable cloud sync service, showing metadata and analyzing document entries.

    From: /tf/active/vicechatdev/e-ink-llm/cloudtest/show_current_root.py