function get_invoice_pagedata
Retrieves pagedata content from a specific Poulpharm invoice document stored in the Remarkable cloud service by authenticating, navigating the document hierarchy, and extracting the pagedata component.
/tf/active/vicechatdev/e-ink-llm/cloudtest/extract_invoice_pagedata.py
11 - 91
moderate
Purpose
This function is designed to fetch pagedata from a hardcoded Poulpharm invoice (UUID: cf2a3833-4a8f-4004-ab8d-8dc3c5f561bc) stored in Remarkable's cloud storage. It authenticates with Remarkable's API, retrieves the root document structure, locates the specific invoice by UUID, finds the pagedata component within the invoice's docSchema, and downloads its content. The function is useful for extracting page-specific data from Remarkable documents for further processing or analysis.
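The lookup steps rely on the colon-separated index format that the source code parses: the first line is a version header, and each following line carries the entry hash in its first field. A minimal sketch of that parsing in isolation, assuming only the layout visible in the source; the helper name `find_entry_hash` and the sample data are hypothetical and made up for illustration:

```python
from typing import Optional


def find_entry_hash(index_text: str, needle: str, min_fields: int = 5) -> Optional[str]:
    """Return the first field of the first index line that contains `needle`.

    Assumes the layout parsed by get_invoice_pagedata: a version header on the
    first line, then colon-separated entries whose first field is the hash.
    """
    lines = index_text.strip().split("\n")
    for line in lines[1:]:  # skip the version header
        if needle in line:
            parts = line.split(":")
            if len(parts) >= min_fields:
                return parts[0]
    return None


# Made-up sample data; real hashes and field values will differ.
sample_index = "3\nabc123:x:cf2a3833-4a8f-4004-ab8d-8dc3c5f561bc:x:x\n"
print(find_entry_hash(sample_index, "cf2a3833-4a8f-4004-ab8d-8dc3c5f561bc"))
```

The same helper would cover both lookups in the function: the invoice UUID in the root index, and `.pagedata` in the invoice's docSchema.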
Source Code
def get_invoice_pagedata():
    """Get pagedata from Poulpharm invoice"""
    auth = RemarkableAuth()
    session = auth.get_authenticated_session()
    if not session:
        raise RuntimeError("Failed to authenticate")

    # Poulpharm invoice UUID
    invoice_uuid = "cf2a3833-4a8f-4004-ab8d-8dc3c5f561bc"

    # Get root info
    root_response = session.get("https://eu.tectonic.remarkable.com/sync/v4/root")
    root_response.raise_for_status()
    root_data = root_response.json()

    # Get root content
    root_content_response = session.get(f"https://eu.tectonic.remarkable.com/sync/v3/files/{root_data['hash']}")
    root_content_response.raise_for_status()
    root_content = root_content_response.text

    # Find invoice in root
    lines = root_content.strip().split('\n')
    invoice_hash = None
    for line in lines[1:]:  # Skip version header
        if invoice_uuid in line:
            parts = line.split(':')
            if len(parts) >= 5:
                invoice_hash = parts[0]
                break

    if not invoice_hash:
        print(f"❌ Invoice not found in root")
        return None

    print(f"✅ Found invoice hash: {invoice_hash}")

    # Get invoice docSchema
    doc_response = session.get(f"https://eu.tectonic.remarkable.com/sync/v3/files/{invoice_hash}")
    doc_response.raise_for_status()
    doc_content = doc_response.text
    doc_lines = doc_content.strip().split('\n')

    print(f"📄 Invoice docSchema:")
    for i, line in enumerate(doc_lines):
        print(f"   Line {i}: {line}")

    # Find pagedata component
    pagedata_hash = None
    pagedata_line = None
    for line in doc_lines[1:]:  # Skip version
        if ':' in line and '.pagedata' in line:
            parts = line.split(':')
            if len(parts) >= 5:
                pagedata_hash = parts[0]
                pagedata_line = line
                break

    if not pagedata_hash:
        print(f"❌ Pagedata not found in invoice")
        return None

    print(f"✅ Found pagedata hash: {pagedata_hash}")
    print(f"✅ Pagedata line: {pagedata_line}")

    # Get pagedata content
    pagedata_response = session.get(f"https://eu.tectonic.remarkable.com/sync/v3/files/{pagedata_hash}")
    pagedata_response.raise_for_status()
    pagedata_content = pagedata_response.text

    print(f"📄 Pagedata content:")
    print(f"   Size: {len(pagedata_content)} bytes")
    print(f"   Content: {repr(pagedata_content)}")

    return {
        'hash': pagedata_hash,
        'content': pagedata_content,
        'line': pagedata_line
    }
Return Value
Returns a dictionary with three keys if successful: 'hash' (string containing the pagedata component's hash identifier), 'content' (string containing the raw pagedata content), and 'line' (string containing the full docSchema line for the pagedata component). Returns None if the invoice is not found in the root index or the pagedata component is not found within the invoice. Authentication failure does not return None: it raises RuntimeError. HTTP errors raised by raise_for_status() also propagate to the caller.
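As a rough illustration of handling these outcomes, a caller could separate three cases: a result dictionary, a `None` return, and the `RuntimeError` that the source code raises when authentication fails. This is a minimal sketch, not part of the original module; `summarize_pagedata` is a hypothetical helper, and it takes the fetch function as an argument so it can be tried without network access.

```python
from typing import Callable, Optional


def summarize_pagedata(fetch: Callable[[], Optional[dict]]) -> str:
    """Call `fetch` and describe the outcome as a short status string."""
    try:
        result = fetch()
    except RuntimeError as exc:
        # The source raises RuntimeError("Failed to authenticate").
        return f"error: {exc}"
    if result is None:
        # The invoice or its pagedata component was not found.
        return "not found"
    return f"ok: hash={result['hash']}, {len(result['content'])} characters"


# Stand-in fetchers with made-up values, in place of get_invoice_pagedata.
print(summarize_pagedata(lambda: {"hash": "h", "content": "abc", "line": "l"}))
print(summarize_pagedata(lambda: None))
```

With the real function, the call would be `summarize_pagedata(get_invoice_pagedata)`. HTTP errors are deliberately left uncaught in this sketch and would still propagate.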
Dependencies
requests
auth
Required Imports
import json
from auth import RemarkableAuth
Usage Example
# Assumes the function is importable from extract_invoice_pagedata.py
from extract_invoice_pagedata import get_invoice_pagedata

# Ensure RemarkableAuth is configured with valid credentials
result = get_invoice_pagedata()
if result:
    print(f"Pagedata hash: {result['hash']}")
    print(f"Content size: {len(result['content'])} characters")
    print(f"DocSchema line: {result['line']}")
    # Process the pagedata content
    pagedata_content = result['content']
else:
    print("Failed to retrieve pagedata")
Best Practices
- The invoice UUID is hardcoded, making this function specific to one document. Consider parameterizing the UUID for reusability.
- The function prints debug information to stdout. Consider using a logging framework for production use.
- HTTP errors raised by raise_for_status() are not caught inside the function and propagate to the caller. Wrap the call in try/except if a failed request should not terminate the program.
- The function assumes a specific format for the docSchema (colon-separated values). Changes to the API format could break parsing.
- Consider adding retry logic for network requests to handle transient failures.
- The session object should ideally be reused across multiple calls rather than creating a new authenticated session each time.
- Add timeout parameters to HTTP requests to prevent hanging on slow connections.
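Several of the points above (parameterizing the lookup, reusing a session, passing timeouts, and logging instead of printing) can be combined in one helper. The following is a sketch under stated assumptions, not the module's actual code: `fetch_component`, `FILES_URL`, `FakeSession`, and `FakeResponse` are hypothetical names, the 30-second default timeout is arbitrary, and the demo data is made up so the block runs without network access.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# Endpoint prefix copied from the source code above.
FILES_URL = "https://eu.tectonic.remarkable.com/sync/v3/files/"


def fetch_component(session, parent_hash: str, needle: str,
                    timeout: float = 30.0) -> Optional[dict]:
    """Download the first entry containing `needle` from the index at `parent_hash`.

    `session` is any object with a requests-style get(url, timeout=...) method,
    so one authenticated session can be shared across many calls.
    """
    index = session.get(FILES_URL + parent_hash, timeout=timeout)
    index.raise_for_status()
    for line in index.text.strip().split("\n")[1:]:  # skip the version header
        parts = line.split(":")
        if needle in line and len(parts) >= 5:
            logger.info("Found %r under hash %s", needle, parts[0])
            blob = session.get(FILES_URL + parts[0], timeout=timeout)
            blob.raise_for_status()
            return {"hash": parts[0], "content": blob.text, "line": line}
    logger.warning("No entry containing %r in %s", needle, parent_hash)
    return None


class FakeResponse:
    """In-memory stand-in for a response, used only for this demonstration."""

    def __init__(self, text: str):
        self.text = text

    def raise_for_status(self) -> None:
        pass


class FakeSession:
    """In-memory stand-in for an authenticated session (made-up data)."""

    def __init__(self, pages: dict):
        self.pages = pages

    def get(self, url: str, timeout: Optional[float] = None) -> FakeResponse:
        return FakeResponse(self.pages[url])


demo = FakeSession({
    FILES_URL + "dochash": "3\npdhash:x:example.pagedata:x:x\n",
    FILES_URL + "pdhash": "Blank\n",
})
result = fetch_component(demo, "dochash", ".pagedata", timeout=10.0)
print(result)
```

With a real session from `RemarkableAuth().get_authenticated_session()`, the helper could presumably be called once with the root hash and a document UUID, then again with the returned hash and `.pagedata`. Retry logic is not shown here.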
Similar Components
AI-powered semantic similarity - components with related functionality:
- function find_invoice_uuid (70.1% similar)
- function analyze_pylontech_document (54.0% similar)
- function main_v113 (48.8% similar)
- function process_single_remarkable_file (48.4% similar)
- function show_current_root (48.3% similar)