🔍 Code Extractor

Browse Components

Showing 20 of 982 components

  • function setup_similarity_cleaner

    A pytest fixture that creates and returns a configured SimilarityCleaner instance with a threshold of 0.8 for use in test cases.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py | Lines: 5-7

    pytest fixture testing similarity data-cleaning
  • function download_model

    Downloads a model file from a specified URL and saves it to a local file path using an HTTP GET request.

    File: /tf/active/vicechatdev/chromadb-cleanup/scripts/download_model.py | Lines: 4-11

    download http file-io model-management network
  • function save_data_to_chromadb

    Saves a list of document dictionaries to a ChromaDB vector database collection, optionally including embeddings and metadata.

    File: /tf/active/vicechatdev/chromadb-cleanup/main copy.py | Lines: 109-167

    chromadb vector-database document-storage embeddings persistence
  • function save_data_to_chromadb_v1

    Saves a list of document dictionaries to a ChromaDB collection, with support for batch processing, embeddings, and metadata storage.

    File: /tf/active/vicechatdev/chromadb-cleanup/main.py | Lines: 168-239

    chromadb vector-database document-storage embeddings batch-processing
  • function load_data_from_chromadb_v1

    Retrieves all documents from a specified ChromaDB collection, including their IDs, text content, embeddings, and metadata.

    File: /tf/active/vicechatdev/chromadb-cleanup/main copy.py | Lines: 69-107

    chromadb database document-retrieval vector-database embeddings
  • function load_data_from_chromadb

    Connects to a ChromaDB instance and retrieves all documents from a specified collection, returning them as a list of dictionaries with document IDs, text content, embeddings, and metadata.

    File: /tf/active/vicechatdev/chromadb-cleanup/main.py | Lines: 123-165

    chromadb vector-database data-loading document-retrieval embeddings
  • function clean_collection

    Cleans a ChromaDB collection by removing duplicate and similar documents using hash-based and similarity-based deduplication techniques, then saves the cleaned data to a new collection.

    File: /tf/active/vicechatdev/chromadb-cleanup/main.py | Lines: 71-120

    data-cleaning deduplication chromadb vector-database similarity-detection
  • class DocumentProtector

    A class that protects PDF documents from editing by applying encryption and permission restrictions using the pikepdf and PyMuPDF libraries.

    File: /tf/active/vicechatdev/document_auditor/src/security/document_protection.py | Lines: 9-118

    pdf security encryption document-protection permissions
  • class SignatureManager

    A class that manages digital signature images for documents, providing functionality to store, retrieve, and list signature files in a designated directory.

    File: /tf/active/vicechatdev/document_auditor/src/security/signature_manager.py | Lines: 6-141

    signature-management document-processing file-management image-processing digital-signatures
  • class Watermarker

    A class that adds watermark images to PDF documents with configurable opacity, scale, and positioning options.

    File: /tf/active/vicechatdev/document_auditor/src/security/watermark.py | Lines: 8-178

    pdf watermark document-processing image-processing pdf-manipulation
  • class HashGenerator

    A class that provides cryptographic hashing functionality for PDF documents, including hash generation, embedding, and verification for document integrity checking.

    File: /tf/active/vicechatdev/document_auditor/src/security/hash_generator.py | Lines: 11-215

    cryptography hashing SHA-256 PDF document-integrity
  • class SignatureGenerator

    A class that generates signature-like images from text names using italic fonts and decorative flourishes.

    File: /tf/active/vicechatdev/document_auditor/src/utils/signature_generator.py | Lines: 10-134

    image-generation signature PIL graphics text-rendering
  • class PDFAConverter

    A class that converts PDF files to PDF/A format for long-term archiving and compliance, supporting multiple compliance levels (1b, 2b, 3b) with fallback conversion methods.

    File: /tf/active/vicechatdev/document_auditor/src/utils/pdf_utils.py | Lines: 8-145

    pdf pdf-a document-conversion archiving compliance
  • class AuditPageGenerator

    A class that generates comprehensive PDF audit trail pages for documents, including document information, reviews, approvals, revision history, and event history with electronic signatures.

    File: /tf/active/vicechatdev/document_auditor/src/audit_page_generator.py | Lines: 55-434

    pdf-generation audit-trail document-management compliance electronic-signature
  • class SignatureImage

    A custom ReportLab Flowable class that renders signature images in PDF documents with automatic fallback to placeholder text when images are unavailable or cannot be loaded.

    File: /tf/active/vicechatdev/document_auditor/src/audit_page_generator.py | Lines: 26-52

    pdf-generation reportlab flowable signature image-rendering
  • class DocumentMerger

    A class that merges PDF documents with audit trail pages, combining an original PDF with an audit page and updating metadata to reflect the audit process.

    File: /tf/active/vicechatdev/document_auditor/src/document_merger.py | Lines: 5-72

    pdf document-processing merge audit-trail file-operations
  • class DocumentProcessor

    A comprehensive document processing class that converts documents to PDF, adds audit trails, applies security features (watermarks, signatures, hashing), and optionally converts to PDF/A format with document protection.

    File: /tf/active/vicechatdev/document_auditor/src/document_processor.py | Lines: 16-175

    document-processing pdf-generation audit-trail security watermarking
  • class DocumentProcessor_v1

    A document processing class that extracts text from PDF and Word documents using llmsherpa as the primary method with fallback support for PyPDF2, pdfplumber, and python-docx.

    File: /tf/active/vicechatdev/contract_validity_analyzer/utils/document_processor_new.py | Lines: 13-302

    document-processing text-extraction pdf-processing word-processing llmsherpa
  • class DocumentConverter_v1

    A class that converts various document formats (Word, Excel, PowerPoint, images) to PDF format using LibreOffice, unoconv, or PIL.

    File: /tf/active/vicechatdev/document_auditor/src/document_converter.py | Lines: 8-136

    document-conversion pdf file-processing office-documents image-to-pdf
  • function create_signature_image

    Generates a synthetic signature image for a given name, either as stylized text or as a random hand-drawn curve, and saves it as a PNG file with transparent background.

    File: /tf/active/vicechatdev/document_auditor/generate_sample_signatures.py | Lines: 17-140

    image-generation signature PIL graphics document-generation
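
The clean_collection entry above describes hash-based deduplication followed by similarity-based deduplication. The listing does not show the implementation, so the following is only a minimal sketch of that general technique using the standard library. The function names, the document dictionary layout (`id`, `text`, `embedding`), and the use of cosine similarity are assumptions made for illustration. The 0.8 threshold mirrors the value mentioned for the setup_similarity_cleaner fixture.

```python
import hashlib
import math


def remove_exact_duplicates(docs):
    """Keep the first document for each distinct SHA-256 hash of its text."""
    seen = set()
    unique = []
    for doc in docs:
        digest = hashlib.sha256(doc["text"].encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def remove_similar(docs, threshold=0.8):
    """Greedily keep a document only if it is below the threshold against all kept ones."""
    kept = []
    for doc in docs:
        if all(
            cosine_similarity(doc["embedding"], other["embedding"]) < threshold
            for other in kept
        ):
            kept.append(doc)
    return kept


# Hypothetical sample data: "b" repeats the text of "a", "c" is a near-duplicate of "a".
docs = [
    {"id": "a", "text": "hello world", "embedding": [1.0, 0.0]},
    {"id": "b", "text": "hello world", "embedding": [1.0, 0.0]},
    {"id": "c", "text": "hello there world", "embedding": [0.9, 0.1]},
    {"id": "d", "text": "something else", "embedding": [0.0, 1.0]},
]

cleaned = remove_similar(remove_exact_duplicates(docs), threshold=0.8)
print([d["id"] for d in cleaned])
```

Running the exact-hash pass first is cheap and shrinks the input to the pairwise similarity pass, which compares each document against every document already kept.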