🔍 Code Extractor

Search Components

Full-Text: Fast keyword matching | Semantic: AI-powered understanding of intent (finds similar concepts)

Search Results for "duplicate"

Found 50 matching component(s)

  • class MetadataCatalog

    Helper class for managing FileCloud metadata sets and attributes through a more user-friendly interface on top of the raw API.

    File: /tf/active/vicechatdev/metadata_catalog.py

    class metadatacatalog
  • class OneCo_hybrid_RAG

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG copy.py

    class oneco_hybrid_rag
  • class ReferenceManager

    Manages document references for inline citation and bibliography generation in a RAG (Retrieval-Augmented Generation) system.

    File: /tf/active/vicechatdev/fixed_project_victoria_generator.py

    citation bibliography reference-management document-tracking RAG
  • class OneCo_hybrid_RAG_v1

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG_old.py

    class oneco_hybrid_rag
  • function create_folder_hierarchy

    Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file system path, connecting each folder level with PATH relationships.

    File: /tf/active/vicechatdev/offline_docstore_multi_vice.py

    neo4j graph-database file-system hierarchy folder-structure
  • class OneCo_hybrid_RAG_v2

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG.py

    class oneco_hybrid_rag
  • class ExtensiveSearchManager

    Manages extensive search functionality including full document retrieval, summarization, and enhanced context gathering.

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG.py

    class extensivesearchmanager
  • function create_folder_hierarchy_v1

    Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file path, connecting each folder level with PATH relationships.

    File: /tf/active/vicechatdev/offline_docstore_multi.py

    neo4j graph-database folder-hierarchy file-system path-processing
  • function api_upload

    Flask API endpoint that handles file uploads, validates file types, saves files to a configured directory structure, and automatically indexes the uploaded document for search/retrieval.

    File: /tf/active/vicechatdev/docchat/app.py

    file-upload api-endpoint document-management rag indexing
  • class DocChatRAG

    Main RAG engine with three operating modes: (1) Basic RAG (similarity search); (2) Extensive (full document retrieval with preprocessing); (3) Full Reading (process all documents).

    File: /tf/active/vicechatdev/docchat/rag_engine.py

    class docchatrag
  • class DocumentIndexer

    A class for indexing documents into ChromaDB with support for multiple file formats (PDF, Word, PowerPoint, Excel, text files), smart incremental indexing, and document chunk management.

    File: /tf/active/vicechatdev/docchat/document_indexer.py

    document-indexing vector-database chromadb embeddings pdf-processing
  • function clean_collection

    Cleans a ChromaDB collection by removing duplicate and similar documents using hash-based and similarity-based deduplication techniques, then saves the cleaned data to a new collection.

    File: /tf/active/vicechatdev/chromadb-cleanup/main.py

    data-cleaning deduplication chromadb vector-database similarity-detection
  • function main_v52

    Command-line interface function that orchestrates a ChromaDB collection cleaning pipeline by removing duplicate and similar documents through hashing and similarity screening.

    File: /tf/active/vicechatdev/chromadb-cleanup/main copy.py

    cli command-line data-cleaning deduplication chromadb
  • function setup_similarity_cleaner

    A pytest fixture that creates and returns a configured SimilarityCleaner instance with a threshold of 0.8 for use in test cases.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py

    pytest fixture testing similarity data-cleaning
  • function test_identical_text_removal

    A pytest test function that verifies the SimilarityCleaner's ability to remove identical duplicate text entries from a list while preserving unique documents.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py

    testing pytest unit-test deduplication text-processing
  • function test_similarity_threshold_effect

    A pytest test function that validates the behavior of SimilarityCleaner with different similarity threshold values, ensuring that higher thresholds retain more texts while lower thresholds are more aggressive in removing similar content.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py

    testing pytest text-deduplication similarity-detection data-cleaning
  • class TestCombinedCleaner

    A unittest test class that validates the functionality of the CombinedCleaner class, testing its ability to remove duplicate and similar texts from collections.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_combined_cleaner.py

    unittest testing text-cleaning deduplication similarity-detection
  • function test_remove_identical_chunks

    A pytest test function that verifies the HashCleaner's ability to remove duplicate text chunks from a list while preserving order and unique entries.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_hash_cleaner.py

    testing pytest unit-test deduplication text-processing
  • function test_no_identical_chunks

    A unit test function that verifies the HashCleaner's behavior when processing a list of unique text chunks, ensuring no chunks are removed when all are distinct.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_hash_cleaner.py

    unit-test pytest hash-cleaner deduplication text-processing
  • function test_identical_chunks_with_different_cases

    A unit test function that verifies the HashCleaner's ability to remove duplicate text chunks while being case-sensitive, ensuring that strings differing only in case are treated as distinct entries.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_hash_cleaner.py

    unit-test pytest deduplication case-sensitive text-processing
  • function build_similarity_matrix

    Computes a pairwise cosine similarity matrix for a collection of embedding vectors, where each cell (i,j) represents the similarity between embedding i and embedding j.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/similarity_utils.py

    embeddings similarity cosine-similarity matrix nlp
  • function find_similar_documents

    Identifies pairs of similar documents by comparing their embeddings and returns those exceeding a specified similarity threshold, sorted by similarity score.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/similarity_utils.py

    document-similarity embedding-comparison duplicate-detection cosine-similarity nlp
  • function hash_text

    Creates a SHA-256 hash of normalized text content to generate a unique identifier for documents, enabling duplicate detection and content comparison.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/hash_utils.py

    hashing text-processing deduplication content-fingerprinting sha256
  • function identify_duplicates

    Identifies duplicate documents by computing hash values of their text content and grouping documents with identical hashes.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/hash_utils.py

    deduplication document-processing hashing data-cleaning duplicate-detection
  • function get_unique_documents

    Identifies and separates unique documents from duplicates in a list by comparing hash values of document text content.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/hash_utils.py

    deduplication document-processing data-cleaning hashing text-processing
  • class HashCleaner

    A document deduplication cleaner that removes documents with identical content by comparing hash values of document text.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/cleaners/hash_cleaner.py

    deduplication data-cleaning hash-based document-processing duplicate-removal
  • class CombinedCleaner

    A document cleaner that combines hash-based and similarity-based cleaning approaches to remove both exact and near-duplicate documents in a two-stage process.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/cleaners/combined_cleaner.py

    document-cleaning deduplication data-processing hash-based similarity-based
  • class SimilarityCleaner

    A document cleaning class that identifies and removes duplicate or highly similar documents based on embedding vector similarity, keeping only representative documents from each similarity group.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/cleaners/similarity_cleaner.py

    document-processing deduplication similarity embeddings clustering
  • function create_document_v5

    Flask API endpoint that creates a new document or duplicates an existing document with options to copy or reference sections.

    File: /tf/active/vicechatdev/vice_ai/new_app.py

    flask api-endpoint document-management create duplicate
  • function create_text_section_for_document

    Flask API endpoint that creates or adds text sections to a document with three action modes: creating new sections, adding existing sections, or duplicating existing sections.

    File: /tf/active/vicechatdev/vice_ai/new_app.py

    flask api-endpoint document-management text-sections crud-operations
  • function add_existing_section_to_document

    Flask API endpoint that adds an existing text section to a document with advanced positioning options, copy creation, and access control validation.

    File: /tf/active/vicechatdev/vice_ai/new_app.py

    flask api-endpoint document-management text-section authentication
  • function duplicate_text_section

    Flask API endpoint that creates a duplicate of an existing text section with ownership verification and optional custom title.

    File: /tf/active/vicechatdev/vice_ai/new_app.py

    flask api-endpoint text-section duplicate copy
  • class TextSectionService

    Service class for managing TextSection entities, providing CRUD operations, versioning, chat functionality, and search capabilities.

    File: /tf/active/vicechatdev/vice_ai/services.py

    service-layer text-management versioning crud-operations chat-integration
  • class DocumentService

    Service class for managing Document entities, including creation, retrieval, section management, versioning, and duplication operations.

    File: /tf/active/vicechatdev/vice_ai/services.py

    document-management service-layer crud-operations versioning section-management
  • class OneCo_hybrid_RAG_v3

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/vice_ai/hybrid_rag_engine.py

    class oneco_hybrid_rag
  • class ExtensiveSearchManager_v1

    Manages extensive search functionality including full document retrieval, summarization, and enhanced context gathering.

    File: /tf/active/vicechatdev/vice_ai/hybrid_rag_engine.py

    class extensivesearchmanager
  • function check_document_exists_by_uid

    Queries a Neo4j database to check if a ControlledDocument with a specific UID exists and returns the document object if found.

    File: /tf/active/vicechatdev/CDocs/FC_sync.py

    database neo4j document-management lookup validation
  • function main_v8

    Main execution function that orchestrates the import of controlled documents from FileCloud into a Neo4j database, checking for duplicates and managing document metadata.

    File: /tf/active/vicechatdev/CDocs/FC_sync.py

    document-management filecloud neo4j import batch-processing
  • class ExecutionGuard

    A guard mechanism that prevents recursive or repeated function calls by tracking active executions and enforcing cooldown periods between calls.

    File: /tf/active/vicechatdev/CDocs/__init__.py

    concurrency guard debounce rate-limiting recursion-prevention
  • function guard_execution

    A decorator factory that prevents rapid repeated execution of a function by enforcing a cooldown period between calls.

    File: /tf/active/vicechatdev/CDocs/__init__.py

    decorator rate-limiting throttling cooldown debounce
  • function get_next_document_number

    Atomically retrieves and increments the next sequential document number for a specific document type and department combination from a Neo4j graph database.

    File: /tf/active/vicechatdev/CDocs/db/db_operations.py

    document-management counter sequential-numbering neo4j graph-database
  • function check_node_exists

    Checks if a node with a specified label and matching properties exists in a Neo4j graph database.

    File: /tf/active/vicechatdev/CDocs/db/db_operations.py

    neo4j graph-database node-validation database-query existence-check
  • function node_exists

    Checks if a node with a specific UID exists in a Neo4j graph database by querying for the node and returning a boolean result.

    File: /tf/active/vicechatdev/CDocs/db/db_operations.py

    neo4j graph-database node-validation existence-check database-query
  • function initialize_document_counters

    Initializes document counters in Neo4j by analyzing existing ControlledDocument nodes and creating DocumentCounter nodes with values higher than the maximum existing document numbers for each department/type combination.

    File: /tf/active/vicechatdev/CDocs/db/schema_manager.py

    neo4j database-initialization document-management counter-initialization graph-database
  • function validate_document_number

    Validates a custom document number by checking its format, length constraints, and uniqueness in the database, returning a dictionary with validation results.

    File: /tf/active/vicechatdev/CDocs/controllers/document_controller.py

    validation document-management database-query uniqueness-check format-validation
  • function clone_document

    Clones an existing controlled document, creating a new document with optional content copying, custom properties, and FileCloud integration.

    File: /tf/active/vicechatdev/CDocs/controllers/document_controller.py

    document-management cloning duplication controlled-documents filecloud
  • function add_reviewer_to_active_review

    Adds a reviewer to an active review cycle with optional sequence ordering and instructions, handling permissions, notifications, and audit logging.

    File: /tf/active/vicechatdev/CDocs/controllers/review_controller.py

    document-management review-cycle reviewer-assignment permissions audit-logging
  • function schedule_reminders

    Automated scheduled task that sends reminder and overdue notifications for pending reviews and approvals that are due within 3 days or past their due date.

    File: /tf/active/vicechatdev/CDocs/utils/notifications.py

    scheduled-task notifications reminders neo4j graph-database
  • function check_document_hash_exists

    Checks if a document with a given SHA-256 hash already exists in the database by querying the graph database for matching DocumentVersion nodes.

    File: /tf/active/vicechatdev/CDocs/utils/document_processor.py

    database graph-database neo4j document-management deduplication
  • class DocumentDashboard

    Dashboard for viewing and managing controlled documents.

    File: /tf/active/vicechatdev/CDocs/ui/document_dashboard.py

    class documentdashboard
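
Several of the results above (hash_text, identify_duplicates, find_similar_documents, SimilarityCleaner) describe a two-stage approach: exact duplicates are grouped by a SHA-256 hash of normalized text, and near-duplicates are found by comparing embeddings with cosine similarity against a threshold. The listing does not show the actual implementations, so the following is only a minimal sketch under stated assumptions: the normalization rule (collapse whitespace, keep case), the function signatures, and the helper name find_similar_pairs are all hypothetical. The default threshold of 0.8 is taken from the setup_similarity_cleaner fixture description.

```python
import hashlib
import math
from collections import defaultdict


def hash_text(text: str) -> str:
    """Return a SHA-256 hex digest of the normalized text.

    Assumption: normalization trims and collapses whitespace but keeps case,
    so strings differing only in case hash differently.
    """
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def identify_duplicates(texts):
    """Group indices of texts that share a hash; keep groups of two or more."""
    groups = defaultdict(list)
    for index, text in enumerate(texts):
        groups[hash_text(text)].append(index)
    return {digest: idxs for digest, idxs in groups.items() if len(idxs) > 1}


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar_pairs(embeddings, threshold=0.8):
    """Return (i, j, score) for pairs at or above the threshold, best first.

    Hypothetical helper: compares every pair once (i < j), which is
    quadratic in the number of embeddings.
    """
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            score = cosine_similarity(embeddings[i], embeddings[j])
            if score >= threshold:
                pairs.append((i, j, score))
    pairs.sort(key=lambda pair: pair[2], reverse=True)
    return pairs
```

In a pipeline like the one CombinedCleaner describes, the hash stage would typically run first because it is cheap and shrinks the input to the quadratic similarity stage.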

Search Examples