function search_documents

Maturity: 59

Searches for documents in a Neo4j graph database based on multiple optional filter criteria including text query, document type, department, status, and owner.

File: /tf/active/vicechatdev/document_controller_backup.py
Lines: 308 - 402
Complexity: moderate

Purpose

This function provides a flexible document search capability for a controlled document management system. It constructs and executes a Cypher query against a Neo4j database to retrieve documents matching specified criteria. The function supports text-based searching across document titles and descriptions, as well as filtering by metadata fields. It includes logging via a decorator and returns results as a list of dictionaries, making it suitable for API endpoints or UI search interfaces.
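
The query construction described above can be isolated as a pure function, which makes it testable without a database connection. The following is a minimal sketch, not the module's actual code: `build_search_query` is a hypothetical name, and it only mirrors the filter-to-clause mapping shown in the source below.

```python
def build_search_query(query=None, doc_type=None, department=None,
                       status=None, owner=None, limit=100):
    """Return (cypher, params) for the given filters without touching the database.

    Hypothetical helper that mirrors the clause-building logic of search_documents.
    """
    # Each entry: (parameter name, supplied value, Cypher condition it contributes).
    candidates = [
        ("query", query, "(d.title CONTAINS $query OR d.description CONTAINS $query)"),
        ("doc_type", doc_type, "d.doc_type = $doc_type"),
        ("department", department, "d.department = $department"),
        ("status", status, "d.status = $status"),
        ("owner", owner, "d.owner_id = $owner"),
    ]
    where_clauses = []
    params = {}
    for name, value, clause in candidates:
        if value:  # skip None and empty strings, as the original function does
            where_clauses.append(clause)
            params[name] = value

    lines = ["MATCH (d:Document)"]
    if where_clauses:
        lines.append("WHERE " + " AND ".join(where_clauses))
    lines.append("RETURN d")
    lines.append("ORDER BY d.created_date DESC")
    lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines), params


cypher, params = build_search_query(doc_type="policy", status="PUBLISHED", limit=5)
print(cypher)
print(params)
```

Keeping the builder separate from query execution means the generated Cypher can be asserted on directly in unit tests.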

Source Code

def search_documents(query=None, doc_type=None, department=None, status=None, owner=None, limit=100, user=None):
    """
    Search for documents based on criteria.
    
    Parameters
    ----------
    query : str, optional
        Text search query
    doc_type : str, optional
        Document type to filter by
    department : str, optional
        Department to filter by
    status : str, optional
        Status to filter by
    owner : str, optional
        Owner UID to filter by
    limit : int, optional
        Maximum number of results to return
    user : DocUser, optional
        The current user (for permission filtering)
        
    Returns
    -------
    List[Dict[str, Any]]
        List of document dictionaries matching the search criteria
    """
    try:
        from CDocs.db import db_operations
        
        logger.info("Controller action: search_documents")
        
        # Build the Cypher query
        cypher_query = """
        MATCH (d:Document)
        """
        
        # Add optional filters
        where_clauses = []
        params = {}
        
        if query:
            where_clauses.append("(d.title CONTAINS $query OR d.description CONTAINS $query)")
            params["query"] = query
        
        if doc_type:
            where_clauses.append("d.doc_type = $doc_type")
            params["doc_type"] = doc_type
        
        if department:
            where_clauses.append("d.department = $department")
            params["department"] = department
        
        if status:
            where_clauses.append("d.status = $status")
            params["status"] = status
        
        if owner:
            where_clauses.append("d.owner_id = $owner")
            params["owner"] = owner
        
        # Add WHERE clause if we have any conditions
        if where_clauses:
            cypher_query += "WHERE " + " AND ".join(where_clauses)
        
        # Add permission filtering if user is provided
        # This is commented out for now as it depends on schema details
        # if user and hasattr(user, 'uid') and user.role != 'ADMIN':
        #    # Only add more WHERE conditions if we already have some
        #    connector = "AND" if where_clauses else "WHERE"
        #    cypher_query += f" {connector} (d.owner_id = $user_id OR d.is_public = true)"
        #    params["user_id"] = user.uid
        
        # Add RETURN clause with LIMIT
        cypher_query += f"""
        RETURN d 
        ORDER BY d.created_date DESC
        LIMIT {int(limit)}
        """
        
        # Execute query
        result = db_operations.run_query(cypher_query, params)
        
        # Process results into a list of document dictionaries
        documents = []
        if result:
            for record in result:
                if 'd' in record:
                    document = dict(record['d'])
                    documents.append(document)
        
        return documents
        
    except Exception as e:
        logger.error(f"Error in controller action search_documents: {e}")
        raise e

Parameters

Name Type Default Kind
query - None positional_or_keyword
doc_type - None positional_or_keyword
department - None positional_or_keyword
status - None positional_or_keyword
owner - None positional_or_keyword
limit - 100 positional_or_keyword
user - None positional_or_keyword

Parameter Details

query: Optional text string to search within document titles and descriptions using case-sensitive CONTAINS matching. Can be any string value or None to skip text search filtering.
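
If case-insensitive matching is needed, one option is to lower-case both sides of the comparison. The sketch below assumes Cypher's `toLower()` function; `build_text_clause` is a hypothetical helper and not part of the original module. Wrapping the property in a function call may prevent an index from being used, so treat this as a starting point only.

```python
def build_text_clause(query):
    """Return (clause, params) for a case-insensitive title/description match.

    Hypothetical helper, not part of the original module.
    Returns (None, {}) when no query text is supplied.
    """
    if not query:
        return None, {}
    clause = (
        "(toLower(d.title) CONTAINS $query "
        "OR toLower(d.description) CONTAINS $query)"
    )
    # Lower-case the search term once in Python so both sides use the same case.
    return clause, {"query": query.lower()}


clause, params = build_text_clause("Safety")
print(clause)
print(params)
```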

doc_type: Optional string to filter documents by their type classification (e.g., 'policy', 'procedure', 'form'). Must match the exact value stored in the document's doc_type property.

department: Optional string to filter documents by the department they belong to. Must match the exact department name stored in the document's department property.

status: Optional string to filter documents by their current status (e.g., 'DRAFT', 'PUBLISHED', 'ARCHIVED'). Must match one of the valid status values defined in the system.

owner: Optional string representing the owner's unique identifier (UID) to filter documents by ownership. Must match the owner_id property stored on document nodes.

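limit: Integer specifying the maximum number of documents to return. Defaults to 100. Expected to be a positive integer; the value is cast with int() before being interpolated into the query string, so a non-numeric value raises an error instead of reaching the database.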
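
The `int()` cast in the source does not reject zero or negative values, and it places no upper bound on the result size. When `limit` comes from user input, a caller could normalize it first. This is a minimal sketch: `normalize_limit` and the ceiling of 1000 are illustrative assumptions, not values taken from the original module.

```python
def normalize_limit(limit, default=100, maximum=1000):
    """Coerce limit to an int in the range [1, maximum].

    Hypothetical helper; falls back to `default` when the value is
    missing, non-numeric, or less than 1.
    """
    try:
        value = int(limit)
    except (TypeError, ValueError):
        return default
    if value < 1:
        return default
    return min(value, maximum)


print(normalize_limit("50"))
print(normalize_limit(-5))
print(normalize_limit(10**6))
```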

user: Optional DocUser object representing the current user making the search request. Intended for permission-based filtering (currently commented out in implementation). Should have 'uid' and 'role' attributes.
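
One way to enable the commented-out permission filter is to append its condition to the clause list before the WHERE clause is joined, which avoids choosing between `WHERE` and `AND` by hand. The sketch below makes several assumptions: `add_permission_filter` is a hypothetical helper, the `is_public` property is taken from the commented-out code and may not exist in the actual schema, and `SimpleNamespace` stands in for a real `DocUser`.

```python
from types import SimpleNamespace


def add_permission_filter(where_clauses, params, user):
    """Append an owner-or-public condition for non-admin users (in place).

    Hypothetical helper; assumes documents carry `owner_id` and `is_public`
    properties, as suggested by the commented-out code in search_documents.
    """
    if user is None or not hasattr(user, "uid"):
        return  # no user context: leave the filters unchanged
    if getattr(user, "role", None) == "ADMIN":
        return  # admins see every document
    where_clauses.append("(d.owner_id = $user_id OR d.is_public = true)")
    params["user_id"] = user.uid


# Stand-in for a DocUser instance, for illustration only.
user = SimpleNamespace(uid="user-123-uid", role="USER")
clauses = ["d.status = $status"]
params = {"status": "DRAFT"}
add_permission_filter(clauses, params, user)
print("WHERE " + " AND ".join(clauses))
```

The helper must run before the `if where_clauses:` check that builds the WHERE clause, so that the permission condition is included even when no other filter is supplied.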

Return Value

Returns a List[Dict[str, Any]] containing document dictionaries. Each dictionary represents a document node from the database with all its properties (e.g., title, description, doc_type, department, status, owner_id, created_date). Returns an empty list if no documents match the criteria. If an error occurs during query execution, the function logs it and re-raises the exception; it does not return an empty list in that case. Documents are ordered by created_date in descending order (newest first).
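
The `except` block in the source logs the error and re-raises it, so callers that prefer an empty result over an exception need their own handling. The following is a minimal sketch of such a wrapper; `search_or_empty` is a hypothetical name, and the search function is passed in as an argument so the example does not depend on the module's import path.

```python
import logging

logger = logging.getLogger(__name__)


def search_or_empty(search_fn, **filters):
    """Call search_fn(**filters); log the failure and return [] if it raises.

    Hypothetical wrapper for callers that want a best-effort search.
    """
    try:
        return search_fn(**filters)
    except Exception:
        # logger.exception records the traceback along with the message.
        logger.exception("Document search failed for filters %r", filters)
        return []


def failing_search(**filters):
    """Stand-in for search_documents when the database is unavailable."""
    raise RuntimeError("database unavailable")


print(search_or_empty(failing_search, query="safety"))
```

Swallowing errors this way hides outages from the user, so it is probably better suited to non-critical UI widgets than to API endpoints that should report failures.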

Dependencies

  • logging
  • CDocs.db.db_operations
  • CDocs.controllers (for log_controller_action decorator)
  • CDocs.models.user_extensions (for DocUser type hint)

Required Imports

import logging
from CDocs.controllers import log_controller_action

Conditional/Optional Imports

These imports are only needed under specific conditions:

from CDocs.db import db_operations

Condition: imported lazily inside the function body at call time; required on every execution.

Usage Example

# Assumes search_documents has already been imported from the controller
# module (document_controller_backup.py); the import path depends on your
# package layout. In that module the function is decorated with
# @log_controller_action('search_documents').

# Search for documents with 'safety' in title or description
results = search_documents(query='safety')

# Search for published policies in Engineering department
results = search_documents(
    doc_type='policy',
    department='Engineering',
    status='PUBLISHED',
    limit=50
)

# Search for documents owned by specific user
results = search_documents(owner='user-123-uid', limit=20)

# Combined search with multiple filters
results = search_documents(
    query='quality',
    doc_type='procedure',
    status='EFFECTIVE',
    department='QA',
    limit=10
)

# Process results
for doc in results:
    print(f"Title: {doc.get('title')}, Status: {doc.get('status')}")

Best Practices

  • Always handle the returned list safely as it may be empty if no documents match the criteria
  • Be aware that the text search using CONTAINS is case-sensitive; consider normalizing query strings if case-insensitive search is needed
  • The limit parameter is cast to int before being interpolated into the query, which guards against Cypher injection, but validate its range before calling if accepting user input
  • The user-based permission filtering is currently commented out; implement it if role-based access control is required
  • Catch and handle exceptions appropriately as the function re-raises any errors encountered
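  • Consider the performance impact of text searches on large document collections; a Neo4j text index or full-text index on the title and description properties is recommended, since CONTAINS predicates generally do not benefit from a standard range index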
  • The function uses lazy import of db_operations which may hide import errors until runtime; ensure the module is available
  • When using multiple filters, understand they are combined with AND logic, which may return fewer results than expected
  • The ORDER BY created_date DESC ensures newest documents appear first, but this may not be suitable for all use cases
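
The indexing recommendation above could be applied with a small setup routine. This sketch assumes Neo4j 4.4 or later, where `CREATE TEXT INDEX` is available; the index names are illustrative, and `create_text_indexes` is a hypothetical helper. It accepts any callable with the same `(cypher, params)` shape that the source uses for `db_operations.run_query`. Whether the planner actually uses these indexes for an `OR` of two `CONTAINS` predicates depends on the Neo4j version, so check the query plan before relying on them.

```python
# Cypher statements that create text indexes on the searched properties.
# Index names are illustrative assumptions, not taken from the original schema.
TEXT_INDEX_STATEMENTS = [
    "CREATE TEXT INDEX document_title_text IF NOT EXISTS "
    "FOR (d:Document) ON (d.title)",
    "CREATE TEXT INDEX document_description_text IF NOT EXISTS "
    "FOR (d:Document) ON (d.description)",
]


def create_text_indexes(run_query):
    """Run each index statement through the supplied query function.

    `run_query` is any callable taking (cypher, params), for example
    db_operations.run_query as it is called in search_documents.
    """
    for statement in TEXT_INDEX_STATEMENTS:
        run_query(statement, {})


# Dry run: print the statements instead of sending them to a database.
create_text_indexes(lambda cypher, params: print(cypher))
```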

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function get_documents 82.1% similar

    Retrieves filtered and paginated documents from a Neo4j graph database with permission-based access control, supporting multiple filter criteria and search functionality.

    From: /tf/active/vicechatdev/document_controller_backup.py
  • function get_document_audit_trail 55.1% similar

    Retrieves the complete audit trail for a controlled document from a Neo4j graph database, including timestamps, user actions, and event details.

    From: /tf/active/vicechatdev/document_controller_backup.py
  • function create_document_v2 50.3% similar

    Creates a new controlled document in a document management system with specified properties, type, department, and status.

    From: /tf/active/vicechatdev/document_controller_backup.py
  • function get_document 50.2% similar

    Retrieves comprehensive details of a controlled document by its UID, with optional inclusion of version history, review cycles, and approval cycles.

    From: /tf/active/vicechatdev/document_controller_backup.py
  • function run_query 49.1% similar

    Executes a Cypher query against a Neo4j database session and returns the result, with optional parameterization for safe query execution.

    From: /tf/active/vicechatdev/offline_docstore_multi_vice.py