
function create_folder_hierarchy

Maturity: 51

Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file system path, connecting each folder level with PATH relationships.

File: /tf/active/vicechatdev/offline_docstore_multi_vice.py
Lines: 1277 - 1327
Complexity: complex

Purpose

This function mirrors a file system's folder structure in a Neo4j graph database. It splits a path into directory levels, creates a Subfolder node for each level that does not already exist (existing nodes are matched by their Path property), and links each level to its parent with a PATH relationship. The first-level folder is attached to a Rootfolder node. This is typically used in document management systems or knowledge graphs where file organization needs to be represented in a graph structure.
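The create-or-reuse loop described above can be sketched without a database. This is a minimal illustration, not the function's actual implementation: the `nodes` dict stands in for the graph, `build_hierarchy` is a hypothetical name, and a `parent` of None stands in for the Rootfolder link.

```python
import posixpath
from uuid import uuid4


def build_hierarchy(nodes, base_path, folders):
    """Mirror `folders` under `base_path` in `nodes`, a dict mapping path -> record.

    Returns the UID of the deepest folder, or None when `folders` is empty.
    """
    parent_uid = None  # None stands in for the Rootfolder at the first level
    current_path = base_path
    for level, folder in enumerate(folders, start=1):
        current_path = posixpath.join(current_path, folder)
        record = nodes.get(current_path)
        if record is None:
            # No node with this path yet: create one and link it to its parent.
            record = {"uid": str(uuid4()), "name": folder,
                      "level": level, "parent": parent_uid}
            nodes[current_path] = record
        # Whether created or reused, this level is the parent of the next one.
        parent_uid = record["uid"]
    return parent_uid


nodes = {}
base = "/home/user/documents/project"
first = build_hierarchy(nodes, base, ["reports", "2024"])
second = build_hierarchy(nodes, base, ["reports", "2025"])
print(len(nodes))  # "reports" is shared between the two calls
```

Calling the sketch twice with overlapping paths reuses the shared `reports` entry, which is the same de-duplication the function gets from its lookup by Path.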

Source Code

def create_folder_hierarchy(session, common_path, file_path, topfolder, rootfolder_uid):
    """Create a hierarchy of Subfolder nodes based on the file path"""
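    print("working on ",file_path)
    # NOTE: `file` and `folder_name` are not parameters or locals of this function.
    # They must exist in the enclosing module scope, or this line raises NameError.
    subpath=str(file).replace(common_path,'').replace('/'+folder_name+'/','')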
    # Split into folder components
    folders = subpath.split("/")
    folders.pop()  # Remove the filename itself
    
    if not folders:  # No subfolders
        return None
    print("Folders: ",folders)
    current_path = common_path+'/'+topfolder
    parent_uid = None
    key=evaluate_query(session,"match (x:Docstores)  where not ('Template' in labels(x)) return x.Keys")
    
    # Create folder hierarchy
    for i, folder in enumerate(folders):
        current_path = os.path.join(current_path, folder)
        folder_escaped = folder.replace("'", "`")
        current_path_escaped = current_path.replace("'", "``")
        
        # Check if this folder node already exists - get result DATA
        result_data = run_query(session,f"MATCH (f:Subfolder {{Path: '{current_path_escaped}'}})"
                          f" RETURN f.UID as uid").data()
        #print(result_data)
        
        # Check if the result data list is empty
        if not result_data:
            # Create new folder node
            folder_uid = str(uuid4())
            if i == 0:
                # Connect to the Rootfolder node since it's the first level
                run_query(session,f"MATCH (x:Rootfolder {{UID:'{rootfolder_uid}'}}) "
                         f" MERGE (x)-[:PATH]->(:Subfolder {{UID: '{folder_uid}', "
                         f"Name: '{folder_escaped}', Path: '{current_path_escaped}', "
                         f"Level: '{i+1}',"
                         f"Keys:'{key}'}})")
            else:
                # Connect to parent folder
                run_query(session,f"MATCH (p:Subfolder {{UID: '{parent_uid}'}})"
                         f" MERGE (p)-[:PATH]->(:Subfolder {{UID: '{folder_uid}', "
                         f"Name: '{folder_escaped}', Path: '{current_path_escaped}', "
                         f"Level: '{i+1}',"
                         f"Keys:'{key}'}})")
            parent_uid = folder_uid
        else:
            # Access the UID from the first result in the data list
            parent_uid = result_data[0]['uid']
    
    # Return the UID of the deepest subfolder
    return parent_uid

Parameters

Name            Type  Default  Kind
session         -     -        positional_or_keyword
common_path     -     -        positional_or_keyword
file_path       -     -        positional_or_keyword
topfolder       -     -        positional_or_keyword
rootfolder_uid  -     -        positional_or_keyword

Parameter Details

session: A Neo4j database session object used to execute Cypher queries against the graph database. Must be an open session obtained by calling session() on a driver created with neo4j.GraphDatabase.driver()

common_path: The base/root path string that should be removed from the file_path to determine the relative folder structure. This represents the common prefix shared by all files being processed

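file_path: The complete file system path string for the file being processed. In the source as shown, this parameter is only printed; the folder hierarchy is actually derived from a variable named file that is read from the enclosing scope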

topfolder: The name of the top-level folder string that serves as the root of the hierarchy being created. This is appended to common_path to build the current_path

rootfolder_uid: The unique identifier (UID) string of the Rootfolder node in Neo4j to which the first-level Subfolder should be connected via a PATH relationship
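To make the relationship between common_path, topfolder and the file's path concrete, here is a small sketch of how the folder levels and their cumulative paths could be derived. It is an alternative to the function's chained str.replace calls, not a copy of them: it uses pathlib.PurePosixPath, and the names `folder_components` and `cumulative_paths` are hypothetical.

```python
from pathlib import PurePosixPath


def folder_components(common_path, topfolder, file_path):
    """Return the folder names between common_path/topfolder and the file itself."""
    base = PurePosixPath(common_path) / topfolder
    # relative_to raises ValueError if file_path does not live under base.
    relative = PurePosixPath(file_path).relative_to(base)
    return list(relative.parts[:-1])  # drop the filename


def cumulative_paths(common_path, topfolder, folders):
    """Return the full path of each folder level, shallowest first."""
    current = PurePosixPath(common_path) / topfolder
    paths = []
    for folder in folders:
        current = current / folder
        paths.append(str(current))
    return paths


folders = folder_components(
    "/home/user/documents",
    "project",
    "/home/user/documents/project/reports/2024/report.pdf",
)
paths = cumulative_paths("/home/user/documents", "project", folders)
print(folders)
print(paths)
```

Each entry in `paths` corresponds to the value the function stores in a Subfolder node's Path property for that level. Unlike the replace-based approach, this version fails loudly when the file is not under the expected base path.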

Return Value

Returns the UID (string) of the deepest/last Subfolder node in the hierarchy, which represents the immediate parent folder of the file. Returns None if there are no subfolders in the path (file is directly in the root folder). This UID can be used to link the actual file node to its containing folder.
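Because the return value can be None, a caller that wants to attach a file node has to pick between two kinds of parent. The sketch below shows one way to make that choice. It is an assumption about how a caller might use the result, not code from the source module: `parent_match` is a hypothetical helper, and it only builds the MATCH clause and its parameters.

```python
def parent_match(folder_uid, rootfolder_uid):
    """Return (match_clause, params) selecting the node a file should hang off.

    folder_uid is the value returned by create_folder_hierarchy; None means the
    file sits directly in the top-level folder, so the Rootfolder is the parent.
    """
    if folder_uid is None:
        return "MATCH (p:Rootfolder {UID: $uid})", {"uid": rootfolder_uid}
    return "MATCH (p:Subfolder {UID: $uid})", {"uid": folder_uid}


clause, params = parent_match(None, "root-folder-uuid-12345")
print(clause, params)
```

The caller would append its own clause for the file node to `clause`; the label and relationship used for files are not shown in this section, so they are left out here.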

Dependencies

  • neo4j
  • uuid
  • os

Required Imports

from neo4j import GraphDatabase
from uuid import uuid4
import os

Usage Example

from neo4j import GraphDatabase
from uuid import uuid4
import os

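# Assumes create_folder_hierarchy, evaluate_query and run_query are defined in this module.
# The function also reads module-level `file` and `folder_name`, so both must be set
# (presumably to the same values as file_path and topfolder below).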

# Setup Neo4j connection
driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', 'password'))
session = driver.session()

# Define parameters
common_path = '/home/user/documents'
file_path = '/home/user/documents/project/reports/2024/report.pdf'
topfolder = 'project'
rootfolder_uid = 'root-folder-uuid-12345'

# Create folder hierarchy
deepest_folder_uid = create_folder_hierarchy(
    session=session,
    common_path=common_path,
    file_path=file_path,
    topfolder=topfolder,
    rootfolder_uid=rootfolder_uid
)

if deepest_folder_uid:
    print(f'Deepest folder UID: {deepest_folder_uid}')
else:
    print('No subfolders created')

session.close()
driver.close()

Best Practices

  • Ensure the Neo4j session is active and properly authenticated before calling this function
  • The function has dependencies on undefined variables 'folder_name' and 'file' which should be defined in the calling scope or refactored as parameters
  • The function relies on helper functions 'evaluate_query' and 'run_query' which must be available in the module
  • Wrap the database calls in try-except blocks so that connection failures and query errors raised by the Neo4j driver are handled gracefully
  • Single quotes are escaped inconsistently (one backtick in folder names, two backticks in paths) and no other special characters are handled - use parameterized queries instead of string interpolation
  • The function prints debug information to console - consider using proper logging instead
  • Ensure the Rootfolder node with the specified UID exists before calling this function: if the MATCH finds nothing, the MERGE never runs, no Subfolder nodes are created, and the returned UID refers to a node that does not exist
  • The function copies a 'Keys' property fetched from Docstores onto every new node - ensure this query returns valid data
  • The path manipulation logic assumes Unix-style paths - may need adjustment for Windows compatibility
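The recommendations about parameterized queries and logging can be sketched as follows. This is a hedged illustration rather than the module's code: `create_subfolder` and `RecordingSession` are hypothetical names, and the stand-in session only records calls so the example runs without a database. With a real driver, `session.run(query, parameters)` accepts the same two arguments.

```python
import logging
from uuid import uuid4

logger = logging.getLogger(__name__)

# Values travel as $parameters, so quotes in names or paths need no escaping.
NODE_PATTERN = ("(:Subfolder {UID: $uid, Name: $name, Path: $path, "
                "Level: $level, Keys: $keys})")
CREATE_UNDER_ROOT = ("MATCH (x:Rootfolder {UID: $parent_uid}) "
                     "MERGE (x)-[:PATH]->" + NODE_PATTERN)
CREATE_UNDER_FOLDER = ("MATCH (p:Subfolder {UID: $parent_uid}) "
                       "MERGE (p)-[:PATH]->" + NODE_PATTERN)


def create_subfolder(session, parent_uid, name, path, level, keys, under_root):
    """Create one Subfolder under its parent and return the new UID."""
    uid = str(uuid4())
    query = CREATE_UNDER_ROOT if under_root else CREATE_UNDER_FOLDER
    params = {
        "parent_uid": parent_uid,
        "uid": uid,
        "name": name,
        "path": path,
        "level": str(level),  # the original function stores Level as a string
        "keys": keys,
    }
    logger.debug("Creating Subfolder %s", path)
    session.run(query, params)
    return uid


class RecordingSession:
    """Hypothetical stand-in for a Neo4j session; it only records calls."""

    def __init__(self):
        self.calls = []

    def run(self, query, parameters=None):
        self.calls.append((query, parameters))


session = RecordingSession()
new_uid = create_subfolder(session, "root-folder-uuid-12345", "O'Brien",
                           "/home/user/documents/project/O'Brien", 1,
                           "example-keys", under_root=True)
recorded_query, recorded_params = session.calls[0]
print(recorded_params["name"])
```

The folder name reaches the database unchanged, apostrophe included, because it is never spliced into the query text. The existence check could be parameterized the same way.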

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function create_folder_hierarchy_v2 96.4% similar

    Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file path, connecting each folder level with PATH relationships.

    From: /tf/active/vicechatdev/offline_docstore_multi.py
  • function create_folder_hierarchy_v1 94.0% similar

    Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file path, establishing parent-child relationships between folders.

    From: /tf/active/vicechatdev/offline_parser_docstore.py
  • function create_folder 72.6% similar

    Creates a nested folder structure on a FileCloud server by traversing a path and creating missing directories.

    From: /tf/active/vicechatdev/filecloud_wuxi_sync.py
  • function add_document_to_graph 63.8% similar

    Creates nodes and relationships in a Neo4j graph database for a processed document, including its text and table chunks, connecting it to a folder hierarchy.

    From: /tf/active/vicechatdev/offline_docstore_multi.py
  • function add_document_to_graph_v1 63.3% similar

    Creates a Neo4j graph node for a processed document and connects it to a folder hierarchy, along with its text and table chunks.

    From: /tf/active/vicechatdev/offline_docstore_multi_vice.py