function create_folder_hierarchy
Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file system path, connecting each folder level with PATH relationships.
File: /tf/active/vicechatdev/offline_docstore_multi_vice.py
Lines: 1277 - 1327
Complexity: complex
Purpose
This function mirrors a file system's folder structure in a Neo4j graph database. It parses a file path, creates a Subfolder node for each directory level below the top-level folder, and links each level to its parent with a PATH relationship. The first-level Subfolder is attached to a Rootfolder node. Before creating a node, the function looks for an existing Subfolder with the same Path and reuses it, so folders are not duplicated. This is typically used in document management systems or knowledge graphs where file organization needs to be represented in a graph structure.
Source Code
def create_folder_hierarchy(session, common_path, file_path, topfolder, rootfolder_uid):
    """Create a hierarchy of Subfolder nodes based on the file path"""
    print("working on ",file_path)
    # NOTE: `file` and `folder_name` are not parameters of this function;
    # they must be defined in the enclosing scope for this line to run.
    subpath=str(file).replace(common_path,'').replace('/'+folder_name+'/','')
    # Split into folder components
    folders = subpath.split("/")
    folders.pop()  # Remove the filename itself
    if not folders:  # No subfolders
        return None
    print("Folders: ",folders)
    current_path = common_path+'/'+topfolder
    parent_uid = None
    key=evaluate_query(session,"match (x:Docstores) where not ('Template' in labels(x)) return x.Keys")
    # Create folder hierarchy
    for i, folder in enumerate(folders):
        current_path = os.path.join(current_path, folder)
        folder_escaped = folder.replace("'", "`")
        current_path_escaped = current_path.replace("'", "``")
        # Check if this folder node already exists - get result DATA
        result_data = run_query(session,f"MATCH (f:Subfolder {{Path: '{current_path_escaped}'}})"
                                f" RETURN f.UID as uid").data()
        #print(result_data)
        # Check if the result data list is empty
        if not result_data:
            # Create new folder node
            folder_uid = str(uuid4())
            if i == 0:
                # Connect to the Rootfolder node since it's the first level
                run_query(session,f"MATCH (x:Rootfolder {{UID:'{rootfolder_uid}'}}) "
                          f" MERGE (x)-[:PATH]->(:Subfolder {{UID: '{folder_uid}', "
                          f"Name: '{folder_escaped}', Path: '{current_path_escaped}', "
                          f"Level: '{i+1}',"
                          f"Keys:'{key}'}})")
            else:
                # Connect to parent folder
                run_query(session,f"MATCH (p:Subfolder {{UID: '{parent_uid}'}})"
                          f" MERGE (p)-[:PATH]->(:Subfolder {{UID: '{folder_uid}', "
                          f"Name: '{folder_escaped}', Path: '{current_path_escaped}', "
                          f"Level: '{i+1}',"
                          f"Keys:'{key}'}})")
            parent_uid = folder_uid
        else:
            # Access the UID from the first result in the data list
            parent_uid = result_data[0]['uid']
    # Return the UID of the deepest subfolder
    return parent_uid
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| session | - | - | positional_or_keyword |
| common_path | - | - | positional_or_keyword |
| file_path | - | - | positional_or_keyword |
| topfolder | - | - | positional_or_keyword |
| rootfolder_uid | - | - | positional_or_keyword |
Parameter Details
session: A Neo4j database session object used to execute Cypher queries against the graph database. Must be an active session obtained from neo4j.GraphDatabase.driver().session().
common_path: The base path string that is removed from the file path to determine the relative folder structure. This is the common prefix shared by all files being processed.
file_path: The complete file system path string for the file being processed. It is intended to be the path from which the folder hierarchy is extracted; note that the listed source only prints file_path and parses an out-of-scope `file` variable instead.
topfolder: The name (string) of the top-level folder that serves as the root of the hierarchy being created. It is appended to common_path to build the starting value of current_path.
rootfolder_uid: The unique identifier (UID) string of the Rootfolder node in Neo4j to which the first-level Subfolder is connected via a PATH relationship.
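To make the relationship between common_path, topfolder, and file_path concrete, the sketch below computes the name, path, and level that each Subfolder node would receive, without touching the database. `plan_folder_levels` is a hypothetical helper and is not part of the module. It assumes POSIX-style paths and that file_path lies under common_path/topfolder, and it returns the level as an integer, whereas the listed source stores it as a string.

```python
import posixpath


def plan_folder_levels(common_path, file_path, topfolder):
    """Return (name, path, level) for each folder between topfolder and the file.

    Hypothetical helper for illustration only; assumes POSIX-style paths and
    that file_path lies under common_path/topfolder.
    """
    base = posixpath.join(common_path, topfolder)
    prefix = base.rstrip("/") + "/"
    if not file_path.startswith(prefix):
        raise ValueError(f"{file_path!r} is not under {base!r}")

    # Strip the prefix exactly once; str.replace would remove every match.
    relative = file_path[len(prefix):]
    # Drop empty components, then drop the trailing filename.
    folders = [part for part in relative.split("/") if part][:-1]

    levels = []
    current = base
    for depth, name in enumerate(folders, start=1):
        current = posixpath.join(current, name)
        levels.append((name, current, depth))
    return levels


if __name__ == "__main__":
    for entry in plan_folder_levels(
        "/home/user/documents",
        "/home/user/documents/project/reports/2024/report.pdf",
        "project",
    ):
        print(entry)
```

An empty list corresponds to the case where the function returns None because the file sits directly in the top-level folder.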
Return Value
Returns the UID (string) of the deepest/last Subfolder node in the hierarchy, which represents the immediate parent folder of the file. Returns None if there are no subfolders in the path (file is directly in the root folder). This UID can be used to link the actual file node to its containing folder.
Dependencies
- neo4j
- uuid
- os
Required Imports
from neo4j import GraphDatabase
from uuid import uuid4
import os
Usage Example
from neo4j import GraphDatabase
from uuid import uuid4
import os
# Assuming evaluate_query and run_query functions are defined
# Assuming folder_name and file variables are in scope
# Setup Neo4j connection
driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', 'password'))
session = driver.session()
# Define parameters
common_path = '/home/user/documents'
file_path = '/home/user/documents/project/reports/2024/report.pdf'
topfolder = 'project'
rootfolder_uid = 'root-folder-uuid-12345'
# Create folder hierarchy
deepest_folder_uid = create_folder_hierarchy(
    session=session,
    common_path=common_path,
    file_path=file_path,
    topfolder=topfolder,
    rootfolder_uid=rootfolder_uid
)
if deepest_folder_uid:
    print(f'Deepest folder UID: {deepest_folder_uid}')
else:
    print('No subfolders created')
session.close()
driver.close()
Best Practices
- Ensure the Neo4j session is active and properly authenticated before calling this function
- The function has dependencies on undefined variables 'folder_name' and 'file' which should be defined in the calling scope or refactored as parameters
- The function relies on helper functions 'evaluate_query' and 'run_query' which must be available in the module
- Consider adding error handling for database connection failures and query execution errors
- The function escapes single quotes by replacing them with backticks (one backtick in Name, two in Path). This changes the stored values and does not protect against other special characters such as backslashes - consider using parameterized queries instead
- The function prints debug information to console - consider using proper logging instead
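- Ensure the Rootfolder node with the specified UID exists before calling this function. If it does not, the MATCH finds nothing, no Subfolder is created, and the function still returns a UID that does not exist in the graph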
- The function creates nodes with a 'Keys' property fetched from Docstores - ensure this query returns valid data
- Consider wrapping database operations in try-except blocks to handle Neo4j exceptions gracefully
- The path manipulation logic assumes Unix-style paths - may need adjustment for Windows compatibility
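The suggestions above about parameterized queries and logging could look roughly like the following. This is a minimal sketch rather than the module's implementation: `ensure_subfolder` is a hypothetical helper, and it assumes the query is sent directly through `session.run(query, parameters)` instead of the module's `run_query` helper, whose signature is not shown here. It uses CREATE in place of MERGE because the UID is freshly generated, so the MERGE pattern in the listed source could never match an existing node anyway.

```python
import logging
from uuid import uuid4

logger = logging.getLogger(__name__)

FIND_SUBFOLDER = "MATCH (f:Subfolder {Path: $path}) RETURN f.UID AS uid"

# Labels cannot be passed as Cypher parameters, so the parent label is
# filled in from a fixed whitelist; all values travel as parameters.
CREATE_SUBFOLDER = (
    "MATCH (p:{parent_label} {{UID: $parent_uid}}) "
    "CREATE (p)-[:PATH]->(:Subfolder {{UID: $uid, Name: $name, "
    "Path: $path, Level: $level, Keys: $keys}})"
)


def ensure_subfolder(session, parent_uid, parent_label, name, path, level, keys):
    """Return the UID of the Subfolder at `path`, creating it if absent.

    Hypothetical helper; `session` only needs a run(query, parameters)
    method whose result offers .data(), as a neo4j session does.
    """
    if parent_label not in ("Rootfolder", "Subfolder"):
        raise ValueError(f"unexpected parent label: {parent_label!r}")

    rows = session.run(FIND_SUBFOLDER, {"path": path}).data()
    if rows:
        return rows[0]["uid"]

    uid = str(uuid4())
    session.run(
        CREATE_SUBFOLDER.format(parent_label=parent_label),
        {
            "parent_uid": parent_uid,
            "uid": uid,
            "name": name,    # stored unmodified, quotes included
            "path": path,
            "level": level,
            "keys": keys,
        },
    )
    logger.debug("created Subfolder %s at %s", uid, path)
    return uid
```

Because the folder name and path are never spliced into the query text, names containing quotes are stored as-is and no backtick substitution is needed.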
Similar Components
AI-powered semantic similarity - components with related functionality:
- function create_folder_hierarchy_v2 96.4% similar
- function create_folder_hierarchy_v1 94.0% similar
- function create_folder 72.6% similar
- function add_document_to_graph 63.8% similar
- function add_document_to_graph_v1 63.3% similar