function create_folder_hierarchy_v1
Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file path, establishing parent-child relationships between folders.
/tf/active/vicechatdev/offline_parser_docstore.py (lines 114-169)
moderate
Purpose
This function parses a file path (expected to start with './PDF_docs/') and creates a corresponding hierarchy of Subfolder nodes in a Neo4j graph database. The first-level folder is connected to the Rootfolder node named 'T001', and each deeper folder is connected to its parent folder. Each newly created folder node receives a unique identifier (UID) generated with uuid4 and stores its name, path, level, and a Keys value read from a Docstores node. The function is designed to organize document storage as a graph, likely for a document management or knowledge graph system.
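The resulting structure can be illustrated without a database. The following minimal sketch is an in-memory analogue under stated assumptions, not the original code: the dictionary `nodes` stands in for Subfolder nodes keyed by Path, the string 'T001' stands in for the Rootfolder parent, and the helper `add_folders` is hypothetical. It takes folder names that have already been split from the path.

```python
from uuid import uuid4


def add_folders(nodes: dict, folders: list, key: str = "default_key"):
    """Hypothetical in-memory stand-in for the Subfolder hierarchy.

    `nodes` maps a folder Path to its properties; "Parent" holds the parent
    folder's UID, or the Rootfolder name 'T001' for the first level.
    Returns the UID of the deepest folder, or None for an empty list.
    """
    current_path = "./PDF_docs"
    parent_uid = None
    for i, folder in enumerate(folders):
        current_path = f"{current_path}/{folder}"
        if current_path not in nodes:  # reuse an existing folder with this Path
            nodes[current_path] = {
                "UID": str(uuid4()),
                "Name": folder,
                "Level": str(i + 1),  # the original stores Level as a string
                "Keys": key,
                "Parent": "T001" if i == 0 else parent_uid,
            }
        parent_uid = nodes[current_path]["UID"]
    return parent_uid


nodes = {}
deep = add_folders(nodes, ["research", "papers", "2023"])
again = add_folders(nodes, ["research", "papers", "2023"])
sibling = add_folders(nodes, ["research", "slides"])
print(deep == again)  # True: the existing folders are reused
print(len(nodes))     # 4: research, papers, 2023, slides
```

Reusing folders by Path is what keeps a second file in the same directory from duplicating the hierarchy; the real function does this with a MATCH on Subfolder.Path before creating anything.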
Source Code
def create_folder_hierarchy(graph, file_path):
    """Create a hierarchy of Subfolder nodes based on the file path"""
    # Get path components from the PDF_docs folder
    if file_path.startswith("./PDF_docs/"):
        rel_path = file_path[11:]  # Remove './PDF_docs/' prefix
    else:
        rel_path = os.path.basename(file_path)  # Just use filename if no expected prefix
    # If file is directly in the PDF_docs root
    if "/" not in rel_path:
        return None
    # Split into folder components
    folders = rel_path.split("/")
    folders.pop()  # Remove the filename itself
    if not folders:  # No subfolders
        return None
    current_path = "./PDF_docs"
    parent_uid = None
    # Keys value copied onto every new Subfolder node
    key = graph.run("match (x:Docstores) where not ('Template' in labels(x)) return x.Keys").evaluate()
    # Create folder hierarchy
    for i, folder in enumerate(folders):
        current_path = os.path.join(current_path, folder)
        # Single quotes are replaced rather than escaped: ' becomes ` in names, `` in paths
        folder_escaped = folder.replace("'", "`")
        current_path_escaped = current_path.replace("'", "``")
        # Check if this folder node already exists
        result = graph.run(f"MATCH (f:Subfolder {{Path: '{current_path_escaped}'}})"
                           f" RETURN f.UID as uid").data()
        if not result:
            # Create new folder node
            folder_uid = str(uuid4())
            if i == 0:
                # Connect to the Rootfolder node 'T001' since it's the first level
                graph.run(f"MATCH (x:Rootfolder {{Name:'T001'}}) "
                          f" MERGE (x)-[:PATH]->(:Subfolder {{UID: '{folder_uid}', "
                          f"Name: '{folder_escaped}', Path: '{current_path_escaped}', "
                          f"Level: '{i+1}',"
                          f"Keys:'{key}'}})")
            else:
                # Connect to parent folder
                graph.run(f"MATCH (p:Subfolder {{UID: '{parent_uid}'}})"
                          f" MERGE (p)-[:PATH]->(:Subfolder {{UID: '{folder_uid}', "
                          f"Name: '{folder_escaped}', Path: '{current_path_escaped}', "
                          f"Level: '{i+1}',"
                          f"Keys:'{key}'}})")
            parent_uid = folder_uid
        else:
            parent_uid = result[0]['uid']
    # Return the UID of the deepest subfolder
    return parent_uid
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| graph | - | - | positional_or_keyword |
| file_path | - | - | positional_or_keyword |
Parameter Details
graph: A Neo4j graph database connection object (likely from py2neo or similar library) that provides a 'run()' method to execute Cypher queries. This object is used to query and create nodes in the graph database.
file_path: A string representing the file path to process. Expected format is './PDF_docs/folder1/folder2/.../filename.ext'. The function extracts folder hierarchy from this path. If the path doesn't start with './PDF_docs/', only the basename is used.
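The path handling can be checked in isolation. This sketch mirrors the prefix stripping, basename fallback, and folder splitting of the function; the helper name `folder_components` is hypothetical, and it touches no database.

```python
import os


def folder_components(file_path: str) -> list:
    """List (level, name, path) for each folder create_folder_hierarchy would visit.

    Hypothetical helper mirroring the function's path handling only.
    """
    if file_path.startswith("./PDF_docs/"):
        rel_path = file_path[11:]  # strip the './PDF_docs/' prefix
    else:
        rel_path = os.path.basename(file_path)  # fallback: filename only, no '/'
    folders = rel_path.split("/")[:-1]  # everything except the filename
    components = []
    current_path = "./PDF_docs"
    for i, folder in enumerate(folders):
        current_path = os.path.join(current_path, folder)
        components.append((i + 1, folder, current_path))
    return components


print(folder_components("./PDF_docs/research/papers/2023/document.pdf"))
print(folder_components("./PDF_docs/document.pdf"))     # [] -> the function returns None
print(folder_components("data/research/document.pdf"))  # [] -> the function returns None
```

On a POSIX system the first call yields levels 1 to 3 for 'research', 'papers', and '2023', with paths built under './PDF_docs'. A path without the expected prefix never produces folders, because its basename contains no '/'.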
Return Value
Returns a string containing the UID (Unique Identifier) of the deepest subfolder node, whether newly created or already present. Returns None if the file sits directly in the PDF_docs root directory, or if the path does not start with './PDF_docs/' (the basename fallback contains no folder components).
Dependencies
neo4j, py2neo, uuid, os
Required Imports
from uuid import uuid4
import os
Usage Example
from py2neo import Graph
from uuid import uuid4
import os
# Assumes create_folder_hierarchy is defined in, or imported into, this module
# Establish Neo4j connection
graph = Graph('bolt://localhost:7687', auth=('neo4j', 'password'))
# Ensure required nodes exist
graph.run("MERGE (r:Rootfolder {Name:'T001'})")
graph.run("MERGE (d:Docstores {Keys:'default_key'})")
# Create folder hierarchy for a file
file_path = './PDF_docs/research/papers/2023/document.pdf'
deepest_folder_uid = create_folder_hierarchy(graph, file_path)
if deepest_folder_uid:
    print(f'Deepest folder UID: {deepest_folder_uid}')
    # Use this UID to link the document node (values passed as query parameters)
    graph.run("MATCH (f:Subfolder {UID: $uid}) "
              "MERGE (f)-[:CONTAINS]->(:Document {Path: $path})",
              uid=deepest_folder_uid, path=file_path)
else:
    print('File is in root directory, no subfolders created')
Best Practices
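- Ensure the Neo4j database has a Rootfolder node with Name='T001' before calling this function; otherwise no new Subfolder nodes are created, yet the function still returns a freshly generated UID that matches no node
- Ensure at least one Docstores node without a 'Template' label exists; otherwise the key lookup returns None and the literal string 'None' is stored in each new node's Keys property. If several exist, the value comes from whichever node is returned first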
- The function uses string interpolation in Cypher queries which could be vulnerable to injection; consider using parameterized queries for production use
- File paths should follow the './PDF_docs/' convention for proper hierarchy creation
- The function does not truly escape single quotes: it replaces them with a single backtick (`) in folder names and with a double backtick (``) in paths. This inconsistent substitution alters the stored names and paths and may need review
- Consider adding error handling for database connection failures or malformed paths
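- Folder reuse comes from the explicit lookup by Path before each creation, not from MERGE: each MERGE pattern includes a freshly generated UID, so it always creates a new node. Repeated calls with the same path therefore reuse existing folders, but without a uniqueness constraint on Subfolder.Path, concurrent calls could still create duplicates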
- Parent-child relationships are established using [:PATH] relationship type
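- Each folder node stores its 1-indexed level in the hierarchy for easy querying; note that Level is stored as a string (e.g. '1'), so numeric comparisons need a conversion such as toInteger()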
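The injection, quoting, and idempotence concerns listed above can be addressed together with query parameters and a MERGE keyed on Path. This is a minimal sketch, not the original implementation: it assumes py2neo's `Graph.run(cypher, **parameters)` and `Cursor.evaluate()`, and the names `create_folder_hierarchy_param`, `subfolder_query`, and `KEY_QUERY` are hypothetical. Unlike the original, it returns None when a parent node cannot be found.

```python
import os
from uuid import uuid4

PREFIX = "./PDF_docs/"

# Same lookup as the original: Keys of a Docstores node without the Template label.
KEY_QUERY = "MATCH (x:Docstores) WHERE NOT x:Template RETURN x.Keys LIMIT 1"


def subfolder_query(is_first_level: bool) -> str:
    """Return a parameterized Cypher query that merges one Subfolder by Path."""
    if is_first_level:
        parent = "MATCH (p:Rootfolder {Name: 'T001'}) "
    else:
        parent = "MATCH (p:Subfolder {UID: $parent_uid}) "
    return parent + (
        "MERGE (f:Subfolder {Path: $path}) "
        "ON CREATE SET f.UID = $uid, f.Name = $name, "
        "f.Level = $level, f.Keys = $key "
        "MERGE (p)-[:PATH]->(f) "
        "RETURN f.UID AS uid"
    )


def create_folder_hierarchy_param(graph, file_path):
    """Hypothetical parameterized variant; returns the deepest folder UID or None."""
    if not file_path.startswith(PREFIX):
        return None  # the original also ends up returning None here
    folders = file_path[len(PREFIX):].split("/")[:-1]  # drop the filename
    if not folders:
        return None  # file sits directly in ./PDF_docs
    key = graph.run(KEY_QUERY).evaluate()
    current_path = "./PDF_docs"
    parent_uid = None
    for i, folder in enumerate(folders):
        current_path = os.path.join(current_path, folder)
        # Values are sent as parameters, so quotes in names need no substitution.
        parent_uid = graph.run(
            subfolder_query(i == 0),
            path=current_path,
            uid=str(uuid4()),  # only used if the node is newly created
            name=folder,
            level=str(i + 1),  # string, to match nodes written by the original
            key=key,
            parent_uid=parent_uid,
        ).evaluate()
        if parent_uid is None:
            return None  # parent node not found, so nothing was merged
    return parent_uid
```

Here the MERGE on Path, with ON CREATE SET supplying the UID, provides the reuse that the original gets from its separate lookup. MERGE on its own is still not guarded against concurrent creation, so a uniqueness constraint on Subfolder.Path would be needed for that.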
Similar Components
AI-powered semantic similarity - components with related functionality:
- function create_folder_hierarchy_v2 (94.1% similar)
- function create_folder_hierarchy (94.0% similar)
- function create_folder (68.9% similar)
- function add_document_to_graph (65.3% similar)
- function add_document_to_graph_v1 (64.6% similar)