function create_folder_hierarchy_v1

Maturity: 49

Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file path, establishing parent-child relationships between folders.

File: /tf/active/vicechatdev/offline_parser_docstore.py
Lines: 114 - 169
Complexity: moderate

Purpose

This function parses a file path (expected to start with './PDF_docs/') and creates a matching hierarchy of Subfolder nodes in a Neo4j graph database. The first-level folder is attached to the Rootfolder node named 'T001', and each deeper folder is attached to its parent Subfolder through a PATH relationship. Before creating a node, the function looks it up by Path and reuses it if found. Each new node receives a UUID4-based unique identifier (UID) and stores its Name, Path, Level, and Keys, where Keys is copied from a non-Template Docstores node. The function organizes document storage as a graph, likely for a document management or knowledge graph system.
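The path handling described above can be sketched without a database. This is a minimal illustration, not the code from the source file: the helper name folder_levels is hypothetical, and posixpath.join is used in place of os.path.join (an assumption, chosen so the result does not depend on the platform).

```python
import posixpath

PREFIX = "./PDF_docs/"  # the prefix the documented function expects


def folder_levels(file_path):
    """Return [(level, name, path), ...] for each folder between PDF_docs and the file.

    Hypothetical helper: an empty list corresponds to the cases where the
    documented function returns None.
    """
    if not file_path.startswith(PREFIX):
        # The original falls back to the basename, which has no folder parts.
        return []
    parts = file_path[len(PREFIX):].split("/")[:-1]  # drop the filename
    levels = []
    current = "./PDF_docs"
    for level, name in enumerate(parts, start=1):  # levels are 1-based
        current = posixpath.join(current, name)
        levels.append((level, name, current))
    return levels
```

Each tuple corresponds to one Subfolder node the function would look up by Path and create if absent.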

Source Code

def create_folder_hierarchy(graph, file_path):
    """Create a hierarchy of Subfolder nodes based on the file path"""
    # Get path components from the PDF_docs folder
    if file_path.startswith("./PDF_docs/"):
        rel_path = file_path[11:]  # Remove './PDF_docs/' prefix
    else:
        rel_path = os.path.basename(file_path)  # Just use filename if no expected prefix
    
    # If file is directly in the PDF_docs root
    if "/" not in rel_path:
        return None
    
    # Split into folder components
    folders = rel_path.split("/")
    folders.pop()  # Remove the filename itself
    
    if not folders:  # No subfolders
        return None
    
    current_path = "./PDF_docs"
    parent_uid = None
    key=graph.run("match (x:Docstores)  where not ('Template' in labels(x)) return x.Keys").evaluate()
    
    # Create folder hierarchy
    for i, folder in enumerate(folders):
        current_path = os.path.join(current_path, folder)
        folder_escaped = folder.replace("'", "`")
        current_path_escaped = current_path.replace("'", "``")
        
        # Check if this folder node already exists
        result = graph.run(f"MATCH (f:Subfolder {{Path: '{current_path_escaped}'}})"
                          f" RETURN f.UID as uid").data()
        
        if not result:
            # Create new folder node
            folder_uid = str(uuid4())
            if i == 0:
                # Connect to the Rootfolder node since it's the first level
                graph.run(f"MATCH (x:Rootfolder {{Name:'T001'}}) "
                         f" MERGE (x)-[:PATH]->(:Subfolder {{UID: '{folder_uid}', "
                         f"Name: '{folder_escaped}', Path: '{current_path_escaped}', "
                         f"Level: '{i+1}',"
                         f"Keys:'{key}'}})")
            else:
                # Connect to parent folder
                graph.run(f"MATCH (p:Subfolder {{UID: '{parent_uid}'}})"
                         f" MERGE (p)-[:PATH]->(:Subfolder {{UID: '{folder_uid}', "
                         f"Name: '{folder_escaped}', Path: '{current_path_escaped}', "
                         f"Level: '{i+1}',"
                         f"Keys:'{key}'}})")
            parent_uid = folder_uid
        else:
            parent_uid = result[0]['uid']
    
    # Return the UID of the deepest subfolder
    return parent_uid

Parameters

Name       Type  Default  Kind
graph      -     -        positional_or_keyword
file_path  -     -        positional_or_keyword

Parameter Details

graph: A Neo4j graph database connection object (likely from py2neo or similar library) that provides a 'run()' method to execute Cypher queries. This object is used to query and create nodes in the graph database.

file_path: A string representing the file path to process. Expected format is './PDF_docs/folder1/folder2/.../filename.ext'. The function extracts the folder hierarchy from this path. If the path doesn't start with './PDF_docs/', only the basename is used; a basename contains no folder components, so the function returns None without querying the database.

Return Value

Returns a string containing the UID (unique identifier) of the deepest subfolder node, whether it was created by this call or already existed. Returns None if the file sits directly in the PDF_docs root with no subfolders, or if the path does not start with './PDF_docs/'.

Dependencies

  • neo4j
  • py2neo
  • uuid
  • os

Required Imports

from uuid import uuid4
import os

Usage Example

from py2neo import Graph
from uuid import uuid4
import os

# Establish Neo4j connection
graph = Graph('bolt://localhost:7687', auth=('neo4j', 'password'))

# Ensure required nodes exist
graph.run("MERGE (r:Rootfolder {Name:'T001'})")
graph.run("MERGE (d:Docstores {Keys:'default_key'})")

# Create folder hierarchy for a file
file_path = './PDF_docs/research/papers/2023/document.pdf'
deepest_folder_uid = create_folder_hierarchy(graph, file_path)

if deepest_folder_uid:
    print(f'Deepest folder UID: {deepest_folder_uid}')
    # Use this UID to link the document node (query parameters avoid quoting issues)
    graph.run("MATCH (f:Subfolder {UID: $uid}) "
              "MERGE (f)-[:CONTAINS]->(:Document {Path: $path})",
              uid=deepest_folder_uid, path=file_path)
else:
    print('File is in root directory, no subfolders created')

Best Practices

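  • Ensure the Neo4j database has a Rootfolder node with Name='T001' before calling this function. If it is missing, the first-level MATCH finds nothing and no Subfolder is created, yet the function still returns a freshly generated UID
  • Ensure at least one Docstores node exists without a 'Template' label; otherwise the Keys lookup yields no value and the string 'None' is written to the Keys property of every new folder
  • The function builds Cypher queries by string interpolation, which is vulnerable to injection and can break on unexpected characters; use parameterized queries in production
  • File paths should follow the './PDF_docs/' convention for proper hierarchy creation
  • Single quotes in folder names are not escaped but replaced: with one backtick (`) in Name and two backticks (``) in Path. Stored values therefore differ from the real folder names, and the two properties are inconsistent with each other; this needs review
  • Consider adding error handling for database connection failures or malformed paths
  • Repeated calls with the same path are idempotent because each level is first looked up by Path and reused if found. The MERGE itself would not deduplicate, since every new node carries a fresh UID, and the check-then-create sequence is not atomic, so concurrent callers could create duplicates
  • Parent-child relationships are established using the [:PATH] relationship type
  • Each folder node stores its level in the hierarchy (1-based) as a string such as '1', because the value is quoted in the query; cast it before numeric comparisons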
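The parameterized-query recommendation above could look roughly like the following sketch. The helper name build_subfolder_query and the parameter names ($uid, $name, and so on) are hypothetical and do not come from the source file. Level is kept as a string only to match what the documented function stores.

```python
def build_subfolder_query(parent_uid, folder_uid, name, path, level, keys):
    """Return (cypher, params) that attaches a new Subfolder under its parent.

    Hypothetical helper. parent_uid=None means a first-level folder, which is
    attached to the Rootfolder named 'T001' as in the documented function.
    """
    if parent_uid is None:
        match = "MATCH (p:Rootfolder {Name: $root_name})"
        params = {"root_name": "T001"}
    else:
        match = "MATCH (p:Subfolder {UID: $parent_uid})"
        params = {"parent_uid": parent_uid}

    cypher = (
        match
        + " MERGE (p)-[:PATH]->(:Subfolder {UID: $uid, Name: $name,"
        " Path: $path, Level: $level, Keys: $keys})"
    )
    # Values travel as parameters, so quotes in folder names need no rewriting.
    params.update(
        uid=folder_uid,
        name=name,
        path=path,
        level=str(level),  # string, to stay compatible with existing nodes
        keys=keys,
    )
    return cypher, params
```

Assuming graph is a py2neo Graph, whose run() method accepts a parameter dictionary as its second argument, the result would be executed as graph.run(cypher, params). With this approach a folder named O'Brien keeps its real name in both Name and Path.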

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function create_folder_hierarchy_v2 94.1% similar

    Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file path, connecting each folder level with PATH relationships.

    From: /tf/active/vicechatdev/offline_docstore_multi.py
  • function create_folder_hierarchy 94.0% similar

    Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file system path, connecting each folder level with PATH relationships.

    From: /tf/active/vicechatdev/offline_docstore_multi_vice.py
  • function create_folder 68.9% similar

    Creates a nested folder structure on a FileCloud server by traversing a path and creating missing directories.

    From: /tf/active/vicechatdev/filecloud_wuxi_sync.py
  • function add_document_to_graph 65.3% similar

    Creates nodes and relationships in a Neo4j graph database for a processed document, including its text and table chunks, connecting it to a folder hierarchy.

    From: /tf/active/vicechatdev/offline_docstore_multi.py
  • function add_document_to_graph_v1 64.6% similar

    Creates a Neo4j graph node for a processed document and connects it to a folder hierarchy, along with its text and table chunks.

    From: /tf/active/vicechatdev/offline_docstore_multi_vice.py