function add_document_to_graph_v1
Creates a Neo4j graph node for a processed document and connects it to a folder hierarchy, along with its text and table chunks.
/tf/active/vicechatdev/offline_docstore_multi_vice.py
1220 - 1275
moderate
Purpose
This function integrates a processed document into a Neo4j knowledge graph by creating a Document node with metadata, linking it to either a subfolder or root folder, and creating child nodes for text and table chunks extracted from the document. It's designed for document management systems that use graph databases to represent hierarchical document structures with searchable content chunks.
Source Code
def add_document_to_graph(session, processed_doc, deepest_folder_uid,rootfolder_uid):
    """Add processed document to Neo4j graph"""
    file_path = processed_doc["file_path"]
    file_path_escaped = file_path.replace("'", "``")
    filename = processed_doc["file_name"]
    filename_escaped = filename.replace("'", "``")
    text_chunks = processed_doc.get("text_chunks", [])
    table_chunks = processed_doc.get("table_chunks", [])
    # Generate UID for the document
    doc_uid = str(uuid4())
    key = evaluate_query(session,"match (x:Docstores) where not ('Template' in labels(x)) return x.Keys")
    # Connect document to folder
    if deepest_folder_uid:
        query = f"MATCH (f:Subfolder {{UID: '{deepest_folder_uid}'}}) " \
                f"MERGE (f)-[:PATH]->(n:Document {{UID:'{doc_uid}', " \
                f"Name:'{filename_escaped}', " \
                f"File:'{file_path_escaped}', " \
                f"Type:'{processed_doc['file_type']}', " \
                f"Keys:'{key}'}})"
        run_query(session,query)
    else:
        # Connect to root folder
        query = f"MATCH (x:Rootfolder {{UID:'{rootfolder_uid}'}}) " \
                f"MERGE (x)-[:PATH]->(n:Document {{UID:'{doc_uid}', " \
                f"Name:'{filename_escaped}', " \
                f"File:'{file_path_escaped}', " \
                f"Type:'{processed_doc['file_type']}', " \
                f"Keys:'{key}'}})"
        run_query(session,query)
    # Connect chunks to the document (unchanged)
    for i,text in enumerate(text_chunks):
        text1_escaped=text[1].replace("'", "``")
        text0_escaped=text[0].replace("'", "``")
        out=run_query(session,f"MATCH (x {{UID:'{doc_uid}'}}) "
                              f"MERGE (x)-[:CHUNK]->(n:Text_chunk {{UID:'{text[2]}',"
                              f"Name:'{filename_escaped}:Text:{str(i)}',"
                              f"Text:'{text1_escaped}',"
                              f"Parent:'{text0_escaped}',"
                              f"Keys:'{key}'}})")
    for i,text in enumerate(table_chunks):
        text1_escaped=text[1].replace("'", "``")
        text2_escaped=text[2].replace("'", "``")
        text0_escaped=text[0].replace("'", "``")
        out=run_query(session,f"MATCH (x {{UID:'{doc_uid}'}}) "
                              f"MERGE (x)-[:CHUNK]->(n:Table_chunk {{UID:'{text[3]}',"
                              f"Name:'{filename_escaped}:Table:{str(i)}',"
                              f"Text:'{text2_escaped}',"
                              f"Html:'{text1_escaped}',"
                              f"Parent:'{text0_escaped}',"
                              f"Keys:'{key}'}})")
    return doc_uid
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| session | - | - | positional_or_keyword |
| processed_doc | - | - | positional_or_keyword |
| deepest_folder_uid | - | - | positional_or_keyword |
| rootfolder_uid | - | - | positional_or_keyword |
Parameter Details
session: Neo4j database session object used to execute Cypher queries against the graph database. Must be an active session from neo4j.GraphDatabase.driver().session()
processed_doc: Dictionary containing document metadata and content. Expected keys: 'file_path' (str: full path to file), 'file_name' (str: name of file), 'file_type' (str: document type/extension), 'text_chunks' (list of tuples: [(parent, text, uid), ...]), 'table_chunks' (list of tuples: [(parent, html, text, uid), ...])
deepest_folder_uid: String UID of the deepest subfolder in the hierarchy where this document should be attached. If None or empty, the document will be attached to the root folder instead
rootfolder_uid: String UID of the root folder node in the graph. Used as fallback when deepest_folder_uid is not provided
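The expected shape of processed_doc can be checked before the function is called. The following is a minimal sketch and is not part of the original module: the helper name validate_processed_doc is hypothetical, and the tuple lengths are taken from the parameter description above.

```python
REQUIRED_KEYS = ("file_path", "file_name", "file_type")


def validate_processed_doc(processed_doc):
    """Hypothetical helper: raise ValueError if processed_doc is malformed."""
    missing = [k for k in REQUIRED_KEYS if k not in processed_doc]
    if missing:
        raise ValueError(f"processed_doc is missing keys: {missing}")

    # text_chunks entries are (parent, text, uid); table_chunks entries
    # are (parent, html, text, uid). Both lists are optional.
    for name, arity in (("text_chunks", 3), ("table_chunks", 4)):
        for i, chunk in enumerate(processed_doc.get(name, [])):
            if len(chunk) != arity:
                raise ValueError(
                    f"{name}[{i}] has {len(chunk)} fields, expected {arity}"
                )
    return processed_doc
```

Failing early with a ValueError keeps a malformed dictionary from producing a partially written document in the graph.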
Return Value
Returns a string containing the generated UUID (doc_uid) for the newly created Document node in the Neo4j graph. This UID can be used to reference or query this document in subsequent operations.
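As an illustration of using the returned UID in a later operation, the sketch below builds a parameterized Cypher query that lists the chunks attached to a document. The helper name build_chunk_lookup is hypothetical; the Document label, CHUNK relationship, and UID and Name properties are the ones written by the function above.

```python
def build_chunk_lookup(doc_uid):
    """Hypothetical helper: return (cypher, params) listing a document's chunks."""
    cypher = (
        "MATCH (d:Document {UID: $doc_uid})-[:CHUNK]->(c) "
        "RETURN labels(c) AS labels, c.UID AS uid, c.Name AS name"
    )
    return cypher, {"doc_uid": doc_uid}


# Placeholder UID for illustration; in practice this is the value
# returned by add_document_to_graph.
cypher, params = build_chunk_lookup("00000000-0000-0000-0000-000000000000")

# With the official neo4j Python driver, this could be executed as:
#     records = list(session.run(cypher, params))
```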
Dependencies
neo4j
uuid
Required Imports
from uuid import uuid4
from neo4j import GraphDatabase
Usage Example
from neo4j import GraphDatabase
from uuid import uuid4

# Establish Neo4j connection
driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', 'password'))
session = driver.session()

# Prepare processed document data
processed_doc = {
    'file_path': '/documents/report.pdf',
    'file_name': 'report.pdf',
    'file_type': 'pdf',
    'text_chunks': [
        ('Section 1', 'This is the first paragraph', 'chunk-uid-1'),
        ('Section 2', 'This is the second paragraph', 'chunk-uid-2')
    ],
    'table_chunks': [
        ('Table 1', '<table><tr><td>Data</td></tr></table>', 'Data', 'table-uid-1')
    ]
}

# Add document to graph
doc_uid = add_document_to_graph(
    session=session,
    processed_doc=processed_doc,
    deepest_folder_uid='subfolder-123-uid',
    rootfolder_uid='root-folder-uid'
)
print(f'Document added with UID: {doc_uid}')

session.close()
driver.close()
Best Practices
- Ensure the Neo4j session is properly opened and closed using context managers or explicit close() calls
- Validate that processed_doc contains all required keys before calling this function to avoid KeyError exceptions
- The function escapes only single quotes (replacing ' with ``), which alters the stored text and does not cover other special characters such as backslashes; prefer parameterized queries over string formatting to prevent Cypher injection
- Ensure UUIDs for chunks are unique and generated before passing to this function
- The function depends on external helper functions (evaluate_query, run_query) which must be implemented and available
- Consider wrapping this function in a try-except block to handle Neo4j connection errors and query failures
- For large documents with many chunks, consider batching chunk creation queries for better performance
- The Keys property is retrieved once and applied to all nodes - ensure this is the intended behavior for your use case
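The parameterized-query and batching suggestions above can be combined. The following is a sketch under stated assumptions, not the module's implementation: the helper name build_text_chunk_batch is hypothetical, and unlike the original it merges each chunk on UID alone and sets the remaining properties afterwards. Values travel as query parameters, so no quote escaping is needed.

```python
def build_text_chunk_batch(doc_uid, filename, key, text_chunks):
    """Hypothetical helper: return (cypher, params) creating all text chunks in one query."""
    # One row per chunk; each chunk is a (parent, text, uid) tuple.
    rows = [
        {
            "uid": uid,
            "name": f"{filename}:Text:{i}",
            "text": text,
            "parent": parent,
        }
        for i, (parent, text, uid) in enumerate(text_chunks)
    ]
    # UNWIND turns the list parameter into one MERGE per row, so the
    # whole batch is sent to the server in a single round trip.
    cypher = (
        "MATCH (d:Document {UID: $doc_uid}) "
        "UNWIND $rows AS row "
        "MERGE (d)-[:CHUNK]->(c:Text_chunk {UID: row.uid}) "
        "SET c.Name = row.name, c.Text = row.text, "
        "c.Parent = row.parent, c.Keys = $key"
    )
    return cypher, {"doc_uid": doc_uid, "rows": rows, "key": key}


# Illustrative values only; apostrophes are passed through unchanged.
cypher, params = build_text_chunk_batch(
    "doc-1", "report.pdf", "k", [("Section 1", "It's fine", "chunk-uid-1")]
)

# With the official neo4j Python driver, this could be executed as:
#     session.run(cypher, params)
```

A table-chunk variant would follow the same pattern with an extra html field per row and the Table_chunk label.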
Similar Components
AI-powered semantic similarity - components with related functionality:
- function add_document_to_graph (98.6% similar)
- function create_folder_hierarchy_v1 (64.6% similar)
- function create_folder_hierarchy (63.3% similar)
- function create_folder_hierarchy_v2 (62.5% similar)
- function create_document (61.3% similar)