generate_neo4j_schema_report

function generate_neo4j_schema_report

Maturity: 47

Generates a comprehensive schema report of a Neo4j graph database, including node labels, relationships, properties, constraints, indexes, and sample data, outputting multiple report formats (JSON, HTML, Python snippets, Cypher examples).

File:
/tf/active/vicechatdev/neo4j_schema_report.py

Lines:
24 - 243

Complexity:
complex

Purpose

This function connects to a Neo4j database and introspects its schema: node labels, relationship types, property keys, constraints, indexes, and which node labels are connected by which relationship types. Property lists per label and per relationship type are inferred from a single sample node or relationship, so properties that appear only on other instances are not captured. The function writes five output files: JSON schema data, diagram data, an HTML report, Python code snippets for working with the schema, and Cypher query examples. It is useful for database documentation, onboarding new developers, schema analysis, and generating boilerplate code for working with the database.

Source Code

def generate_neo4j_schema_report(
    neo4j_uri="bolt://localhost:7687", 
    neo4j_username="neo4j", 
    neo4j_password="password",
    output_dir="./neo4j_schema"
):
    """
    Generate a comprehensive schema report of a Neo4j database
    
    Parameters:
    - neo4j_uri: Neo4j server URI
    - neo4j_username: Neo4j username
    - neo4j_password: Neo4j password
    - output_dir: Directory to save the report files
    """
    print("Connecting to Neo4j and generating schema report...")
    
    # Create output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)
    
    # Connect to Neo4j
    driver = GraphDatabase.driver(neo4j_uri, auth=(neo4j_username, neo4j_password))
    
    try:
        # Dictionary to store all schema information
        schema_info = {
            "node_labels": [],
            "relationship_types": [],
            "property_keys": [],
            "constraints": [],
            "indexes": [],
            "node_counts": {},
            "relationship_counts": {},
            "sample_nodes": {},
            "sample_relationships": {},
            "node_properties": defaultdict(set),
            "relationship_properties": defaultdict(set),
            "node_relationships": defaultdict(lambda: defaultdict(list))
        }
        
        with driver.session() as session:
            # Get node labels (per-label counts are queried separately below)
            print("Retrieving node labels and counts...")
            result = session.run("""
                CALL db.labels() YIELD label
                RETURN label
                ORDER BY label
            """)
            
            for record in result:
                label = record["label"]
                schema_info["node_labels"].append(label)
            
            # Get node counts per label
            for label in schema_info["node_labels"]:
                count_result = session.run(f"MATCH (n:{label}) RETURN count(n) as count")
                count = count_result.single()["count"]
                schema_info["node_counts"][label] = count
            
            # Get relationship types and counts
            print("Retrieving relationship types and counts...")
            result = session.run("""
                CALL db.relationshipTypes() YIELD relationshipType
                RETURN relationshipType
                ORDER BY relationshipType
            """)
            
            for record in result:
                rel_type = record["relationshipType"]
                schema_info["relationship_types"].append(rel_type)
            
            # Get relationship counts per type
            for rel_type in schema_info["relationship_types"]:
                count_result = session.run(f"MATCH ()-[r:{rel_type}]->() RETURN count(r) as count")
                count = count_result.single()["count"]
                schema_info["relationship_counts"][rel_type] = count
            
            # Get property keys
            print("Retrieving property keys...")
            result = session.run("""
                CALL db.propertyKeys() YIELD propertyKey
                RETURN propertyKey
                ORDER BY propertyKey
            """)
            
            for record in result:
                schema_info["property_keys"].append(record["propertyKey"])
            
            # Get constraints
            print("Retrieving constraints...")
            try:
                # For Neo4j 4.x+
                result = session.run("SHOW CONSTRAINTS")
                for record in result:
                    schema_info["constraints"].append(dict(record))
            except Exception:
                # For older Neo4j versions
                try:
                    result = session.run("CALL db.constraints()")
                    for record in result:
                        schema_info["constraints"].append(dict(record))
                except Exception:
                    print("Could not retrieve constraints information.")
            
            # Get indexes
            print("Retrieving indexes...")
            try:
                # For Neo4j 4.x+
                result = session.run("SHOW INDEXES")
                for record in result:
                    schema_info["indexes"].append(dict(record))
            except Exception:
                # For older Neo4j versions
                try:
                    result = session.run("CALL db.indexes()")
                    for record in result:
                        schema_info["indexes"].append(dict(record))
                except Exception:
                    print("Could not retrieve indexes information.")
            
            # Get node properties per label
            print("Analyzing node properties per label...")
            for label in schema_info["node_labels"]:
                # Get a sample node to see its properties
                result = session.run(f"""
                    MATCH (n:{label})
                    RETURN n LIMIT 1
                """)
                
                record = result.single()
                if record:
                    node = record["n"]
                    schema_info["sample_nodes"][label] = dict(node)
                    
                    # Record all properties for this label
                    for key in node.keys():
                        schema_info["node_properties"][label].add(key)
            
            # Convert sets to lists for JSON serialization
            for label in schema_info["node_properties"]:
                schema_info["node_properties"][label] = sorted(list(schema_info["node_properties"][label]))
            
            # Get relationship properties per type
            print("Analyzing relationship properties per type...")
            for rel_type in schema_info["relationship_types"]:
                # Get a sample relationship to see its properties
                result = session.run(f"""
                    MATCH ()-[r:{rel_type}]->()
                    RETURN r LIMIT 1
                """)
                
                record = result.single()
                if record:
                    rel = record["r"]
                    schema_info["sample_relationships"][rel_type] = dict(rel)
                    
                    # Record all properties for this relationship type
                    for key in rel.keys():
                        schema_info["relationship_properties"][rel_type].add(key)
            
            # Convert sets to lists for JSON serialization
            for rel_type in schema_info["relationship_properties"]:
                schema_info["relationship_properties"][rel_type] = sorted(list(schema_info["relationship_properties"][rel_type]))
            
            # Analyze node relationships (which labels connect to which)
            print("Analyzing relationships between node labels...")
            for source_label in schema_info["node_labels"]:
                for target_label in schema_info["node_labels"]:
                    for rel_type in schema_info["relationship_types"]:
                        # Check if this relationship exists between these labels
                        result = session.run(f"""
                            MATCH (a:{source_label})-[r:{rel_type}]->(b:{target_label})
                            RETURN count(r) as count LIMIT 1
                        """)
                        
                        count = result.single()["count"]
                        if count > 0:
                            schema_info["node_relationships"][source_label][target_label].append({
                                "type": rel_type,
                                "count": count
                            })
            
            # Convert defaultdict to regular dict for JSON serialization
            schema_info["node_relationships"] = {k: dict(v) for k, v in schema_info["node_relationships"].items()}
            schema_info["node_properties"] = dict(schema_info["node_properties"])
            schema_info["relationship_properties"] = dict(schema_info["relationship_properties"])
            
        # Generate schema diagram data
        diagram_data = generate_diagram_data(schema_info)
        
        # Save all schema information as JSON using the custom encoder
        schema_file = os.path.join(output_dir, "neo4j_schema.json")
        with open(schema_file, "w") as f:
            json.dump(schema_info, f, indent=2, cls=Neo4jEncoder)
        
        # Save diagram data using the custom encoder
        diagram_file = os.path.join(output_dir, "neo4j_diagram.json")
        with open(diagram_file, "w") as f:
            json.dump(diagram_data, f, indent=2, cls=Neo4jEncoder)
        
        # Generate HTML report
        generate_html_report(schema_info, output_dir)
        
        # Generate Python code snippets
        generate_python_snippets(schema_info, output_dir)
        
        # Generate Cypher query examples
        generate_cypher_examples(schema_info, output_dir)
        
        print(f"Schema report generated in {output_dir}")
        print(f"  - Full schema: {schema_file}")
        print(f"  - Diagram data: {diagram_file}")
        print(f"  - HTML report: {os.path.join(output_dir, 'neo4j_schema_report.html')}")
        print(f"  - Python snippets: {os.path.join(output_dir, 'neo4j_python_snippets.py')}")
        print(f"  - Cypher examples: {os.path.join(output_dir, 'neo4j_cypher_examples.cypher')}")
        
    except Exception as e:
        print(f"Error generating schema report: {str(e)}")
    finally:
        driver.close()
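
The label-to-label analysis near the end of the session block runs one query for every (source label, target label, relationship type) combination. A single aggregating query could return the same counts in one round trip. The sketch below is one possible alternative and is not part of the original module: the query text, the constant `PATTERN_QUERY`, and the helper `build_node_relationships` are illustrative names, and the sample rows are made-up data. The query scans every relationship in the graph, so whether it is actually faster depends on the size of the database.

```python
from collections import defaultdict

# Hypothetical replacement for the nested label/type loops: one query that
# groups every relationship by (source label, type, target label).
PATTERN_QUERY = """
MATCH (a)-[r]->(b)
UNWIND labels(a) AS source_label
UNWIND labels(b) AS target_label
RETURN source_label, type(r) AS rel_type, target_label, count(r) AS count
ORDER BY source_label, target_label, rel_type
"""


def build_node_relationships(rows):
    """Fold query rows into the nested dict layout used by schema_info."""
    nested = defaultdict(lambda: defaultdict(list))
    for row in rows:
        nested[row["source_label"]][row["target_label"]].append(
            {"type": row["rel_type"], "count": row["count"]}
        )
    # Plain dicts so the result can be passed to json.dump directly.
    return {source: dict(targets) for source, targets in nested.items()}


# Rows as they might come back from session.run(PATTERN_QUERY); the labels
# and counts here are made-up sample data.
sample_rows = [
    {"source_label": "Person", "rel_type": "ACTED_IN", "target_label": "Movie", "count": 3},
    {"source_label": "Person", "rel_type": "DIRECTED", "target_label": "Movie", "count": 1},
]
node_relationships = build_node_relationships(sample_rows)
print(node_relationships)
```

Against a live database, the records returned by `session.run(PATTERN_QUERY)` could be passed to the helper directly, since driver records support lookup by key.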

Parameters

Name	Type	Default	Kind
`neo4j_uri`	-	'bolt://localhost:7687'	positional_or_keyword
`neo4j_username`	-	'neo4j'	positional_or_keyword
`neo4j_password`	-	'password'	positional_or_keyword
`output_dir`	-	'./neo4j_schema'	positional_or_keyword

Parameter Details

neo4j_uri: The connection URI for the Neo4j database server. Should be in the format 'bolt://hostname:port' or 'neo4j://hostname:port'. Default is 'bolt://localhost:7687' for a local Neo4j instance.

neo4j_username: The username for authenticating with the Neo4j database. Default is 'neo4j', which is the default username for Neo4j installations.

neo4j_password: The password for authenticating with the Neo4j database. Default is 'password', but should be changed to match your actual database password.

output_dir: The directory path where all generated report files will be saved. The directory will be created if it doesn't exist. Default is './neo4j_schema' in the current working directory.

Return Value

This function returns nothing (implicitly None). It writes five files to the output directory: 'neo4j_schema.json' (complete schema information), 'neo4j_diagram.json' (diagram visualization data), 'neo4j_schema_report.html' (human-readable HTML report), 'neo4j_python_snippets.py' (Python code examples), and 'neo4j_cypher_examples.cypher' (Cypher query examples). Status messages and file locations are printed to stdout. Exceptions raised inside the function's try block are caught and printed rather than propagated, so a failed run also returns None.
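
Because the function returns nothing, downstream code has to read the generated files. The following is a minimal sketch of loading 'neo4j_schema.json' and listing each label with its node count and sampled properties. The helper name `summarize_schema` and the demo data are assumptions for illustration; only the key names (`node_labels`, `node_counts`, `node_properties`) come from the source code above.

```python
import json
import os
import tempfile


def summarize_schema(schema_path):
    """Return (label, node count, property names) tuples from a schema file."""
    with open(schema_path) as f:
        schema = json.load(f)
    summary = []
    for label in schema["node_labels"]:
        count = schema["node_counts"].get(label, 0)
        # A label with no sample node has no entry in node_properties.
        properties = schema["node_properties"].get(label, [])
        summary.append((label, count, properties))
    return summary


# Stand-in for a generated neo4j_schema.json; the values are made-up.
demo_schema = {
    "node_labels": ["Movie", "Person"],
    "node_counts": {"Movie": 38, "Person": 133},
    "node_properties": {"Movie": ["released", "title"], "Person": ["born", "name"]},
}
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "neo4j_schema.json")
    with open(demo_path, "w") as f:
        json.dump(demo_schema, f)
    summary = summarize_schema(demo_path)

for label, count, properties in summary:
    print(f"{label}: {count} nodes, properties: {', '.join(properties)}")
```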

Dependencies

neo4j
pandas
os
json
sys
collections
datetime

Required Imports

import os
import json
from neo4j import GraphDatabase
from collections import defaultdict

Conditional/Optional Imports

These imports are only needed under specific conditions:

from neo4j import time

Condition: Required for handling Neo4j temporal types in the Neo4jEncoder class (used for JSON serialization)

Required (conditional)

import pandas as pd

Condition: May be used by helper functions (generate_html_report, generate_python_snippets, generate_cypher_examples, generate_diagram_data) that are called by this function

Required (conditional)

from datetime import datetime

Condition: May be used by helper functions for timestamp generation in reports

Required (conditional)

import sys

Condition: May be used by helper functions for error handling or system operations

Required (conditional)
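
The `Neo4jEncoder` class referenced above is not shown on this page. As a rough idea of what such an encoder has to do, the sketch below converts temporal values and sets into JSON-friendly forms. The class name `TemporalSafeEncoder` is hypothetical, and the `iso_format()` lookup is an assumption about how driver temporal objects can be rendered; it is checked by duck typing so the example runs without the neo4j package installed.

```python
import json
from datetime import date, datetime


class TemporalSafeEncoder(json.JSONEncoder):
    """Hypothetical stand-in for Neo4jEncoder; the real class is not shown here."""

    def default(self, obj):
        # Python datetime/date values expose isoformat().
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        # Assumption: driver temporal types expose iso_format(); checked by
        # duck typing so this sketch does not need the neo4j package.
        iso_format = getattr(obj, "iso_format", None)
        if callable(iso_format):
            return iso_format()
        # Sets are not JSON-serializable; emit a sorted list instead.
        if isinstance(obj, (set, frozenset)):
            return sorted(obj)
        # Last resort for any other unknown type.
        return str(obj)


encoded = json.dumps(
    {"created": datetime(2024, 1, 15, 9, 30), "tags": {"b", "a"}},
    cls=TemporalSafeEncoder,
    sort_keys=True,
)
print(encoded)
```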

Usage Example

# Basic usage with default local Neo4j instance
generate_neo4j_schema_report(
    neo4j_uri='bolt://localhost:7687',
    neo4j_username='neo4j',
    neo4j_password='your_password',
    output_dir='./schema_reports'
)

# Usage with remote Neo4j instance
generate_neo4j_schema_report(
    neo4j_uri='neo4j://production-server.example.com:7687',
    neo4j_username='admin',
    neo4j_password='secure_password',
    output_dir='/path/to/reports/production_schema'
)

# Minimal usage with defaults (local instance)
generate_neo4j_schema_report(neo4j_password='mypassword')
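
The calls above pass credentials as literals. One way to keep them out of the code is to read them from environment variables, sketched below. The helper `connection_settings_from_env` and the variable names `NEO4J_URI`, `NEO4J_USERNAME`, `NEO4J_PASSWORD`, and `NEO4J_SCHEMA_DIR` are assumptions chosen for this example, not names the module defines.

```python
import os


def connection_settings_from_env(environ=None):
    """Build keyword arguments for generate_neo4j_schema_report from env vars."""
    env = os.environ if environ is None else environ
    password = env.get("NEO4J_PASSWORD")
    if not password:
        raise ValueError("NEO4J_PASSWORD is not set")
    return {
        "neo4j_uri": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "neo4j_username": env.get("NEO4J_USERNAME", "neo4j"),
        "neo4j_password": password,
        "output_dir": env.get("NEO4J_SCHEMA_DIR", "./neo4j_schema"),
    }


# Demonstration with an explicit mapping instead of the real environment.
settings = connection_settings_from_env({"NEO4J_PASSWORD": "example-only"})
print(settings["neo4j_uri"])

# With the function imported, the call would be:
# generate_neo4j_schema_report(**settings)
```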

Best Practices

Avoid running this function while the database is under heavy load: it issues one query per label, one per relationship type, and one for every (source label, target label, relationship type) combination
For databases with many labels and relationship types, the combination queries dominate the run time: L labels and R relationship types produce L × L × R queries
Always use secure passwords and avoid hardcoding credentials; consider using environment variables or configuration files
The function requires several helper functions (generate_diagram_data, generate_html_report, generate_python_snippets, generate_cypher_examples) and a Neo4jEncoder class to be defined in the same module
The function tries SHOW CONSTRAINTS and SHOW INDEXES first and falls back to the db.constraints() and db.indexes() procedures for older Neo4j versions; if both attempts fail, it prints a message and continues without that information
Ensure sufficient disk space in the output directory as the function generates multiple files
The function interpolates label and relationship type names into Cypher queries with f-strings and does not quote them; names containing spaces or other special characters produce invalid queries unless wrapped in backticks, and the pattern is unsafe if names can come from untrusted input
The driver connection is properly closed in a finally block to prevent connection leaks
For production use, consider adding connection timeout parameters and retry logic for network resilience
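
The point above about unquoted label and relationship type names can be handled with a small quoting helper. The sketch below is an illustration rather than code from the module: `quote_identifier` and `count_nodes_query` are hypothetical names. It relies on the Cypher rule that an identifier can be wrapped in backticks, with any backtick inside it doubled.

```python
def quote_identifier(name):
    """Wrap a label or relationship type in backticks for use in Cypher."""
    if not name:
        raise ValueError("identifier must be a non-empty string")
    # Cypher escapes a backtick inside a quoted identifier by doubling it.
    return "`" + name.replace("`", "``") + "`"


def count_nodes_query(label):
    """Build the per-label count query with the label safely quoted."""
    return f"MATCH (n:{quote_identifier(label)}) RETURN count(n) AS count"


print(count_nodes_query("Person"))
print(count_nodes_query("Purchase Order"))
```

The same helper would apply to every f-string query in the function, including the relationship type in `MATCH ()-[r:...]->()` patterns.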

Similar Components

AI-powered semantic similarity - components with related functionality:

function generate_diagram_data 68.1% similar
Transforms Neo4j schema information into a structured format suitable for graph visualization, creating separate node and edge data structures.
From: /tf/active/vicechatdev/neo4j_schema_report.py

function generate_cypher_examples 65.7% similar
Generates a comprehensive Cypher query examples file for interacting with a Neo4j graph database based on the provided schema information.
From: /tf/active/vicechatdev/neo4j_schema_report.py

function generate_python_snippets 64.9% similar
Generates a Python file containing code snippets and helper functions for interacting with a Neo4j graph database based on the provided schema information.
From: /tf/active/vicechatdev/neo4j_schema_report.py

function generate_html_report 64.7% similar
Generates an HTML report from the schema information.
From: /tf/active/vicechatdev/neo4j_schema_report.py

function init_connections 49.8% similar
Initializes and returns a Neo4j database session and driver connection using configuration settings.
From: /tf/active/vicechatdev/offline_docstore_multi_vice.py
