class Study_overview

Maturity: 40

A class that generates comprehensive study overview reports by querying a Neo4j graph database and producing Excel files with ID mappings, audit logs, and Gantt chart visualizations of study progress.

File: /tf/active/vicechatdev/resources/documents.py
Lines: 150 - 223
Complexity: complex

Purpose

Study_overview is responsible for extracting and visualizing study-related data from a Neo4j database. It retrieves sample IDs, tracks task completion timelines across various laboratory processes (from quote to final report), and generates downloadable reports including Excel spreadsheets and interactive HTML Gantt charts. This class is designed for laboratory information management systems to provide stakeholders with a complete overview of study progress and sample tracking.

Source Code

class Study_overview():
    graph = Graph(config.DB_ADDR, auth=config.DB_AUTH, name=config.DB_NAME)
    
    def __init__(self, study):
        self.study=study
        self.id_table = self.get_ids(study)
        self.table_buffer, self.img_buffer = self.get_task_trail(study)
        self.files = [(self.id_table,f'{study}_IDs.xlsx'), (self.table_buffer,f'{study}_audit_log.xlsx'), (self.img_buffer,f'{study}_gantt_chart.html')]
    
    def get_ids(self, study):
        df=self.graph.run(f"""
MATCH (:Study {{N:'{study}'}})-[*]->(g:Group)-->(e:ExpItem)-->(o:Organ) WHERE NOT g.N = 'Z' AND NOT o.external_N = 'None'
RETURN DISTINCT e.N as CPathID, o.external_N as CustomerID ORDER BY CPathID""").to_data_frame()
        buffer = io.BytesIO()
        df.to_excel(buffer, index=False)
        buffer.seek(0)
        return buffer
    
    def get_task_trail(self, study):
        df = self.graph.run(f"""
            MATCH (s:Study {{N:'{study}'}})
            RETURN 'Quote' as Task, Date(s.quote) as Start, count(s) as Completed
            UNION ALL
            MATCH (s:Study {{N:'{study}'}})
            RETURN 'Study Plan' as Task, Date(s.studyplan) as Start, count(s) as Completed
            UNION ALL
            MATCH (s:Study {{N:'{study}'}})
            RETURN 'Delivery' as Task, Date(s.delivered) as Start, count(s) as Completed
            UNION ALL
            MATCH (s:Study {{N:'{study}'}})
            RETURN 'Draft Report' as Task, Date(s.draftreport) as Start, count(s) as Completed
            UNION ALL
            MATCH (s:Study {{N:'{study}'}})
            RETURN 'Final Report' as Task, Date(s.finalreport) as Start, count(s) as Completed
            UNION ALL
            MATCH (:Study {{N:'{study}'}})-[*]->(o:Organ)
            RETURN 'Sample Registration' as Task, Date(o.registered) as Start, count(o) as Completed
            UNION ALL
            MATCH (:Study {{N:'{study}'}})-[*]->(p:Parblock)
            RETURN 'Grossing' as Task, Date(p.created) as Start, count(p) as Completed
            UNION ALL
            MATCH (:Study {{N:'{study}'}})-[*]->(p:Parblock)
            RETURN 'Embedding' as Task, Date(p.embedded) as Start, count(p) as Completed
            UNION ALL
            MATCH (:Study {{N:'{study}'}})-[*]->(s:Slide) 
            RETURN 'Sectioning' as Task, Date(s.created) as Start, count(s) as Completed
            UNION ALL
            MATCH (:Study {{N:'{study}'}})-[*]->(s:Slide) 
            RETURN 'Staining' as Task, Date(s.stained) as Start, count(s) as Completed
        """).to_data_frame()
        df=df.dropna()
        df.Start=df.Start.apply(lambda x: x.to_native())
        df['End'] = df.Start + dt.timedelta(days=1)
        fig = px.timeline(df, x_start="Start", x_end="End", y="Task", color="Completed", custom_data=['Completed'],
                         category_orders={"Task":['Quote','Study Plan','Delivery','Sample Registration',
                                                  'Grossing','Embedding','Sectioning','Staining','Assessment',
                                                  'Draft Report','Final Report']},
                         title=f"{study} time table",
                         color_continuous_scale=[[0, '#F7D06D'], [1, '#3DCAB1']],)
        fig.update_layout({
        'plot_bgcolor': 'rgba(0, 0, 0, 0)',
        })
        config = dict({
            'displaylogo':False,
            'modeBarButtonsToRemove':['zoom','zoomIn','zoomOut','pan','lasso2d','select2d','autoScale','resetScale']
        }) 
        fig.update_traces(hovertemplate= "<b>%{y}</b><br> %{x}: %{customdata[0]}")
        table_buffer = io.BytesIO()
        df.to_excel(table_buffer, index=False)
        table_buffer.seek(0)
        figure_buffer = io.StringIO()
        fig.write_html(figure_buffer, config=config)
        figure_buffer.seek(0)
        return table_buffer, figure_buffer

Parameters

Name | Type | Default | Kind
study | str | - | positional or keyword

Parameter Details

study: The study identifier (string) used to query the Neo4j database. This should match the 'N' property of Study nodes in the graph database. It's used to filter all related data including groups, experimental items, organs, parblocks, and slides associated with this specific study.

Return Value

Instantiating Study_overview yields an instance whose report buffers are already populated. The 'files' attribute holds three (buffer, filename) pairs: (1) id_table, a BytesIO buffer containing an Excel file that maps CPathIDs to CustomerIDs; (2) table_buffer, a BytesIO buffer containing the Excel audit log of task dates and counts; (3) img_buffer, a StringIO buffer containing the HTML Gantt chart. The methods return the same objects: get_ids() returns the BytesIO Excel buffer, and get_task_trail() returns a tuple of (BytesIO table buffer, StringIO HTML buffer).
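Each buffer is rewound with seek(0) before it is returned, which is what lets a caller read() it straight away. The following stdlib-only sketch illustrates that write-then-rewind pattern; the payloads are placeholders standing in for the real Excel and HTML content, and `make_buffers` is a hypothetical helper, not part of the class.

```python
import io

def make_buffers():
    """Mimic the buffer pattern used by the class: write, then rewind."""
    binary = io.BytesIO()
    binary.write(b"placeholder for Excel bytes")
    binary.seek(0)  # rewind so the next read() starts at the beginning

    text = io.StringIO()
    text.write("<html>placeholder for the chart</html>")
    text.seek(0)
    return binary, text

binary, text = make_buffers()
first_read = binary.read()     # reads from position 0 to the end
second_read = binary.read()    # position is now at the end, so this is empty
full_copy = binary.getvalue()  # getvalue() ignores the current position
html = text.getvalue()
```

Using getvalue() rather than read() avoids surprises when a buffer has already been consumed once.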

Class Interface

Methods

__init__(self, study: str) -> None

Purpose: Initializes the Study_overview instance, executes all database queries, and generates all report buffers

Parameters:

  • study: String identifier for the study to generate reports for, must match Study.N property in Neo4j database

Returns: None - initializes instance with populated attributes

get_ids(self, study: str) -> io.BytesIO

Purpose: Queries the database for all CPathIDs and CustomerIDs associated with the study and returns them as an Excel file buffer

Parameters:

  • study: String identifier for the study to retrieve IDs for

Returns: BytesIO buffer containing an Excel file with columns 'CPathID' and 'CustomerID', sorted by CPathID

get_task_trail(self, study: str) -> tuple[io.BytesIO, io.StringIO]

Purpose: Queries the database for all task completion dates and counts, generates both an Excel audit log and an interactive Gantt chart visualization

Parameters:

  • study: String identifier for the study to retrieve task trail for

Returns: Tuple of (table_buffer, figure_buffer) where table_buffer is BytesIO containing Excel file with task data, and figure_buffer is StringIO containing HTML Gantt chart
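The post-processing in get_task_trail() comes down to two steps: rows without a date are dropped, and every remaining row gets an End one day after its Start so that it renders as a one-day bar. Here is a plain-Python sketch of that logic. The rows are made up for illustration and `add_one_day_bars` is a hypothetical helper; the class itself performs these steps on a DataFrame.

```python
import datetime as dt

def add_one_day_bars(rows):
    """Drop rows lacking a Start date and give the rest a one-day span."""
    result = []
    for row in rows:
        if row["Start"] is None:
            continue  # same effect as df.dropna() here: undated tasks are left out
        result.append({**row, "End": row["Start"] + dt.timedelta(days=1)})
    return result

# Illustrative rows only; real values come from the Cypher query.
rows = [
    {"Task": "Quote", "Start": dt.date(2024, 1, 10), "Completed": 1},
    {"Task": "Final Report", "Start": None, "Completed": 1},
    {"Task": "Grossing", "Start": dt.date(2024, 2, 29), "Completed": 12},
]
bars = add_one_day_bars(rows)
```

A task whose date has not been recorded yet therefore appears in neither the audit log nor the chart.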

Attributes

Name | Type | Description | Scope
graph | Graph | Class-level Neo4j Graph database connection object shared across all instances, initialized with config settings | class
study | str | The study identifier passed to the constructor, stored for reference | instance
id_table | io.BytesIO | BytesIO buffer containing Excel file with CPathID to CustomerID mappings for the study | instance
table_buffer | io.BytesIO | BytesIO buffer containing Excel file with task audit log data including task names, start dates, and completion counts | instance
img_buffer | io.StringIO | StringIO buffer containing HTML representation of the interactive Gantt chart visualization | instance
files | list[tuple[io.BytesIO or io.StringIO, str]] | List of tuples containing (buffer, filename) pairs for all three generated files: IDs Excel, audit log Excel, and Gantt chart HTML | instance

Dependencies

  • neo4j_driver
  • datetime
  • io
  • json
  • config
  • python-docx
  • pylibdmtx
  • docxtpl
  • PIL
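  • plotly
  • pandas (query results are handled as DataFrames; to_excel() additionally needs an Excel writer engine such as openpyxl)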

Required Imports

from neo4j_driver import *
import datetime as dt
import io
import json
import config
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.shared import RGBColor
from docx.shared import Pt
from docx.shared import Length
from docx.shared import Inches
from docx.shared import Cm
from docx.enum.table import WD_ROW_HEIGHT_RULE
from pylibdmtx.pylibdmtx import decode
from pylibdmtx.pylibdmtx import encode
from docxtpl import DocxTemplate
from docxtpl import InlineImage
from PIL import Image
import plotly.express as px

Usage Example

# Ensure config.py has DB_ADDR, DB_AUTH, DB_NAME defined
# from neo4j_driver import *
# import config
# from Study_overview import Study_overview

# Instantiate the class with a study identifier
study_overview = Study_overview('STUDY-2024-001')

# Access the generated files
id_excel_buffer, id_filename = study_overview.files[0]
audit_log_buffer, audit_filename = study_overview.files[1]
gantt_chart_buffer, gantt_filename = study_overview.files[2]

# Save files to disk
with open(id_filename, 'wb') as f:
    f.write(id_excel_buffer.getvalue())

with open(audit_filename, 'wb') as f:
    f.write(audit_log_buffer.getvalue())

# The chart buffer holds text (StringIO); write it as UTF-8
with open(gantt_filename, 'w', encoding='utf-8') as f:
    f.write(gantt_chart_buffer.getvalue())

# Or access individual components
id_table = study_overview.id_table
table_buffer = study_overview.table_buffer
img_buffer = study_overview.img_buffer
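
When the three outputs are offered as a single download, the (buffer, filename) pairs can be packed into one in-memory ZIP archive. The sketch below is one way a caller might do that and is not part of the class: `bundle_files` is a hypothetical helper, and the stand-in `files` list only mimics the shape of `study_overview.files`.

```python
import io
import zipfile

def bundle_files(files):
    """Pack (buffer, filename) pairs into a single in-memory ZIP archive."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for buffer, filename in files:
            # getvalue() returns bytes for BytesIO and str for StringIO;
            # writestr() accepts both (str is encoded as UTF-8).
            zf.writestr(filename, buffer.getvalue())
    archive.seek(0)  # rewind so the archive can be read or sent immediately
    return archive

# Stand-in for study_overview.files, with placeholder contents.
files = [
    (io.BytesIO(b"ids"), "STUDY-2024-001_IDs.xlsx"),
    (io.BytesIO(b"log"), "STUDY-2024-001_audit_log.xlsx"),
    (io.StringIO("<html></html>"), "STUDY-2024-001_gantt_chart.html"),
]
archive = bundle_files(files)
names = zipfile.ZipFile(archive).namelist()
```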

Best Practices

  • Instantiate the class only when you need to generate reports, as it immediately executes database queries and generates all outputs in the constructor
  • Ensure the Neo4j database connection is available before instantiation, as the class-level Graph object is shared across all instances
  • The class interpolates the study argument directly into its Cypher queries via f-strings; validate the value, or use query parameters where the driver supports them, to prevent injection attacks
  • All data is loaded into memory as buffers during initialization, so be mindful of memory usage for large studies
  • The class creates three file buffers immediately upon instantiation; access them via the 'files' attribute as tuples of (buffer, filename)
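  • Each buffer is rewound to position 0 after it is written, so it can be read or copied to a file immediately
  • The Gantt chart uses a fixed color scale and task order; modify category_orders in get_task_trail() to change the ordering. Note that category_orders lists an 'Assessment' task for which no query returns rows, so that task never appears in the chart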
  • The class does not handle database connection errors; wrap instantiation in try-except blocks for production use
  • The Graph object is a class variable shared across all instances, which may cause issues in multi-threaded environments
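  • Date values are converted from the driver's date type to native Python date objects via to_native(); rows with a missing date are removed by dropna() beforehand, so tasks without a recorded date appear in neither the audit log nor the chart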
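
The injection concern above can be addressed in two complementary ways: validating the identifier, and passing it as a query parameter instead of interpolating it. The sketch below shows both under stated assumptions. The identifier pattern is a guess at a reasonable format, `validate_study_id` and `build_id_query` are hypothetical helpers, and whether the project's `neo4j_driver` wrapper accepts a parameters dictionary in `run()` is not confirmed by the source.

```python
import re

# Assumed identifier format: letters, digits, '-', '_' and '.' only.
# Adjust the pattern to whatever naming scheme the studies really use.
_STUDY_ID = re.compile(r"[A-Za-z0-9._-]{1,64}")

def validate_study_id(study):
    """Return the identifier unchanged, or raise ValueError if it looks unsafe."""
    if not isinstance(study, str) or not _STUDY_ID.fullmatch(study):
        raise ValueError(f"invalid study identifier: {study!r}")
    return study

# Parameterised form of the ID query: the value travels separately from the
# query text, so quotes in the identifier cannot alter the Cypher statement.
ID_QUERY = """
MATCH (:Study {N: $study})-[*]->(g:Group)-->(e:ExpItem)-->(o:Organ)
WHERE NOT g.N = 'Z' AND NOT o.external_N = 'None'
RETURN DISTINCT e.N AS CPathID, o.external_N AS CustomerID ORDER BY CPathID
"""

def build_id_query(study):
    """Return (query, parameters) ready to hand to a driver's run() call."""
    return ID_QUERY, {"study": validate_study_id(study)}

query, params = build_id_query("STUDY-2024-001")
```

If the wrapper does accept parameters, the pair could be passed on as `graph.run(query, params)`; otherwise the validation step alone still rejects identifiers containing quotes or braces.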

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function get_training_overview 60.9% similar

    Retrieves a comprehensive training overview for the admin panel, including training plans, active assignments, and recent completions from a Neo4j graph database.

    From: /tf/active/vicechatdev/CDocs/controllers/training_controller.py
  • class Tasklist 58.8% similar

    A class for tracking and managing the status of study tasks through a Neo4j graph database, monitoring progress through a predefined sequence of workflow steps.

    From: /tf/active/vicechatdev/resources/taskmanager.py
  • class options 58.7% similar

    A Panel-based UI class for managing slide release visibility in a study management system, allowing users to view and toggle the release status of slides at various hierarchical levels (Study, Group, Animal, Block, Slide).

    From: /tf/active/vicechatdev/options.py
  • class Total_tasks 56.7% similar

    A class that retrieves and manages an overview of all current tasks from a Neo4j graph database, organized by task type and filtered by usergroup.

    From: /tf/active/vicechatdev/resources/taskmanager.py
  • function generate_neo4j_schema_report 54.9% similar

    Generates a comprehensive schema report of a Neo4j graph database, including node labels, relationships, properties, constraints, indexes, and sample data, outputting multiple file formats (JSON, HTML, Python snippets, Cypher examples).

    From: /tf/active/vicechatdev/neo4j_schema_report.py