function analyze_structure
Analyzes and reports on the folder structure of a SharePoint site, displaying folder paths, file counts, and searching for expected folder patterns.
/tf/active/vicechatdev/SPFCsync/analyze_structure.py
Lines 10 - 92
moderate
Purpose
This function connects to a SharePoint site via the Microsoft Graph API, retrieves all documents starting from the root directory ("/"), analyzes the folder structure, and prints a detailed report: the total item count, unique folder paths with file counts, sample files in each folder, and the results of a search for expected folder patterns (numbered folders 01-08 and named folders such as UCJ, Toxicology, and CMC). It is useful for auditing SharePoint document libraries, understanding how they are organized, and verifying that an expected folder structure exists.
Source Code
def analyze_structure():
    """Analyze the current folder structure"""
    config = Config()
    try:
        client = SharePointGraphClient(
            site_url=config.SHAREPOINT_SITE_URL,
            client_id=config.AZURE_CLIENT_ID,
            client_secret=config.AZURE_CLIENT_SECRET
        )
        print("✅ SharePoint Graph client initialized successfully")
        print(f"Site ID: {client.site_id}")
        print(f"Drive ID: {client.drive_id}")
        print()
    except Exception as e:
        print(f"❌ Failed to initialize client: {e}")
        return
    print("🔍 ANALYZING CURRENT FOLDER STRUCTURE")
    print("=" * 60)
    # Get all documents and analyze their paths
    try:
        documents = client.get_all_documents("/")
        print(f"✅ Found {len(documents)} total items")
        print()
        # Analyze folder distribution: group documents by their folder path
        folder_paths = set()
        file_by_folder = {}
        for doc in documents:
            folder_path = doc.get('folder_path', 'Unknown')
            folder_paths.add(folder_path)
            if folder_path not in file_by_folder:
                file_by_folder[folder_path] = []
            file_by_folder[folder_path].append(doc)
        print(f"📁 Found {len(folder_paths)} unique folder paths:")
        print("-" * 40)
        for folder_path in sorted(folder_paths):
            files_in_folder = len(file_by_folder[folder_path])
            print(f"📁 {folder_path}: {files_in_folder} files")
            # Show the first few files as examples
            if files_in_folder > 0:
                example_files = file_by_folder[folder_path][:3]
                for file_info in example_files:
                    print(f"  📄 {file_info.get('name', 'Unknown')}")
                if files_in_folder > 3:
                    print(f"  ... and {files_in_folder - 3} more files")
            print()
        # Look for expected patterns (case-insensitive substring match on
        # folder paths and on file names)
        print("\n🔍 SEARCHING FOR EXPECTED FOLDER PATTERNS")
        print("-" * 50)
        expected_patterns = [
            "01", "02", "03", "04", "05", "06", "07", "08",
            "UCJ", "Toxicology", "CMC", "Quality", "Clinical",
            "Regulatory", "Marketing", "Manufacturing"
        ]
        for pattern in expected_patterns:
            matching_folders = [path for path in folder_paths if pattern.lower() in path.lower()]
            matching_files = [doc for doc in documents if pattern.lower() in doc.get('name', '').lower()]
            if matching_folders or matching_files:
                print(f"🎯 Pattern '{pattern}':")
                if matching_folders:
                    print(f"  📁 Folders: {matching_folders}")
                if matching_files:
                    print(f"  📄 Files: {len(matching_files)} files contain this pattern")
                print()
    except Exception as e:
        print(f"❌ Failed to get documents: {e}")
        import traceback
        traceback.print_exc()
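The folder grouping and pattern search are plain data processing, so they can be pulled out of the printing code and tested without a SharePoint connection. This is a minimal sketch, not part of the original module: the helper names group_by_folder and find_pattern_matches are hypothetical, and it assumes (as the source does) that each document is a dict with optional 'folder_path' and 'name' keys.

```python
from collections import defaultdict


def group_by_folder(documents):
    """Map each folder path to the list of documents in it.

    Mirrors the original loop: a missing 'folder_path' falls back to 'Unknown'.
    """
    file_by_folder = defaultdict(list)
    for doc in documents:
        file_by_folder[doc.get("folder_path", "Unknown")].append(doc)
    return dict(file_by_folder)


def find_pattern_matches(documents, patterns):
    """Return {pattern: (matching_folders, matching_file_count)} for patterns that hit.

    Uses the same case-insensitive substring test as analyze_structure(); folders
    are sorted here for stable output (the original prints them in set order).
    """
    folder_paths = set(group_by_folder(documents))
    results = {}
    for pattern in patterns:
        needle = pattern.lower()
        folders = sorted(p for p in folder_paths if needle in p.lower())
        file_count = sum(1 for d in documents if needle in d.get("name", "").lower())
        if folders or file_count:
            results[pattern] = (folders, file_count)
    return results


# Hypothetical sample data in the shape the original code reads.
docs = [
    {"folder_path": "/01 UCJ", "name": "protocol.docx"},
    {"folder_path": "/01 UCJ", "name": "Toxicology summary.pdf"},
    {"folder_path": "/Archive 2001", "name": "old.txt"},
]
groups = group_by_folder(docs)
matches = find_pattern_matches(docs, ["01", "UCJ", "Toxicology", "CMC"])
```

The "01" result illustrates a caveat that the original shares: plain substring matching also counts "/Archive 2001" as a numbered folder. If only a leading number should count, a prefix check on the last path segment would be stricter.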
Return Value
This function does not return any value (implicitly returns None). It prints analysis results directly to stdout, including success/error messages, folder structure information, file counts, and pattern matching results.
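Because the results are only printed, a caller that needs them as text can capture stdout with the standard library's contextlib.redirect_stdout. This is a minimal sketch: demo_report is a hypothetical stand-in so the example runs without SharePoint; in practice you would pass analyze_structure instead.

```python
import contextlib
import io


def capture_stdout(func, *args, **kwargs):
    """Call func and return everything it printed to stdout as one string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


def demo_report():
    # Hypothetical stand-in for analyze_structure(); prints report lines.
    print("Found 3 total items")
    print("Found 2 unique folder paths:")


report = capture_stdout(demo_report)
lines = report.splitlines()
```

Note that traceback.print_exc() writes to stderr, so a traceback from a failed document retrieval is not captured by this helper; contextlib.redirect_stderr covers that stream.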
Dependencies
- sharepoint_graph_client
- config
- traceback
Required Imports
from sharepoint_graph_client import SharePointGraphClient
from config import Config
Conditional/Optional Imports
These imports are only needed under specific conditions:
import traceback
Condition: only used when an exception occurs during document retrieval to print detailed error information
Optional
Usage Example
# Ensure config.py exists with the required settings, for example:
# class Config:
#     SHAREPOINT_SITE_URL = 'https://yourtenant.sharepoint.com/sites/yoursite'
#     AZURE_CLIENT_ID = 'your-client-id'
#     AZURE_CLIENT_SECRET = 'your-client-secret'
from analyze_structure import analyze_structure
# Run the analysis
analyze_structure()
# Output will be printed to console showing:
# - SharePoint connection status
# - Total number of items found
# - List of folder paths with file counts
# - Sample files in each folder
# - Pattern matching results for expected folders
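The example config.py above hard-codes the client secret. A common alternative is to read the same three settings from environment variables. This is a minimal sketch that keeps the attribute names used by analyze_structure(); the environment variable names are assumptions and must match whatever your deployment actually sets.

```python
import os


class Config:
    """Reads the SharePoint/Azure settings from environment variables.

    The variable names are assumptions. env[...] raises KeyError when a
    variable is missing, so misconfiguration fails fast at Config() time.
    """

    def __init__(self, environ=None):
        # Allow an explicit mapping (useful for tests); default to the real environment.
        env = os.environ if environ is None else environ
        self.SHAREPOINT_SITE_URL = env["SHAREPOINT_SITE_URL"]
        self.AZURE_CLIENT_ID = env["AZURE_CLIENT_ID"]
        self.AZURE_CLIENT_SECRET = env["AZURE_CLIENT_SECRET"]


# Example with an explicit mapping instead of the real environment.
config = Config({
    "SHAREPOINT_SITE_URL": "https://yourtenant.sharepoint.com/sites/yoursite",
    "AZURE_CLIENT_ID": "your-client-id",
    "AZURE_CLIENT_SECRET": "your-client-secret",
})
```

Because analyze_structure() calls Config() outside its try blocks, a missing variable would raise a KeyError to the caller rather than being printed.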
Best Practices
- Ensure the Azure AD application has the appropriate SharePoint permissions (Sites.Read.All or Sites.ReadWrite.All) before running
- The function prints directly to stdout, so redirect stdout (for example with contextlib.redirect_stdout) if you need to capture the results programmatically
- The function returns None in every case, including when client initialization fails (it prints an error and returns early), so a caller cannot detect failure from the return value alone
- The function analyzes all documents from root ('/'), which may be slow for large SharePoint sites with many files
- Expected patterns list can be customized by modifying the expected_patterns list in the source code
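- Errors from client initialization and document retrieval are caught and printed, not re-raised, so wrapping the call in an outer try-except will not catch them (only an exception raised by Config() itself propagates); for production use, consider refactoring so that failures are raised or returned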
- The function shows only the first 3 files per folder as examples - modify the slice [:3] if you need more/fewer examples
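Since analyze_structure() swallows its errors, one option is a variant that reports success explicitly. This is a minimal sketch, not the original code: fetch_documents and make_client are hypothetical names (make_client is any zero-argument factory, for example a lambda that builds SharePointGraphClient from Config), and FakeClient exists only so the example runs offline.

```python
import traceback


def fetch_documents(make_client, root="/"):
    """Return (ok, documents); ok is False if either stage fails.

    Same two stages as analyze_structure(), but the caller can branch on ok
    instead of parsing printed output.
    """
    try:
        client = make_client()
    except Exception as exc:
        print(f"Failed to initialize client: {exc}")
        return False, []
    try:
        return True, client.get_all_documents(root)
    except Exception:
        traceback.print_exc()  # goes to stderr, as in the original
        return False, []


class FakeClient:
    # Hypothetical offline stand-in exposing the one method the sketch calls.
    def get_all_documents(self, root):
        return [{"folder_path": root, "name": "readme.txt"}]


def broken_factory():
    # Simulates a client constructor that fails, e.g. on bad credentials.
    raise RuntimeError("bad credentials")


ok, docs = fetch_documents(FakeClient)
failed_ok, failed_docs = fetch_documents(broken_factory)
```

Returning a status keeps the original's "report and continue" behavior while letting scripts and schedulers detect a failed run.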
Similar Components
AI-powered semantic similarity - components with related functionality:
- function test_folder_structure (80.9% similar)
- function search_and_locate (77.2% similar)
- function search_for_folders (74.8% similar)
- function explore_site_structure (73.6% similar)
- function main_v24 (72.8% similar)