class DocumentCrawlLog
A SharePoint search administration class that provides methods to retrieve information about crawled documents and URLs, including both successful and unsuccessful crawl attempts.
/tf/active/vicechatdev/SPFCsync/venv/lib64/python3.11/site-packages/office365/sharepoint/search/administration/document_crawl_log.py
Source lines: 8-49
Complexity: moderate
Purpose
DocumentCrawlLog is a SharePoint entity class that interfaces with the SharePoint Search Administration API to query and retrieve crawl log information. It allows protocol clients to access details about content that has been crawled by SharePoint's search crawler, including statistics about crawled URLs and information about failed crawl attempts. This is useful for monitoring search indexing health, debugging crawl issues, and auditing what content has been processed by the search system.
Source Code
class DocumentCrawlLog(Entity):
    """This object contains methods that can be used by the protocol client to retrieve information
    about items that were crawled."""

    def __init__(self, context):
        static_path = ResourcePath(
            "Microsoft.SharePoint.Client.Search.Administration.DocumentCrawlLog"
        )
        super(DocumentCrawlLog, self).__init__(context, static_path)

    def get_crawled_urls(self, get_count_only=False):
        """
        Retrieves information about all the contents that were crawled.

        :param bool get_count_only: If true, only the count of the contents crawled MUST be returned.
            If false, all the information about the crawled contents MUST be returned.
        """
        return_type = ClientResult(self.context, SimpleDataTable())
        payload = {"getCountOnly": get_count_only}
        qry = ServiceOperationQuery(
            self, "GetCrawledUrls", None, payload, None, return_type
        )
        self.context.add_query(qry)
        return return_type

    def get_unsuccesful_crawled_urls(self, display_url=None):
        """
        Retrieves information about the contents that failed crawling.

        :param str display_url: Optional URL used to filter the unsuccessfully crawled items.
        """
        return_type = ClientResult(self.context, SimpleDataTable())
        payload = {"displayUrl": display_url}
        qry = ServiceOperationQuery(
            self, "GetUnsuccesfulCrawledUrls", None, payload, None, return_type
        )
        self.context.add_query(qry)
        return return_type

    @property
    def entity_type_name(self):
        return "Microsoft.SharePoint.Client.Search.Administration.DocumentCrawlLog"
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| bases | Entity | - | - |
Parameter Details
context: A SharePoint client context object that manages the connection and communication with the SharePoint server. This context is required for executing queries and operations against the SharePoint API. It should be an instance of a ClientContext or similar context manager that handles authentication and request lifecycle.
Return Value
The constructor returns an instance of DocumentCrawlLog. The get_crawled_urls method returns a ClientResult containing a SimpleDataTable with information about crawled URLs (or just a count if get_count_only is True). The get_unsuccesful_crawled_urls method returns a ClientResult containing a SimpleDataTable with information about URLs that failed during crawling. The entity_type_name property returns the fully qualified type name string 'Microsoft.SharePoint.Client.Search.Administration.DocumentCrawlLog'.
Class Interface
Methods
__init__(self, context)
Purpose: Initializes a new DocumentCrawlLog instance with the provided SharePoint client context
Parameters:
context: SharePoint client context object for API communication
Returns: None (constructor)
get_crawled_urls(self, get_count_only: bool = False) -> ClientResult
Purpose: Retrieves information about all contents that were crawled by the SharePoint search crawler
Parameters:
get_count_only: If True, returns only the count of crawled contents. If False, returns all detailed information about crawled contents. Defaults to False.
Returns: ClientResult object containing a SimpleDataTable with crawled URL information or count. The actual data is available after calling context.execute_query().
get_unsuccesful_crawled_urls(self, display_url: str = None) -> ClientResult
Purpose: Retrieves information about contents that failed during the crawling process
Parameters:
display_url: Optional URL string to filter results for a specific URL. If None, returns all unsuccessful crawl attempts.
Returns: ClientResult object containing a SimpleDataTable with information about failed crawl attempts. The actual data is available after calling context.execute_query().
entity_type_name -> str (property)
Purpose: Returns the fully qualified SharePoint entity type name for this class
Returns: String 'Microsoft.SharePoint.Client.Search.Administration.DocumentCrawlLog' representing the SharePoint entity type
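Both methods follow the library's deferred-execution pattern: each call queues a query on the context and returns an empty ClientResult whose .value is populated only after execute_query() runs. A minimal sketch of that pattern in plain Python (the Toy* classes below are illustrative stand-ins, not the office365 implementations):

```python
# Toy sketch of the deferred-query pattern used by DocumentCrawlLog.
# These classes are hypothetical stand-ins, not the office365 API.

class ToyClientResult:
    """Holds a value that is filled in later, when the context executes."""
    def __init__(self):
        self.value = None

class ToyContext:
    """Queues operations and runs them all on execute_query()."""
    def __init__(self):
        self._queries = []

    def add_query(self, operation, return_type):
        self._queries.append((operation, return_type))

    def execute_query(self):
        for operation, return_type in self._queries:
            return_type.value = operation()  # simulate the server round-trip
        self._queries.clear()

class ToyCrawlLog:
    def __init__(self, context):
        self.context = context

    def get_crawled_urls(self, get_count_only=False):
        return_type = ToyClientResult()
        # Pretend the "server" knows about two crawled URLs.
        data = ["https://a.example", "https://b.example"]
        operation = (lambda: len(data)) if get_count_only else (lambda: data)
        self.context.add_query(operation, return_type)
        return return_type  # .value is still None at this point

ctx = ToyContext()
log = ToyCrawlLog(ctx)
result = log.get_crawled_urls(get_count_only=True)
print(result.value)   # None -- nothing has executed yet
ctx.execute_query()
print(result.value)   # 2 -- populated after execution
```

This mirrors why the real ClientResult values must only be read after context.execute_query() has been called.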
Attributes
| Name | Type | Description | Scope |
|---|---|---|---|
| context | ClientContext | The SharePoint client context used for executing queries and managing the connection to SharePoint. Inherited from the Entity base class. | instance |
| resource_path | ResourcePath | The static resource path identifying this entity in the SharePoint API hierarchy. Set to 'Microsoft.SharePoint.Client.Search.Administration.DocumentCrawlLog'. Inherited from the Entity base class. | instance |
Dependencies
office365
Required Imports
from office365.runtime.client_result import ClientResult
from office365.runtime.paths.resource_path import ResourcePath
from office365.runtime.queries.service_operation import ServiceOperationQuery
from office365.sharepoint.entity import Entity
from office365.sharepoint.search.simple_data_table import SimpleDataTable
Usage Example
from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.search.administration.document_crawl_log import DocumentCrawlLog
# Authenticate and create context
ctx = ClientContext('https://yourtenant.sharepoint.com/sites/yoursite')
ctx = ctx.with_credentials(UserCredential('username', 'password'))
# Create DocumentCrawlLog instance
crawl_log = DocumentCrawlLog(ctx)
# Get all crawled URLs with full information
crawled_urls_result = crawl_log.get_crawled_urls(get_count_only=False)
ctx.execute_query()
print(f'Crawled URLs: {crawled_urls_result.value}')
# Get only the count of crawled URLs
count_result = crawl_log.get_crawled_urls(get_count_only=True)
ctx.execute_query()
print(f'Total crawled count: {count_result.value}')
# Get unsuccessful crawl attempts for a specific URL
failed_urls_result = crawl_log.get_unsuccesful_crawled_urls(display_url='https://example.com/page')
ctx.execute_query()
print(f'Failed crawls: {failed_urls_result.value}')
Best Practices
- Always call ctx.execute_query() after calling get_crawled_urls or get_unsuccesful_crawled_urls to actually execute the query and retrieve results
- The ClientResult objects returned by methods are lazy-loaded; their .value property will only be populated after execute_query() is called
- Use get_count_only=True when you only need statistics to reduce data transfer and improve performance
- Ensure the context object has appropriate permissions for Search Administration operations before instantiating this class
- Handle potential exceptions from execute_query() as network or authentication issues may occur
- The display_url parameter in get_unsuccesful_crawled_urls can be None to retrieve all failed crawls, or a specific URL to filter results
- This class inherits from Entity, so it follows the standard SharePoint entity lifecycle and query pattern
- Results are returned as SimpleDataTable objects which contain tabular data about crawl operations
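The exception-handling advice above can be packaged as a small retry helper. A hedged sketch in plain Python: execute_with_retry is a hypothetical helper (not part of office365), execute is any zero-argument callable such as ctx.execute_query, and the broad except should be narrowed to the exception types your office365 version actually raises:

```python
import time

def execute_with_retry(execute, attempts=3, delay_seconds=1.0):
    """Call a zero-argument callable (e.g. ctx.execute_query), retrying on failure.

    Transient network or throttling errors often succeed on retry; auth or
    permission errors will keep failing, so the last exception is re-raised.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return execute()
        except Exception as error:  # narrow this in real code
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds * attempt)  # simple linear backoff
    raise last_error

# Usage with a flaky callable standing in for ctx.execute_query:
calls = {"count": 0}

def flaky_execute():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(execute_with_retry(flaky_execute, attempts=3, delay_seconds=0))  # ok
```

In real code the wrapped call would be `execute_with_retry(ctx.execute_query)` placed after queuing get_crawled_urls or get_unsuccesful_crawled_urls.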
Similar Components
AI-powered semantic similarity - components with related functionality:
- class TenantCrawlVersionsInfoProvider (60.5% similar)
- class SharePointClient (59.9% similar)
- class LogExport (59.8% similar)
- class ActivityLogger (59.7% similar)
- class LogFileInfo (58.1% similar)