class DocumentCrawlLog

Maturity: 50

A SharePoint search administration class that provides methods to retrieve information about crawled documents and URLs, including both successful and unsuccessful crawl attempts.

File: /tf/active/vicechatdev/SPFCsync/venv/lib64/python3.11/site-packages/office365/sharepoint/search/administration/document_crawl_log.py
Lines: 8 - 49
Complexity: moderate

Purpose

DocumentCrawlLog is a SharePoint entity class that interfaces with the SharePoint Search Administration API to query and retrieve crawl log information. It allows protocol clients to access details about content that has been crawled by SharePoint's search crawler, including statistics about crawled URLs and information about failed crawl attempts. This is useful for monitoring search indexing health, debugging crawl issues, and auditing what content has been processed by the search system.

Source Code

class DocumentCrawlLog(Entity):
    """This object contains methods that can be used by the protocol client to retrieve information
    about items that were crawled."""

    def __init__(self, context):
        static_path = ResourcePath(
            "Microsoft.SharePoint.Client.Search.Administration.DocumentCrawlLog"
        )
        super(DocumentCrawlLog, self).__init__(context, static_path)

    def get_crawled_urls(self, get_count_only=False):
        """
        Retrieves information about all the contents that were crawled.

        :param bool get_count_only: If true, only the count of the contents crawled MUST be returned.
             If false, all the information about the crawled contents MUST be returned.
        """
        return_type = ClientResult(self.context, SimpleDataTable())
        payload = {"getCountOnly": get_count_only}
        qry = ServiceOperationQuery(
            self, "GetCrawledUrls", None, payload, None, return_type
        )
        self.context.add_query(qry)
        return return_type

    def get_unsuccesful_crawled_urls(self, display_url=None):
        """
        Retrieves information about the contents that failed crawling.

        :param str display_url:
        """
        return_type = ClientResult(self.context, SimpleDataTable())
        payload = {"displayUrl": display_url}
        qry = ServiceOperationQuery(
            self, "GetUnsuccesfulCrawledUrls", None, payload, None, return_type
        )
        self.context.add_query(qry)
        return return_type

    @property
    def entity_type_name(self):
        return "Microsoft.SharePoint.Client.Search.Administration.DocumentCrawlLog"

Parameters

Name | Type | Default | Kind
bases | Entity | - |

Parameter Details

context: A SharePoint client context object that manages the connection and communication with the SharePoint server. This context is required for executing queries and operations against the SharePoint API. It should be an instance of a ClientContext or similar context manager that handles authentication and request lifecycle.

Return Value

The constructor returns an instance of DocumentCrawlLog. The get_crawled_urls method returns a ClientResult containing a SimpleDataTable with information about crawled URLs (or just a count if get_count_only is True). The get_unsuccesful_crawled_urls method returns a ClientResult containing a SimpleDataTable with information about URLs that failed during crawling. The entity_type_name property returns the fully qualified type name string 'Microsoft.SharePoint.Client.Search.Administration.DocumentCrawlLog'.

Class Interface

Methods

__init__(self, context)

Purpose: Initializes a new DocumentCrawlLog instance with the provided SharePoint client context

Parameters:

  • context: SharePoint client context object for API communication

Returns: None (constructor)

get_crawled_urls(self, get_count_only: bool = False) -> ClientResult

Purpose: Retrieves information about all contents that were crawled by the SharePoint search crawler

Parameters:

  • get_count_only: If True, returns only the count of crawled contents. If False, returns all detailed information about crawled contents. Defaults to False.

Returns: ClientResult object containing a SimpleDataTable with crawled URL information or count. The actual data is available after calling context.execute_query().

get_unsuccesful_crawled_urls(self, display_url: str = None) -> ClientResult

Purpose: Retrieves information about contents that failed during the crawling process

Parameters:

  • display_url: Optional URL string to filter results for a specific URL. If None, returns all unsuccessful crawl attempts.

Returns: ClientResult object containing a SimpleDataTable with information about failed crawl attempts. The actual data is available after calling context.execute_query().
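
The deferred behaviour described in the Returns notes above can be pictured with a toy model. The sketch below is not the office365 implementation: FakeResult, FakeContext and get_crawled_count are hypothetical stand-ins, written only to show why a result's value stays empty until execute_query() runs.

```python
# Toy model of the deferred-query pattern. FakeResult and FakeContext are
# hypothetical stand-ins for illustration, not classes from office365.


class FakeResult:
    """Placeholder whose value is filled in only when the query runs."""

    def __init__(self):
        self.value = None


class FakeContext:
    """Collects pending queries and runs them on execute_query()."""

    def __init__(self):
        self._pending = []

    def add_query(self, compute, result):
        # Nothing is executed yet; the query is only queued.
        self._pending.append((compute, result))

    def execute_query(self):
        # Run every queued query and populate its result holder.
        while self._pending:
            compute, result = self._pending.pop(0)
            result.value = compute()


def get_crawled_count(ctx):
    """Queue a fake 'count' query and return its (still empty) result."""
    result = FakeResult()
    ctx.add_query(lambda: 3, result)
    return result


ctx = FakeContext()
count_result = get_crawled_count(ctx)
value_before = count_result.value  # None: the query has not run yet
ctx.execute_query()
value_after = count_result.value   # 3: populated by execute_query()
```

In the real class the same shape applies: the method queues a ServiceOperationQuery on the context and hands back the ClientResult immediately, so reading .value before execute_query() yields nothing useful.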

@property entity_type_name(self) -> str

Purpose: Returns the fully qualified SharePoint entity type name for this class

Returns: String 'Microsoft.SharePoint.Client.Search.Administration.DocumentCrawlLog' representing the SharePoint entity type

Attributes

Name | Type | Description | Scope
context | ClientContext | The SharePoint client context used for executing queries and managing the connection to SharePoint. Inherited from Entity base class. | instance
resource_path | ResourcePath | The static resource path identifying this entity in the SharePoint API hierarchy. Set to 'Microsoft.SharePoint.Client.Search.Administration.DocumentCrawlLog'. Inherited from Entity base class. | instance

Dependencies

  • office365

Required Imports

from office365.runtime.client_result import ClientResult
from office365.runtime.paths.resource_path import ResourcePath
from office365.runtime.queries.service_operation import ServiceOperationQuery
from office365.sharepoint.entity import Entity
from office365.sharepoint.search.simple_data_table import SimpleDataTable

Usage Example

from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.search.administration.document_crawl_log import DocumentCrawlLog

# Authenticate and create context
ctx = ClientContext('https://yourtenant.sharepoint.com/sites/yoursite')
ctx = ctx.with_credentials(UserCredential('username', 'password'))

# Create DocumentCrawlLog instance
crawl_log = DocumentCrawlLog(ctx)

# Get all crawled URLs with full information
crawled_urls_result = crawl_log.get_crawled_urls(get_count_only=False)
ctx.execute_query()
print(f'Crawled URLs: {crawled_urls_result.value}')

# Get only the count of crawled URLs
count_result = crawl_log.get_crawled_urls(get_count_only=True)
ctx.execute_query()
print(f'Total crawled count: {count_result.value}')

# Get unsuccessful crawl attempts for a specific URL
failed_urls_result = crawl_log.get_unsuccesful_crawled_urls(display_url='https://example.com/page')
ctx.execute_query()
print(f'Failed crawls: {failed_urls_result.value}')

Best Practices

  • Always call ctx.execute_query() after calling get_crawled_urls or get_unsuccesful_crawled_urls to actually execute the query and retrieve results
  • The ClientResult objects returned by methods are lazy-loaded; their .value property will only be populated after execute_query() is called
  • Use get_count_only=True when you only need statistics to reduce data transfer and improve performance
  • Ensure the context object has appropriate permissions for Search Administration operations before instantiating this class
  • Handle potential exceptions from execute_query() as network or authentication issues may occur
  • The display_url parameter in get_unsuccesful_crawled_urls can be None to retrieve all failed crawls, or a specific URL to filter results
  • This class inherits from Entity, so it follows the standard SharePoint entity lifecycle and query pattern
  • Results are returned as SimpleDataTable objects which contain tabular data about crawl operations
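
Several of the practices above (count-only queries, explicit execution, and exception handling) can be combined in a small helper. This is a minimal sketch: fetch_crawl_count is a hypothetical name, it relies only on the get_crawled_urls, execute_query and .value members shown in this document, and it catches a broad Exception because the specific exception types raised by the client are not covered here.

```python
import logging

logger = logging.getLogger(__name__)


def fetch_crawl_count(ctx, crawl_log):
    """Return the count-only crawl result, or None if the request fails.

    ctx is expected to be a ClientContext and crawl_log a DocumentCrawlLog;
    only the members documented above are used (duck typing).
    """
    # Ask for the count only, to avoid transferring full crawl details.
    result = crawl_log.get_crawled_urls(get_count_only=True)
    try:
        # The result is populated only once the queued query is executed.
        ctx.execute_query()
    except Exception as exc:  # broad on purpose: exact types are an assumption
        logger.warning("Crawl log query failed: %s", exc)
        return None
    return result.value
```

A call would look like fetch_crawl_count(ctx, DocumentCrawlLog(ctx)). Returning None on failure keeps the caller simple; code that needs to tell failure modes apart would more likely let the exception propagate.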

Similar Components

AI-powered semantic similarity - components with related functionality:

  • class TenantCrawlVersionsInfoProvider 60.5% similar

    A SharePoint client class that manages crawl versions settings at the tenant and site level, providing methods to check and disable version crawling functionality.

    From: /tf/active/vicechatdev/SPFCsync/venv/lib64/python3.11/site-packages/office365/sharepoint/search/administration/tenant_crawl_versions_info_provider.py
  • class SharePointClient 59.9% similar

    A SharePoint client class that provides methods for connecting to SharePoint sites, retrieving documents recursively, downloading file content, and managing document metadata using app-only authentication.

    From: /tf/active/vicechatdev/SPFCsync/sharepoint_client.py
  • class LogExport 59.8% similar

    A class for accessing and managing SharePoint diagnostic log exports, providing methods to retrieve log files and log type information.

    From: /tf/active/vicechatdev/SPFCsync/venv/lib64/python3.11/site-packages/office365/sharepoint/logger/log_export.py
  • class ActivityLogger 59.7% similar

    ActivityLogger is a SharePoint entity class that logs user activities and operations performed on SharePoint list items, tracking metadata such as operation type, affected resources, and timestamps.

    From: /tf/active/vicechatdev/SPFCsync/venv/lib64/python3.11/site-packages/office365/sharepoint/activities/logger.py
  • class LogFileInfo 58.1% similar

    LogFileInfo is a minimal entity class that represents log file information in SharePoint, inheriting all functionality from the Entity base class.

    From: /tf/active/vicechatdev/SPFCsync/venv/lib64/python3.11/site-packages/office365/sharepoint/logger/logFileInfo.py