class DocumentCrawlLog
A SharePoint search administration class that provides methods to retrieve information about crawled documents and URLs, including both successful and unsuccessful crawl attempts.
/tf/active/vicechatdev/SPFCsync/venv/lib64/python3.11/site-packages/office365/sharepoint/search/administration/document_crawl_log.py
Source lines: 8-49
Complexity: moderate
Purpose
DocumentCrawlLog is a SharePoint entity class that interfaces with the SharePoint Search Administration API to query and retrieve crawl log information. It allows protocol clients to access details about content that has been crawled by SharePoint's search crawler, including statistics about crawled URLs and information about failed crawl attempts. This is useful for monitoring search indexing health, debugging crawl issues, and auditing what content has been processed by the search system.
Source Code
class DocumentCrawlLog(Entity):
    """This object contains methods that can be used by the protocol client to retrieve information
    about items that were crawled."""

    def __init__(self, context):
        static_path = ResourcePath(
            "Microsoft.SharePoint.Client.Search.Administration.DocumentCrawlLog"
        )
        super(DocumentCrawlLog, self).__init__(context, static_path)

    def get_crawled_urls(self, get_count_only=False):
        """
        Retrieves information about all the contents that were crawled.

        :param bool get_count_only: If true, only the count of the contents crawled MUST be returned.
            If false, all the information about the crawled contents MUST be returned.
        """
        return_type = ClientResult(self.context, SimpleDataTable())
        payload = {"getCountOnly": get_count_only}
        qry = ServiceOperationQuery(
            self, "GetCrawledUrls", None, payload, None, return_type
        )
        self.context.add_query(qry)
        return return_type

    def get_unsuccesful_crawled_urls(self, display_url=None):
        """
        Retrieves information about the contents that failed crawling.

        :param str display_url: Optional URL used to filter the unsuccessfully crawled items.
        """
        return_type = ClientResult(self.context, SimpleDataTable())
        payload = {"displayUrl": display_url}
        qry = ServiceOperationQuery(
            self, "GetUnsuccesfulCrawledUrls", None, payload, None, return_type
        )
        self.context.add_query(qry)
        return return_type

    @property
    def entity_type_name(self):
        return "Microsoft.SharePoint.Client.Search.Administration.DocumentCrawlLog"
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| bases | Entity | - | - |
Parameter Details
context: A SharePoint client context object that manages the connection and communication with the SharePoint server. This context is required for executing queries and operations against the SharePoint API. It should be an instance of a ClientContext or similar context manager that handles authentication and request lifecycle.
Return Value
The constructor returns an instance of DocumentCrawlLog. The get_crawled_urls method returns a ClientResult containing a SimpleDataTable with information about crawled URLs (or just a count if get_count_only is True). The get_unsuccesful_crawled_urls method returns a ClientResult containing a SimpleDataTable with information about URLs that failed during crawling. The entity_type_name property returns the fully qualified type name string 'Microsoft.SharePoint.Client.Search.Administration.DocumentCrawlLog'.
Class Interface
Methods
__init__(self, context)
Purpose: Initializes a new DocumentCrawlLog instance with the provided SharePoint client context
Parameters:
context: SharePoint client context object for API communication
Returns: None (constructor)
get_crawled_urls(self, get_count_only: bool = False) -> ClientResult
Purpose: Retrieves information about all contents that were crawled by the SharePoint search crawler
Parameters:
get_count_only: If True, returns only the count of crawled contents. If False, returns all detailed information about crawled contents. Defaults to False.
Returns: ClientResult object containing a SimpleDataTable with crawled URL information or count. The actual data is available after calling context.execute_query().
get_unsuccesful_crawled_urls(self, display_url: str = None) -> ClientResult
Purpose: Retrieves information about contents that failed during the crawling process
Parameters:
display_url: Optional URL string to filter results for a specific URL. If None, returns all unsuccessful crawl attempts.
Returns: ClientResult object containing a SimpleDataTable with information about failed crawl attempts. The actual data is available after calling context.execute_query().
entity_type_name -> str (property)
Purpose: Returns the fully qualified SharePoint entity type name for this class
Returns: String 'Microsoft.SharePoint.Client.Search.Administration.DocumentCrawlLog' representing the SharePoint entity type
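Both methods follow the library's deferred-execution pattern: each call queues a query on the context and returns an empty ClientResult whose .value is populated only after execute_query() runs. A minimal sketch of that pattern in plain Python (the Toy* classes below are illustrative stand-ins, not the office365 implementations):

```python
# Toy sketch of the deferred-query pattern used by DocumentCrawlLog.
# These classes are hypothetical stand-ins, not the office365 API.

class ToyClientResult:
    """Holds a value that is filled in later, when the context executes."""
    def __init__(self):
        self.value = None

class ToyContext:
    """Queues operations and runs them all on execute_query()."""
    def __init__(self):
        self._queries = []

    def add_query(self, operation, return_type):
        self._queries.append((operation, return_type))

    def execute_query(self):
        for operation, return_type in self._queries:
            return_type.value = operation()  # simulate the server round-trip
        self._queries.clear()

class ToyCrawlLog:
    def __init__(self, context):
        self.context = context

    def get_crawled_urls(self, get_count_only=False):
        return_type = ToyClientResult()
        # Pretend the "server" knows about two crawled URLs.
        data = ["https://a.example", "https://b.example"]
        operation = (lambda: len(data)) if get_count_only else (lambda: data)
        self.context.add_query(operation, return_type)
        return return_type  # .value is still None at this point

ctx = ToyContext()
log = ToyCrawlLog(ctx)
result = log.get_crawled_urls(get_count_only=True)
print(result.value)   # None -- nothing has executed yet
ctx.execute_query()
print(result.value)   # 2 -- populated after execution
```

This mirrors why the real ClientResult values must only be read after context.execute_query() has been called.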
Attributes
| Name | Type | Description | Scope |
|---|---|---|---|
| context | ClientContext | The SharePoint client context used for executing queries and managing the connection to SharePoint. Inherited from the Entity base class. | instance |
| resource_path | ResourcePath | The static resource path identifying this entity in the SharePoint API hierarchy. Set to 'Microsoft.SharePoint.Client.Search.Administration.DocumentCrawlLog'. Inherited from the Entity base class. | instance |
Dependencies
office365
Required Imports
from office365.runtime.client_result import ClientResult
from office365.runtime.paths.resource_path import ResourcePath
from office365.runtime.queries.service_operation import ServiceOperationQuery
from office365.sharepoint.entity import Entity
from office365.sharepoint.search.simple_data_table import SimpleDataTable
Usage Example
from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.search.administration.document_crawl_log import DocumentCrawlLog
# Authenticate and create context
ctx = ClientContext('https://yourtenant.sharepoint.com/sites/yoursite')
ctx = ctx.with_credentials(UserCredential('username', 'password'))
# Create DocumentCrawlLog instance
crawl_log = DocumentCrawlLog(ctx)
# Get all crawled URLs with full information
crawled_urls_result = crawl_log.get_crawled_urls(get_count_only=False)
ctx.execute_query()
print(f'Crawled URLs: {crawled_urls_result.value}')
# Get only the count of crawled URLs
count_result = crawl_log.get_crawled_urls(get_count_only=True)
ctx.execute_query()
print(f'Total crawled count: {count_result.value}')
# Get unsuccessful crawl attempts for a specific URL
failed_urls_result = crawl_log.get_unsuccesful_crawled_urls(display_url='https://example.com/page')
ctx.execute_query()
print(f'Failed crawls: {failed_urls_result.value}')
Best Practices
- Always call ctx.execute_query() after calling get_crawled_urls or get_unsuccesful_crawled_urls to actually execute the query and retrieve results
- The ClientResult objects returned by methods are lazy-loaded; their .value property will only be populated after execute_query() is called
- Use get_count_only=True when you only need statistics to reduce data transfer and improve performance
- Ensure the context object has appropriate permissions for Search Administration operations before instantiating this class
- Handle potential exceptions from execute_query() as network or authentication issues may occur
- The display_url parameter in get_unsuccesful_crawled_urls can be None to retrieve all failed crawls, or a specific URL to filter results
- This class inherits from Entity, so it follows the standard SharePoint entity lifecycle and query pattern
- Results are returned as SimpleDataTable objects which contain tabular data about crawl operations
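The exception-handling advice above can be packaged as a small retry helper. A hedged sketch in plain Python: execute_with_retry is a hypothetical helper (not part of office365), execute is any zero-argument callable such as ctx.execute_query, and the broad except should be narrowed to the exception types your office365 version actually raises:

```python
import time

def execute_with_retry(execute, attempts=3, delay_seconds=1.0):
    """Call a zero-argument callable (e.g. ctx.execute_query), retrying on failure.

    Transient network or throttling errors often succeed on retry; auth or
    permission errors will keep failing, so the last exception is re-raised.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return execute()
        except Exception as error:  # narrow this in real code
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds * attempt)  # simple linear backoff
    raise last_error

# Usage with a flaky callable standing in for ctx.execute_query:
calls = {"count": 0}

def flaky_execute():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(execute_with_retry(flaky_execute, attempts=3, delay_seconds=0))  # ok
```

In real code the wrapped call would be `execute_with_retry(ctx.execute_query)` placed after queuing get_crawled_urls or get_unsuccesful_crawled_urls.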
Similar Components
AI-powered semantic similarity - components with related functionality:
- class TenantCrawlVersionsInfoProvider (60.5% similar)
- class SharePointClient (59.9% similar)
- class LogExport (59.8% similar)
- class ActivityLogger (59.7% similar)
- class LogFileInfo (58.1% similar)