IndexingCrawlerIdServingDocumentIdentifier

AI Overview😉

  • The potential purpose of this module is to identify and manage duplicate documents in Google's search index. It appears to be related to experiments and testing different versions of documents, possibly to improve search result quality.
  • This module could impact search results by ensuring that duplicate documents are not shown to users, and that experiments with different document versions do not affect the main search results. It may also help Google to identify and remove duplicate content from their index.
  • A website may change things to be more favorable for this function by ensuring that their content is unique and not duplicated across multiple URLs or versions. This could include using canonical URLs, avoiding duplicate content, and ensuring that their website's structure and content are easily crawlable by Google's search bots.

Interesting Module? Vote 👇

Voting helps other researchers find interesting modules.

Current Votes: 0

GoogleApi.ContentWarehouse.V1.Model.IndexingCrawlerIdServingDocumentIdentifier (google_api_content_warehouse v0.4.0)

Attributes

  • doubleIndexingExperimentId (type: String.t, default: nil) - Only for double indexing experiments. This field is set for duplicated documents so that docjoin users will not see duplicated docs.
  • dupExperimentId (type: String.t, default: nil) - Only for Experimental clusters, not relevant for production serving data: Index-Dups can run experiments in Quality Clusters where different versions of the same document (e.g. with different signals) are serving in parallel. They are uniquely identified by the dup-experiment-IDs. This is for experimental clusters only. In prod-versions the member will not be set.
  • key (type: String.t, default: nil) - The primary identifier of a production document is the document key, which is the same as the row-key in Alexandria, and represents a URL and its crawling context. The document key is the unique identifier for each document, but multiple document keys can cover the same URL (e.g. crawled with different device types). In your production code, please always assume that the document key is the only way to uniquely identify a document. Link for more background information: http://go/url The document key is populated for all docs in indexing since 2014-03. ## Recommended way of reading: const string& doc_key = cdoc.doc().id().key(); ## CHECK(!doc_key.empty()); Note: For older DocJoins (e.g. historical DocJoins), the field is not populated. In those scenarios it is recommended to use the function 'GetDocumentKeyFromCompositeDoc' in '//indexing/crawler_id/utils/compositedoc/compositedoc_util.h' instead.

Summary

Types

t()

Functions

decode(value, options)

Unwrap a decoded JSON object into its complex fields.

Types

Link to this type

t()

@type t() ::
  %GoogleApi.ContentWarehouse.V1.Model.IndexingCrawlerIdServingDocumentIdentifier{
    doubleIndexingExperimentId: String.t() | nil,
    dupExperimentId: String.t() | nil,
    key: String.t() | nil
  }

Functions

Link to this function

decode(value, options)

@spec decode(struct(), keyword()) :: struct()

Unwrap a decoded JSON object into its complex fields.