RepositoryWebrefDocumentMetadata

AI Overview😉

  • The potential purpose of this module is to gather and store metadata about a document, such as its title, language, crawl time, and other attributes. This metadata can be used to better understand the content and context of the document, and to improve search results.
  • This module could impact search results by providing additional information about a document that can be used to rank it more accurately. For example, the "salientTerms" attribute could help identify the main topics of a document, while the "numIncomingAnchors" attribute could indicate the document's importance or relevance. The "isDisambiguationPage" attribute could also help to avoid returning ambiguous results.
  • A website may change things to be more favorable for this function by ensuring that their documents have accurate and complete metadata, such as titles, languages, and relevant keywords. They could also focus on creating high-quality, relevant content that attracts links from other reputable sources, which could increase the "numIncomingAnchors" attribute. Additionally, using clear and descriptive URLs, and avoiding ambiguity in their content, could also help to improve the accuracy of the metadata and the search results.

Interesting Module? Vote 👇

Voting helps other researchers find interesting modules.

Current Votes: 0

GoogleApi.ContentWarehouse.V1.Model.RepositoryWebrefDocumentMetadata (google_api_content_warehouse v0.4.0)

Information about the document which is not produced by webref, typically copied from the docjoin. Next available tag: 15

Attributes

  • crawlTime (type: String.t, default: nil) - The timestamp of when the document was crawled (if known). Copied from CompositeDoc.Content.CrawlTime.
  • docFp (type: String.t, default: nil) - Fingerprint of the document. We compute and set this fingerprint when creating the pagesets that we use for evals. Otherwise, this field is not normally set. We use the field to make sure that the human ratings that we have are generated for the same version of the document, otherwise they might be invalid. We do not compute the fingerprint on the fly (e.g. as a fingerprint of the proto buffer serialization of the cdoc) because protocol buffer serialization is not stable.
  • docId (type: String.t, default: nil) - DocId of the annotated document as read from cdoc.doc().docid().
  • forwardingUrls (type: GoogleApi.ContentWarehouse.V1.Model.RepositoryWebrefForwardingUrls.t, default: nil) - Urls that forward to this url. Needed for url -> topical entity entries.
  • isDisambiguationPage (type: boolean(), default: nil) - Set to true if the document is a known disambiguation page, e.g. https://en.wikipedia.org/wiki/Orange.
  • language (type: String.t, default: nil) - The document language, as read from doc().content().language(). This is go/language-enum value.
  • numIncomingAnchors (type: number(), default: nil) - The (weighted) number of incoming anchors (links from other documents).
  • salientTerms (type: GoogleApi.ContentWarehouse.V1.Model.QualitySalientTermsSalientTermSet.t, default: nil) - The salient terms for this document. Only set if --webref_doc_metadata_copy_salient_terms is true. Same motivation as the title field above.
  • title (type: String.t, default: nil) - The title of the document. Only set if --webref_doc_metadata_set_title is true. The idea is that we can use this to more easily learn things like: title contains "restaurants" -> more likely to be a list page.
  • totalClicks (type: number(), default: nil) - The total clicks on this document, taken from navboost data.
  • url (type: String.t, default: nil) - The url of the document.

Summary

Types

t()

Functions

decode(value, options)

Unwrap a decoded JSON object into its complex fields.

Types

Link to this type

t()

@type t() :: %GoogleApi.ContentWarehouse.V1.Model.RepositoryWebrefDocumentMetadata{
  crawlTime: String.t() | nil,
  docFp: String.t() | nil,
  docId: String.t() | nil,
  forwardingUrls:
    GoogleApi.ContentWarehouse.V1.Model.RepositoryWebrefForwardingUrls.t() | nil,
  isDisambiguationPage: boolean() | nil,
  language: String.t() | nil,
  numIncomingAnchors: number() | nil,
  salientTerms:
    GoogleApi.ContentWarehouse.V1.Model.QualitySalientTermsSalientTermSet.t()
    | nil,
  title: String.t() | nil,
  totalClicks: number() | nil,
  url: String.t() | nil
}

Functions

Link to this function

decode(value, options)

@spec decode(struct(), keyword()) :: struct()

Unwrap a decoded JSON object into its complex fields.