DocProperties

AI Overview

  • This module likely records features extracted from a document's content, such as its title, languages, average weighted font size, punctuation count, and token count, which could help Google's search systems understand the document's structure and content.
  • These properties could influence how documents are ranked and displayed. For example, a document with a missing or meaningless title (badTitle) might be penalized, while a document with a well-structured title and relevant keywords might be boosted. The leading text information (leadingtext) could also affect how snippets are generated and displayed in search results.
  • To fare well under this module, a website could make its title tags descriptive and accurate, use clear and concise language, and structure its content so that relevant features are easy to extract. Sensible font sizes and formatting might also help the algorithm interpret the document's content and structure.

GoogleApi.ContentWarehouse.V1.Model.DocProperties (google_api_content_warehouse v0.4.0)

NOTE: In segindexer, the docproperties of a document may be reused from a previous cycle if its content has not changed. If you add a new field to DocProperties, make sure it is taken care of (i.e., it gets copied from the previous cycle to the current document) in CDocProperties::EndDocument().
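
The pitfall described in the note can be illustrated with a minimal sketch. This is not the segindexer code: the module name `ReuseSketch`, the function `carry_forward/2`, and the particular list of reused fields are all hypothetical. The sketch only assumes that reuse works by copying an explicit set of fields, which is why a newly added field is silently lost unless it is added to that set.

```elixir
defmodule ReuseSketch do
  # Hypothetical explicit list of fields copied from the previous cycle.
  # A field missing from this list is never carried forward.
  @reused_fields [:title, :badTitle, :languages, :numTokens]

  # `prev` holds the properties from the previous cycle, `current` the
  # properties computed so far in this cycle (both plain maps here).
  # The timestamp is not in @reused_fields, so the current value is kept.
  def carry_forward(prev, current) do
    Map.merge(current, Map.take(prev, @reused_fields))
  end
end
```

In this sketch, a key present in `prev` but absent from `@reused_fields` does not appear in the result, which mirrors the situation the note warns about.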

Attributes

  • avgTermWeight (type: integer(), default: nil) - The average weighted font size of a term in the doc body
  • badTitle (type: boolean(), default: nil) - Missing or meaningless title
  • badtitleinfo (type: list(GoogleApi.ContentWarehouse.V1.Model.DocPropertiesBadTitleInfo.t), default: nil) -
  • languages (type: list(integer()), default: nil) - A Language enum value. See: go/language-enum
  • leadingtext (type: GoogleApi.ContentWarehouse.V1.Model.SnippetsLeadingtextLeadingTextInfo.t, default: nil) - Leading text information generated by google3/quality/snippets/leadingtext/leadingtext-detector.cc
  • numPunctuations (type: integer(), default: nil) -
  • numTags (type: integer(), default: nil) -
  • numTokens (type: integer(), default: nil) - The number of tokens, tags and punctuations in the tokenized contents. This is an approximation of the number of tokens, tags and punctuations we end up with in mustang, but is inexact since we drop some tokens in mustang and also truncate docs at a max cap.
  • proseRestrict (type: list(String.t), default: nil) - The restricts for CSE structured search.
  • restricts (type: list(String.t), default: nil) -
  • timestamp (type: String.t, default: nil) - The time CDocProperties::StartDocument() is called, encoded as seconds past the epoch (Jan 1, 1970). This value is always refreshed and not reused.
  • title (type: String.t, default: nil) - Extracted from the title tag of the content. This is typically extracted by TitleMetaCollector defined at google3/segindexer/title-meta-collector.h. Please see its documentation for the format and other caveats.
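
Because timestamp is a string holding seconds past the epoch rather than a number, callers have to parse it before use. The following is a minimal sketch of one way to do that with the Elixir standard library. The module name `DocPropertiesSketch` and the function `start_time/1` are hypothetical and not part of the library; the function accepts any map or struct with a timestamp key.

```elixir
defmodule DocPropertiesSketch do
  # Converts the `timestamp` string (seconds past the Unix epoch) into a
  # DateTime. Returns {:ok, datetime} or {:error, reason}.
  def start_time(%{timestamp: ts}) when is_binary(ts) do
    case Integer.parse(ts) do
      # Accept only strings that are entirely an integer.
      {seconds, ""} -> DateTime.from_unix(seconds)
      _ -> {:error, :invalid_timestamp}
    end
  end

  # The field defaults to nil, so a missing value is handled explicitly.
  def start_time(_), do: {:error, :missing_timestamp}
end
```

For example, `DocPropertiesSketch.start_time(%{timestamp: "0"})` yields `{:ok, dt}` where `dt` is the start of the Unix epoch in UTC.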

Summary

Types

t()

Functions

decode(value, options)

Unwrap a decoded JSON object into its complex fields.

Types

t()

@type t() :: %GoogleApi.ContentWarehouse.V1.Model.DocProperties{
  avgTermWeight: integer() | nil,
  badTitle: boolean() | nil,
  badtitleinfo:
    [GoogleApi.ContentWarehouse.V1.Model.DocPropertiesBadTitleInfo.t()] | nil,
  languages: [integer()] | nil,
  leadingtext:
    GoogleApi.ContentWarehouse.V1.Model.SnippetsLeadingtextLeadingTextInfo.t()
    | nil,
  numPunctuations: integer() | nil,
  numTags: integer() | nil,
  numTokens: integer() | nil,
  proseRestrict: [String.t()] | nil,
  restricts: [String.t()] | nil,
  timestamp: String.t() | nil,
  title: String.t() | nil
}

Functions

decode(value, options)

@spec decode(struct(), keyword()) :: struct()

Unwrap a decoded JSON object into its complex fields.
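
To make "unwrap into its complex fields" concrete, here is a rough, self-contained sketch of the idea: after JSON decoding, nested values such as badtitleinfo and leadingtext are still plain maps, and unwrapping turns them into their model structs. This is not the library's implementation. The `DecodeSketch` module, its nested stand-in structs, and the `data` field are all hypothetical, since the real fields of the nested models are not shown on this page.

```elixir
defmodule DecodeSketch do
  defmodule BadTitleInfo do
    # Hypothetical stand-in for DocPropertiesBadTitleInfo.
    defstruct data: %{}
  end

  defmodule LeadingTextInfo do
    # Hypothetical stand-in for SnippetsLeadingtextLeadingTextInfo.
    defstruct data: %{}
  end

  defmodule DocProperties do
    # Only a subset of the documented fields, to keep the sketch short.
    defstruct [:title, :badTitle, :badtitleinfo, :leadingtext]
  end

  # Wraps the nested plain maps into structs; simple fields are untouched.
  def unwrap(%DocProperties{} = doc) do
    %{
      doc
      | badtitleinfo: wrap_list(doc.badtitleinfo, BadTitleInfo),
        leadingtext: wrap(doc.leadingtext, LeadingTextInfo)
    }
  end

  # nil is preserved because every field defaults to nil.
  defp wrap(nil, _mod), do: nil
  defp wrap(map, mod) when is_map(map), do: struct(mod, data: map)

  defp wrap_list(nil, _mod), do: nil
  defp wrap_list(list, mod) when is_list(list), do: Enum.map(list, &wrap(&1, mod))
end
```

Simple fields such as title and badTitle pass through unchanged; only the list and struct-typed fields need this extra step.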