
AI Overview

  • This module appears to describe a single token produced by linguistic analysis of a document: a word-level span of the text annotated with its part-of-speech tag, coarse word category, lemma, morphology, and its dependency relation to a head token.
  • Annotations like these could affect search results by helping Google interpret the context and meaning of a page's content and match it to search queries more accurately. They could also help filter out low-quality or irrelevant content.
  • To work well with this kind of analysis, a website could focus on high-quality, well-structured content: clear and concise language, content organized into logical sections, and header tags and other semantic HTML elements that define the structure of the page. Schema markup and other structured data could supply additional context and meaning.

GoogleApi.ContentWarehouse.V1.Model.NlpSaftToken (google_api_content_warehouse v0.4.0)

A document token marks a span of bytes in the document text as a token or word. Next available index: 16.


  • breakLevel (type: String.t, default: nil) -
  • breakSkippedText (type: boolean(), default: nil) - Whether the break skipped over non-tag text (excluding script/style).
  • category (type: String.t, default: nil) - Coarse-grained word category for token. See README.categories for category inventory.
  • end (type: integer(), default: nil) - Index of the last byte of the token in document.text (inclusive). See start.
  • head (type: integer(), default: nil) - Head of this token in the dependency tree: the id of the token which has an arc going to this one. If it is the root token of a sentence, then it is set to -1.
  • info (type: GoogleApi.ContentWarehouse.V1.Model.Proto2BridgeMessageSet.t, default: nil) - Annotation for this token.
  • label (type: String.t, default: nil) - Label for dependency relation between this token and its head. See README.labels for label inventory.
  • lemma (type: String.t, default: nil) - Word lemma. This is only filled if the lemma is different from the word form.
  • morph (type: GoogleApi.ContentWarehouse.V1.Model.NlpSaftMorphology.t, default: nil) - Morphology information.
  • scriptCode (type: String.t, default: nil) - A string representation (typically four letters, sometimes longer) of the token's Unicode script code, based on BCP 47/CLDR, capitalized according to ISO 15924. See i18n/identifiers/scriptcode.h for details.
  • start (type: integer(), default: nil) - [start, end] is the inclusive byte range of the UTF-8 encoded token in document.text. end is the index of the token's last byte, which may be a UTF-8 continuation byte, so the length in bytes is end - start + 1. The begin/end options are for the goldmine AnnotationsFinder to locate the offsets of saft tokens. Start is inclusive by default and end is marked.
  • tag (type: String.t, default: nil) - Part-of-speech tag for token. See README.tags for tag inventory.
  • tagConfidence (type: number(), default: nil) - Confidence score for the tag prediction -- should be interpreted as a probability estimate that the tag is correct.
  • textProperties (type: integer(), default: nil) -
  • word (type: String.t, default: nil) - Token word form. This may not be identical to the original. For example, in goldmine annotation we do UTF-8 normalization and punctuation normalization. The punctuation normalization includes inferring the directionality of straight doublequotes -- that is, we map " to open quote (``) or close quote (''), and sometimes we get it wrong. SAFT processing in other contexts (such as queries in qrewrite) involves different normalizations.
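The byte-offset and lemma conventions in the attribute list can be sketched as follows. This is an illustration only, not part of the library: it is written in Python rather than Elixir, it assumes a token is a plain dict keyed by the attribute names above (as decoded JSON would be), and the helper names, sample sentence, offsets, and lemmas are all made up for the example.

```python
# Hypothetical helpers; none of these names exist in google_api_content_warehouse.

def token_text(document_text: str, token: dict) -> str:
    """Recover the original surface form from the inclusive byte range.

    start/end index into the UTF-8 bytes of the text, not its characters,
    and end is inclusive, so the slice has to run to end + 1.
    """
    data = document_text.encode("utf-8")
    return data[token["start"]:token["end"] + 1].decode("utf-8")


def token_byte_length(token: dict) -> int:
    """Length in bytes, as stated in the start attribute: end - start + 1."""
    return token["end"] - token["start"] + 1


def lemma_or_word(token: dict) -> str:
    """lemma is only filled when it differs from word, so fall back to word."""
    return token.get("lemma") or token["word"]


# Made-up sample data: "é" takes two bytes in UTF-8, so "cafés" spans 6 bytes.
text = "Les cafés ferment."
cafes = {"word": "cafés", "start": 4, "end": 9, "lemma": "café"}
period = {"word": ".", "start": 18, "end": 18, "lemma": None}

surface = token_text(text, cafes)
length = token_byte_length(cafes)
base_form = lemma_or_word(period)
```

Slicing the document bytes, rather than trusting word, matters because word may have been normalized and so may not match the original text.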







@type t() :: %GoogleApi.ContentWarehouse.V1.Model.NlpSaftToken{
  breakLevel: String.t() | nil,
  breakSkippedText: boolean() | nil,
  category: String.t() | nil,
  end: integer() | nil,
  head: integer() | nil,
  info: GoogleApi.ContentWarehouse.V1.Model.Proto2BridgeMessageSet.t() | nil,
  label: String.t() | nil,
  lemma: String.t() | nil,
  morph: GoogleApi.ContentWarehouse.V1.Model.NlpSaftMorphology.t() | nil,
  scriptCode: String.t() | nil,
  start: integer() | nil,
  tag: String.t() | nil,
  tagConfidence: number() | nil,
  textProperties: integer() | nil,
  word: String.t() | nil
}

decode(value, options)

@spec decode(struct(), keyword()) :: struct()

Unwrap a decoded JSON object into its complex fields.
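Once a list of tokens has been decoded, the head and label attributes are enough to rebuild the dependency tree. The sketch below is an assumption-laden illustration in Python, not the library's API: it treats each token as a plain dict, assumes head is the index of the parent within the same token list, and uses made-up helper names and label values (the real label inventory lives in README.labels, which is not reproduced here).

```python
from collections import defaultdict

# Hypothetical helpers; tokens are plain dicts with the attributes listed above.

def find_roots(tokens: list) -> list:
    """Indices of root tokens: per the head attribute, a root has head == -1."""
    return [i for i, tok in enumerate(tokens) if tok["head"] == -1]


def children_by_head(tokens: list) -> dict:
    """Map each head index to the indices of the tokens that depend on it."""
    children = defaultdict(list)
    for i, tok in enumerate(tokens):
        if tok["head"] != -1:
            children[tok["head"]].append(i)
    return dict(children)


def path_to_root(tokens: list, index: int) -> list:
    """Follow head links from one token up to its sentence root."""
    path = [index]
    seen = {index}
    while tokens[path[-1]]["head"] != -1:
        parent = tokens[path[-1]]["head"]
        if parent in seen:
            # Guard against malformed input that would otherwise loop forever.
            raise ValueError("cycle in head links")
        seen.add(parent)
        path.append(parent)
    return path


# Made-up sentence; the label strings are illustrative only.
tokens = [
    {"word": "The", "head": 1, "label": "det"},
    {"word": "cat", "head": 2, "label": "nsubj"},
    {"word": "sleeps", "head": -1, "label": "ROOT"},
]

roots = find_roots(tokens)
children = children_by_head(tokens)
chain = [tokens[i]["word"] for i in path_to_root(tokens, 0)]
```

If head turns out to be relative to something other than the token list's own indexing, only the lookups in these helpers would need to change.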