GoodocDocument

AI Overview😉

  • This module appears to represent documents that have been processed with Optical Character Recognition (OCR), such as scanned pages or documents uploaded as images, and to hold the text and page structure extracted from them.
  • It could affect search results by letting Google understand the content of documents that are not otherwise in a searchable text format. That could make results for queries about specific documents, or the information in them, more accurate, and could support more detailed snippets or previews of those documents.
  • To work well with this kind of processing, a website can make sure the documents it hosts are easy to OCR, for example clear, high-resolution scans or images. It can also supply metadata such as titles, descriptions, or keywords so that the content and its relevance are easier to interpret.

GoogleApi.ContentWarehouse.V1.Model.GoodocDocument (google_api_content_warehouse v0.4.0)

Top-level representation of an OCRed document.

Attributes

  • EditingHistory (type: list(String.t), default: nil) - Debug info recording the history of any editing done through the interface in goodoc-editing.h. The strings look like "MoveParagraph(page_index = 0, source_block_index = 3, ...);".
  • LogicalEntity (type: list(String.t), default: nil) - Logical entities are stored as blobs. A separate .proto file, which depends on what kind of document this goodoc represents, is expected to define the logical entity structure. This way the data can still be parsed as an ordinary goodoc by people who don't care about logical entities, while people who do can parse them specifically. ocr/goodoc/logical-entity-utils.h has methods to read and write these. See the Goodoc++ doc.
  • LogicalEntityMessageName (type: list(String.t), default: nil) - The names of the proto messages serialized in LogicalEntity, one for each LogicalEntity. This field should either have zero entries, leaving the names unspecified, or exactly as many entries as there are LogicalEntity strings.
  • SubDocuments (type: list(GoogleApi.ContentWarehouse.V1.Model.GoodocDocument.t), default: nil) - Nested documents, used for multi-goodoc documents.
  • header (type: GoogleApi.ContentWarehouse.V1.Model.GoodocDocumentHeader.t, default: nil) -
  • page (type: list(GoogleApi.ContentWarehouse.V1.Model.GoodocDocumentPage.t), default: nil) -
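Two of the rules in the attribute list lend themselves to a short illustration: SubDocuments nests further documents of the same type, and LogicalEntityMessageName should be either empty or the same length as LogicalEntity. The following is a minimal Elixir sketch under stated assumptions: GoodocSketch, message_names_valid?/1, and total_pages/1 are hypothetical names, the struct only copies the field names listed here (it is not the library's module), and pages are placeholder maps rather than GoodocDocumentPage structs.

```elixir
defmodule GoodocSketch do
  # Hypothetical stand-in for GoogleApi.ContentWarehouse.V1.Model.GoodocDocument:
  # same field names as the attribute list, but not the library's module.
  defstruct EditingHistory: nil,
            LogicalEntity: nil,
            LogicalEntityMessageName: nil,
            SubDocuments: nil,
            header: nil,
            page: nil

  # Checks the documented rule: LogicalEntityMessageName is either empty
  # (names unspecified) or has exactly one name per LogicalEntity blob.
  def message_names_valid?(%__MODULE__{LogicalEntity: entities, LogicalEntityMessageName: names}) do
    names = names || []
    names == [] or length(names) == length(entities || [])
  end

  # Counts this document's pages plus, recursively, the pages of every
  # document nested in SubDocuments (nil is treated as an empty list).
  def total_pages(%__MODULE__{page: pages, SubDocuments: subs}) do
    nested = Enum.map(subs || [], &total_pages/1)
    length(pages || []) + Enum.sum(nested)
  end
end

doc = %GoodocSketch{
  page: [%{}, %{}],
  LogicalEntity: ["blob-a", "blob-b"],
  LogicalEntityMessageName: ["hypothetical.MessageA"],
  SubDocuments: [%GoodocSketch{page: [%{}]}]
}

GoodocSketch.total_pages(doc)          # => 3 (2 top-level pages + 1 nested)
GoodocSketch.message_names_valid?(doc) # => false (1 name for 2 entities)
```

Because the uppercase fields are atoms such as :SubDocuments, the sketch reads them by pattern matching in the function heads instead of with dot access.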

Summary

Types

t()

Functions

decode(value, options)

Unwrap a decoded JSON object into its complex fields.

Types

t()

@type t() :: %GoogleApi.ContentWarehouse.V1.Model.GoodocDocument{
  EditingHistory: [String.t()] | nil,
  LogicalEntity: [String.t()] | nil,
  LogicalEntityMessageName: [String.t()] | nil,
  SubDocuments: [t()] | nil,
  header: GoogleApi.ContentWarehouse.V1.Model.GoodocDocumentHeader.t() | nil,
  page: [GoogleApi.ContentWarehouse.V1.Model.GoodocDocumentPage.t()] | nil
}

Functions

decode(value, options)

@spec decode(struct(), keyword()) :: struct()

Unwrap a decoded JSON object into its complex fields.
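To make "unwrap a decoded JSON object into its complex fields" concrete, here is a simplified sketch of the idea, not the library's implementation. GoodocDecodeSketch, from_map/1, and the one-argument decode/1 are hypothetical names. The sketch assumes the JSON has already been decoded into a string-keyed map, then turns each SubDocuments entry back into a struct recursively. header and page would presumably be unwrapped into their own model structs in the same way, but they are left as plain maps here.

```elixir
defmodule GoodocDecodeSketch do
  # Hypothetical stand-in that mirrors the fields of
  # GoogleApi.ContentWarehouse.V1.Model.GoodocDocument; it is not that module.
  defstruct EditingHistory: nil,
            LogicalEntity: nil,
            LogicalEntityMessageName: nil,
            SubDocuments: nil,
            header: nil,
            page: nil

  @fields [:EditingHistory, :LogicalEntity, :LogicalEntityMessageName, :SubDocuments, :header, :page]

  # Hypothetical helper: builds the struct from a string-keyed map
  # (for example, the output of a JSON decoder), then unwraps nested fields.
  def from_map(map) when is_map(map) do
    fields =
      for field <- @fields, Map.has_key?(map, Atom.to_string(field)), into: %{} do
        {field, Map.fetch!(map, Atom.to_string(field))}
      end

    decode(struct(__MODULE__, fields))
  end

  # Simplified one-argument stand-in for decode/2: SubDocuments entries are
  # still plain maps after JSON decoding, so convert each one recursively.
  # header and page are left untouched in this sketch.
  def decode(%__MODULE__{SubDocuments: subs} = doc) when is_list(subs) do
    %{doc | SubDocuments: Enum.map(subs, &from_map/1)}
  end

  def decode(%__MODULE__{} = doc), do: doc
end

json_map = %{
  "page" => [%{}],
  "SubDocuments" => [%{"page" => [%{}, %{}]}]
}

# The nested SubDocuments entry comes back as a %GoodocDecodeSketch{} struct.
doc = GoodocDecodeSketch.from_map(json_map)
IO.inspect(doc)
```

Keys absent from the input map keep their struct default of nil, matching the "default: nil" noted for every attribute.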