GoodocSummaryStats

AI Overview😉

  • The potential purpose of this module is to analyze the structure and content of a document or webpage by extracting statistics about its paragraphs, lines, words, symbols, and blocks. These statistics could help characterize the layout, readability, and content quality of a page, which in turn could influence its ranking in search results.
  • This module could affect search results by favoring pages with a clear, organized structure and well-formed paragraphs, lines, and words. It may also prioritize pages with higher content quality, as approximated by the median symbols and words per paragraph, block, and line. In addition, its measurements of font sizes, line heights, and spacing could favor pages with better readability and accessibility.
  • To fare better under this kind of analysis, a website could focus on improving its content structure and readability. This could include:
    • Using clear and concise headings and subheadings
    • Organizing content into well-formatted paragraphs and blocks
    • Optimizing font sizes, line heights, and spacing for better readability
    • Reducing the use of junk, header, and footer blocks
    • Improving the overall content quality, with a focus on providing valuable and informative text

GoogleApi.ContentWarehouse.V1.Model.GoodocSummaryStats (google_api_content_warehouse v0.4.0)

Goodoc stats for a range of elements, such as one page or a whole book. These stats can be computed using the SummaryStatsCollector class. Some range stats are pre-computed and stored in goodocs/volumes (e.g., Page.stats below, and Ocean's CA_VolumeResult.goodoc_stats).

Attributes

  • numParagraphs (type: integer(), default: nil) - Paragraph stats. Median symbols and words omit junk, header, and footer blocks; they are intended to be a measure of the typical "content" paragraph. There can still be substantial differences between means and medians, particularly if a table is present (every cell is a paragraph).
  • medianSymbolsPerParagraph (type: integer(), default: nil) -
  • estimatedFontSizes (type: boolean(), default: nil) - Set if the font size histogram (fontSizeHistogram) was derived by estimating font sizes from CharLabel.CharacterHeight. This happens when the FontSize field is constant, as has happened with Abbyy 9.
  • numLineSpaces (type: integer(), default: nil) - Lines (out of num_lines) that have a successor line within their paragraph.
  • medianSymbolsPerBlock (type: integer(), default: nil) -
  • numWords (type: integer(), default: nil) - Word stats.
  • medianSymbolsPerWord (type: integer(), default: nil) -
  • meanSymbolsPerWord (type: integer(), default: nil) -
  • numNonGraphicBlocks (type: integer(), default: nil) -
  • medianFullOddPrintedBox (type: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t, default: nil) -
  • medianWordsPerLine (type: integer(), default: nil) -
  • medianLineSpan (type: integer(), default: nil) - Top of a line to the top of the next line in the same paragraph.
  • medianWidth (type: integer(), default: nil) -
  • medianWordsPerParagraph (type: integer(), default: nil) -
  • meanWordsPerBlock (type: integer(), default: nil) -
  • medianParagraphIndent (type: integer(), default: nil) - Leading space on the first line of a paragraph.
  • medianOddPrintedBox (type: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t, default: nil) - Odd pages (1, 3, 5, ...).
  • medianSymbolsPerLine (type: integer(), default: nil) -
  • meanSymbolsPerLine (type: integer(), default: nil) -
  • numLines (type: integer(), default: nil) - Line stats. "Top" corresponds to the highest ascender and "bottom" to the lowest descender.
  • medianParagraphSpace (type: integer(), default: nil) - Bottom of a paragraph to the top of the next paragraph in the same block.
  • numParagraphSpaces (type: integer(), default: nil) - Paragraphs that have a successor paragraph within their block.
  • medianPrintedBox (type: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t, default: nil) - Each median*_printed_box excludes page header/footer and all graphic blocks
  • numPages (type: integer(), default: nil) - Page stats.
  • medianHorizontalDpi (type: integer(), default: nil) -
  • meanSymbolsPerParagraph (type: integer(), default: nil) -
  • medianVerticalDpi (type: integer(), default: nil) -
  • medianFullPrintedBox (type: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t, default: nil) - Each median_full*_printed_box includes page header/footer but still excludes all graphic blocks
  • fontSizeHistogram (type: list(GoogleApi.ContentWarehouse.V1.Model.GoodocFontSizeStats.t), default: nil) - Symbol counts (and other attributes) for each distinct CharLabel.FontId and FontSize; histogram is in decreasing order of symbol count
  • medianBlockSpace (type: integer(), default: nil) - Bottom of a block to the top of the next block in the same flow on the page.
  • medianLineHeight (type: integer(), default: nil) - Top of a line to its bottom.
  • medianHeight (type: integer(), default: nil) -
  • medianFullEvenPrintedBox (type: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t, default: nil) -
  • meanWordsPerParagraph (type: integer(), default: nil) -
  • meanWordsPerLine (type: integer(), default: nil) -
  • medianEvenPrintedBox (type: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t, default: nil) - Even pages (0, 2, 4, ...).
  • medianLineSpace (type: integer(), default: nil) - Bottom of a line to the top of the next line in the same paragraph.
  • numSymbols (type: integer(), default: nil) - Symbol stats.
  • numBlocks (type: integer(), default: nil) - Block stats. Median symbols and words omit junk, header, and footer blocks; they are intended to be a measure of the typical "content" block. There can still be substantial differences between means and medians; however, block values will generally exceed paragraph values (not the case when headers and footers are included).
  • medianWordsPerBlock (type: integer(), default: nil) -
  • numBlockSpaces (type: integer(), default: nil) - Blocks that have a successor block within their flow on their page.
  • meanSymbolsPerBlock (type: integer(), default: nil) -
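
The paragraph-stats note says that means and medians can differ substantially when a table is present, because every cell counts as a paragraph. The following is a minimal sketch of that effect on made-up word counts. The module `ParagraphStatsSketch` and its helpers are hypothetical, and the rounding and even-count median rules are assumptions; the documentation does not say how SummaryStatsCollector computes its integer values.

```elixir
defmodule ParagraphStatsSketch do
  # Hypothetical helper: arithmetic mean of a non-empty list, rounded to
  # the nearest integer (the real rounding rule is not documented).
  def mean(values), do: round(Enum.sum(values) / length(values))

  # Hypothetical helper: middle value of the sorted, non-empty list
  # (the upper of the two middle values when the count is even).
  def median(values) do
    sorted = Enum.sort(values)
    Enum.at(sorted, div(length(sorted), 2))
  end
end

# Made-up word counts: three body paragraphs plus a 3x3 table whose
# nine cells each count as a very short paragraph.
body = [120, 95, 140]
cells = [1, 2, 1, 1, 2, 1, 2, 1, 1]
words_per_paragraph = body ++ cells

IO.inspect(ParagraphStatsSketch.mean(words_per_paragraph), label: "mean")
# prints "mean: 31"
IO.inspect(ParagraphStatsSketch.median(words_per_paragraph), label: "median")
# prints "median: 2"
```

With the table cells included, the median collapses to the size of a typical cell while the mean stays closer to the body text, which is why the two families of fields are worth reading together.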

Summary

Types

t()

Functions

decode(value, options)

Unwrap a decoded JSON object into its complex fields.

Types

t()

@type t() :: %GoogleApi.ContentWarehouse.V1.Model.GoodocSummaryStats{
  estimatedFontSizes: boolean() | nil,
  fontSizeHistogram:
    [GoogleApi.ContentWarehouse.V1.Model.GoodocFontSizeStats.t()] | nil,
  meanSymbolsPerBlock: integer() | nil,
  meanSymbolsPerLine: integer() | nil,
  meanSymbolsPerParagraph: integer() | nil,
  meanSymbolsPerWord: integer() | nil,
  meanWordsPerBlock: integer() | nil,
  meanWordsPerLine: integer() | nil,
  meanWordsPerParagraph: integer() | nil,
  medianBlockSpace: integer() | nil,
  medianEvenPrintedBox:
    GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
  medianFullEvenPrintedBox:
    GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
  medianFullOddPrintedBox:
    GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
  medianFullPrintedBox:
    GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
  medianHeight: integer() | nil,
  medianHorizontalDpi: integer() | nil,
  medianLineHeight: integer() | nil,
  medianLineSpace: integer() | nil,
  medianLineSpan: integer() | nil,
  medianOddPrintedBox:
    GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
  medianParagraphIndent: integer() | nil,
  medianParagraphSpace: integer() | nil,
  medianPrintedBox:
    GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
  medianSymbolsPerBlock: integer() | nil,
  medianSymbolsPerLine: integer() | nil,
  medianSymbolsPerParagraph: integer() | nil,
  medianSymbolsPerWord: integer() | nil,
  medianVerticalDpi: integer() | nil,
  medianWidth: integer() | nil,
  medianWordsPerBlock: integer() | nil,
  medianWordsPerLine: integer() | nil,
  medianWordsPerParagraph: integer() | nil,
  numBlockSpaces: integer() | nil,
  numBlocks: integer() | nil,
  numLineSpaces: integer() | nil,
  numLines: integer() | nil,
  numNonGraphicBlocks: integer() | nil,
  numPages: integer() | nil,
  numParagraphSpaces: integer() | nil,
  numParagraphs: integer() | nil,
  numSymbols: integer() | nil,
  numWords: integer() | nil
}
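
Every field in this type may be nil, so code that combines fields has to guard against missing values. The sketch below shows one way to do that for a derived count. It relies on the description of numLineSpaces: if each paragraph has exactly one line without a successor, then numLines minus numLineSpaces is the number of paragraphs that contain at least one line. `SummaryStatsSketch` is a hypothetical helper, not part of the library, and it assumes both counts cover the same set of lines. A plain map stands in for the struct here; the same pattern match also works on a struct with these keys.

```elixir
defmodule SummaryStatsSketch do
  # Hypothetical helper. Lines without a successor are the last lines of
  # their paragraphs, so the difference counts paragraphs with >= 1 line.
  def paragraphs_with_lines(%{numLines: lines, numLineSpaces: spaces})
      when is_integer(lines) and is_integer(spaces) do
    lines - spaces
  end

  # Either field is nil or missing: return nil instead of raising.
  def paragraphs_with_lines(_stats), do: nil
end

stats = %{numLines: 40, numLineSpaces: 31, numParagraphs: nil}

IO.inspect(SummaryStatsSketch.paragraphs_with_lines(stats))
# prints 9
IO.inspect(SummaryStatsSketch.paragraphs_with_lines(%{numLines: nil, numLineSpaces: 31}))
# prints nil
```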

Functions

decode(value, options)

@spec decode(struct(), keyword()) :: struct()

Unwrap a decoded JSON object into its complex fields.
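
To illustrate what "unwrapping complex fields" can mean, here is a self-contained sketch that converts a nested plain map into a struct while leaving scalar fields alone. It is not the library's implementation. `DecodeSketch`, `Box`, and `Stats` are hypothetical stand-ins, and the `Box` fields (`top`, `left`, `width`, `height`) are placeholders, since the GoodocBoundingBox fields are not listed on this page.

```elixir
defmodule DecodeSketch do
  defmodule Box do
    # Placeholder fields; not the documented GoodocBoundingBox schema.
    defstruct [:top, :left, :width, :height]
  end

  defmodule Stats do
    # A small subset of the GoodocSummaryStats fields, for illustration.
    defstruct [:numPages, :medianLineHeight, :medianPrintedBox]
  end

  # Replace the plain map under :medianPrintedBox with a Box struct.
  def decode(%Stats{medianPrintedBox: box} = stats)
      when is_map(box) and not is_struct(box) do
    %Stats{stats | medianPrintedBox: struct(Box, box)}
  end

  # Nothing to unwrap (nil or already a struct): return the value as-is.
  def decode(%Stats{} = stats), do: stats
end

# struct/2 builds the value at runtime from keyword fields.
raw =
  struct(DecodeSketch.Stats,
    numPages: 2,
    medianLineHeight: 30,
    medianPrintedBox: %{top: 100, left: 80, width: 1200, height: 1800}
  )

decoded = DecodeSketch.decode(raw)
IO.inspect(decoded.medianPrintedBox.__struct__)
# prints DecodeSketch.Box
```

Scalar fields such as numPages pass through unchanged, and a nil box stays nil, which matches the `| nil` alternatives in the type above.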