TrawlerFetchReplyDataRedirects

AI Overview😉

  • The potential purpose of this module is to analyze and store information about redirects that occur when a URL is fetched. This includes details about each hop in the redirect chain, such as the URL, HTTP response code, download time, and SSL certificate information. This module aims to provide a comprehensive understanding of the redirect process, which can be useful for search engine optimization (SEO) and website analysis.
  • This module could impact search results by allowing Google to better understand the structure and relationships between websites. By analyzing redirect chains, Google can identify patterns and anomalies that may indicate spam or manipulation. This can help to improve the quality of search results by reducing the visibility of low-quality or manipulated websites. Additionally, this module can help Google to identify canonical URLs and prevent duplicate content issues.
  • To be more favorable to this module, a website can ensure that their redirects are properly implemented and follow best practices. This includes using standard HTTP redirect codes (e.g., 301, 302), providing clear and consistent redirect chains, and avoiding excessive or unnecessary redirects. Additionally, websites can ensure that their SSL certificates are valid and properly configured, and that their robots.txt files are up-to-date and correctly formatted. By doing so, websites can improve their crawlability and indexing, which can lead to better search engine rankings and visibility.

Interesting Module? Vote 👇

Voting helps other researchers find interesting modules.

Current Votes: 1

GoogleApi.ContentWarehouse.V1.Model.TrawlerFetchReplyDataRedirects (google_api_content_warehouse v0.4.0)

The sequence of redirects fetched, if applicable. This includes url plus stats for each hop after the first hop. NOTE: This can be one redirect longer than the chain of redirects followed, in the case where there was a redirect at the end of the chain that the fetcher detected but did not follow.

Attributes

  • BadSSLCertificate (type: String.t, default: nil) - The server SSL certificate chain in SSLCertificateInfo protobuf format. See this field in FetchReplyData (i.e., the initial hop) for more description on when it will be populated.
  • CrawlTimes (type: GoogleApi.ContentWarehouse.V1.Model.TrawlerCrawlTimes.t, default: nil) - Per redirect hop timestamps. This
  • DownloadTime (type: integer(), default: nil) - Download time of this fetch (ms)
  • Endpoints (type: GoogleApi.ContentWarehouse.V1.Model.TrawlerTCPIPInfo.t, default: nil) - ## stats If fetched, ip info.
  • HSTSInfo (type: String.t, default: nil) - This specifies if the url in a redirect was rewritten to HTTPS because of an HSTS policy for the domain. See comments on FetchReplyData.HSTSInfo for how this field's values. A redirect that was rewritten with HSTS will have HSTS_STATUS_REWRITTEN ## here.
  • HTTPResponseCode (type: integer(), default: nil) - The HTTP response code for this hop. We need this since multiple response codes may have the same redirect type (e.g., 302 and 307 are both REDIRECT_TEMPORARILY), but clients may want to know which one was received. Note this is set only for the hops that are followed (i.e., TargetUrl is present). If the last redirect hop was not followed the fetch status will be URL_NOT_FOLLOWED, and the response code will be in the top level ProtocolResponse field.
  • HopPageNoIndexInfo (type: integer(), default: nil) - Extra trawler::PageNoIndexInfo for this hop. Integer: ORed together bits from trawler::PageNoIndexInfo. The information specified by this field comes from the http header or content of the source url, not the "TargetUrl" in this Redirects group.
  • HopReuseInfo (type: String.t, default: nil) - trawler::ReuseInfo with status of IMS/IMF/cache query, for this hop.
  • HopRobotsInfo (type: integer(), default: nil) - Extra trawler::RobotsInfo for this hop. Integer: ORed together bits from trawler::RobotsInfo
  • HostId (type: String.t, default: nil) - If known, the hostid for this hop
  • HttpRequestHeaders (type: String.t, default: nil) - The http headers we sent for fetching this redirect hop. Not normally filled in, unless FetchParams.WantSentHeaders is set.
  • HttpResponseHeaders (type: String.t, default: nil) - The http headers we received from this redirect hop. Trawler does not fill this in; this is intended as a placeholder for crawls like webmirror that fill in and want to track this across redirect hops.
  • RawTargetUrl (type: String.t, default: nil) - bytes: can contain bad encoding.
  • RefreshTime (type: integer(), default: nil) - Refresh time in meta redirect tag
  • RobotsTxt (type: String.t, default: nil) - The robots.txt we used for this fetch. Not normally filled in unless WantRobotsBody is set.
  • SourceBody (type: GoogleApi.ContentWarehouse.V1.Model.TrawlerFetchBodyData.t, default: nil) - For meta-redirects, this field may contain the body of the source document. Currently only filled client side and not implemented (yet) for server-side redirects.
  • TargetUrl (type: String.t, default: nil) - Difference between the following two fields: TargetUrl is set when we have followed the redirect target, and the url is canonicalized. RawTargetUrl is set in either of the following two cases: (1) The url has not be been followed. For example, the redirect is intended to be handled by the client. In the fetch reply response, you will see the url's status as URL_NOT_FOLLOWED-NOT_FOLLOWED. (2) The extracted redirect url is different from its canonicalized* form. For example, if the target url contains fragments, then this RawTargetUrl will have the fragments. Redirect target
  • Type (type: String.t, default: nil) - URL and redirect type

Summary

Types

t()

Functions

decode(value, options)

Unwrap a decoded JSON object into its complex fields.

Types

Link to this type

t()

@type t() :: %GoogleApi.ContentWarehouse.V1.Model.TrawlerFetchReplyDataRedirects{
  BadSSLCertificate: String.t() | nil,
  CrawlTimes: GoogleApi.ContentWarehouse.V1.Model.TrawlerCrawlTimes.t() | nil,
  DownloadTime: integer() | nil,
  Endpoints: GoogleApi.ContentWarehouse.V1.Model.TrawlerTCPIPInfo.t() | nil,
  HSTSInfo: String.t() | nil,
  HTTPResponseCode: integer() | nil,
  HopPageNoIndexInfo: integer() | nil,
  HopReuseInfo: String.t() | nil,
  HopRobotsInfo: integer() | nil,
  HostId: String.t() | nil,
  HttpRequestHeaders: String.t() | nil,
  HttpResponseHeaders: String.t() | nil,
  RawTargetUrl: String.t() | nil,
  RefreshTime: integer() | nil,
  RobotsTxt: String.t() | nil,
  SourceBody:
    GoogleApi.ContentWarehouse.V1.Model.TrawlerFetchBodyData.t() | nil,
  TargetUrl: String.t() | nil,
  Type: String.t() | nil
}

Functions

Link to this function

decode(value, options)

@spec decode(struct(), keyword()) :: struct()

Unwrap a decoded JSON object into its complex fields.