GoogleCloudContentwarehouseV1GcsIngestPipeline

AI Overview

  • This module likely configures the ingestion of files from Google Cloud Storage into Document Warehouse, so that documents can be processed and indexed for search and retrieval.
  • It could affect search results by controlling which documents are ingested, how they are processed, and which schema they are indexed under. That in turn could influence relevance and ranking, as well as the ability to filter results by document metadata.
  • To work well with this pipeline, a website could store its documents in a Cloud Storage folder with a consistent naming convention, keep them in a format compatible with the Doc AI processor type, define a consistent document schema, and use custom metadata for additional context and filtering. Doing so could improve the accuracy and relevance of search results.

GoogleApi.ContentWarehouse.V1.Model.GoogleCloudContentwarehouseV1GcsIngestPipeline (google_api_content_warehouse v0.4.0)

The configuration of the Cloud Storage Ingestion pipeline.

Attributes

  • inputPath (type: String.t, default: nil) - The input Cloud Storage folder. All files under this folder will be imported to Document Warehouse. Format: gs:///.
  • pipelineConfig (type: GoogleApi.ContentWarehouse.V1.Model.GoogleCloudContentwarehouseV1IngestPipelineConfig.t, default: nil) - Optional. The config for the Cloud Storage Ingestion pipeline. It provides additional customization options to run the pipeline and can be skipped if it is not applicable.
  • processorType (type: String.t, default: nil) - The Doc AI processor type name. Only used when the format of ingested files is Doc AI Document proto format.
  • schemaName (type: String.t, default: nil) - The Document Warehouse schema resource name. All documents processed by this pipeline will use this schema. Format: projects/{project_number}/locations/{location}/documentSchemas/{document_schema_id}.
  • skipIngestedDocuments (type: boolean(), default: nil) - Whether to skip documents that have already been ingested. If set to true, documents in Cloud Storage whose custom metadata contains the key "status" with the value "status=ingested" are skipped.
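
The attributes above can be sketched as a plain Elixir map with a couple of client-side checks. This is a minimal illustration, not part of google_api_content_warehouse: the module name GcsIngestConfigSketch, its functions, and the example bucket and schema values are all hypothetical, and the checks only mirror the formats and the "status" metadata rule stated in the attribute descriptions. How the service itself validates these fields is not documented here.

```elixir
defmodule GcsIngestConfigSketch do
  # Hypothetical helper module; none of these functions exist in the library.

  # Builds a plain map keyed by the attribute names listed above.
  # Optional attributes default to nil, matching the documented defaults.
  def build(input_path, schema_name, opts \\ []) do
    %{
      inputPath: input_path,
      schemaName: schema_name,
      processorType: Keyword.get(opts, :processor_type),
      pipelineConfig: Keyword.get(opts, :pipeline_config),
      skipIngestedDocuments: Keyword.get(opts, :skip_ingested)
    }
  end

  # Returns :ok, or {:error, reason} when a documented format is not met.
  def validate(%{inputPath: path, schemaName: schema}) do
    cond do
      not (is_binary(path) and String.starts_with?(path, "gs://")) ->
        {:error, :invalid_input_path}

      not schema_name?(schema) ->
        {:error, :invalid_schema_name}

      true ->
        :ok
    end
  end

  # Mirrors the skipIngestedDocuments description: a document is skipped only
  # when the flag is true and its custom metadata holds "status" => "status=ingested".
  def skip?(%{skipIngestedDocuments: true}, custom_metadata) when is_map(custom_metadata) do
    Map.get(custom_metadata, "status") == "status=ingested"
  end

  def skip?(_config, _custom_metadata), do: false

  # Checks the shape projects/{..}/locations/{..}/documentSchemas/{..}.
  defp schema_name?(name) when is_binary(name) do
    case String.split(name, "/") do
      ["projects", p, "locations", l, "documentSchemas", s] ->
        p != "" and l != "" and s != ""

      _ ->
        false
    end
  end

  defp schema_name?(_), do: false
end
```

For example, GcsIngestConfigSketch.build("gs://example-bucket/invoices", "projects/123/locations/us/documentSchemas/abc", skip_ingested: true) produces a map that passes validate/1, and skip?/2 then returns true only for objects whose custom metadata carries the ingested status.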

Summary

Types

t()

Functions

decode(value, options)

Unwrap a decoded JSON object into its complex fields.

Types

t()

@type t() ::
  %GoogleApi.ContentWarehouse.V1.Model.GoogleCloudContentwarehouseV1GcsIngestPipeline{
    inputPath: String.t() | nil,
    pipelineConfig:
      GoogleApi.ContentWarehouse.V1.Model.GoogleCloudContentwarehouseV1IngestPipelineConfig.t()
      | nil,
    processorType: String.t() | nil,
    schemaName: String.t() | nil,
    skipIngestedDocuments: boolean() | nil
  }

Functions

decode(value, options)

@spec decode(struct(), keyword()) :: struct()

Unwrap a decoded JSON object into its complex fields.
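
One plausible reading of "unwrap into its complex fields" is that a nested field such as pipelineConfig arrives from the JSON decoder as a plain map and is then converted into its own struct. The sketch below illustrates that idea with local stand-in structs. The module DecodeSketch, its nested structs, and the folder field are hypothetical and chosen only for illustration; this is not the library's implementation.

```elixir
defmodule DecodeSketch do
  defmodule PipelineConfig do
    # Hypothetical stand-in for the nested config model; the field is illustrative.
    defstruct [:folder]
  end

  defmodule GcsIngestPipeline do
    # Stand-in struct carrying the attribute names documented above.
    defstruct [:inputPath, :pipelineConfig, :processorType, :schemaName, :skipIngestedDocuments]
  end

  # Nothing to unwrap when the nested field is absent.
  def decode(%GcsIngestPipeline{pipelineConfig: nil} = value, _options), do: value

  # Already unwrapped: return the value unchanged.
  def decode(%GcsIngestPipeline{pipelineConfig: %PipelineConfig{}} = value, _options), do: value

  # A string-keyed map, as a JSON decoder would produce, becomes a struct.
  def decode(%GcsIngestPipeline{pipelineConfig: raw} = value, _options) when is_map(raw) do
    %{value | pipelineConfig: %PipelineConfig{folder: Map.get(raw, "folder")}}
  end
end
```

With this sketch, decoding a GcsIngestPipeline whose pipelineConfig is %{"folder" => "invoices"} yields the same struct with pipelineConfig replaced by a PipelineConfig struct, while the scalar fields pass through untouched.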