ResearchScienceSearchReconciledMetadata

AI Overview😉

  • The potential purpose of this module is to extract and store metadata from scientific datasets, including information such as authors, publication dates, descriptions, keywords, and licensing information. This metadata can be used to improve the discoverability and accessibility of scientific datasets.
  • This module could impact search results by providing more accurate and relevant metadata for scientific datasets, allowing users to more easily find and access relevant datasets. This could be particularly useful for researchers and scientists who rely on access to high-quality datasets to conduct their research.
  • A website may change things to be more favorable for this function by providing clear and consistent metadata for their datasets, including using standardized formats and vocabularies for metadata fields. Additionally, websites can ensure that their datasets are easily crawlable and indexable by search engines, and provide APIs or other interfaces for accessing and querying dataset metadata.

Interesting Module? Vote 👇

Voting helps other researchers find interesting modules.

Current Votes: 0

GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchReconciledMetadata (google_api_content_warehouse v0.4.0)

A proto for storing inferred and reconciled metadata for Science Search. Next available tag: 74

Attributes

  • identifierFromSource (type: list(String.t), default: nil) - An identifier as provided by the dataset itself.
  • name (type: list(String.t), default: nil) - The names of the dataset.
  • doi (type: String.t, default: nil) - The DOI for the dataset. We assume that there is only one.
  • dateUpdated (type: GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchDate.t, default: nil) - Most recent of the three dates (published, created, modified)
  • datePublished (type: GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchDate.t, default: nil) - The date when the dataset was published.
  • alternateName (type: list(String.t), default: nil) - Alternate names and acronyms for the dataset.
  • locationReconciledForName (type: boolean(), default: nil) - Indicates if the location has been reconciled for the dataset name. This is used by LocationExtender to avoid re-annotating the dataset name.
  • fieldOfStudy (type: list(GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchFieldOfStudyInfo.t), default: nil) - Field of study: a general, high-level classification of the dataset. This is only populated during indexing time and it is only populated if the classification_source is KNOWLEDGE_GRAPH or it's above inference threshold.
  • sameAs (type: list(String.t), default: nil) - Ids for other instances (not different versions) of this dataset.
  • license (type: list(GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchLicense.t), default: nil) - License for the dataset.
  • versionsSimhash (type: String.t, default: nil) - A simhash value of the fields used for identifying versions of a dataset. This will be used by the VersionClusterInfoWriter.
  • description (type: list(String.t), default: nil) - Description of the dataset.
  • coverageEndDate (type: GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchDate.t, default: nil) -
  • isAccessibleForFree (type: String.t, default: nil) - Indicates if the dataset is available for free or behind a paywal http://schema.org/isAccessibleForFree
  • coverageStartDate (type: GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchDate.t, default: nil) - The start and end date that the dataset covers. If the dataset covers a single timepoint, then start and end dates are the same. Use the ISO 8601 format for dates (e.g., 2006-05-23).
  • versionEmbeddingVector (type: list(number()), default: nil) - An embedding for the dataset to be used by the VersionAggregator.
  • authorList (type: String.t, default: nil) - A string representation of the authors of the dataset, collected from author and creator in raw metadata. The exact format (e.g., comma-separated, etc.) is up to the extender that populates this field. The assumption is that this string may appear in the UI "as is".
  • dateCreated (type: GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchDate.t, default: nil) - The date when the dataset was created.
  • topSalientTermLabel (type: list(String.t), default: nil) - Top salient term labels that describe the dataset document body.
  • keyword (type: list(String.t), default: nil) - Keywords describing the dataset.
  • hasCroissantFormat (type: boolean(), default: nil) - Indicates if the dataset has croissant format (https://github.com/mlcommons/croissant). Use optional so that explicitly setting to false will ensure the value is passed along to the KG instead of being indistinguisable from being unset and thus not set in the KG.
  • denylistStatus (type: list(String.t), default: nil) -
  • datasetClassificationScore (type: float(), default: nil) - Probability that the entity is in fact a dataset (in contrast to spam or website labelled as dataset that does not describe a dataset).
  • languageCode (type: String.t, default: nil) - The 2-letter language code for the source page for the dataset. Same as the language code in source_url_docjoin_info. Populated only when generating output for indexing.
  • sourceUrlDocjoinInfo (type: GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchSourceUrlDocjoinInfo.t, default: nil) - All the information extracted from docjoin, for the source_url of this dataset, aka DatasetMetadata.source_url.
  • compactIdentifierFromCitation (type: list(String.t), default: nil) - Compact Identifier(s) extracted from the citation field. Like in the case of DOI(s) those identify the articles related to the dataset rather than the dataset itself.
  • mentionedUrls (type: list(String.t), default: nil) - Mentioned URLs in the description.
  • dateModified (type: GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchDate.t, default: nil) - The date when the dataset was modified.
  • funder (type: list(GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchOrganization.t), default: nil) - Funder of the dataset.
  • variable (type: list(String.t), default: nil) - Variables that the data in the dataset captures (e.g., pressure, salinity, temperature). For now, these are just strings.
  • numberOfDatasetsAtSourceUrl (type: integer(), default: nil) - The number of datasets at the same source url as this dataset.
  • spatialCoverage (type: list(GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchLocation.t), default: nil) - Locations that describe spatial coverage of the data. If the data covers multiple locations then each value corresponds to one such location, describing its coordinates, mid, etc.
  • sourceOrganization (type: list(GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchOrganization.t), default: nil) - Source of the dataset: unifies provider, creator, author, publisher etc.
  • doiFromCitation (type: list(String.t), default: nil) - DOI(s) extracted from the citation field. In contrast to the "doi" field these DOIs identify the articles related to the dataset rather than the dataset itself.
  • indexInCluster (type: integer(), default: nil) - Index of this dataset in its cluster of replicas.
  • dataDownload (type: list(GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchDataDownload.t), default: nil) - The dataset in downloadable form. There can be multiple data download entries for different file types.
  • scholarQuery (type: String.t, default: nil) - Query string to send to Scholar to obtain the best approximation of citations to the dataset.
  • publication (type: list(GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchCitation.t), default: nil) -
  • catalog (type: GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchCatalog.t, default: nil) - Catalog that this dataset is a part of.
  • isBasedOn (type: list(String.t), default: nil) - A resource (most likely another dataset) from which this dataset is derived or from which it is a modification or adaption. http://schema.org/isBasedOn
  • versionClusterInfo (type: GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchVersionClusterInfo.t, default: nil) - Information on the version cluster that the dataset is a part of. This field is populated during the indexing time; the field is populated only if the dataset is part of a version cluster.
  • url (type: list(String.t), default: nil) - urls for the dataset, including doi.
  • replica (type: list(GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchReplica.t), default: nil) - The info of replicas of this dataset.
  • isInferred (type: boolean(), default: nil) - Indicates whether the metadata was inferred using an ML model rather than from the schema.org fields. Use optional so that explicitly setting to false will ensure the value is passed along to the KG instead of being indistinguisable from being unset and thus not set in the KG. This field was originally non-optional; changing to optional is backwards compatible, but protos created prior to being optional won't have has_is_inferred() (go/proto-proposals/proto3-presence#wire-format-semantic-changes).
  • metadataType (type: String.t, default: nil) -
  • scholarlyArticle (type: GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchScholarlyArticle.t, default: nil) - For tables and figures, contains all of the metadata for a scholarly article that was the source of this table or figure. This field is populated only if metadata_type is 'TABLE' or 'FIGURE'.
  • relatedArticleUrl (type: String.t, default: nil) - The url for the article that (likely) describes this dataset.
  • basicFieldsHash (type: String.t, default: nil) - A hash of the fields copied by BasicMetadataExtender and the importers. See cs/research/science_search/backend/extender/basic_metadata_extender.h for the list of fields.
  • compactIdentifier (type: list(String.t), default: nil) - Compact Identifiers (for example "RRID:SCR_002088") that can be resolved by Identifiers.org or N2T.net meta-resolvers.
  • imageUrl (type: list(String.t), default: nil) - The image urls provided by the dataset (e.g., for thumbnail images).
  • licenseDeprecated (type: list(String.t), default: nil) - License for the dataset. DEPRECATED
  • versionEmbeddingFieldsHash (type: String.t, default: nil) - A hash of the raw metadata fields used by the VersionEmbeddingExtender.
  • hasTableSummaries (type: boolean(), default: nil) - Indicates if the dataset has table summaries. This field is only populated during indexing time.
  • numberOfScholarCitations (type: integer(), default: nil) - The number of articles that reference this dataset.
  • id (type: String.t, default: nil) - A unique id for the dataset. For the data from Spore, this is the spore id, such as, for example "http://accession.nodc.noaa.gov/8500223#__sid=js0" REQUIRED
  • measurementTechnique (type: list(String.t), default: nil) - A technique or technology used in a Dataset corresponding to the method used for measuring the corresponding variable(s) (described using variableMeasured). http://schema.org/measurementTechnique
  • sourceUrl (type: String.t, default: nil) - Source url from which we gathered the metadata
  • fingerprint (type: String.t, default: nil) - The fingerprint of basic fields from DatasetMetadata, including: - name - description DEPRECATED
  • descriptionInHtml (type: list(String.t), default: nil) - Description of the dataset converted to HTML.
  • datasetClassificationFieldsHash (type: String.t, default: nil) - A hash of the raw metadata fields used by the QualityExtender.

Summary

Types

t()

Functions

decode(value, options)

Unwrap a decoded JSON object into its complex fields.

Types

Link to this type

t()

@type t() ::
  %GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchReconciledMetadata{
    alternateName: [String.t()] | nil,
    authorList: String.t() | nil,
    basicFieldsHash: String.t() | nil,
    catalog:
      GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchCatalog.t() | nil,
    compactIdentifier: [String.t()] | nil,
    compactIdentifierFromCitation: [String.t()] | nil,
    coverageEndDate:
      GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchDate.t() | nil,
    coverageStartDate:
      GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchDate.t() | nil,
    dataDownload:
      [
        GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchDataDownload.t()
      ]
      | nil,
    datasetClassificationFieldsHash: String.t() | nil,
    datasetClassificationScore: float() | nil,
    dateCreated:
      GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchDate.t() | nil,
    dateModified:
      GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchDate.t() | nil,
    datePublished:
      GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchDate.t() | nil,
    dateUpdated:
      GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchDate.t() | nil,
    denylistStatus: [String.t()] | nil,
    description: [String.t()] | nil,
    descriptionInHtml: [String.t()] | nil,
    doi: String.t() | nil,
    doiFromCitation: [String.t()] | nil,
    fieldOfStudy:
      [
        GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchFieldOfStudyInfo.t()
      ]
      | nil,
    fingerprint: String.t() | nil,
    funder:
      [
        GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchOrganization.t()
      ]
      | nil,
    hasCroissantFormat: boolean() | nil,
    hasTableSummaries: boolean() | nil,
    id: String.t() | nil,
    identifierFromSource: [String.t()] | nil,
    imageUrl: [String.t()] | nil,
    indexInCluster: integer() | nil,
    isAccessibleForFree: String.t() | nil,
    isBasedOn: [String.t()] | nil,
    isInferred: boolean() | nil,
    keyword: [String.t()] | nil,
    languageCode: String.t() | nil,
    license:
      [GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchLicense.t()]
      | nil,
    licenseDeprecated: [String.t()] | nil,
    locationReconciledForName: boolean() | nil,
    measurementTechnique: [String.t()] | nil,
    mentionedUrls: [String.t()] | nil,
    metadataType: String.t() | nil,
    name: [String.t()] | nil,
    numberOfDatasetsAtSourceUrl: integer() | nil,
    numberOfScholarCitations: integer() | nil,
    publication:
      [GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchCitation.t()]
      | nil,
    relatedArticleUrl: String.t() | nil,
    replica:
      [GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchReplica.t()]
      | nil,
    sameAs: [String.t()] | nil,
    scholarQuery: String.t() | nil,
    scholarlyArticle:
      GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchScholarlyArticle.t()
      | nil,
    sourceOrganization:
      [
        GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchOrganization.t()
      ]
      | nil,
    sourceUrl: String.t() | nil,
    sourceUrlDocjoinInfo:
      GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchSourceUrlDocjoinInfo.t()
      | nil,
    spatialCoverage:
      [GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchLocation.t()]
      | nil,
    topSalientTermLabel: [String.t()] | nil,
    url: [String.t()] | nil,
    variable: [String.t()] | nil,
    versionClusterInfo:
      GoogleApi.ContentWarehouse.V1.Model.ResearchScienceSearchVersionClusterInfo.t()
      | nil,
    versionEmbeddingFieldsHash: String.t() | nil,
    versionEmbeddingVector: [number()] | nil,
    versionsSimhash: String.t() | nil
  }

Functions

Link to this function

decode(value, options)

@spec decode(struct(), keyword()) :: struct()

Unwrap a decoded JSON object into its complex fields.