SpeechWaveHeader

AI Overview

  • This module appears to describe the header of a buffer of audio data: either a speech waveform or a sequence of feature frames extracted from one. Such a header would be needed by any pipeline that reads the buffer, for example speech recognition or audio classification.
  • This module could affect search results indirectly: if Google can reliably read and analyze audio content, speech recognition and audio classification may become more accurate, which could lead to more relevant results for audio-related queries and better voice search.
  • To work well with such a pipeline, a website could publish clear, high-quality audio in standardized formats and encodings, and expose relevant metadata such as sample rate, bitrate, and encoding type. Providing transcripts or captions would also make the audio content easier for Google to analyze.

GoogleApi.ContentWarehouse.V1.Model.SpeechWaveHeader (google_api_content_warehouse v0.4.0)

A general-purpose buffer to contain sequences of samples. When representing a waveform, the samples are the scalar values of an acoustic signal. When representing a sequence of feature frames, the samples are vector-valued frames.

Attributes

  • atomicSize (type: integer(), default: nil) - Size of atomic type, in bytes.
  • atomicType (type: String.t, default: nil) - Numeric type of data elements (if generic)
  • bitRate (type: number(), default: nil) - For compressed signals with fixed bitrate, this is the number of bits per second.
  • byteOrder (type: String.t, default: nil) - Byte order of the atomic_type. When atomic_type == "char", byte_order should always be "1". When atomic_type == "int16", byte_order can be either "01" (Intel) or "10" (Motorola). Byte order should default to Intel when in question.
  • details (type: String.t, default: nil) - Typically contains the parameter settings of the program that created the file.
  • dimension (type: list(integer()), default: nil) - Array dimensions for a single sample. For audio samples, mono is rank==0 with dimension==[1], and stereo is rank==0 with dimension==[2] (samples are interleaved). For typical ASR features representing energy, 12 MFCC coefficients, and first and second derivatives, rank==1 and dimension==[39].
  • elementsPerSample (type: integer(), default: nil) - The number of atomic elements stored per sample. This is the product of all the entries in the dimension array. Written "out of order" in this file to be close to the dimension field, from which it can always be computed.
  • rank (type: integer(), default: nil) - The rank of each sample. For a waveform (signals that are sequences of scalar values), this is 0. For vector-valued signals (used as signals containing sequences of features, for example), this is 1. scalar=0, vector=1, matrix=2, ...
  • sampleCoding (type: String.t, default: nil) - Sample encoding. Can be "ulaw".
  • sampleRate (type: number(), default: nil) - For periodic signals, this is the number of samples per second, else 0.0.
  • sampleSize (type: integer(), default: nil) - Size of a single sample, in bytes.
  • sampleType (type: String.t, default: nil) - Structure of each sample. "generic" means that the samples are multi-dimensional arrays of atomic_type with the specified rank.
  • startTime (type: number(), default: nil) - Time origin for the signal, in seconds. Warning: using float can result in rounding errors. Float's smallest distance between two representable values (1 ULP; see https://en.wikipedia.org/wiki/Unit_in_the_last_place) between 1024 and 2048 (representing ~17-34 min) is 0.0001220703125, which is approximately double the resolution needed to represent one sample at a 16 kHz sample rate (1/16000 s = 0.0000625 s). The error doubles in the 2048s-4096s range, quadruples in the 4096s-8192s range, and so on. Higher sample rates encounter rounding errors earlier: at 96 kHz, rounding errors start at ~2 min (128s).
  • totalSamples (type: String.t, default: nil) - The number of samples in file. Can be inferred for generics from file size.
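
The relationship between dimension, elementsPerSample, and byteOrder described above can be sketched in Python. This is only an illustration under stated assumptions, not part of the documented API: the helper names are hypothetical, the mapping of "01" to little-endian and "10" to big-endian follows the Intel/Motorola labels in the byteOrder description, and sampleSize is assumed to equal atomicSize times elementsPerSample for int16 data.

```python
import struct
from math import prod

# Assumed mapping from the documented byteOrder strings to struct prefixes:
# "01" (Intel) is little-endian, "10" (Motorola) is big-endian.
BYTE_ORDER_PREFIX = {"01": "<", "10": ">"}


def elements_per_sample(dimension):
    """Product of all entries in the dimension array."""
    return prod(dimension)


def decode_int16_samples(payload, dimension, byte_order="01"):
    """Split interleaved int16 data into one tuple per sample.

    Hypothetical helper: assumes atomicSize == 2 and that each sample
    occupies atomicSize * elementsPerSample bytes.
    """
    n = elements_per_sample(dimension)
    sample_size = 2 * n
    if len(payload) % sample_size:
        raise ValueError("payload is not a whole number of samples")
    count = len(payload) // 2  # number of int16 atoms in the payload
    fmt = f"{BYTE_ORDER_PREFIX[byte_order]}{count}h"
    values = struct.unpack(fmt, payload)
    # Group consecutive atoms into samples (stereo: left, right, left, right, ...).
    return [values[i:i + n] for i in range(0, count, n)]
```

For stereo data (dimension == [2]) each returned tuple holds one left/right pair, which is what "samples are interleaved" implies. Reading the same bytes with the wrong byteOrder silently yields different values, so the default to Intel order matters when the field is missing.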

Summary

Types

t()

Functions

decode(value, options)

Unwrap a decoded JSON object into its complex fields.

Types

t()

@type t() :: %GoogleApi.ContentWarehouse.V1.Model.SpeechWaveHeader{
  atomicSize: integer() | nil,
  atomicType: String.t() | nil,
  bitRate: number() | nil,
  byteOrder: String.t() | nil,
  details: String.t() | nil,
  dimension: [integer()] | nil,
  elementsPerSample: integer() | nil,
  rank: integer() | nil,
  sampleCoding: String.t() | nil,
  sampleRate: number() | nil,
  sampleSize: integer() | nil,
  sampleType: String.t() | nil,
  startTime: number() | nil,
  totalSamples: String.t() | nil
}
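
The rounding warning attached to startTime can be checked numerically. The Python sketch below assumes the "float" in that warning is a 32-bit IEEE 754 value (the quoted ULP of 0.0001220703125 at 1024 s matches that format); the function names are illustrative and not part of the library.

```python
import struct


def float32_ulp(x):
    """Gap between x and the next larger float32 (x must be positive and finite)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    current = struct.unpack("<f", struct.pack("<I", bits))[0]
    following = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return following - current


def first_lossy_second(sample_rate_hz):
    """Smallest power-of-two time, in seconds, at which one float32 ULP
    exceeds one sample period, so single samples can no longer be addressed."""
    period = 1.0 / sample_rate_hz
    t = 1.0
    while float32_ulp(t) <= period:
        t *= 2.0
    return t
```

Under that assumption, first_lossy_second(16000.0) gives 1024.0 and first_lossy_second(96000.0) gives 128.0, which agrees with the ranges quoted in the startTime description.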

Functions

decode(value, options)

@spec decode(struct(), keyword()) :: struct()

Unwrap a decoded JSON object into its complex fields.
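
As a rough analogue of what decoding involves, the Python sketch below turns a JSON object into a typed record with the same field names. It is not the Elixir library's implementation: WaveHeader and decode_header are hypothetical names, unknown keys are simply dropped as a design choice of this sketch, and totalSamples is kept as a string because the type above declares it as String.t (presumably a 64-bit integer serialized as a JSON string).

```python
import json
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class WaveHeader:
    """Hypothetical mirror of the documented struct; every field defaults to None."""
    atomicSize: Optional[int] = None
    atomicType: Optional[str] = None
    bitRate: Optional[float] = None
    byteOrder: Optional[str] = None
    details: Optional[str] = None
    dimension: Optional[List[int]] = None
    elementsPerSample: Optional[int] = None
    rank: Optional[int] = None
    sampleCoding: Optional[str] = None
    sampleRate: Optional[float] = None
    sampleSize: Optional[int] = None
    sampleType: Optional[str] = None
    startTime: Optional[float] = None
    totalSamples: Optional[str] = None


def decode_header(text):
    """Parse a JSON object and keep only the keys the header defines."""
    raw = json.loads(text)
    known = {f.name for f in fields(WaveHeader)}
    return WaveHeader(**{k: v for k, v in raw.items() if k in known})
```

A caller that needs the sample count as a number would convert it explicitly, for example int(header.totalSamples), after checking that the field is not None.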