@@ -25,51 +25,51 @@ class AudioEncoding(enum.IntEnum):
 
     All encodings support only 1 channel (mono) audio.
 
-    For best results, the audio source should be captured and transmitted using
-    a lossless encoding (``FLAC`` or ``LINEAR16``). The accuracy of the speech
-    recognition can be reduced if lossy codecs are used to capture or transmit
-    audio, particularly if background noise is present. Lossy codecs include
-    ``MULAW``, ``AMR``, ``AMR_WB``, ``OGG_OPUS``, and ``SPEEX_WITH_HEADER_BYTE``.
-
-    The ``FLAC`` and ``WAV`` audio file formats include a header that describes the
-    included audio content. You can request recognition for ``WAV`` files that
-    contain either ``LINEAR16`` or ``MULAW`` encoded audio.
-    If you send ``FLAC`` or ``WAV`` audio file format in
-    your request, you do not need to specify an ``AudioEncoding``; the audio
-    encoding format is determined from the file header. If you specify
-    an ``AudioEncoding`` when you send send ``FLAC`` or ``WAV`` audio, the
+    For best results, the audio source should be captured and transmitted
+    using a lossless encoding (``FLAC`` or ``LINEAR16``). The accuracy of
+    the speech recognition can be reduced if lossy codecs are used to
+    capture or transmit audio, particularly if background noise is present.
+    Lossy codecs include ``MULAW``, ``AMR``, ``AMR_WB``, ``OGG_OPUS``, and
+    ``SPEEX_WITH_HEADER_BYTE``.
+
+    The ``FLAC`` and ``WAV`` audio file formats include a header that
+    describes the included audio content. You can request recognition for
+    ``WAV`` files that contain either ``LINEAR16`` or ``MULAW`` encoded
+    audio. If you send ``FLAC`` or ``WAV`` audio file format in your
+    request, you do not need to specify an ``AudioEncoding``; the audio
+    encoding format is determined from the file header. If you specify an
+    ``AudioEncoding`` when you send ``FLAC`` or ``WAV`` audio, the
     encoding configuration must match the encoding described in the audio
     header; otherwise the request returns an
     ``google.rpc.Code.INVALID_ARGUMENT`` error code.
 
     Attributes:
       ENCODING_UNSPECIFIED (int): Not specified.
       LINEAR16 (int): Uncompressed 16-bit signed little-endian samples (Linear PCM).
-      FLAC (int): ``FLAC`` (Free Lossless Audio
-      Codec) is the recommended encoding because it is
-      lossless--therefore recognition is not compromised--and
-      requires only about half the bandwidth of ``LINEAR16``. ``FLAC`` stream
-      encoding supports 16-bit and 24-bit samples, however, not all fields in
+      FLAC (int): ``FLAC`` (Free Lossless Audio Codec) is the recommended encoding because
+      it is lossless--therefore recognition is not compromised--and requires
+      only about half the bandwidth of ``LINEAR16``. ``FLAC`` stream encoding
+      supports 16-bit and 24-bit samples, however, not all fields in
       ``STREAMINFO`` are supported.
       MULAW (int): 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law.
-      AMR (int): Adaptive Multi-Rate Narrowband codec. ``sample_rate_hertz`` must be 8000.
+      AMR (int): Adaptive Multi-Rate Narrowband codec. ``sample_rate_hertz`` must be
+      8000.
       AMR_WB (int): Adaptive Multi-Rate Wideband codec. ``sample_rate_hertz`` must be 16000.
       OGG_OPUS (int): Opus encoded audio frames in Ogg container
-      (`OggOpus <https://wiki.xiph.org/OggOpus>`_).
-      ``sample_rate_hertz`` must be one of 8000, 12000, 16000, 24000, or 48000.
+      (`OggOpus <https://wiki.xiph.org/OggOpus>`__). ``sample_rate_hertz``
+      must be one of 8000, 12000, 16000, 24000, or 48000.
       SPEEX_WITH_HEADER_BYTE (int): Although the use of lossy encodings is not recommended, if a very low
       bitrate encoding is required, ``OGG_OPUS`` is highly preferred over
-      Speex encoding. The `Speex <https://speex.org/>`_ encoding supported by
+      Speex encoding. The `Speex <https://speex.org/>`__ encoding supported by
       Cloud Speech API has a header byte in each block, as in MIME type
-      ``audio/x-speex-with-header-byte``.
-      It is a variant of the RTP Speex encoding defined in
-      `RFC 5574 <https://tools.ietf.org/html/rfc5574>`_.
+      ``audio/x-speex-with-header-byte``. It is a variant of the RTP Speex
+      encoding defined in `RFC 5574 <https://tools.ietf.org/html/rfc5574>`__.
      The stream is a sequence of blocks, one block per RTP packet. Each block
-      starts with a byte containing the length of the block, in bytes, followed
-      by one or more frames of Speex data, padded to an integral number of
-      bytes (octets) as specified in RFC 5574. In other words, each RTP header
-      is replaced with a single byte containing the block length. Only Speex
-      wideband is supported. ``sample_rate_hertz`` must be 16000.
+      starts with a byte containing the length of the block, in bytes,
+      followed by one or more frames of Speex data, padded to an integral
+      number of bytes (octets) as specified in RFC 5574. In other words, each
+      RTP header is replaced with a single byte containing the block length.
+      Only Speex wideband is supported. ``sample_rate_hertz`` must be 16000.
     """
     ENCODING_UNSPECIFIED = 0
     LINEAR16 = 1
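The per-codec sample-rate constraints stated in the ``AudioEncoding`` docstring above (``AMR`` requires 8000 Hz, ``AMR_WB`` 16000 Hz, ``OGG_OPUS`` one of five fixed rates, ``SPEEX_WITH_HEADER_BYTE`` 16000 Hz) can be collected into a small lookup table. This is an illustrative sketch only; the ``ALLOWED_RATES`` table and ``rate_is_valid`` helper are hypothetical names, not part of the library.

```python
import enum


class AudioEncoding(enum.IntEnum):
    ENCODING_UNSPECIFIED = 0
    LINEAR16 = 1
    FLAC = 2
    MULAW = 3
    AMR = 4
    AMR_WB = 5
    OGG_OPUS = 6
    SPEEX_WITH_HEADER_BYTE = 7


# Fixed sample-rate sets from the docstring; encodings absent from this
# table have no fixed set of allowed rates.
ALLOWED_RATES = {
    AudioEncoding.AMR: {8000},
    AudioEncoding.AMR_WB: {16000},
    AudioEncoding.OGG_OPUS: {8000, 12000, 16000, 24000, 48000},
    AudioEncoding.SPEEX_WITH_HEADER_BYTE: {16000},
}


def rate_is_valid(encoding, sample_rate_hertz):
    """Return True if sample_rate_hertz is acceptable for the encoding."""
    allowed = ALLOWED_RATES.get(encoding)
    return allowed is None or sample_rate_hertz in allowed
```

A request such as ``AMR`` at 16000 Hz would fail this check, matching the docstring's statement that the server rejects mismatched configurations with ``INVALID_ARGUMENT``.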
@@ -91,9 +91,9 @@ class InteractionType(enum.IntEnum):
       INTERACTION_TYPE_UNSPECIFIED (int): Use case is either unknown or is something other than one of the other
       values below.
       DISCUSSION (int): Multiple people in a conversation or discussion. For example in a
-      meeting with two or more people actively participating. Typically
-      all the primary people speaking would be in the same room (if not,
-      see PHONE_CALL )
+      meeting with two or more people actively participating. Typically all
+      the primary people speaking would be in the same room (if not, see
+      PHONE\_CALL )
       PRESENTATION (int): One or more persons lecturing or presenting to others, mostly
       uninterrupted.
       PHONE_CALL (int): A phone-call or video-conference in which two or more people, who are
@@ -178,9 +178,10 @@ class SpeechEventType(enum.IntEnum):
       speech utterance and expects no additional speech. Therefore, the server
       will not process additional audio (although it may subsequently return
       additional results). The client should stop sending additional audio
-      data, half-close the gRPC connection, and wait for any additional results
-      until the server closes the gRPC connection. This event is only sent if
-      ``single_utterance`` was set to ``true``, and is not used otherwise.
+      data, half-close the gRPC connection, and wait for any additional
+      results until the server closes the gRPC connection. This event is only
+      sent if ``single_utterance`` was set to ``true``, and is not used
+      otherwise.
     """
     SPEECH_EVENT_UNSPECIFIED = 0
     END_OF_SINGLE_UTTERANCE = 1
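The ``SPEEX_WITH_HEADER_BYTE`` framing described in the ``AudioEncoding`` docstring above, where each RTP header is replaced by a single byte giving the block length, can be sketched as a small parser. ``split_speex_blocks`` is a hypothetical helper for illustration only, not part of the Cloud Speech API.

```python
def split_speex_blocks(stream: bytes) -> list:
    """Split a SPEEX_WITH_HEADER_BYTE byte stream into its blocks.

    Per the docstring above, each block starts with one byte giving the
    length, in bytes, of the Speex payload that follows.
    """
    blocks = []
    i = 0
    while i < len(stream):
        length = stream[i]  # header byte: payload length in bytes
        payload = stream[i + 1 : i + 1 + length]
        if len(payload) != length:
            raise ValueError("truncated Speex block")
        blocks.append(payload)
        i += 1 + length
    return blocks
```

For example, the stream ``b"\x02ab\x01c"`` splits into a 2-byte block ``b"ab"`` followed by a 1-byte block ``b"c"``.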