Query model Speech-to-Text

v1
POST /1/ai/{product_id}/openai/audio/transcriptions

Query our Speech-to-Text model. The input format is the same as the OpenAI API. In async mode, the result is retrieved from a separate endpoint.

Path parameters

product_id (integer, required)

AI API product identifier

Example: 24250

Body Parameters

application/json
append_punctuations (string)

Only applies when timestamp_granularities[] includes word. Merges these punctuation symbols with the previous word.

Example: \"'.。,,!!??::”)]}、
chunk_length (integer)
Min: 2, Max: 30

Defines the maximum duration of an active segment, in seconds. For subtitle tasks, it is recommended to set this to a short duration (5-10 seconds) to avoid long sentences.

Example: 30
file (string, required)

The audio file to transcribe (50 MB max; supported types: mp3, mp4, aac, wav, flac, ogg, opus, wma, m4a)

Example: example
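
The size and type limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function and constant names are hypothetical, the check only looks at the file extension rather than the actual content, and it assumes that 50 MB means 50 × 1024 × 1024 bytes, which this page does not specify.

```php
<?php
// Hypothetical client-side pre-check; not part of the API.
const MAX_AUDIO_BYTES = 50 * 1024 * 1024; // assumption: 50 MB counted in binary units
const AUDIO_EXTENSIONS = ['mp3', 'mp4', 'aac', 'wav', 'flac', 'ogg', 'opus', 'wma', 'm4a'];

function isAcceptableAudioFile(string $filename, int $sizeInBytes): bool
{
    // Compare the extension case-insensitively against the documented list.
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    return in_array($extension, AUDIO_EXTENSIONS, true)
        && $sizeInBytes > 0
        && $sizeInBytes <= MAX_AUDIO_BYTES;
}
```

For a local file, the size argument could come from filesize($path).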
highlight_words (boolean)

Subtitle task. Underline each word as it is spoken in srt and vtt output formats (requires timestamp_granularities[]:word)

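language (string)
Possible values: af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, yue, zh

The language of the input audio. Supplying the input language will translate the output.

Example: en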
max_line_count (integer)
Min: 1, Max: 1000

Subtitle task. The maximum number of lines in a segment in srt and vtt output formats (requires timestamp_granularities[]:word)

Example: 1
max_line_width (integer)
Min: 1, Max: 1000

Subtitle task. The maximum number of characters in a line before breaking the line in srt and vtt output formats (requires timestamp_granularities[]:word)

Example: 42
max_words_per_line (integer)
Min: 1, Max: 1000

Subtitle task. The maximum number of words in a segment (requires timestamp_granularities[]:word)

Example: 1000
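
To make the effect of max_line_width and max_line_count concrete, the sketch below wraps a list of words into lines and groups the lines into segments. It is only an illustration of the idea, not the service's actual layout algorithm, which this page does not describe. The function name is hypothetical, and mb_strlen requires PHP's mbstring extension.

```php
<?php
// Illustrative only: greedy line wrapping, then grouping lines into segments.
function wrapSubtitle(array $words, int $maxLineWidth = 42, int $maxLineCount = 1): array
{
    $lines = [];
    $current = '';

    foreach ($words as $word) {
        $candidate = $current === '' ? $word : $current . ' ' . $word;
        if ($current !== '' && mb_strlen($candidate) > $maxLineWidth) {
            // The word does not fit: close the current line and start a new one.
            $lines[] = $current;
            $current = $word;
        } else {
            $current = $candidate;
        }
    }
    if ($current !== '') {
        $lines[] = $current;
    }

    // Each segment holds at most $maxLineCount lines.
    return array_chunk($lines, $maxLineCount);
}
```

A single word longer than max_line_width is kept on its own line rather than split.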
model (string, required)
Possible values: whisper, whisperV2

ID of the model to use.

Example: whisper
no_speech_threshold (number)

If the no_speech probability is higher than this value AND the average log probability over sampled tokens is below log_prob_threshold, consider the segment as silent.

Example: 0.6
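
The rule above combines two conditions, and both must hold for a segment to be treated as silent. A minimal sketch of that decision follows. The function name is hypothetical, and the default of -1.0 for log_prob_threshold is an assumption, since this page does not state its value.

```php
<?php
// Illustrative helper; the -1.0 default for $logProbThreshold is an assumption.
function isSilentSegment(
    float $noSpeechProb,
    float $avgLogProb,
    float $noSpeechThreshold = 0.6,
    float $logProbThreshold = -1.0
): bool {
    // Silent only when the model is both confident there is no speech
    // and unconfident about the tokens it decoded.
    return $noSpeechProb > $noSpeechThreshold && $avgLogProb < $logProbThreshold;
}
```

A high no_speech probability alone is therefore not enough: a segment decoded with good average log probability is kept.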
prepend_punctuations (string)

Only applies when timestamp_granularities[] includes word. Merges these punctuation symbols with the next word.

Example: \"'“¿([{-
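
As an illustration of what merging with the next or previous word means, the sketch below glues opening punctuation to the word that follows and closing punctuation to the word that precedes. It is an assumption about the behaviour, not the service's implementation. The function name is hypothetical, and mb_str_split requires PHP 7.4+ with the mbstring extension.

```php
<?php
// Illustrative only: merge standalone punctuation tokens into neighbouring words.
function mergePunctuations(array $tokens, string $prepend, string $append): array
{
    $prependSet = mb_str_split($prepend);
    $appendSet = mb_str_split($append);
    $merged = [];
    $pendingPrefix = '';

    foreach ($tokens as $token) {
        if (in_array($token, $prependSet, true)) {
            // Opening punctuation waits for the next word.
            $pendingPrefix .= $token;
        } elseif (in_array($token, $appendSet, true) && $pendingPrefix === '' && $merged !== []) {
            // Closing punctuation is glued to the previous word.
            $merged[count($merged) - 1] .= $token;
        } else {
            $merged[] = $pendingPrefix . $token;
            $pendingPrefix = '';
        }
    }
    if ($pendingPrefix !== '') {
        $merged[] = $pendingPrefix; // opening punctuation with no word after it
    }

    return $merged;
}
```

A symbol that appears in both strings is treated as opening punctuation here, because the prepend check runs first.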
prompt (string)

An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.

Example: example
response_format (string)
Possible values: json, srt, text, verbose_json, vtt

The format of the transcript output (default: json)

Example: text
timestamp_granularities (array of string)
Possible values: segment, word

The timestamp granularities to populate for this transcription. Either or both of these options are supported: word, segment. Requires response_format=verbose_json. Defaults to segment.

Example: ["word","segment"]
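
The required fields, allowed values and numeric ranges listed above can be verified before the request is sent. The sketch below is one hypothetical way to do that on the client. The function name and error messages are illustrative, and it deliberately does not enforce the dependencies between parameters.

```php
<?php
// Hypothetical client-side validator; the limits are the ones listed on this page.
function validateTranscriptionParams(array $params): array
{
    $errors = [];

    // Required parameters.
    foreach (['file', 'model'] as $name) {
        if (!isset($params[$name]) || $params[$name] === '') {
            $errors[] = "$name is required";
        }
    }

    // Enumerated values.
    $allowed = [
        'model' => ['whisper', 'whisperV2'],
        'response_format' => ['json', 'srt', 'text', 'verbose_json', 'vtt'],
    ];
    foreach ($allowed as $name => $values) {
        if (isset($params[$name]) && !in_array($params[$name], $values, true)) {
            $errors[] = "$name must be one of: " . implode(', ', $values);
        }
    }
    foreach ($params['timestamp_granularities'] ?? [] as $granularity) {
        if (!in_array($granularity, ['segment', 'word'], true)) {
            $errors[] = "unknown timestamp granularity: $granularity";
        }
    }

    // Integer ranges as [min, max].
    $ranges = [
        'chunk_length' => [2, 30],
        'max_line_count' => [1, 1000],
        'max_line_width' => [1, 1000],
        'max_words_per_line' => [1, 1000],
    ];
    foreach ($ranges as $name => [$min, $max]) {
        if (isset($params[$name]) && ($params[$name] < $min || $params[$name] > $max)) {
            $errors[] = "$name must be between $min and $max";
        }
    }

    return $errors;
}
```

An empty array means no problem was found; otherwise each entry describes one violation.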

Response Body

application/json
batch_id (string)

The ID of the batch dispatched to handle the transcription.

Example: 9b9fec49-cc95-44d5-8d3a-be56a6e05970

Example request

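<?php
require 'vendor/autoload.php'; // Composer autoloader (assumes Guzzle was installed with Composer)

use GuzzleHttp\Client;
use GuzzleHttp\Psr7\Request;

$client = new Client();
$headers = [
	'Authorization' => 'Bearer YOUR-TOKEN-HERE',
	'Content-Type' => 'application/json'
];

// Minimal body: only the two required parameters.
$body = '{
    "file": "example",
    "model": "whisper"
}';

// Replace {product_id} with your AI API product identifier.
$request = new Request('POST', 'https://api.infomaniak.com/1/ai/{product_id}/openai/audio/transcriptions', $headers, $body);
$res = $client->sendAsync($request)->wait();
echo $res->getBody(); // JSON containing the batch_id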

Example response

application/json

{
    "batch_id": "9b9fec49-cc95-44d5-8d3a-be56a6e05970"
}
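
Since the response carries only the batch identifier, a client typically extracts it for later use. A minimal sketch follows, assuming the body has exactly the shape shown in the example response. The function name is hypothetical.

```php
<?php
// Illustrative helper: read batch_id from the JSON response body.
function extractBatchId(string $responseBody): string
{
    $data = json_decode($responseBody, true);

    // Fail loudly on malformed JSON or an unexpected payload shape.
    if (!is_array($data) || !isset($data['batch_id']) || !is_string($data['batch_id'])) {
        throw new UnexpectedValueException('Response does not contain a batch_id');
    }

    return $data['batch_id'];
}
```

With the Guzzle example above, it could be called as extractBatchId((string) $res->getBody()).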