Query model Speech-to-Text

POST

/1/ai/{product_id}/openai/audio/transcriptions

Query our Speech-to-Text model. The input format is the same as the OpenAI API. In async mode, the result must be fetched from a separate endpoint.

Path parameters

product_id

required integer

AI API product identifier

Examples: 52438
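
As a minimal sketch, the path parameter can be substituted into the endpoint URL as follows. The helper name `transcriptionUrl` is hypothetical, and the positive-integer check is an assumption rather than a documented constraint; the base URL is the one used in the example request.

```php
<?php
// Hypothetical helper: builds the transcription endpoint URL for a product.
function transcriptionUrl(int $productId): string
{
    // Assumption: product identifiers are positive integers.
    if ($productId <= 0) {
        throw new InvalidArgumentException('product_id must be a positive integer');
    }
    return sprintf('https://api.infomaniak.com/1/ai/%d/openai/audio/transcriptions', $productId);
}

echo transcriptionUrl(52438), PHP_EOL;
```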

Body Parameters

application/json

append_punctuations string

Applies only when timestamp_granularities includes word: merges these punctuation symbols with the previous word.

Examples: \"'.。,，!！?？:：”)]}、

chunk_length integer

Min: 2 Max: 30

Defines the maximum duration of an active segment, in seconds. For subtitle tasks, a short duration (5-10 seconds) is recommended to avoid long sentences.

Examples: 30

file required string

Max length: 50000

The audio file to transcribe (50 MB max; supported types: mp3, mp4, aac, wav, flac, ogg, opus, wma, m4a)

Examples: example

highlight_words boolean

Subtitle task. Underlines each word as it is spoken in srt and vtt output formats (requires timestamp_granularities[]:word).

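language string

Possible values: af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, yue, zh

The language of the input audio. Supplying the input language will translate the output.

Examples: en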

max_line_count integer

Min: 1 Max: 1000

Subtitle task. The maximum number of lines in a segment in srt and vtt output formats (requires timestamp_granularities[]:word).

Examples: 1

max_line_width integer

Min: 1 Max: 1000

Subtitle task. The maximum number of characters in a line before the line is broken, in srt and vtt output formats (requires timestamp_granularities[]:word).

Examples: 42

max_words_per_line integer

Min: 1 Max: 1000

Subtitle task. The maximum number of words in a segment (requires timestamp_granularities[]:word).

Examples: 1000
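
The integer parameters above each list a Min and Max. A client could check these ranges before sending a request, as in the sketch below. The helper `validateIntegerOptions` and the constant `INTEGER_RANGES` are hypothetical names; only the ranges themselves come from this page, and how the API reports out-of-range values is not covered here.

```php
<?php
// Ranges copied from the Min/Max values listed for each parameter.
const INTEGER_RANGES = [
    'chunk_length'       => [2, 30],
    'max_line_count'     => [1, 1000],
    'max_line_width'     => [1, 1000],
    'max_words_per_line' => [1, 1000],
];

// Hypothetical helper: returns one error message per out-of-range option.
function validateIntegerOptions(array $options): array
{
    $errors = [];
    foreach (INTEGER_RANGES as $name => [$min, $max]) {
        if (!array_key_exists($name, $options)) {
            continue; // optional parameter not supplied
        }
        $value = $options[$name];
        if (!is_int($value) || $value < $min || $value > $max) {
            $errors[] = sprintf('%s must be an integer between %d and %d', $name, $min, $max);
        }
    }
    return $errors;
}

// chunk_length 45 exceeds the documented maximum of 30.
print_r(validateIntegerOptions(['chunk_length' => 45, 'max_line_width' => 42]));
```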

model required string

Possible values: whisper, whisperV2

ID of the model to use.

Examples: whisper

no_speech_threshold number

If the no_speech probability is higher than this value AND the average log probability over sampled tokens is below log_prob_threshold, consider the segment as silent.

Examples: 0.6
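
The silence rule above combines two conditions. The sketch below restates it as a predicate for illustration only; it is not the service's implementation. The function name `isSilentSegment` is hypothetical, and the value -1.0 used for log_prob_threshold is an assumed placeholder, since this page does not give one.

```php
<?php
// Sketch of the rule described for no_speech_threshold; not the service's code.
function isSilentSegment(
    float $noSpeechProb,
    float $avgLogProb,
    float $noSpeechThreshold,
    float $logProbThreshold
): bool {
    // Both conditions must hold for the segment to count as silent.
    return $noSpeechProb > $noSpeechThreshold && $avgLogProb < $logProbThreshold;
}

// 0.6 is the documented example value; -1.0 is an assumed log_prob_threshold.
var_dump(isSilentSegment(0.8, -1.5, 0.6, -1.0)); // bool(true)
var_dump(isSilentSegment(0.8, -0.3, 0.6, -1.0)); // bool(false): log probability is high enough
```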

prepend_punctuations string

Applies only when timestamp_granularities includes word: merges these punctuation symbols with the next word.

Examples: \"'“¿([{-
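
To illustrate what merging punctuation with the previous or next word means for append_punctuations and prepend_punctuations, here is a rough sketch operating on a list of tokens. It is an assumption about the general behaviour, not the service's actual algorithm; the function `mergePunctuation` is hypothetical, and a symbol present in both sets is treated as a prefix here.

```php
<?php
// Illustrative only: the service's real merging logic is not documented here.
function mergePunctuation(array $tokens, string $prepend, string $append): array
{
    // Split each punctuation set into single UTF-8 characters.
    $prependSet = preg_split('//u', $prepend, -1, PREG_SPLIT_NO_EMPTY);
    $appendSet  = preg_split('//u', $append, -1, PREG_SPLIT_NO_EMPTY);

    $words = [];
    $pendingPrefix = '';
    foreach ($tokens as $token) {
        if (in_array($token, $prependSet, true)) {
            // Hold opening punctuation until the next word arrives.
            $pendingPrefix .= $token;
        } elseif (in_array($token, $appendSet, true) && $words !== [] && $pendingPrefix === '') {
            // Attach closing punctuation to the previous word.
            $words[count($words) - 1] .= $token;
        } else {
            $words[] = $pendingPrefix . $token;
            $pendingPrefix = '';
        }
    }
    if ($pendingPrefix !== '') {
        $words[] = $pendingPrefix; // trailing prefix with no word after it
    }
    return $words;
}

echo implode(' ', mergePunctuation(['¿', 'Hola', ',', 'mundo', '?'], '¿([{', '.,!?)]}')), PHP_EOL;
// prints "¿Hola, mundo?"
```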

prompt string

An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.

Examples: example

response_format string

Possible values: json, srt, text, verbose_json, vtt

The format of the transcript output (default: json)

Examples: text

timestamp_granularities array of string

Possible values: segment, word

The timestamp granularities to populate for this transcription. Either or both of word and segment may be given. Requires response_format=verbose_json. Defaults to segment.

Examples: ["word","segment"]
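
Putting the body parameters together, a request body asking for timestamps might be assembled as sketched below. The helper `buildTranscriptionBody` is hypothetical. It assumes the JSON key is `timestamp_granularities`, as the parameter is named above, and it sets response_format to verbose_json because the description says timestamps require it.

```php
<?php
// Hypothetical helper: builds a JSON request body, optionally with timestamps.
function buildTranscriptionBody(string $file, string $model, array $granularities = []): string
{
    if (!in_array($model, ['whisper', 'whisperV2'], true)) {
        throw new InvalidArgumentException("Unsupported model: $model");
    }
    $body = ['file' => $file, 'model' => $model];

    if ($granularities !== []) {
        $unknown = array_diff($granularities, ['segment', 'word']);
        if ($unknown !== []) {
            throw new InvalidArgumentException('Unknown granularity: ' . implode(', ', $unknown));
        }
        // Timestamps are documented as requiring verbose_json.
        $body['response_format'] = 'verbose_json';
        $body['timestamp_granularities'] = array_values($granularities);
    }
    return json_encode($body, JSON_THROW_ON_ERROR);
}

echo buildTranscriptionBody('example', 'whisper', ['word', 'segment']), PHP_EOL;
```

The resulting string can be passed as the request body in place of a hand-written JSON literal.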

Response Body

application/json

batch_id string

The ID of the batch dispatched to handle the transcription.

Examples: 9b9fec49-cc95-44d5-8d3a-be56a6e05970
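
Since the response carries only a batch_id, a client will typically want to extract and keep it. The sketch below shows one way to do that defensively; `extractBatchId` is a hypothetical helper, and what to do with the ID afterwards depends on the result endpoint, which is not described here.

```php
<?php
// Hypothetical helper: pulls batch_id out of the JSON response body.
function extractBatchId(string $responseBody): string
{
    // Throws JsonException if the body is not valid JSON.
    $data = json_decode($responseBody, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($data) || !isset($data['batch_id']) || !is_string($data['batch_id'])) {
        throw new UnexpectedValueException('Response does not contain a batch_id string');
    }
    return $data['batch_id'];
}

echo extractBatchId('{"batch_id":"9b9fec49-cc95-44d5-8d3a-be56a6e05970"}'), PHP_EOL;
```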

Example request

<?php
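require 'vendor/autoload.php'; // assumes Guzzle was installed with Composer

use GuzzleHttp\Client;
use GuzzleHttp\Psr7\Request;

$client = new Client();
$headers = [
	'Authorization' => 'Bearer YOUR-TOKEN-HERE',
	'Content-Type' => 'application/json'
];

// Minimal body: only the required parameters
$body = '{
    "file": "example",
    "model": "whisper"
}';

// Replace {product_id} with your AI API product identifier
$request = new Request('POST', 'https://api.infomaniak.com/1/ai/{product_id}/openai/audio/transcriptions', $headers, $body);
$res = $client->sendAsync($request)->wait();
echo $res->getBody();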

Example response

application/json

{
    "batch_id": "9b9fec49-cc95-44d5-8d3a-be56a6e05970"
}