
Highest accuracy STT for bulk applications. Detect emphasis & sound effects, and guide transcription with keyterm prompting.
Uh, hi! So, um, I was wondering if you wanted to meet up for coffee? Maybe tomorrow morning? [nervous laugh] Totally fine if not!
Create captions, subtitles, and editable transcripts for podcasts, videos, interviews, and other recorded content – all with industry-leading accuracy via API.
Scribe v2 achieves industry-leading transcription accuracy, delivering clean, editable text even in challenging audio conditions or across diverse accents.
Transcription that works in noisy environments, with background music, strong accents, and low-quality audio.
The ElevenLabs Transcription API can detect laughter, emotion, and sound effects. Use keyterm prompting to guide transcription with domain-specific terms.

Capture non-speech events like laughter, applause, music, and background noise. Transcripts include the full context of your audio, not just the words.
Automatically identify and label up to 48 speakers. Clear attribution of who said what, organized into readable transcripts.
Automatically identify and tag 56 entity types including names, dates, locations, and organizations within your transcripts.
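As a rough illustration of how speaker labels and audio events could be turned into a readable transcript, the sketch below merges consecutive words from the same speaker into turns. The word shape used here (`text`, `type`, `speakerId`) and the sample data are assumptions made for illustration, not a confirmed description of the API response; check the API docs for the actual field names.

```javascript
// Assumed word shape (hypothetical, for illustration only):
//   { text: string, type: "word" | "spacing" | "audio_event", speakerId: string }

// Merge consecutive tokens from the same speaker into one turn.
function groupBySpeaker(words) {
  const turns = [];
  for (const word of words) {
    const last = turns[turns.length - 1];
    if (last && last.speakerId === word.speakerId) {
      // Same speaker as the previous token: extend the current turn.
      last.text += word.text;
    } else {
      // Speaker changed (or first token): start a new turn.
      turns.push({ speakerId: word.speakerId, text: word.text });
    }
  }
  // Trim whitespace at the edges of each turn and drop empty turns.
  return turns
    .map((turn) => ({ speakerId: turn.speakerId, text: turn.text.trim() }))
    .filter((turn) => turn.text.length > 0);
}

// Render turns as "speaker: text" lines.
function formatTranscript(turns) {
  return turns.map((turn) => `${turn.speakerId}: ${turn.text}`).join("\n");
}

// Hypothetical sample data, not real API output.
const sampleWords = [
  { text: "Uh,", type: "word", speakerId: "speaker_0" },
  { text: " ", type: "spacing", speakerId: "speaker_0" },
  { text: "hi!", type: "word", speakerId: "speaker_0" },
  { text: " ", type: "spacing", speakerId: "speaker_0" },
  { text: "[nervous laugh]", type: "audio_event", speakerId: "speaker_0" },
  { text: "Sure!", type: "word", speakerId: "speaker_1" }
];

console.log(formatTranscript(groupBySpeaker(sampleWords)));
```

Audio events are kept inline here so the transcript preserves the context of the audio. To produce a words-only transcript instead, filter out tokens whose `type` is `"audio_event"` before grouping.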

Highest accuracy, designed for batch workloads.

Lowest latency, for realtime workloads.
Delivering exceptional accuracy across accents, dialects, and recording conditions.
Change the languageCode to preview languages
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

// Create a client authenticated with your API key.
const elevenlabs = new ElevenLabsClient({
  apiKey: "<your_api_key>"
});

// Download the sample audio and wrap it in a Blob for upload.
const response = await fetch(
  "https://storage.googleapis.com/eleven-public-cdn/audio/marketing/nicole.mp3"
);
const audioBlob = new Blob([await response.arrayBuffer()], { type: "audio/mp3" });

// Transcribe with Scribe v2, tagging audio events and labelling speakers.
const transcription = await elevenlabs
  .speechToText.convert({
    file: audioBlob,
    modelId: "scribe_v2",
    tagAudioEvents: true,
    languageCode: "<language_code>", // Set language
    diarize: true
  });
console.log(transcription);

“From dubbing Reels in local languages, to generating music and character voices in Horizon, ElevenLabs platform enables global creators, businesses, and enterprises to build with voice, music, and sound at scale.”
“Scribe’s unmatched accuracy across so many languages lets Fieldy understand every daily conversation and easily scale across continents. Fieldy has increased user retention by 50% after moving to ElevenLabs Scribe.”
“ElevenLabs made it easy for us to quickly bring powerful text-to-speech capabilities to our SDK, allowing Agents to respond in real time with expressive voices to user questions or as feedback to what it’s seeing.”

“Twilio has integrated ElevenLabs’ generative AI voice technology into its CPaaS, enhancing ConversationRelay. This integration allows businesses and developers to create conversational AI voice interactions that sound human, feel expressive, and respond in real time directly from the Twilio CPaaS platform. We at ElevenLabs are excited that Twilio has chosen ElevenLabs to enhance ConversationRelay with the most expressive, human sounding voices available.”

The Bulk Transcription API is part of Scribe, our Speech to Text system designed for large-scale audio and video transcription. It enables developers and enterprises to process hours of recorded content with industry-leading accuracy across 99 languages.
Scribe supports all common formats, including MP4, MOV, MP3, WAV, and more.
Scribe v2 achieves best-in-class accuracy across 99 languages and is robust to challenging audio conditions, accents, and recording quality. It outperforms previous generation models and other leading APIs on public benchmarks.
Processing time depends on file length and concurrency. Scribe is optimized for throughput and can handle large-scale pipelines with high parallelization, delivering transcripts in seconds to minutes.
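For pipelines that submit many files, one common client-side pattern is to cap the number of requests in flight. The helper below is a generic sketch of that pattern, not part of the ElevenLabs SDK; `mapWithConcurrency` and the `transcribeFile` function mentioned in the usage comment are hypothetical names, and a suitable limit depends on your plan's concurrency allowance.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as `items`.
async function mapWithConcurrency(items, limit, worker) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array(items.length);
  let nextIndex = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner() {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await worker(items[index], index);
    }
  }

  // Start up to `limit` runners and wait for all of them to finish.
  const runners = Array.from(
    { length: Math.min(limit, items.length) },
    () => runner()
  );
  await Promise.all(runners);
  return results;
}

// Usage sketch (transcribeFile is a hypothetical wrapper around the
// speechToText.convert call):
//   const transcripts = await mapWithConcurrency(files, 4, transcribeFile);
```

If any single call rejects, the returned promise rejects as well; wrap the worker in a try/catch if failed files should be collected and retried rather than aborting the batch.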
Yes. The API provides smart speaker diarization, word- and character-level timestamps, and dynamic audio tagging for non-speech events like laughter or music.
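Word-level timestamps are what make caption generation possible. As a minimal sketch, the functions below turn a list of timed words into SubRip (SRT) cues. The input shape (`text`, `start`, `end`, with times in seconds) and the 4-second cue length are assumptions chosen for illustration; adapt them to the actual response fields described in the API docs.

```javascript
// Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Group timed words into cues no longer than maxCueSeconds, then render SRT.
// Assumed word shape (hypothetical): { text: string, start: number, end: number }
function wordsToSrt(words, maxCueSeconds = 4) {
  const cues = [];
  let current = null;
  for (const word of words) {
    // Close the current cue if adding this word would make it too long.
    if (current && word.end - current.start > maxCueSeconds) {
      cues.push(current);
      current = null;
    }
    if (!current) {
      current = { start: word.start, end: word.end, text: "" };
    }
    current.text += word.text;
    current.end = word.end;
  }
  if (current) cues.push(current);

  return cues
    .map((cue) => ({ ...cue, text: cue.text.trim() }))
    .filter((cue) => cue.text.length > 0)
    .map(
      (cue, i) =>
        `${i + 1}\n${toSrtTimestamp(cue.start)} --> ${toSrtTimestamp(cue.end)}\n${cue.text}\n`
    )
    .join("\n");
}
```

Splitting purely on duration keeps the sketch short; a production captioner would more likely also break cues at sentence boundaries or speaker changes.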
Yes. You can define custom vocabularies to ensure correct transcription of product names, technical terminology, or unique brand phrases using keyterm prompting.
Scribe supports SOC 2, GDPR, and optional HIPAA compliance. Data is encrypted in transit and at rest, and teams can enable EU data residency or Zero Retention modes for stricter control.
Pricing is usage-based, calculated per minute of input audio. Volume discounts and enterprise plans are available for high-throughput workloads. Contact our sales team to discuss your requirements.
You can start transcribing immediately by generating an API key and exploring the API docs.







