Promptly Done

Explorations in Voice AI.

Text-to-Speech Architectures and Their Viability for New Language and Dialect Adaptation

A look at how different TTS architectures handle new languages and dialects, why your codec choice matters more than you think, and what it takes to make a model speak a language it's never heard

Cutting LLM Latency in Half with Voice Agents

How to cut your LLM's time to first token (TTFT) by 60%

Building a Low Latency (Under 1000 ms) Arabic Voice Agent with RAG Using Ultravox & LiveKit

How to build ultra-low-latency Arabic voice agents using end-to-end speech models like Ultravox, with tool use and RAG capabilities

Experimenting with Speech Augmentations: Enhancing Speech-to-text Model Robustness

Explore speech augmentation experiments to boost ASR model robustness. Learn key techniques through practical examples and see their impact on WER/CER

Curating Custom Datasets for Arabic Speech-to-text Models: A Case Study on What Not to Do

Lessons learned curating diverse Egyptian Arabic speech datasets for training high-quality ASR models