Speech to Text
PocketPaw can transcribe audio files to text using OpenAI’s Whisper API.
Setup
export POCKETCLAW_OPENAI_API_KEY="sk-..."
Configuration
| Setting | Env Variable | Default | Description |
|---|---|---|---|
| Model | POCKETCLAW_STT_MODEL | whisper-1 | Whisper model to use |
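As an illustration of how these settings resolve, here is a minimal Python sketch that reads the two environment variables and falls back to the documented default model. The helper name `load_stt_config` and the returned dictionary shape are hypothetical and not taken from PocketPaw's code. Raising an error when the key is missing is also an assumption.

```python
import os


def load_stt_config(env=None):
    """Resolve STT settings from environment variables.

    Hypothetical helper for illustration only; PocketPaw's own
    loader may differ.
    """
    # Allow a plain dict to be passed in so the function is easy to try out.
    env = os.environ if env is None else env

    api_key = env.get("POCKETCLAW_OPENAI_API_KEY")
    if not api_key:
        # Assumption: a missing key is treated as a hard error.
        raise RuntimeError("POCKETCLAW_OPENAI_API_KEY is not set")

    return {
        "api_key": api_key,
        # Documented default is whisper-1 when the variable is unset.
        "model": env.get("POCKETCLAW_STT_MODEL", "whisper-1"),
    }
```

Calling `load_stt_config({"POCKETCLAW_OPENAI_API_KEY": "sk-..."})` returns a configuration whose `model` is `whisper-1`, because `POCKETCLAW_STT_MODEL` is not set in that mapping.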
Usage
User: Transcribe this audio file: /path/to/recording.mp3
Agent: [uses stt tool] → "Here is the transcription..."
Tool Schema
{
  "name": "stt",
  "description": "Transcribe audio to text using OpenAI Whisper",
  "input_schema": {
    "type": "object",
    "properties": {
      "file_path": {
        "type": "string",
        "description": "Path to the audio file to transcribe"
      },
      "language": {
        "type": "string",
        "description": "Language code (optional, auto-detected)"
      }
    },
    "required": ["file_path"]
  }
}
Supported Formats
Whisper supports: mp3, mp4, mpeg, mpga, m4a, wav, webm.
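The schema and the format list together suggest a simple pre-flight check before any audio is sent for transcription. The Python sketch below validates a tool input against the documented fields (`file_path` required, `language` optional) and checks the file extension against the formats listed above. The function name `validate_stt_input` and the choice to raise `ValueError` are assumptions for illustration. The source does not say whether PocketPaw performs this check itself.

```python
from pathlib import Path

# Formats listed in this section.
SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}


def validate_stt_input(tool_input):
    """Check an stt tool input dict and return its normalized fields.

    Hypothetical helper; not part of PocketPaw's documented API.
    """
    file_path = tool_input.get("file_path")
    if not isinstance(file_path, str) or not file_path:
        raise ValueError("file_path is required and must be a string")

    # language is optional; when omitted, the language is auto-detected.
    language = tool_input.get("language")
    if language is not None and not isinstance(language, str):
        raise ValueError("language must be a string when provided")

    # Compare the extension case-insensitively, without the leading dot.
    extension = Path(file_path).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {extension or '(none)'}")

    return {"file_path": file_path, "language": language}
```

For example, `validate_stt_input({"file_path": "/path/to/recording.mp3"})` passes and returns `language` as `None`, while a path ending in `.txt` raises `ValueError`.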
Policy Group
The stt tool belongs to the group:voice policy group.