Speech to Text

PocketPaw can transcribe audio files to text using OpenAI’s Whisper API.

Setup

export POCKETCLAW_OPENAI_API_KEY="sk-..."

Configuration

| Setting | Env Variable | Default | Description |
| --- | --- | --- | --- |
| Model | POCKETCLAW_STT_MODEL | whisper-1 | Whisper model to use |
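
As an illustration of how the two environment variables fit together, here is a minimal Python sketch that reads them and applies the documented default. The helper name `load_stt_settings` and the returned dictionary shape are assumptions made for this example; they are not part of PocketPaw's code.

```python
import os
from typing import Dict, Mapping


def load_stt_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Hypothetical helper: collect the speech-to-text settings from the environment."""
    api_key = env.get("POCKETCLAW_OPENAI_API_KEY", "")
    if not api_key:
        # Without a key the Whisper API cannot be called, so fail early.
        raise RuntimeError("POCKETCLAW_OPENAI_API_KEY is not set")
    return {
        "api_key": api_key,
        # Falls back to the documented default when the variable is unset.
        "model": env.get("POCKETCLAW_STT_MODEL", "whisper-1"),
    }
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.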

Usage

User: Transcribe this audio file: /path/to/recording.mp3
Agent: [uses stt tool] → "Here is the transcription..."
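
To make the exchange above more concrete, the sketch below shows roughly what a transcription call against the Whisper API can look like with the official `openai` Python SDK (v1 or later). It is not PocketPaw's actual implementation: the function names `build_request_params` and `transcribe` are hypothetical, and the only assumption about the tool is that it forwards the file, the model, and an optional language code.

```python
from typing import Optional


def build_request_params(model: str, language: Optional[str] = None) -> dict:
    """Hypothetical helper: keyword arguments for the transcription request."""
    params = {"model": model}
    if language:
        # Omit the language so the API auto-detects it when none is given.
        params["language"] = language
    return params


def transcribe(file_path: str, api_key: str, model: str = "whisper-1",
               language: Optional[str] = None) -> str:
    """Send one audio file to the transcription endpoint and return its text."""
    # Imported lazily so this module loads even when the SDK is not installed.
    from openai import OpenAI

    client = OpenAI(api_key=api_key)
    with open(file_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            file=audio_file, **build_request_params(model, language)
        )
    return result.text
```

Calling `transcribe("/path/to/recording.mp3", api_key)` requires network access and a valid key; `build_request_params` can be checked offline.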

Tool Schema

{
  "name": "stt",
  "description": "Transcribe audio to text using OpenAI Whisper",
  "input_schema": {
    "type": "object",
    "properties": {
      "file_path": {
        "type": "string",
        "description": "Path to the audio file to transcribe"
      },
      "language": {
        "type": "string",
        "description": "Language code (optional, auto-detected)"
      }
    },
    "required": ["file_path"]
  }
}
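
The schema requires `file_path` and treats `language` as optional, with both typed as strings. A small Python sketch of checking a tool input against those rules follows. The validator `validate_stt_input` is a hypothetical helper written for this example; it covers only the required-field and string-type checks, not full JSON Schema validation.

```python
from typing import Any, Dict, List

# Copy of the "input_schema" object from the tool definition above.
STT_INPUT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "file_path": {"type": "string"},
        "language": {"type": "string"},
    },
    "required": ["file_path"],
}


def validate_stt_input(payload: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return a list of problems (empty when the input is valid)."""
    errors = []
    for name in STT_INPUT_SCHEMA["required"]:
        if name not in payload:
            errors.append(f"missing required field: {name}")
    for name, spec in STT_INPUT_SCHEMA["properties"].items():
        if name in payload and spec["type"] == "string" and not isinstance(payload[name], str):
            errors.append(f"field {name} must be a string")
    return errors
```

For example, `validate_stt_input({"file_path": "/path/to/recording.mp3"})` returns an empty list, while an input without `file_path` reports the missing field.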

Supported Formats

Whisper supports: mp3, mp4, mpeg, mpga, m4a, wav, webm.
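
A caller may want to reject unsupported files before uploading them. The following Python sketch checks a path's extension against the list above. The function `is_supported_audio` is a hypothetical helper, and it assumes the file extension reliably reflects the actual audio format.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above.
SUPPORTED_EXTENSIONS = frozenset({"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"})


def is_supported_audio(file_path: str) -> bool:
    """Hypothetical helper: True when the extension is in the supported list."""
    # Compare case-insensitively and without the leading dot.
    extension = Path(file_path).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS
```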

Policy Group

This tool belongs to the group:voice policy group.