Speech to Text

PocketPaw can transcribe audio files to text using OpenAI’s Whisper API.

Setup

Set your OpenAI API key as an environment variable:

export POCKETCLAW_OPENAI_API_KEY="sk-..."

Configuration

| Setting | Env Variable | Default | Description |
| --- | --- | --- | --- |
| Model | `POCKETCLAW_STT_MODEL` | `whisper-1` | Whisper model to use |
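
As a rough illustration of how these settings resolve, the sketch below reads both environment variables and falls back to the documented default model. The helper name `load_stt_settings` is hypothetical and is not part of PocketPaw; only the variable names and the `whisper-1` default come from this page.

```python
import os


def load_stt_settings(env=None):
    """Collect the STT settings from environment variables.

    Hypothetical helper for illustration only; PocketPaw's own
    configuration loader may work differently.
    """
    env = os.environ if env is None else env
    return {
        # No default: the key has to be supplied by the user.
        "api_key": env.get("POCKETCLAW_OPENAI_API_KEY"),
        # Falls back to the default documented in the table above.
        "model": env.get("POCKETCLAW_STT_MODEL", "whisper-1"),
    }


if __name__ == "__main__":
    settings = load_stt_settings()
    if settings["api_key"] is None:
        print("POCKETCLAW_OPENAI_API_KEY is not set")
    print("model:", settings["model"])
```

Passing a plain dictionary as `env` makes it easy to check the fallback behavior without touching the real environment.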

Usage

User: Transcribe this audio file: /path/to/recording.mp3
Agent: [uses stt tool] → "Here is the transcription..."

Tool Schema

{
  "name": "stt",
  "description": "Transcribe audio to text using OpenAI Whisper",
  "input_schema": {
    "type": "object",
    "properties": {
      "file_path": {
        "type": "string",
        "description": "Path to the audio file to transcribe"
      },
      "language": {
        "type": "string",
        "description": "Language code (optional, auto-detected)"
      }
    },
    "required": ["file_path"]
  }
}
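
To make the schema concrete, here is a minimal sketch of checking a tool call's arguments against it: `file_path` is required, and both fields must be strings. The function `validate_stt_input` is a hypothetical name used for illustration, not PocketPaw's actual validation code, which is not shown on this page.

```python
# Copy of the input schema documented above.
STT_INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "file_path": {"type": "string"},
        "language": {"type": "string"},
    },
    "required": ["file_path"],
}


def validate_stt_input(args):
    """Return a list of problems found in args (empty if valid).

    Hypothetical helper: checks only required fields and string types,
    which is all the schema above specifies.
    """
    errors = []
    for name in STT_INPUT_SCHEMA["required"]:
        if name not in args:
            errors.append(f"missing required field: {name}")
    for name, value in args.items():
        # Fields not listed in the schema are left alone, since the
        # schema does not forbid additional properties.
        if name in STT_INPUT_SCHEMA["properties"] and not isinstance(value, str):
            errors.append(f"field {name} must be a string")
    return errors


if __name__ == "__main__":
    print(validate_stt_input({"file_path": "/path/to/recording.mp3"}))
    print(validate_stt_input({"language": "en"}))
```

An input with only `language` is rejected, while an input with only `file_path` passes, matching the `required` list.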

Supported Formats

Whisper supports: mp3, mp4, mpeg, mpga, m4a, wav, webm.
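
If you want to screen files before asking the agent to transcribe them, a check along these lines may help. It is a sketch that assumes the format can be judged by file extension alone; `is_supported_audio` is a hypothetical helper, and the extension set simply mirrors the list above.

```python
from pathlib import Path

# Extensions taken from the list of supported formats above.
SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}


def is_supported_audio(file_path):
    """Return True if the file extension is in the supported list.

    Hypothetical helper: looks only at the extension (case-insensitive),
    not at the actual file contents.
    """
    suffix = Path(file_path).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_FORMATS


if __name__ == "__main__":
    for candidate in ["/path/to/recording.mp3", "notes.txt"]:
        print(candidate, is_supported_audio(candidate))
```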

Policy Group

This tool belongs to the `group:voice` policy group.

Last updated: February 12, 2026
