Transcription

Persyk uses speech-to-text AI to transcribe your voice recordings into text. After recording a voice memo, the audio is sent to a transcription service and the resulting text is copied to your clipboard.

Any OpenAI-compatible transcription endpoint can be used, giving you flexibility to choose between cloud services or self-hosted solutions based on your privacy, latency, and cost requirements.

Configure transcription in your settings by specifying a provider and model:

{
  "providers": {
    "speaches": {
      "type": "openai-compatible",
      "baseUrl": "http://localhost:8000/v1",
      "apiKey": "sk-..."
    }
  },
  "transcription": {
    "enabled": true,
    "provider": "speaches",
    "model": "Systran/faster-distil-whisper-large-v3"
  }
}
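
Under the hood, this is a request to the endpoint's OpenAI-compatible /audio/transcriptions route. As a rough illustration (not Persyk's actual code), here is a minimal sketch of the equivalent call with the openai Python SDK, reusing the baseUrl, apiKey, and model values from the config above; the audio file name is a placeholder:

# A minimal sketch of the request the config above describes.
# The file name is a placeholder; baseUrl, apiKey, and model come
# from the example config.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # providers.speaches.baseUrl
    api_key="sk-...",                     # providers.speaches.apiKey
)

with open("voice-memo.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="Systran/faster-distil-whisper-large-v3",  # transcription.model
        file=audio_file,
    )

print(result.text)  # the text Persyk copies to your clipboard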

Speaches

Speaches is another project of mine: an OpenAI-compatible inference server with support for transcription, translation, text-to-speech, voice activity detection (VAD), speaker embedding, the Realtime API, and more. See the GitHub repository for setup instructions.

Speaches is best for users who want full control over their data and are willing to run a local server. With the right hardware, it can deliver significantly lower latency than cloud alternatives.

Example configuration:

{
  "providers": {
    "speaches": {
      "type": "openai-compatible",
      "baseUrl": "http://localhost:8000/v1",
      "apiKey": null
    }
  },
  "transcription": {
    "enabled": true,
    "provider": "speaches",
    "model": "Systran/faster-whisper-small.en"
  }
}
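
Before pointing Persyk at a local instance, you can confirm the server is reachable and that the configured model is available. A quick sketch with the openai Python SDK, assuming Speaches is running on localhost:8000 as in the config above and exposes the standard /models listing route:

# A quick connectivity check, assuming Speaches is running at
# http://localhost:8000 as configured above. A local instance needs
# no real API key, but the SDK requires a non-empty value.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="ignored")

# List the models the server exposes and check that the one set under
# transcription.model is among them.
available = [m.id for m in client.models.list()]
print("Systran/faster-whisper-small.en" in available)
print(available)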

OpenAI

Use OpenAI’s API directly. This is the quickest way to get started: just add your API key and you’re ready to go.

Example configuration:

{
  "providers": {
    "openai": {
      "type": "openai-compatible",
      "baseUrl": "https://api.openai.com/v1",
      "apiKey": "sk-..."
    }
  },
  "transcription": {
    "enabled": true,
    "provider": "openai",
    "model": "gpt-4o-mini-transcribe" // or "whisper-1"
  }
}
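
For reference, calling OpenAI's transcription API directly takes only a few lines. A hedged sketch with the openai Python SDK (the file name and API key are placeholders); note that whisper-1 also supports subtitle output formats such as srt and vtt, while the gpt-4o transcription models return JSON or plain text:

# A sketch of calling OpenAI's transcription API directly; the file
# name and API key are placeholders, not values Persyk manages for you.
from openai import OpenAI

client = OpenAI(api_key="sk-...")  # or rely on the OPENAI_API_KEY env var

with open("voice-memo.wav", "rb") as audio_file:
    text = client.audio.transcriptions.create(
        model="gpt-4o-mini-transcribe",
        file=audio_file,
        response_format="text",  # return plain text instead of a JSON object
    )

print(text)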