Podcast IR (Intermediate Representation)
Podcast notes store a structured JSON document called the IR in the notes.metadata column. The IR defines speakers, dialogue segments with script text, timing controls, and voice settings. irToNoteMd() automatically derives a human-readable version into note_md, which is used for search and embedding.
Minimal viable example
Do not include audio, audioHistory, or audioUnsynced fields when creating a podcast note. These are managed by the TTS system and will be populated automatically when audio is generated.
{
"speakers": [
{
"id": "speaker_1",
"name": "Eva",
"role": "host",
"gender": "female",
"personality": "Professional, warm, Canadian accent"
}
],
"segments": [
{
"id": "seg_ax",
"speakerId": "speaker_1",
"script": "[confident] Welcome to the show. Today we're diving into AI agents.",
"pauseAfterMs": 500
},
{
"id": "seg_bq",
"speakerId": "speaker_1",
"script": "[explaining] A Skill is simply a set of instructions written in plain English... step-by-step workflows the Agent follows.",
"pauseAfterMs": 300
},
{
"id": "seg_cr",
"speakerId": "speaker_1",
"script": "[confident] Flexible automation on one side, fully custom reporting on the other... both from the same data.",
"pauseAfterMs": 0
}
],
"globalSettings": {
"modelId": "eleven_v3",
"outputFormat": "mp3_44100_128",
"language": "en"
}
}
voiceId is optional on speakers. If omitted, the Studio UI will auto-recommend a voice based on gender and personality tags. You can also specify a voiceId from the premade voice catalog below.
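When an IR is copied from an existing note, it may still carry the system-managed fields. The following is a minimal sketch of stripping them before creating a new note; the helper name strip_system_fields is hypothetical, and it assumes the three fields only appear on segments.

```python
import copy
import json

# Field names from the note above; assumed to appear only on segments.
SYSTEM_MANAGED_FIELDS = ("audio", "audioHistory", "audioUnsynced")


def strip_system_fields(ir: dict) -> dict:
    """Return a deep copy of the IR with system-managed segment fields removed."""
    cleaned = copy.deepcopy(ir)
    for segment in cleaned.get("segments", []):
        for field in SYSTEM_MANAGED_FIELDS:
            segment.pop(field, None)
    return cleaned


# Example: an IR copied from an existing note that still has audio attached.
copied_ir = {
    "speakers": [{"id": "speaker_1", "name": "Eva", "role": "host",
                  "gender": "female", "personality": "Professional, warm"}],
    "segments": [{"id": "seg_ax", "speakerId": "speaker_1",
                  "script": "[confident] Welcome to the show.",
                  "pauseAfterMs": 500,
                  "audio": {"placeholder": True}, "audioUnsynced": True}],
    "globalSettings": {"modelId": "eleven_v3",
                       "outputFormat": "mp3_44100_128", "language": "en"},
}

clean_ir = strip_system_fields(copied_ir)
content = json.dumps(clean_ir)  # string suitable for the content parameter
```

The original dictionary is left untouched, so the same object can still be used for local playback bookkeeping.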
Creation paths
| Path | Best for | Cost | How |
|---|---|---|---|
| Chat AI | Generate script from a topic | 1-2 credits | POST /api/chat/stream with activeTool: "podcast_script" |
| CLI | Have a pre-written script | 0 credits | nl.py notes create --type podcast --skip-ai --content '<IR JSON>' |
| GraphQL | Programmatic creation | 0 credits | createGeneralNote(content, noteType: "podcast", skipAi: true) |
| Studio UI | Manual editing in browser | 0 credits | FAB button → Podcast → opens Studio editor |
# Create a podcast note from a local IR JSON file
nl.py notes create --type podcast --skip-ai --content "$(cat my-podcast-ir.json)"
mutation {
createGeneralNote(
content: "{\"speakers\":[...],\"segments\":[...],\"globalSettings\":{...}}"
noteType: "podcast"
skipAi: true
) {
id
noteMd
metadata
createdAt
}
}
The content parameter receives the stringified IR JSON. The backend validates it via parsePodcastIR() (Zod). If validation fails, the note is still created but the IR is stored as plain text in note_md instead of structured metadata.
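Hand-escaping the quotes in the content string is error-prone. As a sketch, the mutation text can be produced by serializing the IR twice with json.dumps: once to get the IR JSON, and once more to get a quoted, escaped string literal (JSON string escapes are also valid in GraphQL string literals). The function name build_create_mutation is hypothetical, and how the mutation is sent is left out because the GraphQL endpoint is not shown here.

```python
import json


def build_create_mutation(ir: dict) -> str:
    """Build the createGeneralNote mutation text with the IR embedded as a string."""
    ir_json = json.dumps(ir, separators=(",", ":"))
    # Second dumps call wraps the JSON in quotes and escapes inner quotes.
    content_literal = json.dumps(ir_json)
    return (
        "mutation {\n"
        f'  createGeneralNote(content: {content_literal}, '
        'noteType: "podcast", skipAi: true) {\n'
        "    id\n"
        "    noteMd\n"
        "    metadata\n"
        "    createdAt\n"
        "  }\n"
        "}"
    )


example_ir = {"speakers": [], "segments": [], "globalSettings": {}}
mutation_text = build_create_mutation(example_ir)
```

An empty IR like example_ir only demonstrates the escaping; a real request needs the required fields described in the schema reference.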
IR schema reference
The full IR (PodcastIROutputSchema) has three top-level required fields: speakers, segments, and globalSettings.
speakers[]
| Name | Type | Required | Description |
|---|---|---|---|
| id | string | Required | Unique speaker ID. Convention: speaker_1, speaker_2, etc. |
| name | string | Required | Display name shown in Studio UI and derived note_md. |
| role | "host" \| "cohost" | Required | Speaker role. |
| gender | "male" \| "female" | Required | Used for voice matching and catalog filtering. |
| personality | string | Required | Free-text personality description. Used for auto voice recommendation via tag matching. |
| voiceId | string | Optional | ElevenLabs voice ID. If omitted, Studio auto-recommends based on gender + personality. |
| defaultVoiceSettings | object | Optional | { stability, similarity, style, speakerBoost }; all are numbers in 0-1 except speakerBoost (boolean). |
segments[]
| Name | Type | Required | Description |
|---|---|---|---|
| id | string | Required | Stable segment ID. Format: "seg_" + 2 random lowercase letters (e.g. seg_ax, seg_bq). Must be unique. |
| speakerId | string | Required | Must reference a speakers[].id. |
| script | string | Required | Dialogue text. Supports [tag] prefixes for voice direction: [confident], [whispering], [excited], [pause], [laughs], etc. Use "..." for natural pauses with eleven_v3. |
| pauseAfterMs | number | Required | Silence in ms after this segment. 200-500 for natural pacing, 500-800 for topic transitions. |
| pauseBeforeMs | number | Optional | Silence in ms before this segment. Min 0. |
| trimStartMs | number | Optional | Trim N ms from the start of generated audio. Min 0. |
| trimEndMs | number | Optional | Trim N ms from the end of generated audio. Min 0. |
audio, audioHistory, and audioUnsynced fields exist on segments but are system-managed. Do not include them when creating or updating IR.
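Because segment IDs use only two random letters, collisions become likely as a script grows. A small sketch of generating an ID in the documented format while avoiding IDs already in use follows; the helper name new_segment_id is hypothetical.

```python
import random
import string
from typing import Iterable, Optional


def new_segment_id(existing_ids: Iterable[str],
                   rng: Optional[random.Random] = None) -> str:
    """Return an unused ID of the form 'seg_' + 2 lowercase letters."""
    rng = rng or random.Random()
    taken = set(existing_ids)
    letters = string.ascii_lowercase
    # Only 26 * 26 = 676 IDs exist, so enumerating the free ones is cheap.
    free = [f"seg_{a}{b}" for a in letters for b in letters
            if f"seg_{a}{b}" not in taken]
    if not free:
        raise ValueError("all 676 two-letter segment IDs are already in use")
    return rng.choice(free)


segment_id = new_segment_id({"seg_ax", "seg_bq", "seg_cr"})
```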
globalSettings
| Name | Type | Required | Description |
|---|---|---|---|
| modelId | string | Required | ElevenLabs model ID. Use "eleven_v3". |
| outputFormat | string | Required | Audio output format. Use "mp3_44100_128". |
| language | string | Required | ISO 639-1 language code. "en" for English, "zh" for Chinese. |
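Since a failed validation silently stores the IR as plain text, it can help to check an IR locally before submitting it. The sketch below covers only the rules listed in the tables above; check_ir is a hypothetical helper, not the backend's parsePodcastIR(), which may enforce more.

```python
from numbers import Number

SPEAKER_REQUIRED = ("id", "name", "role", "gender", "personality")
SEGMENT_REQUIRED = ("id", "speakerId", "script", "pauseAfterMs")
SETTINGS_REQUIRED = ("modelId", "outputFormat", "language")
OPTIONAL_MS_FIELDS = ("pauseBeforeMs", "trimStartMs", "trimEndMs")


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, Number) and not isinstance(value, bool)


def check_ir(ir: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    errors = [f"missing top-level field: {key}"
              for key in ("speakers", "segments", "globalSettings")
              if key not in ir]
    if errors:
        return errors

    speaker_ids = set()
    for i, speaker in enumerate(ir["speakers"]):
        errors += [f"speakers[{i}] missing {f}"
                   for f in SPEAKER_REQUIRED if f not in speaker]
        if "role" in speaker and speaker["role"] not in ("host", "cohost"):
            errors.append(f"speakers[{i}] has invalid role")
        if "gender" in speaker and speaker["gender"] not in ("male", "female"):
            errors.append(f"speakers[{i}] has invalid gender")
        if speaker.get("id") in speaker_ids:
            errors.append(f"speakers[{i}] has duplicate id")
        speaker_ids.add(speaker.get("id"))

    segment_ids = set()
    for i, segment in enumerate(ir["segments"]):
        errors += [f"segments[{i}] missing {f}"
                   for f in SEGMENT_REQUIRED if f not in segment]
        if "speakerId" in segment and segment["speakerId"] not in speaker_ids:
            errors.append(f"segments[{i}] references unknown speaker")
        if segment.get("id") in segment_ids:
            errors.append(f"segments[{i}] has duplicate id")
        segment_ids.add(segment.get("id"))
        if "pauseAfterMs" in segment and not _is_number(segment["pauseAfterMs"]):
            errors.append(f"segments[{i}].pauseAfterMs must be a number")
        for field in OPTIONAL_MS_FIELDS:
            if field in segment:
                value = segment[field]
                if not _is_number(value) or value < 0:
                    errors.append(f"segments[{i}].{field} must be a number >= 0")

    errors += [f"globalSettings missing {f}"
               for f in SETTINGS_REQUIRED if f not in ir["globalSettings"]]
    return errors
```

An empty result only means the IR passes these local checks; the backend remains the source of truth.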
Premade voice catalog
Twelve premade ElevenLabs voices are available without connecting a custom ElevenLabs API key. Set voiceId on a speaker to use one. The Studio UI has voice previews for auditioning.
Female voices
| Name | Voice ID | Description | Tags |
|---|---|---|---|
| Rachel | 21m00Tcm4TlvDq8ikWAM | Calm narration, analytical clarity and warmth | calm, analytical, warm |
| Sarah | EXAVITQu4vr4xnSDxMaL | Soft news voice, warm and thoughtful | warm, thoughtful |
| Alice | Xb7hH8MSUJpSbSDYk0k2 | Confident news voice, excited and passionate | excited, passionate |
| Matilda | XrExE9yKIg1WjnnlVkGX | Warm audiobook voice, reflective and comforting | warm, reflective |
| Lily | pFZP5JQG7iQjIQuC4Bku | Raspy narration voice, tense and serious | tense, serious |
| Emily | LcfcDJNUP1GQjkzn1xUU | Calm meditation voice, reflective and soothing | calm, reflective |
Male voices
| Name | Voice ID | Description | Tags |
|---|---|---|---|
| Brian | nPczCjzI2devNBz1zQrb | Deep narration, calm, analytical and thoughtful | calm, analytical, thoughtful |
| Daniel | onwK4e9ZLuTAKqWW03F9 | Deep British news voice, serious and thoughtful | serious, thoughtful |
| Chris | iP95p4xoKVk53GoZ742B | Casual conversation, curious, playful and casual | curious, playful, casual |
| Charlie | IKne3meq5aSn9XLyUdCD | Casual Australian voice, playful and casual | playful, casual |
| Bill | pqHfZKP75CvOlQylNhV4 | Strong documentary voice, passionate and serious | passionate, serious |
| Josh | TxGEqnHWrfWFTfGW9XjX | Deep young voice, excited and passionate | excited, passionate |
To use custom ElevenLabs voices, connect your API key via connectIntegration(provider: "elevenlabs", apiKey: "...") then query listElevenLabsVoices for available voice IDs.
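To pick a premade voice programmatically, one option is to match the catalog tags against a speaker's personality text. The sketch below is only one plausible reading of "tag matching"; the Studio's actual recommendation logic is not described here, and recommend_voice and its tie-breaking (first catalog entry wins) are assumptions.

```python
import re

# (name, voice ID, gender, tags) copied from the premade catalog tables above.
PREMADE_VOICES = [
    ("Rachel", "21m00Tcm4TlvDq8ikWAM", "female", {"calm", "analytical", "warm"}),
    ("Sarah", "EXAVITQu4vr4xnSDxMaL", "female", {"warm", "thoughtful"}),
    ("Alice", "Xb7hH8MSUJpSbSDYk0k2", "female", {"excited", "passionate"}),
    ("Matilda", "XrExE9yKIg1WjnnlVkGX", "female", {"warm", "reflective"}),
    ("Lily", "pFZP5JQG7iQjIQuC4Bku", "female", {"tense", "serious"}),
    ("Emily", "LcfcDJNUP1GQjkzn1xUU", "female", {"calm", "reflective"}),
    ("Brian", "nPczCjzI2devNBz1zQrb", "male", {"calm", "analytical", "thoughtful"}),
    ("Daniel", "onwK4e9ZLuTAKqWW03F9", "male", {"serious", "thoughtful"}),
    ("Chris", "iP95p4xoKVk53GoZ742B", "male", {"curious", "playful", "casual"}),
    ("Charlie", "IKne3meq5aSn9XLyUdCD", "male", {"playful", "casual"}),
    ("Bill", "pqHfZKP75CvOlQylNhV4", "male", {"passionate", "serious"}),
    ("Josh", "TxGEqnHWrfWFTfGW9XjX", "male", {"excited", "passionate"}),
]


def recommend_voice(gender: str, personality: str):
    """Return (name, voice_id) of the best tag match, or None if no tag matches."""
    words = set(re.findall(r"[a-z]+", personality.lower()))
    best, best_score = None, 0
    for name, voice_id, voice_gender, tags in PREMADE_VOICES:
        if voice_gender != gender:
            continue
        score = len(tags & words)  # number of tags present in the description
        if score > best_score:     # strict '>' keeps the earliest entry on ties
            best, best_score = (name, voice_id), score
    return best


choice = recommend_voice("female", "Professional, warm, Canadian accent")
```

For the example speaker Eva, three female voices share the "warm" tag, so the result depends entirely on the assumed tie-breaking rule.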
Generating audio (TTS)
Generate audio for a single segment. Requires the podcast scope. The credit cost is proportional to the estimated segment duration (roughly 1 credit per 10 seconds of audio).
mutation {
generateSegmentAudio(noteId: "your-note-uuid", segmentId: "seg_ax") {
segmentId
audioUrl
durationMs
requestId
}
}
Call once per segment. Re-calling replaces existing audio (old audio is pushed to audioHistory). Play audio via GET /api/audio/:noteId/:segmentId.
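For budgeting, the "1 credit per 10 seconds" rule can be turned into a rough pre-flight estimate. Everything beyond that ratio is an assumption in this sketch: the 150 words-per-minute speaking rate, rounding up to whole credits, the one-credit minimum, and ignoring [tag] markers. The service's own duration estimate may differ.

```python
import re

TAG_PATTERN = re.compile(r"\[[^\]]*\]")  # voice-direction tags such as [confident]


def estimate_duration_ms(script: str, words_per_minute: int = 150) -> int:
    """Rough spoken duration of a script in ms (assumed speaking rate)."""
    spoken = TAG_PATTERN.sub(" ", script)
    # Count tokens that contain at least one letter or digit; skip bare "...".
    words = [t for t in spoken.split() if any(c.isalnum() for c in t)]
    return len(words) * 60_000 // words_per_minute


def estimate_credits(script: str, words_per_minute: int = 150) -> int:
    """Apply ~1 credit per 10 s, rounded up, with an assumed minimum of 1."""
    duration_ms = estimate_duration_ms(script, words_per_minute)
    return max(1, -(-duration_ms // 10_000))  # integer ceiling division


opening = "[confident] Welcome to the show. Today we're diving into AI agents."
opening_ms = estimate_duration_ms(opening)
opening_credits = estimate_credits(opening)
```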
AI script editing
Edit the podcast script using natural language instructions. Costs 1 credit per call. Requires the podcast scope.
POST /api/podcast/edit
{
"noteId": "your-note-uuid",
"prompt": "Make the tone more conversational"
}
POST /api/podcast/edit
{
"noteId": "your-note-uuid",
"prompt": "Shorten this to under 20 words",
"segmentId": "seg_ax"
}
| Name | Type | Required | Description |
|---|---|---|---|
| noteId | string | Required | Note ID. |
| prompt | string | Required | Edit instruction in natural language. |
| segmentId | string | Optional | If provided, scopes the edit to that segment only. |
{
"changes": [
{ "segmentId": "seg_ax", "script": "Updated script text..." },
{ "segmentId": "seg_bq", "pauseAfterMs": 400 }
],
"summary": "Made the opening more conversational and shortened segment 2."
}
Changes are not auto-applied. Merge the changes into your IR and call updateNote(noteId, metadata: updatedIR) to persist.
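The merge step can be sketched as follows, assuming each entry in changes carries a segmentId plus only the segment fields to overwrite, as in the response above. The helper name apply_changes is hypothetical.

```python
import copy
import json


def apply_changes(ir: dict, changes: list) -> dict:
    """Return a copy of the IR with each change merged into its segment."""
    updated = copy.deepcopy(ir)
    segments_by_id = {segment["id"]: segment for segment in updated["segments"]}
    for change in changes:
        segment = segments_by_id.get(change["segmentId"])
        if segment is None:
            raise KeyError(f"unknown segment: {change['segmentId']}")
        for key, value in change.items():
            if key != "segmentId":  # segmentId identifies the target only
                segment[key] = value
    return updated


current_ir = {
    "speakers": [],
    "segments": [
        {"id": "seg_ax", "speakerId": "speaker_1",
         "script": "Old text.", "pauseAfterMs": 500},
        {"id": "seg_bq", "speakerId": "speaker_1",
         "script": "Second.", "pauseAfterMs": 300},
    ],
    "globalSettings": {},
}
edit_response = {
    "changes": [
        {"segmentId": "seg_ax", "script": "Updated script text..."},
        {"segmentId": "seg_bq", "pauseAfterMs": 400},
    ],
    "summary": "Made the opening more conversational and shortened segment 2.",
}

updated_ir = apply_changes(current_ir, edit_response["changes"])
metadata = json.dumps(updated_ir)  # string to pass as updateNote's metadata
```

Working on a copy keeps the previous IR available in case the edit should be discarded instead of persisted.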
Speech-to-speech (STS)
Upload your own recorded audio and run it through ElevenLabs speech-to-speech to match the assigned voice's style. Requires the podcast scope. The speaker must have a voiceId assigned.
curl -X POST https://narrativelion.com/api/podcast/sts \
-H "Authorization: Bearer nlk_your_key" \
-F "noteId=your-note-uuid" \
-F "segmentId=seg_ax" \
-F "audio=@my-recording.mp3"
| Name | Type | Required | Description |
|---|---|---|---|
| noteId | string | Required | Note ID (form field). |
| segmentId | string | Required | Segment ID (form field). |
| audio | File | Required | Audio file, max 25 MB. |
{
"segmentId": "seg_ax",
"audioUrl": "audio/note-uuid/seg_ax/1700000000000",
"durationMs": 4200
}
Updating a podcast note
Update the IR via updateNote with the metadata parameter. The backend re-validates the IR and auto-derives note_md from the updated content.
mutation {
updateNote(
noteId: "your-note-uuid"
metadata: "{\"speakers\":[...],\"segments\":[...],\"globalSettings\":{...}}"
) {
id
updatedAt
}
}
You do not need to set noteMd separately; it is re-derived automatically from the IR on every updateNote(metadata: ...) call.
Exporting audio
Export all generated audio segments as a single merged MP3 or a ZIP archive of individual per-segment MP3 files. Requires the podcast scope.
mutation {
exportPodcast(noteId: "your-note-uuid", format: "mp3", paddingMs: 200) {
url
durationMs
}
}
| Name | Type | Required | Description |
|---|---|---|---|
| noteId | String! | Required | Note ID. |
| format | String | Optional | "mp3" (default): single merged file with silence gaps. "zip": one MP3 per segment, no gaps. |
| paddingMs | Int | Optional | Extra silence (ms) added after each segment. Only applies to mp3 format. 0-10000, default 0. |
Response
{
"url": "/api/audio/<noteId>/export", // download URL (signed, returns MP3 or ZIP)
"durationMs": 48200 // total audio duration
}
Download the exported file via GET /api/audio/:noteId/export with an Authorization header. The response includes a Content-Disposition header for the appropriate filename (podcast-export.mp3 or podcast-export.zip).
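A download can be sketched with the Python standard library. The base URL and the "Bearer nlk_..." key format are borrowed from the speech-to-speech curl example and are assumed to apply to this endpoint as well; build_export_request and download_export are hypothetical helper names.

```python
import urllib.parse
import urllib.request


def build_export_request(note_id: str, api_key: str,
                         base_url: str = "https://narrativelion.com"):
    """Build the GET request for the export file (assumed Bearer auth)."""
    safe_id = urllib.parse.quote(note_id, safe="")
    url = f"{base_url}/api/audio/{safe_id}/export"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}, method="GET"
    )


def download_export(note_id: str, api_key: str, dest_path: str) -> None:
    """Fetch the exported MP3 or ZIP and write it to dest_path."""
    request = build_export_request(note_id, api_key)
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        out.write(response.read())


# Building the request does not touch the network; download_export does.
request = build_export_request("your-note-uuid", "nlk_your_key")
```

Reading the whole body into memory is fine for short episodes; a long export would be better streamed in chunks.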
Format comparison
| Feature | mp3 | zip |
|---|---|---|
| Output | Single merged MP3 file | ZIP with one MP3 per segment |
| Silence gaps | pauseBeforeMs + pauseAfterMs + paddingMs applied | No gaps (raw per-segment audio) |
| Trim | trimStartMs / trimEndMs applied | trimStartMs / trimEndMs applied |
| File naming | podcast-export.mp3 | 01-SpeakerName.mp3, 02-SpeakerName.mp3, ... |
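The table implies a simple way to predict the length of a merged mp3 export. The sketch below assumes the gaps add linearly, that paddingMs follows every segment including the last, and that per-segment audio durations (for example the durationMs returned by generateSegmentAudio) are supplied by the caller. Whether the durationMs reported by exportPodcast is computed exactly this way is not confirmed here.

```python
def merged_duration_ms(segments: list, audio_durations_ms: dict,
                       padding_ms: int = 0) -> int:
    """Estimate total length of the merged mp3 export in milliseconds."""
    total = 0
    for segment in segments:
        audio_ms = audio_durations_ms[segment["id"]]
        # Trims shorten the generated audio; never let it go below zero.
        trimmed_ms = max(
            0,
            audio_ms - segment.get("trimStartMs", 0) - segment.get("trimEndMs", 0),
        )
        total += (segment.get("pauseBeforeMs", 0)
                  + trimmed_ms
                  + segment.get("pauseAfterMs", 0)
                  + padding_ms)
    return total


example_segments = [
    {"id": "seg_ax", "pauseAfterMs": 500},
    {"id": "seg_bq", "pauseAfterMs": 300, "trimStartMs": 100},
]
example_durations = {"seg_ax": 4200, "seg_bq": 6000}

# (4200 + 500 + 200) + (5900 + 300 + 200) = 11300
estimate_ms = merged_duration_ms(example_segments, example_durations, padding_ms=200)
```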
Speed adjustment is a client-side feature (Studio UI only). It uses a Web Worker with SoundTouch time-stretch and is not available via the API. Exported audio is always at 1x speed.
ElevenLabs integration
The 12 premade voices work without any setup. To use your own custom ElevenLabs voices, connect your API key:
mutation {
connectIntegration(provider: "elevenlabs", apiKey: "your-eleven-labs-key") {
provider
connectedAt
}
}
query {
listElevenLabsVoices {
voiceId
name
category
gender
description
previewUrl
}
}
query {
myIntegration(provider: "elevenlabs") {
provider
connectedAt
}
}
Disconnect with disconnectIntegration(provider: "elevenlabs").