Podcast IR (Intermediate Representation)

Podcast notes store a structured JSON document called the IR in the notes.metadata column. The IR defines speakers, dialogue segments with script text, timing controls, and voice settings. A human-readable version is automatically derived into note_md for search and embedding via irToNoteMd().

Minimal viable example

Do not include audio, audioHistory, or audioUnsynced fields when creating a podcast note. These are managed by the TTS system and will be populated automatically when audio is generated.

Minimal IR JSON
{
  "speakers": [
    {
      "id": "speaker_1",
      "name": "Eva",
      "role": "host",
      "gender": "female",
      "personality": "Professional, warm, Canadian accent"
    }
  ],
  "segments": [
    {
      "id": "seg_ax",
      "speakerId": "speaker_1",
      "script": "[confident] Welcome to the show. Today we're diving into AI agents.",
      "pauseAfterMs": 500
    },
    {
      "id": "seg_bq",
      "speakerId": "speaker_1",
      "script": "[explaining] A Skill is simply a set of instructions written in plain English... step-by-step workflows the Agent follows.",
      "pauseAfterMs": 300
    },
    {
      "id": "seg_cr",
      "speakerId": "speaker_1",
      "script": "[confident] Flexible automation on one side, fully custom reporting on the other... both from the same data.",
      "pauseAfterMs": 0
    }
  ],
  "globalSettings": {
    "modelId": "eleven_v3",
    "outputFormat": "mp3_44100_128",
    "language": "en"
  }
}

voiceId is optional on speakers. If omitted, the Studio UI will auto-recommend a voice based on gender and personality tags. You can also specify a voiceId from the premade voice catalog below.

Creation paths

| Path | Best for | Cost | How |
|---|---|---|---|
| Chat AI | Generate script from a topic | 1-2 credits | POST /api/chat/stream with activeTool: "podcast_script" |
| CLI | Have a pre-written script | 0 credits | nl.py notes create --type podcast --skip-ai --content '<IR JSON>' |
| GraphQL | Programmatic creation | 0 credits | createGeneralNote(content, noteType: "podcast", skipAi: true) |
| Studio UI | Manual editing in browser | 0 credits | FAB button → Podcast → opens Studio editor |
CLI example
# Create a podcast note from a local IR JSON file
nl.py notes create --type podcast --skip-ai --content "$(cat my-podcast-ir.json)"
GraphQL mutation
mutation {
  createGeneralNote(
    content: "{\"speakers\":[...],\"segments\":[...],\"globalSettings\":{...}}"
    noteType: "podcast"
    skipAi: true
  ) {
    id
    noteMd
    metadata
    createdAt
  }
}

The content parameter receives the stringified IR JSON. The backend validates it via parsePodcastIR() (Zod). If validation fails, the note is still created but the IR is stored as plain text in note_md instead of structured metadata.
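Because a note with invalid IR silently falls back to plain-text storage, it can be worth sanity-checking the document before sending it. A minimal pre-flight check in Python (this is not the backend's parsePodcastIR, just the structural rules from the schema reference below):

```python
def validate_ir(ir: dict) -> list[str]:
    # Lightweight client-side checks: required top-level fields,
    # unique segment IDs, speakerId references, required pauseAfterMs.
    errors = []
    for field in ("speakers", "segments", "globalSettings"):
        if field not in ir:
            errors.append(f"missing top-level field: {field}")
    speaker_ids = {s.get("id") for s in ir.get("speakers", [])}
    seen = set()
    for seg in ir.get("segments", []):
        sid = seg.get("id")
        if sid in seen:
            errors.append(f"duplicate segment id: {sid}")
        seen.add(sid)
        if seg.get("speakerId") not in speaker_ids:
            errors.append(f"segment {sid}: unknown speakerId {seg.get('speakerId')}")
        if "pauseAfterMs" not in seg:
            errors.append(f"segment {sid}: missing pauseAfterMs")
    return errors
```

An empty list means the IR passes these basic checks; the backend's Zod schema remains the source of truth.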

IR schema reference

The full IR (PodcastIROutputSchema) has three top-level required fields: speakers, segments, and globalSettings.

speakers[]

| Name | Type | Required | Description |
|---|---|---|---|
| id | string | Required | Unique speaker ID. Convention: speaker_1, speaker_2, etc. |
| name | string | Required | Display name shown in Studio UI and derived note_md. |
| role | "host" \| "cohost" | Required | Speaker role. |
| gender | "male" \| "female" | Required | Used for voice matching and catalog filtering. |
| personality | string | Required | Free-text personality description. Used for auto voice recommendation via tag matching. |
| voiceId | string | Optional | ElevenLabs voice ID. If omitted, Studio auto-recommends based on gender + personality. |
| defaultVoiceSettings | object | Optional | { stability, similarity, style, speakerBoost }; all numbers 0-1 except speakerBoost (boolean). |

segments[]

| Name | Type | Required | Description |
|---|---|---|---|
| id | string | Required | Stable segment ID. Format: "seg_" + 2 random lowercase letters (e.g. seg_ax, seg_bq). Must be unique. |
| speakerId | string | Required | Must reference a speakers[].id. |
| script | string | Required | Dialogue text. Supports [tag] prefixes for voice direction: [confident], [whispering], [excited], [pause], [laughs], etc. Use "..." for natural pauses with eleven_v3. |
| pauseAfterMs | number | Required | Silence in ms after this segment. 200-500 for natural pacing, 500-800 for topic transitions. |
| pauseBeforeMs | number | Optional | Silence in ms before this segment. Min 0. |
| trimStartMs | number | Optional | Trim N ms from the start of the generated audio. Min 0. |
| trimEndMs | number | Optional | Trim N ms from the end of the generated audio. Min 0. |

audio, audioHistory, and audioUnsynced fields exist on segments but are system-managed. Do not include them when creating or updating IR.
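When generating segments programmatically, new IDs need to follow the "seg_" + two random lowercase letters convention and stay unique. A small helper, assuming you track existing IDs yourself (note the namespace only has 26 × 26 = 676 combinations, so it is meant for short scripts, not bulk generation):

```python
import random
import string

def new_segment_id(existing: set[str]) -> str:
    # Generate a "seg_" + 2 random lowercase letters ID,
    # retrying until it does not collide with an existing ID.
    while True:
        sid = "seg_" + "".join(random.choices(string.ascii_lowercase, k=2))
        if sid not in existing:
            existing.add(sid)
            return sid
```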

globalSettings

| Name | Type | Required | Description |
|---|---|---|---|
| modelId | string | Required | ElevenLabs model ID. Use "eleven_v3". |
| outputFormat | string | Required | Audio output format. Use "mp3_44100_128". |
| language | string | Required | ISO 639-1 language code. "en" for English, "zh" for Chinese. |

Premade voice catalog

12 premade ElevenLabs voices are available without connecting a custom ElevenLabs API key. Set voiceId on a speaker to use one. The Studio UI has voice previews for auditioning.

Female voices

| Name | Voice ID | Description | Tags |
|---|---|---|---|
| Rachel | 21m00Tcm4TlvDq8ikWAM | Calm narration, analytical clarity and warmth | calm, analytical, warm |
| Sarah | EXAVITQu4vr4xnSDxMaL | Soft news voice, warm and thoughtful | warm, thoughtful |
| Alice | Xb7hH8MSUJpSbSDYk0k2 | Confident news voice, excited and passionate | excited, passionate |
| Matilda | XrExE9yKIg1WjnnlVkGX | Warm audiobook voice, reflective and comforting | warm, reflective |
| Lily | pFZP5JQG7iQjIQuC4Bku | Raspy narration voice, tense and serious | tense, serious |
| Emily | LcfcDJNUP1GQjkzn1xUU | Calm meditation voice, reflective and soothing | calm, reflective |

Male voices

| Name | Voice ID | Description | Tags |
|---|---|---|---|
| Brian | nPczCjzI2devNBz1zQrb | Deep narration, calm, analytical and thoughtful | calm, analytical, thoughtful |
| Daniel | onwK4e9ZLuTAKqWW03F9 | Deep British news voice, serious and thoughtful | serious, thoughtful |
| Chris | iP95p4xoKVk53GoZ742B | Casual conversation, curious, playful and casual | curious, playful, casual |
| Charlie | IKne3meq5aSn9XLyUdCD | Casual Australian voice, playful and casual | playful, casual |
| Bill | pqHfZKP75CvOlQylNhV4 | Strong documentary voice, passionate and serious | passionate, serious |
| Josh | TxGEqnHWrfWFTfGW9XjX | Deep young voice, excited and passionate | excited, passionate |

To use custom ElevenLabs voices, connect your API key via connectIntegration(provider: "elevenlabs", apiKey: "...") then query listElevenLabsVoices for available voice IDs.

Generating audio (TTS)

Generate audio for a single segment. Requires the podcast scope. Credits are proportional to estimated segment duration (~1 credit per 10s of audio).

GraphQL mutation
mutation {
  generateSegmentAudio(noteId: "your-note-uuid", segmentId: "seg_ax") {
    segmentId
    audioUrl
    durationMs
    requestId
  }
}

Call once per segment. Re-calling replaces existing audio (old audio is pushed to audioHistory). Play audio via GET /api/audio/:noteId/:segmentId.
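Since audio is generated one segment at a time, a client typically loops over segments[]. A sketch in Python, assuming the GraphQL endpoint is served at /api/graphql (that path is an assumption; the mutation shape matches the one above):

```python
import json
import urllib.request

GRAPHQL_URL = "https://narrativelion.com/api/graphql"  # assumed endpoint path

def build_generate_mutation(note_id: str, segment_id: str) -> str:
    # Build the generateSegmentAudio mutation shown above for one segment.
    return (
        'mutation { generateSegmentAudio(noteId: "%s", segmentId: "%s") '
        "{ segmentId audioUrl durationMs requestId } }" % (note_id, segment_id)
    )

def generate_all(note_id: str, ir: dict, api_key: str) -> list[dict]:
    # Call generateSegmentAudio once per segment, in script order.
    results = []
    for seg in ir["segments"]:
        req = urllib.request.Request(
            GRAPHQL_URL,
            data=json.dumps({"query": build_generate_mutation(note_id, seg["id"])}).encode(),
            headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            results.append(json.loads(resp.read())["data"]["generateSegmentAudio"])
    return results
```

Remember that credits scale with segment duration, so re-generating a whole script re-spends the full cost.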

AI script editing

Edit the podcast script using natural language instructions. Costs 1 credit per call. Requires the podcast scope.

Edit full script
POST /api/podcast/edit
{
  "noteId": "your-note-uuid",
  "prompt": "Make the tone more conversational"
}
Edit a single segment
POST /api/podcast/edit
{
  "noteId": "your-note-uuid",
  "prompt": "Shorten this to under 20 words",
  "segmentId": "seg_ax"
}
| Name | Type | Required | Description |
|---|---|---|---|
| noteId | string | Required | Note ID. |
| prompt | string | Required | Edit instruction in natural language. |
| segmentId | string | Optional | If provided, scopes the edit to that segment only. |
Response
{
  "changes": [
    { "segmentId": "seg_ax", "script": "Updated script text..." },
    { "segmentId": "seg_bq", "pauseAfterMs": 400 }
  ],
  "summary": "Made the opening more conversational and shortened segment 2."
}

Changes are not auto-applied. Merge the changes into your IR and call updateNote(noteId, metadata: updatedIR) to persist.
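The merge step is mechanical: each entry in changes names a segmentId plus the fields to overwrite. A small helper that applies a changes array to an IR dict:

```python
def apply_edit_changes(ir: dict, changes: list[dict]) -> dict:
    # Merge /api/podcast/edit response changes into the IR in place.
    # Each change carries a segmentId plus the fields to overwrite
    # (e.g. script, pauseAfterMs); unknown segment IDs are skipped.
    by_id = {seg["id"]: seg for seg in ir["segments"]}
    for change in changes:
        seg = by_id.get(change["segmentId"])
        if seg is None:
            continue
        for key, value in change.items():
            if key != "segmentId":
                seg[key] = value
    return ir
```

After merging, serialize the IR and pass it as the metadata parameter of updateNote.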

Speech-to-speech (STS)

Upload your own recorded audio and run it through ElevenLabs speech-to-speech to match the assigned voice's style. Requires the podcast scope. The speaker must have a voiceId assigned.

Request
curl -X POST https://narrativelion.com/api/podcast/sts \
  -H "Authorization: Bearer nlk_your_key" \
  -F "noteId=your-note-uuid" \
  -F "segmentId=seg_ax" \
  -F "audio=@my-recording.mp3"
| Name | Type | Required | Description |
|---|---|---|---|
| noteId | string | Required | Note ID (form field). |
| segmentId | string | Required | Segment ID (form field). |
| audio | File | Required | Audio file, max 25 MB. |
Response
{
  "segmentId": "seg_ax",
  "audioUrl": "audio/note-uuid/seg_ax/1700000000000",
  "durationMs": 4200
}

Updating a podcast note

Update the IR via updateNote with the metadata parameter. The backend re-validates the IR and auto-derives note_md from the updated content.

GraphQL mutation
mutation {
  updateNote(
    noteId: "your-note-uuid"
    metadata: "{\"speakers\":[...],\"segments\":[...],\"globalSettings\":{...}}"
  ) {
    id
    updatedAt
  }
}

You do not need to set noteMd separately — it is re-derived automatically from the IR on every updateNote(metadata: ...) call.

Exporting audio

Export all generated audio segments as a single merged MP3 or a ZIP archive of individual per-segment MP3 files. Requires the podcast scope.

GraphQL mutation
mutation {
  exportPodcast(noteId: "your-note-uuid", format: "mp3", paddingMs: 200) {
    url
    durationMs
  }
}
| Name | Type | Required | Description |
|---|---|---|---|
| noteId | String! | Required | Note ID. |
| format | String | Optional | "mp3" (default): single merged file with silence gaps. "zip": one MP3 per segment, no gaps. |
| paddingMs | Int | Optional | Extra silence (ms) added after each segment. Only applies to mp3 format. 0-10000, default 0. |

Response

ExportResult
{
  "url": "/api/audio/<noteId>/export",   // download URL (signed, returns MP3 or ZIP)
  "durationMs": 48200                     // total audio duration
}

Download the exported file via GET /api/audio/:noteId/export with an Authorization header. The response includes a Content-Disposition header for the appropriate filename (podcast-export.mp3 or podcast-export.zip).

Format comparison

| Feature | mp3 | zip |
|---|---|---|
| Output | Single merged MP3 file | ZIP with one MP3 per segment |
| Silence gaps | pauseBeforeMs + pauseAfterMs + paddingMs applied | No gaps (raw per-segment audio) |
| Trim | trimStartMs / trimEndMs applied | trimStartMs / trimEndMs applied |
| File naming | podcast-export.mp3 | 01-SpeakerName.mp3, 02-SpeakerName.mp3, ... |
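Given per-segment durationMs values (e.g. from generateSegmentAudio responses), the merged mp3 length can be estimated from the rules in the format comparison: trims shorten each segment, then pauses and padding add silence. Whether the gap is also appended after the final segment is an assumption in this sketch:

```python
def estimate_merged_duration_ms(segments: list[dict], padding_ms: int = 0) -> int:
    # Estimate merged mp3 export length: per segment, subtract trims from
    # the generated audio duration, then add pauseBeforeMs, pauseAfterMs,
    # and the export-level paddingMs (assumed to apply after every segment).
    total = 0
    for seg in segments:
        audio_ms = seg["durationMs"] - seg.get("trimStartMs", 0) - seg.get("trimEndMs", 0)
        total += (
            seg.get("pauseBeforeMs", 0)
            + max(audio_ms, 0)
            + seg.get("pauseAfterMs", 0)
            + padding_ms
        )
    return total
```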

Speed adjustment is a client-side feature (Studio UI only). It uses a Web Worker with SoundTouch time-stretch and is not available via the API. Exported audio is always at 1x speed.

ElevenLabs integration

The 12 premade voices work without any setup. To use your own custom ElevenLabs voices, connect your API key:

Connect your ElevenLabs key
mutation {
  connectIntegration(provider: "elevenlabs", apiKey: "your-eleven-labs-key") {
    provider
    connectedAt
  }
}
List available voices
query {
  listElevenLabsVoices {
    voiceId
    name
    category
    gender
    description
    previewUrl
  }
}
Check connection status
query {
  myIntegration(provider: "elevenlabs") {
    provider
    connectedAt
  }
}

Disconnect with disconnectIntegration(provider: "elevenlabs").