Podcast IR (Intermediate Representation)

Podcast notes store a structured JSON document called the IR in the notes.metadata column. The IR defines speakers, dialogue segments with script text, timing controls, and voice settings. A human-readable version is automatically derived into note_md for search and embedding via irToNoteMd().

Minimal viable example

Do not include audio, audioHistory, or audioUnsynced fields when creating a podcast note. These are managed by the TTS system and will be populated automatically when audio is generated.

Minimal IR JSON
{
  "speakers": [
    {
      "id": "speaker_1",
      "name": "Eva",
      "role": "host",
      "gender": "female",
      "personality": "Professional, warm, Canadian accent"
    }
  ],
  "segments": [
    {
      "id": "seg_ax",
      "speakerId": "speaker_1",
      "script": "[confident] Welcome to the show. Today we're diving into AI agents.",
      "pauseAfterMs": 500
    },
    {
      "id": "seg_bq",
      "speakerId": "speaker_1",
      "script": "[explaining] A Skill is simply a set of instructions written in plain English... step-by-step workflows the Agent follows.",
      "pauseAfterMs": 300
    },
    {
      "id": "seg_cr",
      "speakerId": "speaker_1",
      "script": "[confident] Flexible automation on one side, fully custom reporting on the other... both from the same data.",
      "pauseAfterMs": 0
    }
  ],
  "globalSettings": {
    "modelId": "eleven_v3",
    "outputFormat": "mp3_44100_128",
    "language": "en"
  }
}

voiceId is optional on speakers. If omitted, the Studio UI will auto-recommend a voice based on gender and personality tags. You can also specify a voiceId from the premade voice catalog below.

Creation paths

| Path | Best for | Cost | How |
|---|---|---|---|
| Chat AI | Generate script from a topic | 1-2 credits | POST /api/chat/stream with activeTool: "podcast_script" |
| CLI | Have a pre-written script | 0 credits | nl.py notes create --type podcast --skip-ai --content '<IR JSON>' |
| GraphQL | Programmatic creation | 0 credits | createGeneralNote(content, noteType: "podcast", skipAi: true) |
| Studio UI | Manual editing in browser | 0 credits | FAB button → Podcast → opens Studio editor |
CLI example
# Create a podcast note from a local IR JSON file
nl.py notes create --type podcast --skip-ai --content "$(cat my-podcast-ir.json)"
GraphQL mutation
mutation {
  createGeneralNote(
    content: "{\"speakers\":[...],\"segments\":[...],\"globalSettings\":{...}}"
    noteType: "podcast"
    skipAi: true
  ) {
    id
    noteMd
    metadata
    createdAt
  }
}

The content parameter receives the stringified IR JSON. The backend validates it via parsePodcastIR() (Zod). If validation fails, the note is still created but the IR is stored as plain text in note_md instead of structured metadata.
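Because a note with invalid IR silently falls back to plain-text storage, it can be worth sanity-checking the document before sending it. A minimal pre-flight check in Python (this is not the backend's parsePodcastIR, just the structural rules from the schema reference below):

```python
def validate_ir(ir: dict) -> list[str]:
    # Lightweight client-side checks: required top-level fields,
    # unique segment IDs, speakerId references, required pauseAfterMs.
    errors = []
    for field in ("speakers", "segments", "globalSettings"):
        if field not in ir:
            errors.append(f"missing top-level field: {field}")
    speaker_ids = {s.get("id") for s in ir.get("speakers", [])}
    seen = set()
    for seg in ir.get("segments", []):
        sid = seg.get("id")
        if sid in seen:
            errors.append(f"duplicate segment id: {sid}")
        seen.add(sid)
        if seg.get("speakerId") not in speaker_ids:
            errors.append(f"segment {sid}: unknown speakerId {seg.get('speakerId')}")
        if "pauseAfterMs" not in seg:
            errors.append(f"segment {sid}: missing pauseAfterMs")
    return errors
```

An empty list means the IR passes these basic checks; the backend's Zod schema remains the source of truth.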

IR schema reference

The full IR (PodcastIROutputSchema) has three top-level required fields: speakers, segments, and globalSettings.

speakers[]

| Name | Type | Required | Description |
|---|---|---|---|
| id | string | Required | Unique speaker ID. Convention: speaker_1, speaker_2, etc. |
| name | string | Required | Display name shown in Studio UI and derived note_md. |
| role | "host" \| "cohost" | Required | Speaker role. |
| gender | "male" \| "female" | Required | Used for voice matching and catalog filtering. |
| personality | string | Required | Free-text personality description. Used for auto voice recommendation via tag matching. |
| voiceId | string | Optional | ElevenLabs voice ID. If omitted, Studio auto-recommends based on gender + personality. |
| defaultVoiceSettings | object | Optional | { stability, similarity, style, speakerBoost }; all numbers 0-1 except speakerBoost (boolean). |

segments[]

| Name | Type | Required | Description |
|---|---|---|---|
| id | string | Required | Stable segment ID. Format: "seg_" + 2 random lowercase letters (e.g. seg_ax, seg_bq). Must be unique. |
| speakerId | string | Required | Must reference a speakers[].id. |
| script | string | Required | Dialogue text. Supports [tag] prefixes for voice direction: [confident], [whispering], [excited], [pause], [laughs], etc. Use "..." for natural pauses with eleven_v3. |
| pauseAfterMs | number | Required | Silence in ms after this segment. 200-500 for natural pacing, 500-800 for topic transitions. |
| pauseBeforeMs | number | Optional | Silence in ms before this segment. Min 0. |
| trimStartMs | number | Optional | Trim N ms from the start of the generated audio. Min 0. |
| trimEndMs | number | Optional | Trim N ms from the end of the generated audio. Min 0. |

audio, audioHistory, and audioUnsynced fields exist on segments but are system-managed. Do not include them when creating or updating IR.
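When generating segments programmatically, new IDs need to follow the "seg_" + two random lowercase letters convention and stay unique. A small helper, assuming you track existing IDs yourself (note the namespace only has 26 × 26 = 676 combinations, so it is meant for short scripts, not bulk generation):

```python
import random
import string

def new_segment_id(existing: set[str]) -> str:
    # Generate a "seg_" + 2 random lowercase letters ID,
    # retrying until it does not collide with an existing ID.
    while True:
        sid = "seg_" + "".join(random.choices(string.ascii_lowercase, k=2))
        if sid not in existing:
            existing.add(sid)
            return sid
```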

globalSettings

| Name | Type | Required | Description |
|---|---|---|---|
| modelId | string | Required | ElevenLabs model ID. Use "eleven_v3". |
| outputFormat | string | Required | Audio output format. Use "mp3_44100_128". |
| language | string | Required | ISO 639-1 language code. "en" for English, "zh" for Chinese. |

Premade voice catalog

12 premade ElevenLabs voices are available without connecting a custom ElevenLabs API key. Set voiceId on a speaker to use one. The Studio UI has voice previews for auditioning.

Female voices

| Name | Voice ID | Description | Tags |
|---|---|---|---|
| Rachel | 21m00Tcm4TlvDq8ikWAM | Calm narration, analytical clarity and warmth | calm, analytical, warm |
| Sarah | EXAVITQu4vr4xnSDxMaL | Soft news voice, warm and thoughtful | warm, thoughtful |
| Alice | Xb7hH8MSUJpSbSDYk0k2 | Confident news voice, excited and passionate | excited, passionate |
| Matilda | XrExE9yKIg1WjnnlVkGX | Warm audiobook voice, reflective and comforting | warm, reflective |
| Lily | pFZP5JQG7iQjIQuC4Bku | Raspy narration voice, tense and serious | tense, serious |
| Emily | LcfcDJNUP1GQjkzn1xUU | Calm meditation voice, reflective and soothing | calm, reflective |

Male voices

| Name | Voice ID | Description | Tags |
|---|---|---|---|
| Brian | nPczCjzI2devNBz1zQrb | Deep narration, calm, analytical and thoughtful | calm, analytical, thoughtful |
| Daniel | onwK4e9ZLuTAKqWW03F9 | Deep British news voice, serious and thoughtful | serious, thoughtful |
| Chris | iP95p4xoKVk53GoZ742B | Casual conversation, curious, playful and casual | curious, playful, casual |
| Charlie | IKne3meq5aSn9XLyUdCD | Casual Australian voice, playful and casual | playful, casual |
| Bill | pqHfZKP75CvOlQylNhV4 | Strong documentary voice, passionate and serious | passionate, serious |
| Josh | TxGEqnHWrfWFTfGW9XjX | Deep young voice, excited and passionate | excited, passionate |

To use custom ElevenLabs voices, connect your API key via connectIntegration(provider: "elevenlabs", apiKey: "...") then query listElevenLabsVoices for available voice IDs.

Generating audio (TTS)

Generate audio for a single segment. Requires the podcast scope. Credits are proportional to estimated segment duration (~1 credit per 10s of audio).

GraphQL mutation
mutation {
  generateSegmentAudio(noteId: "your-note-uuid", segmentId: "seg_ax") {
    segmentId
    audioUrl
    durationMs
    requestId
  }
}

Call once per segment. Re-calling replaces existing audio (old audio is pushed to audioHistory). Play audio via GET /api/audio/:noteId/:segmentId.
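Since audio is generated one segment at a time, a client typically loops over segments[]. A sketch in Python, assuming the GraphQL endpoint is served at /api/graphql (that path is an assumption; the mutation shape matches the one above):

```python
import json
import urllib.request

GRAPHQL_URL = "https://narrativelion.com/api/graphql"  # assumed endpoint path

def build_generate_mutation(note_id: str, segment_id: str) -> str:
    # Build the generateSegmentAudio mutation shown above for one segment.
    return (
        'mutation { generateSegmentAudio(noteId: "%s", segmentId: "%s") '
        "{ segmentId audioUrl durationMs requestId } }" % (note_id, segment_id)
    )

def generate_all(note_id: str, ir: dict, api_key: str) -> list[dict]:
    # Call generateSegmentAudio once per segment, in script order.
    results = []
    for seg in ir["segments"]:
        req = urllib.request.Request(
            GRAPHQL_URL,
            data=json.dumps({"query": build_generate_mutation(note_id, seg["id"])}).encode(),
            headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            results.append(json.loads(resp.read())["data"]["generateSegmentAudio"])
    return results
```

Remember that credits scale with segment duration, so re-generating a whole script re-spends the full cost.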

AI script editing

Edit the podcast script using natural language instructions. Costs 1 credit per call. Requires the podcast scope.

Edit full script
POST /api/podcast/edit
{
  "noteId": "your-note-uuid",
  "prompt": "Make the tone more conversational"
}
Edit a single segment
POST /api/podcast/edit
{
  "noteId": "your-note-uuid",
  "prompt": "Shorten this to under 20 words",
  "segmentId": "seg_ax"
}
| Name | Type | Required | Description |
|---|---|---|---|
| noteId | string | Required | Note ID. |
| prompt | string | Required | Edit instruction in natural language. |
| segmentId | string | Optional | If provided, scopes the edit to that segment only. |
Response
{
  "changes": [
    { "segmentId": "seg_ax", "script": "Updated script text..." },
    { "segmentId": "seg_bq", "pauseAfterMs": 400 }
  ],
  "summary": "Made the opening more conversational and shortened segment 2."
}

Changes are not auto-applied. Merge the changes into your IR and call updateNote(noteId, metadata: updatedIR) to persist.
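The merge step is mechanical: each entry in changes names a segmentId plus the fields to overwrite. A small helper that applies a changes array to an IR dict:

```python
def apply_edit_changes(ir: dict, changes: list[dict]) -> dict:
    # Merge /api/podcast/edit response changes into the IR in place.
    # Each change carries a segmentId plus the fields to overwrite
    # (e.g. script, pauseAfterMs); unknown segment IDs are skipped.
    by_id = {seg["id"]: seg for seg in ir["segments"]}
    for change in changes:
        seg = by_id.get(change["segmentId"])
        if seg is None:
            continue
        for key, value in change.items():
            if key != "segmentId":
                seg[key] = value
    return ir
```

After merging, serialize the IR and pass it as the metadata parameter of updateNote.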

Speech-to-speech (STS)

Upload your own recorded audio and run it through ElevenLabs speech-to-speech to match the assigned voice's style. Requires the podcast scope. The speaker must have a voiceId assigned.

Request
curl -X POST https://narrativelion.com/api/podcast/sts \
  -H "Authorization: Bearer nlk_your_key" \
  -F "noteId=your-note-uuid" \
  -F "segmentId=seg_ax" \
  -F "audio=@my-recording.mp3"
| Name | Type | Required | Description |
|---|---|---|---|
| noteId | string | Required | Note ID (form field). |
| segmentId | string | Required | Segment ID (form field). |
| audio | File | Required | Audio file, max 25 MB. |
Response
{
  "segmentId": "seg_ax",
  "audioUrl": "audio/note-uuid/seg_ax/1700000000000",
  "durationMs": 4200
}

Updating a podcast note

Update the IR via updateNote with the metadata parameter. The backend re-validates the IR and auto-derives note_md from the updated content.

GraphQL mutation
mutation {
  updateNote(
    noteId: "your-note-uuid"
    metadata: "{\"speakers\":[...],\"segments\":[...],\"globalSettings\":{...}}"
  ) {
    id
    updatedAt
  }
}

You do not need to set noteMd separately — it is re-derived automatically from the IR on every updateNote(metadata: ...) call.

Exporting audio

Export all generated audio segments as a single merged MP3 or a ZIP archive of individual per-segment MP3 files. Requires the podcast scope.

GraphQL mutation
mutation {
  exportPodcast(noteId: "your-note-uuid", format: "mp3", paddingMs: 200) {
    url
    durationMs
  }
}
| Name | Type | Required | Description |
|---|---|---|---|
| noteId | String! | Required | Note ID. |
| format | String | Optional | "mp3" (default): single merged file with silence gaps. "zip": one MP3 per segment, no gaps. |
| paddingMs | Int | Optional | Extra silence (ms) added after each segment. Only applies to mp3 format. 0-10000, default 0. |

Response

ExportResult
{
  "url": "/api/audio/<noteId>/export",   // download URL (signed, returns MP3 or ZIP)
  "durationMs": 48200                     // total audio duration
}

Download the exported file via GET /api/audio/:noteId/export with an Authorization header. The response includes a Content-Disposition header for the appropriate filename (podcast-export.mp3 or podcast-export.zip).

Format comparison

| Feature | mp3 | zip |
|---|---|---|
| Output | Single merged MP3 file | ZIP with one MP3 per segment |
| Silence gaps | pauseBeforeMs + pauseAfterMs + paddingMs applied | No gaps (raw per-segment audio) |
| Trim | trimStartMs / trimEndMs applied | trimStartMs / trimEndMs applied |
| File naming | podcast-export.mp3 | 01-SpeakerName.mp3, 02-SpeakerName.mp3, ... |
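Given per-segment durationMs values (e.g. from generateSegmentAudio responses), the merged mp3 length can be estimated from the rules in the format comparison: trims shorten each segment, then pauses and padding add silence. Whether the gap is also appended after the final segment is an assumption in this sketch:

```python
def estimate_merged_duration_ms(segments: list[dict], padding_ms: int = 0) -> int:
    # Estimate merged mp3 export length: per segment, subtract trims from
    # the generated audio duration, then add pauseBeforeMs, pauseAfterMs,
    # and the export-level paddingMs (assumed to apply after every segment).
    total = 0
    for seg in segments:
        audio_ms = seg["durationMs"] - seg.get("trimStartMs", 0) - seg.get("trimEndMs", 0)
        total += (
            seg.get("pauseBeforeMs", 0)
            + max(audio_ms, 0)
            + seg.get("pauseAfterMs", 0)
            + padding_ms
        )
    return total
```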

Speed adjustment is a client-side feature (Studio UI only). It uses a Web Worker with SoundTouch time-stretch and is not available via the API. Exported audio is always at 1x speed.

ElevenLabs integration

The 12 premade voices work without any setup. To use your own custom ElevenLabs voices, connect your API key:

Connect your ElevenLabs key
mutation {
  connectIntegration(provider: "elevenlabs", apiKey: "your-eleven-labs-key") {
    provider
    connectedAt
  }
}
List available voices
query {
  listElevenLabsVoices {
    voiceId
    name
    category
    gender
    description
    previewUrl
  }
}
Check connection status
query {
  myIntegration(provider: "elevenlabs") {
    provider
    connectedAt
  }
}

Disconnect with disconnectIntegration(provider: "elevenlabs").