Supplementary Material for Declarative Audio Editing with Audio Language Model

SmartDJ (Ours) • Audio Editing Examples

Declarative audio editing examples

Declarative instruction: “Make it sound like a lively parade”

ALM-inferred atomic editing steps:

  • Add the sound of marching band at front by 3dB
  • Change the timbre of the sound of person whistle to be more bright
  • Turn down the sound of horn by 2dB

    Original

    SmartDJ (Ours)
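Each inferred atomic step uses one of a few fixed phrasings: add, remove, turn up or down, change timbre, reverb, and shift. As an illustration only, the sketch below parses such step strings into structured edits with regular expressions. The templates, the function `parse_step`, and the field names (`sound`, `position`, `db`, `seconds`) are assumptions inferred from the wording of the examples on this page. They are not SmartDJ's actual output grammar.

```python
import re
from typing import Optional

# Hypothetical step templates, inferred only from the wording of the examples
# on this page; the ALM's real output format may differ.
NUM = r"(?P<value>-?\d+(?:\.\d+)?)"
PATTERNS = {
    "add": rf"add the sound of (?P<sound>.+?) at (?:the )?(?P<position>.+?) by {NUM}\s*db",
    "remove": r"remove the sound of (?P<sound>.+?)",
    "volume": rf"turn (?P<sign>up|down) the sound of (?P<sound>.+?) by {NUM}\s*db",
    "timbre": r"change the timbre of (?:the sound of )?(?P<sound>.+?) to be (?P<quality>.+?)",
    "reverb": r"reverb the sound of (?P<sound>.+?) with (?P<amount>.+?)",
    "shift": rf"shift the sound of (?P<sound>.+?) by {NUM} seconds?",
}


def parse_step(step: str) -> Optional[dict]:
    """Parse one atomic editing step into a dict; return None if no template matches."""
    text = step.strip().rstrip(".").strip()
    for action, pattern in PATTERNS.items():
        match = re.fullmatch(pattern, text, flags=re.IGNORECASE)
        if match is None:
            continue
        fields = dict(match.groupdict())
        if "value" in fields:
            value = float(fields.pop("value"))
            # "Turn down ... by N dB" is stored as a negative level change.
            if fields.pop("sign", "up").lower() == "down":
                value = -value
            fields["seconds" if action == "shift" else "db"] = value
        return {"action": action, **fields}
    return None
```

For example, `parse_step("Turn down the sound of horn by 2dB")` returns `{'action': 'volume', 'sound': 'horn', 'db': -2.0}`. A step that matches no template returns `None`, so unexpected phrasings are surfaced instead of being silently misparsed.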

    Declarative instruction: “Make this sound like a windy farm.”

    ALM-inferred atomic editing steps:

  • Add the sound of wind in fields at front by 3dB
  • Add the sound of cows mooing at right front by 2dB
  • Remove the sound of person type
  • Turn down the sound of goat bleat by 3dB

    Original

    SmartDJ (Ours)

    Declarative instruction: “Make this sound like a stormy day.”

    ALM-inferred atomic editing steps:

  • Add the sound of distant thunder rumble at left by 2dB
  • Change the timbre of the phone ringing to be more muffled

    Original

    SmartDJ (Ours)

    Declarative instruction: “Simulate the sounds of a busy highway”

    ALM-inferred atomic editing steps:

  • Add the sound of car honking at right by 3dB
  • Turn up the sound of truck vibrate by 2dB
  • Reverb the sound of vehicle pass with large reverberation

    Original

    SmartDJ (Ours)
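Level changes such as "turn up the sound of truck vibrate by 2dB" are stated in decibels. For amplitude, a change of ΔdB corresponds to a linear factor of 10^(ΔdB/20), so +2 dB is about 1.26x and -2 dB about 0.79x. The sketch below applies such changes to a toy mixture. It assumes each sound event is available as an isolated stem, and the `remix` helper is hypothetical. This is a simplification for illustration and does not describe how SmartDJ edits a mixed recording.

```python
from typing import Dict, Iterable, List, Optional


def db_to_gain(db: float) -> float:
    """Linear amplitude factor for a level change of `db` decibels."""
    return 10.0 ** (db / 20.0)


def remix(
    stems: Dict[str, List[float]],
    db_changes: Optional[Dict[str, float]] = None,
    removed: Iterable[str] = (),
) -> List[float]:
    """Re-sum isolated sources after per-source level changes.

    Hypothetical setting: assumes each sound event is available as its own
    mono stem, which is not the case for a real recording.
    """
    db_changes = db_changes or {}
    removed = set(removed)
    length = max(len(s) for s in stems.values())
    mix = [0.0] * length
    for name, samples in stems.items():
        if name in removed:
            continue  # "Remove the sound of ..." drops the stem entirely
        gain = db_to_gain(db_changes.get(name, 0.0))
        for i, x in enumerate(samples):
            mix[i] += gain * x
    return mix
```

Note that these are amplitude ratios: a +3 dB step scales the waveform by about 1.41x, which doubles its power.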

    Atomic editing action: Change sound direction

    Edit instruction: “change the sound of a man talking and plastic crinkling and crumpling at the right front to the front”

    Original

    Audit

    SmartDJ (Ours)

    Target (Ground truth)

    Edit instruction: “Change the sound of woman speaking, food frying at the front to the right”

    Original

    Audit

    SmartDJ (Ours)

    Target (Ground truth)
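The direction edits above move a sound event to a new position. One simple way to picture the intended result is to re-render the isolated source with a constant-power stereo pan law, as in the sketch below. The azimuth assigned to each position word is a hypothetical choice, and the source is assumed to be available as a mono stem. Plain stereo panning also cannot distinguish front from back; binaural rendering (for example with HRTFs) would be needed for that. This is an illustration only and not necessarily how the ground-truth targets or SmartDJ's outputs were produced.

```python
import math
from typing import List, Tuple

# Hypothetical azimuths in degrees (negative = left) for the position words used on this page.
AZIMUTH_DEG = {
    "left": -90.0,
    "left front": -45.0,
    "front": 0.0,
    "right front": 45.0,
    "right": 90.0,
}


def pan_gains(position: str) -> Tuple[float, float]:
    """Constant-power (sine/cosine) pan law: left^2 + right^2 == 1 for every position."""
    theta = (AZIMUTH_DEG[position] + 90.0) / 180.0 * (math.pi / 2)
    return math.cos(theta), math.sin(theta)


def move_source(mono: List[float], new_position: str) -> Tuple[List[float], List[float]]:
    """Re-render an isolated mono source at a new stereo position."""
    gain_l, gain_r = pan_gains(new_position)
    return [gain_l * x for x in mono], [gain_r * x for x in mono]
```

The constant-power law keeps perceived loudness roughly steady as a source moves: "front" gets about 0.707 in each channel rather than 0.5, which would sound quieter than a hard-left or hard-right placement.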

    Atomic editing action: Reverb

    Edit instruction: “Reverb the sound of laughter and speech at the right with high reverberations”

    Original

    Audit

    SmartDJ (Ours)

    Target (Ground truth)

    Edit instruction: “Reverb the sound of baby cries and adult male speaks at the right front with mid reverberations”

    Original

    Audit

    SmartDJ (Ours)

    Target (Ground truth)
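Reverberation strength ("mid", "high") can be related to a room's reverberation time RT60, the time the reverb takes to decay by 60 dB. The sketch below adds reverb by convolving a source with a synthetic impulse response: Gaussian noise under an exponential envelope exp(-ln(1000)·t/RT60), which falls to 1/1000 of its initial amplitude (-60 dB) at t = RT60. The RT60 value mapped to each level, the wet/dry ratio, and the helper names are hypothetical. Measured room impulse responses would sound more realistic.

```python
import math
import random
from typing import List

# Hypothetical RT60 values in seconds for the reverberation levels named on this page.
RT60_S = {"low": 0.3, "mid": 0.8, "high": 1.8}


def synthetic_ir(rt60: float, sample_rate: int, seed: int = 0) -> List[float]:
    """Gaussian noise under an exponential envelope that falls by 60 dB after rt60 seconds."""
    rng = random.Random(seed)
    decay = math.log(1000.0) / rt60  # exp(-decay * rt60) == 1/1000, i.e. -60 dB
    n = max(1, int(rt60 * sample_rate))
    return [rng.gauss(0.0, 1.0) * math.exp(-decay * i / sample_rate) for i in range(n)]


def convolve(x: List[float], h: List[float]) -> List[float]:
    """Direct-form full convolution (O(len(x) * len(h)); fine for a sketch, slow for real audio)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def add_reverb(dry: List[float], level: str, sample_rate: int, wet: float = 0.3) -> List[float]:
    """Wet/dry mix of the source with a synthetic room response; output keeps the reverb tail."""
    ir = synthetic_ir(RT60_S[level], sample_rate)
    norm = math.sqrt(sum(v * v for v in ir)) or 1.0  # scale the IR to unit energy
    tail = convolve(dry, [v / norm for v in ir])
    dry_padded = dry + [0.0] * (len(tail) - len(dry))
    return [(1.0 - wet) * d + wet * t for d, t in zip(dry_padded, tail)]
```

For full-length clips, an FFT-based convolution would replace the direct loop; the direct form is kept here only because it is dependency-free.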

    Atomic editing action: Time shift

    Edit instruction: “Shift the sound of a baby is crying at the front by -3 seconds”

    Original

    Audit

    SmartDJ (Ours)

    Target (Ground truth)

    Edit instruction: “Shift the sound of engines hum and squealing tires at the right by 3 seconds”

    Original

    Audit

    SmartDJ (Ours)

    Target (Ground truth)
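A time shift moves a sound event earlier or later within the clip. The sketch below makes three assumptions: the clip keeps its original duration, a negative value (as in "by -3 seconds") means earlier, and the event is available as an isolated stem. These conventions and the `time_shift` helper are assumptions for illustration.

```python
from typing import List


def time_shift(samples: List[float], seconds: float, sample_rate: int) -> List[float]:
    """Delay (seconds > 0) or advance (seconds < 0) a source while keeping the clip length.

    Samples pushed past either end are dropped and the vacated span is zero-filled.
    """
    n = len(samples)
    k = int(round(seconds * sample_rate))
    if k >= 0:
        k = min(k, n)
        return [0.0] * k + list(samples[: n - k])
    k = min(-k, n)
    return list(samples[k:]) + [0.0] * k
```

At a hypothetical 16 kHz sample rate and for a clip longer than 3 seconds, `time_shift(x, -3, 16000)` drops the first 48,000 samples and zero-fills the last 3 seconds.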

    Atomic editing action: Timbre

    Edit instruction: “Change the timbre of the sound of loud humming and wind blowing at the left to be more muffled”

    Original

    Audit

    SmartDJ (Ours)

    Target (Ground truth)

    Edit instruction: “Change the timbre of the sound of train horns blowing at the left front to be more bright”

    Original

    Audit

    SmartDJ (Ours)

    Target (Ground truth)
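"More muffled" and "more bright" describe a source's spectral balance: muffling attenuates high frequencies and brightening emphasizes them. A crude signal-processing analogue uses a first-order low-pass filter for muffling. For brightening, it adds back the high-frequency residual (the input minus its low-passed version). The cutoff frequency, the `amount` parameter, and the helper names are hypothetical. A learned editor is not limited to fixed filters like these.

```python
import math
from typing import List


def one_pole_lowpass(x: List[float], cutoff_hz: float, sample_rate: int) -> List[float]:
    """First-order low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y, prev = [], 0.0
    for v in x:
        prev += a * (v - prev)
        y.append(prev)
    return y


def muffle(x: List[float], cutoff_hz: float = 1000.0, sample_rate: int = 16000) -> List[float]:
    """'More muffled': keep mostly the content below the (hypothetical) cutoff."""
    return one_pole_lowpass(x, cutoff_hz, sample_rate)


def brighten(
    x: List[float], amount: float = 1.0, cutoff_hz: float = 1000.0, sample_rate: int = 16000
) -> List[float]:
    """'More bright': add back `amount` times the high-frequency residual x - lowpass(x)."""
    low = one_pole_lowpass(x, cutoff_hz, sample_rate)
    return [v + amount * (v - lo) for v, lo in zip(x, low)]
```

With these defaults, a constant (DC) signal passes through both functions almost unchanged once the filter settles. A signal alternating at the Nyquist frequency is attenuated to roughly a fifth of its amplitude by `muffle` and boosted by `brighten`.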

    Real recording examples

    Declarative instruction: “Make this sound like in a game room.”

    Decomposed atomic editing actions

    1. Remove the sound of explosion.
    2. Add the sound of video game playing at left by 3dB.

    Original

    Edited

    Declarative instruction: “Make this sound like in a farm.”

    Decomposed atomic editing actions

    1. Remove the sound of machines.
    2. Turn up the sound of man speech by 2dB.
    3. Add the sound of sheep at left by 2dB.

    Original

    Edited

    Multi-step Editing

    Declarative instruction: “Make this sound like a busy office.”

    Decomposed atomic editing actions

    1. Remove the sound of drilling.
    2. Turn up the sound of typewriter type by 2 dB.
    3. Add the sound of phone ringing at right by 3 dB.

    Original

    Before any editing

    Step 1 – Remove drilling

    “remove the sound of drilling”

    Step 2 – Turn up typewriter

    “turn up the sound of typewriter type by 2dB”

    Step 3 – Add phone ringing

    “Add the sound of phone ringing at right by 3dB”

    Declarative instruction: “Make this sound like a workshop by the dock.”

    Decomposed atomic editing actions

    1. Remove the sound of metal knock.
    2. Add the sound of seagulls squawking at the left by 3 dB.
    3. Add the sound of waves lapping at the right by 2 dB.

    Original

    Before any editing

    Step 1 – Remove metal knock

    “remove the sound of metal knock”

    Step 2 – Add seagulls

    “add seagulls squawking at left by 3 dB”

    Step 3 – Add waves

    “add the sound of waves lapping at right by 2 dB”
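The step-by-step panels suggest that multi-step editing applies the decomposed atomic actions one at a time, each to the output of the previous step, so every intermediate result can be inspected. The sketch below shows only that sequencing. It assumes a generic `edit(audio, instruction)` callable, and `run_steps`, `edit`, and the placeholder editor are hypothetical stand-ins for the editing model, whose interface is not described here.

```python
from typing import Callable, List, Tuple, TypeVar

Audio = TypeVar("Audio")


def run_steps(
    audio: Audio,
    steps: List[str],
    edit: Callable[[Audio, str], Audio],
) -> List[Tuple[str, Audio]]:
    """Apply atomic edits in order, each to the previous step's output, keeping every intermediate."""
    history: List[Tuple[str, Audio]] = [("Before any editing", audio)]
    for i, step in enumerate(steps, start=1):
        audio = edit(audio, step)
        history.append((f"Step {i}: {step}", audio))
    return history


if __name__ == "__main__":
    # Placeholder editor: records which instructions were applied instead of editing audio.
    def record(applied: Tuple[str, ...], step: str) -> Tuple[str, ...]:
        return applied + (step,)

    plan = [
        "remove the sound of metal knock",
        "add seagulls squawking at left by 3 dB",
        "add the sound of waves lapping at right by 2 dB",
    ]
    for label, state in run_steps((), plan, record):
        print(label, "->", len(state), "edit(s) applied")
```

Keeping the full history, rather than only the final output, is what makes per-step listening (and undoing a single step) straightforward.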

    Declarative audio editing examples: comparison with baselines

    Declarative instruction: “Make this sound like a workshop by the dock”

    ALM-inferred atomic editing steps:

  • Remove the sound of metal knock
  • Add the sound of seagulls squawking at left by 3dB
  • Turn down the sound of motorboat running by 2dB
  • Add the sound of waves lapping at right by 2dB

    Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

    Declarative instruction: “Make this sound like a protest in a city”

    ALM-inferred atomic editing steps:

  • Turn up the sound of emergency siren by 3dB
  • Remove the sound of man speech
  • Add the sound of crowd chanting at front by 3dB

    Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

    Declarative instruction: “Make this sound like a serene beach”

    ALM-inferred atomic editing steps:

  • Remove the sound of whistling
  • Turn up the sound of wave crash by 4dB
  • Add the sound of seagulls calling at front by 3dB

    Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

    Declarative instruction: “Make this sound like a busy city street”

    ALM-inferred atomic editing steps:

  • Add the sound of distant sirens at left by 3 dB
  • Add the sound of footsteps on pavement at right by 2 dB
  • Turn down the sound of engine rev by 2dB
  • Remove the sound of bell ring

    Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

    Declarative instruction: “Make this sound like a cozy living room”

    ALM-inferred atomic editing steps:

  • Add the sound of fireplace crackle at left by 3dB
  • Turn down the sound of woman speech by 2dB
  • Remove the sound of cat meow

    Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

    Declarative instruction: “Make this sound like in an outdoor concert”

    ALM-inferred atomic editing steps:

  • Remove the sound of whistle
  • Turn down the sound of woman speech by 2dB
  • Add the sound of guitar strumming at left by 2dB

    Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

    Declarative instruction: “Make this sound like a busy daycare center”

    ALM-inferred atomic editing steps:

  • Turn up the sound of child cry by 3dB
  • Remove the sound of car engine
  • Add the sound of toys clattering at left by 2dB

    Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

    Atomic editing action: Add

    Edit instruction: “Add the sound of music playing and people singing at the right with 0 dB”

    Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

    Atomic editing action: Remove

    Edit instruction: “Remove the sound of baby crying at the front”

    Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

    Target (Ground truth)