# Subjective Assessment: CodecSep vs. AudioSep

This document presents a subjective assessment comparing the outputs of **CodecSep** and **AudioSep** on 20 test mixtures randomly sampled from the dnr-v2 test set, the same mixtures used in our MOS-LQS study. The notes compare the two models' outputs clip by clip, with attention to separation quality, source leakage, and perceptual artifacts across three tracks: **music**, **speech**, and **SFX**.

The goal of this assessment is to help reviewers interpret the qualitative differences between the two models.

---

| Clip | Model     | Music                                                                 | Speech                                                      | SFX                                                             |
|------|-----------|------------------------------------------------------------------------|-------------------------------------------------------------|-----------------------------------------------------------------|
| 54   | CodecSep  | Slight presence of speech; overall clean with low gain.                | Minor bleed from music at the beginning; low gain.          | Well-presented with low gain; slight interference from speech and music. |
|    | AudioSep  | Noticeable interference from both speech and SFX; excessive gain early in the clip. | SFX leakage evident in the early segment.                   | Under-emphasized and affected by speech leakage.               |
| 88   | CodecSep  | Clearly rendered; negligible speech contamination.                     | Well-separated with clean articulation.                     | Clear overall with only slight overlap from music and speech.  |
|    | AudioSep  | Weak onset, overwhelmed by speech and SFX.                             | SFX overlaps prominently.                                   | Reasonably captured.                                           |
| 230  | CodecSep  | Slightly muted at the end; SFX leakage present but not dominant.       | Clean with only minor interference.                         | Slight background bleed from music.                            |
|   | AudioSep  | Underpowered, with noticeable contamination from other sources.        | Clear SFX intrusion.                                        | Lacks clarity; other sources interfere.                        |
| 276  | CodecSep  | Subtle leakage from SFX and speech; overall intelligible.              | Slight SFX bleed.                                           | Minimal speech presence; boundaries are well maintained.       |
|   | AudioSep  | Blending from both speech and SFX; reduced gain.                       | Mild to moderate SFX interference.                          | Key events are not rendered distinctly.                        |
| 344  | CodecSep  | Slight SFX and speech contamination, but retains structure.            | High clarity and separation.                                | Cleanly rendered.                                              |
|   | AudioSep  | Overlapping from speech and SFX at start and end; lacks energy.        | Heavily impacted by SFX.                                    | Noticeable leakage from all sources.                           |
| 505  | CodecSep  | Occasional light speech artifacts.                                     | Brief moments of music intrusion.                           | Balanced with slight interference.                             |
|   | AudioSep  | Initial intrusion from speech, followed by SFX.                        | SFX presence is dominant.                                   | Underrepresented and interfered with.                          |
| 520  | CodecSep  | Mild interference from SFX.                                            | Slight overlap, but intelligible.                           | Well rendered.                                                  |
|   | AudioSep  | Strong bleed from speech and SFX.                                      | Clearly affected by SFX.                                    | Weak presentation with overlapping content.                     |
| 641  | CodecSep  | Minor bleed from speech and SFX.                                       | Slight bleed from music.                                    | Clear and well-isolated.                                        |
|   | AudioSep  | Prominent interference from SFX.                                       | SFX becomes more intrusive over time.                       | Clarity affected by speech.                                     |
| 701  | CodecSep  | Slight leakage late in the clip.                                       | Well-separated and consistent.                              | Clearly delineated.                                             |
|   | AudioSep  | Pronounced contamination from speech and SFX.                          | Compromised by SFX.                                         | Acceptable separation.                                          |
| 702  | CodecSep  | Clear and faithful.                                                    | Small drop at the beginning.                                | Accurately separated with minimal speech overlap.               |
|  | AudioSep  | Affected by both speech and SFX.                                       | SFX bleed persists.                                         | Overall acceptable.                                             |
| 972  | CodecSep  | Slight, consistent bleed from speech and SFX; balanced and intelligible.| Minor interference from background music.                  | Light contamination from music and faint speech.                |
|   | AudioSep  | Pronounced leakage from speech and SFX.                                | Slight but noticeable SFX intrusion.                        | Lacks presence and clarity; underpowered.                       |
| 1029 | CodecSep  | Slight and consistent speech leakage; SFX blends in. Low loudness.     | Clean and intelligible with minor SFX presence at start.    | Noticeable music presence; subtly masked.                       |
|  | AudioSep  | Clear output; distinct speech leakage near end.                        | Marked contamination from SFX.                              | Well captured with minimal interference.                        |
| 1180 | CodecSep  | Moderate leakage from SFX and speech in later segments.                | Generally clean, slight musical elements.                   | Reduced clarity due to music intrusion.                         |
|  | AudioSep  | Frequent leakage from SFX and speech.                                  | Clear but influenced by SFX overlap.                        | Good quality; minor music interference near end.                |
| 1230 | CodecSep  | Speech leakage audible; slightly lower gain.                           | Clean and well-separated.                                   | Some elements missed; slight speech interference.               |
|  | AudioSep  | Prominent speech and SFX leakage.                                      | Clear but impacted by SFX and music overlap.                | Some elements not captured; stronger music interference.        |
| 1278 | CodecSep  | Slight suppression; minor speech bleed.                                | Good clarity; occasional music presence.                    | Moderate clarity; some overlap near end.                        |
|  | AudioSep  | Distinct speech leakage.                                               | Clearly affected by SFX.                                    | Noticeable speech leakage and initial degradation.              |
| 1347 | CodecSep  | SFX and speech leakage at start; intermittent contamination.           | Clean overall; brief SFX and music bleed.                   | Moderately clear; impacted by overlapping content.              |
|  | AudioSep  | Generally clean with minor SFX at start.                               | Slight SFX at beginning; increased bleed later.             | Early speech intrusion; mid-clip music leakage.                 |
| 1590 | CodecSep  | Minor SFX and speech leakage; maintains clarity.                       | Slight musical presence.                                    | Well-isolated; minimal contamination.                           |
|  | AudioSep  | Noticeable speech and SFX interference.                                | Clear but moderately affected by SFX.                       | Substantial leakage from music and speech.                      |
| 1777 | CodecSep  | Moderate SFX bleed; light speech presence.                             | Clearly impacted by SFX; limited music intrusion.           | Slight overlap from music and speech.                           |
|  | AudioSep  | Strong contamination from speech and SFX.                              | Noticeable SFX intrusion.                                   | Lacks detail; missed elements later in clip.                    |
| 1778 | CodecSep  | Moderate SFX leakage; slight speech interference.                      | Clear and well-separated.                                   | Light contamination from music and speech.                      |
|  | AudioSep  | Mild SFX bleed; distinct speech leakage at end.                        | Affected by SFX presence.                                   | Initial music contamination; generally well captured.           |
| 1781 | CodecSep  | Localized SFX and slight speech bleed near end.                        | High-quality with minimal interference.                     | Slightly low gain; some music contamination.                    |
|  | AudioSep  | End impacted by SFX and speech leakage.                                | Minor SFX presence.                                         | Underpowered with speech interference.                          |

---

# Subjective Evaluation Summary


CodecSep produced cleaner separation in most of the 20 clips, with less audible interference across all three tracks. Speech and SFX were generally well isolated, with only minor leakage from music in some cases. Music outputs retained structural clarity even under slight cross-source contamination, and SFX events were preserved with minimal speech or music overlap.

In contrast, AudioSep exhibited more frequent cross-source leakage, particularly speech leaking into the SFX and music tracks. Music was often overpowered by concurrent sources, and SFX components showed reduced clarity and presence. Its outputs were also less stable over time: several clips showed fluctuating gain levels and degraded intelligibility where sources overlapped.

Overall, these observations indicate a perceptual advantage for CodecSep, which maintains clearer boundaries between sources, especially in complex acoustic scenes.
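
Readers who want to cross-check the summary against the table could count how many notes per model and track mention leakage-related terms. The following is a minimal sketch, not the procedure used for this assessment: the keyword list and the names `LEAK_PATTERN`, `ROWS`, and `tally_leak_mentions` are assumptions introduced for illustration, and only the notes for clips 54 and 88 are transcribed.

```python
import re
from collections import Counter

# Assumed keyword stems (not the authors' rubric): a note is counted as
# reporting leakage if it contains any of them, regardless of severity.
LEAK_PATTERN = re.compile(
    r"leak|bleed|contaminat|interfer|intrus|overlap", re.IGNORECASE
)

# (clip, model, track, note) tuples copied from the table rows for clips
# 54 and 88 only; add the remaining rows to cover all 20 clips.
ROWS = [
    (54, "CodecSep", "Music", "Slight presence of speech; overall clean with low gain."),
    (54, "CodecSep", "Speech", "Minor bleed from music at the beginning; low gain."),
    (54, "CodecSep", "SFX", "Well-presented with low gain; slight interference from speech and music."),
    (54, "AudioSep", "Music", "Noticeable interference from both speech and SFX, early excessive gain."),
    (54, "AudioSep", "Speech", "SFX leakage evident in the early segment."),
    (54, "AudioSep", "SFX", "Under-emphasized and affected by speech leakage."),
    (88, "CodecSep", "Music", "Clearly rendered; negligible speech contamination."),
    (88, "CodecSep", "Speech", "Well-separated with clean articulation."),
    (88, "CodecSep", "SFX", "Clear overall with only slight overlap from music and speech."),
    (88, "AudioSep", "Music", "Weak onset, overwhelmed by speech and SFX."),
    (88, "AudioSep", "Speech", "SFX overlaps prominently."),
    (88, "AudioSep", "SFX", "Reasonably captured."),
]


def tally_leak_mentions(rows):
    """Count notes per (model, track) that contain a leakage keyword."""
    counts = Counter()
    for _clip, model, track, note in rows:
        if LEAK_PATTERN.search(note):
            counts[(model, track)] += 1
    return counts


if __name__ == "__main__":
    for (model, track), n in sorted(tally_leak_mentions(ROWS).items()):
        print(f"{model:8s} {track:6s} {n}")
```

A keyword count of this kind ignores qualifiers such as "negligible" or "slight" and misses phrasings outside the list (for example "overwhelmed by"), so it can only complement the listening notes, not replace them.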

---