# HMV-CL: Heterogeneous Multi-View Contrastive Learning

This repository contains the official implementation of the HMV-CL framework. Our method aligns dense textual embeddings with sparse symbolic medical views to produce structured latent representations for rare-disease social media data.

## Data Privacy Notice
The data were collected in compliance with **French MR-004 (CNIL)**. Because the patient narratives are highly sensitive and carry re-identification risks, the original dataset cannot be shared publicly.

## Configuration
- **Input Views:** Text (768D), Clinical (64D), Procedure (32D), Pharma (64D).
- **Latent Space:** 64D shared space with L2 normalization.
- **Optimizer:** AdamW (LR=1e-4, WD=1e-5).
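
As a quick reference, the values above can be collected in a single configuration object. This is only a minimal sketch: the class name `HMVCLConfig` and its field names are hypothetical and may not match how the repository actually stores its settings. Only the numeric values come from this README.

```python
from dataclasses import dataclass, field


def _default_view_dims():
    # Input dimensions per view, as listed in the Configuration section.
    return {"text": 768, "clinical": 64, "procedure": 32, "pharma": 64}


@dataclass(frozen=True)
class HMVCLConfig:
    """Hypothetical container for the hyperparameters listed above."""

    view_dims: dict = field(default_factory=_default_view_dims)
    latent_dim: int = 64          # size of the shared latent space
    l2_normalize: bool = True     # latent vectors are L2-normalized
    optimizer: str = "AdamW"
    learning_rate: float = 1e-4
    weight_decay: float = 1e-5


if __name__ == "__main__":
    cfg = HMVCLConfig()
    for name, dim in cfg.view_dims.items():
        print(f"{name}: {dim}D -> {cfg.latent_dim}D")
```

Freezing the dataclass keeps the settings from being reassigned by accident between runs. Whether the actual code uses a dataclass, a YAML file, or command-line arguments is not stated here.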

## Repository Structure
- model.py: Defines the MultiViewEncoder with view-specific projectors f_theta and the linear fusion head g_phi.
- loss.py: Implements the multi-view contrastive loss l_mv_cl, global alignment l_global, and orthogonality regularization l_or.
- evaluation.py: Tools for calculating Effective Rank, Isotropy, Uniformity, and ARI stability.
- bertopic_utils.py: Centralizes the BERTopic configuration.
- view_extraction.py: Pipeline for extracting symbolic views using SNOMED-CT lexicons.
- main.py: The central orchestrator for training and evaluating the framework over multiple runs (default 15).
- generate_dummy_data.py: Script to generate synthetic batches for technical verification.
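
Since the original dataset cannot be shared, synthetic batches are the simplest way to check that the pipeline runs end to end. The sketch below illustrates what such a batch might look like. It is not the content of generate_dummy_data.py. The helper `make_dummy_batch`, its `density` parameter, and the choice of Gaussian noise for the text view and multi-hot vectors for the symbolic views are all assumptions. Only the view dimensions come from this README.

```python
import numpy as np

# View dimensions taken from the Configuration section of this README.
VIEW_DIMS = {"text": 768, "clinical": 64, "procedure": 32, "pharma": 64}


def make_dummy_batch(batch_size=8, density=0.05, seed=0):
    """Return a dict mapping view name to a synthetic float32 array.

    Hypothetical helper: the text view is dense Gaussian noise, and each
    symbolic view is a sparse multi-hot matrix in which every entry is 1
    with probability `density` (an assumed value).
    """
    rng = np.random.default_rng(seed)
    batch = {}
    for name, dim in VIEW_DIMS.items():
        if name == "text":
            # Dense view: stand-in for 768D text embeddings.
            values = rng.standard_normal((batch_size, dim))
        else:
            # Sparse symbolic view: stand-in for lexicon-matched concepts.
            values = rng.random((batch_size, dim)) < density
        batch[name] = values.astype(np.float32)
    return batch


if __name__ == "__main__":
    for name, array in make_dummy_batch().items():
        sparsity = 1.0 - np.count_nonzero(array) / array.size
        print(f"{name}: shape={array.shape}, sparsity={sparsity:.2f}")
```

A fixed seed makes the synthetic batch reproducible, which helps when comparing results across the repeated runs that main.py performs.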