

# Supplementary Materials for Non-negative Tensor Mixture Learning for Discrete Density Estimation

## Appendix
The appendix contains a detailed description of the experimental setup and environment, additional numerical results, remarks on the discussion, and proofs of the theorems and propositions. It is appended after the main text.
The following describes how to reproduce the numerical results.

## Source Code

For reproducibility, we provide our source code for all experiments. Our code works with Python 3.12.3. 
The proposed algorithms are available in `methods/ours/sparse_em_mix.py`.
The procedure for reordering tensor modes is implemented in `methods/ours/MI.py`.
Datasets for the experiments must be stored in the `data` directory.
After an experiment is run, the results are saved in the `results/` folder.
The experimental settings can be modified in `exp_config.py`.
Please refer to Section D for the datasets and implementation details.

### Required packages

The following packages are needed for the proposed methods:
```
numpy, itertools, functools, collections, COO
```
The following packages are needed to run the baselines:
```
Tensorly, nn-fac
```
The following packages are needed for dataset loading and preprocessing:
```
pmlb, ucimlrepo, pandas, pickle
```
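Before running the experiments, it can help to check which third-party packages are importable. The snippet below is a minimal sketch: `missing_modules` is a hypothetical helper, and the import names are assumptions (in particular, nn-fac is assumed to be imported as `nn_fac`). `itertools`, `functools`, `collections`, and `pickle` belong to the Python standard library and need no installation. The module providing COO is left out because its import name is not given here.

```python
import importlib.util


def missing_modules(names):
    """Return the names that cannot be located as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names of the third-party packages listed above.
third_party = ["numpy", "pandas", "pmlb", "ucimlrepo", "tensorly", "nn_fac"]
print("Missing modules:", missing_modules(third_party))
```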

### Data set preparation

For the DMFT, Votes, and Tumor datasets, please read and follow the guidance in `data/00readme.txt`.
Run the following commands to download the SolarFlare, SPECT, Lymphography, Led7, and Chess datasets and to generate the training, validation, and test data as NumPy files.
```
$cd data
$python3 prepro.py
```
Resulting numpy files will be saved in `data/{dataset_name}/`.
Information on dataset size is available in `data/dataset_info.py`. 

### Run Non-negative Tensor Mixture Learning
The proposed method decomposes sparse tensors stored in COO format. In preparation for running the proposed method, the following steps are performed:
```
$python3
>>> import sys
>>> import numpy as np
>>> sys.path.append("data")
>>> sys.path.append("methods/ours")
>>> import sp_tensor, dataset_info, sparse_em_mix
>>> dataset_name = "SolarFlare"
>>> coords = np.load(f"data/{dataset_name}/X_train_coords.npy")
>>> values = np.load(f"data/{dataset_name}/X_train_values.npy")
>>> T = sp_tensor.Sp_tensor(coords, values, dataset_info.tensor_sizes[dataset_name], normalize=True)
```
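As an illustration of the COO layout loaded above, the following sketch builds coordinate and value arrays from a small dense tensor and normalizes the values so that they sum to one. The toy data is hypothetical, and the normalization shown is only an assumption about what `normalize=True` amounts to; the internals of `Sp_tensor` are not reproduced here.

```python
import numpy as np

# Toy 2 x 3 count tensor (hypothetical data, not one of the datasets above).
dense = np.array([[0, 3, 0],
                  [1, 0, 4]], dtype=float)

# COO format: one row of coordinates per nonzero entry, plus the matching values.
coords = np.argwhere(dense != 0)   # shape (nnz, order)
values = dense[tuple(coords.T)]    # shape (nnz,)

# Dividing by the total mass turns the counts into an empirical distribution.
probs = values / values.sum()

print(coords)
print(probs)
```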
The option `normalize` must be `True` to map the tensor to a discrete probability distribution. The EMCPTrain decomposition can then be run as follows:
```
>>> Rcp = 2
>>> Rtrain = [2,2,2,2,2,2,2,2,2,2]
>>> model = (1,1,1)
>>> A, G, ecp, etra, enoise = sparse_em_mix.EMMix_sparse(T, Rcp, Rtrain, model=model )
```
`Rcp` is the rank of the CP decomposition, and `Rtrain` is the list of ranks of the Train decomposition. The number of elements in `Rtrain` must be one less than the order (number of modes) of the input tensor. The three binary flags in the `model` argument enable the CP term, the Train term, and the noise term, respectively. For example, `model = (1,0,1)` invokes a CP decomposition with a noise term.
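The length constraint on `Rtrain` can be checked before a decomposition is started. The helpers below are a hypothetical sketch and are not part of the provided source code; `sizes` stands in for the entry of `dataset_info.tensor_sizes` of the chosen dataset.

```python
def default_train_ranks(tensor_sizes, rank):
    """Return a constant rank list with one entry fewer than the tensor order."""
    return [rank] * (len(tensor_sizes) - 1)


def check_train_ranks(tensor_sizes, Rtrain):
    """Raise ValueError if `Rtrain` does not have (order - 1) entries."""
    order = len(tensor_sizes)
    if len(Rtrain) != order - 1:
        raise ValueError(
            f"expected {order - 1} ranks for an order-{order} tensor, got {len(Rtrain)}"
        )


sizes = [2] * 11                       # hypothetical sizes of an order-11 tensor
Rtrain = default_train_ranks(sizes, 2)
check_train_ranks(sizes, Rtrain)       # passes silently
print(Rtrain)
```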

The value of the reconstruction at an index `idx` is obtained as follows.

```
>>> idx = np.array( [1,2,1,1,1,1,1,0,1,0,1] )
>>> sparse_em_mix.mix_values_idxs(A, G, ecp, etra, enoise, idx)
```
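For intuition, the value of a mixture of a CP term, a train term, and a noise term at a single index can be sketched in plain NumPy. This is a generic illustration under stated assumptions, not the implementation in `sparse_em_mix.py`: the factor and core layouts, the uniform noise term, the mixing weights, and all names below are hypothetical, and the layout of the returned `A` and `G` is not assumed to match.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = (3, 2, 4)                  # hypothetical mode sizes
R_cp, tt_ranks = 2, [1, 2, 2, 1]   # hypothetical ranks; boundary train ranks are 1

# One non-negative factor matrix per mode, shape (size, R_cp).
cp_factors = [rng.random((n, R_cp)) for n in sizes]
# One non-negative core per mode, shape (r_left, size, r_right).
tt_cores = [rng.random((tt_ranks[d], n, tt_ranks[d + 1])) for d, n in enumerate(sizes)]


def cp_value(factors, idx):
    """Normalized CP value: sum over components of products of selected rows."""
    rows = np.stack([A[i] for A, i in zip(factors, idx)])
    total = np.prod([A.sum(axis=0) for A in factors], axis=0).sum()
    return rows.prod(axis=0).sum() / total


def tt_value(cores, idx):
    """Normalized train value: chain of matrix products of selected core slices."""
    value, total = np.ones((1, 1)), np.ones((1, 1))
    for G, i in zip(cores, idx):
        value = value @ G[:, i, :]
        total = total @ G.sum(axis=1)   # summing out the mode gives the normalizer
    return value[0, 0] / total[0, 0]


def mixture_value(idx, weights=(0.6, 0.3, 0.1)):
    """Weighted sum of the CP term, the train term, and a uniform noise term."""
    w_cp, w_tt, w_noise = weights
    uniform = 1.0 / np.prod(sizes)
    return w_cp * cp_value(cp_factors, idx) + w_tt * tt_value(tt_cores, idx) + w_noise * uniform


# Because each term is normalized and the weights sum to one, the total mass is one.
total_mass = sum(mixture_value(idx) for idx in np.ndindex(*sizes))
print(mixture_value((1, 0, 2)), total_mass)
```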

### For the Experiment in Section 5

#### Running the tensor mixture learning framework
Open `exp_para.py`, set `dataset_name` and the `methods` to be run, then execute:
```
$python3 exp_para.py
```
For evaluation on the test dataset, open `eval.py`, set `dataset_name` and the `methods` to be run, then execute:
```
$python3 eval.py
```
The results can be loaded with the following code:
```
$python3
>>> import numpy as np
>>> import utils_exp as ue
>>> dataset_name = "SolarFlare"
>>> ue.pickle_load(f"results/emCPTrainO/{dataset_name}.pkl") # Results for train and validation
>>> ue.pickle_load(f"results/emCPTrainO/{dataset_name}_test.pkl") # Results for test dataset
```
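If `utils_exp` is not at hand, the `.pkl` files can presumably be read with the standard `pickle` module. The sketch below assumes that `ue.pickle_load` is a thin wrapper around `pickle.load`, which is not confirmed here; the stored object in the round trip is hypothetical and does not reflect the actual structure of the result files.

```python
import os
import pickle
import tempfile


def pickle_dump(obj, path):
    """Serialize `obj` to `path` in binary mode."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def pickle_load(path):
    """Load and return the object stored at `path`."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Round trip with a hypothetical result object in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.pkl")
    pickle_dump({"train": [0.5, 0.4], "valid": [0.6, 0.5]}, path)
    restored = pickle_load(path)

print(restored)
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.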

#### Running the Baselines MPS, BM, and LPS
Move to `methods/baselines/`, clone the baseline repository, and rename the cloned directory to `tnfp`:
```
$cd methods/baselines/
$git clone https://github.com/glivan/tensor_networks_for_probabilistic_modeling
```
For evaluation, add the following `cross` method to the class `TN` in `methods/baselines/tnfp/tensornetworks/MPSClass.py`.
```
    def cross(self, coord, values, w=None):
        # Weighted cross-entropy: -sum_n values[n] * log p(coord[n]).
        cross = 0
        if w is not None:
            self.w = self._padding_function(w)
        self.norm = self._computenorm()

        for n in range(len(coord)):
            # Floor the probability at 1e-50 to avoid log(0).
            prob = self._probability(coord[n, :]) / self.norm
            cross -= values[n] * np.log(max(prob, 10 ** (-50)))
        return cross
```
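The quantity accumulated by this method is a weighted cross-entropy between the empirical weights `values` and the model probabilities at the listed coordinates. A standalone, vectorized sketch of the same computation follows; `weighted_cross_entropy` is a hypothetical name, and the model probabilities are passed in directly instead of being evaluated by a tensor network.

```python
import numpy as np


def weighted_cross_entropy(probs, values, floor=1e-50):
    """Return -sum_n values[n] * log(max(probs[n], floor))."""
    probs = np.maximum(np.asarray(probs, dtype=float), floor)
    values = np.asarray(values, dtype=float)
    return float(-np.sum(values * np.log(probs)))


# Two observed indices with empirical weight 0.5 each and
# model probabilities 0.5 and 0.25 (hypothetical numbers).
ce = weighted_cross_entropy([0.5, 0.25], [0.5, 0.5])
print(ce)
```

The floor keeps the result finite when the model assigns zero probability to an observed index.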
To run the code in Python 3.12.3, change all `xrange` to `range` in the cloned files.
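The renaming can be done by hand or with a short script. The following is a sketch under the assumption that a whole-word textual replacement is sufficient for the cloned files; `replace_xrange` and `patch_tree` are hypothetical helpers, and the replacement also affects occurrences inside strings and comments.

```python
import re
from pathlib import Path


def replace_xrange(source):
    """Replace the Python 2 name `xrange` with `range` (whole-word matches only)."""
    return re.sub(r"\bxrange\b", "range", source)


def patch_tree(root):
    """Rewrite every .py file under `root` in place and return the changed paths."""
    changed = []
    for path in Path(root).rglob("*.py"):
        text = path.read_text()
        patched = replace_xrange(text)
        if patched != text:
            path.write_text(patched)
            changed.append(path)
    return changed


print(replace_xrange("for i in xrange(3): pass"))
# patch_tree("path/to/cloned/repository")  # run once on the cloned directory
```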

Open `exp_bs.py`, set `dataset_name` and the `methods` to be run, then execute the command below. `BM`, `MPS`, and `LPS` correspond to Born Machine, Matrix Product States, and Locally Purified States, respectively.
```
$python3 exp_bs.py
```
The evaluation code is run automatically.
The results can be loaded as follows:
```
$python3
>>> import numpy as np
>>> import utils_exp as ue
>>> dataset_name = "SolarFlare"
>>> ue.pickle_load(f"results/BM/{dataset_name}_lrs.pkl") # Results for learning rate tuning.
>>> ue.pickle_load(f"results/BM/{dataset_name}.pkl") # Results for train and validation
>>> ue.pickle_load(f"results/BM/{dataset_name}_test.pkl") # Results for test dataset
```
### For the Experiment in Section C

- CP, NNCP, NNCPHALS, Tucker, NNTucker, or NNTuckerHALS: open `exp_tl.py` and set `dataset_name` and the `methods` to be run. `TT` corresponds to the tensor-train decomposition that optimizes the Frobenius norm.
- KLCPMU, KLNTDMU: open `exp_nnf.py` and set `dataset_name` and the `methods` to be run.

The evaluation code is run automatically. The results are saved in `results/` as `.pkl` files and can be loaded in the same way as described above.

### License
This source code is released under the MIT License.
