# cuda_nmf1


This is my first pass at this project, written while I am still getting used to C++ and CUDA,
so expect things to be a bit disorganized and not final.


## Random Commands


```bash

# Install the HDF5 development libraries for C/C++.
sudo apt install libhdf5-dev
sudo apt-get install libhdf5-serial-dev
# Not sure whether both packages were needed, but adding
# "-I/usr/include/hdf5/serial -lhdf5_cpp" as nvcc compiler flags got things working.


# Abseil. Not sure this is the best way; the compiler/linker still need to be pointed to it.
# conda install -c conda-forge abseil-cpp

```

#### Install NCCL
```bash

# Remove any old CUDA keyring and apt source lists before reinstalling.
sudo rm /usr/share/keyrings/cuda-archive-keyring.gpg
rm ./cuda-keyring_1.0-1_all.deb

sudo rm /etc/apt/sources.list.d/cuda.list
sudo rm /etc/apt/sources.list.d/cuda-ubuntu1804-x86_64.list
sudo rm /etc/apt/sources.list.d/cuda-ubuntu2004-x86_64.list

# cd /tmp

# Local machine (Ubuntu 20.04 repo).
wget --no-check-certificate https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
# Banana uses the Ubuntu 18.04 repo for whatever reason.
# wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt install -f libnccl2 libnccl-dev


```

#### Move code to banana server
```bash
sh ~/Desktop/projects/cuda_nmf1/dev_scripts/move_code_to_fruit.sh
```

#### Enter banana
```bash
ssh -X m@banana.cs.unc.edu
```


## TODOs

- Get the profiler working (maybe)
- Get some makefile stuff working
- Implement matrix-multiply based solution, dense and single GPU
    - Probably just copy existing code for the on-device matrix multiply for now.
- Get stuff running on banana.
- Figure out best way to profile.
    - https://developer.nvidia.com/nsight-compute
- Learn CUDA graphs and compare to streams.
- Learn NCCL and multi-GPU stuff.
- Figure out sparse stuff
    - Best sparse matrix format.
        - I think only the CSR format is supported for SpGEMM in cuSPARSE.
    - Library to use at first (probably cuSPARSE; make sure it's the fastest one).
    - SpGEMM: https://docs.nvidia.com/cuda/cusparse/index.html#cusparse-generic-function-spgemm
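
For the dense, matrix-multiply based item above, a small CPU reference may be useful for checking GPU output later. The following is only a sketch, assuming that "matrix-multiply based" means the Lee-Seung multiplicative updates for the squared Frobenius error (these notes do not say which NMF algorithm is intended). `Matrix`, `matmul`, `frobenius_error_sq`, and `nmf_step` are hypothetical names for illustration, not code from this repo.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense row-major matrix. Hypothetical helper type for this sketch only.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c, float fill = 0.0f)
        : rows(r), cols(c), data(r * c, fill) {}
    float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// C = op(A) * op(B), where op transposes its argument when the flag is true.
// Naive triple loop; on the GPU this is the part a GEMM routine would replace.
Matrix matmul(const Matrix& a, bool ta, const Matrix& b, bool tb) {
    const std::size_t m = ta ? a.cols : a.rows;
    const std::size_t k = ta ? a.rows : a.cols;
    const std::size_t n = tb ? b.rows : b.cols;
    assert(k == (tb ? b.cols : b.rows));
    Matrix c(m, n);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t p = 0; p < k; ++p) {
                const float av = ta ? a.at(p, i) : a.at(i, p);
                const float bv = tb ? b.at(j, p) : b.at(p, j);
                c.at(i, j) += av * bv;
            }
    return c;
}

// Squared Frobenius norm of V - W*H.
float frobenius_error_sq(const Matrix& v, const Matrix& w, const Matrix& h) {
    const Matrix wh = matmul(w, false, h, false);
    float err = 0.0f;
    for (std::size_t i = 0; i < v.data.size(); ++i) {
        const float d = v.data[i] - wh.data[i];
        err += d * d;
    }
    return err;
}

// One multiplicative-update step for V (m x n) ~= W (m x k) * H (k x n).
// eps guards against division by zero.
void nmf_step(const Matrix& v, Matrix& w, Matrix& h, float eps = 1e-9f) {
    // H <- H .* (W^T V) ./ (W^T W H + eps)
    const Matrix wtv = matmul(w, true, v, false);
    const Matrix wtwh = matmul(matmul(w, true, w, false), false, h, false);
    for (std::size_t i = 0; i < h.data.size(); ++i)
        h.data[i] *= wtv.data[i] / (wtwh.data[i] + eps);

    // W <- W .* (V H^T) ./ (W H H^T + eps), using the updated H
    const Matrix vht = matmul(v, false, h, true);
    const Matrix whht = matmul(w, false, matmul(h, false, h, true), false);
    for (std::size_t i = 0; i < w.data.size(); ++i)
        w.data[i] *= vht.data[i] / (whht.data[i] + eps);
}
```

Calling `nmf_step` in a loop on non-negative `V` with positive initial `W` and `H` keeps both factors non-negative, and `frobenius_error_sq` can be tracked across iterations as a sanity check.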



## Links

### CUDA Documentation
- https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
- https://docs.nvidia.com/cuda/index.html
- https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html
- https://docs.nvidia.com/cuda/ampere-tuning-guide/index.html
- https://nvidia.github.io/libcudacxx/extended_api.html
- Random number generation https://docs.nvidia.com/cuda/curand/host-api-overview.html

### Linear Algebra
- https://docs.nvidia.com/cuda/cublas/index.html
- https://docs.nvidia.com/cuda/cusparse/index.html
    - https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSE/compression (Compute Capability 8.0+ only)
    - https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSE/graph_capture
    - SpGEMM: https://docs.nvidia.com/cuda/cusparse/index.html#cusparse-generic-function-spgemm
        - I think only the CSR format is supported for this.
        - The routine does not support CUDA graph capture.
        - https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSE/spgemm
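
Since the notes above lean toward CSR, here is a small CPU-side sketch of what that layout holds: one row-offset array of length `rows + 1`, plus column indices and values for each nonzero. `CsrMatrix`, `dense_to_csr`, and `csr_spmv` are hypothetical names for illustration; cuSPARSE builds its own descriptors through its API, and this sketch only shows the three arrays that would be handed to it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed Sparse Row storage. Hypothetical helper type for this sketch only.
struct CsrMatrix {
    int rows = 0;
    int cols = 0;
    std::vector<int> row_offsets;  // length rows + 1; row r spans [row_offsets[r], row_offsets[r + 1])
    std::vector<int> col_indices;  // length nnz; column of each stored value
    std::vector<float> values;     // length nnz; the nonzero values, row by row
};

// Build a CSR matrix from a dense row-major array, dropping exact zeros.
CsrMatrix dense_to_csr(const std::vector<float>& dense, int rows, int cols) {
    assert(dense.size() == static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols));
    CsrMatrix m;
    m.rows = rows;
    m.cols = cols;
    m.row_offsets.reserve(static_cast<std::size_t>(rows) + 1);
    m.row_offsets.push_back(0);
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            const float v = dense[static_cast<std::size_t>(r) * cols + c];
            if (v != 0.0f) {
                m.col_indices.push_back(c);
                m.values.push_back(v);
            }
        }
        // Offset where the next row starts equals the nonzero count so far.
        m.row_offsets.push_back(static_cast<int>(m.values.size()));
    }
    return m;
}

// y = A * x for a CSR matrix A; shows how the three arrays are traversed.
std::vector<float> csr_spmv(const CsrMatrix& a, const std::vector<float>& x) {
    assert(x.size() == static_cast<std::size_t>(a.cols));
    std::vector<float> y(static_cast<std::size_t>(a.rows), 0.0f);
    for (int r = 0; r < a.rows; ++r)
        for (int k = a.row_offsets[r]; k < a.row_offsets[r + 1]; ++k)
            y[r] += a.values[k] * x[a.col_indices[k]];
    return y;
}
```

An empty row shows up as two equal consecutive entries in `row_offsets`, which is why the array needs `rows + 1` entries rather than `rows`.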

### NCCL
- https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/overview.html

### CUTLASS
- https://github.com/NVIDIA/cutlass

### CUDA Profiling
- https://developer.nvidia.com/nsight-compute

### Makefiles
- https://makefiletutorial.com/
- https://makefiletutorial.com/#makefile-cookbook

### HDF5 in C++
- https://portal.hdfgroup.org/display/HDF5
- https://portal.hdfgroup.org/display/HDF5/Examples+from+Learning+the+Basics
- https://github.com/HDFGroup/hdf5/tree/develop/c%2B%2B
- https://portal.hdfgroup.org/display/HDF5/HDF5+1.12+CPP+Reference+Manual
- http://davis.lbl.gov/Manuals/HDF5-1.6.1/Datatypes.html
- http://davis.lbl.gov/Manuals/HDF5-1.6.1/H5.user.html
- https://docs.hdfgroup.org/hdf5/v1_12/group___h5_d.html


### Sublime Text and C++
- https://github.com/niosus/EasyClangComplete
    - https://niosus.github.io/EasyClangComplete/configs/
