This directory contains the code for generating the results given
in 'Memory-Efficient Approximation Algorithms for Max-k-Cut and Correlation
Clustering'.

The directory structure and the files are as follows:
1. CorClus.m -- Contains the implementation of Algorithm 1 from the paper,
applied to correlation clustering
2. MaxkCut.m -- Contains the implementation of Algorithm 1 from the paper,
applied to Max-k-Cut
3. CreateCCGraph_Labels.m -- Generates a graph whose edges are labelled '+'
or '-' from any given input graph
4. Directory 'Datasets' -- Contains graphs from the GSet dataset. Please refer
to Section 7 of the paper for more details on the dataset
5. Directory 'CCGraphs' -- Contains modified graphs from GSet such that each
edge is labelled either '+' or '-'
6. Directories 'output_CC' and 'output_MkC' -- Contain the output of CorClus.m
and MaxkCut.m respectively

Please refer to the instructions given in this file to recreate the
results in Table 1, and Appendices B and C of the paper.

System requirements:
Requires MATLAB R2018 or a later version

Implementing correlation clustering:

The code 'CorClus.m' generates clusters for an arbitrarily weighted incomplete
graph whose edges carry one of two labels. If you want to generate clusters
for a randomly generated graph, skip to Step 2. Otherwise, if you have an
input graph for which you would like to generate clusters, execute Step 1,
where the input graph is translated to the appropriate graph for clustering
using the Jaccard coefficient for each pair of nodes i and j. Please refer
to the paper for more information.
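
As a rough illustration of the Jaccard coefficient mentioned above, the
following sketch computes J(i,j) = |N(i) ∩ N(j)| / |N(i) ∪ N(j)| for every
pair of nodes of a small toy graph. The toy edge list and all variable names
are assumptions made for this example, and open neighbourhoods are assumed.
'CreateCCGraph_Labels.m' may define the neighbourhoods differently, and how
it turns the coefficients into edge labels is described in the paper, not
here.

```matlab
% Toy undirected graph (hypothetical): edges 1-2, 1-3, 2-3, 3-4
A = sparse([1 1 2 3], [2 3 3 4], 1, 4, 4);
A = double((A + A') > 0);        % symmetric 0/1 adjacency matrix
deg = full(sum(A, 2));           % deg(i) = |N(i)|
C = full(A * A);                 % C(i,j) = number of common neighbours
U = deg + deg' - C;              % U(i,j) = |N(i) union N(j)|
J = zeros(size(C));
J(U > 0) = C(U > 0) ./ U(U > 0); % Jaccard coefficient; 0 where the union is empty
disp(J);
```

The dense matrices C, U and J are only practical for small graphs. For
GSet-sized inputs a pairwise computation restricted to the pairs of interest
would be needed.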

Step 1: If you have a '.mat' file that has
(a) a sparse matrix Problem.W1 containing edges whose nonnegative weights
indicate dissimilarity between nodes, and
(b) a sparse matrix Problem.W2 containing edges whose nonnegative weights
indicate similarity between nodes,
then store it in the directory 'CCGraphs' located in the same path as
'CorClus.m', and go to Step 2.
Otherwise, store your input graph ('.mat' file) in the directory 'Datasets'
located in the same path as 'CorClus.m', and run the following at the MATLAB
command prompt:
CreateCCGraph_Labels(filename)
For example, filename = 'G1.mat'
This will create a new dataset in the directory 'CCGraphs' with the name
['L-',filename]. Depending on the size of the graph, this step might take
a while.
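
For readers who already have labelled edges, the following is a minimal
sketch of how a '.mat' file in the format of (a) and (b) could be built by
hand. The 4-node edge lists, the weights and the file name 'toy.mat' are
hypothetical. Storing both matrices symmetrically is an assumption, not
something this file specifies.

```matlab
n = 4;                                        % number of nodes (toy example)
% Pairs with nonnegative dissimilarity weights -> Problem.W1
W1 = sparse([1 2], [3 4], [1.0 0.5], n, n);
% Pairs with nonnegative similarity weights -> Problem.W2
W2 = sparse([1 3], [2 4], [2.0 1.5], n, n);
% Assumption: undirected graph, so each matrix is stored symmetrically
Problem.W1 = W1 + W1';
Problem.W2 = W2 + W2';
% Save into 'CCGraphs', next to 'CorClus.m'
if ~exist('CCGraphs', 'dir')
    mkdir('CCGraphs');
end
save(fullfile('CCGraphs', 'toy.mat'), 'Problem');
```

The saved file would then be passed to CorClus by name, following the
calling convention given in Step 2.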

Step 2: Execute the following:
(a) If you want to generate clusters on a random graph
CorClus('R', [V,degree], e1, max_time, MemLog)
where V is the number of nodes and 'degree' is the average degree of each node
Example: CorClus('R',[100,4],0.05,3600,0);

(b) If you want to generate clusters on your input graph
CorClus('S',filename, e1, max_time, MemLog)
where filename is the '.mat' file that contains the graph information in the
format given in Step 1.
Example: filename = 'L-G1.mat';
CorClus('S',filename,0.05,3600,0);

Set e1 = epsilon, the relative error tolerance used when solving the SDP (for
more information about the parameter, please refer to Section 4 in the paper)
Optionally, set
(i) MemLog = 1 if you want to track memory usage
(ii) max_time = max time to run the algorithm (in seconds)

The output is stored in the 'output_CC' directory

Implementing Max-k-Cut:

The code 'MaxkCut.m' generates k partitions for input graphs, where k is
specified by the user. Please refer to the paper for more information about
the algorithm. You can either generate partitions for a specified input graph
or for a random graph that is generated by the code.

(a) If you want to generate k partitions for a random graph,
execute the following:
MaxkCut('R', [V,degree], e1, k, max_time, MemLog)
where V is the number of nodes and 'degree' is the average degree of each node
Example: MaxkCut('R',[100,4],0.05,3,3600,0);

(b) If you want to generate k partitions for an input graph, first save the
'.mat' file in the 'Datasets' directory located in the same path as
'MaxkCut.m', and then execute the following:
MaxkCut('S', filename, e1, k, max_time, MemLog)
Example: filename = 'G1.mat';
MaxkCut('S',filename,0.05,3,3600,0);

Set
e1 = epsilon, the relative error tolerance used when solving the SDP (for more
information about the parameter, please refer to Section 3 in the paper)
k = the number of partitions
Optionally, set
(i) MemLog = 1 if you want to track memory usage
(ii) max_time = max time to run the algorithm (in seconds)

The output is stored in the 'output_MkC' directory

Note: The results given in the paper were generated after setting the seed
value to 10 (using rng(10)) at the beginning of the execution of the code
for each instance.
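
The following sketch shows what fixing the seed does. Re-seeding with the
same value restarts the random stream, so the same random numbers are drawn
again. Whether rng(10) needs to be called by the user or is already set
inside the code is not stated here, so the commented call at the end is an
assumption about usage.

```matlab
rng(10);                 % seed value used for the results in the paper
a = rand(1, 3);
rng(10);                 % re-seeding restarts the same random stream
b = rand(1, 3);
disp(isequal(a, b));     % a and b hold identical values
% Assumed per-instance usage, following the calling convention above:
% rng(10); MaxkCut('S', 'G1.mat', 0.05, 3, 3600, 0);
```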
