% \documentclass{uai2022} % for initial submission
\documentclass[accepted]{uai2022} % after acceptance, for a revised
                                    % version; also before submission to
                                    % see how the non-anonymous paper
                                    % would look like
%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2022} % ptmx math instead of Computer
                                         % Modern (has noticable issues)
% \documentclass[mathfont=newtx]{uai2022} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
% \usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams
\usepackage{algorithm2e}
\usepackage{amssymb}
\usepackage{commath}
%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example
\newcommand{\chien}[1]{\textcolor{black}{#1}}
\newcommand{\chiencr}[1]{\textcolor{black}{#1}}
\newcommand{\jaakkocr}[1]{\textcolor{black}{#1}}
% \title{Instructions for Authors: Title in Title Case}
\title{Nonparametric Exponential Family Graph Embeddings for Multiple Representation Learning}
% The standard author block has changed for UAI 2022 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
% \author[1]{\href{mailto:<jj@example.edu>?Subject=Your UAI 2022 paper}{Jane~J.~von~O'L\'opez}{}}
\author[1]{Chien Lu}
\author[1]{Jaakko Peltonen}
\author[1]{Timo Nummenmaa}
\author[1]{Jyrki Nummenmaa}
% \author[3]{Further~Coauthor}
% \author[3,1]{Further~Coauthor}
% Add affiliations after the authors
\affil[1]{%
    % Computer Science Dept.\\
    % Cranberry University\\
    % Pittsburgh, Pennsylvania, USA
    Tampere University, Finland
}
% \affil[2]{%
%     Second Affiliation\\
%     Address\\
%     …
% }
% \affil[3]{%
%     Another Affiliation\\
%     Address\\
%     …
%   }
  
  \begin{document}
\maketitle


\begin{abstract}
In graph data, 
%it is common that each node serves or carries multiple different functionalities. 
each node often serves multiple functionalities.
However, most 
%of the proposed 
graph
embedding models assume that each node can only possess one representation. We address this issue by proposing a nonparametric graph embedding model.  The 
%proposed 
model allows each node to learn multiple representations 
where they are needed to represent the 
complexity of random walks in the graph.
%the graph-generated random walks.
%content in  
%according to the complexity of the graph-generated random walks. 
%The model 
It
extends the exponential family graph embedding model with two nonparametric prior settings, the Dirichlet process and the 
% Uniform 
\chien{uniform}
process. 
The model combines the ability of 
exponential family graph embedding to take the number of occurrences of context nodes into account with the flexibility, given by nonparametric priors, to learn more than one latent representation for each node. %from the data. 
%The Exponential family graph embedding model is capable of taking the number of occurrences of context nodes into account and nonparametric priors give the flexibility for each node to learn more than one latent representation from the data. 
%We demonstrate that t
The learned embeddings 
% reflect the graph structure and the model 
outperform other state-of-the-art 
\chien{approaches}
% methods 
in  link prediction %multilabel classification 
and node classification tasks.
%number of learned embeddings reflects the structural features of the graph and can enhance the performance in conducted tasks 
%compared to other state-of-the-art methods.
%   This is the abstract for this article.
%   It should give a self-contained single-paragraph summary of the article's contents, including context, results, and conclusions.
%   Avoid citations; but if you do, you must give essentially the whole reference.
%   For example: This whole paper is devoted to praising É. Š. Åland von Vèreweg's most recent book (“Utopia's government formation problems during the last millenium”, Springevier Publishers, 2016).
%   Also, do not put mathematical notation and abbreviations in your abstract; be descriptive.
%   So not “we solve \(x^2+A xy+y^2\), where \(A\) is an RV”, but “we solve quadratic equations in two unknowns in which a single coefficient is a random variable”.
%   The reason is that mathematical notation will not display correctly when the abstract is reused on the proceedings website, for example, and that one should not assume the abstract's reader knows the abbreviation.
%   Of course the same remarks hold for your paper's title.
\end{abstract}

\section{Introduction}\label{sec:intro}

Data in the form of graphs is 
% increasingly commonly used
\chien{increasingly common}
across disciplines, where the graph topology represents complex observations and their relationships.
%Taking the advantage of topological features of graph, graph data have been more and more imperative for its ability to deal with complex observations across different disciplines. 
%
One common challenge for such data is unsupervised representation learning (embedding), which discovers underlying functions or characterizations of nodes solely from the graph structure without 
%using the 
requiring availability of
node attributes. 
%The development has shown encouraging
Such research has shown
that the learned latent representations can be used as %training 
features for different predictive tasks with
%and deliver 
promising performance.

Despite the success of such models, most of the proposed methods consider only the co-appearance pattern of nodes in walks across a graph. 
%The number of occurrences of nodes, which signals an important trait of the network structure, is often ignored. 
The prominence of nodes in their surroundings, for example as hubs or bridges, is an important trait of the network structure but is often ignored.
%Another issue is that, 

Moreover,
it is a common phenomenon that each graph node can serve different functions or roles:
%In graphs, similar complexity of node roles will occur: 
a node can, for example, act both as a local hub for its nearby nodes and also as a crucial bridge along a path between far-off connected areas of a graph. 
%Our work provides a rigorous and efficient model to learn and represent such multiple roles. We will also add a 'walkthrough' text for Figure 1 into the main text of the paper.
However, most methods are unable to properly represent this: they are restricted to single representation learning where each node is only assigned one latent vector representation. 
\jaakkocr{A model that supports only one embedding per node tries to collapse all underlying roles of the node into one vector representation and can thereby omit necessary information: this can yield poor representations that are `in between' the roles of the node and represent none of them well, or that represent only some roles while ignoring others.}

In this paper we introduce a 
% nonparametric exponential 
\chien{novel} embedding model, which extends exponential family embedding \citep{rudolph2016exponential} with nonparametric priors and allows
%. We allow 
a node to have more than one latent representation.
We allocate such latent representations following two nonparametric priors, 
%are investigated, they are 
the Dirichlet process and the uniform process. 
%Amid the popularity of Dirichlet process, 
While Dirichlet processes are popular in nonparametric modeling,
the uniform process has been 
neglected in such models; our results show 
%extremely neglected by the research community when it comes to non-parametric models. 
%We show that 
the uniform process is a promising prior for the proposed model. 
% To tackle the common difficulty of \chien{computation} efficiency when training a nonparametric model, a 
\chien{A tailored} 
% tailored 
truncation-free inference algorithm is developed. Unlike traditional 
% truncation-based 
approaches, the algorithm introduces new latent embedding vectors 
%attentively 
over iterations, which makes inference more efficient. 

We evaluate the proposed model on two tasks, link prediction and node classification. 
Results over several datasets show that the proposed multiple representation learning method improves performance compared to state-of-the-art baselines.
%The conducted experiments show that the learned embedding vectors can enhance the performance of the predictive tasks.

The contributions of this work are:
\begin{itemize}
    %\item 
    \item \chiencr{We introduce the notion of multiple representations to graph embeddings: %allowing 
    each node 
    %is allowed to possess
    can have
    more than one latent vector representation.}
    \item \chiencr{We propose a graph embedding model leveraging Bayesian nonparametrics, which is unprecedented and challenging to do well. 
    %.The Bayesian nonparametric employeed so that 
    The number of latent representations is thus determined by the observed data.}
    \item \chiencr{In addition to the Dirichlet process, we explore the uniform process, and show it is an important option for achieving the best results.}
    \item \chiencr{We develop an adaptive inference algorithm for efficient computation.}
\end{itemize}


%We organize 
The paper is organized as follows. Section \ref{sec:back} describes background concepts. Section \ref{sec:model} introduces the proposed model. Section \ref{sec:inf} develops the inference algorithm. Section \ref{sec:exp} reports the experiments, and Section \ref{sec:conclusion} draws the conclusions.

\section{Fundamental concepts}\label{sec:back}

\chien{This section provides a brief overview of some basic concepts that are related to our approach.}

\subsection{Exponential Family Embedding}
Exponential family embedding (EFE) \citep{rudolph2016exponential} is a probabilistic extension of the CBOW embedding model \citep{mikolov2013efficient, mikolov2013distributed}. 
%In general, the concept assumes 
Observations are made of objects $v$ that occur at locations $n$ surrounded by a context which is a set of other objects.
\jaakkocr{In a traditional word embedding scenario an object would be a word and the context would be the surrounding words in a sentence; in the graph embedding scenario that we address, objects are instead nodes of a graph and contexts are other nodes on a random walk in the graph.}

Let $x_{n,v}$ denote the observed value for object $v$ at location $n$. 
Denote the context by a set $\textbf{c}_{n}=\{v'\}$ of other objects $v'$ and a vector $\tilde{\mathbf{x}}_{\textbf{c}_{n}}=\{\tilde{x}_{n,v'}\}$ of their values in the context.
\jaakkocr{In our graph embedding case, the values represent whether the object (graph node) occurs at the location and how many times the context objects (nodes) occur in the context.}

%The context 
In EFE, conditioning on the context set $\textbf{c}_{n}$ and context values $\tilde{\mathbf{x}}_{\textbf{c}_{n}}$, the observed value $x_{n,v}$ for object $v$ is assumed to be exponential family distributed:
\begin{equation}
    x_{n,v} | \textbf{c}_{n}, \tilde{\mathbf{x}}_{\textbf{c}_{n}} \sim \mathbf{ExpFam} \left(\eta_{v} \left( \textbf{c}_{n},\tilde{\mathbf{x}}_{\textbf{c}_{n}} \right), T\left(x_{n,v} \right) \right)
\end{equation}
where $\mathbf{ExpFam}$ is an exponential family distribution, $\eta_{v} \left( \textbf{c}_{n},\tilde{\mathbf{x}}_{\textbf{c}_{n}} \right)$ is the natural parameter, and $T\left(x_{n,v} \right)$ denotes the sufficient statistics. 

In EFE, each object $v$ is represented in two ways, with an embedding vector $\boldsymbol{\rho}_v \in \mathbb{R}^D$ and a context vector $\boldsymbol{\alpha}_v \in \mathbb{R}^D$ where $D$ is the embedding dimensionality. The EFE captures the co-occurrence pattern by constructing the natural parameter based on interaction between the embedding vector of the center object and the context vectors of its context objects weighted by their %corresponding 
context values.
% , 
%\chiencr{It assumes the interactions are locally linear, exploiting a special generalized linear modeling,} so that
\jaakkocr{The model can be seen as a special generalized linear model since the
natural parameter is modeled as a link function of an inner product,
so that}
\begin{equation}
    \eta_{v} \left( \textbf{c}_{n},\tilde{\mathbf{x}}_{\textbf{c}_{n}} \right) =
    g \left( \boldsymbol{\rho}_{v}^{\top} \frac{1}{|\textbf{c}_n|} 
    %\sum_{(n',v') \in \textbf{c}_n} 
    %\tilde{x}_{n',v'} \boldsymbol{\alpha}_{v'}
    \sum_{v' \in \textbf{c}_n} 
    \tilde{x}_{n,v'} \boldsymbol{\alpha}_{v'}
    \right) \;.
\end{equation}

Since $\mathbf{ExpFam}$ can be any exponential family distribution, CBOW can be seen as the special case of employing a Bernoulli distribution where the observed value $x_{n,v}$ can be either $1$ or $0$. One principal merit of the generalization to other probability distributions is the capability of capturing latent patterns by incorporating the observed values. \chiencr{For example, in a shopping cart scenario, the quantity of an observed item is modeled by the quantities of its context items (i.e., other products in the shopping cart), which are not binary but positive integers. Similarly, in a graph embedding scenario, counts of graph nodes in a context will be positive integers.}
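
As a small illustration of how context values enter the natural parameter (the object labels $a$ and $b$ and the values are hypothetical), suppose the context at location $n$ is $\textbf{c}_n=\{a,b\}$ with $\tilde{x}_{n,a}=2$ and $\tilde{x}_{n,b}=1$. Then $|\textbf{c}_n|=2$ and
\begin{equation}
    \eta_{v} \left( \textbf{c}_{n},\tilde{\mathbf{x}}_{\textbf{c}_{n}} \right) =
    g \left( \boldsymbol{\rho}_{v}^{\top} \tfrac{1}{2} \left( 2\boldsymbol{\alpha}_{a} + \boldsymbol{\alpha}_{b} \right) \right) \;,
\end{equation}
so the object observed twice contributes twice as strongly as the object observed once. With binary context values, both objects would contribute equally.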

\subsection{Random walk based node embedding}
Let $\mathcal{G} = (\mathbb{V}, \mathbb{E})$ be a graph where $\mathbb{V}$ denotes the set of vertices, and $\mathbb{E} \subseteq \mathbb{V} \times \mathbb{V}$ denotes the edge set. A random walk $\mathbf{w} = \{w_1, \hdots, w_L\}$ of length $L$ is a simulated sequence of nodes over the graph
%$\mathcal{G}$ 
where each node is chosen at random from the neighbors of the previous node.
%$w_{n-1}$ and $w_{n+1}$ is chosen from the neighbor nodes of $w_{n}$. 
\jaakkocr{Extraction of such random walks is a way to describe a graph by extracting sequence data representing graph connectivity. Such sequences can then be modeled by a generative model.}

Random walk based embedding approaches \citep{perozzi2014deepwalk, grover2016node2vec} model co-occurrence of nodes in a set of 
%simulated 
random walks $\mathcal{W}$. 
\jaakkocr{The generative process models the sequence content, and thus the graph connectivity, through embeddings of nodes: the model is conditional on the nodes and generates the sequences.}

%In graph embedding, the connectivity (edges, and sequences derived therefrom) is modeled based on node properties

Given a walk $\mathbf{w} \in \mathcal{W}$, the occurrence of node $w_{n}$ at position $n$ in the walk is conditional on the set $\mathbf{c}_n$ of its surrounding (context) nodes in the walk. The occurrence probability is modeled as depending on embedding vectors of the node and embedding vectors of the context nodes. Representation learning aims to maximize the probability of occurrence of the nodes $w_n$ given their contexts, i.e., $\prod_{\mathbf{w} \in \mathcal{W}} \prod_n p(w_n | \mathbf{c}_n)$.

%$\{w_{n-K}, \hdots, w_{n-1}, w_{n+1}, \hdots, w_{n+K}\}$ with each node presented as a vector representation. 

%More precisely, each vertex $v \in \mathbb{V}$ is mapped to a D-dimensional vector representations $\Phi :\mathbb{V} \rightarrow \mathbb{R}^{D}$, the  to optimize the conditional probability $P(w_{n}|\Phi(w_{n-K}), \hdots, \Phi(w_{n-1}), \Phi(w_{n+1}), \hdots, \Phi(w_{n+K}))$. The representation learning aims at optimizing the following objective function 

%\begin{equation}
%    \log p \left( \{x_v, x_v,\} | \Phi \right)
%\end{equation}




\subsection{Bayesian Nonparametrics}

In Bayesian nonparametric models, %are approaches where 
the number of 
% model 
parameters is not fixed in advance but 
% is 
learned during model fitting up to a potentially infinite number of parameters.
%according to how many parameters are needed to describe the data well . 
%potentially infinite;  lear
%that assume there no a fixed number of parameter but an infinite amount. 
%The pivotal setting of such model is that the 
%Crucially, t
The models are typically described as mixtures: each observation is modeled by a parameter drawn from a distribution $G$ over the space of parameters (\chien{e.g.}
% such as 
$\mathbb{R}^D$) where only a finite number of parameter values have nonzero probability, but $G$ itself is drawn as
\begin{equation}
    G \sim NP(G_0, \gamma)
\end{equation}
from a stochastic process prior $NP$ with base distribution $G_0$ and concentration parameter $\gamma$. The process $NP$ yields distributions over the parameter space, with different numbers of possible values up to a potentially infinite number, but each draw from $NP$ has a finite number. Thus fitting the model to data with the prior $NP$ will infer how many parameters are needed to describe the data.


%The prior distribution over 
%model 
%parameters is defined on an infinite-dimensional space, thus %More specifically, 
%the model assumes 
%there are potentially an infinite number of parameters; inference algorithms visit model settings with varying numbers of parameters to find how many parameters are needed to best decribe the data according to their prior and likelihood values. 
%Commonly, such models are defined as a mixture where the number of components and their probabilities are learned, so that
%only limited parameters are observed in the data.
%It has been an imperative approach to capture the complexity of the observed data.
%Such models can be described as
%\begin{equation}
%    G \sim NP(G_0, \gamma)
%\end{equation}
%where $G$ is a distribution 
%over a feature space (such as $\mathbb{R}^d$) where only a finite number of locations have nonzero probability; effectively $G$ is a choice among a finite number of possible feature values. The $G$ in turn is a draw from a stochastic process $NP$ with base distribution $G_0$ and concentration parameter $\gamma$, 

%$NP$ is a stochastic process with the base probability distribution $G_0$ and the concentration parameter $\gamma$. 

\subsection{Related Work}
Among
%When it comes to 
random walk based unsupervised node embeddings, DeepWalk \citep{perozzi2014deepwalk} is the classical method.
\citet{grover2016node2vec, ribeiro2017struc2vec} simulate variants of random walks 
%with considering 
emphasizing different structural features of the graph. \citet{celikkanat2020exponential} extend the models to different likelihoods within the 
% exponential family embedding 
\chien{EFE} framework; in their work, the context vectors are taken to represent the vertices. 
% whereas our model uses the embedding vectors.

Several models have been proposed to learn multiple representations. Among those, \citet{sun2019vgraph} determine the number of embeddings with a community detection task; \citet{liu2019single, park2020unsupervised, chen2020gaussian} impose a fixed, predefined number of embedding vectors for all nodes. The most similar method to ours is that of \citet{epasto2019single}, which uses local neighborhood clustering to generate multiple representations for nodes, where different nodes can have different numbers of embedding vectors. Those methods often depend on extra simulations \jaakkocr{of} the graph data in addition to the random walk data, whereas our method only requires the generated random walks.

Besides random walk based methods, other 
%braches of 
proposed approaches include, for example, methods based on matrix factorization
\citep{ou2016asymmetric, wang2017community, qiu2018network} and neural network based approaches \citep{li2018deeper, velickovic2019deep,wu2020comprehensive}. 

\section{Proposed Model}\label{sec:model}

The proposed model is a Bayesian nonparametric extension of exponential family node embedding. %In this section, 
We next describe the two notions and how they are used to learn multiple node representations. 
%for nodes. 
%An overall illustration is shown in 
Figure \ref{fig:graph_illus} shows an overall illustration.
\jaakkocr{In the figure, random walks are first extracted from a graph, yielding sequences whose sliding windows each contain a center node and counts of other nodes in the context. The occurrence of the center node will be modeled based on the context, where dependency is characterized using vectorial embeddings: each node has one embedding as a context node and can have multiple embeddings as a center node. The generation of the observed sequence content can be written as a graphical plate representation where nonparametric priors are used to generate the embedding vectors of center nodes, and the center and context embedding vectors together are used to generate observed values, that is, the observed center nodes in each window of a random walk.}

\begin{figure*}[t]
  \centering
  \includegraphics[width=0.68\textwidth]{graphembedding_illustration_ver2.pdf}
  \includegraphics[width=0.18\textwidth]{graphical_representation_3.png}
  \caption{Illustrations of the proposed model. \textbf{Left:} random walk (light blue) along a graph from which windows are extracted as positive samples (green) of vertices that were center nodes and counts of other nodes in their context, and corresponding negative samples (red) of vertices that did not occur in the center. \textbf{Middle:} each vertex has one or more $d$-dimensional vector representations $\boldsymbol{\rho}$ as center nodes (circles), and one representation $\boldsymbol{\alpha}$ as a context node (diamonds). \jaakkocr{The picture shows a $d=3$ dimensional example.} \textbf{Right:} graphical plate representation of the proposed model.}\label{fig:graph_illus}
\end{figure*}



\subsection{Exponential Family Node Embeddings}

Given a 
simulated 
random walk node sequence $\mathbf{w}=\{w_1, \hdots, w_L \}$ of length $L$, 
we slide windows of length $K$ along it.
%the sequence.
%from a random walk, 
%for each sliding window of length $K$, 
In each window
the center node $w_{n}$ is surrounded by context nodes $\{w_{n-K}, \hdots, w_{n-1}, w_{n+1}, \hdots, w_{n+K}\}$. For each possible vertex $v$ we denote $x_{n,v} = 1$ if it was the center node so that $w_{n} = v$, otherwise $x_{n,v} = 0$. 
The context is denoted by the set $\textbf{c}_n$ of unique vertices among the context nodes and the counts $\tilde{\mathbf{x}}_{\mathbf{c}_n}=\{\tilde{x}_{n,v'}\}$ of how many times each vertex $v'\in \textbf{c}_n$ occurred among them, where $\tilde{x}_{n,v'} \le K-1$.
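
As a hypothetical example, consider a window of $K=5$ positions centered at $n=3$ over the walk segment $(w_1, \ldots, w_5) = (b, c, a, c, d)$; the symmetric layout with two context positions on each side of the center is an assumption made here for illustration. The center node is $w_3 = a$, so $x_{3,a}=1$ and $x_{3,v}=0$ for every other vertex $v$. The context set is $\textbf{c}_3=\{b,c,d\}$ with $|\textbf{c}_3|=3$ and counts
\begin{equation}
    \tilde{x}_{3,b}=1, \quad \tilde{x}_{3,c}=2, \quad \tilde{x}_{3,d}=1 \;,
\end{equation}
each of which is at most $K-1=4$.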
%$\textbf{c}_n$ denotes t
%llection of distinct context of $w_{n}$, and for each unique vertex $v' \in \textbf{c}_n$, $v'$ occurs $x_{n,v'}$ times. 
%Note that due to the sliding-window, if $n$ is the center position, the time occurrence $x_{n,v}$ can be only $1$ or $0$, and $x_{n, v'}$ can be up to $K-1$.

\jaakkocr{We will model dependency of node occurrences along a sequence, based on distributions whose natural parameter compares observed values to their context. In more detail, the natural parameter is based on comparison of node embedding vectors that characterize what kind of surroundings each node tends to appear in. We first describe the distribution and then describe the construction of the natural parameter for different exponential families (different likelihoods).}

%Equation (\ref{eq:cooccurrencepattern}) shows 
We model the co-occurrence pattern between $w_{n}$ and the context $(\textbf{c}_n,\tilde{\mathbf{x}}_{\mathbf{c}_n})$ with an exponential family 
\begin{equation}\label{eq:cooccurrencepattern}
    x_{n,v} | \textbf{c}_n,\tilde{\mathbf{x}}_{\mathbf{c}_n} \sim \mathbf{ExpFam} \left(\eta_{n} \left( \textbf{c}_n,\tilde{\mathbf{x}}_{\mathbf{c}_n} \right), T\left(x_{n,v} \right) \right)
   % x_{i} | \textbf{x}_{c_i} \sim \mathbf{ExpFam} \left(\eta_{i} \left( \textbf{x}_{c_i} \right), T\left( x_{i} \right) \right)
\end{equation}
where $\eta_{n} \left( \textbf{c}_{n},\tilde{\mathbf{x}}_{\mathbf{c}_n} \right)$ is the natural parameter and $T\left(x_{n,v} \right)$ the sufficient statistics. 

\jaakkocr{In this work occurrence of a node is represented as a one-hot choice vector and it is modeled as a draw from an exponential family distribution whose parameters depend on the surrounding nodes.}
Concretely, if the vertex appears at the location $n$, the positive likelihood is then defined as 
\begin{equation}
    p(x_{n,v} = 1) = f(x_{n,v} = 1 | \eta_{n} \left( \textbf{c}_n,\tilde{\mathbf{x}}_{\mathbf{c}_n} \right), T\left(x_{n,v} \right))
\end{equation}
where $f$ is the corresponding probability density (or mass) function of the exponential family distribution.
For a vertex that does not appear at location $n$, the likelihood of the non-appearance (also called a `negative likelihood') is 
\begin{equation}
    p(x_{n,v} = 0) = f(x_{n,v} = 0| \eta_{n} \left( \textbf{c}_n,\tilde{\mathbf{x}}_{\mathbf{c}_n} \right), T\left(x_{n,v} \right)) \;.
\end{equation}
Since random walks only yield positive samples of vertices that occurred in the center of their windows, learning from them alone would bias the model; thus we 
use a popular negative sampling approach, and randomly generate several negative samples (5 in experiments) for each location $n$. A negative sample has the same context $(\textbf{c}_n,\tilde{\mathbf{x}}_{\mathbf{c}_n})$ as the positive sample at $n$, but its vertex $v$ is chosen at random among those that did not appear in the location, so that $x_{n,v}=0$.
In this work, we explore three different exponential family distributions: Bernoulli, Poisson, and Gaussian.
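
To make the role of the negative samples concrete, the contribution of a single location $n$ to the training objective can be sketched as follows. This is only an illustration: the symbols $\ell_n$, $v^{-}_{j}$ (the $j$-th sampled negative vertex), and $J$ (the number of negative samples) are introduced here, and the positive and negative samples are assumed to contribute independently.
\begin{equation}
    \ell_n = \log p(x_{n,w_n} = 1) + \sum_{j=1}^{J} \log p(x_{n,v^{-}_{j}} = 0)
\end{equation}
With $J=5$ as in the experiments, each observed center node is thus contrasted against five vertices that did not occur at the location.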

\textbf{Bernoulli Likelihood}. We employ the Bernoulli distribution to model the co-occurrence patterns of nodes. Let $\boldsymbol{\rho}_{n,v} \chien{\in \mathbb{R}^{D}}$ denote the embedding vector of the node $v$ at the location $n$, \chien{and let $\boldsymbol{\alpha}_v \in \mathbb{R}^{D}$ denote the context vector of the vertex $v$.} The parameter $p_n$ is then defined as 
\begin{equation}
    p_n =
    \mathcal{S} \bigg( {\boldsymbol{\rho}^{\top}_{n,v}} \frac{1}{|\textbf{c}_n|} \sum_{v' \in \textbf{c}_n} \boldsymbol{\alpha}_{v'}
    \bigg)
\end{equation}
where $\mathcal{S}$ denotes the sigmoid function $\mathcal{S}(x) = \frac{1}{1+e^{-x}}$, and $|\textbf{c}_n|$ is the number of distinct nodes in the context. The appearance of the node $v$ at the location $n$, i.e., whether $x_{n,v}=1$ or $x_{n,v}=0$, is thus sampled from a Bernoulli distribution with parameter $p_n$ so that
\begin{equation}
    x_{n,v} \sim Bern(p_n) \;.
\end{equation}
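For intuition, write $z_n = \boldsymbol{\rho}^{\top}_{n,v} \frac{1}{|\textbf{c}_n|} \sum_{v' \in \textbf{c}_n} \boldsymbol{\alpha}_{v'}$ for the inner product inside the sigmoid (the shorthand $z_n$ is introduced here only for illustration). Because $1-\mathcal{S}(z) = \mathcal{S}(-z)$, the two possible outcomes have log-probabilities
\begin{align}
    \log p(x_{n,v}=1) &= \log \mathcal{S}(z_n) \;, \\
    \log p(x_{n,v}=0) &= \log \mathcal{S}(-z_n) \;.
\end{align}
For example, $z_n = 0$ gives $p_n = 0.5$, so both outcomes are equally likely, while a large positive $z_n$ favors the appearance of $v$ and a large negative $z_n$ favors its non-appearance.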

Note that we use the Bernoulli likelihood to model only the co-appearance of the nodes, which can be seen as an extension of Skip-gram based models. The number of occurrences of nodes in the context is not taken into account. To incorporate the number of occurrences of nodes, we employ the Poisson and Gaussian distributions.

\textbf{Poisson Likelihood}. For a Poisson distribution, the parameter $\lambda_n$ is defined as 
\begin{equation}
    \lambda_n =
    \exp \bigg( \boldsymbol{\rho}^{\top}_{n,v} \frac{1}{|\textbf{c}_n|} \sum_{v' \in \textbf{c}_n} \tilde{x}_{n, v'} \boldsymbol{\alpha}_{v'} 
    \bigg)
\end{equation}
where $|\textbf{c}_n|$ is again the number of distinct nodes in the context and $\tilde{x}_{n,v'}$ denotes the number of occurrences of node $v'$ in the context. The appearance of the node $v$ is generated as
\begin{equation}
    x_{n, v} \sim Pois(\lambda_n)
\end{equation}

The pivotal difference between the Bernoulli and Poisson cases is that the latter takes the number of occurrences of nodes in the context into account when constructing the natural parameter. The Gaussian case takes the same setting.
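
As a worked illustration, let $z_n = \log \lambda_n$ denote the count-weighted inner product inside the exponential (this shorthand is introduced here only for illustration). The Poisson probability mass function $\lambda_n^{x} e^{-\lambda_n} / x!$ then gives, for the two values that $x_{n,v}$ takes in this work,
\begin{align}
    \log p(x_{n,v}=1) &= z_n - \exp(z_n) \;, \\
    \log p(x_{n,v}=0) &= -\exp(z_n) \;.
\end{align}
A context vertex that occurs twice shifts $z_n$ twice as much as one that occurs once, which is not the case in the Bernoulli construction.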

\textbf{Gaussian Likelihood}. Similar to the settings for Poisson Likelihood, the natural parameter here is defined as 
\begin{equation}
    \mu_n =\boldsymbol{\rho}^{\top}_{n,v} \frac{1}{|\textbf{c}_n|} \sum_{v' \in \textbf{c}_n} \tilde{x}_{n, v'} \boldsymbol{\alpha}_{v'}
\end{equation}
without a specific link function, and the appearance of the node $v$ at the location $n$ is generated as
\begin{equation}
    x_{n,v} \sim Norm(\mu_n, \sigma)
\end{equation}
where we set $\sigma$ as a fixed hyper-parameter; in the experiments we arbitrarily choose $\sigma$ from \{1, 5, 10\}.
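
For illustration, assuming that $\sigma$ is interpreted as the standard deviation, the log-density of an observed value is
\begin{equation}
    \log p(x_{n,v}) = -\frac{(x_{n,v}-\mu_n)^2}{2\sigma^2} - \log \left( \sigma \sqrt{2\pi} \right) \;,
\end{equation}
so a positive sample ($x_{n,v}=1$) pulls $\mu_n$ toward $1$ and a negative sample ($x_{n,v}=0$) pulls it toward $0$, with a larger $\sigma$ weakening both pulls.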

\jaakkocr{When several different likelihoods are feasible, the model choice can depend on domain expertise, or cross-validation can be used as a model selection process.}


\subsection{Nonparametric Embedding}
\label{sec:nonparametric_embedding}

Instead of restricting each vertex $v$ to 
% always 
have a single role represented, 
% by the same embedding vector
to better capture the complexity of vertex roles in a graph as observed in 
% the generated 
random walks, we \chiencr{present a multiple representation learning model which enables} \jaakkocr{each vertex to have multiple latent vector representations, so that the occurrence of}
% allow 
the vertex at each location in a walk \jaakkocr{can} arise from a different role of the vertex.
%random from the generated random walk, 
To do so,
we set a nonparametric prior on the embedding vectors $\boldsymbol{\rho}$. That is, 
%instead of restricting to a one-to-one relation from a vertex $v$ to its embedding vector $\boldsymbol{\rho}_v$, 
%instead of restricting each vertex $v$ to always have the same embedding vector,
we assume that at each location $n$, an embedding vector $\boldsymbol{\rho}_{n,v}$ is generated from a stochastic process $G_v$ specific to the vertex, so that
\begin{equation}
    \boldsymbol{\rho}_{n, v}  = \boldsymbol{\rho}_v^{(s)} \sim G_v(G_0, \gamma)
\end{equation}
where $G_v$ is a stochastic process with a base distribution $G_0$ and a concentration parameter $\gamma$. The base distribution $G_0$ has an infinite number of possible embedding vectors and $G_v$ is a draw from it allocating nonzero probability to a finite number of possibilities
%with an infinite number of possible candidate embedding vectors 
$\{\boldsymbol{\rho}_v^{(1)}, \hdots, \boldsymbol{\rho}_v^{(s)},\hdots, \boldsymbol{\rho}_v^{(S_v)}, \hdots \}$ where $S_v$ is the number of observed embedding vectors of $v$. 
%More specifically, $G_v$ is a stochastic process with a base distribution $G_0$ and a concentration parameter $\gamma$. 
We set the base distribution to be a $D$-dimensional normal distribution $N(0, \sigma_0)$. 
In experiments we set $\sigma_0 = 5$ for the Bernoulli likelihood and $\sigma_0 = 10$ for both the Poisson and Gaussian likelihoods.
%Note that 
%in this work, 
For simplicity, similar to the settings of \citet{rudolph2017structured, rudolph2018dynamic}, although we allow multiple embedding vectors $\boldsymbol{\rho}_{n,v}$  for a vertex we will use only one context vector 
% we set the context vector 
%$\boldsymbol{\alpha}^{(v)}$
$\boldsymbol{\alpha}_{v}$
%static to each vertex $v$. 
per vertex;
\jaakkocr{this setting can already generate good results in the experiments, and}
%; 
generalization to allow multiple context vectors is left for future work.
%The further generalization on the context embedding is a positive direction for the future works. 

In the following, let $\textbf{n}_v = \textbf{n}^{+}_v \cup \textbf{n}^{-}_v$ denote the locations related to vertex $v$, where $\textbf{n}^{+}_v$ denotes the locations where $v$ appears and $\textbf{n}^{-}_v$ the locations where $v$ is a negative sample. Moreover, denote by
$\textbf{n}_{v,<n}$ the subset of $\textbf{n}_v$ whose locations are before $n$, and denote by
superscript $(s)$ those locations where the embedding vector was the $s$-th embedding vector of $v$.

\textbf{Dirichlet Process}. One of the most common nonparametric process priors is
%The mostly use prior $G_v$ is 
a Dirichlet process. The predictive probability of $\boldsymbol{\rho}_{n, v}$ is defined
based on numbers of occurrences of embedding vectors
of $v$ at earlier locations $n'<n$ in positive or negative samples, so that
%\chien{
%\begin{equation}
%    P(\boldsymbol{\rho}_{n, v}|\{\boldsymbol{\rho}_{n',v}\}_{n'<n; n'\in \textbf{n}_v})
%\end{equation}
%}
%\begin{equation}
%    P(\boldsymbol{\rho}_{n, v}| \Rho_n \{\boldsymbol{\rho}_{n',v}\;\; \forall n' < n; x_{n',v}=1 \})
%\end{equation}
\begin{multline}
    P(\boldsymbol{\rho}_{n, v}|\{\boldsymbol{\rho}_{n',v}; n'\in \textbf{n}_{v,<n}\}) = \\
    %P(\boldsymbol{\rho}_{n, v}|\{\boldsymbol{\rho}_{n',v}\}_{n'<n; n'\in \textbf{n}_v}) = \\
    %P(\boldsymbol{\rho}_{n, v} | \boldsymbol{\rho}_{1,v}, \hdots, \boldsymbol{\rho}_{n-1,v}) = \\ 
    \left\{\begin{matrix}
        \frac{|\textbf{n}^{(s)}_{v,<n}|}{\sum_{s'} |\textbf{n}^{(s')}_{v,<n}|-1+\gamma} & \boldsymbol{\rho}_{n,v} = \boldsymbol{\rho}^{(s)}_v,\;\forall  \boldsymbol{\rho}^{(s)}_v \in \{\boldsymbol{\rho}^{(1)}_{v}, \hdots, \boldsymbol{\rho}^{(S_v)}_{v}\} \\ 
        \frac{\gamma}{\sum_{s'} |\textbf{n}^{(s')}_{v,<n}|-1+\gamma} & \boldsymbol{\rho}_{n,v} = \boldsymbol{\rho}^{(S_{v}+1)}_v\sim G_0
    \end{matrix}\right.
\end{multline}
where $|\textbf{n}^{(s)}_{v,<n}|$ is the number of locations before $n$ where $\boldsymbol{\rho}^{(s)}_v$ has been selected, and $\gamma$ governs the generation of a new embedding vector. 
%In general, the generation of $\boldsymbol{\rho}_{n, v}$ is condition on previously "selected" embedding vectors. 

\textbf{Uniform process}. An alternative to the Dirichlet process is the uniform process \citep{wallach2010alternative}, with the predictive probability
\begin{multline}
    P(\boldsymbol{\rho}_{n, v}|\{\boldsymbol{\rho}_{n',v}; n'\in \textbf{n}_{v,<n}\}) = \\ \left\{\begin{matrix}
        \frac{1}{S_v+\gamma}  & \boldsymbol{\rho}_{n,v} = \boldsymbol{\rho}^{(s)}_v, \forall \boldsymbol{\rho}^{(s)}_v \in \{\boldsymbol{\rho}^{(1)}_{v} \hdots \boldsymbol{\rho}^{(S_v)}_{v}\} \\ 
        \frac{\gamma}{S_v+\gamma} & \boldsymbol{\rho}_{n,v} = \boldsymbol{\rho}^{(S_v+1)}_v\sim G_0 
    \end{matrix}\right.
\end{multline}
where $S_v$ denotes the number of different embedding vectors used for $v$ before location $n$. The embedding vector $\boldsymbol{\rho}_{n,v}$ is generated independently of the occurrence frequencies of previously generated values; the generation is controlled only by the concentration parameter $\gamma$.
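
As a small worked example of the uniform process, using hypothetical values that are not taken from the experiments, suppose $S_v = 3$ embedding vectors have been used for $v$ before location $n$ and $\gamma = 0.5$. Each existing vector is then reused with probability
\begin{equation*}
    \frac{1}{S_v + \gamma} = \frac{1}{3.5} \approx 0.286 ,
\end{equation*}
and a new vector is drawn from $G_0$ with probability $\gamma / (S_v + \gamma) = 0.5 / 3.5 \approx 0.143$, regardless of how often each existing vector was selected earlier. The four probabilities sum to one, since $3 \cdot \frac{1}{3.5} + \frac{0.5}{3.5} = 1$.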

Despite its popularity, the Dirichlet process suffers from the ``rich get richer'' issue: it tends to repeat previous values and to model the first (or first few) embedding vectors as highly dominant, which can limit model flexibility. The uniform process was proposed to address this issue. Figure \ref{fig:dp_up_compare} shows an example where the Dirichlet process concentrates on the first embedding vector and the uniform process delivers smoother weights. The uniform process has been neglected by the research community, with most applications employing Dirichlet processes as priors.


\begin{figure}
  \centering
  \includegraphics[width=0.8\linewidth,page=3]{np-pois-emb-compare_n=1892.png}
  \caption{A comparison between two nonparametric priors on the embeddings of the node [YGR078C] in the Yeast dataset. (a): Weights of each embedding vector from the Dp-Pois model ($\gamma = 0.01$). (b): Weights of each embedding vector from the up-Pois model ($\gamma = 0.000001$).  }\label{fig:dp_up_compare}
\end{figure}

%Figure \ref{fig:graph_rep} 
\textbf{Overall generative process.}
The proposed model can be summarized with the generative process shown below (corresponding plate model shown in Figure \ref{fig:graph_illus}, Right): 
%shows a graphical representation of the proposed model. Subsequently, 
\begin{enumerate}
  \item For each vertex $v \in \mathbb{V}$:
  \begin{itemize}
      \item[-] $G_v \sim NP(G_0, \gamma)$
      \item[-] $\boldsymbol{\alpha}_v \sim N(0, \sigma^2_{0}I)$
  \end{itemize}
  \item For each walk $\textbf{w} = \{w_1, \hdots, w_L\} \in \mathcal{W}$:
  \begin{itemize}
    \item[-] For each location $n$:
    \begin{itemize}
    \item[-] $\boldsymbol{\rho}_{n,v} \sim G_v$
    \item[-] $\eta_{n,v} = g \left(\boldsymbol{\rho}_{n,v}^{\top} \frac{1}{|\textbf{c}_n|} \sum_{{v'} \in \textbf{c}_n} \tilde{x}_{n,v'} \boldsymbol{\alpha}_{v'}  \right)$
    \item[-] $x_{n,v} \sim P(\eta_{n, v})$
  \end{itemize}
  \end{itemize}
\end{enumerate}


%\begin{figure}
%  \centering
%  \includegraphics[width=1\linewidth,page=3]{grap%hical_representation_2.png}
%  \caption{Graphical representation of the proposed model.}\label{fig:graph_rep}
%\end{figure}

\section{Inference} \label{sec:inf}

%\subsection{Truncation-free Inference}
We adapt a truncation-free variational inference algorithm proposed by \citet{huynh2016streaming}. Using a stick-breaking construction \citep{sethuraman1994constructive}, for vertex $v$ we have 
\begin{align}
    G_v = \sum_{s=1}^{\infty} \beta^{(s)}_v \delta_{\boldsymbol{\rho}^{(s)}_v}\;, &\;\; \boldsymbol{\rho}^{(s)}_v \sim G_0 \;,\\
    \beta^{(s)}_v = \zeta^{(s)}_v \prod_{i=1}^{s-1} \left(1-{\zeta^{(i)}_v}\right), &\;\; \zeta^{(s)}_v \sim Beta(1, \gamma) \;.
\end{align}
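
For intuition, consider the hypothetical draws $\zeta^{(1)}_v = 0.5$ and $\zeta^{(2)}_v = 0.4$ (illustrative values only, not results from the experiments). The construction above then gives
\begin{align*}
    \beta^{(1)}_v &= 0.5 \;, \\
    \beta^{(2)}_v &= 0.4 \times (1 - 0.5) = 0.2 \;,
\end{align*}
which leaves a mass of $(1 - 0.5)(1 - 0.4) = 0.3$ to be shared among the remaining vectors $s \geq 3$. Since $Beta(1, \gamma)$ has mean $1/(1+\gamma)$, a smaller $\gamma$ tends to produce larger $\zeta^{(s)}_v$ and therefore concentrates the weight on the first few vectors.
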
%\begin{equation}
%    G_v = \sum_{s=1}^{\infty} \beta^{(s)}_v \delta_{\boldsymbol{\rho}^{(s)}_v},\;\; \boldsymbol{\rho}^{(s)}_v \sim G_0
%\end{equation}
%\begin{equation}
%    \beta^{(s)}_v = \zeta^{(s)}_v \prod_{i=1}^{s-1} \left(1-{\zeta^{(i)}_v}\right),\;\; \zeta^{(s)}_v \sim Beta(1, \gamma) \;.
%\end{equation}
The posterior distribution for the stick-breaking parameters $\boldsymbol{\beta}_v = (\beta^{(1)}_v, \hdots, \beta^{(S_v)}_v, \beta^{(S_{v}+1)}_v )$ is then
\begin{equation}
    (\beta^{(1)}_v, \hdots, \beta^{(S_v)}_v, \beta^{(S_v+1)}_v ) \sim Dir(\theta^{(1)}_v, \hdots, \theta^{({S_v})}_v, \gamma)
\end{equation}
where parameter $\boldsymbol{\theta}_{v}$ governs the general prevalence over all potential embedding vectors. % The overall prevalence of $\boldsymbol{\rho}^{(s)}_v$ is  $E_q [\beta^{(s)}_v ] = \sum_{s'} \theta^{(s')}_{v}$. 
For each location, the embedding vector $\boldsymbol{\rho}_{n,v}$ is decided by a label $z_{n,v}$ sampled from a Multinomial distribution
\begin{equation}
    z_{n,v} \sim Multinomial(\boldsymbol{\beta}_{v}) \;, \;
    \boldsymbol{\rho}_{n,v} = \boldsymbol{\rho}^{(z_{n,v})}_{v} \;.
\end{equation}

The variational distribution $q(z_{n,v})$ is updated as
\begin{multline}
    \exp \big( E_q\big[\ln z_{n,v} \big] \big) \propto \exp \big( E\big[\ln p(x_{n,v} | \textbf{c}_n, \tilde{\mathbf{x}}_{\mathbf{c}_n}; \boldsymbol{\rho}^{(s)}_{v}, \boldsymbol{\alpha}) \big] + \\
    E\big[\ln p(z_{n,v}|z_{\mathbf{n}_v \setminus n,v}; \gamma) \big] \big) \label{eq:update_z}
\end{multline} 
where the first term is the fitness of the selected embedding $\boldsymbol{\rho}^{(s)}_v$, and the second term is related to the prior. If the prior is a Dirichlet process, the second term in Equation \eqref{eq:update_z} is 
\begin{multline}
    E\left[\ln p(z_{n,v}|z_{\mathbf{n}_v \setminus n,v}; \gamma) \right] = \\ \left\{\begin{matrix}
        \ln \frac{E[\theta^{(s)}_{\mathbf{n}_v \setminus n, v}]}{|\textbf{n}_v|-1+\gamma} - \frac{1}{2} \frac{Var[\theta^{(s)}_{\mathbf{n}_v \setminus n, v}]}{{E[\theta^{(-s)}_v]}^2} & s \leq S \\ 
        \ln \frac{\gamma}{|\textbf{n}_v|-1+\gamma} & s > S
    \end{matrix}\right.
\end{multline}
where $\textbf{n}_v$ denotes the locations of vertex $v$ and $|\textbf{n}_v|$ denotes its size. We then have

\begin{align} \label{eq:update_theta_mean}
    E[\theta^{(s)}_{v}] &= \sum_{n \in \mathbf{n}_v} q(z_{n,v} = s) \\
    E[\theta^{(s)}_{\mathbf{n}_v \setminus n,v}] &= \sum_{n' \in \mathbf{n}_v \setminus n} q(z_{n',v} = s) \\
    Var[\theta^{(s)}_{v}] &= \sum_{n \in \mathbf{n}_v} q(z_{n,v} = s) (1-q(z_{n,v} = s))
\end{align}
\begin{align} 
    Var[\theta^{(s)}_{\mathbf{n}_v \setminus n,v}] = \sum_{n' \in \mathbf{n}_v \setminus n} q(z_{n',v} = s) (1-q(z_{n',v} = s)) \label{eq:update_theta_var}
\end{align}
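
As a small worked example of these moment updates, with hypothetical responsibilities that are not taken from the experiments, suppose vertex $v$ has three locations whose values of $q(z_{n,v} = s)$ for a given $s$ are $0.9$, $0.5$, and $0.1$. Then
\begin{align*}
    E[\theta^{(s)}_{v}] &= 0.9 + 0.5 + 0.1 = 1.5 \;, \\
    Var[\theta^{(s)}_{v}] &= 0.9 \cdot 0.1 + 0.5 \cdot 0.5 + 0.1 \cdot 0.9 = 0.43 \;.
\end{align*}
Leaving out the first location gives the corresponding leave-one-out quantities $0.5 + 0.1 = 0.6$ and $0.25 + 0.09 = 0.34$.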

On the other hand, if the prior is a uniform process, the second term in Equation (\ref{eq:update_z}) has a simpler form:
\begin{equation}
    E\left[\ln p(z_{n,v}|\gamma) \right] = \left\{\begin{matrix}
        \ln \frac{1}{|\textbf{n}_v|+\gamma} & s \leq S \\ 
        \ln \frac{\gamma}{|\textbf{n}_v|+\gamma} & s > S
    \end{matrix}\right.
\end{equation}

The truncation-free algorithm starts by setting $S = 1$, with $q(z^{(S+1)}_{n,v}) = 0$. When $E[\theta^{(S+1)}_{v}] > 1$, the algorithm sets $S = S + 1$, increasing the dimension of the vector $z_{n,v}$, and sets $q(z^{(S+1)}_{n,v}) = 0$. We can then use $\boldsymbol{\theta}_v$ to calculate the expected weight of the vector $\boldsymbol{\rho}^{(s)}_v$.
% as $\hat{\beta}^{(s)}_v = E_q\left[\beta^{(s)}_v\right] =  \frac{E_q\left[\theta^{(s)}_v\right]}{\sum_{s=1}^{S_v} E_q\left[\theta^{(s)}_v\right]}$.

\begin{equation} \label{eq:exp_weight}
    \hat{\beta}^{(s)}_v = E_q\left[\beta^{(s)}_v\right] =  \frac{E_q\left[\theta^{(s)}_v\right]}{\sum_{s'=1}^{S_v} E_q\left[\theta^{(s')}_v\right]}
\end{equation}
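
To illustrate Equation (\ref{eq:exp_weight}) with hypothetical numbers (not taken from the experiments), suppose $S_v = 3$ and $E_q[\theta^{(1)}_v] = 6$, $E_q[\theta^{(2)}_v] = 3$, $E_q[\theta^{(3)}_v] = 1$. The expected weights are then
\begin{equation*}
    \hat{\boldsymbol{\beta}}_v = \left( \tfrac{6}{10}, \tfrac{3}{10}, \tfrac{1}{10} \right) = (0.6, 0.3, 0.1) \;,
\end{equation*}
that is, each weight is the share of expected assignments that fall on the corresponding embedding vector.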

\RestyleAlgo{ruled}
\begin{algorithm}[tbh]
\SetKwInOut{Input}{input}
\SetKwInOut{Output}{output}
\caption{Inference Algorithm}\label{alg:var_inf}
\Input{Random walks $\mathcal{W}$, negative samples $\tilde{\mathcal{W}}$, initial learning rate $\xi$, number of epochs, number of mini-batches $M$
}
\Output{embedding vectors $\Phi = \{\boldsymbol{\rho}, \boldsymbol{\alpha}\}$, embedding weights $\{\boldsymbol{\hat{\beta}}\}$
}
\ForEach{$v \in \mathbb{V}$}{
Set $S_v = 1$, initialize embedding vectors $\boldsymbol{\rho}^{(1)}_v$, $\boldsymbol{\alpha}_v$}
\ForEach{epoch}{
    Divide input data into $M$ random partitions.\\
    \For{$m\gets1$ \KwTo $M$}{
        Use the subset $\mathcal{W}^{(m)}$ and $\tilde{\mathcal{W}}^{(m)}$ \\
        \ForEach{$v \in \mathbb{V}$}{
            \ForEach{$n \in \textbf{n}^{(m)}_v$}{
                update $z_{n,v}$ with Equation (\ref{eq:update_z})\\
            }
            update $\boldsymbol{\theta}_v$ with Equations (\ref{eq:update_theta_mean})--(\ref{eq:update_theta_var})\\
            calculate $\hat{\boldsymbol{\beta}}_v$ with Equation (\ref{eq:exp_weight})\\
            \If{$E[\theta^{(S_v+1)}_{v}] > 1$}{
            $S_v \gets S_v + 1$ \\
            \ForEach{$n \in \textbf{n}_v$}{
                increase the dimension of $z_{n,v}$ and set $z^{(S_v+1)}_{n,v} = 0$ \\
            }
            }
        }
        update embedding vectors $\Phi = \{\boldsymbol{\rho}, \boldsymbol{\alpha} \}$ \\
        $\Phi = \Phi - \xi * \frac{\partial \mathcal{L}}{\partial \Phi}$ \\
        the step size $\xi$ is adapted with Adam \citep{kingma2015adam}
    }
}
\end{algorithm}

\textbf{Inference of embedding vectors.}
%Let $\textbf{n}_v = \textbf{n}^{+}_v \cup \textbf{n}^{-}_v$ denote the locations related to the vertex $v$. $\textbf{n}^{+}_v$ denotes locations where the $v$ appears and $\textbf{n}^{-}_v$ denotes locations where $v$ is the negative sample. 

After updating $E_q\left[z_{n,v} \right]$, the embedding vectors are inferred by optimizing the objective function $\mathcal{L} = \mathcal{L}_{prior} + \mathcal{L}_{likelihood}$.

The term $\mathcal{L}_{prior} = \log p(\boldsymbol{\rho}) + \log p(\boldsymbol{\alpha})$ is derived, up to an additive constant, from the Gaussian prior $N(0, \sigma_0^2 I)$ on the embedding vectors:
\begin{align}
    \log p(\boldsymbol{\rho}^{(s)}_v) = \frac{{\norm{\boldsymbol{\rho}^{(s)}_v}}^2}{- 2 \sigma^2_0} \;,\;\;
    \log p(\boldsymbol{\alpha}_v) = \frac{{\norm{\boldsymbol{\alpha}_v}}^2}{- 2 \sigma^2_0}\;.
\end{align}

For Bernoulli likelihood we have
\begin{multline}
    \mathcal{L}_{likelihood} = \sum_{v \in \mathbb{V}} (\sum_{n \in \textbf{n}^{+}_v} \sum_{s \in S_v} E_q\left[z_{n,v} = s \right] \log p_n + \\
    \sum_{n \in \textbf{n}^{-}_v} \sum_{s \in S_v} E_q\left[z_{n,v}=s\right] \log (1-p_n) ) \;.
\end{multline}
For Poisson likelihood we have
\begin{multline}
    \mathcal{L}_{likelihood} = \sum_{v \in \mathbb{V}} (\sum_{n \in \textbf{n}^{+}_v} \sum_{s \in S_v} E_q\left[z_{n,v}=s \right] \left(\log \lambda_n - \lambda_n\right) \\
    - \sum_{n \in \textbf{n}^{-}_v} \sum_{s \in S_v} E_q\left[z_{n,v}=s \right] \lambda_n ) \;.
\end{multline}
For Gaussian likelihood, we have
\begin{multline}
    \mathcal{L}_{likelihood} = \sum_{v \in \mathbb{V}} (\sum_{n \in \textbf{n}^{+}_v} \sum_{s \in S_v} E_q\left[z_{n,v}=s \right] \left( \frac{\left(1- \mu_n \right)^2}{-2 \sigma^2} \right) \\
    + \sum_{n \in \textbf{n}^{-}_v} \sum_{s \in S_v} E_q\left[z_{n,v}=s \right] \left( \frac{\mu^2_n}{-2 \sigma^2} \right) ) \;.
\end{multline}
We then use gradient descent to update the embedding vectors over iterations.




\begin{table}
    \centering
    \caption{Datasets for Link Prediction}\label{tab:data_link_pred}
    \begin{tabular}{l c c c c } 
 \toprule % from booktabs package
 \bfseries Data & $|V|$ & $|E|$ & Avg.\ deg. & Density \\ 
 \midrule
 GitHub  & 37700 & 289003 & 15.332 & 0.00041 \\
 Wikipedia  & 11631 & 180020 & 30.955 & 0.00266 \\ 
 Twitch  & 7126 & 35324 & 9.914 & 0.00140 \\ 
  \bottomrule
\end{tabular}
\end{table}

\begin{table}
    \centering
    \caption{Datasets for Node Classification}\label{tab:data_node_classif}
    \begin{tabular}{l c c c c c} 
 \toprule % from booktabs package
 \bfseries Data & $|V|$ & $|E|$ & $|K|$ & Avg.\ deg. & Density \\ 
 \midrule
 LastFM &  7624 & 27806 & 18 & 7.294 & 0.00095\\
 CiteSeer &  3327 & 4237 & 6 & 2.845 & 0.00043 \\ 
 Yeast & 2617 & 11855 & 13 & 9.060 & 0.00346 \\
  \bottomrule
\end{tabular}
\end{table}


\begin{table*}
    \centering
    \caption{Results for Link Prediction (AUC)}\label{tab:link_pred}
    \begin{tabular}{l c c c | c c c | c c c} 
 \toprule % from booktabs package
 \bfseries  &  & GitHub &  &  & Wikipedia &  &  & Twitch & \\
   & D = 50 & D = 100 & D = 150 & D = 50 & D = 100 & D = 150 & D = 50 & D = 100 & D = 150 \\
 \midrule
 Deepwalk    &  0.722 & 0.695 & 0.694 & 0.911 & 0.915 & 0.922 & 0.659 & 0.649 & 0.672\\
 node2vec    &  0.731 & 0.734 & 0.731 & 0.913 & 0.931 & 0.941 & 0.681 & 0.691 & 0.698 \\ 
 struc2vec   &  0.849 & 0.864 & 0.874 & 0.820 & 0.881 & 0.863 & 0.830 & 0.828 & 0.840 \\ 
 EFGE (Bern) &  0.729 & 0.726 & 0.736 & 0.939 & 0.950 & 0.962 & 0.681 & 0.687 & 0.707 \\
 EFGE (Pois) &  0.728 & 0.771 & 0.771 & 0.950 & 0.955 & 0.964 & 0.679 & 0.708 & 0.714 \\
 EFGE (Norm) &  0.862 & 0.868 & 0.888 & 0.977 & 0.983 & 0.985 & 0.791 & 0.791 & 0.802\\
 Splitter & 0.898 & 0.600 & 0.900 & 0.876 & 0.880 & 0.884 & 0.836 & 0.823 & 0.823\\
 \midrule
 dp-emb (Bern) & 0.823 & 0.831 & 0.830 & 0.986 & 0.991 & 0.991 & 0.757 & 0.787 & 0.782 \\
 dp-emb (Pois) & 0.737 & 0.723 & 0.780 & 0.979 & 0.984 & 0.986 & 0.656 & 0.704 & 0.716\\
 dp-emb (Norm) & 0.923 & \textbf{0.932}& 0.929 & 0.985 & 0.985 & 0.985 & 0.847 & 0.845 & \textbf{0.871}\\
 up-emb (Bern) & 0.813 & 0.838 & 0.843 & \textbf{0.989}   & \textbf{0.991} & \textbf{0.992} & 0.750 & 0.788 & 0.784 \\
 up-emb (Pois) & 0.741 & 0.767 & 0.780 & 0.979 & 0.982 & 0.986 & 0.658 & 0.706 & 0.714\\
 up-emb (Norm) & \textbf{0.926} & 0.932 & \textbf{0.931} & 0.985 & 0.985 & 0.986 & \textbf{0.849} & \textbf{0.846} & 0.869\\
  \bottomrule
\end{tabular}
\end{table*}


\subsection{Stochastic Inference}

We employ stochastic inference. For each epoch, the input data is randomly partitioned into $M$ mini-batches and only one mini-batch is used for each iteration. When mini-batch $m$ is used, the sum over locations $\textbf{n}_v$ can be approximated by a sum over a subsampled set $\textbf{n}^{(m)}_v$, so the right-hand side of (\ref{eq:update_theta_mean}) is approximated by 
$
\frac{|\textbf{n}_v|}
{|\textbf{n}^{(m)}_v|}
\sum_{n \in \textbf{n}^{(m)}_v} 
q(z_{n,v} = s)
$ 
and similarly in the other sums. The inference procedure is summarized in Algorithm \ref{alg:var_inf}. For all the experiments conducted in this work, we run two epochs with 1000 mini-batches and initial learning rate $\xi = 0.01$. For the negative samples, we generate $\tilde{\mathcal{W}}$ with 5 negative samples for each location following the procedure of \citet{mikolov2013distributed, celikkanat2020exponential}.
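
As a numerical illustration of the mini-batch rescaling described above, with hypothetical sizes that are not taken from the experiments, suppose $|\textbf{n}_v| = 1000$ and the current mini-batch contains $|\textbf{n}^{(m)}_v| = 10$ locations of $v$ whose values of $q(z_{n,v} = s)$ sum to $4$. The stochastic estimate of $E[\theta^{(s)}_{v}]$ is then
\begin{equation*}
    \frac{|\textbf{n}_v|}{|\textbf{n}^{(m)}_v|} \sum_{n \in \textbf{n}^{(m)}_v} q(z_{n,v} = s) = \frac{1000}{10} \times 4 = 400 \;,
\end{equation*}
so the mini-batch sum is scaled up as if the same average responsibility held over all locations of $v$.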

 
\section{Experiments} \label{sec:exp}

\jaakkocr{For generality,} we run
%evaluate our method with 
\jaakkocr{experiments on two standard tasks commonly adopted in graph embedding work,} link prediction and node classification, with 3 data sets \citep{igraph, nr, karateclub} for each task (Tables \ref{tab:data_link_pred} and \ref{tab:data_node_classif}). \jaakkocr{The data sets cover varied domains and aim to represent typical use scenarios of the proposed method.}
%uses 3 different datasets.
We denote our method variants by prior (dp: Dirichlet process, up: uniform process) and ExpFam distribution (Bern, Pois, Norm), e.g. `up-emb (Bern)'.
We compare to random walk based methods 
%including 
DeepWalk \citep{perozzi2014deepwalk}, node2vec \citep{grover2016node2vec}, struc2vec \citep{ribeiro2017struc2vec}, and EFGE \citep{celikkanat2020exponential}, as well as Splitter \citep{epasto2019single}.  
%
%To extensively evaluate the potential effect of the dimension of the embedding vectors, 
To evaluate the effect of embedding dimensionality,
for each method we run three 
%different 
dimension settings: $D = 50, 100,$ and $150$. The concentration parameter for our model is chosen from $\gamma \in \{0.01, 0.05, 0.1\}$ for the Dirichlet process and $\gamma \in \{0.0000001, 0.0000005, 0.000001\}$ for the uniform process. The input random walks are generated with the R package igraph \citep{igraph}, with $80$ walks per node of length $L = 10$; the same random walks are also fed to EFGE. 
For other methods, parameters are all set to default values.

\subsection{Task: Link Prediction}

In link prediction, for each graph we first randomly move 50\% of the edges into a held-out test set while keeping the remaining training graph connected. In both the training and test sets, randomly sampled negative edges are added in a number equal to that of the positive edges.
A classifier is trained based on the reduced training graph and the training negative edges; the classifier is used to classify the held-out test-set edges. 
\jaakkocr{As in previous single-representation learning work, including Deepwalk, node2vec, struc2vec, and EFGE, logistic regression is used as the classifier. 
In our approach, to incorporate multiple representations when training the classifier, we employ logistic regression with sample weights, where embedding $\boldsymbol{\rho}^{(s)}_{v}$ is weighted by $\hat{\beta}^{(s)}_v$.}
%Again, for Deepwalk, node2vec, struc2vec and EFGE, a logistic regression classifier is used.
%is employed to train the classifier. 
%For our model, the logistic regression with sample weights is used. 
Splitter uses maximum dot-product similarity; we transform this similarity into a class probability using logistic regression. 

\jaakkocr{Note that when logistic regression is trained with sample weighting, embeddings of all nodes in our model are separate samples weighted in the log-likelihood by their occurrence probabilities. The regression learns to classify nodes based on all their embedding vectors, and at test time, a node is classified by weighted average of class probabilities predicted for each of its embedding vectors. Thus, the multiple embedding vectors are treated separately instead of being combined in a simplistic weighted average.}
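
In symbols, the test-time rule described above can be sketched as follows, where the class index $k$ and the per-vector class probability $p(y = k \mid \boldsymbol{\rho}^{(s)}_v)$ returned by the classifier are notation introduced here only for illustration:
\begin{equation*}
    p(y = k \mid v) = \sum_{s=1}^{S_v} \hat{\beta}^{(s)}_v \, p(y = k \mid \boldsymbol{\rho}^{(s)}_v) \;.
\end{equation*}
For example, with hypothetical weights $\hat{\boldsymbol{\beta}}_v = (0.7, 0.3)$ and per-vector probabilities $0.9$ and $0.2$, the combined probability is $0.7 \cdot 0.9 + 0.3 \cdot 0.2 = 0.69$.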

Three different datasets are used for the link prediction task.

\textbf{GitHub}: a social network where each node is a GitHub developer and links between nodes are mutual follow relations.\\ 
\textbf{Wikipedia}: a network of English Wikipedia pages. Edges between pages reflect their mutual links.\\
\textbf{Twitch}: a user-user interaction network between gamers. An edge between two nodes represents mutual friendship.

% \begin{itemize}
%   \item \textbf{GitHub}: a social network of GitHub developers. Each node is a developer and Links between nodes represent mutual follow relations. 
%   \item \textbf{Wikipedia}: a page-page network of English Wikipedia pages. An edge between two nodes reflects the mutual links of them.
%   \item \textbf{Twitch}: a user-user interaction network between gamers. Edge between two nodes represents the mutual friendship.
% \end{itemize}

We evaluate the binary link classification by area under the curve (AUC). Table \ref{tab:link_pred} shows that our model performs well on all datasets: the Gaussian-likelihood variants achieve the highest AUC on GitHub and Twitch, and the Bernoulli-likelihood variants on Wikipedia.

\subsection{Task: Node Classification}

In this task, each node has a class. The learned embedding vectors are used as input features to train a classifier to predict the class of each node. 
%As in the previous single-representation learning works including Deepwalk, node2vec, struc2vec, and EFGE, logistic regression is selected as the classifier. 
%In our approach, to incorporate multiple representations when training the classifier, we employ logistic regression with sample weights, embedding $\boldsymbol{\rho}^{(s)}_{v}$ is weighted by $\hat{\beta^{(s)}_v}$. 
\jaakkocr{Again, for Deepwalk, node2vec, struc2vec, and EFGE, a logistic regression classifier is used.
For our model, logistic regression with sample weights is used.}
For Splitter, we follow the same procedure with each embedding equally weighted.
Three different datasets are used for the node classification task.

\textbf{LastFM Asia}: a network of people living in Asia using the streaming site LastFM. Links represent followership relations. The class of each node is its location.\\
\textbf{CiteSeer}: a scientific publication network from the CiteSeer digital library. Each node belongs to 1 of 6 categories.\\
\textbf{Yeast}: a protein-protein interaction network. The ``Class'' attribute of each protein is based on its function (e.g. energy).  

We evaluate performance by micro-averaged F1, reported
in Table \ref{tab:node_classification}. 
Our model outperforms the other methods in nearly all settings.
\chiencr{Additionally, our model typically took 2--4 hours to converge without a GPU, depending on the task and settings. Splitter, which also learns multiple representations for each node, took 10+ hours on a GPU machine and 100+ hours without a GPU. Our approach thus achieved better results with fewer resources.}

\begin{table*}
    \centering
    \caption{Results for Node Classification (micro-averaged F1)}
    \label{tab:node_classification}
    %\label{tab:res_lastfm}
    \begin{tabular}{l c c c c | c c c c | c c c c} 
 \toprule % from booktabs package
 \bfseries      LastFM & \multicolumn{4}{c|}{(D = 50)} & \multicolumn{4}{c|}{(D = 100)} & \multicolumn{4}{c}{(D = 150)} \\ 
           & 10\% & 30\%  & 60\%  & 90\%  & 10\% & 30\%  & 60\%  & 90\%  & 10\% & 30\%  & 60\%  & 90\%  \\
 \midrule
    Deepwalk	&	0.756	&	0.800	&	0.819	&	0.823	 &    0.754	&	0.796	&	0.819	&	0.829	 &    0.750	&	0.797	&	0.819	&	0.826	\\
node2vec	&	0.741	&	0.796	&	0.820	&	0.828	 &    0.741	&	0.802	&	0.824	&	0.829    &    0.740	&	0.799	&	0.826	&	0.834 \\
struc2vec	&	0.116	&	0.127	&	0.130	&	0.138	 &    0.128	&	0.149	&	0.165	&	0.174    &    0.131	&	0.159	&	0.178	&	0.189 \\
EFGE-Bern	&	0.749	&	0.805	&	0.826	&	0.831	 &    0.758	&	0.805	&	0.824	&	0.830    &    0.758	&	0.803	&	0.826	&	0.832 \\
EFGE-Pois	&	0.741	&	0.791	&	0.820	&	0.825	 &    0.743	&	0.793	&	0.817	&	0.822    &    0.745	&	0.798	&	0.821	&	0.825 \\
EFGE-Norm	&	0.758	&	0.807	&	0.826	&	0.832	 &    0.752	&	0.804	&	0.824	&	0.830    &    0.755	&	0.808	&	0.827	&	0.833 \\
Splitter	&	0.428	&	0.519	&	0.541	&	0.546	 &    0.426	&	0.490	&	0.530	&	0.573    &    0.451	&	0.469	&	0.533	&	0.567 \\
 \midrule
Dp-Bern &	\textbf{0.809}	&	\textbf{0.833}	&	\textbf{0.839}	&	0.833	 &    \textbf{0.810}	&	\textbf{0.835}	&	\textbf{0.846}	&	0.849    &    0.800	&	\textbf{0.835}	&	0.843	&	\textbf{0.850} \\
Dp-Pois &	0.776	&	0.821	&	0.831	&	0.833	 &    0.782	&	0.822	&	0.831	&	0.830    &    0.782	&	0.823	&	0.832	&	0.833 \\
Dp-Norm &	0.751	&	0.807	&	0.822	&	0.823	 &    0.740	&	0.804	&	0.820	&	0.821    &    0.744	&	0.807	&	0.823	&	0.831 \\
up-Bern &	0.806	&	0.831	&	0.835	&	\textbf{0.841}	 &    0.802	&	0.835	&	0.841	&	\textbf{0.852}    &    0.804	&	0.833	&	\textbf{0.844}	&	0.844 \\
up-Pois &	0.781	&	0.818	&	0.828	&	0.829	 &    0.802	&	0.835	&	0.841	&	0.852    &    0.779	&	0.821	&	0.830	&	0.834 \\
up-Norm	&	0.754	&	0.811	&	0.822	&	0.823	 &    0.742	&	0.805	&	0.821	&	0.823    &    0.733	&	0.806	&	0.821	&	0.827 \\
  \bottomrule
  \bfseries      CiteSeer & \multicolumn{4}{c|}{(D = 50)} & \multicolumn{4}{c|}{(D = 100)} & \multicolumn{4}{c}{(D = 150)} \\ 
 \midrule
Deepwalk	&	0.432	&	0.479	&	0.487	&	0.519	&	0.453	&	0.497	&	0.520	&	0.530	&	0.459	&	0.504	&	0.525	&	0.532	\\
node2vec	&	0.456	&	0.503	&	0.508	&	0.555	&	0.493	&	0.529	&	0.539	&	0.544	&	0.501	&	0.538	&	0.570	&	0.582	\\
struc2vec	&	0.224	&	0.240	&	0.278	&	0.314	&	0.226	&	0.250	&	0.274	&	0.294	&	0.224	&	0.243	&	0.254	&	0.297	\\
EFGE-Bern	&	0.468	&	0.502	&	0.508	&	0.518	&	0.477	&	0.503	&	0.516	&	0.556	&	0.478	&	0.520	&	0.532	&	0.580	\\
EFGE-Pois	&	0.460	&	0.504	&	0.501	&	0.518	&	0.497	&	0.490	&	0.491	&	0.562	&	0.497	&	0.491	&	0.499	&	0.566	\\
EFGE-Norm	&	0.456	&	0.496	&	0.503	&	0.526	&	0.473	&	0.500	&	0.516	&	0.533	&	0.471	&	0.505	&	0.533	&	0.581	\\
Splitter	&	0.164	&	0.162	&	0.183	&	0.181	&	0.169	&	0.166	&	0.188	&	0.186	&	0.165	&	0.162	&	0.166	&	0.177	\\
 \midrule
 Dp-Bern &	0.461	&	0.481	&	0.528	&	\textbf{0.589}	&	0.478	&	0.519	&	0.533	&	0.563	&	0.504	&	0.540	&	0.564	&	0.559 \\
Dp-Pois &	0.435	&	0.462	&	0.479	&	0.538	&	0.430	&	0.460	&	0.476	&	0.528	&	0.403	&	0.441	&	0.460	&	0.510 \\
Dp-Norm &	0.475	&	0.490	&	0.510	&	0.556	&	0.509	&	0.523	&	0.529	&	\textbf{0.559}	&	0.512	&	0.529	&	0.527	&	0.562 \\
up-Bern &	0.459	&	0.492	&	0.509	&	0.538	&	0.481	&	0.522	&	\textbf{0.534}	&	0.538	&	0.502	&	0.557	&	0.581	&	0.585 \\
up-Pois &	0.437	&	0.465	&	0.498	&	0.540	&	0.436	&	0.467	&	0.492	&	0.546	&	0.404	&	0.438	&	0.473	&	0.529 \\
up-Norm &	\textbf{0.518}	&	\textbf{0.527}	&	\textbf{0.532}	&	0.568	&	\textbf{0.521}	&	\textbf{0.531}	&	0.514	&	0.561	&	\textbf{0.521}	&	\textbf{0.559}	&	\textbf{0.580}	&	\textbf{0.616} \\
  \bottomrule
    \bfseries      Yeast & \multicolumn{4}{c|}{(D = 50)} & \multicolumn{4}{c|}{(D = 100)} & \multicolumn{4}{c}{(D = 150)} \\ 
 \midrule
Deepwalk	&	0.283	&	0.330	&	0.360	&	0.413	&	0.290	&	0.358	&	0.401	&	0.436	&	0.288	&	0.361	&	0.400	&	0.441	\\
node2vec	&	0.280	&	0.320	&	0.351	&	0.388	&	0.293	&	0.338	&	0.371	&	0.410	&	0.297	&	0.354	&	0.401	&	0.437	\\
struc2vec	&	0.134	&	0.150	&	0.169	&	0.256	&	0.134	&	0.153	&	0.166	&	0.238	&	0.141	&	0.161	&	0.171	&	0.225	\\
EFGE-Bern	&	0.269	&	0.324	&	0.347	&	0.380	&	0.281	&	0.339	&	0.374	&	0.418	&	0.289	&	0.349	&	0.400	&	0.414	\\
EFGE-Pois	&	0.271	&	0.320	&	0.365	&	0.373	&	0.281	&	0.331	&	0.372	&	0.399	&	0.286	&	0.339	&	0.374	&	0.409	\\
EFGE-Norm	&	0.285	&	0.325	&	0.354	&	0.383	&	0.281	&	0.332	&	0.367	&	0.405	&	0.288	&	0.354	&	0.392	&	0.428	\\
Splitter	&	0.164	&	0.207	&	0.228	&	0.246	&	0.157	&	0.214	&	0.263	&	0.263	&	0.165	&	0.211	&	0.273	&	0.297	\\
 \midrule
Dp-Bern	&	0.285	&	\textbf{0.343}	&	0.373	&	0.401	&	0.296	&	0.377	&	0.402	&	0.442	&	0.296	&	\textbf{0.376}	&	0.416	&	0.472	\\
Dp-Pois	&	0.275	&	0.328	&	0.354	&	0.375	&	0.285	&	0.327	&	0.360	&	0.383	&	0.301	&	0.338	&	0.375	&	0.402	\\
Dp-Norm	&	0.285	&	0.330	&	0.364	&	0.352	&	0.277	&	0.339	&	0.352	&	0.381	&	0.266	&	0.350	&	0.382	&	0.407	\\
up-Bern	&	\textbf{0.290}	&	0.338	&	\textbf{0.382}	&	\textbf{0.414}	&	\textbf{0.301}	&	\textbf{0.361}	&	\textbf{0.406}	&	\textbf{0.443}	&	\textbf{0.304}	&	0.367	&	\textbf{0.419}	&	\textbf{0.479}	\\
up-Pois	&	0.281	&	0.336	&	0.358	&	0.392	&	0.288	&	0.326	&	0.355	&	0.385	&	0.277	&	0.348	&	0.395	&	0.426	\\
up-Norm	&	0.282	&	0.340	&	0.372	&	0.393	&	0.289	&	0.345	&	0.391	&	0.381	&	0.288	&	0.320	&	0.364	&	0.382	\\
  \bottomrule
\end{tabular}
\end{table*}



% \begin{table*}
%     \centering
%     \caption{Results for Node Classification}\label{tab:res_lastfm}
%     \begin{tabular}{l c c c c | c c c c | c c c c} 
%  \toprule % from booktabs package
%  \bfseries      LastFM & \multicolumn{4}{c}{(D = 50)} & \multicolumn{4}{c}{(D = 100)} & \multicolumn{4}{c}{(D = 150)} \\ 
%           & 10\% & 30\%  & 60\%  & 90\%  & 10\% & 30\%  & 60\%  & 90\%  & 10\% & 30\%  & 60\%  & 90\%  \\
%  \midrule
%     Deepwalk	&	0.756	&	0.800	&	0.819	&	0.823	 &    0.754	&	0.796	&	0.819	&	0.829	 &    0.750	&	0.797	&	0.819	&	0.826	\\
% node2vec	&	0.741	&	0.796	&	0.820	&	0.828	 &    0.741	&	0.802	&	0.824	&	0.829    &    0.740	&	0.799	&	0.826	&	0.834 \\
% struc2vec	&	0.116	&	0.127	&	0.130	&	0.138	 &    0.128	&	0.149	&	0.165	&	0.174    &    0.131	&	0.159	&	0.178	&	0.189 \\
% EFGE-Bern	&	0.749	&	0.805	&	0.826	&	0.831	 &    0.758	&	0.805	&	0.824	&	0.830    &    0.758	&	0.803	&	0.826	&	0.832 \\
% EFGE-Pois	&	0.741	&	0.791	&	0.820	&	0.825	 &    0.743	&	0.793	&	0.817	&	0.822    &    0.745	&	0.798	&	0.821	&	0.825 \\
% EFGE-norm	&	0.758	&	0.807	&	0.826	&	0.832	 &    0.752	&	0.804	&	0.824	&	0.830    &    0.755	&	0.808	&	0.827	&	0.833 \\
% Splitter	&	0.428	&	0.519	&	0.541	&	0.546	 &    0.426	&	0.490	&	0.530	&	0.573    &    0.451	&	0.469	&	0.533	&	0.567 \\
%  \midrule
% Dp-Bern &	\textbf{0.809}	&	\textbf{0.833}	&	\textbf{0.839}	&	0.833	 &    \textbf{0.810}	&	\textbf{0.835}	&	\textbf{0.846}	&	0.849    &    0.800	&	\textbf{0.835}	&	0.843	&	\textbf{0.850} \\
% Dp-Pois &	0.776	&	0.821	&	0.831	&	0.833	 &    0.782	&	0.822	&	0.831	&	0.830    &    0.782	&	0.823	&	0.832	&	0.833 \\
% Dp-norm &	0.751	&	0.807	&	0.822	&	0.823	 &    0.740	&	0.804	&	0.820	&	0.821    &    0.744	&	0.807	&	0.823	&	0.831 \\
% up-Bern &	\textbf{0.806}	&	0.831	&	0.835	&	\textbf{0.841}	 &    0.802	&	0.835	&	0.841	&	\textbf{0.852}    &    0.804	&	0.833	&	\textbf{0.844}	&	0.844 \\
% up-Pois &	0.781	&	0.818	&	0.828	&	0.829	 &    0.802	&	0.835	&	0.841	&	0.852    &    0.779	&	0.821	&	0.830	&	0.834 \\
% up-norm	&	0.754	&	0.811	&	0.822	&	0.823	 &    0.742	&	0.805	&	0.821	&	0.823    &    0.733	&	0.806	&	0.821	&	0.827 \\
%   \bottomrule
% \end{tabular}
% \end{table*}

% \begin{table*}
%     \centering
%     \caption{Results for Node Classification}\label{tab:res_citeseer}
%     \begin{tabular}{l c c c c | c c c c | c c c c} 
%  \toprule % from booktabs package
%  \bfseries      Citeseer & \multicolumn{4}{c}{(D = 50)} & \multicolumn{4}{c}{(D = 100)} & \multicolumn{4}{c}{(D = 150)} \\ 
%           & 10\% & 30\%  & 60\%  & 90\%  & 10\% & 30\%  & 60\%  & 90\%  & 10\% & 30\%  & 60\%  & 90\%  \\
%  \midrule
% Deepwalk	&	0.432	&	0.479	&	0.487	&	0.519	&	0.453	&	0.497	&	0.520	&	0.530	&	0.459	&	0.504	&	0.525	&	0.532	\\
% node2vec	&	0.456	&	0.503	&	0.508	&	0.555	&	0.493	&	0.529	&	0.539	&	0.544	&	0.501	&	0.538	&	0.570	&	0.582	\\
% struc2vec	&	0.224	&	0.240	&	0.278	&	0.314	&	0.226	&	0.250	&	0.274	&	0.294	&	0.224	&	0.243	&	0.254	&	0.297	\\
% EFGE-Bern	&	0.468	&	0.502	&	0.508	&	0.518	&	0.477	&	0.503	&	0.516	&	0.556	&	0.478	&	0.520	&	0.532	&	0.580	\\
% EFGE-Pois	&	0.460	&	0.504	&	0.501	&	0.518	&	0.497	&	0.490	&	0.491	&	0.562	&	0.497	&	0.491	&	0.499	&	0.566	\\
% EFGE-Norm	&	0.456	&	0.496	&	0.503	&	0.526	&	0.473	&	0.500	&	0.516	&	0.533	&	0.471	&	0.505	&	0.533	&	0.581	\\
% Splitter	&	0.164	&	0.162	&	0.183	&	0.181	&	0.169	&	0.166	&	0.188	&	0.186	&	0.165	&	0.162	&	0.166	&	0.177	\\
%  \midrule
%  Dp-Bern &	0.461	&	0.481	&	0.528	&	\textbf{0.589}	&	0.478	&	0.519	&	0.533	&	0.563	&	0.504	&	0.540	&	0.564	&	0.559 \\
% Dp-Pois &	0.435	&	0.462	&	0.479	&	0.538	&	0.430	&	0.460	&	0.476	&	0.528	&	0.403	&	0.441	&	0.460	&	0.510 \\
% Dp-Norm &	0.475	&	0.490	&	0.510	&	0.556	&	0.509	&	0.523	&	0.529	&	\textbf{0.559}	&	0.512	&	0.529	&	0.527	&	0.562 \\
% up-Bern &	0.459	&	0.492	&	0.509	&	0.538	&	0.481	&	0.522	&	\textbf{0.534}	&	0.538	&	0.502	&	0.557	&	0.581	&	0.585 \\
% up-Pois &	0.437	&	0.465	&	0.498	&	0.540	&	0.436	&	0.467	&	0.492	&	0.546	&	0.404	&	0.438	&	0.473	&	0.529 \\
% up-Norm &	\textbf{0.518}	&	\textbf{0.527}	&	\textbf{0.532}	&	0.568	&	\textbf{0.521}	&	\textbf{0.531}	&	0.514	&	0.561	&	\textbf{0.521}	&	\textbf{0.559}	&	\textbf{0.580}	&	\textbf{0.616} \\
%   \bottomrule
% \end{tabular}
% \end{table*}


% \begin{table*}
%     \centering
%     \caption{Results for Node Classification}\label{tab:res_yeast}
%     \begin{tabular}{l c c c c | c c c c | c c c c} 
%  \toprule % from booktabs package
%  \bfseries      Yeast & \multicolumn{4}{c}{(D = 50)} & \multicolumn{4}{c}{(D = 100)} & \multicolumn{4}{c}{(D = 150)} \\ 
%           & 10\% & 30\%  & 60\%  & 90\%  & 10\% & 30\%  & 60\%  & 90\%  & 10\% & 30\%  & 60\%  & 90\%  \\
%  \midrule
% Deepwalk	&	0.283	&	0.330	&	0.360	&	0.413	&	0.290	&	0.358	&	0.401	&	0.436	&	0.288	&	0.361	&	0.400	&	0.441	\\
% node2vec	&	0.280	&	0.320	&	0.351	&	0.388	&	0.293	&	0.338	&	0.371	&	0.410	&	0.297	&	0.354	&	0.401	&	0.437	\\
% struc2vec	&	0.134	&	0.150	&	0.169	&	0.256	&	0.134	&	0.153	&	0.166	&	0.238	&	0.141	&	0.161	&	0.171	&	0.225	\\
% EFGE-Bern	&	0.269	&	0.324	&	0.347	&	0.380	&	0.281	&	0.339	&	0.374	&	0.418	&	0.289	&	0.349	&	0.400	&	0.414	\\
% EFGE-Pois	&	0.271	&	0.320	&	0.365	&	0.373	&	0.281	&	0.331	&	0.372	&	0.399	&	0.286	&	0.339	&	0.374	&	0.409	\\
% EFGE-Norm	&	0.285	&	0.325	&	0.354	&	0.383	&	0.281	&	0.332	&	0.367	&	0.405	&	0.288	&	0.354	&	0.392	&	0.428	\\
% Splitter	&	0.164	&	0.207	&	0.228	&	0.246	&	0.157	&	0.214	&	0.263	&	0.263	&	0.165	&	0.211	&	0.273	&	0.297	\\
%  \midrule
% Dp-Bern	&	0.285	&	\textbf{0.343}	&	0.373	&	0.401	&	0.296	&	0.377	&	0.402	&	0.442	&	0.296	&	\textbf{0.376}	&	0.416	&	0.472	\\
% Dp-Pois	&	0.275	&	0.328	&	0.354	&	0.375	&	0.285	&	0.327	&	0.360	&	0.383	&	0.301	&	0.338	&	0.375	&	0.402	\\
% Dp-Norm	&	0.285	&	0.330	&	0.364	&	0.352	&	0.277	&	0.339	&	0.352	&	0.381	&	0.266	&	0.350	&	0.382	&	0.407	\\
% up-Bern	&	\textbf{0.290}	&	0.338	&	\textbf{0.382}	&	\textbf{0.414}	&	\textbf{0.301}	&	\textbf{0.361}	&	\textbf{0.406}	&	\textbf{0.443}	&	\textbf{0.304}	&	0.367	&	\textbf{0.419}	&	\textbf{0.479}	\\
% up-Pois	&	0.281	&	0.336	&	0.358	&	0.392	&	0.288	&	0.326	&	0.355	&	0.385	&	0.277	&	0.348	&	0.395	&	0.426	\\
% up-Norm	&	0.282	&	0.340	&	0.372	&	0.393	&	0.289	&	0.345	&	0.391	&	0.381	&	0.288	&	0.320	&	0.364	&	0.382	\\
%   \bottomrule
% \end{tabular}
% \end{table*}

%   (D = 100) \\
% \midrule
%   Deepwalk	&	0.754	&	0.796	&	0.819	&	0.829	\\
%node2vec	&	0.741	&	0.802	&	0.824	&	0.829	\\
%struc2vec	&	0.128	&	0.149	&	0.165	&	0.174	\\
%EFGE-Bern	&	0.758	&	0.805	&	0.824	&	0.830	\\
%EFGE-Pois	&	0.743	&	0.793	&	0.817	&	0.822	\\
%EFGE-norm	&	0.752	&	0.804	&	0.824	&	0.830	\\
%Splitter	&	0.426	&	0.490	&	0.530	&	0.573	\\
% \midrule
%Dp-Bern	&	0.810	&	0.835	&	0.846	&	0.849	\\
%Dp-Pois	&	0.782	&	0.822	&	0.831	&	0.830	\\
%Dp-norm	&	0.740	&	0.804	&	0.820	&	0.821	\\
%up-Bern &	0.802	&	0.835	&	0.841	&	0.852	\\
%up-Pois	&	0.778	&	0.823	&	0.832	&	0.831	\\
%up-norm &	0.742	&	0.805	&	0.821	&	0.823	\\
%  \bottomrule
%(D = 150) \\
% \midrule
%    Deepwalk	&	0.750	&	0.797	&	0.819	&	0.826	\\
%node2vec	&	0.740	&	0.799	&	0.826	&	0.834	\\
%struc2vec	&	0.131	&	0.159	&	0.178	&	0.189	\\
%EFGE-Bern	&	0.758	&	0.803	&	0.826	&	0.832	\\
%EFGE-Pois	&	0.745	&	0.798	&	0.821	&	0.825	\\
%EFGE-norm	&	0.755	&	0.808	&	0.827	&	0.833	\\
%Splitter	&	0.451	&	0.469	&	0.533	&	0.567	\\
% \midrule
%    Dp-Bern	&	0.800	&	0.835	&	0.843	&	0.850	\\
%Dp-Pois	&	0.782	&	0.823	&	0.832	&	0.833	\\
%Dp-norm	&	0.744	&	0.807	&	0.823	&	0.831	\\
%up-Bern	&	0.804	&	0.833	&	0.844	&	0.844	\\
%up-Pois	&	0.779	&	0.821	&	0.830	&	0.834	\\
%up-norm	&	0.733	&	0.806	&	0.821	&	0.827	\\
%  \bottomrule


% MOVE THIS FULL TABLE TO SUPPLEMENTARY
% \begin{table*}
%     \centering
%     \caption{Results for Node Classification}\label{tab:res_lastfm}
%     \begin{tabular}{l c c c c c c c c c c} 
%  \toprule % from booktabs package
%  \bfseries      LastFM \\(D = 50) 
%  \bfseries   &  10\% & 20\% & 30\%  & 40\%  & 50\%  & 60\%  & 70\%  & 80\%  & 90\%  \\
%  \midrule
%     Deepwalk	&	0.756	&	0.784	&	0.800	&	0.810	&	0.815	&	0.819	&	0.821	&	0.822	&	0.823	\\
% node2vec	&	0.741	&	0.771	&	0.796	&	0.809	&	0.816	&	0.820	&	0.825	&	0.826	&	0.828	\\
% struc2vec	&	0.116	&	0.122	&	0.127	&	0.129	&	0.129	&	0.130	&	0.133	&	0.138	&	0.138	\\
% EFGE-Bern	&	0.749	&	0.786	&	0.805	&	0.815	&	0.822	&	0.826	&	0.830	&	0.830	&	0.831	\\
% EFGE-Pois	&	0.741	&	0.773	&	0.791	&	0.809	&	0.816	&	0.820	&	0.822	&	0.824	&	0.825	\\
% EFGE-norm	&	0.758	&	0.787	&	0.807	&	0.816	&	0.823	&	0.826	&	0.829	&	0.831	&	0.832	\\
% Splitter	&	0.428	&	0.487	&	0.519	&	0.527	&	0.521	&	0.541	&	0.541	&	0.555	&	0.546	\\
%  \midrule
% Dp-Bern &	0.809	&	0.826	&	0.833	&	0.832	&	0.835	&	0.839	&	0.841	&	0.840	&	0.833	\\
% Dp-Pois &	0.776	&	0.811	&	0.821	&	0.826	&	0.829	&	0.831	&	0.831	&	0.832	&	0.833	\\
% Dp-norm &	0.751	&	0.785	&	0.807	&	0.816	&	0.822	&	0.822	&	0.823	&	0.823	&	0.823	\\
% up-Bern &	0.806	&	0.826	&	0.831	&	0.837	&	0.835	&	0.835	&	0.846	&	0.840	&	0.841	\\
% up-Pois &	0.781	&	0.807	&	0.818	&	0.824	&	0.827	&	0.828	&	0.829	&	0.830	&	0.829	\\
% up-norm	&	0.754	&	0.792	&	0.811	&	0.819	&	0.821	&	0.822	&	0.823	&	0.824	&	0.823	\\
%   \bottomrule
%   (D = 100) \\
%  \midrule
%   Deepwalk	&	0.754	&	0.777	&	0.796	&	0.808	&	0.815	&	0.819	&	0.824	&	0.827	&	0.829	\\
% node2vec	&	0.741	&	0.778	&	0.802	&	0.812	&	0.819	&	0.824	&	0.825	&	0.828	&	0.829	\\
% struc2vec	&	0.128	&	0.141	&	0.149	&	0.155	&	0.158	&	0.165	&	0.165	&	0.168	&	0.174	\\
% EFGE-Bern	&	0.758	&	0.785	&	0.805	&	0.814	&	0.819	&	0.824	&	0.827	&	0.827	&	0.830	\\
% EFGE-Pois	&	0.743	&	0.774	&	0.793	&	0.807	&	0.814	&	0.817	&	0.818	&	0.821	&	0.822	\\
% EFGE-norm	&	0.752	&	0.785	&	0.804	&	0.813	&	0.819	&	0.824	&	0.827	&	0.828	&	0.830	\\
% Splitter	&	0.426	&	0.461	&	0.490	&	0.510	&	0.534	&	0.530	&	0.534	&	0.548	&	0.573	\\
%  \midrule
% Dp-Bern	&	0.810	&	0.830	&	0.835	&	0.840	&	0.843	&	0.846	&	0.845	&	0.848	&	0.849	\\
% Dp-Pois	&	0.782	&	0.813	&	0.822	&	0.826	&	0.831	&	0.831	&	0.833	&	0.834	&	0.830	\\
% Dp-norm	&	0.740	&	0.781	&	0.804	&	0.813	&	0.816	&	0.820	&	0.821	&	0.818	&	0.821	\\
% up-Bern &	0.802	&	0.827	&	0.835	&	0.839	&	0.842	&	0.841	&	0.832	&	0.842	&	0.852	\\
% up-Pois	&	0.778	&	0.813	&	0.823	&	0.828	&	0.829	&	0.832	&	0.832	&	0.832	&	0.831	\\
% up-norm &	0.742	&	0.785	&	0.805	&	0.814	&	0.820	&	0.821	&	0.823	&	0.821	&	0.823	\\
%   \bottomrule
% (D = 150) \\
%  \midrule
%     Deepwalk	&	0.750	&	0.775	&	0.797	&	0.806	&	0.813	&	0.819	&	0.822	&	0.824	&	0.826	\\
% node2vec	&	0.740	&	0.778	&	0.799	&	0.811	&	0.818	&	0.826	&	0.828	&	0.831	&	0.834	\\
% struc2vec	&	0.131	&	0.146	&	0.159	&	0.166	&	0.172	&	0.178	&	0.182	&	0.185	&	0.189	\\
% EFGE-Bern	&	0.758	&	0.787	&	0.803	&	0.814	&	0.821	&	0.826	&	0.830	&	0.831	&	0.832	\\
% EFGE-Pois	&	0.745	&	0.775	&	0.798	&	0.809	&	0.817	&	0.821	&	0.823	&	0.824	&	0.825	\\
% EFGE-norm	&	0.755	&	0.786	&	0.808	&	0.816	&	0.824	&	0.827	&	0.829	&	0.831	&	0.833	\\
% Splitter	&	0.451	&	0.455	&	0.469	&	0.491	&	0.505	&	0.533	&	0.549	&	0.546	&	0.567	\\
%  \midrule
%     Dp-Bern	&	0.800	&	0.825	&	0.835	&	0.839	&	0.844	&	0.843	&	0.846	&	0.833	&	0.850	\\
% Dp-Pois	&	0.782	&	0.813	&	0.823	&	0.827	&	0.830	&	0.832	&	0.832	&	0.836	&	0.833	\\
% Dp-norm	&	0.744	&	0.786	&	0.807	&	0.817	&	0.821	&	0.823	&	0.825	&	0.828	&	0.831	\\
% up-Bern	&	0.804	&	0.828	&	0.833	&	0.838	&	0.844	&	0.844	&	0.845	&	0.847	&	0.844	\\
% up-Pois	&	0.779	&	0.811	&	0.821	&	0.826	&	0.829	&	0.830	&	0.829	&	0.831	&	0.834	\\
% up-norm	&	0.733	&	0.783	&	0.806	&	0.815	&	0.819	&	0.821	&	0.824	&	0.821	&	0.827	\\
%   \bottomrule
% \end{tabular}
% \end{table*}

% \begin{table*}
%     \centering
%     \caption{Results for Node Classification}\label{tab:res_citeseer}
%     \begin{tabular}{l c c c c c c c c c c} 
%  \toprule % from booktabs package
%  \bfseries      Citeseer \\ (D = 50) 
%  \bfseries   &  10\% & 20\% & 30\%  & 40\%  & 50\%  & 60\%  & 70\%  & 80\%  & 90\%  \\
%  \midrule
% Deepwalk	&	0.432	&	0.474	&	0.479	&	0.478	&	0.486	&	0.487	&	0.488	&	0.487	&	0.519	\\
% node2vec	&	0.456	&	0.494	&	0.503	&	0.510	&	0.507	&	0.508	&	0.516	&	0.517	&	0.555	\\
% struc2vec	&	0.224	&	0.234	&	0.240	&	0.259	&	0.269	&	0.278	&	0.300	&	0.290	&	0.314	\\
% EFGE-Bern	&	0.468	&	0.502	&	0.502	&	0.506	&	0.502	&	0.508	&	0.508	&	0.516	&	0.518	\\
% EFGE-Pois	&	0.460	&	0.496	&	0.504	&	0.510	&	0.514	&	0.501	&	0.508	&	0.520	&	0.518	\\
% EFGE-Norm	&	0.456	&	0.485	&	0.496	&	0.498	&	0.502	&	0.503	&	0.502	&	0.508	&	0.526	\\
% Splitter	&	0.164	&	0.167	&	0.162	&	0.168	&	0.168	&	0.183	&	0.169	&	0.170	&	0.181	\\
% \midrule
% Dp-Bern &		0.461	&	0.473	&	0.481	&	0.484	&	0.495	&	0.528	&	0.559	&	0.550	&	0.589	\\
% Dp-Pois &		0.435	&	0.451	&	0.462	&	0.465	&	0.471	&	0.479	&	0.492	&	0.507	&	0.538	\\
% Dp-Norm &	0.475	&	0.488	&	0.490	&	0.491	&	0.497	&	0.510	&	0.520	&	0.527	&	0.556	\\
% up-Bern &	0.459	&	0.493	&	0.492	&	0.499	&	0.509	&	0.509	&	0.507	&	0.520	&	0.538	\\
% up-Pois &	0.437	&	0.458	&	0.465	&	0.473	&	0.482	&	0.498	&	0.510	&	0.537	&	0.540	\\
% up-Norm &	0.518	&	0.522	&	0.527	&	0.525	&	0.530	&	0.532	&	0.545	&	0.549	&	0.568	\\
%   \bottomrule
% (D = 100) \\
%  \midrule
% deepwalk	&	0.453	&	0.490	&	0.497	&	0.486	&	0.514	&	0.520	&	0.521	&	0.531	&	0.530	\\
% node2vec	&	0.493	&	0.512	&	0.529	&	0.521	&	0.540	&	0.539	&	0.547	&	0.544	&	0.544	\\
% struc2vec	&	0.226	&	0.239	&	0.250	&	0.235	&	0.284	&	0.274	&	0.265	&	0.285	&	0.294	\\
% EFGE(Bern)	&	0.477	&	0.499	&	0.503	&	0.496	&	0.516	&	0.516	&	0.514	&	0.519	&	0.556	\\
% EFGE(Pois)	&	0.497	&	0.488	&	0.490	&	0.493	&	0.503	&	0.491	&	0.498	&	0.530	&	0.562	\\
% EFGE(Norm)	&	0.473	&	0.499	&	0.500	&	0.494	&	0.524	&	0.516	&	0.520	&	0.527	&	0.533	\\
% Splitter	&	0.169	&	0.166	&	0.166	&	0.172	&	0.183	&	0.188	&	0.180	&	0.188	&	0.186	\\
% \midrule
% Dp-Bern	&	0.478	&	0.506	&	0.519	&	0.519	&	0.540	&	0.533	&	0.543	&	0.550	&	0.563	\\
% Dp-Pois &	0.430	&	0.450	&	0.460	&	0.468	&	0.469	&	0.476	&	0.478	&	0.498	&	0.528	\\
% Dp-norm &	0.509	&	0.529	&	0.523	&	0.525	&	0.528	&	0.529	&	0.551	&	0.555	&	0.559	\\
% up-Bern &	0.481	&	0.512	&	0.522	&	0.532	&	0.526	&	0.534	&	0.548	&	0.546	&	0.538	\\
% up-Pois &	0.436	&	0.456	&	0.467	&	0.474	&	0.479	&	0.492	&	0.512	&	0.529	&	0.546	\\
% up-Norm &	0.521	&	0.531	&	0.531	&	0.519	&	0.520	&	0.514	&	0.530	&	0.551	&	0.561	\\
%   \bottomrule
% (D = 150) \\
%  \midrule
% deepwalk	&	0.459	&	0.488	&	0.504	&	0.507	&	0.525	&	0.525	&	0.528	&	0.516	&	0.532	\\
% node2vec	&	0.501	&	0.526	&	0.538	&	0.549	&	0.559	&	0.570	&	0.583	&	0.577	&	0.582	\\
% struc2vec	&	0.224	&	0.236	&	0.243	&	0.250	&	0.250	&	0.254	&	0.257	&	0.280	&	0.297	\\
% EFGE-Bern	&	0.478	&	0.499	&	0.520	&	0.517	&	0.529	&	0.532	&	0.533	&	0.552	&	0.580	\\
% EFGE-Pois	&	0.497	&	0.482	&	0.491	&	0.497	&	0.515	&	0.499	&	0.516	&	0.518	&	0.566	\\
% EFGE-Norm	&	0.471	&	0.494	&	0.505	&	0.517	&	0.521	&	0.533	&	0.534	&	0.539	&	0.581	\\
% Splitter	&	0.165	&	0.162	&	0.162	&	0.164	&	0.160	&	0.166	&	0.158	&	0.172	&	0.177	\\
% \midrule
% Dp-Bern &	0.504	&	0.524	&	0.540	&	0.551	&	0.559	&	0.564	&	0.563	&	0.562	&	0.559	\\
% Dp-Pois &	0.403	&	0.427	&	0.441	&	0.451	&	0.455	&	0.460	&	0.466	&	0.478	&	0.510	\\
% Dp-Norm &	0.512	&	0.529	&	0.529	&	0.521	&	0.536	&	0.527	&	0.525	&	0.539	&	0.562	\\
% up-Bern &	0.502	&	0.543	&	0.557	&	0.570	&	0.580	&	0.581	&	0.600	&	0.595	&	0.585	\\
% up-Pois &	0.404	&	0.426	&	0.438	&	0.442	&	0.451	&	0.473	&	0.473	&	0.504	&	0.529	\\
% up-Norm &	0.521	&	0.544	&	0.559	&	0.564	&	0.575	&	0.580	&	0.598	&	0.605	&	0.616	\\
%   \bottomrule
% \end{tabular}
% \end{table*}


% \begin{table*}
%     \centering
%     \caption{Results for Node Classification}\label{tab:res_yeast}
%     \begin{tabular}{l c c c c c c c c c c} 
%  \toprule % from booktabs package
%  \bfseries      Yeast \\ (D = 50) 
%  \bfseries   &  10\% & 20\% & 30\%  & 40\%  & 50\%  & 60\%  & 70\%  & 80\%  & 90\%  \\
%  \midrule
% Deepwalk  & 0.283 & 0.309 & 0.330 & 0.336 & 0.357 & 0.360 & 0.384 & 0.386 & 0.413 \\
% node2vec  &   0.280 & 0.303 & 0.320 & 0.341 & 0.343 & 0.351 & 0.356 & 0.360 & 0.388 \\
% struc2vec  &  0.134 & 0.139 & 0.150 & 0.153 & 0.164 & 0.169 & 0.182 & 0.224 & 0.256 \\
% EFGE-Bern  &  0.269 & 0.313 & 0.324 & 0.342 & 0.346 & 0.347 & 0.357 & 0.357 & 0.380 \\
% EFGE-Pois  &  0.271 & 0.317 & 0.320 & 0.340 & 0.333 & 0.365 & 0.358 & 0.372 & 0.373 \\
% EFGE-norm  &  0.285 & 0.317 & 0.325 & 0.333 & 0.339 & 0.354 & 0.352 & 0.375 & 0.383 \\
% Splitter   &  0.164 & 0.198 & 0.207 & 0.217 & 0.231 & 0.228 & 0.236 & 0.260 & 0.246 \\
% \midrule
% Dp-Bern & 0.285 & 0.333 & 0.343 & 0.368 & 0.365 & 0.373 & 0.383 & 0.391 & 0.401 \\
% Dp-Pois & 0.275 & 0.309 & 0.328 & 0.343 & 0.351 & 0.354 & 0.351 & 0.365 & 0.375 \\
% Dp-Norm & 0.285 & 0.303 & 0.330 & 0.347 & 0.351 & 0.364 & 0.351 & 0.353 & 0.352 \\
% up-Bern & 0.290 & 0.327 & 0.338 & 0.364 & 0.363 & 0.382 & 0.383 & 0.383 & 0.414 \\
% up-Pois & 0.281 & 0.307 & 0.336 & 0.347 & 0.345 & 0.358 & 0.364 & 0.375 & 0.392 \\
% up-Norm &   0.282 & 0.324 & 0.340 & 0.357 & 0.361 & 0.372 & 0.377 & 0.370 & 0.393 \\
% \bottomrule
% (D = 100) \\
%  \midrule
%   Deepwalk & 0.290 & 0.326 & 0.358 & 0.378 & 0.393 & 0.401 & 0.406 & 0.416 & 0.436 \\
% node2vec &  0.293 & 0.322 & 0.338 & 0.358 & 0.360 & 0.371 & 0.383 & 0.393 & 0.410 \\
% struc2vec & 0.134 & 0.150 & 0.153 & 0.161 & 0.171 & 0.166 & 0.181 & 0.204 & 0.238 \\
% EFGE-Bern & 0.281 & 0.322 & 0.339 & 0.356 & 0.366 & 0.374 & 0.376 & 0.395 & 0.418 \\
% EFGE-Pois & 0.281 & 0.303 & 0.331 & 0.336 & 0.360 & 0.372 & 0.370 & 0.388 & 0.399 \\
% EFGE-norm & 0.281 & 0.317 & 0.332 & 0.353 & 0.363 & 0.367 & 0.388 & 0.402 & 0.405 \\
% Splitter &  0.157 & 0.193 & 0.214 & 0.229 & 0.249 & 0.263 & 0.249 & 0.261 & 0.263 \\
%   \midrule
%   Dp-Bern & 0.296 & 0.348 & 0.377 & 0.388 & 0.390 & 0.402 & 0.422 & 0.445 & 0.442 \\
% Dp-Pois & 0.285 & 0.316 & 0.327 & 0.351 & 0.354 & 0.360 & 0.384 & 0.379 & 0.383 \\
% Dp-Norm & 0.277 & 0.307 & 0.339 & 0.345 & 0.349 & 0.352 & 0.364 & 0.373 & 0.381 \\
% up-Bern & 0.301 & 0.350 & 0.361 & 0.384 & 0.393 & 0.406 & 0.409 & 0.432 & 0.443 \\
% up-Pois & 0.288 & 0.313 & 0.326 & 0.331 & 0.343 & 0.355 & 0.369 & 0.363 & 0.385 \\
% up-Norm & 0.289 & 0.331 & 0.345 & 0.367 & 0.365 & 0.391 & 0.391 & 0.392 & 0.381 \\
%   \bottomrule
%   (D = 150) \\
%  \midrule
% Deepwalk  & 0.288 & 0.338 & 0.361 & 0.385 & 0.383 & 0.400 & 0.413 & 0.417 & 0.441 \\
% node2vec  &   0.297 & 0.333 & 0.354 & 0.361 & 0.384 & 0.401 & 0.399 & 0.398 & 0.437 \\
% struc2vec &   0.141 & 0.154 & 0.161 & 0.168 & 0.166 & 0.171 & 0.197 & 0.202 & 0.225 \\
% EFGE-Bern &   0.289 & 0.323 & 0.349 & 0.366 & 0.375 & 0.400 & 0.397 & 0.413 & 0.414 \\
% EFGE-Pois &   0.286 & 0.317 & 0.339 & 0.364 & 0.359 & 0.374 & 0.397 & 0.408 & 0.409 \\
% EFGE-Norm &   0.288 & 0.327 & 0.354 & 0.370 & 0.377 & 0.392 & 0.391 & 0.417 & 0.428 \\
% Splitter  &   0.165 & 0.188 & 0.211 & 0.243 & 0.260 & 0.273 & 0.283 & 0.294 & 0.297 \\
%  \midrule
% Dp-Bern & 0.296 & 0.351 & 0.376 & 0.390 & 0.403 & 0.416 & 0.425 & 0.430 & 0.472 \\
% Dp-Pois & 0.301 & 0.317 & 0.338 & 0.354 & 0.366 & 0.375 & 0.369 & 0.387 & 0.402 \\
% Dp-Norm & 0.266 & 0.318 & 0.350 & 0.370 & 0.374 & 0.382 & 0.396 & 0.403 & 0.407 \\
% up-Bern & 0.304 & 0.356 & 0.367 & 0.381 & 0.414 & 0.419 & 0.420 & 0.439 & 0.479 \\
% up-Pois & 0.288 & 0.313 & 0.320 & 0.334 & 0.354 & 0.364 & 0.373 & 0.380 & 0.382 \\
% up-Norm & 0.277 & 0.319 & 0.348 & 0.371 & 0.379 & 0.395 & 0.389 & 0.404 & 0.426 \\
%  \bottomrule
% \end{tabular}
% \end{table*}









\section{\chiencr{Discussion and} Conclusions} \label{sec:conclusion}

We proposed a nonparametric exponential family graph embedding model that allows multiple representations per node, drawn under either a
Dirichlet process prior or a uniform process prior. We also provided a tailored algorithm for efficient computation. The experiments demonstrate that the learned multiple representations can enhance performance in two tasks.
\chiencr{We considered three classical exponential family distributions (Bernoulli, Poisson, and Gaussian), all of which yielded promising results. 
%However, to tackle different situations, 
Within the proposed nonparametric framework, the model can be adapted to other distributions, such as the geometric and chi-squared distributions.}
\chiencr{In our experiments, the hyperparameter $\gamma$ of the nonparametric prior was fixed across nodes, which already yielded promising results on the standard tasks; allowing $\gamma$ to differ between nodes could be useful for extending the model to scenarios 
%the proposed model can be used as a basis for extensive applications 
such as learning multiple representations for under-represented nodes or handling imbalanced classification tasks.} 
%\jaakkocr{Future work could also consider further flexibility by also allowing multiple context vectors.} 


\begin{acknowledgements}
    \chiencr{This work is supported by the Academy of Finland, decisions 312395 and 327352.}
\end{acknowledgements}

\bibliography{uai2022-template}



\end{document}





