\section{Research Problems}
In text attributed networks, nodes are characterized by textual information. For example, in social networks, the nodes (users) are
linked with other nodes by friend relationships and each node is associated with text like their profile information. Recent works show that leveraging both textual information and topological structure into network representation learning (NRL) benefits downstream tasks performances~\citep{chen2021effective,zhang2019attributed}.  However, most real-world graphs associated with people or human-related activities are often sensitive and might contain confidential information~\citep{sajadmanesh2021locally}. Releasing the representations of nodes in real world graphs gives adversaries a potential way to infer the sensitive information of nodes and edges. Thus, it is critical to develop privacy-preserving representations for applications using graphs that require users' sensitive data to fulfill users' privacy requirements.

\textbf{Challenges}
\textit{Current NRL works have a few remaining problems. First, training a Graph Neural Network (GNN) from private node data is a challenging task due to the absence of a trusted third party. Second, typical NRLs, e.g., diffusion mechanisms and random walks on graphs, only assess the pairwise relationships between two individual nodes, but ignore the higher-order interactions between nodes. Third, current GNNs ignore important interactions between global invariant features, i.e., data characteristics which are invariant under continuous transformations such as stretching, bending, and compressing.} Recently, simplicial neural networks (SNNs) offer a mathematically rigorous framework to evaluate not only higher-order interactions, but also global invariant features of the observed graph to systematically learn topological structures. These features occur in the form of homological features, intuitively perceived as holes, or voids, in any desired dimension. It is important to investigate -$\mathbf{Q1}$: how to build representation outputs from SNNs (RO\_SNNs) that integrate attribute textual information with multi-scale relationships between nodes and -$\mathbf{Q2}$: to study if the RO\_SNNs are exposed to new threats and if they are more vulnerable compared to regular representation outputs from typical GNNs. Besides, -$\mathbf{Q3}$: it is also challenging to propose \textit{secure RO\_SNNs} that even if an adversary has access to user representation outputs in the database, that adversary will still be unable to learn too much about the user's sensitive data.  
\section{Research Plan}
Currently, there are a few SNNs~\citep{keros2022dist2cycle,chen2021topological} that utilize GNN models for learning functions parametrized by the homological features to learn topological relationships of the underlying simplicial complexes. All existing SNNs are not focusing on NRL on text-attributed networks. It is non-trivial to incorporate text representations in existing SNNs due to SNNs cannot easily handle additional information during its graph convolution. Fortunately, according to the theoretical finding that NRLs are equivalent to factorize an affinity matrix $M$ derived from the adjacency matrix of the original network~\citep{yang2015network}. In $\mathbf{Q1}$, we first develop the RT4SC. The learning process includes two stages. In the first stage, we integrate text representations with regular pairwise node interactions via factorizing $M$ into the product of three matrices as $M=W^THT$, where $W \in \mathbb{R}^{k\times \cdot }$, $H \in \mathbb{R}^{k \times t}$ and
text features $T \in \mathbb{R}^{t \times \cdot}$. Then we can concatenate $W$ and $HT$ as $2k$-dimensional representations of nodes. In the second stage, we will enrich the node representations by: 1) first extracting local topological side information from subgraphs using persistent homology of the small neighborhoods of nodes; and then 2) incorporating the extracted local topological side information into the local GNN algorithm for NRL. In $\mathbf{Q2}$, we will first investigate the membership inference attack and show that an adversary can distinguish which node participates in the training of the GNN via training an inference model to recognize differences between the prediction of the model trained with the record and that of the model trained without the record. Second, the server can aggregate nodes’ representations with their neighbors to learn better user representations for improving its services. This means if there is an edge between two nodes, then their RO\_SNNs should be closer. Therefore, a potential adversary could possibly recover the sensitive edge information (e.g., friend lists) via a machine learning classifier that simply measures distance differences of the RO\_SNNs. Thus, we will study whether representations can be inverted to recover the graph used to generate
them. Graph reconstruction attacks (GRAs) try to infer the edges of graphs. Regular GRAs predict edges via measuring distance differences between linked node pairs and unlinked node pairs via clustering~\citep{he2021stealing}. However, we will propose a GRA that utilizes a graph-decoder to minimize the reconstruction loss of the generated adjacency matrix via backpropagation. We will further perform both attacks on outputs from regular GNNs and measure if RO\_SNNs are more vulnerable compared to regular outputs from GNNs. In $\mathbf{Q3}$, I will study a privacy-preserving deterministic differentially private alternating direction method of multiplier, i.e., D$^2$-ADMM, to learn secure RO\_SNNs that not only capture multi-scale relationships, but also could defend the potential attacks on an untrusted server.
\paragraph{Completed Research and Timeline}
Me and my Ph.D. advisor have been researching privacy-preserving text representations. In our paper~\citep{zhan2021multi}, we show that some of the hidden private information correlates with the output labels and therefore can be learned by a neural network. In such a case, there is a tradeoff between the utility of the representation and its privacy. We explicitly cast this problem as multi-objective optimization and propose a multiple-gradient descent algorithm that enables the efficient application of the Frank-Wolfe algorithm to search for the optimal utility privacy configuration of the text classification network. Our prior work~\citep{zhan2022new} also show that it is challenging to protect privacy while preserving important semantic information about an input text.  In particular, the threats are (1) these representations reveal sensitive attributes, no matter if they explicitly exist in the input text and (2) the representations can be partially recovered via generative models. In our recent paper~\citep{zhan2023graph}, we propose a GRA to recover a graph’s adjacency matrix from three types of representation outputs, i.e., representation outputs from graph convolutional networks, graph attention networks, and SNNs. We find that SNN outputs obtain the highest precision and AUC on five real-world networks. Therefore, the SNN outputs reveal the lowest privacy-preserving ability to defend the GRAs. Thus, it calls for future research towards building more private and higher-order representations that could defend the potential threats. My research timeline is as follows: By the date of submission, I addressed the first part of $\mathbf{Q2}$. By the workshop date, I plan to complete $\mathbf{Q1}$ and $\mathbf{Q2}$ and I will complete $\mathbf{Q3}$ after the workshop. 

