Abstract: Community detection is a crucial task in network analysis that can be significantly
improved by incorporating subject-level information, i.e., covariates. Existing methods have
shown the effectiveness of using covariates for low-degree nodes, but rarely discuss the
case where communities have significantly different density levels, i.e., multiscale networks.
In this paper, we introduce a novel method that addresses this challenge by constructing
network-adjusted covariates, which leverage the network connections and covariates with
a node-specific weight for each node. This weight can be calculated without tuning parameters.
We present novel theoretical results on the strong consistency of our method under
degree-corrected stochastic blockmodels with covariates, even in the presence of misspecification
and multiple sparse communities. Additionally, we establish a general lower bound
for the community detection problem when both the network and covariates are present,
and this bound shows that our method is optimal in connection intensity up to a constant factor.
Our method outperforms existing approaches in simulations and on a LastFM app user network.
We then compare our method with others on a statistics publication citation network
where 30% of nodes are isolated, and our method produces reasonable and balanced results.
Our method is implemented in the R package NAC.