Local-to-global Perspectives on Graph Neural Networks

Abstract

The theoretical study of Message Passing Neural Networks (MPNNs) focuses on increasing expressive power and overcoming the over-squashing and over-smoothing phenomena. Work on the expressive power of MPNNs suggests that one needs to move from local computation to global modeling to gain expressive power in terms of the Weisfeiler-Lehman hierarchy. My dissertation centers on understanding the theoretical properties of global GNNs, their relationship to local MPNNs, and how to use GNNs for coarse-graining. In particular, it consists of three parts:

Convergence of Invariant Graph Network. In this part, we investigate the convergence of one powerful GNN, the Invariant Graph Network (IGN), over graphs sampled from graphons. We first prove the stability of linear layers for general $k$-IGNs (IGNs of order $k$) based on a novel interpretation of linear equivariant layers. Building upon this result, we prove the convergence of $k$-IGNs under the model of \cite{ruiz2020graphon}, where we have access to the edge weights and the convergence error is measured for graphon inputs. Under the more natural (and more challenging) setting of \cite{keriven2020convergence}, where one can only access the 0-1 adjacency matrix sampled according to the edge probabilities, we first show a negative result: convergence is not possible for IGNs in general. We then obtain the convergence of a subset of IGNs, denoted IGN-small, after an edge-probability estimation step. We show that IGN-small still contains functions rich enough to approximate spectral GNNs arbitrarily well. Lastly, we perform experiments on various graphon models to verify our statements.
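To make the two sampling settings above concrete, the sketch below contrasts the observation models: access to the edge-probability (weighted) matrix, as in \cite{ruiz2020graphon}, versus access only to a 0-1 adjacency matrix sampled from those probabilities, as in \cite{keriven2020convergence}. The specific graphon and the crude density normalization standing in for edge-probability estimation are illustrative assumptions, not the constructions analyzed in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)

def graphon(x, y):
    # an illustrative smooth graphon W(x, y) on [0, 1]^2 (assumption, for demo only)
    return 0.5 * (1 - np.maximum.outer(x, y))

n = 200
x = rng.uniform(size=n)            # latent node positions sampled from U[0, 1]
P = graphon(x, x)                  # edge-probability ("weighted") matrix
np.fill_diagonal(P, 0.0)

# Setting in the style of \cite{ruiz2020graphon}: the model sees P itself.
weighted_graph = P

# Setting in the style of \cite{keriven2020convergence}: only a 0-1 adjacency
# matrix, sampled edge-wise from P, is observed.
U = rng.uniform(size=(n, n))
A = np.triu((U < P).astype(float), k=1)
A = A + A.T                        # symmetrize the sampled adjacency

# A simple density normalization, used here only as a stand-in for the
# edge-probability estimation step after which IGN-small converges.
rho = A.sum() / (n * (n - 1))
P_hat = A / max(rho, 1e-12)
```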

The Connection between MPNN and Graph Transformer. In this part, we study the connection between local GNNs (MPNNs) and global GNNs (Graph Transformers). The Graph Transformer (GT) has recently emerged as a new paradigm of graph learning algorithms, outperforming the previously popular Message Passing Neural Network (MPNN) on multiple benchmarks. Previous work \cite{kim2022pure} shows that with proper position embeddings, GT can approximate MPNN arbitrarily well, implying that GT is at least as powerful as MPNN. In this part, we study the inverse connection and show that MPNN with a virtual node (VN), a commonly used heuristic with little theoretical understanding, is powerful enough to approximate the self-attention layer of GT arbitrarily well.

In particular, we first show that if we consider one type of linear transformer, the so-called Performer/Linear Transformer, then MPNN + VN with only $O(1)$ depth and $O(1)$ width can approximate a self-attention layer in the Performer/Linear Transformer. Next, via a connection between MPNN + VN and DeepSets, we prove that MPNN + VN with $O(n^d)$ width and $O(1)$ depth can approximate the self-attention layer arbitrarily well, where $d$ is the input feature dimension. Lastly, under some assumptions, we provide an explicit construction of MPNN + VN with $O(1)$ width and $O(n)$ depth that approximates the self-attention layer in GT arbitrarily well.
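A minimal sketch of the intuition behind the Performer/Linear Transformer case, assuming a generic positive feature map $\phi$ in place of the actual Performer random-feature kernel (the feature map and function names below are illustrative): linear attention factors into two global sums over nodes, which a single virtual node can aggregate in one message-passing round and broadcast back, after which each node finishes its own attention output locally.

```python
import numpy as np

def phi(x):
    # hypothetical positive feature map standing in for the Performer kernel
    return np.exp(x - x.max(axis=-1, keepdims=True))

def linear_attention(Q, K, V):
    """Linear-Transformer-style attention: phi(Q) (phi(K)^T V) with a
    row-wise normalizer, computed without forming the n x n attention matrix."""
    num = phi(Q) @ (phi(K).T @ V)          # shape (n, d)
    den = phi(Q) @ phi(K).sum(axis=0)      # shape (n,)
    return num / den[:, None]

def vn_round(Q, K, V):
    """The same computation phrased as one MPNN + virtual-node round:
    the virtual node aggregates two global sums and broadcasts them."""
    S = phi(K).T @ V        # message to VN: sum_j phi(k_j) v_j^T, shape (d, d)
    z = phi(K).sum(axis=0)  # message to VN: sum_j phi(k_j), shape (d,)
    # broadcast (S, z) back; each node computes its own attention output
    return (phi(Q) @ S) / (phi(Q) @ z)[:, None]

n, d = 6, 4
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
assert np.allclose(linear_attention(Q, K, V), vn_round(Q, K, V))
```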

Graph Coarsening with Neural Networks. In this part, we study the fundamental problem of reducing graph size/resolution while preserving its spectral properties. We first propose a framework for measuring the quality of a coarsening algorithm and show that, depending on the goal, we need to carefully choose the Laplace operator on the coarse graph and the associated projection/lift operators. Motivated by the observation that the current choice of edge weights for the coarse graph may be sub-optimal, we parametrize the weight-assignment map with a GNN and train it to improve the coarsening quality in an unsupervised way. Through extensive experiments on both synthetic and real networks, we demonstrate that our method significantly improves common graph coarsening methods under various metrics, reduction ratios, graph sizes, and graph types. It generalizes to graphs of larger size (25x the size of the training graphs), is adaptive to different losses (differentiable and non-differentiable), and scales to much larger graphs than previous work.
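As a rough illustration of the kind of spectral quality measure involved, the sketch below coarsens a graph by a fixed partition, sums edge weights between clusters to form the coarse graph, and compares the leading Laplacian eigenvalues of the original and coarse graphs. These are simplified illustrative choices: the dissertation's framework selects the coarse Laplace operator and projection/lift operators more carefully, and a GNN would replace the fixed coarse edge weights with learned ones.

```python
import numpy as np

def laplacian(A):
    # combinatorial Laplacian L = D - A (self-loops cancel out)
    return np.diag(A.sum(axis=1)) - A

def coarsen(A, partition):
    """Coarsen by summing edge weights between clusters: A_c = P A P^T
    for the 0-1 cluster-assignment matrix P (one simple choice of weights)."""
    n, k = A.shape[0], max(partition) + 1
    P = np.zeros((k, n))
    P[partition, np.arange(n)] = 1.0
    return P @ A @ P.T              # coarse weighted adjacency

def spectral_error(A, A_coarse, num_eigs=3):
    """Compare the smallest Laplacian eigenvalues, a simple stand-in for
    the coarsening-quality metrics discussed above."""
    ev = np.sort(np.linalg.eigvalsh(laplacian(A)))[:num_eigs]
    ev_c = np.sort(np.linalg.eigvalsh(laplacian(A_coarse)))[:num_eigs]
    return np.abs(ev - ev_c).sum()

# toy example: a 6-node path coarsened into 3 clusters of consecutive nodes
A = np.diag(np.ones(5), 1)
A = A + A.T
print(spectral_error(A, coarsen(A, [0, 0, 1, 1, 2, 2])))
```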
