Collaborative Networked Estimation and Inference through Graph Neural Networks

Graph neural networks (GNNs) are deep learning architectures that learn powerful representations of graphs and graph signals. GNNs have shown remarkable performance across various domains, such as biology and drug discovery, quantum chemistry, robotics, social networks, and recommender systems. Their success can be attributed to their fundamental and favorable properties, including permutation invariance-equivariance, stability to deformations, and transferability.

Since GNNs perform representation learning on graph structures and graph data, a lot of research has been conducted to answer how expressive GNN representations are. A major line of research studies the ability of GNNs to perform graph isomorphism. In particular, the representation power of GNNs has been compared to that of the Weisfeiler-Lehman (WL) test. Although graph isomorphism is an important and necessary expressivity test, it is not sufficient to fully characterize the representation power of these architectures. Consequently, there has been increased interest in approaching GNN expressivity via the counting of substructures or by solving graph bi-connectivity. We revisit the representation power of GNNs in terms of counting important substructures of the graph. Additionally, we investigate their ability to generalize and transfer structural properties to arbitrary graph families. Our work is motivated by the following research question:

(P) Given a graph G without additional features; can a message-passing GNN learn to generate permutation equivariant representations that count and list graph substructures?

This problem studies the ability of a message-passing GNN to produce substructure informative representations when the input X is completely uninformative. This is interesting for two reasons. First, it provides a theoretical measure of the representation power of GNNs. Second, it has a strong practical component, as counting and listing graph substructures, also known as subgraphs, motifs, and graphlets, is pivotal in a plethora of real-world tasks. For example, in molecular chemistry the bonds between atoms or functional groups form cycles (rings) that are indicative of a molecule’s properties and can be utilized to generate molecular fingerprints. Cliques and quasi-cliques, on the other hand, characterize protein complexes in Protein-Protein Interaction networks and community structure in social networks.

Figure 1. Cycle detection accuracy for different GNN models.

We give an affirmative answer to (P) and analyze the representation power of GNNs in terms of their ability to count and list cycles, cliques, quasi-cliques, and connected components. To this end, we consider a standard GNN architecture that performs local message passing operations, combined with normalization layers. Our analysis employs tools from tensor algebra and stochastic processes to characterize the output of the GNN, when the input is a random vector. We show that a suitable choice of activation and normalization functions can generate equivariant node representations, which capture the statistical moments of the GNN output distribution.

Additionally, we derive a deterministic, closed-form expression for the GNN output, provide a rigorous proof that accurately lists all the triangles and counts cycles ranging from size 3 to 8, 4-node cliques, quasi-cliques, as well as connected components within the graph. Our results also offer novel insights into the generalization properties of message-passing GNNs. Specifically, we demonstrate that a GNN can learn to count important substructures, not only within the family of graphs observed during training but also for any arbitrary graph. This analysis is constructive and enables the development of a generic architecture that exhibits success across various tasks. Extensive numerical tests validate the effectiveness of the proposed approach across four distinct tasks: cycle detection, cycle counting, graph classification, and molecular property prediction.

Team Members

Shirin Saeedi Bidokhti1
Alejandro Ribeiro1

Collaborators

Charilaos Kanatsoulis2

1. University of Pennsylvania
2. Stanford University