What is geometric deep learning?
Geometric deep learning (GDL) is an emerging concept of Artificial Intelligence. GDL is an umbrella term encompassing emerging techniques that generalize neural networks to Euclidean and non-Euclidean domains, such as graphs, manifolds, meshes or string representations.
Or as Oxford University DeepMind Professor Michael Bronstein puts it:
“Geometric Deep Learning is an umbrella term for approaches considering a broad class of ML problems from the perspectives of symmetry and invariance. It provides a common blueprint allowing to derive from first principles neural network architectures as diverse as CNNs, GNNs, and Transformers.”
Geometric Deep Learning (GDL) is a subfield of machine learning that focuses on developing deep learning models for structured data with geometric structures such as graphs, point clouds, meshes, and manifolds.
In traditional deep learning, the data is represented as a grid of pixels or a vector of features, but in GDL, the data is represented as a graph, which is a set of nodes (vertices) and edges that connect them. GDL has shown great promise in various fields such as computer vision, drug discovery, physics, and social network analysis.
The core idea behind GDL is to generalize convolutional neural networks (CNNs), which are highly effective in processing grid-like data such as images, to non-Euclidean domains such as graphs and manifolds. In traditional CNNs, the convolution operation becomes performed on a grid of pixels. Where each pixel is connected to its neighboring pixels in a fixed pattern.
In GDL, the convolution operation becomes generalized to operate on irregular graphs where the nodes have arbitrary connections.
One of the key challenges in GDL is designing graph convolutional neural networks (GCNNs) that can effectively process graphs of varying sizes and structures. GCNNs aim to learn localized features by propagating information from neighboring nodes in a graph using message passing algorithms. The key is to design a convolution operation that respects the underlying graph structure and is translation-invariant.
There are several architectures for GCNNs, including spectral-based and spatial-based methods.
Spectral-based methods use the
eigenvalues and eigenvectors of the graph Laplacian to perform convolution operations.
Spectral, according to Professor Bronstein; “emerged from the work of Joan Bruna and coauthors [125] using the notion of the Graph Fourier transform. The roots of this construction are in the signal processing and computational harmonic analysis communities, where dealing with non-Euclidean signals has become prominent in the late 2000s and early 2010s.”
Spectral-based methods for Graph Convolutional Neural Networks (GCNNs) are a family of algorithms that perform convolutions on the spectral domain of the graph Laplacian
.
These methods leverage the spectral properties of the graph Laplacian to define the convolution operation, which makes them particularly suitable for handling graph signals with complex structures.
The main idea behind spectral-based methods is to use the eigenvalues and eigenvectors of the graph Laplacian to define the convolution operation.
The graph Laplacian is a matrix that captures the connectivity and structure of the graph. In addition, its eigenvalues and eigenvectors encode important spectral properties of the graph.
One popular spectral-based GCNN architecture is the Graph Convolutional Network (GCN), proposed by Kipf and Welling in 2017. In GCN, the convolution operation is defined in the spectral domain as:
h = \sigma(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} X W)
where h is the hidden feature representation of the graph signal, X is the input graph signal, W is the weight matrix, and \sigma(.) is a non-linear activation function such as ReLU. The matrices \hat{A} and \hat{D} are the normalized graph adjacency matrix and the degree matrix, respectively, defined as:
\hat{A} = A + I \hat{D}{ii} = \sum_j \hat{A}{ij}
where A is the adjacency matrix of the graph and I is the identity matrix.
The GCN convolution operation can become interpreted as a filtering operation in the spectral domain.
Where the graph signal is filtered by a polynomial function of the graph Laplacian. The polynomial function is defined by the weight matrix W, which is learned during training.
Another popular spectral-based GCNN architecture is the ChebNet, proposed by École Polytechnique Fédérale de Lausanne’s Michaël Defferrard et al. in 2016. In ChebNet, the convolution operation becomes defined as:
h = \sigma(\sum_{k=0}^{K-1} \theta_k T_k(\tilde{L}) X)
where h is the hidden feature representation of the graph signal, X is the input graph signal, \tilde{L} is the rescaled Laplacian matrix, and T_k(.) is the k-th Chebyshev polynomial of \tilde{L}. The coefficients \theta_k become learned during training. And K is a hyperparameter that controls the order of the polynomial.
The ChebNet convolution operation can become interpreted as a filtering operation in the spectral domain. Where the graph signal is filtered by a linear combination of the Chebyshev polynomials of the Laplacian matrix. The Chebyshev polynomials provide an efficient way to approximate the filtering operation, which makes the ChebNet algorithm computationally efficient.
Spectral-based methods are highly effective for processing graph data! As a result of providing a principled way to define the convolution operation and are scalable to large graphs. These methods have shown great promise in various applications such as social network analysis, 3D object recognition, and molecular modeling.
Spatial-based methods, on the other hand, use local filters to perform convolution operations on a set of neighboring nodes.
Spatial-based methods for Graph Convolutional Neural Networks (GCNNs) are a family of algorithms that perform convolutions on a set of neighboring nodes in a graph. These methods use local filters that operate on the node and its neighbors to extract localized features.
The main idea behind spatial-based methods is to define a local neighborhood around each node in the graph and apply convolutional filters on this neighborhood. The output of the convolution operation becomes aggregated using a pooling operation to generate a feature vector for each node.
One popular spatial-based GCNN architecture is the Graph Convolutional Network (GCN). Proposed by Kipf and Welling in 2017. In GCN, the convolution operation becomes defined as:
h_i^{(l+1)} = \sigma(\sum_{j \in N_i} \frac{1}{\sqrt{deg(i)} deg(j)} h_j^{(l)} W^{(l)} h_i^{(l)})
where h_i^{(l)} is the hidden feature representation of node i in layer l, N_i is the set of neighboring nodes of i, deg(i) is the degree of node i, and W^{(l)} is the weight matrix for layer l. The activation function \sigma(.) is typically a non-linear function such as ReLU.
The GCN convolution operation is a weighted sum of the features of the neighboring nodes. Where the weights become normalized by the degree of the nodes. Moreover, this normalization ensures that nodes with higher degrees do not dominate the convolution operation. In addition, the output of the convolution operation passes through an activation function to introduce non-linearity.
Another popular spatial-based GCNN architecture is the Graph Attention Network (GAT), which was proposed by Velickovic et al. in 2018.
In GAT, the convolution operation defined as:
h_i^{(l+1)} = \sigma(\sum_{j \in N_i} \alpha_{ij}^{(l)} h_j^{(l)} W^{(l)})
where \alpha_{ij}^{(l)} is the attention coefficient between nodes i and j in layer l. The attention coefficient becomes computed as:
\alpha_{ij}^{(l)} = \frac{\exp(\text{LeakyReLU}(a^T [W h_i^{(l)} | W h_j^{(l)}]))}{\sum_{k \in N_i} \exp(\text{LeakyReLU}(a^T [W h_i^{(l)} | W h_k^{(l)}]))}
where a is a learnable weight vector, | is the concatenation operator, and LeakyReLU(.) is a variant of the rectified linear unit (ReLU) activation function that allows negative values. The attention coefficient becomes computed by taking a weighted sum of the features of the neighboring nodes. Where the weights become learned through a self-attention mechanism.
Spatial-based methods are highly effective for processing graph data because they are able to capture localized features and are translation-invariant.
These methods have shown great promise in various applications such as social network analysis, 3D object recognition, and molecular modeling.
GDL has numerous applications, including image and video analysis, 3D object recognition, molecular modeling, social network analysis, and neuroscience. GDL shows great promise in analyzing large-scale complex systems, such as social networks, transportation networks, and brain connectomes.
Where the underlying structure becomes represented as a graph!
In conclusion, GDL is a rapidly growing field that aims to extend deep learning to structured data with geometric structures. Furthermore, the development of GCNNs has opened up new possibilities for analyzing complex data, such as graphs, point clouds, and manifolds. Lastly, GDL has the potential to transform numerous fields, ranging from computer vision to social network analysis, and beyond.
Huge thank you to Professor Michael Bronstein and his work for inspiring this article! We recommend you read his article that is about 100 times better than this Towards Geometric Deep Learning