Tensorflow Tutorial

Simple usage example

import gnn.GNN as GNN
import gnn.gnn_utils
import Net as n

# Provide your own functions to generate input data
inp, arcnode, nodegraph, labels = set_load()

# Create the state transition function, output function, loss function and  metrics
net = n.Net(input_dim, state_dim, output_dim)

# Create the graph neural network model
g = GNN.GNN(net, input_dim, output_dim, state_dim)

#Training

for j in range(0, num_epoch):
    g.Train(inp, arcnode, labels, count, nodegraph)

    # Validate
    print(g.Validate(inp_val, arcnode_val, labels_val, count, nodegraph_val))

Simple toy example for input formatting

Input composed by two graphs:

_images/g1.png _images/g2.png

This graphs can be described in the EN Input format.

The E matrix describing the first graph ( [[id_p, id_c, graph_id],...]):

_images/g1.png
>>> E = [[0 1 0]
         [0 2 0]
         [0 4 0]
         [1 0 0]
         [1 2 0]
         [1 3 0]
         [2 0 0]
         [2 1 0]
         [2 3 0]
         [2 4 0]
         [3 1 0]
         [3 2 0]
         [4 0 0]
         [4 2 0]]

Note the last column, denoting the id (0) of the graph to which the arc belongs.

The E matrix describing the second graph:

_images/g2.png
>>> E = [[0 2 1]
         [0 3 1]
         [1 2 1]
         [1 3 1]
         [2 3 1]
         [2 0 1]
         [3 0 1]
         [2 1 1]
         [3 1 1]
         [3 2 1]]


Note the last column, denoting the id (1) of the graph to which the arc belongs .

The global E_tot matrix (with incremental node ids):

>>> E_tot = [[0 1 0]
             [0 2 0]
             [0 4 0]
             [1 0 0]
             [1 2 0]
             [1 3 0]
             [2 0 0]
             [2 1 0]
             [2 3 0]
             [2 4 0]
             [3 1 0]
             [3 2 0]
             [4 0 0]
             [4 2 0]
             [5 7 1]
             [5 8 1]
             [6 7 1]
             [6 8 1]
             [7 5 1]
             [7 6 1]
             [7 8 1]
             [8 5 1]
             [8 6 1]
             [8 7 1]]

The global N matrix ( in this simple case, each node has a different one-hot feature) ([[node_features, graph_id],... ]) :

>>> N = [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
         [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
         [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
         [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
         [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
         [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
         [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
         [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
         [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]]

The from_EN_to_GNN() util takes care of formatting the inputs:

inp, arcnode, graphnode = gnn_utils.from_EN_to_GNN(E_tot, N_tot)

# random labels
labels = np.random.randint(2, size=(N_tot.shape[0]))
labels = np.eye(max(labels)+1, dtype=np.int32)[labels]  # one-hot encoding of labels



# set input and output dim, the maximum number of iterations, the number of epochs and the optimizer
threshold = 0.01
learning_rate = 0.01
state_dim = 5
input_dim = inp.shape[1]
output_dim = labels.shape[1]
max_it = 50
num_epoch = 10000

# Create the state transition function, output function, loss function and  metrics
net = n.Net(input_dim, state_dim, output_dim)

# Create the graph neural network model
g = GNN.GNN(net, input_dim, output_dim, state_dim)

#Training

for j in range(0, num_epoch):
    g.Train(inp, arcnode, labels, count, nodegraph)

    # Validate
    print(g.Validate(inp_val, arcnode_val, labels_val, count, nodegraph_val))

Description

This guide is an introduction to the GNN package.

The implementation consists of the two modules:

  • GNN.py contains the main core of the GNN

  • Net.py contains the implementation of several task oriented structures, such as state and output networks, loss functions and metrics definion.

Users may implement their own version of the Net file, for a specific task.

Model definition

Input data

As described in Matrix-based implementation, the computations are based on the arcs in the input graphs. Hence, inputs to the model must be specified as an ordered edge list.

In particular, for each edge, this structure (inp) must contain:

  • the id of the child node (used to gather its state)

  • the father and child node labels

  • the edge label (if available)

_images/input.svg

Note

We provide a novel utility to compose this kind of input, given a description of the graph dataset in an E-N format. See section EN Input.

ArcNode

In order to aggregate the state per node, a matrix multiplication with an edge–node matrix is performed. The matrix encodes which arcs affect a certain node (see Matrix-based implementation). This matrix (arcnode) is sparse, to save memory.

_images/arcnode.svg

NodeGraph

This matrix must be defined only in the case of Graph-based problems. See Graph–Based Output function for more information.

We provide a Library to generate these inputs for the simple benchmarks reported in our research papers Examples.

EN Input

To simplify the input creation, we provide an utility in the utils.py file, the from_EN_to_GNN(E, N) function.

The user can describe the dataset in the EN format:

E
  • numpy array of edges : [[id_p, id_c, graph_id],…]. One row for each arc in the dataset. First column must contain the ids of father nodes, the second column ids of child nodes. The third column contains an id that identifies the graph (to which the node belongs) in the dataset.

N
  • numpy array of nodes features - [[node_features, graph_id],… ]. One row for each and every node. A set of columns containing the nodes features. The last column is an id that identifies the graph (to which the node belongs) in the dataset.

The from_EN_to_GNN util takes this two array as input (E N ) and returns the formatted input for the GNN model, inp, arcnode, graphnode.

See Simple toy example for input formatting for a practical example.

State and output function definition

The user must define an appropriate version of the Net file for the task to be solved. In particular, it is possible to define a custom version of the state and output network (MLPs with different architectures, number of layers etc.), loss function and metrics.

More details are available in the Examples section.

Following the notation used in the examples, we import this file into the code with:

>>> import Net_Subgraph as n

Then it is possible to define the net object:

>>> net = n.Net(input_dim, state_dim, output_dim)

where input_dim, state_dim, output_dim represent the dimension of the inputs, the state net and the output net respectively.

Afterwards, the GNN model can be defined using:

>>> g = GNN.GNN(net, input_dim, output_dim, state_dim)

In this way, the model building is complete.

This approach uses some default parameters that can be customized, specifying the complete set of parameters:

>>> g = GNN.GNN(net, input_dim, output_dim, state_dim,  max_it, optimizer, learning_rate, threshold, False, param, config,
        tensorboard)

Training

The Train method runs one epoch of the training procedure:

>>>  loss, it = g.Train(inp, arcnode, labels, j)

It receives as inputs, inp, the input matrix described in Input data, the arcnode matrix, the node targets and the current epoch step.

Note

In case of a graph-based task, you can specify the NodeGraph matrix with:

>>>  g.Train(inp, arcnode, labels, j, nodegraph)

It returns the loss value loss and the number of iteration of the current step convergence procedure it.

Validation

The Validate method performs the validation (i.e., using the validation set) of the model:

>>> loss_val, metr, it =g.Validate(inp_val, arcnode_val, labels_val, j)

It outputs, regarding the set given as input, the loss value loss, the accuracy (custom defined in the Net file) metr, the number of iteration in the convergence procedure it.

Evaluate

The Evaluate method performs the evaluation (i.e., using the test set) of the model:

>>> met = g.Evaluate(inp_test, arcnode_test, labels_test)

It gives as output the accuracy on the set given as input, met.

Predict

The Predict method can be used to compute the output of each node of the GNN model:

>>> pr = g.Evaluate(inp_test, arcnode_test, labels_test)

It gives as output the output values of the output function (all the nodes output for all the graphs (if node-based) or a single output for each graph (if graph based), pr.