User Manual

Siran Yang edited this page Jul 9, 2019 · 2 revisions

In this section, we introduce the usage of tf_euler, the Euler graph learning algorithm package. After preparing the graph data for Euler, users can train and evaluate a model and save the embeddings from the command line.

Basic usage

python -m tf_euler --data_dir <data_dir> --mode <mode> [flags]
  • --data_dir, graph data directory, required.
  • --mode, operation mode: train / evaluate / save_embedding, the default is train.
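
When runs are launched from a script, the basic usage above can be assembled as an argument list. The following is only a sketch: build_command is a hypothetical helper, not part of tf_euler, and the directory name and max_id value are placeholders.

```python
import shlex

# The three modes listed for --mode.
MODES = ("train", "evaluate", "save_embedding")

def build_command(data_dir, mode="train", **flags):
    """Return the argument list for one `python -m tf_euler` run (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError("mode must be one of %s" % (MODES,))
    argv = ["python", "-m", "tf_euler", "--data_dir", data_dir, "--mode", mode]
    # Append any further flags from this manual as `--name value` pairs.
    for name in sorted(flags):
        argv += ["--" + name, str(flags[name])]
    return argv

# Placeholder values (assumptions): the directory name and max_id.
cmd = build_command("my_graph_dir", mode="train", max_id=1000)
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to subprocess.run once tf_euler is installed and the graph data is in place.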

Data parameters

  • --train_node_type, the node type in train data set, the default is 0.
  • --all_node_type, the node type in the whole data set, the default is -1.
  • --train_edge_type, the edge type in train data set, the default is [0].
  • --all_edge_type, the edge type in the whole data set, the default is [0, 1].
  • --max_id, the largest node id in the graph, required.
  • --feature_idx, the id of dense feature, required if using dense feature.
  • --feature_dim, the dimension of dense feature, required if using dense feature.
  • --label_idx, the id of label in dense feature, required for supervised model.
  • --label_dim, the dimension of label in dense feature, required for supervised model.
  • --num_classes, class number, required if label is scalar.
  • --id_file, the id file of test data, one line one id, required for evaluation.
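
The --id_file format described above (one id per line) can be produced with a few lines of Python. This is a sketch under assumptions: the ids and the file name are made up, and ids are assumed to be written as plain decimal integers.

```python
import os
import tempfile

def write_id_file(path, node_ids):
    """Write one node id per line, the layout described for --id_file."""
    with open(path, "w") as f:
        for node_id in node_ids:
            f.write("%d\n" % node_id)

# Hypothetical test-set ids and file name, for illustration only.
test_ids = [3, 17, 42]
id_path = os.path.join(tempfile.mkdtemp(), "test_ids.txt")
write_id_file(id_path, test_ids)

# The path would then be passed on the command line as: --id_file <id_path>
print(id_path)
```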

Training parameters

  • --model_dir, checkpoint path, the default is ckpt.
  • --batch_size, batch size, the default is 512.
  • --optimizer, optimizer, the default is adam.
  • --learning_rate, learning rate, the default is 0.01.
  • --num_epochs, training epochs, the default is 10.
  • --log_steps, the interval steps to print log, the default is 20.

Model parameters

  • --model, model name, one of line / randomwalk / graphsage / graphsage_supervised / scalable_gcn / gat / saved_embedding.
  • --dim, embedding dimension, the default is 128.
  • --sigmoid_loss / --nosigmoid_loss, loss function, the default is --sigmoid_loss.
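
The per-model sections that follow mark each model as supervised or unsupervised, and the data parameters state that --label_idx and --label_dim are required for supervised models. A small pre-flight check can catch a missing label flag before a run starts. This is a sketch, not tf_euler code: missing_label_flags is a hypothetical helper, and the pairing of section titles with --model values is assumed from their order in this manual.

```python
# Supervised / unsupervised labels as given in the sections of this manual.
# The mapping from section title to --model value is an assumption.
MODEL_KIND = {
    "line": "unsupervised",
    "randomwalk": "unsupervised",
    "graphsage": "unsupervised",
    "graphsage_supervised": "supervised",
    "scalable_gcn": "supervised",
    "gat": "supervised",
    "saved_embedding": "supervised",
}

# Flags documented as "required for supervised model".
LABEL_FLAGS = ("label_idx", "label_dim")

def missing_label_flags(model, flags):
    """Return the label flags a supervised model still needs (hypothetical helper)."""
    if MODEL_KIND[model] != "supervised":
        return []
    return [name for name in LABEL_FLAGS if name not in flags]

print(missing_label_flags("gat", {"label_idx": 0}))
print(missing_label_flags("line", {}))
```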

LINE

unsupervised model

RandomWalk

unsupervised model

GraphSage / GraphSage(Supervised) / ScalableGCN

unsupervised model / supervised model / supervised model

  • --fanouts, the expansion number per layer, the default is [10, 10].
  • --aggregator, aggregator type, one of gcn / mean / meanpool / maxpool, the default is mean.
  • --concat / --noconcat, aggregation method, refer to GraphSAGE, the default is --concat.
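
The aggregator choices and the paired --concat / --noconcat switch listed above can be turned into command-line arguments as follows. This is a sketch only: graphsage_flags is a hypothetical helper, and --fanouts is left out because this manual does not state how list values are written on the command line.

```python
# Aggregator names listed for --aggregator.
AGGREGATORS = ("gcn", "mean", "meanpool", "maxpool")

def graphsage_flags(aggregator="mean", concat=True):
    """Return the GraphSage-specific arguments (hypothetical helper)."""
    if aggregator not in AGGREGATORS:
        raise ValueError("unknown aggregator: %s" % aggregator)
    # The boolean option is spelled as one of two switches, per this manual.
    switch = "--concat" if concat else "--noconcat"
    return ["--aggregator", aggregator, switch]

print(graphsage_flags())
print(graphsage_flags("maxpool", concat=False))
```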

GAT

supervised model

  • --head_num, attention head number, the default is 1.

Saved Embedding

supervised model

Imports the embedding.npy file in model_dir as a dense feature and builds an LR model to evaluate the effect of unsupervised models.
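
One plausible way to chain the modes for this evaluation is sketched below: train an unsupervised model, save its embeddings, then train and evaluate the saved_embedding model. The order of the steps, the reuse of one model directory, and every value (directory names, max_id, label settings, id file) are assumptions for illustration; euler_cmd is a hypothetical helper.

```python
# Placeholder settings (assumptions), not taken from a real data set.
DATA_DIR = "my_graph_dir"
MAX_ID = "1000"
MODEL_DIR = "ckpt"  # the documented default for --model_dir

def euler_cmd(mode, model, **flags):
    """Build one tf_euler command that shares the data and model directories."""
    argv = ["python", "-m", "tf_euler", "--data_dir", DATA_DIR,
            "--max_id", MAX_ID, "--model_dir", MODEL_DIR,
            "--mode", mode, "--model", model]
    for name in sorted(flags):
        argv += ["--" + name, str(flags[name])]
    return argv

# Hypothetical label settings for the supervised LR step.
labels = {"label_idx": 0, "label_dim": 1}

steps = [
    euler_cmd("train", "line"),                       # 1. train an unsupervised model
    euler_cmd("save_embedding", "line"),              # 2. save its embeddings
    euler_cmd("train", "saved_embedding", **labels),  # 3. fit the LR model on them
    euler_cmd("evaluate", "saved_embedding", id_file="test_ids.txt", **labels),
]
for step in steps:
    print(" ".join(step))
```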

Distributed training parameters

tf_euler uses ParameterServer for distributed training; see Distributed TensorFlow. The graph engine automatically splits the data and shares it between workers. Note that the data must be placed on HDFS for distributed training.

  • --euler_zk_addr, ZooKeeper service address, required for distributed training.
  • --euler_zk_path, ZooKeeper znode path, required for distributed training.
  • --worker_hosts, worker list, required for distributed training.
  • --ps_hosts, ps list, required for distributed training.
  • --task_name, task name, ps or worker, required for distributed training.
  • --task_index, task index, required for distributed training.
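
In a distributed run, every process receives the same cluster description and differs only in --task_name and --task_index. The sketch below generates one command per task. It assumes that host lists are written as comma-separated host:port pairs, as is common in Distributed TensorFlow; all addresses and paths are placeholders, and distributed_commands is a hypothetical helper.

```python
def distributed_commands(data_dir, zk_addr, zk_path, ps_hosts, worker_hosts):
    """Return one tf_euler argument list per ps / worker task (hypothetical helper)."""
    shared = [
        "python", "-m", "tf_euler",
        "--data_dir", data_dir,
        "--euler_zk_addr", zk_addr,
        "--euler_zk_path", zk_path,
        # Assumption: host lists are comma-separated host:port pairs.
        "--ps_hosts", ",".join(ps_hosts),
        "--worker_hosts", ",".join(worker_hosts),
    ]
    commands = []
    for task_name, hosts in (("ps", ps_hosts), ("worker", worker_hosts)):
        for index in range(len(hosts)):
            commands.append(shared + ["--task_name", task_name,
                                      "--task_index", str(index)])
    return commands

# Placeholder cluster (assumptions): one ps, two workers, data on HDFS.
commands = distributed_commands(
    data_dir="hdfs://namenode/path/to/graph",
    zk_addr="zk.example.com:2181",
    zk_path="/euler",
    ps_hosts=["ps0.example.com:2222"],
    worker_hosts=["w0.example.com:2222", "w1.example.com:2222"],
)
for command in commands:
    print(" ".join(command))
```

Each printed command would be started on the machine that matches its task name and index.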