Distributed training with Apache MXNet and Horovod

This translation was prepared on the eve of the start of the "Industrial ML on Big Data" course.








Distributed training on multiple high-performance compute instances can cut the time needed to train modern deep neural networks on large datasets from weeks to hours or even minutes, which makes this training technique ubiquitous in practical deep learning. At the same time, users have to understand how data is shared and synchronized across instances, which in turn has a large impact on scaling efficiency, and how to adapt a training script written for a single instance so that it runs on many.



In this article, we will show how to run distributed training with Apache MXNet and Horovod. We will explain why the Horovod approach works well, and walk through how to write an MXNet training script so that it runs in a distributed fashion with Horovod.



Apache MXNet



Apache MXNet is an open-source deep learning framework used to build, train, and deploy deep neural networks. MXNet abstracts away the complexity of implementing neural networks, delivers high performance, scales well, and offers APIs in popular programming languages such as Python, C++, Clojure, Java, Julia, R, Scala, and others.



Distributed training in MXNet



Distributed training in MXNet relies on a parameter server approach. It uses a set of servers and a set of workers. Each worker processes its own subset of each training batch: the workers compute gradients and send them to the servers, the servers aggregate the gradients and update the model parameters, and the workers then pull the updated parameters back before processing the next batch. The main practical difficulty of this scheme is choosing the right server-to-worker ratio: with too few servers, communication with them becomes a bottleneck and scaling efficiency suffers.
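For reference, the parameter server model is exposed in MXNet through the KVStore API. Below is a minimal sketch (using a 'local' store so it runs in one process; a real distributed job creates the store as 'dist_sync' and launches server and scheduler processes alongside the workers):

import mxnet as mx

# KVStore is MXNet's parameter-server abstraction. 'local' keeps everything in
# one process; a distributed job uses 'dist_sync' or 'dist_async' together with
# separately launched server and scheduler processes.
kv = mx.kv.create('local')

shape = (2, 3)
kv.init(3, mx.nd.ones(shape))      # register the parameter under key 3

grad = mx.nd.ones(shape) * 0.5
kv.push(3, grad)                   # workers push gradients; pushes to the same key are summed
out = mx.nd.zeros(shape)
kv.pull(3, out=out)                # pull the value maintained by the store
print(out.asnumpy())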



Horovod



Horovod is an open-source framework for distributed deep learning training developed at Uber. It leverages efficient cross-GPU and cross-node communication mechanisms such as the NVIDIA Collective Communications Library (NCCL) and the Message Passing Interface (MPI) to distribute and aggregate model parameters between workers. It makes efficient use of network bandwidth and scales well for dense deep neural network models. It currently supports several popular machine learning frameworks, namely MXNet, TensorFlow, Keras, and PyTorch.



Horovod with MXNet



MXNet integrates with Horovod through the common distributed training API defined in Horovod. The Horovod communication APIs horovod.broadcast(), horovod.allgather(), and horovod.allreduce() are implemented as asynchronous callbacks of the MXNet engine, as part of its task graph. This way, the data dependencies between communication and computation are handled by the MXNet engine itself, avoiding performance loss due to synchronization. The distributed optimizer object defined in Horovod, horovod.DistributedOptimizer, wraps the MXNet Optimizer so that the corresponding Horovod APIs are called under the hood for the distributed parameter updates. All of these implementation details are transparent to the end user.
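As a rough illustration of those collectives (a minimal sketch rather than a full training loop, using the horovod.mxnet bindings mentioned above), each worker can call them directly on NDArrays:

import mxnet as mx
import horovod.mxnet as hvd

hvd.init()

# allreduce: every worker contributes a tensor and receives the average
# across all workers; the op is scheduled through the MXNet engine.
x = mx.nd.ones((2, 2)) * hvd.rank()
avg = hvd.allreduce(x, average=True)

# broadcast_: copy rank 0's value to all workers in place
# (broadcasting the initial model parameters relies on this).
y = mx.nd.full((2, 2), 7.0) if hvd.rank() == 0 else mx.nd.zeros((2, 2))
hvd.broadcast_(y, root_rank=0)

print(hvd.rank(), avg.asnumpy(), y.asnumpy())

Launched with mpirun -np 2 (as in the MNIST example below), every process prints the same averaged and broadcast values.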





The easiest way to get a feel for this is to train a small model on the MNIST dataset with MXNet and Horovod right on your MacBook.

First, install the mxnet and horovod packages from PyPI:



pip install mxnet
pip install horovod


Note: If pip install horovod fails, you may need to set the variable MACOSX_DEPLOYMENT_TARGET=10.vv, where vv is your MacOS version; for example, for MacOSX Sierra you would run MACOSX_DEPLOYMENT_TARGET=10.12 pip install horovod

You will also need OpenMPI installed before running Horovod.



Next, download the mxnet_mnist.py training script and run the following command in a MacBook terminal from your working directory:



mpirun -np 2 -H localhost:2 -bind-to none -map-by slot python mxnet_mnist.py


This will run training on two cores of your processor. You should see output along these lines:



INFO:root:Epoch[0] Batch [0-50] Speed: 2248.71 samples/sec      accuracy=0.583640
INFO:root:Epoch[0] Batch [50-100] Speed: 2273.89 samples/sec      accuracy=0.882812
INFO:root:Epoch[0] Batch [50-100] Speed: 2273.39 samples/sec      accuracy=0.870000




We ran distributed training of a ResNet50-v1 model on the ImageNet dataset across 64 GPUs on eight p3.16xlarge EC2 instances, each with 8 NVIDIA Tesla V100 GPUs, in the AWS cloud, and reached a training throughput of 45,000 images per second (i.e., the number of samples trained per second). Training completed in 44 minutes after 90 epochs with a best accuracy of 75.7%.



We compared this with MXNet's parameter server approach for distributed training on 8, 16, 32, and 64 GPUs, with a single parameter server and with server-to-worker ratios of 1 to 1 and 2 to 1. The results are shown in Figure 1 below: the bars against the left y-axis show the number of images trained per second, and the lines against the right y-axis show the scaling efficiency (the ratio of actual to ideal throughput). As you can see, the choice of the number of servers has a strong effect on scaling efficiency: with a single parameter server, scaling efficiency drops to 38% on 64 GPUs. Matching the scaling efficiency of Horovod requires roughly twice as many servers as workers.





Figure 1. Comparison of distributed training with MXNet: Horovod vs. Parameter Server
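Scaling efficiency here simply means measured throughput divided by what perfect linear scaling from a single GPU would give. A quick sketch with purely hypothetical numbers (not measurements from this experiment):

# Hypothetical throughputs, for illustration only.
single_gpu_ips = 380.0                  # images/sec on 1 GPU
measured = {8: 2900.0, 64: 9200.0}      # images/sec measured on N GPUs

for n_gpus, ips in measured.items():
    ideal = single_gpu_ips * n_gpus     # perfect linear scaling
    efficiency = ips / ideal            # ratio of actual to ideal throughput
    print(f"{n_gpus} GPUs: {efficiency:.0%} scaling efficiency")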



In Table 1 below we compare the total instance cost of running the experiments on 64 GPUs. Using MXNet with Horovod gives the best throughput at the lowest cost.





Table 1. Cost comparison between Horovod and Parameter Server with a server-to-worker ratio of 2 to 1.





The steps below show how to reproduce this distributed training result with MXNet and Horovod. To learn more about distributed training with MXNet, see the MXNet documentation on distributed training.



Step 1. Set up the cluster



Create a cluster of homogeneous instances with MXNet version 1.4.0 or above and Horovod version 0.16.0 or above installed, along with the libraries needed for training on GPUs. For our instances we used Ubuntu 16.04 Linux with GPU Driver 396.44, CUDA 9.2, the cuDNN 7.2.1 library, the NCCL 2.2.13 communicator, and OpenMPI 3.1.1. You can also use the Amazon Deep Learning AMI, where all of these libraries come preinstalled.
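A quick way to verify the environment on each instance (a minimal sketch; the expected versions come from the list above):

import mxnet as mx
import horovod

print('MXNet version:', mx.__version__)        # expect 1.4.0 or newer
print('Horovod version:', horovod.__version__) # expect 0.16.0 or newer
print('GPUs visible:', mx.context.num_gpus())  # should match the instance's GPU count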



Step 2. Modify the training script



Add the Horovod API to your MXNet-based training script. The script below, built on the MXNet Gluon API, can serve as a simple template. Only a few lines need to change if you already have a working training script. Here is what you need to add for Horovod:



  • Pin the context to the GPU corresponding to the Horovod local rank (line 8), so that each process runs on its own GPU.
  • Broadcast the initial parameters from the worker with rank 0 to all other workers (line 18), so that every worker starts from the same initial parameters.
  • Wrap the MXNet optimizer with Horovod's DistributedOptimizer (line 25), so that parameter updates use gradients aggregated across all workers.


The complete Horovod-MXNet training scripts for MNIST and ImageNet can be found in the Horovod examples.



1  import mxnet as mx
2  import horovod.mxnet as hvd
3
4  # Horovod: initialize Horovod
5  hvd.init()
6
7  # Horovod: pin a GPU to be used to local rank
8  context = mx.gpu(hvd.local_rank())
9
10 # Build model
11 model = ...
12
13 # Initialize parameters
14 model.initialize(initializer, ctx=context)
15 params = model.collect_params()
16
17 # Horovod: broadcast parameters
18 hvd.broadcast_parameters(params, root_rank=0)
19
20 # Create optimizer
21 optimizer_params = ...
22 opt = mx.optimizer.create('sgd', **optimizer_params)
23
24 # Horovod: wrap optimizer with DistributedOptimizer
25 opt = hvd.DistributedOptimizer(opt)
26
27 # Create trainer and loss function
28 trainer = mx.gluon.Trainer(params, opt, kvstore=None)
29 loss_fn = ...
30
31 # Train model
32 for epoch in range(num_epoch):
33    ...


Step 3. Launch distributed training



To start distributed training, run the MPI command shown below. In this example we use 4 nodes with 4 GPUs each, for a total of 16 GPUs in the cluster. We train with the stochastic gradient descent (SGD) optimizer and the following hyperparameters:



  • mini-batch size: 256
  • learning rate: 0.1
  • momentum: 0.9
  • weight decay: 0.0001


As the number of GPUs grew from 1 to 64, we scaled the learning rate linearly with the number of GPUs (from 0.1 for 1 GPU to 6.4 for 64 GPUs), while keeping the number of images per GPU at 256 (from a batch of 256 images for 1 GPU to 16,384 for 64 GPUs). The weight decay and momentum parameters were left unchanged as the number of GPUs increased. We used mixed-precision training, with float16 for the forward pass and float32 for the gradients, to take advantage of the accelerated float16 computation supported by NVIDIA Tesla GPUs.
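A sketch of how those rules can be expressed in the training script (the values simply mirror the hyperparameters above; multi_precision is the MXNet SGD option that keeps float32 master weights when the model runs in float16):

import mxnet as mx
import horovod.mxnet as hvd

hvd.init()

batch_per_gpu = 256                        # kept constant per GPU
base_lr = 0.1                              # learning rate for a single GPU
lr = base_lr * hvd.size()                  # 0.1 on 1 GPU -> 6.4 on 64 GPUs
global_batch = batch_per_gpu * hvd.size()  # 256 -> 16,384 on 64 GPUs

optimizer_params = {
    'learning_rate': lr,
    'momentum': 0.9,          # unchanged as the number of GPUs grows
    'wd': 0.0001,             # weight decay, also unchanged
    'multi_precision': True,  # float32 master weights for float16 training
}
opt = hvd.DistributedOptimizer(mx.optimizer.create('sgd', **optimizer_params))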



$ mpirun -np 16 \
    -H server1:4,server2:4,server3:4,server4:4 \
    -bind-to none -map-by slot \
    -mca pml ob1 -mca btl ^openib \
    python mxnet_imagenet_resnet50.py




Conclusion

In this article, we looked at distributed training with Apache MXNet and Horovod. We showed the scaling efficiency and cost efficiency of this approach compared to the parameter server approach on the ImageNet dataset, on which we trained a ResNet50-v1 model. We also walked through the steps needed to modify an existing script so that it runs on multiple instances with Horovod.



If you are just getting started with MXNet, begin with the MXNet installation documentation to build MXNet. We also highly recommend the MXNet in 60 minutes article as a starting point.



If you already work with MXNet and want to try distributed training with Horovod, take a look at the Horovod installation page, build Horovod with MXNet support, and follow the MNIST or ImageNet examples.



* Cost is based on AWS hourly rates for EC2 instances






Learn more about the "Industrial ML on Big Data" course






