
=== Start adding workers ===
=> Add worker SGDMWorker(index=0, momentum=0.9)
=> Add worker SGDMWorker(index=1, momentum=0.9)
=> Add worker SGDMWorker(index=2, momentum=0.9)
=> Add worker SGDMWorker(index=3, momentum=0.9)
=> Add worker SGDMWorker(index=4, momentum=0.9)
=> Add worker SGDMWorker(index=5, momentum=0.9)
=> Add worker SGDMWorker(index=6, momentum=0.9)
=> Add worker SGDMWorker(index=7, momentum=0.9)
=> Add worker SGDMWorker(index=8, momentum=0.9)
=> Add worker SGDMWorker(index=9, momentum=0.9)
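The ten workers above each run SGD with momentum 0.9. A minimal sketch of what one such worker's update step might look like — `SGDMWorker`'s real constructor and `step` signature are from `codes.*` and not shown here, so the class below is illustrative only (classic momentum: v ← μ·v + g, w ← w − lr·v; the `lr=0.1` default is an assumption):

```python
class SGDMWorker:
    """Illustrative stand-in for the logged SGDMWorker (not the codes.* API)."""

    def __init__(self, index, momentum=0.9, lr=0.1):
        self.index = index
        self.momentum = momentum
        self.lr = lr            # assumed learning rate, not read from the log
        self.velocity = None    # lazily initialized momentum buffer

    def step(self, weights, grads):
        # Classic (non-Nesterov) momentum: v <- mu * v + g ; w <- w - lr * v
        if self.velocity is None:
            self.velocity = [0.0] * len(grads)
        self.velocity = [self.momentum * v + g
                         for v, g in zip(self.velocity, grads)]
        return [w - self.lr * v for w, v in zip(weights, self.velocity)]


# Ten workers, mirroring the "Add worker" lines above.
workers = [SGDMWorker(index=i, momentum=0.9) for i in range(10)]
```

With momentum 0.9, each gradient's contribution decays geometrically across later steps, which is what smooths the noisy per-batch losses visible later in this log.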

=== Start adding graph ===
<codes.graph_utils.Dumbbell object at 0x7fd72f9a46d0>
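The `Dumbbell` object prints only its default repr, but the "Clique1"/"Clique2" evaluations later in the log suggest the usual dumbbell topology: two cliques of 5 workers joined by a single bridge edge. A guess at what `codes.graph_utils.Dumbbell` builds, as a plain adjacency dict (illustrative, not the actual class):

```python
def dumbbell(n):
    """Two cliques of n // 2 nodes each, joined by one bridge edge.

    Assumed shape of the communication graph for the 10 workers above:
    clique 1 = nodes 0..4, clique 2 = nodes 5..9, bridge 4 -- 5.
    """
    half = n // 2
    adj = {i: set() for i in range(n)}
    for clique in (range(half), range(half, n)):
        for i in clique:
            for j in clique:
                if i != j:
                    adj[i].add(j)
    # Single bridge edge connecting the two cliques.
    adj[half - 1].add(half)
    adj[half].add(half - 1)
    return adj


g = dumbbell(10)
```

In decentralized averaging, this topology is a standard stress test: information mixes quickly inside each clique but crosses between cliques only through the one bridge edge, which is why the log reports per-clique accuracies alongside the global average.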

Train epoch 1
[E 1B0  |    320/60000 (  1%) ] Loss: 2.2959 top1= 10.0000

=== Peeking data label distribution E1B0 ===
Worker 0 has targets: tensor([4, 8, 8, 6, 9], device='cuda:0')
Worker 1 has targets: tensor([5, 3, 6, 0, 9], device='cuda:0')
Worker 2 has targets: tensor([2, 9, 9, 3, 1], device='cuda:0')
Worker 3 has targets: tensor([6, 9, 8, 1, 2], device='cuda:0')
Worker 4 has targets: tensor([5, 8, 9, 1, 8], device='cuda:0')
Worker 5 has targets: tensor([6, 7, 5, 2, 3], device='cuda:0')
Worker 6 has targets: tensor([3, 2, 8, 7, 9], device='cuda:0')
Worker 7 has targets: tensor([3, 8, 7, 8, 7], device='cuda:0')
Worker 8 has targets: tensor([8, 0, 2, 4, 8], device='cuda:0')
Worker 9 has targets: tensor([5, 3, 4, 6, 3], device='cuda:0')
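The peek above prints the first five targets of each worker's current batch; the spread of digits across all ten classes suggests an IID (shuffled) split rather than a label-sorted one. A small helper in the same spirit, counting label frequencies per worker to eyeball the split (hypothetical helper name, not from the experiment's code):

```python
from collections import Counter


def peek_label_distribution(worker_targets):
    """Count label occurrences per worker.

    worker_targets: dict mapping worker index -> list of integer labels
    (stand-in for the CUDA tensors printed in the log).
    Returns dict mapping worker index -> Counter of labels.
    """
    return {w: Counter(t) for w, t in worker_targets.items()}


# First two workers' peeked targets, copied from the log above.
dist = peek_label_distribution({
    0: [4, 8, 8, 6, 9],
    1: [5, 3, 6, 0, 9],
})
```

On a non-IID partition (e.g. each worker holding only a few classes), the same peek would show each worker's counter concentrated on a small label subset.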


[E 1B10 |   3520/60000 (  6%) ] Loss: 1.9205 top1= 47.1875
[E 1B20 |   6720/60000 ( 11%) ] Loss: 0.8712 top1= 72.5000
[E 1B30 |   9920/60000 ( 17%) ] Loss: 0.8883 top1= 75.3125
[E 1B40 |  13120/60000 ( 22%) ] Loss: 0.5865 top1= 81.8750
[E 1B50 |  16320/60000 ( 27%) ] Loss: 0.4350 top1= 85.9375
[E 1B60 |  19520/60000 ( 33%) ] Loss: 0.4111 top1= 88.7500
[E 1B70 |  22720/60000 ( 38%) ] Loss: 0.3610 top1= 89.0625
[E 1B80 |  25920/60000 ( 43%) ] Loss: 0.3018 top1= 91.5625
[E 1B90 |  29120/60000 ( 49%) ] Loss: 0.2973 top1= 92.5000
[E 1B100|  32320/60000 ( 54%) ] Loss: 0.3280 top1= 89.6875
[E 1B110|  35520/60000 ( 59%) ] Loss: 0.2192 top1= 92.1875
[E 1B120|  38720/60000 ( 65%) ] Loss: 0.2862 top1= 92.5000
[E 1B130|  41920/60000 ( 70%) ] Loss: 0.2974 top1= 91.2500
[E 1B140|  45120/60000 ( 75%) ] Loss: 0.2707 top1= 92.8125
[E 1B150|  48320/60000 ( 81%) ] Loss: 0.2157 top1= 93.4375
[E 1B160|  51520/60000 ( 86%) ] Loss: 0.2717 top1= 92.1875
[E 1B170|  54720/60000 ( 91%) ] Loss: 0.1565 top1= 97.8125
[E 1B180|  57920/60000 ( 97%) ] Loss: 0.2381 top1= 92.1875
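A back-of-envelope check of the progress counters: each logged step covers 320 of the 60000 training samples, consistent with the 10 workers each drawing a batch of 32 (the per-worker batch size is inferred from the counters, not read from any config):

```python
num_workers = 10
per_worker_batch = 32                                # inferred: 320 / 10 workers
samples_per_step = num_workers * per_worker_batch    # 320, the E1B0 counter
log_every = 10                                       # steps between log lines
samples_between_logs = samples_per_step * log_every  # 3200 = 3520 - 320
```

This also explains why the epoch's last log line stops at 57920/60000 (97%): B180 is the 181st step, and 181 × 320 = 57920, with 60000 / 320 = 187.5 steps needed for a full pass.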

=> Averaged model (Global Average Validation Accuracy) | Eval Loss=0.1472 top1= 95.7232


=> Averaged model (Clique1 Average Validation Accuracy) | Eval Loss=0.1505 top1= 95.5028


=> Averaged model (Clique2 Average Validation Accuracy) | Eval Loss=0.1549 top1= 95.6731
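The three evaluations above score uniformly averaged models: one average over all ten workers (global) and one over each clique of five. A minimal sketch of that parameter averaging, using toy scalar "weights" in place of real model tensors (the clique split 0–4 / 5–9 is an assumption matching the dumbbell topology):

```python
def average_models(models):
    """Uniform parameter average across a list of workers' parameter lists."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]


# Toy per-worker parameters (two scalars each) standing in for model weights.
all_params = [[float(i), float(i) + 1.0] for i in range(10)]

global_avg = average_models(all_params)        # all 10 workers
clique1_avg = average_models(all_params[:5])   # assumed clique 1: workers 0-4
clique2_avg = average_models(all_params[5:])   # assumed clique 2: workers 5-9
```

The global accuracy sitting close to both clique accuracies in the log (95.72 vs 95.50 and 95.67) is consistent with the two cliques staying reasonably synchronized despite the single bridge edge.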

Train epoch 2
[E 2B0  |    320/60000 (  1%) ] Loss: 0.1880 top1= 94.3750
[E 2B10 |   3520/60000 (  6%) ] Loss: 0.1675 top1= 95.3125
[E 2B20 |   6720/60000 ( 11%) ] Loss: 0.1650 top1= 94.6875
[E 2B30 |   9920/60000 ( 17%) ] Loss: 0.2195 top1= 91.8750
[E 2B40 |  13120/60000 ( 22%) ] Loss: 0.2123 top1= 93.7500
[E 2B50 |  16320/60000 ( 27%) ] Loss: 0.1286 top1= 96.5625
[E 2B60 |  19520/60000 ( 33%) ] Loss: 0.1625 top1= 94.6875
[E 2B70 |  22720/60000 ( 38%) ] Loss: 0.1365 top1= 95.6250
[E 2B80 |  25920/60000 ( 43%) ] Loss: 0.0991 top1= 97.8125
[E 2B90 |  29120/60000 ( 49%) ] Loss: 0.1498 top1= 95.0000
[E 2B100|  32320/60000 ( 54%) ] Loss: 0.1373 top1= 95.3125
[E 2B110|  35520/60000 ( 59%) ] Loss: 0.0888 top1= 97.5000
[E 2B120|  38720/60000 ( 65%) ] Loss: 0.1465 top1= 96.8750
[E 2B130|  41920/60000 ( 70%) ] Loss: 0.1179 top1= 96.8750
[E 2B140|  45120/60000 ( 75%) ] Loss: 0.1368 top1= 95.0000
[E 2B150|  48320/60000 ( 81%) ] Loss: 0.0820 top1= 97.1875
[E 2B160|  51520/60000 ( 86%) ] Loss: 0.1120 top1= 96.5625
[E 2B170|  54720/60000 ( 91%) ] Loss: 0.0865 top1= 98.4375
[E 2B180|  57920/60000 ( 97%) ] Loss: 0.1057 top1= 95.9375

=> Averaged model (Global Average Validation Accuracy) | Eval Loss=0.0820 top1= 97.3858


=> Averaged model (Clique1 Average Validation Accuracy) | Eval Loss=0.0823 top1= 97.4960

