vocab_size not found in data/openwebtext/meta.pkl, using GPT-2 default of 50257
Initializing a new model from scratch
number of parameters: 50.96M
tensor(1.)
step 0: train loss 10.9687, val loss 10.9745
iter 0: loss 12.4264, time 5829.51ms
iter 10: loss 12.4111, time 123.28ms
iter 20: loss 12.2677, time 123.81ms
iter 30: loss 12.1409, time 123.73ms
iter 40: loss 11.9282, time 123.21ms
iter 50: loss 11.8104, time 123.17ms
iter 60: loss 11.3125, time 123.09ms
iter 70: loss 11.5847, time 123.35ms
iter 80: loss 11.1419, time 123.16ms
iter 90: loss 10.9602, time 121.82ms
tensor(0.9990)
iter 100: loss 11.2528, time 123.52ms
iter 110: loss 10.4287, time 123.04ms
iter 120: loss 10.9708, time 122.90ms
iter 130: loss 10.4952, time 124.36ms
iter 140: loss 10.9001, time 123.40ms
iter 150: loss 10.6165, time 123.46ms
iter 160: loss 10.0686, time 123.97ms
iter 170: loss 10.0540, time 123.11ms
iter 180: loss 10.0450, time 122.92ms
iter 190: loss 10.1348, time 123.04ms
tensor(0.9961)
iter 200: loss 9.7931, time 122.43ms
iter 210: loss 10.0699, time 121.76ms
iter 220: loss 10.0861, time 122.20ms
iter 230: loss 10.7554, time 123.05ms
iter 240: loss 10.1172, time 123.15ms
step 250: train loss 8.6297, val loss 8.6553
saving checkpoint to out-shakespeare-char
iter 250: loss 10.4906, time 2991.27ms
iter 260: loss 10.9349, time 123.05ms
iter 270: loss 9.6323, time 123.36ms
iter 280: loss 9.8086, time 122.89ms
iter 290: loss 9.0578, time 123.15ms
tensor(0.9911)
iter 300: loss 10.7096, time 123.26ms
iter 310: loss 10.3334, time 122.86ms
iter 320: loss 9.8428, time 122.94ms
iter 330: loss 10.2626, time 123.21ms
iter 340: loss 9.6147, time 123.12ms
iter 350: loss 10.1639, time 122.84ms
iter 360: loss 9.5526, time 123.42ms
iter 370: loss 10.0908, time 123.33ms
iter 380: loss 10.2909, time 123.08ms
iter 390: loss 9.7407, time 123.12ms
tensor(0.9843)
iter 400: loss 10.1409, time 122.20ms
iter 410: loss 9.6480, time 123.39ms
iter 420: loss 10.0820, time 124.37ms
iter 430: loss 9.6424, time 124.34ms
iter 440: loss 9.3702, time 123.36ms
iter 450: loss 10.0306, time 123.18ms
iter 460: loss 10.2619, time 123.38ms
iter 470: loss 9.1462, time 123.20ms
iter 480: loss 9.5167, time 123.23ms
iter 490: loss 9.9183, time 123.33ms
tensor(0.9755)
step 500: train loss 8.2860, val loss 8.2454
saving checkpoint to out-shakespeare-char
iter 500: loss 9.4169, time 2899.38ms
iter 510: loss 9.6276, time 123.32ms
iter 520: loss 8.9985, time 123.05ms
iter 530: loss 9.5704, time 124.42ms
iter 540: loss 9.6710, time 123.45ms
iter 550: loss 9.3980, time 123.86ms
iter 560: loss 8.9159, time 123.35ms
iter 570: loss 9.2698, time 123.23ms
iter 580: loss 9.3554, time 122.40ms
iter 590: loss 8.8933, time 122.99ms
tensor(0.9649)
iter 600: loss 9.8952, time 123.24ms
iter 610: loss 9.9149, time 123.14ms
iter 620: loss 9.3630, time 123.17ms
iter 630: loss 9.6366, time 123.23ms
iter 640: loss 9.4596, time 123.32ms
iter 650: loss 10.5198, time 122.27ms
iter 660: loss 9.5892, time 123.31ms
iter 670: loss 9.4916, time 123.95ms
iter 680: loss 9.5738, time 123.82ms
iter 690: loss 9.7881, time 124.34ms
tensor(0.9524)
iter 700: loss 9.2897, time 123.35ms
iter 710: loss 9.6055, time 124.12ms
iter 720: loss 9.7617, time 123.16ms
iter 730: loss 9.8885, time 122.41ms
iter 740: loss 9.3118, time 123.46ms
step 750: train loss 8.1234, val loss 8.0684
saving checkpoint to out-shakespeare-char
iter 750: loss 9.4980, time 2870.34ms
iter 760: loss 9.1911, time 123.19ms
iter 770: loss 9.6296, time 124.48ms
iter 780: loss 9.3461, time 123.92ms
iter 790: loss 10.2044, time 123.29ms
tensor(0.9382)
iter 800: loss 9.4433, time 123.23ms
iter 810: loss 9.2626, time 122.05ms
iter 820: loss 9.7258, time 123.00ms
iter 830: loss 9.2750, time 123.30ms
iter 840: loss 9.1160, time 123.19ms
iter 850: loss 9.3879, time 123.87ms
iter 860: loss 9.8200, time 123.56ms
iter 870: loss 9.9151, time 123.32ms
iter 880: loss 9.9191, time 126.58ms
iter 890: loss 9.6606, time 123.45ms
tensor(0.9222)
iter 900: loss 9.6944, time 121.26ms
iter 910: loss 9.5878, time 123.17ms
iter 920: loss 10.1124, time 123.27ms
iter 930: loss 9.7761, time 123.14ms
iter 940: loss 9.2791, time 123.86ms
iter 950: loss 9.5459, time 123.17ms
iter 960: loss 9.7437, time 122.05ms
iter 970: loss 9.2562, time 128.73ms
iter 980: loss 9.7160, time 123.39ms
iter 990: loss 9.8511, time 123.36ms
tensor(0.9045)
step 1000: train loss 7.9902, val loss 8.0011
saving checkpoint to out-shakespeare-char
iter 1000: loss 9.9710, time 2857.37ms
iter 1010: loss 9.5192, time 123.88ms
iter 1020: loss 9.6219, time 123.58ms
iter 1030: loss 8.7802, time 123.54ms
iter 1040: loss 9.8425, time 123.05ms
iter 1050: loss 9.0792, time 123.89ms
iter 1060: loss 9.0763, time 122.68ms
iter 1070: loss 9.1981, time 123.62ms
iter 1080: loss 9.4146, time 123.50ms
iter 1090: loss 9.2517, time 123.70ms
tensor(0.8853)
iter 1100: loss 9.5501, time 123.84ms
iter 1110: loss 8.6826, time 123.58ms
iter 1120: loss 9.7478, time 124.86ms
iter 1130: loss 9.7067, time 123.28ms
iter 1140: loss 9.6967, time 122.35ms
iter 1150: loss 9.2676, time 123.60ms
iter 1160: loss 10.0742, time 123.36ms
iter 1170: loss 8.8193, time 122.77ms
iter 1180: loss 9.9696, time 123.56ms
iter 1190: loss 9.5701, time 123.65ms
tensor(0.8645)
iter 1200: loss 9.1640, time 124.97ms
iter 1210: loss 9.4997, time 123.49ms
iter 1220: loss 9.8362, time 123.40ms
iter 1230: loss 9.7681, time 123.44ms
iter 1240: loss 8.9739, time 123.53ms
step 1250: train loss 7.8726, val loss 7.8948
saving checkpoint to out-shakespeare-char
iter 1250: loss 9.0639, time 2891.22ms
iter 1260: loss 9.3338, time 123.13ms
iter 1270: loss 9.1284, time 123.12ms
iter 1280: loss 10.1503, time 123.29ms
iter 1290: loss 9.5020, time 123.17ms
tensor(0.8423)
iter 1300: loss 9.5884, time 123.09ms
iter 1310: loss 9.3822, time 123.96ms
iter 1320: loss 9.4398, time 123.24ms
iter 1330: loss 9.0047, time 123.15ms
iter 1340: loss 8.8548, time 121.91ms
iter 1350: loss 10.0341, time 123.21ms
iter 1360: loss 9.7731, time 123.15ms
iter 1370: loss 9.6622, time 123.32ms
iter 1380: loss 8.3395, time 123.09ms
iter 1390: loss 9.4461, time 123.41ms
tensor(0.8187)
iter 1400: loss 10.0859, time 123.27ms
iter 1410: loss 9.0081, time 123.68ms
iter 1420: loss 9.3718, time 123.30ms
iter 1430: loss 9.0861, time 123.14ms
iter 1440: loss 9.1794, time 123.92ms
iter 1450: loss 9.3817, time 123.05ms
iter 1460: loss 9.2774, time 123.43ms
iter 1470: loss 9.0049, time 122.93ms
iter 1480: loss 9.5662, time 124.24ms
iter 1490: loss 9.1346, time 123.11ms
tensor(0.7939)
step 1500: train loss 7.8761, val loss 7.8218
saving checkpoint to out-shakespeare-char
iter 1500: loss 9.2520, time 2869.79ms
iter 1510: loss 10.1077, time 123.30ms
iter 1520: loss 8.8401, time 123.10ms
iter 1530: loss 8.5050, time 124.52ms
iter 1540: loss 9.0391, time 123.06ms
iter 1550: loss 9.0723, time 123.04ms
iter 1560: loss 9.1119, time 123.05ms
iter 1570: loss 9.2223, time 123.10ms
iter 1580: loss 9.4948, time 122.90ms
iter 1590: loss 8.9496, time 122.96ms
tensor(0.7679)
iter 1600: loss 9.7894, time 123.08ms
iter 1610: loss 8.9867, time 121.78ms
iter 1620: loss 9.0140, time 123.66ms
iter 1630: loss 8.4293, time 123.23ms
iter 1640: loss 8.9328, time 122.71ms
iter 1650: loss 9.2681, time 123.08ms
iter 1660: loss 9.1262, time 122.82ms
iter 1670: loss 9.0515, time 123.13ms
iter 1680: loss 8.9634, time 122.91ms
iter 1690: loss 9.4520, time 121.36ms
tensor(0.7409)
iter 1700: loss 9.3071, time 123.81ms
iter 1710: loss 9.2445, time 123.13ms
iter 1720: loss 9.1187, time 122.37ms
iter 1730: loss 9.4387, time 123.62ms
iter 1740: loss 9.0671, time 123.48ms
step 1750: train loss 7.7835, val loss 7.7583
saving checkpoint to out-shakespeare-char
iter 1750: loss 9.2809, time 2886.38ms
iter 1760: loss 9.1927, time 123.13ms
iter 1770: loss 9.6663, time 123.68ms
iter 1780: loss 9.7003, time 123.75ms
iter 1790: loss 8.8471, time 122.98ms
tensor(0.7129)
iter 1800: loss 9.0756, time 123.60ms
iter 1810: loss 9.1734, time 123.80ms
iter 1820: loss 9.2443, time 124.84ms
iter 1830: loss 8.7468, time 123.14ms
iter 1840: loss 8.8348, time 124.41ms
iter 1850: loss 9.4839, time 123.26ms
iter 1860: loss 8.9854, time 123.65ms
iter 1870: loss 9.3504, time 123.37ms
iter 1880: loss 9.2728, time 123.52ms
iter 1890: loss 9.3415, time 123.19ms
tensor(0.6841)
iter 1900: loss 8.9275, time 123.01ms
iter 1910: loss 9.4000, time 121.98ms
iter 1920: loss 9.1353, time 122.87ms
iter 1930: loss 9.2862, time 122.90ms
iter 1940: loss 9.5427, time 122.75ms
iter 1950: loss 8.5218, time 122.72ms
iter 1960: loss 9.3457, time 122.82ms
iter 1970: loss 9.5861, time 122.66ms
iter 1980: loss 9.1065, time 122.89ms
iter 1990: loss 9.1536, time 122.85ms
tensor(0.6545)
step 2000: train loss 7.7536, val loss 7.7522
saving checkpoint to out-shakespeare-char
iter 2000: loss 8.2191, time 2893.41ms
iter 2010: loss 9.3434, time 122.54ms
iter 2020: loss 9.1078, time 124.13ms
iter 2030: loss 9.3722, time 123.19ms
iter 2040: loss 8.8986, time 123.12ms
iter 2050: loss 9.3345, time 122.72ms
iter 2060: loss 9.0752, time 122.62ms
iter 2070: loss 9.1341, time 122.77ms
iter 2080: loss 8.7424, time 122.63ms
iter 2090: loss 8.7885, time 123.07ms
tensor(0.6243)
iter 2100: loss 8.8224, time 124.50ms
iter 2110: loss 8.3428, time 122.64ms
iter 2120: loss 9.9997, time 122.65ms
iter 2130: loss 8.5904, time 124.03ms
iter 2140: loss 9.4563, time 122.75ms
iter 2150: loss 8.9838, time 122.71ms
iter 2160: loss 9.4287, time 124.42ms
iter 2170: loss 9.2994, time 122.71ms
iter 2180: loss 9.3613, time 121.78ms
iter 2190: loss 9.5832, time 122.43ms
tensor(0.5937)
iter 2200: loss 9.4883, time 122.50ms
iter 2210: loss 9.3936, time 122.77ms
iter 2220: loss 9.2327, time 122.81ms
iter 2230: loss 9.9557, time 122.95ms
iter 2240: loss 9.2941, time 123.31ms
step 2250: train loss 7.6845, val loss 7.7078
saving checkpoint to out-shakespeare-char
iter 2250: loss 9.6831, time 2894.22ms
iter 2260: loss 8.9063, time 123.48ms
iter 2270: loss 9.4771, time 123.18ms
iter 2280: loss 8.7583, time 123.12ms
iter 2290: loss 9.3731, time 122.64ms
tensor(0.5627)
iter 2300: loss 9.1818, time 123.37ms
iter 2310: loss 8.7247, time 123.22ms
iter 2320: loss 8.6322, time 122.46ms
iter 2330: loss 9.3561, time 123.10ms
iter 2340: loss 9.3703, time 123.28ms
iter 2350: loss 9.7819, time 123.19ms
iter 2360: loss 9.0936, time 123.25ms
iter 2370: loss 9.6409, time 123.58ms
iter 2380: loss 9.1549, time 123.01ms
iter 2390: loss 9.1946, time 122.69ms
tensor(0.5314)
iter 2400: loss 9.8144, time 123.39ms
iter 2410: loss 10.1446, time 123.09ms
iter 2420: loss 9.2811, time 123.46ms
iter 2430: loss 8.9449, time 123.67ms
iter 2440: loss 8.7071, time 123.44ms
iter 2450: loss 9.1939, time 123.65ms
iter 2460: loss 9.2818, time 123.11ms
iter 2470: loss 9.9148, time 123.18ms
iter 2480: loss 9.2958, time 123.01ms
iter 2490: loss 9.7745, time 123.40ms
tensor(0.5000)
step 2500: train loss 7.6505, val loss 7.6517
saving checkpoint to out-shakespeare-char
iter 2500: loss 9.0403, time 2888.44ms
iter 2510: loss 9.7395, time 121.71ms
iter 2520: loss 8.9861, time 122.74ms
iter 2530: loss 9.3287, time 122.83ms
iter 2540: loss 8.9712, time 122.61ms
iter 2550: loss 9.3423, time 122.88ms
iter 2560: loss 9.2310, time 123.36ms
iter 2570: loss 9.3269, time 124.49ms
iter 2580: loss 8.7193, time 123.83ms
iter 2590: loss 9.1875, time 123.46ms
tensor(0.4686)
iter 2600: loss 10.0031, time 123.55ms
iter 2610: loss 10.0712, time 122.61ms
iter 2620: loss 9.2246, time 123.12ms
iter 2630: loss 8.8327, time 121.93ms
iter 2640: loss 8.7924, time 122.05ms
iter 2650: loss 9.0856, time 124.25ms
iter 2660: loss 9.5913, time 123.50ms
iter 2670: loss 9.5699, time 121.81ms
iter 2680: loss 9.4508, time 124.04ms
iter 2690: loss 9.6786, time 124.13ms
tensor(0.4373)
iter 2700: loss 9.1020, time 122.50ms
iter 2710: loss 8.6981, time 123.81ms
iter 2720: loss 9.2732, time 124.27ms
iter 2730: loss 9.2566, time 122.11ms
iter 2740: loss 9.1397, time 123.69ms
step 2750: train loss 7.6460, val loss 7.6708
saving checkpoint to out-shakespeare-char
iter 2750: loss 9.4020, time 2890.42ms
iter 2760: loss 9.1376, time 125.23ms
iter 2770: loss 8.9444, time 119.81ms
iter 2780: loss 8.8289, time 120.09ms
iter 2790: loss 8.7624, time 122.41ms
tensor(0.4063)
iter 2800: loss 9.3919, time 121.33ms
iter 2810: loss 8.9713, time 121.37ms
iter 2820: loss 8.5666, time 119.39ms
iter 2830: loss 9.5170, time 122.08ms
iter 2840: loss 8.1450, time 120.60ms
iter 2850: loss 8.6497, time 120.93ms
iter 2860: loss 8.5348, time 119.24ms
iter 2870: loss 8.2940, time 120.53ms
iter 2880: loss 8.6458, time 120.27ms
iter 2890: loss 9.6608, time 119.32ms
tensor(0.3757)
iter 2900: loss 8.6950, time 119.25ms
iter 2910: loss 9.6523, time 119.14ms
iter 2920: loss 8.6587, time 119.33ms
iter 2930: loss 8.8619, time 120.61ms
iter 2940: loss 9.3327, time 124.95ms
iter 2950: loss 9.0090, time 122.84ms
iter 2960: loss 8.4985, time 119.47ms
iter 2970: loss 9.2685, time 119.37ms
iter 2980: loss 8.6579, time 119.49ms
iter 2990: loss 8.6624, time 119.77ms
tensor(0.3455)
step 3000: train loss 7.6302, val loss 7.5989
saving checkpoint to out-shakespeare-char
iter 3000: loss 9.7715, time 2878.19ms
iter 3010: loss 8.6415, time 119.29ms
iter 3020: loss 8.5055, time 119.34ms
iter 3030: loss 9.0541, time 118.97ms
iter 3040: loss 8.7390, time 120.28ms
iter 3050: loss 9.0815, time 119.70ms
iter 3060: loss 9.2040, time 120.49ms
iter 3070: loss 10.1227, time 119.36ms
iter 3080: loss 9.1036, time 120.05ms
iter 3090: loss 8.5392, time 119.51ms
tensor(0.3159)
iter 3100: loss 9.7613, time 121.07ms
iter 3110: loss 8.6019, time 120.32ms
iter 3120: loss 8.9038, time 119.30ms
iter 3130: loss 9.2779, time 119.35ms
iter 3140: loss 9.8812, time 119.55ms
iter 3150: loss 9.3754, time 120.61ms
iter 3160: loss 9.4446, time 119.02ms
iter 3170: loss 8.7382, time 119.25ms
iter 3180: loss 8.4769, time 120.28ms
iter 3190: loss 8.7988, time 119.65ms
tensor(0.2871)
iter 3200: loss 9.5552, time 120.91ms
iter 3210: loss 8.9874, time 120.51ms
iter 3220: loss 9.3754, time 119.18ms
iter 3230: loss 9.1498, time 120.32ms
iter 3240: loss 9.7524, time 119.25ms
step 3250: train loss 7.5744, val loss 7.6056
saving checkpoint to out-shakespeare-char
iter 3250: loss 9.8402, time 2876.83ms
iter 3260: loss 8.7769, time 119.45ms
iter 3270: loss 9.3114, time 120.37ms
iter 3280: loss 8.5397, time 119.15ms
iter 3290: loss 9.0589, time 119.06ms
tensor(0.2591)
iter 3300: loss 9.5667, time 120.76ms
iter 3310: loss 9.7242, time 119.85ms
iter 3320: loss 9.3145, time 120.03ms
iter 3330: loss 9.4128, time 119.06ms
iter 3340: loss 9.2242, time 120.35ms
iter 3350: loss 9.3573, time 119.15ms
iter 3360: loss 9.2579, time 120.17ms
iter 3370: loss 9.0515, time 119.24ms
iter 3380: loss 9.1182, time 122.05ms
iter 3390: loss 9.3899, time 120.23ms
tensor(0.2321)
iter 3400: loss 9.6826, time 118.97ms
iter 3410: loss 9.1085, time 118.81ms
iter 3420: loss 9.2358, time 120.66ms
iter 3430: loss 8.7675, time 119.31ms
iter 3440: loss 9.4141, time 119.98ms
iter 3450: loss 9.2559, time 120.31ms
iter 3460: loss 8.6792, time 120.67ms
iter 3470: loss 9.5929, time 119.28ms
iter 3480: loss 9.5993, time 119.00ms
iter 3490: loss 9.4196, time 119.07ms
tensor(0.2061)
step 3500: train loss 7.6059, val loss 7.5842
saving checkpoint to out-shakespeare-char
iter 3500: loss 9.2125, time 2845.18ms
iter 3510: loss 8.8037, time 119.35ms
iter 3520: loss 8.5782, time 121.40ms
iter 3530: loss 9.3458, time 119.26ms
iter 3540: loss 8.5939, time 119.17ms
iter 3550: loss 9.6821, time 121.97ms
iter 3560: loss 8.9009, time 118.98ms
iter 3570: loss 9.7085, time 118.98ms
iter 3580: loss 9.1985, time 121.87ms
iter 3590: loss 8.8931, time 119.22ms
tensor(0.1813)
iter 3600: loss 9.7911, time 119.32ms
iter 3610: loss 9.3805, time 119.79ms
iter 3620: loss 8.6650, time 120.53ms
iter 3630: loss 8.3721, time 119.39ms
iter 3640: loss 8.1625, time 118.97ms
iter 3650: loss 8.7298, time 119.99ms
iter 3660: loss 8.9795, time 119.33ms
iter 3670: loss 9.7783, time 121.59ms
iter 3680: loss 9.2260, time 120.10ms
iter 3690: loss 9.2016, time 120.17ms
tensor(0.1577)
iter 3700: loss 9.3319, time 119.34ms
iter 3710: loss 9.0627, time 121.59ms
iter 3720: loss 9.6574, time 119.41ms
iter 3730: loss 8.9532, time 119.27ms
iter 3740: loss 8.7070, time 120.38ms
step 3750: train loss 7.5894, val loss 7.6273
saving checkpoint to out-shakespeare-char
iter 3750: loss 8.6751, time 2869.00ms
iter 3760: loss 8.6957, time 122.96ms
iter 3770: loss 8.9946, time 122.93ms
iter 3780: loss 9.2150, time 122.62ms
iter 3790: loss 9.1412, time 122.99ms
tensor(0.1355)
iter 3800: loss 8.6840, time 122.98ms
iter 3810: loss 9.2022, time 122.92ms
iter 3820: loss 8.4680, time 122.81ms
iter 3830: loss 9.2231, time 122.73ms
iter 3840: loss 10.0259, time 123.05ms
iter 3850: loss 8.7353, time 122.85ms
iter 3860: loss 8.8306, time 122.87ms
iter 3870: loss 9.5082, time 120.84ms
iter 3880: loss 8.7162, time 123.30ms
iter 3890: loss 9.5798, time 122.71ms
tensor(0.1147)
iter 3900: loss 8.9231, time 123.01ms
iter 3910: loss 8.7697, time 122.86ms
iter 3920: loss 9.3719, time 123.64ms
iter 3930: loss 9.1486, time 123.04ms
iter 3940: loss 8.9254, time 118.97ms
iter 3950: loss 8.9012, time 119.54ms
iter 3960: loss 9.4526, time 120.82ms
iter 3970: loss 8.9108, time 126.08ms
iter 3980: loss 9.0719, time 123.44ms
iter 3990: loss 8.8480, time 123.41ms
tensor(0.0955)
step 4000: train loss 7.6162, val loss 7.5896
saving checkpoint to out-shakespeare-char
iter 4000: loss 9.4722, time 2861.12ms
iter 4010: loss 9.1643, time 123.71ms
iter 4020: loss 9.1458, time 123.01ms
iter 4030: loss 9.1701, time 123.75ms
iter 4040: loss 9.2979, time 125.04ms
iter 4050: loss 8.6527, time 123.08ms
iter 4060: loss 8.3432, time 123.95ms
iter 4070: loss 9.1074, time 123.56ms
iter 4080: loss 8.8379, time 122.98ms
iter 4090: loss 9.2217, time 122.95ms
tensor(0.0778)
iter 4100: loss 9.3126, time 123.38ms
iter 4110: loss 8.7239, time 122.93ms
iter 4120: loss 8.3117, time 123.10ms
iter 4130: loss 9.4440, time 123.22ms
iter 4140: loss 8.9126, time 123.18ms
iter 4150: loss 9.5072, time 122.68ms
iter 4160: loss 8.8074, time 123.44ms
iter 4170: loss 8.8377, time 123.12ms
iter 4180: loss 9.7804, time 123.21ms
iter 4190: loss 8.8735, time 122.46ms
tensor(0.0618)
iter 4200: loss 8.8329, time 123.07ms
iter 4210: loss 9.3840, time 123.11ms
iter 4220: loss 8.8122, time 123.01ms
iter 4230: loss 9.1606, time 123.16ms
iter 4240: loss 9.5932, time 122.80ms
step 4250: train loss 7.5787, val loss 7.5504
saving checkpoint to out-shakespeare-char
iter 4250: loss 9.0833, time 2856.50ms
iter 4260: loss 8.7900, time 123.08ms
iter 4270: loss 9.1590, time 123.80ms
iter 4280: loss 9.8938, time 123.00ms
iter 4290: loss 9.0764, time 122.39ms
tensor(0.0476)
iter 4300: loss 8.3387, time 123.71ms
iter 4310: loss 9.6460, time 122.99ms
iter 4320: loss 9.0449, time 122.74ms
iter 4330: loss 8.6606, time 123.18ms
iter 4340: loss 8.4274, time 123.62ms
iter 4350: loss 9.2830, time 123.35ms
iter 4360: loss 9.4627, time 122.81ms
iter 4370: loss 8.8656, time 123.58ms
iter 4380: loss 8.6686, time 123.40ms
iter 4390: loss 9.2996, time 123.16ms
tensor(0.0351)
iter 4400: loss 8.0083, time 123.53ms
iter 4410: loss 9.1013, time 123.04ms
iter 4420: loss 9.1812, time 122.33ms
iter 4430: loss 8.3924, time 124.12ms
iter 4440: loss 9.7054, time 123.12ms
iter 4450: loss 9.4592, time 123.41ms
iter 4460: loss 9.0734, time 123.40ms
iter 4470: loss 9.4358, time 123.21ms
iter 4480: loss 9.7701, time 123.74ms
iter 4490: loss 8.9795, time 123.36ms
tensor(0.0245)
step 4500: train loss 7.6084, val loss 7.6232
saving checkpoint to out-shakespeare-char
iter 4500: loss 8.9600, time 2884.49ms
iter 4510: loss 8.9832, time 123.04ms
iter 4520: loss 9.0012, time 124.77ms
iter 4530: loss 8.9884, time 122.66ms
iter 4540: loss 9.0402, time 123.13ms
iter 4550: loss 8.5254, time 123.12ms
iter 4560: loss 8.5615, time 123.10ms
iter 4570: loss 9.5645, time 123.02ms
iter 4580: loss 9.4025, time 122.95ms
iter 4590: loss 8.7298, time 123.14ms
tensor(0.0157)
iter 4600: loss 8.6858, time 122.86ms
iter 4610: loss 9.4167, time 123.07ms
iter 4620: loss 8.7979, time 122.99ms
iter 4630: loss 8.2842, time 122.71ms
iter 4640: loss 9.2144, time 122.93ms
iter 4650: loss 9.5603, time 123.29ms
iter 4660: loss 9.4303, time 118.49ms
iter 4670: loss 9.0752, time 117.12ms
iter 4680: loss 8.6467, time 116.79ms
iter 4690: loss 8.9944, time 118.37ms
tensor(0.0089)
iter 4700: loss 9.0592, time 117.11ms
iter 4710: loss 9.7011, time 117.02ms
iter 4720: loss 9.1227, time 117.89ms
iter 4730: loss 9.3815, time 117.64ms
iter 4740: loss 9.0036, time 117.36ms
step 4750: train loss 7.5810, val loss 7.5834
saving checkpoint to out-shakespeare-char
iter 4750: loss 8.9076, time 2888.50ms
iter 4760: loss 9.8130, time 123.14ms
iter 4770: loss 9.2251, time 122.94ms
iter 4780: loss 8.7690, time 122.34ms
iter 4790: loss 9.0445, time 122.93ms
tensor(0.0039)
iter 4800: loss 9.1999, time 122.91ms
iter 4810: loss 9.1053, time 123.46ms
iter 4820: loss 9.4201, time 122.98ms
iter 4830: loss 9.3047, time 122.74ms
iter 4840: loss 8.9875, time 122.55ms
iter 4850: loss 9.1849, time 123.56ms
iter 4860: loss 8.8730, time 123.00ms
iter 4870: loss 8.8437, time 123.44ms
iter 4880: loss 9.4508, time 123.12ms
iter 4890: loss 9.0041, time 123.01ms
tensor(0.0010)
iter 4900: loss 8.7453, time 122.87ms
iter 4910: loss 9.1228, time 123.07ms
iter 4920: loss 9.5832, time 123.06ms
iter 4930: loss 9.3695, time 123.52ms
iter 4940: loss 8.5956, time 124.10ms
iter 4950: loss 9.1003, time 123.13ms
iter 4960: loss 9.3414, time 122.93ms
iter 4970: loss 8.1793, time 122.95ms
iter 4980: loss 9.4639, time 123.12ms
iter 4990: loss 9.0885, time 123.40ms
tensor(0.0010)
step 5000: train loss 7.5951, val loss 7.5024
saving checkpoint to out-shakespeare-char
iter 5000: loss 9.0781, time 2859.51ms
iter 5010: loss 9.7161, time 120.95ms
iter 5020: loss 8.9287, time 119.64ms
iter 5030: loss 8.8938, time 120.72ms
iter 5040: loss 9.2956, time 119.76ms
iter 5050: loss 10.0378, time 119.44ms
iter 5060: loss 8.8334, time 119.47ms
iter 5070: loss 9.7566, time 119.76ms
iter 5080: loss 8.9309, time 119.49ms
iter 5090: loss 8.5781, time 120.61ms
tensor(0.0010)
iter 5100: loss 8.2763, time 119.58ms
iter 5110: loss 9.1256, time 121.81ms
iter 5120: loss 9.0596, time 120.47ms
iter 5130: loss 10.0371, time 120.57ms
iter 5140: loss 8.7960, time 120.72ms
iter 5150: loss 8.5692, time 119.68ms
iter 5160: loss 9.1406, time 121.67ms
iter 5170: loss 9.8337, time 120.49ms
iter 5180: loss 8.3057, time 119.84ms
iter 5190: loss 8.9858, time 119.41ms
tensor(0.0039)
iter 5200: loss 8.7898, time 121.19ms
iter 5210: loss 8.9742, time 119.34ms
iter 5220: loss 8.9863, time 120.62ms
iter 5230: loss 8.7331, time 121.13ms
iter 5240: loss 8.2138, time 121.27ms
step 5250: train loss 7.5945, val loss 7.5347
saving checkpoint to out-shakespeare-char
iter 5250: loss 9.2235, time 2882.01ms
iter 5260: loss 8.4083, time 120.06ms
iter 5270: loss 9.4414, time 119.74ms
iter 5280: loss 8.9146, time 119.55ms
iter 5290: loss 9.6661, time 119.56ms
tensor(0.0089)
iter 5300: loss 8.8280, time 119.72ms
iter 5310: loss 9.4265, time 120.27ms
iter 5320: loss 8.7021, time 121.78ms
iter 5330: loss 9.6458, time 119.65ms
iter 5340: loss 8.9619, time 119.75ms
iter 5350: loss 9.5790, time 120.65ms
iter 5360: loss 8.8091, time 119.67ms
iter 5370: loss 9.2250, time 119.88ms
iter 5380: loss 8.3130, time 119.37ms
iter 5390: loss 9.0870, time 120.68ms
tensor(0.0157)
iter 5400: loss 9.0548, time 120.78ms
iter 5410: loss 9.2714, time 119.43ms
iter 5420: loss 8.7338, time 120.85ms
iter 5430: loss 9.4193, time 120.69ms
iter 5440: loss 9.1133, time 121.91ms
iter 5450: loss 9.3936, time 119.39ms
iter 5460: loss 8.9568, time 119.84ms
iter 5470: loss 9.2597, time 119.37ms
iter 5480: loss 9.7439, time 118.63ms
iter 5490: loss 9.2360, time 120.41ms
tensor(0.0245)
step 5500: train loss 7.6014, val loss 7.5707
saving checkpoint to out-shakespeare-char
iter 5500: loss 9.1438, time 2879.60ms
iter 5510: loss 8.7970, time 121.79ms
iter 5520: loss 8.4399, time 119.33ms
iter 5530: loss 9.2706, time 119.50ms
iter 5540: loss 8.7437, time 119.30ms
iter 5550: loss 8.3892, time 119.45ms
iter 5560: loss 8.6804, time 119.55ms
iter 5570: loss 8.6111, time 119.91ms
iter 5580: loss 8.5023, time 119.25ms
iter 5590: loss 9.0995, time 119.28ms
tensor(0.0351)
iter 5600: loss 8.9985, time 120.73ms
iter 5610: loss 9.2183, time 119.38ms
iter 5620: loss 9.3670, time 120.56ms
iter 5630: loss 8.3492, time 122.01ms
iter 5640: loss 8.7790, time 119.38ms
iter 5650: loss 8.6466, time 118.94ms
iter 5660: loss 9.6110, time 121.97ms
iter 5670: loss 8.5244, time 120.56ms
iter 5680: loss 9.4221, time 119.32ms
iter 5690: loss 8.0482, time 120.88ms
tensor(0.0476)
iter 5700: loss 8.3412, time 119.60ms
iter 5710: loss 9.2793, time 119.51ms
iter 5720: loss 9.4608, time 120.63ms
iter 5730: loss 9.2352, time 119.73ms
iter 5740: loss 9.4611, time 119.53ms
step 5750: train loss 7.5416, val loss 7.5533
saving checkpoint to out-shakespeare-char
iter 5750: loss 9.4435, time 2882.85ms
iter 5760: loss 9.5658, time 120.51ms
iter 5770: loss 9.2352, time 121.68ms
iter 5780: loss 9.4998, time 119.56ms
iter 5790: loss 8.8984, time 119.35ms
tensor(0.0618)
iter 5800: loss 8.3871, time 119.75ms
iter 5810: loss 9.3350, time 121.93ms
iter 5820: loss 8.9198, time 119.80ms
iter 5830: loss 9.0851, time 119.70ms
iter 5840: loss 9.5201, time 118.93ms
iter 5850: loss 9.2350, time 119.60ms
iter 5860: loss 9.2688, time 119.52ms
iter 5870: loss 9.1462, time 119.66ms
iter 5880: loss 8.8030, time 120.69ms
iter 5890: loss 8.5752, time 120.87ms
tensor(0.0778)
iter 5900: loss 9.0327, time 119.38ms
iter 5910: loss 9.0608, time 119.71ms
iter 5920: loss 9.9019, time 120.48ms
iter 5930: loss 8.4668, time 119.52ms
iter 5940: loss 9.2325, time 120.22ms
iter 5950: loss 9.0311, time 119.73ms
iter 5960: loss 8.5307, time 120.87ms
iter 5970: loss 8.9537, time 119.92ms
iter 5980: loss 8.7668, time 119.70ms
iter 5990: loss 9.5244, time 118.99ms
tensor(0.0955)
step 6000: train loss 7.5433, val loss 7.5872
saving checkpoint to out-shakespeare-char
iter 6000: loss 8.6797, time 2891.93ms
iter 6010: loss 8.5587, time 119.49ms
iter 6020: loss 8.8764, time 119.82ms
iter 6030: loss 9.4419, time 120.79ms
iter 6040: loss 8.6693, time 119.61ms
iter 6050: loss 9.5641, time 118.82ms
iter 6060: loss 9.0876, time 119.61ms
iter 6070: loss 9.1293, time 119.84ms
iter 6080: loss 9.1159, time 119.84ms
iter 6090: loss 9.7636, time 120.01ms
tensor(0.1147)
iter 6100: loss 9.1601, time 119.96ms
iter 6110: loss 8.9509, time 119.75ms
iter 6120: loss 8.8457, time 120.77ms
iter 6130: loss 9.3510, time 119.64ms
iter 6140: loss 9.3192, time 119.77ms
iter 6150: loss 9.5350, time 119.30ms
iter 6160: loss 8.5236, time 119.69ms
iter 6170: loss 9.2440, time 120.85ms
iter 6180: loss 9.9094, time 120.08ms
iter 6190: loss 9.0745, time 119.55ms
tensor(0.1355)
iter 6200: loss 8.3371, time 121.30ms
iter 6210: loss 9.4086, time 119.60ms
iter 6220: loss 9.5120, time 119.66ms
iter 6230: loss 8.5428, time 119.88ms
iter 6240: loss 9.3601, time 120.74ms
step 6250: train loss 7.5844, val loss 7.5627
saving checkpoint to out-shakespeare-char
iter 6250: loss 9.0763, time 2889.24ms
iter 6260: loss 8.8365, time 119.55ms
iter 6270: loss 8.2413, time 119.48ms
iter 6280: loss 8.9159, time 121.09ms
iter 6290: loss 8.2560, time 119.74ms
tensor(0.1577)
iter 6300: loss 8.9414, time 120.75ms
iter 6310: loss 9.5031, time 119.46ms
iter 6320: loss 8.5475, time 119.86ms
iter 6330: loss 8.6134, time 119.68ms
iter 6340: loss 9.4891, time 119.59ms
iter 6350: loss 8.1599, time 119.63ms
iter 6360: loss 8.9232, time 120.32ms
iter 6370: loss 8.0847, time 120.94ms
iter 6380: loss 9.5133, time 119.69ms
iter 6390: loss 8.6821, time 120.63ms
tensor(0.1813)
iter 6400: loss 9.3281, time 120.80ms
iter 6410: loss 9.0527, time 119.44ms
iter 6420: loss 8.7941, time 119.79ms
iter 6430: loss 8.5225, time 120.73ms
iter 6440: loss 9.1299, time 119.84ms
iter 6450: loss 8.9892, time 119.53ms
iter 6460: loss 8.5181, time 120.04ms
iter 6470: loss 9.0145, time 120.78ms
iter 6480: loss 9.2028, time 120.51ms
iter 6490: loss 9.6251, time 120.86ms
tensor(0.2061)
step 6500: train loss 7.5432, val loss 7.5991
saving checkpoint to out-shakespeare-char
iter 6500: loss 9.6499, time 2895.15ms
iter 6510: loss 8.8206, time 121.67ms
iter 6520: loss 8.9121, time 123.51ms
iter 6530: loss 8.9703, time 122.51ms
iter 6540: loss 8.9199, time 123.17ms
iter 6550: loss 9.0948, time 123.22ms
iter 6560: loss 9.7347, time 121.42ms
iter 6570: loss 8.6248, time 123.86ms
iter 6580: loss 8.9816, time 123.14ms
iter 6590: loss 8.8084, time 122.83ms
tensor(0.2321)
iter 6600: loss 8.7213, time 121.62ms
iter 6610: loss 8.7109, time 123.01ms
iter 6620: loss 8.1205, time 122.89ms
iter 6630: loss 9.9175, time 122.65ms
iter 6640: loss 9.5525, time 123.15ms
iter 6650: loss 9.1295, time 122.93ms
iter 6660: loss 8.5794, time 123.16ms
iter 6670: loss 8.7022, time 123.10ms
iter 6680: loss 9.3451, time 122.86ms
iter 6690: loss 9.5126, time 123.05ms
tensor(0.2591)
iter 6700: loss 8.9449, time 123.33ms
iter 6710: loss 9.3070, time 123.17ms
iter 6720: loss 8.9047, time 122.07ms
iter 6730: loss 9.4805, time 123.05ms
iter 6740: loss 8.8169, time 122.64ms
step 6750: train loss 7.5136, val loss 7.5282
saving checkpoint to out-shakespeare-char
iter 6750: loss 9.2900, time 2863.86ms
iter 6760: loss 9.2403, time 123.56ms
iter 6770: loss 8.5098, time 122.64ms
iter 6780: loss 8.4508, time 122.97ms
iter 6790: loss 9.5387, time 123.03ms
tensor(0.2871)
iter 6800: loss 8.7900, time 122.65ms
iter 6810: loss 8.9398, time 122.92ms
iter 6820: loss 9.3159, time 122.73ms
iter 6830: loss 8.9777, time 122.71ms
iter 6840: loss 9.1389, time 122.60ms
iter 6850: loss 8.2329, time 122.59ms
iter 6860: loss 8.7753, time 122.54ms
iter 6870: loss 9.2685, time 122.62ms
iter 6880: loss 8.5252, time 121.40ms
iter 6890: loss 9.4032, time 122.81ms
tensor(0.3159)
iter 6900: loss 8.2594, time 122.56ms
iter 6910: loss 8.5505, time 122.71ms
iter 6920: loss 9.1306, time 122.56ms
iter 6930: loss 8.9035, time 122.53ms
iter 6940: loss 8.8713, time 122.53ms
iter 6950: loss 9.0677, time 122.49ms
iter 6960: loss 9.6477, time 122.46ms
iter 6970: loss 8.7615, time 122.59ms
iter 6980: loss 9.1774, time 122.66ms
iter 6990: loss 9.0225, time 122.67ms
tensor(0.3455)
step 7000: train loss 7.5235, val loss 7.5715
saving checkpoint to out-shakespeare-char
iter 7000: loss 8.3682, time 2865.91ms
iter 7010: loss 8.5334, time 123.21ms
iter 7020: loss 8.5195, time 123.07ms
iter 7030: loss 9.1838, time 123.08ms
iter 7040: loss 9.3301, time 122.69ms
iter 7050: loss 8.8168, time 123.93ms
iter 7060: loss 9.0340, time 123.09ms
iter 7070: loss 8.9373, time 122.70ms
iter 7080: loss 9.3560, time 122.69ms
iter 7090: loss 9.1004, time 122.69ms
tensor(0.3757)
iter 7100: loss 8.7004, time 122.69ms
iter 7110: loss 8.8205, time 123.01ms
iter 7120: loss 8.9486, time 123.08ms
iter 7130: loss 8.3602, time 122.76ms
iter 7140: loss 9.3983, time 122.86ms
iter 7150: loss 8.1117, time 122.80ms
iter 7160: loss 8.8890, time 123.07ms
iter 7170: loss 8.8388, time 123.03ms
iter 7180: loss 8.6036, time 121.63ms
iter 7190: loss 8.9819, time 120.00ms
tensor(0.4063)
iter 7200: loss 9.5599, time 119.67ms
iter 7210: loss 9.2277, time 119.43ms
iter 7220: loss 8.8175, time 120.41ms
iter 7230: loss 8.9668, time 119.40ms
iter 7240: loss 8.6942, time 121.57ms
step 7250: train loss 7.5240, val loss 7.5017
saving checkpoint to out-shakespeare-char
iter 7250: loss 8.5272, time 2860.94ms
iter 7260: loss 9.3417, time 123.04ms
iter 7270: loss 8.7197, time 123.00ms
iter 7280: loss 8.9789, time 122.42ms
iter 7290: loss 7.8844, time 122.94ms
tensor(0.4373)
iter 7300: loss 8.9486, time 122.94ms
iter 7310: loss 9.0807, time 122.69ms
iter 7320: loss 9.0532, time 121.91ms
iter 7330: loss 9.3056, time 122.51ms
iter 7340: loss 9.2097, time 122.45ms
iter 7350: loss 9.0720, time 123.18ms
iter 7360: loss 9.1293, time 122.53ms
iter 7370: loss 9.0065, time 122.62ms
iter 7380: loss 8.9713, time 122.51ms
iter 7390: loss 9.0408, time 122.22ms
tensor(0.4686)
iter 7400: loss 9.0307, time 122.45ms
iter 7410: loss 9.4807, time 122.34ms
iter 7420: loss 8.9466, time 122.60ms
iter 7430: loss 8.7642, time 122.44ms
iter 7440: loss 9.8410, time 122.38ms
iter 7450: loss 9.0546, time 122.55ms
iter 7460: loss 9.4028, time 122.61ms
iter 7470: loss 9.0521, time 122.53ms
iter 7480: loss 9.1792, time 122.33ms
iter 7490: loss 8.8300, time 123.58ms
tensor(0.5000)
step 7500: train loss 7.4890, val loss 7.4430
saving checkpoint to out-shakespeare-char
iter 7500: loss 8.7428, time 2830.88ms
iter 7510: loss 8.7506, time 123.22ms
iter 7520: loss 8.9957, time 122.55ms
iter 7530: loss 8.9383, time 122.79ms
iter 7540: loss 9.3035, time 123.19ms
iter 7550: loss 9.0101, time 123.23ms
iter 7560: loss 9.5470, time 123.13ms
iter 7570: loss 8.9166, time 122.64ms
iter 7580: loss 9.3741, time 123.00ms
iter 7590: loss 8.5009, time 123.23ms
tensor(0.5314)
iter 7600: loss 9.3439, time 123.16ms
iter 7610: loss 8.9835, time 123.06ms
iter 7620: loss 9.1414, time 121.48ms
iter 7630: loss 8.5827, time 122.75ms
iter 7640: loss 8.7930, time 123.06ms
iter 7650: loss 8.5600, time 123.02ms
iter 7660: loss 8.6178, time 122.85ms
iter 7670: loss 9.0470, time 124.17ms
iter 7680: loss 9.0398, time 123.25ms
iter 7690: loss 8.9669, time 122.92ms
tensor(0.5627)
iter 7700: loss 8.4971, time 122.98ms
iter 7710: loss 8.3834, time 122.70ms
iter 7720: loss 8.9896, time 122.77ms
iter 7730: loss 8.9092, time 122.75ms
iter 7740: loss 8.8918, time 123.03ms
step 7750: train loss 7.4449, val loss 7.4685
saving checkpoint to out-shakespeare-char
iter 7750: loss 9.6388, time 2852.27ms
iter 7760: loss 9.3776, time 122.57ms
iter 7770: loss 8.9989, time 122.73ms
iter 7780: loss 8.8240, time 123.98ms
iter 7790: loss 9.5256, time 122.94ms
tensor(0.5937)
iter 7800: loss 9.1208, time 122.60ms
iter 7810: loss 8.7096, time 121.28ms
iter 7820: loss 9.0180, time 122.65ms
iter 7830: loss 8.9678, time 122.54ms
iter 7840: loss 8.7525, time 121.91ms
iter 7850: loss 8.6821, time 123.24ms
iter 7860: loss 8.1841, time 123.07ms
iter 7870: loss 8.4632, time 123.20ms
iter 7880: loss 8.9798, time 123.08ms
iter 7890: loss 9.5335, time 122.71ms
tensor(0.6243)
iter 7900: loss 8.7421, time 122.08ms
iter 7910: loss 8.8602, time 117.53ms
iter 7920: loss 9.4307, time 118.33ms
iter 7930: loss 8.9001, time 120.69ms
iter 7940: loss 9.2137, time 119.29ms
iter 7950: loss 9.2673, time 119.31ms
iter 7960: loss 9.2554, time 119.43ms
iter 7970: loss 9.6758, time 119.94ms
iter 7980: loss 9.2136, time 119.61ms
iter 7990: loss 9.0353, time 119.33ms
tensor(0.6545)
step 8000: train loss 7.4449, val loss 7.4712
saving checkpoint to out-shakespeare-char
iter 8000: loss 8.7643, time 2858.06ms
iter 8010: loss 8.9213, time 120.10ms
iter 8020: loss 9.0240, time 118.87ms
iter 8030: loss 8.9116, time 119.16ms
iter 8040: loss 8.1992, time 119.04ms
iter 8050: loss 9.0238, time 118.97ms
iter 8060: loss 8.9508, time 120.09ms
iter 8070: loss 8.7938, time 121.35ms
iter 8080: loss 8.8330, time 118.96ms
iter 8090: loss 9.4271, time 118.94ms
tensor(0.6841)
iter 8100: loss 9.3274, time 120.80ms
iter 8110: loss 8.1248, time 121.58ms
iter 8120: loss 8.9144, time 121.86ms
iter 8130: loss 8.6936, time 119.13ms
iter 8140: loss 8.5277, time 120.14ms
iter 8150: loss 8.4810, time 118.76ms
iter 8160: loss 8.6994, time 120.08ms
iter 8170: loss 9.0299, time 118.79ms
iter 8180: loss 8.2305, time 120.51ms
iter 8190: loss 9.4456, time 119.13ms
tensor(0.7129)
iter 8200: loss 8.2961, time 119.09ms
iter 8210: loss 8.8437, time 118.86ms
iter 8220: loss 8.7777, time 120.35ms
iter 8230: loss 8.3382, time 123.15ms
iter 8240: loss 8.9488, time 120.08ms
step 8250: train loss 7.4516, val loss 7.4352
saving checkpoint to out-shakespeare-char
iter 8250: loss 9.0801, time 2866.71ms
iter 8260: loss 9.2977, time 122.84ms
iter 8270: loss 8.6830, time 122.84ms
iter 8280: loss 8.6712, time 122.22ms
iter 8290: loss 8.4952, time 122.55ms
tensor(0.7409)
iter 8300: loss 9.2003, time 123.00ms
iter 8310: loss 9.0238, time 119.44ms
iter 8320: loss 8.2910, time 119.54ms
iter 8330: loss 8.7425, time 120.78ms
iter 8340: loss 8.8602, time 119.77ms
iter 8350: loss 8.8947, time 119.63ms
iter 8360: loss 9.1983, time 119.49ms
iter 8370: loss 8.4198, time 120.64ms
iter 8380: loss 9.4107, time 119.63ms
iter 8390: loss 8.3401, time 120.96ms
tensor(0.7679)
iter 8400: loss 8.4406, time 119.58ms
iter 8410: loss 8.7114, time 119.53ms
iter 8420: loss 8.6057, time 119.48ms
iter 8430: loss 9.4763, time 120.78ms
iter 8440: loss 8.8880, time 121.01ms
iter 8450: loss 9.0896, time 119.63ms
iter 8460: loss 8.8644, time 120.68ms
iter 8470: loss 8.8513, time 120.76ms
iter 8480: loss 9.0672, time 119.56ms
iter 8490: loss 8.5565, time 118.72ms
tensor(0.7939)
step 8500: train loss 7.3706, val loss 7.3998
saving checkpoint to out-shakespeare-char
iter 8500: loss 8.6614, time 2870.21ms
iter 8510: loss 9.0908, time 119.41ms
iter 8520: loss 9.1438, time 119.32ms
iter 8530: loss 9.0449, time 119.10ms
iter 8540: loss 9.4173, time 121.03ms
iter 8550: loss 9.3612, time 121.61ms
iter 8560: loss 8.5650, time 119.09ms
iter 8570: loss 8.8317, time 118.96ms
iter 8580: loss 8.0540, time 119.14ms
iter 8590: loss 9.1799, time 121.37ms
tensor(0.8187)
iter 8600: loss 8.8291, time 118.30ms
iter 8610: loss 8.8138, time 119.14ms
iter 8620: loss 8.2625, time 120.54ms
iter 8630: loss 9.2311, time 121.42ms
iter 8640: loss 9.6533, time 119.75ms
iter 8650: loss 8.5681, time 118.97ms
iter 8660: loss 9.1701, time 118.74ms
iter 8670: loss 8.5945, time 121.35ms
iter 8680: loss 9.4456, time 119.76ms
iter 8690: loss 8.9104, time 120.35ms
tensor(0.8423)
iter 8700: loss 9.2847, time 119.14ms
iter 8710: loss 8.3733, time 121.37ms
iter 8720: loss 8.9130, time 121.45ms
iter 8730: loss 8.5786, time 119.62ms
iter 8740: loss 8.7143, time 119.85ms
step 8750: train loss 7.4264, val loss 7.3611
saving checkpoint to out-shakespeare-char
iter 8750: loss 9.1752, time 2853.82ms
iter 8760: loss 8.7826, time 122.63ms
iter 8770: loss 8.6464, time 122.47ms
iter 8780: loss 9.1413, time 122.42ms
iter 8790: loss 8.6674, time 122.87ms
tensor(0.8645)
iter 8800: loss 8.0856, time 122.98ms
iter 8810: loss 8.7343, time 124.33ms
iter 8820: loss 9.3623, time 120.80ms
iter 8830: loss 8.2551, time 120.53ms
iter 8840: loss 8.1740, time 121.67ms
iter 8850: loss 8.3788, time 119.24ms
iter 8860: loss 8.7589, time 120.46ms
iter 8870: loss 9.1006, time 120.90ms
iter 8880: loss 8.4398, time 120.36ms
iter 8890: loss 9.3394, time 119.43ms
tensor(0.8853)
iter 8900: loss 8.8240, time 119.26ms
iter 8910: loss 9.0343, time 119.31ms
iter 8920: loss 9.3375, time 120.18ms
iter 8930: loss 8.2534, time 119.79ms
iter 8940: loss 9.1280, time 121.87ms
iter 8950: loss 9.1974, time 119.56ms
iter 8960: loss 9.5597, time 119.33ms
iter 8970: loss 8.7729, time 120.57ms
iter 8980: loss 8.0482, time 121.71ms
iter 8990: loss 8.8092, time 120.35ms
tensor(0.9045)
step 9000: train loss 7.3527, val loss 7.3311
saving checkpoint to out-shakespeare-char
iter 9000: loss 8.4024, time 2877.07ms
iter 9010: loss 9.3290, time 119.34ms
iter 9020: loss 8.5761, time 119.24ms
iter 9030: loss 8.6228, time 119.13ms
iter 9040: loss 9.1050, time 120.30ms
iter 9050: loss 8.5511, time 117.98ms
iter 9060: loss 8.6217, time 122.12ms
iter 9070: loss 9.4810, time 120.40ms
iter 9080: loss 8.9059, time 119.58ms
iter 9090: loss 8.4941, time 121.93ms
tensor(0.9222)
iter 9100: loss 8.9249, time 118.55ms
iter 9110: loss 8.5376, time 119.64ms
iter 9120: loss 9.2778, time 119.63ms
iter 9130: loss 8.3852, time 121.16ms
iter 9140: loss 9.1240, time 119.59ms
iter 9150: loss 8.3837, time 119.62ms
iter 9160: loss 9.5856, time 119.58ms
iter 9170: loss 8.4324, time 121.08ms
iter 9180: loss 9.6203, time 120.17ms
iter 9190: loss 8.2883, time 119.24ms
tensor(0.9382)
iter 9200: loss 8.2379, time 119.66ms
iter 9210: loss 7.9373, time 120.14ms
iter 9220: loss 9.4645, time 120.84ms
iter 9230: loss 8.8825, time 119.39ms
iter 9240: loss 8.3473, time 121.58ms
step 9250: train loss 7.2536, val loss 7.3282
saving checkpoint to out-shakespeare-char
iter 9250: loss 9.0212, time 2870.35ms
iter 9260: loss 9.0542, time 118.82ms
iter 9270: loss 8.9200, time 118.65ms
iter 9280: loss 8.8247, time 121.68ms
iter 9290: loss 9.3129, time 118.56ms
tensor(0.9524)
iter 9300: loss 8.6505, time 119.70ms
iter 9310: loss 8.8926, time 119.36ms
iter 9320: loss 8.5512, time 119.57ms
iter 9330: loss 8.9775, time 122.08ms
iter 9340: loss 8.9363, time 119.26ms
iter 9350: loss 7.8405, time 119.07ms
iter 9360: loss 8.6633, time 119.53ms
iter 9370: loss 8.3761, time 120.79ms
iter 9380: loss 7.9773, time 119.55ms
iter 9390: loss 8.3491, time 120.52ms
tensor(0.9649)
iter 9400: loss 9.2369, time 118.74ms
iter 9410: loss 8.8347, time 120.81ms
iter 9420: loss 8.5163, time 120.47ms
iter 9430: loss 8.8525, time 123.03ms
iter 9440: loss 8.5542, time 123.11ms
iter 9450: loss 8.3684, time 123.00ms
iter 9460: loss 8.5369, time 124.34ms
iter 9470: loss 8.6617, time 122.97ms
iter 9480: loss 8.5954, time 122.08ms
iter 9490: loss 8.9589, time 119.92ms
tensor(0.9755)
step 9500: train loss 7.3686, val loss 7.3639
saving checkpoint to out-shakespeare-char
iter 9500: loss 8.8687, time 2877.16ms
iter 9510: loss 8.8142, time 121.17ms
iter 9520: loss 8.0824, time 120.58ms
iter 9530: loss 8.7759, time 119.09ms
iter 9540: loss 8.8042, time 119.36ms
iter 9550: loss 8.8528, time 121.55ms
iter 9560: loss 8.7464, time 119.61ms
iter 9570: loss 8.5885, time 119.21ms
iter 9580: loss 8.5901, time 122.18ms
iter 9590: loss 8.8274, time 119.34ms
tensor(0.9843)
iter 9600: loss 8.7512, time 119.34ms
iter 9610: loss 8.1601, time 122.03ms
iter 9620: loss 8.3462, time 119.31ms
iter 9630: loss 8.9698, time 119.35ms
iter 9640: loss 8.7792, time 121.28ms
iter 9650: loss 9.1651, time 119.58ms
iter 9660: loss 8.4523, time 119.87ms
iter 9670: loss 8.9347, time 121.86ms
iter 9680: loss 8.6709, time 120.55ms
iter 9690: loss 8.9706, time 119.27ms
tensor(0.9911)
iter 9700: loss 8.9813, time 121.33ms
iter 9710: loss 8.9684, time 123.59ms
iter 9720: loss 9.4827, time 120.68ms
iter 9730: loss 8.9688, time 120.59ms
iter 9740: loss 8.7038, time 119.75ms
step 9750: train loss 7.2705, val loss 7.2552
saving checkpoint to out-shakespeare-char
iter 9750: loss 9.3472, time 2881.62ms
iter 9760: loss 9.4388, time 124.26ms
iter 9770: loss 9.2368, time 125.04ms
iter 9780: loss 9.1035, time 123.21ms
iter 9790: loss 8.8658, time 123.34ms
tensor(0.9961)
iter 9800: loss 8.8319, time 123.65ms
iter 9810: loss 8.4779, time 123.57ms
iter 9820: loss 8.6095, time 123.38ms
iter 9830: loss 8.1100, time 123.17ms
iter 9840: loss 8.5804, time 123.55ms
iter 9850: loss 8.5876, time 122.61ms
iter 9860: loss 9.1379, time 123.82ms
iter 9870: loss 9.2414, time 123.12ms
iter 9880: loss 8.3589, time 123.16ms
iter 9890: loss 8.7743, time 122.38ms
tensor(0.9990)
iter 9900: loss 8.6191, time 122.73ms
iter 9910: loss 8.0912, time 122.69ms
iter 9920: loss 8.8400, time 123.13ms
iter 9930: loss 8.8082, time 122.78ms
iter 9940: loss 8.5371, time 123.45ms
iter 9950: loss 8.0673, time 123.13ms
iter 9960: loss 8.3527, time 122.67ms
iter 9970: loss 9.0041, time 123.72ms
iter 9980: loss 8.8832, time 122.77ms
iter 9990: loss 9.2074, time 123.11ms
tensor(1.)
step 10000: train loss 7.2218, val loss 7.2569
saving checkpoint to out-shakespeare-char
iter 10000: loss 8.7493, time 2873.46ms
iter 10010: loss 8.6571, time 123.18ms
iter 10020: loss 8.4844, time 122.52ms
iter 10030: loss 8.7435, time 122.68ms
iter 10040: loss 8.8173, time 122.67ms
iter 10050: loss 9.1490, time 122.79ms
iter 10060: loss 9.0747, time 122.89ms
iter 10070: loss 9.1227, time 122.87ms
iter 10080: loss 9.2455, time 122.81ms
iter 10090: loss 8.7509, time 122.83ms
tensor(0.9990)
iter 10100: loss 8.9667, time 122.41ms
iter 10110: loss 8.5737, time 123.07ms
iter 10120: loss 8.6150, time 122.87ms
iter 10130: loss 9.0981, time 123.04ms
iter 10140: loss 8.5715, time 122.63ms
iter 10150: loss 8.5412, time 122.83ms
iter 10160: loss 8.3685, time 122.58ms
iter 10170: loss 8.8134, time 122.01ms
iter 10180: loss 8.9727, time 122.94ms
iter 10190: loss 8.1685, time 121.93ms
tensor(0.9961)
iter 10200: loss 9.5758, time 122.40ms
iter 10210: loss 8.3822, time 122.61ms
iter 10220: loss 9.0183, time 122.18ms
iter 10230: loss 8.3537, time 123.72ms
iter 10240: loss 9.1867, time 122.81ms
step 10250: train loss 7.2418, val loss 7.2302
saving checkpoint to out-shakespeare-char
iter 10250: loss 8.5701, time 2887.16ms
iter 10260: loss 9.2433, time 122.65ms
iter 10270: loss 7.9537, time 123.45ms
iter 10280: loss 8.6193, time 123.76ms
iter 10290: loss 9.0051, time 125.27ms
tensor(0.9911)
iter 10300: loss 8.8034, time 122.38ms
iter 10310: loss 8.4168, time 122.10ms
iter 10320: loss 8.2294, time 123.08ms
iter 10330: loss 8.8798, time 122.90ms
iter 10340: loss 7.7863, time 122.71ms
iter 10350: loss 8.7082, time 122.98ms
iter 10360: loss 8.3413, time 122.60ms
iter 10370: loss 8.5344, time 123.25ms
iter 10380: loss 9.0954, time 123.44ms
iter 10390: loss 8.4191, time 123.85ms
tensor(0.9843)
iter 10400: loss 9.0039, time 122.70ms
iter 10410: loss 8.7002, time 122.71ms
iter 10420: loss 8.4462, time 122.98ms
iter 10430: loss 8.1992, time 122.81ms
iter 10440: loss 9.6416, time 122.69ms
iter 10450: loss 8.5935, time 122.80ms
iter 10460: loss 9.2275, time 122.83ms
iter 10470: loss 8.5634, time 122.71ms
iter 10480: loss 8.7265, time 122.72ms
iter 10490: loss 8.3790, time 122.91ms
tensor(0.9755)
step 10500: train loss 7.2688, val loss 7.2319
saving checkpoint to out-shakespeare-char
iter 10500: loss 8.9288, time 2882.10ms
iter 10510: loss 8.8881, time 123.13ms
iter 10520: loss 9.1548, time 122.90ms
iter 10530: loss 9.0819, time 122.74ms
iter 10540: loss 8.7248, time 122.68ms
iter 10550: loss 8.5727, time 122.91ms
iter 10560: loss 8.3086, time 122.88ms
iter 10570: loss 8.7849, time 122.88ms
iter 10580: loss 8.6977, time 122.75ms
iter 10590: loss 8.1718, time 122.87ms
tensor(0.9649)
iter 10600: loss 8.0600, time 122.81ms
iter 10610: loss 7.8892, time 122.89ms
iter 10620: loss 9.3859, time 123.18ms
iter 10630: loss 7.9254, time 122.95ms
iter 10640: loss 9.1951, time 123.04ms
iter 10650: loss 8.2167, time 119.26ms
iter 10660: loss 8.6893, time 120.31ms
iter 10670: loss 7.9126, time 119.75ms
iter 10680: loss 8.9406, time 119.27ms
iter 10690: loss 8.4470, time 120.47ms
tensor(0.9524)
iter 10700: loss 8.7511, time 121.11ms
iter 10710: loss 8.4774, time 120.39ms
iter 10720: loss 7.9743, time 120.25ms
iter 10730: loss 8.3881, time 119.13ms
iter 10740: loss 8.9713, time 119.09ms
step 10750: train loss 7.1766, val loss 7.1347
saving checkpoint to out-shakespeare-char
iter 10750: loss 8.7031, time 2862.89ms
iter 10760: loss 8.4510, time 119.09ms
iter 10770: loss 8.7966, time 118.99ms
iter 10780: loss 8.7654, time 119.13ms
iter 10790: loss 8.8003, time 121.15ms
tensor(0.9382)
iter 10800: loss 8.9321, time 120.20ms
iter 10810: loss 9.4241, time 118.99ms
iter 10820: loss 9.3490, time 118.86ms
iter 10830: loss 8.7675, time 119.17ms
iter 10840: loss 8.4179, time 119.26ms
iter 10850: loss 8.3172, time 121.25ms
iter 10860: loss 8.3743, time 120.35ms
iter 10870: loss 9.0486, time 120.29ms
iter 10880: loss 8.3713, time 119.15ms
iter 10890: loss 8.4809, time 119.08ms
tensor(0.9222)
iter 10900: loss 8.1295, time 119.13ms
iter 10910: loss 7.5592, time 120.39ms
iter 10920: loss 8.4508, time 121.50ms
iter 10930: loss 8.5935, time 119.06ms
iter 10940: loss 8.3873, time 119.54ms
iter 10950: loss 9.4363, time 119.81ms
iter 10960: loss 8.5997, time 118.90ms
iter 10970: loss 7.8878, time 120.68ms
iter 10980: loss 8.7761, time 119.08ms
iter 10990: loss 8.9343, time 118.25ms
tensor(0.9045)
step 11000: train loss 7.2062, val loss 7.1500
saving checkpoint to out-shakespeare-char
iter 11000: loss 8.6585, time 2875.25ms
iter 11010: loss 8.5441, time 120.47ms
iter 11020: loss 8.2904, time 121.42ms
iter 11030: loss 8.2715, time 119.39ms
iter 11040: loss 8.3164, time 120.34ms
iter 11050: loss 9.3194, time 120.94ms
iter 11060: loss 9.0340, time 119.43ms
iter 11070: loss 8.8797, time 120.40ms
iter 11080: loss 8.6255, time 119.30ms
iter 11090: loss 7.9881, time 120.45ms
tensor(0.8853)
iter 11100: loss 9.1003, time 120.79ms
iter 11110: loss 9.1464, time 121.32ms
iter 11120: loss 8.3741, time 119.34ms
iter 11130: loss 9.3861, time 119.15ms
iter 11140: loss 8.6977, time 119.07ms
iter 11150: loss 8.6807, time 120.47ms
iter 11160: loss 8.6171, time 119.25ms
iter 11170: loss 7.7410, time 120.59ms
iter 11180: loss 9.2268, time 120.28ms
iter 11190: loss 8.9669, time 120.54ms
tensor(0.8645)
iter 11200: loss 8.7436, time 118.77ms
iter 11210: loss 8.9655, time 119.26ms
iter 11220: loss 8.4735, time 119.34ms
iter 11230: loss 8.0376, time 120.46ms
iter 11240: loss 8.3291, time 119.26ms
step 11250: train loss 7.2111, val loss 7.1859
saving checkpoint to out-shakespeare-char
iter 11250: loss 9.2399, time 2877.02ms
iter 11260: loss 8.8574, time 118.51ms
iter 11270: loss 9.2580, time 120.70ms
iter 11280: loss 7.9236, time 120.64ms
iter 11290: loss 8.5891, time 120.35ms
tensor(0.8423)
iter 11300: loss 8.6250, time 120.87ms
iter 11310: loss 8.2036, time 119.17ms
iter 11320: loss 9.4705, time 119.35ms
iter 11330: loss 8.8585, time 118.70ms
iter 11340: loss 9.1814, time 119.35ms
iter 11350: loss 8.6780, time 119.26ms
iter 11360: loss 8.1752, time 120.38ms
iter 11370: loss 8.6551, time 120.97ms
iter 11380: loss 8.5548, time 119.20ms
iter 11390: loss 8.6598, time 120.34ms
tensor(0.8187)
iter 11400: loss 8.6257, time 120.77ms
iter 11410: loss 8.4142, time 119.76ms
iter 11420: loss 8.6776, time 118.73ms
iter 11430: loss 8.1243, time 119.12ms
iter 11440: loss 8.5958, time 118.99ms
iter 11450: loss 8.8461, time 120.14ms
iter 11460: loss 8.4492, time 119.09ms
iter 11470: loss 8.7977, time 120.70ms
iter 11480: loss 9.2369, time 120.95ms
iter 11490: loss 8.2814, time 119.72ms
tensor(0.7939)
step 11500: train loss 7.2087, val loss 7.1242
saving checkpoint to out-shakespeare-char
iter 11500: loss 8.8357, time 2855.52ms
iter 11510: loss 7.8358, time 120.54ms
iter 11520: loss 9.1570, time 119.95ms
iter 11530: loss 7.6053, time 119.37ms
iter 11540: loss 8.6341, time 120.79ms
iter 11550: loss 7.3682, time 118.96ms
iter 11560: loss 7.9596, time 119.26ms
iter 11570: loss 8.2478, time 119.29ms
iter 11580: loss 8.5981, time 120.49ms
iter 11590: loss 8.8000, time 120.51ms
tensor(0.7679)
iter 11600: loss 8.7015, time 120.56ms
iter 11610: loss 8.2857, time 120.52ms
iter 11620: loss 8.5219, time 118.65ms
iter 11630: loss 8.3001, time 119.58ms
iter 11640: loss 8.2746, time 119.38ms
iter 11650: loss 8.2847, time 119.48ms
iter 11660: loss 9.1984, time 119.36ms
iter 11670: loss 8.2919, time 120.47ms
iter 11680: loss 8.2561, time 121.44ms
iter 11690: loss 9.2151, time 119.21ms
tensor(0.7409)
iter 11700: loss 9.1943, time 119.51ms
iter 11710: loss 8.7389, time 119.29ms
iter 11720: loss 8.4810, time 117.92ms
iter 11730: loss 8.6495, time 118.93ms
iter 11740: loss 9.2844, time 119.45ms
step 11750: train loss 7.1459, val loss 7.0961
saving checkpoint to out-shakespeare-char
iter 11750: loss 8.1223, time 2867.09ms
iter 11760: loss 8.3085, time 121.35ms
iter 11770: loss 7.8111, time 119.24ms
iter 11780: loss 9.3377, time 119.15ms
iter 11790: loss 9.0104, time 119.45ms
tensor(0.7129)
iter 11800: loss 8.3405, time 119.23ms
iter 11810: loss 8.9509, time 118.97ms
iter 11820: loss 7.9700, time 119.16ms
iter 11830: loss 8.8116, time 120.39ms
iter 11840: loss 8.1379, time 120.30ms
iter 11850: loss 8.3576, time 120.45ms
iter 11860: loss 8.6249, time 119.31ms
iter 11870: loss 8.2198, time 119.28ms
iter 11880: loss 7.8580, time 119.10ms
iter 11890: loss 8.7816, time 119.40ms
tensor(0.6841)
iter 11900: loss 7.9721, time 120.77ms
iter 11910: loss 8.1116, time 119.55ms
iter 11920: loss 8.2067, time 119.15ms
iter 11930: loss 8.2573, time 121.24ms
iter 11940: loss 8.7402, time 120.43ms
iter 11950: loss 8.4244, time 119.11ms
iter 11960: loss 8.3471, time 119.06ms
iter 11970: loss 8.4457, time 118.97ms
iter 11980: loss 8.5082, time 119.39ms
iter 11990: loss 8.7439, time 120.75ms
tensor(0.6545)
step 12000: train loss 7.0452, val loss 7.1454
saving checkpoint to out-shakespeare-char
iter 12000: loss 8.8464, time 2907.16ms
iter 12010: loss 9.2663, time 124.95ms
iter 12020: loss 8.3495, time 123.12ms
iter 12030: loss 8.4029, time 122.84ms
iter 12040: loss 8.6840, time 122.20ms
iter 12050: loss 7.9719, time 122.80ms
iter 12060: loss 7.9658, time 122.68ms
iter 12070: loss 9.3008, time 122.88ms
iter 12080: loss 8.1618, time 122.99ms
iter 12090: loss 8.7250, time 122.79ms
tensor(0.6243)
iter 12100: loss 8.8366, time 123.03ms
iter 12110: loss 8.6218, time 122.85ms
iter 12120: loss 8.1454, time 122.72ms
iter 12130: loss 8.6400, time 122.79ms
iter 12140: loss 8.4010, time 123.57ms
iter 12150: loss 8.2136, time 122.86ms
iter 12160: loss 7.9581, time 122.92ms
iter 12170: loss 8.6675, time 123.03ms
iter 12180: loss 8.3781, time 123.83ms
iter 12190: loss 8.6168, time 123.02ms
tensor(0.5937)
iter 12200: loss 8.4683, time 123.32ms
iter 12210: loss 8.4202, time 122.72ms
iter 12220: loss 8.6534, time 122.81ms
iter 12230: loss 8.9799, time 123.06ms
iter 12240: loss 8.7420, time 122.88ms
step 12250: train loss 7.0945, val loss 7.1157
saving checkpoint to out-shakespeare-char
iter 12250: loss 8.4326, time 2869.26ms
iter 12260: loss 8.5695, time 122.94ms
iter 12270: loss 8.1713, time 123.02ms
iter 12280: loss 8.7058, time 123.04ms
iter 12290: loss 8.6433, time 122.77ms
tensor(0.5627)
iter 12300: loss 8.4609, time 123.10ms
iter 12310: loss 7.4879, time 122.83ms
iter 12320: loss 8.2649, time 123.12ms
iter 12330: loss 9.9241, time 122.96ms
iter 12340: loss 8.7139, time 123.11ms
iter 12350: loss 8.4458, time 122.35ms
iter 12360: loss 8.5410, time 122.87ms
iter 12370: loss 8.5057, time 122.76ms
iter 12380: loss 8.1108, time 123.03ms
iter 12390: loss 8.2843, time 122.69ms
tensor(0.5314)
iter 12400: loss 8.4294, time 122.69ms
iter 12410: loss 8.9534, time 122.66ms
iter 12420: loss 8.8786, time 122.52ms
iter 12430: loss 8.1075, time 122.42ms
iter 12440: loss 9.1508, time 122.29ms
iter 12450: loss 8.0978, time 122.83ms
iter 12460: loss 8.2084, time 122.33ms
iter 12470: loss 8.4379, time 122.96ms
iter 12480: loss 8.7685, time 122.85ms
iter 12490: loss 8.6145, time 122.83ms
tensor(0.5000)
step 12500: train loss 7.0942, val loss 7.0658
saving checkpoint to out-shakespeare-char
iter 12500: loss 8.2803, time 2856.69ms
iter 12510: loss 8.2211, time 122.96ms
iter 12520: loss 8.5055, time 122.98ms
iter 12530: loss 8.9944, time 122.26ms
iter 12540: loss 8.9545, time 122.63ms
iter 12550: loss 7.4252, time 121.97ms
iter 12560: loss 8.8695, time 122.22ms
iter 12570: loss 7.5835, time 123.02ms
iter 12580: loss 8.7317, time 122.31ms
iter 12590: loss 8.5420, time 122.29ms
tensor(0.4686)
iter 12600: loss 8.5666, time 122.55ms
iter 12610: loss 8.1974, time 122.48ms
iter 12620: loss 9.0612, time 122.40ms
iter 12630: loss 8.3218, time 122.42ms
iter 12640: loss 8.5960, time 121.81ms
iter 12650: loss 8.2805, time 122.22ms
iter 12660: loss 8.0955, time 122.62ms
iter 12670: loss 9.8658, time 122.60ms
iter 12680: loss 8.1300, time 122.54ms
iter 12690: loss 9.2040, time 123.94ms
tensor(0.4373)
iter 12700: loss 8.7896, time 122.86ms
iter 12710: loss 8.8430, time 122.58ms
iter 12720: loss 7.9336, time 123.10ms
iter 12730: loss 8.8176, time 122.85ms
iter 12740: loss 8.8227, time 122.69ms
step 12750: train loss 7.0279, val loss 7.1005
saving checkpoint to out-shakespeare-char
iter 12750: loss 8.5385, time 2850.18ms
iter 12760: loss 8.6501, time 122.68ms
iter 12770: loss 8.4998, time 123.01ms
iter 12780: loss 8.3217, time 121.69ms
iter 12790: loss 9.0422, time 122.98ms
tensor(0.4063)
iter 12800: loss 9.1311, time 122.85ms
iter 12810: loss 8.6618, time 122.92ms
iter 12820: loss 8.6065, time 122.71ms
iter 12830: loss 7.7003, time 122.84ms
iter 12840: loss 8.3043, time 122.90ms
iter 12850: loss 8.0109, time 122.72ms
iter 12860: loss 8.2922, time 122.91ms
iter 12870: loss 8.3816, time 123.04ms
iter 12880: loss 8.2710, time 122.97ms
iter 12890: loss 8.4312, time 123.05ms
tensor(0.3757)
iter 12900: loss 9.5297, time 122.92ms
iter 12910: loss 9.0823, time 122.84ms
iter 12920: loss 8.9703, time 123.13ms
iter 12930: loss 8.2662, time 122.84ms
iter 12940: loss 9.2726, time 123.11ms
iter 12950: loss 8.2271, time 123.05ms
iter 12960: loss 8.6783, time 123.75ms
iter 12970: loss 7.5209, time 122.79ms
iter 12980: loss 8.7234, time 123.17ms
iter 12990: loss 8.8856, time 122.98ms
tensor(0.3455)
step 13000: train loss 7.0841, val loss 7.0253
saving checkpoint to out-shakespeare-char
iter 13000: loss 8.5528, time 2859.50ms
iter 13010: loss 9.1063, time 121.08ms
iter 13020: loss 8.5248, time 119.54ms
iter 13030: loss 8.9282, time 119.28ms
iter 13040: loss 8.4705, time 121.09ms
iter 13050: loss 8.5972, time 119.39ms
iter 13060: loss 8.3012, time 119.26ms
iter 13070: loss 9.2245, time 119.33ms
iter 13080: loss 7.8303, time 121.36ms
iter 13090: loss 8.3197, time 119.26ms
tensor(0.3159)
iter 13100: loss 8.9489, time 119.15ms
iter 13110: loss 8.6210, time 119.61ms
iter 13120: loss 7.4990, time 120.59ms
iter 13130: loss 8.4809, time 120.81ms
iter 13140: loss 8.3810, time 120.17ms
iter 13150: loss 8.6171, time 118.75ms
iter 13160: loss 8.5340, time 120.58ms
iter 13170: loss 8.5921, time 119.88ms
iter 13180: loss 9.2318, time 121.79ms
iter 13190: loss 8.4724, time 119.08ms
tensor(0.2871)
iter 13200: loss 8.1460, time 120.46ms
iter 13210: loss 8.8441, time 119.78ms
iter 13220: loss 7.4820, time 120.14ms
iter 13230: loss 9.8285, time 119.21ms
iter 13240: loss 7.9578, time 120.40ms
step 13250: train loss 7.0630, val loss 7.0766
saving checkpoint to out-shakespeare-char
iter 13250: loss 8.9565, time 2859.72ms
iter 13260: loss 8.6063, time 119.18ms
iter 13270: loss 8.0026, time 119.27ms
iter 13280: loss 8.5398, time 119.08ms
iter 13290: loss 9.2406, time 121.19ms
tensor(0.2591)
iter 13300: loss 8.4722, time 121.02ms
iter 13310: loss 8.8103, time 121.23ms
iter 13320: loss 8.6862, time 119.11ms
iter 13330: loss 9.2477, time 119.32ms
iter 13340: loss 8.0622, time 120.92ms
iter 13350: loss 9.0436, time 120.40ms
iter 13360: loss 8.8626, time 120.95ms
iter 13370: loss 7.7997, time 118.23ms
iter 13380: loss 8.2423, time 119.50ms
iter 13390: loss 8.0371, time 119.38ms
tensor(0.2321)
iter 13400: loss 9.0615, time 118.00ms
iter 13410: loss 7.8689, time 119.89ms
iter 13420: loss 7.4226, time 119.10ms
iter 13430: loss 9.2447, time 123.14ms
iter 13440: loss 7.6877, time 119.72ms
iter 13450: loss 8.5909, time 119.63ms
iter 13460: loss 8.2987, time 119.10ms
iter 13470: loss 8.5481, time 120.20ms
iter 13480: loss 8.1222, time 121.97ms
iter 13490: loss 8.2830, time 121.23ms
tensor(0.2061)
step 13500: train loss 7.0457, val loss 7.0431
saving checkpoint to out-shakespeare-char
iter 13500: loss 8.2684, time 2878.59ms
iter 13510: loss 8.2815, time 123.13ms
iter 13520: loss 8.8487, time 122.94ms
iter 13530: loss 8.6430, time 123.03ms
iter 13540: loss 8.2250, time 123.21ms
iter 13550: loss 7.4197, time 122.90ms
iter 13560: loss 8.4799, time 122.77ms
iter 13570: loss 8.1151, time 123.05ms
iter 13580: loss 8.8091, time 122.95ms
iter 13590: loss 8.2273, time 123.83ms
tensor(0.1813)
iter 13600: loss 7.7383, time 122.96ms
iter 13610: loss 9.0672, time 121.94ms
iter 13620: loss 8.7988, time 122.99ms
iter 13630: loss 8.7889, time 123.06ms
iter 13640: loss 8.0815, time 123.12ms
iter 13650: loss 8.7880, time 122.75ms
iter 13660: loss 9.0125, time 122.81ms
iter 13670: loss 7.9904, time 123.21ms
iter 13680: loss 8.8978, time 122.87ms
iter 13690: loss 8.2772, time 122.77ms
tensor(0.1577)
iter 13700: loss 8.9530, time 123.04ms
iter 13710: loss 7.9585, time 122.50ms
iter 13720: loss 8.2860, time 122.65ms
iter 13730: loss 8.6915, time 123.02ms
iter 13740: loss 8.0340, time 121.50ms
step 13750: train loss 7.0629, val loss 7.0816
saving checkpoint to out-shakespeare-char
iter 13750: loss 8.2066, time 2863.37ms
iter 13760: loss 9.4910, time 122.82ms
iter 13770: loss 8.6651, time 124.23ms
iter 13780: loss 7.5029, time 122.63ms
iter 13790: loss 8.3154, time 122.54ms
tensor(0.1355)
iter 13800: loss 8.9404, time 122.49ms
iter 13810: loss 7.8074, time 122.53ms
iter 13820: loss 8.6345, time 122.47ms
iter 13830: loss 8.2976, time 122.96ms
iter 13840: loss 8.0472, time 122.86ms
iter 13850: loss 8.1538, time 122.88ms
iter 13860: loss 8.0918, time 123.10ms
iter 13870: loss 8.8989, time 122.88ms
iter 13880: loss 8.4487, time 122.85ms
iter 13890: loss 8.3667, time 123.07ms
tensor(0.1147)
iter 13900: loss 7.9792, time 123.20ms
iter 13910: loss 9.6448, time 122.90ms
iter 13920: loss 8.7852, time 123.40ms
iter 13930: loss 8.0538, time 122.84ms
iter 13940: loss 8.2767, time 122.85ms
iter 13950: loss 8.4326, time 122.83ms
iter 13960: loss 7.8684, time 122.75ms
iter 13970: loss 8.2886, time 122.85ms
iter 13980: loss 8.8247, time 122.85ms
iter 13990: loss 8.1682, time 122.98ms
tensor(0.0955)
step 14000: train loss 7.0464, val loss 7.0562
saving checkpoint to out-shakespeare-char
iter 14000: loss 8.8926, time 2856.00ms
iter 14010: loss 8.2446, time 123.36ms
iter 14020: loss 8.0791, time 123.84ms
iter 14030: loss 9.0385, time 122.95ms
iter 14040: loss 8.2498, time 122.68ms
iter 14050: loss 8.6589, time 121.45ms
iter 14060: loss 8.2616, time 122.04ms
iter 14070: loss 8.3204, time 124.02ms
iter 14080: loss 8.6567, time 123.03ms
iter 14090: loss 8.2532, time 122.38ms
tensor(0.0778)
iter 14100: loss 8.4456, time 123.26ms
iter 14110: loss 8.7289, time 123.35ms
iter 14120: loss 8.1218, time 122.85ms
iter 14130: loss 8.4099, time 123.03ms
iter 14140: loss 9.0133, time 123.37ms
iter 14150: loss 8.7261, time 123.12ms
iter 14160: loss 8.2786, time 122.97ms
iter 14170: loss 7.5943, time 122.74ms
iter 14180: loss 8.1979, time 123.28ms
iter 14190: loss 8.2678, time 123.05ms
tensor(0.0618)
iter 14200: loss 8.1181, time 122.66ms
iter 14210: loss 7.7143, time 122.74ms
iter 14220: loss 8.2555, time 122.87ms
iter 14230: loss 8.4692, time 123.05ms
iter 14240: loss 8.1636, time 123.37ms
step 14250: train loss 7.0302, val loss 7.0622
saving checkpoint to out-shakespeare-char
iter 14250: loss 7.9350, time 2861.77ms
iter 14260: loss 8.4921, time 123.08ms
iter 14270: loss 8.5552, time 121.05ms
iter 14280: loss 8.0707, time 122.96ms
iter 14290: loss 9.1228, time 122.47ms
tensor(0.0476)
iter 14300: loss 8.1337, time 121.19ms
iter 14310: loss 9.1780, time 122.83ms
iter 14320: loss 8.1955, time 122.67ms
iter 14330: loss 7.9999, time 121.74ms
iter 14340: loss 8.1341, time 122.67ms
iter 14350: loss 8.8853, time 122.20ms
iter 14360: loss 8.3070, time 122.39ms
iter 14370: loss 7.6491, time 121.58ms
iter 14380: loss 8.2050, time 123.06ms
iter 14390: loss 8.2360, time 121.83ms
tensor(0.0351)
iter 14400: loss 8.1001, time 122.89ms
iter 14410: loss 7.7955, time 122.55ms
iter 14420: loss 9.0134, time 122.77ms
iter 14430: loss 8.7897, time 122.41ms
iter 14440: loss 8.8255, time 122.86ms
iter 14450: loss 8.8896, time 121.77ms
iter 14460: loss 8.8435, time 122.75ms
iter 14470: loss 8.9891, time 122.57ms
iter 14480: loss 7.9709, time 122.38ms
iter 14490: loss 8.5059, time 122.53ms
tensor(0.0245)
step 14500: train loss 7.0105, val loss 7.0476
saving checkpoint to out-shakespeare-char
iter 14500: loss 8.4650, time 2845.28ms
iter 14510: loss 8.9223, time 119.72ms
iter 14520: loss 8.5294, time 119.35ms
iter 14530: loss 8.4677, time 121.15ms
iter 14540: loss 8.6707, time 119.45ms
iter 14550: loss 8.0037, time 120.21ms
iter 14560: loss 8.1446, time 119.56ms
iter 14570: loss 8.6228, time 119.45ms
iter 14580: loss 7.5965, time 121.21ms
iter 14590: loss 8.6788, time 119.35ms
tensor(0.0157)
iter 14600: loss 8.6446, time 120.51ms
iter 14610: loss 8.7760, time 119.41ms
iter 14620: loss 8.0811, time 121.49ms
iter 14630: loss 9.2381, time 120.69ms
iter 14640: loss 9.1048, time 119.32ms
iter 14650: loss 8.5348, time 120.45ms
iter 14660: loss 8.3194, time 120.58ms
iter 14670: loss 8.4638, time 120.54ms
iter 14680: loss 9.0179, time 119.57ms
iter 14690: loss 8.4164, time 120.97ms
tensor(0.0089)
iter 14700: loss 8.7640, time 119.27ms
iter 14710: loss 8.3267, time 119.59ms
iter 14720: loss 8.5855, time 118.50ms
iter 14730: loss 8.2379, time 119.55ms
iter 14740: loss 9.1925, time 121.24ms
step 14750: train loss 7.0449, val loss 7.0650
saving checkpoint to out-shakespeare-char
iter 14750: loss 8.1461, time 2878.15ms
iter 14760: loss 8.9499, time 123.36ms
iter 14770: loss 8.4683, time 122.95ms
iter 14780: loss 8.5295, time 122.87ms
iter 14790: loss 8.0657, time 123.01ms
tensor(0.0039)
iter 14800: loss 8.5876, time 122.98ms
iter 14810: loss 8.4769, time 123.10ms
iter 14820: loss 8.1338, time 123.26ms
iter 14830: loss 7.8773, time 123.20ms
iter 14840: loss 8.4420, time 123.44ms
iter 14850: loss 7.8665, time 123.15ms
iter 14860: loss 8.2946, time 123.11ms
iter 14870: loss 9.1754, time 123.14ms
iter 14880: loss 8.7344, time 123.19ms
iter 14890: loss 8.0477, time 123.17ms
tensor(0.0010)
iter 14900: loss 8.5087, time 123.15ms
iter 14910: loss 8.0532, time 123.26ms
iter 14920: loss 8.6291, time 122.91ms
iter 14930: loss 7.8961, time 122.23ms
iter 14940: loss 8.3381, time 124.15ms
iter 14950: loss 9.0748, time 123.03ms
iter 14960: loss 8.4668, time 123.07ms
iter 14970: loss 8.8489, time 123.54ms
iter 14980: loss 8.3455, time 122.99ms
iter 14990: loss 8.5498, time 122.92ms
tensor(0.0010)
step 15000: train loss 7.0612, val loss 7.0190
saving checkpoint to out-shakespeare-char
iter 15000: loss 8.4993, time 2850.52ms
iter 15010: loss 8.2237, time 123.03ms
iter 15020: loss 7.7703, time 123.00ms
iter 15030: loss 8.3848, time 123.04ms
iter 15040: loss 8.5035, time 123.05ms
iter 15050: loss 8.3526, time 122.78ms
iter 15060: loss 8.8086, time 122.95ms
iter 15070: loss 9.2020, time 123.18ms
iter 15080: loss 8.5650, time 122.76ms
iter 15090: loss 8.4793, time 122.95ms
tensor(0.0010)
iter 15100: loss 8.7490, time 123.02ms
iter 15110: loss 7.4693, time 123.13ms
iter 15120: loss 8.5580, time 123.30ms
iter 15130: loss 8.3470, time 123.91ms
iter 15140: loss 9.0880, time 123.21ms
iter 15150: loss 7.9809, time 123.06ms
iter 15160: loss 8.6488, time 123.14ms
iter 15170: loss 8.8836, time 122.87ms
iter 15180: loss 7.9084, time 122.23ms
iter 15190: loss 8.2002, time 123.49ms
tensor(0.0039)
iter 15200: loss 8.5526, time 123.28ms
iter 15210: loss 9.3098, time 123.32ms
iter 15220: loss 7.8307, time 123.47ms
iter 15230: loss 9.1312, time 123.39ms
iter 15240: loss 8.8599, time 123.47ms
step 15250: train loss 7.0076, val loss 7.0815
saving checkpoint to out-shakespeare-char
iter 15250: loss 7.7281, time 2851.99ms
iter 15260: loss 9.3017, time 122.59ms
iter 15270: loss 8.9088, time 122.70ms
iter 15280: loss 8.1355, time 122.50ms
iter 15290: loss 8.7557, time 122.68ms
tensor(0.0089)
iter 15300: loss 8.9936, time 123.23ms
iter 15310: loss 8.0941, time 124.00ms
iter 15320: loss 8.1312, time 122.72ms
iter 15330: loss 8.2788, time 122.95ms
iter 15340: loss 8.4114, time 123.05ms
iter 15350: loss 8.3497, time 122.94ms
iter 15360: loss 8.6125, time 122.52ms
iter 15370: loss 8.6075, time 124.55ms
iter 15380: loss 8.9874, time 122.09ms
iter 15390: loss 8.1680, time 122.04ms
tensor(0.0157)
iter 15400: loss 8.6072, time 122.80ms
iter 15410: loss 8.4296, time 123.03ms
iter 15420: loss 8.3747, time 122.81ms
iter 15430: loss 8.2403, time 123.42ms
iter 15440: loss 8.3276, time 123.22ms
iter 15450: loss 9.0880, time 122.66ms
iter 15460: loss 8.9493, time 122.65ms
iter 15470: loss 7.6297, time 122.87ms
iter 15480: loss 8.5021, time 123.29ms
iter 15490: loss 8.5653, time 123.52ms
tensor(0.0245)
step 15500: train loss 7.0953, val loss 7.0290
saving checkpoint to out-shakespeare-char
iter 15500: loss 8.3711, time 2856.36ms
iter 15510: loss 8.3069, time 123.01ms
iter 15520: loss 8.1736, time 121.97ms
iter 15530: loss 8.4911, time 122.90ms
iter 15540: loss 8.4443, time 123.11ms
iter 15550: loss 7.3352, time 122.62ms
iter 15560: loss 8.3115, time 122.46ms
iter 15570: loss 8.5335, time 122.63ms
iter 15580: loss 8.3597, time 122.73ms
iter 15590: loss 8.8523, time 122.37ms
tensor(0.0351)
iter 15600: loss 9.0353, time 122.56ms
iter 15610: loss 8.5244, time 122.43ms
iter 15620: loss 7.8299, time 122.31ms
iter 15630: loss 9.4210, time 122.08ms
iter 15640: loss 8.9103, time 122.36ms
iter 15650: loss 8.2487, time 122.62ms
iter 15660: loss 7.9695, time 122.40ms
iter 15670: loss 8.9696, time 124.73ms
iter 15680: loss 8.7642, time 122.83ms
iter 15690: loss 8.9933, time 122.70ms
tensor(0.0476)
iter 15700: loss 8.8996, time 123.53ms
iter 15710: loss 8.0109, time 123.29ms
iter 15720: loss 8.6204, time 123.29ms
iter 15730: loss 8.5779, time 123.03ms
iter 15740: loss 8.5387, time 122.67ms
step 15750: train loss 7.0403, val loss 7.0507
saving checkpoint to out-shakespeare-char
iter 15750: loss 8.7592, time 2857.34ms
iter 15760: loss 8.6798, time 122.79ms
iter 15770: loss 8.1849, time 123.07ms
iter 15780: loss 8.4683, time 123.14ms
iter 15790: loss 8.9709, time 123.02ms
tensor(0.0618)
iter 15800: loss 8.5484, time 123.17ms
iter 15810: loss 8.6131, time 124.41ms
iter 15820: loss 7.6346, time 122.93ms
iter 15830: loss 8.4595, time 123.24ms
iter 15840: loss 8.2481, time 123.48ms
iter 15850: loss 8.6797, time 123.04ms
iter 15860: loss 8.4533, time 122.94ms
iter 15870: loss 8.7108, time 123.16ms
iter 15880: loss 7.8414, time 123.17ms
iter 15890: loss 8.3480, time 123.09ms
tensor(0.0778)
iter 15900: loss 8.4026, time 123.12ms
iter 15910: loss 8.2681, time 123.26ms
iter 15920: loss 8.2343, time 123.09ms
iter 15930: loss 8.1023, time 123.07ms
iter 15940: loss 9.3303, time 123.03ms
iter 15950: loss 8.9756, time 123.38ms
iter 15960: loss 8.7853, time 122.97ms
iter 15970: loss 8.4675, time 123.20ms
iter 15980: loss 8.3771, time 122.66ms
iter 15990: loss 8.8333, time 122.84ms
tensor(0.0955)
step 16000: train loss 7.0519, val loss 7.0256
saving checkpoint to out-shakespeare-char
iter 16000: loss 8.0836, time 2855.98ms
iter 16010: loss 8.9414, time 123.15ms
iter 16020: loss 8.5150, time 122.81ms
iter 16030: loss 9.0428, time 122.79ms
iter 16040: loss 9.0744, time 123.46ms
iter 16050: loss 8.0441, time 122.68ms
iter 16060: loss 8.8441, time 122.77ms
iter 16070: loss 8.6879, time 122.89ms
iter 16080: loss 8.4509, time 122.63ms
iter 16090: loss 8.6184, time 122.78ms
tensor(0.1147)
iter 16100: loss 8.0490, time 123.09ms
iter 16110: loss 8.7833, time 122.77ms
iter 16120: loss 7.8340, time 123.13ms
iter 16130: loss 8.2563, time 122.95ms
iter 16140: loss 8.9179, time 122.40ms
iter 16150: loss 8.8608, time 122.66ms
iter 16160: loss 8.5818, time 122.54ms
iter 16170: loss 8.5233, time 122.28ms
iter 16180: loss 8.3932, time 122.53ms
iter 16190: loss 8.0751, time 122.95ms
tensor(0.1355)
iter 16200: loss 8.8120, time 122.91ms
iter 16210: loss 8.2369, time 123.10ms
iter 16220: loss 8.6152, time 123.37ms
iter 16230: loss 8.0076, time 122.86ms
iter 16240: loss 9.0878, time 122.94ms
step 16250: train loss 7.0144, val loss 7.0130
saving checkpoint to out-shakespeare-char
iter 16250: loss 8.7630, time 2867.62ms
iter 16260: loss 8.1568, time 122.85ms
iter 16270: loss 8.0418, time 122.82ms
iter 16280: loss 7.7978, time 122.66ms
iter 16290: loss 8.5839, time 122.68ms
tensor(0.1577)
iter 16300: loss 7.7329, time 122.96ms
iter 16310: loss 8.5092, time 122.71ms
iter 16320: loss 8.4089, time 124.74ms
iter 16330: loss 8.3064, time 123.00ms
iter 16340: loss 7.7964, time 122.72ms
iter 16350: loss 8.3613, time 122.46ms
iter 16360: loss 8.0410, time 125.29ms
iter 16370: loss 8.0744, time 123.07ms
iter 16380: loss 8.6480, time 122.98ms
iter 16390: loss 8.4872, time 123.29ms
tensor(0.1813)
iter 16400: loss 8.4984, time 123.02ms
iter 16410: loss 8.0071, time 123.02ms
iter 16420: loss 8.3165, time 123.39ms
iter 16430: loss 8.6656, time 123.10ms
iter 16440: loss 9.4482, time 123.14ms
iter 16450: loss 9.5568, time 123.09ms
iter 16460: loss 8.5316, time 123.10ms
iter 16470: loss 8.6339, time 122.79ms
iter 16480: loss 8.6742, time 123.51ms
iter 16490: loss 8.1719, time 123.37ms
tensor(0.2061)
step 16500: train loss 7.0300, val loss 6.9863
saving checkpoint to out-shakespeare-char
iter 16500: loss 8.5386, time 2870.31ms
iter 16510: loss 8.8693, time 122.72ms
iter 16520: loss 8.5084, time 122.79ms
iter 16530: loss 8.3590, time 122.93ms
iter 16540: loss 7.8693, time 122.98ms
iter 16550: loss 8.3890, time 123.11ms
iter 16560: loss 8.6759, time 123.02ms
iter 16570: loss 8.6099, time 122.92ms
iter 16580: loss 8.1532, time 122.72ms
iter 16590: loss 7.9320, time 122.64ms
tensor(0.2321)
iter 16600: loss 8.4790, time 122.95ms
iter 16610: loss 8.1576, time 122.98ms
iter 16620: loss 8.8213, time 122.83ms
iter 16630: loss 8.3237, time 121.58ms
iter 16640: loss 8.7430, time 122.60ms
iter 16650: loss 8.0727, time 122.56ms
iter 16660: loss 8.1596, time 122.48ms
iter 16670: loss 8.6527, time 122.50ms
iter 16680: loss 9.1034, time 122.85ms
iter 16690: loss 8.3516, time 122.38ms
tensor(0.2591)
iter 16700: loss 8.4429, time 122.26ms
iter 16710: loss 7.7894, time 122.45ms
iter 16720: loss 8.2893, time 122.62ms
iter 16730: loss 8.4175, time 122.57ms
iter 16740: loss 7.6006, time 122.98ms
step 16750: train loss 7.0430, val loss 7.0038
saving checkpoint to out-shakespeare-char
iter 16750: loss 8.0985, time 2851.29ms
iter 16760: loss 8.3463, time 122.56ms
iter 16770: loss 7.7630, time 124.06ms
iter 16780: loss 8.6494, time 121.92ms
iter 16790: loss 9.0083, time 122.71ms
tensor(0.2871)
iter 16800: loss 7.9393, time 122.75ms
iter 16810: loss 8.3515, time 122.51ms
iter 16820: loss 8.2789, time 122.57ms
iter 16830: loss 8.1410, time 122.57ms
iter 16840: loss 8.7068, time 122.47ms
iter 16850: loss 7.9665, time 122.55ms
iter 16860: loss 8.7022, time 122.75ms
iter 16870: loss 8.1751, time 122.54ms
iter 16880: loss 9.2658, time 122.52ms
iter 16890: loss 8.3964, time 122.61ms
tensor(0.3159)
iter 16900: loss 8.0137, time 122.48ms
iter 16910: loss 8.7874, time 122.57ms
iter 16920: loss 8.1531, time 122.62ms
iter 16930: loss 7.9254, time 122.47ms
iter 16940: loss 9.3911, time 122.72ms
iter 16950: loss 8.2610, time 124.28ms
iter 16960: loss 8.0719, time 122.46ms
iter 16970: loss 8.1237, time 123.06ms
iter 16980: loss 8.2087, time 123.01ms
iter 16990: loss 7.8625, time 122.97ms
tensor(0.3455)
step 17000: train loss 7.0600, val loss 6.9776
saving checkpoint to out-shakespeare-char
iter 17000: loss 8.6801, time 2841.70ms
iter 17010: loss 8.3935, time 119.27ms
iter 17020: loss 9.0533, time 118.77ms
iter 17030: loss 8.1765, time 119.99ms
iter 17040: loss 8.3345, time 119.03ms
iter 17050: loss 8.3412, time 118.35ms
iter 17060: loss 8.7134, time 119.12ms
iter 17070: loss 8.7238, time 123.89ms
iter 17080: loss 8.7202, time 122.62ms
iter 17090: loss 8.3739, time 122.24ms
tensor(0.3757)
iter 17100: loss 8.4156, time 122.33ms
iter 17110: loss 7.8411, time 123.05ms
iter 17120: loss 8.0715, time 122.46ms
iter 17130: loss 8.4245, time 121.74ms
iter 17140: loss 8.4200, time 122.32ms
iter 17150: loss 8.2012, time 122.30ms
iter 17160: loss 8.1472, time 122.34ms
iter 17170: loss 8.6167, time 122.67ms
iter 17180: loss 8.2290, time 122.83ms
iter 17190: loss 8.2760, time 122.04ms
tensor(0.4063)
iter 17200: loss 8.6096, time 122.88ms
iter 17210: loss 8.0055, time 122.50ms
iter 17220: loss 7.9079, time 122.56ms
iter 17230: loss 8.8646, time 122.75ms
iter 17240: loss 8.8026, time 121.31ms
step 17250: train loss 7.0429, val loss 7.0082
saving checkpoint to out-shakespeare-char
iter 17250: loss 8.7589, time 2876.00ms
iter 17260: loss 8.1170, time 119.35ms
iter 17270: loss 8.3446, time 119.95ms
iter 17280: loss 8.8884, time 119.24ms
iter 17290: loss 7.9317, time 120.55ms
tensor(0.4373)
iter 17300: loss 8.5311, time 119.43ms
iter 17310: loss 8.0935, time 120.47ms
iter 17320: loss 9.1119, time 119.14ms
iter 17330: loss 8.7824, time 121.74ms
iter 17340: loss 7.5942, time 119.24ms
iter 17350: loss 8.1297, time 119.36ms
iter 17360: loss 8.9656, time 119.25ms
iter 17370: loss 8.5699, time 118.67ms
iter 17380: loss 7.8767, time 119.01ms
iter 17390: loss 8.4400, time 119.21ms
tensor(0.4686)
iter 17400: loss 8.1686, time 119.20ms
iter 17410: loss 8.5162, time 119.08ms
iter 17420: loss 7.8999, time 119.66ms
iter 17430: loss 9.1089, time 119.09ms
iter 17440: loss 7.8428, time 119.15ms
iter 17450: loss 8.4522, time 120.44ms
iter 17460: loss 8.0020, time 118.45ms
iter 17470: loss 8.4442, time 119.00ms
iter 17480: loss 9.0189, time 120.57ms
iter 17490: loss 7.9116, time 121.43ms
tensor(0.5000)
step 17500: train loss 6.9699, val loss 6.9590
saving checkpoint to out-shakespeare-char
iter 17500: loss 8.0277, time 2851.94ms
iter 17510: loss 8.4051, time 119.11ms
iter 17520: loss 8.3730, time 119.20ms
iter 17530: loss 8.4522, time 120.32ms
iter 17540: loss 8.2873, time 119.38ms
iter 17550: loss 8.7138, time 118.82ms
iter 17560: loss 8.0220, time 120.36ms
iter 17570: loss 8.7551, time 119.22ms
iter 17580: loss 8.4977, time 119.93ms
iter 17590: loss 8.3664, time 119.44ms
tensor(0.5314)
iter 17600: loss 8.9080, time 119.09ms
iter 17610: loss 8.5881, time 119.17ms
iter 17620: loss 8.7563, time 119.14ms
iter 17630: loss 8.1812, time 118.95ms
iter 17640: loss 8.7036, time 119.23ms
iter 17650: loss 8.0239, time 120.42ms
iter 17660: loss 8.6493, time 119.66ms
iter 17670: loss 7.9940, time 118.66ms
iter 17680: loss 7.6959, time 117.24ms
iter 17690: loss 8.4520, time 120.69ms
tensor(0.5627)
iter 17700: loss 7.8111, time 117.40ms
iter 17710: loss 8.2940, time 117.64ms
iter 17720: loss 8.7985, time 120.27ms
iter 17730: loss 8.2312, time 117.58ms
iter 17740: loss 8.6439, time 117.78ms
step 17750: train loss 6.9683, val loss 7.0646
saving checkpoint to out-shakespeare-char
iter 17750: loss 9.0284, time 2865.00ms
iter 17760: loss 7.5910, time 117.19ms
iter 17770: loss 7.8003, time 117.84ms
iter 17780: loss 8.8188, time 121.60ms
iter 17790: loss 7.6533, time 119.60ms
tensor(0.5937)
iter 17800: loss 8.9828, time 121.61ms
iter 17810: loss 8.0165, time 119.06ms
iter 17820: loss 8.2967, time 119.75ms
iter 17830: loss 8.1305, time 119.11ms
iter 17840: loss 8.5855, time 118.23ms
iter 17850: loss 7.6409, time 119.15ms
iter 17860: loss 8.5763, time 119.25ms
iter 17870: loss 8.8646, time 119.22ms
iter 17880: loss 8.0201, time 120.79ms
iter 17890: loss 7.9389, time 119.85ms
tensor(0.6243)
iter 17900: loss 8.7768, time 120.30ms
iter 17910: loss 8.7662, time 120.26ms
iter 17920: loss 8.4898, time 120.38ms
iter 17930: loss 8.8931, time 120.27ms
iter 17940: loss 7.8336, time 119.16ms
iter 17950: loss 8.8154, time 119.82ms
iter 17960: loss 8.5170, time 118.85ms
iter 17970: loss 8.4501, time 121.93ms
iter 17980: loss 8.3254, time 119.23ms
iter 17990: loss 8.6583, time 122.32ms
tensor(0.6545)
step 18000: train loss 6.9763, val loss 7.0196
saving checkpoint to out-shakespeare-char
iter 18000: loss 8.3263, time 2875.20ms
iter 18010: loss 8.8520, time 121.26ms
iter 18020: loss 8.7346, time 120.11ms
iter 18030: loss 9.5208, time 119.12ms
iter 18040: loss 8.6413, time 121.02ms
iter 18050: loss 8.1247, time 120.12ms
iter 18060: loss 8.6503, time 119.11ms
iter 18070: loss 9.0762, time 120.43ms
iter 18080: loss 8.7350, time 129.25ms
iter 18090: loss 8.0661, time 122.88ms
tensor(0.6841)
iter 18100: loss 8.2483, time 120.86ms
iter 18110: loss 8.2075, time 122.14ms
iter 18120: loss 7.7966, time 121.01ms
iter 18130: loss 7.4014, time 122.73ms
iter 18140: loss 8.8265, time 122.70ms
iter 18150: loss 9.0871, time 121.88ms
iter 18160: loss 8.0472, time 122.65ms
iter 18170: loss 8.4469, time 122.83ms
iter 18180: loss 8.3167, time 122.97ms
iter 18190: loss 8.4289, time 125.92ms
tensor(0.7129)
iter 18200: loss 9.0197, time 123.34ms
iter 18210: loss 8.3252, time 122.88ms
iter 18220: loss 8.1970, time 122.89ms
iter 18230: loss 8.4771, time 122.65ms
iter 18240: loss 9.4683, time 122.58ms
step 18250: train loss 7.0085, val loss 7.0171
saving checkpoint to out-shakespeare-char
iter 18250: loss 8.6276, time 2845.38ms
iter 18260: loss 8.1657, time 123.30ms
iter 18270: loss 8.9366, time 126.11ms
iter 18280: loss 8.3026, time 123.04ms
iter 18290: loss 8.7038, time 123.28ms
tensor(0.7409)
iter 18300: loss 8.0266, time 122.98ms
iter 18310: loss 8.1601, time 122.78ms
iter 18320: loss 8.5465, time 122.99ms
iter 18330: loss 8.2706, time 122.76ms
iter 18340: loss 8.2784, time 122.74ms
iter 18350: loss 8.2597, time 123.36ms
iter 18360: loss 8.3965, time 125.86ms
iter 18370: loss 8.4239, time 123.26ms
iter 18380: loss 8.2288, time 122.71ms
iter 18390: loss 7.8759, time 123.57ms
tensor(0.7679)
iter 18400: loss 8.5137, time 123.32ms
iter 18410: loss 8.5835, time 123.35ms
iter 18420: loss 8.6320, time 123.41ms
iter 18430: loss 8.2729, time 123.30ms
iter 18440: loss 9.3447, time 126.16ms
iter 18450: loss 8.1169, time 123.27ms
iter 18460: loss 8.8312, time 122.75ms
iter 18470: loss 8.7916, time 122.72ms
iter 18480: loss 8.0792, time 122.70ms
iter 18490: loss 7.8247, time 122.82ms
tensor(0.7939)
step 18500: train loss 7.0002, val loss 6.9683
saving checkpoint to out-shakespeare-char
iter 18500: loss 8.0207, time 2862.59ms
iter 18510: loss 8.3267, time 122.94ms
iter 18520: loss 8.3268, time 123.39ms
iter 18530: loss 8.3990, time 125.52ms
iter 18540: loss 8.2489, time 122.46ms
iter 18550: loss 8.8207, time 122.49ms
iter 18560: loss 8.5002, time 122.59ms
iter 18570: loss 8.2832, time 121.99ms
iter 18580: loss 8.6115, time 122.57ms
iter 18590: loss 8.7049, time 123.00ms
tensor(0.8187)
iter 18600: loss 8.6042, time 125.62ms
iter 18610: loss 7.6497, time 123.25ms
iter 18620: loss 7.4012, time 122.70ms
iter 18630: loss 8.9654, time 123.06ms
iter 18640: loss 7.9757, time 122.47ms
iter 18650: loss 8.0431, time 122.49ms
iter 18660: loss 8.1232, time 122.13ms
iter 18670: loss 8.4091, time 125.78ms
iter 18680: loss 8.0346, time 122.42ms
iter 18690: loss 8.8956, time 122.43ms
tensor(0.8423)
iter 18700: loss 9.2165, time 122.52ms
iter 18710: loss 8.8590, time 122.46ms
iter 18720: loss 8.5600, time 121.85ms
iter 18730: loss 7.6441, time 122.55ms
iter 18740: loss 8.1537, time 122.81ms
step 18750: train loss 6.9486, val loss 6.9796
saving checkpoint to out-shakespeare-char
iter 18750: loss 7.7724, time 2888.46ms
iter 18760: loss 8.0157, time 124.85ms
iter 18770: loss 8.4634, time 122.64ms
iter 18780: loss 8.2015, time 122.99ms
iter 18790: loss 8.7801, time 122.16ms
tensor(0.8645)
iter 18800: loss 8.9988, time 122.23ms
iter 18810: loss 8.1175, time 121.94ms
iter 18820: loss 8.5763, time 121.85ms
iter 18830: loss 8.8536, time 124.66ms
iter 18840: loss 8.6716, time 121.85ms
iter 18850: loss 7.8984, time 121.85ms
iter 18860: loss 8.1101, time 121.69ms
iter 18870: loss 8.3730, time 121.61ms
iter 18880: loss 8.3079, time 121.77ms
iter 18890: loss 9.0522, time 123.22ms
tensor(0.8853)
iter 18900: loss 8.5230, time 125.44ms
iter 18910: loss 7.9734, time 122.50ms
iter 18920: loss 8.1798, time 122.57ms
iter 18930: loss 8.2894, time 122.44ms
iter 18940: loss 8.0280, time 122.49ms
iter 18950: loss 8.0073, time 122.39ms
iter 18960: loss 8.2152, time 122.60ms
iter 18970: loss 7.9432, time 125.50ms
iter 18980: loss 8.4508, time 122.65ms
iter 18990: loss 8.5994, time 122.44ms
tensor(0.9045)
step 19000: train loss 6.9884, val loss 6.9166
saving checkpoint to out-shakespeare-char
iter 19000: loss 8.0148, time 2854.96ms
iter 19010: loss 8.4873, time 125.21ms
iter 19020: loss 9.2566, time 122.36ms
iter 19030: loss 7.7775, time 122.49ms
iter 19040: loss 8.1408, time 123.09ms
iter 19050: loss 8.1068, time 122.51ms
iter 19060: loss 8.3403, time 122.34ms
iter 19070: loss 8.3295, time 122.38ms
iter 19080: loss 8.5641, time 125.38ms
iter 19090: loss 8.6297, time 122.72ms
tensor(0.9222)
iter 19100: loss 8.7402, time 122.44ms
iter 19110: loss 8.0334, time 122.50ms
iter 19120: loss 8.4358, time 122.34ms
iter 19130: loss 8.9704, time 122.49ms
iter 19140: loss 7.6757, time 122.50ms
iter 19150: loss 8.5974, time 125.79ms
iter 19160: loss 8.2580, time 122.80ms
iter 19170: loss 8.2966, time 122.63ms
iter 19180: loss 8.3543, time 122.20ms
iter 19190: loss 8.6748, time 123.08ms
tensor(0.9382)
iter 19200: loss 8.7501, time 123.02ms
iter 19210: loss 8.4344, time 122.69ms
iter 19220: loss 8.2041, time 124.82ms
iter 19230: loss 8.3662, time 122.53ms
iter 19240: loss 8.8714, time 121.87ms
step 19250: train loss 6.9074, val loss 6.9774
saving checkpoint to out-shakespeare-char
iter 19250: loss 7.8024, time 2895.61ms
iter 19260: loss 8.4478, time 126.18ms
iter 19270: loss 8.2664, time 123.28ms
iter 19280: loss 9.0266, time 123.20ms
iter 19290: loss 7.8294, time 122.60ms
tensor(0.9524)
iter 19300: loss 8.2699, time 122.47ms
iter 19310: loss 7.9917, time 122.59ms
iter 19320: loss 8.3123, time 121.86ms
iter 19330: loss 8.2438, time 125.62ms
iter 19340: loss 7.2788, time 122.68ms
iter 19350: loss 7.7412, time 123.90ms
iter 19360: loss 9.5393, time 122.34ms
iter 19370: loss 8.6344, time 123.27ms
iter 19380: loss 8.0121, time 123.25ms
iter 19390: loss 8.5763, time 123.29ms
tensor(0.9649)
iter 19400: loss 9.5486, time 123.17ms
iter 19410: loss 8.5209, time 125.66ms
iter 19420: loss 8.5318, time 123.15ms
iter 19430: loss 8.5036, time 123.00ms
iter 19440: loss 7.8358, time 123.00ms
iter 19450: loss 8.8846, time 123.27ms
iter 19460: loss 8.5147, time 123.14ms
iter 19470: loss 8.1678, time 123.12ms
iter 19480: loss 8.4699, time 123.62ms
iter 19490: loss 7.9591, time 123.15ms
tensor(0.9755)
step 19500: train loss 6.9670, val loss 6.9727
saving checkpoint to out-shakespeare-char
iter 19500: loss 8.4913, time 2849.72ms
iter 19510: loss 8.2290, time 121.21ms
iter 19520: loss 7.9402, time 119.31ms
iter 19530: loss 8.8894, time 119.40ms
iter 19540: loss 8.0422, time 120.55ms
iter 19550: loss 7.6086, time 119.04ms
iter 19560: loss 8.2786, time 120.48ms
iter 19570: loss 7.5447, time 119.47ms
iter 19580: loss 8.4536, time 120.85ms
iter 19590: loss 7.8365, time 121.22ms
tensor(0.9843)
iter 19600: loss 8.6704, time 121.10ms
iter 19610: loss 8.7124, time 120.47ms
iter 19620: loss 8.9208, time 119.64ms
iter 19630: loss 7.7750, time 119.71ms
iter 19640: loss 8.3309, time 119.63ms
iter 19650: loss 8.9878, time 119.67ms
iter 19660: loss 9.2650, time 120.84ms
iter 19670: loss 9.0324, time 119.70ms
iter 19680: loss 7.9408, time 120.58ms
iter 19690: loss 7.8698, time 118.66ms
tensor(0.9911)
iter 19700: loss 8.3733, time 121.34ms
iter 19710: loss 8.5933, time 119.34ms
iter 19720: loss 8.7312, time 120.78ms
iter 19730: loss 8.1143, time 121.11ms
iter 19740: loss 8.6267, time 119.69ms
step 19750: train loss 6.8880, val loss 6.9178
saving checkpoint to out-shakespeare-char
iter 19750: loss 8.7717, time 2872.76ms
iter 19760: loss 9.3977, time 122.33ms
iter 19770: loss 8.7531, time 122.62ms
iter 19780: loss 8.3667, time 122.63ms
iter 19790: loss 7.8752, time 122.66ms
tensor(0.9961)
iter 19800: loss 8.3757, time 125.14ms
iter 19810: loss 7.6418, time 122.08ms
iter 19820: loss 8.5479, time 122.52ms
iter 19830: loss 8.7141, time 123.02ms
iter 19840: loss 8.6119, time 123.04ms
iter 19850: loss 8.5604, time 122.99ms
iter 19860: loss 8.3207, time 123.00ms
iter 19870: loss 7.7953, time 123.17ms
iter 19880: loss 8.5086, time 122.91ms
iter 19890: loss 8.0318, time 123.01ms
tensor(0.9990)
iter 19900: loss 8.6512, time 126.39ms
iter 19910: loss 8.0786, time 123.29ms
iter 19920: loss 8.4589, time 122.78ms
iter 19930: loss 8.4033, time 122.96ms
iter 19940: loss 8.7434, time 123.01ms
iter 19950: loss 7.9309, time 123.04ms
iter 19960: loss 8.4883, time 123.12ms
iter 19970: loss 9.0115, time 123.04ms
iter 19980: loss 8.7508, time 122.74ms
iter 19990: loss 8.2483, time 122.98ms
tensor(1.)
step 20000: train loss 6.9255, val loss 6.9137
saving checkpoint to out-shakespeare-char
iter 20000: loss 7.7488, time 2859.93ms
iter 20010: loss 8.5384, time 120.48ms
iter 20020: loss 8.8906, time 120.41ms
iter 20030: loss 8.7153, time 118.35ms
iter 20040: loss 8.6872, time 120.29ms
iter 20050: loss 7.6536, time 119.09ms
iter 20060: loss 8.2251, time 119.36ms
iter 20070: loss 8.6728, time 120.49ms
iter 20080: loss 9.0387, time 119.19ms
iter 20090: loss 7.8934, time 122.10ms
tensor(0.9990)
iter 20100: loss 7.9142, time 119.22ms
iter 20110: loss 8.2023, time 119.39ms
iter 20120: loss 8.1525, time 119.63ms
iter 20130: loss 8.0209, time 119.68ms
iter 20140: loss 8.5572, time 119.11ms
iter 20150: loss 8.1688, time 118.96ms
iter 20160: loss 8.5483, time 119.06ms
iter 20170: loss 7.7133, time 119.60ms
iter 20180: loss 8.4306, time 120.45ms
iter 20190: loss 8.7967, time 120.27ms
tensor(0.9961)
iter 20200: loss 8.3146, time 119.94ms
iter 20210: loss 9.7020, time 118.47ms
iter 20220: loss 8.7284, time 119.13ms
iter 20230: loss 8.7339, time 119.67ms
iter 20240: loss 8.0440, time 119.37ms
step 20250: train loss 6.9069, val loss 6.9258
saving checkpoint to out-shakespeare-char
iter 20250: loss 8.1257, time 2865.58ms
iter 20260: loss 8.2819, time 121.38ms
iter 20270: loss 8.0211, time 118.82ms
iter 20280: loss 8.4678, time 123.09ms
iter 20290: loss 7.7421, time 119.70ms
tensor(0.9911)
iter 20300: loss 8.6760, time 119.71ms
iter 20310: loss 8.8422, time 119.50ms
iter 20320: loss 8.1113, time 120.80ms
iter 20330: loss 8.0261, time 118.32ms
iter 20340: loss 9.5481, time 120.15ms
iter 20350: loss 8.3709, time 118.31ms
iter 20360: loss 8.8078, time 119.01ms
iter 20370: loss 7.9468, time 119.07ms
iter 20380: loss 8.3388, time 119.34ms
iter 20390: loss 8.2748, time 119.58ms
tensor(0.9843)
iter 20400: loss 7.6875, time 121.05ms
iter 20410: loss 8.1702, time 120.87ms
iter 20420: loss 7.5610, time 120.51ms
iter 20430: loss 8.2001, time 119.34ms
iter 20440: loss 8.1077, time 120.03ms
iter 20450: loss 8.4047, time 121.43ms
iter 20460: loss 8.3646, time 120.70ms
iter 20470: loss 8.2446, time 119.54ms
iter 20480: loss 6.9408, time 120.76ms
iter 20490: loss 8.0404, time 122.36ms
tensor(0.9755)
step 20500: train loss 6.9471, val loss 6.9224
saving checkpoint to out-shakespeare-char
iter 20500: loss 8.7683, time 2874.60ms
iter 20510: loss 8.9502, time 121.79ms
iter 20520: loss 8.1002, time 120.32ms
iter 20530: loss 7.2749, time 119.32ms
iter 20540: loss 8.7072, time 119.15ms
iter 20550: loss 8.7257, time 119.53ms
iter 20560: loss 7.9906, time 120.40ms
iter 20570: loss 8.3739, time 120.04ms
iter 20580: loss 7.6263, time 120.70ms
iter 20590: loss 9.1251, time 120.19ms
tensor(0.9649)
iter 20600: loss 8.8487, time 119.48ms
iter 20610: loss 8.4913, time 119.17ms
iter 20620: loss 8.7531, time 120.15ms
iter 20630: loss 7.9867, time 119.71ms
iter 20640: loss 7.7015, time 120.12ms
iter 20650: loss 7.8788, time 119.19ms
iter 20660: loss 8.3848, time 119.46ms
iter 20670: loss 7.6665, time 120.40ms
iter 20680: loss 8.9747, time 119.54ms
iter 20690: loss 8.1377, time 120.20ms
tensor(0.9524)
iter 20700: loss 7.6724, time 120.16ms
iter 20710: loss 8.1192, time 120.65ms
iter 20720: loss 8.8507, time 120.50ms
iter 20730: loss 8.0646, time 119.22ms
iter 20740: loss 8.4498, time 119.19ms
step 20750: train loss 6.8208, val loss 6.8397
saving checkpoint to out-shakespeare-char
iter 20750: loss 7.8575, time 2875.94ms
iter 20760: loss 7.7057, time 123.48ms
iter 20770: loss 8.4700, time 126.50ms
iter 20780: loss 8.7793, time 122.37ms
iter 20790: loss 8.0586, time 122.65ms
tensor(0.9382)
iter 20800: loss 8.5207, time 122.45ms
iter 20810: loss 7.5914, time 122.71ms
iter 20820: loss 7.7571, time 122.51ms
iter 20830: loss 8.3543, time 122.81ms
iter 20840: loss 7.8158, time 122.54ms
iter 20850: loss 8.1632, time 121.65ms
iter 20860: loss 7.5288, time 122.75ms
iter 20870: loss 8.2077, time 122.51ms
iter 20880: loss 7.5140, time 122.54ms
iter 20890: loss 8.7466, time 122.45ms
tensor(0.9222)
iter 20900: loss 9.1167, time 122.42ms
iter 20910: loss 7.7363, time 122.59ms
iter 20920: loss 8.9958, time 125.71ms
iter 20930: loss 8.1652, time 122.52ms
iter 20940: loss 8.0592, time 122.60ms
iter 20950: loss 8.0764, time 122.56ms
iter 20960: loss 7.7392, time 122.34ms
iter 20970: loss 8.7805, time 122.47ms
iter 20980: loss 8.8321, time 122.70ms
iter 20990: loss 7.8868, time 122.53ms
tensor(0.9045)
step 21000: train loss 6.7709, val loss 6.8407
saving checkpoint to out-shakespeare-char
iter 21000: loss 7.7746, time 2878.46ms
iter 21010: loss 7.8122, time 122.41ms
iter 21020: loss 8.8862, time 122.71ms
iter 21030: loss 8.3274, time 125.44ms
iter 21040: loss 8.4970, time 122.56ms
iter 21050: loss 8.1571, time 122.64ms
iter 21060: loss 7.6045, time 122.85ms
iter 21070: loss 8.2298, time 122.88ms
iter 21080: loss 8.4664, time 123.02ms
iter 21090: loss 7.8652, time 121.59ms
tensor(0.8853)
iter 21100: loss 7.6346, time 122.87ms
iter 21110: loss 8.3750, time 122.83ms
iter 21120: loss 8.3197, time 122.88ms
iter 21130: loss 7.9163, time 122.93ms
iter 21140: loss 7.6734, time 123.03ms
iter 21150: loss 9.3996, time 122.94ms
iter 21160: loss 8.3519, time 122.84ms
iter 21170: loss 8.9381, time 122.99ms
iter 21180: loss 6.8863, time 122.92ms
iter 21190: loss 8.3711, time 122.85ms
tensor(0.8645)
iter 21200: loss 8.5446, time 122.88ms
iter 21210: loss 8.2429, time 123.08ms
iter 21220: loss 9.4171, time 123.11ms
iter 21230: loss 8.5705, time 123.07ms
iter 21240: loss 8.3943, time 123.03ms
step 21250: train loss 6.8314, val loss 6.8810
saving checkpoint to out-shakespeare-char
iter 21250: loss 8.2001, time 2859.95ms
iter 21260: loss 8.2264, time 122.81ms
iter 21270: loss 8.4989, time 122.62ms
iter 21280: loss 8.2806, time 123.20ms
iter 21290: loss 8.5798, time 122.73ms
tensor(0.8423)
iter 21300: loss 7.5273, time 123.25ms
iter 21310: loss 7.1967, time 125.93ms
iter 21320: loss 8.2480, time 122.62ms
iter 21330: loss 8.8450, time 122.67ms
iter 21340: loss 8.5432, time 122.61ms
iter 21350: loss 8.3420, time 122.52ms
iter 21360: loss 8.4037, time 122.90ms
iter 21370: loss 7.7897, time 122.66ms
iter 21380: loss 8.1483, time 123.94ms
iter 21390: loss 9.1504, time 122.62ms
tensor(0.8187)
iter 21400: loss 8.9371, time 125.62ms
iter 21410: loss 7.9371, time 122.68ms
iter 21420: loss 7.5672, time 122.78ms
iter 21430: loss 8.9531, time 122.62ms
iter 21440: loss 8.2598, time 122.01ms
iter 21450: loss 7.6571, time 122.39ms
iter 21460: loss 7.9838, time 122.59ms
iter 21470: loss 7.5802, time 125.63ms
iter 21480: loss 8.6506, time 122.76ms
iter 21490: loss 7.4257, time 121.53ms
tensor(0.7939)
step 21500: train loss 6.8151, val loss 6.8213
saving checkpoint to out-shakespeare-char
iter 21500: loss 7.9841, time 2856.37ms
iter 21510: loss 8.5106, time 122.67ms
iter 21520: loss 8.3645, time 123.37ms
iter 21530: loss 8.2040, time 123.24ms
iter 21540: loss 8.2312, time 122.85ms
iter 21550: loss 8.4209, time 125.55ms
iter 21560: loss 8.0655, time 122.10ms
iter 21570: loss 8.6877, time 122.92ms
iter 21580: loss 7.5603, time 123.14ms
iter 21590: loss 8.8542, time 122.83ms
tensor(0.7679)
iter 21600: loss 8.0228, time 122.04ms
iter 21610: loss 7.9401, time 122.07ms
iter 21620: loss 8.7099, time 123.02ms
iter 21630: loss 8.3248, time 122.97ms
iter 21640: loss 8.4973, time 123.10ms
iter 21650: loss 8.6123, time 123.01ms
iter 21660: loss 7.8984, time 125.86ms
iter 21670: loss 8.1168, time 123.00ms
iter 21680: loss 8.6080, time 122.83ms
iter 21690: loss 8.7971, time 123.12ms
tensor(0.7409)
iter 21700: loss 8.4512, time 123.15ms
iter 21710: loss 8.3098, time 123.06ms
iter 21720: loss 7.7339, time 122.88ms
iter 21730: loss 7.9962, time 123.25ms
iter 21740: loss 7.7182, time 123.22ms
step 21750: train loss 6.8071, val loss 6.8362
saving checkpoint to out-shakespeare-char
iter 21750: loss 8.2883, time 2842.41ms
iter 21760: loss 8.6285, time 123.20ms
iter 21770: loss 8.9364, time 126.51ms
iter 21780: loss 8.1078, time 122.25ms
iter 21790: loss 8.0664, time 122.52ms
tensor(0.7129)
iter 21800: loss 8.1981, time 122.55ms
iter 21810: loss 7.8186, time 122.27ms
iter 21820: loss 7.8945, time 122.19ms
iter 21830: loss 8.8299, time 122.45ms
iter 21840: loss 8.2626, time 125.22ms
iter 21850: loss 9.0816, time 122.37ms
iter 21860: loss 8.0242, time 122.52ms
iter 21870: loss 8.0626, time 122.32ms
iter 21880: loss 8.5421, time 122.61ms
iter 21890: loss 7.9676, time 122.44ms
tensor(0.6841)
iter 21900: loss 8.5700, time 122.71ms
iter 21910: loss 8.7134, time 126.20ms
iter 21920: loss 9.3247, time 123.43ms
iter 21930: loss 8.4483, time 124.09ms
iter 21940: loss 8.7601, time 122.90ms
iter 21950: loss 8.5370, time 123.23ms
iter 21960: loss 7.8713, time 122.76ms
iter 21970: loss 8.4793, time 122.79ms
iter 21980: loss 7.6178, time 125.61ms
iter 21990: loss 8.5920, time 122.43ms
tensor(0.6545)
step 22000: train loss 6.8191, val loss 6.7574
saving checkpoint to out-shakespeare-char
iter 22000: loss 8.6448, time 2857.57ms
iter 22010: loss 7.9172, time 122.53ms
iter 22020: loss 7.9038, time 122.55ms
iter 22030: loss 7.5443, time 125.67ms
iter 22040: loss 7.8164, time 122.46ms
iter 22050: loss 8.1438, time 123.09ms
iter 22060: loss 8.3158, time 121.97ms
iter 22070: loss 7.0404, time 123.39ms
iter 22080: loss 8.9625, time 123.34ms
iter 22090: loss 8.4636, time 122.50ms
tensor(0.6243)
iter 22100: loss 7.5915, time 125.89ms
iter 22110: loss 8.3961, time 122.81ms
iter 22120: loss 7.4500, time 124.73ms
iter 22130: loss 8.4298, time 122.59ms
iter 22140: loss 7.8380, time 122.43ms
iter 22150: loss 7.6901, time 122.96ms
iter 22160: loss 8.5709, time 119.49ms
iter 22170: loss 8.5539, time 125.26ms
iter 22180: loss 7.8482, time 122.85ms
iter 22190: loss 7.8918, time 122.85ms
tensor(0.5937)
iter 22200: loss 7.7731, time 122.64ms
iter 22210: loss 8.3302, time 122.21ms
iter 22220: loss 7.9127, time 121.96ms
iter 22230: loss 8.4097, time 122.08ms
iter 22240: loss 8.3603, time 125.08ms
step 22250: train loss 6.7946, val loss 6.7863
saving checkpoint to out-shakespeare-char
iter 22250: loss 8.6336, time 2885.63ms
iter 22260: loss 8.1203, time 122.09ms
iter 22270: loss 7.5913, time 125.64ms
iter 22280: loss 8.7105, time 122.78ms
iter 22290: loss 7.4960, time 122.92ms
tensor(0.5627)
iter 22300: loss 8.8024, time 123.19ms
iter 22310: loss 7.7209, time 123.23ms
iter 22320: loss 8.4585, time 123.57ms
iter 22330: loss 8.8465, time 122.96ms
iter 22340: loss 8.0461, time 125.09ms
iter 22350: loss 8.0802, time 122.93ms
iter 22360: loss 8.5800, time 123.38ms
iter 22370: loss 8.9485, time 122.86ms
iter 22380: loss 7.6744, time 123.01ms
iter 22390: loss 8.0508, time 122.98ms
tensor(0.5314)
iter 22400: loss 7.3755, time 121.71ms
iter 22410: loss 8.5565, time 123.75ms
iter 22420: loss 8.0335, time 123.27ms
iter 22430: loss 8.7048, time 122.89ms
iter 22440: loss 8.6895, time 123.88ms
iter 22450: loss 9.0141, time 123.19ms
iter 22460: loss 8.6647, time 123.15ms
iter 22470: loss 8.9846, time 122.91ms
iter 22480: loss 8.6630, time 123.32ms
iter 22490: loss 7.7157, time 122.83ms
tensor(0.5000)
step 22500: train loss 6.8073, val loss 6.7833
saving checkpoint to out-shakespeare-char
iter 22500: loss 8.1416, time 2861.88ms
iter 22510: loss 8.4268, time 123.14ms
iter 22520: loss 8.1150, time 123.13ms
iter 22530: loss 8.6402, time 123.01ms
iter 22540: loss 8.9891, time 122.85ms
iter 22550: loss 8.9959, time 123.22ms
iter 22560: loss 8.1532, time 123.06ms
iter 22570: loss 7.7269, time 123.91ms
iter 22580: loss 9.2683, time 125.99ms
iter 22590: loss 8.8116, time 123.21ms
tensor(0.4686)
iter 22600: loss 8.1435, time 123.91ms
iter 22610: loss 8.5491, time 123.14ms
iter 22620: loss 8.2564, time 122.63ms
iter 22630: loss 8.2087, time 123.21ms
iter 22640: loss 7.5730, time 122.57ms
iter 22650: loss 8.0387, time 122.60ms
iter 22660: loss 7.7597, time 125.55ms
iter 22670: loss 7.9360, time 122.70ms
iter 22680: loss 7.8047, time 122.55ms
iter 22690: loss 7.5774, time 122.56ms
tensor(0.4373)
iter 22700: loss 8.4603, time 122.69ms
iter 22710: loss 7.5653, time 122.72ms
iter 22720: loss 8.7973, time 123.15ms
iter 22730: loss 8.2625, time 126.60ms
iter 22740: loss 8.9821, time 122.73ms
step 22750: train loss 6.7869, val loss 6.7928
saving checkpoint to out-shakespeare-char
iter 22750: loss 7.3278, time 2848.00ms
iter 22760: loss 8.1649, time 121.37ms
iter 22770: loss 6.6900, time 121.78ms
iter 22780: loss 7.9085, time 121.48ms
iter 22790: loss 8.2685, time 123.01ms
tensor(0.4063)
iter 22800: loss 7.7997, time 122.91ms
iter 22810: loss 7.2834, time 122.96ms
iter 22820: loss 8.2809, time 126.05ms
iter 22830: loss 8.4567, time 122.18ms
iter 22840: loss 7.8491, time 123.43ms
iter 22850: loss 8.3786, time 122.60ms
iter 22860: loss 7.9127, time 122.80ms
iter 22870: loss 8.1730, time 123.07ms
iter 22880: loss 7.6578, time 123.05ms
iter 22890: loss 8.3937, time 122.97ms
tensor(0.3757)
iter 22900: loss 8.4516, time 122.92ms
iter 22910: loss 8.3378, time 122.85ms
iter 22920: loss 8.7065, time 125.67ms
iter 22930: loss 7.5715, time 122.72ms
iter 22940: loss 8.4710, time 123.22ms
iter 22950: loss 7.6842, time 123.36ms
iter 22960: loss 7.7891, time 122.63ms
iter 22970: loss 7.9991, time 122.77ms
iter 22980: loss 8.2522, time 123.11ms
iter 22990: loss 8.5876, time 125.79ms
tensor(0.3455)
step 23000: train loss 6.8490, val loss 6.7968
saving checkpoint to out-shakespeare-char
iter 23000: loss 8.5748, time 2854.52ms
iter 23010: loss 8.0222, time 126.23ms
iter 23020: loss 7.3414, time 123.51ms
iter 23030: loss 7.8587, time 123.05ms
iter 23040: loss 7.6082, time 122.58ms
iter 23050: loss 7.5111, time 122.77ms
iter 23060: loss 8.1520, time 121.99ms
iter 23070: loss 8.4239, time 123.22ms
iter 23080: loss 7.6514, time 126.02ms
iter 23090: loss 8.3590, time 123.01ms
tensor(0.3159)
iter 23100: loss 7.4005, time 123.09ms
iter 23110: loss 7.9588, time 123.28ms
iter 23120: loss 8.4882, time 123.23ms
iter 23130: loss 8.5583, time 123.12ms
iter 23140: loss 7.9938, time 123.30ms
iter 23150: loss 8.0516, time 123.29ms
iter 23160: loss 7.9254, time 125.61ms
iter 23170: loss 8.2023, time 123.04ms
iter 23180: loss 8.5871, time 122.79ms
iter 23190: loss 8.2791, time 122.71ms
tensor(0.2871)
iter 23200: loss 8.2193, time 122.19ms
iter 23210: loss 7.8670, time 122.93ms
iter 23220: loss 7.7574, time 121.79ms
iter 23230: loss 7.6510, time 125.78ms
iter 23240: loss 8.5028, time 122.57ms
step 23250: train loss 6.7360, val loss 6.7714
saving checkpoint to out-shakespeare-char
iter 23250: loss 7.6953, time 2851.10ms
iter 23260: loss 8.4342, time 122.84ms
iter 23270: loss 7.7747, time 123.10ms
iter 23280: loss 7.4500, time 122.57ms
iter 23290: loss 7.8670, time 122.31ms
tensor(0.2591)
iter 23300: loss 7.2402, time 122.49ms
iter 23310: loss 7.5905, time 125.71ms
iter 23320: loss 8.9908, time 122.84ms
iter 23330: loss 8.4533, time 122.99ms
iter 23340: loss 7.8413, time 123.11ms
iter 23350: loss 7.4946, time 122.96ms
iter 23360: loss 7.3910, time 122.19ms
iter 23370: loss 8.0310, time 123.20ms
iter 23380: loss 8.1898, time 126.61ms
iter 23390: loss 7.7684, time 123.22ms
tensor(0.2321)
iter 23400: loss 8.9771, time 122.93ms
iter 23410: loss 7.5734, time 123.60ms
iter 23420: loss 7.6497, time 121.26ms
iter 23430: loss 8.5421, time 119.92ms
iter 23440: loss 8.2771, time 119.75ms
iter 23450: loss 8.3330, time 119.74ms
iter 23460: loss 8.4808, time 120.70ms
iter 23470: loss 7.3245, time 120.31ms
iter 23480: loss 8.0132, time 121.03ms
iter 23490: loss 8.1369, time 119.72ms
tensor(0.2061)
step 23500: train loss 6.7470, val loss 6.7787
saving checkpoint to out-shakespeare-char
iter 23500: loss 8.0968, time 2882.47ms
iter 23510: loss 8.5113, time 126.21ms
iter 23520: loss 9.1070, time 123.22ms
iter 23530: loss 7.5357, time 123.55ms
iter 23540: loss 7.6338, time 123.16ms
iter 23550: loss 8.1915, time 123.20ms
iter 23560: loss 8.7716, time 123.39ms
iter 23570: loss 8.0671, time 123.28ms
iter 23580: loss 8.4433, time 123.30ms
iter 23590: loss 7.9697, time 123.50ms
tensor(0.1813)
iter 23600: loss 8.4446, time 123.60ms
iter 23610: loss 7.8457, time 123.40ms
iter 23620: loss 8.1092, time 123.23ms
iter 23630: loss 7.3168, time 123.79ms
iter 23640: loss 8.2049, time 123.31ms
iter 23650: loss 7.9325, time 123.44ms
iter 23660: loss 7.7737, time 126.29ms
iter 23670: loss 8.6524, time 121.97ms
iter 23680: loss 8.1358, time 123.37ms
iter 23690: loss 8.1941, time 123.36ms
tensor(0.1577)
iter 23700: loss 8.7167, time 123.29ms
iter 23710: loss 8.1658, time 123.31ms
iter 23720: loss 7.9402, time 123.30ms
iter 23730: loss 8.0197, time 123.42ms
iter 23740: loss 7.8763, time 123.34ms
step 23750: train loss 6.7928, val loss 6.8455
saving checkpoint to out-shakespeare-char
iter 23750: loss 8.0947, time 2846.38ms
iter 23760: loss 8.2760, time 123.56ms
iter 23770: loss 8.3708, time 126.85ms
iter 23780: loss 8.5776, time 122.86ms
iter 23790: loss 8.4639, time 122.48ms
tensor(0.1355)
iter 23800: loss 8.3947, time 123.14ms
iter 23810: loss 8.3686, time 123.22ms
iter 23820: loss 8.0197, time 123.03ms
iter 23830: loss 8.6605, time 122.43ms
iter 23840: loss 7.8120, time 125.80ms
iter 23850: loss 8.5232, time 122.85ms
iter 23860: loss 8.0566, time 123.11ms
iter 23870: loss 8.2562, time 122.96ms
iter 23880: loss 8.6762, time 122.91ms
iter 23890: loss 8.5118, time 123.09ms
tensor(0.1147)
iter 23900: loss 8.8671, time 122.66ms
iter 23910: loss 7.8532, time 125.62ms
iter 23920: loss 7.7900, time 123.37ms
iter 23930: loss 8.0480, time 123.44ms
iter 23940: loss 7.1205, time 123.08ms
iter 23950: loss 8.6094, time 123.46ms
iter 23960: loss 8.5804, time 122.85ms
iter 23970: loss 8.8211, time 123.40ms
iter 23980: loss 8.2303, time 126.33ms
iter 23990: loss 7.8813, time 121.92ms
tensor(0.0955)
step 24000: train loss 6.7797, val loss 6.7898
saving checkpoint to out-shakespeare-char
iter 24000: loss 8.9722, time 2840.16ms
iter 24010: loss 7.5421, time 126.09ms
iter 24020: loss 7.8709, time 123.13ms
iter 24030: loss 8.0094, time 123.41ms
iter 24040: loss 8.6586, time 123.30ms
iter 24050: loss 7.9038, time 124.58ms
iter 24060: loss 8.4832, time 124.26ms
iter 24070: loss 7.3206, time 122.19ms
iter 24080: loss 8.0611, time 123.17ms
iter 24090: loss 7.9657, time 126.24ms
tensor(0.0778)
iter 24100: loss 8.7559, time 123.40ms
iter 24110: loss 8.0534, time 122.58ms
iter 24120: loss 7.6785, time 122.63ms
iter 24130: loss 7.9932, time 123.85ms
iter 24140: loss 8.0811, time 122.62ms
iter 24150: loss 7.9851, time 124.79ms
iter 24160: loss 8.1239, time 122.56ms
iter 24170: loss 7.5893, time 121.23ms
iter 24180: loss 8.3118, time 122.55ms
iter 24190: loss 8.7736, time 122.50ms
tensor(0.0618)
iter 24200: loss 8.7167, time 122.61ms
iter 24210: loss 7.9686, time 125.45ms
iter 24220: loss 8.0919, time 122.59ms
iter 24230: loss 8.7178, time 122.85ms
iter 24240: loss 8.0743, time 122.48ms
step 24250: train loss 6.7620, val loss 6.7427
saving checkpoint to out-shakespeare-char
iter 24250: loss 8.0540, time 2861.54ms
iter 24260: loss 7.2999, time 122.82ms
iter 24270: loss 7.5120, time 123.23ms
iter 24280: loss 7.4190, time 121.88ms
iter 24290: loss 8.5206, time 125.62ms
tensor(0.0476)
iter 24300: loss 8.5705, time 124.22ms
iter 24310: loss 8.1516, time 121.93ms
iter 24320: loss 7.9279, time 122.92ms
iter 24330: loss 7.8234, time 122.76ms
iter 24340: loss 8.2226, time 124.23ms
iter 24350: loss 7.7593, time 122.93ms
iter 24360: loss 8.1654, time 125.92ms
iter 24370: loss 8.8362, time 124.26ms
iter 24380: loss 9.2684, time 122.71ms
iter 24390: loss 8.0234, time 122.77ms
tensor(0.0351)
iter 24400: loss 8.2626, time 122.75ms
iter 24410: loss 8.0475, time 123.46ms
iter 24420: loss 8.2123, time 122.64ms
iter 24430: loss 7.4781, time 123.20ms
iter 24440: loss 8.3800, time 126.42ms
iter 24450: loss 8.8423, time 123.36ms
iter 24460: loss 8.4907, time 123.36ms
iter 24470: loss 7.8947, time 123.23ms
iter 24480: loss 7.5896, time 123.48ms
iter 24490: loss 8.0311, time 122.60ms
tensor(0.0245)
step 24500: train loss 6.7161, val loss 6.7980
saving checkpoint to out-shakespeare-char
iter 24500: loss 8.3361, time 2847.26ms
iter 24510: loss 8.2253, time 122.52ms
iter 24520: loss 8.4340, time 122.66ms
iter 24530: loss 8.3013, time 122.64ms
iter 24540: loss 7.9539, time 124.92ms
iter 24550: loss 9.0648, time 122.49ms
iter 24560: loss 8.1526, time 122.57ms
iter 24570: loss 7.4202, time 123.23ms
iter 24580: loss 8.4723, time 122.16ms
iter 24590: loss 8.0427, time 122.68ms
tensor(0.0157)
iter 24600: loss 7.8169, time 123.14ms
iter 24610: loss 9.2259, time 122.30ms
iter 24620: loss 8.0314, time 125.34ms
iter 24630: loss 7.4792, time 122.67ms
iter 24640: loss 8.8933, time 122.52ms
iter 24650: loss 8.0519, time 122.54ms
iter 24660: loss 7.7989, time 122.56ms
iter 24670: loss 7.9000, time 122.91ms
iter 24680: loss 8.4224, time 126.01ms
iter 24690: loss 8.3905, time 123.20ms
tensor(0.0089)
iter 24700: loss 7.3767, time 123.32ms
iter 24710: loss 8.8377, time 123.91ms
iter 24720: loss 7.3155, time 125.65ms
iter 24730: loss 8.4478, time 126.36ms
iter 24740: loss 7.7625, time 123.30ms
step 24750: train loss 6.7438, val loss 6.7332
saving checkpoint to out-shakespeare-char
iter 24750: loss 7.8907, time 2882.19ms
iter 24760: loss 8.4665, time 123.26ms
iter 24770: loss 7.9674, time 123.41ms
iter 24780: loss 8.4653, time 124.70ms
iter 24790: loss 8.9560, time 123.34ms
tensor(0.0039)
iter 24800: loss 7.8322, time 126.35ms
iter 24810: loss 7.8625, time 123.31ms
iter 24820: loss 8.3783, time 123.12ms
iter 24830: loss 7.7097, time 128.48ms
iter 24840: loss 9.1142, time 123.24ms
iter 24850: loss 8.1955, time 123.60ms
iter 24860: loss 8.2228, time 123.30ms
iter 24870: loss 8.3627, time 123.25ms
iter 24880: loss 8.3459, time 123.53ms
iter 24890: loss 8.1633, time 122.76ms
tensor(0.0010)
iter 24900: loss 8.0564, time 123.24ms
iter 24910: loss 8.5575, time 123.57ms
iter 24920: loss 7.5050, time 126.26ms
iter 24930: loss 8.4583, time 124.69ms
iter 24940: loss 7.2490, time 123.49ms
iter 24950: loss 7.8775, time 123.21ms
iter 24960: loss 8.3925, time 123.24ms
iter 24970: loss 7.5913, time 123.48ms
iter 24980: loss 8.3475, time 123.45ms
iter 24990: loss 7.9668, time 123.16ms
tensor(0.0010)
step 25000: train loss 6.7411, val loss 6.7092
saving checkpoint to out-shakespeare-char
iter 25000: loss 7.7656, time 2889.05ms
iter 25010: loss 8.3356, time 125.42ms
iter 25020: loss 7.9351, time 121.31ms
iter 25030: loss 8.3421, time 122.50ms
iter 25040: loss 7.6388, time 122.84ms
iter 25050: loss 8.2421, time 121.57ms
iter 25060: loss 7.9819, time 122.44ms
iter 25070: loss 7.6688, time 122.98ms
iter 25080: loss 9.0638, time 122.64ms
iter 25090: loss 8.6609, time 122.71ms
tensor(0.0010)
iter 25100: loss 8.6374, time 121.99ms
iter 25110: loss 8.2523, time 125.46ms
iter 25120: loss 8.4786, time 122.75ms
iter 25130: loss 7.8655, time 122.69ms
iter 25140: loss 8.5055, time 123.31ms
iter 25150: loss 8.4812, time 122.87ms
iter 25160: loss 8.0066, time 122.79ms
iter 25170: loss 7.6292, time 122.63ms
iter 25180: loss 8.4891, time 124.18ms
iter 25190: loss 7.4781, time 122.56ms
tensor(0.0039)
iter 25200: loss 8.4419, time 122.59ms
iter 25210: loss 8.2341, time 122.47ms
iter 25220: loss 8.9196, time 123.91ms
iter 25230: loss 7.6374, time 122.57ms
iter 25240: loss 8.7111, time 122.30ms
step 25250: train loss 6.7423, val loss 6.7888
saving checkpoint to out-shakespeare-char
iter 25250: loss 7.8544, time 2852.12ms
iter 25260: loss 8.5851, time 123.12ms
iter 25270: loss 8.1014, time 122.77ms
iter 25280: loss 8.2368, time 125.30ms
iter 25290: loss 6.6890, time 122.45ms
tensor(0.0089)
iter 25300: loss 7.7402, time 122.61ms
iter 25310: loss 8.2052, time 122.48ms
iter 25320: loss 8.1602, time 121.54ms
iter 25330: loss 8.6521, time 122.72ms
iter 25340: loss 8.1109, time 122.53ms
iter 25350: loss 8.3619, time 124.50ms
iter 25360: loss 8.1967, time 122.57ms
iter 25370: loss 7.9758, time 122.49ms
iter 25380: loss 7.6847, time 122.67ms
iter 25390: loss 8.6907, time 122.71ms
tensor(0.0157)
iter 25400: loss 8.7958, time 124.50ms
iter 25410: loss 7.6812, time 122.50ms
iter 25420: loss 8.3950, time 125.38ms
iter 25430: loss 8.0716, time 122.45ms
iter 25440: loss 8.0219, time 122.52ms
iter 25450: loss 7.8677, time 121.57ms
iter 25460: loss 8.3709, time 122.53ms
iter 25470: loss 8.8004, time 122.70ms
iter 25480: loss 7.5506, time 122.57ms
iter 25490: loss 7.6740, time 125.72ms
tensor(0.0245)
step 25500: train loss 6.7531, val loss 6.7566
saving checkpoint to out-shakespeare-char
iter 25500: loss 8.3799, time 2862.15ms
iter 25510: loss 7.4360, time 122.60ms
iter 25520: loss 8.5212, time 122.59ms
iter 25530: loss 7.6714, time 125.40ms
iter 25540: loss 7.7846, time 122.47ms
iter 25550: loss 7.9966, time 122.59ms
iter 25560: loss 8.1193, time 122.69ms
iter 25570: loss 7.9939, time 122.55ms
iter 25580: loss 7.5116, time 122.64ms
iter 25590: loss 8.0167, time 123.00ms
tensor(0.0351)
iter 25600: loss 7.6888, time 125.51ms
iter 25610: loss 8.5750, time 122.35ms
iter 25620: loss 7.6512, time 122.97ms
iter 25630: loss 8.1070, time 122.53ms
iter 25640: loss 8.0731, time 122.79ms
iter 25650: loss 7.7371, time 122.79ms
iter 25660: loss 8.1227, time 122.93ms
iter 25670: loss 8.7684, time 122.89ms
iter 25680: loss 7.9944, time 122.33ms
iter 25690: loss 8.2839, time 122.49ms
tensor(0.0476)
iter 25700: loss 7.4037, time 122.26ms
iter 25710: loss 7.9708, time 121.56ms
iter 25720: loss 7.5838, time 122.45ms
iter 25730: loss 8.0368, time 125.33ms
iter 25740: loss 7.7912, time 122.88ms
step 25750: train loss 6.6852, val loss 6.8112
saving checkpoint to out-shakespeare-char
iter 25750: loss 7.5486, time 2844.46ms
iter 25760: loss 7.3968, time 125.31ms
iter 25770: loss 7.8533, time 122.34ms
iter 25780: loss 8.2329, time 122.31ms
iter 25790: loss 7.7691, time 122.62ms
tensor(0.0618)
iter 25800: loss 8.8155, time 123.11ms
iter 25810: loss 7.9899, time 122.71ms
iter 25820: loss 7.8360, time 122.74ms
iter 25830: loss 8.7197, time 125.03ms
iter 25840: loss 7.7725, time 122.33ms
iter 25850: loss 8.7467, time 122.47ms
iter 25860: loss 7.8462, time 122.56ms
iter 25870: loss 8.1993, time 122.25ms
iter 25880: loss 7.9167, time 122.53ms
iter 25890: loss 7.3957, time 122.36ms
tensor(0.0778)
iter 25900: loss 9.0139, time 125.18ms
iter 25910: loss 7.7010, time 122.22ms
iter 25920: loss 8.1697, time 122.43ms
iter 25930: loss 7.9422, time 123.66ms
iter 25940: loss 8.4210, time 122.91ms
iter 25950: loss 7.9028, time 123.21ms
iter 25960: loss 8.0222, time 121.58ms
iter 25970: loss 8.3022, time 122.75ms
iter 25980: loss 8.2656, time 125.65ms
iter 25990: loss 8.5291, time 122.73ms
tensor(0.0955)
step 26000: train loss 6.6941, val loss 6.7187
saving checkpoint to out-shakespeare-char
iter 26000: loss 7.5215, time 2863.55ms
iter 26010: loss 8.1849, time 120.24ms
iter 26020: loss 7.8813, time 120.25ms
iter 26030: loss 7.4232, time 119.55ms
iter 26040: loss 8.0879, time 119.33ms
iter 26050: loss 8.7657, time 119.08ms
iter 26060: loss 7.8687, time 120.41ms
iter 26070: loss 7.9366, time 118.94ms
iter 26080: loss 8.0296, time 120.38ms
iter 26090: loss 7.8215, time 120.62ms
tensor(0.1147)
iter 26100: loss 7.5849, time 119.26ms
iter 26110: loss 7.9574, time 120.47ms
iter 26120: loss 7.5644, time 119.18ms
iter 26130: loss 8.8131, time 119.58ms
iter 26140: loss 7.6268, time 119.31ms
iter 26150: loss 7.4297, time 119.39ms
iter 26160: loss 7.2428, time 123.16ms
iter 26170: loss 8.4385, time 122.51ms
iter 26180: loss 8.1518, time 122.64ms
iter 26190: loss 7.3121, time 122.48ms
tensor(0.1355)
iter 26200: loss 8.8064, time 122.68ms
iter 26210: loss 8.8483, time 122.55ms
iter 26220: loss 8.0527, time 122.55ms
iter 26230: loss 7.7575, time 122.68ms
iter 26240: loss 7.7931, time 122.73ms
step 26250: train loss 6.7389, val loss 6.7463
saving checkpoint to out-shakespeare-char
iter 26250: loss 7.9466, time 2874.80ms
iter 26260: loss 8.0483, time 119.62ms
iter 26270: loss 8.4819, time 119.67ms
iter 26280: loss 7.6207, time 120.88ms
iter 26290: loss 7.9047, time 121.29ms
tensor(0.1577)
iter 26300: loss 7.9467, time 120.98ms
iter 26310: loss 8.6383, time 119.65ms
iter 26320: loss 8.2668, time 120.38ms
iter 26330: loss 9.0786, time 119.81ms
iter 26340: loss 7.8626, time 121.17ms
iter 26350: loss 8.0159, time 119.80ms
iter 26360: loss 7.1830, time 119.75ms
iter 26370: loss 7.4610, time 119.53ms
iter 26380: loss 8.2792, time 119.51ms
iter 26390: loss 8.3197, time 121.01ms
tensor(0.1813)
iter 26400: loss 8.0680, time 120.28ms
iter 26410: loss 8.5061, time 120.91ms
iter 26420: loss 7.9384, time 120.56ms
iter 26430: loss 8.2952, time 120.84ms
iter 26440: loss 8.1599, time 119.57ms
iter 26450: loss 8.1454, time 120.71ms
iter 26460: loss 8.1707, time 120.74ms
iter 26470: loss 8.6025, time 119.49ms
iter 26480: loss 7.9773, time 119.87ms
iter 26490: loss 7.5707, time 121.15ms
tensor(0.2061)
step 26500: train loss 6.8395, val loss 6.7549
saving checkpoint to out-shakespeare-char
iter 26500: loss 7.9935, time 2874.60ms
iter 26510: loss 9.1527, time 122.12ms
iter 26520: loss 8.0750, time 120.90ms
iter 26530: loss 7.5581, time 119.45ms
iter 26540: loss 7.6514, time 119.66ms
iter 26550: loss 7.9710, time 118.56ms
iter 26560: loss 8.0421, time 119.47ms
iter 26570: loss 7.8029, time 119.63ms
iter 26580: loss 8.4736, time 119.83ms
iter 26590: loss 7.6716, time 120.88ms
tensor(0.2321)
iter 26600: loss 8.2803, time 118.80ms
iter 26610: loss 8.7001, time 120.92ms
iter 26620: loss 8.4833, time 121.25ms
iter 26630: loss 8.7558, time 120.67ms
iter 26640: loss 7.9495, time 119.57ms
iter 26650: loss 7.7509, time 120.09ms
iter 26660: loss 8.0702, time 120.92ms
iter 26670: loss 9.1523, time 120.76ms
iter 26680: loss 7.7755, time 119.25ms
iter 26690: loss 7.8853, time 121.85ms
tensor(0.2591)
iter 26700: loss 8.3800, time 119.67ms
iter 26710: loss 8.2426, time 119.63ms
iter 26720: loss 8.7451, time 119.56ms
iter 26730: loss 8.1521, time 119.66ms
iter 26740: loss 8.4150, time 120.91ms
step 26750: train loss 6.7805, val loss 6.7369
saving checkpoint to out-shakespeare-char
iter 26750: loss 8.5499, time 2886.82ms
iter 26760: loss 8.2313, time 119.75ms
iter 26770: loss 8.4343, time 120.04ms
iter 26780: loss 8.2169, time 119.12ms
iter 26790: loss 7.6006, time 120.28ms
tensor(0.2871)
iter 26800: loss 8.2900, time 119.00ms
iter 26810: loss 8.0664, time 120.41ms
iter 26820: loss 7.9756, time 119.22ms
iter 26830: loss 8.3348, time 119.25ms
iter 26840: loss 7.9327, time 122.27ms
iter 26850: loss 7.5733, time 119.66ms
iter 26860: loss 8.8436, time 122.49ms
iter 26870: loss 7.5137, time 119.25ms
iter 26880: loss 7.5308, time 119.05ms
iter 26890: loss 8.6044, time 119.01ms
tensor(0.3159)
iter 26900: loss 7.6682, time 119.67ms
iter 26910: loss 7.6958, time 119.37ms
iter 26920: loss 7.7823, time 119.65ms
iter 26930: loss 7.2588, time 119.08ms
iter 26940: loss 9.1377, time 119.97ms
iter 26950: loss 7.9830, time 119.05ms
iter 26960: loss 7.7365, time 119.07ms
iter 26970: loss 8.5246, time 120.32ms
iter 26980: loss 8.1092, time 119.72ms
iter 26990: loss 8.0866, time 120.44ms
tensor(0.3455)
step 27000: train loss 6.7093, val loss 6.7284
saving checkpoint to out-shakespeare-char
iter 27000: loss 8.2635, time 2865.96ms
iter 27010: loss 8.5353, time 120.19ms
iter 27020: loss 7.8991, time 120.27ms
iter 27030: loss 8.5725, time 119.30ms
iter 27040: loss 7.7966, time 119.36ms
iter 27050: loss 8.3453, time 121.54ms
iter 27060: loss 7.9063, time 118.35ms
iter 27070: loss 8.6318, time 119.61ms
iter 27080: loss 7.5716, time 120.27ms
iter 27090: loss 7.6355, time 122.11ms
tensor(0.3757)
iter 27100: loss 8.4324, time 119.80ms
iter 27110: loss 7.8972, time 119.02ms
iter 27120: loss 7.3420, time 117.65ms
iter 27130: loss 7.7383, time 123.34ms
iter 27140: loss 8.2720, time 118.67ms
iter 27150: loss 8.1569, time 122.96ms
iter 27160: loss 8.3191, time 119.52ms
iter 27170: loss 8.7415, time 119.93ms
iter 27180: loss 7.6948, time 119.13ms
iter 27190: loss 7.4456, time 120.66ms
tensor(0.4063)
iter 27200: loss 8.1139, time 119.66ms
iter 27210: loss 7.8052, time 120.04ms
iter 27220: loss 8.3017, time 120.88ms
iter 27230: loss 8.2371, time 119.82ms
iter 27240: loss 7.9427, time 119.23ms
step 27250: train loss 6.7459, val loss 6.8222
saving checkpoint to out-shakespeare-char
iter 27250: loss 8.3235, time 2881.02ms
iter 27260: loss 8.1496, time 120.42ms
iter 27270: loss 8.6519, time 119.48ms
iter 27280: loss 8.1541, time 122.17ms
iter 27290: loss 8.0647, time 120.52ms
tensor(0.4373)
iter 27300: loss 7.8823, time 121.04ms
iter 27310: loss 7.8137, time 119.40ms
iter 27320: loss 8.7931, time 121.07ms
iter 27330: loss 7.7033, time 119.17ms
iter 27340: loss 8.6239, time 118.32ms
iter 27350: loss 8.1785, time 119.08ms
iter 27360: loss 7.8053, time 120.20ms
iter 27370: loss 8.2227, time 120.52ms
iter 27380: loss 8.1089, time 120.44ms
iter 27390: loss 8.0558, time 119.25ms
tensor(0.4686)
iter 27400: loss 8.4369, time 119.22ms
iter 27410: loss 7.9899, time 120.45ms
iter 27420: loss 7.8728, time 121.81ms
iter 27430: loss 7.6610, time 119.20ms
iter 27440: loss 8.2491, time 118.22ms
iter 27450: loss 7.8035, time 120.27ms
iter 27460: loss 8.1813, time 122.13ms
iter 27470: loss 7.9802, time 120.57ms
iter 27480: loss 8.1746, time 120.77ms
iter 27490: loss 8.5008, time 118.76ms
tensor(0.5000)
step 27500: train loss 6.7745, val loss 6.7728
saving checkpoint to out-shakespeare-char
iter 27500: loss 8.5489, time 2872.82ms
iter 27510: loss 8.3848, time 121.05ms
iter 27520: loss 8.7198, time 121.35ms
iter 27530: loss 8.1087, time 119.65ms
iter 27540: loss 8.1620, time 119.12ms
iter 27550: loss 7.2888, time 119.29ms
iter 27560: loss 8.3909, time 119.73ms
iter 27570: loss 7.6483, time 120.33ms
iter 27580: loss 7.6923, time 119.48ms
iter 27590: loss 7.7277, time 120.08ms
tensor(0.5314)
iter 27600: loss 7.3305, time 119.88ms
iter 27610: loss 8.0679, time 123.06ms
iter 27620: loss 8.0040, time 125.14ms
iter 27630: loss 8.8166, time 123.94ms
iter 27640: loss 7.4360, time 123.35ms
iter 27650: loss 8.0481, time 123.05ms
iter 27660: loss 8.1796, time 122.94ms
iter 27670: loss 8.2504, time 123.73ms
iter 27680: loss 7.3379, time 121.99ms
iter 27690: loss 7.8604, time 122.84ms
tensor(0.5627)
iter 27700: loss 9.1960, time 121.43ms
iter 27710: loss 8.3138, time 124.21ms
iter 27720: loss 7.6239, time 121.75ms
iter 27730: loss 8.5939, time 121.55ms
iter 27740: loss 7.8809, time 123.00ms
step 27750: train loss 6.7363, val loss 6.7574
saving checkpoint to out-shakespeare-char
iter 27750: loss 7.6521, time 2862.23ms
iter 27760: loss 7.6754, time 119.61ms
iter 27770: loss 8.3086, time 119.31ms
iter 27780: loss 7.2006, time 122.58ms
iter 27790: loss 7.7169, time 121.43ms
tensor(0.5937)
iter 27800: loss 7.1200, time 122.12ms
iter 27810: loss 7.8450, time 120.59ms
iter 27820: loss 8.1201, time 120.61ms
iter 27830: loss 7.6594, time 119.28ms
iter 27840: loss 7.2271, time 119.35ms
iter 27850: loss 7.5580, time 119.25ms
iter 27860: loss 7.5350, time 119.48ms
iter 27870: loss 7.7508, time 119.24ms
iter 27880: loss 8.7734, time 119.26ms
iter 27890: loss 8.5534, time 119.32ms
tensor(0.6243)
iter 27900: loss 8.4301, time 119.50ms
iter 27910: loss 8.1054, time 119.54ms
iter 27920: loss 8.2240, time 119.15ms
iter 27930: loss 8.6775, time 119.58ms
iter 27940: loss 7.1126, time 121.21ms
iter 27950: loss 8.6632, time 121.04ms
iter 27960: loss 7.8803, time 119.18ms
iter 27970: loss 8.5085, time 119.61ms
iter 27980: loss 7.8917, time 119.30ms
iter 27990: loss 8.5027, time 119.65ms
tensor(0.6545)
step 28000: train loss 6.7736, val loss 6.7239
saving checkpoint to out-shakespeare-char
iter 28000: loss 9.0430, time 2874.12ms
iter 28010: loss 8.1019, time 122.09ms
iter 28020: loss 8.2363, time 119.78ms
iter 28030: loss 7.5391, time 119.39ms
iter 28040: loss 8.2946, time 119.34ms
iter 28050: loss 8.5437, time 119.49ms
iter 28060: loss 8.2402, time 119.49ms
iter 28070: loss 8.5729, time 120.29ms
iter 28080: loss 8.2347, time 119.39ms
iter 28090: loss 8.4462, time 120.07ms
tensor(0.6841)
iter 28100: loss 7.7541, time 121.03ms
iter 28110: loss 8.5797, time 118.77ms
iter 28120: loss 7.7992, time 119.16ms
iter 28130: loss 8.1552, time 118.80ms
iter 28140: loss 7.6921, time 118.58ms
iter 28150: loss 7.5905, time 121.24ms
iter 28160: loss 7.6499, time 123.01ms
iter 28170: loss 7.9254, time 122.30ms
iter 28180: loss 8.0564, time 121.80ms
iter 28190: loss 7.9101, time 122.61ms
tensor(0.7129)
iter 28200: loss 7.8404, time 122.89ms
iter 28210: loss 8.4379, time 122.76ms
iter 28220: loss 8.8878, time 125.38ms
iter 28230: loss 8.1912, time 122.61ms
iter 28240: loss 8.3006, time 122.76ms
step 28250: train loss 6.8403, val loss 6.7636
saving checkpoint to out-shakespeare-char
iter 28250: loss 9.1218, time 2845.34ms
iter 28260: loss 7.8377, time 126.01ms
iter 28270: loss 7.9289, time 123.07ms
iter 28280: loss 8.6299, time 123.00ms
iter 28290: loss 8.1429, time 123.11ms
tensor(0.7409)
iter 28300: loss 8.6184, time 123.09ms
iter 28310: loss 7.4082, time 123.10ms
iter 28320: loss 8.5446, time 123.45ms
iter 28330: loss 8.7959, time 126.12ms
iter 28340: loss 8.6149, time 123.06ms
iter 28350: loss 8.6240, time 123.23ms
iter 28360: loss 7.8698, time 123.24ms
iter 28370: loss 7.4763, time 123.38ms
iter 28380: loss 8.1208, time 123.85ms
iter 28390: loss 8.1036, time 123.20ms
tensor(0.7679)
iter 28400: loss 7.7591, time 123.34ms
iter 28410: loss 7.9198, time 123.35ms
iter 28420: loss 8.7131, time 123.26ms
iter 28430: loss 8.6772, time 126.26ms
iter 28440: loss 8.0724, time 123.01ms
iter 28450: loss 8.6792, time 123.26ms
iter 28460: loss 8.5891, time 123.21ms
iter 28470: loss 8.3719, time 123.28ms
iter 28480: loss 8.0786, time 123.18ms
iter 28490: loss 7.1391, time 123.34ms
tensor(0.7939)
step 28500: train loss 6.7155, val loss 6.7052
saving checkpoint to out-shakespeare-char
iter 28500: loss 8.7115, time 2856.31ms
iter 28510: loss 7.1554, time 122.57ms
iter 28520: loss 8.0556, time 123.05ms
iter 28530: loss 8.0604, time 122.66ms
iter 28540: loss 7.1994, time 122.83ms
iter 28550: loss 8.8719, time 123.01ms
iter 28560: loss 7.6926, time 122.87ms
iter 28570: loss 8.1981, time 123.05ms
iter 28580: loss 8.6278, time 123.00ms
iter 28590: loss 7.6431, time 122.98ms
tensor(0.8187)
iter 28600: loss 8.3638, time 123.53ms
iter 28610: loss 7.9890, time 123.02ms
iter 28620: loss 8.1483, time 122.92ms
iter 28630: loss 7.5983, time 122.99ms
iter 28640: loss 8.3380, time 122.70ms
iter 28650: loss 7.9903, time 122.65ms
iter 28660: loss 8.2568, time 122.61ms
iter 28670: loss 7.8439, time 122.77ms
iter 28680: loss 8.2957, time 122.85ms
iter 28690: loss 7.5593, time 122.51ms
tensor(0.8423)
iter 28700: loss 7.6053, time 125.43ms
iter 28710: loss 8.3766, time 122.52ms
iter 28720: loss 7.4904, time 122.36ms
iter 28730: loss 8.2899, time 122.68ms
iter 28740: loss 7.5791, time 122.86ms
step 28750: train loss 6.8003, val loss 6.7790
saving checkpoint to out-shakespeare-char
iter 28750: loss 8.2232, time 2855.93ms
iter 28760: loss 8.1904, time 123.16ms
iter 28770: loss 7.8229, time 123.13ms
iter 28780: loss 7.6693, time 125.86ms
iter 28790: loss 8.1436, time 123.34ms
tensor(0.8645)
iter 28800: loss 7.3228, time 123.24ms
iter 28810: loss 8.0045, time 122.89ms
iter 28820: loss 7.8793, time 122.83ms
iter 28830: loss 7.8043, time 123.42ms
iter 28840: loss 7.6937, time 122.78ms
iter 28850: loss 7.5373, time 122.82ms
iter 28860: loss 7.3404, time 125.58ms
iter 28870: loss 8.9660, time 122.86ms
iter 28880: loss 8.1776, time 121.94ms
iter 28890: loss 7.5537, time 124.35ms
tensor(0.8853)
iter 28900: loss 8.0052, time 122.73ms
iter 28910: loss 8.3642, time 123.08ms
iter 28920: loss 7.7535, time 123.37ms
iter 28930: loss 8.1343, time 123.73ms
iter 28940: loss 8.2419, time 122.81ms
iter 28950: loss 7.3017, time 122.38ms
iter 28960: loss 7.3407, time 125.16ms
iter 28970: loss 7.9010, time 122.35ms
iter 28980: loss 7.9018, time 122.40ms
iter 28990: loss 8.2934, time 122.22ms
tensor(0.9045)
step 29000: train loss 6.7575, val loss 6.7923
saving checkpoint to out-shakespeare-char
iter 29000: loss 7.9135, time 2841.85ms
iter 29010: loss 8.4065, time 123.46ms
iter 29020: loss 8.1602, time 122.72ms
iter 29030: loss 8.0581, time 122.62ms
iter 29040: loss 8.4454, time 125.18ms
iter 29050: loss 7.3413, time 121.48ms
iter 29060: loss 8.5052, time 122.77ms
iter 29070: loss 8.8942, time 122.68ms
iter 29080: loss 7.5434, time 122.59ms
iter 29090: loss 8.2292, time 122.73ms
tensor(0.9222)
iter 29100: loss 8.2407, time 122.86ms
iter 29110: loss 8.3811, time 122.52ms
iter 29120: loss 7.9960, time 122.37ms
iter 29130: loss 8.8725, time 122.47ms
iter 29140: loss 7.7052, time 123.16ms
iter 29150: loss 8.2962, time 122.60ms
iter 29160: loss 7.9956, time 125.92ms
iter 29170: loss 7.3712, time 122.81ms
iter 29180: loss 7.9849, time 122.97ms
iter 29190: loss 7.7447, time 122.53ms
tensor(0.9382)
iter 29200: loss 8.1069, time 123.15ms
iter 29210: loss 8.6791, time 122.82ms
iter 29220: loss 8.4406, time 122.49ms
iter 29230: loss 7.4038, time 125.64ms
iter 29240: loss 9.1896, time 122.80ms
step 29250: train loss 6.8191, val loss 6.7417
saving checkpoint to out-shakespeare-char
iter 29250: loss 7.8509, time 2851.67ms
iter 29260: loss 8.1374, time 124.31ms
iter 29270: loss 7.3324, time 122.87ms
iter 29280: loss 8.8456, time 123.42ms
iter 29290: loss 8.3185, time 123.28ms
tensor(0.9524)
iter 29300: loss 7.7728, time 123.38ms
iter 29310: loss 8.5661, time 123.16ms
iter 29320: loss 8.0786, time 123.37ms
iter 29330: loss 8.0150, time 123.66ms
iter 29340: loss 7.4144, time 123.70ms
iter 29350: loss 8.0412, time 122.62ms
iter 29360: loss 8.2531, time 121.50ms
iter 29370: loss 7.8455, time 122.68ms
iter 29380: loss 8.2712, time 124.25ms
iter 29390: loss 8.5474, time 123.56ms
tensor(0.9649)
iter 29400: loss 8.6451, time 123.47ms
iter 29410: loss 8.1373, time 123.51ms
iter 29420: loss 8.1642, time 123.28ms
iter 29430: loss 8.0600, time 126.17ms
iter 29440: loss 8.1836, time 123.21ms
iter 29450: loss 8.4111, time 123.52ms
iter 29460: loss 7.9394, time 123.67ms
iter 29470: loss 8.5801, time 123.27ms
iter 29480: loss 7.9014, time 123.29ms
iter 29490: loss 8.2959, time 123.34ms
tensor(0.9755)
step 29500: train loss 6.7057, val loss 6.7523
saving checkpoint to out-shakespeare-char
iter 29500: loss 7.9494, time 2857.80ms
iter 29510: loss 8.2058, time 121.86ms
iter 29520: loss 8.1992, time 124.58ms
iter 29530: loss 8.1134, time 122.51ms
iter 29540: loss 8.1679, time 123.11ms
iter 29550: loss 8.2361, time 122.91ms
iter 29560: loss 7.7300, time 123.04ms
iter 29570: loss 8.1971, time 123.73ms
iter 29580: loss 8.1723, time 123.10ms
iter 29590: loss 8.3312, time 122.58ms
tensor(0.9843)
iter 29600: loss 7.7105, time 122.58ms
iter 29610: loss 8.0600, time 122.80ms
iter 29620: loss 8.0377, time 126.27ms
iter 29630: loss 8.0724, time 122.80ms
iter 29640: loss 8.1498, time 122.54ms
iter 29650: loss 7.7448, time 122.59ms
iter 29660: loss 8.7213, time 122.65ms
iter 29670: loss 7.9721, time 122.14ms
iter 29680: loss 7.6081, time 116.82ms
iter 29690: loss 8.6754, time 116.99ms
tensor(0.9911)
iter 29700: loss 7.6389, time 117.95ms
iter 29710: loss 8.5403, time 118.90ms
iter 29720: loss 7.1970, time 117.59ms
iter 29730: loss 8.0290, time 117.31ms
iter 29740: loss 7.8878, time 117.43ms
step 29750: train loss 6.7081, val loss 6.7153
saving checkpoint to out-shakespeare-char
iter 29750: loss 7.9835, time 2854.00ms
iter 29760: loss 8.4890, time 120.49ms
iter 29770: loss 8.1908, time 119.14ms
iter 29780: loss 8.6092, time 119.07ms
iter 29790: loss 8.4435, time 121.25ms
tensor(0.9961)
iter 29800: loss 8.2722, time 118.34ms
iter 29810: loss 8.2843, time 119.22ms
iter 29820: loss 8.1764, time 119.12ms
iter 29830: loss 7.8370, time 119.42ms
iter 29840: loss 7.7627, time 120.45ms
iter 29850: loss 8.1356, time 119.34ms
iter 29860: loss 7.3217, time 121.09ms
iter 29870: loss 7.5310, time 119.16ms
iter 29880: loss 8.4389, time 119.02ms
iter 29890: loss 7.1066, time 120.29ms
tensor(0.9990)
iter 29900: loss 8.3181, time 120.11ms
iter 29910: loss 7.7867, time 118.28ms
iter 29920: loss 9.2081, time 118.86ms
iter 29930: loss 7.6138, time 119.57ms
iter 29940: loss 7.3580, time 119.44ms
iter 29950: loss 8.0674, time 118.64ms
iter 29960: loss 7.5393, time 119.25ms
iter 29970: loss 8.9650, time 118.94ms
iter 29980: loss 8.3337, time 119.20ms
iter 29990: loss 8.2082, time 119.01ms
tensor(1.)
step 30000: train loss 6.7386, val loss 6.7217
saving checkpoint to out-shakespeare-char
iter 30000: loss 7.3356, time 2865.47ms
iter 30010: loss 7.5300, time 118.54ms
iter 30020: loss 7.4333, time 119.30ms
iter 30030: loss 8.8051, time 121.15ms
iter 30040: loss 7.5964, time 119.32ms
iter 30050: loss 7.8683, time 120.58ms
iter 30060: loss 8.9993, time 118.23ms
iter 30070: loss 7.6469, time 120.39ms
iter 30080: loss 7.9755, time 119.30ms
iter 30090: loss 7.9785, time 119.31ms
tensor(0.9990)
iter 30100: loss 7.2780, time 119.39ms
iter 30110: loss 7.7457, time 120.16ms
iter 30120: loss 7.6972, time 120.44ms
iter 30130: loss 8.1277, time 119.18ms
iter 30140: loss 8.2401, time 120.55ms
iter 30150: loss 7.8543, time 119.27ms
iter 30160: loss 8.1579, time 120.48ms
iter 30170: loss 7.9991, time 119.23ms
iter 30180: loss 7.7921, time 120.06ms
iter 30190: loss 7.2699, time 119.47ms
tensor(0.9961)
iter 30200: loss 7.4327, time 119.21ms
iter 30210: loss 7.6908, time 119.25ms
iter 30220: loss 7.7616, time 118.65ms
iter 30230: loss 8.9008, time 120.42ms
iter 30240: loss 8.3143, time 120.43ms
step 30250: train loss 6.7091, val loss 6.7277
saving checkpoint to out-shakespeare-char
iter 30250: loss 8.2188, time 2852.25ms
iter 30260: loss 7.9683, time 122.52ms
iter 30270: loss 8.2405, time 122.17ms
iter 30280: loss 8.6289, time 125.08ms
iter 30290: loss 7.8174, time 123.31ms
tensor(0.9911)
iter 30300: loss 7.9208, time 122.89ms
iter 30310: loss 7.8052, time 122.53ms
iter 30320: loss 8.1707, time 122.44ms
iter 30330: loss 7.9817, time 122.32ms
iter 30340: loss 7.8453, time 121.99ms
iter 30350: loss 7.7874, time 125.24ms
iter 30360: loss 7.3554, time 122.56ms
iter 30370: loss 7.5627, time 122.34ms
iter 30380: loss 7.9340, time 122.24ms
iter 30390: loss 7.6827, time 122.41ms
tensor(0.9843)
iter 30400: loss 7.7130, time 122.41ms
iter 30410: loss 7.4942, time 122.39ms
iter 30420: loss 8.1327, time 125.29ms
iter 30430: loss 7.4796, time 122.22ms
iter 30440: loss 7.5326, time 122.32ms
iter 30450: loss 8.4552, time 121.68ms
iter 30460: loss 7.9962, time 122.45ms
iter 30470: loss 8.4391, time 122.26ms
iter 30480: loss 8.4514, time 122.83ms
iter 30490: loss 7.2528, time 124.42ms
tensor(0.9755)
step 30500: train loss 6.6801, val loss 6.6911
saving checkpoint to out-shakespeare-char
iter 30500: loss 7.7926, time 2843.14ms
iter 30510: loss 8.3276, time 119.10ms
iter 30520: loss 7.6788, time 119.12ms
iter 30530: loss 8.1046, time 119.75ms
iter 30540: loss 7.9072, time 118.87ms
iter 30550: loss 8.3704, time 119.21ms
iter 30560: loss 8.4675, time 119.24ms
iter 30570: loss 8.4461, time 119.13ms
iter 30580: loss 8.1157, time 119.09ms
iter 30590: loss 8.3793, time 119.24ms
tensor(0.9649)
iter 30600: loss 8.7133, time 120.47ms
iter 30610: loss 7.3339, time 119.69ms
iter 30620: loss 7.7517, time 120.54ms
iter 30630: loss 8.2924, time 119.32ms
iter 30640: loss 7.3776, time 119.10ms
iter 30650: loss 8.2186, time 119.10ms
iter 30660: loss 7.9172, time 118.20ms
iter 30670: loss 8.6282, time 119.15ms
iter 30680: loss 8.1310, time 120.37ms
iter 30690: loss 7.8963, time 119.76ms
tensor(0.9524)
iter 30700: loss 8.1936, time 120.36ms
iter 30710: loss 8.7834, time 119.72ms
iter 30720: loss 7.8586, time 119.72ms
iter 30730: loss 7.6872, time 119.56ms
iter 30740: loss 8.0741, time 117.77ms
step 30750: train loss 6.6899, val loss 6.6514
saving checkpoint to out-shakespeare-char
iter 30750: loss 7.8884, time 2860.63ms
iter 30760: loss 7.1901, time 119.11ms
iter 30770: loss 7.8960, time 119.13ms
iter 30780: loss 8.4202, time 121.92ms
iter 30790: loss 7.3259, time 119.00ms
tensor(0.9382)
iter 30800: loss 7.7670, time 119.04ms
iter 30810: loss 7.3343, time 119.13ms
iter 30820: loss 8.2345, time 120.44ms
iter 30830: loss 8.1937, time 119.05ms
iter 30840: loss 8.3990, time 120.35ms
iter 30850: loss 7.7536, time 119.39ms
iter 30860: loss 8.1721, time 120.64ms
iter 30870: loss 8.4174, time 119.22ms
iter 30880: loss 8.0409, time 118.94ms
iter 30890: loss 7.9905, time 119.21ms
tensor(0.9222)
iter 30900: loss 8.1892, time 119.29ms
iter 30910: loss 7.5407, time 119.21ms
iter 30920: loss 7.9152, time 120.23ms
iter 30930: loss 8.1944, time 120.98ms
iter 30940: loss 7.8008, time 119.28ms
iter 30950: loss 8.0108, time 122.53ms
iter 30960: loss 7.4776, time 119.70ms
iter 30970: loss 7.7659, time 120.41ms
iter 30980: loss 8.7257, time 119.49ms
iter 30990: loss 7.2928, time 119.19ms
tensor(0.9045)
step 31000: train loss 6.6930, val loss 6.7074
saving checkpoint to out-shakespeare-char
iter 31000: loss 7.4243, time 2861.44ms
iter 31010: loss 7.7638, time 120.85ms
iter 31020: loss 7.9970, time 119.33ms
iter 31030: loss 7.6592, time 119.42ms
iter 31040: loss 8.6658, time 119.61ms
iter 31050: loss 6.8812, time 120.17ms
iter 31060: loss 7.9406, time 118.28ms
iter 31070: loss 7.2508, time 118.64ms
iter 31080: loss 7.6145, time 119.81ms
iter 31090: loss 7.7689, time 119.32ms
tensor(0.8853)
iter 31100: loss 8.9767, time 120.90ms
iter 31110: loss 7.5779, time 118.55ms
iter 31120: loss 8.8857, time 119.42ms
iter 31130: loss 8.5469, time 120.75ms
iter 31140: loss 8.4255, time 119.33ms
iter 31150: loss 8.2444, time 120.82ms
iter 31160: loss 8.0434, time 118.96ms
iter 31170: loss 7.4918, time 121.05ms
iter 31180: loss 8.1471, time 119.22ms
iter 31190: loss 7.9031, time 119.38ms
tensor(0.8645)
iter 31200: loss 8.2246, time 119.73ms
iter 31210: loss 8.2282, time 122.61ms
iter 31220: loss 8.4712, time 122.91ms
iter 31230: loss 8.2057, time 123.38ms
iter 31240: loss 8.7347, time 122.22ms
step 31250: train loss 6.6175, val loss 6.6065
saving checkpoint to out-shakespeare-char
iter 31250: loss 8.0336, time 2879.74ms
iter 31260: loss 8.3626, time 122.04ms
iter 31270: loss 8.0033, time 118.50ms
iter 31280: loss 8.0318, time 118.33ms
iter 31290: loss 8.0713, time 119.29ms
tensor(0.8423)
iter 31300: loss 7.0802, time 120.96ms
iter 31310: loss 7.5989, time 119.44ms
iter 31320: loss 7.4348, time 119.78ms
iter 31330: loss 8.4708, time 119.44ms
iter 31340: loss 8.5446, time 120.48ms
iter 31350: loss 8.3419, time 120.67ms
iter 31360: loss 7.5803, time 120.52ms
iter 31370: loss 8.5371, time 120.00ms
iter 31380: loss 7.9489, time 119.32ms
iter 31390: loss 6.7006, time 119.40ms
tensor(0.8187)
iter 31400: loss 7.4997, time 120.78ms
iter 31410: loss 7.9210, time 120.66ms
iter 31420: loss 8.4363, time 120.00ms
iter 31430: loss 8.6451, time 120.66ms
iter 31440: loss 8.8082, time 120.70ms
iter 31450: loss 8.0084, time 119.33ms
iter 31460: loss 8.3767, time 122.16ms
iter 31470: loss 6.8746, time 121.11ms
iter 31480: loss 7.6985, time 119.18ms
iter 31490: loss 7.7102, time 119.54ms
tensor(0.7939)
step 31500: train loss 6.6331, val loss 6.6571
saving checkpoint to out-shakespeare-char
iter 31500: loss 7.8264, time 2883.71ms
iter 31510: loss 7.9758, time 122.79ms
iter 31520: loss 7.6777, time 123.20ms
iter 31530: loss 9.2428, time 122.97ms
iter 31540: loss 7.7860, time 123.18ms
iter 31550: loss 7.9210, time 122.89ms
iter 31560: loss 6.3088, time 122.77ms
iter 31570: loss 8.3475, time 125.59ms
iter 31580: loss 8.2614, time 123.01ms
iter 31590: loss 8.1367, time 122.18ms
tensor(0.7679)
iter 31600: loss 7.7932, time 122.87ms
iter 31610: loss 7.3488, time 123.18ms
iter 31620: loss 8.4959, time 123.98ms
iter 31630: loss 7.1351, time 122.80ms
iter 31640: loss 8.2995, time 121.74ms
iter 31650: loss 7.2188, time 122.75ms
iter 31660: loss 8.0464, time 122.92ms
iter 31670: loss 7.8331, time 125.01ms
iter 31680: loss 8.7856, time 122.62ms
iter 31690: loss 7.9167, time 122.22ms
tensor(0.7409)
iter 31700: loss 8.7900, time 122.63ms
iter 31710: loss 8.0194, time 123.15ms
iter 31720: loss 7.6202, time 122.74ms
iter 31730: loss 7.1120, time 122.67ms
iter 31740: loss 7.9555, time 125.68ms
step 31750: train loss 6.6527, val loss 6.6234
saving checkpoint to out-shakespeare-char
iter 31750: loss 7.0700, time 2855.27ms
iter 31760: loss 7.5724, time 119.30ms
iter 31770: loss 7.9642, time 121.81ms
iter 31780: loss 7.0893, time 119.56ms
iter 31790: loss 7.5536, time 119.51ms
tensor(0.7129)
iter 31800: loss 7.9634, time 119.72ms
iter 31810: loss 8.0716, time 120.78ms
iter 31820: loss 7.2115, time 120.70ms
iter 31830: loss 7.9032, time 120.57ms
iter 31840: loss 7.7673, time 121.03ms
iter 31850: loss 7.7809, time 118.93ms
iter 31860: loss 8.5053, time 120.69ms
iter 31870: loss 7.8664, time 119.45ms
iter 31880: loss 7.7723, time 119.28ms
iter 31890: loss 8.1285, time 120.11ms
tensor(0.6841)
iter 31900: loss 7.4585, time 118.52ms
iter 31910: loss 7.6537, time 120.60ms
iter 31920: loss 7.6919, time 120.78ms
iter 31930: loss 7.1781, time 119.26ms
iter 31940: loss 6.9158, time 120.26ms
iter 31950: loss 7.5650, time 120.73ms
iter 31960: loss 7.8658, time 120.38ms
iter 31970: loss 8.1634, time 119.25ms
iter 31980: loss 7.4286, time 120.50ms
iter 31990: loss 8.1572, time 119.62ms
tensor(0.6545)
step 32000: train loss 6.6701, val loss 6.5935
saving checkpoint to out-shakespeare-char
iter 32000: loss 6.7680, time 2865.19ms
iter 32010: loss 8.3851, time 120.62ms
iter 32020: loss 8.2863, time 119.27ms
iter 32030: loss 8.1779, time 119.78ms
iter 32040: loss 8.0509, time 119.59ms
iter 32050: loss 8.4121, time 119.52ms
iter 32060: loss 7.8656, time 120.46ms
iter 32070: loss 7.6819, time 120.26ms
iter 32080: loss 7.9942, time 119.38ms
iter 32090: loss 8.0469, time 119.56ms
tensor(0.6243)
iter 32100: loss 9.3582, time 119.61ms
iter 32110: loss 7.7961, time 120.40ms
iter 32120: loss 7.6701, time 119.04ms
iter 32130: loss 8.5408, time 120.39ms
iter 32140: loss 7.6992, time 120.69ms
iter 32150: loss 7.8298, time 120.23ms
iter 32160: loss 8.0838, time 122.43ms
iter 32170: loss 8.5326, time 119.36ms
iter 32180: loss 8.5584, time 119.28ms
iter 32190: loss 7.9061, time 119.46ms
tensor(0.5937)
iter 32200: loss 8.0103, time 120.40ms
iter 32210: loss 8.6583, time 120.15ms
iter 32220: loss 8.0445, time 120.55ms
iter 32230: loss 8.6798, time 120.59ms
iter 32240: loss 7.8086, time 119.17ms
step 32250: train loss 6.5952, val loss 6.6056
saving checkpoint to out-shakespeare-char
iter 32250: loss 7.2187, time 2850.92ms
iter 32260: loss 7.9769, time 122.88ms
iter 32270: loss 8.0871, time 123.04ms
iter 32280: loss 7.4134, time 125.68ms
iter 32290: loss 8.1640, time 122.75ms
tensor(0.5627)
iter 32300: loss 7.3945, time 122.95ms
iter 32310: loss 7.7847, time 122.69ms
iter 32320: loss 7.1015, time 122.79ms
iter 32330: loss 8.4074, time 122.82ms
iter 32340: loss 6.8751, time 122.40ms
iter 32350: loss 7.8985, time 121.97ms
iter 32360: loss 7.8940, time 125.54ms
iter 32370: loss 7.2815, time 123.14ms
iter 32380: loss 8.3625, time 122.80ms
iter 32390: loss 8.2321, time 122.76ms
tensor(0.5314)
iter 32400: loss 7.8191, time 122.92ms
iter 32410: loss 6.9170, time 122.65ms
iter 32420: loss 7.5765, time 122.82ms
iter 32430: loss 8.0622, time 125.86ms
iter 32440: loss 7.9063, time 122.74ms
iter 32450: loss 8.2060, time 122.74ms
iter 32460: loss 7.6109, time 122.89ms
iter 32470: loss 7.2028, time 123.21ms
iter 32480: loss 7.2645, time 122.88ms
iter 32490: loss 8.0181, time 122.67ms
tensor(0.5000)
step 32500: train loss 6.6520, val loss 6.6566
saving checkpoint to out-shakespeare-char
iter 32500: loss 7.9069, time 2862.96ms
iter 32510: loss 7.9946, time 122.69ms
iter 32520: loss 7.5466, time 123.17ms
iter 32530: loss 7.7352, time 126.03ms
iter 32540: loss 8.1279, time 122.22ms
iter 32550: loss 9.0268, time 122.03ms
iter 32560: loss 7.3490, time 122.75ms
iter 32570: loss 8.0766, time 122.86ms
iter 32580: loss 7.9777, time 123.14ms
iter 32590: loss 6.6974, time 122.94ms
tensor(0.4686)
iter 32600: loss 7.6297, time 123.21ms
iter 32610: loss 7.9461, time 122.39ms
iter 32620: loss 8.2364, time 122.90ms
iter 32630: loss 8.0887, time 122.72ms
iter 32640: loss 7.5610, time 122.95ms
iter 32650: loss 6.5214, time 122.28ms
iter 32660: loss 8.2677, time 122.85ms
iter 32670: loss 8.9541, time 126.06ms
iter 32680: loss 7.8269, time 122.89ms
iter 32690: loss 7.0904, time 122.96ms
tensor(0.4373)
iter 32700: loss 8.1114, time 122.98ms
iter 32710: loss 7.7972, time 122.95ms
iter 32720: loss 7.8609, time 123.29ms
iter 32730: loss 8.1668, time 123.29ms
iter 32740: loss 8.1732, time 123.66ms
step 32750: train loss 6.5301, val loss 6.5601
saving checkpoint to out-shakespeare-char
iter 32750: loss 7.2634, time 2868.31ms
iter 32760: loss 7.4520, time 122.97ms
iter 32770: loss 8.2782, time 123.58ms
iter 32780: loss 8.1318, time 122.04ms
iter 32790: loss 7.9406, time 123.03ms
tensor(0.4063)
iter 32800: loss 8.2445, time 122.72ms
iter 32810: loss 7.6843, time 126.78ms
iter 32820: loss 8.2806, time 122.87ms
iter 32830: loss 8.3164, time 122.17ms
iter 32840: loss 9.0973, time 122.41ms
iter 32850: loss 7.9933, time 122.90ms
iter 32860: loss 7.6847, time 122.92ms
iter 32870: loss 8.5036, time 122.96ms
iter 32880: loss 8.1529, time 123.29ms
iter 32890: loss 8.2328, time 126.16ms
tensor(0.3757)
iter 32900: loss 8.4787, time 122.79ms
iter 32910: loss 8.1930, time 122.48ms
iter 32920: loss 7.8862, time 122.82ms
iter 32930: loss 8.2106, time 123.31ms
iter 32940: loss 8.6065, time 122.73ms
iter 32950: loss 7.8015, time 122.75ms
iter 32960: loss 7.7386, time 125.77ms
iter 32970: loss 7.4862, time 122.66ms
iter 32980: loss 7.8414, time 122.31ms
iter 32990: loss 7.2963, time 121.86ms
tensor(0.3455)
step 33000: train loss 6.5489, val loss 6.5576
saving checkpoint to out-shakespeare-char
iter 33000: loss 7.6206, time 2862.01ms
iter 33010: loss 8.1167, time 122.67ms
iter 33020: loss 7.4945, time 122.39ms
iter 33030: loss 8.0732, time 122.55ms
iter 33040: loss 8.4517, time 121.65ms
iter 33050: loss 7.8908, time 122.27ms
iter 33060: loss 8.1979, time 122.81ms
iter 33070: loss 8.1070, time 122.61ms
iter 33080: loss 7.9271, time 121.97ms
iter 33090: loss 8.0067, time 126.74ms
tensor(0.3159)
iter 33100: loss 7.3834, time 121.45ms
iter 33110: loss 8.0127, time 122.66ms
iter 33120: loss 7.6318, time 122.59ms
iter 33130: loss 6.8383, time 123.01ms
iter 33140: loss 8.2812, time 123.75ms
iter 33150: loss 7.8775, time 122.51ms
iter 33160: loss 7.8181, time 122.53ms
iter 33170: loss 8.0159, time 122.74ms
iter 33180: loss 8.1793, time 122.80ms
iter 33190: loss 7.6037, time 121.46ms
tensor(0.2871)
iter 33200: loss 7.8677, time 122.60ms
iter 33210: loss 7.5141, time 122.68ms
iter 33220: loss 7.9571, time 124.17ms
iter 33230: loss 7.7736, time 122.52ms
iter 33240: loss 8.2037, time 122.75ms
step 33250: train loss 6.5928, val loss 6.5796
saving checkpoint to out-shakespeare-char
iter 33250: loss 7.4178, time 2882.46ms
iter 33260: loss 7.7160, time 122.43ms
iter 33270: loss 8.1192, time 122.41ms
iter 33280: loss 7.6857, time 122.62ms
iter 33290: loss 7.7874, time 122.67ms
tensor(0.2591)
iter 33300: loss 8.0047, time 125.84ms
iter 33310: loss 7.9288, time 122.51ms
iter 33320: loss 7.7785, time 119.87ms
iter 33330: loss 7.2681, time 120.79ms
iter 33340: loss 7.3807, time 120.65ms
iter 33350: loss 7.6887, time 120.71ms
iter 33360: loss 7.9647, time 119.76ms
iter 33370: loss 7.9239, time 119.14ms
iter 33380: loss 8.0744, time 120.79ms
iter 33390: loss 7.2598, time 119.52ms
tensor(0.2321)
iter 33400: loss 7.7660, time 119.70ms
iter 33410: loss 7.4054, time 118.54ms
iter 33420: loss 7.5440, time 120.88ms
iter 33430: loss 7.5667, time 119.63ms
iter 33440: loss 9.3363, time 120.49ms
iter 33450: loss 8.2727, time 119.16ms
iter 33460: loss 7.9492, time 118.72ms
iter 33470: loss 7.8730, time 119.52ms
iter 33480: loss 7.9301, time 120.55ms
iter 33490: loss 7.5733, time 120.61ms
tensor(0.2061)
step 33500: train loss 6.5357, val loss 6.5202
saving checkpoint to out-shakespeare-char
iter 33500: loss 7.2899, time 2875.87ms
iter 33510: loss 7.8369, time 120.71ms
iter 33520: loss 7.6609, time 119.49ms
iter 33530: loss 7.4377, time 121.30ms
iter 33540: loss 8.0353, time 120.64ms
iter 33550: loss 7.3880, time 119.46ms
iter 33560: loss 7.7308, time 120.66ms
iter 33570: loss 8.2628, time 122.51ms
iter 33580: loss 7.4847, time 120.58ms
iter 33590: loss 7.8036, time 119.42ms
tensor(0.1813)
iter 33600: loss 7.6293, time 119.53ms
iter 33610: loss 7.4158, time 120.43ms
iter 33620: loss 7.6595, time 119.24ms
iter 33630: loss 8.1500, time 122.29ms
iter 33640: loss 8.0185, time 119.15ms
iter 33650: loss 7.3836, time 122.60ms
iter 33660: loss 7.9627, time 119.31ms
iter 33670: loss 8.3675, time 120.60ms
iter 33680: loss 7.9364, time 120.82ms
iter 33690: loss 7.4552, time 118.61ms
tensor(0.1577)
iter 33700: loss 7.8873, time 118.61ms
iter 33710: loss 7.6978, time 122.46ms
iter 33720: loss 7.3016, time 120.52ms
iter 33730: loss 7.6928, time 123.07ms
iter 33740: loss 8.4128, time 119.38ms
step 33750: train loss 6.5803, val loss 6.5414
saving checkpoint to out-shakespeare-char
iter 33750: loss 7.8530, time 2879.98ms
iter 33760: loss 8.2692, time 118.43ms
iter 33770: loss 7.4423, time 119.61ms
iter 33780: loss 7.8381, time 119.20ms
iter 33790: loss 8.5582, time 119.29ms
tensor(0.1355)
iter 33800: loss 8.1481, time 119.32ms
iter 33810: loss 7.8356, time 120.32ms
iter 33820: loss 8.3321, time 119.36ms
iter 33830: loss 8.3441, time 119.67ms
iter 33840: loss 8.8165, time 119.84ms
iter 33850: loss 8.3377, time 120.04ms
iter 33860: loss 8.2873, time 120.78ms
iter 33870: loss 7.9831, time 119.20ms
iter 33880: loss 7.9145, time 120.10ms
iter 33890: loss 7.3431, time 119.71ms
tensor(0.1147)
iter 33900: loss 7.4566, time 120.48ms
iter 33910: loss 8.0188, time 119.51ms
iter 33920: loss 8.1876, time 120.98ms
iter 33930: loss 7.9261, time 119.32ms
iter 33940: loss 7.7384, time 119.41ms
iter 33950: loss 7.2880, time 120.58ms
iter 33960: loss 7.7412, time 119.64ms
iter 33970: loss 7.3964, time 120.37ms
iter 33980: loss 9.0398, time 120.51ms
iter 33990: loss 7.7423, time 122.20ms
tensor(0.0955)
step 34000: train loss 6.5335, val loss 6.4799
saving checkpoint to out-shakespeare-char
iter 34000: loss 7.2374, time 2877.24ms
iter 34010: loss 7.5187, time 119.51ms
iter 34020: loss 7.4613, time 118.98ms
iter 34030: loss 8.4905, time 119.47ms
iter 34040: loss 8.6381, time 119.17ms
iter 34050: loss 7.7380, time 121.06ms
iter 34060: loss 7.4998, time 119.23ms
iter 34070: loss 7.8623, time 119.10ms
iter 34080: loss 7.4402, time 120.64ms
iter 34090: loss 8.0196, time 119.44ms
tensor(0.0778)
iter 34100: loss 7.8990, time 120.82ms
iter 34110: loss 8.0572, time 120.61ms
iter 34120: loss 7.6332, time 120.79ms
iter 34130: loss 7.3408, time 120.82ms
iter 34140: loss 7.7455, time 120.48ms
iter 34150: loss 8.4829, time 120.67ms
iter 34160: loss 7.4810, time 120.54ms
iter 34170: loss 7.9798, time 120.75ms
iter 34180: loss 7.9865, time 120.69ms
iter 34190: loss 7.4909, time 121.68ms
tensor(0.0618)
iter 34200: loss 8.3768, time 119.23ms
iter 34210: loss 7.8638, time 119.28ms
iter 34220: loss 8.5354, time 120.34ms
iter 34230: loss 8.4054, time 119.36ms
iter 34240: loss 7.8028, time 119.37ms
step 34250: train loss 6.4900, val loss 6.5442
saving checkpoint to out-shakespeare-char
iter 34250: loss 7.1050, time 2872.27ms
iter 34260: loss 7.8843, time 120.79ms
iter 34270: loss 7.5623, time 119.11ms
iter 34280: loss 7.7792, time 119.45ms
iter 34290: loss 8.2236, time 120.48ms
tensor(0.0476)
iter 34300: loss 7.8760, time 119.53ms
iter 34310: loss 8.7502, time 119.08ms
iter 34320: loss 7.6412, time 120.80ms
iter 34330: loss 8.1522, time 119.36ms
iter 34340: loss 7.9053, time 120.44ms
iter 34350: loss 6.9964, time 119.27ms
iter 34360: loss 8.1552, time 119.58ms
iter 34370: loss 6.8104, time 119.36ms
iter 34380: loss 7.9577, time 118.99ms
iter 34390: loss 8.3196, time 119.40ms
tensor(0.0351)
iter 34400: loss 7.9855, time 121.88ms
iter 34410: loss 7.7520, time 119.34ms
iter 34420: loss 7.8356, time 119.30ms
iter 34430: loss 7.4294, time 119.39ms
iter 34440: loss 7.6752, time 119.31ms
iter 34450: loss 7.8341, time 120.63ms
iter 34460: loss 7.6108, time 119.36ms
iter 34470: loss 7.9521, time 120.15ms
iter 34480: loss 7.0258, time 119.51ms
iter 34490: loss 7.3624, time 119.41ms
tensor(0.0245)
step 34500: train loss 6.4881, val loss 6.5440
saving checkpoint to out-shakespeare-char
iter 34500: loss 7.7189, time 2863.70ms
iter 34510: loss 7.8632, time 120.95ms
iter 34520: loss 8.0101, time 120.70ms
iter 34530: loss 7.6388, time 118.48ms
iter 34540: loss 8.2020, time 119.48ms
iter 34550: loss 7.3858, time 120.38ms
iter 34560: loss 7.8587, time 120.59ms
iter 34570: loss 8.0331, time 120.35ms
iter 34580: loss 7.7646, time 119.34ms
iter 34590: loss 6.9746, time 117.18ms
tensor(0.0157)
iter 34600: loss 8.6578, time 119.41ms
iter 34610: loss 7.7005, time 122.09ms
iter 34620: loss 7.8072, time 118.67ms
iter 34630: loss 8.0974, time 118.60ms
iter 34640: loss 7.8288, time 119.20ms
iter 34650: loss 7.5679, time 119.64ms
iter 34660: loss 7.7083, time 120.12ms
iter 34670: loss 7.7715, time 119.25ms
iter 34680: loss 8.0019, time 120.95ms
iter 34690: loss 7.5611, time 119.26ms
tensor(0.0089)
iter 34700: loss 6.6349, time 119.54ms
iter 34710: loss 8.1941, time 120.20ms
iter 34720: loss 8.0207, time 120.27ms
iter 34730: loss 7.6172, time 120.18ms
iter 34740: loss 8.8723, time 119.34ms
step 34750: train loss 6.5271, val loss 6.5204
saving checkpoint to out-shakespeare-char
iter 34750: loss 8.0124, time 2879.71ms
iter 34760: loss 7.9665, time 121.34ms
iter 34770: loss 7.6580, time 119.32ms
iter 34780: loss 7.6887, time 120.85ms
iter 34790: loss 7.4572, time 120.91ms
tensor(0.0039)
iter 34800: loss 7.7445, time 119.28ms
iter 34810: loss 7.9388, time 119.73ms
iter 34820: loss 7.6742, time 120.58ms
iter 34830: loss 8.3086, time 121.05ms
iter 34840: loss 7.5514, time 120.57ms
iter 34850: loss 9.2001, time 122.27ms
iter 34860: loss 7.6253, time 119.18ms
iter 34870: loss 7.1511, time 121.77ms
iter 34880: loss 7.1429, time 119.30ms
iter 34890: loss 9.1548, time 122.47ms
tensor(0.0010)
iter 34900: loss 7.7128, time 119.29ms
iter 34910: loss 8.4490, time 119.78ms
iter 34920: loss 8.3318, time 119.55ms
iter 34930: loss 7.8781, time 120.47ms
iter 34940: loss 8.8692, time 120.59ms
iter 34950: loss 8.0210, time 119.96ms
iter 34960: loss 7.1215, time 120.55ms
iter 34970: loss 7.3858, time 120.64ms
iter 34980: loss 7.8951, time 122.11ms
iter 34990: loss 7.7257, time 118.88ms
tensor(0.0010)
step 35000: train loss 6.5164, val loss 6.5042
saving checkpoint to out-shakespeare-char
iter 35000: loss 7.5568, time 2875.59ms
iter 35010: loss 8.1096, time 118.81ms
iter 35020: loss 7.9908, time 120.45ms
iter 35030: loss 8.0397, time 120.78ms
iter 35040: loss 8.4427, time 122.02ms
iter 35050: loss 7.8889, time 119.20ms
iter 35060: loss 8.2315, time 120.23ms
iter 35070: loss 8.4442, time 119.23ms
iter 35080: loss 8.5334, time 119.63ms
iter 35090: loss 6.8872, time 119.40ms
tensor(0.0010)
iter 35100: loss 7.9926, time 119.37ms
iter 35110: loss 7.6943, time 119.55ms
iter 35120: loss 8.1296, time 120.36ms
iter 35130: loss 7.8429, time 119.60ms
iter 35140: loss 7.6705, time 120.57ms
iter 35150: loss 7.7143, time 120.92ms
iter 35160: loss 7.6558, time 120.51ms
iter 35170: loss 8.8292, time 120.78ms
iter 35180: loss 7.3676, time 121.38ms
iter 35190: loss 8.6758, time 120.57ms
tensor(0.0039)
iter 35200: loss 8.0825, time 120.42ms
iter 35210: loss 7.6481, time 121.21ms
iter 35220: loss 7.3465, time 119.54ms
iter 35230: loss 7.2654, time 119.64ms
iter 35240: loss 7.4601, time 118.83ms
step 35250: train loss 6.5134, val loss 6.5535
saving checkpoint to out-shakespeare-char
iter 35250: loss 8.4090, time 2878.28ms
iter 35260: loss 7.9212, time 119.18ms
iter 35270: loss 7.6150, time 119.36ms
iter 35280: loss 7.4272, time 121.17ms
iter 35290: loss 8.4042, time 120.68ms
tensor(0.0089)
iter 35300: loss 7.6634, time 120.99ms
iter 35310: loss 7.2763, time 119.27ms
iter 35320: loss 8.1884, time 120.61ms
iter 35330: loss 8.5421, time 120.58ms
iter 35340: loss 7.5856, time 120.50ms
iter 35350: loss 7.5205, time 119.30ms
iter 35360: loss 8.3517, time 120.56ms
iter 35370: loss 7.1784, time 119.83ms
iter 35380: loss 7.8461, time 119.73ms
iter 35390: loss 8.9045, time 120.40ms
tensor(0.0157)
iter 35400: loss 8.2339, time 119.18ms
iter 35410: loss 8.2641, time 119.40ms
iter 35420: loss 7.8093, time 119.12ms
iter 35430: loss 7.5934, time 120.54ms
iter 35440: loss 6.6975, time 120.63ms
iter 35450: loss 8.6250, time 120.56ms
iter 35460: loss 7.7813, time 119.30ms
iter 35470: loss 8.1243, time 120.96ms
iter 35480: loss 7.8349, time 119.43ms
iter 35490: loss 7.2376, time 119.53ms
tensor(0.0245)
step 35500: train loss 6.5034, val loss 6.4969
saving checkpoint to out-shakespeare-char
iter 35500: loss 6.9959, time 2871.45ms
iter 35510: loss 7.9842, time 119.39ms
iter 35520: loss 7.6274, time 120.37ms
iter 35530: loss 8.1663, time 119.19ms
iter 35540: loss 8.5515, time 120.24ms
iter 35550: loss 7.4511, time 120.36ms
iter 35560: loss 8.1544, time 122.03ms
iter 35570: loss 7.9267, time 119.35ms
iter 35580: loss 7.8668, time 120.23ms
iter 35590: loss 7.6816, time 119.33ms
tensor(0.0351)
iter 35600: loss 8.2947, time 119.53ms
iter 35610: loss 7.3530, time 120.72ms
iter 35620: loss 6.9188, time 119.33ms
iter 35630: loss 8.9208, time 120.48ms
iter 35640: loss 7.7667, time 119.37ms
iter 35650: loss 8.3053, time 120.60ms
iter 35660: loss 8.0084, time 120.55ms
iter 35670: loss 7.8062, time 119.37ms
iter 35680: loss 7.6315, time 119.31ms
iter 35690: loss 8.0889, time 119.41ms
tensor(0.0476)
iter 35700: loss 7.9028, time 119.39ms
iter 35710: loss 7.4246, time 119.49ms
iter 35720: loss 7.9123, time 119.19ms
iter 35730: loss 7.2856, time 120.79ms
iter 35740: loss 7.9236, time 119.15ms
step 35750: train loss 6.5455, val loss 6.5015
saving checkpoint to out-shakespeare-char
iter 35750: loss 7.9086, time 2874.54ms
iter 35760: loss 7.6628, time 120.12ms
iter 35770: loss 8.4790, time 120.59ms
iter 35780: loss 8.3775, time 120.62ms
iter 35790: loss 7.9045, time 119.37ms
tensor(0.0618)
iter 35800: loss 7.6872, time 120.60ms
iter 35810: loss 7.8491, time 120.17ms
iter 35820: loss 9.0545, time 119.23ms
iter 35830: loss 7.1572, time 119.40ms
iter 35840: loss 7.8616, time 120.51ms
iter 35850: loss 8.0854, time 120.48ms
iter 35860: loss 8.0402, time 120.14ms
iter 35870: loss 8.3944, time 119.48ms
iter 35880: loss 7.8277, time 119.35ms
iter 35890: loss 7.2325, time 119.40ms
tensor(0.0778)
iter 35900: loss 7.3572, time 119.42ms
iter 35910: loss 8.1500, time 119.47ms
iter 35920: loss 7.9924, time 119.19ms
iter 35930: loss 8.3604, time 120.48ms
iter 35940: loss 7.9061, time 120.51ms
iter 35950: loss 7.9661, time 119.41ms
iter 35960: loss 8.2842, time 120.63ms
iter 35970: loss 7.1060, time 119.07ms
iter 35980: loss 7.9240, time 120.55ms
iter 35990: loss 7.7296, time 121.71ms
tensor(0.0955)
step 36000: train loss 6.5397, val loss 6.5037
saving checkpoint to out-shakespeare-char
iter 36000: loss 7.7542, time 2876.55ms
iter 36010: loss 7.4120, time 119.49ms
iter 36020: loss 8.6481, time 120.05ms
iter 36030: loss 8.3270, time 121.79ms
iter 36040: loss 8.0571, time 119.66ms
iter 36050: loss 7.7821, time 119.27ms
iter 36060: loss 8.3619, time 120.71ms
iter 36070: loss 7.8269, time 119.27ms
iter 36080: loss 8.2539, time 119.36ms
iter 36090: loss 7.5701, time 119.60ms
tensor(0.1147)
iter 36100: loss 8.6471, time 120.91ms
iter 36110: loss 7.8324, time 119.19ms
iter 36120: loss 8.2199, time 118.79ms
iter 36130: loss 7.8395, time 120.66ms
iter 36140: loss 7.7535, time 120.35ms
iter 36150: loss 8.1137, time 120.81ms
iter 36160: loss 7.5584, time 122.26ms
iter 36170: loss 7.0290, time 118.67ms
iter 36180: loss 8.0161, time 119.71ms
iter 36190: loss 7.9970, time 119.30ms
tensor(0.1355)
iter 36200: loss 8.1563, time 120.81ms
iter 36210: loss 7.9845, time 120.72ms
iter 36220: loss 8.3443, time 119.62ms
iter 36230: loss 7.5623, time 119.45ms
iter 36240: loss 8.0944, time 119.29ms
step 36250: train loss 6.5263, val loss 6.4926
saving checkpoint to out-shakespeare-char
iter 36250: loss 8.6000, time 2873.78ms
iter 36260: loss 8.3322, time 119.32ms
iter 36270: loss 8.2398, time 120.66ms
iter 36280: loss 7.5838, time 119.35ms
iter 36290: loss 8.3918, time 119.27ms
tensor(0.1577)
iter 36300: loss 7.2196, time 119.38ms
iter 36310: loss 6.9518, time 120.27ms
iter 36320: loss 7.1979, time 119.38ms
iter 36330: loss 7.9068, time 120.99ms
iter 36340: loss 7.1511, time 119.48ms
iter 36350: loss 7.4371, time 120.70ms
iter 36360: loss 7.4001, time 119.38ms
iter 36370: loss 7.6251, time 120.60ms
iter 36380: loss 7.5962, time 120.53ms
iter 36390: loss 7.8194, time 119.41ms
tensor(0.1813)
iter 36400: loss 7.5812, time 119.45ms
iter 36410: loss 7.7157, time 120.80ms
iter 36420: loss 8.7403, time 119.57ms
iter 36430: loss 7.3939, time 119.30ms
iter 36440: loss 8.1467, time 119.53ms
iter 36450: loss 7.4346, time 120.47ms
iter 36460: loss 8.3151, time 119.52ms
iter 36470: loss 6.9564, time 119.42ms
iter 36480: loss 7.3489, time 119.44ms
iter 36490: loss 7.7785, time 121.75ms
tensor(0.2061)
step 36500: train loss 6.4853, val loss 6.5219
saving checkpoint to out-shakespeare-char
iter 36500: loss 8.3711, time 2885.58ms
iter 36510: loss 8.0587, time 120.72ms
iter 36520: loss 7.8694, time 119.89ms
iter 36530: loss 8.2154, time 119.47ms
iter 36540: loss 8.1543, time 120.65ms
iter 36550: loss 8.2326, time 119.42ms
iter 36560: loss 8.1201, time 119.30ms
iter 36570: loss 7.9669, time 119.55ms
iter 36580: loss 7.9752, time 120.58ms
iter 36590: loss 8.5766, time 120.98ms
tensor(0.2321)
iter 36600: loss 7.7274, time 120.65ms
iter 36610: loss 8.6650, time 120.66ms
iter 36620: loss 7.8203, time 118.65ms
iter 36630: loss 8.0049, time 121.02ms
iter 36640: loss 7.9199, time 120.60ms
iter 36650: loss 8.2622, time 120.70ms
iter 36660: loss 7.5153, time 120.37ms
iter 36670: loss 7.7527, time 119.52ms
iter 36680: loss 8.6126, time 119.46ms
iter 36690: loss 8.0055, time 122.46ms
tensor(0.2591)
iter 36700: loss 8.2040, time 120.61ms
iter 36710: loss 8.5668, time 119.22ms
iter 36720: loss 7.1409, time 119.50ms
iter 36730: loss 8.5763, time 119.20ms
iter 36740: loss 7.5570, time 119.34ms
step 36750: train loss 6.5239, val loss 6.4896
saving checkpoint to out-shakespeare-char
iter 36750: loss 7.4333, time 2870.94ms
iter 36760: loss 7.9937, time 120.68ms
iter 36770: loss 7.6835, time 119.28ms
iter 36780: loss 8.3885, time 120.66ms
iter 36790: loss 7.7193, time 119.44ms
tensor(0.2871)
iter 36800: loss 6.5871, time 120.63ms
iter 36810: loss 7.5719, time 119.42ms
iter 36820: loss 8.3438, time 120.38ms
iter 36830: loss 8.2719, time 120.05ms
iter 36840: loss 7.2627, time 121.04ms
iter 36850: loss 8.5716, time 120.52ms
iter 36860: loss 8.0064, time 120.70ms
iter 36870: loss 8.5986, time 120.75ms
iter 36880: loss 7.9637, time 120.53ms
iter 36890: loss 7.9936, time 119.80ms
tensor(0.3159)
iter 36900: loss 7.2601, time 119.36ms
iter 36910: loss 7.6742, time 119.50ms
iter 36920: loss 7.4474, time 120.50ms
iter 36930: loss 7.9132, time 119.60ms
iter 36940: loss 7.6050, time 118.86ms
iter 36950: loss 7.5371, time 119.23ms
iter 36960: loss 8.3138, time 118.71ms
iter 36970: loss 8.4248, time 117.96ms
iter 36980: loss 8.0809, time 119.50ms
iter 36990: loss 7.9561, time 120.70ms
tensor(0.3455)
step 37000: train loss 6.4819, val loss 6.5400
saving checkpoint to out-shakespeare-char
iter 37000: loss 8.5045, time 2859.76ms
iter 37010: loss 7.8736, time 122.76ms
iter 37020: loss 8.2973, time 122.71ms
iter 37030: loss 6.9113, time 122.13ms
iter 37040: loss 8.2023, time 121.58ms
iter 37050: loss 7.5345, time 122.81ms
iter 37060: loss 7.5551, time 125.80ms
iter 37070: loss 8.0003, time 122.65ms
iter 37080: loss 7.7912, time 122.67ms
iter 37090: loss 7.3727, time 122.86ms
tensor(0.3757)
iter 37100: loss 8.4501, time 122.84ms
iter 37110: loss 8.3988, time 122.72ms
iter 37120: loss 7.3988, time 122.44ms
iter 37130: loss 7.6600, time 122.60ms
iter 37140: loss 7.4680, time 125.87ms
iter 37150: loss 7.3056, time 122.61ms
iter 37160: loss 7.6793, time 122.57ms
iter 37170: loss 8.6056, time 122.35ms
iter 37180: loss 8.1282, time 123.19ms
iter 37190: loss 8.4582, time 121.55ms
tensor(0.4063)
iter 37200: loss 7.5735, time 122.92ms
iter 37210: loss 7.3985, time 122.70ms
iter 37220: loss 7.5193, time 122.63ms
iter 37230: loss 8.7521, time 125.35ms
iter 37240: loss 7.7053, time 122.68ms
step 37250: train loss 6.5934, val loss 6.5493
saving checkpoint to out-shakespeare-char
iter 37250: loss 6.8345, time 2858.20ms
iter 37260: loss 7.3381, time 125.91ms
iter 37270: loss 7.1538, time 122.94ms
iter 37280: loss 7.0168, time 122.89ms
iter 37290: loss 8.1552, time 123.05ms
tensor(0.4373)
iter 37300: loss 8.6933, time 122.97ms
iter 37310: loss 7.4316, time 123.41ms
iter 37320: loss 8.1588, time 123.19ms
iter 37330: loss 7.0761, time 123.09ms
iter 37340: loss 7.4863, time 126.15ms
iter 37350: loss 7.6361, time 122.77ms
iter 37360: loss 7.9160, time 124.39ms
iter 37370: loss 8.3492, time 123.16ms
iter 37380: loss 8.3302, time 122.95ms
iter 37390: loss 8.2774, time 122.88ms
tensor(0.4686)
iter 37400: loss 8.4006, time 123.09ms
iter 37410: loss 7.8960, time 125.94ms
iter 37420: loss 8.4559, time 123.08ms
iter 37430: loss 7.6474, time 123.00ms
iter 37440: loss 7.8570, time 122.89ms
iter 37450: loss 8.3316, time 124.10ms
iter 37460: loss 8.0859, time 123.09ms
iter 37470: loss 8.7407, time 123.11ms
iter 37480: loss 7.9373, time 122.86ms
iter 37490: loss 7.3534, time 122.92ms
tensor(0.5000)
step 37500: train loss 6.5591, val loss 6.6187
saving checkpoint to out-shakespeare-char
iter 37500: loss 7.8554, time 2842.60ms
iter 37510: loss 8.5311, time 125.99ms
iter 37520: loss 7.9197, time 122.39ms
iter 37530: loss 7.8799, time 122.44ms
iter 37540: loss 7.6239, time 122.81ms
iter 37550: loss 7.5790, time 122.46ms
iter 37560: loss 7.2230, time 122.32ms
iter 37570: loss 8.3015, time 122.74ms
iter 37580: loss 7.8402, time 124.73ms
iter 37590: loss 6.9004, time 122.00ms
tensor(0.5314)
iter 37600: loss 7.9419, time 122.96ms
iter 37610: loss 7.0387, time 122.75ms
iter 37620: loss 7.3693, time 122.90ms
iter 37630: loss 7.7226, time 122.40ms
iter 37640: loss 7.3068, time 122.22ms
iter 37650: loss 8.2237, time 125.34ms
iter 37660: loss 7.9469, time 121.29ms
iter 37670: loss 7.6019, time 122.68ms
iter 37680: loss 8.1116, time 122.59ms
iter 37690: loss 7.9197, time 122.56ms
tensor(0.5627)
iter 37700: loss 7.3981, time 122.71ms
iter 37710: loss 7.4591, time 121.72ms
iter 37720: loss 7.8689, time 126.02ms
iter 37730: loss 7.3427, time 121.75ms
iter 37740: loss 7.1102, time 122.61ms
step 37750: train loss 6.5684, val loss 6.6026
saving checkpoint to out-shakespeare-char
iter 37750: loss 7.5863, time 2844.92ms
iter 37760: loss 8.0695, time 119.37ms
iter 37770: loss 7.9134, time 119.52ms
iter 37780: loss 8.6748, time 121.81ms
iter 37790: loss 7.7430, time 120.52ms
tensor(0.5937)
iter 37800: loss 7.9004, time 122.25ms
iter 37810: loss 8.2933, time 119.31ms
iter 37820: loss 7.1408, time 120.55ms
iter 37830: loss 8.0679, time 119.44ms
iter 37840: loss 7.4368, time 120.52ms
iter 37850: loss 7.6433, time 120.65ms
iter 37860: loss 7.9857, time 120.69ms
iter 37870: loss 7.1295, time 123.42ms
iter 37880: loss 8.2194, time 120.30ms
iter 37890: loss 7.9080, time 122.50ms
tensor(0.6243)
iter 37900: loss 8.1243, time 119.35ms
iter 37910: loss 7.9442, time 119.41ms
iter 37920: loss 7.5416, time 120.19ms
iter 37930: loss 7.9589, time 119.69ms
iter 37940: loss 7.5542, time 119.31ms
iter 37950: loss 8.0061, time 119.38ms
iter 37960: loss 7.2525, time 120.61ms
iter 37970: loss 8.0372, time 120.82ms
iter 37980: loss 7.9357, time 121.13ms
iter 37990: loss 7.0800, time 119.36ms
tensor(0.6545)
step 38000: train loss 6.6029, val loss 6.5490
saving checkpoint to out-shakespeare-char
iter 38000: loss 8.2143, time 2877.41ms
iter 38010: loss 8.8666, time 119.11ms
iter 38020: loss 8.7728, time 120.83ms
iter 38030: loss 8.1933, time 119.59ms
iter 38040: loss 7.5763, time 118.62ms
iter 38050: loss 7.8123, time 119.42ms
iter 38060: loss 7.3063, time 118.95ms
iter 38070: loss 7.4339, time 119.47ms
iter 38080: loss 7.2742, time 120.37ms
iter 38090: loss 6.8249, time 119.08ms
tensor(0.6841)
iter 38100: loss 7.8643, time 118.63ms
iter 38110: loss 7.9712, time 119.45ms
iter 38120: loss 7.8343, time 120.46ms
iter 38130: loss 7.7012, time 118.54ms
iter 38140: loss 8.4550, time 118.65ms
iter 38150: loss 8.2057, time 119.67ms
iter 38160: loss 8.0631, time 120.29ms
iter 38170: loss 8.1676, time 120.92ms
iter 38180: loss 7.5552, time 120.06ms
iter 38190: loss 8.4198, time 119.57ms
tensor(0.7129)
iter 38200: loss 7.7311, time 119.11ms
iter 38210: loss 8.5009, time 120.62ms
iter 38220: loss 8.5904, time 120.53ms
iter 38230: loss 7.7189, time 121.92ms
iter 38240: loss 8.1273, time 120.78ms
step 38250: train loss 6.6719, val loss 6.5691
saving checkpoint to out-shakespeare-char
iter 38250: loss 7.8601, time 2871.28ms
iter 38260: loss 7.6250, time 120.64ms
iter 38270: loss 7.7229, time 120.52ms
iter 38280: loss 7.7109, time 120.41ms
iter 38290: loss 7.3010, time 119.66ms
tensor(0.7409)
iter 38300: loss 7.7093, time 119.35ms
iter 38310: loss 7.8414, time 118.47ms
iter 38320: loss 8.9944, time 119.31ms
iter 38330: loss 8.2944, time 119.45ms
iter 38340: loss 8.1556, time 120.34ms
iter 38350: loss 8.3141, time 119.53ms
iter 38360: loss 7.2708, time 122.32ms
iter 38370: loss 7.8232, time 119.20ms
iter 38380: loss 8.2325, time 119.16ms
iter 38390: loss 6.7414, time 118.49ms
tensor(0.7679)
iter 38400: loss 7.6652, time 119.51ms
iter 38410: loss 8.2037, time 121.45ms
iter 38420: loss 7.8928, time 120.64ms
iter 38430: loss 7.7445, time 119.73ms
iter 38440: loss 7.4665, time 118.99ms
iter 38450: loss 7.8136, time 120.76ms
iter 38460: loss 8.1783, time 120.54ms
iter 38470: loss 8.1292, time 121.33ms
iter 38480: loss 8.2786, time 120.73ms
iter 38490: loss 8.0673, time 121.95ms
tensor(0.7939)
step 38500: train loss 6.5193, val loss 6.6640
saving checkpoint to out-shakespeare-char
iter 38500: loss 8.1377, time 2884.51ms
iter 38510: loss 7.8706, time 119.26ms
iter 38520: loss 8.0692, time 120.73ms
iter 38530: loss 7.5982, time 119.54ms
iter 38540: loss 7.5211, time 120.52ms
iter 38550: loss 7.2156, time 120.38ms
iter 38560: loss 8.2422, time 122.33ms
iter 38570: loss 7.7356, time 120.57ms
iter 38580: loss 8.4598, time 119.89ms
iter 38590: loss 8.4807, time 120.35ms
tensor(0.8187)
iter 38600: loss 8.7370, time 120.55ms
iter 38610: loss 7.5611, time 120.41ms
iter 38620: loss 7.5954, time 120.07ms
iter 38630: loss 7.9612, time 120.44ms
iter 38640: loss 7.8036, time 119.47ms
iter 38650: loss 7.5015, time 119.48ms
iter 38660: loss 7.1146, time 119.27ms
iter 38670: loss 7.0281, time 120.62ms
iter 38680: loss 8.2971, time 119.41ms
iter 38690: loss 9.0834, time 120.59ms
tensor(0.8423)
iter 38700: loss 7.9549, time 119.44ms
iter 38710: loss 7.6630, time 120.51ms
iter 38720: loss 8.0155, time 119.38ms
iter 38730: loss 7.5959, time 119.33ms
iter 38740: loss 7.8553, time 119.51ms
step 38750: train loss 6.5626, val loss 6.6624
saving checkpoint to out-shakespeare-char
iter 38750: loss 8.7919, time 2867.55ms
iter 38760: loss 7.5313, time 123.00ms
iter 38770: loss 7.1985, time 123.25ms
iter 38780: loss 7.7345, time 123.50ms
iter 38790: loss 8.4490, time 123.68ms
tensor(0.8645)
iter 38800: loss 7.9222, time 123.51ms
iter 38810: loss 8.5149, time 123.23ms
iter 38820: loss 7.6596, time 123.28ms
iter 38830: loss 8.8980, time 123.39ms
iter 38840: loss 8.2284, time 123.43ms
iter 38850: loss 8.7386, time 126.54ms
iter 38860: loss 8.5814, time 123.65ms
iter 38870: loss 8.2380, time 123.47ms
iter 38880: loss 6.9758, time 123.18ms
iter 38890: loss 8.2672, time 121.74ms
tensor(0.8853)
iter 38900: loss 7.6086, time 123.42ms
iter 38910: loss 7.5537, time 123.31ms
iter 38920: loss 7.0758, time 122.97ms
iter 38930: loss 7.7951, time 122.42ms
iter 38940: loss 8.8788, time 123.23ms
iter 38950: loss 7.9772, time 123.16ms
iter 38960: loss 8.1129, time 125.73ms
iter 38970: loss 7.3326, time 122.37ms
iter 38980: loss 7.2146, time 123.16ms
iter 38990: loss 8.1434, time 123.33ms
tensor(0.9045)
step 39000: train loss 6.6461, val loss 6.6467
saving checkpoint to out-shakespeare-char
iter 39000: loss 6.9796, time 2843.13ms
iter 39010: loss 8.6172, time 125.21ms
iter 39020: loss 7.7844, time 123.05ms
iter 39030: loss 8.1957, time 123.25ms
iter 39040: loss 7.7724, time 123.25ms
iter 39050: loss 8.4083, time 123.14ms
iter 39060: loss 8.1768, time 123.32ms
iter 39070: loss 7.5379, time 123.25ms
iter 39080: loss 8.4356, time 123.91ms
iter 39090: loss 7.9576, time 123.35ms
tensor(0.9222)
iter 39100: loss 8.2528, time 123.13ms
iter 39110: loss 7.6408, time 125.06ms
iter 39120: loss 8.0518, time 124.28ms
iter 39130: loss 7.7284, time 123.58ms
iter 39140: loss 7.9871, time 123.28ms
iter 39150: loss 7.8628, time 123.24ms
iter 39160: loss 7.7384, time 123.32ms
iter 39170: loss 8.3180, time 123.25ms
iter 39180: loss 8.7805, time 125.95ms
iter 39190: loss 7.4959, time 123.14ms
tensor(0.9382)
iter 39200: loss 6.9214, time 123.36ms
iter 39210: loss 7.3206, time 123.60ms
iter 39220: loss 8.1843, time 123.48ms
iter 39230: loss 7.7619, time 123.26ms
iter 39240: loss 7.4995, time 123.20ms
step 39250: train loss 6.6607, val loss 6.6006
saving checkpoint to out-shakespeare-char
iter 39250: loss 8.3836, time 2864.71ms
iter 39260: loss 8.1780, time 123.38ms
iter 39270: loss 8.6590, time 123.21ms
iter 39280: loss 7.2897, time 123.33ms
iter 39290: loss 7.9297, time 125.08ms
tensor(0.9524)
iter 39300: loss 7.1558, time 122.81ms
iter 39310: loss 8.1950, time 123.39ms
iter 39320: loss 8.2266, time 123.21ms
iter 39330: loss 7.0168, time 123.14ms
iter 39340: loss 6.9801, time 123.45ms
iter 39350: loss 8.4989, time 123.23ms
iter 39360: loss 7.6528, time 123.19ms
iter 39370: loss 7.7085, time 123.22ms
iter 39380: loss 7.6245, time 125.89ms
iter 39390: loss 8.0706, time 123.15ms
tensor(0.9649)
iter 39400: loss 8.5592, time 123.48ms
iter 39410: loss 8.5246, time 123.34ms
iter 39420: loss 8.0268, time 123.24ms
iter 39430: loss 8.0837, time 123.16ms
iter 39440: loss 7.9444, time 123.16ms
iter 39450: loss 7.1303, time 125.92ms
iter 39460: loss 7.9995, time 122.29ms
iter 39470: loss 7.2686, time 123.25ms
iter 39480: loss 7.2565, time 123.15ms
iter 39490: loss 7.7505, time 123.04ms
tensor(0.9755)
step 39500: train loss 6.6026, val loss 6.5864
saving checkpoint to out-shakespeare-char
iter 39500: loss 7.7418, time 2860.53ms
iter 39510: loss 8.6607, time 122.93ms
iter 39520: loss 8.9750, time 122.92ms
iter 39530: loss 7.6540, time 125.52ms
iter 39540: loss 7.5900, time 122.79ms
iter 39550: loss 8.3062, time 122.91ms
iter 39560: loss 8.0659, time 121.90ms
iter 39570: loss 8.3738, time 122.42ms
iter 39580: loss 7.8310, time 122.78ms
iter 39590: loss 8.2925, time 122.87ms
tensor(0.9843)
iter 39600: loss 9.3170, time 125.25ms
iter 39610: loss 7.5550, time 122.79ms
iter 39620: loss 8.4511, time 122.84ms
iter 39630: loss 7.3251, time 122.58ms
iter 39640: loss 8.0548, time 122.83ms
iter 39650: loss 8.1734, time 123.07ms
iter 39660: loss 8.4117, time 123.13ms
iter 39670: loss 8.1168, time 125.73ms
iter 39680: loss 8.0803, time 122.88ms
iter 39690: loss 9.2942, time 122.65ms
tensor(0.9911)
iter 39700: loss 8.7357, time 123.08ms
iter 39710: loss 7.2633, time 122.89ms
iter 39720: loss 7.4108, time 122.76ms
iter 39730: loss 7.0694, time 122.25ms
iter 39740: loss 7.6797, time 123.04ms
step 39750: train loss 6.6266, val loss 6.5681
saving checkpoint to out-shakespeare-char
iter 39750: loss 7.3501, time 2877.60ms
iter 39760: loss 7.7800, time 122.38ms
iter 39770: loss 7.6442, time 122.56ms
iter 39780: loss 8.1202, time 122.92ms
iter 39790: loss 8.4258, time 125.35ms
tensor(0.9961)
iter 39800: loss 7.4387, time 122.83ms
iter 39810: loss 7.9466, time 123.13ms
iter 39820: loss 7.3238, time 122.78ms
iter 39830: loss 7.1883, time 122.91ms
iter 39840: loss 8.0614, time 122.69ms
iter 39850: loss 6.8210, time 122.70ms
iter 39860: loss 7.5063, time 122.96ms
iter 39870: loss 7.7647, time 122.96ms
iter 39880: loss 7.9973, time 122.76ms
iter 39890: loss 7.9232, time 122.82ms
tensor(0.9990)
iter 39900: loss 7.3151, time 125.77ms
iter 39910: loss 8.1437, time 122.73ms
iter 39920: loss 8.2915, time 122.92ms
iter 39930: loss 7.6418, time 122.76ms
iter 39940: loss 8.5106, time 122.63ms
iter 39950: loss 8.1930, time 122.82ms
iter 39960: loss 7.4024, time 124.00ms
iter 39970: loss 7.8036, time 125.88ms
iter 39980: loss 8.0271, time 122.62ms
iter 39990: loss 8.3624, time 122.92ms
tensor(1.)
step 40000: train loss 6.6340, val loss 6.5994
saving checkpoint to out-shakespeare-char
iter 40000: loss 7.7718, time 2845.53ms
iter 40010: loss 7.3988, time 123.07ms
iter 40020: loss 8.4591, time 122.87ms
iter 40030: loss 7.9065, time 122.78ms
iter 40040: loss 7.9207, time 122.84ms
iter 40050: loss 7.7755, time 123.07ms
iter 40060: loss 7.6573, time 125.99ms
iter 40070: loss 7.6301, time 122.84ms
iter 40080: loss 7.7990, time 122.36ms
iter 40090: loss 8.5218, time 122.89ms
tensor(0.9990)
iter 40100: loss 8.3809, time 123.07ms
iter 40110: loss 8.0454, time 123.11ms
iter 40120: loss 8.0433, time 122.71ms
iter 40130: loss 7.8249, time 123.35ms
iter 40140: loss 7.6599, time 125.14ms
iter 40150: loss 6.7442, time 122.87ms
iter 40160: loss 8.6662, time 122.74ms
iter 40170: loss 8.1387, time 122.88ms
iter 40180: loss 8.0899, time 122.99ms
iter 40190: loss 7.9821, time 122.82ms
tensor(0.9961)
iter 40200: loss 7.9606, time 122.40ms
iter 40210: loss 8.4412, time 125.80ms
iter 40220: loss 7.6548, time 122.67ms
iter 40230: loss 8.7448, time 122.80ms
iter 40240: loss 8.8727, time 122.94ms
step 40250: train loss 6.5911, val loss 6.6052
saving checkpoint to out-shakespeare-char
iter 40250: loss 7.4160, time 2862.55ms
iter 40260: loss 8.0779, time 123.18ms
iter 40270: loss 7.3726, time 123.68ms
iter 40280: loss 7.4025, time 123.35ms
iter 40290: loss 8.6235, time 123.31ms
tensor(0.9911)
iter 40300: loss 7.4209, time 123.23ms
iter 40310: loss 6.6066, time 122.80ms
iter 40320: loss 7.4898, time 122.64ms
iter 40330: loss 7.2043, time 122.72ms
iter 40340: loss 7.0840, time 122.94ms
iter 40350: loss 7.5233, time 122.96ms
iter 40360: loss 8.4665, time 122.93ms
iter 40370: loss 7.6264, time 123.20ms
iter 40380: loss 7.9251, time 125.60ms
iter 40390: loss 8.4105, time 122.57ms
tensor(0.9843)
iter 40400: loss 7.2052, time 122.46ms
iter 40410: loss 7.2874, time 122.38ms
iter 40420: loss 8.1372, time 122.48ms
iter 40430: loss 7.7980, time 122.62ms
iter 40440: loss 8.0906, time 122.42ms
iter 40450: loss 7.5348, time 125.29ms
iter 40460: loss 8.0745, time 122.36ms
iter 40470: loss 7.5964, time 122.37ms
iter 40480: loss 7.5806, time 124.36ms
iter 40490: loss 7.6448, time 123.06ms
tensor(0.9755)
step 40500: train loss 6.5723, val loss 6.6034
saving checkpoint to out-shakespeare-char
iter 40500: loss 7.5859, time 2851.42ms
iter 40510: loss 8.1391, time 123.32ms
iter 40520: loss 7.4525, time 122.72ms
iter 40530: loss 8.2999, time 125.77ms
iter 40540: loss 8.1873, time 123.08ms
iter 40550: loss 7.4068, time 122.87ms
iter 40560: loss 6.9659, time 122.99ms
iter 40570: loss 8.3014, time 122.64ms
iter 40580: loss 7.4814, time 122.42ms
iter 40590: loss 7.8471, time 122.59ms
tensor(0.9649)
iter 40600: loss 7.9726, time 125.66ms
iter 40610: loss 8.2317, time 122.63ms
iter 40620: loss 7.7695, time 122.46ms
iter 40630: loss 8.4289, time 122.55ms
iter 40640: loss 8.3682, time 122.59ms
iter 40650: loss 7.9143, time 122.66ms
iter 40660: loss 7.3952, time 122.10ms
iter 40670: loss 7.4933, time 126.11ms
iter 40680: loss 7.9817, time 125.26ms
iter 40690: loss 7.6844, time 122.41ms
tensor(0.9524)
iter 40700: loss 7.2622, time 122.80ms
iter 40710: loss 8.1897, time 122.55ms
iter 40720: loss 8.0204, time 122.80ms
iter 40730: loss 7.8321, time 122.97ms
iter 40740: loss 7.1532, time 126.16ms
step 40750: train loss 6.6207, val loss 6.5511
saving checkpoint to out-shakespeare-char
iter 40750: loss 7.8112, time 2856.44ms
iter 40760: loss 7.8826, time 119.27ms
iter 40770: loss 7.5940, time 120.45ms
iter 40780: loss 6.3760, time 119.08ms
iter 40790: loss 7.1860, time 120.52ms
tensor(0.9382)
iter 40800: loss 7.1641, time 119.01ms
iter 40810: loss 8.0320, time 119.45ms
iter 40820: loss 8.5824, time 119.09ms
iter 40830: loss 7.1859, time 119.00ms
iter 40840: loss 7.4367, time 119.44ms
iter 40850: loss 8.2005, time 118.95ms
iter 40860: loss 8.5365, time 118.97ms
iter 40870: loss 7.8394, time 118.90ms
iter 40880: loss 7.6939, time 120.31ms
iter 40890: loss 7.3359, time 119.18ms
tensor(0.9222)
iter 40900: loss 7.8036, time 120.39ms
iter 40910: loss 7.7857, time 118.72ms
iter 40920: loss 7.8607, time 119.11ms
iter 40930: loss 8.4238, time 118.99ms
iter 40940: loss 7.7917, time 119.95ms
iter 40950: loss 8.5897, time 121.81ms
iter 40960: loss 7.9659, time 119.12ms
iter 40970: loss 8.1447, time 120.28ms
iter 40980: loss 7.8581, time 119.19ms
iter 40990: loss 8.2172, time 119.67ms
tensor(0.9045)
step 41000: train loss 6.5918, val loss 6.5730
saving checkpoint to out-shakespeare-char
iter 41000: loss 6.9755, time 2860.04ms
iter 41010: loss 7.1067, time 119.45ms
iter 41020: loss 8.5724, time 118.91ms
iter 41030: loss 7.0820, time 120.80ms
iter 41040: loss 7.6070, time 119.40ms
iter 41050: loss 7.7390, time 119.45ms
iter 41060: loss 7.9491, time 119.38ms
iter 41070: loss 8.0948, time 119.33ms
iter 41080: loss 7.5256, time 120.77ms
iter 41090: loss 7.6142, time 120.77ms
tensor(0.8853)
iter 41100: loss 7.6522, time 120.55ms
iter 41110: loss 8.2609, time 120.60ms
iter 41120: loss 6.6968, time 120.47ms
iter 41130: loss 7.4274, time 118.79ms
iter 41140: loss 7.5453, time 120.63ms
iter 41150: loss 8.2991, time 120.76ms
iter 41160: loss 7.8016, time 120.48ms
iter 41170: loss 8.0863, time 119.49ms
iter 41180: loss 7.1838, time 120.10ms
iter 41190: loss 8.3710, time 119.60ms
tensor(0.8645)
iter 41200: loss 7.0778, time 120.90ms
iter 41210: loss 7.4608, time 120.61ms
iter 41220: loss 7.8302, time 119.50ms
iter 41230: loss 7.6156, time 121.08ms
iter 41240: loss 8.0901, time 120.67ms
step 41250: train loss 6.5736, val loss 6.5889
saving checkpoint to out-shakespeare-char
iter 41250: loss 7.5937, time 2864.19ms
iter 41260: loss 8.2773, time 119.26ms
iter 41270: loss 8.7112, time 119.48ms
iter 41280: loss 7.4506, time 119.14ms
iter 41290: loss 7.2487, time 119.09ms
tensor(0.8423)
iter 41300: loss 6.1965, time 119.09ms
iter 41310: loss 7.4505, time 120.23ms
iter 41320: loss 7.8376, time 119.51ms
iter 41330: loss 7.5857, time 120.41ms
iter 41340: loss 8.2544, time 119.18ms
iter 41350: loss 7.3817, time 119.02ms
iter 41360: loss 8.0038, time 119.07ms
iter 41370: loss 7.8493, time 119.24ms
iter 41380: loss 8.2595, time 120.15ms
iter 41390: loss 8.2520, time 119.06ms
tensor(0.8187)
iter 41400: loss 7.8781, time 119.09ms
iter 41410: loss 8.0582, time 119.17ms
iter 41420: loss 7.6072, time 118.41ms
iter 41430: loss 8.0804, time 120.50ms
iter 41440: loss 7.9171, time 119.18ms
iter 41450: loss 7.8959, time 119.03ms
iter 41460: loss 7.7390, time 119.52ms
iter 41470: loss 7.7164, time 119.14ms
iter 41480: loss 7.7694, time 119.01ms
iter 41490: loss 7.7279, time 120.36ms
tensor(0.7939)
step 41500: train loss 6.5157, val loss 6.5390
saving checkpoint to out-shakespeare-char
iter 41500: loss 7.9379, time 2867.92ms
iter 41510: loss 8.1168, time 118.21ms
iter 41520: loss 7.5395, time 119.12ms
iter 41530: loss 7.9565, time 118.05ms
iter 41540: loss 7.9495, time 119.14ms
iter 41550: loss 7.1585, time 120.01ms
iter 41560: loss 6.7758, time 121.63ms
iter 41570: loss 7.9969, time 119.05ms
iter 41580: loss 8.2290, time 119.26ms
iter 41590: loss 7.0625, time 119.84ms
tensor(0.7679)
iter 41600: loss 8.0806, time 120.41ms
iter 41610: loss 8.3465, time 120.30ms
iter 41620: loss 7.0373, time 118.98ms
iter 41630: loss 7.4754, time 119.22ms
iter 41640: loss 7.5415, time 119.18ms
iter 41650: loss 7.2747, time 121.37ms
iter 41660: loss 7.9258, time 120.07ms
iter 41670: loss 7.4247, time 121.82ms
iter 41680: loss 7.5570, time 120.35ms
iter 41690: loss 7.3140, time 121.48ms
tensor(0.7409)
iter 41700: loss 7.6634, time 118.22ms
iter 41710: loss 8.1419, time 120.46ms
iter 41720: loss 7.4478, time 119.23ms
iter 41730: loss 7.5573, time 119.16ms
iter 41740: loss 7.3289, time 121.83ms
step 41750: train loss 6.4529, val loss 6.4952
saving checkpoint to out-shakespeare-char
iter 41750: loss 8.0452, time 2865.34ms
iter 41760: loss 7.6453, time 120.18ms
iter 41770: loss 7.0092, time 122.09ms
iter 41780: loss 8.1672, time 119.47ms
iter 41790: loss 7.3111, time 119.20ms
tensor(0.7129)
iter 41800: loss 7.9670, time 120.41ms
iter 41810: loss 8.3158, time 121.16ms
iter 41820: loss 7.4181, time 119.24ms
iter 41830: loss 6.9994, time 120.42ms
iter 41840: loss 7.8178, time 120.31ms
iter 41850: loss 7.9248, time 119.29ms
iter 41860: loss 7.2771, time 120.44ms
iter 41870: loss 8.3149, time 119.90ms
iter 41880: loss 7.0660, time 120.39ms
iter 41890: loss 7.3156, time 120.00ms
tensor(0.6841)
iter 41900: loss 7.5246, time 120.47ms
iter 41910: loss 7.6459, time 121.98ms
iter 41920: loss 8.3372, time 119.65ms
iter 41930: loss 7.5065, time 122.33ms
iter 41940: loss 7.9281, time 119.35ms
iter 41950: loss 8.3633, time 119.29ms
iter 41960: loss 7.4586, time 119.31ms
iter 41970: loss 8.1171, time 118.68ms
iter 41980: loss 7.8397, time 119.91ms
iter 41990: loss 8.7340, time 120.99ms
tensor(0.6545)
step 42000: train loss 6.5211, val loss 6.4675
saving checkpoint to out-shakespeare-char
iter 42000: loss 7.8452, time 2868.49ms
iter 42010: loss 7.0224, time 120.16ms
iter 42020: loss 8.3209, time 119.53ms
iter 42030: loss 7.4927, time 121.51ms
iter 42040: loss 6.7895, time 119.51ms
iter 42050: loss 7.5718, time 119.97ms
iter 42060: loss 7.6672, time 120.71ms
iter 42070: loss 7.0000, time 120.49ms
iter 42080: loss 7.8767, time 120.88ms
iter 42090: loss 7.6495, time 120.35ms
tensor(0.6243)
iter 42100: loss 8.1127, time 119.52ms
iter 42110: loss 7.7509, time 118.73ms
iter 42120: loss 7.3553, time 121.20ms
iter 42130: loss 7.6946, time 119.57ms
iter 42140: loss 7.5368, time 119.70ms
iter 42150: loss 7.3384, time 119.65ms
iter 42160: loss 7.2880, time 119.61ms
iter 42170: loss 8.4425, time 119.91ms
iter 42180: loss 7.1480, time 119.72ms
iter 42190: loss 7.6709, time 121.61ms
tensor(0.5937)
iter 42200: loss 7.2908, time 119.62ms
iter 42210: loss 7.7532, time 119.53ms
iter 42220: loss 7.7988, time 119.54ms
iter 42230: loss 7.4040, time 118.73ms
iter 42240: loss 7.0895, time 120.42ms
step 42250: train loss 6.4787, val loss 6.4853
saving checkpoint to out-shakespeare-char
iter 42250: loss 7.3210, time 2869.53ms
iter 42260: loss 8.0974, time 123.66ms
iter 42270: loss 7.4021, time 123.29ms
iter 42280: loss 7.2905, time 123.67ms
iter 42290: loss 6.9097, time 123.64ms
tensor(0.5627)
iter 42300: loss 7.5722, time 123.19ms
iter 42310: loss 6.7961, time 123.28ms
iter 42320: loss 6.5847, time 123.22ms
iter 42330: loss 7.2349, time 126.22ms
iter 42340: loss 7.8468, time 123.04ms
iter 42350: loss 6.8542, time 123.25ms
iter 42360: loss 7.5098, time 122.66ms
iter 42370: loss 7.5326, time 123.00ms
iter 42380: loss 8.2738, time 123.07ms
iter 42390: loss 6.7954, time 122.68ms
tensor(0.5314)
iter 42400: loss 8.7434, time 125.46ms
iter 42410: loss 7.4745, time 121.26ms
iter 42420: loss 7.3954, time 122.84ms
iter 42430: loss 7.2333, time 123.03ms
iter 42440: loss 7.2463, time 122.71ms
iter 42450: loss 7.7299, time 123.15ms
iter 42460: loss 7.6649, time 122.60ms
iter 42470: loss 7.2239, time 125.44ms
iter 42480: loss 7.5660, time 122.61ms
iter 42490: loss 6.8772, time 122.17ms
tensor(0.5000)
step 42500: train loss 6.4282, val loss 6.4533
saving checkpoint to out-shakespeare-char
iter 42500: loss 8.3179, time 2871.08ms
iter 42510: loss 7.2188, time 123.03ms
iter 42520: loss 7.9516, time 123.03ms
iter 42530: loss 7.6812, time 122.13ms
iter 42540: loss 8.9386, time 123.09ms
iter 42550: loss 7.3934, time 123.13ms
iter 42560: loss 7.4585, time 122.83ms
iter 42570: loss 7.2933, time 123.26ms
iter 42580: loss 8.0133, time 120.15ms
iter 42590: loss 7.6189, time 122.35ms
tensor(0.4686)
iter 42600: loss 7.1424, time 120.06ms
iter 42610: loss 7.9788, time 119.46ms
iter 42620: loss 7.5158, time 120.00ms
iter 42630: loss 7.6560, time 119.51ms
iter 42640: loss 8.2525, time 119.44ms
iter 42650: loss 7.9784, time 120.57ms
iter 42660: loss 7.4999, time 119.35ms
iter 42670: loss 8.4241, time 119.36ms
iter 42680: loss 7.8176, time 119.37ms
iter 42690: loss 7.8362, time 119.71ms
tensor(0.4373)
iter 42700: loss 7.8951, time 122.33ms
iter 42710: loss 8.3302, time 120.47ms
iter 42720: loss 7.4578, time 120.58ms
iter 42730: loss 7.8252, time 120.13ms
iter 42740: loss 8.5528, time 119.39ms
step 42750: train loss 6.3868, val loss 6.4264
saving checkpoint to out-shakespeare-char
iter 42750: loss 7.6578, time 2868.49ms
iter 42760: loss 8.0356, time 120.49ms
iter 42770: loss 7.6778, time 120.58ms
iter 42780: loss 7.7200, time 118.75ms
iter 42790: loss 8.2273, time 120.81ms
tensor(0.4063)
iter 42800: loss 7.4354, time 119.98ms
iter 42810: loss 7.5462, time 119.80ms
iter 42820: loss 8.6332, time 119.57ms
iter 42830: loss 7.7413, time 119.65ms
iter 42840: loss 7.4593, time 119.63ms
iter 42850: loss 7.6955, time 121.43ms
iter 42860: loss 6.7721, time 119.90ms
iter 42870: loss 7.5031, time 120.84ms
iter 42880: loss 8.0795, time 122.88ms
iter 42890: loss 7.7530, time 120.60ms
tensor(0.3757)
iter 42900: loss 7.6982, time 117.85ms
iter 42910: loss 7.5290, time 120.35ms
iter 42920: loss 7.4904, time 119.20ms
iter 42930: loss 8.3541, time 121.31ms
iter 42940: loss 7.1693, time 120.55ms
iter 42950: loss 7.0832, time 120.61ms
iter 42960: loss 8.2289, time 120.60ms
iter 42970: loss 7.5248, time 120.35ms
iter 42980: loss 7.7677, time 120.32ms
iter 42990: loss 8.0549, time 119.35ms
tensor(0.3455)
step 43000: train loss 6.3948, val loss 6.4219
saving checkpoint to out-shakespeare-char
iter 43000: loss 7.3681, time 2877.36ms
iter 43010: loss 7.3270, time 119.25ms
iter 43020: loss 8.1753, time 121.04ms
iter 43030: loss 7.6054, time 119.70ms
iter 43040: loss 7.3644, time 120.63ms
iter 43050: loss 7.7065, time 119.48ms
iter 43060: loss 8.1508, time 119.67ms
iter 43070: loss 7.2149, time 118.87ms
iter 43080: loss 7.1804, time 120.50ms
iter 43090: loss 8.4072, time 118.88ms
tensor(0.3159)
iter 43100: loss 6.9882, time 118.88ms
iter 43110: loss 7.1219, time 120.27ms
iter 43120: loss 7.9724, time 121.15ms
iter 43130: loss 7.8182, time 119.30ms
iter 43140: loss 7.6199, time 119.42ms
iter 43150: loss 7.7265, time 120.04ms
iter 43160: loss 8.3759, time 119.49ms
iter 43170: loss 7.2102, time 121.23ms
iter 43180: loss 7.4756, time 120.87ms
iter 43190: loss 7.3779, time 121.26ms
tensor(0.2871)
iter 43200: loss 7.4215, time 119.32ms
iter 43210: loss 7.7934, time 119.81ms
iter 43220: loss 7.5757, time 121.01ms
iter 43230: loss 6.9802, time 119.28ms
iter 43240: loss 7.7644, time 122.43ms
step 43250: train loss 6.3533, val loss 6.3815
saving checkpoint to out-shakespeare-char
iter 43250: loss 7.8921, time 2860.58ms
iter 43260: loss 7.3790, time 118.43ms
iter 43270: loss 7.2855, time 119.23ms
iter 43280: loss 7.7468, time 119.49ms
iter 43290: loss 7.8249, time 120.51ms
tensor(0.2591)
iter 43300: loss 7.8521, time 119.55ms
iter 43310: loss 7.2503, time 120.51ms
iter 43320: loss 6.8630, time 119.33ms
iter 43330: loss 6.7574, time 119.20ms
iter 43340: loss 8.1905, time 119.15ms
iter 43350: loss 8.0295, time 119.35ms
iter 43360: loss 7.7947, time 119.27ms
iter 43370: loss 7.1344, time 119.37ms
iter 43380: loss 8.2838, time 120.65ms
iter 43390: loss 7.7039, time 119.45ms
tensor(0.2321)
iter 43400: loss 8.3475, time 120.58ms
iter 43410: loss 7.3328, time 118.57ms
iter 43420: loss 7.5464, time 120.57ms
iter 43430: loss 7.6013, time 120.08ms
iter 43440: loss 6.8784, time 120.52ms
iter 43450: loss 7.0871, time 119.06ms
iter 43460: loss 7.1622, time 119.93ms
iter 43470: loss 6.5994, time 119.16ms
iter 43480: loss 8.1504, time 119.14ms
iter 43490: loss 7.5456, time 120.32ms
tensor(0.2061)
step 43500: train loss 6.3694, val loss 6.3324
saving checkpoint to out-shakespeare-char
iter 43500: loss 6.9912, time 2862.50ms
iter 43510: loss 7.2695, time 121.84ms
iter 43520: loss 7.6638, time 120.22ms
iter 43530: loss 7.4362, time 119.67ms
iter 43540: loss 7.0881, time 118.78ms
iter 43550: loss 7.8658, time 119.42ms
iter 43560: loss 7.6072, time 119.23ms
iter 43570: loss 7.6254, time 120.58ms
iter 43580: loss 8.2495, time 120.10ms
iter 43590: loss 7.6872, time 118.88ms
tensor(0.1813)
iter 43600: loss 7.5753, time 118.84ms
iter 43610: loss 7.9894, time 119.14ms
iter 43620: loss 7.2928, time 119.31ms
iter 43630: loss 8.3014, time 119.50ms
iter 43640: loss 8.3371, time 120.53ms
iter 43650: loss 7.5694, time 119.09ms
iter 43660: loss 8.1230, time 119.24ms
iter 43670: loss 7.8885, time 121.88ms
iter 43680: loss 7.4081, time 120.93ms
iter 43690: loss 7.4574, time 122.29ms
tensor(0.1577)
iter 43700: loss 7.4822, time 118.43ms
iter 43710: loss 7.2920, time 119.08ms
iter 43720: loss 7.7903, time 119.22ms
iter 43730: loss 7.8254, time 119.23ms
iter 43740: loss 6.8818, time 120.27ms
step 43750: train loss 6.4113, val loss 6.4180
saving checkpoint to out-shakespeare-char
iter 43750: loss 6.6772, time 2859.99ms
iter 43760: loss 6.8809, time 125.13ms
iter 43770: loss 8.3404, time 123.62ms
iter 43780: loss 7.1542, time 122.92ms
iter 43790: loss 7.5716, time 122.39ms
tensor(0.1355)
iter 43800: loss 7.4940, time 123.03ms
iter 43810: loss 6.5973, time 123.28ms
iter 43820: loss 7.8595, time 122.86ms
iter 43830: loss 6.8733, time 122.95ms
iter 43840: loss 7.1952, time 121.52ms
iter 43850: loss 7.6777, time 119.27ms
iter 43860: loss 7.5503, time 120.24ms
iter 43870: loss 8.1457, time 119.26ms
iter 43880: loss 7.5752, time 119.79ms
iter 43890: loss 8.3679, time 119.09ms
tensor(0.1147)
iter 43900: loss 8.2854, time 121.70ms
iter 43910: loss 7.8283, time 119.98ms
iter 43920: loss 7.6542, time 119.10ms
iter 43930: loss 7.1509, time 119.28ms
iter 43940: loss 7.5299, time 119.44ms
iter 43950: loss 7.9493, time 122.17ms
iter 43960: loss 7.7690, time 119.19ms
iter 43970: loss 7.9170, time 120.41ms
iter 43980: loss 7.1454, time 119.18ms
iter 43990: loss 8.3819, time 119.71ms
tensor(0.0955)
step 44000: train loss 6.3171, val loss 6.3566
saving checkpoint to out-shakespeare-char
iter 44000: loss 7.6291, time 2856.72ms
iter 44010: loss 7.3204, time 122.45ms
iter 44020: loss 7.6019, time 125.17ms
iter 44030: loss 7.1437, time 122.45ms
iter 44040: loss 8.2223, time 123.00ms
iter 44050: loss 7.5893, time 122.67ms
iter 44060: loss 7.6969, time 122.51ms
iter 44070: loss 7.3728, time 123.03ms
iter 44080: loss 6.7945, time 123.89ms
iter 44090: loss 6.8557, time 123.13ms
tensor(0.0778)
iter 44100: loss 7.6346, time 125.60ms
iter 44110: loss 7.6185, time 123.01ms
iter 44120: loss 7.0546, time 122.42ms
iter 44130: loss 7.1463, time 122.53ms
iter 44140: loss 7.5276, time 122.32ms
iter 44150: loss 7.6334, time 122.47ms
iter 44160: loss 7.9127, time 122.22ms
iter 44170: loss 8.4282, time 122.20ms
iter 44180: loss 7.1777, time 122.78ms
iter 44190: loss 7.2181, time 125.31ms
tensor(0.0618)
iter 44200: loss 8.4057, time 122.24ms
iter 44210: loss 7.7285, time 122.34ms
iter 44220: loss 7.5164, time 122.29ms
iter 44230: loss 7.2187, time 124.24ms
iter 44240: loss 7.0730, time 123.74ms
step 44250: train loss 6.3764, val loss 6.3070
saving checkpoint to out-shakespeare-char
iter 44250: loss 8.6133, time 2854.39ms
iter 44260: loss 7.4713, time 123.07ms
iter 44270: loss 7.5851, time 119.14ms
iter 44280: loss 7.2781, time 118.20ms
iter 44290: loss 7.5683, time 119.10ms
tensor(0.0476)
iter 44300: loss 7.4349, time 119.45ms
iter 44310: loss 7.0918, time 121.37ms
iter 44320: loss 7.4348, time 118.96ms
iter 44330: loss 7.1945, time 120.61ms
iter 44340: loss 8.0095, time 119.26ms
iter 44350: loss 7.8400, time 119.14ms
iter 44360: loss 7.9797, time 118.39ms
iter 44370: loss 7.6483, time 119.56ms
iter 44380: loss 8.0584, time 119.80ms
iter 44390: loss 7.6212, time 120.25ms
tensor(0.0351)
iter 44400: loss 7.2703, time 119.11ms
iter 44410: loss 7.3607, time 119.79ms
iter 44420: loss 8.1458, time 121.05ms
iter 44430: loss 7.8368, time 120.08ms
iter 44440: loss 7.5960, time 120.95ms
iter 44450: loss 7.5795, time 119.35ms
iter 44460: loss 6.9464, time 119.85ms
iter 44470: loss 7.4418, time 119.86ms
iter 44480: loss 8.0274, time 119.15ms
iter 44490: loss 7.8010, time 119.89ms
tensor(0.0245)
step 44500: train loss 6.2720, val loss 6.3326
saving checkpoint to out-shakespeare-char
iter 44500: loss 7.2200, time 2851.23ms
iter 44510: loss 6.8220, time 122.92ms
iter 44520: loss 7.6845, time 118.92ms
iter 44530: loss 8.2227, time 120.77ms
iter 44540: loss 7.9267, time 118.01ms
iter 44550: loss 8.1903, time 120.77ms
iter 44560: loss 8.1995, time 120.83ms
iter 44570: loss 7.7360, time 119.56ms
iter 44580: loss 7.9770, time 119.55ms
iter 44590: loss 8.1510, time 119.06ms
tensor(0.0157)
iter 44600: loss 7.2266, time 121.32ms
iter 44610: loss 7.1339, time 120.65ms
iter 44620: loss 8.0431, time 120.85ms
iter 44630: loss 7.5345, time 119.49ms
iter 44640: loss 8.2321, time 120.76ms
iter 44650: loss 7.3523, time 120.65ms
iter 44660: loss 7.1623, time 120.46ms
iter 44670: loss 7.5163, time 119.04ms
iter 44680: loss 7.6542, time 119.64ms
iter 44690: loss 8.1382, time 120.62ms
tensor(0.0089)
iter 44700: loss 6.6826, time 119.62ms
iter 44710: loss 7.8544, time 119.68ms
iter 44720: loss 7.9268, time 118.85ms
iter 44730: loss 7.6172, time 119.27ms
iter 44740: loss 8.1664, time 119.56ms
step 44750: train loss 6.2955, val loss 6.3262
saving checkpoint to out-shakespeare-char
iter 44750: loss 7.3596, time 2869.58ms
iter 44760: loss 7.8964, time 122.15ms
iter 44770: loss 8.1273, time 120.10ms
iter 44780: loss 8.1592, time 120.53ms
iter 44790: loss 7.2047, time 119.25ms
tensor(0.0039)
iter 44800: loss 7.2206, time 119.03ms
iter 44810: loss 7.1651, time 118.96ms
iter 44820: loss 7.9278, time 119.14ms
iter 44830: loss 7.7209, time 120.13ms
iter 44840: loss 7.2370, time 119.61ms
iter 44850: loss 7.2066, time 119.21ms
iter 44860: loss 7.5167, time 121.73ms
iter 44870: loss 7.1101, time 119.65ms
iter 44880: loss 6.8666, time 118.90ms
iter 44890: loss 7.1299, time 119.08ms
tensor(0.0010)
iter 44900: loss 7.1503, time 119.03ms
iter 44910: loss 6.3672, time 121.70ms
iter 44920: loss 7.8583, time 118.97ms
iter 44930: loss 7.6285, time 120.49ms
iter 44940: loss 7.4096, time 120.47ms
iter 44950: loss 7.1815, time 119.91ms
iter 44960: loss 7.8434, time 120.17ms
iter 44970: loss 7.4460, time 119.16ms
iter 44980: loss 7.8929, time 119.09ms
iter 44990: loss 7.0942, time 119.13ms
tensor(0.0010)
step 45000: train loss 6.2894, val loss 6.3294
saving checkpoint to out-shakespeare-char
iter 45000: loss 8.4172, time 2854.92ms
iter 45010: loss 7.1041, time 121.09ms
iter 45020: loss 7.9107, time 119.21ms
iter 45030: loss 7.1713, time 119.29ms
iter 45040: loss 8.0635, time 121.78ms
iter 45050: loss 6.4906, time 120.03ms
iter 45060: loss 7.6043, time 121.08ms
iter 45070: loss 7.6404, time 119.37ms
iter 45080: loss 6.8560, time 119.45ms
iter 45090: loss 7.9571, time 120.49ms
tensor(0.0010)
iter 45100: loss 7.1384, time 119.91ms
iter 45110: loss 7.9702, time 119.25ms
iter 45120: loss 6.7036, time 120.36ms
iter 45130: loss 7.0303, time 119.39ms
iter 45140: loss 7.1746, time 119.25ms
iter 45150: loss 7.5474, time 121.86ms
iter 45160: loss 7.5550, time 118.25ms
iter 45170: loss 7.6193, time 119.28ms
iter 45180: loss 8.2441, time 119.17ms
iter 45190: loss 6.9077, time 120.44ms
tensor(0.0039)
iter 45200: loss 7.0576, time 120.67ms
iter 45210: loss 8.1108, time 119.40ms
iter 45220: loss 7.3428, time 119.26ms
iter 45230: loss 7.4654, time 119.59ms
iter 45240: loss 7.7602, time 122.43ms
step 45250: train loss 6.3243, val loss 6.3346
saving checkpoint to out-shakespeare-char
iter 45250: loss 7.2808, time 2871.62ms
iter 45260: loss 7.4741, time 121.88ms
iter 45270: loss 7.5386, time 122.57ms
iter 45280: loss 8.0632, time 122.41ms
iter 45290: loss 7.6562, time 122.77ms
tensor(0.0089)
iter 45300: loss 7.6551, time 126.21ms
iter 45310: loss 6.5776, time 123.10ms
iter 45320: loss 7.1856, time 122.77ms
iter 45330: loss 8.2036, time 122.77ms
iter 45340: loss 7.3873, time 122.74ms
iter 45350: loss 8.1115, time 122.97ms
iter 45360: loss 7.8453, time 122.83ms
iter 45370: loss 7.9053, time 122.82ms
iter 45380: loss 8.3263, time 122.45ms
iter 45390: loss 7.0470, time 125.70ms
tensor(0.0157)
iter 45400: loss 7.8190, time 122.24ms
iter 45410: loss 7.3871, time 122.91ms
iter 45420: loss 7.2636, time 123.26ms
iter 45430: loss 7.8023, time 123.36ms
iter 45440: loss 7.5078, time 122.88ms
iter 45450: loss 8.3089, time 124.38ms
iter 45460: loss 8.0850, time 122.84ms
iter 45470: loss 8.4626, time 123.00ms
iter 45480: loss 7.7092, time 123.18ms
iter 45490: loss 8.1011, time 122.80ms
tensor(0.0245)
step 45500: train loss 6.3101, val loss 6.3403
saving checkpoint to out-shakespeare-char
iter 45500: loss 6.9588, time 2838.56ms
iter 45510: loss 7.0241, time 123.28ms
iter 45520: loss 8.1407, time 125.30ms
iter 45530: loss 7.3360, time 123.01ms
iter 45540: loss 7.7903, time 122.90ms
iter 45550: loss 8.4770, time 122.82ms
iter 45560: loss 7.6593, time 122.53ms
iter 45570: loss 7.5226, time 122.82ms
iter 45580: loss 7.5683, time 122.72ms
iter 45590: loss 6.9825, time 125.36ms
tensor(0.0351)
iter 45600: loss 8.2915, time 122.55ms
iter 45610: loss 7.5320, time 122.59ms
iter 45620: loss 8.0539, time 122.69ms
iter 45630: loss 7.2468, time 123.95ms
iter 45640: loss 6.8132, time 122.57ms
iter 45650: loss 7.7710, time 121.98ms
iter 45660: loss 8.3050, time 125.54ms
iter 45670: loss 7.6883, time 123.81ms
iter 45680: loss 7.8532, time 122.80ms
iter 45690: loss 8.1125, time 122.68ms
tensor(0.0476)
iter 45700: loss 8.0480, time 122.78ms
iter 45710: loss 6.9348, time 122.56ms
iter 45720: loss 8.2800, time 122.61ms
iter 45730: loss 6.7465, time 125.54ms
iter 45740: loss 7.3496, time 122.64ms
step 45750: train loss 6.3320, val loss 6.2752
saving checkpoint to out-shakespeare-char
iter 45750: loss 7.8359, time 2839.83ms
iter 45760: loss 6.7587, time 122.61ms
iter 45770: loss 7.2979, time 122.88ms
iter 45780: loss 7.8647, time 123.01ms
iter 45790: loss 7.6747, time 121.93ms
tensor(0.0618)
iter 45800: loss 7.6977, time 122.92ms
iter 45810: loss 7.3245, time 125.59ms
iter 45820: loss 7.3390, time 122.82ms
iter 45830: loss 7.2368, time 122.08ms
iter 45840: loss 7.2103, time 122.78ms
iter 45850: loss 7.8789, time 123.18ms
iter 45860: loss 7.6082, time 122.81ms
iter 45870: loss 7.7360, time 122.78ms
iter 45880: loss 6.9204, time 124.76ms
iter 45890: loss 7.8531, time 122.76ms
tensor(0.0778)
iter 45900: loss 7.2961, time 122.83ms
iter 45910: loss 7.3094, time 122.74ms
iter 45920: loss 7.7142, time 123.06ms
iter 45930: loss 7.4392, time 122.69ms
iter 45940: loss 7.3832, time 122.72ms
iter 45950: loss 7.9518, time 125.68ms
iter 45960: loss 7.0436, time 123.07ms
iter 45970: loss 7.8077, time 122.66ms
iter 45980: loss 7.3922, time 122.64ms
iter 45990: loss 8.3591, time 122.81ms
tensor(0.0955)
step 46000: train loss 6.3150, val loss 6.3900
saving checkpoint to out-shakespeare-char
iter 46000: loss 7.9377, time 2841.53ms
iter 46010: loss 8.3216, time 119.51ms
iter 46020: loss 7.6051, time 120.75ms
iter 46030: loss 7.3034, time 120.16ms
iter 46040: loss 8.0173, time 119.23ms
iter 46050: loss 7.7719, time 119.60ms
iter 46060: loss 7.9630, time 122.34ms
iter 46070: loss 7.8643, time 119.57ms
iter 46080: loss 7.1019, time 122.28ms
iter 46090: loss 6.6600, time 119.44ms
tensor(0.1147)
iter 46100: loss 7.1185, time 122.52ms
iter 46110: loss 6.9758, time 119.60ms
iter 46120: loss 7.7402, time 119.53ms
iter 46130: loss 7.2546, time 120.68ms
iter 46140: loss 7.8527, time 119.54ms
iter 46150: loss 7.6777, time 120.47ms
iter 46160: loss 7.1319, time 120.72ms
iter 46170: loss 7.6577, time 120.59ms
iter 46180: loss 7.7537, time 119.91ms
iter 46190: loss 7.5212, time 119.48ms
tensor(0.1355)
iter 46200: loss 7.8274, time 119.84ms
iter 46210: loss 6.3997, time 120.49ms
iter 46220: loss 7.6772, time 120.08ms
iter 46230: loss 7.5483, time 120.05ms
iter 46240: loss 7.0192, time 120.77ms
step 46250: train loss 6.3005, val loss 6.3358
saving checkpoint to out-shakespeare-char
iter 46250: loss 7.2927, time 2860.66ms
iter 46260: loss 7.0383, time 120.68ms
iter 46270: loss 7.2130, time 120.68ms
iter 46280: loss 7.1328, time 119.63ms
iter 46290: loss 7.3190, time 119.55ms
tensor(0.1577)
iter 46300: loss 8.2334, time 121.24ms
iter 46310: loss 7.6087, time 119.35ms
iter 46320: loss 7.5072, time 119.54ms
iter 46330: loss 7.7771, time 119.62ms
iter 46340: loss 8.2002, time 120.70ms
iter 46350: loss 6.8207, time 121.27ms
iter 46360: loss 7.2931, time 121.20ms
iter 46370: loss 8.8433, time 120.56ms
iter 46380: loss 8.1424, time 119.46ms
iter 46390: loss 7.8127, time 120.82ms
tensor(0.1813)
iter 46400: loss 8.1227, time 120.89ms
iter 46410: loss 7.9458, time 121.05ms
iter 46420: loss 7.9738, time 119.55ms
iter 46430: loss 7.2330, time 119.65ms
iter 46440: loss 7.7254, time 119.75ms
iter 46450: loss 7.8855, time 119.54ms
iter 46460: loss 7.3904, time 119.48ms
iter 46470: loss 7.7047, time 119.81ms
iter 46480: loss 7.2441, time 119.66ms
iter 46490: loss 7.9921, time 120.05ms
tensor(0.2061)
step 46500: train loss 6.3681, val loss 6.3994
saving checkpoint to out-shakespeare-char
iter 46500: loss 6.9692, time 2882.32ms
iter 46510: loss 7.7552, time 120.63ms
iter 46520: loss 7.0422, time 120.61ms
iter 46530: loss 7.6843, time 119.65ms
iter 46540: loss 6.4281, time 120.81ms
iter 46550: loss 7.8193, time 119.65ms
iter 46560: loss 7.4792, time 120.48ms
iter 46570: loss 7.5444, time 119.47ms
iter 46580: loss 7.7571, time 119.60ms
iter 46590: loss 7.5992, time 120.72ms
tensor(0.2321)
iter 46600: loss 6.9263, time 119.65ms
iter 46610: loss 7.2742, time 120.66ms
iter 46620: loss 6.5530, time 120.68ms
iter 46630: loss 7.8117, time 120.07ms
iter 46640: loss 6.9521, time 119.44ms
iter 46650: loss 7.6120, time 119.50ms
iter 46660: loss 7.9900, time 117.59ms
iter 46670: loss 8.0767, time 120.94ms
iter 46680: loss 7.7567, time 121.06ms
iter 46690: loss 7.5549, time 120.67ms
tensor(0.2591)
iter 46700: loss 7.4714, time 120.03ms
iter 46710: loss 7.9767, time 119.56ms
iter 46720: loss 7.5272, time 120.78ms
iter 46730: loss 7.6048, time 119.65ms
iter 46740: loss 7.8633, time 119.37ms
step 46750: train loss 6.3736, val loss 6.3517
saving checkpoint to out-shakespeare-char
iter 46750: loss 7.6932, time 2863.95ms
iter 46760: loss 7.8882, time 123.70ms
iter 46770: loss 7.0288, time 123.34ms
iter 46780: loss 7.9280, time 126.45ms
iter 46790: loss 7.3018, time 123.25ms
tensor(0.2871)
iter 46800: loss 7.6243, time 123.25ms
iter 46810: loss 7.9616, time 122.42ms
iter 46820: loss 6.7741, time 123.58ms
iter 46830: loss 7.4999, time 123.26ms
iter 46840: loss 7.0267, time 123.23ms
iter 46850: loss 7.2249, time 123.90ms
iter 46860: loss 7.7221, time 125.93ms
iter 46870: loss 7.8015, time 122.69ms
iter 46880: loss 7.9232, time 123.13ms
iter 46890: loss 6.8799, time 122.96ms
tensor(0.3159)
iter 46900: loss 7.1649, time 124.48ms
iter 46910: loss 7.8218, time 119.68ms
iter 46920: loss 7.6977, time 119.58ms
iter 46930: loss 7.0361, time 120.53ms
iter 46940: loss 7.1511, time 120.78ms
iter 46950: loss 7.2089, time 122.59ms
iter 46960: loss 8.0183, time 119.69ms
iter 46970: loss 7.9287, time 120.78ms
iter 46980: loss 6.5811, time 119.65ms
iter 46990: loss 8.1080, time 122.78ms
tensor(0.3455)
step 47000: train loss 6.4279, val loss 6.3812
saving checkpoint to out-shakespeare-char
iter 47000: loss 7.2706, time 2873.29ms
iter 47010: loss 7.4262, time 121.08ms
iter 47020: loss 7.3735, time 120.42ms
iter 47030: loss 7.4376, time 119.65ms
iter 47040: loss 7.9256, time 120.78ms
iter 47050: loss 7.3153, time 119.50ms
iter 47060: loss 7.6691, time 119.66ms
iter 47070: loss 7.7063, time 119.49ms
iter 47080: loss 7.6241, time 120.89ms
iter 47090: loss 7.8889, time 120.80ms
tensor(0.3757)
iter 47100: loss 7.5021, time 120.81ms
iter 47110: loss 7.5096, time 120.72ms
iter 47120: loss 7.9915, time 120.67ms
iter 47130: loss 7.4664, time 119.64ms
iter 47140: loss 8.4242, time 120.65ms
iter 47150: loss 7.7154, time 122.74ms
iter 47160: loss 7.0203, time 118.64ms
iter 47170: loss 7.7137, time 120.55ms
iter 47180: loss 7.2549, time 119.72ms
iter 47190: loss 7.7958, time 120.52ms
tensor(0.4063)
iter 47200: loss 7.3623, time 119.65ms
iter 47210: loss 8.0849, time 119.48ms
iter 47220: loss 7.5010, time 119.70ms
iter 47230: loss 7.9156, time 120.01ms
iter 47240: loss 7.3874, time 119.60ms
step 47250: train loss 6.4151, val loss 6.4316
saving checkpoint to out-shakespeare-char
iter 47250: loss 7.5367, time 2860.91ms
iter 47260: loss 7.3000, time 125.31ms
iter 47270: loss 6.7104, time 122.57ms
iter 47280: loss 7.4278, time 122.57ms
iter 47290: loss 7.4966, time 122.54ms
tensor(0.4373)
iter 47300: loss 6.9006, time 122.44ms
iter 47310: loss 7.2607, time 121.77ms
iter 47320: loss 7.0717, time 122.60ms
iter 47330: loss 7.5541, time 124.90ms
iter 47340: loss 7.9563, time 125.56ms
iter 47350: loss 7.7865, time 122.58ms
iter 47360: loss 8.0639, time 122.45ms
iter 47370: loss 7.4534, time 122.52ms
iter 47380: loss 7.2189, time 122.48ms
iter 47390: loss 7.5448, time 122.35ms
tensor(0.4686)
iter 47400: loss 7.6200, time 122.93ms
iter 47410: loss 7.7636, time 122.53ms
iter 47420: loss 7.7821, time 125.44ms
iter 47430: loss 7.1274, time 122.57ms
iter 47440: loss 7.9501, time 123.71ms
iter 47450: loss 8.2277, time 122.47ms
iter 47460: loss 7.7712, time 121.69ms
iter 47470: loss 6.9879, time 122.28ms
iter 47480: loss 7.4411, time 122.26ms
iter 47490: loss 7.6921, time 122.64ms
tensor(0.5000)
step 47500: train loss 6.4280, val loss 6.4310
saving checkpoint to out-shakespeare-char
iter 47500: loss 7.3333, time 2892.38ms
iter 47510: loss 7.2424, time 124.37ms
iter 47520: loss 7.4891, time 122.53ms
iter 47530: loss 7.5235, time 123.07ms
iter 47540: loss 8.1888, time 125.70ms
iter 47550: loss 7.6797, time 124.12ms
iter 47560: loss 7.9586, time 122.68ms
iter 47570: loss 8.3508, time 122.08ms
iter 47580: loss 7.8467, time 122.28ms
iter 47590: loss 8.0573, time 122.56ms
tensor(0.5314)
iter 47600: loss 7.6365, time 122.81ms
iter 47610: loss 7.7470, time 122.61ms
iter 47620: loss 6.9402, time 125.85ms
iter 47630: loss 7.6581, time 122.20ms
iter 47640: loss 7.9695, time 122.50ms
iter 47650: loss 7.4214, time 122.54ms
iter 47660: loss 7.9849, time 122.34ms
iter 47670: loss 7.5290, time 122.35ms
iter 47680: loss 7.8327, time 122.41ms
iter 47690: loss 7.9247, time 125.18ms
tensor(0.5627)
iter 47700: loss 8.1396, time 122.99ms
iter 47710: loss 7.5355, time 122.89ms
iter 47720: loss 7.4168, time 122.47ms
iter 47730: loss 7.5342, time 121.73ms
iter 47740: loss 7.3116, time 122.68ms
step 47750: train loss 6.4348, val loss 6.4378
saving checkpoint to out-shakespeare-char
iter 47750: loss 7.5736, time 2877.22ms
iter 47760: loss 8.0324, time 123.30ms
iter 47770: loss 7.1383, time 123.61ms
iter 47780: loss 7.1927, time 125.88ms
iter 47790: loss 8.4423, time 122.48ms
tensor(0.5937)
iter 47800: loss 8.3214, time 121.50ms
iter 47810: loss 7.2895, time 122.54ms
iter 47820: loss 7.8728, time 122.47ms
iter 47830: loss 7.3941, time 122.72ms
iter 47840: loss 7.6576, time 121.85ms
iter 47850: loss 8.0344, time 125.31ms
iter 47860: loss 8.0396, time 122.56ms
iter 47870: loss 7.1865, time 121.71ms
iter 47880: loss 7.2196, time 121.18ms
iter 47890: loss 7.6478, time 122.04ms
tensor(0.6243)
iter 47900: loss 6.4372, time 121.95ms
iter 47910: loss 7.0802, time 122.44ms
iter 47920: loss 7.6078, time 122.57ms
iter 47930: loss 7.4397, time 122.42ms
iter 47940: loss 7.4534, time 122.50ms
iter 47950: loss 7.9929, time 120.88ms
iter 47960: loss 7.7111, time 123.01ms
iter 47970: loss 7.5351, time 121.52ms
iter 47980: loss 7.8168, time 122.16ms
iter 47990: loss 7.3080, time 124.13ms
tensor(0.6545)
step 48000: train loss 6.4435, val loss 6.4163
saving checkpoint to out-shakespeare-char
iter 48000: loss 7.0041, time 2869.95ms
iter 48010: loss 7.5283, time 125.39ms
iter 48020: loss 7.7661, time 122.58ms
iter 48030: loss 7.5369, time 122.84ms
iter 48040: loss 7.3747, time 122.33ms
iter 48050: loss 7.2562, time 122.28ms
iter 48060: loss 7.6763, time 122.41ms
iter 48070: loss 7.6084, time 122.42ms
iter 48080: loss 8.1794, time 125.24ms
iter 48090: loss 8.2210, time 122.30ms
tensor(0.6841)
iter 48100: loss 7.6097, time 122.46ms
iter 48110: loss 7.9526, time 122.35ms
iter 48120: loss 8.2558, time 123.36ms
iter 48130: loss 7.5845, time 122.41ms
iter 48140: loss 7.7025, time 122.39ms
iter 48150: loss 7.2522, time 125.41ms
iter 48160: loss 7.7588, time 122.34ms
iter 48170: loss 7.8634, time 122.79ms
iter 48180: loss 7.9844, time 122.72ms
iter 48190: loss 7.7887, time 121.96ms
tensor(0.7129)
iter 48200: loss 7.9343, time 122.84ms
iter 48210: loss 6.9105, time 122.89ms
iter 48220: loss 8.2434, time 125.78ms
iter 48230: loss 7.4490, time 121.49ms
iter 48240: loss 8.1496, time 123.64ms
step 48250: train loss 6.4691, val loss 6.4554
saving checkpoint to out-shakespeare-char
iter 48250: loss 7.3926, time 2884.28ms
iter 48260: loss 7.6462, time 122.59ms
iter 48270: loss 7.5552, time 125.47ms
iter 48280: loss 7.9852, time 122.45ms
iter 48290: loss 7.2331, time 122.49ms
tensor(0.7409)
iter 48300: loss 7.8495, time 121.86ms
iter 48310: loss 8.1382, time 122.45ms
iter 48320: loss 7.2301, time 123.03ms
iter 48330: loss 7.7884, time 122.62ms
iter 48340: loss 7.7470, time 125.74ms
iter 48350: loss 7.8544, time 122.44ms
iter 48360: loss 7.8759, time 122.36ms
iter 48370: loss 6.9606, time 122.33ms
iter 48380: loss 7.8675, time 122.07ms
iter 48390: loss 8.3208, time 122.51ms
tensor(0.7679)
iter 48400: loss 6.8932, time 123.27ms
iter 48410: loss 7.9939, time 125.93ms
iter 48420: loss 7.6760, time 123.67ms
iter 48430: loss 7.0309, time 123.02ms
iter 48440: loss 8.2740, time 123.27ms
iter 48450: loss 8.0750, time 123.55ms
iter 48460: loss 8.2401, time 123.29ms
iter 48470: loss 8.1405, time 123.28ms
iter 48480: loss 6.9999, time 126.43ms
iter 48490: loss 7.9284, time 123.31ms
tensor(0.7939)
step 48500: train loss 6.5285, val loss 6.4944
saving checkpoint to out-shakespeare-char
iter 48500: loss 7.4478, time 2864.41ms
iter 48510: loss 7.8799, time 125.77ms
iter 48520: loss 7.4025, time 123.25ms
iter 48530: loss 7.9408, time 122.89ms
iter 48540: loss 7.7202, time 123.49ms
iter 48550: loss 7.7200, time 123.01ms
iter 48560: loss 7.3718, time 123.45ms
iter 48570: loss 7.2396, time 123.10ms
iter 48580: loss 8.1465, time 125.77ms
iter 48590: loss 7.8780, time 122.58ms
tensor(0.8187)
iter 48600: loss 7.7970, time 122.86ms
iter 48610: loss 7.8837, time 122.95ms
iter 48620: loss 7.2359, time 122.94ms
iter 48630: loss 8.1444, time 122.88ms
iter 48640: loss 8.1138, time 122.85ms
iter 48650: loss 6.8731, time 122.92ms
iter 48660: loss 7.6385, time 123.12ms
iter 48670: loss 7.8684, time 125.73ms
iter 48680: loss 7.7622, time 123.13ms
iter 48690: loss 8.0662, time 122.95ms
tensor(0.8423)
iter 48700: loss 7.6165, time 122.70ms
iter 48710: loss 7.6991, time 122.87ms
iter 48720: loss 7.3550, time 122.78ms
iter 48730: loss 7.1367, time 122.92ms
iter 48740: loss 8.5528, time 122.73ms
step 48750: train loss 6.5044, val loss 6.4846
saving checkpoint to out-shakespeare-char
iter 48750: loss 7.8998, time 2878.93ms
iter 48760: loss 7.4258, time 122.87ms
iter 48770: loss 7.4152, time 126.05ms
iter 48780: loss 7.9316, time 123.44ms
iter 48790: loss 7.2142, time 122.67ms
tensor(0.8645)
iter 48800: loss 8.1617, time 123.22ms
iter 48810: loss 8.0069, time 122.99ms
iter 48820: loss 7.2653, time 122.86ms
iter 48830: loss 7.4457, time 122.99ms
iter 48840: loss 8.6008, time 122.97ms
iter 48850: loss 7.5699, time 125.46ms
iter 48860: loss 7.4518, time 123.03ms
iter 48870: loss 7.6541, time 122.77ms
iter 48880: loss 7.9310, time 122.69ms
iter 48890: loss 8.5271, time 123.01ms
tensor(0.8853)
iter 48900: loss 7.8244, time 122.93ms
iter 48910: loss 7.8232, time 122.74ms
iter 48920: loss 8.0601, time 126.25ms
iter 48930: loss 7.6362, time 122.88ms
iter 48940: loss 7.7802, time 122.10ms
iter 48950: loss 7.5525, time 123.03ms
iter 48960: loss 6.9546, time 122.93ms
iter 48970: loss 7.1012, time 123.36ms
iter 48980: loss 7.4121, time 121.97ms
iter 48990: loss 7.9895, time 122.85ms
tensor(0.9045)
step 49000: train loss 6.5282, val loss 6.5433
saving checkpoint to out-shakespeare-char
iter 49000: loss 7.6483, time 2858.78ms
iter 49010: loss 7.8451, time 125.64ms
iter 49020: loss 7.3619, time 122.85ms
iter 49030: loss 7.2495, time 122.82ms
iter 49040: loss 7.4041, time 122.34ms
iter 49050: loss 7.1971, time 123.00ms
iter 49060: loss 8.0247, time 122.93ms
iter 49070: loss 8.0220, time 122.83ms
iter 49080: loss 6.9592, time 125.83ms
iter 49090: loss 7.5686, time 122.98ms
tensor(0.9222)
iter 49100: loss 8.1213, time 123.13ms
iter 49110: loss 8.2001, time 122.41ms
iter 49120: loss 7.1547, time 122.15ms
iter 49130: loss 7.5258, time 122.73ms
iter 49140: loss 8.4718, time 123.21ms
iter 49150: loss 8.1870, time 126.05ms
iter 49160: loss 7.3296, time 122.53ms
iter 49170: loss 8.6750, time 122.90ms
iter 49180: loss 7.0717, time 122.68ms
iter 49190: loss 7.0555, time 122.86ms
tensor(0.9382)
iter 49200: loss 7.2664, time 122.91ms
iter 49210: loss 7.9279, time 123.10ms
iter 49220: loss 8.2577, time 122.74ms
iter 49230: loss 7.0198, time 125.57ms
iter 49240: loss 7.6391, time 122.98ms
step 49250: train loss 6.5377, val loss 6.4707
saving checkpoint to out-shakespeare-char
iter 49250: loss 7.4374, time 2852.11ms
iter 49260: loss 7.6198, time 122.07ms
iter 49270: loss 7.3921, time 123.06ms
iter 49280: loss 7.1762, time 122.84ms
iter 49290: loss 6.9714, time 123.98ms
tensor(0.9524)
iter 49300: loss 8.0749, time 122.93ms
iter 49310: loss 8.4287, time 122.86ms
iter 49320: loss 7.3168, time 122.70ms
iter 49330: loss 7.2398, time 125.53ms
iter 49340: loss 7.5256, time 122.56ms
iter 49350: loss 8.1312, time 123.03ms
iter 49360: loss 7.4242, time 122.26ms
iter 49370: loss 8.0188, time 122.90ms
iter 49380: loss 7.2517, time 122.27ms
iter 49390: loss 7.6789, time 121.83ms
tensor(0.9649)
iter 49400: loss 7.7151, time 125.93ms
iter 49410: loss 7.0446, time 122.71ms
iter 49420: loss 8.0998, time 122.88ms
iter 49430: loss 8.2953, time 122.56ms
iter 49440: loss 7.6131, time 122.87ms
iter 49450: loss 8.1890, time 123.16ms
iter 49460: loss 7.5081, time 122.72ms
iter 49470: loss 7.6519, time 125.62ms
iter 49480: loss 7.6561, time 123.04ms
iter 49490: loss 8.3457, time 122.22ms
tensor(0.9755)
step 49500: train loss 6.4799, val loss 6.4701
saving checkpoint to out-shakespeare-char
iter 49500: loss 8.4065, time 2852.43ms
iter 49510: loss 7.4467, time 123.09ms
iter 49520: loss 7.6188, time 123.11ms
iter 49530: loss 7.7691, time 121.94ms
iter 49540: loss 7.9910, time 122.75ms
iter 49550: loss 7.8474, time 122.88ms
iter 49560: loss 7.6187, time 125.18ms
iter 49570: loss 7.3639, time 122.75ms
iter 49580: loss 8.2156, time 122.98ms
iter 49590: loss 7.1384, time 122.95ms
tensor(0.9843)
iter 49600: loss 7.7535, time 122.91ms
iter 49610: loss 7.5121, time 122.99ms
iter 49620: loss 8.1955, time 122.87ms
iter 49630: loss 7.9258, time 125.74ms
iter 49640: loss 7.2877, time 122.82ms
iter 49650: loss 7.9299, time 122.64ms
iter 49660: loss 7.9808, time 122.02ms
iter 49670: loss 7.8111, time 122.53ms
iter 49680: loss 7.3905, time 122.86ms
iter 49690: loss 7.6431, time 123.00ms
tensor(0.9911)
iter 49700: loss 7.2711, time 125.65ms
iter 49710: loss 7.4008, time 122.87ms
iter 49720: loss 8.2042, time 122.69ms
iter 49730: loss 7.7919, time 122.86ms
iter 49740: loss 8.0900, time 123.04ms
step 49750: train loss 6.4589, val loss 6.4409
saving checkpoint to out-shakespeare-char
iter 49750: loss 6.8509, time 2864.16ms
iter 49760: loss 7.6170, time 122.13ms
iter 49770: loss 8.1123, time 122.84ms
iter 49780: loss 8.2319, time 122.63ms
iter 49790: loss 7.5705, time 122.79ms
tensor(0.9961)
iter 49800: loss 7.7413, time 124.99ms
iter 49810: loss 7.6915, time 123.06ms
iter 49820: loss 7.0326, time 122.77ms
iter 49830: loss 6.6100, time 121.98ms
iter 49840: loss 6.7634, time 122.73ms
iter 49850: loss 8.6138, time 122.13ms
iter 49860: loss 7.3660, time 123.00ms
iter 49870: loss 7.2223, time 122.99ms
iter 49880: loss 7.9161, time 123.17ms
iter 49890: loss 7.2939, time 123.18ms
tensor(0.9990)
iter 49900: loss 7.2591, time 123.35ms
iter 49910: loss 7.9810, time 123.28ms
iter 49920: loss 8.1036, time 123.88ms
iter 49930: loss 7.4598, time 125.82ms
iter 49940: loss 8.2798, time 122.77ms
iter 49950: loss 8.5342, time 123.06ms
iter 49960: loss 7.6284, time 122.97ms
iter 49970: loss 7.3040, time 123.07ms
iter 49980: loss 6.6947, time 122.68ms
iter 49990: loss 7.6299, time 122.20ms
tensor(1.)
step 50000: train loss 6.5183, val loss 6.5057
saving checkpoint to out-shakespeare-char
iter 50000: loss 7.1037, time 2879.54ms
iter 50010: loss 8.5779, time 123.21ms
iter 50020: loss 7.2678, time 125.81ms
iter 50030: loss 7.2497, time 122.79ms
iter 50040: loss 7.6601, time 122.65ms
iter 50050: loss 7.6512, time 122.68ms
iter 50060: loss 7.3486, time 122.69ms
iter 50070: loss 7.3284, time 122.80ms
iter 50080: loss 7.2984, time 122.04ms
iter 50090: loss 7.8184, time 125.57ms
tensor(0.9990)
iter 50100: loss 7.7200, time 122.78ms
iter 50110: loss 8.2816, time 122.46ms
iter 50120: loss 8.0076, time 122.76ms
iter 50130: loss 6.5629, time 122.76ms
iter 50140: loss 7.4346, time 123.07ms
iter 50150: loss 7.5994, time 122.74ms
iter 50160: loss 7.2641, time 122.83ms
iter 50170: loss 7.5099, time 125.55ms
iter 50180: loss 7.1396, time 122.78ms
iter 50190: loss 8.1597, time 122.79ms
tensor(0.9961)
iter 50200: loss 7.7972, time 121.39ms
iter 50210: loss 7.6832, time 122.64ms
iter 50220: loss 7.5109, time 123.05ms
iter 50230: loss 8.4081, time 122.86ms
iter 50240: loss 7.7634, time 122.81ms
step 50250: train loss 6.4805, val loss 6.4277
saving checkpoint to out-shakespeare-char
iter 50250: loss 8.1339, time 2856.74ms
iter 50260: loss 8.5037, time 122.71ms
iter 50270: loss 7.3042, time 123.00ms
iter 50280: loss 7.0605, time 122.68ms
iter 50290: loss 7.9783, time 125.44ms
tensor(0.9911)
iter 50300: loss 8.0422, time 123.07ms
iter 50310: loss 7.3463, time 122.81ms
iter 50320: loss 7.7040, time 122.76ms
iter 50330: loss 7.3326, time 123.22ms
iter 50340: loss 7.6169, time 122.47ms
iter 50350: loss 7.8001, time 121.93ms
iter 50360: loss 7.4494, time 124.16ms
iter 50370: loss 7.5327, time 121.75ms
iter 50380: loss 7.0596, time 122.75ms
iter 50390: loss 7.5691, time 120.64ms
tensor(0.9843)
iter 50400: loss 7.7156, time 122.87ms
iter 50410: loss 7.0331, time 122.85ms
iter 50420: loss 7.7388, time 123.15ms
iter 50430: loss 7.8637, time 126.32ms
iter 50440: loss 7.3121, time 122.92ms
iter 50450: loss 8.4642, time 123.43ms
iter 50460: loss 7.6079, time 123.34ms
iter 50470: loss 7.9711, time 123.07ms
iter 50480: loss 7.3803, time 123.01ms
iter 50490: loss 6.9411, time 123.57ms
tensor(0.9755)
step 50500: train loss 6.5171, val loss 6.4791
saving checkpoint to out-shakespeare-char
iter 50500: loss 7.6284, time 2858.19ms
iter 50510: loss 7.2622, time 121.76ms
iter 50520: loss 7.5677, time 122.37ms
iter 50530: loss 7.3171, time 120.88ms
iter 50540: loss 8.1600, time 122.41ms
iter 50550: loss 7.3174, time 122.39ms
iter 50560: loss 7.7142, time 124.78ms
iter 50570: loss 7.3507, time 122.28ms
iter 50580: loss 7.7915, time 122.35ms
iter 50590: loss 7.6546, time 122.68ms
tensor(0.9649)
iter 50600: loss 7.6264, time 122.71ms
iter 50610: loss 7.4609, time 122.96ms
iter 50620: loss 8.2487, time 122.47ms
iter 50630: loss 7.0970, time 125.94ms
iter 50640: loss 7.4983, time 120.74ms
iter 50650: loss 7.1559, time 122.77ms
iter 50660: loss 7.8123, time 121.53ms
iter 50670: loss 8.2975, time 122.27ms
iter 50680: loss 7.3694, time 121.65ms
iter 50690: loss 8.1073, time 122.15ms
tensor(0.9524)
iter 50700: loss 6.4743, time 122.38ms
iter 50710: loss 7.5140, time 125.45ms
iter 50720: loss 6.5533, time 120.62ms
iter 50730: loss 8.0197, time 121.91ms
iter 50740: loss 6.4619, time 122.54ms
step 50750: train loss 6.4637, val loss 6.4753
saving checkpoint to out-shakespeare-char
iter 50750: loss 8.0930, time 2857.39ms
iter 50760: loss 7.5762, time 123.49ms
iter 50770: loss 7.0431, time 123.72ms
iter 50780: loss 7.3208, time 122.40ms
iter 50790: loss 7.6440, time 122.45ms
tensor(0.9382)
iter 50800: loss 7.2477, time 121.16ms
iter 50810: loss 7.6164, time 122.48ms
iter 50820: loss 7.3128, time 125.21ms
iter 50830: loss 7.4017, time 121.74ms
iter 50840: loss 8.2384, time 118.60ms
iter 50850: loss 8.3820, time 119.72ms
iter 50860: loss 7.1407, time 119.21ms
iter 50870: loss 6.7329, time 120.40ms
iter 50880: loss 7.4963, time 121.97ms
iter 50890: loss 7.3169, time 118.56ms
tensor(0.9222)
iter 50900: loss 7.8583, time 122.67ms
iter 50910: loss 7.2948, time 120.71ms
iter 50920: loss 8.4115, time 119.13ms
iter 50930: loss 8.1128, time 119.09ms
iter 50940: loss 7.1884, time 121.55ms
iter 50950: loss 7.9384, time 120.31ms
iter 50960: loss 7.6685, time 121.85ms
iter 50970: loss 7.6279, time 119.18ms
iter 50980: loss 7.5079, time 119.17ms
iter 50990: loss 7.7626, time 118.59ms
tensor(0.9045)
step 51000: train loss 6.4543, val loss 6.4414
saving checkpoint to out-shakespeare-char
iter 51000: loss 7.5386, time 2864.42ms
iter 51010: loss 6.4724, time 119.88ms
iter 51020: loss 7.5476, time 121.32ms
iter 51030: loss 7.9896, time 119.58ms
iter 51040: loss 7.1809, time 119.51ms
iter 51050: loss 7.7763, time 118.50ms
iter 51060: loss 7.9938, time 119.56ms
iter 51070: loss 7.0409, time 120.48ms
iter 51080: loss 8.1969, time 122.27ms
iter 51090: loss 7.9445, time 119.55ms
tensor(0.8853)
iter 51100: loss 7.2485, time 119.53ms
iter 51110: loss 8.3851, time 118.64ms
iter 51120: loss 7.2432, time 121.05ms
iter 51130: loss 7.3673, time 119.56ms
iter 51140: loss 7.2607, time 119.51ms
iter 51150: loss 7.3484, time 120.69ms
iter 51160: loss 7.3066, time 120.66ms
iter 51170: loss 7.6369, time 120.64ms
iter 51180: loss 7.9060, time 120.68ms
iter 51190: loss 7.6348, time 120.57ms
tensor(0.8645)
iter 51200: loss 7.6432, time 119.95ms
iter 51210: loss 7.3615, time 120.45ms
iter 51220: loss 7.3249, time 120.76ms
iter 51230: loss 7.7154, time 120.04ms
iter 51240: loss 7.5435, time 119.43ms
step 51250: train loss 6.4360, val loss 6.3581
saving checkpoint to out-shakespeare-char
iter 51250: loss 8.0384, time 2872.40ms
iter 51260: loss 7.8816, time 120.61ms
iter 51270: loss 7.5396, time 121.01ms
iter 51280: loss 7.9939, time 119.54ms
iter 51290: loss 8.0628, time 119.54ms
tensor(0.8423)
iter 51300: loss 7.4344, time 119.58ms
iter 51310: loss 7.7036, time 120.96ms
iter 51320: loss 7.2287, time 118.96ms
iter 51330: loss 7.5617, time 122.08ms
iter 51340: loss 7.1093, time 120.58ms
iter 51350: loss 8.4278, time 122.25ms
iter 51360: loss 7.9771, time 120.82ms
iter 51370: loss 7.2664, time 120.57ms
iter 51380: loss 7.1921, time 119.53ms
iter 51390: loss 7.1139, time 119.62ms
tensor(0.8187)
iter 51400: loss 8.1014, time 121.22ms
iter 51410: loss 7.7936, time 122.27ms
iter 51420: loss 6.8213, time 119.48ms
iter 51430: loss 6.6846, time 122.89ms
iter 51440: loss 6.8722, time 118.97ms
iter 51450: loss 7.9105, time 121.06ms
iter 51460: loss 7.1753, time 118.60ms
iter 51470: loss 8.1554, time 119.40ms
iter 51480: loss 7.3286, time 119.59ms
iter 51490: loss 7.5917, time 118.68ms
tensor(0.7939)
step 51500: train loss 6.4390, val loss 6.4798
saving checkpoint to out-shakespeare-char
iter 51500: loss 7.3883, time 2863.70ms
iter 51510: loss 7.1570, time 120.52ms
iter 51520: loss 7.1693, time 120.65ms
iter 51530: loss 8.5779, time 119.17ms
iter 51540: loss 7.7728, time 119.26ms
iter 51550: loss 7.3124, time 119.24ms
iter 51560: loss 6.6880, time 119.07ms
iter 51570: loss 7.6501, time 120.45ms
iter 51580: loss 7.6161, time 119.37ms
iter 51590: loss 7.5662, time 119.32ms
tensor(0.7679)
iter 51600: loss 7.6701, time 119.43ms
iter 51610: loss 7.3159, time 120.70ms
iter 51620: loss 7.2738, time 119.90ms
iter 51630: loss 7.8443, time 119.25ms
iter 51640: loss 7.5968, time 120.42ms
iter 51650: loss 7.1116, time 119.14ms
iter 51660: loss 7.0505, time 119.14ms
iter 51670: loss 7.2571, time 119.07ms
iter 51680: loss 7.4709, time 119.15ms
iter 51690: loss 7.7398, time 120.12ms
tensor(0.7409)
iter 51700: loss 6.8022, time 119.25ms
iter 51710: loss 7.7462, time 119.61ms
iter 51720: loss 7.8249, time 120.60ms
iter 51730: loss 7.2174, time 121.90ms
iter 51740: loss 8.4304, time 120.47ms
step 51750: train loss 6.3514, val loss 6.3791
saving checkpoint to out-shakespeare-char
iter 51750: loss 7.3712, time 2868.05ms
iter 51760: loss 7.8220, time 121.02ms
iter 51770: loss 7.0161, time 120.64ms
iter 51780: loss 7.7364, time 119.27ms
iter 51790: loss 7.7953, time 120.79ms
tensor(0.7129)
iter 51800: loss 7.0323, time 119.40ms
iter 51810: loss 7.6920, time 121.02ms
iter 51820: loss 6.9774, time 119.36ms
iter 51830: loss 7.7882, time 120.20ms
iter 51840: loss 8.5192, time 119.55ms
iter 51850: loss 7.0161, time 120.42ms
iter 51860: loss 8.1472, time 119.55ms
iter 51870: loss 7.0311, time 120.31ms
iter 51880: loss 7.5750, time 119.76ms
iter 51890: loss 7.9974, time 119.16ms
tensor(0.6841)
iter 51900: loss 7.7295, time 120.41ms
iter 51910: loss 8.2333, time 122.26ms
iter 51920: loss 7.2891, time 120.38ms
iter 51930: loss 7.4115, time 120.33ms
iter 51940: loss 7.9743, time 119.89ms
iter 51950: loss 6.5769, time 118.91ms
iter 51960: loss 6.9051, time 118.95ms
iter 51970: loss 7.5795, time 120.34ms
iter 51980: loss 7.9006, time 118.84ms
iter 51990: loss 7.1672, time 118.94ms
tensor(0.6545)
step 52000: train loss 6.3659, val loss 6.4088
saving checkpoint to out-shakespeare-char
iter 52000: loss 6.7114, time 2865.85ms
iter 52010: loss 7.8975, time 118.96ms
iter 52020: loss 7.0671, time 119.14ms
iter 52030: loss 7.6564, time 119.14ms
iter 52040: loss 6.8872, time 119.10ms
iter 52050: loss 7.7035, time 119.14ms
iter 52060: loss 7.8783, time 120.32ms
iter 52070: loss 7.1686, time 119.18ms
iter 52080: loss 6.8243, time 119.06ms
iter 52090: loss 6.9742, time 118.65ms
tensor(0.6243)
iter 52100: loss 7.2783, time 118.89ms
iter 52110: loss 7.1424, time 119.20ms
iter 52120: loss 7.0222, time 120.40ms
iter 52130: loss 7.4033, time 119.15ms
iter 52140: loss 7.0175, time 119.18ms
iter 52150: loss 7.4296, time 119.33ms
iter 52160: loss 7.6671, time 119.30ms
iter 52170: loss 7.5568, time 121.94ms
iter 52180: loss 7.8199, time 120.30ms
iter 52190: loss 7.8697, time 119.14ms
tensor(0.5937)
iter 52200: loss 7.9567, time 119.42ms
iter 52210: loss 7.3862, time 119.23ms
iter 52220: loss 7.6566, time 120.24ms
iter 52230: loss 8.0106, time 119.50ms
iter 52240: loss 7.7362, time 120.23ms
step 52250: train loss 6.3245, val loss 6.3047
saving checkpoint to out-shakespeare-char
iter 52250: loss 6.5646, time 2866.13ms
iter 52260: loss 6.9930, time 122.38ms
iter 52270: loss 7.1864, time 120.77ms
iter 52280: loss 7.3723, time 120.71ms
iter 52290: loss 7.3445, time 119.30ms
tensor(0.5627)
iter 52300: loss 7.6685, time 121.96ms
iter 52310: loss 7.9864, time 120.72ms
iter 52320: loss 7.0941, time 121.93ms
iter 52330: loss 7.7933, time 120.57ms
iter 52340: loss 6.8318, time 120.84ms
iter 52350: loss 7.4663, time 119.41ms
iter 52360: loss 7.5439, time 119.51ms
iter 52370: loss 6.9485, time 118.74ms
iter 52380: loss 7.5136, time 120.74ms
iter 52390: loss 7.3972, time 119.47ms
tensor(0.5314)
iter 52400: loss 7.6256, time 119.57ms
iter 52410: loss 7.2159, time 119.59ms
iter 52420: loss 6.8870, time 119.62ms
iter 52430: loss 7.1211, time 120.48ms
iter 52440: loss 7.3040, time 121.07ms
iter 52450: loss 8.0891, time 119.55ms
iter 52460: loss 7.4795, time 120.58ms
iter 52470: loss 7.2331, time 119.79ms
iter 52480: loss 7.1713, time 120.75ms
iter 52490: loss 7.2696, time 120.51ms
tensor(0.5000)
step 52500: train loss 6.3391, val loss 6.3651
saving checkpoint to out-shakespeare-char
iter 52500: loss 7.1876, time 2869.70ms
iter 52510: loss 7.5818, time 120.54ms
iter 52520: loss 7.4358, time 119.23ms
iter 52530: loss 6.5022, time 121.19ms
iter 52540: loss 7.5482, time 119.29ms
iter 52550: loss 7.0601, time 120.98ms
iter 52560: loss 7.6488, time 119.28ms
iter 52570: loss 7.1292, time 120.71ms
iter 52580: loss 7.0660, time 119.00ms
iter 52590: loss 7.6017, time 121.04ms
tensor(0.4686)
iter 52600: loss 8.0852, time 119.34ms
iter 52610: loss 7.4535, time 120.02ms
iter 52620: loss 6.8722, time 120.28ms
iter 52630: loss 7.3582, time 120.83ms
iter 52640: loss 7.3910, time 121.28ms
iter 52650: loss 7.0468, time 120.64ms
iter 52660: loss 7.1947, time 120.27ms
iter 52670: loss 6.8909, time 120.90ms
iter 52680: loss 6.6606, time 118.22ms
iter 52690: loss 6.7966, time 119.50ms
tensor(0.4373)
iter 52700: loss 7.6827, time 119.48ms
iter 52710: loss 7.6709, time 120.63ms
iter 52720: loss 7.0625, time 119.18ms
iter 52730: loss 7.8835, time 119.86ms
iter 52740: loss 7.4834, time 120.00ms
step 52750: train loss 6.3190, val loss 6.3675
saving checkpoint to out-shakespeare-char
iter 52750: loss 8.4911, time 2836.69ms
iter 52760: loss 7.4435, time 120.55ms
iter 52770: loss 7.7081, time 121.46ms
iter 52780: loss 8.0716, time 119.34ms
iter 52790: loss 6.8616, time 121.87ms
tensor(0.4063)
iter 52800: loss 6.9844, time 119.39ms
iter 52810: loss 7.6382, time 122.17ms
iter 52820: loss 7.2158, time 120.05ms
iter 52830: loss 7.9524, time 120.29ms
iter 52840: loss 7.9927, time 119.15ms
iter 52850: loss 7.0810, time 120.59ms
iter 52860: loss 7.5470, time 120.33ms
iter 52870: loss 8.1271, time 120.90ms
iter 52880: loss 7.3559, time 119.60ms
iter 52890: loss 7.3459, time 120.22ms
tensor(0.3757)
iter 52900: loss 7.3917, time 120.83ms
iter 52910: loss 7.2235, time 120.48ms
iter 52920: loss 7.4619, time 120.19ms
iter 52930: loss 6.5926, time 120.52ms
iter 52940: loss 7.7178, time 119.15ms
iter 52950: loss 6.7217, time 119.73ms
iter 52960: loss 7.4277, time 119.01ms
iter 52970: loss 6.5701, time 120.91ms
iter 52980: loss 7.3782, time 119.23ms
iter 52990: loss 8.3093, time 119.05ms
tensor(0.3455)
step 53000: train loss 6.2894, val loss 6.3163
saving checkpoint to out-shakespeare-char
iter 53000: loss 6.8097, time 2870.97ms
iter 53010: loss 7.4275, time 117.47ms
iter 53020: loss 8.0199, time 120.70ms
iter 53030: loss 7.5276, time 119.17ms
iter 53040: loss 7.0559, time 120.43ms
iter 53050: loss 6.6796, time 118.71ms
iter 53060: loss 7.6146, time 118.38ms
iter 53070: loss 7.5562, time 119.58ms
iter 53080: loss 7.9684, time 120.58ms
iter 53090: loss 7.6500, time 120.55ms
tensor(0.3159)
iter 53100: loss 7.4131, time 120.81ms
iter 53110: loss 7.4378, time 120.37ms
iter 53120: loss 8.0338, time 120.27ms
iter 53130: loss 7.7697, time 119.40ms
iter 53140: loss 7.9918, time 119.25ms
iter 53150: loss 7.3846, time 120.36ms
iter 53160: loss 8.4904, time 119.96ms
iter 53170: loss 6.7861, time 119.93ms
iter 53180: loss 7.2379, time 120.36ms
iter 53190: loss 6.7205, time 119.20ms
tensor(0.2871)
iter 53200: loss 7.4599, time 120.39ms
iter 53210: loss 8.2053, time 120.37ms
iter 53220: loss 7.4952, time 119.26ms
iter 53230: loss 7.3596, time 119.16ms
iter 53240: loss 7.5306, time 120.39ms
step 53250: train loss 6.2948, val loss 6.2714
saving checkpoint to out-shakespeare-char
iter 53250: loss 7.4115, time 2859.18ms
iter 53260: loss 6.9955, time 118.96ms
iter 53270: loss 6.3708, time 119.15ms
iter 53280: loss 7.6112, time 119.75ms
iter 53290: loss 6.8205, time 122.22ms
tensor(0.2591)
iter 53300: loss 7.1434, time 120.41ms
iter 53310: loss 6.4335, time 120.42ms
iter 53320: loss 7.4477, time 119.37ms
iter 53330: loss 7.1610, time 119.71ms
iter 53340: loss 7.9237, time 118.95ms
iter 53350: loss 7.3695, time 119.97ms
iter 53360: loss 7.0243, time 119.18ms
iter 53370: loss 7.7221, time 119.06ms
iter 53380: loss 7.4938, time 119.14ms
iter 53390: loss 6.7820, time 119.17ms
tensor(0.2321)
iter 53400: loss 7.7363, time 119.37ms
iter 53410: loss 8.0848, time 119.33ms
iter 53420: loss 7.5020, time 119.14ms
iter 53430: loss 7.9000, time 120.05ms
iter 53440: loss 7.1899, time 119.23ms
iter 53450: loss 7.0543, time 121.91ms
iter 53460: loss 7.6929, time 119.22ms
iter 53470: loss 8.3420, time 120.41ms
iter 53480: loss 6.6877, time 118.64ms
iter 53490: loss 7.5205, time 119.23ms
tensor(0.2061)
step 53500: train loss 6.2026, val loss 6.2603
saving checkpoint to out-shakespeare-char
iter 53500: loss 7.1398, time 2873.43ms
iter 53510: loss 7.2862, time 118.83ms
iter 53520: loss 6.8190, time 119.18ms
iter 53530: loss 6.8397, time 122.09ms
iter 53540: loss 7.4053, time 119.20ms
iter 53550: loss 8.1280, time 119.07ms
iter 53560: loss 7.2713, time 119.09ms
iter 53570: loss 6.3174, time 120.85ms
iter 53580: loss 7.1975, time 119.31ms
iter 53590: loss 7.0535, time 120.32ms
tensor(0.1813)
iter 53600: loss 6.6371, time 120.40ms
iter 53610: loss 6.8722, time 121.85ms
iter 53620: loss 7.9800, time 119.62ms
iter 53630: loss 7.2615, time 118.96ms
iter 53640: loss 7.3994, time 120.71ms
iter 53650: loss 6.7119, time 119.06ms
iter 53660: loss 7.5223, time 119.12ms
iter 53670: loss 7.0682, time 118.78ms
iter 53680: loss 8.1384, time 119.10ms
iter 53690: loss 7.5598, time 119.09ms
tensor(0.1577)
iter 53700: loss 7.5682, time 119.86ms
iter 53710: loss 7.3285, time 119.09ms
iter 53720: loss 7.4335, time 119.33ms
iter 53730: loss 6.6555, time 119.01ms
iter 53740: loss 7.2443, time 119.05ms
step 53750: train loss 6.2545, val loss 6.3035
saving checkpoint to out-shakespeare-char
iter 53750: loss 7.2615, time 2860.74ms
iter 53760: loss 8.5517, time 118.95ms
iter 53770: loss 7.7085, time 119.05ms
iter 53780: loss 8.2414, time 120.33ms
iter 53790: loss 6.9755, time 122.77ms
tensor(0.1355)
iter 53800: loss 7.4787, time 119.03ms
iter 53810: loss 7.2248, time 119.01ms
iter 53820: loss 8.2020, time 119.27ms
iter 53830: loss 6.8286, time 120.25ms
iter 53840: loss 6.7150, time 119.23ms
iter 53850: loss 6.7547, time 119.05ms
iter 53860: loss 7.0889, time 119.29ms
iter 53870: loss 6.8893, time 120.31ms
iter 53880: loss 7.3977, time 119.68ms
iter 53890: loss 6.6107, time 118.96ms
tensor(0.1147)
iter 53900: loss 6.4117, time 120.27ms
iter 53910: loss 7.3782, time 119.48ms
iter 53920: loss 7.7658, time 119.97ms
iter 53930: loss 6.9863, time 120.32ms
iter 53940: loss 6.4036, time 119.19ms
iter 53950: loss 6.7194, time 122.03ms
iter 53960: loss 7.6491, time 118.28ms
iter 53970: loss 6.8321, time 119.18ms
iter 53980: loss 7.7121, time 119.16ms
iter 53990: loss 7.9813, time 119.31ms
tensor(0.0955)
step 54000: train loss 6.2507, val loss 6.3099
saving checkpoint to out-shakespeare-char
iter 54000: loss 6.6121, time 2857.47ms
iter 54010: loss 7.4450, time 120.02ms
iter 54020: loss 7.0463, time 119.53ms
iter 54030: loss 6.9991, time 119.59ms
iter 54040: loss 7.3775, time 119.24ms
iter 54050: loss 7.4363, time 119.22ms
iter 54060: loss 8.0881, time 119.42ms
iter 54070: loss 6.1991, time 120.90ms
iter 54080: loss 7.8343, time 122.34ms
iter 54090: loss 7.2989, time 119.21ms
tensor(0.0778)
iter 54100: loss 6.8810, time 121.93ms
iter 54110: loss 7.1126, time 118.39ms
iter 54120: loss 7.2127, time 120.07ms
iter 54130: loss 6.9596, time 120.30ms
iter 54140: loss 6.8308, time 121.63ms
iter 54150: loss 8.2002, time 120.13ms
iter 54160: loss 7.3576, time 120.05ms
iter 54170: loss 7.2979, time 120.30ms
iter 54180: loss 7.3175, time 121.87ms
iter 54190: loss 6.3084, time 119.32ms
tensor(0.0618)
iter 54200: loss 7.6340, time 122.15ms
iter 54210: loss 6.6001, time 119.04ms
iter 54220: loss 7.5840, time 120.62ms
iter 54230: loss 7.3855, time 120.36ms
iter 54240: loss 7.2777, time 119.22ms
step 54250: train loss 6.2309, val loss 6.3007
saving checkpoint to out-shakespeare-char
iter 54250: loss 7.3509, time 2862.76ms
iter 54260: loss 7.3245, time 117.13ms
iter 54270: loss 7.2233, time 118.65ms
iter 54280: loss 7.1428, time 117.63ms
iter 54290: loss 6.8372, time 118.55ms
tensor(0.0476)
iter 54300: loss 7.3688, time 118.21ms
iter 54310: loss 7.5492, time 119.87ms
iter 54320: loss 7.8856, time 118.32ms
iter 54330: loss 7.6285, time 118.74ms
iter 54340: loss 6.8177, time 118.16ms
iter 54350: loss 7.9284, time 121.72ms
iter 54360: loss 7.7732, time 119.64ms
iter 54370: loss 6.3985, time 121.44ms
iter 54380: loss 7.2632, time 119.21ms
iter 54390: loss 7.6348, time 120.71ms
tensor(0.0351)
iter 54400: loss 7.0454, time 120.72ms
iter 54410: loss 6.9153, time 120.54ms
iter 54420: loss 7.0923, time 119.62ms
iter 54430: loss 6.8451, time 119.17ms
iter 54440: loss 6.7078, time 119.50ms
iter 54450: loss 7.6304, time 120.39ms
iter 54460: loss 6.7548, time 119.29ms
iter 54470: loss 6.7083, time 119.46ms
iter 54480: loss 7.3264, time 118.94ms
iter 54490: loss 7.2976, time 119.21ms
tensor(0.0245)
step 54500: train loss 6.2291, val loss 6.2050
saving checkpoint to out-shakespeare-char
iter 54500: loss 7.3721, time 2867.49ms
iter 54510: loss 7.4926, time 119.56ms
iter 54520: loss 7.6749, time 119.61ms
iter 54530: loss 7.1356, time 120.75ms
iter 54540: loss 6.8102, time 119.52ms
iter 54550: loss 7.6568, time 120.55ms
iter 54560: loss 7.0860, time 120.06ms
iter 54570: loss 6.9812, time 120.70ms
iter 54580: loss 7.5263, time 120.13ms
iter 54590: loss 7.5788, time 120.71ms
tensor(0.0157)
iter 54600: loss 7.1822, time 119.58ms
iter 54610: loss 7.9853, time 120.15ms
iter 54620: loss 7.6465, time 121.26ms
iter 54630: loss 7.5111, time 120.66ms
iter 54640: loss 5.9844, time 121.05ms
iter 54650: loss 7.1455, time 120.73ms
iter 54660: loss 6.8443, time 119.53ms
iter 54670: loss 7.3845, time 121.86ms
iter 54680: loss 7.5018, time 121.97ms
iter 54690: loss 6.7412, time 119.58ms
tensor(0.0089)
iter 54700: loss 6.4361, time 120.06ms
iter 54710: loss 6.9283, time 119.38ms
iter 54720: loss 7.7117, time 122.59ms
iter 54730: loss 7.2127, time 120.84ms
iter 54740: loss 7.8748, time 120.55ms
step 54750: train loss 6.2040, val loss 6.2023
saving checkpoint to out-shakespeare-char
iter 54750: loss 7.1415, time 2894.60ms
iter 54760: loss 7.0583, time 123.25ms
iter 54770: loss 7.1364, time 123.17ms
iter 54780: loss 7.5556, time 126.01ms
iter 54790: loss 6.9420, time 123.13ms
tensor(0.0039)
iter 54800: loss 7.1716, time 123.08ms
iter 54810: loss 6.3507, time 123.35ms
iter 54820: loss 6.7534, time 123.14ms
iter 54830: loss 6.8203, time 122.09ms
iter 54840: loss 7.3982, time 123.61ms
iter 54850: loss 6.8438, time 122.88ms
iter 54860: loss 7.5527, time 125.56ms
iter 54870: loss 6.1671, time 122.99ms
iter 54880: loss 7.3297, time 122.75ms
iter 54890: loss 7.8882, time 122.83ms
tensor(0.0010)
iter 54900: loss 6.6229, time 122.77ms
iter 54910: loss 6.2853, time 122.66ms
iter 54920: loss 7.2318, time 123.12ms
iter 54930: loss 7.8349, time 126.30ms
iter 54940: loss 7.8999, time 123.30ms
iter 54950: loss 7.4747, time 123.46ms
iter 54960: loss 7.6096, time 123.38ms
iter 54970: loss 7.1305, time 123.33ms
iter 54980: loss 7.6754, time 123.99ms
iter 54990: loss 7.2025, time 122.98ms
tensor(0.0010)
step 55000: train loss 6.1918, val loss 6.1649
saving checkpoint to out-shakespeare-char
iter 55000: loss 6.6344, time 2843.79ms
iter 55010: loss 6.6513, time 123.22ms
iter 55020: loss 7.0940, time 126.36ms
iter 55030: loss 6.8999, time 123.13ms
iter 55040: loss 8.6107, time 122.83ms
iter 55050: loss 7.6130, time 123.04ms
iter 55060: loss 7.3774, time 122.49ms
iter 55070: loss 6.8223, time 122.96ms
iter 55080: loss 7.3525, time 122.86ms
iter 55090: loss 7.4883, time 123.37ms
tensor(0.0010)
iter 55100: loss 7.0675, time 123.09ms
iter 55110: loss 7.2186, time 123.04ms
iter 55120: loss 7.1083, time 123.37ms
iter 55130: loss 6.7328, time 122.86ms
iter 55140: loss 7.3961, time 123.14ms
iter 55150: loss 6.9813, time 122.23ms
iter 55160: loss 7.6377, time 122.98ms
iter 55170: loss 6.6409, time 126.20ms
iter 55180: loss 6.1619, time 123.27ms
iter 55190: loss 6.7116, time 122.40ms
tensor(0.0039)
iter 55200: loss 7.5971, time 125.88ms
iter 55210: loss 6.2626, time 123.14ms
iter 55220: loss 7.2643, time 123.09ms
iter 55230: loss 7.0631, time 123.11ms
iter 55240: loss 7.4246, time 123.09ms
step 55250: train loss 6.2023, val loss 6.2286
saving checkpoint to out-shakespeare-char
iter 55250: loss 7.8541, time 2813.97ms
iter 55260: loss 7.3420, time 123.02ms
iter 55270: loss 7.9453, time 123.21ms
iter 55280: loss 7.7643, time 126.01ms
iter 55290: loss 7.1444, time 122.69ms
tensor(0.0089)
iter 55300: loss 7.6139, time 123.16ms
iter 55310: loss 7.3949, time 121.54ms
iter 55320: loss 7.3343, time 122.66ms
iter 55330: loss 7.2159, time 123.14ms
iter 55340: loss 7.0039, time 122.85ms
iter 55350: loss 7.9006, time 126.26ms
iter 55360: loss 7.8670, time 122.94ms
iter 55370: loss 8.1501, time 122.87ms
iter 55380: loss 6.9757, time 123.04ms
iter 55390: loss 6.7664, time 123.33ms
tensor(0.0157)
iter 55400: loss 7.6217, time 123.29ms
iter 55410: loss 7.4087, time 122.65ms
iter 55420: loss 7.3267, time 123.75ms
iter 55430: loss 6.5860, time 123.30ms
iter 55440: loss 8.3316, time 122.78ms
iter 55450: loss 7.2824, time 123.02ms
iter 55460: loss 7.5083, time 122.93ms
iter 55470: loss 7.4296, time 122.80ms
iter 55480: loss 7.5367, time 123.09ms
iter 55490: loss 7.1039, time 125.77ms
tensor(0.0245)
step 55500: train loss 6.2114, val loss 6.2252
saving checkpoint to out-shakespeare-char
iter 55500: loss 6.9829, time 2850.00ms
iter 55510: loss 7.5048, time 123.20ms
iter 55520: loss 7.5358, time 123.11ms
iter 55530: loss 7.6871, time 123.20ms
iter 55540: loss 7.8325, time 121.77ms
iter 55550: loss 6.8066, time 122.77ms
iter 55560: loss 7.0644, time 123.12ms
iter 55570: loss 7.4584, time 122.77ms
iter 55580: loss 7.2382, time 123.28ms
iter 55590: loss 7.9731, time 123.44ms
tensor(0.0351)
iter 55600: loss 6.5740, time 123.19ms
iter 55610: loss 7.1967, time 126.22ms
iter 55620: loss 7.2767, time 123.30ms
iter 55630: loss 6.7905, time 123.05ms
iter 55640: loss 7.4798, time 122.91ms
iter 55650: loss 6.8200, time 123.04ms
iter 55660: loss 7.5746, time 122.95ms
iter 55670: loss 7.1516, time 126.43ms
iter 55680: loss 7.1691, time 122.49ms
iter 55690: loss 6.8706, time 122.87ms
tensor(0.0476)
iter 55700: loss 7.6058, time 123.15ms
iter 55710: loss 7.0710, time 123.38ms
iter 55720: loss 8.3170, time 123.27ms
iter 55730: loss 6.9411, time 122.29ms
iter 55740: loss 7.8460, time 122.67ms
step 55750: train loss 6.1679, val loss 6.2227
saving checkpoint to out-shakespeare-char
iter 55750: loss 7.0991, time 2861.55ms
iter 55760: loss 7.1184, time 123.32ms
iter 55770: loss 7.5069, time 122.91ms
iter 55780: loss 7.6551, time 122.83ms
iter 55790: loss 6.6693, time 123.23ms
tensor(0.0618)
iter 55800: loss 7.4689, time 123.30ms
iter 55810: loss 7.2149, time 125.94ms
iter 55820: loss 6.8846, time 122.97ms
iter 55830: loss 7.6825, time 122.99ms
iter 55840: loss 6.6913, time 122.85ms
iter 55850: loss 7.1624, time 122.91ms
iter 55860: loss 7.1619, time 124.40ms
iter 55870: loss 7.3226, time 126.06ms
iter 55880: loss 7.3849, time 122.92ms
iter 55890: loss 7.0224, time 123.07ms
tensor(0.0778)
iter 55900: loss 7.2502, time 123.13ms
iter 55910: loss 7.3550, time 123.35ms
iter 55920: loss 7.6023, time 122.86ms
iter 55930: loss 7.4701, time 122.87ms
iter 55940: loss 6.3742, time 126.13ms
iter 55950: loss 6.5023, time 123.46ms
iter 55960: loss 7.4361, time 123.05ms
iter 55970: loss 7.3823, time 123.04ms
iter 55980: loss 7.1557, time 123.00ms
iter 55990: loss 8.0626, time 122.98ms
tensor(0.0955)
step 56000: train loss 6.2501, val loss 6.2527
saving checkpoint to out-shakespeare-char
iter 56000: loss 7.4375, time 2838.75ms
iter 56010: loss 6.8971, time 126.10ms
iter 56020: loss 7.1916, time 124.88ms
iter 56030: loss 7.5147, time 122.73ms
iter 56040: loss 7.7052, time 122.87ms
iter 56050: loss 7.0639, time 123.27ms
iter 56060: loss 7.3048, time 123.06ms
iter 56070: loss 7.6586, time 122.64ms
iter 56080: loss 7.2723, time 125.90ms
iter 56090: loss 6.8839, time 121.95ms
tensor(0.1147)
iter 56100: loss 7.3946, time 122.65ms
iter 56110: loss 6.8691, time 123.18ms
iter 56120: loss 6.2994, time 122.79ms
iter 56130: loss 6.8944, time 122.90ms
iter 56140: loss 7.4184, time 125.93ms
iter 56150: loss 7.2927, time 122.80ms
iter 56160: loss 6.8159, time 122.78ms
iter 56170: loss 6.6925, time 122.82ms
iter 56180: loss 7.1953, time 124.16ms
iter 56190: loss 7.1195, time 122.91ms
tensor(0.1355)
iter 56200: loss 7.2691, time 126.03ms
iter 56210: loss 6.8221, time 122.75ms
iter 56220: loss 7.6424, time 123.03ms
iter 56230: loss 7.7339, time 122.71ms
iter 56240: loss 7.2306, time 121.64ms
step 56250: train loss 6.2435, val loss 6.2062
saving checkpoint to out-shakespeare-char
iter 56250: loss 7.8430, time 2859.01ms
iter 56260: loss 7.9033, time 121.98ms
iter 56270: loss 6.4121, time 121.74ms
iter 56280: loss 6.4414, time 122.36ms
iter 56290: loss 6.9924, time 126.16ms
tensor(0.1577)
iter 56300: loss 7.5672, time 123.64ms
iter 56310: loss 7.5508, time 123.64ms
iter 56320: loss 7.3905, time 122.78ms
iter 56330: loss 7.8573, time 123.17ms
iter 56340: loss 7.5729, time 123.06ms
iter 56350: loss 7.0862, time 121.98ms
iter 56360: loss 7.3007, time 123.30ms
iter 56370: loss 6.8383, time 125.89ms
iter 56380: loss 7.5902, time 122.51ms
iter 56390: loss 6.4152, time 122.78ms
tensor(0.1813)
iter 56400: loss 6.4359, time 122.97ms
iter 56410: loss 7.1606, time 122.99ms
iter 56420: loss 7.7391, time 123.57ms
iter 56430: loss 7.5475, time 122.89ms
iter 56440: loss 7.8625, time 122.79ms
iter 56450: loss 6.7347, time 125.62ms
iter 56460: loss 6.6557, time 122.75ms
iter 56470: loss 7.5735, time 122.31ms
iter 56480: loss 6.9168, time 122.45ms
iter 56490: loss 7.5652, time 123.82ms
tensor(0.2061)
step 56500: train loss 6.2722, val loss 6.2013
saving checkpoint to out-shakespeare-char
iter 56500: loss 7.4148, time 2863.67ms
iter 56510: loss 7.5177, time 122.44ms
iter 56520: loss 6.4888, time 121.67ms
iter 56530: loss 7.1278, time 124.48ms
iter 56540: loss 7.2200, time 122.38ms
iter 56550: loss 6.5311, time 122.69ms
iter 56560: loss 6.5405, time 122.29ms
iter 56570: loss 7.1161, time 122.52ms
iter 56580: loss 7.4408, time 122.76ms
iter 56590: loss 7.7908, time 125.39ms
tensor(0.2321)
iter 56600: loss 7.4804, time 122.46ms
iter 56610: loss 7.8734, time 122.50ms
iter 56620: loss 7.6435, time 122.80ms
iter 56630: loss 7.0769, time 124.21ms
iter 56640: loss 6.8680, time 122.42ms
iter 56650: loss 7.8574, time 126.41ms
iter 56660: loss 6.9833, time 124.06ms
iter 56670: loss 6.4333, time 123.89ms
iter 56680: loss 7.4579, time 123.67ms
iter 56690: loss 7.0703, time 122.94ms
tensor(0.2591)
iter 56700: loss 7.7868, time 122.59ms
iter 56710: loss 6.6116, time 122.69ms
iter 56720: loss 7.4368, time 123.92ms
iter 56730: loss 7.8318, time 126.72ms
iter 56740: loss 7.0681, time 122.85ms
step 56750: train loss 6.2316, val loss 6.2088
saving checkpoint to out-shakespeare-char
iter 56750: loss 7.3287, time 2856.66ms
iter 56760: loss 7.5899, time 125.57ms
iter 56770: loss 7.1305, time 122.61ms
iter 56780: loss 6.5469, time 122.76ms
iter 56790: loss 6.6116, time 123.04ms
tensor(0.2871)
iter 56800: loss 7.2583, time 122.99ms
iter 56810: loss 7.3282, time 122.92ms
iter 56820: loss 7.2323, time 122.77ms
iter 56830: loss 7.2279, time 122.62ms
iter 56840: loss 7.5554, time 125.58ms
iter 56850: loss 6.7351, time 123.34ms
iter 56860: loss 6.3416, time 122.90ms
iter 56870: loss 7.4863, time 122.83ms
iter 56880: loss 6.5014, time 122.77ms
iter 56890: loss 7.6744, time 121.94ms
tensor(0.3159)
iter 56900: loss 7.3162, time 125.66ms
iter 56910: loss 7.9149, time 122.82ms
iter 56920: loss 7.8689, time 123.48ms
iter 56930: loss 7.5710, time 122.67ms
iter 56940: loss 7.1749, time 122.61ms
iter 56950: loss 6.7914, time 122.68ms
iter 56960: loss 7.3017, time 122.65ms
iter 56970: loss 7.4991, time 122.84ms
iter 56980: loss 7.6178, time 126.20ms
iter 56990: loss 6.3210, time 122.91ms
tensor(0.3455)
step 57000: train loss 6.2341, val loss 6.3187
saving checkpoint to out-shakespeare-char
iter 57000: loss 7.4291, time 2855.82ms
iter 57010: loss 6.8206, time 123.09ms
iter 57020: loss 8.0590, time 124.11ms
iter 57030: loss 7.5842, time 122.91ms
iter 57040: loss 7.5460, time 122.82ms
iter 57050: loss 6.8924, time 122.69ms
iter 57060: loss 7.9349, time 122.91ms
iter 57070: loss 7.5382, time 122.85ms
iter 57080: loss 7.1575, time 122.95ms
iter 57090: loss 7.1660, time 122.66ms
tensor(0.3757)
iter 57100: loss 7.5214, time 123.44ms
iter 57110: loss 7.0256, time 126.14ms
iter 57120: loss 7.8779, time 122.00ms
iter 57130: loss 7.5274, time 122.78ms
iter 57140: loss 7.0066, time 122.12ms
iter 57150: loss 7.2592, time 122.97ms
iter 57160: loss 7.4609, time 125.83ms
iter 57170: loss 7.9070, time 122.96ms
iter 57180: loss 7.4527, time 122.96ms
iter 57190: loss 7.4754, time 124.68ms
tensor(0.4063)
iter 57200: loss 6.8490, time 126.29ms
iter 57210: loss 7.3310, time 123.37ms
iter 57220: loss 7.6999, time 126.31ms
iter 57230: loss 8.0023, time 124.14ms
iter 57240: loss 6.7832, time 122.08ms
step 57250: train loss 6.2827, val loss 6.2683
saving checkpoint to out-shakespeare-char
iter 57250: loss 7.7246, time 2843.29ms
iter 57260: loss 6.9993, time 120.19ms
iter 57270: loss 8.0788, time 120.46ms
iter 57280: loss 7.4418, time 119.44ms
iter 57290: loss 6.9434, time 119.54ms
tensor(0.4373)
iter 57300: loss 7.3562, time 119.16ms
iter 57310: loss 7.0443, time 120.09ms
iter 57320: loss 7.2600, time 120.43ms
iter 57330: loss 7.6491, time 119.43ms
iter 57340: loss 7.7161, time 120.51ms
iter 57350: loss 6.6930, time 119.21ms
iter 57360: loss 7.6840, time 119.43ms
iter 57370: loss 6.7811, time 119.19ms
iter 57380: loss 7.6270, time 120.63ms
iter 57390: loss 6.8951, time 120.43ms
tensor(0.4686)
iter 57400: loss 7.8202, time 119.26ms
iter 57410: loss 6.4902, time 118.30ms
iter 57420: loss 7.2643, time 122.23ms
iter 57430: loss 7.2576, time 119.28ms
iter 57440: loss 7.3903, time 121.13ms
iter 57450: loss 7.0254, time 119.31ms
iter 57460: loss 7.3798, time 119.24ms
iter 57470: loss 7.6640, time 119.56ms
iter 57480: loss 7.2361, time 119.34ms
iter 57490: loss 7.6945, time 120.40ms
tensor(0.5000)
step 57500: train loss 6.3740, val loss 6.3454
saving checkpoint to out-shakespeare-char
iter 57500: loss 7.8683, time 2859.36ms
iter 57510: loss 7.1156, time 119.17ms
iter 57520: loss 7.3462, time 119.30ms
iter 57530: loss 6.9701, time 119.77ms
iter 57540: loss 7.3505, time 120.15ms
iter 57550: loss 7.2980, time 119.11ms
iter 57560: loss 7.0697, time 119.06ms
iter 57570: loss 6.7928, time 119.34ms
iter 57580: loss 7.4424, time 119.04ms
iter 57590: loss 7.1090, time 120.21ms
tensor(0.5314)
iter 57600: loss 7.7577, time 118.87ms
iter 57610: loss 7.5148, time 120.29ms
iter 57620: loss 7.1786, time 119.14ms
iter 57630: loss 7.8618, time 120.74ms
iter 57640: loss 7.1079, time 119.57ms
iter 57650: loss 7.1134, time 119.96ms
iter 57660: loss 7.2265, time 120.77ms
iter 57670: loss 7.7546, time 119.19ms
iter 57680: loss 7.0207, time 118.34ms
iter 57690: loss 7.5898, time 121.14ms
tensor(0.5627)
iter 57700: loss 7.7935, time 119.54ms
iter 57710: loss 7.4902, time 120.41ms
iter 57720: loss 7.0764, time 119.32ms
iter 57730: loss 7.2081, time 120.75ms
iter 57740: loss 7.0557, time 117.62ms
step 57750: train loss 6.3739, val loss 6.3615
saving checkpoint to out-shakespeare-char
iter 57750: loss 6.6899, time 2868.53ms
iter 57760: loss 6.7943, time 119.29ms
iter 57770: loss 7.2792, time 119.16ms
iter 57780: loss 7.4778, time 120.77ms
iter 57790: loss 6.7018, time 121.16ms
tensor(0.5937)
iter 57800: loss 7.9825, time 122.32ms
iter 57810: loss 7.7905, time 120.56ms
iter 57820: loss 7.9962, time 120.46ms
iter 57830: loss 8.4889, time 121.04ms
iter 57840: loss 7.7470, time 122.64ms
iter 57850: loss 7.1415, time 124.01ms
iter 57860: loss 7.6448, time 122.26ms
iter 57870: loss 7.2629, time 122.51ms
iter 57880: loss 7.3629, time 121.79ms
iter 57890: loss 7.5322, time 122.43ms
tensor(0.6243)
iter 57900: loss 7.0624, time 122.59ms
iter 57910: loss 7.2325, time 122.51ms
iter 57920: loss 7.5349, time 122.45ms
iter 57930: loss 7.8518, time 122.64ms
iter 57940: loss 7.0779, time 122.75ms
iter 57950: loss 7.3874, time 124.78ms
iter 57960: loss 8.3242, time 122.56ms
iter 57970: loss 7.1992, time 122.64ms
iter 57980: loss 7.2630, time 121.56ms
iter 57990: loss 7.9603, time 122.65ms
tensor(0.6545)
step 58000: train loss 6.3740, val loss 6.3592
saving checkpoint to out-shakespeare-char
iter 58000: loss 7.7111, time 2860.05ms
iter 58010: loss 7.4333, time 121.57ms
iter 58020: loss 7.4609, time 122.31ms
iter 58030: loss 7.5936, time 122.50ms
iter 58040: loss 7.2086, time 121.34ms
iter 58050: loss 7.5006, time 121.56ms
iter 58060: loss 7.5160, time 123.06ms
iter 58070: loss 6.3298, time 122.60ms
iter 58080: loss 7.3078, time 121.44ms
iter 58090: loss 7.3357, time 122.05ms
tensor(0.6841)
iter 58100: loss 6.6762, time 123.22ms
iter 58110: loss 7.8654, time 125.01ms
iter 58120: loss 7.1498, time 122.36ms
iter 58130: loss 7.5222, time 122.73ms
iter 58140: loss 7.6554, time 121.20ms
iter 58150: loss 7.6725, time 121.75ms
iter 58160: loss 6.8305, time 123.79ms
iter 58170: loss 6.7441, time 124.07ms
iter 58180: loss 6.8473, time 123.30ms
iter 58190: loss 7.6944, time 122.31ms
tensor(0.7129)
iter 58200: loss 7.1441, time 122.53ms
iter 58210: loss 7.6816, time 122.01ms
iter 58220: loss 6.7253, time 122.92ms
iter 58230: loss 7.6442, time 121.66ms
iter 58240: loss 6.7605, time 123.99ms
step 58250: train loss 6.4220, val loss 6.4323
saving checkpoint to out-shakespeare-char
iter 58250: loss 7.5536, time 2849.12ms
iter 58260: loss 7.1107, time 122.35ms
iter 58270: loss 7.6970, time 121.53ms
iter 58280: loss 8.0815, time 122.39ms
iter 58290: loss 7.8195, time 122.64ms
tensor(0.7409)
iter 58300: loss 6.8477, time 123.04ms
iter 58310: loss 6.9952, time 125.28ms
iter 58320: loss 8.1282, time 122.28ms
iter 58330: loss 7.1880, time 121.11ms
iter 58340: loss 7.1455, time 120.85ms
iter 58350: loss 7.4590, time 123.50ms
iter 58360: loss 6.9655, time 123.14ms
iter 58370: loss 7.6556, time 123.29ms
iter 58380: loss 7.1991, time 123.81ms
iter 58390: loss 7.5895, time 124.09ms
tensor(0.7679)
iter 58400: loss 7.4132, time 121.92ms
iter 58410: loss 7.4806, time 122.60ms
iter 58420: loss 8.0758, time 125.50ms
iter 58430: loss 7.2400, time 122.48ms
iter 58440: loss 6.8799, time 122.87ms
iter 58450: loss 7.1160, time 121.52ms
iter 58460: loss 6.9758, time 122.47ms
iter 58470: loss 7.3103, time 122.38ms
iter 58480: loss 7.1443, time 121.43ms
iter 58490: loss 6.7970, time 122.48ms
tensor(0.7939)
step 58500: train loss 6.4450, val loss 6.4064
saving checkpoint to out-shakespeare-char
iter 58500: loss 6.8753, time 2857.60ms
iter 58510: loss 6.6545, time 122.36ms
iter 58520: loss 7.6339, time 122.49ms
iter 58530: loss 7.7661, time 122.58ms
iter 58540: loss 7.4904, time 122.35ms
iter 58550: loss 7.6629, time 125.35ms
iter 58560: loss 7.6816, time 122.27ms
iter 58570: loss 7.8285, time 122.45ms
iter 58580: loss 7.6303, time 122.53ms
iter 58590: loss 7.4581, time 122.57ms
tensor(0.8187)
iter 58600: loss 6.9445, time 122.01ms
iter 58610: loss 7.7452, time 120.99ms
iter 58620: loss 7.8115, time 122.46ms
iter 58630: loss 6.9874, time 121.95ms
iter 58640: loss 8.0878, time 122.43ms
iter 58650: loss 7.1132, time 122.01ms
iter 58660: loss 6.8408, time 122.42ms
iter 58670: loss 7.5320, time 125.49ms
iter 58680: loss 7.5884, time 122.07ms
iter 58690: loss 7.6206, time 122.23ms
tensor(0.8423)
iter 58700: loss 7.5547, time 122.32ms
iter 58710: loss 6.8866, time 123.58ms
iter 58720: loss 7.1265, time 122.12ms
iter 58730: loss 7.6518, time 125.37ms
iter 58740: loss 7.4817, time 122.34ms
step 58750: train loss 6.4139, val loss 6.5031
saving checkpoint to out-shakespeare-char
iter 58750: loss 8.1658, time 2844.76ms
iter 58760: loss 8.1351, time 122.50ms
iter 58770: loss 7.6678, time 121.47ms
iter 58780: loss 7.3611, time 123.38ms
iter 58790: loss 7.8172, time 122.07ms
tensor(0.8645)
iter 58800: loss 7.6244, time 127.64ms
iter 58810: loss 7.7086, time 125.36ms
iter 58820: loss 7.7118, time 122.67ms
iter 58830: loss 7.3343, time 120.68ms
iter 58840: loss 7.3758, time 121.27ms
iter 58850: loss 7.2342, time 123.26ms
iter 58860: loss 7.4639, time 123.50ms
iter 58870: loss 7.1748, time 125.40ms
iter 58880: loss 6.3909, time 123.24ms
iter 58890: loss 7.4778, time 123.35ms
tensor(0.8853)
iter 58900: loss 7.9781, time 123.21ms
iter 58910: loss 8.4565, time 122.36ms
iter 58920: loss 7.2449, time 123.12ms
iter 58930: loss 7.5507, time 123.00ms
iter 58940: loss 7.9713, time 124.42ms
iter 58950: loss 7.4030, time 122.90ms
iter 58960: loss 7.2347, time 123.22ms
iter 58970: loss 7.3777, time 122.61ms
iter 58980: loss 7.7767, time 122.47ms
iter 58990: loss 7.5473, time 125.60ms
tensor(0.9045)
step 59000: train loss 6.4830, val loss 6.4226
saving checkpoint to out-shakespeare-char
iter 59000: loss 8.2234, time 2857.77ms
iter 59010: loss 7.8038, time 125.30ms
iter 59020: loss 7.6278, time 123.26ms
iter 59030: loss 7.2458, time 123.92ms
iter 59040: loss 6.6319, time 123.32ms
iter 59050: loss 7.4297, time 122.68ms
iter 59060: loss 6.6755, time 122.16ms
iter 59070: loss 6.9527, time 125.10ms
iter 59080: loss 8.0166, time 123.31ms
iter 59090: loss 7.0007, time 123.11ms
tensor(0.9222)
iter 59100: loss 7.1567, time 123.32ms
iter 59110: loss 7.5930, time 123.30ms
iter 59120: loss 7.0744, time 122.37ms
iter 59130: loss 7.0351, time 126.00ms
iter 59140: loss 7.1034, time 122.14ms
iter 59150: loss 8.1598, time 122.27ms
iter 59160: loss 7.6123, time 123.11ms
iter 59170: loss 7.2430, time 123.30ms
iter 59180: loss 7.1634, time 123.33ms
iter 59190: loss 6.9755, time 123.13ms
tensor(0.9382)
iter 59200: loss 7.5152, time 126.05ms
iter 59210: loss 7.0469, time 122.96ms
iter 59220: loss 8.0012, time 122.58ms
iter 59230: loss 7.8918, time 123.18ms
iter 59240: loss 7.4636, time 122.52ms
step 59250: train loss 6.5148, val loss 6.4599
saving checkpoint to out-shakespeare-char
iter 59250: loss 6.7588, time 2864.91ms
iter 59260: loss 7.6218, time 119.49ms
iter 59270: loss 7.7155, time 119.63ms
iter 59280: loss 6.8534, time 119.70ms
iter 59290: loss 6.4005, time 120.69ms
tensor(0.9524)
iter 59300: loss 7.3002, time 120.71ms
iter 59310: loss 6.6845, time 120.91ms
iter 59320: loss 8.1194, time 121.09ms
iter 59330: loss 6.9994, time 119.81ms
iter 59340: loss 7.6789, time 120.45ms
iter 59350: loss 7.3172, time 120.92ms
iter 59360: loss 7.4396, time 119.45ms
iter 59370: loss 6.1358, time 119.63ms
iter 59380: loss 7.0804, time 120.67ms
iter 59390: loss 7.7560, time 120.71ms
tensor(0.9649)
iter 59400: loss 7.9090, time 119.63ms
iter 59410: loss 7.4186, time 120.55ms
iter 59420: loss 7.7550, time 120.95ms
iter 59430: loss 7.1187, time 119.21ms
iter 59440: loss 7.7834, time 118.47ms
iter 59450: loss 6.7401, time 120.14ms
iter 59460: loss 8.1128, time 121.10ms
iter 59470: loss 6.8082, time 119.45ms
iter 59480: loss 7.5877, time 119.09ms
iter 59490: loss 7.0503, time 120.51ms
tensor(0.9755)
step 59500: train loss 6.4496, val loss 6.4959
saving checkpoint to out-shakespeare-char
iter 59500: loss 6.8983, time 2881.89ms
iter 59510: loss 7.8035, time 120.57ms
iter 59520: loss 6.5456, time 120.78ms
iter 59530: loss 7.2006, time 120.73ms
iter 59540: loss 7.8104, time 119.67ms
iter 59550: loss 7.5188, time 120.67ms
iter 59560: loss 7.0447, time 122.61ms
iter 59570: loss 7.8496, time 119.60ms
iter 59580: loss 7.9298, time 121.04ms
iter 59590: loss 7.2132, time 119.41ms
tensor(0.9843)
iter 59600: loss 7.6736, time 121.24ms
iter 59610: loss 7.5921, time 119.88ms
iter 59620: loss 7.5970, time 119.39ms
iter 59630: loss 7.3297, time 120.61ms
iter 59640: loss 7.3021, time 120.71ms
iter 59650: loss 7.1578, time 120.88ms
iter 59660: loss 7.6511, time 121.09ms
iter 59670: loss 7.6131, time 118.83ms
iter 59680: loss 7.7240, time 119.88ms
iter 59690: loss 7.2240, time 120.30ms
tensor(0.9911)
iter 59700: loss 7.4404, time 119.42ms
iter 59710: loss 7.7688, time 120.20ms
iter 59720: loss 7.3170, time 119.18ms
iter 59730: loss 7.2129, time 119.08ms
iter 59740: loss 7.7708, time 118.71ms
step 59750: train loss 6.4590, val loss 6.4705
saving checkpoint to out-shakespeare-char
iter 59750: loss 7.5255, time 2868.61ms
iter 59760: loss 6.9191, time 119.45ms
iter 59770: loss 7.5639, time 120.55ms
iter 59780: loss 8.0228, time 119.87ms
iter 59790: loss 7.2932, time 118.85ms
tensor(0.9961)
iter 59800: loss 7.1653, time 119.37ms
iter 59810: loss 7.4919, time 120.19ms
iter 59820: loss 7.5881, time 119.00ms
iter 59830: loss 7.3443, time 120.28ms
iter 59840: loss 7.6356, time 120.72ms
iter 59850: loss 7.2884, time 119.36ms
iter 59860: loss 7.4236, time 121.76ms
iter 59870: loss 6.4191, time 119.06ms
iter 59880: loss 7.8020, time 118.94ms
iter 59890: loss 8.1671, time 119.53ms
tensor(0.9990)
iter 59900: loss 7.7436, time 119.06ms
iter 59910: loss 7.8384, time 124.49ms
iter 59920: loss 7.5414, time 122.91ms
iter 59930: loss 7.2877, time 125.95ms
iter 59940: loss 6.5954, time 122.57ms
iter 59950: loss 7.9644, time 121.83ms
iter 59960: loss 7.0904, time 122.66ms
iter 59970: loss 7.3784, time 122.86ms
iter 59980: loss 6.7273, time 125.07ms
iter 59990: loss 7.6171, time 122.67ms
tensor(1.)
step 60000: train loss 6.4396, val loss 6.5022
saving checkpoint to out-shakespeare-char
iter 60000: loss 7.7974, time 2863.18ms
iter 60010: loss 6.7776, time 119.12ms
iter 60020: loss 7.3945, time 120.24ms
iter 60030: loss 6.8762, time 120.62ms
iter 60040: loss 7.6155, time 118.66ms
iter 60050: loss 7.5846, time 120.37ms
iter 60060: loss 6.4628, time 121.82ms
iter 60070: loss 8.0687, time 121.23ms
iter 60080: loss 6.7918, time 121.00ms
iter 60090: loss 8.1314, time 120.81ms
tensor(0.9990)
iter 60100: loss 7.3187, time 118.76ms
iter 60110: loss 7.1763, time 119.69ms
iter 60120: loss 7.7986, time 119.22ms
iter 60130: loss 7.5351, time 119.43ms
iter 60140: loss 7.3806, time 120.79ms
iter 60150: loss 7.4703, time 119.78ms
iter 60160: loss 6.9038, time 119.72ms
iter 60170: loss 7.0092, time 119.83ms
iter 60180: loss 7.6750, time 119.58ms
iter 60190: loss 7.7380, time 119.97ms
tensor(0.9961)
iter 60200: loss 7.7933, time 121.65ms
iter 60210: loss 6.9948, time 123.16ms
iter 60220: loss 7.5299, time 120.71ms
iter 60230: loss 7.6807, time 119.55ms
iter 60240: loss 7.5593, time 121.13ms
step 60250: train loss 6.4485, val loss 6.4970
saving checkpoint to out-shakespeare-char
iter 60250: loss 7.7653, time 2874.74ms
iter 60260: loss 7.6075, time 118.83ms
iter 60270: loss 7.5885, time 118.31ms
iter 60280: loss 6.7053, time 119.93ms
iter 60290: loss 6.9662, time 121.64ms
tensor(0.9911)
iter 60300: loss 7.1267, time 119.44ms
iter 60310: loss 7.5523, time 117.83ms
iter 60320: loss 7.5843, time 120.91ms
iter 60330: loss 7.0525, time 118.96ms
iter 60340: loss 6.8513, time 120.77ms
iter 60350: loss 6.7056, time 122.31ms
iter 60360: loss 8.0758, time 120.81ms
iter 60370: loss 7.3003, time 120.59ms
iter 60380: loss 7.0827, time 122.79ms
iter 60390: loss 7.4946, time 120.21ms
tensor(0.9843)
iter 60400: loss 7.2129, time 119.54ms
iter 60410: loss 6.8285, time 122.97ms
iter 60420: loss 6.7725, time 122.78ms
iter 60430: loss 7.8230, time 123.13ms
iter 60440: loss 7.3697, time 125.40ms
iter 60450: loss 7.6312, time 122.30ms
iter 60460: loss 7.5145, time 121.74ms
iter 60470: loss 7.1645, time 123.10ms
iter 60480: loss 7.4124, time 122.80ms
iter 60490: loss 7.1574, time 123.13ms
tensor(0.9755)
step 60500: train loss 6.3679, val loss 6.5013
saving checkpoint to out-shakespeare-char
iter 60500: loss 8.0308, time 2878.11ms
iter 60510: loss 7.2005, time 122.79ms
iter 60520: loss 7.6302, time 122.72ms
iter 60530: loss 7.2894, time 123.32ms
iter 60540: loss 7.4881, time 122.82ms
iter 60550: loss 6.9257, time 123.05ms
iter 60560: loss 6.6234, time 122.74ms
iter 60570: loss 7.6411, time 122.47ms
iter 60580: loss 6.7320, time 125.68ms
iter 60590: loss 7.3244, time 122.61ms
tensor(0.9649)
iter 60600: loss 7.3527, time 122.86ms
iter 60610: loss 7.0369, time 122.91ms
iter 60620: loss 6.9570, time 123.33ms
iter 60630: loss 7.1445, time 121.62ms
iter 60640: loss 6.9069, time 125.23ms
iter 60650: loss 7.3046, time 122.25ms
iter 60660: loss 6.7512, time 122.51ms
iter 60670: loss 6.9443, time 123.17ms
iter 60680: loss 6.7920, time 122.96ms
iter 60690: loss 6.8310, time 122.68ms
tensor(0.9524)
iter 60700: loss 7.5145, time 124.35ms
iter 60710: loss 7.1961, time 125.96ms
iter 60720: loss 7.1219, time 122.08ms
iter 60730: loss 7.8687, time 122.82ms
iter 60740: loss 6.9889, time 122.57ms
step 60750: train loss 6.3557, val loss 6.4205
saving checkpoint to out-shakespeare-char
iter 60750: loss 7.4242, time 2864.12ms
iter 60760: loss 6.9831, time 122.81ms
iter 60770: loss 7.3669, time 122.90ms
iter 60780: loss 6.9309, time 122.52ms
iter 60790: loss 7.8890, time 122.74ms
tensor(0.9382)
iter 60800: loss 7.5003, time 122.85ms
iter 60810: loss 7.6293, time 125.90ms
iter 60820: loss 6.6778, time 122.68ms
iter 60830: loss 7.3788, time 122.86ms
iter 60840: loss 7.5484, time 123.53ms
iter 60850: loss 7.2962, time 122.94ms
iter 60860: loss 7.4026, time 120.63ms
iter 60870: loss 7.3949, time 126.08ms
iter 60880: loss 7.3593, time 122.74ms
iter 60890: loss 7.2321, time 122.77ms
tensor(0.9222)
iter 60900: loss 7.1769, time 122.32ms
iter 60910: loss 7.1789, time 122.70ms
iter 60920: loss 7.7508, time 122.81ms
iter 60930: loss 6.7094, time 122.87ms
iter 60940: loss 7.3833, time 126.04ms
iter 60950: loss 6.7512, time 122.87ms
iter 60960: loss 7.8420, time 122.99ms
iter 60970: loss 7.7326, time 123.51ms
iter 60980: loss 7.5323, time 122.92ms
iter 60990: loss 6.8356, time 122.39ms
tensor(0.9045)
step 61000: train loss 6.4092, val loss 6.3890
saving checkpoint to out-shakespeare-char
iter 61000: loss 6.8512, time 2851.80ms
iter 61010: loss 7.5321, time 125.78ms
iter 61020: loss 7.8030, time 122.77ms
iter 61030: loss 7.2571, time 122.96ms
iter 61040: loss 7.4161, time 124.42ms
iter 61050: loss 6.5894, time 123.19ms
iter 61060: loss 6.9398, time 123.67ms
iter 61070: loss 6.7547, time 123.04ms
iter 61080: loss 7.1261, time 125.91ms
iter 61090: loss 6.9304, time 124.50ms
tensor(0.8853)
iter 61100: loss 6.8842, time 123.46ms
iter 61110: loss 7.1875, time 123.62ms
iter 61120: loss 7.0061, time 123.27ms
iter 61130: loss 6.7829, time 122.38ms
iter 61140: loss 7.0377, time 126.47ms
iter 61150: loss 8.1431, time 122.28ms
iter 61160: loss 7.9458, time 123.23ms
iter 61170: loss 6.9403, time 123.71ms
iter 61180: loss 6.5690, time 123.22ms
iter 61190: loss 6.8273, time 123.26ms
tensor(0.8645)
iter 61200: loss 8.0911, time 124.46ms
iter 61210: loss 7.0193, time 123.45ms
iter 61220: loss 6.8957, time 123.35ms
iter 61230: loss 8.0224, time 123.35ms
iter 61240: loss 7.1215, time 123.84ms
step 61250: train loss 6.4228, val loss 6.4475
saving checkpoint to out-shakespeare-char
iter 61250: loss 7.1186, time 2853.62ms
iter 61260: loss 7.1451, time 126.66ms
iter 61270: loss 7.4042, time 123.09ms
iter 61280: loss 6.7914, time 123.02ms
iter 61290: loss 7.2091, time 123.41ms
tensor(0.8423)
iter 61300: loss 7.1684, time 123.10ms
iter 61310: loss 7.6011, time 123.25ms
iter 61320: loss 8.0737, time 122.35ms
iter 61330: loss 7.0651, time 122.05ms
iter 61340: loss 7.5532, time 123.25ms
iter 61350: loss 8.1248, time 123.25ms
iter 61360: loss 7.6522, time 123.20ms
iter 61370: loss 7.7902, time 123.41ms
iter 61380: loss 7.1042, time 123.29ms
iter 61390: loss 6.8413, time 126.74ms
tensor(0.8187)
iter 61400: loss 6.8786, time 123.11ms
iter 61410: loss 6.6232, time 123.29ms
iter 61420: loss 7.3158, time 122.53ms
iter 61430: loss 7.9348, time 123.45ms
iter 61440: loss 7.6316, time 123.63ms
iter 61450: loss 7.8000, time 123.41ms
iter 61460: loss 7.5851, time 123.44ms
iter 61470: loss 7.7007, time 125.93ms
iter 61480: loss 7.6020, time 123.22ms
iter 61490: loss 7.5137, time 123.33ms
tensor(0.7939)
step 61500: train loss 6.3508, val loss 6.3408
saving checkpoint to out-shakespeare-char
iter 61500: loss 7.6729, time 2865.00ms
iter 61510: loss 7.3071, time 120.89ms
iter 61520: loss 6.8912, time 119.02ms
iter 61530: loss 7.4781, time 119.04ms
iter 61540: loss 7.4801, time 120.06ms
iter 61550: loss 6.9540, time 119.19ms
iter 61560: loss 7.5757, time 120.06ms
iter 61570: loss 6.8489, time 118.64ms
iter 61580: loss 7.7702, time 120.80ms
iter 61590: loss 7.8780, time 118.28ms
tensor(0.7679)
iter 61600: loss 6.5112, time 119.95ms
iter 61610: loss 6.5486, time 120.20ms
iter 61620: loss 6.9460, time 120.16ms
iter 61630: loss 8.0652, time 120.59ms
iter 61640: loss 7.1715, time 119.27ms
iter 61650: loss 6.9650, time 120.69ms
iter 61660: loss 6.7355, time 120.59ms
iter 61670: loss 8.3980, time 119.00ms
iter 61680: loss 7.3245, time 119.56ms
iter 61690: loss 7.4381, time 120.59ms
tensor(0.7409)
iter 61700: loss 7.0023, time 118.33ms
iter 61710: loss 7.5651, time 119.40ms
iter 61720: loss 7.1987, time 120.60ms
iter 61730: loss 6.8542, time 120.59ms
iter 61740: loss 7.4085, time 120.69ms
step 61750: train loss 6.4461, val loss 6.3745
saving checkpoint to out-shakespeare-char
iter 61750: loss 7.6488, time 2861.15ms
iter 61760: loss 7.2101, time 119.36ms
iter 61770: loss 6.7752, time 120.61ms
iter 61780: loss 6.6886, time 118.84ms
iter 61790: loss 6.8743, time 120.03ms
tensor(0.7129)
iter 61800: loss 7.5966, time 120.18ms
iter 61810: loss 7.6490, time 122.06ms
iter 61820: loss 7.4808, time 118.84ms
iter 61830: loss 7.5547, time 118.82ms
iter 61840: loss 7.0177, time 119.07ms
iter 61850: loss 6.3153, time 119.38ms
iter 61860: loss 6.4207, time 120.77ms
iter 61870: loss 6.9039, time 120.49ms
iter 61880: loss 7.2506, time 120.32ms
iter 61890: loss 6.7027, time 119.17ms
tensor(0.6841)
iter 61900: loss 7.3355, time 120.52ms
iter 61910: loss 7.0630, time 119.52ms
iter 61920: loss 7.0730, time 119.07ms
iter 61930: loss 7.4987, time 119.20ms
iter 61940: loss 7.3835, time 120.25ms
iter 61950: loss 7.3485, time 120.54ms
iter 61960: loss 7.5426, time 121.28ms
iter 61970: loss 6.8768, time 120.96ms
iter 61980: loss 6.2544, time 120.69ms
iter 61990: loss 8.1961, time 119.52ms
tensor(0.6545)
step 62000: train loss 6.3905, val loss 6.3002
saving checkpoint to out-shakespeare-char
iter 62000: loss 6.6300, time 2877.32ms
iter 62010: loss 7.5102, time 119.14ms
iter 62020: loss 7.0750, time 119.28ms
iter 62030: loss 6.6529, time 120.66ms
iter 62040: loss 7.8052, time 118.98ms
iter 62050: loss 7.3703, time 120.54ms
iter 62060: loss 8.0614, time 120.89ms
iter 62070: loss 6.7454, time 119.93ms
iter 62080: loss 6.7510, time 119.58ms
iter 62090: loss 7.2989, time 119.45ms
tensor(0.6243)
iter 62100: loss 6.9571, time 120.69ms
iter 62110: loss 7.1979, time 119.51ms
iter 62120: loss 7.4942, time 122.45ms
iter 62130: loss 7.1620, time 120.87ms
iter 62140: loss 7.9424, time 121.47ms
iter 62150: loss 7.6086, time 119.27ms
iter 62160: loss 6.9866, time 119.98ms
iter 62170: loss 7.6051, time 120.48ms
iter 62180: loss 7.0695, time 119.45ms
iter 62190: loss 7.7569, time 119.43ms
tensor(0.5937)
iter 62200: loss 7.1427, time 120.51ms
iter 62210: loss 7.2060, time 120.35ms
iter 62220: loss 6.2687, time 119.80ms
iter 62230: loss 6.7686, time 119.20ms
iter 62240: loss 7.1889, time 118.67ms
step 62250: train loss 6.3057, val loss 6.3510
saving checkpoint to out-shakespeare-char
iter 62250: loss 7.2945, time 2867.37ms
iter 62260: loss 7.5060, time 119.40ms
iter 62270: loss 8.1508, time 121.74ms
iter 62280: loss 6.7950, time 120.55ms
iter 62290: loss 7.2070, time 122.89ms
tensor(0.5627)
iter 62300: loss 7.3919, time 122.10ms
iter 62310: loss 6.9991, time 119.55ms
iter 62320: loss 6.6995, time 120.50ms
iter 62330: loss 6.9852, time 119.40ms
iter 62340: loss 7.2889, time 119.26ms
iter 62350: loss 7.0200, time 120.97ms
iter 62360: loss 5.9834, time 119.37ms
iter 62370: loss 6.3757, time 119.52ms
iter 62380: loss 7.2636, time 119.22ms
iter 62390: loss 7.7404, time 120.73ms
tensor(0.5314)
iter 62400: loss 6.4282, time 119.66ms
iter 62410: loss 7.8095, time 121.07ms
iter 62420: loss 6.5147, time 122.42ms
iter 62430: loss 7.6271, time 122.11ms
iter 62440: loss 7.6062, time 121.04ms
iter 62450: loss 7.3930, time 119.64ms
iter 62460: loss 7.2003, time 121.08ms
iter 62470: loss 6.8200, time 120.62ms
iter 62480: loss 6.9071, time 122.66ms
iter 62490: loss 6.7679, time 119.17ms
tensor(0.5000)
step 62500: train loss 6.2545, val loss 6.2363
saving checkpoint to out-shakespeare-char
iter 62500: loss 6.9542, time 2879.11ms
iter 62510: loss 7.5092, time 120.05ms
iter 62520: loss 7.0890, time 119.49ms
iter 62530: loss 7.3058, time 118.94ms
iter 62540: loss 7.5066, time 121.78ms
iter 62550: loss 6.5981, time 120.76ms
iter 62560: loss 7.4971, time 119.78ms
iter 62570: loss 7.0515, time 119.24ms
iter 62580: loss 7.0833, time 118.91ms
iter 62590: loss 6.6477, time 119.77ms
tensor(0.4686)
iter 62600: loss 7.1614, time 119.50ms
iter 62610: loss 7.3319, time 119.84ms
iter 62620: loss 7.8188, time 118.99ms
iter 62630: loss 8.1455, time 118.71ms
iter 62640: loss 7.3827, time 120.46ms
iter 62650: loss 6.5554, time 118.25ms
iter 62660: loss 7.3266, time 119.57ms
iter 62670: loss 7.2514, time 120.07ms
iter 62680: loss 7.1295, time 119.00ms
iter 62690: loss 6.8512, time 118.85ms
tensor(0.4373)
iter 62700: loss 7.6359, time 122.20ms
iter 62710: loss 7.3099, time 118.60ms
iter 62720: loss 7.1174, time 118.75ms
iter 62730: loss 7.7772, time 119.73ms
iter 62740: loss 6.8873, time 118.99ms
step 62750: train loss 6.3241, val loss 6.3272
saving checkpoint to out-shakespeare-char
iter 62750: loss 7.0028, time 2860.80ms
iter 62760: loss 7.2869, time 119.66ms
iter 62770: loss 7.1994, time 119.58ms
iter 62780: loss 7.1624, time 119.53ms
iter 62790: loss 6.6558, time 119.64ms
tensor(0.4063)
iter 62800: loss 7.2546, time 119.70ms
iter 62810: loss 6.8314, time 121.13ms
iter 62820: loss 6.8449, time 120.63ms
iter 62830: loss 7.0993, time 120.74ms
iter 62840: loss 7.7489, time 120.61ms
iter 62850: loss 7.0204, time 119.49ms
iter 62860: loss 6.7174, time 122.22ms
iter 62870: loss 6.8061, time 119.98ms
iter 62880: loss 7.0223, time 120.07ms
iter 62890: loss 7.0908, time 119.43ms
tensor(0.3757)
iter 62900: loss 7.0619, time 119.69ms
iter 62910: loss 6.7499, time 119.78ms
iter 62920: loss 7.5467, time 120.66ms
iter 62930: loss 7.0556, time 119.16ms
iter 62940: loss 7.7473, time 119.27ms
iter 62950: loss 7.5728, time 121.12ms
iter 62960: loss 6.7855, time 119.34ms
iter 62970: loss 7.5173, time 119.21ms
iter 62980: loss 7.5109, time 119.24ms
iter 62990: loss 6.7525, time 119.58ms
tensor(0.3455)
step 63000: train loss 6.2947, val loss 6.2237
saving checkpoint to out-shakespeare-char
iter 63000: loss 7.0407, time 2861.46ms
iter 63010: loss 6.7259, time 120.58ms
iter 63020: loss 6.6595, time 119.47ms
iter 63030: loss 6.7508, time 119.65ms
iter 63040: loss 7.0575, time 120.26ms
iter 63050: loss 7.5556, time 118.45ms
iter 63060: loss 6.9034, time 119.31ms
iter 63070: loss 7.5166, time 119.47ms
iter 63080: loss 6.9601, time 119.04ms
iter 63090: loss 6.9119, time 118.48ms
tensor(0.3159)
iter 63100: loss 7.1642, time 118.75ms
iter 63110: loss 7.2653, time 118.80ms
iter 63120: loss 7.2202, time 119.83ms
iter 63130: loss 6.8321, time 119.31ms
iter 63140: loss 7.4151, time 119.14ms
iter 63150: loss 6.8285, time 120.82ms
iter 63160: loss 6.8879, time 120.13ms
iter 63170: loss 7.0523, time 119.63ms
iter 63180: loss 7.4601, time 119.07ms
iter 63190: loss 6.7463, time 119.18ms
tensor(0.2871)
iter 63200: loss 6.9692, time 120.45ms
iter 63210: loss 6.7163, time 120.82ms
iter 63220: loss 7.6356, time 121.68ms
iter 63230: loss 6.6010, time 121.13ms
iter 63240: loss 6.8269, time 120.73ms
step 63250: train loss 6.1812, val loss 6.1893
saving checkpoint to out-shakespeare-char
iter 63250: loss 7.4684, time 2863.95ms
iter 63260: loss 6.2743, time 123.11ms
iter 63270: loss 7.5041, time 123.10ms
iter 63280: loss 7.4100, time 123.40ms
iter 63290: loss 7.1484, time 122.23ms
tensor(0.2591)
iter 63300: loss 6.7780, time 123.04ms
iter 63310: loss 7.0032, time 126.80ms
iter 63320: loss 7.0747, time 122.55ms
iter 63330: loss 6.7768, time 123.68ms
iter 63340: loss 6.3980, time 123.10ms
iter 63350: loss 7.1550, time 123.56ms
iter 63360: loss 7.0337, time 123.22ms
iter 63370: loss 6.7200, time 124.65ms
iter 63380: loss 7.0428, time 126.06ms
iter 63390: loss 7.3139, time 126.37ms
tensor(0.2321)
iter 63400: loss 6.6955, time 123.53ms
iter 63410: loss 6.7926, time 122.29ms
iter 63420: loss 6.7258, time 122.09ms
iter 63430: loss 7.4935, time 123.36ms
iter 63440: loss 6.3354, time 123.21ms
iter 63450: loss 7.1349, time 123.30ms
iter 63460: loss 7.2655, time 122.91ms
iter 63470: loss 7.3839, time 124.09ms
iter 63480: loss 6.8552, time 122.84ms
iter 63490: loss 7.1615, time 122.88ms
tensor(0.2061)
step 63500: train loss 6.1913, val loss 6.1739
saving checkpoint to out-shakespeare-char
iter 63500: loss 6.8283, time 2852.41ms
iter 63510: loss 6.5620, time 123.30ms
iter 63520: loss 7.1361, time 123.34ms
iter 63530: loss 6.9249, time 123.34ms
iter 63540: loss 7.5546, time 123.15ms
iter 63550: loss 6.5180, time 125.95ms
iter 63560: loss 6.8117, time 122.92ms
iter 63570: loss 7.0549, time 123.29ms
iter 63580: loss 6.1035, time 123.28ms
iter 63590: loss 7.2364, time 123.51ms
tensor(0.1813)
iter 63600: loss 7.7737, time 123.15ms
iter 63610: loss 6.5277, time 122.29ms
iter 63620: loss 7.2714, time 123.19ms
iter 63630: loss 7.4789, time 122.96ms
iter 63640: loss 7.8758, time 123.26ms
iter 63650: loss 6.9591, time 126.55ms
iter 63660: loss 6.7834, time 123.63ms
iter 63670: loss 6.6274, time 123.42ms
iter 63680: loss 6.5507, time 123.39ms
iter 63690: loss 7.4887, time 123.37ms
tensor(0.1577)
iter 63700: loss 6.7937, time 123.20ms
iter 63710: loss 6.5485, time 123.51ms
iter 63720: loss 6.7090, time 123.15ms
iter 63730: loss 7.4155, time 123.17ms
iter 63740: loss 7.1757, time 124.93ms
step 63750: train loss 6.2206, val loss 6.1548
saving checkpoint to out-shakespeare-char
iter 63750: loss 7.1360, time 2861.26ms
iter 63760: loss 6.2435, time 126.44ms
iter 63770: loss 6.9388, time 123.14ms
iter 63780: loss 6.5624, time 123.20ms
iter 63790: loss 8.0014, time 123.20ms
tensor(0.1355)
iter 63800: loss 7.2837, time 123.30ms
iter 63810: loss 6.4836, time 123.27ms
iter 63820: loss 6.4287, time 123.18ms
iter 63830: loss 6.8780, time 125.99ms
iter 63840: loss 7.0612, time 124.66ms
iter 63850: loss 6.8479, time 123.18ms
iter 63860: loss 6.5107, time 123.11ms
iter 63870: loss 7.1993, time 123.40ms
iter 63880: loss 6.9027, time 122.49ms
iter 63890: loss 7.0725, time 126.08ms
tensor(0.1147)
iter 63900: loss 6.8160, time 123.29ms
iter 63910: loss 7.1587, time 123.14ms
iter 63920: loss 7.4561, time 123.18ms
iter 63930: loss 6.5233, time 123.16ms
iter 63940: loss 6.3997, time 123.16ms
iter 63950: loss 6.7115, time 125.13ms
iter 63960: loss 7.4040, time 123.19ms
iter 63970: loss 6.0023, time 123.38ms
iter 63980: loss 6.9340, time 123.23ms
iter 63990: loss 6.7454, time 123.29ms
tensor(0.0955)
step 64000: train loss 6.1877, val loss 6.1387
saving checkpoint to out-shakespeare-char
iter 64000: loss 7.8181, time 2877.74ms
iter 64010: loss 6.6376, time 123.26ms
iter 64020: loss 6.2680, time 123.25ms
iter 64030: loss 7.0214, time 126.10ms
iter 64040: loss 6.5877, time 123.41ms
iter 64050: loss 6.6193, time 123.94ms
iter 64060: loss 6.5621, time 123.51ms
iter 64070: loss 6.5642, time 123.27ms
iter 64080: loss 6.1015, time 123.28ms
iter 64090: loss 6.8476, time 123.28ms
tensor(0.0778)
iter 64100: loss 6.2894, time 123.08ms
iter 64110: loss 7.2607, time 125.90ms
iter 64120: loss 6.5890, time 124.08ms
iter 64130: loss 8.2570, time 122.44ms
iter 64140: loss 6.9026, time 123.19ms
iter 64150: loss 7.4738, time 122.31ms
iter 64160: loss 7.0918, time 123.44ms
iter 64170: loss 7.0067, time 124.43ms
iter 64180: loss 6.4492, time 123.15ms
iter 64190: loss 6.6330, time 126.12ms
tensor(0.0618)
iter 64200: loss 6.1210, time 123.09ms
iter 64210: loss 6.4495, time 123.34ms
iter 64220: loss 6.9322, time 123.20ms
iter 64230: loss 7.1355, time 123.41ms
iter 64240: loss 6.7285, time 123.62ms
step 64250: train loss 6.1448, val loss 6.1401
saving checkpoint to out-shakespeare-char
iter 64250: loss 7.4623, time 2863.53ms
iter 64260: loss 6.5954, time 123.31ms
iter 64270: loss 5.9720, time 123.64ms
iter 64280: loss 6.2968, time 123.22ms
iter 64290: loss 7.5220, time 123.14ms
tensor(0.0476)
iter 64300: loss 6.6551, time 125.78ms
iter 64310: loss 7.4475, time 123.39ms
iter 64320: loss 6.8187, time 124.73ms
iter 64330: loss 6.6431, time 123.26ms
iter 64340: loss 7.4780, time 123.35ms
iter 64350: loss 6.6376, time 123.27ms
iter 64360: loss 6.3117, time 127.61ms
iter 64370: loss 6.8563, time 125.73ms
iter 64380: loss 7.0189, time 130.97ms
iter 64390: loss 7.5054, time 122.01ms
tensor(0.0351)
iter 64400: loss 6.5352, time 122.59ms
iter 64410: loss 6.6957, time 122.98ms
iter 64420: loss 7.4242, time 121.22ms
iter 64430: loss 7.2466, time 125.38ms
iter 64440: loss 7.3551, time 122.80ms
iter 64450: loss 6.9534, time 123.26ms
iter 64460: loss 7.3281, time 122.50ms
iter 64470: loss 6.0019, time 122.76ms
iter 64480: loss 7.0190, time 123.50ms
iter 64490: loss 6.7032, time 122.34ms
tensor(0.0245)
step 64500: train loss 6.1402, val loss 6.1571
saving checkpoint to out-shakespeare-char
iter 64500: loss 6.5099, time 2866.67ms
iter 64510: loss 6.6982, time 125.15ms
iter 64520: loss 6.8544, time 122.33ms
iter 64530: loss 6.1093, time 122.85ms
iter 64540: loss 6.9889, time 123.57ms
iter 64550: loss 6.8586, time 122.51ms
iter 64560: loss 6.8935, time 122.56ms
iter 64570: loss 7.0297, time 125.71ms
iter 64580: loss 7.1535, time 123.21ms
iter 64590: loss 6.8633, time 122.82ms
tensor(0.0157)
iter 64600: loss 7.0340, time 121.90ms
iter 64610: loss 6.6016, time 122.73ms
iter 64620: loss 6.8885, time 123.99ms
iter 64630: loss 7.0866, time 121.09ms
iter 64640: loss 7.1667, time 121.11ms
iter 64650: loss 7.8047, time 120.65ms
iter 64660: loss 6.9956, time 119.71ms
iter 64670: loss 7.3760, time 123.08ms
iter 64680: loss 6.4871, time 123.66ms
iter 64690: loss 6.2164, time 122.51ms
tensor(0.0089)
iter 64700: loss 6.7660, time 122.94ms
iter 64710: loss 6.5657, time 123.41ms
iter 64720: loss 7.5138, time 126.08ms
iter 64730: loss 6.7724, time 126.00ms
iter 64740: loss 6.8125, time 124.15ms
step 64750: train loss 6.1698, val loss 6.1394
saving checkpoint to out-shakespeare-char
iter 64750: loss 6.7176, time 2853.92ms
iter 64760: loss 7.6443, time 119.36ms
iter 64770: loss 7.0169, time 120.48ms
iter 64780: loss 7.3579, time 120.51ms
iter 64790: loss 6.4042, time 119.17ms
tensor(0.0039)
iter 64800: loss 7.7545, time 120.10ms
iter 64810: loss 6.8381, time 120.25ms
iter 64820: loss 6.8665, time 120.42ms
iter 64830: loss 6.8897, time 119.23ms
iter 64840: loss 7.3463, time 119.20ms
iter 64850: loss 7.1478, time 119.20ms
iter 64860: loss 7.2007, time 118.34ms
iter 64870: loss 6.8364, time 119.16ms
iter 64880: loss 6.2856, time 119.80ms
iter 64890: loss 7.1913, time 118.98ms
tensor(0.0010)
iter 64900: loss 6.4984, time 119.94ms
iter 64910: loss 7.4020, time 119.30ms
iter 64920: loss 5.8274, time 121.65ms
iter 64930: loss 6.8002, time 119.55ms
iter 64940: loss 7.1047, time 119.99ms
iter 64950: loss 6.5160, time 119.31ms
iter 64960: loss 7.3022, time 119.28ms
iter 64970: loss 5.9400, time 119.16ms
iter 64980: loss 7.1701, time 119.09ms
iter 64990: loss 6.7711, time 120.84ms
tensor(0.0010)
step 65000: train loss 6.1089, val loss 6.1586
saving checkpoint to out-shakespeare-char
iter 65000: loss 6.6467, time 2854.93ms
iter 65010: loss 7.1775, time 122.54ms
iter 65020: loss 6.8901, time 120.63ms
iter 65030: loss 6.6559, time 119.41ms
iter 65040: loss 6.3265, time 120.52ms
iter 65050: loss 6.7945, time 120.24ms
iter 65060: loss 6.8772, time 119.70ms
iter 65070: loss 6.9917, time 119.13ms
iter 65080: loss 7.2573, time 120.65ms
iter 65090: loss 7.2453, time 119.17ms
tensor(0.0010)
iter 65100: loss 6.5397, time 123.00ms
iter 65110: loss 7.0271, time 122.88ms
iter 65120: loss 7.0018, time 122.98ms
iter 65130: loss 6.6702, time 122.48ms
iter 65140: loss 6.5201, time 122.80ms
iter 65150: loss 7.1286, time 123.98ms
iter 65160: loss 8.1882, time 122.69ms
iter 65170: loss 6.8007, time 122.50ms
iter 65180: loss 7.2354, time 122.32ms
iter 65190: loss 6.8827, time 122.79ms
tensor(0.0039)
iter 65200: loss 7.4960, time 125.98ms
iter 65210: loss 6.7336, time 123.44ms
iter 65220: loss 6.8487, time 122.89ms
iter 65230: loss 6.6734, time 122.80ms
iter 65240: loss 6.3873, time 122.39ms
step 65250: train loss 6.1155, val loss 6.1384
saving checkpoint to out-shakespeare-char
iter 65250: loss 6.5873, time 2876.08ms
iter 65260: loss 6.4521, time 123.86ms
iter 65270: loss 7.5496, time 123.09ms
iter 65280: loss 7.0501, time 122.91ms
iter 65290: loss 6.7485, time 125.27ms
tensor(0.0089)
iter 65300: loss 7.2597, time 121.82ms
iter 65310: loss 7.5698, time 123.13ms
iter 65320: loss 7.3979, time 122.25ms
iter 65330: loss 6.9816, time 122.98ms
iter 65340: loss 7.3292, time 123.30ms
iter 65350: loss 7.6683, time 122.36ms
iter 65360: loss 7.0343, time 124.31ms
iter 65370: loss 6.7560, time 123.02ms
iter 65380: loss 6.6158, time 122.76ms
iter 65390: loss 6.7239, time 122.42ms
tensor(0.0157)
iter 65400: loss 6.7595, time 122.49ms
iter 65410: loss 6.4157, time 124.58ms
iter 65420: loss 7.8583, time 122.44ms
iter 65430: loss 6.5045, time 122.22ms
iter 65440: loss 6.5655, time 122.06ms
iter 65450: loss 6.8955, time 122.22ms
iter 65460: loss 7.1347, time 121.28ms
iter 65470: loss 6.5408, time 122.23ms
iter 65480: loss 6.5186, time 122.31ms
iter 65490: loss 6.6293, time 122.26ms
tensor(0.0245)
step 65500: train loss 6.1554, val loss 6.1211
saving checkpoint to out-shakespeare-char
iter 65500: loss 7.2253, time 2866.24ms
iter 65510: loss 6.6694, time 122.52ms
iter 65520: loss 6.4716, time 122.85ms
iter 65530: loss 7.1203, time 125.23ms
iter 65540: loss 7.0516, time 122.21ms
iter 65550: loss 6.7416, time 121.87ms
iter 65560: loss 7.6070, time 122.43ms
iter 65570: loss 7.0293, time 123.43ms
iter 65580: loss 7.0027, time 122.11ms
iter 65590: loss 7.4108, time 122.67ms
tensor(0.0351)
iter 65600: loss 6.4009, time 123.59ms
iter 65610: loss 7.3441, time 123.27ms
iter 65620: loss 7.0139, time 119.19ms
iter 65630: loss 6.7613, time 120.45ms
iter 65640: loss 6.2615, time 119.15ms
iter 65650: loss 7.6035, time 120.58ms
iter 65660: loss 7.5624, time 119.30ms
iter 65670: loss 7.9989, time 120.60ms
iter 65680: loss 6.9036, time 120.36ms
iter 65690: loss 6.8145, time 121.09ms
tensor(0.0476)
iter 65700: loss 6.9423, time 119.98ms
iter 65710: loss 7.1871, time 119.41ms
iter 65720: loss 5.6714, time 119.08ms
iter 65730: loss 7.0401, time 120.28ms
iter 65740: loss 6.4521, time 120.04ms
step 65750: train loss 6.1125, val loss 6.1444
saving checkpoint to out-shakespeare-char
iter 65750: loss 7.3055, time 2857.14ms
iter 65760: loss 7.1907, time 122.15ms
iter 65770: loss 7.4850, time 125.59ms
iter 65780: loss 5.6067, time 122.19ms
iter 65790: loss 7.8428, time 122.27ms
tensor(0.0618)
iter 65800: loss 7.5132, time 122.54ms
iter 65810: loss 7.3886, time 122.74ms
iter 65820: loss 7.2066, time 122.39ms
iter 65830: loss 7.5812, time 122.14ms
iter 65840: loss 6.8071, time 121.97ms
iter 65850: loss 6.5084, time 124.99ms
iter 65860: loss 7.3282, time 121.11ms
iter 65870: loss 6.3847, time 122.30ms
iter 65880: loss 6.6859, time 122.23ms
iter 65890: loss 7.2671, time 121.26ms
tensor(0.0778)
iter 65900: loss 6.2770, time 122.26ms
iter 65910: loss 7.0177, time 125.19ms
iter 65920: loss 7.5850, time 122.75ms
iter 65930: loss 6.8892, time 122.31ms
iter 65940: loss 6.5377, time 122.54ms
iter 65950: loss 7.8910, time 123.64ms
iter 65960: loss 7.0249, time 122.15ms
iter 65970: loss 7.2070, time 124.96ms
iter 65980: loss 6.2183, time 122.91ms
iter 65990: loss 6.7946, time 122.18ms
tensor(0.0955)
step 66000: train loss 6.1444, val loss 6.1774
saving checkpoint to out-shakespeare-char
iter 66000: loss 6.6575, time 2856.80ms
iter 66010: loss 6.4452, time 122.66ms
iter 66020: loss 6.6777, time 121.62ms
iter 66030: loss 6.1846, time 125.01ms
iter 66040: loss 6.0067, time 122.17ms
iter 66050: loss 6.4449, time 121.60ms
iter 66060: loss 6.5810, time 122.07ms
iter 66070: loss 6.6499, time 122.86ms
iter 66080: loss 6.7197, time 121.61ms
iter 66090: loss 7.6358, time 121.25ms
tensor(0.1147)
iter 66100: loss 8.0451, time 125.57ms
iter 66110: loss 6.8506, time 123.94ms
iter 66120: loss 6.1582, time 121.65ms
iter 66130: loss 6.7435, time 123.71ms
iter 66140: loss 7.0934, time 123.22ms
iter 66150: loss 7.2578, time 123.06ms
iter 66160: loss 7.5416, time 122.77ms
iter 66170: loss 6.3209, time 123.22ms
iter 66180: loss 6.7980, time 122.73ms
iter 66190: loss 6.8993, time 125.91ms
tensor(0.1355)
iter 66200: loss 6.5543, time 123.09ms
iter 66210: loss 7.2591, time 123.84ms
iter 66220: loss 7.1437, time 122.89ms
iter 66230: loss 7.7152, time 122.84ms
iter 66240: loss 7.3917, time 122.17ms
step 66250: train loss 6.1135, val loss 6.1294
saving checkpoint to out-shakespeare-char
iter 66250: loss 7.0343, time 2860.20ms
iter 66260: loss 7.2365, time 122.92ms
iter 66270: loss 7.4398, time 123.24ms
iter 66280: loss 6.7784, time 123.36ms
iter 66290: loss 7.1280, time 123.24ms
tensor(0.1577)
iter 66300: loss 6.6539, time 122.14ms
iter 66310: loss 6.6298, time 125.84ms
iter 66320: loss 6.2873, time 123.71ms
iter 66330: loss 7.0274, time 123.36ms
iter 66340: loss 7.5852, time 123.04ms
iter 66350: loss 7.4366, time 123.15ms
iter 66360: loss 6.4041, time 122.77ms
iter 66370: loss 7.6052, time 126.29ms
iter 66380: loss 6.5660, time 123.04ms
iter 66390: loss 7.0814, time 123.95ms
tensor(0.1813)
iter 66400: loss 7.2695, time 123.31ms
iter 66410: loss 6.7632, time 122.75ms
iter 66420: loss 7.3665, time 124.53ms
iter 66430: loss 6.8143, time 123.07ms
iter 66440: loss 6.7620, time 125.76ms
iter 66450: loss 7.8162, time 122.81ms
iter 66460: loss 7.6822, time 123.12ms
iter 66470: loss 7.1829, time 122.91ms
iter 66480: loss 7.0779, time 122.72ms
iter 66490: loss 7.3745, time 123.47ms
tensor(0.2061)
step 66500: train loss 6.2143, val loss 6.1302
saving checkpoint to out-shakespeare-char
iter 66500: loss 7.7754, time 2876.82ms
iter 66510: loss 6.9739, time 125.64ms
iter 66520: loss 7.0062, time 122.84ms
iter 66530: loss 6.3252, time 122.33ms
iter 66540: loss 6.5911, time 122.86ms
iter 66550: loss 7.1981, time 123.24ms
iter 66560: loss 6.9950, time 123.79ms
iter 66570: loss 6.5713, time 121.84ms
iter 66580: loss 7.5753, time 122.92ms
iter 66590: loss 7.0349, time 122.88ms
tensor(0.2321)
iter 66600: loss 6.9399, time 123.02ms
iter 66610: loss 6.6288, time 123.12ms
iter 66620: loss 6.8113, time 122.85ms
iter 66630: loss 7.6320, time 122.86ms
iter 66640: loss 6.5937, time 126.47ms
iter 66650: loss 7.0275, time 122.73ms
iter 66660: loss 7.9885, time 122.99ms
iter 66670: loss 7.6582, time 123.50ms
iter 66680: loss 6.6071, time 122.95ms
iter 66690: loss 6.5756, time 122.76ms
tensor(0.2591)
iter 66700: loss 7.3081, time 124.01ms
iter 66710: loss 6.3559, time 125.85ms
iter 66720: loss 6.5965, time 122.75ms
iter 66730: loss 7.3942, time 123.51ms
iter 66740: loss 7.4994, time 122.59ms
step 66750: train loss 6.1759, val loss 6.2125
saving checkpoint to out-shakespeare-char
iter 66750: loss 6.7292, time 2842.74ms
iter 66760: loss 6.9546, time 123.03ms
iter 66770: loss 7.3192, time 123.68ms
iter 66780: loss 6.9321, time 125.74ms
iter 66790: loss 7.1805, time 123.33ms
tensor(0.2871)
iter 66800: loss 7.2680, time 123.16ms
iter 66810: loss 6.9191, time 123.02ms
iter 66820: loss 6.9640, time 123.05ms
iter 66830: loss 7.1987, time 123.97ms
iter 66840: loss 6.9268, time 122.67ms
iter 66850: loss 7.0637, time 125.45ms
iter 66860: loss 6.8296, time 122.72ms
iter 66870: loss 7.5816, time 122.56ms
iter 66880: loss 6.1352, time 122.55ms
iter 66890: loss 7.4783, time 122.68ms
tensor(0.3159)
iter 66900: loss 6.8995, time 123.46ms
iter 66910: loss 6.2405, time 122.60ms
iter 66920: loss 7.2210, time 123.03ms
iter 66930: loss 6.9754, time 121.78ms
iter 66940: loss 6.7795, time 122.99ms
iter 66950: loss 7.2774, time 122.81ms
iter 66960: loss 7.2828, time 123.25ms
iter 66970: loss 6.7974, time 123.11ms
iter 66980: loss 7.1391, time 125.92ms
iter 66990: loss 7.7150, time 122.94ms
tensor(0.3455)
step 67000: train loss 6.1848, val loss 6.2007
saving checkpoint to out-shakespeare-char
iter 67000: loss 7.6453, time 2853.77ms
iter 67010: loss 7.3354, time 122.66ms
iter 67020: loss 7.5312, time 122.60ms
iter 67030: loss 6.5663, time 122.60ms
iter 67040: loss 6.7693, time 122.51ms
iter 67050: loss 7.3564, time 125.29ms
iter 67060: loss 6.9709, time 122.37ms
iter 67070: loss 7.5549, time 122.39ms
iter 67080: loss 6.8996, time 123.34ms
iter 67090: loss 7.5248, time 124.07ms
tensor(0.3757)
iter 67100: loss 6.6921, time 123.22ms
iter 67110: loss 6.5661, time 125.76ms
iter 67120: loss 6.3217, time 122.50ms
iter 67130: loss 6.9146, time 122.25ms
iter 67140: loss 7.3943, time 123.02ms
iter 67150: loss 6.6217, time 122.83ms
iter 67160: loss 6.1680, time 123.02ms
iter 67170: loss 6.9049, time 122.87ms
iter 67180: loss 7.5524, time 125.49ms
iter 67190: loss 7.2204, time 123.56ms
tensor(0.4063)
iter 67200: loss 7.3318, time 122.48ms
iter 67210: loss 6.8014, time 123.16ms
iter 67220: loss 7.4549, time 122.55ms
iter 67230: loss 6.4793, time 126.23ms
iter 67240: loss 7.6973, time 122.35ms
step 67250: train loss 6.2522, val loss 6.2437
saving checkpoint to out-shakespeare-char
iter 67250: loss 7.1042, time 2875.57ms
iter 67260: loss 7.2525, time 122.43ms
iter 67270: loss 6.1856, time 122.51ms
iter 67280: loss 7.4450, time 124.07ms
iter 67290: loss 7.0237, time 122.33ms
tensor(0.4373)
iter 67300: loss 7.0214, time 122.37ms
iter 67310: loss 6.6232, time 122.94ms
iter 67320: loss 7.5551, time 122.97ms
iter 67330: loss 7.1393, time 126.09ms
iter 67340: loss 7.7885, time 122.82ms
iter 67350: loss 6.9465, time 121.67ms
iter 67360: loss 6.8715, time 122.77ms
iter 67370: loss 6.3886, time 122.80ms
iter 67380: loss 7.1858, time 125.94ms
iter 67390: loss 7.8261, time 122.91ms
tensor(0.4686)
iter 67400: loss 6.8799, time 122.84ms
iter 67410: loss 7.5103, time 122.86ms
iter 67420: loss 6.8121, time 123.20ms
iter 67430: loss 6.7967, time 122.73ms
iter 67440: loss 6.5148, time 121.84ms
iter 67450: loss 6.5887, time 122.88ms
iter 67460: loss 7.5102, time 122.89ms
iter 67470: loss 7.7930, time 122.87ms
iter 67480: loss 7.3883, time 122.75ms
iter 67490: loss 6.9711, time 122.87ms
tensor(0.5000)
step 67500: train loss 6.2928, val loss 6.3271
saving checkpoint to out-shakespeare-char
iter 67500: loss 6.9965, time 2848.31ms
iter 67510: loss 6.3177, time 122.98ms
iter 67520: loss 7.3952, time 122.94ms
iter 67530: loss 7.4397, time 122.49ms
iter 67540: loss 6.6163, time 122.91ms
iter 67550: loss 6.8592, time 122.81ms
iter 67560: loss 7.6795, time 123.26ms
iter 67570: loss 8.1024, time 122.95ms
iter 67580: loss 7.6837, time 126.01ms
iter 67590: loss 7.0382, time 122.73ms
tensor(0.5314)
iter 67600: loss 7.2308, time 122.09ms
iter 67610: loss 7.0983, time 122.77ms
iter 67620: loss 7.2302, time 122.86ms
iter 67630: loss 6.5597, time 122.67ms
iter 67640: loss 6.1683, time 125.73ms
iter 67650: loss 7.3198, time 122.74ms
iter 67660: loss 7.4017, time 122.87ms
iter 67670: loss 6.9931, time 123.10ms
iter 67680: loss 6.5747, time 123.13ms
iter 67690: loss 7.4963, time 123.05ms
tensor(0.5627)
iter 67700: loss 7.2553, time 126.77ms
iter 67710: loss 8.2587, time 123.28ms
iter 67720: loss 6.8061, time 122.81ms
iter 67730: loss 7.5278, time 122.80ms
iter 67740: loss 6.8046, time 122.57ms
step 67750: train loss 6.2592, val loss 6.3119
saving checkpoint to out-shakespeare-char
iter 67750: loss 6.3713, time 2839.05ms
iter 67760: loss 7.3101, time 120.44ms
iter 67770: loss 7.0085, time 119.91ms
iter 67780: loss 6.9286, time 119.65ms
iter 67790: loss 8.2647, time 119.30ms
tensor(0.5937)
iter 67800: loss 7.6696, time 121.57ms
iter 67810: loss 6.6039, time 118.75ms
iter 67820: loss 6.8207, time 119.51ms
iter 67830: loss 6.6310, time 118.53ms
iter 67840: loss 7.5177, time 119.10ms
iter 67850: loss 6.2143, time 119.13ms
iter 67860: loss 6.7519, time 119.64ms
iter 67870: loss 7.5914, time 120.94ms
iter 67880: loss 7.4981, time 119.23ms
iter 67890: loss 7.2277, time 119.78ms
tensor(0.6243)
iter 67900: loss 6.4521, time 120.47ms
iter 67910: loss 6.5031, time 119.27ms
iter 67920: loss 7.0868, time 120.48ms
iter 67930: loss 6.5007, time 120.26ms
iter 67940: loss 6.8925, time 120.39ms
iter 67950: loss 6.5138, time 119.08ms
iter 67960: loss 6.6872, time 119.16ms
iter 67970: loss 7.9048, time 120.27ms
iter 67980: loss 6.9080, time 119.26ms
iter 67990: loss 7.9169, time 119.20ms
tensor(0.6545)
step 68000: train loss 6.3446, val loss 6.3493
saving checkpoint to out-shakespeare-char
iter 68000: loss 6.4460, time 2856.32ms
iter 68010: loss 7.0383, time 118.64ms
iter 68020: loss 7.3710, time 118.77ms
iter 68030: loss 6.5452, time 119.07ms
iter 68040: loss 7.2410, time 120.18ms
iter 68050: loss 7.3258, time 119.51ms
iter 68060: loss 7.5415, time 121.19ms
iter 68070: loss 6.9963, time 120.71ms
iter 68080: loss 7.0841, time 118.91ms
iter 68090: loss 6.8604, time 118.75ms
tensor(0.6841)
iter 68100: loss 7.2322, time 121.78ms
iter 68110: loss 6.9173, time 119.16ms
iter 68120: loss 7.2849, time 119.80ms
iter 68130: loss 6.9059, time 119.04ms
iter 68140: loss 6.8742, time 118.56ms
iter 68150: loss 7.3194, time 118.69ms
iter 68160: loss 7.3641, time 121.80ms
iter 68170: loss 7.3439, time 119.44ms
iter 68180: loss 7.4210, time 118.41ms
iter 68190: loss 7.5259, time 121.11ms
tensor(0.7129)
iter 68200: loss 7.5653, time 119.36ms
iter 68210: loss 6.2016, time 120.59ms
iter 68220: loss 6.7382, time 120.74ms
iter 68230: loss 7.1303, time 120.42ms
iter 68240: loss 7.0679, time 119.58ms
step 68250: train loss 6.4257, val loss 6.3762
saving checkpoint to out-shakespeare-char
iter 68250: loss 6.9037, time 2865.26ms
iter 68260: loss 7.6954, time 122.51ms
iter 68270: loss 6.6856, time 122.76ms
iter 68280: loss 7.9354, time 122.95ms
iter 68290: loss 7.8131, time 123.43ms
tensor(0.7409)
iter 68300: loss 7.3051, time 125.63ms
iter 68310: loss 6.7880, time 124.96ms
iter 68320: loss 6.7969, time 123.16ms
iter 68330: loss 6.8692, time 122.05ms
iter 68340: loss 6.7362, time 125.53ms
iter 68350: loss 6.4179, time 122.54ms
iter 68360: loss 6.8556, time 122.78ms
iter 68370: loss 7.3032, time 122.32ms
iter 68380: loss 6.7457, time 122.54ms
iter 68390: loss 6.7501, time 122.43ms
tensor(0.7679)
iter 68400: loss 7.7825, time 122.79ms
iter 68410: loss 7.4867, time 123.05ms
iter 68420: loss 7.6490, time 125.47ms
iter 68430: loss 7.4524, time 125.76ms
iter 68440: loss 7.7131, time 122.68ms
iter 68450: loss 6.9259, time 122.94ms
iter 68460: loss 7.2182, time 122.86ms
iter 68470: loss 7.1225, time 122.71ms
iter 68480: loss 6.7374, time 121.69ms
iter 68490: loss 6.7613, time 124.44ms
tensor(0.7939)
step 68500: train loss 6.3978, val loss 6.4270
saving checkpoint to out-shakespeare-char
iter 68500: loss 7.1027, time 2868.81ms
iter 68510: loss 6.5554, time 120.82ms
iter 68520: loss 7.5738, time 122.22ms
iter 68530: loss 7.4806, time 119.88ms
iter 68540: loss 6.9670, time 122.77ms
iter 68550: loss 7.1015, time 118.47ms
iter 68560: loss 6.6267, time 122.48ms
iter 68570: loss 7.8790, time 119.34ms
iter 68580: loss 7.4595, time 122.41ms
iter 68590: loss 7.0229, time 120.50ms
tensor(0.8187)
iter 68600: loss 6.5308, time 121.60ms
iter 68610: loss 7.2467, time 119.10ms
iter 68620: loss 7.0992, time 120.29ms
iter 68630: loss 6.9319, time 119.01ms
iter 68640: loss 6.8351, time 119.22ms
iter 68650: loss 7.4279, time 119.00ms
iter 68660: loss 7.7255, time 119.19ms
iter 68670: loss 6.6452, time 118.34ms
iter 68680: loss 7.6770, time 119.27ms
iter 68690: loss 6.5127, time 119.90ms
tensor(0.8423)
iter 68700: loss 8.1263, time 119.08ms
iter 68710: loss 7.0184, time 119.06ms
iter 68720: loss 6.9208, time 118.21ms
iter 68730: loss 7.3332, time 120.63ms
iter 68740: loss 7.6925, time 118.99ms
step 68750: train loss 6.4125, val loss 6.4332
saving checkpoint to out-shakespeare-char
iter 68750: loss 7.7156, time 2854.77ms
iter 68760: loss 6.1979, time 120.92ms
iter 68770: loss 7.8245, time 120.40ms
iter 68780: loss 7.0538, time 118.81ms
iter 68790: loss 7.9366, time 119.88ms
tensor(0.8645)
iter 68800: loss 7.0442, time 119.24ms
iter 68810: loss 6.9303, time 119.26ms
iter 68820: loss 7.2744, time 119.26ms
iter 68830: loss 7.3061, time 119.11ms
iter 68840: loss 7.3598, time 119.15ms
iter 68850: loss 6.2891, time 119.86ms
iter 68860: loss 7.3409, time 119.27ms
iter 68870: loss 8.2421, time 119.23ms
iter 68880: loss 7.3860, time 122.09ms
iter 68890: loss 6.3921, time 120.36ms
tensor(0.8853)
iter 68900: loss 7.7970, time 122.10ms
iter 68910: loss 7.3190, time 119.53ms
iter 68920: loss 6.6260, time 120.88ms
iter 68930: loss 7.3810, time 119.10ms
iter 68940: loss 7.1232, time 119.25ms
iter 68950: loss 6.6000, time 118.99ms
iter 68960: loss 6.8217, time 119.44ms
iter 68970: loss 7.1849, time 119.41ms
iter 68980: loss 7.8680, time 120.03ms
iter 68990: loss 7.4736, time 118.27ms
tensor(0.9045)
step 69000: train loss 6.3878, val loss 6.3808
saving checkpoint to out-shakespeare-char
iter 69000: loss 6.6516, time 2881.79ms
iter 69010: loss 7.1763, time 122.45ms
iter 69020: loss 7.3079, time 122.55ms
iter 69030: loss 7.6395, time 122.55ms
iter 69040: loss 6.9086, time 122.54ms
iter 69050: loss 7.2559, time 125.71ms
iter 69060: loss 6.8301, time 122.47ms
iter 69070: loss 6.5024, time 121.76ms
iter 69080: loss 6.7631, time 122.61ms
iter 69090: loss 6.5905, time 121.68ms
tensor(0.9222)
iter 69100: loss 6.8667, time 123.63ms
iter 69110: loss 7.2230, time 120.52ms
iter 69120: loss 6.7514, time 122.77ms
iter 69130: loss 7.2610, time 125.31ms
iter 69140: loss 6.5900, time 122.88ms
iter 69150: loss 6.5034, time 122.55ms
iter 69160: loss 7.6788, time 122.50ms
iter 69170: loss 7.3681, time 122.49ms
iter 69180: loss 6.7038, time 122.01ms
iter 69190: loss 8.0880, time 122.38ms
tensor(0.9382)
iter 69200: loss 6.8322, time 125.57ms
iter 69210: loss 7.5785, time 122.41ms
iter 69220: loss 7.3660, time 122.44ms
iter 69230: loss 6.9722, time 122.61ms
iter 69240: loss 7.2361, time 122.51ms
step 69250: train loss 6.4337, val loss 6.4587
saving checkpoint to out-shakespeare-char
iter 69250: loss 7.1028, time 2886.09ms
iter 69260: loss 7.5806, time 122.79ms
iter 69270: loss 6.5471, time 122.89ms
iter 69280: loss 7.3906, time 122.86ms
iter 69290: loss 6.6205, time 125.96ms
tensor(0.9524)
iter 69300: loss 7.2256, time 122.57ms
iter 69310: loss 7.0738, time 122.59ms
iter 69320: loss 8.3077, time 121.87ms
iter 69330: loss 6.7711, time 123.03ms
iter 69340: loss 6.3957, time 122.52ms
iter 69350: loss 7.6494, time 122.57ms
iter 69360: loss 7.2894, time 125.39ms
iter 69370: loss 7.0178, time 122.56ms
iter 69380: loss 6.5930, time 122.84ms
iter 69390: loss 7.0284, time 122.84ms
tensor(0.9649)
iter 69400: loss 7.8106, time 122.82ms
iter 69410: loss 7.4137, time 123.06ms
iter 69420: loss 6.8901, time 123.24ms
iter 69430: loss 7.4147, time 125.42ms
iter 69440: loss 6.5563, time 122.71ms
iter 69450: loss 6.8843, time 122.44ms
iter 69460: loss 7.5804, time 122.57ms
iter 69470: loss 6.8875, time 122.67ms
iter 69480: loss 7.7239, time 122.59ms
iter 69490: loss 7.7361, time 122.87ms
tensor(0.9755)
step 69500: train loss 6.4298, val loss 6.4587
saving checkpoint to out-shakespeare-char
iter 69500: loss 7.1516, time 2854.03ms
iter 69510: loss 7.4785, time 123.30ms
iter 69520: loss 8.1225, time 126.37ms
iter 69530: loss 7.4579, time 123.86ms
iter 69540: loss 7.1845, time 123.89ms
iter 69550: loss 6.7490, time 122.95ms
iter 69560: loss 6.7306, time 122.71ms
iter 69570: loss 7.6867, time 122.61ms
iter 69580: loss 7.2689, time 123.13ms
iter 69590: loss 6.7572, time 122.91ms
tensor(0.9843)
iter 69600: loss 7.0304, time 125.63ms
iter 69610: loss 6.1355, time 123.22ms
iter 69620: loss 6.9566, time 122.87ms
iter 69630: loss 6.4774, time 122.73ms
iter 69640: loss 7.2378, time 122.81ms
iter 69650: loss 6.9587, time 122.75ms
iter 69660: loss 7.1844, time 122.72ms
iter 69670: loss 7.8570, time 125.41ms
iter 69680: loss 6.9568, time 122.66ms
iter 69690: loss 7.3787, time 123.27ms
tensor(0.9911)
iter 69700: loss 7.5856, time 122.91ms
iter 69710: loss 6.5031, time 122.98ms
iter 69720: loss 7.3756, time 122.85ms
iter 69730: loss 7.2223, time 122.67ms
iter 69740: loss 7.1183, time 125.76ms
step 69750: train loss 6.4208, val loss 6.3879
saving checkpoint to out-shakespeare-char
iter 69750: loss 7.4926, time 2848.26ms
iter 69760: loss 7.2718, time 119.47ms
iter 69770: loss 6.2487, time 120.58ms
iter 69780: loss 6.7126, time 119.19ms
iter 69790: loss 7.2688, time 119.18ms
tensor(0.9961)
iter 69800: loss 6.3394, time 119.52ms
iter 69810: loss 6.8849, time 119.54ms
iter 69820: loss 6.8848, time 120.67ms
iter 69830: loss 7.3664, time 119.19ms
iter 69840: loss 6.4654, time 120.60ms
iter 69850: loss 7.5107, time 119.91ms
iter 69860: loss 6.7796, time 119.35ms
iter 69870: loss 7.6379, time 119.97ms
iter 69880: loss 7.1469, time 119.49ms
iter 69890: loss 7.4086, time 119.20ms
tensor(0.9990)
iter 69900: loss 7.5497, time 120.01ms
iter 69910: loss 6.9712, time 120.08ms
iter 69920: loss 7.4242, time 119.79ms
iter 69930: loss 7.0927, time 119.00ms
iter 69940: loss 7.0784, time 119.65ms
iter 69950: loss 7.1781, time 120.23ms
iter 69960: loss 7.1325, time 120.58ms
iter 69970: loss 7.4163, time 118.43ms
iter 69980: loss 7.9069, time 120.57ms
iter 69990: loss 6.9029, time 119.46ms
tensor(1.)
step 70000: train loss 6.4687, val loss 6.4945
saving checkpoint to out-shakespeare-char
iter 70000: loss 6.8081, time 2879.52ms
iter 70010: loss 6.8805, time 119.29ms
iter 70020: loss 7.5251, time 119.28ms
iter 70030: loss 6.9839, time 120.64ms
iter 70040: loss 7.8770, time 120.31ms
iter 70050: loss 6.7057, time 120.41ms
iter 70060: loss 6.5153, time 119.89ms
iter 70070: loss 6.8881, time 122.50ms
iter 70080: loss 6.6067, time 119.08ms
iter 70090: loss 6.9106, time 119.24ms
tensor(0.9990)
iter 70100: loss 8.0773, time 120.76ms
iter 70110: loss 6.8587, time 120.60ms
iter 70120: loss 7.9618, time 119.36ms
iter 70130: loss 6.7542, time 119.78ms
iter 70140: loss 7.1934, time 118.50ms
iter 70150: loss 6.1257, time 119.23ms
iter 70160: loss 7.2651, time 119.34ms
iter 70170: loss 7.5804, time 122.36ms
iter 70180: loss 7.6217, time 120.59ms
iter 70190: loss 6.7970, time 121.54ms
tensor(0.9961)
iter 70200: loss 7.2592, time 119.56ms
iter 70210: loss 6.9746, time 120.23ms
iter 70220: loss 7.3375, time 119.28ms
iter 70230: loss 7.2516, time 120.96ms
iter 70240: loss 7.2424, time 119.13ms
step 70250: train loss 6.5136, val loss 6.4910
saving checkpoint to out-shakespeare-char
iter 70250: loss 6.6858, time 2879.87ms
iter 70260: loss 6.6883, time 122.69ms
iter 70270: loss 7.4383, time 122.73ms
iter 70280: loss 7.1819, time 122.87ms
iter 70290: loss 7.6805, time 123.37ms
tensor(0.9911)
iter 70300: loss 7.6219, time 125.67ms
iter 70310: loss 6.8579, time 125.71ms
iter 70320: loss 7.9679, time 123.16ms
iter 70330: loss 8.0554, time 122.82ms
iter 70340: loss 6.6596, time 122.67ms
iter 70350: loss 6.8969, time 122.55ms
iter 70360: loss 6.4459, time 122.77ms
iter 70370: loss 7.7553, time 123.04ms
iter 70380: loss 7.5929, time 122.95ms
iter 70390: loss 6.9510, time 122.99ms
tensor(0.9843)
iter 70400: loss 7.5094, time 122.64ms
iter 70410: loss 7.2586, time 125.49ms
iter 70420: loss 5.9064, time 123.15ms
iter 70430: loss 6.7495, time 122.80ms
iter 70440: loss 7.0107, time 123.38ms
iter 70450: loss 6.9943, time 122.92ms
iter 70460: loss 7.2510, time 123.05ms
iter 70470: loss 7.5890, time 122.79ms
iter 70480: loss 7.7607, time 124.62ms
iter 70490: loss 7.1255, time 122.82ms
tensor(0.9755)
step 70500: train loss 6.4168, val loss 6.4553
saving checkpoint to out-shakespeare-char
iter 70500: loss 6.7020, time 2847.00ms
iter 70510: loss 7.5452, time 122.86ms
iter 70520: loss 7.0819, time 121.84ms
iter 70530: loss 7.5388, time 123.23ms
iter 70540: loss 7.6480, time 122.11ms
iter 70550: loss 6.5373, time 122.24ms
iter 70560: loss 6.6363, time 123.87ms
iter 70570: loss 6.6303, time 123.01ms
iter 70580: loss 7.3736, time 122.67ms
iter 70590: loss 7.3478, time 125.61ms
tensor(0.9649)
iter 70600: loss 6.8364, time 122.68ms
iter 70610: loss 6.8400, time 122.77ms
iter 70620: loss 6.7813, time 122.89ms
iter 70630: loss 6.4745, time 122.66ms
iter 70640: loss 7.6311, time 122.67ms
iter 70650: loss 7.2272, time 122.52ms
iter 70660: loss 7.3023, time 125.42ms
iter 70670: loss 7.4234, time 122.65ms
iter 70680: loss 7.1218, time 122.78ms
iter 70690: loss 6.7037, time 122.57ms
tensor(0.9524)
iter 70700: loss 6.2270, time 122.62ms
iter 70710: loss 7.0342, time 122.66ms
iter 70720: loss 7.3630, time 123.18ms
iter 70730: loss 7.4818, time 125.48ms
iter 70740: loss 6.4643, time 122.68ms
step 70750: train loss 6.4528, val loss 6.4642
saving checkpoint to out-shakespeare-char
iter 70750: loss 6.9524, time 2869.60ms
iter 70760: loss 6.8536, time 125.63ms
iter 70770: loss 7.2918, time 122.72ms
iter 70780: loss 6.8488, time 123.01ms
iter 70790: loss 6.7375, time 122.56ms
tensor(0.9382)
iter 70800: loss 6.9650, time 121.78ms
iter 70810: loss 6.5869, time 122.84ms
iter 70820: loss 7.0890, time 122.67ms
iter 70830: loss 6.7269, time 125.97ms
iter 70840: loss 7.8608, time 123.09ms
iter 70850: loss 7.6251, time 123.36ms
iter 70860: loss 7.2215, time 123.08ms
iter 70870: loss 7.1233, time 123.17ms
iter 70880: loss 7.0434, time 122.77ms
iter 70890: loss 7.2153, time 122.84ms
tensor(0.9222)
iter 70900: loss 7.1306, time 122.74ms
iter 70910: loss 6.6763, time 122.86ms
iter 70920: loss 6.2939, time 124.57ms
iter 70930: loss 7.2475, time 122.60ms
iter 70940: loss 6.9116, time 122.75ms
iter 70950: loss 7.3520, time 123.27ms
iter 70960: loss 7.0468, time 122.95ms
iter 70970: loss 6.8993, time 123.26ms
iter 70980: loss 7.1035, time 123.11ms
iter 70990: loss 7.0427, time 123.07ms
tensor(0.9045)
step 71000: train loss 6.4127, val loss 6.4236
saving checkpoint to out-shakespeare-char
iter 71000: loss 7.0162, time 2819.81ms
iter 71010: loss 6.6197, time 122.77ms
iter 71020: loss 7.3313, time 122.74ms
iter 71030: loss 7.1309, time 123.01ms
iter 71040: loss 6.9340, time 123.07ms
iter 71050: loss 7.4238, time 123.07ms
iter 71060: loss 7.3822, time 125.05ms
iter 71070: loss 6.7061, time 123.22ms
iter 71080: loss 7.6520, time 123.31ms
iter 71090: loss 7.1159, time 123.07ms
tensor(0.8853)
iter 71100: loss 6.7029, time 123.34ms
iter 71110: loss 6.6124, time 123.21ms
iter 71120: loss 6.9702, time 123.47ms
iter 71130: loss 7.7879, time 123.01ms
iter 71140: loss 7.1084, time 123.33ms
iter 71150: loss 7.0805, time 123.20ms
iter 71160: loss 6.7958, time 122.88ms
iter 71170: loss 6.6693, time 122.98ms
iter 71180: loss 7.5459, time 125.63ms
iter 71190: loss 7.4512, time 122.33ms
tensor(0.8645)
iter 71200: loss 7.2602, time 122.99ms
iter 71210: loss 6.8638, time 122.81ms
iter 71220: loss 6.4612, time 122.80ms
iter 71230: loss 7.3279, time 123.05ms
iter 71240: loss 7.0256, time 123.15ms
step 71250: train loss 6.4219, val loss 6.3903
saving checkpoint to out-shakespeare-char
iter 71250: loss 7.1864, time 2869.25ms
iter 71260: loss 7.3554, time 123.32ms
iter 71270: loss 6.7612, time 123.15ms
iter 71280: loss 7.8625, time 126.71ms
iter 71290: loss 6.8809, time 123.18ms
tensor(0.8423)
iter 71300: loss 7.4082, time 123.68ms
iter 71310: loss 6.3486, time 123.26ms
iter 71320: loss 7.0796, time 123.29ms
iter 71330: loss 6.7417, time 123.03ms
iter 71340: loss 6.6138, time 123.06ms
iter 71350: loss 6.7846, time 122.87ms
iter 71360: loss 7.0411, time 125.98ms
iter 71370: loss 7.4724, time 122.38ms
iter 71380: loss 7.4402, time 122.73ms
iter 71390: loss 7.2268, time 122.91ms
tensor(0.8187)
iter 71400: loss 6.8622, time 122.95ms
iter 71410: loss 6.1607, time 124.14ms
iter 71420: loss 6.6552, time 125.62ms
iter 71430: loss 7.0762, time 122.56ms
iter 71440: loss 7.5944, time 122.96ms
iter 71450: loss 7.3126, time 122.15ms
iter 71460: loss 7.4240, time 122.95ms
iter 71470: loss 7.7477, time 122.84ms
iter 71480: loss 7.6212, time 122.90ms
iter 71490: loss 6.0671, time 125.68ms
tensor(0.7939)
step 71500: train loss 6.3626, val loss 6.3144
saving checkpoint to out-shakespeare-char
iter 71500: loss 6.7133, time 2862.86ms
iter 71510: loss 7.2839, time 120.66ms
iter 71520: loss 7.1968, time 122.58ms
iter 71530: loss 6.4867, time 119.68ms
iter 71540: loss 7.1738, time 119.83ms
iter 71550: loss 7.4537, time 119.79ms
iter 71560: loss 7.5592, time 122.72ms
iter 71570: loss 6.8734, time 119.73ms
iter 71580: loss 6.5744, time 122.77ms
iter 71590: loss 7.0293, time 119.98ms
tensor(0.7679)
iter 71600: loss 7.1035, time 121.17ms
iter 71610: loss 6.6134, time 119.77ms
iter 71620: loss 7.3646, time 119.86ms
iter 71630: loss 6.7198, time 119.64ms
iter 71640: loss 7.8344, time 120.99ms
iter 71650: loss 6.8202, time 120.69ms
iter 71660: loss 7.3345, time 119.66ms
iter 71670: loss 7.2425, time 120.62ms
iter 71680: loss 7.3318, time 119.69ms
iter 71690: loss 7.2800, time 119.74ms
tensor(0.7409)
iter 71700: loss 7.3499, time 121.42ms
iter 71710: loss 7.3392, time 119.59ms
iter 71720: loss 7.0623, time 121.24ms
iter 71730: loss 7.5711, time 119.75ms
iter 71740: loss 7.3863, time 119.73ms
step 71750: train loss 6.4160, val loss 6.3652
saving checkpoint to out-shakespeare-char
iter 71750: loss 6.6178, time 2864.21ms
iter 71760: loss 7.2191, time 119.66ms
iter 71770: loss 7.4477, time 119.71ms
iter 71780: loss 6.9269, time 120.75ms
iter 71790: loss 6.3274, time 119.76ms
tensor(0.7129)
iter 71800: loss 6.9961, time 119.60ms
iter 71810: loss 7.8329, time 120.73ms
iter 71820: loss 6.9825, time 119.70ms
iter 71830: loss 7.1039, time 119.81ms
iter 71840: loss 6.8057, time 120.37ms
iter 71850: loss 7.1544, time 119.63ms
iter 71860: loss 6.3400, time 122.04ms
iter 71870: loss 7.2771, time 120.76ms
iter 71880: loss 7.3052, time 120.88ms
iter 71890: loss 6.8181, time 119.73ms
tensor(0.6841)
iter 71900: loss 7.6250, time 122.78ms
iter 71910: loss 8.0494, time 120.79ms
iter 71920: loss 6.6550, time 122.30ms
iter 71930: loss 6.7970, time 119.62ms
iter 71940: loss 6.3918, time 120.85ms
iter 71950: loss 7.5681, time 120.61ms
iter 71960: loss 7.1980, time 119.76ms
iter 71970: loss 7.0649, time 119.72ms
iter 71980: loss 7.3939, time 121.17ms
iter 71990: loss 6.8683, time 120.69ms
tensor(0.6545)
step 72000: train loss 6.2504, val loss 6.3449
saving checkpoint to out-shakespeare-char
iter 72000: loss 7.5018, time 2864.24ms
iter 72010: loss 7.1697, time 121.89ms
iter 72020: loss 6.8969, time 119.19ms
iter 72030: loss 7.2466, time 120.42ms
iter 72040: loss 6.7833, time 119.49ms
iter 72050: loss 6.5751, time 118.25ms
iter 72060: loss 7.4695, time 119.31ms
iter 72070: loss 7.3953, time 119.41ms
iter 72080: loss 7.0477, time 120.55ms
iter 72090: loss 7.1531, time 119.26ms
tensor(0.6243)
iter 72100: loss 7.3912, time 118.71ms
iter 72110: loss 7.5898, time 118.60ms
iter 72120: loss 6.8695, time 120.61ms
iter 72130: loss 7.1875, time 119.30ms
iter 72140: loss 6.7078, time 119.91ms
iter 72150: loss 7.7635, time 119.36ms
iter 72160: loss 7.2229, time 119.49ms
iter 72170: loss 6.8831, time 120.57ms
iter 72180: loss 7.2210, time 119.42ms
iter 72190: loss 7.8845, time 119.39ms
tensor(0.5937)
iter 72200: loss 6.0776, time 119.40ms
iter 72210: loss 6.4607, time 119.70ms
iter 72220: loss 7.7053, time 119.33ms
iter 72230: loss 7.2235, time 120.61ms
iter 72240: loss 6.9042, time 120.30ms
step 72250: train loss 6.2725, val loss 6.3018
saving checkpoint to out-shakespeare-char
iter 72250: loss 7.0901, time 2843.86ms
iter 72260: loss 7.3882, time 122.55ms
iter 72270: loss 6.3155, time 118.87ms
iter 72280: loss 6.1001, time 119.54ms
iter 72290: loss 6.8517, time 119.51ms
tensor(0.5627)
iter 72300: loss 6.8134, time 119.51ms
iter 72310: loss 6.4727, time 119.54ms
iter 72320: loss 6.9983, time 121.34ms
iter 72330: loss 6.8279, time 119.54ms
iter 72340: loss 7.4687, time 120.66ms
iter 72350: loss 6.4014, time 119.55ms
iter 72360: loss 6.8471, time 118.77ms
iter 72370: loss 7.0021, time 121.34ms
iter 72380: loss 6.3647, time 119.45ms
iter 72390: loss 7.4892, time 121.71ms
tensor(0.5314)
iter 72400: loss 7.5416, time 120.68ms
iter 72410: loss 7.5968, time 122.21ms
iter 72420: loss 7.9567, time 119.61ms
iter 72430: loss 6.7891, time 119.52ms
iter 72440: loss 6.9223, time 119.45ms
iter 72450: loss 6.9115, time 119.32ms
iter 72460: loss 6.5479, time 119.24ms
iter 72470: loss 7.1312, time 122.29ms
iter 72480: loss 7.3478, time 119.33ms
iter 72490: loss 6.8049, time 119.39ms
tensor(0.5000)
step 72500: train loss 6.3183, val loss 6.2864
saving checkpoint to out-shakespeare-char
iter 72500: loss 6.6199, time 2857.22ms
iter 72510: loss 7.1692, time 122.37ms
iter 72520: loss 6.8342, time 122.90ms
iter 72530: loss 6.5176, time 123.02ms
iter 72540: loss 6.7545, time 123.34ms
iter 72550: loss 6.6958, time 123.28ms
iter 72560: loss 7.1660, time 123.77ms
iter 72570: loss 6.5228, time 125.57ms
iter 72580: loss 6.8837, time 122.35ms
iter 72590: loss 6.6724, time 123.43ms
tensor(0.4686)
iter 72600: loss 7.1460, time 121.86ms
iter 72610: loss 6.6753, time 123.60ms
iter 72620: loss 7.0010, time 122.99ms
iter 72630: loss 6.9605, time 123.02ms
iter 72640: loss 6.7228, time 123.07ms
iter 72650: loss 6.7265, time 122.98ms
iter 72660: loss 6.1664, time 123.14ms
iter 72670: loss 8.2135, time 122.56ms
iter 72680: loss 6.9075, time 122.96ms
iter 72690: loss 6.9028, time 122.93ms
tensor(0.4373)
iter 72700: loss 6.9185, time 126.35ms
iter 72710: loss 6.5572, time 122.01ms
iter 72720: loss 6.2385, time 123.06ms
iter 72730: loss 7.1572, time 123.00ms
iter 72740: loss 7.2193, time 123.17ms
step 72750: train loss 6.2236, val loss 6.2651
saving checkpoint to out-shakespeare-char
iter 72750: loss 6.2600, time 2851.80ms
iter 72760: loss 6.6479, time 122.55ms
iter 72770: loss 6.7840, time 122.90ms
iter 72780: loss 6.4854, time 123.23ms
iter 72790: loss 6.8570, time 123.47ms
tensor(0.4063)
iter 72800: loss 6.7319, time 122.11ms
iter 72810: loss 5.9228, time 123.27ms
iter 72820: loss 7.0029, time 126.14ms
iter 72830: loss 6.9215, time 123.04ms
iter 72840: loss 7.0357, time 123.06ms
iter 72850: loss 7.1869, time 122.48ms
iter 72860: loss 6.8037, time 123.30ms
iter 72870: loss 7.0365, time 122.23ms
iter 72880: loss 6.6840, time 123.30ms
iter 72890: loss 6.6955, time 125.74ms
tensor(0.3757)
iter 72900: loss 6.5736, time 122.75ms
iter 72910: loss 6.1721, time 122.63ms
iter 72920: loss 6.5898, time 122.21ms
iter 72930: loss 6.1059, time 122.87ms
iter 72940: loss 7.2376, time 121.07ms
iter 72950: loss 7.2859, time 122.75ms
iter 72960: loss 6.7350, time 123.82ms
iter 72970: loss 6.8953, time 122.22ms
iter 72980: loss 6.8229, time 123.11ms
iter 72990: loss 7.2013, time 122.14ms
tensor(0.3455)
step 73000: train loss 6.2719, val loss 6.2325
saving checkpoint to out-shakespeare-char
iter 73000: loss 7.2883, time 2878.34ms
iter 73010: loss 7.4232, time 120.18ms
iter 73020: loss 6.9450, time 120.25ms
iter 73030: loss 7.0369, time 120.32ms
iter 73040: loss 6.2293, time 119.66ms
iter 73050: loss 7.1807, time 120.72ms
iter 73060: loss 6.7344, time 119.18ms
iter 73070: loss 6.6927, time 119.19ms
iter 73080: loss 6.4330, time 119.49ms
iter 73090: loss 6.1353, time 119.49ms
tensor(0.3159)
iter 73100: loss 6.7374, time 119.94ms
iter 73110: loss 7.0078, time 120.31ms
iter 73120: loss 7.0824, time 119.73ms
iter 73130: loss 6.3717, time 121.40ms
iter 73140: loss 6.6452, time 119.24ms
iter 73150: loss 6.7939, time 119.27ms
iter 73160: loss 6.9232, time 118.43ms
iter 73170: loss 8.0879, time 120.03ms
iter 73180: loss 6.7318, time 119.77ms
iter 73190: loss 6.9754, time 122.26ms
tensor(0.2871)
iter 73200: loss 6.5362, time 119.50ms
iter 73210: loss 6.2593, time 119.33ms
iter 73220: loss 6.9687, time 121.21ms
iter 73230: loss 6.8303, time 120.14ms
iter 73240: loss 6.7985, time 119.23ms
step 73250: train loss 6.1959, val loss 6.1754
saving checkpoint to out-shakespeare-char
iter 73250: loss 6.4486, time 2877.69ms
iter 73260: loss 6.3619, time 124.02ms
iter 73270: loss 6.5094, time 122.30ms
iter 73280: loss 6.4627, time 123.24ms
iter 73290: loss 7.0466, time 123.31ms
tensor(0.2591)
iter 73300: loss 7.4212, time 122.48ms
iter 73310: loss 6.8159, time 122.62ms
iter 73320: loss 6.7531, time 123.14ms
iter 73330: loss 6.3857, time 126.05ms
iter 73340: loss 6.3393, time 123.35ms
iter 73350: loss 6.7006, time 123.23ms
iter 73360: loss 6.7744, time 123.44ms
iter 73370: loss 6.7452, time 123.11ms
iter 73380: loss 6.4806, time 123.32ms
iter 73390: loss 6.4581, time 123.64ms
tensor(0.2321)
iter 73400: loss 6.8883, time 126.67ms
iter 73410: loss 7.6627, time 123.28ms
iter 73420: loss 6.4746, time 123.25ms
iter 73430: loss 7.0141, time 123.29ms
iter 73440: loss 6.8987, time 122.49ms
iter 73450: loss 6.1478, time 123.09ms
iter 73460: loss 6.8022, time 123.27ms
iter 73470: loss 6.3359, time 123.08ms
iter 73480: loss 7.5655, time 121.86ms
iter 73490: loss 5.9930, time 122.52ms
tensor(0.2061)
step 73500: train loss 6.1731, val loss 6.1421
saving checkpoint to out-shakespeare-char
iter 73500: loss 7.0975, time 2845.64ms
iter 73510: loss 6.1635, time 122.54ms
iter 73520: loss 6.6167, time 124.50ms
iter 73530: loss 6.4589, time 122.91ms
iter 73540: loss 6.4116, time 122.69ms
iter 73550: loss 6.5140, time 122.77ms
iter 73560: loss 7.0715, time 122.62ms
iter 73570: loss 6.6256, time 122.79ms
iter 73580: loss 7.1942, time 122.77ms
iter 73590: loss 6.8965, time 125.62ms
tensor(0.1813)
iter 73600: loss 6.9364, time 122.56ms
iter 73610: loss 7.0449, time 122.74ms
iter 73620: loss 6.9526, time 122.65ms
iter 73630: loss 7.2331, time 122.85ms
iter 73640: loss 6.4990, time 122.83ms
iter 73650: loss 6.2707, time 122.53ms
iter 73660: loss 6.7445, time 126.14ms
iter 73670: loss 6.9242, time 122.69ms
iter 73680: loss 6.9631, time 122.98ms
iter 73690: loss 6.0959, time 122.75ms
tensor(0.1577)
iter 73700: loss 6.9521, time 122.01ms
iter 73710: loss 6.9575, time 123.02ms
iter 73720: loss 6.4450, time 122.69ms
iter 73730: loss 7.2386, time 125.88ms
iter 73740: loss 6.5114, time 122.61ms
step 73750: train loss 6.1474, val loss 6.2273
saving checkpoint to out-shakespeare-char
iter 73750: loss 7.1087, time 2855.89ms
iter 73760: loss 6.7096, time 125.55ms
iter 73770: loss 7.0077, time 121.80ms
iter 73780: loss 6.7457, time 122.70ms
iter 73790: loss 7.5470, time 122.09ms
tensor(0.1355)
iter 73800: loss 7.0993, time 122.92ms
iter 73810: loss 7.4191, time 123.38ms
iter 73820: loss 7.2999, time 122.85ms
iter 73830: loss 6.5725, time 125.48ms
iter 73840: loss 6.9963, time 122.75ms
iter 73850: loss 6.2687, time 122.78ms
iter 73860: loss 6.6359, time 122.23ms
iter 73870: loss 7.2102, time 122.53ms
iter 73880: loss 6.8464, time 122.96ms
iter 73890: loss 6.5783, time 122.91ms
tensor(0.1147)
iter 73900: loss 7.1771, time 122.39ms
iter 73910: loss 7.5264, time 126.10ms
iter 73920: loss 7.0333, time 121.93ms
iter 73930: loss 6.4365, time 122.64ms
iter 73940: loss 5.5327, time 122.82ms
iter 73950: loss 6.4012, time 123.28ms
iter 73960: loss 6.6653, time 123.26ms
iter 73970: loss 6.5626, time 123.43ms
iter 73980: loss 6.2750, time 123.02ms
iter 73990: loss 7.0973, time 122.50ms
tensor(0.0955)
step 74000: train loss 6.1445, val loss 6.1554
saving checkpoint to out-shakespeare-char
iter 74000: loss 6.6395, time 2878.92ms
iter 74010: loss 6.7989, time 122.98ms
iter 74020: loss 7.0081, time 126.15ms
iter 74030: loss 6.9357, time 123.23ms
iter 74040: loss 6.5694, time 122.87ms
iter 74050: loss 7.2597, time 123.15ms
iter 74060: loss 7.0399, time 123.48ms
iter 74070: loss 6.4144, time 122.78ms
iter 74080: loss 8.1040, time 122.68ms
iter 74090: loss 6.7577, time 126.03ms
tensor(0.0778)
iter 74100: loss 7.0770, time 122.36ms
iter 74110: loss 6.2926, time 122.47ms
iter 74120: loss 6.8069, time 122.49ms
iter 74130: loss 6.7158, time 122.71ms
iter 74140: loss 6.6550, time 121.61ms
iter 74150: loss 6.8302, time 122.57ms
iter 74160: loss 6.9757, time 125.45ms
iter 74170: loss 6.6681, time 122.28ms
iter 74180: loss 6.5319, time 122.54ms
iter 74190: loss 6.7747, time 120.66ms
tensor(0.0618)
iter 74200: loss 7.2552, time 120.50ms
iter 74210: loss 6.8134, time 119.78ms
iter 74220: loss 6.7689, time 122.15ms
iter 74230: loss 6.4663, time 119.89ms
iter 74240: loss 7.3949, time 122.22ms
step 74250: train loss 6.1088, val loss 6.0414
saving checkpoint to out-shakespeare-char
iter 74250: loss 6.3155, time 2885.03ms
iter 74260: loss 7.1467, time 120.96ms
iter 74270: loss 6.9386, time 119.79ms
iter 74280: loss 6.3468, time 122.48ms
iter 74290: loss 7.3937, time 119.68ms
tensor(0.0476)
iter 74300: loss 6.1416, time 119.62ms
iter 74310: loss 6.4321, time 119.70ms
iter 74320: loss 6.9102, time 120.90ms
iter 74330: loss 6.1306, time 119.69ms
iter 74340: loss 6.5264, time 120.88ms
iter 74350: loss 7.0136, time 120.72ms
iter 74360: loss 7.7206, time 119.54ms
iter 74370: loss 6.8182, time 120.81ms
iter 74380: loss 6.6562, time 119.65ms
iter 74390: loss 6.7245, time 121.68ms
tensor(0.0351)
iter 74400: loss 7.5016, time 119.66ms
iter 74410: loss 7.1614, time 121.32ms
iter 74420: loss 6.6712, time 119.77ms
iter 74430: loss 7.0816, time 119.66ms
iter 74440: loss 6.6291, time 120.83ms
iter 74450: loss 7.1236, time 121.08ms
iter 74460: loss 6.5525, time 119.69ms
iter 74470: loss 6.4979, time 121.16ms
iter 74480: loss 7.3814, time 119.70ms
iter 74490: loss 6.8751, time 119.66ms
tensor(0.0245)
step 74500: train loss 6.0701, val loss 6.0740
saving checkpoint to out-shakespeare-char
iter 74500: loss 5.9566, time 2878.49ms
iter 74510: loss 6.5664, time 122.92ms
iter 74520: loss 6.4655, time 123.15ms
iter 74530: loss 7.1145, time 123.00ms
iter 74540: loss 6.6660, time 122.67ms
iter 74550: loss 6.7063, time 122.80ms
iter 74560: loss 5.7562, time 124.50ms
iter 74570: loss 6.0946, time 122.18ms
iter 74580: loss 6.1169, time 122.97ms
iter 74590: loss 6.8964, time 123.74ms
tensor(0.0157)
iter 74600: loss 6.5458, time 122.99ms
iter 74610: loss 6.3335, time 122.79ms
iter 74620: loss 5.9595, time 122.71ms
iter 74630: loss 6.1274, time 121.69ms
iter 74640: loss 6.5747, time 122.56ms
iter 74650: loss 6.6603, time 122.68ms
iter 74660: loss 6.7628, time 122.48ms
iter 74670: loss 6.3623, time 122.58ms
iter 74680: loss 6.7068, time 122.60ms
iter 74690: loss 7.2680, time 123.95ms
tensor(0.0089)
iter 74700: loss 5.9728, time 122.69ms
iter 74710: loss 6.8137, time 122.69ms
iter 74720: loss 7.4177, time 125.44ms
iter 74730: loss 6.6793, time 122.55ms
iter 74740: loss 5.7874, time 122.71ms
step 74750: train loss 6.1439, val loss 6.0621
saving checkpoint to out-shakespeare-char
iter 74750: loss 7.1622, time 2849.98ms
iter 74760: loss 6.7676, time 122.29ms
iter 74770: loss 6.7334, time 122.69ms
iter 74780: loss 6.5565, time 122.93ms
iter 74790: loss 6.2590, time 122.41ms
tensor(0.0039)
iter 74800: loss 6.3665, time 125.49ms
iter 74810: loss 6.7559, time 121.59ms
iter 74820: loss 6.6881, time 122.45ms
iter 74830: loss 6.1863, time 122.51ms
iter 74840: loss 6.3681, time 122.58ms
iter 74850: loss 6.2201, time 122.58ms
iter 74860: loss 6.9899, time 122.63ms
iter 74870: loss 5.8386, time 125.64ms
iter 74880: loss 6.5784, time 121.37ms
iter 74890: loss 6.5716, time 122.71ms
tensor(0.0010)
iter 74900: loss 7.0540, time 122.71ms
iter 74910: loss 6.9065, time 122.44ms
iter 74920: loss 6.2850, time 121.26ms
iter 74930: loss 6.4383, time 122.53ms
iter 74940: loss 6.2162, time 121.97ms
iter 74950: loss 7.0447, time 125.57ms
iter 74960: loss 6.4350, time 122.48ms
iter 74970: loss 6.4211, time 122.45ms
iter 74980: loss 6.3888, time 122.53ms
iter 74990: loss 6.9469, time 122.48ms
tensor(0.0010)
step 75000: train loss 6.1373, val loss 6.1169
saving checkpoint to out-shakespeare-char
iter 75000: loss 7.2886, time 2857.38ms
iter 75010: loss 6.1931, time 122.71ms
iter 75020: loss 6.9217, time 123.53ms
iter 75030: loss 7.2646, time 122.86ms
iter 75040: loss 6.5150, time 122.82ms
iter 75050: loss 6.4115, time 125.42ms
iter 75060: loss 6.9930, time 121.17ms
iter 75070: loss 6.2408, time 122.58ms
iter 75080: loss 6.4679, time 122.59ms
iter 75090: loss 6.8229, time 121.61ms
tensor(0.0010)
iter 75100: loss 6.4688, time 122.44ms
iter 75110: loss 6.5643, time 122.38ms
iter 75120: loss 6.9668, time 125.26ms
iter 75130: loss 6.5293, time 122.47ms
iter 75140: loss 7.1996, time 122.53ms
iter 75150: loss 6.0951, time 122.54ms
iter 75160: loss 6.4584, time 122.59ms
iter 75170: loss 6.8592, time 122.68ms
iter 75180: loss 6.4057, time 122.53ms
iter 75190: loss 5.8599, time 124.55ms
tensor(0.0039)
iter 75200: loss 6.5180, time 123.89ms
iter 75210: loss 6.9593, time 122.63ms
iter 75220: loss 5.7981, time 121.90ms
iter 75230: loss 7.2960, time 122.59ms
iter 75240: loss 7.4457, time 122.64ms
step 75250: train loss 6.0707, val loss 6.0711
saving checkpoint to out-shakespeare-char
iter 75250: loss 6.5038, time 2853.77ms
iter 75260: loss 7.4836, time 123.02ms
iter 75270: loss 7.1087, time 122.51ms
iter 75280: loss 6.8126, time 125.76ms
iter 75290: loss 6.3084, time 122.38ms
tensor(0.0089)
iter 75300: loss 6.6459, time 122.92ms
iter 75310: loss 6.7264, time 122.45ms
iter 75320: loss 6.8013, time 122.53ms
iter 75330: loss 6.7438, time 123.06ms
iter 75340: loss 6.6706, time 123.41ms
iter 75350: loss 6.4869, time 122.80ms
iter 75360: loss 6.4419, time 125.68ms
iter 75370: loss 5.8609, time 122.74ms
iter 75380: loss 6.6754, time 123.20ms
iter 75390: loss 6.6920, time 122.81ms
tensor(0.0157)
iter 75400: loss 6.6600, time 123.14ms
iter 75410: loss 6.7482, time 122.80ms
iter 75420: loss 7.0902, time 122.74ms
iter 75430: loss 6.5097, time 122.69ms
iter 75440: loss 6.4107, time 122.95ms
iter 75450: loss 6.4579, time 122.98ms
iter 75460: loss 6.5221, time 122.19ms
iter 75470: loss 6.8885, time 126.03ms
iter 75480: loss 6.3720, time 122.66ms
iter 75490: loss 6.8908, time 121.90ms
tensor(0.0245)
step 75500: train loss 6.0808, val loss 6.0904
saving checkpoint to out-shakespeare-char
iter 75500: loss 6.8196, time 2871.84ms
iter 75510: loss 6.7949, time 125.47ms
iter 75520: loss 6.6899, time 123.14ms
iter 75530: loss 6.3556, time 123.30ms
iter 75540: loss 5.9480, time 123.41ms
iter 75550: loss 6.5135, time 123.38ms
iter 75560: loss 6.1918, time 123.34ms
iter 75570: loss 6.4901, time 123.16ms
iter 75580: loss 6.3389, time 119.38ms
iter 75590: loss 6.3053, time 120.58ms
tensor(0.0351)
iter 75600: loss 6.6385, time 118.40ms
iter 75610: loss 6.8626, time 121.96ms
iter 75620: loss 6.5514, time 119.42ms
iter 75630: loss 6.2760, time 119.64ms
iter 75640: loss 7.4062, time 119.43ms
iter 75650: loss 6.9388, time 120.47ms
iter 75660: loss 7.2457, time 119.27ms
iter 75670: loss 6.8012, time 119.96ms
iter 75680: loss 6.6855, time 119.25ms
iter 75690: loss 7.3163, time 120.27ms
tensor(0.0476)
iter 75700: loss 6.6747, time 119.07ms
iter 75710: loss 6.7095, time 120.43ms
iter 75720: loss 6.8695, time 119.33ms
iter 75730: loss 7.7314, time 119.31ms
iter 75740: loss 6.5098, time 119.26ms
step 75750: train loss 6.0918, val loss 6.0825
saving checkpoint to out-shakespeare-char
iter 75750: loss 6.8888, time 2863.08ms
iter 75760: loss 6.9750, time 123.13ms
iter 75770: loss 7.3588, time 119.51ms
iter 75780: loss 7.1435, time 119.41ms
iter 75790: loss 6.0935, time 119.70ms
tensor(0.0618)
iter 75800: loss 6.3444, time 119.59ms
iter 75810: loss 7.1591, time 119.98ms
iter 75820: loss 6.3498, time 121.09ms
iter 75830: loss 5.5936, time 119.49ms
iter 75840: loss 6.6866, time 119.53ms
iter 75850: loss 6.6510, time 119.60ms
iter 75860: loss 6.1714, time 119.89ms
iter 75870: loss 7.2413, time 119.50ms
iter 75880: loss 6.7088, time 122.36ms
iter 75890: loss 6.1371, time 119.38ms
tensor(0.0778)
iter 75900: loss 6.5887, time 119.48ms
iter 75910: loss 7.2300, time 119.01ms
iter 75920: loss 6.9774, time 122.62ms
iter 75930: loss 6.2072, time 119.55ms
iter 75940: loss 6.6590, time 120.29ms
iter 75950: loss 7.2616, time 119.05ms
iter 75960: loss 6.1564, time 123.05ms
iter 75970: loss 7.1094, time 120.66ms
iter 75980: loss 7.0606, time 120.69ms
iter 75990: loss 7.1515, time 120.62ms
tensor(0.0955)
step 76000: train loss 6.1219, val loss 6.0196
saving checkpoint to out-shakespeare-char
iter 76000: loss 7.2371, time 2864.28ms
iter 76010: loss 6.8317, time 122.82ms
iter 76020: loss 6.1687, time 125.47ms
iter 76030: loss 6.0770, time 122.92ms
iter 76040: loss 6.7103, time 122.69ms
iter 76050: loss 6.6056, time 122.71ms
iter 76060: loss 6.5792, time 122.81ms
iter 76070: loss 6.8679, time 122.66ms
iter 76080: loss 6.0924, time 123.32ms
iter 76090: loss 5.8361, time 125.65ms
tensor(0.1147)
iter 76100: loss 7.8521, time 122.11ms
iter 76110: loss 6.8691, time 122.77ms
iter 76120: loss 6.6004, time 122.83ms
iter 76130: loss 7.0630, time 122.68ms
iter 76140: loss 6.5714, time 122.72ms
iter 76150: loss 7.2124, time 122.94ms
iter 76160: loss 6.9576, time 123.22ms
iter 76170: loss 6.7319, time 122.87ms
iter 76180: loss 7.6053, time 123.46ms
iter 76190: loss 6.5706, time 125.39ms
tensor(0.1355)
iter 76200: loss 6.4509, time 122.14ms
iter 76210: loss 6.3613, time 123.10ms
iter 76220: loss 6.1328, time 123.28ms
iter 76230: loss 6.4252, time 123.01ms
iter 76240: loss 7.5530, time 123.52ms
step 76250: train loss 6.1260, val loss 6.1274
saving checkpoint to out-shakespeare-char
iter 76250: loss 7.1208, time 2856.22ms
iter 76260: loss 7.1657, time 122.44ms
iter 76270: loss 7.2583, time 122.47ms
iter 76280: loss 5.7426, time 124.52ms
iter 76290: loss 6.8308, time 121.65ms
tensor(0.1577)
iter 76300: loss 7.2949, time 122.68ms
iter 76310: loss 6.9691, time 122.60ms
iter 76320: loss 6.8088, time 121.51ms
iter 76330: loss 6.7148, time 122.01ms
iter 76340: loss 6.7959, time 122.38ms
iter 76350: loss 6.7932, time 125.03ms
iter 76360: loss 7.1217, time 121.97ms
iter 76370: loss 7.4971, time 121.65ms
iter 76380: loss 6.0638, time 122.39ms
iter 76390: loss 6.9406, time 122.51ms
tensor(0.1813)
iter 76400: loss 6.6287, time 122.03ms
iter 76410: loss 6.7326, time 121.56ms
iter 76420: loss 6.9459, time 125.47ms
iter 76430: loss 6.8260, time 123.04ms
iter 76440: loss 6.6255, time 122.89ms
iter 76450: loss 6.7319, time 121.91ms
iter 76460: loss 6.8233, time 122.32ms
iter 76470: loss 6.6798, time 122.45ms
iter 76480: loss 7.0225, time 122.51ms
iter 76490: loss 6.9361, time 124.63ms
tensor(0.2061)
step 76500: train loss 6.1092, val loss 6.1332
saving checkpoint to out-shakespeare-char
iter 76500: loss 6.7276, time 2853.95ms
iter 76510: loss 7.5558, time 121.70ms
iter 76520: loss 6.6965, time 121.42ms
iter 76530: loss 6.4863, time 122.23ms
iter 76540: loss 6.6654, time 122.38ms
iter 76550: loss 6.6774, time 122.72ms
iter 76560: loss 6.5229, time 121.59ms
iter 76570: loss 6.9342, time 124.47ms
iter 76580: loss 6.7149, time 121.56ms
iter 76590: loss 7.2945, time 121.91ms
tensor(0.2321)
iter 76600: loss 6.2771, time 122.25ms
iter 76610: loss 6.8919, time 121.11ms
iter 76620: loss 6.5386, time 122.22ms
iter 76630: loss 7.0256, time 122.67ms
iter 76640: loss 6.4633, time 125.42ms
iter 76650: loss 7.4026, time 121.47ms
iter 76660: loss 6.8554, time 121.50ms
iter 76670: loss 7.0458, time 121.62ms
iter 76680: loss 6.6325, time 122.81ms
iter 76690: loss 6.7986, time 121.67ms
tensor(0.2591)
iter 76700: loss 6.3512, time 121.84ms
iter 76710: loss 6.7502, time 125.38ms
iter 76720: loss 7.2397, time 122.42ms
iter 76730: loss 6.9457, time 121.37ms
iter 76740: loss 6.5193, time 121.95ms
step 76750: train loss 6.0965, val loss 6.1761
saving checkpoint to out-shakespeare-char
iter 76750: loss 7.5166, time 2868.14ms
iter 76760: loss 6.9571, time 122.17ms
iter 76770: loss 6.6957, time 122.50ms
iter 76780: loss 7.3397, time 122.61ms
iter 76790: loss 8.0019, time 122.61ms
tensor(0.2871)
iter 76800: loss 7.1872, time 122.44ms
iter 76810: loss 7.0219, time 122.56ms
iter 76820: loss 6.4156, time 123.57ms
iter 76830: loss 6.1716, time 121.74ms
iter 76840: loss 6.5488, time 124.69ms
iter 76850: loss 6.5943, time 122.52ms
iter 76860: loss 6.8973, time 122.14ms
iter 76870: loss 6.7215, time 122.74ms
iter 76880: loss 6.6315, time 123.78ms
iter 76890: loss 7.2263, time 122.27ms
tensor(0.3159)
iter 76900: loss 7.7173, time 123.19ms
iter 76910: loss 6.6565, time 126.26ms
iter 76920: loss 6.7558, time 122.81ms
iter 76930: loss 6.5815, time 122.45ms
iter 76940: loss 6.4629, time 122.55ms
iter 76950: loss 7.1683, time 122.70ms
iter 76960: loss 6.5916, time 122.28ms
iter 76970: loss 6.7368, time 122.20ms
iter 76980: loss 6.7853, time 125.79ms
iter 76990: loss 8.2134, time 122.75ms
tensor(0.3455)
step 77000: train loss 6.2156, val loss 6.1825
saving checkpoint to out-shakespeare-char
iter 77000: loss 7.3499, time 2848.06ms
iter 77010: loss 6.7906, time 126.02ms
iter 77020: loss 7.2799, time 121.60ms
iter 77030: loss 7.2920, time 121.77ms
iter 77040: loss 6.7458, time 122.23ms
iter 77050: loss 6.9863, time 122.67ms
iter 77060: loss 6.8818, time 121.81ms
iter 77070: loss 7.2820, time 122.37ms
iter 77080: loss 7.1088, time 126.06ms
iter 77090: loss 6.8574, time 121.78ms
tensor(0.3757)
iter 77100: loss 7.0366, time 121.94ms
iter 77110: loss 6.9354, time 121.74ms
iter 77120: loss 6.5387, time 122.22ms
iter 77130: loss 6.7970, time 121.66ms
iter 77140: loss 6.2954, time 122.35ms
iter 77150: loss 6.1721, time 125.04ms
iter 77160: loss 6.5874, time 122.60ms
iter 77170: loss 7.2802, time 122.15ms
iter 77180: loss 7.1600, time 121.90ms
iter 77190: loss 6.7123, time 121.93ms
tensor(0.4063)
iter 77200: loss 7.0736, time 122.14ms
iter 77210: loss 6.4132, time 122.00ms
iter 77220: loss 6.9391, time 125.03ms
iter 77230: loss 6.4580, time 122.27ms
iter 77240: loss 7.4547, time 121.88ms
step 77250: train loss 6.1655, val loss 6.2442
saving checkpoint to out-shakespeare-char
iter 77250: loss 6.7565, time 2891.76ms
iter 77260: loss 7.0810, time 125.78ms
iter 77270: loss 7.3912, time 122.02ms
iter 77280: loss 6.4713, time 122.02ms
iter 77290: loss 6.5186, time 122.01ms
tensor(0.4373)
iter 77300: loss 6.9959, time 122.07ms
iter 77310: loss 7.1072, time 122.21ms
iter 77320: loss 7.6015, time 124.07ms
iter 77330: loss 6.6267, time 124.36ms
iter 77340: loss 6.6466, time 122.21ms
iter 77350: loss 6.5745, time 122.00ms
iter 77360: loss 7.1773, time 122.37ms
iter 77370: loss 6.1522, time 122.65ms
iter 77380: loss 6.4352, time 121.92ms
iter 77390: loss 6.0982, time 122.31ms
tensor(0.4686)
iter 77400: loss 7.2316, time 124.91ms
iter 77410: loss 7.0082, time 122.30ms
iter 77420: loss 7.1310, time 122.79ms
iter 77430: loss 6.9160, time 122.24ms
iter 77440: loss 7.5128, time 122.11ms
iter 77450: loss 6.6941, time 122.80ms
iter 77460: loss 5.6037, time 122.36ms
iter 77470: loss 6.0876, time 122.36ms
iter 77480: loss 7.5352, time 124.95ms
iter 77490: loss 7.3526, time 122.23ms
tensor(0.5000)
step 77500: train loss 6.2333, val loss 6.3009
saving checkpoint to out-shakespeare-char
iter 77500: loss 6.6588, time 2831.00ms
iter 77510: loss 7.6811, time 123.47ms
iter 77520: loss 7.8295, time 123.43ms
iter 77530: loss 6.9515, time 127.14ms
iter 77540: loss 6.5098, time 123.30ms
iter 77550: loss 6.5843, time 123.06ms
iter 77560: loss 6.7015, time 123.06ms
iter 77570: loss 6.8987, time 123.90ms
iter 77580: loss 7.3181, time 122.98ms
iter 77590: loss 6.7378, time 123.09ms
tensor(0.5314)
iter 77600: loss 7.3118, time 122.86ms
iter 77610: loss 7.4012, time 122.75ms
iter 77620: loss 6.6682, time 123.17ms
iter 77630: loss 7.3008, time 125.74ms
iter 77640: loss 6.7008, time 122.68ms
iter 77650: loss 6.7768, time 122.83ms
iter 77660: loss 6.8688, time 123.12ms
iter 77670: loss 7.8504, time 122.81ms
iter 77680: loss 6.4486, time 123.24ms
iter 77690: loss 7.0860, time 122.77ms
tensor(0.5627)
iter 77700: loss 6.9933, time 125.92ms
iter 77710: loss 6.8120, time 122.97ms
iter 77720: loss 6.6637, time 122.98ms
iter 77730: loss 7.1313, time 122.63ms
iter 77740: loss 7.1739, time 123.01ms
step 77750: train loss 6.3752, val loss 6.3292
saving checkpoint to out-shakespeare-char
iter 77750: loss 6.3138, time 2839.90ms
iter 77760: loss 6.9836, time 122.92ms
iter 77770: loss 6.3499, time 122.77ms
iter 77780: loss 6.6896, time 123.03ms
iter 77790: loss 6.7660, time 122.79ms
tensor(0.5937)
iter 77800: loss 6.8231, time 125.10ms
iter 77810: loss 7.1448, time 123.61ms
iter 77820: loss 6.7602, time 122.83ms
iter 77830: loss 6.6357, time 122.88ms
iter 77840: loss 6.9875, time 121.83ms
iter 77850: loss 6.7457, time 123.50ms
iter 77860: loss 6.8133, time 122.92ms
iter 77870: loss 6.6199, time 123.15ms
iter 77880: loss 7.0480, time 125.75ms
iter 77890: loss 7.2332, time 122.93ms
tensor(0.6243)
iter 77900: loss 7.8417, time 122.97ms
iter 77910: loss 7.1047, time 122.93ms
iter 77920: loss 7.0260, time 123.47ms
iter 77930: loss 6.9478, time 122.84ms
iter 77940: loss 7.0232, time 122.98ms
iter 77950: loss 6.5753, time 125.06ms
iter 77960: loss 7.1289, time 122.47ms
iter 77970: loss 6.2030, time 122.41ms
iter 77980: loss 6.8679, time 122.86ms
iter 77990: loss 7.2239, time 122.42ms
tensor(0.6545)
step 78000: train loss 6.2718, val loss 6.3404
saving checkpoint to out-shakespeare-char
iter 78000: loss 7.1031, time 2842.11ms
iter 78010: loss 5.9206, time 123.16ms
iter 78020: loss 6.9616, time 123.10ms
iter 78030: loss 6.9462, time 122.94ms
iter 78040: loss 6.8403, time 126.17ms
iter 78050: loss 6.7149, time 122.78ms
iter 78060: loss 6.9933, time 123.39ms
iter 78070: loss 7.2406, time 123.04ms
iter 78080: loss 7.3066, time 123.38ms
iter 78090: loss 6.3106, time 122.98ms
tensor(0.6841)
iter 78100: loss 6.8379, time 123.24ms
iter 78110: loss 6.7886, time 123.60ms
iter 78120: loss 6.9288, time 122.87ms
iter 78130: loss 6.5218, time 123.34ms
iter 78140: loss 6.7192, time 123.47ms
iter 78150: loss 6.5232, time 123.03ms
iter 78160: loss 6.9882, time 123.13ms
iter 78170: loss 6.9489, time 126.91ms
iter 78180: loss 7.0256, time 123.59ms
iter 78190: loss 6.8972, time 123.25ms
tensor(0.7129)
iter 78200: loss 6.8013, time 123.04ms
iter 78210: loss 7.0874, time 124.00ms
iter 78220: loss 7.1184, time 123.06ms
iter 78230: loss 6.8429, time 122.88ms
iter 78240: loss 7.1068, time 125.96ms
step 78250: train loss 6.4616, val loss 6.4217
saving checkpoint to out-shakespeare-char
iter 78250: loss 7.0986, time 2833.37ms
iter 78260: loss 7.0196, time 122.22ms
iter 78270: loss 7.1626, time 122.99ms
iter 78280: loss 6.6146, time 123.25ms
iter 78290: loss 6.7509, time 122.90ms
tensor(0.7409)
iter 78300: loss 7.1634, time 122.69ms
iter 78310: loss 6.8079, time 122.77ms
iter 78320: loss 6.4375, time 122.61ms
iter 78330: loss 7.3071, time 125.43ms
iter 78340: loss 6.5146, time 122.94ms
iter 78350: loss 7.0591, time 122.54ms
iter 78360: loss 6.6591, time 122.55ms
iter 78370: loss 6.5420, time 122.62ms
iter 78380: loss 6.4742, time 122.75ms
iter 78390: loss 7.0923, time 122.71ms
tensor(0.7679)
iter 78400: loss 6.6476, time 123.77ms
iter 78410: loss 7.0136, time 125.75ms
iter 78420: loss 7.2438, time 122.48ms
iter 78430: loss 7.2301, time 122.58ms
iter 78440: loss 7.6068, time 122.67ms
iter 78450: loss 7.0843, time 122.69ms
iter 78460: loss 7.1945, time 122.72ms
iter 78470: loss 7.6127, time 123.50ms
iter 78480: loss 6.9252, time 125.77ms
iter 78490: loss 7.2704, time 122.97ms
tensor(0.7939)
step 78500: train loss 6.4272, val loss 6.3824
saving checkpoint to out-shakespeare-char
iter 78500: loss 6.5069, time 2851.11ms
iter 78510: loss 7.1115, time 121.57ms
iter 78520: loss 7.7957, time 122.89ms
iter 78530: loss 7.2747, time 122.86ms
iter 78540: loss 6.5985, time 122.69ms
iter 78550: loss 7.1078, time 122.69ms
iter 78560: loss 7.2143, time 125.52ms
iter 78570: loss 6.5082, time 122.47ms
iter 78580: loss 6.9211, time 122.51ms
iter 78590: loss 7.2673, time 122.80ms
tensor(0.8187)
iter 78600: loss 6.6334, time 122.58ms
iter 78610: loss 7.3827, time 123.03ms
iter 78620: loss 7.2184, time 122.09ms
iter 78630: loss 6.3271, time 125.66ms
iter 78640: loss 6.9053, time 122.15ms
iter 78650: loss 6.6967, time 122.72ms
iter 78660: loss 6.5616, time 122.59ms
iter 78670: loss 6.8593, time 122.63ms
iter 78680: loss 7.7301, time 122.84ms
iter 78690: loss 7.0517, time 122.23ms
tensor(0.8423)
iter 78700: loss 6.8236, time 126.13ms
iter 78710: loss 6.2628, time 122.52ms
iter 78720: loss 7.1461, time 122.32ms
iter 78730: loss 6.6057, time 122.58ms
iter 78740: loss 6.4507, time 122.39ms
step 78750: train loss 6.4330, val loss 6.4367
saving checkpoint to out-shakespeare-char
iter 78750: loss 7.2741, time 2843.76ms
iter 78760: loss 7.6215, time 122.64ms
iter 78770: loss 7.8549, time 122.46ms
iter 78780: loss 7.2933, time 123.44ms
iter 78790: loss 6.9096, time 122.61ms
tensor(0.8645)
iter 78800: loss 7.0317, time 122.23ms
iter 78810: loss 6.4830, time 125.01ms
iter 78820: loss 6.8605, time 122.44ms
iter 78830: loss 7.1931, time 122.55ms
iter 78840: loss 7.0067, time 122.79ms
iter 78850: loss 6.6032, time 122.18ms
iter 78860: loss 7.4475, time 122.50ms
iter 78870: loss 6.8065, time 122.75ms
iter 78880: loss 6.8573, time 125.60ms
iter 78890: loss 7.8159, time 122.45ms
tensor(0.8853)
iter 78900: loss 7.0073, time 123.82ms
iter 78910: loss 7.0179, time 122.85ms
iter 78920: loss 6.8173, time 122.51ms
iter 78930: loss 7.2039, time 122.48ms
iter 78940: loss 6.6326, time 122.37ms
iter 78950: loss 6.9796, time 124.77ms
iter 78960: loss 7.8899, time 122.55ms
iter 78970: loss 7.3706, time 122.45ms
iter 78980: loss 6.8296, time 122.47ms
iter 78990: loss 7.0204, time 123.42ms
tensor(0.9045)
step 79000: train loss 6.5180, val loss 6.4403
saving checkpoint to out-shakespeare-char
iter 79000: loss 7.4830, time 2856.01ms
iter 79010: loss 7.7572, time 122.66ms
iter 79020: loss 6.6629, time 122.57ms
iter 79030: loss 7.5752, time 125.56ms
iter 79040: loss 7.5603, time 124.07ms
iter 79050: loss 7.2187, time 122.63ms
iter 79060: loss 7.3595, time 121.86ms
iter 79070: loss 7.1416, time 124.94ms
iter 79080: loss 6.9094, time 122.51ms
iter 79090: loss 6.7445, time 122.35ms
tensor(0.9222)
iter 79100: loss 6.5969, time 123.49ms
iter 79110: loss 7.2825, time 122.62ms
iter 79120: loss 7.0385, time 123.31ms
iter 79130: loss 6.9972, time 126.04ms
iter 79140: loss 6.6174, time 123.68ms
iter 79150: loss 6.1780, time 123.04ms
iter 79160: loss 6.9432, time 122.74ms
iter 79170: loss 7.2158, time 122.98ms
iter 79180: loss 6.6320, time 122.93ms
iter 79190: loss 7.2272, time 122.96ms
tensor(0.9382)
iter 79200: loss 6.7564, time 122.89ms
iter 79210: loss 6.5766, time 126.03ms
iter 79220: loss 6.7390, time 122.86ms
iter 79230: loss 5.9735, time 122.11ms
iter 79240: loss 8.0545, time 123.19ms
step 79250: train loss 6.4964, val loss 6.4438
saving checkpoint to out-shakespeare-char
iter 79250: loss 7.0106, time 2845.63ms
iter 79260: loss 7.4523, time 123.22ms
iter 79270: loss 6.9766, time 122.98ms
iter 79280: loss 7.4256, time 122.92ms
iter 79290: loss 6.8454, time 123.65ms
tensor(0.9524)
iter 79300: loss 6.7225, time 119.22ms
iter 79310: loss 7.2195, time 119.36ms
iter 79320: loss 6.6422, time 120.57ms
iter 79330: loss 6.7327, time 120.38ms
iter 79340: loss 7.1952, time 120.06ms
iter 79350: loss 7.6807, time 119.17ms
iter 79360: loss 6.7307, time 119.19ms
iter 79370: loss 6.4257, time 119.10ms
iter 79380: loss 7.3791, time 120.38ms
iter 79390: loss 7.8294, time 119.06ms
tensor(0.9649)
iter 79400: loss 7.3971, time 119.09ms
iter 79410: loss 6.5521, time 120.68ms
iter 79420: loss 6.5505, time 119.51ms
iter 79430: loss 6.8781, time 118.06ms
iter 79440: loss 7.3071, time 120.69ms
iter 79450: loss 7.2211, time 120.30ms
iter 79460: loss 7.1493, time 120.93ms
iter 79470: loss 7.4869, time 120.04ms
iter 79480: loss 8.0190, time 120.63ms
iter 79490: loss 7.4291, time 120.16ms
tensor(0.9755)
step 79500: train loss 6.4998, val loss 6.5167
saving checkpoint to out-shakespeare-char
iter 79500: loss 7.3422, time 2869.82ms
iter 79510: loss 7.5654, time 122.39ms
iter 79520: loss 6.7356, time 122.96ms
iter 79530: loss 7.2755, time 125.24ms
iter 79540: loss 6.7603, time 122.36ms
iter 79550: loss 7.1555, time 122.33ms
iter 79560: loss 7.3050, time 122.35ms
iter 79570: loss 6.3910, time 122.43ms
iter 79580: loss 7.0691, time 122.49ms
iter 79590: loss 7.9739, time 121.69ms
tensor(0.9843)
iter 79600: loss 7.1407, time 125.57ms
iter 79610: loss 7.4388, time 122.42ms
iter 79620: loss 6.2463, time 122.96ms
iter 79630: loss 7.4081, time 122.00ms
iter 79640: loss 7.8098, time 122.75ms
iter 79650: loss 7.4793, time 121.62ms
iter 79660: loss 7.3446, time 122.44ms
iter 79670: loss 6.3115, time 124.64ms
iter 79680: loss 6.6692, time 122.33ms
iter 79690: loss 6.3378, time 122.42ms
tensor(0.9911)
iter 79700: loss 6.1952, time 122.60ms
iter 79710: loss 7.3793, time 122.52ms
iter 79720: loss 7.3649, time 122.87ms
iter 79730: loss 7.3395, time 122.43ms
iter 79740: loss 6.2744, time 125.18ms
step 79750: train loss 6.5130, val loss 6.5264
saving checkpoint to out-shakespeare-char
iter 79750: loss 6.3483, time 2864.64ms
iter 79760: loss 7.7272, time 125.54ms
iter 79770: loss 7.3176, time 121.87ms
iter 79780: loss 7.3919, time 122.30ms
iter 79790: loss 6.5822, time 122.30ms
tensor(0.9961)
iter 79800: loss 7.1441, time 122.45ms
iter 79810: loss 7.5657, time 122.48ms
iter 79820: loss 6.8807, time 122.48ms
iter 79830: loss 7.4649, time 125.56ms
iter 79840: loss 6.8743, time 122.29ms
iter 79850: loss 7.2640, time 122.75ms
iter 79860: loss 7.6043, time 122.56ms
iter 79870: loss 6.3394, time 121.69ms
iter 79880: loss 7.0977, time 122.33ms
iter 79890: loss 7.2580, time 122.23ms
tensor(0.9990)
iter 79900: loss 7.5768, time 124.21ms
iter 79910: loss 7.5497, time 121.82ms
iter 79920: loss 7.2355, time 122.48ms
iter 79930: loss 7.1784, time 122.61ms
iter 79940: loss 7.6128, time 122.41ms
iter 79950: loss 7.6570, time 122.44ms
iter 79960: loss 6.8774, time 122.55ms
iter 79970: loss 7.1703, time 125.46ms
iter 79980: loss 6.6621, time 122.27ms
iter 79990: loss 6.9406, time 122.48ms
tensor(1.)
step 80000: train loss 6.4919, val loss 6.4659
saving checkpoint to out-shakespeare-char
iter 80000: loss 6.8959, time 2867.69ms
iter 80010: loss 6.5197, time 122.82ms
iter 80020: loss 6.6658, time 122.64ms
iter 80030: loss 7.4359, time 123.65ms
iter 80040: loss 6.6041, time 122.41ms
iter 80050: loss 7.5904, time 122.46ms
iter 80060: loss 6.5654, time 125.26ms
iter 80070: loss 7.1681, time 122.29ms
iter 80080: loss 6.0235, time 122.54ms
iter 80090: loss 5.9385, time 122.55ms
tensor(0.9990)
iter 80100: loss 6.6489, time 122.25ms
iter 80110: loss 6.6559, time 122.07ms
iter 80120: loss 6.9402, time 122.59ms
iter 80130: loss 7.3436, time 124.82ms
iter 80140: loss 7.1095, time 122.45ms
iter 80150: loss 7.5464, time 122.45ms
iter 80160: loss 7.5698, time 122.39ms
iter 80170: loss 6.5521, time 122.53ms
iter 80180: loss 6.6840, time 122.49ms
iter 80190: loss 6.8982, time 122.36ms
tensor(0.9961)
iter 80200: loss 7.2761, time 125.33ms
iter 80210: loss 6.5958, time 122.85ms
iter 80220: loss 7.2577, time 123.62ms
iter 80230: loss 7.4335, time 121.62ms
iter 80240: loss 6.6941, time 121.34ms
step 80250: train loss 6.5248, val loss 6.5222
saving checkpoint to out-shakespeare-char
iter 80250: loss 7.1673, time 2862.52ms
iter 80260: loss 6.7387, time 122.78ms
iter 80270: loss 7.3939, time 123.24ms
iter 80280: loss 7.0675, time 124.42ms
iter 80290: loss 7.3404, time 122.82ms
tensor(0.9911)
iter 80300: loss 7.3058, time 124.53ms
iter 80310: loss 6.7386, time 123.02ms
iter 80320: loss 7.8744, time 122.49ms
iter 80330: loss 6.8420, time 122.55ms
iter 80340: loss 7.1157, time 122.41ms
iter 80350: loss 7.3535, time 125.18ms
iter 80360: loss 7.2267, time 122.31ms
iter 80370: loss 7.3895, time 122.31ms
iter 80380: loss 6.8786, time 122.35ms
iter 80390: loss 7.9326, time 122.47ms
tensor(0.9843)
iter 80400: loss 7.4161, time 122.41ms
iter 80410: loss 6.3944, time 121.46ms
iter 80420: loss 8.2636, time 125.35ms
iter 80430: loss 6.6777, time 122.22ms
iter 80440: loss 7.4500, time 122.35ms
iter 80450: loss 6.5109, time 121.46ms
iter 80460: loss 7.3825, time 122.32ms
iter 80470: loss 7.2675, time 122.47ms
iter 80480: loss 7.0362, time 122.94ms
iter 80490: loss 7.1455, time 125.31ms
tensor(0.9755)
step 80500: train loss 6.5625, val loss 6.5643
saving checkpoint to out-shakespeare-char
iter 80500: loss 7.2446, time 2865.19ms
iter 80510: loss 6.8449, time 124.21ms
iter 80520: loss 7.3888, time 122.36ms
iter 80530: loss 7.2654, time 122.31ms
iter 80540: loss 6.7634, time 122.43ms
iter 80550: loss 7.4732, time 122.39ms
iter 80560: loss 6.1915, time 121.33ms
iter 80570: loss 6.8015, time 121.42ms
iter 80580: loss 7.3000, time 122.17ms
iter 80590: loss 6.9287, time 124.84ms
tensor(0.9649)
iter 80600: loss 7.2108, time 122.33ms
iter 80610: loss 7.7422, time 122.50ms
iter 80620: loss 6.6066, time 122.42ms
iter 80630: loss 7.3196, time 122.24ms
iter 80640: loss 7.5878, time 121.50ms
iter 80650: loss 7.1761, time 122.47ms
iter 80660: loss 7.6674, time 125.08ms
iter 80670: loss 6.8121, time 123.03ms
iter 80680: loss 7.2117, time 124.17ms
iter 80690: loss 7.4225, time 122.77ms
tensor(0.9524)
iter 80700: loss 7.2333, time 122.21ms
iter 80710: loss 6.6770, time 122.85ms
iter 80720: loss 7.1357, time 123.11ms
iter 80730: loss 6.9362, time 125.36ms
iter 80740: loss 6.9088, time 121.90ms
step 80750: train loss 6.4473, val loss 6.4588
saving checkpoint to out-shakespeare-char
iter 80750: loss 7.2004, time 2858.23ms
iter 80760: loss 7.0345, time 125.13ms
iter 80770: loss 6.6802, time 121.99ms
iter 80780: loss 6.9007, time 122.12ms
iter 80790: loss 7.4329, time 123.00ms
tensor(0.9382)
iter 80800: loss 7.1790, time 122.45ms
iter 80810: loss 7.6045, time 121.80ms
iter 80820: loss 7.1631, time 121.78ms
iter 80830: loss 6.8704, time 125.73ms
iter 80840: loss 6.7181, time 122.82ms
iter 80850: loss 7.6484, time 121.93ms
iter 80860: loss 6.9582, time 122.27ms
iter 80870: loss 6.7968, time 122.32ms
iter 80880: loss 7.4351, time 122.08ms
iter 80890: loss 7.3870, time 122.12ms
tensor(0.9222)
iter 80900: loss 7.7874, time 126.12ms
iter 80910: loss 6.6554, time 122.59ms
iter 80920: loss 6.7234, time 122.13ms
iter 80930: loss 7.3705, time 121.73ms
iter 80940: loss 6.8288, time 122.74ms
iter 80950: loss 7.4623, time 122.61ms
iter 80960: loss 7.7057, time 122.83ms
iter 80970: loss 6.6297, time 121.94ms
iter 80980: loss 7.8208, time 125.84ms
iter 80990: loss 7.6162, time 122.52ms
tensor(0.9045)
step 81000: train loss 6.5031, val loss 6.5210
saving checkpoint to out-shakespeare-char
iter 81000: loss 6.7173, time 2851.98ms
iter 81010: loss 7.9005, time 126.41ms
iter 81020: loss 7.0494, time 123.51ms
iter 81030: loss 6.3525, time 122.82ms
iter 81040: loss 7.1710, time 117.51ms
iter 81050: loss 7.5578, time 117.32ms
iter 81060: loss 7.3871, time 117.30ms
iter 81070: loss 7.1531, time 121.13ms
iter 81080: loss 7.2264, time 117.27ms
iter 81090: loss 7.1803, time 118.59ms
tensor(0.8853)
iter 81100: loss 6.7468, time 117.30ms
iter 81110: loss 7.8405, time 118.96ms
iter 81120: loss 6.4511, time 117.36ms
iter 81130: loss 6.9655, time 118.30ms
iter 81140: loss 7.1125, time 117.34ms
iter 81150: loss 7.0208, time 117.80ms
iter 81160: loss 6.8172, time 117.42ms
iter 81170: loss 6.8711, time 118.44ms
iter 81180: loss 7.1814, time 117.07ms
iter 81190: loss 6.8282, time 119.21ms
tensor(0.8645)
iter 81200: loss 7.5808, time 119.18ms
iter 81210: loss 6.9786, time 119.16ms
iter 81220: loss 6.6869, time 118.88ms
iter 81230: loss 6.6110, time 118.78ms
iter 81240: loss 6.8333, time 119.34ms
step 81250: train loss 6.4505, val loss 6.4483
saving checkpoint to out-shakespeare-char
iter 81250: loss 7.0788, time 2841.50ms
iter 81260: loss 7.3220, time 120.53ms
iter 81270: loss 7.7310, time 120.32ms
iter 81280: loss 7.2509, time 120.55ms
iter 81290: loss 6.9650, time 119.26ms
tensor(0.8423)
iter 81300: loss 6.6132, time 120.60ms
iter 81310: loss 6.9164, time 119.45ms
iter 81320: loss 7.3541, time 120.92ms
iter 81330: loss 6.8095, time 119.77ms
iter 81340: loss 7.4001, time 119.89ms
iter 81350: loss 6.9523, time 119.87ms
iter 81360: loss 6.8938, time 119.37ms
iter 81370: loss 8.2925, time 119.88ms
iter 81380: loss 7.0548, time 119.25ms
iter 81390: loss 6.6973, time 119.00ms
tensor(0.8187)
iter 81400: loss 7.4468, time 120.40ms
iter 81410: loss 7.1902, time 119.03ms
iter 81420: loss 7.1505, time 121.92ms
iter 81430: loss 7.2080, time 120.39ms
iter 81440: loss 6.8197, time 118.86ms
iter 81450: loss 6.8162, time 119.13ms
iter 81460: loss 6.1941, time 117.96ms
iter 81470: loss 7.1863, time 119.01ms
iter 81480: loss 7.9837, time 119.58ms
iter 81490: loss 6.7747, time 120.22ms
tensor(0.7939)
step 81500: train loss 6.3950, val loss 6.4187
saving checkpoint to out-shakespeare-char
iter 81500: loss 6.8034, time 2868.51ms
iter 81510: loss 7.8909, time 120.53ms
iter 81520: loss 6.9624, time 119.21ms
iter 81530: loss 7.1172, time 120.30ms
iter 81540: loss 7.4360, time 119.04ms
iter 81550: loss 6.9189, time 119.06ms
iter 81560: loss 7.2227, time 120.08ms
iter 81570: loss 7.1869, time 119.06ms
iter 81580: loss 7.3330, time 119.05ms
iter 81590: loss 7.2398, time 119.28ms
tensor(0.7679)
iter 81600: loss 7.9765, time 119.00ms
iter 81610: loss 6.7100, time 120.59ms
iter 81620: loss 7.0847, time 119.71ms
iter 81630: loss 7.0378, time 119.14ms
iter 81640: loss 6.6910, time 119.60ms
iter 81650: loss 6.9621, time 119.09ms
iter 81660: loss 6.5558, time 119.13ms
iter 81670: loss 6.5384, time 119.04ms
iter 81680: loss 6.4810, time 119.15ms
iter 81690: loss 6.7198, time 120.44ms
tensor(0.7409)
iter 81700: loss 6.8270, time 118.99ms
iter 81710: loss 6.8323, time 120.47ms
iter 81720: loss 7.2692, time 119.25ms
iter 81730: loss 7.0441, time 119.19ms
iter 81740: loss 7.2440, time 119.54ms
step 81750: train loss 6.3449, val loss 6.4195
saving checkpoint to out-shakespeare-char
iter 81750: loss 7.2774, time 2867.68ms
iter 81760: loss 7.4455, time 119.76ms
iter 81770: loss 6.9867, time 119.93ms
iter 81780: loss 7.1177, time 119.76ms
iter 81790: loss 7.2410, time 119.69ms
tensor(0.7129)
iter 81800: loss 6.0498, time 119.20ms
iter 81810: loss 6.8550, time 119.83ms
iter 81820: loss 7.4567, time 119.39ms
iter 81830: loss 6.9781, time 119.91ms
iter 81840: loss 6.0607, time 119.42ms
iter 81850: loss 7.1970, time 119.15ms
iter 81860: loss 7.1190, time 119.47ms
iter 81870: loss 6.2439, time 120.30ms
iter 81880: loss 6.6344, time 122.47ms
iter 81890: loss 6.9648, time 120.08ms
tensor(0.6841)
iter 81900: loss 7.7266, time 119.42ms
iter 81910: loss 6.5126, time 119.54ms
iter 81920: loss 7.1597, time 121.97ms
iter 81930: loss 6.7320, time 120.68ms
iter 81940: loss 6.8580, time 120.59ms
iter 81950: loss 7.4654, time 119.49ms
iter 81960: loss 7.1046, time 119.95ms
iter 81970: loss 6.2992, time 120.76ms
iter 81980: loss 7.3890, time 119.64ms
iter 81990: loss 6.7164, time 119.92ms
tensor(0.6545)
step 82000: train loss 6.3658, val loss 6.3485
saving checkpoint to out-shakespeare-char
iter 82000: loss 6.5954, time 2889.13ms
iter 82010: loss 6.8173, time 119.55ms
iter 82020: loss 6.9845, time 120.62ms
iter 82030: loss 6.7549, time 118.64ms
iter 82040: loss 6.7482, time 118.48ms
iter 82050: loss 8.0586, time 120.05ms
iter 82060: loss 7.6724, time 119.01ms
iter 82070: loss 7.6024, time 120.58ms
iter 82080: loss 6.7338, time 120.71ms
iter 82090: loss 6.7747, time 119.52ms
tensor(0.6243)
iter 82100: loss 6.3403, time 121.21ms
iter 82110: loss 7.5506, time 120.56ms
iter 82120: loss 7.1625, time 119.54ms
iter 82130: loss 6.7936, time 119.37ms
iter 82140: loss 6.8529, time 122.37ms
iter 82150: loss 6.5950, time 119.99ms
iter 82160: loss 7.4296, time 120.92ms
iter 82170: loss 7.2157, time 120.54ms
iter 82180: loss 6.7515, time 122.87ms
iter 82190: loss 7.1673, time 119.17ms
tensor(0.5937)
iter 82200: loss 6.2679, time 119.93ms
iter 82210: loss 7.2125, time 120.78ms
iter 82220: loss 6.8270, time 119.35ms
iter 82230: loss 6.8399, time 119.51ms
iter 82240: loss 7.0276, time 121.16ms
step 82250: train loss 6.2908, val loss 6.3750
saving checkpoint to out-shakespeare-char
iter 82250: loss 6.6941, time 2870.24ms
iter 82260: loss 7.2668, time 117.13ms
iter 82270: loss 7.7116, time 119.01ms
iter 82280: loss 7.0873, time 120.87ms
iter 82290: loss 7.3422, time 124.46ms
tensor(0.5627)
iter 82300: loss 7.2248, time 123.63ms
iter 82310: loss 6.8494, time 123.30ms
iter 82320: loss 7.4714, time 124.03ms
iter 82330: loss 6.4474, time 118.87ms
iter 82340: loss 7.1117, time 116.73ms
iter 82350: loss 6.9025, time 117.25ms
iter 82360: loss 6.9005, time 116.58ms
iter 82370: loss 6.1414, time 120.81ms
iter 82380: loss 7.0437, time 119.65ms
iter 82390: loss 7.3219, time 120.55ms
tensor(0.5314)
iter 82400: loss 7.0495, time 119.25ms
iter 82410: loss 7.5622, time 120.16ms
iter 82420: loss 6.3370, time 118.69ms
iter 82430: loss 6.7958, time 118.83ms
iter 82440: loss 6.7709, time 120.49ms
iter 82450: loss 5.9770, time 120.21ms
iter 82460: loss 6.7733, time 120.77ms
iter 82470: loss 6.3837, time 119.78ms
iter 82480: loss 6.2880, time 120.39ms
iter 82490: loss 6.4688, time 120.76ms
tensor(0.5000)
step 82500: train loss 6.3138, val loss 6.3815
saving checkpoint to out-shakespeare-char
iter 82500: loss 6.4359, time 2851.80ms
iter 82510: loss 6.4916, time 123.63ms
iter 82520: loss 7.5036, time 120.37ms
iter 82530: loss 7.1073, time 122.06ms
iter 82540: loss 6.8910, time 123.40ms
iter 82550: loss 5.9776, time 125.97ms
iter 82560: loss 7.2456, time 123.35ms
iter 82570: loss 6.5083, time 125.62ms
iter 82580: loss 7.1940, time 123.93ms
iter 82590: loss 7.1340, time 123.00ms
tensor(0.4686)
iter 82600: loss 6.7491, time 122.33ms
iter 82610: loss 6.5950, time 122.12ms
iter 82620: loss 6.3267, time 122.74ms
iter 82630: loss 6.4441, time 123.55ms
iter 82640: loss 7.1211, time 123.02ms
iter 82650: loss 7.0662, time 126.89ms
iter 82660: loss 7.4352, time 122.83ms
iter 82670: loss 6.9029, time 122.63ms
iter 82680: loss 6.8922, time 122.41ms
iter 82690: loss 7.3491, time 123.23ms
tensor(0.4373)
iter 82700: loss 7.4320, time 122.96ms
iter 82710: loss 6.9382, time 122.10ms
iter 82720: loss 7.1760, time 123.00ms
iter 82730: loss 7.2683, time 122.97ms
iter 82740: loss 7.1983, time 122.97ms
step 82750: train loss 6.2886, val loss 6.2863
saving checkpoint to out-shakespeare-char
iter 82750: loss 6.8060, time 2836.80ms
iter 82760: loss 7.0307, time 122.12ms
iter 82770: loss 6.7060, time 124.48ms
iter 82780: loss 7.0107, time 125.00ms
iter 82790: loss 7.0185, time 120.54ms
tensor(0.4063)
iter 82800: loss 6.5410, time 122.67ms
iter 82810: loss 6.4155, time 121.62ms
iter 82820: loss 6.8316, time 122.74ms
iter 82830: loss 6.5906, time 123.78ms
iter 82840: loss 6.1387, time 125.72ms
iter 82850: loss 7.0969, time 122.73ms
iter 82860: loss 6.4751, time 118.41ms
iter 82870: loss 6.1571, time 123.23ms
iter 82880: loss 7.4308, time 124.22ms
iter 82890: loss 6.5983, time 123.70ms
tensor(0.3757)
iter 82900: loss 6.8041, time 123.75ms
iter 82910: loss 6.6170, time 123.47ms
iter 82920: loss 6.7320, time 125.21ms
iter 82930: loss 6.5844, time 122.99ms
iter 82940: loss 7.9883, time 122.96ms
iter 82950: loss 6.7064, time 123.12ms
iter 82960: loss 7.5760, time 122.77ms
iter 82970: loss 6.8442, time 123.23ms
iter 82980: loss 6.9551, time 122.92ms
iter 82990: loss 6.7496, time 123.38ms
tensor(0.3455)
step 83000: train loss 6.2685, val loss 6.2694
saving checkpoint to out-shakespeare-char
iter 83000: loss 6.9444, time 2870.20ms
iter 83010: loss 7.0274, time 123.26ms
iter 83020: loss 7.1036, time 123.60ms
iter 83030: loss 6.2984, time 125.78ms
iter 83040: loss 6.4950, time 123.30ms
iter 83050: loss 6.7286, time 122.13ms
iter 83060: loss 6.7357, time 123.23ms
iter 83070: loss 6.5829, time 122.76ms
iter 83080: loss 7.0120, time 125.98ms
iter 83090: loss 7.1424, time 123.39ms
tensor(0.3159)
iter 83100: loss 6.5522, time 123.23ms
iter 83110: loss 7.0084, time 122.97ms
iter 83120: loss 6.2082, time 122.83ms
iter 83130: loss 7.7366, time 121.91ms
iter 83140: loss 6.2040, time 124.13ms
iter 83150: loss 6.6700, time 123.57ms
iter 83160: loss 6.6682, time 122.97ms
iter 83170: loss 7.4197, time 126.05ms
iter 83180: loss 6.9657, time 122.63ms
iter 83190: loss 7.0981, time 123.24ms
tensor(0.2871)
iter 83200: loss 6.4015, time 122.91ms
iter 83210: loss 6.9848, time 122.77ms
iter 83220: loss 6.9202, time 123.24ms
iter 83230: loss 6.7655, time 122.91ms
iter 83240: loss 6.9281, time 121.68ms
step 83250: train loss 6.1932, val loss 6.2202
saving checkpoint to out-shakespeare-char
iter 83250: loss 7.0063, time 2852.05ms
iter 83260: loss 6.6767, time 119.21ms
iter 83270: loss 6.6251, time 120.37ms
iter 83280: loss 7.0722, time 120.62ms
iter 83290: loss 6.9101, time 120.08ms
tensor(0.2591)
iter 83300: loss 6.5044, time 119.94ms
iter 83310: loss 6.6433, time 119.69ms
iter 83320: loss 7.5184, time 120.32ms
iter 83330: loss 7.2248, time 119.15ms
iter 83340: loss 6.6118, time 119.10ms
iter 83350: loss 6.8809, time 120.37ms
iter 83360: loss 6.3264, time 120.29ms
iter 83370: loss 6.3803, time 119.09ms
iter 83380: loss 6.4425, time 118.94ms
iter 83390: loss 6.5663, time 120.34ms
tensor(0.2321)
iter 83400: loss 7.0078, time 121.09ms
iter 83410: loss 5.9210, time 119.17ms
iter 83420: loss 7.0261, time 119.15ms
iter 83430: loss 6.7866, time 119.21ms
iter 83440: loss 6.7770, time 119.34ms
iter 83450: loss 6.2535, time 119.66ms
iter 83460: loss 6.3606, time 119.10ms
iter 83470: loss 7.0576, time 119.06ms
iter 83480: loss 7.2687, time 119.13ms
iter 83490: loss 6.8851, time 119.02ms
tensor(0.2061)
step 83500: train loss 6.1841, val loss 6.1924
saving checkpoint to out-shakespeare-char
iter 83500: loss 6.7633, time 2865.99ms
iter 83510: loss 6.3477, time 123.65ms
iter 83520: loss 6.1001, time 125.76ms
iter 83530: loss 6.5162, time 122.78ms
iter 83540: loss 6.7885, time 123.06ms
iter 83550: loss 5.8726, time 122.91ms
iter 83560: loss 7.2733, time 123.68ms
iter 83570: loss 6.8884, time 123.24ms
iter 83580: loss 6.7540, time 125.75ms
iter 83590: loss 5.9044, time 123.08ms
tensor(0.1813)
iter 83600: loss 6.9743, time 123.54ms
iter 83610: loss 7.1105, time 123.09ms
iter 83620: loss 6.9297, time 123.44ms
iter 83630: loss 6.6590, time 122.52ms
iter 83640: loss 6.7347, time 123.08ms
iter 83650: loss 6.7426, time 123.10ms
iter 83660: loss 6.6075, time 125.86ms
iter 83670: loss 6.9797, time 122.91ms
iter 83680: loss 6.9684, time 123.05ms
iter 83690: loss 6.3535, time 122.92ms
tensor(0.1577)
iter 83700: loss 7.6021, time 122.91ms
iter 83710: loss 6.6295, time 122.08ms
iter 83720: loss 6.7948, time 122.75ms
iter 83730: loss 6.9864, time 126.25ms
iter 83740: loss 6.9407, time 122.60ms
step 83750: train loss 6.1476, val loss 6.1753
saving checkpoint to out-shakespeare-char
iter 83750: loss 5.4285, time 2857.96ms
iter 83760: loss 6.1969, time 121.75ms
iter 83770: loss 6.7557, time 121.94ms
iter 83780: loss 7.2550, time 122.80ms
iter 83790: loss 7.3799, time 122.58ms
tensor(0.1355)
iter 83800: loss 6.6220, time 122.88ms
iter 83810: loss 6.8473, time 125.44ms
iter 83820: loss 7.0184, time 122.90ms
iter 83830: loss 6.1918, time 122.81ms
iter 83840: loss 7.0298, time 121.92ms
iter 83850: loss 6.9031, time 121.92ms
iter 83860: loss 6.7274, time 122.80ms
iter 83870: loss 5.7911, time 123.18ms
iter 83880: loss 6.7157, time 122.74ms
iter 83890: loss 6.6410, time 125.60ms
tensor(0.1147)
iter 83900: loss 6.1565, time 124.03ms
iter 83910: loss 6.5815, time 122.93ms
iter 83920: loss 7.5602, time 122.95ms
iter 83930: loss 7.0663, time 123.12ms
iter 83940: loss 6.6626, time 122.64ms
iter 83950: loss 6.6358, time 121.96ms
iter 83960: loss 5.9283, time 124.13ms
iter 83970: loss 6.2144, time 122.87ms
iter 83980: loss 6.5683, time 122.11ms
iter 83990: loss 6.3372, time 122.97ms
tensor(0.0955)
step 84000: train loss 6.1338, val loss 6.1665
saving checkpoint to out-shakespeare-char
iter 84000: loss 5.8913, time 2844.19ms
iter 84010: loss 5.5171, time 119.65ms
iter 84020: loss 7.0172, time 119.45ms
iter 84030: loss 6.5325, time 119.39ms
iter 84040: loss 6.6425, time 120.33ms
iter 84050: loss 7.2217, time 119.47ms
iter 84060: loss 7.2541, time 119.81ms
iter 84070: loss 6.0759, time 119.17ms
iter 84080: loss 7.2523, time 120.59ms
iter 84090: loss 6.7250, time 119.47ms
tensor(0.0778)
iter 84100: loss 6.8672, time 121.03ms
iter 84110: loss 6.7044, time 120.67ms
iter 84120: loss 6.5365, time 121.42ms
iter 84130: loss 6.7523, time 120.86ms
iter 84140: loss 6.7299, time 120.05ms
iter 84150: loss 6.8543, time 120.83ms
iter 84160: loss 6.8310, time 119.65ms
iter 84170: loss 6.4371, time 119.89ms
iter 84180: loss 6.7853, time 119.62ms
iter 84190: loss 6.7726, time 119.56ms
tensor(0.0618)
iter 84200: loss 7.0029, time 119.80ms
iter 84210: loss 7.2450, time 121.12ms
iter 84220: loss 7.4415, time 119.62ms
iter 84230: loss 6.8319, time 120.38ms
iter 84240: loss 6.7921, time 120.90ms
step 84250: train loss 6.0238, val loss 6.0600
saving checkpoint to out-shakespeare-char
iter 84250: loss 6.5063, time 2882.50ms
iter 84260: loss 6.5171, time 126.42ms
iter 84270: loss 6.4192, time 121.78ms
iter 84280: loss 6.7443, time 122.68ms
iter 84290: loss 6.9961, time 122.39ms
tensor(0.0476)
iter 84300: loss 6.7439, time 122.88ms
iter 84310: loss 7.5008, time 120.77ms
iter 84320: loss 6.2878, time 123.23ms
iter 84330: loss 6.5228, time 123.00ms
iter 84340: loss 6.8661, time 123.05ms
iter 84350: loss 6.5403, time 125.89ms
iter 84360: loss 6.5276, time 122.95ms
iter 84370: loss 5.7392, time 123.12ms
iter 84380: loss 7.0033, time 122.47ms
iter 84390: loss 6.4906, time 123.35ms
tensor(0.0351)
iter 84400: loss 6.4855, time 123.26ms
iter 84410: loss 6.5777, time 122.91ms
iter 84420: loss 6.5076, time 122.95ms
iter 84430: loss 6.8704, time 123.41ms
iter 84440: loss 5.3029, time 122.98ms
iter 84450: loss 6.5855, time 122.64ms
iter 84460: loss 6.2652, time 122.67ms
iter 84470: loss 6.3845, time 122.63ms
iter 84480: loss 7.0255, time 123.56ms
iter 84490: loss 6.7813, time 125.77ms
tensor(0.0245)
step 84500: train loss 6.0541, val loss 6.0808
saving checkpoint to out-shakespeare-char
iter 84500: loss 7.2045, time 2856.87ms
iter 84510: loss 6.2470, time 119.52ms
iter 84520: loss 6.9213, time 120.34ms
iter 84530: loss 7.2074, time 119.34ms
iter 84540: loss 6.1514, time 119.30ms
iter 84550: loss 7.2161, time 119.17ms
iter 84560: loss 6.7802, time 119.17ms
iter 84570: loss 6.7414, time 122.28ms
iter 84580: loss 7.0173, time 119.43ms
iter 84590: loss 6.1827, time 119.14ms
tensor(0.0157)
iter 84600: loss 6.4703, time 119.27ms
iter 84610: loss 6.5926, time 121.96ms
iter 84620: loss 6.1404, time 119.08ms
iter 84630: loss 6.0707, time 120.50ms
iter 84640: loss 7.1125, time 119.04ms
iter 84650: loss 7.5390, time 119.35ms
iter 84660: loss 7.5374, time 119.11ms
iter 84670: loss 7.2521, time 120.28ms
iter 84680: loss 6.5836, time 122.38ms
iter 84690: loss 7.2084, time 119.63ms
tensor(0.0089)
iter 84700: loss 6.9726, time 119.83ms
iter 84710: loss 5.3025, time 119.71ms
iter 84720: loss 6.6430, time 120.26ms
iter 84730: loss 6.5307, time 119.51ms
iter 84740: loss 6.7155, time 120.25ms
step 84750: train loss 6.0820, val loss 6.1440
saving checkpoint to out-shakespeare-char
iter 84750: loss 6.4634, time 2855.55ms
iter 84760: loss 6.6308, time 119.71ms
iter 84770: loss 6.7118, time 120.67ms
iter 84780: loss 7.2301, time 119.96ms
iter 84790: loss 6.9988, time 119.65ms
tensor(0.0039)
iter 84800: loss 7.1017, time 121.11ms
iter 84810: loss 6.6479, time 119.84ms
iter 84820: loss 6.4611, time 121.75ms
iter 84830: loss 5.8961, time 119.51ms
iter 84840: loss 6.7732, time 123.25ms
iter 84850: loss 6.1388, time 120.04ms
iter 84860: loss 5.8233, time 119.73ms
iter 84870: loss 6.9054, time 120.74ms
iter 84880: loss 7.0556, time 120.73ms
iter 84890: loss 6.0833, time 119.58ms
tensor(0.0010)
iter 84900: loss 6.2765, time 121.16ms
iter 84910: loss 6.3915, time 120.51ms
iter 84920: loss 6.8922, time 119.80ms
iter 84930: loss 7.1831, time 122.49ms
iter 84940: loss 6.0974, time 122.14ms
iter 84950: loss 6.0278, time 121.32ms
iter 84960: loss 6.7314, time 121.31ms
iter 84970: loss 6.1139, time 121.38ms
iter 84980: loss 6.2834, time 124.42ms
iter 84990: loss 7.2063, time 120.42ms
tensor(0.0010)
step 85000: train loss 6.0818, val loss 6.1223
saving checkpoint to out-shakespeare-char
iter 85000: loss 6.8662, time 2843.56ms
iter 85010: loss 5.9963, time 120.59ms
iter 85020: loss 6.0502, time 120.88ms
iter 85030: loss 6.6806, time 118.43ms
iter 85040: loss 5.7321, time 119.53ms
iter 85050: loss 6.4589, time 119.19ms
iter 85060: loss 6.3672, time 119.87ms
iter 85070: loss 6.9588, time 119.23ms
iter 85080: loss 6.9000, time 118.81ms
iter 85090: loss 7.3227, time 119.20ms
tensor(0.0010)
iter 85100: loss 5.6726, time 121.70ms
iter 85110: loss 6.5323, time 120.09ms
iter 85120: loss 6.6711, time 120.24ms
iter 85130: loss 7.5130, time 119.22ms
iter 85140: loss 6.5821, time 118.80ms
iter 85150: loss 6.4494, time 118.90ms
iter 85160: loss 6.7386, time 122.30ms
iter 85170: loss 6.4208, time 120.52ms
iter 85180: loss 7.1908, time 122.01ms
iter 85190: loss 7.0315, time 122.28ms
tensor(0.0039)
iter 85200: loss 6.1375, time 121.93ms
iter 85210: loss 5.9535, time 122.30ms
iter 85220: loss 6.3509, time 126.25ms
iter 85230: loss 6.6312, time 123.39ms
iter 85240: loss 7.5940, time 123.22ms
step 85250: train loss 6.1170, val loss 6.0774
saving checkpoint to out-shakespeare-char
iter 85250: loss 6.1919, time 2860.33ms
iter 85260: loss 6.6338, time 125.93ms
iter 85270: loss 6.2453, time 122.20ms
iter 85280: loss 6.0946, time 122.17ms
iter 85290: loss 6.7333, time 122.91ms
tensor(0.0089)
iter 85300: loss 6.5817, time 123.13ms
iter 85310: loss 5.7955, time 123.05ms
iter 85320: loss 7.1357, time 122.93ms
iter 85330: loss 6.5957, time 125.11ms
iter 85340: loss 6.5661, time 123.00ms
iter 85350: loss 6.1893, time 123.21ms
iter 85360: loss 5.8029, time 122.10ms
iter 85370: loss 6.8744, time 122.31ms
iter 85380: loss 6.8294, time 122.92ms
iter 85390: loss 6.9550, time 122.92ms
tensor(0.0157)
iter 85400: loss 6.1548, time 126.20ms
iter 85410: loss 5.7359, time 123.01ms
iter 85420: loss 6.3661, time 122.96ms
iter 85430: loss 5.9855, time 122.40ms
iter 85440: loss 6.0548, time 122.58ms
iter 85450: loss 6.0857, time 123.34ms
iter 85460: loss 7.4804, time 122.21ms
iter 85470: loss 6.3887, time 122.75ms
iter 85480: loss 7.4584, time 122.78ms
iter 85490: loss 6.5001, time 125.85ms
tensor(0.0245)
step 85500: train loss 6.1041, val loss 6.0565
saving checkpoint to out-shakespeare-char
iter 85500: loss 6.9333, time 2850.75ms
iter 85510: loss 6.1772, time 125.57ms
iter 85520: loss 6.5201, time 122.34ms
iter 85530: loss 7.1026, time 122.97ms
iter 85540: loss 7.0589, time 121.82ms
iter 85550: loss 6.0216, time 123.07ms
iter 85560: loss 7.1211, time 123.14ms
iter 85570: loss 6.3130, time 122.79ms
iter 85580: loss 6.2847, time 121.75ms
iter 85590: loss 6.4646, time 125.71ms
tensor(0.0351)
iter 85600: loss 6.3964, time 123.02ms
iter 85610: loss 6.8869, time 122.84ms
iter 85620: loss 6.5823, time 122.79ms
iter 85630: loss 5.9011, time 123.13ms
iter 85640: loss 5.8148, time 121.87ms
iter 85650: loss 6.4703, time 122.27ms
iter 85660: loss 5.8945, time 125.57ms
iter 85670: loss 5.9866, time 122.68ms
iter 85680: loss 5.8815, time 122.93ms
iter 85690: loss 6.1731, time 122.82ms
tensor(0.0476)
iter 85700: loss 7.0314, time 122.83ms
iter 85710: loss 6.2267, time 122.04ms
iter 85720: loss 6.6792, time 122.71ms
iter 85730: loss 6.3259, time 125.92ms
iter 85740: loss 6.1261, time 121.99ms
step 85750: train loss 6.0503, val loss 6.0871
saving checkpoint to out-shakespeare-char
iter 85750: loss 6.2806, time 2851.49ms
iter 85760: loss 6.9108, time 122.88ms
iter 85770: loss 6.3235, time 122.44ms
iter 85780: loss 6.4201, time 123.02ms
iter 85790: loss 6.4728, time 123.20ms
tensor(0.0618)
iter 85800: loss 6.4214, time 122.85ms
iter 85810: loss 6.8463, time 122.63ms
iter 85820: loss 6.9491, time 124.69ms
iter 85830: loss 7.6604, time 125.21ms
iter 85840: loss 7.2968, time 126.86ms
iter 85850: loss 6.0281, time 125.91ms
iter 85860: loss 6.6188, time 123.35ms
iter 85870: loss 6.4127, time 122.33ms
iter 85880: loss 6.5806, time 122.73ms
iter 85890: loss 6.0120, time 122.29ms
tensor(0.0778)
iter 85900: loss 6.7452, time 122.49ms
iter 85910: loss 6.8316, time 122.79ms
iter 85920: loss 6.2794, time 121.30ms
iter 85930: loss 6.3354, time 122.74ms
iter 85940: loss 6.8930, time 124.16ms
iter 85950: loss 6.6024, time 124.76ms
iter 85960: loss 6.6221, time 122.51ms
iter 85970: loss 5.8394, time 121.83ms
iter 85980: loss 7.6153, time 122.94ms
iter 85990: loss 6.0173, time 122.95ms
tensor(0.0955)
step 86000: train loss 6.0836, val loss 6.0805
saving checkpoint to out-shakespeare-char
iter 86000: loss 6.4625, time 2879.39ms
iter 86010: loss 6.0955, time 123.10ms
iter 86020: loss 6.0683, time 122.06ms
iter 86030: loss 7.0629, time 123.22ms
iter 86040: loss 7.2071, time 122.81ms
iter 86050: loss 5.7180, time 124.36ms
iter 86060: loss 6.4993, time 125.37ms
iter 86070: loss 7.0375, time 122.75ms
iter 86080: loss 7.3371, time 123.50ms
iter 86090: loss 6.7540, time 123.28ms
tensor(0.1147)
iter 86100: loss 6.6148, time 123.37ms
iter 86110: loss 6.8365, time 122.84ms
iter 86120: loss 7.0303, time 122.96ms
iter 86130: loss 7.1402, time 123.47ms
iter 86140: loss 6.5730, time 124.12ms
iter 86150: loss 6.8186, time 124.86ms
iter 86160: loss 5.8795, time 122.93ms
iter 86170: loss 6.0274, time 122.65ms
iter 86180: loss 6.1905, time 122.91ms
iter 86190: loss 7.1933, time 123.03ms
tensor(0.1355)
iter 86200: loss 6.3357, time 123.01ms
iter 86210: loss 5.8562, time 124.01ms
iter 86220: loss 6.8728, time 122.90ms
iter 86230: loss 6.5886, time 121.93ms
iter 86240: loss 5.9692, time 122.65ms
step 86250: train loss 6.0852, val loss 6.1038
saving checkpoint to out-shakespeare-char
iter 86250: loss 6.7278, time 2853.00ms
iter 86260: loss 7.1443, time 124.95ms
iter 86270: loss 6.3203, time 122.88ms
iter 86280: loss 7.0837, time 122.52ms
iter 86290: loss 6.4664, time 122.46ms
tensor(0.1577)
iter 86300: loss 6.0972, time 122.52ms
iter 86310: loss 6.3706, time 122.13ms
iter 86320: loss 6.3349, time 122.57ms
iter 86330: loss 6.0943, time 125.13ms
iter 86340: loss 6.9153, time 122.28ms
iter 86350: loss 6.7821, time 122.49ms
iter 86360: loss 6.9790, time 122.40ms
iter 86370: loss 6.7638, time 123.25ms
iter 86380: loss 6.1129, time 122.39ms
iter 86390: loss 6.5176, time 122.92ms
tensor(0.1813)
iter 86400: loss 7.2215, time 124.22ms
iter 86410: loss 6.4957, time 122.29ms
iter 86420: loss 6.6010, time 122.47ms
iter 86430: loss 7.0281, time 122.03ms
iter 86440: loss 6.5406, time 121.77ms
iter 86450: loss 6.9491, time 123.16ms
iter 86460: loss 6.4473, time 123.14ms
iter 86470: loss 7.1713, time 122.98ms
iter 86480: loss 7.1133, time 122.93ms
iter 86490: loss 6.4859, time 122.99ms
tensor(0.2061)
step 86500: train loss 6.1055, val loss 6.1556
saving checkpoint to out-shakespeare-char
iter 86500: loss 6.2407, time 2862.38ms
iter 86510: loss 5.8418, time 125.02ms
iter 86520: loss 6.8813, time 122.94ms
iter 86530: loss 6.3856, time 122.83ms
iter 86540: loss 6.4424, time 122.05ms
iter 86550: loss 6.5312, time 122.53ms
iter 86560: loss 6.9026, time 122.69ms
iter 86570: loss 7.3356, time 120.80ms
iter 86580: loss 6.6368, time 124.84ms
iter 86590: loss 6.8262, time 122.90ms
tensor(0.2321)
iter 86600: loss 6.6231, time 122.13ms
iter 86610: loss 7.0846, time 122.52ms
iter 86620: loss 6.6585, time 122.53ms
iter 86630: loss 6.5600, time 122.40ms
iter 86640: loss 6.6688, time 121.40ms
iter 86650: loss 7.1523, time 122.96ms
iter 86660: loss 6.4631, time 124.69ms
iter 86670: loss 6.1251, time 122.36ms
iter 86680: loss 6.4337, time 122.76ms
iter 86690: loss 7.3862, time 122.72ms
tensor(0.2591)
iter 86700: loss 6.6047, time 123.25ms
iter 86710: loss 6.8854, time 122.97ms
iter 86720: loss 7.4103, time 122.69ms
iter 86730: loss 7.0328, time 123.84ms
iter 86740: loss 7.5722, time 122.93ms
step 86750: train loss 6.1652, val loss 6.2061
saving checkpoint to out-shakespeare-char
iter 86750: loss 6.7180, time 2864.10ms
iter 86760: loss 6.9087, time 122.88ms
iter 86770: loss 6.7875, time 124.08ms
iter 86780: loss 6.5258, time 122.47ms
iter 86790: loss 5.9620, time 122.41ms
tensor(0.2871)
iter 86800: loss 6.6829, time 122.76ms
iter 86810: loss 6.6528, time 122.64ms
iter 86820: loss 6.5698, time 122.38ms
iter 86830: loss 6.1413, time 122.66ms
iter 86840: loss 6.8900, time 125.02ms
iter 86850: loss 6.9273, time 122.64ms
iter 86860: loss 6.7776, time 122.65ms
iter 86870: loss 7.1184, time 122.90ms
iter 86880: loss 6.6652, time 122.52ms
iter 86890: loss 6.2125, time 122.86ms
tensor(0.3159)
iter 86900: loss 7.2848, time 122.98ms
iter 86910: loss 6.4353, time 125.04ms
iter 86920: loss 6.1284, time 122.72ms
iter 86930: loss 6.7476, time 122.63ms
iter 86940: loss 6.4342, time 122.00ms
iter 86950: loss 7.2573, time 122.94ms
iter 86960: loss 6.6263, time 122.43ms
iter 86970: loss 7.1349, time 122.25ms
iter 86980: loss 7.4863, time 124.64ms
iter 86990: loss 6.5359, time 122.77ms
tensor(0.3455)
step 87000: train loss 6.2228, val loss 6.2252
saving checkpoint to out-shakespeare-char
iter 87000: loss 7.0123, time 2872.64ms
iter 87010: loss 7.1769, time 122.64ms
iter 87020: loss 6.1807, time 124.56ms
iter 87030: loss 6.8200, time 122.33ms
iter 87040: loss 7.1444, time 122.31ms
iter 87050: loss 6.6361, time 122.44ms
iter 87060: loss 7.2324, time 122.62ms
iter 87070: loss 6.7463, time 121.74ms
iter 87080: loss 6.4385, time 122.47ms
iter 87090: loss 6.5172, time 124.49ms
tensor(0.3757)
iter 87100: loss 6.7277, time 122.35ms
iter 87110: loss 6.6967, time 122.56ms
iter 87120: loss 6.1461, time 122.89ms
iter 87130: loss 6.7281, time 122.64ms
iter 87140: loss 6.4136, time 122.46ms
iter 87150: loss 6.8145, time 122.97ms
iter 87160: loss 6.6516, time 124.81ms
iter 87170: loss 6.9480, time 122.69ms
iter 87180: loss 6.1058, time 122.46ms
iter 87190: loss 6.3531, time 122.53ms
tensor(0.4063)
iter 87200: loss 6.9063, time 122.61ms
iter 87210: loss 6.3062, time 122.68ms
iter 87220: loss 7.3103, time 122.22ms
iter 87230: loss 6.1499, time 124.60ms
iter 87240: loss 6.8496, time 122.91ms
step 87250: train loss 6.2507, val loss 6.3097
saving checkpoint to out-shakespeare-char
iter 87250: loss 6.3601, time 2864.59ms
iter 87260: loss 6.7650, time 123.08ms
iter 87270: loss 7.2336, time 124.77ms
iter 87280: loss 6.1916, time 122.44ms
iter 87290: loss 7.1804, time 122.70ms
tensor(0.4373)
iter 87300: loss 6.5092, time 122.44ms
iter 87310: loss 7.1247, time 122.36ms
iter 87320: loss 6.3389, time 122.83ms
iter 87330: loss 6.1317, time 124.56ms
iter 87340: loss 7.0041, time 121.86ms
iter 87350: loss 7.1349, time 122.60ms
iter 87360: loss 6.6256, time 122.57ms
iter 87370: loss 6.6337, time 122.50ms
iter 87380: loss 7.7091, time 121.83ms
iter 87390: loss 7.8478, time 122.70ms
tensor(0.4686)
iter 87400: loss 6.8973, time 124.97ms
iter 87410: loss 6.4935, time 121.74ms
iter 87420: loss 6.0408, time 122.98ms
iter 87430: loss 6.6285, time 121.37ms
iter 87440: loss 6.5320, time 122.82ms
iter 87450: loss 7.2103, time 122.28ms
iter 87460: loss 6.5138, time 122.34ms
iter 87470: loss 6.6366, time 123.12ms
iter 87480: loss 7.0207, time 123.52ms
iter 87490: loss 6.8796, time 122.26ms
tensor(0.5000)
step 87500: train loss 6.3692, val loss 6.3824
saving checkpoint to out-shakespeare-char
iter 87500: loss 7.0450, time 2837.95ms
iter 87510: loss 6.3225, time 118.64ms
iter 87520: loss 7.0004, time 119.24ms
iter 87530: loss 7.4691, time 120.41ms
iter 87540: loss 7.8401, time 120.32ms
iter 87550: loss 6.5201, time 118.31ms
iter 87560: loss 6.7440, time 119.03ms
iter 87570: loss 6.7521, time 120.49ms
iter 87580: loss 6.5509, time 120.48ms
iter 87590: loss 6.7351, time 120.39ms
tensor(0.5314)
iter 87600: loss 7.3715, time 120.57ms
iter 87610: loss 6.7903, time 118.74ms
iter 87620: loss 6.2898, time 120.09ms
iter 87630: loss 6.9129, time 120.30ms
iter 87640: loss 6.7386, time 119.15ms
iter 87650: loss 6.0600, time 120.43ms
iter 87660: loss 6.5968, time 120.93ms
iter 87670: loss 6.5796, time 119.51ms
iter 87680: loss 6.6803, time 120.46ms
iter 87690: loss 6.9974, time 122.01ms
tensor(0.5627)
iter 87700: loss 6.4478, time 119.78ms
iter 87710: loss 6.5597, time 119.02ms
iter 87720: loss 7.6162, time 120.84ms
iter 87730: loss 6.7363, time 120.33ms
iter 87740: loss 7.4845, time 120.96ms
step 87750: train loss 6.3888, val loss 6.3578
saving checkpoint to out-shakespeare-char
iter 87750: loss 6.9589, time 2848.98ms
iter 87760: loss 7.1842, time 119.34ms
iter 87770: loss 7.1631, time 120.36ms
iter 87780: loss 7.0249, time 120.07ms
iter 87790: loss 6.3570, time 120.29ms
tensor(0.5937)
iter 87800: loss 7.2966, time 121.83ms
iter 87810: loss 7.0559, time 119.08ms
iter 87820: loss 6.5074, time 119.09ms
iter 87830: loss 6.9799, time 118.99ms
iter 87840: loss 6.8686, time 119.65ms
iter 87850: loss 7.0075, time 120.46ms
iter 87860: loss 6.9026, time 120.26ms
iter 87870: loss 7.1657, time 119.05ms
iter 87880: loss 7.3829, time 119.17ms
iter 87890: loss 6.5712, time 119.56ms
tensor(0.6243)
iter 87900: loss 7.1355, time 120.66ms
iter 87910: loss 6.6667, time 119.30ms
iter 87920: loss 6.4561, time 119.85ms
iter 87930: loss 6.6939, time 119.33ms
iter 87940: loss 7.1437, time 119.97ms
iter 87950: loss 7.9655, time 122.09ms
iter 87960: loss 6.9361, time 119.10ms
iter 87970: loss 7.2821, time 119.09ms
iter 87980: loss 7.5608, time 120.21ms
iter 87990: loss 7.0103, time 118.91ms
tensor(0.6545)
step 88000: train loss 6.3983, val loss 6.4145
saving checkpoint to out-shakespeare-char
iter 88000: loss 6.2446, time 2862.44ms
iter 88010: loss 7.1964, time 122.32ms
iter 88020: loss 5.9439, time 122.59ms
iter 88030: loss 6.9931, time 122.77ms
iter 88040: loss 7.1457, time 124.60ms
iter 88050: loss 6.7419, time 122.43ms
iter 88060: loss 6.4324, time 122.27ms
iter 88070: loss 7.1012, time 122.43ms
iter 88080: loss 7.5137, time 123.71ms
iter 88090: loss 6.9536, time 122.56ms
tensor(0.6841)
iter 88100: loss 6.8300, time 123.29ms
iter 88110: loss 7.2360, time 122.29ms
iter 88120: loss 7.5734, time 122.49ms
iter 88130: loss 6.2738, time 122.34ms
iter 88140: loss 7.1900, time 122.49ms
iter 88150: loss 7.1243, time 122.23ms
iter 88160: loss 6.9303, time 122.43ms
iter 88170: loss 6.3073, time 122.46ms
iter 88180: loss 7.8388, time 124.21ms
iter 88190: loss 7.3012, time 122.31ms
tensor(0.7129)
iter 88200: loss 7.1431, time 122.52ms
iter 88210: loss 7.0178, time 122.23ms
iter 88220: loss 7.1077, time 122.15ms
iter 88230: loss 6.8479, time 122.48ms
iter 88240: loss 7.3356, time 122.44ms
step 88250: train loss 6.4542, val loss 6.4266
saving checkpoint to out-shakespeare-char
iter 88250: loss 6.8971, time 2868.68ms
iter 88260: loss 6.8910, time 121.83ms
iter 88270: loss 6.9205, time 122.26ms
iter 88280: loss 6.4171, time 121.55ms
iter 88290: loss 6.7774, time 124.68ms
tensor(0.7409)
iter 88300: loss 7.1240, time 122.55ms
iter 88310: loss 6.9227, time 122.38ms
iter 88320: loss 7.7021, time 122.48ms
iter 88330: loss 7.8235, time 122.40ms
iter 88340: loss 7.5517, time 122.63ms
iter 88350: loss 7.1334, time 122.60ms
iter 88360: loss 6.7571, time 124.69ms
iter 88370: loss 6.7215, time 122.67ms
iter 88380: loss 6.9134, time 122.43ms
iter 88390: loss 6.8063, time 123.92ms
tensor(0.7679)
iter 88400: loss 7.2315, time 122.54ms
iter 88410: loss 8.0212, time 121.97ms
iter 88420: loss 6.3928, time 122.51ms
iter 88430: loss 7.2425, time 124.36ms
iter 88440: loss 6.8384, time 122.41ms
iter 88450: loss 6.9270, time 122.50ms
iter 88460: loss 7.1138, time 123.81ms
iter 88470: loss 7.7787, time 122.37ms
iter 88480: loss 6.9361, time 122.69ms
iter 88490: loss 7.1880, time 122.79ms
tensor(0.7939)
step 88500: train loss 6.4835, val loss 6.5163
saving checkpoint to out-shakespeare-char
iter 88500: loss 7.3247, time 2855.99ms
iter 88510: loss 7.2278, time 122.70ms
iter 88520: loss 6.9243, time 123.99ms
iter 88530: loss 7.0092, time 123.36ms
iter 88540: loss 6.9413, time 124.93ms
iter 88550: loss 7.2533, time 122.86ms
iter 88560: loss 7.2639, time 121.69ms
iter 88570: loss 7.1575, time 122.82ms
iter 88580: loss 7.2400, time 122.81ms
iter 88590: loss 6.4082, time 122.90ms
tensor(0.8187)
iter 88600: loss 7.1120, time 123.06ms
iter 88610: loss 6.7915, time 124.81ms
iter 88620: loss 7.2664, time 122.77ms
iter 88630: loss 7.7907, time 122.81ms
iter 88640: loss 7.1747, time 122.77ms
iter 88650: loss 6.9530, time 122.87ms
iter 88660: loss 7.1700, time 122.69ms
iter 88670: loss 6.8166, time 122.91ms
iter 88680: loss 7.0372, time 125.01ms
iter 88690: loss 6.7834, time 122.69ms
tensor(0.8423)
iter 88700: loss 7.2182, time 122.97ms
iter 88710: loss 7.4221, time 122.95ms
iter 88720: loss 7.2680, time 122.76ms
iter 88730: loss 7.4313, time 122.78ms
iter 88740: loss 6.8497, time 122.99ms
step 88750: train loss 6.5728, val loss 6.5368
saving checkpoint to out-shakespeare-char
iter 88750: loss 6.3468, time 2862.80ms
iter 88760: loss 6.8645, time 122.84ms
iter 88770: loss 7.2338, time 122.86ms
iter 88780: loss 7.2168, time 122.98ms
iter 88790: loss 6.8468, time 123.00ms
tensor(0.8645)
iter 88800: loss 6.8822, time 122.91ms
iter 88810: loss 7.8647, time 122.94ms
iter 88820: loss 7.5853, time 125.05ms
iter 88830: loss 6.8628, time 122.83ms
iter 88840: loss 6.3650, time 122.73ms
iter 88850: loss 7.8751, time 122.77ms
iter 88860: loss 6.9911, time 122.82ms
iter 88870: loss 6.9455, time 122.85ms
iter 88880: loss 6.6656, time 123.43ms
iter 88890: loss 7.6833, time 125.37ms
tensor(0.8853)
iter 88900: loss 7.5399, time 122.50ms
iter 88910: loss 7.3730, time 122.81ms
iter 88920: loss 7.7182, time 123.13ms
iter 88930: loss 7.8248, time 122.60ms
iter 88940: loss 7.3041, time 122.94ms
iter 88950: loss 7.7629, time 122.96ms
iter 88960: loss 6.8124, time 123.05ms
iter 88970: loss 7.3255, time 125.04ms
iter 88980: loss 6.9379, time 122.97ms
iter 88990: loss 7.4667, time 122.91ms
tensor(0.9045)
step 89000: train loss 6.5146, val loss 6.5161
saving checkpoint to out-shakespeare-char
iter 89000: loss 7.1020, time 2846.83ms
iter 89010: loss 7.4925, time 120.27ms
iter 89020: loss 7.0613, time 121.00ms
iter 89030: loss 7.1193, time 119.66ms
iter 89040: loss 7.0317, time 120.60ms
iter 89050: loss 7.4542, time 120.45ms
iter 89060: loss 7.3260, time 119.22ms
iter 89070: loss 6.1918, time 119.36ms
iter 89080: loss 7.2124, time 119.61ms
iter 89090: loss 7.0406, time 119.31ms
tensor(0.9222)
iter 89100: loss 7.3425, time 120.53ms
iter 89110: loss 8.0199, time 119.56ms
iter 89120: loss 6.8565, time 119.19ms
iter 89130: loss 7.1349, time 119.59ms
iter 89140: loss 7.1739, time 119.60ms
iter 89150: loss 7.0365, time 120.67ms
iter 89160: loss 7.3786, time 120.46ms
iter 89170: loss 7.4144, time 118.33ms
iter 89180: loss 7.4628, time 119.29ms
iter 89190: loss 7.0507, time 120.56ms
tensor(0.9382)
iter 89200: loss 6.4837, time 119.38ms
iter 89210: loss 7.1672, time 119.40ms
iter 89220: loss 7.2330, time 120.60ms
iter 89230: loss 6.7216, time 119.73ms
iter 89240: loss 7.2913, time 121.19ms
step 89250: train loss 6.6173, val loss 6.6812
saving checkpoint to out-shakespeare-char
iter 89250: loss 7.3410, time 2872.32ms
iter 89260: loss 5.8597, time 119.60ms
iter 89270: loss 7.5910, time 121.40ms
iter 89280: loss 7.7132, time 119.47ms
iter 89290: loss 7.4031, time 121.29ms
tensor(0.9524)
iter 89300: loss 7.2199, time 119.36ms
iter 89310: loss 6.9627, time 120.52ms
iter 89320: loss 7.3290, time 119.41ms
iter 89330: loss 7.3677, time 121.07ms
iter 89340: loss 7.1479, time 119.43ms
iter 89350: loss 7.7599, time 119.46ms
iter 89360: loss 7.1671, time 119.56ms
iter 89370: loss 6.5709, time 120.62ms
iter 89380: loss 7.0438, time 119.38ms
iter 89390: loss 7.1564, time 119.55ms
tensor(0.9649)
iter 89400: loss 7.1765, time 121.56ms
iter 89410: loss 7.5521, time 119.44ms
iter 89420: loss 6.9983, time 120.42ms
iter 89430: loss 7.2079, time 119.55ms
iter 89440: loss 6.7125, time 120.65ms
iter 89450: loss 6.6454, time 121.87ms
iter 89460: loss 6.9440, time 120.34ms
iter 89470: loss 8.1327, time 119.33ms
iter 89480: loss 7.6230, time 119.52ms
iter 89490: loss 6.8966, time 120.65ms
tensor(0.9755)
step 89500: train loss 6.6961, val loss 6.6539
saving checkpoint to out-shakespeare-char
iter 89500: loss 6.9331, time 2874.81ms
iter 89510: loss 6.5941, time 120.72ms
iter 89520: loss 7.6644, time 120.64ms
iter 89530: loss 7.4796, time 119.41ms
iter 89540: loss 7.4526, time 119.39ms
iter 89550: loss 7.5287, time 119.53ms
iter 89560: loss 8.2122, time 119.85ms
iter 89570: loss 6.5659, time 120.54ms
iter 89580: loss 7.6569, time 119.42ms
iter 89590: loss 7.0675, time 120.64ms
tensor(0.9843)
iter 89600: loss 6.9390, time 120.81ms
iter 89610: loss 8.5655, time 119.37ms
iter 89620: loss 7.0552, time 119.50ms
iter 89630: loss 7.0127, time 121.53ms
iter 89640: loss 7.1434, time 119.32ms
iter 89650: loss 7.5976, time 120.64ms
iter 89660: loss 6.9495, time 120.52ms
iter 89670: loss 6.7306, time 121.39ms
iter 89680: loss 7.7060, time 119.33ms
iter 89690: loss 6.6601, time 120.53ms
tensor(0.9911)
iter 89700: loss 7.1438, time 120.68ms
iter 89710: loss 6.7302, time 120.36ms
iter 89720: loss 8.2641, time 119.50ms
iter 89730: loss 7.2679, time 121.53ms
iter 89740: loss 7.2272, time 120.55ms
step 89750: train loss 6.6131, val loss 6.6387
saving checkpoint to out-shakespeare-char
iter 89750: loss 7.5897, time 2878.53ms
iter 89760: loss 7.8068, time 119.77ms
iter 89770: loss 7.2848, time 119.19ms
iter 89780: loss 7.6639, time 120.13ms
iter 89790: loss 7.4908, time 119.20ms
tensor(0.9961)
iter 89800: loss 7.6072, time 119.54ms
iter 89810: loss 7.1280, time 119.48ms
iter 89820: loss 7.2752, time 120.21ms
iter 89830: loss 7.6542, time 119.37ms
iter 89840: loss 7.9037, time 119.33ms
iter 89850: loss 6.3132, time 119.37ms
iter 89860: loss 6.5573, time 119.29ms
iter 89870: loss 7.2986, time 119.50ms
iter 89880: loss 7.3641, time 119.77ms
iter 89890: loss 7.8903, time 120.47ms
tensor(0.9990)
iter 89900: loss 7.5682, time 119.62ms
iter 89910: loss 7.7111, time 119.45ms
iter 89920: loss 7.4999, time 120.45ms
iter 89930: loss 6.7979, time 120.16ms
iter 89940: loss 7.0228, time 120.47ms
iter 89950: loss 6.8710, time 119.42ms
iter 89960: loss 6.6766, time 119.37ms
iter 89970: loss 7.2967, time 119.50ms
iter 89980: loss 7.8407, time 120.55ms
iter 89990: loss 7.5343, time 119.22ms
tensor(1.)
step 90000: train loss 6.7099, val loss 6.6612
saving checkpoint to out-shakespeare-char
iter 90000: loss 7.7273, time 2873.74ms
iter 90010: loss 6.3950, time 120.70ms
iter 90020: loss 7.0699, time 120.24ms
iter 90030: loss 7.4429, time 120.39ms
iter 90040: loss 7.5519, time 119.35ms
iter 90050: loss 7.3028, time 119.27ms
iter 90060: loss 7.0983, time 119.40ms
iter 90070: loss 7.6672, time 119.41ms
iter 90080: loss 7.1610, time 120.42ms
iter 90090: loss 7.3151, time 120.91ms
tensor(0.9990)
iter 90100: loss 7.3400, time 120.88ms
iter 90110: loss 7.7713, time 120.87ms
iter 90120: loss 7.1489, time 119.40ms
iter 90130: loss 7.3415, time 119.37ms
iter 90140: loss 7.4001, time 120.53ms
iter 90150: loss 6.7515, time 119.32ms
iter 90160: loss 7.6597, time 120.40ms
iter 90170: loss 6.8888, time 120.31ms
iter 90180: loss 7.1235, time 120.51ms
iter 90190: loss 7.7825, time 119.30ms
tensor(0.9961)
iter 90200: loss 7.5013, time 119.36ms
iter 90210: loss 7.1553, time 119.39ms
iter 90220: loss 7.5112, time 119.22ms
iter 90230: loss 7.1892, time 119.32ms
iter 90240: loss 6.5785, time 119.19ms
step 90250: train loss 6.6454, val loss 6.6263
saving checkpoint to out-shakespeare-char
iter 90250: loss 7.6128, time 2868.15ms
iter 90260: loss 7.6758, time 120.50ms
iter 90270: loss 8.3772, time 119.89ms
iter 90280: loss 7.2947, time 120.65ms
iter 90290: loss 7.4950, time 119.23ms
tensor(0.9911)
iter 90300: loss 7.0824, time 119.17ms
iter 90310: loss 7.3325, time 119.10ms
iter 90320: loss 6.8964, time 122.91ms
iter 90330: loss 7.0187, time 123.03ms
iter 90340: loss 7.0895, time 124.92ms
iter 90350: loss 7.7176, time 125.19ms
iter 90360: loss 7.7046, time 123.03ms
iter 90370: loss 7.0227, time 123.32ms
iter 90380: loss 7.3246, time 123.34ms
iter 90390: loss 6.9861, time 123.73ms
tensor(0.9843)
iter 90400: loss 7.9454, time 122.92ms
iter 90410: loss 7.7861, time 123.14ms
iter 90420: loss 6.6202, time 122.77ms
iter 90430: loss 7.3386, time 123.14ms
iter 90440: loss 7.0233, time 122.95ms
iter 90450: loss 6.5525, time 123.36ms
iter 90460: loss 7.7259, time 122.68ms
iter 90470: loss 6.9914, time 123.42ms
iter 90480: loss 7.8294, time 123.03ms
iter 90490: loss 7.4078, time 125.36ms
tensor(0.9755)
step 90500: train loss 6.6335, val loss 6.6724
saving checkpoint to out-shakespeare-char
iter 90500: loss 7.8048, time 2819.96ms
iter 90510: loss 7.4520, time 124.14ms
iter 90520: loss 8.0183, time 123.22ms
iter 90530: loss 7.7423, time 122.23ms
iter 90540: loss 7.4748, time 122.35ms
iter 90550: loss 7.7332, time 122.00ms
iter 90560: loss 7.6365, time 121.91ms
iter 90570: loss 7.4632, time 122.53ms
iter 90580: loss 6.8005, time 122.47ms
iter 90590: loss 7.2714, time 122.64ms
tensor(0.9649)
iter 90600: loss 6.3467, time 122.78ms
iter 90610: loss 7.0305, time 122.46ms
iter 90620: loss 7.2595, time 122.68ms
iter 90630: loss 7.8047, time 122.79ms
iter 90640: loss 7.4194, time 122.79ms
iter 90650: loss 7.0214, time 123.08ms
iter 90660: loss 6.8369, time 125.60ms
iter 90670: loss 6.4972, time 122.43ms
iter 90680: loss 7.6131, time 124.04ms
iter 90690: loss 6.9565, time 122.78ms
tensor(0.9524)
iter 90700: loss 7.7193, time 123.13ms
iter 90710: loss 7.1361, time 122.80ms
iter 90720: loss 6.5673, time 122.80ms
iter 90730: loss 7.3339, time 125.15ms
iter 90740: loss 7.6284, time 123.01ms
step 90750: train loss 6.5813, val loss 6.5828
saving checkpoint to out-shakespeare-char
iter 90750: loss 7.3047, time 2857.83ms
iter 90760: loss 7.4473, time 120.49ms
iter 90770: loss 7.5948, time 119.14ms
iter 90780: loss 7.6944, time 120.44ms
iter 90790: loss 7.1256, time 120.83ms
tensor(0.9382)
iter 90800: loss 7.4550, time 119.87ms
iter 90810: loss 6.9839, time 120.44ms
iter 90820: loss 7.7871, time 119.11ms
iter 90830: loss 8.2615, time 120.05ms
iter 90840: loss 7.3404, time 121.72ms
iter 90850: loss 7.3862, time 118.50ms
iter 90860: loss 7.3564, time 119.31ms
iter 90870: loss 6.9766, time 120.55ms
iter 90880: loss 6.8447, time 119.29ms
iter 90890: loss 7.4628, time 121.81ms
tensor(0.9222)
iter 90900: loss 7.7151, time 119.18ms
iter 90910: loss 6.9539, time 120.20ms
iter 90920: loss 7.2405, time 119.11ms
iter 90930: loss 7.2381, time 119.29ms
iter 90940: loss 6.7584, time 119.11ms
iter 90950: loss 7.0536, time 119.21ms
iter 90960: loss 7.4132, time 120.30ms
iter 90970: loss 7.4618, time 119.18ms
iter 90980: loss 6.6482, time 119.98ms
iter 90990: loss 7.4376, time 119.43ms
tensor(0.9045)
step 91000: train loss 6.6083, val loss 6.6061
saving checkpoint to out-shakespeare-char
iter 91000: loss 6.9986, time 2850.70ms
iter 91010: loss 7.3407, time 122.47ms
iter 91020: loss 7.4531, time 122.42ms
iter 91030: loss 8.0178, time 124.80ms
iter 91040: loss 7.0173, time 122.08ms
iter 91050: loss 7.9696, time 122.78ms
iter 91060: loss 7.1758, time 122.30ms
iter 91070: loss 7.1401, time 123.53ms
iter 91080: loss 7.1195, time 122.25ms
iter 91090: loss 7.5485, time 122.32ms
tensor(0.8853)
iter 91100: loss 7.4831, time 124.38ms
iter 91110: loss 6.6982, time 122.49ms
iter 91120: loss 7.4913, time 122.01ms
iter 91130: loss 6.9068, time 123.03ms
iter 91140: loss 7.4137, time 122.61ms
iter 91150: loss 7.2550, time 124.01ms
iter 91160: loss 6.6873, time 122.75ms
iter 91170: loss 7.3018, time 124.67ms
iter 91180: loss 7.2506, time 122.29ms
iter 91190: loss 7.0010, time 121.77ms
tensor(0.8645)
iter 91200: loss 6.2924, time 123.25ms
iter 91210: loss 7.7828, time 122.23ms
iter 91220: loss 7.0431, time 120.83ms
iter 91230: loss 7.0122, time 122.52ms
iter 91240: loss 6.7403, time 122.78ms
step 91250: train loss 6.5763, val loss 6.6042
saving checkpoint to out-shakespeare-char
iter 91250: loss 7.3271, time 2868.41ms
iter 91260: loss 7.7778, time 122.49ms
iter 91270: loss 8.1038, time 120.57ms
iter 91280: loss 7.0586, time 121.51ms
iter 91290: loss 7.2671, time 119.18ms
tensor(0.8423)
iter 91300: loss 6.0557, time 121.19ms
iter 91310: loss 7.6968, time 118.89ms
iter 91320: loss 7.3483, time 119.29ms
iter 91330: loss 7.4983, time 119.12ms
iter 91340: loss 7.8245, time 119.07ms
iter 91350: loss 7.0460, time 119.18ms
iter 91360: loss 7.1636, time 118.88ms
iter 91370: loss 7.6234, time 118.01ms
iter 91380: loss 7.2329, time 120.39ms
iter 91390: loss 7.1016, time 119.09ms
tensor(0.8187)
iter 91400: loss 7.1044, time 119.10ms
iter 91410: loss 6.8486, time 120.16ms
iter 91420: loss 7.7249, time 119.23ms
iter 91430: loss 6.9093, time 120.29ms
iter 91440: loss 7.1403, time 119.04ms
iter 91450: loss 7.3060, time 119.93ms
iter 91460: loss 8.0294, time 118.95ms
iter 91470: loss 7.1343, time 119.65ms
iter 91480: loss 7.3839, time 120.53ms
iter 91490: loss 6.7339, time 119.19ms
tensor(0.7939)
step 91500: train loss 6.5617, val loss 6.5338
saving checkpoint to out-shakespeare-char
iter 91500: loss 7.4328, time 2867.89ms
iter 91510: loss 7.3832, time 119.29ms
iter 91520: loss 6.9800, time 119.11ms
iter 91530: loss 6.8608, time 119.42ms
iter 91540: loss 7.0109, time 119.13ms
iter 91550: loss 7.2578, time 119.09ms
iter 91560: loss 6.4925, time 121.21ms
iter 91570: loss 7.0843, time 119.18ms
iter 91580: loss 6.9304, time 120.66ms
iter 91590: loss 6.8326, time 118.87ms
tensor(0.7679)
iter 91600: loss 7.1902, time 120.32ms
iter 91610: loss 8.0286, time 120.20ms
iter 91620: loss 7.8523, time 118.99ms
iter 91630: loss 7.0460, time 120.51ms
iter 91640: loss 6.4783, time 120.23ms
iter 91650: loss 7.0948, time 120.34ms
iter 91660: loss 6.9790, time 119.07ms
iter 91670: loss 7.4044, time 118.23ms
iter 91680: loss 7.5687, time 119.13ms
iter 91690: loss 7.3411, time 119.29ms
tensor(0.7409)
iter 91700: loss 7.5969, time 119.21ms
iter 91710: loss 7.4073, time 119.13ms
iter 91720: loss 6.6556, time 121.76ms
iter 91730: loss 8.1118, time 120.04ms
iter 91740: loss 7.2475, time 120.66ms
step 91750: train loss 6.5055, val loss 6.5730
saving checkpoint to out-shakespeare-char
iter 91750: loss 7.1926, time 2866.43ms
iter 91760: loss 6.9976, time 120.73ms
iter 91770: loss 6.6179, time 120.42ms
iter 91780: loss 6.8443, time 120.38ms
iter 91790: loss 7.4032, time 121.72ms
tensor(0.7129)
iter 91800: loss 6.9332, time 120.48ms
iter 91810: loss 6.4994, time 119.22ms
iter 91820: loss 7.5684, time 120.13ms
iter 91830: loss 7.1628, time 120.25ms
iter 91840: loss 7.1132, time 119.31ms
iter 91850: loss 7.6710, time 118.03ms
iter 91860: loss 6.2622, time 121.74ms
iter 91870: loss 7.2838, time 119.23ms
iter 91880: loss 6.7378, time 119.03ms
iter 91890: loss 6.4051, time 119.73ms
tensor(0.6841)
iter 91900: loss 6.5220, time 120.62ms
iter 91910: loss 7.8200, time 120.68ms
iter 91920: loss 7.0459, time 119.20ms
iter 91930: loss 8.0445, time 119.27ms
iter 91940: loss 8.1145, time 122.25ms
iter 91950: loss 7.8018, time 119.29ms
iter 91960: loss 7.5100, time 119.40ms
iter 91970: loss 7.0675, time 120.16ms
iter 91980: loss 6.9701, time 120.33ms
iter 91990: loss 7.4479, time 119.24ms
tensor(0.6545)
step 92000: train loss 6.5371, val loss 6.5298
saving checkpoint to out-shakespeare-char
iter 92000: loss 7.1851, time 2878.17ms
iter 92010: loss 7.0738, time 122.78ms
iter 92020: loss 7.6432, time 123.06ms
iter 92030: loss 6.8295, time 124.98ms
iter 92040: loss 8.0773, time 122.68ms
iter 92050: loss 6.8822, time 122.43ms
iter 92060: loss 6.9897, time 122.36ms
iter 92070: loss 6.6487, time 122.30ms
iter 92080: loss 6.8078, time 122.32ms
iter 92090: loss 6.4224, time 122.84ms
tensor(0.6243)
iter 92100: loss 7.5912, time 123.53ms
iter 92110: loss 7.1735, time 122.92ms
iter 92120: loss 7.3110, time 123.15ms
iter 92130: loss 6.9685, time 123.38ms
iter 92140: loss 7.6201, time 122.77ms
iter 92150: loss 7.1887, time 122.62ms
iter 92160: loss 7.0869, time 124.97ms
iter 92170: loss 6.8532, time 121.58ms
iter 92180: loss 6.5065, time 123.10ms
iter 92190: loss 7.5504, time 122.06ms
tensor(0.5937)
iter 92200: loss 6.6549, time 122.97ms
iter 92210: loss 6.3244, time 123.91ms
iter 92220: loss 6.9705, time 121.24ms
iter 92230: loss 7.0515, time 121.56ms
iter 92240: loss 7.3665, time 122.69ms
step 92250: train loss 6.4953, val loss 6.4476
saving checkpoint to out-shakespeare-char
iter 92250: loss 6.8094, time 2846.47ms
iter 92260: loss 7.3473, time 119.68ms
iter 92270: loss 6.7221, time 120.46ms
iter 92280: loss 7.1012, time 119.34ms
iter 92290: loss 7.2821, time 120.63ms
tensor(0.5627)
iter 92300: loss 7.2573, time 120.78ms
iter 92310: loss 7.3712, time 120.57ms
iter 92320: loss 7.5469, time 121.74ms
iter 92330: loss 8.0183, time 120.87ms
iter 92340: loss 7.2206, time 119.28ms
iter 92350: loss 7.6750, time 119.47ms
iter 92360: loss 7.7047, time 120.73ms
iter 92370: loss 7.6506, time 120.60ms
iter 92380: loss 7.7904, time 121.03ms
iter 92390: loss 7.6759, time 120.33ms
tensor(0.5314)
iter 92400: loss 6.7729, time 119.31ms
iter 92410: loss 6.6878, time 119.31ms
iter 92420: loss 6.6731, time 121.92ms
iter 92430: loss 7.3010, time 119.00ms
iter 92440: loss 7.1280, time 119.44ms
iter 92450: loss 6.9889, time 121.39ms
iter 92460: loss 7.3304, time 119.59ms
iter 92470: loss 6.6666, time 119.26ms
iter 92480: loss 7.1074, time 119.35ms
iter 92490: loss 7.3211, time 119.18ms
tensor(0.5000)
step 92500: train loss 6.4059, val loss 6.4170
saving checkpoint to out-shakespeare-char
iter 92500: loss 7.4064, time 2877.49ms
iter 92510: loss 7.3720, time 120.59ms
iter 92520: loss 7.7100, time 120.38ms
iter 92530: loss 7.1130, time 118.70ms
iter 92540: loss 6.6723, time 119.25ms
iter 92550: loss 6.4005, time 119.44ms
iter 92560: loss 7.1984, time 120.41ms
iter 92570: loss 6.4649, time 121.02ms
iter 92580: loss 7.3492, time 118.91ms
iter 92590: loss 7.6583, time 119.17ms
tensor(0.4686)
iter 92600: loss 7.0791, time 120.49ms
iter 92610: loss 6.6135, time 120.46ms
iter 92620: loss 6.7146, time 120.21ms
iter 92630: loss 7.4626, time 119.51ms
iter 92640: loss 6.2826, time 118.83ms
iter 92650: loss 6.4008, time 119.69ms
iter 92660: loss 7.3176, time 119.32ms
iter 92670: loss 6.7403, time 120.56ms
iter 92680: loss 6.9618, time 120.34ms
iter 92690: loss 7.0230, time 119.36ms
tensor(0.4373)
iter 92700: loss 6.6872, time 118.68ms
iter 92710: loss 6.7157, time 120.49ms
iter 92720: loss 6.8805, time 121.12ms
iter 92730: loss 6.7903, time 120.22ms
iter 92740: loss 5.9832, time 119.24ms
step 92750: train loss 6.3934, val loss 6.3958
saving checkpoint to out-shakespeare-char
iter 92750: loss 7.1386, time 2879.01ms
iter 92760: loss 6.5180, time 120.86ms
iter 92770: loss 7.4677, time 120.72ms
iter 92780: loss 6.8634, time 120.57ms
iter 92790: loss 6.5701, time 120.67ms
tensor(0.4063)
iter 92800: loss 7.8016, time 120.68ms
iter 92810: loss 6.5216, time 120.56ms
iter 92820: loss 6.9811, time 119.68ms
iter 92830: loss 7.0603, time 121.42ms
iter 92840: loss 7.5847, time 120.58ms
iter 92850: loss 6.8760, time 120.98ms
iter 92860: loss 6.8701, time 121.14ms
iter 92870: loss 6.3680, time 121.98ms
iter 92880: loss 7.5869, time 120.08ms
iter 92890: loss 6.7741, time 119.17ms
tensor(0.3757)
iter 92900: loss 6.3938, time 118.96ms
iter 92910: loss 7.2519, time 119.52ms
iter 92920: loss 6.8960, time 120.52ms
iter 92930: loss 6.9941, time 120.52ms
iter 92940: loss 6.8562, time 120.52ms
iter 92950: loss 8.3698, time 120.59ms
iter 92960: loss 7.1951, time 119.13ms
iter 92970: loss 6.8614, time 119.32ms
iter 92980: loss 6.7628, time 120.19ms
iter 92990: loss 6.8993, time 119.45ms
tensor(0.3455)
step 93000: train loss 6.3123, val loss 6.4283
saving checkpoint to out-shakespeare-char
iter 93000: loss 6.6154, time 2890.46ms
iter 93010: loss 6.7654, time 120.70ms
iter 93020: loss 7.3584, time 119.61ms
iter 93030: loss 6.7095, time 119.24ms
iter 93040: loss 6.4649, time 120.66ms
iter 93050: loss 6.2032, time 119.18ms
iter 93060: loss 6.6888, time 119.58ms
iter 93070: loss 6.7985, time 119.14ms
iter 93080: loss 6.4217, time 119.58ms
iter 93090: loss 6.6413, time 119.21ms
tensor(0.3159)
iter 93100: loss 8.0242, time 120.96ms
iter 93110: loss 7.4894, time 120.69ms
iter 93120: loss 6.4301, time 120.61ms
iter 93130: loss 6.1379, time 119.49ms
iter 93140: loss 7.2979, time 120.64ms
iter 93150: loss 7.1570, time 119.36ms
iter 93160: loss 7.1983, time 120.99ms
iter 93170: loss 6.5928, time 120.60ms
iter 93180: loss 7.6950, time 119.67ms
iter 93190: loss 6.5807, time 119.39ms
tensor(0.2871)
iter 93200: loss 7.0009, time 121.39ms
iter 93210: loss 6.7395, time 119.09ms
iter 93220: loss 7.0170, time 119.96ms
iter 93230: loss 6.3832, time 119.25ms
iter 93240: loss 6.2571, time 119.61ms
step 93250: train loss 6.2953, val loss 6.3066
saving checkpoint to out-shakespeare-char
iter 93250: loss 7.7360, time 2887.20ms
iter 93260: loss 6.7994, time 119.19ms
iter 93270: loss 7.1181, time 120.58ms
iter 93280: loss 7.3087, time 119.40ms
iter 93290: loss 6.2028, time 118.38ms
tensor(0.2591)
iter 93300: loss 6.5491, time 120.84ms
iter 93310: loss 7.3795, time 119.39ms
iter 93320: loss 6.5210, time 121.80ms
iter 93330: loss 6.3096, time 120.49ms
iter 93340: loss 6.5801, time 119.79ms
iter 93350: loss 6.4886, time 119.38ms
iter 93360: loss 7.0040, time 119.42ms
iter 93370: loss 6.8114, time 122.19ms
iter 93380: loss 7.0445, time 120.69ms
iter 93390: loss 6.8267, time 119.76ms
tensor(0.2321)
iter 93400: loss 6.5994, time 119.97ms
iter 93410: loss 6.4089, time 119.37ms
iter 93420: loss 6.6332, time 120.77ms
iter 93430: loss 6.7623, time 120.47ms
iter 93440: loss 7.0094, time 119.32ms
iter 93450: loss 6.1933, time 119.30ms
iter 93460: loss 6.9792, time 120.27ms
iter 93470: loss 6.3756, time 120.80ms
iter 93480: loss 6.4070, time 119.30ms
iter 93490: loss 7.4212, time 119.94ms
tensor(0.2061)
step 93500: train loss 6.2907, val loss 6.2081
saving checkpoint to out-shakespeare-char
iter 93500: loss 6.9327, time 2880.08ms
iter 93510: loss 7.0649, time 119.25ms
iter 93520: loss 6.3413, time 120.70ms
iter 93530: loss 7.1088, time 120.70ms
iter 93540: loss 7.2174, time 121.36ms
iter 93550: loss 6.5289, time 120.46ms
iter 93560: loss 6.9375, time 120.39ms
iter 93570: loss 7.1838, time 119.24ms
iter 93580: loss 7.0109, time 121.46ms
iter 93590: loss 6.7605, time 119.45ms
tensor(0.1813)
iter 93600: loss 7.0561, time 121.50ms
iter 93610: loss 6.9319, time 119.33ms
iter 93620: loss 6.5163, time 119.21ms
iter 93630: loss 6.2357, time 120.99ms
iter 93640: loss 7.4000, time 119.29ms
iter 93650: loss 6.6755, time 121.41ms
iter 93660: loss 7.3279, time 120.43ms
iter 93670: loss 7.2454, time 118.38ms
iter 93680: loss 7.1647, time 119.26ms
iter 93690: loss 7.6214, time 121.44ms
tensor(0.1577)
iter 93700: loss 6.7857, time 120.81ms
iter 93710: loss 6.9683, time 121.42ms
iter 93720: loss 7.1401, time 119.09ms
iter 93730: loss 7.0260, time 119.25ms
iter 93740: loss 6.2476, time 119.20ms
step 93750: train loss 6.2086, val loss 6.2309
saving checkpoint to out-shakespeare-char
iter 93750: loss 7.0053, time 2861.74ms
iter 93760: loss 7.5972, time 120.35ms
iter 93770: loss 7.0992, time 121.15ms
iter 93780: loss 7.6880, time 120.92ms
iter 93790: loss 6.5978, time 121.79ms
tensor(0.1355)
iter 93800: loss 6.9964, time 118.61ms
iter 93810: loss 6.4957, time 120.57ms
iter 93820: loss 6.4812, time 121.14ms
iter 93830: loss 6.2131, time 119.02ms
iter 93840: loss 7.0024, time 120.94ms
iter 93850: loss 6.6322, time 123.13ms
iter 93860: loss 7.3899, time 119.57ms
iter 93870: loss 7.3255, time 121.52ms
iter 93880: loss 6.2847, time 120.78ms
iter 93890: loss 6.1329, time 119.15ms
tensor(0.1147)
iter 93900: loss 6.3693, time 122.64ms
iter 93910: loss 6.6573, time 119.48ms
iter 93920: loss 6.0328, time 121.28ms
iter 93930: loss 6.6773, time 120.99ms
iter 93940: loss 6.1986, time 120.57ms
iter 93950: loss 6.8769, time 122.33ms
iter 93960: loss 7.2696, time 120.03ms
iter 93970: loss 6.6754, time 119.25ms
iter 93980: loss 6.6804, time 119.23ms
iter 93990: loss 6.8571, time 119.04ms
tensor(0.0955)
step 94000: train loss 6.2211, val loss 6.2262
saving checkpoint to out-shakespeare-char
iter 94000: loss 7.2569, time 2870.54ms
iter 94010: loss 6.8059, time 123.11ms
iter 94020: loss 6.5602, time 122.94ms
iter 94030: loss 6.9485, time 123.99ms
iter 94040: loss 6.3315, time 123.27ms
iter 94050: loss 6.6838, time 123.48ms
iter 94060: loss 6.8773, time 125.50ms
iter 94070: loss 6.7378, time 123.06ms
iter 94080: loss 6.7197, time 123.43ms
iter 94090: loss 6.4974, time 124.15ms
tensor(0.0778)
iter 94100: loss 6.9205, time 123.81ms
iter 94110: loss 6.7061, time 123.46ms
iter 94120: loss 7.3282, time 123.62ms
iter 94130: loss 7.1005, time 123.20ms
iter 94140: loss 6.2431, time 125.10ms
iter 94150: loss 6.9502, time 122.70ms
iter 94160: loss 6.4250, time 122.68ms
iter 94170: loss 6.9640, time 122.15ms
iter 94180: loss 7.1446, time 122.84ms
iter 94190: loss 6.5225, time 122.72ms
tensor(0.0618)
iter 94200: loss 6.7880, time 122.88ms
iter 94210: loss 6.6775, time 125.45ms
iter 94220: loss 6.9095, time 122.83ms
iter 94230: loss 7.3236, time 123.14ms
iter 94240: loss 7.2542, time 123.60ms
step 94250: train loss 6.1442, val loss 6.1893
saving checkpoint to out-shakespeare-char
iter 94250: loss 7.3472, time 2862.43ms
iter 94260: loss 6.4335, time 123.08ms
iter 94270: loss 6.9619, time 125.26ms
iter 94280: loss 6.3049, time 122.97ms
iter 94290: loss 7.1805, time 122.88ms
tensor(0.0476)
iter 94300: loss 6.4710, time 122.85ms
iter 94310: loss 7.2346, time 122.23ms
iter 94320: loss 7.2335, time 122.76ms
iter 94330: loss 7.3691, time 123.00ms
iter 94340: loss 7.4108, time 125.16ms
iter 94350: loss 5.5617, time 122.94ms
iter 94360: loss 6.5329, time 123.13ms
iter 94370: loss 7.0750, time 122.83ms
iter 94380: loss 6.7852, time 122.64ms
iter 94390: loss 6.5713, time 122.90ms
tensor(0.0351)
iter 94400: loss 7.2015, time 123.18ms
iter 94410: loss 6.7400, time 122.84ms
iter 94420: loss 6.2679, time 121.80ms
iter 94430: loss 7.0301, time 122.35ms
iter 94440: loss 6.6486, time 123.99ms
iter 94450: loss 5.9364, time 122.28ms
iter 94460: loss 7.6483, time 123.02ms
iter 94470: loss 7.6126, time 122.12ms
iter 94480: loss 7.0977, time 122.56ms
iter 94490: loss 6.9466, time 122.87ms
tensor(0.0245)
step 94500: train loss 6.1439, val loss 6.1650
saving checkpoint to out-shakespeare-char
iter 94500: loss 6.3062, time 2837.51ms
iter 94510: loss 6.9628, time 122.63ms
iter 94520: loss 6.1540, time 121.69ms
iter 94530: loss 6.9368, time 122.36ms
iter 94540: loss 7.2878, time 122.99ms
iter 94550: loss 6.2278, time 122.85ms
iter 94560: loss 7.0671, time 125.21ms
iter 94570: loss 6.7934, time 123.60ms
iter 94580: loss 7.4545, time 123.24ms
iter 94590: loss 5.8536, time 123.27ms
tensor(0.0157)
iter 94600: loss 6.8733, time 123.09ms
iter 94610: loss 6.6838, time 122.72ms
iter 94620: loss 6.6883, time 122.24ms
iter 94630: loss 6.9987, time 124.87ms
iter 94640: loss 6.4760, time 123.47ms
iter 94650: loss 6.8947, time 123.13ms
iter 94660: loss 7.1820, time 123.37ms
iter 94670: loss 6.8569, time 122.91ms
iter 94680: loss 6.4312, time 123.13ms
iter 94690: loss 6.9345, time 123.25ms
tensor(0.0089)
iter 94700: loss 7.5005, time 123.43ms
iter 94710: loss 6.8185, time 126.16ms
iter 94720: loss 6.5123, time 122.99ms
iter 94730: loss 6.3640, time 123.02ms
iter 94740: loss 6.5642, time 123.23ms
step 94750: train loss 6.1334, val loss 6.1600
saving checkpoint to out-shakespeare-char
iter 94750: loss 6.9885, time 2850.02ms
iter 94760: loss 6.9750, time 123.64ms
iter 94770: loss 6.3484, time 122.38ms
iter 94780: loss 7.5210, time 123.16ms
iter 94790: loss 6.4953, time 124.68ms
tensor(0.0039)
iter 94800: loss 6.9611, time 122.96ms
iter 94810: loss 6.6837, time 122.15ms
iter 94820: loss 6.8152, time 121.74ms
iter 94830: loss 6.4395, time 122.44ms
iter 94840: loss 6.9547, time 122.82ms
iter 94850: loss 6.3932, time 122.71ms
iter 94860: loss 5.4432, time 124.75ms
iter 94870: loss 6.8998, time 122.27ms
iter 94880: loss 6.7277, time 122.88ms
iter 94890: loss 6.6152, time 122.31ms
tensor(0.0010)
iter 94900: loss 7.3641, time 122.40ms
iter 94910: loss 6.6718, time 122.84ms
iter 94920: loss 6.3786, time 124.81ms
iter 94930: loss 6.2258, time 122.95ms
iter 94940: loss 6.4041, time 122.57ms
iter 94950: loss 6.7074, time 122.48ms
iter 94960: loss 6.8638, time 122.43ms
iter 94970: loss 7.1010, time 121.27ms
iter 94980: loss 6.2774, time 122.71ms
iter 94990: loss 7.0804, time 124.67ms
tensor(0.0010)
step 95000: train loss 6.1839, val loss 6.1864
saving checkpoint to out-shakespeare-char
iter 95000: loss 7.0891, time 2833.61ms
iter 95010: loss 6.5645, time 122.97ms
iter 95020: loss 7.7901, time 122.83ms
iter 95030: loss 7.0171, time 124.54ms
iter 95040: loss 6.4976, time 121.73ms
iter 95050: loss 6.5795, time 122.67ms
iter 95060: loss 6.6227, time 122.65ms
iter 95070: loss 6.4830, time 122.34ms
iter 95080: loss 6.3671, time 122.69ms
iter 95090: loss 6.4616, time 122.67ms
tensor(0.0010)
iter 95100: loss 7.2719, time 124.59ms
iter 95110: loss 6.2891, time 122.75ms
iter 95120: loss 6.7843, time 122.71ms
iter 95130: loss 6.5001, time 122.51ms
iter 95140: loss 6.8736, time 122.37ms
iter 95150: loss 6.8806, time 122.69ms
iter 95160: loss 7.1631, time 122.66ms
iter 95170: loss 6.7589, time 122.77ms
iter 95180: loss 6.2696, time 124.63ms
iter 95190: loss 6.0701, time 122.87ms
tensor(0.0039)
iter 95200: loss 5.8602, time 122.73ms
iter 95210: loss 6.1518, time 123.50ms
iter 95220: loss 6.7151, time 123.02ms
iter 95230: loss 6.1879, time 123.34ms
iter 95240: loss 5.7996, time 123.07ms
step 95250: train loss 6.1145, val loss 6.1364
saving checkpoint to out-shakespeare-char
iter 95250: loss 7.1965, time 2829.64ms
iter 95260: loss 6.4816, time 123.40ms
iter 95270: loss 6.4714, time 123.01ms
iter 95280: loss 6.6791, time 122.92ms
iter 95290: loss 6.6996, time 122.96ms
tensor(0.0089)
iter 95300: loss 6.8982, time 123.18ms
iter 95310: loss 7.0801, time 125.16ms
iter 95320: loss 6.6249, time 122.91ms
iter 95330: loss 6.5334, time 123.38ms
iter 95340: loss 7.5948, time 122.80ms
iter 95350: loss 6.0993, time 122.73ms
iter 95360: loss 6.6837, time 124.50ms
iter 95370: loss 5.7761, time 123.12ms
iter 95380: loss 6.5247, time 125.03ms
iter 95390: loss 6.3302, time 122.18ms
tensor(0.0157)
iter 95400: loss 6.1885, time 123.16ms
iter 95410: loss 6.6435, time 123.08ms
iter 95420: loss 6.9302, time 122.78ms
iter 95430: loss 5.9988, time 122.81ms
iter 95440: loss 6.3375, time 122.85ms
iter 95450: loss 6.2206, time 124.83ms
iter 95460: loss 6.3938, time 124.23ms
iter 95470: loss 6.6912, time 123.02ms
iter 95480: loss 6.8450, time 122.96ms
iter 95490: loss 7.4806, time 122.91ms
tensor(0.0245)
step 95500: train loss 6.1617, val loss 6.0941
saving checkpoint to out-shakespeare-char
iter 95500: loss 6.6628, time 2853.63ms
iter 95510: loss 6.3628, time 117.30ms
iter 95520: loss 6.2470, time 116.60ms
iter 95530: loss 7.0796, time 117.20ms
iter 95540: loss 6.5672, time 117.32ms
iter 95550: loss 6.8835, time 118.51ms
iter 95560: loss 7.2055, time 117.33ms
iter 95570: loss 6.7133, time 118.23ms
iter 95580: loss 7.4452, time 118.25ms
iter 95590: loss 7.1565, time 119.35ms
tensor(0.0351)
iter 95600: loss 6.3267, time 117.32ms
iter 95610: loss 6.0204, time 117.35ms
iter 95620: loss 6.2103, time 117.14ms
iter 95630: loss 7.1255, time 118.54ms
iter 95640: loss 7.4079, time 120.82ms
iter 95650: loss 7.1137, time 121.03ms
iter 95660: loss 7.7380, time 120.16ms
iter 95670: loss 6.6664, time 119.09ms
iter 95680: loss 6.7742, time 119.00ms
iter 95690: loss 6.7764, time 119.04ms
tensor(0.0476)
iter 95700: loss 6.8136, time 120.32ms
iter 95710: loss 6.3119, time 119.10ms
iter 95720: loss 7.3131, time 119.02ms
iter 95730: loss 7.5457, time 119.20ms
iter 95740: loss 6.8100, time 121.22ms
step 95750: train loss 6.1081, val loss 6.1740
saving checkpoint to out-shakespeare-char
iter 95750: loss 7.2129, time 2887.18ms
iter 95760: loss 6.8820, time 122.52ms
iter 95770: loss 7.3794, time 122.65ms
iter 95780: loss 7.0997, time 121.83ms
iter 95790: loss 7.1707, time 121.92ms
tensor(0.0618)
iter 95800: loss 6.4678, time 122.77ms
iter 95810: loss 7.5308, time 124.76ms
iter 95820: loss 6.8537, time 122.56ms
iter 95830: loss 6.1296, time 122.44ms
iter 95840: loss 6.5178, time 122.37ms
iter 95850: loss 6.8211, time 122.64ms
iter 95860: loss 6.3054, time 122.97ms
iter 95870: loss 6.7494, time 122.89ms
iter 95880: loss 6.8817, time 124.56ms
iter 95890: loss 6.5609, time 122.45ms
tensor(0.0778)
iter 95900: loss 6.4248, time 122.65ms
iter 95910: loss 6.2811, time 122.72ms
iter 95920: loss 7.0556, time 122.09ms
iter 95930: loss 6.1059, time 122.34ms
iter 95940: loss 6.6108, time 122.38ms
iter 95950: loss 7.4501, time 124.60ms
iter 95960: loss 7.2772, time 122.54ms
iter 95970: loss 6.5674, time 122.39ms
iter 95980: loss 7.3078, time 122.28ms
iter 95990: loss 6.6278, time 122.69ms
tensor(0.0955)
step 96000: train loss 6.1417, val loss 6.1888
saving checkpoint to out-shakespeare-char
iter 96000: loss 6.5466, time 2832.61ms
iter 96010: loss 6.4344, time 122.75ms
iter 96020: loss 7.2450, time 122.40ms
iter 96030: loss 6.6226, time 122.31ms
iter 96040: loss 6.5615, time 122.46ms
iter 96050: loss 6.3623, time 122.49ms
iter 96060: loss 7.1212, time 122.88ms
iter 96070: loss 5.6828, time 124.80ms
iter 96080: loss 6.8234, time 122.85ms
iter 96090: loss 6.4691, time 123.34ms
tensor(0.1147)
iter 96100: loss 6.6231, time 122.88ms
iter 96110: loss 6.3787, time 122.62ms
iter 96120: loss 6.5744, time 122.68ms
iter 96130: loss 6.8580, time 122.77ms
iter 96140: loss 6.1878, time 124.75ms
iter 96150: loss 6.7766, time 122.41ms
iter 96160: loss 6.4521, time 122.68ms
iter 96170: loss 6.3557, time 122.54ms
iter 96180: loss 6.9719, time 122.88ms
iter 96190: loss 6.8395, time 122.70ms
tensor(0.1355)
iter 96200: loss 6.9018, time 123.20ms
iter 96210: loss 5.9702, time 124.59ms
iter 96220: loss 6.9562, time 123.14ms
iter 96230: loss 6.9605, time 122.95ms
iter 96240: loss 6.8767, time 122.86ms
step 96250: train loss 6.1480, val loss 6.1492
saving checkpoint to out-shakespeare-char
iter 96250: loss 6.3831, time 2844.17ms
iter 96260: loss 7.2049, time 122.88ms
iter 96270: loss 7.4411, time 123.15ms
iter 96280: loss 6.6880, time 122.61ms
iter 96290: loss 7.1425, time 123.46ms
tensor(0.1577)
iter 96300: loss 6.8857, time 122.75ms
iter 96310: loss 6.7714, time 124.32ms
iter 96320: loss 7.2705, time 122.43ms
iter 96330: loss 6.5502, time 123.42ms
iter 96340: loss 6.9226, time 123.00ms
iter 96350: loss 6.8325, time 122.40ms
iter 96360: loss 5.7488, time 122.44ms
iter 96370: loss 6.5392, time 121.54ms
iter 96380: loss 7.1920, time 124.32ms
iter 96390: loss 5.9827, time 121.46ms
tensor(0.1813)
iter 96400: loss 6.4234, time 122.47ms
iter 96410: loss 6.4709, time 122.58ms
iter 96420: loss 5.4767, time 123.16ms
iter 96430: loss 6.2935, time 123.42ms
iter 96440: loss 6.7904, time 123.42ms
iter 96450: loss 7.0850, time 123.01ms
iter 96460: loss 6.6272, time 125.33ms
iter 96470: loss 6.7940, time 123.08ms
iter 96480: loss 6.1385, time 122.94ms
iter 96490: loss 6.8791, time 123.14ms
tensor(0.2061)
step 96500: train loss 6.2679, val loss 6.2106
saving checkpoint to out-shakespeare-char
iter 96500: loss 6.9875, time 2836.68ms
iter 96510: loss 7.1886, time 120.89ms
iter 96520: loss 6.3727, time 121.62ms
iter 96530: loss 6.6392, time 119.61ms
iter 96540: loss 6.4447, time 121.50ms
iter 96550: loss 6.7848, time 119.65ms
iter 96560: loss 7.4227, time 120.82ms
iter 96570: loss 7.3090, time 120.92ms
iter 96580: loss 6.6852, time 120.00ms
iter 96590: loss 6.5229, time 120.73ms
tensor(0.2321)
iter 96600: loss 6.8756, time 120.86ms
iter 96610: loss 7.4074, time 120.57ms
iter 96620: loss 6.5195, time 121.86ms
iter 96630: loss 7.6635, time 121.32ms
iter 96640: loss 6.8105, time 120.81ms
iter 96650: loss 6.8507, time 120.90ms
iter 96660: loss 7.6310, time 120.81ms
iter 96670: loss 6.4425, time 120.72ms
iter 96680: loss 6.5238, time 121.61ms
iter 96690: loss 6.9927, time 119.62ms
tensor(0.2591)
iter 96700: loss 7.0759, time 122.26ms
iter 96710: loss 7.1563, time 119.22ms
iter 96720: loss 7.3568, time 120.40ms
iter 96730: loss 6.7262, time 120.83ms
iter 96740: loss 7.3663, time 120.46ms
step 96750: train loss 6.2962, val loss 6.2749
saving checkpoint to out-shakespeare-char
iter 96750: loss 7.0760, time 2864.66ms
iter 96760: loss 6.6306, time 121.76ms
iter 96770: loss 6.3084, time 119.79ms
iter 96780: loss 7.9585, time 119.68ms
iter 96790: loss 6.4752, time 120.18ms
tensor(0.2871)
iter 96800: loss 6.9096, time 121.02ms
iter 96810: loss 6.4849, time 120.87ms
iter 96820: loss 6.5054, time 121.06ms
iter 96830: loss 6.8506, time 121.19ms
iter 96840: loss 7.0550, time 120.56ms
iter 96850: loss 6.8350, time 119.86ms
iter 96860: loss 6.7869, time 120.77ms
iter 96870: loss 7.4191, time 121.88ms
iter 96880: loss 7.1131, time 119.99ms
iter 96890: loss 6.7147, time 120.92ms
tensor(0.3159)
iter 96900: loss 6.4313, time 119.90ms
iter 96910: loss 6.9104, time 120.75ms
iter 96920: loss 6.2992, time 120.61ms
iter 96930: loss 7.1794, time 119.87ms
iter 96940: loss 6.8285, time 120.20ms
iter 96950: loss 7.1985, time 120.01ms
iter 96960: loss 7.3366, time 119.40ms
iter 96970: loss 7.0548, time 118.49ms
iter 96980: loss 7.2394, time 120.42ms
iter 96990: loss 6.7724, time 119.91ms
tensor(0.3455)
step 97000: train loss 6.3453, val loss 6.3344
saving checkpoint to out-shakespeare-char
iter 97000: loss 7.5641, time 2861.72ms
iter 97010: loss 6.8250, time 119.36ms
iter 97020: loss 7.0976, time 119.80ms
iter 97030: loss 7.4417, time 120.70ms
iter 97040: loss 6.7352, time 118.83ms
iter 97050: loss 7.0539, time 119.13ms
iter 97060: loss 6.6145, time 119.64ms
iter 97070: loss 7.8489, time 119.33ms
iter 97080: loss 6.7867, time 120.56ms
iter 97090: loss 7.0771, time 119.65ms
tensor(0.3757)
iter 97100: loss 6.3041, time 118.74ms
iter 97110: loss 6.5712, time 119.67ms
iter 97120: loss 6.9780, time 119.83ms
iter 97130: loss 7.4727, time 120.50ms
iter 97140: loss 7.0447, time 119.94ms
iter 97150: loss 7.1611, time 117.87ms
iter 97160: loss 7.1235, time 119.34ms
iter 97170: loss 6.9587, time 119.74ms
iter 97180: loss 7.5718, time 119.79ms
iter 97190: loss 6.3420, time 117.89ms
tensor(0.4063)
iter 97200: loss 6.3614, time 119.41ms
iter 97210: loss 7.3980, time 122.00ms
iter 97220: loss 7.2687, time 121.12ms
iter 97230: loss 6.8106, time 123.13ms
iter 97240: loss 6.6333, time 124.05ms
step 97250: train loss 6.4674, val loss 6.4290
saving checkpoint to out-shakespeare-char
iter 97250: loss 6.5613, time 2846.91ms
iter 97260: loss 7.0897, time 123.95ms
iter 97270: loss 7.0574, time 123.05ms
iter 97280: loss 7.4848, time 122.96ms
iter 97290: loss 6.9336, time 122.62ms
tensor(0.4373)
iter 97300: loss 6.9440, time 122.99ms
iter 97310: loss 7.2178, time 123.89ms
iter 97320: loss 6.7222, time 125.29ms
iter 97330: loss 7.1151, time 122.46ms
iter 97340: loss 7.1680, time 122.06ms
iter 97350: loss 7.3036, time 122.65ms
iter 97360: loss 7.8517, time 123.26ms
iter 97370: loss 7.2456, time 123.29ms
iter 97380: loss 6.9783, time 123.10ms
iter 97390: loss 7.4782, time 125.26ms
tensor(0.4686)
iter 97400: loss 6.9416, time 125.06ms
iter 97410: loss 7.5402, time 121.65ms
iter 97420: loss 6.0382, time 123.03ms
iter 97430: loss 6.8473, time 123.18ms
iter 97440: loss 6.3570, time 122.76ms
iter 97450: loss 7.0187, time 122.73ms
iter 97460: loss 7.1867, time 122.76ms
iter 97470: loss 6.9630, time 124.54ms
iter 97480: loss 7.4251, time 121.93ms
iter 97490: loss 7.4131, time 121.74ms
tensor(0.5000)
step 97500: train loss 6.4993, val loss 6.5259
saving checkpoint to out-shakespeare-char
iter 97500: loss 6.6123, time 2862.19ms
iter 97510: loss 6.7613, time 122.81ms
iter 97520: loss 6.6888, time 122.85ms
iter 97530: loss 6.5224, time 121.89ms
iter 97540: loss 7.1450, time 125.32ms
iter 97550: loss 6.6153, time 122.94ms
iter 97560: loss 7.6705, time 122.24ms
iter 97570: loss 6.8779, time 122.94ms
iter 97580: loss 7.7118, time 122.97ms
iter 97590: loss 7.2703, time 123.22ms
tensor(0.5314)
iter 97600: loss 6.9703, time 123.00ms
iter 97610: loss 6.1415, time 125.00ms
iter 97620: loss 6.4222, time 122.79ms
iter 97630: loss 6.8615, time 122.83ms
iter 97640: loss 7.6888, time 122.95ms
iter 97650: loss 7.3624, time 122.98ms
iter 97660: loss 6.6586, time 123.53ms
iter 97670: loss 7.0406, time 122.00ms
iter 97680: loss 7.2311, time 125.30ms
iter 97690: loss 6.9863, time 121.43ms
tensor(0.5627)
iter 97700: loss 6.8263, time 122.20ms
iter 97710: loss 6.9933, time 123.14ms
iter 97720: loss 7.2280, time 122.90ms
iter 97730: loss 6.9599, time 123.02ms
iter 97740: loss 7.3173, time 123.22ms
step 97750: train loss 6.5764, val loss 6.6051
saving checkpoint to out-shakespeare-char
iter 97750: loss 7.2172, time 2865.83ms
iter 97760: loss 7.0600, time 122.76ms
iter 97770: loss 6.8948, time 120.10ms
iter 97780: loss 6.9343, time 121.77ms
iter 97790: loss 6.8911, time 122.66ms
tensor(0.5937)
iter 97800: loss 6.7913, time 122.48ms
iter 97810: loss 7.6758, time 124.41ms
iter 97820: loss 7.2425, time 123.30ms
iter 97830: loss 7.2842, time 122.63ms
iter 97840: loss 6.9857, time 121.72ms
iter 97850: loss 7.7313, time 122.52ms
iter 97860: loss 7.0703, time 122.60ms
iter 97870: loss 7.2467, time 122.58ms
iter 97880: loss 7.8534, time 122.73ms
iter 97890: loss 7.4234, time 122.57ms
tensor(0.6243)
iter 97900: loss 7.0532, time 121.76ms
iter 97910: loss 7.4705, time 122.57ms
iter 97920: loss 7.0527, time 122.67ms
iter 97930: loss 7.6068, time 121.47ms
iter 97940: loss 7.1213, time 124.83ms
iter 97950: loss 7.5506, time 121.81ms
iter 97960: loss 7.1744, time 122.51ms
iter 97970: loss 7.1950, time 122.51ms
iter 97980: loss 6.8406, time 122.75ms
iter 97990: loss 6.9265, time 122.91ms
tensor(0.6545)
step 98000: train loss 6.5783, val loss 6.6012
saving checkpoint to out-shakespeare-char
iter 98000: loss 7.1778, time 2873.97ms
iter 98010: loss 7.4945, time 118.91ms
iter 98020: loss 6.9668, time 119.45ms
iter 98030: loss 7.4900, time 118.31ms
iter 98040: loss 8.1043, time 119.56ms
iter 98050: loss 7.9016, time 119.17ms
iter 98060: loss 7.4430, time 118.85ms
iter 98070: loss 7.4640, time 121.50ms
iter 98080: loss 6.7451, time 118.23ms
iter 98090: loss 7.2541, time 122.46ms
tensor(0.6841)
iter 98100: loss 7.0513, time 119.31ms
iter 98110: loss 6.9776, time 120.73ms
iter 98120: loss 6.2449, time 119.35ms
iter 98130: loss 6.6017, time 118.58ms
iter 98140: loss 7.8347, time 119.22ms
iter 98150: loss 7.1395, time 121.07ms
iter 98160: loss 6.7148, time 120.14ms
iter 98170: loss 7.1522, time 120.91ms
iter 98180: loss 7.0169, time 119.27ms
iter 98190: loss 7.5497, time 121.00ms
tensor(0.7129)
iter 98200: loss 7.4499, time 119.53ms
iter 98210: loss 7.1853, time 120.85ms
iter 98220: loss 7.3093, time 119.16ms
iter 98230: loss 7.0802, time 119.47ms
iter 98240: loss 7.0826, time 120.22ms
step 98250: train loss 6.6618, val loss 6.6035
saving checkpoint to out-shakespeare-char
iter 98250: loss 6.9791, time 2845.35ms
iter 98260: loss 7.5814, time 119.53ms
iter 98270: loss 7.9013, time 119.53ms
iter 98280: loss 7.4000, time 120.67ms
iter 98290: loss 6.5224, time 120.63ms
tensor(0.7409)
iter 98300: loss 8.2972, time 121.27ms
iter 98310: loss 7.2079, time 119.53ms
iter 98320: loss 7.2310, time 120.88ms
iter 98330: loss 6.6434, time 120.60ms
iter 98340: loss 7.1188, time 120.76ms
iter 98350: loss 7.2396, time 119.57ms
iter 98360: loss 7.1209, time 118.41ms
iter 98370: loss 6.9456, time 119.43ms
iter 98380: loss 7.1934, time 119.50ms
iter 98390: loss 7.5139, time 120.98ms
tensor(0.7679)
iter 98400: loss 7.3725, time 120.96ms
iter 98410: loss 7.4314, time 121.51ms
iter 98420: loss 7.4881, time 119.44ms
iter 98430: loss 7.4129, time 120.71ms
iter 98440: loss 7.2650, time 120.95ms
iter 98450: loss 7.3373, time 121.61ms
iter 98460: loss 6.8424, time 120.17ms
iter 98470: loss 7.5228, time 119.39ms
iter 98480: loss 7.2595, time 119.51ms
iter 98490: loss 7.8872, time 119.57ms
tensor(0.7939)
step 98500: train loss 6.6661, val loss 6.6875
saving checkpoint to out-shakespeare-char
iter 98500: loss 7.4358, time 2868.63ms
iter 98510: loss 7.4163, time 123.15ms
iter 98520: loss 7.7855, time 121.78ms
iter 98530: loss 7.3557, time 122.95ms
iter 98540: loss 7.1857, time 121.77ms
iter 98550: loss 7.3188, time 122.33ms
iter 98560: loss 7.0567, time 124.37ms
iter 98570: loss 7.1582, time 122.28ms
iter 98580: loss 7.1406, time 122.29ms
iter 98590: loss 6.8733, time 123.20ms
tensor(0.8187)
iter 98600: loss 7.5617, time 122.34ms
iter 98610: loss 7.4432, time 122.33ms
iter 98620: loss 7.3439, time 122.70ms
iter 98630: loss 6.9153, time 125.00ms
iter 98640: loss 7.7402, time 121.93ms
iter 98650: loss 7.1432, time 121.91ms
iter 98660: loss 6.7254, time 121.19ms
iter 98670: loss 7.4640, time 122.43ms
iter 98680: loss 6.4458, time 122.51ms
iter 98690: loss 6.9906, time 123.56ms
tensor(0.8423)
iter 98700: loss 8.2450, time 125.12ms
iter 98710: loss 7.2339, time 123.04ms
iter 98720: loss 8.2839, time 123.15ms
iter 98730: loss 7.8414, time 122.85ms
iter 98740: loss 7.7043, time 121.01ms
step 98750: train loss 6.7453, val loss 6.8077
saving checkpoint to out-shakespeare-char
iter 98750: loss 7.8468, time 2855.40ms
iter 98760: loss 6.6619, time 121.56ms
iter 98770: loss 8.5247, time 122.49ms
iter 98780: loss 7.8249, time 122.49ms
iter 98790: loss 7.2975, time 122.69ms
tensor(0.8645)
iter 98800: loss 7.5920, time 124.20ms
iter 98810: loss 7.9290, time 124.53ms
iter 98820: loss 7.1743, time 122.18ms
iter 98830: loss 7.3059, time 122.07ms
iter 98840: loss 7.1949, time 122.33ms
iter 98850: loss 8.2055, time 122.56ms
iter 98860: loss 6.8029, time 122.49ms
iter 98870: loss 7.8178, time 121.35ms
iter 98880: loss 6.5405, time 122.94ms
iter 98890: loss 7.3514, time 123.23ms
tensor(0.8853)
iter 98900: loss 7.0373, time 123.86ms
iter 98910: loss 6.5252, time 123.49ms
iter 98920: loss 7.6843, time 123.18ms
iter 98930: loss 7.0692, time 123.34ms
iter 98940: loss 7.6453, time 125.53ms
iter 98950: loss 7.1573, time 123.37ms
iter 98960: loss 8.3929, time 123.13ms
iter 98970: loss 7.4964, time 123.03ms
iter 98980: loss 7.6123, time 122.79ms
iter 98990: loss 7.3637, time 122.87ms
tensor(0.9045)
step 99000: train loss 6.6799, val loss 6.6562
saving checkpoint to out-shakespeare-char
iter 99000: loss 7.4698, time 2838.32ms
iter 99010: loss 7.7222, time 120.21ms
iter 99020: loss 7.7460, time 119.10ms
iter 99030: loss 7.0740, time 119.31ms
iter 99040: loss 7.8622, time 120.34ms
iter 99050: loss 7.5628, time 119.12ms
iter 99060: loss 7.1393, time 119.03ms
iter 99070: loss 7.8461, time 119.19ms
iter 99080: loss 6.8431, time 119.29ms
iter 99090: loss 7.4968, time 120.45ms
tensor(0.9222)
iter 99100: loss 7.8371, time 120.50ms
iter 99110: loss 8.0424, time 120.35ms
iter 99120: loss 8.0074, time 120.42ms
iter 99130: loss 7.7969, time 121.24ms
iter 99140: loss 7.1709, time 120.43ms
iter 99150: loss 7.4588, time 120.27ms
iter 99160: loss 8.1199, time 119.20ms
iter 99170: loss 7.7151, time 121.22ms
iter 99180: loss 7.7900, time 120.58ms
iter 99190: loss 7.7434, time 120.27ms
tensor(0.9382)
iter 99200: loss 7.6094, time 120.50ms
iter 99210: loss 7.7499, time 120.27ms
iter 99220: loss 7.9751, time 120.00ms
iter 99230: loss 7.1680, time 119.21ms
iter 99240: loss 7.7574, time 120.23ms
step 99250: train loss 6.7872, val loss 6.6910
saving checkpoint to out-shakespeare-char
iter 99250: loss 7.5569, time 2858.55ms
iter 99260: loss 7.5318, time 125.06ms
iter 99270: loss 7.2452, time 123.09ms
iter 99280: loss 7.0779, time 123.04ms
iter 99290: loss 7.4770, time 122.73ms
tensor(0.9524)
iter 99300: loss 7.0517, time 123.12ms
iter 99310: loss 7.3493, time 123.07ms
iter 99320: loss 7.7899, time 123.09ms
iter 99330: loss 8.4697, time 125.30ms
iter 99340: loss 8.0904, time 123.14ms
iter 99350: loss 8.2432, time 122.94ms
iter 99360: loss 7.5359, time 121.53ms
iter 99370: loss 6.7269, time 122.91ms
iter 99380: loss 8.4049, time 122.19ms
iter 99390: loss 7.3587, time 123.26ms
tensor(0.9649)
iter 99400: loss 6.6842, time 125.15ms
iter 99410: loss 7.5217, time 122.82ms
iter 99420: loss 7.6195, time 123.11ms
iter 99430: loss 8.0836, time 123.42ms
iter 99440: loss 7.4717, time 123.53ms
iter 99450: loss 7.4158, time 123.11ms
iter 99460: loss 6.9641, time 123.33ms
iter 99470: loss 7.3638, time 125.03ms
iter 99480: loss 7.0494, time 123.00ms
iter 99490: loss 7.7451, time 122.29ms
tensor(0.9755)
step 99500: train loss 6.7708, val loss 6.8504
saving checkpoint to out-shakespeare-char
iter 99500: loss 7.3707, time 2875.23ms
iter 99510: loss 7.0208, time 125.14ms
iter 99520: loss 7.1817, time 121.95ms
iter 99530: loss 7.7363, time 123.00ms
iter 99540: loss 6.9308, time 122.79ms
iter 99550: loss 7.2051, time 123.37ms
iter 99560: loss 6.9447, time 122.67ms
iter 99570: loss 6.9013, time 122.92ms
iter 99580: loss 7.8863, time 117.28ms
iter 99590: loss 8.0626, time 120.45ms
tensor(0.9843)
iter 99600: loss 7.5009, time 119.55ms
iter 99610: loss 7.0810, time 121.59ms
iter 99620: loss 7.1046, time 119.45ms
iter 99630: loss 8.1250, time 120.99ms
iter 99640: loss 7.7727, time 119.86ms
iter 99650: loss 7.6846, time 120.41ms
iter 99660: loss 7.8046, time 120.36ms
iter 99670: loss 7.8332, time 121.34ms
iter 99680: loss 7.4875, time 118.22ms
iter 99690: loss 7.7732, time 120.41ms
tensor(0.9911)
iter 99700: loss 7.1294, time 119.21ms
iter 99710: loss 7.3285, time 119.29ms
iter 99720: loss 7.2154, time 120.43ms
iter 99730: loss 7.7851, time 119.66ms
iter 99740: loss 7.9296, time 120.65ms
step 99750: train loss 6.7883, val loss 6.8386
saving checkpoint to out-shakespeare-char
iter 99750: loss 7.0909, time 2864.54ms
iter 99760: loss 8.1359, time 121.00ms
iter 99770: loss 7.4928, time 120.70ms
iter 99780: loss 7.6627, time 119.51ms
iter 99790: loss 7.8363, time 120.34ms
tensor(0.9961)
iter 99800: loss 7.9841, time 119.79ms
iter 99810: loss 7.7971, time 119.98ms
iter 99820: loss 7.5035, time 119.69ms
iter 99830: loss 7.7432, time 120.83ms
iter 99840: loss 7.7383, time 121.78ms
iter 99850: loss 7.7700, time 120.53ms
iter 99860: loss 7.5283, time 121.90ms
iter 99870: loss 8.0299, time 119.53ms
iter 99880: loss 8.3516, time 119.57ms
iter 99890: loss 7.2473, time 119.90ms
tensor(0.9990)
iter 99900: loss 7.1316, time 119.84ms
iter 99910: loss 7.0561, time 121.08ms
iter 99920: loss 7.9714, time 120.74ms
iter 99930: loss 8.0365, time 121.67ms
iter 99940: loss 7.3359, time 119.71ms
iter 99950: loss 7.3519, time 121.45ms
iter 99960: loss 7.8534, time 119.19ms
iter 99970: loss 7.5228, time 119.41ms
iter 99980: loss 7.8175, time 119.92ms
iter 99990: loss 7.6282, time 119.62ms
tensor(1.)
step 100000: train loss 6.7156, val loss 6.7619
saving checkpoint to out-shakespeare-char
iter 100000: loss 6.8839, time 2855.85ms
iter 100010: loss 7.8322, time 119.29ms
iter 100020: loss 8.1419, time 119.18ms
iter 100030: loss 6.9403, time 119.29ms
iter 100040: loss 7.3136, time 120.46ms
iter 100050: loss 7.5893, time 118.34ms
iter 100060: loss 7.0401, time 119.13ms
iter 100070: loss 7.0353, time 118.97ms
iter 100080: loss 7.3006, time 121.45ms
iter 100090: loss 7.8700, time 120.32ms
tensor(0.9990)
iter 100100: loss 6.8729, time 121.33ms
iter 100110: loss 8.1519, time 119.30ms
iter 100120: loss 7.4784, time 119.24ms
iter 100130: loss 7.5798, time 119.43ms
iter 100140: loss 7.3267, time 119.00ms
iter 100150: loss 7.8685, time 120.44ms
iter 100160: loss 7.6395, time 119.22ms
iter 100170: loss 7.6017, time 121.30ms
iter 100180: loss 7.5122, time 120.43ms
iter 100190: loss 8.2923, time 121.12ms
tensor(0.9961)
iter 100200: loss 7.9093, time 119.22ms
iter 100210: loss 7.7716, time 120.66ms
iter 100220: loss 7.8549, time 119.06ms
iter 100230: loss 6.6198, time 118.27ms
iter 100240: loss 7.3948, time 119.00ms
step 100250: train loss 6.7843, val loss 6.7825
saving checkpoint to out-shakespeare-char
iter 100250: loss 7.2235, time 2865.53ms
iter 100260: loss 7.9212, time 123.97ms
iter 100270: loss 7.7925, time 123.09ms
iter 100280: loss 7.3821, time 122.35ms
iter 100290: loss 6.5876, time 122.41ms
tensor(0.9911)
iter 100300: loss 8.4691, time 122.71ms
iter 100310: loss 7.7295, time 121.76ms
iter 100320: loss 6.8272, time 122.84ms
iter 100330: loss 8.1031, time 124.85ms
iter 100340: loss 7.7376, time 121.73ms
iter 100350: loss 8.0150, time 123.30ms
iter 100360: loss 6.9706, time 121.88ms
iter 100370: loss 8.4434, time 122.48ms
iter 100380: loss 7.0592, time 122.38ms
iter 100390: loss 6.9455, time 122.52ms
tensor(0.9843)
iter 100400: loss 7.6453, time 123.42ms
iter 100410: loss 7.8276, time 124.67ms
iter 100420: loss 7.7673, time 122.87ms
iter 100430: loss 7.7945, time 122.29ms
iter 100440: loss 7.2287, time 122.18ms
iter 100450: loss 7.7621, time 122.23ms
iter 100460: loss 7.2354, time 122.39ms
iter 100470: loss 6.8651, time 122.17ms
iter 100480: loss 7.3031, time 124.36ms
iter 100490: loss 7.4605, time 123.01ms
tensor(0.9755)
step 100500: train loss 6.8391, val loss 6.7640
saving checkpoint to out-shakespeare-char
iter 100500: loss 6.9559, time 2853.05ms
iter 100510: loss 7.2948, time 123.69ms
iter 100520: loss 7.3994, time 125.05ms
iter 100530: loss 6.7853, time 122.80ms
iter 100540: loss 7.1480, time 122.45ms
iter 100550: loss 7.2144, time 122.63ms
iter 100560: loss 7.4468, time 121.82ms
iter 100570: loss 8.0400, time 122.73ms
iter 100580: loss 7.9043, time 122.69ms
iter 100590: loss 7.6978, time 125.42ms
tensor(0.9649)
iter 100600: loss 6.4866, time 122.60ms
iter 100610: loss 6.7896, time 123.47ms
iter 100620: loss 7.4255, time 122.67ms
iter 100630: loss 8.1803, time 122.45ms
iter 100640: loss 7.0147, time 122.78ms
iter 100650: loss 7.8078, time 122.66ms
iter 100660: loss 7.4711, time 124.13ms
iter 100670: loss 7.6078, time 122.15ms
iter 100680: loss 7.5398, time 121.90ms
iter 100690: loss 7.4150, time 123.56ms
tensor(0.9524)
iter 100700: loss 7.7502, time 123.05ms
iter 100710: loss 7.5990, time 122.94ms
iter 100720: loss 8.3783, time 123.00ms
iter 100730: loss 7.9400, time 123.47ms
iter 100740: loss 7.4432, time 125.30ms
step 100750: train loss 6.7353, val loss 6.7676
saving checkpoint to out-shakespeare-char
iter 100750: loss 7.1852, time 2882.43ms
iter 100760: loss 8.3409, time 122.98ms
iter 100770: loss 7.0783, time 123.28ms
iter 100780: loss 7.8636, time 122.92ms
iter 100790: loss 8.6863, time 122.73ms
tensor(0.9382)
iter 100800: loss 8.5194, time 124.02ms
iter 100810: loss 7.6512, time 122.78ms
iter 100820: loss 7.3068, time 122.27ms
iter 100830: loss 7.2342, time 122.80ms
iter 100840: loss 7.3455, time 123.15ms
iter 100850: loss 6.9584, time 122.36ms
iter 100860: loss 7.1309, time 122.23ms
iter 100870: loss 7.6184, time 123.16ms
iter 100880: loss 7.3324, time 122.40ms
iter 100890: loss 7.8300, time 123.00ms
tensor(0.9222)
iter 100900: loss 6.9479, time 122.89ms
iter 100910: loss 7.2652, time 122.33ms
iter 100920: loss 8.3988, time 122.64ms
iter 100930: loss 7.5983, time 121.83ms
iter 100940: loss 7.8268, time 122.64ms
iter 100950: loss 8.0165, time 121.87ms
iter 100960: loss 7.4698, time 122.64ms
iter 100970: loss 7.2704, time 126.04ms
iter 100980: loss 7.2841, time 122.63ms
iter 100990: loss 7.4837, time 122.48ms
tensor(0.9045)
step 101000: train loss 6.7542, val loss 6.7788
saving checkpoint to out-shakespeare-char
iter 101000: loss 6.9482, time 2864.35ms
iter 101010: loss 7.6737, time 120.08ms
iter 101020: loss 8.0038, time 121.76ms
iter 101030: loss 6.6925, time 119.86ms
iter 101040: loss 7.1633, time 119.34ms
iter 101050: loss 8.2683, time 119.21ms
iter 101060: loss 7.6433, time 119.85ms
iter 101070: loss 7.0182, time 121.41ms
iter 101080: loss 7.7434, time 119.86ms
iter 101090: loss 6.9888, time 121.38ms
tensor(0.8853)
iter 101100: loss 6.3689, time 119.70ms
iter 101110: loss 7.2438, time 120.05ms
iter 101120: loss 6.6115, time 121.15ms
iter 101130: loss 6.6223, time 119.74ms
iter 101140: loss 7.2991, time 120.01ms
iter 101150: loss 7.3290, time 119.43ms
iter 101160: loss 7.8272, time 120.18ms
iter 101170: loss 7.7296, time 119.42ms
iter 101180: loss 8.1148, time 120.90ms
iter 101190: loss 7.9382, time 119.50ms
tensor(0.8645)
iter 101200: loss 7.7459, time 119.25ms
iter 101210: loss 7.3287, time 119.22ms
iter 101220: loss 6.8743, time 120.74ms
iter 101230: loss 7.2739, time 121.69ms
iter 101240: loss 8.3289, time 119.46ms
step 101250: train loss 6.7457, val loss 6.6897
saving checkpoint to out-shakespeare-char
iter 101250: loss 7.7844, time 2866.97ms
iter 101260: loss 7.7339, time 118.81ms
iter 101270: loss 7.5881, time 121.11ms
iter 101280: loss 8.0564, time 119.19ms
iter 101290: loss 7.4346, time 118.52ms
tensor(0.8423)
iter 101300: loss 7.8187, time 119.22ms
iter 101310: loss 7.8887, time 119.09ms
iter 101320: loss 7.2334, time 120.27ms
iter 101330: loss 7.2542, time 119.02ms
iter 101340: loss 6.8044, time 119.40ms
iter 101350: loss 7.5943, time 119.09ms
iter 101360: loss 7.3343, time 120.68ms
iter 101370: loss 7.3034, time 119.49ms
iter 101380: loss 7.6160, time 120.62ms
iter 101390: loss 6.9419, time 119.85ms
tensor(0.8187)
iter 101400: loss 6.8122, time 120.27ms
iter 101410: loss 7.3364, time 120.39ms
iter 101420: loss 8.2862, time 121.08ms
iter 101430: loss 7.6177, time 119.14ms
iter 101440: loss 7.9932, time 121.06ms
iter 101450: loss 6.7936, time 119.18ms
iter 101460: loss 7.6437, time 120.53ms
iter 101470: loss 6.6775, time 119.76ms
iter 101480: loss 7.9344, time 119.04ms
iter 101490: loss 7.9742, time 119.14ms
tensor(0.7939)
step 101500: train loss 6.7427, val loss 6.7151
saving checkpoint to out-shakespeare-char
iter 101500: loss 7.6492, time 2850.97ms
iter 101510: loss 7.7194, time 120.38ms
iter 101520: loss 6.6103, time 118.97ms
iter 101530: loss 7.2393, time 120.19ms
iter 101540: loss 7.6874, time 118.86ms
iter 101550: loss 6.7570, time 120.23ms
iter 101560: loss 7.5594, time 119.98ms
iter 101570: loss 6.9796, time 119.03ms
iter 101580: loss 7.5865, time 119.09ms
iter 101590: loss 7.6462, time 121.11ms
tensor(0.7679)
iter 101600: loss 7.8772, time 118.92ms
iter 101610: loss 8.0576, time 120.46ms
iter 101620: loss 7.0253, time 119.32ms
iter 101630: loss 7.9753, time 121.22ms
iter 101640: loss 7.2139, time 119.98ms
iter 101650: loss 7.0368, time 120.99ms
iter 101660: loss 7.6450, time 118.65ms
iter 101670: loss 7.4189, time 120.70ms
iter 101680: loss 7.5633, time 119.76ms
iter 101690: loss 7.3249, time 119.45ms
tensor(0.7409)
iter 101700: loss 7.7499, time 120.79ms
iter 101710: loss 7.4302, time 121.26ms
iter 101720: loss 7.0288, time 119.62ms
iter 101730: loss 7.5209, time 120.14ms
iter 101740: loss 7.0367, time 120.37ms
step 101750: train loss 6.7423, val loss 6.6709
saving checkpoint to out-shakespeare-char
iter 101750: loss 7.5346, time 2858.15ms
iter 101760: loss 8.2033, time 120.43ms
iter 101770: loss 7.4694, time 119.08ms
iter 101780: loss 7.1047, time 119.37ms
iter 101790: loss 7.5079, time 121.11ms
tensor(0.7129)
iter 101800: loss 7.0452, time 119.41ms
iter 101810: loss 7.5353, time 119.35ms
iter 101820: loss 8.0179, time 120.32ms
iter 101830: loss 7.3522, time 119.17ms
iter 101840: loss 7.1702, time 120.42ms
iter 101850: loss 7.1581, time 119.56ms
iter 101860: loss 7.8938, time 119.84ms
iter 101870: loss 7.6015, time 120.37ms
iter 101880: loss 7.0958, time 120.93ms
iter 101890: loss 7.4344, time 120.31ms
tensor(0.6841)
iter 101900: loss 6.8610, time 120.53ms
iter 101910: loss 6.9714, time 120.42ms
iter 101920: loss 8.2534, time 120.94ms
iter 101930: loss 7.3231, time 119.03ms
iter 101940: loss 6.4625, time 118.83ms
iter 101950: loss 7.6140, time 119.16ms
iter 101960: loss 7.5193, time 118.94ms
iter 101970: loss 7.5298, time 118.59ms
iter 101980: loss 8.0118, time 120.18ms
iter 101990: loss 7.4468, time 120.49ms
tensor(0.6545)
step 102000: train loss 6.6032, val loss 6.6089
saving checkpoint to out-shakespeare-char
iter 102000: loss 7.1037, time 2858.27ms
iter 102010: loss 7.5189, time 120.67ms
iter 102020: loss 7.6905, time 121.03ms
iter 102030: loss 8.0642, time 120.67ms
iter 102040: loss 7.4903, time 120.78ms
iter 102050: loss 7.4317, time 120.71ms
iter 102060: loss 6.2627, time 120.62ms
iter 102070: loss 7.1728, time 119.58ms
iter 102080: loss 7.3158, time 121.17ms
iter 102090: loss 7.2380, time 121.86ms
tensor(0.6243)
iter 102100: loss 7.5201, time 120.22ms
iter 102110: loss 7.5444, time 119.71ms
iter 102120: loss 7.6883, time 119.62ms
iter 102130: loss 6.6814, time 119.57ms
iter 102140: loss 7.2331, time 120.06ms
iter 102150: loss 7.4471, time 119.80ms
iter 102160: loss 7.0639, time 120.48ms
iter 102170: loss 7.3240, time 121.02ms
iter 102180: loss 6.9891, time 120.63ms
iter 102190: loss 6.9080, time 119.28ms
tensor(0.5937)
iter 102200: loss 7.1645, time 118.65ms
iter 102210: loss 8.0544, time 120.61ms
iter 102220: loss 7.1021, time 118.83ms
iter 102230: loss 7.2476, time 118.90ms
iter 102240: loss 7.7669, time 120.21ms
step 102250: train loss 6.5509, val loss 6.6180
saving checkpoint to out-shakespeare-char
iter 102250: loss 7.3281, time 2880.27ms
iter 102260: loss 8.1531, time 125.33ms
iter 102270: loss 7.5336, time 122.89ms
iter 102280: loss 7.5658, time 123.36ms
iter 102290: loss 6.7616, time 122.91ms
tensor(0.5627)
iter 102300: loss 7.0108, time 123.15ms
iter 102310: loss 7.6123, time 122.59ms
iter 102320: loss 7.3195, time 122.66ms
iter 102330: loss 7.8901, time 122.91ms
iter 102340: loss 7.1755, time 122.70ms
iter 102350: loss 7.5056, time 123.74ms
iter 102360: loss 6.9824, time 121.97ms
iter 102370: loss 7.6181, time 121.75ms
iter 102380: loss 6.9376, time 122.91ms
iter 102390: loss 7.6712, time 122.69ms
tensor(0.5314)
iter 102400: loss 7.5501, time 122.70ms
iter 102410: loss 6.6345, time 123.42ms
iter 102420: loss 6.5586, time 122.05ms
iter 102430: loss 7.2071, time 121.79ms
iter 102440: loss 7.0966, time 124.11ms
iter 102450: loss 6.4171, time 122.66ms
iter 102460: loss 6.8664, time 122.93ms
iter 102470: loss 7.1616, time 122.70ms
iter 102480: loss 7.5040, time 124.07ms
iter 102490: loss 6.6507, time 123.14ms
tensor(0.5000)
step 102500: train loss 6.5848, val loss 6.6032
saving checkpoint to out-shakespeare-char
iter 102500: loss 7.1402, time 2855.20ms
iter 102510: loss 6.9425, time 123.12ms
iter 102520: loss 7.7593, time 123.85ms
iter 102530: loss 7.0761, time 123.53ms
iter 102540: loss 7.3918, time 125.35ms
iter 102550: loss 6.8010, time 123.16ms
iter 102560: loss 6.9788, time 123.33ms
iter 102570: loss 7.1529, time 123.37ms
iter 102580: loss 7.1648, time 123.21ms
iter 102590: loss 7.6384, time 122.74ms
tensor(0.4686)
iter 102600: loss 6.9311, time 124.30ms
iter 102610: loss 6.6738, time 122.30ms
iter 102620: loss 7.2234, time 122.49ms
iter 102630: loss 7.0079, time 122.43ms
iter 102640: loss 6.8406, time 121.60ms
iter 102650: loss 6.6603, time 122.41ms
iter 102660: loss 6.6348, time 122.41ms
iter 102670: loss 7.6512, time 122.23ms
iter 102680: loss 7.1711, time 124.73ms
iter 102690: loss 7.1875, time 121.95ms
tensor(0.4373)
iter 102700: loss 7.2479, time 122.34ms
iter 102710: loss 7.2516, time 122.44ms
iter 102720: loss 6.6534, time 123.64ms
iter 102730: loss 7.0334, time 122.56ms
iter 102740: loss 7.5677, time 122.68ms
step 102750: train loss 6.5341, val loss 6.5047
saving checkpoint to out-shakespeare-char
iter 102750: loss 7.8404, time 2865.25ms
iter 102760: loss 7.5568, time 122.51ms
iter 102770: loss 7.2924, time 120.90ms
iter 102780: loss 6.6690, time 122.17ms
iter 102790: loss 6.6176, time 122.61ms
tensor(0.4063)
iter 102800: loss 7.2585, time 122.69ms
iter 102810: loss 6.8527, time 122.14ms
iter 102820: loss 7.2793, time 119.19ms
iter 102830: loss 7.0309, time 120.80ms
iter 102840: loss 6.9479, time 118.97ms
iter 102850: loss 7.5853, time 120.41ms
iter 102860: loss 8.4555, time 119.75ms
iter 102870: loss 7.6532, time 118.80ms
iter 102880: loss 7.0585, time 119.18ms
iter 102890: loss 7.1279, time 119.47ms
tensor(0.3757)
iter 102900: loss 6.4751, time 121.18ms
iter 102910: loss 6.5273, time 119.57ms
iter 102920: loss 6.5076, time 121.52ms
iter 102930: loss 7.3280, time 119.76ms
iter 102940: loss 6.6919, time 119.65ms
iter 102950: loss 7.3697, time 120.65ms
iter 102960: loss 7.2290, time 120.96ms
iter 102970: loss 7.3364, time 120.83ms
iter 102980: loss 7.0589, time 121.65ms
iter 102990: loss 6.7908, time 120.66ms
tensor(0.3455)
step 103000: train loss 6.4370, val loss 6.5010
saving checkpoint to out-shakespeare-char
iter 103000: loss 7.0172, time 2860.80ms
iter 103010: loss 7.2303, time 122.72ms
iter 103020: loss 8.3687, time 122.73ms
iter 103030: loss 7.3115, time 123.99ms
iter 103040: loss 6.6319, time 121.08ms
iter 103050: loss 7.2607, time 121.89ms
iter 103060: loss 7.1308, time 122.73ms
iter 103070: loss 7.4734, time 122.73ms
iter 103080: loss 7.0938, time 122.22ms
iter 103090: loss 7.0346, time 122.90ms
tensor(0.3159)
iter 103100: loss 6.5523, time 124.46ms
iter 103110: loss 7.0541, time 122.81ms
iter 103120: loss 7.7112, time 122.46ms
iter 103130: loss 6.5062, time 122.55ms
iter 103140: loss 7.4222, time 122.38ms
iter 103150: loss 6.5449, time 122.34ms
iter 103160: loss 7.0687, time 122.33ms
iter 103170: loss 7.1035, time 124.66ms
iter 103180: loss 6.8391, time 122.13ms
iter 103190: loss 7.2644, time 123.15ms
tensor(0.2871)
iter 103200: loss 7.8944, time 119.99ms
iter 103210: loss 6.5845, time 120.76ms
iter 103220: loss 7.1160, time 120.48ms
iter 103230: loss 7.0496, time 120.31ms
iter 103240: loss 6.3059, time 118.64ms
step 103250: train loss 6.5024, val loss 6.4710
saving checkpoint to out-shakespeare-char
iter 103250: loss 7.2663, time 2846.84ms
iter 103260: loss 6.8996, time 123.02ms
iter 103270: loss 6.9523, time 123.12ms
iter 103280: loss 6.6421, time 125.02ms
iter 103290: loss 6.1586, time 122.47ms
tensor(0.2591)
iter 103300: loss 7.3200, time 122.60ms
iter 103310: loss 6.9718, time 122.51ms
iter 103320: loss 7.1612, time 122.94ms
iter 103330: loss 7.0702, time 122.66ms
iter 103340: loss 6.7219, time 122.54ms
iter 103350: loss 6.8762, time 123.15ms
iter 103360: loss 7.4374, time 124.73ms
iter 103370: loss 6.3386, time 122.69ms
iter 103380: loss 7.2101, time 122.91ms
iter 103390: loss 6.6392, time 122.83ms
tensor(0.2321)
iter 103400: loss 7.2047, time 122.72ms
iter 103410: loss 7.2688, time 122.84ms
iter 103420: loss 7.1295, time 124.23ms
iter 103430: loss 6.7486, time 122.72ms
iter 103440: loss 7.1837, time 122.84ms
iter 103450: loss 6.4195, time 123.40ms
iter 103460: loss 6.3590, time 122.89ms
iter 103470: loss 6.3453, time 122.91ms
iter 103480: loss 7.4135, time 123.13ms
iter 103490: loss 6.5811, time 125.27ms
tensor(0.2061)
step 103500: train loss 6.4399, val loss 6.4318
saving checkpoint to out-shakespeare-char
iter 103500: loss 7.2197, time 2859.03ms
iter 103510: loss 7.2124, time 121.10ms
iter 103520: loss 6.4249, time 120.06ms
iter 103530: loss 6.2792, time 119.71ms
iter 103540: loss 7.0816, time 119.22ms
iter 103550: loss 6.3991, time 119.45ms
iter 103560: loss 6.7828, time 120.71ms
iter 103570: loss 6.7100, time 120.55ms
iter 103580: loss 7.7406, time 120.49ms
iter 103590: loss 6.6342, time 119.45ms
tensor(0.1813)
iter 103600: loss 6.6919, time 120.63ms
iter 103610: loss 6.7422, time 119.59ms
iter 103620: loss 6.6470, time 119.42ms
iter 103630: loss 6.8941, time 119.40ms
iter 103640: loss 6.7270, time 119.54ms
iter 103650: loss 6.8631, time 121.44ms
iter 103660: loss 6.8791, time 120.68ms
iter 103670: loss 7.0908, time 121.40ms
iter 103680: loss 6.8243, time 121.46ms
iter 103690: loss 7.0838, time 121.27ms
tensor(0.1577)
iter 103700: loss 7.7035, time 119.53ms
iter 103710: loss 7.1460, time 120.96ms
iter 103720: loss 7.1253, time 120.96ms
iter 103730: loss 7.0479, time 120.43ms
iter 103740: loss 7.2930, time 119.20ms
step 103750: train loss 6.3714, val loss 6.3931
saving checkpoint to out-shakespeare-char
iter 103750: loss 6.9973, time 2854.26ms
iter 103760: loss 6.9851, time 120.97ms
iter 103770: loss 6.9228, time 118.97ms
iter 103780: loss 7.4376, time 119.31ms
iter 103790: loss 6.8152, time 119.24ms
tensor(0.1355)
iter 103800: loss 7.4324, time 120.97ms
iter 103810: loss 6.9485, time 119.53ms
iter 103820: loss 7.3582, time 118.62ms
iter 103830: loss 7.2071, time 120.30ms
iter 103840: loss 6.3406, time 119.38ms
iter 103850: loss 7.0480, time 121.27ms
iter 103860: loss 6.9750, time 119.35ms
iter 103870: loss 6.6976, time 118.70ms
iter 103880: loss 6.9050, time 120.42ms
iter 103890: loss 6.4575, time 120.55ms
tensor(0.1147)
iter 103900: loss 7.5960, time 119.47ms
iter 103910: loss 7.1263, time 121.51ms
iter 103920: loss 7.2455, time 120.24ms
iter 103930: loss 6.6349, time 120.54ms
iter 103940: loss 6.9200, time 120.55ms
iter 103950: loss 6.9372, time 120.54ms
iter 103960: loss 7.2533, time 120.88ms
iter 103970: loss 7.0947, time 120.12ms
iter 103980: loss 6.9003, time 119.37ms
iter 103990: loss 6.6416, time 120.72ms
tensor(0.0955)
step 104000: train loss 6.3186, val loss 6.3785
saving checkpoint to out-shakespeare-char
iter 104000: loss 6.2911, time 2882.69ms
iter 104010: loss 5.9855, time 121.01ms
iter 104020: loss 8.0272, time 120.74ms
iter 104030: loss 6.8614, time 118.83ms
iter 104040: loss 5.8574, time 119.44ms
iter 104050: loss 7.1285, time 120.59ms
iter 104060: loss 7.1431, time 119.37ms
iter 104070: loss 5.7433, time 120.50ms
iter 104080: loss 7.6326, time 120.32ms
iter 104090: loss 7.0850, time 120.78ms
tensor(0.0778)
iter 104100: loss 6.2666, time 119.28ms
iter 104110: loss 7.3118, time 121.33ms
iter 104120: loss 6.9996, time 119.31ms
iter 104130: loss 6.5857, time 120.90ms
iter 104140: loss 7.3123, time 119.37ms
iter 104150: loss 7.3811, time 120.54ms
iter 104160: loss 6.6592, time 119.43ms
iter 104170: loss 7.5395, time 120.57ms
iter 104180: loss 7.1357, time 119.85ms
iter 104190: loss 6.7263, time 119.33ms
tensor(0.0618)
iter 104200: loss 7.3519, time 120.55ms
iter 104210: loss 6.8318, time 119.91ms
iter 104220: loss 6.8407, time 119.51ms
iter 104230: loss 6.8564, time 120.51ms
iter 104240: loss 6.9548, time 121.57ms
step 104250: train loss 6.2745, val loss 6.2858
saving checkpoint to out-shakespeare-char
iter 104250: loss 6.4773, time 2857.41ms
iter 104260: loss 6.4831, time 123.21ms
iter 104270: loss 6.6318, time 123.34ms
iter 104280: loss 6.2647, time 125.37ms
iter 104290: loss 6.9939, time 122.92ms
tensor(0.0476)
iter 104300: loss 6.3066, time 122.76ms
iter 104310: loss 6.7591, time 123.20ms
iter 104320: loss 6.5487, time 122.97ms
iter 104330: loss 6.9921, time 122.94ms
iter 104340: loss 5.7997, time 122.91ms
iter 104350: loss 6.1573, time 122.50ms
iter 104360: loss 6.8579, time 122.32ms
iter 104370: loss 6.7256, time 122.84ms
iter 104380: loss 6.3346, time 123.27ms
iter 104390: loss 6.4275, time 124.77ms
tensor(0.0351)
iter 104400: loss 6.1053, time 122.67ms
iter 104410: loss 6.9014, time 122.27ms
iter 104420: loss 6.4666, time 122.65ms
iter 104430: loss 6.1771, time 123.08ms
iter 104440: loss 7.0042, time 123.31ms
iter 104450: loss 7.1632, time 122.96ms
iter 104460: loss 6.9148, time 122.63ms
iter 104470: loss 6.5361, time 122.68ms
iter 104480: loss 6.8527, time 122.57ms
iter 104490: loss 6.3470, time 122.39ms
tensor(0.0245)
step 104500: train loss 6.2566, val loss 6.2635
saving checkpoint to out-shakespeare-char
iter 104500: loss 6.7116, time 2844.07ms
iter 104510: loss 6.6512, time 124.60ms
iter 104520: loss 6.5715, time 122.51ms
iter 104530: loss 6.6754, time 122.50ms
iter 104540: loss 6.5898, time 122.49ms
iter 104550: loss 6.2837, time 122.61ms
iter 104560: loss 6.9948, time 122.43ms
iter 104570: loss 6.4724, time 123.22ms
iter 104580: loss 6.2514, time 124.79ms
iter 104590: loss 7.0774, time 122.36ms
tensor(0.0157)
iter 104600: loss 6.1758, time 122.40ms
iter 104610: loss 6.4665, time 122.76ms
iter 104620: loss 6.9037, time 122.58ms
iter 104630: loss 6.1440, time 122.56ms
iter 104640: loss 7.0317, time 122.34ms
iter 104650: loss 6.7005, time 123.70ms
iter 104660: loss 6.6168, time 122.49ms
iter 104670: loss 6.7283, time 122.60ms
iter 104680: loss 6.1345, time 122.52ms
iter 104690: loss 6.8521, time 122.62ms
tensor(0.0089)
iter 104700: loss 6.8696, time 122.54ms
iter 104710: loss 6.6860, time 123.58ms
iter 104720: loss 7.5720, time 125.14ms
iter 104730: loss 7.3810, time 122.59ms
iter 104740: loss 6.2131, time 123.31ms
step 104750: train loss 6.1891, val loss 6.2564
saving checkpoint to out-shakespeare-char
iter 104750: loss 7.0166, time 2837.34ms
iter 104760: loss 6.5934, time 124.48ms
iter 104770: loss 7.2531, time 124.79ms
iter 104780: loss 6.7295, time 120.68ms
iter 104790: loss 6.6066, time 122.89ms
tensor(0.0039)
iter 104800: loss 8.0220, time 122.68ms
iter 104810: loss 6.4374, time 122.68ms
iter 104820: loss 6.5949, time 123.07ms
iter 104830: loss 6.4782, time 122.47ms
iter 104840: loss 6.7012, time 123.58ms
iter 104850: loss 6.6421, time 124.71ms
iter 104860: loss 6.6395, time 122.43ms
iter 104870: loss 6.7699, time 122.80ms
iter 104880: loss 5.6208, time 124.79ms
iter 104890: loss 6.9178, time 124.11ms
tensor(0.0010)
iter 104900: loss 6.5804, time 123.13ms
iter 104910: loss 7.1335, time 124.72ms
iter 104920: loss 7.1399, time 125.28ms
iter 104930: loss 7.5159, time 123.93ms
iter 104940: loss 6.7764, time 122.75ms
iter 104950: loss 6.2223, time 123.29ms
iter 104960: loss 6.9148, time 122.92ms
iter 104970: loss 6.8068, time 124.72ms
iter 104980: loss 6.7703, time 123.35ms
iter 104990: loss 7.2490, time 125.54ms
tensor(0.0010)
step 105000: train loss 6.2201, val loss 6.1473
saving checkpoint to out-shakespeare-char
iter 105000: loss 6.9071, time 2850.79ms
iter 105010: loss 6.8703, time 125.07ms
iter 105020: loss 6.3231, time 123.08ms
iter 105030: loss 7.1009, time 125.84ms
iter 105040: loss 7.5824, time 121.72ms
iter 105050: loss 7.1212, time 121.59ms
iter 105060: loss 7.1512, time 120.97ms
iter 105070: loss 6.2578, time 122.61ms
iter 105080: loss 6.8305, time 122.71ms
iter 105090: loss 6.6728, time 121.47ms
tensor(0.0010)
iter 105100: loss 6.7967, time 124.82ms
iter 105110: loss 5.9851, time 122.44ms
iter 105120: loss 7.3757, time 122.39ms
iter 105130: loss 6.4377, time 122.61ms
iter 105140: loss 6.7673, time 122.62ms
iter 105150: loss 6.3060, time 122.66ms
iter 105160: loss 6.3706, time 123.15ms
iter 105170: loss 6.9825, time 122.62ms
iter 105180: loss 6.5552, time 124.31ms
iter 105190: loss 7.0188, time 122.43ms
tensor(0.0039)
iter 105200: loss 6.8222, time 122.31ms
iter 105210: loss 6.9486, time 122.55ms
iter 105220: loss 6.0789, time 123.16ms
iter 105230: loss 5.9463, time 123.01ms
iter 105240: loss 6.5265, time 123.46ms
step 105250: train loss 6.2722, val loss 6.2289
saving checkpoint to out-shakespeare-char
iter 105250: loss 6.3340, time 2861.21ms
iter 105260: loss 6.5298, time 123.42ms
iter 105270: loss 6.6595, time 123.73ms
iter 105280: loss 6.6539, time 123.54ms
iter 105290: loss 7.3888, time 123.15ms
tensor(0.0089)
iter 105300: loss 7.0701, time 123.05ms
iter 105310: loss 7.3262, time 123.14ms
iter 105320: loss 6.7641, time 122.88ms
iter 105330: loss 7.1362, time 122.00ms
iter 105340: loss 6.2088, time 122.63ms
iter 105350: loss 6.8739, time 121.56ms
iter 105360: loss 6.9381, time 122.17ms
iter 105370: loss 7.5454, time 124.90ms
iter 105380: loss 6.9610, time 122.91ms
iter 105390: loss 7.3388, time 123.18ms
tensor(0.0157)
iter 105400: loss 6.4757, time 123.13ms
iter 105410: loss 7.6033, time 121.19ms
iter 105420: loss 6.3809, time 119.85ms
iter 105430: loss 6.5128, time 121.53ms
iter 105440: loss 6.6504, time 118.51ms
iter 105450: loss 7.5904, time 119.45ms
iter 105460: loss 6.7913, time 120.50ms
iter 105470: loss 7.0380, time 118.80ms
iter 105480: loss 7.0899, time 120.62ms
iter 105490: loss 6.4032, time 119.28ms
tensor(0.0245)
step 105500: train loss 6.2046, val loss 6.2389
saving checkpoint to out-shakespeare-char
iter 105500: loss 6.8556, time 2857.47ms
iter 105510: loss 7.0650, time 121.22ms
iter 105520: loss 7.3529, time 120.15ms
iter 105530: loss 6.3138, time 120.35ms
iter 105540: loss 6.6080, time 119.40ms
iter 105550: loss 6.7545, time 120.43ms
iter 105560: loss 6.8078, time 119.19ms
iter 105570: loss 7.2046, time 119.31ms
iter 105580: loss 6.2122, time 121.25ms
iter 105590: loss 6.0480, time 120.49ms
tensor(0.0351)
iter 105600: loss 5.9534, time 121.00ms
iter 105610: loss 6.1401, time 118.74ms
iter 105620: loss 6.1021, time 121.23ms
iter 105630: loss 7.2270, time 120.39ms
iter 105640: loss 7.0601, time 119.20ms
iter 105650: loss 6.7494, time 120.68ms
iter 105660: loss 6.8304, time 119.52ms
iter 105670: loss 6.3507, time 119.95ms
iter 105680: loss 6.4145, time 120.54ms
iter 105690: loss 7.3558, time 119.26ms
tensor(0.0476)
iter 105700: loss 6.8662, time 120.47ms
iter 105710: loss 6.4918, time 119.53ms
iter 105720: loss 6.9169, time 119.76ms
iter 105730: loss 6.7506, time 119.36ms
iter 105740: loss 7.4234, time 120.24ms
step 105750: train loss 6.2379, val loss 6.2112
saving checkpoint to out-shakespeare-char
iter 105750: loss 6.4760, time 2860.61ms
iter 105760: loss 6.4755, time 120.54ms
iter 105770: loss 6.9915, time 120.97ms
iter 105780: loss 6.8864, time 120.81ms
iter 105790: loss 6.8843, time 121.22ms
tensor(0.0618)
iter 105800: loss 6.7948, time 120.17ms
iter 105810: loss 5.7968, time 121.17ms
iter 105820: loss 6.6358, time 118.94ms
iter 105830: loss 5.4356, time 120.92ms
iter 105840: loss 6.9248, time 119.15ms
iter 105850: loss 6.7561, time 119.26ms
iter 105860: loss 6.4844, time 119.10ms
iter 105870: loss 6.9591, time 120.18ms
iter 105880: loss 6.5127, time 119.31ms
iter 105890: loss 6.7319, time 120.38ms
tensor(0.0778)
iter 105900: loss 6.2097, time 120.67ms
iter 105910: loss 6.1408, time 120.38ms
iter 105920: loss 6.3202, time 120.18ms
iter 105930: loss 6.3411, time 121.11ms
iter 105940: loss 7.1276, time 120.16ms
iter 105950: loss 7.9173, time 119.01ms
iter 105960: loss 5.9764, time 118.99ms
iter 105970: loss 6.8546, time 120.27ms
iter 105980: loss 6.6526, time 119.15ms
iter 105990: loss 7.2151, time 120.20ms
tensor(0.0955)
step 106000: train loss 6.1818, val loss 6.2563
saving checkpoint to out-shakespeare-char
iter 106000: loss 6.9714, time 2861.71ms
iter 106010: loss 6.9897, time 119.73ms
iter 106020: loss 6.4656, time 118.33ms
iter 106030: loss 6.5773, time 119.38ms
iter 106040: loss 6.1485, time 119.45ms
iter 106050: loss 6.2313, time 120.63ms
iter 106060: loss 7.4456, time 119.53ms
iter 106070: loss 6.6375, time 119.39ms
iter 106080: loss 7.2405, time 119.47ms
iter 106090: loss 6.8527, time 119.30ms
tensor(0.1147)
iter 106100: loss 6.3763, time 119.41ms
iter 106110: loss 6.2843, time 119.27ms
iter 106120: loss 6.9940, time 119.36ms
iter 106130: loss 7.1787, time 119.26ms
iter 106140: loss 6.9007, time 119.30ms
iter 106150: loss 7.4837, time 119.97ms
iter 106160: loss 6.9225, time 119.20ms
iter 106170: loss 6.5413, time 119.21ms
iter 106180: loss 6.6306, time 120.69ms
iter 106190: loss 6.8549, time 119.37ms
tensor(0.1355)
iter 106200: loss 7.1740, time 119.53ms
iter 106210: loss 7.4775, time 119.38ms
iter 106220: loss 6.6868, time 119.35ms
iter 106230: loss 6.7139, time 119.41ms
iter 106240: loss 7.4955, time 119.84ms
step 106250: train loss 6.2483, val loss 6.3304
saving checkpoint to out-shakespeare-char
iter 106250: loss 6.8262, time 2879.75ms
iter 106260: loss 7.0058, time 119.39ms
iter 106270: loss 7.3571, time 119.45ms
iter 106280: loss 6.9555, time 119.40ms
iter 106290: loss 6.6691, time 119.20ms
tensor(0.1577)
iter 106300: loss 6.6961, time 119.68ms
iter 106310: loss 7.4649, time 119.23ms
iter 106320: loss 6.9958, time 119.30ms
iter 106330: loss 6.3611, time 120.87ms
iter 106340: loss 7.7116, time 120.40ms
iter 106350: loss 6.7240, time 119.25ms
iter 106360: loss 6.8033, time 120.56ms
iter 106370: loss 7.5801, time 119.38ms
iter 106380: loss 6.4654, time 119.32ms
iter 106390: loss 6.9478, time 119.33ms
tensor(0.1813)
iter 106400: loss 6.5068, time 119.77ms
iter 106410: loss 7.0078, time 119.43ms
iter 106420: loss 6.4313, time 119.43ms
iter 106430: loss 6.7522, time 120.52ms
iter 106440: loss 6.1490, time 119.34ms
iter 106450: loss 6.2617, time 121.93ms
iter 106460: loss 6.4667, time 120.65ms
iter 106470: loss 6.6624, time 119.59ms
iter 106480: loss 6.4181, time 119.56ms
iter 106490: loss 7.6468, time 118.90ms
tensor(0.2061)
step 106500: train loss 6.3945, val loss 6.3755
saving checkpoint to out-shakespeare-char
iter 106500: loss 6.5595, time 2862.43ms
iter 106510: loss 7.9242, time 118.75ms
iter 106520: loss 6.0826, time 119.52ms
iter 106530: loss 6.8920, time 121.55ms
iter 106540: loss 6.6447, time 119.53ms
iter 106550: loss 7.4803, time 121.09ms
iter 106560: loss 6.6426, time 118.39ms
iter 106570: loss 6.8024, time 119.47ms
iter 106580: loss 7.0863, time 120.62ms
iter 106590: loss 7.3426, time 119.34ms
tensor(0.2321)
iter 106600: loss 6.1815, time 120.17ms
iter 106610: loss 7.6301, time 118.85ms
iter 106620: loss 6.1919, time 120.74ms
iter 106630: loss 6.3261, time 119.23ms
iter 106640: loss 6.8270, time 120.07ms
iter 106650: loss 6.6901, time 120.57ms
iter 106660: loss 6.6203, time 120.11ms
iter 106670: loss 7.4918, time 120.56ms
iter 106680: loss 6.9763, time 120.69ms
iter 106690: loss 7.3017, time 119.69ms
tensor(0.2591)
iter 106700: loss 7.4960, time 120.67ms
iter 106710: loss 7.3693, time 119.57ms
iter 106720: loss 6.6613, time 120.75ms
iter 106730: loss 7.3002, time 119.53ms
iter 106740: loss 7.1811, time 119.60ms
step 106750: train loss 6.4074, val loss 6.3772
saving checkpoint to out-shakespeare-char
iter 106750: loss 6.1487, time 2862.88ms
iter 106760: loss 7.0166, time 121.31ms
iter 106770: loss 5.9416, time 120.55ms
iter 106780: loss 6.8343, time 120.49ms
iter 106790: loss 6.8300, time 119.27ms
tensor(0.2871)
iter 106800: loss 6.6526, time 119.09ms
iter 106810: loss 7.5137, time 120.59ms
iter 106820: loss 6.6688, time 119.33ms
iter 106830: loss 7.0813, time 120.69ms
iter 106840: loss 6.7497, time 119.39ms
iter 106850: loss 6.6949, time 119.15ms
iter 106860: loss 6.6001, time 119.43ms
iter 106870: loss 6.8809, time 120.50ms
iter 106880: loss 7.2949, time 119.86ms
iter 106890: loss 7.3606, time 119.26ms
tensor(0.3159)
iter 106900: loss 7.5057, time 119.49ms
iter 106910: loss 7.0382, time 120.61ms
iter 106920: loss 7.1886, time 120.81ms
iter 106930: loss 6.6040, time 120.89ms
iter 106940: loss 7.0106, time 121.33ms
iter 106950: loss 7.0650, time 120.48ms
iter 106960: loss 7.1237, time 121.84ms
iter 106970: loss 6.8884, time 120.33ms
iter 106980: loss 6.7221, time 120.44ms
iter 106990: loss 6.9755, time 119.30ms
tensor(0.3455)
step 107000: train loss 6.4371, val loss 6.5078
saving checkpoint to out-shakespeare-char
iter 107000: loss 7.1603, time 2871.73ms
iter 107010: loss 7.4127, time 122.70ms
iter 107020: loss 6.2789, time 122.65ms
iter 107030: loss 7.0643, time 122.72ms
iter 107040: loss 7.3422, time 124.92ms
iter 107050: loss 7.5508, time 122.62ms
iter 107060: loss 7.2811, time 122.64ms
iter 107070: loss 6.8248, time 122.49ms
iter 107080: loss 7.6001, time 122.46ms
iter 107090: loss 6.9533, time 123.42ms
tensor(0.3757)
iter 107100: loss 7.0222, time 122.80ms
iter 107110: loss 7.7803, time 124.64ms
iter 107120: loss 7.5076, time 122.79ms
iter 107130: loss 6.3954, time 122.57ms
iter 107140: loss 7.0248, time 122.26ms
iter 107150: loss 7.3442, time 122.60ms
iter 107160: loss 6.5734, time 122.72ms
iter 107170: loss 7.0316, time 122.66ms
iter 107180: loss 7.7668, time 124.73ms
iter 107190: loss 6.7020, time 122.20ms
tensor(0.4063)
iter 107200: loss 5.9054, time 122.54ms
iter 107210: loss 6.9325, time 122.59ms
iter 107220: loss 7.2322, time 122.63ms
iter 107230: loss 7.2160, time 122.43ms
iter 107240: loss 7.3293, time 122.97ms
step 107250: train loss 6.6085, val loss 6.5671
saving checkpoint to out-shakespeare-char
iter 107250: loss 6.1712, time 2865.32ms
iter 107260: loss 6.5014, time 122.83ms
iter 107270: loss 6.9571, time 122.26ms
iter 107280: loss 7.0135, time 122.69ms
iter 107290: loss 6.8965, time 122.82ms
tensor(0.4373)
iter 107300: loss 7.3036, time 122.09ms
iter 107310: loss 6.9873, time 124.72ms
iter 107320: loss 7.9845, time 122.42ms
iter 107330: loss 6.6988, time 123.03ms
iter 107340: loss 7.0027, time 122.72ms
iter 107350: loss 6.7994, time 118.78ms
iter 107360: loss 6.9863, time 118.71ms
iter 107370: loss 6.9653, time 120.82ms
iter 107380: loss 7.0900, time 117.66ms
iter 107390: loss 7.4057, time 117.67ms
tensor(0.4686)
iter 107400: loss 7.5338, time 119.21ms
iter 107410: loss 6.8992, time 119.42ms
iter 107420: loss 7.2899, time 117.58ms
iter 107430: loss 7.1255, time 118.66ms
iter 107440: loss 7.3212, time 117.36ms
iter 107450: loss 7.0042, time 118.73ms
iter 107460: loss 6.8028, time 117.41ms
iter 107470: loss 5.9966, time 117.45ms
iter 107480: loss 7.8798, time 119.81ms
iter 107490: loss 7.6737, time 118.66ms
tensor(0.5000)
step 107500: train loss 6.7300, val loss 6.7141
saving checkpoint to out-shakespeare-char
iter 107500: loss 7.3420, time 2862.34ms
iter 107510: loss 7.0830, time 120.65ms
iter 107520: loss 7.1635, time 120.57ms
iter 107530: loss 8.0549, time 119.21ms
iter 107540: loss 7.0815, time 120.62ms
iter 107550: loss 7.4003, time 119.48ms
iter 107560: loss 6.8553, time 120.26ms
iter 107570: loss 7.4522, time 119.31ms
iter 107580: loss 7.1366, time 120.71ms
iter 107590: loss 7.2893, time 119.67ms
tensor(0.5314)
iter 107600: loss 7.8697, time 120.29ms
iter 107610: loss 6.5000, time 120.63ms
iter 107620: loss 7.5918, time 124.28ms
iter 107630: loss 7.7017, time 124.96ms
iter 107640: loss 7.1427, time 122.23ms
iter 107650: loss 7.3593, time 123.04ms
iter 107660: loss 6.5082, time 122.28ms
iter 107670: loss 7.9636, time 123.35ms
iter 107680: loss 7.9435, time 122.83ms
iter 107690: loss 7.3886, time 124.04ms
tensor(0.5627)
iter 107700: loss 7.6016, time 122.10ms
iter 107710: loss 6.5880, time 122.00ms
iter 107720: loss 7.1136, time 122.95ms
iter 107730: loss 7.3394, time 122.99ms
iter 107740: loss 7.9701, time 122.91ms
step 107750: train loss 6.6611, val loss 6.6793
saving checkpoint to out-shakespeare-char
iter 107750: loss 7.8624, time 2849.42ms
iter 107760: loss 6.6518, time 122.67ms
iter 107770: loss 7.7684, time 122.52ms
iter 107780: loss 7.4122, time 121.42ms
iter 107790: loss 7.7911, time 123.07ms
tensor(0.5937)
iter 107800: loss 7.0831, time 123.30ms
iter 107810: loss 6.8270, time 124.08ms
iter 107820: loss 7.2772, time 122.92ms
iter 107830: loss 7.6509, time 122.98ms
iter 107840: loss 7.4019, time 121.50ms
iter 107850: loss 7.6449, time 122.04ms
iter 107860: loss 8.5538, time 123.40ms
iter 107870: loss 7.6113, time 123.93ms
iter 107880: loss 7.0572, time 125.10ms
iter 107890: loss 7.9132, time 123.20ms
tensor(0.6243)
iter 107900: loss 7.6858, time 122.07ms
iter 107910: loss 7.3600, time 121.60ms
iter 107920: loss 7.5961, time 123.34ms
iter 107930: loss 7.2154, time 123.23ms
iter 107940: loss 8.1941, time 124.55ms
iter 107950: loss 7.5099, time 124.76ms
iter 107960: loss 7.1991, time 122.30ms
iter 107970: loss 7.6472, time 122.77ms
iter 107980: loss 7.0955, time 122.89ms
iter 107990: loss 6.9843, time 122.89ms
tensor(0.6545)
step 108000: train loss 6.7900, val loss 6.8270
saving checkpoint to out-shakespeare-char
iter 108000: loss 7.6372, time 2845.69ms
iter 108010: loss 7.1601, time 124.11ms
iter 108020: loss 7.3100, time 122.34ms
iter 108030: loss 8.0171, time 122.07ms
iter 108040: loss 7.9957, time 122.26ms
iter 108050: loss 7.2369, time 122.22ms
iter 108060: loss 7.0757, time 122.94ms
iter 108070: loss 7.6907, time 122.86ms
iter 108080: loss 7.1238, time 124.63ms
iter 108090: loss 7.0129, time 121.96ms
tensor(0.6841)
iter 108100: loss 8.0461, time 122.50ms
iter 108110: loss 7.1922, time 122.22ms
iter 108120: loss 6.9747, time 121.90ms
iter 108130: loss 7.7044, time 121.95ms
iter 108140: loss 7.2951, time 122.37ms
iter 108150: loss 7.0790, time 124.53ms
iter 108160: loss 7.5266, time 122.05ms
iter 108170: loss 7.9230, time 122.13ms
iter 108180: loss 7.8932, time 121.82ms
iter 108190: loss 7.6221, time 122.02ms
tensor(0.7129)
iter 108200: loss 7.4814, time 122.28ms
iter 108210: loss 6.6187, time 122.19ms
iter 108220: loss 7.2040, time 124.35ms
iter 108230: loss 7.3626, time 122.28ms
iter 108240: loss 7.3812, time 122.28ms
step 108250: train loss 6.7271, val loss 6.7867
saving checkpoint to out-shakespeare-char
iter 108250: loss 7.4807, time 2855.54ms
iter 108260: loss 7.9274, time 124.59ms
iter 108270: loss 7.4625, time 122.57ms
iter 108280: loss 7.5671, time 122.55ms
iter 108290: loss 7.7824, time 122.23ms
tensor(0.7409)
iter 108300: loss 7.0054, time 122.36ms
iter 108310: loss 7.2786, time 122.22ms
iter 108320: loss 7.6113, time 122.41ms
iter 108330: loss 7.8170, time 124.91ms
iter 108340: loss 7.0234, time 122.67ms
iter 108350: loss 7.9390, time 122.47ms
iter 108360: loss 7.8607, time 122.84ms
iter 108370: loss 7.7651, time 122.80ms
iter 108380: loss 7.8345, time 122.84ms
iter 108390: loss 7.0123, time 122.91ms
tensor(0.7679)
iter 108400: loss 7.7479, time 123.20ms
iter 108410: loss 7.4701, time 124.96ms
iter 108420: loss 7.9419, time 123.51ms
iter 108430: loss 7.7953, time 122.62ms
iter 108440: loss 7.5410, time 123.03ms
iter 108450: loss 7.1719, time 123.12ms
iter 108460: loss 6.4856, time 122.62ms
iter 108470: loss 8.4932, time 123.32ms
iter 108480: loss 6.9485, time 125.16ms
iter 108490: loss 7.9850, time 122.66ms
tensor(0.7939)
step 108500: train loss 6.8332, val loss 6.8897
saving checkpoint to out-shakespeare-char
iter 108500: loss 8.7267, time 2876.96ms
iter 108510: loss 8.4180, time 123.24ms
iter 108520: loss 7.2103, time 123.53ms
iter 108530: loss 6.4603, time 125.46ms
iter 108540: loss 7.6571, time 123.84ms
iter 108550: loss 7.2241, time 121.43ms
iter 108560: loss 8.2379, time 122.90ms
iter 108570: loss 8.0240, time 122.76ms
iter 108580: loss 7.6930, time 123.08ms
iter 108590: loss 7.7982, time 122.95ms
tensor(0.8187)
iter 108600: loss 8.0772, time 124.73ms
iter 108610: loss 7.5873, time 122.72ms
iter 108620: loss 6.8368, time 122.40ms
iter 108630: loss 7.0007, time 122.29ms
iter 108640: loss 7.4402, time 122.53ms
iter 108650: loss 7.5624, time 122.55ms
iter 108660: loss 8.6313, time 122.61ms
iter 108670: loss 7.7302, time 125.36ms
iter 108680: loss 7.7699, time 122.28ms
iter 108690: loss 7.8643, time 122.89ms
tensor(0.8423)
iter 108700: loss 7.8168, time 122.65ms
iter 108710: loss 6.6700, time 122.18ms
iter 108720: loss 7.9017, time 122.50ms
iter 108730: loss 8.3404, time 122.49ms
iter 108740: loss 8.2212, time 124.64ms
step 108750: train loss 6.8975, val loss 6.8929
saving checkpoint to out-shakespeare-char
iter 108750: loss 8.0348, time 2865.14ms
iter 108760: loss 6.9061, time 121.36ms
iter 108770: loss 7.8547, time 123.41ms
iter 108780: loss 7.6524, time 122.73ms
iter 108790: loss 8.1531, time 122.45ms
tensor(0.8645)
iter 108800: loss 7.6974, time 122.53ms
iter 108810: loss 7.5363, time 124.57ms
iter 108820: loss 7.5663, time 122.31ms
iter 108830: loss 7.7041, time 122.75ms
iter 108840: loss 7.9073, time 122.49ms
iter 108850: loss 7.4270, time 123.04ms
iter 108860: loss 7.8177, time 122.49ms
iter 108870: loss 8.2471, time 122.70ms
iter 108880: loss 7.0934, time 125.23ms
iter 108890: loss 7.7213, time 122.35ms
tensor(0.8853)
iter 108900: loss 8.3859, time 121.92ms
iter 108910: loss 8.2523, time 122.49ms
iter 108920: loss 7.8015, time 122.25ms
iter 108930: loss 7.9810, time 122.52ms
iter 108940: loss 7.6620, time 122.94ms
iter 108950: loss 8.4877, time 122.88ms
iter 108960: loss 8.6931, time 124.78ms
iter 108970: loss 7.7382, time 121.64ms
iter 108980: loss 7.8597, time 122.38ms
iter 108990: loss 7.8071, time 122.59ms
tensor(0.9045)
step 109000: train loss 6.9560, val loss 7.0049
saving checkpoint to out-shakespeare-char
iter 109000: loss 8.2496, time 2875.29ms
iter 109010: loss 7.6195, time 122.52ms
iter 109020: loss 7.6884, time 122.61ms
iter 109030: loss 8.0414, time 122.76ms
iter 109040: loss 7.9453, time 122.94ms
iter 109050: loss 7.8117, time 122.69ms
iter 109060: loss 8.0969, time 124.92ms
iter 109070: loss 8.0240, time 122.42ms
iter 109080: loss 8.2232, time 122.67ms
iter 109090: loss 7.1965, time 121.62ms
tensor(0.9222)
iter 109100: loss 7.4298, time 122.49ms
iter 109110: loss 8.0757, time 122.52ms
iter 109120: loss 7.7682, time 121.81ms
iter 109130: loss 7.8231, time 123.47ms
iter 109140: loss 7.9113, time 122.18ms
iter 109150: loss 8.2628, time 122.14ms
iter 109160: loss 7.4040, time 122.31ms
iter 109170: loss 8.2450, time 122.83ms
iter 109180: loss 7.7150, time 121.57ms
iter 109190: loss 8.5455, time 123.07ms
tensor(0.9382)
iter 109200: loss 7.3945, time 123.54ms
iter 109210: loss 7.3967, time 124.94ms
iter 109220: loss 7.1403, time 122.18ms
iter 109230: loss 7.2308, time 121.86ms
iter 109240: loss 7.6171, time 122.72ms
step 109250: train loss 6.8926, val loss 6.8843
saving checkpoint to out-shakespeare-char
iter 109250: loss 8.2604, time 2852.83ms
iter 109260: loss 8.2788, time 122.85ms
iter 109270: loss 7.5534, time 122.73ms
iter 109280: loss 7.8915, time 122.86ms
iter 109290: loss 7.8685, time 122.77ms
tensor(0.9524)
iter 109300: loss 7.2260, time 123.38ms
iter 109310: loss 7.3620, time 123.94ms
iter 109320: loss 8.0946, time 125.67ms
iter 109330: loss 7.9003, time 123.21ms
iter 109340: loss 8.0233, time 123.83ms
iter 109350: loss 8.3040, time 122.74ms
iter 109360: loss 7.8094, time 123.36ms
iter 109370: loss 7.5757, time 123.15ms
iter 109380: loss 7.6817, time 123.14ms
iter 109390: loss 7.3851, time 123.06ms
tensor(0.9649)
iter 109400: loss 8.8036, time 122.83ms
iter 109410: loss 7.6739, time 122.70ms
iter 109420: loss 7.5074, time 122.65ms
iter 109430: loss 7.3909, time 122.61ms
iter 109440: loss 7.3843, time 123.12ms
iter 109450: loss 8.0667, time 123.02ms
iter 109460: loss 7.7516, time 124.89ms
iter 109470: loss 7.6686, time 122.56ms
iter 109480: loss 8.1429, time 122.76ms
iter 109490: loss 7.3484, time 122.74ms
tensor(0.9755)
step 109500: train loss 6.9783, val loss 7.0119
saving checkpoint to out-shakespeare-char
iter 109500: loss 7.6718, time 2845.91ms
iter 109510: loss 7.9634, time 120.20ms
iter 109520: loss 7.7351, time 121.61ms
iter 109530: loss 7.8002, time 119.74ms
iter 109540: loss 8.3221, time 119.61ms
iter 109550: loss 7.9864, time 121.73ms
iter 109560: loss 7.9415, time 120.01ms
iter 109570: loss 6.9541, time 119.56ms
iter 109580: loss 8.3027, time 121.29ms
iter 109590: loss 7.9481, time 119.93ms
tensor(0.9843)
iter 109600: loss 7.0025, time 119.07ms
iter 109610: loss 7.6190, time 121.82ms
iter 109620: loss 7.7990, time 120.44ms
iter 109630: loss 7.5652, time 120.47ms
iter 109640: loss 6.7093, time 119.37ms
iter 109650: loss 7.7336, time 119.65ms
iter 109660: loss 7.7024, time 121.50ms
iter 109670: loss 7.6593, time 120.61ms
iter 109680: loss 7.2739, time 120.56ms
iter 109690: loss 7.5827, time 119.83ms
tensor(0.9911)
iter 109700: loss 7.0650, time 121.34ms
iter 109710: loss 8.8948, time 119.78ms
iter 109720: loss 8.4426, time 119.58ms
iter 109730: loss 7.8181, time 119.47ms
iter 109740: loss 7.7730, time 121.06ms
step 109750: train loss 6.8920, val loss 6.9359
saving checkpoint to out-shakespeare-char
iter 109750: loss 7.6641, time 2878.74ms
iter 109760: loss 7.8055, time 125.30ms
iter 109770: loss 7.6970, time 121.66ms
iter 109780: loss 7.7768, time 122.89ms
iter 109790: loss 7.3746, time 122.74ms
tensor(0.9961)
iter 109800: loss 7.1990, time 122.80ms
iter 109810: loss 8.2039, time 122.98ms
iter 109820: loss 7.0797, time 122.86ms
iter 109830: loss 8.0718, time 124.91ms
iter 109840: loss 7.5037, time 123.06ms
iter 109850: loss 7.1067, time 123.01ms
iter 109860: loss 7.3923, time 123.46ms
iter 109870: loss 7.9050, time 123.77ms
iter 109880: loss 7.7867, time 122.69ms
iter 109890: loss 9.1117, time 120.67ms
tensor(0.9990)
iter 109900: loss 8.0041, time 120.91ms
iter 109910: loss 7.1952, time 122.24ms
iter 109920: loss 7.5134, time 122.91ms
iter 109930: loss 7.2565, time 124.00ms
iter 109940: loss 7.9181, time 122.32ms
iter 109950: loss 7.9483, time 123.52ms
iter 109960: loss 7.0399, time 124.61ms
iter 109970: loss 8.3532, time 122.66ms
iter 109980: loss 7.3231, time 122.65ms
iter 109990: loss 7.4946, time 122.25ms
tensor(1.)
step 110000: train loss 7.0044, val loss 6.9847
saving checkpoint to out-shakespeare-char
iter 110000: loss 7.6140, time 2863.65ms
iter 110010: loss 7.5482, time 122.52ms
iter 110020: loss 7.2374, time 122.60ms
iter 110030: loss 8.2892, time 122.51ms
iter 110040: loss 7.2396, time 122.55ms
iter 110050: loss 7.9526, time 121.45ms
iter 110060: loss 7.0405, time 121.54ms
iter 110070: loss 8.1219, time 123.21ms
iter 110080: loss 8.0386, time 122.58ms
iter 110090: loss 7.4849, time 122.45ms
tensor(0.9990)
iter 110100: loss 7.3735, time 123.99ms
iter 110110: loss 8.0510, time 122.71ms
iter 110120: loss 7.9432, time 121.45ms
iter 110130: loss 7.6725, time 122.39ms
iter 110140: loss 8.1053, time 122.58ms
iter 110150: loss 7.5673, time 122.33ms
iter 110160: loss 7.2744, time 122.55ms
iter 110170: loss 8.2905, time 123.78ms
iter 110180: loss 7.5720, time 123.75ms
iter 110190: loss 7.8531, time 122.36ms
tensor(0.9961)
iter 110200: loss 7.8308, time 122.51ms
iter 110210: loss 7.5833, time 122.35ms
iter 110220: loss 7.9314, time 122.30ms
iter 110230: loss 8.1344, time 122.82ms
iter 110240: loss 7.7830, time 124.65ms
step 110250: train loss 6.9542, val loss 6.9425
saving checkpoint to out-shakespeare-char
iter 110250: loss 7.9993, time 2854.34ms
iter 110260: loss 7.4961, time 123.21ms
iter 110270: loss 7.8704, time 125.17ms
iter 110280: loss 7.1815, time 123.22ms
iter 110290: loss 7.6719, time 123.30ms
tensor(0.9911)
iter 110300: loss 7.7030, time 123.29ms
iter 110310: loss 8.0273, time 122.97ms
iter 110320: loss 7.9764, time 122.79ms
iter 110330: loss 7.5441, time 125.15ms
iter 110340: loss 8.0593, time 123.05ms
iter 110350: loss 7.7882, time 123.44ms
iter 110360: loss 7.8897, time 123.37ms
iter 110370: loss 7.9658, time 122.53ms
iter 110380: loss 7.1702, time 123.33ms
iter 110390: loss 7.6424, time 122.95ms
tensor(0.9843)
iter 110400: loss 8.2675, time 122.19ms
iter 110410: loss 7.2581, time 122.49ms
iter 110420: loss 7.6474, time 124.15ms
iter 110430: loss 8.1902, time 122.89ms
iter 110440: loss 7.8597, time 122.88ms
iter 110450: loss 6.9912, time 123.01ms
iter 110460: loss 8.0638, time 122.75ms
iter 110470: loss 7.5742, time 123.00ms
iter 110480: loss 7.6642, time 123.89ms
iter 110490: loss 8.0757, time 123.08ms
tensor(0.9755)
step 110500: train loss 7.0627, val loss 7.0236
saving checkpoint to out-shakespeare-char
iter 110500: loss 9.1443, time 2859.31ms
iter 110510: loss 8.0099, time 124.83ms
iter 110520: loss 7.6970, time 122.69ms
iter 110530: loss 8.0534, time 122.52ms
iter 110540: loss 7.5781, time 123.78ms
iter 110550: loss 7.5704, time 123.99ms
iter 110560: loss 7.6375, time 124.68ms
iter 110570: loss 7.8120, time 124.63ms
iter 110580: loss 8.2958, time 122.44ms
iter 110590: loss 7.8177, time 122.52ms
tensor(0.9649)
iter 110600: loss 7.6530, time 122.34ms
iter 110610: loss 8.3508, time 122.31ms
iter 110620: loss 7.8282, time 122.78ms
iter 110630: loss 7.4864, time 122.77ms
iter 110640: loss 7.7490, time 123.66ms
iter 110650: loss 8.0839, time 122.27ms
iter 110660: loss 7.6136, time 122.63ms
iter 110670: loss 7.5825, time 122.70ms
iter 110680: loss 8.2187, time 122.52ms
iter 110690: loss 7.8989, time 123.86ms
tensor(0.9524)
iter 110700: loss 8.2401, time 123.95ms
iter 110710: loss 7.8814, time 121.52ms
iter 110720: loss 7.4567, time 123.64ms
iter 110730: loss 7.7911, time 122.86ms
iter 110740: loss 7.1807, time 121.37ms
step 110750: train loss 6.9701, val loss 7.0095
saving checkpoint to out-shakespeare-char
iter 110750: loss 7.9011, time 2873.16ms
iter 110760: loss 8.0573, time 119.41ms
iter 110770: loss 8.1634, time 120.12ms
iter 110780: loss 7.9959, time 120.53ms
iter 110790: loss 8.3708, time 120.40ms
tensor(0.9382)
iter 110800: loss 7.7159, time 119.57ms
iter 110810: loss 8.0339, time 119.72ms
iter 110820: loss 7.5180, time 120.30ms
iter 110830: loss 8.0089, time 120.53ms
iter 110840: loss 7.6197, time 119.26ms
iter 110850: loss 7.5906, time 120.80ms
iter 110860: loss 7.9254, time 119.53ms
iter 110870: loss 7.8593, time 119.62ms
iter 110880: loss 7.0819, time 119.99ms
iter 110890: loss 7.6912, time 120.69ms
tensor(0.9222)
iter 110900: loss 7.5635, time 119.29ms
iter 110910: loss 8.3076, time 119.53ms
iter 110920: loss 7.1417, time 119.37ms
iter 110930: loss 7.8743, time 121.49ms
iter 110940: loss 8.2992, time 120.72ms
iter 110950: loss 7.7452, time 119.17ms
iter 110960: loss 7.8602, time 119.33ms
iter 110970: loss 7.8520, time 120.10ms
iter 110980: loss 7.7226, time 120.69ms
iter 110990: loss 7.1418, time 119.53ms
tensor(0.9045)
step 111000: train loss 6.8937, val loss 6.8760
saving checkpoint to out-shakespeare-char
iter 111000: loss 7.1808, time 2885.63ms
iter 111010: loss 7.4007, time 119.43ms
iter 111020: loss 7.3739, time 120.63ms
iter 111030: loss 8.0828, time 119.40ms
iter 111040: loss 7.5802, time 119.15ms
iter 111050: loss 8.1022, time 120.77ms
iter 111060: loss 7.5760, time 120.40ms
iter 111070: loss 7.9112, time 120.65ms
iter 111080: loss 7.1174, time 118.53ms
iter 111090: loss 7.3101, time 121.27ms
tensor(0.8853)
iter 111100: loss 7.6809, time 120.08ms
iter 111110: loss 8.0225, time 119.16ms
iter 111120: loss 7.9538, time 120.65ms
iter 111130: loss 7.1589, time 119.37ms
iter 111140: loss 7.6775, time 120.12ms
iter 111150: loss 7.0862, time 119.16ms
iter 111160: loss 7.8219, time 119.17ms
iter 111170: loss 8.0069, time 120.69ms
iter 111180: loss 8.0118, time 118.55ms
iter 111190: loss 7.6817, time 121.29ms
tensor(0.8645)
iter 111200: loss 7.8892, time 120.78ms
iter 111210: loss 7.6706, time 120.62ms
iter 111220: loss 7.5355, time 119.34ms
iter 111230: loss 8.0174, time 119.30ms
iter 111240: loss 7.4442, time 120.56ms
step 111250: train loss 6.9398, val loss 6.9367
saving checkpoint to out-shakespeare-char
iter 111250: loss 7.4615, time 2889.67ms
iter 111260: loss 7.3283, time 120.79ms
iter 111270: loss 7.3561, time 119.43ms
iter 111280: loss 7.1729, time 120.59ms
iter 111290: loss 7.6996, time 120.00ms
tensor(0.8423)
iter 111300: loss 8.3504, time 121.01ms
iter 111310: loss 7.3916, time 120.35ms
iter 111320: loss 7.5431, time 119.66ms
iter 111330: loss 6.7990, time 119.58ms
iter 111340: loss 8.3679, time 120.03ms
iter 111350: loss 7.8398, time 120.41ms
iter 111360: loss 8.3761, time 119.45ms
iter 111370: loss 7.7819, time 119.12ms
iter 111380: loss 7.7498, time 119.14ms
iter 111390: loss 7.1796, time 119.81ms
tensor(0.8187)
iter 111400: loss 6.6746, time 118.99ms
iter 111410: loss 7.1654, time 120.65ms
iter 111420: loss 8.0213, time 119.40ms
iter 111430: loss 8.2838, time 119.17ms
iter 111440: loss 7.4808, time 120.73ms
iter 111450: loss 7.9095, time 120.90ms
iter 111460: loss 6.9033, time 120.70ms
iter 111470: loss 8.0611, time 119.31ms
iter 111480: loss 7.5333, time 120.52ms
iter 111490: loss 7.8677, time 119.50ms
tensor(0.7939)
step 111500: train loss 7.0170, val loss 7.0117
saving checkpoint to out-shakespeare-char
iter 111500: loss 7.2132, time 2871.58ms
iter 111510: loss 7.7080, time 121.17ms
iter 111520: loss 6.7901, time 119.89ms
iter 111530: loss 7.2774, time 119.65ms
iter 111540: loss 7.6766, time 120.94ms
iter 111550: loss 7.3353, time 120.77ms
iter 111560: loss 7.2672, time 121.10ms
iter 111570: loss 7.3208, time 119.54ms
iter 111580: loss 8.0231, time 120.30ms
iter 111590: loss 7.0988, time 121.43ms
tensor(0.7679)
iter 111600: loss 7.3805, time 121.07ms
iter 111610: loss 7.6658, time 120.01ms
iter 111620: loss 7.1176, time 118.87ms
iter 111630: loss 7.4787, time 119.22ms
iter 111640: loss 7.5215, time 120.60ms
iter 111650: loss 7.4164, time 119.20ms
iter 111660: loss 7.1443, time 121.79ms
iter 111670: loss 7.3569, time 120.50ms
iter 111680: loss 8.3137, time 119.22ms
iter 111690: loss 8.4708, time 119.63ms
tensor(0.7409)
iter 111700: loss 8.1954, time 119.23ms
iter 111710: loss 7.9482, time 121.40ms
iter 111720: loss 7.2445, time 120.03ms
iter 111730: loss 8.1981, time 119.17ms
iter 111740: loss 6.7969, time 119.33ms
step 111750: train loss 6.8626, val loss 6.8688
saving checkpoint to out-shakespeare-char
iter 111750: loss 7.2141, time 2878.09ms
iter 111760: loss 7.8779, time 123.00ms
iter 111770: loss 7.2559, time 124.92ms
iter 111780: loss 7.2043, time 122.93ms
iter 111790: loss 6.7988, time 123.16ms
tensor(0.7129)
iter 111800: loss 7.0528, time 122.79ms
iter 111810: loss 7.3706, time 122.80ms
iter 111820: loss 7.4123, time 123.37ms
iter 111830: loss 7.2276, time 123.28ms
iter 111840: loss 7.8531, time 124.62ms
iter 111850: loss 7.8824, time 122.82ms
iter 111860: loss 7.0468, time 122.92ms
iter 111870: loss 8.1506, time 122.66ms
iter 111880: loss 7.9981, time 122.66ms
iter 111890: loss 7.7270, time 122.84ms
tensor(0.6841)
iter 111900: loss 7.2225, time 123.45ms
iter 111910: loss 7.5641, time 122.86ms
iter 111920: loss 7.5196, time 122.58ms
iter 111930: loss 7.2875, time 122.96ms
iter 111940: loss 8.2576, time 122.63ms
iter 111950: loss 7.4724, time 123.20ms
iter 111960: loss 7.3863, time 124.16ms
iter 111970: loss 7.4318, time 122.63ms
iter 111980: loss 7.5127, time 124.03ms
iter 111990: loss 7.6873, time 122.23ms
tensor(0.6545)
step 112000: train loss 6.7681, val loss 6.7755
saving checkpoint to out-shakespeare-char
iter 112000: loss 7.9888, time 2842.26ms
iter 112010: loss 6.6322, time 122.38ms
iter 112020: loss 7.7532, time 122.89ms
iter 112030: loss 8.0146, time 124.76ms
iter 112040: loss 7.4986, time 122.48ms
iter 112050: loss 7.2136, time 122.67ms
iter 112060: loss 7.6062, time 123.46ms
iter 112070: loss 7.5645, time 123.12ms
iter 112080: loss 7.0410, time 124.52ms
iter 112090: loss 7.0492, time 123.26ms
tensor(0.6243)
iter 112100: loss 6.7780, time 123.30ms
iter 112110: loss 7.0675, time 125.39ms
iter 112120: loss 7.9501, time 122.49ms
iter 112130: loss 7.2506, time 123.43ms
iter 112140: loss 7.5745, time 122.65ms
iter 112150: loss 7.9956, time 123.31ms
iter 112160: loss 7.2280, time 123.32ms
iter 112170: loss 7.7977, time 123.41ms
iter 112180: loss 6.6267, time 125.39ms
iter 112190: loss 7.4802, time 123.30ms
tensor(0.5937)
iter 112200: loss 7.8318, time 123.36ms
iter 112210: loss 7.4562, time 123.36ms
iter 112220: loss 7.3041, time 123.40ms
iter 112230: loss 6.9662, time 123.11ms
iter 112240: loss 7.8646, time 123.45ms
step 112250: train loss 6.7729, val loss 6.8212
saving checkpoint to out-shakespeare-char
iter 112250: loss 7.3949, time 2858.52ms
iter 112260: loss 7.1315, time 123.25ms
iter 112270: loss 7.1414, time 123.24ms
iter 112280: loss 7.5312, time 123.29ms
iter 112290: loss 7.2300, time 125.10ms
tensor(0.5627)
iter 112300: loss 7.5267, time 123.29ms
iter 112310: loss 6.9738, time 123.38ms
iter 112320: loss 7.7857, time 123.27ms
iter 112330: loss 7.3964, time 123.34ms
iter 112340: loss 6.5922, time 123.23ms
iter 112350: loss 8.0944, time 123.34ms
iter 112360: loss 7.6648, time 125.36ms
iter 112370: loss 6.9624, time 123.49ms
iter 112380: loss 8.1573, time 123.25ms
iter 112390: loss 6.3655, time 122.59ms
tensor(0.5314)
iter 112400: loss 7.1536, time 123.61ms
iter 112410: loss 7.0693, time 123.14ms
iter 112420: loss 7.3647, time 123.68ms
iter 112430: loss 7.3882, time 123.60ms
iter 112440: loss 7.3197, time 123.45ms
iter 112450: loss 7.7258, time 123.11ms
iter 112460: loss 7.1263, time 123.42ms
iter 112470: loss 7.3472, time 123.45ms
iter 112480: loss 7.8497, time 125.18ms
iter 112490: loss 7.1685, time 122.92ms
tensor(0.5000)
step 112500: train loss 6.7325, val loss 6.7264
saving checkpoint to out-shakespeare-char
iter 112500: loss 7.2165, time 2854.99ms
iter 112510: loss 8.2906, time 122.96ms
iter 112520: loss 6.7812, time 122.17ms
iter 112530: loss 8.0004, time 125.68ms
iter 112540: loss 7.2754, time 122.81ms
iter 112550: loss 7.6786, time 122.92ms
iter 112560: loss 6.9634, time 123.29ms
iter 112570: loss 7.5710, time 122.86ms
iter 112580: loss 8.0209, time 122.49ms
iter 112590: loss 7.4318, time 122.85ms
tensor(0.4686)
iter 112600: loss 7.6935, time 124.87ms
iter 112610: loss 7.4103, time 122.84ms
iter 112620: loss 7.6891, time 122.92ms
iter 112630: loss 6.5594, time 122.66ms
iter 112640: loss 7.4031, time 123.08ms
iter 112650: loss 7.5054, time 122.05ms
iter 112660: loss 7.0616, time 122.79ms
iter 112670: loss 7.9752, time 124.90ms
iter 112680: loss 7.4788, time 122.88ms
iter 112690: loss 7.8173, time 122.65ms
tensor(0.4373)
iter 112700: loss 7.0501, time 122.65ms
iter 112710: loss 7.7295, time 123.01ms
iter 112720: loss 6.9835, time 122.75ms
iter 112730: loss 7.3204, time 122.94ms
iter 112740: loss 7.7862, time 124.98ms
step 112750: train loss 6.6989, val loss 6.7238
saving checkpoint to out-shakespeare-char
iter 112750: loss 6.2439, time 2851.65ms
iter 112760: loss 7.2050, time 123.10ms
iter 112770: loss 7.3881, time 124.72ms
iter 112780: loss 7.6662, time 124.50ms
iter 112790: loss 8.0868, time 123.97ms
tensor(0.4063)
iter 112800: loss 7.3404, time 122.98ms
iter 112810: loss 7.3521, time 123.08ms
iter 112820: loss 7.4124, time 123.12ms
iter 112830: loss 6.9602, time 123.27ms
iter 112840: loss 7.6085, time 122.44ms
iter 112850: loss 7.9964, time 122.17ms
iter 112860: loss 7.6980, time 122.45ms
iter 112870: loss 7.6747, time 124.52ms
iter 112880: loss 7.3369, time 122.25ms
iter 112890: loss 6.9385, time 122.41ms
tensor(0.3757)
iter 112900: loss 7.6180, time 122.85ms
iter 112910: loss 7.1057, time 122.83ms
iter 112920: loss 8.0258, time 122.14ms
iter 112930: loss 7.3450, time 123.45ms
iter 112940: loss 7.1485, time 124.96ms
iter 112950: loss 7.0277, time 122.74ms
iter 112960: loss 7.1505, time 122.93ms
iter 112970: loss 6.2638, time 122.65ms
iter 112980: loss 7.8009, time 122.89ms
iter 112990: loss 6.8140, time 122.82ms
tensor(0.3455)
step 113000: train loss 6.6480, val loss 6.6182
saving checkpoint to out-shakespeare-char
iter 113000: loss 6.7955, time 2857.86ms
iter 113010: loss 7.4666, time 122.43ms
iter 113020: loss 7.0039, time 122.48ms
iter 113030: loss 8.1726, time 122.59ms
iter 113040: loss 6.9162, time 122.47ms
iter 113050: loss 7.5454, time 123.77ms
iter 113060: loss 6.8344, time 121.48ms
iter 113070: loss 7.1730, time 122.29ms
iter 113080: loss 7.0793, time 122.24ms
iter 113090: loss 7.3344, time 122.28ms
tensor(0.3159)
iter 113100: loss 6.6656, time 122.59ms
iter 113110: loss 7.7436, time 123.82ms
iter 113120: loss 6.7328, time 124.44ms
iter 113130: loss 7.6200, time 122.25ms
iter 113140: loss 7.3645, time 122.15ms
iter 113150: loss 7.3620, time 122.61ms
iter 113160: loss 7.0792, time 122.35ms
iter 113170: loss 7.0618, time 122.32ms
iter 113180: loss 7.5671, time 122.34ms
iter 113190: loss 6.8826, time 124.45ms
tensor(0.2871)
iter 113200: loss 6.5445, time 122.28ms
iter 113210: loss 7.1373, time 122.21ms
iter 113220: loss 7.1807, time 122.84ms
iter 113230: loss 6.8254, time 122.34ms
iter 113240: loss 6.4389, time 122.45ms
step 113250: train loss 6.5852, val loss 6.5829
saving checkpoint to out-shakespeare-char
iter 113250: loss 6.3190, time 2861.01ms
iter 113260: loss 7.4725, time 122.71ms
iter 113270: loss 7.1616, time 122.75ms
iter 113280: loss 7.9751, time 123.18ms
iter 113290: loss 7.2919, time 122.78ms
tensor(0.2591)
iter 113300: loss 6.8441, time 122.35ms
iter 113310: loss 6.7522, time 124.45ms
iter 113320: loss 7.5377, time 122.21ms
iter 113330: loss 6.6310, time 122.24ms
iter 113340: loss 7.0534, time 122.46ms
iter 113350: loss 6.7392, time 122.19ms
iter 113360: loss 6.8911, time 122.31ms
iter 113370: loss 6.9560, time 123.87ms
iter 113380: loss 6.9351, time 121.98ms
iter 113390: loss 6.5310, time 121.81ms
tensor(0.2321)
iter 113400: loss 7.0764, time 122.80ms
iter 113410: loss 6.7198, time 122.71ms
iter 113420: loss 7.1807, time 122.50ms
iter 113430: loss 7.0204, time 122.20ms
iter 113440: loss 7.4671, time 123.63ms
iter 113450: loss 7.6476, time 121.40ms
iter 113460: loss 7.7900, time 121.73ms
iter 113470: loss 6.8265, time 122.55ms
iter 113480: loss 6.9644, time 122.25ms
iter 113490: loss 7.5594, time 122.95ms
tensor(0.2061)
step 113500: train loss 6.5464, val loss 6.5290
saving checkpoint to out-shakespeare-char
iter 113500: loss 6.6926, time 2869.41ms
iter 113510: loss 7.2546, time 122.96ms
iter 113520: loss 6.5235, time 123.04ms
iter 113530: loss 7.6794, time 123.12ms
iter 113540: loss 7.7404, time 123.45ms
iter 113550: loss 6.6881, time 123.89ms
iter 113560: loss 6.8943, time 122.19ms
iter 113570: loss 6.5754, time 123.33ms
iter 113580: loss 6.8060, time 122.24ms
iter 113590: loss 7.8364, time 123.03ms
tensor(0.1813)
iter 113600: loss 6.9043, time 122.93ms
iter 113610: loss 7.4880, time 123.24ms
iter 113620: loss 7.7709, time 123.10ms
iter 113630: loss 6.8019, time 120.45ms
iter 113640: loss 7.3704, time 119.59ms
iter 113650: loss 8.1115, time 121.09ms
iter 113660: loss 7.1184, time 119.12ms
iter 113670: loss 6.5317, time 121.09ms
iter 113680: loss 6.8073, time 119.03ms
iter 113690: loss 6.4771, time 120.28ms
tensor(0.1577)
iter 113700: loss 6.8196, time 120.12ms
iter 113710: loss 6.4761, time 121.46ms
iter 113720: loss 6.0957, time 119.53ms
iter 113730: loss 6.9957, time 121.50ms
iter 113740: loss 7.0029, time 118.99ms
step 113750: train loss 6.4618, val loss 6.4828
saving checkpoint to out-shakespeare-char
iter 113750: loss 6.6387, time 2858.68ms
iter 113760: loss 6.6465, time 122.28ms
iter 113770: loss 7.3017, time 121.07ms
iter 113780: loss 6.6831, time 120.73ms
iter 113790: loss 7.0506, time 119.41ms
tensor(0.1355)
iter 113800: loss 6.6508, time 119.67ms
iter 113810: loss 6.8362, time 121.97ms
iter 113820: loss 7.1229, time 119.53ms
iter 113830: loss 6.9601, time 120.79ms
iter 113840: loss 6.9299, time 120.62ms
iter 113850: loss 7.2952, time 121.52ms
iter 113860: loss 6.8996, time 120.60ms
iter 113870: loss 7.2286, time 120.69ms
iter 113880: loss 6.9340, time 119.63ms
iter 113890: loss 5.5434, time 119.70ms
tensor(0.1147)
iter 113900: loss 7.0393, time 120.77ms
iter 113910: loss 6.2379, time 120.19ms
iter 113920: loss 6.7579, time 120.93ms
iter 113930: loss 6.7368, time 119.63ms
iter 113940: loss 7.0716, time 121.11ms
iter 113950: loss 6.5250, time 119.50ms
iter 113960: loss 6.4741, time 119.56ms
iter 113970: loss 6.6665, time 119.49ms
iter 113980: loss 6.8599, time 119.60ms
iter 113990: loss 7.4729, time 119.54ms
tensor(0.0955)
step 114000: train loss 6.4418, val loss 6.4664
saving checkpoint to out-shakespeare-char
iter 114000: loss 6.3107, time 2869.21ms
iter 114010: loss 6.8185, time 119.63ms
iter 114020: loss 6.9651, time 121.14ms
iter 114030: loss 6.6375, time 120.98ms
iter 114040: loss 6.8877, time 119.58ms
iter 114050: loss 6.5963, time 121.98ms
iter 114060: loss 6.8915, time 120.98ms
iter 114070: loss 7.6917, time 121.91ms
iter 114080: loss 7.3175, time 120.69ms
iter 114090: loss 6.9988, time 120.71ms
tensor(0.0778)
iter 114100: loss 6.0724, time 120.98ms
iter 114110: loss 7.3453, time 120.66ms
iter 114120: loss 6.6053, time 119.39ms
iter 114130: loss 6.9598, time 121.03ms
iter 114140: loss 6.5213, time 119.47ms
iter 114150: loss 6.8718, time 120.86ms
iter 114160: loss 7.0126, time 119.55ms
iter 114170: loss 6.7198, time 119.79ms
iter 114180: loss 7.7478, time 119.53ms
iter 114190: loss 6.4810, time 120.70ms
tensor(0.0618)
iter 114200: loss 7.8622, time 121.29ms
iter 114210: loss 6.4226, time 120.71ms
iter 114220: loss 6.6663, time 119.54ms
iter 114230: loss 7.3433, time 120.68ms
iter 114240: loss 6.6010, time 119.61ms
step 114250: train loss 6.3561, val loss 6.4057
saving checkpoint to out-shakespeare-char
iter 114250: loss 6.8076, time 2870.67ms
iter 114260: loss 6.9635, time 123.01ms
iter 114270: loss 7.2919, time 122.90ms
iter 114280: loss 6.5641, time 122.62ms
iter 114290: loss 6.7478, time 124.77ms
tensor(0.0476)
iter 114300: loss 7.6030, time 122.47ms
iter 114310: loss 6.9045, time 121.50ms
iter 114320: loss 6.1892, time 122.87ms
iter 114330: loss 5.9434, time 122.02ms
iter 114340: loss 7.6358, time 123.66ms
iter 114350: loss 6.9338, time 121.95ms
iter 114360: loss 6.4877, time 125.00ms
iter 114370: loss 6.7052, time 122.84ms
iter 114380: loss 7.0393, time 123.04ms
iter 114390: loss 6.6440, time 123.28ms
tensor(0.0351)
iter 114400: loss 6.4114, time 122.60ms
iter 114410: loss 7.6588, time 123.04ms
iter 114420: loss 7.5315, time 122.94ms
iter 114430: loss 6.9616, time 122.76ms
iter 114440: loss 6.5133, time 122.80ms
iter 114450: loss 6.9219, time 122.45ms
iter 114460: loss 6.7760, time 122.92ms
iter 114470: loss 7.5178, time 122.94ms
iter 114480: loss 7.2335, time 122.50ms
iter 114490: loss 7.0744, time 122.88ms
tensor(0.0245)
step 114500: train loss 6.3307, val loss 6.3805
saving checkpoint to out-shakespeare-char
iter 114500: loss 7.1897, time 2855.82ms
iter 114510: loss 7.4271, time 122.45ms
iter 114520: loss 7.1531, time 122.11ms
iter 114530: loss 6.6562, time 123.35ms
iter 114540: loss 6.1391, time 123.03ms
iter 114550: loss 6.6813, time 122.68ms
iter 114560: loss 6.0282, time 123.49ms
iter 114570: loss 6.7090, time 125.14ms
iter 114580: loss 6.7049, time 122.83ms
iter 114590: loss 6.8260, time 123.44ms
tensor(0.0157)
iter 114600: loss 7.3321, time 122.01ms
iter 114610: loss 6.3728, time 122.67ms
iter 114620: loss 6.5608, time 122.94ms
iter 114630: loss 6.3584, time 122.86ms
iter 114640: loss 7.1147, time 122.21ms
iter 114650: loss 7.5731, time 122.17ms
iter 114660: loss 8.2562, time 124.91ms
iter 114670: loss 5.7792, time 123.03ms
iter 114680: loss 8.1331, time 123.20ms
iter 114690: loss 6.4428, time 123.08ms
tensor(0.0089)
iter 114700: loss 6.8256, time 122.84ms
iter 114710: loss 6.6413, time 122.54ms
iter 114720: loss 7.1047, time 123.36ms
iter 114730: loss 6.2625, time 123.54ms
iter 114740: loss 7.2192, time 122.99ms
step 114750: train loss 6.3558, val loss 6.3183
saving checkpoint to out-shakespeare-char
iter 114750: loss 6.5128, time 2852.22ms
iter 114760: loss 6.4469, time 123.21ms
iter 114770: loss 6.5856, time 123.17ms
iter 114780: loss 6.3479, time 122.50ms
iter 114790: loss 6.2463, time 122.92ms
tensor(0.0039)
iter 114800: loss 7.3401, time 122.68ms
iter 114810: loss 6.0264, time 123.29ms
iter 114820: loss 6.3165, time 123.52ms
iter 114830: loss 6.5871, time 122.94ms
iter 114840: loss 6.5573, time 123.11ms
iter 114850: loss 7.0944, time 122.58ms
iter 114860: loss 6.5577, time 123.08ms
iter 114870: loss 6.5849, time 124.84ms
iter 114880: loss 6.5825, time 122.90ms
iter 114890: loss 6.2977, time 123.05ms
tensor(0.0010)
iter 114900: loss 6.3215, time 122.88ms
iter 114910: loss 6.4671, time 122.71ms
iter 114920: loss 6.8691, time 123.92ms
iter 114930: loss 6.7369, time 124.03ms
iter 114940: loss 6.5640, time 124.81ms
iter 114950: loss 7.3316, time 122.60ms
iter 114960: loss 7.7403, time 122.70ms
iter 114970: loss 6.5506, time 122.57ms
iter 114980: loss 6.9015, time 123.20ms
iter 114990: loss 6.2677, time 123.05ms
tensor(0.0010)
step 115000: train loss 6.2982, val loss 6.3382
saving checkpoint to out-shakespeare-char
iter 115000: loss 6.7637, time 2863.95ms
iter 115010: loss 6.2949, time 122.08ms
iter 115020: loss 6.1569, time 122.96ms
iter 115030: loss 6.2984, time 122.83ms
iter 115040: loss 7.3424, time 122.60ms
iter 115050: loss 7.0355, time 122.67ms
iter 115060: loss 7.0895, time 123.01ms
iter 115070: loss 7.2525, time 124.37ms
iter 115080: loss 7.2845, time 123.11ms
iter 115090: loss 6.5316, time 123.36ms
tensor(0.0010)
iter 115100: loss 7.3805, time 123.37ms
iter 115110: loss 6.5276, time 123.28ms
iter 115120: loss 7.2056, time 123.71ms
iter 115130: loss 6.7412, time 123.23ms
iter 115140: loss 5.9585, time 125.46ms
iter 115150: loss 6.6157, time 122.72ms
iter 115160: loss 7.0654, time 123.22ms
iter 115170: loss 6.8652, time 123.99ms
iter 115180: loss 6.5732, time 123.27ms
iter 115190: loss 6.6774, time 123.29ms
tensor(0.0039)
iter 115200: loss 6.5219, time 123.39ms
iter 115210: loss 6.5366, time 125.56ms
iter 115220: loss 7.5079, time 123.21ms
iter 115230: loss 6.8386, time 123.16ms
iter 115240: loss 7.1332, time 123.39ms
step 115250: train loss 6.3004, val loss 6.3751
saving checkpoint to out-shakespeare-char
iter 115250: loss 6.9242, time 2871.75ms
iter 115260: loss 7.0984, time 123.54ms
iter 115270: loss 7.4675, time 125.35ms
iter 115280: loss 7.1888, time 123.38ms
iter 115290: loss 6.2393, time 122.93ms
tensor(0.0089)
iter 115300: loss 6.3885, time 123.54ms
iter 115310: loss 7.0952, time 123.02ms
iter 115320: loss 6.6473, time 122.70ms
iter 115330: loss 6.3028, time 123.28ms
iter 115340: loss 6.5432, time 123.22ms
iter 115350: loss 6.7747, time 125.05ms
iter 115360: loss 6.4355, time 123.10ms
iter 115370: loss 6.9501, time 122.70ms
iter 115380: loss 7.2023, time 123.25ms
iter 115390: loss 7.3628, time 123.28ms
tensor(0.0157)
iter 115400: loss 6.6480, time 123.62ms
iter 115410: loss 6.5596, time 123.40ms
iter 115420: loss 7.0575, time 125.30ms
iter 115430: loss 6.7356, time 124.01ms
iter 115440: loss 6.7128, time 124.49ms
iter 115450: loss 7.3297, time 125.18ms
iter 115460: loss 7.2134, time 123.60ms
iter 115470: loss 5.9786, time 123.23ms
iter 115480: loss 6.5294, time 123.21ms
iter 115490: loss 6.6299, time 123.18ms
tensor(0.0245)
step 115500: train loss 6.3240, val loss 6.4013
saving checkpoint to out-shakespeare-char
iter 115500: loss 6.3215, time 2853.24ms
iter 115510: loss 6.9966, time 123.12ms
iter 115520: loss 7.1678, time 122.86ms
iter 115530: loss 6.2133, time 122.89ms
iter 115540: loss 6.0494, time 123.18ms
iter 115550: loss 7.0605, time 124.76ms
iter 115560: loss 6.1722, time 122.48ms
iter 115570: loss 6.2115, time 122.31ms
iter 115580: loss 6.9847, time 122.53ms
iter 115590: loss 6.9251, time 121.86ms
tensor(0.0351)
iter 115600: loss 6.3441, time 122.43ms
iter 115610: loss 7.2724, time 122.42ms
iter 115620: loss 6.9539, time 124.71ms
iter 115630: loss 6.9182, time 122.87ms
iter 115640: loss 7.2402, time 122.01ms
iter 115650: loss 6.9285, time 122.65ms
iter 115660: loss 6.6311, time 122.49ms
iter 115670: loss 6.2677, time 122.62ms
iter 115680: loss 6.5301, time 123.34ms
iter 115690: loss 5.9636, time 124.93ms
tensor(0.0476)
iter 115700: loss 6.4046, time 122.73ms
iter 115710: loss 6.8592, time 122.60ms
iter 115720: loss 6.6348, time 122.45ms
iter 115730: loss 5.6591, time 121.78ms
iter 115740: loss 7.0134, time 121.44ms
step 115750: train loss 6.3554, val loss 6.3569
saving checkpoint to out-shakespeare-char
iter 115750: loss 6.8490, time 2864.20ms
iter 115760: loss 6.7454, time 122.77ms
iter 115770: loss 6.3093, time 122.75ms
iter 115780: loss 7.1955, time 121.95ms
iter 115790: loss 6.7959, time 124.05ms
tensor(0.0618)
iter 115800: loss 7.0370, time 124.68ms
iter 115810: loss 6.4615, time 123.04ms
iter 115820: loss 7.4561, time 122.69ms
iter 115830: loss 6.6844, time 122.90ms
iter 115840: loss 7.6250, time 123.00ms
iter 115850: loss 6.5924, time 122.93ms
iter 115860: loss 6.8006, time 122.87ms
iter 115870: loss 7.6391, time 123.13ms
iter 115880: loss 7.3556, time 124.51ms
iter 115890: loss 6.8920, time 122.79ms
tensor(0.0778)
iter 115900: loss 7.1616, time 122.83ms
iter 115910: loss 7.0993, time 122.41ms
iter 115920: loss 6.9495, time 122.80ms
iter 115930: loss 6.5877, time 122.45ms
iter 115940: loss 6.4906, time 122.41ms
iter 115950: loss 6.8212, time 124.86ms
iter 115960: loss 6.6881, time 122.44ms
iter 115970: loss 6.4620, time 122.36ms
iter 115980: loss 7.2830, time 122.46ms
iter 115990: loss 7.1918, time 121.30ms
tensor(0.0955)
step 116000: train loss 6.4016, val loss 6.3407
saving checkpoint to out-shakespeare-char
iter 116000: loss 6.2811, time 2856.81ms
iter 116010: loss 7.1592, time 123.34ms
iter 116020: loss 6.2456, time 123.04ms
iter 116030: loss 6.6095, time 123.03ms
iter 116040: loss 7.5055, time 122.44ms
iter 116050: loss 7.1122, time 123.48ms
iter 116060: loss 7.4435, time 124.94ms
iter 116070: loss 6.2487, time 122.80ms
iter 116080: loss 6.8035, time 123.28ms
iter 116090: loss 7.1391, time 123.05ms
tensor(0.1147)
iter 116100: loss 6.4992, time 122.93ms
iter 116110: loss 6.4177, time 122.78ms
iter 116120: loss 6.8927, time 124.35ms
iter 116130: loss 7.5512, time 125.67ms
iter 116140: loss 7.3228, time 123.03ms
iter 116150: loss 6.6045, time 124.04ms
iter 116160: loss 6.5959, time 123.20ms
iter 116170: loss 7.3093, time 123.37ms
iter 116180: loss 6.4218, time 123.39ms
iter 116190: loss 7.4580, time 123.07ms
tensor(0.1355)
iter 116200: loss 7.9018, time 122.90ms
iter 116210: loss 7.5780, time 123.23ms
iter 116220: loss 6.7285, time 125.26ms
iter 116230: loss 6.6512, time 123.06ms
iter 116240: loss 5.9010, time 123.01ms
step 116250: train loss 6.4409, val loss 6.4231
saving checkpoint to out-shakespeare-char
iter 116250: loss 6.7397, time 2861.18ms
iter 116260: loss 6.0545, time 123.86ms
iter 116270: loss 6.7186, time 121.92ms
iter 116280: loss 6.9725, time 121.37ms
iter 116290: loss 7.3061, time 121.53ms
tensor(0.1577)
iter 116300: loss 7.2160, time 122.70ms
iter 116310: loss 7.2903, time 122.71ms
iter 116320: loss 6.4019, time 122.66ms
iter 116330: loss 6.5203, time 123.71ms
iter 116340: loss 7.3341, time 120.41ms
iter 116350: loss 6.9031, time 119.34ms
iter 116360: loss 6.9415, time 121.64ms
iter 116370: loss 6.3040, time 119.34ms
iter 116380: loss 6.9512, time 120.26ms
iter 116390: loss 7.1667, time 119.28ms
tensor(0.1813)
iter 116400: loss 5.9735, time 120.47ms
iter 116410: loss 6.7041, time 120.09ms
iter 116420: loss 7.1518, time 120.45ms
iter 116430: loss 7.2078, time 119.16ms
iter 116440: loss 6.8221, time 120.47ms
iter 116450: loss 7.7886, time 119.68ms
iter 116460: loss 7.0962, time 120.00ms
iter 116470: loss 6.8547, time 121.13ms
iter 116480: loss 7.0652, time 119.17ms
iter 116490: loss 6.7559, time 120.62ms
tensor(0.2061)
step 116500: train loss 6.4681, val loss 6.5325
saving checkpoint to out-shakespeare-char
iter 116500: loss 7.1581, time 2861.35ms
iter 116510: loss 7.2097, time 121.64ms
iter 116520: loss 7.2696, time 119.90ms
iter 116530: loss 7.3346, time 121.35ms
iter 116540: loss 6.6462, time 120.90ms
iter 116550: loss 7.5600, time 121.81ms
iter 116560: loss 7.0476, time 119.15ms
iter 116570: loss 7.7876, time 120.42ms
iter 116580: loss 6.3965, time 119.06ms
iter 116590: loss 6.8210, time 121.79ms
tensor(0.2321)
iter 116600: loss 6.4964, time 119.13ms
iter 116610: loss 6.0731, time 119.19ms
iter 116620: loss 6.9502, time 119.21ms
iter 116630: loss 7.2579, time 119.41ms
iter 116640: loss 7.1762, time 120.59ms
iter 116650: loss 7.7433, time 120.73ms
iter 116660: loss 6.7534, time 120.34ms
iter 116670: loss 6.9034, time 119.19ms
iter 116680: loss 7.4786, time 121.57ms
iter 116690: loss 6.9365, time 120.08ms
tensor(0.2591)
iter 116700: loss 6.7647, time 120.31ms
iter 116710: loss 6.6190, time 119.39ms
iter 116720: loss 6.6473, time 121.49ms
iter 116730: loss 6.8317, time 120.89ms
iter 116740: loss 7.4475, time 120.92ms
step 116750: train loss 6.5754, val loss 6.6658
saving checkpoint to out-shakespeare-char
iter 116750: loss 7.2351, time 2858.12ms
iter 116760: loss 7.4699, time 119.33ms
iter 116770: loss 6.8983, time 119.28ms
iter 116780: loss 6.8509, time 121.20ms
iter 116790: loss 7.6704, time 118.69ms
tensor(0.2871)
iter 116800: loss 7.5949, time 119.83ms
iter 116810: loss 6.8359, time 120.27ms
iter 116820: loss 7.5656, time 120.38ms
iter 116830: loss 7.2912, time 119.80ms
iter 116840: loss 7.2263, time 119.15ms
iter 116850: loss 7.3047, time 120.71ms
iter 116860: loss 7.7432, time 119.30ms
iter 116870: loss 7.2553, time 121.90ms
iter 116880: loss 6.6245, time 119.15ms
iter 116890: loss 7.3411, time 120.73ms
tensor(0.3159)
iter 116900: loss 6.6573, time 119.22ms
iter 116910: loss 6.4497, time 119.19ms
iter 116920: loss 7.0572, time 119.04ms
iter 116930: loss 7.1237, time 119.22ms
iter 116940: loss 6.9857, time 119.02ms
iter 116950: loss 6.9092, time 119.25ms
iter 116960: loss 7.4734, time 119.43ms
iter 116970: loss 8.0811, time 119.17ms
iter 116980: loss 7.5352, time 119.03ms
iter 116990: loss 6.7939, time 119.44ms
tensor(0.3455)
step 117000: train loss 6.6955, val loss 6.6979
saving checkpoint to out-shakespeare-char
iter 117000: loss 7.7633, time 2857.68ms
iter 117010: loss 6.4633, time 121.18ms
iter 117020: loss 6.4450, time 119.22ms
iter 117030: loss 6.4574, time 119.03ms
iter 117040: loss 7.1626, time 119.91ms
iter 117050: loss 7.4983, time 119.07ms
iter 117060: loss 7.0653, time 118.82ms
iter 117070: loss 7.5551, time 119.18ms
iter 117080: loss 6.2626, time 119.83ms
iter 117090: loss 6.6658, time 119.21ms
tensor(0.3757)
iter 117100: loss 7.7881, time 118.96ms
iter 117110: loss 7.5075, time 119.21ms
iter 117120: loss 7.3264, time 119.23ms
iter 117130: loss 7.0996, time 118.71ms
iter 117140: loss 6.6298, time 119.13ms
iter 117150: loss 7.1234, time 119.46ms
iter 117160: loss 6.7908, time 121.42ms
iter 117170: loss 8.0234, time 119.59ms
iter 117180: loss 7.7263, time 120.37ms
iter 117190: loss 7.1488, time 119.61ms
tensor(0.4063)
iter 117200: loss 7.6401, time 119.22ms
iter 117210: loss 7.2565, time 118.85ms
iter 117220: loss 6.2853, time 119.12ms
iter 117230: loss 7.0507, time 119.07ms
iter 117240: loss 6.8861, time 119.18ms
step 117250: train loss 6.8165, val loss 6.7447
saving checkpoint to out-shakespeare-char
iter 117250: loss 7.3183, time 2871.74ms
iter 117260: loss 6.4319, time 124.87ms
iter 117270: loss 7.3727, time 122.68ms
iter 117280: loss 7.1073, time 122.81ms
iter 117290: loss 7.5513, time 122.78ms
tensor(0.4373)
iter 117300: loss 7.7955, time 120.52ms
iter 117310: loss 7.6923, time 119.54ms
iter 117320: loss 7.7848, time 121.66ms
iter 117330: loss 7.2406, time 119.58ms
iter 117340: loss 7.2592, time 119.72ms
iter 117350: loss 7.3193, time 119.62ms
iter 117360: loss 7.5651, time 120.80ms
iter 117370: loss 7.2253, time 121.50ms
iter 117380: loss 7.1524, time 120.52ms
iter 117390: loss 7.7109, time 120.66ms
tensor(0.4686)
iter 117400: loss 7.0043, time 119.62ms
iter 117410: loss 7.1877, time 119.51ms
iter 117420: loss 7.2040, time 120.90ms
iter 117430: loss 6.9158, time 120.67ms
iter 117440: loss 7.4044, time 119.47ms
iter 117450: loss 7.1118, time 119.66ms
iter 117460: loss 7.6725, time 120.85ms
iter 117470: loss 6.9625, time 119.36ms
iter 117480: loss 7.4719, time 119.63ms
iter 117490: loss 7.7370, time 119.59ms
tensor(0.5000)
step 117500: train loss 6.8022, val loss 6.8781
saving checkpoint to out-shakespeare-char
iter 117500: loss 7.3673, time 2853.79ms
iter 117510: loss 7.6897, time 120.76ms
iter 117520: loss 6.9500, time 119.80ms
iter 117530: loss 7.9771, time 118.97ms
iter 117540: loss 7.0911, time 119.20ms
iter 117550: loss 7.2610, time 120.04ms
iter 117560: loss 7.8166, time 120.42ms
iter 117570: loss 8.0538, time 120.05ms
iter 117580: loss 6.7958, time 120.77ms
iter 117590: loss 7.3972, time 119.18ms
tensor(0.5314)
iter 117600: loss 7.6104, time 121.09ms
iter 117610: loss 7.6873, time 118.99ms
iter 117620: loss 8.0242, time 118.91ms
iter 117630: loss 7.7804, time 120.17ms
iter 117640: loss 7.3758, time 120.23ms
iter 117650: loss 7.6100, time 120.36ms
iter 117660: loss 8.0385, time 120.74ms
iter 117670: loss 7.5355, time 120.99ms
iter 117680: loss 8.1390, time 119.26ms
iter 117690: loss 7.5179, time 119.20ms
tensor(0.5627)
iter 117700: loss 6.6929, time 119.88ms
iter 117710: loss 7.6196, time 119.12ms
iter 117720: loss 7.1593, time 119.11ms
iter 117730: loss 7.6991, time 118.96ms
iter 117740: loss 7.1185, time 119.12ms
step 117750: train loss 6.8420, val loss 6.8152
saving checkpoint to out-shakespeare-char
iter 117750: loss 7.3423, time 2848.61ms
iter 117760: loss 8.0223, time 119.06ms
iter 117770: loss 7.7001, time 119.16ms
iter 117780: loss 7.8257, time 120.62ms
iter 117790: loss 7.4483, time 119.35ms
tensor(0.5937)
iter 117800: loss 7.7879, time 119.14ms
iter 117810: loss 7.9078, time 119.13ms
iter 117820: loss 7.0896, time 118.73ms
iter 117830: loss 7.0988, time 119.17ms
iter 117840: loss 7.5794, time 121.02ms
iter 117850: loss 7.3722, time 119.71ms
iter 117860: loss 7.5463, time 118.92ms
iter 117870: loss 7.7399, time 120.10ms
iter 117880: loss 6.9344, time 118.25ms
iter 117890: loss 6.9060, time 118.04ms
tensor(0.6243)
iter 117900: loss 7.9522, time 121.01ms
iter 117910: loss 8.1567, time 120.29ms
iter 117920: loss 7.1704, time 120.51ms
iter 117930: loss 8.2180, time 120.25ms
iter 117940: loss 7.6377, time 119.17ms
iter 117950: loss 8.2684, time 120.66ms
iter 117960: loss 7.6477, time 121.28ms
iter 117970: loss 7.3583, time 119.23ms
iter 117980: loss 7.4246, time 119.70ms
iter 117990: loss 7.3687, time 119.25ms
tensor(0.6545)
step 118000: train loss 6.8718, val loss 6.8641
saving checkpoint to out-shakespeare-char
iter 118000: loss 7.7978, time 2862.44ms
iter 118010: loss 6.3897, time 120.43ms
iter 118020: loss 8.0864, time 118.77ms
iter 118030: loss 7.7665, time 120.48ms
iter 118040: loss 7.6864, time 119.17ms
iter 118050: loss 7.3769, time 118.66ms
iter 118060: loss 7.3410, time 119.03ms
iter 118070: loss 7.5921, time 119.18ms
iter 118080: loss 8.1623, time 119.13ms
iter 118090: loss 8.0915, time 119.64ms
tensor(0.6841)
iter 118100: loss 7.8500, time 119.30ms
iter 118110: loss 7.0451, time 119.05ms
iter 118120: loss 8.0347, time 119.07ms
iter 118130: loss 7.5660, time 119.35ms
iter 118140: loss 7.6335, time 119.76ms
iter 118150: loss 7.2285, time 120.31ms
iter 118160: loss 7.7172, time 119.43ms
iter 118170: loss 7.9894, time 121.10ms
iter 118180: loss 8.0710, time 118.90ms
iter 118190: loss 8.0561, time 119.51ms
tensor(0.7129)
iter 118200: loss 7.8545, time 119.69ms
iter 118210: loss 7.7479, time 121.01ms
iter 118220: loss 7.6729, time 119.68ms
iter 118230: loss 6.7518, time 121.26ms
iter 118240: loss 7.7873, time 119.14ms
step 118250: train loss 6.8882, val loss 6.8536
saving checkpoint to out-shakespeare-char
iter 118250: loss 7.4806, time 2846.78ms
iter 118260: loss 7.4135, time 120.80ms
iter 118270: loss 7.8536, time 120.55ms
iter 118280: loss 7.7236, time 121.84ms
iter 118290: loss 8.4244, time 119.65ms
tensor(0.7409)
iter 118300: loss 8.0007, time 120.66ms
iter 118310: loss 6.8536, time 119.35ms
iter 118320: loss 7.5261, time 119.35ms
iter 118330: loss 8.0328, time 120.33ms
iter 118340: loss 7.6966, time 119.64ms
iter 118350: loss 7.9330, time 119.37ms
iter 118360: loss 7.3070, time 119.22ms
iter 118370: loss 7.9374, time 119.74ms
iter 118380: loss 8.0718, time 119.49ms
iter 118390: loss 7.5669, time 118.83ms
tensor(0.7679)
iter 118400: loss 7.5891, time 119.60ms
iter 118410: loss 7.8067, time 119.41ms
iter 118420: loss 7.7205, time 119.44ms
iter 118430: loss 7.3520, time 119.75ms
iter 118440: loss 7.6302, time 119.50ms
iter 118450: loss 7.7738, time 118.20ms
iter 118460: loss 8.2341, time 121.42ms
iter 118470: loss 8.5294, time 119.75ms
iter 118480: loss 7.4984, time 121.49ms
iter 118490: loss 7.9477, time 120.00ms
tensor(0.7939)
step 118500: train loss 7.0941, val loss 7.1264
saving checkpoint to out-shakespeare-char
iter 118500: loss 7.8748, time 2863.29ms
iter 118510: loss 8.8662, time 119.56ms
iter 118520: loss 8.1072, time 119.39ms
iter 118530: loss 8.3013, time 119.35ms
iter 118540: loss 8.1847, time 119.22ms
iter 118550: loss 8.1302, time 118.94ms
iter 118560: loss 8.5937, time 119.80ms
iter 118570: loss 7.7714, time 121.72ms
iter 118580: loss 10.6045, time 119.38ms
iter 118590: loss 92408.2422, time 118.26ms
tensor(0.8187)
iter 118600: loss nan, time 119.36ms
iter 118610: loss nan, time 121.61ms
iter 118620: loss nan, time 119.25ms
iter 118630: loss nan, time 120.27ms
iter 118640: loss nan, time 119.20ms
iter 118650: loss nan, time 120.21ms
iter 118660: loss nan, time 119.29ms
iter 118670: loss nan, time 121.18ms
iter 118680: loss nan, time 118.86ms
iter 118690: loss nan, time 120.54ms
tensor(0.8423)
iter 118700: loss nan, time 119.21ms
iter 118710: loss nan, time 119.21ms
iter 118720: loss nan, time 119.09ms
iter 118730: loss nan, time 121.54ms
iter 118740: loss nan, time 120.32ms
step 118750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 118750: loss nan, time 2900.57ms
iter 118760: loss nan, time 119.81ms
iter 118770: loss nan, time 119.78ms
iter 118780: loss nan, time 119.13ms
iter 118790: loss nan, time 119.07ms
tensor(0.8645)
iter 118800: loss nan, time 119.15ms
iter 118810: loss nan, time 119.19ms
iter 118820: loss nan, time 120.31ms
iter 118830: loss nan, time 120.01ms
iter 118840: loss nan, time 121.56ms
iter 118850: loss nan, time 119.00ms
iter 118860: loss nan, time 120.27ms
iter 118870: loss nan, time 120.39ms
iter 118880: loss nan, time 120.36ms
iter 118890: loss nan, time 118.47ms
tensor(0.8853)
iter 118900: loss nan, time 120.76ms
iter 118910: loss nan, time 120.30ms
iter 118920: loss nan, time 119.25ms
iter 118930: loss nan, time 119.17ms
iter 118940: loss nan, time 119.19ms
iter 118950: loss nan, time 120.15ms
iter 118960: loss nan, time 121.18ms
iter 118970: loss nan, time 119.09ms
iter 118980: loss nan, time 120.18ms
iter 118990: loss nan, time 119.04ms
tensor(0.9045)
step 119000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 119000: loss nan, time 2910.99ms
iter 119010: loss nan, time 119.27ms
iter 119020: loss nan, time 119.05ms
iter 119030: loss nan, time 120.44ms
iter 119040: loss nan, time 118.86ms
iter 119050: loss nan, time 120.15ms
iter 119060: loss nan, time 119.10ms
iter 119070: loss nan, time 119.13ms
iter 119080: loss nan, time 119.89ms
iter 119090: loss nan, time 119.05ms
tensor(0.9222)
iter 119100: loss nan, time 119.80ms
iter 119110: loss nan, time 118.23ms
iter 119120: loss nan, time 120.17ms
iter 119130: loss nan, time 119.12ms
iter 119140: loss nan, time 120.06ms
iter 119150: loss nan, time 119.19ms
iter 119160: loss nan, time 120.19ms
iter 119170: loss nan, time 119.41ms
iter 119180: loss nan, time 119.05ms
iter 119190: loss nan, time 119.13ms
tensor(0.9382)
iter 119200: loss nan, time 119.17ms
iter 119210: loss nan, time 120.19ms
iter 119220: loss nan, time 119.02ms
iter 119230: loss nan, time 119.07ms
iter 119240: loss nan, time 119.21ms
step 119250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 119250: loss nan, time 2918.34ms
iter 119260: loss nan, time 119.08ms
iter 119270: loss nan, time 120.35ms
iter 119280: loss nan, time 120.25ms
iter 119290: loss nan, time 119.08ms
tensor(0.9524)
iter 119300: loss nan, time 120.82ms
iter 119310: loss nan, time 119.08ms
iter 119320: loss nan, time 119.11ms
iter 119330: loss nan, time 119.03ms
iter 119340: loss nan, time 119.82ms
iter 119350: loss nan, time 119.95ms
iter 119360: loss nan, time 120.11ms
iter 119370: loss nan, time 118.33ms
iter 119380: loss nan, time 118.92ms
iter 119390: loss nan, time 119.21ms
tensor(0.9649)
iter 119400: loss nan, time 120.39ms
iter 119410: loss nan, time 119.81ms
iter 119420: loss nan, time 118.72ms
iter 119430: loss nan, time 118.61ms
iter 119440: loss nan, time 119.04ms
iter 119450: loss nan, time 120.51ms
iter 119460: loss nan, time 119.08ms
iter 119470: loss nan, time 119.10ms
iter 119480: loss nan, time 119.01ms
iter 119490: loss nan, time 119.37ms
tensor(0.9755)
step 119500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 119500: loss nan, time 2929.23ms
iter 119510: loss nan, time 121.61ms
iter 119520: loss nan, time 123.04ms
iter 119530: loss nan, time 121.96ms
iter 119540: loss nan, time 127.75ms
iter 119550: loss nan, time 121.59ms
iter 119560: loss nan, time 123.09ms
iter 119570: loss nan, time 122.28ms
iter 119580: loss nan, time 122.30ms
iter 119590: loss nan, time 121.56ms
tensor(0.9843)
iter 119600: loss nan, time 124.41ms
iter 119610: loss nan, time 122.76ms
iter 119620: loss nan, time 122.55ms
iter 119630: loss nan, time 123.84ms
iter 119640: loss nan, time 122.22ms
iter 119650: loss nan, time 123.12ms
iter 119660: loss nan, time 122.87ms
iter 119670: loss nan, time 124.27ms
iter 119680: loss nan, time 122.50ms
iter 119690: loss nan, time 123.08ms
tensor(0.9911)
iter 119700: loss nan, time 123.70ms
iter 119710: loss nan, time 123.19ms
iter 119720: loss nan, time 123.61ms
iter 119730: loss nan, time 122.79ms
iter 119740: loss nan, time 124.28ms
step 119750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 119750: loss nan, time 2897.19ms
iter 119760: loss nan, time 127.32ms
iter 119770: loss nan, time 122.92ms
iter 119780: loss nan, time 122.90ms
iter 119790: loss nan, time 123.12ms
tensor(0.9961)
iter 119800: loss nan, time 122.70ms
iter 119810: loss nan, time 123.28ms
iter 119820: loss nan, time 124.60ms
iter 119830: loss nan, time 125.39ms
iter 119840: loss nan, time 123.17ms
iter 119850: loss nan, time 123.15ms
iter 119860: loss nan, time 123.01ms
iter 119870: loss nan, time 121.22ms
iter 119880: loss nan, time 124.73ms
iter 119890: loss nan, time 123.04ms
tensor(0.9990)
iter 119900: loss nan, time 124.76ms
iter 119910: loss nan, time 122.88ms
iter 119920: loss nan, time 122.73ms
iter 119930: loss nan, time 122.31ms
iter 119940: loss nan, time 121.97ms
iter 119950: loss nan, time 122.78ms
iter 119960: loss nan, time 121.90ms
iter 119970: loss nan, time 124.02ms
iter 119980: loss nan, time 122.58ms
iter 119990: loss nan, time 122.51ms
tensor(1.)
step 120000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 120000: loss nan, time 2916.90ms
iter 120010: loss nan, time 121.81ms
iter 120020: loss nan, time 123.07ms
iter 120030: loss nan, time 121.72ms
iter 120040: loss nan, time 123.27ms
iter 120050: loss nan, time 125.44ms
iter 120060: loss nan, time 123.26ms
iter 120070: loss nan, time 123.09ms
iter 120080: loss nan, time 123.16ms
iter 120090: loss nan, time 122.35ms
tensor(0.9990)
iter 120100: loss nan, time 123.17ms
iter 120110: loss nan, time 123.07ms
iter 120120: loss nan, time 123.37ms
iter 120130: loss nan, time 125.26ms
iter 120140: loss nan, time 123.14ms
iter 120150: loss nan, time 124.06ms
iter 120160: loss nan, time 123.22ms
iter 120170: loss nan, time 123.10ms
iter 120180: loss nan, time 121.84ms
iter 120190: loss nan, time 123.05ms
tensor(0.9961)
iter 120200: loss nan, time 123.24ms
iter 120210: loss nan, time 125.50ms
iter 120220: loss nan, time 123.13ms
iter 120230: loss nan, time 123.21ms
iter 120240: loss nan, time 123.36ms
step 120250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 120250: loss nan, time 2921.57ms
iter 120260: loss nan, time 122.37ms
iter 120270: loss nan, time 123.11ms
iter 120280: loss nan, time 123.38ms
iter 120290: loss nan, time 123.16ms
tensor(0.9911)
iter 120300: loss nan, time 123.32ms
iter 120310: loss nan, time 123.44ms
iter 120320: loss nan, time 124.57ms
iter 120330: loss nan, time 122.82ms
iter 120340: loss nan, time 121.33ms
iter 120350: loss nan, time 123.07ms
iter 120360: loss nan, time 123.04ms
iter 120370: loss nan, time 123.19ms
iter 120380: loss nan, time 122.82ms
iter 120390: loss nan, time 124.63ms
tensor(0.9843)
iter 120400: loss nan, time 123.31ms
iter 120410: loss nan, time 123.36ms
iter 120420: loss nan, time 123.16ms
iter 120430: loss nan, time 123.22ms
iter 120440: loss nan, time 123.19ms
iter 120450: loss nan, time 123.21ms
iter 120460: loss nan, time 125.41ms
iter 120470: loss nan, time 123.16ms
iter 120480: loss nan, time 123.21ms
iter 120490: loss nan, time 123.11ms
tensor(0.9755)
step 120500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 120500: loss nan, time 2913.83ms
iter 120510: loss nan, time 122.19ms
iter 120520: loss nan, time 123.17ms
iter 120530: loss nan, time 123.39ms
iter 120540: loss nan, time 123.24ms
iter 120550: loss nan, time 125.39ms
iter 120560: loss nan, time 123.33ms
iter 120570: loss nan, time 123.26ms
iter 120580: loss nan, time 123.28ms
iter 120590: loss nan, time 123.33ms
tensor(0.9649)
iter 120600: loss nan, time 123.16ms
iter 120610: loss nan, time 123.16ms
iter 120620: loss nan, time 123.24ms
iter 120630: loss nan, time 125.26ms
iter 120640: loss nan, time 122.98ms
iter 120650: loss nan, time 123.15ms
iter 120660: loss nan, time 123.16ms
iter 120670: loss nan, time 123.46ms
iter 120680: loss nan, time 123.30ms
iter 120690: loss nan, time 123.06ms
tensor(0.9524)
iter 120700: loss nan, time 123.30ms
iter 120710: loss nan, time 125.41ms
iter 120720: loss nan, time 122.84ms
iter 120730: loss nan, time 123.27ms
iter 120740: loss nan, time 122.96ms
step 120750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 120750: loss nan, time 2925.07ms
iter 120760: loss nan, time 117.09ms
iter 120770: loss nan, time 120.34ms
iter 120780: loss nan, time 119.19ms
iter 120790: loss nan, time 120.16ms
tensor(0.9382)
iter 120800: loss nan, time 119.97ms
iter 120810: loss nan, time 119.23ms
iter 120820: loss nan, time 120.26ms
iter 120830: loss nan, time 119.89ms
iter 120840: loss nan, time 120.94ms
iter 120850: loss nan, time 121.61ms
iter 120860: loss nan, time 119.51ms
iter 120870: loss nan, time 119.57ms
iter 120880: loss nan, time 119.56ms
iter 120890: loss nan, time 119.61ms
tensor(0.9222)
iter 120900: loss nan, time 121.82ms
iter 120910: loss nan, time 121.01ms
iter 120920: loss nan, time 121.67ms
iter 120930: loss nan, time 119.37ms
iter 120940: loss nan, time 119.58ms
iter 120950: loss nan, time 119.51ms
iter 120960: loss nan, time 120.36ms
iter 120970: loss nan, time 120.70ms
iter 120980: loss nan, time 120.73ms
iter 120990: loss nan, time 120.81ms
tensor(0.9045)
step 121000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 121000: loss nan, time 2919.23ms
iter 121010: loss nan, time 119.10ms
iter 121020: loss nan, time 120.40ms
iter 121030: loss nan, time 120.30ms
iter 121040: loss nan, time 121.08ms
iter 121050: loss nan, time 119.50ms
iter 121060: loss nan, time 119.84ms
iter 121070: loss nan, time 121.44ms
iter 121080: loss nan, time 119.12ms
iter 121090: loss nan, time 119.09ms
tensor(0.8853)
iter 121100: loss nan, time 120.60ms
iter 121110: loss nan, time 118.96ms
iter 121120: loss nan, time 121.91ms
iter 121130: loss nan, time 119.01ms
iter 121140: loss nan, time 119.08ms
iter 121150: loss nan, time 120.54ms
iter 121160: loss nan, time 119.60ms
iter 121170: loss nan, time 121.69ms
iter 121180: loss nan, time 119.53ms
iter 121190: loss nan, time 119.94ms
tensor(0.8645)
iter 121200: loss nan, time 121.09ms
iter 121210: loss nan, time 120.53ms
iter 121220: loss nan, time 120.68ms
iter 121230: loss nan, time 119.28ms
iter 121240: loss nan, time 120.33ms
step 121250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 121250: loss nan, time 2926.89ms
iter 121260: loss nan, time 119.33ms
iter 121270: loss nan, time 120.56ms
iter 121280: loss nan, time 121.01ms
iter 121290: loss nan, time 119.01ms
tensor(0.8423)
iter 121300: loss nan, time 120.60ms
iter 121310: loss nan, time 121.59ms
iter 121320: loss nan, time 119.16ms
iter 121330: loss nan, time 119.28ms
iter 121340: loss nan, time 121.77ms
iter 121350: loss nan, time 120.52ms
iter 121360: loss nan, time 119.25ms
iter 121370: loss nan, time 120.57ms
iter 121380: loss nan, time 120.71ms
iter 121390: loss nan, time 120.99ms
tensor(0.8187)
iter 121400: loss nan, time 119.30ms
iter 121410: loss nan, time 119.36ms
iter 121420: loss nan, time 120.59ms
iter 121430: loss nan, time 121.02ms
iter 121440: loss nan, time 120.62ms
iter 121450: loss nan, time 120.65ms
iter 121460: loss nan, time 119.20ms
iter 121470: loss nan, time 120.44ms
iter 121480: loss nan, time 121.05ms
iter 121490: loss nan, time 121.47ms
tensor(0.7939)
step 121500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 121500: loss nan, time 2937.34ms
iter 121510: loss nan, time 121.01ms
iter 121520: loss nan, time 120.60ms
iter 121530: loss nan, time 120.18ms
iter 121540: loss nan, time 118.99ms
iter 121550: loss nan, time 119.12ms
iter 121560: loss nan, time 120.51ms
iter 121570: loss nan, time 119.15ms
iter 121580: loss nan, time 121.27ms
iter 121590: loss nan, time 118.96ms
tensor(0.7679)
iter 121600: loss nan, time 121.55ms
iter 121610: loss nan, time 119.05ms
iter 121620: loss nan, time 120.50ms
iter 121630: loss nan, time 119.38ms
iter 121640: loss nan, time 119.30ms
iter 121650: loss nan, time 120.46ms
iter 121660: loss nan, time 118.88ms
iter 121670: loss nan, time 121.79ms
iter 121680: loss nan, time 120.36ms
iter 121690: loss nan, time 119.92ms
tensor(0.7409)
iter 121700: loss nan, time 120.52ms
iter 121710: loss nan, time 120.51ms
iter 121720: loss nan, time 120.55ms
iter 121730: loss nan, time 119.87ms
iter 121740: loss nan, time 120.51ms
step 121750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 121750: loss nan, time 2932.26ms
iter 121760: loss nan, time 119.60ms
iter 121770: loss nan, time 119.80ms
iter 121780: loss nan, time 120.10ms
iter 121790: loss nan, time 118.27ms
tensor(0.7129)
iter 121800: loss nan, time 118.30ms
iter 121810: loss nan, time 120.39ms
iter 121820: loss nan, time 119.51ms
iter 121830: loss nan, time 119.96ms
iter 121840: loss nan, time 120.31ms
iter 121850: loss nan, time 122.06ms
iter 121860: loss nan, time 120.57ms
