vocab_size not found in data/openwebtext/meta.pkl, using GPT-2 default of 50257
Initializing a new model from scratch
number of parameters: 50.96M
tensor(1.)
step 0: train loss 10.9687, val loss 10.9745
iter 0: loss 12.4264, time 5885.80ms
iter 10: loss 12.4105, time 123.62ms
iter 20: loss 12.2683, time 121.85ms
iter 30: loss 12.1396, time 120.66ms
iter 40: loss 11.9201, time 119.57ms
iter 50: loss 11.7990, time 120.35ms
iter 60: loss 11.2907, time 121.66ms
iter 70: loss 11.5626, time 121.46ms
iter 80: loss 11.1175, time 121.59ms
iter 90: loss 10.9326, time 122.72ms
tensor(0.9990)
iter 100: loss 11.2292, time 121.66ms
iter 110: loss 10.3922, time 121.99ms
iter 120: loss 10.9416, time 119.52ms
iter 130: loss 10.4545, time 121.44ms
iter 140: loss 10.8789, time 118.98ms
iter 150: loss 10.5869, time 121.99ms
iter 160: loss 10.0450, time 123.50ms
iter 170: loss 10.0214, time 121.48ms
iter 180: loss 10.0040, time 121.09ms
iter 190: loss 10.1014, time 121.92ms
tensor(0.9961)
iter 200: loss 9.7640, time 119.88ms
iter 210: loss 10.0354, time 119.39ms
iter 220: loss 10.0616, time 121.51ms
iter 230: loss 10.7295, time 120.35ms
iter 240: loss 10.0730, time 121.70ms
step 250: train loss 8.5914, val loss 8.6157
saving checkpoint to out-shakespeare-char
iter 250: loss 10.4551, time 2869.36ms
iter 260: loss 10.9079, time 120.38ms
iter 270: loss 9.6001, time 120.31ms
iter 280: loss 9.7793, time 121.02ms
iter 290: loss 8.9972, time 122.55ms
tensor(0.9911)
iter 300: loss 10.6732, time 123.61ms
iter 310: loss 10.2939, time 121.17ms
iter 320: loss 9.7946, time 121.11ms
iter 330: loss 10.2267, time 119.16ms
iter 340: loss 9.5826, time 118.92ms
iter 350: loss 10.1183, time 119.96ms
iter 360: loss 9.5000, time 120.85ms
iter 370: loss 10.0208, time 121.10ms
iter 380: loss 10.2551, time 121.88ms
iter 390: loss 9.6921, time 121.25ms
tensor(0.9843)
iter 400: loss 10.0622, time 124.47ms
iter 410: loss 9.5686, time 121.64ms
iter 420: loss 10.0106, time 121.49ms
iter 430: loss 9.5888, time 119.20ms
iter 440: loss 9.3083, time 120.39ms
iter 450: loss 9.9638, time 120.85ms
iter 460: loss 10.2150, time 121.04ms
iter 470: loss 9.0959, time 121.27ms
iter 480: loss 9.4582, time 123.33ms
iter 490: loss 9.8504, time 121.36ms
tensor(0.9755)
step 500: train loss 8.2280, val loss 8.1873
saving checkpoint to out-shakespeare-char
iter 500: loss 9.3418, time 2869.71ms
iter 510: loss 9.5803, time 123.19ms
iter 520: loss 8.9337, time 123.78ms
iter 530: loss 9.5252, time 118.79ms
iter 540: loss 9.6414, time 120.67ms
iter 550: loss 9.3241, time 120.94ms
iter 560: loss 8.8209, time 119.51ms
iter 570: loss 9.1928, time 121.40ms
iter 580: loss 9.2976, time 120.78ms
iter 590: loss 8.8154, time 120.30ms
tensor(0.9649)
iter 600: loss 9.8230, time 124.13ms
iter 610: loss 9.8360, time 121.74ms
iter 620: loss 9.2806, time 121.78ms
iter 630: loss 9.5704, time 121.65ms
iter 640: loss 9.3774, time 120.02ms
iter 650: loss 10.4448, time 121.69ms
iter 660: loss 9.5181, time 121.16ms
iter 670: loss 9.4151, time 121.41ms
iter 680: loss 9.5096, time 122.33ms
iter 690: loss 9.7333, time 121.02ms
tensor(0.9524)
iter 700: loss 9.2218, time 121.27ms
iter 710: loss 9.4889, time 121.27ms
iter 720: loss 9.6977, time 118.70ms
iter 730: loss 9.8285, time 119.81ms
iter 740: loss 9.2718, time 121.45ms
step 750: train loss 8.0405, val loss 7.9852
saving checkpoint to out-shakespeare-char
iter 750: loss 9.4226, time 2874.39ms
iter 760: loss 9.0872, time 119.80ms
iter 770: loss 9.5481, time 121.63ms
iter 780: loss 9.2508, time 121.75ms
iter 790: loss 10.1406, time 121.65ms
tensor(0.9382)
iter 800: loss 9.3448, time 124.11ms
iter 810: loss 9.1570, time 121.76ms
iter 820: loss 9.6175, time 122.03ms
iter 830: loss 9.1704, time 119.46ms
iter 840: loss 9.0271, time 120.67ms
iter 850: loss 9.3012, time 121.22ms
iter 860: loss 9.7495, time 121.58ms
iter 870: loss 9.8240, time 121.75ms
iter 880: loss 9.8315, time 122.11ms
iter 890: loss 9.5626, time 121.16ms
tensor(0.9222)
iter 900: loss 9.5832, time 119.93ms
iter 910: loss 9.4481, time 120.05ms
iter 920: loss 10.0291, time 121.35ms
iter 930: loss 9.6414, time 120.16ms
iter 940: loss 9.1835, time 123.77ms
iter 950: loss 9.3861, time 121.66ms
iter 960: loss 9.6405, time 121.42ms
iter 970: loss 9.1097, time 119.43ms
iter 980: loss 9.6431, time 120.72ms
iter 990: loss 9.7491, time 121.85ms
tensor(0.9045)
step 1000: train loss 7.8822, val loss 7.8912
saving checkpoint to out-shakespeare-char
iter 1000: loss 9.8620, time 2865.49ms
iter 1010: loss 9.4227, time 119.32ms
iter 1020: loss 9.5132, time 122.00ms
iter 1030: loss 8.6580, time 122.13ms
iter 1040: loss 9.7377, time 122.75ms
iter 1050: loss 8.9788, time 121.87ms
iter 1060: loss 8.9982, time 122.13ms
iter 1070: loss 9.1098, time 119.19ms
iter 1080: loss 9.3488, time 119.46ms
iter 1090: loss 9.1199, time 121.45ms
tensor(0.8853)
iter 1100: loss 9.4486, time 122.25ms
iter 1110: loss 8.5175, time 123.77ms
iter 1120: loss 9.6136, time 121.60ms
iter 1130: loss 9.5887, time 121.35ms
iter 1140: loss 9.5720, time 121.56ms
iter 1150: loss 9.1579, time 119.12ms
iter 1160: loss 9.9852, time 120.94ms
iter 1170: loss 8.6922, time 121.36ms
iter 1180: loss 9.8914, time 121.90ms
iter 1190: loss 9.4531, time 124.69ms
tensor(0.8645)
iter 1200: loss 9.0037, time 121.79ms
iter 1210: loss 9.3854, time 119.59ms
iter 1220: loss 9.7286, time 118.91ms
iter 1230: loss 9.6887, time 120.20ms
iter 1240: loss 8.7899, time 121.96ms
step 1250: train loss 7.7399, val loss 7.7598
saving checkpoint to out-shakespeare-char
iter 1250: loss 8.8736, time 2875.94ms
iter 1260: loss 9.2319, time 119.72ms
iter 1270: loss 8.9503, time 120.26ms
iter 1280: loss 10.0359, time 119.21ms
iter 1290: loss 9.3964, time 121.85ms
tensor(0.8423)
iter 1300: loss 9.4518, time 124.36ms
iter 1310: loss 9.2479, time 121.44ms
iter 1320: loss 9.3049, time 121.32ms
iter 1330: loss 8.8323, time 120.52ms
iter 1340: loss 8.6609, time 121.50ms
iter 1350: loss 9.8838, time 119.15ms
iter 1360: loss 9.6378, time 120.63ms
iter 1370: loss 9.5241, time 121.69ms
iter 1380: loss 8.1786, time 122.02ms
iter 1390: loss 9.3039, time 123.18ms
tensor(0.8187)
iter 1400: loss 9.9167, time 122.66ms
iter 1410: loss 8.8617, time 119.36ms
iter 1420: loss 9.2326, time 121.18ms
iter 1430: loss 8.9398, time 121.34ms
iter 1440: loss 9.0185, time 119.48ms
iter 1450: loss 9.2413, time 120.44ms
iter 1460: loss 9.0941, time 121.49ms
iter 1470: loss 8.8439, time 119.76ms
iter 1480: loss 9.4661, time 123.19ms
iter 1490: loss 9.0107, time 121.49ms
tensor(0.7939)
step 1500: train loss 7.7233, val loss 7.6655
saving checkpoint to out-shakespeare-char
iter 1500: loss 9.0531, time 2872.93ms
iter 1510: loss 9.9372, time 121.96ms
iter 1520: loss 8.6741, time 123.16ms
iter 1530: loss 8.2790, time 120.53ms
iter 1540: loss 8.8442, time 121.98ms
iter 1550: loss 8.9025, time 120.82ms
iter 1560: loss 8.9062, time 121.69ms
iter 1570: loss 9.0749, time 122.26ms
iter 1580: loss 9.3403, time 119.71ms
iter 1590: loss 8.7812, time 120.17ms
tensor(0.7679)
iter 1600: loss 9.6079, time 122.15ms
iter 1610: loss 8.8345, time 121.36ms
iter 1620: loss 8.8315, time 123.04ms
iter 1630: loss 8.2036, time 122.71ms
iter 1640: loss 8.7508, time 121.22ms
iter 1650: loss 9.0533, time 121.40ms
iter 1660: loss 8.9243, time 119.70ms
iter 1670: loss 8.8810, time 120.07ms
iter 1680: loss 8.7454, time 121.56ms
iter 1690: loss 9.3792, time 122.10ms
tensor(0.7409)
iter 1700: loss 9.1708, time 123.47ms
iter 1710: loss 9.1264, time 122.02ms
iter 1720: loss 8.9026, time 121.21ms
iter 1730: loss 9.2073, time 121.37ms
iter 1740: loss 8.8208, time 118.82ms
step 1750: train loss 7.6007, val loss 7.5747
saving checkpoint to out-shakespeare-char
iter 1750: loss 9.1242, time 2866.67ms
iter 1760: loss 9.0025, time 123.15ms
iter 1770: loss 9.5551, time 122.66ms
iter 1780: loss 9.4943, time 121.50ms
iter 1790: loss 8.6640, time 119.28ms
tensor(0.7129)
iter 1800: loss 8.9171, time 119.88ms
iter 1810: loss 8.9517, time 119.77ms
iter 1820: loss 9.0795, time 119.65ms
iter 1830: loss 8.5646, time 121.75ms
iter 1840: loss 8.6697, time 122.02ms
iter 1850: loss 9.3092, time 123.44ms
iter 1860: loss 8.8058, time 121.38ms
iter 1870: loss 9.0999, time 121.86ms
iter 1880: loss 9.0813, time 119.41ms
iter 1890: loss 9.1441, time 119.42ms
tensor(0.6841)
iter 1900: loss 8.7419, time 120.55ms
iter 1910: loss 9.1610, time 120.93ms
iter 1920: loss 8.9702, time 120.70ms
iter 1930: loss 9.0754, time 123.72ms
iter 1940: loss 9.3102, time 123.31ms
iter 1950: loss 8.2454, time 121.53ms
iter 1960: loss 9.1109, time 121.78ms
iter 1970: loss 9.4385, time 119.27ms
iter 1980: loss 8.8234, time 119.87ms
iter 1990: loss 8.9802, time 119.97ms
tensor(0.6545)
step 2000: train loss 7.5532, val loss 7.5540
saving checkpoint to out-shakespeare-char
iter 2000: loss 7.9502, time 2865.85ms
iter 2010: loss 9.1748, time 119.22ms
iter 2020: loss 8.9124, time 119.70ms
iter 2030: loss 9.1993, time 119.54ms
iter 2040: loss 8.6750, time 119.51ms
iter 2050: loss 9.1116, time 120.66ms
iter 2060: loss 8.8129, time 121.16ms
iter 2070: loss 8.8656, time 122.71ms
iter 2080: loss 8.5253, time 122.98ms
iter 2090: loss 8.6069, time 122.92ms
tensor(0.6243)
iter 2100: loss 8.6233, time 123.83ms
iter 2110: loss 8.0754, time 117.99ms
iter 2120: loss 9.8047, time 121.28ms
iter 2130: loss 8.3041, time 119.49ms
iter 2140: loss 9.2621, time 119.48ms
iter 2150: loss 8.7613, time 119.88ms
iter 2160: loss 9.2503, time 121.19ms
iter 2170: loss 9.1430, time 120.39ms
iter 2180: loss 9.1497, time 123.70ms
iter 2190: loss 9.4077, time 121.41ms
tensor(0.5937)
iter 2200: loss 9.2512, time 120.10ms
iter 2210: loss 9.2234, time 120.44ms
iter 2220: loss 8.9537, time 119.77ms
iter 2230: loss 9.6527, time 119.60ms
iter 2240: loss 9.0026, time 121.59ms
step 2250: train loss 7.4627, val loss 7.4848
saving checkpoint to out-shakespeare-char
iter 2250: loss 9.5175, time 2874.95ms
iter 2260: loss 8.6779, time 119.42ms
iter 2270: loss 9.3146, time 119.05ms
iter 2280: loss 8.4188, time 120.41ms
iter 2290: loss 9.1487, time 119.86ms
tensor(0.5627)
iter 2300: loss 8.9156, time 122.73ms
iter 2310: loss 8.5010, time 121.69ms
iter 2320: loss 8.4474, time 123.25ms
iter 2330: loss 9.1704, time 123.27ms
iter 2340: loss 9.1617, time 121.63ms
iter 2350: loss 9.5002, time 119.38ms
iter 2360: loss 8.7513, time 118.76ms
iter 2370: loss 9.4146, time 118.61ms
iter 2380: loss 8.8631, time 119.05ms
iter 2390: loss 8.8702, time 119.92ms
tensor(0.5314)
iter 2400: loss 9.6120, time 121.41ms
iter 2410: loss 9.9753, time 121.39ms
iter 2420: loss 9.1212, time 121.21ms
iter 2430: loss 8.7065, time 123.33ms
iter 2440: loss 8.5237, time 123.18ms
iter 2450: loss 8.9567, time 122.48ms
iter 2460: loss 9.0926, time 121.57ms
iter 2470: loss 9.6335, time 121.25ms
iter 2480: loss 9.1108, time 117.97ms
iter 2490: loss 9.4683, time 119.06ms
tensor(0.5000)
step 2500: train loss 7.4002, val loss 7.4096
saving checkpoint to out-shakespeare-char
iter 2500: loss 8.7467, time 2871.38ms
iter 2510: loss 9.4743, time 123.17ms
iter 2520: loss 8.7346, time 121.17ms
iter 2530: loss 9.1504, time 120.86ms
iter 2540: loss 8.7404, time 119.01ms
iter 2550: loss 9.1062, time 118.95ms
iter 2560: loss 8.9538, time 119.08ms
iter 2570: loss 9.0174, time 118.93ms
iter 2580: loss 8.4701, time 119.06ms
iter 2590: loss 8.8944, time 120.33ms
tensor(0.4686)
iter 2600: loss 9.6907, time 120.76ms
iter 2610: loss 9.8115, time 120.99ms
iter 2620: loss 8.9300, time 122.28ms
iter 2630: loss 8.5553, time 122.78ms
iter 2640: loss 8.5003, time 124.84ms
iter 2650: loss 8.8331, time 122.39ms
iter 2660: loss 9.3192, time 120.36ms
iter 2670: loss 9.2763, time 122.12ms
iter 2680: loss 9.2594, time 122.70ms
iter 2690: loss 9.4810, time 122.41ms
tensor(0.4373)
iter 2700: loss 8.8970, time 120.58ms
iter 2710: loss 8.4938, time 120.37ms
iter 2720: loss 9.0084, time 121.12ms
iter 2730: loss 9.0314, time 122.81ms
iter 2740: loss 8.8438, time 122.53ms
step 2750: train loss 7.3862, val loss 7.4101
saving checkpoint to out-shakespeare-char
iter 2750: loss 9.1474, time 2853.67ms
iter 2760: loss 8.9209, time 118.25ms
iter 2770: loss 8.7927, time 117.88ms
iter 2780: loss 8.5297, time 118.63ms
iter 2790: loss 8.5293, time 118.94ms
tensor(0.4063)
iter 2800: loss 9.1232, time 119.30ms
iter 2810: loss 8.7210, time 119.45ms
iter 2820: loss 8.2747, time 119.63ms
iter 2830: loss 9.1657, time 120.10ms
iter 2840: loss 7.8681, time 119.70ms
iter 2850: loss 8.2529, time 120.81ms
iter 2860: loss 8.2246, time 120.86ms
iter 2870: loss 7.9393, time 120.71ms
iter 2880: loss 8.3427, time 121.32ms
iter 2890: loss 9.4552, time 120.59ms
tensor(0.3757)
iter 2900: loss 8.3979, time 122.04ms
iter 2910: loss 9.3838, time 122.61ms
iter 2920: loss 8.3376, time 120.57ms
iter 2930: loss 8.4101, time 122.48ms
iter 2940: loss 9.1238, time 122.19ms
iter 2950: loss 8.7656, time 122.58ms
iter 2960: loss 8.2083, time 122.44ms
iter 2970: loss 8.9542, time 120.72ms
iter 2980: loss 8.3169, time 122.84ms
iter 2990: loss 8.4403, time 121.24ms
tensor(0.3455)
step 3000: train loss 7.3457, val loss 7.3187
saving checkpoint to out-shakespeare-char
iter 3000: loss 9.5034, time 2864.48ms
iter 3010: loss 8.3407, time 120.83ms
iter 3020: loss 8.1630, time 121.12ms
iter 3030: loss 8.7698, time 120.63ms
iter 3040: loss 8.4588, time 121.23ms
iter 3050: loss 8.8365, time 120.14ms
iter 3060: loss 8.9365, time 121.66ms
iter 3070: loss 9.9345, time 121.55ms
iter 3080: loss 8.7711, time 120.73ms
iter 3090: loss 8.2740, time 122.51ms
tensor(0.3159)
iter 3100: loss 9.4815, time 123.59ms
iter 3110: loss 8.3065, time 123.45ms
iter 3120: loss 8.5557, time 121.06ms
iter 3130: loss 9.0156, time 118.00ms
iter 3140: loss 9.6011, time 121.01ms
iter 3150: loss 9.1318, time 118.25ms
iter 3160: loss 9.1657, time 119.17ms
iter 3170: loss 8.4852, time 118.72ms
iter 3180: loss 8.0590, time 119.44ms
iter 3190: loss 8.5109, time 118.92ms
tensor(0.2871)
iter 3200: loss 9.3046, time 119.21ms
iter 3210: loss 8.7095, time 120.01ms
iter 3220: loss 9.0828, time 120.42ms
iter 3230: loss 8.8415, time 121.14ms
iter 3240: loss 9.4893, time 119.29ms
step 3250: train loss 7.2710, val loss 7.3070
saving checkpoint to out-shakespeare-char
iter 3250: loss 9.5068, time 2868.08ms
iter 3260: loss 8.3704, time 117.96ms
iter 3270: loss 9.0212, time 118.76ms
iter 3280: loss 8.0590, time 118.70ms
iter 3290: loss 8.8297, time 118.81ms
tensor(0.2591)
iter 3300: loss 9.2280, time 118.95ms
iter 3310: loss 9.4681, time 117.85ms
iter 3320: loss 8.8281, time 118.98ms
iter 3330: loss 9.1774, time 119.61ms
iter 3340: loss 8.9741, time 120.34ms
iter 3350: loss 9.0374, time 118.78ms
iter 3360: loss 8.8417, time 120.97ms
iter 3370: loss 8.7751, time 121.61ms
iter 3380: loss 8.9184, time 121.72ms
iter 3390: loss 9.1327, time 121.98ms
tensor(0.2321)
iter 3400: loss 9.4478, time 121.23ms
iter 3410: loss 8.8366, time 122.71ms
iter 3420: loss 8.8151, time 122.71ms
iter 3430: loss 8.4123, time 122.73ms
iter 3440: loss 9.0906, time 122.66ms
iter 3450: loss 8.9713, time 120.76ms
iter 3460: loss 8.4444, time 122.04ms
iter 3470: loss 9.3034, time 120.69ms
iter 3480: loss 9.3498, time 120.81ms
iter 3490: loss 9.1611, time 121.36ms
tensor(0.2061)
step 3500: train loss 7.2899, val loss 7.2696
saving checkpoint to out-shakespeare-char
iter 3500: loss 8.9876, time 2869.66ms
iter 3510: loss 8.3932, time 121.35ms
iter 3520: loss 8.1641, time 123.38ms
iter 3530: loss 9.0511, time 121.46ms
iter 3540: loss 8.2101, time 121.48ms
iter 3550: loss 9.3781, time 119.25ms
iter 3560: loss 8.5267, time 118.80ms
iter 3570: loss 9.3150, time 118.70ms
iter 3580: loss 8.9060, time 119.67ms
iter 3590: loss 8.5169, time 119.70ms
tensor(0.1813)
iter 3600: loss 9.4267, time 120.96ms
iter 3610: loss 9.1313, time 120.11ms
iter 3620: loss 8.2943, time 121.61ms
iter 3630: loss 8.1525, time 122.07ms
iter 3640: loss 7.6365, time 120.84ms
iter 3650: loss 8.4414, time 123.14ms
iter 3660: loss 8.5358, time 123.30ms
iter 3670: loss 9.4395, time 122.99ms
iter 3680: loss 8.8850, time 121.21ms
iter 3690: loss 8.8169, time 117.25ms
tensor(0.1577)
iter 3700: loss 8.9125, time 119.30ms
iter 3710: loss 8.7172, time 119.10ms
iter 3720: loss 9.3955, time 117.94ms
iter 3730: loss 8.5690, time 118.81ms
iter 3740: loss 8.4251, time 119.67ms
step 3750: train loss 7.2486, val loss 7.2991
saving checkpoint to out-shakespeare-char
iter 3750: loss 8.3281, time 2860.32ms
iter 3760: loss 8.3899, time 122.88ms
iter 3770: loss 8.5321, time 122.31ms
iter 3780: loss 8.8616, time 120.31ms
iter 3790: loss 8.8529, time 120.74ms
tensor(0.1355)
iter 3800: loss 8.2469, time 117.90ms
iter 3810: loss 8.7957, time 117.79ms
iter 3820: loss 8.0699, time 117.54ms
iter 3830: loss 8.8290, time 118.39ms
iter 3840: loss 9.7305, time 117.84ms
iter 3850: loss 8.2973, time 117.94ms
iter 3860: loss 8.5271, time 118.39ms
iter 3870: loss 9.0254, time 117.93ms
iter 3880: loss 8.3970, time 117.81ms
iter 3890: loss 9.2892, time 118.23ms
tensor(0.1147)
iter 3900: loss 8.5974, time 119.23ms
iter 3910: loss 8.4813, time 118.82ms
iter 3920: loss 9.1298, time 121.03ms
iter 3930: loss 8.8168, time 120.74ms
iter 3940: loss 8.5367, time 121.56ms
iter 3950: loss 8.5625, time 121.90ms
iter 3960: loss 9.1422, time 121.05ms
iter 3970: loss 8.4052, time 123.13ms
iter 3980: loss 8.6953, time 122.24ms
iter 3990: loss 8.5342, time 122.69ms
tensor(0.0955)
step 4000: train loss 7.2742, val loss 7.2491
saving checkpoint to out-shakespeare-char
iter 4000: loss 9.1638, time 2859.71ms
iter 4010: loss 8.8194, time 117.41ms
iter 4020: loss 8.8086, time 117.88ms
iter 4030: loss 8.7344, time 117.82ms
iter 4040: loss 8.9277, time 119.02ms
iter 4050: loss 8.2793, time 118.46ms
iter 4060: loss 7.8580, time 117.65ms
iter 4070: loss 8.7291, time 117.07ms
iter 4080: loss 8.4004, time 119.51ms
iter 4090: loss 8.7743, time 118.63ms
tensor(0.0778)
iter 4100: loss 9.0084, time 119.40ms
iter 4110: loss 8.5163, time 119.23ms
iter 4120: loss 7.8588, time 119.00ms
iter 4130: loss 9.0085, time 118.09ms
iter 4140: loss 8.5909, time 119.94ms
iter 4150: loss 9.1253, time 122.36ms
iter 4160: loss 8.4075, time 122.42ms
iter 4170: loss 8.5039, time 121.06ms
iter 4180: loss 9.5659, time 122.50ms
iter 4190: loss 8.5010, time 123.00ms
tensor(0.0618)
iter 4200: loss 8.3879, time 123.46ms
iter 4210: loss 9.1804, time 122.86ms
iter 4220: loss 8.4217, time 121.38ms
iter 4230: loss 8.7819, time 119.21ms
iter 4240: loss 9.3565, time 118.94ms
step 4250: train loss 7.2201, val loss 7.1945
saving checkpoint to out-shakespeare-char
iter 4250: loss 8.7454, time 2851.34ms
iter 4260: loss 8.4853, time 121.96ms
iter 4270: loss 8.7566, time 122.79ms
iter 4280: loss 9.5891, time 121.13ms
iter 4290: loss 8.6678, time 122.07ms
tensor(0.0476)
iter 4300: loss 7.7299, time 122.68ms
iter 4310: loss 9.4365, time 121.37ms
iter 4320: loss 8.6453, time 122.31ms
iter 4330: loss 8.1458, time 120.68ms
iter 4340: loss 8.0713, time 121.21ms
iter 4350: loss 8.9172, time 122.49ms
iter 4360: loss 9.2127, time 122.36ms
iter 4370: loss 8.5333, time 122.43ms
iter 4380: loss 8.1303, time 120.60ms
iter 4390: loss 9.1355, time 121.49ms
tensor(0.0351)
iter 4400: loss 7.7765, time 120.78ms
iter 4410: loss 8.7387, time 119.68ms
iter 4420: loss 8.7388, time 120.65ms
iter 4430: loss 8.0039, time 120.72ms
iter 4440: loss 9.3278, time 119.92ms
iter 4450: loss 9.1032, time 120.77ms
iter 4460: loss 8.7064, time 120.62ms
iter 4470: loss 8.9722, time 120.93ms
iter 4480: loss 9.3666, time 118.38ms
iter 4490: loss 8.7227, time 115.90ms
tensor(0.0245)
step 4500: train loss 7.2523, val loss 7.2673
saving checkpoint to out-shakespeare-char
iter 4500: loss 8.6750, time 2871.11ms
iter 4510: loss 8.5933, time 122.48ms
iter 4520: loss 8.5858, time 122.69ms
iter 4530: loss 8.5650, time 122.47ms
iter 4540: loss 8.6810, time 120.64ms
iter 4550: loss 8.2313, time 122.47ms
iter 4560: loss 8.1352, time 122.62ms
iter 4570: loss 9.1523, time 122.51ms
iter 4580: loss 9.1064, time 122.61ms
iter 4590: loss 8.2528, time 120.66ms
tensor(0.0157)
iter 4600: loss 8.2778, time 120.96ms
iter 4610: loss 8.9385, time 121.21ms
iter 4620: loss 8.4805, time 120.92ms
iter 4630: loss 7.8858, time 120.58ms
iter 4640: loss 8.8898, time 120.77ms
iter 4650: loss 9.0428, time 118.96ms
iter 4660: loss 9.1712, time 119.41ms
iter 4670: loss 8.6536, time 119.07ms
iter 4680: loss 8.3281, time 119.53ms
iter 4690: loss 8.7745, time 119.46ms
tensor(0.0089)
iter 4700: loss 8.7782, time 121.16ms
iter 4710: loss 9.3722, time 121.00ms
iter 4720: loss 8.7537, time 120.82ms
iter 4730: loss 9.0848, time 122.38ms
iter 4740: loss 8.6346, time 123.44ms
step 4750: train loss 7.2205, val loss 7.2155
saving checkpoint to out-shakespeare-char
iter 4750: loss 8.3742, time 2856.37ms
iter 4760: loss 9.3219, time 119.16ms
iter 4770: loss 8.9149, time 120.91ms
iter 4780: loss 8.3702, time 121.29ms
iter 4790: loss 8.6111, time 122.37ms
tensor(0.0039)
iter 4800: loss 8.8783, time 123.94ms
iter 4810: loss 8.7699, time 123.44ms
iter 4820: loss 9.1082, time 120.64ms
iter 4830: loss 8.7762, time 119.30ms
iter 4840: loss 8.5705, time 121.21ms
iter 4850: loss 8.8508, time 119.37ms
iter 4860: loss 8.4223, time 119.45ms
iter 4870: loss 8.4131, time 121.49ms
iter 4880: loss 9.0043, time 119.36ms
iter 4890: loss 8.6877, time 120.74ms
tensor(0.0010)
iter 4900: loss 8.2909, time 123.80ms
iter 4910: loss 8.7603, time 120.99ms
iter 4920: loss 9.1587, time 121.22ms
iter 4930: loss 9.1340, time 121.58ms
iter 4940: loss 8.1807, time 119.23ms
iter 4950: loss 8.7213, time 120.43ms
iter 4960: loss 8.9769, time 121.37ms
iter 4970: loss 7.7786, time 121.06ms
iter 4980: loss 9.0479, time 122.05ms
iter 4990: loss 8.5750, time 123.24ms
tensor(0.0010)
step 5000: train loss 7.2315, val loss 7.1242
saving checkpoint to out-shakespeare-char
iter 5000: loss 8.5684, time 2868.13ms
iter 5010: loss 9.4134, time 121.33ms
iter 5020: loss 8.5552, time 121.15ms
iter 5030: loss 8.4253, time 121.34ms
iter 5040: loss 9.0639, time 122.26ms
iter 5050: loss 9.6649, time 121.03ms
iter 5060: loss 8.4574, time 121.34ms
iter 5070: loss 9.4600, time 121.48ms
iter 5080: loss 8.4581, time 119.64ms
iter 5090: loss 8.2257, time 119.40ms
tensor(0.0010)
iter 5100: loss 7.9211, time 121.69ms
iter 5110: loss 8.7596, time 121.72ms
iter 5120: loss 8.7074, time 122.13ms
iter 5130: loss 9.7451, time 123.47ms
iter 5140: loss 8.3457, time 121.34ms
iter 5150: loss 8.1012, time 121.33ms
iter 5160: loss 8.7101, time 121.97ms
iter 5170: loss 9.4074, time 119.36ms
iter 5180: loss 7.8949, time 121.19ms
iter 5190: loss 8.6051, time 121.25ms
tensor(0.0039)
iter 5200: loss 8.4675, time 122.66ms
iter 5210: loss 8.6459, time 123.71ms
iter 5220: loss 8.6922, time 119.00ms
iter 5230: loss 8.3664, time 121.19ms
iter 5240: loss 7.7925, time 121.32ms
step 5250: train loss 7.2272, val loss 7.1612
saving checkpoint to out-shakespeare-char
iter 5250: loss 8.8150, time 2863.22ms
iter 5260: loss 7.9116, time 121.50ms
iter 5270: loss 8.9586, time 121.54ms
iter 5280: loss 8.4953, time 118.72ms
iter 5290: loss 9.3917, time 118.60ms
tensor(0.0089)
iter 5300: loss 8.3798, time 119.60ms
iter 5310: loss 9.1055, time 120.02ms
iter 5320: loss 8.2852, time 120.31ms
iter 5330: loss 9.3853, time 120.68ms
iter 5340: loss 8.4670, time 118.55ms
iter 5350: loss 9.1440, time 120.99ms
iter 5360: loss 8.5025, time 121.87ms
iter 5370: loss 8.8297, time 121.83ms
iter 5380: loss 7.8017, time 122.30ms
iter 5390: loss 8.7762, time 120.88ms
tensor(0.0157)
iter 5400: loss 8.7106, time 123.21ms
iter 5410: loss 9.0026, time 120.76ms
iter 5420: loss 8.3482, time 120.60ms
iter 5430: loss 9.1080, time 120.70ms
iter 5440: loss 8.7046, time 120.95ms
iter 5450: loss 9.0819, time 120.70ms
iter 5460: loss 8.5981, time 120.65ms
iter 5470: loss 8.8908, time 118.24ms
iter 5480: loss 9.5952, time 118.94ms
iter 5490: loss 8.8885, time 118.91ms
tensor(0.0245)
step 5500: train loss 7.2381, val loss 7.1918
saving checkpoint to out-shakespeare-char
iter 5500: loss 8.9408, time 2863.15ms
iter 5510: loss 8.4793, time 123.08ms
iter 5520: loss 8.1259, time 121.45ms
iter 5530: loss 8.8268, time 120.70ms
iter 5540: loss 8.4362, time 120.95ms
iter 5550: loss 8.0239, time 121.47ms
iter 5560: loss 8.3006, time 119.69ms
iter 5570: loss 8.2226, time 120.36ms
iter 5580: loss 8.1122, time 121.26ms
iter 5590: loss 8.7465, time 121.47ms
tensor(0.0351)
iter 5600: loss 8.6603, time 122.77ms
iter 5610: loss 8.8374, time 123.08ms
iter 5620: loss 8.9431, time 123.12ms
iter 5630: loss 7.7836, time 119.45ms
iter 5640: loss 8.2310, time 121.39ms
iter 5650: loss 8.3062, time 120.38ms
iter 5660: loss 9.2614, time 121.22ms
iter 5670: loss 8.0688, time 120.59ms
iter 5680: loss 9.0531, time 122.10ms
iter 5690: loss 7.5824, time 121.43ms
tensor(0.0476)
iter 5700: loss 7.8058, time 121.16ms
iter 5710: loss 8.9780, time 121.29ms
iter 5720: loss 9.2537, time 121.80ms
iter 5730: loss 8.7686, time 119.46ms
iter 5740: loss 9.1341, time 119.43ms
step 5750: train loss 7.1671, val loss 7.1918
saving checkpoint to out-shakespeare-char
iter 5750: loss 9.0127, time 2827.71ms
iter 5760: loss 9.2778, time 121.71ms
iter 5770: loss 8.9108, time 122.35ms
iter 5780: loss 9.1122, time 123.39ms
iter 5790: loss 8.3541, time 123.11ms
tensor(0.0618)
iter 5800: loss 7.9644, time 121.49ms
iter 5810: loss 9.0625, time 119.62ms
iter 5820: loss 8.5127, time 121.05ms
iter 5830: loss 8.7488, time 119.06ms
iter 5840: loss 9.1583, time 119.72ms
iter 5850: loss 8.9660, time 117.00ms
iter 5860: loss 8.8046, time 118.94ms
iter 5870: loss 8.5782, time 119.83ms
iter 5880: loss 8.3827, time 118.84ms
iter 5890: loss 8.2059, time 120.76ms
tensor(0.0778)
iter 5900: loss 8.5588, time 118.64ms
iter 5910: loss 8.7582, time 120.73ms
iter 5920: loss 9.4322, time 119.33ms
iter 5930: loss 8.1337, time 120.36ms
iter 5940: loss 8.7682, time 118.39ms
iter 5950: loss 8.7758, time 117.75ms
iter 5960: loss 8.1352, time 117.87ms
iter 5970: loss 8.5568, time 119.23ms
iter 5980: loss 8.3102, time 118.52ms
iter 5990: loss 9.0250, time 116.84ms
tensor(0.0955)
step 6000: train loss 7.1770, val loss 7.2163
saving checkpoint to out-shakespeare-char
iter 6000: loss 8.1100, time 2847.08ms
iter 6010: loss 8.2075, time 119.58ms
iter 6020: loss 8.7207, time 120.84ms
iter 6030: loss 8.9589, time 121.57ms
iter 6040: loss 8.2244, time 122.63ms
iter 6050: loss 9.2163, time 121.16ms
iter 6060: loss 8.7519, time 121.97ms
iter 6070: loss 8.7761, time 121.70ms
iter 6080: loss 8.5990, time 121.59ms
iter 6090: loss 9.4480, time 120.72ms
tensor(0.1147)
iter 6100: loss 8.8323, time 122.33ms
iter 6110: loss 8.6057, time 120.51ms
iter 6120: loss 8.5325, time 123.86ms
iter 6130: loss 8.9927, time 121.92ms
iter 6140: loss 8.9941, time 121.88ms
iter 6150: loss 9.1212, time 120.28ms
iter 6160: loss 8.2372, time 121.84ms
iter 6170: loss 8.8590, time 122.01ms
iter 6180: loss 9.5355, time 124.13ms
iter 6190: loss 8.6581, time 119.73ms
tensor(0.1355)
iter 6200: loss 8.0152, time 121.95ms
iter 6210: loss 9.0214, time 119.65ms
iter 6220: loss 9.0809, time 121.73ms
iter 6230: loss 8.1357, time 121.57ms
iter 6240: loss 8.9428, time 122.98ms
step 6250: train loss 7.1937, val loss 7.1864
saving checkpoint to out-shakespeare-char
iter 6250: loss 8.7496, time 2851.53ms
iter 6260: loss 8.3905, time 119.45ms
iter 6270: loss 7.8703, time 121.59ms
iter 6280: loss 8.6155, time 122.01ms
iter 6290: loss 7.7622, time 124.03ms
tensor(0.1577)
iter 6300: loss 8.5535, time 121.84ms
iter 6310: loss 8.9976, time 121.51ms
iter 6320: loss 8.1020, time 119.35ms
iter 6330: loss 8.1475, time 120.69ms
iter 6340: loss 9.1122, time 122.03ms
iter 6350: loss 7.5954, time 123.50ms
iter 6360: loss 8.5562, time 123.71ms
iter 6370: loss 7.6595, time 121.71ms
iter 6380: loss 9.2393, time 120.94ms
iter 6390: loss 8.3404, time 120.60ms
tensor(0.1813)
iter 6400: loss 8.9848, time 120.03ms
iter 6410: loss 8.8609, time 122.24ms
iter 6420: loss 8.0822, time 123.65ms
iter 6430: loss 8.0993, time 120.03ms
iter 6440: loss 8.8654, time 121.78ms
iter 6450: loss 8.5401, time 120.58ms
iter 6460: loss 8.0657, time 121.66ms
iter 6470: loss 8.8199, time 122.62ms
iter 6480: loss 8.8664, time 122.26ms
iter 6490: loss 9.4157, time 121.85ms
tensor(0.2061)
step 6500: train loss 7.1917, val loss 7.2366
saving checkpoint to out-shakespeare-char
iter 6500: loss 9.5900, time 2854.40ms
iter 6510: loss 8.6119, time 122.34ms
iter 6520: loss 8.6229, time 124.19ms
iter 6530: loss 8.6729, time 121.65ms
iter 6540: loss 8.7199, time 119.85ms
iter 6550: loss 8.6757, time 120.41ms
iter 6560: loss 9.1935, time 122.05ms
iter 6570: loss 8.3127, time 123.26ms
iter 6580: loss 8.6146, time 122.91ms
iter 6590: loss 8.3746, time 121.83ms
tensor(0.2321)
iter 6600: loss 8.4493, time 121.85ms
iter 6610: loss 8.2076, time 121.11ms
iter 6620: loss 7.6404, time 121.70ms
iter 6630: loss 9.4974, time 121.78ms
iter 6640: loss 9.2287, time 123.02ms
iter 6650: loss 8.9399, time 119.93ms
iter 6660: loss 8.0378, time 122.11ms
iter 6670: loss 8.3718, time 120.32ms
iter 6680: loss 8.8574, time 121.97ms
iter 6690: loss 9.3300, time 120.59ms
tensor(0.2591)
iter 6700: loss 8.5337, time 123.66ms
iter 6710: loss 8.8587, time 121.73ms
iter 6720: loss 8.3525, time 119.80ms
iter 6730: loss 9.2222, time 120.92ms
iter 6740: loss 8.3168, time 122.18ms
step 6750: train loss 7.1364, val loss 7.1633
saving checkpoint to out-shakespeare-char
iter 6750: loss 9.0131, time 2875.43ms
iter 6760: loss 8.7466, time 121.66ms
iter 6770: loss 8.0504, time 122.26ms
iter 6780: loss 7.8376, time 121.94ms
iter 6790: loss 9.1053, time 121.78ms
tensor(0.2871)
iter 6800: loss 8.1908, time 122.16ms
iter 6810: loss 8.4548, time 120.76ms
iter 6820: loss 8.9853, time 122.58ms
iter 6830: loss 8.6804, time 122.41ms
iter 6840: loss 8.5598, time 121.81ms
iter 6850: loss 7.9736, time 121.23ms
iter 6860: loss 8.3345, time 121.63ms
iter 6870: loss 8.9722, time 120.23ms
iter 6880: loss 8.0999, time 121.94ms
iter 6890: loss 9.0575, time 122.17ms
tensor(0.3159)
iter 6900: loss 7.8282, time 124.11ms
iter 6910: loss 8.3534, time 121.85ms
iter 6920: loss 8.8110, time 119.16ms
iter 6930: loss 8.5857, time 118.36ms
iter 6940: loss 8.5118, time 121.10ms
iter 6950: loss 8.8709, time 121.27ms
iter 6960: loss 9.3393, time 122.88ms
iter 6970: loss 8.3307, time 123.74ms
iter 6980: loss 8.9915, time 121.93ms
iter 6990: loss 8.5997, time 121.49ms
tensor(0.3455)
step 7000: train loss 7.2088, val loss 7.2735
saving checkpoint to out-shakespeare-char
iter 7000: loss 7.7696, time 2842.37ms
iter 7010: loss 8.3011, time 122.47ms
iter 7020: loss 7.9811, time 123.56ms
iter 7030: loss 8.7830, time 121.43ms
iter 7040: loss 8.8920, time 121.69ms
iter 7050: loss 8.4482, time 119.33ms
iter 7060: loss 8.8421, time 119.97ms
iter 7070: loss 8.5389, time 121.54ms
iter 7080: loss 8.9310, time 121.68ms
iter 7090: loss 8.6512, time 123.48ms
tensor(0.3757)
iter 7100: loss 8.3708, time 123.95ms
iter 7110: loss 8.2505, time 121.70ms
iter 7120: loss 8.8056, time 119.56ms
iter 7130: loss 7.9884, time 119.48ms
iter 7140: loss 9.2336, time 121.39ms
iter 7150: loss 7.7593, time 121.84ms
iter 7160: loss 8.5758, time 123.37ms
iter 7170: loss 8.2533, time 121.25ms
iter 7180: loss 8.2523, time 121.98ms
iter 7190: loss 8.7294, time 119.62ms
tensor(0.4063)
iter 7200: loss 9.3417, time 120.58ms
iter 7210: loss 8.4474, time 121.72ms
iter 7220: loss 8.6204, time 122.04ms
iter 7230: loss 8.6545, time 123.19ms
iter 7240: loss 8.3343, time 122.61ms
step 7250: train loss 7.2027, val loss 7.1797
saving checkpoint to out-shakespeare-char
iter 7250: loss 8.2470, time 2849.80ms
iter 7260: loss 8.6566, time 121.99ms
iter 7270: loss 8.2709, time 121.81ms
iter 7280: loss 8.3967, time 123.91ms
iter 7290: loss 7.6365, time 121.69ms
tensor(0.4373)
iter 7300: loss 8.7619, time 121.93ms
iter 7310: loss 8.8857, time 119.64ms
iter 7320: loss 8.7976, time 121.63ms
iter 7330: loss 9.0528, time 119.57ms
iter 7340: loss 8.7833, time 122.76ms
iter 7350: loss 8.6880, time 122.84ms
iter 7360: loss 8.7733, time 121.81ms
iter 7370: loss 8.5752, time 120.10ms
iter 7380: loss 8.6417, time 120.58ms
iter 7390: loss 8.5066, time 121.97ms
tensor(0.4686)
iter 7400: loss 8.5310, time 123.04ms
iter 7410: loss 9.2309, time 121.74ms
iter 7420: loss 8.4742, time 121.75ms
iter 7430: loss 8.2463, time 121.78ms
iter 7440: loss 9.4809, time 120.96ms
iter 7450: loss 8.6707, time 121.91ms
iter 7460: loss 9.0307, time 122.51ms
iter 7470: loss 8.5446, time 121.74ms
iter 7480: loss 8.6277, time 121.91ms
iter 7490: loss 8.6389, time 121.63ms
tensor(0.5000)
step 7500: train loss 7.1574, val loss 7.1144
saving checkpoint to out-shakespeare-char
iter 7500: loss 8.5175, time 2858.30ms
iter 7510: loss 8.1963, time 124.01ms
iter 7520: loss 8.5091, time 121.91ms
iter 7530: loss 8.4158, time 121.75ms
iter 7540: loss 8.7513, time 117.27ms
iter 7550: loss 8.4472, time 119.07ms
iter 7560: loss 9.0191, time 116.79ms
iter 7570: loss 8.5369, time 114.45ms
iter 7580: loss 8.9622, time 118.81ms
iter 7590: loss 7.9580, time 117.19ms
tensor(0.5314)
iter 7600: loss 8.9886, time 115.39ms
iter 7610: loss 8.5274, time 117.27ms
iter 7620: loss 8.8046, time 116.66ms
iter 7630: loss 8.3191, time 114.48ms
iter 7640: loss 8.5567, time 119.85ms
iter 7650: loss 8.1789, time 116.66ms
iter 7660: loss 8.2112, time 115.07ms
iter 7670: loss 8.6556, time 118.70ms
iter 7680: loss 8.5619, time 116.63ms
iter 7690: loss 8.3204, time 114.59ms
tensor(0.5627)
iter 7700: loss 8.0502, time 117.27ms
iter 7710: loss 8.0300, time 117.85ms
iter 7720: loss 8.5883, time 116.36ms
iter 7730: loss 8.4729, time 118.80ms
iter 7740: loss 8.5391, time 116.83ms
step 7750: train loss 7.0869, val loss 7.1124
saving checkpoint to out-shakespeare-char
iter 7750: loss 9.1235, time 2843.47ms
iter 7760: loss 9.0820, time 115.11ms
iter 7770: loss 8.5199, time 117.08ms
iter 7780: loss 8.2434, time 114.66ms
iter 7790: loss 9.1065, time 114.55ms
tensor(0.5937)
iter 7800: loss 8.5065, time 119.24ms
iter 7810: loss 8.1829, time 116.74ms
iter 7820: loss 8.7508, time 114.72ms
iter 7830: loss 8.5033, time 118.61ms
iter 7840: loss 8.5715, time 114.63ms
iter 7850: loss 8.1682, time 114.87ms
iter 7860: loss 7.6790, time 118.97ms
iter 7870: loss 7.9204, time 116.80ms
iter 7880: loss 8.5342, time 114.80ms
iter 7890: loss 9.1090, time 118.95ms
tensor(0.6243)
iter 7900: loss 8.3419, time 114.82ms
iter 7910: loss 8.3033, time 114.77ms
iter 7920: loss 9.1510, time 117.72ms
iter 7930: loss 8.5587, time 116.95ms
iter 7940: loss 8.7891, time 115.00ms
iter 7950: loss 8.8123, time 118.23ms
iter 7960: loss 8.7265, time 114.53ms
iter 7970: loss 9.3730, time 115.23ms
iter 7980: loss 9.0490, time 118.92ms
iter 7990: loss 8.5437, time 116.60ms
tensor(0.6545)
step 8000: train loss 7.0449, val loss 7.0909
saving checkpoint to out-shakespeare-char
iter 8000: loss 8.0075, time 2844.29ms
iter 8010: loss 8.3168, time 114.91ms
iter 8020: loss 8.6252, time 116.15ms
iter 8030: loss 8.0119, time 117.19ms
iter 8040: loss 7.8007, time 114.49ms
iter 8050: loss 8.8481, time 118.74ms
iter 8060: loss 8.2738, time 116.94ms
iter 8070: loss 8.4711, time 114.51ms
iter 8080: loss 8.4225, time 116.48ms
iter 8090: loss 8.8279, time 116.24ms
tensor(0.6841)
iter 8100: loss 8.5633, time 115.15ms
iter 8110: loss 7.7023, time 118.81ms
iter 8120: loss 8.4578, time 115.45ms
iter 8130: loss 8.0271, time 116.55ms
iter 8140: loss 8.1236, time 116.55ms
iter 8150: loss 7.9762, time 114.89ms
iter 8160: loss 8.4167, time 116.60ms
iter 8170: loss 8.7228, time 118.34ms
iter 8180: loss 7.6434, time 114.43ms
iter 8190: loss 9.0941, time 118.36ms
tensor(0.7129)
iter 8200: loss 7.8297, time 117.11ms
iter 8210: loss 8.3839, time 114.42ms
iter 8220: loss 8.2746, time 116.70ms
iter 8230: loss 7.8960, time 117.13ms
iter 8240: loss 8.4350, time 114.79ms
step 8250: train loss 7.0880, val loss 7.0842
saving checkpoint to out-shakespeare-char
iter 8250: loss 8.4594, time 2847.86ms
iter 8260: loss 8.7467, time 118.22ms
iter 8270: loss 8.1241, time 117.82ms
iter 8280: loss 8.2139, time 114.68ms
iter 8290: loss 8.1165, time 118.65ms
tensor(0.7409)
iter 8300: loss 8.6385, time 117.62ms
iter 8310: loss 8.3466, time 114.84ms
iter 8320: loss 7.8272, time 118.62ms
iter 8330: loss 8.0559, time 117.27ms
iter 8340: loss 8.3126, time 114.57ms
iter 8350: loss 8.3347, time 118.78ms
iter 8360: loss 8.6316, time 116.63ms
iter 8370: loss 7.9951, time 114.81ms
iter 8380: loss 8.9349, time 118.49ms
iter 8390: loss 7.7407, time 117.32ms
tensor(0.7679)
iter 8400: loss 7.8115, time 114.92ms
iter 8410: loss 7.9536, time 118.72ms
iter 8420: loss 8.1899, time 116.81ms
iter 8430: loss 9.0585, time 114.37ms
iter 8440: loss 8.4962, time 118.23ms
iter 8450: loss 8.5163, time 116.64ms
iter 8460: loss 8.1121, time 114.59ms
iter 8470: loss 8.3168, time 118.71ms
iter 8480: loss 8.7271, time 116.32ms
iter 8490: loss 8.0699, time 115.28ms
tensor(0.7939)
step 8500: train loss 7.0091, val loss 7.0329
saving checkpoint to out-shakespeare-char
iter 8500: loss 8.1052, time 2844.70ms
iter 8510: loss 8.5022, time 120.49ms
iter 8520: loss 8.6329, time 122.63ms
iter 8530: loss 8.5808, time 118.47ms
iter 8540: loss 8.7419, time 120.23ms
iter 8550: loss 8.7967, time 118.51ms
iter 8560: loss 7.8551, time 116.79ms
iter 8570: loss 8.4889, time 116.35ms
iter 8580: loss 7.3463, time 118.77ms
iter 8590: loss 8.5645, time 120.53ms
tensor(0.8187)
iter 8600: loss 8.1904, time 119.46ms
iter 8610: loss 8.2917, time 119.06ms
iter 8620: loss 7.6023, time 118.00ms
iter 8630: loss 8.6795, time 117.14ms
iter 8640: loss 9.1752, time 117.00ms
iter 8650: loss 8.0506, time 118.52ms
iter 8660: loss 8.6480, time 122.26ms
iter 8670: loss 7.9800, time 123.18ms
iter 8680: loss 8.7728, time 120.77ms
iter 8690: loss 8.4418, time 118.95ms
tensor(0.8423)
iter 8700: loss 8.7874, time 121.18ms
iter 8710: loss 7.7667, time 119.04ms
iter 8720: loss 8.4216, time 117.89ms
iter 8730: loss 7.8115, time 118.77ms
iter 8740: loss 8.2509, time 119.71ms
step 8750: train loss 7.0625, val loss 6.9912
saving checkpoint to out-shakespeare-char
iter 8750: loss 8.4848, time 2856.37ms
iter 8760: loss 7.9697, time 112.95ms
iter 8770: loss 8.0003, time 117.02ms
iter 8780: loss 8.4746, time 117.14ms
iter 8790: loss 7.8900, time 114.88ms
tensor(0.8645)
iter 8800: loss 7.3100, time 118.35ms
iter 8810: loss 8.2417, time 116.89ms
iter 8820: loss 8.7595, time 113.67ms
iter 8830: loss 7.4916, time 116.77ms
iter 8840: loss 7.4602, time 116.71ms
iter 8850: loss 7.7457, time 115.31ms
iter 8860: loss 8.2964, time 118.89ms
iter 8870: loss 8.3848, time 117.02ms
iter 8880: loss 7.8921, time 113.79ms
iter 8890: loss 8.7499, time 116.46ms
tensor(0.8853)
iter 8900: loss 8.1895, time 117.17ms
iter 8910: loss 8.0162, time 115.67ms
iter 8920: loss 8.6278, time 117.41ms
iter 8930: loss 7.8026, time 116.92ms
iter 8940: loss 8.6539, time 113.75ms
iter 8950: loss 8.7196, time 116.80ms
iter 8960: loss 8.8827, time 117.43ms
iter 8970: loss 7.8652, time 114.63ms
iter 8980: loss 7.4628, time 117.93ms
iter 8990: loss 7.9179, time 117.32ms
tensor(0.9045)
step 9000: train loss 7.0476, val loss 7.0002
saving checkpoint to out-shakespeare-char
iter 9000: loss 7.5662, time 2860.48ms
iter 9010: loss 8.7532, time 114.40ms
iter 9020: loss 8.0891, time 117.62ms
iter 9030: loss 8.0336, time 118.32ms
iter 9040: loss 8.3301, time 115.48ms
iter 9050: loss 7.7988, time 117.56ms
iter 9060: loss 7.9341, time 118.80ms
iter 9070: loss 8.8675, time 115.75ms
iter 9080: loss 8.3803, time 116.33ms
iter 9090: loss 7.8883, time 118.19ms
tensor(0.9222)
iter 9100: loss 8.2358, time 114.76ms
iter 9110: loss 7.9938, time 116.39ms
iter 9120: loss 8.6362, time 117.90ms
iter 9130: loss 7.7330, time 115.03ms
iter 9140: loss 8.3786, time 119.23ms
iter 9150: loss 7.7606, time 116.26ms
iter 9160: loss 8.9816, time 115.41ms
iter 9170: loss 7.3833, time 118.73ms
iter 9180: loss 8.7517, time 115.20ms
iter 9190: loss 7.6234, time 115.25ms
tensor(0.9382)
iter 9200: loss 7.5363, time 118.98ms
iter 9210: loss 7.3918, time 116.24ms
iter 9220: loss 8.8379, time 114.68ms
iter 9230: loss 8.2072, time 119.02ms
iter 9240: loss 7.5295, time 114.28ms
step 9250: train loss 6.9960, val loss 7.0516
saving checkpoint to out-shakespeare-char
iter 9250: loss 8.5221, time 2848.48ms
iter 9260: loss 8.2988, time 114.91ms
iter 9270: loss 8.1951, time 119.41ms
iter 9280: loss 8.0892, time 114.36ms
iter 9290: loss 8.7449, time 117.53ms
tensor(0.9524)
iter 9300: loss 8.0243, time 123.02ms
iter 9310: loss 8.0929, time 114.25ms
iter 9320: loss 7.9096, time 116.94ms
iter 9330: loss 8.2086, time 117.77ms
iter 9340: loss 8.2177, time 114.37ms
iter 9350: loss 7.1628, time 118.72ms
iter 9360: loss 8.0231, time 117.23ms
iter 9370: loss 7.4479, time 114.73ms
iter 9380: loss 7.5188, time 118.69ms
iter 9390: loss 7.7895, time 117.28ms
tensor(0.9649)
iter 9400: loss 8.5846, time 114.90ms
iter 9410: loss 8.0618, time 117.97ms
iter 9420: loss 7.8549, time 116.96ms
iter 9430: loss 8.4210, time 114.60ms
iter 9440: loss 7.7874, time 118.75ms
iter 9450: loss 7.7862, time 124.34ms
iter 9460: loss 7.7812, time 116.99ms
iter 9470: loss 7.9933, time 114.71ms
iter 9480: loss 7.8619, time 117.08ms
iter 9490: loss 8.3478, time 117.88ms
tensor(0.9755)
step 9500: train loss 7.0896, val loss 7.0937
saving checkpoint to out-shakespeare-char
iter 9500: loss 7.9824, time 2848.62ms
iter 9510: loss 8.0828, time 114.31ms
iter 9520: loss 7.4396, time 115.21ms
iter 9530: loss 7.9370, time 117.37ms
iter 9540: loss 7.9501, time 114.94ms
iter 9550: loss 7.9947, time 118.49ms
iter 9560: loss 7.8981, time 117.48ms
iter 9570: loss 7.7271, time 114.45ms
iter 9580: loss 7.7963, time 116.55ms
iter 9590: loss 8.2410, time 117.26ms
tensor(0.9843)
iter 9600: loss 8.1922, time 114.76ms
iter 9610: loss 7.3705, time 118.81ms
iter 9620: loss 7.5723, time 116.31ms
iter 9630: loss 8.3406, time 115.20ms
iter 9640: loss 7.9412, time 116.82ms
iter 9650: loss 8.3628, time 114.03ms
iter 9660: loss 7.7214, time 116.56ms
iter 9670: loss 8.4149, time 117.37ms
iter 9680: loss 7.8238, time 114.46ms
iter 9690: loss 8.4057, time 117.63ms
tensor(0.9911)
iter 9700: loss 8.4331, time 116.40ms
iter 9710: loss 8.2788, time 114.98ms
iter 9720: loss 8.7860, time 116.73ms
iter 9730: loss 8.4443, time 115.49ms
iter 9740: loss 7.9984, time 114.89ms
step 9750: train loss 7.0577, val loss 7.0318
saving checkpoint to out-shakespeare-char
iter 9750: loss 8.3799, time 2831.73ms
iter 9760: loss 8.6740, time 114.02ms
iter 9770: loss 8.6093, time 117.54ms
iter 9780: loss 8.4221, time 113.08ms
iter 9790: loss 8.1523, time 117.70ms
tensor(0.9961)
iter 9800: loss 8.1284, time 112.97ms
iter 9810: loss 7.6667, time 116.80ms
iter 9820: loss 7.8305, time 112.71ms
iter 9830: loss 7.4530, time 116.75ms
iter 9840: loss 7.6786, time 113.73ms
iter 9850: loss 7.8528, time 114.90ms
iter 9860: loss 8.4239, time 112.68ms
iter 9870: loss 8.4897, time 115.18ms
iter 9880: loss 7.5913, time 112.71ms
iter 9890: loss 7.9975, time 117.18ms
tensor(0.9990)
iter 9900: loss 7.9092, time 113.14ms
iter 9910: loss 7.1642, time 116.54ms
iter 9920: loss 8.0684, time 115.49ms
iter 9930: loss 8.1762, time 116.22ms
iter 9940: loss 7.6288, time 118.70ms
iter 9950: loss 7.2040, time 115.56ms
iter 9960: loss 7.3086, time 116.44ms
iter 9970: loss 8.2646, time 118.62ms
iter 9980: loss 8.2094, time 115.63ms
iter 9990: loss 8.2486, time 116.73ms
tensor(1.)
step 10000: train loss 7.0243, val loss 7.0519
saving checkpoint to out-shakespeare-char
iter 10000: loss 7.8855, time 2849.72ms
iter 10010: loss 8.0127, time 117.67ms
iter 10020: loss 7.6982, time 115.35ms
iter 10030: loss 8.0780, time 117.01ms
iter 10040: loss 8.1282, time 118.69ms
iter 10050: loss 8.3264, time 115.41ms
iter 10060: loss 8.4404, time 115.76ms
iter 10070: loss 8.4921, time 116.79ms
iter 10080: loss 8.4727, time 115.04ms
iter 10090: loss 7.9516, time 116.30ms
tensor(0.9990)
iter 10100: loss 8.0697, time 118.90ms
iter 10110: loss 7.9098, time 115.04ms
iter 10120: loss 7.7100, time 116.59ms
iter 10130: loss 8.3460, time 117.23ms
iter 10140: loss 7.8719, time 115.52ms
iter 10150: loss 7.5429, time 116.81ms
iter 10160: loss 7.5850, time 119.51ms
iter 10170: loss 7.7793, time 115.90ms
iter 10180: loss 8.1284, time 115.09ms
iter 10190: loss 7.4217, time 116.96ms
tensor(0.9961)
iter 10200: loss 8.7940, time 115.63ms
iter 10210: loss 7.4834, time 116.70ms
iter 10220: loss 8.0640, time 118.78ms
iter 10230: loss 7.8042, time 115.45ms
iter 10240: loss 8.2583, time 116.10ms
step 10250: train loss 7.0474, val loss 7.0282
saving checkpoint to out-shakespeare-char
iter 10250: loss 7.6722, time 2841.43ms
iter 10260: loss 8.3410, time 118.89ms
iter 10270: loss 6.9603, time 115.41ms
iter 10280: loss 7.6816, time 116.67ms
iter 10290: loss 8.1877, time 118.58ms
tensor(0.9911)
iter 10300: loss 7.9758, time 116.05ms
iter 10310: loss 7.4087, time 114.77ms
iter 10320: loss 7.5816, time 118.59ms
iter 10330: loss 8.1164, time 115.15ms
iter 10340: loss 6.9619, time 116.60ms
iter 10350: loss 7.8863, time 118.93ms
iter 10360: loss 7.3552, time 115.36ms
iter 10370: loss 7.6251, time 114.27ms
iter 10380: loss 8.3042, time 118.55ms
iter 10390: loss 7.5196, time 115.38ms
tensor(0.9843)
iter 10400: loss 8.2972, time 117.03ms
iter 10410: loss 8.2402, time 118.67ms
iter 10420: loss 7.8186, time 115.65ms
iter 10430: loss 7.4994, time 115.04ms
iter 10440: loss 8.4307, time 118.69ms
iter 10450: loss 7.5780, time 115.58ms
iter 10460: loss 8.3470, time 116.28ms
iter 10470: loss 7.7216, time 118.67ms
iter 10480: loss 7.9412, time 115.46ms
iter 10490: loss 7.7380, time 114.86ms
tensor(0.9755)
step 10500: train loss 7.1669, val loss 7.1405
saving checkpoint to out-shakespeare-char
iter 10500: loss 7.8663, time 2851.17ms
iter 10510: loss 8.0773, time 118.66ms
iter 10520: loss 8.1000, time 114.57ms
iter 10530: loss 8.1321, time 116.63ms
iter 10540: loss 7.6941, time 118.17ms
iter 10550: loss 7.6285, time 115.53ms
iter 10560: loss 7.2209, time 115.39ms
iter 10570: loss 7.9557, time 118.49ms
iter 10580: loss 7.8916, time 114.59ms
iter 10590: loss 7.3670, time 116.64ms
tensor(0.9649)
iter 10600: loss 7.0615, time 119.18ms
iter 10610: loss 7.0925, time 115.58ms
iter 10620: loss 8.5799, time 116.16ms
iter 10630: loss 7.1936, time 119.17ms
iter 10640: loss 8.3547, time 115.00ms
iter 10650: loss 7.3550, time 116.61ms
iter 10660: loss 7.9279, time 119.00ms
iter 10670: loss 7.1399, time 115.57ms
iter 10680: loss 8.0245, time 116.82ms
iter 10690: loss 7.6622, time 118.59ms
tensor(0.9524)
iter 10700: loss 7.8353, time 115.40ms
iter 10710: loss 7.4726, time 116.74ms
iter 10720: loss 7.2729, time 117.54ms
iter 10730: loss 7.6833, time 115.51ms
iter 10740: loss 7.8628, time 116.40ms
step 10750: train loss 7.0683, val loss 7.0284
saving checkpoint to out-shakespeare-char
iter 10750: loss 7.9806, time 2843.14ms
iter 10760: loss 7.7771, time 116.53ms
iter 10770: loss 7.8609, time 113.71ms
iter 10780: loss 7.7921, time 116.59ms
iter 10790: loss 7.9132, time 116.45ms
tensor(0.9382)
iter 10800: loss 7.9816, time 115.16ms
iter 10810: loss 8.5559, time 118.50ms
iter 10820: loss 8.5643, time 115.63ms
iter 10830: loss 7.9059, time 116.66ms
iter 10840: loss 7.6578, time 116.74ms
iter 10850: loss 7.3850, time 114.20ms
iter 10860: loss 7.7170, time 117.24ms
iter 10870: loss 7.9669, time 118.24ms
iter 10880: loss 7.3744, time 114.46ms
iter 10890: loss 7.5999, time 116.83ms
tensor(0.9222)
iter 10900: loss 7.4242, time 116.12ms
iter 10910: loss 6.6175, time 114.08ms
iter 10920: loss 7.6056, time 118.55ms
iter 10930: loss 7.9022, time 116.83ms
iter 10940: loss 7.5648, time 114.39ms
iter 10950: loss 8.4790, time 118.40ms
iter 10960: loss 7.6663, time 114.35ms
iter 10970: loss 7.1211, time 113.93ms
iter 10980: loss 8.1295, time 118.81ms
iter 10990: loss 7.9136, time 115.89ms
tensor(0.9045)
step 11000: train loss 7.0420, val loss 6.9743
saving checkpoint to out-shakespeare-char
iter 11000: loss 7.7708, time 2835.62ms
iter 11010: loss 7.6870, time 116.80ms
iter 11020: loss 7.4067, time 116.54ms
iter 11030: loss 7.4144, time 114.32ms
iter 11040: loss 7.3590, time 116.53ms
iter 11050: loss 8.3477, time 116.40ms
iter 11060: loss 8.2735, time 114.81ms
iter 11070: loss 8.0182, time 118.44ms
iter 11080: loss 7.8251, time 115.42ms
iter 11090: loss 7.1450, time 116.62ms
tensor(0.8853)
iter 11100: loss 8.4637, time 117.14ms
iter 11110: loss 8.1616, time 114.78ms
iter 11120: loss 7.5595, time 116.63ms
iter 11130: loss 8.6563, time 118.67ms
iter 11140: loss 7.6339, time 114.96ms
iter 11150: loss 7.8274, time 117.09ms
iter 11160: loss 7.6972, time 116.54ms
iter 11170: loss 6.8424, time 115.84ms
iter 11180: loss 8.5140, time 116.90ms
iter 11190: loss 8.0291, time 118.77ms
tensor(0.8645)
iter 11200: loss 7.9071, time 117.04ms
iter 11210: loss 7.9946, time 116.64ms
iter 11220: loss 7.5801, time 116.93ms
iter 11230: loss 7.1520, time 115.67ms
iter 11240: loss 7.4809, time 115.45ms
step 11250: train loss 7.1354, val loss 7.1288
saving checkpoint to out-shakespeare-char
iter 11250: loss 8.3722, time 2840.13ms
iter 11260: loss 7.9800, time 121.42ms
iter 11270: loss 8.3913, time 121.14ms
iter 11280: loss 7.1250, time 121.05ms
iter 11290: loss 7.6476, time 118.83ms
tensor(0.8423)
iter 11300: loss 7.7617, time 119.80ms
iter 11310: loss 7.1795, time 120.80ms
iter 11320: loss 8.5494, time 120.94ms
iter 11330: loss 7.9434, time 121.36ms
iter 11340: loss 8.4930, time 123.04ms
iter 11350: loss 7.9739, time 121.61ms
iter 11360: loss 7.6480, time 121.23ms
iter 11370: loss 7.9616, time 118.93ms
iter 11380: loss 7.6007, time 118.35ms
iter 11390: loss 7.6338, time 119.35ms
tensor(0.8187)
iter 11400: loss 7.7803, time 121.65ms
iter 11410: loss 7.4407, time 121.28ms
iter 11420: loss 7.9248, time 121.03ms
iter 11430: loss 7.2697, time 120.79ms
iter 11440: loss 7.7599, time 123.25ms
iter 11450: loss 8.0581, time 118.39ms
iter 11460: loss 7.5655, time 116.92ms
iter 11470: loss 7.8250, time 114.41ms
iter 11480: loss 8.4072, time 118.69ms
iter 11490: loss 7.2736, time 116.76ms
tensor(0.7939)
step 11500: train loss 7.1429, val loss 7.0508
saving checkpoint to out-shakespeare-char
iter 11500: loss 7.9507, time 2832.47ms
iter 11510: loss 6.9069, time 116.56ms
iter 11520: loss 8.0695, time 121.45ms
iter 11530: loss 6.7061, time 120.59ms
iter 11540: loss 7.6172, time 118.27ms
iter 11550: loss 6.4361, time 120.62ms
iter 11560: loss 7.1806, time 119.12ms
iter 11570: loss 7.5297, time 119.91ms
iter 11580: loss 7.9439, time 121.37ms
iter 11590: loss 7.9085, time 121.36ms
tensor(0.7679)
iter 11600: loss 7.8648, time 121.21ms
iter 11610: loss 7.4741, time 123.66ms
iter 11620: loss 7.7408, time 121.59ms
iter 11630: loss 7.3835, time 120.93ms
iter 11640: loss 7.5218, time 119.18ms
iter 11650: loss 7.4731, time 120.19ms
iter 11660: loss 8.1062, time 121.00ms
iter 11670: loss 7.1507, time 121.07ms
iter 11680: loss 7.3224, time 120.90ms
iter 11690: loss 8.2708, time 122.51ms
tensor(0.7409)
iter 11700: loss 8.3036, time 124.02ms
iter 11710: loss 7.8220, time 121.19ms
iter 11720: loss 7.3637, time 121.14ms
iter 11730: loss 7.6205, time 119.30ms
iter 11740: loss 8.6056, time 119.11ms
step 11750: train loss 7.1175, val loss 7.0700
saving checkpoint to out-shakespeare-char
iter 11750: loss 7.3774, time 2864.33ms
iter 11760: loss 7.2925, time 121.34ms
iter 11770: loss 6.8274, time 121.10ms
iter 11780: loss 8.7236, time 119.72ms
iter 11790: loss 7.9861, time 119.45ms
tensor(0.7129)
iter 11800: loss 7.5598, time 119.26ms
iter 11810: loss 8.0257, time 121.10ms
iter 11820: loss 7.0797, time 121.13ms
iter 11830: loss 7.7442, time 121.94ms
iter 11840: loss 7.4931, time 122.67ms
iter 11850: loss 7.2497, time 121.18ms
iter 11860: loss 7.7742, time 123.38ms
iter 11870: loss 7.4710, time 121.93ms
iter 11880: loss 6.8717, time 121.08ms
iter 11890: loss 7.9741, time 121.36ms
tensor(0.6841)
iter 11900: loss 7.1790, time 119.46ms
iter 11910: loss 7.2111, time 120.20ms
iter 11920: loss 7.1311, time 121.55ms
iter 11930: loss 7.2313, time 121.21ms
iter 11940: loss 7.6641, time 122.17ms
iter 11950: loss 7.5949, time 122.90ms
iter 11960: loss 7.3477, time 123.23ms
iter 11970: loss 7.7420, time 120.98ms
iter 11980: loss 7.4911, time 118.91ms
iter 11990: loss 7.9496, time 121.00ms
tensor(0.6545)
step 12000: train loss 6.9904, val loss 7.0939
saving checkpoint to out-shakespeare-char
iter 12000: loss 7.9853, time 2869.10ms
iter 12010: loss 8.1744, time 121.47ms
iter 12020: loss 7.5684, time 121.27ms
iter 12030: loss 7.6746, time 121.32ms
iter 12040: loss 7.5664, time 118.83ms
iter 12050: loss 7.0182, time 119.50ms
iter 12060: loss 6.9050, time 121.05ms
iter 12070: loss 8.2111, time 121.46ms
iter 12080: loss 7.2298, time 121.62ms
iter 12090: loss 7.8618, time 122.94ms
tensor(0.6243)
iter 12100: loss 7.7837, time 121.69ms
iter 12110: loss 7.6625, time 121.18ms
iter 12120: loss 7.4972, time 121.20ms
iter 12130: loss 7.8493, time 119.18ms
iter 12140: loss 7.5571, time 119.00ms
iter 12150: loss 7.4453, time 119.84ms
iter 12160: loss 7.0629, time 121.16ms
iter 12170: loss 7.8641, time 121.25ms
iter 12180: loss 7.3571, time 121.19ms
iter 12190: loss 7.6351, time 123.38ms
tensor(0.5937)
iter 12200: loss 7.5660, time 123.92ms
iter 12210: loss 7.4763, time 120.93ms
iter 12220: loss 7.8725, time 121.07ms
iter 12230: loss 8.0011, time 118.89ms
iter 12240: loss 7.8533, time 119.25ms
step 12250: train loss 7.0478, val loss 7.0477
saving checkpoint to out-shakespeare-char
iter 12250: loss 7.6611, time 2866.34ms
iter 12260: loss 7.6017, time 121.47ms
iter 12270: loss 7.2634, time 120.85ms
iter 12280: loss 7.7300, time 121.10ms
iter 12290: loss 7.7909, time 119.16ms
tensor(0.5627)
iter 12300: loss 7.4984, time 119.48ms
iter 12310: loss 6.8490, time 121.26ms
iter 12320: loss 7.5157, time 121.08ms
iter 12330: loss 8.8154, time 120.60ms
iter 12340: loss 7.8086, time 121.91ms
iter 12350: loss 7.8079, time 120.58ms
iter 12360: loss 7.6410, time 122.67ms
iter 12370: loss 7.5009, time 120.41ms
iter 12380: loss 7.2363, time 120.30ms
iter 12390: loss 7.3183, time 121.53ms
tensor(0.5314)
iter 12400: loss 7.5065, time 121.06ms
iter 12410: loss 7.9318, time 120.84ms
iter 12420: loss 7.8126, time 121.13ms
iter 12430: loss 7.1852, time 118.64ms
iter 12440: loss 8.4377, time 119.70ms
iter 12450: loss 7.1543, time 119.71ms
iter 12460: loss 7.4311, time 120.44ms
iter 12470: loss 7.6637, time 120.78ms
iter 12480: loss 7.9848, time 121.09ms
iter 12490: loss 7.7877, time 121.45ms
tensor(0.5000)
step 12500: train loss 7.0167, val loss 6.9763
saving checkpoint to out-shakespeare-char
iter 12500: loss 7.3295, time 2844.19ms
iter 12510: loss 7.2582, time 121.44ms
iter 12520: loss 7.5788, time 121.43ms
iter 12530: loss 8.1993, time 119.19ms
iter 12540: loss 7.9519, time 120.06ms
iter 12550: loss 6.4690, time 121.48ms
iter 12560: loss 7.9379, time 121.15ms
iter 12570: loss 6.5618, time 123.26ms
iter 12580: loss 7.6524, time 123.50ms
iter 12590: loss 7.4018, time 119.47ms
tensor(0.4686)
iter 12600: loss 7.7152, time 121.64ms
iter 12610: loss 7.3771, time 119.57ms
iter 12620: loss 7.7381, time 120.58ms
iter 12630: loss 7.3189, time 121.22ms
iter 12640: loss 7.7243, time 121.36ms
iter 12650: loss 7.2174, time 120.59ms
iter 12660: loss 7.2116, time 123.58ms
iter 12670: loss 8.8858, time 121.26ms
iter 12680: loss 7.1643, time 121.10ms
iter 12690: loss 8.2715, time 121.57ms
tensor(0.4373)
iter 12700: loss 7.6889, time 118.72ms
iter 12710: loss 7.8088, time 118.94ms
iter 12720: loss 6.9648, time 117.98ms
iter 12730: loss 7.9387, time 119.29ms
iter 12740: loss 7.7380, time 119.66ms
step 12750: train loss 6.9493, val loss 7.0187
saving checkpoint to out-shakespeare-char
iter 12750: loss 7.5324, time 2844.25ms
iter 12760: loss 7.6010, time 120.65ms
iter 12770: loss 7.2284, time 122.23ms
iter 12780: loss 7.3520, time 120.63ms
iter 12790: loss 8.0538, time 120.81ms
tensor(0.4063)
iter 12800: loss 7.9606, time 121.06ms
iter 12810: loss 7.5866, time 121.50ms
iter 12820: loss 7.5014, time 120.07ms
iter 12830: loss 6.9650, time 119.94ms
iter 12840: loss 7.3359, time 121.73ms
iter 12850: loss 6.9833, time 122.96ms
iter 12860: loss 7.4498, time 123.07ms
iter 12870: loss 7.3240, time 121.85ms
iter 12880: loss 7.1336, time 121.75ms
iter 12890: loss 7.4000, time 120.20ms
tensor(0.3757)
iter 12900: loss 8.3999, time 118.74ms
iter 12910: loss 7.9205, time 121.76ms
iter 12920: loss 7.9133, time 123.32ms
iter 12930: loss 7.1080, time 122.14ms
iter 12940: loss 8.1927, time 121.46ms
iter 12950: loss 7.1808, time 121.83ms
iter 12960: loss 7.4267, time 120.51ms
iter 12970: loss 6.7195, time 120.67ms
iter 12980: loss 7.9368, time 121.21ms
iter 12990: loss 7.7253, time 123.29ms
tensor(0.3455)
step 13000: train loss 6.9392, val loss 6.8749
saving checkpoint to out-shakespeare-char
iter 13000: loss 7.4808, time 2863.42ms
iter 13010: loss 8.0692, time 121.72ms
iter 13020: loss 7.3778, time 123.16ms
iter 13030: loss 7.8709, time 122.17ms
iter 13040: loss 7.3754, time 119.45ms
iter 13050: loss 7.2737, time 121.58ms
iter 13060: loss 7.1704, time 120.24ms
iter 13070: loss 8.0272, time 121.59ms
iter 13080: loss 6.6612, time 124.36ms
iter 13090: loss 7.3486, time 121.67ms
tensor(0.3159)
iter 13100: loss 7.9428, time 121.69ms
iter 13110: loss 7.5865, time 119.29ms
iter 13120: loss 6.5610, time 120.73ms
iter 13130: loss 7.6223, time 122.05ms
iter 13140: loss 7.4778, time 123.00ms
iter 13150: loss 7.6264, time 124.02ms
iter 13160: loss 7.5808, time 121.75ms
iter 13170: loss 7.4731, time 121.49ms
iter 13180: loss 8.1850, time 120.79ms
iter 13190: loss 7.3190, time 119.47ms
tensor(0.2871)
iter 13200: loss 6.9559, time 122.51ms
iter 13210: loss 7.7337, time 124.16ms
iter 13220: loss 6.3887, time 121.45ms
iter 13230: loss 8.6293, time 121.72ms
iter 13240: loss 6.8968, time 120.48ms
step 13250: train loss 6.9278, val loss 6.9534
saving checkpoint to out-shakespeare-char
iter 13250: loss 7.8416, time 2824.06ms
iter 13260: loss 7.5310, time 121.30ms
iter 13270: loss 6.8554, time 123.82ms
iter 13280: loss 7.3959, time 121.70ms
iter 13290: loss 8.1450, time 121.55ms
tensor(0.2591)
iter 13300: loss 7.2963, time 119.32ms
iter 13310: loss 7.6208, time 121.86ms
iter 13320: loss 7.6522, time 121.27ms
iter 13330: loss 7.9352, time 121.49ms
iter 13340: loss 7.0121, time 121.50ms
iter 13350: loss 7.7389, time 121.79ms
iter 13360: loss 7.5524, time 121.79ms
iter 13370: loss 6.7926, time 119.63ms
iter 13380: loss 7.1300, time 121.57ms
iter 13390: loss 6.7146, time 120.70ms
tensor(0.2321)
iter 13400: loss 7.8160, time 123.99ms
iter 13410: loss 6.8313, time 121.55ms
iter 13420: loss 6.4524, time 122.68ms
iter 13430: loss 8.1706, time 119.76ms
iter 13440: loss 6.7090, time 120.80ms
iter 13450: loss 7.3613, time 121.69ms
iter 13460: loss 6.9886, time 123.30ms
iter 13470: loss 7.5740, time 121.77ms
iter 13480: loss 7.0369, time 121.89ms
iter 13490: loss 7.0120, time 121.41ms
tensor(0.2061)
step 13500: train loss 6.8199, val loss 6.8326
saving checkpoint to out-shakespeare-char
iter 13500: loss 7.1463, time 2823.22ms
iter 13510: loss 7.0587, time 121.86ms
iter 13520: loss 7.5946, time 122.43ms
iter 13530: loss 7.5539, time 120.73ms
iter 13540: loss 7.1084, time 121.81ms
iter 13550: loss 6.4016, time 121.59ms
iter 13560: loss 7.2973, time 121.94ms
iter 13570: loss 6.9455, time 119.72ms
iter 13580: loss 7.6106, time 121.58ms
iter 13590: loss 6.7755, time 119.47ms
tensor(0.1813)
iter 13600: loss 6.6847, time 121.84ms
iter 13610: loss 7.8223, time 124.17ms
iter 13620: loss 7.7382, time 120.90ms
iter 13630: loss 7.5207, time 121.96ms
iter 13640: loss 6.6562, time 120.14ms
iter 13650: loss 7.7918, time 121.93ms
iter 13660: loss 7.8443, time 122.36ms
iter 13670: loss 6.5393, time 121.64ms
iter 13680: loss 7.6303, time 120.66ms
iter 13690: loss 7.1728, time 121.04ms
tensor(0.1577)
iter 13700: loss 7.7730, time 122.27ms
iter 13710: loss 6.8895, time 120.78ms
iter 13720: loss 6.9426, time 121.79ms
iter 13730: loss 7.4175, time 120.69ms
iter 13740: loss 6.8973, time 124.07ms
step 13750: train loss 6.7673, val loss 6.7919
saving checkpoint to out-shakespeare-char
iter 13750: loss 7.0716, time 2851.37ms
iter 13760: loss 7.9303, time 120.91ms
iter 13770: loss 7.4973, time 122.40ms
iter 13780: loss 6.1362, time 123.25ms
iter 13790: loss 7.0303, time 122.21ms
tensor(0.1355)
iter 13800: loss 7.5379, time 121.94ms
iter 13810: loss 6.3140, time 121.63ms
iter 13820: loss 7.3008, time 119.99ms
iter 13830: loss 7.1117, time 121.31ms
iter 13840: loss 6.8393, time 122.24ms
iter 13850: loss 6.8491, time 123.88ms
iter 13860: loss 6.6227, time 121.88ms
iter 13870: loss 7.5073, time 119.67ms
iter 13880: loss 7.0810, time 119.58ms
iter 13890: loss 6.8365, time 121.52ms
tensor(0.1147)
iter 13900: loss 6.7829, time 122.47ms
iter 13910: loss 8.2426, time 123.92ms
iter 13920: loss 7.2045, time 121.81ms
iter 13930: loss 6.8297, time 121.85ms
iter 13940: loss 7.1255, time 119.48ms
iter 13950: loss 7.4125, time 120.85ms
iter 13960: loss 6.5739, time 121.76ms
iter 13970: loss 7.1420, time 123.23ms
iter 13980: loss 7.4181, time 122.74ms
iter 13990: loss 6.7620, time 121.70ms
tensor(0.0955)
step 14000: train loss 6.6943, val loss 6.7274
saving checkpoint to out-shakespeare-char
iter 14000: loss 7.5017, time 2835.29ms
iter 14010: loss 6.9717, time 120.07ms
iter 14020: loss 6.5978, time 120.84ms
iter 14030: loss 7.6154, time 120.86ms
iter 14040: loss 7.1233, time 121.55ms
iter 14050: loss 7.3404, time 122.44ms
iter 14060: loss 6.8368, time 123.82ms
iter 14070: loss 7.1060, time 120.03ms
iter 14080: loss 7.3122, time 121.72ms
iter 14090: loss 6.8844, time 119.73ms
tensor(0.0778)
iter 14100: loss 7.1428, time 121.61ms
iter 14110: loss 7.1556, time 121.64ms
iter 14120: loss 6.8231, time 123.04ms
iter 14130: loss 7.0774, time 121.79ms
iter 14140: loss 7.6301, time 121.65ms
iter 14150: loss 7.5704, time 121.64ms
iter 14160: loss 7.0865, time 119.92ms
iter 14170: loss 6.2813, time 121.24ms
iter 14180: loss 6.6495, time 121.85ms
iter 14190: loss 6.8948, time 123.86ms
tensor(0.0618)
iter 14200: loss 6.6979, time 122.00ms
iter 14210: loss 6.3022, time 119.74ms
iter 14220: loss 6.6946, time 119.93ms
iter 14230: loss 6.9782, time 121.06ms
iter 14240: loss 6.6213, time 121.83ms
step 14250: train loss 6.6213, val loss 6.6338
saving checkpoint to out-shakespeare-char
iter 14250: loss 6.5443, time 2848.39ms
iter 14260: loss 6.9610, time 121.89ms
iter 14270: loss 7.0981, time 120.34ms
iter 14280: loss 6.6030, time 119.04ms
iter 14290: loss 7.8016, time 121.88ms
tensor(0.0476)
iter 14300: loss 6.6683, time 124.22ms
iter 14310: loss 7.6037, time 121.60ms
iter 14320: loss 6.7786, time 121.83ms
iter 14330: loss 6.4652, time 120.28ms
iter 14340: loss 6.7784, time 121.91ms
iter 14350: loss 7.3649, time 122.30ms
iter 14360: loss 6.9177, time 121.78ms
iter 14370: loss 6.1937, time 121.77ms
iter 14380: loss 6.5451, time 121.58ms
iter 14390: loss 6.8611, time 119.58ms
tensor(0.0351)
iter 14400: loss 6.5504, time 120.62ms
iter 14410: loss 5.9524, time 121.80ms
iter 14420: loss 7.2692, time 121.22ms
iter 14430: loss 7.1599, time 123.91ms
iter 14440: loss 7.4392, time 121.85ms
iter 14450: loss 7.1758, time 119.12ms
iter 14460: loss 7.1734, time 119.72ms
iter 14470: loss 7.4100, time 121.18ms
iter 14480: loss 6.3505, time 122.59ms
iter 14490: loss 7.1048, time 123.59ms
tensor(0.0245)
step 14500: train loss 6.5262, val loss 6.5671
saving checkpoint to out-shakespeare-char
iter 14500: loss 7.1734, time 2850.25ms
iter 14510: loss 7.3199, time 119.53ms
iter 14520: loss 6.8860, time 121.04ms
iter 14530: loss 7.0751, time 121.90ms
iter 14540: loss 7.1313, time 123.16ms
iter 14550: loss 6.5744, time 123.58ms
iter 14560: loss 6.2818, time 120.86ms
iter 14570: loss 6.9988, time 121.81ms
iter 14580: loss 6.0089, time 119.43ms
iter 14590: loss 7.2027, time 119.37ms
tensor(0.0157)
iter 14600: loss 6.9185, time 119.45ms
iter 14610: loss 7.3626, time 121.36ms
iter 14620: loss 6.2681, time 120.94ms
iter 14630: loss 7.7115, time 123.26ms
iter 14640: loss 7.6922, time 123.54ms
iter 14650: loss 7.1370, time 121.50ms
iter 14660: loss 6.7942, time 119.81ms
iter 14670: loss 6.7087, time 119.84ms
iter 14680: loss 7.3855, time 121.15ms
iter 14690: loss 6.7836, time 122.47ms
tensor(0.0089)
iter 14700: loss 7.0631, time 121.34ms
iter 14710: loss 6.8263, time 123.60ms
iter 14720: loss 6.9574, time 122.22ms
iter 14730: loss 6.7094, time 119.70ms
iter 14740: loss 7.4421, time 119.37ms
step 14750: train loss 6.5388, val loss 6.5567
saving checkpoint to out-shakespeare-char
iter 14750: loss 6.6968, time 2814.54ms
iter 14760: loss 7.3030, time 120.15ms
iter 14770: loss 7.0676, time 121.17ms
iter 14780: loss 6.8633, time 120.36ms
iter 14790: loss 6.6234, time 122.31ms
tensor(0.0039)
iter 14800: loss 7.1520, time 123.03ms
iter 14810: loss 6.8118, time 121.25ms
iter 14820: loss 6.6717, time 123.02ms
iter 14830: loss 6.2599, time 121.40ms
iter 14840: loss 6.8291, time 121.46ms
iter 14850: loss 6.3315, time 118.75ms
iter 14860: loss 6.9005, time 118.79ms
iter 14870: loss 7.7379, time 118.95ms
iter 14880: loss 7.0217, time 120.08ms
iter 14890: loss 6.3129, time 121.03ms
tensor(0.0010)
iter 14900: loss 7.0525, time 121.15ms
iter 14910: loss 6.4692, time 121.88ms
iter 14920: loss 7.1145, time 120.81ms
iter 14930: loss 6.3664, time 122.55ms
iter 14940: loss 6.7581, time 123.41ms
iter 14950: loss 7.4529, time 122.68ms
iter 14960: loss 6.6605, time 123.11ms
iter 14970: loss 7.0678, time 121.21ms
iter 14980: loss 6.7323, time 120.13ms
iter 14990: loss 7.0752, time 120.04ms
tensor(0.0010)
step 15000: train loss 6.5255, val loss 6.5001
saving checkpoint to out-shakespeare-char
iter 15000: loss 6.8911, time 2838.02ms
iter 15010: loss 6.7468, time 119.33ms
iter 15020: loss 6.0672, time 120.29ms
iter 15030: loss 6.6652, time 118.69ms
iter 15040: loss 6.8297, time 120.89ms
iter 15050: loss 6.6510, time 121.34ms
iter 15060: loss 7.2245, time 120.86ms
iter 15070: loss 7.7627, time 120.48ms
iter 15080: loss 7.1496, time 120.69ms
iter 15090: loss 7.0292, time 123.06ms
tensor(0.0010)
iter 15100: loss 7.3712, time 123.16ms
iter 15110: loss 5.8560, time 122.29ms
iter 15120: loss 6.9697, time 122.63ms
iter 15130: loss 6.8139, time 120.80ms
iter 15140: loss 7.6309, time 122.67ms
iter 15150: loss 6.4211, time 122.53ms
iter 15160: loss 6.8663, time 120.69ms
iter 15170: loss 7.1872, time 119.80ms
iter 15180: loss 6.2160, time 121.05ms
iter 15190: loss 6.7112, time 118.68ms
tensor(0.0039)
iter 15200: loss 6.9767, time 118.91ms
iter 15210: loss 7.6464, time 118.77ms
iter 15220: loss 6.1795, time 118.66ms
iter 15230: loss 7.5409, time 118.22ms
iter 15240: loss 7.1771, time 119.44ms
step 15250: train loss 6.4887, val loss 6.5405
saving checkpoint to out-shakespeare-char
iter 15250: loss 6.1932, time 2845.82ms
iter 15260: loss 7.6531, time 123.41ms
iter 15270: loss 7.1152, time 122.91ms
iter 15280: loss 6.6102, time 122.62ms
iter 15290: loss 7.2222, time 120.47ms
tensor(0.0089)
iter 15300: loss 7.2472, time 121.26ms
iter 15310: loss 6.6577, time 120.92ms
iter 15320: loss 6.5115, time 120.59ms
iter 15330: loss 6.6992, time 120.80ms
iter 15340: loss 6.9238, time 120.73ms
iter 15350: loss 6.7263, time 118.73ms
iter 15360: loss 7.0878, time 119.23ms
iter 15370: loss 7.0829, time 118.94ms
iter 15380: loss 7.3732, time 118.01ms
iter 15390: loss 6.5586, time 118.65ms
tensor(0.0157)
iter 15400: loss 7.1481, time 119.16ms
iter 15410: loss 6.7276, time 117.52ms
iter 15420: loss 6.7784, time 119.03ms
iter 15430: loss 6.5606, time 118.72ms
iter 15440: loss 6.7823, time 119.29ms
iter 15450: loss 7.5239, time 120.31ms
iter 15460: loss 7.6208, time 119.76ms
iter 15470: loss 6.1625, time 121.05ms
iter 15480: loss 6.9673, time 121.08ms
iter 15490: loss 6.9440, time 121.89ms
tensor(0.0245)
step 15500: train loss 6.5791, val loss 6.5228
saving checkpoint to out-shakespeare-char
iter 15500: loss 6.8324, time 2826.67ms
iter 15510: loss 6.8207, time 123.00ms
iter 15520: loss 6.6693, time 122.55ms
iter 15530: loss 7.0435, time 121.19ms
iter 15540: loss 6.9624, time 117.87ms
iter 15550: loss 5.9847, time 117.31ms
iter 15560: loss 6.8765, time 116.49ms
iter 15570: loss 7.1552, time 114.56ms
iter 15580: loss 6.6430, time 118.68ms
iter 15590: loss 7.4313, time 116.75ms
tensor(0.0351)
iter 15600: loss 7.4845, time 115.10ms
iter 15610: loss 6.9181, time 116.66ms
iter 15620: loss 6.4463, time 115.90ms
iter 15630: loss 7.9760, time 121.71ms
iter 15640: loss 7.5462, time 115.71ms
iter 15650: loss 6.4854, time 114.54ms
iter 15660: loss 6.5008, time 119.19ms
iter 15670: loss 7.5382, time 120.99ms
iter 15680: loss 7.1995, time 117.18ms
iter 15690: loss 7.5205, time 114.61ms
tensor(0.0476)
iter 15700: loss 7.1112, time 118.88ms
iter 15710: loss 6.4392, time 116.54ms
iter 15720: loss 6.8255, time 114.54ms
iter 15730: loss 7.1832, time 123.47ms
iter 15740: loss 6.9240, time 114.88ms
step 15750: train loss 6.5890, val loss 6.6015
saving checkpoint to out-shakespeare-char
iter 15750: loss 6.9820, time 2845.24ms
iter 15760: loss 7.1720, time 118.62ms
iter 15770: loss 6.7027, time 114.95ms
iter 15780: loss 6.7428, time 114.54ms
iter 15790: loss 7.5167, time 118.55ms
tensor(0.0618)
iter 15800: loss 6.9116, time 115.23ms
iter 15810: loss 6.9104, time 117.22ms
iter 15820: loss 6.4240, time 118.94ms
iter 15830: loss 7.0830, time 123.06ms
iter 15840: loss 6.9421, time 122.98ms
iter 15850: loss 7.1385, time 121.09ms
iter 15860: loss 7.0574, time 121.28ms
iter 15870: loss 7.4758, time 120.91ms
iter 15880: loss 6.4289, time 119.01ms
iter 15890: loss 7.0152, time 118.90ms
tensor(0.0778)
iter 15900: loss 6.9190, time 119.22ms
iter 15910: loss 6.8461, time 118.26ms
iter 15920: loss 7.0183, time 118.95ms
iter 15930: loss 6.7866, time 119.60ms
iter 15940: loss 8.0863, time 120.45ms
iter 15950: loss 7.3736, time 121.39ms
iter 15960: loss 7.4305, time 124.90ms
iter 15970: loss 7.3406, time 123.34ms
iter 15980: loss 7.1170, time 119.49ms
iter 15990: loss 7.3650, time 119.30ms
tensor(0.0955)
step 16000: train loss 6.7583, val loss 6.7402
saving checkpoint to out-shakespeare-char
iter 16000: loss 6.7319, time 2860.71ms
iter 16010: loss 7.7033, time 123.31ms
iter 16020: loss 7.0281, time 123.47ms
iter 16030: loss 7.6179, time 121.53ms
iter 16040: loss 7.7849, time 119.29ms
iter 16050: loss 6.6980, time 118.92ms
iter 16060: loss 7.3894, time 119.09ms
iter 16070: loss 7.2081, time 120.10ms
iter 16080: loss 7.3150, time 120.31ms
iter 16090: loss 7.5055, time 122.12ms
tensor(0.1147)
iter 16100: loss 7.0141, time 121.84ms
iter 16110: loss 7.6002, time 123.64ms
iter 16120: loss 6.4551, time 122.53ms
iter 16130: loss 6.9484, time 121.57ms
iter 16140: loss 7.6926, time 119.77ms
iter 16150: loss 7.6207, time 118.56ms
iter 16160: loss 7.5006, time 119.35ms
iter 16170: loss 7.3066, time 120.41ms
iter 16180: loss 7.3216, time 121.05ms
iter 16190: loss 7.0567, time 123.27ms
tensor(0.1355)
iter 16200: loss 7.6042, time 123.77ms
iter 16210: loss 6.9291, time 123.66ms
iter 16220: loss 7.3711, time 121.22ms
iter 16230: loss 6.6763, time 119.17ms
iter 16240: loss 7.9114, time 119.20ms
step 16250: train loss 6.8420, val loss 6.8548
saving checkpoint to out-shakespeare-char
iter 16250: loss 7.6126, time 2848.90ms
iter 16260: loss 7.1033, time 123.19ms
iter 16270: loss 6.9781, time 124.27ms
iter 16280: loss 6.7272, time 121.93ms
iter 16290: loss 7.5614, time 119.10ms
tensor(0.1577)
iter 16300: loss 6.6894, time 119.62ms
iter 16310: loss 7.2627, time 119.14ms
iter 16320: loss 7.2495, time 118.98ms
iter 16330: loss 7.2747, time 119.44ms
iter 16340: loss 6.9803, time 120.71ms
iter 16350: loss 7.3405, time 119.77ms
iter 16360: loss 7.0458, time 122.55ms
iter 16370: loss 7.0515, time 123.16ms
iter 16380: loss 7.6608, time 122.49ms
iter 16390: loss 7.6938, time 123.14ms
tensor(0.1813)
iter 16400: loss 7.4803, time 121.59ms
iter 16410: loss 7.1174, time 120.92ms
iter 16420: loss 7.4024, time 119.30ms
iter 16430: loss 7.5994, time 119.06ms
iter 16440: loss 8.4663, time 120.23ms
iter 16450: loss 8.6159, time 120.74ms
iter 16460: loss 7.6310, time 121.98ms
iter 16470: loss 7.8543, time 123.47ms
iter 16480: loss 7.6788, time 121.18ms
iter 16490: loss 7.4572, time 123.04ms
tensor(0.2061)
step 16500: train loss 7.0128, val loss 6.9759
saving checkpoint to out-shakespeare-char
iter 16500: loss 7.4386, time 2844.93ms
iter 16510: loss 8.0932, time 114.63ms
iter 16520: loss 7.5141, time 113.33ms
iter 16530: loss 7.3166, time 116.12ms
iter 16540: loss 7.2464, time 112.91ms
iter 16550: loss 7.6156, time 115.73ms
iter 16560: loss 7.7975, time 112.76ms
iter 16570: loss 7.7274, time 115.79ms
iter 16580: loss 7.3140, time 114.81ms
iter 16590: loss 7.1123, time 115.45ms
tensor(0.2321)
iter 16600: loss 7.7937, time 115.54ms
iter 16610: loss 7.5005, time 114.87ms
iter 16620: loss 8.0883, time 114.79ms
iter 16630: loss 7.5836, time 115.58ms
iter 16640: loss 8.1733, time 115.59ms
iter 16650: loss 7.3985, time 114.39ms
iter 16660: loss 7.2703, time 114.94ms
iter 16670: loss 8.0411, time 113.61ms
iter 16680: loss 8.2138, time 117.06ms
iter 16690: loss 7.6724, time 116.74ms
tensor(0.2591)
iter 16700: loss 7.7306, time 115.29ms
iter 16710: loss 7.1760, time 116.55ms
iter 16720: loss 7.6389, time 116.68ms
iter 16730: loss 7.6176, time 113.85ms
iter 16740: loss 7.1283, time 118.58ms
step 16750: train loss 7.1325, val loss 7.1031
saving checkpoint to out-shakespeare-char
iter 16750: loss 7.7386, time 2853.46ms
iter 16760: loss 7.7251, time 113.78ms
iter 16770: loss 7.2073, time 114.89ms
iter 16780: loss 7.9892, time 117.11ms
iter 16790: loss 8.3906, time 114.28ms
tensor(0.2871)
iter 16800: loss 7.5945, time 118.95ms
iter 16810: loss 7.8206, time 116.28ms
iter 16820: loss 7.5959, time 113.25ms
iter 16830: loss 7.6039, time 116.68ms
iter 16840: loss 8.0706, time 114.62ms
iter 16850: loss 7.4207, time 116.63ms
iter 16860: loss 8.2209, time 116.92ms
iter 16870: loss 7.7459, time 114.57ms
iter 16880: loss 8.6654, time 118.47ms
iter 16890: loss 7.8474, time 116.05ms
tensor(0.3159)
iter 16900: loss 7.4202, time 115.29ms
iter 16910: loss 8.2057, time 117.04ms
iter 16920: loss 7.4835, time 114.29ms
iter 16930: loss 7.3920, time 116.09ms
iter 16940: loss 8.7134, time 117.25ms
iter 16950: loss 7.6970, time 114.55ms
iter 16960: loss 7.5762, time 118.57ms
iter 16970: loss 7.6567, time 114.52ms
iter 16980: loss 7.6512, time 114.00ms
iter 16990: loss 7.3432, time 118.89ms
tensor(0.3455)
step 17000: train loss 7.1941, val loss 7.1112
saving checkpoint to out-shakespeare-char
iter 17000: loss 7.8733, time 2848.86ms
iter 17010: loss 7.8316, time 115.52ms
iter 17020: loss 8.4836, time 115.59ms
iter 17030: loss 7.7272, time 116.63ms
iter 17040: loss 7.7637, time 115.80ms
iter 17050: loss 7.7917, time 116.37ms
iter 17060: loss 8.0632, time 118.45ms
iter 17070: loss 8.3164, time 115.28ms
iter 17080: loss 8.0576, time 116.53ms
iter 17090: loss 8.0011, time 117.02ms
tensor(0.3757)
iter 17100: loss 8.0200, time 115.96ms
iter 17110: loss 7.6750, time 116.61ms
iter 17120: loss 7.6154, time 118.10ms
iter 17130: loss 8.1344, time 114.41ms
iter 17140: loss 7.9657, time 118.38ms
iter 17150: loss 7.8064, time 116.71ms
iter 17160: loss 7.8196, time 114.49ms
iter 17170: loss 7.9173, time 116.68ms
iter 17180: loss 7.8566, time 116.36ms
iter 17190: loss 8.0433, time 114.89ms
tensor(0.4063)
iter 17200: loss 8.0880, time 118.98ms
iter 17210: loss 7.7532, time 115.56ms
iter 17220: loss 7.3666, time 115.05ms
iter 17230: loss 8.4518, time 116.67ms
iter 17240: loss 8.1667, time 114.29ms
step 17250: train loss 7.3247, val loss 7.2808
saving checkpoint to out-shakespeare-char
iter 17250: loss 8.3301, time 2832.48ms
iter 17260: loss 7.6679, time 117.99ms
iter 17270: loss 7.9813, time 115.73ms
iter 17280: loss 8.5054, time 115.91ms
iter 17290: loss 7.6986, time 117.25ms
tensor(0.4373)
iter 17300: loss 8.3936, time 115.60ms
iter 17310: loss 7.9846, time 116.66ms
iter 17320: loss 8.7099, time 118.50ms
iter 17330: loss 8.4966, time 114.87ms
iter 17340: loss 7.3316, time 116.16ms
iter 17350: loss 7.7713, time 116.70ms
iter 17360: loss 8.6091, time 114.60ms
iter 17370: loss 8.0541, time 116.64ms
iter 17380: loss 7.5900, time 116.41ms
iter 17390: loss 8.1397, time 114.08ms
tensor(0.4686)
iter 17400: loss 7.6620, time 119.22ms
iter 17410: loss 8.1811, time 114.96ms
iter 17420: loss 7.4500, time 116.68ms
iter 17430: loss 8.7993, time 116.39ms
iter 17440: loss 7.5295, time 114.52ms
iter 17450: loss 8.1398, time 115.92ms
iter 17460: loss 7.6746, time 117.10ms
iter 17470: loss 8.2069, time 114.84ms
iter 17480: loss 9.1763, time 118.64ms
iter 17490: loss 9.3873, time 114.64ms
tensor(0.5000)
step 17500: train loss 8.6056, val loss 8.5983
saving checkpoint to out-shakespeare-char
iter 17500: loss 9.2988, time 2841.01ms
iter 17510: loss 23.1366, time 115.10ms
iter 17520: loss 74.2423, time 118.99ms
iter 17530: loss 48.4226, time 116.80ms
iter 17540: loss 39.1338, time 114.91ms
iter 17550: loss 33.2739, time 116.10ms
iter 17560: loss 31.2567, time 117.03ms
iter 17570: loss 62.3238, time 115.26ms
iter 17580: loss 24.3613, time 118.80ms
iter 17590: loss 41.8452, time 115.83ms
tensor(0.5314)
iter 17600: loss 36.1515, time 115.86ms
iter 17610: loss 43.9813, time 115.92ms
iter 17620: loss 30.4098, time 116.70ms
iter 17630: loss 51.3615, time 116.24ms
iter 17640: loss 49.7100, time 118.87ms
iter 17650: loss 32.0336, time 116.17ms
iter 17660: loss 36.9642, time 115.23ms
iter 17670: loss 52.5887, time 116.67ms
iter 17680: loss 80.4601, time 116.76ms
iter 17690: loss 63.6735, time 114.61ms
tensor(0.5627)
iter 17700: loss 49.3600, time 119.69ms
iter 17710: loss 73.6157, time 115.93ms
iter 17720: loss 34.1876, time 116.17ms
iter 17730: loss 56.6920, time 117.16ms
iter 17740: loss 53.0825, time 116.70ms
step 17750: train loss 49.1031, val loss 48.7824
saving checkpoint to out-shakespeare-char
iter 17750: loss 47.8613, time 2826.50ms
iter 17760: loss 75.5379, time 115.90ms
iter 17770: loss 48.8970, time 117.57ms
iter 17780: loss 83.4359, time 115.67ms
iter 17790: loss 60.0907, time 116.79ms
tensor(0.5937)
iter 17800: loss 78.4939, time 118.48ms
iter 17810: loss 83.6105, time 114.86ms
iter 17820: loss 64.3124, time 117.04ms
iter 17830: loss 56.2641, time 117.59ms
iter 17840: loss 74.1583, time 115.46ms
iter 17850: loss 73.5347, time 116.71ms
iter 17860: loss 51.4487, time 117.79ms
iter 17870: loss 56.5623, time 115.34ms
iter 17880: loss 64.1575, time 116.87ms
iter 17890: loss 104.9051, time 117.38ms
tensor(0.6243)
iter 17900: loss 58.8369, time 116.41ms
iter 17910: loss 52.4498, time 117.03ms
iter 17920: loss 58.1520, time 117.42ms
iter 17930: loss 65.1391, time 115.78ms
iter 17940: loss 72.2682, time 116.73ms
iter 17950: loss 74.0142, time 117.75ms
iter 17960: loss 69.4578, time 115.55ms
iter 17970: loss 71.7526, time 116.85ms
iter 17980: loss 68.2242, time 118.58ms
iter 17990: loss 74.5090, time 115.45ms
tensor(0.6545)
step 18000: train loss 74.8187, val loss 75.2543
saving checkpoint to out-shakespeare-char
iter 18000: loss 79.4491, time 2840.19ms
iter 18010: loss 72.6844, time 116.96ms
iter 18020: loss 77.5410, time 115.31ms
iter 18030: loss 61.2974, time 114.62ms
iter 18040: loss 71.5079, time 115.88ms
iter 18050: loss 92.9168, time 118.22ms
iter 18060: loss 74.3510, time 115.42ms
iter 18070: loss 81.8116, time 116.04ms
iter 18080: loss 67.9922, time 116.58ms
iter 18090: loss 53.5861, time 115.81ms
tensor(0.6841)
iter 18100: loss 75.7579, time 117.53ms
iter 18110: loss 87.4719, time 117.91ms
iter 18120: loss 81.5882, time 114.89ms
iter 18130: loss 97.2019, time 116.74ms
iter 18140: loss 61.5092, time 115.53ms
iter 18150: loss 77.8142, time 115.01ms
iter 18160: loss 71.2625, time 118.72ms
iter 18170: loss 68.2930, time 119.15ms
iter 18180: loss 121.5726, time 116.62ms
iter 18190: loss 89.8219, time 115.73ms
tensor(0.7129)
iter 18200: loss 85.1611, time 115.40ms
iter 18210: loss 86.5308, time 115.72ms
iter 18220: loss 107.5864, time 116.98ms
iter 18230: loss 124.7284, time 118.12ms
iter 18240: loss 87.2201, time 114.67ms
step 18250: train loss 120.1130, val loss 121.3533
saving checkpoint to out-shakespeare-char
iter 18250: loss 112.3022, time 2847.50ms
iter 18260: loss 132.0016, time 114.64ms
iter 18270: loss 114.7186, time 116.24ms
iter 18280: loss 70.8886, time 115.70ms
iter 18290: loss 111.6422, time 117.26ms
tensor(0.7409)
iter 18300: loss 102.7630, time 118.02ms
iter 18310: loss 82.5055, time 116.12ms
iter 18320: loss 111.8329, time 114.65ms
iter 18330: loss 98.0382, time 116.26ms
iter 18340: loss 92.3543, time 115.42ms
iter 18350: loss 91.6027, time 116.60ms
iter 18360: loss 85.2162, time 117.91ms
iter 18370: loss 92.9552, time 115.40ms
iter 18380: loss 101.0926, time 114.60ms
iter 18390: loss 109.8659, time 116.73ms
tensor(0.7679)
iter 18400: loss 94.5908, time 116.02ms
iter 18410: loss 99.9003, time 117.04ms
iter 18420: loss 94.5130, time 117.98ms
iter 18430: loss 101.8949, time 115.39ms
iter 18440: loss 103.6223, time 114.51ms
iter 18450: loss 96.7543, time 117.01ms
iter 18460: loss 81.7266, time 115.42ms
iter 18470: loss 115.1679, time 116.78ms
iter 18480: loss 162.7666, time 117.93ms
iter 18490: loss 105.7551, time 114.56ms
tensor(0.7939)
step 18500: train loss 77.8396, val loss 79.4095
saving checkpoint to out-shakespeare-char
iter 18500: loss 98.1805, time 2847.56ms
iter 18510: loss 93.5491, time 116.39ms
iter 18520: loss 130.6469, time 117.60ms
iter 18530: loss 114.5955, time 114.07ms
iter 18540: loss 82.4900, time 117.00ms
iter 18550: loss 121.7948, time 117.63ms
iter 18560: loss 117.3658, time 114.87ms
iter 18570: loss 110.7862, time 121.42ms
iter 18580: loss 98.9595, time 120.22ms
iter 18590: loss 114.6837, time 120.29ms
tensor(0.8187)
iter 18600: loss 114.4525, time 120.02ms
iter 18610: loss 138.6301, time 119.89ms
iter 18620: loss 152.1382, time 120.06ms
iter 18630: loss 135.0443, time 120.36ms
iter 18640: loss 113.2038, time 120.36ms
iter 18650: loss 98.2041, time 119.46ms
iter 18660: loss 120.6809, time 120.42ms
iter 18670: loss 150.2263, time 118.37ms
iter 18680: loss 184.4793, time 119.22ms
iter 18690: loss 109.4978, time 119.31ms
tensor(0.8423)
iter 18700: loss 99.8227, time 120.23ms
iter 18710: loss 112.0351, time 120.35ms
iter 18720: loss 98.2993, time 119.84ms
iter 18730: loss 141.8301, time 120.84ms
iter 18740: loss 128.6669, time 121.26ms
step 18750: train loss 109.1276, val loss 108.6954
saving checkpoint to out-shakespeare-char
iter 18750: loss 147.3352, time 2852.71ms
iter 18760: loss 138.6523, time 119.26ms
iter 18770: loss 109.7127, time 119.69ms
iter 18780: loss 117.7967, time 120.37ms
iter 18790: loss 119.2698, time 121.81ms
tensor(0.8645)
iter 18800: loss 118.7427, time 121.59ms
iter 18810: loss 129.8883, time 122.78ms
iter 18820: loss 104.8640, time 122.35ms
iter 18830: loss 115.7989, time 119.61ms
iter 18840: loss 138.2613, time 121.71ms
iter 18850: loss 128.9796, time 119.57ms
iter 18860: loss 125.6213, time 120.80ms
iter 18870: loss 164.8094, time 121.77ms
iter 18880: loss 138.3919, time 123.18ms
iter 18890: loss 132.2118, time 121.75ms
tensor(0.8853)
iter 18900: loss 143.1571, time 121.05ms
iter 18910: loss 156.7606, time 121.74ms
iter 18920: loss 174.1686, time 119.66ms
iter 18930: loss 94.8616, time 120.65ms
iter 18940: loss 161.0060, time 121.82ms
iter 18950: loss 170.2638, time 121.23ms
iter 18960: loss 139.2813, time 123.98ms
iter 18970: loss 182.3659, time 119.57ms
iter 18980: loss 173.8666, time 121.86ms
iter 18990: loss 150.0464, time 119.59ms
tensor(0.9045)
step 19000: train loss 153.5976, val loss 153.4003
saving checkpoint to out-shakespeare-char
iter 19000: loss 182.1536, time 2861.51ms
iter 19010: loss 138.4150, time 121.65ms
iter 19020: loss 112.0819, time 121.59ms
iter 19030: loss 125.4797, time 119.74ms
iter 19040: loss 175.7036, time 118.60ms
iter 19050: loss 137.0426, time 121.38ms
iter 19060: loss 126.8322, time 122.68ms
iter 19070: loss 169.6050, time 124.04ms
iter 19080: loss 161.6478, time 121.78ms
iter 19090: loss 173.0942, time 122.25ms
tensor(0.9222)
iter 19100: loss 140.7827, time 119.44ms
iter 19110: loss 157.9395, time 121.59ms
iter 19120: loss 173.4234, time 121.64ms
iter 19130: loss 178.4996, time 123.57ms
iter 19140: loss 165.4654, time 121.71ms
iter 19150: loss 153.0275, time 121.60ms
iter 19160: loss 178.0744, time 121.98ms
iter 19170: loss 164.9214, time 120.42ms
iter 19180: loss 154.2761, time 118.37ms
iter 19190: loss 213.8773, time 121.87ms
tensor(0.9382)
iter 19200: loss 176.8748, time 123.71ms
iter 19210: loss 137.0814, time 121.63ms
iter 19220: loss 159.5184, time 121.59ms
iter 19230: loss 185.2973, time 119.77ms
iter 19240: loss 143.3834, time 127.72ms
step 19250: train loss 100.2707, val loss 98.5131
saving checkpoint to out-shakespeare-char
iter 19250: loss 108.5888, time 2829.38ms
iter 19260: loss 187.7921, time 122.03ms
iter 19270: loss 183.8032, time 121.78ms
iter 19280: loss 183.7717, time 119.87ms
iter 19290: loss 147.8851, time 120.99ms
tensor(0.9524)
iter 19300: loss 162.0709, time 121.14ms
iter 19310: loss 135.6331, time 122.95ms
iter 19320: loss 177.5752, time 121.59ms
iter 19330: loss 153.0541, time 121.92ms
iter 19340: loss 171.0072, time 122.18ms
iter 19350: loss 211.1805, time 121.92ms
iter 19360: loss 136.8698, time 120.61ms
iter 19370: loss 143.6481, time 121.42ms
iter 19380: loss 175.6035, time 120.10ms
iter 19390: loss 166.7061, time 124.31ms
tensor(0.9649)
iter 19400: loss 144.2045, time 121.35ms
iter 19410: loss 112.5442, time 121.72ms
iter 19420: loss 146.2464, time 119.47ms
iter 19430: loss 200.9142, time 120.99ms
iter 19440: loss 219.8483, time 121.99ms
iter 19450: loss 127.3948, time 123.25ms
iter 19460: loss 180.1089, time 121.41ms
iter 19470: loss 213.4609, time 121.73ms
iter 19480: loss 166.3164, time 121.93ms
iter 19490: loss 142.9285, time 119.94ms
tensor(0.9755)
step 19500: train loss 134.0469, val loss 132.1258
saving checkpoint to out-shakespeare-char
iter 19500: loss 139.7967, time 2841.79ms
iter 19510: loss 183.3063, time 123.98ms
iter 19520: loss 211.6174, time 119.58ms
iter 19530: loss 175.8744, time 121.73ms
iter 19540: loss 146.9768, time 119.76ms
iter 19550: loss 174.4941, time 121.08ms
iter 19560: loss 180.0361, time 121.48ms
iter 19570: loss 160.8849, time 121.79ms
iter 19580: loss 125.6072, time 120.78ms
iter 19590: loss 156.1242, time 121.60ms
tensor(0.9843)
iter 19600: loss 163.6185, time 122.02ms
iter 19610: loss 153.9311, time 119.96ms
iter 19620: loss 162.4805, time 121.86ms
iter 19630: loss 150.2227, time 122.04ms
iter 19640: loss 169.7211, time 123.15ms
iter 19650: loss 148.1671, time 121.88ms
iter 19660: loss 179.8023, time 119.55ms
iter 19670: loss 198.2390, time 121.62ms
iter 19680: loss 219.7662, time 120.23ms
iter 19690: loss 169.7650, time 121.50ms
tensor(0.9911)
iter 19700: loss 162.5689, time 122.86ms
iter 19710: loss 174.6093, time 123.82ms
iter 19720: loss 177.1781, time 121.79ms
iter 19730: loss 224.4035, time 121.73ms
iter 19740: loss 184.4988, time 120.34ms
step 19750: train loss 137.4111, val loss 137.3441
saving checkpoint to out-shakespeare-char
iter 19750: loss 177.0984, time 2839.04ms
iter 19760: loss 188.9562, time 123.59ms
iter 19770: loss 178.0575, time 121.57ms
iter 19780: loss 191.8017, time 121.75ms
iter 19790: loss 191.7257, time 119.87ms
tensor(0.9961)
iter 19800: loss 219.1208, time 119.46ms
iter 19810: loss 173.9356, time 122.09ms
iter 19820: loss 153.0000, time 122.58ms
iter 19830: loss 193.2466, time 123.63ms
iter 19840: loss 211.6102, time 121.01ms
iter 19850: loss 209.6056, time 120.27ms
iter 19860: loss 158.9737, time 118.74ms
iter 19870: loss 200.3913, time 120.43ms
iter 19880: loss 174.9396, time 120.74ms
iter 19890: loss 239.4397, time 120.59ms
tensor(0.9990)
iter 19900: loss 154.1175, time 120.94ms
iter 19910: loss 144.7334, time 118.72ms
iter 19920: loss 194.2493, time 120.20ms
iter 19930: loss 192.5965, time 120.43ms
iter 19940: loss 151.2461, time 120.40ms
iter 19950: loss 196.7757, time 120.43ms
iter 19960: loss 262.1187, time 118.45ms
iter 19970: loss 248.0069, time 120.34ms
iter 19980: loss 186.0212, time 121.03ms
iter 19990: loss 176.8726, time 120.36ms
tensor(1.)
step 20000: train loss 175.0242, val loss 175.6472
saving checkpoint to out-shakespeare-char
iter 20000: loss 242.5042, time 2841.82ms
iter 20010: loss 151.2639, time 118.76ms
iter 20020: loss 169.3016, time 117.84ms
iter 20030: loss 157.9544, time 118.64ms
iter 20040: loss 234.3777, time 118.96ms
iter 20050: loss 218.0882, time 118.89ms
iter 20060: loss 170.7671, time 118.88ms
iter 20070: loss 214.7885, time 118.81ms
iter 20080: loss 223.1295, time 119.11ms
iter 20090: loss 164.8492, time 119.09ms
tensor(0.9990)
iter 20100: loss 176.8190, time 120.21ms
iter 20110: loss 220.3431, time 119.62ms
iter 20120: loss 214.0592, time 119.63ms
iter 20130: loss 196.7090, time 119.90ms
iter 20140: loss 142.5396, time 119.95ms
iter 20150: loss 190.1479, time 120.27ms
iter 20160: loss 182.6930, time 119.87ms
iter 20170: loss 163.6361, time 119.48ms
iter 20180: loss 223.4031, time 120.08ms
iter 20190: loss 205.4602, time 120.37ms
tensor(0.9961)
iter 20200: loss 208.2669, time 120.90ms
iter 20210: loss 167.2552, time 120.48ms
iter 20220: loss 173.0195, time 119.94ms
iter 20230: loss 161.5416, time 121.40ms
iter 20240: loss 185.9581, time 121.64ms
step 20250: train loss 129.9073, val loss 130.5508
saving checkpoint to out-shakespeare-char
iter 20250: loss 205.3606, time 2852.74ms
iter 20260: loss 147.9429, time 121.46ms
iter 20270: loss 182.7376, time 119.09ms
iter 20280: loss 175.2243, time 120.07ms
iter 20290: loss 194.2560, time 121.35ms
tensor(0.9911)
iter 20300: loss 175.9924, time 121.66ms
iter 20310: loss 204.9875, time 122.34ms
iter 20320: loss 204.3415, time 123.75ms
iter 20330: loss 225.8795, time 118.30ms
iter 20340: loss 158.4927, time 121.07ms
iter 20350: loss 205.8806, time 120.08ms
iter 20360: loss 196.0478, time 121.35ms
iter 20370: loss 212.1920, time 120.01ms
iter 20380: loss 191.8824, time 121.78ms
iter 20390: loss 205.9028, time 119.04ms
tensor(0.9843)
iter 20400: loss 191.4600, time 121.26ms
iter 20410: loss 177.3907, time 123.02ms
iter 20420: loss 174.8445, time 121.26ms
iter 20430: loss 177.9732, time 120.43ms
iter 20440: loss 155.3027, time 121.30ms
iter 20450: loss 194.6516, time 119.00ms
iter 20460: loss 183.1788, time 119.67ms
iter 20470: loss 168.5299, time 121.00ms
iter 20480: loss 233.7649, time 121.55ms
iter 20490: loss 171.3558, time 121.77ms
tensor(0.9755)
step 20500: train loss 162.4798, val loss 162.6132
saving checkpoint to out-shakespeare-char
iter 20500: loss 205.8139, time 2814.17ms
iter 20510: loss 205.1993, time 123.98ms
iter 20520: loss 187.1725, time 121.19ms
iter 20530: loss 226.6075, time 121.07ms
iter 20540: loss 203.0824, time 121.16ms
iter 20550: loss 177.4801, time 118.75ms
iter 20560: loss 238.0423, time 120.09ms
iter 20570: loss 211.4388, time 119.57ms
iter 20580: loss 253.5544, time 117.79ms
iter 20590: loss 179.5146, time 117.73ms
tensor(0.9649)
iter 20600: loss 148.1308, time 117.15ms
iter 20610: loss 198.9546, time 118.70ms
iter 20620: loss 163.5764, time 119.99ms
iter 20630: loss 219.9728, time 120.28ms
iter 20640: loss 217.7717, time 119.86ms
iter 20650: loss 175.2048, time 120.99ms
iter 20660: loss 205.4874, time 122.11ms
iter 20670: loss 145.3325, time 123.34ms
iter 20680: loss 204.7423, time 121.21ms
iter 20690: loss 216.0770, time 121.31ms
tensor(0.9524)
iter 20700: loss 191.9384, time 122.12ms
iter 20710: loss 216.0278, time 120.67ms
iter 20720: loss 190.1042, time 119.32ms
iter 20730: loss 194.8527, time 119.91ms
iter 20740: loss 203.1779, time 119.07ms
step 20750: train loss 122.0764, val loss 122.2414
saving checkpoint to out-shakespeare-char
iter 20750: loss 166.4690, time 2846.20ms
iter 20760: loss 208.9094, time 121.24ms
iter 20770: loss 178.0012, time 121.25ms
iter 20780: loss 175.1082, time 121.15ms
iter 20790: loss 190.2861, time 119.01ms
tensor(0.9382)
iter 20800: loss 161.5545, time 119.33ms
iter 20810: loss 190.5744, time 121.52ms
iter 20820: loss 221.5272, time 120.39ms
iter 20830: loss 193.8578, time 123.06ms
iter 20840: loss 212.5192, time 122.97ms
iter 20850: loss 198.0624, time 121.12ms
iter 20860: loss 227.7018, time 120.53ms
iter 20870: loss 190.9844, time 119.04ms
iter 20880: loss 165.6196, time 118.89ms
iter 20890: loss 155.8230, time 118.79ms
tensor(0.9222)
iter 20900: loss 185.8033, time 119.22ms
iter 20910: loss 236.5363, time 119.73ms
iter 20920: loss 156.3073, time 120.92ms
iter 20930: loss 206.7980, time 121.35ms
iter 20940: loss 217.0724, time 122.65ms
iter 20950: loss 214.6434, time 123.24ms
iter 20960: loss 193.9446, time 123.17ms
iter 20970: loss 157.8540, time 123.17ms
iter 20980: loss 209.9256, time 118.36ms
iter 20990: loss 213.2419, time 119.60ms
tensor(0.9045)
step 21000: train loss 106.8806, val loss 107.1385
saving checkpoint to out-shakespeare-char
iter 21000: loss 176.6115, time 2829.87ms
iter 21010: loss 165.1248, time 118.53ms
iter 21020: loss 196.0869, time 121.17ms
iter 21030: loss 198.5941, time 122.19ms
iter 21040: loss 192.1703, time 121.39ms
iter 21050: loss 190.6685, time 123.22ms
iter 21060: loss 169.0983, time 123.27ms
iter 21070: loss 164.5671, time 121.50ms
iter 21080: loss 203.6333, time 119.22ms
iter 21090: loss 158.7826, time 119.30ms
tensor(0.8853)
iter 21100: loss 176.0545, time 119.85ms
iter 21110: loss 157.0570, time 119.84ms
iter 21120: loss 160.2147, time 120.79ms
iter 21130: loss 172.6017, time 122.12ms
iter 21140: loss 188.2888, time 123.44ms
iter 21150: loss 156.1450, time 121.28ms
iter 21160: loss 183.5408, time 123.20ms
iter 21170: loss 164.5247, time 121.21ms
iter 21180: loss 175.1859, time 119.56ms
iter 21190: loss 156.3399, time 119.32ms
tensor(0.8645)
iter 21200: loss 162.6512, time 119.53ms
iter 21210: loss 167.5741, time 120.39ms
iter 21220: loss 156.7970, time 120.62ms
iter 21230: loss 129.6145, time 121.33ms
iter 21240: loss 151.1434, time 123.41ms
step 21250: train loss 104.5624, val loss 106.2264
saving checkpoint to out-shakespeare-char
iter 21250: loss 152.4768, time 2843.82ms
iter 21260: loss 163.0881, time 119.00ms
iter 21270: loss 197.3011, time 119.47ms
iter 21280: loss 164.9142, time 120.98ms
iter 21290: loss 157.1305, time 121.95ms
tensor(0.8423)
iter 21300: loss 181.0740, time 123.33ms
iter 21310: loss 173.9628, time 123.35ms
iter 21320: loss 167.7120, time 121.24ms
iter 21330: loss 171.9648, time 120.68ms
iter 21340: loss 154.8450, time 119.13ms
iter 21350: loss 155.5282, time 118.91ms
iter 21360: loss 157.7484, time 119.34ms
iter 21370: loss 154.9979, time 120.29ms
iter 21380: loss 155.7737, time 121.62ms
iter 21390: loss 147.0314, time 121.97ms
tensor(0.8187)
iter 21400: loss 169.0413, time 121.73ms
iter 21410: loss 177.8588, time 123.27ms
iter 21420: loss 235.6500, time 123.45ms
iter 21430: loss 150.5764, time 121.47ms
iter 21440: loss 181.6059, time 118.77ms
iter 21450: loss 162.8508, time 119.46ms
iter 21460: loss 187.4000, time 119.09ms
iter 21470: loss 188.5126, time 120.47ms
iter 21480: loss 137.6778, time 121.27ms
iter 21490: loss 167.0804, time 121.69ms
tensor(0.7939)
step 21500: train loss 115.5030, val loss 117.8178
saving checkpoint to out-shakespeare-char
iter 21500: loss 179.5440, time 2834.66ms
iter 21510: loss 142.3387, time 121.46ms
iter 21520: loss 146.1086, time 120.90ms
iter 21530: loss 157.5543, time 119.42ms
iter 21540: loss 172.7461, time 119.04ms
iter 21550: loss 159.4242, time 119.20ms
iter 21560: loss 165.5009, time 118.24ms
iter 21570: loss 172.2960, time 121.20ms
iter 21580: loss 158.7724, time 121.93ms
iter 21590: loss 156.3262, time 121.50ms
tensor(0.7679)
iter 21600: loss 169.2588, time 123.55ms
iter 21610: loss 140.8434, time 123.68ms
iter 21620: loss 145.8370, time 121.45ms
iter 21630: loss 141.8323, time 120.58ms
iter 21640: loss 145.3877, time 119.33ms
iter 21650: loss 156.5631, time 118.18ms
iter 21660: loss 136.7418, time 119.53ms
iter 21670: loss 141.5800, time 120.38ms
iter 21680: loss 133.1301, time 122.33ms
iter 21690: loss 145.8280, time 122.65ms
tensor(0.7409)
iter 21700: loss 147.4997, time 121.59ms
iter 21710: loss 151.6002, time 123.59ms
iter 21720: loss 154.7762, time 121.44ms
iter 21730: loss 180.6342, time 119.61ms
iter 21740: loss 162.0577, time 118.23ms
step 21750: train loss 86.5104, val loss 87.8330
saving checkpoint to out-shakespeare-char
iter 21750: loss 135.9038, time 2868.19ms
iter 21760: loss 145.8948, time 121.06ms
iter 21770: loss 128.9588, time 122.48ms
iter 21780: loss 131.3414, time 121.52ms
iter 21790: loss 162.9447, time 119.24ms
tensor(0.7129)
iter 21800: loss 183.4079, time 119.59ms
iter 21810: loss 152.3575, time 119.01ms
iter 21820: loss 135.2665, time 119.99ms
iter 21830: loss 123.1963, time 120.99ms
iter 21840: loss 117.8617, time 120.44ms
iter 21850: loss 145.7795, time 123.64ms
iter 21860: loss 128.0546, time 123.37ms
iter 21870: loss 130.1739, time 123.32ms
iter 21880: loss 147.0192, time 120.43ms
iter 21890: loss 126.6727, time 119.53ms
tensor(0.6841)
iter 21900: loss 117.2902, time 119.73ms
iter 21910: loss 115.9313, time 119.54ms
iter 21920: loss 116.7444, time 119.01ms
iter 21930: loss 132.6327, time 120.01ms
iter 21940: loss 134.0808, time 121.13ms
iter 21950: loss 127.7070, time 120.66ms
iter 21960: loss 141.6861, time 123.11ms
iter 21970: loss 109.5539, time 123.34ms
iter 21980: loss 134.9318, time 121.30ms
iter 21990: loss 124.8257, time 121.32ms
tensor(0.6545)
step 22000: train loss 51.2723, val loss 50.8670
saving checkpoint to out-shakespeare-char
iter 22000: loss 110.6073, time 2866.43ms
iter 22010: loss 103.5560, time 121.61ms
iter 22020: loss 102.4924, time 123.18ms
iter 22030: loss 108.1674, time 123.26ms
iter 22040: loss 129.1548, time 121.94ms
iter 22050: loss 98.6639, time 119.27ms
iter 22060: loss 132.9168, time 119.36ms
iter 22070: loss 124.3904, time 118.98ms
iter 22080: loss 94.2315, time 119.73ms
iter 22090: loss 119.0651, time 120.87ms
tensor(0.6243)
iter 22100: loss 94.7344, time 121.51ms
iter 22110: loss 86.3384, time 123.41ms
iter 22120: loss 119.9196, time 123.19ms
iter 22130: loss 121.9574, time 122.31ms
iter 22140: loss 143.2068, time 119.39ms
iter 22150: loss 137.1547, time 121.36ms
iter 22160: loss 99.0856, time 119.66ms
iter 22170: loss 97.6551, time 119.12ms
iter 22180: loss 121.3964, time 120.00ms
iter 22190: loss 99.1577, time 121.45ms
tensor(0.5937)
iter 22200: loss 123.7390, time 120.92ms
iter 22210: loss 98.8935, time 122.93ms
iter 22220: loss 89.4008, time 123.27ms
iter 22230: loss 93.8301, time 122.45ms
iter 22240: loss 109.7105, time 120.89ms
step 22250: train loss 53.1254, val loss 53.3603
saving checkpoint to out-shakespeare-char
iter 22250: loss 103.3307, time 2856.91ms
iter 22260: loss 110.6513, time 119.99ms
iter 22270: loss 102.1617, time 121.94ms
iter 22280: loss 95.5672, time 122.78ms
iter 22290: loss 106.1711, time 122.15ms
tensor(0.5627)
iter 22300: loss 99.8103, time 123.29ms
iter 22310: loss 109.7642, time 121.49ms
iter 22320: loss 96.0540, time 118.54ms
iter 22330: loss 90.5397, time 119.22ms
iter 22340: loss 82.2267, time 119.08ms
iter 22350: loss 75.5483, time 119.05ms
iter 22360: loss 88.4965, time 121.17ms
iter 22370: loss 96.4689, time 123.37ms
iter 22380: loss 86.4445, time 122.88ms
iter 22390: loss 99.9406, time 121.32ms
tensor(0.5314)
iter 22400: loss 89.9205, time 123.49ms
iter 22410: loss 84.8496, time 121.19ms
iter 22420: loss 99.3350, time 121.14ms
iter 22430: loss 94.1282, time 119.28ms
iter 22440: loss 85.4209, time 118.27ms
iter 22450: loss 92.7684, time 119.40ms
iter 22460: loss 84.7507, time 120.91ms
iter 22470: loss 87.2424, time 121.97ms
iter 22480: loss 76.8913, time 123.23ms
iter 22490: loss 77.0016, time 122.58ms
tensor(0.5000)
step 22500: train loss 43.8695, val loss 44.0158
saving checkpoint to out-shakespeare-char
iter 22500: loss 92.6336, time 2847.13ms
iter 22510: loss 75.7141, time 119.25ms
iter 22520: loss 91.6940, time 116.88ms
iter 22530: loss 85.6008, time 118.36ms
iter 22540: loss 77.3681, time 114.86ms
iter 22550: loss 67.3310, time 117.46ms
iter 22560: loss 76.9264, time 117.79ms
iter 22570: loss 70.7622, time 114.53ms
iter 22580: loss 63.3234, time 118.62ms
iter 22590: loss 80.5237, time 116.86ms
tensor(0.4686)
iter 22600: loss 67.9410, time 115.47ms
iter 22610: loss 83.6752, time 118.57ms
iter 22620: loss 73.4644, time 117.90ms
iter 22630: loss 68.5421, time 114.73ms
iter 22640: loss 69.7053, time 118.68ms
iter 22650: loss 67.4182, time 117.16ms
iter 22660: loss 65.3797, time 114.62ms
iter 22670: loss 76.1665, time 118.47ms
iter 22680: loss 66.8232, time 116.70ms
iter 22690: loss 59.0991, time 114.41ms
tensor(0.4373)
iter 22700: loss 63.1756, time 118.96ms
iter 22710: loss 72.8851, time 122.22ms
iter 22720: loss 70.2162, time 122.23ms
iter 22730: loss 57.5481, time 121.06ms
iter 22740: loss 59.5067, time 122.94ms
step 22750: train loss 31.3640, val loss 31.1304
saving checkpoint to out-shakespeare-char
iter 22750: loss 63.8143, time 2836.43ms
iter 22760: loss 73.7974, time 121.19ms
iter 22770: loss 70.8529, time 118.59ms
iter 22780: loss 61.3313, time 118.68ms
iter 22790: loss 50.2738, time 118.87ms
tensor(0.4063)
iter 22800: loss 57.7575, time 118.48ms
iter 22810: loss 55.1390, time 119.49ms
iter 22820: loss 58.0570, time 120.20ms
iter 22830: loss 51.4602, time 120.85ms
iter 22840: loss 58.1796, time 120.81ms
iter 22850: loss 50.2724, time 122.41ms
iter 22860: loss 53.4979, time 123.53ms
iter 22870: loss 56.0906, time 121.14ms
iter 22880: loss 56.3867, time 123.45ms
iter 22890: loss 47.4842, time 118.47ms
tensor(0.3757)
iter 22900: loss 49.5055, time 121.82ms
iter 22910: loss 51.4064, time 120.79ms
iter 22920: loss 45.4532, time 121.83ms
iter 22930: loss 47.3927, time 120.65ms
iter 22940: loss 47.4281, time 119.51ms
iter 22950: loss 40.8334, time 122.92ms
iter 22960: loss 48.8607, time 122.04ms
iter 22970: loss 48.0224, time 122.13ms
iter 22980: loss 48.6179, time 122.44ms
iter 22990: loss 41.8155, time 120.53ms
tensor(0.3455)
step 23000: train loss 24.9373, val loss 25.0943
saving checkpoint to out-shakespeare-char
iter 23000: loss 43.6489, time 2844.31ms
iter 23010: loss 43.4217, time 118.41ms
iter 23020: loss 38.0783, time 118.43ms
iter 23030: loss 41.5947, time 118.34ms
iter 23040: loss 43.5560, time 118.54ms
iter 23050: loss 39.6651, time 118.39ms
iter 23060: loss 39.7937, time 118.38ms
iter 23070: loss 40.4325, time 118.41ms
iter 23080: loss 34.3268, time 118.30ms
iter 23090: loss 35.7932, time 117.80ms
tensor(0.3159)
iter 23100: loss 37.2155, time 118.71ms
iter 23110: loss 38.3680, time 117.27ms
iter 23120: loss 36.4407, time 117.56ms
iter 23130: loss 35.3129, time 118.00ms
iter 23140: loss 33.6426, time 118.17ms
iter 23150: loss 31.1526, time 118.54ms
iter 23160: loss 35.1857, time 117.41ms
iter 23170: loss 31.1864, time 117.55ms
iter 23180: loss 33.9165, time 117.48ms
iter 23190: loss 28.8020, time 118.39ms
tensor(0.2871)
iter 23200: loss 27.6264, time 119.19ms
iter 23210: loss 30.5031, time 117.59ms
iter 23220: loss 27.1099, time 118.05ms
iter 23230: loss 26.4186, time 119.17ms
iter 23240: loss 24.9200, time 118.90ms
step 23250: train loss 9.5453, val loss 9.5719
saving checkpoint to out-shakespeare-char
iter 23250: loss 24.2092, time 2825.55ms
iter 23260: loss 22.9019, time 120.51ms
iter 23270: loss 22.2492, time 121.60ms
iter 23280: loss 21.1323, time 121.10ms
iter 23290: loss 20.9255, time 123.02ms
tensor(0.2591)
iter 23300: loss 20.2548, time 123.22ms
iter 23310: loss 20.3069, time 122.37ms
iter 23320: loss 21.0377, time 121.08ms
iter 23330: loss 20.3623, time 121.33ms
iter 23340: loss 19.8346, time 119.06ms
iter 23350: loss 19.1162, time 119.30ms
iter 23360: loss 18.6666, time 119.08ms
iter 23370: loss 19.0711, time 120.04ms
iter 23380: loss 18.6888, time 119.60ms
iter 23390: loss 18.5143, time 121.59ms
tensor(0.2321)
iter 23400: loss 19.0149, time 121.53ms
iter 23410: loss 17.8237, time 120.94ms
iter 23420: loss 17.7342, time 122.80ms
iter 23430: loss 18.1115, time 123.17ms
iter 23440: loss 17.4539, time 121.35ms
iter 23450: loss 17.1246, time 121.30ms
iter 23460: loss 17.4442, time 119.07ms
iter 23470: loss 16.5485, time 119.03ms
iter 23480: loss 16.9841, time 119.26ms
iter 23490: loss 16.4494, time 119.47ms
tensor(0.2061)
step 23500: train loss 8.0383, val loss 8.0681
saving checkpoint to out-shakespeare-char
iter 23500: loss 16.5851, time 2821.46ms
iter 23510: loss 16.4636, time 121.17ms
iter 23520: loss 16.8994, time 120.65ms
iter 23530: loss 15.6246, time 121.77ms
iter 23540: loss 15.6277, time 122.09ms
iter 23550: loss 15.7807, time 122.79ms
iter 23560: loss 16.1675, time 123.33ms
iter 23570: loss 15.4023, time 121.30ms
iter 23580: loss 15.4640, time 122.68ms
iter 23590: loss 14.9285, time 121.50ms
tensor(0.1813)
iter 23600: loss 15.1944, time 121.30ms
iter 23610: loss 14.8245, time 120.86ms
iter 23620: loss 14.9026, time 118.78ms
iter 23630: loss 14.4947, time 118.56ms
iter 23640: loss 14.9617, time 118.77ms
iter 23650: loss 14.5677, time 118.78ms
iter 23660: loss 14.4309, time 119.03ms
iter 23670: loss 14.5088, time 119.12ms
iter 23680: loss 14.2092, time 118.73ms
iter 23690: loss 14.3793, time 120.60ms
tensor(0.1577)
iter 23700: loss 14.4808, time 121.10ms
iter 23710: loss 14.0720, time 121.01ms
iter 23720: loss 13.6846, time 121.83ms
iter 23730: loss 13.6883, time 121.03ms
iter 23740: loss 13.5762, time 122.63ms
step 23750: train loss 7.8884, val loss 7.9257
saving checkpoint to out-shakespeare-char
iter 23750: loss 13.5967, time 2844.10ms
iter 23760: loss 13.8058, time 120.97ms
iter 23770: loss 13.7579, time 119.11ms
iter 23780: loss 13.8245, time 118.85ms
iter 23790: loss 13.6021, time 120.57ms
tensor(0.1355)
iter 23800: loss 13.6289, time 121.18ms
iter 23810: loss 13.5781, time 121.41ms
iter 23820: loss 13.2327, time 121.45ms
iter 23830: loss 13.4998, time 122.95ms
iter 23840: loss 12.9086, time 121.48ms
iter 23850: loss 13.3989, time 122.90ms
iter 23860: loss 12.8886, time 120.76ms
iter 23870: loss 13.2638, time 121.14ms
iter 23880: loss 13.4224, time 119.32ms
iter 23890: loss 13.1773, time 118.54ms
tensor(0.1147)
iter 23900: loss 13.4292, time 119.49ms
iter 23910: loss 12.3143, time 119.00ms
iter 23920: loss 12.6247, time 120.21ms
iter 23930: loss 12.6966, time 120.32ms
iter 23940: loss 11.8449, time 120.52ms
iter 23950: loss 12.9690, time 121.03ms
iter 23960: loss 12.7856, time 121.77ms
iter 23970: loss 13.1450, time 120.61ms
iter 23980: loss 12.6499, time 122.73ms
iter 23990: loss 12.3800, time 122.89ms
tensor(0.0955)
step 24000: train loss 7.7969, val loss 7.8144
saving checkpoint to out-shakespeare-char
iter 24000: loss 12.8253, time 2850.90ms
iter 24010: loss 11.9916, time 119.90ms
iter 24020: loss 12.2916, time 120.92ms
iter 24030: loss 12.3434, time 123.87ms
iter 24040: loss 12.5620, time 122.95ms
iter 24050: loss 12.4397, time 119.86ms
iter 24060: loss 12.3777, time 118.46ms
iter 24070: loss 11.7348, time 119.02ms
iter 24080: loss 12.0500, time 122.03ms
iter 24090: loss 11.8042, time 121.55ms
tensor(0.0778)
iter 24100: loss 12.6147, time 123.83ms
iter 24110: loss 12.0128, time 121.16ms
iter 24120: loss 11.7859, time 121.26ms
iter 24130: loss 11.8419, time 121.14ms
iter 24140: loss 11.9865, time 119.02ms
iter 24150: loss 12.0637, time 120.43ms
iter 24160: loss 11.9543, time 120.96ms
iter 24170: loss 11.5133, time 121.26ms
iter 24180: loss 12.0087, time 121.39ms
iter 24190: loss 12.2524, time 121.60ms
tensor(0.0618)
iter 24200: loss 12.2295, time 123.32ms
iter 24210: loss 11.7339, time 123.40ms
iter 24220: loss 11.8231, time 118.86ms
iter 24230: loss 12.0234, time 121.40ms
iter 24240: loss 11.6237, time 121.04ms
step 24250: train loss 7.7404, val loss 7.7002
saving checkpoint to out-shakespeare-char
iter 24250: loss 11.5205, time 2842.97ms
iter 24260: loss 11.4042, time 118.72ms
iter 24270: loss 11.5960, time 117.51ms
iter 24280: loss 11.2484, time 118.26ms
iter 24290: loss 12.0671, time 118.14ms
tensor(0.0476)
iter 24300: loss 11.9445, time 121.63ms
iter 24310: loss 11.7728, time 119.88ms
iter 24320: loss 11.4734, time 119.29ms
iter 24330: loss 11.3137, time 119.94ms
iter 24340: loss 11.4320, time 119.91ms
iter 24350: loss 11.3325, time 120.88ms
iter 24360: loss 11.5038, time 118.92ms
iter 24370: loss 12.0182, time 119.17ms
iter 24380: loss 12.2214, time 121.36ms
iter 24390: loss 11.4560, time 119.10ms
tensor(0.0351)
iter 24400: loss 11.6471, time 117.84ms
iter 24410: loss 11.5752, time 120.06ms
iter 24420: loss 11.3454, time 119.53ms
iter 24430: loss 11.0209, time 119.40ms
iter 24440: loss 11.5467, time 120.36ms
iter 24450: loss 12.1203, time 117.24ms
iter 24460: loss 11.5352, time 121.60ms
iter 24470: loss 11.1574, time 121.00ms
iter 24480: loss 11.0216, time 122.02ms
iter 24490: loss 11.2732, time 118.96ms
tensor(0.0245)
step 24500: train loss 7.6528, val loss 7.6963
saving checkpoint to out-shakespeare-char
iter 24500: loss 11.2880, time 2858.05ms
iter 24510: loss 11.5309, time 121.09ms
iter 24520: loss 11.4900, time 117.77ms
iter 24530: loss 11.4408, time 117.16ms
iter 24540: loss 11.3949, time 118.76ms
iter 24550: loss 12.0623, time 118.08ms
iter 24560: loss 11.3866, time 119.04ms
iter 24570: loss 10.7914, time 121.13ms
iter 24580: loss 11.6346, time 122.42ms
iter 24590: loss 11.2793, time 124.63ms
tensor(0.0157)
iter 24600: loss 10.9087, time 123.55ms
iter 24610: loss 12.0418, time 121.05ms
iter 24620: loss 11.3809, time 120.89ms
iter 24630: loss 10.7620, time 122.99ms
iter 24640: loss 11.7561, time 120.94ms
iter 24650: loss 11.0739, time 120.85ms
iter 24660: loss 11.1739, time 120.47ms
iter 24670: loss 11.1329, time 118.89ms
iter 24680: loss 11.4833, time 118.56ms
iter 24690: loss 11.6134, time 118.83ms
tensor(0.0089)
iter 24700: loss 10.5123, time 119.74ms
iter 24710: loss 11.9477, time 119.70ms
iter 24720: loss 10.9074, time 120.25ms
iter 24730: loss 11.2703, time 119.40ms
iter 24740: loss 10.8930, time 121.01ms
step 24750: train loss 7.6421, val loss 7.6348
saving checkpoint to out-shakespeare-char
iter 24750: loss 11.0581, time 2846.19ms
iter 24760: loss 11.6007, time 121.85ms
iter 24770: loss 10.9887, time 121.33ms
iter 24780: loss 11.4907, time 118.46ms
iter 24790: loss 11.9685, time 118.19ms
tensor(0.0039)
iter 24800: loss 10.8825, time 119.13ms
iter 24810: loss 10.9018, time 118.57ms
iter 24820: loss 11.3205, time 118.62ms
iter 24830: loss 10.9520, time 118.99ms
iter 24840: loss 11.8946, time 119.21ms
iter 24850: loss 11.1948, time 119.40ms
iter 24860: loss 11.4192, time 119.91ms
iter 24870: loss 11.3450, time 120.12ms
iter 24880: loss 11.3048, time 121.07ms
iter 24890: loss 11.0902, time 120.65ms
tensor(0.0010)
iter 24900: loss 11.1843, time 121.19ms
iter 24910: loss 11.5049, time 121.15ms
iter 24920: loss 10.7469, time 120.90ms
iter 24930: loss 11.5184, time 121.64ms
iter 24940: loss 10.6753, time 127.36ms
iter 24950: loss 11.3723, time 121.31ms
iter 24960: loss 11.4400, time 120.73ms
iter 24970: loss 10.8997, time 120.44ms
iter 24980: loss 11.5048, time 120.79ms
iter 24990: loss 11.0919, time 120.97ms
tensor(0.0010)
step 25000: train loss 7.6370, val loss 7.6076
saving checkpoint to out-shakespeare-char
iter 25000: loss 10.9977, time 2853.48ms
iter 25010: loss 11.4715, time 115.15ms
iter 25020: loss 11.0718, time 117.21ms
iter 25030: loss 11.2636, time 115.43ms
iter 25040: loss 11.1038, time 115.53ms
iter 25050: loss 11.3925, time 116.94ms
iter 25060: loss 11.1635, time 118.19ms
iter 25070: loss 10.8202, time 115.87ms
iter 25080: loss 11.9888, time 116.95ms
iter 25090: loss 11.6576, time 115.87ms
tensor(0.0010)
iter 25100: loss 11.6021, time 116.23ms
iter 25110: loss 11.2926, time 116.92ms
iter 25120: loss 11.4402, time 117.97ms
iter 25130: loss 11.0818, time 115.75ms
iter 25140: loss 11.5404, time 116.70ms
iter 25150: loss 11.4024, time 115.18ms
iter 25160: loss 11.3487, time 115.97ms
iter 25170: loss 10.9787, time 117.02ms
iter 25180: loss 11.6696, time 117.78ms
iter 25190: loss 10.5780, time 114.53ms
tensor(0.0039)
iter 25200: loss 11.4677, time 116.77ms
iter 25210: loss 11.4236, time 120.82ms
iter 25220: loss 11.6930, time 118.64ms
iter 25230: loss 11.1584, time 120.38ms
iter 25240: loss 11.6700, time 120.97ms
step 25250: train loss 7.6461, val loss 7.6748
saving checkpoint to out-shakespeare-char
iter 25250: loss 11.1497, time 2848.52ms
iter 25260: loss 11.3590, time 120.95ms
iter 25270: loss 11.1192, time 116.99ms
iter 25280: loss 11.2502, time 119.80ms
iter 25290: loss 10.0819, time 120.13ms
tensor(0.0089)
iter 25300: loss 10.9125, time 120.33ms
iter 25310: loss 11.2920, time 120.08ms
iter 25320: loss 11.2000, time 118.44ms
iter 25330: loss 11.5358, time 120.70ms
iter 25340: loss 11.0826, time 120.47ms
iter 25350: loss 11.3970, time 120.63ms
iter 25360: loss 11.2591, time 121.82ms
iter 25370: loss 10.8262, time 119.63ms
iter 25380: loss 10.8353, time 121.48ms
iter 25390: loss 11.4831, time 122.54ms
tensor(0.0157)
iter 25400: loss 11.5986, time 123.28ms
iter 25410: loss 10.8354, time 120.63ms
iter 25420: loss 11.4256, time 121.97ms
iter 25430: loss 11.1300, time 119.74ms
iter 25440: loss 11.1755, time 119.42ms
iter 25450: loss 10.9982, time 120.67ms
iter 25460: loss 11.0883, time 121.36ms
iter 25470: loss 11.5953, time 121.58ms
iter 25480: loss 10.7681, time 122.79ms
iter 25490: loss 10.6830, time 122.93ms
tensor(0.0245)
step 25500: train loss 7.6539, val loss 7.6534
saving checkpoint to out-shakespeare-char
iter 25500: loss 11.2094, time 2836.66ms
iter 25510: loss 10.7560, time 119.40ms
iter 25520: loss 11.4657, time 120.47ms
iter 25530: loss 10.9742, time 121.29ms
iter 25540: loss 10.7401, time 121.34ms
iter 25550: loss 10.9046, time 122.87ms
iter 25560: loss 11.1967, time 121.01ms
iter 25570: loss 10.9420, time 121.03ms
iter 25580: loss 10.5528, time 121.00ms
iter 25590: loss 11.0309, time 121.45ms
tensor(0.0351)
iter 25600: loss 10.8498, time 119.60ms
iter 25610: loss 11.4088, time 120.28ms
iter 25620: loss 10.7607, time 119.21ms
iter 25630: loss 10.8971, time 121.13ms
iter 25640: loss 11.1712, time 121.54ms
iter 25650: loss 10.7056, time 123.02ms
iter 25660: loss 10.8707, time 123.55ms
iter 25670: loss 11.2778, time 121.48ms
iter 25680: loss 10.7326, time 121.22ms
iter 25690: loss 10.9336, time 121.26ms
tensor(0.0476)
iter 25700: loss 10.3950, time 119.76ms
iter 25710: loss 10.8920, time 120.95ms
iter 25720: loss 10.4739, time 121.40ms
iter 25730: loss 10.9073, time 121.75ms
iter 25740: loss 10.6662, time 123.42ms
step 25750: train loss 7.5953, val loss 7.7024
saving checkpoint to out-shakespeare-char
iter 25750: loss 10.3987, time 2851.08ms
iter 25760: loss 10.3904, time 119.56ms
iter 25770: loss 10.5199, time 121.27ms
iter 25780: loss 10.8372, time 121.29ms
iter 25790: loss 10.8373, time 121.71ms
tensor(0.0618)
iter 25800: loss 11.4626, time 123.71ms
iter 25810: loss 10.7612, time 120.49ms
iter 25820: loss 10.7488, time 121.39ms
iter 25830: loss 11.0888, time 121.38ms
iter 25840: loss 10.3405, time 119.27ms
iter 25850: loss 11.1643, time 119.94ms
iter 25860: loss 10.5671, time 121.12ms
iter 25870: loss 10.6524, time 118.96ms
iter 25880: loss 10.8155, time 122.63ms
iter 25890: loss 9.9986, time 122.81ms
tensor(0.0778)
iter 25900: loss 11.3279, time 121.36ms
iter 25910: loss 10.2339, time 121.10ms
iter 25920: loss 10.7939, time 121.17ms
iter 25930: loss 10.7812, time 119.93ms
iter 25940: loss 10.6609, time 120.27ms
iter 25950: loss 10.3179, time 121.09ms
iter 25960: loss 10.5289, time 121.17ms
iter 25970: loss 10.6598, time 121.88ms
iter 25980: loss 10.4220, time 123.96ms
iter 25990: loss 10.7247, time 121.29ms
tensor(0.0955)
step 26000: train loss 7.6842, val loss 7.7024
saving checkpoint to out-shakespeare-char
iter 26000: loss 9.9865, time 2870.41ms
iter 26010: loss 10.5490, time 121.84ms
iter 26020: loss 10.3434, time 123.68ms
iter 26030: loss 9.7183, time 121.25ms
iter 26040: loss 10.5168, time 121.35ms
iter 26050: loss 10.8313, time 121.40ms
iter 26060: loss 9.9989, time 118.78ms
iter 26070: loss 10.3668, time 118.14ms
iter 26080: loss 10.1891, time 120.21ms
iter 26090: loss 10.0910, time 121.11ms
tensor(0.1147)
iter 26100: loss 10.0861, time 121.65ms
iter 26110: loss 10.2049, time 121.58ms
iter 26120: loss 9.7670, time 120.14ms
iter 26130: loss 10.7768, time 123.37ms
iter 26140: loss 9.9986, time 121.05ms
iter 26150: loss 9.7009, time 121.40ms
iter 26160: loss 9.6453, time 121.08ms
iter 26170: loss 10.3228, time 118.94ms
iter 26180: loss 9.9623, time 119.61ms
iter 26190: loss 9.5816, time 120.42ms
tensor(0.1355)
iter 26200: loss 10.7716, time 121.28ms
iter 26210: loss 10.6726, time 121.28ms
iter 26220: loss 10.0277, time 122.15ms
iter 26230: loss 9.5242, time 123.19ms
iter 26240: loss 9.8210, time 121.44ms
step 26250: train loss 7.7104, val loss 7.7344
saving checkpoint to out-shakespeare-char
iter 26250: loss 10.1878, time 2866.08ms
iter 26260: loss 9.8655, time 121.49ms
iter 26270: loss 10.2686, time 123.43ms
iter 26280: loss 9.7828, time 122.75ms
iter 26290: loss 9.7345, time 121.29ms
tensor(0.1577)
iter 26300: loss 9.9024, time 121.80ms
iter 26310: loss 10.3172, time 119.21ms
iter 26320: loss 10.1997, time 119.02ms
iter 26330: loss 10.4555, time 120.94ms
iter 26340: loss 9.6833, time 121.10ms
iter 26350: loss 9.7245, time 121.18ms
iter 26360: loss 9.0399, time 121.72ms
iter 26370: loss 9.3530, time 121.40ms
iter 26380: loss 9.8903, time 122.46ms
iter 26390: loss 9.7790, time 121.25ms
tensor(0.1813)
iter 26400: loss 9.6325, time 121.57ms
iter 26410: loss 9.8596, time 121.25ms
iter 26420: loss 9.4705, time 119.53ms
iter 26430: loss 9.6478, time 120.70ms
iter 26440: loss 9.7645, time 121.29ms
iter 26450: loss 9.6613, time 121.01ms
iter 26460: loss 9.6368, time 122.46ms
iter 26470: loss 9.9974, time 123.53ms
iter 26480: loss 9.5002, time 121.26ms
iter 26490: loss 9.4571, time 119.32ms
tensor(0.2061)
step 26500: train loss 7.8102, val loss 7.7522
saving checkpoint to out-shakespeare-char
iter 26500: loss 9.5676, time 2814.11ms
iter 26510: loss 10.3484, time 120.64ms
iter 26520: loss 9.5816, time 120.78ms
iter 26530: loss 9.3858, time 118.86ms
iter 26540: loss 9.2902, time 119.07ms
iter 26550: loss 9.3033, time 119.10ms
iter 26560: loss 9.5460, time 119.28ms
iter 26570: loss 9.5456, time 121.20ms
iter 26580: loss 9.8859, time 120.92ms
iter 26590: loss 9.6938, time 123.17ms
tensor(0.2321)
iter 26600: loss 10.2732, time 121.30ms
iter 26610: loss 10.5519, time 120.75ms
iter 26620: loss 10.5047, time 121.07ms
iter 26630: loss 10.6682, time 121.94ms
iter 26640: loss 10.2062, time 120.87ms
iter 26650: loss 9.9024, time 123.01ms
iter 26660: loss 10.8530, time 120.74ms
iter 26670: loss 11.3027, time 123.01ms
iter 26680: loss 10.7206, time 120.77ms
iter 26690: loss 11.1940, time 119.81ms
tensor(0.2591)
iter 26700: loss 11.3297, time 121.06ms
iter 26710: loss 10.9258, time 118.72ms
iter 26720: loss 11.5210, time 121.06ms
iter 26730: loss 11.6087, time 120.69ms
iter 26740: loss 11.8110, time 117.92ms
step 26750: train loss 8.8701, val loss 8.8507
saving checkpoint to out-shakespeare-char
iter 26750: loss 11.8618, time 2848.50ms
iter 26760: loss 12.1867, time 121.27ms
iter 26770: loss 12.0996, time 120.86ms
iter 26780: loss 12.0515, time 122.63ms
iter 26790: loss 11.1195, time 122.10ms
tensor(0.2871)
iter 26800: loss 11.9441, time 121.47ms
iter 26810: loss 11.3455, time 120.92ms
iter 26820: loss 12.1628, time 118.45ms
iter 26830: loss 11.6156, time 121.28ms
iter 26840: loss 11.6337, time 120.76ms
iter 26850: loss 11.9156, time 117.90ms
iter 26860: loss 12.2767, time 119.39ms
iter 26870: loss 12.1773, time 120.55ms
iter 26880: loss 11.7322, time 118.83ms
iter 26890: loss 12.3315, time 120.99ms
tensor(0.3159)
iter 26900: loss 12.1582, time 121.23ms
iter 26910: loss 12.3241, time 121.64ms
iter 26920: loss 11.1606, time 122.35ms
iter 26930: loss 12.8456, time 120.59ms
iter 26940: loss 12.6834, time 123.25ms
iter 26950: loss 12.1713, time 120.76ms
iter 26960: loss 13.2373, time 121.00ms
iter 26970: loss 13.2065, time 120.85ms
iter 26980: loss 14.0261, time 121.28ms
iter 26990: loss 14.1657, time 118.60ms
tensor(0.3455)
step 27000: train loss 8.7052, val loss 8.7117
saving checkpoint to out-shakespeare-char
iter 27000: loss 13.9583, time 2860.74ms
iter 27010: loss 14.4926, time 123.38ms
iter 27020: loss 13.7861, time 121.38ms
iter 27030: loss 14.4186, time 121.57ms
iter 27040: loss 13.9311, time 121.04ms
iter 27050: loss 14.0140, time 121.33ms
iter 27060: loss 13.9809, time 119.32ms
iter 27070: loss 14.6044, time 119.61ms
iter 27080: loss 13.6850, time 119.25ms
iter 27090: loss 13.1830, time 120.93ms
tensor(0.3757)
iter 27100: loss 14.3912, time 121.44ms
iter 27110: loss 13.7857, time 120.20ms
iter 27120: loss 13.5420, time 121.11ms
iter 27130: loss 13.6065, time 122.47ms
iter 27140: loss 14.4732, time 123.33ms
iter 27150: loss 14.3582, time 123.01ms
iter 27160: loss 14.5072, time 120.81ms
iter 27170: loss 14.5388, time 118.76ms
iter 27180: loss 13.9444, time 120.12ms
iter 27190: loss 14.3115, time 118.80ms
tensor(0.4063)
iter 27200: loss 14.5303, time 118.13ms
iter 27210: loss 14.6592, time 119.10ms
iter 27220: loss 15.5772, time 119.25ms
iter 27230: loss 15.6668, time 119.53ms
iter 27240: loss 15.6429, time 120.18ms
step 27250: train loss 7.8018, val loss 7.8495
saving checkpoint to out-shakespeare-char
iter 27250: loss 15.5050, time 2854.60ms
iter 27260: loss 15.4706, time 123.10ms
iter 27270: loss 16.8374, time 120.85ms
iter 27280: loss 15.5749, time 118.52ms
iter 27290: loss 15.3572, time 120.83ms
tensor(0.4373)
iter 27300: loss 16.3719, time 120.16ms
iter 27310: loss 15.9967, time 120.84ms
iter 27320: loss 17.9949, time 118.47ms
iter 27330: loss 16.8789, time 118.58ms
iter 27340: loss 15.9916, time 118.59ms
iter 27350: loss 16.1954, time 118.86ms
iter 27360: loss 15.3180, time 118.66ms
iter 27370: loss 15.6902, time 118.60ms
iter 27380: loss 14.9513, time 118.64ms
iter 27390: loss 15.0071, time 118.70ms
tensor(0.4686)
iter 27400: loss 14.7012, time 119.33ms
iter 27410: loss 14.5750, time 119.05ms
iter 27420: loss 14.0483, time 119.10ms
iter 27430: loss 14.5778, time 120.62ms
iter 27440: loss 15.0691, time 119.87ms
iter 27450: loss 14.5854, time 120.72ms
iter 27460: loss 14.7573, time 121.63ms
iter 27470: loss 14.7308, time 122.14ms
iter 27480: loss 15.3084, time 123.07ms
iter 27490: loss 15.8256, time 121.08ms
tensor(0.5000)
step 27500: train loss 7.9539, val loss 7.9670
saving checkpoint to out-shakespeare-char
iter 27500: loss 15.8060, time 2833.10ms
iter 27510: loss 15.9362, time 122.83ms
iter 27520: loss 15.8019, time 121.98ms
iter 27530: loss 15.0211, time 121.95ms
iter 27540: loss 14.5327, time 118.77ms
iter 27550: loss 13.7307, time 120.01ms
iter 27560: loss 14.2263, time 120.79ms
iter 27570: loss 13.3990, time 120.67ms
iter 27580: loss 13.9712, time 120.55ms
iter 27590: loss 13.3366, time 118.52ms
tensor(0.5314)
iter 27600: loss 12.9951, time 121.08ms
iter 27610: loss 13.4687, time 120.53ms
iter 27620: loss 12.9374, time 118.82ms
iter 27630: loss 13.2269, time 118.28ms
iter 27640: loss 12.2676, time 119.00ms
iter 27650: loss 12.0499, time 118.76ms
iter 27660: loss 12.2117, time 118.85ms
iter 27670: loss 12.1194, time 118.89ms
iter 27680: loss 11.5739, time 119.03ms
iter 27690: loss 11.5113, time 119.51ms
tensor(0.5627)
iter 27700: loss 11.8929, time 118.99ms
iter 27710: loss 11.6504, time 120.22ms
iter 27720: loss 10.7521, time 120.37ms
iter 27730: loss 11.0287, time 120.73ms
iter 27740: loss 10.4371, time 120.64ms
step 27750: train loss 7.9143, val loss 7.9112
saving checkpoint to out-shakespeare-char
iter 27750: loss 10.1950, time 2825.97ms
iter 27760: loss 10.1086, time 122.32ms
iter 27770: loss 10.8435, time 122.78ms
iter 27780: loss 9.4694, time 122.81ms
iter 27790: loss 10.0504, time 122.77ms
tensor(0.5937)
iter 27800: loss 9.4137, time 120.35ms
iter 27810: loss 9.7046, time 122.93ms
iter 27820: loss 10.0059, time 122.70ms
iter 27830: loss 9.2510, time 122.87ms
iter 27840: loss 9.0647, time 122.81ms
iter 27850: loss 9.6557, time 120.88ms
iter 27860: loss 9.9033, time 121.45ms
iter 27870: loss 10.8186, time 121.30ms
iter 27880: loss 12.5001, time 119.61ms
iter 27890: loss 13.4492, time 119.28ms
tensor(0.6243)
iter 27900: loss 13.5754, time 119.03ms
iter 27910: loss 12.5980, time 118.73ms
iter 27920: loss 12.8151, time 119.10ms
iter 27930: loss 12.9624, time 119.36ms
iter 27940: loss 11.8255, time 119.87ms
iter 27950: loss 13.8884, time 120.76ms
iter 27960: loss 12.1937, time 120.01ms
iter 27970: loss 12.8372, time 121.57ms
iter 27980: loss 11.9723, time 122.52ms
iter 27990: loss 12.7339, time 122.74ms
tensor(0.6545)
step 28000: train loss 7.9293, val loss 7.9065
saving checkpoint to out-shakespeare-char
iter 28000: loss 12.9117, time 2862.63ms
iter 28010: loss 11.9380, time 118.32ms
iter 28020: loss 11.4606, time 118.80ms
iter 28030: loss 10.8909, time 118.71ms
iter 28040: loss 11.0427, time 119.04ms
iter 28050: loss 10.8893, time 119.58ms
iter 28060: loss 10.6887, time 120.43ms
iter 28070: loss 10.3376, time 119.70ms
iter 28080: loss 10.4668, time 121.59ms
iter 28090: loss 10.4341, time 122.93ms
tensor(0.6841)
iter 28100: loss 9.5417, time 123.05ms
iter 28110: loss 10.6224, time 122.92ms
iter 28120: loss 9.1875, time 120.83ms
iter 28130: loss 10.0205, time 121.03ms
iter 28140: loss 9.5480, time 120.23ms
iter 28150: loss 9.2208, time 117.88ms
iter 28160: loss 8.8772, time 118.74ms
iter 28170: loss 9.7077, time 118.78ms
iter 28180: loss 9.3577, time 119.15ms
iter 28190: loss 9.0171, time 118.48ms
tensor(0.7129)
iter 28200: loss 9.2225, time 119.06ms
iter 28210: loss 9.6618, time 118.80ms
iter 28220: loss 9.7303, time 118.75ms
iter 28230: loss 9.0243, time 118.22ms
iter 28240: loss 9.1936, time 118.07ms
step 28250: train loss 7.9617, val loss 7.9135
saving checkpoint to out-shakespeare-char
iter 28250: loss 9.6177, time 2842.91ms
iter 28260: loss 8.7984, time 122.15ms
iter 28270: loss 8.7920, time 122.77ms
iter 28280: loss 8.9062, time 120.84ms
iter 28290: loss 8.6266, time 122.58ms
tensor(0.7409)
iter 28300: loss 8.9771, time 123.08ms
iter 28310: loss 8.4573, time 122.65ms
iter 28320: loss 8.7799, time 122.79ms
iter 28330: loss 8.8921, time 120.95ms
iter 28340: loss 8.9761, time 121.37ms
iter 28350: loss 9.1109, time 120.91ms
iter 28360: loss 8.4803, time 118.50ms
iter 28370: loss 8.1234, time 118.79ms
iter 28380: loss 8.4680, time 118.64ms
iter 28390: loss 8.6914, time 118.95ms
tensor(0.7679)
iter 28400: loss 8.3690, time 118.97ms
iter 28410: loss 8.4293, time 118.45ms
iter 28420: loss 8.8086, time 118.32ms
iter 28430: loss 8.9978, time 118.99ms
iter 28440: loss 11.2972, time 119.46ms
iter 28450: loss 12.1511, time 120.33ms
iter 28460: loss 24.0263, time 120.50ms
iter 28470: loss 45.9007, time 120.98ms
iter 28480: loss 54.9255, time 121.01ms
iter 28490: loss 21.7753, time 121.57ms
tensor(0.7939)
step 28500: train loss 17.3145, val loss 17.5512
saving checkpoint to out-shakespeare-char
iter 28500: loss 19.4645, time 2853.14ms
iter 28510: loss 39.2030, time 120.64ms
iter 28520: loss 28.5411, time 118.97ms
iter 28530: loss 34.5478, time 120.28ms
iter 28540: loss 50.0382, time 119.24ms
iter 28550: loss 230.6378, time 118.16ms
iter 28560: loss 42.8855, time 118.90ms
iter 28570: loss 48.2929, time 118.68ms
iter 28580: loss 46.1453, time 119.05ms
iter 28590: loss 47.7666, time 119.22ms
tensor(0.8187)
iter 28600: loss 71.5779, time 120.37ms
iter 28610: loss 112.4349, time 120.04ms
iter 28620: loss 54.2340, time 120.82ms
iter 28630: loss 31.8197, time 121.64ms
iter 28640: loss 23.4662, time 120.99ms
iter 28650: loss 78.5142, time 120.78ms
iter 28660: loss 43.1667, time 120.10ms
iter 28670: loss 39.6580, time 121.02ms
iter 28680: loss 42.0329, time 120.15ms
iter 28690: loss 48.9116, time 120.70ms
tensor(0.8423)
iter 28700: loss 27.9729, time 121.74ms
iter 28710: loss 29.1252, time 120.86ms
iter 28720: loss 87.8921, time 120.33ms
iter 28730: loss 34.7785, time 121.40ms
iter 28740: loss 33.1884, time 121.42ms
step 28750: train loss 32.8766, val loss 34.1482
saving checkpoint to out-shakespeare-char
iter 28750: loss 49.8356, time 2856.89ms
iter 28760: loss 48.7759, time 120.93ms
iter 28770: loss 50.1000, time 119.00ms
iter 28780: loss 46.6574, time 117.72ms
iter 28790: loss 72.3759, time 118.94ms
tensor(0.8645)
iter 28800: loss 51.8780, time 118.98ms
iter 28810: loss 30.6216, time 118.87ms
iter 28820: loss 55.2502, time 118.99ms
iter 28830: loss 36.0485, time 118.50ms
iter 28840: loss 72.0639, time 119.46ms
iter 28850: loss 69.5810, time 120.20ms
iter 28860: loss 49.3121, time 121.08ms
iter 28870: loss 42.1887, time 121.34ms
iter 28880: loss 75.7561, time 120.90ms
iter 28890: loss 48.4242, time 122.88ms
tensor(0.8853)
iter 28900: loss 62.8380, time 123.18ms
iter 28910: loss 46.6957, time 122.90ms
iter 28920: loss 32.1435, time 122.98ms
iter 28930: loss 50.4080, time 118.93ms
iter 28940: loss 37.0096, time 120.93ms
iter 28950: loss 55.5379, time 121.06ms
iter 28960: loss 39.3838, time 120.73ms
iter 28970: loss 37.3608, time 118.90ms
iter 28980: loss 62.3138, time 118.84ms
iter 28990: loss 70.1749, time 119.11ms
tensor(0.9045)
step 29000: train loss 52.3052, val loss 51.9738
saving checkpoint to out-shakespeare-char
iter 29000: loss 73.0213, time 2846.31ms
iter 29010: loss 53.1570, time 122.32ms
iter 29020: loss 44.0949, time 123.20ms
iter 29030: loss 55.7400, time 122.59ms
iter 29040: loss 55.2905, time 121.07ms
iter 29050: loss 46.5652, time 122.65ms
iter 29060: loss 42.2611, time 122.53ms
iter 29070: loss 41.4971, time 122.53ms
iter 29080: loss 47.7325, time 122.93ms
iter 29090: loss 49.7134, time 120.08ms
tensor(0.9222)
iter 29100: loss 46.6795, time 120.81ms
iter 29110: loss 50.3578, time 120.59ms
iter 29120: loss 42.5188, time 120.57ms
iter 29130: loss 62.4040, time 120.58ms
iter 29140: loss 67.5358, time 118.37ms
iter 29150: loss 53.5722, time 120.63ms
iter 29160: loss 47.2838, time 120.64ms
iter 29170: loss 40.0504, time 120.49ms
iter 29180: loss 49.9040, time 121.26ms
iter 29190: loss 45.0928, time 118.32ms
tensor(0.9382)
iter 29200: loss 42.3653, time 119.03ms
iter 29210: loss 48.0191, time 118.50ms
iter 29220: loss 82.0725, time 118.38ms
iter 29230: loss 62.8280, time 118.46ms
iter 29240: loss 58.3966, time 118.36ms
step 29250: train loss 36.7686, val loss 36.6806
saving checkpoint to out-shakespeare-char
iter 29250: loss 62.8505, time 2852.39ms
iter 29260: loss 70.1750, time 120.55ms
iter 29270: loss 55.1671, time 120.02ms
iter 29280: loss 52.7416, time 120.57ms
iter 29290: loss 55.8020, time 120.56ms
tensor(0.9524)
iter 29300: loss 48.2415, time 120.88ms
iter 29310: loss 60.9540, time 120.81ms
iter 29320: loss 56.8405, time 120.86ms
iter 29330: loss 59.6598, time 120.86ms
iter 29340: loss 53.2169, time 120.76ms
iter 29350: loss 45.8439, time 120.47ms
iter 29360: loss 56.5988, time 121.56ms
iter 29370: loss 59.9147, time 120.97ms
iter 29380: loss 72.7959, time 121.03ms
iter 29390: loss 45.9120, time 121.52ms
tensor(0.9649)
iter 29400: loss 43.1510, time 120.80ms
iter 29410: loss 49.7243, time 122.18ms
iter 29420: loss 56.2533, time 121.87ms
iter 29430: loss 48.3251, time 122.64ms
iter 29440: loss 51.4343, time 122.45ms
iter 29450: loss 53.1400, time 120.56ms
iter 29460: loss 61.1464, time 122.59ms
iter 29470: loss 54.2226, time 122.49ms
iter 29480: loss 73.3119, time 122.57ms
iter 29490: loss 59.0940, time 122.55ms
tensor(0.9755)
step 29500: train loss 45.2522, val loss 45.7647
saving checkpoint to out-shakespeare-char
iter 29500: loss 58.2708, time 2839.18ms
iter 29510: loss 78.4734, time 120.55ms
iter 29520: loss 63.6635, time 118.72ms
iter 29530: loss 59.2439, time 118.40ms
iter 29540: loss 58.1920, time 118.17ms
iter 29550: loss 66.4768, time 118.38ms
iter 29560: loss 76.3738, time 117.42ms
iter 29570: loss 67.9516, time 118.36ms
iter 29580: loss 59.5459, time 118.38ms
iter 29590: loss 57.4561, time 118.69ms
tensor(0.9843)
iter 29600: loss 59.0429, time 118.67ms
iter 29610: loss 52.7419, time 118.46ms
iter 29620: loss 75.7774, time 118.71ms
iter 29630: loss 64.2324, time 118.34ms
iter 29640: loss 64.0513, time 118.43ms
iter 29650: loss 52.9352, time 118.41ms
iter 29660: loss 60.9088, time 117.58ms
iter 29670: loss 67.8285, time 118.74ms
iter 29680: loss 60.1250, time 119.00ms
iter 29690: loss 54.3583, time 118.46ms
tensor(0.9911)
iter 29700: loss 62.8050, time 118.66ms
iter 29710: loss 69.4239, time 118.38ms
iter 29720: loss 69.6413, time 120.05ms
iter 29730: loss 61.3439, time 118.80ms
iter 29740: loss 63.7916, time 118.77ms
step 29750: train loss 76.7207, val loss 76.6555
saving checkpoint to out-shakespeare-char
iter 29750: loss 96.6547, time 2855.60ms
iter 29760: loss 72.6448, time 122.42ms
iter 29770: loss 72.6056, time 122.47ms
iter 29780: loss 64.6982, time 122.50ms
iter 29790: loss 56.3868, time 121.08ms
tensor(0.9961)
iter 29800: loss 84.8156, time 122.74ms
iter 29810: loss 115.4474, time 122.51ms
iter 29820: loss 81.0140, time 122.46ms
iter 29830: loss 360.9821, time 122.79ms
iter 29840: loss 88.1173, time 120.75ms
iter 29850: loss 69.4487, time 122.66ms
iter 29860: loss 69.0436, time 122.78ms
iter 29870: loss 65.5858, time 120.76ms
iter 29880: loss 53.8896, time 120.99ms
iter 29890: loss 66.4189, time 121.42ms
tensor(0.9990)
iter 29900: loss 51.3914, time 118.45ms
iter 29910: loss 61.7451, time 118.59ms
iter 29920: loss 56.0760, time 118.77ms
iter 29930: loss 51.5984, time 118.63ms
iter 29940: loss 63.1225, time 118.70ms
iter 29950: loss 67.0667, time 119.15ms
iter 29960: loss 58.3501, time 119.46ms
iter 29970: loss 71.9937, time 119.96ms
iter 29980: loss 64.2079, time 121.76ms
iter 29990: loss 113.3581, time 121.71ms
tensor(1.)
step 30000: train loss 56.7914, val loss 57.2069
saving checkpoint to out-shakespeare-char
iter 30000: loss 83.1086, time 2853.16ms
iter 30010: loss 80.5453, time 119.60ms
iter 30020: loss 63.6359, time 119.30ms
iter 30030: loss 68.0751, time 119.44ms
iter 30040: loss 69.3850, time 120.68ms
iter 30050: loss 64.3508, time 121.51ms
iter 30060: loss 50.8807, time 122.38ms
iter 30070: loss 81.9219, time 123.26ms
iter 30080: loss 61.8402, time 121.05ms
iter 30090: loss 76.3119, time 121.40ms
tensor(0.9990)
iter 30100: loss 68.0568, time 121.63ms
iter 30110: loss 51.4112, time 119.21ms
iter 30120: loss 63.7918, time 118.95ms
iter 30130: loss 60.0639, time 119.14ms
iter 30140: loss 63.9562, time 118.23ms
iter 30150: loss 83.0968, time 120.50ms
iter 30160: loss 70.9109, time 121.85ms
iter 30170: loss 52.6312, time 123.78ms
iter 30180: loss 56.4410, time 123.38ms
iter 30190: loss 78.2319, time 120.88ms
tensor(0.9961)
iter 30200: loss 57.9586, time 121.52ms
iter 30210: loss 57.2427, time 121.17ms
iter 30220: loss 62.5124, time 119.97ms
iter 30230: loss 64.6308, time 119.81ms
iter 30240: loss 70.3789, time 120.85ms
step 30250: train loss 32.8588, val loss 32.2714
saving checkpoint to out-shakespeare-char
iter 30250: loss 52.5396, time 2828.93ms
iter 30260: loss 53.6307, time 122.56ms
iter 30270: loss 77.5298, time 123.65ms
iter 30280: loss 53.4179, time 121.04ms
iter 30290: loss 60.3599, time 121.28ms
tensor(0.9911)
iter 30300: loss 64.5867, time 121.69ms
iter 30310: loss 62.3420, time 119.50ms
iter 30320: loss 44.8003, time 118.98ms
iter 30330: loss 48.6984, time 121.42ms
iter 30340: loss 73.8013, time 120.28ms
iter 30350: loss 93.0627, time 121.98ms
iter 30360: loss 69.7483, time 123.14ms
iter 30370: loss 68.9469, time 121.22ms
iter 30380: loss 70.5669, time 118.65ms
iter 30390: loss 59.7164, time 120.69ms
tensor(0.9843)
iter 30400: loss 60.9554, time 119.33ms
iter 30410: loss 62.2281, time 119.67ms
iter 30420: loss 56.6263, time 119.73ms
iter 30430: loss 75.8918, time 121.13ms
iter 30440: loss 69.2211, time 119.92ms
iter 30450: loss 72.4217, time 122.19ms
iter 30460: loss 68.8973, time 122.60ms
iter 30470: loss 61.0635, time 121.18ms
iter 30480: loss 64.4204, time 121.18ms
iter 30490: loss 76.7172, time 121.17ms
tensor(0.9755)
step 30500: train loss 34.5885, val loss 34.2416
saving checkpoint to out-shakespeare-char
iter 30500: loss 61.3800, time 2840.54ms
iter 30510: loss 72.1040, time 121.61ms
iter 30520: loss 78.1118, time 122.95ms
iter 30530: loss 88.7602, time 122.74ms
iter 30540: loss 78.1773, time 118.07ms
iter 30550: loss 63.3662, time 115.08ms
iter 30560: loss 74.5936, time 116.99ms
iter 30570: loss 66.1495, time 118.51ms
iter 30580: loss 59.9635, time 116.35ms
iter 30590: loss 92.7755, time 116.98ms
tensor(0.9649)
iter 30600: loss 58.8538, time 118.99ms
iter 30610: loss 59.1514, time 115.62ms
iter 30620: loss 58.0035, time 116.89ms
iter 30630: loss 52.7888, time 119.62ms
iter 30640: loss 69.4835, time 117.05ms
iter 30650: loss 71.5175, time 116.93ms
iter 30660: loss 62.2174, time 118.41ms
iter 30670: loss 69.9606, time 114.45ms
iter 30680: loss 61.9818, time 115.44ms
iter 30690: loss 66.4925, time 119.41ms
tensor(0.9524)
iter 30700: loss 48.2211, time 115.36ms
iter 30710: loss 56.5668, time 114.87ms
iter 30720: loss 55.4959, time 119.06ms
iter 30730: loss 63.4080, time 114.77ms
iter 30740: loss 57.7677, time 116.20ms
step 30750: train loss 41.6137, val loss 42.1116
saving checkpoint to out-shakespeare-char
iter 30750: loss 60.7559, time 2857.35ms
iter 30760: loss 60.3825, time 117.07ms
iter 30770: loss 59.0470, time 118.29ms
iter 30780: loss 61.3849, time 116.11ms
iter 30790: loss 69.4159, time 115.01ms
tensor(0.9382)
iter 30800: loss 62.7903, time 118.59ms
iter 30810: loss 55.2570, time 116.08ms
iter 30820: loss 51.2562, time 116.97ms
iter 30830: loss 61.1867, time 118.55ms
iter 30840: loss 57.3449, time 116.01ms
iter 30850: loss 47.8662, time 114.82ms
iter 30860: loss 64.4477, time 118.38ms
iter 30870: loss 53.6977, time 116.07ms
iter 30880: loss 56.5267, time 117.04ms
iter 30890: loss 55.8111, time 118.53ms
tensor(0.9222)
iter 30900: loss 47.0330, time 116.57ms
iter 30910: loss 49.1134, time 115.25ms
iter 30920: loss 56.1168, time 117.28ms
iter 30930: loss 48.1729, time 117.49ms
iter 30940: loss 49.1551, time 116.91ms
iter 30950: loss 52.7771, time 118.45ms
iter 30960: loss 62.4141, time 116.03ms
iter 30970: loss 53.1178, time 114.82ms
iter 30980: loss 50.5919, time 118.41ms
iter 30990: loss 54.8039, time 113.77ms
tensor(0.9045)
step 31000: train loss 32.2845, val loss 32.6091
saving checkpoint to out-shakespeare-char
iter 31000: loss 53.4331, time 2841.04ms
iter 31010: loss 65.5248, time 115.64ms
iter 31020: loss 57.1135, time 115.87ms
iter 31030: loss 50.5439, time 115.95ms
iter 31040: loss 46.5250, time 115.26ms
iter 31050: loss 60.2608, time 115.34ms
iter 31060: loss 62.7514, time 113.55ms
iter 31070: loss 54.4173, time 115.17ms
iter 31080: loss 61.3528, time 114.93ms
iter 31090: loss 47.9908, time 117.25ms
tensor(0.8853)
iter 31100: loss 49.9441, time 116.56ms
iter 31110: loss 56.6070, time 116.49ms
iter 31120: loss 46.6788, time 115.05ms
iter 31130: loss 57.8520, time 115.20ms
iter 31140: loss 44.7689, time 115.03ms
iter 31150: loss 50.4883, time 115.17ms
iter 31160: loss 49.5839, time 115.09ms
iter 31170: loss 45.1315, time 114.99ms
iter 31180: loss 50.1806, time 116.01ms
iter 31190: loss 50.4101, time 115.08ms
tensor(0.8645)
iter 31200: loss 40.3397, time 115.43ms
iter 31210: loss 46.7136, time 114.23ms
iter 31220: loss 45.9690, time 117.07ms
iter 31230: loss 49.8878, time 113.70ms
iter 31240: loss 41.8578, time 118.28ms
step 31250: train loss 27.6980, val loss 27.7778
saving checkpoint to out-shakespeare-char
iter 31250: loss 40.2623, time 2849.09ms
iter 31260: loss 41.0158, time 115.69ms
iter 31270: loss 54.6838, time 117.07ms
iter 31280: loss 44.5342, time 116.44ms
iter 31290: loss 38.3096, time 116.26ms
tensor(0.8423)
iter 31300: loss 43.7971, time 117.86ms
iter 31310: loss 39.7314, time 118.30ms
iter 31320: loss 51.4628, time 115.62ms
iter 31330: loss 44.9116, time 116.93ms
iter 31340: loss 48.0584, time 116.16ms
iter 31350: loss 42.7896, time 116.52ms
iter 31360: loss 40.2155, time 116.20ms
iter 31370: loss 38.2391, time 118.12ms
iter 31380: loss 39.9454, time 115.95ms
iter 31390: loss 46.8904, time 117.14ms
tensor(0.8187)
iter 31400: loss 39.1275, time 116.17ms
iter 31410: loss 37.5771, time 116.30ms
iter 31420: loss 40.2355, time 117.11ms
iter 31430: loss 50.5931, time 118.40ms
iter 31440: loss 45.6922, time 115.01ms
iter 31450: loss 37.2033, time 116.70ms
iter 31460: loss 44.3986, time 115.99ms
iter 31470: loss 52.9712, time 115.99ms
iter 31480: loss 42.4096, time 116.98ms
iter 31490: loss 35.9890, time 118.61ms
tensor(0.7939)
step 31500: train loss 30.7946, val loss 31.2162
saving checkpoint to out-shakespeare-char
iter 31500: loss 45.0063, time 2849.07ms
iter 31510: loss 36.7384, time 116.01ms
iter 31520: loss 40.7871, time 115.06ms
iter 31530: loss 42.4356, time 117.45ms
iter 31540: loss 38.9216, time 116.11ms
iter 31550: loss 35.1560, time 116.17ms
iter 31560: loss 47.9697, time 118.39ms
iter 31570: loss 40.9656, time 116.29ms
iter 31580: loss 42.6893, time 114.84ms
iter 31590: loss 36.3637, time 118.37ms
tensor(0.7679)
iter 31600: loss 40.8951, time 116.41ms
iter 31610: loss 33.9596, time 117.03ms
iter 31620: loss 38.5394, time 119.38ms
iter 31630: loss 54.0042, time 116.79ms
iter 31640: loss 39.7813, time 114.82ms
iter 31650: loss 37.3609, time 119.15ms
iter 31660: loss 37.9726, time 116.93ms
iter 31670: loss 189.0095, time 114.94ms
iter 31680: loss 43.1541, time 118.48ms
iter 31690: loss 42.4079, time 116.77ms
tensor(0.7409)
iter 31700: loss 43.2633, time 115.78ms
iter 31710: loss 41.4849, time 117.09ms
iter 31720: loss 46.8481, time 121.77ms
iter 31730: loss 45.1394, time 122.08ms
iter 31740: loss 43.3934, time 121.81ms
step 31750: train loss 21.7171, val loss 22.0214
saving checkpoint to out-shakespeare-char
iter 31750: loss 36.3807, time 2833.44ms
iter 31760: loss 39.7419, time 121.62ms
iter 31770: loss 44.5857, time 123.87ms
iter 31780: loss 37.2751, time 121.57ms
iter 31790: loss 42.6680, time 121.66ms
tensor(0.7129)
iter 31800: loss 31.7323, time 119.81ms
iter 31810: loss 45.6178, time 121.27ms
iter 31820: loss 42.9961, time 119.89ms
iter 31830: loss 36.2462, time 122.41ms
iter 31840: loss 41.4071, time 123.73ms
iter 31850: loss 35.3521, time 121.65ms
iter 31860: loss 41.0960, time 121.67ms
iter 31870: loss 33.1746, time 119.81ms
iter 31880: loss 41.7465, time 121.70ms
iter 31890: loss 42.5508, time 122.03ms
tensor(0.6841)
iter 31900: loss 33.2269, time 122.03ms
iter 31910: loss 34.9544, time 122.17ms
iter 31920: loss 44.2030, time 121.64ms
iter 31930: loss 33.2911, time 121.90ms
iter 31940: loss 36.3732, time 120.65ms
iter 31950: loss 36.8220, time 121.82ms
iter 31960: loss 37.4434, time 121.16ms
iter 31970: loss 29.0478, time 123.18ms
iter 31980: loss 41.3486, time 122.36ms
iter 31990: loss 38.9318, time 119.72ms
tensor(0.6545)
step 32000: train loss 33.6995, val loss 34.2215
saving checkpoint to out-shakespeare-char
iter 32000: loss 42.5331, time 2818.53ms
iter 32010: loss 39.5561, time 122.40ms
iter 32020: loss 41.8476, time 122.15ms
iter 32030: loss 32.9841, time 122.04ms
iter 32040: loss 30.9851, time 121.56ms
iter 32050: loss 36.6594, time 120.29ms
iter 32060: loss 35.7016, time 120.80ms
iter 32070: loss 28.3070, time 122.52ms
iter 32080: loss 32.7214, time 123.97ms
iter 32090: loss 27.4171, time 122.16ms
tensor(0.6243)
iter 32100: loss 28.3971, time 120.28ms
iter 32110: loss 34.5884, time 119.50ms
iter 32120: loss 35.3081, time 122.08ms
iter 32130: loss 26.2484, time 123.68ms
iter 32140: loss 32.3418, time 121.94ms
iter 32150: loss 31.4269, time 121.67ms
iter 32160: loss 32.2700, time 119.69ms
iter 32170: loss 25.5868, time 122.16ms
iter 32180: loss 33.7803, time 122.32ms
iter 32190: loss 32.1505, time 122.09ms
tensor(0.5937)
iter 32200: loss 34.0279, time 122.20ms
iter 32210: loss 23.6102, time 121.78ms
iter 32220: loss 27.6906, time 120.51ms
iter 32230: loss 22.9485, time 121.85ms
iter 32240: loss 28.2360, time 122.09ms
step 32250: train loss 17.2947, val loss 17.4381
saving checkpoint to out-shakespeare-char
iter 32250: loss 27.6044, time 2830.44ms
iter 32260: loss 26.4810, time 121.66ms
iter 32270: loss 28.2064, time 119.78ms
iter 32280: loss 30.5808, time 121.52ms
iter 32290: loss 30.6814, time 121.87ms
tensor(0.5627)
iter 32300: loss 28.9236, time 123.86ms
iter 32310: loss 30.2164, time 121.88ms
iter 32320: loss 25.4947, time 121.92ms
iter 32330: loss 47.1516, time 120.35ms
iter 32340: loss 25.8086, time 121.72ms
iter 32350: loss 23.4023, time 121.38ms
iter 32360: loss 31.2516, time 123.52ms
iter 32370: loss 26.7329, time 121.72ms
iter 32380: loss 37.1925, time 121.66ms
iter 32390: loss 29.7703, time 119.76ms
tensor(0.5314)
iter 32400: loss 26.3868, time 119.71ms
iter 32410: loss 21.9685, time 121.63ms
iter 32420: loss 25.6663, time 123.62ms
iter 32430: loss 24.9513, time 121.43ms
iter 32440: loss 22.9673, time 121.84ms
iter 32450: loss 29.0792, time 119.74ms
iter 32460: loss 28.6964, time 121.24ms
iter 32470: loss 31.9266, time 121.76ms
iter 32480: loss 34.4355, time 121.95ms
iter 32490: loss 24.5830, time 124.07ms
tensor(0.5000)
step 32500: train loss 19.3270, val loss 19.0957
saving checkpoint to out-shakespeare-char
iter 32500: loss 25.7211, time 2834.92ms
iter 32510: loss 33.9816, time 120.91ms
iter 32520: loss 29.7884, time 121.78ms
iter 32530: loss 25.9621, time 123.13ms
iter 32540: loss 32.1598, time 121.88ms
iter 32550: loss 65.3302, time 121.68ms
iter 32560: loss 32.9991, time 120.52ms
iter 32570: loss 24.2828, time 120.74ms
iter 32580: loss 27.6497, time 121.85ms
iter 32590: loss 28.7266, time 122.89ms
tensor(0.4686)
iter 32600: loss 25.0019, time 122.27ms
iter 32610: loss 19.1621, time 121.89ms
iter 32620: loss 27.7956, time 122.09ms
iter 32630: loss 18.8819, time 119.87ms
iter 32640: loss 20.9057, time 121.73ms
iter 32650: loss 17.8903, time 121.76ms
iter 32660: loss 14.9293, time 123.79ms
iter 32670: loss 16.5029, time 121.68ms
iter 32680: loss 16.3896, time 119.52ms
iter 32690: loss 15.0412, time 121.78ms
tensor(0.4373)
iter 32700: loss 15.6795, time 121.72ms
iter 32710: loss 14.5186, time 122.74ms
iter 32720: loss 13.4575, time 124.39ms
iter 32730: loss 13.2272, time 121.73ms
iter 32740: loss 12.6272, time 121.97ms
step 32750: train loss 8.0819, val loss 8.0889
saving checkpoint to out-shakespeare-char
iter 32750: loss 12.8674, time 2854.07ms
iter 32760: loss 13.6486, time 122.91ms
iter 32770: loss 12.5009, time 121.86ms
iter 32780: loss 13.0440, time 121.90ms
iter 32790: loss 12.3111, time 120.02ms
tensor(0.4063)
iter 32800: loss 12.9708, time 121.19ms
iter 32810: loss 12.5597, time 121.80ms
iter 32820: loss 11.0646, time 123.28ms
iter 32830: loss 10.9628, time 121.74ms
iter 32840: loss 13.4292, time 121.47ms
iter 32850: loss 11.5014, time 119.29ms
iter 32860: loss 11.5185, time 120.47ms
iter 32870: loss 12.1484, time 121.61ms
iter 32880: loss 11.7575, time 122.37ms
iter 32890: loss 11.7210, time 121.78ms
tensor(0.3757)
iter 32900: loss 11.8458, time 121.60ms
iter 32910: loss 11.1907, time 121.80ms
iter 32920: loss 11.5550, time 119.96ms
iter 32930: loss 10.5240, time 121.81ms
iter 32940: loss 11.5116, time 121.69ms
iter 32950: loss 10.5827, time 123.19ms
iter 32960: loss 10.8888, time 122.12ms
iter 32970: loss 10.3038, time 119.43ms
iter 32980: loss 10.3185, time 120.89ms
iter 32990: loss 9.9202, time 119.36ms
tensor(0.3455)
step 33000: train loss 7.8902, val loss 7.8938
saving checkpoint to out-shakespeare-char
iter 33000: loss 9.5883, time 2833.55ms
iter 33010: loss 11.0645, time 123.88ms
iter 33020: loss 10.5596, time 121.94ms
iter 33030: loss 10.5676, time 119.80ms
iter 33040: loss 10.7583, time 121.86ms
iter 33050: loss 10.3218, time 120.25ms
iter 33060: loss 10.1328, time 121.82ms
iter 33070: loss 9.6788, time 122.63ms
iter 33080: loss 10.0549, time 123.94ms
iter 33090: loss 10.3789, time 121.78ms
tensor(0.3159)
iter 33100: loss 8.9336, time 122.34ms
iter 33110: loss 9.5810, time 120.35ms
iter 33120: loss 9.5576, time 122.16ms
iter 33130: loss 9.2613, time 123.30ms
iter 33140: loss 10.0839, time 121.95ms
iter 33150: loss 9.6640, time 121.84ms
iter 33160: loss 9.2789, time 119.83ms
iter 33170: loss 9.7990, time 120.59ms
iter 33180: loss 9.8371, time 120.03ms
iter 33190: loss 9.5576, time 122.04ms
tensor(0.2871)
iter 33200: loss 9.3132, time 123.68ms
iter 33210: loss 9.4548, time 123.91ms
iter 33220: loss 9.3117, time 121.51ms
iter 33230: loss 8.9356, time 120.83ms
iter 33240: loss 9.7112, time 118.70ms
step 33250: train loss 7.8765, val loss 7.8636
saving checkpoint to out-shakespeare-char
iter 33250: loss 9.1072, time 2822.07ms
iter 33260: loss 8.7523, time 120.65ms
iter 33270: loss 9.3057, time 120.50ms
iter 33280: loss 9.5156, time 122.21ms
iter 33290: loss 9.0187, time 121.16ms
tensor(0.2591)
iter 33300: loss 9.1232, time 124.03ms
iter 33310: loss 9.0785, time 121.88ms
iter 33320: loss 9.0279, time 121.70ms
iter 33330: loss 8.9261, time 121.46ms
iter 33340: loss 9.1014, time 118.74ms
iter 33350: loss 9.6375, time 119.44ms
iter 33360: loss 8.9560, time 121.12ms
iter 33370: loss 9.2067, time 121.22ms
iter 33380: loss 9.2050, time 123.56ms
iter 33390: loss 8.9938, time 123.94ms
tensor(0.2321)
iter 33400: loss 8.6678, time 126.44ms
iter 33410: loss 8.8427, time 119.53ms
iter 33420: loss 9.2315, time 120.51ms
iter 33430: loss 8.8489, time 120.18ms
iter 33440: loss 9.9318, time 122.78ms
iter 33450: loss 9.3265, time 123.74ms
iter 33460: loss 8.9968, time 121.27ms
iter 33470: loss 8.8021, time 121.86ms
iter 33480: loss 9.1841, time 120.20ms
iter 33490: loss 8.7226, time 122.32ms
tensor(0.2061)
step 33500: train loss 7.7847, val loss 7.7965
saving checkpoint to out-shakespeare-char
iter 33500: loss 8.5292, time 2838.16ms
iter 33510: loss 9.0143, time 121.72ms
iter 33520: loss 8.7002, time 121.79ms
iter 33530: loss 8.4384, time 120.34ms
iter 33540: loss 8.6081, time 121.68ms
iter 33550: loss 8.2402, time 124.17ms
iter 33560: loss 8.4680, time 123.97ms
iter 33570: loss 8.5796, time 119.63ms
iter 33580: loss 8.4407, time 119.60ms
iter 33590: loss 8.6907, time 119.60ms
tensor(0.1813)
iter 33600: loss 8.5398, time 120.65ms
iter 33610: loss 8.4504, time 121.39ms
iter 33620: loss 8.7001, time 123.37ms
iter 33630: loss 8.7488, time 122.16ms
iter 33640: loss 8.5595, time 123.90ms
iter 33650: loss 8.2957, time 121.97ms
iter 33660: loss 8.7471, time 119.61ms
iter 33670: loss 9.1539, time 119.45ms
iter 33680: loss 8.5501, time 120.74ms
iter 33690: loss 8.2687, time 122.51ms
tensor(0.1577)
iter 33700: loss 8.8087, time 124.15ms
iter 33710: loss 8.3490, time 121.45ms
iter 33720: loss 8.3373, time 121.42ms
iter 33730: loss 8.3907, time 119.56ms
iter 33740: loss 8.7027, time 119.66ms
step 33750: train loss 7.7903, val loss 7.7726
saving checkpoint to out-shakespeare-char
iter 33750: loss 8.7727, time 2828.08ms
iter 33760: loss 8.7863, time 123.23ms
iter 33770: loss 8.3242, time 121.86ms
iter 33780: loss 8.4580, time 121.47ms
iter 33790: loss 9.0672, time 119.95ms
tensor(0.1355)
iter 33800: loss 8.7846, time 119.82ms
iter 33810: loss 8.4625, time 119.47ms
iter 33820: loss 8.8388, time 120.02ms
iter 33830: loss 9.0249, time 120.06ms
iter 33840: loss 9.1876, time 123.69ms
iter 33850: loss 9.2807, time 123.52ms
iter 33860: loss 8.4554, time 121.29ms
iter 33870: loss 8.5838, time 119.95ms
iter 33880: loss 8.3406, time 119.83ms
iter 33890: loss 8.0527, time 120.35ms
tensor(0.1147)
iter 33900: loss 8.2084, time 122.21ms
iter 33910: loss 8.4336, time 121.74ms
iter 33920: loss 8.6297, time 123.91ms
iter 33930: loss 8.4176, time 123.04ms
iter 33940: loss 8.1663, time 121.94ms
iter 33950: loss 7.8306, time 119.53ms
iter 33960: loss 8.0533, time 119.33ms
iter 33970: loss 8.1497, time 119.64ms
iter 33980: loss 9.0270, time 121.67ms
iter 33990: loss 8.4668, time 122.09ms
tensor(0.0955)
step 34000: train loss 7.7336, val loss 7.7014
saving checkpoint to out-shakespeare-char
iter 34000: loss 8.1472, time 2830.37ms
iter 34010: loss 8.1937, time 121.85ms
iter 34020: loss 7.9505, time 118.96ms
iter 34030: loss 8.6927, time 119.42ms
iter 34040: loss 9.2116, time 119.73ms
iter 34050: loss 8.3770, time 120.94ms
iter 34060: loss 7.9790, time 121.95ms
iter 34070: loss 8.1956, time 123.63ms
iter 34080: loss 8.0075, time 121.76ms
iter 34090: loss 8.5396, time 121.06ms
tensor(0.0778)
iter 34100: loss 8.4181, time 122.23ms
iter 34110: loss 8.7632, time 118.53ms
iter 34120: loss 8.1579, time 119.05ms
iter 34130: loss 7.8758, time 119.81ms
iter 34140: loss 8.3427, time 121.58ms
iter 34150: loss 8.7169, time 122.15ms
iter 34160: loss 8.1357, time 121.02ms
iter 34170: loss 8.3596, time 123.43ms
iter 34180: loss 8.2921, time 123.66ms
iter 34190: loss 8.0235, time 121.71ms
tensor(0.0618)
iter 34200: loss 8.5929, time 119.76ms
iter 34210: loss 8.4731, time 118.47ms
iter 34220: loss 8.8260, time 119.22ms
iter 34230: loss 8.6673, time 120.43ms
iter 34240: loss 8.2546, time 121.85ms
step 34250: train loss 7.6941, val loss 7.7414
saving checkpoint to out-shakespeare-char
iter 34250: loss 8.0935, time 2839.84ms
iter 34260: loss 8.3333, time 121.42ms
iter 34270: loss 8.1829, time 119.71ms
iter 34280: loss 8.3513, time 119.63ms
iter 34290: loss 8.5014, time 121.18ms
tensor(0.0476)
iter 34300: loss 8.2197, time 122.49ms
iter 34310: loss 9.1477, time 123.73ms
iter 34320: loss 8.3851, time 123.84ms
iter 34330: loss 8.7167, time 121.33ms
iter 34340: loss 8.0709, time 116.25ms
iter 34350: loss 7.4860, time 119.36ms
iter 34360: loss 8.4692, time 119.55ms
iter 34370: loss 7.5923, time 119.96ms
iter 34380: loss 8.4200, time 122.14ms
iter 34390: loss 8.8759, time 123.58ms
tensor(0.0351)
iter 34400: loss 8.4828, time 123.94ms
iter 34410: loss 8.2941, time 119.73ms
iter 34420: loss 8.5848, time 121.15ms
iter 34430: loss 8.0309, time 119.69ms
iter 34440: loss 8.3614, time 119.96ms
iter 34450: loss 8.3818, time 120.84ms
iter 34460: loss 8.0884, time 122.59ms
iter 34470: loss 8.3257, time 121.94ms
iter 34480: loss 7.7338, time 123.80ms
iter 34490: loss 7.8541, time 120.94ms
tensor(0.0245)
step 34500: train loss 7.6474, val loss 7.7093
saving checkpoint to out-shakespeare-char
iter 34500: loss 8.3846, time 2870.32ms
iter 34510: loss 8.2787, time 124.49ms
iter 34520: loss 8.3661, time 121.62ms
iter 34530: loss 8.0661, time 120.81ms
iter 34540: loss 8.5160, time 119.43ms
iter 34550: loss 8.0908, time 121.62ms
iter 34560: loss 8.2987, time 121.54ms
iter 34570: loss 8.4697, time 122.19ms
iter 34580: loss 8.2260, time 123.27ms
iter 34590: loss 7.5203, time 123.66ms
tensor(0.0157)
iter 34600: loss 8.7878, time 119.84ms
iter 34610: loss 8.3780, time 120.62ms
iter 34620: loss 8.3031, time 120.62ms
iter 34630: loss 8.5692, time 124.17ms
iter 34640: loss 8.2765, time 122.83ms
iter 34650: loss 8.0489, time 121.96ms
iter 34660: loss 8.2315, time 120.47ms
iter 34670: loss 8.3134, time 119.82ms
iter 34680: loss 8.4458, time 117.17ms
iter 34690: loss 8.0785, time 119.54ms
tensor(0.0089)
iter 34700: loss 7.6272, time 121.91ms
iter 34710: loss 8.4767, time 123.18ms
iter 34720: loss 8.2135, time 123.68ms
iter 34730: loss 8.1302, time 121.59ms
iter 34740: loss 8.9974, time 121.60ms
step 34750: train loss 7.6829, val loss 7.6796
saving checkpoint to out-shakespeare-char
iter 34750: loss 8.4040, time 2836.42ms
iter 34760: loss 8.1785, time 122.30ms
iter 34770: loss 8.2456, time 122.07ms
iter 34780: loss 8.2694, time 123.05ms
iter 34790: loss 7.8743, time 123.68ms
tensor(0.0039)
iter 34800: loss 8.1926, time 123.72ms
iter 34810: loss 8.1576, time 119.40ms
iter 34820: loss 8.1914, time 122.82ms
iter 34830: loss 8.2643, time 118.71ms
iter 34840: loss 7.8067, time 118.85ms
iter 34850: loss 9.3451, time 118.67ms
iter 34860: loss 8.1621, time 118.66ms
iter 34870: loss 7.8862, time 119.21ms
iter 34880: loss 7.7418, time 119.00ms
iter 34890: loss 9.0136, time 118.68ms
tensor(0.0010)
iter 34900: loss 8.1573, time 119.56ms
iter 34910: loss 8.3659, time 119.48ms
iter 34920: loss 8.3276, time 118.62ms
iter 34930: loss 8.3552, time 119.97ms
iter 34940: loss 8.8856, time 120.12ms
iter 34950: loss 8.6503, time 120.43ms
iter 34960: loss 7.9919, time 120.16ms
iter 34970: loss 7.8031, time 118.00ms
iter 34980: loss 8.3169, time 119.06ms
iter 34990: loss 8.2157, time 118.79ms
tensor(0.0010)
step 35000: train loss 7.6738, val loss 7.6504
saving checkpoint to out-shakespeare-char
iter 35000: loss 8.0711, time 2849.73ms
iter 35010: loss 8.1547, time 121.79ms
iter 35020: loss 8.1978, time 119.89ms
iter 35030: loss 8.6366, time 122.09ms
iter 35040: loss 8.6797, time 122.17ms
iter 35050: loss 8.2820, time 122.17ms
iter 35060: loss 8.5943, time 121.71ms
iter 35070: loss 9.0096, time 120.06ms
iter 35080: loss 8.6191, time 121.46ms
iter 35090: loss 7.5998, time 120.68ms
tensor(0.0010)
iter 35100: loss 8.4621, time 122.38ms
iter 35110: loss 8.1021, time 122.20ms
iter 35120: loss 8.3868, time 120.40ms
iter 35130: loss 8.2362, time 122.17ms
iter 35140: loss 8.4493, time 120.75ms
iter 35150: loss 8.0412, time 121.64ms
iter 35160: loss 7.8373, time 121.88ms
iter 35170: loss 9.2261, time 120.77ms
iter 35180: loss 7.8076, time 121.76ms
iter 35190: loss 8.8863, time 121.74ms
tensor(0.0039)
iter 35200: loss 8.2124, time 123.24ms
iter 35210: loss 8.1981, time 122.48ms
iter 35220: loss 7.9339, time 120.88ms
iter 35230: loss 7.7716, time 122.30ms
iter 35240: loss 7.6403, time 121.89ms
step 35250: train loss 7.6481, val loss 7.6829
saving checkpoint to out-shakespeare-char
iter 35250: loss 8.6365, time 2832.83ms
iter 35260: loss 8.0919, time 120.94ms
iter 35270: loss 8.1657, time 122.10ms
iter 35280: loss 7.9695, time 122.39ms
iter 35290: loss 8.4203, time 122.72ms
tensor(0.0089)
iter 35300: loss 8.2105, time 120.90ms
iter 35310: loss 7.9273, time 120.93ms
iter 35320: loss 8.6315, time 120.35ms
iter 35330: loss 8.6501, time 120.90ms
iter 35340: loss 8.0791, time 120.64ms
iter 35350: loss 8.1466, time 118.50ms
iter 35360: loss 8.6722, time 118.63ms
iter 35370: loss 7.9342, time 118.58ms
iter 35380: loss 8.0756, time 118.68ms
iter 35390: loss 9.0379, time 118.66ms
tensor(0.0157)
iter 35400: loss 8.3662, time 118.92ms
iter 35410: loss 8.5563, time 118.38ms
iter 35420: loss 8.3014, time 118.42ms
iter 35430: loss 7.9447, time 118.84ms
iter 35440: loss 7.4662, time 119.41ms
iter 35450: loss 8.7564, time 119.67ms
iter 35460: loss 8.2548, time 119.58ms
iter 35470: loss 8.1297, time 121.57ms
iter 35480: loss 8.1315, time 120.77ms
iter 35490: loss 7.7226, time 122.23ms
tensor(0.0245)
step 35500: train loss 7.6531, val loss 7.6526
saving checkpoint to out-shakespeare-char
iter 35500: loss 8.0549, time 2848.40ms
iter 35510: loss 8.3399, time 120.96ms
iter 35520: loss 8.0597, time 118.91ms
iter 35530: loss 8.4104, time 118.73ms
iter 35540: loss 8.6975, time 119.03ms
iter 35550: loss 7.9501, time 118.94ms
iter 35560: loss 8.4115, time 118.80ms
iter 35570: loss 8.3951, time 119.03ms
iter 35580: loss 8.4088, time 119.78ms
iter 35590: loss 7.9716, time 121.36ms
tensor(0.0351)
iter 35600: loss 8.6976, time 122.03ms
iter 35610: loss 7.7767, time 122.56ms
iter 35620: loss 7.8077, time 122.84ms
iter 35630: loss 9.0577, time 123.05ms
iter 35640: loss 8.1977, time 120.33ms
iter 35650: loss 8.3828, time 121.13ms
iter 35660: loss 8.4074, time 121.05ms
iter 35670: loss 8.3156, time 120.57ms
iter 35680: loss 8.0818, time 118.49ms
iter 35690: loss 8.3155, time 118.06ms
tensor(0.0476)
iter 35700: loss 8.4296, time 118.99ms
iter 35710: loss 8.1137, time 118.86ms
iter 35720: loss 8.3129, time 119.16ms
iter 35730: loss 7.8982, time 117.63ms
iter 35740: loss 8.1678, time 120.12ms
step 35750: train loss 7.6961, val loss 7.6649
saving checkpoint to out-shakespeare-char
iter 35750: loss 8.4449, time 2852.14ms
iter 35760: loss 7.9915, time 117.12ms
iter 35770: loss 8.7809, time 115.37ms
iter 35780: loss 8.5276, time 118.81ms
iter 35790: loss 8.1395, time 116.81ms
tensor(0.0618)
iter 35800: loss 8.1168, time 114.82ms
iter 35810: loss 8.1040, time 116.57ms
iter 35820: loss 9.0687, time 117.78ms
iter 35830: loss 7.6900, time 115.13ms
iter 35840: loss 8.3036, time 118.76ms
iter 35850: loss 8.3074, time 117.39ms
iter 35860: loss 8.4044, time 114.57ms
iter 35870: loss 8.5262, time 117.17ms
iter 35880: loss 8.2818, time 117.91ms
iter 35890: loss 7.7706, time 115.01ms
tensor(0.0778)
iter 35900: loss 7.8028, time 117.76ms
iter 35910: loss 8.5154, time 117.41ms
iter 35920: loss 8.1720, time 115.43ms
iter 35930: loss 8.6156, time 114.93ms
iter 35940: loss 8.5464, time 117.86ms
iter 35950: loss 8.1874, time 115.20ms
iter 35960: loss 8.3776, time 118.81ms
iter 35970: loss 7.7417, time 117.57ms
iter 35980: loss 8.3417, time 114.57ms
iter 35990: loss 8.0177, time 116.41ms
tensor(0.0955)
step 36000: train loss 7.7481, val loss 7.7183
saving checkpoint to out-shakespeare-char
iter 36000: loss 8.0590, time 2848.04ms
iter 36010: loss 8.1152, time 117.54ms
iter 36020: loss 8.6698, time 114.99ms
iter 36030: loss 8.1230, time 118.92ms
iter 36040: loss 8.2362, time 117.29ms
iter 36050: loss 8.3115, time 114.62ms
iter 36060: loss 8.6801, time 118.72ms
iter 36070: loss 8.0449, time 117.69ms
iter 36080: loss 8.5238, time 114.79ms
iter 36090: loss 8.0172, time 118.65ms
tensor(0.1147)
iter 36100: loss 8.6236, time 118.42ms
iter 36110: loss 8.2492, time 114.60ms
iter 36120: loss 8.5524, time 118.66ms
iter 36130: loss 8.0760, time 117.75ms
iter 36140: loss 8.0842, time 114.43ms
iter 36150: loss 8.4351, time 119.22ms
iter 36160: loss 7.9282, time 117.44ms
iter 36170: loss 7.7161, time 114.59ms
iter 36180: loss 8.3145, time 118.58ms
iter 36190: loss 8.3572, time 117.89ms
tensor(0.1355)
iter 36200: loss 8.3637, time 115.50ms
iter 36210: loss 8.2610, time 119.23ms
iter 36220: loss 8.4507, time 117.79ms
iter 36230: loss 8.1712, time 114.66ms
iter 36240: loss 8.2850, time 116.93ms
step 36250: train loss 7.7665, val loss 7.7518
saving checkpoint to out-shakespeare-char
iter 36250: loss 8.8240, time 2848.44ms
iter 36260: loss 8.7131, time 116.85ms
iter 36270: loss 8.5727, time 114.74ms
iter 36280: loss 8.0257, time 117.49ms
iter 36290: loss 8.4497, time 116.63ms
tensor(0.1577)
iter 36300: loss 7.8753, time 115.25ms
iter 36310: loss 7.5576, time 118.73ms
iter 36320: loss 7.7039, time 116.73ms
iter 36330: loss 8.3384, time 114.65ms
iter 36340: loss 7.6176, time 116.80ms
iter 36350: loss 8.2296, time 117.43ms
iter 36360: loss 8.1315, time 115.06ms
iter 36370: loss 7.9542, time 118.37ms
iter 36380: loss 7.9728, time 116.85ms
iter 36390: loss 8.1158, time 114.57ms
tensor(0.1813)
iter 36400: loss 8.1960, time 117.22ms
iter 36410: loss 8.2564, time 117.28ms
iter 36420: loss 8.7931, time 114.63ms
iter 36430: loss 7.8686, time 118.68ms
iter 36440: loss 8.0593, time 116.79ms
iter 36450: loss 7.7599, time 114.50ms
iter 36460: loss 8.5395, time 116.67ms
iter 36470: loss 7.6028, time 117.20ms
iter 36480: loss 7.7734, time 114.86ms
iter 36490: loss 8.3095, time 118.80ms
tensor(0.2061)
step 36500: train loss 7.7836, val loss 7.8146
saving checkpoint to out-shakespeare-char
iter 36500: loss 8.7469, time 2854.47ms
iter 36510: loss 8.4905, time 117.28ms
iter 36520: loss 8.3474, time 114.67ms
iter 36530: loss 8.2409, time 118.61ms
iter 36540: loss 8.2731, time 117.10ms
iter 36550: loss 8.4004, time 114.62ms
iter 36560: loss 8.4316, time 118.35ms
iter 36570: loss 8.2212, time 117.32ms
iter 36580: loss 8.4258, time 114.54ms
iter 36590: loss 8.8219, time 119.22ms
tensor(0.2321)
iter 36600: loss 8.1434, time 119.00ms
iter 36610: loss 8.9637, time 116.12ms
iter 36620: loss 8.1114, time 117.41ms
iter 36630: loss 8.3890, time 117.33ms
iter 36640: loss 8.2826, time 115.63ms
iter 36650: loss 8.3936, time 116.95ms
iter 36660: loss 8.0890, time 117.53ms
iter 36670: loss 8.1751, time 115.79ms
iter 36680: loss 8.5960, time 116.75ms
iter 36690: loss 8.5418, time 117.82ms
tensor(0.2591)
iter 36700: loss 8.4177, time 116.05ms
iter 36710: loss 8.5778, time 116.79ms
iter 36720: loss 7.7987, time 117.62ms
iter 36730: loss 8.6323, time 115.91ms
iter 36740: loss 8.1132, time 116.89ms
step 36750: train loss 7.8433, val loss 7.8263
saving checkpoint to out-shakespeare-char
iter 36750: loss 7.7442, time 2838.95ms
iter 36760: loss 8.5426, time 116.78ms
iter 36770: loss 8.0414, time 115.43ms
iter 36780: loss 8.4210, time 118.80ms
iter 36790: loss 7.9850, time 115.11ms
tensor(0.2871)
iter 36800: loss 7.2173, time 115.30ms
iter 36810: loss 7.9816, time 118.69ms
iter 36820: loss 8.4125, time 117.08ms
iter 36830: loss 8.2785, time 114.51ms
iter 36840: loss 7.9406, time 118.92ms
iter 36850: loss 8.5456, time 114.57ms
iter 36860: loss 8.4574, time 116.45ms
iter 36870: loss 8.8384, time 117.86ms
iter 36880: loss 8.3834, time 116.73ms
iter 36890: loss 8.3284, time 114.84ms
tensor(0.3159)
iter 36900: loss 7.6880, time 119.20ms
iter 36910: loss 7.9655, time 114.63ms
iter 36920: loss 7.6229, time 115.16ms
iter 36930: loss 8.0502, time 119.08ms
iter 36940: loss 8.1174, time 116.71ms
iter 36950: loss 7.7650, time 115.19ms
iter 36960: loss 8.7649, time 118.69ms
iter 36970: loss 8.7197, time 114.56ms
iter 36980: loss 8.2162, time 116.65ms
iter 36990: loss 8.2881, time 118.08ms
tensor(0.3455)
step 37000: train loss 7.8620, val loss 7.8824
saving checkpoint to out-shakespeare-char
iter 37000: loss 8.6448, time 2843.87ms
iter 37010: loss 8.5113, time 116.75ms
iter 37020: loss 8.5717, time 116.73ms
iter 37030: loss 7.9087, time 116.84ms
iter 37040: loss 8.3916, time 116.01ms
iter 37050: loss 7.7191, time 116.13ms
iter 37060: loss 7.9778, time 118.89ms
iter 37070: loss 8.5102, time 116.99ms
iter 37080: loss 8.1828, time 116.81ms
iter 37090: loss 7.8477, time 116.83ms
tensor(0.3757)
iter 37100: loss 8.7074, time 117.04ms
iter 37110: loss 8.4885, time 116.68ms
iter 37120: loss 8.1157, time 118.91ms
iter 37130: loss 7.9511, time 117.28ms
iter 37140: loss 8.0696, time 115.17ms
iter 37150: loss 7.9347, time 116.03ms
iter 37160: loss 8.1680, time 116.79ms
iter 37170: loss 8.6097, time 115.85ms
iter 37180: loss 8.5637, time 119.13ms
iter 37190: loss 8.5989, time 116.71ms
tensor(0.4063)
iter 37200: loss 7.9569, time 115.38ms
iter 37210: loss 7.8523, time 115.83ms
iter 37220: loss 8.0430, time 116.74ms
iter 37230: loss 8.7701, time 114.76ms
iter 37240: loss 7.7230, time 119.16ms
step 37250: train loss 7.8985, val loss 7.8901
saving checkpoint to out-shakespeare-char
iter 37250: loss 7.6207, time 2851.71ms
iter 37260: loss 7.9005, time 120.46ms
iter 37270: loss 7.6425, time 119.00ms
iter 37280: loss 7.7872, time 118.29ms
iter 37290: loss 8.2110, time 116.43ms
tensor(0.4373)
iter 37300: loss 8.7047, time 119.69ms
iter 37310: loss 8.1263, time 120.30ms
iter 37320: loss 8.5494, time 120.78ms
iter 37330: loss 7.8290, time 121.05ms
iter 37340: loss 8.1138, time 120.60ms
iter 37350: loss 8.1668, time 120.63ms
iter 37360: loss 8.2254, time 120.92ms
iter 37370: loss 8.7593, time 121.71ms
iter 37380: loss 8.7919, time 120.51ms
iter 37390: loss 8.7126, time 122.38ms
tensor(0.4686)
iter 37400: loss 8.7640, time 123.04ms
iter 37410: loss 8.4082, time 121.75ms
iter 37420: loss 8.9757, time 120.65ms
iter 37430: loss 8.3963, time 118.98ms
iter 37440: loss 8.2912, time 119.97ms
iter 37450: loss 8.4919, time 120.53ms
iter 37460: loss 8.8287, time 120.78ms
iter 37470: loss 8.8235, time 119.50ms
iter 37480: loss 8.2099, time 118.41ms
iter 37490: loss 8.2226, time 120.64ms
tensor(0.5000)
step 37500: train loss 7.8718, val loss 7.9277
saving checkpoint to out-shakespeare-char
iter 37500: loss 8.4089, time 2857.53ms
iter 37510: loss 8.8352, time 120.62ms
iter 37520: loss 8.0817, time 121.57ms
iter 37530: loss 8.1928, time 120.66ms
iter 37540: loss 8.4198, time 120.78ms
iter 37550: loss 8.0563, time 121.36ms
iter 37560: loss 7.9830, time 122.03ms
iter 37570: loss 8.8263, time 122.42ms
iter 37580: loss 8.0956, time 122.05ms
iter 37590: loss 7.8261, time 120.68ms
tensor(0.5314)
iter 37600: loss 8.5494, time 121.98ms
iter 37610: loss 8.0864, time 120.64ms
iter 37620: loss 8.2375, time 120.42ms
iter 37630: loss 8.3354, time 120.77ms
iter 37640: loss 7.9774, time 118.37ms
iter 37650: loss 8.3816, time 120.66ms
iter 37660: loss 8.3548, time 120.57ms
iter 37670: loss 8.1916, time 120.45ms
iter 37680: loss 8.2033, time 120.79ms
iter 37690: loss 8.5378, time 118.64ms
tensor(0.5627)
iter 37700: loss 8.0861, time 118.15ms
iter 37710: loss 7.9986, time 119.00ms
iter 37720: loss 8.4514, time 119.21ms
iter 37730: loss 8.0689, time 120.09ms
iter 37740: loss 8.0042, time 121.13ms
step 37750: train loss 7.9171, val loss 7.9381
saving checkpoint to out-shakespeare-char
iter 37750: loss 8.1222, time 2859.94ms
iter 37760: loss 8.6082, time 120.80ms
iter 37770: loss 8.2088, time 120.58ms
iter 37780: loss 8.8712, time 120.98ms
iter 37790: loss 8.1992, time 119.48ms
tensor(0.5937)
iter 37800: loss 8.2941, time 119.50ms
iter 37810: loss 8.6944, time 118.36ms
iter 37820: loss 7.8096, time 119.64ms
iter 37830: loss 8.3728, time 120.52ms
iter 37840: loss 7.7843, time 120.53ms
iter 37850: loss 8.3555, time 120.57ms
iter 37860: loss 8.4263, time 118.38ms
iter 37870: loss 7.8579, time 120.72ms
iter 37880: loss 8.4402, time 121.70ms
iter 37890: loss 8.3317, time 115.87ms
tensor(0.6243)
iter 37900: loss 8.6526, time 115.05ms
iter 37910: loss 8.4832, time 118.59ms
iter 37920: loss 8.0997, time 115.76ms
iter 37930: loss 8.3622, time 120.89ms
iter 37940: loss 8.0455, time 119.56ms
iter 37950: loss 8.5412, time 121.02ms
iter 37960: loss 7.8532, time 121.20ms
iter 37970: loss 8.3138, time 121.46ms
iter 37980: loss 8.4114, time 122.53ms
iter 37990: loss 7.9802, time 123.18ms
tensor(0.6545)
step 38000: train loss 7.9246, val loss 7.9126
saving checkpoint to out-shakespeare-char
iter 38000: loss 8.6652, time 2851.98ms
iter 38010: loss 8.9384, time 120.56ms
iter 38020: loss 8.9254, time 120.69ms
iter 38030: loss 8.2705, time 121.12ms
iter 38040: loss 8.0971, time 123.10ms
iter 38050: loss 8.0954, time 122.74ms
iter 38060: loss 7.9839, time 118.84ms
iter 38070: loss 7.9514, time 121.18ms
iter 38080: loss 8.1005, time 122.03ms
iter 38090: loss 7.5596, time 119.84ms
tensor(0.6841)
iter 38100: loss 8.1443, time 121.07ms
iter 38110: loss 8.2730, time 120.54ms
iter 38120: loss 7.9418, time 120.33ms
iter 38130: loss 8.4434, time 122.80ms
iter 38140: loss 8.5877, time 121.39ms
iter 38150: loss 8.5039, time 121.46ms
iter 38160: loss 8.5508, time 121.36ms
iter 38170: loss 8.4163, time 119.30ms
iter 38180: loss 8.0468, time 120.94ms
iter 38190: loss 8.6673, time 120.65ms
tensor(0.7129)
iter 38200: loss 8.0343, time 120.86ms
iter 38210: loss 8.4705, time 123.17ms
iter 38220: loss 8.8119, time 123.42ms
iter 38230: loss 7.9827, time 121.66ms
iter 38240: loss 8.3752, time 121.40ms
step 38250: train loss 7.9589, val loss 7.8974
saving checkpoint to out-shakespeare-char
iter 38250: loss 8.1833, time 2858.26ms
iter 38260: loss 8.1411, time 121.26ms
iter 38270: loss 8.1137, time 123.31ms
iter 38280: loss 8.0363, time 121.21ms
iter 38290: loss 7.6804, time 121.62ms
tensor(0.7409)
iter 38300: loss 8.2978, time 121.76ms
iter 38310: loss 8.2079, time 119.39ms
iter 38320: loss 9.0265, time 118.94ms
iter 38330: loss 8.3739, time 121.35ms
iter 38340: loss 8.5436, time 121.56ms
iter 38350: loss 8.4040, time 122.72ms
iter 38360: loss 8.0946, time 122.56ms
iter 38370: loss 8.2620, time 121.02ms
iter 38380: loss 8.2301, time 120.59ms
iter 38390: loss 7.5285, time 121.21ms
tensor(0.7679)
iter 38400: loss 8.1462, time 119.67ms
iter 38410: loss 8.4536, time 119.59ms
iter 38420: loss 8.4341, time 120.82ms
iter 38430: loss 8.0560, time 120.46ms
iter 38440: loss 8.0440, time 121.30ms
iter 38450: loss 8.2938, time 121.26ms
iter 38460: loss 8.3574, time 122.72ms
iter 38470: loss 8.4983, time 120.24ms
iter 38480: loss 8.6952, time 121.57ms
iter 38490: loss 8.5516, time 121.37ms
tensor(0.7939)
step 38500: train loss 7.9224, val loss 8.0116
saving checkpoint to out-shakespeare-char
iter 38500: loss 8.3370, time 2862.30ms
iter 38510: loss 7.8744, time 121.50ms
iter 38520: loss 8.3597, time 121.48ms
iter 38530: loss 8.2791, time 120.89ms
iter 38540: loss 8.0763, time 121.56ms
iter 38550: loss 7.8825, time 119.22ms
iter 38560: loss 8.5444, time 120.97ms
iter 38570: loss 7.9831, time 118.80ms
iter 38580: loss 8.7822, time 121.48ms
iter 38590: loss 8.5557, time 122.80ms
tensor(0.8187)
iter 38600: loss 8.9169, time 124.18ms
iter 38610: loss 8.0485, time 121.14ms
iter 38620: loss 8.0615, time 121.50ms
iter 38630: loss 8.2854, time 119.13ms
iter 38640: loss 8.1631, time 119.80ms
iter 38650: loss 8.0509, time 120.82ms
iter 38660: loss 7.8507, time 121.21ms
iter 38670: loss 7.8063, time 121.30ms
iter 38680: loss 8.5313, time 123.16ms
iter 38690: loss 8.9957, time 123.40ms
tensor(0.8423)
iter 38700: loss 8.6303, time 119.52ms
iter 38710: loss 7.7733, time 121.42ms
iter 38720: loss 8.5511, time 119.63ms
iter 38730: loss 7.9374, time 119.16ms
iter 38740: loss 8.6149, time 121.45ms
step 38750: train loss 7.9437, val loss 8.0053
saving checkpoint to out-shakespeare-char
iter 38750: loss 8.9847, time 2850.21ms
iter 38760: loss 8.3654, time 119.28ms
iter 38770: loss 7.6501, time 121.58ms
iter 38780: loss 8.3728, time 119.68ms
iter 38790: loss 8.6639, time 120.20ms
tensor(0.8645)
iter 38800: loss 8.3317, time 121.65ms
iter 38810: loss 8.6874, time 121.13ms
iter 38820: loss 7.8874, time 119.68ms
iter 38830: loss 9.0383, time 123.20ms
iter 38840: loss 8.5226, time 121.53ms
iter 38850: loss 8.9609, time 121.47ms
iter 38860: loss 8.9147, time 122.05ms
iter 38870: loss 8.4659, time 119.41ms
iter 38880: loss 7.6505, time 119.77ms
iter 38890: loss 8.5452, time 121.69ms
tensor(0.8853)
iter 38900: loss 8.1092, time 121.33ms
iter 38910: loss 8.0262, time 123.44ms
iter 38920: loss 8.0811, time 123.67ms
iter 38930: loss 8.3356, time 121.68ms
iter 38940: loss 8.6104, time 121.40ms
iter 38950: loss 8.2311, time 118.81ms
iter 38960: loss 8.5177, time 121.20ms
iter 38970: loss 7.8621, time 120.43ms
iter 38980: loss 7.8899, time 120.82ms
iter 38990: loss 8.4270, time 121.37ms
tensor(0.9045)
step 39000: train loss 7.9658, val loss 7.9592
saving checkpoint to out-shakespeare-char
iter 39000: loss 7.5331, time 2839.56ms
iter 39010: loss 8.9230, time 121.84ms
iter 39020: loss 8.3084, time 121.73ms
iter 39030: loss 8.4534, time 120.99ms
iter 39040: loss 8.0491, time 121.28ms
iter 39050: loss 8.8696, time 119.35ms
iter 39060: loss 8.3997, time 120.76ms
iter 39070: loss 8.1227, time 119.41ms
iter 39080: loss 8.8108, time 121.44ms
iter 39090: loss 8.2441, time 123.02ms
tensor(0.9222)
iter 39100: loss 8.6072, time 123.98ms
iter 39110: loss 8.2008, time 121.22ms
iter 39120: loss 8.5321, time 121.34ms
iter 39130: loss 7.9765, time 120.95ms
iter 39140: loss 8.1908, time 119.07ms
iter 39150: loss 8.1462, time 119.81ms
iter 39160: loss 8.2166, time 121.32ms
iter 39170: loss 8.5672, time 121.13ms
iter 39180: loss 9.0397, time 123.32ms
iter 39190: loss 7.9707, time 122.23ms
tensor(0.9382)
iter 39200: loss 7.7449, time 119.51ms
iter 39210: loss 7.8825, time 121.22ms
iter 39220: loss 8.4837, time 118.97ms
iter 39230: loss 8.4891, time 119.69ms
iter 39240: loss 8.0573, time 120.97ms
step 39250: train loss 8.0225, val loss 7.9891
saving checkpoint to out-shakespeare-char
iter 39250: loss 8.8155, time 2857.85ms
iter 39260: loss 8.3956, time 119.30ms
iter 39270: loss 8.7605, time 121.29ms
iter 39280: loss 8.0945, time 119.36ms
iter 39290: loss 8.4642, time 119.68ms
tensor(0.9524)
iter 39300: loss 8.0001, time 121.34ms
iter 39310: loss 8.4062, time 121.20ms
iter 39320: loss 8.3978, time 119.95ms
iter 39330: loss 7.8443, time 123.32ms
iter 39340: loss 7.7387, time 121.60ms
iter 39350: loss 8.9112, time 121.12ms
iter 39360: loss 8.3579, time 121.12ms
iter 39370: loss 8.1684, time 119.34ms
iter 39380: loss 8.1699, time 120.16ms
iter 39390: loss 8.4657, time 121.26ms
tensor(0.9649)
iter 39400: loss 8.8420, time 121.44ms
iter 39410: loss 8.7030, time 122.69ms
iter 39420: loss 8.6903, time 123.29ms
iter 39430: loss 8.5049, time 121.12ms
iter 39440: loss 8.5076, time 121.60ms
iter 39450: loss 7.9272, time 119.23ms
iter 39460: loss 8.4150, time 119.06ms
iter 39470: loss 8.2525, time 120.60ms
iter 39480: loss 7.8328, time 121.17ms
iter 39490: loss 8.1875, time 121.37ms
tensor(0.9755)
step 39500: train loss 7.9398, val loss 7.9398
saving checkpoint to out-shakespeare-char
iter 39500: loss 8.0508, time 2863.72ms
iter 39510: loss 9.1320, time 119.22ms
iter 39520: loss 9.0661, time 118.20ms
iter 39530: loss 8.3250, time 122.22ms
iter 39540: loss 8.2098, time 121.61ms
iter 39550: loss 8.7870, time 123.02ms
iter 39560: loss 8.3934, time 123.20ms
iter 39570: loss 8.7273, time 120.84ms
iter 39580: loss 8.4151, time 121.13ms
iter 39590: loss 8.8513, time 120.31ms
tensor(0.9843)
iter 39600: loss 9.2409, time 120.43ms
iter 39610: loss 8.3452, time 121.51ms
iter 39620: loss 8.7577, time 121.29ms
iter 39630: loss 8.3117, time 122.22ms
iter 39640: loss 8.3213, time 123.73ms
iter 39650: loss 8.5378, time 119.51ms
iter 39660: loss 8.7027, time 121.34ms
iter 39670: loss 8.5604, time 121.27ms
iter 39680: loss 8.2495, time 119.34ms
iter 39690: loss 9.3868, time 120.45ms
tensor(0.9911)
iter 39700: loss 8.9322, time 121.49ms
iter 39710: loss 7.7897, time 119.68ms
iter 39720: loss 8.1547, time 122.88ms
iter 39730: loss 7.8701, time 122.63ms
iter 39740: loss 8.3217, time 121.59ms
step 39750: train loss 7.9761, val loss 7.9487
saving checkpoint to out-shakespeare-char
iter 39750: loss 7.7106, time 2845.46ms
iter 39760: loss 8.0576, time 121.24ms
iter 39770: loss 8.0518, time 119.13ms
iter 39780: loss 8.4923, time 122.28ms
iter 39790: loss 8.7562, time 123.44ms
tensor(0.9961)
iter 39800: loss 8.1911, time 120.92ms
iter 39810: loss 8.6701, time 121.14ms
iter 39820: loss 8.0356, time 121.24ms
iter 39830: loss 8.0671, time 121.20ms
iter 39840: loss 8.5821, time 118.97ms
iter 39850: loss 7.7706, time 120.55ms
iter 39860: loss 8.0637, time 120.98ms
iter 39870: loss 8.4451, time 121.29ms
iter 39880: loss 8.4261, time 122.78ms
iter 39890: loss 8.5427, time 123.25ms
tensor(0.9990)
iter 39900: loss 8.1084, time 119.40ms
iter 39910: loss 8.6939, time 121.32ms
iter 39920: loss 8.9027, time 119.45ms
iter 39930: loss 8.4785, time 118.70ms
iter 39940: loss 8.9820, time 119.74ms
iter 39950: loss 8.8218, time 120.79ms
iter 39960: loss 8.3159, time 119.27ms
iter 39970: loss 8.5149, time 121.66ms
iter 39980: loss 8.5461, time 122.81ms
iter 39990: loss 8.5653, time 123.16ms
tensor(1.)
step 40000: train loss 8.0202, val loss 8.0044
saving checkpoint to out-shakespeare-char
iter 40000: loss 8.4593, time 2847.88ms
iter 40010: loss 8.1495, time 119.14ms
iter 40020: loss 8.9441, time 119.15ms
iter 40030: loss 8.4591, time 119.43ms
iter 40040: loss 8.8076, time 120.56ms
iter 40050: loss 8.3909, time 121.39ms
iter 40060: loss 8.5670, time 122.15ms
iter 40070: loss 8.2057, time 120.52ms
iter 40080: loss 8.2555, time 122.94ms
iter 40090: loss 8.8784, time 123.04ms
tensor(0.9990)
iter 40100: loss 8.6280, time 121.45ms
iter 40110: loss 8.1681, time 121.13ms
iter 40120: loss 8.5929, time 118.86ms
iter 40130: loss 8.4690, time 119.08ms
iter 40140: loss 8.2954, time 118.99ms
iter 40150: loss 7.9053, time 119.13ms
iter 40160: loss 9.1674, time 120.14ms
iter 40170: loss 8.5460, time 120.95ms
iter 40180: loss 8.7195, time 122.73ms
iter 40190: loss 8.6740, time 123.26ms
tensor(0.9961)
iter 40200: loss 8.5952, time 121.41ms
iter 40210: loss 8.8148, time 122.92ms
iter 40220: loss 8.1364, time 121.48ms
iter 40230: loss 9.1642, time 121.15ms
iter 40240: loss 8.8056, time 119.26ms
step 40250: train loss 8.0077, val loss 7.9948
saving checkpoint to out-shakespeare-char
iter 40250: loss 8.2668, time 2851.57ms
iter 40260: loss 8.6234, time 115.85ms
iter 40270: loss 8.0058, time 116.75ms
iter 40280: loss 8.0675, time 116.91ms
iter 40290: loss 9.1646, time 115.40ms
tensor(0.9911)
iter 40300: loss 8.3599, time 116.32ms
iter 40310: loss 7.5431, time 118.65ms
iter 40320: loss 7.9902, time 115.12ms
iter 40330: loss 7.9802, time 116.75ms
iter 40340: loss 8.0762, time 116.67ms
iter 40350: loss 7.9625, time 115.38ms
iter 40360: loss 8.7165, time 115.95ms
iter 40370: loss 8.4381, time 118.57ms
iter 40380: loss 8.3049, time 115.26ms
iter 40390: loss 8.7104, time 116.66ms
tensor(0.9843)
iter 40400: loss 7.8630, time 117.37ms
iter 40410: loss 7.9037, time 115.11ms
iter 40420: loss 8.7444, time 116.62ms
iter 40430: loss 8.5069, time 118.69ms
iter 40440: loss 8.4120, time 115.17ms
iter 40450: loss 8.3781, time 116.69ms
iter 40460: loss 8.7845, time 116.82ms
iter 40470: loss 8.3567, time 115.44ms
iter 40480: loss 7.8765, time 116.72ms
iter 40490: loss 8.4433, time 118.67ms
tensor(0.9755)
step 40500: train loss 7.9804, val loss 8.0096
saving checkpoint to out-shakespeare-char
iter 40500: loss 8.2189, time 2845.14ms
iter 40510: loss 8.4333, time 115.72ms
iter 40520: loss 8.1771, time 114.49ms
iter 40530: loss 8.4639, time 118.70ms
iter 40540: loss 8.5362, time 114.17ms
iter 40550: loss 8.2691, time 116.65ms
iter 40560: loss 7.8314, time 119.03ms
iter 40570: loss 8.6710, time 115.51ms
iter 40580: loss 8.1054, time 114.55ms
iter 40590: loss 8.4686, time 118.69ms
tensor(0.9649)
iter 40600: loss 8.3754, time 115.86ms
iter 40610: loss 8.7059, time 116.69ms
iter 40620: loss 8.2225, time 118.67ms
iter 40630: loss 8.5915, time 115.29ms
iter 40640: loss 8.7905, time 113.51ms
iter 40650: loss 8.2391, time 118.53ms
iter 40660: loss 7.9929, time 115.20ms
iter 40670: loss 8.1472, time 116.61ms
iter 40680: loss 8.5785, time 118.75ms
iter 40690: loss 8.0508, time 115.74ms
tensor(0.9524)
iter 40700: loss 7.9560, time 115.42ms
iter 40710: loss 8.9403, time 118.72ms
iter 40720: loss 8.2854, time 115.25ms
iter 40730: loss 8.5762, time 116.73ms
iter 40740: loss 8.6026, time 118.42ms
step 40750: train loss 7.9798, val loss 7.9529
saving checkpoint to out-shakespeare-char
iter 40750: loss 9.2521, time 2842.05ms
iter 40760: loss 9.5313, time 115.13ms
iter 40770: loss 8.7681, time 117.30ms
iter 40780: loss 8.1128, time 118.43ms
iter 40790: loss 8.8468, time 114.55ms
tensor(0.9382)
iter 40800: loss 8.8213, time 118.70ms
iter 40810: loss 9.0804, time 118.50ms
iter 40820: loss 9.5670, time 115.36ms
iter 40830: loss 8.8930, time 117.25ms
iter 40840: loss 8.8239, time 118.65ms
iter 40850: loss 9.0829, time 114.63ms
iter 40860: loss 9.2061, time 116.65ms
iter 40870: loss 8.5992, time 119.10ms
iter 40880: loss 8.5567, time 115.84ms
iter 40890: loss 8.1666, time 114.95ms
tensor(0.9222)
iter 40900: loss 8.4086, time 119.45ms
iter 40910: loss 9.0704, time 114.70ms
iter 40920: loss 8.6228, time 115.07ms
iter 40930: loss 9.1315, time 120.30ms
iter 40940: loss 8.3276, time 115.22ms
iter 40950: loss 9.1103, time 115.92ms
iter 40960: loss 8.5756, time 118.80ms
iter 40970: loss 8.5636, time 114.60ms
iter 40980: loss 8.2980, time 117.05ms
iter 40990: loss 8.8043, time 118.56ms
tensor(0.9045)
step 41000: train loss 7.9916, val loss 8.0166
saving checkpoint to out-shakespeare-char
iter 41000: loss 7.8307, time 2853.51ms
iter 41010: loss 8.0047, time 115.07ms
iter 41020: loss 8.9673, time 115.36ms
iter 41030: loss 8.3238, time 117.02ms
iter 41040: loss 8.3293, time 114.72ms
iter 41050: loss 8.2923, time 116.84ms
iter 41060: loss 8.6695, time 118.98ms
iter 41070: loss 8.9194, time 114.52ms
iter 41080: loss 8.0544, time 115.08ms
iter 41090: loss 8.3674, time 116.81ms
tensor(0.8853)
iter 41100: loss 8.4241, time 115.91ms
iter 41110: loss 8.9284, time 115.29ms
iter 41120: loss 7.9182, time 118.88ms
iter 41130: loss 7.9709, time 115.45ms
iter 41140: loss 8.1924, time 116.68ms
iter 41150: loss 8.7745, time 117.03ms
iter 41160: loss 8.3717, time 115.25ms
iter 41170: loss 8.4357, time 115.29ms
iter 41180: loss 7.8479, time 118.59ms
iter 41190: loss 8.6835, time 115.46ms
tensor(0.8645)
iter 41200: loss 7.7510, time 115.96ms
iter 41210: loss 8.4021, time 116.76ms
iter 41220: loss 8.2496, time 116.97ms
iter 41230: loss 8.2486, time 115.30ms
iter 41240: loss 8.2865, time 118.95ms
step 41250: train loss 8.0317, val loss 8.0492
saving checkpoint to out-shakespeare-char
iter 41250: loss 8.4104, time 2838.96ms
iter 41260: loss 8.8266, time 114.55ms
iter 41270: loss 8.9463, time 116.05ms
iter 41280: loss 8.1589, time 116.75ms
iter 41290: loss 7.9284, time 114.20ms
tensor(0.8423)
iter 41300: loss 7.2221, time 118.10ms
iter 41310: loss 8.1393, time 116.75ms
iter 41320: loss 8.6130, time 114.74ms
iter 41330: loss 8.1674, time 117.26ms
iter 41340: loss 8.7802, time 116.86ms
iter 41350: loss 8.0531, time 114.60ms
iter 41360: loss 8.2549, time 118.43ms
iter 41370: loss 8.2908, time 117.06ms
iter 41380: loss 8.4527, time 115.07ms
iter 41390: loss 8.9226, time 118.04ms
tensor(0.8187)
iter 41400: loss 9.2123, time 116.90ms
iter 41410: loss 9.5176, time 114.95ms
iter 41420: loss 9.3250, time 118.72ms
iter 41430: loss 9.7339, time 115.34ms
iter 41440: loss 9.5311, time 115.11ms
iter 41450: loss 9.0885, time 119.01ms
iter 41460: loss 9.2309, time 115.89ms
iter 41470: loss 8.9787, time 114.75ms
iter 41480: loss 8.9717, time 118.85ms
iter 41490: loss 8.7786, time 115.45ms
tensor(0.7939)
step 41500: train loss 7.9786, val loss 7.9755
saving checkpoint to out-shakespeare-char
iter 41500: loss 9.2965, time 2851.57ms
iter 41510: loss 9.3600, time 114.76ms
iter 41520: loss 8.5722, time 118.96ms
iter 41530: loss 8.9595, time 115.84ms
iter 41540: loss 8.9032, time 114.71ms
iter 41550: loss 8.2467, time 118.61ms
iter 41560: loss 8.0047, time 114.80ms
iter 41570: loss 8.7629, time 116.40ms
iter 41580: loss 8.8802, time 119.02ms
iter 41590: loss 8.0563, time 115.43ms
tensor(0.7679)
iter 41600: loss 8.7971, time 115.61ms
iter 41610: loss 8.9737, time 118.57ms
iter 41620: loss 8.3182, time 115.21ms
iter 41630: loss 8.3148, time 117.06ms
iter 41640: loss 8.4659, time 118.65ms
iter 41650: loss 8.0643, time 115.68ms
iter 41660: loss 8.5384, time 114.87ms
iter 41670: loss 8.2710, time 118.70ms
iter 41680: loss 8.3063, time 114.71ms
iter 41690: loss 8.0274, time 115.04ms
tensor(0.7409)
iter 41700: loss 8.3134, time 119.35ms
iter 41710: loss 8.5430, time 115.69ms
iter 41720: loss 8.0919, time 115.39ms
iter 41730: loss 8.4348, time 118.96ms
iter 41740: loss 8.0054, time 114.93ms
step 41750: train loss 7.9053, val loss 7.9434
saving checkpoint to out-shakespeare-char
iter 41750: loss 8.5902, time 2852.58ms
iter 41760: loss 8.1355, time 115.16ms
iter 41770: loss 7.8526, time 118.61ms
iter 41780: loss 8.5789, time 115.08ms
iter 41790: loss 8.2192, time 115.23ms
tensor(0.7129)
iter 41800: loss 8.5832, time 117.38ms
iter 41810: loss 8.7455, time 115.52ms
iter 41820: loss 8.2280, time 115.92ms
iter 41830: loss 8.0127, time 118.98ms
iter 41840: loss 8.4329, time 115.61ms
iter 41850: loss 8.6500, time 116.66ms
iter 41860: loss 8.2857, time 116.71ms
iter 41870: loss 8.6633, time 115.46ms
iter 41880: loss 7.8489, time 116.72ms
iter 41890: loss 8.0534, time 118.94ms
tensor(0.6841)
iter 41900: loss 8.5520, time 116.09ms
iter 41910: loss 8.3516, time 115.09ms
iter 41920: loss 8.8394, time 117.31ms
iter 41930: loss 7.9799, time 115.51ms
iter 41940: loss 8.3178, time 117.05ms
iter 41950: loss 8.3874, time 118.80ms
iter 41960: loss 7.8689, time 115.65ms
iter 41970: loss 8.4402, time 115.10ms
iter 41980: loss 8.5466, time 115.98ms
iter 41990: loss 9.0548, time 114.72ms
tensor(0.6545)
step 42000: train loss 7.8977, val loss 7.8843
saving checkpoint to out-shakespeare-char
iter 42000: loss 8.3328, time 2841.49ms
iter 42010: loss 7.9807, time 118.81ms
iter 42020: loss 8.4193, time 116.37ms
iter 42030: loss 8.0322, time 114.86ms
iter 42040: loss 7.7213, time 118.54ms
iter 42050: loss 8.2082, time 115.13ms
iter 42060: loss 8.3818, time 114.46ms
iter 42070: loss 7.7509, time 117.63ms
iter 42080: loss 8.3028, time 114.53ms
iter 42090: loss 7.7338, time 116.72ms
tensor(0.6243)
iter 42100: loss 8.6985, time 119.25ms
iter 42110: loss 8.3308, time 114.52ms
iter 42120: loss 7.9038, time 115.55ms
iter 42130: loss 8.2399, time 117.21ms
iter 42140: loss 8.3680, time 114.49ms
iter 42150: loss 7.7250, time 118.64ms
iter 42160: loss 7.9081, time 115.91ms
iter 42170: loss 8.7447, time 114.84ms
iter 42180: loss 8.2626, time 115.85ms
iter 42190: loss 8.1860, time 115.30ms
tensor(0.5937)
iter 42200: loss 7.9084, time 117.23ms
iter 42210: loss 8.2863, time 118.39ms
iter 42220: loss 8.1923, time 115.88ms
iter 42230: loss 7.5831, time 116.79ms
iter 42240: loss 7.9577, time 116.75ms
step 42250: train loss 7.8751, val loss 7.8808
saving checkpoint to out-shakespeare-char
iter 42250: loss 7.9862, time 2846.26ms
iter 42260: loss 8.2150, time 114.90ms
iter 42270: loss 7.8508, time 116.06ms
iter 42280: loss 7.9055, time 118.54ms
iter 42290: loss 7.6591, time 114.62ms
tensor(0.5627)
iter 42300: loss 8.1988, time 117.93ms
iter 42310: loss 7.4084, time 118.11ms
iter 42320: loss 7.4489, time 114.74ms
iter 42330: loss 7.9662, time 118.25ms
iter 42340: loss 8.1513, time 116.69ms
iter 42350: loss 7.9589, time 113.72ms
iter 42360: loss 7.9468, time 118.76ms
iter 42370: loss 8.0891, time 116.62ms
iter 42380: loss 8.8898, time 115.43ms
iter 42390: loss 7.4764, time 119.16ms
tensor(0.5314)
iter 42400: loss 8.8242, time 117.34ms
iter 42410: loss 8.1151, time 115.14ms
iter 42420: loss 7.9768, time 118.14ms
iter 42430: loss 7.8517, time 116.59ms
iter 42440: loss 7.9727, time 114.29ms
iter 42450: loss 8.1488, time 119.00ms
iter 42460: loss 8.0283, time 116.37ms
iter 42470: loss 7.7991, time 114.68ms
iter 42480: loss 7.8951, time 118.72ms
iter 42490: loss 7.6478, time 117.33ms
tensor(0.5000)
step 42500: train loss 7.8306, val loss 7.8399
saving checkpoint to out-shakespeare-char
iter 42500: loss 8.6431, time 2844.71ms
iter 42510: loss 7.5265, time 113.82ms
iter 42520: loss 8.1229, time 118.50ms
iter 42530: loss 8.2803, time 115.31ms
iter 42540: loss 8.9225, time 114.58ms
iter 42550: loss 7.8716, time 118.70ms
iter 42560: loss 7.9954, time 116.96ms
iter 42570: loss 7.8925, time 113.94ms
iter 42580: loss 8.3281, time 118.75ms
iter 42590: loss 8.3253, time 116.04ms
tensor(0.4686)
iter 42600: loss 7.9902, time 114.98ms
iter 42610: loss 7.9677, time 118.61ms
iter 42620: loss 7.8130, time 117.29ms
iter 42630: loss 8.2886, time 114.68ms
iter 42640: loss 8.5261, time 119.45ms
iter 42650: loss 8.4886, time 115.37ms
iter 42660: loss 7.8255, time 114.61ms
iter 42670: loss 8.3552, time 118.52ms
iter 42680: loss 8.1795, time 117.17ms
iter 42690: loss 8.2424, time 114.70ms
tensor(0.4373)
iter 42700: loss 8.1860, time 119.23ms
iter 42710: loss 8.5255, time 115.28ms
iter 42720: loss 7.9991, time 114.53ms
iter 42730: loss 8.0735, time 119.39ms
iter 42740: loss 8.5097, time 117.38ms
step 42750: train loss 7.7986, val loss 7.8160
saving checkpoint to out-shakespeare-char
iter 42750: loss 8.1368, time 2849.41ms
iter 42760: loss 8.3828, time 114.54ms
iter 42770: loss 8.1848, time 116.73ms
iter 42780: loss 7.9960, time 117.18ms
iter 42790: loss 8.1486, time 114.55ms
tensor(0.4063)
iter 42800: loss 7.9432, time 119.23ms
iter 42810: loss 8.1722, time 117.29ms
iter 42820: loss 8.7343, time 114.99ms
iter 42830: loss 8.0478, time 117.20ms
iter 42840: loss 7.8471, time 117.57ms
iter 42850: loss 8.0964, time 114.47ms
iter 42860: loss 7.5540, time 118.76ms
iter 42870: loss 7.9129, time 117.73ms
iter 42880: loss 8.0298, time 114.55ms
iter 42890: loss 8.0124, time 116.66ms
tensor(0.3757)
iter 42900: loss 8.0487, time 117.83ms
iter 42910: loss 8.1730, time 114.51ms
iter 42920: loss 8.0427, time 118.69ms
iter 42930: loss 8.2848, time 117.14ms
iter 42940: loss 7.7134, time 114.53ms
iter 42950: loss 7.8296, time 116.74ms
iter 42960: loss 8.3635, time 117.06ms
iter 42970: loss 7.9104, time 114.50ms
iter 42980: loss 8.0828, time 118.60ms
iter 42990: loss 8.4327, time 115.97ms
tensor(0.3455)
step 43000: train loss 7.8122, val loss 7.8149
saving checkpoint to out-shakespeare-char
iter 43000: loss 7.8342, time 2830.72ms
iter 43010: loss 7.6340, time 116.78ms
iter 43020: loss 8.3428, time 118.67ms
iter 43030: loss 7.7374, time 117.22ms
iter 43040: loss 7.8887, time 117.06ms
iter 43050: loss 8.2280, time 118.92ms
iter 43060: loss 8.7739, time 117.32ms
iter 43070: loss 8.0522, time 116.68ms
iter 43080: loss 7.4525, time 118.71ms
iter 43090: loss 8.7083, time 117.17ms
tensor(0.3159)
iter 43100: loss 7.5710, time 116.74ms
iter 43110: loss 7.6658, time 119.26ms
iter 43120: loss 8.4496, time 116.72ms
iter 43130: loss 8.0140, time 116.09ms
iter 43140: loss 8.1375, time 118.79ms
iter 43150: loss 8.2063, time 116.75ms
iter 43160: loss 8.7599, time 116.82ms
iter 43170: loss 7.8835, time 118.90ms
iter 43180: loss 7.8238, time 116.74ms
iter 43190: loss 8.1814, time 116.66ms
tensor(0.2871)
iter 43200: loss 7.8926, time 119.15ms
iter 43210: loss 7.8084, time 116.71ms
iter 43220: loss 7.9928, time 115.94ms
iter 43230: loss 7.6838, time 118.76ms
iter 43240: loss 8.1545, time 116.81ms
step 43250: train loss 7.7476, val loss 7.7703
saving checkpoint to out-shakespeare-char
iter 43250: loss 8.0968, time 2840.54ms
iter 43260: loss 7.7218, time 116.28ms
iter 43270: loss 7.7527, time 118.75ms
iter 43280: loss 8.1413, time 114.84ms
iter 43290: loss 7.9930, time 116.66ms
tensor(0.2591)
iter 43300: loss 8.0428, time 116.34ms
iter 43310: loss 7.6129, time 116.11ms
iter 43320: loss 7.4736, time 116.67ms
iter 43330: loss 7.4228, time 118.44ms
iter 43340: loss 8.1756, time 115.12ms
iter 43350: loss 8.3981, time 116.64ms
iter 43360: loss 8.0533, time 115.57ms
iter 43370: loss 7.9765, time 116.15ms
iter 43380: loss 8.5806, time 116.69ms
iter 43390: loss 8.0539, time 117.95ms
tensor(0.2321)
iter 43400: loss 8.5362, time 116.57ms
iter 43410: loss 7.7377, time 117.21ms
iter 43420: loss 7.9014, time 115.72ms
iter 43430: loss 8.0294, time 114.79ms
iter 43440: loss 7.5369, time 117.02ms
iter 43450: loss 7.4555, time 117.85ms
iter 43460: loss 7.7872, time 115.75ms
iter 43470: loss 7.2038, time 116.60ms
iter 43480: loss 8.4679, time 115.70ms
iter 43490: loss 7.7078, time 115.56ms
tensor(0.2061)
step 43500: train loss 7.7627, val loss 7.7165
saving checkpoint to out-shakespeare-char
iter 43500: loss 7.5734, time 2832.24ms
iter 43510: loss 7.5317, time 117.74ms
iter 43520: loss 7.9544, time 116.69ms
iter 43530: loss 7.8627, time 114.47ms
iter 43540: loss 7.7180, time 117.11ms
iter 43550: loss 8.0475, time 117.54ms
iter 43560: loss 7.9818, time 114.91ms
iter 43570: loss 7.7674, time 117.44ms
iter 43580: loss 8.3300, time 116.22ms
iter 43590: loss 8.0095, time 114.30ms
tensor(0.1813)
iter 43600: loss 7.8816, time 116.67ms
iter 43610: loss 8.2129, time 116.66ms
iter 43620: loss 7.6273, time 115.94ms
iter 43630: loss 8.3935, time 118.73ms
iter 43640: loss 8.3570, time 116.64ms
iter 43650: loss 7.8051, time 115.47ms
iter 43660: loss 8.2115, time 116.72ms
iter 43670: loss 7.9458, time 116.73ms
iter 43680: loss 7.5064, time 114.57ms
iter 43690: loss 7.7497, time 118.72ms
tensor(0.1577)
iter 43700: loss 7.9417, time 117.23ms
iter 43710: loss 7.8058, time 115.11ms
iter 43720: loss 8.2098, time 116.64ms
iter 43730: loss 7.7851, time 117.03ms
iter 43740: loss 7.2533, time 114.58ms
step 43750: train loss 7.7731, val loss 7.7557
saving checkpoint to out-shakespeare-char
iter 43750: loss 7.1820, time 2841.53ms
iter 43760: loss 7.5060, time 118.97ms
iter 43770: loss 8.5342, time 117.00ms
iter 43780: loss 7.8396, time 114.50ms
iter 43790: loss 7.8247, time 117.58ms
tensor(0.1355)
iter 43800: loss 7.9372, time 117.65ms
iter 43810: loss 7.3016, time 115.03ms
iter 43820: loss 8.1355, time 116.99ms
iter 43830: loss 7.4086, time 117.25ms
iter 43840: loss 7.8222, time 114.80ms
iter 43850: loss 7.9202, time 117.48ms
iter 43860: loss 7.8627, time 117.02ms
iter 43870: loss 8.1716, time 114.85ms
iter 43880: loss 7.7768, time 117.99ms
iter 43890: loss 8.4920, time 116.80ms
tensor(0.1147)
iter 43900: loss 8.4644, time 115.32ms
iter 43910: loss 8.2321, time 116.94ms
iter 43920: loss 8.0426, time 117.07ms
iter 43930: loss 7.5505, time 114.80ms
iter 43940: loss 7.8182, time 116.98ms
iter 43950: loss 8.1372, time 117.06ms
iter 43960: loss 7.9482, time 115.28ms
iter 43970: loss 7.9638, time 118.11ms
iter 43980: loss 7.7798, time 116.67ms
iter 43990: loss 8.5026, time 115.00ms
tensor(0.0955)
step 44000: train loss 7.6883, val loss 7.7226
saving checkpoint to out-shakespeare-char
iter 44000: loss 7.9393, time 2832.21ms
iter 44010: loss 7.7004, time 115.81ms
iter 44020: loss 7.8817, time 115.82ms
iter 44030: loss 7.4767, time 116.66ms
iter 44040: loss 8.2221, time 117.80ms
iter 44050: loss 7.6620, time 115.44ms
iter 44060: loss 8.1894, time 116.68ms
iter 44070: loss 7.7446, time 115.35ms
iter 44080: loss 7.3668, time 114.59ms
iter 44090: loss 7.1492, time 117.00ms
tensor(0.0778)
iter 44100: loss 8.0637, time 117.74ms
iter 44110: loss 7.7813, time 115.74ms
iter 44120: loss 7.4828, time 116.63ms
iter 44130: loss 7.5569, time 115.59ms
iter 44140: loss 8.0320, time 115.50ms
iter 44150: loss 7.7605, time 117.38ms
iter 44160: loss 8.2512, time 117.66ms
iter 44170: loss 8.3863, time 115.23ms
iter 44180: loss 7.8343, time 116.43ms
iter 44190: loss 7.6218, time 114.76ms
tensor(0.0618)
iter 44200: loss 8.4916, time 115.53ms
iter 44210: loss 7.7976, time 116.56ms
iter 44220: loss 7.8267, time 117.76ms
iter 44230: loss 7.7196, time 114.32ms
iter 44240: loss 7.5237, time 116.82ms
step 44250: train loss 7.6986, val loss 7.6488
saving checkpoint to out-shakespeare-char
iter 44250: loss 8.5013, time 2832.87ms
iter 44260: loss 7.7496, time 116.16ms
iter 44270: loss 7.9121, time 115.77ms
iter 44280: loss 7.4904, time 119.73ms
iter 44290: loss 7.6968, time 123.96ms
tensor(0.0476)
iter 44300: loss 8.0310, time 123.38ms
iter 44310: loss 7.5357, time 121.09ms
iter 44320: loss 7.9449, time 120.92ms
iter 44330: loss 7.4699, time 119.09ms
iter 44340: loss 8.5094, time 121.25ms
iter 44350: loss 8.1797, time 119.18ms
iter 44360: loss 8.1555, time 119.78ms
iter 44370: loss 8.0892, time 120.03ms
iter 44380: loss 7.9638, time 121.27ms
iter 44390: loss 8.0208, time 121.56ms
tensor(0.0351)
iter 44400: loss 7.8023, time 121.42ms
iter 44410: loss 7.8146, time 123.04ms
iter 44420: loss 8.0035, time 123.13ms
iter 44430: loss 7.9815, time 121.18ms
iter 44440: loss 8.0744, time 121.42ms
iter 44450: loss 7.9749, time 119.12ms
iter 44460: loss 7.2981, time 119.29ms
iter 44470: loss 7.8050, time 119.67ms
iter 44480: loss 8.1368, time 119.16ms
iter 44490: loss 7.8764, time 120.68ms
tensor(0.0245)
step 44500: train loss 7.6451, val loss 7.6774
saving checkpoint to out-shakespeare-char
iter 44500: loss 7.5188, time 2832.79ms
iter 44510: loss 7.3773, time 121.30ms
iter 44520: loss 8.1878, time 122.44ms
iter 44530: loss 8.2713, time 120.80ms
iter 44540: loss 7.9672, time 121.38ms
iter 44550: loss 8.1371, time 121.21ms
iter 44560: loss 8.3612, time 119.09ms
iter 44570: loss 7.9418, time 119.10ms
iter 44580: loss 7.8611, time 119.97ms
iter 44590: loss 8.3881, time 120.45ms
tensor(0.0157)
iter 44600: loss 7.6345, time 121.38ms
iter 44610: loss 7.5500, time 121.15ms
iter 44620: loss 8.2964, time 119.67ms
iter 44630: loss 7.8573, time 122.41ms
iter 44640: loss 8.3051, time 123.22ms
iter 44650: loss 7.9700, time 121.34ms
iter 44660: loss 7.5655, time 120.33ms
iter 44670: loss 7.7280, time 120.97ms
iter 44680: loss 7.7882, time 118.41ms
iter 44690: loss 8.3955, time 121.13ms
tensor(0.0089)
iter 44700: loss 7.1178, time 119.60ms
iter 44710: loss 7.9178, time 120.33ms
iter 44720: loss 7.8453, time 120.72ms
iter 44730: loss 7.8497, time 121.27ms
iter 44740: loss 8.3056, time 121.95ms
step 44750: train loss 7.6376, val loss 7.6594
saving checkpoint to out-shakespeare-char
iter 44750: loss 7.7050, time 2858.67ms
iter 44760: loss 7.9480, time 120.87ms
iter 44770: loss 8.3547, time 119.02ms
iter 44780: loss 8.3494, time 119.77ms
iter 44790: loss 7.6685, time 120.40ms
tensor(0.0039)
iter 44800: loss 7.9588, time 121.87ms
iter 44810: loss 7.4659, time 121.19ms
iter 44820: loss 8.1347, time 122.25ms
iter 44830: loss 7.7420, time 123.21ms
iter 44840: loss 7.5630, time 123.21ms
iter 44850: loss 7.5350, time 120.82ms
iter 44860: loss 7.6819, time 119.01ms
iter 44870: loss 7.8847, time 120.90ms
iter 44880: loss 7.3541, time 121.30ms
iter 44890: loss 7.3829, time 119.27ms
tensor(0.0010)
iter 44900: loss 7.4931, time 119.18ms
iter 44910: loss 7.0574, time 119.82ms
iter 44920: loss 7.7936, time 118.94ms
iter 44930: loss 8.0657, time 120.98ms
iter 44940: loss 7.7526, time 120.99ms
iter 44950: loss 7.4626, time 121.04ms
iter 44960: loss 8.0911, time 122.15ms
iter 44970: loss 7.7864, time 121.16ms
iter 44980: loss 8.1476, time 123.17ms
iter 44990: loss 7.6126, time 120.91ms
tensor(0.0010)
step 45000: train loss 7.6448, val loss 7.6661
saving checkpoint to out-shakespeare-char
iter 45000: loss 8.6224, time 2849.63ms
iter 45010: loss 7.6780, time 120.74ms
iter 45020: loss 8.0819, time 121.07ms
iter 45030: loss 7.4475, time 119.38ms
iter 45040: loss 7.9729, time 122.00ms
iter 45050: loss 7.1886, time 123.13ms
iter 45060: loss 7.9108, time 123.10ms
iter 45070: loss 8.1178, time 120.80ms
iter 45080: loss 7.4067, time 120.79ms
iter 45090: loss 7.8822, time 120.08ms
tensor(0.0010)
iter 45100: loss 7.6994, time 121.25ms
iter 45110: loss 7.9081, time 118.84ms
iter 45120: loss 7.1416, time 119.73ms
iter 45130: loss 7.3953, time 120.83ms
iter 45140: loss 7.5842, time 121.19ms
iter 45150: loss 7.8272, time 121.63ms
iter 45160: loss 7.8903, time 121.14ms
iter 45170: loss 7.8209, time 123.28ms
iter 45180: loss 8.3822, time 121.12ms
iter 45190: loss 7.3654, time 121.31ms
tensor(0.0039)
iter 45200: loss 7.4382, time 122.07ms
iter 45210: loss 8.2254, time 119.66ms
iter 45220: loss 7.6318, time 119.20ms
iter 45230: loss 7.5437, time 121.52ms
iter 45240: loss 7.8063, time 121.15ms
step 45250: train loss 7.6620, val loss 7.6548
saving checkpoint to out-shakespeare-char
iter 45250: loss 7.4960, time 2848.12ms
iter 45260: loss 7.7673, time 118.24ms
iter 45270: loss 7.8251, time 116.16ms
iter 45280: loss 8.2002, time 121.25ms
iter 45290: loss 7.8534, time 121.56ms
tensor(0.0089)
iter 45300: loss 8.0323, time 122.35ms
iter 45310: loss 7.2855, time 122.79ms
iter 45320: loss 7.5648, time 121.21ms
iter 45330: loss 8.3522, time 121.05ms
iter 45340: loss 7.6934, time 121.00ms
iter 45350: loss 8.3069, time 121.10ms
iter 45360: loss 8.1247, time 121.17ms
iter 45370: loss 8.1931, time 119.53ms
iter 45380: loss 8.2850, time 117.64ms
iter 45390: loss 7.3780, time 120.88ms
tensor(0.0157)
iter 45400: loss 7.9852, time 121.20ms
iter 45410: loss 7.6663, time 121.85ms
iter 45420: loss 7.4368, time 121.77ms
iter 45430: loss 8.2321, time 120.12ms
iter 45440: loss 7.8844, time 122.52ms
iter 45450: loss 8.2397, time 122.59ms
iter 45460: loss 8.0781, time 121.45ms
iter 45470: loss 8.4375, time 121.21ms
iter 45480: loss 7.9182, time 121.40ms
iter 45490: loss 8.1790, time 118.24ms
tensor(0.0245)
step 45500: train loss 7.6425, val loss 7.6590
saving checkpoint to out-shakespeare-char
iter 45500: loss 7.0725, time 2826.41ms
iter 45510: loss 7.8089, time 120.59ms
iter 45520: loss 8.4721, time 120.96ms
iter 45530: loss 7.6850, time 120.97ms
iter 45540: loss 7.9033, time 119.07ms
iter 45550: loss 8.4816, time 121.98ms
iter 45560: loss 7.8075, time 123.00ms
iter 45570: loss 7.8415, time 123.38ms
iter 45580: loss 7.8873, time 121.74ms
iter 45590: loss 7.4917, time 121.19ms
tensor(0.0351)
iter 45600: loss 8.3515, time 119.19ms
iter 45610: loss 7.8801, time 121.22ms
iter 45620: loss 8.1028, time 118.92ms
iter 45630: loss 7.4490, time 121.88ms
iter 45640: loss 7.3253, time 121.42ms
iter 45650: loss 7.8850, time 121.39ms
iter 45660: loss 8.4488, time 123.55ms
iter 45670: loss 7.9110, time 119.64ms
iter 45680: loss 7.9153, time 120.42ms
iter 45690: loss 8.1372, time 121.39ms
tensor(0.0476)
iter 45700: loss 8.0657, time 120.06ms
iter 45710: loss 7.3915, time 121.16ms
iter 45720: loss 8.4893, time 121.61ms
iter 45730: loss 7.1410, time 120.73ms
iter 45740: loss 7.7314, time 122.90ms
step 45750: train loss 7.6691, val loss 7.6213
saving checkpoint to out-shakespeare-char
iter 45750: loss 7.9856, time 2837.92ms
iter 45760: loss 7.3978, time 121.81ms
iter 45770: loss 7.7128, time 120.14ms
iter 45780: loss 8.1379, time 121.76ms
iter 45790: loss 7.7310, time 120.51ms
tensor(0.0618)
iter 45800: loss 7.8549, time 123.23ms
iter 45810: loss 7.8358, time 121.65ms
iter 45820: loss 7.8613, time 121.71ms
iter 45830: loss 7.7578, time 119.90ms
iter 45840: loss 7.6420, time 120.81ms
iter 45850: loss 7.9460, time 123.19ms
iter 45860: loss 7.9020, time 122.76ms
iter 45870: loss 7.8927, time 121.97ms
iter 45880: loss 7.3661, time 122.48ms
iter 45890: loss 8.1425, time 121.81ms
tensor(0.0778)
iter 45900: loss 7.7183, time 119.83ms
iter 45910: loss 7.4388, time 120.27ms
iter 45920: loss 8.1099, time 121.51ms
iter 45930: loss 7.5757, time 118.76ms
iter 45940: loss 7.5531, time 123.07ms
iter 45950: loss 7.9730, time 121.62ms
iter 45960: loss 7.4996, time 121.55ms
iter 45970: loss 7.9494, time 122.06ms
iter 45980: loss 7.8388, time 119.79ms
iter 45990: loss 8.2996, time 120.76ms
tensor(0.0955)
step 46000: train loss 7.6602, val loss 7.7083
saving checkpoint to out-shakespeare-char
iter 46000: loss 8.1354, time 2850.01ms
iter 46010: loss 8.6653, time 121.16ms
iter 46020: loss 7.7768, time 120.99ms
iter 46030: loss 7.6152, time 121.13ms
iter 46040: loss 8.5702, time 121.01ms
iter 46050: loss 7.9921, time 118.70ms
iter 46060: loss 7.9902, time 119.90ms
iter 46070: loss 8.1005, time 121.19ms
iter 46080: loss 7.3185, time 121.29ms
iter 46090: loss 7.3671, time 122.49ms
tensor(0.1147)
iter 46100: loss 7.2203, time 122.37ms
iter 46110: loss 7.5268, time 121.26ms
iter 46120: loss 8.0000, time 118.92ms
iter 46130: loss 7.4535, time 120.99ms
iter 46140: loss 7.8049, time 121.18ms
iter 46150: loss 7.9829, time 120.18ms
iter 46160: loss 7.2588, time 119.69ms
iter 46170: loss 8.0342, time 120.68ms
iter 46180: loss 7.7878, time 118.93ms
iter 46190: loss 7.6729, time 121.02ms
tensor(0.1355)
iter 46200: loss 8.0259, time 122.26ms
iter 46210: loss 7.0880, time 122.94ms
iter 46220: loss 7.9273, time 123.31ms
iter 46230: loss 7.7093, time 121.17ms
iter 46240: loss 7.4516, time 117.64ms
step 46250: train loss 7.6940, val loss 7.6939
saving checkpoint to out-shakespeare-char
iter 46250: loss 7.5189, time 2848.27ms
iter 46260: loss 7.5367, time 121.15ms
iter 46270: loss 7.8622, time 121.12ms
iter 46280: loss 7.6179, time 119.70ms
iter 46290: loss 7.7266, time 120.74ms
tensor(0.1577)
iter 46300: loss 8.2051, time 123.66ms
iter 46310: loss 7.8121, time 121.07ms
iter 46320: loss 7.8469, time 120.87ms
iter 46330: loss 7.9099, time 121.20ms
iter 46340: loss 8.2045, time 118.94ms
iter 46350: loss 7.3146, time 119.06ms
iter 46360: loss 7.7644, time 119.38ms
iter 46370: loss 8.8676, time 120.02ms
iter 46380: loss 8.3147, time 121.35ms
iter 46390: loss 8.0456, time 121.85ms
tensor(0.1813)
iter 46400: loss 8.3095, time 121.72ms
iter 46410: loss 8.0755, time 123.13ms
iter 46420: loss 8.2202, time 121.14ms
iter 46430: loss 7.8157, time 123.46ms
iter 46440: loss 7.8611, time 123.00ms
iter 46450: loss 8.3494, time 121.17ms
iter 46460: loss 7.9149, time 121.15ms
iter 46470: loss 8.0415, time 119.03ms
iter 46480: loss 7.6247, time 119.05ms
iter 46490: loss 8.2154, time 119.08ms
tensor(0.2061)
step 46500: train loss 7.7649, val loss 7.7836
saving checkpoint to out-shakespeare-char
iter 46500: loss 7.4211, time 2850.19ms
iter 46510: loss 8.0251, time 123.26ms
iter 46520: loss 7.2766, time 123.01ms
iter 46530: loss 8.1864, time 121.18ms
iter 46540: loss 7.3141, time 120.83ms
iter 46550: loss 8.3209, time 122.42ms
iter 46560: loss 7.6818, time 122.55ms
iter 46570: loss 7.9776, time 122.93ms
iter 46580: loss 7.9956, time 119.13ms
iter 46590: loss 8.1685, time 120.58ms
tensor(0.2321)
iter 46600: loss 7.3787, time 120.86ms
iter 46610: loss 7.5854, time 122.52ms
iter 46620: loss 7.2026, time 122.66ms
iter 46630: loss 8.0187, time 119.62ms
iter 46640: loss 7.5017, time 122.46ms
iter 46650: loss 7.9273, time 122.60ms
iter 46660: loss 8.2102, time 122.30ms
iter 46670: loss 8.0894, time 122.45ms
iter 46680: loss 8.2218, time 120.30ms
iter 46690: loss 7.8884, time 122.60ms
tensor(0.2591)
iter 46700: loss 7.7684, time 121.85ms
iter 46710: loss 8.3303, time 122.62ms
iter 46720: loss 7.9177, time 120.83ms
iter 46730: loss 7.8733, time 120.81ms
iter 46740: loss 8.1662, time 122.75ms
step 46750: train loss 7.8080, val loss 7.7795
saving checkpoint to out-shakespeare-char
iter 46750: loss 7.9394, time 2842.61ms
iter 46760: loss 8.1484, time 119.15ms
iter 46770: loss 7.6145, time 120.04ms
iter 46780: loss 8.1109, time 118.97ms
iter 46790: loss 7.7238, time 118.56ms
tensor(0.2871)
iter 46800: loss 8.3194, time 116.26ms
iter 46810: loss 8.3576, time 119.62ms
iter 46820: loss 7.4505, time 119.41ms
iter 46830: loss 7.8669, time 119.64ms
iter 46840: loss 7.6357, time 120.48ms
iter 46850: loss 7.6089, time 121.23ms
iter 46860: loss 8.4735, time 120.22ms
iter 46870: loss 7.9836, time 122.96ms
iter 46880: loss 8.3370, time 122.92ms
iter 46890: loss 7.5785, time 122.93ms
tensor(0.3159)
iter 46900: loss 8.0442, time 121.84ms
iter 46910: loss 8.0493, time 121.16ms
iter 46920: loss 8.2264, time 121.11ms
iter 46930: loss 7.4059, time 118.90ms
iter 46940: loss 8.0311, time 118.70ms
iter 46950: loss 7.7548, time 118.30ms
iter 46960: loss 8.4123, time 119.32ms
iter 46970: loss 8.2221, time 120.17ms
iter 46980: loss 7.3527, time 121.17ms
iter 46990: loss 8.3458, time 120.99ms
tensor(0.3455)
step 47000: train loss 7.8597, val loss 7.8571
saving checkpoint to out-shakespeare-char
iter 47000: loss 8.0198, time 2851.63ms
iter 47010: loss 8.1019, time 121.15ms
iter 47020: loss 7.8038, time 119.24ms
iter 47030: loss 7.8196, time 119.57ms
iter 47040: loss 8.1892, time 119.02ms
iter 47050: loss 7.7372, time 119.11ms
iter 47060: loss 8.0830, time 118.81ms
iter 47070: loss 8.2551, time 118.86ms
iter 47080: loss 8.0292, time 119.76ms
iter 47090: loss 8.2941, time 120.13ms
tensor(0.3757)
iter 47100: loss 7.8377, time 120.16ms
iter 47110: loss 8.1192, time 120.83ms
iter 47120: loss 8.2846, time 122.51ms
iter 47130: loss 7.9673, time 122.80ms
iter 47140: loss 8.6872, time 123.10ms
iter 47150: loss 8.2133, time 121.36ms
iter 47160: loss 8.0503, time 121.33ms
iter 47170: loss 7.8762, time 119.43ms
iter 47180: loss 7.9920, time 119.18ms
iter 47190: loss 8.3349, time 119.12ms
tensor(0.4063)
iter 47200: loss 8.1197, time 118.57ms
iter 47210: loss 8.3673, time 119.19ms
iter 47220: loss 7.8868, time 119.87ms
iter 47230: loss 8.5519, time 120.38ms
iter 47240: loss 8.3352, time 120.46ms
step 47250: train loss 7.8723, val loss 7.8742
saving checkpoint to out-shakespeare-char
iter 47250: loss 8.2917, time 2849.12ms
iter 47260: loss 8.1973, time 121.12ms
iter 47270: loss 7.8185, time 121.63ms
iter 47280: loss 8.0595, time 121.26ms
iter 47290: loss 8.2296, time 119.01ms
tensor(0.4373)
iter 47300: loss 7.7556, time 119.38ms
iter 47310: loss 8.2316, time 118.03ms
iter 47320: loss 7.8891, time 119.87ms
iter 47330: loss 8.2973, time 120.52ms
iter 47340: loss 8.3315, time 120.94ms
iter 47350: loss 8.3617, time 121.57ms
iter 47360: loss 8.4665, time 121.47ms
iter 47370: loss 8.2036, time 121.31ms
iter 47380: loss 8.1688, time 123.08ms
iter 47390: loss 8.0609, time 122.93ms
tensor(0.4686)
iter 47400: loss 8.1221, time 121.50ms
iter 47410: loss 8.2078, time 121.33ms
iter 47420: loss 8.4505, time 121.29ms
iter 47430: loss 7.7506, time 119.19ms
iter 47440: loss 8.2508, time 119.31ms
iter 47450: loss 8.6269, time 118.98ms
iter 47460: loss 8.5006, time 119.35ms
iter 47470: loss 7.9350, time 120.65ms
iter 47480: loss 8.0521, time 121.07ms
iter 47490: loss 8.3187, time 122.55ms
tensor(0.5000)
step 47500: train loss 7.8742, val loss 7.8747
saving checkpoint to out-shakespeare-char
iter 47500: loss 8.2246, time 2853.70ms
iter 47510: loss 8.2354, time 119.32ms
iter 47520: loss 8.3017, time 119.06ms
iter 47530: loss 8.2709, time 118.97ms
iter 47540: loss 8.9633, time 119.72ms
iter 47550: loss 8.3927, time 120.53ms
iter 47560: loss 8.6600, time 120.95ms
iter 47570: loss 8.9565, time 121.91ms
iter 47580: loss 9.1666, time 122.17ms
iter 47590: loss 9.3979, time 122.58ms
tensor(0.5314)
iter 47600: loss 9.6180, time 123.68ms
iter 47610: loss 10.2605, time 120.22ms
iter 47620: loss 11.7952, time 121.17ms
iter 47630: loss 12.0580, time 121.17ms
iter 47640: loss 12.1485, time 117.89ms
iter 47650: loss 10.6304, time 119.19ms
iter 47660: loss 11.1628, time 119.10ms
iter 47670: loss 10.6710, time 118.96ms
iter 47680: loss 10.9903, time 120.57ms
iter 47690: loss 10.3489, time 121.74ms
tensor(0.5627)
iter 47700: loss 10.8993, time 122.94ms
iter 47710: loss 10.1909, time 122.97ms
iter 47720: loss 9.5700, time 121.75ms
iter 47730: loss 10.1531, time 122.94ms
iter 47740: loss 10.1130, time 121.66ms
step 47750: train loss 7.8923, val loss 7.8994
saving checkpoint to out-shakespeare-char
iter 47750: loss 9.6066, time 2833.32ms
iter 47760: loss 9.5664, time 119.32ms
iter 47770: loss 8.8862, time 120.20ms
iter 47780: loss 9.2585, time 119.02ms
iter 47790: loss 9.7117, time 119.69ms
tensor(0.5937)
iter 47800: loss 9.9175, time 119.64ms
iter 47810: loss 8.9304, time 121.15ms
iter 47820: loss 9.0639, time 122.09ms
iter 47830: loss 9.1040, time 121.22ms
iter 47840: loss 9.1512, time 122.82ms
iter 47850: loss 9.1561, time 122.99ms
iter 47860: loss 9.1730, time 123.30ms
iter 47870: loss 8.3477, time 121.18ms
iter 47880: loss 8.8524, time 121.25ms
iter 47890: loss 8.9683, time 117.35ms
tensor(0.6243)
iter 47900: loss 7.6412, time 119.39ms
iter 47910: loss 8.4266, time 118.99ms
iter 47920: loss 8.6258, time 119.24ms
iter 47930: loss 8.5989, time 119.68ms
iter 47940: loss 8.4722, time 120.82ms
iter 47950: loss 9.0439, time 121.24ms
iter 47960: loss 8.8366, time 122.17ms
iter 47970: loss 8.3251, time 123.27ms
iter 47980: loss 8.7219, time 123.08ms
iter 47990: loss 8.2796, time 122.99ms
tensor(0.6545)
step 48000: train loss 7.9147, val loss 7.8965
saving checkpoint to out-shakespeare-char
iter 48000: loss 7.9422, time 2812.02ms
iter 48010: loss 8.5953, time 122.33ms
iter 48020: loss 8.2823, time 121.29ms
iter 48030: loss 8.4711, time 119.35ms
iter 48040: loss 8.2713, time 119.10ms
iter 48050: loss 8.5169, time 118.41ms
iter 48060: loss 8.3331, time 119.33ms
iter 48070: loss 8.4359, time 120.10ms
iter 48080: loss 8.6226, time 121.34ms
iter 48090: loss 8.8715, time 121.10ms
tensor(0.6841)
iter 48100: loss 8.2147, time 122.52ms
iter 48110: loss 8.3976, time 122.95ms
iter 48120: loss 8.7305, time 120.95ms
iter 48130: loss 8.2901, time 123.39ms
iter 48140: loss 8.2158, time 122.81ms
iter 48150: loss 8.0272, time 121.94ms
iter 48160: loss 8.4577, time 121.80ms
iter 48170: loss 8.6155, time 118.22ms
iter 48180: loss 8.4356, time 117.87ms
iter 48190: loss 8.3742, time 119.03ms
tensor(0.7129)
iter 48200: loss 8.7790, time 119.17ms
iter 48210: loss 7.8563, time 118.85ms
iter 48220: loss 9.0838, time 118.82ms
iter 48230: loss 8.3219, time 118.85ms
iter 48240: loss 8.6494, time 120.48ms
step 48250: train loss 7.9596, val loss 7.9198
saving checkpoint to out-shakespeare-char
iter 48250: loss 8.2524, time 2807.87ms
iter 48260: loss 8.1655, time 121.18ms
iter 48270: loss 8.3713, time 121.09ms
iter 48280: loss 8.3351, time 121.20ms
iter 48290: loss 8.3159, time 122.40ms
tensor(0.7409)
iter 48300: loss 8.4053, time 123.22ms
iter 48310: loss 8.7612, time 122.85ms
iter 48320: loss 8.1569, time 122.83ms
iter 48330: loss 8.3741, time 120.76ms
iter 48340: loss 8.3725, time 121.20ms
iter 48350: loss 8.4905, time 120.77ms
iter 48360: loss 8.6790, time 121.00ms
iter 48370: loss 8.0482, time 118.93ms
iter 48380: loss 8.6761, time 119.26ms
iter 48390: loss 8.8046, time 118.24ms
tensor(0.7679)
iter 48400: loss 8.1885, time 119.32ms
iter 48410: loss 8.6780, time 120.02ms
iter 48420: loss 8.5008, time 120.51ms
iter 48430: loss 8.0207, time 121.08ms
iter 48440: loss 8.8956, time 119.81ms
iter 48450: loss 8.7862, time 122.39ms
iter 48460: loss 8.9342, time 122.83ms
iter 48470: loss 8.7211, time 122.88ms
iter 48480: loss 8.0258, time 122.62ms
iter 48490: loss 8.8167, time 120.86ms
tensor(0.7939)
step 48500: train loss 7.9918, val loss 7.9353
saving checkpoint to out-shakespeare-char
iter 48500: loss 8.2551, time 2844.41ms
iter 48510: loss 8.4473, time 119.12ms
iter 48520: loss 8.2070, time 118.77ms
iter 48530: loss 8.6123, time 118.92ms
iter 48540: loss 8.5293, time 118.89ms
iter 48550: loss 8.5880, time 117.82ms
iter 48560: loss 8.2134, time 120.16ms
iter 48570: loss 7.9949, time 120.42ms
iter 48580: loss 8.8687, time 121.18ms
iter 48590: loss 8.4681, time 122.35ms
tensor(0.8187)
iter 48600: loss 8.4657, time 121.35ms
iter 48610: loss 8.1013, time 122.94ms
iter 48620: loss 7.9852, time 122.76ms
iter 48630: loss 8.5885, time 122.23ms
iter 48640: loss 8.8360, time 122.02ms
iter 48650: loss 7.9984, time 121.15ms
iter 48660: loss 8.5252, time 120.83ms
iter 48670: loss 8.3483, time 118.77ms
iter 48680: loss 8.5345, time 118.83ms
iter 48690: loss 8.7285, time 119.27ms
tensor(0.8423)
iter 48700: loss 8.5687, time 119.60ms
iter 48710: loss 8.1938, time 119.79ms
iter 48720: loss 8.2618, time 120.98ms
iter 48730: loss 7.8124, time 121.01ms
iter 48740: loss 8.9425, time 122.59ms
step 48750: train loss 7.9785, val loss 7.9314
saving checkpoint to out-shakespeare-char
iter 48750: loss 8.3179, time 2851.82ms
iter 48760: loss 8.0936, time 120.81ms
iter 48770: loss 8.1929, time 119.83ms
iter 48780: loss 8.4248, time 118.68ms
iter 48790: loss 8.1178, time 118.67ms
tensor(0.8645)
iter 48800: loss 8.9078, time 118.66ms
iter 48810: loss 8.3075, time 118.82ms
iter 48820: loss 8.4011, time 118.57ms
iter 48830: loss 8.2761, time 118.78ms
iter 48840: loss 8.9852, time 118.99ms
iter 48850: loss 8.5439, time 118.64ms
iter 48860: loss 8.5917, time 118.94ms
iter 48870: loss 8.4792, time 119.90ms
iter 48880: loss 8.4589, time 120.35ms
iter 48890: loss 9.2731, time 120.70ms
tensor(0.8853)
iter 48900: loss 9.5668, time 121.38ms
iter 48910: loss 29.3841, time 121.49ms
iter 48920: loss 192.9563, time 121.17ms
iter 48930: loss 51.0672, time 122.58ms
iter 48940: loss 79.6079, time 120.89ms
iter 48950: loss 578.0381, time 122.54ms
iter 48960: loss 199.1768, time 122.62ms
iter 48970: loss 73.6966, time 122.74ms
iter 48980: loss 436.7000, time 122.50ms
iter 48990: loss 190.3366, time 120.69ms
tensor(0.9045)
step 49000: train loss 133.2785, val loss 134.5701
saving checkpoint to out-shakespeare-char
iter 49000: loss 126.1295, time 2800.57ms
iter 49010: loss 100.8996, time 123.40ms
iter 49020: loss 79.1249, time 121.60ms
iter 49030: loss 102.5519, time 122.01ms
iter 49040: loss 110.4942, time 120.74ms
iter 49050: loss 74.1047, time 120.81ms
iter 49060: loss 78.3075, time 118.83ms
iter 49070: loss 239.9333, time 118.73ms
iter 49080: loss 208.3298, time 117.47ms
iter 49090: loss 232.9935, time 118.65ms
tensor(0.9222)
iter 49100: loss 65.4622, time 119.19ms
iter 49110: loss 273.2239, time 119.08ms
iter 49120: loss 136.4304, time 119.26ms
iter 49130: loss 62.4373, time 119.87ms
iter 49140: loss 58.6570, time 120.94ms
iter 49150: loss 91.4888, time 120.84ms
iter 49160: loss 82.0052, time 121.84ms
iter 49170: loss 97.6736, time 121.87ms
iter 49180: loss 59.6150, time 122.97ms
iter 49190: loss 112.2180, time 122.73ms
tensor(0.9382)
iter 49200: loss 90.4265, time 121.06ms
iter 49210: loss 91.1488, time 122.80ms
iter 49220: loss 77.6843, time 122.77ms
iter 49230: loss 68.9697, time 120.92ms
iter 49240: loss 87.7330, time 121.23ms
step 49250: train loss 70.5984, val loss 71.1800
saving checkpoint to out-shakespeare-char
iter 49250: loss 78.6019, time 2850.95ms
iter 49260: loss 66.1210, time 120.34ms
iter 49270: loss 98.7106, time 121.26ms
iter 49280: loss 78.9242, time 120.94ms
iter 49290: loss 94.8332, time 121.92ms
tensor(0.9524)
iter 49300: loss 77.5966, time 123.18ms
iter 49310: loss 70.5783, time 120.88ms
iter 49320: loss 128.5125, time 122.78ms
iter 49330: loss 106.7818, time 123.04ms
iter 49340: loss 102.1588, time 122.69ms
iter 49350: loss 92.6534, time 121.06ms
iter 49360: loss 116.6928, time 118.81ms
iter 49370: loss 87.9716, time 120.85ms
iter 49380: loss 119.0335, time 118.69ms
iter 49390: loss 95.1459, time 118.80ms
tensor(0.9649)
iter 49400: loss 89.0150, time 118.96ms
iter 49410: loss 140.8222, time 119.93ms
iter 49420: loss 90.0201, time 118.78ms
iter 49430: loss 99.7429, time 120.93ms
iter 49440: loss 105.0898, time 120.66ms
iter 49450: loss 80.5269, time 121.38ms
iter 49460: loss 78.8935, time 122.39ms
iter 49470: loss 155.6308, time 120.98ms
iter 49480: loss 125.3627, time 123.22ms
iter 49490: loss 134.0289, time 121.15ms
tensor(0.9755)
step 49500: train loss 133.0559, val loss 134.2302
saving checkpoint to out-shakespeare-char
iter 49500: loss 156.6417, time 2858.04ms
iter 49510: loss 79.4536, time 120.37ms
iter 49520: loss 67.1374, time 120.92ms
iter 49530: loss 136.2262, time 119.65ms
iter 49540: loss 90.4868, time 122.71ms
iter 49550: loss 135.3381, time 123.18ms
iter 49560: loss 94.1721, time 120.08ms
iter 49570: loss 132.0600, time 120.97ms
iter 49580: loss 98.9740, time 120.64ms
iter 49590: loss 85.6654, time 121.09ms
tensor(0.9843)
iter 49600: loss 155.5259, time 119.35ms
iter 49610: loss 92.6962, time 119.71ms
iter 49620: loss 103.5215, time 120.95ms
iter 49630: loss 100.7791, time 121.03ms
iter 49640: loss 124.1250, time 124.43ms
iter 49650: loss 105.5562, time 123.07ms
iter 49660: loss 102.7188, time 119.30ms
iter 49670: loss 128.5872, time 119.50ms
iter 49680: loss 130.0011, time 121.21ms
iter 49690: loss 127.2552, time 123.57ms
tensor(0.9911)
iter 49700: loss 110.0100, time 122.33ms
iter 49710: loss 136.8004, time 123.52ms
iter 49720: loss 115.6526, time 121.37ms
iter 49730: loss 116.4322, time 121.07ms
iter 49740: loss 118.7146, time 119.75ms
step 49750: train loss 69.7481, val loss 70.4900
saving checkpoint to out-shakespeare-char
iter 49750: loss 119.5353, time 2853.92ms
iter 49760: loss 160.0593, time 114.91ms
iter 49770: loss 148.6222, time 114.55ms
iter 49780: loss 102.9154, time 118.24ms
iter 49790: loss 137.8906, time 115.82ms
tensor(0.9961)
iter 49800: loss 133.5963, time 119.82ms
iter 49810: loss 167.0536, time 120.59ms
iter 49820: loss 129.1566, time 115.65ms
iter 49830: loss 152.6456, time 117.33ms
iter 49840: loss 116.0556, time 114.98ms
iter 49850: loss 99.1225, time 115.72ms
iter 49860: loss 116.7405, time 116.73ms
iter 49870: loss 193.8862, time 118.11ms
iter 49880: loss 137.1882, time 115.52ms
iter 49890: loss 129.9837, time 116.70ms
tensor(0.9990)
iter 49900: loss 147.3739, time 117.07ms
iter 49910: loss 113.2828, time 115.95ms
iter 49920: loss 151.3076, time 116.88ms
iter 49930: loss 126.1162, time 118.01ms
iter 49940: loss 122.3681, time 116.54ms
iter 49950: loss 106.5353, time 116.74ms
iter 49960: loss 155.7460, time 115.56ms
iter 49970: loss 142.5432, time 116.02ms
iter 49980: loss 171.9521, time 116.97ms
iter 49990: loss 163.0310, time 117.89ms
tensor(1.)
step 50000: train loss 120.7244, val loss 122.3788
saving checkpoint to out-shakespeare-char
iter 50000: loss 157.9145, time 2837.78ms
iter 50010: loss 127.8932, time 114.31ms
iter 50020: loss 159.7389, time 116.35ms
iter 50030: loss 147.5867, time 116.99ms
iter 50040: loss 112.8256, time 116.40ms
iter 50050: loss 152.8925, time 119.15ms
iter 50060: loss 146.9781, time 116.69ms
iter 50070: loss 149.8501, time 113.82ms
iter 50080: loss 167.9048, time 116.82ms
iter 50090: loss 120.0934, time 116.72ms
tensor(0.9990)
iter 50100: loss 150.1549, time 115.42ms
iter 50110: loss 144.5574, time 117.23ms
iter 50120: loss 138.2027, time 117.55ms
iter 50130: loss 109.0568, time 115.40ms
iter 50140: loss 123.6695, time 114.81ms
iter 50150: loss 141.2626, time 117.75ms
iter 50160: loss 137.1100, time 115.59ms
iter 50170: loss 129.6305, time 116.80ms
iter 50180: loss 169.6806, time 118.25ms
iter 50190: loss 107.5125, time 116.02ms
tensor(0.9961)
iter 50200: loss 116.7651, time 114.97ms
iter 50210: loss 172.4825, time 117.89ms
iter 50220: loss 146.1493, time 115.86ms
iter 50230: loss 194.0341, time 116.88ms
iter 50240: loss 174.7671, time 118.49ms
step 50250: train loss 103.5240, val loss 104.3450
saving checkpoint to out-shakespeare-char
iter 50250: loss 140.0739, time 2859.67ms
iter 50260: loss 130.8523, time 120.24ms
iter 50270: loss 136.6282, time 118.21ms
iter 50280: loss 120.2551, time 120.52ms
iter 50290: loss 147.6758, time 118.53ms
tensor(0.9911)
iter 50300: loss 129.9876, time 119.56ms
iter 50310: loss 137.6762, time 121.31ms
iter 50320: loss 187.6917, time 120.24ms
iter 50330: loss 121.2923, time 120.13ms
iter 50340: loss 200.3362, time 123.72ms
iter 50350: loss 125.6720, time 122.98ms
iter 50360: loss 138.8513, time 120.71ms
iter 50370: loss 159.4176, time 120.67ms
iter 50380: loss 162.4676, time 121.13ms
iter 50390: loss 117.2816, time 121.33ms
tensor(0.9843)
iter 50400: loss 144.9427, time 121.27ms
iter 50410: loss 142.7915, time 119.21ms
iter 50420: loss 127.1834, time 118.54ms
iter 50430: loss 169.2205, time 119.12ms
iter 50440: loss 218.3472, time 120.69ms
iter 50450: loss 130.7133, time 120.98ms
iter 50460: loss 137.7253, time 119.98ms
iter 50470: loss 155.8721, time 121.54ms
iter 50480: loss 144.9050, time 121.69ms
iter 50490: loss 159.2807, time 122.27ms
tensor(0.9755)
step 50500: train loss 130.3160, val loss 129.5788
saving checkpoint to out-shakespeare-char
iter 50500: loss 161.8297, time 2842.01ms
iter 50510: loss 147.7438, time 120.75ms
iter 50520: loss 145.3908, time 118.75ms
iter 50530: loss 174.6402, time 118.55ms
iter 50540: loss 203.0571, time 118.97ms
iter 50550: loss 138.4548, time 119.73ms
iter 50560: loss 116.3118, time 120.06ms
iter 50570: loss 124.3040, time 120.51ms
iter 50580: loss 166.9499, time 120.86ms
iter 50590: loss 155.8600, time 120.77ms
tensor(0.9649)
iter 50600: loss 164.3734, time 121.91ms
iter 50610: loss 161.0406, time 122.48ms
iter 50620: loss 137.8392, time 120.82ms
iter 50630: loss 143.5181, time 120.93ms
iter 50640: loss 137.5251, time 120.82ms
iter 50650: loss 153.7170, time 121.10ms
iter 50660: loss 129.7377, time 120.75ms
iter 50670: loss 169.7345, time 118.88ms
iter 50680: loss 159.1316, time 119.24ms
iter 50690: loss 173.3509, time 119.20ms
tensor(0.9524)
iter 50700: loss 208.2936, time 119.85ms
iter 50710: loss 141.8567, time 120.34ms
iter 50720: loss 171.2553, time 120.77ms
iter 50730: loss 161.4837, time 118.77ms
iter 50740: loss 146.2510, time 121.19ms
step 50750: train loss 118.8227, val loss 117.3251
saving checkpoint to out-shakespeare-char
iter 50750: loss 164.0537, time 2858.14ms
iter 50760: loss 134.6699, time 121.18ms
iter 50770: loss 135.5987, time 118.58ms
iter 50780: loss 156.1710, time 119.28ms
iter 50790: loss 195.8914, time 118.62ms
tensor(0.9382)
iter 50800: loss 155.8649, time 120.89ms
iter 50810: loss 133.8242, time 120.81ms
iter 50820: loss 118.3244, time 120.77ms
iter 50830: loss 145.9130, time 121.19ms
iter 50840: loss 176.3389, time 119.53ms
iter 50850: loss 150.1003, time 122.20ms
iter 50860: loss 139.6940, time 122.82ms
iter 50870: loss 152.8585, time 123.08ms
iter 50880: loss 131.4043, time 120.70ms
iter 50890: loss 127.8024, time 120.70ms
tensor(0.9222)
iter 50900: loss 116.0764, time 121.55ms
iter 50910: loss 166.3273, time 120.77ms
iter 50920: loss 164.1407, time 118.55ms
iter 50930: loss 133.5499, time 118.61ms
iter 50940: loss 123.9742, time 118.81ms
iter 50950: loss 144.6904, time 119.69ms
iter 50960: loss 138.6757, time 120.20ms
iter 50970: loss 142.9517, time 121.09ms
iter 50980: loss 154.8082, time 120.87ms
iter 50990: loss 115.0750, time 120.60ms
tensor(0.9045)
step 51000: train loss 89.7235, val loss 89.4094
saving checkpoint to out-shakespeare-char
iter 51000: loss 129.2874, time 2847.72ms
iter 51010: loss 170.9863, time 120.87ms
iter 51020: loss 150.3237, time 120.69ms
iter 51030: loss 133.3159, time 120.65ms
iter 51040: loss 168.5296, time 120.71ms
iter 51050: loss 158.3441, time 120.72ms
iter 51060: loss 144.0867, time 119.62ms
iter 51070: loss 153.0027, time 118.59ms
iter 51080: loss 149.2257, time 118.95ms
iter 51090: loss 99.5244, time 119.38ms
tensor(0.8853)
iter 51100: loss 142.3942, time 120.57ms
iter 51110: loss 126.1128, time 120.73ms
iter 51120: loss 135.4630, time 120.65ms
iter 51130: loss 129.5753, time 120.77ms
iter 51140: loss 131.1709, time 120.08ms
iter 51150: loss 161.4975, time 121.67ms
iter 51160: loss 154.1744, time 122.05ms
iter 51170: loss 126.3356, time 123.03ms
iter 51180: loss 171.1050, time 120.81ms
iter 51190: loss 166.5761, time 120.86ms
tensor(0.8645)
iter 51200: loss 146.2914, time 120.97ms
iter 51210: loss 127.6821, time 120.61ms
iter 51220: loss 151.4096, time 120.76ms
iter 51230: loss 146.3923, time 118.49ms
iter 51240: loss 127.6000, time 120.46ms
step 51250: train loss 99.8210, val loss 101.7755
saving checkpoint to out-shakespeare-char
iter 51250: loss 120.7822, time 2844.69ms
iter 51260: loss 141.1617, time 120.86ms
iter 51270: loss 144.7183, time 120.72ms
iter 51280: loss 122.1378, time 121.27ms
iter 51290: loss 98.2650, time 120.48ms
tensor(0.8423)
iter 51300: loss 122.6078, time 122.31ms
iter 51310: loss 131.9151, time 121.91ms
iter 51320: loss 134.9318, time 122.26ms
iter 51330: loss 131.0284, time 122.81ms
iter 51340: loss 130.3769, time 120.59ms
iter 51350: loss 107.0385, time 121.04ms
iter 51360: loss 115.9886, time 120.50ms
iter 51370: loss 117.6460, time 120.41ms
iter 51380: loss 117.6404, time 120.47ms
iter 51390: loss 122.9784, time 118.79ms
tensor(0.8187)
iter 51400: loss 145.6618, time 121.04ms
iter 51410: loss 113.3365, time 120.42ms
iter 51420: loss 144.6202, time 120.94ms
iter 51430: loss 132.7756, time 120.39ms
iter 51440: loss 119.7779, time 118.59ms
iter 51450: loss 103.4892, time 120.57ms
iter 51460: loss 125.9531, time 118.27ms
iter 51470: loss 99.1737, time 118.46ms
iter 51480: loss 151.3388, time 118.64ms
iter 51490: loss 121.3169, time 120.52ms
tensor(0.7939)
step 51500: train loss 82.9322, val loss 81.1341
saving checkpoint to out-shakespeare-char
iter 51500: loss 141.1285, time 2854.39ms
iter 51510: loss 113.8410, time 121.96ms
iter 51520: loss 122.5765, time 121.32ms
iter 51530: loss 125.1941, time 121.51ms
iter 51540: loss 122.1538, time 121.85ms
iter 51550: loss 139.8697, time 120.43ms
iter 51560: loss 118.9397, time 120.49ms
iter 51570: loss 138.8682, time 121.24ms
iter 51580: loss 115.6003, time 121.99ms
iter 51590: loss 110.0373, time 122.84ms
tensor(0.7679)
iter 51600: loss 93.0170, time 120.97ms
iter 51610: loss 126.3114, time 121.77ms
iter 51620: loss 126.0479, time 120.65ms
iter 51630: loss 115.0476, time 120.91ms
iter 51640: loss 132.8642, time 121.08ms
iter 51650: loss 108.9432, time 118.39ms
iter 51660: loss 142.9386, time 120.58ms
iter 51670: loss 106.9538, time 120.49ms
iter 51680: loss 122.2019, time 119.87ms
iter 51690: loss 100.8939, time 118.34ms
tensor(0.7409)
iter 51700: loss 131.4651, time 119.69ms
iter 51710: loss 111.9258, time 118.70ms
iter 51720: loss 106.4741, time 119.68ms
iter 51730: loss 117.4392, time 119.99ms
iter 51740: loss 112.6230, time 120.31ms
step 51750: train loss 62.8550, val loss 62.9973
saving checkpoint to out-shakespeare-char
iter 51750: loss 105.1196, time 2852.37ms
iter 51760: loss 101.3736, time 121.28ms
iter 51770: loss 129.2192, time 119.81ms
iter 51780: loss 122.2104, time 120.60ms
iter 51790: loss 112.1673, time 121.06ms
tensor(0.7129)
iter 51800: loss 104.2380, time 121.88ms
iter 51810: loss 97.3267, time 119.40ms
iter 51820: loss 105.3281, time 119.32ms
iter 51830: loss 100.5243, time 119.81ms
iter 51840: loss 98.2960, time 119.65ms
iter 51850: loss 100.1728, time 121.13ms
iter 51860: loss 121.9892, time 120.93ms
iter 51870: loss 119.3695, time 119.07ms
iter 51880: loss 79.8239, time 122.84ms
iter 51890: loss 98.8638, time 123.30ms
tensor(0.6841)
iter 51900: loss 102.0712, time 121.36ms
iter 51910: loss 106.1947, time 121.51ms
iter 51920: loss 89.7683, time 119.24ms
iter 51930: loss 92.8949, time 120.77ms
iter 51940: loss 91.7645, time 120.87ms
iter 51950: loss 121.9930, time 121.32ms
iter 51960: loss 91.9409, time 121.17ms
iter 51970: loss 87.2668, time 121.85ms
iter 51980: loss 83.8686, time 123.25ms
iter 51990: loss 85.9425, time 121.34ms
tensor(0.6545)
step 52000: train loss 64.9558, val loss 63.5416
saving checkpoint to out-shakespeare-char
iter 52000: loss 104.0238, time 2856.77ms
iter 52010: loss 92.9453, time 121.17ms
iter 52020: loss 95.4032, time 121.95ms
iter 52030: loss 85.8620, time 122.27ms
iter 52040: loss 77.3316, time 122.59ms
iter 52050: loss 97.6197, time 121.37ms
iter 52060: loss 96.4086, time 119.15ms
iter 52070: loss 93.7189, time 121.23ms
iter 52080: loss 82.6622, time 119.61ms
iter 52090: loss 81.0816, time 120.74ms
tensor(0.6243)
iter 52100: loss 75.2536, time 121.31ms
iter 52110: loss 84.1372, time 121.55ms
iter 52120: loss 99.2552, time 120.28ms
iter 52130: loss 89.5486, time 123.41ms
iter 52140: loss 97.7293, time 120.98ms
iter 52150: loss 94.5062, time 121.07ms
iter 52160: loss 74.6769, time 121.26ms
iter 52170: loss 71.0558, time 120.96ms
iter 52180: loss 71.8247, time 120.11ms
iter 52190: loss 93.0496, time 120.53ms
tensor(0.5937)
iter 52200: loss 72.2166, time 121.49ms
iter 52210: loss 68.5146, time 121.88ms
iter 52220: loss 74.2356, time 122.02ms
iter 52230: loss 79.0860, time 123.27ms
iter 52240: loss 72.8558, time 121.18ms
step 52250: train loss 48.7279, val loss 47.8468
saving checkpoint to out-shakespeare-char
iter 52250: loss 69.2313, time 2846.43ms
iter 52260: loss 74.6700, time 119.97ms
iter 52270: loss 56.1214, time 121.38ms
iter 52280: loss 74.9756, time 121.22ms
iter 52290: loss 74.6961, time 121.72ms
tensor(0.5627)
iter 52300: loss 66.1399, time 122.87ms
iter 52310: loss 65.2016, time 121.53ms
iter 52320: loss 71.9357, time 120.80ms
iter 52330: loss 63.5148, time 120.83ms
iter 52340: loss 65.9162, time 120.99ms
iter 52350: loss 63.7355, time 121.60ms
iter 52360: loss 59.3886, time 119.69ms
iter 52370: loss 57.3485, time 119.39ms
iter 52380: loss 56.3322, time 120.95ms
iter 52390: loss 62.6561, time 121.50ms
tensor(0.5314)
iter 52400: loss 65.6688, time 121.04ms
iter 52410: loss 63.4014, time 122.96ms
iter 52420: loss 68.3497, time 121.12ms
iter 52430: loss 66.0353, time 121.03ms
iter 52440: loss 57.1593, time 120.64ms
iter 52450: loss 58.2641, time 120.85ms
iter 52460: loss 56.2378, time 121.25ms
iter 52470: loss 63.9826, time 121.08ms
iter 52480: loss 59.0449, time 119.09ms
iter 52490: loss 61.5312, time 120.41ms
tensor(0.5000)
step 52500: train loss 33.7628, val loss 33.6782
saving checkpoint to out-shakespeare-char
iter 52500: loss 53.3729, time 2857.07ms
iter 52510: loss 58.5656, time 120.65ms
iter 52520: loss 48.4594, time 123.46ms
iter 52530: loss 62.8942, time 120.81ms
iter 52540: loss 45.8904, time 120.72ms
iter 52550: loss 54.5865, time 120.36ms
iter 52560: loss 53.0041, time 120.43ms
iter 52570: loss 50.3785, time 120.17ms
iter 52580: loss 59.1265, time 118.31ms
iter 52590: loss 55.0061, time 118.22ms
tensor(0.4686)
iter 52600: loss 54.6726, time 118.96ms
iter 52610: loss 56.7015, time 119.24ms
iter 52620: loss 46.1355, time 120.28ms
iter 52630: loss 49.4246, time 120.19ms
iter 52640: loss 43.6916, time 120.60ms
iter 52650: loss 38.2238, time 121.81ms
iter 52660: loss 46.7070, time 121.17ms
iter 52670: loss 46.5361, time 123.11ms
iter 52680: loss 40.2874, time 123.38ms
iter 52690: loss 38.8746, time 123.45ms
tensor(0.4373)
iter 52700: loss 37.5731, time 121.36ms
iter 52710: loss 39.3746, time 119.20ms
iter 52720: loss 44.7724, time 121.21ms
iter 52730: loss 39.7532, time 120.34ms
iter 52740: loss 37.1608, time 121.72ms
step 52750: train loss 21.5532, val loss 21.4467
saving checkpoint to out-shakespeare-char
iter 52750: loss 38.7448, time 2883.03ms
iter 52760: loss 45.2745, time 118.99ms
iter 52770: loss 37.4685, time 120.57ms
iter 52780: loss 33.6415, time 119.90ms
iter 52790: loss 39.0136, time 120.60ms
tensor(0.4063)
iter 52800: loss 40.0818, time 122.23ms
iter 52810: loss 32.1981, time 122.42ms
iter 52820: loss 37.4274, time 121.84ms
iter 52830: loss 35.9105, time 120.53ms
iter 52840: loss 30.1627, time 122.17ms
iter 52850: loss 30.5242, time 122.65ms
iter 52860: loss 31.7408, time 122.83ms
iter 52870: loss 28.8942, time 122.35ms
iter 52880: loss 31.3361, time 121.28ms
iter 52890: loss 33.4179, time 119.59ms
tensor(0.3757)
iter 52900: loss 33.2316, time 119.22ms
iter 52910: loss 32.3136, time 117.70ms
iter 52920: loss 28.4808, time 118.86ms
iter 52930: loss 26.9932, time 118.69ms
iter 52940: loss 26.7278, time 118.77ms
iter 52950: loss 28.0552, time 119.02ms
iter 52960: loss 26.5525, time 119.08ms
iter 52970: loss 27.3976, time 125.30ms
iter 52980: loss 26.0485, time 121.98ms
iter 52990: loss 21.8646, time 121.90ms
tensor(0.3455)
step 53000: train loss 10.3560, val loss 10.3278
saving checkpoint to out-shakespeare-char
iter 53000: loss 20.2442, time 2840.41ms
iter 53010: loss 23.9583, time 119.23ms
iter 53020: loss 25.7485, time 118.95ms
iter 53030: loss 21.9347, time 119.05ms
iter 53040: loss 20.6716, time 119.29ms
iter 53050: loss 18.3016, time 118.69ms
iter 53060: loss 17.5347, time 118.25ms
iter 53070: loss 16.6500, time 119.56ms
iter 53080: loss 16.8384, time 120.39ms
iter 53090: loss 16.1725, time 120.88ms
tensor(0.3159)
iter 53100: loss 16.0095, time 122.03ms
iter 53110: loss 15.1598, time 120.04ms
iter 53120: loss 15.3584, time 121.31ms
iter 53130: loss 15.1205, time 122.70ms
iter 53140: loss 14.5753, time 116.81ms
iter 53150: loss 14.0865, time 120.71ms
iter 53160: loss 14.3771, time 120.93ms
iter 53170: loss 13.9664, time 121.20ms
iter 53180: loss 13.1212, time 122.23ms
iter 53190: loss 13.7650, time 122.99ms
tensor(0.2871)
iter 53200: loss 13.3531, time 122.71ms
iter 53210: loss 13.7735, time 122.88ms
iter 53220: loss 12.7871, time 120.96ms
iter 53230: loss 12.5709, time 121.00ms
iter 53240: loss 12.6067, time 119.18ms
step 53250: train loss 7.9820, val loss 7.9497
saving checkpoint to out-shakespeare-char
iter 53250: loss 12.4769, time 2859.64ms
iter 53260: loss 12.6058, time 123.12ms
iter 53270: loss 11.3852, time 123.34ms
iter 53280: loss 12.8060, time 121.06ms
iter 53290: loss 11.0475, time 121.85ms
tensor(0.2591)
iter 53300: loss 11.2269, time 120.00ms
iter 53310: loss 10.9902, time 120.01ms
iter 53320: loss 11.8104, time 122.69ms
iter 53330: loss 11.5164, time 122.78ms
iter 53340: loss 11.9871, time 122.37ms
iter 53350: loss 11.2603, time 119.29ms
iter 53360: loss 11.4595, time 119.32ms
iter 53370: loss 11.4302, time 118.52ms
iter 53380: loss 11.0540, time 119.29ms
iter 53390: loss 10.7256, time 119.58ms
tensor(0.2321)
iter 53400: loss 11.5161, time 121.23ms
iter 53410: loss 11.3399, time 121.42ms
iter 53420: loss 11.1567, time 121.52ms
iter 53430: loss 11.4299, time 123.25ms
iter 53440: loss 10.7613, time 123.18ms
iter 53450: loss 10.5872, time 121.26ms
iter 53460: loss 10.7985, time 121.11ms
iter 53470: loss 11.1835, time 119.01ms
iter 53480: loss 10.0975, time 118.91ms
iter 53490: loss 10.3967, time 119.02ms
tensor(0.2061)
step 53500: train loss 7.8197, val loss 7.8815
saving checkpoint to out-shakespeare-char
iter 53500: loss 10.2495, time 2852.38ms
iter 53510: loss 10.1567, time 123.30ms
iter 53520: loss 9.6381, time 123.16ms
iter 53530: loss 9.8194, time 119.98ms
iter 53540: loss 10.3835, time 121.00ms
iter 53550: loss 10.6329, time 121.12ms
iter 53560: loss 10.0611, time 119.15ms
iter 53570: loss 9.3689, time 118.88ms
iter 53580: loss 10.0494, time 118.08ms
iter 53590: loss 9.5986, time 119.31ms
tensor(0.1813)
iter 53600: loss 9.3986, time 121.04ms
iter 53610: loss 9.8997, time 121.58ms
iter 53620: loss 10.3147, time 122.16ms
iter 53630: loss 9.7768, time 122.97ms
iter 53640: loss 9.7005, time 121.16ms
iter 53650: loss 9.5902, time 123.13ms
iter 53660: loss 9.5141, time 121.07ms
iter 53670: loss 9.5520, time 121.11ms
iter 53680: loss 10.2407, time 119.35ms
iter 53690: loss 9.6307, time 119.91ms
tensor(0.1577)
iter 53700: loss 9.9475, time 119.00ms
iter 53710: loss 9.4164, time 120.07ms
iter 53720: loss 9.6333, time 121.09ms
iter 53730: loss 9.0407, time 120.97ms
iter 53740: loss 9.4654, time 120.90ms
step 53750: train loss 7.8188, val loss 7.8510
saving checkpoint to out-shakespeare-char
iter 53750: loss 9.2781, time 2862.84ms
iter 53760: loss 10.3672, time 118.26ms
iter 53770: loss 9.8773, time 118.99ms
iter 53780: loss 10.2325, time 119.10ms
iter 53790: loss 9.1081, time 119.28ms
tensor(0.1355)
iter 53800: loss 9.7160, time 120.58ms
iter 53810: loss 9.5276, time 121.72ms
iter 53820: loss 9.5889, time 121.87ms
iter 53830: loss 9.2079, time 121.18ms
iter 53840: loss 8.9708, time 122.78ms
iter 53850: loss 8.9401, time 123.22ms
iter 53860: loss 8.7367, time 123.17ms
iter 53870: loss 9.0656, time 122.41ms
iter 53880: loss 9.4923, time 120.18ms
iter 53890: loss 8.9881, time 121.76ms
tensor(0.1147)
iter 53900: loss 8.8554, time 119.30ms
iter 53910: loss 8.9865, time 119.25ms
iter 53920: loss 9.3470, time 118.93ms
iter 53930: loss 8.9173, time 119.04ms
iter 53940: loss 8.4108, time 119.14ms
iter 53950: loss 8.8563, time 120.03ms
iter 53960: loss 9.0294, time 120.94ms
iter 53970: loss 8.8378, time 121.47ms
iter 53980: loss 9.2388, time 121.69ms
iter 53990: loss 9.6444, time 121.28ms
tensor(0.0955)
step 54000: train loss 7.7905, val loss 7.8250
saving checkpoint to out-shakespeare-char
iter 54000: loss 8.7323, time 2844.88ms
iter 54010: loss 8.7621, time 120.85ms
iter 54020: loss 8.8168, time 120.32ms
iter 54030: loss 8.8302, time 123.61ms
iter 54040: loss 8.9900, time 121.30ms
iter 54050: loss 9.1919, time 120.85ms
iter 54060: loss 9.8858, time 120.08ms
iter 54070: loss 8.0994, time 119.04ms
iter 54080: loss 9.2152, time 118.87ms
iter 54090: loss 8.6810, time 118.59ms
tensor(0.0778)
iter 54100: loss 8.7844, time 118.59ms
iter 54110: loss 8.6191, time 119.25ms
iter 54120: loss 8.9888, time 121.03ms
iter 54130: loss 8.8866, time 121.45ms
iter 54140: loss 8.7495, time 121.84ms
iter 54150: loss 9.5604, time 120.38ms
iter 54160: loss 9.0459, time 122.35ms
iter 54170: loss 9.0875, time 123.11ms
iter 54180: loss 9.1511, time 122.32ms
iter 54190: loss 8.2563, time 121.01ms
tensor(0.0618)
iter 54200: loss 9.0104, time 121.41ms
iter 54210: loss 8.2658, time 119.37ms
iter 54220: loss 9.4194, time 119.04ms
iter 54230: loss 9.0603, time 118.95ms
iter 54240: loss 8.8652, time 120.36ms
step 54250: train loss 7.7270, val loss 7.7639
saving checkpoint to out-shakespeare-char
iter 54250: loss 8.6721, time 2869.62ms
iter 54260: loss 8.5908, time 119.22ms
iter 54270: loss 8.6786, time 119.11ms
iter 54280: loss 8.7903, time 119.86ms
iter 54290: loss 8.3074, time 120.75ms
tensor(0.0476)
iter 54300: loss 8.9598, time 121.42ms
iter 54310: loss 9.0967, time 123.40ms
iter 54320: loss 9.3106, time 123.47ms
iter 54330: loss 8.9968, time 121.92ms
iter 54340: loss 8.3107, time 119.27ms
iter 54350: loss 8.9870, time 118.80ms
iter 54360: loss 8.8263, time 119.85ms
iter 54370: loss 8.1331, time 118.89ms
iter 54380: loss 8.7091, time 120.83ms
iter 54390: loss 9.0154, time 121.97ms
tensor(0.0351)
iter 54400: loss 8.7845, time 121.73ms
iter 54410: loss 8.2037, time 122.99ms
iter 54420: loss 8.9022, time 123.25ms
iter 54430: loss 8.5428, time 121.20ms
iter 54440: loss 8.5552, time 121.07ms
iter 54450: loss 8.7989, time 119.35ms
iter 54460: loss 8.2392, time 119.26ms
iter 54470: loss 8.7123, time 119.21ms
iter 54480: loss 8.8109, time 119.72ms
iter 54490: loss 8.4842, time 120.84ms
tensor(0.0245)
step 54500: train loss 7.7187, val loss 7.6870
saving checkpoint to out-shakespeare-char
iter 54500: loss 8.9378, time 2856.85ms
iter 54510: loss 8.7643, time 121.19ms
iter 54520: loss 8.7859, time 119.26ms
iter 54530: loss 8.5906, time 119.68ms
iter 54540: loss 8.4508, time 119.02ms
iter 54550: loss 8.8375, time 118.67ms
iter 54560: loss 8.8054, time 120.17ms
iter 54570: loss 8.6031, time 121.85ms
iter 54580: loss 8.9725, time 122.33ms
iter 54590: loss 8.8722, time 121.08ms
tensor(0.0157)
iter 54600: loss 8.7067, time 123.33ms
iter 54610: loss 9.2707, time 123.21ms
iter 54620: loss 8.9153, time 121.75ms
iter 54630: loss 8.7954, time 121.46ms
iter 54640: loss 8.1095, time 118.87ms
iter 54650: loss 8.5072, time 119.14ms
iter 54660: loss 8.5347, time 119.20ms
iter 54670: loss 8.7321, time 118.51ms
iter 54680: loss 9.1385, time 119.77ms
iter 54690: loss 8.3071, time 121.85ms
tensor(0.0089)
iter 54700: loss 8.3362, time 119.09ms
iter 54710: loss 8.3428, time 123.14ms
iter 54720: loss 8.9437, time 122.75ms
iter 54730: loss 8.7773, time 122.91ms
iter 54740: loss 8.9727, time 122.27ms
step 54750: train loss 7.6649, val loss 7.6682
saving checkpoint to out-shakespeare-char
iter 54750: loss 8.6813, time 2857.70ms
iter 54760: loss 8.5820, time 119.64ms
iter 54770: loss 8.3684, time 122.51ms
iter 54780: loss 8.8478, time 121.96ms
iter 54790: loss 8.3621, time 123.15ms
tensor(0.0039)
iter 54800: loss 8.3599, time 123.16ms
iter 54810: loss 7.9902, time 121.60ms
iter 54820: loss 8.2308, time 121.19ms
iter 54830: loss 8.3335, time 119.06ms
iter 54840: loss 8.7996, time 118.21ms
iter 54850: loss 8.3731, time 119.32ms
iter 54860: loss 8.8476, time 120.08ms
iter 54870: loss 7.9754, time 121.44ms
iter 54880: loss 8.7032, time 121.46ms
iter 54890: loss 8.9823, time 120.70ms
tensor(0.0010)
iter 54900: loss 8.3635, time 122.16ms
iter 54910: loss 8.1832, time 122.07ms
iter 54920: loss 8.3906, time 122.66ms
iter 54930: loss 9.0135, time 122.59ms
iter 54940: loss 9.1240, time 119.77ms
iter 54950: loss 8.7265, time 122.59ms
iter 54960: loss 9.0105, time 122.59ms
iter 54970: loss 8.5365, time 122.52ms
iter 54980: loss 8.9610, time 122.31ms
iter 54990: loss 8.6612, time 118.67ms
tensor(0.0010)
step 55000: train loss 7.6576, val loss 7.6542
saving checkpoint to out-shakespeare-char
iter 55000: loss 8.3508, time 2860.93ms
iter 55010: loss 8.1513, time 119.59ms
iter 55020: loss 8.6311, time 119.50ms
iter 55030: loss 8.2643, time 120.25ms
iter 55040: loss 9.5156, time 121.41ms
iter 55050: loss 8.7685, time 121.21ms
iter 55060: loss 8.4339, time 123.13ms
iter 55070: loss 8.3898, time 123.07ms
iter 55080: loss 8.8142, time 123.21ms
iter 55090: loss 8.6784, time 121.34ms
tensor(0.0010)
iter 55100: loss 8.7865, time 119.33ms
iter 55110: loss 8.7855, time 118.93ms
iter 55120: loss 8.3967, time 119.02ms
iter 55130: loss 8.3228, time 119.65ms
iter 55140: loss 8.7538, time 120.35ms
iter 55150: loss 8.1995, time 121.25ms
iter 55160: loss 8.8266, time 120.97ms
iter 55170: loss 8.0779, time 122.81ms
iter 55180: loss 8.5449, time 123.17ms
iter 55190: loss 8.3828, time 123.39ms
tensor(0.0039)
iter 55200: loss 8.8325, time 123.36ms
iter 55210: loss 7.8153, time 121.02ms
iter 55220: loss 8.7945, time 120.35ms
iter 55230: loss 8.7592, time 118.91ms
iter 55240: loss 8.7407, time 118.89ms
step 55250: train loss 7.6516, val loss 7.6819
saving checkpoint to out-shakespeare-char
iter 55250: loss 8.9101, time 2857.25ms
iter 55260: loss 8.7369, time 122.87ms
iter 55270: loss 9.0692, time 120.21ms
iter 55280: loss 8.8537, time 120.95ms
iter 55290: loss 8.5231, time 120.97ms
tensor(0.0089)
iter 55300: loss 8.7057, time 119.77ms
iter 55310: loss 8.7720, time 119.17ms
iter 55320: loss 8.5849, time 120.09ms
iter 55330: loss 8.4093, time 119.74ms
iter 55340: loss 8.2945, time 120.08ms
iter 55350: loss 9.0135, time 120.51ms
iter 55360: loss 9.2173, time 120.81ms
iter 55370: loss 9.4308, time 120.41ms
iter 55380: loss 8.5270, time 121.24ms
iter 55390: loss 8.1211, time 121.11ms
tensor(0.0157)
iter 55400: loss 8.8795, time 121.31ms
iter 55410: loss 8.5491, time 122.14ms
iter 55420: loss 8.6042, time 122.70ms
iter 55430: loss 7.8963, time 122.65ms
iter 55440: loss 9.2878, time 122.71ms
iter 55450: loss 8.6960, time 120.54ms
iter 55460: loss 8.6841, time 121.80ms
iter 55470: loss 8.7386, time 122.80ms
iter 55480: loss 8.7830, time 121.05ms
iter 55490: loss 8.4496, time 120.72ms
tensor(0.0245)
step 55500: train loss 7.6797, val loss 7.6939
saving checkpoint to out-shakespeare-char
iter 55500: loss 8.3966, time 2864.36ms
iter 55510: loss 8.6338, time 121.38ms
iter 55520: loss 8.7445, time 123.23ms
iter 55530: loss 8.8240, time 123.19ms
iter 55540: loss 8.9173, time 123.19ms
iter 55550: loss 8.3382, time 121.37ms
iter 55560: loss 8.5224, time 119.19ms
iter 55570: loss 8.8987, time 119.14ms
iter 55580: loss 8.4104, time 119.42ms
iter 55590: loss 8.8792, time 120.19ms
tensor(0.0351)
iter 55600: loss 8.0671, time 121.47ms
iter 55610: loss 8.5998, time 122.11ms
iter 55620: loss 8.5898, time 121.46ms
iter 55630: loss 8.3269, time 123.31ms
iter 55640: loss 8.8725, time 123.24ms
iter 55650: loss 8.2091, time 121.50ms
iter 55660: loss 8.5265, time 119.51ms
iter 55670: loss 8.5084, time 119.55ms
iter 55680: loss 8.3497, time 119.32ms
iter 55690: loss 8.3982, time 119.69ms
tensor(0.0476)
iter 55700: loss 8.6896, time 120.87ms
iter 55710: loss 8.4743, time 121.88ms
iter 55720: loss 9.2616, time 122.55ms
iter 55730: loss 8.1910, time 123.31ms
iter 55740: loss 8.8282, time 123.12ms
step 55750: train loss 7.6910, val loss 7.7079
saving checkpoint to out-shakespeare-char
iter 55750: loss 8.4651, time 2861.91ms
iter 55760: loss 8.3662, time 121.31ms
iter 55770: loss 8.7323, time 123.42ms
iter 55780: loss 8.7281, time 120.74ms
iter 55790: loss 8.2276, time 121.53ms
tensor(0.0618)
iter 55800: loss 8.6399, time 120.97ms
iter 55810: loss 8.4935, time 119.14ms
iter 55820: loss 8.3359, time 119.07ms
iter 55830: loss 8.6219, time 121.34ms
iter 55840: loss 8.4684, time 121.05ms
iter 55850: loss 8.7349, time 122.48ms
iter 55860: loss 8.5322, time 123.11ms
iter 55870: loss 8.7926, time 121.23ms
iter 55880: loss 8.4641, time 120.97ms
iter 55890: loss 8.3726, time 120.40ms
tensor(0.0778)
iter 55900: loss 8.3966, time 119.45ms
iter 55910: loss 8.4945, time 119.42ms
iter 55920: loss 8.4966, time 119.91ms
iter 55930: loss 8.6193, time 121.31ms
iter 55940: loss 7.7788, time 120.37ms
iter 55950: loss 8.1948, time 121.05ms
iter 55960: loss 8.7063, time 122.97ms
iter 55970: loss 8.7642, time 123.28ms
iter 55980: loss 8.4439, time 121.14ms
iter 55990: loss 8.9277, time 121.29ms
tensor(0.0955)
step 56000: train loss 7.7847, val loss 7.7827
saving checkpoint to out-shakespeare-char
iter 56000: loss 8.4202, time 2852.33ms
iter 56010: loss 8.3669, time 121.54ms
iter 56020: loss 8.4439, time 123.11ms
iter 56030: loss 8.7220, time 121.68ms
iter 56040: loss 8.6933, time 122.05ms
iter 56050: loss 8.5037, time 120.66ms
iter 56060: loss 8.4825, time 118.85ms
iter 56070: loss 8.9182, time 121.06ms
iter 56080: loss 8.5023, time 120.66ms
iter 56090: loss 7.9190, time 120.73ms
tensor(0.1147)
iter 56100: loss 8.5065, time 120.93ms
iter 56110: loss 8.1871, time 118.44ms
iter 56120: loss 8.3000, time 119.00ms
iter 56130: loss 8.2455, time 119.77ms
iter 56140: loss 8.5611, time 120.70ms
iter 56150: loss 8.4513, time 121.49ms
iter 56160: loss 8.2363, time 120.67ms
iter 56170: loss 8.0197, time 119.11ms
iter 56180: loss 8.5966, time 121.82ms
iter 56190: loss 8.3617, time 123.01ms
tensor(0.1355)
iter 56200: loss 8.4184, time 123.10ms
iter 56210: loss 7.8059, time 121.83ms
iter 56220: loss 8.4473, time 120.93ms
iter 56230: loss 8.9090, time 120.57ms
iter 56240: loss 8.3648, time 120.50ms
step 56250: train loss 7.8386, val loss 7.7959
saving checkpoint to out-shakespeare-char
iter 56250: loss 8.8229, time 2847.57ms
iter 56260: loss 8.7615, time 121.16ms
iter 56270: loss 7.8265, time 120.82ms
iter 56280: loss 7.9743, time 118.85ms
iter 56290: loss 8.1303, time 121.36ms
tensor(0.1577)
iter 56300: loss 8.6214, time 122.46ms
iter 56310: loss 8.4640, time 123.02ms
iter 56320: loss 8.4438, time 122.83ms
iter 56330: loss 8.8991, time 120.89ms
iter 56340: loss 8.3935, time 120.53ms
iter 56350: loss 8.2540, time 120.65ms
iter 56360: loss 8.4389, time 120.53ms
iter 56370: loss 8.1788, time 120.75ms
iter 56380: loss 8.7768, time 120.76ms
iter 56390: loss 7.7187, time 120.28ms
tensor(0.1813)
iter 56400: loss 7.6134, time 119.07ms
iter 56410: loss 8.4318, time 118.57ms
iter 56420: loss 8.8672, time 118.60ms
iter 56430: loss 8.3733, time 119.09ms
iter 56440: loss 8.7133, time 119.56ms
iter 56450: loss 8.0193, time 119.81ms
iter 56460: loss 7.9150, time 119.79ms
iter 56470: loss 8.5164, time 120.56ms
iter 56480: loss 8.2194, time 120.80ms
iter 56490: loss 8.4751, time 120.64ms
tensor(0.2061)
step 56500: train loss 7.8426, val loss 7.7990
saving checkpoint to out-shakespeare-char
iter 56500: loss 8.3357, time 2853.75ms
iter 56510: loss 8.6662, time 120.62ms
iter 56520: loss 7.6446, time 120.83ms
iter 56530: loss 8.4894, time 119.27ms
iter 56540: loss 7.9498, time 118.84ms
iter 56550: loss 7.8614, time 118.89ms
iter 56560: loss 7.8701, time 120.78ms
iter 56570: loss 8.2160, time 121.74ms
iter 56580: loss 8.3240, time 121.72ms
iter 56590: loss 8.8395, time 122.49ms
tensor(0.2321)
iter 56600: loss 8.4217, time 123.91ms
iter 56610: loss 8.5532, time 121.33ms
iter 56620: loss 8.5334, time 119.19ms
iter 56630: loss 8.0819, time 121.14ms
iter 56640: loss 7.8685, time 119.04ms
iter 56650: loss 8.8693, time 120.50ms
iter 56660: loss 8.1788, time 120.96ms
iter 56670: loss 7.6736, time 121.11ms
iter 56680: loss 8.3387, time 119.84ms
iter 56690: loss 8.0710, time 123.11ms
tensor(0.2591)
iter 56700: loss 8.5408, time 122.70ms
iter 56710: loss 8.1644, time 121.60ms
iter 56720: loss 8.4194, time 121.49ms
iter 56730: loss 8.5329, time 119.28ms
iter 56740: loss 8.1214, time 119.47ms
step 56750: train loss 7.8553, val loss 7.8481
saving checkpoint to out-shakespeare-char
iter 56750: loss 8.1716, time 2838.37ms
iter 56760: loss 8.3535, time 123.11ms
iter 56770: loss 7.9555, time 123.50ms
iter 56780: loss 7.7842, time 121.88ms
iter 56790: loss 7.6055, time 120.73ms
tensor(0.2871)
iter 56800: loss 8.5597, time 121.32ms
iter 56810: loss 8.5122, time 118.95ms
iter 56820: loss 8.3057, time 120.15ms
iter 56830: loss 8.3888, time 121.15ms
iter 56840: loss 8.2596, time 121.12ms
iter 56850: loss 8.0615, time 121.93ms
iter 56860: loss 7.8953, time 122.19ms
iter 56870: loss 8.3490, time 121.33ms
iter 56880: loss 7.7520, time 120.42ms
iter 56890: loss 8.2482, time 121.10ms
tensor(0.3159)
iter 56900: loss 8.0762, time 122.05ms
iter 56910: loss 8.6676, time 119.20ms
iter 56920: loss 8.5624, time 120.15ms
iter 56930: loss 8.3521, time 119.09ms
iter 56940: loss 8.1177, time 121.23ms
iter 56950: loss 7.8229, time 121.60ms
iter 56960: loss 8.2211, time 122.61ms
iter 56970: loss 8.4542, time 122.84ms
iter 56980: loss 8.2211, time 121.31ms
iter 56990: loss 7.8885, time 121.48ms
tensor(0.3455)
step 57000: train loss 7.8614, val loss 7.9125
saving checkpoint to out-shakespeare-char
iter 57000: loss 8.5013, time 2844.59ms
iter 57010: loss 8.0280, time 118.86ms
iter 57020: loss 8.6690, time 119.16ms
iter 57030: loss 8.4091, time 119.21ms
iter 57040: loss 8.1578, time 117.90ms
iter 57050: loss 8.0354, time 119.96ms
iter 57060: loss 8.7490, time 121.70ms
iter 57070: loss 8.5271, time 121.11ms
iter 57080: loss 8.3787, time 121.38ms
iter 57090: loss 7.9616, time 120.38ms
tensor(0.3757)
iter 57100: loss 8.2271, time 123.77ms
iter 57110: loss 8.0871, time 121.03ms
iter 57120: loss 8.5650, time 121.15ms
iter 57130: loss 8.2798, time 121.19ms
iter 57140: loss 7.9512, time 119.16ms
iter 57150: loss 8.2832, time 121.08ms
iter 57160: loss 8.3380, time 121.10ms
iter 57170: loss 8.5016, time 121.32ms
iter 57180: loss 8.2995, time 121.74ms
iter 57190: loss 8.3225, time 123.40ms
tensor(0.4063)
iter 57200: loss 7.8169, time 123.60ms
iter 57210: loss 8.1515, time 121.03ms
iter 57220: loss 8.6710, time 118.00ms
iter 57230: loss 8.4474, time 121.20ms
iter 57240: loss 7.9815, time 118.94ms
step 57250: train loss 7.9243, val loss 7.9078
saving checkpoint to out-shakespeare-char
iter 57250: loss 8.5554, time 2844.68ms
iter 57260: loss 7.8502, time 125.23ms
iter 57270: loss 8.7468, time 121.26ms
iter 57280: loss 8.3689, time 117.87ms
iter 57290: loss 8.0761, time 121.19ms
tensor(0.4373)
iter 57300: loss 8.4458, time 119.43ms
iter 57310: loss 7.7086, time 119.59ms
iter 57320: loss 8.2478, time 120.87ms
iter 57330: loss 8.2509, time 121.36ms
iter 57340: loss 8.3589, time 119.19ms
iter 57350: loss 7.6274, time 122.21ms
iter 57360: loss 8.3450, time 123.29ms
iter 57370: loss 7.4976, time 121.14ms
iter 57380: loss 8.3268, time 121.11ms
iter 57390: loss 7.6414, time 121.00ms
tensor(0.4686)
iter 57400: loss 8.8740, time 119.27ms
iter 57410: loss 7.7629, time 119.13ms
iter 57420: loss 8.0997, time 120.15ms
iter 57430: loss 8.4946, time 121.24ms
iter 57440: loss 8.1730, time 121.55ms
iter 57450: loss 8.0463, time 122.03ms
iter 57460: loss 8.3198, time 122.88ms
iter 57470: loss 8.2543, time 121.89ms
iter 57480: loss 8.6415, time 120.80ms
iter 57490: loss 8.4457, time 121.46ms
tensor(0.5000)
step 57500: train loss 7.9633, val loss 7.9327
saving checkpoint to out-shakespeare-char
iter 57500: loss 8.5349, time 2819.35ms
iter 57510: loss 7.9275, time 121.11ms
iter 57520: loss 8.2739, time 119.25ms
iter 57530: loss 8.1189, time 119.75ms
iter 57540: loss 8.4045, time 121.48ms
iter 57550: loss 8.2450, time 120.97ms
iter 57560: loss 8.0277, time 121.71ms
iter 57570: loss 7.9474, time 122.99ms
iter 57580: loss 8.4262, time 121.22ms
iter 57590: loss 7.8933, time 121.00ms
tensor(0.5314)
iter 57600: loss 8.4658, time 121.22ms
iter 57610: loss 8.4490, time 121.04ms
iter 57620: loss 7.9235, time 121.16ms
iter 57630: loss 8.8314, time 119.82ms
iter 57640: loss 8.2608, time 118.87ms
iter 57650: loss 8.0129, time 120.91ms
iter 57660: loss 8.3612, time 120.88ms
iter 57670: loss 8.4398, time 121.62ms
iter 57680: loss 8.1161, time 122.54ms
iter 57690: loss 8.4857, time 121.08ms
tensor(0.5627)
iter 57700: loss 8.4936, time 121.42ms
iter 57710: loss 8.0708, time 121.08ms
iter 57720: loss 8.1642, time 120.94ms
iter 57730: loss 8.2883, time 121.09ms
iter 57740: loss 7.9437, time 118.80ms
step 57750: train loss 7.9594, val loss 7.9489
saving checkpoint to out-shakespeare-char
iter 57750: loss 7.7542, time 2830.97ms
iter 57760: loss 7.4636, time 121.81ms
iter 57770: loss 8.2214, time 123.39ms
iter 57780: loss 8.3444, time 122.54ms
iter 57790: loss 7.4913, time 120.95ms
tensor(0.5937)
iter 57800: loss 8.6629, time 121.34ms
iter 57810: loss 8.4412, time 120.64ms
iter 57820: loss 8.4510, time 121.40ms
iter 57830: loss 8.8239, time 118.67ms
iter 57840: loss 8.2392, time 119.55ms
iter 57850: loss 7.9335, time 119.77ms
iter 57860: loss 8.6574, time 120.90ms
iter 57870: loss 7.9922, time 121.19ms
iter 57880: loss 8.1419, time 121.11ms
iter 57890: loss 8.3510, time 122.46ms
tensor(0.6243)
iter 57900: loss 8.0710, time 123.78ms
iter 57910: loss 8.2617, time 123.51ms
iter 57920: loss 8.1908, time 121.06ms
iter 57930: loss 8.8378, time 118.63ms
iter 57940: loss 8.0189, time 121.16ms
iter 57950: loss 8.8001, time 118.98ms
iter 57960: loss 9.1389, time 119.13ms
iter 57970: loss 8.4252, time 119.94ms
iter 57980: loss 8.2695, time 121.48ms
iter 57990: loss 8.5562, time 119.02ms
tensor(0.6545)
step 58000: train loss 7.9647, val loss 7.9340
saving checkpoint to out-shakespeare-char
iter 58000: loss 8.5621, time 2837.27ms
iter 58010: loss 8.6440, time 121.76ms
iter 58020: loss 8.6359, time 121.12ms
iter 58030: loss 8.1937, time 121.11ms
iter 58040: loss 8.2864, time 118.88ms
iter 58050: loss 8.2533, time 118.83ms
iter 58060: loss 8.6814, time 120.91ms
iter 58070: loss 10.6691, time 120.33ms
iter 58080: loss 53.5170, time 121.82ms
iter 58090: loss 38.0989, time 122.27ms
tensor(0.6841)
iter 58100: loss 27.0081, time 121.39ms
iter 58110: loss 18.5647, time 121.27ms
iter 58120: loss 64.6496, time 121.17ms
iter 58130: loss 33.1027, time 121.21ms
iter 58140: loss 94.6500, time 121.11ms
iter 58150: loss 37.7050, time 118.99ms
iter 58160: loss 31.5346, time 119.93ms
iter 58170: loss 477.0977, time 120.73ms
iter 58180: loss 47.2393, time 121.11ms
iter 58190: loss 521.2724, time 121.34ms
tensor(0.7129)
iter 58200: loss 208.6893, time 121.57ms
iter 58210: loss 93.5192, time 122.92ms
iter 58220: loss 91.4980, time 121.20ms
iter 58230: loss 74.0773, time 119.59ms
iter 58240: loss 264.0366, time 121.22ms
step 58250: train loss 104.1005, val loss 99.8079
saving checkpoint to out-shakespeare-char
iter 58250: loss 96.6272, time 2846.94ms
iter 58260: loss 67.9156, time 121.28ms
iter 58270: loss 46.5416, time 122.34ms
iter 58280: loss 77.9387, time 123.51ms
iter 58290: loss 91.3565, time 119.63ms
tensor(0.7409)
iter 58300: loss 100.7849, time 121.13ms
iter 58310: loss 72.1325, time 120.94ms
iter 58320: loss 66.0237, time 119.17ms
iter 58330: loss 55.8764, time 119.75ms
iter 58340: loss 130.5097, time 120.51ms
iter 58350: loss 55.3124, time 119.06ms
iter 58360: loss 36.1712, time 120.88ms
iter 58370: loss 35.7802, time 122.06ms
iter 58380: loss 25.9814, time 123.39ms
iter 58390: loss 38.2364, time 121.13ms
tensor(0.7679)
iter 58400: loss 38.7314, time 121.45ms
iter 58410: loss 80.5639, time 120.80ms
iter 58420: loss 39.6066, time 119.07ms
iter 58430: loss 54.0298, time 121.31ms
iter 58440: loss 46.2478, time 121.71ms
iter 58450: loss 52.7509, time 121.90ms
iter 58460: loss 49.4820, time 123.63ms
iter 58470: loss 66.3270, time 121.21ms
iter 58480: loss 61.6207, time 119.10ms
iter 58490: loss 99.6646, time 121.04ms
tensor(0.7939)
step 58500: train loss 210.1553, val loss 205.9518
saving checkpoint to out-shakespeare-char
iter 58500: loss 208.6390, time 2826.61ms
iter 58510: loss 102.5997, time 120.43ms
iter 58520: loss 103.5276, time 121.13ms
iter 58530: loss 175.3424, time 121.19ms
iter 58540: loss 87.9659, time 121.03ms
iter 58550: loss 162.4310, time 123.96ms
iter 58560: loss 71.7581, time 121.51ms
iter 58570: loss 132.5903, time 121.86ms
iter 58580: loss 54.9601, time 119.87ms
iter 58590: loss 58.8585, time 121.92ms
tensor(0.8187)
iter 58600: loss 94.1144, time 120.14ms
iter 58610: loss 57.4243, time 124.38ms
iter 58620: loss 60.0956, time 121.76ms
iter 58630: loss 72.8908, time 121.76ms
iter 58640: loss 80.0808, time 119.85ms
iter 58650: loss 57.1699, time 121.35ms
iter 58660: loss 73.6402, time 121.12ms
iter 58670: loss 77.6148, time 123.89ms
iter 58680: loss 109.3761, time 119.86ms
iter 58690: loss 83.9277, time 121.59ms
tensor(0.8423)
iter 58700: loss 69.2749, time 121.79ms
iter 58710: loss 100.4161, time 119.96ms
iter 58720: loss 113.9883, time 121.23ms
iter 58730: loss 71.7883, time 121.53ms
iter 58740: loss 68.7419, time 120.75ms
step 58750: train loss 39.2833, val loss 38.8906
saving checkpoint to out-shakespeare-char
iter 58750: loss 54.1119, time 2829.70ms
iter 58760: loss 114.0096, time 122.00ms
iter 58770: loss 72.9734, time 121.87ms
iter 58780: loss 94.7669, time 119.97ms
iter 58790: loss 103.9786, time 122.04ms
tensor(0.8645)
iter 58800: loss 62.1254, time 120.44ms
iter 58810: loss 60.7663, time 123.57ms
iter 58820: loss 67.1227, time 122.70ms
iter 58830: loss 68.1795, time 121.63ms
iter 58840: loss 59.0810, time 119.51ms
iter 58850: loss 42.0232, time 120.64ms
iter 58860: loss 49.3632, time 120.78ms
iter 58870: loss 65.9817, time 122.93ms
iter 58880: loss 73.6816, time 121.70ms
iter 58890: loss 56.5892, time 121.61ms
tensor(0.8853)
iter 58900: loss 41.1735, time 122.27ms
iter 58910: loss 57.9507, time 120.03ms
iter 58920: loss 66.9848, time 121.58ms
iter 58930: loss 50.4473, time 122.48ms
iter 58940: loss 55.6035, time 121.65ms
iter 58950: loss 59.8912, time 121.68ms
iter 58960: loss 55.5561, time 121.59ms
iter 58970: loss 60.0774, time 119.78ms
iter 58980: loss 91.0455, time 120.26ms
iter 58990: loss 59.9113, time 121.55ms
tensor(0.9045)
step 59000: train loss 51.8279, val loss 52.5647
saving checkpoint to out-shakespeare-char
iter 59000: loss 81.3051, time 2835.70ms
iter 59010: loss 68.1849, time 121.69ms
iter 59020: loss 72.7227, time 121.47ms
iter 59030: loss 98.9743, time 119.53ms
iter 59040: loss 76.7156, time 118.94ms
iter 59050: loss 86.8480, time 120.08ms
iter 59060: loss 81.5500, time 122.39ms
iter 59070: loss 79.9701, time 123.69ms
iter 59080: loss 67.6822, time 122.00ms
iter 59090: loss 66.5371, time 120.98ms
tensor(0.9222)
iter 59100: loss 62.7579, time 120.13ms
iter 59110: loss 60.2844, time 119.55ms
iter 59120: loss 69.8097, time 119.66ms
iter 59130: loss 66.1856, time 120.68ms
iter 59140: loss 70.8961, time 120.48ms
iter 59150: loss 61.5001, time 123.07ms
iter 59160: loss 67.2059, time 123.49ms
iter 59170: loss 84.2727, time 121.55ms
iter 59180: loss 88.5897, time 119.89ms
iter 59190: loss 60.6910, time 119.53ms
tensor(0.9382)
iter 59200: loss 84.9250, time 120.34ms
iter 59210: loss 62.8911, time 121.49ms
iter 59220: loss 57.1636, time 121.22ms
iter 59230: loss 66.0919, time 123.83ms
iter 59240: loss 79.2243, time 123.63ms
step 59250: train loss 42.6465, val loss 42.2374
saving checkpoint to out-shakespeare-char
iter 59250: loss 64.6222, time 2826.12ms
iter 59260: loss 89.2636, time 119.48ms
iter 59270: loss 75.6891, time 119.07ms
iter 59280: loss 80.0139, time 119.05ms
iter 59290: loss 83.2687, time 118.98ms
tensor(0.9524)
iter 59300: loss 88.2429, time 119.18ms
iter 59310: loss 95.3712, time 119.95ms
iter 59320: loss 69.4179, time 120.87ms
iter 59330: loss 66.4300, time 121.17ms
iter 59340: loss 67.9584, time 122.63ms
iter 59350: loss 74.7248, time 123.10ms
iter 59360: loss 57.2775, time 123.40ms
iter 59370: loss 61.4112, time 123.04ms
iter 59380: loss 87.1798, time 118.18ms
iter 59390: loss 82.1596, time 120.97ms
tensor(0.9649)
iter 59400: loss 79.0513, time 119.29ms
iter 59410: loss 92.1609, time 119.46ms
iter 59420: loss 71.9583, time 119.13ms
iter 59430: loss 83.1083, time 119.81ms
iter 59440: loss 90.8412, time 119.03ms
iter 59450: loss 110.7244, time 121.51ms
iter 59460: loss 90.2242, time 123.23ms
iter 59470: loss 114.2528, time 123.51ms
iter 59480: loss 104.3611, time 123.62ms
iter 59490: loss 64.6537, time 121.10ms
tensor(0.9755)
step 59500: train loss 34.5186, val loss 33.9364
saving checkpoint to out-shakespeare-char
iter 59500: loss 59.5959, time 2820.82ms
iter 59510: loss 71.3281, time 119.05ms
iter 59520: loss 86.6506, time 119.30ms
iter 59530: loss 105.1591, time 119.94ms
iter 59540: loss 77.8070, time 120.96ms
iter 59550: loss 58.7282, time 120.15ms
iter 59560: loss 72.5111, time 123.17ms
iter 59570: loss 94.6784, time 123.47ms
iter 59580: loss 88.5900, time 122.50ms
iter 59590: loss 87.6014, time 121.50ms
tensor(0.9843)
iter 59600: loss 90.7076, time 119.32ms
iter 59610: loss 76.5305, time 119.08ms
iter 59620: loss 89.8642, time 119.01ms
iter 59630: loss 82.2232, time 118.98ms
iter 59640: loss 118.1029, time 119.90ms
iter 59650: loss 87.1360, time 120.60ms
iter 59660: loss 67.4903, time 121.61ms
iter 59670: loss 77.5525, time 122.09ms
iter 59680: loss 185.8559, time 121.16ms
iter 59690: loss 107.0133, time 123.44ms
tensor(0.9911)
iter 59700: loss 205.7586, time 122.52ms
iter 59710: loss 85.9656, time 121.01ms
iter 59720: loss 111.9737, time 120.94ms
iter 59730: loss 95.1060, time 118.88ms
iter 59740: loss 85.7617, time 119.16ms
step 59750: train loss 120.7919, val loss 120.6678
saving checkpoint to out-shakespeare-char
iter 59750: loss 153.6505, time 2832.25ms
iter 59760: loss 91.4672, time 120.83ms
iter 59770: loss 76.6348, time 122.13ms
iter 59780: loss 66.2899, time 123.11ms
iter 59790: loss 90.5394, time 121.47ms
tensor(0.9961)
iter 59800: loss 119.9622, time 123.15ms
iter 59810: loss 80.4112, time 121.49ms
iter 59820: loss 64.3682, time 121.11ms
iter 59830: loss 77.1489, time 118.91ms
iter 59840: loss 72.1058, time 117.99ms
iter 59850: loss 72.5180, time 118.85ms
iter 59860: loss 74.2964, time 119.72ms
iter 59870: loss 112.4996, time 120.72ms
iter 59880: loss 74.1203, time 122.04ms
iter 59890: loss 72.2391, time 122.62ms
tensor(0.9990)
iter 59900: loss 72.5164, time 121.51ms
iter 59910: loss 76.5679, time 123.14ms
iter 59920: loss 76.7390, time 122.32ms
iter 59930: loss 79.2679, time 121.30ms
iter 59940: loss 69.2709, time 119.01ms
iter 59950: loss 68.6952, time 119.31ms
iter 59960: loss 89.7102, time 119.59ms
iter 59970: loss 65.4480, time 120.26ms
iter 59980: loss 73.9799, time 121.24ms
iter 59990: loss 76.9065, time 122.99ms
tensor(1.)
step 60000: train loss 56.5047, val loss 55.2486
saving checkpoint to out-shakespeare-char
iter 60000: loss 69.2678, time 2824.93ms
iter 60010: loss 88.1857, time 121.20ms
iter 60020: loss 78.6277, time 121.18ms
iter 60030: loss 69.9072, time 121.08ms
iter 60040: loss 77.3777, time 119.13ms
iter 60050: loss 66.4957, time 119.14ms
iter 60060: loss 86.6388, time 119.17ms
iter 60070: loss 72.3602, time 120.19ms
iter 60080: loss 94.6430, time 121.33ms
iter 60090: loss 80.3803, time 120.09ms
tensor(0.9990)
iter 60100: loss 102.2440, time 122.57ms
iter 60110: loss 79.7779, time 123.06ms
iter 60120: loss 78.1158, time 121.24ms
iter 60130: loss 76.2736, time 123.14ms
iter 60140: loss 91.0851, time 120.10ms
iter 60150: loss 79.2509, time 121.11ms
iter 60160: loss 89.0248, time 121.17ms
iter 60170: loss 83.2307, time 119.14ms
iter 60180: loss 87.2166, time 119.31ms
iter 60190: loss 73.3524, time 118.58ms
tensor(0.9961)
iter 60200: loss 88.9736, time 119.46ms
iter 60210: loss 87.7496, time 120.46ms
iter 60220: loss 75.1739, time 119.94ms
iter 60230: loss 94.7975, time 121.17ms
iter 60240: loss 80.8326, time 125.39ms
step 60250: train loss 45.0222, val loss 44.7530
saving checkpoint to out-shakespeare-char
iter 60250: loss 67.4998, time 2846.78ms
iter 60260: loss 81.9743, time 119.92ms
iter 60270: loss 89.3948, time 121.12ms
iter 60280: loss 86.4692, time 123.05ms
iter 60290: loss 86.7080, time 124.14ms
tensor(0.9911)
iter 60300: loss 83.8119, time 122.56ms
iter 60310: loss 77.8871, time 120.20ms
iter 60320: loss 96.7922, time 119.59ms
iter 60330: loss 81.1858, time 120.96ms
iter 60340: loss 83.3552, time 121.87ms
iter 60350: loss 83.0854, time 123.75ms
iter 60360: loss 83.2092, time 123.29ms
iter 60370: loss 95.9738, time 122.03ms
iter 60380: loss 81.9862, time 119.43ms
iter 60390: loss 107.3628, time 120.70ms
tensor(0.9843)
iter 60400: loss 86.8420, time 119.87ms
iter 60410: loss 76.1308, time 122.50ms
iter 60420: loss 95.4442, time 123.81ms
iter 60430: loss 81.0926, time 123.22ms
iter 60440: loss 93.7583, time 122.34ms
iter 60450: loss 90.4693, time 119.69ms
iter 60460: loss 63.6050, time 119.76ms
iter 60470: loss 79.0439, time 121.94ms
iter 60480: loss 77.7242, time 121.85ms
iter 60490: loss 78.0011, time 123.97ms
tensor(0.9755)
step 60500: train loss 46.6791, val loss 46.8668
saving checkpoint to out-shakespeare-char
iter 60500: loss 68.4055, time 2825.28ms
iter 60510: loss 88.2737, time 119.83ms
iter 60520: loss 92.8589, time 119.20ms
iter 60530: loss 99.5927, time 121.51ms
iter 60540: loss 81.7304, time 121.67ms
iter 60550: loss 70.9850, time 123.74ms
iter 60560: loss 95.2032, time 123.56ms
iter 60570: loss 89.3774, time 121.97ms
iter 60580: loss 90.5126, time 119.65ms
iter 60590: loss 105.3336, time 119.61ms
tensor(0.9649)
iter 60600: loss 82.9511, time 119.80ms
iter 60610: loss 106.8221, time 121.98ms
iter 60620: loss 92.4846, time 123.74ms
iter 60630: loss 98.3321, time 124.11ms
iter 60640: loss 109.2496, time 121.68ms
iter 60650: loss 77.8386, time 119.50ms
iter 60660: loss 95.4002, time 119.58ms
iter 60670: loss 100.0169, time 121.28ms
iter 60680: loss 96.1436, time 122.04ms
iter 60690: loss 62.9366, time 124.27ms
tensor(0.9524)
iter 60700: loss 108.9458, time 123.47ms
iter 60710: loss 75.3176, time 121.81ms
iter 60720: loss 74.3924, time 119.86ms
iter 60730: loss 83.9397, time 121.03ms
iter 60740: loss 80.0249, time 121.11ms
step 60750: train loss 46.4817, val loss 46.3106
saving checkpoint to out-shakespeare-char
iter 60750: loss 71.6518, time 2842.29ms
iter 60760: loss 87.6628, time 122.03ms
iter 60770: loss 73.4611, time 119.78ms
iter 60780: loss 80.5932, time 120.77ms
iter 60790: loss 74.2523, time 122.55ms
tensor(0.9382)
iter 60800: loss 73.1698, time 122.25ms
iter 60810: loss 77.5055, time 123.82ms
iter 60820: loss 71.5729, time 122.42ms
iter 60830: loss 57.9092, time 119.93ms
iter 60840: loss 60.9615, time 120.28ms
iter 60850: loss 77.8914, time 121.78ms
iter 60860: loss 77.9179, time 123.12ms
iter 60870: loss 80.1917, time 124.11ms
iter 60880: loss 73.5140, time 119.73ms
iter 60890: loss 62.8890, time 119.73ms
tensor(0.9222)
iter 60900: loss 67.5368, time 119.87ms
iter 60910: loss 68.0564, time 121.10ms
iter 60920: loss 70.4056, time 122.36ms
iter 60930: loss 68.9053, time 123.62ms
iter 60940: loss 61.7843, time 121.75ms
iter 60950: loss 83.0131, time 121.45ms
iter 60960: loss 77.2141, time 120.20ms
iter 60970: loss 72.1767, time 119.52ms
iter 60980: loss 77.8312, time 120.78ms
iter 60990: loss 68.7295, time 121.76ms
tensor(0.9045)
step 61000: train loss 25.7401, val loss 26.5131
saving checkpoint to out-shakespeare-char
iter 61000: loss 58.2629, time 2817.87ms
iter 61010: loss 63.3534, time 123.85ms
iter 61020: loss 62.9963, time 121.91ms
iter 61030: loss 61.8839, time 119.53ms
iter 61040: loss 67.1307, time 119.64ms
iter 61050: loss 69.9486, time 120.17ms
iter 61060: loss 59.8501, time 121.83ms
iter 61070: loss 58.2953, time 124.09ms
iter 61080: loss 56.7580, time 121.23ms
iter 61090: loss 57.8466, time 121.63ms
tensor(0.8853)
iter 61100: loss 54.2602, time 119.53ms
iter 61110: loss 38.3920, time 119.59ms
iter 61120: loss 42.9606, time 120.12ms
iter 61130: loss 42.8840, time 121.92ms
iter 61140: loss 41.5639, time 121.68ms
iter 61150: loss 44.2439, time 123.86ms
iter 61160: loss 44.4021, time 121.92ms
iter 61170: loss 45.2237, time 119.80ms
iter 61180: loss 48.1910, time 119.58ms
iter 61190: loss 46.2155, time 119.83ms
tensor(0.8645)
iter 61200: loss 52.5996, time 121.20ms
iter 61210: loss 45.6511, time 122.82ms
iter 61220: loss 43.5334, time 121.87ms
iter 61230: loss 51.1238, time 123.99ms
iter 61240: loss 46.2943, time 121.64ms
step 61250: train loss 10.4504, val loss 10.4324
saving checkpoint to out-shakespeare-char
iter 61250: loss 46.2999, time 2826.28ms
iter 61260: loss 51.9758, time 121.04ms
iter 61270: loss 44.0176, time 122.45ms
iter 61280: loss 44.5434, time 121.89ms
iter 61290: loss 53.7923, time 123.83ms
tensor(0.8423)
iter 61300: loss 74.8099, time 122.31ms
iter 61310: loss 64.3036, time 119.81ms
iter 61320: loss 63.1109, time 119.77ms
iter 61330: loss 60.9637, time 121.14ms
iter 61340: loss 62.0214, time 121.30ms
iter 61350: loss 52.4935, time 123.76ms
iter 61360: loss 55.5637, time 121.83ms
iter 61370: loss 56.2151, time 119.58ms
iter 61380: loss 59.3443, time 120.20ms
iter 61390: loss 51.4696, time 120.22ms
tensor(0.8187)
iter 61400: loss 59.3241, time 122.90ms
iter 61410: loss 50.1688, time 123.86ms
iter 61420: loss 50.0389, time 121.82ms
iter 61430: loss 52.8942, time 121.27ms
iter 61440: loss 50.1230, time 121.75ms
iter 61450: loss 44.3055, time 119.48ms
iter 61460: loss 44.9918, time 119.36ms
iter 61470: loss 52.9341, time 121.02ms
iter 61480: loss 47.3332, time 120.59ms
iter 61490: loss 48.9117, time 123.64ms
tensor(0.7939)
step 61500: train loss 19.2294, val loss 19.0113
saving checkpoint to out-shakespeare-char
iter 61500: loss 51.4247, time 2832.04ms
iter 61510: loss 46.6953, time 121.63ms
iter 61520: loss 54.2819, time 119.61ms
iter 61530: loss 47.1875, time 120.27ms
iter 61540: loss 51.9080, time 120.41ms
iter 61550: loss 37.1821, time 123.18ms
iter 61560: loss 48.4831, time 123.85ms
iter 61570: loss 41.1868, time 121.62ms
iter 61580: loss 43.5484, time 119.63ms
iter 61590: loss 41.8736, time 119.62ms
tensor(0.7679)
iter 61600: loss 43.0350, time 121.65ms
iter 61610: loss 41.2667, time 123.13ms
iter 61620: loss 41.1626, time 122.01ms
iter 61630: loss 45.3717, time 121.70ms
iter 61640: loss 45.5356, time 119.76ms
iter 61650: loss 40.2975, time 119.93ms
iter 61660: loss 41.3660, time 120.21ms
iter 61670: loss 44.4867, time 121.68ms
iter 61680: loss 35.5037, time 121.93ms
iter 61690: loss 36.7943, time 123.59ms
tensor(0.7409)
iter 61700: loss 34.7596, time 122.08ms
iter 61710: loss 39.9169, time 119.45ms
iter 61720: loss 38.7589, time 120.60ms
iter 61730: loss 43.1263, time 120.85ms
iter 61740: loss 43.5579, time 122.33ms
step 61750: train loss 10.0062, val loss 10.0100
saving checkpoint to out-shakespeare-char
iter 61750: loss 44.0275, time 2827.85ms
iter 61760: loss 44.7016, time 121.72ms
iter 61770: loss 53.1067, time 119.84ms
iter 61780: loss 54.5230, time 119.54ms
iter 61790: loss 56.6994, time 119.26ms
tensor(0.7129)
iter 61800: loss 51.2331, time 122.65ms
iter 61810: loss 54.1434, time 124.18ms
iter 61820: loss 60.7211, time 121.53ms
iter 61830: loss 63.1422, time 121.83ms
iter 61840: loss 53.5438, time 119.86ms
iter 61850: loss 69.5743, time 120.47ms
iter 61860: loss 70.4323, time 121.30ms
iter 61870: loss 63.8669, time 122.21ms
iter 61880: loss 67.0928, time 121.80ms
iter 61890: loss 59.7877, time 121.21ms
tensor(0.6841)
iter 61900: loss 58.8531, time 120.02ms
iter 61910: loss 61.1736, time 119.82ms
iter 61920: loss 51.5236, time 120.59ms
iter 61930: loss 53.4698, time 122.51ms
iter 61940: loss 54.9679, time 124.16ms
iter 61950: loss 58.2469, time 123.92ms
iter 61960: loss 53.3430, time 119.62ms
iter 61970: loss 50.0705, time 119.75ms
iter 61980: loss 50.9109, time 119.25ms
iter 61990: loss 53.5602, time 120.73ms
tensor(0.6545)
step 62000: train loss 10.2779, val loss 10.3708
saving checkpoint to out-shakespeare-char
iter 62000: loss 53.0199, time 2827.93ms
iter 62010: loss 57.0621, time 123.82ms
iter 62020: loss 54.2108, time 122.20ms
iter 62030: loss 58.8450, time 120.91ms
iter 62040: loss 51.8888, time 119.52ms
iter 62050: loss 59.6757, time 119.62ms
iter 62060: loss 57.9442, time 121.90ms
iter 62070: loss 56.2901, time 122.24ms
iter 62080: loss 52.4323, time 120.59ms
iter 62090: loss 50.7003, time 121.54ms
tensor(0.6243)
iter 62100: loss 52.6445, time 121.91ms
iter 62110: loss 55.5785, time 121.57ms
iter 62120: loss 52.3428, time 119.64ms
iter 62130: loss 51.4199, time 120.88ms
iter 62140: loss 53.4178, time 121.63ms
iter 62150: loss 45.7397, time 122.87ms
iter 62160: loss 47.7122, time 121.01ms
iter 62170: loss 54.1371, time 121.84ms
iter 62180: loss 51.0378, time 120.94ms
iter 62190: loss 52.3075, time 120.85ms
tensor(0.5937)
iter 62200: loss 46.6125, time 120.21ms
iter 62210: loss 50.4951, time 121.98ms
iter 62220: loss 56.2837, time 119.55ms
iter 62230: loss 55.9333, time 123.47ms
iter 62240: loss 56.0220, time 121.76ms
step 62250: train loss 8.8372, val loss 8.8835
saving checkpoint to out-shakespeare-char
iter 62250: loss 56.2416, time 2823.19ms
iter 62260: loss 54.6508, time 119.76ms
iter 62270: loss 53.3279, time 121.19ms
iter 62280: loss 57.8208, time 119.02ms
iter 62290: loss 54.1604, time 121.51ms
tensor(0.5627)
iter 62300: loss 52.8245, time 123.49ms
iter 62310: loss 59.2862, time 121.64ms
iter 62320: loss 52.7301, time 121.66ms
iter 62330: loss 53.0893, time 121.54ms
iter 62340: loss 62.3243, time 119.42ms
iter 62350: loss 60.3314, time 120.54ms
iter 62360: loss 53.3778, time 120.70ms
iter 62370: loss 57.2101, time 121.68ms
iter 62380: loss 61.8801, time 122.30ms
iter 62390: loss 69.6579, time 123.88ms
tensor(0.5314)
iter 62400: loss 63.9211, time 121.73ms
iter 62410: loss 55.5099, time 118.73ms
iter 62420: loss 50.9265, time 121.84ms
iter 62430: loss 52.1930, time 119.90ms
iter 62440: loss 50.0893, time 120.77ms
iter 62450: loss 46.5154, time 121.52ms
iter 62460: loss 43.6587, time 121.54ms
iter 62470: loss 45.1265, time 120.79ms
iter 62480: loss 44.6293, time 123.66ms
iter 62490: loss 38.3946, time 121.54ms
tensor(0.5000)
step 62500: train loss 8.1022, val loss 8.0990
saving checkpoint to out-shakespeare-char
iter 62500: loss 44.5733, time 2822.57ms
iter 62510: loss 41.0117, time 118.87ms
iter 62520: loss 42.8818, time 121.20ms
iter 62530: loss 41.5954, time 119.61ms
iter 62540: loss 43.4895, time 122.57ms
iter 62550: loss 36.5250, time 123.52ms
iter 62560: loss 39.1278, time 120.53ms
iter 62570: loss 42.8681, time 120.80ms
iter 62580: loss 40.7831, time 121.62ms
iter 62590: loss 39.1173, time 119.57ms
tensor(0.4686)
iter 62600: loss 40.1866, time 121.15ms
iter 62610: loss 43.7117, time 121.87ms
iter 62620: loss 41.2916, time 121.35ms
iter 62630: loss 40.3871, time 122.83ms
iter 62640: loss 39.6977, time 123.57ms
iter 62650: loss 37.1711, time 121.52ms
iter 62660: loss 37.5528, time 119.49ms
iter 62670: loss 37.6331, time 121.08ms
iter 62680: loss 40.8864, time 119.84ms
iter 62690: loss 40.3415, time 121.26ms
tensor(0.4373)
iter 62700: loss 39.2646, time 121.76ms
iter 62710: loss 37.3617, time 122.93ms
iter 62720: loss 37.2777, time 121.68ms
iter 62730: loss 35.4151, time 121.45ms
iter 62740: loss 37.3762, time 121.50ms
step 62750: train loss 7.9286, val loss 7.9371
saving checkpoint to out-shakespeare-char
iter 62750: loss 36.9103, time 2823.98ms
iter 62760: loss 37.5250, time 119.72ms
iter 62770: loss 34.9789, time 121.88ms
iter 62780: loss 34.6671, time 119.46ms
iter 62790: loss 34.5037, time 119.43ms
tensor(0.4063)
iter 62800: loss 30.7107, time 118.92ms
iter 62810: loss 31.2679, time 117.53ms
iter 62820: loss 31.7581, time 115.69ms
iter 62830: loss 28.6096, time 119.06ms
iter 62840: loss 28.7766, time 120.65ms
iter 62850: loss 29.0667, time 120.63ms
iter 62860: loss 28.7826, time 119.17ms
iter 62870: loss 28.9031, time 118.12ms
iter 62880: loss 25.6644, time 116.05ms
iter 62890: loss 26.4696, time 118.82ms
tensor(0.3757)
iter 62900: loss 26.2426, time 119.66ms
iter 62910: loss 23.4958, time 121.47ms
iter 62920: loss 27.8203, time 117.92ms
iter 62930: loss 24.8974, time 118.01ms
iter 62940: loss 25.0314, time 117.12ms
iter 62950: loss 26.2114, time 117.86ms
iter 62960: loss 22.2299, time 119.22ms
iter 62970: loss 25.5397, time 122.39ms
iter 62980: loss 21.2987, time 120.10ms
iter 62990: loss 21.4454, time 118.37ms
tensor(0.3455)
step 63000: train loss 7.9843, val loss 7.9194
saving checkpoint to out-shakespeare-char
iter 63000: loss 21.1030, time 2855.25ms
iter 63010: loss 23.2432, time 121.66ms
iter 63020: loss 19.7176, time 119.36ms
iter 63030: loss 20.5660, time 121.32ms
iter 63040: loss 19.7579, time 120.27ms
iter 63050: loss 19.9208, time 120.23ms
iter 63060: loss 18.8151, time 120.93ms
iter 63070: loss 19.1877, time 121.12ms
iter 63080: loss 18.7822, time 121.40ms
iter 63090: loss 19.2608, time 121.02ms
tensor(0.3159)
iter 63100: loss 18.4877, time 122.77ms
iter 63110: loss 16.5049, time 121.13ms
iter 63120: loss 17.5118, time 121.28ms
iter 63130: loss 17.6352, time 121.12ms
iter 63140: loss 18.9995, time 118.82ms
iter 63150: loss 18.4714, time 119.18ms
iter 63160: loss 18.0004, time 120.08ms
iter 63170: loss 19.2733, time 121.03ms
iter 63180: loss 18.1710, time 121.25ms
iter 63190: loss 17.4646, time 120.22ms
tensor(0.2871)
iter 63200: loss 19.0565, time 120.51ms
iter 63210: loss 17.5786, time 122.79ms
iter 63220: loss 17.0373, time 122.60ms
iter 63230: loss 16.4224, time 121.19ms
iter 63240: loss 16.5800, time 121.21ms
step 63250: train loss 7.8261, val loss 7.8225
saving checkpoint to out-shakespeare-char
iter 63250: loss 16.9640, time 2831.23ms
iter 63260: loss 16.6361, time 119.49ms
iter 63270: loss 15.7575, time 119.85ms
iter 63280: loss 17.0263, time 120.27ms
iter 63290: loss 16.7124, time 119.31ms
tensor(0.2591)
iter 63300: loss 14.0635, time 119.61ms
iter 63310: loss 15.8073, time 118.39ms
iter 63320: loss 15.6624, time 119.62ms
iter 63330: loss 15.6629, time 118.61ms
iter 63340: loss 14.7846, time 118.20ms
iter 63350: loss 15.1620, time 117.91ms
iter 63360: loss 14.9835, time 118.97ms
iter 63370: loss 14.9054, time 119.84ms
iter 63380: loss 14.7500, time 119.68ms
iter 63390: loss 14.0270, time 119.34ms
tensor(0.2321)
iter 63400: loss 12.6405, time 119.78ms
iter 63410: loss 12.7197, time 120.21ms
iter 63420: loss 13.3677, time 121.08ms
iter 63430: loss 15.4395, time 119.64ms
iter 63440: loss 12.5213, time 122.75ms
iter 63450: loss 13.1294, time 121.82ms
iter 63460: loss 13.6165, time 121.46ms
iter 63470: loss 13.3068, time 121.27ms
iter 63480: loss 12.3465, time 119.08ms
iter 63490: loss 12.0811, time 119.73ms
tensor(0.2061)
step 63500: train loss 7.8388, val loss 7.8275
saving checkpoint to out-shakespeare-char
iter 63500: loss 13.0855, time 2855.21ms
iter 63510: loss 12.2684, time 117.46ms
iter 63520: loss 13.4014, time 121.58ms
iter 63530: loss 13.1941, time 121.40ms
iter 63540: loss 12.5765, time 119.36ms
iter 63550: loss 11.9434, time 120.29ms
iter 63560: loss 11.7982, time 121.02ms
iter 63570: loss 12.6170, time 119.97ms
iter 63580: loss 11.1199, time 123.00ms
iter 63590: loss 11.8057, time 123.16ms
tensor(0.1813)
iter 63600: loss 12.9312, time 121.41ms
iter 63610: loss 11.3698, time 121.61ms
iter 63620: loss 11.8068, time 121.83ms
iter 63630: loss 11.5453, time 119.77ms
iter 63640: loss 11.8895, time 121.47ms
iter 63650: loss 11.7258, time 121.33ms
iter 63660: loss 10.8790, time 122.70ms
iter 63670: loss 10.9494, time 124.15ms
iter 63680: loss 10.6833, time 121.14ms
iter 63690: loss 11.9520, time 121.73ms
tensor(0.1577)
iter 63700: loss 11.0607, time 119.52ms
iter 63710: loss 10.9361, time 118.94ms
iter 63720: loss 11.0944, time 119.43ms
iter 63730: loss 12.4701, time 121.23ms
iter 63740: loss 10.9277, time 121.10ms
step 63750: train loss 7.8464, val loss 7.7782
saving checkpoint to out-shakespeare-char
iter 63750: loss 11.7955, time 2852.03ms
iter 63760: loss 10.7609, time 119.02ms
iter 63770: loss 11.4282, time 119.78ms
iter 63780: loss 11.1135, time 120.81ms
iter 63790: loss 11.6631, time 121.10ms
tensor(0.1355)
iter 63800: loss 11.1032, time 121.55ms
iter 63810: loss 11.1105, time 122.83ms
iter 63820: loss 9.8736, time 121.28ms
iter 63830: loss 11.0872, time 121.32ms
iter 63840: loss 10.7396, time 121.06ms
iter 63850: loss 10.1330, time 121.13ms
iter 63860: loss 10.3218, time 118.94ms
iter 63870: loss 11.3548, time 119.99ms
iter 63880: loss 10.3939, time 121.58ms
iter 63890: loss 10.8415, time 121.59ms
tensor(0.1147)
iter 63900: loss 11.1468, time 121.36ms
iter 63910: loss 10.7551, time 123.24ms
iter 63920: loss 11.1598, time 121.25ms
iter 63930: loss 10.1212, time 121.75ms
iter 63940: loss 10.1335, time 119.14ms
iter 63950: loss 10.9506, time 119.57ms
iter 63960: loss 10.9628, time 119.01ms
iter 63970: loss 9.9814, time 121.17ms
iter 63980: loss 10.5502, time 121.72ms
iter 63990: loss 10.3447, time 122.69ms
tensor(0.0955)
step 64000: train loss 7.7743, val loss 7.7667
saving checkpoint to out-shakespeare-char
iter 64000: loss 10.8658, time 2844.94ms
iter 64010: loss 10.2013, time 119.37ms
iter 64020: loss 9.8427, time 119.36ms
iter 64030: loss 10.7809, time 120.97ms
iter 64040: loss 9.7766, time 120.43ms
iter 64050: loss 10.1928, time 121.55ms
iter 64060: loss 9.7794, time 122.48ms
iter 64070: loss 10.4598, time 121.18ms
iter 64080: loss 9.4964, time 122.10ms
iter 64090: loss 10.4571, time 122.44ms
tensor(0.0778)
iter 64100: loss 9.8765, time 120.14ms
iter 64110: loss 9.9685, time 119.99ms
iter 64120: loss 10.1344, time 120.43ms
iter 64130: loss 10.9321, time 121.17ms
iter 64140: loss 10.4072, time 121.46ms
iter 64150: loss 10.4904, time 121.18ms
iter 64160: loss 10.6318, time 123.45ms
iter 64170: loss 10.0187, time 121.27ms
iter 64180: loss 9.3792, time 120.96ms
iter 64190: loss 9.7597, time 121.42ms
tensor(0.0618)
iter 64200: loss 9.2524, time 119.41ms
iter 64210: loss 9.4691, time 119.10ms
iter 64220: loss 9.7350, time 121.34ms
iter 64230: loss 10.3563, time 120.94ms
iter 64240: loss 9.5458, time 122.36ms
step 64250: train loss 7.7185, val loss 7.7139
saving checkpoint to out-shakespeare-char
iter 64250: loss 10.4439, time 2845.86ms
iter 64260: loss 10.3477, time 118.87ms
iter 64270: loss 9.2996, time 120.39ms
iter 64280: loss 9.4205, time 118.84ms
iter 64290: loss 10.8406, time 119.07ms
tensor(0.0476)
iter 64300: loss 9.8345, time 119.65ms
iter 64310: loss 10.5273, time 119.95ms
iter 64320: loss 10.3042, time 118.75ms
iter 64330: loss 9.8346, time 120.63ms
iter 64340: loss 10.3811, time 120.76ms
iter 64350: loss 9.7305, time 121.22ms
iter 64360: loss 9.4675, time 121.27ms
iter 64370: loss 9.9422, time 119.48ms
iter 64380: loss 9.6419, time 121.98ms
iter 64390: loss 10.5944, time 121.97ms
tensor(0.0351)
iter 64400: loss 9.9753, time 122.41ms
iter 64410: loss 9.7175, time 122.78ms
iter 64420: loss 9.8460, time 120.69ms
iter 64430: loss 10.0787, time 122.80ms
iter 64440: loss 10.5400, time 122.78ms
iter 64450: loss 10.3074, time 120.70ms
iter 64460: loss 9.6813, time 120.71ms
iter 64470: loss 9.1433, time 120.91ms
iter 64480: loss 9.4325, time 120.86ms
iter 64490: loss 9.5786, time 120.59ms
tensor(0.0245)
step 64500: train loss 7.7176, val loss 7.6947
saving checkpoint to out-shakespeare-char
iter 64500: loss 9.0338, time 2843.38ms
iter 64510: loss 9.1945, time 118.26ms
iter 64520: loss 9.9557, time 118.90ms
iter 64530: loss 9.2349, time 117.21ms
iter 64540: loss 9.8652, time 119.82ms
iter 64550: loss 9.5555, time 120.46ms
iter 64560: loss 10.1215, time 119.74ms
iter 64570: loss 9.5519, time 120.08ms
iter 64580: loss 9.6203, time 117.61ms
iter 64590: loss 9.7928, time 120.09ms
tensor(0.0157)
iter 64600: loss 9.6601, time 121.14ms
iter 64610: loss 9.1341, time 120.35ms
iter 64620: loss 10.1008, time 120.47ms
iter 64630: loss 9.6907, time 118.58ms
iter 64640: loss 9.9066, time 120.43ms
iter 64650: loss 10.4762, time 120.67ms
iter 64660: loss 9.6790, time 120.95ms
iter 64670: loss 10.1457, time 121.60ms
iter 64680: loss 8.9996, time 121.23ms
iter 64690: loss 9.2879, time 123.30ms
tensor(0.0089)
iter 64700: loss 9.8437, time 121.01ms
iter 64710: loss 9.4721, time 121.28ms
iter 64720: loss 9.9050, time 120.91ms
iter 64730: loss 9.9478, time 120.97ms
iter 64740: loss 9.6478, time 120.56ms
step 64750: train loss 7.7031, val loss 7.6598
saving checkpoint to out-shakespeare-char
iter 64750: loss 9.8713, time 2842.10ms
iter 64760: loss 10.2909, time 119.94ms
iter 64770: loss 9.8734, time 120.67ms
iter 64780: loss 9.4786, time 120.11ms
iter 64790: loss 10.1115, time 119.09ms
tensor(0.0039)
iter 64800: loss 9.9130, time 120.62ms
iter 64810: loss 9.2927, time 122.04ms
iter 64820: loss 9.0310, time 123.20ms
iter 64830: loss 9.2137, time 122.49ms
iter 64840: loss 9.6869, time 122.61ms
iter 64850: loss 9.7268, time 121.05ms
iter 64860: loss 9.5467, time 121.08ms
iter 64870: loss 9.7264, time 121.02ms
iter 64880: loss 9.1061, time 120.44ms
iter 64890: loss 9.7349, time 120.15ms
tensor(0.0010)
iter 64900: loss 9.5661, time 121.48ms
iter 64910: loss 9.9965, time 118.98ms
iter 64920: loss 8.5710, time 119.76ms
iter 64930: loss 9.5903, time 119.98ms
iter 64940: loss 9.4422, time 120.21ms
iter 64950: loss 9.4326, time 119.90ms
iter 64960: loss 9.7732, time 120.17ms
iter 64970: loss 9.0279, time 120.01ms
iter 64980: loss 9.5576, time 121.13ms
iter 64990: loss 9.5818, time 120.92ms
tensor(0.0010)
step 65000: train loss 7.6488, val loss 7.6832
saving checkpoint to out-shakespeare-char
iter 65000: loss 9.2940, time 2830.31ms
iter 65010: loss 10.2779, time 119.34ms
iter 65020: loss 9.9133, time 117.75ms
iter 65030: loss 9.3679, time 121.15ms
iter 65040: loss 9.2503, time 121.58ms
iter 65050: loss 9.0709, time 121.90ms
iter 65060: loss 10.0212, time 121.94ms
iter 65070: loss 9.8759, time 121.62ms
iter 65080: loss 9.4075, time 121.74ms
iter 65090: loss 9.5887, time 120.02ms
tensor(0.0010)
iter 65100: loss 9.6329, time 120.40ms
iter 65110: loss 9.7539, time 120.81ms
iter 65120: loss 10.0207, time 120.65ms
iter 65130: loss 9.0205, time 120.45ms
iter 65140: loss 9.4572, time 122.36ms
iter 65150: loss 9.7617, time 121.63ms
iter 65160: loss 10.7770, time 121.55ms
iter 65170: loss 9.6792, time 120.65ms
iter 65180: loss 10.0837, time 119.42ms
iter 65190: loss 9.8700, time 119.44ms
tensor(0.0039)
iter 65200: loss 10.1747, time 121.39ms
iter 65210: loss 9.7941, time 120.71ms
iter 65220: loss 9.9164, time 122.05ms
iter 65230: loss 9.5865, time 123.13ms
iter 65240: loss 9.4477, time 121.70ms
step 65250: train loss 7.6676, val loss 7.6747
saving checkpoint to out-shakespeare-char
iter 65250: loss 9.3974, time 2846.02ms
iter 65260: loss 9.3543, time 120.68ms
iter 65270: loss 9.6998, time 121.09ms
iter 65280: loss 9.8876, time 122.44ms
iter 65290: loss 10.0175, time 122.99ms
tensor(0.0089)
iter 65300: loss 9.8068, time 122.15ms
iter 65310: loss 10.1255, time 121.61ms
iter 65320: loss 10.0354, time 119.93ms
iter 65330: loss 9.8386, time 120.30ms
iter 65340: loss 9.9505, time 120.45ms
iter 65350: loss 9.9591, time 120.65ms
iter 65360: loss 9.6349, time 121.77ms
iter 65370: loss 9.3413, time 122.16ms
iter 65380: loss 9.5458, time 119.59ms
iter 65390: loss 9.6108, time 119.90ms
tensor(0.0157)
iter 65400: loss 9.5765, time 119.90ms
iter 65410: loss 9.5002, time 120.11ms
iter 65420: loss 10.4111, time 120.86ms
iter 65430: loss 9.2681, time 122.40ms
iter 65440: loss 9.2454, time 121.79ms
iter 65450: loss 9.6008, time 122.98ms
iter 65460: loss 9.9941, time 121.69ms
iter 65470: loss 9.6939, time 121.37ms
iter 65480: loss 9.7928, time 119.74ms
iter 65490: loss 9.6465, time 118.78ms
tensor(0.0245)
step 65500: train loss 7.6922, val loss 7.6546
saving checkpoint to out-shakespeare-char
iter 65500: loss 10.0540, time 2827.57ms
iter 65510: loss 9.3294, time 122.43ms
iter 65520: loss 9.5423, time 123.02ms
iter 65530: loss 9.8753, time 123.10ms
iter 65540: loss 9.6260, time 122.31ms
iter 65550: loss 9.3686, time 119.38ms
iter 65560: loss 10.1114, time 120.00ms
iter 65570: loss 9.4671, time 120.12ms
iter 65580: loss 8.9641, time 121.21ms
iter 65590: loss 9.8933, time 123.05ms
tensor(0.0351)
iter 65600: loss 9.0704, time 123.50ms
iter 65610: loss 9.7534, time 121.68ms
iter 65620: loss 10.0036, time 120.06ms
iter 65630: loss 9.9053, time 120.48ms
iter 65640: loss 9.4308, time 119.78ms
iter 65650: loss 9.9969, time 121.29ms
iter 65660: loss 10.3897, time 122.60ms
iter 65670: loss 10.3634, time 122.86ms
iter 65680: loss 9.7444, time 123.18ms
iter 65690: loss 9.5560, time 121.97ms
tensor(0.0476)
iter 65700: loss 10.0434, time 120.19ms
iter 65710: loss 9.8500, time 119.71ms
iter 65720: loss 8.7552, time 120.05ms
iter 65730: loss 9.5274, time 120.75ms
iter 65740: loss 8.9048, time 121.23ms
step 65750: train loss 7.6959, val loss 7.7025
saving checkpoint to out-shakespeare-char
iter 65750: loss 9.9117, time 2862.43ms
iter 65760: loss 9.5924, time 119.91ms
iter 65770: loss 10.1318, time 119.32ms
iter 65780: loss 9.0899, time 119.14ms
iter 65790: loss 9.9470, time 119.07ms
tensor(0.0618)
iter 65800: loss 10.0406, time 119.96ms
iter 65810: loss 10.1785, time 120.26ms
iter 65820: loss 9.7501, time 120.58ms
iter 65830: loss 9.6712, time 120.27ms
iter 65840: loss 9.2681, time 121.90ms
iter 65850: loss 9.2028, time 122.45ms
iter 65860: loss 9.6465, time 122.55ms
iter 65870: loss 9.4479, time 122.38ms
iter 65880: loss 9.1776, time 120.78ms
iter 65890: loss 9.6429, time 121.58ms
tensor(0.0778)
iter 65900: loss 9.2154, time 121.42ms
iter 65910: loss 9.7354, time 121.18ms
iter 65920: loss 10.3858, time 120.22ms
iter 65930: loss 9.8721, time 119.02ms
iter 65940: loss 9.2788, time 119.05ms
iter 65950: loss 9.9307, time 118.91ms
iter 65960: loss 9.4669, time 119.29ms
iter 65970: loss 9.8011, time 119.38ms
iter 65980: loss 8.9012, time 119.61ms
iter 65990: loss 9.4326, time 119.07ms
tensor(0.0955)
step 66000: train loss 7.7408, val loss 7.7444
saving checkpoint to out-shakespeare-char
iter 66000: loss 9.0453, time 2851.14ms
iter 66010: loss 9.0856, time 122.50ms
iter 66020: loss 8.9335, time 122.72ms
iter 66030: loss 9.1735, time 121.15ms
iter 66040: loss 8.8823, time 119.01ms
iter 66050: loss 8.6995, time 121.29ms
iter 66060: loss 8.9162, time 119.06ms
iter 66070: loss 9.0797, time 119.11ms
iter 66080: loss 9.4175, time 119.11ms
iter 66090: loss 9.3188, time 119.12ms
tensor(0.1147)
iter 66100: loss 9.9158, time 119.20ms
iter 66110: loss 9.4013, time 119.90ms
iter 66120: loss 8.4796, time 120.36ms
iter 66130: loss 9.4877, time 120.67ms
iter 66140: loss 9.3771, time 120.81ms
iter 66150: loss 9.3540, time 120.46ms
iter 66160: loss 9.7840, time 122.47ms
iter 66170: loss 9.0182, time 122.19ms
iter 66180: loss 8.8812, time 122.46ms
iter 66190: loss 9.0390, time 122.52ms
tensor(0.1355)
iter 66200: loss 8.7295, time 121.94ms
iter 66210: loss 9.0715, time 121.61ms
iter 66220: loss 9.1985, time 121.06ms
iter 66230: loss 9.5221, time 121.17ms
iter 66240: loss 9.0471, time 119.27ms
step 66250: train loss 7.7851, val loss 7.7901
saving checkpoint to out-shakespeare-char
iter 66250: loss 9.4068, time 2846.47ms
iter 66260: loss 9.2615, time 119.33ms
iter 66270: loss 9.4236, time 120.77ms
iter 66280: loss 9.0096, time 120.87ms
iter 66290: loss 8.9981, time 121.71ms
tensor(0.1577)
iter 66300: loss 9.0333, time 122.57ms
iter 66310: loss 8.8892, time 120.91ms
iter 66320: loss 8.6984, time 122.79ms
iter 66330: loss 8.9004, time 122.66ms
iter 66340: loss 9.6603, time 122.40ms
iter 66350: loss 9.7285, time 122.26ms
iter 66360: loss 8.8126, time 121.10ms
iter 66370: loss 9.3320, time 121.11ms
iter 66380: loss 8.5244, time 119.33ms
iter 66390: loss 9.2989, time 119.08ms
tensor(0.1813)
iter 66400: loss 9.1261, time 119.30ms
iter 66410: loss 8.7535, time 118.47ms
iter 66420: loss 9.2430, time 117.56ms
iter 66430: loss 9.0081, time 117.99ms
iter 66440: loss 9.0713, time 117.88ms
iter 66450: loss 9.4949, time 119.38ms
iter 66460: loss 9.1841, time 117.94ms
iter 66470: loss 9.0190, time 118.00ms
iter 66480: loss 8.8314, time 119.08ms
iter 66490: loss 9.2105, time 117.79ms
tensor(0.2061)
step 66500: train loss 7.9171, val loss 7.8614
saving checkpoint to out-shakespeare-char
iter 66500: loss 9.2971, time 2850.57ms
iter 66510: loss 8.7589, time 120.35ms
iter 66520: loss 8.9029, time 121.33ms
iter 66530: loss 8.1270, time 122.41ms
iter 66540: loss 8.5663, time 122.55ms
iter 66550: loss 8.9986, time 122.20ms
iter 66560: loss 8.7112, time 120.14ms
iter 66570: loss 8.3619, time 121.17ms
iter 66580: loss 8.8504, time 121.04ms
iter 66590: loss 9.1532, time 121.27ms
tensor(0.2321)
iter 66600: loss 8.7820, time 119.28ms
iter 66610: loss 8.2435, time 119.13ms
iter 66620: loss 8.6120, time 119.09ms
iter 66630: loss 9.3693, time 118.79ms
iter 66640: loss 8.4057, time 118.07ms
iter 66650: loss 8.4089, time 119.45ms
iter 66660: loss 9.2484, time 119.43ms
iter 66670: loss 9.2718, time 118.95ms
iter 66680: loss 8.0867, time 120.49ms
iter 66690: loss 8.0582, time 120.46ms
tensor(0.2591)
iter 66700: loss 8.7801, time 121.28ms
iter 66710: loss 8.1255, time 121.26ms
iter 66720: loss 8.1795, time 120.85ms
iter 66730: loss 8.9857, time 122.42ms
iter 66740: loss 9.0232, time 123.14ms
step 66750: train loss 7.8162, val loss 7.8374
saving checkpoint to out-shakespeare-char
iter 66750: loss 8.4978, time 2855.98ms
iter 66760: loss 8.7654, time 119.13ms
iter 66770: loss 9.2487, time 119.17ms
iter 66780: loss 9.2296, time 119.10ms
iter 66790: loss 9.2340, time 120.42ms
tensor(0.2871)
iter 66800: loss 9.2859, time 120.14ms
iter 66810: loss 9.0374, time 120.67ms
iter 66820: loss 9.4378, time 121.41ms
iter 66830: loss 9.7834, time 121.24ms
iter 66840: loss 9.2829, time 122.37ms
iter 66850: loss 9.6589, time 122.39ms
iter 66860: loss 9.7186, time 122.33ms
iter 66870: loss 10.2361, time 121.08ms
iter 66880: loss 9.0682, time 121.13ms
iter 66890: loss 10.4385, time 119.13ms
tensor(0.3159)
iter 66900: loss 10.0637, time 119.68ms
iter 66910: loss 10.4584, time 118.93ms
iter 66920: loss 9.7896, time 119.32ms
iter 66930: loss 10.4457, time 119.01ms
iter 66940: loss 10.4459, time 119.19ms
iter 66950: loss 10.5229, time 120.26ms
iter 66960: loss 10.5378, time 120.15ms
iter 66970: loss 10.2306, time 121.24ms
iter 66980: loss 10.0152, time 121.14ms
iter 66990: loss 10.5001, time 121.91ms
tensor(0.3455)
step 67000: train loss 7.8839, val loss 7.8978
saving checkpoint to out-shakespeare-char
iter 67000: loss 10.3288, time 2850.75ms
iter 67010: loss 10.3439, time 121.17ms
iter 67020: loss 10.4133, time 118.96ms
iter 67030: loss 10.1722, time 118.83ms
iter 67040: loss 10.1710, time 119.02ms
iter 67050: loss 10.1761, time 118.83ms
iter 67060: loss 10.1402, time 119.52ms
iter 67070: loss 9.7458, time 118.87ms
iter 67080: loss 10.1526, time 119.68ms
iter 67090: loss 10.0951, time 120.13ms
tensor(0.3757)
iter 67100: loss 9.0638, time 121.35ms
iter 67110: loss 9.8339, time 121.99ms
iter 67120: loss 9.2899, time 120.05ms
iter 67130: loss 9.3212, time 122.37ms
iter 67140: loss 9.8632, time 122.46ms
iter 67150: loss 8.8672, time 122.63ms
iter 67160: loss 8.7371, time 122.45ms
iter 67170: loss 9.5039, time 120.11ms
iter 67180: loss 9.3455, time 121.10ms
iter 67190: loss 9.6860, time 121.25ms
tensor(0.4063)
iter 67200: loss 9.2284, time 121.27ms
iter 67210: loss 9.4596, time 118.96ms
iter 67220: loss 9.8092, time 119.14ms
iter 67230: loss 9.0424, time 118.89ms
iter 67240: loss 9.3407, time 118.91ms
step 67250: train loss 7.9222, val loss 7.9100
saving checkpoint to out-shakespeare-char
iter 67250: loss 8.7401, time 2862.53ms
iter 67260: loss 9.2797, time 122.35ms
iter 67270: loss 8.4313, time 122.58ms
iter 67280: loss 9.1364, time 119.22ms
iter 67290: loss 8.8074, time 121.02ms
tensor(0.4373)
iter 67300: loss 8.8261, time 120.97ms
iter 67310: loss 8.5910, time 118.87ms
iter 67320: loss 9.3099, time 118.58ms
iter 67330: loss 8.6004, time 119.03ms
iter 67340: loss 9.3090, time 118.81ms
iter 67350: loss 8.7933, time 118.63ms
iter 67360: loss 8.8502, time 119.29ms
iter 67370: loss 8.3319, time 118.97ms
iter 67380: loss 8.5404, time 119.49ms
iter 67390: loss 9.4088, time 118.97ms
tensor(0.4686)
iter 67400: loss 8.4865, time 120.48ms
iter 67410: loss 8.8898, time 121.10ms
iter 67420: loss 8.4389, time 121.01ms
iter 67430: loss 8.4237, time 121.55ms
iter 67440: loss 8.4037, time 121.15ms
iter 67450: loss 8.3884, time 122.42ms
iter 67460: loss 9.1069, time 122.26ms
iter 67470: loss 8.9882, time 122.32ms
iter 67480: loss 8.7285, time 122.65ms
iter 67490: loss 8.3267, time 121.06ms
tensor(0.5000)
step 67500: train loss 7.9428, val loss 7.9620
saving checkpoint to out-shakespeare-char
iter 67500: loss 8.6391, time 2856.87ms
iter 67510: loss 8.1454, time 120.35ms
iter 67520: loss 8.8320, time 121.58ms
iter 67530: loss 8.5442, time 123.59ms
iter 67540: loss 8.3952, time 122.93ms
iter 67550: loss 8.4204, time 121.35ms
iter 67560: loss 8.7539, time 122.00ms
iter 67570: loss 9.1037, time 119.11ms
iter 67580: loss 8.9471, time 119.26ms
iter 67590: loss 8.5064, time 119.16ms
tensor(0.5314)
iter 67600: loss 8.6365, time 119.39ms
iter 67610: loss 8.5591, time 119.16ms
iter 67620: loss 8.4112, time 120.21ms
iter 67630: loss 8.1943, time 120.12ms
iter 67640: loss 7.6189, time 120.60ms
iter 67650: loss 8.5916, time 121.05ms
iter 67660: loss 8.5387, time 122.23ms
iter 67670: loss 8.5903, time 122.48ms
iter 67680: loss 8.2000, time 120.15ms
iter 67690: loss 8.6804, time 121.40ms
tensor(0.5627)
iter 67700: loss 8.7369, time 122.65ms
iter 67710: loss 9.1895, time 122.35ms
iter 67720: loss 8.2389, time 121.11ms
iter 67730: loss 8.6484, time 119.19ms
iter 67740: loss 8.3202, time 121.75ms
step 67750: train loss 7.9142, val loss 7.9143
saving checkpoint to out-shakespeare-char
iter 67750: loss 8.0627, time 2854.26ms
iter 67760: loss 8.4569, time 123.78ms
iter 67770: loss 8.2812, time 123.80ms
iter 67780: loss 7.8720, time 121.06ms
iter 67790: loss 9.3440, time 118.94ms
tensor(0.5937)
iter 67800: loss 8.5405, time 118.42ms
iter 67810: loss 8.0467, time 118.90ms
iter 67820: loss 8.1761, time 119.05ms
iter 67830: loss 8.1057, time 120.64ms
iter 67840: loss 8.5982, time 124.55ms
iter 67850: loss 7.7252, time 117.32ms
iter 67860: loss 7.9966, time 115.71ms
iter 67870: loss 8.6868, time 117.16ms
iter 67880: loss 8.9527, time 118.25ms
iter 67890: loss 8.4900, time 115.08ms
tensor(0.6243)
iter 67900: loss 7.8315, time 117.34ms
iter 67910: loss 7.8243, time 115.31ms
iter 67920: loss 8.1566, time 113.88ms
iter 67930: loss 7.9352, time 118.46ms
iter 67940: loss 7.5040, time 116.30ms
iter 67950: loss 7.9579, time 115.05ms
iter 67960: loss 8.0108, time 118.22ms
iter 67970: loss 9.0733, time 114.61ms
iter 67980: loss 8.3290, time 116.88ms
iter 67990: loss 8.8184, time 117.61ms
tensor(0.6545)
step 68000: train loss 7.9866, val loss 7.9969
saving checkpoint to out-shakespeare-char
iter 68000: loss 7.8332, time 2861.57ms
iter 68010: loss 8.6268, time 121.95ms
iter 68020: loss 8.1752, time 122.05ms
iter 68030: loss 8.1011, time 119.82ms
iter 68040: loss 8.5852, time 121.62ms
iter 68050: loss 8.4487, time 122.09ms
iter 68060: loss 8.6098, time 122.10ms
iter 68070: loss 8.1853, time 121.70ms
iter 68080: loss 8.3680, time 122.08ms
iter 68090: loss 8.2113, time 121.05ms
tensor(0.6841)
iter 68100: loss 8.2239, time 122.32ms
iter 68110: loss 8.0704, time 120.53ms
iter 68120: loss 8.4985, time 120.74ms
iter 68130: loss 8.3038, time 121.55ms
iter 68140: loss 8.1687, time 122.41ms
iter 68150: loss 8.4530, time 120.92ms
iter 68160: loss 8.5652, time 118.92ms
iter 68170: loss 8.7199, time 119.03ms
iter 68180: loss 8.0842, time 118.90ms
iter 68190: loss 8.5952, time 119.18ms
tensor(0.7129)
iter 68200: loss 8.4972, time 119.40ms
iter 68210: loss 7.6241, time 119.47ms
iter 68220: loss 8.1354, time 119.91ms
iter 68230: loss 8.4471, time 120.15ms
iter 68240: loss 8.2405, time 120.82ms
step 68250: train loss 8.0157, val loss 7.9577
saving checkpoint to out-shakespeare-char
iter 68250: loss 8.2460, time 2854.42ms
iter 68260: loss 8.3425, time 119.42ms
iter 68270: loss 8.1026, time 119.92ms
iter 68280: loss 9.0400, time 119.99ms
iter 68290: loss 8.8659, time 120.08ms
tensor(0.7409)
iter 68300: loss 8.7172, time 122.06ms
iter 68310: loss 7.9780, time 122.43ms
iter 68320: loss 8.5934, time 122.19ms
iter 68330: loss 8.3691, time 122.84ms
iter 68340: loss 8.1840, time 121.37ms
iter 68350: loss 7.6817, time 121.23ms
iter 68360: loss 8.3562, time 118.86ms
iter 68370: loss 8.0710, time 118.98ms
iter 68380: loss 8.2600, time 119.10ms
iter 68390: loss 8.2085, time 118.91ms
tensor(0.7679)
iter 68400: loss 8.5272, time 119.26ms
iter 68410: loss 8.5890, time 120.28ms
iter 68420: loss 8.6627, time 120.36ms
iter 68430: loss 8.5584, time 120.12ms
iter 68440: loss 8.4793, time 119.27ms
iter 68450: loss 7.9887, time 120.44ms
iter 68460: loss 8.5363, time 121.17ms
iter 68470: loss 8.1378, time 121.40ms
iter 68480: loss 8.4593, time 120.82ms
iter 68490: loss 36.2771, time 122.32ms
tensor(0.7939)
step 68500: train loss 14.6263, val loss 14.7235
saving checkpoint to out-shakespeare-char
iter 68500: loss 17.8347, time 2849.23ms
iter 68510: loss 16.4187, time 118.95ms
iter 68520: loss 22.7011, time 120.13ms
iter 68530: loss 24.4049, time 120.42ms
iter 68540: loss 29.9787, time 120.25ms
iter 68550: loss 32.6901, time 119.22ms
iter 68560: loss 92.5338, time 120.12ms
iter 68570: loss 60.0202, time 120.84ms
iter 68580: loss 35.8193, time 120.61ms
iter 68590: loss 23.5436, time 120.12ms
tensor(0.8187)
iter 68600: loss 52.4890, time 121.59ms
iter 68610: loss 90.9457, time 122.40ms
iter 68620: loss 39.6645, time 122.64ms
iter 68630: loss 38.4359, time 120.82ms
iter 68640: loss 35.2816, time 119.00ms
iter 68650: loss 34.1880, time 121.94ms
iter 68660: loss 28.0631, time 121.65ms
iter 68670: loss 139.9118, time 118.61ms
iter 68680: loss 80.8711, time 119.87ms
iter 68690: loss 60.0119, time 120.98ms
tensor(0.8423)
iter 68700: loss 32.4278, time 119.73ms
iter 68710: loss 28.6819, time 120.20ms
iter 68720: loss 36.7826, time 120.38ms
iter 68730: loss 30.3087, time 120.83ms
iter 68740: loss 27.7991, time 121.67ms
step 68750: train loss 33.5407, val loss 33.5207
saving checkpoint to out-shakespeare-char
iter 68750: loss 38.8205, time 2855.49ms
iter 68760: loss 51.0648, time 119.81ms
iter 68770: loss 38.0504, time 120.69ms
iter 68780: loss 32.0853, time 120.81ms
iter 68790: loss 28.3243, time 120.69ms
tensor(0.8645)
iter 68800: loss 31.5712, time 121.02ms
iter 68810: loss 30.8963, time 120.68ms
iter 68820: loss 31.8034, time 121.30ms
iter 68830: loss 31.6808, time 121.95ms
iter 68840: loss 34.6682, time 121.39ms
iter 68850: loss 37.2893, time 121.27ms
iter 68860: loss 41.2174, time 121.55ms
iter 68870: loss 44.0344, time 119.66ms
iter 68880: loss 47.1746, time 119.62ms
iter 68890: loss 41.1659, time 120.01ms
tensor(0.8853)
iter 68900: loss 55.3875, time 120.86ms
iter 68910: loss 61.8628, time 120.63ms
iter 68920: loss 63.6059, time 121.43ms
iter 68930: loss 78.8234, time 122.15ms
iter 68940: loss 83.0596, time 120.39ms
iter 68950: loss 85.8757, time 121.55ms
iter 68960: loss 79.0977, time 121.73ms
iter 68970: loss 110.4611, time 121.12ms
iter 68980: loss 121.3694, time 119.18ms
iter 68990: loss 139.9130, time 119.50ms
tensor(0.9045)
step 69000: train loss 23.4502, val loss 23.4137
saving checkpoint to out-shakespeare-char
iter 69000: loss 116.1297, time 2856.07ms
iter 69010: loss 82.3947, time 121.51ms
iter 69020: loss 98.6694, time 121.51ms
iter 69030: loss 113.5239, time 121.38ms
iter 69040: loss 103.5054, time 119.35ms
iter 69050: loss 82.4228, time 119.99ms
iter 69060: loss 88.2874, time 119.24ms
iter 69070: loss 105.5883, time 120.37ms
iter 69080: loss 102.3392, time 120.85ms
iter 69090: loss 100.0074, time 120.60ms
tensor(0.9222)
iter 69100: loss 61.1118, time 122.06ms
iter 69110: loss 78.9556, time 122.10ms
iter 69120: loss 81.9568, time 121.49ms
iter 69130: loss 108.9558, time 121.50ms
iter 69140: loss 95.8094, time 121.45ms
iter 69150: loss 118.1265, time 121.33ms
iter 69160: loss 114.2565, time 119.12ms
iter 69170: loss 116.4135, time 119.87ms
iter 69180: loss 81.0231, time 120.47ms
iter 69190: loss 72.2068, time 120.35ms
tensor(0.9382)
iter 69200: loss 112.3469, time 120.81ms
iter 69210: loss 86.2050, time 121.14ms
iter 69220: loss 115.2901, time 122.51ms
iter 69230: loss 99.3555, time 122.49ms
iter 69240: loss 108.8843, time 119.35ms
step 69250: train loss 56.8752, val loss 57.2901
saving checkpoint to out-shakespeare-char
iter 69250: loss 132.5572, time 2853.23ms
iter 69260: loss 87.2823, time 120.16ms
iter 69270: loss 84.5244, time 120.79ms
iter 69280: loss 123.0889, time 120.63ms
iter 69290: loss 135.2811, time 121.89ms
tensor(0.9524)
iter 69300: loss 94.1561, time 120.52ms
iter 69310: loss 91.8450, time 122.66ms
iter 69320: loss 124.5056, time 121.22ms
iter 69330: loss 94.7022, time 121.22ms
iter 69340: loss 108.7156, time 121.95ms
iter 69350: loss 88.8959, time 119.41ms
iter 69360: loss 80.8549, time 119.39ms
iter 69370: loss 100.9528, time 120.38ms
iter 69380: loss 129.2009, time 120.62ms
iter 69390: loss 123.7298, time 120.46ms
tensor(0.9649)
iter 69400: loss 111.5382, time 120.63ms
iter 69410: loss 91.8950, time 120.02ms
iter 69420: loss 134.1906, time 121.57ms
iter 69430: loss 95.5836, time 122.77ms
iter 69440: loss 100.6290, time 121.52ms
iter 69450: loss 100.2796, time 121.28ms
iter 69460: loss 95.7229, time 121.13ms
iter 69470: loss 103.7463, time 121.30ms
iter 69480: loss 91.9316, time 119.35ms
iter 69490: loss 113.2069, time 119.86ms
tensor(0.9755)
step 69500: train loss 45.7536, val loss 44.8674
saving checkpoint to out-shakespeare-char
iter 69500: loss 108.6274, time 2852.68ms
iter 69510: loss 106.8669, time 115.92ms
iter 69520: loss 105.5343, time 116.74ms
iter 69530: loss 122.6578, time 118.12ms
iter 69540: loss 124.2563, time 115.85ms
iter 69550: loss 78.8698, time 116.81ms
iter 69560: loss 120.2215, time 118.24ms
iter 69570: loss 92.8565, time 115.92ms
iter 69580: loss 105.7257, time 116.79ms
iter 69590: loss 124.6147, time 116.45ms
tensor(0.9843)
iter 69600: loss 129.4630, time 116.31ms
iter 69610: loss 99.1152, time 116.90ms
iter 69620: loss 105.9112, time 116.24ms
iter 69630: loss 108.6654, time 114.83ms
iter 69640: loss 97.4273, time 116.83ms
iter 69650: loss 102.7559, time 115.85ms
iter 69660: loss 96.5314, time 116.77ms
iter 69670: loss 87.5666, time 118.01ms
iter 69680: loss 72.9864, time 115.89ms
iter 69690: loss 89.7556, time 114.57ms
tensor(0.9911)
iter 69700: loss 94.6187, time 117.37ms
iter 69710: loss 68.3972, time 116.11ms
iter 69720: loss 94.5381, time 116.72ms
iter 69730: loss 90.8066, time 116.31ms
iter 69740: loss 106.1395, time 115.34ms
step 69750: train loss 49.6339, val loss 49.9959
saving checkpoint to out-shakespeare-char
iter 69750: loss 92.3966, time 2856.13ms
iter 69760: loss 84.7668, time 116.68ms
iter 69770: loss 93.8054, time 116.34ms
iter 69780: loss 99.3209, time 114.93ms
iter 69790: loss 98.9774, time 116.95ms
tensor(0.9961)
iter 69800: loss 113.2715, time 116.40ms
iter 69810: loss 106.5080, time 116.08ms
iter 69820: loss 89.4639, time 118.00ms
iter 69830: loss 97.6803, time 115.71ms
iter 69840: loss 111.6564, time 116.76ms
iter 69850: loss 97.7863, time 117.86ms
iter 69860: loss 104.2598, time 115.95ms
iter 69870: loss 72.0756, time 116.79ms
iter 69880: loss 88.9379, time 116.44ms
iter 69890: loss 83.3621, time 114.95ms
tensor(0.9990)
iter 69900: loss 101.0251, time 117.39ms
iter 69910: loss 101.0271, time 115.70ms
iter 69920: loss 100.3729, time 116.72ms
iter 69930: loss 98.9326, time 117.97ms
iter 69940: loss 80.0668, time 115.86ms
iter 69950: loss 74.3372, time 114.62ms
iter 69960: loss 106.0119, time 116.68ms
iter 69970: loss 85.2056, time 115.47ms
iter 69980: loss 103.1215, time 116.80ms
iter 69990: loss 90.7482, time 116.13ms
tensor(1.)
step 70000: train loss 49.4024, val loss 49.6071
saving checkpoint to out-shakespeare-char
iter 70000: loss 85.4462, time 2835.24ms
iter 70010: loss 102.7719, time 116.82ms
iter 70020: loss 76.5180, time 116.72ms
iter 70030: loss 104.4173, time 115.34ms
iter 70040: loss 102.7992, time 116.21ms
iter 70050: loss 95.2832, time 116.05ms
iter 70060: loss 92.0527, time 114.72ms
iter 70070: loss 102.3143, time 118.42ms
iter 70080: loss 100.8585, time 115.82ms
iter 70090: loss 128.0670, time 115.00ms
tensor(0.9990)
iter 70100: loss 88.2232, time 118.50ms
iter 70110: loss 92.8470, time 116.14ms
iter 70120: loss 86.7943, time 116.78ms
iter 70130: loss 68.5216, time 116.81ms
iter 70140: loss 64.3336, time 115.24ms
iter 70150: loss 63.7478, time 114.67ms
iter 70160: loss 61.8657, time 116.02ms
iter 70170: loss 55.5822, time 114.79ms
iter 70180: loss 66.1537, time 118.04ms
iter 70190: loss 51.0612, time 115.84ms
tensor(0.9961)
iter 70200: loss 63.0677, time 117.15ms
iter 70210: loss 59.8904, time 115.83ms
iter 70220: loss 67.3812, time 115.53ms
iter 70230: loss 74.9964, time 118.04ms
iter 70240: loss 63.7513, time 116.83ms
step 70250: train loss 19.2305, val loss 19.6801
saving checkpoint to out-shakespeare-char
iter 70250: loss 74.2976, time 2842.99ms
iter 70260: loss 78.4223, time 115.24ms
iter 70270: loss 84.2344, time 116.67ms
iter 70280: loss 88.8997, time 115.78ms
iter 70290: loss 67.1179, time 115.28ms
tensor(0.9911)
iter 70300: loss 80.9230, time 118.54ms
iter 70310: loss 87.8986, time 115.83ms
iter 70320: loss 116.3565, time 116.82ms
iter 70330: loss 128.5630, time 117.63ms
iter 70340: loss 119.6918, time 116.07ms
iter 70350: loss 140.4559, time 114.66ms
iter 70360: loss 161.8574, time 116.90ms
iter 70370: loss 130.2859, time 115.36ms
iter 70380: loss 127.9566, time 116.78ms
iter 70390: loss 103.4532, time 116.28ms
tensor(0.9843)
iter 70400: loss 117.8006, time 115.07ms
iter 70410: loss 114.8136, time 114.67ms
iter 70420: loss 87.6881, time 115.84ms
iter 70430: loss 115.3156, time 116.84ms
iter 70440: loss 121.2760, time 118.14ms
iter 70450: loss 139.2411, time 115.86ms
iter 70460: loss 150.1514, time 117.23ms
iter 70470: loss 108.0637, time 116.53ms
iter 70480: loss 106.8276, time 115.93ms
iter 70490: loss 110.1378, time 116.83ms
tensor(0.9755)
step 70500: train loss 19.6415, val loss 19.4744
saving checkpoint to out-shakespeare-char
iter 70500: loss 109.1536, time 2843.11ms
iter 70510: loss 121.1143, time 115.71ms
iter 70520: loss 154.1194, time 116.75ms
iter 70530: loss 140.6924, time 118.05ms
iter 70540: loss 119.7610, time 115.83ms
iter 70550: loss 165.1649, time 114.76ms
iter 70560: loss 138.6172, time 117.79ms
iter 70570: loss 139.2528, time 115.92ms
iter 70580: loss 148.9342, time 116.85ms
iter 70590: loss 137.6261, time 116.71ms
tensor(0.9649)
iter 70600: loss 110.1941, time 116.02ms
iter 70610: loss 114.8900, time 114.59ms
iter 70620: loss 125.3505, time 115.65ms
iter 70630: loss 129.7484, time 116.57ms
iter 70640: loss 133.3895, time 117.39ms
iter 70650: loss 108.5442, time 115.82ms
iter 70660: loss 142.9933, time 116.62ms
iter 70670: loss 130.8207, time 115.76ms
iter 70680: loss 156.7754, time 114.95ms
iter 70690: loss 109.2663, time 116.64ms
tensor(0.9524)
iter 70700: loss 101.9831, time 116.39ms
iter 70710: loss 122.5168, time 117.04ms
iter 70720: loss 118.3966, time 118.17ms
iter 70730: loss 103.1873, time 115.81ms
iter 70740: loss 116.8913, time 116.63ms
step 70750: train loss 48.5323, val loss 48.1963
saving checkpoint to out-shakespeare-char
iter 70750: loss 121.3295, time 2835.79ms
iter 70760: loss 124.1555, time 115.82ms
iter 70770: loss 122.8830, time 116.38ms
iter 70780: loss 119.0872, time 118.01ms
iter 70790: loss 105.3092, time 116.23ms
tensor(0.9382)
iter 70800: loss 80.1554, time 117.13ms
iter 70810: loss 130.7276, time 115.70ms
iter 70820: loss 121.0027, time 115.10ms
iter 70830: loss 96.3496, time 116.60ms
iter 70840: loss 96.8066, time 115.82ms
iter 70850: loss 114.4221, time 115.19ms
iter 70860: loss 102.8318, time 118.63ms
iter 70870: loss 103.8206, time 115.77ms
iter 70880: loss 126.2939, time 116.57ms
iter 70890: loss 129.3776, time 115.53ms
tensor(0.9222)
iter 70900: loss 102.5064, time 116.32ms
iter 70910: loss 109.4401, time 116.53ms
iter 70920: loss 113.4856, time 115.66ms
iter 70930: loss 121.4435, time 115.88ms
iter 70940: loss 108.2800, time 117.98ms
iter 70950: loss 103.9941, time 114.69ms
iter 70960: loss 129.0208, time 116.80ms
iter 70970: loss 112.7729, time 117.24ms
iter 70980: loss 91.6095, time 115.56ms
iter 70990: loss 106.8917, time 117.28ms
tensor(0.9045)
step 71000: train loss 47.6307, val loss 47.6186
saving checkpoint to out-shakespeare-char
iter 71000: loss 103.3960, time 2854.24ms
iter 71010: loss 122.8054, time 116.33ms
iter 71020: loss 102.5763, time 115.60ms
iter 71030: loss 106.4108, time 116.34ms
iter 71040: loss 99.0281, time 115.80ms
iter 71050: loss 83.4356, time 115.61ms
iter 71060: loss 101.6978, time 118.02ms
iter 71070: loss 106.4544, time 115.95ms
iter 71080: loss 84.9514, time 116.50ms
iter 71090: loss 96.6287, time 115.55ms
tensor(0.8853)
iter 71100: loss 90.0450, time 115.55ms
iter 71110: loss 89.6354, time 117.05ms
iter 71120: loss 97.3284, time 115.81ms
iter 71130: loss 81.3901, time 116.28ms
iter 71140: loss 98.9491, time 117.96ms
iter 71150: loss 96.8238, time 114.77ms
iter 71160: loss 105.7957, time 116.68ms
iter 71170: loss 83.4420, time 116.64ms
iter 71180: loss 79.4318, time 115.35ms
iter 71190: loss 79.7931, time 116.74ms
tensor(0.8645)
iter 71200: loss 93.7399, time 116.03ms
iter 71210: loss 100.7426, time 116.75ms
iter 71220: loss 77.1103, time 117.34ms
iter 71230: loss 78.9074, time 114.75ms
iter 71240: loss 87.2890, time 116.75ms
step 71250: train loss 37.3123, val loss 36.8630
saving checkpoint to out-shakespeare-char
iter 71250: loss 84.6215, time 2818.55ms
iter 71260: loss 70.9503, time 115.53ms
iter 71270: loss 89.9859, time 116.86ms
iter 71280: loss 96.6867, time 115.50ms
iter 71290: loss 86.7736, time 116.76ms
tensor(0.8423)
iter 71300: loss 88.5969, time 118.35ms
iter 71310: loss 77.1327, time 115.87ms
iter 71320: loss 76.7158, time 116.74ms
iter 71330: loss 75.1228, time 117.00ms
iter 71340: loss 86.9541, time 115.53ms
iter 71350: loss 69.0821, time 116.77ms
iter 71360: loss 92.7198, time 115.97ms
iter 71370: loss 71.9452, time 115.32ms
iter 71380: loss 78.7367, time 117.28ms
iter 71390: loss 69.0492, time 115.71ms
tensor(0.8187)
iter 71400: loss 86.4142, time 114.97ms
iter 71410: loss 69.1728, time 118.07ms
iter 71420: loss 72.2907, time 115.54ms
iter 71430: loss 66.9722, time 116.73ms
iter 71440: loss 77.5376, time 116.40ms
iter 71450: loss 74.3031, time 115.36ms
iter 71460: loss 77.2071, time 114.45ms
iter 71470: loss 59.3960, time 115.85ms
iter 71480: loss 84.2236, time 115.37ms
iter 71490: loss 85.5042, time 117.90ms
tensor(0.7939)
step 71500: train loss 33.1952, val loss 33.0984
saving checkpoint to out-shakespeare-char
iter 71500: loss 85.8089, time 2837.13ms
iter 71510: loss 89.1904, time 116.11ms
iter 71520: loss 69.5359, time 116.56ms
iter 71530: loss 106.7576, time 116.57ms
iter 71540: loss 81.8773, time 114.68ms
iter 71550: loss 75.4676, time 116.48ms
iter 71560: loss 74.1476, time 115.65ms
iter 71570: loss 78.3212, time 116.64ms
iter 71580: loss 81.4810, time 116.40ms
iter 71590: loss 79.4960, time 115.08ms
tensor(0.7679)
iter 71600: loss 76.0683, time 115.33ms
iter 71610: loss 75.8655, time 115.50ms
iter 71620: loss 73.0288, time 116.65ms
iter 71630: loss 71.0466, time 118.24ms
iter 71640: loss 67.1680, time 115.78ms
iter 71650: loss 65.1359, time 115.91ms
iter 71660: loss 53.5039, time 115.73ms
iter 71670: loss 55.5246, time 115.41ms
iter 71680: loss 59.7525, time 116.68ms
iter 71690: loss 65.9620, time 116.11ms
tensor(0.7409)
iter 71700: loss 62.1784, time 116.72ms
iter 71710: loss 64.9026, time 117.76ms
iter 71720: loss 67.6081, time 114.98ms
iter 71730: loss 65.5124, time 117.12ms
iter 71740: loss 69.0516, time 114.51ms
step 71750: train loss 13.1102, val loss 12.9899
saving checkpoint to out-shakespeare-char
iter 71750: loss 51.5575, time 2842.86ms
iter 71760: loss 66.6208, time 117.47ms
iter 71770: loss 66.9084, time 117.24ms
iter 71780: loss 70.3607, time 116.97ms
iter 71790: loss 73.7614, time 113.75ms
tensor(0.7129)
iter 71800: loss 71.1720, time 116.64ms
iter 71810: loss 58.8919, time 116.23ms
iter 71820: loss 55.5689, time 115.35ms
iter 71830: loss 70.8092, time 117.99ms
iter 71840: loss 69.3025, time 115.28ms
iter 71850: loss 78.8005, time 116.23ms
iter 71860: loss 76.5577, time 116.29ms
iter 71870: loss 82.8160, time 113.84ms
iter 71880: loss 65.9786, time 116.94ms
iter 71890: loss 64.3472, time 117.22ms
tensor(0.6841)
iter 71900: loss 81.5546, time 114.47ms
iter 71910: loss 67.4369, time 118.30ms
iter 71920: loss 80.8377, time 115.87ms
iter 71930: loss 66.5661, time 114.87ms
iter 71940: loss 67.5831, time 117.16ms
iter 71950: loss 78.1298, time 116.43ms
iter 71960: loss 61.9690, time 114.11ms
iter 71970: loss 63.3876, time 118.19ms
iter 71980: loss 64.0852, time 115.24ms
iter 71990: loss 61.6743, time 117.63ms
tensor(0.6545)
step 72000: train loss 11.1750, val loss 11.2280
saving checkpoint to out-shakespeare-char
iter 72000: loss 64.2319, time 2842.02ms
iter 72010: loss 59.2636, time 116.54ms
iter 72020: loss 78.1547, time 115.07ms
iter 72030: loss 63.5610, time 118.20ms
iter 72040: loss 64.3114, time 115.79ms
iter 72050: loss 68.1263, time 115.84ms
iter 72060: loss 79.2499, time 115.87ms
iter 72070: loss 75.5614, time 114.59ms
iter 72080: loss 71.1733, time 117.25ms
iter 72090: loss 75.3170, time 116.44ms
tensor(0.6243)
iter 72100: loss 63.4654, time 115.21ms
iter 72110: loss 62.3385, time 117.97ms
iter 72120: loss 71.2522, time 115.11ms
iter 72130: loss 63.6977, time 116.48ms
iter 72140: loss 72.2485, time 115.83ms
iter 72150: loss 86.6545, time 114.56ms
iter 72160: loss 66.1833, time 118.27ms
iter 72170: loss 76.9170, time 116.09ms
iter 72180: loss 57.8073, time 115.20ms
iter 72190: loss 65.1735, time 121.34ms
tensor(0.5937)
iter 72200: loss 59.1773, time 115.39ms
iter 72210: loss 73.7960, time 115.10ms
iter 72220: loss 63.0394, time 118.18ms
iter 72230: loss 63.4419, time 115.88ms
iter 72240: loss 63.7233, time 115.09ms
step 72250: train loss 10.7818, val loss 10.8114
saving checkpoint to out-shakespeare-char
iter 72250: loss 46.1282, time 2848.20ms
iter 72260: loss 51.8985, time 115.94ms
iter 72270: loss 46.5822, time 114.65ms
iter 72280: loss 43.9368, time 116.78ms
iter 72290: loss 48.4933, time 116.74ms
tensor(0.5627)
iter 72300: loss 50.4350, time 115.00ms
iter 72310: loss 44.5700, time 117.97ms
iter 72320: loss 46.8859, time 115.88ms
iter 72330: loss 34.2872, time 115.95ms
iter 72340: loss 31.1519, time 116.92ms
iter 72350: loss 30.4499, time 115.16ms
iter 72360: loss 31.8019, time 116.69ms
iter 72370: loss 32.9905, time 117.66ms
iter 72380: loss 35.2213, time 114.53ms
iter 72390: loss 33.9799, time 117.89ms
tensor(0.5314)
iter 72400: loss 27.4439, time 114.89ms
iter 72410: loss 29.0434, time 114.88ms
iter 72420: loss 28.1497, time 118.15ms
iter 72430: loss 28.4274, time 115.37ms
iter 72440: loss 29.7197, time 116.77ms
iter 72450: loss 27.8652, time 117.70ms
iter 72460: loss 29.0905, time 114.51ms
iter 72470: loss 27.6088, time 117.93ms
iter 72480: loss 25.7773, time 115.92ms
iter 72490: loss 24.1139, time 114.85ms
tensor(0.5000)
step 72500: train loss 8.0817, val loss 8.0745
saving checkpoint to out-shakespeare-char
iter 72500: loss 24.1222, time 2841.58ms
iter 72510: loss 25.7190, time 117.89ms
iter 72520: loss 23.0886, time 114.69ms
iter 72530: loss 20.5597, time 117.72ms
iter 72540: loss 22.5451, time 114.25ms
iter 72550: loss 22.9893, time 114.78ms
iter 72560: loss 22.1607, time 117.96ms
iter 72570: loss 19.6814, time 115.23ms
iter 72580: loss 21.7011, time 116.45ms
iter 72590: loss 17.8944, time 116.96ms
tensor(0.4686)
iter 72600: loss 20.3062, time 115.41ms
iter 72610: loss 20.8965, time 117.11ms
iter 72620: loss 17.2932, time 115.81ms
iter 72630: loss 17.6464, time 116.76ms
iter 72640: loss 16.2190, time 118.12ms
iter 72650: loss 16.5582, time 114.68ms
iter 72660: loss 17.4327, time 117.16ms
iter 72670: loss 18.9315, time 116.54ms
iter 72680: loss 17.1280, time 114.68ms
iter 72690: loss 17.7140, time 117.94ms
tensor(0.4373)
iter 72700: loss 17.8399, time 116.69ms
iter 72710: loss 16.9183, time 114.77ms
iter 72720: loss 12.8031, time 118.09ms
iter 72730: loss 15.7689, time 115.25ms
iter 72740: loss 17.4448, time 116.87ms
step 72750: train loss 7.9445, val loss 7.9610
saving checkpoint to out-shakespeare-char
iter 72750: loss 16.4626, time 2841.50ms
iter 72760: loss 16.7874, time 116.58ms
iter 72770: loss 15.0134, time 115.37ms
iter 72780: loss 15.6918, time 117.97ms
iter 72790: loss 16.0598, time 116.03ms
tensor(0.4063)
iter 72800: loss 18.6177, time 117.66ms
iter 72810: loss 14.9164, time 117.88ms
iter 72820: loss 19.0113, time 114.99ms
iter 72830: loss 15.7705, time 116.97ms
iter 72840: loss 23.2538, time 116.65ms
iter 72850: loss 25.0805, time 114.53ms
iter 72860: loss 23.2350, time 118.46ms
iter 72870: loss 17.4931, time 115.51ms
iter 72880: loss 21.5767, time 116.79ms
iter 72890: loss 16.6352, time 116.72ms
tensor(0.3757)
iter 72900: loss 22.2774, time 114.80ms
iter 72910: loss 20.7569, time 115.70ms
iter 72920: loss 19.8618, time 115.90ms
iter 72930: loss 17.7817, time 115.23ms
iter 72940: loss 19.0263, time 117.83ms
iter 72950: loss 17.5717, time 115.43ms
iter 72960: loss 28.7786, time 116.63ms
iter 72970: loss 24.8385, time 116.14ms
iter 72980: loss 22.5493, time 113.57ms
iter 72990: loss 23.9257, time 116.52ms
tensor(0.3455)
step 73000: train loss 7.9864, val loss 7.9504
saving checkpoint to out-shakespeare-char
iter 73000: loss 20.1076, time 2845.66ms
iter 73010: loss 28.9175, time 115.58ms
iter 73020: loss 21.1659, time 117.00ms
iter 73030: loss 22.7927, time 117.76ms
iter 73040: loss 24.9619, time 114.59ms
iter 73050: loss 32.6812, time 115.67ms
iter 73060: loss 25.8630, time 115.95ms
iter 73070: loss 25.1266, time 114.64ms
iter 73080: loss 30.9307, time 118.22ms
iter 73090: loss 30.7094, time 114.82ms
tensor(0.3159)
iter 73100: loss 29.6080, time 117.03ms
iter 73110: loss 27.9821, time 115.80ms
iter 73120: loss 30.8166, time 114.45ms
iter 73130: loss 32.8025, time 116.56ms
iter 73140: loss 33.9481, time 115.72ms
iter 73150: loss 26.5089, time 115.54ms
iter 73160: loss 30.9979, time 117.88ms
iter 73170: loss 32.8958, time 114.41ms
iter 73180: loss 33.2257, time 116.75ms
iter 73190: loss 28.5777, time 115.30ms
tensor(0.2871)
iter 73200: loss 29.0038, time 114.92ms
iter 73210: loss 30.5754, time 118.26ms
iter 73220: loss 27.3971, time 115.23ms
iter 73230: loss 31.4037, time 116.65ms
iter 73240: loss 29.5326, time 117.08ms
step 73250: train loss 7.8725, val loss 7.8669
saving checkpoint to out-shakespeare-char
iter 73250: loss 26.9353, time 2853.13ms
iter 73260: loss 22.9311, time 114.02ms
iter 73270: loss 26.4847, time 113.19ms
iter 73280: loss 28.6893, time 115.69ms
iter 73290: loss 33.5240, time 113.96ms
tensor(0.2591)
iter 73300: loss 29.7434, time 115.39ms
iter 73310: loss 21.7782, time 113.25ms
iter 73320: loss 27.1475, time 113.04ms
iter 73330: loss 23.9723, time 113.16ms
iter 73340: loss 24.1191, time 115.15ms
iter 73350: loss 28.2344, time 113.50ms
iter 73360: loss 25.2588, time 116.48ms
iter 73370: loss 24.4974, time 122.75ms
iter 73380: loss 24.6753, time 120.72ms
iter 73390: loss 26.6694, time 122.31ms
tensor(0.2321)
iter 73400: loss 23.6891, time 121.43ms
iter 73410: loss 22.5138, time 119.54ms
iter 73420: loss 25.2771, time 120.40ms
iter 73430: loss 23.9822, time 119.11ms
iter 73440: loss 26.5124, time 118.76ms
iter 73450: loss 28.3455, time 118.92ms
iter 73460: loss 28.6327, time 119.36ms
iter 73470: loss 25.8334, time 118.98ms
iter 73480: loss 24.9356, time 119.48ms
iter 73490: loss 25.2569, time 118.69ms
tensor(0.2061)
step 73500: train loss 7.8436, val loss 7.7788
saving checkpoint to out-shakespeare-char
iter 73500: loss 26.1127, time 2810.59ms
iter 73510: loss 26.7561, time 118.92ms
iter 73520: loss 22.6211, time 119.51ms
iter 73530: loss 29.8776, time 119.50ms
iter 73540: loss 25.7989, time 119.60ms
iter 73550: loss 22.7028, time 119.92ms
iter 73560: loss 27.3695, time 120.19ms
iter 73570: loss 23.0447, time 120.35ms
iter 73580: loss 24.8248, time 120.50ms
iter 73590: loss 28.5652, time 119.95ms
tensor(0.1813)
iter 73600: loss 26.7350, time 121.25ms
iter 73610: loss 23.3661, time 121.14ms
iter 73620: loss 23.5865, time 121.57ms
iter 73630: loss 24.6287, time 122.09ms
iter 73640: loss 24.2404, time 120.25ms
iter 73650: loss 26.4639, time 122.24ms
iter 73660: loss 21.9995, time 122.01ms
iter 73670: loss 23.8350, time 122.03ms
iter 73680: loss 27.5074, time 122.09ms
iter 73690: loss 23.0693, time 119.84ms
tensor(0.1577)
iter 73700: loss 21.5439, time 122.28ms
iter 73710: loss 22.2159, time 122.06ms
iter 73720: loss 26.3060, time 122.48ms
iter 73730: loss 27.3153, time 122.00ms
iter 73740: loss 23.4158, time 120.41ms
step 73750: train loss 7.8300, val loss 7.8862
saving checkpoint to out-shakespeare-char
iter 73750: loss 24.4123, time 2822.10ms
iter 73760: loss 24.0405, time 121.49ms
iter 73770: loss 26.5614, time 120.31ms
iter 73780: loss 24.3977, time 122.27ms
iter 73790: loss 23.7011, time 121.38ms
tensor(0.1355)
iter 73800: loss 22.7535, time 119.74ms
iter 73810: loss 23.6211, time 119.25ms
iter 73820: loss 17.3212, time 119.19ms
iter 73830: loss 24.1904, time 118.45ms
iter 73840: loss 22.5151, time 118.20ms
iter 73850: loss 23.4533, time 119.49ms
iter 73860: loss 22.9517, time 120.33ms
iter 73870: loss 23.2832, time 120.28ms
iter 73880: loss 25.1418, time 121.43ms
iter 73890: loss 26.0309, time 122.43ms
tensor(0.1147)
iter 73900: loss 20.4054, time 121.01ms
iter 73910: loss 22.8403, time 122.40ms
iter 73920: loss 25.7454, time 122.55ms
iter 73930: loss 25.5367, time 121.35ms
iter 73940: loss 20.4488, time 121.20ms
iter 73950: loss 26.1644, time 119.09ms
iter 73960: loss 26.2689, time 119.05ms
iter 73970: loss 25.4933, time 118.94ms
iter 73980: loss 27.2052, time 119.48ms
iter 73990: loss 24.4125, time 119.36ms
tensor(0.0955)
step 74000: train loss 7.7994, val loss 7.7928
saving checkpoint to out-shakespeare-char
iter 74000: loss 24.5178, time 2847.14ms
iter 74010: loss 20.8324, time 120.76ms
iter 74020: loss 20.6235, time 123.01ms
iter 74030: loss 19.8474, time 121.60ms
iter 74040: loss 21.4356, time 121.24ms
iter 74050: loss 26.7869, time 118.89ms
iter 74060: loss 23.5238, time 119.31ms
iter 74070: loss 23.4620, time 119.39ms
iter 74080: loss 21.3282, time 120.68ms
iter 74090: loss 26.0621, time 121.65ms
tensor(0.0778)
iter 74100: loss 20.2643, time 123.36ms
iter 74110: loss 24.0543, time 123.14ms
iter 74120: loss 19.3578, time 121.48ms
iter 74130: loss 25.1066, time 121.63ms
iter 74140: loss 23.4120, time 119.56ms
iter 74150: loss 24.0522, time 119.62ms
iter 74160: loss 20.3505, time 119.66ms
iter 74170: loss 21.0626, time 120.00ms
iter 74180: loss 20.3762, time 121.28ms
iter 74190: loss 22.7883, time 121.73ms
tensor(0.0618)
iter 74200: loss 21.6993, time 116.57ms
iter 74210: loss 23.4495, time 117.10ms
iter 74220: loss 23.9591, time 117.16ms
iter 74230: loss 25.4022, time 115.89ms
iter 74240: loss 25.1569, time 117.36ms
step 74250: train loss 7.7553, val loss 7.6925
saving checkpoint to out-shakespeare-char
iter 74250: loss 25.3917, time 2862.82ms
iter 74260: loss 18.7469, time 118.14ms
iter 74270: loss 24.5206, time 116.83ms
iter 74280: loss 26.0077, time 114.91ms
iter 74290: loss 21.8590, time 118.21ms
tensor(0.0476)
iter 74300: loss 20.2866, time 116.64ms
iter 74310: loss 23.4408, time 115.02ms
iter 74320: loss 19.8984, time 118.29ms
iter 74330: loss 24.8122, time 115.23ms
iter 74340: loss 18.2255, time 114.86ms
iter 74350: loss 19.8263, time 118.02ms
iter 74360: loss 21.3203, time 115.53ms
iter 74370: loss 24.9062, time 117.16ms
iter 74380: loss 24.1200, time 118.23ms
iter 74390: loss 19.2139, time 115.74ms
tensor(0.0351)
iter 74400: loss 23.5049, time 115.68ms
iter 74410: loss 24.1310, time 118.03ms
iter 74420: loss 19.4975, time 115.91ms
iter 74430: loss 27.1733, time 117.12ms
iter 74440: loss 21.3469, time 118.20ms
iter 74450: loss 25.2173, time 115.60ms
iter 74460: loss 23.4021, time 114.91ms
iter 74470: loss 25.5341, time 118.64ms
iter 74480: loss 26.4653, time 115.64ms
iter 74490: loss 20.8245, time 117.10ms
tensor(0.0245)
step 74500: train loss 7.7074, val loss 7.7082
saving checkpoint to out-shakespeare-char
iter 74500: loss 20.1704, time 2857.60ms
iter 74510: loss 23.0304, time 118.23ms
iter 74520: loss 23.1662, time 115.28ms
iter 74530: loss 18.5923, time 116.50ms
iter 74540: loss 20.9175, time 118.17ms
iter 74550: loss 22.7874, time 114.23ms
iter 74560: loss 23.0011, time 117.16ms
iter 74570: loss 19.7587, time 118.76ms
iter 74580: loss 17.4320, time 115.16ms
iter 74590: loss 24.4162, time 116.87ms
tensor(0.0157)
iter 74600: loss 19.1972, time 118.73ms
iter 74610: loss 21.4149, time 115.25ms
iter 74620: loss 20.8040, time 116.89ms
iter 74630: loss 21.5197, time 118.14ms
iter 74640: loss 26.5298, time 114.96ms
iter 74650: loss 22.4360, time 117.12ms
iter 74660: loss 21.5071, time 117.68ms
iter 74670: loss 21.1934, time 114.78ms
iter 74680: loss 23.8569, time 118.12ms
iter 74690: loss 21.5294, time 117.14ms
tensor(0.0089)
iter 74700: loss 20.6656, time 115.42ms
iter 74710: loss 22.3163, time 117.40ms
iter 74720: loss 20.8882, time 116.23ms
iter 74730: loss 18.2883, time 115.45ms
iter 74740: loss 21.8386, time 118.06ms
step 74750: train loss 7.7571, val loss 7.6740
saving checkpoint to out-shakespeare-char
iter 74750: loss 22.5127, time 2859.09ms
iter 74760: loss 17.9563, time 121.87ms
iter 74770: loss 18.8241, time 121.94ms
iter 74780: loss 22.3982, time 122.45ms
iter 74790: loss 19.2894, time 122.79ms
tensor(0.0039)
iter 74800: loss 19.6254, time 121.56ms
iter 74810: loss 24.4749, time 120.84ms
iter 74820: loss 23.0532, time 121.24ms
iter 74830: loss 23.2497, time 120.99ms
iter 74840: loss 19.7496, time 119.30ms
iter 74850: loss 19.2356, time 119.22ms
iter 74860: loss 21.3609, time 118.41ms
iter 74870: loss 21.0690, time 119.17ms
iter 74880: loss 19.2152, time 119.31ms
iter 74890: loss 22.5767, time 119.65ms
tensor(0.0010)
iter 74900: loss 25.3615, time 120.94ms
iter 74910: loss 23.8974, time 119.39ms
iter 74920: loss 19.6293, time 120.41ms
iter 74930: loss 18.6476, time 120.17ms
iter 74940: loss 22.0014, time 121.53ms
iter 74950: loss 23.3486, time 121.52ms
iter 74960: loss 19.7826, time 122.90ms
iter 74970: loss 21.3201, time 122.63ms
iter 74980: loss 20.6178, time 120.32ms
iter 74990: loss 20.9380, time 121.57ms
tensor(0.0010)
step 75000: train loss 7.7162, val loss 7.7188
saving checkpoint to out-shakespeare-char
iter 75000: loss 20.2848, time 2857.72ms
iter 75010: loss 18.6935, time 120.44ms
iter 75020: loss 18.0556, time 120.15ms
iter 75030: loss 21.3673, time 120.42ms
iter 75040: loss 21.8010, time 120.32ms
iter 75050: loss 21.4332, time 121.41ms
iter 75060: loss 22.7149, time 122.41ms
iter 75070: loss 19.9644, time 122.34ms
iter 75080: loss 20.2003, time 121.26ms
iter 75090: loss 22.4804, time 119.33ms
tensor(0.0010)
iter 75100: loss 22.1302, time 121.25ms
iter 75110: loss 21.4775, time 121.52ms
iter 75120: loss 21.2994, time 119.06ms
iter 75130: loss 20.0380, time 119.49ms
iter 75140: loss 20.5149, time 120.90ms
iter 75150: loss 19.2165, time 119.60ms
iter 75160: loss 19.6147, time 120.28ms
iter 75170: loss 21.2800, time 120.26ms
iter 75180: loss 19.3495, time 120.77ms
iter 75190: loss 17.2795, time 121.15ms
tensor(0.0039)
iter 75200: loss 24.0989, time 121.78ms
iter 75210: loss 21.2899, time 121.92ms
iter 75220: loss 21.7554, time 121.32ms
iter 75230: loss 20.7170, time 121.30ms
iter 75240: loss 19.9657, time 121.29ms
step 75250: train loss 7.6896, val loss 7.6884
saving checkpoint to out-shakespeare-char
iter 75250: loss 20.4096, time 2849.60ms
iter 75260: loss 25.1563, time 119.30ms
iter 75270: loss 19.3458, time 120.45ms
iter 75280: loss 23.8790, time 120.24ms
iter 75290: loss 21.0873, time 121.42ms
tensor(0.0089)
iter 75300: loss 21.4269, time 122.30ms
iter 75310: loss 20.3739, time 120.81ms
iter 75320: loss 20.0924, time 121.11ms
iter 75330: loss 19.8233, time 120.83ms
iter 75340: loss 20.7747, time 121.06ms
iter 75350: loss 22.3330, time 121.30ms
iter 75360: loss 21.9627, time 120.23ms
iter 75370: loss 20.4816, time 120.96ms
iter 75380: loss 20.7982, time 119.26ms
iter 75390: loss 20.6073, time 119.80ms
tensor(0.0157)
iter 75400: loss 21.2290, time 120.87ms
iter 75410: loss 20.3780, time 120.27ms
iter 75420: loss 22.1188, time 120.30ms
iter 75430: loss 22.8046, time 119.67ms
iter 75440: loss 19.4205, time 122.54ms
iter 75450: loss 20.2719, time 122.07ms
iter 75460: loss 21.7438, time 122.92ms
iter 75470: loss 22.0129, time 121.60ms
iter 75480: loss 19.5445, time 121.36ms
iter 75490: loss 25.8282, time 119.59ms
tensor(0.0245)
step 75500: train loss 7.6866, val loss 7.6742
saving checkpoint to out-shakespeare-char
iter 75500: loss 20.9175, time 2853.64ms
iter 75510: loss 20.2246, time 121.46ms
iter 75520: loss 20.1933, time 121.93ms
iter 75530: loss 22.0497, time 122.35ms
iter 75540: loss 22.0117, time 121.88ms
iter 75550: loss 22.8478, time 119.46ms
iter 75560: loss 22.6168, time 121.71ms
iter 75570: loss 19.6095, time 121.56ms
iter 75580: loss 21.8566, time 120.01ms
iter 75590: loss 20.1757, time 119.30ms
tensor(0.0351)
iter 75600: loss 18.7415, time 120.95ms
iter 75610: loss 25.2991, time 119.16ms
iter 75620: loss 21.7561, time 120.28ms
iter 75630: loss 20.5156, time 120.28ms
iter 75640: loss 18.2841, time 121.72ms
iter 75650: loss 21.3153, time 121.72ms
iter 75660: loss 19.7993, time 121.41ms
iter 75670: loss 22.3097, time 122.31ms
iter 75680: loss 21.8422, time 121.26ms
iter 75690: loss 19.5803, time 121.13ms
tensor(0.0476)
iter 75700: loss 21.7389, time 121.68ms
iter 75710: loss 20.5857, time 121.24ms
iter 75720: loss 19.6129, time 117.74ms
iter 75730: loss 19.7008, time 119.30ms
iter 75740: loss 19.9238, time 119.45ms
step 75750: train loss 7.7029, val loss 7.7026
saving checkpoint to out-shakespeare-char
iter 75750: loss 21.0532, time 2839.57ms
iter 75760: loss 20.3297, time 120.78ms
iter 75770: loss 20.4300, time 120.48ms
iter 75780: loss 20.0874, time 122.03ms
iter 75790: loss 16.2374, time 122.55ms
tensor(0.0618)
iter 75800: loss 21.4757, time 122.68ms
iter 75810: loss 20.3722, time 121.51ms
iter 75820: loss 17.4384, time 121.03ms
iter 75830: loss 17.4436, time 119.68ms
iter 75840: loss 18.9243, time 121.21ms
iter 75850: loss 19.3325, time 120.78ms
iter 75860: loss 19.1557, time 121.00ms
iter 75870: loss 20.3396, time 119.02ms
iter 75880: loss 18.0889, time 117.56ms
iter 75890: loss 17.0065, time 118.95ms
tensor(0.0778)
iter 75900: loss 17.0185, time 119.43ms
iter 75910: loss 17.8482, time 120.44ms
iter 75920: loss 17.2774, time 120.19ms
iter 75930: loss 19.3041, time 119.98ms
iter 75940: loss 19.3510, time 120.25ms
iter 75950: loss 17.2561, time 120.20ms
iter 75960: loss 18.5937, time 121.03ms
iter 75970: loss 17.6472, time 121.03ms
iter 75980: loss 15.4713, time 121.92ms
iter 75990: loss 18.4430, time 122.47ms
tensor(0.0955)
step 76000: train loss 7.7684, val loss 7.7085
saving checkpoint to out-shakespeare-char
iter 76000: loss 19.0356, time 2857.31ms
iter 76010: loss 16.5877, time 119.29ms
iter 76020: loss 17.7465, time 119.60ms
iter 76030: loss 14.8690, time 119.36ms
iter 76040: loss 17.1117, time 119.54ms
iter 76050: loss 14.5434, time 118.80ms
iter 76060: loss 15.2389, time 121.36ms
iter 76070: loss 17.6547, time 119.18ms
iter 76080: loss 16.2244, time 120.22ms
iter 76090: loss 16.1531, time 120.44ms
tensor(0.1147)
iter 76100: loss 18.6423, time 121.50ms
iter 76110: loss 15.9117, time 120.44ms
iter 76120: loss 17.2234, time 122.56ms
iter 76130: loss 15.8737, time 121.33ms
iter 76140: loss 15.8516, time 121.29ms
iter 76150: loss 17.8409, time 121.40ms
iter 76160: loss 15.7739, time 119.35ms
iter 76170: loss 16.4365, time 121.29ms
iter 76180: loss 17.9436, time 119.60ms
iter 76190: loss 16.1029, time 120.33ms
tensor(0.1355)
iter 76200: loss 14.0182, time 120.59ms
iter 76210: loss 16.1360, time 120.13ms
iter 76220: loss 17.8489, time 118.94ms
iter 76230: loss 16.4029, time 120.48ms
iter 76240: loss 17.3591, time 121.04ms
step 76250: train loss 7.7931, val loss 7.8082
saving checkpoint to out-shakespeare-char
iter 76250: loss 16.4870, time 2847.02ms
iter 76260: loss 16.2381, time 120.95ms
iter 76270: loss 15.1931, time 118.05ms
iter 76280: loss 16.1068, time 120.35ms
iter 76290: loss 16.6075, time 120.41ms
tensor(0.1577)
iter 76300: loss 14.3075, time 118.68ms
iter 76310: loss 15.7004, time 118.20ms
iter 76320: loss 16.9608, time 118.48ms
iter 76330: loss 14.9155, time 117.97ms
iter 76340: loss 14.1630, time 120.24ms
iter 76350: loss 16.2216, time 120.52ms
iter 76360: loss 14.8348, time 120.29ms
iter 76370: loss 14.5616, time 120.18ms
iter 76380: loss 13.8320, time 118.44ms
iter 76390: loss 12.8746, time 120.27ms
tensor(0.1813)
iter 76400: loss 15.0108, time 121.47ms
iter 76410: loss 14.8547, time 121.65ms
iter 76420: loss 15.8270, time 122.35ms
iter 76430: loss 13.6481, time 121.29ms
iter 76440: loss 11.7117, time 122.58ms
iter 76450: loss 12.8087, time 121.59ms
iter 76460: loss 13.7671, time 121.32ms
iter 76470: loss 15.3074, time 121.84ms
iter 76480: loss 13.7386, time 121.02ms
iter 76490: loss 12.4424, time 120.97ms
tensor(0.2061)
step 76500: train loss 7.8291, val loss 7.8475
saving checkpoint to out-shakespeare-char
iter 76500: loss 12.9212, time 2852.31ms
iter 76510: loss 13.8156, time 120.81ms
iter 76520: loss 12.3116, time 120.65ms
iter 76530: loss 13.9354, time 121.37ms
iter 76540: loss 13.1030, time 120.38ms
iter 76550: loss 12.4964, time 121.95ms
iter 76560: loss 13.1173, time 122.89ms
iter 76570: loss 13.1197, time 121.55ms
iter 76580: loss 12.9601, time 121.34ms
iter 76590: loss 14.4285, time 121.31ms
tensor(0.2321)
iter 76600: loss 12.2223, time 121.51ms
iter 76610: loss 12.9317, time 121.52ms
iter 76620: loss 13.3142, time 121.18ms
iter 76630: loss 13.9637, time 121.19ms
iter 76640: loss 11.3111, time 118.83ms
iter 76650: loss 14.4967, time 119.39ms
iter 76660: loss 14.5051, time 120.05ms
iter 76670: loss 15.0261, time 120.28ms
iter 76680: loss 15.2094, time 120.43ms
iter 76690: loss 14.5428, time 120.32ms
tensor(0.2591)
iter 76700: loss 15.5037, time 120.62ms
iter 76710: loss 15.3788, time 121.09ms
iter 76720: loss 16.2512, time 120.28ms
iter 76730: loss 15.8064, time 122.58ms
iter 76740: loss 14.8409, time 122.74ms
step 76750: train loss 7.7528, val loss 7.8087
saving checkpoint to out-shakespeare-char
iter 76750: loss 16.6213, time 2865.65ms
iter 76760: loss 15.0688, time 120.70ms
iter 76770: loss 14.6900, time 120.50ms
iter 76780: loss 17.6012, time 120.66ms
iter 76790: loss 16.3509, time 122.01ms
tensor(0.2871)
iter 76800: loss 16.6171, time 122.87ms
iter 76810: loss 15.6399, time 122.64ms
iter 76820: loss 14.2515, time 121.36ms
iter 76830: loss 15.0788, time 119.27ms
iter 76840: loss 13.8037, time 121.28ms
iter 76850: loss 16.1202, time 121.30ms
iter 76860: loss 17.2118, time 119.23ms
iter 76870: loss 16.1459, time 118.80ms
iter 76880: loss 14.8275, time 120.54ms
iter 76890: loss 18.6620, time 119.33ms
tensor(0.3159)
iter 76900: loss 15.2239, time 120.59ms
iter 76910: loss 19.6468, time 120.45ms
iter 76920: loss 17.1099, time 120.14ms
iter 76930: loss 16.1586, time 121.56ms
iter 76940: loss 19.2572, time 121.82ms
iter 76950: loss 19.3007, time 122.32ms
iter 76960: loss 17.8999, time 121.32ms
iter 76970: loss 20.0534, time 121.11ms
iter 76980: loss 17.4578, time 121.24ms
iter 76990: loss 17.9515, time 121.23ms
tensor(0.3455)
step 77000: train loss 7.7945, val loss 7.7764
saving checkpoint to out-shakespeare-char
iter 77000: loss 15.8779, time 2854.74ms
iter 77010: loss 17.9078, time 121.35ms
iter 77020: loss 16.9989, time 122.10ms
iter 77030: loss 16.3754, time 121.98ms
iter 77040: loss 15.2577, time 122.48ms
iter 77050: loss 19.3619, time 121.28ms
iter 77060: loss 18.3419, time 121.42ms
iter 77070: loss 16.2460, time 120.59ms
iter 77080: loss 19.2758, time 121.27ms
iter 77090: loss 18.1398, time 121.36ms
tensor(0.3757)
iter 77100: loss 18.9432, time 120.78ms
iter 77110: loss 20.6825, time 119.01ms
iter 77120: loss 15.6045, time 119.03ms
iter 77130: loss 15.0289, time 119.27ms
iter 77140: loss 20.0105, time 120.32ms
iter 77150: loss 19.4832, time 120.41ms
iter 77160: loss 17.1466, time 120.31ms
iter 77170: loss 23.1289, time 119.45ms
iter 77180: loss 24.0900, time 120.46ms
iter 77190: loss 20.6714, time 120.11ms
tensor(0.4063)
iter 77200: loss 21.2631, time 120.21ms
iter 77210: loss 17.2417, time 121.07ms
iter 77220: loss 18.0733, time 122.06ms
iter 77230: loss 23.7455, time 120.27ms
iter 77240: loss 19.7699, time 121.32ms
step 77250: train loss 7.7613, val loss 7.8075
saving checkpoint to out-shakespeare-char
iter 77250: loss 16.7523, time 2860.40ms
iter 77260: loss 19.4738, time 120.50ms
iter 77270: loss 19.0235, time 120.51ms
iter 77280: loss 21.9267, time 120.52ms
iter 77290: loss 16.5456, time 120.07ms
tensor(0.4373)
iter 77300: loss 17.4691, time 121.33ms
iter 77310: loss 21.4459, time 121.44ms
iter 77320: loss 20.2287, time 122.68ms
iter 77330: loss 21.1503, time 121.49ms
iter 77340: loss 20.0709, time 119.03ms
iter 77350: loss 20.1722, time 121.25ms
iter 77360: loss 19.3174, time 121.44ms
iter 77370: loss 20.9914, time 119.73ms
iter 77380: loss 20.0809, time 119.20ms
iter 77390: loss 20.9516, time 119.05ms
tensor(0.4686)
iter 77400: loss 24.2173, time 119.44ms
iter 77410: loss 19.4365, time 120.13ms
iter 77420: loss 23.6908, time 120.14ms
iter 77430: loss 21.1319, time 120.19ms
iter 77440: loss 22.6687, time 120.18ms
iter 77450: loss 19.1643, time 120.11ms
iter 77460: loss 23.0685, time 120.55ms
iter 77470: loss 22.7051, time 121.44ms
iter 77480: loss 22.0235, time 122.36ms
iter 77490: loss 24.0901, time 122.60ms
tensor(0.5000)
step 77500: train loss 7.7332, val loss 7.7992
saving checkpoint to out-shakespeare-char
iter 77500: loss 23.7674, time 2845.46ms
iter 77510: loss 19.0834, time 119.48ms
iter 77520: loss 28.9756, time 119.49ms
iter 77530: loss 23.6181, time 120.14ms
iter 77540: loss 23.7586, time 120.28ms
iter 77550: loss 22.7014, time 120.50ms
iter 77560: loss 25.5904, time 119.56ms
iter 77570: loss 21.2528, time 120.61ms
iter 77580: loss 25.9211, time 121.41ms
iter 77590: loss 26.8679, time 122.02ms
tensor(0.5314)
iter 77600: loss 24.4624, time 122.75ms
iter 77610: loss 20.7139, time 121.22ms
iter 77620: loss 25.1166, time 121.50ms
iter 77630: loss 32.4239, time 121.15ms
iter 77640: loss 24.6050, time 121.27ms
iter 77650: loss 23.0752, time 121.22ms
iter 77660: loss 33.4488, time 119.46ms
iter 77670: loss 27.8987, time 119.83ms
iter 77680: loss 31.5796, time 119.81ms
iter 77690: loss 26.5393, time 120.29ms
tensor(0.5627)
iter 77700: loss 24.1905, time 120.66ms
iter 77710: loss 28.9404, time 120.17ms
iter 77720: loss 29.2279, time 120.67ms
iter 77730: loss 30.6433, time 121.19ms
iter 77740: loss 22.0043, time 120.52ms
step 77750: train loss 8.0656, val loss 8.0476
saving checkpoint to out-shakespeare-char
iter 77750: loss 20.9353, time 2853.66ms
iter 77760: loss 21.9534, time 119.54ms
iter 77770: loss 27.0160, time 118.29ms
iter 77780: loss 19.2866, time 119.71ms
iter 77790: loss 23.9865, time 120.46ms
tensor(0.5937)
iter 77800: loss 20.8160, time 120.43ms
iter 77810: loss 25.9717, time 120.33ms
iter 77820: loss 28.0891, time 120.63ms
iter 77830: loss 23.6589, time 120.82ms
iter 77840: loss 28.0704, time 121.95ms
iter 77850: loss 23.6749, time 120.34ms
iter 77860: loss 33.2644, time 122.52ms
iter 77870: loss 28.0115, time 121.65ms
iter 77880: loss 24.2248, time 121.40ms
iter 77890: loss 28.1625, time 121.46ms
tensor(0.6243)
iter 77900: loss 31.2938, time 119.62ms
iter 77910: loss 31.2563, time 119.24ms
iter 77920: loss 32.2608, time 120.47ms
iter 77930: loss 24.1098, time 121.21ms
iter 77940: loss 34.0585, time 119.67ms
iter 77950: loss 32.3671, time 120.88ms
iter 77960: loss 42.4798, time 120.87ms
iter 77970: loss 44.9138, time 122.98ms
iter 77980: loss 40.4348, time 121.68ms
iter 77990: loss 36.0846, time 120.82ms
tensor(0.6545)
step 78000: train loss 7.9416, val loss 7.9842
saving checkpoint to out-shakespeare-char
iter 78000: loss 30.4266, time 2853.10ms
iter 78010: loss 37.8661, time 120.40ms
iter 78020: loss 48.8015, time 119.12ms
iter 78030: loss 34.0274, time 119.52ms
iter 78040: loss 48.3836, time 120.12ms
iter 78050: loss 46.6431, time 120.59ms
iter 78060: loss 46.5887, time 121.18ms
iter 78070: loss 52.0080, time 120.19ms
iter 78080: loss 60.4632, time 121.91ms
iter 78090: loss 35.9907, time 122.59ms
tensor(0.6841)
iter 78100: loss 50.3390, time 121.96ms
iter 78110: loss 42.4006, time 121.80ms
iter 78120: loss 53.5537, time 121.09ms
iter 78130: loss 44.8766, time 121.00ms
iter 78140: loss 46.8824, time 121.05ms
iter 78150: loss 42.8448, time 120.33ms
iter 78160: loss 43.4131, time 121.11ms
iter 78170: loss 34.0296, time 120.41ms
iter 78180: loss 36.1764, time 119.22ms
iter 78190: loss 35.1253, time 119.24ms
tensor(0.7129)
iter 78200: loss 32.3147, time 118.57ms
iter 78210: loss 27.6040, time 118.90ms
iter 78220: loss 22.1333, time 118.70ms
iter 78230: loss 18.2159, time 120.09ms
iter 78240: loss 26.6315, time 120.03ms
step 78250: train loss 7.9828, val loss 7.9675
saving checkpoint to out-shakespeare-char
iter 78250: loss 24.3583, time 2843.66ms
iter 78260: loss 20.0283, time 122.32ms
iter 78270: loss 24.5149, time 121.80ms
iter 78280: loss 19.9761, time 121.01ms
iter 78290: loss 48.2112, time 123.00ms
tensor(0.7409)
iter 78300: loss 83.7800, time 122.75ms
iter 78310: loss 22.8584, time 120.44ms
iter 78320: loss 33.6375, time 120.79ms
iter 78330: loss 68.9514, time 121.21ms
iter 78340: loss 54.9999, time 119.13ms
iter 78350: loss 55.2638, time 119.14ms
iter 78360: loss 40.3014, time 118.50ms
iter 78370: loss 112.8489, time 119.04ms
iter 78380: loss 54.4746, time 119.47ms
iter 78390: loss 59.7716, time 120.47ms
tensor(0.7679)
iter 78400: loss 52.4899, time 119.53ms
iter 78410: loss 39.4213, time 120.33ms
iter 78420: loss 58.2832, time 120.53ms
iter 78430: loss 66.2226, time 120.56ms
iter 78440: loss 211.4515, time 121.81ms
iter 78450: loss 85.1111, time 123.01ms
iter 78460: loss 54.6973, time 120.43ms
iter 78470: loss 46.0905, time 122.38ms
iter 78480: loss 72.6765, time 120.63ms
iter 78490: loss 51.1448, time 121.20ms
tensor(0.7939)
step 78500: train loss 32.4067, val loss 32.5166
saving checkpoint to out-shakespeare-char
iter 78500: loss 57.9773, time 2858.04ms
iter 78510: loss 272.9227, time 121.45ms
iter 78520: loss 68.2504, time 120.55ms
iter 78530: loss 156.7607, time 122.57ms
iter 78540: loss 112.8262, time 122.58ms
iter 78550: loss 68.0026, time 122.31ms
iter 78560: loss 69.2500, time 120.37ms
iter 78570: loss 62.8137, time 119.42ms
iter 78580: loss 61.6500, time 119.37ms
iter 78590: loss 61.6181, time 119.23ms
tensor(0.8187)
iter 78600: loss 92.0433, time 119.47ms
iter 78610: loss 64.9053, time 119.61ms
iter 78620: loss 52.4514, time 120.44ms
iter 78630: loss 64.5300, time 120.15ms
iter 78640: loss 269.4722, time 122.20ms
iter 78650: loss 119.1705, time 122.67ms
iter 78660: loss 71.1933, time 122.54ms
iter 78670: loss 120.4705, time 123.01ms
iter 78680: loss 74.5167, time 121.77ms
iter 78690: loss 107.3826, time 121.51ms
tensor(0.8423)
iter 78700: loss 131.7975, time 119.69ms
iter 78710: loss 85.1299, time 119.15ms
iter 78720: loss 70.4951, time 119.29ms
iter 78730: loss 98.7818, time 120.37ms
iter 78740: loss 135.0656, time 120.78ms
step 78750: train loss 23.9385, val loss 23.9794
saving checkpoint to out-shakespeare-char
iter 78750: loss 43.1004, time 2853.62ms
iter 78760: loss 71.4869, time 121.46ms
iter 78770: loss 93.2839, time 119.54ms
iter 78780: loss 44.2194, time 115.32ms
iter 78790: loss 61.0564, time 116.16ms
tensor(0.8645)
iter 78800: loss 46.9707, time 113.67ms
iter 78810: loss 43.0873, time 116.90ms
iter 78820: loss 60.8428, time 115.14ms
iter 78830: loss 68.4304, time 114.86ms
iter 78840: loss 80.3612, time 117.48ms
iter 78850: loss 73.1385, time 114.25ms
iter 78860: loss 91.2031, time 118.02ms
iter 78870: loss 92.3595, time 114.95ms
iter 78880: loss 170.9342, time 114.51ms
iter 78890: loss 50.0479, time 117.20ms
tensor(0.8853)
iter 78900: loss 85.1534, time 116.35ms
iter 78910: loss 64.6123, time 116.37ms
iter 78920: loss 192.7175, time 117.14ms
iter 78930: loss 64.0967, time 114.58ms
iter 78940: loss 137.0262, time 117.91ms
iter 78950: loss 87.9922, time 116.24ms
iter 78960: loss 65.0873, time 114.54ms
iter 78970: loss 67.1274, time 118.06ms
iter 78980: loss 62.2962, time 115.32ms
iter 78990: loss 65.1655, time 117.23ms
tensor(0.9045)
step 79000: train loss 28.1093, val loss 28.5646
saving checkpoint to out-shakespeare-char
iter 79000: loss 69.5744, time 2843.99ms
iter 79010: loss 60.3732, time 116.47ms
iter 79020: loss 65.2162, time 114.58ms
iter 79030: loss 73.8180, time 117.91ms
iter 79040: loss 64.6787, time 116.19ms
iter 79050: loss 75.8853, time 114.98ms
iter 79060: loss 76.3216, time 118.15ms
iter 79070: loss 77.3020, time 115.38ms
iter 79080: loss 84.3749, time 116.78ms
iter 79090: loss 155.3991, time 117.44ms
tensor(0.9222)
iter 79100: loss 81.4678, time 115.22ms
iter 79110: loss 73.3075, time 117.87ms
iter 79120: loss 77.7795, time 116.11ms
iter 79130: loss 129.9025, time 114.80ms
iter 79140: loss 102.2507, time 118.05ms
iter 79150: loss 67.9073, time 114.74ms
iter 79160: loss 80.2260, time 114.64ms
iter 79170: loss 56.5482, time 117.96ms
iter 79180: loss 88.4639, time 114.98ms
iter 79190: loss 84.1914, time 117.97ms
tensor(0.9382)
iter 79200: loss 56.4037, time 116.78ms
iter 79210: loss 91.6597, time 113.66ms
iter 79220: loss 87.8580, time 115.61ms
iter 79230: loss 65.6971, time 116.09ms
iter 79240: loss 74.5736, time 115.97ms
step 79250: train loss 45.8440, val loss 46.4223
saving checkpoint to out-shakespeare-char
iter 79250: loss 83.4953, time 2844.72ms
iter 79260: loss 78.4992, time 117.60ms
iter 79270: loss 76.5319, time 115.12ms
iter 79280: loss 76.4923, time 117.29ms
iter 79290: loss 70.3417, time 122.53ms
tensor(0.9524)
iter 79300: loss 76.5207, time 121.64ms
iter 79310: loss 77.7993, time 121.02ms
iter 79320: loss 67.3912, time 122.30ms
iter 79330: loss 74.8082, time 122.41ms
iter 79340: loss 90.0283, time 122.40ms
iter 79350: loss 69.2228, time 121.98ms
iter 79360: loss 76.6646, time 120.97ms
iter 79370: loss 80.0325, time 122.22ms
iter 79380: loss 58.4465, time 122.14ms
iter 79390: loss 55.0030, time 121.47ms
tensor(0.9649)
iter 79400: loss 58.4305, time 122.28ms
iter 79410: loss 54.6376, time 121.25ms
iter 79420: loss 61.8005, time 122.08ms
iter 79430: loss 69.7348, time 122.05ms
iter 79440: loss 79.4857, time 122.17ms
iter 79450: loss 87.6565, time 121.93ms
iter 79460: loss 91.6315, time 120.56ms
iter 79470: loss 86.8429, time 122.07ms
iter 79480: loss 91.0469, time 122.26ms
iter 79490: loss 85.6230, time 122.05ms
tensor(0.9755)
step 79500: train loss 49.0508, val loss 48.5980
saving checkpoint to out-shakespeare-char
iter 79500: loss 96.9987, time 2847.63ms
iter 79510: loss 115.6534, time 119.29ms
iter 79520: loss 111.8813, time 121.07ms
iter 79530: loss 77.8569, time 120.99ms
iter 79540: loss 89.9423, time 118.93ms
iter 79550: loss 98.9477, time 118.69ms
iter 79560: loss 107.8585, time 118.85ms
iter 79570: loss 96.0128, time 118.69ms
iter 79580: loss 76.9051, time 118.73ms
iter 79590: loss 90.3337, time 118.40ms
tensor(0.9843)
iter 79600: loss 76.1380, time 119.24ms
iter 79610: loss 74.9480, time 119.11ms
iter 79620: loss 70.6591, time 117.84ms
iter 79630: loss 69.7272, time 118.83ms
iter 79640: loss 77.0502, time 118.06ms
iter 79650: loss 76.6899, time 119.30ms
iter 79660: loss 104.5948, time 118.54ms
iter 79670: loss 103.5428, time 118.79ms
iter 79680: loss 88.8493, time 119.01ms
iter 79690: loss 114.3211, time 118.21ms
tensor(0.9911)
iter 79700: loss 104.6777, time 119.61ms
iter 79710: loss 102.4401, time 119.54ms
iter 79720: loss 98.9191, time 118.64ms
iter 79730: loss 90.4143, time 118.94ms
iter 79740: loss 124.9749, time 120.02ms
step 79750: train loss 43.1523, val loss 42.9843
saving checkpoint to out-shakespeare-char
iter 79750: loss 88.0394, time 2853.13ms
iter 79760: loss 82.3201, time 122.89ms
iter 79770: loss 66.4013, time 118.77ms
iter 79780: loss 53.9023, time 121.24ms
iter 79790: loss 70.5366, time 120.84ms
tensor(0.9961)
iter 79800: loss 86.3790, time 121.20ms
iter 79810: loss 95.6633, time 120.62ms
iter 79820: loss 93.4993, time 118.67ms
iter 79830: loss 87.5195, time 119.21ms
iter 79840: loss 104.4664, time 119.05ms
iter 79850: loss 88.1707, time 118.74ms
iter 79860: loss 79.5980, time 118.84ms
iter 79870: loss 85.0619, time 118.94ms
iter 79880: loss 89.6676, time 118.15ms
iter 79890: loss 91.4446, time 118.52ms
tensor(0.9990)
iter 79900: loss 98.1908, time 119.03ms
iter 79910: loss 94.5161, time 117.69ms
iter 79920: loss 74.1503, time 118.76ms
iter 79930: loss 94.5285, time 118.63ms
iter 79940: loss 70.6244, time 118.66ms
iter 79950: loss 89.7531, time 118.59ms
iter 79960: loss 104.2037, time 118.56ms
iter 79970: loss 99.2445, time 118.51ms
iter 79980: loss 115.3309, time 118.70ms
iter 79990: loss 94.1595, time 118.83ms
tensor(1.)
step 80000: train loss 54.9491, val loss 54.7428
saving checkpoint to out-shakespeare-char
iter 80000: loss 87.9236, time 2822.33ms
iter 80010: loss 108.8823, time 118.79ms
iter 80020: loss 112.2614, time 118.62ms
iter 80030: loss 119.9005, time 118.66ms
iter 80040: loss 91.0210, time 118.03ms
iter 80050: loss 77.0834, time 118.46ms
iter 80060: loss 98.3968, time 118.59ms
iter 80070: loss 100.6712, time 118.49ms
iter 80080: loss 95.4400, time 118.58ms
iter 80090: loss 118.1185, time 118.45ms
tensor(0.9990)
iter 80100: loss 77.4337, time 119.11ms
iter 80110: loss 109.3122, time 119.25ms
iter 80120: loss 90.6069, time 118.51ms
iter 80130: loss 138.0430, time 118.66ms
iter 80140: loss 118.5819, time 118.55ms
iter 80150: loss 95.9880, time 118.65ms
iter 80160: loss 91.7603, time 118.75ms
iter 80170: loss 83.5174, time 118.55ms
iter 80180: loss 117.6360, time 118.60ms
iter 80190: loss 90.4990, time 118.67ms
tensor(0.9961)
iter 80200: loss 105.5479, time 118.82ms
iter 80210: loss 92.5610, time 118.49ms
iter 80220: loss 100.1177, time 118.48ms
iter 80230: loss 105.2780, time 118.62ms
iter 80240: loss 97.9108, time 118.26ms
step 80250: train loss 78.1818, val loss 77.7970
saving checkpoint to out-shakespeare-char
iter 80250: loss 118.9643, time 2850.47ms
iter 80260: loss 107.8547, time 120.69ms
iter 80270: loss 155.1270, time 121.67ms
iter 80280: loss 112.2033, time 121.82ms
iter 80290: loss 123.1838, time 122.20ms
tensor(0.9911)
iter 80300: loss 105.0338, time 122.09ms
iter 80310: loss 111.3557, time 120.60ms
iter 80320: loss 99.6471, time 122.29ms
iter 80330: loss 120.7700, time 121.81ms
iter 80340: loss 145.6588, time 122.31ms
iter 80350: loss 125.4408, time 121.37ms
iter 80360: loss 164.7010, time 120.79ms
iter 80370: loss 105.1322, time 121.66ms
iter 80380: loss 600.5383, time 122.39ms
iter 80390: loss 146.2051, time 122.06ms
tensor(0.9843)
iter 80400: loss 118.7392, time 122.11ms
iter 80410: loss 114.7761, time 120.82ms
iter 80420: loss 81.0830, time 121.94ms
iter 80430: loss 80.8747, time 122.03ms
iter 80440: loss 79.3752, time 121.94ms
iter 80450: loss 80.5121, time 121.42ms
iter 80460: loss 68.9367, time 120.85ms
iter 80470: loss 56.2520, time 122.00ms
iter 80480: loss 89.8033, time 121.91ms
iter 80490: loss 103.9686, time 122.34ms
tensor(0.9755)
step 80500: train loss 49.7780, val loss 50.2963
saving checkpoint to out-shakespeare-char
iter 80500: loss 91.2220, time 2847.91ms
iter 80510: loss 125.5624, time 118.99ms
iter 80520: loss 277.0956, time 119.26ms
iter 80530: loss 129.1038, time 118.61ms
iter 80540: loss 105.4691, time 118.62ms
iter 80550: loss 91.1037, time 119.25ms
iter 80560: loss 104.5362, time 119.14ms
iter 80570: loss 86.3565, time 118.71ms
iter 80580: loss 84.8925, time 118.65ms
iter 80590: loss 113.5982, time 118.71ms
tensor(0.9649)
iter 80600: loss 85.5611, time 119.04ms
iter 80610: loss 87.2710, time 118.57ms
iter 80620: loss 106.5708, time 118.87ms
iter 80630: loss 101.7802, time 118.89ms
iter 80640: loss 91.7636, time 119.19ms
iter 80650: loss 86.2264, time 119.34ms
iter 80660: loss 90.4548, time 119.58ms
iter 80670: loss 101.3645, time 118.82ms
iter 80680: loss 120.6289, time 119.77ms
iter 80690: loss 97.4536, time 120.30ms
tensor(0.9524)
iter 80700: loss 100.3952, time 121.26ms
iter 80710: loss 93.6150, time 120.58ms
iter 80720: loss 88.2436, time 119.57ms
iter 80730: loss 84.8762, time 120.98ms
iter 80740: loss 97.2615, time 121.96ms
step 80750: train loss 40.2566, val loss 40.5757
saving checkpoint to out-shakespeare-char
iter 80750: loss 73.2662, time 2844.65ms
iter 80760: loss 83.0634, time 121.06ms
iter 80770: loss 100.6097, time 118.71ms
iter 80780: loss 105.1977, time 121.00ms
iter 80790: loss 95.4877, time 120.86ms
tensor(0.9382)
iter 80800: loss 109.7088, time 119.28ms
iter 80810: loss 103.2794, time 118.61ms
iter 80820: loss 130.8181, time 118.71ms
iter 80830: loss 97.8256, time 118.88ms
iter 80840: loss 90.7555, time 118.64ms
iter 80850: loss 95.8010, time 118.79ms
iter 80860: loss 98.9651, time 117.91ms
iter 80870: loss 88.8787, time 118.80ms
iter 80880: loss 107.4559, time 118.02ms
iter 80890: loss 84.8934, time 118.54ms
tensor(0.9222)
iter 80900: loss 99.1728, time 118.04ms
iter 80910: loss 110.9277, time 118.51ms
iter 80920: loss 103.5977, time 119.04ms
iter 80930: loss 110.4732, time 118.91ms
iter 80940: loss 81.6684, time 118.78ms
iter 80950: loss 82.5715, time 118.74ms
iter 80960: loss 89.0356, time 118.20ms
iter 80970: loss 84.1229, time 118.66ms
iter 80980: loss 88.8192, time 120.62ms
iter 80990: loss 75.7570, time 118.73ms
tensor(0.9045)
step 81000: train loss 55.0359, val loss 55.1126
saving checkpoint to out-shakespeare-char
iter 81000: loss 95.4998, time 2841.21ms
iter 81010: loss 75.3234, time 119.41ms
iter 81020: loss 94.0368, time 119.09ms
iter 81030: loss 95.9357, time 119.20ms
iter 81040: loss 71.7897, time 119.71ms
iter 81050: loss 83.9933, time 119.30ms
iter 81060: loss 76.3797, time 119.43ms
iter 81070: loss 90.6435, time 119.31ms
iter 81080: loss 75.9104, time 119.35ms
iter 81090: loss 89.2874, time 119.59ms
tensor(0.8853)
iter 81100: loss 103.5770, time 120.20ms
iter 81110: loss 82.3887, time 119.85ms
iter 81120: loss 100.9922, time 119.72ms
iter 81130: loss 75.5138, time 119.74ms
iter 81140: loss 74.9107, time 119.71ms
iter 81150: loss 72.4265, time 119.78ms
iter 81160: loss 67.7017, time 119.68ms
iter 81170: loss 84.0895, time 119.94ms
iter 81180: loss 76.6837, time 119.61ms
iter 81190: loss 82.0444, time 119.61ms
tensor(0.8645)
iter 81200: loss 85.8759, time 119.96ms
iter 81210: loss 101.8554, time 119.69ms
iter 81220: loss 79.5686, time 119.69ms
iter 81230: loss 67.1875, time 119.64ms
iter 81240: loss 69.3914, time 119.60ms
step 81250: train loss 53.8972, val loss 53.7242
saving checkpoint to out-shakespeare-char
iter 81250: loss 88.1142, time 2847.00ms
iter 81260: loss 80.3848, time 120.38ms
iter 81270: loss 78.2413, time 122.46ms
iter 81280: loss 88.5320, time 121.71ms
iter 81290: loss 61.6327, time 121.68ms
tensor(0.8423)
iter 81300: loss 89.8014, time 121.87ms
iter 81310: loss 70.8370, time 120.56ms
iter 81320: loss 70.0231, time 121.26ms
iter 81330: loss 64.9733, time 122.63ms
iter 81340: loss 80.5691, time 122.02ms
iter 81350: loss 76.6614, time 122.03ms
iter 81360: loss 66.0199, time 120.60ms
iter 81370: loss 79.2287, time 121.95ms
iter 81380: loss 67.7756, time 121.86ms
iter 81390: loss 63.6180, time 122.08ms
tensor(0.8187)
iter 81400: loss 66.0786, time 122.13ms
iter 81410: loss 68.6940, time 120.91ms
iter 81420: loss 74.4165, time 121.89ms
iter 81430: loss 85.9287, time 120.89ms
iter 81440: loss 138.1590, time 122.06ms
iter 81450: loss 79.7380, time 122.11ms
iter 81460: loss 95.8992, time 120.65ms
iter 81470: loss 71.0220, time 121.96ms
iter 81480: loss 60.7266, time 121.87ms
iter 81490: loss 77.0872, time 121.95ms
tensor(0.7939)
step 81500: train loss 54.6897, val loss 54.1213
saving checkpoint to out-shakespeare-char
iter 81500: loss 70.7091, time 2852.75ms
iter 81510: loss 69.8465, time 117.90ms
iter 81520: loss 121.3295, time 119.08ms
iter 81530: loss 74.3852, time 119.51ms
iter 81540: loss 65.2220, time 118.67ms
iter 81550: loss 75.0287, time 118.43ms
iter 81560: loss 66.8465, time 118.67ms
iter 81570: loss 69.1439, time 117.76ms
iter 81580: loss 65.6406, time 118.55ms
iter 81590: loss 60.9046, time 118.65ms
tensor(0.7679)
iter 81600: loss 56.6091, time 118.73ms
iter 81610: loss 52.0878, time 118.99ms
iter 81620: loss 78.5069, time 118.48ms
iter 81630: loss 64.5052, time 118.74ms
iter 81640: loss 62.0330, time 119.11ms
iter 81650: loss 66.1167, time 119.07ms
iter 81660: loss 57.5808, time 119.23ms
iter 81670: loss 66.8131, time 118.63ms
iter 81680: loss 58.2148, time 118.89ms
iter 81690: loss 51.7801, time 119.01ms
tensor(0.7409)
iter 81700: loss 60.3844, time 118.96ms
iter 81710: loss 64.1383, time 118.82ms
iter 81720: loss 48.4075, time 118.48ms
iter 81730: loss 50.6309, time 119.28ms
iter 81740: loss 59.5107, time 118.87ms
step 81750: train loss 38.7102, val loss 38.5717
saving checkpoint to out-shakespeare-char
iter 81750: loss 44.6226, time 2835.84ms
iter 81760: loss 61.7789, time 120.40ms
iter 81770: loss 64.4925, time 120.16ms
iter 81780: loss 43.5202, time 121.21ms
iter 81790: loss 64.5361, time 121.57ms
tensor(0.7129)
iter 81800: loss 56.5198, time 122.46ms
iter 81810: loss 53.0746, time 122.10ms
iter 81820: loss 50.5664, time 119.52ms
iter 81830: loss 52.6015, time 121.09ms
iter 81840: loss 44.8322, time 121.08ms
iter 81850: loss 40.0103, time 120.74ms
iter 81860: loss 41.0541, time 119.74ms
iter 81870: loss 51.4704, time 117.89ms
iter 81880: loss 60.5653, time 121.04ms
iter 81890: loss 60.5093, time 120.53ms
tensor(0.6841)
iter 81900: loss 53.0058, time 121.05ms
iter 81910: loss 58.9661, time 120.91ms
iter 81920: loss 52.0799, time 118.56ms
iter 81930: loss 51.9615, time 120.62ms
iter 81940: loss 54.9297, time 120.87ms
iter 81950: loss 57.8515, time 120.65ms
iter 81960: loss 43.3555, time 120.56ms
iter 81970: loss 44.1957, time 118.66ms
iter 81980: loss 54.1884, time 120.39ms
iter 81990: loss 49.5587, time 120.89ms
tensor(0.6545)
step 82000: train loss 25.3904, val loss 25.2826
saving checkpoint to out-shakespeare-char
iter 82000: loss 51.2507, time 2835.86ms
iter 82010: loss 39.3068, time 119.30ms
iter 82020: loss 43.3191, time 118.64ms
iter 82030: loss 47.2492, time 118.63ms
iter 82040: loss 55.7517, time 118.74ms
iter 82050: loss 49.7038, time 117.62ms
iter 82060: loss 46.1375, time 118.84ms
iter 82070: loss 50.0572, time 118.20ms
iter 82080: loss 42.8699, time 118.64ms
iter 82090: loss 42.4326, time 118.20ms
tensor(0.6243)
iter 82100: loss 46.4329, time 119.02ms
iter 82110: loss 35.5700, time 119.07ms
iter 82120: loss 36.0809, time 118.90ms
iter 82130: loss 49.8315, time 119.13ms
iter 82140: loss 45.2122, time 119.30ms
iter 82150: loss 50.8792, time 119.54ms
iter 82160: loss 38.2384, time 119.79ms
iter 82170: loss 47.8134, time 119.71ms
iter 82180: loss 40.2905, time 119.71ms
iter 82190: loss 50.9111, time 119.77ms
tensor(0.5937)
iter 82200: loss 70.8097, time 120.09ms
iter 82210: loss 47.9222, time 120.21ms
iter 82220: loss 40.9948, time 119.63ms
iter 82230: loss 39.0458, time 119.95ms
iter 82240: loss 42.3589, time 119.81ms
step 82250: train loss 23.5899, val loss 23.4796
saving checkpoint to out-shakespeare-char
iter 82250: loss 36.3312, time 2833.52ms
iter 82260: loss 51.9257, time 120.98ms
iter 82270: loss 37.2626, time 120.18ms
iter 82280: loss 33.9032, time 121.04ms
iter 82290: loss 60.3434, time 120.95ms
tensor(0.5627)
iter 82300: loss 48.9915, time 123.00ms
iter 82310: loss 39.6917, time 120.68ms
iter 82320: loss 43.0490, time 122.09ms
iter 82330: loss 42.6413, time 121.70ms
iter 82340: loss 44.3816, time 121.40ms
iter 82350: loss 35.3771, time 122.08ms
iter 82360: loss 34.2310, time 115.33ms
iter 82370: loss 42.5427, time 115.70ms
iter 82380: loss 39.2204, time 116.94ms
iter 82390: loss 37.0232, time 115.37ms
tensor(0.5314)
iter 82400: loss 29.7535, time 117.27ms
iter 82410: loss 32.7785, time 118.16ms
iter 82420: loss 29.9093, time 114.79ms
iter 82430: loss 37.6845, time 116.68ms
iter 82440: loss 26.7005, time 117.78ms
iter 82450: loss 29.2998, time 115.91ms
iter 82460: loss 33.7382, time 116.55ms
iter 82470: loss 28.9889, time 115.73ms
iter 82480: loss 27.3225, time 114.82ms
iter 82490: loss 25.9914, time 117.23ms
tensor(0.5000)
step 82500: train loss 14.1358, val loss 13.9435
saving checkpoint to out-shakespeare-char
iter 82500: loss 28.4222, time 2856.87ms
iter 82510: loss 28.7890, time 116.78ms
iter 82520: loss 25.7049, time 115.17ms
iter 82530: loss 22.2279, time 118.40ms
iter 82540: loss 26.8377, time 114.72ms
iter 82550: loss 20.0783, time 116.75ms
iter 82560: loss 25.9175, time 116.91ms
iter 82570: loss 24.6772, time 115.82ms
iter 82580: loss 21.5160, time 116.75ms
iter 82590: loss 21.5222, time 117.37ms
tensor(0.4686)
iter 82600: loss 19.6803, time 115.48ms
iter 82610: loss 21.8547, time 116.74ms
iter 82620: loss 18.5724, time 114.60ms
iter 82630: loss 18.3057, time 116.73ms
iter 82640: loss 22.0788, time 117.59ms
iter 82650: loss 22.5966, time 115.78ms
iter 82660: loss 20.0685, time 116.71ms
iter 82670: loss 17.9781, time 117.24ms
iter 82680: loss 17.4266, time 114.86ms
iter 82690: loss 19.7075, time 116.70ms
tensor(0.4373)
iter 82700: loss 17.5837, time 116.88ms
iter 82710: loss 15.8966, time 114.71ms
iter 82720: loss 15.6470, time 117.41ms
iter 82730: loss 14.8408, time 116.93ms
iter 82740: loss 13.6631, time 116.66ms
step 82750: train loss 8.2374, val loss 8.2325
saving checkpoint to out-shakespeare-char
iter 82750: loss 12.8106, time 2841.10ms
iter 82760: loss 13.0070, time 114.51ms
iter 82770: loss 14.4986, time 115.09ms
iter 82780: loss 13.7946, time 118.04ms
iter 82790: loss 13.3398, time 115.70ms
tensor(0.4063)
iter 82800: loss 12.7978, time 117.08ms
iter 82810: loss 12.0804, time 116.20ms
iter 82820: loss 12.7783, time 115.26ms
iter 82830: loss 12.9094, time 116.93ms
iter 82840: loss 12.3490, time 114.81ms
iter 82850: loss 12.9084, time 116.56ms
iter 82860: loss 10.4403, time 116.77ms
iter 82870: loss 11.3178, time 115.33ms
iter 82880: loss 11.4978, time 116.65ms
iter 82890: loss 11.9680, time 115.76ms
tensor(0.3757)
iter 82900: loss 11.9481, time 116.97ms
iter 82910: loss 11.6015, time 118.04ms
iter 82920: loss 10.8651, time 115.81ms
iter 82930: loss 10.8007, time 113.96ms
iter 82940: loss 12.4680, time 116.50ms
iter 82950: loss 10.8300, time 115.10ms
iter 82960: loss 10.9295, time 116.81ms
iter 82970: loss 11.0933, time 115.88ms
iter 82980: loss 10.1751, time 116.77ms
iter 82990: loss 10.0546, time 116.03ms
tensor(0.3455)
step 83000: train loss 7.9282, val loss 7.9253
saving checkpoint to out-shakespeare-char
iter 83000: loss 10.8180, time 2833.41ms
iter 83010: loss 10.7979, time 115.21ms
iter 83020: loss 9.9356, time 116.44ms
iter 83030: loss 10.2864, time 116.17ms
iter 83040: loss 10.5634, time 115.37ms
iter 83050: loss 9.8311, time 117.06ms
iter 83060: loss 10.1087, time 115.45ms
iter 83070: loss 9.4767, time 115.65ms
iter 83080: loss 10.5150, time 118.05ms
iter 83090: loss 10.2558, time 115.92ms
tensor(0.3159)
iter 83100: loss 9.5625, time 116.99ms
iter 83110: loss 10.1970, time 117.35ms
iter 83120: loss 9.3592, time 115.74ms
iter 83130: loss 11.0468, time 114.74ms
iter 83140: loss 9.9216, time 118.06ms
iter 83150: loss 9.6697, time 115.83ms
iter 83160: loss 9.3799, time 116.81ms
iter 83170: loss 9.9647, time 117.03ms
iter 83180: loss 10.0559, time 115.55ms
iter 83190: loss 9.1996, time 114.82ms
tensor(0.2871)
iter 83200: loss 9.0790, time 117.12ms
iter 83210: loss 9.2338, time 115.66ms
iter 83220: loss 10.1676, time 116.92ms
iter 83230: loss 9.9602, time 115.83ms
iter 83240: loss 10.1325, time 115.81ms
step 83250: train loss 7.8710, val loss 7.8732
saving checkpoint to out-shakespeare-char
iter 83250: loss 9.7800, time 2851.08ms
iter 83260: loss 9.4047, time 118.35ms
iter 83270: loss 9.3646, time 116.01ms
iter 83280: loss 9.5356, time 116.71ms
iter 83290: loss 9.4257, time 117.80ms
tensor(0.2591)
iter 83300: loss 8.8497, time 116.50ms
iter 83310: loss 9.1881, time 116.63ms
iter 83320: loss 9.8277, time 116.93ms
iter 83330: loss 9.8487, time 116.05ms
iter 83340: loss 9.2639, time 116.53ms
iter 83350: loss 9.5414, time 115.95ms
iter 83360: loss 8.5115, time 114.95ms
iter 83370: loss 9.0386, time 116.95ms
iter 83380: loss 8.7587, time 115.99ms
iter 83390: loss 8.9729, time 114.64ms
tensor(0.2321)
iter 83400: loss 9.2371, time 119.03ms
iter 83410: loss 9.0797, time 115.71ms
iter 83420: loss 9.5710, time 117.06ms
iter 83430: loss 8.7721, time 116.91ms
iter 83440: loss 9.1590, time 115.49ms
iter 83450: loss 8.7137, time 114.62ms
iter 83460: loss 9.2964, time 116.74ms
iter 83470: loss 9.0036, time 114.85ms
iter 83480: loss 9.2715, time 117.19ms
iter 83490: loss 9.1255, time 115.76ms
tensor(0.2061)
step 83500: train loss 7.8135, val loss 7.8242
saving checkpoint to out-shakespeare-char
iter 83500: loss 9.1025, time 2836.67ms
iter 83510: loss 8.7927, time 116.76ms
iter 83520: loss 8.5671, time 116.37ms
iter 83530: loss 8.9942, time 114.80ms
iter 83540: loss 8.9639, time 117.03ms
iter 83550: loss 8.1578, time 115.88ms
iter 83560: loss 9.2482, time 116.65ms
iter 83570: loss 8.8884, time 117.78ms
iter 83580: loss 9.0455, time 115.66ms
iter 83590: loss 8.4882, time 115.25ms
tensor(0.1813)
iter 83600: loss 8.8170, time 117.04ms
iter 83610: loss 8.9249, time 115.38ms
iter 83620: loss 9.0466, time 116.85ms
iter 83630: loss 8.8426, time 115.78ms
iter 83640: loss 8.6760, time 116.82ms
iter 83650: loss 8.7336, time 115.95ms
iter 83660: loss 8.8245, time 115.90ms
iter 83670: loss 8.7448, time 116.71ms
iter 83680: loss 9.6333, time 116.11ms
iter 83690: loss 8.9547, time 115.54ms
tensor(0.1577)
iter 83700: loss 9.2991, time 117.10ms
iter 83710: loss 8.7589, time 115.78ms
iter 83720: loss 9.0341, time 117.40ms
iter 83730: loss 8.9363, time 117.14ms
iter 83740: loss 9.0723, time 115.58ms
step 83750: train loss 7.7727, val loss 7.8102
saving checkpoint to out-shakespeare-char
iter 83750: loss 7.9296, time 2834.78ms
iter 83760: loss 8.5876, time 117.07ms
iter 83770: loss 8.8050, time 116.95ms
iter 83780: loss 8.9202, time 115.19ms
iter 83790: loss 9.3791, time 114.85ms
tensor(0.1355)
iter 83800: loss 8.9658, time 116.07ms
iter 83810: loss 8.6316, time 116.13ms
iter 83820: loss 8.7627, time 115.41ms
iter 83830: loss 8.5972, time 115.43ms
iter 83840: loss 9.1050, time 114.81ms
iter 83850: loss 8.8223, time 115.25ms
iter 83860: loss 8.9165, time 115.92ms
iter 83870: loss 8.2546, time 114.99ms
iter 83880: loss 8.5425, time 114.98ms
iter 83890: loss 8.5600, time 115.49ms
tensor(0.1147)
iter 83900: loss 8.0587, time 115.49ms
iter 83910: loss 8.5076, time 116.02ms
iter 83920: loss 8.9683, time 114.61ms
iter 83930: loss 8.8107, time 115.40ms
iter 83940: loss 8.6390, time 113.61ms
iter 83950: loss 8.7398, time 115.04ms
iter 83960: loss 8.0718, time 115.50ms
iter 83970: loss 8.3802, time 115.91ms
iter 83980: loss 8.9295, time 115.67ms
iter 83990: loss 8.3203, time 114.86ms
tensor(0.0955)
step 84000: train loss 7.7803, val loss 7.8141
saving checkpoint to out-shakespeare-char
iter 84000: loss 8.2987, time 2822.82ms
iter 84010: loss 7.8189, time 122.60ms
iter 84020: loss 8.7100, time 120.92ms
iter 84030: loss 8.3095, time 122.04ms
iter 84040: loss 8.9028, time 120.74ms
iter 84050: loss 9.0354, time 120.80ms
iter 84060: loss 8.8938, time 120.78ms
iter 84070: loss 8.1292, time 120.85ms
iter 84080: loss 9.1181, time 120.71ms
iter 84090: loss 8.5539, time 120.59ms
tensor(0.0778)
iter 84100: loss 9.0324, time 121.26ms
iter 84110: loss 8.3330, time 120.64ms
iter 84120: loss 9.0886, time 120.90ms
iter 84130: loss 8.7434, time 118.99ms
iter 84140: loss 8.8570, time 118.66ms
iter 84150: loss 8.8799, time 118.95ms
iter 84160: loss 8.7530, time 118.78ms
iter 84170: loss 8.1780, time 119.07ms
iter 84180: loss 8.2322, time 119.28ms
iter 84190: loss 8.5433, time 119.71ms
tensor(0.0618)
iter 84200: loss 8.2830, time 120.40ms
iter 84210: loss 9.1301, time 119.95ms
iter 84220: loss 8.8210, time 120.36ms
iter 84230: loss 8.3298, time 120.05ms
iter 84240: loss 8.7576, time 119.92ms
step 84250: train loss 7.6480, val loss 7.6850
saving checkpoint to out-shakespeare-char
iter 84250: loss 8.4506, time 2849.26ms
iter 84260: loss 8.3916, time 121.84ms
iter 84270: loss 8.2648, time 121.35ms
iter 84280: loss 8.6171, time 121.20ms
iter 84290: loss 8.5260, time 120.94ms
tensor(0.0476)
iter 84300: loss 8.8154, time 121.26ms
iter 84310: loss 9.1625, time 119.77ms
iter 84320: loss 8.4169, time 120.76ms
iter 84330: loss 8.3263, time 118.13ms
iter 84340: loss 8.5338, time 118.72ms
iter 84350: loss 8.3186, time 118.30ms
iter 84360: loss 8.5177, time 119.72ms
iter 84370: loss 7.8272, time 120.48ms
iter 84380: loss 8.6752, time 120.84ms
iter 84390: loss 8.5136, time 119.78ms
tensor(0.0351)
iter 84400: loss 8.4003, time 121.23ms
iter 84410: loss 8.4314, time 120.67ms
iter 84420: loss 8.3863, time 119.38ms
iter 84430: loss 8.4836, time 120.59ms
iter 84440: loss 7.7322, time 120.70ms
iter 84450: loss 8.5283, time 120.43ms
iter 84460: loss 8.3803, time 120.39ms
iter 84470: loss 8.2899, time 119.51ms
iter 84480: loss 9.1401, time 120.80ms
iter 84490: loss 8.4193, time 120.47ms
tensor(0.0245)
step 84500: train loss 7.6519, val loss 7.6578
saving checkpoint to out-shakespeare-char
iter 84500: loss 8.8072, time 2843.18ms
iter 84510: loss 8.2730, time 119.02ms
iter 84520: loss 8.4602, time 119.19ms
iter 84530: loss 8.9799, time 119.82ms
iter 84540: loss 7.9717, time 118.31ms
iter 84550: loss 8.9552, time 120.83ms
iter 84560: loss 8.7388, time 119.55ms
iter 84570: loss 8.6762, time 120.08ms
iter 84580: loss 8.6101, time 120.87ms
iter 84590: loss 8.1530, time 119.97ms
tensor(0.0157)
iter 84600: loss 8.4754, time 121.81ms
iter 84610: loss 8.4065, time 121.49ms
iter 84620: loss 8.4666, time 121.55ms
iter 84630: loss 7.8099, time 121.92ms
iter 84640: loss 8.3970, time 121.42ms
iter 84650: loss 8.9304, time 121.56ms
iter 84660: loss 8.8097, time 119.30ms
iter 84670: loss 8.7117, time 119.28ms
iter 84680: loss 8.4423, time 118.65ms
iter 84690: loss 8.5403, time 119.77ms
tensor(0.0089)
iter 84700: loss 8.8153, time 119.80ms
iter 84710: loss 7.4970, time 119.25ms
iter 84720: loss 8.3936, time 120.63ms
iter 84730: loss 8.3179, time 120.88ms
iter 84740: loss 8.4473, time 121.82ms
step 84750: train loss 7.6579, val loss 7.7177
saving checkpoint to out-shakespeare-char
iter 84750: loss 8.4477, time 2863.83ms
iter 84760: loss 8.2918, time 119.00ms
iter 84770: loss 8.5044, time 118.74ms
iter 84780: loss 8.5293, time 119.43ms
iter 84790: loss 9.1330, time 118.12ms
tensor(0.0039)
iter 84800: loss 8.8401, time 120.92ms
iter 84810: loss 8.5070, time 120.62ms
iter 84820: loss 8.2978, time 121.02ms
iter 84830: loss 7.8991, time 120.50ms
iter 84840: loss 8.5702, time 123.31ms
iter 84850: loss 8.2370, time 123.88ms
iter 84860: loss 8.4227, time 121.56ms
iter 84870: loss 8.5931, time 122.32ms
iter 84880: loss 8.7409, time 119.57ms
iter 84890: loss 8.2097, time 119.33ms
tensor(0.0010)
iter 84900: loss 8.1227, time 119.55ms
iter 84910: loss 8.2950, time 119.05ms
iter 84920: loss 8.5170, time 119.74ms
iter 84930: loss 8.6105, time 120.23ms
iter 84940: loss 7.9389, time 119.94ms
iter 84950: loss 8.0847, time 121.56ms
iter 84960: loss 8.4513, time 122.93ms
iter 84970: loss 8.1534, time 122.60ms
iter 84980: loss 8.2101, time 121.97ms
iter 84990: loss 8.6221, time 121.54ms
tensor(0.0010)
step 85000: train loss 7.6782, val loss 7.6772
saving checkpoint to out-shakespeare-char
iter 85000: loss 8.4547, time 2858.14ms
iter 85010: loss 8.0844, time 118.36ms
iter 85020: loss 8.1096, time 120.31ms
iter 85030: loss 8.5367, time 120.48ms
iter 85040: loss 7.9183, time 121.38ms
iter 85050: loss 7.8129, time 119.17ms
iter 85060: loss 8.0237, time 121.19ms
iter 85070: loss 8.9709, time 122.04ms
iter 85080: loss 8.6278, time 122.32ms
iter 85090: loss 9.0295, time 122.57ms
tensor(0.0010)
iter 85100: loss 7.7780, time 121.36ms
iter 85110: loss 8.1608, time 122.60ms
iter 85120: loss 8.0431, time 122.55ms
iter 85130: loss 9.4312, time 121.22ms
iter 85140: loss 8.3482, time 120.36ms
iter 85150: loss 8.0754, time 119.51ms
iter 85160: loss 8.3103, time 119.06ms
iter 85170: loss 8.2577, time 119.62ms
iter 85180: loss 8.4492, time 119.15ms
iter 85190: loss 8.7242, time 120.22ms
tensor(0.0039)
iter 85200: loss 8.1920, time 120.69ms
iter 85210: loss 8.0064, time 120.36ms
iter 85220: loss 8.1762, time 121.03ms
iter 85230: loss 8.3472, time 119.40ms
iter 85240: loss 9.3595, time 121.44ms
step 85250: train loss 7.6857, val loss 7.6438
saving checkpoint to out-shakespeare-char
iter 85250: loss 7.5011, time 2851.10ms
iter 85260: loss 8.4935, time 121.14ms
iter 85270: loss 7.9669, time 120.64ms
iter 85280: loss 7.8475, time 120.82ms
iter 85290: loss 8.0596, time 121.14ms
tensor(0.0089)
iter 85300: loss 8.0347, time 121.25ms
iter 85310: loss 7.8044, time 119.09ms
iter 85320: loss 8.9100, time 118.85ms
iter 85330: loss 8.0278, time 118.77ms
iter 85340: loss 8.4199, time 118.82ms
iter 85350: loss 8.1052, time 116.76ms
iter 85360: loss 8.2578, time 118.73ms
iter 85370: loss 8.3989, time 118.98ms
iter 85380: loss 8.4322, time 119.50ms
iter 85390: loss 8.7119, time 119.95ms
tensor(0.0157)
iter 85400: loss 8.1631, time 120.42ms
iter 85410: loss 8.1633, time 120.18ms
iter 85420: loss 8.1463, time 120.46ms
iter 85430: loss 7.9550, time 120.98ms
iter 85440: loss 8.2199, time 120.21ms
iter 85450: loss 7.9983, time 122.30ms
iter 85460: loss 8.8539, time 122.14ms
iter 85470: loss 8.0408, time 122.74ms
iter 85480: loss 8.9204, time 122.24ms
iter 85490: loss 8.5695, time 120.06ms
tensor(0.0245)
step 85500: train loss 7.6980, val loss 7.6437
saving checkpoint to out-shakespeare-char
iter 85500: loss 8.8284, time 2838.36ms
iter 85510: loss 8.0589, time 119.45ms
iter 85520: loss 8.0872, time 118.85ms
iter 85530: loss 8.5498, time 118.95ms
iter 85540: loss 8.5281, time 119.18ms
iter 85550: loss 8.1425, time 119.07ms
iter 85560: loss 8.6171, time 119.85ms
iter 85570: loss 8.3411, time 120.16ms
iter 85580: loss 7.9780, time 120.54ms
iter 85590: loss 8.0895, time 121.08ms
tensor(0.0351)
iter 85600: loss 8.1347, time 120.49ms
iter 85610: loss 8.6106, time 121.70ms
iter 85620: loss 8.3660, time 122.14ms
iter 85630: loss 8.0445, time 122.44ms
iter 85640: loss 7.9022, time 122.68ms
iter 85650: loss 8.4062, time 120.24ms
iter 85660: loss 8.0786, time 122.39ms
iter 85670: loss 7.7037, time 121.35ms
iter 85680: loss 7.6766, time 121.10ms
iter 85690: loss 7.9968, time 121.24ms
tensor(0.0476)
iter 85700: loss 8.6727, time 119.19ms
iter 85710: loss 8.2014, time 119.51ms
iter 85720: loss 8.2668, time 119.78ms
iter 85730: loss 8.2870, time 119.60ms
iter 85740: loss 8.1608, time 121.18ms
step 85750: train loss 7.6991, val loss 7.6898
saving checkpoint to out-shakespeare-char
iter 85750: loss 8.0769, time 2863.78ms
iter 85760: loss 8.6527, time 118.84ms
iter 85770: loss 7.8754, time 119.56ms
iter 85780: loss 8.1953, time 119.50ms
iter 85790: loss 8.4293, time 119.82ms
tensor(0.0618)
iter 85800: loss 8.2896, time 120.63ms
iter 85810: loss 8.6339, time 121.05ms
iter 85820: loss 8.8515, time 120.18ms
iter 85830: loss 9.2282, time 122.10ms
iter 85840: loss 8.5636, time 123.33ms
iter 85850: loss 8.0747, time 122.13ms
iter 85860: loss 8.3644, time 122.72ms
iter 85870: loss 8.3030, time 121.49ms
iter 85880: loss 8.5188, time 121.41ms
iter 85890: loss 7.5516, time 119.92ms
tensor(0.0778)
iter 85900: loss 8.4276, time 119.74ms
iter 85910: loss 8.3802, time 119.64ms
iter 85920: loss 7.8918, time 120.35ms
iter 85930: loss 8.5533, time 120.39ms
iter 85940: loss 8.4165, time 120.85ms
iter 85950: loss 8.2540, time 119.70ms
iter 85960: loss 8.4125, time 121.28ms
iter 85970: loss 7.7374, time 121.85ms
iter 85980: loss 9.1199, time 123.70ms
iter 85990: loss 7.9110, time 120.85ms
tensor(0.0955)
step 86000: train loss 7.7372, val loss 7.7283
saving checkpoint to out-shakespeare-char
iter 86000: loss 7.8955, time 2836.84ms
iter 86010: loss 7.9791, time 120.69ms
iter 86020: loss 8.2857, time 120.69ms
iter 86030: loss 8.7426, time 121.13ms
iter 86040: loss 8.9815, time 122.51ms
iter 86050: loss 8.0715, time 122.74ms
iter 86060: loss 8.2572, time 120.68ms
iter 86070: loss 8.7236, time 122.33ms
iter 86080: loss 8.9625, time 121.72ms
iter 86090: loss 8.4723, time 120.37ms
tensor(0.1147)
iter 86100: loss 8.3788, time 121.29ms
iter 86110: loss 8.3892, time 118.25ms
iter 86120: loss 8.7169, time 118.20ms
iter 86130: loss 8.6216, time 118.96ms
iter 86140: loss 8.0306, time 119.02ms
iter 86150: loss 8.3014, time 120.08ms
iter 86160: loss 8.0295, time 120.27ms
iter 86170: loss 8.2442, time 118.70ms
iter 86180: loss 8.4435, time 121.48ms
iter 86190: loss 8.8396, time 122.40ms
tensor(0.1355)
iter 86200: loss 8.4104, time 123.03ms
iter 86210: loss 7.8520, time 122.64ms
iter 86220: loss 8.6578, time 121.32ms
iter 86230: loss 8.7771, time 122.67ms
iter 86240: loss 8.3052, time 123.09ms
step 86250: train loss 7.7747, val loss 7.7974
saving checkpoint to out-shakespeare-char
iter 86250: loss 8.4824, time 2830.54ms
iter 86260: loss 9.1386, time 120.79ms
iter 86270: loss 8.1128, time 118.81ms
iter 86280: loss 8.4297, time 119.93ms
iter 86290: loss 8.4917, time 118.86ms
tensor(0.1577)
iter 86300: loss 8.0165, time 118.80ms
iter 86310: loss 8.0196, time 118.79ms
iter 86320: loss 8.2201, time 118.68ms
iter 86330: loss 7.8783, time 116.34ms
iter 86340: loss 8.5471, time 118.72ms
iter 86350: loss 8.1492, time 118.82ms
iter 86360: loss 8.6759, time 118.14ms
iter 86370: loss 8.4694, time 118.51ms
iter 86380: loss 7.7590, time 117.88ms
iter 86390: loss 8.2921, time 118.64ms
tensor(0.1813)
iter 86400: loss 8.4875, time 120.72ms
iter 86410: loss 8.1180, time 118.75ms
iter 86420: loss 8.4586, time 117.57ms
iter 86430: loss 8.6382, time 118.59ms
iter 86440: loss 8.3347, time 118.80ms
iter 86450: loss 8.8371, time 118.69ms
iter 86460: loss 7.9550, time 118.17ms
iter 86470: loss 8.8305, time 118.60ms
iter 86480: loss 8.8466, time 117.94ms
iter 86490: loss 8.1978, time 118.96ms
tensor(0.2061)
step 86500: train loss 7.8695, val loss 7.8625
saving checkpoint to out-shakespeare-char
iter 86500: loss 8.2546, time 2837.35ms
iter 86510: loss 7.9734, time 119.76ms
iter 86520: loss 8.2530, time 119.36ms
iter 86530: loss 8.0689, time 119.95ms
iter 86540: loss 8.3231, time 119.32ms
iter 86550: loss 8.4719, time 118.22ms
iter 86560: loss 8.2442, time 118.17ms
iter 86570: loss 8.8566, time 118.45ms
iter 86580: loss 8.5377, time 117.86ms
iter 86590: loss 8.3935, time 119.35ms
tensor(0.2321)
iter 86600: loss 8.1344, time 121.59ms
iter 86610: loss 8.4959, time 119.17ms
iter 86620: loss 8.1205, time 120.89ms
iter 86630: loss 8.8337, time 119.91ms
iter 86640: loss 8.4182, time 119.17ms
iter 86650: loss 8.4871, time 117.66ms
iter 86660: loss 8.2088, time 118.08ms
iter 86670: loss 7.5944, time 117.38ms
iter 86680: loss 8.2340, time 118.88ms
iter 86690: loss 8.8348, time 116.44ms
tensor(0.2591)
iter 86700: loss 8.2620, time 119.54ms
iter 86710: loss 8.4602, time 118.58ms
iter 86720: loss 8.9090, time 119.39ms
iter 86730: loss 8.4344, time 118.13ms
iter 86740: loss 8.6195, time 116.62ms
step 86750: train loss 7.8792, val loss 7.8882
saving checkpoint to out-shakespeare-char
iter 86750: loss 8.4844, time 2840.45ms
iter 86760: loss 8.5088, time 118.91ms
iter 86770: loss 8.4622, time 118.37ms
iter 86780: loss 8.3560, time 118.74ms
iter 86790: loss 8.0541, time 118.00ms
tensor(0.2871)
iter 86800: loss 8.3838, time 119.40ms
iter 86810: loss 8.4864, time 118.38ms
iter 86820: loss 8.3553, time 118.77ms
iter 86830: loss 8.0550, time 118.56ms
iter 86840: loss 8.6308, time 118.73ms
iter 86850: loss 8.5262, time 119.66ms
iter 86860: loss 8.6175, time 117.92ms
iter 86870: loss 8.7340, time 121.06ms
iter 86880: loss 8.1789, time 120.01ms
iter 86890: loss 7.9368, time 119.59ms
tensor(0.3159)
iter 86900: loss 8.7979, time 120.53ms
iter 86910: loss 8.2425, time 118.28ms
iter 86920: loss 7.8951, time 117.50ms
iter 86930: loss 8.3291, time 118.18ms
iter 86940: loss 8.2830, time 120.50ms
iter 86950: loss 8.7040, time 118.87ms
iter 86960: loss 8.2414, time 118.91ms
iter 86970: loss 8.5592, time 116.85ms
iter 86980: loss 8.9024, time 115.33ms
iter 86990: loss 8.2350, time 114.51ms
tensor(0.3455)
step 87000: train loss 7.8796, val loss 7.8876
saving checkpoint to out-shakespeare-char
iter 87000: loss 9.0803, time 2835.41ms
iter 87010: loss 8.7602, time 117.92ms
iter 87020: loss 8.0179, time 114.49ms
iter 87030: loss 8.4187, time 117.20ms
iter 87040: loss 8.6090, time 116.71ms
iter 87050: loss 8.3374, time 120.03ms
iter 87060: loss 8.5972, time 119.85ms
iter 87070: loss 8.2194, time 119.79ms
iter 87080: loss 8.0537, time 120.45ms
iter 87090: loss 8.2634, time 119.77ms
tensor(0.3757)
iter 87100: loss 8.3682, time 120.55ms
iter 87110: loss 8.1952, time 120.39ms
iter 87120: loss 7.8074, time 119.35ms
iter 87130: loss 8.2442, time 119.55ms
iter 87140: loss 8.0820, time 119.30ms
iter 87150: loss 8.3478, time 119.24ms
iter 87160: loss 8.1670, time 119.41ms
iter 87170: loss 8.4241, time 119.45ms
iter 87180: loss 8.0524, time 118.81ms
iter 87190: loss 7.9753, time 118.59ms
tensor(0.4063)
iter 87200: loss 8.4765, time 118.40ms
iter 87210: loss 8.0625, time 118.79ms
iter 87220: loss 8.3414, time 118.76ms
iter 87230: loss 8.0641, time 119.19ms
iter 87240: loss 8.3659, time 118.83ms
step 87250: train loss 7.8802, val loss 7.9328
saving checkpoint to out-shakespeare-char
iter 87250: loss 8.0010, time 2838.97ms
iter 87260: loss 8.1569, time 121.98ms
iter 87270: loss 8.6055, time 120.97ms
iter 87280: loss 7.8823, time 122.04ms
iter 87290: loss 8.5931, time 122.05ms
tensor(0.4373)
iter 87300: loss 7.7344, time 122.24ms
iter 87310: loss 8.7270, time 121.94ms
iter 87320: loss 8.1445, time 121.29ms
iter 87330: loss 7.8645, time 122.04ms
iter 87340: loss 8.2659, time 122.10ms
iter 87350: loss 8.4622, time 121.96ms
iter 87360: loss 8.0313, time 122.34ms
iter 87370: loss 7.9509, time 120.99ms
iter 87380: loss 9.0529, time 122.28ms
iter 87390: loss 9.0780, time 122.09ms
tensor(0.4686)
iter 87400: loss 8.3045, time 122.33ms
iter 87410: loss 8.2779, time 121.42ms
iter 87420: loss 7.8933, time 120.48ms
iter 87430: loss 8.2745, time 120.92ms
iter 87440: loss 8.0811, time 118.72ms
iter 87450: loss 8.7577, time 119.21ms
iter 87460: loss 8.0605, time 118.94ms
iter 87470: loss 7.9950, time 118.90ms
iter 87480: loss 8.6687, time 119.60ms
iter 87490: loss 8.2169, time 119.30ms
tensor(0.5000)
step 87500: train loss 7.9537, val loss 7.9492
saving checkpoint to out-shakespeare-char
iter 87500: loss 8.6707, time 2864.52ms
iter 87510: loss 8.0264, time 121.57ms
iter 87520: loss 8.4377, time 122.40ms
iter 87530: loss 8.9149, time 120.75ms
iter 87540: loss 8.7902, time 121.72ms
iter 87550: loss 8.0365, time 118.95ms
iter 87560: loss 8.0096, time 119.07ms
iter 87570: loss 8.1426, time 119.26ms
iter 87580: loss 8.2588, time 119.44ms
iter 87590: loss 8.1201, time 120.05ms
tensor(0.5314)
iter 87600: loss 8.7728, time 120.59ms
iter 87610: loss 8.4199, time 120.43ms
iter 87620: loss 7.8920, time 121.62ms
iter 87630: loss 8.1813, time 122.29ms
iter 87640: loss 8.1802, time 123.01ms
iter 87650: loss 7.9053, time 123.59ms
iter 87660: loss 8.1712, time 118.87ms
iter 87670: loss 7.8901, time 121.09ms
iter 87680: loss 8.3956, time 121.30ms
iter 87690: loss 8.3972, time 121.49ms
tensor(0.5627)
iter 87700: loss 8.2131, time 121.37ms
iter 87710: loss 8.0381, time 119.06ms
iter 87720: loss 8.6147, time 119.07ms
iter 87730: loss 8.0666, time 120.15ms
iter 87740: loss 8.6469, time 120.70ms
step 87750: train loss 7.9518, val loss 7.9331
saving checkpoint to out-shakespeare-char
iter 87750: loss 8.1788, time 2854.78ms
iter 87760: loss 8.2035, time 122.28ms
iter 87770: loss 8.5654, time 119.04ms
iter 87780: loss 8.2401, time 120.72ms
iter 87790: loss 7.6933, time 120.93ms
tensor(0.5937)
iter 87800: loss 8.3820, time 121.07ms
iter 87810: loss 8.3778, time 120.50ms
iter 87820: loss 7.8907, time 118.76ms
iter 87830: loss 8.2798, time 120.86ms
iter 87840: loss 8.3753, time 121.43ms
iter 87850: loss 8.4537, time 119.28ms
iter 87860: loss 8.2076, time 117.84ms
iter 87870: loss 8.5308, time 118.69ms
iter 87880: loss 8.5498, time 117.90ms
iter 87890: loss 8.0798, time 119.94ms
tensor(0.6243)
iter 87900: loss 8.2842, time 120.36ms
iter 87910: loss 8.1090, time 119.19ms
iter 87920: loss 7.9003, time 119.98ms
iter 87930: loss 8.5343, time 119.08ms
iter 87940: loss 8.5786, time 120.14ms
iter 87950: loss 100.9302, time 120.23ms
iter 87960: loss 9.7295, time 120.00ms
iter 87970: loss 13.2149, time 120.65ms
iter 87980: loss 21.8192, time 119.70ms
iter 87990: loss 20.4730, time 121.70ms
tensor(0.6545)
step 88000: train loss 20.4304, val loss 20.5492
saving checkpoint to out-shakespeare-char
iter 88000: loss 19.1356, time 2852.56ms
iter 88010: loss 22.2158, time 121.19ms
iter 88020: loss 21.7328, time 119.37ms
iter 88030: loss 330.9467, time 118.90ms
iter 88040: loss 66.2403, time 119.35ms
iter 88050: loss 53.8191, time 119.74ms
iter 88060: loss 49.7215, time 119.86ms
iter 88070: loss 72.6509, time 120.32ms
iter 88080: loss 50.4520, time 120.44ms
iter 88090: loss 61.8560, time 118.60ms
tensor(0.6841)
iter 88100: loss 41.2773, time 120.67ms
iter 88110: loss 51.1666, time 120.85ms
iter 88120: loss 32.8511, time 121.16ms
iter 88130: loss 102.1201, time 121.97ms
iter 88140: loss 113.1161, time 121.28ms
iter 88150: loss 59.6399, time 122.63ms
iter 88160: loss 84.2517, time 121.26ms
iter 88170: loss 39.4301, time 121.12ms
iter 88180: loss 22.3872, time 121.31ms
iter 88190: loss 79.3735, time 121.05ms
tensor(0.7129)
iter 88200: loss 29.5377, time 121.31ms
iter 88210: loss 64.8730, time 121.05ms
iter 88220: loss 30.9657, time 119.02ms
iter 88230: loss 44.9412, time 119.05ms
iter 88240: loss 55.5360, time 119.27ms
step 88250: train loss 63.0642, val loss 62.6629
saving checkpoint to out-shakespeare-char
iter 88250: loss 75.2208, time 2851.22ms
iter 88260: loss 61.9293, time 121.50ms
iter 88270: loss 59.3827, time 121.30ms
iter 88280: loss 52.8236, time 122.01ms
iter 88290: loss 38.2438, time 122.60ms
tensor(0.7409)
iter 88300: loss 85.3479, time 120.44ms
iter 88310: loss 65.2116, time 121.96ms
iter 88320: loss 76.6090, time 121.16ms
iter 88330: loss 48.4882, time 121.01ms
iter 88340: loss 378.9638, time 120.08ms
iter 88350: loss 34.3092, time 120.62ms
iter 88360: loss 39.9325, time 120.95ms
iter 88370: loss 111.5709, time 120.81ms
iter 88380: loss 71.9781, time 120.98ms
iter 88390: loss 68.7557, time 120.08ms
tensor(0.7679)
iter 88400: loss 68.1821, time 120.84ms
iter 88410: loss 56.8211, time 120.73ms
iter 88420: loss 72.6478, time 120.99ms
iter 88430: loss 32.4929, time 120.84ms
iter 88440: loss 46.8924, time 120.16ms
iter 88450: loss 54.2515, time 119.13ms
iter 88460: loss 90.3320, time 118.74ms
iter 88470: loss 38.8958, time 118.87ms
iter 88480: loss 52.2778, time 119.19ms
iter 88490: loss 46.0833, time 119.38ms
tensor(0.7939)
step 88500: train loss 63.0922, val loss 63.3670
saving checkpoint to out-shakespeare-char
iter 88500: loss 71.1838, time 2775.99ms
iter 88510: loss 57.8013, time 122.25ms
iter 88520: loss 43.7450, time 120.97ms
iter 88530: loss 58.1029, time 120.99ms
iter 88540: loss 38.7594, time 120.89ms
iter 88550: loss 70.0367, time 119.02ms
iter 88560: loss 41.4679, time 121.11ms
iter 88570: loss 37.5827, time 120.82ms
iter 88580: loss 43.3473, time 120.99ms
iter 88590: loss 53.0080, time 121.08ms
tensor(0.8187)
iter 88600: loss 33.5115, time 119.45ms
iter 88610: loss 46.6072, time 119.43ms
iter 88620: loss 52.3191, time 119.90ms
iter 88630: loss 49.1527, time 120.15ms
iter 88640: loss 49.3763, time 120.13ms
iter 88650: loss 55.7435, time 119.11ms
iter 88660: loss 54.9376, time 118.93ms
iter 88670: loss 46.9205, time 120.10ms
iter 88680: loss 45.4279, time 120.06ms
iter 88690: loss 61.3174, time 120.69ms
tensor(0.8423)
iter 88700: loss 58.5389, time 120.71ms
iter 88710: loss 37.9245, time 120.66ms
iter 88720: loss 52.7614, time 122.38ms
iter 88730: loss 44.1283, time 122.31ms
iter 88740: loss 52.7392, time 121.04ms
step 88750: train loss 33.8541, val loss 33.6714
saving checkpoint to out-shakespeare-char
iter 88750: loss 46.5285, time 2814.56ms
iter 88760: loss 59.2035, time 118.00ms
iter 88770: loss 39.1002, time 120.17ms
iter 88780: loss 50.5812, time 120.52ms
iter 88790: loss 56.3931, time 120.32ms
tensor(0.8645)
iter 88800: loss 59.5580, time 120.47ms
iter 88810: loss 61.5955, time 118.16ms
iter 88820: loss 64.8741, time 119.55ms
iter 88830: loss 61.0181, time 120.03ms
iter 88840: loss 47.9720, time 120.34ms
iter 88850: loss 60.5695, time 118.43ms
iter 88860: loss 63.2808, time 119.18ms
iter 88870: loss 51.8076, time 117.85ms
iter 88880: loss 49.0177, time 119.72ms
iter 88890: loss 56.2443, time 120.29ms
tensor(0.8853)
iter 88900: loss 49.5163, time 120.64ms
iter 88910: loss 70.3154, time 120.28ms
iter 88920: loss 48.5928, time 118.98ms
iter 88930: loss 44.1349, time 120.12ms
iter 88940: loss 66.1770, time 120.74ms
iter 88950: loss 58.1940, time 121.08ms
iter 88960: loss 48.1439, time 120.76ms
iter 88970: loss 72.7467, time 121.38ms
iter 88980: loss 62.2720, time 122.37ms
iter 88990: loss 46.6111, time 121.09ms
tensor(0.9045)
step 89000: train loss 46.0316, val loss 45.9317
saving checkpoint to out-shakespeare-char
iter 89000: loss 68.9893, time 2806.43ms
iter 89010: loss 45.7841, time 120.77ms
iter 89020: loss 68.4805, time 118.64ms
iter 89030: loss 64.3253, time 121.06ms
iter 89040: loss 77.5634, time 120.85ms
iter 89050: loss 78.0290, time 121.10ms
iter 89060: loss 73.2073, time 121.72ms
iter 89070: loss 77.2564, time 119.54ms
iter 89080: loss 55.3038, time 119.30ms
iter 89090: loss 55.9519, time 120.39ms
tensor(0.9222)
iter 89100: loss 79.2524, time 121.52ms
iter 89110: loss 69.8111, time 121.25ms
iter 89120: loss 56.6736, time 122.92ms
iter 89130: loss 76.9106, time 121.69ms
iter 89140: loss 56.7651, time 121.71ms
iter 89150: loss 57.6831, time 121.62ms
iter 89160: loss 59.7588, time 120.05ms
iter 89170: loss 63.4530, time 120.70ms
iter 89180: loss 64.7644, time 120.84ms
iter 89190: loss 66.5460, time 120.28ms
tensor(0.9382)
iter 89200: loss 68.3396, time 123.44ms
iter 89210: loss 62.7758, time 120.44ms
iter 89220: loss 61.8149, time 122.14ms
iter 89230: loss 68.7201, time 121.69ms
iter 89240: loss 61.0646, time 121.80ms
step 89250: train loss 49.3551, val loss 48.9358
saving checkpoint to out-shakespeare-char
iter 89250: loss 63.3584, time 2824.03ms
iter 89260: loss 65.6360, time 120.72ms
iter 89270: loss 57.3772, time 120.90ms
iter 89280: loss 69.8862, time 121.10ms
iter 89290: loss 74.8299, time 122.11ms
tensor(0.9524)
iter 89300: loss 52.5959, time 122.36ms
iter 89310: loss 67.3701, time 121.73ms
iter 89320: loss 65.9457, time 119.73ms
iter 89330: loss 74.4442, time 121.66ms
iter 89340: loss 61.7970, time 120.07ms
iter 89350: loss 52.5778, time 120.72ms
iter 89360: loss 70.6212, time 120.59ms
iter 89370: loss 68.0899, time 119.91ms
iter 89380: loss 48.2820, time 120.95ms
iter 89390: loss 94.5370, time 122.97ms
tensor(0.9649)
iter 89400: loss 97.5756, time 121.98ms
iter 89410: loss 93.1841, time 122.19ms
iter 89420: loss 63.1615, time 121.77ms
iter 89430: loss 69.6574, time 120.54ms
iter 89440: loss 74.8243, time 121.22ms
iter 89450: loss 91.4914, time 121.18ms
iter 89460: loss 70.4205, time 120.85ms
iter 89470: loss 63.8877, time 123.13ms
iter 89480: loss 64.8111, time 121.80ms
iter 89490: loss 68.4104, time 121.90ms
tensor(0.9755)
step 89500: train loss 44.5671, val loss 44.5984
saving checkpoint to out-shakespeare-char
iter 89500: loss 72.8539, time 2822.83ms
iter 89510: loss 94.1309, time 119.90ms
iter 89520: loss 54.6733, time 120.94ms
iter 89530: loss 62.7410, time 120.80ms
iter 89540: loss 67.3606, time 120.99ms
iter 89550: loss 69.7067, time 122.20ms
iter 89560: loss 75.2530, time 123.15ms
iter 89570: loss 81.7557, time 119.58ms
iter 89580: loss 66.5627, time 120.92ms
iter 89590: loss 88.5253, time 121.62ms
tensor(0.9843)
iter 89600: loss 80.1320, time 120.14ms
iter 89610: loss 59.1017, time 120.74ms
iter 89620: loss 65.0282, time 120.65ms
iter 89630: loss 56.9862, time 120.35ms
iter 89640: loss 50.3708, time 122.92ms
iter 89650: loss 67.1963, time 121.73ms
iter 89660: loss 74.0483, time 121.68ms
iter 89670: loss 66.0621, time 121.33ms
iter 89680: loss 86.4978, time 119.63ms
iter 89690: loss 57.3236, time 120.87ms
tensor(0.9911)
iter 89700: loss 77.3386, time 120.64ms
iter 89710: loss 79.2014, time 119.85ms
iter 89720: loss 100.9023, time 120.42ms
iter 89730: loss 95.7458, time 120.70ms
iter 89740: loss 68.9906, time 121.45ms
step 89750: train loss 44.7977, val loss 44.6852
saving checkpoint to out-shakespeare-char
iter 89750: loss 67.4790, time 2747.26ms
iter 89760: loss 92.8503, time 122.03ms
iter 89770: loss 75.4055, time 119.48ms
iter 89780: loss 73.3383, time 120.66ms
iter 89790: loss 70.6625, time 119.66ms
tensor(0.9961)
iter 89800: loss 88.0661, time 120.83ms
iter 89810: loss 107.6764, time 121.15ms
iter 89820: loss 61.8245, time 121.30ms
iter 89830: loss 66.7886, time 123.11ms
iter 89840: loss 82.4532, time 121.47ms
iter 89850: loss 90.7033, time 121.60ms
iter 89860: loss 104.6414, time 121.44ms
iter 89870: loss 81.9655, time 121.41ms
iter 89880: loss 73.8709, time 118.49ms
iter 89890: loss 324.2888, time 119.37ms
tensor(0.9990)
iter 89900: loss 88.8746, time 120.86ms
iter 89910: loss 215.7130, time 120.61ms
iter 89920: loss 84.8788, time 120.51ms
iter 89930: loss 128.0442, time 122.56ms
iter 89940: loss 87.9578, time 122.25ms
iter 89950: loss 65.9989, time 121.49ms
iter 89960: loss 119.6192, time 121.53ms
iter 89970: loss 109.0779, time 119.71ms
iter 89980: loss 80.6855, time 119.41ms
iter 89990: loss 82.9779, time 120.40ms
tensor(1.)
step 90000: train loss 125.9075, val loss 126.1188
saving checkpoint to out-shakespeare-char
iter 90000: loss 154.8818, time 2846.67ms
iter 90010: loss 107.0121, time 122.57ms
iter 90020: loss 145.8828, time 121.62ms
iter 90030: loss 136.1697, time 119.23ms
iter 90040: loss 108.5592, time 121.91ms
iter 90050: loss 103.7617, time 119.53ms
iter 90060: loss 92.6521, time 119.89ms
iter 90070: loss 63.2169, time 120.90ms
iter 90080: loss 87.2437, time 120.54ms
iter 90090: loss 79.2054, time 119.29ms
tensor(0.9990)
iter 90100: loss 75.2823, time 121.32ms
iter 90110: loss 80.7925, time 122.18ms
iter 90120: loss 64.6233, time 122.66ms
iter 90130: loss 59.7109, time 121.38ms
iter 90140: loss 67.4609, time 121.50ms
iter 90150: loss 56.0936, time 121.54ms
iter 90160: loss 61.7178, time 119.35ms
iter 90170: loss 63.2323, time 120.42ms
iter 90180: loss 63.4347, time 120.58ms
iter 90190: loss 76.7735, time 120.52ms
tensor(0.9961)
iter 90200: loss 71.0986, time 120.75ms
iter 90210: loss 75.6882, time 121.54ms
iter 90220: loss 70.4692, time 120.60ms
iter 90230: loss 65.3744, time 121.99ms
iter 90240: loss 74.8341, time 121.31ms
step 90250: train loss 19.1467, val loss 19.1660
saving checkpoint to out-shakespeare-char
iter 90250: loss 67.8882, time 2844.23ms
iter 90260: loss 85.1618, time 119.65ms
iter 90270: loss 80.7876, time 120.41ms
iter 90280: loss 82.9415, time 120.41ms
iter 90290: loss 71.5864, time 120.69ms
tensor(0.9911)
iter 90300: loss 73.1720, time 122.04ms
iter 90310: loss 79.6129, time 122.11ms
iter 90320: loss 78.1457, time 122.72ms
iter 90330: loss 69.8766, time 119.37ms
iter 90340: loss 105.6769, time 121.44ms
iter 90350: loss 98.5912, time 121.38ms
iter 90360: loss 89.1667, time 119.34ms
iter 90370: loss 101.7986, time 120.65ms
iter 90380: loss 115.6724, time 120.57ms
iter 90390: loss 80.7940, time 119.43ms
tensor(0.9843)
iter 90400: loss 114.6726, time 121.05ms
iter 90410: loss 127.4585, time 121.57ms
iter 90420: loss 124.3477, time 122.87ms
iter 90430: loss 116.8164, time 121.51ms
iter 90440: loss 111.4408, time 121.50ms
iter 90450: loss 105.6261, time 121.25ms
iter 90460: loss 91.5299, time 121.45ms
iter 90470: loss 86.6586, time 119.65ms
iter 90480: loss 119.7757, time 120.06ms
iter 90490: loss 105.8340, time 120.53ms
tensor(0.9755)
step 90500: train loss 56.0863, val loss 55.5437
saving checkpoint to out-shakespeare-char
iter 90500: loss 100.5032, time 2852.63ms
iter 90510: loss 87.3856, time 121.72ms
iter 90520: loss 101.0780, time 121.57ms
iter 90530: loss 100.6425, time 119.64ms
iter 90540: loss 99.4394, time 119.72ms
iter 90550: loss 83.3591, time 120.54ms
iter 90560: loss 108.6669, time 120.69ms
iter 90570: loss 79.0499, time 120.55ms
iter 90580: loss 102.1159, time 120.62ms
iter 90590: loss 93.9056, time 123.02ms
tensor(0.9649)
iter 90600: loss 109.0304, time 122.07ms
iter 90610: loss 120.1541, time 122.12ms
iter 90620: loss 106.1884, time 121.43ms
iter 90630: loss 80.0884, time 119.34ms
iter 90640: loss 102.2118, time 119.41ms
iter 90650: loss 94.2870, time 120.57ms
iter 90660: loss 106.0550, time 120.64ms
iter 90670: loss 99.4549, time 120.89ms
iter 90680: loss 85.2884, time 122.34ms
iter 90690: loss 103.3820, time 121.53ms
tensor(0.9524)
iter 90700: loss 99.5893, time 121.77ms
iter 90710: loss 103.1911, time 121.48ms
iter 90720: loss 107.1283, time 121.66ms
iter 90730: loss 92.9295, time 119.76ms
iter 90740: loss 111.8930, time 119.90ms
step 90750: train loss 45.8546, val loss 45.0743
saving checkpoint to out-shakespeare-char
iter 90750: loss 76.4262, time 2845.57ms
iter 90760: loss 97.1061, time 122.00ms
iter 90770: loss 94.1508, time 122.80ms
iter 90780: loss 87.4435, time 121.43ms
iter 90790: loss 100.7305, time 121.55ms
tensor(0.9382)
iter 90800: loss 93.7577, time 121.82ms
iter 90810: loss 98.9631, time 119.41ms
iter 90820: loss 97.9877, time 120.09ms
iter 90830: loss 95.8670, time 120.85ms
iter 90840: loss 79.1151, time 120.79ms
iter 90850: loss 95.9777, time 120.64ms
iter 90860: loss 108.1742, time 122.25ms
iter 90870: loss 100.4590, time 122.77ms
iter 90880: loss 100.6226, time 119.43ms
iter 90890: loss 105.5169, time 121.60ms
tensor(0.9222)
iter 90900: loss 109.8597, time 121.77ms
iter 90910: loss 98.9921, time 119.29ms
iter 90920: loss 92.3098, time 119.95ms
iter 90930: loss 87.1092, time 121.20ms
iter 90940: loss 114.0451, time 119.32ms
iter 90950: loss 106.4082, time 120.55ms
iter 90960: loss 105.1885, time 122.96ms
iter 90970: loss 93.6014, time 122.60ms
iter 90980: loss 81.6419, time 121.47ms
iter 90990: loss 81.0845, time 121.43ms
tensor(0.9045)
step 91000: train loss 37.9003, val loss 38.3917
saving checkpoint to out-shakespeare-char
iter 91000: loss 72.0173, time 2813.03ms
iter 91010: loss 88.4296, time 118.93ms
iter 91020: loss 81.8232, time 119.51ms
iter 91030: loss 80.7346, time 119.53ms
iter 91040: loss 83.6403, time 120.13ms
iter 91050: loss 77.2160, time 119.47ms
iter 91060: loss 82.3782, time 121.19ms
iter 91070: loss 88.8517, time 121.77ms
iter 91080: loss 71.4072, time 122.69ms
iter 91090: loss 82.2282, time 122.72ms
tensor(0.8853)
iter 91100: loss 78.7724, time 121.90ms
iter 91110: loss 86.2409, time 121.56ms
iter 91120: loss 90.2096, time 121.69ms
iter 91130: loss 87.1387, time 119.82ms
iter 91140: loss 79.9547, time 118.64ms
iter 91150: loss 90.0561, time 119.71ms
iter 91160: loss 71.4062, time 120.31ms
iter 91170: loss 68.8125, time 120.94ms
iter 91180: loss 73.3568, time 119.93ms
iter 91190: loss 69.3944, time 121.14ms
tensor(0.8645)
iter 91200: loss 88.4567, time 122.99ms
iter 91210: loss 76.2781, time 123.12ms
iter 91220: loss 72.4422, time 121.55ms
iter 91230: loss 72.9791, time 119.33ms
iter 91240: loss 77.0198, time 118.43ms
step 91250: train loss 48.7196, val loss 49.1009
saving checkpoint to out-shakespeare-char
iter 91250: loss 83.4275, time 2811.42ms
iter 91260: loss 67.3984, time 119.52ms
iter 91270: loss 65.6847, time 119.52ms
iter 91280: loss 63.9641, time 119.97ms
iter 91290: loss 75.4503, time 119.93ms
tensor(0.8423)
iter 91300: loss 87.3239, time 121.33ms
iter 91310: loss 75.8696, time 122.08ms
iter 91320: loss 78.2865, time 123.09ms
iter 91330: loss 61.3721, time 122.72ms
iter 91340: loss 75.3104, time 119.32ms
iter 91350: loss 77.8935, time 120.70ms
iter 91360: loss 66.7280, time 121.64ms
iter 91370: loss 79.0830, time 119.44ms
iter 91380: loss 90.9550, time 119.36ms
iter 91390: loss 66.4922, time 120.50ms
tensor(0.8187)
iter 91400: loss 98.3031, time 119.33ms
iter 91410: loss 65.1644, time 120.84ms
iter 91420: loss 64.2010, time 121.95ms
iter 91430: loss 75.1701, time 122.57ms
iter 91440: loss 64.0500, time 122.73ms
iter 91450: loss 94.3329, time 121.15ms
iter 91460: loss 69.7954, time 122.73ms
iter 91470: loss 94.1057, time 121.38ms
iter 91480: loss 73.1823, time 121.12ms
iter 91490: loss 74.7961, time 120.60ms
tensor(0.7939)
step 91500: train loss 39.4554, val loss 39.7380
saving checkpoint to out-shakespeare-char
iter 91500: loss 64.7521, time 2811.05ms
iter 91510: loss 66.6664, time 119.18ms
iter 91520: loss 65.1878, time 118.83ms
iter 91530: loss 65.4885, time 119.30ms
iter 91540: loss 52.9609, time 119.01ms
iter 91550: loss 62.5552, time 118.77ms
iter 91560: loss 62.9805, time 118.40ms
iter 91570: loss 71.0656, time 118.76ms
iter 91580: loss 74.1004, time 120.06ms
iter 91590: loss 70.0647, time 119.66ms
tensor(0.7679)
iter 91600: loss 59.7981, time 119.67ms
iter 91610: loss 67.0176, time 118.39ms
iter 91620: loss 53.7264, time 119.64ms
iter 91630: loss 56.0497, time 119.52ms
iter 91640: loss 85.3267, time 119.67ms
iter 91650: loss 55.6288, time 119.84ms
iter 91660: loss 66.2258, time 119.41ms
iter 91670: loss 57.3204, time 121.20ms
iter 91680: loss 59.9099, time 122.18ms
iter 91690: loss 54.9286, time 122.40ms
tensor(0.7409)
iter 91700: loss 50.6123, time 122.82ms
iter 91710: loss 59.0741, time 121.32ms
iter 91720: loss 62.3882, time 122.74ms
iter 91730: loss 45.9346, time 121.99ms
iter 91740: loss 44.2925, time 119.46ms
step 91750: train loss 38.3285, val loss 38.2381
saving checkpoint to out-shakespeare-char
iter 91750: loss 54.8527, time 2859.13ms
iter 91760: loss 50.0285, time 121.93ms
iter 91770: loss 54.5275, time 121.94ms
iter 91780: loss 50.8685, time 122.58ms
iter 91790: loss 48.9340, time 121.50ms
tensor(0.7129)
iter 91800: loss 51.3029, time 120.48ms
iter 91810: loss 41.2478, time 119.05ms
iter 91820: loss 54.7263, time 119.06ms
iter 91830: loss 65.3159, time 119.30ms
iter 91840: loss 46.8139, time 119.33ms
iter 91850: loss 48.3080, time 119.98ms
iter 91860: loss 48.3700, time 119.79ms
iter 91870: loss 51.8085, time 120.95ms
iter 91880: loss 49.0319, time 121.36ms
iter 91890: loss 54.4403, time 121.84ms
tensor(0.6841)
iter 91900: loss 46.7416, time 121.15ms
iter 91910: loss 46.2444, time 122.79ms
iter 91920: loss 51.8213, time 122.45ms
iter 91930: loss 46.7929, time 122.72ms
iter 91940: loss 47.7341, time 121.14ms
iter 91950: loss 43.3595, time 119.08ms
iter 91960: loss 40.7464, time 121.71ms
iter 91970: loss 48.7493, time 121.20ms
iter 91980: loss 48.7091, time 118.92ms
iter 91990: loss 46.9637, time 119.41ms
tensor(0.6545)
step 92000: train loss 30.7749, val loss 30.6956
saving checkpoint to out-shakespeare-char
iter 92000: loss 47.1519, time 2841.52ms
iter 92010: loss 45.0065, time 120.50ms
iter 92020: loss 40.2446, time 121.55ms
iter 92030: loss 48.2630, time 122.63ms
iter 92040: loss 45.5803, time 122.61ms
iter 92050: loss 44.5351, time 122.50ms
iter 92060: loss 49.6717, time 120.39ms
iter 92070: loss 40.3615, time 121.68ms
iter 92080: loss 40.3137, time 121.28ms
iter 92090: loss 54.9954, time 121.32ms
tensor(0.6243)
iter 92100: loss 39.6923, time 119.69ms
iter 92110: loss 41.3620, time 119.43ms
iter 92120: loss 39.9323, time 119.04ms
iter 92130: loss 35.1780, time 119.66ms
iter 92140: loss 38.3679, time 120.25ms
iter 92150: loss 40.2781, time 120.55ms
iter 92160: loss 38.3080, time 120.89ms
iter 92170: loss 36.8199, time 120.25ms
iter 92180: loss 45.1974, time 121.46ms
iter 92190: loss 36.7150, time 122.38ms
tensor(0.5937)
iter 92200: loss 36.5970, time 122.18ms
iter 92210: loss 34.2276, time 122.48ms
iter 92220: loss 39.5408, time 121.23ms
iter 92230: loss 44.7757, time 121.20ms
iter 92240: loss 35.0388, time 121.53ms
step 92250: train loss 25.2817, val loss 25.5825
saving checkpoint to out-shakespeare-char
iter 92250: loss 43.3463, time 2838.05ms
iter 92260: loss 29.3893, time 119.00ms
iter 92270: loss 33.4438, time 118.91ms
iter 92280: loss 27.6008, time 119.40ms
iter 92290: loss 33.2126, time 119.91ms
tensor(0.5627)
iter 92300: loss 39.2163, time 120.58ms
iter 92310: loss 35.0792, time 120.91ms
iter 92320: loss 28.1273, time 121.40ms
iter 92330: loss 30.3261, time 121.17ms
iter 92340: loss 32.2325, time 122.57ms
iter 92350: loss 30.6407, time 122.43ms
iter 92360: loss 27.2010, time 122.55ms
iter 92370: loss 23.0541, time 122.42ms
iter 92380: loss 31.4919, time 121.52ms
iter 92390: loss 27.5291, time 121.59ms
tensor(0.5314)
iter 92400: loss 29.9578, time 121.49ms
iter 92410: loss 32.0019, time 119.01ms
iter 92420: loss 31.5943, time 118.96ms
iter 92430: loss 27.8703, time 119.16ms
iter 92440: loss 26.9977, time 119.10ms
iter 92450: loss 28.0883, time 119.39ms
iter 92460: loss 29.6586, time 119.75ms
iter 92470: loss 39.7058, time 120.37ms
iter 92480: loss 31.5498, time 120.44ms
iter 92490: loss 31.4678, time 121.80ms
tensor(0.5000)
step 92500: train loss 22.5813, val loss 22.6525
saving checkpoint to out-shakespeare-char
iter 92500: loss 33.4205, time 2843.59ms
iter 92510: loss 27.5957, time 121.31ms
iter 92520: loss 27.2867, time 121.21ms
iter 92530: loss 35.9156, time 120.66ms
iter 92540: loss 26.6550, time 120.77ms
iter 92550: loss 28.6899, time 118.90ms
iter 92560: loss 28.9123, time 118.65ms
iter 92570: loss 23.8474, time 118.59ms
iter 92580: loss 23.6837, time 118.81ms
iter 92590: loss 22.7537, time 118.95ms
tensor(0.4686)
iter 92600: loss 24.9179, time 119.26ms
iter 92610: loss 25.0395, time 118.64ms
iter 92620: loss 20.8930, time 119.05ms
iter 92630: loss 26.6574, time 119.07ms
iter 92640: loss 20.5005, time 118.79ms
iter 92650: loss 20.8864, time 118.64ms
iter 92660: loss 27.9559, time 118.60ms
iter 92670: loss 20.7801, time 118.59ms
iter 92680: loss 19.4778, time 118.60ms
iter 92690: loss 19.1318, time 118.48ms
tensor(0.4373)
iter 92700: loss 18.3393, time 118.73ms
iter 92710: loss 18.9489, time 118.50ms
iter 92720: loss 20.3858, time 118.62ms
iter 92730: loss 18.9164, time 118.95ms
iter 92740: loss 17.6471, time 118.65ms
step 92750: train loss 13.7268, val loss 13.7340
saving checkpoint to out-shakespeare-char
iter 92750: loss 19.4210, time 2844.49ms
iter 92760: loss 20.5978, time 120.38ms
iter 92770: loss 24.7544, time 121.35ms
iter 92780: loss 16.4479, time 122.22ms
iter 92790: loss 17.9513, time 122.70ms
tensor(0.4063)
iter 92800: loss 19.2984, time 122.02ms
iter 92810: loss 25.9676, time 121.24ms
iter 92820: loss 25.3165, time 122.62ms
iter 92830: loss 21.3758, time 121.30ms
iter 92840: loss 15.7631, time 121.51ms
iter 92850: loss 23.1561, time 119.36ms
iter 92860: loss 18.8554, time 118.70ms
iter 92870: loss 19.1029, time 119.19ms
iter 92880: loss 16.2790, time 119.82ms
iter 92890: loss 13.7463, time 120.39ms
tensor(0.3757)
iter 92900: loss 13.5683, time 120.77ms
iter 92910: loss 13.0998, time 121.70ms
iter 92920: loss 14.9075, time 121.20ms
iter 92930: loss 14.2936, time 120.25ms
iter 92940: loss 12.7137, time 122.39ms
iter 92950: loss 14.4453, time 122.39ms
iter 92960: loss 13.0820, time 122.57ms
iter 92970: loss 12.0247, time 122.42ms
iter 92980: loss 11.9287, time 120.31ms
iter 92990: loss 13.2698, time 121.25ms
tensor(0.3455)
step 93000: train loss 7.9842, val loss 8.0762
saving checkpoint to out-shakespeare-char
iter 93000: loss 10.8987, time 2852.78ms
iter 93010: loss 10.5016, time 120.47ms
iter 93020: loss 11.9250, time 121.15ms
iter 93030: loss 11.9195, time 121.59ms
iter 93040: loss 9.9129, time 120.86ms
iter 93050: loss 9.7216, time 122.80ms
iter 93060: loss 11.6577, time 122.56ms
iter 93070: loss 10.5062, time 123.51ms
iter 93080: loss 9.8534, time 121.41ms
iter 93090: loss 10.8028, time 119.12ms
tensor(0.3159)
iter 93100: loss 11.1395, time 119.12ms
iter 93110: loss 11.5789, time 118.48ms
iter 93120: loss 10.8643, time 118.67ms
iter 93130: loss 9.5912, time 118.68ms
iter 93140: loss 10.3307, time 118.54ms
iter 93150: loss 10.3764, time 118.96ms
iter 93160: loss 10.6287, time 118.64ms
iter 93170: loss 10.5806, time 118.92ms
iter 93180: loss 10.1619, time 118.54ms
iter 93190: loss 9.1718, time 118.46ms
tensor(0.2871)
iter 93200: loss 10.3110, time 120.95ms
iter 93210: loss 10.0346, time 118.89ms
iter 93220: loss 10.1136, time 118.87ms
iter 93230: loss 9.0846, time 118.77ms
iter 93240: loss 9.1793, time 118.54ms
step 93250: train loss 7.8836, val loss 7.8977
saving checkpoint to out-shakespeare-char
iter 93250: loss 10.3938, time 2839.19ms
iter 93260: loss 10.0540, time 118.95ms
iter 93270: loss 9.5629, time 119.04ms
iter 93280: loss 10.7773, time 120.24ms
iter 93290: loss 9.7309, time 119.81ms
tensor(0.2591)
iter 93300: loss 9.7015, time 120.05ms
iter 93310: loss 10.3450, time 120.57ms
iter 93320: loss 10.1888, time 120.13ms
iter 93330: loss 9.4660, time 120.46ms
iter 93340: loss 9.9039, time 120.70ms
iter 93350: loss 8.9074, time 120.72ms
iter 93360: loss 9.8354, time 120.68ms
iter 93370: loss 10.0509, time 120.15ms
iter 93380: loss 10.1210, time 120.93ms
iter 93390: loss 9.6153, time 121.04ms
tensor(0.2321)
iter 93400: loss 9.1099, time 120.80ms
iter 93410: loss 9.6961, time 121.17ms
iter 93420: loss 9.0305, time 121.38ms
iter 93430: loss 8.7444, time 122.03ms
iter 93440: loss 9.6807, time 122.39ms
iter 93450: loss 8.4552, time 120.20ms
iter 93460: loss 9.2440, time 122.45ms
iter 93470: loss 8.6520, time 122.49ms
iter 93480: loss 8.6383, time 122.33ms
iter 93490: loss 9.6227, time 122.35ms
tensor(0.2061)
step 93500: train loss 7.8822, val loss 7.7958
saving checkpoint to out-shakespeare-char
iter 93500: loss 9.3333, time 2831.75ms
iter 93510: loss 9.6440, time 122.70ms
iter 93520: loss 9.4819, time 120.99ms
iter 93530: loss 9.3792, time 121.18ms
iter 93540: loss 8.9288, time 119.04ms
iter 93550: loss 8.3699, time 118.90ms
iter 93560: loss 8.6893, time 118.86ms
iter 93570: loss 9.5902, time 119.48ms
iter 93580: loss 9.5619, time 118.96ms
iter 93590: loss 9.3218, time 120.40ms
tensor(0.1813)
iter 93600: loss 9.1929, time 120.42ms
iter 93610: loss 9.4727, time 120.28ms
iter 93620: loss 8.9343, time 120.79ms
iter 93630: loss 9.0110, time 121.17ms
iter 93640: loss 9.7193, time 122.30ms
iter 93650: loss 9.2466, time 122.71ms
iter 93660: loss 9.3490, time 120.08ms
iter 93670: loss 8.7306, time 122.24ms
iter 93680: loss 8.9014, time 122.13ms
iter 93690: loss 9.4526, time 122.08ms
tensor(0.1577)
iter 93700: loss 9.1698, time 123.71ms
iter 93710: loss 9.2873, time 117.79ms
iter 93720: loss 8.7790, time 119.15ms
iter 93730: loss 9.1247, time 118.99ms
iter 93740: loss 8.5460, time 118.89ms
step 93750: train loss 7.7592, val loss 7.7833
saving checkpoint to out-shakespeare-char
iter 93750: loss 8.9453, time 2850.26ms
iter 93760: loss 9.4588, time 122.90ms
iter 93770: loss 8.7197, time 119.88ms
iter 93780: loss 9.2537, time 122.43ms
iter 93790: loss 8.7166, time 122.34ms
tensor(0.1355)
iter 93800: loss 8.9943, time 123.13ms
iter 93810: loss 8.8517, time 121.66ms
iter 93820: loss 8.2675, time 117.12ms
iter 93830: loss 8.8495, time 121.08ms
iter 93840: loss 8.9249, time 121.51ms
iter 93850: loss 8.3464, time 118.94ms
iter 93860: loss 9.1123, time 119.34ms
iter 93870: loss 9.0095, time 118.39ms
iter 93880: loss 8.1783, time 120.19ms
iter 93890: loss 8.3386, time 120.54ms
tensor(0.1147)
iter 93900: loss 8.1054, time 120.73ms
iter 93910: loss 8.4642, time 120.53ms
iter 93920: loss 8.0907, time 120.13ms
iter 93930: loss 8.6874, time 119.44ms
iter 93940: loss 8.2152, time 121.49ms
iter 93950: loss 8.9746, time 120.27ms
iter 93960: loss 9.2369, time 122.36ms
iter 93970: loss 8.9118, time 122.28ms
iter 93980: loss 8.7052, time 121.07ms
iter 93990: loss 8.5991, time 123.09ms
tensor(0.0955)
step 94000: train loss 7.7567, val loss 7.7741
saving checkpoint to out-shakespeare-char
iter 94000: loss 9.2413, time 2858.32ms
iter 94010: loss 8.7248, time 119.31ms
iter 94020: loss 8.6618, time 119.63ms
iter 94030: loss 8.5597, time 119.29ms
iter 94040: loss 8.0026, time 118.91ms
iter 94050: loss 8.7929, time 120.30ms
iter 94060: loss 8.8175, time 120.53ms
iter 94070: loss 8.0674, time 120.09ms
iter 94080: loss 8.6072, time 119.95ms
iter 94090: loss 8.2787, time 119.27ms
tensor(0.0778)
iter 94100: loss 8.3620, time 120.76ms
iter 94110: loss 8.0859, time 120.99ms
iter 94120: loss 8.7526, time 121.33ms
iter 94130: loss 8.7349, time 121.49ms
iter 94140: loss 8.5559, time 121.27ms
iter 94150: loss 8.8618, time 122.31ms
iter 94160: loss 8.6090, time 121.56ms
iter 94170: loss 8.5751, time 121.25ms
iter 94180: loss 8.5077, time 120.64ms
iter 94190: loss 8.4699, time 121.31ms
tensor(0.0618)
iter 94200: loss 8.4770, time 120.97ms
iter 94210: loss 8.7308, time 121.05ms
iter 94220: loss 8.7077, time 121.13ms
iter 94230: loss 8.7450, time 121.14ms
iter 94240: loss 8.7354, time 120.98ms
step 94250: train loss 7.6917, val loss 7.7315
saving checkpoint to out-shakespeare-char
iter 94250: loss 8.9185, time 2831.92ms
iter 94260: loss 8.4432, time 119.97ms
iter 94270: loss 8.5487, time 120.07ms
iter 94280: loss 8.6279, time 120.34ms
iter 94290: loss 8.9414, time 120.31ms
tensor(0.0476)
iter 94300: loss 8.4085, time 119.27ms
iter 94310: loss 8.6972, time 120.60ms
iter 94320: loss 8.8422, time 121.56ms
iter 94330: loss 8.9840, time 122.24ms
iter 94340: loss 9.2549, time 122.26ms
iter 94350: loss 7.7676, time 121.06ms
iter 94360: loss 8.0583, time 122.30ms
iter 94370: loss 8.7230, time 120.90ms
iter 94380: loss 8.3730, time 120.57ms
iter 94390: loss 8.1861, time 121.00ms
tensor(0.0351)
iter 94400: loss 8.5331, time 121.28ms
iter 94410: loss 8.6370, time 121.01ms
iter 94420: loss 8.3949, time 120.75ms
iter 94430: loss 8.7603, time 121.10ms
iter 94440: loss 8.5062, time 119.12ms
iter 94450: loss 7.9546, time 118.86ms
iter 94460: loss 9.0312, time 119.83ms
iter 94470: loss 8.8716, time 120.31ms
iter 94480: loss 8.5814, time 120.17ms
iter 94490: loss 8.5297, time 120.21ms
tensor(0.0245)
step 94500: train loss 7.6834, val loss 7.7028
saving checkpoint to out-shakespeare-char
iter 94500: loss 8.1768, time 2827.54ms
iter 94510: loss 8.6507, time 119.27ms
iter 94520: loss 8.2583, time 120.31ms
iter 94530: loss 8.3743, time 122.36ms
iter 94540: loss 8.7087, time 122.49ms
iter 94550: loss 8.0907, time 122.40ms
iter 94560: loss 8.7811, time 121.10ms
iter 94570: loss 8.7411, time 121.12ms
iter 94580: loss 8.6061, time 121.08ms
iter 94590: loss 7.7863, time 121.03ms
tensor(0.0157)
iter 94600: loss 8.6496, time 121.49ms
iter 94610: loss 8.5620, time 121.18ms
iter 94620: loss 8.3968, time 119.21ms
iter 94630: loss 9.0310, time 118.90ms
iter 94640: loss 8.4569, time 118.67ms
iter 94650: loss 8.5246, time 120.10ms
iter 94660: loss 8.6291, time 120.21ms
iter 94670: loss 8.3818, time 120.49ms
iter 94680: loss 8.4386, time 120.26ms
iter 94690: loss 8.3112, time 120.02ms
tensor(0.0089)
iter 94700: loss 8.8910, time 121.25ms
iter 94710: loss 8.5403, time 120.88ms
iter 94720: loss 8.3580, time 121.79ms
iter 94730: loss 8.3125, time 122.85ms
iter 94740: loss 8.2776, time 119.04ms
step 94750: train loss 7.6613, val loss 7.6682
saving checkpoint to out-shakespeare-char
iter 94750: loss 8.5177, time 2820.41ms
iter 94760: loss 8.5550, time 121.68ms
iter 94770: loss 7.8995, time 121.41ms
iter 94780: loss 8.8972, time 119.80ms
iter 94790: loss 8.1426, time 120.24ms
tensor(0.0039)
iter 94800: loss 8.6737, time 120.89ms
iter 94810: loss 8.3074, time 120.68ms
iter 94820: loss 8.5497, time 120.21ms
iter 94830: loss 8.2344, time 120.88ms
iter 94840: loss 8.6082, time 121.46ms
iter 94850: loss 8.3167, time 119.99ms
iter 94860: loss 7.4714, time 122.52ms
iter 94870: loss 8.7234, time 121.15ms
iter 94880: loss 8.5438, time 121.05ms
iter 94890: loss 8.2013, time 120.53ms
tensor(0.0010)
iter 94900: loss 8.8192, time 119.20ms
iter 94910: loss 8.2592, time 121.14ms
iter 94920: loss 8.3139, time 118.94ms
iter 94930: loss 8.3794, time 119.48ms
iter 94940: loss 8.4414, time 119.80ms
iter 94950: loss 8.1061, time 120.38ms
iter 94960: loss 8.4432, time 119.43ms
iter 94970: loss 9.1257, time 120.07ms
iter 94980: loss 8.2121, time 119.97ms
iter 94990: loss 8.6369, time 120.58ms
tensor(0.0010)
step 95000: train loss 7.6899, val loss 7.6818
saving checkpoint to out-shakespeare-char
iter 95000: loss 8.4976, time 2846.96ms
iter 95010: loss 8.1026, time 119.41ms
iter 95020: loss 9.5609, time 120.14ms
iter 95030: loss 8.7980, time 120.94ms
iter 95040: loss 8.1347, time 119.18ms
iter 95050: loss 8.4878, time 118.30ms
iter 95060: loss 8.4051, time 118.87ms
iter 95070: loss 8.3534, time 119.18ms
iter 95080: loss 8.3396, time 120.24ms
iter 95090: loss 8.2533, time 120.23ms
tensor(0.0010)
iter 95100: loss 8.8916, time 120.47ms
iter 95110: loss 8.3317, time 120.41ms
iter 95120: loss 8.3234, time 119.27ms
iter 95130: loss 8.5133, time 121.47ms
iter 95140: loss 8.4598, time 121.68ms
iter 95150: loss 8.4241, time 122.52ms
iter 95160: loss 8.6418, time 122.72ms
iter 95170: loss 8.4180, time 121.71ms
iter 95180: loss 8.1154, time 121.28ms
iter 95190: loss 8.0236, time 121.06ms
tensor(0.0039)
iter 95200: loss 7.3996, time 118.74ms
iter 95210: loss 8.1571, time 119.03ms
iter 95220: loss 8.3913, time 119.26ms
iter 95230: loss 8.1957, time 119.68ms
iter 95240: loss 7.6923, time 120.15ms
step 95250: train loss 7.6341, val loss 7.6569
saving checkpoint to out-shakespeare-char
iter 95250: loss 8.6036, time 2857.25ms
iter 95260: loss 8.1661, time 122.00ms
iter 95270: loss 8.5231, time 121.25ms
iter 95280: loss 8.2984, time 121.29ms
iter 95290: loss 8.5548, time 121.07ms
tensor(0.0089)
iter 95300: loss 8.3789, time 121.48ms
iter 95310: loss 8.5253, time 121.24ms
iter 95320: loss 8.0668, time 121.48ms
iter 95330: loss 8.2822, time 121.12ms
iter 95340: loss 8.9272, time 118.98ms
iter 95350: loss 7.8299, time 118.89ms
iter 95360: loss 8.4414, time 119.58ms
iter 95370: loss 8.0093, time 120.23ms
iter 95380: loss 8.6600, time 120.17ms
iter 95390: loss 8.0777, time 120.29ms
tensor(0.0157)
iter 95400: loss 7.9242, time 120.06ms
iter 95410: loss 8.4173, time 119.47ms
iter 95420: loss 8.7222, time 120.54ms
iter 95430: loss 7.9322, time 120.71ms
iter 95440: loss 8.4196, time 121.38ms
iter 95450: loss 8.1436, time 122.74ms
iter 95460: loss 8.4727, time 119.40ms
iter 95470: loss 8.4107, time 120.58ms
iter 95480: loss 8.3734, time 121.17ms
iter 95490: loss 9.0827, time 120.15ms
tensor(0.0245)
step 95500: train loss 7.6773, val loss 7.6438
saving checkpoint to out-shakespeare-char
iter 95500: loss 8.2295, time 2829.54ms
iter 95510: loss 8.1637, time 119.32ms
iter 95520: loss 8.1227, time 119.50ms
iter 95530: loss 8.2662, time 120.17ms
iter 95540: loss 8.1543, time 120.69ms
iter 95550: loss 8.7637, time 120.23ms
iter 95560: loss 8.9403, time 120.26ms
iter 95570: loss 8.6879, time 120.10ms
iter 95580: loss 9.0921, time 121.11ms
iter 95590: loss 8.7460, time 121.25ms
tensor(0.0351)
iter 95600: loss 8.5758, time 122.17ms
iter 95610: loss 8.0406, time 122.49ms
iter 95620: loss 7.6746, time 118.90ms
iter 95630: loss 8.7850, time 121.38ms
iter 95640: loss 8.8237, time 121.27ms
iter 95650: loss 8.5449, time 121.67ms
iter 95660: loss 9.1933, time 120.24ms
iter 95670: loss 8.1009, time 119.05ms
iter 95680: loss 8.7730, time 121.17ms
iter 95690: loss 8.3361, time 120.16ms
tensor(0.0476)
iter 95700: loss 8.5912, time 120.96ms
iter 95710: loss 8.1106, time 120.91ms
iter 95720: loss 8.9798, time 120.47ms
iter 95730: loss 8.9792, time 119.02ms
iter 95740: loss 8.4942, time 119.57ms
step 95750: train loss 7.6808, val loss 7.7145
saving checkpoint to out-shakespeare-char
iter 95750: loss 8.6543, time 2835.07ms
iter 95760: loss 8.3300, time 122.79ms
iter 95770: loss 9.0291, time 121.49ms
iter 95780: loss 8.6866, time 119.42ms
iter 95790: loss 8.8879, time 120.95ms
tensor(0.0618)
iter 95800: loss 8.0492, time 121.38ms
iter 95810: loss 8.8072, time 119.08ms
iter 95820: loss 8.8830, time 119.17ms
iter 95830: loss 7.8989, time 119.88ms
iter 95840: loss 8.6602, time 119.04ms
iter 95850: loss 8.1855, time 120.32ms
iter 95860: loss 7.8515, time 120.05ms
iter 95870: loss 8.5199, time 119.15ms
iter 95880: loss 8.3955, time 120.52ms
iter 95890: loss 8.6905, time 119.72ms
tensor(0.0778)
iter 95900: loss 8.1568, time 122.38ms
iter 95910: loss 8.0434, time 122.37ms
iter 95920: loss 8.6301, time 120.65ms
iter 95930: loss 8.1574, time 121.26ms
iter 95940: loss 8.4036, time 121.15ms
iter 95950: loss 8.7874, time 121.28ms
iter 95960: loss 8.6050, time 121.53ms
iter 95970: loss 8.1269, time 119.27ms
iter 95980: loss 8.5716, time 119.34ms
iter 95990: loss 8.1398, time 119.63ms
tensor(0.0955)
step 96000: train loss 7.7138, val loss 7.7460
saving checkpoint to out-shakespeare-char
iter 96000: loss 8.3871, time 2825.66ms
iter 96010: loss 8.1869, time 120.64ms
iter 96020: loss 8.8573, time 120.90ms
iter 96030: loss 8.1953, time 121.25ms
iter 96040: loss 8.2367, time 122.27ms
iter 96050: loss 8.0143, time 121.16ms
iter 96060: loss 8.5033, time 122.84ms
iter 96070: loss 7.9082, time 121.61ms
iter 96080: loss 8.7797, time 120.98ms
iter 96090: loss 8.4835, time 121.09ms
tensor(0.1147)
iter 96100: loss 8.1632, time 121.45ms
iter 96110: loss 8.0599, time 121.19ms
iter 96120: loss 8.3703, time 119.21ms
iter 96130: loss 8.4331, time 119.66ms
iter 96140: loss 7.9688, time 118.62ms
iter 96150: loss 8.6438, time 119.94ms
iter 96160: loss 7.8803, time 120.36ms
iter 96170: loss 8.3513, time 120.32ms
iter 96180: loss 8.4076, time 120.18ms
iter 96190: loss 8.6876, time 120.07ms
tensor(0.1355)
iter 96200: loss 8.2957, time 120.64ms
iter 96210: loss 8.1886, time 120.36ms
iter 96220: loss 8.5177, time 120.10ms
iter 96230: loss 8.4642, time 120.41ms
iter 96240: loss 8.7396, time 122.30ms
step 96250: train loss 7.7584, val loss 7.7848
saving checkpoint to out-shakespeare-char
iter 96250: loss 7.9977, time 2829.49ms
iter 96260: loss 8.7109, time 120.42ms
iter 96270: loss 8.7247, time 121.31ms
iter 96280: loss 8.5083, time 119.66ms
iter 96290: loss 8.6976, time 120.00ms
tensor(0.1577)
iter 96300: loss 8.4138, time 118.71ms
iter 96310: loss 8.0441, time 120.34ms
iter 96320: loss 8.2432, time 120.08ms
iter 96330: loss 8.5484, time 120.13ms
iter 96340: loss 8.2084, time 120.25ms
iter 96350: loss 8.8790, time 120.95ms
iter 96360: loss 7.9826, time 121.30ms
iter 96370: loss 8.1372, time 121.91ms
iter 96380: loss 8.5851, time 122.03ms
iter 96390: loss 7.8969, time 120.38ms
tensor(0.1813)
iter 96400: loss 8.0947, time 122.72ms
iter 96410: loss 8.2622, time 121.18ms
iter 96420: loss 7.7890, time 120.90ms
iter 96430: loss 8.0809, time 119.29ms
iter 96440: loss 8.4714, time 118.26ms
iter 96450: loss 8.4091, time 121.58ms
iter 96460: loss 7.9712, time 121.59ms
iter 96470: loss 8.5543, time 119.45ms
iter 96480: loss 7.7354, time 118.98ms
iter 96490: loss 8.6372, time 119.12ms
tensor(0.2061)
step 96500: train loss 7.8609, val loss 7.8252
saving checkpoint to out-shakespeare-char
iter 96500: loss 8.2306, time 2814.11ms
iter 96510: loss 8.5734, time 117.72ms
iter 96520: loss 7.9875, time 120.36ms
iter 96530: loss 8.1745, time 119.01ms
iter 96540: loss 8.3756, time 120.13ms
iter 96550: loss 8.2387, time 120.40ms
iter 96560: loss 8.8048, time 120.33ms
iter 96570: loss 8.5531, time 119.61ms
iter 96580: loss 8.2304, time 120.60ms
iter 96590: loss 8.2822, time 120.69ms
tensor(0.2321)
iter 96600: loss 8.2751, time 119.82ms
iter 96610: loss 8.7351, time 122.21ms
iter 96620: loss 8.3053, time 123.44ms
iter 96630: loss 8.9980, time 120.49ms
iter 96640: loss 8.2095, time 121.28ms
iter 96650: loss 8.4597, time 118.93ms
iter 96660: loss 9.1520, time 121.10ms
iter 96670: loss 8.3615, time 121.41ms
iter 96680: loss 8.1953, time 119.50ms
iter 96690: loss 8.5190, time 119.03ms
tensor(0.2591)
iter 96700: loss 8.3441, time 118.81ms
iter 96710: loss 8.6677, time 119.68ms
iter 96720: loss 8.7739, time 120.34ms
iter 96730: loss 8.2043, time 120.33ms
iter 96740: loss 8.7084, time 120.27ms
step 96750: train loss 7.8731, val loss 7.8523
saving checkpoint to out-shakespeare-char
iter 96750: loss 8.5584, time 2848.46ms
iter 96760: loss 8.1945, time 120.48ms
iter 96770: loss 7.9861, time 121.12ms
iter 96780: loss 8.9947, time 121.69ms
iter 96790: loss 8.2055, time 121.67ms
tensor(0.2871)
iter 96800: loss 8.3148, time 121.20ms
iter 96810: loss 8.3151, time 119.64ms
iter 96820: loss 8.3782, time 119.29ms
iter 96830: loss 8.1511, time 119.91ms
iter 96840: loss 8.8020, time 120.66ms
iter 96850: loss 8.0489, time 120.22ms
iter 96860: loss 8.0856, time 120.28ms
iter 96870: loss 8.5702, time 120.18ms
iter 96880: loss 8.6457, time 121.20ms
iter 96890: loss 8.3151, time 122.72ms
tensor(0.3159)
iter 96900: loss 8.2947, time 123.35ms
iter 96910: loss 8.2212, time 121.50ms
iter 96920: loss 8.0641, time 121.52ms
iter 96930: loss 8.3363, time 121.45ms
iter 96940: loss 8.2999, time 121.46ms
iter 96950: loss 8.6036, time 121.70ms
iter 96960: loss 8.7615, time 119.36ms
iter 96970: loss 8.2761, time 119.92ms
iter 96980: loss 8.5873, time 120.88ms
iter 96990: loss 8.3363, time 121.16ms
tensor(0.3455)
step 97000: train loss 7.8980, val loss 7.9103
saving checkpoint to out-shakespeare-char
iter 97000: loss 8.7736, time 2864.77ms
iter 97010: loss 8.1709, time 120.02ms
iter 97020: loss 8.2569, time 120.76ms
iter 97030: loss 8.6871, time 120.99ms
iter 97040: loss 8.2267, time 120.82ms
iter 97050: loss 8.2840, time 122.09ms
iter 97060: loss 8.3848, time 121.06ms
iter 97070: loss 9.1986, time 121.56ms
iter 97080: loss 8.2876, time 121.74ms
iter 97090: loss 8.3637, time 122.02ms
tensor(0.3757)
iter 97100: loss 7.9291, time 119.83ms
iter 97110: loss 8.0623, time 121.59ms
iter 97120: loss 8.2921, time 119.53ms
iter 97130: loss 8.8512, time 121.04ms
iter 97140: loss 8.4424, time 121.03ms
iter 97150: loss 8.4060, time 122.27ms
iter 97160: loss 8.1930, time 123.17ms
iter 97170: loss 8.1402, time 121.73ms
iter 97180: loss 8.8098, time 121.57ms
iter 97190: loss 8.0495, time 121.86ms
tensor(0.4063)
iter 97200: loss 7.8958, time 119.77ms
iter 97210: loss 8.6153, time 120.03ms
iter 97220: loss 8.6054, time 119.73ms
iter 97230: loss 8.1318, time 120.11ms
iter 97240: loss 8.0632, time 121.11ms
step 97250: train loss 7.9268, val loss 7.9054
saving checkpoint to out-shakespeare-char
iter 97250: loss 7.6750, time 2860.84ms
iter 97260: loss 8.3827, time 119.91ms
iter 97270: loss 8.4905, time 120.87ms
iter 97280: loss 8.7682, time 120.93ms
iter 97290: loss 8.0888, time 122.36ms
tensor(0.4373)
iter 97300: loss 8.0770, time 122.81ms
iter 97310: loss 8.3890, time 120.10ms
iter 97320: loss 7.9221, time 122.57ms
iter 97330: loss 8.3176, time 122.66ms
iter 97340: loss 8.1674, time 122.45ms
iter 97350: loss 8.5904, time 122.46ms
iter 97360: loss 8.8221, time 120.04ms
iter 97370: loss 8.4510, time 120.46ms
iter 97380: loss 8.1699, time 121.32ms
iter 97390: loss 8.6679, time 121.34ms
tensor(0.4686)
iter 97400: loss 8.2134, time 121.91ms
iter 97410: loss 8.6355, time 118.31ms
iter 97420: loss 7.7468, time 119.84ms
iter 97430: loss 8.0899, time 119.16ms
iter 97440: loss 7.9293, time 119.06ms
iter 97450: loss 8.3571, time 119.08ms
iter 97460: loss 8.5517, time 119.53ms
iter 97470: loss 8.0489, time 119.05ms
iter 97480: loss 8.7685, time 120.47ms
iter 97490: loss 8.5511, time 121.26ms
tensor(0.5000)
step 97500: train loss 7.9091, val loss 7.9416
saving checkpoint to out-shakespeare-char
iter 97500: loss 8.0538, time 2849.57ms
iter 97510: loss 7.9566, time 121.25ms
iter 97520: loss 7.9267, time 119.10ms
iter 97530: loss 7.8289, time 119.26ms
iter 97540: loss 8.1886, time 119.27ms
iter 97550: loss 8.2167, time 119.46ms
iter 97560: loss 8.6907, time 118.57ms
iter 97570: loss 8.2933, time 116.82ms
iter 97580: loss 8.8113, time 120.10ms
iter 97590: loss 8.6506, time 119.60ms
tensor(0.5314)
iter 97600: loss 8.2888, time 119.81ms
iter 97610: loss 7.7186, time 120.04ms
iter 97620: loss 7.8339, time 120.32ms
iter 97630: loss 8.3747, time 121.59ms
iter 97640: loss 8.6756, time 122.05ms
iter 97650: loss 8.5144, time 122.58ms
iter 97660: loss 8.0917, time 122.32ms
iter 97670: loss 8.2864, time 120.12ms
iter 97680: loss 8.3860, time 122.54ms
iter 97690: loss 8.2409, time 121.14ms
tensor(0.5627)
iter 97700: loss 8.3269, time 121.32ms
iter 97710: loss 8.0861, time 121.16ms
iter 97720: loss 8.4439, time 119.16ms
iter 97730: loss 8.1649, time 119.04ms
iter 97740: loss 8.2695, time 119.21ms
step 97750: train loss 7.9438, val loss 7.9749
saving checkpoint to out-shakespeare-char
iter 97750: loss 8.1900, time 2851.66ms
iter 97760: loss 8.2218, time 122.54ms
iter 97770: loss 8.3086, time 122.48ms
iter 97780: loss 8.2664, time 120.85ms
iter 97790: loss 8.3206, time 121.97ms
tensor(0.5937)
iter 97800: loss 8.1115, time 121.70ms
iter 97810: loss 8.6318, time 121.41ms
iter 97820: loss 8.2524, time 119.10ms
iter 97830: loss 8.6592, time 119.57ms
iter 97840: loss 7.9244, time 119.56ms
iter 97850: loss 8.7634, time 119.68ms
iter 97860: loss 8.1477, time 120.52ms
iter 97870: loss 8.2286, time 121.28ms
iter 97880: loss 8.9273, time 122.67ms
iter 97890: loss 8.2573, time 121.54ms
tensor(0.6243)
iter 97900: loss 7.9838, time 122.84ms
iter 97910: loss 8.1352, time 122.72ms
iter 97920: loss 8.6506, time 121.49ms
iter 97930: loss 8.7028, time 119.65ms
iter 97940: loss 8.2388, time 120.03ms
iter 97950: loss 8.6788, time 119.57ms
iter 97960: loss 8.2858, time 119.83ms
iter 97970: loss 8.2666, time 121.09ms
iter 97980: loss 7.9377, time 122.03ms
iter 97990: loss 7.8925, time 122.80ms
tensor(0.6545)
step 98000: train loss 7.9779, val loss 7.9704
saving checkpoint to out-shakespeare-char
iter 98000: loss 8.3482, time 2854.20ms
iter 98010: loss 8.5156, time 119.13ms
iter 98020: loss 8.1604, time 119.47ms
iter 98030: loss 8.4833, time 120.07ms
iter 98040: loss 8.9325, time 120.45ms
iter 98050: loss 8.5293, time 120.48ms
iter 98060: loss 8.6788, time 121.70ms
iter 98070: loss 8.4701, time 122.35ms
iter 98080: loss 8.1999, time 120.66ms
iter 98090: loss 10.0003, time 122.38ms
tensor(0.6841)
iter 98100: loss 10.0383, time 121.64ms
iter 98110: loss 9.9239, time 121.66ms
iter 98120: loss 10.7265, time 119.22ms
iter 98130: loss 9.6803, time 119.12ms
iter 98140: loss 11.7975, time 119.54ms
iter 98150: loss 10.8708, time 119.65ms
iter 98160: loss 10.0255, time 120.43ms
iter 98170: loss 10.4171, time 120.20ms
iter 98180: loss 10.6181, time 120.96ms
iter 98190: loss 10.4750, time 120.60ms
tensor(0.7129)
iter 98200: loss 10.3066, time 122.82ms
iter 98210: loss 9.4096, time 122.65ms
iter 98220: loss 9.0948, time 122.45ms
iter 98230: loss 9.4647, time 122.43ms
iter 98240: loss 9.2206, time 121.40ms
step 98250: train loss 7.9695, val loss 7.9381
saving checkpoint to out-shakespeare-char
iter 98250: loss 8.7011, time 2836.63ms
iter 98260: loss 9.7684, time 119.27ms
iter 98270: loss 10.0774, time 119.45ms
iter 98280: loss 8.6235, time 119.68ms
iter 98290: loss 8.5149, time 119.51ms
tensor(0.7409)
iter 98300: loss 9.9410, time 119.56ms
iter 98310: loss 9.4462, time 121.33ms
iter 98320: loss 8.7845, time 122.02ms
iter 98330: loss 8.6025, time 122.43ms
iter 98340: loss 9.0662, time 122.65ms
iter 98350: loss 9.1064, time 121.24ms
iter 98360: loss 8.8240, time 122.54ms
iter 98370: loss 8.8780, time 121.62ms
iter 98380: loss 8.7459, time 121.19ms
iter 98390: loss 8.8752, time 119.96ms
tensor(0.7679)
iter 98400: loss 8.4426, time 119.46ms
iter 98410: loss 9.1177, time 119.42ms
iter 98420: loss 8.9937, time 119.03ms
iter 98430: loss 8.5522, time 119.61ms
iter 98440: loss 8.5920, time 120.34ms
iter 98450: loss 8.8859, time 120.57ms
iter 98460: loss 8.3114, time 121.59ms
iter 98470: loss 8.5784, time 122.28ms
iter 98480: loss 8.3904, time 120.02ms
iter 98490: loss 8.9745, time 122.19ms
tensor(0.7939)
step 98500: train loss 7.9519, val loss 7.9414
saving checkpoint to out-shakespeare-char
iter 98500: loss 8.6350, time 2857.76ms
iter 98510: loss 8.9669, time 119.63ms
iter 98520: loss 9.6242, time 119.60ms
iter 98530: loss 12.0547, time 120.41ms
iter 98540: loss 14.0525, time 120.78ms
iter 98550: loss 16.7335, time 122.50ms
iter 98560: loss 21.4668, time 123.24ms
iter 98570: loss 17.2733, time 122.57ms
iter 98580: loss 20.7322, time 121.16ms
iter 98590: loss 16.6809, time 119.25ms
tensor(0.8187)
iter 98600: loss 17.4501, time 121.39ms
iter 98610: loss 18.6250, time 119.25ms
iter 98620: loss 22.0700, time 119.47ms
iter 98630: loss 16.3761, time 119.23ms
iter 98640: loss 24.6201, time 119.62ms
iter 98650: loss 20.1978, time 119.25ms
iter 98660: loss 24.3964, time 120.57ms
iter 98670: loss 22.8440, time 120.94ms
iter 98680: loss 29.5729, time 121.67ms
iter 98690: loss 23.6278, time 122.55ms
tensor(0.8423)
iter 98700: loss 15.3410, time 121.54ms
iter 98710: loss 23.2694, time 122.16ms
iter 98720: loss 15.5962, time 122.31ms
iter 98730: loss 19.1434, time 121.22ms
iter 98740: loss 20.5267, time 121.64ms
step 98750: train loss 7.9897, val loss 8.0404
saving checkpoint to out-shakespeare-char
iter 98750: loss 18.1981, time 2858.32ms
iter 98760: loss 21.3907, time 120.21ms
iter 98770: loss 26.4944, time 122.26ms
iter 98780: loss 21.6257, time 122.05ms
iter 98790: loss 11.8546, time 122.26ms
tensor(0.8645)
iter 98800: loss 12.9808, time 123.11ms
iter 98810: loss 15.3744, time 122.00ms
iter 98820: loss 14.7352, time 121.12ms
iter 98830: loss 17.4299, time 119.15ms
iter 98840: loss 16.2494, time 118.79ms
iter 98850: loss 10.9935, time 119.38ms
iter 98860: loss 13.3780, time 119.35ms
iter 98870: loss 12.1856, time 120.16ms
iter 98880: loss 12.8655, time 120.13ms
iter 98890: loss 15.8566, time 119.59ms
tensor(0.8853)
iter 98900: loss 12.6105, time 121.51ms
iter 98910: loss 12.0677, time 122.69ms
iter 98920: loss 12.0523, time 121.80ms
iter 98930: loss 12.4601, time 122.23ms
iter 98940: loss 12.8574, time 120.26ms
iter 98950: loss 10.5796, time 123.00ms
iter 98960: loss 11.3333, time 122.38ms
iter 98970: loss 9.1975, time 121.36ms
iter 98980: loss 12.2316, time 121.31ms
iter 98990: loss 10.5098, time 119.17ms
tensor(0.9045)
step 99000: train loss 7.9994, val loss 7.9567
saving checkpoint to out-shakespeare-char
iter 99000: loss 10.5370, time 2850.24ms
iter 99010: loss 9.7576, time 121.16ms
iter 99020: loss 9.7854, time 121.60ms
iter 99030: loss 9.9575, time 122.57ms
iter 99040: loss 11.1565, time 122.42ms
iter 99050: loss 10.1766, time 120.19ms
iter 99060: loss 9.4099, time 121.41ms
iter 99070: loss 9.9895, time 121.27ms
iter 99080: loss 8.7723, time 121.40ms
iter 99090: loss 9.1457, time 119.24ms
tensor(0.9222)
iter 99100: loss 10.1596, time 119.18ms
iter 99110: loss 8.8673, time 119.06ms
iter 99120: loss 9.3241, time 118.98ms
iter 99130: loss 9.8817, time 119.30ms
iter 99140: loss 9.3185, time 119.04ms
iter 99150: loss 9.3379, time 119.52ms
iter 99160: loss 9.4368, time 118.35ms
iter 99170: loss 9.0129, time 120.30ms
iter 99180: loss 9.2169, time 120.25ms
iter 99190: loss 9.0029, time 121.39ms
tensor(0.9382)
iter 99200: loss 8.8415, time 122.26ms
iter 99210: loss 8.8026, time 121.21ms
iter 99220: loss 9.4077, time 122.16ms
iter 99230: loss 8.5541, time 122.73ms
iter 99240: loss 8.8814, time 122.50ms
step 99250: train loss 7.9779, val loss 7.9152
saving checkpoint to out-shakespeare-char
iter 99250: loss 8.6423, time 2842.43ms
iter 99260: loss 8.6679, time 119.27ms
iter 99270: loss 8.4020, time 119.36ms
iter 99280: loss 8.5326, time 119.18ms
iter 99290: loss 8.6275, time 118.55ms
tensor(0.9524)
iter 99300: loss 7.9857, time 120.37ms
iter 99310: loss 8.2086, time 120.33ms
iter 99320: loss 8.8622, time 119.02ms
iter 99330: loss 9.4086, time 120.78ms
iter 99340: loss 8.9679, time 120.19ms
iter 99350: loss 9.4215, time 122.15ms
iter 99360: loss 8.5691, time 122.74ms
iter 99370: loss 78.4552, time 121.51ms
iter 99380: loss 171.8403, time 122.50ms
iter 99390: loss 35.1120, time 122.24ms
tensor(0.9649)
iter 99400: loss 22.2077, time 122.80ms
iter 99410: loss 15.0199, time 122.70ms
iter 99420: loss 16.9571, time 121.36ms
iter 99430: loss 18.0596, time 121.38ms
iter 99440: loss 24.4237, time 119.34ms
iter 99450: loss 50.5153, time 119.29ms
iter 99460: loss 22.1381, time 119.09ms
iter 99470: loss 23.9407, time 118.97ms
iter 99480: loss 27.2303, time 119.60ms
iter 99490: loss 28.2990, time 119.96ms
tensor(0.9755)
step 99500: train loss 9.6060, val loss 9.6171
saving checkpoint to out-shakespeare-char
iter 99500: loss 33.1348, time 2849.11ms
iter 99510: loss 40.5701, time 122.83ms
iter 99520: loss 47.5239, time 122.95ms
iter 99530: loss 57.6319, time 121.19ms
iter 99540: loss 60.4357, time 121.11ms
iter 99550: loss 67.2148, time 119.05ms
iter 99560: loss 80.6831, time 119.11ms
iter 99570: loss 90.6546, time 118.84ms
iter 99580: loss 64.9822, time 119.19ms
iter 99590: loss 70.7963, time 119.23ms
tensor(0.9843)
iter 99600: loss 80.4863, time 119.44ms
iter 99610: loss 87.4237, time 118.96ms
iter 99620: loss 64.8901, time 119.26ms
iter 99630: loss 60.4489, time 119.85ms
iter 99640: loss 54.8358, time 120.53ms
iter 99650: loss 46.9744, time 120.67ms
iter 99660: loss 44.7041, time 120.24ms
iter 99670: loss 48.0343, time 122.59ms
iter 99680: loss 37.8700, time 122.34ms
iter 99690: loss 39.6280, time 122.55ms
tensor(0.9911)
iter 99700: loss 36.4369, time 121.55ms
iter 99710: loss 27.6175, time 120.12ms
iter 99720: loss 28.3344, time 121.22ms
iter 99730: loss 25.9681, time 121.49ms
iter 99740: loss 25.6136, time 120.20ms
step 99750: train loss 7.9798, val loss 8.0262
saving checkpoint to out-shakespeare-char
iter 99750: loss 22.5881, time 2848.42ms
iter 99760: loss 20.7230, time 120.59ms
iter 99770: loss 18.5581, time 120.20ms
iter 99780: loss 17.7001, time 121.09ms
iter 99790: loss 16.7423, time 122.04ms
tensor(0.9961)
iter 99800: loss 18.2028, time 122.74ms
iter 99810: loss 18.2609, time 122.58ms
iter 99820: loss 21.2588, time 120.30ms
iter 99830: loss 23.6077, time 122.52ms
iter 99840: loss 27.7836, time 121.41ms
iter 99850: loss 29.6336, time 121.19ms
iter 99860: loss 29.3966, time 120.89ms
iter 99870: loss 28.8866, time 118.99ms
iter 99880: loss 30.1557, time 119.10ms
iter 99890: loss 32.7018, time 119.12ms
tensor(0.9990)
iter 99900: loss 37.4961, time 119.55ms
iter 99910: loss 38.0807, time 118.79ms
iter 99920: loss 38.0727, time 118.98ms
iter 99930: loss 36.7284, time 119.00ms
iter 99940: loss 31.5821, time 119.91ms
iter 99950: loss 26.6555, time 120.40ms
iter 99960: loss 24.5876, time 119.35ms
iter 99970: loss 27.7916, time 120.69ms
iter 99980: loss 25.8906, time 119.96ms
iter 99990: loss 30.9891, time 121.58ms
tensor(1.)
step 100000: train loss 7.9030, val loss 7.9295
saving checkpoint to out-shakespeare-char
iter 100000: loss 29.4445, time 2851.75ms
iter 100010: loss 30.1040, time 119.32ms
iter 100020: loss 29.4431, time 119.12ms
iter 100030: loss 36.8157, time 119.05ms
iter 100040: loss 40.8412, time 119.64ms
iter 100050: loss 39.8537, time 119.18ms
iter 100060: loss 35.4309, time 119.25ms
iter 100070: loss 33.5806, time 119.03ms
iter 100080: loss 38.3862, time 119.82ms
iter 100090: loss 35.8751, time 118.92ms
tensor(0.9990)
iter 100100: loss 31.3640, time 120.80ms
iter 100110: loss 30.3029, time 120.79ms
iter 100120: loss 25.2545, time 121.43ms
iter 100130: loss 24.8529, time 122.16ms
iter 100140: loss 22.2293, time 121.78ms
iter 100150: loss 25.4170, time 122.54ms
iter 100160: loss 20.9951, time 122.27ms
iter 100170: loss 18.3511, time 122.28ms
iter 100180: loss 18.5906, time 122.46ms
iter 100190: loss 18.5479, time 121.15ms
tensor(0.9961)
iter 100200: loss 16.5888, time 121.35ms
iter 100210: loss 16.8022, time 121.26ms
iter 100220: loss 15.4355, time 119.01ms
iter 100230: loss 11.9889, time 118.83ms
iter 100240: loss 15.1059, time 119.12ms
step 100250: train loss 7.9572, val loss 7.9554
saving checkpoint to out-shakespeare-char
iter 100250: loss 12.4227, time 2844.52ms
iter 100260: loss 13.0825, time 122.22ms
iter 100270: loss 12.2084, time 122.63ms
iter 100280: loss 12.9546, time 122.26ms
iter 100290: loss 12.4478, time 122.28ms
tensor(0.9911)
iter 100300: loss 11.6919, time 121.43ms
iter 100310: loss 11.6698, time 121.32ms
iter 100320: loss 11.1610, time 121.29ms
iter 100330: loss 12.1278, time 119.00ms
iter 100340: loss 10.4240, time 118.93ms
iter 100350: loss 11.1050, time 119.24ms
iter 100360: loss 10.0227, time 119.84ms
iter 100370: loss 11.2636, time 120.29ms
iter 100380: loss 11.5654, time 120.63ms
iter 100390: loss 14.4782, time 120.34ms
tensor(0.9843)
iter 100400: loss 19.1357, time 118.38ms
iter 100410: loss 18.0690, time 121.42ms
iter 100420: loss 27.6041, time 121.84ms
iter 100430: loss 28.9795, time 120.38ms
iter 100440: loss 26.5488, time 122.27ms
iter 100450: loss 32.0548, time 121.34ms
iter 100460: loss 37.7998, time 121.04ms
iter 100470: loss 35.9642, time 121.32ms
iter 100480: loss 38.8277, time 119.21ms
iter 100490: loss 36.9674, time 121.01ms
tensor(0.9755)
step 100500: train loss 7.9238, val loss 7.8735
saving checkpoint to out-shakespeare-char
iter 100500: loss 44.8931, time 2851.91ms
iter 100510: loss 43.1462, time 121.00ms
iter 100520: loss 60.0494, time 122.07ms
iter 100530: loss 54.4272, time 122.44ms
iter 100540: loss 53.5715, time 119.64ms
iter 100550: loss 40.2896, time 121.25ms
iter 100560: loss 44.8061, time 121.09ms
iter 100570: loss 39.1080, time 121.48ms
iter 100580: loss 42.4260, time 121.07ms
iter 100590: loss 39.8527, time 119.22ms
tensor(0.9649)
iter 100600: loss 46.0603, time 119.74ms
iter 100610: loss 38.9538, time 119.96ms
iter 100620: loss 52.2155, time 120.51ms
iter 100630: loss 57.6972, time 120.03ms
iter 100640: loss 63.1696, time 120.82ms
iter 100650: loss 54.8604, time 120.30ms
iter 100660: loss 59.0956, time 122.34ms
iter 100670: loss 40.6508, time 122.60ms
iter 100680: loss 51.3191, time 120.55ms
iter 100690: loss 41.8791, time 121.35ms
tensor(0.9524)
iter 100700: loss 46.5966, time 121.57ms
iter 100710: loss 40.7596, time 121.49ms
iter 100720: loss 27.9785, time 119.56ms
iter 100730: loss 42.5997, time 119.08ms
iter 100740: loss 41.2655, time 119.75ms
step 100750: train loss 8.5729, val loss 8.6243
saving checkpoint to out-shakespeare-char
iter 100750: loss 25.0112, time 2846.81ms
iter 100760: loss 357.2743, time 121.16ms
iter 100770: loss 204.5876, time 122.30ms
iter 100780: loss 57.3665, time 122.31ms
iter 100790: loss 185.9599, time 121.11ms
tensor(0.9382)
iter 100800: loss 368.0521, time 121.52ms
iter 100810: loss 111.7937, time 121.04ms
iter 100820: loss 117.2912, time 122.05ms
iter 100830: loss 172.1796, time 121.52ms
iter 100840: loss 167.2321, time 119.30ms
iter 100850: loss 195.1120, time 120.42ms
iter 100860: loss 173.4411, time 122.22ms
iter 100870: loss 125.1383, time 120.64ms
iter 100880: loss 317.6906, time 120.56ms
iter 100890: loss 256.5389, time 120.11ms
tensor(0.9222)
iter 100900: loss 200.4399, time 121.94ms
iter 100910: loss 185.7820, time 121.14ms
iter 100920: loss 153.0748, time 122.67ms
iter 100930: loss 129.4730, time 122.26ms
iter 100940: loss 154.5880, time 118.99ms
iter 100950: loss 98.7161, time 121.15ms
iter 100960: loss 74.9726, time 121.01ms
iter 100970: loss 112.6890, time 120.95ms
iter 100980: loss 109.1115, time 121.26ms
iter 100990: loss 433.5932, time 118.75ms
tensor(0.9045)
step 101000: train loss 57.6825, val loss 56.8860
saving checkpoint to out-shakespeare-char
iter 101000: loss 151.6270, time 2829.66ms
iter 101010: loss 82.1038, time 121.37ms
iter 101020: loss 65.2892, time 120.11ms
iter 101030: loss 118.1086, time 120.43ms
iter 101040: loss 94.1276, time 120.22ms
iter 101050: loss 96.4097, time 119.88ms
iter 101060: loss 78.8306, time 120.70ms
iter 101070: loss 62.2392, time 121.17ms
iter 101080: loss 95.6099, time 122.00ms
iter 101090: loss 128.7475, time 122.14ms
tensor(0.8853)
iter 101100: loss 108.5428, time 120.21ms
iter 101110: loss 66.3498, time 120.31ms
iter 101120: loss 85.0764, time 121.00ms
iter 101130: loss 111.3458, time 121.79ms
iter 101140: loss 82.1690, time 120.77ms
iter 101150: loss 100.0255, time 118.83ms
iter 101160: loss 65.2108, time 121.02ms
iter 101170: loss 92.5536, time 121.04ms
iter 101180: loss 74.1876, time 120.96ms
iter 101190: loss 71.2980, time 118.83ms
tensor(0.8645)
iter 101200: loss 65.2602, time 119.18ms
iter 101210: loss 84.2542, time 118.66ms
iter 101220: loss 98.8512, time 120.11ms
iter 101230: loss 91.5672, time 120.13ms
iter 101240: loss 103.0976, time 120.05ms
step 101250: train loss 38.3800, val loss 38.7125
saving checkpoint to out-shakespeare-char
iter 101250: loss 90.0563, time 2812.41ms
iter 101260: loss 102.1935, time 120.21ms
iter 101270: loss 89.1383, time 120.16ms
iter 101280: loss 93.0867, time 120.02ms
iter 101290: loss 117.0120, time 120.50ms
tensor(0.8423)
iter 101300: loss 93.9671, time 120.81ms
iter 101310: loss 107.3459, time 120.13ms
iter 101320: loss 80.1853, time 121.79ms
iter 101330: loss 98.8325, time 121.99ms
iter 101340: loss 102.5359, time 122.24ms
iter 101350: loss 201.9042, time 122.14ms
iter 101360: loss 93.7766, time 119.43ms
iter 101370: loss 112.1668, time 120.45ms
iter 101380: loss 839.1853, time 120.82ms
iter 101390: loss 147.3237, time 120.94ms
tensor(0.8187)
iter 101400: loss 146.2534, time 121.35ms
iter 101410: loss 115.7346, time 118.91ms
iter 101420: loss 119.0583, time 120.73ms
iter 101430: loss 125.7431, time 120.97ms
iter 101440: loss 124.8887, time 120.96ms
iter 101450: loss 224.6302, time 119.19ms
iter 101460: loss 136.9092, time 118.83ms
iter 101470: loss 143.6285, time 117.78ms
iter 101480: loss 109.7731, time 119.27ms
iter 101490: loss 121.8244, time 119.63ms
tensor(0.7939)
step 101500: train loss 60.0183, val loss 60.5659
saving checkpoint to out-shakespeare-char
iter 101500: loss 108.6480, time 2828.95ms
iter 101510: loss 98.3855, time 120.47ms
iter 101520: loss 87.1178, time 120.26ms
iter 101530: loss 84.0296, time 122.40ms
iter 101540: loss 69.2827, time 122.44ms
iter 101550: loss 74.8474, time 122.19ms
iter 101560: loss 81.2791, time 121.63ms
iter 101570: loss 52.5731, time 118.10ms
iter 101580: loss 80.1555, time 121.69ms
iter 101590: loss 68.1621, time 121.01ms
tensor(0.7679)
iter 101600: loss 76.2910, time 121.75ms
iter 101610: loss 60.5242, time 121.23ms
iter 101620: loss 63.7452, time 118.55ms
iter 101630: loss 60.1947, time 119.92ms
iter 101640: loss 50.3264, time 120.91ms
iter 101650: loss 62.9659, time 120.37ms
iter 101660: loss 57.6778, time 120.28ms
iter 101670: loss 65.5077, time 118.47ms
iter 101680: loss 58.7922, time 120.97ms
iter 101690: loss 68.9339, time 120.49ms
tensor(0.7409)
iter 101700: loss 61.0996, time 120.33ms
iter 101710: loss 53.2747, time 120.90ms
iter 101720: loss 71.1332, time 118.24ms
iter 101730: loss 69.0901, time 119.99ms
iter 101740: loss 82.0501, time 120.83ms
step 101750: train loss 13.4037, val loss 13.4312
saving checkpoint to out-shakespeare-char
iter 101750: loss 64.3455, time 2793.35ms
iter 101760: loss 63.7840, time 121.34ms
iter 101770: loss 62.3514, time 121.04ms
iter 101780: loss 73.3725, time 120.91ms
iter 101790: loss 72.4246, time 121.17ms
tensor(0.7129)
iter 101800: loss 67.0002, time 121.49ms
iter 101810: loss 66.1061, time 120.99ms
iter 101820: loss 69.5715, time 119.09ms
iter 101830: loss 74.9727, time 119.11ms
iter 101840: loss 62.1625, time 119.99ms
iter 101850: loss 62.6826, time 119.75ms
iter 101860: loss 79.8194, time 120.26ms
iter 101870: loss 69.9649, time 120.52ms
iter 101880: loss 61.5202, time 119.85ms
iter 101890: loss 63.5545, time 120.12ms
tensor(0.6841)
iter 101900: loss 69.3932, time 120.74ms
iter 101910: loss 68.0623, time 121.21ms
iter 101920: loss 62.5592, time 122.56ms
iter 101930: loss 79.0142, time 120.08ms
iter 101940: loss 64.1748, time 121.41ms
iter 101950: loss 60.0869, time 120.97ms
iter 101960: loss 64.1879, time 121.28ms
iter 101970: loss 72.9426, time 121.23ms
iter 101980: loss 67.3525, time 119.03ms
iter 101990: loss 73.7691, time 119.19ms
tensor(0.6545)
step 102000: train loss 13.2718, val loss 13.1946
saving checkpoint to out-shakespeare-char
iter 102000: loss 65.1508, time 2863.53ms
iter 102010: loss 85.0324, time 124.55ms
iter 102020: loss 84.4506, time 122.35ms
iter 102030: loss 57.1594, time 121.31ms
iter 102040: loss 74.3410, time 118.93ms
iter 102050: loss 81.2152, time 120.67ms
iter 102060: loss 77.3819, time 121.01ms
iter 102070: loss 69.9527, time 120.91ms
iter 102080: loss 73.7878, time 121.16ms
iter 102090: loss 81.2393, time 118.80ms
tensor(0.6243)
iter 102100: loss 86.5235, time 121.40ms
iter 102110: loss 70.2449, time 118.83ms
iter 102120: loss 71.2721, time 118.86ms
iter 102130: loss 72.3699, time 119.60ms
iter 102140: loss nan, time 120.06ms
iter 102150: loss nan, time 118.62ms
iter 102160: loss nan, time 120.05ms
iter 102170: loss nan, time 120.20ms
iter 102180: loss nan, time 119.96ms
iter 102190: loss nan, time 119.94ms
tensor(0.5937)
iter 102200: loss nan, time 118.95ms
iter 102210: loss nan, time 120.95ms
iter 102220: loss nan, time 121.78ms
iter 102230: loss nan, time 123.05ms
iter 102240: loss nan, time 122.52ms
step 102250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 102250: loss nan, time 2896.53ms
iter 102260: loss nan, time 121.43ms
iter 102270: loss nan, time 122.43ms
iter 102280: loss nan, time 121.07ms
iter 102290: loss nan, time 121.02ms
tensor(0.5627)
iter 102300: loss nan, time 121.70ms
iter 102310: loss nan, time 121.37ms
iter 102320: loss nan, time 119.28ms
iter 102330: loss nan, time 119.00ms
iter 102340: loss nan, time 119.32ms
iter 102350: loss nan, time 119.92ms
iter 102360: loss nan, time 120.11ms
iter 102370: loss nan, time 119.90ms
iter 102380: loss nan, time 120.44ms
iter 102390: loss nan, time 120.46ms
tensor(0.5314)
iter 102400: loss nan, time 121.19ms
iter 102410: loss nan, time 121.58ms
iter 102420: loss nan, time 122.79ms
iter 102430: loss nan, time 121.46ms
iter 102440: loss nan, time 119.13ms
iter 102450: loss nan, time 121.35ms
iter 102460: loss nan, time 121.49ms
iter 102470: loss nan, time 121.51ms
iter 102480: loss nan, time 119.11ms
iter 102490: loss nan, time 120.47ms
tensor(0.5000)
step 102500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 102500: loss nan, time 2900.75ms
iter 102510: loss nan, time 119.48ms
iter 102520: loss nan, time 120.37ms
iter 102530: loss nan, time 120.63ms
iter 102540: loss nan, time 120.40ms
iter 102550: loss nan, time 121.07ms
iter 102560: loss nan, time 121.06ms
iter 102570: loss nan, time 122.60ms
iter 102580: loss nan, time 121.23ms
iter 102590: loss nan, time 121.71ms
tensor(0.4686)
iter 102600: loss nan, time 121.74ms
iter 102610: loss nan, time 119.43ms
iter 102620: loss nan, time 119.19ms
iter 102630: loss nan, time 121.53ms
iter 102640: loss nan, time 120.41ms
iter 102650: loss nan, time 120.72ms
iter 102660: loss nan, time 120.38ms
iter 102670: loss nan, time 121.23ms
iter 102680: loss nan, time 122.78ms
iter 102690: loss nan, time 120.53ms
tensor(0.4373)
iter 102700: loss nan, time 121.76ms
iter 102710: loss nan, time 121.23ms
iter 102720: loss nan, time 121.42ms
iter 102730: loss nan, time 121.26ms
iter 102740: loss nan, time 118.50ms
step 102750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 102750: loss nan, time 2906.95ms
iter 102760: loss nan, time 119.12ms
iter 102770: loss nan, time 120.54ms
iter 102780: loss nan, time 120.39ms
iter 102790: loss nan, time 120.40ms
tensor(0.4063)
iter 102800: loss nan, time 120.48ms
iter 102810: loss nan, time 118.35ms
iter 102820: loss nan, time 121.76ms
iter 102830: loss nan, time 122.84ms
iter 102840: loss nan, time 121.23ms
iter 102850: loss nan, time 121.33ms
iter 102860: loss nan, time 121.46ms
iter 102870: loss nan, time 121.90ms
iter 102880: loss nan, time 119.27ms
iter 102890: loss nan, time 119.83ms
tensor(0.3757)
iter 102900: loss nan, time 120.39ms
iter 102910: loss nan, time 120.20ms
iter 102920: loss nan, time 119.57ms
iter 102930: loss nan, time 120.61ms
iter 102940: loss nan, time 120.71ms
iter 102950: loss nan, time 122.14ms
iter 102960: loss nan, time 122.81ms
iter 102970: loss nan, time 121.15ms
iter 102980: loss nan, time 121.28ms
iter 102990: loss nan, time 119.32ms
tensor(0.3455)
step 103000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 103000: loss nan, time 2896.41ms
iter 103010: loss nan, time 121.13ms
iter 103020: loss nan, time 121.32ms
iter 103030: loss nan, time 121.21ms
iter 103040: loss nan, time 119.31ms
iter 103050: loss nan, time 119.40ms
iter 103060: loss nan, time 119.69ms
iter 103070: loss nan, time 120.52ms
iter 103080: loss nan, time 121.25ms
iter 103090: loss nan, time 121.41ms
tensor(0.3159)
iter 103100: loss nan, time 122.09ms
iter 103110: loss nan, time 121.66ms
iter 103120: loss nan, time 120.51ms
iter 103130: loss nan, time 121.70ms
iter 103140: loss nan, time 121.38ms
iter 103150: loss nan, time 120.64ms
iter 103160: loss nan, time 119.31ms
iter 103170: loss nan, time 119.54ms
iter 103180: loss nan, time 120.19ms
iter 103190: loss nan, time 120.14ms
tensor(0.2871)
iter 103200: loss nan, time 121.01ms
iter 103210: loss nan, time 120.58ms
iter 103220: loss nan, time 120.67ms
iter 103230: loss nan, time 120.86ms
iter 103240: loss nan, time 120.21ms
step 103250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 103250: loss nan, time 2910.78ms
iter 103260: loss nan, time 121.60ms
iter 103270: loss nan, time 121.34ms
iter 103280: loss nan, time 119.96ms
iter 103290: loss nan, time 121.12ms
tensor(0.2591)
iter 103300: loss nan, time 119.49ms
iter 103310: loss nan, time 121.52ms
iter 103320: loss nan, time 121.40ms
iter 103330: loss nan, time 120.28ms
iter 103340: loss nan, time 119.17ms
iter 103350: loss nan, time 119.67ms
iter 103360: loss nan, time 119.16ms
iter 103370: loss nan, time 120.22ms
iter 103380: loss nan, time 120.25ms
iter 103390: loss nan, time 120.53ms
tensor(0.2321)
iter 103400: loss nan, time 121.43ms
iter 103410: loss nan, time 121.21ms
iter 103420: loss nan, time 122.53ms
iter 103430: loss nan, time 121.51ms
iter 103440: loss nan, time 120.63ms
iter 103450: loss nan, time 122.11ms
iter 103460: loss nan, time 121.17ms
iter 103470: loss nan, time 121.10ms
iter 103480: loss nan, time 121.18ms
iter 103490: loss nan, time 118.14ms
tensor(0.2061)
step 103500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 103500: loss nan, time 2903.84ms
iter 103510: loss nan, time 121.10ms
iter 103520: loss nan, time 119.01ms
iter 103530: loss nan, time 119.49ms
iter 103540: loss nan, time 120.39ms
iter 103550: loss nan, time 120.87ms
iter 103560: loss nan, time 120.18ms
iter 103570: loss nan, time 120.19ms
iter 103580: loss nan, time 120.40ms
iter 103590: loss nan, time 121.47ms
tensor(0.1813)
iter 103600: loss nan, time 120.69ms
iter 103610: loss nan, time 122.46ms
iter 103620: loss nan, time 122.39ms
iter 103630: loss nan, time 121.08ms
iter 103640: loss nan, time 121.23ms
iter 103650: loss nan, time 119.00ms
iter 103660: loss nan, time 118.72ms
iter 103670: loss nan, time 119.22ms
iter 103680: loss nan, time 118.87ms
iter 103690: loss nan, time 118.99ms
tensor(0.1577)
iter 103700: loss nan, time 120.07ms
iter 103710: loss nan, time 119.15ms
iter 103720: loss nan, time 120.39ms
iter 103730: loss nan, time 121.38ms
iter 103740: loss nan, time 122.12ms
step 103750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 103750: loss nan, time 2921.74ms
iter 103760: loss nan, time 122.61ms
iter 103770: loss nan, time 121.38ms
iter 103780: loss nan, time 122.49ms
iter 103790: loss nan, time 120.58ms
tensor(0.1355)
iter 103800: loss nan, time 121.34ms
iter 103810: loss nan, time 121.21ms
iter 103820: loss nan, time 119.39ms
iter 103830: loss nan, time 114.62ms
iter 103840: loss nan, time 118.51ms
iter 103850: loss nan, time 115.73ms
iter 103860: loss nan, time 114.93ms
iter 103870: loss nan, time 122.16ms
iter 103880: loss nan, time 119.73ms
iter 103890: loss nan, time 121.05ms
tensor(0.1147)
iter 103900: loss nan, time 121.30ms
iter 103910: loss nan, time 123.53ms
iter 103920: loss nan, time 121.97ms
iter 103930: loss nan, time 121.35ms
iter 103940: loss nan, time 121.99ms
iter 103950: loss nan, time 117.56ms
iter 103960: loss nan, time 119.81ms
iter 103970: loss nan, time 119.68ms
iter 103980: loss nan, time 119.53ms
iter 103990: loss nan, time 119.33ms
tensor(0.0955)
step 104000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 104000: loss nan, time 2910.17ms
iter 104010: loss nan, time 120.73ms
iter 104020: loss nan, time 121.22ms
iter 104030: loss nan, time 119.60ms
iter 104040: loss nan, time 117.88ms
iter 104050: loss nan, time 120.84ms
iter 104060: loss nan, time 119.04ms
iter 104070: loss nan, time 120.46ms
iter 104080: loss nan, time 120.68ms
iter 104090: loss nan, time 121.44ms
tensor(0.0778)
iter 104100: loss nan, time 122.66ms
iter 104110: loss nan, time 122.58ms
iter 104120: loss nan, time 122.63ms
iter 104130: loss nan, time 122.16ms
iter 104140: loss nan, time 119.24ms
iter 104150: loss nan, time 120.15ms
iter 104160: loss nan, time 120.11ms
iter 104170: loss nan, time 119.58ms
iter 104180: loss nan, time 120.78ms
iter 104190: loss nan, time 120.59ms
tensor(0.0618)
iter 104200: loss nan, time 122.89ms
iter 104210: loss nan, time 122.93ms
iter 104220: loss nan, time 121.11ms
iter 104230: loss nan, time 122.58ms
iter 104240: loss nan, time 122.26ms
step 104250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 104250: loss nan, time 2905.95ms
iter 104260: loss nan, time 123.17ms
iter 104270: loss nan, time 121.35ms
iter 104280: loss nan, time 115.23ms
iter 104290: loss nan, time 118.50ms
tensor(0.0476)
iter 104300: loss nan, time 117.41ms
iter 104310: loss nan, time 116.89ms
iter 104320: loss nan, time 116.07ms
iter 104330: loss nan, time 115.50ms
iter 104340: loss nan, time 117.17ms
iter 104350: loss nan, time 118.25ms
iter 104360: loss nan, time 115.52ms
iter 104370: loss nan, time 117.58ms
iter 104380: loss nan, time 116.07ms
iter 104390: loss nan, time 114.80ms
tensor(0.0351)
iter 104400: loss nan, time 116.58ms
iter 104410: loss nan, time 116.85ms
iter 104420: loss nan, time 114.52ms
iter 104430: loss nan, time 117.70ms
iter 104440: loss nan, time 115.75ms
iter 104450: loss nan, time 117.18ms
iter 104460: loss nan, time 117.34ms
iter 104470: loss nan, time 115.81ms
iter 104480: loss nan, time 115.21ms
iter 104490: loss nan, time 118.16ms
tensor(0.0245)
step 104500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 104500: loss nan, time 2902.09ms
iter 104510: loss nan, time 118.39ms
iter 104520: loss nan, time 116.87ms
iter 104530: loss nan, time 114.72ms
iter 104540: loss nan, time 117.22ms
iter 104550: loss nan, time 115.60ms
iter 104560: loss nan, time 117.19ms
iter 104570: loss nan, time 118.06ms
iter 104580: loss nan, time 115.02ms
iter 104590: loss nan, time 116.84ms
tensor(0.0157)
iter 104600: loss nan, time 117.53ms
iter 104610: loss nan, time 114.72ms
iter 104620: loss nan, time 117.75ms
iter 104630: loss nan, time 116.38ms
iter 104640: loss nan, time 115.34ms
iter 104650: loss nan, time 118.29ms
iter 104660: loss nan, time 116.08ms
iter 104670: loss nan, time 115.06ms
iter 104680: loss nan, time 118.53ms
iter 104690: loss nan, time 114.97ms
tensor(0.0089)
iter 104700: loss nan, time 115.45ms
iter 104710: loss nan, time 116.61ms
iter 104720: loss nan, time 114.84ms
iter 104730: loss nan, time 116.79ms
iter 104740: loss nan, time 114.99ms
step 104750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 104750: loss nan, time 2898.59ms
iter 104760: loss nan, time 115.78ms
iter 104770: loss nan, time 116.69ms
iter 104780: loss nan, time 117.91ms
iter 104790: loss nan, time 114.86ms
tensor(0.0039)
iter 104800: loss nan, time 117.25ms
iter 104810: loss nan, time 116.43ms
iter 104820: loss nan, time 113.75ms
iter 104830: loss nan, time 117.49ms
iter 104840: loss nan, time 115.67ms
iter 104850: loss nan, time 116.36ms
iter 104860: loss nan, time 117.02ms
iter 104870: loss nan, time 114.55ms
iter 104880: loss nan, time 117.90ms
iter 104890: loss nan, time 115.55ms
tensor(0.0010)
iter 104900: loss nan, time 115.39ms
iter 104910: loss nan, time 117.05ms
iter 104920: loss nan, time 114.75ms
iter 104930: loss nan, time 116.83ms
iter 104940: loss nan, time 116.91ms
iter 104950: loss nan, time 114.46ms
iter 104960: loss nan, time 115.81ms
iter 104970: loss nan, time 116.02ms
iter 104980: loss nan, time 116.41ms
iter 104990: loss nan, time 118.30ms
tensor(0.0010)
step 105000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 105000: loss nan, time 2906.13ms
iter 105010: loss nan, time 117.96ms
iter 105020: loss nan, time 116.29ms
iter 105030: loss nan, time 114.76ms
iter 105040: loss nan, time 118.01ms
iter 105050: loss nan, time 115.12ms
iter 105060: loss nan, time 116.89ms
iter 105070: loss nan, time 119.36ms
iter 105080: loss nan, time 114.78ms
iter 105090: loss nan, time 116.84ms
tensor(0.0010)
iter 105100: loss nan, time 118.05ms
iter 105110: loss nan, time 114.90ms
iter 105120: loss nan, time 118.57ms
iter 105130: loss nan, time 116.30ms
iter 105140: loss nan, time 115.12ms
iter 105150: loss nan, time 117.26ms
iter 105160: loss nan, time 115.45ms
iter 105170: loss nan, time 116.69ms
iter 105180: loss nan, time 117.14ms
iter 105190: loss nan, time 114.77ms
tensor(0.0039)
iter 105200: loss nan, time 119.45ms
iter 105210: loss nan, time 116.52ms
iter 105220: loss nan, time 114.68ms
iter 105230: loss nan, time 118.37ms
iter 105240: loss nan, time 116.40ms
step 105250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 105250: loss nan, time 2890.49ms
iter 105260: loss nan, time 116.37ms
iter 105270: loss nan, time 114.58ms
iter 105280: loss nan, time 118.18ms
iter 105290: loss nan, time 115.79ms
tensor(0.0089)
iter 105300: loss nan, time 115.66ms
iter 105310: loss nan, time 117.13ms
iter 105320: loss nan, time 115.71ms
iter 105330: loss nan, time 116.54ms
iter 105340: loss nan, time 118.99ms
iter 105350: loss nan, time 114.75ms
iter 105360: loss nan, time 117.76ms
iter 105370: loss nan, time 115.88ms
iter 105380: loss nan, time 115.04ms
iter 105390: loss nan, time 117.88ms
tensor(0.0157)
iter 105400: loss nan, time 116.26ms
iter 105410: loss nan, time 114.57ms
iter 105420: loss nan, time 118.28ms
iter 105430: loss nan, time 114.72ms
iter 105440: loss nan, time 116.88ms
iter 105450: loss nan, time 117.63ms
iter 105460: loss nan, time 114.20ms
iter 105470: loss nan, time 117.91ms
iter 105480: loss nan, time 116.35ms
iter 105490: loss nan, time 115.16ms
tensor(0.0245)
step 105500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 105500: loss nan, time 2891.52ms
iter 105510: loss nan, time 115.97ms
iter 105520: loss nan, time 118.14ms
iter 105530: loss nan, time 114.56ms
iter 105540: loss nan, time 117.12ms
iter 105550: loss nan, time 115.70ms
iter 105560: loss nan, time 115.02ms
iter 105570: loss nan, time 117.01ms
iter 105580: loss nan, time 115.12ms
iter 105590: loss nan, time 116.65ms
tensor(0.0351)
iter 105600: loss nan, time 118.43ms
iter 105610: loss nan, time 114.76ms
iter 105620: loss nan, time 118.09ms
iter 105630: loss nan, time 115.21ms
iter 105640: loss nan, time 114.70ms
iter 105650: loss nan, time 117.62ms
iter 105660: loss nan, time 115.93ms
iter 105670: loss nan, time 114.87ms
iter 105680: loss nan, time 118.48ms
iter 105690: loss nan, time 114.67ms
tensor(0.0476)
iter 105700: loss nan, time 117.14ms
iter 105710: loss nan, time 117.88ms
iter 105720: loss nan, time 114.78ms
iter 105730: loss nan, time 117.04ms
iter 105740: loss nan, time 116.53ms
step 105750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 105750: loss nan, time 2894.33ms
iter 105760: loss nan, time 116.76ms
iter 105770: loss nan, time 114.75ms
iter 105780: loss nan, time 117.12ms
iter 105790: loss nan, time 116.53ms
tensor(0.0618)
iter 105800: loss nan, time 115.91ms
iter 105810: loss nan, time 116.37ms
iter 105820: loss nan, time 116.22ms
iter 105830: loss nan, time 116.83ms
iter 105840: loss nan, time 118.31ms
iter 105850: loss nan, time 114.72ms
iter 105860: loss nan, time 116.55ms
iter 105870: loss nan, time 115.79ms
iter 105880: loss nan, time 115.17ms
iter 105890: loss nan, time 116.23ms
tensor(0.0778)
iter 105900: loss nan, time 115.94ms
iter 105910: loss nan, time 117.19ms
iter 105920: loss nan, time 117.58ms
iter 105930: loss nan, time 114.64ms
iter 105940: loss nan, time 117.80ms
iter 105950: loss nan, time 114.67ms
iter 105960: loss nan, time 114.19ms
iter 105970: loss nan, time 118.62ms
iter 105980: loss nan, time 114.43ms
iter 105990: loss nan, time 116.54ms
tensor(0.0955)
step 106000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 106000: loss nan, time 2896.37ms
iter 106010: loss nan, time 114.61ms
iter 106020: loss nan, time 117.91ms
iter 106030: loss nan, time 114.98ms
iter 106040: loss nan, time 117.93ms
iter 106050: loss nan, time 115.80ms
iter 106060: loss nan, time 115.66ms
iter 106070: loss nan, time 115.50ms
iter 106080: loss nan, time 114.77ms
iter 106090: loss nan, time 117.23ms
tensor(0.1147)
iter 106100: loss nan, time 116.96ms
iter 106110: loss nan, time 114.98ms
iter 106120: loss nan, time 117.06ms
iter 106130: loss nan, time 115.52ms
iter 106140: loss nan, time 116.84ms
iter 106150: loss nan, time 116.21ms
iter 106160: loss nan, time 114.28ms
iter 106170: loss nan, time 117.85ms
iter 106180: loss nan, time 115.65ms
iter 106190: loss nan, time 114.76ms
tensor(0.1355)
iter 106200: loss nan, time 118.55ms
iter 106210: loss nan, time 115.05ms
iter 106220: loss nan, time 116.10ms
iter 106230: loss nan, time 117.60ms
iter 106240: loss nan, time 114.64ms
step 106250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 106250: loss nan, time 2897.64ms
iter 106260: loss nan, time 114.74ms
iter 106270: loss nan, time 115.05ms
iter 106280: loss nan, time 116.49ms
iter 106290: loss nan, time 116.44ms
tensor(0.1577)
iter 106300: loss nan, time 118.71ms
iter 106310: loss nan, time 116.28ms
iter 106320: loss nan, time 116.92ms
iter 106330: loss nan, time 116.12ms
iter 106340: loss nan, time 114.60ms
iter 106350: loss nan, time 116.82ms
iter 106360: loss nan, time 116.47ms
iter 106370: loss nan, time 115.03ms
iter 106380: loss nan, time 117.93ms
iter 106390: loss nan, time 115.32ms
tensor(0.1813)
iter 106400: loss nan, time 117.55ms
iter 106410: loss nan, time 116.91ms
iter 106420: loss nan, time 114.84ms
iter 106430: loss nan, time 117.16ms
iter 106440: loss nan, time 116.66ms
iter 106450: loss nan, time 114.66ms
iter 106460: loss nan, time 117.88ms
iter 106470: loss nan, time 113.77ms
iter 106480: loss nan, time 115.00ms
iter 106490: loss nan, time 118.23ms
tensor(0.2061)
step 106500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 106500: loss nan, time 2890.72ms
iter 106510: loss nan, time 118.02ms
iter 106520: loss nan, time 115.26ms
iter 106530: loss nan, time 114.64ms
iter 106540: loss nan, time 117.95ms
iter 106550: loss nan, time 114.71ms
iter 106560: loss nan, time 117.96ms
iter 106570: loss nan, time 116.50ms
iter 106580: loss nan, time 115.32ms
iter 106590: loss nan, time 116.12ms
tensor(0.2321)
iter 106600: loss nan, time 116.19ms
iter 106610: loss nan, time 115.84ms
iter 106620: loss nan, time 117.94ms
iter 106630: loss nan, time 115.13ms
iter 106640: loss nan, time 117.22ms
iter 106650: loss nan, time 116.18ms
iter 106660: loss nan, time 114.62ms
iter 106670: loss nan, time 116.85ms
iter 106680: loss nan, time 115.59ms
iter 106690: loss nan, time 115.05ms
tensor(0.2591)
iter 106700: loss nan, time 118.43ms
iter 106710: loss nan, time 115.77ms
iter 106720: loss nan, time 117.33ms
iter 106730: loss nan, time 116.97ms
iter 106740: loss nan, time 114.56ms
step 106750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 106750: loss nan, time 2906.97ms
iter 106760: loss nan, time 115.77ms
iter 106770: loss nan, time 117.04ms
iter 106780: loss nan, time 118.23ms
iter 106790: loss nan, time 116.03ms
tensor(0.2871)
iter 106800: loss nan, time 117.61ms
iter 106810: loss nan, time 117.45ms
iter 106820: loss nan, time 116.10ms
iter 106830: loss nan, time 116.87ms
iter 106840: loss nan, time 116.57ms
iter 106850: loss nan, time 115.37ms
iter 106860: loss nan, time 116.65ms
iter 106870: loss nan, time 115.81ms
iter 106880: loss nan, time 115.53ms
iter 106890: loss nan, time 118.38ms
tensor(0.3159)
iter 106900: loss nan, time 116.14ms
iter 106910: loss nan, time 114.65ms
iter 106920: loss nan, time 117.06ms
iter 106930: loss nan, time 115.47ms
iter 106940: loss nan, time 116.72ms
iter 106950: loss nan, time 115.95ms
iter 106960: loss nan, time 114.79ms
iter 106970: loss nan, time 116.20ms
iter 106980: loss nan, time 115.76ms
iter 106990: loss nan, time 116.74ms
tensor(0.3455)
step 107000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 107000: loss nan, time 2900.04ms
iter 107010: loss nan, time 114.81ms
iter 107020: loss nan, time 114.02ms
iter 107030: loss nan, time 115.82ms
iter 107040: loss nan, time 114.06ms
iter 107050: loss nan, time 113.08ms
iter 107060: loss nan, time 113.40ms
iter 107070: loss nan, time 113.92ms
iter 107080: loss nan, time 114.92ms
iter 107090: loss nan, time 117.10ms
tensor(0.3757)
iter 107100: loss nan, time 116.30ms
iter 107110: loss nan, time 116.97ms
iter 107120: loss nan, time 118.00ms
iter 107130: loss nan, time 114.55ms
iter 107140: loss nan, time 116.85ms
iter 107150: loss nan, time 117.82ms
iter 107160: loss nan, time 115.89ms
iter 107170: loss nan, time 116.80ms
iter 107180: loss nan, time 116.20ms
iter 107190: loss nan, time 115.06ms
tensor(0.4063)
iter 107200: loss nan, time 117.08ms
iter 107210: loss nan, time 116.02ms
iter 107220: loss nan, time 116.83ms
iter 107230: loss nan, time 117.99ms
iter 107240: loss nan, time 115.87ms
step 107250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 107250: loss nan, time 2891.42ms
iter 107260: loss nan, time 112.87ms
iter 107270: loss nan, time 113.85ms
iter 107280: loss nan, time 114.51ms
iter 107290: loss nan, time 117.75ms
tensor(0.4373)
iter 107300: loss nan, time 115.76ms
iter 107310: loss nan, time 116.06ms
iter 107320: loss nan, time 117.09ms
iter 107330: loss nan, time 114.26ms
iter 107340: loss nan, time 115.86ms
iter 107350: loss nan, time 116.42ms
iter 107360: loss nan, time 115.20ms
iter 107370: loss nan, time 118.02ms
iter 107380: loss nan, time 116.03ms
iter 107390: loss nan, time 116.08ms
tensor(0.4686)
iter 107400: loss nan, time 117.11ms
iter 107410: loss nan, time 116.54ms
iter 107420: loss nan, time 116.72ms
iter 107430: loss nan, time 116.31ms
iter 107440: loss nan, time 114.66ms
iter 107450: loss nan, time 117.97ms
iter 107460: loss nan, time 115.75ms
iter 107470: loss nan, time 116.86ms
iter 107480: loss nan, time 117.63ms
iter 107490: loss nan, time 115.68ms
tensor(0.5000)
step 107500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 107500: loss nan, time 2898.95ms
iter 107510: loss nan, time 115.30ms
iter 107520: loss nan, time 116.79ms
iter 107530: loss nan, time 114.93ms
iter 107540: loss nan, time 114.09ms
iter 107550: loss nan, time 118.16ms
iter 107560: loss nan, time 115.56ms
iter 107570: loss nan, time 116.25ms
iter 107580: loss nan, time 118.16ms
iter 107590: loss nan, time 113.71ms
tensor(0.5314)
iter 107600: loss nan, time 116.94ms
iter 107610: loss nan, time 116.57ms
iter 107620: loss nan, time 115.52ms
iter 107630: loss nan, time 117.12ms
iter 107640: loss nan, time 115.88ms
iter 107650: loss nan, time 116.29ms
iter 107660: loss nan, time 118.38ms
iter 107670: loss nan, time 115.80ms
iter 107680: loss nan, time 116.80ms
iter 107690: loss nan, time 117.23ms
tensor(0.5627)
iter 107700: loss nan, time 116.45ms
iter 107710: loss nan, time 116.71ms
iter 107720: loss nan, time 116.17ms
iter 107730: loss nan, time 115.18ms
iter 107740: loss nan, time 116.43ms
step 107750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 107750: loss nan, time 2902.67ms
iter 107760: loss nan, time 116.79ms
iter 107770: loss nan, time 116.53ms
iter 107780: loss nan, time 116.34ms
iter 107790: loss nan, time 117.09ms
tensor(0.5937)
iter 107800: loss nan, time 118.07ms
iter 107810: loss nan, time 116.29ms
iter 107820: loss nan, time 116.53ms
iter 107830: loss nan, time 115.94ms
iter 107840: loss nan, time 115.12ms
iter 107850: loss nan, time 116.91ms
iter 107860: loss nan, time 115.64ms
iter 107870: loss nan, time 116.87ms
iter 107880: loss nan, time 117.12ms
iter 107890: loss nan, time 115.62ms
tensor(0.6243)
iter 107900: loss nan, time 117.09ms
iter 107910: loss nan, time 114.92ms
iter 107920: loss nan, time 115.42ms
iter 107930: loss nan, time 116.99ms
iter 107940: loss nan, time 115.87ms
iter 107950: loss nan, time 116.28ms
iter 107960: loss nan, time 117.87ms
iter 107970: loss nan, time 114.66ms
iter 107980: loss nan, time 116.64ms
iter 107990: loss nan, time 116.34ms
tensor(0.6545)
step 108000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 108000: loss nan, time 2900.34ms
iter 108010: loss nan, time 117.93ms
iter 108020: loss nan, time 116.01ms
iter 108030: loss nan, time 114.51ms
iter 108040: loss nan, time 117.06ms
iter 108050: loss nan, time 115.62ms
iter 108060: loss nan, time 117.07ms
iter 108070: loss nan, time 115.98ms
iter 108080: loss nan, time 114.60ms
iter 108090: loss nan, time 115.87ms
tensor(0.6841)
iter 108100: loss nan, time 116.01ms
iter 108110: loss nan, time 116.60ms
iter 108120: loss nan, time 116.41ms
iter 108130: loss nan, time 115.32ms
iter 108140: loss nan, time 116.66ms
iter 108150: loss nan, time 114.53ms
iter 108160: loss nan, time 116.95ms
iter 108170: loss nan, time 117.39ms
iter 108180: loss nan, time 115.63ms
iter 108190: loss nan, time 116.74ms
tensor(0.7129)
iter 108200: loss nan, time 117.14ms
iter 108210: loss nan, time 115.31ms
iter 108220: loss nan, time 116.64ms
iter 108230: loss nan, time 114.65ms
iter 108240: loss nan, time 114.67ms
step 108250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 108250: loss nan, time 2897.58ms
iter 108260: loss nan, time 114.87ms
iter 108270: loss nan, time 113.94ms
iter 108280: loss nan, time 116.08ms
iter 108290: loss nan, time 114.09ms
tensor(0.7409)
iter 108300: loss nan, time 115.43ms
iter 108310: loss nan, time 112.77ms
iter 108320: loss nan, time 115.96ms
iter 108330: loss nan, time 115.33ms
iter 108340: loss nan, time 114.92ms
iter 108350: loss nan, time 116.54ms
iter 108360: loss nan, time 114.43ms
iter 108370: loss nan, time 116.61ms
iter 108380: loss nan, time 115.42ms
iter 108390: loss nan, time 116.81ms
tensor(0.7679)
iter 108400: loss nan, time 116.58ms
iter 108410: loss nan, time 117.00ms
iter 108420: loss nan, time 115.98ms
iter 108430: loss nan, time 115.94ms
iter 108440: loss nan, time 115.99ms
iter 108450: loss nan, time 115.99ms
iter 108460: loss nan, time 115.35ms
iter 108470: loss nan, time 117.15ms
iter 108480: loss nan, time 115.99ms
iter 108490: loss nan, time 117.04ms
tensor(0.7939)
step 108500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 108500: loss nan, time 2897.62ms
iter 108510: loss nan, time 115.05ms
iter 108520: loss nan, time 116.75ms
iter 108530: loss nan, time 115.78ms
iter 108540: loss nan, time 115.86ms
iter 108550: loss nan, time 118.26ms
iter 108560: loss nan, time 116.37ms
iter 108570: loss nan, time 116.76ms
iter 108580: loss nan, time 116.25ms
iter 108590: loss nan, time 115.20ms
tensor(0.8187)
iter 108600: loss nan, time 117.10ms
iter 108610: loss nan, time 115.86ms
iter 108620: loss nan, time 114.66ms
iter 108630: loss nan, time 118.10ms
iter 108640: loss nan, time 115.79ms
iter 108650: loss nan, time 116.71ms
iter 108660: loss nan, time 116.77ms
iter 108670: loss nan, time 114.38ms
iter 108680: loss nan, time 114.60ms
iter 108690: loss nan, time 115.73ms
tensor(0.8423)
iter 108700: loss nan, time 117.36ms
iter 108710: loss nan, time 118.28ms
iter 108720: loss nan, time 115.89ms
iter 108730: loss nan, time 116.74ms
iter 108740: loss nan, time 115.11ms
step 108750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 108750: loss nan, time 2900.21ms
iter 108760: loss nan, time 117.55ms
iter 108770: loss nan, time 114.69ms
iter 108780: loss nan, time 116.78ms
iter 108790: loss nan, time 116.48ms
tensor(0.8645)
iter 108800: loss nan, time 115.20ms
iter 108810: loss nan, time 116.85ms
iter 108820: loss nan, time 116.96ms
iter 108830: loss nan, time 117.26ms
iter 108840: loss nan, time 117.68ms
iter 108850: loss nan, time 115.82ms
iter 108860: loss nan, time 115.97ms
iter 108870: loss nan, time 116.09ms
iter 108880: loss nan, time 114.94ms
iter 108890: loss nan, time 116.73ms
tensor(0.8853)
iter 108900: loss nan, time 116.17ms
iter 108910: loss nan, time 116.84ms
iter 108920: loss nan, time 118.28ms
iter 108930: loss nan, time 115.86ms
iter 108940: loss nan, time 114.70ms
iter 108950: loss nan, time 116.38ms
iter 108960: loss nan, time 114.56ms
iter 108970: loss nan, time 116.61ms
iter 108980: loss nan, time 116.00ms
iter 108990: loss nan, time 114.81ms
tensor(0.9045)
step 109000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 109000: loss nan, time 2897.11ms
iter 109010: loss nan, time 114.23ms
iter 109020: loss nan, time 117.88ms
iter 109030: loss nan, time 114.69ms
iter 109040: loss nan, time 116.73ms
iter 109050: loss nan, time 118.17ms
iter 109060: loss nan, time 115.13ms
iter 109070: loss nan, time 116.76ms
iter 109080: loss nan, time 116.15ms
iter 109090: loss nan, time 114.78ms
tensor(0.9222)
iter 109100: loss nan, time 117.18ms
iter 109110: loss nan, time 115.97ms
iter 109120: loss nan, time 116.87ms
iter 109130: loss nan, time 118.28ms
iter 109140: loss nan, time 116.32ms
iter 109150: loss nan, time 116.88ms
iter 109160: loss nan, time 116.04ms
iter 109170: loss nan, time 115.47ms
iter 109180: loss nan, time 116.76ms
iter 109190: loss nan, time 116.28ms
tensor(0.9382)
iter 109200: loss nan, time 114.95ms
iter 109210: loss nan, time 117.44ms
iter 109220: loss nan, time 112.89ms
iter 109230: loss nan, time 114.67ms
iter 109240: loss nan, time 114.78ms
step 109250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 109250: loss nan, time 2897.34ms
iter 109260: loss nan, time 115.21ms
iter 109270: loss nan, time 113.62ms
iter 109280: loss nan, time 113.97ms
iter 109290: loss nan, time 114.32ms
tensor(0.9524)
iter 109300: loss nan, time 115.47ms
iter 109310: loss nan, time 112.78ms
iter 109320: loss nan, time 114.74ms
iter 109330: loss nan, time 115.97ms
iter 109340: loss nan, time 114.37ms
iter 109350: loss nan, time 113.74ms
iter 109360: loss nan, time 115.20ms
iter 109370: loss nan, time 115.24ms
iter 109380: loss nan, time 117.14ms
iter 109390: loss nan, time 115.85ms
tensor(0.9649)
iter 109400: loss nan, time 117.09ms
iter 109410: loss nan, time 116.40ms
iter 109420: loss nan, time 115.74ms
iter 109430: loss nan, time 117.74ms
iter 109440: loss nan, time 117.59ms
iter 109450: loss nan, time 115.79ms
iter 109460: loss nan, time 116.83ms
iter 109470: loss nan, time 115.77ms
iter 109480: loss nan, time 115.53ms
iter 109490: loss nan, time 117.20ms
tensor(0.9755)
step 109500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 109500: loss nan, time 2896.89ms
iter 109510: loss nan, time 115.86ms
iter 109520: loss nan, time 116.02ms
iter 109530: loss nan, time 114.67ms
iter 109540: loss nan, time 117.40ms
iter 109550: loss nan, time 115.76ms
iter 109560: loss nan, time 116.97ms
iter 109570: loss nan, time 118.02ms
iter 109580: loss nan, time 116.19ms
iter 109590: loss nan, time 116.60ms
tensor(0.9843)
iter 109600: loss nan, time 117.14ms
iter 109610: loss nan, time 116.21ms
iter 109620: loss nan, time 116.72ms
iter 109630: loss nan, time 115.89ms
iter 109640: loss nan, time 116.66ms
iter 109650: loss nan, time 118.05ms
iter 109660: loss nan, time 115.30ms
iter 109670: loss nan, time 114.14ms
iter 109680: loss nan, time 116.03ms
iter 109690: loss nan, time 114.78ms
tensor(0.9911)
iter 109700: loss nan, time 118.10ms
iter 109710: loss nan, time 115.77ms
iter 109720: loss nan, time 115.50ms
iter 109730: loss nan, time 115.75ms
iter 109740: loss nan, time 115.78ms
step 109750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 109750: loss nan, time 2894.28ms
iter 109760: loss nan, time 114.37ms
iter 109770: loss nan, time 116.74ms
iter 109780: loss nan, time 115.76ms
iter 109790: loss nan, time 116.96ms
tensor(0.9961)
iter 109800: loss nan, time 118.46ms
iter 109810: loss nan, time 116.17ms
iter 109820: loss nan, time 116.74ms
iter 109830: loss nan, time 116.50ms
iter 109840: loss nan, time 115.39ms
iter 109850: loss nan, time 116.82ms
iter 109860: loss nan, time 116.00ms
iter 109870: loss nan, time 114.51ms
iter 109880: loss nan, time 117.64ms
iter 109890: loss nan, time 115.49ms
tensor(0.9990)
iter 109900: loss nan, time 117.51ms
iter 109910: loss nan, time 116.26ms
iter 109920: loss nan, time 113.76ms
iter 109930: loss nan, time 115.72ms
iter 109940: loss nan, time 116.44ms
iter 109950: loss nan, time 115.61ms
iter 109960: loss nan, time 117.53ms
iter 109970: loss nan, time 116.15ms
iter 109980: loss nan, time 115.69ms
iter 109990: loss nan, time 116.34ms
tensor(1.)
step 110000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 110000: loss nan, time 2897.46ms
iter 110010: loss nan, time 118.12ms
iter 110020: loss nan, time 115.08ms
iter 110030: loss nan, time 116.91ms
iter 110040: loss nan, time 115.85ms
iter 110050: loss nan, time 115.25ms
iter 110060: loss nan, time 116.88ms
iter 110070: loss nan, time 115.76ms
iter 110080: loss nan, time 116.29ms
iter 110090: loss nan, time 117.87ms
tensor(0.9990)
iter 110100: loss nan, time 115.52ms
iter 110110: loss nan, time 116.70ms
iter 110120: loss nan, time 117.20ms
iter 110130: loss nan, time 115.64ms
iter 110140: loss nan, time 117.76ms
iter 110150: loss nan, time 118.09ms
iter 110160: loss nan, time 116.57ms
iter 110170: loss nan, time 116.24ms
iter 110180: loss nan, time 117.34ms
iter 110190: loss nan, time 114.71ms
tensor(0.9961)
iter 110200: loss nan, time 118.39ms
iter 110210: loss nan, time 116.27ms
iter 110220: loss nan, time 115.10ms
iter 110230: loss nan, time 118.67ms
iter 110240: loss nan, time 115.88ms
step 110250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 110250: loss nan, time 2913.14ms
iter 110260: loss nan, time 116.02ms
iter 110270: loss nan, time 114.53ms
iter 110280: loss nan, time 116.81ms
iter 110290: loss nan, time 116.12ms
tensor(0.9911)
iter 110300: loss nan, time 116.17ms
iter 110310: loss nan, time 117.97ms
iter 110320: loss nan, time 115.74ms
iter 110330: loss nan, time 116.75ms
iter 110340: loss nan, time 116.95ms
iter 110350: loss nan, time 114.63ms
iter 110360: loss nan, time 117.24ms
iter 110370: loss nan, time 116.61ms
iter 110380: loss nan, time 114.68ms
iter 110390: loss nan, time 117.87ms
tensor(0.9843)
iter 110400: loss nan, time 115.55ms
iter 110410: loss nan, time 114.96ms
iter 110420: loss nan, time 117.83ms
iter 110430: loss nan, time 115.28ms
iter 110440: loss nan, time 116.81ms
iter 110450: loss nan, time 118.01ms
iter 110460: loss nan, time 114.61ms
iter 110470: loss nan, time 116.74ms
iter 110480: loss nan, time 116.75ms
iter 110490: loss nan, time 114.69ms
tensor(0.9755)
step 110500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 110500: loss nan, time 2900.49ms
iter 110510: loss nan, time 115.04ms
iter 110520: loss nan, time 114.41ms
iter 110530: loss nan, time 116.81ms
iter 110540: loss nan, time 114.67ms
iter 110550: loss nan, time 118.08ms
iter 110560: loss nan, time 116.46ms
iter 110570: loss nan, time 114.69ms
iter 110580: loss nan, time 115.74ms
iter 110590: loss nan, time 116.27ms
tensor(0.9649)
iter 110600: loss nan, time 117.39ms
iter 110610: loss nan, time 118.06ms
iter 110620: loss nan, time 114.16ms
iter 110630: loss nan, time 117.19ms
iter 110640: loss nan, time 115.77ms
iter 110650: loss nan, time 114.85ms
iter 110660: loss nan, time 116.73ms
iter 110670: loss nan, time 116.05ms
iter 110680: loss nan, time 114.62ms
iter 110690: loss nan, time 118.16ms
tensor(0.9524)
iter 110700: loss nan, time 116.09ms
iter 110710: loss nan, time 116.68ms
iter 110720: loss nan, time 116.91ms
iter 110730: loss nan, time 114.58ms
iter 110740: loss nan, time 116.99ms
step 110750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 110750: loss nan, time 2903.29ms
iter 110760: loss nan, time 116.92ms
iter 110770: loss nan, time 118.28ms
iter 110780: loss nan, time 114.66ms
iter 110790: loss nan, time 118.09ms
tensor(0.9382)
iter 110800: loss nan, time 116.97ms
iter 110810: loss nan, time 115.14ms
iter 110820: loss nan, time 117.94ms
iter 110830: loss nan, time 115.49ms
iter 110840: loss nan, time 113.67ms
iter 110850: loss nan, time 117.86ms
iter 110860: loss nan, time 114.58ms
iter 110870: loss nan, time 117.27ms
iter 110880: loss nan, time 116.61ms
iter 110890: loss nan, time 114.52ms
tensor(0.9222)
iter 110900: loss nan, time 115.50ms
iter 110910: loss nan, time 116.04ms
iter 110920: loss nan, time 116.91ms
iter 110930: loss nan, time 117.58ms
iter 110940: loss nan, time 115.75ms
iter 110950: loss nan, time 116.71ms
iter 110960: loss nan, time 115.88ms
iter 110970: loss nan, time 114.73ms
iter 110980: loss nan, time 116.89ms
iter 110990: loss nan, time 116.48ms
tensor(0.9045)
step 111000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 111000: loss nan, time 2906.63ms
iter 111010: loss nan, time 118.41ms
iter 111020: loss nan, time 114.98ms
iter 111030: loss nan, time 116.84ms
iter 111040: loss nan, time 116.78ms
iter 111050: loss nan, time 114.13ms
iter 111060: loss nan, time 118.04ms
iter 111070: loss nan, time 115.58ms
iter 111080: loss nan, time 114.79ms
iter 111090: loss nan, time 118.22ms
tensor(0.8853)
iter 111100: loss nan, time 116.73ms
iter 111110: loss nan, time 114.93ms
iter 111120: loss nan, time 118.19ms
iter 111130: loss nan, time 115.79ms
iter 111140: loss nan, time 117.74ms
iter 111150: loss nan, time 118.17ms
iter 111160: loss nan, time 115.33ms
iter 111170: loss nan, time 117.02ms
iter 111180: loss nan, time 117.47ms
iter 111190: loss nan, time 114.70ms
tensor(0.8645)
iter 111200: loss nan, time 118.66ms
iter 111210: loss nan, time 116.47ms
iter 111220: loss nan, time 114.75ms
iter 111230: loss nan, time 117.94ms
iter 111240: loss nan, time 115.73ms
step 111250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 111250: loss nan, time 2902.92ms
iter 111260: loss nan, time 115.17ms
iter 111270: loss nan, time 114.70ms
iter 111280: loss nan, time 118.03ms
iter 111290: loss nan, time 114.90ms
tensor(0.8423)
iter 111300: loss nan, time 117.55ms
iter 111310: loss nan, time 115.55ms
iter 111320: loss nan, time 114.70ms
iter 111330: loss nan, time 117.64ms
iter 111340: loss nan, time 116.12ms
iter 111350: loss nan, time 114.85ms
iter 111360: loss nan, time 118.06ms
iter 111370: loss nan, time 114.11ms
iter 111380: loss nan, time 116.72ms
iter 111390: loss nan, time 118.08ms
tensor(0.8187)
iter 111400: loss nan, time 115.71ms
iter 111410: loss nan, time 117.48ms
iter 111420: loss nan, time 116.64ms
iter 111430: loss nan, time 115.21ms
iter 111440: loss nan, time 117.90ms
iter 111450: loss nan, time 115.87ms
iter 111460: loss nan, time 116.83ms
iter 111470: loss nan, time 118.43ms
iter 111480: loss nan, time 114.71ms
iter 111490: loss nan, time 117.59ms
tensor(0.7939)
step 111500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 111500: loss nan, time 2907.05ms
iter 111510: loss nan, time 114.83ms
iter 111520: loss nan, time 117.92ms
iter 111530: loss nan, time 115.12ms
iter 111540: loss nan, time 116.84ms
iter 111550: loss nan, time 115.97ms
iter 111560: loss nan, time 114.62ms
iter 111570: loss nan, time 117.06ms
iter 111580: loss nan, time 116.56ms
iter 111590: loss nan, time 113.79ms
tensor(0.7679)
iter 111600: loss nan, time 118.89ms
iter 111610: loss nan, time 115.90ms
iter 111620: loss nan, time 115.19ms
iter 111630: loss nan, time 116.79ms
iter 111640: loss nan, time 115.22ms
iter 111650: loss nan, time 116.81ms
iter 111660: loss nan, time 117.50ms
iter 111670: loss nan, time 114.54ms
iter 111680: loss nan, time 117.95ms
iter 111690: loss nan, time 114.93ms
tensor(0.7409)
iter 111700: loss nan, time 115.53ms
iter 111710: loss nan, time 117.87ms
iter 111720: loss nan, time 115.29ms
iter 111730: loss nan, time 116.68ms
iter 111740: loss nan, time 117.31ms
step 111750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 111750: loss nan, time 2896.75ms
iter 111760: loss nan, time 117.84ms
iter 111770: loss nan, time 114.61ms
iter 111780: loss nan, time 117.94ms
iter 111790: loss nan, time 116.46ms
tensor(0.7129)
iter 111800: loss nan, time 115.69ms
iter 111810: loss nan, time 116.19ms
iter 111820: loss nan, time 116.13ms
iter 111830: loss nan, time 115.53ms
iter 111840: loss nan, time 117.33ms
iter 111850: loss nan, time 114.92ms
iter 111860: loss nan, time 116.94ms
iter 111870: loss nan, time 116.35ms
iter 111880: loss nan, time 114.51ms
iter 111890: loss nan, time 117.12ms
tensor(0.6841)
iter 111900: loss nan, time 116.47ms
iter 111910: loss nan, time 115.78ms
iter 111920: loss nan, time 117.98ms
iter 111930: loss nan, time 115.43ms
iter 111940: loss nan, time 117.27ms
iter 111950: loss nan, time 116.83ms
iter 111960: loss nan, time 114.91ms
iter 111970: loss nan, time 116.85ms
iter 111980: loss nan, time 117.53ms
iter 111990: loss nan, time 114.68ms
tensor(0.6545)
step 112000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 112000: loss nan, time 2905.92ms
iter 112010: loss nan, time 115.56ms
iter 112020: loss nan, time 116.62ms
iter 112030: loss nan, time 118.10ms
iter 112040: loss nan, time 115.46ms
iter 112050: loss nan, time 116.91ms
iter 112060: loss nan, time 117.97ms
iter 112070: loss nan, time 114.84ms
iter 112080: loss nan, time 117.61ms
iter 112090: loss nan, time 116.60ms
tensor(0.6243)
iter 112100: loss nan, time 115.54ms
iter 112110: loss nan, time 118.27ms
iter 112120: loss nan, time 115.78ms
iter 112130: loss nan, time 114.76ms
iter 112140: loss nan, time 118.11ms
iter 112150: loss nan, time 115.18ms
iter 112160: loss nan, time 116.79ms
iter 112170: loss nan, time 118.19ms
iter 112180: loss nan, time 114.94ms
iter 112190: loss nan, time 114.81ms
tensor(0.5937)
iter 112200: loss nan, time 117.17ms
iter 112210: loss nan, time 115.05ms
iter 112220: loss nan, time 117.95ms
iter 112230: loss nan, time 115.54ms
iter 112240: loss nan, time 114.79ms
step 112250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 112250: loss nan, time 2898.15ms
iter 112260: loss nan, time 114.61ms
iter 112270: loss nan, time 118.04ms
iter 112280: loss nan, time 114.74ms
iter 112290: loss nan, time 116.79ms
tensor(0.5627)
iter 112300: loss nan, time 119.49ms
iter 112310: loss nan, time 115.58ms
iter 112320: loss nan, time 116.79ms
iter 112330: loss nan, time 118.14ms
iter 112340: loss nan, time 114.42ms
iter 112350: loss nan, time 116.78ms
iter 112360: loss nan, time 117.03ms
iter 112370: loss nan, time 114.66ms
iter 112380: loss nan, time 117.94ms
iter 112390: loss nan, time 115.84ms
tensor(0.5314)
iter 112400: loss nan, time 116.39ms
iter 112410: loss nan, time 117.88ms
iter 112420: loss nan, time 115.53ms
iter 112430: loss nan, time 117.02ms
iter 112440: loss nan, time 117.24ms
iter 112450: loss nan, time 114.55ms
iter 112460: loss nan, time 117.87ms
iter 112470: loss nan, time 116.31ms
iter 112480: loss nan, time 115.09ms
iter 112490: loss nan, time 118.02ms
tensor(0.5000)
step 112500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 112500: loss nan, time 2853.09ms
iter 112510: loss nan, time 115.82ms
iter 112520: loss nan, time 116.77ms
iter 112530: loss nan, time 117.84ms
iter 112540: loss nan, time 114.67ms
iter 112550: loss nan, time 116.93ms
iter 112560: loss nan, time 116.91ms
iter 112570: loss nan, time 114.62ms
iter 112580: loss nan, time 117.83ms
iter 112590: loss nan, time 115.75ms
tensor(0.4686)
iter 112600: loss nan, time 117.03ms
iter 112610: loss nan, time 118.05ms
iter 112620: loss nan, time 114.98ms
iter 112630: loss nan, time 116.28ms
iter 112640: loss nan, time 117.21ms
iter 112650: loss nan, time 114.61ms
iter 112660: loss nan, time 117.64ms
iter 112670: loss nan, time 115.98ms
iter 112680: loss nan, time 114.63ms
iter 112690: loss nan, time 117.91ms
tensor(0.4373)
iter 112700: loss nan, time 115.89ms
iter 112710: loss nan, time 114.70ms
iter 112720: loss nan, time 117.92ms
iter 112730: loss nan, time 114.85ms
iter 112740: loss nan, time 117.08ms
step 112750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 112750: loss nan, time 2898.56ms
iter 112760: loss nan, time 116.62ms
iter 112770: loss nan, time 117.56ms
iter 112780: loss nan, time 114.68ms
iter 112790: loss nan, time 118.08ms
tensor(0.4063)
iter 112800: loss nan, time 115.45ms
iter 112810: loss nan, time 115.05ms
iter 112820: loss nan, time 118.01ms
iter 112830: loss nan, time 115.50ms
iter 112840: loss nan, time 116.76ms
iter 112850: loss nan, time 117.91ms
iter 112860: loss nan, time 114.68ms
iter 112870: loss nan, time 116.77ms
iter 112880: loss nan, time 116.37ms
iter 112890: loss nan, time 114.77ms
tensor(0.3757)
iter 112900: loss nan, time 118.52ms
iter 112910: loss nan, time 116.11ms
iter 112920: loss nan, time 114.64ms
iter 112930: loss nan, time 118.22ms
iter 112940: loss nan, time 115.30ms
iter 112950: loss nan, time 116.28ms
iter 112960: loss nan, time 117.55ms
iter 112970: loss nan, time 114.57ms
iter 112980: loss nan, time 117.83ms
iter 112990: loss nan, time 116.32ms
tensor(0.3455)
step 113000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 113000: loss nan, time 2907.82ms
iter 113010: loss nan, time 117.98ms
iter 113020: loss nan, time 115.37ms
iter 113030: loss nan, time 116.84ms
iter 113040: loss nan, time 116.20ms
iter 113050: loss nan, time 114.64ms
iter 113060: loss nan, time 116.91ms
iter 113070: loss nan, time 116.51ms
iter 113080: loss nan, time 114.72ms
iter 113090: loss nan, time 118.05ms
tensor(0.3159)
iter 113100: loss nan, time 116.41ms
iter 113110: loss nan, time 115.74ms
iter 113120: loss nan, time 116.90ms
iter 113130: loss nan, time 120.99ms
iter 113140: loss nan, time 121.64ms
iter 113150: loss nan, time 122.22ms
iter 113160: loss nan, time 122.24ms
iter 113170: loss nan, time 120.99ms
iter 113180: loss nan, time 122.27ms
iter 113190: loss nan, time 122.26ms
tensor(0.2871)
iter 113200: loss nan, time 122.78ms
iter 113210: loss nan, time 122.23ms
iter 113220: loss nan, time 121.31ms
iter 113230: loss nan, time 121.18ms
iter 113240: loss nan, time 120.90ms
step 113250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 113250: loss nan, time 2908.60ms
iter 113260: loss nan, time 122.44ms
iter 113270: loss nan, time 120.96ms
iter 113280: loss nan, time 120.87ms
iter 113290: loss nan, time 120.89ms
tensor(0.2591)
iter 113300: loss nan, time 121.25ms
iter 113310: loss nan, time 119.17ms
iter 113320: loss nan, time 118.99ms
iter 113330: loss nan, time 119.03ms
iter 113340: loss nan, time 119.23ms
iter 113350: loss nan, time 119.13ms
iter 113360: loss nan, time 119.16ms
iter 113370: loss nan, time 120.14ms
iter 113380: loss nan, time 120.00ms
iter 113390: loss nan, time 120.11ms
tensor(0.2321)
iter 113400: loss nan, time 120.22ms
iter 113410: loss nan, time 120.41ms
iter 113420: loss nan, time 120.04ms
iter 113430: loss nan, time 119.84ms
iter 113440: loss nan, time 120.40ms
iter 113450: loss nan, time 120.57ms
iter 113460: loss nan, time 119.92ms
iter 113470: loss nan, time 121.31ms
iter 113480: loss nan, time 121.63ms
iter 113490: loss nan, time 122.36ms
tensor(0.2061)
step 113500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 113500: loss nan, time 2920.81ms
iter 113510: loss nan, time 121.21ms
iter 113520: loss nan, time 118.73ms
iter 113530: loss nan, time 120.85ms
iter 113540: loss nan, time 120.78ms
iter 113550: loss nan, time 120.90ms
iter 113560: loss nan, time 120.97ms
iter 113570: loss nan, time 118.69ms
iter 113580: loss nan, time 120.60ms
iter 113590: loss nan, time 120.81ms
tensor(0.1813)
iter 113600: loss nan, time 121.15ms
iter 113610: loss nan, time 120.72ms
iter 113620: loss nan, time 118.63ms
iter 113630: loss nan, time 119.08ms
iter 113640: loss nan, time 118.61ms
iter 113650: loss nan, time 119.40ms
iter 113660: loss nan, time 119.77ms
iter 113670: loss nan, time 119.67ms
iter 113680: loss nan, time 117.82ms
iter 113690: loss nan, time 120.24ms
tensor(0.1577)
iter 113700: loss nan, time 119.49ms
iter 113710: loss nan, time 119.99ms
iter 113720: loss nan, time 120.04ms
iter 113730: loss nan, time 118.76ms
iter 113740: loss nan, time 120.46ms
step 113750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 113750: loss nan, time 2906.84ms
iter 113760: loss nan, time 120.13ms
iter 113770: loss nan, time 119.95ms
iter 113780: loss nan, time 119.99ms
iter 113790: loss nan, time 119.04ms
tensor(0.1355)
iter 113800: loss nan, time 120.46ms
iter 113810: loss nan, time 120.22ms
iter 113820: loss nan, time 120.04ms
iter 113830: loss nan, time 120.36ms
iter 113840: loss nan, time 119.78ms
iter 113850: loss nan, time 120.66ms
iter 113860: loss nan, time 122.16ms
iter 113870: loss nan, time 122.18ms
iter 113880: loss nan, time 122.10ms
iter 113890: loss nan, time 121.36ms
tensor(0.1147)
iter 113900: loss nan, time 120.12ms
iter 113910: loss nan, time 120.83ms
iter 113920: loss nan, time 121.49ms
iter 113930: loss nan, time 120.85ms
iter 113940: loss nan, time 120.92ms
iter 113950: loss nan, time 119.34ms
iter 113960: loss nan, time 118.55ms
iter 113970: loss nan, time 120.10ms
iter 113980: loss nan, time 120.65ms
iter 113990: loss nan, time 120.17ms
tensor(0.0955)
step 114000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 114000: loss nan, time 2917.04ms
iter 114010: loss nan, time 120.10ms
iter 114020: loss nan, time 119.71ms
iter 114030: loss nan, time 120.07ms
iter 114040: loss nan, time 119.20ms
iter 114050: loss nan, time 119.21ms
iter 114060: loss nan, time 121.29ms
iter 114070: loss nan, time 121.75ms
iter 114080: loss nan, time 120.28ms
iter 114090: loss nan, time 122.15ms
tensor(0.0778)
iter 114100: loss nan, time 122.80ms
iter 114110: loss nan, time 121.98ms
iter 114120: loss nan, time 120.82ms
iter 114130: loss nan, time 119.32ms
iter 114140: loss nan, time 120.79ms
iter 114150: loss nan, time 121.61ms
iter 114160: loss nan, time 122.25ms
iter 114170: loss nan, time 121.65ms
iter 114180: loss nan, time 120.82ms
iter 114190: loss nan, time 120.37ms
tensor(0.0618)
iter 114200: loss nan, time 122.00ms
iter 114210: loss nan, time 122.52ms
iter 114220: loss nan, time 121.22ms
iter 114230: loss nan, time 121.47ms
iter 114240: loss nan, time 120.86ms
step 114250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 114250: loss nan, time 2904.50ms
iter 114260: loss nan, time 120.12ms
iter 114270: loss nan, time 120.81ms
iter 114280: loss nan, time 120.54ms
iter 114290: loss nan, time 120.94ms
tensor(0.0476)
iter 114300: loss nan, time 121.05ms
iter 114310: loss nan, time 122.40ms
iter 114320: loss nan, time 123.41ms
iter 114330: loss nan, time 121.73ms
iter 114340: loss nan, time 121.15ms
iter 114350: loss nan, time 120.60ms
iter 114360: loss nan, time 121.27ms
iter 114370: loss nan, time 120.77ms
iter 114380: loss nan, time 121.18ms
iter 114390: loss nan, time 119.23ms
tensor(0.0351)
iter 114400: loss nan, time 120.05ms
iter 114410: loss nan, time 120.09ms
iter 114420: loss nan, time 120.15ms
iter 114430: loss nan, time 120.49ms
iter 114440: loss nan, time 120.16ms
iter 114450: loss nan, time 120.05ms
iter 114460: loss nan, time 119.02ms
iter 114470: loss nan, time 119.94ms
iter 114480: loss nan, time 119.94ms
iter 114490: loss nan, time 120.84ms
tensor(0.0245)
step 114500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 114500: loss nan, time 2904.03ms
iter 114510: loss nan, time 119.69ms
iter 114520: loss nan, time 120.10ms
iter 114530: loss nan, time 120.02ms
iter 114540: loss nan, time 120.13ms
iter 114550: loss nan, time 121.08ms
iter 114560: loss nan, time 120.71ms
iter 114570: loss nan, time 122.04ms
iter 114580: loss nan, time 122.54ms
iter 114590: loss nan, time 118.83ms
tensor(0.0157)
iter 114600: loss nan, time 121.51ms
iter 114610: loss nan, time 121.00ms
iter 114620: loss nan, time 120.97ms
iter 114630: loss nan, time 120.73ms
iter 114640: loss nan, time 118.59ms
iter 114650: loss nan, time 120.14ms
iter 114660: loss nan, time 120.97ms
iter 114670: loss nan, time 120.89ms
iter 114680: loss nan, time 120.84ms
iter 114690: loss nan, time 118.78ms
tensor(0.0089)
iter 114700: loss nan, time 121.65ms
iter 114710: loss nan, time 120.86ms
iter 114720: loss nan, time 121.21ms
iter 114730: loss nan, time 118.66ms
iter 114740: loss nan, time 118.54ms
step 114750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 114750: loss nan, time 2897.66ms
iter 114760: loss nan, time 120.10ms
iter 114770: loss nan, time 120.66ms
iter 114780: loss nan, time 123.31ms
iter 114790: loss nan, time 120.95ms
tensor(0.0039)
iter 114800: loss nan, time 119.82ms
iter 114810: loss nan, time 121.32ms
iter 114820: loss nan, time 120.83ms
iter 114830: loss nan, time 120.69ms
iter 114840: loss nan, time 120.75ms
iter 114850: loss nan, time 118.81ms
iter 114860: loss nan, time 120.43ms
iter 114870: loss nan, time 119.11ms
iter 114880: loss nan, time 118.64ms
iter 114890: loss nan, time 118.99ms
tensor(0.0010)
iter 114900: loss nan, time 118.05ms
iter 114910: loss nan, time 118.82ms
iter 114920: loss nan, time 119.16ms
iter 114930: loss nan, time 118.81ms
iter 114940: loss nan, time 119.26ms
iter 114950: loss nan, time 119.15ms
iter 114960: loss nan, time 118.07ms
iter 114970: loss nan, time 118.93ms
iter 114980: loss nan, time 119.96ms
iter 114990: loss nan, time 119.93ms
tensor(0.0010)
step 115000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 115000: loss nan, time 2920.52ms
iter 115010: loss nan, time 120.13ms
iter 115020: loss nan, time 118.98ms
iter 115030: loss nan, time 120.36ms
iter 115040: loss nan, time 120.42ms
iter 115050: loss nan, time 120.86ms
iter 115060: loss nan, time 121.17ms
iter 115070: loss nan, time 119.87ms
iter 115080: loss nan, time 122.36ms
iter 115090: loss nan, time 122.34ms
tensor(0.0010)
iter 115100: loss nan, time 121.27ms
iter 115110: loss nan, time 121.14ms
iter 115120: loss nan, time 120.86ms
iter 115130: loss nan, time 121.36ms
iter 115140: loss nan, time 121.16ms
iter 115150: loss nan, time 120.97ms
iter 115160: loss nan, time 121.20ms
iter 115170: loss nan, time 121.08ms
iter 115180: loss nan, time 120.87ms
iter 115190: loss nan, time 121.02ms
tensor(0.0039)
iter 115200: loss nan, time 119.11ms
iter 115210: loss nan, time 119.29ms
iter 115220: loss nan, time 119.15ms
iter 115230: loss nan, time 118.60ms
iter 115240: loss nan, time 118.76ms
step 115250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 115250: loss nan, time 2907.98ms
iter 115260: loss nan, time 119.01ms
iter 115270: loss nan, time 119.45ms
iter 115280: loss nan, time 119.17ms
iter 115290: loss nan, time 119.03ms
tensor(0.0089)
iter 115300: loss nan, time 119.18ms
iter 115310: loss nan, time 119.43ms
iter 115320: loss nan, time 119.31ms
iter 115330: loss nan, time 119.48ms
iter 115340: loss nan, time 119.31ms
iter 115350: loss nan, time 119.08ms
iter 115360: loss nan, time 119.93ms
iter 115370: loss nan, time 120.07ms
iter 115380: loss nan, time 119.62ms
iter 115390: loss nan, time 118.90ms
tensor(0.0157)
iter 115400: loss nan, time 119.31ms
iter 115410: loss nan, time 120.33ms
iter 115420: loss nan, time 120.02ms
iter 115430: loss nan, time 119.33ms
iter 115440: loss nan, time 119.13ms
iter 115450: loss nan, time 119.34ms
iter 115460: loss nan, time 119.87ms
iter 115470: loss nan, time 120.35ms
iter 115480: loss nan, time 119.57ms
iter 115490: loss nan, time 121.83ms
tensor(0.0245)
step 115500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 115500: loss nan, time 2907.24ms
iter 115510: loss nan, time 119.81ms
iter 115520: loss nan, time 120.17ms
iter 115530: loss nan, time 121.12ms
iter 115540: loss nan, time 119.96ms
iter 115550: loss nan, time 120.14ms
iter 115560: loss nan, time 120.46ms
iter 115570: loss nan, time 119.96ms
iter 115580: loss nan, time 122.44ms
iter 115590: loss nan, time 121.99ms
tensor(0.0351)
iter 115600: loss nan, time 122.52ms
iter 115610: loss nan, time 121.26ms
iter 115620: loss nan, time 119.82ms
iter 115630: loss nan, time 120.76ms
iter 115640: loss nan, time 120.54ms
iter 115650: loss nan, time 120.88ms
iter 115660: loss nan, time 120.88ms
iter 115670: loss nan, time 118.81ms
iter 115680: loss nan, time 120.87ms
iter 115690: loss nan, time 120.91ms
tensor(0.0476)
iter 115700: loss nan, time 121.03ms
iter 115710: loss nan, time 120.86ms
iter 115720: loss nan, time 118.77ms
iter 115730: loss nan, time 120.96ms
iter 115740: loss nan, time 121.06ms
step 115750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 115750: loss nan, time 2906.97ms
iter 115760: loss nan, time 119.67ms
iter 115770: loss nan, time 120.88ms
iter 115780: loss nan, time 118.74ms
iter 115790: loss nan, time 120.84ms
tensor(0.0618)
iter 115800: loss nan, time 120.38ms
iter 115810: loss nan, time 120.19ms
iter 115820: loss nan, time 120.84ms
iter 115830: loss nan, time 118.78ms
iter 115840: loss nan, time 118.88ms
iter 115850: loss nan, time 118.86ms
iter 115860: loss nan, time 119.04ms
iter 115870: loss nan, time 119.00ms
iter 115880: loss nan, time 119.60ms
iter 115890: loss nan, time 118.74ms
tensor(0.0778)
iter 115900: loss nan, time 120.24ms
iter 115910: loss nan, time 120.10ms
iter 115920: loss nan, time 119.14ms
iter 115930: loss nan, time 120.16ms
iter 115940: loss nan, time 118.88ms
iter 115950: loss nan, time 119.51ms
iter 115960: loss nan, time 120.38ms
iter 115970: loss nan, time 119.83ms
iter 115980: loss nan, time 120.93ms
iter 115990: loss nan, time 120.31ms
tensor(0.0955)
step 116000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 116000: loss nan, time 2911.47ms
iter 116010: loss nan, time 120.66ms
iter 116020: loss nan, time 120.96ms
iter 116030: loss nan, time 121.47ms
iter 116040: loss nan, time 121.20ms
iter 116050: loss nan, time 120.69ms
iter 116060: loss nan, time 122.27ms
iter 116070: loss nan, time 122.33ms
iter 116080: loss nan, time 121.34ms
iter 116090: loss nan, time 121.12ms
tensor(0.1147)
iter 116100: loss nan, time 121.35ms
iter 116110: loss nan, time 120.84ms
iter 116120: loss nan, time 120.89ms
iter 116130: loss nan, time 119.77ms
iter 116140: loss nan, time 120.03ms
iter 116150: loss nan, time 120.91ms
iter 116160: loss nan, time 120.87ms
iter 116170: loss nan, time 120.78ms
iter 116180: loss nan, time 120.33ms
iter 116190: loss nan, time 120.31ms
tensor(0.1355)
iter 116200: loss nan, time 121.10ms
iter 116210: loss nan, time 121.44ms
iter 116220: loss nan, time 118.88ms
iter 116230: loss nan, time 117.63ms
iter 116240: loss nan, time 118.71ms
step 116250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 116250: loss nan, time 2898.04ms
iter 116260: loss nan, time 120.85ms
iter 116270: loss nan, time 121.01ms
iter 116280: loss nan, time 120.89ms
iter 116290: loss nan, time 120.77ms
tensor(0.1577)
iter 116300: loss nan, time 118.69ms
iter 116310: loss nan, time 118.60ms
iter 116320: loss nan, time 119.39ms
iter 116330: loss nan, time 119.02ms
iter 116340: loss nan, time 118.32ms
iter 116350: loss nan, time 119.88ms
iter 116360: loss nan, time 119.33ms
iter 116370: loss nan, time 120.01ms
iter 116380: loss nan, time 119.91ms
iter 116390: loss nan, time 119.08ms
tensor(0.1813)
iter 116400: loss nan, time 120.40ms
iter 116410: loss nan, time 119.86ms
iter 116420: loss nan, time 119.93ms
iter 116430: loss nan, time 119.97ms
iter 116440: loss nan, time 119.64ms
iter 116450: loss nan, time 119.78ms
iter 116460: loss nan, time 119.84ms
iter 116470: loss nan, time 120.87ms
iter 116480: loss nan, time 121.20ms
iter 116490: loss nan, time 119.96ms
tensor(0.2061)
step 116500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 116500: loss nan, time 2901.83ms
iter 116510: loss nan, time 120.47ms
iter 116520: loss nan, time 119.77ms
iter 116530: loss nan, time 120.18ms
iter 116540: loss nan, time 120.51ms
iter 116550: loss nan, time 118.87ms
iter 116560: loss nan, time 120.65ms
iter 116570: loss nan, time 120.66ms
iter 116580: loss nan, time 122.14ms
iter 116590: loss nan, time 122.14ms
tensor(0.2321)
iter 116600: loss nan, time 120.25ms
iter 116610: loss nan, time 122.73ms
iter 116620: loss nan, time 122.22ms
iter 116630: loss nan, time 122.22ms
iter 116640: loss nan, time 122.16ms
iter 116650: loss nan, time 120.06ms
iter 116660: loss nan, time 122.33ms
iter 116670: loss nan, time 116.05ms
iter 116680: loss nan, time 116.19ms
iter 116690: loss nan, time 118.19ms
tensor(0.2591)
iter 116700: loss nan, time 116.00ms
iter 116710: loss nan, time 115.31ms
iter 116720: loss nan, time 117.30ms
iter 116730: loss nan, time 114.99ms
iter 116740: loss nan, time 116.77ms
step 116750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 116750: loss nan, time 2904.30ms
iter 116760: loss nan, time 115.34ms
iter 116770: loss nan, time 118.03ms
iter 116780: loss nan, time 115.35ms
iter 116790: loss nan, time 117.18ms
tensor(0.2871)
iter 116800: loss nan, time 116.12ms
iter 116810: loss nan, time 114.67ms
iter 116820: loss nan, time 117.97ms
iter 116830: loss nan, time 116.58ms
iter 116840: loss nan, time 114.88ms
iter 116850: loss nan, time 117.80ms
iter 116860: loss nan, time 114.72ms
iter 116870: loss nan, time 116.66ms
iter 116880: loss nan, time 118.72ms
iter 116890: loss nan, time 115.22ms
tensor(0.3159)
iter 116900: loss nan, time 117.13ms
iter 116910: loss nan, time 115.97ms
iter 116920: loss nan, time 114.68ms
iter 116930: loss nan, time 118.32ms
iter 116940: loss nan, time 116.22ms
iter 116950: loss nan, time 115.55ms
iter 116960: loss nan, time 118.04ms
iter 116970: loss nan, time 115.13ms
iter 116980: loss nan, time 117.00ms
iter 116990: loss nan, time 117.29ms
tensor(0.3455)
step 117000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 117000: loss nan, time 2908.81ms
iter 117010: loss nan, time 117.93ms
iter 117020: loss nan, time 116.18ms
iter 117030: loss nan, time 116.09ms
iter 117040: loss nan, time 115.92ms
iter 117050: loss nan, time 115.05ms
iter 117060: loss nan, time 116.86ms
iter 117070: loss nan, time 117.53ms
iter 117080: loss nan, time 114.16ms
iter 117090: loss nan, time 116.74ms
tensor(0.3757)
iter 117100: loss nan, time 113.90ms
iter 117110: loss nan, time 116.17ms
iter 117120: loss nan, time 112.75ms
iter 117130: loss nan, time 115.91ms
iter 117140: loss nan, time 113.53ms
iter 117150: loss nan, time 115.03ms
iter 117160: loss nan, time 114.78ms
iter 117170: loss nan, time 115.27ms
iter 117180: loss nan, time 114.83ms
iter 117190: loss nan, time 116.85ms
tensor(0.4063)
iter 117200: loss nan, time 117.33ms
iter 117210: loss nan, time 114.56ms
iter 117220: loss nan, time 117.94ms
iter 117230: loss nan, time 115.90ms
iter 117240: loss nan, time 116.33ms
step 117250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 117250: loss nan, time 2912.30ms
iter 117260: loss nan, time 114.69ms
iter 117270: loss nan, time 118.00ms
iter 117280: loss nan, time 116.44ms
iter 117290: loss nan, time 115.03ms
tensor(0.4373)
iter 117300: loss nan, time 118.21ms
iter 117310: loss nan, time 115.91ms
iter 117320: loss nan, time 115.02ms
iter 117330: loss nan, time 118.06ms
iter 117340: loss nan, time 115.19ms
iter 117350: loss nan, time 116.89ms
iter 117360: loss nan, time 117.52ms
iter 117370: loss nan, time 113.93ms
iter 117380: loss nan, time 118.19ms
iter 117390: loss nan, time 116.03ms
tensor(0.4686)
iter 117400: loss nan, time 115.40ms
iter 117410: loss nan, time 116.25ms
iter 117420: loss nan, time 112.77ms
iter 117430: loss nan, time 113.90ms
iter 117440: loss nan, time 115.06ms
iter 117450: loss nan, time 112.99ms
iter 117460: loss nan, time 116.01ms
iter 117470: loss nan, time 113.28ms
iter 117480: loss nan, time 116.07ms
iter 117490: loss nan, time 112.69ms
tensor(0.5000)
step 117500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 117500: loss nan, time 2899.65ms
iter 117510: loss nan, time 112.78ms
iter 117520: loss nan, time 115.55ms
iter 117530: loss nan, time 114.26ms
iter 117540: loss nan, time 114.64ms
iter 117550: loss nan, time 114.85ms
iter 117560: loss nan, time 113.97ms
iter 117570: loss nan, time 114.83ms
iter 117580: loss nan, time 113.31ms
iter 117590: loss nan, time 117.12ms
tensor(0.5314)
iter 117600: loss nan, time 113.02ms
iter 117610: loss nan, time 118.05ms
iter 117620: loss nan, time 116.63ms
iter 117630: loss nan, time 114.65ms
iter 117640: loss nan, time 118.59ms
iter 117650: loss nan, time 116.27ms
iter 117660: loss nan, time 114.87ms
iter 117670: loss nan, time 118.04ms
iter 117680: loss nan, time 114.51ms
iter 117690: loss nan, time 116.81ms
tensor(0.5627)
iter 117700: loss nan, time 118.18ms
iter 117710: loss nan, time 114.79ms
iter 117720: loss nan, time 118.16ms
iter 117730: loss nan, time 116.64ms
iter 117740: loss nan, time 114.53ms
step 117750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 117750: loss nan, time 2904.65ms
iter 117760: loss nan, time 114.87ms
iter 117770: loss nan, time 117.40ms
iter 117780: loss nan, time 115.85ms
iter 117790: loss nan, time 115.22ms
tensor(0.5937)
iter 117800: loss nan, time 117.13ms
iter 117810: loss nan, time 115.86ms
iter 117820: loss nan, time 117.06ms
iter 117830: loss nan, time 118.18ms
iter 117840: loss nan, time 115.01ms
iter 117850: loss nan, time 116.74ms
iter 117860: loss nan, time 115.63ms
iter 117870: loss nan, time 115.27ms
iter 117880: loss nan, time 117.97ms
iter 117890: loss nan, time 115.95ms
tensor(0.6243)
iter 117900: loss nan, time 115.41ms
iter 117910: loss nan, time 117.86ms
iter 117920: loss nan, time 114.90ms
iter 117930: loss nan, time 116.87ms
iter 117940: loss nan, time 116.54ms
iter 117950: loss nan, time 114.89ms
iter 117960: loss nan, time 117.99ms
iter 117970: loss nan, time 116.51ms
iter 117980: loss nan, time 114.86ms
iter 117990: loss nan, time 118.06ms
tensor(0.6545)
step 118000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 118000: loss nan, time 2891.84ms
iter 118010: loss nan, time 117.72ms
iter 118020: loss nan, time 115.61ms
iter 118030: loss nan, time 116.95ms
iter 118040: loss nan, time 115.90ms
iter 118050: loss nan, time 115.58ms
iter 118060: loss nan, time 116.86ms
iter 118070: loss nan, time 118.12ms
iter 118080: loss nan, time 114.97ms
iter 118090: loss nan, time 116.15ms
tensor(0.6841)
iter 118100: loss nan, time 116.24ms
iter 118110: loss nan, time 114.64ms
iter 118120: loss nan, time 116.83ms
iter 118130: loss nan, time 116.04ms
iter 118140: loss nan, time 115.13ms
iter 118150: loss nan, time 118.02ms
iter 118160: loss nan, time 114.55ms
iter 118170: loss nan, time 116.80ms
iter 118180: loss nan, time 115.60ms
iter 118190: loss nan, time 114.75ms
tensor(0.7129)
iter 118200: loss nan, time 118.23ms
iter 118210: loss nan, time 116.34ms
iter 118220: loss nan, time 114.65ms
iter 118230: loss nan, time 117.34ms
iter 118240: loss nan, time 115.22ms
step 118250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 118250: loss nan, time 2908.81ms
iter 118260: loss nan, time 116.72ms
iter 118270: loss nan, time 114.73ms
iter 118280: loss nan, time 117.98ms
iter 118290: loss nan, time 115.11ms
tensor(0.7409)
iter 118300: loss nan, time 115.03ms
iter 118310: loss nan, time 117.94ms
iter 118320: loss nan, time 115.44ms
iter 118330: loss nan, time 116.76ms
iter 118340: loss nan, time 117.50ms
iter 118350: loss nan, time 114.69ms
iter 118360: loss nan, time 114.80ms
iter 118370: loss nan, time 117.36ms
iter 118380: loss nan, time 114.69ms
iter 118390: loss nan, time 118.02ms
tensor(0.7679)
iter 118400: loss nan, time 116.58ms
iter 118410: loss nan, time 114.88ms
iter 118420: loss nan, time 116.03ms
iter 118430: loss nan, time 115.67ms
iter 118440: loss nan, time 115.96ms
iter 118450: loss nan, time 118.07ms
iter 118460: loss nan, time 114.94ms
iter 118470: loss nan, time 116.80ms
iter 118480: loss nan, time 115.79ms
iter 118490: loss nan, time 114.47ms
tensor(0.7939)
step 118500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 118500: loss nan, time 2902.76ms
iter 118510: loss nan, time 114.75ms
iter 118520: loss nan, time 116.84ms
iter 118530: loss nan, time 116.87ms
iter 118540: loss nan, time 115.09ms
iter 118550: loss nan, time 117.72ms
iter 118560: loss nan, time 116.56ms
iter 118570: loss nan, time 115.53ms
iter 118580: loss nan, time 117.93ms
iter 118590: loss nan, time 115.51ms
tensor(0.8187)
iter 118600: loss nan, time 117.06ms
iter 118610: loss nan, time 117.94ms
iter 118620: loss nan, time 114.62ms
iter 118630: loss nan, time 118.03ms
iter 118640: loss nan, time 116.74ms
iter 118650: loss nan, time 114.68ms
iter 118660: loss nan, time 117.96ms
iter 118670: loss nan, time 115.88ms
iter 118680: loss nan, time 114.76ms
iter 118690: loss nan, time 118.11ms
tensor(0.8423)
iter 118700: loss nan, time 115.49ms
iter 118710: loss nan, time 117.37ms
iter 118720: loss nan, time 117.57ms
iter 118730: loss nan, time 113.84ms
iter 118740: loss nan, time 115.92ms
step 118750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 118750: loss nan, time 2920.37ms
iter 118760: loss nan, time 114.87ms
iter 118770: loss nan, time 117.03ms
iter 118780: loss nan, time 116.68ms
iter 118790: loss nan, time 115.11ms
tensor(0.8645)
iter 118800: loss nan, time 118.46ms
iter 118810: loss nan, time 115.99ms
iter 118820: loss nan, time 115.34ms
iter 118830: loss nan, time 117.54ms
iter 118840: loss nan, time 116.44ms
iter 118850: loss nan, time 115.18ms
iter 118860: loss nan, time 118.00ms
iter 118870: loss nan, time 115.64ms
iter 118880: loss nan, time 116.85ms
iter 118890: loss nan, time 116.99ms
tensor(0.8853)
iter 118900: loss nan, time 115.26ms
iter 118910: loss nan, time 116.94ms
iter 118920: loss nan, time 117.98ms
iter 118930: loss nan, time 115.08ms
iter 118940: loss nan, time 118.07ms
iter 118950: loss nan, time 115.33ms
iter 118960: loss nan, time 114.65ms
iter 118970: loss nan, time 117.99ms
iter 118980: loss nan, time 116.57ms
iter 118990: loss nan, time 114.72ms
tensor(0.9045)
step 119000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 119000: loss nan, time 2907.61ms
iter 119010: loss nan, time 112.83ms
iter 119020: loss nan, time 115.50ms
iter 119030: loss nan, time 114.82ms
iter 119040: loss nan, time 117.96ms
iter 119050: loss nan, time 115.17ms
iter 119060: loss nan, time 116.87ms
iter 119070: loss nan, time 115.84ms
iter 119080: loss nan, time 114.03ms
iter 119090: loss nan, time 117.01ms
tensor(0.9222)
iter 119100: loss nan, time 116.28ms
iter 119110: loss nan, time 117.23ms
iter 119120: loss nan, time 118.43ms
iter 119130: loss nan, time 115.33ms
iter 119140: loss nan, time 116.06ms
iter 119150: loss nan, time 113.73ms
iter 119160: loss nan, time 116.10ms
iter 119170: loss nan, time 112.63ms
iter 119180: loss nan, time 115.77ms
iter 119190: loss nan, time 112.91ms
tensor(0.9382)
iter 119200: loss nan, time 114.14ms
iter 119210: loss nan, time 114.96ms
iter 119220: loss nan, time 113.98ms
iter 119230: loss nan, time 114.75ms
iter 119240: loss nan, time 114.93ms
step 119250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 119250: loss nan, time 2908.43ms
iter 119260: loss nan, time 116.73ms
iter 119270: loss nan, time 114.51ms
iter 119280: loss nan, time 115.93ms
iter 119290: loss nan, time 114.48ms
tensor(0.9524)
iter 119300: loss nan, time 117.06ms
iter 119310: loss nan, time 117.86ms
iter 119320: loss nan, time 113.75ms
iter 119330: loss nan, time 118.32ms
iter 119340: loss nan, time 115.78ms
iter 119350: loss nan, time 114.97ms
iter 119360: loss nan, time 116.67ms
iter 119370: loss nan, time 115.45ms
iter 119380: loss nan, time 116.71ms
iter 119390: loss nan, time 118.06ms
tensor(0.9649)
iter 119400: loss nan, time 115.05ms
iter 119410: loss nan, time 117.89ms
iter 119420: loss nan, time 115.70ms
iter 119430: loss nan, time 114.72ms
iter 119440: loss nan, time 117.85ms
iter 119450: loss nan, time 116.29ms
iter 119460: loss nan, time 115.09ms
iter 119470: loss nan, time 118.12ms
iter 119480: loss nan, time 114.82ms
iter 119490: loss nan, time 115.90ms
tensor(0.9755)
step 119500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 119500: loss nan, time 2894.43ms
iter 119510: loss nan, time 114.51ms
iter 119520: loss nan, time 117.83ms
iter 119530: loss nan, time 115.20ms
iter 119540: loss nan, time 114.52ms
iter 119550: loss nan, time 118.15ms
iter 119560: loss nan, time 115.15ms
iter 119570: loss nan, time 117.17ms
iter 119580: loss nan, time 117.04ms
iter 119590: loss nan, time 114.57ms
tensor(0.9843)
iter 119600: loss nan, time 116.18ms
iter 119610: loss nan, time 116.41ms
iter 119620: loss nan, time 114.76ms
iter 119630: loss nan, time 118.05ms
iter 119640: loss nan, time 115.57ms
iter 119650: loss nan, time 116.70ms
iter 119660: loss nan, time 115.85ms
iter 119670: loss nan, time 115.67ms
iter 119680: loss nan, time 116.90ms
iter 119690: loss nan, time 117.88ms
tensor(0.9911)
iter 119700: loss nan, time 114.97ms
iter 119710: loss nan, time 117.90ms
iter 119720: loss nan, time 116.26ms
iter 119730: loss nan, time 115.32ms
iter 119740: loss nan, time 116.86ms
step 119750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 119750: loss nan, time 2899.23ms
iter 119760: loss nan, time 116.82ms
iter 119770: loss nan, time 117.80ms
iter 119780: loss nan, time 116.05ms
iter 119790: loss nan, time 116.92ms
tensor(0.9961)
iter 119800: loss nan, time 117.16ms
iter 119810: loss nan, time 114.91ms
iter 119820: loss nan, time 116.96ms
iter 119830: loss nan, time 117.11ms
iter 119840: loss nan, time 115.76ms
iter 119850: loss nan, time 116.81ms
iter 119860: loss nan, time 117.06ms
iter 119870: loss nan, time 114.85ms
iter 119880: loss nan, time 117.17ms
iter 119890: loss nan, time 117.83ms
tensor(0.9990)
iter 119900: loss nan, time 115.47ms
iter 119910: loss nan, time 116.90ms
iter 119920: loss nan, time 116.69ms
iter 119930: loss nan, time 114.56ms
iter 119940: loss nan, time 116.90ms
iter 119950: loss nan, time 116.18ms
iter 119960: loss nan, time 115.32ms
iter 119970: loss nan, time 116.96ms
iter 119980: loss nan, time 115.89ms
iter 119990: loss nan, time 117.23ms
tensor(1.)
step 120000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 120000: loss nan, time 2912.52ms
iter 120010: loss nan, time 115.89ms
iter 120020: loss nan, time 117.10ms
iter 120030: loss nan, time 116.75ms
iter 120040: loss nan, time 115.27ms
iter 120050: loss nan, time 114.85ms
iter 120060: loss nan, time 115.69ms
iter 120070: loss nan, time 116.09ms
iter 120080: loss nan, time 118.10ms
iter 120090: loss nan, time 115.83ms
tensor(0.9990)
iter 120100: loss nan, time 117.00ms
iter 120110: loss nan, time 115.95ms
iter 120120: loss nan, time 115.90ms
iter 120130: loss nan, time 116.87ms
iter 120140: loss nan, time 117.28ms
iter 120150: loss nan, time 115.96ms
iter 120160: loss nan, time 116.74ms
iter 120170: loss nan, time 116.08ms
iter 120180: loss nan, time 115.24ms
iter 120190: loss nan, time 116.85ms
tensor(0.9961)
iter 120200: loss nan, time 117.03ms
iter 120210: loss nan, time 115.80ms
iter 120220: loss nan, time 116.68ms
iter 120230: loss nan, time 115.78ms
iter 120240: loss nan, time 116.77ms
step 120250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 120250: loss nan, time 2913.38ms
iter 120260: loss nan, time 114.74ms
iter 120270: loss nan, time 116.68ms
iter 120280: loss nan, time 116.74ms
iter 120290: loss nan, time 115.55ms
tensor(0.9911)
iter 120300: loss nan, time 117.12ms
iter 120310: loss nan, time 115.99ms
iter 120320: loss nan, time 114.73ms
iter 120330: loss nan, time 116.86ms
iter 120340: loss nan, time 115.99ms
iter 120350: loss nan, time 116.69ms
iter 120360: loss nan, time 118.05ms
iter 120370: loss nan, time 116.77ms
iter 120380: loss nan, time 117.32ms
iter 120390: loss nan, time 117.01ms
tensor(0.9843)
iter 120400: loss nan, time 116.15ms
iter 120410: loss nan, time 117.19ms
iter 120420: loss nan, time 116.23ms
iter 120430: loss nan, time 114.88ms
iter 120440: loss nan, time 117.16ms
iter 120450: loss nan, time 115.77ms
iter 120460: loss nan, time 116.84ms
iter 120470: loss nan, time 118.29ms
iter 120480: loss nan, time 115.82ms
iter 120490: loss nan, time 114.94ms
tensor(0.9755)
step 120500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 120500: loss nan, time 2905.60ms
iter 120510: loss nan, time 115.58ms
iter 120520: loss nan, time 116.74ms
iter 120530: loss nan, time 116.38ms
iter 120540: loss nan, time 115.35ms
iter 120550: loss nan, time 116.61ms
iter 120560: loss nan, time 115.94ms
iter 120570: loss nan, time 117.11ms
iter 120580: loss nan, time 116.88ms
iter 120590: loss nan, time 115.86ms
tensor(0.9649)
iter 120600: loss nan, time 117.09ms
iter 120610: loss nan, time 118.08ms
iter 120620: loss nan, time 116.29ms
iter 120630: loss nan, time 116.87ms
iter 120640: loss nan, time 118.10ms
iter 120650: loss nan, time 117.24ms
iter 120660: loss nan, time 116.29ms
iter 120670: loss nan, time 118.39ms
iter 120680: loss nan, time 116.24ms
iter 120690: loss nan, time 117.03ms
tensor(0.9524)
iter 120700: loss nan, time 117.58ms
iter 120710: loss nan, time 116.01ms
iter 120720: loss nan, time 117.02ms
iter 120730: loss nan, time 118.62ms
iter 120740: loss nan, time 116.00ms
step 120750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 120750: loss nan, time 2905.62ms
iter 120760: loss nan, time 116.82ms
iter 120770: loss nan, time 114.50ms
iter 120780: loss nan, time 117.50ms
iter 120790: loss nan, time 116.36ms
tensor(0.9382)
iter 120800: loss nan, time 115.65ms
iter 120810: loss nan, time 116.95ms
iter 120820: loss nan, time 116.23ms
iter 120830: loss nan, time 116.20ms
iter 120840: loss nan, time 118.14ms
iter 120850: loss nan, time 115.96ms
iter 120860: loss nan, time 117.01ms
iter 120870: loss nan, time 118.47ms
iter 120880: loss nan, time 115.98ms
iter 120890: loss nan, time 117.39ms
tensor(0.9222)
iter 120900: loss nan, time 117.13ms
iter 120910: loss nan, time 116.17ms
iter 120920: loss nan, time 116.87ms
iter 120930: loss nan, time 116.53ms
iter 120940: loss nan, time 115.11ms
iter 120950: loss nan, time 116.91ms
iter 120960: loss nan, time 115.85ms
iter 120970: loss nan, time 116.76ms
iter 120980: loss nan, time 117.96ms
iter 120990: loss nan, time 115.77ms
tensor(0.9045)
step 121000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 121000: loss nan, time 2901.78ms
iter 121010: loss nan, time 115.71ms
iter 121020: loss nan, time 116.21ms
iter 121030: loss nan, time 116.94ms
iter 121040: loss nan, time 115.81ms
iter 121050: loss nan, time 116.68ms
iter 121060: loss nan, time 118.01ms
iter 121070: loss nan, time 115.85ms
iter 121080: loss nan, time 116.68ms
iter 121090: loss nan, time 115.19ms
tensor(0.8853)
iter 121100: loss nan, time 115.64ms
iter 121110: loss nan, time 117.18ms
iter 121120: loss nan, time 116.39ms
iter 121130: loss nan, time 116.74ms
iter 121140: loss nan, time 118.07ms
iter 121150: loss nan, time 114.72ms
iter 121160: loss nan, time 116.83ms
iter 121170: loss nan, time 118.05ms
iter 121180: loss nan, time 116.04ms
iter 121190: loss nan, time 116.27ms
tensor(0.8645)
iter 121200: loss nan, time 116.97ms
iter 121210: loss nan, time 115.09ms
iter 121220: loss nan, time 117.17ms
iter 121230: loss nan, time 116.54ms
iter 121240: loss nan, time 115.32ms
step 121250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 121250: loss nan, time 2909.40ms
iter 121260: loss nan, time 115.91ms
iter 121270: loss nan, time 114.69ms
iter 121280: loss nan, time 117.78ms
iter 121290: loss nan, time 116.36ms
tensor(0.8423)
iter 121300: loss nan, time 117.29ms
iter 121310: loss nan, time 116.41ms
iter 121320: loss nan, time 116.00ms
iter 121330: loss nan, time 114.86ms
iter 121340: loss nan, time 115.85ms
iter 121350: loss nan, time 115.75ms
iter 121360: loss nan, time 116.90ms
iter 121370: loss nan, time 115.03ms
iter 121380: loss nan, time 114.89ms
iter 121390: loss nan, time 116.60ms
tensor(0.8187)
iter 121400: loss nan, time 116.13ms
iter 121410: loss nan, time 116.83ms
iter 121420: loss nan, time 118.21ms
iter 121430: loss nan, time 115.98ms
iter 121440: loss nan, time 117.65ms
iter 121450: loss nan, time 116.04ms
iter 121460: loss nan, time 115.87ms
iter 121470: loss nan, time 116.89ms
iter 121480: loss nan, time 118.31ms
iter 121490: loss nan, time 115.96ms
tensor(0.7939)
step 121500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 121500: loss nan, time 2921.30ms
iter 121510: loss nan, time 118.18ms
iter 121520: loss nan, time 115.91ms
iter 121530: loss nan, time 116.88ms
iter 121540: loss nan, time 115.82ms
iter 121550: loss nan, time 116.18ms
iter 121560: loss nan, time 116.81ms
iter 121570: loss nan, time 116.80ms
iter 121580: loss nan, time 115.52ms
iter 121590: loss nan, time 117.09ms
tensor(0.7679)
iter 121600: loss nan, time 115.10ms
iter 121610: loss nan, time 114.87ms
iter 121620: loss nan, time 118.06ms
iter 121630: loss nan, time 116.02ms
iter 121640: loss nan, time 116.73ms
iter 121650: loss nan, time 118.41ms
iter 121660: loss nan, time 114.66ms
iter 121670: loss nan, time 116.86ms
iter 121680: loss nan, time 118.04ms
iter 121690: loss nan, time 115.96ms
tensor(0.7409)
iter 121700: loss nan, time 117.05ms
iter 121710: loss nan, time 116.44ms
iter 121720: loss nan, time 115.17ms
iter 121730: loss nan, time 116.84ms
iter 121740: loss nan, time 115.66ms
step 121750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 121750: loss nan, time 2902.01ms
iter 121760: loss nan, time 116.75ms
iter 121770: loss nan, time 115.28ms
iter 121780: loss nan, time 114.58ms
iter 121790: loss nan, time 116.98ms
tensor(0.7129)
iter 121800: loss nan, time 115.53ms
iter 121810: loss nan, time 117.01ms
iter 121820: loss nan, time 115.76ms
iter 121830: loss nan, time 115.93ms
iter 121840: loss nan, time 115.88ms
iter 121850: loss nan, time 115.78ms
iter 121860: loss nan, time 116.99ms
iter 121870: loss nan, time 117.02ms
iter 121880: loss nan, time 115.55ms
iter 121890: loss nan, time 116.74ms
tensor(0.6841)
iter 121900: loss nan, time 116.09ms
iter 121910: loss nan, time 115.04ms
iter 121920: loss nan, time 116.64ms
iter 121930: loss nan, time 115.85ms
iter 121940: loss nan, time 116.92ms
iter 121950: loss nan, time 118.10ms
iter 121960: loss nan, time 115.89ms
iter 121970: loss nan, time 116.56ms
iter 121980: loss nan, time 117.63ms
iter 121990: loss nan, time 115.99ms
tensor(0.6545)
step 122000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 122000: loss nan, time 2907.35ms
iter 122010: loss nan, time 114.77ms
iter 122020: loss nan, time 114.58ms
iter 122030: loss nan, time 118.37ms
iter 122040: loss nan, time 115.73ms
iter 122050: loss nan, time 116.95ms
iter 122060: loss nan, time 118.54ms
iter 122070: loss nan, time 115.95ms
iter 122080: loss nan, time 116.81ms
iter 122090: loss nan, time 117.65ms
tensor(0.6243)
iter 122100: loss nan, time 116.28ms
iter 122110: loss nan, time 116.94ms
iter 122120: loss nan, time 116.71ms
iter 122130: loss nan, time 116.05ms
iter 122140: loss nan, time 117.06ms
iter 122150: loss nan, time 116.46ms
iter 122160: loss nan, time 115.29ms
iter 122170: loss nan, time 116.87ms
iter 122180: loss nan, time 116.55ms
iter 122190: loss nan, time 115.57ms
tensor(0.5937)
iter 122200: loss nan, time 117.32ms
iter 122210: loss nan, time 116.32ms
iter 122220: loss nan, time 115.59ms
iter 122230: loss nan, time 116.83ms
iter 122240: loss nan, time 115.91ms
step 122250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 122250: loss nan, time 2908.81ms
iter 122260: loss nan, time 117.98ms
iter 122270: loss nan, time 115.90ms
iter 122280: loss nan, time 117.02ms
iter 122290: loss nan, time 115.80ms
tensor(0.5627)
iter 122300: loss nan, time 115.95ms
iter 122310: loss nan, time 116.85ms
iter 122320: loss nan, time 116.25ms
iter 122330: loss nan, time 115.79ms
iter 122340: loss nan, time 116.86ms
iter 122350: loss nan, time 115.79ms
iter 122360: loss nan, time 115.51ms
iter 122370: loss nan, time 116.91ms
iter 122380: loss nan, time 115.94ms
iter 122390: loss nan, time 116.85ms
tensor(0.5314)
iter 122400: loss nan, time 117.74ms
iter 122410: loss nan, time 115.94ms
iter 122420: loss nan, time 117.12ms
iter 122430: loss nan, time 116.03ms
iter 122440: loss nan, time 115.60ms
iter 122450: loss nan, time 117.11ms
iter 122460: loss nan, time 115.68ms
iter 122470: loss nan, time 114.97ms
iter 122480: loss nan, time 117.86ms
iter 122490: loss nan, time 114.65ms
tensor(0.5000)
step 122500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 122500: loss nan, time 2911.30ms
iter 122510: loss nan, time 115.95ms
iter 122520: loss nan, time 116.08ms
iter 122530: loss nan, time 116.98ms
iter 122540: loss nan, time 115.83ms
iter 122550: loss nan, time 114.66ms
iter 122560: loss nan, time 118.39ms
iter 122570: loss nan, time 115.86ms
iter 122580: loss nan, time 116.79ms
iter 122590: loss nan, time 116.31ms
tensor(0.4686)
iter 122600: loss nan, time 116.23ms
iter 122610: loss nan, time 114.67ms
iter 122620: loss nan, time 115.03ms
iter 122630: loss nan, time 117.01ms
iter 122640: loss nan, time 118.18ms
iter 122650: loss nan, time 114.96ms
iter 122660: loss nan, time 116.84ms
iter 122670: loss nan, time 116.12ms
iter 122680: loss nan, time 115.02ms
iter 122690: loss nan, time 116.80ms
tensor(0.4373)
iter 122700: loss nan, time 116.48ms
iter 122710: loss nan, time 115.77ms
iter 122720: loss nan, time 116.84ms
iter 122730: loss nan, time 116.25ms
iter 122740: loss nan, time 117.72ms
step 122750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 122750: loss nan, time 2905.09ms
iter 122760: loss nan, time 116.91ms
iter 122770: loss nan, time 118.06ms
iter 122780: loss nan, time 116.17ms
iter 122790: loss nan, time 116.76ms
tensor(0.4063)
iter 122800: loss nan, time 118.42ms
iter 122810: loss nan, time 116.11ms
iter 122820: loss nan, time 117.10ms
iter 122830: loss nan, time 116.72ms
iter 122840: loss nan, time 114.69ms
iter 122850: loss nan, time 116.87ms
iter 122860: loss nan, time 116.02ms
iter 122870: loss nan, time 114.67ms
iter 122880: loss nan, time 118.16ms
iter 122890: loss nan, time 115.85ms
tensor(0.3757)
iter 122900: loss nan, time 117.02ms
iter 122910: loss nan, time 116.12ms
iter 122920: loss nan, time 115.92ms
iter 122930: loss nan, time 114.69ms
iter 122940: loss nan, time 115.93ms
iter 122950: loss nan, time 115.06ms
iter 122960: loss nan, time 116.75ms
iter 122970: loss nan, time 114.86ms
iter 122980: loss nan, time 116.89ms
iter 122990: loss nan, time 116.19ms
tensor(0.3455)
step 123000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 123000: loss nan, time 2906.84ms
iter 123010: loss nan, time 117.37ms
iter 123020: loss nan, time 115.61ms
iter 123030: loss nan, time 115.81ms
iter 123040: loss nan, time 116.85ms
iter 123050: loss nan, time 115.95ms
iter 123060: loss nan, time 115.58ms
iter 123070: loss nan, time 116.80ms
iter 123080: loss nan, time 114.62ms
iter 123090: loss nan, time 116.74ms
tensor(0.3159)
iter 123100: loss nan, time 118.37ms
iter 123110: loss nan, time 115.92ms
iter 123120: loss nan, time 114.75ms
iter 123130: loss nan, time 118.57ms
iter 123140: loss nan, time 114.66ms
iter 123150: loss nan, time 116.75ms
iter 123160: loss nan, time 116.85ms
iter 123170: loss nan, time 114.59ms
iter 123180: loss nan, time 118.04ms
iter 123190: loss nan, time 115.98ms
tensor(0.2871)
iter 123200: loss nan, time 115.06ms
iter 123210: loss nan, time 117.95ms
iter 123220: loss nan, time 114.84ms
iter 123230: loss nan, time 116.68ms
iter 123240: loss nan, time 116.77ms
step 123250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 123250: loss nan, time 2907.06ms
iter 123260: loss nan, time 115.79ms
iter 123270: loss nan, time 115.09ms
iter 123280: loss nan, time 117.13ms
iter 123290: loss nan, time 116.44ms
tensor(0.2591)
iter 123300: loss nan, time 114.98ms
iter 123310: loss nan, time 117.88ms
iter 123320: loss nan, time 115.81ms
iter 123330: loss nan, time 115.05ms
iter 123340: loss nan, time 117.06ms
iter 123350: loss nan, time 115.58ms
iter 123360: loss nan, time 116.84ms
iter 123370: loss nan, time 117.35ms
iter 123380: loss nan, time 115.42ms
iter 123390: loss nan, time 116.71ms
tensor(0.2321)
iter 123400: loss nan, time 115.66ms
iter 123410: loss nan, time 114.63ms
iter 123420: loss nan, time 117.95ms
iter 123430: loss nan, time 112.86ms
iter 123440: loss nan, time 116.05ms
iter 123450: loss nan, time 112.72ms
iter 123460: loss nan, time 116.72ms
iter 123470: loss nan, time 112.94ms
iter 123480: loss nan, time 116.01ms
iter 123490: loss nan, time 112.80ms
tensor(0.2061)
step 123500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 123500: loss nan, time 2890.94ms
iter 123510: loss nan, time 116.04ms
iter 123520: loss nan, time 115.09ms
iter 123530: loss nan, time 114.92ms
iter 123540: loss nan, time 117.64ms
iter 123550: loss nan, time 114.67ms
iter 123560: loss nan, time 118.16ms
iter 123570: loss nan, time 116.41ms
iter 123580: loss nan, time 115.13ms
iter 123590: loss nan, time 115.85ms
tensor(0.1813)
iter 123600: loss nan, time 116.35ms
iter 123610: loss nan, time 117.13ms
iter 123620: loss nan, time 118.12ms
iter 123630: loss nan, time 114.78ms
iter 123640: loss nan, time 116.74ms
iter 123650: loss nan, time 115.40ms
iter 123660: loss nan, time 114.01ms
iter 123670: loss nan, time 116.99ms
iter 123680: loss nan, time 115.96ms
iter 123690: loss nan, time 117.06ms
tensor(0.1577)
iter 123700: loss nan, time 118.59ms
iter 123710: loss nan, time 115.07ms
iter 123720: loss nan, time 116.86ms
iter 123730: loss nan, time 115.68ms
iter 123740: loss nan, time 114.69ms
step 123750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 123750: loss nan, time 2907.77ms
iter 123760: loss nan, time 115.62ms
iter 123770: loss nan, time 116.86ms
iter 123780: loss nan, time 117.90ms
iter 123790: loss nan, time 114.61ms
tensor(0.1355)
iter 123800: loss nan, time 118.64ms
iter 123810: loss nan, time 116.23ms
iter 123820: loss nan, time 115.54ms
iter 123830: loss nan, time 118.69ms
iter 123840: loss nan, time 116.46ms
iter 123850: loss nan, time 114.68ms
iter 123860: loss nan, time 117.77ms
iter 123870: loss nan, time 116.17ms
iter 123880: loss nan, time 116.95ms
iter 123890: loss nan, time 117.81ms
tensor(0.1147)
iter 123900: loss nan, time 115.02ms
iter 123910: loss nan, time 115.05ms
iter 123920: loss nan, time 115.84ms
iter 123930: loss nan, time 115.11ms
iter 123940: loss nan, time 118.02ms
iter 123950: loss nan, time 115.40ms
iter 123960: loss nan, time 116.71ms
iter 123970: loss nan, time 115.85ms
iter 123980: loss nan, time 114.38ms
iter 123990: loss nan, time 116.89ms
tensor(0.0955)
step 124000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 124000: loss nan, time 2901.38ms
iter 124010: loss nan, time 116.67ms
iter 124020: loss nan, time 117.23ms
iter 124030: loss nan, time 114.63ms
iter 124040: loss nan, time 117.72ms
iter 124050: loss nan, time 115.87ms
iter 124060: loss nan, time 115.60ms
iter 124070: loss nan, time 118.16ms
iter 124080: loss nan, time 115.19ms
iter 124090: loss nan, time 116.83ms
tensor(0.0778)
iter 124100: loss nan, time 117.66ms
iter 124110: loss nan, time 114.58ms
iter 124120: loss nan, time 117.94ms
iter 124130: loss nan, time 115.83ms
iter 124140: loss nan, time 114.62ms
iter 124150: loss nan, time 118.64ms
iter 124160: loss nan, time 116.27ms
iter 124170: loss nan, time 114.72ms
iter 124180: loss nan, time 118.38ms
iter 124190: loss nan, time 115.56ms
tensor(0.0618)
iter 124200: loss nan, time 117.15ms
iter 124210: loss nan, time 116.87ms
iter 124220: loss nan, time 114.63ms
iter 124230: loss nan, time 115.88ms
iter 124240: loss nan, time 114.93ms
step 124250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 124250: loss nan, time 2905.04ms
iter 124260: loss nan, time 115.65ms
iter 124270: loss nan, time 114.74ms
iter 124280: loss nan, time 118.05ms
iter 124290: loss nan, time 116.00ms
tensor(0.0476)
iter 124300: loss nan, time 115.18ms
iter 124310: loss nan, time 117.91ms
iter 124320: loss nan, time 114.84ms
iter 124330: loss nan, time 116.77ms
iter 124340: loss nan, time 117.03ms
iter 124350: loss nan, time 114.70ms
iter 124360: loss nan, time 119.00ms
iter 124370: loss nan, time 115.95ms
iter 124380: loss nan, time 114.18ms
iter 124390: loss nan, time 118.50ms
tensor(0.0351)
iter 124400: loss nan, time 115.44ms
iter 124410: loss nan, time 116.82ms
iter 124420: loss nan, time 118.65ms
iter 124430: loss nan, time 114.05ms
iter 124440: loss nan, time 118.02ms
iter 124450: loss nan, time 116.24ms
iter 124460: loss nan, time 115.18ms
iter 124470: loss nan, time 117.76ms
iter 124480: loss nan, time 115.56ms
iter 124490: loss nan, time 114.69ms
tensor(0.0245)
step 124500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 124500: loss nan, time 2891.65ms
iter 124510: loss nan, time 116.75ms
iter 124520: loss nan, time 116.78ms
iter 124530: loss nan, time 114.37ms
iter 124540: loss nan, time 118.01ms
iter 124550: loss nan, time 116.11ms
iter 124560: loss nan, time 115.07ms
iter 124570: loss nan, time 118.60ms
iter 124580: loss nan, time 114.42ms
iter 124590: loss nan, time 116.85ms
tensor(0.0157)
iter 124600: loss nan, time 118.36ms
iter 124610: loss nan, time 115.05ms
iter 124620: loss nan, time 117.88ms
iter 124630: loss nan, time 116.29ms
iter 124640: loss nan, time 114.86ms
iter 124650: loss nan, time 117.86ms
iter 124660: loss nan, time 115.13ms
iter 124670: loss nan, time 116.81ms
iter 124680: loss nan, time 117.90ms
iter 124690: loss nan, time 114.64ms
tensor(0.0089)
iter 124700: loss nan, time 117.46ms
iter 124710: loss nan, time 116.04ms
iter 124720: loss nan, time 114.95ms
iter 124730: loss nan, time 117.79ms
iter 124740: loss nan, time 115.40ms
step 124750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 124750: loss nan, time 2910.01ms
iter 124760: loss nan, time 116.05ms
iter 124770: loss nan, time 114.74ms
iter 124780: loss nan, time 116.82ms
iter 124790: loss nan, time 115.39ms
tensor(0.0039)
iter 124800: loss nan, time 115.23ms
iter 124810: loss nan, time 118.19ms
iter 124820: loss nan, time 115.70ms
iter 124830: loss nan, time 116.88ms
iter 124840: loss nan, time 116.47ms
iter 124850: loss nan, time 114.88ms
iter 124860: loss nan, time 116.83ms
iter 124870: loss nan, time 118.02ms
iter 124880: loss nan, time 115.03ms
iter 124890: loss nan, time 118.52ms
tensor(0.0010)
iter 124900: loss nan, time 115.41ms
iter 124910: loss nan, time 114.90ms
iter 124920: loss nan, time 118.35ms
iter 124930: loss nan, time 122.94ms
iter 124940: loss nan, time 120.32ms
iter 124950: loss nan, time 121.41ms
iter 124960: loss nan, time 121.80ms
iter 124970: loss nan, time 121.43ms
iter 124980: loss nan, time 121.18ms
iter 124990: loss nan, time 119.39ms
tensor(0.0010)
step 125000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 125000: loss nan, time 2907.28ms
iter 125010: loss nan, time 119.25ms
iter 125020: loss nan, time 119.33ms
iter 125030: loss nan, time 119.06ms
iter 125040: loss nan, time 119.00ms
iter 125050: loss nan, time 119.49ms
iter 125060: loss nan, time 119.46ms
iter 125070: loss nan, time 120.19ms
iter 125080: loss nan, time 120.99ms
iter 125090: loss nan, time 121.98ms
tensor(0.0010)
iter 125100: loss nan, time 122.77ms
iter 125110: loss nan, time 121.42ms
iter 125120: loss nan, time 122.41ms
iter 125130: loss nan, time 121.35ms
iter 125140: loss nan, time 121.85ms
iter 125150: loss nan, time 119.40ms
iter 125160: loss nan, time 119.49ms
iter 125170: loss nan, time 119.26ms
iter 125180: loss nan, time 120.44ms
iter 125190: loss nan, time 120.60ms
tensor(0.0039)
iter 125200: loss nan, time 121.81ms
iter 125210: loss nan, time 123.15ms
iter 125220: loss nan, time 122.79ms
iter 125230: loss nan, time 121.92ms
iter 125240: loss nan, time 120.11ms
step 125250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 125250: loss nan, time 2903.76ms
iter 125260: loss nan, time 121.42ms
iter 125270: loss nan, time 122.22ms
iter 125280: loss nan, time 121.79ms
iter 125290: loss nan, time 120.98ms
tensor(0.0089)
iter 125300: loss nan, time 119.39ms
iter 125310: loss nan, time 120.16ms
iter 125320: loss nan, time 119.18ms
iter 125330: loss nan, time 118.40ms
iter 125340: loss nan, time 118.95ms
iter 125350: loss nan, time 119.08ms
iter 125360: loss nan, time 119.46ms
iter 125370: loss nan, time 120.07ms
iter 125380: loss nan, time 120.54ms
iter 125390: loss nan, time 121.20ms
tensor(0.0157)
iter 125400: loss nan, time 124.58ms
iter 125410: loss nan, time 120.99ms
iter 125420: loss nan, time 118.92ms
iter 125430: loss nan, time 116.18ms
iter 125440: loss nan, time 116.83ms
iter 125450: loss nan, time 116.19ms
iter 125460: loss nan, time 116.03ms
iter 125470: loss nan, time 115.01ms
iter 125480: loss nan, time 118.57ms
iter 125490: loss nan, time 115.81ms
tensor(0.0245)
step 125500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 125500: loss nan, time 2897.89ms
iter 125510: loss nan, time 116.93ms
iter 125520: loss nan, time 114.97ms
iter 125530: loss nan, time 118.26ms
iter 125540: loss nan, time 113.37ms
iter 125550: loss nan, time 115.02ms
iter 125560: loss nan, time 115.29ms
iter 125570: loss nan, time 115.01ms
iter 125580: loss nan, time 113.80ms
iter 125590: loss nan, time 115.21ms
tensor(0.0351)
iter 125600: loss nan, time 114.90ms
iter 125610: loss nan, time 116.95ms
iter 125620: loss nan, time 116.58ms
iter 125630: loss nan, time 114.54ms
iter 125640: loss nan, time 118.13ms
iter 125650: loss nan, time 115.60ms
iter 125660: loss nan, time 117.08ms
iter 125670: loss nan, time 117.67ms
iter 125680: loss nan, time 118.18ms
iter 125690: loss nan, time 115.89ms
tensor(0.0476)
iter 125700: loss nan, time 117.01ms
iter 125710: loss nan, time 116.14ms
iter 125720: loss nan, time 114.62ms
iter 125730: loss nan, time 116.96ms
iter 125740: loss nan, time 116.47ms
step 125750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 125750: loss nan, time 2883.07ms
iter 125760: loss nan, time 116.32ms
iter 125770: loss nan, time 114.08ms
iter 125780: loss nan, time 118.80ms
iter 125790: loss nan, time 116.07ms
tensor(0.0618)
iter 125800: loss nan, time 116.68ms
iter 125810: loss nan, time 117.93ms
iter 125820: loss nan, time 114.17ms
iter 125830: loss nan, time 118.16ms
iter 125840: loss nan, time 116.89ms
iter 125850: loss nan, time 115.01ms
iter 125860: loss nan, time 118.60ms
iter 125870: loss nan, time 116.22ms
iter 125880: loss nan, time 114.12ms
iter 125890: loss nan, time 118.16ms
tensor(0.0778)
iter 125900: loss nan, time 116.69ms
iter 125910: loss nan, time 115.15ms
iter 125920: loss nan, time 118.59ms
iter 125930: loss nan, time 115.89ms
iter 125940: loss nan, time 115.76ms
iter 125950: loss nan, time 118.53ms
iter 125960: loss nan, time 116.08ms
iter 125970: loss nan, time 114.57ms
iter 125980: loss nan, time 118.82ms
iter 125990: loss nan, time 120.67ms
tensor(0.0955)
step 126000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 126000: loss nan, time 2912.16ms
iter 126010: loss nan, time 120.73ms
iter 126020: loss nan, time 120.53ms
iter 126030: loss nan, time 119.80ms
iter 126040: loss nan, time 120.68ms
iter 126050: loss nan, time 120.93ms
iter 126060: loss nan, time 121.64ms
iter 126070: loss nan, time 122.22ms
iter 126080: loss nan, time 120.04ms
iter 126090: loss nan, time 122.05ms
tensor(0.1147)
iter 126100: loss nan, time 122.35ms
iter 126110: loss nan, time 122.24ms
iter 126120: loss nan, time 122.03ms
iter 126130: loss nan, time 119.98ms
iter 126140: loss nan, time 122.13ms
iter 126150: loss nan, time 122.00ms
iter 126160: loss nan, time 122.16ms
iter 126170: loss nan, time 121.66ms
iter 126180: loss nan, time 119.09ms
iter 126190: loss nan, time 121.03ms
tensor(0.1355)
iter 126200: loss nan, time 121.16ms
iter 126210: loss nan, time 121.03ms
iter 126220: loss nan, time 121.00ms
iter 126230: loss nan, time 118.80ms
iter 126240: loss nan, time 121.08ms
step 126250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 126250: loss nan, time 2906.16ms
iter 126260: loss nan, time 122.26ms
iter 126270: loss nan, time 119.90ms
iter 126280: loss nan, time 119.30ms
iter 126290: loss nan, time 119.58ms
tensor(0.1577)
iter 126300: loss nan, time 119.23ms
iter 126310: loss nan, time 120.14ms
iter 126320: loss nan, time 120.52ms
iter 126330: loss nan, time 120.40ms
iter 126340: loss nan, time 120.14ms
iter 126350: loss nan, time 119.04ms
iter 126360: loss nan, time 120.32ms
iter 126370: loss nan, time 121.08ms
iter 126380: loss nan, time 121.47ms
iter 126390: loss nan, time 121.68ms
tensor(0.1813)
iter 126400: loss nan, time 121.35ms
iter 126410: loss nan, time 122.40ms
iter 126420: loss nan, time 121.17ms
iter 126430: loss nan, time 121.09ms
iter 126440: loss nan, time 120.92ms
iter 126450: loss nan, time 121.54ms
iter 126460: loss nan, time 120.95ms
iter 126470: loss nan, time 121.17ms
iter 126480: loss nan, time 121.14ms
iter 126490: loss nan, time 121.22ms
tensor(0.2061)
step 126500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 126500: loss nan, time 2914.84ms
iter 126510: loss nan, time 118.91ms
iter 126520: loss nan, time 119.59ms
iter 126530: loss nan, time 120.50ms
iter 126540: loss nan, time 119.78ms
iter 126550: loss nan, time 119.99ms
iter 126560: loss nan, time 121.15ms
iter 126570: loss nan, time 120.38ms
iter 126580: loss nan, time 120.76ms
iter 126590: loss nan, time 119.99ms
tensor(0.2321)
iter 126600: loss nan, time 121.52ms
iter 126610: loss nan, time 122.31ms
iter 126620: loss nan, time 122.08ms
iter 126630: loss nan, time 121.23ms
iter 126640: loss nan, time 118.79ms
iter 126650: loss nan, time 121.06ms
iter 126660: loss nan, time 121.15ms
iter 126670: loss nan, time 120.66ms
iter 126680: loss nan, time 120.80ms
iter 126690: loss nan, time 118.81ms
tensor(0.2591)
iter 126700: loss nan, time 121.33ms
iter 126710: loss nan, time 121.07ms
iter 126720: loss nan, time 118.86ms
iter 126730: loss nan, time 118.72ms
iter 126740: loss nan, time 119.18ms
step 126750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 126750: loss nan, time 2914.40ms
iter 126760: loss nan, time 118.73ms
iter 126770: loss nan, time 119.53ms
iter 126780: loss nan, time 119.70ms
iter 126790: loss nan, time 120.13ms
tensor(0.2871)
iter 126800: loss nan, time 120.50ms
iter 126810: loss nan, time 118.30ms
iter 126820: loss nan, time 120.46ms
iter 126830: loss nan, time 119.96ms
iter 126840: loss nan, time 120.66ms
iter 126850: loss nan, time 120.29ms
iter 126860: loss nan, time 120.04ms
iter 126870: loss nan, time 121.45ms
iter 126880: loss nan, time 121.66ms
iter 126890: loss nan, time 122.04ms
tensor(0.3159)
iter 126900: loss nan, time 122.35ms
iter 126910: loss nan, time 121.10ms
iter 126920: loss nan, time 121.04ms
iter 126930: loss nan, time 120.94ms
iter 126940: loss nan, time 121.14ms
iter 126950: loss nan, time 121.28ms
iter 126960: loss nan, time 121.05ms
iter 126970: loss nan, time 121.20ms
iter 126980: loss nan, time 118.84ms
iter 126990: loss nan, time 120.34ms
tensor(0.3455)
step 127000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 127000: loss nan, time 2919.43ms
iter 127010: loss nan, time 120.26ms
iter 127020: loss nan, time 120.18ms
iter 127030: loss nan, time 119.93ms
iter 127040: loss nan, time 120.60ms
iter 127050: loss nan, time 120.06ms
iter 127060: loss nan, time 120.65ms
iter 127070: loss nan, time 120.77ms
iter 127080: loss nan, time 121.19ms
iter 127090: loss nan, time 121.75ms
tensor(0.3757)
iter 127100: loss nan, time 120.51ms
iter 127110: loss nan, time 122.24ms
iter 127120: loss nan, time 121.11ms
iter 127130: loss nan, time 121.16ms
iter 127140: loss nan, time 121.13ms
iter 127150: loss nan, time 118.86ms
iter 127160: loss nan, time 121.09ms
iter 127170: loss nan, time 121.08ms
iter 127180: loss nan, time 118.86ms
iter 127190: loss nan, time 119.16ms
tensor(0.4063)
iter 127200: loss nan, time 119.82ms
iter 127210: loss nan, time 118.86ms
iter 127220: loss nan, time 120.02ms
iter 127230: loss nan, time 120.08ms
iter 127240: loss nan, time 119.94ms
step 127250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 127250: loss nan, time 2912.64ms
iter 127260: loss nan, time 119.93ms
iter 127270: loss nan, time 118.78ms
iter 127280: loss nan, time 120.64ms
iter 127290: loss nan, time 120.68ms
tensor(0.4373)
iter 127300: loss nan, time 121.39ms
iter 127310: loss nan, time 121.87ms
iter 127320: loss nan, time 120.82ms
iter 127330: loss nan, time 122.40ms
iter 127340: loss nan, time 122.11ms
iter 127350: loss nan, time 120.86ms
iter 127360: loss nan, time 120.35ms
iter 127370: loss nan, time 121.01ms
iter 127380: loss nan, time 120.84ms
iter 127390: loss nan, time 121.29ms
tensor(0.4686)
iter 127400: loss nan, time 120.96ms
iter 127410: loss nan, time 120.81ms
iter 127420: loss nan, time 120.82ms
iter 127430: loss nan, time 120.78ms
iter 127440: loss nan, time 118.62ms
iter 127450: loss nan, time 119.07ms
iter 127460: loss nan, time 118.80ms
iter 127470: loss nan, time 119.09ms
iter 127480: loss nan, time 119.62ms
iter 127490: loss nan, time 119.41ms
tensor(0.5000)
step 127500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 127500: loss nan, time 2904.94ms
iter 127510: loss nan, time 119.65ms
iter 127520: loss nan, time 120.27ms
iter 127530: loss nan, time 120.28ms
iter 127540: loss nan, time 120.65ms
iter 127550: loss nan, time 118.67ms
iter 127560: loss nan, time 120.46ms
iter 127570: loss nan, time 120.12ms
iter 127580: loss nan, time 120.35ms
iter 127590: loss nan, time 121.33ms
tensor(0.5314)
iter 127600: loss nan, time 123.57ms
iter 127610: loss nan, time 120.31ms
iter 127620: loss nan, time 121.12ms
iter 127630: loss nan, time 121.28ms
iter 127640: loss nan, time 121.23ms
iter 127650: loss nan, time 121.25ms
iter 127660: loss nan, time 119.00ms
iter 127670: loss nan, time 120.04ms
iter 127680: loss nan, time 119.64ms
iter 127690: loss nan, time 120.85ms
tensor(0.5627)
iter 127700: loss nan, time 120.96ms
iter 127710: loss nan, time 119.56ms
iter 127720: loss nan, time 118.33ms
iter 127730: loss nan, time 119.13ms
iter 127740: loss nan, time 120.23ms
step 127750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 127750: loss nan, time 2915.40ms
iter 127760: loss nan, time 119.23ms
iter 127770: loss nan, time 120.21ms
iter 127780: loss nan, time 120.84ms
iter 127790: loss nan, time 120.18ms
tensor(0.5937)
iter 127800: loss nan, time 122.36ms
iter 127810: loss nan, time 122.74ms
iter 127820: loss nan, time 122.26ms
iter 127830: loss nan, time 121.20ms
iter 127840: loss nan, time 120.54ms
iter 127850: loss nan, time 120.88ms
iter 127860: loss nan, time 121.07ms
iter 127870: loss nan, time 121.38ms
iter 127880: loss nan, time 121.03ms
iter 127890: loss nan, time 120.77ms
tensor(0.6243)
iter 127900: loss nan, time 118.72ms
iter 127910: loss nan, time 118.95ms
iter 127920: loss nan, time 119.03ms
iter 127930: loss nan, time 119.30ms
iter 127940: loss nan, time 119.70ms
iter 127950: loss nan, time 119.62ms
iter 127960: loss nan, time 118.48ms
iter 127970: loss nan, time 119.11ms
iter 127980: loss nan, time 119.84ms
iter 127990: loss nan, time 119.89ms
tensor(0.6545)
step 128000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 128000: loss nan, time 2904.70ms
iter 128010: loss nan, time 118.02ms
iter 128020: loss nan, time 119.64ms
iter 128030: loss nan, time 120.14ms
iter 128040: loss nan, time 120.06ms
iter 128050: loss nan, time 120.18ms
iter 128060: loss nan, time 120.34ms
iter 128070: loss nan, time 119.92ms
iter 128080: loss nan, time 120.70ms
iter 128090: loss nan, time 121.62ms
tensor(0.6841)
iter 128100: loss nan, time 122.75ms
iter 128110: loss nan, time 122.59ms
iter 128120: loss nan, time 119.27ms
iter 128130: loss nan, time 121.42ms
iter 128140: loss nan, time 121.62ms
iter 128150: loss nan, time 120.64ms
iter 128160: loss nan, time 119.61ms
iter 128170: loss nan, time 119.77ms
iter 128180: loss nan, time 119.73ms
iter 128190: loss nan, time 120.47ms
tensor(0.7129)
iter 128200: loss nan, time 120.95ms
iter 128210: loss nan, time 121.37ms
iter 128220: loss nan, time 121.18ms
iter 128230: loss nan, time 121.45ms
iter 128240: loss nan, time 121.86ms
step 128250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 128250: loss nan, time 2904.83ms
iter 128260: loss nan, time 121.35ms
iter 128270: loss nan, time 121.55ms
iter 128280: loss nan, time 121.95ms
iter 128290: loss nan, time 119.50ms
tensor(0.7409)
iter 128300: loss nan, time 120.47ms
iter 128310: loss nan, time 120.45ms
iter 128320: loss nan, time 120.54ms
iter 128330: loss nan, time 120.92ms
iter 128340: loss nan, time 120.04ms
iter 128350: loss nan, time 121.99ms
iter 128360: loss nan, time 122.47ms
iter 128370: loss nan, time 119.28ms
iter 128380: loss nan, time 121.40ms
iter 128390: loss nan, time 121.55ms
tensor(0.7679)
iter 128400: loss nan, time 119.41ms
iter 128410: loss nan, time 119.37ms
iter 128420: loss nan, time 120.42ms
iter 128430: loss nan, time 117.27ms
iter 128440: loss nan, time 120.64ms
iter 128450: loss nan, time 120.76ms
iter 128460: loss nan, time 120.62ms
iter 128470: loss nan, time 121.35ms
iter 128480: loss nan, time 121.15ms
iter 128490: loss nan, time 122.70ms
tensor(0.7939)
step 128500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 128500: loss nan, time 2910.61ms
iter 128510: loss nan, time 122.73ms
iter 128520: loss nan, time 121.19ms
iter 128530: loss nan, time 122.20ms
iter 128540: loss nan, time 122.15ms
iter 128550: loss nan, time 119.11ms
iter 128560: loss nan, time 119.77ms
iter 128570: loss nan, time 119.78ms
iter 128580: loss nan, time 120.70ms
iter 128590: loss nan, time 120.38ms
tensor(0.8187)
iter 128600: loss nan, time 120.30ms
iter 128610: loss nan, time 121.19ms
iter 128620: loss nan, time 120.29ms
iter 128630: loss nan, time 122.45ms
iter 128640: loss nan, time 122.56ms
iter 128650: loss nan, time 121.15ms
iter 128660: loss nan, time 121.57ms
iter 128670: loss nan, time 118.94ms
iter 128680: loss nan, time 121.24ms
iter 128690: loss nan, time 119.21ms
tensor(0.8423)
iter 128700: loss nan, time 120.25ms
iter 128710: loss nan, time 120.37ms
iter 128720: loss nan, time 120.41ms
iter 128730: loss nan, time 118.92ms
iter 128740: loss nan, time 119.84ms
step 128750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 128750: loss nan, time 2887.58ms
iter 128760: loss nan, time 119.90ms
iter 128770: loss nan, time 120.38ms
iter 128780: loss nan, time 120.30ms
iter 128790: loss nan, time 119.15ms
tensor(0.8645)
iter 128800: loss nan, time 120.31ms
iter 128810: loss nan, time 120.49ms
iter 128820: loss nan, time 116.37ms
iter 128830: loss nan, time 114.80ms
iter 128840: loss nan, time 116.86ms
iter 128850: loss nan, time 115.93ms
iter 128860: loss nan, time 116.80ms
iter 128870: loss nan, time 118.18ms
iter 128880: loss nan, time 116.07ms
iter 128890: loss nan, time 114.75ms
tensor(0.8853)
iter 128900: loss nan, time 116.19ms
iter 128910: loss nan, time 115.78ms
iter 128920: loss nan, time 116.84ms
iter 128930: loss nan, time 116.55ms
iter 128940: loss nan, time 119.85ms
iter 128950: loss nan, time 120.44ms
iter 128960: loss nan, time 118.93ms
iter 128970: loss nan, time 120.15ms
iter 128980: loss nan, time 119.89ms
iter 128990: loss nan, time 119.99ms
tensor(0.9045)
step 129000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 129000: loss nan, time 2909.40ms
iter 129010: loss nan, time 120.21ms
iter 129020: loss nan, time 118.89ms
iter 129030: loss nan, time 120.05ms
iter 129040: loss nan, time 120.56ms
iter 129050: loss nan, time 121.04ms
iter 129060: loss nan, time 121.60ms
iter 129070: loss nan, time 121.21ms
iter 129080: loss nan, time 122.34ms
iter 129090: loss nan, time 121.00ms
tensor(0.9222)
iter 129100: loss nan, time 121.38ms
iter 129110: loss nan, time 121.10ms
iter 129120: loss nan, time 120.93ms
iter 129130: loss nan, time 120.94ms
iter 129140: loss nan, time 121.24ms
iter 129150: loss nan, time 119.37ms
iter 129160: loss nan, time 120.16ms
iter 129170: loss nan, time 120.45ms
iter 129180: loss nan, time 120.08ms
iter 129190: loss nan, time 120.10ms
tensor(0.9382)
iter 129200: loss nan, time 120.51ms
iter 129210: loss nan, time 121.31ms
iter 129220: loss nan, time 120.42ms
iter 129230: loss nan, time 120.71ms
iter 129240: loss nan, time 121.35ms
step 129250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 129250: loss nan, time 2907.82ms
iter 129260: loss nan, time 120.05ms
iter 129270: loss nan, time 121.27ms
iter 129280: loss nan, time 122.20ms
iter 129290: loss nan, time 122.23ms
tensor(0.9524)
iter 129300: loss nan, time 121.57ms
iter 129310: loss nan, time 119.01ms
iter 129320: loss nan, time 121.23ms
iter 129330: loss nan, time 121.46ms
iter 129340: loss nan, time 120.95ms
iter 129350: loss nan, time 121.28ms
iter 129360: loss nan, time 118.89ms
iter 129370: loss nan, time 117.41ms
iter 129380: loss nan, time 117.31ms
iter 129390: loss nan, time 115.00ms
tensor(0.9649)
iter 129400: loss nan, time 117.38ms
iter 129410: loss nan, time 116.21ms
iter 129420: loss nan, time 115.07ms
iter 129430: loss nan, time 117.70ms
iter 129440: loss nan, time 116.74ms
iter 129450: loss nan, time 115.44ms
iter 129460: loss nan, time 118.10ms
iter 129470: loss nan, time 116.28ms
iter 129480: loss nan, time 115.01ms
iter 129490: loss nan, time 117.68ms
tensor(0.9755)
step 129500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 129500: loss nan, time 2906.03ms
iter 129510: loss nan, time 117.15ms
iter 129520: loss nan, time 118.33ms
iter 129530: loss nan, time 115.31ms
iter 129540: loss nan, time 115.20ms
iter 129550: loss nan, time 116.21ms
iter 129560: loss nan, time 115.92ms
iter 129570: loss nan, time 116.46ms
iter 129580: loss nan, time 118.04ms
iter 129590: loss nan, time 115.44ms
tensor(0.9843)
iter 129600: loss nan, time 116.46ms
iter 129610: loss nan, time 116.45ms
iter 129620: loss nan, time 115.84ms
iter 129630: loss nan, time 117.14ms
iter 129640: loss nan, time 118.23ms
iter 129650: loss nan, time 114.88ms
iter 129660: loss nan, time 116.97ms
iter 129670: loss nan, time 116.17ms
iter 129680: loss nan, time 115.20ms
iter 129690: loss nan, time 116.35ms
tensor(0.9911)
iter 129700: loss nan, time 118.45ms
iter 129710: loss nan, time 114.89ms
iter 129720: loss nan, time 117.88ms
iter 129730: loss nan, time 115.95ms
iter 129740: loss nan, time 115.06ms
step 129750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 129750: loss nan, time 2894.19ms
iter 129760: loss nan, time 114.78ms
iter 129770: loss nan, time 117.27ms
iter 129780: loss nan, time 118.45ms
iter 129790: loss nan, time 114.94ms
tensor(0.9961)
iter 129800: loss nan, time 118.39ms
iter 129810: loss nan, time 117.69ms
iter 129820: loss nan, time 120.11ms
iter 129830: loss nan, time 122.06ms
iter 129840: loss nan, time 122.80ms
iter 129850: loss nan, time 122.84ms
iter 129860: loss nan, time 122.60ms
iter 129870: loss nan, time 121.80ms
iter 129880: loss nan, time 118.61ms
iter 129890: loss nan, time 119.28ms
tensor(0.9990)
iter 129900: loss nan, time 119.55ms
iter 129910: loss nan, time 120.35ms
iter 129920: loss nan, time 120.37ms
iter 129930: loss nan, time 120.95ms
iter 129940: loss nan, time 123.04ms
iter 129950: loss nan, time 120.35ms
iter 129960: loss nan, time 122.53ms
iter 129970: loss nan, time 122.44ms
iter 129980: loss nan, time 121.24ms
iter 129990: loss nan, time 121.58ms
tensor(1.)
step 130000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 130000: loss nan, time 2884.96ms
iter 130010: loss nan, time 120.51ms
iter 130020: loss nan, time 121.55ms
iter 130030: loss nan, time 121.44ms
iter 130040: loss nan, time 119.43ms
iter 130050: loss nan, time 119.22ms
iter 130060: loss nan, time 119.04ms
iter 130070: loss nan, time 118.95ms
iter 130080: loss nan, time 120.12ms
iter 130090: loss nan, time 120.29ms
tensor(0.9990)
iter 130100: loss nan, time 121.59ms
iter 130110: loss nan, time 122.85ms
iter 130120: loss nan, time 120.95ms
iter 130130: loss nan, time 122.73ms
iter 130140: loss nan, time 122.87ms
iter 130150: loss nan, time 121.36ms
iter 130160: loss nan, time 121.29ms
iter 130170: loss nan, time 119.21ms
iter 130180: loss nan, time 119.79ms
iter 130190: loss nan, time 119.29ms
tensor(0.9961)
iter 130200: loss nan, time 120.20ms
iter 130210: loss nan, time 119.79ms
iter 130220: loss nan, time 120.66ms
iter 130230: loss nan, time 121.28ms
iter 130240: loss nan, time 122.60ms
step 130250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 130250: loss nan, time 2912.05ms
iter 130260: loss nan, time 120.39ms
iter 130270: loss nan, time 122.38ms
iter 130280: loss nan, time 122.41ms
iter 130290: loss nan, time 122.30ms
tensor(0.9911)
iter 130300: loss nan, time 120.81ms
iter 130310: loss nan, time 118.86ms
iter 130320: loss nan, time 121.08ms
iter 130330: loss nan, time 119.03ms
iter 130340: loss nan, time 118.82ms
iter 130350: loss nan, time 118.87ms
iter 130360: loss nan, time 118.87ms
iter 130370: loss nan, time 118.96ms
iter 130380: loss nan, time 120.01ms
iter 130390: loss nan, time 120.25ms
tensor(0.9843)
iter 130400: loss nan, time 121.45ms
iter 130410: loss nan, time 121.65ms
iter 130420: loss nan, time 121.71ms
iter 130430: loss nan, time 122.48ms
iter 130440: loss nan, time 122.37ms
iter 130450: loss nan, time 122.30ms
iter 130460: loss nan, time 121.47ms
iter 130470: loss nan, time 121.47ms
iter 130480: loss nan, time 119.36ms
iter 130490: loss nan, time 119.25ms
tensor(0.9755)
step 130500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 130500: loss nan, time 2920.62ms
iter 130510: loss nan, time 119.24ms
iter 130520: loss nan, time 120.12ms
iter 130530: loss nan, time 120.30ms
iter 130540: loss nan, time 121.00ms
iter 130550: loss nan, time 121.62ms
iter 130560: loss nan, time 120.43ms
iter 130570: loss nan, time 122.71ms
iter 130580: loss nan, time 122.64ms
iter 130590: loss nan, time 121.97ms
tensor(0.9649)
iter 130600: loss nan, time 120.17ms
iter 130610: loss nan, time 119.17ms
iter 130620: loss nan, time 119.06ms
iter 130630: loss nan, time 119.14ms
iter 130640: loss nan, time 119.83ms
iter 130650: loss nan, time 120.32ms
iter 130660: loss nan, time 121.83ms
iter 130670: loss nan, time 120.52ms
iter 130680: loss nan, time 122.63ms
iter 130690: loss nan, time 122.67ms
tensor(0.9524)
iter 130700: loss nan, time 122.98ms
iter 130710: loss nan, time 121.65ms
iter 130720: loss nan, time 121.31ms
iter 130730: loss nan, time 119.55ms
iter 130740: loss nan, time 119.46ms
step 130750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 130750: loss nan, time 2923.27ms
iter 130760: loss nan, time 120.00ms
iter 130770: loss nan, time 120.99ms
iter 130780: loss nan, time 120.86ms
iter 130790: loss nan, time 121.23ms
tensor(0.9382)
iter 130800: loss nan, time 122.72ms
iter 130810: loss nan, time 120.41ms
iter 130820: loss nan, time 122.76ms
iter 130830: loss nan, time 121.46ms
iter 130840: loss nan, time 121.30ms
iter 130850: loss nan, time 119.12ms
iter 130860: loss nan, time 119.32ms
iter 130870: loss nan, time 118.94ms
iter 130880: loss nan, time 119.79ms
iter 130890: loss nan, time 120.58ms
tensor(0.9222)
iter 130900: loss nan, time 121.56ms
iter 130910: loss nan, time 122.53ms
iter 130920: loss nan, time 121.33ms
iter 130930: loss nan, time 122.31ms
iter 130940: loss nan, time 122.65ms
iter 130950: loss nan, time 121.39ms
iter 130960: loss nan, time 121.96ms
iter 130970: loss nan, time 119.41ms
iter 130980: loss nan, time 119.47ms
iter 130990: loss nan, time 120.09ms
tensor(0.9045)
step 131000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 131000: loss nan, time 2921.81ms
iter 131010: loss nan, time 121.05ms
iter 131020: loss nan, time 123.04ms
iter 131030: loss nan, time 122.71ms
iter 131040: loss nan, time 122.20ms
iter 131050: loss nan, time 121.71ms
iter 131060: loss nan, time 119.48ms
iter 131070: loss nan, time 121.72ms
iter 131080: loss nan, time 119.73ms
iter 131090: loss nan, time 119.36ms
tensor(0.8853)
iter 131100: loss nan, time 119.76ms
iter 131110: loss nan, time 120.39ms
iter 131120: loss nan, time 119.43ms
iter 131130: loss nan, time 121.24ms
iter 131140: loss nan, time 123.17ms
iter 131150: loss nan, time 122.63ms
iter 131160: loss nan, time 122.94ms
iter 131170: loss nan, time 121.53ms
iter 131180: loss nan, time 121.29ms
iter 131190: loss nan, time 119.28ms
tensor(0.8645)
iter 131200: loss nan, time 119.52ms
iter 131210: loss nan, time 119.69ms
iter 131220: loss nan, time 120.33ms
iter 131230: loss nan, time 120.54ms
iter 131240: loss nan, time 121.65ms
step 131250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 131250: loss nan, time 2929.55ms
iter 131260: loss nan, time 120.43ms
iter 131270: loss nan, time 123.15ms
iter 131280: loss nan, time 121.58ms
iter 131290: loss nan, time 119.69ms
tensor(0.8423)
iter 131300: loss nan, time 119.84ms
iter 131310: loss nan, time 119.10ms
iter 131320: loss nan, time 119.65ms
iter 131330: loss nan, time 120.35ms
iter 131340: loss nan, time 120.87ms
iter 131350: loss nan, time 121.69ms
iter 131360: loss nan, time 122.58ms
iter 131370: loss nan, time 121.45ms
iter 131380: loss nan, time 122.54ms
iter 131390: loss nan, time 122.60ms
tensor(0.8187)
iter 131400: loss nan, time 121.63ms
iter 131410: loss nan, time 119.29ms
iter 131420: loss nan, time 119.12ms
iter 131430: loss nan, time 118.37ms
iter 131440: loss nan, time 118.58ms
iter 131450: loss nan, time 118.20ms
iter 131460: loss nan, time 119.51ms
iter 131470: loss nan, time 119.55ms
iter 131480: loss nan, time 120.45ms
iter 131490: loss nan, time 120.69ms
tensor(0.7939)
step 131500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 131500: loss nan, time 2920.85ms
iter 131510: loss nan, time 120.50ms
iter 131520: loss nan, time 121.68ms
iter 131530: loss nan, time 122.55ms
iter 131540: loss nan, time 122.07ms
iter 131550: loss nan, time 121.79ms
iter 131560: loss nan, time 119.19ms
iter 131570: loss nan, time 121.48ms
iter 131580: loss nan, time 119.27ms
iter 131590: loss nan, time 119.31ms
tensor(0.7679)
iter 131600: loss nan, time 119.59ms
iter 131610: loss nan, time 119.60ms
iter 131620: loss nan, time 119.85ms
iter 131630: loss nan, time 119.74ms
iter 131640: loss nan, time 120.11ms
iter 131650: loss nan, time 120.74ms
iter 131660: loss nan, time 122.56ms
iter 131670: loss nan, time 121.60ms
iter 131680: loss nan, time 122.61ms
iter 131690: loss nan, time 122.56ms
tensor(0.7409)
iter 131700: loss nan, time 121.72ms
iter 131710: loss nan, time 119.63ms
iter 131720: loss nan, time 119.39ms
iter 131730: loss nan, time 118.02ms
iter 131740: loss nan, time 119.19ms
step 131750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 131750: loss nan, time 2924.89ms
iter 131760: loss nan, time 120.23ms
iter 131770: loss nan, time 119.86ms
iter 131780: loss nan, time 121.31ms
iter 131790: loss nan, time 121.81ms
tensor(0.7129)
iter 131800: loss nan, time 123.79ms
iter 131810: loss nan, time 120.44ms
iter 131820: loss nan, time 122.00ms
iter 131830: loss nan, time 122.23ms
iter 131840: loss nan, time 121.64ms
iter 131850: loss nan, time 122.34ms
iter 131860: loss nan, time 120.24ms
iter 131870: loss nan, time 122.06ms
iter 131880: loss nan, time 121.73ms
iter 131890: loss nan, time 121.80ms
tensor(0.6841)
iter 131900: loss nan, time 122.39ms
iter 131910: loss nan, time 119.57ms
iter 131920: loss nan, time 121.83ms
iter 131930: loss nan, time 121.97ms
iter 131940: loss nan, time 121.75ms
iter 131950: loss nan, time 122.18ms
iter 131960: loss nan, time 119.61ms
iter 131970: loss nan, time 122.07ms
iter 131980: loss nan, time 121.15ms
iter 131990: loss nan, time 121.75ms
tensor(0.6545)
step 132000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 132000: loss nan, time 2911.73ms
iter 132010: loss nan, time 121.85ms
iter 132020: loss nan, time 119.88ms
iter 132030: loss nan, time 121.97ms
iter 132040: loss nan, time 121.86ms
iter 132050: loss nan, time 121.72ms
iter 132060: loss nan, time 121.56ms
iter 132070: loss nan, time 120.12ms
iter 132080: loss nan, time 121.20ms
iter 132090: loss nan, time 122.00ms
tensor(0.6243)
iter 132100: loss nan, time 121.06ms
iter 132110: loss nan, time 121.14ms
iter 132120: loss nan, time 119.65ms
iter 132130: loss nan, time 122.19ms
iter 132140: loss nan, time 121.66ms
iter 132150: loss nan, time 121.86ms
iter 132160: loss nan, time 121.33ms
iter 132170: loss nan, time 120.02ms
iter 132180: loss nan, time 121.07ms
iter 132190: loss nan, time 121.82ms
tensor(0.5937)
iter 132200: loss nan, time 121.26ms
iter 132210: loss nan, time 121.45ms
iter 132220: loss nan, time 119.72ms
iter 132230: loss nan, time 120.93ms
iter 132240: loss nan, time 120.83ms
step 132250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 132250: loss nan, time 2914.74ms
iter 132260: loss nan, time 122.02ms
iter 132270: loss nan, time 121.55ms
iter 132280: loss nan, time 118.82ms
iter 132290: loss nan, time 120.76ms
tensor(0.5627)
iter 132300: loss nan, time 121.28ms
iter 132310: loss nan, time 121.99ms
iter 132320: loss nan, time 121.72ms
iter 132330: loss nan, time 120.27ms
iter 132340: loss nan, time 120.87ms
iter 132350: loss nan, time 121.09ms
iter 132360: loss nan, time 121.84ms
iter 132370: loss nan, time 121.42ms
iter 132380: loss nan, time 119.86ms
iter 132390: loss nan, time 122.11ms
tensor(0.5314)
iter 132400: loss nan, time 121.55ms
iter 132410: loss nan, time 122.08ms
iter 132420: loss nan, time 120.62ms
iter 132430: loss nan, time 119.57ms
iter 132440: loss nan, time 122.72ms
iter 132450: loss nan, time 121.62ms
iter 132460: loss nan, time 121.87ms
iter 132470: loss nan, time 121.36ms
iter 132480: loss nan, time 119.79ms
iter 132490: loss nan, time 121.03ms
tensor(0.5000)
step 132500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 132500: loss nan, time 2905.22ms
iter 132510: loss nan, time 121.86ms
iter 132520: loss nan, time 121.77ms
iter 132530: loss nan, time 121.76ms
iter 132540: loss nan, time 118.81ms
iter 132550: loss nan, time 121.74ms
iter 132560: loss nan, time 121.54ms
iter 132570: loss nan, time 124.64ms
iter 132580: loss nan, time 120.75ms
iter 132590: loss nan, time 118.61ms
tensor(0.4686)
iter 132600: loss nan, time 121.29ms
iter 132610: loss nan, time 120.82ms
iter 132620: loss nan, time 121.76ms
iter 132630: loss nan, time 122.15ms
iter 132640: loss nan, time 119.65ms
iter 132650: loss nan, time 122.06ms
iter 132660: loss nan, time 121.90ms
iter 132670: loss nan, time 121.77ms
iter 132680: loss nan, time 121.82ms
iter 132690: loss nan, time 119.57ms
tensor(0.4373)
iter 132700: loss nan, time 121.48ms
iter 132710: loss nan, time 122.01ms
iter 132720: loss nan, time 121.79ms
iter 132730: loss nan, time 120.96ms
iter 132740: loss nan, time 118.78ms
step 132750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 132750: loss nan, time 2911.02ms
iter 132760: loss nan, time 120.84ms
iter 132770: loss nan, time 121.99ms
iter 132780: loss nan, time 120.71ms
iter 132790: loss nan, time 121.94ms
tensor(0.4063)
iter 132800: loss nan, time 119.88ms
iter 132810: loss nan, time 120.91ms
iter 132820: loss nan, time 121.84ms
iter 132830: loss nan, time 121.78ms
iter 132840: loss nan, time 121.75ms
iter 132850: loss nan, time 119.77ms
iter 132860: loss nan, time 120.83ms
iter 132870: loss nan, time 121.80ms
iter 132880: loss nan, time 121.87ms
iter 132890: loss nan, time 122.02ms
tensor(0.3757)
iter 132900: loss nan, time 119.88ms
iter 132910: loss nan, time 121.01ms
iter 132920: loss nan, time 121.90ms
iter 132930: loss nan, time 121.96ms
iter 132940: loss nan, time 122.34ms
iter 132950: loss nan, time 119.66ms
iter 132960: loss nan, time 120.06ms
iter 132970: loss nan, time 121.85ms
iter 132980: loss nan, time 122.43ms
iter 132990: loss nan, time 121.64ms
tensor(0.3455)
step 133000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 133000: loss nan, time 2919.34ms
iter 133010: loss nan, time 118.83ms
iter 133020: loss nan, time 120.18ms
iter 133030: loss nan, time 120.66ms
iter 133040: loss nan, time 120.81ms
iter 133050: loss nan, time 120.55ms
iter 133060: loss nan, time 118.65ms
iter 133070: loss nan, time 120.38ms
iter 133080: loss nan, time 120.74ms
iter 133090: loss nan, time 120.69ms
tensor(0.3159)
iter 133100: loss nan, time 120.95ms
iter 133110: loss nan, time 118.56ms
iter 133120: loss nan, time 120.50ms
iter 133130: loss nan, time 120.50ms
iter 133140: loss nan, time 120.80ms
iter 133150: loss nan, time 119.00ms
iter 133160: loss nan, time 118.49ms
iter 133170: loss nan, time 118.09ms
iter 133180: loss nan, time 118.78ms
iter 133190: loss nan, time 119.10ms
tensor(0.2871)
iter 133200: loss nan, time 118.91ms
iter 133210: loss nan, time 119.20ms
iter 133220: loss nan, time 117.78ms
iter 133230: loss nan, time 119.71ms
iter 133240: loss nan, time 119.80ms
step 133250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 133250: loss nan, time 2906.94ms
iter 133260: loss nan, time 118.85ms
iter 133270: loss nan, time 118.77ms
iter 133280: loss nan, time 117.47ms
iter 133290: loss nan, time 118.82ms
tensor(0.2591)
iter 133300: loss nan, time 119.27ms
iter 133310: loss nan, time 119.25ms
iter 133320: loss nan, time 120.29ms
iter 133330: loss nan, time 117.97ms
iter 133340: loss nan, time 120.67ms
iter 133350: loss nan, time 119.86ms
iter 133360: loss nan, time 119.63ms
iter 133370: loss nan, time 119.85ms
iter 133380: loss nan, time 118.03ms
iter 133390: loss nan, time 119.60ms
tensor(0.2321)
iter 133400: loss nan, time 119.91ms
iter 133410: loss nan, time 119.84ms
iter 133420: loss nan, time 120.12ms
iter 133430: loss nan, time 118.10ms
iter 133440: loss nan, time 119.90ms
iter 133450: loss nan, time 119.68ms
iter 133460: loss nan, time 119.70ms
iter 133470: loss nan, time 119.92ms
iter 133480: loss nan, time 118.48ms
iter 133490: loss nan, time 119.69ms
tensor(0.2061)
step 133500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 133500: loss nan, time 2904.38ms
iter 133510: loss nan, time 118.72ms
iter 133520: loss nan, time 118.72ms
iter 133530: loss nan, time 118.81ms
iter 133540: loss nan, time 118.09ms
iter 133550: loss nan, time 118.60ms
iter 133560: loss nan, time 118.41ms
iter 133570: loss nan, time 119.02ms
iter 133580: loss nan, time 118.65ms
iter 133590: loss nan, time 117.43ms
tensor(0.1813)
iter 133600: loss nan, time 118.92ms
iter 133610: loss nan, time 118.90ms
iter 133620: loss nan, time 118.72ms
iter 133630: loss nan, time 118.93ms
iter 133640: loss nan, time 117.33ms
iter 133650: loss nan, time 119.53ms
iter 133660: loss nan, time 118.74ms
iter 133670: loss nan, time 118.84ms
iter 133680: loss nan, time 118.73ms
iter 133690: loss nan, time 117.92ms
tensor(0.1577)
iter 133700: loss nan, time 119.08ms
iter 133710: loss nan, time 119.13ms
iter 133720: loss nan, time 118.96ms
iter 133730: loss nan, time 119.14ms
iter 133740: loss nan, time 118.26ms
step 133750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 133750: loss nan, time 2905.88ms
iter 133760: loss nan, time 119.16ms
iter 133770: loss nan, time 118.76ms
iter 133780: loss nan, time 118.63ms
iter 133790: loss nan, time 118.59ms
tensor(0.1355)
iter 133800: loss nan, time 118.47ms
iter 133810: loss nan, time 118.51ms
iter 133820: loss nan, time 119.23ms
iter 133830: loss nan, time 118.62ms
iter 133840: loss nan, time 118.55ms
iter 133850: loss nan, time 118.20ms
iter 133860: loss nan, time 118.55ms
iter 133870: loss nan, time 118.82ms
iter 133880: loss nan, time 119.01ms
iter 133890: loss nan, time 118.97ms
tensor(0.1147)
iter 133900: loss nan, time 117.83ms
iter 133910: loss nan, time 118.90ms
iter 133920: loss nan, time 119.03ms
iter 133930: loss nan, time 118.02ms
iter 133940: loss nan, time 118.98ms
iter 133950: loss nan, time 117.06ms
iter 133960: loss nan, time 118.76ms
iter 133970: loss nan, time 118.93ms
iter 133980: loss nan, time 118.88ms
iter 133990: loss nan, time 118.86ms
tensor(0.0955)
step 134000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 134000: loss nan, time 2902.17ms
iter 134010: loss nan, time 120.85ms
iter 134020: loss nan, time 120.74ms
iter 134030: loss nan, time 120.54ms
iter 134040: loss nan, time 120.19ms
iter 134050: loss nan, time 118.52ms
iter 134060: loss nan, time 120.09ms
iter 134070: loss nan, time 119.73ms
iter 134080: loss nan, time 120.67ms
iter 134090: loss nan, time 120.57ms
tensor(0.0778)
iter 134100: loss nan, time 119.01ms
iter 134110: loss nan, time 120.98ms
iter 134120: loss nan, time 120.84ms
iter 134130: loss nan, time 120.66ms
iter 134140: loss nan, time 120.87ms
iter 134150: loss nan, time 118.53ms
iter 134160: loss nan, time 118.87ms
iter 134170: loss nan, time 121.25ms
iter 134180: loss nan, time 120.41ms
iter 134190: loss nan, time 120.61ms
tensor(0.0618)
iter 134200: loss nan, time 118.74ms
iter 134210: loss nan, time 120.62ms
iter 134220: loss nan, time 120.67ms
iter 134230: loss nan, time 120.60ms
iter 134240: loss nan, time 120.56ms
step 134250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 134250: loss nan, time 2905.37ms
iter 134260: loss nan, time 119.80ms
iter 134270: loss nan, time 120.49ms
iter 134280: loss nan, time 121.03ms
iter 134290: loss nan, time 120.59ms
tensor(0.0476)
iter 134300: loss nan, time 120.98ms
iter 134310: loss nan, time 118.75ms
iter 134320: loss nan, time 120.79ms
iter 134330: loss nan, time 120.98ms
iter 134340: loss nan, time 120.56ms
iter 134350: loss nan, time 120.67ms
iter 134360: loss nan, time 118.45ms
iter 134370: loss nan, time 120.62ms
iter 134380: loss nan, time 120.59ms
iter 134390: loss nan, time 120.00ms
tensor(0.0351)
iter 134400: loss nan, time 120.67ms
iter 134410: loss nan, time 118.48ms
iter 134420: loss nan, time 120.70ms
iter 134430: loss nan, time 120.78ms
iter 134440: loss nan, time 120.49ms
iter 134450: loss nan, time 120.10ms
iter 134460: loss nan, time 118.54ms
iter 134470: loss nan, time 120.63ms
iter 134480: loss nan, time 120.55ms
iter 134490: loss nan, time 121.03ms
tensor(0.0245)
step 134500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 134500: loss nan, time 2908.78ms
iter 134510: loss nan, time 120.68ms
iter 134520: loss nan, time 118.56ms
iter 134530: loss nan, time 120.86ms
iter 134540: loss nan, time 121.90ms
iter 134550: loss nan, time 121.14ms
iter 134560: loss nan, time 120.84ms
iter 134570: loss nan, time 118.47ms
iter 134580: loss nan, time 120.01ms
iter 134590: loss nan, time 120.66ms
tensor(0.0157)
iter 134600: loss nan, time 121.25ms
iter 134610: loss nan, time 121.26ms
iter 134620: loss nan, time 118.64ms
iter 134630: loss nan, time 120.59ms
iter 134640: loss nan, time 120.74ms
iter 134650: loss nan, time 120.70ms
iter 134660: loss nan, time 120.84ms
iter 134670: loss nan, time 118.51ms
iter 134680: loss nan, time 120.33ms
iter 134690: loss nan, time 119.61ms
tensor(0.0089)
iter 134700: loss nan, time 121.12ms
iter 134710: loss nan, time 121.11ms
iter 134720: loss nan, time 118.81ms
iter 134730: loss nan, time 121.11ms
iter 134740: loss nan, time 121.02ms
step 134750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 134750: loss nan, time 2909.29ms
iter 134760: loss nan, time 120.90ms
iter 134770: loss nan, time 121.02ms
iter 134780: loss nan, time 118.72ms
iter 134790: loss nan, time 118.48ms
tensor(0.0039)
iter 134800: loss nan, time 119.57ms
iter 134810: loss nan, time 119.74ms
iter 134820: loss nan, time 119.89ms
iter 134830: loss nan, time 120.00ms
iter 134840: loss nan, time 118.85ms
iter 134850: loss nan, time 120.35ms
iter 134860: loss nan, time 119.41ms
iter 134870: loss nan, time 120.01ms
iter 134880: loss nan, time 119.83ms
iter 134890: loss nan, time 118.84ms
tensor(0.0010)
iter 134900: loss nan, time 120.53ms
iter 134910: loss nan, time 120.72ms
iter 134920: loss nan, time 120.92ms
iter 134930: loss nan, time 121.11ms
iter 134940: loss nan, time 120.49ms
iter 134950: loss nan, time 122.22ms
iter 134960: loss nan, time 121.66ms
iter 134970: loss nan, time 122.27ms
iter 134980: loss nan, time 120.95ms
iter 134990: loss nan, time 120.87ms
tensor(0.0010)
step 135000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 135000: loss nan, time 2915.08ms
iter 135010: loss nan, time 120.83ms
iter 135020: loss nan, time 121.21ms
iter 135030: loss nan, time 120.93ms
iter 135040: loss nan, time 120.87ms
iter 135050: loss nan, time 120.65ms
iter 135060: loss nan, time 120.79ms
iter 135070: loss nan, time 121.33ms
iter 135080: loss nan, time 119.97ms
iter 135090: loss nan, time 120.79ms
tensor(0.0010)
iter 135100: loss nan, time 120.75ms
iter 135110: loss nan, time 118.83ms
iter 135120: loss nan, time 118.71ms
iter 135130: loss nan, time 118.75ms
iter 135140: loss nan, time 118.78ms
iter 135150: loss nan, time 118.43ms
iter 135160: loss nan, time 119.02ms
iter 135170: loss nan, time 119.44ms
iter 135180: loss nan, time 119.73ms
iter 135190: loss nan, time 119.83ms
tensor(0.0039)
iter 135200: loss nan, time 119.39ms
iter 135210: loss nan, time 119.91ms
iter 135220: loss nan, time 119.86ms
iter 135230: loss nan, time 119.94ms
iter 135240: loss nan, time 119.69ms
step 135250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 135250: loss nan, time 2900.65ms
iter 135260: loss nan, time 119.20ms
iter 135270: loss nan, time 120.36ms
iter 135280: loss nan, time 120.16ms
iter 135290: loss nan, time 120.13ms
tensor(0.0089)
iter 135300: loss nan, time 120.78ms
iter 135310: loss nan, time 119.77ms
iter 135320: loss nan, time 120.22ms
iter 135330: loss nan, time 120.06ms
iter 135340: loss nan, time 119.92ms
iter 135350: loss nan, time 120.82ms
iter 135360: loss nan, time 120.75ms
iter 135370: loss nan, time 121.54ms
iter 135380: loss nan, time 122.17ms
iter 135390: loss nan, time 119.95ms
tensor(0.0157)
iter 135400: loss nan, time 122.49ms
iter 135410: loss nan, time 122.53ms
iter 135420: loss nan, time 121.27ms
iter 135430: loss nan, time 120.68ms
iter 135440: loss nan, time 118.58ms
iter 135450: loss nan, time 120.79ms
iter 135460: loss nan, time 120.86ms
iter 135470: loss nan, time 121.16ms
iter 135480: loss nan, time 120.84ms
iter 135490: loss nan, time 119.05ms
tensor(0.0245)
step 135500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 135500: loss nan, time 2912.22ms
iter 135510: loss nan, time 120.81ms
iter 135520: loss nan, time 120.69ms
iter 135530: loss nan, time 121.03ms
iter 135540: loss nan, time 118.97ms
iter 135550: loss nan, time 119.05ms
iter 135560: loss nan, time 118.84ms
iter 135570: loss nan, time 119.08ms
iter 135580: loss nan, time 120.37ms
iter 135590: loss nan, time 119.97ms
tensor(0.0351)
iter 135600: loss nan, time 120.60ms
iter 135610: loss nan, time 119.22ms
iter 135620: loss nan, time 120.24ms
iter 135630: loss nan, time 119.82ms
iter 135640: loss nan, time 120.37ms
iter 135650: loss nan, time 121.16ms
iter 135660: loss nan, time 121.01ms
iter 135670: loss nan, time 122.24ms
iter 135680: loss nan, time 121.17ms
iter 135690: loss nan, time 120.94ms
tensor(0.0476)
iter 135700: loss nan, time 120.16ms
iter 135710: loss nan, time 121.65ms
iter 135720: loss nan, time 120.94ms
iter 135730: loss nan, time 120.96ms
iter 135740: loss nan, time 120.83ms
step 135750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 135750: loss nan, time 2905.07ms
iter 135760: loss nan, time 120.79ms
iter 135770: loss nan, time 120.86ms
iter 135780: loss nan, time 120.82ms
iter 135790: loss nan, time 120.62ms
tensor(0.0618)
iter 135800: loss nan, time 121.21ms
iter 135810: loss nan, time 118.91ms
iter 135820: loss nan, time 119.07ms
iter 135830: loss nan, time 118.89ms
iter 135840: loss nan, time 119.71ms
iter 135850: loss nan, time 118.56ms
iter 135860: loss nan, time 118.08ms
iter 135870: loss nan, time 118.51ms
iter 135880: loss nan, time 118.42ms
iter 135890: loss nan, time 118.59ms
tensor(0.0778)
iter 135900: loss nan, time 118.85ms
iter 135910: loss nan, time 118.82ms
iter 135920: loss nan, time 118.56ms
iter 135930: loss nan, time 119.53ms
iter 135940: loss nan, time 120.06ms
iter 135950: loss nan, time 119.92ms
iter 135960: loss nan, time 119.74ms
iter 135970: loss nan, time 120.30ms
iter 135980: loss nan, time 120.39ms
iter 135990: loss nan, time 120.55ms
tensor(0.0955)
step 136000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 136000: loss nan, time 2901.60ms
iter 136010: loss nan, time 118.65ms
iter 136020: loss nan, time 118.83ms
iter 136030: loss nan, time 118.66ms
iter 136040: loss nan, time 118.82ms
iter 136050: loss nan, time 119.05ms
iter 136060: loss nan, time 118.58ms
iter 136070: loss nan, time 119.12ms
iter 136080: loss nan, time 119.07ms
iter 136090: loss nan, time 119.41ms
tensor(0.1147)
iter 136100: loss nan, time 120.19ms
iter 136110: loss nan, time 119.99ms
iter 136120: loss nan, time 121.06ms
iter 136130: loss nan, time 121.29ms
iter 136140: loss nan, time 122.43ms
iter 136150: loss nan, time 123.16ms
iter 136160: loss nan, time 120.37ms
iter 136170: loss nan, time 121.71ms
iter 136180: loss nan, time 121.40ms
iter 136190: loss nan, time 121.45ms
tensor(0.1355)
iter 136200: loss nan, time 120.09ms
iter 136210: loss nan, time 119.07ms
iter 136220: loss nan, time 118.49ms
iter 136230: loss nan, time 118.36ms
iter 136240: loss nan, time 116.35ms
step 136250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 136250: loss nan, time 2893.45ms
iter 136260: loss nan, time 120.46ms
iter 136270: loss nan, time 117.66ms
iter 136280: loss nan, time 119.59ms
iter 136290: loss nan, time 120.26ms
tensor(0.1577)
iter 136300: loss nan, time 120.68ms
iter 136310: loss nan, time 121.12ms
iter 136320: loss nan, time 119.45ms
iter 136330: loss nan, time 121.70ms
iter 136340: loss nan, time 121.76ms
iter 136350: loss nan, time 121.80ms
iter 136360: loss nan, time 121.60ms
iter 136370: loss nan, time 119.53ms
iter 136380: loss nan, time 121.56ms
iter 136390: loss nan, time 121.74ms
tensor(0.1813)
iter 136400: loss nan, time 121.76ms
iter 136410: loss nan, time 121.55ms
iter 136420: loss nan, time 119.40ms
iter 136430: loss nan, time 121.67ms
iter 136440: loss nan, time 121.55ms
iter 136450: loss nan, time 121.77ms
iter 136460: loss nan, time 121.77ms
iter 136470: loss nan, time 119.87ms
iter 136480: loss nan, time 121.59ms
iter 136490: loss nan, time 121.54ms
tensor(0.2061)
step 136500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 136500: loss nan, time 2914.53ms
iter 136510: loss nan, time 121.85ms
iter 136520: loss nan, time 121.55ms
iter 136530: loss nan, time 119.70ms
iter 136540: loss nan, time 121.54ms
iter 136550: loss nan, time 121.56ms
iter 136560: loss nan, time 121.61ms
iter 136570: loss nan, time 121.66ms
iter 136580: loss nan, time 119.89ms
iter 136590: loss nan, time 120.37ms
tensor(0.2321)
iter 136600: loss nan, time 120.65ms
iter 136610: loss nan, time 120.72ms
iter 136620: loss nan, time 120.83ms
iter 136630: loss nan, time 118.96ms
iter 136640: loss nan, time 121.09ms
iter 136650: loss nan, time 121.90ms
iter 136660: loss nan, time 121.94ms
iter 136670: loss nan, time 119.98ms
iter 136680: loss nan, time 119.73ms
iter 136690: loss nan, time 119.33ms
tensor(0.2591)
iter 136700: loss nan, time 119.91ms
iter 136710: loss nan, time 119.33ms
iter 136720: loss nan, time 117.80ms
iter 136730: loss nan, time 119.32ms
iter 136740: loss nan, time 119.42ms
step 136750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 136750: loss nan, time 2901.43ms
iter 136760: loss nan, time 118.54ms
iter 136770: loss nan, time 118.86ms
iter 136780: loss nan, time 118.41ms
iter 136790: loss nan, time 118.45ms
tensor(0.2871)
iter 136800: loss nan, time 118.19ms
iter 136810: loss nan, time 118.68ms
iter 136820: loss nan, time 118.52ms
iter 136830: loss nan, time 118.48ms
iter 136840: loss nan, time 118.51ms
iter 136850: loss nan, time 118.48ms
iter 136860: loss nan, time 118.46ms
iter 136870: loss nan, time 118.58ms
iter 136880: loss nan, time 118.35ms
iter 136890: loss nan, time 118.50ms
tensor(0.3159)
iter 136900: loss nan, time 118.60ms
iter 136910: loss nan, time 118.40ms
iter 136920: loss nan, time 118.41ms
iter 136930: loss nan, time 118.39ms
iter 136940: loss nan, time 118.39ms
iter 136950: loss nan, time 118.83ms
iter 136960: loss nan, time 118.02ms
iter 136970: loss nan, time 118.46ms
iter 136980: loss nan, time 118.52ms
iter 136990: loss nan, time 118.52ms
tensor(0.3455)
step 137000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 137000: loss nan, time 2909.99ms
iter 137010: loss nan, time 118.58ms
iter 137020: loss nan, time 118.58ms
iter 137030: loss nan, time 118.51ms
iter 137040: loss nan, time 118.39ms
iter 137050: loss nan, time 118.51ms
iter 137060: loss nan, time 118.23ms
iter 137070: loss nan, time 117.78ms
iter 137080: loss nan, time 120.49ms
iter 137090: loss nan, time 118.80ms
tensor(0.3757)
iter 137100: loss nan, time 118.91ms
iter 137110: loss nan, time 120.32ms
iter 137120: loss nan, time 118.72ms
iter 137130: loss nan, time 118.32ms
iter 137140: loss nan, time 118.72ms
iter 137150: loss nan, time 118.56ms
iter 137160: loss nan, time 120.60ms
iter 137170: loss nan, time 118.82ms
iter 137180: loss nan, time 118.45ms
iter 137190: loss nan, time 118.48ms
tensor(0.4063)
iter 137200: loss nan, time 118.69ms
iter 137210: loss nan, time 118.68ms
iter 137220: loss nan, time 118.80ms
iter 137230: loss nan, time 119.93ms
iter 137240: loss nan, time 118.84ms
step 137250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 137250: loss nan, time 2917.42ms
iter 137260: loss nan, time 118.44ms
iter 137270: loss nan, time 118.78ms
iter 137280: loss nan, time 118.82ms
iter 137290: loss nan, time 118.65ms
tensor(0.4373)
iter 137300: loss nan, time 119.08ms
iter 137310: loss nan, time 119.13ms
iter 137320: loss nan, time 118.29ms
iter 137330: loss nan, time 118.21ms
iter 137340: loss nan, time 118.56ms
iter 137350: loss nan, time 119.88ms
iter 137360: loss nan, time 119.07ms
iter 137370: loss nan, time 119.17ms
iter 137380: loss nan, time 118.90ms
iter 137390: loss nan, time 120.26ms
tensor(0.4686)
iter 137400: loss nan, time 120.40ms
iter 137410: loss nan, time 121.01ms
iter 137420: loss nan, time 119.60ms
iter 137430: loss nan, time 120.02ms
iter 137440: loss nan, time 120.86ms
iter 137450: loss nan, time 121.19ms
iter 137460: loss nan, time 120.38ms
iter 137470: loss nan, time 122.36ms
iter 137480: loss nan, time 121.11ms
iter 137490: loss nan, time 121.33ms
tensor(0.5000)
step 137500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 137500: loss nan, time 2907.06ms
iter 137510: loss nan, time 121.04ms
iter 137520: loss nan, time 119.83ms
iter 137530: loss nan, time 120.62ms
iter 137540: loss nan, time 122.15ms
iter 137550: loss nan, time 122.14ms
iter 137560: loss nan, time 121.60ms
iter 137570: loss nan, time 119.90ms
iter 137580: loss nan, time 122.69ms
iter 137590: loss nan, time 122.43ms
tensor(0.5314)
iter 137600: loss nan, time 122.79ms
iter 137610: loss nan, time 122.16ms
iter 137620: loss nan, time 119.94ms
iter 137630: loss nan, time 121.15ms
iter 137640: loss nan, time 121.25ms
iter 137650: loss nan, time 120.73ms
iter 137660: loss nan, time 120.86ms
iter 137670: loss nan, time 118.66ms
iter 137680: loss nan, time 120.89ms
iter 137690: loss nan, time 120.58ms
tensor(0.5627)
iter 137700: loss nan, time 121.08ms
iter 137710: loss nan, time 120.86ms
iter 137720: loss nan, time 118.60ms
iter 137730: loss nan, time 119.51ms
iter 137740: loss nan, time 119.13ms
step 137750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 137750: loss nan, time 2900.27ms
iter 137760: loss nan, time 120.97ms
iter 137770: loss nan, time 120.73ms
iter 137780: loss nan, time 118.64ms
iter 137790: loss nan, time 121.41ms
tensor(0.5937)
iter 137800: loss nan, time 121.24ms
iter 137810: loss nan, time 121.72ms
iter 137820: loss nan, time 120.86ms
iter 137830: loss nan, time 118.77ms
iter 137840: loss nan, time 120.02ms
iter 137850: loss nan, time 121.79ms
iter 137860: loss nan, time 121.14ms
iter 137870: loss nan, time 121.03ms
iter 137880: loss nan, time 115.98ms
iter 137890: loss nan, time 115.91ms
tensor(0.6243)
iter 137900: loss nan, time 116.84ms
iter 137910: loss nan, time 118.38ms
iter 137920: loss nan, time 115.63ms
iter 137930: loss nan, time 116.15ms
iter 137940: loss nan, time 115.92ms
iter 137950: loss nan, time 114.75ms
iter 137960: loss nan, time 116.92ms
iter 137970: loss nan, time 115.48ms
iter 137980: loss nan, time 116.40ms
iter 137990: loss nan, time 116.14ms
tensor(0.6545)
step 138000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 138000: loss nan, time 2899.07ms
iter 138010: loss nan, time 116.77ms
iter 138020: loss nan, time 114.64ms
iter 138030: loss nan, time 117.99ms
iter 138040: loss nan, time 116.12ms
iter 138050: loss nan, time 115.01ms
iter 138060: loss nan, time 118.12ms
iter 138070: loss nan, time 116.25ms
iter 138080: loss nan, time 114.67ms
iter 138090: loss nan, time 117.22ms
tensor(0.6841)
iter 138100: loss nan, time 115.39ms
iter 138110: loss nan, time 116.93ms
iter 138120: loss nan, time 117.84ms
iter 138130: loss nan, time 114.69ms
iter 138140: loss nan, time 115.95ms
iter 138150: loss nan, time 116.29ms
iter 138160: loss nan, time 115.47ms
iter 138170: loss nan, time 118.14ms
iter 138180: loss nan, time 115.93ms
iter 138190: loss nan, time 115.80ms
tensor(0.7129)
iter 138200: loss nan, time 116.32ms
iter 138210: loss nan, time 115.38ms
iter 138220: loss nan, time 116.85ms
iter 138230: loss nan, time 118.22ms
iter 138240: loss nan, time 114.66ms
step 138250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 138250: loss nan, time 2907.48ms
iter 138260: loss nan, time 122.66ms
iter 138270: loss nan, time 119.83ms
iter 138280: loss nan, time 121.21ms
iter 138290: loss nan, time 121.00ms
tensor(0.7409)
iter 138300: loss nan, time 121.15ms
iter 138310: loss nan, time 121.20ms
iter 138320: loss nan, time 118.69ms
iter 138330: loss nan, time 120.93ms
iter 138340: loss nan, time 119.06ms
iter 138350: loss nan, time 118.87ms
iter 138360: loss nan, time 118.80ms
iter 138370: loss nan, time 118.74ms
iter 138380: loss nan, time 118.73ms
iter 138390: loss nan, time 118.62ms
tensor(0.7679)
iter 138400: loss nan, time 119.32ms
iter 138410: loss nan, time 118.37ms
iter 138420: loss nan, time 118.67ms
iter 138430: loss nan, time 118.62ms
iter 138440: loss nan, time 118.73ms
iter 138450: loss nan, time 118.64ms
iter 138460: loss nan, time 119.36ms
iter 138470: loss nan, time 118.81ms
iter 138480: loss nan, time 118.65ms
iter 138490: loss nan, time 118.71ms
tensor(0.7939)
step 138500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 138500: loss nan, time 2899.97ms
iter 138510: loss nan, time 118.60ms
iter 138520: loss nan, time 117.96ms
iter 138530: loss nan, time 118.73ms
iter 138540: loss nan, time 118.70ms
iter 138550: loss nan, time 118.68ms
iter 138560: loss nan, time 118.73ms
iter 138570: loss nan, time 118.19ms
iter 138580: loss nan, time 119.42ms
iter 138590: loss nan, time 118.78ms
tensor(0.8187)
iter 138600: loss nan, time 118.90ms
iter 138610: loss nan, time 119.06ms
iter 138620: loss nan, time 118.90ms
iter 138630: loss nan, time 119.25ms
iter 138640: loss nan, time 118.79ms
iter 138650: loss nan, time 120.10ms
iter 138660: loss nan, time 119.97ms
iter 138670: loss nan, time 120.78ms
iter 138680: loss nan, time 120.27ms
iter 138690: loss nan, time 119.29ms
tensor(0.8423)
iter 138700: loss nan, time 120.79ms
iter 138710: loss nan, time 120.93ms
iter 138720: loss nan, time 121.33ms
iter 138730: loss nan, time 121.66ms
iter 138740: loss nan, time 120.96ms
step 138750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 138750: loss nan, time 2903.64ms
iter 138760: loss nan, time 120.69ms
iter 138770: loss nan, time 121.25ms
iter 138780: loss nan, time 121.91ms
iter 138790: loss nan, time 122.21ms
tensor(0.8645)
iter 138800: loss nan, time 121.38ms
iter 138810: loss nan, time 122.35ms
iter 138820: loss nan, time 122.24ms
iter 138830: loss nan, time 122.26ms
iter 138840: loss nan, time 122.26ms
iter 138850: loss nan, time 121.03ms
iter 138860: loss nan, time 121.13ms
iter 138870: loss nan, time 121.02ms
iter 138880: loss nan, time 121.51ms
iter 138890: loss nan, time 121.03ms
tensor(0.8853)
iter 138900: loss nan, time 121.34ms
iter 138910: loss nan, time 118.10ms
iter 138920: loss nan, time 118.48ms
iter 138930: loss nan, time 118.80ms
iter 138940: loss nan, time 118.24ms
iter 138950: loss nan, time 119.10ms
iter 138960: loss nan, time 119.80ms
iter 138970: loss nan, time 120.22ms
iter 138980: loss nan, time 120.04ms
iter 138990: loss nan, time 119.92ms
tensor(0.9045)
step 139000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 139000: loss nan, time 2908.76ms
iter 139010: loss nan, time 120.01ms
iter 139020: loss nan, time 120.09ms
iter 139030: loss nan, time 119.95ms
iter 139040: loss nan, time 119.98ms
iter 139050: loss nan, time 119.87ms
iter 139060: loss nan, time 120.00ms
iter 139070: loss nan, time 121.60ms
iter 139080: loss nan, time 120.30ms
iter 139090: loss nan, time 119.97ms
tensor(0.9222)
iter 139100: loss nan, time 121.01ms
iter 139110: loss nan, time 121.00ms
iter 139120: loss nan, time 121.59ms
iter 139130: loss nan, time 122.28ms
iter 139140: loss nan, time 119.77ms
iter 139150: loss nan, time 122.21ms
iter 139160: loss nan, time 121.24ms
iter 139170: loss nan, time 121.05ms
iter 139180: loss nan, time 120.77ms
iter 139190: loss nan, time 118.67ms
tensor(0.9382)
iter 139200: loss nan, time 120.66ms
iter 139210: loss nan, time 120.76ms
iter 139220: loss nan, time 120.85ms
iter 139230: loss nan, time 120.69ms
iter 139240: loss nan, time 118.84ms
step 139250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 139250: loss nan, time 2898.25ms
iter 139260: loss nan, time 120.59ms
iter 139270: loss nan, time 120.77ms
iter 139280: loss nan, time 120.71ms
iter 139290: loss nan, time 121.04ms
tensor(0.9524)
iter 139300: loss nan, time 118.90ms
iter 139310: loss nan, time 120.89ms
iter 139320: loss nan, time 120.94ms
iter 139330: loss nan, time 120.87ms
iter 139340: loss nan, time 120.90ms
iter 139350: loss nan, time 118.74ms
iter 139360: loss nan, time 121.08ms
iter 139370: loss nan, time 119.00ms
iter 139380: loss nan, time 118.69ms
iter 139390: loss nan, time 119.05ms
tensor(0.9649)
iter 139400: loss nan, time 120.14ms
iter 139410: loss nan, time 119.19ms
iter 139420: loss nan, time 120.12ms
iter 139430: loss nan, time 120.08ms
iter 139440: loss nan, time 120.83ms
iter 139450: loss nan, time 121.05ms
iter 139460: loss nan, time 120.90ms
iter 139470: loss nan, time 122.74ms
iter 139480: loss nan, time 122.95ms
iter 139490: loss nan, time 121.84ms
tensor(0.9755)
step 139500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 139500: loss nan, time 2912.22ms
iter 139510: loss nan, time 121.75ms
iter 139520: loss nan, time 121.53ms
iter 139530: loss nan, time 119.48ms
iter 139540: loss nan, time 119.74ms
iter 139550: loss nan, time 120.43ms
iter 139560: loss nan, time 121.04ms
iter 139570: loss nan, time 120.48ms
iter 139580: loss nan, time 121.42ms
iter 139590: loss nan, time 122.78ms
tensor(0.9843)
iter 139600: loss nan, time 120.08ms
iter 139610: loss nan, time 121.45ms
iter 139620: loss nan, time 121.88ms
iter 139630: loss nan, time 119.34ms
iter 139640: loss nan, time 120.14ms
iter 139650: loss nan, time 120.69ms
iter 139660: loss nan, time 119.27ms
iter 139670: loss nan, time 120.49ms
iter 139680: loss nan, time 121.63ms
iter 139690: loss nan, time 122.71ms
tensor(0.9911)
iter 139700: loss nan, time 121.79ms
iter 139710: loss nan, time 121.30ms
iter 139720: loss nan, time 121.39ms
iter 139730: loss nan, time 121.31ms
iter 139740: loss nan, time 119.57ms
step 139750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 139750: loss nan, time 2906.58ms
iter 139760: loss nan, time 119.50ms
iter 139770: loss nan, time 120.01ms
iter 139780: loss nan, time 120.59ms
iter 139790: loss nan, time 120.45ms
tensor(0.9961)
iter 139800: loss nan, time 120.60ms
iter 139810: loss nan, time 122.52ms
iter 139820: loss nan, time 122.50ms
iter 139830: loss nan, time 121.58ms
iter 139840: loss nan, time 121.27ms
iter 139850: loss nan, time 119.17ms
iter 139860: loss nan, time 121.54ms
iter 139870: loss nan, time 119.44ms
iter 139880: loss nan, time 120.04ms
iter 139890: loss nan, time 120.58ms
tensor(0.9990)
iter 139900: loss nan, time 120.83ms
iter 139910: loss nan, time 119.34ms
iter 139920: loss nan, time 121.18ms
iter 139930: loss nan, time 122.25ms
iter 139940: loss nan, time 122.71ms
iter 139950: loss nan, time 121.45ms
iter 139960: loss nan, time 121.48ms
iter 139970: loss nan, time 121.56ms
iter 139980: loss nan, time 121.52ms
iter 139990: loss nan, time 119.70ms
tensor(1.)
step 140000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 140000: loss nan, time 2916.29ms
iter 140010: loss nan, time 120.61ms
iter 140020: loss nan, time 120.22ms
iter 140030: loss nan, time 120.62ms
iter 140040: loss nan, time 121.06ms
iter 140050: loss nan, time 120.28ms
iter 140060: loss nan, time 122.56ms
iter 140070: loss nan, time 121.61ms
iter 140080: loss nan, time 121.22ms
iter 140090: loss nan, time 121.30ms
tensor(0.9990)
iter 140100: loss nan, time 119.48ms
iter 140110: loss nan, time 121.23ms
iter 140120: loss nan, time 119.04ms
iter 140130: loss nan, time 119.99ms
iter 140140: loss nan, time 120.50ms
iter 140150: loss nan, time 120.54ms
iter 140160: loss nan, time 119.26ms
iter 140170: loss nan, time 121.53ms
iter 140180: loss nan, time 122.26ms
iter 140190: loss nan, time 123.00ms
tensor(0.9961)
iter 140200: loss nan, time 122.61ms
iter 140210: loss nan, time 121.54ms
iter 140220: loss nan, time 121.31ms
iter 140230: loss nan, time 119.26ms
iter 140240: loss nan, time 119.89ms
step 140250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 140250: loss nan, time 2908.02ms
iter 140260: loss nan, time 120.28ms
iter 140270: loss nan, time 120.36ms
iter 140280: loss nan, time 120.85ms
iter 140290: loss nan, time 120.67ms
tensor(0.9911)
iter 140300: loss nan, time 120.65ms
iter 140310: loss nan, time 122.62ms
iter 140320: loss nan, time 122.79ms
iter 140330: loss nan, time 121.41ms
iter 140340: loss nan, time 121.21ms
iter 140350: loss nan, time 120.17ms
iter 140360: loss nan, time 119.21ms
iter 140370: loss nan, time 119.73ms
iter 140380: loss nan, time 120.50ms
iter 140390: loss nan, time 120.54ms
tensor(0.9843)
iter 140400: loss nan, time 120.99ms
iter 140410: loss nan, time 119.04ms
iter 140420: loss nan, time 121.88ms
iter 140430: loss nan, time 122.75ms
iter 140440: loss nan, time 121.71ms
iter 140450: loss nan, time 121.52ms
iter 140460: loss nan, time 121.57ms
iter 140470: loss nan, time 121.26ms
iter 140480: loss nan, time 119.28ms
iter 140490: loss nan, time 120.14ms
tensor(0.9755)
step 140500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 140500: loss nan, time 2901.83ms
iter 140510: loss nan, time 119.86ms
iter 140520: loss nan, time 119.69ms
iter 140530: loss nan, time 120.54ms
iter 140540: loss nan, time 120.26ms
iter 140550: loss nan, time 120.30ms
iter 140560: loss nan, time 121.59ms
iter 140570: loss nan, time 122.41ms
iter 140580: loss nan, time 122.80ms
iter 140590: loss nan, time 121.26ms
tensor(0.9649)
iter 140600: loss nan, time 119.81ms
iter 140610: loss nan, time 121.30ms
iter 140620: loss nan, time 119.17ms
iter 140630: loss nan, time 120.02ms
iter 140640: loss nan, time 120.38ms
iter 140650: loss nan, time 120.22ms
iter 140660: loss nan, time 119.73ms
iter 140670: loss nan, time 121.82ms
iter 140680: loss nan, time 122.98ms
iter 140690: loss nan, time 121.72ms
tensor(0.9524)
iter 140700: loss nan, time 121.81ms
iter 140710: loss nan, time 121.37ms
iter 140720: loss nan, time 119.28ms
iter 140730: loss nan, time 120.05ms
iter 140740: loss nan, time 120.02ms
step 140750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 140750: loss nan, time 2908.93ms
iter 140760: loss nan, time 120.47ms
iter 140770: loss nan, time 120.38ms
iter 140780: loss nan, time 120.73ms
iter 140790: loss nan, time 121.33ms
tensor(0.9382)
iter 140800: loss nan, time 120.66ms
iter 140810: loss nan, time 122.42ms
iter 140820: loss nan, time 121.45ms
iter 140830: loss nan, time 121.52ms
iter 140840: loss nan, time 121.30ms
iter 140850: loss nan, time 119.05ms
iter 140860: loss nan, time 119.01ms
iter 140870: loss nan, time 119.93ms
iter 140880: loss nan, time 120.18ms
iter 140890: loss nan, time 120.42ms
tensor(0.9222)
iter 140900: loss nan, time 120.57ms
iter 140910: loss nan, time 119.53ms
iter 140920: loss nan, time 121.47ms
iter 140930: loss nan, time 122.24ms
iter 140940: loss nan, time 121.32ms
iter 140950: loss nan, time 121.50ms
iter 140960: loss nan, time 121.13ms
iter 140970: loss nan, time 121.00ms
iter 140980: loss nan, time 121.20ms
iter 140990: loss nan, time 119.63ms
tensor(0.9045)
step 141000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 141000: loss nan, time 2907.78ms
iter 141010: loss nan, time 119.68ms
iter 141020: loss nan, time 119.12ms
iter 141030: loss nan, time 118.62ms
iter 141040: loss nan, time 119.64ms
iter 141050: loss nan, time 120.26ms
iter 141060: loss nan, time 120.19ms
iter 141070: loss nan, time 120.41ms
iter 141080: loss nan, time 120.08ms
iter 141090: loss nan, time 119.75ms
tensor(0.8853)
iter 141100: loss nan, time 120.52ms
iter 141110: loss nan, time 121.31ms
iter 141120: loss nan, time 122.06ms
iter 141130: loss nan, time 123.51ms
iter 141140: loss nan, time 121.12ms
iter 141150: loss nan, time 119.04ms
iter 141160: loss nan, time 121.08ms
iter 141170: loss nan, time 121.25ms
iter 141180: loss nan, time 119.23ms
iter 141190: loss nan, time 119.99ms
tensor(0.8645)
iter 141200: loss nan, time 120.85ms
iter 141210: loss nan, time 119.12ms
iter 141220: loss nan, time 120.12ms
iter 141230: loss nan, time 120.90ms
iter 141240: loss nan, time 121.95ms
step 141250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 141250: loss nan, time 2924.46ms
iter 141260: loss nan, time 121.59ms
iter 141270: loss nan, time 121.15ms
iter 141280: loss nan, time 121.35ms
iter 141290: loss nan, time 121.17ms
tensor(0.8423)
iter 141300: loss nan, time 121.66ms
iter 141310: loss nan, time 119.15ms
iter 141320: loss nan, time 119.38ms
iter 141330: loss nan, time 120.33ms
iter 141340: loss nan, time 120.75ms
iter 141350: loss nan, time 120.42ms
iter 141360: loss nan, time 121.11ms
iter 141370: loss nan, time 121.29ms
iter 141380: loss nan, time 122.31ms
iter 141390: loss nan, time 122.58ms
tensor(0.8187)
iter 141400: loss nan, time 119.40ms
iter 141410: loss nan, time 120.87ms
iter 141420: loss nan, time 121.23ms
iter 141430: loss nan, time 121.36ms
iter 141440: loss nan, time 119.45ms
iter 141450: loss nan, time 119.45ms
iter 141460: loss nan, time 119.04ms
iter 141470: loss nan, time 120.31ms
iter 141480: loss nan, time 120.39ms
iter 141490: loss nan, time 120.98ms
tensor(0.7939)
step 141500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 141500: loss nan, time 2921.32ms
iter 141510: loss nan, time 115.31ms
iter 141520: loss nan, time 116.66ms
iter 141530: loss nan, time 117.43ms
iter 141540: loss nan, time 115.60ms
iter 141550: loss nan, time 117.41ms
iter 141560: loss nan, time 117.06ms
iter 141570: loss nan, time 116.33ms
iter 141580: loss nan, time 117.09ms
iter 141590: loss nan, time 116.35ms
tensor(0.7679)
iter 141600: loss nan, time 116.69ms
iter 141610: loss nan, time 117.13ms
iter 141620: loss nan, time 117.25ms
iter 141630: loss nan, time 115.81ms
iter 141640: loss nan, time 117.21ms
iter 141650: loss nan, time 116.56ms
iter 141660: loss nan, time 115.15ms
iter 141670: loss nan, time 117.21ms
iter 141680: loss nan, time 117.03ms
iter 141690: loss nan, time 115.81ms
tensor(0.7409)
iter 141700: loss nan, time 116.65ms
iter 141710: loss nan, time 117.45ms
iter 141720: loss nan, time 116.09ms
iter 141730: loss nan, time 117.13ms
iter 141740: loss nan, time 117.08ms
step 141750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 141750: loss nan, time 2900.65ms
iter 141760: loss nan, time 116.03ms
iter 141770: loss nan, time 115.94ms
iter 141780: loss nan, time 116.89ms
iter 141790: loss nan, time 117.82ms
tensor(0.7129)
iter 141800: loss nan, time 116.16ms
iter 141810: loss nan, time 116.60ms
iter 141820: loss nan, time 115.43ms
iter 141830: loss nan, time 116.15ms
iter 141840: loss nan, time 116.81ms
iter 141850: loss nan, time 116.44ms
iter 141860: loss nan, time 115.71ms
iter 141870: loss nan, time 117.02ms
iter 141880: loss nan, time 116.12ms
iter 141890: loss nan, time 114.75ms
tensor(0.6841)
iter 141900: loss nan, time 117.78ms
iter 141910: loss nan, time 116.03ms
iter 141920: loss nan, time 116.81ms
iter 141930: loss nan, time 118.25ms
iter 141940: loss nan, time 116.01ms
iter 141950: loss nan, time 117.17ms
iter 141960: loss nan, time 117.18ms
iter 141970: loss nan, time 116.28ms
iter 141980: loss nan, time 116.73ms
iter 141990: loss nan, time 118.12ms
tensor(0.6545)
step 142000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 142000: loss nan, time 2905.33ms
iter 142010: loss nan, time 116.97ms
iter 142020: loss nan, time 115.68ms
iter 142030: loss nan, time 114.43ms
iter 142040: loss nan, time 117.13ms
iter 142050: loss nan, time 117.26ms
iter 142060: loss nan, time 115.89ms
iter 142070: loss nan, time 117.08ms
iter 142080: loss nan, time 116.43ms
iter 142090: loss nan, time 114.79ms
tensor(0.6243)
iter 142100: loss nan, time 117.55ms
iter 142110: loss nan, time 116.32ms
iter 142120: loss nan, time 114.91ms
iter 142130: loss nan, time 117.13ms
iter 142140: loss nan, time 116.18ms
iter 142150: loss nan, time 116.35ms
iter 142160: loss nan, time 118.11ms
iter 142170: loss nan, time 116.42ms
iter 142180: loss nan, time 116.01ms
iter 142190: loss nan, time 117.08ms
tensor(0.5937)
iter 142200: loss nan, time 116.94ms
iter 142210: loss nan, time 116.00ms
iter 142220: loss nan, time 118.31ms
iter 142230: loss nan, time 116.12ms
iter 142240: loss nan, time 117.23ms
step 142250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 142250: loss nan, time 2902.81ms
iter 142260: loss nan, time 115.02ms
iter 142270: loss nan, time 113.90ms
iter 142280: loss nan, time 118.10ms
iter 142290: loss nan, time 114.89ms
tensor(0.5627)
iter 142300: loss nan, time 117.88ms
iter 142310: loss nan, time 117.08ms
iter 142320: loss nan, time 114.63ms
iter 142330: loss nan, time 115.96ms
iter 142340: loss nan, time 115.83ms
iter 142350: loss nan, time 115.29ms
iter 142360: loss nan, time 118.23ms
iter 142370: loss nan, time 115.81ms
iter 142380: loss nan, time 115.53ms
iter 142390: loss nan, time 116.10ms
tensor(0.5314)
iter 142400: loss nan, time 116.16ms
iter 142410: loss nan, time 117.02ms
iter 142420: loss nan, time 118.16ms
iter 142430: loss nan, time 116.21ms
iter 142440: loss nan, time 116.13ms
iter 142450: loss nan, time 116.06ms
iter 142460: loss nan, time 115.79ms
iter 142470: loss nan, time 117.17ms
iter 142480: loss nan, time 118.03ms
iter 142490: loss nan, time 116.24ms
tensor(0.5000)
step 142500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 142500: loss nan, time 2915.80ms
iter 142510: loss nan, time 118.39ms
iter 142520: loss nan, time 115.83ms
iter 142530: loss nan, time 117.17ms
iter 142540: loss nan, time 117.08ms
iter 142550: loss nan, time 114.92ms
iter 142560: loss nan, time 117.32ms
iter 142570: loss nan, time 118.32ms
iter 142580: loss nan, time 115.47ms
iter 142590: loss nan, time 117.64ms
tensor(0.4686)
iter 142600: loss nan, time 117.62ms
iter 142610: loss nan, time 119.28ms
iter 142620: loss nan, time 120.64ms
iter 142630: loss nan, time 120.41ms
iter 142640: loss nan, time 120.83ms
iter 142650: loss nan, time 119.08ms
iter 142660: loss nan, time 120.06ms
iter 142670: loss nan, time 120.03ms
iter 142680: loss nan, time 120.09ms
iter 142690: loss nan, time 120.62ms
tensor(0.4373)
iter 142700: loss nan, time 120.23ms
iter 142710: loss nan, time 120.99ms
iter 142720: loss nan, time 122.16ms
iter 142730: loss nan, time 122.36ms
iter 142740: loss nan, time 121.06ms
step 142750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 142750: loss nan, time 2923.93ms
iter 142760: loss nan, time 121.04ms
iter 142770: loss nan, time 122.49ms
iter 142780: loss nan, time 121.21ms
iter 142790: loss nan, time 121.17ms
tensor(0.4063)
iter 142800: loss nan, time 119.83ms
iter 142810: loss nan, time 123.24ms
iter 142820: loss nan, time 121.12ms
iter 142830: loss nan, time 122.44ms
iter 142840: loss nan, time 120.04ms
iter 142850: loss nan, time 120.98ms
iter 142860: loss nan, time 122.31ms
iter 142870: loss nan, time 122.22ms
iter 142880: loss nan, time 120.71ms
iter 142890: loss nan, time 119.37ms
tensor(0.3757)
iter 142900: loss nan, time 120.86ms
iter 142910: loss nan, time 119.13ms
iter 142920: loss nan, time 119.10ms
iter 142930: loss nan, time 119.22ms
iter 142940: loss nan, time 118.90ms
iter 142950: loss nan, time 118.74ms
iter 142960: loss nan, time 118.86ms
iter 142970: loss nan, time 119.26ms
iter 142980: loss nan, time 118.50ms
iter 142990: loss nan, time 118.24ms
tensor(0.3455)
step 143000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 143000: loss nan, time 2895.92ms
iter 143010: loss nan, time 118.47ms
iter 143020: loss nan, time 118.47ms
iter 143030: loss nan, time 118.76ms
iter 143040: loss nan, time 118.39ms
iter 143050: loss nan, time 118.99ms
iter 143060: loss nan, time 118.45ms
iter 143070: loss nan, time 118.57ms
iter 143080: loss nan, time 118.53ms
iter 143090: loss nan, time 117.96ms
tensor(0.3159)
iter 143100: loss nan, time 118.83ms
iter 143110: loss nan, time 118.93ms
iter 143120: loss nan, time 119.87ms
iter 143130: loss nan, time 119.06ms
iter 143140: loss nan, time 119.03ms
iter 143150: loss nan, time 118.75ms
iter 143160: loss nan, time 119.07ms
iter 143170: loss nan, time 119.66ms
iter 143180: loss nan, time 120.43ms
iter 143190: loss nan, time 120.44ms
tensor(0.2871)
iter 143200: loss nan, time 120.04ms
iter 143210: loss nan, time 120.15ms
iter 143220: loss nan, time 122.47ms
iter 143230: loss nan, time 122.02ms
iter 143240: loss nan, time 122.15ms
step 143250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 143250: loss nan, time 2920.11ms
iter 143260: loss nan, time 122.02ms
iter 143270: loss nan, time 121.14ms
iter 143280: loss nan, time 122.09ms
iter 143290: loss nan, time 122.40ms
tensor(0.2591)
iter 143300: loss nan, time 120.14ms
iter 143310: loss nan, time 122.36ms
iter 143320: loss nan, time 121.36ms
iter 143330: loss nan, time 121.42ms
iter 143340: loss nan, time 121.07ms
iter 143350: loss nan, time 118.60ms
iter 143360: loss nan, time 118.67ms
iter 143370: loss nan, time 119.15ms
iter 143380: loss nan, time 119.05ms
iter 143390: loss nan, time 117.93ms
tensor(0.2321)
iter 143400: loss nan, time 118.89ms
iter 143410: loss nan, time 118.78ms
iter 143420: loss nan, time 118.56ms
iter 143430: loss nan, time 118.56ms
iter 143440: loss nan, time 118.48ms
iter 143450: loss nan, time 118.73ms
iter 143460: loss nan, time 119.04ms
iter 143470: loss nan, time 119.41ms
iter 143480: loss nan, time 119.67ms
iter 143490: loss nan, time 120.30ms
tensor(0.2061)
step 143500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 143500: loss nan, time 2909.00ms
iter 143510: loss nan, time 120.25ms
iter 143520: loss nan, time 120.95ms
iter 143530: loss nan, time 121.11ms
iter 143540: loss nan, time 121.82ms
iter 143550: loss nan, time 122.56ms
iter 143560: loss nan, time 120.30ms
iter 143570: loss nan, time 122.01ms
iter 143580: loss nan, time 122.16ms
iter 143590: loss nan, time 122.12ms
tensor(0.1813)
iter 143600: loss nan, time 121.42ms
iter 143610: loss nan, time 118.91ms
iter 143620: loss nan, time 121.49ms
iter 143630: loss nan, time 120.96ms
iter 143640: loss nan, time 120.73ms
iter 143650: loss nan, time 118.65ms
iter 143660: loss nan, time 119.10ms
iter 143670: loss nan, time 118.53ms
iter 143680: loss nan, time 118.47ms
iter 143690: loss nan, time 118.44ms
tensor(0.1577)
iter 143700: loss nan, time 119.72ms
iter 143710: loss nan, time 118.63ms
iter 143720: loss nan, time 120.69ms
iter 143730: loss nan, time 118.52ms
iter 143740: loss nan, time 120.23ms
step 143750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 143750: loss nan, time 2901.10ms
iter 143760: loss nan, time 121.71ms
iter 143770: loss nan, time 119.87ms
iter 143780: loss nan, time 122.20ms
iter 143790: loss nan, time 121.93ms
tensor(0.1355)
iter 143800: loss nan, time 121.77ms
iter 143810: loss nan, time 120.82ms
iter 143820: loss nan, time 118.59ms
iter 143830: loss nan, time 120.80ms
iter 143840: loss nan, time 120.95ms
iter 143850: loss nan, time 122.10ms
iter 143860: loss nan, time 120.31ms
iter 143870: loss nan, time 118.95ms
iter 143880: loss nan, time 121.07ms
iter 143890: loss nan, time 120.34ms
tensor(0.1147)
iter 143900: loss nan, time 119.01ms
iter 143910: loss nan, time 119.01ms
iter 143920: loss nan, time 119.50ms
iter 143930: loss nan, time 119.00ms
iter 143940: loss nan, time 119.29ms
iter 143950: loss nan, time 118.29ms
iter 143960: loss nan, time 117.82ms
iter 143970: loss nan, time 118.84ms
iter 143980: loss nan, time 121.08ms
iter 143990: loss nan, time 118.82ms
tensor(0.0955)
step 144000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 144000: loss nan, time 2916.24ms
iter 144010: loss nan, time 119.59ms
iter 144020: loss nan, time 120.35ms
iter 144030: loss nan, time 120.32ms
iter 144040: loss nan, time 119.72ms
iter 144050: loss nan, time 121.49ms
iter 144060: loss nan, time 122.30ms
iter 144070: loss nan, time 122.04ms
iter 144080: loss nan, time 121.81ms
iter 144090: loss nan, time 120.79ms
tensor(0.0778)
iter 144100: loss nan, time 122.21ms
iter 144110: loss nan, time 121.88ms
iter 144120: loss nan, time 122.51ms
iter 144130: loss nan, time 122.43ms
iter 144140: loss nan, time 121.04ms
iter 144150: loss nan, time 121.26ms
iter 144160: loss nan, time 121.15ms
iter 144170: loss nan, time 121.33ms
iter 144180: loss nan, time 119.16ms
iter 144190: loss nan, time 119.43ms
tensor(0.0618)
iter 144200: loss nan, time 119.04ms
iter 144210: loss nan, time 118.43ms
iter 144220: loss nan, time 117.92ms
iter 144230: loss nan, time 117.90ms
iter 144240: loss nan, time 117.58ms
step 144250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 144250: loss nan, time 2891.54ms
iter 144260: loss nan, time 120.47ms
iter 144270: loss nan, time 120.69ms
iter 144280: loss nan, time 120.89ms
iter 144290: loss nan, time 122.73ms
tensor(0.0476)
iter 144300: loss nan, time 121.35ms
iter 144310: loss nan, time 121.46ms
iter 144320: loss nan, time 122.01ms
iter 144330: loss nan, time 120.59ms
iter 144340: loss nan, time 120.79ms
iter 144350: loss nan, time 120.68ms
iter 144360: loss nan, time 120.72ms
iter 144370: loss nan, time 121.04ms
iter 144380: loss nan, time 118.40ms
iter 144390: loss nan, time 118.66ms
tensor(0.0351)
iter 144400: loss nan, time 118.86ms
iter 144410: loss nan, time 118.66ms
iter 144420: loss nan, time 118.66ms
iter 144430: loss nan, time 118.52ms
iter 144440: loss nan, time 118.86ms
iter 144450: loss nan, time 119.28ms
iter 144460: loss nan, time 118.56ms
iter 144470: loss nan, time 118.80ms
iter 144480: loss nan, time 118.56ms
iter 144490: loss nan, time 118.95ms
tensor(0.0245)
step 144500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 144500: loss nan, time 2907.81ms
iter 144510: loss nan, time 120.75ms
iter 144520: loss nan, time 120.68ms
iter 144530: loss nan, time 120.59ms
iter 144540: loss nan, time 121.23ms
iter 144550: loss nan, time 120.75ms
iter 144560: loss nan, time 118.69ms
iter 144570: loss nan, time 118.05ms
iter 144580: loss nan, time 118.78ms
iter 144590: loss nan, time 118.51ms
tensor(0.0157)
iter 144600: loss nan, time 119.00ms
iter 144610: loss nan, time 118.74ms
iter 144620: loss nan, time 118.62ms
iter 144630: loss nan, time 118.96ms
iter 144640: loss nan, time 118.55ms
iter 144650: loss nan, time 118.66ms
iter 144660: loss nan, time 118.77ms
iter 144670: loss nan, time 118.72ms
iter 144680: loss nan, time 118.66ms
iter 144690: loss nan, time 118.82ms
tensor(0.0089)
iter 144700: loss nan, time 118.89ms
iter 144710: loss nan, time 118.62ms
iter 144720: loss nan, time 119.02ms
iter 144730: loss nan, time 118.63ms
iter 144740: loss nan, time 118.63ms
step 144750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 144750: loss nan, time 2899.06ms
iter 144760: loss nan, time 122.04ms
iter 144770: loss nan, time 120.91ms
iter 144780: loss nan, time 121.10ms
iter 144790: loss nan, time 121.18ms
tensor(0.0039)
iter 144800: loss nan, time 121.22ms
iter 144810: loss nan, time 119.04ms
iter 144820: loss nan, time 118.69ms
iter 144830: loss nan, time 118.78ms
iter 144840: loss nan, time 118.64ms
iter 144850: loss nan, time 118.71ms
iter 144860: loss nan, time 118.56ms
iter 144870: loss nan, time 118.61ms
iter 144880: loss nan, time 118.62ms
iter 144890: loss nan, time 118.57ms
tensor(0.0010)
iter 144900: loss nan, time 118.76ms
iter 144910: loss nan, time 118.69ms
iter 144920: loss nan, time 118.53ms
iter 144930: loss nan, time 118.49ms
iter 144940: loss nan, time 118.44ms
iter 144950: loss nan, time 118.46ms
iter 144960: loss nan, time 119.06ms
iter 144970: loss nan, time 118.72ms
iter 144980: loss nan, time 118.77ms
iter 144990: loss nan, time 118.70ms
tensor(0.0010)
step 145000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 145000: loss nan, time 2898.00ms
iter 145010: loss nan, time 121.87ms
iter 145020: loss nan, time 121.86ms
iter 145030: loss nan, time 120.69ms
iter 145040: loss nan, time 121.89ms
iter 145050: loss nan, time 121.72ms
iter 145060: loss nan, time 121.81ms
iter 145070: loss nan, time 121.83ms
iter 145080: loss nan, time 120.58ms
iter 145090: loss nan, time 122.24ms
tensor(0.0010)
iter 145100: loss nan, time 122.44ms
iter 145110: loss nan, time 121.85ms
iter 145120: loss nan, time 122.09ms
iter 145130: loss nan, time 120.72ms
iter 145140: loss nan, time 120.96ms
iter 145150: loss nan, time 121.12ms
iter 145160: loss nan, time 120.75ms
iter 145170: loss nan, time 120.71ms
iter 145180: loss nan, time 121.25ms
iter 145190: loss nan, time 120.78ms
tensor(0.0039)
iter 145200: loss nan, time 121.04ms
iter 145210: loss nan, time 120.94ms
iter 145220: loss nan, time 120.95ms
iter 145230: loss nan, time 120.89ms
iter 145240: loss nan, time 119.88ms
step 145250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 145250: loss nan, time 2915.54ms
iter 145260: loss nan, time 118.70ms
iter 145270: loss nan, time 118.47ms
iter 145280: loss nan, time 118.74ms
iter 145290: loss nan, time 118.68ms
tensor(0.0089)
iter 145300: loss nan, time 119.23ms
iter 145310: loss nan, time 119.07ms
iter 145320: loss nan, time 118.67ms
iter 145330: loss nan, time 119.56ms
iter 145340: loss nan, time 119.74ms
iter 145350: loss nan, time 120.03ms
iter 145360: loss nan, time 119.96ms
iter 145370: loss nan, time 119.62ms
iter 145380: loss nan, time 120.03ms
iter 145390: loss nan, time 119.97ms
tensor(0.0157)
iter 145400: loss nan, time 120.47ms
iter 145410: loss nan, time 119.90ms
iter 145420: loss nan, time 119.98ms
iter 145430: loss nan, time 119.44ms
iter 145440: loss nan, time 120.63ms
iter 145450: loss nan, time 120.70ms
iter 145460: loss nan, time 121.68ms
iter 145470: loss nan, time 120.04ms
iter 145480: loss nan, time 122.16ms
iter 145490: loss nan, time 121.90ms
tensor(0.0245)
step 145500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 145500: loss nan, time 2906.71ms
iter 145510: loss nan, time 122.27ms
iter 145520: loss nan, time 122.28ms
iter 145530: loss nan, time 118.97ms
iter 145540: loss nan, time 120.19ms
iter 145550: loss nan, time 120.79ms
iter 145560: loss nan, time 121.19ms
iter 145570: loss nan, time 121.34ms
iter 145580: loss nan, time 119.06ms
iter 145590: loss nan, time 119.06ms
tensor(0.0351)
iter 145600: loss nan, time 119.81ms
iter 145610: loss nan, time 120.12ms
iter 145620: loss nan, time 120.23ms
iter 145630: loss nan, time 120.15ms
iter 145640: loss nan, time 119.37ms
iter 145650: loss nan, time 120.45ms
iter 145660: loss nan, time 121.19ms
iter 145670: loss nan, time 122.05ms
iter 145680: loss nan, time 122.47ms
iter 145690: loss nan, time 121.23ms
tensor(0.0476)
iter 145700: loss nan, time 121.54ms
iter 145710: loss nan, time 121.01ms
iter 145720: loss nan, time 121.20ms
iter 145730: loss nan, time 121.48ms
iter 145740: loss nan, time 121.13ms
step 145750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 145750: loss nan, time 2906.63ms
iter 145760: loss nan, time 121.34ms
iter 145770: loss nan, time 119.30ms
iter 145780: loss nan, time 119.95ms
iter 145790: loss nan, time 120.52ms
tensor(0.0618)
iter 145800: loss nan, time 120.73ms
iter 145810: loss nan, time 120.38ms
iter 145820: loss nan, time 120.14ms
iter 145830: loss nan, time 120.48ms
iter 145840: loss nan, time 121.68ms
iter 145850: loss nan, time 122.01ms
iter 145860: loss nan, time 122.35ms
iter 145870: loss nan, time 121.07ms
iter 145880: loss nan, time 118.92ms
iter 145890: loss nan, time 121.37ms
tensor(0.0778)
iter 145900: loss nan, time 121.33ms
iter 145910: loss nan, time 121.14ms
iter 145920: loss nan, time 120.81ms
iter 145930: loss nan, time 119.07ms
iter 145940: loss nan, time 119.18ms
iter 145950: loss nan, time 120.64ms
iter 145960: loss nan, time 120.71ms
iter 145970: loss nan, time 120.35ms
iter 145980: loss nan, time 120.53ms
iter 145990: loss nan, time 120.34ms
tensor(0.0955)
step 146000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 146000: loss nan, time 2919.48ms
iter 146010: loss nan, time 122.86ms
iter 146020: loss nan, time 121.22ms
iter 146030: loss nan, time 121.54ms
iter 146040: loss nan, time 121.28ms
iter 146050: loss nan, time 121.55ms
iter 146060: loss nan, time 119.70ms
iter 146070: loss nan, time 119.75ms
iter 146080: loss nan, time 120.35ms
iter 146090: loss nan, time 120.29ms
tensor(0.1147)
iter 146100: loss nan, time 120.88ms
iter 146110: loss nan, time 120.67ms
iter 146120: loss nan, time 121.47ms
iter 146130: loss nan, time 120.57ms
iter 146140: loss nan, time 122.23ms
iter 146150: loss nan, time 121.35ms
iter 146160: loss nan, time 121.60ms
iter 146170: loss nan, time 120.81ms
iter 146180: loss nan, time 119.45ms
iter 146190: loss nan, time 119.11ms
tensor(0.1355)
iter 146200: loss nan, time 120.84ms
iter 146210: loss nan, time 120.36ms
iter 146220: loss nan, time 119.96ms
iter 146230: loss nan, time 120.06ms
iter 146240: loss nan, time 119.20ms
step 146250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 146250: loss nan, time 2895.16ms
iter 146260: loss nan, time 120.57ms
iter 146270: loss nan, time 120.63ms
iter 146280: loss nan, time 120.82ms
iter 146290: loss nan, time 120.57ms
tensor(0.1577)
iter 146300: loss nan, time 120.05ms
iter 146310: loss nan, time 122.05ms
iter 146320: loss nan, time 122.63ms
iter 146330: loss nan, time 121.81ms
iter 146340: loss nan, time 120.95ms
iter 146350: loss nan, time 121.05ms
iter 146360: loss nan, time 121.15ms
iter 146370: loss nan, time 121.12ms
iter 146380: loss nan, time 121.53ms
iter 146390: loss nan, time 119.11ms
tensor(0.1813)
iter 146400: loss nan, time 119.58ms
iter 146410: loss nan, time 120.21ms
iter 146420: loss nan, time 120.37ms
iter 146430: loss nan, time 120.89ms
iter 146440: loss nan, time 120.06ms
iter 146450: loss nan, time 120.94ms
iter 146460: loss nan, time 121.44ms
iter 146470: loss nan, time 122.59ms
iter 146480: loss nan, time 120.11ms
iter 146490: loss nan, time 122.27ms
tensor(0.2061)
step 146500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 146500: loss nan, time 2904.47ms
iter 146510: loss nan, time 121.13ms
iter 146520: loss nan, time 121.41ms
iter 146530: loss nan, time 120.96ms
iter 146540: loss nan, time 119.04ms
iter 146550: loss nan, time 121.20ms
iter 146560: loss nan, time 121.27ms
iter 146570: loss nan, time 118.57ms
iter 146580: loss nan, time 119.00ms
iter 146590: loss nan, time 119.72ms
tensor(0.2321)
iter 146600: loss nan, time 119.41ms
iter 146610: loss nan, time 120.47ms
iter 146620: loss nan, time 120.85ms
iter 146630: loss nan, time 119.34ms
iter 146640: loss nan, time 120.59ms
iter 146650: loss nan, time 119.37ms
iter 146660: loss nan, time 121.30ms
iter 146670: loss nan, time 122.14ms
iter 146680: loss nan, time 122.15ms
iter 146690: loss nan, time 122.34ms
tensor(0.2591)
iter 146700: loss nan, time 121.56ms
iter 146710: loss nan, time 121.19ms
iter 146720: loss nan, time 121.38ms
iter 146730: loss nan, time 121.37ms
iter 146740: loss nan, time 121.24ms
step 146750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 146750: loss nan, time 2929.79ms
iter 146760: loss nan, time 120.49ms
iter 146770: loss nan, time 120.40ms
iter 146780: loss nan, time 119.68ms
iter 146790: loss nan, time 120.50ms
tensor(0.2871)
iter 146800: loss nan, time 121.23ms
iter 146810: loss nan, time 122.78ms
iter 146820: loss nan, time 122.65ms
iter 146830: loss nan, time 121.17ms
iter 146840: loss nan, time 119.11ms
iter 146850: loss nan, time 120.72ms
iter 146860: loss nan, time 121.29ms
iter 146870: loss nan, time 119.20ms
iter 146880: loss nan, time 119.02ms
iter 146890: loss nan, time 119.44ms
tensor(0.3159)
iter 146900: loss nan, time 119.46ms
iter 146910: loss nan, time 120.55ms
iter 146920: loss nan, time 119.96ms
iter 146930: loss nan, time 120.33ms
iter 146940: loss nan, time 120.48ms
iter 146950: loss nan, time 120.17ms
iter 146960: loss nan, time 121.89ms
iter 146970: loss nan, time 122.63ms
iter 146980: loss nan, time 121.97ms
iter 146990: loss nan, time 121.31ms
tensor(0.3455)
step 147000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 147000: loss nan, time 2901.69ms
iter 147010: loss nan, time 121.37ms
iter 147020: loss nan, time 121.74ms
iter 147030: loss nan, time 121.37ms
iter 147040: loss nan, time 120.32ms
iter 147050: loss nan, time 121.34ms
iter 147060: loss nan, time 119.53ms
iter 147070: loss nan, time 119.65ms
iter 147080: loss nan, time 119.34ms
iter 147090: loss nan, time 119.16ms
tensor(0.3757)
iter 147100: loss nan, time 120.62ms
iter 147110: loss nan, time 120.43ms
iter 147120: loss nan, time 120.47ms
iter 147130: loss nan, time 120.26ms
iter 147140: loss nan, time 120.27ms
iter 147150: loss nan, time 122.14ms
iter 147160: loss nan, time 122.45ms
iter 147170: loss nan, time 121.40ms
iter 147180: loss nan, time 121.52ms
iter 147190: loss nan, time 119.01ms
tensor(0.4063)
iter 147200: loss nan, time 121.50ms
iter 147210: loss nan, time 121.29ms
iter 147220: loss nan, time 121.15ms
iter 147230: loss nan, time 121.20ms
iter 147240: loss nan, time 119.09ms
step 147250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 147250: loss nan, time 2910.87ms
iter 147260: loss nan, time 120.38ms
iter 147270: loss nan, time 119.03ms
iter 147280: loss nan, time 119.56ms
iter 147290: loss nan, time 119.62ms
tensor(0.4373)
iter 147300: loss nan, time 119.78ms
iter 147310: loss nan, time 119.21ms
iter 147320: loss nan, time 120.50ms
iter 147330: loss nan, time 120.49ms
iter 147340: loss nan, time 120.35ms
iter 147350: loss nan, time 121.10ms
iter 147360: loss nan, time 120.60ms
iter 147370: loss nan, time 122.72ms
iter 147380: loss nan, time 122.40ms
iter 147390: loss nan, time 121.35ms
tensor(0.4686)
iter 147400: loss nan, time 121.30ms
iter 147410: loss nan, time 121.43ms
iter 147420: loss nan, time 121.08ms
iter 147430: loss nan, time 121.51ms
iter 147440: loss nan, time 119.44ms
iter 147450: loss nan, time 120.49ms
iter 147460: loss nan, time 120.34ms
iter 147470: loss nan, time 120.49ms
iter 147480: loss nan, time 119.65ms
iter 147490: loss nan, time 119.55ms
tensor(0.5000)
step 147500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 147500: loss nan, time 2911.38ms
iter 147510: loss nan, time 120.25ms
iter 147520: loss nan, time 122.11ms
iter 147530: loss nan, time 122.45ms
iter 147540: loss nan, time 121.38ms
iter 147550: loss nan, time 118.63ms
iter 147560: loss nan, time 121.03ms
iter 147570: loss nan, time 121.25ms
iter 147580: loss nan, time 121.12ms
iter 147590: loss nan, time 121.34ms
tensor(0.5314)
iter 147600: loss nan, time 119.72ms
iter 147610: loss nan, time 119.55ms
iter 147620: loss nan, time 120.10ms
iter 147630: loss nan, time 120.53ms
iter 147640: loss nan, time 120.36ms
iter 147650: loss nan, time 120.20ms
iter 147660: loss nan, time 119.33ms
iter 147670: loss nan, time 121.03ms
iter 147680: loss nan, time 121.96ms
iter 147690: loss nan, time 122.27ms
tensor(0.5627)
iter 147700: loss nan, time 122.52ms
iter 147710: loss nan, time 121.25ms
iter 147720: loss nan, time 120.62ms
iter 147730: loss nan, time 121.05ms
iter 147740: loss nan, time 121.16ms
step 147750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 147750: loss nan, time 2916.03ms
iter 147760: loss nan, time 119.13ms
iter 147770: loss nan, time 119.60ms
iter 147780: loss nan, time 120.69ms
iter 147790: loss nan, time 120.37ms
tensor(0.5937)
iter 147800: loss nan, time 120.21ms
iter 147810: loss nan, time 120.46ms
iter 147820: loss nan, time 121.33ms
iter 147830: loss nan, time 121.85ms
iter 147840: loss nan, time 122.82ms
iter 147850: loss nan, time 119.34ms
iter 147860: loss nan, time 121.31ms
iter 147870: loss nan, time 121.13ms
iter 147880: loss nan, time 121.29ms
iter 147890: loss nan, time 121.29ms
tensor(0.6243)
iter 147900: loss nan, time 119.74ms
iter 147910: loss nan, time 119.40ms
iter 147920: loss nan, time 120.29ms
iter 147930: loss nan, time 120.46ms
iter 147940: loss nan, time 119.57ms
iter 147950: loss nan, time 120.34ms
iter 147960: loss nan, time 118.92ms
iter 147970: loss nan, time 121.61ms
iter 147980: loss nan, time 122.47ms
iter 147990: loss nan, time 122.62ms
tensor(0.6545)
step 148000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 148000: loss nan, time 2900.10ms
iter 148010: loss nan, time 121.14ms
iter 148020: loss nan, time 121.16ms
iter 148030: loss nan, time 122.59ms
iter 148040: loss nan, time 122.50ms
iter 148050: loss nan, time 120.68ms
iter 148060: loss nan, time 121.42ms
iter 148070: loss nan, time 121.12ms
iter 148080: loss nan, time 121.29ms
iter 148090: loss nan, time 121.28ms
tensor(0.6841)
iter 148100: loss nan, time 118.57ms
iter 148110: loss nan, time 118.51ms
iter 148120: loss nan, time 119.65ms
iter 148130: loss nan, time 120.46ms
iter 148140: loss nan, time 120.14ms
iter 148150: loss nan, time 119.96ms
iter 148160: loss nan, time 120.97ms
iter 148170: loss nan, time 120.98ms
iter 148180: loss nan, time 121.78ms
iter 148190: loss nan, time 122.77ms
tensor(0.7129)
iter 148200: loss nan, time 119.20ms
iter 148210: loss nan, time 121.68ms
iter 148220: loss nan, time 121.12ms
iter 148230: loss nan, time 121.18ms
iter 148240: loss nan, time 121.31ms
step 148250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 148250: loss nan, time 2915.68ms
iter 148260: loss nan, time 119.39ms
iter 148270: loss nan, time 118.32ms
iter 148280: loss nan, time 120.86ms
iter 148290: loss nan, time 120.43ms
tensor(0.7409)
iter 148300: loss nan, time 121.02ms
iter 148310: loss nan, time 121.44ms
iter 148320: loss nan, time 121.25ms
iter 148330: loss nan, time 122.80ms
iter 148340: loss nan, time 121.54ms
iter 148350: loss nan, time 121.11ms
iter 148360: loss nan, time 120.57ms
iter 148370: loss nan, time 121.56ms
iter 148380: loss nan, time 119.23ms
iter 148390: loss nan, time 119.87ms
tensor(0.7679)
iter 148400: loss nan, time 120.42ms
iter 148410: loss nan, time 119.89ms
iter 148420: loss nan, time 120.30ms
iter 148430: loss nan, time 120.91ms
iter 148440: loss nan, time 120.94ms
iter 148450: loss nan, time 120.55ms
iter 148460: loss nan, time 122.54ms
iter 148470: loss nan, time 121.96ms
iter 148480: loss nan, time 121.30ms
iter 148490: loss nan, time 121.29ms
tensor(0.7939)
step 148500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 148500: loss nan, time 2902.42ms
iter 148510: loss nan, time 119.61ms
iter 148520: loss nan, time 121.31ms
iter 148530: loss nan, time 119.34ms
iter 148540: loss nan, time 119.75ms
iter 148550: loss nan, time 120.17ms
iter 148560: loss nan, time 120.65ms
iter 148570: loss nan, time 119.38ms
iter 148580: loss nan, time 121.95ms
iter 148590: loss nan, time 122.20ms
tensor(0.8187)
iter 148600: loss nan, time 122.64ms
iter 148610: loss nan, time 122.60ms
iter 148620: loss nan, time 121.27ms
iter 148630: loss nan, time 121.30ms
iter 148640: loss nan, time 121.22ms
iter 148650: loss nan, time 119.53ms
iter 148660: loss nan, time 118.73ms
iter 148670: loss nan, time 119.15ms
iter 148680: loss nan, time 119.61ms
iter 148690: loss nan, time 120.48ms
tensor(0.8423)
iter 148700: loss nan, time 120.89ms
iter 148710: loss nan, time 121.86ms
iter 148720: loss nan, time 122.55ms
iter 148730: loss nan, time 122.80ms
iter 148740: loss nan, time 122.63ms
step 148750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 148750: loss nan, time 2891.82ms
iter 148760: loss nan, time 119.04ms
iter 148770: loss nan, time 120.81ms
iter 148780: loss nan, time 120.42ms
iter 148790: loss nan, time 120.92ms
tensor(0.8645)
iter 148800: loss nan, time 119.74ms
iter 148810: loss nan, time 119.03ms
iter 148820: loss nan, time 119.61ms
iter 148830: loss nan, time 120.24ms
iter 148840: loss nan, time 120.47ms
iter 148850: loss nan, time 120.50ms
iter 148860: loss nan, time 119.88ms
iter 148870: loss nan, time 120.13ms
iter 148880: loss nan, time 120.59ms
iter 148890: loss nan, time 120.31ms
tensor(0.8853)
iter 148900: loss nan, time 121.31ms
iter 148910: loss nan, time 119.86ms
iter 148920: loss nan, time 121.81ms
iter 148930: loss nan, time 122.98ms
iter 148940: loss nan, time 122.34ms
iter 148950: loss nan, time 122.49ms
iter 148960: loss nan, time 119.24ms
iter 148970: loss nan, time 121.40ms
iter 148980: loss nan, time 119.25ms
iter 148990: loss nan, time 119.32ms
tensor(0.9045)
step 149000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 149000: loss nan, time 2924.07ms
iter 149010: loss nan, time 119.66ms
iter 149020: loss nan, time 121.06ms
iter 149030: loss nan, time 120.95ms
iter 149040: loss nan, time 122.69ms
iter 149050: loss nan, time 122.51ms
iter 149060: loss nan, time 122.48ms
iter 149070: loss nan, time 121.56ms
iter 149080: loss nan, time 121.37ms
iter 149090: loss nan, time 119.18ms
tensor(0.9222)
iter 149100: loss nan, time 119.51ms
iter 149110: loss nan, time 119.24ms
iter 149120: loss nan, time 119.34ms
iter 149130: loss nan, time 120.99ms
iter 149140: loss nan, time 121.13ms
iter 149150: loss nan, time 121.95ms
iter 149160: loss nan, time 120.34ms
iter 149170: loss nan, time 122.93ms
iter 149180: loss nan, time 122.68ms
iter 149190: loss nan, time 121.49ms
tensor(0.9382)
iter 149200: loss nan, time 121.72ms
iter 149210: loss nan, time 119.19ms
iter 149220: loss nan, time 119.77ms
iter 149230: loss nan, time 119.25ms
iter 149240: loss nan, time 119.49ms
step 149250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 149250: loss nan, time 2907.23ms
iter 149260: loss nan, time 119.67ms
iter 149270: loss nan, time 120.41ms
iter 149280: loss nan, time 119.35ms
iter 149290: loss nan, time 121.45ms
tensor(0.9524)
iter 149300: loss nan, time 122.88ms
iter 149310: loss nan, time 122.93ms
iter 149320: loss nan, time 122.39ms
iter 149330: loss nan, time 121.44ms
iter 149340: loss nan, time 121.73ms
iter 149350: loss nan, time 119.52ms
iter 149360: loss nan, time 119.34ms
iter 149370: loss nan, time 118.10ms
iter 149380: loss nan, time 119.78ms
iter 149390: loss nan, time 120.32ms
tensor(0.9649)
iter 149400: loss nan, time 121.18ms
iter 149410: loss nan, time 120.72ms
iter 149420: loss nan, time 122.56ms
iter 149430: loss nan, time 123.15ms
iter 149440: loss nan, time 122.59ms
iter 149450: loss nan, time 121.23ms
iter 149460: loss nan, time 119.42ms
iter 149470: loss nan, time 119.89ms
iter 149480: loss nan, time 119.37ms
iter 149490: loss nan, time 119.24ms
tensor(0.9755)
step 149500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 149500: loss nan, time 2909.69ms
iter 149510: loss nan, time 119.16ms
iter 149520: loss nan, time 119.04ms
iter 149530: loss nan, time 118.89ms
iter 149540: loss nan, time 120.39ms
iter 149550: loss nan, time 121.02ms
iter 149560: loss nan, time 121.59ms
iter 149570: loss nan, time 122.99ms
iter 149580: loss nan, time 121.69ms
iter 149590: loss nan, time 121.50ms
tensor(0.9843)
iter 149600: loss nan, time 122.00ms
iter 149610: loss nan, time 121.67ms
iter 149620: loss nan, time 121.68ms
iter 149630: loss nan, time 121.00ms
iter 149640: loss nan, time 121.12ms
iter 149650: loss nan, time 119.13ms
iter 149660: loss nan, time 119.21ms
iter 149670: loss nan, time 118.44ms
iter 149680: loss nan, time 119.33ms
iter 149690: loss nan, time 120.42ms
tensor(0.9911)
iter 149700: loss nan, time 120.18ms
iter 149710: loss nan, time 120.53ms
iter 149720: loss nan, time 122.11ms
iter 149730: loss nan, time 122.68ms
iter 149740: loss nan, time 122.74ms
step 149750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 149750: loss nan, time 2924.32ms
iter 149760: loss nan, time 121.60ms
iter 149770: loss nan, time 119.35ms
iter 149780: loss nan, time 119.42ms
iter 149790: loss nan, time 119.44ms
tensor(0.9961)
iter 149800: loss nan, time 120.73ms
iter 149810: loss nan, time 120.54ms
iter 149820: loss nan, time 121.61ms
iter 149830: loss nan, time 121.18ms
iter 149840: loss nan, time 123.06ms
iter 149850: loss nan, time 122.57ms
iter 149860: loss nan, time 122.70ms
iter 149870: loss nan, time 121.52ms
iter 149880: loss nan, time 121.46ms
iter 149890: loss nan, time 119.36ms
tensor(0.9990)
iter 149900: loss nan, time 119.59ms
iter 149910: loss nan, time 119.39ms
iter 149920: loss nan, time 118.75ms
iter 149930: loss nan, time 120.59ms
iter 149940: loss nan, time 121.42ms
iter 149950: loss nan, time 122.68ms
iter 149960: loss nan, time 120.44ms
iter 149970: loss nan, time 122.68ms
iter 149980: loss nan, time 121.65ms
iter 149990: loss nan, time 121.25ms
tensor(1.)
step 150000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 150000: loss nan, time 2916.37ms
iter 150010: loss nan, time 119.27ms
iter 150020: loss nan, time 119.21ms
iter 150030: loss nan, time 118.06ms
iter 150040: loss nan, time 120.18ms
iter 150050: loss nan, time 120.72ms
iter 150060: loss nan, time 121.12ms
iter 150070: loss nan, time 122.04ms
iter 150080: loss nan, time 121.80ms
iter 150090: loss nan, time 122.46ms
tensor(0.9990)
iter 150100: loss nan, time 122.90ms
iter 150110: loss nan, time 121.32ms
iter 150120: loss nan, time 121.27ms
iter 150130: loss nan, time 119.26ms
iter 150140: loss nan, time 119.56ms
iter 150150: loss nan, time 119.24ms
iter 150160: loss nan, time 119.40ms
iter 150170: loss nan, time 119.54ms
iter 150180: loss nan, time 120.48ms
iter 150190: loss nan, time 121.31ms
tensor(0.9961)
iter 150200: loss nan, time 123.13ms
iter 150210: loss nan, time 120.67ms
iter 150220: loss nan, time 122.36ms
iter 150230: loss nan, time 123.05ms
iter 150240: loss nan, time 121.35ms
step 150250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 150250: loss nan, time 2916.40ms
iter 150260: loss nan, time 121.37ms
iter 150270: loss nan, time 119.53ms
iter 150280: loss nan, time 119.29ms
iter 150290: loss nan, time 119.34ms
tensor(0.9911)
iter 150300: loss nan, time 119.73ms
iter 150310: loss nan, time 119.36ms
iter 150320: loss nan, time 120.37ms
iter 150330: loss nan, time 119.19ms
iter 150340: loss nan, time 120.76ms
iter 150350: loss nan, time 121.75ms
iter 150360: loss nan, time 123.05ms
iter 150370: loss nan, time 122.55ms
iter 150380: loss nan, time 121.42ms
iter 150390: loss nan, time 121.39ms
tensor(0.9843)
iter 150400: loss nan, time 121.69ms
iter 150410: loss nan, time 119.86ms
iter 150420: loss nan, time 118.42ms
iter 150430: loss nan, time 119.35ms
iter 150440: loss nan, time 119.50ms
iter 150450: loss nan, time 120.00ms
iter 150460: loss nan, time 120.97ms
iter 150470: loss nan, time 122.35ms
iter 150480: loss nan, time 122.04ms
iter 150490: loss nan, time 122.93ms
tensor(0.9755)
step 150500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 150500: loss nan, time 2925.22ms
iter 150510: loss nan, time 121.46ms
iter 150520: loss nan, time 119.21ms
iter 150530: loss nan, time 121.37ms
iter 150540: loss nan, time 119.34ms
iter 150550: loss nan, time 119.60ms
iter 150560: loss nan, time 119.33ms
iter 150570: loss nan, time 119.25ms
iter 150580: loss nan, time 117.90ms
iter 150590: loss nan, time 119.77ms
tensor(0.9649)
iter 150600: loss nan, time 119.94ms
iter 150610: loss nan, time 120.05ms
iter 150620: loss nan, time 120.67ms
iter 150630: loss nan, time 119.83ms
iter 150640: loss nan, time 122.10ms
iter 150650: loss nan, time 122.53ms
iter 150660: loss nan, time 122.42ms
iter 150670: loss nan, time 122.76ms
iter 150680: loss nan, time 121.46ms
iter 150690: loss nan, time 120.34ms
tensor(0.9524)
iter 150700: loss nan, time 119.93ms
iter 150710: loss nan, time 119.49ms
iter 150720: loss nan, time 118.45ms
iter 150730: loss nan, time 119.67ms
iter 150740: loss nan, time 120.55ms
step 150750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 150750: loss nan, time 2927.03ms
iter 150760: loss nan, time 122.45ms
iter 150770: loss nan, time 120.29ms
iter 150780: loss nan, time 122.32ms
iter 150790: loss nan, time 122.49ms
tensor(0.9382)
iter 150800: loss nan, time 122.97ms
iter 150810: loss nan, time 121.70ms
iter 150820: loss nan, time 119.18ms
iter 150830: loss nan, time 119.64ms
iter 150840: loss nan, time 119.19ms
iter 150850: loss nan, time 119.28ms
iter 150860: loss nan, time 119.03ms
iter 150870: loss nan, time 118.65ms
iter 150880: loss nan, time 118.62ms
iter 150890: loss nan, time 118.54ms
tensor(0.9222)
iter 150900: loss nan, time 119.14ms
iter 150910: loss nan, time 118.70ms
iter 150920: loss nan, time 118.92ms
iter 150930: loss nan, time 118.61ms
iter 150940: loss nan, time 118.35ms
iter 150950: loss nan, time 118.77ms
iter 150960: loss nan, time 119.05ms
iter 150970: loss nan, time 119.01ms
iter 150980: loss nan, time 118.84ms
iter 150990: loss nan, time 118.78ms
tensor(0.9045)
step 151000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 151000: loss nan, time 2901.41ms
iter 151010: loss nan, time 118.70ms
iter 151020: loss nan, time 118.98ms
iter 151030: loss nan, time 118.64ms
iter 151040: loss nan, time 118.89ms
iter 151050: loss nan, time 118.68ms
iter 151060: loss nan, time 118.67ms
iter 151070: loss nan, time 118.98ms
iter 151080: loss nan, time 118.54ms
iter 151090: loss nan, time 118.66ms
tensor(0.8853)
iter 151100: loss nan, time 118.94ms
iter 151110: loss nan, time 119.00ms
iter 151120: loss nan, time 119.29ms
iter 151130: loss nan, time 118.80ms
iter 151140: loss nan, time 118.82ms
iter 151150: loss nan, time 118.90ms
iter 151160: loss nan, time 119.63ms
iter 151170: loss nan, time 119.24ms
iter 151180: loss nan, time 119.15ms
iter 151190: loss nan, time 118.76ms
tensor(0.8645)
iter 151200: loss nan, time 119.79ms
iter 151210: loss nan, time 119.79ms
iter 151220: loss nan, time 120.38ms
iter 151230: loss nan, time 120.59ms
iter 151240: loss nan, time 119.34ms
step 151250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 151250: loss nan, time 2908.49ms
iter 151260: loss nan, time 120.06ms
iter 151270: loss nan, time 120.17ms
iter 151280: loss nan, time 119.86ms
iter 151290: loss nan, time 120.78ms
tensor(0.8423)
iter 151300: loss nan, time 120.14ms
iter 151310: loss nan, time 121.36ms
iter 151320: loss nan, time 122.16ms
iter 151330: loss nan, time 122.16ms
iter 151340: loss nan, time 121.48ms
iter 151350: loss nan, time 121.06ms
iter 151360: loss nan, time 122.18ms
iter 151370: loss nan, time 122.14ms
iter 151380: loss nan, time 122.16ms
iter 151390: loss nan, time 121.94ms
tensor(0.8187)
iter 151400: loss nan, time 121.29ms
iter 151410: loss nan, time 122.67ms
iter 151420: loss nan, time 122.40ms
iter 151430: loss nan, time 122.29ms
iter 151440: loss nan, time 122.00ms
iter 151450: loss nan, time 121.02ms
iter 151460: loss nan, time 121.06ms
iter 151470: loss nan, time 121.01ms
iter 151480: loss nan, time 121.35ms
iter 151490: loss nan, time 120.42ms
tensor(0.7939)
step 151500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 151500: loss nan, time 2909.84ms
iter 151510: loss nan, time 120.68ms
iter 151520: loss nan, time 122.20ms
iter 151530: loss nan, time 120.73ms
iter 151540: loss nan, time 121.38ms
iter 151550: loss nan, time 120.53ms
iter 151560: loss nan, time 120.77ms
iter 151570: loss nan, time 120.77ms
iter 151580: loss nan, time 121.20ms
iter 151590: loss nan, time 118.96ms
tensor(0.7679)
iter 151600: loss nan, time 118.17ms
iter 151610: loss nan, time 119.72ms
iter 151620: loss nan, time 120.89ms
iter 151630: loss nan, time 120.12ms
iter 151640: loss nan, time 119.75ms
iter 151650: loss nan, time 119.13ms
iter 151660: loss nan, time 120.44ms
iter 151670: loss nan, time 119.81ms
iter 151680: loss nan, time 119.83ms
iter 151690: loss nan, time 119.63ms
tensor(0.7409)
iter 151700: loss nan, time 120.05ms
iter 151710: loss nan, time 120.09ms
iter 151720: loss nan, time 120.19ms
iter 151730: loss nan, time 120.21ms
iter 151740: loss nan, time 119.73ms
step 151750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 151750: loss nan, time 2899.21ms
iter 151760: loss nan, time 118.34ms
iter 151770: loss nan, time 120.29ms
iter 151780: loss nan, time 119.91ms
iter 151790: loss nan, time 119.75ms
tensor(0.7129)
iter 151800: loss nan, time 120.01ms
iter 151810: loss nan, time 118.63ms
iter 151820: loss nan, time 120.09ms
iter 151830: loss nan, time 119.29ms
iter 151840: loss nan, time 119.76ms
iter 151850: loss nan, time 119.82ms
iter 151860: loss nan, time 119.07ms
iter 151870: loss nan, time 120.03ms
iter 151880: loss nan, time 119.81ms
iter 151890: loss nan, time 119.80ms
tensor(0.6841)
iter 151900: loss nan, time 120.17ms
iter 151910: loss nan, time 120.05ms
iter 151920: loss nan, time 120.54ms
iter 151930: loss nan, time 120.31ms
iter 151940: loss nan, time 120.59ms
iter 151950: loss nan, time 120.66ms
iter 151960: loss nan, time 120.83ms
iter 151970: loss nan, time 120.89ms
iter 151980: loss nan, time 120.97ms
iter 151990: loss nan, time 121.95ms
tensor(0.6545)
step 152000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 152000: loss nan, time 2913.85ms
iter 152010: loss nan, time 120.44ms
iter 152020: loss nan, time 121.98ms
iter 152030: loss nan, time 122.46ms
iter 152040: loss nan, time 122.60ms
iter 152050: loss nan, time 120.97ms
iter 152060: loss nan, time 118.79ms
iter 152070: loss nan, time 120.45ms
iter 152080: loss nan, time 121.33ms
iter 152090: loss nan, time 121.56ms
tensor(0.6243)
iter 152100: loss nan, time 121.16ms
iter 152110: loss nan, time 118.82ms
iter 152120: loss nan, time 121.36ms
iter 152130: loss nan, time 120.85ms
iter 152140: loss nan, time 119.16ms
iter 152150: loss nan, time 118.87ms
iter 152160: loss nan, time 119.04ms
iter 152170: loss nan, time 118.01ms
iter 152180: loss nan, time 119.29ms
iter 152190: loss nan, time 119.39ms
tensor(0.5937)
iter 152200: loss nan, time 120.23ms
iter 152210: loss nan, time 120.16ms
iter 152220: loss nan, time 118.78ms
iter 152230: loss nan, time 119.93ms
iter 152240: loss nan, time 120.07ms
step 152250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 152250: loss nan, time 2902.26ms
iter 152260: loss nan, time 119.16ms
iter 152270: loss nan, time 119.96ms
iter 152280: loss nan, time 118.00ms
iter 152290: loss nan, time 119.97ms
tensor(0.5627)
iter 152300: loss nan, time 120.29ms
iter 152310: loss nan, time 119.91ms
iter 152320: loss nan, time 119.84ms
iter 152330: loss nan, time 118.70ms
iter 152340: loss nan, time 119.97ms
iter 152350: loss nan, time 120.13ms
iter 152360: loss nan, time 120.07ms
iter 152370: loss nan, time 120.37ms
iter 152380: loss nan, time 119.34ms
iter 152390: loss nan, time 120.54ms
tensor(0.5314)
iter 152400: loss nan, time 121.67ms
iter 152410: loss nan, time 121.02ms
iter 152420: loss nan, time 122.15ms
iter 152430: loss nan, time 121.02ms
iter 152440: loss nan, time 122.29ms
iter 152450: loss nan, time 121.61ms
iter 152460: loss nan, time 120.23ms
iter 152470: loss nan, time 120.86ms
iter 152480: loss nan, time 121.04ms
iter 152490: loss nan, time 120.28ms
tensor(0.5000)
step 152500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 152500: loss nan, time 2908.21ms
iter 152510: loss nan, time 120.76ms
iter 152520: loss nan, time 119.80ms
iter 152530: loss nan, time 120.86ms
iter 152540: loss nan, time 120.89ms
iter 152550: loss nan, time 120.95ms
iter 152560: loss nan, time 120.78ms
iter 152570: loss nan, time 120.84ms
iter 152580: loss nan, time 120.24ms
iter 152590: loss nan, time 121.03ms
tensor(0.4686)
iter 152600: loss nan, time 119.19ms
iter 152610: loss nan, time 118.69ms
iter 152620: loss nan, time 118.71ms
iter 152630: loss nan, time 118.20ms
iter 152640: loss nan, time 119.01ms
iter 152650: loss nan, time 119.48ms
iter 152660: loss nan, time 119.76ms
iter 152670: loss nan, time 119.89ms
iter 152680: loss nan, time 119.47ms
iter 152690: loss nan, time 120.01ms
tensor(0.4373)
iter 152700: loss nan, time 120.62ms
iter 152710: loss nan, time 120.10ms
iter 152720: loss nan, time 120.08ms
iter 152730: loss nan, time 118.21ms
iter 152740: loss nan, time 120.05ms
step 152750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 152750: loss nan, time 2912.62ms
iter 152760: loss nan, time 120.26ms
iter 152770: loss nan, time 119.98ms
iter 152780: loss nan, time 119.97ms
iter 152790: loss nan, time 120.40ms
tensor(0.4063)
iter 152800: loss nan, time 120.20ms
iter 152810: loss nan, time 119.77ms
iter 152820: loss nan, time 120.62ms
iter 152830: loss nan, time 119.93ms
iter 152840: loss nan, time 121.32ms
iter 152850: loss nan, time 120.23ms
iter 152860: loss nan, time 121.36ms
iter 152870: loss nan, time 122.24ms
iter 152880: loss nan, time 119.99ms
iter 152890: loss nan, time 122.37ms
tensor(0.3757)
iter 152900: loss nan, time 121.50ms
iter 152910: loss nan, time 122.10ms
iter 152920: loss nan, time 120.74ms
iter 152930: loss nan, time 118.85ms
iter 152940: loss nan, time 120.86ms
iter 152950: loss nan, time 120.06ms
iter 152960: loss nan, time 120.90ms
iter 152970: loss nan, time 120.78ms
iter 152980: loss nan, time 118.57ms
iter 152990: loss nan, time 120.86ms
tensor(0.3455)
step 153000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 153000: loss nan, time 2918.66ms
iter 153010: loss nan, time 119.82ms
iter 153020: loss nan, time 120.92ms
iter 153030: loss nan, time 120.97ms
iter 153040: loss nan, time 118.74ms
iter 153050: loss nan, time 120.93ms
iter 153060: loss nan, time 119.50ms
iter 153070: loss nan, time 118.93ms
iter 153080: loss nan, time 118.74ms
iter 153090: loss nan, time 118.73ms
tensor(0.3159)
iter 153100: loss nan, time 119.08ms
iter 153110: loss nan, time 119.28ms
iter 153120: loss nan, time 119.47ms
iter 153130: loss nan, time 119.25ms
iter 153140: loss nan, time 119.52ms
iter 153150: loss nan, time 118.79ms
iter 153160: loss nan, time 120.25ms
iter 153170: loss nan, time 120.16ms
iter 153180: loss nan, time 120.35ms
iter 153190: loss nan, time 120.09ms
tensor(0.2871)
iter 153200: loss nan, time 119.08ms
iter 153210: loss nan, time 119.69ms
iter 153220: loss nan, time 119.96ms
iter 153230: loss nan, time 120.06ms
iter 153240: loss nan, time 119.72ms
step 153250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 153250: loss nan, time 2913.63ms
iter 153260: loss nan, time 118.80ms
iter 153270: loss nan, time 119.92ms
iter 153280: loss nan, time 119.66ms
iter 153290: loss nan, time 120.16ms
tensor(0.2591)
iter 153300: loss nan, time 119.87ms
iter 153310: loss nan, time 117.95ms
iter 153320: loss nan, time 120.09ms
iter 153330: loss nan, time 120.21ms
iter 153340: loss nan, time 120.24ms
iter 153350: loss nan, time 120.28ms
iter 153360: loss nan, time 119.02ms
iter 153370: loss nan, time 119.82ms
iter 153380: loss nan, time 120.05ms
iter 153390: loss nan, time 119.93ms
tensor(0.2321)
iter 153400: loss nan, time 119.98ms
iter 153410: loss nan, time 119.02ms
iter 153420: loss nan, time 119.87ms
iter 153430: loss nan, time 120.56ms
iter 153440: loss nan, time 120.57ms
iter 153450: loss nan, time 121.59ms
iter 153460: loss nan, time 121.14ms
iter 153470: loss nan, time 122.57ms
iter 153480: loss nan, time 121.21ms
iter 153490: loss nan, time 120.22ms
tensor(0.2061)
step 153500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 153500: loss nan, time 2906.23ms
iter 153510: loss nan, time 121.22ms
iter 153520: loss nan, time 121.15ms
iter 153530: loss nan, time 121.12ms
iter 153540: loss nan, time 121.29ms
iter 153550: loss nan, time 120.75ms
iter 153560: loss nan, time 120.29ms
iter 153570: loss nan, time 118.93ms
iter 153580: loss nan, time 119.83ms
iter 153590: loss nan, time 119.99ms
tensor(0.1813)
iter 153600: loss nan, time 120.77ms
iter 153610: loss nan, time 119.26ms
iter 153620: loss nan, time 120.76ms
iter 153630: loss nan, time 119.65ms
iter 153640: loss nan, time 120.77ms
iter 153650: loss nan, time 119.25ms
iter 153660: loss nan, time 121.30ms
iter 153670: loss nan, time 122.99ms
iter 153680: loss nan, time 121.39ms
iter 153690: loss nan, time 121.24ms
tensor(0.1577)
iter 153700: loss nan, time 119.31ms
iter 153710: loss nan, time 121.43ms
iter 153720: loss nan, time 119.55ms
iter 153730: loss nan, time 119.75ms
iter 153740: loss nan, time 120.03ms
step 153750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 153750: loss nan, time 2903.72ms
iter 153760: loss nan, time 120.43ms
iter 153770: loss nan, time 119.96ms
iter 153780: loss nan, time 120.73ms
iter 153790: loss nan, time 120.20ms
tensor(0.1355)
iter 153800: loss nan, time 121.52ms
iter 153810: loss nan, time 120.91ms
iter 153820: loss nan, time 120.31ms
iter 153830: loss nan, time 122.88ms
iter 153840: loss nan, time 121.20ms
iter 153850: loss nan, time 121.22ms
iter 153860: loss nan, time 121.25ms
iter 153870: loss nan, time 121.30ms
iter 153880: loss nan, time 121.25ms
iter 153890: loss nan, time 119.79ms
tensor(0.1147)
iter 153900: loss nan, time 120.68ms
iter 153910: loss nan, time 120.31ms
iter 153920: loss nan, time 120.03ms
iter 153930: loss nan, time 121.49ms
iter 153940: loss nan, time 122.40ms
iter 153950: loss nan, time 120.68ms
iter 153960: loss nan, time 126.07ms
iter 153970: loss nan, time 120.88ms
iter 153980: loss nan, time 120.67ms
iter 153990: loss nan, time 121.21ms
tensor(0.0955)
step 154000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 154000: loss nan, time 2929.71ms
iter 154010: loss nan, time 122.57ms
iter 154020: loss nan, time 121.12ms
iter 154030: loss nan, time 121.17ms
iter 154040: loss nan, time 121.32ms
iter 154050: loss nan, time 121.48ms
iter 154060: loss nan, time 119.88ms
iter 154070: loss nan, time 120.29ms
iter 154080: loss nan, time 120.18ms
iter 154090: loss nan, time 120.30ms
tensor(0.0778)
iter 154100: loss nan, time 119.88ms
iter 154110: loss nan, time 119.76ms
iter 154120: loss nan, time 120.94ms
iter 154130: loss nan, time 122.03ms
iter 154140: loss nan, time 122.45ms
iter 154150: loss nan, time 119.30ms
iter 154160: loss nan, time 121.30ms
iter 154170: loss nan, time 121.12ms
iter 154180: loss nan, time 121.13ms
iter 154190: loss nan, time 121.32ms
tensor(0.0618)
iter 154200: loss nan, time 119.40ms
iter 154210: loss nan, time 119.03ms
iter 154220: loss nan, time 119.22ms
iter 154230: loss nan, time 120.29ms
iter 154240: loss nan, time 120.00ms
step 154250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 154250: loss nan, time 2881.58ms
iter 154260: loss nan, time 119.22ms
iter 154270: loss nan, time 119.09ms
iter 154280: loss nan, time 120.13ms
iter 154290: loss nan, time 120.33ms
tensor(0.0476)
iter 154300: loss nan, time 120.62ms
iter 154310: loss nan, time 120.24ms
iter 154320: loss nan, time 119.48ms
iter 154330: loss nan, time 121.14ms
iter 154340: loss nan, time 121.82ms
iter 154350: loss nan, time 122.82ms
iter 154360: loss nan, time 121.29ms
iter 154370: loss nan, time 121.24ms
iter 154380: loss nan, time 121.17ms
iter 154390: loss nan, time 121.01ms
tensor(0.0351)
iter 154400: loss nan, time 121.72ms
iter 154410: loss nan, time 119.43ms
iter 154420: loss nan, time 119.85ms
iter 154430: loss nan, time 120.93ms
iter 154440: loss nan, time 121.33ms
iter 154450: loss nan, time 120.45ms
iter 154460: loss nan, time 121.73ms
iter 154470: loss nan, time 122.46ms
iter 154480: loss nan, time 122.53ms
iter 154490: loss nan, time 121.53ms
tensor(0.0245)
step 154500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 154500: loss nan, time 2903.40ms
iter 154510: loss nan, time 119.04ms
iter 154520: loss nan, time 117.87ms
iter 154530: loss nan, time 115.98ms
iter 154540: loss nan, time 116.46ms
iter 154550: loss nan, time 118.66ms
iter 154560: loss nan, time 116.37ms
iter 154570: loss nan, time 115.15ms
iter 154580: loss nan, time 118.57ms
iter 154590: loss nan, time 116.29ms
tensor(0.0157)
iter 154600: loss nan, time 116.48ms
iter 154610: loss nan, time 117.41ms
iter 154620: loss nan, time 116.09ms
iter 154630: loss nan, time 115.40ms
iter 154640: loss nan, time 117.30ms
iter 154650: loss nan, time 116.39ms
iter 154660: loss nan, time 115.42ms
iter 154670: loss nan, time 117.76ms
iter 154680: loss nan, time 117.98ms
iter 154690: loss nan, time 116.48ms
tensor(0.0089)
iter 154700: loss nan, time 116.78ms
iter 154710: loss nan, time 118.81ms
iter 154720: loss nan, time 116.88ms
iter 154730: loss nan, time 117.20ms
iter 154740: loss nan, time 118.04ms
step 154750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 154750: loss nan, time 2925.59ms
iter 154760: loss nan, time 115.52ms
iter 154770: loss nan, time 117.17ms
iter 154780: loss nan, time 116.35ms
iter 154790: loss nan, time 114.11ms
tensor(0.0039)
iter 154800: loss nan, time 118.57ms
iter 154810: loss nan, time 116.13ms
iter 154820: loss nan, time 114.93ms
iter 154830: loss nan, time 118.39ms
iter 154840: loss nan, time 115.99ms
iter 154850: loss nan, time 116.91ms
iter 154860: loss nan, time 118.65ms
iter 154870: loss nan, time 115.08ms
iter 154880: loss nan, time 114.64ms
iter 154890: loss nan, time 116.88ms
tensor(0.0010)
iter 154900: loss nan, time 116.01ms
iter 154910: loss nan, time 117.11ms
iter 154920: loss nan, time 117.34ms
iter 154930: loss nan, time 115.39ms
iter 154940: loss nan, time 114.92ms
iter 154950: loss nan, time 118.39ms
iter 154960: loss nan, time 115.43ms
iter 154970: loss nan, time 117.13ms
iter 154980: loss nan, time 118.47ms
iter 154990: loss nan, time 116.47ms
tensor(0.0010)
step 155000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 155000: loss nan, time 2919.72ms
iter 155010: loss nan, time 116.28ms
iter 155020: loss nan, time 115.75ms
iter 155030: loss nan, time 117.14ms
iter 155040: loss nan, time 118.37ms
iter 155050: loss nan, time 116.54ms
iter 155060: loss nan, time 115.14ms
iter 155070: loss nan, time 115.93ms
iter 155080: loss nan, time 115.96ms
iter 155090: loss nan, time 116.02ms
tensor(0.0010)
iter 155100: loss nan, time 118.92ms
iter 155110: loss nan, time 115.89ms
iter 155120: loss nan, time 117.06ms
iter 155130: loss nan, time 116.09ms
iter 155140: loss nan, time 115.26ms
iter 155150: loss nan, time 116.90ms
iter 155160: loss nan, time 117.48ms
iter 155170: loss nan, time 115.18ms
iter 155180: loss nan, time 118.16ms
iter 155190: loss nan, time 115.08ms
tensor(0.0039)
iter 155200: loss nan, time 115.37ms
iter 155210: loss nan, time 117.03ms
iter 155220: loss nan, time 114.48ms
iter 155230: loss nan, time 116.92ms
iter 155240: loss nan, time 114.91ms
step 155250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 155250: loss nan, time 2906.88ms
iter 155260: loss nan, time 117.81ms
iter 155270: loss nan, time 113.64ms
iter 155280: loss nan, time 117.75ms
iter 155290: loss nan, time 116.59ms
tensor(0.0089)
iter 155300: loss nan, time 115.29ms
iter 155310: loss nan, time 117.91ms
iter 155320: loss nan, time 115.24ms
iter 155330: loss nan, time 114.65ms
iter 155340: loss nan, time 116.25ms
iter 155350: loss nan, time 113.90ms
iter 155360: loss nan, time 117.87ms
iter 155370: loss nan, time 114.87ms
iter 155380: loss nan, time 118.05ms
iter 155390: loss nan, time 115.86ms
tensor(0.0157)
iter 155400: loss nan, time 114.84ms
iter 155410: loss nan, time 116.20ms
iter 155420: loss nan, time 115.87ms
iter 155430: loss nan, time 116.89ms
iter 155440: loss nan, time 118.08ms
iter 155450: loss nan, time 115.11ms
iter 155460: loss nan, time 116.59ms
iter 155470: loss nan, time 115.34ms
iter 155480: loss nan, time 114.67ms
iter 155490: loss nan, time 118.30ms
tensor(0.0245)
step 155500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 155500: loss nan, time 2908.35ms
iter 155510: loss nan, time 117.04ms
iter 155520: loss nan, time 117.95ms
iter 155530: loss nan, time 114.97ms
iter 155540: loss nan, time 116.95ms
iter 155550: loss nan, time 117.14ms
iter 155560: loss nan, time 114.73ms
iter 155570: loss nan, time 117.29ms
iter 155580: loss nan, time 116.66ms
iter 155590: loss nan, time 114.85ms
tensor(0.0351)
iter 155600: loss nan, time 117.51ms
iter 155610: loss nan, time 116.05ms
iter 155620: loss nan, time 115.37ms
iter 155630: loss nan, time 118.23ms
iter 155640: loss nan, time 115.59ms
iter 155650: loss nan, time 114.80ms
iter 155660: loss nan, time 118.26ms
iter 155670: loss nan, time 114.66ms
iter 155680: loss nan, time 116.15ms
iter 155690: loss nan, time 115.80ms
tensor(0.0476)
iter 155700: loss nan, time 115.07ms
iter 155710: loss nan, time 115.44ms
iter 155720: loss nan, time 115.84ms
iter 155730: loss nan, time 117.25ms
iter 155740: loss nan, time 118.10ms
step 155750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 155750: loss nan, time 2898.05ms
iter 155760: loss nan, time 118.02ms
iter 155770: loss nan, time 115.56ms
iter 155780: loss nan, time 116.66ms
iter 155790: loss nan, time 117.75ms
tensor(0.0618)
iter 155800: loss nan, time 114.93ms
iter 155810: loss nan, time 116.79ms
iter 155820: loss nan, time 116.98ms
iter 155830: loss nan, time 114.74ms
iter 155840: loss nan, time 117.96ms
iter 155850: loss nan, time 115.95ms
iter 155860: loss nan, time 116.08ms
iter 155870: loss nan, time 118.18ms
iter 155880: loss nan, time 114.82ms
iter 155890: loss nan, time 115.85ms
tensor(0.0778)
iter 155900: loss nan, time 117.55ms
iter 155910: loss nan, time 114.67ms
iter 155920: loss nan, time 116.67ms
iter 155930: loss nan, time 115.79ms
iter 155940: loss nan, time 114.22ms
iter 155950: loss nan, time 117.89ms
iter 155960: loss nan, time 115.44ms
iter 155970: loss nan, time 114.58ms
iter 155980: loss nan, time 116.76ms
iter 155990: loss nan, time 114.49ms
tensor(0.0955)
step 156000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 156000: loss nan, time 2905.58ms
iter 156010: loss nan, time 114.34ms
iter 156020: loss nan, time 117.21ms
iter 156030: loss nan, time 116.33ms
iter 156040: loss nan, time 115.18ms
iter 156050: loss nan, time 117.92ms
iter 156060: loss nan, time 114.84ms
iter 156070: loss nan, time 116.71ms
iter 156080: loss nan, time 116.75ms
iter 156090: loss nan, time 114.84ms
tensor(0.1147)
iter 156100: loss nan, time 118.56ms
iter 156110: loss nan, time 115.95ms
iter 156120: loss nan, time 114.62ms
iter 156130: loss nan, time 118.02ms
iter 156140: loss nan, time 115.19ms
iter 156150: loss nan, time 116.78ms
iter 156160: loss nan, time 116.96ms
iter 156170: loss nan, time 114.56ms
iter 156180: loss nan, time 117.90ms
iter 156190: loss nan, time 115.83ms
tensor(0.1355)
iter 156200: loss nan, time 117.90ms
iter 156210: loss nan, time 117.93ms
iter 156220: loss nan, time 114.43ms
iter 156230: loss nan, time 115.18ms
iter 156240: loss nan, time 117.42ms
step 156250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 156250: loss nan, time 2920.57ms
iter 156260: loss nan, time 116.83ms
iter 156270: loss nan, time 116.57ms
iter 156280: loss nan, time 114.55ms
iter 156290: loss nan, time 118.08ms
tensor(0.1577)
iter 156300: loss nan, time 115.75ms
iter 156310: loss nan, time 116.73ms
iter 156320: loss nan, time 117.05ms
iter 156330: loss nan, time 115.01ms
iter 156340: loss nan, time 116.78ms
iter 156350: loss nan, time 116.71ms
iter 156360: loss nan, time 114.78ms
iter 156370: loss nan, time 117.84ms
iter 156380: loss nan, time 114.31ms
iter 156390: loss nan, time 114.79ms
tensor(0.1813)
iter 156400: loss nan, time 118.99ms
iter 156410: loss nan, time 115.52ms
iter 156420: loss nan, time 117.30ms
iter 156430: loss nan, time 117.91ms
iter 156440: loss nan, time 114.65ms
iter 156450: loss nan, time 117.18ms
iter 156460: loss nan, time 117.57ms
iter 156470: loss nan, time 114.49ms
iter 156480: loss nan, time 117.89ms
iter 156490: loss nan, time 116.74ms
tensor(0.2061)
step 156500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 156500: loss nan, time 2912.97ms
iter 156510: loss nan, time 118.01ms
iter 156520: loss nan, time 115.38ms
iter 156530: loss nan, time 116.86ms
iter 156540: loss nan, time 118.69ms
iter 156550: loss nan, time 114.89ms
iter 156560: loss nan, time 114.73ms
iter 156570: loss nan, time 117.91ms
iter 156580: loss nan, time 114.48ms
iter 156590: loss nan, time 117.17ms
tensor(0.2321)
iter 156600: loss nan, time 117.76ms
iter 156610: loss nan, time 115.03ms
iter 156620: loss nan, time 115.93ms
iter 156630: loss nan, time 116.53ms
iter 156640: loss nan, time 115.11ms
iter 156650: loss nan, time 118.35ms
iter 156660: loss nan, time 116.26ms
iter 156670: loss nan, time 115.19ms
iter 156680: loss nan, time 115.87ms
iter 156690: loss nan, time 115.80ms
tensor(0.2591)
iter 156700: loss nan, time 117.63ms
iter 156710: loss nan, time 118.05ms
iter 156720: loss nan, time 115.44ms
iter 156730: loss nan, time 117.11ms
iter 156740: loss nan, time 116.23ms
step 156750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 156750: loss nan, time 2908.48ms
iter 156760: loss nan, time 118.03ms
iter 156770: loss nan, time 114.89ms
iter 156780: loss nan, time 115.16ms
iter 156790: loss nan, time 118.43ms
tensor(0.2871)
iter 156800: loss nan, time 115.24ms
iter 156810: loss nan, time 117.11ms
iter 156820: loss nan, time 118.01ms
iter 156830: loss nan, time 114.69ms
iter 156840: loss nan, time 116.88ms
iter 156850: loss nan, time 117.61ms
iter 156860: loss nan, time 114.58ms
iter 156870: loss nan, time 118.08ms
iter 156880: loss nan, time 116.13ms
iter 156890: loss nan, time 114.15ms
tensor(0.3159)
iter 156900: loss nan, time 118.43ms
iter 156910: loss nan, time 115.65ms
iter 156920: loss nan, time 116.83ms
iter 156930: loss nan, time 118.78ms
iter 156940: loss nan, time 113.66ms
iter 156950: loss nan, time 118.54ms
iter 156960: loss nan, time 116.34ms
iter 156970: loss nan, time 114.66ms
iter 156980: loss nan, time 117.95ms
iter 156990: loss nan, time 115.95ms
tensor(0.3455)
step 157000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 157000: loss nan, time 2894.85ms
iter 157010: loss nan, time 115.91ms
iter 157020: loss nan, time 115.19ms
iter 157030: loss nan, time 116.79ms
iter 157040: loss nan, time 115.57ms
iter 157050: loss nan, time 117.44ms
iter 157060: loss nan, time 118.31ms
iter 157070: loss nan, time 115.41ms
iter 157080: loss nan, time 116.88ms
iter 157090: loss nan, time 117.35ms
tensor(0.3757)
iter 157100: loss nan, time 115.24ms
iter 157110: loss nan, time 116.73ms
iter 157120: loss nan, time 115.83ms
iter 157130: loss nan, time 114.95ms
iter 157140: loss nan, time 117.55ms
iter 157150: loss nan, time 114.62ms
iter 157160: loss nan, time 115.48ms
iter 157170: loss nan, time 118.39ms
iter 157180: loss nan, time 114.40ms
iter 157190: loss nan, time 116.59ms
tensor(0.4063)
iter 157200: loss nan, time 118.51ms
iter 157210: loss nan, time 114.87ms
iter 157220: loss nan, time 116.85ms
iter 157230: loss nan, time 117.74ms
iter 157240: loss nan, time 113.77ms
step 157250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 157250: loss nan, time 2906.56ms
iter 157260: loss nan, time 115.54ms
iter 157270: loss nan, time 114.58ms
iter 157280: loss nan, time 117.60ms
iter 157290: loss nan, time 114.07ms
tensor(0.4373)
iter 157300: loss nan, time 118.79ms
iter 157310: loss nan, time 116.33ms
iter 157320: loss nan, time 114.80ms
iter 157330: loss nan, time 116.28ms
iter 157340: loss nan, time 115.36ms
iter 157350: loss nan, time 116.03ms
iter 157360: loss nan, time 117.10ms
iter 157370: loss nan, time 114.46ms
iter 157380: loss nan, time 117.99ms
iter 157390: loss nan, time 115.86ms
tensor(0.4686)
iter 157400: loss nan, time 117.27ms
iter 157410: loss nan, time 117.02ms
iter 157420: loss nan, time 115.51ms
iter 157430: loss nan, time 117.22ms
iter 157440: loss nan, time 116.31ms
iter 157450: loss nan, time 114.55ms
iter 157460: loss nan, time 117.54ms
iter 157470: loss nan, time 115.07ms
iter 157480: loss nan, time 114.67ms
iter 157490: loss nan, time 118.44ms
tensor(0.5000)
step 157500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 157500: loss nan, time 2897.99ms
iter 157510: loss nan, time 117.93ms
iter 157520: loss nan, time 116.60ms
iter 157530: loss nan, time 114.60ms
iter 157540: loss nan, time 118.00ms
iter 157550: loss nan, time 115.79ms
iter 157560: loss nan, time 116.69ms
iter 157570: loss nan, time 118.64ms
iter 157580: loss nan, time 115.34ms
iter 157590: loss nan, time 114.67ms
tensor(0.5314)
iter 157600: loss nan, time 117.41ms
iter 157610: loss nan, time 114.69ms
iter 157620: loss nan, time 118.10ms
iter 157630: loss nan, time 116.23ms
iter 157640: loss nan, time 117.26ms
iter 157650: loss nan, time 115.89ms
iter 157660: loss nan, time 114.86ms
iter 157670: loss nan, time 116.81ms
iter 157680: loss nan, time 116.77ms
iter 157690: loss nan, time 114.57ms
tensor(0.5627)
iter 157700: loss nan, time 118.53ms
iter 157710: loss nan, time 115.84ms
iter 157720: loss nan, time 115.28ms
iter 157730: loss nan, time 116.74ms
iter 157740: loss nan, time 114.97ms
step 157750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 157750: loss nan, time 2909.43ms
iter 157760: loss nan, time 116.34ms
iter 157770: loss nan, time 115.13ms
iter 157780: loss nan, time 118.01ms
iter 157790: loss nan, time 115.79ms
tensor(0.5937)
iter 157800: loss nan, time 117.10ms
iter 157810: loss nan, time 117.35ms
iter 157820: loss nan, time 114.62ms
iter 157830: loss nan, time 117.88ms
iter 157840: loss nan, time 116.02ms
iter 157850: loss nan, time 114.62ms
iter 157860: loss nan, time 117.03ms
iter 157870: loss nan, time 114.54ms
iter 157880: loss nan, time 116.35ms
iter 157890: loss nan, time 116.74ms
tensor(0.6243)
iter 157900: loss nan, time 115.22ms
iter 157910: loss nan, time 116.33ms
iter 157920: loss nan, time 115.80ms
iter 157930: loss nan, time 116.66ms
iter 157940: loss nan, time 117.23ms
iter 157950: loss nan, time 115.16ms
iter 157960: loss nan, time 117.81ms
iter 157970: loss nan, time 116.11ms
iter 157980: loss nan, time 114.72ms
iter 157990: loss nan, time 116.55ms
tensor(0.6545)
step 158000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 158000: loss nan, time 2906.36ms
iter 158010: loss nan, time 117.76ms
iter 158020: loss nan, time 116.62ms
iter 158030: loss nan, time 115.21ms
iter 158040: loss nan, time 117.92ms
iter 158050: loss nan, time 116.04ms
iter 158060: loss nan, time 116.95ms
iter 158070: loss nan, time 118.01ms
iter 158080: loss nan, time 115.71ms
iter 158090: loss nan, time 116.70ms
tensor(0.6841)
iter 158100: loss nan, time 117.00ms
iter 158110: loss nan, time 115.67ms
iter 158120: loss nan, time 116.76ms
iter 158130: loss nan, time 115.84ms
iter 158140: loss nan, time 114.65ms
iter 158150: loss nan, time 118.05ms
iter 158160: loss nan, time 116.07ms
iter 158170: loss nan, time 113.93ms
iter 158180: loss nan, time 116.48ms
iter 158190: loss nan, time 115.83ms
tensor(0.7129)
iter 158200: loss nan, time 116.38ms
iter 158210: loss nan, time 116.53ms
iter 158220: loss nan, time 114.97ms
iter 158230: loss nan, time 113.68ms
iter 158240: loss nan, time 115.94ms
step 158250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 158250: loss nan, time 2912.83ms
iter 158260: loss nan, time 117.60ms
iter 158270: loss nan, time 116.30ms
iter 158280: loss nan, time 117.04ms
iter 158290: loss nan, time 117.07ms
tensor(0.7409)
iter 158300: loss nan, time 116.22ms
iter 158310: loss nan, time 116.88ms
iter 158320: loss nan, time 114.99ms
iter 158330: loss nan, time 115.26ms
iter 158340: loss nan, time 117.84ms
iter 158350: loss nan, time 116.34ms
iter 158360: loss nan, time 116.85ms
iter 158370: loss nan, time 117.94ms
iter 158380: loss nan, time 114.77ms
iter 158390: loss nan, time 117.00ms
tensor(0.7679)
iter 158400: loss nan, time 118.63ms
iter 158410: loss nan, time 116.10ms
iter 158420: loss nan, time 116.31ms
iter 158430: loss nan, time 116.81ms
iter 158440: loss nan, time 114.78ms
iter 158450: loss nan, time 116.75ms
iter 158460: loss nan, time 116.32ms
iter 158470: loss nan, time 116.03ms
iter 158480: loss nan, time 117.10ms
iter 158490: loss nan, time 115.91ms
tensor(0.7939)
step 158500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 158500: loss nan, time 2914.77ms
iter 158510: loss nan, time 117.66ms
iter 158520: loss nan, time 115.94ms
iter 158530: loss nan, time 116.07ms
iter 158540: loss nan, time 118.73ms
iter 158550: loss nan, time 115.84ms
iter 158560: loss nan, time 114.12ms
iter 158570: loss nan, time 118.17ms
iter 158580: loss nan, time 115.97ms
iter 158590: loss nan, time 116.39ms
tensor(0.8187)
iter 158600: loss nan, time 118.89ms
iter 158610: loss nan, time 116.06ms
iter 158620: loss nan, time 114.23ms
iter 158630: loss nan, time 116.56ms
iter 158640: loss nan, time 115.69ms
iter 158650: loss nan, time 116.24ms
iter 158660: loss nan, time 115.84ms
iter 158670: loss nan, time 116.24ms
iter 158680: loss nan, time 115.78ms
iter 158690: loss nan, time 115.92ms
tensor(0.8423)
iter 158700: loss nan, time 117.55ms
iter 158710: loss nan, time 116.74ms
iter 158720: loss nan, time 115.87ms
iter 158730: loss nan, time 116.74ms
iter 158740: loss nan, time 115.82ms
step 158750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 158750: loss nan, time 2896.63ms
iter 158760: loss nan, time 115.93ms
iter 158770: loss nan, time 114.69ms
iter 158780: loss nan, time 118.04ms
iter 158790: loss nan, time 115.94ms
tensor(0.8645)
iter 158800: loss nan, time 117.00ms
iter 158810: loss nan, time 117.22ms
iter 158820: loss nan, time 115.47ms
iter 158830: loss nan, time 116.54ms
iter 158840: loss nan, time 115.72ms
iter 158850: loss nan, time 116.20ms
iter 158860: loss nan, time 118.01ms
iter 158870: loss nan, time 115.14ms
iter 158880: loss nan, time 113.67ms
iter 158890: loss nan, time 116.68ms
tensor(0.8853)
iter 158900: loss nan, time 115.55ms
iter 158910: loss nan, time 116.45ms
iter 158920: loss nan, time 116.01ms
iter 158930: loss nan, time 116.76ms
iter 158940: loss nan, time 115.92ms
iter 158950: loss nan, time 116.14ms
iter 158960: loss nan, time 116.95ms
iter 158970: loss nan, time 117.97ms
iter 158980: loss nan, time 115.96ms
iter 158990: loss nan, time 116.76ms
tensor(0.9045)
step 159000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 159000: loss nan, time 2906.77ms
iter 159010: loss nan, time 115.28ms
iter 159020: loss nan, time 117.93ms
iter 159030: loss nan, time 114.55ms
iter 159040: loss nan, time 116.74ms
iter 159050: loss nan, time 118.40ms
iter 159060: loss nan, time 115.86ms
iter 159070: loss nan, time 117.06ms
iter 159080: loss nan, time 117.75ms
iter 159090: loss nan, time 114.64ms
tensor(0.9222)
iter 159100: loss nan, time 117.29ms
iter 159110: loss nan, time 116.73ms
iter 159120: loss nan, time 115.84ms
iter 159130: loss nan, time 116.72ms
iter 159140: loss nan, time 115.82ms
iter 159150: loss nan, time 115.04ms
iter 159160: loss nan, time 117.68ms
iter 159170: loss nan, time 115.85ms
iter 159180: loss nan, time 116.84ms
iter 159190: loss nan, time 119.19ms
tensor(0.9382)
iter 159200: loss nan, time 115.79ms
iter 159210: loss nan, time 117.02ms
iter 159220: loss nan, time 116.25ms
iter 159230: loss nan, time 115.86ms
iter 159240: loss nan, time 117.07ms
step 159250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 159250: loss nan, time 2906.74ms
iter 159260: loss nan, time 116.75ms
iter 159270: loss nan, time 116.02ms
iter 159280: loss nan, time 115.96ms
iter 159290: loss nan, time 115.86ms
tensor(0.9524)
iter 159300: loss nan, time 117.65ms
iter 159310: loss nan, time 115.57ms
iter 159320: loss nan, time 117.00ms
iter 159330: loss nan, time 115.85ms
iter 159340: loss nan, time 113.79ms
iter 159350: loss nan, time 115.90ms
iter 159360: loss nan, time 116.16ms
iter 159370: loss nan, time 117.14ms
iter 159380: loss nan, time 118.23ms
iter 159390: loss nan, time 116.21ms
tensor(0.9649)
iter 159400: loss nan, time 120.57ms
iter 159410: loss nan, time 120.13ms
iter 159420: loss nan, time 120.20ms
iter 159430: loss nan, time 120.69ms
iter 159440: loss nan, time 120.04ms
iter 159450: loss nan, time 121.97ms
iter 159460: loss nan, time 122.48ms
iter 159470: loss nan, time 122.13ms
iter 159480: loss nan, time 121.12ms
iter 159490: loss nan, time 119.07ms
tensor(0.9755)
step 159500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 159500: loss nan, time 2915.76ms
iter 159510: loss nan, time 120.82ms
iter 159520: loss nan, time 120.94ms
iter 159530: loss nan, time 121.28ms
iter 159540: loss nan, time 120.52ms
iter 159550: loss nan, time 118.79ms
iter 159560: loss nan, time 121.19ms
iter 159570: loss nan, time 119.03ms
iter 159580: loss nan, time 119.32ms
iter 159590: loss nan, time 119.24ms
tensor(0.9843)
iter 159600: loss nan, time 119.71ms
iter 159610: loss nan, time 118.05ms
iter 159620: loss nan, time 119.94ms
iter 159630: loss nan, time 119.93ms
iter 159640: loss nan, time 119.96ms
iter 159650: loss nan, time 120.11ms
iter 159660: loss nan, time 118.73ms
iter 159670: loss nan, time 119.84ms
iter 159680: loss nan, time 119.87ms
iter 159690: loss nan, time 119.86ms
tensor(0.9911)
iter 159700: loss nan, time 120.20ms
iter 159710: loss nan, time 118.83ms
iter 159720: loss nan, time 120.53ms
iter 159730: loss nan, time 120.58ms
iter 159740: loss nan, time 121.04ms
step 159750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 159750: loss nan, time 2908.21ms
iter 159760: loss nan, time 120.26ms
iter 159770: loss nan, time 119.36ms
iter 159780: loss nan, time 120.40ms
iter 159790: loss nan, time 120.73ms
tensor(0.9961)
iter 159800: loss nan, time 120.96ms
iter 159810: loss nan, time 120.52ms
iter 159820: loss nan, time 119.89ms
iter 159830: loss nan, time 120.60ms
iter 159840: loss nan, time 122.49ms
iter 159850: loss nan, time 122.10ms
iter 159860: loss nan, time 122.16ms
iter 159870: loss nan, time 120.91ms
iter 159880: loss nan, time 122.16ms
iter 159890: loss nan, time 122.30ms
tensor(0.9990)
iter 159900: loss nan, time 121.29ms
iter 159910: loss nan, time 121.07ms
iter 159920: loss nan, time 120.80ms
iter 159930: loss nan, time 119.80ms
iter 159940: loss nan, time 120.77ms
iter 159950: loss nan, time 120.83ms
iter 159960: loss nan, time 120.96ms
iter 159970: loss nan, time 120.83ms
iter 159980: loss nan, time 120.90ms
iter 159990: loss nan, time 121.00ms
tensor(1.)
step 160000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 160000: loss nan, time 2918.12ms
iter 160010: loss nan, time 120.77ms
iter 160020: loss nan, time 119.02ms
iter 160030: loss nan, time 118.72ms
iter 160040: loss nan, time 118.56ms
iter 160050: loss nan, time 118.95ms
iter 160060: loss nan, time 119.36ms
iter 160070: loss nan, time 119.74ms
iter 160080: loss nan, time 120.09ms
iter 160090: loss nan, time 120.13ms
tensor(0.9990)
iter 160100: loss nan, time 120.25ms
iter 160110: loss nan, time 119.98ms
iter 160120: loss nan, time 120.06ms
iter 160130: loss nan, time 120.00ms
iter 160140: loss nan, time 120.08ms
iter 160150: loss nan, time 120.27ms
iter 160160: loss nan, time 119.58ms
iter 160170: loss nan, time 120.58ms
iter 160180: loss nan, time 120.21ms
iter 160190: loss nan, time 119.86ms
tensor(0.9961)
iter 160200: loss nan, time 121.66ms
iter 160210: loss nan, time 119.92ms
iter 160220: loss nan, time 122.29ms
iter 160230: loss nan, time 122.02ms
iter 160240: loss nan, time 121.53ms
step 160250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 160250: loss nan, time 2899.81ms
iter 160260: loss nan, time 122.20ms
iter 160270: loss nan, time 120.05ms
iter 160280: loss nan, time 122.10ms
iter 160290: loss nan, time 122.47ms
tensor(0.9911)
iter 160300: loss nan, time 121.96ms
iter 160310: loss nan, time 120.96ms
iter 160320: loss nan, time 118.90ms
iter 160330: loss nan, time 120.90ms
iter 160340: loss nan, time 120.88ms
iter 160350: loss nan, time 121.37ms
iter 160360: loss nan, time 121.04ms
iter 160370: loss nan, time 118.82ms
iter 160380: loss nan, time 121.12ms
iter 160390: loss nan, time 121.14ms
tensor(0.9843)
iter 160400: loss nan, time 119.30ms
iter 160410: loss nan, time 119.23ms
iter 160420: loss nan, time 118.87ms
iter 160430: loss nan, time 118.87ms
iter 160440: loss nan, time 119.53ms
iter 160450: loss nan, time 120.01ms
iter 160460: loss nan, time 119.90ms
iter 160470: loss nan, time 120.02ms
iter 160480: loss nan, time 119.17ms
iter 160490: loss nan, time 119.97ms
tensor(0.9755)
step 160500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 160500: loss nan, time 2903.36ms
iter 160510: loss nan, time 120.28ms
iter 160520: loss nan, time 120.05ms
iter 160530: loss nan, time 120.11ms
iter 160540: loss nan, time 118.80ms
iter 160550: loss nan, time 119.82ms
iter 160560: loss nan, time 120.70ms
iter 160570: loss nan, time 119.94ms
iter 160580: loss nan, time 119.83ms
iter 160590: loss nan, time 118.74ms
tensor(0.9649)
iter 160600: loss nan, time 120.36ms
iter 160610: loss nan, time 120.50ms
iter 160620: loss nan, time 121.29ms
iter 160630: loss nan, time 122.16ms
iter 160640: loss nan, time 120.72ms
iter 160650: loss nan, time 122.05ms
iter 160660: loss nan, time 122.30ms
iter 160670: loss nan, time 122.15ms
iter 160680: loss nan, time 121.95ms
iter 160690: loss nan, time 120.93ms
tensor(0.9524)
iter 160700: loss nan, time 121.20ms
iter 160710: loss nan, time 120.84ms
iter 160720: loss nan, time 120.72ms
iter 160730: loss nan, time 120.68ms
iter 160740: loss nan, time 120.80ms
step 160750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 160750: loss nan, time 2906.41ms
iter 160760: loss nan, time 120.92ms
iter 160770: loss nan, time 121.08ms
iter 160780: loss nan, time 121.08ms
iter 160790: loss nan, time 121.12ms
tensor(0.9382)
iter 160800: loss nan, time 121.04ms
iter 160810: loss nan, time 121.31ms
iter 160820: loss nan, time 118.65ms
iter 160830: loss nan, time 118.69ms
iter 160840: loss nan, time 118.41ms
iter 160850: loss nan, time 119.16ms
iter 160860: loss nan, time 119.39ms
iter 160870: loss nan, time 119.63ms
iter 160880: loss nan, time 120.37ms
iter 160890: loss nan, time 119.91ms
tensor(0.9222)
iter 160900: loss nan, time 120.52ms
iter 160910: loss nan, time 120.10ms
iter 160920: loss nan, time 120.08ms
iter 160930: loss nan, time 119.90ms
iter 160940: loss nan, time 120.18ms
iter 160950: loss nan, time 119.91ms
iter 160960: loss nan, time 120.55ms
iter 160970: loss nan, time 121.02ms
iter 160980: loss nan, time 119.80ms
iter 160990: loss nan, time 119.95ms
tensor(0.9045)
step 161000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 161000: loss nan, time 2907.36ms
iter 161010: loss nan, time 119.72ms
iter 161020: loss nan, time 120.12ms
iter 161030: loss nan, time 120.32ms
iter 161040: loss nan, time 119.88ms
iter 161050: loss nan, time 120.44ms
iter 161060: loss nan, time 120.20ms
iter 161070: loss nan, time 120.95ms
iter 161080: loss nan, time 121.31ms
iter 161090: loss nan, time 119.84ms
tensor(0.8853)
iter 161100: loss nan, time 120.25ms
iter 161110: loss nan, time 121.18ms
iter 161120: loss nan, time 121.81ms
iter 161130: loss nan, time 122.11ms
iter 161140: loss nan, time 119.07ms
iter 161150: loss nan, time 121.13ms
iter 161160: loss nan, time 121.13ms
iter 161170: loss nan, time 121.01ms
iter 161180: loss nan, time 120.62ms
iter 161190: loss nan, time 118.76ms
tensor(0.8645)
iter 161200: loss nan, time 121.16ms
iter 161210: loss nan, time 119.88ms
iter 161220: loss nan, time 120.79ms
iter 161230: loss nan, time 120.61ms
iter 161240: loss nan, time 118.56ms
step 161250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 161250: loss nan, time 2902.95ms
iter 161260: loss nan, time 117.46ms
iter 161270: loss nan, time 116.65ms
iter 161280: loss nan, time 115.66ms
iter 161290: loss nan, time 117.07ms
tensor(0.8423)
iter 161300: loss nan, time 116.68ms
iter 161310: loss nan, time 114.89ms
iter 161320: loss nan, time 116.95ms
iter 161330: loss nan, time 115.83ms
iter 161340: loss nan, time 117.04ms
iter 161350: loss nan, time 118.25ms
iter 161360: loss nan, time 115.88ms
iter 161370: loss nan, time 113.94ms
iter 161380: loss nan, time 117.97ms
iter 161390: loss nan, time 115.54ms
tensor(0.8187)
iter 161400: loss nan, time 116.89ms
iter 161410: loss nan, time 118.06ms
iter 161420: loss nan, time 114.69ms
iter 161430: loss nan, time 115.85ms
iter 161440: loss nan, time 115.33ms
iter 161450: loss nan, time 115.19ms
iter 161460: loss nan, time 118.05ms
iter 161470: loss nan, time 115.22ms
iter 161480: loss nan, time 116.62ms
iter 161490: loss nan, time 115.95ms
tensor(0.7939)
step 161500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 161500: loss nan, time 2881.97ms
iter 161510: loss nan, time 116.26ms
iter 161520: loss nan, time 114.94ms
iter 161530: loss nan, time 118.07ms
iter 161540: loss nan, time 116.05ms
iter 161550: loss nan, time 116.96ms
iter 161560: loss nan, time 117.99ms
iter 161570: loss nan, time 115.42ms
iter 161580: loss nan, time 116.84ms
iter 161590: loss nan, time 120.02ms
tensor(0.7679)
iter 161600: loss nan, time 115.16ms
iter 161610: loss nan, time 116.89ms
iter 161620: loss nan, time 118.76ms
iter 161630: loss nan, time 114.67ms
iter 161640: loss nan, time 117.19ms
iter 161650: loss nan, time 116.52ms
iter 161660: loss nan, time 114.67ms
iter 161670: loss nan, time 117.99ms
iter 161680: loss nan, time 116.16ms
iter 161690: loss nan, time 114.70ms
tensor(0.7409)
iter 161700: loss nan, time 118.92ms
iter 161710: loss nan, time 115.39ms
iter 161720: loss nan, time 116.90ms
iter 161730: loss nan, time 118.04ms
iter 161740: loss nan, time 114.93ms
step 161750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 161750: loss nan, time 2922.73ms
iter 161760: loss nan, time 116.07ms
iter 161770: loss nan, time 114.61ms
iter 161780: loss nan, time 116.69ms
iter 161790: loss nan, time 116.22ms
tensor(0.7129)
iter 161800: loss nan, time 115.19ms
iter 161810: loss nan, time 117.89ms
iter 161820: loss nan, time 115.50ms
iter 161830: loss nan, time 115.15ms
iter 161840: loss nan, time 116.82ms
iter 161850: loss nan, time 115.02ms
iter 161860: loss nan, time 116.75ms
iter 161870: loss nan, time 116.99ms
iter 161880: loss nan, time 114.58ms
iter 161890: loss nan, time 118.01ms
tensor(0.6841)
iter 161900: loss nan, time 115.71ms
iter 161910: loss nan, time 115.20ms
iter 161920: loss nan, time 118.18ms
iter 161930: loss nan, time 115.85ms
iter 161940: loss nan, time 116.95ms
iter 161950: loss nan, time 117.90ms
iter 161960: loss nan, time 114.69ms
iter 161970: loss nan, time 116.97ms
iter 161980: loss nan, time 116.90ms
iter 161990: loss nan, time 114.60ms
tensor(0.6545)
step 162000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 162000: loss nan, time 2917.97ms
iter 162010: loss nan, time 116.74ms
iter 162020: loss nan, time 114.68ms
iter 162030: loss nan, time 118.31ms
iter 162040: loss nan, time 115.69ms
iter 162050: loss nan, time 114.68ms
iter 162060: loss nan, time 118.10ms
iter 162070: loss nan, time 116.05ms
iter 162080: loss nan, time 114.78ms
iter 162090: loss nan, time 117.05ms
tensor(0.6243)
iter 162100: loss nan, time 115.55ms
iter 162110: loss nan, time 118.10ms
iter 162120: loss nan, time 116.49ms
iter 162130: loss nan, time 115.18ms
iter 162140: loss nan, time 115.91ms
iter 162150: loss nan, time 116.06ms
iter 162160: loss nan, time 115.00ms
iter 162170: loss nan, time 118.13ms
iter 162180: loss nan, time 116.14ms
iter 162190: loss nan, time 116.32ms
tensor(0.5937)
iter 162200: loss nan, time 116.46ms
iter 162210: loss nan, time 115.47ms
iter 162220: loss nan, time 116.75ms
iter 162230: loss nan, time 118.74ms
iter 162240: loss nan, time 115.10ms
step 162250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 162250: loss nan, time 2920.68ms
iter 162260: loss nan, time 117.46ms
iter 162270: loss nan, time 114.71ms
iter 162280: loss nan, time 118.01ms
iter 162290: loss nan, time 115.39ms
tensor(0.5627)
iter 162300: loss nan, time 115.05ms
iter 162310: loss nan, time 118.21ms
iter 162320: loss nan, time 116.38ms
iter 162330: loss nan, time 115.07ms
iter 162340: loss nan, time 117.95ms
iter 162350: loss nan, time 115.18ms
iter 162360: loss nan, time 116.83ms
iter 162370: loss nan, time 118.42ms
iter 162380: loss nan, time 114.75ms
iter 162390: loss nan, time 117.54ms
tensor(0.5314)
iter 162400: loss nan, time 116.34ms
iter 162410: loss nan, time 115.35ms
iter 162420: loss nan, time 118.34ms
iter 162430: loss nan, time 113.77ms
iter 162440: loss nan, time 116.27ms
iter 162450: loss nan, time 117.15ms
iter 162460: loss nan, time 114.03ms
iter 162470: loss nan, time 117.98ms
iter 162480: loss nan, time 115.87ms
iter 162490: loss nan, time 114.59ms
tensor(0.5000)
step 162500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 162500: loss nan, time 2877.10ms
iter 162510: loss nan, time 116.92ms
iter 162520: loss nan, time 117.67ms
iter 162530: loss nan, time 114.59ms
iter 162540: loss nan, time 117.23ms
iter 162550: loss nan, time 115.80ms
iter 162560: loss nan, time 114.75ms
iter 162570: loss nan, time 118.37ms
iter 162580: loss nan, time 116.23ms
iter 162590: loss nan, time 115.13ms
tensor(0.4686)
iter 162600: loss nan, time 118.56ms
iter 162610: loss nan, time 114.75ms
iter 162620: loss nan, time 117.03ms
iter 162630: loss nan, time 118.54ms
iter 162640: loss nan, time 115.93ms
iter 162650: loss nan, time 116.96ms
iter 162660: loss nan, time 118.10ms
iter 162670: loss nan, time 115.18ms
iter 162680: loss nan, time 117.03ms
iter 162690: loss nan, time 118.10ms
tensor(0.4373)
iter 162700: loss nan, time 115.76ms
iter 162710: loss nan, time 116.83ms
iter 162720: loss nan, time 117.58ms
iter 162730: loss nan, time 114.75ms
iter 162740: loss nan, time 117.15ms
step 162750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 162750: loss nan, time 2904.59ms
iter 162760: loss nan, time 116.06ms
iter 162770: loss nan, time 114.34ms
iter 162780: loss nan, time 116.04ms
iter 162790: loss nan, time 114.06ms
tensor(0.4063)
iter 162800: loss nan, time 117.13ms
iter 162810: loss nan, time 112.71ms
iter 162820: loss nan, time 117.69ms
iter 162830: loss nan, time 115.46ms
iter 162840: loss nan, time 116.82ms
iter 162850: loss nan, time 117.81ms
iter 162860: loss nan, time 114.70ms
iter 162870: loss nan, time 117.98ms
iter 162880: loss nan, time 115.87ms
iter 162890: loss nan, time 115.05ms
tensor(0.3757)
iter 162900: loss nan, time 118.64ms
iter 162910: loss nan, time 115.28ms
iter 162920: loss nan, time 114.96ms
iter 162930: loss nan, time 118.08ms
iter 162940: loss nan, time 114.58ms
iter 162950: loss nan, time 116.87ms
iter 162960: loss nan, time 117.03ms
iter 162970: loss nan, time 114.57ms
iter 162980: loss nan, time 115.75ms
iter 162990: loss nan, time 115.14ms
tensor(0.3455)
step 163000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 163000: loss nan, time 2907.25ms
iter 163010: loss nan, time 116.38ms
iter 163020: loss nan, time 114.60ms
iter 163030: loss nan, time 118.35ms
iter 163040: loss nan, time 115.79ms
iter 163050: loss nan, time 115.78ms
iter 163060: loss nan, time 117.89ms
iter 163070: loss nan, time 115.10ms
iter 163080: loss nan, time 116.72ms
iter 163090: loss nan, time 117.07ms
tensor(0.3159)
iter 163100: loss nan, time 115.15ms
iter 163110: loss nan, time 117.89ms
iter 163120: loss nan, time 116.13ms
iter 163130: loss nan, time 114.79ms
iter 163140: loss nan, time 117.99ms
iter 163150: loss nan, time 115.30ms
iter 163160: loss nan, time 116.81ms
iter 163170: loss nan, time 117.81ms
iter 163180: loss nan, time 115.14ms
iter 163190: loss nan, time 117.96ms
tensor(0.2871)
iter 163200: loss nan, time 116.85ms
iter 163210: loss nan, time 115.26ms
iter 163220: loss nan, time 117.97ms
iter 163230: loss nan, time 116.17ms
iter 163240: loss nan, time 114.51ms
step 163250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 163250: loss nan, time 2922.23ms
iter 163260: loss nan, time 115.63ms
iter 163270: loss nan, time 116.06ms
iter 163280: loss nan, time 117.98ms
iter 163290: loss nan, time 114.91ms
tensor(0.2591)
iter 163300: loss nan, time 117.13ms
iter 163310: loss nan, time 115.82ms
iter 163320: loss nan, time 114.57ms
iter 163330: loss nan, time 116.37ms
iter 163340: loss nan, time 116.09ms
iter 163350: loss nan, time 115.36ms
iter 163360: loss nan, time 117.93ms
iter 163370: loss nan, time 115.28ms
iter 163380: loss nan, time 116.78ms
iter 163390: loss nan, time 116.16ms
tensor(0.2321)
iter 163400: loss nan, time 115.34ms
iter 163410: loss nan, time 117.91ms
iter 163420: loss nan, time 115.98ms
iter 163430: loss nan, time 114.72ms
iter 163440: loss nan, time 117.69ms
iter 163450: loss nan, time 114.49ms
iter 163460: loss nan, time 116.71ms
iter 163470: loss nan, time 116.95ms
iter 163480: loss nan, time 114.44ms
iter 163490: loss nan, time 117.91ms
tensor(0.2061)
step 163500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 163500: loss nan, time 2906.52ms
iter 163510: loss nan, time 114.29ms
iter 163520: loss nan, time 117.40ms
iter 163530: loss nan, time 114.53ms
iter 163540: loss nan, time 118.37ms
iter 163550: loss nan, time 116.05ms
iter 163560: loss nan, time 115.81ms
iter 163570: loss nan, time 115.66ms
iter 163580: loss nan, time 114.39ms
iter 163590: loss nan, time 116.71ms
tensor(0.1813)
iter 163600: loss nan, time 115.95ms
iter 163610: loss nan, time 114.55ms
iter 163620: loss nan, time 117.86ms
iter 163630: loss nan, time 115.07ms
iter 163640: loss nan, time 116.63ms
iter 163650: loss nan, time 115.29ms
iter 163660: loss nan, time 113.94ms
iter 163670: loss nan, time 118.30ms
iter 163680: loss nan, time 115.46ms
iter 163690: loss nan, time 116.77ms
tensor(0.1577)
iter 163700: loss nan, time 118.29ms
iter 163710: loss nan, time 114.98ms
iter 163720: loss nan, time 117.10ms
iter 163730: loss nan, time 116.99ms
iter 163740: loss nan, time 114.71ms
step 163750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 163750: loss nan, time 2922.38ms
iter 163760: loss nan, time 116.98ms
iter 163770: loss nan, time 114.65ms
iter 163780: loss nan, time 118.02ms
iter 163790: loss nan, time 116.29ms
tensor(0.1355)
iter 163800: loss nan, time 115.04ms
iter 163810: loss nan, time 118.12ms
iter 163820: loss nan, time 115.32ms
iter 163830: loss nan, time 114.73ms
iter 163840: loss nan, time 118.25ms
iter 163850: loss nan, time 114.66ms
iter 163860: loss nan, time 117.10ms
iter 163870: loss nan, time 116.28ms
iter 163880: loss nan, time 114.92ms
iter 163890: loss nan, time 115.95ms
tensor(0.1147)
iter 163900: loss nan, time 116.48ms
iter 163910: loss nan, time 115.11ms
iter 163920: loss nan, time 118.41ms
iter 163930: loss nan, time 116.96ms
iter 163940: loss nan, time 115.06ms
iter 163950: loss nan, time 116.05ms
iter 163960: loss nan, time 115.41ms
iter 163970: loss nan, time 117.26ms
iter 163980: loss nan, time 118.21ms
iter 163990: loss nan, time 115.23ms
tensor(0.0955)
step 164000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 164000: loss nan, time 2901.09ms
iter 164010: loss nan, time 116.44ms
iter 164020: loss nan, time 114.89ms
iter 164030: loss nan, time 117.91ms
iter 164040: loss nan, time 115.03ms
iter 164050: loss nan, time 116.45ms
iter 164060: loss nan, time 118.43ms
iter 164070: loss nan, time 115.87ms
iter 164080: loss nan, time 116.85ms
iter 164090: loss nan, time 118.25ms
tensor(0.0778)
iter 164100: loss nan, time 115.35ms
iter 164110: loss nan, time 117.00ms
iter 164120: loss nan, time 118.45ms
iter 164130: loss nan, time 115.95ms
iter 164140: loss nan, time 117.05ms
iter 164150: loss nan, time 117.95ms
iter 164160: loss nan, time 114.57ms
iter 164170: loss nan, time 117.19ms
iter 164180: loss nan, time 118.99ms
iter 164190: loss nan, time 115.54ms
tensor(0.0618)
iter 164200: loss nan, time 116.84ms
iter 164210: loss nan, time 117.91ms
iter 164220: loss nan, time 114.55ms
iter 164230: loss nan, time 116.64ms
iter 164240: loss nan, time 121.56ms
step 164250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 164250: loss nan, time 2903.63ms
iter 164260: loss nan, time 120.06ms
iter 164270: loss nan, time 120.05ms
iter 164280: loss nan, time 121.16ms
iter 164290: loss nan, time 121.32ms
tensor(0.0476)
iter 164300: loss nan, time 121.23ms
iter 164310: loss nan, time 122.52ms
iter 164320: loss nan, time 120.01ms
iter 164330: loss nan, time 122.20ms
iter 164340: loss nan, time 122.66ms
iter 164350: loss nan, time 122.17ms
iter 164360: loss nan, time 120.90ms
iter 164370: loss nan, time 119.48ms
iter 164380: loss nan, time 121.06ms
iter 164390: loss nan, time 119.25ms
tensor(0.0351)
iter 164400: loss nan, time 119.32ms
iter 164410: loss nan, time 118.97ms
iter 164420: loss nan, time 118.57ms
iter 164430: loss nan, time 117.00ms
iter 164440: loss nan, time 117.21ms
iter 164450: loss nan, time 116.29ms
iter 164460: loss nan, time 114.37ms
iter 164470: loss nan, time 116.44ms
iter 164480: loss nan, time 115.39ms
iter 164490: loss nan, time 116.07ms
tensor(0.0245)
step 164500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 164500: loss nan, time 2863.04ms
iter 164510: loss nan, time 120.27ms
iter 164520: loss nan, time 121.03ms
iter 164530: loss nan, time 118.28ms
iter 164540: loss nan, time 121.02ms
iter 164550: loss nan, time 118.59ms
iter 164560: loss nan, time 119.35ms
iter 164570: loss nan, time 118.73ms
iter 164580: loss nan, time 119.32ms
iter 164590: loss nan, time 119.02ms
tensor(0.0157)
iter 164600: loss nan, time 120.54ms
iter 164610: loss nan, time 118.60ms
iter 164620: loss nan, time 118.89ms
iter 164630: loss nan, time 115.35ms
iter 164640: loss nan, time 117.06ms
iter 164650: loss nan, time 116.39ms
iter 164660: loss nan, time 115.95ms
iter 164670: loss nan, time 116.67ms
iter 164680: loss nan, time 116.31ms
iter 164690: loss nan, time 115.67ms
tensor(0.0089)
iter 164700: loss nan, time 117.49ms
iter 164710: loss nan, time 116.24ms
iter 164720: loss nan, time 115.96ms
iter 164730: loss nan, time 117.16ms
iter 164740: loss nan, time 117.21ms
step 164750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 164750: loss nan, time 2907.69ms
iter 164760: loss nan, time 116.58ms
iter 164770: loss nan, time 114.76ms
iter 164780: loss nan, time 116.51ms
iter 164790: loss nan, time 117.40ms
tensor(0.0039)
iter 164800: loss nan, time 114.87ms
iter 164810: loss nan, time 116.73ms
iter 164820: loss nan, time 115.74ms
iter 164830: loss nan, time 116.30ms
iter 164840: loss nan, time 117.49ms
iter 164850: loss nan, time 115.37ms
iter 164860: loss nan, time 115.78ms
iter 164870: loss nan, time 115.83ms
iter 164880: loss nan, time 116.82ms
iter 164890: loss nan, time 118.14ms
tensor(0.0010)
iter 164900: loss nan, time 116.05ms
iter 164910: loss nan, time 115.03ms
iter 164920: loss nan, time 116.18ms
iter 164930: loss nan, time 115.19ms
iter 164940: loss nan, time 118.17ms
iter 164950: loss nan, time 115.30ms
iter 164960: loss nan, time 117.00ms
iter 164970: loss nan, time 115.81ms
iter 164980: loss nan, time 116.02ms
iter 164990: loss nan, time 116.91ms
tensor(0.0010)
step 165000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 165000: loss nan, time 2895.54ms
iter 165010: loss nan, time 116.60ms
iter 165020: loss nan, time 116.49ms
iter 165030: loss nan, time 115.62ms
iter 165040: loss nan, time 116.80ms
iter 165050: loss nan, time 115.77ms
iter 165060: loss nan, time 115.99ms
iter 165070: loss nan, time 118.10ms
iter 165080: loss nan, time 115.84ms
iter 165090: loss nan, time 116.69ms
tensor(0.0010)
iter 165100: loss nan, time 117.24ms
iter 165110: loss nan, time 115.58ms
iter 165120: loss nan, time 116.72ms
iter 165130: loss nan, time 116.33ms
iter 165140: loss nan, time 114.77ms
iter 165150: loss nan, time 118.13ms
iter 165160: loss nan, time 115.66ms
iter 165170: loss nan, time 114.47ms
iter 165180: loss nan, time 116.88ms
iter 165190: loss nan, time 115.09ms
tensor(0.0039)
iter 165200: loss nan, time 117.01ms
iter 165210: loss nan, time 115.74ms
iter 165220: loss nan, time 116.56ms
iter 165230: loss nan, time 115.92ms
iter 165240: loss nan, time 115.58ms
step 165250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 165250: loss nan, time 2901.88ms
iter 165260: loss nan, time 114.57ms
iter 165270: loss nan, time 116.48ms
iter 165280: loss nan, time 116.53ms
iter 165290: loss nan, time 115.15ms
tensor(0.0089)
iter 165300: loss nan, time 116.86ms
iter 165310: loss nan, time 115.60ms
iter 165320: loss nan, time 116.65ms
iter 165330: loss nan, time 118.10ms
iter 165340: loss nan, time 115.67ms
iter 165350: loss nan, time 116.65ms
iter 165360: loss nan, time 116.28ms
iter 165370: loss nan, time 115.37ms
iter 165380: loss nan, time 116.56ms
iter 165390: loss nan, time 116.81ms
tensor(0.0157)
iter 165400: loss nan, time 117.04ms
iter 165410: loss nan, time 118.11ms
iter 165420: loss nan, time 115.93ms
iter 165430: loss nan, time 114.58ms
iter 165440: loss nan, time 116.88ms
iter 165450: loss nan, time 115.80ms
iter 165460: loss nan, time 116.68ms
iter 165470: loss nan, time 116.76ms
iter 165480: loss nan, time 115.55ms
iter 165490: loss nan, time 114.64ms
tensor(0.0245)
step 165500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 165500: loss nan, time 2912.10ms
iter 165510: loss nan, time 116.79ms
iter 165520: loss nan, time 116.86ms
iter 165530: loss nan, time 115.98ms
iter 165540: loss nan, time 117.47ms
iter 165550: loss nan, time 116.57ms
iter 165560: loss nan, time 116.51ms
iter 165570: loss nan, time 117.22ms
iter 165580: loss nan, time 116.10ms
iter 165590: loss nan, time 115.92ms
tensor(0.0351)
iter 165600: loss nan, time 117.59ms
iter 165610: loss nan, time 116.95ms
iter 165620: loss nan, time 115.64ms
iter 165630: loss nan, time 116.88ms
iter 165640: loss nan, time 114.99ms
iter 165650: loss nan, time 115.64ms
iter 165660: loss nan, time 116.85ms
iter 165670: loss nan, time 115.11ms
iter 165680: loss nan, time 114.78ms
iter 165690: loss nan, time 116.88ms
tensor(0.0476)
iter 165700: loss nan, time 114.98ms
iter 165710: loss nan, time 117.21ms
iter 165720: loss nan, time 118.09ms
iter 165730: loss nan, time 114.89ms
iter 165740: loss nan, time 116.72ms
step 165750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 165750: loss nan, time 2914.93ms
iter 165760: loss nan, time 115.09ms
iter 165770: loss nan, time 116.61ms
iter 165780: loss nan, time 115.77ms
iter 165790: loss nan, time 116.90ms
tensor(0.0618)
iter 165800: loss nan, time 118.19ms
iter 165810: loss nan, time 115.77ms
iter 165820: loss nan, time 114.53ms
iter 165830: loss nan, time 116.54ms
iter 165840: loss nan, time 114.67ms
iter 165850: loss nan, time 118.02ms
iter 165860: loss nan, time 115.78ms
iter 165870: loss nan, time 116.73ms
iter 165880: loss nan, time 116.07ms
iter 165890: loss nan, time 115.34ms
tensor(0.0778)
iter 165900: loss nan, time 117.35ms
iter 165910: loss nan, time 115.93ms
iter 165920: loss nan, time 114.97ms
iter 165930: loss nan, time 116.87ms
iter 165940: loss nan, time 115.67ms
iter 165950: loss nan, time 116.65ms
iter 165960: loss nan, time 116.54ms
iter 165970: loss nan, time 116.13ms
iter 165980: loss nan, time 116.99ms
iter 165990: loss nan, time 115.74ms
tensor(0.0955)
step 166000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 166000: loss nan, time 2909.46ms
iter 166010: loss nan, time 117.87ms
iter 166020: loss nan, time 116.29ms
iter 166030: loss nan, time 116.66ms
iter 166040: loss nan, time 116.61ms
iter 166050: loss nan, time 115.59ms
iter 166060: loss nan, time 116.58ms
iter 166070: loss nan, time 115.69ms
iter 166080: loss nan, time 114.62ms
iter 166090: loss nan, time 117.94ms
tensor(0.1147)
iter 166100: loss nan, time 116.35ms
iter 166110: loss nan, time 117.12ms
iter 166120: loss nan, time 116.89ms
iter 166130: loss nan, time 115.53ms
iter 166140: loss nan, time 115.02ms
iter 166150: loss nan, time 116.12ms
iter 166160: loss nan, time 114.76ms
iter 166170: loss nan, time 117.99ms
iter 166180: loss nan, time 115.85ms
iter 166190: loss nan, time 116.94ms
tensor(0.1355)
iter 166200: loss nan, time 116.23ms
iter 166210: loss nan, time 115.90ms
iter 166220: loss nan, time 116.78ms
iter 166230: loss nan, time 116.45ms
iter 166240: loss nan, time 115.19ms
step 166250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 166250: loss nan, time 2909.27ms
iter 166260: loss nan, time 114.93ms
iter 166270: loss nan, time 117.31ms
iter 166280: loss nan, time 117.60ms
iter 166290: loss nan, time 114.97ms
tensor(0.1577)
iter 166300: loss nan, time 116.69ms
iter 166310: loss nan, time 116.52ms
iter 166320: loss nan, time 114.97ms
iter 166330: loss nan, time 116.68ms
iter 166340: loss nan, time 116.25ms
iter 166350: loss nan, time 116.87ms
iter 166360: loss nan, time 118.07ms
iter 166370: loss nan, time 115.88ms
iter 166380: loss nan, time 116.90ms
iter 166390: loss nan, time 118.43ms
tensor(0.1813)
iter 166400: loss nan, time 116.28ms
iter 166410: loss nan, time 116.96ms
iter 166420: loss nan, time 117.62ms
iter 166430: loss nan, time 115.94ms
iter 166440: loss nan, time 117.61ms
iter 166450: loss nan, time 116.52ms
iter 166460: loss nan, time 114.41ms
iter 166470: loss nan, time 117.08ms
iter 166480: loss nan, time 115.87ms
iter 166490: loss nan, time 116.89ms
tensor(0.2061)
step 166500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 166500: loss nan, time 2916.40ms
iter 166510: loss nan, time 116.01ms
iter 166520: loss nan, time 117.34ms
iter 166530: loss nan, time 115.86ms
iter 166540: loss nan, time 115.47ms
iter 166550: loss nan, time 116.61ms
iter 166560: loss nan, time 115.76ms
iter 166570: loss nan, time 115.29ms
iter 166580: loss nan, time 117.00ms
iter 166590: loss nan, time 115.83ms
tensor(0.2321)
iter 166600: loss nan, time 117.07ms
iter 166610: loss nan, time 117.14ms
iter 166620: loss nan, time 116.35ms
iter 166630: loss nan, time 116.70ms
iter 166640: loss nan, time 115.64ms
iter 166650: loss nan, time 115.09ms
iter 166660: loss nan, time 117.04ms
iter 166670: loss nan, time 114.68ms
iter 166680: loss nan, time 116.75ms
iter 166690: loss nan, time 117.99ms
tensor(0.2591)
iter 166700: loss nan, time 116.17ms
iter 166710: loss nan, time 116.94ms
iter 166720: loss nan, time 117.63ms
iter 166730: loss nan, time 114.56ms
iter 166740: loss nan, time 116.90ms
step 166750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 166750: loss nan, time 2913.39ms
iter 166760: loss nan, time 115.01ms
iter 166770: loss nan, time 117.37ms
iter 166780: loss nan, time 115.94ms
iter 166790: loss nan, time 114.90ms
tensor(0.2871)
iter 166800: loss nan, time 118.34ms
iter 166810: loss nan, time 115.84ms
iter 166820: loss nan, time 116.93ms
iter 166830: loss nan, time 117.90ms
iter 166840: loss nan, time 115.80ms
iter 166850: loss nan, time 115.01ms
iter 166860: loss nan, time 116.75ms
iter 166870: loss nan, time 114.77ms
iter 166880: loss nan, time 113.84ms
iter 166890: loss nan, time 114.14ms
tensor(0.3159)
iter 166900: loss nan, time 114.04ms
iter 166910: loss nan, time 114.86ms
iter 166920: loss nan, time 113.73ms
iter 166930: loss nan, time 115.40ms
iter 166940: loss nan, time 114.56ms
iter 166950: loss nan, time 116.41ms
iter 166960: loss nan, time 113.24ms
iter 166970: loss nan, time 115.50ms
iter 166980: loss nan, time 115.01ms
iter 166990: loss nan, time 117.82ms
tensor(0.3455)
step 167000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 167000: loss nan, time 2898.82ms
iter 167010: loss nan, time 117.14ms
iter 167020: loss nan, time 115.63ms
iter 167030: loss nan, time 114.78ms
iter 167040: loss nan, time 118.32ms
iter 167050: loss nan, time 115.76ms
iter 167060: loss nan, time 114.52ms
iter 167070: loss nan, time 117.51ms
iter 167080: loss nan, time 115.55ms
iter 167090: loss nan, time 116.04ms
tensor(0.3757)
iter 167100: loss nan, time 116.32ms
iter 167110: loss nan, time 114.75ms
iter 167120: loss nan, time 116.15ms
iter 167130: loss nan, time 116.24ms
iter 167140: loss nan, time 116.73ms
iter 167150: loss nan, time 117.99ms
iter 167160: loss nan, time 115.76ms
iter 167170: loss nan, time 116.74ms
iter 167180: loss nan, time 116.19ms
iter 167190: loss nan, time 115.34ms
tensor(0.4063)
iter 167200: loss nan, time 117.10ms
iter 167210: loss nan, time 115.86ms
iter 167220: loss nan, time 114.71ms
iter 167230: loss nan, time 118.06ms
iter 167240: loss nan, time 115.87ms
step 167250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 167250: loss nan, time 2916.03ms
iter 167260: loss nan, time 117.43ms
iter 167270: loss nan, time 114.65ms
iter 167280: loss nan, time 116.98ms
iter 167290: loss nan, time 116.69ms
tensor(0.4373)
iter 167300: loss nan, time 116.57ms
iter 167310: loss nan, time 116.72ms
iter 167320: loss nan, time 116.32ms
iter 167330: loss nan, time 114.55ms
iter 167340: loss nan, time 116.74ms
iter 167350: loss nan, time 116.75ms
iter 167360: loss nan, time 115.15ms
iter 167370: loss nan, time 119.79ms
iter 167380: loss nan, time 120.36ms
iter 167390: loss nan, time 121.13ms
tensor(0.4686)
iter 167400: loss nan, time 120.47ms
iter 167410: loss nan, time 119.58ms
iter 167420: loss nan, time 119.46ms
iter 167430: loss nan, time 121.66ms
iter 167440: loss nan, time 120.74ms
iter 167450: loss nan, time 122.46ms
iter 167460: loss nan, time 122.38ms
iter 167470: loss nan, time 121.17ms
iter 167480: loss nan, time 121.21ms
iter 167490: loss nan, time 120.81ms
tensor(0.5000)
step 167500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 167500: loss nan, time 2910.45ms
iter 167510: loss nan, time 120.92ms
iter 167520: loss nan, time 120.91ms
iter 167530: loss nan, time 120.82ms
iter 167540: loss nan, time 120.60ms
iter 167550: loss nan, time 120.87ms
iter 167560: loss nan, time 121.89ms
iter 167570: loss nan, time 121.03ms
iter 167580: loss nan, time 118.93ms
iter 167590: loss nan, time 118.69ms
tensor(0.5314)
iter 167600: loss nan, time 119.08ms
iter 167610: loss nan, time 118.86ms
iter 167620: loss nan, time 120.45ms
iter 167630: loss nan, time 120.33ms
iter 167640: loss nan, time 120.43ms
iter 167650: loss nan, time 120.70ms
iter 167660: loss nan, time 121.54ms
iter 167670: loss nan, time 122.81ms
iter 167680: loss nan, time 120.45ms
iter 167690: loss nan, time 121.39ms
tensor(0.5627)
iter 167700: loss nan, time 121.62ms
iter 167710: loss nan, time 121.56ms
iter 167720: loss nan, time 121.34ms
iter 167730: loss nan, time 119.25ms
iter 167740: loss nan, time 118.21ms
step 167750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 167750: loss nan, time 2895.73ms
iter 167760: loss nan, time 119.16ms
iter 167770: loss nan, time 119.37ms
iter 167780: loss nan, time 119.75ms
iter 167790: loss nan, time 120.13ms
tensor(0.5937)
iter 167800: loss nan, time 118.61ms
iter 167810: loss nan, time 118.80ms
iter 167820: loss nan, time 120.26ms
iter 167830: loss nan, time 120.68ms
iter 167840: loss nan, time 121.45ms
iter 167850: loss nan, time 121.49ms
iter 167860: loss nan, time 121.98ms
iter 167870: loss nan, time 121.43ms
iter 167880: loss nan, time 121.20ms
iter 167890: loss nan, time 121.39ms
tensor(0.6243)
iter 167900: loss nan, time 121.64ms
iter 167910: loss nan, time 120.52ms
iter 167920: loss nan, time 119.97ms
iter 167930: loss nan, time 119.96ms
iter 167940: loss nan, time 121.03ms
iter 167950: loss nan, time 120.85ms
iter 167960: loss nan, time 121.96ms
iter 167970: loss nan, time 122.71ms
iter 167980: loss nan, time 120.24ms
iter 167990: loss nan, time 122.67ms
tensor(0.6545)
step 168000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 168000: loss nan, time 2910.16ms
iter 168010: loss nan, time 122.54ms
iter 168020: loss nan, time 122.43ms
iter 168030: loss nan, time 121.41ms
iter 168040: loss nan, time 119.12ms
iter 168050: loss nan, time 121.23ms
iter 168060: loss nan, time 121.18ms
iter 168070: loss nan, time 119.74ms
iter 168080: loss nan, time 119.15ms
iter 168090: loss nan, time 119.06ms
tensor(0.6841)
iter 168100: loss nan, time 118.51ms
iter 168110: loss nan, time 119.69ms
iter 168120: loss nan, time 120.40ms
iter 168130: loss nan, time 120.39ms
iter 168140: loss nan, time 120.80ms
iter 168150: loss nan, time 120.09ms
iter 168160: loss nan, time 121.79ms
iter 168170: loss nan, time 122.39ms
iter 168180: loss nan, time 122.84ms
iter 168190: loss nan, time 122.39ms
tensor(0.7129)
iter 168200: loss nan, time 121.41ms
iter 168210: loss nan, time 121.30ms
iter 168220: loss nan, time 121.25ms
iter 168230: loss nan, time 121.46ms
iter 168240: loss nan, time 119.02ms
step 168250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 168250: loss nan, time 2914.54ms
iter 168260: loss nan, time 119.46ms
iter 168270: loss nan, time 119.01ms
iter 168280: loss nan, time 119.35ms
iter 168290: loss nan, time 119.19ms
tensor(0.7409)
iter 168300: loss nan, time 120.40ms
iter 168310: loss nan, time 120.14ms
iter 168320: loss nan, time 120.99ms
iter 168330: loss nan, time 121.85ms
iter 168340: loss nan, time 120.06ms
iter 168350: loss nan, time 123.05ms
iter 168360: loss nan, time 122.21ms
iter 168370: loss nan, time 122.31ms
iter 168380: loss nan, time 122.81ms
iter 168390: loss nan, time 119.14ms
tensor(0.7679)
iter 168400: loss nan, time 121.08ms
iter 168410: loss nan, time 119.32ms
iter 168420: loss nan, time 119.43ms
iter 168430: loss nan, time 118.93ms
iter 168440: loss nan, time 119.02ms
iter 168450: loss nan, time 118.92ms
iter 168460: loss nan, time 119.17ms
iter 168470: loss nan, time 120.11ms
iter 168480: loss nan, time 120.24ms
iter 168490: loss nan, time 120.81ms
tensor(0.7939)
step 168500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 168500: loss nan, time 2918.51ms
iter 168510: loss nan, time 120.60ms
iter 168520: loss nan, time 122.48ms
iter 168530: loss nan, time 122.45ms
iter 168540: loss nan, time 122.42ms
iter 168550: loss nan, time 122.17ms
iter 168560: loss nan, time 121.17ms
iter 168570: loss nan, time 121.18ms
iter 168580: loss nan, time 119.13ms
iter 168590: loss nan, time 119.19ms
tensor(0.8187)
iter 168600: loss nan, time 119.24ms
iter 168610: loss nan, time 118.96ms
iter 168620: loss nan, time 118.14ms
iter 168630: loss nan, time 119.26ms
iter 168640: loss nan, time 120.02ms
iter 168650: loss nan, time 120.48ms
iter 168660: loss nan, time 120.35ms
iter 168670: loss nan, time 121.10ms
iter 168680: loss nan, time 121.62ms
iter 168690: loss nan, time 120.40ms
tensor(0.8423)
iter 168700: loss nan, time 122.75ms
iter 168710: loss nan, time 122.52ms
iter 168720: loss nan, time 122.91ms
iter 168730: loss nan, time 121.25ms
iter 168740: loss nan, time 118.98ms
step 168750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 168750: loss nan, time 2918.68ms
iter 168760: loss nan, time 121.49ms
iter 168770: loss nan, time 118.94ms
iter 168780: loss nan, time 119.14ms
iter 168790: loss nan, time 119.01ms
tensor(0.8645)
iter 168800: loss nan, time 119.49ms
iter 168810: loss nan, time 119.01ms
iter 168820: loss nan, time 120.28ms
iter 168830: loss nan, time 120.42ms
iter 168840: loss nan, time 120.89ms
iter 168850: loss nan, time 121.36ms
iter 168860: loss nan, time 120.91ms
iter 168870: loss nan, time 122.32ms
iter 168880: loss nan, time 122.43ms
iter 168890: loss nan, time 122.36ms
tensor(0.8853)
iter 168900: loss nan, time 122.74ms
iter 168910: loss nan, time 121.12ms
iter 168920: loss nan, time 121.30ms
iter 168930: loss nan, time 121.07ms
iter 168940: loss nan, time 119.22ms
iter 168950: loss nan, time 118.89ms
iter 168960: loss nan, time 119.02ms
iter 168970: loss nan, time 119.21ms
iter 168980: loss nan, time 118.92ms
iter 168990: loss nan, time 118.97ms
tensor(0.9045)
step 169000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 169000: loss nan, time 2915.32ms
iter 169010: loss nan, time 119.49ms
iter 169020: loss nan, time 119.74ms
iter 169030: loss nan, time 120.37ms
iter 169040: loss nan, time 121.15ms
iter 169050: loss nan, time 120.12ms
iter 169060: loss nan, time 121.95ms
iter 169070: loss nan, time 116.71ms
iter 169080: loss nan, time 115.46ms
iter 169090: loss nan, time 118.61ms
tensor(0.9222)
iter 169100: loss nan, time 116.56ms
iter 169110: loss nan, time 114.52ms
iter 169120: loss nan, time 118.39ms
iter 169130: loss nan, time 113.07ms
iter 169140: loss nan, time 116.50ms
iter 169150: loss nan, time 114.57ms
iter 169160: loss nan, time 116.50ms
iter 169170: loss nan, time 113.14ms
iter 169180: loss nan, time 117.30ms
iter 169190: loss nan, time 113.10ms
tensor(0.9382)
iter 169200: loss nan, time 116.83ms
iter 169210: loss nan, time 114.89ms
iter 169220: loss nan, time 118.35ms
iter 169230: loss nan, time 117.05ms
iter 169240: loss nan, time 114.99ms
step 169250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 169250: loss nan, time 2895.84ms
iter 169260: loss nan, time 115.34ms
iter 169270: loss nan, time 116.94ms
iter 169280: loss nan, time 118.22ms
iter 169290: loss nan, time 115.92ms
tensor(0.9524)
iter 169300: loss nan, time 117.93ms
iter 169310: loss nan, time 116.14ms
iter 169320: loss nan, time 116.26ms
iter 169330: loss nan, time 117.06ms
iter 169340: loss nan, time 118.33ms
iter 169350: loss nan, time 115.46ms
iter 169360: loss nan, time 117.06ms
iter 169370: loss nan, time 116.02ms
iter 169380: loss nan, time 115.62ms
iter 169390: loss nan, time 116.45ms
tensor(0.9649)
iter 169400: loss nan, time 119.13ms
iter 169410: loss nan, time 115.10ms
iter 169420: loss nan, time 116.99ms
iter 169430: loss nan, time 116.37ms
iter 169440: loss nan, time 115.10ms
iter 169450: loss nan, time 116.99ms
iter 169460: loss nan, time 116.62ms
iter 169470: loss nan, time 115.11ms
iter 169480: loss nan, time 118.03ms
iter 169490: loss nan, time 116.06ms
tensor(0.9755)
step 169500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 169500: loss nan, time 2908.95ms
iter 169510: loss nan, time 118.12ms
iter 169520: loss nan, time 114.59ms
iter 169530: loss nan, time 116.75ms
iter 169540: loss nan, time 116.83ms
iter 169550: loss nan, time 115.11ms
iter 169560: loss nan, time 117.79ms
iter 169570: loss nan, time 116.97ms
iter 169580: loss nan, time 114.86ms
iter 169590: loss nan, time 118.30ms
tensor(0.9843)
iter 169600: loss nan, time 117.74ms
iter 169610: loss nan, time 114.46ms
iter 169620: loss nan, time 118.20ms
iter 169630: loss nan, time 116.55ms
iter 169640: loss nan, time 115.08ms
iter 169650: loss nan, time 118.09ms
iter 169660: loss nan, time 115.29ms
iter 169670: loss nan, time 115.02ms
iter 169680: loss nan, time 118.22ms
iter 169690: loss nan, time 114.08ms
tensor(0.9911)
iter 169700: loss nan, time 116.58ms
iter 169710: loss nan, time 118.11ms
iter 169720: loss nan, time 113.87ms
iter 169730: loss nan, time 116.87ms
iter 169740: loss nan, time 117.27ms
step 169750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 169750: loss nan, time 2901.27ms
iter 169760: loss nan, time 116.40ms
iter 169770: loss nan, time 116.08ms
iter 169780: loss nan, time 116.68ms
iter 169790: loss nan, time 118.46ms
tensor(0.9961)
iter 169800: loss nan, time 116.43ms
iter 169810: loss nan, time 115.21ms
iter 169820: loss nan, time 116.10ms
iter 169830: loss nan, time 115.96ms
iter 169840: loss nan, time 116.17ms
iter 169850: loss nan, time 118.11ms
iter 169860: loss nan, time 115.73ms
iter 169870: loss nan, time 115.82ms
iter 169880: loss nan, time 116.19ms
iter 169890: loss nan, time 115.53ms
tensor(0.9990)
iter 169900: loss nan, time 116.92ms
iter 169910: loss nan, time 118.41ms
iter 169920: loss nan, time 115.72ms
iter 169930: loss nan, time 117.07ms
iter 169940: loss nan, time 116.24ms
iter 169950: loss nan, time 115.82ms
iter 169960: loss nan, time 116.95ms
iter 169970: loss nan, time 117.94ms
iter 169980: loss nan, time 115.62ms
iter 169990: loss nan, time 117.69ms
tensor(1.)
step 170000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 170000: loss nan, time 2912.31ms
iter 170010: loss nan, time 114.88ms
iter 170020: loss nan, time 116.95ms
iter 170030: loss nan, time 115.52ms
iter 170040: loss nan, time 115.05ms
iter 170050: loss nan, time 118.04ms
iter 170060: loss nan, time 122.70ms
iter 170070: loss nan, time 120.62ms
iter 170080: loss nan, time 122.64ms
iter 170090: loss nan, time 121.55ms
tensor(0.9990)
iter 170100: loss nan, time 121.85ms
iter 170110: loss nan, time 119.39ms
iter 170120: loss nan, time 119.23ms
iter 170130: loss nan, time 119.08ms
iter 170140: loss nan, time 119.84ms
iter 170150: loss nan, time 120.66ms
iter 170160: loss nan, time 120.80ms
iter 170170: loss nan, time 121.45ms
iter 170180: loss nan, time 121.03ms
iter 170190: loss nan, time 122.87ms
tensor(0.9961)
iter 170200: loss nan, time 122.97ms
iter 170210: loss nan, time 121.40ms
iter 170220: loss nan, time 122.15ms
iter 170230: loss nan, time 120.05ms
iter 170240: loss nan, time 119.50ms
step 170250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 170250: loss nan, time 2881.07ms
iter 170260: loss nan, time 121.44ms
iter 170270: loss nan, time 119.51ms
iter 170280: loss nan, time 119.40ms
iter 170290: loss nan, time 119.44ms
tensor(0.9911)
iter 170300: loss nan, time 119.69ms
iter 170310: loss nan, time 120.26ms
iter 170320: loss nan, time 120.05ms
iter 170330: loss nan, time 121.97ms
iter 170340: loss nan, time 122.83ms
iter 170350: loss nan, time 122.36ms
iter 170360: loss nan, time 122.63ms
iter 170370: loss nan, time 119.98ms
iter 170380: loss nan, time 121.49ms
iter 170390: loss nan, time 119.32ms
tensor(0.9843)
iter 170400: loss nan, time 119.70ms
iter 170410: loss nan, time 119.29ms
iter 170420: loss nan, time 120.04ms
iter 170430: loss nan, time 119.43ms
iter 170440: loss nan, time 121.53ms
iter 170450: loss nan, time 122.92ms
iter 170460: loss nan, time 122.87ms
iter 170470: loss nan, time 122.85ms
iter 170480: loss nan, time 121.50ms
iter 170490: loss nan, time 121.68ms
tensor(0.9755)
step 170500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 170500: loss nan, time 2911.73ms
iter 170510: loss nan, time 119.39ms
iter 170520: loss nan, time 119.83ms
iter 170530: loss nan, time 120.12ms
iter 170540: loss nan, time 120.09ms
iter 170550: loss nan, time 122.24ms
iter 170560: loss nan, time 123.50ms
iter 170570: loss nan, time 120.72ms
iter 170580: loss nan, time 121.08ms
iter 170590: loss nan, time 121.65ms
tensor(0.9649)
iter 170600: loss nan, time 119.96ms
iter 170610: loss nan, time 119.36ms
iter 170620: loss nan, time 120.26ms
iter 170630: loss nan, time 119.67ms
iter 170640: loss nan, time 121.95ms
iter 170650: loss nan, time 122.92ms
iter 170660: loss nan, time 122.86ms
iter 170670: loss nan, time 122.62ms
iter 170680: loss nan, time 119.93ms
iter 170690: loss nan, time 119.71ms
tensor(0.9524)
iter 170700: loss nan, time 120.84ms
iter 170710: loss nan, time 120.98ms
iter 170720: loss nan, time 122.35ms
iter 170730: loss nan, time 123.13ms
iter 170740: loss nan, time 122.46ms
step 170750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 170750: loss nan, time 2911.11ms
iter 170760: loss nan, time 122.05ms
iter 170770: loss nan, time 119.64ms
iter 170780: loss nan, time 119.65ms
iter 170790: loss nan, time 120.18ms
tensor(0.9382)
iter 170800: loss nan, time 120.70ms
iter 170810: loss nan, time 119.22ms
iter 170820: loss nan, time 120.17ms
iter 170830: loss nan, time 118.97ms
iter 170840: loss nan, time 120.22ms
iter 170850: loss nan, time 121.42ms
iter 170860: loss nan, time 120.47ms
iter 170870: loss nan, time 119.97ms
iter 170880: loss nan, time 119.39ms
iter 170890: loss nan, time 120.50ms
tensor(0.9222)
iter 170900: loss nan, time 122.24ms
iter 170910: loss nan, time 122.09ms
iter 170920: loss nan, time 120.95ms
iter 170930: loss nan, time 121.34ms
iter 170940: loss nan, time 122.13ms
iter 170950: loss nan, time 122.38ms
iter 170960: loss nan, time 120.94ms
iter 170970: loss nan, time 121.54ms
iter 170980: loss nan, time 121.09ms
iter 170990: loss nan, time 120.85ms
tensor(0.9045)
step 171000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 171000: loss nan, time 2904.66ms
iter 171010: loss nan, time 121.29ms
iter 171020: loss nan, time 120.79ms
iter 171030: loss nan, time 120.32ms
iter 171040: loss nan, time 121.00ms
iter 171050: loss nan, time 120.99ms
iter 171060: loss nan, time 120.41ms
iter 171070: loss nan, time 120.77ms
iter 171080: loss nan, time 120.85ms
iter 171090: loss nan, time 119.25ms
tensor(0.8853)
iter 171100: loss nan, time 119.40ms
iter 171110: loss nan, time 119.11ms
iter 171120: loss nan, time 119.28ms
iter 171130: loss nan, time 119.96ms
iter 171140: loss nan, time 120.23ms
iter 171150: loss nan, time 120.03ms
iter 171160: loss nan, time 120.60ms
iter 171170: loss nan, time 119.95ms
iter 171180: loss nan, time 120.03ms
iter 171190: loss nan, time 121.19ms
tensor(0.8645)
iter 171200: loss nan, time 119.95ms
iter 171210: loss nan, time 120.30ms
iter 171220: loss nan, time 120.03ms
iter 171230: loss nan, time 121.09ms
iter 171240: loss nan, time 121.64ms
step 171250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 171250: loss nan, time 2904.97ms
iter 171260: loss nan, time 120.99ms
iter 171270: loss nan, time 122.14ms
iter 171280: loss nan, time 120.21ms
iter 171290: loss nan, time 122.60ms
tensor(0.8423)
iter 171300: loss nan, time 121.41ms
iter 171310: loss nan, time 121.11ms
iter 171320: loss nan, time 121.05ms
iter 171330: loss nan, time 118.87ms
iter 171340: loss nan, time 120.32ms
iter 171350: loss nan, time 121.67ms
iter 171360: loss nan, time 121.12ms
iter 171370: loss nan, time 121.00ms
iter 171380: loss nan, time 119.05ms
iter 171390: loss nan, time 119.14ms
tensor(0.8187)
iter 171400: loss nan, time 119.78ms
iter 171410: loss nan, time 120.03ms
iter 171420: loss nan, time 120.25ms
iter 171430: loss nan, time 120.03ms
iter 171440: loss nan, time 119.25ms
iter 171450: loss nan, time 120.12ms
iter 171460: loss nan, time 120.70ms
iter 171470: loss nan, time 121.07ms
iter 171480: loss nan, time 121.99ms
iter 171490: loss nan, time 120.94ms
tensor(0.7939)
step 171500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 171500: loss nan, time 2906.65ms
iter 171510: loss nan, time 121.84ms
iter 171520: loss nan, time 122.23ms
iter 171530: loss nan, time 122.54ms
iter 171540: loss nan, time 121.00ms
iter 171550: loss nan, time 121.03ms
iter 171560: loss nan, time 120.95ms
iter 171570: loss nan, time 120.76ms
iter 171580: loss nan, time 120.91ms
iter 171590: loss nan, time 121.02ms
tensor(0.7679)
iter 171600: loss nan, time 119.14ms
iter 171610: loss nan, time 119.22ms
iter 171620: loss nan, time 119.31ms
iter 171630: loss nan, time 120.03ms
iter 171640: loss nan, time 120.04ms
iter 171650: loss nan, time 120.02ms
iter 171660: loss nan, time 120.30ms
iter 171670: loss nan, time 119.94ms
iter 171680: loss nan, time 118.81ms
iter 171690: loss nan, time 120.00ms
tensor(0.7409)
iter 171700: loss nan, time 120.77ms
iter 171710: loss nan, time 120.95ms
iter 171720: loss nan, time 121.52ms
iter 171730: loss nan, time 120.27ms
iter 171740: loss nan, time 122.93ms
step 171750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 171750: loss nan, time 2902.29ms
iter 171760: loss nan, time 121.30ms
iter 171770: loss nan, time 121.42ms
iter 171780: loss nan, time 122.42ms
iter 171790: loss nan, time 120.24ms
tensor(0.7129)
iter 171800: loss nan, time 122.45ms
iter 171810: loss nan, time 121.34ms
iter 171820: loss nan, time 120.67ms
iter 171830: loss nan, time 120.50ms
iter 171840: loss nan, time 118.85ms
iter 171850: loss nan, time 120.98ms
iter 171860: loss nan, time 121.09ms
iter 171870: loss nan, time 120.97ms
iter 171880: loss nan, time 121.33ms
iter 171890: loss nan, time 118.99ms
tensor(0.6841)
iter 171900: loss nan, time 121.19ms
iter 171910: loss nan, time 119.31ms
iter 171920: loss nan, time 119.94ms
iter 171930: loss nan, time 119.75ms
iter 171940: loss nan, time 119.78ms
iter 171950: loss nan, time 118.53ms
iter 171960: loss nan, time 120.19ms
iter 171970: loss nan, time 120.13ms
iter 171980: loss nan, time 119.96ms
iter 171990: loss nan, time 120.22ms
tensor(0.6545)
step 172000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 172000: loss nan, time 2908.00ms
iter 172010: loss nan, time 118.84ms
iter 172020: loss nan, time 120.08ms
iter 172030: loss nan, time 121.39ms
iter 172040: loss nan, time 120.29ms
iter 172050: loss nan, time 120.79ms
iter 172060: loss nan, time 119.85ms
iter 172070: loss nan, time 121.66ms
iter 172080: loss nan, time 122.17ms
iter 172090: loss nan, time 122.11ms
tensor(0.6243)
iter 172100: loss nan, time 122.12ms
iter 172110: loss nan, time 121.02ms
iter 172120: loss nan, time 121.74ms
iter 172130: loss nan, time 121.06ms
iter 172140: loss nan, time 120.80ms
iter 172150: loss nan, time 121.41ms
iter 172160: loss nan, time 120.81ms
iter 172170: loss nan, time 120.89ms
iter 172180: loss nan, time 121.25ms
iter 172190: loss nan, time 120.94ms
tensor(0.5937)
iter 172200: loss nan, time 119.66ms
iter 172210: loss nan, time 118.63ms
iter 172220: loss nan, time 119.06ms
iter 172230: loss nan, time 119.13ms
iter 172240: loss nan, time 119.41ms
step 172250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 172250: loss nan, time 2904.56ms
iter 172260: loss nan, time 119.16ms
iter 172270: loss nan, time 118.75ms
iter 172280: loss nan, time 118.83ms
iter 172290: loss nan, time 119.54ms
tensor(0.5627)
iter 172300: loss nan, time 119.73ms
iter 172310: loss nan, time 120.10ms
iter 172320: loss nan, time 119.92ms
iter 172330: loss nan, time 119.95ms
iter 172340: loss nan, time 119.98ms
iter 172350: loss nan, time 120.63ms
iter 172360: loss nan, time 119.99ms
iter 172370: loss nan, time 119.88ms
iter 172380: loss nan, time 120.10ms
iter 172390: loss nan, time 120.72ms
tensor(0.5314)
iter 172400: loss nan, time 120.64ms
iter 172410: loss nan, time 122.02ms
iter 172420: loss nan, time 122.12ms
iter 172430: loss nan, time 122.16ms
iter 172440: loss nan, time 121.76ms
iter 172450: loss nan, time 118.82ms
iter 172460: loss nan, time 121.03ms
iter 172470: loss nan, time 120.76ms
iter 172480: loss nan, time 120.02ms
iter 172490: loss nan, time 120.93ms
tensor(0.5000)
step 172500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 172500: loss nan, time 2921.90ms
iter 172510: loss nan, time 119.08ms
iter 172520: loss nan, time 118.66ms
iter 172530: loss nan, time 119.17ms
iter 172540: loss nan, time 119.89ms
iter 172550: loss nan, time 119.82ms
iter 172560: loss nan, time 120.03ms
iter 172570: loss nan, time 118.83ms
iter 172580: loss nan, time 120.43ms
iter 172590: loss nan, time 119.83ms
tensor(0.4686)
iter 172600: loss nan, time 120.27ms
iter 172610: loss nan, time 119.76ms
iter 172620: loss nan, time 118.93ms
iter 172630: loss nan, time 120.12ms
iter 172640: loss nan, time 120.76ms
iter 172650: loss nan, time 121.15ms
iter 172660: loss nan, time 121.30ms
iter 172670: loss nan, time 121.20ms
iter 172680: loss nan, time 122.16ms
iter 172690: loss nan, time 122.05ms
tensor(0.4373)
iter 172700: loss nan, time 121.11ms
iter 172710: loss nan, time 120.76ms
iter 172720: loss nan, time 121.23ms
iter 172730: loss nan, time 120.74ms
iter 172740: loss nan, time 120.78ms
step 172750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 172750: loss nan, time 2902.88ms
iter 172760: loss nan, time 121.14ms
iter 172770: loss nan, time 120.79ms
iter 172780: loss nan, time 121.35ms
iter 172790: loss nan, time 120.86ms
tensor(0.4063)
iter 172800: loss nan, time 121.01ms
iter 172810: loss nan, time 120.86ms
iter 172820: loss nan, time 120.82ms
iter 172830: loss nan, time 120.85ms
iter 172840: loss nan, time 121.09ms
iter 172850: loss nan, time 120.79ms
iter 172860: loss nan, time 118.66ms
iter 172870: loss nan, time 118.57ms
iter 172880: loss nan, time 118.72ms
iter 172890: loss nan, time 119.88ms
tensor(0.3757)
iter 172900: loss nan, time 120.81ms
iter 172910: loss nan, time 120.18ms
iter 172920: loss nan, time 120.36ms
iter 172930: loss nan, time 120.41ms
iter 172940: loss nan, time 120.91ms
iter 172950: loss nan, time 121.93ms
iter 172960: loss nan, time 120.25ms
iter 172970: loss nan, time 122.60ms
iter 172980: loss nan, time 120.42ms
iter 172990: loss nan, time 121.24ms
tensor(0.3455)
step 173000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 173000: loss nan, time 2878.33ms
iter 173010: loss nan, time 121.68ms
iter 173020: loss nan, time 120.54ms
iter 173030: loss nan, time 122.59ms
iter 173040: loss nan, time 121.24ms
iter 173050: loss nan, time 121.04ms
iter 173060: loss nan, time 121.32ms
iter 173070: loss nan, time 119.00ms
iter 173080: loss nan, time 121.02ms
iter 173090: loss nan, time 121.18ms
tensor(0.3159)
iter 173100: loss nan, time 119.60ms
iter 173110: loss nan, time 119.79ms
iter 173120: loss nan, time 120.36ms
iter 173130: loss nan, time 119.08ms
iter 173140: loss nan, time 119.89ms
iter 173150: loss nan, time 120.24ms
iter 173160: loss nan, time 120.72ms
iter 173170: loss nan, time 120.89ms
iter 173180: loss nan, time 120.11ms
iter 173190: loss nan, time 122.39ms
tensor(0.2871)
iter 173200: loss nan, time 122.56ms
iter 173210: loss nan, time 120.45ms
iter 173220: loss nan, time 120.50ms
iter 173230: loss nan, time 120.91ms
iter 173240: loss nan, time 121.29ms
step 173250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 173250: loss nan, time 2871.06ms
iter 173260: loss nan, time 120.14ms
iter 173270: loss nan, time 121.38ms
iter 173280: loss nan, time 121.96ms
iter 173290: loss nan, time 121.30ms
tensor(0.2591)
iter 173300: loss nan, time 122.85ms
iter 173310: loss nan, time 121.22ms
iter 173320: loss nan, time 121.01ms
iter 173330: loss nan, time 121.20ms
iter 173340: loss nan, time 121.11ms
iter 173350: loss nan, time 121.20ms
iter 173360: loss nan, time 119.07ms
iter 173370: loss nan, time 118.47ms
iter 173380: loss nan, time 120.00ms
iter 173390: loss nan, time 120.23ms
tensor(0.2321)
iter 173400: loss nan, time 120.53ms
iter 173410: loss nan, time 120.16ms
iter 173420: loss nan, time 120.35ms
iter 173430: loss nan, time 120.15ms
iter 173440: loss nan, time 121.12ms
iter 173450: loss nan, time 121.37ms
iter 173460: loss nan, time 123.03ms
iter 173470: loss nan, time 120.29ms
iter 173480: loss nan, time 121.04ms
iter 173490: loss nan, time 121.21ms
tensor(0.2061)
step 173500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 173500: loss nan, time 2919.53ms
iter 173510: loss nan, time 121.02ms
iter 173520: loss nan, time 121.08ms
iter 173530: loss nan, time 119.12ms
iter 173540: loss nan, time 119.28ms
iter 173550: loss nan, time 119.15ms
iter 173560: loss nan, time 120.04ms
iter 173570: loss nan, time 120.22ms
iter 173580: loss nan, time 120.07ms
iter 173590: loss nan, time 119.23ms
tensor(0.1813)
iter 173600: loss nan, time 120.75ms
iter 173610: loss nan, time 121.28ms
iter 173620: loss nan, time 122.47ms
iter 173630: loss nan, time 122.61ms
iter 173640: loss nan, time 121.31ms
iter 173650: loss nan, time 121.22ms
iter 173660: loss nan, time 123.05ms
iter 173670: loss nan, time 121.38ms
iter 173680: loss nan, time 121.28ms
iter 173690: loss nan, time 119.41ms
tensor(0.1577)
iter 173700: loss nan, time 120.17ms
iter 173710: loss nan, time 120.27ms
iter 173720: loss nan, time 120.26ms
iter 173730: loss nan, time 120.30ms
iter 173740: loss nan, time 120.60ms
step 173750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 173750: loss nan, time 2934.63ms
iter 173760: loss nan, time 122.39ms
iter 173770: loss nan, time 120.93ms
iter 173780: loss nan, time 119.00ms
iter 173790: loss nan, time 121.12ms
tensor(0.1355)
iter 173800: loss nan, time 121.33ms
iter 173810: loss nan, time 121.23ms
iter 173820: loss nan, time 119.13ms
iter 173830: loss nan, time 119.31ms
iter 173840: loss nan, time 119.18ms
iter 173850: loss nan, time 120.19ms
iter 173860: loss nan, time 120.18ms
iter 173870: loss nan, time 120.21ms
iter 173880: loss nan, time 120.38ms
iter 173890: loss nan, time 119.29ms
tensor(0.1147)
iter 173900: loss nan, time 121.41ms
iter 173910: loss nan, time 121.89ms
iter 173920: loss nan, time 122.62ms
iter 173930: loss nan, time 121.66ms
iter 173940: loss nan, time 120.95ms
iter 173950: loss nan, time 121.22ms
iter 173960: loss nan, time 120.98ms
iter 173970: loss nan, time 121.19ms
iter 173980: loss nan, time 121.12ms
iter 173990: loss nan, time 119.03ms
tensor(0.0955)
step 174000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 174000: loss nan, time 2906.58ms
iter 174010: loss nan, time 119.30ms
iter 174020: loss nan, time 119.06ms
iter 174030: loss nan, time 119.43ms
iter 174040: loss nan, time 119.74ms
iter 174050: loss nan, time 120.29ms
iter 174060: loss nan, time 120.02ms
iter 174070: loss nan, time 120.14ms
iter 174080: loss nan, time 120.50ms
iter 174090: loss nan, time 121.00ms
tensor(0.0778)
iter 174100: loss nan, time 121.66ms
iter 174110: loss nan, time 122.56ms
iter 174120: loss nan, time 122.50ms
iter 174130: loss nan, time 119.08ms
iter 174140: loss nan, time 121.18ms
iter 174150: loss nan, time 121.20ms
iter 174160: loss nan, time 121.04ms
iter 174170: loss nan, time 121.27ms
iter 174180: loss nan, time 119.08ms
iter 174190: loss nan, time 118.61ms
tensor(0.0618)
iter 174200: loss nan, time 120.62ms
iter 174210: loss nan, time 120.39ms
iter 174220: loss nan, time 120.20ms
iter 174230: loss nan, time 120.37ms
iter 174240: loss nan, time 119.86ms
step 174250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 174250: loss nan, time 2930.07ms
iter 174260: loss nan, time 122.42ms
iter 174270: loss nan, time 121.07ms
iter 174280: loss nan, time 121.65ms
iter 174290: loss nan, time 121.24ms
tensor(0.0476)
iter 174300: loss nan, time 119.82ms
iter 174310: loss nan, time 118.94ms
iter 174320: loss nan, time 118.88ms
iter 174330: loss nan, time 119.10ms
iter 174340: loss nan, time 119.63ms
iter 174350: loss nan, time 120.67ms
iter 174360: loss nan, time 121.03ms
iter 174370: loss nan, time 120.67ms
iter 174380: loss nan, time 120.14ms
iter 174390: loss nan, time 122.25ms
tensor(0.0351)
iter 174400: loss nan, time 122.75ms
iter 174410: loss nan, time 122.57ms
iter 174420: loss nan, time 122.43ms
iter 174430: loss nan, time 120.55ms
iter 174440: loss nan, time 121.19ms
iter 174450: loss nan, time 121.32ms
iter 174460: loss nan, time 121.42ms
iter 174470: loss nan, time 121.25ms
iter 174480: loss nan, time 119.25ms
iter 174490: loss nan, time 118.95ms
tensor(0.0245)
step 174500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 174500: loss nan, time 2917.05ms
iter 174510: loss nan, time 119.22ms
iter 174520: loss nan, time 119.35ms
iter 174530: loss nan, time 119.12ms
iter 174540: loss nan, time 119.83ms
iter 174550: loss nan, time 118.97ms
iter 174560: loss nan, time 120.94ms
iter 174570: loss nan, time 121.71ms
iter 174580: loss nan, time 122.08ms
iter 174590: loss nan, time 122.28ms
tensor(0.0157)
iter 174600: loss nan, time 121.45ms
iter 174610: loss nan, time 122.31ms
iter 174620: loss nan, time 122.42ms
iter 174630: loss nan, time 122.31ms
iter 174640: loss nan, time 120.94ms
iter 174650: loss nan, time 121.27ms
iter 174660: loss nan, time 121.13ms
iter 174670: loss nan, time 119.10ms
iter 174680: loss nan, time 119.05ms
iter 174690: loss nan, time 119.16ms
tensor(0.0089)
iter 174700: loss nan, time 119.37ms
iter 174710: loss nan, time 119.55ms
iter 174720: loss nan, time 119.41ms
iter 174730: loss nan, time 119.03ms
iter 174740: loss nan, time 119.87ms
step 174750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 174750: loss nan, time 2916.36ms
iter 174760: loss nan, time 119.97ms
iter 174770: loss nan, time 120.21ms
iter 174780: loss nan, time 119.78ms
iter 174790: loss nan, time 121.04ms
tensor(0.0039)
iter 174800: loss nan, time 122.13ms
iter 174810: loss nan, time 122.19ms
iter 174820: loss nan, time 122.63ms
iter 174830: loss nan, time 122.45ms
iter 174840: loss nan, time 120.18ms
iter 174850: loss nan, time 122.28ms
iter 174860: loss nan, time 121.16ms
iter 174870: loss nan, time 121.13ms
iter 174880: loss nan, time 121.27ms
iter 174890: loss nan, time 118.90ms
tensor(0.0010)
iter 174900: loss nan, time 119.08ms
iter 174910: loss nan, time 118.93ms
iter 174920: loss nan, time 119.40ms
iter 174930: loss nan, time 119.68ms
iter 174940: loss nan, time 120.42ms
iter 174950: loss nan, time 119.56ms
iter 174960: loss nan, time 121.34ms
iter 174970: loss nan, time 122.65ms
iter 174980: loss nan, time 122.57ms
iter 174990: loss nan, time 122.63ms
tensor(0.0010)
step 175000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 175000: loss nan, time 2931.48ms
iter 175010: loss nan, time 119.25ms
iter 175020: loss nan, time 118.69ms
iter 175030: loss nan, time 119.42ms
iter 175040: loss nan, time 120.80ms
iter 175050: loss nan, time 121.33ms
iter 175060: loss nan, time 122.19ms
iter 175070: loss nan, time 123.24ms
iter 175080: loss nan, time 122.73ms
iter 175090: loss nan, time 119.22ms
tensor(0.0010)
iter 175100: loss nan, time 121.51ms
iter 175110: loss nan, time 121.49ms
iter 175120: loss nan, time 119.44ms
iter 175130: loss nan, time 118.27ms
iter 175140: loss nan, time 119.75ms
iter 175150: loss nan, time 119.15ms
iter 175160: loss nan, time 120.46ms
iter 175170: loss nan, time 120.80ms
iter 175180: loss nan, time 121.31ms
iter 175190: loss nan, time 122.24ms
tensor(0.0039)
iter 175200: loss nan, time 121.88ms
iter 175210: loss nan, time 122.41ms
iter 175220: loss nan, time 122.74ms
iter 175230: loss nan, time 121.57ms
iter 175240: loss nan, time 121.42ms
step 175250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 175250: loss nan, time 2919.56ms
iter 175260: loss nan, time 119.22ms
iter 175270: loss nan, time 119.18ms
iter 175280: loss nan, time 120.11ms
iter 175290: loss nan, time 121.09ms
tensor(0.0089)
iter 175300: loss nan, time 122.08ms
iter 175310: loss nan, time 122.23ms
iter 175320: loss nan, time 122.92ms
iter 175330: loss nan, time 122.56ms
iter 175340: loss nan, time 120.40ms
iter 175350: loss nan, time 120.71ms
iter 175360: loss nan, time 121.45ms
iter 175370: loss nan, time 119.24ms
iter 175380: loss nan, time 119.50ms
iter 175390: loss nan, time 118.95ms
tensor(0.0157)
iter 175400: loss nan, time 119.39ms
iter 175410: loss nan, time 120.29ms
iter 175420: loss nan, time 120.76ms
iter 175430: loss nan, time 121.91ms
iter 175440: loss nan, time 122.59ms
iter 175450: loss nan, time 121.60ms
iter 175460: loss nan, time 122.35ms
iter 175470: loss nan, time 122.36ms
iter 175480: loss nan, time 121.52ms
iter 175490: loss nan, time 121.09ms
tensor(0.0245)
step 175500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 175500: loss nan, time 2907.98ms
iter 175510: loss nan, time 120.54ms
iter 175520: loss nan, time 120.64ms
iter 175530: loss nan, time 119.03ms
iter 175540: loss nan, time 120.22ms
iter 175550: loss nan, time 118.19ms
iter 175560: loss nan, time 119.32ms
iter 175570: loss nan, time 119.63ms
iter 175580: loss nan, time 120.30ms
iter 175590: loss nan, time 120.54ms
tensor(0.0351)
iter 175600: loss nan, time 121.66ms
iter 175610: loss nan, time 122.34ms
iter 175620: loss nan, time 122.48ms
iter 175630: loss nan, time 114.19ms
iter 175640: loss nan, time 114.78ms
iter 175650: loss nan, time 115.79ms
iter 175660: loss nan, time 114.77ms
iter 175670: loss nan, time 118.33ms
iter 175680: loss nan, time 113.73ms
iter 175690: loss nan, time 116.88ms
tensor(0.0476)
iter 175700: loss nan, time 116.73ms
iter 175710: loss nan, time 113.50ms
iter 175720: loss nan, time 117.86ms
iter 175730: loss nan, time 114.84ms
iter 175740: loss nan, time 116.09ms
step 175750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 175750: loss nan, time 2875.40ms
iter 175760: loss nan, time 118.03ms
iter 175770: loss nan, time 115.72ms
iter 175780: loss nan, time 116.71ms
iter 175790: loss nan, time 118.01ms
tensor(0.0618)
iter 175800: loss nan, time 114.77ms
iter 175810: loss nan, time 117.72ms
iter 175820: loss nan, time 116.05ms
iter 175830: loss nan, time 114.68ms
iter 175840: loss nan, time 117.98ms
iter 175850: loss nan, time 115.08ms
iter 175860: loss nan, time 116.66ms
iter 175870: loss nan, time 116.94ms
iter 175880: loss nan, time 114.58ms
iter 175890: loss nan, time 118.41ms
tensor(0.0778)
iter 175900: loss nan, time 116.23ms
iter 175910: loss nan, time 114.59ms
iter 175920: loss nan, time 118.37ms
iter 175930: loss nan, time 115.27ms
iter 175940: loss nan, time 115.97ms
iter 175950: loss nan, time 118.33ms
iter 175960: loss nan, time 114.74ms
iter 175970: loss nan, time 114.86ms
iter 175980: loss nan, time 116.88ms
iter 175990: loss nan, time 114.67ms
tensor(0.0955)
step 176000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 176000: loss nan, time 2910.26ms
iter 176010: loss nan, time 115.60ms
iter 176020: loss nan, time 116.83ms
iter 176030: loss nan, time 117.11ms
iter 176040: loss nan, time 115.03ms
iter 176050: loss nan, time 116.99ms
iter 176060: loss nan, time 116.37ms
iter 176070: loss nan, time 114.75ms
iter 176080: loss nan, time 118.34ms
iter 176090: loss nan, time 117.01ms
tensor(0.1147)
iter 176100: loss nan, time 115.17ms
iter 176110: loss nan, time 118.11ms
iter 176120: loss nan, time 115.19ms
iter 176130: loss nan, time 114.88ms
iter 176140: loss nan, time 118.00ms
iter 176150: loss nan, time 116.20ms
iter 176160: loss nan, time 114.92ms
iter 176170: loss nan, time 118.48ms
iter 176180: loss nan, time 114.77ms
iter 176190: loss nan, time 116.93ms
tensor(0.1355)
iter 176200: loss nan, time 119.01ms
iter 176210: loss nan, time 115.26ms
iter 176220: loss nan, time 116.98ms
iter 176230: loss nan, time 117.97ms
iter 176240: loss nan, time 114.65ms
step 176250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 176250: loss nan, time 2905.36ms
iter 176260: loss nan, time 116.98ms
iter 176270: loss nan, time 114.74ms
iter 176280: loss nan, time 117.94ms
iter 176290: loss nan, time 116.08ms
tensor(0.1577)
iter 176300: loss nan, time 115.34ms
iter 176310: loss nan, time 117.98ms
iter 176320: loss nan, time 115.46ms
iter 176330: loss nan, time 116.86ms
iter 176340: loss nan, time 118.05ms
iter 176350: loss nan, time 115.02ms
iter 176360: loss nan, time 114.84ms
iter 176370: loss nan, time 117.61ms
iter 176380: loss nan, time 114.41ms
iter 176390: loss nan, time 118.12ms
tensor(0.1813)
iter 176400: loss nan, time 117.39ms
iter 176410: loss nan, time 114.93ms
iter 176420: loss nan, time 115.78ms
iter 176430: loss nan, time 116.50ms
iter 176440: loss nan, time 115.19ms
iter 176450: loss nan, time 118.05ms
iter 176460: loss nan, time 115.84ms
iter 176470: loss nan, time 116.64ms
iter 176480: loss nan, time 115.70ms
iter 176490: loss nan, time 115.61ms
tensor(0.2061)
step 176500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 176500: loss nan, time 2902.58ms
iter 176510: loss nan, time 115.79ms
iter 176520: loss nan, time 114.69ms
iter 176530: loss nan, time 117.84ms
iter 176540: loss nan, time 115.81ms
iter 176550: loss nan, time 114.79ms
iter 176560: loss nan, time 117.83ms
iter 176570: loss nan, time 115.01ms
iter 176580: loss nan, time 116.71ms
iter 176590: loss nan, time 117.34ms
tensor(0.2321)
iter 176600: loss nan, time 115.08ms
iter 176610: loss nan, time 117.90ms
iter 176620: loss nan, time 116.24ms
iter 176630: loss nan, time 115.45ms
iter 176640: loss nan, time 118.06ms
iter 176650: loss nan, time 115.50ms
iter 176660: loss nan, time 116.11ms
iter 176670: loss nan, time 117.97ms
iter 176680: loss nan, time 115.37ms
iter 176690: loss nan, time 116.73ms
tensor(0.2591)
iter 176700: loss nan, time 118.87ms
iter 176710: loss nan, time 115.45ms
iter 176720: loss nan, time 116.60ms
iter 176730: loss nan, time 117.48ms
iter 176740: loss nan, time 114.68ms
step 176750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 176750: loss nan, time 2910.55ms
iter 176760: loss nan, time 115.84ms
iter 176770: loss nan, time 116.79ms
iter 176780: loss nan, time 118.07ms
iter 176790: loss nan, time 114.84ms
tensor(0.2871)
iter 176800: loss nan, time 117.23ms
iter 176810: loss nan, time 115.78ms
iter 176820: loss nan, time 114.57ms
iter 176830: loss nan, time 116.68ms
iter 176840: loss nan, time 116.10ms
iter 176850: loss nan, time 115.69ms
iter 176860: loss nan, time 118.14ms
iter 176870: loss nan, time 115.56ms
iter 176880: loss nan, time 114.92ms
iter 176890: loss nan, time 117.27ms
tensor(0.3159)
iter 176900: loss nan, time 116.51ms
iter 176910: loss nan, time 117.31ms
iter 176920: loss nan, time 118.04ms
iter 176930: loss nan, time 114.79ms
iter 176940: loss nan, time 116.64ms
iter 176950: loss nan, time 115.31ms
iter 176960: loss nan, time 114.61ms
iter 176970: loss nan, time 118.14ms
iter 176980: loss nan, time 116.10ms
iter 176990: loss nan, time 115.05ms
tensor(0.3455)
step 177000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 177000: loss nan, time 2905.30ms
iter 177010: loss nan, time 114.68ms
iter 177020: loss nan, time 118.03ms
iter 177030: loss nan, time 117.03ms
iter 177040: loss nan, time 115.57ms
iter 177050: loss nan, time 118.07ms
iter 177060: loss nan, time 116.48ms
iter 177070: loss nan, time 114.69ms
iter 177080: loss nan, time 118.51ms
iter 177090: loss nan, time 115.75ms
tensor(0.3757)
iter 177100: loss nan, time 116.92ms
iter 177110: loss nan, time 118.63ms
iter 177120: loss nan, time 120.63ms
iter 177130: loss nan, time 120.28ms
iter 177140: loss nan, time 117.98ms
iter 177150: loss nan, time 119.18ms
iter 177160: loss nan, time 118.89ms
iter 177170: loss nan, time 117.34ms
iter 177180: loss nan, time 118.08ms
iter 177190: loss nan, time 116.92ms
tensor(0.4063)
iter 177200: loss nan, time 120.82ms
iter 177210: loss nan, time 119.52ms
iter 177220: loss nan, time 119.71ms
iter 177230: loss nan, time 118.95ms
iter 177240: loss nan, time 120.83ms
step 177250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 177250: loss nan, time 2821.37ms
iter 177260: loss nan, time 120.71ms
iter 177270: loss nan, time 121.84ms
iter 177280: loss nan, time 121.85ms
iter 177290: loss nan, time 122.26ms
tensor(0.4373)
iter 177300: loss nan, time 121.99ms
iter 177310: loss nan, time 121.00ms
iter 177320: loss nan, time 121.67ms
iter 177330: loss nan, time 121.85ms
iter 177340: loss nan, time 122.16ms
iter 177350: loss nan, time 122.04ms
iter 177360: loss nan, time 120.56ms
iter 177370: loss nan, time 120.58ms
iter 177380: loss nan, time 120.82ms
iter 177390: loss nan, time 121.45ms
tensor(0.4686)
iter 177400: loss nan, time 122.06ms
iter 177410: loss nan, time 120.77ms
iter 177420: loss nan, time 121.88ms
iter 177430: loss nan, time 122.56ms
iter 177440: loss nan, time 121.01ms
iter 177450: loss nan, time 121.62ms
iter 177460: loss nan, time 120.63ms
iter 177470: loss nan, time 121.99ms
iter 177480: loss nan, time 121.83ms
iter 177490: loss nan, time 120.94ms
tensor(0.5000)
step 177500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 177500: loss nan, time 2918.22ms
iter 177510: loss nan, time 119.92ms
iter 177520: loss nan, time 119.50ms
iter 177530: loss nan, time 119.74ms
iter 177540: loss nan, time 119.54ms
iter 177550: loss nan, time 119.82ms
iter 177560: loss nan, time 119.68ms
iter 177570: loss nan, time 118.39ms
iter 177580: loss nan, time 120.10ms
iter 177590: loss nan, time 119.78ms
tensor(0.5314)
iter 177600: loss nan, time 119.69ms
iter 177610: loss nan, time 119.60ms
iter 177620: loss nan, time 118.81ms
iter 177630: loss nan, time 120.46ms
iter 177640: loss nan, time 120.95ms
iter 177650: loss nan, time 122.46ms
iter 177660: loss nan, time 122.55ms
iter 177670: loss nan, time 121.84ms
iter 177680: loss nan, time 121.92ms
iter 177690: loss nan, time 120.63ms
tensor(0.5627)
iter 177700: loss nan, time 121.18ms
iter 177710: loss nan, time 120.65ms
iter 177720: loss nan, time 120.61ms
iter 177730: loss nan, time 120.68ms
iter 177740: loss nan, time 120.63ms
step 177750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 177750: loss nan, time 2913.31ms
iter 177760: loss nan, time 120.72ms
iter 177770: loss nan, time 120.90ms
iter 177780: loss nan, time 120.49ms
iter 177790: loss nan, time 120.61ms
tensor(0.5937)
iter 177800: loss nan, time 120.64ms
iter 177810: loss nan, time 121.24ms
iter 177820: loss nan, time 121.56ms
iter 177830: loss nan, time 121.33ms
iter 177840: loss nan, time 119.69ms
iter 177850: loss nan, time 119.41ms
iter 177860: loss nan, time 119.81ms
iter 177870: loss nan, time 120.47ms
iter 177880: loss nan, time 120.39ms
iter 177890: loss nan, time 120.72ms
tensor(0.6243)
iter 177900: loss nan, time 120.78ms
iter 177910: loss nan, time 120.37ms
iter 177920: loss nan, time 121.38ms
iter 177930: loss nan, time 121.80ms
iter 177940: loss nan, time 122.59ms
iter 177950: loss nan, time 122.35ms
iter 177960: loss nan, time 118.98ms
iter 177970: loss nan, time 121.02ms
iter 177980: loss nan, time 120.06ms
iter 177990: loss nan, time 120.06ms
tensor(0.6545)
step 178000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 178000: loss nan, time 2918.37ms
iter 178010: loss nan, time 120.85ms
iter 178020: loss nan, time 117.98ms
iter 178030: loss nan, time 118.28ms
iter 178040: loss nan, time 118.45ms
iter 178050: loss nan, time 118.65ms
iter 178060: loss nan, time 119.21ms
iter 178070: loss nan, time 119.24ms
iter 178080: loss nan, time 118.29ms
iter 178090: loss nan, time 120.22ms
tensor(0.6841)
iter 178100: loss nan, time 119.54ms
iter 178110: loss nan, time 120.46ms
iter 178120: loss nan, time 119.56ms
iter 178130: loss nan, time 118.86ms
iter 178140: loss nan, time 119.73ms
iter 178150: loss nan, time 119.34ms
iter 178160: loss nan, time 119.92ms
iter 178170: loss nan, time 121.16ms
iter 178180: loss nan, time 120.80ms
iter 178190: loss nan, time 121.65ms
tensor(0.7129)
iter 178200: loss nan, time 122.53ms
iter 178210: loss nan, time 120.31ms
iter 178220: loss nan, time 121.12ms
iter 178230: loss nan, time 121.03ms
iter 178240: loss nan, time 120.02ms
step 178250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 178250: loss nan, time 2881.88ms
iter 178260: loss nan, time 122.08ms
iter 178270: loss nan, time 119.91ms
iter 178280: loss nan, time 122.05ms
iter 178290: loss nan, time 120.91ms
tensor(0.7409)
iter 178300: loss nan, time 121.60ms
iter 178310: loss nan, time 122.36ms
iter 178320: loss nan, time 122.15ms
iter 178330: loss nan, time 122.52ms
iter 178340: loss nan, time 120.82ms
iter 178350: loss nan, time 121.94ms
iter 178360: loss nan, time 121.29ms
iter 178370: loss nan, time 122.25ms
iter 178380: loss nan, time 121.33ms
iter 178390: loss nan, time 121.21ms
tensor(0.7679)
iter 178400: loss nan, time 122.76ms
iter 178410: loss nan, time 121.34ms
iter 178420: loss nan, time 121.10ms
iter 178430: loss nan, time 120.99ms
iter 178440: loss nan, time 121.11ms
iter 178450: loss nan, time 121.34ms
iter 178460: loss nan, time 121.37ms
iter 178470: loss nan, time 119.22ms
iter 178480: loss nan, time 119.86ms
iter 178490: loss nan, time 119.68ms
tensor(0.7939)
step 178500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 178500: loss nan, time 2917.26ms
iter 178510: loss nan, time 120.41ms
iter 178520: loss nan, time 120.71ms
iter 178530: loss nan, time 120.37ms
iter 178540: loss nan, time 120.65ms
iter 178550: loss nan, time 121.66ms
iter 178560: loss nan, time 122.74ms
iter 178570: loss nan, time 121.82ms
iter 178580: loss nan, time 119.02ms
iter 178590: loss nan, time 121.34ms
tensor(0.8187)
iter 178600: loss nan, time 121.46ms
iter 178610: loss nan, time 121.54ms
iter 178620: loss nan, time 118.50ms
iter 178630: loss nan, time 119.92ms
iter 178640: loss nan, time 118.28ms
iter 178650: loss nan, time 120.65ms
iter 178660: loss nan, time 119.44ms
iter 178670: loss nan, time 121.54ms
iter 178680: loss nan, time 122.06ms
iter 178690: loss nan, time 121.72ms
tensor(0.8423)
iter 178700: loss nan, time 120.62ms
iter 178710: loss nan, time 121.02ms
iter 178720: loss nan, time 120.58ms
iter 178730: loss nan, time 121.62ms
iter 178740: loss nan, time 119.45ms
step 178750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 178750: loss nan, time 2927.78ms
iter 178760: loss nan, time 120.47ms
iter 178770: loss nan, time 118.61ms
iter 178780: loss nan, time 120.51ms
iter 178790: loss nan, time 121.69ms
tensor(0.8645)
iter 178800: loss nan, time 123.41ms
iter 178810: loss nan, time 121.98ms
iter 178820: loss nan, time 121.97ms
iter 178830: loss nan, time 118.80ms
iter 178840: loss nan, time 121.51ms
iter 178850: loss nan, time 119.51ms
iter 178860: loss nan, time 120.39ms
iter 178870: loss nan, time 120.63ms
iter 178880: loss nan, time 120.67ms
iter 178890: loss nan, time 119.85ms
tensor(0.8853)
iter 178900: loss nan, time 122.76ms
iter 178910: loss nan, time 121.78ms
iter 178920: loss nan, time 121.29ms
iter 178930: loss nan, time 121.27ms
iter 178940: loss nan, time 120.91ms
iter 178950: loss nan, time 121.64ms
iter 178960: loss nan, time 121.14ms
iter 178970: loss nan, time 120.80ms
iter 178980: loss nan, time 120.92ms
iter 178990: loss nan, time 118.95ms
tensor(0.9045)
step 179000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 179000: loss nan, time 2895.62ms
iter 179010: loss nan, time 120.39ms
iter 179020: loss nan, time 120.69ms
iter 179030: loss nan, time 121.90ms
iter 179040: loss nan, time 121.54ms
iter 179050: loss nan, time 119.26ms
iter 179060: loss nan, time 120.38ms
iter 179070: loss nan, time 120.39ms
iter 179080: loss nan, time 120.59ms
iter 179090: loss nan, time 120.24ms
tensor(0.9222)
iter 179100: loss nan, time 121.34ms
iter 179110: loss nan, time 122.04ms
iter 179120: loss nan, time 122.50ms
iter 179130: loss nan, time 120.67ms
iter 179140: loss nan, time 121.52ms
iter 179150: loss nan, time 121.06ms
iter 179160: loss nan, time 121.64ms
iter 179170: loss nan, time 119.81ms
iter 179180: loss nan, time 120.34ms
iter 179190: loss nan, time 119.75ms
tensor(0.9382)
iter 179200: loss nan, time 120.83ms
iter 179210: loss nan, time 122.23ms
iter 179220: loss nan, time 123.08ms
iter 179230: loss nan, time 122.08ms
iter 179240: loss nan, time 121.73ms
step 179250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 179250: loss nan, time 2914.13ms
iter 179260: loss nan, time 120.11ms
iter 179270: loss nan, time 121.24ms
iter 179280: loss nan, time 120.80ms
iter 179290: loss nan, time 120.48ms
tensor(0.9524)
iter 179300: loss nan, time 121.74ms
iter 179310: loss nan, time 123.15ms
iter 179320: loss nan, time 122.21ms
iter 179330: loss nan, time 119.65ms
iter 179340: loss nan, time 119.54ms
iter 179350: loss nan, time 120.71ms
iter 179360: loss nan, time 120.76ms
iter 179370: loss nan, time 121.49ms
iter 179380: loss nan, time 123.29ms
iter 179390: loss nan, time 122.69ms
tensor(0.9649)
iter 179400: loss nan, time 122.06ms
iter 179410: loss nan, time 121.50ms
iter 179420: loss nan, time 119.55ms
iter 179430: loss nan, time 119.93ms
iter 179440: loss nan, time 120.41ms
iter 179450: loss nan, time 121.08ms
iter 179460: loss nan, time 121.64ms
iter 179470: loss nan, time 120.56ms
iter 179480: loss nan, time 122.73ms
iter 179490: loss nan, time 121.57ms
tensor(0.9755)
step 179500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 179500: loss nan, time 2912.88ms
iter 179510: loss nan, time 119.97ms
iter 179520: loss nan, time 121.55ms
iter 179530: loss nan, time 119.85ms
iter 179540: loss nan, time 119.62ms
iter 179550: loss nan, time 119.78ms
iter 179560: loss nan, time 120.55ms
iter 179570: loss nan, time 120.58ms
iter 179580: loss nan, time 119.87ms
iter 179590: loss nan, time 120.16ms
tensor(0.9843)
iter 179600: loss nan, time 121.51ms
iter 179610: loss nan, time 123.28ms
iter 179620: loss nan, time 122.63ms
iter 179630: loss nan, time 120.18ms
iter 179640: loss nan, time 120.61ms
iter 179650: loss nan, time 121.06ms
iter 179660: loss nan, time 121.38ms
iter 179670: loss nan, time 118.88ms
iter 179680: loss nan, time 119.29ms
iter 179690: loss nan, time 120.15ms
tensor(0.9911)
iter 179700: loss nan, time 120.77ms
iter 179710: loss nan, time 120.48ms
iter 179720: loss nan, time 119.25ms
iter 179730: loss nan, time 120.52ms
iter 179740: loss nan, time 121.22ms
step 179750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 179750: loss nan, time 2913.36ms
iter 179760: loss nan, time 122.76ms
iter 179770: loss nan, time 122.70ms
iter 179780: loss nan, time 118.20ms
iter 179790: loss nan, time 121.29ms
tensor(0.9961)
iter 179800: loss nan, time 121.74ms
iter 179810: loss nan, time 121.19ms
iter 179820: loss nan, time 118.74ms
iter 179830: loss nan, time 120.39ms
iter 179840: loss nan, time 119.30ms
iter 179850: loss nan, time 120.93ms
iter 179860: loss nan, time 120.51ms
iter 179870: loss nan, time 121.39ms
iter 179880: loss nan, time 122.01ms
iter 179890: loss nan, time 122.12ms
tensor(0.9990)
iter 179900: loss nan, time 122.16ms
iter 179910: loss nan, time 121.58ms
iter 179920: loss nan, time 120.47ms
iter 179930: loss nan, time 121.64ms
iter 179940: loss nan, time 118.35ms
iter 179950: loss nan, time 120.69ms
iter 179960: loss nan, time 119.21ms
iter 179970: loss nan, time 120.53ms
iter 179980: loss nan, time 120.90ms
iter 179990: loss nan, time 120.41ms
tensor(1.)
step 180000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 180000: loss nan, time 2923.51ms
iter 180010: loss nan, time 121.65ms
iter 180020: loss nan, time 121.58ms
iter 180030: loss nan, time 119.65ms
iter 180040: loss nan, time 119.18ms
iter 180050: loss nan, time 120.80ms
iter 180060: loss nan, time 121.04ms
iter 180070: loss nan, time 120.69ms
iter 180080: loss nan, time 121.20ms
iter 180090: loss nan, time 121.73ms
tensor(0.9990)
iter 180100: loss nan, time 123.28ms
iter 180110: loss nan, time 121.53ms
iter 180120: loss nan, time 120.24ms
iter 180130: loss nan, time 121.04ms
iter 180140: loss nan, time 119.31ms
iter 180150: loss nan, time 120.20ms
iter 180160: loss nan, time 118.46ms
iter 180170: loss nan, time 120.85ms
iter 180180: loss nan, time 120.91ms
iter 180190: loss nan, time 121.61ms
tensor(0.9961)
iter 180200: loss nan, time 123.37ms
iter 180210: loss nan, time 122.09ms
iter 180220: loss nan, time 119.42ms
iter 180230: loss nan, time 121.95ms
iter 180240: loss nan, time 119.91ms
step 180250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 180250: loss nan, time 2910.76ms
iter 180260: loss nan, time 120.96ms
iter 180270: loss nan, time 120.66ms
iter 180280: loss nan, time 120.68ms
iter 180290: loss nan, time 119.27ms
tensor(0.9911)
iter 180300: loss nan, time 121.49ms
iter 180310: loss nan, time 121.55ms
iter 180320: loss nan, time 121.65ms
iter 180330: loss nan, time 121.92ms
iter 180340: loss nan, time 120.36ms
iter 180350: loss nan, time 121.57ms
iter 180360: loss nan, time 121.09ms
iter 180370: loss nan, time 121.05ms
iter 180380: loss nan, time 121.02ms
iter 180390: loss nan, time 121.14ms
tensor(0.9843)
iter 180400: loss nan, time 121.82ms
iter 180410: loss nan, time 121.22ms
iter 180420: loss nan, time 121.09ms
iter 180430: loss nan, time 120.24ms
iter 180440: loss nan, time 119.31ms
iter 180450: loss nan, time 117.24ms
iter 180460: loss nan, time 118.47ms
iter 180470: loss nan, time 116.57ms
iter 180480: loss nan, time 115.03ms
iter 180490: loss nan, time 117.55ms
tensor(0.9755)
step 180500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 180500: loss nan, time 2884.87ms
iter 180510: loss nan, time 116.93ms
iter 180520: loss nan, time 116.15ms
iter 180530: loss nan, time 115.02ms
iter 180540: loss nan, time 118.52ms
iter 180550: loss nan, time 117.29ms
iter 180560: loss nan, time 114.83ms
iter 180570: loss nan, time 117.02ms
iter 180580: loss nan, time 115.97ms
iter 180590: loss nan, time 116.45ms
tensor(0.9649)
iter 180600: loss nan, time 117.81ms
iter 180610: loss nan, time 116.18ms
iter 180620: loss nan, time 116.57ms
iter 180630: loss nan, time 117.42ms
iter 180640: loss nan, time 116.25ms
iter 180650: loss nan, time 116.45ms
iter 180660: loss nan, time 118.38ms
iter 180670: loss nan, time 115.70ms
iter 180680: loss nan, time 115.21ms
iter 180690: loss nan, time 117.83ms
tensor(0.9524)
iter 180700: loss nan, time 117.60ms
iter 180710: loss nan, time 115.20ms
iter 180720: loss nan, time 117.49ms
iter 180730: loss nan, time 116.40ms
iter 180740: loss nan, time 115.12ms
step 180750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 180750: loss nan, time 2912.43ms
iter 180760: loss nan, time 115.82ms
iter 180770: loss nan, time 115.22ms
iter 180780: loss nan, time 118.68ms
iter 180790: loss nan, time 117.26ms
tensor(0.9382)
iter 180800: loss nan, time 113.84ms
iter 180810: loss nan, time 118.69ms
iter 180820: loss nan, time 115.02ms
iter 180830: loss nan, time 115.48ms
iter 180840: loss nan, time 118.66ms
iter 180850: loss nan, time 116.80ms
iter 180860: loss nan, time 115.15ms
iter 180870: loss nan, time 116.61ms
iter 180880: loss nan, time 115.15ms
iter 180890: loss nan, time 117.02ms
tensor(0.9222)
iter 180900: loss nan, time 118.57ms
iter 180910: loss nan, time 115.77ms
iter 180920: loss nan, time 117.21ms
iter 180930: loss nan, time 118.31ms
iter 180940: loss nan, time 115.96ms
iter 180950: loss nan, time 117.18ms
iter 180960: loss nan, time 118.46ms
iter 180970: loss nan, time 115.83ms
iter 180980: loss nan, time 117.22ms
iter 180990: loss nan, time 117.71ms
tensor(0.9045)
step 181000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 181000: loss nan, time 2898.43ms
iter 181010: loss nan, time 118.14ms
iter 181020: loss nan, time 116.49ms
iter 181030: loss nan, time 115.63ms
iter 181040: loss nan, time 119.42ms
iter 181050: loss nan, time 115.84ms
iter 181060: loss nan, time 114.86ms
iter 181070: loss nan, time 118.46ms
iter 181080: loss nan, time 116.07ms
iter 181090: loss nan, time 115.54ms
tensor(0.8853)
iter 181100: loss nan, time 118.66ms
iter 181110: loss nan, time 116.18ms
iter 181120: loss nan, time 115.03ms
iter 181130: loss nan, time 118.63ms
iter 181140: loss nan, time 116.05ms
iter 181150: loss nan, time 116.08ms
iter 181160: loss nan, time 118.22ms
iter 181170: loss nan, time 115.34ms
iter 181180: loss nan, time 114.73ms
iter 181190: loss nan, time 118.09ms
tensor(0.8645)
iter 181200: loss nan, time 115.01ms
iter 181210: loss nan, time 116.86ms
iter 181220: loss nan, time 117.29ms
iter 181230: loss nan, time 114.47ms
iter 181240: loss nan, time 115.91ms
step 181250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 181250: loss nan, time 2917.64ms
iter 181260: loss nan, time 115.00ms
iter 181270: loss nan, time 116.97ms
iter 181280: loss nan, time 115.73ms
iter 181290: loss nan, time 115.38ms
tensor(0.8423)
iter 181300: loss nan, time 118.70ms
iter 181310: loss nan, time 116.12ms
iter 181320: loss nan, time 115.10ms
iter 181330: loss nan, time 117.55ms
iter 181340: loss nan, time 115.93ms
iter 181350: loss nan, time 116.52ms
iter 181360: loss nan, time 117.62ms
iter 181370: loss nan, time 115.23ms
iter 181380: loss nan, time 116.44ms
iter 181390: loss nan, time 117.59ms
tensor(0.8187)
iter 181400: loss nan, time 115.51ms
iter 181410: loss nan, time 118.50ms
iter 181420: loss nan, time 116.92ms
iter 181430: loss nan, time 114.82ms
iter 181440: loss nan, time 117.33ms
iter 181450: loss nan, time 114.78ms
iter 181460: loss nan, time 115.21ms
iter 181470: loss nan, time 119.09ms
iter 181480: loss nan, time 122.66ms
iter 181490: loss nan, time 119.70ms
tensor(0.7939)
step 181500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 181500: loss nan, time 2897.97ms
iter 181510: loss nan, time 122.57ms
iter 181520: loss nan, time 121.10ms
iter 181530: loss nan, time 121.27ms
iter 181540: loss nan, time 119.25ms
iter 181550: loss nan, time 119.00ms
iter 181560: loss nan, time 119.13ms
iter 181570: loss nan, time 119.11ms
iter 181580: loss nan, time 119.27ms
iter 181590: loss nan, time 119.97ms
tensor(0.7679)
iter 181600: loss nan, time 120.61ms
iter 181610: loss nan, time 119.60ms
iter 181620: loss nan, time 121.39ms
iter 181630: loss nan, time 121.49ms
iter 181640: loss nan, time 122.36ms
iter 181650: loss nan, time 122.44ms
iter 181660: loss nan, time 121.77ms
iter 181670: loss nan, time 122.57ms
iter 181680: loss nan, time 121.67ms
iter 181690: loss nan, time 119.84ms
tensor(0.7409)
iter 181700: loss nan, time 118.77ms
iter 181710: loss nan, time 119.62ms
iter 181720: loss nan, time 120.76ms
iter 181730: loss nan, time 121.45ms
iter 181740: loss nan, time 120.47ms
step 181750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 181750: loss nan, time 2914.70ms
iter 181760: loss nan, time 122.68ms
iter 181770: loss nan, time 122.77ms
iter 181780: loss nan, time 121.53ms
iter 181790: loss nan, time 121.40ms
tensor(0.7129)
iter 181800: loss nan, time 120.41ms
iter 181810: loss nan, time 119.13ms
iter 181820: loss nan, time 119.64ms
iter 181830: loss nan, time 120.88ms
iter 181840: loss nan, time 120.59ms
iter 181850: loss nan, time 121.94ms
iter 181860: loss nan, time 121.32ms
iter 181870: loss nan, time 122.68ms
iter 181880: loss nan, time 122.55ms
iter 181890: loss nan, time 121.79ms
tensor(0.6841)
iter 181900: loss nan, time 121.83ms
iter 181910: loss nan, time 119.49ms
iter 181920: loss nan, time 119.50ms
iter 181930: loss nan, time 119.73ms
iter 181940: loss nan, time 119.69ms
iter 181950: loss nan, time 121.13ms
iter 181960: loss nan, time 121.84ms
iter 181970: loss nan, time 121.74ms
iter 181980: loss nan, time 122.67ms
iter 181990: loss nan, time 120.51ms
tensor(0.6545)
step 182000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 182000: loss nan, time 2916.10ms
iter 182010: loss nan, time 121.08ms
iter 182020: loss nan, time 121.66ms
iter 182030: loss nan, time 119.39ms
iter 182040: loss nan, time 119.39ms
iter 182050: loss nan, time 119.61ms
iter 182060: loss nan, time 119.22ms
iter 182070: loss nan, time 120.74ms
iter 182080: loss nan, time 121.85ms
iter 182090: loss nan, time 122.54ms
tensor(0.6243)
iter 182100: loss nan, time 122.28ms
iter 182110: loss nan, time 121.45ms
iter 182120: loss nan, time 122.33ms
iter 182130: loss nan, time 121.31ms
iter 182140: loss nan, time 121.55ms
iter 182150: loss nan, time 119.14ms
iter 182160: loss nan, time 119.73ms
iter 182170: loss nan, time 118.83ms
iter 182180: loss nan, time 119.26ms
iter 182190: loss nan, time 119.93ms
tensor(0.5937)
iter 182200: loss nan, time 120.51ms
iter 182210: loss nan, time 120.98ms
iter 182220: loss nan, time 121.47ms
iter 182230: loss nan, time 122.54ms
iter 182240: loss nan, time 120.38ms
step 182250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 182250: loss nan, time 2902.61ms
iter 182260: loss nan, time 122.28ms
iter 182270: loss nan, time 122.25ms
iter 182280: loss nan, time 122.54ms
iter 182290: loss nan, time 122.59ms
tensor(0.5627)
iter 182300: loss nan, time 119.59ms
iter 182310: loss nan, time 121.10ms
iter 182320: loss nan, time 121.31ms
iter 182330: loss nan, time 121.40ms
iter 182340: loss nan, time 119.04ms
iter 182350: loss nan, time 119.94ms
iter 182360: loss nan, time 119.29ms
iter 182370: loss nan, time 119.44ms
iter 182380: loss nan, time 119.98ms
iter 182390: loss nan, time 120.28ms
tensor(0.5314)
iter 182400: loss nan, time 120.87ms
iter 182410: loss nan, time 119.82ms
iter 182420: loss nan, time 121.62ms
iter 182430: loss nan, time 122.36ms
iter 182440: loss nan, time 122.58ms
iter 182450: loss nan, time 122.68ms
iter 182460: loss nan, time 120.88ms
iter 182470: loss nan, time 121.36ms
iter 182480: loss nan, time 121.37ms
iter 182490: loss nan, time 119.68ms
tensor(0.5000)
step 182500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 182500: loss nan, time 2913.77ms
iter 182510: loss nan, time 119.63ms
iter 182520: loss nan, time 119.15ms
iter 182530: loss nan, time 119.30ms
iter 182540: loss nan, time 120.34ms
iter 182550: loss nan, time 120.30ms
iter 182560: loss nan, time 121.86ms
iter 182570: loss nan, time 122.58ms
iter 182580: loss nan, time 122.51ms
iter 182590: loss nan, time 122.68ms
tensor(0.4686)
iter 182600: loss nan, time 120.66ms
iter 182610: loss nan, time 121.30ms
iter 182620: loss nan, time 121.48ms
iter 182630: loss nan, time 119.53ms
iter 182640: loss nan, time 119.48ms
iter 182650: loss nan, time 119.38ms
iter 182660: loss nan, time 119.74ms
iter 182670: loss nan, time 119.63ms
iter 182680: loss nan, time 120.43ms
iter 182690: loss nan, time 120.87ms
tensor(0.4373)
iter 182700: loss nan, time 122.58ms
iter 182710: loss nan, time 121.39ms
iter 182720: loss nan, time 122.66ms
iter 182730: loss nan, time 121.70ms
iter 182740: loss nan, time 121.64ms
step 182750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 182750: loss nan, time 2919.11ms
iter 182760: loss nan, time 121.47ms
iter 182770: loss nan, time 118.35ms
iter 182780: loss nan, time 119.63ms
iter 182790: loss nan, time 119.51ms
tensor(0.4063)
iter 182800: loss nan, time 119.59ms
iter 182810: loss nan, time 120.25ms
iter 182820: loss nan, time 119.28ms
iter 182830: loss nan, time 121.67ms
iter 182840: loss nan, time 122.07ms
iter 182850: loss nan, time 120.42ms
iter 182860: loss nan, time 122.44ms
iter 182870: loss nan, time 122.61ms
iter 182880: loss nan, time 121.47ms
iter 182890: loss nan, time 121.74ms
tensor(0.3757)
iter 182900: loss nan, time 119.42ms
iter 182910: loss nan, time 119.39ms
iter 182920: loss nan, time 119.31ms
iter 182930: loss nan, time 120.17ms
iter 182940: loss nan, time 120.24ms
iter 182950: loss nan, time 120.79ms
iter 182960: loss nan, time 119.74ms
iter 182970: loss nan, time 122.15ms
iter 182980: loss nan, time 121.84ms
iter 182990: loss nan, time 122.26ms
tensor(0.3455)
step 183000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 183000: loss nan, time 2904.61ms
iter 183010: loss nan, time 122.48ms
iter 183020: loss nan, time 121.45ms
iter 183030: loss nan, time 122.40ms
iter 183040: loss nan, time 122.28ms
iter 183050: loss nan, time 121.31ms
iter 183060: loss nan, time 121.32ms
iter 183070: loss nan, time 121.11ms
iter 183080: loss nan, time 119.31ms
iter 183090: loss nan, time 119.19ms
tensor(0.3159)
iter 183100: loss nan, time 119.74ms
iter 183110: loss nan, time 118.23ms
iter 183120: loss nan, time 119.19ms
iter 183130: loss nan, time 119.18ms
iter 183140: loss nan, time 119.32ms
iter 183150: loss nan, time 119.34ms
iter 183160: loss nan, time 120.25ms
iter 183170: loss nan, time 120.87ms
iter 183180: loss nan, time 121.58ms
iter 183190: loss nan, time 122.28ms
tensor(0.2871)
iter 183200: loss nan, time 120.63ms
iter 183210: loss nan, time 122.43ms
iter 183220: loss nan, time 122.36ms
iter 183230: loss nan, time 121.24ms
iter 183240: loss nan, time 120.99ms
step 183250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 183250: loss nan, time 2910.66ms
iter 183260: loss nan, time 119.34ms
iter 183270: loss nan, time 119.16ms
iter 183280: loss nan, time 119.09ms
iter 183290: loss nan, time 119.66ms
tensor(0.2591)
iter 183300: loss nan, time 120.35ms
iter 183310: loss nan, time 119.99ms
iter 183320: loss nan, time 118.34ms
iter 183330: loss nan, time 119.27ms
iter 183340: loss nan, time 120.25ms
iter 183350: loss nan, time 120.28ms
iter 183360: loss nan, time 120.40ms
iter 183370: loss nan, time 118.90ms
iter 183380: loss nan, time 121.00ms
iter 183390: loss nan, time 120.50ms
tensor(0.2321)
iter 183400: loss nan, time 121.84ms
iter 183410: loss nan, time 121.98ms
iter 183420: loss nan, time 121.17ms
iter 183430: loss nan, time 122.21ms
iter 183440: loss nan, time 121.97ms
iter 183450: loss nan, time 121.95ms
iter 183460: loss nan, time 122.00ms
iter 183470: loss nan, time 120.89ms
iter 183480: loss nan, time 120.92ms
iter 183490: loss nan, time 121.09ms
tensor(0.2061)
step 183500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 183500: loss nan, time 2917.65ms
iter 183510: loss nan, time 120.83ms
iter 183520: loss nan, time 120.90ms
iter 183530: loss nan, time 120.85ms
iter 183540: loss nan, time 120.98ms
iter 183550: loss nan, time 120.79ms
iter 183560: loss nan, time 118.74ms
iter 183570: loss nan, time 118.74ms
iter 183580: loss nan, time 118.81ms
iter 183590: loss nan, time 118.69ms
tensor(0.1813)
iter 183600: loss nan, time 119.06ms
iter 183610: loss nan, time 118.77ms
iter 183620: loss nan, time 118.79ms
iter 183630: loss nan, time 118.71ms
iter 183640: loss nan, time 118.85ms
iter 183650: loss nan, time 119.54ms
iter 183660: loss nan, time 119.57ms
iter 183670: loss nan, time 120.30ms
iter 183680: loss nan, time 119.93ms
iter 183690: loss nan, time 120.48ms
tensor(0.1577)
iter 183700: loss nan, time 120.44ms
iter 183710: loss nan, time 119.97ms
iter 183720: loss nan, time 121.09ms
iter 183730: loss nan, time 121.59ms
iter 183740: loss nan, time 121.44ms
step 183750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 183750: loss nan, time 2904.10ms
iter 183760: loss nan, time 121.19ms
iter 183770: loss nan, time 119.86ms
iter 183780: loss nan, time 121.72ms
iter 183790: loss nan, time 121.52ms
tensor(0.1355)
iter 183800: loss nan, time 122.64ms
iter 183810: loss nan, time 121.38ms
iter 183820: loss nan, time 120.55ms
iter 183830: loss nan, time 121.04ms
iter 183840: loss nan, time 120.68ms
iter 183850: loss nan, time 120.90ms
iter 183860: loss nan, time 120.84ms
iter 183870: loss nan, time 118.96ms
iter 183880: loss nan, time 121.63ms
iter 183890: loss nan, time 121.34ms
tensor(0.1147)
iter 183900: loss nan, time 121.01ms
iter 183910: loss nan, time 120.86ms
iter 183920: loss nan, time 118.80ms
iter 183930: loss nan, time 118.93ms
iter 183940: loss nan, time 118.94ms
iter 183950: loss nan, time 119.01ms
iter 183960: loss nan, time 119.83ms
iter 183970: loss nan, time 120.39ms
iter 183980: loss nan, time 119.95ms
iter 183990: loss nan, time 120.22ms
tensor(0.0955)
step 184000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 184000: loss nan, time 2906.88ms
iter 184010: loss nan, time 119.90ms
iter 184020: loss nan, time 119.41ms
iter 184030: loss nan, time 119.78ms
iter 184040: loss nan, time 118.73ms
iter 184050: loss nan, time 120.25ms
iter 184060: loss nan, time 119.86ms
iter 184070: loss nan, time 119.90ms
iter 184080: loss nan, time 119.78ms
iter 184090: loss nan, time 118.55ms
tensor(0.0778)
iter 184100: loss nan, time 120.38ms
iter 184110: loss nan, time 119.64ms
iter 184120: loss nan, time 120.02ms
iter 184130: loss nan, time 119.80ms
iter 184140: loss nan, time 119.37ms
iter 184150: loss nan, time 120.70ms
iter 184160: loss nan, time 120.82ms
iter 184170: loss nan, time 120.87ms
iter 184180: loss nan, time 120.95ms
iter 184190: loss nan, time 120.11ms
tensor(0.0618)
iter 184200: loss nan, time 121.88ms
iter 184210: loss nan, time 121.92ms
iter 184220: loss nan, time 121.68ms
iter 184230: loss nan, time 121.85ms
iter 184240: loss nan, time 120.81ms
step 184250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 184250: loss nan, time 2895.83ms
iter 184260: loss nan, time 120.12ms
iter 184270: loss nan, time 119.72ms
iter 184280: loss nan, time 120.07ms
iter 184290: loss nan, time 119.49ms
tensor(0.0476)
iter 184300: loss nan, time 118.70ms
iter 184310: loss nan, time 120.12ms
iter 184320: loss nan, time 119.68ms
iter 184330: loss nan, time 119.58ms
iter 184340: loss nan, time 119.60ms
iter 184350: loss nan, time 118.35ms
iter 184360: loss nan, time 119.62ms
iter 184370: loss nan, time 119.44ms
iter 184380: loss nan, time 119.59ms
iter 184390: loss nan, time 119.47ms
tensor(0.0351)
iter 184400: loss nan, time 118.77ms
iter 184410: loss nan, time 120.27ms
iter 184420: loss nan, time 120.41ms
iter 184430: loss nan, time 120.12ms
iter 184440: loss nan, time 119.72ms
iter 184450: loss nan, time 118.52ms
iter 184460: loss nan, time 119.25ms
iter 184470: loss nan, time 119.68ms
iter 184480: loss nan, time 119.50ms
iter 184490: loss nan, time 119.61ms
tensor(0.0245)
step 184500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 184500: loss nan, time 2912.93ms
iter 184510: loss nan, time 118.29ms
iter 184520: loss nan, time 119.88ms
iter 184530: loss nan, time 119.72ms
iter 184540: loss nan, time 119.56ms
iter 184550: loss nan, time 119.65ms
iter 184560: loss nan, time 118.31ms
iter 184570: loss nan, time 120.19ms
iter 184580: loss nan, time 119.56ms
iter 184590: loss nan, time 119.59ms
tensor(0.0157)
iter 184600: loss nan, time 120.38ms
iter 184610: loss nan, time 119.33ms
iter 184620: loss nan, time 120.82ms
iter 184630: loss nan, time 122.71ms
iter 184640: loss nan, time 121.95ms
iter 184650: loss nan, time 121.43ms
iter 184660: loss nan, time 120.67ms
iter 184670: loss nan, time 120.66ms
iter 184680: loss nan, time 121.50ms
iter 184690: loss nan, time 121.77ms
tensor(0.0089)
iter 184700: loss nan, time 121.05ms
iter 184710: loss nan, time 120.55ms
iter 184720: loss nan, time 120.79ms
iter 184730: loss nan, time 119.56ms
iter 184740: loss nan, time 116.51ms
step 184750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 184750: loss nan, time 2891.13ms
iter 184760: loss nan, time 115.73ms
iter 184770: loss nan, time 117.36ms
iter 184780: loss nan, time 118.62ms
iter 184790: loss nan, time 116.34ms
tensor(0.0039)
iter 184800: loss nan, time 114.94ms
iter 184810: loss nan, time 117.56ms
iter 184820: loss nan, time 117.01ms
iter 184830: loss nan, time 117.18ms
iter 184840: loss nan, time 118.59ms
iter 184850: loss nan, time 116.39ms
iter 184860: loss nan, time 115.27ms
iter 184870: loss nan, time 117.14ms
iter 184880: loss nan, time 116.58ms
iter 184890: loss nan, time 115.70ms
tensor(0.0010)
iter 184900: loss nan, time 119.08ms
iter 184910: loss nan, time 116.37ms
iter 184920: loss nan, time 116.96ms
iter 184930: loss nan, time 118.16ms
iter 184940: loss nan, time 116.39ms
iter 184950: loss nan, time 117.41ms
iter 184960: loss nan, time 118.32ms
iter 184970: loss nan, time 116.31ms
iter 184980: loss nan, time 117.17ms
iter 184990: loss nan, time 117.51ms
tensor(0.0010)
step 185000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 185000: loss nan, time 2912.83ms
iter 185010: loss nan, time 114.97ms
iter 185020: loss nan, time 118.45ms
iter 185030: loss nan, time 115.42ms
iter 185040: loss nan, time 117.15ms
iter 185050: loss nan, time 118.26ms
iter 185060: loss nan, time 115.92ms
iter 185070: loss nan, time 114.90ms
iter 185080: loss nan, time 117.99ms
iter 185090: loss nan, time 116.03ms
tensor(0.0010)
iter 185100: loss nan, time 118.14ms
iter 185110: loss nan, time 117.04ms
iter 185120: loss nan, time 115.45ms
iter 185130: loss nan, time 114.83ms
iter 185140: loss nan, time 116.41ms
iter 185150: loss nan, time 115.60ms
iter 185160: loss nan, time 117.03ms
iter 185170: loss nan, time 115.25ms
iter 185180: loss nan, time 120.69ms
iter 185190: loss nan, time 120.82ms
tensor(0.0039)
iter 185200: loss nan, time 120.41ms
iter 185210: loss nan, time 122.32ms
iter 185220: loss nan, time 122.43ms
iter 185230: loss nan, time 121.60ms
iter 185240: loss nan, time 121.59ms
step 185250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 185250: loss nan, time 2917.74ms
iter 185260: loss nan, time 119.53ms
iter 185270: loss nan, time 120.84ms
iter 185280: loss nan, time 120.85ms
iter 185290: loss nan, time 120.78ms
tensor(0.0089)
iter 185300: loss nan, time 121.43ms
iter 185310: loss nan, time 122.50ms
iter 185320: loss nan, time 123.02ms
iter 185330: loss nan, time 120.79ms
iter 185340: loss nan, time 119.64ms
iter 185350: loss nan, time 122.11ms
iter 185360: loss nan, time 120.48ms
iter 185370: loss nan, time 120.61ms
iter 185380: loss nan, time 120.39ms
iter 185390: loss nan, time 120.60ms
tensor(0.0157)
iter 185400: loss nan, time 120.74ms
iter 185410: loss nan, time 122.97ms
iter 185420: loss nan, time 122.36ms
iter 185430: loss nan, time 117.77ms
iter 185440: loss nan, time 116.40ms
iter 185450: loss nan, time 116.89ms
iter 185460: loss nan, time 116.66ms
iter 185470: loss nan, time 115.49ms
iter 185480: loss nan, time 116.95ms
iter 185490: loss nan, time 115.44ms
tensor(0.0245)
step 185500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 185500: loss nan, time 2898.88ms
iter 185510: loss nan, time 115.66ms
iter 185520: loss nan, time 116.53ms
iter 185530: loss nan, time 117.13ms
iter 185540: loss nan, time 116.53ms
iter 185550: loss nan, time 115.61ms
iter 185560: loss nan, time 116.60ms
iter 185570: loss nan, time 115.75ms
iter 185580: loss nan, time 116.68ms
iter 185590: loss nan, time 116.83ms
tensor(0.0351)
iter 185600: loss nan, time 116.21ms
iter 185610: loss nan, time 116.60ms
iter 185620: loss nan, time 117.31ms
iter 185630: loss nan, time 115.61ms
iter 185640: loss nan, time 116.72ms
iter 185650: loss nan, time 114.85ms
iter 185660: loss nan, time 114.54ms
iter 185670: loss nan, time 118.24ms
iter 185680: loss nan, time 115.88ms
iter 185690: loss nan, time 116.89ms
tensor(0.0476)
iter 185700: loss nan, time 118.73ms
iter 185710: loss nan, time 114.62ms
iter 185720: loss nan, time 116.94ms
iter 185730: loss nan, time 117.14ms
iter 185740: loss nan, time 115.81ms
step 185750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 185750: loss nan, time 2906.27ms
iter 185760: loss nan, time 116.06ms
iter 185770: loss nan, time 114.55ms
iter 185780: loss nan, time 118.63ms
iter 185790: loss nan, time 115.86ms
tensor(0.0618)
iter 185800: loss nan, time 117.73ms
iter 185810: loss nan, time 118.33ms
iter 185820: loss nan, time 116.52ms
iter 185830: loss nan, time 114.98ms
iter 185840: loss nan, time 117.77ms
iter 185850: loss nan, time 115.54ms
iter 185860: loss nan, time 116.77ms
iter 185870: loss nan, time 115.87ms
iter 185880: loss nan, time 114.71ms
iter 185890: loss nan, time 115.95ms
tensor(0.0778)
iter 185900: loss nan, time 116.21ms
iter 185910: loss nan, time 116.99ms
iter 185920: loss nan, time 117.22ms
iter 185930: loss nan, time 115.33ms
iter 185940: loss nan, time 116.69ms
iter 185950: loss nan, time 115.59ms
iter 185960: loss nan, time 117.03ms
iter 185970: loss nan, time 116.83ms
iter 185980: loss nan, time 115.75ms
iter 185990: loss nan, time 116.91ms
tensor(0.0955)
step 186000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 186000: loss nan, time 2888.89ms
iter 186010: loss nan, time 116.52ms
iter 186020: loss nan, time 114.87ms
iter 186030: loss nan, time 114.38ms
iter 186040: loss nan, time 117.35ms
iter 186050: loss nan, time 115.60ms
iter 186060: loss nan, time 116.48ms
iter 186070: loss nan, time 116.84ms
iter 186080: loss nan, time 114.68ms
iter 186090: loss nan, time 114.62ms
tensor(0.1147)
iter 186100: loss nan, time 115.53ms
iter 186110: loss nan, time 116.69ms
iter 186120: loss nan, time 117.97ms
iter 186130: loss nan, time 115.95ms
iter 186140: loss nan, time 116.88ms
iter 186150: loss nan, time 115.81ms
iter 186160: loss nan, time 115.23ms
iter 186170: loss nan, time 116.53ms
iter 186180: loss nan, time 115.69ms
iter 186190: loss nan, time 116.86ms
tensor(0.1355)
iter 186200: loss nan, time 118.84ms
iter 186210: loss nan, time 115.89ms
iter 186220: loss nan, time 116.65ms
iter 186230: loss nan, time 116.06ms
iter 186240: loss nan, time 115.82ms
step 186250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 186250: loss nan, time 2901.41ms
iter 186260: loss nan, time 117.39ms
iter 186270: loss nan, time 116.14ms
iter 186280: loss nan, time 117.89ms
iter 186290: loss nan, time 116.40ms
tensor(0.1577)
iter 186300: loss nan, time 117.37ms
iter 186310: loss nan, time 118.04ms
iter 186320: loss nan, time 115.94ms
iter 186330: loss nan, time 117.08ms
iter 186340: loss nan, time 118.19ms
iter 186350: loss nan, time 115.36ms
iter 186360: loss nan, time 116.91ms
iter 186370: loss nan, time 118.03ms
iter 186380: loss nan, time 116.50ms
iter 186390: loss nan, time 116.79ms
tensor(0.1813)
iter 186400: loss nan, time 117.14ms
iter 186410: loss nan, time 116.17ms
iter 186420: loss nan, time 116.69ms
iter 186430: loss nan, time 116.60ms
iter 186440: loss nan, time 115.52ms
iter 186450: loss nan, time 117.01ms
iter 186460: loss nan, time 116.53ms
iter 186470: loss nan, time 114.98ms
iter 186480: loss nan, time 116.48ms
iter 186490: loss nan, time 116.09ms
tensor(0.2061)
step 186500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 186500: loss nan, time 2917.46ms
iter 186510: loss nan, time 118.16ms
iter 186520: loss nan, time 116.05ms
iter 186530: loss nan, time 116.81ms
iter 186540: loss nan, time 115.92ms
iter 186550: loss nan, time 115.78ms
iter 186560: loss nan, time 117.05ms
iter 186570: loss nan, time 117.30ms
iter 186580: loss nan, time 115.65ms
iter 186590: loss nan, time 116.75ms
tensor(0.2321)
iter 186600: loss nan, time 116.88ms
iter 186610: loss nan, time 115.52ms
iter 186620: loss nan, time 116.87ms
iter 186630: loss nan, time 116.18ms
iter 186640: loss nan, time 116.91ms
iter 186650: loss nan, time 118.79ms
iter 186660: loss nan, time 114.95ms
iter 186670: loss nan, time 116.89ms
iter 186680: loss nan, time 116.24ms
iter 186690: loss nan, time 115.87ms
tensor(0.2591)
iter 186700: loss nan, time 117.61ms
iter 186710: loss nan, time 116.09ms
iter 186720: loss nan, time 114.03ms
iter 186730: loss nan, time 117.99ms
iter 186740: loss nan, time 113.77ms
step 186750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 186750: loss nan, time 2905.36ms
iter 186760: loss nan, time 115.18ms
iter 186770: loss nan, time 115.10ms
iter 186780: loss nan, time 117.10ms
iter 186790: loss nan, time 115.88ms
tensor(0.2871)
iter 186800: loss nan, time 115.29ms
iter 186810: loss nan, time 118.22ms
iter 186820: loss nan, time 115.83ms
iter 186830: loss nan, time 117.02ms
iter 186840: loss nan, time 116.04ms
iter 186850: loss nan, time 115.32ms
iter 186860: loss nan, time 114.61ms
iter 186870: loss nan, time 116.06ms
iter 186880: loss nan, time 115.25ms
iter 186890: loss nan, time 118.05ms
tensor(0.3159)
iter 186900: loss nan, time 116.81ms
iter 186910: loss nan, time 116.96ms
iter 186920: loss nan, time 115.38ms
iter 186930: loss nan, time 115.62ms
iter 186940: loss nan, time 117.23ms
iter 186950: loss nan, time 116.75ms
iter 186960: loss nan, time 115.43ms
iter 186970: loss nan, time 116.78ms
iter 186980: loss nan, time 116.13ms
iter 186990: loss nan, time 114.79ms
tensor(0.3455)
step 187000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 187000: loss nan, time 2904.89ms
iter 187010: loss nan, time 114.80ms
iter 187020: loss nan, time 117.09ms
iter 187030: loss nan, time 116.31ms
iter 187040: loss nan, time 114.84ms
iter 187050: loss nan, time 116.90ms
iter 187060: loss nan, time 116.09ms
iter 187070: loss nan, time 115.10ms
iter 187080: loss nan, time 117.95ms
iter 187090: loss nan, time 116.72ms
tensor(0.3757)
iter 187100: loss nan, time 116.24ms
iter 187110: loss nan, time 118.02ms
iter 187120: loss nan, time 115.92ms
iter 187130: loss nan, time 116.31ms
iter 187140: loss nan, time 118.41ms
iter 187150: loss nan, time 115.60ms
iter 187160: loss nan, time 117.03ms
iter 187170: loss nan, time 118.05ms
iter 187180: loss nan, time 114.93ms
iter 187190: loss nan, time 116.10ms
tensor(0.4063)
iter 187200: loss nan, time 117.16ms
iter 187210: loss nan, time 114.72ms
iter 187220: loss nan, time 117.52ms
iter 187230: loss nan, time 116.61ms
iter 187240: loss nan, time 114.72ms
step 187250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 187250: loss nan, time 2898.20ms
iter 187260: loss nan, time 114.51ms
iter 187270: loss nan, time 116.97ms
iter 187280: loss nan, time 115.93ms
iter 187290: loss nan, time 115.19ms
tensor(0.4373)
iter 187300: loss nan, time 118.35ms
iter 187310: loss nan, time 114.75ms
iter 187320: loss nan, time 116.81ms
iter 187330: loss nan, time 115.90ms
iter 187340: loss nan, time 114.69ms
iter 187350: loss nan, time 118.16ms
iter 187360: loss nan, time 115.61ms
iter 187370: loss nan, time 115.58ms
iter 187380: loss nan, time 118.02ms
iter 187390: loss nan, time 114.66ms
tensor(0.4686)
iter 187400: loss nan, time 117.06ms
iter 187410: loss nan, time 116.35ms
iter 187420: loss nan, time 114.64ms
iter 187430: loss nan, time 117.83ms
iter 187440: loss nan, time 115.75ms
iter 187450: loss nan, time 116.12ms
iter 187460: loss nan, time 117.35ms
iter 187470: loss nan, time 114.54ms
iter 187480: loss nan, time 116.98ms
iter 187490: loss nan, time 117.46ms
tensor(0.5000)
step 187500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 187500: loss nan, time 2885.39ms
iter 187510: loss nan, time 115.71ms
iter 187520: loss nan, time 115.02ms
iter 187530: loss nan, time 116.68ms
iter 187540: loss nan, time 114.77ms
iter 187550: loss nan, time 116.69ms
iter 187560: loss nan, time 116.78ms
iter 187570: loss nan, time 114.56ms
iter 187580: loss nan, time 117.48ms
iter 187590: loss nan, time 114.63ms
tensor(0.5314)
iter 187600: loss nan, time 117.01ms
iter 187610: loss nan, time 118.13ms
iter 187620: loss nan, time 114.62ms
iter 187630: loss nan, time 116.52ms
iter 187640: loss nan, time 116.32ms
iter 187650: loss nan, time 115.13ms
iter 187660: loss nan, time 117.91ms
iter 187670: loss nan, time 115.05ms
iter 187680: loss nan, time 116.49ms
iter 187690: loss nan, time 117.65ms
tensor(0.5627)
iter 187700: loss nan, time 115.13ms
iter 187710: loss nan, time 118.05ms
iter 187720: loss nan, time 116.15ms
iter 187730: loss nan, time 114.80ms
iter 187740: loss nan, time 118.24ms
step 187750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 187750: loss nan, time 2886.57ms
iter 187760: loss nan, time 117.97ms
iter 187770: loss nan, time 114.84ms
iter 187780: loss nan, time 116.80ms
iter 187790: loss nan, time 116.30ms
tensor(0.5937)
iter 187800: loss nan, time 115.52ms
iter 187810: loss nan, time 118.47ms
iter 187820: loss nan, time 117.58ms
iter 187830: loss nan, time 114.79ms
iter 187840: loss nan, time 118.07ms
iter 187850: loss nan, time 115.58ms
iter 187860: loss nan, time 114.72ms
iter 187870: loss nan, time 118.28ms
iter 187880: loss nan, time 116.33ms
iter 187890: loss nan, time 115.01ms
tensor(0.6243)
iter 187900: loss nan, time 118.81ms
iter 187910: loss nan, time 114.82ms
iter 187920: loss nan, time 114.83ms
iter 187930: loss nan, time 118.25ms
iter 187940: loss nan, time 115.51ms
iter 187950: loss nan, time 116.98ms
iter 187960: loss nan, time 118.16ms
iter 187970: loss nan, time 114.94ms
iter 187980: loss nan, time 116.85ms
iter 187990: loss nan, time 118.22ms
tensor(0.6545)
step 188000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 188000: loss nan, time 2907.22ms
iter 188010: loss nan, time 118.16ms
iter 188020: loss nan, time 116.23ms
iter 188030: loss nan, time 114.84ms
iter 188040: loss nan, time 117.89ms
iter 188050: loss nan, time 116.16ms
iter 188060: loss nan, time 114.96ms
iter 188070: loss nan, time 118.16ms
iter 188080: loss nan, time 115.58ms
iter 188090: loss nan, time 114.69ms
tensor(0.6841)
iter 188100: loss nan, time 118.16ms
iter 188110: loss nan, time 114.96ms
iter 188120: loss nan, time 116.90ms
iter 188130: loss nan, time 116.86ms
iter 188140: loss nan, time 115.04ms
iter 188150: loss nan, time 115.86ms
iter 188160: loss nan, time 116.14ms
iter 188170: loss nan, time 114.97ms
iter 188180: loss nan, time 117.99ms
iter 188190: loss nan, time 115.32ms
tensor(0.7129)
iter 188200: loss nan, time 117.22ms
iter 188210: loss nan, time 115.88ms
iter 188220: loss nan, time 114.88ms
iter 188230: loss nan, time 117.08ms
iter 188240: loss nan, time 116.50ms
step 188250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 188250: loss nan, time 2901.57ms
iter 188260: loss nan, time 118.18ms
iter 188270: loss nan, time 113.92ms
iter 188280: loss nan, time 117.98ms
iter 188290: loss nan, time 116.31ms
tensor(0.7409)
iter 188300: loss nan, time 114.70ms
iter 188310: loss nan, time 118.43ms
iter 188320: loss nan, time 115.87ms
iter 188330: loss nan, time 114.74ms
iter 188340: loss nan, time 118.12ms
iter 188350: loss nan, time 113.86ms
iter 188360: loss nan, time 116.83ms
iter 188370: loss nan, time 115.99ms
iter 188380: loss nan, time 114.76ms
iter 188390: loss nan, time 117.93ms
tensor(0.7679)
iter 188400: loss nan, time 115.68ms
iter 188410: loss nan, time 114.71ms
iter 188420: loss nan, time 117.90ms
iter 188430: loss nan, time 114.58ms
iter 188440: loss nan, time 116.79ms
iter 188450: loss nan, time 115.97ms
iter 188460: loss nan, time 115.19ms
iter 188470: loss nan, time 115.84ms
iter 188480: loss nan, time 114.15ms
iter 188490: loss nan, time 116.41ms
tensor(0.7939)
step 188500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 188500: loss nan, time 2906.12ms
iter 188510: loss nan, time 114.84ms
iter 188520: loss nan, time 118.53ms
iter 188530: loss nan, time 115.72ms
iter 188540: loss nan, time 116.39ms
iter 188550: loss nan, time 117.25ms
iter 188560: loss nan, time 114.66ms
iter 188570: loss nan, time 117.98ms
iter 188580: loss nan, time 116.61ms
iter 188590: loss nan, time 115.00ms
tensor(0.8187)
iter 188600: loss nan, time 118.83ms
iter 188610: loss nan, time 115.17ms
iter 188620: loss nan, time 117.23ms
iter 188630: loss nan, time 119.66ms
iter 188640: loss nan, time 116.12ms
iter 188650: loss nan, time 116.85ms
iter 188660: loss nan, time 118.88ms
iter 188670: loss nan, time 113.57ms
iter 188680: loss nan, time 116.97ms
iter 188690: loss nan, time 116.73ms
tensor(0.8423)
iter 188700: loss nan, time 115.25ms
iter 188710: loss nan, time 117.93ms
iter 188720: loss nan, time 115.74ms
iter 188730: loss nan, time 113.67ms
iter 188740: loss nan, time 117.80ms
step 188750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 188750: loss nan, time 2897.55ms
iter 188760: loss nan, time 117.37ms
iter 188770: loss nan, time 115.51ms
iter 188780: loss nan, time 117.60ms
iter 188790: loss nan, time 118.30ms
tensor(0.8645)
iter 188800: loss nan, time 114.91ms
iter 188810: loss nan, time 116.98ms
iter 188820: loss nan, time 115.64ms
iter 188830: loss nan, time 114.49ms
iter 188840: loss nan, time 118.24ms
iter 188850: loss nan, time 115.83ms
iter 188860: loss nan, time 115.24ms
iter 188870: loss nan, time 118.48ms
iter 188880: loss nan, time 114.75ms
iter 188890: loss nan, time 116.65ms
tensor(0.8853)
iter 188900: loss nan, time 118.57ms
iter 188910: loss nan, time 115.92ms
iter 188920: loss nan, time 116.39ms
iter 188930: loss nan, time 119.21ms
iter 188940: loss nan, time 114.35ms
iter 188950: loss nan, time 116.68ms
iter 188960: loss nan, time 115.94ms
iter 188970: loss nan, time 114.63ms
iter 188980: loss nan, time 118.10ms
iter 188990: loss nan, time 115.84ms
tensor(0.9045)
step 189000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 189000: loss nan, time 2901.11ms
iter 189010: loss nan, time 116.80ms
iter 189020: loss nan, time 114.79ms
iter 189030: loss nan, time 117.93ms
iter 189040: loss nan, time 115.87ms
iter 189050: loss nan, time 114.66ms
iter 189060: loss nan, time 116.43ms
iter 189070: loss nan, time 114.37ms
iter 189080: loss nan, time 117.10ms
iter 189090: loss nan, time 116.49ms
tensor(0.9222)
iter 189100: loss nan, time 115.86ms
iter 189110: loss nan, time 118.03ms
iter 189120: loss nan, time 115.62ms
iter 189130: loss nan, time 117.12ms
iter 189140: loss nan, time 116.79ms
iter 189150: loss nan, time 115.17ms
iter 189160: loss nan, time 118.63ms
iter 189170: loss nan, time 116.21ms
iter 189180: loss nan, time 114.79ms
iter 189190: loss nan, time 118.26ms
tensor(0.9382)
iter 189200: loss nan, time 115.45ms
iter 189210: loss nan, time 117.01ms
iter 189220: loss nan, time 118.29ms
iter 189230: loss nan, time 114.64ms
iter 189240: loss nan, time 116.84ms
step 189250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 189250: loss nan, time 2898.73ms
iter 189260: loss nan, time 113.82ms
iter 189270: loss nan, time 117.61ms
iter 189280: loss nan, time 114.71ms
iter 189290: loss nan, time 118.07ms
tensor(0.9524)
iter 189300: loss nan, time 117.01ms
iter 189310: loss nan, time 114.73ms
iter 189320: loss nan, time 115.89ms
iter 189330: loss nan, time 115.01ms
iter 189340: loss nan, time 116.66ms
iter 189350: loss nan, time 117.22ms
iter 189360: loss nan, time 114.81ms
iter 189370: loss nan, time 117.92ms
iter 189380: loss nan, time 115.88ms
iter 189390: loss nan, time 116.48ms
tensor(0.9649)
iter 189400: loss nan, time 117.38ms
iter 189410: loss nan, time 113.73ms
iter 189420: loss nan, time 118.15ms
iter 189430: loss nan, time 116.39ms
iter 189440: loss nan, time 115.02ms
iter 189450: loss nan, time 117.19ms
iter 189460: loss nan, time 114.66ms
iter 189470: loss nan, time 116.65ms
iter 189480: loss nan, time 116.70ms
iter 189490: loss nan, time 114.67ms
tensor(0.9755)
step 189500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 189500: loss nan, time 2909.04ms
iter 189510: loss nan, time 115.35ms
iter 189520: loss nan, time 114.73ms
iter 189530: loss nan, time 118.59ms
iter 189540: loss nan, time 114.60ms
iter 189550: loss nan, time 116.74ms
iter 189560: loss nan, time 116.64ms
iter 189570: loss nan, time 114.74ms
iter 189580: loss nan, time 116.12ms
iter 189590: loss nan, time 115.92ms
tensor(0.9843)
iter 189600: loss nan, time 115.75ms
iter 189610: loss nan, time 118.31ms
iter 189620: loss nan, time 121.26ms
iter 189630: loss nan, time 121.14ms
iter 189640: loss nan, time 121.76ms
iter 189650: loss nan, time 121.38ms
iter 189660: loss nan, time 122.50ms
iter 189670: loss nan, time 122.36ms
iter 189680: loss nan, time 122.77ms
iter 189690: loss nan, time 122.25ms
tensor(0.9911)
iter 189700: loss nan, time 121.36ms
iter 189710: loss nan, time 121.16ms
iter 189720: loss nan, time 120.96ms
iter 189730: loss nan, time 120.25ms
iter 189740: loss nan, time 118.90ms
step 189750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 189750: loss nan, time 2873.70ms
iter 189760: loss nan, time 121.37ms
iter 189770: loss nan, time 122.52ms
iter 189780: loss nan, time 122.09ms
iter 189790: loss nan, time 122.43ms
tensor(0.9961)
iter 189800: loss nan, time 121.50ms
iter 189810: loss nan, time 121.06ms
iter 189820: loss nan, time 120.58ms
iter 189830: loss nan, time 119.05ms
iter 189840: loss nan, time 119.49ms
iter 189850: loss nan, time 119.11ms
iter 189860: loss nan, time 119.47ms
iter 189870: loss nan, time 118.87ms
iter 189880: loss nan, time 119.23ms
iter 189890: loss nan, time 119.19ms
tensor(0.9990)
iter 189900: loss nan, time 119.14ms
iter 189910: loss nan, time 119.69ms
iter 189920: loss nan, time 121.11ms
iter 189930: loss nan, time 121.19ms
iter 189940: loss nan, time 120.17ms
iter 189950: loss nan, time 122.05ms
iter 189960: loss nan, time 122.65ms
iter 189970: loss nan, time 122.38ms
iter 189980: loss nan, time 122.40ms
iter 189990: loss nan, time 120.48ms
tensor(1.)
step 190000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 190000: loss nan, time 2916.36ms
iter 190010: loss nan, time 121.56ms
iter 190020: loss nan, time 121.16ms
iter 190030: loss nan, time 121.18ms
iter 190040: loss nan, time 119.29ms
iter 190050: loss nan, time 118.14ms
iter 190060: loss nan, time 119.41ms
iter 190070: loss nan, time 118.97ms
iter 190080: loss nan, time 119.16ms
iter 190090: loss nan, time 119.70ms
tensor(0.9990)
iter 190100: loss nan, time 120.62ms
iter 190110: loss nan, time 118.87ms
iter 190120: loss nan, time 120.57ms
iter 190130: loss nan, time 121.54ms
iter 190140: loss nan, time 121.48ms
iter 190150: loss nan, time 122.43ms
iter 190160: loss nan, time 121.20ms
iter 190170: loss nan, time 122.44ms
iter 190180: loss nan, time 122.46ms
iter 190190: loss nan, time 121.16ms
tensor(0.9961)
iter 190200: loss nan, time 121.46ms
iter 190210: loss nan, time 121.20ms
iter 190220: loss nan, time 120.99ms
iter 190230: loss nan, time 119.50ms
iter 190240: loss nan, time 119.11ms
step 190250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 190250: loss nan, time 2914.47ms
iter 190260: loss nan, time 118.87ms
iter 190270: loss nan, time 119.20ms
iter 190280: loss nan, time 119.87ms
iter 190290: loss nan, time 119.99ms
tensor(0.9911)
iter 190300: loss nan, time 120.34ms
iter 190310: loss nan, time 119.16ms
iter 190320: loss nan, time 120.00ms
iter 190330: loss nan, time 119.47ms
iter 190340: loss nan, time 120.21ms
iter 190350: loss nan, time 120.15ms
iter 190360: loss nan, time 120.79ms
iter 190370: loss nan, time 121.28ms
iter 190380: loss nan, time 121.87ms
iter 190390: loss nan, time 122.47ms
tensor(0.9843)
iter 190400: loss nan, time 120.58ms
iter 190410: loss nan, time 121.11ms
iter 190420: loss nan, time 120.93ms
iter 190430: loss nan, time 121.20ms
iter 190440: loss nan, time 121.03ms
iter 190450: loss nan, time 118.01ms
iter 190460: loss nan, time 121.09ms
iter 190470: loss nan, time 121.16ms
iter 190480: loss nan, time 121.17ms
iter 190490: loss nan, time 119.30ms
tensor(0.9755)
step 190500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 190500: loss nan, time 2893.21ms
iter 190510: loss nan, time 118.37ms
iter 190520: loss nan, time 121.15ms
iter 190530: loss nan, time 121.83ms
iter 190540: loss nan, time 119.14ms
iter 190550: loss nan, time 118.46ms
iter 190560: loss nan, time 118.02ms
iter 190570: loss nan, time 119.09ms
iter 190580: loss nan, time 119.97ms
iter 190590: loss nan, time 119.38ms
tensor(0.9649)
iter 190600: loss nan, time 119.23ms
iter 190610: loss nan, time 119.48ms
iter 190620: loss nan, time 119.00ms
iter 190630: loss nan, time 120.07ms
iter 190640: loss nan, time 120.43ms
iter 190650: loss nan, time 120.18ms
iter 190660: loss nan, time 119.99ms
iter 190670: loss nan, time 118.85ms
iter 190680: loss nan, time 119.55ms
iter 190690: loss nan, time 120.61ms
tensor(0.9524)
iter 190700: loss nan, time 121.32ms
iter 190710: loss nan, time 121.50ms
iter 190720: loss nan, time 121.16ms
iter 190730: loss nan, time 122.34ms
iter 190740: loss nan, time 121.40ms
step 190750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 190750: loss nan, time 2912.76ms
iter 190760: loss nan, time 121.36ms
iter 190770: loss nan, time 120.92ms
iter 190780: loss nan, time 120.92ms
iter 190790: loss nan, time 121.11ms
tensor(0.9382)
iter 190800: loss nan, time 121.41ms
iter 190810: loss nan, time 121.18ms
iter 190820: loss nan, time 121.19ms
iter 190830: loss nan, time 120.95ms
iter 190840: loss nan, time 119.12ms
iter 190850: loss nan, time 119.27ms
iter 190860: loss nan, time 120.07ms
iter 190870: loss nan, time 120.20ms
iter 190880: loss nan, time 120.37ms
iter 190890: loss nan, time 120.75ms
tensor(0.9222)
iter 190900: loss nan, time 120.53ms
iter 190910: loss nan, time 119.78ms
iter 190920: loss nan, time 120.64ms
iter 190930: loss nan, time 120.66ms
iter 190940: loss nan, time 122.69ms
iter 190950: loss nan, time 122.90ms
iter 190960: loss nan, time 119.59ms
iter 190970: loss nan, time 120.63ms
iter 190980: loss nan, time 120.05ms
iter 190990: loss nan, time 120.30ms
tensor(0.9045)
step 191000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 191000: loss nan, time 2897.14ms
iter 191010: loss nan, time 121.21ms
iter 191020: loss nan, time 119.06ms
iter 191030: loss nan, time 120.20ms
iter 191040: loss nan, time 120.20ms
iter 191050: loss nan, time 121.04ms
iter 191060: loss nan, time 121.26ms
iter 191070: loss nan, time 119.13ms
iter 191080: loss nan, time 118.84ms
iter 191090: loss nan, time 119.13ms
tensor(0.8853)
iter 191100: loss nan, time 119.66ms
iter 191110: loss nan, time 119.84ms
iter 191120: loss nan, time 120.10ms
iter 191130: loss nan, time 119.06ms
iter 191140: loss nan, time 120.42ms
iter 191150: loss nan, time 120.14ms
iter 191160: loss nan, time 120.31ms
iter 191170: loss nan, time 120.94ms
iter 191180: loss nan, time 120.50ms
iter 191190: loss nan, time 122.18ms
tensor(0.8645)
iter 191200: loss nan, time 122.79ms
iter 191210: loss nan, time 122.41ms
iter 191220: loss nan, time 122.28ms
iter 191230: loss nan, time 120.18ms
iter 191240: loss nan, time 121.21ms
step 191250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 191250: loss nan, time 2884.97ms
iter 191260: loss nan, time 121.37ms
iter 191270: loss nan, time 122.03ms
iter 191280: loss nan, time 122.25ms
iter 191290: loss nan, time 120.96ms
tensor(0.8423)
iter 191300: loss nan, time 120.44ms
iter 191310: loss nan, time 120.85ms
iter 191320: loss nan, time 120.09ms
iter 191330: loss nan, time 120.41ms
iter 191340: loss nan, time 120.10ms
iter 191350: loss nan, time 120.33ms
iter 191360: loss nan, time 120.52ms
iter 191370: loss nan, time 120.65ms
iter 191380: loss nan, time 117.95ms
iter 191390: loss nan, time 118.31ms
tensor(0.8187)
iter 191400: loss nan, time 118.70ms
iter 191410: loss nan, time 119.04ms
iter 191420: loss nan, time 119.27ms
iter 191430: loss nan, time 120.68ms
iter 191440: loss nan, time 120.14ms
iter 191450: loss nan, time 120.29ms
iter 191460: loss nan, time 120.47ms
iter 191470: loss nan, time 120.30ms
iter 191480: loss nan, time 120.48ms
iter 191490: loss nan, time 121.05ms
tensor(0.7939)
step 191500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 191500: loss nan, time 2869.38ms
iter 191510: loss nan, time 119.29ms
iter 191520: loss nan, time 119.02ms
iter 191530: loss nan, time 119.34ms
iter 191540: loss nan, time 119.39ms
iter 191550: loss nan, time 119.46ms
iter 191560: loss nan, time 119.94ms
iter 191570: loss nan, time 119.92ms
iter 191580: loss nan, time 120.12ms
iter 191590: loss nan, time 120.37ms
tensor(0.7679)
iter 191600: loss nan, time 120.06ms
iter 191610: loss nan, time 119.35ms
iter 191620: loss nan, time 120.14ms
iter 191630: loss nan, time 120.38ms
iter 191640: loss nan, time 120.91ms
iter 191650: loss nan, time 120.51ms
iter 191660: loss nan, time 120.87ms
iter 191670: loss nan, time 121.61ms
iter 191680: loss nan, time 120.25ms
iter 191690: loss nan, time 122.42ms
tensor(0.7409)
iter 191700: loss nan, time 121.72ms
iter 191710: loss nan, time 122.31ms
iter 191720: loss nan, time 121.02ms
iter 191730: loss nan, time 119.10ms
iter 191740: loss nan, time 121.17ms
step 191750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 191750: loss nan, time 2873.47ms
iter 191760: loss nan, time 120.05ms
iter 191770: loss nan, time 120.76ms
iter 191780: loss nan, time 121.04ms
iter 191790: loss nan, time 120.16ms
tensor(0.7129)
iter 191800: loss nan, time 121.81ms
iter 191810: loss nan, time 121.57ms
iter 191820: loss nan, time 122.47ms
iter 191830: loss nan, time 122.35ms
iter 191840: loss nan, time 119.46ms
iter 191850: loss nan, time 121.00ms
iter 191860: loss nan, time 121.23ms
iter 191870: loss nan, time 121.45ms
iter 191880: loss nan, time 120.69ms
iter 191890: loss nan, time 118.29ms
tensor(0.6841)
iter 191900: loss nan, time 118.83ms
iter 191910: loss nan, time 118.62ms
iter 191920: loss nan, time 119.40ms
iter 191930: loss nan, time 119.37ms
iter 191940: loss nan, time 119.69ms
iter 191950: loss nan, time 118.25ms
iter 191960: loss nan, time 119.60ms
iter 191970: loss nan, time 120.51ms
iter 191980: loss nan, time 121.43ms
iter 191990: loss nan, time 122.20ms
tensor(0.6545)
step 192000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 192000: loss nan, time 2919.00ms
iter 192010: loss nan, time 121.60ms
iter 192020: loss nan, time 121.26ms
iter 192030: loss nan, time 121.21ms
iter 192040: loss nan, time 121.26ms
iter 192050: loss nan, time 121.26ms
iter 192060: loss nan, time 119.55ms
iter 192070: loss nan, time 119.03ms
iter 192080: loss nan, time 119.35ms
iter 192090: loss nan, time 118.45ms
tensor(0.6243)
iter 192100: loss nan, time 120.18ms
iter 192110: loss nan, time 120.23ms
iter 192120: loss nan, time 120.44ms
iter 192130: loss nan, time 120.34ms
iter 192140: loss nan, time 120.52ms
iter 192150: loss nan, time 121.12ms
iter 192160: loss nan, time 121.30ms
iter 192170: loss nan, time 121.76ms
iter 192180: loss nan, time 122.19ms
iter 192190: loss nan, time 120.75ms
tensor(0.5937)
iter 192200: loss nan, time 121.45ms
iter 192210: loss nan, time 121.20ms
iter 192220: loss nan, time 121.47ms
iter 192230: loss nan, time 121.29ms
iter 192240: loss nan, time 119.11ms
step 192250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 192250: loss nan, time 2902.31ms
iter 192260: loss nan, time 121.17ms
iter 192270: loss nan, time 121.11ms
iter 192280: loss nan, time 121.17ms
iter 192290: loss nan, time 119.76ms
tensor(0.5627)
iter 192300: loss nan, time 119.38ms
iter 192310: loss nan, time 119.60ms
iter 192320: loss nan, time 120.25ms
iter 192330: loss nan, time 120.34ms
iter 192340: loss nan, time 120.52ms
iter 192350: loss nan, time 120.32ms
iter 192360: loss nan, time 119.45ms
iter 192370: loss nan, time 121.31ms
iter 192380: loss nan, time 121.92ms
iter 192390: loss nan, time 122.43ms
tensor(0.5314)
iter 192400: loss nan, time 122.76ms
iter 192410: loss nan, time 121.14ms
iter 192420: loss nan, time 121.39ms
iter 192430: loss nan, time 121.06ms
iter 192440: loss nan, time 121.37ms
iter 192450: loss nan, time 121.34ms
iter 192460: loss nan, time 119.16ms
iter 192470: loss nan, time 118.93ms
iter 192480: loss nan, time 119.59ms
iter 192490: loss nan, time 120.41ms
tensor(0.5000)
step 192500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 192500: loss nan, time 2913.38ms
iter 192510: loss nan, time 119.34ms
iter 192520: loss nan, time 119.16ms
iter 192530: loss nan, time 120.17ms
iter 192540: loss nan, time 121.14ms
iter 192550: loss nan, time 120.06ms
iter 192560: loss nan, time 120.91ms
iter 192570: loss nan, time 122.60ms
iter 192580: loss nan, time 121.59ms
iter 192590: loss nan, time 121.67ms
tensor(0.4686)
iter 192600: loss nan, time 119.54ms
iter 192610: loss nan, time 121.20ms
iter 192620: loss nan, time 121.28ms
iter 192630: loss nan, time 119.57ms
iter 192640: loss nan, time 119.67ms
iter 192650: loss nan, time 120.83ms
iter 192660: loss nan, time 119.49ms
iter 192670: loss nan, time 120.32ms
iter 192680: loss nan, time 120.43ms
iter 192690: loss nan, time 121.69ms
tensor(0.4373)
iter 192700: loss nan, time 122.57ms
iter 192710: loss nan, time 121.48ms
iter 192720: loss nan, time 121.30ms
iter 192730: loss nan, time 122.55ms
iter 192740: loss nan, time 120.47ms
step 192750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 192750: loss nan, time 2922.81ms
iter 192760: loss nan, time 120.43ms
iter 192770: loss nan, time 120.23ms
iter 192780: loss nan, time 120.34ms
iter 192790: loss nan, time 121.28ms
tensor(0.4063)
iter 192800: loss nan, time 120.77ms
iter 192810: loss nan, time 123.36ms
iter 192820: loss nan, time 122.12ms
iter 192830: loss nan, time 121.12ms
iter 192840: loss nan, time 120.40ms
iter 192850: loss nan, time 119.48ms
iter 192860: loss nan, time 118.77ms
iter 192870: loss nan, time 119.90ms
iter 192880: loss nan, time 119.65ms
iter 192890: loss nan, time 120.08ms
tensor(0.3757)
iter 192900: loss nan, time 119.40ms
iter 192910: loss nan, time 120.08ms
iter 192920: loss nan, time 118.89ms
iter 192930: loss nan, time 119.92ms
iter 192940: loss nan, time 120.13ms
iter 192950: loss nan, time 120.19ms
iter 192960: loss nan, time 118.14ms
iter 192970: loss nan, time 119.89ms
iter 192980: loss nan, time 119.34ms
iter 192990: loss nan, time 118.57ms
tensor(0.3455)
step 193000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 193000: loss nan, time 2891.77ms
iter 193010: loss nan, time 119.08ms
iter 193020: loss nan, time 118.86ms
iter 193030: loss nan, time 120.29ms
iter 193040: loss nan, time 120.19ms
iter 193050: loss nan, time 120.25ms
iter 193060: loss nan, time 120.21ms
iter 193070: loss nan, time 119.50ms
iter 193080: loss nan, time 121.73ms
iter 193090: loss nan, time 122.57ms
tensor(0.3159)
iter 193100: loss nan, time 122.76ms
iter 193110: loss nan, time 121.28ms
iter 193120: loss nan, time 121.38ms
iter 193130: loss nan, time 122.04ms
iter 193140: loss nan, time 119.31ms
iter 193150: loss nan, time 119.97ms
iter 193160: loss nan, time 120.72ms
iter 193170: loss nan, time 120.90ms
iter 193180: loss nan, time 121.65ms
iter 193190: loss nan, time 122.34ms
tensor(0.2871)
iter 193200: loss nan, time 120.79ms
iter 193210: loss nan, time 121.35ms
iter 193220: loss nan, time 122.84ms
iter 193230: loss nan, time 121.44ms
iter 193240: loss nan, time 121.30ms
step 193250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 193250: loss nan, time 2909.30ms
iter 193260: loss nan, time 119.49ms
iter 193270: loss nan, time 119.22ms
iter 193280: loss nan, time 120.07ms
iter 193290: loss nan, time 120.32ms
tensor(0.2591)
iter 193300: loss nan, time 120.50ms
iter 193310: loss nan, time 120.28ms
iter 193320: loss nan, time 119.34ms
iter 193330: loss nan, time 121.17ms
iter 193340: loss nan, time 122.27ms
iter 193350: loss nan, time 122.36ms
iter 193360: loss nan, time 122.18ms
iter 193370: loss nan, time 120.97ms
iter 193380: loss nan, time 121.05ms
iter 193390: loss nan, time 121.24ms
tensor(0.2321)
iter 193400: loss nan, time 121.43ms
iter 193410: loss nan, time 121.53ms
iter 193420: loss nan, time 119.29ms
iter 193430: loss nan, time 119.61ms
iter 193440: loss nan, time 120.13ms
iter 193450: loss nan, time 120.49ms
iter 193460: loss nan, time 120.40ms
iter 193470: loss nan, time 120.41ms
iter 193480: loss nan, time 120.76ms
iter 193490: loss nan, time 122.33ms
tensor(0.2061)
step 193500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 193500: loss nan, time 2914.83ms
iter 193510: loss nan, time 120.41ms
iter 193520: loss nan, time 120.66ms
iter 193530: loss nan, time 121.19ms
iter 193540: loss nan, time 119.06ms
iter 193550: loss nan, time 119.24ms
iter 193560: loss nan, time 119.15ms
iter 193570: loss nan, time 119.04ms
iter 193580: loss nan, time 119.68ms
iter 193590: loss nan, time 120.42ms
tensor(0.1813)
iter 193600: loss nan, time 121.30ms
iter 193610: loss nan, time 121.34ms
iter 193620: loss nan, time 121.13ms
iter 193630: loss nan, time 122.87ms
iter 193640: loss nan, time 122.49ms
iter 193650: loss nan, time 122.92ms
iter 193660: loss nan, time 121.82ms
iter 193670: loss nan, time 121.25ms
iter 193680: loss nan, time 119.07ms
iter 193690: loss nan, time 119.61ms
tensor(0.1577)
iter 193700: loss nan, time 119.14ms
iter 193710: loss nan, time 119.80ms
iter 193720: loss nan, time 120.11ms
iter 193730: loss nan, time 120.43ms
iter 193740: loss nan, time 120.91ms
step 193750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 193750: loss nan, time 2914.42ms
iter 193760: loss nan, time 120.21ms
iter 193770: loss nan, time 122.62ms
iter 193780: loss nan, time 122.35ms
iter 193790: loss nan, time 122.39ms
tensor(0.1355)
iter 193800: loss nan, time 123.18ms
iter 193810: loss nan, time 120.38ms
iter 193820: loss nan, time 121.29ms
iter 193830: loss nan, time 121.30ms
iter 193840: loss nan, time 119.70ms
iter 193850: loss nan, time 119.21ms
iter 193860: loss nan, time 118.00ms
iter 193870: loss nan, time 119.40ms
iter 193880: loss nan, time 119.86ms
iter 193890: loss nan, time 120.48ms
tensor(0.1147)
iter 193900: loss nan, time 120.83ms
iter 193910: loss nan, time 120.05ms
iter 193920: loss nan, time 120.59ms
iter 193930: loss nan, time 122.71ms
iter 193940: loss nan, time 122.54ms
iter 193950: loss nan, time 122.29ms
iter 193960: loss nan, time 123.03ms
iter 193970: loss nan, time 121.44ms
iter 193980: loss nan, time 121.03ms
iter 193990: loss nan, time 121.02ms
tensor(0.0955)
step 194000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 194000: loss nan, time 2926.24ms
iter 194010: loss nan, time 118.32ms
iter 194020: loss nan, time 119.43ms
iter 194030: loss nan, time 119.84ms
iter 194040: loss nan, time 120.22ms
iter 194050: loss nan, time 120.13ms
iter 194060: loss nan, time 119.70ms
iter 194070: loss nan, time 119.78ms
iter 194080: loss nan, time 119.76ms
iter 194090: loss nan, time 120.02ms
tensor(0.0778)
iter 194100: loss nan, time 120.15ms
iter 194110: loss nan, time 119.69ms
iter 194120: loss nan, time 119.98ms
iter 194130: loss nan, time 119.59ms
iter 194140: loss nan, time 119.65ms
iter 194150: loss nan, time 120.56ms
iter 194160: loss nan, time 119.82ms
iter 194170: loss nan, time 120.20ms
iter 194180: loss nan, time 120.11ms
iter 194190: loss nan, time 120.23ms
tensor(0.0618)
iter 194200: loss nan, time 121.06ms
iter 194210: loss nan, time 119.63ms
iter 194220: loss nan, time 120.31ms
iter 194230: loss nan, time 119.71ms
iter 194240: loss nan, time 119.74ms
step 194250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 194250: loss nan, time 2902.99ms
iter 194260: loss nan, time 120.03ms
iter 194270: loss nan, time 120.80ms
iter 194280: loss nan, time 120.72ms
iter 194290: loss nan, time 120.81ms
tensor(0.0476)
iter 194300: loss nan, time 120.78ms
iter 194310: loss nan, time 122.05ms
iter 194320: loss nan, time 120.59ms
iter 194330: loss nan, time 122.61ms
iter 194340: loss nan, time 122.30ms
iter 194350: loss nan, time 122.37ms
iter 194360: loss nan, time 122.87ms
iter 194370: loss nan, time 119.00ms
iter 194380: loss nan, time 120.98ms
iter 194390: loss nan, time 121.67ms
tensor(0.0351)
iter 194400: loss nan, time 118.78ms
iter 194410: loss nan, time 119.07ms
iter 194420: loss nan, time 119.87ms
iter 194430: loss nan, time 119.16ms
iter 194440: loss nan, time 119.28ms
iter 194450: loss nan, time 119.07ms
iter 194460: loss nan, time 120.29ms
iter 194470: loss nan, time 120.92ms
iter 194480: loss nan, time 120.17ms
iter 194490: loss nan, time 121.44ms
tensor(0.0245)
step 194500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 194500: loss nan, time 2915.58ms
iter 194510: loss nan, time 120.95ms
iter 194520: loss nan, time 122.82ms
iter 194530: loss nan, time 122.87ms
iter 194540: loss nan, time 121.32ms
iter 194550: loss nan, time 121.89ms
iter 194560: loss nan, time 121.67ms
iter 194570: loss nan, time 119.43ms
iter 194580: loss nan, time 119.52ms
iter 194590: loss nan, time 119.50ms
tensor(0.0157)
iter 194600: loss nan, time 117.82ms
iter 194610: loss nan, time 119.02ms
iter 194620: loss nan, time 119.81ms
iter 194630: loss nan, time 120.22ms
iter 194640: loss nan, time 121.29ms
iter 194650: loss nan, time 122.16ms
iter 194660: loss nan, time 122.62ms
iter 194670: loss nan, time 120.44ms
iter 194680: loss nan, time 122.52ms
iter 194690: loss nan, time 121.52ms
tensor(0.0089)
iter 194700: loss nan, time 121.85ms
iter 194710: loss nan, time 121.23ms
iter 194720: loss nan, time 118.97ms
iter 194730: loss nan, time 119.38ms
iter 194740: loss nan, time 118.14ms
step 194750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 194750: loss nan, time 2915.13ms
iter 194760: loss nan, time 119.25ms
iter 194770: loss nan, time 119.11ms
iter 194780: loss nan, time 118.57ms
iter 194790: loss nan, time 119.11ms
tensor(0.0039)
iter 194800: loss nan, time 120.02ms
iter 194810: loss nan, time 119.44ms
iter 194820: loss nan, time 120.26ms
iter 194830: loss nan, time 120.06ms
iter 194840: loss nan, time 120.30ms
iter 194850: loss nan, time 121.92ms
iter 194860: loss nan, time 122.50ms
iter 194870: loss nan, time 122.23ms
iter 194880: loss nan, time 121.51ms
iter 194890: loss nan, time 121.29ms
tensor(0.0010)
iter 194900: loss nan, time 122.54ms
iter 194910: loss nan, time 121.52ms
iter 194920: loss nan, time 121.22ms
iter 194930: loss nan, time 121.31ms
iter 194940: loss nan, time 121.30ms
iter 194950: loss nan, time 121.27ms
iter 194960: loss nan, time 119.11ms
iter 194970: loss nan, time 119.10ms
iter 194980: loss nan, time 118.95ms
iter 194990: loss nan, time 118.98ms
tensor(0.0010)
step 195000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 195000: loss nan, time 2911.93ms
iter 195010: loss nan, time 119.02ms
iter 195020: loss nan, time 119.29ms
iter 195030: loss nan, time 120.07ms
iter 195040: loss nan, time 119.57ms
iter 195050: loss nan, time 121.19ms
iter 195060: loss nan, time 121.66ms
iter 195070: loss nan, time 122.52ms
iter 195080: loss nan, time 120.37ms
iter 195090: loss nan, time 122.42ms
tensor(0.0010)
iter 195100: loss nan, time 122.69ms
iter 195110: loss nan, time 121.35ms
iter 195120: loss nan, time 121.18ms
iter 195130: loss nan, time 119.18ms
iter 195140: loss nan, time 120.61ms
iter 195150: loss nan, time 119.29ms
iter 195160: loss nan, time 119.06ms
iter 195170: loss nan, time 119.06ms
iter 195180: loss nan, time 119.38ms
iter 195190: loss nan, time 119.24ms
tensor(0.0039)
iter 195200: loss nan, time 120.76ms
iter 195210: loss nan, time 121.14ms
iter 195220: loss nan, time 122.16ms
iter 195230: loss nan, time 122.66ms
iter 195240: loss nan, time 121.61ms
step 195250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 195250: loss nan, time 2906.40ms
iter 195260: loss nan, time 121.96ms
iter 195270: loss nan, time 122.46ms
iter 195280: loss nan, time 121.50ms
iter 195290: loss nan, time 121.77ms
tensor(0.0089)
iter 195300: loss nan, time 122.17ms
iter 195310: loss nan, time 119.20ms
iter 195320: loss nan, time 119.02ms
iter 195330: loss nan, time 119.25ms
iter 195340: loss nan, time 119.72ms
iter 195350: loss nan, time 119.90ms
iter 195360: loss nan, time 120.76ms
iter 195370: loss nan, time 120.77ms
iter 195380: loss nan, time 120.84ms
iter 195390: loss nan, time 122.60ms
tensor(0.0157)
iter 195400: loss nan, time 122.20ms
iter 195410: loss nan, time 122.44ms
iter 195420: loss nan, time 122.47ms
iter 195430: loss nan, time 119.31ms
iter 195440: loss nan, time 120.48ms
iter 195450: loss nan, time 120.31ms
iter 195460: loss nan, time 121.81ms
iter 195470: loss nan, time 119.15ms
iter 195480: loss nan, time 119.37ms
iter 195490: loss nan, time 119.41ms
tensor(0.0245)
step 195500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 195500: loss nan, time 2923.72ms
iter 195510: loss nan, time 120.45ms
iter 195520: loss nan, time 121.16ms
iter 195530: loss nan, time 122.72ms
iter 195540: loss nan, time 122.50ms
iter 195550: loss nan, time 121.40ms
iter 195560: loss nan, time 122.57ms
iter 195570: loss nan, time 121.14ms
iter 195580: loss nan, time 121.17ms
iter 195590: loss nan, time 119.30ms
tensor(0.0351)
iter 195600: loss nan, time 119.38ms
iter 195610: loss nan, time 119.21ms
iter 195620: loss nan, time 119.03ms
iter 195630: loss nan, time 119.44ms
iter 195640: loss nan, time 120.71ms
iter 195650: loss nan, time 120.61ms
iter 195660: loss nan, time 121.10ms
iter 195670: loss nan, time 121.55ms
iter 195680: loss nan, time 120.63ms
iter 195690: loss nan, time 122.36ms
tensor(0.0476)
iter 195700: loss nan, time 123.05ms
iter 195710: loss nan, time 122.34ms
iter 195720: loss nan, time 121.10ms
iter 195730: loss nan, time 118.95ms
iter 195740: loss nan, time 119.16ms
step 195750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 195750: loss nan, time 2923.53ms
iter 195760: loss nan, time 119.47ms
iter 195770: loss nan, time 119.20ms
iter 195780: loss nan, time 119.83ms
iter 195790: loss nan, time 120.21ms
tensor(0.0618)
iter 195800: loss nan, time 119.81ms
iter 195810: loss nan, time 120.98ms
iter 195820: loss nan, time 122.03ms
iter 195830: loss nan, time 122.57ms
iter 195840: loss nan, time 122.55ms
iter 195850: loss nan, time 121.13ms
iter 195860: loss nan, time 121.26ms
iter 195870: loss nan, time 121.12ms
iter 195880: loss nan, time 121.43ms
iter 195890: loss nan, time 119.08ms
tensor(0.0778)
iter 195900: loss nan, time 119.86ms
iter 195910: loss nan, time 118.04ms
iter 195920: loss nan, time 116.81ms
iter 195930: loss nan, time 118.30ms
iter 195940: loss nan, time 115.69ms
iter 195950: loss nan, time 117.34ms
iter 195960: loss nan, time 118.23ms
iter 195970: loss nan, time 115.77ms
iter 195980: loss nan, time 116.82ms
iter 195990: loss nan, time 118.28ms
tensor(0.0955)
step 196000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 196000: loss nan, time 2916.01ms
iter 196010: loss nan, time 115.05ms
iter 196020: loss nan, time 118.36ms
iter 196030: loss nan, time 115.24ms
iter 196040: loss nan, time 117.14ms
iter 196050: loss nan, time 117.19ms
iter 196060: loss nan, time 114.75ms
iter 196070: loss nan, time 115.89ms
iter 196080: loss nan, time 116.62ms
iter 196090: loss nan, time 115.31ms
tensor(0.1147)
iter 196100: loss nan, time 118.51ms
iter 196110: loss nan, time 115.19ms
iter 196120: loss nan, time 116.48ms
iter 196130: loss nan, time 116.19ms
iter 196140: loss nan, time 115.83ms
iter 196150: loss nan, time 116.32ms
iter 196160: loss nan, time 118.30ms
iter 196170: loss nan, time 115.90ms
iter 196180: loss nan, time 115.60ms
iter 196190: loss nan, time 116.23ms
tensor(0.1355)
iter 196200: loss nan, time 115.46ms
iter 196210: loss nan, time 117.15ms
iter 196220: loss nan, time 118.18ms
iter 196230: loss nan, time 115.33ms
iter 196240: loss nan, time 116.95ms
step 196250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 196250: loss nan, time 2934.55ms
iter 196260: loss nan, time 115.27ms
iter 196270: loss nan, time 115.43ms
iter 196280: loss nan, time 116.23ms
iter 196290: loss nan, time 116.16ms
tensor(0.1577)
iter 196300: loss nan, time 115.30ms
iter 196310: loss nan, time 118.26ms
iter 196320: loss nan, time 115.81ms
iter 196330: loss nan, time 116.89ms
iter 196340: loss nan, time 117.07ms
iter 196350: loss nan, time 115.94ms
iter 196360: loss nan, time 117.67ms
iter 196370: loss nan, time 118.15ms
iter 196380: loss nan, time 115.64ms
iter 196390: loss nan, time 117.14ms
tensor(0.1813)
iter 196400: loss nan, time 117.62ms
iter 196410: loss nan, time 115.88ms
iter 196420: loss nan, time 116.93ms
iter 196430: loss nan, time 118.61ms
iter 196440: loss nan, time 115.60ms
iter 196450: loss nan, time 116.91ms
iter 196460: loss nan, time 127.61ms
iter 196470: loss nan, time 121.93ms
iter 196480: loss nan, time 119.21ms
iter 196490: loss nan, time 119.71ms
tensor(0.2061)
step 196500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 196500: loss nan, time 2900.67ms
iter 196510: loss nan, time 119.79ms
iter 196520: loss nan, time 120.63ms
iter 196530: loss nan, time 120.57ms
iter 196540: loss nan, time 120.44ms
iter 196550: loss nan, time 119.90ms
iter 196560: loss nan, time 123.05ms
iter 196570: loss nan, time 121.90ms
iter 196580: loss nan, time 121.84ms
iter 196590: loss nan, time 122.10ms
tensor(0.2321)
iter 196600: loss nan, time 122.05ms
iter 196610: loss nan, time 119.28ms
iter 196620: loss nan, time 120.89ms
iter 196630: loss nan, time 122.04ms
iter 196640: loss nan, time 123.20ms
iter 196650: loss nan, time 121.92ms
iter 196660: loss nan, time 121.82ms
iter 196670: loss nan, time 120.09ms
iter 196680: loss nan, time 120.64ms
iter 196690: loss nan, time 120.27ms
tensor(0.2591)
iter 196700: loss nan, time 121.17ms
iter 196710: loss nan, time 121.87ms
iter 196720: loss nan, time 123.15ms
iter 196730: loss nan, time 121.86ms
iter 196740: loss nan, time 119.86ms
step 196750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 196750: loss nan, time 2894.98ms
iter 196760: loss nan, time 121.97ms
iter 196770: loss nan, time 120.26ms
iter 196780: loss nan, time 120.65ms
iter 196790: loss nan, time 121.17ms
tensor(0.2871)
iter 196800: loss nan, time 122.04ms
iter 196810: loss nan, time 120.33ms
iter 196820: loss nan, time 122.88ms
iter 196830: loss nan, time 121.64ms
iter 196840: loss nan, time 121.32ms
iter 196850: loss nan, time 121.54ms
iter 196860: loss nan, time 119.65ms
iter 196870: loss nan, time 120.81ms
iter 196880: loss nan, time 120.74ms
iter 196890: loss nan, time 121.20ms
tensor(0.3159)
iter 196900: loss nan, time 120.92ms
iter 196910: loss nan, time 121.11ms
iter 196920: loss nan, time 122.24ms
iter 196930: loss nan, time 122.82ms
iter 196940: loss nan, time 119.18ms
iter 196950: loss nan, time 121.12ms
iter 196960: loss nan, time 122.07ms
iter 196970: loss nan, time 121.10ms
iter 196980: loss nan, time 119.60ms
iter 196990: loss nan, time 119.54ms
tensor(0.3455)
step 197000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 197000: loss nan, time 2897.01ms
iter 197010: loss nan, time 120.13ms
iter 197020: loss nan, time 118.13ms
iter 197030: loss nan, time 119.64ms
iter 197040: loss nan, time 120.87ms
iter 197050: loss nan, time 119.66ms
iter 197060: loss nan, time 118.31ms
iter 197070: loss nan, time 121.02ms
iter 197080: loss nan, time 120.76ms
iter 197090: loss nan, time 121.84ms
tensor(0.3757)
iter 197100: loss nan, time 123.02ms
iter 197110: loss nan, time 121.22ms
iter 197120: loss nan, time 122.43ms
iter 197130: loss nan, time 116.36ms
iter 197140: loss nan, time 115.65ms
iter 197150: loss nan, time 116.02ms
iter 197160: loss nan, time 116.06ms
iter 197170: loss nan, time 115.71ms
iter 197180: loss nan, time 118.31ms
iter 197190: loss nan, time 116.06ms
tensor(0.4063)
iter 197200: loss nan, time 116.63ms
iter 197210: loss nan, time 116.04ms
iter 197220: loss nan, time 115.94ms
iter 197230: loss nan, time 117.68ms
iter 197240: loss nan, time 121.26ms
step 197250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 197250: loss nan, time 2916.73ms
iter 197260: loss nan, time 120.78ms
iter 197270: loss nan, time 121.48ms
iter 197280: loss nan, time 119.00ms
iter 197290: loss nan, time 118.69ms
tensor(0.4373)
iter 197300: loss nan, time 120.08ms
iter 197310: loss nan, time 119.82ms
iter 197320: loss nan, time 119.86ms
iter 197330: loss nan, time 120.60ms
iter 197340: loss nan, time 120.65ms
iter 197350: loss nan, time 120.35ms
iter 197360: loss nan, time 120.02ms
iter 197370: loss nan, time 120.17ms
iter 197380: loss nan, time 122.06ms
iter 197390: loss nan, time 120.58ms
tensor(0.4686)
iter 197400: loss nan, time 121.76ms
iter 197410: loss nan, time 122.18ms
iter 197420: loss nan, time 118.92ms
iter 197430: loss nan, time 122.12ms
iter 197440: loss nan, time 121.13ms
iter 197450: loss nan, time 121.91ms
iter 197460: loss nan, time 122.19ms
iter 197470: loss nan, time 119.80ms
iter 197480: loss nan, time 122.23ms
iter 197490: loss nan, time 121.12ms
tensor(0.5000)
step 197500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 197500: loss nan, time 2908.56ms
iter 197510: loss nan, time 121.77ms
iter 197520: loss nan, time 121.42ms
iter 197530: loss nan, time 119.09ms
iter 197540: loss nan, time 120.67ms
iter 197550: loss nan, time 121.18ms
iter 197560: loss nan, time 121.13ms
iter 197570: loss nan, time 120.28ms
iter 197580: loss nan, time 119.07ms
iter 197590: loss nan, time 115.80ms
tensor(0.5314)
iter 197600: loss nan, time 114.65ms
iter 197610: loss nan, time 116.23ms
iter 197620: loss nan, time 115.15ms
iter 197630: loss nan, time 115.27ms
iter 197640: loss nan, time 115.56ms
iter 197650: loss nan, time 115.24ms
iter 197660: loss nan, time 114.52ms
iter 197670: loss nan, time 115.22ms
iter 197680: loss nan, time 114.32ms
iter 197690: loss nan, time 116.58ms
tensor(0.5627)
iter 197700: loss nan, time 118.44ms
iter 197710: loss nan, time 116.66ms
iter 197720: loss nan, time 117.12ms
iter 197730: loss nan, time 117.00ms
iter 197740: loss nan, time 116.44ms
step 197750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 197750: loss nan, time 2902.80ms
iter 197760: loss nan, time 114.26ms
iter 197770: loss nan, time 117.05ms
iter 197780: loss nan, time 115.66ms
iter 197790: loss nan, time 115.40ms
tensor(0.5937)
iter 197800: loss nan, time 117.43ms
iter 197810: loss nan, time 117.40ms
iter 197820: loss nan, time 116.11ms
iter 197830: loss nan, time 115.07ms
iter 197840: loss nan, time 116.23ms
iter 197850: loss nan, time 115.29ms
iter 197860: loss nan, time 116.88ms
iter 197870: loss nan, time 116.75ms
iter 197880: loss nan, time 116.03ms
iter 197890: loss nan, time 116.92ms
tensor(0.6243)
iter 197900: loss nan, time 116.67ms
iter 197910: loss nan, time 115.47ms
iter 197920: loss nan, time 116.98ms
iter 197930: loss nan, time 116.90ms
iter 197940: loss nan, time 115.97ms
iter 197950: loss nan, time 117.11ms
iter 197960: loss nan, time 116.04ms
iter 197970: loss nan, time 115.66ms
iter 197980: loss nan, time 116.92ms
iter 197990: loss nan, time 116.17ms
tensor(0.6545)
step 198000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 198000: loss nan, time 2900.57ms
iter 198010: loss nan, time 117.36ms
iter 198020: loss nan, time 114.70ms
iter 198030: loss nan, time 116.98ms
iter 198040: loss nan, time 116.59ms
iter 198050: loss nan, time 117.12ms
iter 198060: loss nan, time 118.14ms
iter 198070: loss nan, time 115.85ms
iter 198080: loss nan, time 116.91ms
iter 198090: loss nan, time 117.47ms
tensor(0.6841)
iter 198100: loss nan, time 116.01ms
iter 198110: loss nan, time 116.91ms
iter 198120: loss nan, time 116.02ms
iter 198130: loss nan, time 114.35ms
iter 198140: loss nan, time 116.94ms
iter 198150: loss nan, time 116.05ms
iter 198160: loss nan, time 113.24ms
iter 198170: loss nan, time 117.31ms
iter 198180: loss nan, time 115.79ms
iter 198190: loss nan, time 115.76ms
tensor(0.7129)
iter 198200: loss nan, time 115.97ms
iter 198210: loss nan, time 115.45ms
iter 198220: loss nan, time 116.18ms
iter 198230: loss nan, time 116.14ms
iter 198240: loss nan, time 116.66ms
step 198250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 198250: loss nan, time 2896.78ms
iter 198260: loss nan, time 115.58ms
iter 198270: loss nan, time 117.07ms
iter 198280: loss nan, time 116.52ms
iter 198290: loss nan, time 115.82ms
tensor(0.7409)
iter 198300: loss nan, time 116.27ms
iter 198310: loss nan, time 114.91ms
iter 198320: loss nan, time 116.70ms
iter 198330: loss nan, time 118.93ms
iter 198340: loss nan, time 116.17ms
iter 198350: loss nan, time 116.74ms
iter 198360: loss nan, time 117.29ms
iter 198370: loss nan, time 115.00ms
iter 198380: loss nan, time 116.40ms
iter 198390: loss nan, time 116.62ms
tensor(0.7679)
iter 198400: loss nan, time 114.61ms
iter 198410: loss nan, time 117.15ms
iter 198420: loss nan, time 116.38ms
iter 198430: loss nan, time 116.98ms
iter 198440: loss nan, time 118.05ms
iter 198450: loss nan, time 116.02ms
iter 198460: loss nan, time 116.60ms
iter 198470: loss nan, time 116.53ms
iter 198480: loss nan, time 114.54ms
iter 198490: loss nan, time 116.66ms
tensor(0.7939)
step 198500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 198500: loss nan, time 2891.69ms
iter 198510: loss nan, time 117.04ms
iter 198520: loss nan, time 117.03ms
iter 198530: loss nan, time 116.45ms
iter 198540: loss nan, time 117.09ms
iter 198550: loss nan, time 116.09ms
iter 198560: loss nan, time 116.13ms
iter 198570: loss nan, time 116.98ms
iter 198580: loss nan, time 117.07ms
iter 198590: loss nan, time 115.94ms
tensor(0.8187)
iter 198600: loss nan, time 117.34ms
iter 198610: loss nan, time 116.24ms
iter 198620: loss nan, time 115.51ms
iter 198630: loss nan, time 117.46ms
iter 198640: loss nan, time 116.98ms
iter 198650: loss nan, time 115.70ms
iter 198660: loss nan, time 116.84ms
iter 198670: loss nan, time 116.18ms
iter 198680: loss nan, time 115.72ms
iter 198690: loss nan, time 116.95ms
tensor(0.8423)
iter 198700: loss nan, time 117.40ms
iter 198710: loss nan, time 116.43ms
iter 198720: loss nan, time 117.20ms
iter 198730: loss nan, time 116.09ms
iter 198740: loss nan, time 116.10ms
step 198750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 198750: loss nan, time 2916.94ms
iter 198760: loss nan, time 116.07ms
iter 198770: loss nan, time 121.24ms
iter 198780: loss nan, time 122.19ms
iter 198790: loss nan, time 122.73ms
tensor(0.8645)
iter 198800: loss nan, time 121.97ms
iter 198810: loss nan, time 121.40ms
iter 198820: loss nan, time 121.56ms
iter 198830: loss nan, time 121.75ms
iter 198840: loss nan, time 119.65ms
iter 198850: loss nan, time 119.38ms
iter 198860: loss nan, time 120.14ms
iter 198870: loss nan, time 119.62ms
iter 198880: loss nan, time 120.88ms
iter 198890: loss nan, time 120.49ms
tensor(0.8853)
iter 198900: loss nan, time 122.34ms
iter 198910: loss nan, time 122.25ms
iter 198920: loss nan, time 122.76ms
iter 198930: loss nan, time 122.27ms
iter 198940: loss nan, time 119.38ms
iter 198950: loss nan, time 121.29ms
iter 198960: loss nan, time 119.19ms
iter 198970: loss nan, time 120.11ms
iter 198980: loss nan, time 119.57ms
iter 198990: loss nan, time 120.58ms
tensor(0.9045)
step 199000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 199000: loss nan, time 2914.79ms
iter 199010: loss nan, time 118.52ms
iter 199020: loss nan, time 120.42ms
iter 199030: loss nan, time 121.21ms
iter 199040: loss nan, time 122.42ms
iter 199050: loss nan, time 123.13ms
iter 199060: loss nan, time 121.01ms
iter 199070: loss nan, time 121.19ms
iter 199080: loss nan, time 121.36ms
iter 199090: loss nan, time 122.07ms
tensor(0.9222)
iter 199100: loss nan, time 119.71ms
iter 199110: loss nan, time 119.25ms
iter 199120: loss nan, time 120.32ms
iter 199130: loss nan, time 120.48ms
iter 199140: loss nan, time 120.68ms
iter 199150: loss nan, time 119.85ms
iter 199160: loss nan, time 121.20ms
iter 199170: loss nan, time 122.50ms
iter 199180: loss nan, time 122.74ms
iter 199190: loss nan, time 119.46ms
tensor(0.9382)
iter 199200: loss nan, time 120.72ms
iter 199210: loss nan, time 121.29ms
iter 199220: loss nan, time 120.07ms
iter 199230: loss nan, time 120.32ms
iter 199240: loss nan, time 119.51ms
step 199250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 199250: loss nan, time 2908.70ms
iter 199260: loss nan, time 119.31ms
iter 199270: loss nan, time 119.68ms
iter 199280: loss nan, time 118.82ms
iter 199290: loss nan, time 120.46ms
tensor(0.9524)
iter 199300: loss nan, time 120.50ms
iter 199310: loss nan, time 119.17ms
iter 199320: loss nan, time 120.22ms
iter 199330: loss nan, time 119.84ms
iter 199340: loss nan, time 121.48ms
iter 199350: loss nan, time 122.01ms
iter 199360: loss nan, time 121.42ms
iter 199370: loss nan, time 122.58ms
iter 199380: loss nan, time 120.01ms
iter 199390: loss nan, time 121.06ms
tensor(0.9649)
iter 199400: loss nan, time 121.63ms
iter 199410: loss nan, time 121.54ms
iter 199420: loss nan, time 119.42ms
iter 199430: loss nan, time 119.29ms
iter 199440: loss nan, time 120.89ms
iter 199450: loss nan, time 120.46ms
iter 199460: loss nan, time 120.21ms
iter 199470: loss nan, time 120.83ms
iter 199480: loss nan, time 120.91ms
iter 199490: loss nan, time 120.53ms
tensor(0.9755)
step 199500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 199500: loss nan, time 2913.97ms
iter 199510: loss nan, time 122.70ms
iter 199520: loss nan, time 121.33ms
iter 199530: loss nan, time 121.03ms
iter 199540: loss nan, time 120.46ms
iter 199550: loss nan, time 119.19ms
iter 199560: loss nan, time 121.29ms
iter 199570: loss nan, time 119.19ms
iter 199580: loss nan, time 119.70ms
iter 199590: loss nan, time 119.36ms
tensor(0.9843)
iter 199600: loss nan, time 120.82ms
iter 199610: loss nan, time 119.24ms
iter 199620: loss nan, time 120.21ms
iter 199630: loss nan, time 121.02ms
iter 199640: loss nan, time 122.46ms
iter 199650: loss nan, time 122.61ms
iter 199660: loss nan, time 121.68ms
iter 199670: loss nan, time 121.48ms
iter 199680: loss nan, time 121.34ms
iter 199690: loss nan, time 121.45ms
tensor(0.9911)
iter 199700: loss nan, time 120.14ms
iter 199710: loss nan, time 120.00ms
iter 199720: loss nan, time 120.65ms
iter 199730: loss nan, time 120.58ms
iter 199740: loss nan, time 119.79ms
step 199750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 199750: loss nan, time 2910.12ms
iter 199760: loss nan, time 120.83ms
iter 199770: loss nan, time 121.51ms
iter 199780: loss nan, time 122.16ms
iter 199790: loss nan, time 122.77ms
tensor(0.9961)
iter 199800: loss nan, time 119.48ms
iter 199810: loss nan, time 121.53ms
iter 199820: loss nan, time 121.18ms
iter 199830: loss nan, time 121.25ms
iter 199840: loss nan, time 119.08ms
iter 199850: loss nan, time 120.00ms
iter 199860: loss nan, time 119.12ms
iter 199870: loss nan, time 120.24ms
iter 199880: loss nan, time 121.72ms
iter 199890: loss nan, time 122.82ms
tensor(0.9990)
iter 199900: loss nan, time 123.02ms
iter 199910: loss nan, time 121.82ms
iter 199920: loss nan, time 121.67ms
iter 199930: loss nan, time 121.51ms
iter 199940: loss nan, time 119.59ms
iter 199950: loss nan, time 119.30ms
iter 199960: loss nan, time 119.21ms
iter 199970: loss nan, time 120.40ms
iter 199980: loss nan, time 119.68ms
iter 199990: loss nan, time 120.84ms
tensor(1.)
step 200000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 200000: loss nan, time 2913.14ms
iter 200010: loss nan, time 121.61ms
iter 200020: loss nan, time 122.58ms
iter 200030: loss nan, time 122.61ms
iter 200040: loss nan, time 120.63ms
iter 200050: loss nan, time 119.60ms
iter 200060: loss nan, time 119.62ms
iter 200070: loss nan, time 119.78ms
iter 200080: loss nan, time 119.40ms
iter 200090: loss nan, time 119.66ms
tensor(0.9990)
iter 200100: loss nan, time 121.42ms
iter 200110: loss nan, time 120.35ms
iter 200120: loss nan, time 120.88ms
iter 200130: loss nan, time 121.73ms
iter 200140: loss nan, time 123.29ms
iter 200150: loss nan, time 122.63ms
iter 200160: loss nan, time 121.36ms
iter 200170: loss nan, time 119.31ms
iter 200180: loss nan, time 118.54ms
iter 200190: loss nan, time 118.61ms
tensor(0.9961)
iter 200200: loss nan, time 118.75ms
iter 200210: loss nan, time 119.00ms
iter 200220: loss nan, time 118.13ms
iter 200230: loss nan, time 118.45ms
iter 200240: loss nan, time 118.56ms
step 200250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 200250: loss nan, time 2904.46ms
iter 200260: loss nan, time 121.67ms
iter 200270: loss nan, time 120.57ms
iter 200280: loss nan, time 121.77ms
iter 200290: loss nan, time 121.78ms
tensor(0.9911)
iter 200300: loss nan, time 121.81ms
iter 200310: loss nan, time 121.80ms
iter 200320: loss nan, time 120.47ms
iter 200330: loss nan, time 122.98ms
iter 200340: loss nan, time 121.76ms
iter 200350: loss nan, time 121.71ms
iter 200360: loss nan, time 121.77ms
iter 200370: loss nan, time 120.52ms
iter 200380: loss nan, time 121.72ms
iter 200390: loss nan, time 121.72ms
tensor(0.9843)
iter 200400: loss nan, time 122.24ms
iter 200410: loss nan, time 121.94ms
iter 200420: loss nan, time 120.63ms
iter 200430: loss nan, time 121.81ms
iter 200440: loss nan, time 121.95ms
iter 200450: loss nan, time 121.83ms
iter 200460: loss nan, time 121.95ms
iter 200470: loss nan, time 120.62ms
iter 200480: loss nan, time 121.00ms
iter 200490: loss nan, time 122.20ms
tensor(0.9755)
step 200500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 200500: loss nan, time 2919.89ms
iter 200510: loss nan, time 122.09ms
iter 200520: loss nan, time 122.01ms
iter 200530: loss nan, time 121.00ms
iter 200540: loss nan, time 122.02ms
iter 200550: loss nan, time 121.87ms
iter 200560: loss nan, time 121.20ms
iter 200570: loss nan, time 121.82ms
iter 200580: loss nan, time 120.88ms
iter 200590: loss nan, time 120.31ms
tensor(0.9649)
iter 200600: loss nan, time 122.45ms
iter 200610: loss nan, time 122.52ms
iter 200620: loss nan, time 122.12ms
iter 200630: loss nan, time 121.04ms
iter 200640: loss nan, time 121.96ms
iter 200650: loss nan, time 120.83ms
iter 200660: loss nan, time 122.04ms
iter 200670: loss nan, time 122.06ms
iter 200680: loss nan, time 120.58ms
iter 200690: loss nan, time 122.03ms
tensor(0.9524)
iter 200700: loss nan, time 122.13ms
iter 200710: loss nan, time 121.17ms
iter 200720: loss nan, time 120.71ms
iter 200730: loss nan, time 120.80ms
iter 200740: loss nan, time 122.04ms
step 200750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 200750: loss nan, time 2911.08ms
iter 200760: loss nan, time 122.20ms
iter 200770: loss nan, time 121.99ms
iter 200780: loss nan, time 121.16ms
iter 200790: loss nan, time 118.54ms
tensor(0.9382)
iter 200800: loss nan, time 120.87ms
iter 200810: loss nan, time 121.90ms
iter 200820: loss nan, time 122.16ms
iter 200830: loss nan, time 122.02ms
iter 200840: loss nan, time 121.02ms
iter 200850: loss nan, time 121.92ms
iter 200860: loss nan, time 121.94ms
iter 200870: loss nan, time 122.34ms
iter 200880: loss nan, time 122.08ms
iter 200890: loss nan, time 120.25ms
tensor(0.9222)
iter 200900: loss nan, time 121.40ms
iter 200910: loss nan, time 121.87ms
iter 200920: loss nan, time 121.35ms
iter 200930: loss nan, time 122.20ms
iter 200940: loss nan, time 120.96ms
iter 200950: loss nan, time 121.98ms
iter 200960: loss nan, time 121.91ms
iter 200970: loss nan, time 121.88ms
iter 200980: loss nan, time 121.91ms
iter 200990: loss nan, time 120.98ms
tensor(0.9045)
step 201000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 201000: loss nan, time 2894.45ms
iter 201010: loss nan, time 119.56ms
iter 201020: loss nan, time 119.94ms
iter 201030: loss nan, time 119.96ms
iter 201040: loss nan, time 119.17ms
iter 201050: loss nan, time 118.98ms
iter 201060: loss nan, time 120.00ms
iter 201070: loss nan, time 118.97ms
iter 201080: loss nan, time 119.85ms
iter 201090: loss nan, time 119.77ms
tensor(0.8853)
iter 201100: loss nan, time 118.64ms
iter 201110: loss nan, time 119.80ms
iter 201120: loss nan, time 119.86ms
iter 201130: loss nan, time 119.30ms
iter 201140: loss nan, time 120.03ms
iter 201150: loss nan, time 118.87ms
iter 201160: loss nan, time 119.31ms
iter 201170: loss nan, time 119.84ms
iter 201180: loss nan, time 119.33ms
iter 201190: loss nan, time 120.08ms
tensor(0.8645)
iter 201200: loss nan, time 118.86ms
iter 201210: loss nan, time 119.85ms
iter 201220: loss nan, time 119.77ms
iter 201230: loss nan, time 118.66ms
iter 201240: loss nan, time 119.25ms
step 201250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 201250: loss nan, time 2906.29ms
iter 201260: loss nan, time 118.73ms
iter 201270: loss nan, time 119.35ms
iter 201280: loss nan, time 119.58ms
iter 201290: loss nan, time 118.03ms
tensor(0.8423)
iter 201300: loss nan, time 119.14ms
iter 201310: loss nan, time 118.54ms
iter 201320: loss nan, time 118.55ms
iter 201330: loss nan, time 118.64ms
iter 201340: loss nan, time 118.34ms
iter 201350: loss nan, time 118.56ms
iter 201360: loss nan, time 118.15ms
iter 201370: loss nan, time 117.71ms
iter 201380: loss nan, time 118.72ms
iter 201390: loss nan, time 118.03ms
tensor(0.8187)
iter 201400: loss nan, time 118.80ms
iter 201410: loss nan, time 118.59ms
iter 201420: loss nan, time 116.81ms
iter 201430: loss nan, time 118.41ms
iter 201440: loss nan, time 117.62ms
iter 201450: loss nan, time 118.55ms
iter 201460: loss nan, time 117.51ms
iter 201470: loss nan, time 118.55ms
iter 201480: loss nan, time 118.71ms
iter 201490: loss nan, time 117.59ms
tensor(0.7939)
step 201500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 201500: loss nan, time 2893.77ms
iter 201510: loss nan, time 118.46ms
iter 201520: loss nan, time 120.61ms
iter 201530: loss nan, time 120.80ms
iter 201540: loss nan, time 121.86ms
iter 201550: loss nan, time 120.95ms
iter 201560: loss nan, time 119.71ms
iter 201570: loss nan, time 120.24ms
iter 201580: loss nan, time 121.84ms
iter 201590: loss nan, time 121.82ms
tensor(0.7679)
iter 201600: loss nan, time 121.72ms
iter 201610: loss nan, time 119.80ms
iter 201620: loss nan, time 121.75ms
iter 201630: loss nan, time 120.41ms
iter 201640: loss nan, time 121.77ms
iter 201650: loss nan, time 122.00ms
iter 201660: loss nan, time 119.94ms
iter 201670: loss nan, time 120.76ms
iter 201680: loss nan, time 120.60ms
iter 201690: loss nan, time 121.83ms
tensor(0.7409)
iter 201700: loss nan, time 122.13ms
iter 201710: loss nan, time 119.68ms
iter 201720: loss nan, time 122.10ms
iter 201730: loss nan, time 121.45ms
iter 201740: loss nan, time 121.76ms
step 201750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 201750: loss nan, time 2906.66ms
iter 201760: loss nan, time 114.84ms
iter 201770: loss nan, time 115.38ms
iter 201780: loss nan, time 117.84ms
iter 201790: loss nan, time 114.95ms
tensor(0.7129)
iter 201800: loss nan, time 117.31ms
iter 201810: loss nan, time 117.69ms
iter 201820: loss nan, time 114.53ms
iter 201830: loss nan, time 118.11ms
iter 201840: loss nan, time 116.52ms
iter 201850: loss nan, time 115.21ms
iter 201860: loss nan, time 117.96ms
iter 201870: loss nan, time 115.79ms
iter 201880: loss nan, time 114.68ms
iter 201890: loss nan, time 117.95ms
tensor(0.6841)
iter 201900: loss nan, time 116.01ms
iter 201910: loss nan, time 116.97ms
iter 201920: loss nan, time 118.03ms
iter 201930: loss nan, time 114.62ms
iter 201940: loss nan, time 116.82ms
iter 201950: loss nan, time 117.18ms
iter 201960: loss nan, time 114.48ms
iter 201970: loss nan, time 117.98ms
iter 201980: loss nan, time 116.63ms
iter 201990: loss nan, time 114.55ms
tensor(0.6545)
step 202000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 202000: loss nan, time 2905.76ms
iter 202010: loss nan, time 114.93ms
iter 202020: loss nan, time 116.99ms
iter 202030: loss nan, time 115.92ms
iter 202040: loss nan, time 114.62ms
iter 202050: loss nan, time 116.73ms
iter 202060: loss nan, time 117.04ms
iter 202070: loss nan, time 114.59ms
iter 202080: loss nan, time 118.30ms
iter 202090: loss nan, time 115.78ms
tensor(0.6243)
iter 202100: loss nan, time 115.24ms
iter 202110: loss nan, time 116.66ms
iter 202120: loss nan, time 115.87ms
iter 202130: loss nan, time 117.22ms
iter 202140: loss nan, time 117.96ms
iter 202150: loss nan, time 114.84ms
iter 202160: loss nan, time 116.76ms
iter 202170: loss nan, time 116.60ms
iter 202180: loss nan, time 114.75ms
iter 202190: loss nan, time 118.60ms
tensor(0.5937)
iter 202200: loss nan, time 116.97ms
iter 202210: loss nan, time 115.01ms
iter 202220: loss nan, time 118.54ms
iter 202230: loss nan, time 114.87ms
iter 202240: loss nan, time 116.83ms
step 202250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 202250: loss nan, time 2903.35ms
iter 202260: loss nan, time 114.83ms
iter 202270: loss nan, time 118.12ms
iter 202280: loss nan, time 115.26ms
iter 202290: loss nan, time 114.63ms
tensor(0.5627)
iter 202300: loss nan, time 117.77ms
iter 202310: loss nan, time 114.63ms
iter 202320: loss nan, time 116.33ms
iter 202330: loss nan, time 116.55ms
iter 202340: loss nan, time 115.06ms
iter 202350: loss nan, time 116.85ms
iter 202360: loss nan, time 116.92ms
iter 202370: loss nan, time 114.74ms
iter 202380: loss nan, time 118.04ms
iter 202390: loss nan, time 116.18ms
tensor(0.5314)
iter 202400: loss nan, time 115.18ms
iter 202410: loss nan, time 115.50ms
iter 202420: loss nan, time 116.70ms
iter 202430: loss nan, time 114.99ms
iter 202440: loss nan, time 117.93ms
iter 202450: loss nan, time 114.87ms
iter 202460: loss nan, time 117.08ms
iter 202470: loss nan, time 114.98ms
iter 202480: loss nan, time 114.64ms
iter 202490: loss nan, time 116.79ms
tensor(0.5000)
step 202500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 202500: loss nan, time 2908.68ms
iter 202510: loss nan, time 116.94ms
iter 202520: loss nan, time 118.07ms
iter 202530: loss nan, time 114.55ms
iter 202540: loss nan, time 117.13ms
iter 202550: loss nan, time 116.91ms
iter 202560: loss nan, time 115.13ms
iter 202570: loss nan, time 117.90ms
iter 202580: loss nan, time 115.93ms
iter 202590: loss nan, time 116.81ms
tensor(0.4686)
iter 202600: loss nan, time 118.57ms
iter 202610: loss nan, time 115.20ms
iter 202620: loss nan, time 116.77ms
iter 202630: loss nan, time 116.43ms
iter 202640: loss nan, time 113.75ms
iter 202650: loss nan, time 118.28ms
iter 202660: loss nan, time 115.83ms
iter 202670: loss nan, time 114.27ms
iter 202680: loss nan, time 118.30ms
iter 202690: loss nan, time 114.94ms
tensor(0.4373)
iter 202700: loss nan, time 116.96ms
iter 202710: loss nan, time 116.73ms
iter 202720: loss nan, time 114.54ms
iter 202730: loss nan, time 115.89ms
iter 202740: loss nan, time 116.27ms
step 202750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 202750: loss nan, time 2900.70ms
iter 202760: loss nan, time 112.86ms
iter 202770: loss nan, time 117.66ms
iter 202780: loss nan, time 116.81ms
iter 202790: loss nan, time 114.58ms
tensor(0.4063)
iter 202800: loss nan, time 118.74ms
iter 202810: loss nan, time 117.59ms
iter 202820: loss nan, time 114.43ms
iter 202830: loss nan, time 118.10ms
iter 202840: loss nan, time 117.03ms
iter 202850: loss nan, time 114.64ms
iter 202860: loss nan, time 117.89ms
iter 202870: loss nan, time 117.04ms
iter 202880: loss nan, time 114.58ms
iter 202890: loss nan, time 117.61ms
tensor(0.3757)
iter 202900: loss nan, time 116.45ms
iter 202910: loss nan, time 115.02ms
iter 202920: loss nan, time 118.04ms
iter 202930: loss nan, time 114.59ms
iter 202940: loss nan, time 116.80ms
iter 202950: loss nan, time 117.65ms
iter 202960: loss nan, time 115.82ms
iter 202970: loss nan, time 116.87ms
iter 202980: loss nan, time 116.27ms
iter 202990: loss nan, time 114.95ms
tensor(0.3455)
step 203000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 203000: loss nan, time 2879.31ms
iter 203010: loss nan, time 115.87ms
iter 203020: loss nan, time 116.87ms
iter 203030: loss nan, time 115.74ms
iter 203040: loss nan, time 116.71ms
iter 203050: loss nan, time 116.90ms
iter 203060: loss nan, time 115.33ms
iter 203070: loss nan, time 116.83ms
iter 203080: loss nan, time 114.24ms
iter 203090: loss nan, time 114.91ms
tensor(0.3159)
iter 203100: loss nan, time 118.68ms
iter 203110: loss nan, time 115.81ms
iter 203120: loss nan, time 116.87ms
iter 203130: loss nan, time 117.29ms
iter 203140: loss nan, time 113.74ms
iter 203150: loss nan, time 117.13ms
iter 203160: loss nan, time 117.11ms
iter 203170: loss nan, time 116.46ms
iter 203180: loss nan, time 116.02ms
iter 203190: loss nan, time 116.46ms
tensor(0.2871)
iter 203200: loss nan, time 115.43ms
iter 203210: loss nan, time 116.95ms
iter 203220: loss nan, time 115.89ms
iter 203230: loss nan, time 116.12ms
iter 203240: loss nan, time 118.18ms
step 203250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 203250: loss nan, time 2900.23ms
iter 203260: loss nan, time 114.77ms
iter 203270: loss nan, time 116.30ms
iter 203280: loss nan, time 115.02ms
iter 203290: loss nan, time 116.88ms
tensor(0.2591)
iter 203300: loss nan, time 116.32ms
iter 203310: loss nan, time 117.09ms
iter 203320: loss nan, time 116.02ms
iter 203330: loss nan, time 115.99ms
iter 203340: loss nan, time 116.73ms
iter 203350: loss nan, time 118.35ms
iter 203360: loss nan, time 116.67ms
iter 203370: loss nan, time 117.40ms
iter 203380: loss nan, time 116.08ms
iter 203390: loss nan, time 115.99ms
tensor(0.2321)
iter 203400: loss nan, time 117.17ms
iter 203410: loss nan, time 117.36ms
iter 203420: loss nan, time 115.90ms
iter 203430: loss nan, time 116.92ms
iter 203440: loss nan, time 115.89ms
iter 203450: loss nan, time 115.51ms
iter 203460: loss nan, time 116.77ms
iter 203470: loss nan, time 115.96ms
iter 203480: loss nan, time 116.23ms
iter 203490: loss nan, time 118.28ms
tensor(0.2061)
step 203500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 203500: loss nan, time 2902.01ms
iter 203510: loss nan, time 116.94ms
iter 203520: loss nan, time 115.88ms
iter 203530: loss nan, time 115.01ms
iter 203540: loss nan, time 117.37ms
iter 203550: loss nan, time 115.97ms
iter 203560: loss nan, time 115.43ms
iter 203570: loss nan, time 118.03ms
iter 203580: loss nan, time 115.94ms
iter 203590: loss nan, time 116.88ms
tensor(0.1813)
iter 203600: loss nan, time 118.88ms
iter 203610: loss nan, time 115.85ms
iter 203620: loss nan, time 116.85ms
iter 203630: loss nan, time 118.27ms
iter 203640: loss nan, time 115.69ms
iter 203650: loss nan, time 116.81ms
iter 203660: loss nan, time 117.02ms
iter 203670: loss nan, time 115.83ms
iter 203680: loss nan, time 116.87ms
iter 203690: loss nan, time 116.37ms
tensor(0.1577)
iter 203700: loss nan, time 115.55ms
iter 203710: loss nan, time 117.14ms
iter 203720: loss nan, time 115.90ms
iter 203730: loss nan, time 115.52ms
iter 203740: loss nan, time 117.14ms
step 203750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 203750: loss nan, time 2905.00ms
iter 203760: loss nan, time 116.66ms
iter 203770: loss nan, time 116.06ms
iter 203780: loss nan, time 115.70ms
iter 203790: loss nan, time 116.57ms
tensor(0.1355)
iter 203800: loss nan, time 116.42ms
iter 203810: loss nan, time 115.32ms
iter 203820: loss nan, time 116.59ms
iter 203830: loss nan, time 115.77ms
iter 203840: loss nan, time 116.84ms
iter 203850: loss nan, time 117.06ms
iter 203860: loss nan, time 116.01ms
iter 203870: loss nan, time 116.70ms
iter 203880: loss nan, time 116.69ms
iter 203890: loss nan, time 114.74ms
tensor(0.1147)
iter 203900: loss nan, time 117.00ms
iter 203910: loss nan, time 114.26ms
iter 203920: loss nan, time 116.73ms
iter 203930: loss nan, time 117.97ms
iter 203940: loss nan, time 115.46ms
iter 203950: loss nan, time 116.68ms
iter 203960: loss nan, time 116.19ms
iter 203970: loss nan, time 115.08ms
iter 203980: loss nan, time 116.97ms
iter 203990: loss nan, time 115.88ms
tensor(0.0955)
step 204000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 204000: loss nan, time 2897.89ms
iter 204010: loss nan, time 115.72ms
iter 204020: loss nan, time 114.74ms
iter 204030: loss nan, time 114.61ms
iter 204040: loss nan, time 115.78ms
iter 204050: loss nan, time 116.86ms
iter 204060: loss nan, time 117.87ms
iter 204070: loss nan, time 114.77ms
iter 204080: loss nan, time 116.67ms
iter 204090: loss nan, time 115.72ms
tensor(0.0778)
iter 204100: loss nan, time 114.86ms
iter 204110: loss nan, time 116.92ms
iter 204120: loss nan, time 115.58ms
iter 204130: loss nan, time 116.57ms
iter 204140: loss nan, time 118.29ms
iter 204150: loss nan, time 114.82ms
iter 204160: loss nan, time 116.93ms
iter 204170: loss nan, time 115.07ms
iter 204180: loss nan, time 115.30ms
iter 204190: loss nan, time 117.13ms
tensor(0.0618)
iter 204200: loss nan, time 115.43ms
iter 204210: loss nan, time 116.42ms
iter 204220: loss nan, time 117.64ms
iter 204230: loss nan, time 114.96ms
iter 204240: loss nan, time 117.00ms
step 204250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 204250: loss nan, time 2907.12ms
iter 204260: loss nan, time 114.79ms
iter 204270: loss nan, time 118.67ms
iter 204280: loss nan, time 115.97ms
iter 204290: loss nan, time 114.57ms
tensor(0.0476)
iter 204300: loss nan, time 118.63ms
iter 204310: loss nan, time 115.73ms
iter 204320: loss nan, time 116.80ms
iter 204330: loss nan, time 118.00ms
iter 204340: loss nan, time 115.80ms
iter 204350: loss nan, time 114.69ms
iter 204360: loss nan, time 115.29ms
iter 204370: loss nan, time 115.84ms
iter 204380: loss nan, time 116.80ms
iter 204390: loss nan, time 115.84ms
tensor(0.0351)
iter 204400: loss nan, time 117.29ms
iter 204410: loss nan, time 116.08ms
iter 204420: loss nan, time 115.81ms
iter 204430: loss nan, time 116.56ms
iter 204440: loss nan, time 117.16ms
iter 204450: loss nan, time 116.02ms
iter 204460: loss nan, time 117.03ms
iter 204470: loss nan, time 116.02ms
iter 204480: loss nan, time 114.81ms
iter 204490: loss nan, time 116.69ms
tensor(0.0245)
step 204500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 204500: loss nan, time 2897.50ms
iter 204510: loss nan, time 116.77ms
iter 204520: loss nan, time 116.32ms
iter 204530: loss nan, time 115.29ms
iter 204540: loss nan, time 116.80ms
iter 204550: loss nan, time 115.78ms
iter 204560: loss nan, time 116.11ms
iter 204570: loss nan, time 117.97ms
iter 204580: loss nan, time 115.85ms
iter 204590: loss nan, time 116.94ms
tensor(0.0157)
iter 204600: loss nan, time 118.30ms
iter 204610: loss nan, time 115.91ms
iter 204620: loss nan, time 116.59ms
iter 204630: loss nan, time 116.03ms
iter 204640: loss nan, time 115.78ms
iter 204650: loss nan, time 117.20ms
iter 204660: loss nan, time 116.67ms
iter 204670: loss nan, time 114.73ms
iter 204680: loss nan, time 116.51ms
iter 204690: loss nan, time 115.76ms
tensor(0.0089)
iter 204700: loss nan, time 117.13ms
iter 204710: loss nan, time 118.20ms
iter 204720: loss nan, time 116.32ms
iter 204730: loss nan, time 114.68ms
iter 204740: loss nan, time 116.79ms
step 204750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 204750: loss nan, time 2898.81ms
iter 204760: loss nan, time 117.49ms
iter 204770: loss nan, time 116.00ms
iter 204780: loss nan, time 116.67ms
iter 204790: loss nan, time 118.31ms
tensor(0.0039)
iter 204800: loss nan, time 116.22ms
iter 204810: loss nan, time 116.18ms
iter 204820: loss nan, time 116.65ms
iter 204830: loss nan, time 115.26ms
iter 204840: loss nan, time 116.62ms
iter 204850: loss nan, time 116.11ms
iter 204860: loss nan, time 114.74ms
iter 204870: loss nan, time 117.95ms
iter 204880: loss nan, time 114.05ms
iter 204890: loss nan, time 116.71ms
tensor(0.0010)
iter 204900: loss nan, time 117.42ms
iter 204910: loss nan, time 116.21ms
iter 204920: loss nan, time 116.71ms
iter 204930: loss nan, time 115.96ms
iter 204940: loss nan, time 114.73ms
iter 204950: loss nan, time 118.12ms
iter 204960: loss nan, time 115.73ms
iter 204970: loss nan, time 116.78ms
iter 204980: loss nan, time 116.69ms
iter 204990: loss nan, time 114.87ms
tensor(0.0010)
step 205000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 205000: loss nan, time 2891.31ms
iter 205010: loss nan, time 115.21ms
iter 205020: loss nan, time 117.02ms
iter 205030: loss nan, time 116.41ms
iter 205040: loss nan, time 115.40ms
iter 205050: loss nan, time 117.21ms
iter 205060: loss nan, time 115.60ms
iter 205070: loss nan, time 116.83ms
iter 205080: loss nan, time 117.32ms
iter 205090: loss nan, time 115.83ms
tensor(0.0010)
iter 205100: loss nan, time 117.05ms
iter 205110: loss nan, time 117.00ms
iter 205120: loss nan, time 115.19ms
iter 205130: loss nan, time 116.60ms
iter 205140: loss nan, time 114.54ms
iter 205150: loss nan, time 114.91ms
iter 205160: loss nan, time 118.06ms
iter 205170: loss nan, time 116.33ms
iter 205180: loss nan, time 116.69ms
iter 205190: loss nan, time 117.65ms
tensor(0.0039)
iter 205200: loss nan, time 115.10ms
iter 205210: loss nan, time 116.49ms
iter 205220: loss nan, time 117.39ms
iter 205230: loss nan, time 115.32ms
iter 205240: loss nan, time 117.57ms
step 205250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 205250: loss nan, time 2905.80ms
iter 205260: loss nan, time 114.70ms
iter 205270: loss nan, time 118.21ms
iter 205280: loss nan, time 116.12ms
iter 205290: loss nan, time 117.03ms
tensor(0.0089)
iter 205300: loss nan, time 119.14ms
iter 205310: loss nan, time 116.13ms
iter 205320: loss nan, time 114.94ms
iter 205330: loss nan, time 117.80ms
iter 205340: loss nan, time 115.70ms
iter 205350: loss nan, time 116.99ms
iter 205360: loss nan, time 118.43ms
iter 205370: loss nan, time 115.95ms
iter 205380: loss nan, time 114.80ms
iter 205390: loss nan, time 118.17ms
tensor(0.0157)
iter 205400: loss nan, time 116.40ms
iter 205410: loss nan, time 116.82ms
iter 205420: loss nan, time 118.32ms
iter 205430: loss nan, time 115.96ms
iter 205440: loss nan, time 114.80ms
iter 205450: loss nan, time 117.92ms
iter 205460: loss nan, time 116.07ms
iter 205470: loss nan, time 116.96ms
iter 205480: loss nan, time 118.18ms
iter 205490: loss nan, time 116.14ms
tensor(0.0245)
step 205500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 205500: loss nan, time 2899.33ms
iter 205510: loss nan, time 115.75ms
iter 205520: loss nan, time 115.42ms
iter 205530: loss nan, time 116.66ms
iter 205540: loss nan, time 116.07ms
iter 205550: loss nan, time 115.40ms
iter 205560: loss nan, time 116.73ms
iter 205570: loss nan, time 115.99ms
iter 205580: loss nan, time 116.75ms
iter 205590: loss nan, time 116.91ms
tensor(0.0351)
iter 205600: loss nan, time 116.60ms
iter 205610: loss nan, time 116.48ms
iter 205620: loss nan, time 117.85ms
iter 205630: loss nan, time 115.90ms
iter 205640: loss nan, time 116.68ms
iter 205650: loss nan, time 121.33ms
iter 205660: loss nan, time 121.24ms
iter 205670: loss nan, time 121.36ms
iter 205680: loss nan, time 119.62ms
iter 205690: loss nan, time 119.17ms
tensor(0.0476)
iter 205700: loss nan, time 120.14ms
iter 205710: loss nan, time 120.59ms
iter 205720: loss nan, time 120.36ms
iter 205730: loss nan, time 120.39ms
iter 205740: loss nan, time 121.19ms
step 205750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 205750: loss nan, time 2915.35ms
iter 205760: loss nan, time 121.15ms
iter 205770: loss nan, time 122.38ms
iter 205780: loss nan, time 123.06ms
iter 205790: loss nan, time 119.01ms
tensor(0.0618)
iter 205800: loss nan, time 121.47ms
iter 205810: loss nan, time 121.06ms
iter 205820: loss nan, time 121.62ms
iter 205830: loss nan, time 121.19ms
iter 205840: loss nan, time 119.09ms
iter 205850: loss nan, time 119.11ms
iter 205860: loss nan, time 120.31ms
iter 205870: loss nan, time 120.54ms
iter 205880: loss nan, time 120.14ms
iter 205890: loss nan, time 119.09ms
tensor(0.0778)
iter 205900: loss nan, time 119.99ms
iter 205910: loss nan, time 121.34ms
iter 205920: loss nan, time 121.27ms
iter 205930: loss nan, time 122.94ms
iter 205940: loss nan, time 122.55ms
iter 205950: loss nan, time 121.19ms
iter 205960: loss nan, time 121.16ms
iter 205970: loss nan, time 121.13ms
iter 205980: loss nan, time 121.65ms
iter 205990: loss nan, time 117.83ms
tensor(0.0955)
step 206000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 206000: loss nan, time 2907.45ms
iter 206010: loss nan, time 121.47ms
iter 206020: loss nan, time 119.54ms
iter 206030: loss nan, time 119.90ms
iter 206040: loss nan, time 120.10ms
iter 206050: loss nan, time 120.28ms
iter 206060: loss nan, time 120.45ms
iter 206070: loss nan, time 120.28ms
iter 206080: loss nan, time 120.47ms
iter 206090: loss nan, time 120.75ms
tensor(0.1147)
iter 206100: loss nan, time 122.87ms
iter 206110: loss nan, time 121.17ms
iter 206120: loss nan, time 121.18ms
iter 206130: loss nan, time 121.65ms
iter 206140: loss nan, time 119.17ms
iter 206150: loss nan, time 119.35ms
iter 206160: loss nan, time 120.19ms
iter 206170: loss nan, time 121.57ms
iter 206180: loss nan, time 120.53ms
iter 206190: loss nan, time 120.29ms
tensor(0.1355)
iter 206200: loss nan, time 119.87ms
iter 206210: loss nan, time 121.85ms
iter 206220: loss nan, time 122.52ms
iter 206230: loss nan, time 122.01ms
iter 206240: loss nan, time 121.31ms
step 206250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 206250: loss nan, time 2901.45ms
iter 206260: loss nan, time 121.16ms
iter 206270: loss nan, time 121.39ms
iter 206280: loss nan, time 121.27ms
iter 206290: loss nan, time 121.29ms
tensor(0.1577)
iter 206300: loss nan, time 121.39ms
iter 206310: loss nan, time 121.36ms
iter 206320: loss nan, time 118.25ms
iter 206330: loss nan, time 119.57ms
iter 206340: loss nan, time 120.36ms
iter 206350: loss nan, time 120.59ms
iter 206360: loss nan, time 120.36ms
iter 206370: loss nan, time 121.68ms
iter 206380: loss nan, time 122.59ms
iter 206390: loss nan, time 120.73ms
tensor(0.1813)
iter 206400: loss nan, time 122.87ms
iter 206410: loss nan, time 122.39ms
iter 206420: loss nan, time 122.08ms
iter 206430: loss nan, time 121.22ms
iter 206440: loss nan, time 119.26ms
iter 206450: loss nan, time 119.34ms
iter 206460: loss nan, time 118.85ms
iter 206470: loss nan, time 119.15ms
iter 206480: loss nan, time 119.20ms
iter 206490: loss nan, time 119.15ms
tensor(0.2061)
step 206500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 206500: loss nan, time 2876.85ms
iter 206510: loss nan, time 121.13ms
iter 206520: loss nan, time 121.15ms
iter 206530: loss nan, time 121.32ms
iter 206540: loss nan, time 119.63ms
iter 206550: loss nan, time 119.27ms
iter 206560: loss nan, time 119.03ms
iter 206570: loss nan, time 119.61ms
iter 206580: loss nan, time 120.18ms
iter 206590: loss nan, time 120.65ms
tensor(0.2321)
iter 206600: loss nan, time 122.82ms
iter 206610: loss nan, time 121.42ms
iter 206620: loss nan, time 123.02ms
iter 206630: loss nan, time 122.10ms
iter 206640: loss nan, time 121.20ms
iter 206650: loss nan, time 121.42ms
iter 206660: loss nan, time 118.94ms
iter 206670: loss nan, time 119.58ms
iter 206680: loss nan, time 119.10ms
iter 206690: loss nan, time 119.09ms
tensor(0.2591)
iter 206700: loss nan, time 119.41ms
iter 206710: loss nan, time 119.24ms
iter 206720: loss nan, time 120.91ms
iter 206730: loss nan, time 120.84ms
iter 206740: loss nan, time 120.19ms
step 206750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 206750: loss nan, time 2868.79ms
iter 206760: loss nan, time 119.14ms
iter 206770: loss nan, time 119.15ms
iter 206780: loss nan, time 118.96ms
iter 206790: loss nan, time 118.55ms
tensor(0.2871)
iter 206800: loss nan, time 119.86ms
iter 206810: loss nan, time 118.94ms
iter 206820: loss nan, time 118.20ms
iter 206830: loss nan, time 119.15ms
iter 206840: loss nan, time 119.56ms
iter 206850: loss nan, time 120.83ms
iter 206860: loss nan, time 122.03ms
iter 206870: loss nan, time 123.47ms
iter 206880: loss nan, time 123.03ms
iter 206890: loss nan, time 121.82ms
tensor(0.3159)
iter 206900: loss nan, time 119.88ms
iter 206910: loss nan, time 119.77ms
iter 206920: loss nan, time 119.95ms
iter 206930: loss nan, time 119.09ms
iter 206940: loss nan, time 118.58ms
iter 206950: loss nan, time 120.80ms
iter 206960: loss nan, time 119.71ms
iter 206970: loss nan, time 122.64ms
iter 206980: loss nan, time 119.94ms
iter 206990: loss nan, time 122.28ms
tensor(0.3455)
step 207000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 207000: loss nan, time 2880.44ms
iter 207010: loss nan, time 122.01ms
iter 207020: loss nan, time 121.65ms
iter 207030: loss nan, time 123.00ms
iter 207040: loss nan, time 121.84ms
iter 207050: loss nan, time 121.56ms
iter 207060: loss nan, time 119.51ms
iter 207070: loss nan, time 119.22ms
iter 207080: loss nan, time 121.09ms
iter 207090: loss nan, time 121.62ms
tensor(0.3757)
iter 207100: loss nan, time 121.33ms
iter 207110: loss nan, time 123.05ms
iter 207120: loss nan, time 123.15ms
iter 207130: loss nan, time 121.84ms
iter 207140: loss nan, time 119.70ms
iter 207150: loss nan, time 119.77ms
iter 207160: loss nan, time 119.75ms
iter 207170: loss nan, time 121.50ms
iter 207180: loss nan, time 123.14ms
iter 207190: loss nan, time 123.05ms
tensor(0.4063)
iter 207200: loss nan, time 122.41ms
iter 207210: loss nan, time 121.77ms
iter 207220: loss nan, time 119.86ms
iter 207230: loss nan, time 119.41ms
iter 207240: loss nan, time 120.44ms
step 207250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 207250: loss nan, time 2877.72ms
iter 207260: loss nan, time 119.65ms
iter 207270: loss nan, time 119.60ms
iter 207280: loss nan, time 121.42ms
iter 207290: loss nan, time 121.10ms
tensor(0.4373)
iter 207300: loss nan, time 121.26ms
iter 207310: loss nan, time 121.73ms
iter 207320: loss nan, time 121.70ms
iter 207330: loss nan, time 121.79ms
iter 207340: loss nan, time 119.80ms
iter 207350: loss nan, time 119.83ms
iter 207360: loss nan, time 119.73ms
iter 207370: loss nan, time 121.94ms
iter 207380: loss nan, time 123.34ms
iter 207390: loss nan, time 123.09ms
tensor(0.4686)
iter 207400: loss nan, time 122.41ms
iter 207410: loss nan, time 119.69ms
iter 207420: loss nan, time 119.72ms
iter 207430: loss nan, time 120.67ms
iter 207440: loss nan, time 121.50ms
iter 207450: loss nan, time 123.04ms
iter 207460: loss nan, time 122.89ms
iter 207470: loss nan, time 121.98ms
iter 207480: loss nan, time 119.56ms
iter 207490: loss nan, time 119.76ms
tensor(0.5000)
step 207500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 207500: loss nan, time 2898.23ms
iter 207510: loss nan, time 119.89ms
iter 207520: loss nan, time 119.78ms
iter 207530: loss nan, time 120.84ms
iter 207540: loss nan, time 122.47ms
iter 207550: loss nan, time 123.11ms
iter 207560: loss nan, time 121.71ms
iter 207570: loss nan, time 121.93ms
iter 207580: loss nan, time 119.98ms
iter 207590: loss nan, time 120.17ms
tensor(0.5314)
iter 207600: loss nan, time 121.31ms
iter 207610: loss nan, time 121.62ms
iter 207620: loss nan, time 123.25ms
iter 207630: loss nan, time 123.09ms
iter 207640: loss nan, time 119.70ms
iter 207650: loss nan, time 119.59ms
iter 207660: loss nan, time 119.71ms
iter 207670: loss nan, time 120.38ms
iter 207680: loss nan, time 121.00ms
iter 207690: loss nan, time 122.00ms
tensor(0.5627)
iter 207700: loss nan, time 122.39ms
iter 207710: loss nan, time 123.19ms
iter 207720: loss nan, time 122.02ms
iter 207730: loss nan, time 119.85ms
iter 207740: loss nan, time 119.71ms
step 207750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 207750: loss nan, time 2885.57ms
iter 207760: loss nan, time 119.57ms
iter 207770: loss nan, time 119.89ms
iter 207780: loss nan, time 120.53ms
iter 207790: loss nan, time 120.93ms
tensor(0.5937)
iter 207800: loss nan, time 123.31ms
iter 207810: loss nan, time 123.13ms
iter 207820: loss nan, time 122.28ms
iter 207830: loss nan, time 121.64ms
iter 207840: loss nan, time 120.17ms
iter 207850: loss nan, time 119.67ms
iter 207860: loss nan, time 121.13ms
iter 207870: loss nan, time 122.21ms
iter 207880: loss nan, time 122.99ms
iter 207890: loss nan, time 123.01ms
tensor(0.6243)
iter 207900: loss nan, time 122.28ms
iter 207910: loss nan, time 119.76ms
iter 207920: loss nan, time 119.43ms
iter 207930: loss nan, time 120.04ms
iter 207940: loss nan, time 121.22ms
iter 207950: loss nan, time 122.19ms
iter 207960: loss nan, time 122.99ms
iter 207970: loss nan, time 122.94ms
iter 207980: loss nan, time 119.82ms
iter 207990: loss nan, time 119.16ms
tensor(0.6545)
step 208000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 208000: loss nan, time 2889.94ms
iter 208010: loss nan, time 121.91ms
iter 208020: loss nan, time 119.53ms
iter 208030: loss nan, time 119.54ms
iter 208040: loss nan, time 119.24ms
iter 208050: loss nan, time 119.35ms
iter 208060: loss nan, time 120.95ms
iter 208070: loss nan, time 122.26ms
iter 208080: loss nan, time 122.97ms
iter 208090: loss nan, time 123.29ms
tensor(0.6841)
iter 208100: loss nan, time 121.74ms
iter 208110: loss nan, time 121.78ms
iter 208120: loss nan, time 119.62ms
iter 208130: loss nan, time 119.44ms
iter 208140: loss nan, time 120.01ms
iter 208150: loss nan, time 121.28ms
iter 208160: loss nan, time 122.64ms
iter 208170: loss nan, time 123.09ms
iter 208180: loss nan, time 120.86ms
iter 208190: loss nan, time 121.78ms
tensor(0.7129)
iter 208200: loss nan, time 119.88ms
iter 208210: loss nan, time 119.77ms
iter 208220: loss nan, time 119.09ms
iter 208230: loss nan, time 120.36ms
iter 208240: loss nan, time 120.38ms
step 208250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 208250: loss nan, time 2887.38ms
iter 208260: loss nan, time 120.06ms
iter 208270: loss nan, time 120.92ms
iter 208280: loss nan, time 123.19ms
iter 208290: loss nan, time 122.89ms
tensor(0.7409)
iter 208300: loss nan, time 122.15ms
iter 208310: loss nan, time 121.79ms
iter 208320: loss nan, time 119.84ms
iter 208330: loss nan, time 119.19ms
iter 208340: loss nan, time 118.86ms
iter 208350: loss nan, time 118.78ms
iter 208360: loss nan, time 116.98ms
iter 208370: loss nan, time 120.86ms
iter 208380: loss nan, time 120.41ms
iter 208390: loss nan, time 121.03ms
tensor(0.7679)
iter 208400: loss nan, time 121.53ms
iter 208410: loss nan, time 122.42ms
iter 208420: loss nan, time 122.03ms
iter 208430: loss nan, time 120.57ms
iter 208440: loss nan, time 122.82ms
iter 208450: loss nan, time 122.40ms
iter 208460: loss nan, time 121.92ms
iter 208470: loss nan, time 121.15ms
iter 208480: loss nan, time 119.18ms
iter 208490: loss nan, time 121.51ms
tensor(0.7939)
step 208500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 208500: loss nan, time 2907.99ms
iter 208510: loss nan, time 121.02ms
iter 208520: loss nan, time 121.29ms
iter 208530: loss nan, time 120.33ms
iter 208540: loss nan, time 119.20ms
iter 208550: loss nan, time 119.14ms
iter 208560: loss nan, time 118.93ms
iter 208570: loss nan, time 118.77ms
iter 208580: loss nan, time 118.71ms
iter 208590: loss nan, time 119.90ms
tensor(0.8187)
iter 208600: loss nan, time 119.09ms
iter 208610: loss nan, time 120.76ms
iter 208620: loss nan, time 121.12ms
iter 208630: loss nan, time 121.96ms
iter 208640: loss nan, time 122.40ms
iter 208650: loss nan, time 121.30ms
iter 208660: loss nan, time 122.36ms
iter 208670: loss nan, time 122.33ms
iter 208680: loss nan, time 121.10ms
iter 208690: loss nan, time 121.29ms
tensor(0.8423)
iter 208700: loss nan, time 121.52ms
iter 208710: loss nan, time 118.90ms
iter 208720: loss nan, time 119.11ms
iter 208730: loss nan, time 118.88ms
iter 208740: loss nan, time 118.03ms
step 208750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 208750: loss nan, time 2884.40ms
iter 208760: loss nan, time 121.13ms
iter 208770: loss nan, time 119.39ms
iter 208780: loss nan, time 119.46ms
iter 208790: loss nan, time 119.19ms
tensor(0.8645)
iter 208800: loss nan, time 119.34ms
iter 208810: loss nan, time 120.55ms
iter 208820: loss nan, time 120.28ms
iter 208830: loss nan, time 121.38ms
iter 208840: loss nan, time 120.26ms
iter 208850: loss nan, time 122.68ms
iter 208860: loss nan, time 121.97ms
iter 208870: loss nan, time 122.37ms
iter 208880: loss nan, time 121.18ms
iter 208890: loss nan, time 119.35ms
tensor(0.8853)
iter 208900: loss nan, time 119.93ms
iter 208910: loss nan, time 119.02ms
iter 208920: loss nan, time 119.85ms
iter 208930: loss nan, time 119.69ms
iter 208940: loss nan, time 120.17ms
iter 208950: loss nan, time 119.60ms
iter 208960: loss nan, time 121.39ms
iter 208970: loss nan, time 121.46ms
iter 208980: loss nan, time 122.34ms
iter 208990: loss nan, time 121.89ms
tensor(0.9045)
step 209000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 209000: loss nan, time 2879.12ms
iter 209010: loss nan, time 118.84ms
iter 209020: loss nan, time 120.56ms
iter 209030: loss nan, time 121.21ms
iter 209040: loss nan, time 121.67ms
iter 209050: loss nan, time 122.42ms
iter 209060: loss nan, time 121.18ms
iter 209070: loss nan, time 122.29ms
iter 209080: loss nan, time 122.50ms
iter 209090: loss nan, time 121.17ms
tensor(0.9222)
iter 209100: loss nan, time 121.97ms
iter 209110: loss nan, time 121.28ms
iter 209120: loss nan, time 119.10ms
iter 209130: loss nan, time 119.31ms
iter 209140: loss nan, time 119.20ms
iter 209150: loss nan, time 118.17ms
iter 209160: loss nan, time 119.28ms
iter 209170: loss nan, time 119.40ms
iter 209180: loss nan, time 120.32ms
iter 209190: loss nan, time 120.09ms
tensor(0.9382)
iter 209200: loss nan, time 121.28ms
iter 209210: loss nan, time 121.45ms
iter 209220: loss nan, time 122.51ms
iter 209230: loss nan, time 122.88ms
iter 209240: loss nan, time 120.10ms
step 209250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 209250: loss nan, time 2908.49ms
iter 209260: loss nan, time 121.56ms
iter 209270: loss nan, time 121.21ms
iter 209280: loss nan, time 119.56ms
iter 209290: loss nan, time 118.96ms
tensor(0.9524)
iter 209300: loss nan, time 119.23ms
iter 209310: loss nan, time 118.26ms
iter 209320: loss nan, time 119.09ms
iter 209330: loss nan, time 121.73ms
iter 209340: loss nan, time 122.47ms
iter 209350: loss nan, time 122.80ms
iter 209360: loss nan, time 121.68ms
iter 209370: loss nan, time 121.28ms
iter 209380: loss nan, time 121.69ms
iter 209390: loss nan, time 121.19ms
tensor(0.9649)
iter 209400: loss nan, time 119.60ms
iter 209410: loss nan, time 120.15ms
iter 209420: loss nan, time 119.36ms
iter 209430: loss nan, time 121.41ms
iter 209440: loss nan, time 121.02ms
iter 209450: loss nan, time 120.52ms
iter 209460: loss nan, time 120.65ms
iter 209470: loss nan, time 121.22ms
iter 209480: loss nan, time 122.38ms
iter 209490: loss nan, time 121.03ms
tensor(0.9755)
step 209500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 209500: loss nan, time 2877.22ms
iter 209510: loss nan, time 120.20ms
iter 209520: loss nan, time 120.02ms
iter 209530: loss nan, time 120.71ms
iter 209540: loss nan, time 121.54ms
iter 209550: loss nan, time 119.83ms
iter 209560: loss nan, time 122.04ms
iter 209570: loss nan, time 121.46ms
iter 209580: loss nan, time 121.96ms
iter 209590: loss nan, time 121.41ms
tensor(0.9843)
iter 209600: loss nan, time 118.95ms
iter 209610: loss nan, time 120.94ms
iter 209620: loss nan, time 120.20ms
iter 209630: loss nan, time 121.14ms
iter 209640: loss nan, time 120.12ms
iter 209650: loss nan, time 119.91ms
iter 209660: loss nan, time 121.71ms
iter 209670: loss nan, time 117.89ms
iter 209680: loss nan, time 119.10ms
iter 209690: loss nan, time 119.39ms
tensor(0.9911)
iter 209700: loss nan, time 120.27ms
iter 209710: loss nan, time 119.50ms
iter 209720: loss nan, time 119.17ms
iter 209730: loss nan, time 120.20ms
iter 209740: loss nan, time 120.17ms
step 209750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 209750: loss nan, time 2880.02ms
iter 209760: loss nan, time 119.07ms
iter 209770: loss nan, time 118.73ms
iter 209780: loss nan, time 119.42ms
iter 209790: loss nan, time 120.64ms
tensor(0.9961)
iter 209800: loss nan, time 120.63ms
iter 209810: loss nan, time 121.35ms
iter 209820: loss nan, time 120.64ms
iter 209830: loss nan, time 122.39ms
iter 209840: loss nan, time 121.79ms
iter 209850: loss nan, time 121.15ms
iter 209860: loss nan, time 122.10ms
iter 209870: loss nan, time 121.30ms
iter 209880: loss nan, time 120.29ms
iter 209890: loss nan, time 121.76ms
tensor(0.9990)
iter 209900: loss nan, time 119.35ms
iter 209910: loss nan, time 119.95ms
iter 209920: loss nan, time 120.27ms
iter 209930: loss nan, time 119.99ms
iter 209940: loss nan, time 120.43ms
iter 209950: loss nan, time 120.24ms
iter 209960: loss nan, time 120.60ms
iter 209970: loss nan, time 120.42ms
iter 209980: loss nan, time 121.94ms
iter 209990: loss nan, time 122.62ms
tensor(1.)
step 210000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 210000: loss nan, time 2905.84ms
iter 210010: loss nan, time 120.35ms
iter 210020: loss nan, time 122.34ms
iter 210030: loss nan, time 122.63ms
iter 210040: loss nan, time 121.61ms
iter 210050: loss nan, time 121.85ms
iter 210060: loss nan, time 119.61ms
iter 210070: loss nan, time 118.81ms
iter 210080: loss nan, time 119.72ms
iter 210090: loss nan, time 120.19ms
tensor(0.9990)
iter 210100: loss nan, time 120.60ms
iter 210110: loss nan, time 120.31ms
iter 210120: loss nan, time 119.10ms
iter 210130: loss nan, time 120.53ms
iter 210140: loss nan, time 120.98ms
iter 210150: loss nan, time 121.61ms
iter 210160: loss nan, time 121.61ms
iter 210170: loss nan, time 121.42ms
iter 210180: loss nan, time 121.88ms
iter 210190: loss nan, time 121.38ms
tensor(0.9961)
iter 210200: loss nan, time 121.81ms
iter 210210: loss nan, time 121.24ms
iter 210220: loss nan, time 121.22ms
iter 210230: loss nan, time 119.76ms
iter 210240: loss nan, time 119.31ms
step 210250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 210250: loss nan, time 2885.68ms
iter 210260: loss nan, time 121.33ms
iter 210270: loss nan, time 120.62ms
iter 210280: loss nan, time 121.33ms
iter 210290: loss nan, time 121.55ms
tensor(0.9911)
iter 210300: loss nan, time 119.34ms
iter 210310: loss nan, time 120.13ms
iter 210320: loss nan, time 120.59ms
iter 210330: loss nan, time 120.29ms
iter 210340: loss nan, time 120.59ms
iter 210350: loss nan, time 121.03ms
iter 210360: loss nan, time 120.23ms
iter 210370: loss nan, time 122.33ms
iter 210380: loss nan, time 122.50ms
iter 210390: loss nan, time 121.17ms
tensor(0.9843)
iter 210400: loss nan, time 121.56ms
iter 210410: loss nan, time 119.81ms
iter 210420: loss nan, time 121.40ms
iter 210430: loss nan, time 119.14ms
iter 210440: loss nan, time 119.77ms
iter 210450: loss nan, time 120.29ms
iter 210460: loss nan, time 119.51ms
iter 210470: loss nan, time 119.02ms
iter 210480: loss nan, time 120.89ms
iter 210490: loss nan, time 120.71ms
tensor(0.9755)
step 210500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 210500: loss nan, time 2904.16ms
iter 210510: loss nan, time 120.41ms
iter 210520: loss nan, time 121.11ms
iter 210530: loss nan, time 121.22ms
iter 210540: loss nan, time 122.45ms
iter 210550: loss nan, time 121.25ms
iter 210560: loss nan, time 121.22ms
iter 210570: loss nan, time 121.31ms
iter 210580: loss nan, time 121.55ms
iter 210590: loss nan, time 121.35ms
tensor(0.9649)
iter 210600: loss nan, time 119.40ms
iter 210610: loss nan, time 119.55ms
iter 210620: loss nan, time 120.52ms
iter 210630: loss nan, time 120.49ms
iter 210640: loss nan, time 120.06ms
iter 210650: loss nan, time 120.60ms
iter 210660: loss nan, time 120.42ms
iter 210670: loss nan, time 122.63ms
iter 210680: loss nan, time 122.20ms
iter 210690: loss nan, time 120.98ms
tensor(0.9524)
iter 210700: loss nan, time 121.64ms
iter 210710: loss nan, time 118.99ms
iter 210720: loss nan, time 120.35ms
iter 210730: loss nan, time 118.97ms
iter 210740: loss nan, time 119.81ms
step 210750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 210750: loss nan, time 2883.01ms
iter 210760: loss nan, time 121.24ms
iter 210770: loss nan, time 119.09ms
iter 210780: loss nan, time 119.25ms
iter 210790: loss nan, time 119.57ms
tensor(0.9382)
iter 210800: loss nan, time 120.65ms
iter 210810: loss nan, time 120.45ms
iter 210820: loss nan, time 120.41ms
iter 210830: loss nan, time 119.60ms
iter 210840: loss nan, time 120.73ms
iter 210850: loss nan, time 121.82ms
iter 210860: loss nan, time 122.53ms
iter 210870: loss nan, time 122.61ms
iter 210880: loss nan, time 121.03ms
iter 210890: loss nan, time 122.02ms
tensor(0.9222)
iter 210900: loss nan, time 121.60ms
iter 210910: loss nan, time 118.75ms
iter 210920: loss nan, time 119.06ms
iter 210930: loss nan, time 119.61ms
iter 210940: loss nan, time 120.39ms
iter 210950: loss nan, time 120.31ms
iter 210960: loss nan, time 120.06ms
iter 210970: loss nan, time 120.32ms
iter 210980: loss nan, time 121.67ms
iter 210990: loss nan, time 121.80ms
tensor(0.9045)
step 211000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 211000: loss nan, time 2896.50ms
iter 211010: loss nan, time 120.92ms
iter 211020: loss nan, time 120.63ms
iter 211030: loss nan, time 122.62ms
iter 211040: loss nan, time 120.55ms
iter 211050: loss nan, time 121.22ms
iter 211060: loss nan, time 120.54ms
iter 211070: loss nan, time 118.51ms
iter 211080: loss nan, time 121.30ms
iter 211090: loss nan, time 118.76ms
tensor(0.8853)
iter 211100: loss nan, time 120.61ms
iter 211110: loss nan, time 119.82ms
iter 211120: loss nan, time 120.31ms
iter 211130: loss nan, time 119.18ms
iter 211140: loss nan, time 120.89ms
iter 211150: loss nan, time 121.66ms
iter 211160: loss nan, time 122.49ms
iter 211170: loss nan, time 121.68ms
iter 211180: loss nan, time 121.27ms
iter 211190: loss nan, time 120.90ms
tensor(0.8645)
iter 211200: loss nan, time 121.46ms
iter 211210: loss nan, time 121.46ms
iter 211220: loss nan, time 119.39ms
iter 211230: loss nan, time 119.56ms
iter 211240: loss nan, time 120.56ms
step 211250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 211250: loss nan, time 2878.92ms
iter 211260: loss nan, time 121.03ms
iter 211270: loss nan, time 119.92ms
iter 211280: loss nan, time 119.40ms
iter 211290: loss nan, time 119.21ms
tensor(0.8423)
iter 211300: loss nan, time 119.96ms
iter 211310: loss nan, time 120.46ms
iter 211320: loss nan, time 120.43ms
iter 211330: loss nan, time 120.41ms
iter 211340: loss nan, time 120.21ms
iter 211350: loss nan, time 120.70ms
iter 211360: loss nan, time 121.36ms
iter 211370: loss nan, time 120.40ms
iter 211380: loss nan, time 121.95ms
iter 211390: loss nan, time 122.87ms
tensor(0.8187)
iter 211400: loss nan, time 122.04ms
iter 211410: loss nan, time 121.06ms
iter 211420: loss nan, time 119.12ms
iter 211430: loss nan, time 121.12ms
iter 211440: loss nan, time 121.36ms
iter 211450: loss nan, time 121.05ms
iter 211460: loss nan, time 121.16ms
iter 211470: loss nan, time 118.92ms
iter 211480: loss nan, time 120.12ms
iter 211490: loss nan, time 119.58ms
tensor(0.7939)
step 211500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 211500: loss nan, time 2887.60ms
iter 211510: loss nan, time 121.06ms
iter 211520: loss nan, time 121.28ms
iter 211530: loss nan, time 119.03ms
iter 211540: loss nan, time 119.29ms
iter 211550: loss nan, time 120.03ms
iter 211560: loss nan, time 120.64ms
iter 211570: loss nan, time 120.30ms
iter 211580: loss nan, time 120.01ms
iter 211590: loss nan, time 120.38ms
tensor(0.7679)
iter 211600: loss nan, time 123.94ms
iter 211610: loss nan, time 122.30ms
iter 211620: loss nan, time 121.80ms
iter 211630: loss nan, time 116.80ms
iter 211640: loss nan, time 113.69ms
iter 211650: loss nan, time 117.14ms
iter 211660: loss nan, time 116.18ms
iter 211670: loss nan, time 116.87ms
iter 211680: loss nan, time 117.19ms
iter 211690: loss nan, time 115.67ms
tensor(0.7409)
iter 211700: loss nan, time 117.76ms
iter 211710: loss nan, time 116.68ms
iter 211720: loss nan, time 115.75ms
iter 211730: loss nan, time 116.97ms
iter 211740: loss nan, time 116.63ms
step 211750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 211750: loss nan, time 2906.00ms
iter 211760: loss nan, time 114.86ms
iter 211770: loss nan, time 115.55ms
iter 211780: loss nan, time 116.93ms
iter 211790: loss nan, time 118.50ms
tensor(0.7129)
iter 211800: loss nan, time 116.59ms
iter 211810: loss nan, time 116.91ms
iter 211820: loss nan, time 115.76ms
iter 211830: loss nan, time 115.74ms
iter 211840: loss nan, time 116.20ms
iter 211850: loss nan, time 116.94ms
iter 211860: loss nan, time 115.18ms
iter 211870: loss nan, time 116.88ms
iter 211880: loss nan, time 117.21ms
iter 211890: loss nan, time 116.50ms
tensor(0.6841)
iter 211900: loss nan, time 117.83ms
iter 211910: loss nan, time 116.79ms
iter 211920: loss nan, time 115.74ms
iter 211930: loss nan, time 116.85ms
iter 211940: loss nan, time 116.40ms
iter 211950: loss nan, time 115.62ms
iter 211960: loss nan, time 117.02ms
iter 211970: loss nan, time 117.14ms
iter 211980: loss nan, time 116.17ms
iter 211990: loss nan, time 117.19ms
tensor(0.6545)
step 212000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 212000: loss nan, time 2904.42ms
iter 212010: loss nan, time 115.01ms
iter 212020: loss nan, time 118.38ms
iter 212030: loss nan, time 115.21ms
iter 212040: loss nan, time 114.97ms
iter 212050: loss nan, time 118.72ms
iter 212060: loss nan, time 116.19ms
iter 212070: loss nan, time 115.19ms
iter 212080: loss nan, time 118.20ms
iter 212090: loss nan, time 114.97ms
tensor(0.6243)
iter 212100: loss nan, time 115.64ms
iter 212110: loss nan, time 118.29ms
iter 212120: loss nan, time 116.24ms
iter 212130: loss nan, time 114.94ms
iter 212140: loss nan, time 118.26ms
iter 212150: loss nan, time 115.05ms
iter 212160: loss nan, time 116.20ms
iter 212170: loss nan, time 118.43ms
iter 212180: loss nan, time 116.27ms
iter 212190: loss nan, time 115.13ms
tensor(0.5937)
iter 212200: loss nan, time 118.96ms
iter 212210: loss nan, time 115.00ms
iter 212220: loss nan, time 115.68ms
iter 212230: loss nan, time 118.16ms
iter 212240: loss nan, time 115.99ms
step 212250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 212250: loss nan, time 2901.01ms
iter 212260: loss nan, time 116.27ms
iter 212270: loss nan, time 115.13ms
iter 212280: loss nan, time 116.92ms
iter 212290: loss nan, time 116.11ms
tensor(0.5627)
iter 212300: loss nan, time 117.27ms
iter 212310: loss nan, time 118.22ms
iter 212320: loss nan, time 116.21ms
iter 212330: loss nan, time 114.95ms
iter 212340: loss nan, time 118.28ms
iter 212350: loss nan, time 115.93ms
iter 212360: loss nan, time 116.90ms
iter 212370: loss nan, time 116.83ms
iter 212380: loss nan, time 115.49ms
iter 212390: loss nan, time 114.75ms
tensor(0.5314)
iter 212400: loss nan, time 116.60ms
iter 212410: loss nan, time 115.21ms
iter 212420: loss nan, time 116.95ms
iter 212430: loss nan, time 115.67ms
iter 212440: loss nan, time 117.00ms
iter 212450: loss nan, time 116.10ms
iter 212460: loss nan, time 116.00ms
iter 212470: loss nan, time 116.92ms
iter 212480: loss nan, time 117.17ms
iter 212490: loss nan, time 115.93ms
tensor(0.5000)
step 212500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 212500: loss nan, time 2915.16ms
iter 212510: loss nan, time 116.91ms
iter 212520: loss nan, time 115.81ms
iter 212530: loss nan, time 116.82ms
iter 212540: loss nan, time 115.64ms
iter 212550: loss nan, time 115.74ms
iter 212560: loss nan, time 117.62ms
iter 212570: loss nan, time 117.14ms
iter 212580: loss nan, time 115.64ms
iter 212590: loss nan, time 117.00ms
tensor(0.4686)
iter 212600: loss nan, time 115.75ms
iter 212610: loss nan, time 115.43ms
iter 212620: loss nan, time 117.19ms
iter 212630: loss nan, time 121.82ms
iter 212640: loss nan, time 119.67ms
iter 212650: loss nan, time 119.03ms
iter 212660: loss nan, time 120.78ms
iter 212670: loss nan, time 120.65ms
iter 212680: loss nan, time 121.68ms
iter 212690: loss nan, time 122.68ms
tensor(0.4373)
iter 212700: loss nan, time 122.32ms
iter 212710: loss nan, time 122.53ms
iter 212720: loss nan, time 122.65ms
iter 212730: loss nan, time 121.34ms
iter 212740: loss nan, time 121.40ms
step 212750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 212750: loss nan, time 2900.61ms
iter 212760: loss nan, time 121.46ms
iter 212770: loss nan, time 119.39ms
iter 212780: loss nan, time 119.14ms
iter 212790: loss nan, time 119.43ms
tensor(0.4063)
iter 212800: loss nan, time 119.71ms
iter 212810: loss nan, time 119.83ms
iter 212820: loss nan, time 120.22ms
iter 212830: loss nan, time 120.96ms
iter 212840: loss nan, time 120.24ms
iter 212850: loss nan, time 122.04ms
iter 212860: loss nan, time 122.53ms
iter 212870: loss nan, time 122.64ms
iter 212880: loss nan, time 123.67ms
iter 212890: loss nan, time 119.69ms
tensor(0.3757)
iter 212900: loss nan, time 121.10ms
iter 212910: loss nan, time 119.43ms
iter 212920: loss nan, time 119.46ms
iter 212930: loss nan, time 119.81ms
iter 212940: loss nan, time 119.41ms
iter 212950: loss nan, time 120.25ms
iter 212960: loss nan, time 121.12ms
iter 212970: loss nan, time 121.34ms
iter 212980: loss nan, time 122.54ms
iter 212990: loss nan, time 122.49ms
tensor(0.3455)
step 213000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 213000: loss nan, time 2913.46ms
iter 213010: loss nan, time 121.22ms
iter 213020: loss nan, time 121.40ms
iter 213030: loss nan, time 121.30ms
iter 213040: loss nan, time 119.40ms
iter 213050: loss nan, time 119.23ms
iter 213060: loss nan, time 119.08ms
iter 213070: loss nan, time 119.32ms
iter 213080: loss nan, time 120.41ms
iter 213090: loss nan, time 120.32ms
tensor(0.3159)
iter 213100: loss nan, time 121.42ms
iter 213110: loss nan, time 121.33ms
iter 213120: loss nan, time 122.33ms
iter 213130: loss nan, time 123.12ms
iter 213140: loss nan, time 120.27ms
iter 213150: loss nan, time 121.24ms
iter 213160: loss nan, time 121.32ms
iter 213170: loss nan, time 119.33ms
iter 213180: loss nan, time 119.39ms
iter 213190: loss nan, time 119.41ms
tensor(0.2871)
iter 213200: loss nan, time 119.59ms
iter 213210: loss nan, time 119.90ms
iter 213220: loss nan, time 120.75ms
iter 213230: loss nan, time 121.55ms
iter 213240: loss nan, time 122.35ms
step 213250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 213250: loss nan, time 2907.49ms
iter 213260: loss nan, time 121.25ms
iter 213270: loss nan, time 122.79ms
iter 213280: loss nan, time 122.04ms
iter 213290: loss nan, time 121.59ms
tensor(0.2591)
iter 213300: loss nan, time 121.72ms
iter 213310: loss nan, time 121.45ms
iter 213320: loss nan, time 119.08ms
iter 213330: loss nan, time 119.57ms
iter 213340: loss nan, time 119.27ms
iter 213350: loss nan, time 119.17ms
iter 213360: loss nan, time 119.91ms
iter 213370: loss nan, time 120.27ms
iter 213380: loss nan, time 121.01ms
iter 213390: loss nan, time 120.33ms
tensor(0.2321)
iter 213400: loss nan, time 122.65ms
iter 213410: loss nan, time 122.60ms
iter 213420: loss nan, time 122.70ms
iter 213430: loss nan, time 122.03ms
iter 213440: loss nan, time 119.36ms
iter 213450: loss nan, time 121.25ms
iter 213460: loss nan, time 119.51ms
iter 213470: loss nan, time 119.27ms
iter 213480: loss nan, time 119.27ms
iter 213490: loss nan, time 120.14ms
tensor(0.2061)
step 213500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 213500: loss nan, time 2886.09ms
iter 213510: loss nan, time 118.91ms
iter 213520: loss nan, time 119.25ms
iter 213530: loss nan, time 119.27ms
iter 213540: loss nan, time 118.09ms
iter 213550: loss nan, time 119.90ms
iter 213560: loss nan, time 118.96ms
iter 213570: loss nan, time 120.75ms
iter 213580: loss nan, time 122.21ms
iter 213590: loss nan, time 122.71ms
tensor(0.1813)
iter 213600: loss nan, time 122.73ms
iter 213610: loss nan, time 121.44ms
iter 213620: loss nan, time 121.59ms
iter 213630: loss nan, time 121.78ms
iter 213640: loss nan, time 119.16ms
iter 213650: loss nan, time 119.31ms
iter 213660: loss nan, time 119.29ms
iter 213670: loss nan, time 120.12ms
iter 213680: loss nan, time 120.76ms
iter 213690: loss nan, time 120.48ms
tensor(0.1577)
iter 213700: loss nan, time 121.69ms
iter 213710: loss nan, time 121.04ms
iter 213720: loss nan, time 122.95ms
iter 213730: loss nan, time 124.06ms
iter 213740: loss nan, time 119.91ms
step 213750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 213750: loss nan, time 2901.40ms
iter 213760: loss nan, time 120.79ms
iter 213770: loss nan, time 120.21ms
iter 213780: loss nan, time 119.56ms
iter 213790: loss nan, time 120.08ms
tensor(0.1355)
iter 213800: loss nan, time 119.71ms
iter 213810: loss nan, time 118.93ms
iter 213820: loss nan, time 121.53ms
iter 213830: loss nan, time 122.53ms
iter 213840: loss nan, time 122.94ms
iter 213850: loss nan, time 122.87ms
iter 213860: loss nan, time 121.61ms
iter 213870: loss nan, time 121.94ms
iter 213880: loss nan, time 119.68ms
iter 213890: loss nan, time 119.50ms
tensor(0.1147)
iter 213900: loss nan, time 120.67ms
iter 213910: loss nan, time 120.91ms
iter 213920: loss nan, time 121.99ms
iter 213930: loss nan, time 123.02ms
iter 213940: loss nan, time 120.83ms
iter 213950: loss nan, time 121.74ms
iter 213960: loss nan, time 121.60ms
iter 213970: loss nan, time 119.20ms
iter 213980: loss nan, time 118.27ms
iter 213990: loss nan, time 119.58ms
tensor(0.0955)
step 214000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 214000: loss nan, time 2906.29ms
iter 214010: loss nan, time 118.89ms
iter 214020: loss nan, time 120.39ms
iter 214030: loss nan, time 120.75ms
iter 214040: loss nan, time 121.36ms
iter 214050: loss nan, time 122.46ms
iter 214060: loss nan, time 121.57ms
iter 214070: loss nan, time 122.57ms
iter 214080: loss nan, time 122.64ms
iter 214090: loss nan, time 121.93ms
tensor(0.0778)
iter 214100: loss nan, time 120.98ms
iter 214110: loss nan, time 119.41ms
iter 214120: loss nan, time 116.69ms
iter 214130: loss nan, time 118.18ms
iter 214140: loss nan, time 115.56ms
iter 214150: loss nan, time 114.84ms
iter 214160: loss nan, time 118.86ms
iter 214170: loss nan, time 113.80ms
iter 214180: loss nan, time 117.93ms
iter 214190: loss nan, time 116.56ms
tensor(0.0618)
iter 214200: loss nan, time 115.63ms
iter 214210: loss nan, time 115.76ms
iter 214220: loss nan, time 115.58ms
iter 214230: loss nan, time 116.96ms
iter 214240: loss nan, time 118.12ms
step 214250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 214250: loss nan, time 2900.88ms
iter 214260: loss nan, time 117.90ms
iter 214270: loss nan, time 115.73ms
iter 214280: loss nan, time 117.34ms
iter 214290: loss nan, time 117.38ms
tensor(0.0476)
iter 214300: loss nan, time 115.15ms
iter 214310: loss nan, time 116.72ms
iter 214320: loss nan, time 117.01ms
iter 214330: loss nan, time 114.81ms
iter 214340: loss nan, time 117.49ms
iter 214350: loss nan, time 116.68ms
iter 214360: loss nan, time 114.35ms
iter 214370: loss nan, time 118.10ms
iter 214380: loss nan, time 116.02ms
iter 214390: loss nan, time 114.96ms
tensor(0.0351)
iter 214400: loss nan, time 118.37ms
iter 214410: loss nan, time 120.79ms
iter 214420: loss nan, time 119.32ms
iter 214430: loss nan, time 120.95ms
iter 214440: loss nan, time 122.61ms
iter 214450: loss nan, time 122.50ms
iter 214460: loss nan, time 121.91ms
iter 214470: loss nan, time 121.27ms
iter 214480: loss nan, time 122.37ms
iter 214490: loss nan, time 122.31ms
tensor(0.0245)
step 214500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 214500: loss nan, time 2910.03ms
iter 214510: loss nan, time 122.29ms
iter 214520: loss nan, time 121.09ms
iter 214530: loss nan, time 120.13ms
iter 214540: loss nan, time 121.80ms
iter 214550: loss nan, time 121.62ms
iter 214560: loss nan, time 122.11ms
iter 214570: loss nan, time 121.86ms
iter 214580: loss nan, time 121.20ms
iter 214590: loss nan, time 122.41ms
tensor(0.0157)
iter 214600: loss nan, time 122.73ms
iter 214610: loss nan, time 120.53ms
iter 214620: loss nan, time 117.00ms
iter 214630: loss nan, time 115.48ms
iter 214640: loss nan, time 117.28ms
iter 214650: loss nan, time 117.59ms
iter 214660: loss nan, time 114.70ms
iter 214670: loss nan, time 117.72ms
iter 214680: loss nan, time 116.33ms
iter 214690: loss nan, time 114.68ms
tensor(0.0089)
iter 214700: loss nan, time 118.48ms
iter 214710: loss nan, time 116.18ms
iter 214720: loss nan, time 114.82ms
iter 214730: loss nan, time 118.42ms
iter 214740: loss nan, time 115.96ms
step 214750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 214750: loss nan, time 2902.86ms
iter 214760: loss nan, time 116.48ms
iter 214770: loss nan, time 115.42ms
iter 214780: loss nan, time 118.29ms
iter 214790: loss nan, time 116.17ms
tensor(0.0039)
iter 214800: loss nan, time 116.20ms
iter 214810: loss nan, time 115.99ms
iter 214820: loss nan, time 115.04ms
iter 214830: loss nan, time 116.50ms
iter 214840: loss nan, time 116.69ms
iter 214850: loss nan, time 114.86ms
iter 214860: loss nan, time 117.86ms
iter 214870: loss nan, time 115.28ms
iter 214880: loss nan, time 117.06ms
iter 214890: loss nan, time 116.77ms
tensor(0.0010)
iter 214900: loss nan, time 114.87ms
iter 214910: loss nan, time 116.61ms
iter 214920: loss nan, time 116.91ms
iter 214930: loss nan, time 114.86ms
iter 214940: loss nan, time 118.39ms
iter 214950: loss nan, time 115.33ms
iter 214960: loss nan, time 114.85ms
iter 214970: loss nan, time 118.31ms
iter 214980: loss nan, time 116.81ms
iter 214990: loss nan, time 114.65ms
tensor(0.0010)
step 215000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 215000: loss nan, time 2912.93ms
iter 215010: loss nan, time 115.27ms
iter 215020: loss nan, time 116.73ms
iter 215030: loss nan, time 117.54ms
iter 215040: loss nan, time 114.65ms
iter 215050: loss nan, time 118.37ms
iter 215060: loss nan, time 116.23ms
iter 215070: loss nan, time 114.82ms
iter 215080: loss nan, time 117.82ms
iter 215090: loss nan, time 115.43ms
tensor(0.0010)
iter 215100: loss nan, time 115.35ms
iter 215110: loss nan, time 117.90ms
iter 215120: loss nan, time 116.48ms
iter 215130: loss nan, time 114.83ms
iter 215140: loss nan, time 118.37ms
iter 215150: loss nan, time 115.37ms
iter 215160: loss nan, time 116.55ms
iter 215170: loss nan, time 118.27ms
iter 215180: loss nan, time 115.86ms
iter 215190: loss nan, time 114.63ms
tensor(0.0039)
iter 215200: loss nan, time 118.44ms
iter 215210: loss nan, time 113.69ms
iter 215220: loss nan, time 118.24ms
iter 215230: loss nan, time 116.13ms
iter 215240: loss nan, time 114.93ms
step 215250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 215250: loss nan, time 2912.83ms
iter 215260: loss nan, time 114.23ms
iter 215270: loss nan, time 118.45ms
iter 215280: loss nan, time 115.24ms
iter 215290: loss nan, time 114.77ms
tensor(0.0089)
iter 215300: loss nan, time 118.26ms
iter 215310: loss nan, time 115.91ms
iter 215320: loss nan, time 116.58ms
iter 215330: loss nan, time 117.91ms
iter 215340: loss nan, time 114.51ms
iter 215350: loss nan, time 117.19ms
iter 215360: loss nan, time 116.67ms
iter 215370: loss nan, time 115.02ms
iter 215380: loss nan, time 117.78ms
iter 215390: loss nan, time 115.66ms
tensor(0.0157)
iter 215400: loss nan, time 116.88ms
iter 215410: loss nan, time 116.73ms
iter 215420: loss nan, time 114.89ms
iter 215430: loss nan, time 117.77ms
iter 215440: loss nan, time 116.44ms
iter 215450: loss nan, time 114.63ms
iter 215460: loss nan, time 117.80ms
iter 215470: loss nan, time 114.25ms
iter 215480: loss nan, time 116.73ms
iter 215490: loss nan, time 116.95ms
tensor(0.0245)
step 215500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 215500: loss nan, time 2906.91ms
iter 215510: loss nan, time 117.88ms
iter 215520: loss nan, time 115.03ms
iter 215530: loss nan, time 116.77ms
iter 215540: loss nan, time 115.84ms
iter 215550: loss nan, time 114.59ms
iter 215560: loss nan, time 117.96ms
iter 215570: loss nan, time 116.22ms
iter 215580: loss nan, time 115.74ms
iter 215590: loss nan, time 118.24ms
tensor(0.0351)
iter 215600: loss nan, time 114.96ms
iter 215610: loss nan, time 116.92ms
iter 215620: loss nan, time 118.07ms
iter 215630: loss nan, time 114.87ms
iter 215640: loss nan, time 116.95ms
iter 215650: loss nan, time 116.74ms
iter 215660: loss nan, time 114.60ms
iter 215670: loss nan, time 117.97ms
iter 215680: loss nan, time 115.84ms
iter 215690: loss nan, time 116.71ms
tensor(0.0476)
iter 215700: loss nan, time 118.28ms
iter 215710: loss nan, time 115.07ms
iter 215720: loss nan, time 116.76ms
iter 215730: loss nan, time 116.82ms
iter 215740: loss nan, time 115.08ms
step 215750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 215750: loss nan, time 2911.42ms
iter 215760: loss nan, time 116.43ms
iter 215770: loss nan, time 115.04ms
iter 215780: loss nan, time 114.79ms
iter 215790: loss nan, time 116.39ms
tensor(0.0618)
iter 215800: loss nan, time 115.83ms
iter 215810: loss nan, time 118.30ms
iter 215820: loss nan, time 115.86ms
iter 215830: loss nan, time 116.90ms
iter 215840: loss nan, time 116.02ms
iter 215850: loss nan, time 115.90ms
iter 215860: loss nan, time 116.87ms
iter 215870: loss nan, time 116.71ms
iter 215880: loss nan, time 116.23ms
iter 215890: loss nan, time 116.76ms
tensor(0.0778)
iter 215900: loss nan, time 116.16ms
iter 215910: loss nan, time 115.26ms
iter 215920: loss nan, time 116.79ms
iter 215930: loss nan, time 115.96ms
iter 215940: loss nan, time 117.00ms
iter 215950: loss nan, time 117.08ms
iter 215960: loss nan, time 115.83ms
iter 215970: loss nan, time 117.19ms
iter 215980: loss nan, time 117.03ms
iter 215990: loss nan, time 116.02ms
tensor(0.0955)
step 216000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 216000: loss nan, time 2913.65ms
iter 216010: loss nan, time 115.99ms
iter 216020: loss nan, time 116.22ms
iter 216030: loss nan, time 117.86ms
iter 216040: loss nan, time 116.51ms
iter 216050: loss nan, time 114.65ms
iter 216060: loss nan, time 116.81ms
iter 216070: loss nan, time 116.06ms
iter 216080: loss nan, time 114.90ms
iter 216090: loss nan, time 116.67ms
tensor(0.1147)
iter 216100: loss nan, time 116.05ms
iter 216110: loss nan, time 116.08ms
iter 216120: loss nan, time 118.08ms
iter 216130: loss nan, time 116.08ms
iter 216140: loss nan, time 116.67ms
iter 216150: loss nan, time 115.35ms
iter 216160: loss nan, time 115.21ms
iter 216170: loss nan, time 115.96ms
iter 216180: loss nan, time 115.74ms
iter 216190: loss nan, time 116.70ms
tensor(0.1355)
iter 216200: loss nan, time 117.00ms
iter 216210: loss nan, time 115.79ms
iter 216220: loss nan, time 114.72ms
iter 216230: loss nan, time 116.11ms
iter 216240: loss nan, time 115.02ms
step 216250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 216250: loss nan, time 2922.94ms
iter 216260: loss nan, time 116.14ms
iter 216270: loss nan, time 116.31ms
iter 216280: loss nan, time 117.70ms
iter 216290: loss nan, time 115.70ms
tensor(0.1577)
iter 216300: loss nan, time 115.39ms
iter 216310: loss nan, time 116.81ms
iter 216320: loss nan, time 115.84ms
iter 216330: loss nan, time 117.00ms
iter 216340: loss nan, time 118.09ms
iter 216350: loss nan, time 115.87ms
iter 216360: loss nan, time 116.74ms
iter 216370: loss nan, time 115.34ms
iter 216380: loss nan, time 115.32ms
iter 216390: loss nan, time 117.22ms
tensor(0.1813)
iter 216400: loss nan, time 116.14ms
iter 216410: loss nan, time 116.78ms
iter 216420: loss nan, time 117.88ms
iter 216430: loss nan, time 115.16ms
iter 216440: loss nan, time 116.75ms
iter 216450: loss nan, time 117.15ms
iter 216460: loss nan, time 115.27ms
iter 216470: loss nan, time 116.71ms
iter 216480: loss nan, time 115.93ms
iter 216490: loss nan, time 116.85ms
tensor(0.2061)
step 216500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 216500: loss nan, time 2906.42ms
iter 216510: loss nan, time 115.81ms
iter 216520: loss nan, time 116.74ms
iter 216530: loss nan, time 116.54ms
iter 216540: loss nan, time 115.52ms
iter 216550: loss nan, time 114.92ms
iter 216560: loss nan, time 116.08ms
iter 216570: loss nan, time 114.75ms
iter 216580: loss nan, time 118.07ms
iter 216590: loss nan, time 115.93ms
tensor(0.2321)
iter 216600: loss nan, time 117.18ms
iter 216610: loss nan, time 115.94ms
iter 216620: loss nan, time 115.58ms
iter 216630: loss nan, time 116.72ms
iter 216640: loss nan, time 117.10ms
iter 216650: loss nan, time 116.05ms
iter 216660: loss nan, time 116.78ms
iter 216670: loss nan, time 115.88ms
iter 216680: loss nan, time 115.07ms
iter 216690: loss nan, time 116.92ms
tensor(0.2591)
iter 216700: loss nan, time 116.24ms
iter 216710: loss nan, time 116.54ms
iter 216720: loss nan, time 117.69ms
iter 216730: loss nan, time 116.03ms
iter 216740: loss nan, time 117.01ms
step 216750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 216750: loss nan, time 2902.77ms
iter 216760: loss nan, time 115.01ms
iter 216770: loss nan, time 116.83ms
iter 216780: loss nan, time 115.75ms
iter 216790: loss nan, time 114.74ms
tensor(0.2871)
iter 216800: loss nan, time 118.43ms
iter 216810: loss nan, time 116.10ms
iter 216820: loss nan, time 116.05ms
iter 216830: loss nan, time 117.46ms
iter 216840: loss nan, time 116.12ms
iter 216850: loss nan, time 116.78ms
iter 216860: loss nan, time 116.52ms
iter 216870: loss nan, time 115.18ms
iter 216880: loss nan, time 116.94ms
iter 216890: loss nan, time 115.86ms
tensor(0.3159)
iter 216900: loss nan, time 117.04ms
iter 216910: loss nan, time 118.51ms
iter 216920: loss nan, time 114.98ms
iter 216930: loss nan, time 114.55ms
iter 216940: loss nan, time 116.79ms
iter 216950: loss nan, time 115.52ms
iter 216960: loss nan, time 116.95ms
iter 216970: loss nan, time 115.80ms
iter 216980: loss nan, time 117.19ms
iter 216990: loss nan, time 115.89ms
tensor(0.3455)
step 217000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 217000: loss nan, time 2905.64ms
iter 217010: loss nan, time 116.62ms
iter 217020: loss nan, time 115.05ms
iter 217030: loss nan, time 114.75ms
iter 217040: loss nan, time 117.89ms
iter 217050: loss nan, time 115.78ms
iter 217060: loss nan, time 116.63ms
iter 217070: loss nan, time 116.72ms
iter 217080: loss nan, time 114.31ms
iter 217090: loss nan, time 116.69ms
tensor(0.3757)
iter 217100: loss nan, time 116.06ms
iter 217110: loss nan, time 116.65ms
iter 217120: loss nan, time 117.96ms
iter 217130: loss nan, time 115.64ms
iter 217140: loss nan, time 116.67ms
iter 217150: loss nan, time 115.87ms
iter 217160: loss nan, time 116.18ms
iter 217170: loss nan, time 116.69ms
iter 217180: loss nan, time 116.03ms
iter 217190: loss nan, time 114.54ms
tensor(0.4063)
iter 217200: loss nan, time 118.75ms
iter 217210: loss nan, time 115.21ms
iter 217220: loss nan, time 116.74ms
iter 217230: loss nan, time 117.72ms
iter 217240: loss nan, time 115.96ms
step 217250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 217250: loss nan, time 2921.08ms
iter 217260: loss nan, time 116.01ms
iter 217270: loss nan, time 115.79ms
iter 217280: loss nan, time 116.79ms
iter 217290: loss nan, time 117.28ms
tensor(0.4373)
iter 217300: loss nan, time 116.54ms
iter 217310: loss nan, time 116.77ms
iter 217320: loss nan, time 115.79ms
iter 217330: loss nan, time 115.27ms
iter 217340: loss nan, time 116.78ms
iter 217350: loss nan, time 115.83ms
iter 217360: loss nan, time 116.48ms
iter 217370: loss nan, time 118.11ms
iter 217380: loss nan, time 115.87ms
iter 217390: loss nan, time 117.25ms
tensor(0.4686)
iter 217400: loss nan, time 117.38ms
iter 217410: loss nan, time 115.99ms
iter 217420: loss nan, time 116.83ms
iter 217430: loss nan, time 116.82ms
iter 217440: loss nan, time 115.90ms
iter 217450: loss nan, time 116.84ms
iter 217460: loss nan, time 115.24ms
iter 217470: loss nan, time 115.22ms
iter 217480: loss nan, time 116.86ms
iter 217490: loss nan, time 116.98ms
tensor(0.5000)
step 217500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 217500: loss nan, time 2906.67ms
iter 217510: loss nan, time 116.00ms
iter 217520: loss nan, time 115.88ms
iter 217530: loss nan, time 117.02ms
iter 217540: loss nan, time 118.10ms
iter 217550: loss nan, time 115.96ms
iter 217560: loss nan, time 116.86ms
iter 217570: loss nan, time 118.09ms
iter 217580: loss nan, time 116.16ms
iter 217590: loss nan, time 116.87ms
tensor(0.5314)
iter 217600: loss nan, time 118.09ms
iter 217610: loss nan, time 116.00ms
iter 217620: loss nan, time 116.78ms
iter 217630: loss nan, time 117.24ms
iter 217640: loss nan, time 116.38ms
iter 217650: loss nan, time 116.81ms
iter 217660: loss nan, time 113.97ms
iter 217670: loss nan, time 114.43ms
iter 217680: loss nan, time 116.42ms
iter 217690: loss nan, time 116.81ms
tensor(0.5627)
iter 217700: loss nan, time 116.10ms
iter 217710: loss nan, time 115.57ms
iter 217720: loss nan, time 116.80ms
iter 217730: loss nan, time 116.08ms
iter 217740: loss nan, time 115.36ms
step 217750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 217750: loss nan, time 2884.57ms
iter 217760: loss nan, time 115.96ms
iter 217770: loss nan, time 117.95ms
iter 217780: loss nan, time 115.76ms
iter 217790: loss nan, time 116.78ms
tensor(0.5937)
iter 217800: loss nan, time 116.96ms
iter 217810: loss nan, time 115.80ms
iter 217820: loss nan, time 116.79ms
iter 217830: loss nan, time 116.06ms
iter 217840: loss nan, time 114.56ms
iter 217850: loss nan, time 117.28ms
iter 217860: loss nan, time 116.03ms
iter 217870: loss nan, time 116.87ms
iter 217880: loss nan, time 117.79ms
iter 217890: loss nan, time 115.90ms
tensor(0.6243)
iter 217900: loss nan, time 114.93ms
iter 217910: loss nan, time 116.83ms
iter 217920: loss nan, time 115.35ms
iter 217930: loss nan, time 116.78ms
iter 217940: loss nan, time 115.89ms
iter 217950: loss nan, time 114.88ms
iter 217960: loss nan, time 116.32ms
iter 217970: loss nan, time 115.77ms
iter 217980: loss nan, time 116.47ms
iter 217990: loss nan, time 118.13ms
tensor(0.6545)
step 218000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 218000: loss nan, time 2906.16ms
iter 218010: loss nan, time 116.91ms
iter 218020: loss nan, time 116.76ms
iter 218030: loss nan, time 115.29ms
iter 218040: loss nan, time 116.83ms
iter 218050: loss nan, time 114.63ms
iter 218060: loss nan, time 114.75ms
iter 218070: loss nan, time 118.06ms
iter 218080: loss nan, time 115.83ms
iter 218090: loss nan, time 116.94ms
tensor(0.6841)
iter 218100: loss nan, time 118.00ms
iter 218110: loss nan, time 114.85ms
iter 218120: loss nan, time 116.82ms
iter 218130: loss nan, time 116.46ms
iter 218140: loss nan, time 115.45ms
iter 218150: loss nan, time 116.65ms
iter 218160: loss nan, time 116.05ms
iter 218170: loss nan, time 115.09ms
iter 218180: loss nan, time 117.39ms
iter 218190: loss nan, time 116.25ms
tensor(0.7129)
iter 218200: loss nan, time 117.13ms
iter 218210: loss nan, time 118.12ms
iter 218220: loss nan, time 115.98ms
iter 218230: loss nan, time 116.70ms
iter 218240: loss nan, time 115.59ms
step 218250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 218250: loss nan, time 2901.80ms
iter 218260: loss nan, time 118.17ms
iter 218270: loss nan, time 116.03ms
iter 218280: loss nan, time 116.65ms
iter 218290: loss nan, time 115.81ms
tensor(0.7409)
iter 218300: loss nan, time 115.08ms
iter 218310: loss nan, time 116.79ms
iter 218320: loss nan, time 115.78ms
iter 218330: loss nan, time 115.97ms
iter 218340: loss nan, time 118.16ms
iter 218350: loss nan, time 115.01ms
iter 218360: loss nan, time 116.57ms
iter 218370: loss nan, time 116.14ms
iter 218380: loss nan, time 115.48ms
iter 218390: loss nan, time 116.77ms
tensor(0.7679)
iter 218400: loss nan, time 116.40ms
iter 218410: loss nan, time 114.81ms
iter 218420: loss nan, time 116.86ms
iter 218430: loss nan, time 114.70ms
iter 218440: loss nan, time 116.78ms
iter 218450: loss nan, time 118.01ms
iter 218460: loss nan, time 115.93ms
iter 218470: loss nan, time 116.79ms
iter 218480: loss nan, time 116.74ms
iter 218490: loss nan, time 114.62ms
tensor(0.7939)
step 218500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 218500: loss nan, time 2902.55ms
iter 218510: loss nan, time 116.01ms
iter 218520: loss nan, time 116.63ms
iter 218530: loss nan, time 117.04ms
iter 218540: loss nan, time 115.49ms
iter 218550: loss nan, time 114.91ms
iter 218560: loss nan, time 116.51ms
iter 218570: loss nan, time 115.56ms
iter 218580: loss nan, time 117.08ms
iter 218590: loss nan, time 115.73ms
tensor(0.8187)
iter 218600: loss nan, time 117.14ms
iter 218610: loss nan, time 115.44ms
iter 218620: loss nan, time 116.83ms
iter 218630: loss nan, time 115.41ms
iter 218640: loss nan, time 117.31ms
iter 218650: loss nan, time 116.26ms
iter 218660: loss nan, time 115.39ms
iter 218670: loss nan, time 115.37ms
iter 218680: loss nan, time 116.47ms
iter 218690: loss nan, time 114.98ms
tensor(0.8423)
iter 218700: loss nan, time 117.73ms
iter 218710: loss nan, time 115.90ms
iter 218720: loss nan, time 116.85ms
iter 218730: loss nan, time 115.85ms
iter 218740: loss nan, time 115.74ms
step 218750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 218750: loss nan, time 2905.73ms
iter 218760: loss nan, time 115.96ms
iter 218770: loss nan, time 116.19ms
iter 218780: loss nan, time 116.99ms
iter 218790: loss nan, time 116.37ms
tensor(0.8645)
iter 218800: loss nan, time 115.21ms
iter 218810: loss nan, time 116.62ms
iter 218820: loss nan, time 114.57ms
iter 218830: loss nan, time 116.68ms
iter 218840: loss nan, time 118.17ms
iter 218850: loss nan, time 115.79ms
iter 218860: loss nan, time 116.64ms
iter 218870: loss nan, time 116.35ms
iter 218880: loss nan, time 114.96ms
iter 218890: loss nan, time 116.62ms
tensor(0.8853)
iter 218900: loss nan, time 116.00ms
iter 218910: loss nan, time 116.68ms
iter 218920: loss nan, time 117.93ms
iter 218930: loss nan, time 115.70ms
iter 218940: loss nan, time 116.90ms
iter 218950: loss nan, time 116.24ms
iter 218960: loss nan, time 117.03ms
iter 218970: loss nan, time 116.59ms
iter 218980: loss nan, time 115.75ms
iter 218990: loss nan, time 114.56ms
tensor(0.9045)
step 219000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 219000: loss nan, time 2904.83ms
iter 219010: loss nan, time 115.51ms
iter 219020: loss nan, time 116.72ms
iter 219030: loss nan, time 116.06ms
iter 219040: loss nan, time 115.01ms
iter 219050: loss nan, time 117.14ms
iter 219060: loss nan, time 115.78ms
iter 219070: loss nan, time 116.79ms
iter 219080: loss nan, time 116.83ms
iter 219090: loss nan, time 116.10ms
tensor(0.9222)
iter 219100: loss nan, time 117.39ms
iter 219110: loss nan, time 118.23ms
iter 219120: loss nan, time 115.88ms
iter 219130: loss nan, time 116.88ms
iter 219140: loss nan, time 116.80ms
iter 219150: loss nan, time 115.63ms
iter 219160: loss nan, time 115.53ms
iter 219170: loss nan, time 118.10ms
iter 219180: loss nan, time 115.34ms
iter 219190: loss nan, time 116.75ms
tensor(0.9382)
iter 219200: loss nan, time 116.97ms
iter 219210: loss nan, time 114.73ms
iter 219220: loss nan, time 117.03ms
iter 219230: loss nan, time 117.48ms
iter 219240: loss nan, time 114.54ms
step 219250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 219250: loss nan, time 2910.59ms
iter 219260: loss nan, time 115.50ms
iter 219270: loss nan, time 116.15ms
iter 219280: loss nan, time 118.10ms
iter 219290: loss nan, time 115.05ms
tensor(0.9524)
iter 219300: loss nan, time 117.12ms
iter 219310: loss nan, time 116.90ms
iter 219320: loss nan, time 114.65ms
iter 219330: loss nan, time 118.22ms
iter 219340: loss nan, time 115.79ms
iter 219350: loss nan, time 115.35ms
iter 219360: loss nan, time 118.21ms
iter 219370: loss nan, time 116.29ms
iter 219380: loss nan, time 114.67ms
iter 219390: loss nan, time 117.94ms
tensor(0.9649)
iter 219400: loss nan, time 115.61ms
iter 219410: loss nan, time 116.95ms
iter 219420: loss nan, time 118.26ms
iter 219430: loss nan, time 114.71ms
iter 219440: loss nan, time 114.71ms
iter 219450: loss nan, time 116.70ms
iter 219460: loss nan, time 115.21ms
iter 219470: loss nan, time 117.97ms
iter 219480: loss nan, time 116.57ms
iter 219490: loss nan, time 114.59ms
tensor(0.9755)
step 219500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 219500: loss nan, time 2923.68ms
iter 219510: loss nan, time 115.90ms
iter 219520: loss nan, time 115.23ms
iter 219530: loss nan, time 116.99ms
iter 219540: loss nan, time 116.11ms
iter 219550: loss nan, time 115.39ms
iter 219560: loss nan, time 117.90ms
iter 219570: loss nan, time 115.46ms
iter 219580: loss nan, time 116.80ms
iter 219590: loss nan, time 116.88ms
tensor(0.9843)
iter 219600: loss nan, time 114.96ms
iter 219610: loss nan, time 116.74ms
iter 219620: loss nan, time 116.80ms
iter 219630: loss nan, time 114.73ms
iter 219640: loss nan, time 118.46ms
iter 219650: loss nan, time 114.91ms
iter 219660: loss nan, time 115.12ms
iter 219670: loss nan, time 117.88ms
iter 219680: loss nan, time 115.38ms
iter 219690: loss nan, time 116.68ms
tensor(0.9911)
iter 219700: loss nan, time 118.46ms
iter 219710: loss nan, time 114.68ms
iter 219720: loss nan, time 116.79ms
iter 219730: loss nan, time 117.51ms
iter 219740: loss nan, time 114.60ms
step 219750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 219750: loss nan, time 2898.28ms
iter 219760: loss nan, time 115.11ms
iter 219770: loss nan, time 114.71ms
iter 219780: loss nan, time 117.48ms
iter 219790: loss nan, time 114.61ms
tensor(0.9961)
iter 219800: loss nan, time 118.35ms
iter 219810: loss nan, time 116.40ms
iter 219820: loss nan, time 115.27ms
iter 219830: loss nan, time 115.76ms
iter 219840: loss nan, time 115.68ms
iter 219850: loss nan, time 116.75ms
iter 219860: loss nan, time 118.27ms
iter 219870: loss nan, time 120.72ms
iter 219880: loss nan, time 121.28ms
iter 219890: loss nan, time 122.28ms
tensor(0.9990)
iter 219900: loss nan, time 121.33ms
iter 219910: loss nan, time 122.64ms
iter 219920: loss nan, time 122.55ms
iter 219930: loss nan, time 121.21ms
iter 219940: loss nan, time 121.94ms
iter 219950: loss nan, time 118.49ms
iter 219960: loss nan, time 118.81ms
iter 219970: loss nan, time 119.63ms
iter 219980: loss nan, time 118.54ms
iter 219990: loss nan, time 120.39ms
tensor(1.)
step 220000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 220000: loss nan, time 2889.95ms
iter 220010: loss nan, time 119.48ms
iter 220020: loss nan, time 118.09ms
iter 220030: loss nan, time 119.33ms
iter 220040: loss nan, time 120.76ms
iter 220050: loss nan, time 120.89ms
iter 220060: loss nan, time 121.13ms
iter 220070: loss nan, time 121.98ms
iter 220080: loss nan, time 122.57ms
iter 220090: loss nan, time 120.40ms
tensor(0.9990)
iter 220100: loss nan, time 122.81ms
iter 220110: loss nan, time 121.29ms
iter 220120: loss nan, time 121.21ms
iter 220130: loss nan, time 121.30ms
iter 220140: loss nan, time 119.15ms
iter 220150: loss nan, time 118.38ms
iter 220160: loss nan, time 119.34ms
iter 220170: loss nan, time 119.63ms
iter 220180: loss nan, time 120.08ms
iter 220190: loss nan, time 120.75ms
tensor(0.9961)
iter 220200: loss nan, time 120.61ms
iter 220210: loss nan, time 122.73ms
iter 220220: loss nan, time 122.34ms
iter 220230: loss nan, time 121.97ms
iter 220240: loss nan, time 121.79ms
step 220250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 220250: loss nan, time 2907.24ms
iter 220260: loss nan, time 119.46ms
iter 220270: loss nan, time 119.45ms
iter 220280: loss nan, time 119.61ms
iter 220290: loss nan, time 120.57ms
tensor(0.9911)
iter 220300: loss nan, time 121.41ms
iter 220310: loss nan, time 121.52ms
iter 220320: loss nan, time 122.90ms
iter 220330: loss nan, time 122.66ms
iter 220340: loss nan, time 120.33ms
iter 220350: loss nan, time 120.35ms
iter 220360: loss nan, time 121.39ms
iter 220370: loss nan, time 119.55ms
iter 220380: loss nan, time 119.46ms
iter 220390: loss nan, time 119.96ms
tensor(0.9843)
iter 220400: loss nan, time 119.59ms
iter 220410: loss nan, time 120.70ms
iter 220420: loss nan, time 122.43ms
iter 220430: loss nan, time 122.58ms
iter 220440: loss nan, time 122.61ms
iter 220450: loss nan, time 120.53ms
iter 220460: loss nan, time 121.34ms
iter 220470: loss nan, time 121.50ms
iter 220480: loss nan, time 119.12ms
iter 220490: loss nan, time 119.93ms
tensor(0.9755)
step 220500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 220500: loss nan, time 2909.14ms
iter 220510: loss nan, time 119.53ms
iter 220520: loss nan, time 120.23ms
iter 220530: loss nan, time 120.38ms
iter 220540: loss nan, time 120.53ms
iter 220550: loss nan, time 122.40ms
iter 220560: loss nan, time 122.59ms
iter 220570: loss nan, time 122.51ms
iter 220580: loss nan, time 122.50ms
iter 220590: loss nan, time 119.28ms
tensor(0.9649)
iter 220600: loss nan, time 121.82ms
iter 220610: loss nan, time 119.40ms
iter 220620: loss nan, time 119.94ms
iter 220630: loss nan, time 120.42ms
iter 220640: loss nan, time 120.89ms
iter 220650: loss nan, time 120.20ms
iter 220660: loss nan, time 122.72ms
iter 220670: loss nan, time 122.75ms
iter 220680: loss nan, time 122.62ms
iter 220690: loss nan, time 121.23ms
tensor(0.9524)
iter 220700: loss nan, time 121.67ms
iter 220710: loss nan, time 119.30ms
iter 220720: loss nan, time 119.03ms
iter 220730: loss nan, time 119.19ms
iter 220740: loss nan, time 120.19ms
step 220750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 220750: loss nan, time 2900.48ms
iter 220760: loss nan, time 119.30ms
iter 220770: loss nan, time 120.43ms
iter 220780: loss nan, time 121.72ms
iter 220790: loss nan, time 120.69ms
tensor(0.9382)
iter 220800: loss nan, time 123.22ms
iter 220810: loss nan, time 122.52ms
iter 220820: loss nan, time 121.78ms
iter 220830: loss nan, time 121.24ms
iter 220840: loss nan, time 119.33ms
iter 220850: loss nan, time 119.38ms
iter 220860: loss nan, time 119.20ms
iter 220870: loss nan, time 120.31ms
iter 220880: loss nan, time 120.42ms
iter 220890: loss nan, time 121.35ms
tensor(0.9222)
iter 220900: loss nan, time 121.13ms
iter 220910: loss nan, time 122.94ms
iter 220920: loss nan, time 122.83ms
iter 220930: loss nan, time 121.96ms
iter 220940: loss nan, time 121.76ms
iter 220950: loss nan, time 119.58ms
iter 220960: loss nan, time 119.38ms
iter 220970: loss nan, time 119.76ms
iter 220980: loss nan, time 120.42ms
iter 220990: loss nan, time 122.20ms
tensor(0.9045)
step 221000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 221000: loss nan, time 2886.81ms
iter 221010: loss nan, time 119.78ms
iter 221020: loss nan, time 121.15ms
iter 221030: loss nan, time 120.92ms
iter 221040: loss nan, time 120.19ms
iter 221050: loss nan, time 122.63ms
iter 221060: loss nan, time 122.79ms
iter 221070: loss nan, time 122.74ms
iter 221080: loss nan, time 121.44ms
iter 221090: loss nan, time 118.98ms
tensor(0.8853)
iter 221100: loss nan, time 119.60ms
iter 221110: loss nan, time 119.12ms
iter 221120: loss nan, time 119.45ms
iter 221130: loss nan, time 119.80ms
iter 221140: loss nan, time 120.70ms
iter 221150: loss nan, time 120.31ms
iter 221160: loss nan, time 122.27ms
iter 221170: loss nan, time 122.64ms
iter 221180: loss nan, time 122.51ms
iter 221190: loss nan, time 122.46ms
tensor(0.8645)
iter 221200: loss nan, time 121.77ms
iter 221210: loss nan, time 121.33ms
iter 221220: loss nan, time 119.36ms
iter 221230: loss nan, time 119.19ms
iter 221240: loss nan, time 119.30ms
step 221250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 221250: loss nan, time 2916.74ms
iter 221260: loss nan, time 120.16ms
iter 221270: loss nan, time 121.01ms
iter 221280: loss nan, time 121.57ms
iter 221290: loss nan, time 120.70ms
tensor(0.8423)
iter 221300: loss nan, time 121.85ms
iter 221310: loss nan, time 122.45ms
iter 221320: loss nan, time 122.40ms
iter 221330: loss nan, time 122.16ms
iter 221340: loss nan, time 119.46ms
iter 221350: loss nan, time 119.25ms
iter 221360: loss nan, time 118.62ms
iter 221370: loss nan, time 119.63ms
iter 221380: loss nan, time 119.56ms
iter 221390: loss nan, time 120.44ms
tensor(0.8187)
iter 221400: loss nan, time 119.39ms
iter 221410: loss nan, time 121.25ms
iter 221420: loss nan, time 122.85ms
iter 221430: loss nan, time 122.74ms
iter 221440: loss nan, time 122.47ms
iter 221450: loss nan, time 121.67ms
iter 221460: loss nan, time 121.52ms
iter 221470: loss nan, time 121.29ms
iter 221480: loss nan, time 119.20ms
iter 221490: loss nan, time 119.34ms
tensor(0.7939)
step 221500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 221500: loss nan, time 2906.61ms
iter 221510: loss nan, time 119.50ms
iter 221520: loss nan, time 119.33ms
iter 221530: loss nan, time 119.91ms
iter 221540: loss nan, time 120.59ms
iter 221550: loss nan, time 121.78ms
iter 221560: loss nan, time 123.02ms
iter 221570: loss nan, time 123.12ms
iter 221580: loss nan, time 121.46ms
iter 221590: loss nan, time 119.21ms
tensor(0.7679)
iter 221600: loss nan, time 120.62ms
iter 221610: loss nan, time 119.40ms
iter 221620: loss nan, time 119.11ms
iter 221630: loss nan, time 119.44ms
iter 221640: loss nan, time 120.48ms
iter 221650: loss nan, time 119.47ms
iter 221660: loss nan, time 121.15ms
iter 221670: loss nan, time 122.44ms
iter 221680: loss nan, time 122.58ms
iter 221690: loss nan, time 122.09ms
tensor(0.7409)
iter 221700: loss nan, time 121.94ms
iter 221710: loss nan, time 121.89ms
iter 221720: loss nan, time 119.49ms
iter 221730: loss nan, time 119.62ms
iter 221740: loss nan, time 119.97ms
step 221750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 221750: loss nan, time 2914.75ms
iter 221760: loss nan, time 120.75ms
iter 221770: loss nan, time 121.88ms
iter 221780: loss nan, time 122.44ms
iter 221790: loss nan, time 120.29ms
tensor(0.7129)
iter 221800: loss nan, time 123.00ms
iter 221810: loss nan, time 121.43ms
iter 221820: loss nan, time 121.54ms
iter 221830: loss nan, time 119.16ms
iter 221840: loss nan, time 118.94ms
iter 221850: loss nan, time 119.53ms
iter 221860: loss nan, time 120.78ms
iter 221870: loss nan, time 120.34ms
iter 221880: loss nan, time 120.98ms
iter 221890: loss nan, time 121.43ms
tensor(0.6841)
iter 221900: loss nan, time 121.13ms
iter 221910: loss nan, time 122.91ms
iter 221920: loss nan, time 122.23ms
iter 221930: loss nan, time 122.58ms
iter 221940: loss nan, time 121.57ms
iter 221950: loss nan, time 121.54ms
iter 221960: loss nan, time 119.05ms
iter 221970: loss nan, time 119.20ms
iter 221980: loss nan, time 119.23ms
iter 221990: loss nan, time 119.38ms
tensor(0.6545)
step 222000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 222000: loss nan, time 2893.49ms
iter 222010: loss nan, time 119.93ms
iter 222020: loss nan, time 118.99ms
iter 222030: loss nan, time 119.65ms
iter 222040: loss nan, time 120.47ms
iter 222050: loss nan, time 120.63ms
iter 222060: loss nan, time 121.20ms
iter 222070: loss nan, time 122.54ms
iter 222080: loss nan, time 122.78ms
iter 222090: loss nan, time 120.47ms
tensor(0.6243)
iter 222100: loss nan, time 122.92ms
iter 222110: loss nan, time 121.72ms
iter 222120: loss nan, time 121.81ms
iter 222130: loss nan, time 119.10ms
iter 222140: loss nan, time 118.64ms
iter 222150: loss nan, time 119.14ms
iter 222160: loss nan, time 119.17ms
iter 222170: loss nan, time 119.32ms
iter 222180: loss nan, time 120.44ms
iter 222190: loss nan, time 121.07ms
tensor(0.5937)
iter 222200: loss nan, time 120.65ms
iter 222210: loss nan, time 122.71ms
iter 222220: loss nan, time 122.61ms
iter 222230: loss nan, time 121.50ms
iter 222240: loss nan, time 121.30ms
step 222250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 222250: loss nan, time 2897.75ms
iter 222260: loss nan, time 121.78ms
iter 222270: loss nan, time 121.79ms
iter 222280: loss nan, time 121.42ms
iter 222290: loss nan, time 121.33ms
tensor(0.5627)
iter 222300: loss nan, time 120.01ms
iter 222310: loss nan, time 120.44ms
iter 222320: loss nan, time 120.62ms
iter 222330: loss nan, time 120.40ms
iter 222340: loss nan, time 120.28ms
iter 222350: loss nan, time 123.26ms
iter 222360: loss nan, time 121.57ms
iter 222370: loss nan, time 121.28ms
iter 222380: loss nan, time 121.38ms
iter 222390: loss nan, time 119.29ms
tensor(0.5314)
iter 222400: loss nan, time 119.86ms
iter 222410: loss nan, time 120.87ms
iter 222420: loss nan, time 121.13ms
iter 222430: loss nan, time 120.30ms
iter 222440: loss nan, time 120.29ms
iter 222450: loss nan, time 120.34ms
iter 222460: loss nan, time 122.74ms
iter 222470: loss nan, time 121.39ms
iter 222480: loss nan, time 121.13ms
iter 222490: loss nan, time 121.51ms
tensor(0.5000)
step 222500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 222500: loss nan, time 2927.04ms
iter 222510: loss nan, time 119.74ms
iter 222520: loss nan, time 120.32ms
iter 222530: loss nan, time 120.26ms
iter 222540: loss nan, time 122.91ms
iter 222550: loss nan, time 121.86ms
iter 222560: loss nan, time 122.82ms
iter 222570: loss nan, time 121.77ms
iter 222580: loss nan, time 121.18ms
iter 222590: loss nan, time 119.33ms
tensor(0.4686)
iter 222600: loss nan, time 122.38ms
iter 222610: loss nan, time 120.55ms
iter 222620: loss nan, time 120.39ms
iter 222630: loss nan, time 120.74ms
iter 222640: loss nan, time 121.13ms
iter 222650: loss nan, time 121.13ms
iter 222660: loss nan, time 122.74ms
iter 222670: loss nan, time 121.38ms
iter 222680: loss nan, time 120.73ms
iter 222690: loss nan, time 121.49ms
tensor(0.4373)
iter 222700: loss nan, time 121.53ms
iter 222710: loss nan, time 119.62ms
iter 222720: loss nan, time 120.35ms
iter 222730: loss nan, time 120.70ms
iter 222740: loss nan, time 120.39ms
step 222750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 222750: loss nan, time 2909.04ms
iter 222760: loss nan, time 120.47ms
iter 222770: loss nan, time 120.77ms
iter 222780: loss nan, time 121.82ms
iter 222790: loss nan, time 120.45ms
tensor(0.4063)
iter 222800: loss nan, time 123.11ms
iter 222810: loss nan, time 121.39ms
iter 222820: loss nan, time 120.75ms
iter 222830: loss nan, time 121.41ms
iter 222840: loss nan, time 118.99ms
iter 222850: loss nan, time 119.07ms
iter 222860: loss nan, time 119.55ms
iter 222870: loss nan, time 120.51ms
iter 222880: loss nan, time 119.58ms
iter 222890: loss nan, time 120.28ms
tensor(0.3757)
iter 222900: loss nan, time 120.06ms
iter 222910: loss nan, time 120.44ms
iter 222920: loss nan, time 122.13ms
iter 222930: loss nan, time 122.91ms
iter 222940: loss nan, time 122.60ms
iter 222950: loss nan, time 121.36ms
iter 222960: loss nan, time 121.12ms
iter 222970: loss nan, time 121.48ms
iter 222980: loss nan, time 121.91ms
iter 222990: loss nan, time 121.26ms
tensor(0.3455)
step 223000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 223000: loss nan, time 2888.53ms
iter 223010: loss nan, time 121.52ms
iter 223020: loss nan, time 121.05ms
iter 223030: loss nan, time 119.13ms
iter 223040: loss nan, time 119.90ms
iter 223050: loss nan, time 120.50ms
iter 223060: loss nan, time 120.35ms
iter 223070: loss nan, time 120.20ms
iter 223080: loss nan, time 120.30ms
iter 223090: loss nan, time 119.33ms
tensor(0.3159)
iter 223100: loss nan, time 121.86ms
iter 223110: loss nan, time 122.82ms
iter 223120: loss nan, time 122.97ms
iter 223130: loss nan, time 121.22ms
iter 223140: loss nan, time 119.45ms
iter 223150: loss nan, time 121.45ms
iter 223160: loss nan, time 119.61ms
iter 223170: loss nan, time 120.01ms
iter 223180: loss nan, time 120.43ms
iter 223190: loss nan, time 120.40ms
tensor(0.2871)
iter 223200: loss nan, time 118.90ms
iter 223210: loss nan, time 121.66ms
iter 223220: loss nan, time 122.97ms
iter 223230: loss nan, time 121.76ms
iter 223240: loss nan, time 121.51ms
step 223250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 223250: loss nan, time 2917.61ms
iter 223260: loss nan, time 121.64ms
iter 223270: loss nan, time 119.70ms
iter 223280: loss nan, time 119.37ms
iter 223290: loss nan, time 120.50ms
tensor(0.2591)
iter 223300: loss nan, time 120.22ms
iter 223310: loss nan, time 120.53ms
iter 223320: loss nan, time 121.50ms
iter 223330: loss nan, time 122.84ms
iter 223340: loss nan, time 119.65ms
iter 223350: loss nan, time 120.73ms
iter 223360: loss nan, time 121.56ms
iter 223370: loss nan, time 121.50ms
iter 223380: loss nan, time 119.08ms
iter 223390: loss nan, time 119.79ms
tensor(0.2321)
iter 223400: loss nan, time 119.63ms
iter 223410: loss nan, time 120.56ms
iter 223420: loss nan, time 120.53ms
iter 223430: loss nan, time 120.72ms
iter 223440: loss nan, time 121.69ms
iter 223450: loss nan, time 121.62ms
iter 223460: loss nan, time 122.75ms
iter 223470: loss nan, time 121.64ms
iter 223480: loss nan, time 121.30ms
iter 223490: loss nan, time 121.25ms
tensor(0.2061)
step 223500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 223500: loss nan, time 2901.72ms
iter 223510: loss nan, time 121.84ms
iter 223520: loss nan, time 119.29ms
iter 223530: loss nan, time 119.62ms
iter 223540: loss nan, time 120.29ms
iter 223550: loss nan, time 120.47ms
iter 223560: loss nan, time 120.69ms
iter 223570: loss nan, time 121.39ms
iter 223580: loss nan, time 122.46ms
iter 223590: loss nan, time 120.61ms
tensor(0.1813)
iter 223600: loss nan, time 121.27ms
iter 223610: loss nan, time 121.58ms
iter 223620: loss nan, time 121.75ms
iter 223630: loss nan, time 119.76ms
iter 223640: loss nan, time 119.81ms
iter 223650: loss nan, time 119.37ms
iter 223660: loss nan, time 120.68ms
iter 223670: loss nan, time 120.82ms
iter 223680: loss nan, time 122.34ms
iter 223690: loss nan, time 122.71ms
tensor(0.1577)
iter 223700: loss nan, time 121.52ms
iter 223710: loss nan, time 121.37ms
iter 223720: loss nan, time 121.35ms
iter 223730: loss nan, time 119.61ms
iter 223740: loss nan, time 119.17ms
step 223750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 223750: loss nan, time 2914.34ms
iter 223760: loss nan, time 120.28ms
iter 223770: loss nan, time 120.19ms
iter 223780: loss nan, time 120.58ms
iter 223790: loss nan, time 120.59ms
tensor(0.1355)
iter 223800: loss nan, time 120.80ms
iter 223810: loss nan, time 122.60ms
iter 223820: loss nan, time 121.50ms
iter 223830: loss nan, time 120.83ms
iter 223840: loss nan, time 119.15ms
iter 223850: loss nan, time 120.16ms
iter 223860: loss nan, time 121.44ms
iter 223870: loss nan, time 121.16ms
iter 223880: loss nan, time 120.89ms
iter 223890: loss nan, time 119.24ms
tensor(0.1147)
iter 223900: loss nan, time 118.65ms
iter 223910: loss nan, time 118.71ms
iter 223920: loss nan, time 119.83ms
iter 223930: loss nan, time 120.45ms
iter 223940: loss nan, time 120.00ms
iter 223950: loss nan, time 118.83ms
iter 223960: loss nan, time 120.08ms
iter 223970: loss nan, time 120.23ms
iter 223980: loss nan, time 120.24ms
iter 223990: loss nan, time 120.00ms
tensor(0.0955)
step 224000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 224000: loss nan, time 2915.83ms
iter 224010: loss nan, time 120.87ms
iter 224020: loss nan, time 122.03ms
iter 224030: loss nan, time 122.74ms
iter 224040: loss nan, time 122.10ms
iter 224050: loss nan, time 121.05ms
iter 224060: loss nan, time 121.22ms
iter 224070: loss nan, time 121.47ms
iter 224080: loss nan, time 119.65ms
iter 224090: loss nan, time 120.82ms
tensor(0.0778)
iter 224100: loss nan, time 120.61ms
iter 224110: loss nan, time 120.42ms
iter 224120: loss nan, time 120.87ms
iter 224130: loss nan, time 121.73ms
iter 224140: loss nan, time 120.54ms
iter 224150: loss nan, time 121.59ms
iter 224160: loss nan, time 121.19ms
iter 224170: loss nan, time 121.59ms
iter 224180: loss nan, time 121.58ms
iter 224190: loss nan, time 119.33ms
tensor(0.0618)
iter 224200: loss nan, time 119.35ms
iter 224210: loss nan, time 120.54ms
iter 224220: loss nan, time 120.56ms
iter 224230: loss nan, time 120.50ms
iter 224240: loss nan, time 120.27ms
step 224250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 224250: loss nan, time 2918.32ms
iter 224260: loss nan, time 120.88ms
iter 224270: loss nan, time 122.45ms
iter 224280: loss nan, time 122.36ms
iter 224290: loss nan, time 121.07ms
tensor(0.0476)
iter 224300: loss nan, time 121.69ms
iter 224310: loss nan, time 120.83ms
iter 224320: loss nan, time 119.18ms
iter 224330: loss nan, time 119.11ms
iter 224340: loss nan, time 119.76ms
iter 224350: loss nan, time 121.24ms
iter 224360: loss nan, time 120.37ms
iter 224370: loss nan, time 120.39ms
iter 224380: loss nan, time 121.82ms
iter 224390: loss nan, time 120.31ms
tensor(0.0351)
iter 224400: loss nan, time 122.91ms
iter 224410: loss nan, time 121.31ms
iter 224420: loss nan, time 121.49ms
iter 224430: loss nan, time 121.29ms
iter 224440: loss nan, time 119.24ms
iter 224450: loss nan, time 119.60ms
iter 224460: loss nan, time 119.47ms
iter 224470: loss nan, time 120.93ms
iter 224480: loss nan, time 120.67ms
iter 224490: loss nan, time 120.36ms
tensor(0.0245)
step 224500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 224500: loss nan, time 2920.75ms
iter 224510: loss nan, time 121.99ms
iter 224520: loss nan, time 121.24ms
iter 224530: loss nan, time 121.47ms
iter 224540: loss nan, time 121.40ms
iter 224550: loss nan, time 121.19ms
iter 224560: loss nan, time 119.48ms
iter 224570: loss nan, time 120.69ms
iter 224580: loss nan, time 121.01ms
iter 224590: loss nan, time 120.43ms
tensor(0.0157)
iter 224600: loss nan, time 119.67ms
iter 224610: loss nan, time 121.35ms
iter 224620: loss nan, time 122.58ms
iter 224630: loss nan, time 122.59ms
iter 224640: loss nan, time 119.57ms
iter 224650: loss nan, time 121.16ms
iter 224660: loss nan, time 121.18ms
iter 224670: loss nan, time 121.36ms
iter 224680: loss nan, time 119.05ms
iter 224690: loss nan, time 119.28ms
tensor(0.0089)
iter 224700: loss nan, time 119.69ms
iter 224710: loss nan, time 120.67ms
iter 224720: loss nan, time 120.49ms
iter 224730: loss nan, time 120.50ms
iter 224740: loss nan, time 121.15ms
step 224750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 224750: loss nan, time 2925.66ms
iter 224760: loss nan, time 121.39ms
iter 224770: loss nan, time 122.02ms
iter 224780: loss nan, time 120.80ms
iter 224790: loss nan, time 120.99ms
tensor(0.0039)
iter 224800: loss nan, time 120.14ms
iter 224810: loss nan, time 119.07ms
iter 224820: loss nan, time 119.75ms
iter 224830: loss nan, time 119.25ms
iter 224840: loss nan, time 119.22ms
iter 224850: loss nan, time 119.59ms
iter 224860: loss nan, time 119.63ms
iter 224870: loss nan, time 120.12ms
iter 224880: loss nan, time 120.70ms
iter 224890: loss nan, time 119.84ms
tensor(0.0010)
iter 224900: loss nan, time 120.32ms
iter 224910: loss nan, time 119.86ms
iter 224920: loss nan, time 120.09ms
iter 224930: loss nan, time 120.68ms
iter 224940: loss nan, time 119.84ms
iter 224950: loss nan, time 121.19ms
iter 224960: loss nan, time 121.33ms
iter 224970: loss nan, time 121.56ms
iter 224980: loss nan, time 122.06ms
iter 224990: loss nan, time 120.15ms
tensor(0.0010)
step 225000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 225000: loss nan, time 2916.76ms
iter 225010: loss nan, time 121.96ms
iter 225020: loss nan, time 121.13ms
iter 225030: loss nan, time 120.67ms
iter 225040: loss nan, time 121.03ms
iter 225050: loss nan, time 118.76ms
iter 225060: loss nan, time 120.88ms
iter 225070: loss nan, time 121.20ms
iter 225080: loss nan, time 121.54ms
iter 225090: loss nan, time 120.85ms
tensor(0.0010)
iter 225100: loss nan, time 118.20ms
iter 225110: loss nan, time 118.25ms
iter 225120: loss nan, time 118.99ms
iter 225130: loss nan, time 120.52ms
iter 225140: loss nan, time 119.82ms
iter 225150: loss nan, time 119.77ms
iter 225160: loss nan, time 117.85ms
iter 225170: loss nan, time 120.21ms
iter 225180: loss nan, time 120.33ms
iter 225190: loss nan, time 119.76ms
tensor(0.0039)
iter 225200: loss nan, time 120.15ms
iter 225210: loss nan, time 118.47ms
iter 225220: loss nan, time 119.82ms
iter 225230: loss nan, time 119.77ms
iter 225240: loss nan, time 119.87ms
step 225250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 225250: loss nan, time 2905.57ms
iter 225260: loss nan, time 119.94ms
iter 225270: loss nan, time 118.76ms
iter 225280: loss nan, time 119.86ms
iter 225290: loss nan, time 118.80ms
tensor(0.0089)
iter 225300: loss nan, time 120.13ms
iter 225310: loss nan, time 119.90ms
iter 225320: loss nan, time 118.74ms
iter 225330: loss nan, time 119.88ms
iter 225340: loss nan, time 119.18ms
iter 225350: loss nan, time 120.39ms
iter 225360: loss nan, time 120.29ms
iter 225370: loss nan, time 119.74ms
iter 225380: loss nan, time 121.21ms
iter 225390: loss nan, time 118.57ms
tensor(0.0157)
iter 225400: loss nan, time 115.41ms
iter 225410: loss nan, time 114.79ms
iter 225420: loss nan, time 115.83ms
iter 225430: loss nan, time 115.39ms
iter 225440: loss nan, time 118.10ms
iter 225450: loss nan, time 116.12ms
iter 225460: loss nan, time 116.97ms
iter 225470: loss nan, time 117.01ms
iter 225480: loss nan, time 115.97ms
iter 225490: loss nan, time 117.20ms
tensor(0.0245)
step 225500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 225500: loss nan, time 2913.30ms
iter 225510: loss nan, time 114.86ms
iter 225520: loss nan, time 116.95ms
iter 225530: loss nan, time 118.43ms
iter 225540: loss nan, time 114.68ms
iter 225550: loss nan, time 118.31ms
iter 225560: loss nan, time 115.44ms
iter 225570: loss nan, time 114.50ms
iter 225580: loss nan, time 118.49ms
iter 225590: loss nan, time 115.63ms
tensor(0.0351)
iter 225600: loss nan, time 116.63ms
iter 225610: loss nan, time 117.68ms
iter 225620: loss nan, time 114.82ms
iter 225630: loss nan, time 118.02ms
iter 225640: loss nan, time 115.74ms
iter 225650: loss nan, time 114.70ms
iter 225660: loss nan, time 118.12ms
iter 225670: loss nan, time 115.55ms
iter 225680: loss nan, time 115.89ms
iter 225690: loss nan, time 116.22ms
tensor(0.0476)
iter 225700: loss nan, time 114.02ms
iter 225710: loss nan, time 118.25ms
iter 225720: loss nan, time 115.81ms
iter 225730: loss nan, time 114.79ms
iter 225740: loss nan, time 118.34ms
step 225750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 225750: loss nan, time 2904.75ms
iter 225760: loss nan, time 116.79ms
iter 225770: loss nan, time 115.28ms
iter 225780: loss nan, time 117.13ms
iter 225790: loss nan, time 118.18ms
tensor(0.0618)
iter 225800: loss nan, time 115.31ms
iter 225810: loss nan, time 118.11ms
iter 225820: loss nan, time 115.11ms
iter 225830: loss nan, time 114.92ms
iter 225840: loss nan, time 118.42ms
iter 225850: loss nan, time 115.63ms
iter 225860: loss nan, time 117.01ms
iter 225870: loss nan, time 118.30ms
iter 225880: loss nan, time 114.74ms
iter 225890: loss nan, time 116.86ms
tensor(0.0778)
iter 225900: loss nan, time 117.39ms
iter 225910: loss nan, time 115.08ms
iter 225920: loss nan, time 118.11ms
iter 225930: loss nan, time 117.45ms
iter 225940: loss nan, time 114.74ms
iter 225950: loss nan, time 117.69ms
iter 225960: loss nan, time 116.24ms
iter 225970: loss nan, time 116.66ms
iter 225980: loss nan, time 118.10ms
iter 225990: loss nan, time 115.01ms
tensor(0.0955)
step 226000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 226000: loss nan, time 2909.17ms
iter 226010: loss nan, time 116.94ms
iter 226020: loss nan, time 114.64ms
iter 226030: loss nan, time 117.97ms
iter 226040: loss nan, time 115.77ms
iter 226050: loss nan, time 114.94ms
iter 226060: loss nan, time 115.97ms
iter 226070: loss nan, time 115.61ms
iter 226080: loss nan, time 117.16ms
iter 226090: loss nan, time 118.35ms
tensor(0.1147)
iter 226100: loss nan, time 115.27ms
iter 226110: loss nan, time 116.84ms
iter 226120: loss nan, time 116.28ms
iter 226130: loss nan, time 114.13ms
iter 226140: loss nan, time 116.84ms
iter 226150: loss nan, time 116.47ms
iter 226160: loss nan, time 115.23ms
iter 226170: loss nan, time 118.17ms
iter 226180: loss nan, time 116.25ms
iter 226190: loss nan, time 114.75ms
tensor(0.1355)
iter 226200: loss nan, time 117.21ms
iter 226210: loss nan, time 116.34ms
iter 226220: loss nan, time 116.18ms
iter 226230: loss nan, time 117.20ms
iter 226240: loss nan, time 116.31ms
step 226250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 226250: loss nan, time 2927.62ms
iter 226260: loss nan, time 118.89ms
iter 226270: loss nan, time 115.67ms
iter 226280: loss nan, time 114.37ms
iter 226290: loss nan, time 118.33ms
tensor(0.1577)
iter 226300: loss nan, time 117.29ms
iter 226310: loss nan, time 114.83ms
iter 226320: loss nan, time 117.93ms
iter 226330: loss nan, time 115.05ms
iter 226340: loss nan, time 114.08ms
iter 226350: loss nan, time 118.50ms
iter 226360: loss nan, time 115.29ms
iter 226370: loss nan, time 116.99ms
iter 226380: loss nan, time 117.48ms
iter 226390: loss nan, time 114.77ms
tensor(0.1813)
iter 226400: loss nan, time 118.36ms
iter 226410: loss nan, time 115.90ms
iter 226420: loss nan, time 115.28ms
iter 226430: loss nan, time 117.31ms
iter 226440: loss nan, time 115.34ms
iter 226450: loss nan, time 116.96ms
iter 226460: loss nan, time 117.74ms
iter 226470: loss nan, time 114.97ms
iter 226480: loss nan, time 118.64ms
iter 226490: loss nan, time 117.06ms
tensor(0.2061)
step 226500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 226500: loss nan, time 2916.19ms
iter 226510: loss nan, time 115.74ms
iter 226520: loss nan, time 116.48ms
iter 226530: loss nan, time 115.06ms
iter 226540: loss nan, time 117.34ms
iter 226550: loss nan, time 115.76ms
iter 226560: loss nan, time 116.88ms
iter 226570: loss nan, time 116.35ms
iter 226580: loss nan, time 115.89ms
iter 226590: loss nan, time 117.06ms
tensor(0.2321)
iter 226600: loss nan, time 118.50ms
iter 226610: loss nan, time 115.69ms
iter 226620: loss nan, time 117.19ms
iter 226630: loss nan, time 116.17ms
iter 226640: loss nan, time 114.72ms
iter 226650: loss nan, time 117.16ms
iter 226660: loss nan, time 117.28ms
iter 226670: loss nan, time 114.71ms
iter 226680: loss nan, time 118.02ms
iter 226690: loss nan, time 115.93ms
tensor(0.2591)
iter 226700: loss nan, time 115.80ms
iter 226710: loss nan, time 116.79ms
iter 226720: loss nan, time 115.06ms
iter 226730: loss nan, time 116.98ms
iter 226740: loss nan, time 117.87ms
step 226750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 226750: loss nan, time 2893.80ms
iter 226760: loss nan, time 116.67ms
iter 226770: loss nan, time 114.75ms
iter 226780: loss nan, time 118.04ms
iter 226790: loss nan, time 115.73ms
tensor(0.2871)
iter 226800: loss nan, time 115.88ms
iter 226810: loss nan, time 118.21ms
iter 226820: loss nan, time 116.45ms
iter 226830: loss nan, time 114.48ms
iter 226840: loss nan, time 117.04ms
iter 226850: loss nan, time 114.73ms
iter 226860: loss nan, time 118.31ms
iter 226870: loss nan, time 115.81ms
iter 226880: loss nan, time 115.04ms
iter 226890: loss nan, time 116.06ms
tensor(0.3159)
iter 226900: loss nan, time 116.89ms
iter 226910: loss nan, time 116.66ms
iter 226920: loss nan, time 117.84ms
iter 226930: loss nan, time 115.15ms
iter 226940: loss nan, time 117.19ms
iter 226950: loss nan, time 116.16ms
iter 226960: loss nan, time 115.09ms
iter 226970: loss nan, time 116.70ms
iter 226980: loss nan, time 116.61ms
iter 226990: loss nan, time 114.52ms
tensor(0.3455)
step 227000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 227000: loss nan, time 2897.88ms
iter 227010: loss nan, time 114.66ms
iter 227020: loss nan, time 117.81ms
iter 227030: loss nan, time 115.91ms
iter 227040: loss nan, time 114.61ms
iter 227050: loss nan, time 117.87ms
iter 227060: loss nan, time 115.89ms
iter 227070: loss nan, time 116.66ms
iter 227080: loss nan, time 117.78ms
iter 227090: loss nan, time 114.55ms
tensor(0.3757)
iter 227100: loss nan, time 118.45ms
iter 227110: loss nan, time 115.92ms
iter 227120: loss nan, time 114.99ms
iter 227130: loss nan, time 117.92ms
iter 227140: loss nan, time 115.08ms
iter 227150: loss nan, time 114.57ms
iter 227160: loss nan, time 117.39ms
iter 227170: loss nan, time 114.56ms
iter 227180: loss nan, time 117.71ms
iter 227190: loss nan, time 114.80ms
tensor(0.4063)
iter 227200: loss nan, time 115.66ms
iter 227210: loss nan, time 115.81ms
iter 227220: loss nan, time 114.54ms
iter 227230: loss nan, time 116.71ms
iter 227240: loss nan, time 116.50ms
step 227250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 227250: loss nan, time 2904.15ms
iter 227260: loss nan, time 118.23ms
iter 227270: loss nan, time 115.39ms
iter 227280: loss nan, time 116.66ms
iter 227290: loss nan, time 118.45ms
tensor(0.4373)
iter 227300: loss nan, time 114.61ms
iter 227310: loss nan, time 117.31ms
iter 227320: loss nan, time 116.67ms
iter 227330: loss nan, time 114.54ms
iter 227340: loss nan, time 117.98ms
iter 227350: loss nan, time 116.10ms
iter 227360: loss nan, time 114.84ms
iter 227370: loss nan, time 117.25ms
iter 227380: loss nan, time 115.60ms
iter 227390: loss nan, time 116.90ms
tensor(0.4686)
iter 227400: loss nan, time 118.96ms
iter 227410: loss nan, time 114.61ms
iter 227420: loss nan, time 116.78ms
iter 227430: loss nan, time 116.27ms
iter 227440: loss nan, time 115.15ms
iter 227450: loss nan, time 118.01ms
iter 227460: loss nan, time 115.87ms
iter 227470: loss nan, time 114.75ms
iter 227480: loss nan, time 118.94ms
iter 227490: loss nan, time 115.38ms
tensor(0.5000)
step 227500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 227500: loss nan, time 2905.24ms
iter 227510: loss nan, time 116.60ms
iter 227520: loss nan, time 114.69ms
iter 227530: loss nan, time 118.27ms
iter 227540: loss nan, time 115.94ms
iter 227550: loss nan, time 115.18ms
iter 227560: loss nan, time 116.85ms
iter 227570: loss nan, time 115.63ms
iter 227580: loss nan, time 116.82ms
iter 227590: loss nan, time 118.09ms
tensor(0.5314)
iter 227600: loss nan, time 115.46ms
iter 227610: loss nan, time 116.99ms
iter 227620: loss nan, time 115.34ms
iter 227630: loss nan, time 115.57ms
iter 227640: loss nan, time 117.40ms
iter 227650: loss nan, time 117.37ms
iter 227660: loss nan, time 114.72ms
iter 227670: loss nan, time 118.12ms
iter 227680: loss nan, time 114.87ms
iter 227690: loss nan, time 115.37ms
tensor(0.5627)
iter 227700: loss nan, time 118.91ms
iter 227710: loss nan, time 115.52ms
iter 227720: loss nan, time 116.94ms
iter 227730: loss nan, time 117.97ms
iter 227740: loss nan, time 114.95ms
step 227750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 227750: loss nan, time 2907.07ms
iter 227760: loss nan, time 116.41ms
iter 227770: loss nan, time 115.14ms
iter 227780: loss nan, time 118.01ms
iter 227790: loss nan, time 115.89ms
tensor(0.5937)
iter 227800: loss nan, time 115.20ms
iter 227810: loss nan, time 117.20ms
iter 227820: loss nan, time 115.44ms
iter 227830: loss nan, time 117.17ms
iter 227840: loss nan, time 118.00ms
iter 227850: loss nan, time 114.91ms
iter 227860: loss nan, time 114.68ms
iter 227870: loss nan, time 115.50ms
iter 227880: loss nan, time 114.54ms
iter 227890: loss nan, time 117.86ms
tensor(0.6243)
iter 227900: loss nan, time 115.45ms
iter 227910: loss nan, time 116.92ms
iter 227920: loss nan, time 115.98ms
iter 227930: loss nan, time 114.56ms
iter 227940: loss nan, time 116.67ms
iter 227950: loss nan, time 115.90ms
iter 227960: loss nan, time 114.83ms
iter 227970: loss nan, time 117.09ms
iter 227980: loss nan, time 114.94ms
iter 227990: loss nan, time 116.65ms
tensor(0.6545)
step 228000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 228000: loss nan, time 2893.23ms
iter 228010: loss nan, time 116.43ms
iter 228020: loss nan, time 116.07ms
iter 228030: loss nan, time 115.07ms
iter 228040: loss nan, time 117.64ms
iter 228050: loss nan, time 115.24ms
iter 228060: loss nan, time 114.55ms
iter 228070: loss nan, time 117.31ms
iter 228080: loss nan, time 114.49ms
iter 228090: loss nan, time 117.99ms
tensor(0.6841)
iter 228100: loss nan, time 116.06ms
iter 228110: loss nan, time 116.92ms
iter 228120: loss nan, time 115.77ms
iter 228130: loss nan, time 115.06ms
iter 228140: loss nan, time 116.68ms
iter 228150: loss nan, time 116.09ms
iter 228160: loss nan, time 115.46ms
iter 228170: loss nan, time 118.01ms
iter 228180: loss nan, time 115.58ms
iter 228190: loss nan, time 116.75ms
tensor(0.7129)
iter 228200: loss nan, time 116.17ms
iter 228210: loss nan, time 114.35ms
iter 228220: loss nan, time 118.17ms
iter 228230: loss nan, time 115.93ms
iter 228240: loss nan, time 114.10ms
step 228250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 228250: loss nan, time 2902.51ms
iter 228260: loss nan, time 114.78ms
iter 228270: loss nan, time 116.99ms
iter 228280: loss nan, time 115.93ms
iter 228290: loss nan, time 115.62ms
tensor(0.7409)
iter 228300: loss nan, time 118.47ms
iter 228310: loss nan, time 115.51ms
iter 228320: loss nan, time 114.69ms
iter 228330: loss nan, time 118.30ms
iter 228340: loss nan, time 114.68ms
iter 228350: loss nan, time 116.76ms
iter 228360: loss nan, time 116.63ms
iter 228370: loss nan, time 114.69ms
iter 228380: loss nan, time 115.98ms
iter 228390: loss nan, time 116.08ms
tensor(0.7679)
iter 228400: loss nan, time 115.24ms
iter 228410: loss nan, time 118.10ms
iter 228420: loss nan, time 115.01ms
iter 228430: loss nan, time 116.79ms
iter 228440: loss nan, time 115.80ms
iter 228450: loss nan, time 114.68ms
iter 228460: loss nan, time 116.75ms
iter 228470: loss nan, time 115.50ms
iter 228480: loss nan, time 116.76ms
iter 228490: loss nan, time 118.13ms
tensor(0.7939)
step 228500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 228500: loss nan, time 2894.57ms
iter 228510: loss nan, time 117.89ms
iter 228520: loss nan, time 115.39ms
iter 228530: loss nan, time 116.80ms
iter 228540: loss nan, time 116.21ms
iter 228550: loss nan, time 114.05ms
iter 228560: loss nan, time 118.04ms
iter 228570: loss nan, time 115.90ms
iter 228580: loss nan, time 114.60ms
iter 228590: loss nan, time 117.06ms
tensor(0.8187)
iter 228600: loss nan, time 115.32ms
iter 228610: loss nan, time 116.71ms
iter 228620: loss nan, time 117.08ms
iter 228630: loss nan, time 115.87ms
iter 228640: loss nan, time 114.79ms
iter 228650: loss nan, time 116.22ms
iter 228660: loss nan, time 114.50ms
iter 228670: loss nan, time 117.02ms
iter 228680: loss nan, time 115.74ms
iter 228690: loss nan, time 116.50ms
tensor(0.8423)
iter 228700: loss nan, time 116.73ms
iter 228710: loss nan, time 115.80ms
iter 228720: loss nan, time 116.88ms
iter 228730: loss nan, time 116.26ms
iter 228740: loss nan, time 116.15ms
step 228750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 228750: loss nan, time 2898.54ms
iter 228760: loss nan, time 116.22ms
iter 228770: loss nan, time 116.68ms
iter 228780: loss nan, time 116.69ms
iter 228790: loss nan, time 114.63ms
tensor(0.8645)
iter 228800: loss nan, time 117.39ms
iter 228810: loss nan, time 115.85ms
iter 228820: loss nan, time 115.16ms
iter 228830: loss nan, time 117.91ms
iter 228840: loss nan, time 116.00ms
iter 228850: loss nan, time 116.95ms
iter 228860: loss nan, time 116.87ms
iter 228870: loss nan, time 115.88ms
iter 228880: loss nan, time 116.65ms
iter 228890: loss nan, time 116.43ms
tensor(0.8853)
iter 228900: loss nan, time 115.38ms
iter 228910: loss nan, time 116.04ms
iter 228920: loss nan, time 115.69ms
iter 228930: loss nan, time 116.72ms
iter 228940: loss nan, time 117.09ms
iter 228950: loss nan, time 115.90ms
iter 228960: loss nan, time 114.74ms
iter 228970: loss nan, time 116.44ms
iter 228980: loss nan, time 115.10ms
iter 228990: loss nan, time 115.56ms
tensor(0.9045)
step 229000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 229000: loss nan, time 2898.72ms
iter 229010: loss nan, time 116.73ms
iter 229020: loss nan, time 115.96ms
iter 229030: loss nan, time 115.53ms
iter 229040: loss nan, time 117.95ms
iter 229050: loss nan, time 114.62ms
iter 229060: loss nan, time 115.70ms
iter 229070: loss nan, time 117.74ms
iter 229080: loss nan, time 115.76ms
iter 229090: loss nan, time 116.72ms
tensor(0.9222)
iter 229100: loss nan, time 116.61ms
iter 229110: loss nan, time 114.69ms
iter 229120: loss nan, time 117.99ms
iter 229130: loss nan, time 115.78ms
iter 229140: loss nan, time 116.52ms
iter 229150: loss nan, time 117.81ms
iter 229160: loss nan, time 115.73ms
iter 229170: loss nan, time 117.23ms
iter 229180: loss nan, time 116.57ms
iter 229190: loss nan, time 115.22ms
tensor(0.9382)
iter 229200: loss nan, time 117.49ms
iter 229210: loss nan, time 115.83ms
iter 229220: loss nan, time 114.59ms
iter 229230: loss nan, time 117.96ms
iter 229240: loss nan, time 116.00ms
step 229250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 229250: loss nan, time 2904.07ms
iter 229260: loss nan, time 115.72ms
iter 229270: loss nan, time 115.14ms
iter 229280: loss nan, time 117.53ms
iter 229290: loss nan, time 116.01ms
tensor(0.9524)
iter 229300: loss nan, time 117.46ms
iter 229310: loss nan, time 116.31ms
iter 229320: loss nan, time 114.65ms
iter 229330: loss nan, time 116.73ms
iter 229340: loss nan, time 115.87ms
iter 229350: loss nan, time 115.03ms
iter 229360: loss nan, time 117.92ms
iter 229370: loss nan, time 114.52ms
iter 229380: loss nan, time 115.66ms
iter 229390: loss nan, time 117.61ms
tensor(0.9649)
iter 229400: loss nan, time 116.39ms
iter 229410: loss nan, time 117.06ms
iter 229420: loss nan, time 116.20ms
iter 229430: loss nan, time 115.03ms
iter 229440: loss nan, time 117.72ms
iter 229450: loss nan, time 115.29ms
iter 229460: loss nan, time 116.89ms
iter 229470: loss nan, time 117.83ms
iter 229480: loss nan, time 115.68ms
iter 229490: loss nan, time 117.02ms
tensor(0.9755)
step 229500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 229500: loss nan, time 2905.22ms
iter 229510: loss nan, time 116.96ms
iter 229520: loss nan, time 117.98ms
iter 229530: loss nan, time 115.24ms
iter 229540: loss nan, time 116.78ms
iter 229550: loss nan, time 116.08ms
iter 229560: loss nan, time 115.46ms
iter 229570: loss nan, time 116.78ms
iter 229580: loss nan, time 115.73ms
iter 229590: loss nan, time 114.94ms
tensor(0.9843)
iter 229600: loss nan, time 118.82ms
iter 229610: loss nan, time 116.14ms
iter 229620: loss nan, time 116.68ms
iter 229630: loss nan, time 116.73ms
iter 229640: loss nan, time 115.24ms
iter 229650: loss nan, time 116.77ms
iter 229660: loss nan, time 115.84ms
iter 229670: loss nan, time 115.32ms
iter 229680: loss nan, time 116.75ms
iter 229690: loss nan, time 114.69ms
tensor(0.9911)
iter 229700: loss nan, time 115.39ms
iter 229710: loss nan, time 118.12ms
iter 229720: loss nan, time 115.55ms
iter 229730: loss nan, time 116.77ms
iter 229740: loss nan, time 116.85ms
step 229750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 229750: loss nan, time 2904.10ms
iter 229760: loss nan, time 118.00ms
iter 229770: loss nan, time 115.70ms
iter 229780: loss nan, time 116.80ms
iter 229790: loss nan, time 117.35ms
tensor(0.9961)
iter 229800: loss nan, time 116.67ms
iter 229810: loss nan, time 114.59ms
iter 229820: loss nan, time 116.16ms
iter 229830: loss nan, time 114.78ms
iter 229840: loss nan, time 118.03ms
iter 229850: loss nan, time 114.93ms
iter 229860: loss nan, time 117.46ms
iter 229870: loss nan, time 116.30ms
iter 229880: loss nan, time 115.85ms
iter 229890: loss nan, time 117.12ms
tensor(0.9990)
iter 229900: loss nan, time 116.70ms
iter 229910: loss nan, time 115.44ms
iter 229920: loss nan, time 116.71ms
iter 229930: loss nan, time 116.16ms
iter 229940: loss nan, time 116.76ms
iter 229950: loss nan, time 116.79ms
iter 229960: loss nan, time 115.33ms
iter 229970: loss nan, time 116.82ms
iter 229980: loss nan, time 116.76ms
iter 229990: loss nan, time 115.25ms
tensor(1.)
step 230000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 230000: loss nan, time 2918.00ms
iter 230010: loss nan, time 115.66ms
iter 230020: loss nan, time 116.77ms
iter 230030: loss nan, time 118.10ms
iter 230040: loss nan, time 115.24ms
iter 230050: loss nan, time 116.69ms
iter 230060: loss nan, time 116.80ms
iter 230070: loss nan, time 115.79ms
iter 230080: loss nan, time 116.66ms
iter 230090: loss nan, time 115.96ms
tensor(0.9990)
iter 230100: loss nan, time 115.04ms
iter 230110: loss nan, time 118.03ms
iter 230120: loss nan, time 115.79ms
iter 230130: loss nan, time 114.69ms
iter 230140: loss nan, time 118.07ms
iter 230150: loss nan, time 115.90ms
iter 230160: loss nan, time 116.63ms
iter 230170: loss nan, time 117.56ms
iter 230180: loss nan, time 115.99ms
iter 230190: loss nan, time 114.81ms
tensor(0.9961)
iter 230200: loss nan, time 116.13ms
iter 230210: loss nan, time 116.03ms
iter 230220: loss nan, time 117.09ms
iter 230230: loss nan, time 115.95ms
iter 230240: loss nan, time 114.85ms
step 230250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 230250: loss nan, time 2917.63ms
iter 230260: loss nan, time 115.86ms
iter 230270: loss nan, time 116.75ms
iter 230280: loss nan, time 116.95ms
iter 230290: loss nan, time 115.45ms
tensor(0.9911)
iter 230300: loss nan, time 116.98ms
iter 230310: loss nan, time 116.05ms
iter 230320: loss nan, time 114.80ms
iter 230330: loss nan, time 117.99ms
iter 230340: loss nan, time 114.65ms
iter 230350: loss nan, time 116.72ms
iter 230360: loss nan, time 117.29ms
iter 230370: loss nan, time 115.85ms
iter 230380: loss nan, time 116.64ms
iter 230390: loss nan, time 115.69ms
tensor(0.9843)
iter 230400: loss nan, time 114.95ms
iter 230410: loss nan, time 118.01ms
iter 230420: loss nan, time 115.68ms
iter 230430: loss nan, time 116.75ms
iter 230440: loss nan, time 117.27ms
iter 230450: loss nan, time 115.42ms
iter 230460: loss nan, time 116.55ms
iter 230470: loss nan, time 115.77ms
iter 230480: loss nan, time 114.66ms
iter 230490: loss nan, time 118.08ms
tensor(0.9755)
step 230500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 230500: loss nan, time 2897.60ms
iter 230510: loss nan, time 116.84ms
iter 230520: loss nan, time 115.63ms
iter 230530: loss nan, time 116.75ms
iter 230540: loss nan, time 116.71ms
iter 230550: loss nan, time 115.78ms
iter 230560: loss nan, time 116.75ms
iter 230570: loss nan, time 116.91ms
iter 230580: loss nan, time 115.19ms
iter 230590: loss nan, time 117.12ms
tensor(0.9649)
iter 230600: loss nan, time 115.35ms
iter 230610: loss nan, time 116.43ms
iter 230620: loss nan, time 118.40ms
iter 230630: loss nan, time 115.73ms
iter 230640: loss nan, time 116.62ms
iter 230650: loss nan, time 117.33ms
iter 230660: loss nan, time 114.55ms
iter 230670: loss nan, time 116.72ms
iter 230680: loss nan, time 115.84ms
iter 230690: loss nan, time 115.99ms
tensor(0.9524)
iter 230700: loss nan, time 118.34ms
iter 230710: loss nan, time 115.79ms
iter 230720: loss nan, time 116.81ms
iter 230730: loss nan, time 116.88ms
iter 230740: loss nan, time 115.56ms
step 230750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 230750: loss nan, time 2900.05ms
iter 230760: loss nan, time 115.84ms
iter 230770: loss nan, time 116.66ms
iter 230780: loss nan, time 115.91ms
iter 230790: loss nan, time 115.51ms
tensor(0.9382)
iter 230800: loss nan, time 117.19ms
iter 230810: loss nan, time 116.68ms
iter 230820: loss nan, time 115.73ms
iter 230830: loss nan, time 117.01ms
iter 230840: loss nan, time 115.85ms
iter 230850: loss nan, time 114.72ms
iter 230860: loss nan, time 116.90ms
iter 230870: loss nan, time 115.89ms
iter 230880: loss nan, time 117.15ms
iter 230890: loss nan, time 118.18ms
tensor(0.9222)
iter 230900: loss nan, time 116.59ms
iter 230910: loss nan, time 116.87ms
iter 230920: loss nan, time 116.23ms
iter 230930: loss nan, time 116.15ms
iter 230940: loss nan, time 116.77ms
iter 230950: loss nan, time 115.63ms
iter 230960: loss nan, time 114.95ms
iter 230970: loss nan, time 116.78ms
iter 230980: loss nan, time 114.67ms
iter 230990: loss nan, time 116.07ms
tensor(0.9045)
step 231000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 231000: loss nan, time 2903.29ms
iter 231010: loss nan, time 115.92ms
iter 231020: loss nan, time 116.59ms
iter 231030: loss nan, time 116.12ms
iter 231040: loss nan, time 114.96ms
iter 231050: loss nan, time 116.72ms
iter 231060: loss nan, time 115.80ms
iter 231070: loss nan, time 116.91ms
iter 231080: loss nan, time 117.69ms
iter 231090: loss nan, time 115.96ms
tensor(0.8853)
iter 231100: loss nan, time 115.64ms
iter 231110: loss nan, time 116.47ms
iter 231120: loss nan, time 115.49ms
iter 231130: loss nan, time 116.76ms
iter 231140: loss nan, time 115.41ms
iter 231150: loss nan, time 116.40ms
iter 231160: loss nan, time 115.66ms
iter 231170: loss nan, time 115.76ms
iter 231180: loss nan, time 116.99ms
iter 231190: loss nan, time 116.71ms
tensor(0.8645)
iter 231200: loss nan, time 116.37ms
iter 231210: loss nan, time 116.70ms
iter 231220: loss nan, time 115.77ms
iter 231230: loss nan, time 114.98ms
iter 231240: loss nan, time 117.01ms
step 231250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 231250: loss nan, time 2908.49ms
iter 231260: loss nan, time 116.75ms
iter 231270: loss nan, time 114.69ms
iter 231280: loss nan, time 114.74ms
iter 231290: loss nan, time 117.88ms
tensor(0.8423)
iter 231300: loss nan, time 115.99ms
iter 231310: loss nan, time 116.66ms
iter 231320: loss nan, time 117.51ms
iter 231330: loss nan, time 115.74ms
iter 231340: loss nan, time 116.23ms
iter 231350: loss nan, time 115.57ms
iter 231360: loss nan, time 114.83ms
iter 231370: loss nan, time 117.98ms
iter 231380: loss nan, time 115.97ms
iter 231390: loss nan, time 116.64ms
tensor(0.8187)
iter 231400: loss nan, time 118.71ms
iter 231410: loss nan, time 116.01ms
iter 231420: loss nan, time 114.78ms
iter 231430: loss nan, time 117.40ms
iter 231440: loss nan, time 115.13ms
iter 231450: loss nan, time 116.67ms
iter 231460: loss nan, time 115.58ms
iter 231470: loss nan, time 116.72ms
iter 231480: loss nan, time 115.81ms
iter 231490: loss nan, time 115.75ms
tensor(0.7939)
step 231500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 231500: loss nan, time 2902.70ms
iter 231510: loss nan, time 115.55ms
iter 231520: loss nan, time 117.08ms
iter 231530: loss nan, time 118.41ms
iter 231540: loss nan, time 115.74ms
iter 231550: loss nan, time 116.76ms
iter 231560: loss nan, time 117.50ms
iter 231570: loss nan, time 114.96ms
iter 231580: loss nan, time 116.74ms
iter 231590: loss nan, time 116.51ms
tensor(0.7679)
iter 231600: loss nan, time 116.37ms
iter 231610: loss nan, time 116.63ms
iter 231620: loss nan, time 115.73ms
iter 231630: loss nan, time 116.82ms
iter 231640: loss nan, time 118.01ms
iter 231650: loss nan, time 116.06ms
iter 231660: loss nan, time 117.03ms
iter 231670: loss nan, time 118.12ms
iter 231680: loss nan, time 115.65ms
iter 231690: loss nan, time 116.95ms
tensor(0.7409)
iter 231700: loss nan, time 118.41ms
iter 231710: loss nan, time 115.90ms
iter 231720: loss nan, time 117.68ms
iter 231730: loss nan, time 121.35ms
iter 231740: loss nan, time 121.45ms
step 231750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 231750: loss nan, time 2906.70ms
iter 231760: loss nan, time 121.07ms
iter 231770: loss nan, time 120.54ms
iter 231780: loss nan, time 121.18ms
iter 231790: loss nan, time 118.86ms
tensor(0.7129)
iter 231800: loss nan, time 119.91ms
iter 231810: loss nan, time 119.45ms
iter 231820: loss nan, time 120.26ms
iter 231830: loss nan, time 120.38ms
iter 231840: loss nan, time 120.07ms
iter 231850: loss nan, time 120.63ms
iter 231860: loss nan, time 120.24ms
iter 231870: loss nan, time 121.37ms
iter 231880: loss nan, time 122.43ms
iter 231890: loss nan, time 119.53ms
tensor(0.6841)
iter 231900: loss nan, time 121.36ms
iter 231910: loss nan, time 121.44ms
iter 231920: loss nan, time 121.63ms
iter 231930: loss nan, time 121.13ms
iter 231940: loss nan, time 119.22ms
iter 231950: loss nan, time 119.15ms
iter 231960: loss nan, time 120.02ms
iter 231970: loss nan, time 120.14ms
iter 231980: loss nan, time 120.20ms
iter 231990: loss nan, time 120.62ms
tensor(0.6545)
step 232000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 232000: loss nan, time 2915.64ms
iter 232010: loss nan, time 121.00ms
iter 232020: loss nan, time 122.58ms
iter 232030: loss nan, time 122.41ms
iter 232040: loss nan, time 122.03ms
iter 232050: loss nan, time 122.46ms
iter 232060: loss nan, time 121.14ms
iter 232070: loss nan, time 121.68ms
iter 232080: loss nan, time 121.16ms
iter 232090: loss nan, time 119.09ms
tensor(0.6243)
iter 232100: loss nan, time 119.33ms
iter 232110: loss nan, time 119.23ms
iter 232120: loss nan, time 119.24ms
iter 232130: loss nan, time 119.60ms
iter 232140: loss nan, time 119.97ms
iter 232150: loss nan, time 120.20ms
iter 232160: loss nan, time 120.54ms
iter 232170: loss nan, time 121.32ms
iter 232180: loss nan, time 122.20ms
iter 232190: loss nan, time 120.55ms
tensor(0.5937)
iter 232200: loss nan, time 122.91ms
iter 232210: loss nan, time 122.46ms
iter 232220: loss nan, time 122.24ms
iter 232230: loss nan, time 121.46ms
iter 232240: loss nan, time 119.16ms
step 232250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 232250: loss nan, time 2896.21ms
iter 232260: loss nan, time 122.44ms
iter 232270: loss nan, time 120.77ms
iter 232280: loss nan, time 121.15ms
iter 232290: loss nan, time 121.14ms
tensor(0.5627)
iter 232300: loss nan, time 119.73ms
iter 232310: loss nan, time 118.98ms
iter 232320: loss nan, time 119.07ms
iter 232330: loss nan, time 119.04ms
iter 232340: loss nan, time 118.87ms
iter 232350: loss nan, time 119.48ms
iter 232360: loss nan, time 118.92ms
iter 232370: loss nan, time 119.85ms
iter 232380: loss nan, time 119.88ms
iter 232390: loss nan, time 120.59ms
tensor(0.5314)
iter 232400: loss nan, time 121.12ms
iter 232410: loss nan, time 119.98ms
iter 232420: loss nan, time 122.24ms
iter 232430: loss nan, time 121.43ms
iter 232440: loss nan, time 122.55ms
iter 232450: loss nan, time 122.57ms
iter 232460: loss nan, time 120.70ms
iter 232470: loss nan, time 121.40ms
iter 232480: loss nan, time 121.10ms
iter 232490: loss nan, time 121.53ms
tensor(0.5000)
step 232500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 232500: loss nan, time 2905.01ms
iter 232510: loss nan, time 120.89ms
iter 232520: loss nan, time 119.76ms
iter 232530: loss nan, time 119.54ms
iter 232540: loss nan, time 118.46ms
iter 232550: loss nan, time 118.84ms
iter 232560: loss nan, time 118.84ms
iter 232570: loss nan, time 119.61ms
iter 232580: loss nan, time 119.76ms
iter 232590: loss nan, time 119.77ms
tensor(0.4686)
iter 232600: loss nan, time 120.33ms
iter 232610: loss nan, time 120.66ms
iter 232620: loss nan, time 120.34ms
iter 232630: loss nan, time 120.46ms
iter 232640: loss nan, time 121.07ms
iter 232650: loss nan, time 119.79ms
iter 232660: loss nan, time 121.66ms
iter 232670: loss nan, time 122.31ms
iter 232680: loss nan, time 122.22ms
iter 232690: loss nan, time 122.11ms
tensor(0.4373)
iter 232700: loss nan, time 120.29ms
iter 232710: loss nan, time 122.15ms
iter 232720: loss nan, time 121.93ms
iter 232730: loss nan, time 122.06ms
iter 232740: loss nan, time 122.03ms
step 232750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 232750: loss nan, time 2894.61ms
iter 232760: loss nan, time 119.99ms
iter 232770: loss nan, time 122.29ms
iter 232780: loss nan, time 122.28ms
iter 232790: loss nan, time 122.09ms
tensor(0.4063)
iter 232800: loss nan, time 121.26ms
iter 232810: loss nan, time 120.28ms
iter 232820: loss nan, time 122.07ms
iter 232830: loss nan, time 121.95ms
iter 232840: loss nan, time 122.04ms
iter 232850: loss nan, time 122.01ms
iter 232860: loss nan, time 119.79ms
iter 232870: loss nan, time 122.16ms
iter 232880: loss nan, time 122.28ms
iter 232890: loss nan, time 122.05ms
tensor(0.3757)
iter 232900: loss nan, time 121.70ms
iter 232910: loss nan, time 120.01ms
iter 232920: loss nan, time 121.68ms
iter 232930: loss nan, time 122.16ms
iter 232940: loss nan, time 122.01ms
iter 232950: loss nan, time 122.37ms
iter 232960: loss nan, time 119.82ms
iter 232970: loss nan, time 122.46ms
iter 232980: loss nan, time 121.96ms
iter 232990: loss nan, time 123.06ms
tensor(0.3455)
step 233000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 233000: loss nan, time 2901.09ms
iter 233010: loss nan, time 121.57ms
iter 233020: loss nan, time 118.94ms
iter 233030: loss nan, time 120.64ms
iter 233040: loss nan, time 121.75ms
iter 233050: loss nan, time 122.10ms
iter 233060: loss nan, time 122.21ms
iter 233070: loss nan, time 119.03ms
iter 233080: loss nan, time 121.99ms
iter 233090: loss nan, time 122.62ms
tensor(0.3159)
iter 233100: loss nan, time 122.44ms
iter 233110: loss nan, time 122.38ms
iter 233120: loss nan, time 119.76ms
iter 233130: loss nan, time 122.18ms
iter 233140: loss nan, time 121.92ms
iter 233150: loss nan, time 120.95ms
iter 233160: loss nan, time 121.26ms
iter 233170: loss nan, time 119.12ms
iter 233180: loss nan, time 120.45ms
iter 233190: loss nan, time 120.25ms
tensor(0.2871)
iter 233200: loss nan, time 120.50ms
iter 233210: loss nan, time 119.42ms
iter 233220: loss nan, time 119.48ms
iter 233230: loss nan, time 119.71ms
iter 233240: loss nan, time 119.06ms
step 233250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 233250: loss nan, time 2818.70ms
iter 233260: loss nan, time 119.86ms
iter 233270: loss nan, time 120.45ms
iter 233280: loss nan, time 120.34ms
iter 233290: loss nan, time 120.54ms
tensor(0.2591)
iter 233300: loss nan, time 120.07ms
iter 233310: loss nan, time 121.10ms
iter 233320: loss nan, time 122.39ms
iter 233330: loss nan, time 122.19ms
iter 233340: loss nan, time 122.12ms
iter 233350: loss nan, time 121.00ms
iter 233360: loss nan, time 122.67ms
iter 233370: loss nan, time 122.20ms
iter 233380: loss nan, time 121.96ms
iter 233390: loss nan, time 122.26ms
tensor(0.2321)
iter 233400: loss nan, time 121.41ms
iter 233410: loss nan, time 120.90ms
iter 233420: loss nan, time 121.15ms
iter 233430: loss nan, time 119.06ms
iter 233440: loss nan, time 118.44ms
iter 233450: loss nan, time 119.16ms
iter 233460: loss nan, time 119.18ms
iter 233470: loss nan, time 119.78ms
iter 233480: loss nan, time 119.31ms
iter 233490: loss nan, time 119.95ms
tensor(0.2061)
step 233500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 233500: loss nan, time 2906.15ms
iter 233510: loss nan, time 118.72ms
iter 233520: loss nan, time 119.02ms
iter 233530: loss nan, time 119.34ms
iter 233540: loss nan, time 118.69ms
iter 233550: loss nan, time 119.34ms
iter 233560: loss nan, time 120.03ms
iter 233570: loss nan, time 119.92ms
iter 233580: loss nan, time 120.69ms
iter 233590: loss nan, time 120.21ms
tensor(0.1813)
iter 233600: loss nan, time 120.46ms
iter 233610: loss nan, time 120.49ms
iter 233620: loss nan, time 119.98ms
iter 233630: loss nan, time 120.62ms
iter 233640: loss nan, time 119.85ms
iter 233650: loss nan, time 120.51ms
iter 233660: loss nan, time 119.96ms
iter 233670: loss nan, time 120.68ms
iter 233680: loss nan, time 120.49ms
iter 233690: loss nan, time 119.87ms
tensor(0.1577)
iter 233700: loss nan, time 121.04ms
iter 233710: loss nan, time 121.15ms
iter 233720: loss nan, time 121.15ms
iter 233730: loss nan, time 122.13ms
iter 233740: loss nan, time 119.91ms
step 233750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 233750: loss nan, time 2904.44ms
iter 233760: loss nan, time 120.74ms
iter 233770: loss nan, time 121.64ms
iter 233780: loss nan, time 122.25ms
iter 233790: loss nan, time 122.27ms
tensor(0.1355)
iter 233800: loss nan, time 120.19ms
iter 233810: loss nan, time 122.02ms
iter 233820: loss nan, time 122.81ms
iter 233830: loss nan, time 121.80ms
iter 233840: loss nan, time 121.41ms
iter 233850: loss nan, time 120.24ms
iter 233860: loss nan, time 122.23ms
iter 233870: loss nan, time 122.49ms
iter 233880: loss nan, time 122.03ms
iter 233890: loss nan, time 121.16ms
tensor(0.1147)
iter 233900: loss nan, time 118.98ms
iter 233910: loss nan, time 120.74ms
iter 233920: loss nan, time 120.79ms
iter 233930: loss nan, time 120.64ms
iter 233940: loss nan, time 120.90ms
iter 233950: loss nan, time 118.51ms
iter 233960: loss nan, time 119.87ms
iter 233970: loss nan, time 118.88ms
iter 233980: loss nan, time 119.16ms
iter 233990: loss nan, time 118.74ms
tensor(0.0955)
step 234000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 234000: loss nan, time 2901.20ms
iter 234010: loss nan, time 118.76ms
iter 234020: loss nan, time 121.37ms
iter 234030: loss nan, time 120.72ms
iter 234040: loss nan, time 121.71ms
iter 234050: loss nan, time 121.11ms
iter 234060: loss nan, time 119.04ms
iter 234070: loss nan, time 121.65ms
iter 234080: loss nan, time 120.90ms
iter 234090: loss nan, time 121.11ms
tensor(0.0778)
iter 234100: loss nan, time 121.03ms
iter 234110: loss nan, time 118.96ms
iter 234120: loss nan, time 121.43ms
iter 234130: loss nan, time 121.13ms
iter 234140: loss nan, time 122.08ms
iter 234150: loss nan, time 120.86ms
iter 234160: loss nan, time 118.83ms
iter 234170: loss nan, time 121.54ms
iter 234180: loss nan, time 120.97ms
iter 234190: loss nan, time 122.00ms
tensor(0.0618)
iter 234200: loss nan, time 121.01ms
iter 234210: loss nan, time 118.72ms
iter 234220: loss nan, time 120.90ms
iter 234230: loss nan, time 120.97ms
iter 234240: loss nan, time 122.04ms
step 234250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 234250: loss nan, time 2914.57ms
iter 234260: loss nan, time 121.15ms
iter 234270: loss nan, time 119.61ms
iter 234280: loss nan, time 120.75ms
iter 234290: loss nan, time 120.90ms
tensor(0.0476)
iter 234300: loss nan, time 121.19ms
iter 234310: loss nan, time 120.86ms
iter 234320: loss nan, time 118.66ms
iter 234330: loss nan, time 121.36ms
iter 234340: loss nan, time 120.91ms
iter 234350: loss nan, time 118.96ms
iter 234360: loss nan, time 118.89ms
iter 234370: loss nan, time 118.97ms
iter 234380: loss nan, time 118.41ms
iter 234390: loss nan, time 118.76ms
tensor(0.0351)
iter 234400: loss nan, time 119.20ms
iter 234410: loss nan, time 118.74ms
iter 234420: loss nan, time 118.99ms
iter 234430: loss nan, time 119.11ms
iter 234440: loss nan, time 118.97ms
iter 234450: loss nan, time 119.01ms
iter 234460: loss nan, time 119.16ms
iter 234470: loss nan, time 119.89ms
iter 234480: loss nan, time 118.90ms
iter 234490: loss nan, time 120.47ms
tensor(0.0245)
step 234500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 234500: loss nan, time 2924.17ms
iter 234510: loss nan, time 121.71ms
iter 234520: loss nan, time 122.21ms
iter 234530: loss nan, time 122.11ms
iter 234540: loss nan, time 120.96ms
iter 234550: loss nan, time 122.43ms
iter 234560: loss nan, time 122.09ms
iter 234570: loss nan, time 122.22ms
iter 234580: loss nan, time 122.01ms
iter 234590: loss nan, time 120.89ms
tensor(0.0157)
iter 234600: loss nan, time 122.93ms
iter 234610: loss nan, time 122.08ms
iter 234620: loss nan, time 122.08ms
iter 234630: loss nan, time 121.87ms
iter 234640: loss nan, time 120.78ms
iter 234650: loss nan, time 121.97ms
iter 234660: loss nan, time 121.12ms
iter 234670: loss nan, time 120.99ms
iter 234680: loss nan, time 121.88ms
iter 234690: loss nan, time 120.73ms
tensor(0.0089)
iter 234700: loss nan, time 121.08ms
iter 234710: loss nan, time 120.86ms
iter 234720: loss nan, time 120.96ms
iter 234730: loss nan, time 120.77ms
iter 234740: loss nan, time 120.78ms
step 234750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 234750: loss nan, time 2889.53ms
iter 234760: loss nan, time 122.19ms
iter 234770: loss nan, time 122.12ms
iter 234780: loss nan, time 122.18ms
iter 234790: loss nan, time 122.61ms
tensor(0.0039)
iter 234800: loss nan, time 120.93ms
iter 234810: loss nan, time 123.41ms
iter 234820: loss nan, time 121.27ms
iter 234830: loss nan, time 121.42ms
iter 234840: loss nan, time 121.86ms
iter 234850: loss nan, time 120.73ms
iter 234860: loss nan, time 122.15ms
iter 234870: loss nan, time 121.97ms
iter 234880: loss nan, time 121.90ms
iter 234890: loss nan, time 121.80ms
tensor(0.0010)
iter 234900: loss nan, time 120.53ms
iter 234910: loss nan, time 121.93ms
iter 234920: loss nan, time 122.08ms
iter 234930: loss nan, time 122.01ms
iter 234940: loss nan, time 122.12ms
iter 234950: loss nan, time 120.53ms
iter 234960: loss nan, time 122.25ms
iter 234970: loss nan, time 121.98ms
iter 234980: loss nan, time 121.94ms
iter 234990: loss nan, time 121.92ms
tensor(0.0010)
step 235000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 235000: loss nan, time 2902.97ms
iter 235010: loss nan, time 121.00ms
iter 235020: loss nan, time 122.40ms
iter 235030: loss nan, time 121.97ms
iter 235040: loss nan, time 122.19ms
iter 235050: loss nan, time 121.93ms
iter 235060: loss nan, time 120.75ms
iter 235070: loss nan, time 121.13ms
iter 235080: loss nan, time 121.45ms
iter 235090: loss nan, time 120.87ms
tensor(0.0010)
iter 235100: loss nan, time 121.25ms
iter 235110: loss nan, time 120.70ms
iter 235120: loss nan, time 119.60ms
iter 235130: loss nan, time 121.06ms
iter 235140: loss nan, time 121.02ms
iter 235150: loss nan, time 120.83ms
iter 235160: loss nan, time 118.99ms
iter 235170: loss nan, time 118.95ms
iter 235180: loss nan, time 118.70ms
iter 235190: loss nan, time 119.00ms
tensor(0.0039)
iter 235200: loss nan, time 118.97ms
iter 235210: loss nan, time 118.68ms
iter 235220: loss nan, time 119.58ms
iter 235230: loss nan, time 119.51ms
iter 235240: loss nan, time 118.81ms
step 235250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 235250: loss nan, time 2904.24ms
iter 235260: loss nan, time 119.19ms
iter 235270: loss nan, time 118.71ms
iter 235280: loss nan, time 119.44ms
iter 235290: loss nan, time 119.27ms
tensor(0.0089)
iter 235300: loss nan, time 119.88ms
iter 235310: loss nan, time 119.77ms
iter 235320: loss nan, time 119.84ms
iter 235330: loss nan, time 120.17ms
iter 235340: loss nan, time 119.48ms
iter 235350: loss nan, time 119.55ms
iter 235360: loss nan, time 119.89ms
iter 235370: loss nan, time 120.33ms
iter 235380: loss nan, time 120.76ms
iter 235390: loss nan, time 119.99ms
tensor(0.0157)
iter 235400: loss nan, time 120.18ms
iter 235410: loss nan, time 120.02ms
iter 235420: loss nan, time 120.76ms
iter 235430: loss nan, time 120.89ms
iter 235440: loss nan, time 120.60ms
iter 235450: loss nan, time 120.07ms
iter 235460: loss nan, time 119.78ms
iter 235470: loss nan, time 120.03ms
iter 235480: loss nan, time 120.95ms
iter 235490: loss nan, time 120.95ms
tensor(0.0245)
step 235500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 235500: loss nan, time 2909.13ms
iter 235510: loss nan, time 119.74ms
iter 235520: loss nan, time 119.88ms
iter 235530: loss nan, time 119.96ms
iter 235540: loss nan, time 120.82ms
iter 235550: loss nan, time 120.87ms
iter 235560: loss nan, time 119.29ms
iter 235570: loss nan, time 120.58ms
iter 235580: loss nan, time 120.88ms
iter 235590: loss nan, time 121.47ms
tensor(0.0351)
iter 235600: loss nan, time 122.84ms
iter 235610: loss nan, time 120.84ms
iter 235620: loss nan, time 122.09ms
iter 235630: loss nan, time 121.95ms
iter 235640: loss nan, time 121.05ms
iter 235650: loss nan, time 121.14ms
iter 235660: loss nan, time 118.75ms
iter 235670: loss nan, time 121.08ms
iter 235680: loss nan, time 120.91ms
iter 235690: loss nan, time 121.38ms
tensor(0.0476)
iter 235700: loss nan, time 121.15ms
iter 235710: loss nan, time 117.87ms
iter 235720: loss nan, time 121.06ms
iter 235730: loss nan, time 118.97ms
iter 235740: loss nan, time 119.08ms
step 235750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 235750: loss nan, time 2902.66ms
iter 235760: loss nan, time 121.04ms
iter 235770: loss nan, time 117.96ms
iter 235780: loss nan, time 118.74ms
iter 235790: loss nan, time 119.59ms
tensor(0.0618)
iter 235800: loss nan, time 121.22ms
iter 235810: loss nan, time 120.68ms
iter 235820: loss nan, time 119.85ms
iter 235830: loss nan, time 119.06ms
iter 235840: loss nan, time 119.88ms
iter 235850: loss nan, time 120.21ms
iter 235860: loss nan, time 120.09ms
iter 235870: loss nan, time 120.98ms
iter 235880: loss nan, time 119.96ms
iter 235890: loss nan, time 121.38ms
tensor(0.0778)
iter 235900: loss nan, time 122.37ms
iter 235910: loss nan, time 122.21ms
iter 235920: loss nan, time 122.38ms
iter 235930: loss nan, time 120.85ms
iter 235940: loss nan, time 122.56ms
iter 235950: loss nan, time 121.01ms
iter 235960: loss nan, time 121.10ms
iter 235970: loss nan, time 120.83ms
iter 235980: loss nan, time 120.67ms
iter 235990: loss nan, time 120.69ms
tensor(0.0955)
step 236000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 236000: loss nan, time 2922.08ms
iter 236010: loss nan, time 119.73ms
iter 236020: loss nan, time 120.80ms
iter 236030: loss nan, time 120.67ms
iter 236040: loss nan, time 121.04ms
iter 236050: loss nan, time 122.10ms
iter 236060: loss nan, time 122.14ms
iter 236070: loss nan, time 119.87ms
iter 236080: loss nan, time 120.39ms
iter 236090: loss nan, time 121.49ms
tensor(0.1147)
iter 236100: loss nan, time 119.42ms
iter 236110: loss nan, time 118.98ms
iter 236120: loss nan, time 120.42ms
iter 236130: loss nan, time 119.39ms
iter 236140: loss nan, time 120.72ms
iter 236150: loss nan, time 121.20ms
iter 236160: loss nan, time 122.68ms
iter 236170: loss nan, time 122.65ms
iter 236180: loss nan, time 121.62ms
iter 236190: loss nan, time 121.61ms
tensor(0.1355)
iter 236200: loss nan, time 121.91ms
iter 236210: loss nan, time 121.41ms
iter 236220: loss nan, time 118.64ms
iter 236230: loss nan, time 120.06ms
iter 236240: loss nan, time 120.40ms
step 236250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 236250: loss nan, time 2904.45ms
iter 236260: loss nan, time 120.34ms
iter 236270: loss nan, time 119.57ms
iter 236280: loss nan, time 120.55ms
iter 236290: loss nan, time 120.49ms
tensor(0.1577)
iter 236300: loss nan, time 119.53ms
iter 236310: loss nan, time 120.80ms
iter 236320: loss nan, time 121.83ms
iter 236330: loss nan, time 122.91ms
iter 236340: loss nan, time 121.13ms
iter 236350: loss nan, time 119.14ms
iter 236360: loss nan, time 121.38ms
iter 236370: loss nan, time 119.18ms
iter 236380: loss nan, time 121.45ms
iter 236390: loss nan, time 119.68ms
tensor(0.1813)
iter 236400: loss nan, time 120.14ms
iter 236410: loss nan, time 119.45ms
iter 236420: loss nan, time 120.42ms
iter 236430: loss nan, time 117.84ms
iter 236440: loss nan, time 120.11ms
iter 236450: loss nan, time 120.35ms
iter 236460: loss nan, time 120.61ms
iter 236470: loss nan, time 122.39ms
iter 236480: loss nan, time 121.10ms
iter 236490: loss nan, time 121.22ms
tensor(0.2061)
step 236500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 236500: loss nan, time 2900.11ms
iter 236510: loss nan, time 121.90ms
iter 236520: loss nan, time 121.57ms
iter 236530: loss nan, time 121.41ms
iter 236540: loss nan, time 121.58ms
iter 236550: loss nan, time 119.52ms
iter 236560: loss nan, time 119.21ms
iter 236570: loss nan, time 120.63ms
iter 236580: loss nan, time 120.87ms
iter 236590: loss nan, time 120.67ms
tensor(0.2321)
iter 236600: loss nan, time 122.31ms
iter 236610: loss nan, time 122.73ms
iter 236620: loss nan, time 120.08ms
iter 236630: loss nan, time 122.06ms
iter 236640: loss nan, time 122.31ms
iter 236650: loss nan, time 122.61ms
iter 236660: loss nan, time 121.60ms
iter 236670: loss nan, time 119.40ms
iter 236680: loss nan, time 121.68ms
iter 236690: loss nan, time 122.44ms
tensor(0.2591)
iter 236700: loss nan, time 119.97ms
iter 236710: loss nan, time 120.66ms
iter 236720: loss nan, time 119.01ms
iter 236730: loss nan, time 119.78ms
iter 236740: loss nan, time 121.00ms
step 236750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 236750: loss nan, time 2873.65ms
iter 236760: loss nan, time 119.82ms
iter 236770: loss nan, time 120.66ms
iter 236780: loss nan, time 118.94ms
iter 236790: loss nan, time 119.68ms
tensor(0.2871)
iter 236800: loss nan, time 122.23ms
iter 236810: loss nan, time 122.03ms
iter 236820: loss nan, time 122.94ms
iter 236830: loss nan, time 121.71ms
iter 236840: loss nan, time 121.58ms
iter 236850: loss nan, time 121.86ms
iter 236860: loss nan, time 119.46ms
iter 236870: loss nan, time 119.98ms
iter 236880: loss nan, time 120.20ms
iter 236890: loss nan, time 120.98ms
tensor(0.3159)
iter 236900: loss nan, time 120.77ms
iter 236910: loss nan, time 123.45ms
iter 236920: loss nan, time 121.28ms
iter 236930: loss nan, time 121.97ms
iter 236940: loss nan, time 122.12ms
iter 236950: loss nan, time 119.84ms
iter 236960: loss nan, time 120.46ms
iter 236970: loss nan, time 121.00ms
iter 236980: loss nan, time 119.75ms
iter 236990: loss nan, time 121.57ms
tensor(0.3455)
step 237000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 237000: loss nan, time 2907.97ms
iter 237010: loss nan, time 120.54ms
iter 237020: loss nan, time 123.31ms
iter 237030: loss nan, time 122.00ms
iter 237040: loss nan, time 122.23ms
iter 237050: loss nan, time 119.14ms
iter 237060: loss nan, time 119.58ms
iter 237070: loss nan, time 121.32ms
iter 237080: loss nan, time 120.69ms
iter 237090: loss nan, time 121.38ms
tensor(0.3757)
iter 237100: loss nan, time 123.65ms
iter 237110: loss nan, time 123.88ms
iter 237120: loss nan, time 119.29ms
iter 237130: loss nan, time 121.34ms
iter 237140: loss nan, time 119.60ms
iter 237150: loss nan, time 119.24ms
iter 237160: loss nan, time 119.44ms
iter 237170: loss nan, time 118.72ms
iter 237180: loss nan, time 119.08ms
iter 237190: loss nan, time 120.30ms
tensor(0.4063)
iter 237200: loss nan, time 120.40ms
iter 237210: loss nan, time 120.08ms
iter 237220: loss nan, time 120.53ms
iter 237230: loss nan, time 120.44ms
iter 237240: loss nan, time 122.31ms
step 237250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 237250: loss nan, time 2888.59ms
iter 237260: loss nan, time 120.42ms
iter 237270: loss nan, time 121.08ms
iter 237280: loss nan, time 121.41ms
iter 237290: loss nan, time 121.35ms
tensor(0.4373)
iter 237300: loss nan, time 122.99ms
iter 237310: loss nan, time 121.35ms
iter 237320: loss nan, time 121.16ms
iter 237330: loss nan, time 121.19ms
iter 237340: loss nan, time 121.22ms
iter 237350: loss nan, time 121.10ms
iter 237360: loss nan, time 121.21ms
iter 237370: loss nan, time 118.92ms
iter 237380: loss nan, time 119.26ms
iter 237390: loss nan, time 119.77ms
tensor(0.4686)
iter 237400: loss nan, time 120.51ms
iter 237410: loss nan, time 120.36ms
iter 237420: loss nan, time 120.05ms
iter 237430: loss nan, time 120.30ms
iter 237440: loss nan, time 120.29ms
iter 237450: loss nan, time 120.82ms
iter 237460: loss nan, time 122.21ms
iter 237470: loss nan, time 120.46ms
iter 237480: loss nan, time 122.24ms
iter 237490: loss nan, time 121.23ms
tensor(0.5000)
step 237500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 237500: loss nan, time 2892.93ms
iter 237510: loss nan, time 122.80ms
iter 237520: loss nan, time 121.22ms
iter 237530: loss nan, time 118.89ms
iter 237540: loss nan, time 121.26ms
iter 237550: loss nan, time 121.17ms
iter 237560: loss nan, time 121.26ms
iter 237570: loss nan, time 118.91ms
iter 237580: loss nan, time 119.43ms
iter 237590: loss nan, time 118.95ms
tensor(0.5314)
iter 237600: loss nan, time 120.46ms
iter 237610: loss nan, time 120.25ms
iter 237620: loss nan, time 120.36ms
iter 237630: loss nan, time 120.27ms
iter 237640: loss nan, time 118.98ms
iter 237650: loss nan, time 120.59ms
iter 237660: loss nan, time 122.31ms
iter 237670: loss nan, time 122.63ms
iter 237680: loss nan, time 122.45ms
iter 237690: loss nan, time 121.47ms
tensor(0.5627)
iter 237700: loss nan, time 121.68ms
iter 237710: loss nan, time 121.53ms
iter 237720: loss nan, time 121.19ms
iter 237730: loss nan, time 121.11ms
iter 237740: loss nan, time 119.41ms
step 237750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 237750: loss nan, time 2879.31ms
iter 237760: loss nan, time 121.65ms
iter 237770: loss nan, time 121.29ms
iter 237780: loss nan, time 121.12ms
iter 237790: loss nan, time 121.05ms
tensor(0.5937)
iter 237800: loss nan, time 121.59ms
iter 237810: loss nan, time 121.47ms
iter 237820: loss nan, time 119.15ms
iter 237830: loss nan, time 119.04ms
iter 237840: loss nan, time 119.12ms
iter 237850: loss nan, time 120.45ms
iter 237860: loss nan, time 119.68ms
iter 237870: loss nan, time 119.55ms
iter 237880: loss nan, time 120.43ms
iter 237890: loss nan, time 120.32ms
tensor(0.6243)
iter 237900: loss nan, time 120.58ms
iter 237910: loss nan, time 120.79ms
iter 237920: loss nan, time 121.75ms
iter 237930: loss nan, time 120.44ms
iter 237940: loss nan, time 122.30ms
iter 237950: loss nan, time 121.17ms
iter 237960: loss nan, time 121.35ms
iter 237970: loss nan, time 121.07ms
iter 237980: loss nan, time 119.25ms
iter 237990: loss nan, time 119.38ms
tensor(0.6545)
step 238000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 238000: loss nan, time 2896.67ms
iter 238010: loss nan, time 121.34ms
iter 238020: loss nan, time 119.06ms
iter 238030: loss nan, time 119.60ms
iter 238040: loss nan, time 120.37ms
iter 238050: loss nan, time 119.19ms
iter 238060: loss nan, time 120.22ms
iter 238070: loss nan, time 120.26ms
iter 238080: loss nan, time 120.13ms
iter 238090: loss nan, time 121.19ms
tensor(0.6841)
iter 238100: loss nan, time 121.30ms
iter 238110: loss nan, time 122.38ms
iter 238120: loss nan, time 122.35ms
iter 238130: loss nan, time 121.14ms
iter 238140: loss nan, time 121.15ms
iter 238150: loss nan, time 121.12ms
iter 238160: loss nan, time 121.11ms
iter 238170: loss nan, time 121.20ms
iter 238180: loss nan, time 119.10ms
iter 238190: loss nan, time 119.50ms
tensor(0.7129)
iter 238200: loss nan, time 120.80ms
iter 238210: loss nan, time 120.11ms
iter 238220: loss nan, time 120.23ms
iter 238230: loss nan, time 119.51ms
iter 238240: loss nan, time 120.64ms
step 238250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 238250: loss nan, time 2923.68ms
iter 238260: loss nan, time 121.94ms
iter 238270: loss nan, time 122.61ms
iter 238280: loss nan, time 121.26ms
iter 238290: loss nan, time 119.08ms
tensor(0.7409)
iter 238300: loss nan, time 121.35ms
iter 238310: loss nan, time 121.17ms
iter 238320: loss nan, time 119.26ms
iter 238330: loss nan, time 119.25ms
iter 238340: loss nan, time 119.96ms
iter 238350: loss nan, time 119.17ms
iter 238360: loss nan, time 120.60ms
iter 238370: loss nan, time 120.99ms
iter 238380: loss nan, time 121.69ms
iter 238390: loss nan, time 122.43ms
tensor(0.7679)
iter 238400: loss nan, time 121.53ms
iter 238410: loss nan, time 122.42ms
iter 238420: loss nan, time 122.24ms
iter 238430: loss nan, time 121.26ms
iter 238440: loss nan, time 121.13ms
iter 238450: loss nan, time 119.00ms
iter 238460: loss nan, time 119.17ms
iter 238470: loss nan, time 118.77ms
iter 238480: loss nan, time 119.10ms
iter 238490: loss nan, time 120.34ms
tensor(0.7939)
step 238500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 238500: loss nan, time 2881.08ms
iter 238510: loss nan, time 118.29ms
iter 238520: loss nan, time 118.95ms
iter 238530: loss nan, time 118.89ms
iter 238540: loss nan, time 119.04ms
iter 238550: loss nan, time 119.07ms
iter 238560: loss nan, time 119.05ms
iter 238570: loss nan, time 119.71ms
iter 238580: loss nan, time 120.39ms
iter 238590: loss nan, time 120.21ms
tensor(0.8187)
iter 238600: loss nan, time 121.19ms
iter 238610: loss nan, time 121.31ms
iter 238620: loss nan, time 121.74ms
iter 238630: loss nan, time 122.42ms
iter 238640: loss nan, time 119.76ms
iter 238650: loss nan, time 122.34ms
iter 238660: loss nan, time 122.42ms
iter 238670: loss nan, time 122.09ms
iter 238680: loss nan, time 120.98ms
iter 238690: loss nan, time 117.98ms
tensor(0.8423)
iter 238700: loss nan, time 120.34ms
iter 238710: loss nan, time 121.07ms
iter 238720: loss nan, time 119.68ms
iter 238730: loss nan, time 119.29ms
iter 238740: loss nan, time 118.18ms
step 238750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 238750: loss nan, time 2899.98ms
iter 238760: loss nan, time 119.03ms
iter 238770: loss nan, time 119.19ms
iter 238780: loss nan, time 118.93ms
iter 238790: loss nan, time 118.96ms
tensor(0.8645)
iter 238800: loss nan, time 119.27ms
iter 238810: loss nan, time 118.55ms
iter 238820: loss nan, time 120.02ms
iter 238830: loss nan, time 120.23ms
iter 238840: loss nan, time 120.58ms
iter 238850: loss nan, time 120.94ms
iter 238860: loss nan, time 120.24ms
iter 238870: loss nan, time 122.46ms
iter 238880: loss nan, time 122.43ms
iter 238890: loss nan, time 122.53ms
tensor(0.8853)
iter 238900: loss nan, time 122.93ms
iter 238910: loss nan, time 121.09ms
iter 238920: loss nan, time 120.72ms
iter 238930: loss nan, time 121.21ms
iter 238940: loss nan, time 121.03ms
iter 238950: loss nan, time 121.27ms
iter 238960: loss nan, time 119.21ms
iter 238970: loss nan, time 119.21ms
iter 238980: loss nan, time 119.01ms
iter 238990: loss nan, time 119.03ms
tensor(0.9045)
step 239000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 239000: loss nan, time 2878.96ms
iter 239010: loss nan, time 121.23ms
iter 239020: loss nan, time 121.19ms
iter 239030: loss nan, time 120.22ms
iter 239040: loss nan, time 119.03ms
iter 239050: loss nan, time 118.90ms
iter 239060: loss nan, time 119.05ms
iter 239070: loss nan, time 119.03ms
iter 239080: loss nan, time 119.99ms
iter 239090: loss nan, time 120.32ms
tensor(0.9222)
iter 239100: loss nan, time 120.48ms
iter 239110: loss nan, time 120.76ms
iter 239120: loss nan, time 121.41ms
iter 239130: loss nan, time 122.21ms
iter 239140: loss nan, time 122.41ms
iter 239150: loss nan, time 120.13ms
iter 239160: loss nan, time 122.44ms
iter 239170: loss nan, time 122.31ms
iter 239180: loss nan, time 121.28ms
iter 239190: loss nan, time 121.13ms
tensor(0.9382)
iter 239200: loss nan, time 125.01ms
iter 239210: loss nan, time 119.00ms
iter 239220: loss nan, time 119.31ms
iter 239230: loss nan, time 118.99ms
iter 239240: loss nan, time 119.17ms
step 239250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 239250: loss nan, time 2882.33ms
iter 239260: loss nan, time 119.09ms
iter 239270: loss nan, time 119.01ms
iter 239280: loss nan, time 119.30ms
iter 239290: loss nan, time 119.64ms
tensor(0.9524)
iter 239300: loss nan, time 119.41ms
iter 239310: loss nan, time 119.87ms
iter 239320: loss nan, time 118.76ms
iter 239330: loss nan, time 120.79ms
iter 239340: loss nan, time 120.45ms
iter 239350: loss nan, time 120.59ms
iter 239360: loss nan, time 121.25ms
iter 239370: loss nan, time 119.96ms
iter 239380: loss nan, time 122.66ms
iter 239390: loss nan, time 122.38ms
tensor(0.9649)
iter 239400: loss nan, time 122.31ms
iter 239410: loss nan, time 121.89ms
iter 239420: loss nan, time 120.74ms
iter 239430: loss nan, time 122.06ms
iter 239440: loss nan, time 121.93ms
iter 239450: loss nan, time 122.34ms
iter 239460: loss nan, time 121.87ms
iter 239470: loss nan, time 120.81ms
iter 239480: loss nan, time 121.99ms
iter 239490: loss nan, time 122.43ms
tensor(0.9755)
step 239500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 239500: loss nan, time 2915.59ms
iter 239510: loss nan, time 121.91ms
iter 239520: loss nan, time 121.78ms
iter 239530: loss nan, time 120.76ms
iter 239540: loss nan, time 121.99ms
iter 239550: loss nan, time 121.87ms
iter 239560: loss nan, time 121.90ms
iter 239570: loss nan, time 122.13ms
iter 239580: loss nan, time 120.71ms
iter 239590: loss nan, time 121.98ms
tensor(0.9843)
iter 239600: loss nan, time 122.16ms
iter 239610: loss nan, time 121.93ms
iter 239620: loss nan, time 121.94ms
iter 239630: loss nan, time 120.82ms
iter 239640: loss nan, time 121.81ms
iter 239650: loss nan, time 121.89ms
iter 239660: loss nan, time 121.98ms
iter 239670: loss nan, time 121.79ms
iter 239680: loss nan, time 120.60ms
iter 239690: loss nan, time 121.81ms
tensor(0.9911)
iter 239700: loss nan, time 122.27ms
iter 239710: loss nan, time 121.76ms
iter 239720: loss nan, time 121.85ms
iter 239730: loss nan, time 120.68ms
iter 239740: loss nan, time 122.42ms
step 239750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 239750: loss nan, time 2890.10ms
iter 239760: loss nan, time 119.73ms
iter 239770: loss nan, time 119.89ms
iter 239780: loss nan, time 120.36ms
iter 239790: loss nan, time 119.43ms
tensor(0.9961)
iter 239800: loss nan, time 120.54ms
iter 239810: loss nan, time 120.31ms
iter 239820: loss nan, time 120.81ms
iter 239830: loss nan, time 120.00ms
iter 239840: loss nan, time 119.18ms
iter 239850: loss nan, time 120.83ms
iter 239860: loss nan, time 120.53ms
iter 239870: loss nan, time 120.68ms
iter 239880: loss nan, time 120.70ms
iter 239890: loss nan, time 119.50ms
tensor(0.9990)
iter 239900: loss nan, time 121.20ms
iter 239910: loss nan, time 121.61ms
iter 239920: loss nan, time 121.86ms
iter 239930: loss nan, time 121.92ms
iter 239940: loss nan, time 120.83ms
iter 239950: loss nan, time 121.93ms
iter 239960: loss nan, time 122.67ms
iter 239970: loss nan, time 122.09ms
iter 239980: loss nan, time 122.34ms
iter 239990: loss nan, time 120.87ms
tensor(1.)
step 240000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 240000: loss nan, time 2910.81ms
iter 240010: loss nan, time 122.02ms
iter 240020: loss nan, time 122.20ms
iter 240030: loss nan, time 122.00ms
iter 240040: loss nan, time 121.84ms
iter 240050: loss nan, time 120.75ms
iter 240060: loss nan, time 121.63ms
iter 240070: loss nan, time 122.26ms
iter 240080: loss nan, time 122.07ms
iter 240090: loss nan, time 121.96ms
tensor(0.9990)
iter 240100: loss nan, time 121.05ms
iter 240110: loss nan, time 121.91ms
iter 240120: loss nan, time 122.44ms
iter 240130: loss nan, time 121.96ms
iter 240140: loss nan, time 122.01ms
iter 240150: loss nan, time 120.86ms
iter 240160: loss nan, time 122.13ms
iter 240170: loss nan, time 122.22ms
iter 240180: loss nan, time 121.98ms
iter 240190: loss nan, time 121.74ms
tensor(0.9961)
iter 240200: loss nan, time 121.04ms
iter 240210: loss nan, time 122.26ms
iter 240220: loss nan, time 121.88ms
iter 240230: loss nan, time 121.98ms
iter 240240: loss nan, time 121.45ms
step 240250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 240250: loss nan, time 2919.86ms
iter 240260: loss nan, time 120.75ms
iter 240270: loss nan, time 122.39ms
iter 240280: loss nan, time 122.31ms
iter 240290: loss nan, time 122.32ms
tensor(0.9911)
iter 240300: loss nan, time 122.05ms
iter 240310: loss nan, time 120.34ms
iter 240320: loss nan, time 122.43ms
iter 240330: loss nan, time 120.72ms
iter 240340: loss nan, time 121.11ms
iter 240350: loss nan, time 120.83ms
iter 240360: loss nan, time 120.73ms
iter 240370: loss nan, time 121.06ms
iter 240380: loss nan, time 120.83ms
iter 240390: loss nan, time 120.61ms
tensor(0.9843)
iter 240400: loss nan, time 118.91ms
iter 240410: loss nan, time 118.51ms
iter 240420: loss nan, time 118.86ms
iter 240430: loss nan, time 118.57ms
iter 240440: loss nan, time 118.82ms
iter 240450: loss nan, time 118.99ms
iter 240460: loss nan, time 119.35ms
iter 240470: loss nan, time 118.33ms
iter 240480: loss nan, time 119.03ms
iter 240490: loss nan, time 119.50ms
tensor(0.9755)
step 240500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 240500: loss nan, time 2911.08ms
iter 240510: loss nan, time 121.10ms
iter 240520: loss nan, time 118.07ms
iter 240530: loss nan, time 119.34ms
iter 240540: loss nan, time 119.13ms
iter 240550: loss nan, time 119.05ms
iter 240560: loss nan, time 118.54ms
iter 240570: loss nan, time 119.05ms
iter 240580: loss nan, time 119.79ms
iter 240590: loss nan, time 119.48ms
tensor(0.9649)
iter 240600: loss nan, time 120.39ms
iter 240610: loss nan, time 119.80ms
iter 240620: loss nan, time 119.27ms
iter 240630: loss nan, time 120.89ms
iter 240640: loss nan, time 120.94ms
iter 240650: loss nan, time 119.92ms
iter 240660: loss nan, time 121.00ms
iter 240670: loss nan, time 121.84ms
iter 240680: loss nan, time 122.07ms
iter 240690: loss nan, time 122.03ms
tensor(0.9524)
iter 240700: loss nan, time 120.21ms
iter 240710: loss nan, time 121.19ms
iter 240720: loss nan, time 122.11ms
iter 240730: loss nan, time 122.22ms
iter 240740: loss nan, time 122.16ms
step 240750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 240750: loss nan, time 2880.38ms
iter 240760: loss nan, time 118.50ms
iter 240770: loss nan, time 119.57ms
iter 240780: loss nan, time 119.84ms
iter 240790: loss nan, time 120.28ms
tensor(0.9382)
iter 240800: loss nan, time 120.85ms
iter 240810: loss nan, time 119.96ms
iter 240820: loss nan, time 120.08ms
iter 240830: loss nan, time 120.11ms
iter 240840: loss nan, time 119.94ms
iter 240850: loss nan, time 119.87ms
iter 240860: loss nan, time 120.16ms
iter 240870: loss nan, time 120.47ms
iter 240880: loss nan, time 120.35ms
iter 240890: loss nan, time 120.57ms
tensor(0.9222)
iter 240900: loss nan, time 120.99ms
iter 240910: loss nan, time 119.90ms
iter 240920: loss nan, time 119.64ms
iter 240930: loss nan, time 119.90ms
iter 240940: loss nan, time 120.15ms
iter 240950: loss nan, time 120.54ms
iter 240960: loss nan, time 120.46ms
iter 240970: loss nan, time 120.13ms
iter 240980: loss nan, time 119.97ms
iter 240990: loss nan, time 119.68ms
tensor(0.9045)
step 241000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 241000: loss nan, time 2880.40ms
iter 241010: loss nan, time 119.07ms
iter 241020: loss nan, time 119.01ms
iter 241030: loss nan, time 120.77ms
iter 241040: loss nan, time 119.05ms
iter 241050: loss nan, time 119.13ms
iter 241060: loss nan, time 119.60ms
iter 241070: loss nan, time 118.93ms
iter 241080: loss nan, time 119.29ms
iter 241090: loss nan, time 119.06ms
tensor(0.8853)
iter 241100: loss nan, time 119.20ms
iter 241110: loss nan, time 118.12ms
iter 241120: loss nan, time 118.95ms
iter 241130: loss nan, time 118.93ms
iter 241140: loss nan, time 118.92ms
iter 241150: loss nan, time 118.91ms
iter 241160: loss nan, time 118.79ms
iter 241170: loss nan, time 119.08ms
iter 241180: loss nan, time 120.91ms
iter 241190: loss nan, time 118.99ms
tensor(0.8645)
iter 241200: loss nan, time 119.39ms
iter 241210: loss nan, time 119.50ms
iter 241220: loss nan, time 118.93ms
iter 241230: loss nan, time 118.85ms
iter 241240: loss nan, time 119.06ms
step 241250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 241250: loss nan, time 2898.41ms
iter 241260: loss nan, time 121.07ms
iter 241270: loss nan, time 121.08ms
iter 241280: loss nan, time 121.01ms
iter 241290: loss nan, time 121.02ms
tensor(0.8423)
iter 241300: loss nan, time 121.28ms
iter 241310: loss nan, time 119.03ms
iter 241320: loss nan, time 119.14ms
iter 241330: loss nan, time 118.92ms
iter 241340: loss nan, time 118.94ms
iter 241350: loss nan, time 118.87ms
iter 241360: loss nan, time 118.81ms
iter 241370: loss nan, time 119.01ms
iter 241380: loss nan, time 118.95ms
iter 241390: loss nan, time 119.37ms
tensor(0.8187)
iter 241400: loss nan, time 120.21ms
iter 241410: loss nan, time 120.85ms
iter 241420: loss nan, time 120.77ms
iter 241430: loss nan, time 120.02ms
iter 241440: loss nan, time 120.25ms
iter 241450: loss nan, time 122.17ms
iter 241460: loss nan, time 122.01ms
iter 241470: loss nan, time 122.20ms
iter 241480: loss nan, time 120.87ms
iter 241490: loss nan, time 121.02ms
tensor(0.7939)
step 241500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 241500: loss nan, time 2884.87ms
iter 241510: loss nan, time 119.90ms
iter 241520: loss nan, time 119.93ms
iter 241530: loss nan, time 120.33ms
iter 241540: loss nan, time 119.89ms
iter 241550: loss nan, time 119.68ms
iter 241560: loss nan, time 121.51ms
iter 241570: loss nan, time 121.76ms
iter 241580: loss nan, time 122.18ms
iter 241590: loss nan, time 119.98ms
tensor(0.7679)
iter 241600: loss nan, time 122.60ms
iter 241610: loss nan, time 121.12ms
iter 241620: loss nan, time 120.87ms
iter 241630: loss nan, time 120.85ms
iter 241640: loss nan, time 119.14ms
iter 241650: loss nan, time 120.31ms
iter 241660: loss nan, time 121.16ms
iter 241670: loss nan, time 121.74ms
iter 241680: loss nan, time 121.11ms
iter 241690: loss nan, time 118.89ms
tensor(0.7409)
iter 241700: loss nan, time 119.12ms
iter 241710: loss nan, time 119.56ms
iter 241720: loss nan, time 120.05ms
iter 241730: loss nan, time 120.56ms
iter 241740: loss nan, time 120.02ms
step 241750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 241750: loss nan, time 2869.23ms
iter 241760: loss nan, time 119.74ms
iter 241770: loss nan, time 120.81ms
iter 241780: loss nan, time 120.95ms
iter 241790: loss nan, time 120.91ms
tensor(0.7129)
iter 241800: loss nan, time 119.00ms
iter 241810: loss nan, time 120.93ms
iter 241820: loss nan, time 121.30ms
iter 241830: loss nan, time 119.78ms
iter 241840: loss nan, time 120.64ms
iter 241850: loss nan, time 118.49ms
iter 241860: loss nan, time 119.56ms
iter 241870: loss nan, time 120.84ms
iter 241880: loss nan, time 120.76ms
iter 241890: loss nan, time 120.66ms
tensor(0.6841)
iter 241900: loss nan, time 119.99ms
iter 241910: loss nan, time 118.51ms
iter 241920: loss nan, time 118.65ms
iter 241930: loss nan, time 118.84ms
iter 241940: loss nan, time 118.26ms
iter 241950: loss nan, time 118.14ms
iter 241960: loss nan, time 118.00ms
iter 241970: loss nan, time 118.31ms
iter 241980: loss nan, time 118.42ms
iter 241990: loss nan, time 118.26ms
tensor(0.6545)
step 242000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 242000: loss nan, time 2876.56ms
iter 242010: loss nan, time 119.83ms
iter 242020: loss nan, time 121.41ms
iter 242030: loss nan, time 121.20ms
iter 242040: loss nan, time 121.13ms
iter 242050: loss nan, time 122.14ms
iter 242060: loss nan, time 119.93ms
iter 242070: loss nan, time 121.14ms
iter 242080: loss nan, time 120.85ms
iter 242090: loss nan, time 121.22ms
tensor(0.6243)
iter 242100: loss nan, time 121.46ms
iter 242110: loss nan, time 118.68ms
iter 242120: loss nan, time 121.33ms
iter 242130: loss nan, time 121.92ms
iter 242140: loss nan, time 121.74ms
iter 242150: loss nan, time 121.32ms
iter 242160: loss nan, time 118.66ms
iter 242170: loss nan, time 120.82ms
iter 242180: loss nan, time 120.94ms
iter 242190: loss nan, time 120.94ms
tensor(0.5937)
iter 242200: loss nan, time 120.07ms
iter 242210: loss nan, time 118.39ms
iter 242220: loss nan, time 120.11ms
iter 242230: loss nan, time 120.96ms
iter 242240: loss nan, time 121.00ms
step 242250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 242250: loss nan, time 2874.24ms
iter 242260: loss nan, time 119.92ms
iter 242270: loss nan, time 120.12ms
iter 242280: loss nan, time 120.46ms
iter 242290: loss nan, time 120.45ms
tensor(0.5627)
iter 242300: loss nan, time 120.26ms
iter 242310: loss nan, time 120.04ms
iter 242320: loss nan, time 120.11ms
iter 242330: loss nan, time 120.06ms
iter 242340: loss nan, time 120.75ms
iter 242350: loss nan, time 122.23ms
iter 242360: loss nan, time 121.70ms
iter 242370: loss nan, time 119.44ms
iter 242380: loss nan, time 121.19ms
iter 242390: loss nan, time 120.51ms
tensor(0.5314)
iter 242400: loss nan, time 120.93ms
iter 242410: loss nan, time 120.57ms
iter 242420: loss nan, time 119.01ms
iter 242430: loss nan, time 118.27ms
iter 242440: loss nan, time 120.36ms
iter 242450: loss nan, time 120.05ms
iter 242460: loss nan, time 120.48ms
iter 242470: loss nan, time 121.01ms
iter 242480: loss nan, time 120.90ms
iter 242490: loss nan, time 122.83ms
tensor(0.5000)
step 242500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 242500: loss nan, time 2901.13ms
iter 242510: loss nan, time 121.16ms
iter 242520: loss nan, time 122.10ms
iter 242530: loss nan, time 122.57ms
iter 242540: loss nan, time 121.14ms
iter 242550: loss nan, time 121.79ms
iter 242560: loss nan, time 121.73ms
iter 242570: loss nan, time 121.56ms
iter 242580: loss nan, time 121.58ms
iter 242590: loss nan, time 119.39ms
tensor(0.4686)
iter 242600: loss nan, time 120.43ms
iter 242610: loss nan, time 120.67ms
iter 242620: loss nan, time 120.61ms
iter 242630: loss nan, time 120.51ms
iter 242640: loss nan, time 120.76ms
iter 242650: loss nan, time 121.88ms
iter 242660: loss nan, time 122.54ms
iter 242670: loss nan, time 120.64ms
iter 242680: loss nan, time 121.24ms
iter 242690: loss nan, time 120.43ms
tensor(0.4373)
iter 242700: loss nan, time 119.88ms
iter 242710: loss nan, time 121.42ms
iter 242720: loss nan, time 118.88ms
iter 242730: loss nan, time 119.42ms
iter 242740: loss nan, time 117.84ms
step 242750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 242750: loss nan, time 2921.32ms
iter 242760: loss nan, time 119.80ms
iter 242770: loss nan, time 120.08ms
iter 242780: loss nan, time 120.52ms
iter 242790: loss nan, time 119.26ms
tensor(0.4063)
iter 242800: loss nan, time 120.37ms
iter 242810: loss nan, time 122.03ms
iter 242820: loss nan, time 122.62ms
iter 242830: loss nan, time 122.75ms
iter 242840: loss nan, time 121.25ms
iter 242850: loss nan, time 121.39ms
iter 242860: loss nan, time 121.43ms
iter 242870: loss nan, time 119.39ms
iter 242880: loss nan, time 119.67ms
iter 242890: loss nan, time 120.50ms
tensor(0.3757)
iter 242900: loss nan, time 121.00ms
iter 242910: loss nan, time 119.97ms
iter 242920: loss nan, time 120.53ms
iter 242930: loss nan, time 122.40ms
iter 242940: loss nan, time 122.63ms
iter 242950: loss nan, time 121.47ms
iter 242960: loss nan, time 120.61ms
iter 242970: loss nan, time 119.54ms
iter 242980: loss nan, time 118.90ms
iter 242990: loss nan, time 118.52ms
tensor(0.3455)
step 243000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 243000: loss nan, time 2830.59ms
iter 243010: loss nan, time 120.88ms
iter 243020: loss nan, time 121.60ms
iter 243030: loss nan, time 120.55ms
iter 243040: loss nan, time 122.84ms
iter 243050: loss nan, time 121.56ms
iter 243060: loss nan, time 121.83ms
iter 243070: loss nan, time 119.79ms
iter 243080: loss nan, time 120.35ms
iter 243090: loss nan, time 119.51ms
tensor(0.3159)
iter 243100: loss nan, time 120.89ms
iter 243110: loss nan, time 120.82ms
iter 243120: loss nan, time 122.12ms
iter 243130: loss nan, time 122.74ms
iter 243140: loss nan, time 121.40ms
iter 243150: loss nan, time 121.57ms
iter 243160: loss nan, time 121.71ms
iter 243170: loss nan, time 119.61ms
iter 243180: loss nan, time 120.37ms
iter 243190: loss nan, time 120.67ms
tensor(0.2871)
iter 243200: loss nan, time 121.13ms
iter 243210: loss nan, time 122.56ms
iter 243220: loss nan, time 120.20ms
iter 243230: loss nan, time 122.41ms
iter 243240: loss nan, time 121.26ms
step 243250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 243250: loss nan, time 2879.77ms
iter 243260: loss nan, time 120.12ms
iter 243270: loss nan, time 120.34ms
iter 243280: loss nan, time 120.16ms
iter 243290: loss nan, time 120.76ms
tensor(0.2591)
iter 243300: loss nan, time 121.11ms
iter 243310: loss nan, time 121.47ms
iter 243320: loss nan, time 120.95ms
iter 243330: loss nan, time 119.85ms
iter 243340: loss nan, time 121.51ms
iter 243350: loss nan, time 122.09ms
iter 243360: loss nan, time 121.95ms
iter 243370: loss nan, time 121.99ms
iter 243380: loss nan, time 120.06ms
iter 243390: loss nan, time 121.24ms
tensor(0.2321)
iter 243400: loss nan, time 121.20ms
iter 243410: loss nan, time 121.39ms
iter 243420: loss nan, time 120.84ms
iter 243430: loss nan, time 118.84ms
iter 243440: loss nan, time 120.71ms
iter 243450: loss nan, time 119.99ms
iter 243460: loss nan, time 120.72ms
iter 243470: loss nan, time 120.79ms
iter 243480: loss nan, time 118.19ms
iter 243490: loss nan, time 121.04ms
tensor(0.2061)
step 243500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 243500: loss nan, time 2879.07ms
iter 243510: loss nan, time 122.28ms
iter 243520: loss nan, time 122.09ms
iter 243530: loss nan, time 122.11ms
iter 243540: loss nan, time 118.85ms
iter 243550: loss nan, time 121.27ms
iter 243560: loss nan, time 120.85ms
iter 243570: loss nan, time 120.84ms
iter 243580: loss nan, time 119.55ms
iter 243590: loss nan, time 118.64ms
tensor(0.1813)
iter 243600: loss nan, time 121.04ms
iter 243610: loss nan, time 121.48ms
iter 243620: loss nan, time 120.87ms
iter 243630: loss nan, time 120.82ms
iter 243640: loss nan, time 118.88ms
iter 243650: loss nan, time 120.86ms
iter 243660: loss nan, time 119.42ms
iter 243670: loss nan, time 118.83ms
iter 243680: loss nan, time 118.78ms
iter 243690: loss nan, time 119.64ms
tensor(0.1577)
iter 243700: loss nan, time 119.11ms
iter 243710: loss nan, time 120.17ms
iter 243720: loss nan, time 120.65ms
iter 243730: loss nan, time 119.86ms
iter 243740: loss nan, time 119.37ms
step 243750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 243750: loss nan, time 2914.78ms
iter 243760: loss nan, time 118.67ms
iter 243770: loss nan, time 119.70ms
iter 243780: loss nan, time 119.89ms
iter 243790: loss nan, time 120.02ms
tensor(0.1355)
iter 243800: loss nan, time 120.53ms
iter 243810: loss nan, time 119.31ms
iter 243820: loss nan, time 119.16ms
iter 243830: loss nan, time 120.42ms
iter 243840: loss nan, time 120.57ms
iter 243850: loss nan, time 121.03ms
iter 243860: loss nan, time 119.88ms
iter 243870: loss nan, time 121.00ms
iter 243880: loss nan, time 121.54ms
iter 243890: loss nan, time 122.15ms
tensor(0.1147)
iter 243900: loss nan, time 122.41ms
iter 243910: loss nan, time 120.94ms
iter 243920: loss nan, time 122.89ms
iter 243930: loss nan, time 120.68ms
iter 243940: loss nan, time 120.75ms
iter 243950: loss nan, time 120.87ms
iter 243960: loss nan, time 120.70ms
iter 243970: loss nan, time 120.91ms
iter 243980: loss nan, time 120.88ms
iter 243990: loss nan, time 120.88ms
tensor(0.0955)
step 244000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 244000: loss nan, time 2919.54ms
iter 244010: loss nan, time 119.84ms
iter 244020: loss nan, time 118.69ms
iter 244030: loss nan, time 118.65ms
iter 244040: loss nan, time 119.09ms
iter 244050: loss nan, time 118.95ms
iter 244060: loss nan, time 119.22ms
iter 244070: loss nan, time 119.33ms
iter 244080: loss nan, time 120.28ms
iter 244090: loss nan, time 120.14ms
tensor(0.0778)
iter 244100: loss nan, time 120.03ms
iter 244110: loss nan, time 120.21ms
iter 244120: loss nan, time 120.40ms
iter 244130: loss nan, time 119.65ms
iter 244140: loss nan, time 120.11ms
iter 244150: loss nan, time 119.84ms
iter 244160: loss nan, time 120.03ms
iter 244170: loss nan, time 119.79ms
iter 244180: loss nan, time 120.11ms
iter 244190: loss nan, time 120.21ms
tensor(0.0618)
iter 244200: loss nan, time 119.96ms
iter 244210: loss nan, time 120.68ms
iter 244220: loss nan, time 121.12ms
iter 244230: loss nan, time 120.95ms
iter 244240: loss nan, time 121.80ms
step 244250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 244250: loss nan, time 2911.15ms
iter 244260: loss nan, time 119.93ms
iter 244270: loss nan, time 120.58ms
iter 244280: loss nan, time 120.64ms
iter 244290: loss nan, time 120.98ms
tensor(0.0476)
iter 244300: loss nan, time 121.32ms
iter 244310: loss nan, time 120.00ms
iter 244320: loss nan, time 121.44ms
iter 244330: loss nan, time 122.16ms
iter 244340: loss nan, time 122.44ms
iter 244350: loss nan, time 122.04ms
iter 244360: loss nan, time 120.22ms
iter 244370: loss nan, time 121.33ms
iter 244380: loss nan, time 120.83ms
iter 244390: loss nan, time 122.32ms
tensor(0.0351)
iter 244400: loss nan, time 121.14ms
iter 244410: loss nan, time 118.74ms
iter 244420: loss nan, time 120.82ms
iter 244430: loss nan, time 120.52ms
iter 244440: loss nan, time 121.33ms
iter 244450: loss nan, time 120.76ms
iter 244460: loss nan, time 118.85ms
iter 244470: loss nan, time 121.28ms
iter 244480: loss nan, time 118.64ms
iter 244490: loss nan, time 119.23ms
tensor(0.0245)
step 244500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 244500: loss nan, time 2908.11ms
iter 244510: loss nan, time 118.99ms
iter 244520: loss nan, time 119.25ms
iter 244530: loss nan, time 118.65ms
iter 244540: loss nan, time 120.06ms
iter 244550: loss nan, time 120.02ms
iter 244560: loss nan, time 119.91ms
iter 244570: loss nan, time 120.60ms
iter 244580: loss nan, time 118.81ms
iter 244590: loss nan, time 119.91ms
tensor(0.0157)
iter 244600: loss nan, time 120.20ms
iter 244610: loss nan, time 120.03ms
iter 244620: loss nan, time 121.31ms
iter 244630: loss nan, time 118.78ms
iter 244640: loss nan, time 120.27ms
iter 244650: loss nan, time 120.31ms
iter 244660: loss nan, time 120.32ms
iter 244670: loss nan, time 121.10ms
iter 244680: loss nan, time 120.38ms
iter 244690: loss nan, time 121.88ms
tensor(0.0089)
iter 244700: loss nan, time 119.46ms
iter 244710: loss nan, time 120.55ms
iter 244720: loss nan, time 117.60ms
iter 244730: loss nan, time 116.43ms
iter 244740: loss nan, time 118.63ms
step 244750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 244750: loss nan, time 2889.96ms
iter 244760: loss nan, time 119.10ms
iter 244770: loss nan, time 118.35ms
iter 244780: loss nan, time 118.86ms
iter 244790: loss nan, time 117.64ms
tensor(0.0039)
iter 244800: loss nan, time 117.13ms
iter 244810: loss nan, time 119.10ms
iter 244820: loss nan, time 120.36ms
iter 244830: loss nan, time 120.07ms
iter 244840: loss nan, time 119.42ms
iter 244850: loss nan, time 116.73ms
iter 244860: loss nan, time 114.45ms
iter 244870: loss nan, time 116.77ms
iter 244880: loss nan, time 118.75ms
iter 244890: loss nan, time 117.16ms
tensor(0.0010)
iter 244900: loss nan, time 120.23ms
iter 244910: loss nan, time 120.02ms
iter 244920: loss nan, time 118.27ms
iter 244930: loss nan, time 116.66ms
iter 244940: loss nan, time 116.60ms
iter 244950: loss nan, time 117.95ms
iter 244960: loss nan, time 117.73ms
iter 244970: loss nan, time 119.57ms
iter 244980: loss nan, time 119.32ms
iter 244990: loss nan, time 119.59ms
tensor(0.0010)
step 245000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 245000: loss nan, time 2887.71ms
iter 245010: loss nan, time 117.19ms
iter 245020: loss nan, time 120.59ms
iter 245030: loss nan, time 118.32ms
iter 245040: loss nan, time 118.25ms
iter 245050: loss nan, time 116.63ms
iter 245060: loss nan, time 119.73ms
iter 245070: loss nan, time 119.17ms
iter 245080: loss nan, time 119.14ms
iter 245090: loss nan, time 119.64ms
tensor(0.0010)
iter 245100: loss nan, time 119.13ms
iter 245110: loss nan, time 117.57ms
iter 245120: loss nan, time 116.50ms
iter 245130: loss nan, time 118.32ms
iter 245140: loss nan, time 119.54ms
iter 245150: loss nan, time 120.90ms
iter 245160: loss nan, time 119.59ms
iter 245170: loss nan, time 117.63ms
iter 245180: loss nan, time 119.46ms
iter 245190: loss nan, time 117.79ms
tensor(0.0039)
iter 245200: loss nan, time 117.77ms
iter 245210: loss nan, time 118.36ms
iter 245220: loss nan, time 119.61ms
iter 245230: loss nan, time 119.69ms
iter 245240: loss nan, time 118.95ms
step 245250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 245250: loss nan, time 2922.63ms
iter 245260: loss nan, time 119.96ms
iter 245270: loss nan, time 120.44ms
iter 245280: loss nan, time 118.49ms
iter 245290: loss nan, time 118.54ms
tensor(0.0089)
iter 245300: loss nan, time 118.56ms
iter 245310: loss nan, time 118.15ms
iter 245320: loss nan, time 118.72ms
iter 245330: loss nan, time 118.37ms
iter 245340: loss nan, time 116.31ms
iter 245350: loss nan, time 119.83ms
iter 245360: loss nan, time 119.92ms
iter 245370: loss nan, time 120.92ms
iter 245380: loss nan, time 120.36ms
iter 245390: loss nan, time 117.86ms
tensor(0.0157)
iter 245400: loss nan, time 120.34ms
iter 245410: loss nan, time 118.40ms
iter 245420: loss nan, time 118.36ms
iter 245430: loss nan, time 118.89ms
iter 245440: loss nan, time 116.84ms
iter 245450: loss nan, time 119.88ms
iter 245460: loss nan, time 118.93ms
iter 245470: loss nan, time 119.99ms
iter 245480: loss nan, time 120.53ms
iter 245490: loss nan, time 119.94ms
tensor(0.0245)
step 245500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 245500: loss nan, time 2888.91ms
iter 245510: loss nan, time 118.00ms
iter 245520: loss nan, time 117.08ms
iter 245530: loss nan, time 119.48ms
iter 245540: loss nan, time 119.70ms
iter 245550: loss nan, time 120.69ms
iter 245560: loss nan, time 118.51ms
iter 245570: loss nan, time 119.79ms
iter 245580: loss nan, time 120.36ms
iter 245590: loss nan, time 118.40ms
tensor(0.0351)
iter 245600: loss nan, time 119.87ms
iter 245610: loss nan, time 118.01ms
iter 245620: loss nan, time 117.86ms
iter 245630: loss nan, time 116.98ms
iter 245640: loss nan, time 119.26ms
iter 245650: loss nan, time 119.15ms
iter 245660: loss nan, time 120.09ms
iter 245670: loss nan, time 120.25ms
iter 245680: loss nan, time 118.70ms
iter 245690: loss nan, time 119.35ms
tensor(0.0476)
iter 245700: loss nan, time 119.13ms
iter 245710: loss nan, time 118.75ms
iter 245720: loss nan, time 117.81ms
iter 245730: loss nan, time 117.48ms
iter 245740: loss nan, time 118.51ms
step 245750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 245750: loss nan, time 2896.95ms
iter 245760: loss nan, time 120.31ms
iter 245770: loss nan, time 120.36ms
iter 245780: loss nan, time 118.36ms
iter 245790: loss nan, time 119.58ms
tensor(0.0618)
iter 245800: loss nan, time 117.82ms
iter 245810: loss nan, time 117.76ms
iter 245820: loss nan, time 117.24ms
iter 245830: loss nan, time 119.74ms
iter 245840: loss nan, time 119.68ms
iter 245850: loss nan, time 118.95ms
iter 245860: loss nan, time 119.85ms
iter 245870: loss nan, time 120.10ms
iter 245880: loss nan, time 118.94ms
iter 245890: loss nan, time 118.90ms
tensor(0.0778)
iter 245900: loss nan, time 117.69ms
iter 245910: loss nan, time 117.34ms
iter 245920: loss nan, time 118.74ms
iter 245930: loss nan, time 119.14ms
iter 245940: loss nan, time 117.46ms
iter 245950: loss nan, time 121.00ms
iter 245960: loss nan, time 119.82ms
iter 245970: loss nan, time 118.84ms
iter 245980: loss nan, time 119.40ms
iter 245990: loss nan, time 117.92ms
tensor(0.0955)
step 246000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 246000: loss nan, time 2900.09ms
iter 246010: loss nan, time 120.33ms
iter 246020: loss nan, time 118.64ms
iter 246030: loss nan, time 120.16ms
iter 246040: loss nan, time 120.28ms
iter 246050: loss nan, time 118.88ms
iter 246060: loss nan, time 118.60ms
iter 246070: loss nan, time 117.99ms
iter 246080: loss nan, time 118.77ms
iter 246090: loss nan, time 117.96ms
tensor(0.1147)
iter 246100: loss nan, time 120.05ms
iter 246110: loss nan, time 117.11ms
iter 246120: loss nan, time 119.61ms
iter 246130: loss nan, time 120.64ms
iter 246140: loss nan, time 120.83ms
iter 246150: loss nan, time 119.94ms
iter 246160: loss nan, time 118.29ms
iter 246170: loss nan, time 119.82ms
iter 246180: loss nan, time 118.33ms
iter 246190: loss nan, time 118.63ms
tensor(0.1355)
iter 246200: loss nan, time 119.59ms
iter 246210: loss nan, time 118.62ms
iter 246220: loss nan, time 118.24ms
iter 246230: loss nan, time 117.77ms
iter 246240: loss nan, time 118.63ms
step 246250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 246250: loss nan, time 2887.61ms
iter 246260: loss nan, time 120.36ms
iter 246270: loss nan, time 119.89ms
iter 246280: loss nan, time 121.59ms
iter 246290: loss nan, time 120.52ms
tensor(0.1577)
iter 246300: loss nan, time 122.48ms
iter 246310: loss nan, time 122.01ms
iter 246320: loss nan, time 122.25ms
iter 246330: loss nan, time 122.08ms
iter 246340: loss nan, time 121.04ms
iter 246350: loss nan, time 121.96ms
iter 246360: loss nan, time 122.04ms
iter 246370: loss nan, time 121.71ms
iter 246380: loss nan, time 122.03ms
iter 246390: loss nan, time 120.91ms
tensor(0.1813)
iter 246400: loss nan, time 122.39ms
iter 246410: loss nan, time 122.02ms
iter 246420: loss nan, time 122.26ms
iter 246430: loss nan, time 122.03ms
iter 246440: loss nan, time 119.98ms
iter 246450: loss nan, time 122.03ms
iter 246460: loss nan, time 121.98ms
iter 246470: loss nan, time 121.49ms
iter 246480: loss nan, time 120.74ms
iter 246490: loss nan, time 120.92ms
tensor(0.2061)
step 246500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 246500: loss nan, time 2892.21ms
iter 246510: loss nan, time 116.53ms
iter 246520: loss nan, time 112.92ms
iter 246530: loss nan, time 115.61ms
iter 246540: loss nan, time 114.60ms
iter 246550: loss nan, time 115.70ms
iter 246560: loss nan, time 115.29ms
iter 246570: loss nan, time 116.63ms
iter 246580: loss nan, time 117.57ms
iter 246590: loss nan, time 114.51ms
tensor(0.2321)
iter 246600: loss nan, time 118.05ms
iter 246610: loss nan, time 115.72ms
iter 246620: loss nan, time 115.42ms
iter 246630: loss nan, time 117.22ms
iter 246640: loss nan, time 115.47ms
iter 246650: loss nan, time 116.62ms
iter 246660: loss nan, time 116.99ms
iter 246670: loss nan, time 114.59ms
iter 246680: loss nan, time 117.86ms
iter 246690: loss nan, time 114.66ms
tensor(0.2591)
iter 246700: loss nan, time 116.64ms
iter 246710: loss nan, time 118.76ms
iter 246720: loss nan, time 113.50ms
iter 246730: loss nan, time 117.48ms
iter 246740: loss nan, time 116.45ms
step 246750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 246750: loss nan, time 2910.87ms
iter 246760: loss nan, time 118.49ms
iter 246770: loss nan, time 115.69ms
iter 246780: loss nan, time 117.03ms
iter 246790: loss nan, time 118.20ms
tensor(0.2871)
iter 246800: loss nan, time 114.60ms
iter 246810: loss nan, time 114.91ms
iter 246820: loss nan, time 118.22ms
iter 246830: loss nan, time 114.78ms
iter 246840: loss nan, time 117.47ms
iter 246850: loss nan, time 116.56ms
iter 246860: loss nan, time 114.85ms
iter 246870: loss nan, time 116.36ms
iter 246880: loss nan, time 115.39ms
iter 246890: loss nan, time 116.91ms
tensor(0.3159)
iter 246900: loss nan, time 117.67ms
iter 246910: loss nan, time 114.96ms
iter 246920: loss nan, time 116.89ms
iter 246930: loss nan, time 116.09ms
iter 246940: loss nan, time 114.63ms
iter 246950: loss nan, time 117.15ms
iter 246960: loss nan, time 116.23ms
iter 246970: loss nan, time 115.09ms
iter 246980: loss nan, time 118.16ms
iter 246990: loss nan, time 115.41ms
tensor(0.3455)
step 247000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 247000: loss nan, time 2905.55ms
iter 247010: loss nan, time 115.21ms
iter 247020: loss nan, time 114.85ms
iter 247030: loss nan, time 117.96ms
iter 247040: loss nan, time 115.67ms
iter 247050: loss nan, time 115.51ms
iter 247060: loss nan, time 117.88ms
iter 247070: loss nan, time 114.81ms
iter 247080: loss nan, time 118.33ms
iter 247090: loss nan, time 116.05ms
tensor(0.3757)
iter 247100: loss nan, time 115.31ms
iter 247110: loss nan, time 117.96ms
iter 247120: loss nan, time 115.58ms
iter 247130: loss nan, time 114.77ms
iter 247140: loss nan, time 117.42ms
iter 247150: loss nan, time 114.80ms
iter 247160: loss nan, time 116.60ms
iter 247170: loss nan, time 116.79ms
iter 247180: loss nan, time 115.14ms
iter 247190: loss nan, time 115.82ms
tensor(0.4063)
iter 247200: loss nan, time 116.31ms
iter 247210: loss nan, time 115.65ms
iter 247220: loss nan, time 118.14ms
iter 247230: loss nan, time 115.16ms
iter 247240: loss nan, time 116.92ms
step 247250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 247250: loss nan, time 2912.72ms
iter 247260: loss nan, time 114.57ms
iter 247270: loss nan, time 117.89ms
iter 247280: loss nan, time 114.96ms
iter 247290: loss nan, time 116.78ms
tensor(0.4373)
iter 247300: loss nan, time 118.31ms
iter 247310: loss nan, time 113.95ms
iter 247320: loss nan, time 117.93ms
iter 247330: loss nan, time 116.23ms
iter 247340: loss nan, time 115.13ms
iter 247350: loss nan, time 118.46ms
iter 247360: loss nan, time 115.79ms
iter 247370: loss nan, time 115.57ms
iter 247380: loss nan, time 118.07ms
iter 247390: loss nan, time 114.87ms
tensor(0.4686)
iter 247400: loss nan, time 117.51ms
iter 247410: loss nan, time 117.15ms
iter 247420: loss nan, time 114.60ms
iter 247430: loss nan, time 117.90ms
iter 247440: loss nan, time 115.74ms
iter 247450: loss nan, time 114.79ms
iter 247460: loss nan, time 117.94ms
iter 247470: loss nan, time 115.13ms
iter 247480: loss nan, time 116.66ms
iter 247490: loss nan, time 117.02ms
tensor(0.5000)
step 247500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 247500: loss nan, time 2905.00ms
iter 247510: loss nan, time 118.01ms
iter 247520: loss nan, time 114.85ms
iter 247530: loss nan, time 116.68ms
iter 247540: loss nan, time 115.50ms
iter 247550: loss nan, time 114.72ms
iter 247560: loss nan, time 118.57ms
iter 247570: loss nan, time 115.85ms
iter 247580: loss nan, time 114.73ms
iter 247590: loss nan, time 117.94ms
tensor(0.5314)
iter 247600: loss nan, time 115.43ms
iter 247610: loss nan, time 117.13ms
iter 247620: loss nan, time 117.81ms
iter 247630: loss nan, time 114.04ms
iter 247640: loss nan, time 116.93ms
iter 247650: loss nan, time 116.90ms
iter 247660: loss nan, time 115.08ms
iter 247670: loss nan, time 118.09ms
iter 247680: loss nan, time 115.59ms
iter 247690: loss nan, time 116.64ms
tensor(0.5627)
iter 247700: loss nan, time 118.38ms
iter 247710: loss nan, time 114.90ms
iter 247720: loss nan, time 118.32ms
iter 247730: loss nan, time 116.19ms
iter 247740: loss nan, time 115.03ms
step 247750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 247750: loss nan, time 2908.90ms
iter 247760: loss nan, time 115.12ms
iter 247770: loss nan, time 116.87ms
iter 247780: loss nan, time 115.74ms
iter 247790: loss nan, time 115.15ms
tensor(0.5937)
iter 247800: loss nan, time 118.20ms
iter 247810: loss nan, time 115.63ms
iter 247820: loss nan, time 114.60ms
iter 247830: loss nan, time 118.12ms
iter 247840: loss nan, time 115.58ms
iter 247850: loss nan, time 116.65ms
iter 247860: loss nan, time 115.96ms
iter 247870: loss nan, time 115.97ms
iter 247880: loss nan, time 116.75ms
iter 247890: loss nan, time 116.55ms
tensor(0.6243)
iter 247900: loss nan, time 115.68ms
iter 247910: loss nan, time 116.70ms
iter 247920: loss nan, time 115.00ms
iter 247930: loss nan, time 116.68ms
iter 247940: loss nan, time 117.99ms
iter 247950: loss nan, time 115.53ms
iter 247960: loss nan, time 117.23ms
iter 247970: loss nan, time 116.73ms
iter 247980: loss nan, time 114.63ms
iter 247990: loss nan, time 117.19ms
tensor(0.6545)
step 248000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 248000: loss nan, time 2902.31ms
iter 248010: loss nan, time 116.70ms
iter 248020: loss nan, time 116.92ms
iter 248030: loss nan, time 115.37ms
iter 248040: loss nan, time 114.63ms
iter 248050: loss nan, time 115.75ms
iter 248060: loss nan, time 115.05ms
iter 248070: loss nan, time 118.02ms
iter 248080: loss nan, time 115.98ms
iter 248090: loss nan, time 117.18ms
tensor(0.6841)
iter 248100: loss nan, time 116.29ms
iter 248110: loss nan, time 115.78ms
iter 248120: loss nan, time 116.99ms
iter 248130: loss nan, time 116.12ms
iter 248140: loss nan, time 115.92ms
iter 248150: loss nan, time 117.06ms
iter 248160: loss nan, time 115.60ms
iter 248170: loss nan, time 115.05ms
iter 248180: loss nan, time 116.81ms
iter 248190: loss nan, time 116.05ms
tensor(0.7129)
iter 248200: loss nan, time 117.10ms
iter 248210: loss nan, time 116.74ms
iter 248220: loss nan, time 115.61ms
iter 248230: loss nan, time 116.69ms
iter 248240: loss nan, time 115.20ms
step 248250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 248250: loss nan, time 2906.48ms
iter 248260: loss nan, time 117.01ms
iter 248270: loss nan, time 115.56ms
iter 248280: loss nan, time 116.79ms
iter 248290: loss nan, time 116.29ms
tensor(0.7409)
iter 248300: loss nan, time 115.28ms
iter 248310: loss nan, time 116.63ms
iter 248320: loss nan, time 115.75ms
iter 248330: loss nan, time 116.78ms
iter 248340: loss nan, time 116.92ms
iter 248350: loss nan, time 116.38ms
iter 248360: loss nan, time 114.69ms
iter 248370: loss nan, time 116.11ms
iter 248380: loss nan, time 114.73ms
iter 248390: loss nan, time 117.99ms
tensor(0.7679)
iter 248400: loss nan, time 115.66ms
iter 248410: loss nan, time 116.78ms
iter 248420: loss nan, time 116.06ms
iter 248430: loss nan, time 116.05ms
iter 248440: loss nan, time 117.11ms
iter 248450: loss nan, time 116.22ms
iter 248460: loss nan, time 115.38ms
iter 248470: loss nan, time 116.56ms
iter 248480: loss nan, time 115.73ms
iter 248490: loss nan, time 116.83ms
tensor(0.7939)
step 248500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 248500: loss nan, time 2915.81ms
iter 248510: loss nan, time 114.69ms
iter 248520: loss nan, time 116.72ms
iter 248530: loss nan, time 116.83ms
iter 248540: loss nan, time 115.45ms
iter 248550: loss nan, time 116.64ms
iter 248560: loss nan, time 117.45ms
iter 248570: loss nan, time 114.14ms
iter 248580: loss nan, time 116.99ms
iter 248590: loss nan, time 116.51ms
tensor(0.8187)
iter 248600: loss nan, time 114.62ms
iter 248610: loss nan, time 117.15ms
iter 248620: loss nan, time 115.86ms
iter 248630: loss nan, time 116.66ms
iter 248640: loss nan, time 118.42ms
iter 248650: loss nan, time 115.97ms
iter 248660: loss nan, time 117.23ms
iter 248670: loss nan, time 118.19ms
iter 248680: loss nan, time 115.78ms
iter 248690: loss nan, time 116.70ms
tensor(0.8423)
iter 248700: loss nan, time 116.60ms
iter 248710: loss nan, time 114.92ms
iter 248720: loss nan, time 117.84ms
iter 248730: loss nan, time 115.89ms
iter 248740: loss nan, time 114.63ms
step 248750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 248750: loss nan, time 2913.26ms
iter 248760: loss nan, time 115.62ms
iter 248770: loss nan, time 117.09ms
iter 248780: loss nan, time 117.48ms
iter 248790: loss nan, time 115.60ms
tensor(0.8645)
iter 248800: loss nan, time 117.06ms
iter 248810: loss nan, time 115.86ms
iter 248820: loss nan, time 114.44ms
iter 248830: loss nan, time 116.77ms
iter 248840: loss nan, time 116.04ms
iter 248850: loss nan, time 116.76ms
iter 248860: loss nan, time 116.87ms
iter 248870: loss nan, time 115.30ms
iter 248880: loss nan, time 116.79ms
iter 248890: loss nan, time 115.28ms
tensor(0.8853)
iter 248900: loss nan, time 116.41ms
iter 248910: loss nan, time 118.11ms
iter 248920: loss nan, time 115.95ms
iter 248930: loss nan, time 116.94ms
iter 248940: loss nan, time 115.77ms
iter 248950: loss nan, time 115.10ms
iter 248960: loss nan, time 117.30ms
iter 248970: loss nan, time 114.83ms
iter 248980: loss nan, time 116.77ms
iter 248990: loss nan, time 118.18ms
tensor(0.9045)
step 249000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 249000: loss nan, time 2917.25ms
iter 249010: loss nan, time 114.85ms
iter 249020: loss nan, time 117.82ms
iter 249030: loss nan, time 115.55ms
iter 249040: loss nan, time 115.97ms
iter 249050: loss nan, time 116.34ms
iter 249060: loss nan, time 114.62ms
iter 249070: loss nan, time 114.72ms
iter 249080: loss nan, time 115.78ms
iter 249090: loss nan, time 116.64ms
tensor(0.9222)
iter 249100: loss nan, time 118.73ms
iter 249110: loss nan, time 115.60ms
iter 249120: loss nan, time 117.17ms
iter 249130: loss nan, time 116.24ms
iter 249140: loss nan, time 116.19ms
iter 249150: loss nan, time 116.88ms
iter 249160: loss nan, time 118.05ms
iter 249170: loss nan, time 115.52ms
iter 249180: loss nan, time 116.53ms
iter 249190: loss nan, time 115.87ms
tensor(0.9382)
iter 249200: loss nan, time 116.21ms
iter 249210: loss nan, time 116.84ms
iter 249220: loss nan, time 116.64ms
iter 249230: loss nan, time 115.77ms
iter 249240: loss nan, time 116.73ms
step 249250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 249250: loss nan, time 2901.29ms
iter 249260: loss nan, time 117.04ms
iter 249270: loss nan, time 116.10ms
iter 249280: loss nan, time 117.09ms
iter 249290: loss nan, time 118.48ms
tensor(0.9524)
iter 249300: loss nan, time 116.22ms
iter 249310: loss nan, time 116.72ms
iter 249320: loss nan, time 118.37ms
iter 249330: loss nan, time 116.37ms
iter 249340: loss nan, time 117.00ms
iter 249350: loss nan, time 116.22ms
iter 249360: loss nan, time 115.75ms
iter 249370: loss nan, time 116.62ms
iter 249380: loss nan, time 115.85ms
iter 249390: loss nan, time 114.56ms
tensor(0.9649)
iter 249400: loss nan, time 118.02ms
iter 249410: loss nan, time 115.09ms
iter 249420: loss nan, time 116.02ms
iter 249430: loss nan, time 115.96ms
iter 249440: loss nan, time 114.71ms
iter 249450: loss nan, time 115.98ms
iter 249460: loss nan, time 116.16ms
iter 249470: loss nan, time 116.77ms
iter 249480: loss nan, time 117.59ms
iter 249490: loss nan, time 116.09ms
tensor(0.9755)
step 249500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 249500: loss nan, time 2911.53ms
iter 249510: loss nan, time 116.91ms
iter 249520: loss nan, time 115.39ms
iter 249530: loss nan, time 116.75ms
iter 249540: loss nan, time 114.56ms
iter 249550: loss nan, time 117.17ms
iter 249560: loss nan, time 118.24ms
iter 249570: loss nan, time 115.90ms
iter 249580: loss nan, time 116.99ms
iter 249590: loss nan, time 116.49ms
tensor(0.9843)
iter 249600: loss nan, time 115.36ms
iter 249610: loss nan, time 116.96ms
iter 249620: loss nan, time 115.90ms
iter 249630: loss nan, time 114.70ms
iter 249640: loss nan, time 118.17ms
iter 249650: loss nan, time 115.83ms
iter 249660: loss nan, time 116.74ms
iter 249670: loss nan, time 117.42ms
iter 249680: loss nan, time 115.92ms
iter 249690: loss nan, time 116.77ms
tensor(0.9911)
iter 249700: loss nan, time 116.62ms
iter 249710: loss nan, time 115.63ms
iter 249720: loss nan, time 116.85ms
iter 249730: loss nan, time 116.04ms
iter 249740: loss nan, time 115.43ms
step 249750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 249750: loss nan, time 2923.56ms
iter 249760: loss nan, time 116.52ms
iter 249770: loss nan, time 115.84ms
iter 249780: loss nan, time 114.60ms
iter 249790: loss nan, time 117.09ms
tensor(0.9961)
iter 249800: loss nan, time 115.35ms
iter 249810: loss nan, time 116.76ms
iter 249820: loss nan, time 115.04ms
iter 249830: loss nan, time 116.71ms
iter 249840: loss nan, time 116.66ms
iter 249850: loss nan, time 115.70ms
iter 249860: loss nan, time 116.51ms
iter 249870: loss nan, time 116.44ms
iter 249880: loss nan, time 115.33ms
iter 249890: loss nan, time 116.83ms
tensor(0.9990)
iter 249900: loss nan, time 116.52ms
iter 249910: loss nan, time 115.59ms
iter 249920: loss nan, time 116.89ms
iter 249930: loss nan, time 115.86ms
iter 249940: loss nan, time 116.81ms
iter 249950: loss nan, time 118.11ms
iter 249960: loss nan, time 115.89ms
iter 249970: loss nan, time 117.15ms
iter 249980: loss nan, time 115.57ms
iter 249990: loss nan, time 115.59ms
tensor(1.)
step 250000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 250000: loss nan, time 2912.04ms
iter 250010: loss nan, time 115.86ms
iter 250020: loss nan, time 115.58ms
iter 250030: loss nan, time 117.93ms
iter 250040: loss nan, time 115.75ms
iter 250050: loss nan, time 116.86ms
iter 250060: loss nan, time 118.21ms
iter 250070: loss nan, time 116.11ms
iter 250080: loss nan, time 116.73ms
iter 250090: loss nan, time 116.62ms
tensor(0.9990)
iter 250100: loss nan, time 116.20ms
iter 250110: loss nan, time 116.01ms
iter 250120: loss nan, time 115.73ms
iter 250130: loss nan, time 116.90ms
iter 250140: loss nan, time 118.54ms
iter 250150: loss nan, time 115.90ms
iter 250160: loss nan, time 114.57ms
iter 250170: loss nan, time 117.53ms
iter 250180: loss nan, time 115.92ms
iter 250190: loss nan, time 116.80ms
tensor(0.9961)
iter 250200: loss nan, time 116.73ms
iter 250210: loss nan, time 115.11ms
iter 250220: loss nan, time 114.54ms
iter 250230: loss nan, time 115.88ms
iter 250240: loss nan, time 116.63ms
step 250250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 250250: loss nan, time 2913.06ms
iter 250260: loss nan, time 121.86ms
iter 250270: loss nan, time 120.93ms
iter 250280: loss nan, time 122.14ms
iter 250290: loss nan, time 122.11ms
tensor(0.9911)
iter 250300: loss nan, time 121.18ms
iter 250310: loss nan, time 122.16ms
iter 250320: loss nan, time 122.17ms
iter 250330: loss nan, time 122.09ms
iter 250340: loss nan, time 122.02ms
iter 250350: loss nan, time 120.96ms
iter 250360: loss nan, time 122.51ms
iter 250370: loss nan, time 121.03ms
iter 250380: loss nan, time 120.80ms
iter 250390: loss nan, time 120.71ms
tensor(0.9843)
iter 250400: loss nan, time 121.13ms
iter 250410: loss nan, time 120.97ms
iter 250420: loss nan, time 121.15ms
iter 250430: loss nan, time 120.28ms
iter 250440: loss nan, time 120.95ms
iter 250450: loss nan, time 120.73ms
iter 250460: loss nan, time 121.00ms
iter 250470: loss nan, time 120.87ms
iter 250480: loss nan, time 117.54ms
iter 250490: loss nan, time 118.86ms
tensor(0.9755)
step 250500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 250500: loss nan, time 2902.22ms
iter 250510: loss nan, time 120.82ms
iter 250520: loss nan, time 120.83ms
iter 250530: loss nan, time 120.79ms
iter 250540: loss nan, time 118.52ms
iter 250550: loss nan, time 118.84ms
iter 250560: loss nan, time 118.46ms
iter 250570: loss nan, time 118.61ms
iter 250580: loss nan, time 119.16ms
iter 250590: loss nan, time 117.74ms
tensor(0.9649)
iter 250600: loss nan, time 118.81ms
iter 250610: loss nan, time 118.60ms
iter 250620: loss nan, time 118.99ms
iter 250630: loss nan, time 118.97ms
iter 250640: loss nan, time 118.31ms
iter 250650: loss nan, time 118.33ms
iter 250660: loss nan, time 118.86ms
iter 250670: loss nan, time 119.75ms
iter 250680: loss nan, time 119.08ms
iter 250690: loss nan, time 118.77ms
tensor(0.9524)
iter 250700: loss nan, time 120.03ms
iter 250710: loss nan, time 120.03ms
iter 250720: loss nan, time 120.15ms
iter 250730: loss nan, time 119.94ms
iter 250740: loss nan, time 118.92ms
step 250750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 250750: loss nan, time 2910.44ms
iter 250760: loss nan, time 120.04ms
iter 250770: loss nan, time 119.91ms
iter 250780: loss nan, time 120.35ms
iter 250790: loss nan, time 119.96ms
tensor(0.9382)
iter 250800: loss nan, time 120.31ms
iter 250810: loss nan, time 119.50ms
iter 250820: loss nan, time 120.14ms
iter 250830: loss nan, time 121.17ms
iter 250840: loss nan, time 121.46ms
iter 250850: loss nan, time 120.34ms
iter 250860: loss nan, time 122.47ms
iter 250870: loss nan, time 122.06ms
iter 250880: loss nan, time 122.27ms
iter 250890: loss nan, time 121.09ms
tensor(0.9222)
iter 250900: loss nan, time 119.04ms
iter 250910: loss nan, time 121.34ms
iter 250920: loss nan, time 121.15ms
iter 250930: loss nan, time 121.00ms
iter 250940: loss nan, time 120.83ms
iter 250950: loss nan, time 118.58ms
iter 250960: loss nan, time 120.96ms
iter 250970: loss nan, time 120.77ms
iter 250980: loss nan, time 119.04ms
iter 250990: loss nan, time 118.45ms
tensor(0.9045)
step 251000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 251000: loss nan, time 2896.89ms
iter 251010: loss nan, time 119.09ms
iter 251020: loss nan, time 120.85ms
iter 251030: loss nan, time 120.96ms
iter 251040: loss nan, time 120.93ms
iter 251050: loss nan, time 121.03ms
iter 251060: loss nan, time 118.05ms
iter 251070: loss nan, time 120.79ms
iter 251080: loss nan, time 118.66ms
iter 251090: loss nan, time 118.89ms
tensor(0.8853)
iter 251100: loss nan, time 120.12ms
iter 251110: loss nan, time 119.72ms
iter 251120: loss nan, time 118.60ms
iter 251130: loss nan, time 119.25ms
iter 251140: loss nan, time 119.53ms
iter 251150: loss nan, time 119.82ms
iter 251160: loss nan, time 119.90ms
iter 251170: loss nan, time 118.52ms
iter 251180: loss nan, time 119.97ms
iter 251190: loss nan, time 119.84ms
tensor(0.8645)
iter 251200: loss nan, time 120.39ms
iter 251210: loss nan, time 120.36ms
iter 251220: loss nan, time 118.56ms
iter 251230: loss nan, time 120.09ms
iter 251240: loss nan, time 121.01ms
step 251250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 251250: loss nan, time 2911.22ms
iter 251260: loss nan, time 119.82ms
iter 251270: loss nan, time 119.84ms
iter 251280: loss nan, time 118.53ms
iter 251290: loss nan, time 119.79ms
tensor(0.8423)
iter 251300: loss nan, time 120.91ms
iter 251310: loss nan, time 120.32ms
iter 251320: loss nan, time 120.59ms
iter 251330: loss nan, time 119.59ms
iter 251340: loss nan, time 121.61ms
iter 251350: loss nan, time 122.45ms
iter 251360: loss nan, time 122.03ms
iter 251370: loss nan, time 121.87ms
iter 251380: loss nan, time 121.20ms
iter 251390: loss nan, time 122.06ms
tensor(0.8187)
iter 251400: loss nan, time 121.72ms
iter 251410: loss nan, time 122.17ms
iter 251420: loss nan, time 122.43ms
iter 251430: loss nan, time 120.81ms
iter 251440: loss nan, time 121.48ms
iter 251450: loss nan, time 121.98ms
iter 251460: loss nan, time 122.20ms
iter 251470: loss nan, time 122.36ms
iter 251480: loss nan, time 120.81ms
iter 251490: loss nan, time 120.96ms
tensor(0.7939)
step 251500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 251500: loss nan, time 2916.84ms
iter 251510: loss nan, time 122.20ms
iter 251520: loss nan, time 121.22ms
iter 251530: loss nan, time 122.09ms
iter 251540: loss nan, time 121.09ms
iter 251550: loss nan, time 120.96ms
iter 251560: loss nan, time 121.36ms
iter 251570: loss nan, time 120.93ms
iter 251580: loss nan, time 120.87ms
iter 251590: loss nan, time 119.02ms
tensor(0.7679)
iter 251600: loss nan, time 119.01ms
iter 251610: loss nan, time 118.96ms
iter 251620: loss nan, time 118.84ms
iter 251630: loss nan, time 118.88ms
iter 251640: loss nan, time 118.74ms
iter 251650: loss nan, time 117.90ms
iter 251660: loss nan, time 119.81ms
iter 251670: loss nan, time 120.39ms
iter 251680: loss nan, time 120.64ms
iter 251690: loss nan, time 121.20ms
tensor(0.7409)
iter 251700: loss nan, time 122.94ms
iter 251710: loss nan, time 122.89ms
iter 251720: loss nan, time 120.64ms
iter 251730: loss nan, time 121.64ms
iter 251740: loss nan, time 121.63ms
step 251750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 251750: loss nan, time 2871.58ms
iter 251760: loss nan, time 122.94ms
iter 251770: loss nan, time 122.18ms
iter 251780: loss nan, time 119.47ms
iter 251790: loss nan, time 122.56ms
tensor(0.7129)
iter 251800: loss nan, time 121.59ms
iter 251810: loss nan, time 122.05ms
iter 251820: loss nan, time 119.25ms
iter 251830: loss nan, time 120.00ms
iter 251840: loss nan, time 119.35ms
iter 251850: loss nan, time 120.35ms
iter 251860: loss nan, time 121.27ms
iter 251870: loss nan, time 122.13ms
iter 251880: loss nan, time 123.07ms
iter 251890: loss nan, time 121.50ms
tensor(0.6841)
iter 251900: loss nan, time 121.03ms
iter 251910: loss nan, time 117.83ms
iter 251920: loss nan, time 119.05ms
iter 251930: loss nan, time 120.14ms
iter 251940: loss nan, time 119.84ms
iter 251950: loss nan, time 121.37ms
iter 251960: loss nan, time 122.34ms
iter 251970: loss nan, time 121.40ms
iter 251980: loss nan, time 122.81ms
iter 251990: loss nan, time 121.83ms
tensor(0.6545)
step 252000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 252000: loss nan, time 2903.98ms
iter 252010: loss nan, time 123.51ms
iter 252020: loss nan, time 121.27ms
iter 252030: loss nan, time 119.45ms
iter 252040: loss nan, time 121.33ms
iter 252050: loss nan, time 118.96ms
iter 252060: loss nan, time 118.90ms
iter 252070: loss nan, time 118.20ms
iter 252080: loss nan, time 118.87ms
iter 252090: loss nan, time 118.52ms
tensor(0.6243)
iter 252100: loss nan, time 118.93ms
iter 252110: loss nan, time 118.18ms
iter 252120: loss nan, time 118.75ms
iter 252130: loss nan, time 119.18ms
iter 252140: loss nan, time 118.61ms
iter 252150: loss nan, time 119.05ms
iter 252160: loss nan, time 117.77ms
iter 252170: loss nan, time 118.69ms
iter 252180: loss nan, time 118.75ms
iter 252190: loss nan, time 126.50ms
tensor(0.5937)
iter 252200: loss nan, time 122.89ms
iter 252210: loss nan, time 122.40ms
iter 252220: loss nan, time 122.23ms
iter 252230: loss nan, time 122.35ms
iter 252240: loss nan, time 121.43ms
step 252250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 252250: loss nan, time 2907.77ms
iter 252260: loss nan, time 122.11ms
iter 252270: loss nan, time 122.85ms
iter 252280: loss nan, time 120.66ms
iter 252290: loss nan, time 121.63ms
tensor(0.5627)
iter 252300: loss nan, time 122.02ms
iter 252310: loss nan, time 121.17ms
iter 252320: loss nan, time 121.12ms
iter 252330: loss nan, time 119.15ms
iter 252340: loss nan, time 118.64ms
iter 252350: loss nan, time 118.87ms
iter 252360: loss nan, time 119.42ms
iter 252370: loss nan, time 118.41ms
iter 252380: loss nan, time 119.49ms
iter 252390: loss nan, time 119.47ms
tensor(0.5314)
iter 252400: loss nan, time 119.76ms
iter 252410: loss nan, time 120.04ms
iter 252420: loss nan, time 119.64ms
iter 252430: loss nan, time 119.46ms
iter 252440: loss nan, time 120.95ms
iter 252450: loss nan, time 120.61ms
iter 252460: loss nan, time 120.54ms
iter 252470: loss nan, time 119.59ms
iter 252480: loss nan, time 120.22ms
iter 252490: loss nan, time 121.78ms
tensor(0.5000)
step 252500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 252500: loss nan, time 2910.52ms
iter 252510: loss nan, time 121.75ms
iter 252520: loss nan, time 122.33ms
iter 252530: loss nan, time 122.28ms
iter 252540: loss nan, time 120.35ms
iter 252550: loss nan, time 122.32ms
iter 252560: loss nan, time 121.24ms
iter 252570: loss nan, time 120.89ms
iter 252580: loss nan, time 121.09ms
iter 252590: loss nan, time 118.72ms
tensor(0.4686)
iter 252600: loss nan, time 120.99ms
iter 252610: loss nan, time 120.72ms
iter 252620: loss nan, time 119.06ms
iter 252630: loss nan, time 119.02ms
iter 252640: loss nan, time 118.71ms
iter 252650: loss nan, time 118.73ms
iter 252660: loss nan, time 118.66ms
iter 252670: loss nan, time 118.93ms
iter 252680: loss nan, time 118.55ms
iter 252690: loss nan, time 118.97ms
tensor(0.4373)
iter 252700: loss nan, time 119.59ms
iter 252710: loss nan, time 119.87ms
iter 252720: loss nan, time 120.20ms
iter 252730: loss nan, time 120.85ms
iter 252740: loss nan, time 120.56ms
step 252750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 252750: loss nan, time 2912.04ms
iter 252760: loss nan, time 119.89ms
iter 252770: loss nan, time 120.48ms
iter 252780: loss nan, time 121.15ms
iter 252790: loss nan, time 121.43ms
tensor(0.4063)
iter 252800: loss nan, time 121.89ms
iter 252810: loss nan, time 121.33ms
iter 252820: loss nan, time 121.95ms
iter 252830: loss nan, time 122.12ms
iter 252840: loss nan, time 122.04ms
iter 252850: loss nan, time 121.76ms
iter 252860: loss nan, time 121.10ms
iter 252870: loss nan, time 122.57ms
iter 252880: loss nan, time 122.35ms
iter 252890: loss nan, time 121.04ms
tensor(0.3757)
iter 252900: loss nan, time 122.23ms
iter 252910: loss nan, time 120.82ms
iter 252920: loss nan, time 121.14ms
iter 252930: loss nan, time 121.03ms
iter 252940: loss nan, time 120.97ms
iter 252950: loss nan, time 122.02ms
iter 252960: loss nan, time 120.65ms
iter 252970: loss nan, time 121.89ms
iter 252980: loss nan, time 120.98ms
iter 252990: loss nan, time 120.95ms
tensor(0.3455)
step 253000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 253000: loss nan, time 2874.25ms
iter 253010: loss nan, time 119.72ms
iter 253020: loss nan, time 118.74ms
iter 253030: loss nan, time 119.78ms
iter 253040: loss nan, time 119.42ms
iter 253050: loss nan, time 119.57ms
iter 253060: loss nan, time 119.74ms
iter 253070: loss nan, time 118.46ms
iter 253080: loss nan, time 119.66ms
iter 253090: loss nan, time 120.29ms
tensor(0.3159)
iter 253100: loss nan, time 120.77ms
iter 253110: loss nan, time 119.07ms
iter 253120: loss nan, time 118.58ms
iter 253130: loss nan, time 119.93ms
iter 253140: loss nan, time 119.97ms
iter 253150: loss nan, time 119.95ms
iter 253160: loss nan, time 119.98ms
iter 253170: loss nan, time 118.82ms
iter 253180: loss nan, time 120.32ms
iter 253190: loss nan, time 120.77ms
tensor(0.2871)
iter 253200: loss nan, time 122.41ms
iter 253210: loss nan, time 122.02ms
iter 253220: loss nan, time 120.95ms
iter 253230: loss nan, time 122.19ms
iter 253240: loss nan, time 121.18ms
step 253250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 253250: loss nan, time 2914.02ms
iter 253260: loss nan, time 122.15ms
iter 253270: loss nan, time 121.56ms
iter 253280: loss nan, time 120.96ms
iter 253290: loss nan, time 122.20ms
tensor(0.2591)
iter 253300: loss nan, time 121.73ms
iter 253310: loss nan, time 122.10ms
iter 253320: loss nan, time 122.21ms
iter 253330: loss nan, time 121.25ms
iter 253340: loss nan, time 122.24ms
iter 253350: loss nan, time 122.11ms
iter 253360: loss nan, time 121.67ms
iter 253370: loss nan, time 121.95ms
iter 253380: loss nan, time 120.87ms
iter 253390: loss nan, time 122.13ms
tensor(0.2321)
iter 253400: loss nan, time 121.50ms
iter 253410: loss nan, time 122.07ms
iter 253420: loss nan, time 122.04ms
iter 253430: loss nan, time 120.49ms
iter 253440: loss nan, time 121.12ms
iter 253450: loss nan, time 120.74ms
iter 253460: loss nan, time 121.05ms
iter 253470: loss nan, time 120.84ms
iter 253480: loss nan, time 120.18ms
iter 253490: loss nan, time 118.77ms
tensor(0.2061)
step 253500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 253500: loss nan, time 2907.18ms
iter 253510: loss nan, time 120.70ms
iter 253520: loss nan, time 121.01ms
iter 253530: loss nan, time 119.87ms
iter 253540: loss nan, time 120.89ms
iter 253550: loss nan, time 120.74ms
iter 253560: loss nan, time 118.92ms
iter 253570: loss nan, time 118.70ms
iter 253580: loss nan, time 118.43ms
iter 253590: loss nan, time 118.69ms
tensor(0.1813)
iter 253600: loss nan, time 119.09ms
iter 253610: loss nan, time 119.09ms
iter 253620: loss nan, time 117.96ms
iter 253630: loss nan, time 119.05ms
iter 253640: loss nan, time 118.83ms
iter 253650: loss nan, time 118.08ms
iter 253660: loss nan, time 118.62ms
iter 253670: loss nan, time 119.00ms
iter 253680: loss nan, time 118.28ms
iter 253690: loss nan, time 118.70ms
tensor(0.1577)
iter 253700: loss nan, time 119.21ms
iter 253710: loss nan, time 118.77ms
iter 253720: loss nan, time 118.78ms
iter 253730: loss nan, time 118.74ms
iter 253740: loss nan, time 118.76ms
step 253750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 253750: loss nan, time 2907.44ms
iter 253760: loss nan, time 118.79ms
iter 253770: loss nan, time 118.66ms
iter 253780: loss nan, time 118.84ms
iter 253790: loss nan, time 118.62ms
tensor(0.1355)
iter 253800: loss nan, time 118.48ms
iter 253810: loss nan, time 119.32ms
iter 253820: loss nan, time 120.42ms
iter 253830: loss nan, time 120.10ms
iter 253840: loss nan, time 120.34ms
iter 253850: loss nan, time 120.78ms
iter 253860: loss nan, time 121.26ms
iter 253870: loss nan, time 120.86ms
iter 253880: loss nan, time 120.71ms
iter 253890: loss nan, time 121.52ms
tensor(0.1147)
iter 253900: loss nan, time 122.40ms
iter 253910: loss nan, time 121.46ms
iter 253920: loss nan, time 122.24ms
iter 253930: loss nan, time 120.00ms
iter 253940: loss nan, time 121.88ms
iter 253950: loss nan, time 122.66ms
iter 253960: loss nan, time 122.09ms
iter 253970: loss nan, time 120.98ms
iter 253980: loss nan, time 119.08ms
iter 253990: loss nan, time 120.41ms
tensor(0.0955)
step 254000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 254000: loss nan, time 2912.64ms
iter 254010: loss nan, time 122.10ms
iter 254020: loss nan, time 121.08ms
iter 254030: loss nan, time 121.40ms
iter 254040: loss nan, time 119.44ms
iter 254050: loss nan, time 121.19ms
iter 254060: loss nan, time 122.11ms
iter 254070: loss nan, time 120.68ms
iter 254080: loss nan, time 121.99ms
iter 254090: loss nan, time 118.76ms
tensor(0.0778)
iter 254100: loss nan, time 121.52ms
iter 254110: loss nan, time 120.86ms
iter 254120: loss nan, time 120.70ms
iter 254130: loss nan, time 120.72ms
iter 254140: loss nan, time 118.67ms
iter 254150: loss nan, time 120.87ms
iter 254160: loss nan, time 120.96ms
iter 254170: loss nan, time 120.55ms
iter 254180: loss nan, time 120.59ms
iter 254190: loss nan, time 118.51ms
tensor(0.0618)
iter 254200: loss nan, time 121.02ms
iter 254210: loss nan, time 120.64ms
iter 254220: loss nan, time 119.63ms
iter 254230: loss nan, time 118.78ms
iter 254240: loss nan, time 118.07ms
step 254250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 254250: loss nan, time 2892.36ms
iter 254260: loss nan, time 120.47ms
iter 254270: loss nan, time 120.86ms
iter 254280: loss nan, time 121.91ms
iter 254290: loss nan, time 121.97ms
tensor(0.0476)
iter 254300: loss nan, time 119.18ms
iter 254310: loss nan, time 121.12ms
iter 254320: loss nan, time 122.30ms
iter 254330: loss nan, time 120.80ms
iter 254340: loss nan, time 122.48ms
iter 254350: loss nan, time 119.68ms
iter 254360: loss nan, time 121.06ms
iter 254370: loss nan, time 121.81ms
iter 254380: loss nan, time 121.92ms
iter 254390: loss nan, time 120.75ms
tensor(0.0351)
iter 254400: loss nan, time 118.82ms
iter 254410: loss nan, time 121.66ms
iter 254420: loss nan, time 120.63ms
iter 254430: loss nan, time 121.88ms
iter 254440: loss nan, time 120.97ms
iter 254450: loss nan, time 119.75ms
iter 254460: loss nan, time 121.67ms
iter 254470: loss nan, time 122.08ms
iter 254480: loss nan, time 121.98ms
iter 254490: loss nan, time 121.44ms
tensor(0.0245)
step 254500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 254500: loss nan, time 2886.43ms
iter 254510: loss nan, time 119.76ms
iter 254520: loss nan, time 119.88ms
iter 254530: loss nan, time 119.34ms
iter 254540: loss nan, time 120.06ms
iter 254550: loss nan, time 119.87ms
iter 254560: loss nan, time 119.76ms
iter 254570: loss nan, time 119.66ms
iter 254580: loss nan, time 120.89ms
iter 254590: loss nan, time 119.92ms
tensor(0.0157)
iter 254600: loss nan, time 120.27ms
iter 254610: loss nan, time 119.76ms
iter 254620: loss nan, time 120.05ms
iter 254630: loss nan, time 120.07ms
iter 254640: loss nan, time 120.22ms
iter 254650: loss nan, time 119.47ms
iter 254660: loss nan, time 119.77ms
iter 254670: loss nan, time 120.01ms
iter 254680: loss nan, time 119.71ms
iter 254690: loss nan, time 119.51ms
tensor(0.0089)
iter 254700: loss nan, time 119.97ms
iter 254710: loss nan, time 119.57ms
iter 254720: loss nan, time 119.69ms
iter 254730: loss nan, time 119.07ms
iter 254740: loss nan, time 119.70ms
step 254750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 254750: loss nan, time 2903.67ms
iter 254760: loss nan, time 119.87ms
iter 254770: loss nan, time 119.74ms
iter 254780: loss nan, time 118.65ms
iter 254790: loss nan, time 119.58ms
tensor(0.0039)
iter 254800: loss nan, time 120.38ms
iter 254810: loss nan, time 119.81ms
iter 254820: loss nan, time 119.62ms
iter 254830: loss nan, time 119.66ms
iter 254840: loss nan, time 119.63ms
iter 254850: loss nan, time 119.66ms
iter 254860: loss nan, time 120.19ms
iter 254870: loss nan, time 119.60ms
iter 254880: loss nan, time 119.62ms
iter 254890: loss nan, time 119.62ms
tensor(0.0010)
iter 254900: loss nan, time 120.09ms
iter 254910: loss nan, time 119.87ms
iter 254920: loss nan, time 119.59ms
iter 254930: loss nan, time 119.70ms
iter 254940: loss nan, time 119.71ms
iter 254950: loss nan, time 119.86ms
iter 254960: loss nan, time 118.61ms
iter 254970: loss nan, time 119.58ms
iter 254980: loss nan, time 119.70ms
iter 254990: loss nan, time 119.61ms
tensor(0.0010)
step 255000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 255000: loss nan, time 2913.78ms
iter 255010: loss nan, time 119.77ms
iter 255020: loss nan, time 119.02ms
iter 255030: loss nan, time 119.71ms
iter 255040: loss nan, time 119.72ms
iter 255050: loss nan, time 120.11ms
iter 255060: loss nan, time 120.18ms
iter 255070: loss nan, time 119.70ms
iter 255080: loss nan, time 119.84ms
iter 255090: loss nan, time 119.63ms
tensor(0.0010)
iter 255100: loss nan, time 119.57ms
iter 255110: loss nan, time 120.43ms
iter 255120: loss nan, time 119.66ms
iter 255130: loss nan, time 119.81ms
iter 255140: loss nan, time 119.57ms
iter 255150: loss nan, time 118.80ms
iter 255160: loss nan, time 119.67ms
iter 255170: loss nan, time 119.88ms
iter 255180: loss nan, time 119.72ms
iter 255190: loss nan, time 119.03ms
tensor(0.0039)
iter 255200: loss nan, time 119.97ms
iter 255210: loss nan, time 120.11ms
iter 255220: loss nan, time 119.84ms
iter 255230: loss nan, time 119.64ms
iter 255240: loss nan, time 119.59ms
step 255250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 255250: loss nan, time 2896.90ms
iter 255260: loss nan, time 121.67ms
iter 255270: loss nan, time 118.72ms
iter 255280: loss nan, time 118.77ms
iter 255290: loss nan, time 118.22ms
tensor(0.0089)
iter 255300: loss nan, time 120.89ms
iter 255310: loss nan, time 120.62ms
iter 255320: loss nan, time 118.99ms
iter 255330: loss nan, time 120.60ms
iter 255340: loss nan, time 118.65ms
iter 255350: loss nan, time 120.64ms
iter 255360: loss nan, time 118.65ms
iter 255370: loss nan, time 119.24ms
iter 255380: loss nan, time 120.66ms
iter 255390: loss nan, time 120.70ms
tensor(0.0157)
iter 255400: loss nan, time 120.87ms
iter 255410: loss nan, time 118.62ms
iter 255420: loss nan, time 118.85ms
iter 255430: loss nan, time 120.63ms
iter 255440: loss nan, time 121.99ms
iter 255450: loss nan, time 121.56ms
iter 255460: loss nan, time 122.52ms
iter 255470: loss nan, time 124.00ms
iter 255480: loss nan, time 121.58ms
iter 255490: loss nan, time 119.62ms
tensor(0.0245)
step 255500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 255500: loss nan, time 2918.71ms
iter 255510: loss nan, time 120.75ms
iter 255520: loss nan, time 125.76ms
iter 255530: loss nan, time 121.22ms
iter 255540: loss nan, time 125.01ms
iter 255550: loss nan, time 119.07ms
iter 255560: loss nan, time 118.73ms
iter 255570: loss nan, time 118.76ms
iter 255580: loss nan, time 125.11ms
iter 255590: loss nan, time 122.47ms
tensor(0.0351)
iter 255600: loss nan, time 121.62ms
iter 255610: loss nan, time 122.16ms
iter 255620: loss nan, time 120.54ms
iter 255630: loss nan, time 123.53ms
iter 255640: loss nan, time 119.88ms
iter 255650: loss nan, time 125.07ms
iter 255660: loss nan, time 119.61ms
iter 255670: loss nan, time 120.11ms
iter 255680: loss nan, time 118.57ms
iter 255690: loss nan, time 119.50ms
tensor(0.0476)
iter 255700: loss nan, time 118.71ms
iter 255710: loss nan, time 121.86ms
iter 255720: loss nan, time 120.53ms
iter 255730: loss nan, time 121.10ms
iter 255740: loss nan, time 123.07ms
step 255750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 255750: loss nan, time 2907.42ms
iter 255760: loss nan, time 119.17ms
iter 255770: loss nan, time 118.76ms
iter 255780: loss nan, time 120.08ms
iter 255790: loss nan, time 123.54ms
tensor(0.0618)
iter 255800: loss nan, time 119.80ms
iter 255810: loss nan, time 120.44ms
iter 255820: loss nan, time 120.73ms
iter 255830: loss nan, time 121.83ms
iter 255840: loss nan, time 118.66ms
iter 255850: loss nan, time 121.96ms
iter 255860: loss nan, time 120.52ms
iter 255870: loss nan, time 121.19ms
iter 255880: loss nan, time 125.74ms
iter 255890: loss nan, time 122.08ms
tensor(0.0778)
iter 255900: loss nan, time 120.98ms
iter 255910: loss nan, time 120.94ms
iter 255920: loss nan, time 122.11ms
iter 255930: loss nan, time 121.54ms
iter 255940: loss nan, time 122.13ms
iter 255950: loss nan, time 120.76ms
iter 255960: loss nan, time 122.40ms
iter 255970: loss nan, time 126.01ms
iter 255980: loss nan, time 119.63ms
iter 255990: loss nan, time 120.62ms
tensor(0.0955)
step 256000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 256000: loss nan, time 2912.47ms
iter 256010: loss nan, time 114.82ms
iter 256020: loss nan, time 122.37ms
iter 256030: loss nan, time 115.90ms
iter 256040: loss nan, time 116.97ms
iter 256050: loss nan, time 114.60ms
iter 256060: loss nan, time 116.87ms
iter 256070: loss nan, time 116.86ms
iter 256080: loss nan, time 117.98ms
iter 256090: loss nan, time 116.73ms
tensor(0.1147)
iter 256100: loss nan, time 117.04ms
iter 256110: loss nan, time 115.53ms
iter 256120: loss nan, time 116.67ms
iter 256130: loss nan, time 116.75ms
iter 256140: loss nan, time 117.72ms
iter 256150: loss nan, time 116.98ms
iter 256160: loss nan, time 116.64ms
iter 256170: loss nan, time 116.01ms
iter 256180: loss nan, time 116.91ms
iter 256190: loss nan, time 116.49ms
tensor(0.1355)
iter 256200: loss nan, time 118.21ms
iter 256210: loss nan, time 116.58ms
iter 256220: loss nan, time 116.59ms
iter 256230: loss nan, time 116.57ms
iter 256240: loss nan, time 116.61ms
step 256250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 256250: loss nan, time 2894.45ms
iter 256260: loss nan, time 116.62ms
iter 256270: loss nan, time 117.04ms
iter 256280: loss nan, time 117.43ms
iter 256290: loss nan, time 117.02ms
tensor(0.1577)
iter 256300: loss nan, time 117.66ms
iter 256310: loss nan, time 117.35ms
iter 256320: loss nan, time 116.17ms
iter 256330: loss nan, time 117.37ms
iter 256340: loss nan, time 117.32ms
iter 256350: loss nan, time 117.00ms
iter 256360: loss nan, time 117.32ms
iter 256370: loss nan, time 116.61ms
iter 256380: loss nan, time 116.37ms
iter 256390: loss nan, time 117.35ms
tensor(0.1813)
iter 256400: loss nan, time 117.46ms
iter 256410: loss nan, time 115.87ms
iter 256420: loss nan, time 117.94ms
iter 256430: loss nan, time 116.44ms
iter 256440: loss nan, time 116.72ms
iter 256450: loss nan, time 116.26ms
iter 256460: loss nan, time 116.82ms
iter 256470: loss nan, time 116.69ms
iter 256480: loss nan, time 117.62ms
iter 256490: loss nan, time 116.69ms
tensor(0.2061)
step 256500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 256500: loss nan, time 2903.16ms
iter 256510: loss nan, time 116.26ms
iter 256520: loss nan, time 116.53ms
iter 256530: loss nan, time 116.89ms
iter 256540: loss nan, time 116.75ms
iter 256550: loss nan, time 115.78ms
iter 256560: loss nan, time 116.95ms
iter 256570: loss nan, time 116.79ms
iter 256580: loss nan, time 115.87ms
iter 256590: loss nan, time 117.11ms
tensor(0.2321)
iter 256600: loss nan, time 117.15ms
iter 256610: loss nan, time 115.64ms
iter 256620: loss nan, time 116.93ms
iter 256630: loss nan, time 116.87ms
iter 256640: loss nan, time 116.23ms
iter 256650: loss nan, time 116.37ms
iter 256660: loss nan, time 117.46ms
iter 256670: loss nan, time 115.89ms
iter 256680: loss nan, time 117.15ms
iter 256690: loss nan, time 117.09ms
tensor(0.2591)
iter 256700: loss nan, time 116.50ms
iter 256710: loss nan, time 116.76ms
iter 256720: loss nan, time 116.99ms
iter 256730: loss nan, time 116.99ms
iter 256740: loss nan, time 117.01ms
step 256750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 256750: loss nan, time 2911.96ms
iter 256760: loss nan, time 116.04ms
iter 256770: loss nan, time 113.94ms
iter 256780: loss nan, time 116.92ms
iter 256790: loss nan, time 116.78ms
tensor(0.2871)
iter 256800: loss nan, time 117.35ms
iter 256810: loss nan, time 114.79ms
iter 256820: loss nan, time 116.91ms
iter 256830: loss nan, time 114.69ms
iter 256840: loss nan, time 117.09ms
iter 256850: loss nan, time 116.50ms
iter 256860: loss nan, time 116.91ms
iter 256870: loss nan, time 114.59ms
iter 256880: loss nan, time 116.77ms
iter 256890: loss nan, time 115.28ms
tensor(0.3159)
iter 256900: loss nan, time 116.99ms
iter 256910: loss nan, time 117.55ms
iter 256920: loss nan, time 116.80ms
iter 256930: loss nan, time 114.53ms
iter 256940: loss nan, time 117.37ms
iter 256950: loss nan, time 114.60ms
iter 256960: loss nan, time 116.62ms
iter 256970: loss nan, time 117.64ms
iter 256980: loss nan, time 116.91ms
iter 256990: loss nan, time 114.63ms
tensor(0.3455)
step 257000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 257000: loss nan, time 2904.97ms
iter 257010: loss nan, time 116.24ms
iter 257020: loss nan, time 116.62ms
iter 257030: loss nan, time 116.56ms
iter 257040: loss nan, time 116.21ms
iter 257050: loss nan, time 116.91ms
iter 257060: loss nan, time 116.63ms
iter 257070: loss nan, time 116.82ms
iter 257080: loss nan, time 117.60ms
iter 257090: loss nan, time 117.07ms
tensor(0.3757)
iter 257100: loss nan, time 118.02ms
iter 257110: loss nan, time 117.19ms
iter 257120: loss nan, time 117.05ms
iter 257130: loss nan, time 117.87ms
iter 257140: loss nan, time 116.75ms
iter 257150: loss nan, time 116.61ms
iter 257160: loss nan, time 116.42ms
iter 257170: loss nan, time 117.06ms
iter 257180: loss nan, time 116.78ms
iter 257190: loss nan, time 118.06ms
tensor(0.4063)
iter 257200: loss nan, time 117.49ms
iter 257210: loss nan, time 116.76ms
iter 257220: loss nan, time 116.45ms
iter 257230: loss nan, time 116.69ms
iter 257240: loss nan, time 116.72ms
step 257250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 257250: loss nan, time 2898.30ms
iter 257260: loss nan, time 118.03ms
iter 257270: loss nan, time 116.74ms
iter 257280: loss nan, time 116.88ms
iter 257290: loss nan, time 117.87ms
tensor(0.4373)
iter 257300: loss nan, time 116.31ms
iter 257310: loss nan, time 116.12ms
iter 257320: loss nan, time 116.75ms
iter 257330: loss nan, time 117.15ms
iter 257340: loss nan, time 117.08ms
iter 257350: loss nan, time 118.01ms
iter 257360: loss nan, time 116.76ms
iter 257370: loss nan, time 116.15ms
iter 257380: loss nan, time 117.49ms
iter 257390: loss nan, time 117.35ms
tensor(0.4686)
iter 257400: loss nan, time 119.29ms
iter 257410: loss nan, time 117.66ms
iter 257420: loss nan, time 116.79ms
iter 257430: loss nan, time 116.15ms
iter 257440: loss nan, time 116.16ms
iter 257450: loss nan, time 116.78ms
iter 257460: loss nan, time 116.85ms
iter 257470: loss nan, time 118.69ms
iter 257480: loss nan, time 116.97ms
iter 257490: loss nan, time 116.19ms
tensor(0.5000)
step 257500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 257500: loss nan, time 2911.05ms
iter 257510: loss nan, time 116.93ms
iter 257520: loss nan, time 117.09ms
iter 257530: loss nan, time 116.04ms
iter 257540: loss nan, time 117.03ms
iter 257550: loss nan, time 114.82ms
iter 257560: loss nan, time 117.02ms
iter 257570: loss nan, time 116.86ms
iter 257580: loss nan, time 115.74ms
iter 257590: loss nan, time 116.87ms
tensor(0.5314)
iter 257600: loss nan, time 117.19ms
iter 257610: loss nan, time 116.50ms
iter 257620: loss nan, time 117.81ms
iter 257630: loss nan, time 116.83ms
iter 257640: loss nan, time 115.31ms
iter 257650: loss nan, time 116.94ms
iter 257660: loss nan, time 116.79ms
iter 257670: loss nan, time 115.80ms
iter 257680: loss nan, time 117.31ms
iter 257690: loss nan, time 117.09ms
tensor(0.5627)
iter 257700: loss nan, time 116.19ms
iter 257710: loss nan, time 115.99ms
iter 257720: loss nan, time 116.69ms
iter 257730: loss nan, time 116.78ms
iter 257740: loss nan, time 118.03ms
step 257750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 257750: loss nan, time 2896.97ms
iter 257760: loss nan, time 114.52ms
iter 257770: loss nan, time 116.97ms
iter 257780: loss nan, time 114.63ms
iter 257790: loss nan, time 117.67ms
tensor(0.5937)
iter 257800: loss nan, time 119.85ms
iter 257810: loss nan, time 117.47ms
iter 257820: loss nan, time 114.70ms
iter 257830: loss nan, time 117.83ms
iter 257840: loss nan, time 114.18ms
iter 257850: loss nan, time 114.99ms
iter 257860: loss nan, time 118.83ms
iter 257870: loss nan, time 117.40ms
iter 257880: loss nan, time 114.54ms
iter 257890: loss nan, time 118.68ms
tensor(0.6243)
iter 257900: loss nan, time 114.92ms
iter 257910: loss nan, time 116.46ms
iter 257920: loss nan, time 118.70ms
iter 257930: loss nan, time 116.78ms
iter 257940: loss nan, time 114.65ms
iter 257950: loss nan, time 118.90ms
iter 257960: loss nan, time 114.42ms
iter 257970: loss nan, time 115.50ms
iter 257980: loss nan, time 118.89ms
iter 257990: loss nan, time 116.67ms
tensor(0.6545)
step 258000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 258000: loss nan, time 2911.60ms
iter 258010: loss nan, time 116.69ms
iter 258020: loss nan, time 116.76ms
iter 258030: loss nan, time 116.49ms
iter 258040: loss nan, time 118.88ms
iter 258050: loss nan, time 116.96ms
iter 258060: loss nan, time 114.68ms
iter 258070: loss nan, time 114.79ms
iter 258080: loss nan, time 116.72ms
iter 258090: loss nan, time 115.20ms
tensor(0.6841)
iter 258100: loss nan, time 119.16ms
iter 258110: loss nan, time 116.88ms
iter 258120: loss nan, time 116.82ms
iter 258130: loss nan, time 114.66ms
iter 258140: loss nan, time 117.08ms
iter 258150: loss nan, time 114.86ms
iter 258160: loss nan, time 118.90ms
iter 258170: loss nan, time 116.79ms
iter 258180: loss nan, time 116.27ms
iter 258190: loss nan, time 115.87ms
tensor(0.7129)
iter 258200: loss nan, time 117.05ms
iter 258210: loss nan, time 115.05ms
iter 258220: loss nan, time 118.98ms
iter 258230: loss nan, time 116.80ms
iter 258240: loss nan, time 117.15ms
step 258250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 258250: loss nan, time 2918.69ms
iter 258260: loss nan, time 116.75ms
iter 258270: loss nan, time 116.78ms
iter 258280: loss nan, time 118.46ms
iter 258290: loss nan, time 116.79ms
tensor(0.7409)
iter 258300: loss nan, time 116.80ms
iter 258310: loss nan, time 116.73ms
iter 258320: loss nan, time 116.67ms
iter 258330: loss nan, time 116.66ms
iter 258340: loss nan, time 118.44ms
iter 258350: loss nan, time 116.76ms
iter 258360: loss nan, time 116.22ms
iter 258370: loss nan, time 117.49ms
iter 258380: loss nan, time 116.62ms
iter 258390: loss nan, time 116.67ms
tensor(0.7679)
iter 258400: loss nan, time 117.07ms
iter 258410: loss nan, time 116.85ms
iter 258420: loss nan, time 116.32ms
iter 258430: loss nan, time 118.06ms
iter 258440: loss nan, time 121.80ms
iter 258450: loss nan, time 117.54ms
iter 258460: loss nan, time 114.93ms
iter 258470: loss nan, time 117.37ms
iter 258480: loss nan, time 117.98ms
iter 258490: loss nan, time 116.15ms
tensor(0.7939)
step 258500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 258500: loss nan, time 2911.38ms
iter 258510: loss nan, time 116.78ms
iter 258520: loss nan, time 116.21ms
iter 258530: loss nan, time 117.05ms
iter 258540: loss nan, time 117.96ms
iter 258550: loss nan, time 115.80ms
iter 258560: loss nan, time 117.08ms
iter 258570: loss nan, time 116.83ms
iter 258580: loss nan, time 115.55ms
iter 258590: loss nan, time 117.32ms
tensor(0.8187)
iter 258600: loss nan, time 117.93ms
iter 258610: loss nan, time 115.50ms
iter 258620: loss nan, time 116.92ms
iter 258630: loss nan, time 117.18ms
iter 258640: loss nan, time 115.79ms
iter 258650: loss nan, time 117.10ms
iter 258660: loss nan, time 117.62ms
iter 258670: loss nan, time 115.76ms
iter 258680: loss nan, time 116.88ms
iter 258690: loss nan, time 116.68ms
tensor(0.8423)
iter 258700: loss nan, time 115.69ms
iter 258710: loss nan, time 117.17ms
iter 258720: loss nan, time 117.59ms
iter 258730: loss nan, time 116.08ms
iter 258740: loss nan, time 117.37ms
step 258750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 258750: loss nan, time 2908.74ms
iter 258760: loss nan, time 114.33ms
iter 258770: loss nan, time 116.73ms
iter 258780: loss nan, time 116.43ms
iter 258790: loss nan, time 115.45ms
tensor(0.8645)
iter 258800: loss nan, time 117.95ms
iter 258810: loss nan, time 118.64ms
iter 258820: loss nan, time 114.45ms
iter 258830: loss nan, time 116.66ms
iter 258840: loss nan, time 118.14ms
iter 258850: loss nan, time 115.70ms
iter 258860: loss nan, time 116.92ms
iter 258870: loss nan, time 115.42ms
iter 258880: loss nan, time 115.82ms
iter 258890: loss nan, time 115.69ms
tensor(0.8853)
iter 258900: loss nan, time 115.29ms
iter 258910: loss nan, time 116.00ms
iter 258920: loss nan, time 115.54ms
iter 258930: loss nan, time 116.76ms
iter 258940: loss nan, time 118.52ms
iter 258950: loss nan, time 115.68ms
iter 258960: loss nan, time 116.70ms
iter 258970: loss nan, time 115.86ms
iter 258980: loss nan, time 115.86ms
iter 258990: loss nan, time 116.99ms
tensor(0.9045)
step 259000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 259000: loss nan, time 2906.92ms
iter 259010: loss nan, time 114.76ms
iter 259020: loss nan, time 114.79ms
iter 259030: loss nan, time 116.72ms
iter 259040: loss nan, time 114.66ms
iter 259050: loss nan, time 118.94ms
iter 259060: loss nan, time 116.66ms
iter 259070: loss nan, time 116.45ms
iter 259080: loss nan, time 114.79ms
iter 259090: loss nan, time 116.78ms
tensor(0.9222)
iter 259100: loss nan, time 114.92ms
iter 259110: loss nan, time 118.42ms
iter 259120: loss nan, time 117.06ms
iter 259130: loss nan, time 116.87ms
iter 259140: loss nan, time 116.90ms
iter 259150: loss nan, time 116.78ms
iter 259160: loss nan, time 114.95ms
iter 259170: loss nan, time 119.09ms
iter 259180: loss nan, time 117.22ms
iter 259190: loss nan, time 116.16ms
tensor(0.9382)
iter 259200: loss nan, time 115.08ms
iter 259210: loss nan, time 116.77ms
iter 259220: loss nan, time 116.39ms
iter 259230: loss nan, time 118.27ms
iter 259240: loss nan, time 116.79ms
step 259250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 259250: loss nan, time 2915.47ms
iter 259260: loss nan, time 118.87ms
iter 259270: loss nan, time 116.46ms
iter 259280: loss nan, time 117.35ms
iter 259290: loss nan, time 118.85ms
tensor(0.9524)
iter 259300: loss nan, time 117.04ms
iter 259310: loss nan, time 115.91ms
iter 259320: loss nan, time 117.36ms
iter 259330: loss nan, time 115.87ms
iter 259340: loss nan, time 116.49ms
iter 259350: loss nan, time 118.81ms
iter 259360: loss nan, time 116.33ms
iter 259370: loss nan, time 115.57ms
iter 259380: loss nan, time 117.28ms
iter 259390: loss nan, time 115.86ms
tensor(0.9649)
iter 259400: loss nan, time 115.75ms
iter 259410: loss nan, time 118.67ms
iter 259420: loss nan, time 116.66ms
iter 259430: loss nan, time 115.28ms
iter 259440: loss nan, time 117.99ms
iter 259450: loss nan, time 115.85ms
iter 259460: loss nan, time 116.75ms
iter 259470: loss nan, time 119.56ms
iter 259480: loss nan, time 117.10ms
iter 259490: loss nan, time 114.66ms
tensor(0.9755)
step 259500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 259500: loss nan, time 2902.80ms
iter 259510: loss nan, time 116.36ms
iter 259520: loss nan, time 116.61ms
iter 259530: loss nan, time 117.97ms
iter 259540: loss nan, time 115.81ms
iter 259550: loss nan, time 116.69ms
iter 259560: loss nan, time 118.18ms
iter 259570: loss nan, time 115.81ms
iter 259580: loss nan, time 116.78ms
iter 259590: loss nan, time 117.89ms
tensor(0.9843)
iter 259600: loss nan, time 116.16ms
iter 259610: loss nan, time 116.68ms
iter 259620: loss nan, time 117.69ms
iter 259630: loss nan, time 115.43ms
iter 259640: loss nan, time 116.71ms
iter 259650: loss nan, time 118.01ms
iter 259660: loss nan, time 115.78ms
iter 259670: loss nan, time 116.82ms
iter 259680: loss nan, time 118.56ms
iter 259690: loss nan, time 115.81ms
tensor(0.9911)
iter 259700: loss nan, time 117.15ms
iter 259710: loss nan, time 117.96ms
iter 259720: loss nan, time 114.96ms
iter 259730: loss nan, time 116.77ms
iter 259740: loss nan, time 117.97ms
step 259750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 259750: loss nan, time 2899.92ms
iter 259760: loss nan, time 119.02ms
iter 259770: loss nan, time 116.78ms
iter 259780: loss nan, time 114.61ms
iter 259790: loss nan, time 117.71ms
tensor(0.9961)
iter 259800: loss nan, time 114.87ms
iter 259810: loss nan, time 114.87ms
iter 259820: loss nan, time 118.96ms
iter 259830: loss nan, time 116.74ms
iter 259840: loss nan, time 114.76ms
iter 259850: loss nan, time 119.29ms
iter 259860: loss nan, time 114.63ms
iter 259870: loss nan, time 115.35ms
iter 259880: loss nan, time 118.91ms
iter 259890: loss nan, time 117.03ms
tensor(0.9990)
iter 259900: loss nan, time 114.95ms
iter 259910: loss nan, time 117.66ms
iter 259920: loss nan, time 114.57ms
iter 259930: loss nan, time 115.49ms
iter 259940: loss nan, time 118.96ms
iter 259950: loss nan, time 116.77ms
iter 259960: loss nan, time 114.59ms
iter 259970: loss nan, time 118.63ms
iter 259980: loss nan, time 114.61ms
iter 259990: loss nan, time 121.94ms
tensor(1.)
step 260000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 260000: loss nan, time 2906.01ms
iter 260010: loss nan, time 114.78ms
iter 260020: loss nan, time 117.13ms
iter 260030: loss nan, time 115.88ms
iter 260040: loss nan, time 116.31ms
iter 260050: loss nan, time 116.56ms
iter 260060: loss nan, time 116.97ms
iter 260070: loss nan, time 116.40ms
iter 260080: loss nan, time 117.13ms
iter 260090: loss nan, time 116.84ms
tensor(0.9990)
iter 260100: loss nan, time 115.51ms
iter 260110: loss nan, time 116.68ms
iter 260120: loss nan, time 116.75ms
iter 260130: loss nan, time 115.50ms
iter 260140: loss nan, time 116.46ms
iter 260150: loss nan, time 117.18ms
iter 260160: loss nan, time 115.14ms
iter 260170: loss nan, time 117.03ms
iter 260180: loss nan, time 116.84ms
iter 260190: loss nan, time 115.78ms
tensor(0.9961)
iter 260200: loss nan, time 117.01ms
iter 260210: loss nan, time 117.60ms
iter 260220: loss nan, time 114.57ms
iter 260230: loss nan, time 117.02ms
iter 260240: loss nan, time 117.59ms
step 260250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 260250: loss nan, time 2896.85ms
iter 260260: loss nan, time 119.05ms
iter 260270: loss nan, time 116.93ms
iter 260280: loss nan, time 114.85ms
iter 260290: loss nan, time 117.80ms
tensor(0.9911)
iter 260300: loss nan, time 115.24ms
iter 260310: loss nan, time 114.58ms
iter 260320: loss nan, time 118.78ms
iter 260330: loss nan, time 117.16ms
iter 260340: loss nan, time 115.78ms
iter 260350: loss nan, time 117.15ms
iter 260360: loss nan, time 115.05ms
iter 260370: loss nan, time 114.62ms
iter 260380: loss nan, time 119.08ms
iter 260390: loss nan, time 117.10ms
tensor(0.9843)
iter 260400: loss nan, time 115.03ms
iter 260410: loss nan, time 117.02ms
iter 260420: loss nan, time 114.97ms
iter 260430: loss nan, time 114.50ms
iter 260440: loss nan, time 118.28ms
iter 260450: loss nan, time 117.22ms
iter 260460: loss nan, time 115.00ms
iter 260470: loss nan, time 116.90ms
iter 260480: loss nan, time 114.78ms
iter 260490: loss nan, time 114.96ms
tensor(0.9755)
step 260500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 260500: loss nan, time 2907.24ms
iter 260510: loss nan, time 116.82ms
iter 260520: loss nan, time 116.94ms
iter 260530: loss nan, time 116.52ms
iter 260540: loss nan, time 116.82ms
iter 260550: loss nan, time 114.80ms
iter 260560: loss nan, time 119.22ms
iter 260570: loss nan, time 116.83ms
iter 260580: loss nan, time 115.34ms
iter 260590: loss nan, time 115.69ms
tensor(0.9649)
iter 260600: loss nan, time 117.23ms
iter 260610: loss nan, time 114.60ms
iter 260620: loss nan, time 116.85ms
iter 260630: loss nan, time 117.42ms
iter 260640: loss nan, time 115.97ms
iter 260650: loss nan, time 114.65ms
iter 260660: loss nan, time 118.19ms
iter 260670: loss nan, time 115.26ms
iter 260680: loss nan, time 116.52ms
iter 260690: loss nan, time 118.05ms
tensor(0.9524)
iter 260700: loss nan, time 116.41ms
iter 260710: loss nan, time 115.17ms
iter 260720: loss nan, time 118.01ms
iter 260730: loss nan, time 114.97ms
iter 260740: loss nan, time 116.73ms
step 260750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 260750: loss nan, time 2901.74ms
iter 260760: loss nan, time 114.75ms
iter 260770: loss nan, time 116.85ms
iter 260780: loss nan, time 116.71ms
iter 260790: loss nan, time 116.79ms
tensor(0.9382)
iter 260800: loss nan, time 118.88ms
iter 260810: loss nan, time 116.84ms
iter 260820: loss nan, time 115.43ms
iter 260830: loss nan, time 118.86ms
iter 260840: loss nan, time 116.71ms
iter 260850: loss nan, time 116.82ms
iter 260860: loss nan, time 117.88ms
iter 260870: loss nan, time 116.70ms
iter 260880: loss nan, time 115.07ms
iter 260890: loss nan, time 118.88ms
tensor(0.9222)
iter 260900: loss nan, time 117.05ms
iter 260910: loss nan, time 116.87ms
iter 260920: loss nan, time 118.74ms
iter 260930: loss nan, time 116.71ms
iter 260940: loss nan, time 116.58ms
iter 260950: loss nan, time 118.52ms
iter 260960: loss nan, time 116.77ms
iter 260970: loss nan, time 116.69ms
iter 260980: loss nan, time 118.91ms
iter 260990: loss nan, time 116.58ms
tensor(0.9045)
step 261000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 261000: loss nan, time 2903.74ms
iter 261010: loss nan, time 117.07ms
iter 261020: loss nan, time 115.78ms
iter 261030: loss nan, time 117.02ms
iter 261040: loss nan, time 118.64ms
iter 261050: loss nan, time 114.53ms
iter 261060: loss nan, time 116.78ms
iter 261070: loss nan, time 117.93ms
iter 261080: loss nan, time 115.71ms
iter 261090: loss nan, time 116.63ms
tensor(0.8853)
iter 261100: loss nan, time 118.20ms
iter 261110: loss nan, time 115.07ms
iter 261120: loss nan, time 116.63ms
iter 261130: loss nan, time 116.71ms
iter 261140: loss nan, time 115.23ms
iter 261150: loss nan, time 116.57ms
iter 261160: loss nan, time 117.87ms
iter 261170: loss nan, time 115.33ms
iter 261180: loss nan, time 116.53ms
iter 261190: loss nan, time 118.04ms
tensor(0.8645)
iter 261200: loss nan, time 116.06ms
iter 261210: loss nan, time 116.79ms
iter 261220: loss nan, time 117.94ms
iter 261230: loss nan, time 115.61ms
iter 261240: loss nan, time 116.69ms
step 261250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 261250: loss nan, time 2896.86ms
iter 261260: loss nan, time 115.58ms
iter 261270: loss nan, time 118.96ms
iter 261280: loss nan, time 115.71ms
iter 261290: loss nan, time 115.17ms
tensor(0.8423)
iter 261300: loss nan, time 117.61ms
iter 261310: loss nan, time 114.38ms
iter 261320: loss nan, time 114.84ms
iter 261330: loss nan, time 119.09ms
iter 261340: loss nan, time 116.72ms
iter 261350: loss nan, time 115.11ms
iter 261360: loss nan, time 116.91ms
iter 261370: loss nan, time 114.15ms
iter 261380: loss nan, time 114.85ms
iter 261390: loss nan, time 118.94ms
tensor(0.8187)
iter 261400: loss nan, time 116.20ms
iter 261410: loss nan, time 115.21ms
iter 261420: loss nan, time 116.74ms
iter 261430: loss nan, time 114.83ms
iter 261440: loss nan, time 114.80ms
iter 261450: loss nan, time 119.12ms
iter 261460: loss nan, time 117.41ms
iter 261470: loss nan, time 114.68ms
iter 261480: loss nan, time 117.84ms
iter 261490: loss nan, time 115.04ms
tensor(0.7939)
step 261500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 261500: loss nan, time 2914.21ms
iter 261510: loss nan, time 118.91ms
iter 261520: loss nan, time 116.90ms
iter 261530: loss nan, time 116.85ms
iter 261540: loss nan, time 115.94ms
iter 261550: loss nan, time 116.78ms
iter 261560: loss nan, time 115.14ms
iter 261570: loss nan, time 118.94ms
iter 261580: loss nan, time 117.45ms
iter 261590: loss nan, time 116.19ms
tensor(0.7679)
iter 261600: loss nan, time 115.93ms
iter 261610: loss nan, time 116.74ms
iter 261620: loss nan, time 114.39ms
iter 261630: loss nan, time 118.60ms
iter 261640: loss nan, time 116.67ms
iter 261650: loss nan, time 116.33ms
iter 261660: loss nan, time 116.31ms
iter 261670: loss nan, time 116.76ms
iter 261680: loss nan, time 115.49ms
iter 261690: loss nan, time 119.01ms
tensor(0.7409)
iter 261700: loss nan, time 116.99ms
iter 261710: loss nan, time 115.80ms
iter 261720: loss nan, time 116.25ms
iter 261730: loss nan, time 116.66ms
iter 261740: loss nan, time 115.89ms
step 261750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 261750: loss nan, time 2914.00ms
iter 261760: loss nan, time 116.50ms
iter 261770: loss nan, time 116.69ms
iter 261780: loss nan, time 118.81ms
iter 261790: loss nan, time 116.76ms
tensor(0.7129)
iter 261800: loss nan, time 115.17ms
iter 261810: loss nan, time 116.67ms
iter 261820: loss nan, time 115.13ms
iter 261830: loss nan, time 116.09ms
iter 261840: loss nan, time 114.96ms
iter 261850: loss nan, time 115.95ms
iter 261860: loss nan, time 112.69ms
iter 261870: loss nan, time 117.25ms
iter 261880: loss nan, time 113.20ms
iter 261890: loss nan, time 116.63ms
tensor(0.6841)
iter 261900: loss nan, time 118.24ms
iter 261910: loss nan, time 114.78ms
iter 261920: loss nan, time 114.48ms
iter 261930: loss nan, time 117.69ms
iter 261940: loss nan, time 113.69ms
iter 261950: loss nan, time 116.76ms
iter 261960: loss nan, time 118.20ms
iter 261970: loss nan, time 114.78ms
iter 261980: loss nan, time 114.57ms
iter 261990: loss nan, time 118.49ms
tensor(0.6545)
step 262000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 262000: loss nan, time 2911.27ms
iter 262010: loss nan, time 116.88ms
iter 262020: loss nan, time 117.25ms
iter 262030: loss nan, time 116.47ms
iter 262040: loss nan, time 116.79ms
iter 262050: loss nan, time 116.83ms
iter 262060: loss nan, time 115.99ms
iter 262070: loss nan, time 117.15ms
iter 262080: loss nan, time 118.21ms
iter 262090: loss nan, time 115.91ms
tensor(0.6243)
iter 262100: loss nan, time 117.34ms
iter 262110: loss nan, time 116.96ms
iter 262120: loss nan, time 116.16ms
iter 262130: loss nan, time 117.34ms
iter 262140: loss nan, time 118.00ms
iter 262150: loss nan, time 115.48ms
iter 262160: loss nan, time 116.77ms
iter 262170: loss nan, time 116.80ms
iter 262180: loss nan, time 116.04ms
iter 262190: loss nan, time 116.69ms
tensor(0.5937)
iter 262200: loss nan, time 117.87ms
iter 262210: loss nan, time 116.40ms
iter 262220: loss nan, time 117.26ms
iter 262230: loss nan, time 117.51ms
iter 262240: loss nan, time 116.77ms
step 262250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 262250: loss nan, time 2925.67ms
iter 262260: loss nan, time 119.40ms
iter 262270: loss nan, time 117.04ms
iter 262280: loss nan, time 115.05ms
iter 262290: loss nan, time 117.15ms
tensor(0.5627)
iter 262300: loss nan, time 117.38ms
iter 262310: loss nan, time 116.85ms
iter 262320: loss nan, time 117.06ms
iter 262330: loss nan, time 116.06ms
iter 262340: loss nan, time 115.12ms
iter 262350: loss nan, time 116.49ms
iter 262360: loss nan, time 116.99ms
iter 262370: loss nan, time 114.91ms
iter 262380: loss nan, time 118.67ms
iter 262390: loss nan, time 116.85ms
tensor(0.5314)
iter 262400: loss nan, time 115.11ms
iter 262410: loss nan, time 117.48ms
iter 262420: loss nan, time 117.14ms
iter 262430: loss nan, time 115.68ms
iter 262440: loss nan, time 117.50ms
iter 262450: loss nan, time 117.82ms
iter 262460: loss nan, time 117.21ms
iter 262470: loss nan, time 117.02ms
iter 262480: loss nan, time 116.92ms
iter 262490: loss nan, time 116.96ms
tensor(0.5000)
step 262500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 262500: loss nan, time 2913.16ms
iter 262510: loss nan, time 117.81ms
iter 262520: loss nan, time 116.93ms
iter 262530: loss nan, time 114.98ms
iter 262540: loss nan, time 116.49ms
iter 262550: loss nan, time 117.60ms
iter 262560: loss nan, time 118.54ms
iter 262570: loss nan, time 116.77ms
iter 262580: loss nan, time 116.86ms
iter 262590: loss nan, time 117.35ms
tensor(0.4686)
iter 262600: loss nan, time 118.26ms
iter 262610: loss nan, time 115.45ms
iter 262620: loss nan, time 117.23ms
iter 262630: loss nan, time 116.76ms
iter 262640: loss nan, time 117.61ms
iter 262650: loss nan, time 116.84ms
iter 262660: loss nan, time 116.65ms
iter 262670: loss nan, time 116.16ms
iter 262680: loss nan, time 116.74ms
iter 262690: loss nan, time 116.66ms
tensor(0.4373)
iter 262700: loss nan, time 117.60ms
iter 262710: loss nan, time 117.13ms
iter 262720: loss nan, time 118.17ms
iter 262730: loss nan, time 114.83ms
iter 262740: loss nan, time 117.34ms
step 262750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 262750: loss nan, time 2909.06ms
iter 262760: loss nan, time 116.82ms
iter 262770: loss nan, time 116.94ms
iter 262780: loss nan, time 116.40ms
iter 262790: loss nan, time 116.87ms
tensor(0.4063)
iter 262800: loss nan, time 117.10ms
iter 262810: loss nan, time 116.26ms
iter 262820: loss nan, time 117.04ms
iter 262830: loss nan, time 116.99ms
iter 262840: loss nan, time 116.92ms
iter 262850: loss nan, time 117.07ms
iter 262860: loss nan, time 116.80ms
iter 262870: loss nan, time 116.93ms
iter 262880: loss nan, time 117.06ms
iter 262890: loss nan, time 117.65ms
tensor(0.3757)
iter 262900: loss nan, time 117.16ms
iter 262910: loss nan, time 117.40ms
iter 262920: loss nan, time 114.73ms
iter 262930: loss nan, time 116.81ms
iter 262940: loss nan, time 116.87ms
iter 262950: loss nan, time 116.37ms
iter 262960: loss nan, time 116.81ms
iter 262970: loss nan, time 116.95ms
iter 262980: loss nan, time 114.83ms
iter 262990: loss nan, time 118.26ms
tensor(0.3455)
step 263000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 263000: loss nan, time 2901.49ms
iter 263010: loss nan, time 116.87ms
iter 263020: loss nan, time 116.71ms
iter 263030: loss nan, time 117.14ms
iter 263040: loss nan, time 117.59ms
iter 263050: loss nan, time 116.68ms
iter 263060: loss nan, time 118.03ms
iter 263070: loss nan, time 116.76ms
iter 263080: loss nan, time 116.69ms
iter 263090: loss nan, time 116.12ms
tensor(0.3159)
iter 263100: loss nan, time 117.16ms
iter 263110: loss nan, time 116.77ms
iter 263120: loss nan, time 117.73ms
iter 263130: loss nan, time 116.85ms
iter 263140: loss nan, time 116.81ms
iter 263150: loss nan, time 116.21ms
iter 263160: loss nan, time 116.72ms
iter 263170: loss nan, time 116.65ms
iter 263180: loss nan, time 117.06ms
iter 263190: loss nan, time 116.76ms
tensor(0.2871)
iter 263200: loss nan, time 116.99ms
iter 263210: loss nan, time 116.52ms
iter 263220: loss nan, time 115.97ms
iter 263230: loss nan, time 116.76ms
iter 263240: loss nan, time 118.19ms
step 263250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 263250: loss nan, time 2914.64ms
iter 263260: loss nan, time 116.65ms
iter 263270: loss nan, time 118.07ms
iter 263280: loss nan, time 117.01ms
iter 263290: loss nan, time 116.71ms
tensor(0.2591)
iter 263300: loss nan, time 116.59ms
iter 263310: loss nan, time 116.73ms
iter 263320: loss nan, time 116.74ms
iter 263330: loss nan, time 116.88ms
iter 263340: loss nan, time 117.33ms
iter 263350: loss nan, time 117.28ms
iter 263360: loss nan, time 117.39ms
iter 263370: loss nan, time 117.04ms
iter 263380: loss nan, time 117.11ms
iter 263390: loss nan, time 116.64ms
tensor(0.2321)
iter 263400: loss nan, time 118.02ms
iter 263410: loss nan, time 116.99ms
iter 263420: loss nan, time 118.50ms
iter 263430: loss nan, time 117.09ms
iter 263440: loss nan, time 116.77ms
iter 263450: loss nan, time 117.05ms
iter 263460: loss nan, time 117.31ms
iter 263470: loss nan, time 117.40ms
iter 263480: loss nan, time 117.26ms
iter 263490: loss nan, time 116.16ms
tensor(0.2061)
step 263500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 263500: loss nan, time 2910.52ms
iter 263510: loss nan, time 116.81ms
iter 263520: loss nan, time 116.62ms
iter 263530: loss nan, time 116.95ms
iter 263540: loss nan, time 116.81ms
iter 263550: loss nan, time 116.91ms
iter 263560: loss nan, time 116.71ms
iter 263570: loss nan, time 116.91ms
iter 263580: loss nan, time 116.30ms
iter 263590: loss nan, time 116.76ms
tensor(0.1813)
iter 263600: loss nan, time 117.13ms
iter 263610: loss nan, time 117.81ms
iter 263620: loss nan, time 116.74ms
iter 263630: loss nan, time 116.72ms
iter 263640: loss nan, time 117.93ms
iter 263650: loss nan, time 117.48ms
iter 263660: loss nan, time 116.70ms
iter 263670: loss nan, time 116.65ms
iter 263680: loss nan, time 116.77ms
iter 263690: loss nan, time 116.77ms
tensor(0.1577)
iter 263700: loss nan, time 117.31ms
iter 263710: loss nan, time 117.09ms
iter 263720: loss nan, time 116.47ms
iter 263730: loss nan, time 118.34ms
iter 263740: loss nan, time 116.83ms
step 263750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 263750: loss nan, time 2910.97ms
iter 263760: loss nan, time 117.32ms
iter 263770: loss nan, time 117.13ms
iter 263780: loss nan, time 116.76ms
iter 263790: loss nan, time 117.87ms
tensor(0.1355)
iter 263800: loss nan, time 117.20ms
iter 263810: loss nan, time 115.89ms
iter 263820: loss nan, time 117.63ms
iter 263830: loss nan, time 117.09ms
iter 263840: loss nan, time 117.28ms
iter 263850: loss nan, time 117.95ms
iter 263860: loss nan, time 117.02ms
iter 263870: loss nan, time 116.06ms
iter 263880: loss nan, time 116.43ms
iter 263890: loss nan, time 116.63ms
tensor(0.1147)
iter 263900: loss nan, time 116.64ms
iter 263910: loss nan, time 116.86ms
iter 263920: loss nan, time 117.53ms
iter 263930: loss nan, time 118.06ms
iter 263940: loss nan, time 116.62ms
iter 263950: loss nan, time 117.56ms
iter 263960: loss nan, time 116.58ms
iter 263970: loss nan, time 117.23ms
iter 263980: loss nan, time 117.08ms
iter 263990: loss nan, time 116.63ms
tensor(0.0955)
step 264000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 264000: loss nan, time 2888.24ms
iter 264010: loss nan, time 121.65ms
iter 264020: loss nan, time 121.26ms
iter 264030: loss nan, time 121.82ms
iter 264040: loss nan, time 121.61ms
iter 264050: loss nan, time 120.96ms
iter 264060: loss nan, time 120.97ms
iter 264070: loss nan, time 119.93ms
iter 264080: loss nan, time 121.70ms
iter 264090: loss nan, time 119.84ms
tensor(0.0778)
iter 264100: loss nan, time 121.31ms
iter 264110: loss nan, time 119.30ms
iter 264120: loss nan, time 122.88ms
iter 264130: loss nan, time 122.79ms
iter 264140: loss nan, time 121.32ms
iter 264150: loss nan, time 121.26ms
iter 264160: loss nan, time 121.11ms
iter 264170: loss nan, time 121.44ms
iter 264180: loss nan, time 120.57ms
iter 264190: loss nan, time 118.56ms
tensor(0.0618)
iter 264200: loss nan, time 119.24ms
iter 264210: loss nan, time 119.22ms
iter 264220: loss nan, time 121.58ms
iter 264230: loss nan, time 121.32ms
iter 264240: loss nan, time 122.22ms
step 264250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 264250: loss nan, time 2916.52ms
iter 264260: loss nan, time 119.83ms
iter 264270: loss nan, time 119.78ms
iter 264280: loss nan, time 120.01ms
iter 264290: loss nan, time 120.26ms
tensor(0.0476)
iter 264300: loss nan, time 121.51ms
iter 264310: loss nan, time 120.54ms
iter 264320: loss nan, time 119.97ms
iter 264330: loss nan, time 119.51ms
iter 264340: loss nan, time 120.02ms
iter 264350: loss nan, time 120.24ms
iter 264360: loss nan, time 121.36ms
iter 264370: loss nan, time 120.82ms
iter 264380: loss nan, time 122.30ms
iter 264390: loss nan, time 122.80ms
tensor(0.0351)
iter 264400: loss nan, time 121.01ms
iter 264410: loss nan, time 120.58ms
iter 264420: loss nan, time 118.93ms
iter 264430: loss nan, time 119.46ms
iter 264440: loss nan, time 118.75ms
iter 264450: loss nan, time 117.02ms
iter 264460: loss nan, time 118.54ms
iter 264470: loss nan, time 118.60ms
iter 264480: loss nan, time 119.31ms
iter 264490: loss nan, time 119.83ms
tensor(0.0245)
step 264500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 264500: loss nan, time 2913.56ms
iter 264510: loss nan, time 121.22ms
iter 264520: loss nan, time 121.03ms
iter 264530: loss nan, time 120.83ms
iter 264540: loss nan, time 120.93ms
iter 264550: loss nan, time 118.93ms
iter 264560: loss nan, time 120.97ms
iter 264570: loss nan, time 117.75ms
iter 264580: loss nan, time 120.93ms
iter 264590: loss nan, time 120.61ms
tensor(0.0157)
iter 264600: loss nan, time 121.48ms
iter 264610: loss nan, time 124.17ms
iter 264620: loss nan, time 123.58ms
iter 264630: loss nan, time 121.21ms
iter 264640: loss nan, time 123.40ms
iter 264650: loss nan, time 120.23ms
iter 264660: loss nan, time 119.03ms
iter 264670: loss nan, time 118.91ms
iter 264680: loss nan, time 121.10ms
iter 264690: loss nan, time 121.23ms
tensor(0.0089)
iter 264700: loss nan, time 121.41ms
iter 264710: loss nan, time 122.79ms
iter 264720: loss nan, time 121.38ms
iter 264730: loss nan, time 120.05ms
iter 264740: loss nan, time 121.18ms
step 264750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 264750: loss nan, time 2923.59ms
iter 264760: loss nan, time 120.13ms
iter 264770: loss nan, time 120.59ms
iter 264780: loss nan, time 121.25ms
iter 264790: loss nan, time 121.87ms
tensor(0.0039)
iter 264800: loss nan, time 122.86ms
iter 264810: loss nan, time 120.14ms
iter 264820: loss nan, time 121.48ms
iter 264830: loss nan, time 119.28ms
iter 264840: loss nan, time 119.49ms
iter 264850: loss nan, time 120.71ms
iter 264860: loss nan, time 121.21ms
iter 264870: loss nan, time 119.04ms
iter 264880: loss nan, time 122.87ms
iter 264890: loss nan, time 121.14ms
tensor(0.0010)
iter 264900: loss nan, time 119.56ms
iter 264910: loss nan, time 121.16ms
iter 264920: loss nan, time 119.41ms
iter 264930: loss nan, time 119.65ms
iter 264940: loss nan, time 119.82ms
iter 264950: loss nan, time 120.90ms
iter 264960: loss nan, time 121.14ms
iter 264970: loss nan, time 121.12ms
iter 264980: loss nan, time 122.22ms
iter 264990: loss nan, time 123.20ms
tensor(0.0010)
step 265000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 265000: loss nan, time 2922.92ms
iter 265010: loss nan, time 120.00ms
iter 265020: loss nan, time 120.88ms
iter 265030: loss nan, time 121.31ms
iter 265040: loss nan, time 121.10ms
iter 265050: loss nan, time 119.25ms
iter 265060: loss nan, time 119.15ms
iter 265070: loss nan, time 119.10ms
iter 265080: loss nan, time 120.88ms
iter 265090: loss nan, time 121.01ms
tensor(0.0010)
iter 265100: loss nan, time 119.94ms
iter 265110: loss nan, time 121.10ms
iter 265120: loss nan, time 119.89ms
iter 265130: loss nan, time 122.41ms
iter 265140: loss nan, time 123.13ms
iter 265150: loss nan, time 119.11ms
iter 265160: loss nan, time 121.51ms
iter 265170: loss nan, time 121.06ms
iter 265180: loss nan, time 120.97ms
iter 265190: loss nan, time 118.99ms
tensor(0.0039)
iter 265200: loss nan, time 119.89ms
iter 265210: loss nan, time 120.99ms
iter 265220: loss nan, time 120.97ms
iter 265230: loss nan, time 121.33ms
iter 265240: loss nan, time 122.53ms
step 265250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 265250: loss nan, time 2914.12ms
iter 265260: loss nan, time 123.19ms
iter 265270: loss nan, time 121.18ms
iter 265280: loss nan, time 121.46ms
iter 265290: loss nan, time 121.26ms
tensor(0.0089)
iter 265300: loss nan, time 119.48ms
iter 265310: loss nan, time 119.78ms
iter 265320: loss nan, time 118.97ms
iter 265330: loss nan, time 121.38ms
iter 265340: loss nan, time 121.53ms
iter 265350: loss nan, time 121.44ms
iter 265360: loss nan, time 123.48ms
iter 265370: loss nan, time 120.35ms
iter 265380: loss nan, time 121.33ms
iter 265390: loss nan, time 121.13ms
tensor(0.0157)
iter 265400: loss nan, time 119.23ms
iter 265410: loss nan, time 119.36ms
iter 265420: loss nan, time 120.01ms
iter 265430: loss nan, time 121.55ms
iter 265440: loss nan, time 120.32ms
iter 265450: loss nan, time 121.50ms
iter 265460: loss nan, time 123.13ms
iter 265470: loss nan, time 122.79ms
iter 265480: loss nan, time 120.97ms
iter 265490: loss nan, time 120.87ms
tensor(0.0245)
step 265500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 265500: loss nan, time 2906.60ms
iter 265510: loss nan, time 120.80ms
iter 265520: loss nan, time 120.85ms
iter 265530: loss nan, time 120.98ms
iter 265540: loss nan, time 120.85ms
iter 265550: loss nan, time 119.01ms
iter 265560: loss nan, time 120.21ms
iter 265570: loss nan, time 118.90ms
iter 265580: loss nan, time 120.96ms
iter 265590: loss nan, time 121.07ms
tensor(0.0351)
iter 265600: loss nan, time 120.52ms
iter 265610: loss nan, time 122.15ms
iter 265620: loss nan, time 121.03ms
iter 265630: loss nan, time 123.16ms
iter 265640: loss nan, time 120.88ms
iter 265650: loss nan, time 118.84ms
iter 265660: loss nan, time 120.74ms
iter 265670: loss nan, time 120.93ms
iter 265680: loss nan, time 120.83ms
iter 265690: loss nan, time 118.66ms
tensor(0.0476)
iter 265700: loss nan, time 119.70ms
iter 265710: loss nan, time 120.29ms
iter 265720: loss nan, time 120.93ms
iter 265730: loss nan, time 121.09ms
iter 265740: loss nan, time 121.94ms
step 265750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 265750: loss nan, time 2908.27ms
iter 265760: loss nan, time 121.27ms
iter 265770: loss nan, time 122.12ms
iter 265780: loss nan, time 122.55ms
iter 265790: loss nan, time 122.69ms
tensor(0.0618)
iter 265800: loss nan, time 120.97ms
iter 265810: loss nan, time 120.66ms
iter 265820: loss nan, time 120.79ms
iter 265830: loss nan, time 120.82ms
iter 265840: loss nan, time 118.90ms
iter 265850: loss nan, time 118.94ms
iter 265860: loss nan, time 119.44ms
iter 265870: loss nan, time 118.80ms
iter 265880: loss nan, time 120.88ms
iter 265890: loss nan, time 120.85ms
tensor(0.0778)
iter 265900: loss nan, time 121.51ms
iter 265910: loss nan, time 122.13ms
iter 265920: loss nan, time 121.33ms
iter 265930: loss nan, time 123.08ms
iter 265940: loss nan, time 120.46ms
iter 265950: loss nan, time 118.72ms
iter 265960: loss nan, time 120.76ms
iter 265970: loss nan, time 120.77ms
iter 265980: loss nan, time 121.17ms
iter 265990: loss nan, time 118.42ms
tensor(0.0955)
step 266000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 266000: loss nan, time 2911.07ms
iter 266010: loss nan, time 118.86ms
iter 266020: loss nan, time 118.40ms
iter 266030: loss nan, time 118.29ms
iter 266040: loss nan, time 120.38ms
iter 266050: loss nan, time 120.97ms
iter 266060: loss nan, time 120.83ms
iter 266070: loss nan, time 120.85ms
iter 266080: loss nan, time 121.47ms
iter 266090: loss nan, time 122.26ms
tensor(0.1147)
iter 266100: loss nan, time 123.22ms
iter 266110: loss nan, time 122.94ms
iter 266120: loss nan, time 120.80ms
iter 266130: loss nan, time 120.01ms
iter 266140: loss nan, time 120.87ms
iter 266150: loss nan, time 120.87ms
iter 266160: loss nan, time 120.80ms
iter 266170: loss nan, time 118.65ms
iter 266180: loss nan, time 119.41ms
iter 266190: loss nan, time 119.90ms
tensor(0.1355)
iter 266200: loss nan, time 120.41ms
iter 266210: loss nan, time 121.15ms
iter 266220: loss nan, time 118.71ms
iter 266230: loss nan, time 122.12ms
iter 266240: loss nan, time 123.00ms
step 266250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 266250: loss nan, time 2919.87ms
iter 266260: loss nan, time 119.95ms
iter 266270: loss nan, time 120.88ms
iter 266280: loss nan, time 120.84ms
iter 266290: loss nan, time 120.84ms
tensor(0.1577)
iter 266300: loss nan, time 121.01ms
iter 266310: loss nan, time 118.72ms
iter 266320: loss nan, time 119.10ms
iter 266330: loss nan, time 119.40ms
iter 266340: loss nan, time 120.37ms
iter 266350: loss nan, time 120.78ms
iter 266360: loss nan, time 120.05ms
iter 266370: loss nan, time 121.28ms
iter 266380: loss nan, time 121.49ms
iter 266390: loss nan, time 123.01ms
tensor(0.1813)
iter 266400: loss nan, time 122.47ms
iter 266410: loss nan, time 122.87ms
iter 266420: loss nan, time 120.89ms
iter 266430: loss nan, time 120.77ms
iter 266440: loss nan, time 120.38ms
iter 266450: loss nan, time 120.81ms
iter 266460: loss nan, time 120.60ms
iter 266470: loss nan, time 119.11ms
iter 266480: loss nan, time 119.41ms
iter 266490: loss nan, time 118.92ms
tensor(0.2061)
step 266500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 266500: loss nan, time 2900.53ms
iter 266510: loss nan, time 117.82ms
iter 266520: loss nan, time 118.81ms
iter 266530: loss nan, time 118.56ms
iter 266540: loss nan, time 120.35ms
iter 266550: loss nan, time 120.26ms
iter 266560: loss nan, time 120.82ms
iter 266570: loss nan, time 120.86ms
iter 266580: loss nan, time 118.47ms
iter 266590: loss nan, time 121.43ms
tensor(0.2321)
iter 266600: loss nan, time 122.47ms
iter 266610: loss nan, time 120.90ms
iter 266620: loss nan, time 123.37ms
iter 266630: loss nan, time 120.84ms
iter 266640: loss nan, time 121.15ms
iter 266650: loss nan, time 120.61ms
iter 266660: loss nan, time 118.57ms
iter 266670: loss nan, time 119.64ms
iter 266680: loss nan, time 119.80ms
iter 266690: loss nan, time 120.78ms
tensor(0.2591)
iter 266700: loss nan, time 121.15ms
iter 266710: loss nan, time 118.62ms
iter 266720: loss nan, time 119.28ms
iter 266730: loss nan, time 120.07ms
iter 266740: loss nan, time 120.84ms
step 266750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 266750: loss nan, time 2911.19ms
iter 266760: loss nan, time 120.95ms
iter 266770: loss nan, time 121.23ms
iter 266780: loss nan, time 122.44ms
iter 266790: loss nan, time 122.49ms
tensor(0.2871)
iter 266800: loss nan, time 123.33ms
iter 266810: loss nan, time 120.74ms
iter 266820: loss nan, time 120.72ms
iter 266830: loss nan, time 120.58ms
iter 266840: loss nan, time 120.79ms
iter 266850: loss nan, time 120.94ms
iter 266860: loss nan, time 120.61ms
iter 266870: loss nan, time 118.54ms
iter 266880: loss nan, time 118.70ms
iter 266890: loss nan, time 119.82ms
tensor(0.3159)
iter 266900: loss nan, time 120.78ms
iter 266910: loss nan, time 120.45ms
iter 266920: loss nan, time 120.72ms
iter 266930: loss nan, time 118.54ms
iter 266940: loss nan, time 121.01ms
iter 266950: loss nan, time 121.07ms
iter 266960: loss nan, time 120.87ms
iter 266970: loss nan, time 122.86ms
iter 266980: loss nan, time 120.57ms
iter 266990: loss nan, time 120.92ms
tensor(0.3455)
step 267000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 267000: loss nan, time 2909.40ms
iter 267010: loss nan, time 122.04ms
iter 267020: loss nan, time 118.40ms
iter 267030: loss nan, time 120.68ms
iter 267040: loss nan, time 120.58ms
iter 267050: loss nan, time 121.17ms
iter 267060: loss nan, time 120.62ms
iter 267070: loss nan, time 118.19ms
iter 267080: loss nan, time 119.97ms
iter 267090: loss nan, time 120.05ms
tensor(0.3757)
iter 267100: loss nan, time 121.47ms
iter 267110: loss nan, time 120.99ms
iter 267120: loss nan, time 120.68ms
iter 267130: loss nan, time 121.04ms
iter 267140: loss nan, time 121.61ms
iter 267150: loss nan, time 122.27ms
iter 267160: loss nan, time 122.94ms
iter 267170: loss nan, time 122.90ms
iter 267180: loss nan, time 120.84ms
iter 267190: loss nan, time 120.97ms
tensor(0.4063)
iter 267200: loss nan, time 120.95ms
iter 267210: loss nan, time 120.40ms
iter 267220: loss nan, time 121.17ms
iter 267230: loss nan, time 119.46ms
iter 267240: loss nan, time 117.50ms
step 267250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 267250: loss nan, time 2915.48ms
iter 267260: loss nan, time 117.44ms
iter 267270: loss nan, time 115.95ms
iter 267280: loss nan, time 115.44ms
iter 267290: loss nan, time 117.21ms
tensor(0.4373)
iter 267300: loss nan, time 115.88ms
iter 267310: loss nan, time 116.87ms
iter 267320: loss nan, time 117.68ms
iter 267330: loss nan, time 119.15ms
iter 267340: loss nan, time 116.27ms
iter 267350: loss nan, time 119.46ms
iter 267360: loss nan, time 118.94ms
iter 267370: loss nan, time 120.58ms
iter 267380: loss nan, time 120.76ms
iter 267390: loss nan, time 120.82ms
tensor(0.4686)
iter 267400: loss nan, time 121.93ms
iter 267410: loss nan, time 118.98ms
iter 267420: loss nan, time 123.13ms
iter 267430: loss nan, time 123.88ms
iter 267440: loss nan, time 119.95ms
iter 267450: loss nan, time 122.78ms
iter 267460: loss nan, time 120.75ms
iter 267470: loss nan, time 120.85ms
iter 267480: loss nan, time 121.21ms
iter 267490: loss nan, time 119.60ms
tensor(0.5000)
step 267500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 267500: loss nan, time 2910.34ms
iter 267510: loss nan, time 120.69ms
iter 267520: loss nan, time 120.58ms
iter 267530: loss nan, time 120.80ms
iter 267540: loss nan, time 120.33ms
iter 267550: loss nan, time 118.98ms
iter 267560: loss nan, time 119.03ms
iter 267570: loss nan, time 118.45ms
iter 267580: loss nan, time 119.87ms
iter 267590: loss nan, time 120.17ms
tensor(0.5314)
iter 267600: loss nan, time 119.80ms
iter 267610: loss nan, time 120.32ms
iter 267620: loss nan, time 121.06ms
iter 267630: loss nan, time 120.76ms
iter 267640: loss nan, time 120.49ms
iter 267650: loss nan, time 121.35ms
iter 267660: loss nan, time 122.26ms
iter 267670: loss nan, time 122.45ms
iter 267680: loss nan, time 123.08ms
iter 267690: loss nan, time 121.03ms
tensor(0.5627)
iter 267700: loss nan, time 121.13ms
iter 267710: loss nan, time 120.14ms
iter 267720: loss nan, time 120.79ms
iter 267730: loss nan, time 121.11ms
iter 267740: loss nan, time 120.80ms
step 267750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 267750: loss nan, time 2916.15ms
iter 267760: loss nan, time 117.78ms
iter 267770: loss nan, time 118.72ms
iter 267780: loss nan, time 119.30ms
iter 267790: loss nan, time 120.08ms
tensor(0.5937)
iter 267800: loss nan, time 120.51ms
iter 267810: loss nan, time 119.84ms
iter 267820: loss nan, time 118.75ms
iter 267830: loss nan, time 120.89ms
iter 267840: loss nan, time 121.66ms
iter 267850: loss nan, time 120.84ms
iter 267860: loss nan, time 122.07ms
iter 267870: loss nan, time 120.56ms
iter 267880: loss nan, time 122.27ms
iter 267890: loss nan, time 121.30ms
tensor(0.6243)
iter 267900: loss nan, time 119.06ms
iter 267910: loss nan, time 120.02ms
iter 267920: loss nan, time 120.71ms
iter 267930: loss nan, time 121.01ms
iter 267940: loss nan, time 120.81ms
iter 267950: loss nan, time 118.71ms
iter 267960: loss nan, time 118.50ms
iter 267970: loss nan, time 119.50ms
iter 267980: loss nan, time 120.24ms
iter 267990: loss nan, time 120.78ms
tensor(0.6545)
step 268000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 268000: loss nan, time 2916.03ms
iter 268010: loss nan, time 120.70ms
iter 268020: loss nan, time 121.18ms
iter 268030: loss nan, time 120.83ms
iter 268040: loss nan, time 121.75ms
iter 268050: loss nan, time 122.81ms
iter 268060: loss nan, time 122.99ms
iter 268070: loss nan, time 120.55ms
iter 268080: loss nan, time 120.65ms
iter 268090: loss nan, time 120.78ms
tensor(0.6841)
iter 268100: loss nan, time 120.90ms
iter 268110: loss nan, time 120.81ms
iter 268120: loss nan, time 120.54ms
iter 268130: loss nan, time 120.62ms
iter 268140: loss nan, time 118.85ms
iter 268150: loss nan, time 118.94ms
iter 268160: loss nan, time 119.49ms
iter 268170: loss nan, time 118.56ms
iter 268180: loss nan, time 120.52ms
iter 268190: loss nan, time 120.80ms
tensor(0.7129)
iter 268200: loss nan, time 122.20ms
iter 268210: loss nan, time 120.92ms
iter 268220: loss nan, time 119.04ms
iter 268230: loss nan, time 121.77ms
iter 268240: loss nan, time 122.72ms
step 268250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 268250: loss nan, time 2903.03ms
iter 268260: loss nan, time 115.43ms
iter 268270: loss nan, time 117.19ms
iter 268280: loss nan, time 118.39ms
iter 268290: loss nan, time 115.71ms
tensor(0.7409)
iter 268300: loss nan, time 117.24ms
iter 268310: loss nan, time 118.79ms
iter 268320: loss nan, time 116.11ms
iter 268330: loss nan, time 116.89ms
iter 268340: loss nan, time 118.66ms
iter 268350: loss nan, time 115.51ms
iter 268360: loss nan, time 117.07ms
iter 268370: loss nan, time 118.28ms
iter 268380: loss nan, time 116.02ms
iter 268390: loss nan, time 116.90ms
tensor(0.7679)
iter 268400: loss nan, time 118.86ms
iter 268410: loss nan, time 115.55ms
iter 268420: loss nan, time 117.18ms
iter 268430: loss nan, time 118.12ms
iter 268440: loss nan, time 115.94ms
iter 268450: loss nan, time 116.98ms
iter 268460: loss nan, time 117.78ms
iter 268470: loss nan, time 117.26ms
iter 268480: loss nan, time 117.17ms
iter 268490: loss nan, time 117.29ms
tensor(0.7939)
step 268500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 268500: loss nan, time 2913.88ms
iter 268510: loss nan, time 114.90ms
iter 268520: loss nan, time 118.09ms
iter 268530: loss nan, time 116.88ms
iter 268540: loss nan, time 114.91ms
iter 268550: loss nan, time 117.32ms
iter 268560: loss nan, time 116.95ms
iter 268570: loss nan, time 114.98ms
iter 268580: loss nan, time 119.20ms
iter 268590: loss nan, time 116.28ms
tensor(0.8187)
iter 268600: loss nan, time 114.91ms
iter 268610: loss nan, time 119.14ms
iter 268620: loss nan, time 116.64ms
iter 268630: loss nan, time 114.91ms
iter 268640: loss nan, time 118.51ms
iter 268650: loss nan, time 117.03ms
iter 268660: loss nan, time 114.84ms
iter 268670: loss nan, time 118.31ms
iter 268680: loss nan, time 116.79ms
iter 268690: loss nan, time 114.92ms
tensor(0.8423)
iter 268700: loss nan, time 117.48ms
iter 268710: loss nan, time 116.84ms
iter 268720: loss nan, time 115.11ms
iter 268730: loss nan, time 117.71ms
iter 268740: loss nan, time 117.68ms
step 268750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 268750: loss nan, time 2900.99ms
iter 268760: loss nan, time 118.84ms
iter 268770: loss nan, time 116.47ms
iter 268780: loss nan, time 116.98ms
iter 268790: loss nan, time 116.51ms
tensor(0.8645)
iter 268800: loss nan, time 117.80ms
iter 268810: loss nan, time 114.85ms
iter 268820: loss nan, time 116.99ms
iter 268830: loss nan, time 116.21ms
iter 268840: loss nan, time 116.28ms
iter 268850: loss nan, time 116.92ms
iter 268860: loss nan, time 116.15ms
iter 268870: loss nan, time 115.49ms
iter 268880: loss nan, time 116.04ms
iter 268890: loss nan, time 116.98ms
tensor(0.8853)
iter 268900: loss nan, time 115.46ms
iter 268910: loss nan, time 116.94ms
iter 268920: loss nan, time 117.03ms
iter 268930: loss nan, time 115.49ms
iter 268940: loss nan, time 116.95ms
iter 268950: loss nan, time 115.99ms
iter 268960: loss nan, time 117.85ms
iter 268970: loss nan, time 116.32ms
iter 268980: loss nan, time 117.49ms
iter 268990: loss nan, time 116.42ms
tensor(0.9045)
step 269000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 269000: loss nan, time 2906.68ms
iter 269010: loss nan, time 116.72ms
iter 269020: loss nan, time 116.24ms
iter 269030: loss nan, time 116.83ms
iter 269040: loss nan, time 114.50ms
iter 269050: loss nan, time 116.31ms
iter 269060: loss nan, time 116.17ms
iter 269070: loss nan, time 116.81ms
iter 269080: loss nan, time 115.09ms
iter 269090: loss nan, time 117.13ms
tensor(0.9222)
iter 269100: loss nan, time 115.53ms
iter 269110: loss nan, time 116.39ms
iter 269120: loss nan, time 117.05ms
iter 269130: loss nan, time 116.01ms
iter 269140: loss nan, time 116.44ms
iter 269150: loss nan, time 116.13ms
iter 269160: loss nan, time 114.82ms
iter 269170: loss nan, time 116.69ms
iter 269180: loss nan, time 116.87ms
iter 269190: loss nan, time 116.86ms
tensor(0.9382)
iter 269200: loss nan, time 117.27ms
iter 269210: loss nan, time 117.76ms
iter 269220: loss nan, time 114.15ms
iter 269230: loss nan, time 116.88ms
iter 269240: loss nan, time 117.34ms
step 269250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 269250: loss nan, time 2916.67ms
iter 269260: loss nan, time 116.96ms
iter 269270: loss nan, time 114.88ms
iter 269280: loss nan, time 116.74ms
iter 269290: loss nan, time 114.85ms
tensor(0.9524)
iter 269300: loss nan, time 117.72ms
iter 269310: loss nan, time 116.92ms
iter 269320: loss nan, time 116.91ms
iter 269330: loss nan, time 114.87ms
iter 269340: loss nan, time 117.30ms
iter 269350: loss nan, time 114.94ms
iter 269360: loss nan, time 117.87ms
iter 269370: loss nan, time 117.38ms
iter 269380: loss nan, time 116.40ms
iter 269390: loss nan, time 116.22ms
tensor(0.9649)
iter 269400: loss nan, time 117.01ms
iter 269410: loss nan, time 116.44ms
iter 269420: loss nan, time 117.23ms
iter 269430: loss nan, time 117.94ms
iter 269440: loss nan, time 117.49ms
iter 269450: loss nan, time 116.29ms
iter 269460: loss nan, time 116.52ms
iter 269470: loss nan, time 116.49ms
iter 269480: loss nan, time 117.03ms
iter 269490: loss nan, time 116.78ms
tensor(0.9755)
step 269500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 269500: loss nan, time 2909.99ms
iter 269510: loss nan, time 116.84ms
iter 269520: loss nan, time 115.95ms
iter 269530: loss nan, time 116.93ms
iter 269540: loss nan, time 114.32ms
iter 269550: loss nan, time 116.87ms
iter 269560: loss nan, time 116.28ms
iter 269570: loss nan, time 116.88ms
iter 269580: loss nan, time 115.01ms
iter 269590: loss nan, time 116.87ms
tensor(0.9843)
iter 269600: loss nan, time 115.39ms
iter 269610: loss nan, time 116.40ms
iter 269620: loss nan, time 117.02ms
iter 269630: loss nan, time 115.98ms
iter 269640: loss nan, time 116.20ms
iter 269650: loss nan, time 116.22ms
iter 269660: loss nan, time 114.78ms
iter 269670: loss nan, time 116.64ms
iter 269680: loss nan, time 117.16ms
iter 269690: loss nan, time 116.89ms
tensor(0.9911)
iter 269700: loss nan, time 116.98ms
iter 269710: loss nan, time 117.12ms
iter 269720: loss nan, time 114.42ms
iter 269730: loss nan, time 116.47ms
iter 269740: loss nan, time 116.07ms
step 269750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 269750: loss nan, time 2912.40ms
iter 269760: loss nan, time 118.04ms
iter 269770: loss nan, time 114.75ms
iter 269780: loss nan, time 117.01ms
iter 269790: loss nan, time 116.41ms
tensor(0.9961)
iter 269800: loss nan, time 117.76ms
iter 269810: loss nan, time 116.56ms
iter 269820: loss nan, time 116.74ms
iter 269830: loss nan, time 113.91ms
iter 269840: loss nan, time 116.97ms
iter 269850: loss nan, time 115.43ms
iter 269860: loss nan, time 117.08ms
iter 269870: loss nan, time 117.27ms
iter 269880: loss nan, time 116.29ms
iter 269890: loss nan, time 114.77ms
tensor(0.9990)
iter 269900: loss nan, time 117.28ms
iter 269910: loss nan, time 114.99ms
iter 269920: loss nan, time 116.12ms
iter 269930: loss nan, time 117.65ms
iter 269940: loss nan, time 116.98ms
iter 269950: loss nan, time 114.87ms
iter 269960: loss nan, time 116.75ms
iter 269970: loss nan, time 115.04ms
iter 269980: loss nan, time 116.96ms
iter 269990: loss nan, time 115.76ms
tensor(1.)
step 270000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 270000: loss nan, time 2900.85ms
iter 270010: loss nan, time 115.94ms
iter 270020: loss nan, time 116.78ms
iter 270030: loss nan, time 116.35ms
iter 270040: loss nan, time 117.21ms
iter 270050: loss nan, time 117.30ms
iter 270060: loss nan, time 116.40ms
iter 270070: loss nan, time 117.02ms
iter 270080: loss nan, time 116.78ms
iter 270090: loss nan, time 118.43ms
tensor(0.9990)
iter 270100: loss nan, time 116.47ms
iter 270110: loss nan, time 116.89ms
iter 270120: loss nan, time 116.31ms
iter 270130: loss nan, time 117.06ms
iter 270140: loss nan, time 116.90ms
iter 270150: loss nan, time 118.32ms
iter 270160: loss nan, time 117.55ms
iter 270170: loss nan, time 116.76ms
iter 270180: loss nan, time 116.82ms
iter 270190: loss nan, time 116.09ms
tensor(0.9961)
iter 270200: loss nan, time 117.76ms
iter 270210: loss nan, time 116.63ms
iter 270220: loss nan, time 116.84ms
iter 270230: loss nan, time 116.81ms
iter 270240: loss nan, time 116.43ms
step 270250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 270250: loss nan, time 2921.45ms
iter 270260: loss nan, time 116.25ms
iter 270270: loss nan, time 117.40ms
iter 270280: loss nan, time 116.28ms
iter 270290: loss nan, time 116.67ms
tensor(0.9911)
iter 270300: loss nan, time 117.75ms
iter 270310: loss nan, time 116.94ms
iter 270320: loss nan, time 116.42ms
iter 270330: loss nan, time 117.02ms
iter 270340: loss nan, time 117.02ms
iter 270350: loss nan, time 116.70ms
iter 270360: loss nan, time 116.95ms
iter 270370: loss nan, time 116.14ms
iter 270380: loss nan, time 116.65ms
iter 270390: loss nan, time 116.95ms
tensor(0.9843)
iter 270400: loss nan, time 117.55ms
iter 270410: loss nan, time 117.46ms
iter 270420: loss nan, time 117.35ms
iter 270430: loss nan, time 117.25ms
iter 270440: loss nan, time 116.94ms
iter 270450: loss nan, time 117.00ms
iter 270460: loss nan, time 116.19ms
iter 270470: loss nan, time 116.87ms
iter 270480: loss nan, time 117.88ms
iter 270490: loss nan, time 116.24ms
tensor(0.9755)
step 270500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 270500: loss nan, time 2901.55ms
iter 270510: loss nan, time 117.02ms
iter 270520: loss nan, time 114.89ms
iter 270530: loss nan, time 118.96ms
iter 270540: loss nan, time 117.05ms
iter 270550: loss nan, time 114.61ms
iter 270560: loss nan, time 119.44ms
iter 270570: loss nan, time 117.18ms
iter 270580: loss nan, time 114.57ms
iter 270590: loss nan, time 118.57ms
tensor(0.9649)
iter 270600: loss nan, time 117.74ms
iter 270610: loss nan, time 115.02ms
iter 270620: loss nan, time 118.89ms
iter 270630: loss nan, time 117.45ms
iter 270640: loss nan, time 114.26ms
iter 270650: loss nan, time 117.26ms
iter 270660: loss nan, time 116.35ms
iter 270670: loss nan, time 115.06ms
iter 270680: loss nan, time 119.11ms
iter 270690: loss nan, time 117.54ms
tensor(0.9524)
iter 270700: loss nan, time 115.86ms
iter 270710: loss nan, time 116.97ms
iter 270720: loss nan, time 117.42ms
iter 270730: loss nan, time 115.72ms
iter 270740: loss nan, time 116.94ms
step 270750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 270750: loss nan, time 2919.52ms
iter 270760: loss nan, time 116.78ms
iter 270770: loss nan, time 117.22ms
iter 270780: loss nan, time 119.16ms
iter 270790: loss nan, time 115.37ms
tensor(0.9382)
iter 270800: loss nan, time 114.86ms
iter 270810: loss nan, time 118.90ms
iter 270820: loss nan, time 117.25ms
iter 270830: loss nan, time 114.92ms
iter 270840: loss nan, time 118.17ms
iter 270850: loss nan, time 116.99ms
iter 270860: loss nan, time 115.16ms
iter 270870: loss nan, time 118.89ms
iter 270880: loss nan, time 117.75ms
iter 270890: loss nan, time 114.17ms
tensor(0.9222)
iter 270900: loss nan, time 119.10ms
iter 270910: loss nan, time 117.27ms
iter 270920: loss nan, time 116.00ms
iter 270930: loss nan, time 116.94ms
iter 270940: loss nan, time 118.51ms
iter 270950: loss nan, time 115.71ms
iter 270960: loss nan, time 115.93ms
iter 270970: loss nan, time 118.28ms
iter 270980: loss nan, time 115.29ms
iter 270990: loss nan, time 116.90ms
tensor(0.9045)
step 271000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 271000: loss nan, time 2903.60ms
iter 271010: loss nan, time 114.91ms
iter 271020: loss nan, time 117.21ms
iter 271030: loss nan, time 116.98ms
iter 271040: loss nan, time 115.07ms
iter 271050: loss nan, time 116.92ms
iter 271060: loss nan, time 117.04ms
iter 271070: loss nan, time 115.58ms
iter 271080: loss nan, time 117.03ms
iter 271090: loss nan, time 117.14ms
tensor(0.8853)
iter 271100: loss nan, time 115.58ms
iter 271110: loss nan, time 116.94ms
iter 271120: loss nan, time 117.10ms
iter 271130: loss nan, time 116.18ms
iter 271140: loss nan, time 118.04ms
iter 271150: loss nan, time 117.13ms
iter 271160: loss nan, time 114.69ms
iter 271170: loss nan, time 116.90ms
iter 271180: loss nan, time 118.19ms
iter 271190: loss nan, time 116.05ms
tensor(0.8645)
iter 271200: loss nan, time 117.87ms
iter 271210: loss nan, time 118.40ms
iter 271220: loss nan, time 116.30ms
iter 271230: loss nan, time 115.96ms
iter 271240: loss nan, time 118.27ms
step 271250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 271250: loss nan, time 2918.46ms
iter 271260: loss nan, time 117.00ms
iter 271270: loss nan, time 119.20ms
iter 271280: loss nan, time 117.28ms
iter 271290: loss nan, time 116.37ms
tensor(0.8423)
iter 271300: loss nan, time 117.66ms
iter 271310: loss nan, time 117.00ms
iter 271320: loss nan, time 114.22ms
iter 271330: loss nan, time 117.02ms
iter 271340: loss nan, time 117.23ms
iter 271350: loss nan, time 114.85ms
iter 271360: loss nan, time 116.89ms
iter 271370: loss nan, time 117.00ms
iter 271380: loss nan, time 116.16ms
iter 271390: loss nan, time 117.44ms
tensor(0.8187)
iter 271400: loss nan, time 117.66ms
iter 271410: loss nan, time 116.16ms
iter 271420: loss nan, time 117.33ms
iter 271430: loss nan, time 117.28ms
iter 271440: loss nan, time 114.86ms
iter 271450: loss nan, time 117.03ms
iter 271460: loss nan, time 116.90ms
iter 271470: loss nan, time 116.39ms
iter 271480: loss nan, time 115.95ms
iter 271490: loss nan, time 117.01ms
tensor(0.7939)
step 271500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 271500: loss nan, time 2901.54ms
iter 271510: loss nan, time 118.45ms
iter 271520: loss nan, time 115.86ms
iter 271530: loss nan, time 115.42ms
iter 271540: loss nan, time 118.72ms
iter 271550: loss nan, time 114.86ms
iter 271560: loss nan, time 116.95ms
iter 271570: loss nan, time 118.15ms
iter 271580: loss nan, time 116.11ms
iter 271590: loss nan, time 113.80ms
tensor(0.7679)
iter 271600: loss nan, time 119.10ms
iter 271610: loss nan, time 113.64ms
iter 271620: loss nan, time 117.10ms
iter 271630: loss nan, time 118.41ms
iter 271640: loss nan, time 116.07ms
iter 271650: loss nan, time 114.82ms
iter 271660: loss nan, time 119.15ms
iter 271670: loss nan, time 114.91ms
iter 271680: loss nan, time 115.43ms
iter 271690: loss nan, time 118.84ms
tensor(0.7409)
iter 271700: loss nan, time 117.66ms
iter 271710: loss nan, time 114.95ms
iter 271720: loss nan, time 117.68ms
iter 271730: loss nan, time 114.38ms
iter 271740: loss nan, time 115.02ms
step 271750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 271750: loss nan, time 2920.43ms
iter 271760: loss nan, time 117.80ms
iter 271770: loss nan, time 114.93ms
iter 271780: loss nan, time 114.99ms
iter 271790: loss nan, time 118.42ms
tensor(0.7129)
iter 271800: loss nan, time 115.73ms
iter 271810: loss nan, time 116.01ms
iter 271820: loss nan, time 118.60ms
iter 271830: loss nan, time 116.04ms
iter 271840: loss nan, time 113.91ms
iter 271850: loss nan, time 118.26ms
iter 271860: loss nan, time 114.54ms
iter 271870: loss nan, time 117.13ms
iter 271880: loss nan, time 118.44ms
iter 271890: loss nan, time 116.39ms
tensor(0.6841)
iter 271900: loss nan, time 115.24ms
iter 271910: loss nan, time 119.44ms
iter 271920: loss nan, time 115.61ms
iter 271930: loss nan, time 115.90ms
iter 271940: loss nan, time 119.12ms
iter 271950: loss nan, time 117.13ms
iter 271960: loss nan, time 114.86ms
iter 271970: loss nan, time 117.08ms
iter 271980: loss nan, time 114.93ms
iter 271990: loss nan, time 115.64ms
tensor(0.6545)
step 272000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 272000: loss nan, time 2921.15ms
iter 272010: loss nan, time 116.87ms
iter 272020: loss nan, time 113.94ms
iter 272030: loss nan, time 114.99ms
iter 272040: loss nan, time 117.10ms
iter 272050: loss nan, time 114.95ms
iter 272060: loss nan, time 117.49ms
iter 272070: loss nan, time 117.61ms
iter 272080: loss nan, time 115.79ms
iter 272090: loss nan, time 113.98ms
tensor(0.6243)
iter 272100: loss nan, time 118.95ms
iter 272110: loss nan, time 115.77ms
iter 272120: loss nan, time 117.13ms
iter 272130: loss nan, time 118.39ms
iter 272140: loss nan, time 116.38ms
iter 272150: loss nan, time 114.94ms
iter 272160: loss nan, time 118.30ms
iter 272170: loss nan, time 114.94ms
iter 272180: loss nan, time 115.98ms
iter 272190: loss nan, time 118.38ms
tensor(0.5937)
iter 272200: loss nan, time 117.01ms
iter 272210: loss nan, time 114.88ms
iter 272220: loss nan, time 118.37ms
iter 272230: loss nan, time 115.36ms
iter 272240: loss nan, time 116.92ms
step 272250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 272250: loss nan, time 2899.31ms
iter 272260: loss nan, time 114.91ms
iter 272270: loss nan, time 117.00ms
iter 272280: loss nan, time 116.94ms
iter 272290: loss nan, time 115.56ms
tensor(0.5627)
iter 272300: loss nan, time 117.64ms
iter 272310: loss nan, time 116.38ms
iter 272320: loss nan, time 115.10ms
iter 272330: loss nan, time 118.83ms
iter 272340: loss nan, time 116.82ms
iter 272350: loss nan, time 114.87ms
iter 272360: loss nan, time 117.01ms
iter 272370: loss nan, time 117.31ms
iter 272380: loss nan, time 114.82ms
iter 272390: loss nan, time 116.80ms
tensor(0.5314)
iter 272400: loss nan, time 117.19ms
iter 272410: loss nan, time 114.89ms
iter 272420: loss nan, time 116.99ms
iter 272430: loss nan, time 116.98ms
iter 272440: loss nan, time 115.49ms
iter 272450: loss nan, time 117.50ms
iter 272460: loss nan, time 117.23ms
iter 272470: loss nan, time 116.13ms
iter 272480: loss nan, time 116.71ms
iter 272490: loss nan, time 118.26ms
tensor(0.5000)
step 272500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 272500: loss nan, time 2908.60ms
iter 272510: loss nan, time 116.99ms
iter 272520: loss nan, time 118.46ms
iter 272530: loss nan, time 115.83ms
iter 272540: loss nan, time 116.95ms
iter 272550: loss nan, time 118.87ms
iter 272560: loss nan, time 116.79ms
iter 272570: loss nan, time 117.00ms
iter 272580: loss nan, time 119.05ms
iter 272590: loss nan, time 116.95ms
tensor(0.4686)
iter 272600: loss nan, time 115.71ms
iter 272610: loss nan, time 117.07ms
iter 272620: loss nan, time 116.96ms
iter 272630: loss nan, time 115.53ms
iter 272640: loss nan, time 116.66ms
iter 272650: loss nan, time 117.16ms
iter 272660: loss nan, time 116.71ms
iter 272670: loss nan, time 116.92ms
iter 272680: loss nan, time 119.31ms
iter 272690: loss nan, time 116.99ms
tensor(0.4373)
iter 272700: loss nan, time 115.59ms
iter 272710: loss nan, time 114.91ms
iter 272720: loss nan, time 116.95ms
iter 272730: loss nan, time 115.24ms
iter 272740: loss nan, time 117.31ms
step 272750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 272750: loss nan, time 2900.20ms
iter 272760: loss nan, time 116.37ms
iter 272770: loss nan, time 117.56ms
iter 272780: loss nan, time 116.97ms
iter 272790: loss nan, time 114.96ms
tensor(0.4063)
iter 272800: loss nan, time 117.78ms
iter 272810: loss nan, time 116.97ms
iter 272820: loss nan, time 114.96ms
iter 272830: loss nan, time 117.11ms
iter 272840: loss nan, time 116.87ms
iter 272850: loss nan, time 115.27ms
iter 272860: loss nan, time 117.01ms
iter 272870: loss nan, time 118.18ms
iter 272880: loss nan, time 116.49ms
iter 272890: loss nan, time 116.90ms
tensor(0.3757)
iter 272900: loss nan, time 117.71ms
iter 272910: loss nan, time 116.92ms
iter 272920: loss nan, time 115.30ms
iter 272930: loss nan, time 119.60ms
iter 272940: loss nan, time 117.03ms
iter 272950: loss nan, time 114.50ms
iter 272960: loss nan, time 114.93ms
iter 272970: loss nan, time 116.98ms
iter 272980: loss nan, time 115.05ms
iter 272990: loss nan, time 118.85ms
tensor(0.3455)
step 273000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 273000: loss nan, time 2906.58ms
iter 273010: loss nan, time 116.98ms
iter 273020: loss nan, time 119.33ms
iter 273030: loss nan, time 116.94ms
iter 273040: loss nan, time 115.84ms
iter 273050: loss nan, time 117.57ms
iter 273060: loss nan, time 116.43ms
iter 273070: loss nan, time 115.27ms
iter 273080: loss nan, time 117.02ms
iter 273090: loss nan, time 117.22ms
tensor(0.3159)
iter 273100: loss nan, time 115.57ms
iter 273110: loss nan, time 117.55ms
iter 273120: loss nan, time 117.00ms
iter 273130: loss nan, time 114.03ms
iter 273140: loss nan, time 117.34ms
iter 273150: loss nan, time 117.25ms
iter 273160: loss nan, time 115.84ms
iter 273170: loss nan, time 117.20ms
iter 273180: loss nan, time 117.94ms
iter 273190: loss nan, time 116.15ms
tensor(0.2871)
iter 273200: loss nan, time 116.53ms
iter 273210: loss nan, time 117.03ms
iter 273220: loss nan, time 115.76ms
iter 273230: loss nan, time 117.87ms
iter 273240: loss nan, time 118.73ms
step 273250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 273250: loss nan, time 2916.77ms
iter 273260: loss nan, time 117.07ms
iter 273270: loss nan, time 119.18ms
iter 273280: loss nan, time 117.01ms
iter 273290: loss nan, time 115.78ms
tensor(0.2591)
iter 273300: loss nan, time 118.36ms
iter 273310: loss nan, time 117.38ms
iter 273320: loss nan, time 115.63ms
iter 273330: loss nan, time 117.47ms
iter 273340: loss nan, time 117.14ms
iter 273350: loss nan, time 115.80ms
iter 273360: loss nan, time 116.48ms
iter 273370: loss nan, time 117.80ms
iter 273380: loss nan, time 115.20ms
iter 273390: loss nan, time 117.15ms
tensor(0.2321)
iter 273400: loss nan, time 118.17ms
iter 273410: loss nan, time 116.53ms
iter 273420: loss nan, time 116.95ms
iter 273430: loss nan, time 118.17ms
iter 273440: loss nan, time 116.34ms
iter 273450: loss nan, time 115.92ms
iter 273460: loss nan, time 117.20ms
iter 273470: loss nan, time 115.46ms
iter 273480: loss nan, time 117.02ms
iter 273490: loss nan, time 118.16ms
tensor(0.2061)
step 273500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 273500: loss nan, time 2908.64ms
iter 273510: loss nan, time 116.91ms
iter 273520: loss nan, time 117.90ms
iter 273530: loss nan, time 115.54ms
iter 273540: loss nan, time 116.68ms
iter 273550: loss nan, time 118.18ms
iter 273560: loss nan, time 115.37ms
iter 273570: loss nan, time 117.37ms
iter 273580: loss nan, time 118.32ms
iter 273590: loss nan, time 116.27ms
tensor(0.1813)
iter 273600: loss nan, time 117.46ms
iter 273610: loss nan, time 116.52ms
iter 273620: loss nan, time 116.54ms
iter 273630: loss nan, time 116.52ms
iter 273640: loss nan, time 118.81ms
iter 273650: loss nan, time 116.87ms
iter 273660: loss nan, time 117.03ms
iter 273670: loss nan, time 118.78ms
iter 273680: loss nan, time 116.24ms
iter 273690: loss nan, time 117.41ms
tensor(0.1577)
iter 273700: loss nan, time 119.99ms
iter 273710: loss nan, time 116.34ms
iter 273720: loss nan, time 122.42ms
iter 273730: loss nan, time 120.96ms
iter 273740: loss nan, time 117.54ms
step 273750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 273750: loss nan, time 2908.75ms
iter 273760: loss nan, time 119.25ms
iter 273770: loss nan, time 117.09ms
iter 273780: loss nan, time 114.75ms
iter 273790: loss nan, time 117.25ms
tensor(0.1355)
iter 273800: loss nan, time 115.55ms
iter 273810: loss nan, time 115.06ms
iter 273820: loss nan, time 119.42ms
iter 273830: loss nan, time 116.99ms
iter 273840: loss nan, time 119.30ms
iter 273850: loss nan, time 117.44ms
iter 273860: loss nan, time 117.29ms
iter 273870: loss nan, time 117.01ms
iter 273880: loss nan, time 114.97ms
iter 273890: loss nan, time 116.99ms
tensor(0.1147)
iter 273900: loss nan, time 117.74ms
iter 273910: loss nan, time 115.70ms
iter 273920: loss nan, time 116.89ms
iter 273930: loss nan, time 117.02ms
iter 273940: loss nan, time 115.59ms
iter 273950: loss nan, time 117.45ms
iter 273960: loss nan, time 116.86ms
iter 273970: loss nan, time 116.33ms
iter 273980: loss nan, time 116.11ms
iter 273990: loss nan, time 116.96ms
tensor(0.0955)
step 274000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 274000: loss nan, time 2904.56ms
iter 274010: loss nan, time 119.23ms
iter 274020: loss nan, time 117.04ms
iter 274030: loss nan, time 114.98ms
iter 274040: loss nan, time 117.08ms
iter 274050: loss nan, time 114.77ms
iter 274060: loss nan, time 115.59ms
iter 274070: loss nan, time 118.82ms
iter 274080: loss nan, time 117.31ms
iter 274090: loss nan, time 114.24ms
tensor(0.0778)
iter 274100: loss nan, time 118.07ms
iter 274110: loss nan, time 116.74ms
iter 274120: loss nan, time 117.03ms
iter 274130: loss nan, time 117.01ms
iter 274140: loss nan, time 119.01ms
iter 274150: loss nan, time 117.09ms
iter 274160: loss nan, time 115.93ms
iter 274170: loss nan, time 116.94ms
iter 274180: loss nan, time 117.26ms
iter 274190: loss nan, time 114.56ms
tensor(0.0618)
iter 274200: loss nan, time 117.86ms
iter 274210: loss nan, time 116.93ms
iter 274220: loss nan, time 116.19ms
iter 274230: loss nan, time 116.85ms
iter 274240: loss nan, time 116.93ms
step 274250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 274250: loss nan, time 2905.65ms
iter 274260: loss nan, time 119.38ms
iter 274270: loss nan, time 117.06ms
iter 274280: loss nan, time 114.79ms
iter 274290: loss nan, time 117.11ms
tensor(0.0476)
iter 274300: loss nan, time 115.18ms
iter 274310: loss nan, time 115.06ms
iter 274320: loss nan, time 116.91ms
iter 274330: loss nan, time 117.02ms
iter 274340: loss nan, time 115.05ms
iter 274350: loss nan, time 116.87ms
iter 274360: loss nan, time 115.73ms
iter 274370: loss nan, time 116.06ms
iter 274380: loss nan, time 117.91ms
iter 274390: loss nan, time 118.27ms
tensor(0.0351)
iter 274400: loss nan, time 116.82ms
iter 274410: loss nan, time 116.71ms
iter 274420: loss nan, time 116.36ms
iter 274430: loss nan, time 116.34ms
iter 274440: loss nan, time 116.94ms
iter 274450: loss nan, time 118.51ms
iter 274460: loss nan, time 115.98ms
iter 274470: loss nan, time 116.98ms
iter 274480: loss nan, time 115.20ms
iter 274490: loss nan, time 116.14ms
tensor(0.0245)
step 274500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 274500: loss nan, time 2901.20ms
iter 274510: loss nan, time 117.18ms
iter 274520: loss nan, time 115.39ms
iter 274530: loss nan, time 114.91ms
iter 274540: loss nan, time 118.78ms
iter 274550: loss nan, time 115.74ms
iter 274560: loss nan, time 116.96ms
iter 274570: loss nan, time 119.26ms
iter 274580: loss nan, time 117.04ms
iter 274590: loss nan, time 115.16ms
tensor(0.0157)
iter 274600: loss nan, time 117.24ms
iter 274610: loss nan, time 114.83ms
iter 274620: loss nan, time 115.74ms
iter 274630: loss nan, time 117.42ms
iter 274640: loss nan, time 118.41ms
iter 274650: loss nan, time 116.02ms
iter 274660: loss nan, time 116.11ms
iter 274670: loss nan, time 115.96ms
iter 274680: loss nan, time 116.29ms
iter 274690: loss nan, time 116.90ms
tensor(0.0089)
iter 274700: loss nan, time 118.85ms
iter 274710: loss nan, time 116.04ms
iter 274720: loss nan, time 117.01ms
iter 274730: loss nan, time 115.97ms
iter 274740: loss nan, time 116.07ms
step 274750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 274750: loss nan, time 2902.91ms
iter 274760: loss nan, time 117.24ms
iter 274770: loss nan, time 115.78ms
iter 274780: loss nan, time 114.96ms
iter 274790: loss nan, time 118.88ms
tensor(0.0039)
iter 274800: loss nan, time 115.15ms
iter 274810: loss nan, time 116.25ms
iter 274820: loss nan, time 118.81ms
iter 274830: loss nan, time 116.59ms
iter 274840: loss nan, time 114.84ms
iter 274850: loss nan, time 120.14ms
iter 274860: loss nan, time 115.27ms
iter 274870: loss nan, time 115.21ms
iter 274880: loss nan, time 116.87ms
iter 274890: loss nan, time 117.67ms
tensor(0.0010)
iter 274900: loss nan, time 116.36ms
iter 274910: loss nan, time 115.95ms
iter 274920: loss nan, time 116.01ms
iter 274930: loss nan, time 116.28ms
iter 274940: loss nan, time 116.85ms
iter 274950: loss nan, time 118.42ms
iter 274960: loss nan, time 116.10ms
iter 274970: loss nan, time 116.92ms
iter 274980: loss nan, time 116.34ms
iter 274990: loss nan, time 116.45ms
tensor(0.0010)
step 275000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 275000: loss nan, time 2907.76ms
iter 275010: loss nan, time 118.47ms
iter 275020: loss nan, time 115.98ms
iter 275030: loss nan, time 114.92ms
iter 275040: loss nan, time 118.29ms
iter 275050: loss nan, time 114.94ms
iter 275060: loss nan, time 116.99ms
iter 275070: loss nan, time 119.38ms
iter 275080: loss nan, time 116.98ms
iter 275090: loss nan, time 114.82ms
tensor(0.0010)
iter 275100: loss nan, time 119.06ms
iter 275110: loss nan, time 114.96ms
iter 275120: loss nan, time 115.68ms
iter 275130: loss nan, time 117.48ms
iter 275140: loss nan, time 117.98ms
iter 275150: loss nan, time 116.04ms
iter 275160: loss nan, time 115.98ms
iter 275170: loss nan, time 116.29ms
iter 275180: loss nan, time 116.05ms
iter 275190: loss nan, time 117.19ms
tensor(0.0039)
iter 275200: loss nan, time 119.03ms
iter 275210: loss nan, time 116.11ms
iter 275220: loss nan, time 117.43ms
iter 275230: loss nan, time 116.52ms
iter 275240: loss nan, time 116.40ms
step 275250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 275250: loss nan, time 2906.10ms
iter 275260: loss nan, time 117.84ms
iter 275270: loss nan, time 116.48ms
iter 275280: loss nan, time 114.75ms
iter 275290: loss nan, time 118.68ms
tensor(0.0089)
iter 275300: loss nan, time 115.13ms
iter 275310: loss nan, time 116.99ms
iter 275320: loss nan, time 118.33ms
iter 275330: loss nan, time 116.02ms
iter 275340: loss nan, time 114.96ms
iter 275350: loss nan, time 119.11ms
iter 275360: loss nan, time 115.67ms
iter 275370: loss nan, time 117.58ms
iter 275380: loss nan, time 117.28ms
iter 275390: loss nan, time 116.79ms
tensor(0.0157)
iter 275400: loss nan, time 115.12ms
iter 275410: loss nan, time 115.95ms
iter 275420: loss nan, time 114.98ms
iter 275430: loss nan, time 116.07ms
iter 275440: loss nan, time 116.95ms
iter 275450: loss nan, time 119.24ms
iter 275460: loss nan, time 116.67ms
iter 275470: loss nan, time 116.89ms
iter 275480: loss nan, time 117.01ms
iter 275490: loss nan, time 116.94ms
tensor(0.0245)
step 275500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 275500: loss nan, time 2903.15ms
iter 275510: loss nan, time 116.49ms
iter 275520: loss nan, time 117.32ms
iter 275530: loss nan, time 115.01ms
iter 275540: loss nan, time 117.43ms
iter 275550: loss nan, time 114.85ms
iter 275560: loss nan, time 116.77ms
iter 275570: loss nan, time 117.26ms
iter 275580: loss nan, time 116.86ms
iter 275590: loss nan, time 116.23ms
tensor(0.0351)
iter 275600: loss nan, time 117.13ms
iter 275610: loss nan, time 115.34ms
iter 275620: loss nan, time 116.90ms
iter 275630: loss nan, time 116.93ms
iter 275640: loss nan, time 116.71ms
iter 275650: loss nan, time 116.92ms
iter 275660: loss nan, time 116.12ms
iter 275670: loss nan, time 117.05ms
iter 275680: loss nan, time 117.42ms
iter 275690: loss nan, time 116.36ms
tensor(0.0476)
iter 275700: loss nan, time 117.23ms
iter 275710: loss nan, time 117.06ms
iter 275720: loss nan, time 116.63ms
iter 275730: loss nan, time 116.99ms
iter 275740: loss nan, time 117.13ms
step 275750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 275750: loss nan, time 2908.09ms
iter 275760: loss nan, time 116.96ms
iter 275770: loss nan, time 116.41ms
iter 275780: loss nan, time 116.27ms
iter 275790: loss nan, time 117.29ms
tensor(0.0618)
iter 275800: loss nan, time 115.11ms
iter 275810: loss nan, time 116.85ms
iter 275820: loss nan, time 118.07ms
iter 275830: loss nan, time 118.21ms
iter 275840: loss nan, time 116.84ms
iter 275850: loss nan, time 117.41ms
iter 275860: loss nan, time 117.21ms
iter 275870: loss nan, time 116.88ms
iter 275880: loss nan, time 118.04ms
iter 275890: loss nan, time 117.21ms
tensor(0.0778)
iter 275900: loss nan, time 121.27ms
iter 275910: loss nan, time 121.10ms
iter 275920: loss nan, time 122.06ms
iter 275930: loss nan, time 120.74ms
iter 275940: loss nan, time 121.34ms
iter 275950: loss nan, time 120.78ms
iter 275960: loss nan, time 119.36ms
iter 275970: loss nan, time 122.48ms
iter 275980: loss nan, time 121.40ms
iter 275990: loss nan, time 118.91ms
tensor(0.0955)
step 276000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 276000: loss nan, time 2918.22ms
iter 276010: loss nan, time 122.22ms
iter 276020: loss nan, time 120.78ms
iter 276030: loss nan, time 121.28ms
iter 276040: loss nan, time 121.25ms
iter 276050: loss nan, time 121.39ms
iter 276060: loss nan, time 122.18ms
iter 276070: loss nan, time 121.19ms
iter 276080: loss nan, time 121.08ms
iter 276090: loss nan, time 120.99ms
tensor(0.1147)
iter 276100: loss nan, time 123.56ms
iter 276110: loss nan, time 119.14ms
iter 276120: loss nan, time 121.13ms
iter 276130: loss nan, time 121.31ms
iter 276140: loss nan, time 121.22ms
iter 276150: loss nan, time 119.71ms
iter 276160: loss nan, time 121.20ms
iter 276170: loss nan, time 121.09ms
iter 276180: loss nan, time 121.15ms
iter 276190: loss nan, time 120.63ms
tensor(0.1355)
iter 276200: loss nan, time 121.76ms
iter 276210: loss nan, time 121.15ms
iter 276220: loss nan, time 121.21ms
iter 276230: loss nan, time 122.09ms
iter 276240: loss nan, time 121.24ms
step 276250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 276250: loss nan, time 2907.43ms
iter 276260: loss nan, time 121.60ms
iter 276270: loss nan, time 121.30ms
iter 276280: loss nan, time 121.25ms
iter 276290: loss nan, time 120.84ms
tensor(0.1577)
iter 276300: loss nan, time 121.43ms
iter 276310: loss nan, time 118.85ms
iter 276320: loss nan, time 121.14ms
iter 276330: loss nan, time 121.53ms
iter 276340: loss nan, time 118.78ms
iter 276350: loss nan, time 121.29ms
iter 276360: loss nan, time 121.08ms
iter 276370: loss nan, time 121.65ms
iter 276380: loss nan, time 121.44ms
iter 276390: loss nan, time 119.90ms
tensor(0.1813)
iter 276400: loss nan, time 121.40ms
iter 276410: loss nan, time 121.38ms
iter 276420: loss nan, time 122.18ms
iter 276430: loss nan, time 120.80ms
iter 276440: loss nan, time 121.68ms
iter 276450: loss nan, time 121.58ms
iter 276460: loss nan, time 122.26ms
iter 276470: loss nan, time 120.36ms
iter 276480: loss nan, time 121.45ms
iter 276490: loss nan, time 121.48ms
tensor(0.2061)
step 276500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 276500: loss nan, time 2896.33ms
iter 276510: loss nan, time 119.16ms
iter 276520: loss nan, time 121.74ms
iter 276530: loss nan, time 121.20ms
iter 276540: loss nan, time 119.14ms
iter 276550: loss nan, time 121.14ms
iter 276560: loss nan, time 121.33ms
iter 276570: loss nan, time 122.36ms
iter 276580: loss nan, time 120.87ms
iter 276590: loss nan, time 120.98ms
tensor(0.2321)
iter 276600: loss nan, time 121.93ms
iter 276610: loss nan, time 121.61ms
iter 276620: loss nan, time 120.73ms
iter 276630: loss nan, time 121.30ms
iter 276640: loss nan, time 120.44ms
iter 276650: loss nan, time 121.06ms
iter 276660: loss nan, time 121.81ms
iter 276670: loss nan, time 120.97ms
iter 276680: loss nan, time 121.57ms
iter 276690: loss nan, time 121.35ms
tensor(0.2591)
iter 276700: loss nan, time 119.51ms
iter 276710: loss nan, time 120.49ms
iter 276720: loss nan, time 121.31ms
iter 276730: loss nan, time 119.04ms
iter 276740: loss nan, time 121.22ms
step 276750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 276750: loss nan, time 2896.30ms
iter 276760: loss nan, time 121.28ms
iter 276770: loss nan, time 121.11ms
iter 276780: loss nan, time 121.96ms
iter 276790: loss nan, time 121.16ms
tensor(0.2871)
iter 276800: loss nan, time 121.41ms
iter 276810: loss nan, time 121.03ms
iter 276820: loss nan, time 121.61ms
iter 276830: loss nan, time 120.71ms
iter 276840: loss nan, time 121.39ms
iter 276850: loss nan, time 121.36ms
iter 276860: loss nan, time 121.43ms
iter 276870: loss nan, time 122.84ms
iter 276880: loss nan, time 120.85ms
iter 276890: loss nan, time 121.75ms
tensor(0.3159)
iter 276900: loss nan, time 119.98ms
iter 276910: loss nan, time 120.74ms
iter 276920: loss nan, time 121.26ms
iter 276930: loss nan, time 119.08ms
iter 276940: loss nan, time 119.59ms
iter 276950: loss nan, time 120.86ms
iter 276960: loss nan, time 121.30ms
iter 276970: loss nan, time 122.14ms
iter 276980: loss nan, time 122.87ms
iter 276990: loss nan, time 123.47ms
tensor(0.3455)
step 277000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 277000: loss nan, time 2913.40ms
iter 277010: loss nan, time 121.23ms
iter 277020: loss nan, time 120.77ms
iter 277030: loss nan, time 121.04ms
iter 277040: loss nan, time 121.16ms
iter 277050: loss nan, time 119.08ms
iter 277060: loss nan, time 120.42ms
iter 277070: loss nan, time 121.25ms
iter 277080: loss nan, time 121.19ms
iter 277090: loss nan, time 123.48ms
tensor(0.3757)
iter 277100: loss nan, time 121.27ms
iter 277110: loss nan, time 121.27ms
iter 277120: loss nan, time 121.20ms
iter 277130: loss nan, time 119.54ms
iter 277140: loss nan, time 119.66ms
iter 277150: loss nan, time 120.94ms
iter 277160: loss nan, time 121.21ms
iter 277170: loss nan, time 122.94ms
iter 277180: loss nan, time 122.79ms
iter 277190: loss nan, time 123.26ms
tensor(0.4063)
iter 277200: loss nan, time 121.60ms
iter 277210: loss nan, time 121.15ms
iter 277220: loss nan, time 119.24ms
iter 277230: loss nan, time 118.96ms
iter 277240: loss nan, time 118.95ms
step 277250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 277250: loss nan, time 2900.40ms
iter 277260: loss nan, time 120.83ms
iter 277270: loss nan, time 120.96ms
iter 277280: loss nan, time 121.42ms
iter 277290: loss nan, time 122.31ms
tensor(0.4373)
iter 277300: loss nan, time 121.47ms
iter 277310: loss nan, time 121.43ms
iter 277320: loss nan, time 121.27ms
iter 277330: loss nan, time 119.41ms
iter 277340: loss nan, time 118.93ms
iter 277350: loss nan, time 119.96ms
iter 277360: loss nan, time 121.26ms
iter 277370: loss nan, time 121.80ms
iter 277380: loss nan, time 122.18ms
iter 277390: loss nan, time 123.23ms
tensor(0.4686)
iter 277400: loss nan, time 121.59ms
iter 277410: loss nan, time 121.04ms
iter 277420: loss nan, time 121.13ms
iter 277430: loss nan, time 121.07ms
iter 277440: loss nan, time 119.18ms
iter 277450: loss nan, time 120.97ms
iter 277460: loss nan, time 120.52ms
iter 277470: loss nan, time 121.09ms
iter 277480: loss nan, time 123.30ms
iter 277490: loss nan, time 121.20ms
tensor(0.5000)
step 277500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 277500: loss nan, time 2903.00ms
iter 277510: loss nan, time 123.23ms
iter 277520: loss nan, time 122.12ms
iter 277530: loss nan, time 118.31ms
iter 277540: loss nan, time 119.33ms
iter 277550: loss nan, time 120.05ms
iter 277560: loss nan, time 120.66ms
iter 277570: loss nan, time 121.33ms
iter 277580: loss nan, time 121.14ms
iter 277590: loss nan, time 121.88ms
tensor(0.5314)
iter 277600: loss nan, time 123.20ms
iter 277610: loss nan, time 123.44ms
iter 277620: loss nan, time 120.81ms
iter 277630: loss nan, time 121.32ms
iter 277640: loss nan, time 121.02ms
iter 277650: loss nan, time 120.17ms
iter 277660: loss nan, time 121.08ms
iter 277670: loss nan, time 118.37ms
iter 277680: loss nan, time 120.60ms
iter 277690: loss nan, time 117.99ms
tensor(0.5627)
iter 277700: loss nan, time 121.39ms
iter 277710: loss nan, time 121.42ms
iter 277720: loss nan, time 121.19ms
iter 277730: loss nan, time 123.47ms
iter 277740: loss nan, time 121.00ms
step 277750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 277750: loss nan, time 2909.68ms
iter 277760: loss nan, time 121.30ms
iter 277770: loss nan, time 120.28ms
iter 277780: loss nan, time 118.82ms
iter 277790: loss nan, time 120.99ms
tensor(0.5937)
iter 277800: loss nan, time 119.34ms
iter 277810: loss nan, time 119.40ms
iter 277820: loss nan, time 120.41ms
iter 277830: loss nan, time 121.42ms
iter 277840: loss nan, time 121.89ms
iter 277850: loss nan, time 122.42ms
iter 277860: loss nan, time 123.11ms
iter 277870: loss nan, time 121.21ms
iter 277880: loss nan, time 121.25ms
iter 277890: loss nan, time 121.21ms
tensor(0.6243)
iter 277900: loss nan, time 121.32ms
iter 277910: loss nan, time 120.29ms
iter 277920: loss nan, time 119.20ms
iter 277930: loss nan, time 119.16ms
iter 277940: loss nan, time 119.21ms
iter 277950: loss nan, time 121.52ms
iter 277960: loss nan, time 121.11ms
iter 277970: loss nan, time 121.15ms
iter 277980: loss nan, time 122.62ms
iter 277990: loss nan, time 121.13ms
tensor(0.6545)
step 278000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 278000: loss nan, time 2912.69ms
iter 278010: loss nan, time 121.36ms
iter 278020: loss nan, time 121.49ms
iter 278030: loss nan, time 118.33ms
iter 278040: loss nan, time 119.18ms
iter 278050: loss nan, time 120.07ms
iter 278060: loss nan, time 121.07ms
iter 278070: loss nan, time 121.55ms
iter 278080: loss nan, time 122.70ms
iter 278090: loss nan, time 123.84ms
tensor(0.6841)
iter 278100: loss nan, time 121.83ms
iter 278110: loss nan, time 121.25ms
iter 278120: loss nan, time 120.58ms
iter 278130: loss nan, time 119.39ms
iter 278140: loss nan, time 118.40ms
iter 278150: loss nan, time 121.51ms
iter 278160: loss nan, time 121.02ms
iter 278170: loss nan, time 121.56ms
iter 278180: loss nan, time 123.75ms
iter 278190: loss nan, time 121.32ms
tensor(0.7129)
iter 278200: loss nan, time 121.62ms
iter 278210: loss nan, time 120.48ms
iter 278220: loss nan, time 120.09ms
iter 278230: loss nan, time 120.07ms
iter 278240: loss nan, time 121.43ms
step 278250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 278250: loss nan, time 2928.53ms
iter 278260: loss nan, time 123.55ms
iter 278270: loss nan, time 121.15ms
iter 278280: loss nan, time 121.91ms
iter 278290: loss nan, time 121.09ms
tensor(0.7409)
iter 278300: loss nan, time 119.48ms
iter 278310: loss nan, time 118.73ms
iter 278320: loss nan, time 120.87ms
iter 278330: loss nan, time 120.34ms
iter 278340: loss nan, time 119.16ms
iter 278350: loss nan, time 122.63ms
iter 278360: loss nan, time 123.46ms
iter 278370: loss nan, time 118.99ms
iter 278380: loss nan, time 121.11ms
iter 278390: loss nan, time 121.30ms
tensor(0.7679)
iter 278400: loss nan, time 120.67ms
iter 278410: loss nan, time 119.23ms
iter 278420: loss nan, time 118.79ms
iter 278430: loss nan, time 120.34ms
iter 278440: loss nan, time 121.09ms
iter 278450: loss nan, time 121.18ms
iter 278460: loss nan, time 122.26ms
iter 278470: loss nan, time 122.60ms
iter 278480: loss nan, time 123.32ms
iter 278490: loss nan, time 121.22ms
tensor(0.7939)
step 278500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 278500: loss nan, time 2911.10ms
iter 278510: loss nan, time 121.10ms
iter 278520: loss nan, time 120.76ms
iter 278530: loss nan, time 119.00ms
iter 278540: loss nan, time 119.16ms
iter 278550: loss nan, time 120.47ms
iter 278560: loss nan, time 121.14ms
iter 278570: loss nan, time 121.65ms
iter 278580: loss nan, time 122.22ms
iter 278590: loss nan, time 119.85ms
tensor(0.8187)
iter 278600: loss nan, time 123.45ms
iter 278610: loss nan, time 120.69ms
iter 278620: loss nan, time 118.92ms
iter 278630: loss nan, time 120.09ms
iter 278640: loss nan, time 121.05ms
iter 278650: loss nan, time 121.35ms
iter 278660: loss nan, time 118.72ms
iter 278670: loss nan, time 119.46ms
iter 278680: loss nan, time 120.42ms
iter 278690: loss nan, time 122.39ms
tensor(0.8423)
iter 278700: loss nan, time 123.69ms
iter 278710: loss nan, time 121.87ms
iter 278720: loss nan, time 122.42ms
iter 278730: loss nan, time 121.19ms
iter 278740: loss nan, time 120.01ms
step 278750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 278750: loss nan, time 2923.09ms
iter 278760: loss nan, time 121.26ms
iter 278770: loss nan, time 121.35ms
iter 278780: loss nan, time 122.54ms
iter 278790: loss nan, time 121.35ms
tensor(0.8645)
iter 278800: loss nan, time 120.93ms
iter 278810: loss nan, time 121.38ms
iter 278820: loss nan, time 118.30ms
iter 278830: loss nan, time 119.17ms
iter 278840: loss nan, time 120.12ms
iter 278850: loss nan, time 121.14ms
iter 278860: loss nan, time 121.66ms
iter 278870: loss nan, time 121.98ms
iter 278880: loss nan, time 123.13ms
iter 278890: loss nan, time 123.72ms
tensor(0.8853)
iter 278900: loss nan, time 121.43ms
iter 278910: loss nan, time 120.51ms
iter 278920: loss nan, time 121.91ms
iter 278930: loss nan, time 119.00ms
iter 278940: loss nan, time 118.93ms
iter 278950: loss nan, time 120.87ms
iter 278960: loss nan, time 121.19ms
iter 278970: loss nan, time 120.66ms
iter 278980: loss nan, time 117.97ms
iter 278990: loss nan, time 122.57ms
tensor(0.9045)
step 279000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 279000: loss nan, time 2908.98ms
iter 279010: loss nan, time 123.55ms
iter 279020: loss nan, time 121.55ms
iter 279030: loss nan, time 121.13ms
iter 279040: loss nan, time 120.96ms
iter 279050: loss nan, time 121.08ms
iter 279060: loss nan, time 120.17ms
iter 279070: loss nan, time 119.36ms
iter 279080: loss nan, time 117.75ms
iter 279090: loss nan, time 120.51ms
tensor(0.9222)
iter 279100: loss nan, time 121.47ms
iter 279110: loss nan, time 121.55ms
iter 279120: loss nan, time 122.43ms
iter 279130: loss nan, time 123.09ms
iter 279140: loss nan, time 121.45ms
iter 279150: loss nan, time 120.26ms
iter 279160: loss nan, time 121.10ms
iter 279170: loss nan, time 118.87ms
iter 279180: loss nan, time 119.09ms
iter 279190: loss nan, time 120.14ms
tensor(0.9382)
iter 279200: loss nan, time 121.25ms
iter 279210: loss nan, time 121.36ms
iter 279220: loss nan, time 120.52ms
iter 279230: loss nan, time 120.68ms
iter 279240: loss nan, time 123.66ms
step 279250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 279250: loss nan, time 2902.49ms
iter 279260: loss nan, time 123.48ms
iter 279270: loss nan, time 119.38ms
iter 279280: loss nan, time 121.22ms
iter 279290: loss nan, time 121.48ms
tensor(0.9524)
iter 279300: loss nan, time 118.03ms
iter 279310: loss nan, time 121.06ms
iter 279320: loss nan, time 121.52ms
iter 279330: loss nan, time 122.14ms
iter 279340: loss nan, time 125.11ms
iter 279350: loss nan, time 123.27ms
iter 279360: loss nan, time 121.95ms
iter 279370: loss nan, time 118.51ms
iter 279380: loss nan, time 119.88ms
iter 279390: loss nan, time 121.20ms
tensor(0.9649)
iter 279400: loss nan, time 122.60ms
iter 279410: loss nan, time 121.69ms
iter 279420: loss nan, time 122.49ms
iter 279430: loss nan, time 121.18ms
iter 279440: loss nan, time 120.22ms
iter 279450: loss nan, time 120.44ms
iter 279460: loss nan, time 119.96ms
iter 279470: loss nan, time 120.05ms
iter 279480: loss nan, time 119.58ms
iter 279490: loss nan, time 121.60ms
tensor(0.9755)
step 279500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 279500: loss nan, time 2800.47ms
iter 279510: loss nan, time 122.15ms
iter 279520: loss nan, time 122.09ms
iter 279530: loss nan, time 121.48ms
iter 279540: loss nan, time 121.28ms
iter 279550: loss nan, time 119.68ms
iter 279560: loss nan, time 121.11ms
iter 279570: loss nan, time 121.88ms
iter 279580: loss nan, time 122.88ms
iter 279590: loss nan, time 124.05ms
tensor(0.9843)
iter 279600: loss nan, time 122.51ms
iter 279610: loss nan, time 119.42ms
iter 279620: loss nan, time 121.81ms
iter 279630: loss nan, time 119.82ms
iter 279640: loss nan, time 123.48ms
iter 279650: loss nan, time 121.89ms
iter 279660: loss nan, time 119.61ms
iter 279670: loss nan, time 120.22ms
iter 279680: loss nan, time 121.56ms
iter 279690: loss nan, time 122.45ms
tensor(0.9911)
iter 279700: loss nan, time 124.37ms
iter 279710: loss nan, time 121.88ms
iter 279720: loss nan, time 121.77ms
iter 279730: loss nan, time 120.64ms
iter 279740: loss nan, time 122.61ms
step 279750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 279750: loss nan, time 2918.75ms
iter 279760: loss nan, time 121.45ms
iter 279770: loss nan, time 121.61ms
iter 279780: loss nan, time 121.28ms
iter 279790: loss nan, time 121.75ms
tensor(0.9961)
iter 279800: loss nan, time 119.41ms
iter 279810: loss nan, time 120.30ms
iter 279820: loss nan, time 121.53ms
iter 279830: loss nan, time 121.26ms
iter 279840: loss nan, time 122.28ms
iter 279850: loss nan, time 123.70ms
iter 279860: loss nan, time 121.20ms
iter 279870: loss nan, time 121.72ms
iter 279880: loss nan, time 121.42ms
iter 279890: loss nan, time 119.59ms
tensor(0.9990)
iter 279900: loss nan, time 120.02ms
iter 279910: loss nan, time 121.35ms
iter 279920: loss nan, time 119.66ms
iter 279930: loss nan, time 122.06ms
iter 279940: loss nan, time 121.20ms
iter 279950: loss nan, time 119.53ms
iter 279960: loss nan, time 121.28ms
iter 279970: loss nan, time 118.83ms
iter 279980: loss nan, time 120.44ms
iter 279990: loss nan, time 121.52ms
tensor(1.)
step 280000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 280000: loss nan, time 2917.68ms
iter 280010: loss nan, time 122.77ms
iter 280020: loss nan, time 123.52ms
iter 280030: loss nan, time 121.24ms
iter 280040: loss nan, time 121.35ms
iter 280050: loss nan, time 121.40ms
iter 280060: loss nan, time 119.03ms
iter 280070: loss nan, time 118.85ms
iter 280080: loss nan, time 120.66ms
iter 280090: loss nan, time 121.39ms
tensor(0.9990)
iter 280100: loss nan, time 121.56ms
iter 280110: loss nan, time 123.42ms
iter 280120: loss nan, time 121.16ms
iter 280130: loss nan, time 121.15ms
iter 280140: loss nan, time 121.70ms
iter 280150: loss nan, time 119.32ms
iter 280160: loss nan, time 119.91ms
iter 280170: loss nan, time 121.41ms
iter 280180: loss nan, time 121.67ms
iter 280190: loss nan, time 122.86ms
tensor(0.9961)
iter 280200: loss nan, time 123.75ms
iter 280210: loss nan, time 121.39ms
iter 280220: loss nan, time 121.23ms
iter 280230: loss nan, time 121.18ms
iter 280240: loss nan, time 119.58ms
step 280250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 280250: loss nan, time 2910.95ms
iter 280260: loss nan, time 117.92ms
iter 280270: loss nan, time 117.47ms
iter 280280: loss nan, time 116.54ms
iter 280290: loss nan, time 117.33ms
tensor(0.9911)
iter 280300: loss nan, time 120.66ms
iter 280310: loss nan, time 117.37ms
iter 280320: loss nan, time 116.80ms
iter 280330: loss nan, time 119.36ms
iter 280340: loss nan, time 117.03ms
iter 280350: loss nan, time 114.85ms
iter 280360: loss nan, time 117.20ms
iter 280370: loss nan, time 116.90ms
iter 280380: loss nan, time 115.52ms
iter 280390: loss nan, time 117.52ms
tensor(0.9843)
iter 280400: loss nan, time 117.68ms
iter 280410: loss nan, time 116.36ms
iter 280420: loss nan, time 117.03ms
iter 280430: loss nan, time 116.73ms
iter 280440: loss nan, time 117.23ms
iter 280450: loss nan, time 116.99ms
iter 280460: loss nan, time 119.25ms
iter 280470: loss nan, time 117.55ms
iter 280480: loss nan, time 115.41ms
iter 280490: loss nan, time 114.92ms
tensor(0.9755)
step 280500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 280500: loss nan, time 2901.38ms
iter 280510: loss nan, time 115.22ms
iter 280520: loss nan, time 117.24ms
iter 280530: loss nan, time 118.36ms
iter 280540: loss nan, time 116.74ms
iter 280550: loss nan, time 116.92ms
iter 280560: loss nan, time 116.91ms
iter 280570: loss nan, time 116.82ms
iter 280580: loss nan, time 116.99ms
iter 280590: loss nan, time 119.38ms
tensor(0.9649)
iter 280600: loss nan, time 117.53ms
iter 280610: loss nan, time 114.83ms
iter 280620: loss nan, time 115.25ms
iter 280630: loss nan, time 117.14ms
iter 280640: loss nan, time 114.61ms
iter 280650: loss nan, time 118.76ms
iter 280660: loss nan, time 116.89ms
iter 280670: loss nan, time 115.01ms
iter 280680: loss nan, time 114.84ms
iter 280690: loss nan, time 116.69ms
tensor(0.9524)
iter 280700: loss nan, time 117.11ms
iter 280710: loss nan, time 118.73ms
iter 280720: loss nan, time 116.64ms
iter 280730: loss nan, time 116.72ms
iter 280740: loss nan, time 116.71ms
step 280750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 280750: loss nan, time 2900.23ms
iter 280760: loss nan, time 116.68ms
iter 280770: loss nan, time 118.23ms
iter 280780: loss nan, time 115.51ms
iter 280790: loss nan, time 116.91ms
tensor(0.9382)
iter 280800: loss nan, time 118.46ms
iter 280810: loss nan, time 115.47ms
iter 280820: loss nan, time 116.60ms
iter 280830: loss nan, time 117.98ms
iter 280840: loss nan, time 116.38ms
iter 280850: loss nan, time 116.71ms
iter 280860: loss nan, time 117.67ms
iter 280870: loss nan, time 115.49ms
iter 280880: loss nan, time 116.75ms
iter 280890: loss nan, time 118.56ms
tensor(0.9222)
iter 280900: loss nan, time 116.80ms
iter 280910: loss nan, time 117.29ms
iter 280920: loss nan, time 119.65ms
iter 280930: loss nan, time 116.89ms
iter 280940: loss nan, time 116.57ms
iter 280950: loss nan, time 119.71ms
iter 280960: loss nan, time 116.98ms
iter 280970: loss nan, time 115.22ms
iter 280980: loss nan, time 115.17ms
iter 280990: loss nan, time 117.10ms
tensor(0.9045)
step 281000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 281000: loss nan, time 2917.19ms
iter 281010: loss nan, time 117.08ms
iter 281020: loss nan, time 118.41ms
iter 281030: loss nan, time 116.52ms
iter 281040: loss nan, time 117.41ms
iter 281050: loss nan, time 117.08ms
iter 281060: loss nan, time 116.98ms
iter 281070: loss nan, time 116.05ms
iter 281080: loss nan, time 118.67ms
iter 281090: loss nan, time 117.05ms
tensor(0.8853)
iter 281100: loss nan, time 115.60ms
iter 281110: loss nan, time 117.03ms
iter 281120: loss nan, time 117.01ms
iter 281130: loss nan, time 114.90ms
iter 281140: loss nan, time 117.17ms
iter 281150: loss nan, time 117.29ms
iter 281160: loss nan, time 115.43ms
iter 281170: loss nan, time 117.36ms
iter 281180: loss nan, time 117.10ms
iter 281190: loss nan, time 114.94ms
tensor(0.8645)
iter 281200: loss nan, time 117.38ms
iter 281210: loss nan, time 117.01ms
iter 281220: loss nan, time 115.62ms
iter 281230: loss nan, time 117.00ms
iter 281240: loss nan, time 116.95ms
step 281250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 281250: loss nan, time 2901.44ms
iter 281260: loss nan, time 117.06ms
iter 281270: loss nan, time 116.90ms
iter 281280: loss nan, time 115.01ms
iter 281290: loss nan, time 117.72ms
tensor(0.8423)
iter 281300: loss nan, time 117.26ms
iter 281310: loss nan, time 114.96ms
iter 281320: loss nan, time 117.03ms
iter 281330: loss nan, time 116.99ms
iter 281340: loss nan, time 115.84ms
iter 281350: loss nan, time 117.00ms
iter 281360: loss nan, time 117.69ms
iter 281370: loss nan, time 115.71ms
iter 281380: loss nan, time 116.90ms
iter 281390: loss nan, time 118.17ms
tensor(0.8187)
iter 281400: loss nan, time 116.48ms
iter 281410: loss nan, time 117.98ms
iter 281420: loss nan, time 119.23ms
iter 281430: loss nan, time 116.87ms
iter 281440: loss nan, time 116.24ms
iter 281450: loss nan, time 117.84ms
iter 281460: loss nan, time 117.09ms
iter 281470: loss nan, time 115.66ms
iter 281480: loss nan, time 117.69ms
iter 281490: loss nan, time 117.01ms
tensor(0.7939)
step 281500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 281500: loss nan, time 2909.39ms
iter 281510: loss nan, time 119.18ms
iter 281520: loss nan, time 116.83ms
iter 281530: loss nan, time 114.86ms
iter 281540: loss nan, time 119.22ms
iter 281550: loss nan, time 116.95ms
iter 281560: loss nan, time 114.58ms
iter 281570: loss nan, time 118.34ms
iter 281580: loss nan, time 116.92ms
iter 281590: loss nan, time 114.90ms
tensor(0.7679)
iter 281600: loss nan, time 119.52ms
iter 281610: loss nan, time 116.16ms
iter 281620: loss nan, time 114.98ms
iter 281630: loss nan, time 118.38ms
iter 281640: loss nan, time 116.92ms
iter 281650: loss nan, time 115.22ms
iter 281660: loss nan, time 119.22ms
iter 281670: loss nan, time 116.93ms
iter 281680: loss nan, time 114.86ms
iter 281690: loss nan, time 117.91ms
tensor(0.7409)
iter 281700: loss nan, time 116.99ms
iter 281710: loss nan, time 115.22ms
iter 281720: loss nan, time 117.18ms
iter 281730: loss nan, time 116.11ms
iter 281740: loss nan, time 114.86ms
step 281750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 281750: loss nan, time 2919.65ms
iter 281760: loss nan, time 117.63ms
iter 281770: loss nan, time 115.81ms
iter 281780: loss nan, time 116.93ms
iter 281790: loss nan, time 118.51ms
tensor(0.7129)
iter 281800: loss nan, time 116.69ms
iter 281810: loss nan, time 116.65ms
iter 281820: loss nan, time 116.76ms
iter 281830: loss nan, time 116.85ms
iter 281840: loss nan, time 117.51ms
iter 281850: loss nan, time 116.65ms
iter 281860: loss nan, time 116.06ms
iter 281870: loss nan, time 116.82ms
iter 281880: loss nan, time 116.24ms
iter 281890: loss nan, time 116.92ms
tensor(0.6841)
iter 281900: loss nan, time 117.27ms
iter 281910: loss nan, time 116.58ms
iter 281920: loss nan, time 116.93ms
iter 281930: loss nan, time 116.89ms
iter 281940: loss nan, time 116.64ms
iter 281950: loss nan, time 115.91ms
iter 281960: loss nan, time 117.69ms
iter 281970: loss nan, time 117.49ms
iter 281980: loss nan, time 116.02ms
iter 281990: loss nan, time 116.75ms
tensor(0.6545)
step 282000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 282000: loss nan, time 2925.40ms
iter 282010: loss nan, time 117.97ms
iter 282020: loss nan, time 116.79ms
iter 282030: loss nan, time 116.90ms
iter 282040: loss nan, time 117.20ms
iter 282050: loss nan, time 117.03ms
iter 282060: loss nan, time 117.49ms
iter 282070: loss nan, time 116.56ms
iter 282080: loss nan, time 116.90ms
iter 282090: loss nan, time 117.26ms
tensor(0.6243)
iter 282100: loss nan, time 117.57ms
iter 282110: loss nan, time 116.40ms
iter 282120: loss nan, time 117.81ms
iter 282130: loss nan, time 117.29ms
iter 282140: loss nan, time 116.67ms
iter 282150: loss nan, time 116.61ms
iter 282160: loss nan, time 116.97ms
iter 282170: loss nan, time 116.96ms
iter 282180: loss nan, time 117.31ms
iter 282190: loss nan, time 117.17ms
tensor(0.5937)
iter 282200: loss nan, time 117.28ms
iter 282210: loss nan, time 117.02ms
iter 282220: loss nan, time 116.95ms
iter 282230: loss nan, time 116.24ms
iter 282240: loss nan, time 116.87ms
step 282250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 282250: loss nan, time 2908.73ms
iter 282260: loss nan, time 117.91ms
iter 282270: loss nan, time 117.07ms
iter 282280: loss nan, time 116.91ms
iter 282290: loss nan, time 116.54ms
tensor(0.5627)
iter 282300: loss nan, time 117.44ms
iter 282310: loss nan, time 116.98ms
iter 282320: loss nan, time 116.29ms
iter 282330: loss nan, time 116.91ms
iter 282340: loss nan, time 116.96ms
iter 282350: loss nan, time 117.89ms
iter 282360: loss nan, time 116.81ms
iter 282370: loss nan, time 116.97ms
iter 282380: loss nan, time 116.24ms
iter 282390: loss nan, time 116.83ms
tensor(0.5314)
iter 282400: loss nan, time 117.91ms
iter 282410: loss nan, time 118.00ms
iter 282420: loss nan, time 117.10ms
iter 282430: loss nan, time 117.09ms
iter 282440: loss nan, time 116.79ms
iter 282450: loss nan, time 117.01ms
iter 282460: loss nan, time 117.34ms
iter 282470: loss nan, time 116.76ms
iter 282480: loss nan, time 117.91ms
iter 282490: loss nan, time 117.25ms
tensor(0.5000)
step 282500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 282500: loss nan, time 2927.72ms
iter 282510: loss nan, time 117.31ms
iter 282520: loss nan, time 117.63ms
iter 282530: loss nan, time 117.24ms
iter 282540: loss nan, time 117.71ms
iter 282550: loss nan, time 117.34ms
iter 282560: loss nan, time 117.37ms
iter 282570: loss nan, time 117.26ms
iter 282580: loss nan, time 116.84ms
iter 282590: loss nan, time 118.22ms
tensor(0.4686)
iter 282600: loss nan, time 117.88ms
iter 282610: loss nan, time 117.47ms
iter 282620: loss nan, time 115.05ms
iter 282630: loss nan, time 117.29ms
iter 282640: loss nan, time 117.31ms
iter 282650: loss nan, time 116.95ms
iter 282660: loss nan, time 117.09ms
iter 282670: loss nan, time 122.61ms
iter 282680: loss nan, time 120.87ms
iter 282690: loss nan, time 121.30ms
tensor(0.4373)
iter 282700: loss nan, time 124.09ms
iter 282710: loss nan, time 120.22ms
iter 282720: loss nan, time 121.39ms
iter 282730: loss nan, time 122.91ms
iter 282740: loss nan, time 119.50ms
step 282750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 282750: loss nan, time 2925.61ms
iter 282760: loss nan, time 121.71ms
iter 282770: loss nan, time 121.81ms
iter 282780: loss nan, time 120.04ms
iter 282790: loss nan, time 121.96ms
tensor(0.4063)
iter 282800: loss nan, time 122.56ms
iter 282810: loss nan, time 121.25ms
iter 282820: loss nan, time 121.47ms
iter 282830: loss nan, time 121.24ms
iter 282840: loss nan, time 121.76ms
iter 282850: loss nan, time 121.39ms
iter 282860: loss nan, time 121.64ms
iter 282870: loss nan, time 121.27ms
iter 282880: loss nan, time 121.16ms
iter 282890: loss nan, time 121.16ms
tensor(0.3757)
iter 282900: loss nan, time 117.52ms
iter 282910: loss nan, time 122.03ms
iter 282920: loss nan, time 120.56ms
iter 282930: loss nan, time 121.35ms
iter 282940: loss nan, time 119.02ms
iter 282950: loss nan, time 120.82ms
iter 282960: loss nan, time 121.41ms
iter 282970: loss nan, time 121.40ms
iter 282980: loss nan, time 120.39ms
iter 282990: loss nan, time 121.04ms
tensor(0.3455)
step 283000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 283000: loss nan, time 2926.78ms
iter 283010: loss nan, time 121.29ms
iter 283020: loss nan, time 120.00ms
iter 283030: loss nan, time 119.45ms
iter 283040: loss nan, time 120.45ms
iter 283050: loss nan, time 121.01ms
iter 283060: loss nan, time 121.20ms
iter 283070: loss nan, time 121.33ms
iter 283080: loss nan, time 121.53ms
iter 283090: loss nan, time 120.58ms
tensor(0.3159)
iter 283100: loss nan, time 121.37ms
iter 283110: loss nan, time 120.99ms
iter 283120: loss nan, time 121.15ms
iter 283130: loss nan, time 122.19ms
iter 283140: loss nan, time 117.23ms
iter 283150: loss nan, time 120.92ms
iter 283160: loss nan, time 121.12ms
iter 283170: loss nan, time 121.14ms
iter 283180: loss nan, time 121.14ms
iter 283190: loss nan, time 119.40ms
tensor(0.2871)
iter 283200: loss nan, time 122.05ms
iter 283210: loss nan, time 121.20ms
iter 283220: loss nan, time 118.79ms
iter 283230: loss nan, time 121.09ms
iter 283240: loss nan, time 121.44ms
step 283250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 283250: loss nan, time 2912.58ms
iter 283260: loss nan, time 120.11ms
iter 283270: loss nan, time 121.83ms
iter 283280: loss nan, time 120.85ms
iter 283290: loss nan, time 121.20ms
tensor(0.2591)
iter 283300: loss nan, time 121.75ms
iter 283310: loss nan, time 123.92ms
iter 283320: loss nan, time 123.08ms
iter 283330: loss nan, time 121.29ms
iter 283340: loss nan, time 121.75ms
iter 283350: loss nan, time 121.31ms
iter 283360: loss nan, time 119.32ms
iter 283370: loss nan, time 120.08ms
iter 283380: loss nan, time 120.90ms
iter 283390: loss nan, time 119.14ms
tensor(0.2321)
iter 283400: loss nan, time 121.08ms
iter 283410: loss nan, time 122.24ms
iter 283420: loss nan, time 121.29ms
iter 283430: loss nan, time 122.12ms
iter 283440: loss nan, time 121.13ms
iter 283450: loss nan, time 121.20ms
iter 283460: loss nan, time 121.29ms
iter 283470: loss nan, time 117.85ms
iter 283480: loss nan, time 119.71ms
iter 283490: loss nan, time 121.20ms
tensor(0.2061)
step 283500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 283500: loss nan, time 2912.54ms
iter 283510: loss nan, time 121.30ms
iter 283520: loss nan, time 122.31ms
iter 283530: loss nan, time 122.92ms
iter 283540: loss nan, time 123.59ms
iter 283550: loss nan, time 121.06ms
iter 283560: loss nan, time 121.90ms
iter 283570: loss nan, time 119.61ms
iter 283580: loss nan, time 119.47ms
iter 283590: loss nan, time 119.54ms
tensor(0.1813)
iter 283600: loss nan, time 119.33ms
iter 283610: loss nan, time 121.46ms
iter 283620: loss nan, time 121.77ms
iter 283630: loss nan, time 123.22ms
iter 283640: loss nan, time 121.85ms
iter 283650: loss nan, time 119.63ms
iter 283660: loss nan, time 121.20ms
iter 283670: loss nan, time 119.20ms
iter 283680: loss nan, time 119.46ms
iter 283690: loss nan, time 120.31ms
tensor(0.1577)
iter 283700: loss nan, time 121.80ms
iter 283710: loss nan, time 121.26ms
iter 283720: loss nan, time 122.81ms
iter 283730: loss nan, time 123.56ms
iter 283740: loss nan, time 120.50ms
step 283750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 283750: loss nan, time 2923.25ms
iter 283760: loss nan, time 120.15ms
iter 283770: loss nan, time 119.27ms
iter 283780: loss nan, time 118.79ms
iter 283790: loss nan, time 119.30ms
tensor(0.1355)
iter 283800: loss nan, time 121.74ms
iter 283810: loss nan, time 121.75ms
iter 283820: loss nan, time 121.42ms
iter 283830: loss nan, time 123.43ms
iter 283840: loss nan, time 121.25ms
iter 283850: loss nan, time 120.87ms
iter 283860: loss nan, time 121.13ms
iter 283870: loss nan, time 119.05ms
iter 283880: loss nan, time 119.72ms
iter 283890: loss nan, time 120.82ms
tensor(0.1147)
iter 283900: loss nan, time 121.51ms
iter 283910: loss nan, time 122.30ms
iter 283920: loss nan, time 122.86ms
iter 283930: loss nan, time 123.58ms
iter 283940: loss nan, time 121.51ms
iter 283950: loss nan, time 121.21ms
iter 283960: loss nan, time 121.89ms
iter 283970: loss nan, time 119.12ms
iter 283980: loss nan, time 118.56ms
iter 283990: loss nan, time 121.42ms
tensor(0.0955)
step 284000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 284000: loss nan, time 2910.85ms
iter 284010: loss nan, time 121.26ms
iter 284020: loss nan, time 121.33ms
iter 284030: loss nan, time 122.92ms
iter 284040: loss nan, time 121.53ms
iter 284050: loss nan, time 120.96ms
iter 284060: loss nan, time 121.26ms
iter 284070: loss nan, time 119.35ms
iter 284080: loss nan, time 119.08ms
iter 284090: loss nan, time 119.80ms
tensor(0.0778)
iter 284100: loss nan, time 120.80ms
iter 284110: loss nan, time 121.27ms
iter 284120: loss nan, time 121.71ms
iter 284130: loss nan, time 123.59ms
iter 284140: loss nan, time 121.66ms
iter 284150: loss nan, time 121.46ms
iter 284160: loss nan, time 121.79ms
iter 284170: loss nan, time 120.31ms
iter 284180: loss nan, time 119.02ms
iter 284190: loss nan, time 120.20ms
tensor(0.0618)
iter 284200: loss nan, time 121.81ms
iter 284210: loss nan, time 121.18ms
iter 284220: loss nan, time 123.32ms
iter 284230: loss nan, time 121.54ms
iter 284240: loss nan, time 121.14ms
step 284250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 284250: loss nan, time 2902.46ms
iter 284260: loss nan, time 121.42ms
iter 284270: loss nan, time 119.16ms
iter 284280: loss nan, time 119.35ms
iter 284290: loss nan, time 119.99ms
tensor(0.0476)
iter 284300: loss nan, time 121.84ms
iter 284310: loss nan, time 121.27ms
iter 284320: loss nan, time 121.81ms
iter 284330: loss nan, time 122.78ms
iter 284340: loss nan, time 123.24ms
iter 284350: loss nan, time 121.34ms
iter 284360: loss nan, time 121.18ms
iter 284370: loss nan, time 119.45ms
iter 284380: loss nan, time 119.45ms
iter 284390: loss nan, time 121.45ms
tensor(0.0351)
iter 284400: loss nan, time 121.32ms
iter 284410: loss nan, time 121.17ms
iter 284420: loss nan, time 123.50ms
iter 284430: loss nan, time 121.76ms
iter 284440: loss nan, time 121.18ms
iter 284450: loss nan, time 121.29ms
iter 284460: loss nan, time 119.89ms
iter 284470: loss nan, time 121.50ms
iter 284480: loss nan, time 121.54ms
iter 284490: loss nan, time 123.09ms
tensor(0.0245)
step 284500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 284500: loss nan, time 2921.76ms
iter 284510: loss nan, time 121.53ms
iter 284520: loss nan, time 121.56ms
iter 284530: loss nan, time 119.71ms
iter 284540: loss nan, time 122.36ms
iter 284550: loss nan, time 122.13ms
iter 284560: loss nan, time 121.63ms
iter 284570: loss nan, time 121.49ms
iter 284580: loss nan, time 121.30ms
iter 284590: loss nan, time 121.44ms
tensor(0.0157)
iter 284600: loss nan, time 119.51ms
iter 284610: loss nan, time 120.68ms
iter 284620: loss nan, time 121.34ms
iter 284630: loss nan, time 121.33ms
iter 284640: loss nan, time 122.98ms
iter 284650: loss nan, time 121.81ms
iter 284660: loss nan, time 121.49ms
iter 284670: loss nan, time 120.56ms
iter 284680: loss nan, time 118.86ms
iter 284690: loss nan, time 119.37ms
tensor(0.0089)
iter 284700: loss nan, time 121.59ms
iter 284710: loss nan, time 122.21ms
iter 284720: loss nan, time 121.06ms
iter 284730: loss nan, time 121.90ms
iter 284740: loss nan, time 120.97ms
step 284750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 284750: loss nan, time 2910.13ms
iter 284760: loss nan, time 119.31ms
iter 284770: loss nan, time 119.24ms
iter 284780: loss nan, time 119.96ms
iter 284790: loss nan, time 121.76ms
tensor(0.0039)
iter 284800: loss nan, time 121.56ms
iter 284810: loss nan, time 122.35ms
iter 284820: loss nan, time 122.34ms
iter 284830: loss nan, time 123.61ms
iter 284840: loss nan, time 121.22ms
iter 284850: loss nan, time 121.26ms
iter 284860: loss nan, time 119.47ms
iter 284870: loss nan, time 119.55ms
iter 284880: loss nan, time 122.36ms
iter 284890: loss nan, time 121.55ms
tensor(0.0010)
iter 284900: loss nan, time 121.56ms
iter 284910: loss nan, time 123.59ms
iter 284920: loss nan, time 121.46ms
iter 284930: loss nan, time 121.41ms
iter 284940: loss nan, time 119.48ms
iter 284950: loss nan, time 120.53ms
iter 284960: loss nan, time 121.30ms
iter 284970: loss nan, time 121.57ms
iter 284980: loss nan, time 123.07ms
iter 284990: loss nan, time 123.38ms
tensor(0.0010)
step 285000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 285000: loss nan, time 2917.82ms
iter 285010: loss nan, time 121.03ms
iter 285020: loss nan, time 121.33ms
iter 285030: loss nan, time 119.13ms
iter 285040: loss nan, time 119.72ms
iter 285050: loss nan, time 121.20ms
iter 285060: loss nan, time 121.63ms
iter 285070: loss nan, time 120.42ms
iter 285080: loss nan, time 122.39ms
iter 285090: loss nan, time 121.42ms
tensor(0.0010)
iter 285100: loss nan, time 119.67ms
iter 285110: loss nan, time 121.38ms
iter 285120: loss nan, time 119.72ms
iter 285130: loss nan, time 121.45ms
iter 285140: loss nan, time 121.29ms
iter 285150: loss nan, time 123.34ms
iter 285160: loss nan, time 121.68ms
iter 285170: loss nan, time 122.10ms
iter 285180: loss nan, time 121.44ms
iter 285190: loss nan, time 120.40ms
tensor(0.0039)
iter 285200: loss nan, time 121.97ms
iter 285210: loss nan, time 119.63ms
iter 285220: loss nan, time 123.44ms
iter 285230: loss nan, time 121.87ms
iter 285240: loss nan, time 118.85ms
step 285250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 285250: loss nan, time 2909.17ms
iter 285260: loss nan, time 121.66ms
iter 285270: loss nan, time 119.37ms
iter 285280: loss nan, time 120.81ms
iter 285290: loss nan, time 121.33ms
tensor(0.0089)
iter 285300: loss nan, time 122.17ms
iter 285310: loss nan, time 122.20ms
iter 285320: loss nan, time 121.63ms
iter 285330: loss nan, time 121.35ms
iter 285340: loss nan, time 119.38ms
iter 285350: loss nan, time 120.92ms
iter 285360: loss nan, time 119.53ms
iter 285370: loss nan, time 121.60ms
iter 285380: loss nan, time 123.01ms
iter 285390: loss nan, time 119.55ms
tensor(0.0157)
iter 285400: loss nan, time 121.93ms
iter 285410: loss nan, time 119.85ms
iter 285420: loss nan, time 120.34ms
iter 285430: loss nan, time 120.25ms
iter 285440: loss nan, time 121.33ms
iter 285450: loss nan, time 121.46ms
iter 285460: loss nan, time 120.71ms
iter 285470: loss nan, time 123.35ms
iter 285480: loss nan, time 123.42ms
iter 285490: loss nan, time 121.51ms
tensor(0.0245)
step 285500: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 285500: loss nan, time 2914.69ms
iter 285510: loss nan, time 121.07ms
iter 285520: loss nan, time 120.77ms
iter 285530: loss nan, time 120.89ms
iter 285540: loss nan, time 119.74ms
iter 285550: loss nan, time 119.74ms
iter 285560: loss nan, time 118.84ms
iter 285570: loss nan, time 119.80ms
iter 285580: loss nan, time 121.28ms
iter 285590: loss nan, time 119.79ms
tensor(0.0351)
iter 285600: loss nan, time 122.97ms
iter 285610: loss nan, time 121.17ms
iter 285620: loss nan, time 123.09ms
iter 285630: loss nan, time 121.24ms
iter 285640: loss nan, time 118.67ms
iter 285650: loss nan, time 120.90ms
iter 285660: loss nan, time 120.69ms
iter 285670: loss nan, time 119.47ms
iter 285680: loss nan, time 118.65ms
iter 285690: loss nan, time 119.18ms
tensor(0.0476)
iter 285700: loss nan, time 120.78ms
iter 285710: loss nan, time 121.49ms
iter 285720: loss nan, time 121.74ms
iter 285730: loss nan, time 123.52ms
iter 285740: loss nan, time 121.67ms
step 285750: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 285750: loss nan, time 2907.12ms
iter 285760: loss nan, time 121.15ms
iter 285770: loss nan, time 121.20ms
iter 285780: loss nan, time 119.64ms
iter 285790: loss nan, time 120.64ms
tensor(0.0618)
iter 285800: loss nan, time 121.58ms
iter 285810: loss nan, time 120.28ms
iter 285820: loss nan, time 123.86ms
iter 285830: loss nan, time 121.77ms
iter 285840: loss nan, time 119.50ms
iter 285850: loss nan, time 121.55ms
iter 285860: loss nan, time 119.81ms
iter 285870: loss nan, time 121.63ms
iter 285880: loss nan, time 121.46ms
iter 285890: loss nan, time 122.95ms
tensor(0.0778)
iter 285900: loss nan, time 122.73ms
iter 285910: loss nan, time 122.64ms
iter 285920: loss nan, time 118.28ms
iter 285930: loss nan, time 117.20ms
iter 285940: loss nan, time 115.15ms
iter 285950: loss nan, time 119.23ms
iter 285960: loss nan, time 120.63ms
iter 285970: loss nan, time 118.93ms
iter 285980: loss nan, time 121.40ms
iter 285990: loss nan, time 121.52ms
tensor(0.0955)
step 286000: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 286000: loss nan, time 2893.81ms
iter 286010: loss nan, time 120.80ms
iter 286020: loss nan, time 121.22ms
iter 286030: loss nan, time 120.13ms
iter 286040: loss nan, time 121.52ms
iter 286050: loss nan, time 122.06ms
iter 286060: loss nan, time 122.02ms
iter 286070: loss nan, time 121.85ms
iter 286080: loss nan, time 122.55ms
iter 286090: loss nan, time 123.03ms
tensor(0.1147)
iter 286100: loss nan, time 123.41ms
iter 286110: loss nan, time 120.77ms
iter 286120: loss nan, time 121.11ms
iter 286130: loss nan, time 120.74ms
iter 286140: loss nan, time 120.70ms
iter 286150: loss nan, time 120.76ms
iter 286160: loss nan, time 120.78ms
iter 286170: loss nan, time 119.11ms
iter 286180: loss nan, time 118.24ms
iter 286190: loss nan, time 119.21ms
tensor(0.1355)
iter 286200: loss nan, time 120.71ms
iter 286210: loss nan, time 118.86ms
iter 286220: loss nan, time 121.06ms
iter 286230: loss nan, time 121.06ms
iter 286240: loss nan, time 120.33ms
step 286250: train loss nan, val loss nan
saving checkpoint to out-shakespeare-char
iter 286250: loss nan, time 2895.62ms
iter 286260: loss nan, time 120.13ms
iter 286270: loss nan, time 119.15ms
iter 286280: loss nan, time 121.25ms
iter 286290: loss nan, time 122.79ms
tensor(0.1577)
iter 286300: loss nan, time 121.50ms
iter 286310: loss nan, time 123.42ms
iter 286320: loss nan, time 120.78ms
iter 286330: loss nan, time 120.81ms
iter 286340: loss nan, time 120.74ms
iter 286350: loss nan, time 118.78ms
iter 286360: loss nan, time 120.94ms
iter 286370: loss nan, time 118.70ms
iter 286380: loss nan, time 119.73ms
iter 286390: loss nan, time 120.40ms
tensor(0.1813)
iter 286400: loss nan, time 120.83ms
iter 286410: loss nan, time 119.99ms
iter 286420: loss nan, time 119.99ms
