{"message": {"transcript": [{"chunks": [{"end": 1.12, "start": 0.0, "text": "Hello"}, {"end": 1.6, "start": 1.12, "text": "community!"}, {"end": 2.16, "start": 1.6, "text": "It"}, {"end": 2.56, "start": 2.16, "text": "is"}, {"end": 3.0, "start": 2.56, "text": "late"}, {"end": 3.2, "start": 3.0, "text": "in"}, {"end": 3.64, "start": 3.2, "text": "the"}, {"end": 4.0, "start": 3.64, "text": "night,"}, {"end": 4.12, "start": 4.0, "text": "but"}, {"end": 4.48, "start": 4.12, "text": "I"}, {"end": 4.76, "start": 4.48, "text": "just"}, {"end": 5.52, "start": 4.76, "text": "discovered"}, {"end": 6.16, "start": 5.52, "text": "tiny"}, {"end": 6.6, "start": 6.16, "text": "language"}, {"end": 7.04, "start": 6.6, "text": "models."}, {"end": 7.32, "start": 7.04, "text": "They"}, {"end": 7.68, "start": 7.32, "text": "are"}, {"end": 8.48, "start": 7.68, "text": "beautiful"}, {"end": 9.04, "start": 8.48, "text": "for"}, {"end": 9.4, "start": 9.04, "text": "new"}, {"end": 9.84, "start": 9.4, "text": "pre-training"}, {"end": 10.92, "start": 9.84, "text": "methodologies."}, {"end": 11.48, "start": 10.92, "text": "If"}, {"end": 11.72, "start": 11.48, "text": "you"}, {"end": 11.88, "start": 11.72, "text": "want,"}, {"end": 12.04, "start": 11.88, "text": "we"}, {"end": 12.28, "start": 12.04, "text": "have"}, {"end": 12.44, "start": 12.28, "text": "a"}, {"end": 12.52, "start": 12.44, "text": "look"}, {"end": 13.04, "start": 12.52, "text": "together"}, {"end": 13.32, "start": 13.04, "text": "at"}, {"end": 13.72, "start": 13.32, "text": "BERT"}, {"end": 14.48, "start": 13.72, "text": "architecture,"}, {"end": 14.6, "start": 14.48, "text": "at"}, {"end": 15.48, "start": 14.6, "text": "LLaMA"}, {"end": 16.12, "start": 15.48, "text": "architecture,"}, {"end": 16.48, "start": 16.12, "text": "and"}, {"end": 16.8, "start": 16.48, "text": "at"}, {"end": 17.4, "start": 16.8, "text": "SSM"}, {"end": 17.8, "start": 17.4, "text": "and"}, {"end": 18.08, "start": 17.8, "text": "Mamba"}, {"end": 18.56, 
"start": 18.08, "text": "architecture,"}, {"end": 18.56, "start": 18.56, "text": "and"}, {"end": 19.2, "start": 18.56, "text": "we"}, {"end": 19.56, "start": 19.2, "text": "will"}, {"end": 19.92, "start": 19.56, "text": "focus"}, {"end": 19.92, "start": 19.92, "text": "on"}, {"end": 20.32, "start": 19.92, "text": "tiny"}, {"end": 20.84, "start": 20.32, "text": "language"}, {"end": 21.56, "start": 20.84, "text": "models."}, {"end": 21.96, "start": 21.56, "text": "And"}, {"end": 22.0, "start": 21.96, "text": "at"}, {"end": 22.28, "start": 22.0, "text": "the"}, {"end": 22.68, "start": 22.28, "text": "first,"}, {"end": 22.76, "start": 22.68, "text": "you"}, {"end": 23.04, "start": 22.76, "text": "know,"}, {"end": 23.16, "start": 23.04, "text": "I"}, {"end": 23.48, "start": 23.16, "text": "thought,"}, {"end": 23.96, "start": 23.48, "text": "hey,"}, {"end": 24.16, "start": 23.96, "text": "I"}, {"end": 24.52, "start": 24.16, "text": "work"}, {"end": 24.76, "start": 24.52, "text": "with"}, {"end": 25.24, "start": 24.76, "text": "DeepSeek"}, {"end": 25.56, "start": 25.24, "text": "version"}, {"end": 26.32, "start": 25.56, "text": "3,"}, {"end": 26.56, "start": 26.32, "text": "and"}, {"end": 26.76, "start": 26.56, "text": "I"}, {"end": 27.0, "start": 26.76, "text": "love"}, {"end": 27.28, "start": 27.0, "text": "this"}, {"end": 27.6, "start": 27.28, "text": "model,"}, {"end": 27.88, "start": 27.6, "text": "but"}, {"end": 27.92, "start": 27.88, "text": "you"}, {"end": 28.24, "start": 27.92, "text": "know,"}, {"end": 28.48, "start": 28.24, "text": "it"}, {"end": 28.84, "start": 28.48, "text": "has"}, {"end": 29.28, "start": 28.84, "text": "close"}, {"end": 29.28, "start": 29.28, "text": "to"}, {"end": 29.96, "start": 29.28, "text": "700"}], "text": " Hello community! It is late in the night, but I just discovered tiny language models. They are beautiful for new pre-training methodologies. 
If you want, we have a look together at BERT architecture, at LLaMA architecture, and at SSM and Mamba architecture, and we will focus on tiny language models. And at the first, you know, I thought, hey, I work with DeepSeek version 3, and I love this model, but you know, it has close to 700"}, {"chunks": [{"end": 30.48, "start": 30.0, "text": "100"}, {"end": 30.96, "start": 30.48, "text": "billion"}, {"end": 31.28, "start": 30.96, "text": "free"}, {"end": 31.8, "start": 31.28, "text": "trainable"}, {"end": 32.52, "start": 31.8, "text": "parameters"}, {"end": 33.0, "start": 32.52, "text": "and"}, {"end": 33.4, "start": 33.0, "text": "I"}, {"end": 33.68, "start": 33.4, "text": "have"}, {"end": 34.08, "start": 33.68, "text": "no"}, {"end": 34.64, "start": 34.08, "text": "hardware"}, {"end": 35.24, "start": 34.64, "text": "architecture"}, {"end": 35.44, "start": 35.24, "text": "that"}, {"end": 35.68, "start": 35.44, "text": "I"}, {"end": 35.96, "start": 35.68, "text": "can"}, {"end": 36.96, "start": 35.96, "text": "experiment"}, {"end": 37.28, "start": 36.96, "text": "with"}, {"end": 37.64, "start": 37.28, "text": "the"}, {"end": 38.28, "start": 37.64, "text": "architecture"}, {"end": 39.88, "start": 38.28, "text": "itself."}, {"end": 40.12, "start": 39.88, "text": "So"}, {"end": 40.64, "start": 40.12, "text": "therefore"}, {"end": 41.0, "start": 40.64, "text": "we"}, {"end": 41.2, "start": 41.0, "text": "look"}, {"end": 41.56, "start": 41.2, "text": "now"}, {"end": 41.72, "start": 41.56, "text": "at"}, {"end": 42.16, "start": 41.72, "text": "models"}, {"end": 42.32, "start": 42.16, "text": "that"}, {"end": 42.44, "start": 42.32, "text": "are"}, {"end": 42.8, "start": 42.44, "text": "really"}, {"end": 44.480000000000004, "start": 42.8, "text": "tiny."}, {"end": 45.24, "start": 44.480000000000004, "text": "And"}, {"end": 45.64, "start": 45.24, "text": "I"}, {"end": 45.84, "start": 45.64, "text": "mean"}, {"end": 46.239999999999995, "start": 45.84, "text": "tiny,"}, 
{"end": 46.480000000000004, "start": 46.239999999999995, "text": "tiny"}, {"end": 46.92, "start": 46.480000000000004, "text": "models"}, {"end": 47.92, "start": 46.92, "text": "compared"}, {"end": 48.120000000000005, "start": 47.92, "text": "to"}, {"end": 48.28, "start": 48.120000000000005, "text": "this."}, {"end": 49.04, "start": 48.28, "text": "This"}, {"end": 49.44, "start": 49.04, "text": "is"}, {"end": 49.519999999999996, "start": 49.44, "text": "a"}, {"end": 50.72, "start": 49.519999999999996, "text": "14"}, {"end": 51.64, "start": 50.72, "text": "million"}, {"end": 52.64, "start": 51.64, "text": "model."}, {"end": 53.96, "start": 52.64, "text": "So"}, {"end": 54.519999999999996, "start": 53.96, "text": "however"}, {"end": 54.6, "start": 54.519999999999996, "text": "the"}, {"end": 55.239999999999995, "start": 54.6, "text": "size,"}, {"end": 55.36, "start": 55.239999999999995, "text": "I"}, {"end": 55.72, "start": 55.36, "text": "think"}, {"end": 56.16, "start": 55.72, "text": "it's"}, {"end": 56.16, "start": 56.16, "text": "a"}, {"end": 56.16, "start": 56.16, "text": "hot"}, {"end": 56.6, "start": 56.16, "text": "topic"}, {"end": 56.68, "start": 56.6, "text": "in"}, {"end": 56.84, "start": 56.68, "text": "the"}, {"end": 57.16, "start": 56.84, "text": "AI."}, {"end": 57.480000000000004, "start": 57.16, "text": "And"}, {"end": 57.64, "start": 57.480000000000004, "text": "you"}, {"end": 57.92, "start": 57.64, "text": "know"}, {"end": 58.92, "start": 57.92, "text": "why?"}, {"end": 59.519999999999996, "start": 58.92, "text": "Because"}, {"end": 59.56, "start": 59.519999999999996, "text": "the"}, {"end": 59.96, "start": 59.56, "text": "model"}], "text": " 100 billion free trainable parameters and I have no hardware architecture that I can experiment with the architecture itself. So therefore we look now at models that are really tiny. And I mean tiny, tiny models compared to this. This is a 14 million model. 
So however the size, I think it's a hot topic in the AI. And you know why? Because the model"}, {"chunks": [{"end": 60.2, "start": 60.0, "text": "Model"}, {"end": 60.64, "start": 60.2, "text": "size"}, {"end": 61.24, "start": 60.64, "text": "is"}, {"end": 61.36, "start": 61.24, "text": "a"}, {"end": 62.04, "start": 61.36, "text": "neglectable"}, {"end": 64.4, "start": 62.04, "text": "0.002%"}, {"end": 65.04, "start": 64.4, "text": "of"}, {"end": 65.2, "start": 65.04, "text": "a"}, {"end": 65.6, "start": 65.2, "text": "DeepSeek"}, {"end": 65.84, "start": 65.6, "text": "version"}, {"end": 66.84, "start": 65.84, "text": "3."}, {"end": 67.2, "start": 66.84, "text": "But"}, {"end": 67.52, "start": 67.2, "text": "this"}, {"end": 67.84, "start": 67.52, "text": "is"}, {"end": 68.08, "start": 67.84, "text": "where"}, {"end": 68.24, "start": 68.08, "text": "we"}, {"end": 68.56, "start": 68.24, "text": "can"}, {"end": 69.2, "start": 68.56, "text": "experiment."}, {"end": 69.6, "start": 69.2, "text": "This"}, {"end": 70.0, "start": 69.6, "text": "is"}, {"end": 70.12, "start": 70.0, "text": "where"}, {"end": 70.12, "start": 70.12, "text": "we"}, {"end": 70.56, "start": 70.12, "text": "can"}, {"end": 70.92, "start": 70.56, "text": "find"}, {"end": 71.2, "start": 70.92, "text": "out"}, {"end": 71.6, "start": 71.2, "text": "the"}, {"end": 71.88, "start": 71.6, "text": "learning"}, {"end": 72.36, "start": 71.88, "text": "mechanisms"}, {"end": 72.6, "start": 72.36, "text": "and"}, {"end": 72.88, "start": 72.6, "text": "we"}, {"end": 73.36, "start": 72.88, "text": "can"}, {"end": 74.12, "start": 73.36, "text": "optimize"}, {"end": 74.28, "start": 74.12, "text": "here"}, {"end": 74.48, "start": 74.28, "text": "our"}, {"end": 75.08, "start": 74.48, "text": "training"}, {"end": 76.52, "start": 75.08, "text": "methodologies."}, {"end": 76.8, "start": 76.52, "text": "I"}, {"end": 76.92, "start": 76.8, "text": "have"}, {"end": 77.28, "start": 76.92, "text": "no"}, {"end": 78.0, "start": 
77.28, "text": "way"}, {"end": 78.48, "start": 78.0, "text": "of"}, {"end": 78.84, "start": 78.48, "text": "finding"}, {"end": 79.08, "start": 78.84, "text": "out"}, {"end": 79.24, "start": 79.08, "text": "new"}, {"end": 79.64, "start": 79.24, "text": "ways"}, {"end": 79.88, "start": 79.64, "text": "for"}, {"end": 80.4, "start": 79.88, "text": "optimizing"}, {"end": 80.52, "start": 80.4, "text": "the"}, {"end": 80.92, "start": 80.52, "text": "training"}, {"end": 81.03999999999999, "start": 80.92, "text": "on"}, {"end": 81.4, "start": 81.03999999999999, "text": "a"}, {"end": 82.28, "start": 81.4, "text": "700"}, {"end": 82.96000000000001, "start": 82.28, "text": "billion"}, {"end": 83.12, "start": 82.96000000000001, "text": "free"}, {"end": 83.56, "start": 83.12, "text": "trainable"}, {"end": 84.12, "start": 83.56, "text": "parameter"}, {"end": 84.4, "start": 84.12, "text": "LLM."}, {"end": 86.4, "start": 84.4, "text": "But"}, {"end": 86.72, "start": 86.4, "text": "you"}, {"end": 87.16, "start": 86.72, "text": "know"}, {"end": 87.48, "start": 87.16, "text": "what?"}, {"end": 87.56, "start": 87.48, "text": "If"}, {"end": 87.8, "start": 87.56, "text": "you"}, {"end": 88.12, "start": 87.8, "text": "think"}, {"end": 88.4, "start": 88.12, "text": "about"}, {"end": 88.64, "start": 88.4, "text": "edge"}, {"end": 89.12, "start": 88.64, "text": "devices,"}, {"end": 89.44, "start": 89.12, "text": "and"}, {"end": 89.72, "start": 89.44, "text": "if"}, {"end": 89.96000000000001, "start": 89.72, "text": "you"}], "text": " Model size is a neglectable 0.002% of a DeepSeek version 3. But this is where we can experiment. This is where we can find out the learning mechanisms and we can optimize here our training methodologies. I have no way of finding out new ways for optimizing the training on a 700 billion free trainable parameter LLM. But you know what? 
If you think about edge devices, and if you"}, {"chunks": [{"end": 90.2, "start": 90.0, "text": "that"}, {"end": 90.4, "start": 90.2, "text": "in"}, {"end": 90.76, "start": 90.4, "text": "some"}, {"end": 91.08, "start": 90.76, "text": "years"}, {"end": 91.28, "start": 91.08, "text": "maybe"}, {"end": 91.6, "start": 91.28, "text": "quite"}, {"end": 91.76, "start": 91.6, "text": "a"}, {"end": 92.04, "start": 91.76, "text": "lot"}, {"end": 92.2, "start": 92.04, "text": "of"}, {"end": 92.52, "start": 92.2, "text": "those"}, {"end": 92.68, "start": 92.52, "text": "little"}, {"end": 93.32, "start": 92.68, "text": "devices"}, {"end": 93.6, "start": 93.32, "text": "will"}, {"end": 94.32, "start": 93.6, "text": "have"}, {"end": 94.96, "start": 94.32, "text": "a"}, {"end": 95.88, "start": 94.96, "text": "rudimentary"}, {"end": 96.32, "start": 95.88, "text": "AI"}, {"end": 97.84, "start": 96.32, "text": "intelligence."}, {"end": 98.12, "start": 97.84, "text": "So"}, {"end": 98.4, "start": 98.12, "text": "why"}, {"end": 98.56, "start": 98.4, "text": "not"}, {"end": 98.92, "start": 98.56, "text": "think"}, {"end": 99.6, "start": 98.92, "text": "today"}, {"end": 99.96000000000001, "start": 99.6, "text": "already"}, {"end": 100.72, "start": 99.96000000000001, "text": "here"}, {"end": 101.12, "start": 100.72, "text": "on"}, {"end": 101.76, "start": 101.12, "text": "training"}, {"end": 101.96000000000001, "start": 101.76, "text": "here"}, {"end": 102.48, "start": 101.96000000000001, "text": "tiny"}, {"end": 102.84, "start": 102.48, "text": "little"}, {"end": 103.76, "start": 102.84, "text": "LLMs"}, {"end": 104.08, "start": 103.76, "text": "that"}, {"end": 104.36, "start": 104.08, "text": "still"}, {"end": 104.88, "start": 104.36, "text": "exhibit"}, {"end": 105.24, "start": 104.88, "text": "some"}, {"end": 105.96000000000001, "start": 105.24, "text": "robust"}, {"end": 106.44, "start": 105.96000000000001, "text": "linguistic"}, {"end": 107.24, "start": 106.44, "text": 
"understanding"}, {"end": 107.72, "start": 107.24, "text": "and"}, {"end": 108.12, "start": 107.72, "text": "some"}, {"end": 108.76, "start": 108.12, "text": "reasoning"}, {"end": 109.52, "start": 108.76, "text": "abilities."}, {"end": 109.84, "start": 109.52, "text": "And"}, {"end": 110.08, "start": 109.84, "text": "they"}, {"end": 110.2, "start": 110.08, "text": "are"}, {"end": 110.4, "start": 110.2, "text": "the"}, {"end": 110.8, "start": 110.4, "text": "perfect"}, {"end": 111.44, "start": 110.8, "text": "size"}, {"end": 111.52, "start": 111.44, "text": "and"}, {"end": 111.52, "start": 111.52, "text": "the"}, {"end": 112.03999999999999, "start": 111.52, "text": "perfect"}, {"end": 112.88, "start": 112.03999999999999, "text": "object"}, {"end": 113.36, "start": 112.88, "text": "to"}, {"end": 114.03999999999999, "start": 113.36, "text": "further"}, {"end": 115.36, "start": 114.03999999999999, "text": "experiment"}, {"end": 115.8, "start": 115.36, "text": "with"}, {"end": 118.56, "start": 115.8, "text": "optimization."}, {"end": 118.84, "start": 118.56, "text": "There's"}, {"end": 119.0, "start": 118.84, "text": "now"}, {"end": 119.16, "start": 119.0, "text": "a"}, {"end": 119.28, "start": 119.16, "text": "new"}, {"end": 119.6, "start": 119.28, "text": "research"}, {"end": 119.96000000000001, "start": 119.6, "text": "paper"}], "text": " that in some years maybe quite a lot of those little devices will have a rudimentary AI intelligence. So why not think today already here on training here tiny little LLMs that still exhibit some robust linguistic understanding and some reasoning abilities. And they are the perfect size and the perfect object to further experiment with optimization. 
There's now a new research paper"}, {"chunks": [{"end": 120.24, "start": 120.0, "text": "like"}, {"end": 120.32, "start": 120.24, "text": "to"}, {"end": 121.08, "start": 120.32, "text": "show"}, {"end": 121.24, "start": 121.08, "text": "you."}, {"end": 121.36, "start": 121.24, "text": "And"}, {"end": 121.68, "start": 121.36, "text": "they"}, {"end": 122.16, "start": 121.68, "text": "explore"}, {"end": 122.28, "start": 122.16, "text": "here"}, {"end": 122.8, "start": 122.28, "text": "creating"}, {"end": 123.76, "start": 122.8, "text": "simplified"}, {"end": 124.4, "start": 123.76, "text": "language"}, {"end": 125.24, "start": 124.4, "text": "environment"}, {"end": 125.76, "start": 125.24, "text": "to"}, {"end": 126.04, "start": 125.76, "text": "train"}, {"end": 126.72, "start": 126.04, "text": "particular"}, {"end": 127.08, "start": 126.72, "text": "those"}, {"end": 127.96, "start": 127.08, "text": "tiny,"}, {"end": 128.6, "start": 127.96, "text": "tiny"}, {"end": 129.0, "start": 128.6, "text": "language"}, {"end": 129.36, "start": 129.0, "text": "models."}, {"end": 129.48, "start": 129.36, "text": "And"}, {"end": 129.88, "start": 129.48, "text": "their"}, {"end": 130.28, "start": 129.88, "text": "idea"}, {"end": 130.8, "start": 130.28, "text": "is"}, {"end": 131.0, "start": 130.8, "text": "here"}, {"end": 132.04, "start": 131.0, "text": "simple,"}, {"end": 132.16, "start": 132.04, "text": "like"}, {"end": 132.8, "start": 132.16, "text": "children"}, {"end": 133.04, "start": 132.8, "text": "that"}, {"end": 133.24, "start": 133.04, "text": "learn"}, {"end": 133.44, "start": 133.24, "text": "a"}, {"end": 133.88, "start": 133.44, "text": "language,"}, {"end": 134.2, "start": 133.88, "text": "basic"}, {"end": 134.96, "start": 134.2, "text": "vocabulary,"}, {"end": 135.4, "start": 134.96, "text": "a"}, {"end": 135.72, "start": 135.4, "text": "simple"}, {"end": 136.32, "start": 135.72, "text": "syntax"}, {"end": 136.32, "start": 136.32, "text": "of"}, {"end": 136.32, 
"start": 136.32, "text": "the"}, {"end": 136.56, "start": 136.32, "text": "first"}, {"end": 136.88, "start": 136.56, "text": "words"}, {"end": 137.04, "start": 136.88, "text": "or"}, {"end": 137.2, "start": 137.04, "text": "the"}, {"end": 137.56, "start": 137.2, "text": "first"}, {"end": 138.2, "start": 137.56, "text": "sentences."}, {"end": 138.68, "start": 138.2, "text": "And"}, {"end": 139.28, "start": 138.68, "text": "those"}, {"end": 139.48, "start": 139.28, "text": "tiny"}, {"end": 140.44, "start": 139.48, "text": "LLMs,"}, {"end": 140.72, "start": 140.44, "text": "let's"}, {"end": 140.88, "start": 140.72, "text": "have"}, {"end": 141.0, "start": 140.88, "text": "a"}, {"end": 141.07999999999998, "start": 141.0, "text": "look"}, {"end": 141.24, "start": 141.07999999999998, "text": "if"}, {"end": 141.32, "start": 141.24, "text": "we"}, {"end": 141.76, "start": 141.32, "text": "can"}, {"end": 142.28, "start": 141.76, "text": "implement"}, {"end": 142.84, "start": 142.28, "text": "somehow"}, {"end": 143.24, "start": 142.84, "text": "this"}, {"end": 143.96, "start": 143.24, "text": "idea,"}, {"end": 144.32, "start": 143.96, "text": "this"}, {"end": 145.44, "start": 144.32, "text": "simplicity"}, {"end": 145.56, "start": 145.44, "text": "or"}, {"end": 146.48, "start": 145.56, "text": "simplified"}, {"end": 147.12, "start": 146.48, "text": "language"}, {"end": 148.52, "start": 147.12, "text": "environments"}, {"end": 149.0, "start": 148.52, "text": "also"}, {"end": 149.2, "start": 149.0, "text": "with"}, {"end": 149.96, "start": 149.2, "text": "them."}], "text": " like to show you. And they explore here creating simplified language environment to train particular those tiny, tiny language models. And their idea is here simple, like children that learn a language, basic vocabulary, a simple syntax of the first words or the first sentences. 
And those tiny LLMs, let's have a look if we can implement somehow this idea, this simplicity or simplified language environments also with them."}, {"chunks": [{"end": 151.6, "start": 150.0, "text": "Now"}, {"end": 151.96, "start": 151.6, "text": "it's"}, {"end": 153.0, "start": 151.96, "text": "interesting"}, {"end": 153.48, "start": 153.0, "text": "that"}, {"end": 153.64, "start": 153.48, "text": "the"}, {"end": 154.12, "start": 153.64, "text": "scaling"}, {"end": 154.52, "start": 154.12, "text": "down,"}, {"end": 155.08, "start": 154.52, "text": "this"}, {"end": 155.52, "start": 155.08, "text": "is"}, {"end": 155.88, "start": 155.52, "text": "a"}, {"end": 156.0, "start": 155.88, "text": "very"}, {"end": 156.04, "start": 156.0, "text": "sensitive"}, {"end": 156.64, "start": 156.04, "text": "topic."}, {"end": 156.84, "start": 156.64, "text": "So"}, {"end": 156.84, "start": 156.84, "text": "to"}, {"end": 157.12, "start": 156.84, "text": "achieve"}, {"end": 157.36, "start": 157.12, "text": "here"}, {"end": 157.6, "start": 157.36, "text": "the"}, {"end": 157.96, "start": 157.6, "text": "simpler"}, {"end": 158.76, "start": 157.96, "text": "environments,"}, {"end": 159.16, "start": 158.76, "text": "the"}, {"end": 159.56, "start": 159.16, "text": "authors,"}, {"end": 159.64, "start": 159.56, "text": "I'll"}, {"end": 160.04, "start": 159.64, "text": "show"}, {"end": 160.28, "start": 160.04, "text": "you"}, {"end": 160.56, "start": 160.28, "text": "in"}, {"end": 160.68, "start": 160.56, "text": "a"}, {"end": 160.8, "start": 160.68, "text": "minute,"}, {"end": 161.04, "start": 160.8, "text": "created"}, {"end": 161.28, "start": 161.04, "text": "here"}, {"end": 161.48, "start": 161.28, "text": "a"}, {"end": 161.6, "start": 161.48, "text": "leaner"}, {"end": 162.6, "start": 161.6, "text": "dataset."}, {"end": 162.76, "start": 162.6, "text": "So"}, {"end": 163.04, "start": 162.76, "text": "by"}, {"end": 163.52, "start": 163.04, "text": "revising"}, {"end": 164.0, "start": 
163.52, "text": "existing"}, {"end": 164.4, "start": 164.0, "text": "test"}, {"end": 165.2, "start": 164.4, "text": "datasets"}, {"end": 165.24, "start": 165.2, "text": "to"}, {"end": 165.64, "start": 165.24, "text": "reduce"}, {"end": 165.64, "start": 165.64, "text": "the"}, {"end": 166.12, "start": 165.64, "text": "noise"}, {"end": 166.44, "start": 166.12, "text": "and"}, {"end": 166.84, "start": 166.44, "text": "limit"}, {"end": 167.04, "start": 166.84, "text": "the"}, {"end": 167.36, "start": 167.04, "text": "vocabulary"}, {"end": 167.8, "start": 167.36, "text": "size,"}, {"end": 168.2, "start": 167.8, "text": "we"}, {"end": 168.76, "start": 168.2, "text": "will"}, {"end": 168.96, "start": 168.76, "text": "work"}, {"end": 169.2, "start": 168.96, "text": "here"}, {"end": 169.48, "start": 169.2, "text": "with"}, {"end": 169.48, "start": 169.48, "text": "a"}, {"end": 169.96, "start": 169.48, "text": "vocabulary"}, {"end": 171.2, "start": 169.96, "text": "size"}, {"end": 171.52, "start": 171.2, "text": "of"}, {"end": 171.72, "start": 171.52, "text": "down"}, {"end": 172.12, "start": 171.72, "text": "to"}, {"end": 173.52, "start": 172.12, "text": "2000."}, {"end": 174.0, "start": 173.52, "text": "And"}, {"end": 174.24, "start": 174.0, "text": "our"}, {"end": 174.44, "start": 174.24, "text": "main"}, {"end": 174.8, "start": 174.44, "text": "task"}, {"end": 174.84, "start": 174.8, "text": "we"}, {"end": 175.04, "start": 174.84, "text": "will"}, {"end": 175.32, "start": 175.04, "text": "have"}, {"end": 175.44, "start": 175.32, "text": "to"}, {"end": 176.16, "start": 175.44, "text": "simplify"}, {"end": 176.72, "start": 176.16, "text": "complex"}, {"end": 177.16, "start": 176.72, "text": "ideas."}, {"end": 178.04, "start": 177.16, "text": "Because"}, {"end": 178.24, "start": 178.04, "text": "we"}, {"end": 178.84, "start": 178.24, "text": "are"}, {"end": 179.44, "start": 178.84, "text": "working"}, {"end": 179.68, "start": 179.44, "text": "with"}, {"end": 179.96, 
"start": 179.68, "text": "models"}], "text": " Now it's interesting that the scaling down, this is a very sensitive topic. So to achieve here the simpler environments, the authors, I'll show you in a minute, created here a leaner dataset. So by revising existing test datasets to reduce the noise and limit the vocabulary size, we will work here with a vocabulary size of down to 2000. And our main task we will have to simplify complex ideas. Because we are working with models"}, {"chunks": [{"end": 180.48, "start": 180.0, "text": "that"}, {"end": 180.56, "start": 180.48, "text": "are"}, {"end": 181.24, "start": 180.56, "text": "so"}, {"end": 181.8, "start": 181.24, "text": "tiny,"}, {"end": 182.12, "start": 181.8, "text": "it's"}, {"end": 182.48, "start": 182.12, "text": "no"}, {"end": 182.72, "start": 182.48, "text": "way"}, {"end": 182.88, "start": 182.72, "text": "that"}, {"end": 183.16, "start": 182.88, "text": "they"}, {"end": 183.48, "start": 183.16, "text": "have"}, {"end": 183.8, "start": 183.48, "text": "some"}, {"end": 184.6, "start": 183.8, "text": "complex"}, {"end": 185.24, "start": 184.6, "text": "ideas"}, {"end": 185.36, "start": 185.24, "text": "that"}, {"end": 185.6, "start": 185.36, "text": "they"}, {"end": 186.12, "start": 185.6, "text": "can"}, {"end": 186.72, "start": 186.12, "text": "calculate."}, {"end": 186.96, "start": 186.72, "text": "So"}, {"end": 187.64, "start": 186.96, "text": "I"}, {"end": 187.8, "start": 187.64, "text": "will"}, {"end": 188.12, "start": 187.8, "text": "show"}, {"end": 188.36, "start": 188.12, "text": "you"}, {"end": 188.48, "start": 188.36, "text": "then"}, {"end": 188.48, "start": 188.48, "text": "the"}, {"end": 189.0, "start": 188.48, "text": "prompt"}, {"end": 189.04, "start": 189.0, "text": "at"}, {"end": 189.28, "start": 189.04, "text": "the"}, {"end": 189.4, "start": 189.28, "text": "end"}, {"end": 189.4, "start": 189.4, "text": "of"}, {"end": 189.6, "start": 189.4, "text": "this"}, {"end": 189.96, "start": 
189.6, "text": "video,"}, {"end": 190.2, "start": 189.96, "text": "how"}, {"end": 190.44, "start": 190.2, "text": "we"}, {"end": 190.68, "start": 190.44, "text": "do"}, {"end": 191.24, "start": 190.68, "text": "exactly"}, {"end": 191.48, "start": 191.24, "text": "here"}, {"end": 191.72, "start": 191.48, "text": "the"}, {"end": 192.44, "start": 191.72, "text": "downscaling,"}, {"end": 192.6, "start": 192.44, "text": "the"}, {"end": 193.44, "start": 192.6, "text": "simplification"}, {"end": 193.56, "start": 193.44, "text": "of"}, {"end": 194.16, "start": 193.56, "text": "complex"}, {"end": 194.4, "start": 194.16, "text": "ideas."}, {"end": 194.68, "start": 194.4, "text": "But"}, {"end": 194.68, "start": 194.68, "text": "for"}, {"end": 194.72, "start": 194.68, "text": "the"}, {"end": 195.16, "start": 194.72, "text": "moment,"}, {"end": 195.52, "start": 195.16, "text": "let's"}, {"end": 196.07999999999998, "start": 195.52, "text": "focus"}, {"end": 196.12, "start": 196.07999999999998, "text": "on"}, {"end": 196.24, "start": 196.12, "text": "the"}, {"end": 196.92000000000002, "start": 196.24, "text": "methodology"}, {"end": 199.04, "start": 196.92000000000002, "text": "itself."}, {"end": 199.4, "start": 199.04, "text": "For"}, {"end": 199.8, "start": 199.4, "text": "the"}, {"end": 200.2, "start": 199.8, "text": "data,"}, {"end": 200.28, "start": 200.2, "text": "it"}, {"end": 200.44, "start": 200.28, "text": "is"}, {"end": 201.07999999999998, "start": 200.44, "text": "relatively"}, {"end": 201.36, "start": 201.07999999999998, "text": "easy,"}, {"end": 201.44, "start": 201.36, "text": "and"}, {"end": 201.52, "start": 201.44, "text": "I'll"}, {"end": 201.68, "start": 201.52, "text": "show"}, {"end": 201.8, "start": 201.68, "text": "you"}, {"end": 201.92000000000002, "start": 201.8, "text": "the"}, {"end": 202.56, "start": 201.92000000000002, "text": "prompt"}, {"end": 203.0, "start": 202.56, "text": "that"}, {"end": 203.24, "start": 203.0, "text": "we"}, {"end": 203.84, 
"start": 203.24, "text": "have,"}, {"end": 204.44, "start": 203.84, "text": "DeepSeek,"}, {"end": 204.64, "start": 204.44, "text": "for"}, {"end": 205.16, "start": 204.64, "text": "example,"}, {"end": 205.68, "start": 205.16, "text": "to"}, {"end": 206.12, "start": 205.68, "text": "revise"}, {"end": 206.24, "start": 206.12, "text": "the"}, {"end": 206.6, "start": 206.24, "text": "data"}, {"end": 206.76, "start": 206.6, "text": "set"}, {"end": 206.84, "start": 206.76, "text": "and"}, {"end": 207.12, "start": 206.84, "text": "create"}, {"end": 207.68, "start": 207.12, "text": "those"}, {"end": 207.88, "start": 207.68, "text": "simpler"}, {"end": 208.0, "start": 207.88, "text": "data"}, {"end": 208.32, "start": 208.0, "text": "set."}, {"end": 208.44, "start": 208.32, "text": "And"}, {"end": 208.52, "start": 208.44, "text": "of"}, {"end": 208.88, "start": 208.52, "text": "course,"}, {"end": 209.24, "start": 208.88, "text": "we"}, {"end": 209.96, "start": 209.24, "text": "can"}], "text": " that are so tiny, it's no way that they have some complex ideas that they can calculate. So I will show you then the prompt at the end of this video, how we do exactly here the downscaling, the simplification of complex ideas. But for the moment, let's focus on the methodology itself. For the data, it is relatively easy, and I'll show you the prompt that we have, DeepSeek, for example, to revise the data set and create those simpler data set. 
And of course, we can"}, {"chunks": [{"end": 210.24, "start": 210.0, "text": "make"}, {"end": 210.36, "start": 210.24, "text": "it"}, {"end": 210.64, "start": 210.36, "text": "much"}, {"end": 210.8, "start": 210.64, "text": "more"}, {"end": 211.4, "start": 210.8, "text": "interesting"}, {"end": 211.52, "start": 211.4, "text": "if"}, {"end": 211.76, "start": 211.52, "text": "we"}, {"end": 212.12, "start": 211.76, "text": "have"}, {"end": 212.52, "start": 212.12, "text": "now"}, {"end": 212.72, "start": 212.52, "text": "this"}, {"end": 213.0, "start": 212.72, "text": "open"}, {"end": 214.2, "start": 213.0, "text": "architecture"}, {"end": 214.36, "start": 214.2, "text": "by"}, {"end": 215.0, "start": 214.36, "text": "integrating"}, {"end": 215.24, "start": 215.0, "text": "now"}, {"end": 215.32, "start": 215.24, "text": "what"}, {"end": 215.56, "start": 215.32, "text": "is"}, {"end": 215.8, "start": 215.56, "text": "called"}, {"end": 215.88, "start": 215.8, "text": "a"}, {"end": 216.56, "start": 215.88, "text": "curriculum"}, {"end": 217.76, "start": 216.56, "text": "learning."}, {"end": 218.0, "start": 217.76, "text": "Very"}, {"end": 218.44, "start": 218.0, "text": "simple"}, {"end": 218.6, "start": 218.44, "text": "idea,"}, {"end": 218.76, "start": 218.6, "text": "the"}, {"end": 218.92, "start": 218.76, "text": "model"}, {"end": 219.12, "start": 218.92, "text": "is"}, {"end": 219.48, "start": 219.12, "text": "first"}, {"end": 219.76, "start": 219.48, "text": "trained"}, {"end": 219.96, "start": 219.76, "text": "on"}, {"end": 220.32, "start": 219.96, "text": "some"}, {"end": 220.92, "start": 220.32, "text": "simple"}, {"end": 221.84, "start": 220.92, "text": "non-complex"}, {"end": 222.2, "start": 221.84, "text": "data"}, {"end": 222.48, "start": 222.2, "text": "and"}, {"end": 222.8, "start": 222.48, "text": "then"}, {"end": 223.04, "start": 222.8, "text": "we"}, {"end": 223.84, "start": 223.04, "text": "gradually"}, {"end": 224.48, "start": 223.84, "text": 
"expose"}, {"end": 224.64, "start": 224.48, "text": "the"}, {"end": 224.96, "start": 224.64, "text": "model"}, {"end": 225.04, "start": 224.96, "text": "in"}, {"end": 225.2, "start": 225.04, "text": "the"}, {"end": 225.6, "start": 225.2, "text": "training"}, {"end": 225.8, "start": 225.6, "text": "phase,"}, {"end": 226.32, "start": 225.8, "text": "especially"}, {"end": 226.4, "start": 226.32, "text": "in"}, {"end": 226.72, "start": 226.4, "text": "the"}, {"end": 227.12, "start": 226.72, "text": "pre-training"}, {"end": 227.48, "start": 227.12, "text": "phase,"}, {"end": 228.07999999999998, "start": 227.48, "text": "to"}, {"end": 228.48, "start": 228.07999999999998, "text": "more"}, {"end": 229.28, "start": 228.48, "text": "complex"}, {"end": 229.48, "start": 229.28, "text": "data,"}, {"end": 229.76, "start": 229.48, "text": "to"}, {"end": 230.2, "start": 229.76, "text": "a"}, {"end": 230.72, "start": 230.2, "text": "higher"}, {"end": 231.44, "start": 230.72, "text": "complexity"}, {"end": 231.52, "start": 231.44, "text": "in"}, {"end": 231.8, "start": 231.52, "text": "the"}, {"end": 232.24, "start": 231.8, "text": "reasoning"}, {"end": 232.84, "start": 232.24, "text": "structure"}, {"end": 233.44, "start": 232.84, "text": "of"}, {"end": 233.92000000000002, "start": 233.44, "text": "our"}, {"end": 235.16, "start": 233.92000000000002, "text": "informations."}, {"end": 235.56, "start": 235.16, "text": "And"}, {"end": 236.16, "start": 235.56, "text": "then"}, {"end": 236.16, "start": 236.16, "text": "of"}, {"end": 236.76, "start": 236.16, "text": "course"}, {"end": 236.84, "start": 236.76, "text": "we"}, {"end": 237.04, "start": 236.84, "text": "can"}, {"end": 237.48, "start": 237.04, "text": "focus"}, {"end": 237.76, "start": 237.48, "text": "now"}, {"end": 237.76, "start": 237.76, "text": "on"}, {"end": 238.04, "start": 237.76, "text": "training"}, {"end": 238.32, "start": 238.04, "text": "these"}, {"end": 238.72, "start": 238.32, "text": "tiny"}, {"end": 239.36, 
"start": 238.72, "text": "LLMs"}, {"end": 239.68, "start": 239.36, "text": "and"}, {"end": 239.96, "start": 239.68, "text": "we"}], "text": " make it much more interesting if we have now this open architecture by integrating now what is called a curriculum learning. Very simple idea, the model is first trained on some simple non-complex data and then we gradually expose the model in the training phase, especially in the pre-training phase, to more complex data, to a higher complexity in the reasoning structure of our informations. And then of course we can focus now on training these tiny LLMs and we"}, {"chunks": [{"end": 240.52, "start": 240.0, "text": "implement"}, {"end": 241.2, "start": 240.52, "text": "now"}, {"end": 241.84, "start": 241.2, "text": "instruction"}, {"end": 242.44, "start": 241.84, "text": "following"}, {"end": 243.28, "start": 242.44, "text": "procedures."}, {"end": 243.52, "start": 243.28, "text": "And"}, {"end": 243.8, "start": 243.52, "text": "this"}, {"end": 244.16, "start": 243.8, "text": "is"}, {"end": 244.36, "start": 244.16, "text": "what"}, {"end": 244.52, "start": 244.36, "text": "we"}, {"end": 244.64, "start": 244.52, "text": "need"}, {"end": 245.04, "start": 244.64, "text": "to"}, {"end": 245.32, "start": 245.04, "text": "build"}, {"end": 246.0, "start": 245.32, "text": "autonomous"}, {"end": 246.48, "start": 246.0, "text": "agent."}, {"end": 246.8, "start": 246.48, "text": "And"}, {"end": 247.16, "start": 246.8, "text": "as"}, {"end": 247.16, "start": 247.16, "text": "I"}, {"end": 247.44, "start": 247.16, "text": "told"}, {"end": 247.72, "start": 247.44, "text": "you,"}, {"end": 248.2, "start": 247.72, "text": "those"}, {"end": 248.52, "start": 248.2, "text": "edge"}, {"end": 249.2, "start": 248.52, "text": "devices,"}, {"end": 249.48, "start": 249.2, "text": "and"}, {"end": 249.6, "start": 249.48, "text": "you"}, {"end": 249.76, "start": 249.6, "text": "see"}, {"end": 250.32, "start": 249.76, "text": "there's"}, {"end": 250.52, 
"start": 250.32, "text": "somebody"}, {"end": 252.08, "start": 250.52, "text": "missing"}, {"end": 252.44, "start": 252.08, "text": "here."}, {"end": 252.88, "start": 252.44, "text": "If"}, {"end": 253.28, "start": 252.88, "text": "we"}, {"end": 253.92, "start": 253.28, "text": "succeed"}, {"end": 254.32, "start": 253.92, "text": "to"}, {"end": 254.68, "start": 254.32, "text": "have"}, {"end": 255.4, "start": 254.68, "text": "tiny"}, {"end": 255.88, "start": 255.4, "text": "models"}, {"end": 256.4, "start": 255.88, "text": "here,"}, {"end": 256.88, "start": 256.4, "text": "almost"}, {"end": 257.12, "start": 256.88, "text": "in"}, {"end": 257.64, "start": 257.12, "text": "every"}, {"end": 257.88, "start": 257.64, "text": "gadget"}, {"end": 258.12, "start": 257.88, "text": "that"}, {"end": 258.36, "start": 258.12, "text": "is"}, {"end": 258.6, "start": 258.36, "text": "out"}, {"end": 259.36, "start": 258.6, "text": "there,"}, {"end": 259.52, "start": 259.36, "text": "we"}, {"end": 259.72, "start": 259.52, "text": "have"}, {"end": 259.84, "start": 259.72, "text": "to"}, {"end": 260.04, "start": 259.84, "text": "make"}, {"end": 260.52, "start": 260.04, "text": "sure"}, {"end": 260.72, "start": 260.52, "text": "that"}, {"end": 261.04, "start": 260.72, "text": "this"}, {"end": 261.76, "start": 261.04, "text": "intelligence"}, {"end": 262.04, "start": 261.76, "text": "is"}, {"end": 262.76, "start": 262.04, "text": "reliable,"}, {"end": 262.84, "start": 262.76, "text": "that"}, {"end": 262.88, "start": 262.84, "text": "it"}, {"end": 263.08, "start": 262.88, "text": "is"}, {"end": 264.32, "start": 263.08, "text": "performant."}, {"end": 264.52, "start": 264.32, "text": "So"}, {"end": 265.32, "start": 264.52, "text": "building"}, {"end": 265.76, "start": 265.32, "text": "here"}, {"end": 266.44, "start": 265.76, "text": "autonomous"}, {"end": 266.6, "start": 266.44, "text": "agent,"}, {"end": 266.8, "start": 266.6, "text": "this"}, {"end": 267.12, "start": 266.8, "text": 
"is"}, {"end": 267.52, "start": 267.12, "text": "a"}, {"end": 267.84, "start": 267.52, "text": "real"}, {"end": 268.36, "start": 267.84, "text": "challenge"}, {"end": 268.72, "start": 268.36, "text": "to"}, {"end": 269.04, "start": 268.72, "text": "make"}, {"end": 269.24, "start": 269.04, "text": "them"}, {"end": 269.96, "start": 269.24, "text": "simple,"}], "text": " implement now instruction following procedures. And this is what we need to build autonomous agent. And as I told you, those edge devices, and you see there's somebody missing here. If we succeed to have tiny models here, almost in every gadget that is out there, we have to make sure that this intelligence is reliable, that it is performant. So building here autonomous agent, this is a real challenge to make them simple,"}, {"chunks": [{"end": 271.0, "start": 270.0, "text": "High-performant,"}, {"end": 271.96, "start": 271.0, "text": "cheap"}, {"end": 272.32, "start": 271.96, "text": "to"}, {"end": 272.92, "start": 272.32, "text": "implement,"}, {"end": 273.6, "start": 272.92, "text": "simple"}, {"end": 273.76, "start": 273.6, "text": "to"}, {"end": 274.48, "start": 273.76, "text": "maintain"}, {"end": 274.84, "start": 274.48, "text": "and"}, {"end": 274.96, "start": 274.84, "text": "to"}, {"end": 275.6, "start": 274.96, "text": "coordinate"}, {"end": 275.8, "start": 275.6, "text": "here"}, {"end": 275.96, "start": 275.8, "text": "in"}, {"end": 276.08, "start": 275.96, "text": "a"}, {"end": 278.0, "start": 276.08, "text": "swarm."}, {"end": 278.72, "start": 278.0, "text": "So"}, {"end": 278.8, "start": 278.72, "text": "we"}, {"end": 279.32, "start": 278.8, "text": "also"}, {"end": 279.56, "start": 279.32, "text": "will"}, {"end": 279.92, "start": 279.56, "text": "go"}, {"end": 280.24, "start": 279.92, "text": "to"}, {"end": 280.48, "start": 280.24, "text": "the"}, {"end": 280.72, "start": 280.48, "text": "step"}, {"end": 280.72, "start": 280.72, "text": "where"}, {"end": 280.8, "start": 280.72, 
"text": "we"}, {"end": 280.88, "start": 280.8, "text": "say,"}, {"end": 280.96, "start": 280.88, "text": "okay,"}, {"end": 280.96, "start": 280.96, "text": "we"}, {"end": 281.2, "start": 280.96, "text": "will"}, {"end": 281.56, "start": 281.2, "text": "focus"}, {"end": 281.68, "start": 281.56, "text": "now"}, {"end": 281.96, "start": 281.68, "text": "to"}, {"end": 282.36, "start": 281.96, "text": "build"}, {"end": 283.68, "start": 282.36, "text": "self-evolving"}, {"end": 284.64, "start": 283.68, "text": "engines"}, {"end": 285.76, "start": 284.64, "text": "because"}, {"end": 286.32, "start": 285.76, "text": "nothing"}, {"end": 286.64, "start": 286.32, "text": "worse"}, {"end": 286.84, "start": 286.64, "text": "than"}, {"end": 286.96, "start": 286.84, "text": "an"}, {"end": 287.36, "start": 286.96, "text": "agent"}, {"end": 287.4, "start": 287.36, "text": "that"}, {"end": 287.84, "start": 287.4, "text": "has"}, {"end": 288.24, "start": 287.84, "text": "no"}, {"end": 288.64, "start": 288.24, "text": "real"}, {"end": 289.84, "start": 288.64, "text": "possibility"}, {"end": 290.24, "start": 289.84, "text": "to"}, {"end": 290.84, "start": 290.24, "text": "advance"}, {"end": 291.32, "start": 290.84, "text": "itself,"}, {"end": 291.76, "start": 291.32, "text": "to"}, {"end": 292.08, "start": 291.76, "text": "learn"}, {"end": 292.28, "start": 292.08, "text": "new"}, {"end": 293.36, "start": 292.28, "text": "stuff."}, {"end": 293.6, "start": 293.36, "text": "So"}, {"end": 294.68, "start": 293.6, "text": "self-evolving"}, {"end": 295.08, "start": 294.68, "text": "agents"}, {"end": 295.32, "start": 295.08, "text": "that"}, {"end": 295.32, "start": 295.32, "text": "are"}, {"end": 295.68, "start": 295.32, "text": "based"}, {"end": 295.8, "start": 295.68, "text": "on"}, {"end": 296.4, "start": 295.8, "text": "tiny"}, {"end": 297.56, "start": 296.4, "text": "LLMs,"}, {"end": 297.96, "start": 297.56, "text": "this"}, {"end": 298.28, "start": 297.96, "text": "is"}, {"end": 298.52, 
"start": 298.28, "text": "an"}, {"end": 299.08, "start": 298.52, "text": "absolute"}, {"end": 299.64, "start": 299.08, "text": "fascinating"}, {"end": 299.96, "start": 299.64, "text": "topic."}], "text": " High-performant, cheap to implement, simple to maintain and to coordinate here in a swarm. So we also will go to the step where we say, okay, we will focus now to build self-evolving engines because nothing worse than an agent that has no real possibility to advance itself, to learn new stuff. So self-evolving agents that are based on tiny LLMs, this is an absolute fascinating topic."}, {"chunks": [{"end": 300.6, "start": 300.0, "text": "And"}, {"end": 300.8, "start": 300.6, "text": "up"}, {"end": 301.4, "start": 300.8, "text": "until"}, {"end": 301.72, "start": 301.4, "text": "now,"}, {"end": 302.08, "start": 301.72, "text": "I"}, {"end": 302.64, "start": 302.08, "text": "ignored"}, {"end": 303.2, "start": 302.64, "text": "tiny"}, {"end": 303.76, "start": 303.2, "text": "LLMs"}, {"end": 303.92, "start": 303.76, "text": "on"}, {"end": 304.08, "start": 303.92, "text": "my"}, {"end": 304.68, "start": 304.08, "text": "channel."}, {"end": 305.08, "start": 304.68, "text": "And"}, {"end": 305.32, "start": 305.08, "text": "I"}, {"end": 305.48, "start": 305.32, "text": "would"}, {"end": 305.68, "start": 305.48, "text": "like"}, {"end": 305.8, "start": 305.68, "text": "to"}, {"end": 306.12, "start": 305.8, "text": "correct"}, {"end": 307.96, "start": 306.12, "text": "this."}, {"end": 308.32, "start": 307.96, "text": "Now,"}, {"end": 308.6, "start": 308.32, "text": "you"}, {"end": 308.84, "start": 308.6, "text": "might"}, {"end": 309.24, "start": 308.84, "text": "say,"}, {"end": 309.36, "start": 309.24, "text": "and"}, {"end": 309.6, "start": 309.36, "text": "I"}, {"end": 309.96, "start": 309.6, "text": "thought"}, {"end": 310.12, "start": 309.96, "text": "the"}, {"end": 310.6, "start": 310.12, "text": "same,"}, {"end": 311.08, "start": 310.6, "text": "that,"}, {"end": 
311.28, "start": 311.08, "text": "okay,"}, {"end": 311.6, "start": 311.28, "text": "we"}, {"end": 311.92, "start": 311.6, "text": "just"}, {"end": 312.4, "start": 311.92, "text": "use"}, {"end": 312.72, "start": 312.4, "text": "a"}, {"end": 313.2, "start": 312.72, "text": "huge"}, {"end": 313.56, "start": 313.2, "text": "EI"}, {"end": 313.96, "start": 313.56, "text": "system,"}, {"end": 314.8, "start": 313.96, "text": "01,"}, {"end": 315.6, "start": 314.8, "text": "03,"}, {"end": 315.96, "start": 315.6, "text": "to"}, {"end": 316.52, "start": 315.96, "text": "train"}, {"end": 316.92, "start": 316.52, "text": "our"}, {"end": 317.36, "start": 316.92, "text": "tiny"}, {"end": 317.96, "start": 317.36, "text": "LLMs."}, {"end": 318.04, "start": 317.96, "text": "And"}, {"end": 318.44, "start": 318.04, "text": "you"}, {"end": 318.68, "start": 318.44, "text": "know,"}, {"end": 318.8, "start": 318.68, "text": "we"}, {"end": 319.68, "start": 318.8, "text": "have,"}, {"end": 319.88, "start": 319.68, "text": "given"}, {"end": 320.24, "start": 319.88, "text": "history,"}, {"end": 320.44, "start": 320.24, "text": "a"}, {"end": 320.8, "start": 320.44, "text": "lot"}, {"end": 320.96, "start": 320.8, "text": "of"}, {"end": 322.16, "start": 320.96, "text": "methodologies."}, {"end": 322.64, "start": 322.16, "text": "Knowledge"}, {"end": 323.36, "start": 322.64, "text": "distillation,"}, {"end": 323.68, "start": 323.36, "text": "model"}, {"end": 324.28, "start": 323.68, "text": "compression,"}, {"end": 325.56, "start": 324.28, "text": "teacher-student"}, {"end": 326.52, "start": 325.56, "text": "learning,"}, {"end": 326.88, "start": 326.52, "text": "model"}, {"end": 327.6, "start": 326.88, "text": "transfer."}, {"end": 327.92, "start": 327.6, "text": "A"}, {"end": 328.16, "start": 327.92, "text": "real"}, {"end": 328.44, "start": 328.16, "text": "nice"}, {"end": 328.52, "start": 328.44, "text": "one"}, {"end": 328.8, "start": 328.52, "text": "is"}, {"end": 328.88, "start": 328.8, 
"text": "the"}, {"end": 329.2, "start": 328.88, "text": "supervised"}, {"end": 329.56, "start": 329.2, "text": "fine-tuning"}, {"end": 329.6, "start": 329.56, "text": "with"}, {"end": 329.96, "start": 329.6, "text": "LLMs."}], "text": " And up until now, I ignored tiny LLMs on my channel. And I would like to correct this. Now, you might say, and I thought the same, that, okay, we just use a huge EI system, 01, 03, to train our tiny LLMs. And you know, we have, given history, a lot of methodologies. Knowledge distillation, model compression, teacher-student learning, model transfer. A real nice one is the supervised fine-tuning with LLMs."}, {"chunks": [{"end": 330.32, "start": 330.0, "text": "LLM"}, {"end": 331.12, "start": 330.32, "text": "supervision,"}, {"end": 331.24, "start": 331.12, "text": "or"}, {"end": 331.44, "start": 331.24, "text": "you"}, {"end": 331.64, "start": 331.44, "text": "just"}, {"end": 331.68, "start": 331.64, "text": "go"}, {"end": 332.08, "start": 331.68, "text": "with"}, {"end": 332.08, "start": 332.08, "text": "a"}, {"end": 332.6, "start": 332.08, "text": "proxy"}, {"end": 332.8, "start": 332.6, "text": "model"}, {"end": 333.28, "start": 332.8, "text": "training"}, {"end": 334.68, "start": 333.28, "text": "procedure."}, {"end": 334.88, "start": 334.68, "text": "But"}, {"end": 335.84, "start": 334.88, "text": "interestingly,"}, {"end": 336.08, "start": 335.84, "text": "reading"}, {"end": 336.36, "start": 336.08, "text": "now"}, {"end": 336.96, "start": 336.36, "text": "these"}, {"end": 337.12, "start": 336.96, "text": "new"}, {"end": 337.56, "start": 337.12, "text": "papers,"}, {"end": 337.68, "start": 337.56, "text": "I"}, {"end": 338.28, "start": 337.68, "text": "thought,"}, {"end": 338.68, "start": 338.28, "text": "this"}, {"end": 338.96, "start": 338.68, "text": "is"}, {"end": 339.24, "start": 338.96, "text": "not"}, {"end": 339.36, "start": 339.24, "text": "the"}, {"end": 339.6, "start": 339.36, "text": "way"}, {"end": 340.12, 
"start": 339.6, "text": "forward,"}, {"end": 340.24, "start": 340.12, "text": "you"}, {"end": 341.44, "start": 340.24, "text": "know."}, {"end": 342.32, "start": 341.44, "text": "However,"}, {"end": 343.16, "start": 342.32, "text": "interestingly,"}, {"end": 343.32, "start": 343.16, "text": "there"}, {"end": 343.64, "start": 343.32, "text": "was"}, {"end": 343.76, "start": 343.64, "text": "an"}, {"end": 343.88, "start": 343.76, "text": "order"}, {"end": 344.04, "start": 343.88, "text": "that"}, {"end": 344.24, "start": 344.04, "text": "said,"}, {"end": 344.44, "start": 344.24, "text": "hey,"}, {"end": 344.84, "start": 344.44, "text": "training"}, {"end": 344.92, "start": 344.84, "text": "the"}, {"end": 345.32, "start": 344.92, "text": "tiny"}, {"end": 345.76, "start": 345.32, "text": "LLMs"}, {"end": 345.96, "start": 345.76, "text": "now"}, {"end": 346.12, "start": 345.96, "text": "on"}, {"end": 347.04, "start": 346.12, "text": "our"}, {"end": 347.6, "start": 347.04, "text": "leaner"}, {"end": 347.84, "start": 347.6, "text": "data"}, {"end": 348.2, "start": 347.84, "text": "set"}, {"end": 348.52, "start": 348.2, "text": "with"}, {"end": 348.6, "start": 348.52, "text": "a"}, {"end": 349.04, "start": 348.6, "text": "reduced"}, {"end": 350.32, "start": 349.04, "text": "complexity"}, {"end": 350.8, "start": 350.32, "text": "will"}, {"end": 351.4, "start": 350.8, "text": "enhance"}, {"end": 351.52, "start": 351.4, "text": "now"}, {"end": 351.72, "start": 351.52, "text": "the"}, {"end": 352.12, "start": 351.72, "text": "learning"}, {"end": 353.28, "start": 352.12, "text": "efficiency"}, {"end": 353.56, "start": 353.28, "text": "and"}, {"end": 353.84, "start": 353.56, "text": "allow"}, {"end": 354.04, "start": 353.84, "text": "our"}, {"end": 354.52, "start": 354.04, "text": "tiny"}, {"end": 355.12, "start": 354.52, "text": "LLMs"}, {"end": 356.08, "start": 355.12, "text": "to"}, {"end": 356.92, "start": 356.08, "text": "perform"}, {"end": 357.28, "start": 356.92, "text": 
"better"}, {"end": 357.56, "start": 357.28, "text": "on"}, {"end": 358.0, "start": 357.56, "text": "downstream"}, {"end": 359.8, "start": 358.0, "text": "tasks."}, {"end": 359.96, "start": 359.8, "text": "So,"}], "text": " LLM supervision, or you just go with a proxy model training procedure. But interestingly, reading now these new papers, I thought, this is not the way forward, you know. However, interestingly, there was an order that said, hey, training the tiny LLMs now on our leaner data set with a reduced complexity will enhance now the learning efficiency and allow our tiny LLMs to perform better on downstream tasks. So,"}, {"chunks": [{"end": 360.2, "start": 360.0, "text": "Those"}, {"end": 360.68, "start": 360.2, "text": "tiny"}, {"end": 361.24, "start": 360.68, "text": "LLMs"}, {"end": 361.36, "start": 361.24, "text": "are"}, {"end": 361.56, "start": 361.36, "text": "now"}, {"end": 362.0, "start": 361.56, "text": "trained"}, {"end": 362.2, "start": 362.0, "text": "on"}, {"end": 362.72, "start": 362.2, "text": "leaner"}, {"end": 362.84, "start": 362.72, "text": "data"}, {"end": 363.08, "start": 362.84, "text": "set,"}, {"end": 363.96, "start": 363.08, "text": "and"}, {"end": 364.2, "start": 363.96, "text": "the"}, {"end": 364.48, "start": 364.2, "text": "hope"}, {"end": 364.68, "start": 364.48, "text": "is"}, {"end": 364.68, "start": 364.68, "text": "that"}, {"end": 364.84, "start": 364.68, "text": "they"}, {"end": 365.04, "start": 364.84, "text": "can"}, {"end": 365.24, "start": 365.04, "text": "achieve"}, {"end": 365.72, "start": 365.24, "text": "similar"}, {"end": 365.88, "start": 365.72, "text": "or"}, {"end": 366.28, "start": 365.88, "text": "even"}, {"end": 366.64, "start": 366.28, "text": "better"}, {"end": 367.2, "start": 366.64, "text": "performance"}, {"end": 367.36, "start": 367.2, "text": "for"}, {"end": 367.76, "start": 367.36, "text": "specific"}, {"end": 368.28, "start": 367.76, "text": "tasks"}, {"end": 368.4, "start": 368.28, "text": 
"that"}, {"end": 368.56, "start": 368.4, "text": "are"}, {"end": 369.04, "start": 368.56, "text": "for"}, {"end": 369.44, "start": 369.04, "text": "those"}, {"end": 369.68, "start": 369.44, "text": "edge"}, {"end": 370.24, "start": 369.68, "text": "devices"}, {"end": 370.8, "start": 370.24, "text": "designed,"}, {"end": 371.84, "start": 370.8, "text": "compared"}, {"end": 372.08, "start": 371.84, "text": "if"}, {"end": 372.16, "start": 372.08, "text": "you"}, {"end": 372.36, "start": 372.16, "text": "would"}, {"end": 372.56, "start": 372.36, "text": "put"}, {"end": 372.8, "start": 372.56, "text": "there"}, {"end": 372.88, "start": 372.8, "text": "a"}, {"end": 373.64, "start": 372.88, "text": "complex"}, {"end": 374.0, "start": 373.64, "text": "LLM"}, {"end": 374.24, "start": 374.0, "text": "with"}, {"end": 374.28, "start": 374.24, "text": "an"}, {"end": 374.64, "start": 374.28, "text": "internet"}, {"end": 375.2, "start": 374.64, "text": "connection"}, {"end": 375.72, "start": 375.2, "text": "to"}, {"end": 376.32, "start": 375.72, "text": "your"}, {"end": 376.84, "start": 376.32, "text": "server"}, {"end": 376.96, "start": 376.84, "text": "in"}, {"end": 377.48, "start": 376.96, "text": "China"}, {"end": 377.76, "start": 377.48, "text": "or"}, {"end": 377.92, "start": 377.76, "text": "in"}, {"end": 378.12, "start": 377.92, "text": "Silicon"}, {"end": 378.68, "start": 378.12, "text": "Valley"}, {"end": 378.88, "start": 378.68, "text": "or"}, {"end": 379.36, "start": 378.88, "text": "wherever"}, {"end": 379.6, "start": 379.36, "text": "your"}, {"end": 379.92, "start": 379.6, "text": "model"}, {"end": 380.16, "start": 379.92, "text": "is"}, {"end": 380.52, "start": 380.16, "text": "positioned."}, {"end": 381.12, "start": 380.52, "text": "So"}, {"end": 381.72, "start": 381.12, "text": "let's"}, {"end": 382.04, "start": 381.72, "text": "explore"}, {"end": 382.44, "start": 382.04, "text": "this"}, {"end": 382.68, "start": 382.44, "text": "and"}, {"end": 383.2, "start": 
382.68, "text": "let's"}, {"end": 383.48, "start": 383.2, "text": "see"}, {"end": 383.64, "start": 383.48, "text": "where"}, {"end": 383.76, "start": 383.64, "text": "the"}, {"end": 384.12, "start": 383.76, "text": "research"}, {"end": 384.32, "start": 384.12, "text": "is"}, {"end": 384.64, "start": 384.32, "text": "today."}, {"end": 386.48, "start": 384.64, "text": "We"}, {"end": 386.56, "start": 386.48, "text": "are"}, {"end": 387.24, "start": 386.56, "text": "now"}, {"end": 387.92, "start": 387.24, "text": "entering"}, {"end": 388.96, "start": 387.92, "text": "here"}, {"end": 389.28, "start": 388.96, "text": "how"}, {"end": 389.52, "start": 389.28, "text": "to"}, {"end": 389.96, "start": 389.52, "text": "compose"}], "text": " Those tiny LLMs are now trained on leaner data set, and the hope is that they can achieve similar or even better performance for specific tasks that are for those edge devices designed, compared if you would put there a complex LLM with an internet connection to your server in China or in Silicon Valley or wherever your model is positioned. So let's explore this and let's see where the research is today. 
We are now entering here how to compose"}, {"chunks": [{"end": 390.36, "start": 390.0, "text": "those"}, {"end": 391.04, "start": 390.36, "text": "training"}, {"end": 391.52, "start": 391.04, "text": "data"}, {"end": 391.52, "start": 391.52, "text": "set"}, {"end": 391.56, "start": 391.52, "text": "and"}, {"end": 392.12, "start": 391.56, "text": "pre-training"}, {"end": 392.32, "start": 392.12, "text": "data"}, {"end": 392.52, "start": 392.32, "text": "set"}, {"end": 392.64, "start": 392.52, "text": "for"}, {"end": 392.92, "start": 392.64, "text": "tiny"}, {"end": 393.92, "start": 392.92, "text": "LLMs."}, {"end": 394.28, "start": 393.92, "text": "It"}, {"end": 394.52, "start": 394.28, "text": "is"}, {"end": 394.84, "start": 394.52, "text": "not"}, {"end": 395.52, "start": 394.84, "text": "simple"}, {"end": 395.64, "start": 395.52, "text": "at"}, {"end": 396.68, "start": 395.64, "text": "all,"}, {"end": 397.28, "start": 396.68, "text": "especially"}, {"end": 397.48, "start": 397.28, "text": "if"}, {"end": 397.88, "start": 397.48, "text": "we"}, {"end": 398.2, "start": 397.88, "text": "allow"}, {"end": 398.6, "start": 398.2, "text": "the"}, {"end": 399.0, "start": 398.6, "text": "agent"}, {"end": 399.08, "start": 399.0, "text": "or"}, {"end": 399.4, "start": 399.08, "text": "the"}, {"end": 399.64, "start": 399.4, "text": "model,"}, {"end": 399.72, "start": 399.64, "text": "the"}, {"end": 400.16, "start": 399.72, "text": "tiny"}, {"end": 401.04, "start": 400.16, "text": "LLM,"}, {"end": 401.2, "start": 401.04, "text": "in"}, {"end": 401.44, "start": 401.2, "text": "their"}, {"end": 401.88, "start": 401.44, "text": "training"}, {"end": 402.52, "start": 401.88, "text": "strategy"}, {"end": 402.84, "start": 402.52, "text": "to"}, {"end": 403.32, "start": 402.84, "text": "actively"}, {"end": 403.96, "start": 403.32, "text": "seek"}, {"end": 404.72, "start": 403.96, "text": "knowledge"}, {"end": 405.08, "start": 404.72, "text": "during"}, {"end": 405.16, "start": 405.08, 
"text": "the"}, {"end": 405.76, "start": 405.16, "text": "pre-training"}, {"end": 406.48, "start": 405.76, "text": "phase."}, {"end": 406.6, "start": 406.48, "text": "So"}, {"end": 407.08, "start": 406.6, "text": "this"}, {"end": 407.4, "start": 407.08, "text": "is"}, {"end": 407.76, "start": 407.4, "text": "a"}, {"end": 407.96, "start": 407.76, "text": "new"}, {"end": 408.28, "start": 407.96, "text": "form"}, {"end": 408.44, "start": 408.28, "text": "of"}, {"end": 409.04, "start": 408.44, "text": "pre-training,"}, {"end": 409.24, "start": 409.04, "text": "and"}, {"end": 409.56, "start": 409.24, "text": "this"}, {"end": 409.88, "start": 409.56, "text": "video"}, {"end": 410.6, "start": 409.88, "text": "now"}, {"end": 410.96, "start": 410.6, "text": "is"}, {"end": 411.2, "start": 410.96, "text": "a"}, {"end": 411.72, "start": 411.2, "text": "continuation"}, {"end": 411.88, "start": 411.72, "text": "of"}, {"end": 412.04, "start": 411.88, "text": "my"}, {"end": 412.32, "start": 412.04, "text": "video"}, {"end": 412.6, "start": 412.32, "text": "from"}, {"end": 413.24, "start": 412.6, "text": "yesterday,"}, {"end": 413.64, "start": 413.24, "text": "where"}, {"end": 413.96, "start": 413.64, "text": "we're"}, {"end": 414.28, "start": 413.96, "text": "talking"}, {"end": 414.6, "start": 414.28, "text": "about"}, {"end": 414.92, "start": 414.6, "text": "new"}, {"end": 415.48, "start": 414.92, "text": "pre-training"}, {"end": 416.24, "start": 415.48, "text": "methodologies"}, {"end": 416.56, "start": 416.24, "text": "for"}, {"end": 417.24, "start": 416.56, "text": "normal"}, {"end": 418.28, "start": 417.24, "text": "LLMs."}, {"end": 418.52, "start": 418.28, "text": "But"}, {"end": 419.12, "start": 418.52, "text": "if"}, {"end": 419.48, "start": 419.12, "text": "we"}, {"end": 419.96, "start": 419.48, "text": "simply"}], "text": " those training data set and pre-training data set for tiny LLMs. 
It is not simple at all, especially if we allow the agent or the model, the tiny LLM, in their training strategy to actively seek knowledge during the pre-training phase. So this is a new form of pre-training, and this video now is a continuation of my video from yesterday, where we're talking about new pre-training methodologies for normal LLMs. But if we simply"}, {"chunks": [{"end": 420.4, "start": 420.0, "text": "to"}, {"end": 420.6, "start": 420.4, "text": "the"}, {"end": 421.32, "start": 420.6, "text": "max."}, {"end": 421.92, "start": 421.32, "text": "There"}, {"end": 422.28, "start": 421.92, "text": "are"}, {"end": 422.52, "start": 422.28, "text": "some"}, {"end": 422.76, "start": 422.52, "text": "new"}, {"end": 422.92, "start": 422.76, "text": "insights"}, {"end": 423.2, "start": 422.92, "text": "I"}, {"end": 423.48, "start": 423.2, "text": "found"}, {"end": 425.04, "start": 423.48, "text": "fascinating."}, {"end": 425.24, "start": 425.04, "text": "Yeah,"}, {"end": 425.48, "start": 425.24, "text": "and"}, {"end": 425.8, "start": 425.48, "text": "models"}, {"end": 426.28, "start": 425.8, "text": "pre-trained"}, {"end": 426.68, "start": 426.28, "text": "on"}, {"end": 427.12, "start": 426.68, "text": "easier"}, {"end": 427.44, "start": 427.12, "text": "data"}, {"end": 427.56, "start": 427.44, "text": "with"}, {"end": 427.76, "start": 427.56, "text": "a"}, {"end": 428.0, "start": 427.76, "text": "lower"}, {"end": 428.76, "start": 428.0, "text": "complexity,"}, {"end": 428.84, "start": 428.76, "text": "and"}, {"end": 429.36, "start": 428.84, "text": "we"}, {"end": 429.8, "start": 429.36, "text": "will"}, {"end": 430.16, "start": 429.8, "text": "connect"}, {"end": 430.32, "start": 430.16, "text": "here"}, {"end": 430.52, "start": 430.32, "text": "the"}, {"end": 431.4, "start": 430.52, "text": "information"}, {"end": 431.76, "start": 431.4, "text": "density,"}, {"end": 432.08, "start": 431.76, "text": "the"}, {"end": 433.4, "start": 432.08, "text": 
"complexity,"}, {"end": 433.6, "start": 433.4, "text": "and"}, {"end": 433.68, "start": 433.6, "text": "the"}, {"end": 434.48, "start": 433.68, "text": "information"}, {"end": 435.16, "start": 434.48, "text": "entropy"}, {"end": 435.32, "start": 435.16, "text": "in"}, {"end": 435.44, "start": 435.32, "text": "about"}, {"end": 435.8, "start": 435.44, "text": "five"}, {"end": 436.28, "start": 435.8, "text": "minutes."}, {"end": 437.28, "start": 436.28, "text": "The"}, {"end": 437.6, "start": 437.28, "text": "idea"}, {"end": 437.92, "start": 437.6, "text": "is"}, {"end": 438.36, "start": 437.92, "text": "to"}, {"end": 438.8, "start": 438.36, "text": "improve"}, {"end": 438.96, "start": 438.8, "text": "the"}, {"end": 439.28, "start": 438.96, "text": "ability"}, {"end": 439.32, "start": 439.28, "text": "of"}, {"end": 439.72, "start": 439.32, "text": "those"}, {"end": 440.28, "start": 439.72, "text": "tiny"}, {"end": 440.88, "start": 440.28, "text": "LLMs"}, {"end": 441.04, "start": 440.88, "text": "to"}, {"end": 441.4, "start": 441.04, "text": "follow"}, {"end": 441.92, "start": 441.4, "text": "instruction."}, {"end": 442.0, "start": 441.92, "text": "But"}, {"end": 442.4, "start": 442.0, "text": "this"}, {"end": 442.88, "start": 442.4, "text": "is"}, {"end": 443.4, "start": 442.88, "text": "really"}, {"end": 443.96, "start": 443.4, "text": "delicate,"}, {"end": 445.6, "start": 443.96, "text": "sensible"}, {"end": 446.4, "start": 445.6, "text": "equilibrium"}, {"end": 446.4, "start": 446.4, "text": "that"}, {"end": 446.72, "start": 446.4, "text": "we"}, {"end": 447.04, "start": 446.72, "text": "have"}, {"end": 447.16, "start": 447.04, "text": "to"}, {"end": 447.4, "start": 447.16, "text": "take"}, {"end": 449.52, "start": 447.4, "text": "care"}, {"end": 449.96, "start": 449.52, "text": "of."}], "text": " to the max. There are some new insights I found fascinating. 
Yeah, and models pre-trained on easier data with a lower complexity, and we will connect here the information density, the complexity, and the information entropy in about five minutes. The idea is to improve the ability of those tiny LLMs to follow instruction. But this is really delicate, sensible equilibrium that we have to take care of."}, {"chunks": [{"end": 450.56, "start": 450.0, "text": "imagine"}, {"end": 450.76, "start": 450.56, "text": "if"}, {"end": 450.92, "start": 450.76, "text": "we"}, {"end": 451.12, "start": 450.92, "text": "work"}, {"end": 451.24, "start": 451.12, "text": "with"}, {"end": 451.44, "start": 451.24, "text": "those"}, {"end": 451.8, "start": 451.44, "text": "tiny"}, {"end": 452.16, "start": 451.8, "text": "LLMs,"}, {"end": 452.6, "start": 452.16, "text": "there's"}, {"end": 452.96, "start": 452.6, "text": "a"}, {"end": 453.44, "start": 452.96, "text": "trade-off."}, {"end": 453.52, "start": 453.44, "text": "A"}, {"end": 453.96, "start": 453.52, "text": "trade-off"}, {"end": 454.32, "start": 453.96, "text": "between"}, {"end": 454.96, "start": 454.32, "text": "reducing"}, {"end": 455.0, "start": 454.96, "text": "the"}, {"end": 455.52, "start": 455.0, "text": "complexity"}, {"end": 455.64, "start": 455.52, "text": "of"}, {"end": 455.76, "start": 455.64, "text": "our"}, {"end": 456.16, "start": 455.76, "text": "training"}, {"end": 456.72, "start": 456.16, "text": "data"}, {"end": 456.88, "start": 456.72, "text": "set"}, {"end": 456.96, "start": 456.88, "text": "and"}, {"end": 457.24, "start": 456.96, "text": "what"}, {"end": 457.36, "start": 457.24, "text": "the"}, {"end": 457.72, "start": 457.36, "text": "tiny"}, {"end": 458.08, "start": 457.72, "text": "LLM"}, {"end": 458.36, "start": 458.08, "text": "is"}, {"end": 458.64, "start": 458.36, "text": "able"}, {"end": 458.8, "start": 458.64, "text": "to"}, {"end": 459.36, "start": 458.8, "text": "perform"}, {"end": 459.44, "start": 459.36, "text": "in"}, {"end": 459.92, "start": 459.44, 
"text": "a"}, {"end": 460.28, "start": 459.92, "text": "downstream"}, {"end": 460.76, "start": 460.28, "text": "task"}, {"end": 461.28, "start": 460.76, "text": "and"}, {"end": 461.64, "start": 461.28, "text": "retaining"}, {"end": 462.08, "start": 461.64, "text": "here"}, {"end": 462.44, "start": 462.08, "text": "the"}, {"end": 463.32, "start": 462.44, "text": "model's"}, {"end": 464.12, "start": 463.32, "text": "ability"}, {"end": 464.52, "start": 464.12, "text": "to"}, {"end": 465.24, "start": 464.52, "text": "perform"}, {"end": 465.8, "start": 465.24, "text": "well"}, {"end": 466.04, "start": 465.8, "text": "on"}, {"end": 466.64, "start": 466.04, "text": "unseen"}, {"end": 467.0, "start": 466.64, "text": "data,"}, {"end": 467.72, "start": 467.0, "text": "to"}, {"end": 468.68, "start": 467.72, "text": "generalize"}, {"end": 468.92, "start": 468.68, "text": "with"}, {"end": 469.36, "start": 468.92, "text": "unseen"}, {"end": 470.32, "start": 469.36, "text": "data."}, {"end": 470.48, "start": 470.32, "text": "So"}, {"end": 470.92, "start": 470.48, "text": "there's"}, {"end": 471.16, "start": 470.92, "text": "this"}, {"end": 471.68, "start": 471.16, "text": "balance"}, {"end": 471.84, "start": 471.68, "text": "here"}, {"end": 472.0, "start": 471.84, "text": "of"}, {"end": 472.36, "start": 472.0, "text": "kind"}, {"end": 472.44, "start": 472.36, "text": "of"}, {"end": 473.12, "start": 472.44, "text": "simplifying"}, {"end": 473.24, "start": 473.12, "text": "the"}, {"end": 473.72, "start": 473.24, "text": "data"}, {"end": 473.72, "start": 473.72, "text": "set"}, {"end": 474.0, "start": 473.72, "text": "down"}, {"end": 474.0, "start": 474.0, "text": "to"}, {"end": 474.04, "start": 474.0, "text": "a"}, {"end": 474.48, "start": 474.04, "text": "certain"}, {"end": 474.92, "start": 474.48, "text": "limit."}, {"end": 475.2, "start": 474.92, "text": "Maybe"}, {"end": 476.12, "start": 475.2, "text": "we"}, {"end": 476.36, "start": 476.12, "text": "use"}, {"end": 476.4, 
"start": 476.36, "text": "the"}, {"end": 477.0, "start": 476.4, "text": "entropy"}, {"end": 477.36, "start": 477.0, "text": "equation"}, {"end": 477.6, "start": 477.36, "text": "to"}, {"end": 477.84, "start": 477.6, "text": "find"}, {"end": 478.48, "start": 477.84, "text": "out"}, {"end": 479.0, "start": 478.48, "text": "where's"}, {"end": 479.04, "start": 479.0, "text": "the"}, {"end": 479.96, "start": 479.04, "text": "threshold"}], "text": " imagine if we work with those tiny LLMs, there's a trade-off. A trade-off between reducing the complexity of our training data set and what the tiny LLM is able to perform in a downstream task and retaining here the model's ability to perform well on unseen data, to generalize with unseen data. So there's this balance here of kind of simplifying the data set down to a certain limit. Maybe we use the entropy equation to find out where's the threshold"}, {"chunks": [{"end": 480.64, "start": 480.0, "text": "And"}, {"end": 480.96, "start": 480.64, "text": "the"}, {"end": 481.56, "start": 480.96, "text": "other"}, {"end": 481.84, "start": 481.56, "text": "side"}, {"end": 481.84, "start": 481.84, "text": "of"}, {"end": 481.92, "start": 481.84, "text": "the"}, {"end": 482.2, "start": 481.92, "text": "limit"}, {"end": 482.36, "start": 482.2, "text": "is,"}, {"end": 482.52, "start": 482.36, "text": "of"}, {"end": 482.84, "start": 482.52, "text": "course,"}, {"end": 482.96, "start": 482.84, "text": "that"}, {"end": 483.2, "start": 482.96, "text": "we"}, {"end": 483.52, "start": 483.2, "text": "need"}, {"end": 485.16, "start": 483.52, "text": "here"}, {"end": 485.56, "start": 485.16, "text": "a"}, {"end": 486.0, "start": 485.56, "text": "rich"}, {"end": 486.8, "start": 486.0, "text": "distribution"}, {"end": 486.96, "start": 486.8, "text": "of"}, {"end": 487.12, "start": 486.96, "text": "the"}, {"end": 487.4, "start": 487.12, "text": "data."}, {"end": 487.92, "start": 487.4, "text": "So"}, {"end": 488.36, "start": 487.92, "text": 
"the"}, {"end": 488.64, "start": 488.36, "text": "LLM,"}, {"end": 488.96, "start": 488.64, "text": "the"}, {"end": 489.32, "start": 488.96, "text": "tiny"}, {"end": 489.64, "start": 489.32, "text": "LLM,"}, {"end": 489.8, "start": 489.64, "text": "is"}, {"end": 489.96, "start": 489.8, "text": "not"}, {"end": 490.32, "start": 489.96, "text": "simply"}, {"end": 490.6, "start": 490.32, "text": "just"}, {"end": 491.28, "start": 490.6, "text": "memorizing"}, {"end": 491.36, "start": 491.28, "text": "the"}, {"end": 491.8, "start": 491.36, "text": "training"}, {"end": 492.0, "start": 491.8, "text": "data."}, {"end": 492.44, "start": 492.0, "text": "And"}, {"end": 492.72, "start": 492.44, "text": "we"}, {"end": 492.76, "start": 492.72, "text": "have"}, {"end": 492.84, "start": 492.76, "text": "an"}, {"end": 493.16, "start": 492.84, "text": "overfitting."}, {"end": 493.92, "start": 493.16, "text": "So"}, {"end": 494.28, "start": 493.92, "text": "the"}, {"end": 494.64, "start": 494.28, "text": "goal"}, {"end": 494.76, "start": 494.64, "text": "is"}, {"end": 494.92, "start": 494.76, "text": "now"}, {"end": 495.0, "start": 494.92, "text": "by"}, {"end": 495.28, "start": 495.0, "text": "keeping"}, {"end": 495.4, "start": 495.28, "text": "the"}, {"end": 496.16, "start": 495.4, "text": "structure"}, {"end": 496.36, "start": 496.16, "text": "of"}, {"end": 496.68, "start": 496.36, "text": "this"}, {"end": 497.2, "start": 496.68, "text": "pre-training"}, {"end": 497.44, "start": 497.2, "text": "data"}, {"end": 497.72, "start": 497.44, "text": "set"}, {"end": 498.88, "start": 497.72, "text": "aligned"}, {"end": 499.16, "start": 498.88, "text": "with"}, {"end": 499.64, "start": 499.16, "text": "kind"}, {"end": 499.64, "start": 499.64, "text": "of"}, {"end": 499.8, "start": 499.64, "text": "the"}, {"end": 500.6, "start": 499.8, "text": "conventional"}, {"end": 500.96, "start": 500.6, "text": "LLM"}, {"end": 501.16, "start": 500.96, "text": "that"}, {"end": 501.32, "start": 501.16, 
"text": "we"}, {"end": 501.48, "start": 501.32, "text": "know,"}, {"end": 502.28, "start": 501.48, "text": "and"}, {"end": 502.88, "start": 502.28, "text": "also"}, {"end": 503.2, "start": 502.88, "text": "simplify"}, {"end": 503.36, "start": 503.2, "text": "the"}, {"end": 503.64, "start": 503.36, "text": "data"}, {"end": 503.76, "start": 503.64, "text": "for"}, {"end": 504.0, "start": 503.76, "text": "an"}, {"end": 504.6, "start": 504.0, "text": "easier"}, {"end": 505.08, "start": 504.6, "text": "learning,"}, {"end": 505.72, "start": 505.08, "text": "the"}, {"end": 506.16, "start": 505.72, "text": "authors"}, {"end": 506.28, "start": 506.16, "text": "aim"}, {"end": 506.52, "start": 506.28, "text": "to"}, {"end": 506.88, "start": 506.52, "text": "increase"}, {"end": 507.08, "start": 506.88, "text": "now"}, {"end": 507.68, "start": 507.08, "text": "this"}, {"end": 508.6, "start": 507.68, "text": "general"}, {"end": 509.96, "start": 508.6, "text": "tasks,"}], "text": " And the other side of the limit is, of course, that we need here a rich distribution of the data. So the LLM, the tiny LLM, is not simply just memorizing the training data. And we have an overfitting. 
So the goal is now by keeping the structure of this pre-training data set aligned with kind of the conventional LLM that we know, and also simplify the data for an easier learning, the authors aim to increase now this general tasks,"}, {"chunks": [{"end": 510.76, "start": 510.0, "text": "ability"}, {"end": 510.88, "start": 510.76, "text": "of"}, {"end": 511.56, "start": 510.88, "text": "tiny"}, {"end": 513.2, "start": 511.56, "text": "LLMs."}, {"end": 513.6, "start": 513.2, "text": "This"}, {"end": 513.84, "start": 513.6, "text": "is"}, {"end": 513.92, "start": 513.84, "text": "the"}, {"end": 513.92, "start": 513.92, "text": "paper"}, {"end": 514.16, "start": 513.92, "text": "we're"}, {"end": 514.56, "start": 514.16, "text": "going"}, {"end": 514.6, "start": 514.56, "text": "to"}, {"end": 514.76, "start": 514.6, "text": "talk"}, {"end": 515.2, "start": 514.76, "text": "about."}, {"end": 515.52, "start": 515.2, "text": "You"}, {"end": 515.64, "start": 515.52, "text": "see,"}, {"end": 515.76, "start": 515.64, "text": "the"}, {"end": 516.04, "start": 515.76, "text": "very"}, {"end": 516.4, "start": 516.04, "text": "last"}, {"end": 516.64, "start": 516.4, "text": "day"}, {"end": 516.68, "start": 516.64, "text": "of"}, {"end": 517.0, "start": 516.68, "text": "December"}, {"end": 517.96, "start": 517.0, "text": "2024,"}, {"end": 518.32, "start": 517.96, "text": "we"}, {"end": 518.8, "start": 518.32, "text": "have"}, {"end": 519.0, "start": 518.8, "text": "here"}, {"end": 519.32, "start": 519.0, "text": "those"}, {"end": 519.8, "start": 519.32, "text": "researchers"}, {"end": 519.92, "start": 519.8, "text": "from"}, {"end": 520.12, "start": 519.92, "text": "the"}, {"end": 520.44, "start": 520.12, "text": "University"}, {"end": 520.6, "start": 520.44, "text": "of"}, {"end": 520.88, "start": 520.6, "text": "Illinois,"}, {"end": 521.16, "start": 520.88, "text": "and"}, {"end": 521.56, "start": 521.16, "text": "they"}, {"end": 522.0, "start": 521.56, "text": "have"}, {"end": 
522.6, "start": 522.0, "text": "an"}, {"end": 523.24, "start": 522.6, "text": "interesting"}, {"end": 523.72, "start": 523.24, "text": "title,"}, {"end": 524.84, "start": 523.72, "text": "Training"}, {"end": 525.0, "start": 524.84, "text": "and"}, {"end": 525.6, "start": 525.0, "text": "Evaluating"}, {"end": 526.12, "start": 525.6, "text": "Tiny"}, {"end": 526.68, "start": 526.12, "text": "Language"}, {"end": 527.0, "start": 526.68, "text": "Model"}, {"end": 527.04, "start": 527.0, "text": "in"}, {"end": 527.2, "start": 527.04, "text": "a"}, {"end": 527.92, "start": 527.2, "text": "Simpler"}, {"end": 528.4, "start": 527.92, "text": "Language"}, {"end": 528.88, "start": 528.4, "text": "Environment."}, {"end": 528.88, "start": 528.88, "text": "And"}, {"end": 529.2, "start": 528.88, "text": "maybe"}, {"end": 529.52, "start": 529.2, "text": "if"}, {"end": 529.92, "start": 529.52, "text": "you"}, {"end": 530.32, "start": 529.92, "text": "just"}, {"end": 530.4, "start": 530.32, "text": "read"}, {"end": 530.52, "start": 530.4, "text": "the"}, {"end": 531.36, "start": 530.52, "text": "title,"}, {"end": 531.56, "start": 531.36, "text": "I"}, {"end": 531.92, "start": 531.56, "text": "was"}, {"end": 532.08, "start": 531.92, "text": "not"}, {"end": 532.8, "start": 532.08, "text": "aware"}, {"end": 533.28, "start": 532.8, "text": "about"}, {"end": 534.0, "start": 533.28, "text": "the"}, {"end": 534.6, "start": 534.0, "text": "impact"}, {"end": 535.0, "start": 534.6, "text": "this"}, {"end": 535.2, "start": 535.0, "text": "idea"}, {"end": 535.4, "start": 535.2, "text": "could"}, {"end": 535.76, "start": 535.4, "text": "have."}, {"end": 536.24, "start": 535.76, "text": "You"}, {"end": 536.56, "start": 536.24, "text": "have"}, {"end": 536.76, "start": 536.56, "text": "here"}, {"end": 536.76, "start": 536.76, "text": "the"}, {"end": 537.16, "start": 536.76, "text": "GitHub,"}, {"end": 537.4, "start": 537.16, "text": "you"}, {"end": 537.4, "start": 537.4, "text": "have"}, {"end": 
537.52, "start": 537.4, "text": "the"}, {"end": 537.8, "start": 537.52, "text": "code,"}, {"end": 538.0, "start": 537.8, "text": "you"}, {"end": 538.0, "start": 538.0, "text": "have"}, {"end": 538.16, "start": 538.0, "text": "the"}, {"end": 538.28, "start": 538.16, "text": "data"}, {"end": 538.56, "start": 538.28, "text": "set,"}, {"end": 538.6, "start": 538.56, "text": "you"}, {"end": 538.64, "start": 538.6, "text": "can"}, {"end": 538.76, "start": 538.64, "text": "go"}, {"end": 538.76, "start": 538.76, "text": "and"}, {"end": 538.88, "start": 538.76, "text": "you"}, {"end": 539.32, "start": 538.88, "text": "can"}, {"end": 539.96, "start": 539.32, "text": "experience"}], "text": " ability of tiny LLMs. This is the paper we're going to talk about. You see, the very last day of December 2024, we have here those researchers from the University of Illinois, and they have an interesting title, Training and Evaluating Tiny Language Model in a Simpler Language Environment. And maybe if you just read the title, I was not aware about the impact this idea could have. 
You have here the GitHub, you have the code, you have the data set, you can go and you can experience"}, {"chunks": [{"end": 540.4, "start": 540.0, "text": "this"}, {"end": 540.88, "start": 540.4, "text": "yourself."}, {"end": 541.08, "start": 540.88, "text": "But"}, {"end": 541.72, "start": 541.08, "text": "I"}, {"end": 541.84, "start": 541.72, "text": "would"}, {"end": 542.04, "start": 541.84, "text": "like"}, {"end": 542.48, "start": 542.04, "text": "to"}, {"end": 543.36, "start": 542.48, "text": "focus"}, {"end": 543.8, "start": 543.36, "text": "here"}, {"end": 544.24, "start": 543.8, "text": "on"}, {"end": 544.52, "start": 544.24, "text": "the"}, {"end": 544.84, "start": 544.52, "text": "idea."}, {"end": 545.04, "start": 544.84, "text": "So"}, {"end": 545.08, "start": 545.04, "text": "the"}, {"end": 545.32, "start": 545.08, "text": "idea"}, {"end": 545.52, "start": 545.32, "text": "is"}, {"end": 545.56, "start": 545.52, "text": "to"}, {"end": 546.2, "start": 545.56, "text": "create"}, {"end": 546.44, "start": 546.2, "text": "such"}, {"end": 546.68, "start": 546.44, "text": "a"}, {"end": 547.04, "start": 546.68, "text": "simple"}, {"end": 547.4, "start": 547.04, "text": "language"}, {"end": 547.8, "start": 547.4, "text": "environment,"}, {"end": 548.12, "start": 547.8, "text": "the"}, {"end": 548.4, "start": 548.12, "text": "authors"}, {"end": 549.04, "start": 548.4, "text": "here"}, {"end": 549.28, "start": 549.04, "text": "said,"}, {"end": 549.56, "start": 549.28, "text": "we"}, {"end": 549.76, "start": 549.56, "text": "want"}, {"end": 550.04, "start": 549.76, "text": "to"}, {"end": 550.36, "start": 550.04, "text": "build"}, {"end": 550.6, "start": 550.36, "text": "here"}, {"end": 550.84, "start": 550.6, "text": "a"}, {"end": 551.52, "start": 550.84, "text": "minimizing"}, {"end": 552.04, "start": 551.52, "text": "language"}, {"end": 552.52, "start": 552.04, "text": "dataset"}, {"end": 553.64, "start": 552.52, "text": "noise"}, {"end": 554.08, "start": 
553.64, "text": "and"}, {"end": 554.52, "start": 554.08, "text": "a"}, {"end": 555.0, "start": 554.52, "text": "minimum"}, {"end": 555.84, "start": 555.0, "text": "complexity"}, {"end": 555.92, "start": 555.84, "text": "in"}, {"end": 556.12, "start": 555.92, "text": "our"}, {"end": 557.24, "start": 556.12, "text": "dataset"}, {"end": 557.4, "start": 557.24, "text": "in"}, {"end": 557.8, "start": 557.4, "text": "order"}, {"end": 558.04, "start": 557.8, "text": "to"}, {"end": 558.68, "start": 558.04, "text": "preserve"}, {"end": 558.92, "start": 558.68, "text": "here"}, {"end": 559.08, "start": 558.92, "text": "the"}, {"end": 559.56, "start": 559.08, "text": "essential"}, {"end": 560.64, "start": 559.56, "text": "characteristics"}, {"end": 560.8, "start": 560.64, "text": "of"}, {"end": 560.88, "start": 560.8, "text": "the"}, {"end": 561.4, "start": 560.88, "text": "text"}, {"end": 562.6, "start": 561.4, "text": "distribution."}, {"end": 562.84, "start": 562.6, "text": "As"}, {"end": 563.08, "start": 562.84, "text": "I"}, {"end": 563.64, "start": 563.08, "text": "told"}, {"end": 563.92, "start": 563.64, "text": "you,"}, {"end": 564.04, "start": 563.92, "text": "we"}, {"end": 564.6, "start": 564.04, "text": "want"}, {"end": 565.28, "start": 564.6, "text": "to"}, {"end": 565.6, "start": 565.28, "text": "have"}, {"end": 565.96, "start": 565.6, "text": "a"}, {"end": 566.44, "start": 565.96, "text": "variety"}, {"end": 566.44, "start": 566.44, "text": "of"}, {"end": 566.68, "start": 566.44, "text": "different"}, {"end": 566.96, "start": 566.68, "text": "data"}, {"end": 567.56, "start": 566.96, "text": "distribution"}, {"end": 567.8, "start": 567.56, "text": "from"}, {"end": 568.04, "start": 567.8, "text": "different"}, {"end": 568.36, "start": 568.04, "text": "domain"}, {"end": 568.52, "start": 568.36, "text": "so"}, {"end": 568.56, "start": 568.52, "text": "that"}, {"end": 568.84, "start": 568.56, "text": "a"}, {"end": 569.2, "start": 568.84, "text": "model"}, {"end": 
569.68, "start": 569.2, "text": "is"}, {"end": 569.96, "start": 569.68, "text": "able"}], "text": " this yourself. But I would like to focus here on the idea. So the idea is to create such a simple language environment, the authors here said, we want to build here a minimizing language dataset noise and a minimum complexity in our dataset in order to preserve here the essential characteristics of the text distribution. As I told you, we want to have a variety of different data distribution from different domain so that a model is able"}, {"chunks": [{"end": 570.56, "start": 570.0, "text": "although"}, {"end": 570.84, "start": 570.56, "text": "it's"}, {"end": 570.92, "start": 570.84, "text": "highly"}, {"end": 571.68, "start": 570.92, "text": "specialized"}, {"end": 572.2, "start": 571.68, "text": "to"}, {"end": 572.72, "start": 572.2, "text": "perform"}, {"end": 573.2, "start": 572.72, "text": "here"}, {"end": 573.36, "start": 573.2, "text": "maybe"}, {"end": 573.76, "start": 573.36, "text": "even"}, {"end": 574.2, "start": 573.76, "text": "a"}, {"end": 574.44, "start": 574.2, "text": "more"}, {"end": 575.2, "start": 574.44, "text": "complex"}, {"end": 577.0, "start": 575.2, "text": "reasoning."}, {"end": 577.76, "start": 577.0, "text": "Let"}, {"end": 577.92, "start": 577.76, "text": "me"}, {"end": 578.16, "start": 577.92, "text": "give"}, {"end": 578.36, "start": 578.16, "text": "you"}, {"end": 578.68, "start": 578.36, "text": "here"}, {"end": 578.88, "start": 578.68, "text": "some"}, {"end": 579.84, "start": 578.88, "text": "simple"}, {"end": 580.28, "start": 579.84, "text": "insight"}, {"end": 580.52, "start": 580.28, "text": "here"}, {"end": 580.8, "start": 580.52, "text": "from"}, {"end": 581.12, "start": 580.8, "text": "this"}, {"end": 582.0, "start": 581.12, "text": "publication."}, {"end": 582.16, "start": 582.0, "text": "So"}, {"end": 582.52, "start": 582.16, "text": "this"}, {"end": 582.56, "start": 582.52, "text": "is"}, {"end": 582.68, "start": 582.56, 
"text": "the"}, {"end": 583.16, "start": 582.68, "text": "publication"}, {"end": 583.24, "start": 583.16, "text": "and"}, {"end": 583.36, "start": 583.24, "text": "now"}, {"end": 583.48, "start": 583.36, "text": "we"}, {"end": 583.76, "start": 583.48, "text": "jump"}, {"end": 584.12, "start": 583.76, "text": "into"}, {"end": 584.24, "start": 584.12, "text": "the"}, {"end": 584.76, "start": 584.24, "text": "publication,"}, {"end": 585.24, "start": 584.76, "text": "the"}, {"end": 585.6, "start": 585.24, "text": "data"}, {"end": 585.88, "start": 585.6, "text": "set."}, {"end": 586.16, "start": 585.88, "text": "You"}, {"end": 586.76, "start": 586.16, "text": "know,"}, {"end": 587.04, "start": 586.76, "text": "for"}, {"end": 587.16, "start": 587.04, "text": "a"}, {"end": 587.6, "start": 587.16, "text": "normal"}, {"end": 588.2, "start": 587.6, "text": "LLM,"}, {"end": 589.0, "start": 588.2, "text": "for"}, {"end": 589.08, "start": 589.0, "text": "a"}, {"end": 590.24, "start": 589.08, "text": "multi"}, {"end": 591.12, "start": 590.24, "text": "700B"}, {"end": 592.0, "start": 591.12, "text": "model,"}, {"end": 592.28, "start": 592.0, "text": "this"}, {"end": 592.44, "start": 592.28, "text": "would"}, {"end": 592.48, "start": 592.44, "text": "be"}, {"end": 592.64, "start": 592.48, "text": "a"}, {"end": 593.32, "start": 592.64, "text": "sentence"}, {"end": 593.32, "start": 593.32, "text": "that"}, {"end": 593.56, "start": 593.32, "text": "the"}, {"end": 593.88, "start": 593.56, "text": "model"}, {"end": 594.2, "start": 593.88, "text": "is"}, {"end": 594.52, "start": 594.2, "text": "able"}, {"end": 594.96, "start": 594.52, "text": "because"}, {"end": 595.16, "start": 594.96, "text": "it"}, {"end": 595.2, "start": 595.16, "text": "has"}, {"end": 595.44, "start": 595.2, "text": "been"}, {"end": 595.64, "start": 595.44, "text": "trained,"}, {"end": 595.8, "start": 595.64, "text": "I"}, {"end": 595.96, "start": 595.8, "text": "don't"}, {"end": 596.12, "start": 595.96, "text": 
"know"}, {"end": 596.24, "start": 596.12, "text": "on"}, {"end": 596.48, "start": 596.24, "text": "how"}, {"end": 596.68, "start": 596.48, "text": "many"}, {"end": 597.48, "start": 596.68, "text": "trillion"}, {"end": 598.08, "start": 597.48, "text": "training"}, {"end": 599.08, "start": 598.08, "text": "tokens."}, {"end": 599.2, "start": 599.08, "text": "But"}, {"end": 599.4, "start": 599.2, "text": "if"}, {"end": 599.56, "start": 599.4, "text": "we"}, {"end": 599.96, "start": 599.56, "text": "go"}], "text": " although it's highly specialized to perform here maybe even a more complex reasoning. Let me give you here some simple insight here from this publication. So this is the publication and now we jump into the publication, the data set. You know, for a normal LLM, for a multi 700B model, this would be a sentence that the model is able because it has been trained, I don't know on how many trillion training tokens. But if we go"}, {"chunks": [{"end": 600.32, "start": 600.0, "text": "down"}, {"end": 600.44, "start": 600.32, "text": "to"}, {"end": 600.56, "start": 600.44, "text": "the"}, {"end": 601.36, "start": 600.56, "text": "tiny"}, {"end": 602.24, "start": 601.36, "text": "LLMs,"}, {"end": 602.56, "start": 602.24, "text": "we"}, {"end": 602.84, "start": 602.56, "text": "have"}, {"end": 603.04, "start": 602.84, "text": "now"}, {"end": 603.16, "start": 603.04, "text": "to"}, {"end": 603.36, "start": 603.16, "text": "make"}, {"end": 603.88, "start": 603.36, "text": "things"}, {"end": 604.04, "start": 603.88, "text": "even"}, {"end": 604.2, "start": 604.04, "text": "from"}, {"end": 604.36, "start": 604.2, "text": "the"}, {"end": 605.0, "start": 604.36, "text": "semantic"}, {"end": 605.68, "start": 605.0, "text": "content"}, {"end": 606.48, "start": 605.68, "text": "simpler."}, {"end": 606.64, "start": 606.48, "text": "So"}, {"end": 607.04, "start": 606.64, "text": "this"}, {"end": 607.52, "start": 607.04, "text": "is"}, {"end": 607.96, "start": 607.52, "text": 
"now"}, {"end": 608.2, "start": 607.96, "text": "where"}, {"end": 608.52, "start": 608.2, "text": "our"}, {"end": 608.68, "start": 608.52, "text": "new"}, {"end": 609.0, "start": 608.68, "text": "training"}, {"end": 609.44, "start": 609.0, "text": "process,"}, {"end": 609.44, "start": 609.44, "text": "this"}, {"end": 609.52, "start": 609.44, "text": "is"}, {"end": 609.84, "start": 609.52, "text": "the"}, {"end": 610.76, "start": 609.84, "text": "result."}, {"end": 610.92, "start": 610.76, "text": "So"}, {"end": 611.04, "start": 610.92, "text": "we"}, {"end": 612.16, "start": 611.04, "text": "simplify"}, {"end": 612.64, "start": 612.16, "text": "this"}, {"end": 612.96, "start": 612.64, "text": "sentence"}, {"end": 613.24, "start": 612.96, "text": "to"}, {"end": 613.64, "start": 613.24, "text": "this"}, {"end": 613.96, "start": 613.64, "text": "sentence."}, {"end": 614.36, "start": 613.96, "text": "Let"}, {"end": 614.84, "start": 614.36, "text": "me"}, {"end": 615.08, "start": 614.84, "text": "give"}, {"end": 615.36, "start": 615.08, "text": "you"}, {"end": 615.52, "start": 615.36, "text": "another"}, {"end": 616.24, "start": 615.52, "text": "example."}, {"end": 617.08, "start": 616.24, "text": "Entrepreneurship,"}, {"end": 617.36, "start": 617.08, "text": "business"}, {"end": 618.4, "start": 617.36, "text": "management."}, {"end": 618.92, "start": 618.4, "text": "This"}, {"end": 619.24, "start": 618.92, "text": "is"}, {"end": 619.68, "start": 619.24, "text": "here"}, {"end": 620.16, "start": 619.68, "text": "a"}, {"end": 620.56, "start": 620.16, "text": "normal,"}, {"end": 621.48, "start": 620.56, "text": "complex"}, {"end": 621.88, "start": 621.48, "text": "sentence,"}, {"end": 622.04, "start": 621.88, "text": "but"}, {"end": 622.2, "start": 622.04, "text": "we"}, {"end": 622.36, "start": 622.2, "text": "have"}, {"end": 622.48, "start": 622.36, "text": "to"}, {"end": 623.24, "start": 622.48, "text": "simplify"}, {"end": 623.32, "start": 623.24, "text": "it"}, 
{"end": 623.52, "start": 623.32, "text": "for"}, {"end": 623.72, "start": 623.52, "text": "our"}, {"end": 624.16, "start": 623.72, "text": "tiny"}, {"end": 624.8, "start": 624.16, "text": "LLM."}, {"end": 624.88, "start": 624.8, "text": "And"}, {"end": 625.72, "start": 624.88, "text": "this"}, {"end": 626.0, "start": 625.72, "text": "is"}, {"end": 626.24, "start": 626.0, "text": "now"}, {"end": 626.32, "start": 626.24, "text": "the"}, {"end": 626.96, "start": 626.32, "text": "simplified"}, {"end": 627.44, "start": 626.96, "text": "case"}, {"end": 627.72, "start": 627.44, "text": "that"}, {"end": 627.8, "start": 627.72, "text": "a"}, {"end": 628.08, "start": 627.8, "text": "tiny"}, {"end": 628.24, "start": 628.08, "text": "LLM"}, {"end": 628.52, "start": 628.24, "text": "can"}, {"end": 629.04, "start": 628.52, "text": "beautifully"}, {"end": 629.4, "start": 629.04, "text": "run"}, {"end": 629.48, "start": 629.4, "text": "and"}, {"end": 629.96, "start": 629.48, "text": "understand."}], "text": " down to the tiny LLMs, we have now to make things even from the semantic content simpler. So this is now where our new training process, this is the result. So we simplify this sentence to this sentence. Let me give you another example. Entrepreneurship, business management. This is here a normal, complex sentence, but we have to simplify it for our tiny LLM. 
And this is now the simplified case that a tiny LLM can beautifully run and understand."}, {"chunks": [{"end": 630.92, "start": 630.0, "text": "So"}, {"end": 631.36, "start": 630.92, "text": "what"}, {"end": 631.56, "start": 631.36, "text": "we"}, {"end": 631.56, "start": 631.56, "text": "did,"}, {"end": 631.8, "start": 631.56, "text": "we"}, {"end": 632.28, "start": 631.8, "text": "reduced"}, {"end": 632.48, "start": 632.28, "text": "the"}, {"end": 633.36, "start": 632.48, "text": "semantic"}, {"end": 634.16, "start": 633.36, "text": "complexity"}, {"end": 634.36, "start": 634.16, "text": "of"}, {"end": 634.52, "start": 634.36, "text": "our"}, {"end": 635.72, "start": 634.52, "text": "sentence."}, {"end": 636.04, "start": 635.72, "text": "We"}, {"end": 636.68, "start": 636.04, "text": "reduced"}, {"end": 636.92, "start": 636.68, "text": "here"}, {"end": 637.32, "start": 636.92, "text": "the"}, {"end": 637.84, "start": 637.32, "text": "amount"}, {"end": 638.32, "start": 637.84, "text": "of"}, {"end": 638.96, "start": 638.32, "text": "vocabulary"}, {"end": 639.24, "start": 638.96, "text": "that"}, {"end": 639.36, "start": 639.24, "text": "we"}, {"end": 639.88, "start": 639.36, "text": "use"}, {"end": 640.2, "start": 639.88, "text": "down"}, {"end": 640.44, "start": 640.2, "text": "to"}, {"end": 641.2, "start": 640.44, "text": "2000"}, {"end": 642.88, "start": 641.2, "text": "tokens."}, {"end": 643.36, "start": 642.88, "text": "And"}, {"end": 643.8, "start": 643.36, "text": "we"}, {"end": 644.0, "start": 643.8, "text": "want"}, {"end": 644.16, "start": 644.0, "text": "to"}, {"end": 644.56, "start": 644.16, "text": "find"}, {"end": 645.0, "start": 644.56, "text": "out"}, {"end": 645.2, "start": 645.0, "text": "with"}, {"end": 645.56, "start": 645.2, "text": "this"}, {"end": 646.12, "start": 645.56, "text": "minimum"}, {"end": 647.24, "start": 646.12, "text": "configuration,"}, {"end": 647.8, "start": 647.24, "text": "better"}, {"end": 648.12, "start": 647.8, "text": 
"new"}, {"end": 648.8, "start": 648.12, "text": "training"}, {"end": 649.68, "start": 648.8, "text": "procedures."}, {"end": 650.08, "start": 649.68, "text": "And"}, {"end": 650.4, "start": 650.08, "text": "where"}, {"end": 651.0, "start": 650.4, "text": "are"}, {"end": 651.32, "start": 651.0, "text": "some"}, {"end": 652.12, "start": 651.32, "text": "thresholds"}, {"end": 652.56, "start": 652.12, "text": "and"}, {"end": 652.56, "start": 652.56, "text": "what"}, {"end": 652.56, "start": 652.56, "text": "we"}, {"end": 652.56, "start": 652.56, "text": "have"}, {"end": 652.68, "start": 652.56, "text": "to"}, {"end": 652.92, "start": 652.68, "text": "take"}, {"end": 653.44, "start": 652.92, "text": "care"}, {"end": 653.56, "start": 653.44, "text": "of."}, {"end": 654.12, "start": 653.56, "text": "And"}, {"end": 654.48, "start": 654.12, "text": "this"}, {"end": 654.52, "start": 654.48, "text": "is"}, {"end": 654.68, "start": 654.52, "text": "now"}, {"end": 654.8, "start": 654.68, "text": "able,"}, {"end": 655.04, "start": 654.8, "text": "we"}, {"end": 655.32, "start": 655.04, "text": "can"}, {"end": 655.68, "start": 655.32, "text": "do"}, {"end": 655.76, "start": 655.68, "text": "it."}, {"end": 655.76, "start": 655.76, "text": "We"}, {"end": 656.04, "start": 655.76, "text": "can"}, {"end": 656.32, "start": 656.04, "text": "run"}, {"end": 656.48, "start": 656.32, "text": "our"}, {"end": 657.28, "start": 656.48, "text": "experiments"}, {"end": 658.04, "start": 657.28, "text": "because"}, {"end": 658.08, "start": 658.04, "text": "it"}, {"end": 658.36, "start": 658.08, "text": "is"}, {"end": 658.68, "start": 658.36, "text": "not"}, {"end": 658.72, "start": 658.68, "text": "a"}, {"end": 659.96, "start": 658.72, "text": "685p."}], "text": " So what we did, we reduced the semantic complexity of our sentence. We reduced here the amount of vocabulary that we use down to 2000 tokens. And we want to find out with this minimum configuration, better new training procedures. 
And where are some thresholds and what we have to take care of. And this is now able, we can do it. We can run our experiments because it is not a 685p."}, {"chunks": [{"end": 660.36, "start": 660.0, "text": "billion"}, {"end": 660.52, "start": 660.36, "text": "free"}, {"end": 660.96, "start": 660.52, "text": "trainable"}, {"end": 661.56, "start": 660.96, "text": "parameter"}, {"end": 662.28, "start": 661.56, "text": "model"}, {"end": 662.96, "start": 662.28, "text": "somewhere"}, {"end": 663.64, "start": 662.96, "text": "positioned"}, {"end": 663.8, "start": 663.64, "text": "in"}, {"end": 665.2, "start": 663.8, "text": "China."}, {"end": 667.44, "start": 665.2, "text": "I"}, {"end": 667.84, "start": 667.44, "text": "was"}, {"end": 668.0, "start": 667.84, "text": "talking"}, {"end": 668.36, "start": 668.0, "text": "here"}, {"end": 668.76, "start": 668.36, "text": "about"}, {"end": 669.24, "start": 668.76, "text": "trying"}, {"end": 669.44, "start": 669.24, "text": "to"}, {"end": 670.04, "start": 669.44, "text": "find"}, {"end": 670.4, "start": 670.04, "text": "those"}, {"end": 671.56, "start": 670.4, "text": "thresholds."}, {"end": 671.76, "start": 671.56, "text": "And"}, {"end": 671.96, "start": 671.76, "text": "yeah,"}, {"end": 672.0, "start": 671.96, "text": "we"}, {"end": 672.2, "start": 672.0, "text": "can"}, {"end": 672.36, "start": 672.2, "text": "work"}, {"end": 672.8, "start": 672.36, "text": "with"}, {"end": 673.4, "start": 672.8, "text": "information"}, {"end": 673.68, "start": 673.4, "text": "entropy"}, {"end": 673.92, "start": 673.68, "text": "and"}, {"end": 674.2, "start": 673.92, "text": "it"}, {"end": 674.6, "start": 674.2, "text": "is"}, {"end": 674.72, "start": 674.6, "text": "a"}, {"end": 674.92, "start": 674.72, "text": "beautiful"}, {"end": 675.32, "start": 674.92, "text": "idea,"}, {"end": 675.84, "start": 675.32, "text": "but"}, {"end": 676.04, "start": 675.84, "text": "I"}, {"end": 676.2, "start": 676.04, "text": "just"}, {"end": 676.52, 
"start": 676.2, "text": "want"}, {"end": 676.64, "start": 676.52, "text": "to"}, {"end": 676.88, "start": 676.64, "text": "focus"}, {"end": 677.12, "start": 676.88, "text": "here"}, {"end": 677.28, "start": 677.12, "text": "on"}, {"end": 677.68, "start": 677.28, "text": "the"}, {"end": 677.88, "start": 677.68, "text": "grand"}, {"end": 677.96, "start": 677.88, "text": "scheme"}, {"end": 677.96, "start": 677.96, "text": "what"}, {"end": 678.2, "start": 677.96, "text": "we"}, {"end": 678.56, "start": 678.2, "text": "are"}, {"end": 679.32, "start": 678.56, "text": "talking"}, {"end": 680.04, "start": 679.32, "text": "about."}, {"end": 680.16, "start": 680.04, "text": "I"}, {"end": 680.2, "start": 680.16, "text": "would"}, {"end": 680.24, "start": 680.2, "text": "like"}, {"end": 681.2, "start": 680.24, "text": "to"}, {"end": 681.72, "start": 681.2, "text": "draw"}, {"end": 682.24, "start": 681.72, "text": "your"}, {"end": 682.6, "start": 682.24, "text": "attention"}, {"end": 682.76, "start": 682.6, "text": "to"}, {"end": 683.0, "start": 682.76, "text": "this"}, {"end": 683.76, "start": 683.0, "text": "publication"}, {"end": 684.2, "start": 683.76, "text": "here,"}, {"end": 684.56, "start": 684.2, "text": "a"}, {"end": 685.2, "start": 684.56, "text": "survey"}, {"end": 685.36, "start": 685.2, "text": "on"}, {"end": 686.0, "start": 685.36, "text": "self"}, {"end": 686.72, "start": 686.0, "text": "evolution"}, {"end": 686.88, "start": 686.72, "text": "of"}, {"end": 687.2, "start": 686.88, "text": "large"}, {"end": 687.64, "start": 687.2, "text": "language"}, {"end": 687.92, "start": 687.64, "text": "model."}, {"end": 689.6, "start": 687.92, "text": "Highly"}, {"end": 689.96, "start": 689.6, "text": "interesting"}], "text": " billion free trainable parameter model somewhere positioned in China. I was talking here about trying to find those thresholds. 
And yeah, we can work with information entropy and it is a beautiful idea, but I just want to focus here on the grand scheme what we are talking about. I would like to draw your attention to this publication here, a survey on self evolution of large language model. Highly interesting"}, {"chunks": [{"end": 690.08, "start": 690.0, "text": "You"}, {"end": 690.24, "start": 690.08, "text": "have"}, {"end": 690.64, "start": 690.24, "text": "the"}, {"end": 691.04, "start": 690.64, "text": "GitHub"}, {"end": 691.56, "start": 691.04, "text": "and"}, {"end": 691.76, "start": 691.56, "text": "everything"}, {"end": 692.12, "start": 691.76, "text": "available"}, {"end": 692.24, "start": 692.12, "text": "here."}, {"end": 692.52, "start": 692.24, "text": "Yeah,"}, {"end": 692.76, "start": 692.52, "text": "I"}, {"end": 692.96, "start": 692.76, "text": "was"}, {"end": 693.16, "start": 692.96, "text": "asked,"}, {"end": 693.28, "start": 693.16, "text": "do"}, {"end": 693.48, "start": 693.28, "text": "I"}, {"end": 693.76, "start": 693.48, "text": "have"}, {"end": 694.08, "start": 693.76, "text": "to"}, {"end": 694.4, "start": 694.08, "text": "really"}, {"end": 694.84, "start": 694.4, "text": "read"}, {"end": 695.04, "start": 694.84, "text": "all"}, {"end": 695.28, "start": 695.04, "text": "those"}, {"end": 695.68, "start": 695.28, "text": "papers?"}, {"end": 696.2, "start": 695.68, "text": "No."}, {"end": 696.6, "start": 696.2, "text": "If"}, {"end": 697.0, "start": 696.6, "text": "you"}, {"end": 697.32, "start": 697.0, "text": "are"}, {"end": 698.24, "start": 697.32, "text": "interested,"}, {"end": 698.4, "start": 698.24, "text": "if"}, {"end": 698.48, "start": 698.4, "text": "you"}, {"end": 698.68, "start": 698.48, "text": "want"}, {"end": 699.4, "start": 698.68, "text": "further"}, {"end": 700.76, "start": 699.4, "text": "details,"}, {"end": 701.24, "start": 700.76, "text": "then"}, {"end": 701.32, "start": 701.24, "text": "I"}, {"end": 701.52, "start": 701.32, "text": "give"}, 
{"end": 701.68, "start": 701.52, "text": "you"}, {"end": 701.88, "start": 701.68, "text": "this"}, {"end": 702.24, "start": 701.88, "text": "paper"}, {"end": 702.6, "start": 702.24, "text": "because"}, {"end": 703.24, "start": 702.6, "text": "I've"}, {"end": 703.48, "start": 703.24, "text": "read"}, {"end": 703.76, "start": 703.48, "text": "them."}, {"end": 704.04, "start": 703.76, "text": "I"}, {"end": 704.44, "start": 704.04, "text": "found"}, {"end": 704.56, "start": 704.44, "text": "it"}, {"end": 705.12, "start": 704.56, "text": "interesting."}, {"end": 705.36, "start": 705.12, "text": "I"}, {"end": 705.52, "start": 705.36, "text": "found"}, {"end": 705.72, "start": 705.52, "text": "them"}, {"end": 706.12, "start": 705.72, "text": "helpful"}, {"end": 706.52, "start": 706.12, "text": "for"}, {"end": 706.76, "start": 706.52, "text": "me."}, {"end": 707.2, "start": 706.76, "text": "But"}, {"end": 707.48, "start": 707.2, "text": "it"}, {"end": 708.04, "start": 707.48, "text": "is"}, {"end": 708.28, "start": 708.04, "text": "not"}, {"end": 708.32, "start": 708.28, "text": "that"}, {"end": 708.68, "start": 708.32, "text": "you"}, {"end": 708.96, "start": 708.68, "text": "have"}, {"end": 709.12, "start": 708.96, "text": "to"}, {"end": 709.28, "start": 709.12, "text": "read"}, {"end": 709.56, "start": 709.28, "text": "those"}, {"end": 710.36, "start": 709.56, "text": "papers."}, {"end": 710.52, "start": 710.36, "text": "I"}, {"end": 710.84, "start": 710.52, "text": "try"}, {"end": 710.84, "start": 710.84, "text": "to"}, {"end": 711.04, "start": 710.84, "text": "explain"}, {"end": 711.24, "start": 711.04, "text": "it"}, {"end": 711.6, "start": 711.24, "text": "in"}, {"end": 711.84, "start": 711.6, "text": "a"}, {"end": 712.36, "start": 711.84, "text": "way"}, {"end": 712.76, "start": 712.36, "text": "that"}, {"end": 713.68, "start": 712.76, "text": "you"}, {"end": 714.24, "start": 713.68, "text": "should"}, {"end": 714.4, "start": 714.24, "text": "be"}, {"end": 714.76, 
"start": 714.4, "text": "able"}, {"end": 714.96, "start": 714.76, "text": "to"}, {"end": 715.4, "start": 714.96, "text": "follow"}, {"end": 715.6, "start": 715.4, "text": "my"}, {"end": 716.2, "start": 715.6, "text": "crazy"}, {"end": 716.72, "start": 716.2, "text": "ideas."}, {"end": 716.96, "start": 716.72, "text": "Yes,"}, {"end": 717.16, "start": 716.96, "text": "I"}, {"end": 717.4, "start": 717.16, "text": "know."}, {"end": 717.52, "start": 717.4, "text": "Not"}, {"end": 717.68, "start": 717.52, "text": "a"}, {"end": 718.12, "start": 717.68, "text": "lot"}, {"end": 718.48, "start": 718.12, "text": "of"}, {"end": 718.8, "start": 718.48, "text": "other"}, {"end": 719.4, "start": 718.8, "text": "YouTubers"}, {"end": 719.96, "start": 719.4, "text": "do."}], "text": " You have the GitHub and everything available here. Yeah, I was asked, do I have to really read all those papers? No. If you are interested, if you want further details, then I give you this paper because I've read them. I found it interesting. I found them helpful for me. But it is not that you have to read those papers. I try to explain it in a way that you should be able to follow my crazy ideas. Yes, I know. 
Not a lot of other YouTubers do."}, {"chunks": [{"end": 720.72, "start": 720.0, "text": "follow"}, {"end": 721.24, "start": 720.72, "text": "those"}, {"end": 721.64, "start": 721.24, "text": "rather"}, {"end": 722.56, "start": 721.64, "text": "abstractions"}, {"end": 722.56, "start": 722.56, "text": "and"}, {"end": 723.08, "start": 722.56, "text": "present"}, {"end": 723.24, "start": 723.08, "text": "you"}, {"end": 723.36, "start": 723.24, "text": "only"}, {"end": 723.68, "start": 723.36, "text": "single"}, {"end": 724.48, "start": 723.68, "text": "papers,"}, {"end": 724.68, "start": 724.48, "text": "but"}, {"end": 725.0, "start": 724.68, "text": "I"}, {"end": 725.32, "start": 725.0, "text": "like"}, {"end": 725.44, "start": 725.32, "text": "to"}, {"end": 725.72, "start": 725.44, "text": "go"}, {"end": 726.48, "start": 725.72, "text": "cross-discipline."}, {"end": 727.28, "start": 726.48, "text": "And"}, {"end": 727.6, "start": 727.28, "text": "let's"}, {"end": 727.8, "start": 727.6, "text": "explore"}, {"end": 728.72, "start": 727.8, "text": "this."}, {"end": 728.88, "start": 728.72, "text": "So"}, {"end": 729.0, "start": 728.88, "text": "what"}, {"end": 729.32, "start": 729.0, "text": "they"}, {"end": 729.36, "start": 729.32, "text": "do"}, {"end": 729.8, "start": 729.36, "text": "is"}, {"end": 730.0, "start": 729.8, "text": "here"}, {"end": 730.08, "start": 730.0, "text": "they"}, {"end": 730.28, "start": 730.08, "text": "say,"}, {"end": 730.44, "start": 730.28, "text": "hey,"}, {"end": 730.52, "start": 730.44, "text": "the"}, {"end": 731.0, "start": 730.52, "text": "classical"}, {"end": 731.48, "start": 731.0, "text": "framework"}, {"end": 731.64, "start": 731.48, "text": "for"}, {"end": 731.92, "start": 731.64, "text": "developing"}, {"end": 732.4, "start": 731.92, "text": "your"}, {"end": 733.44, "start": 732.4, "text": "self-evolving"}, {"end": 733.88, "start": 733.44, "text": "agents"}, {"end": 734.0, "start": 733.88, "text": "in"}, {"end": 734.28, "start": 
734.0, "text": "AI."}, {"end": 734.68, "start": 734.28, "text": "Typically"}, {"end": 735.24, "start": 734.68, "text": "you"}, {"end": 735.56, "start": 735.24, "text": "have"}, {"end": 735.72, "start": 735.56, "text": "your"}, {"end": 736.12, "start": 735.72, "text": "four"}, {"end": 736.72, "start": 736.12, "text": "stages,"}, {"end": 736.96, "start": 736.72, "text": "the"}, {"end": 737.76, "start": 736.96, "text": "experience"}, {"end": 738.04, "start": 737.76, "text": "acquisition,"}, {"end": 738.24, "start": 738.04, "text": "the"}, {"end": 738.84, "start": 738.24, "text": "experience"}, {"end": 739.4, "start": 738.84, "text": "refinement,"}, {"end": 739.68, "start": 739.4, "text": "the"}, {"end": 740.12, "start": 739.68, "text": "updating,"}, {"end": 740.32, "start": 740.12, "text": "and"}, {"end": 740.64, "start": 740.32, "text": "the"}, {"end": 743.28, "start": 740.64, "text": "evaluation."}, {"end": 743.4, "start": 743.28, "text": "Now"}, {"end": 743.56, "start": 743.4, "text": "if"}, {"end": 743.76, "start": 743.56, "text": "we"}, {"end": 743.84, "start": 743.76, "text": "look"}, {"end": 744.12, "start": 743.84, "text": "here"}, {"end": 744.24, "start": 744.12, "text": "at"}, {"end": 744.84, "start": 744.24, "text": "our"}, {"end": 745.36, "start": 744.84, "text": "actual"}, {"end": 745.72, "start": 745.36, "text": "model"}, {"end": 745.92, "start": 745.72, "text": "here,"}, {"end": 746.28, "start": 745.92, "text": "they"}, {"end": 746.36, "start": 746.28, "text": "do"}, {"end": 746.4, "start": 746.36, "text": "it"}, {"end": 746.76, "start": 746.4, "text": "a"}, {"end": 747.2, "start": 746.76, "text": "little"}, {"end": 747.36, "start": 747.2, "text": "bit"}, {"end": 747.8, "start": 747.36, "text": "different,"}, {"end": 748.6, "start": 747.8, "text": "because"}, {"end": 749.0, "start": 748.6, "text": "they"}, {"end": 749.12, "start": 749.0, "text": "say"}, {"end": 749.52, "start": 749.12, "text": "here,"}, {"end": 749.76, "start": 749.52, "text": "as"}, 
{"end": 749.76, "start": 749.76, "text": "a"}, {"end": 749.96, "start": 749.76, "text": "common"}], "text": " follow those rather abstractions and present you only single papers, but I like to go cross-discipline. And let's explore this. So what they do is here they say, hey, the classical framework for developing your self-evolving agents in AI. Typically you have your four stages, the experience acquisition, the experience refinement, the updating, and the evaluation. Now if we look here at our actual model here, they do it a little bit different, because they say here, as a common"}, {"chunks": [{"end": 750.12, "start": 750.0, "text": "To"}, {"end": 750.4, "start": 750.12, "text": "complement"}, {"end": 750.56, "start": 750.4, "text": "to"}, {"end": 750.84, "start": 750.56, "text": "what"}, {"end": 750.92, "start": 750.84, "text": "is"}, {"end": 751.04, "start": 750.92, "text": "the"}, {"end": 751.84, "start": 751.04, "text": "classical"}, {"end": 752.48, "start": 751.84, "text": "procedure,"}, {"end": 752.64, "start": 752.48, "text": "we"}, {"end": 753.08, "start": 752.64, "text": "propose"}, {"end": 753.36, "start": 753.08, "text": "here"}, {"end": 753.68, "start": 753.36, "text": "the"}, {"end": 754.16, "start": 753.68, "text": "self"}, {"end": 754.72, "start": 754.16, "text": "evolving"}, {"end": 755.16, "start": 754.72, "text": "agents"}, {"end": 755.48, "start": 755.16, "text": "from"}, {"end": 755.72, "start": 755.48, "text": "our"}, {"end": 756.36, "start": 755.72, "text": "tiny"}, {"end": 756.72, "start": 756.36, "text": "LLMs"}, {"end": 757.08, "start": 756.72, "text": "that"}, {"end": 757.32, "start": 757.08, "text": "are"}, {"end": 757.92, "start": 757.32, "text": "actively"}, {"end": 758.24, "start": 757.92, "text": "seeking"}, {"end": 758.76, "start": 758.24, "text": "new"}, {"end": 759.36, "start": 758.76, "text": "knowledge"}, {"end": 759.76, "start": 759.36, "text": "for"}, {"end": 759.88, "start": 759.76, "text": "the"}, {"end": 760.64, 
"start": 759.88, "text": "continual"}, {"end": 761.8, "start": 760.64, "text": "improvements."}, {"end": 762.56, "start": 761.8, "text": "Now"}, {"end": 762.76, "start": 762.56, "text": "this"}, {"end": 763.04, "start": 762.76, "text": "is"}, {"end": 763.44, "start": 763.04, "text": "interesting"}, {"end": 763.72, "start": 763.44, "text": "because"}, {"end": 763.84, "start": 763.72, "text": "if"}, {"end": 763.84, "start": 763.84, "text": "those"}, {"end": 764.48, "start": 763.84, "text": "tiny"}, {"end": 764.72, "start": 764.48, "text": "LLMs"}, {"end": 765.16, "start": 764.72, "text": "go"}, {"end": 765.56, "start": 765.16, "text": "out"}, {"end": 766.2, "start": 765.56, "text": "on"}, {"end": 766.68, "start": 766.2, "text": "the"}, {"end": 767.24, "start": 766.68, "text": "internet,"}, {"end": 767.44, "start": 767.24, "text": "they"}, {"end": 767.72, "start": 767.44, "text": "have"}, {"end": 767.96, "start": 767.72, "text": "now"}, {"end": 768.44, "start": 767.96, "text": "to"}, {"end": 768.52, "start": 768.44, "text": "convert"}, {"end": 768.72, "start": 768.52, "text": "a"}, {"end": 769.24, "start": 768.72, "text": "normal"}, {"end": 769.8, "start": 769.24, "text": "complexity"}, {"end": 770.52, "start": 769.8, "text": "semantic"}, {"end": 771.12, "start": 770.52, "text": "content"}, {"end": 772.24, "start": 771.12, "text": "down"}, {"end": 772.48, "start": 772.24, "text": "to"}, {"end": 773.08, "start": 772.48, "text": "their"}, {"end": 774.24, "start": 773.08, "text": "understanding."}, {"end": 774.52, "start": 774.24, "text": "And"}, {"end": 774.96, "start": 774.52, "text": "this"}, {"end": 775.32, "start": 774.96, "text": "is"}, {"end": 775.88, "start": 775.32, "text": "already"}, {"end": 776.0, "start": 775.88, "text": "an"}, {"end": 776.48, "start": 776.0, "text": "adventure"}, {"end": 776.56, "start": 776.48, "text": "in"}, {"end": 777.08, "start": 776.56, "text": "itself"}, {"end": 777.16, "start": 777.08, "text": "and"}, {"end": 777.48, "start": 
777.16, "text": "we"}, {"end": 777.6, "start": 777.48, "text": "are"}, {"end": 777.6, "start": 777.6, "text": "not"}, {"end": 778.12, "start": 777.6, "text": "talking"}, {"end": 778.6, "start": 778.12, "text": "about"}, {"end": 779.28, "start": 778.6, "text": "optimization"}, {"end": 779.36, "start": 779.28, "text": "of"}, {"end": 779.68, "start": 779.36, "text": "training"}, {"end": 779.96, "start": 779.68, "text": "procedures."}], "text": " To complement to what is the classical procedure, we propose here the self evolving agents from our tiny LLMs that are actively seeking new knowledge for the continual improvements. Now this is interesting because if those tiny LLMs go out on the internet, they have now to convert a normal complexity semantic content down to their understanding. And this is already an adventure in itself and we are not talking about optimization of training procedures."}, {"chunks": [{"end": 780.8, "start": 780.0, "text": "especially"}, {"end": 780.92, "start": 780.8, "text": "in"}, {"end": 781.16, "start": 780.92, "text": "the"}, {"end": 781.56, "start": 781.16, "text": "pre-training"}, {"end": 783.56, "start": 781.56, "text": "stage."}, {"end": 783.8, "start": 783.56, "text": "If"}, {"end": 784.16, "start": 783.8, "text": "you"}, {"end": 784.32, "start": 784.16, "text": "think"}, {"end": 784.64, "start": 784.32, "text": "this"}, {"end": 784.92, "start": 784.64, "text": "is"}, {"end": 785.32, "start": 784.92, "text": "new,"}, {"end": 785.68, "start": 785.32, "text": "no."}, {"end": 785.92, "start": 785.68, "text": "I"}, {"end": 786.28, "start": 785.92, "text": "learned"}, {"end": 786.52, "start": 786.28, "text": "and"}, {"end": 786.8, "start": 786.52, "text": "I"}, {"end": 787.2, "start": 786.8, "text": "was"}, {"end": 787.56, "start": 787.2, "text": "amazed"}, {"end": 788.04, "start": 787.56, "text": "that,"}, {"end": 788.72, "start": 788.04, "text": "look,"}, {"end": 789.12, "start": 788.72, "text": "there"}, {"end": 789.76, "start": 
789.12, "text": "are"}, {"end": 790.12, "start": 789.76, "text": "other"}, {"end": 790.6, "start": 790.12, "text": "LMMs"}, {"end": 790.76, "start": 790.6, "text": "or"}, {"end": 790.96, "start": 790.76, "text": "other"}, {"end": 791.48, "start": 790.96, "text": "stories,"}, {"end": 791.84, "start": 791.48, "text": "tiny"}, {"end": 792.2, "start": 791.84, "text": "stories,"}, {"end": 792.56, "start": 792.2, "text": "tiny"}, {"end": 793.0, "start": 792.56, "text": "dialogue,"}, {"end": 793.2, "start": 793.0, "text": "baby"}, {"end": 793.6, "start": 793.2, "text": "LIM"}, {"end": 794.28, "start": 793.6, "text": "and"}, {"end": 794.4, "start": 794.28, "text": "mini"}, {"end": 794.64, "start": 794.4, "text": "GPT"}, {"end": 795.4, "start": 794.64, "text": "2023,"}, {"end": 796.56, "start": 795.4, "text": "2024."}, {"end": 796.6, "start": 796.56, "text": "You"}, {"end": 797.44, "start": 796.6, "text": "see,"}, {"end": 798.08, "start": 797.44, "text": "minimal"}, {"end": 798.64, "start": 798.08, "text": "data"}, {"end": 798.84, "start": 798.64, "text": "sets"}, {"end": 799.0, "start": 798.84, "text": "and"}, {"end": 799.2, "start": 799.0, "text": "a"}, {"end": 799.48, "start": 799.2, "text": "model"}, {"end": 800.28, "start": 799.48, "text": "size"}, {"end": 800.8, "start": 800.28, "text": "from"}, {"end": 801.44, "start": 800.8, "text": "1"}, {"end": 801.8, "start": 801.44, "text": "million"}, {"end": 802.4, "start": 801.8, "text": "to,"}, {"end": 802.6, "start": 802.4, "text": "I"}, {"end": 802.76, "start": 802.6, "text": "don't"}, {"end": 802.88, "start": 802.76, "text": "know,"}, {"end": 803.96, "start": 802.88, "text": "165"}, {"end": 804.68, "start": 803.96, "text": "million."}, {"end": 804.84, "start": 804.68, "text": "But"}, {"end": 805.4, "start": 804.84, "text": "today,"}, {"end": 805.64, "start": 805.4, "text": "we"}, {"end": 805.72, "start": 805.64, "text": "are"}, {"end": 806.84, "start": 805.72, "text": "focusing"}, {"end": 806.96, "start": 806.84, "text": 
"here"}, {"end": 807.12, "start": 806.96, "text": "on"}, {"end": 807.56, "start": 807.12, "text": "this"}, {"end": 807.76, "start": 807.56, "text": "model,"}, {"end": 807.84, "start": 807.76, "text": "on"}, {"end": 808.0, "start": 807.84, "text": "this"}, {"end": 808.2, "start": 808.0, "text": "paper"}, {"end": 808.24, "start": 808.2, "text": "that"}, {"end": 808.56, "start": 808.24, "text": "I"}, {"end": 809.04, "start": 808.56, "text": "showed"}, {"end": 809.32, "start": 809.04, "text": "you,"}, {"end": 809.64, "start": 809.32, "text": "because"}, {"end": 809.76, "start": 809.64, "text": "they"}, {"end": 809.96, "start": 809.76, "text": "call"}], "text": " especially in the pre-training stage. If you think this is new, no. I learned and I was amazed that, look, there are other LLMs or other stories, TinyStories, TinyDialogues, BabyLM and mini GPT 2023, 2024. You see, minimal data sets and a model size from 1 million to, I don't know, 165 million. But today, we are focusing here on this model, on this paper that I showed you, because they call"}, {"chunks": [{"end": 810.16, "start": 810.0, "text": "with"}, {"end": 810.72, "start": 810.16, "text": "Tiny"}, {"end": 811.84, "start": 810.72, "text": "Helm"}, {"end": 812.2, "start": 811.84, "text": "because"}, {"end": 812.72, "start": 812.2, "text": "they"}, {"end": 813.16, "start": 812.72, "text": "have"}, {"end": 813.4, "start": 813.16, "text": "now"}, {"end": 813.44, "start": 813.4, "text": "the"}, {"end": 813.56, "start": 813.44, "text": "insight"}, {"end": 813.68, "start": 813.56, "text": "from"}, {"end": 813.88, "start": 813.68, "text": "all"}, {"end": 814.04, "start": 813.88, "text": "the"}, {"end": 814.36, "start": 814.04, "text": "heart"}, {"end": 814.84, "start": 814.36, "text": "of"}, {"end": 815.0, "start": 814.84, "text": "mouth"}, {"end": 815.32, "start": 815.0, "text": "and"}, {"end": 815.56, "start": 815.32, "text": "they"}, {"end": 815.92, "start": 815.56, "text": "try"}, {"end": 816.24, "start": 
815.92, "text": "to"}, {"end": 816.56, "start": 816.24, "text": "build"}, {"end": 816.84, "start": 816.56, "text": "now"}, {"end": 817.28, "start": 816.84, "text": "the"}, {"end": 817.8, "start": 817.28, "text": "new"}, {"end": 818.92, "start": 817.8, "text": "best-in-class"}, {"end": 819.76, "start": 818.92, "text": "Tiny"}, {"end": 821.4, "start": 819.76, "text": "LLM."}, {"end": 821.52, "start": 821.4, "text": "So"}, {"end": 821.76, "start": 821.52, "text": "we"}, {"end": 821.84, "start": 821.76, "text": "look"}, {"end": 822.08, "start": 821.84, "text": "now"}, {"end": 822.12, "start": 822.08, "text": "at"}, {"end": 822.6, "start": 822.12, "text": "a"}, {"end": 823.24, "start": 822.6, "text": "14"}, {"end": 823.68, "start": 823.24, "text": "million"}, {"end": 823.88, "start": 823.68, "text": "free"}, {"end": 824.36, "start": 823.88, "text": "trainable"}, {"end": 825.36, "start": 824.36, "text": "parameter"}, {"end": 826.0, "start": 825.36, "text": "architecture"}, {"end": 826.16, "start": 826.0, "text": "of"}, {"end": 826.44, "start": 826.16, "text": "our"}, {"end": 826.6, "start": 826.44, "text": "deep"}, {"end": 826.92, "start": 826.6, "text": "neural"}, {"end": 827.16, "start": 826.92, "text": "network"}, {"end": 828.44, "start": 827.16, "text": "and"}, {"end": 828.8, "start": 828.44, "text": "this"}, {"end": 829.2, "start": 828.8, "text": "is"}, {"end": 829.36, "start": 829.2, "text": "now"}, {"end": 830.0, "start": 829.36, "text": "connected"}, {"end": 830.2, "start": 830.0, "text": "if"}, {"end": 830.32, "start": 830.2, "text": "you"}, {"end": 830.48, "start": 830.32, "text": "are"}, {"end": 830.8, "start": 830.48, "text": "looking"}, {"end": 830.88, "start": 830.8, "text": "a"}, {"end": 831.12, "start": 830.88, "text": "little"}, {"end": 831.32, "start": 831.12, "text": "bit"}, {"end": 831.4, "start": 831.32, "text": "into"}, {"end": 831.64, "start": 831.4, "text": "the"}, {"end": 832.44, "start": 831.64, "text": "semantic,"}, {"end": 832.64, "start": 
832.44, "text": "into"}, {"end": 832.88, "start": 832.64, "text": "the"}, {"end": 833.6, "start": 832.88, "text": "linguistic"}, {"end": 834.32, "start": 833.6, "text": "structure,"}, {"end": 834.88, "start": 834.32, "text": "how"}, {"end": 835.04, "start": 834.88, "text": "we"}, {"end": 835.4, "start": 835.04, "text": "talk,"}, {"end": 835.68, "start": 835.4, "text": "how"}, {"end": 836.16, "start": 835.68, "text": "we"}, {"end": 836.36, "start": 836.16, "text": "have"}, {"end": 836.36, "start": 836.36, "text": "a"}, {"end": 837.08, "start": 836.36, "text": "complexity"}, {"end": 837.32, "start": 837.08, "text": "when"}, {"end": 837.6, "start": 837.32, "text": "we"}, {"end": 838.56, "start": 837.6, "text": "express"}, {"end": 839.4, "start": 838.56, "text": "ourselves."}, {"end": 839.72, "start": 839.4, "text": "I"}, {"end": 839.96, "start": 839.72, "text": "found"}], "text": " with TinyHelen because they have now the insight from all the heart of mouth and they try to build now the new best-in-class Tiny LLM. So we look now at a 14 million free trainable parameter architecture of our deep neural network and this is now connected if you are looking a little bit into the semantic, into the linguistic structure, how we talk, how we have a complexity when we express ourselves. 
I found"}, {"chunks": [{"end": 840.2, "start": 840.0, "text": "This"}, {"end": 840.48, "start": 840.2, "text": "study"}, {"end": 840.92, "start": 840.48, "text": "helpful"}, {"end": 841.24, "start": 840.92, "text": "here"}, {"end": 841.48, "start": 841.24, "text": "by"}, {"end": 841.84, "start": 841.48, "text": "Stanford"}, {"end": 842.72, "start": 841.84, "text": "University,"}, {"end": 843.04, "start": 842.72, "text": "a"}, {"end": 843.8, "start": 843.04, "text": "recursive"}, {"end": 844.28, "start": 843.8, "text": "deep"}, {"end": 844.8, "start": 844.28, "text": "model"}, {"end": 844.96, "start": 844.8, "text": "for"}, {"end": 845.8, "start": 844.96, "text": "semantic"}, {"end": 846.84, "start": 845.8, "text": "compositional"}, {"end": 847.68, "start": 846.84, "text": "ability"}, {"end": 848.0, "start": 847.68, "text": "over"}, {"end": 848.28, "start": 848.0, "text": "a"}, {"end": 848.6, "start": 848.28, "text": "semantic"}, {"end": 848.88, "start": 848.6, "text": "tree"}, {"end": 849.16, "start": 848.88, "text": "bank."}, {"end": 849.32, "start": 849.16, "text": "Never"}, {"end": 850.8, "start": 849.32, "text": "mind."}, {"end": 851.6, "start": 850.8, "text": "Stanford"}, {"end": 852.2, "start": 851.6, "text": "introduced"}, {"end": 852.48, "start": 852.2, "text": "here"}, {"end": 852.96, "start": 852.48, "text": "the"}, {"end": 853.36, "start": 852.96, "text": "recursive"}, {"end": 853.56, "start": 853.36, "text": "neural"}, {"end": 854.04, "start": 853.56, "text": "tensor"}, {"end": 854.2, "start": 854.04, "text": "network."}, {"end": 854.52, "start": 854.2, "text": "And"}, {"end": 855.0, "start": 854.52, "text": "I"}, {"end": 855.24, "start": 855.0, "text": "think"}, {"end": 855.52, "start": 855.24, "text": "this"}, {"end": 855.6, "start": 855.52, "text": "gives"}, {"end": 855.64, "start": 855.6, "text": "you"}, {"end": 856.32, "start": 855.64, "text": "here"}, {"end": 856.56, "start": 856.32, "text": "a"}, {"end": 856.56, "start": 856.56, "text": "good"}, 
{"end": 856.92, "start": 856.56, "text": "idea"}, {"end": 857.2, "start": 856.92, "text": "if"}, {"end": 857.56, "start": 857.2, "text": "you"}, {"end": 857.88, "start": 857.56, "text": "want"}, {"end": 858.04, "start": 857.88, "text": "to"}, {"end": 858.4, "start": 858.04, "text": "have"}, {"end": 858.52, "start": 858.4, "text": "in"}, {"end": 859.08, "start": 858.52, "text": "computer"}, {"end": 860.12, "start": 859.08, "text": "science"}, {"end": 860.52, "start": 860.12, "text": "an"}, {"end": 861.4, "start": 860.52, "text": "idea"}, {"end": 861.92, "start": 861.4, "text": "where"}, {"end": 862.24, "start": 861.92, "text": "we"}, {"end": 862.36, "start": 862.24, "text": "are"}, {"end": 862.4, "start": 862.36, "text": "with"}, {"end": 862.96, "start": 862.4, "text": "complexity"}, {"end": 863.8, "start": 862.96, "text": "assessment"}, {"end": 863.92, "start": 863.8, "text": "of"}, {"end": 864.36, "start": 863.92, "text": "human"}, {"end": 864.92, "start": 864.36, "text": "language"}, {"end": 865.28, "start": 864.92, "text": "and"}, {"end": 865.84, "start": 865.28, "text": "how"}, {"end": 865.92, "start": 865.84, "text": "we"}, {"end": 866.16, "start": 865.92, "text": "can"}, {"end": 866.36, "start": 866.16, "text": "maybe"}, {"end": 866.84, "start": 866.36, "text": "reduce"}, {"end": 867.04, "start": 866.84, "text": "the"}, {"end": 867.68, "start": 867.04, "text": "complexity"}, {"end": 868.08, "start": 867.68, "text": "but"}, {"end": 868.36, "start": 868.08, "text": "keep"}, {"end": 868.48, "start": 868.36, "text": "the"}, {"end": 869.08, "start": 868.48, "text": "semantic"}, {"end": 869.96, "start": 869.08, "text": "information."}], "text": " This study helpful here by Stanford University, Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. Never mind. Stanford introduced here the recursive neural tensor network. 
And I think this gives you here a good idea if you want to have in computer science an idea where we are with complexity assessment of human language and how we can maybe reduce the complexity but keep the semantic information."}, {"chunks": [{"end": 872.68, "start": 870.0, "text": "This"}, {"end": 873.04, "start": 872.68, "text": "explores"}, {"end": 873.48, "start": 873.04, "text": "how"}, {"end": 873.76, "start": 873.48, "text": "the"}, {"end": 874.0, "start": 873.76, "text": "data"}, {"end": 875.0, "start": 874.0, "text": "simplification"}, {"end": 875.6, "start": 875.0, "text": "through"}, {"end": 876.08, "start": 875.6, "text": "filtering"}, {"end": 876.4, "start": 876.08, "text": "here"}, {"end": 876.64, "start": 876.4, "text": "web"}, {"end": 877.0, "start": 876.64, "text": "data"}, {"end": 877.76, "start": 877.0, "text": "truncations"}, {"end": 878.04, "start": 877.76, "text": "using"}, {"end": 878.2, "start": 878.04, "text": "here"}, {"end": 878.32, "start": 878.2, "text": "a"}, {"end": 879.0, "start": 878.32, "text": "vocabulary"}, {"end": 879.44, "start": 879.0, "text": "of"}, {"end": 880.04, "start": 879.44, "text": "extreme"}, {"end": 880.88, "start": 880.04, "text": "20,000"}, {"end": 881.52, "start": 880.88, "text": "words"}, {"end": 882.04, "start": 881.52, "text": "can"}, {"end": 882.52, "start": 882.04, "text": "induce"}, {"end": 883.44, "start": 882.52, "text": "here"}, {"end": 884.2, "start": 883.44, "text": "emergent"}, {"end": 884.84, "start": 884.2, "text": "abilities"}, {"end": 885.0, "start": 884.84, "text": "in"}, {"end": 885.48, "start": 885.0, "text": "tiny"}, {"end": 886.32, "start": 885.48, "text": "models."}, {"end": 887.2, "start": 886.32, "text": "This"}, {"end": 887.56, "start": 887.2, "text": "is"}, {"end": 887.56, "start": 887.56, "text": "a"}, {"end": 888.08, "start": 887.56, "text": "question"}, {"end": 888.16, "start": 888.08, "text": "in"}, {"end": 888.76, "start": 888.16, "text": "itself,"}, {"end": 889.12, "start": 888.76, 
"text": "are"}, {"end": 889.6, "start": 889.12, "text": "there"}, {"end": 890.12, "start": 889.6, "text": "emergent"}, {"end": 890.64, "start": 890.12, "text": "abilities,"}, {"end": 891.16, "start": 890.64, "text": "especially"}, {"end": 891.28, "start": 891.16, "text": "in"}, {"end": 891.72, "start": 891.28, "text": "tiny"}, {"end": 892.32, "start": 891.72, "text": "models?"}, {"end": 892.8, "start": 892.32, "text": "I"}, {"end": 892.92, "start": 892.8, "text": "would"}, {"end": 892.96, "start": 892.92, "text": "put"}, {"end": 893.4, "start": 892.96, "text": "here"}, {"end": 893.56, "start": 893.4, "text": "a"}, {"end": 893.6, "start": 893.56, "text": "big"}, {"end": 894.32, "start": 893.6, "text": "question"}, {"end": 894.6, "start": 894.32, "text": "mark,"}, {"end": 895.08, "start": 894.6, "text": "but"}, {"end": 895.32, "start": 895.08, "text": "okay,"}, {"end": 895.4, "start": 895.32, "text": "if"}, {"end": 895.44, "start": 895.4, "text": "you"}, {"end": 895.44, "start": 895.44, "text": "want"}, {"end": 895.44, "start": 895.44, "text": "to"}, {"end": 895.44, "start": 895.44, "text": "go"}, {"end": 896.04, "start": 895.44, "text": "here"}, {"end": 896.48, "start": 896.04, "text": "with"}, {"end": 896.52, "start": 896.48, "text": "the"}, {"end": 897.04, "start": 896.52, "text": "research"}, {"end": 897.32, "start": 897.04, "text": "from"}, {"end": 898.0, "start": 897.32, "text": "Stanford,"}, {"end": 898.52, "start": 898.0, "text": "yeah,"}, {"end": 898.88, "start": 898.52, "text": "great"}, {"end": 899.12, "start": 898.88, "text": "paper"}, {"end": 899.32, "start": 899.12, "text": "to"}, {"end": 899.64, "start": 899.32, "text": "read."}, {"end": 899.96, "start": 899.64, "text": "If"}], "text": " This explores how the data simplification through filtering here web data truncations using here a vocabulary of extreme 20,000 words can induce here emergent abilities in tiny models. 
This is a question in itself, are there emergent abilities, especially in tiny models? I would put here a big question mark, but okay, if you want to go here with the research from Stanford, yeah, great paper to read. If"}, {"chunks": [{"end": 900.2, "start": 900.0, "text": "you"}, {"end": 900.44, "start": 900.2, "text": "are"}, {"end": 900.84, "start": 900.44, "text": "interested"}, {"end": 900.96, "start": 900.84, "text": "in"}, {"end": 902.0, "start": 900.96, "text": "this."}, {"end": 902.48, "start": 902.0, "text": "At"}, {"end": 903.12, "start": 902.48, "text": "the"}, {"end": 903.56, "start": 903.12, "text": "beginning,"}, {"end": 903.68, "start": 903.56, "text": "I"}, {"end": 903.88, "start": 903.68, "text": "told"}, {"end": 904.16, "start": 903.88, "text": "you"}, {"end": 904.44, "start": 904.16, "text": "we"}, {"end": 904.64, "start": 904.44, "text": "are"}, {"end": 904.88, "start": 904.64, "text": "going"}, {"end": 905.2, "start": 904.88, "text": "to"}, {"end": 905.56, "start": 905.2, "text": "look"}, {"end": 905.6, "start": 905.56, "text": "at"}, {"end": 905.96, "start": 905.6, "text": "different"}, {"end": 906.36, "start": 905.96, "text": "architectures."}, {"end": 906.6, "start": 906.36, "text": "Now,"}, {"end": 907.04, "start": 906.6, "text": "you"}, {"end": 907.36, "start": 907.04, "text": "are"}, {"end": 907.48, "start": 907.36, "text": "familiar"}, {"end": 907.52, "start": 907.48, "text": "with"}, {"end": 907.56, "start": 907.52, "text": "the"}, {"end": 908.0, "start": 907.56, "text": "transformer"}, {"end": 908.68, "start": 908.0, "text": "architecture,"}, {"end": 908.84, "start": 908.68, "text": "where"}, {"end": 909.0, "start": 908.84, "text": "we"}, {"end": 909.28, "start": 909.0, "text": "have"}, {"end": 909.52, "start": 909.28, "text": "either"}, {"end": 909.56, "start": 909.52, "text": "the"}, {"end": 909.76, "start": 909.56, "text": "BERT"}, {"end": 910.48, "start": 909.76, "text": "architecture"}, {"end": 910.8, "start": 910.48, "text": 
"or"}, {"end": 911.12, "start": 910.8, "text": "something"}, {"end": 911.28, "start": 911.12, "text": "like"}, {"end": 911.44, "start": 911.28, "text": "a"}, {"end": 911.68, "start": 911.44, "text": "LAMA"}, {"end": 912.2, "start": 911.68, "text": "architecture"}, {"end": 912.44, "start": 912.2, "text": "or"}, {"end": 912.64, "start": 912.44, "text": "to"}, {"end": 913.12, "start": 912.64, "text": "regressive"}, {"end": 913.48, "start": 913.12, "text": "decoder,"}, {"end": 913.56, "start": 913.48, "text": "but"}, {"end": 914.12, "start": 913.56, "text": "there"}, {"end": 914.44, "start": 914.12, "text": "is"}, {"end": 914.84, "start": 914.44, "text": "also"}, {"end": 915.0, "start": 914.84, "text": "here"}, {"end": 915.2, "start": 915.0, "text": "the"}, {"end": 915.48, "start": 915.2, "text": "MAMBA"}, {"end": 916.56, "start": 915.48, "text": "architecture."}, {"end": 916.96, "start": 916.56, "text": "And"}, {"end": 917.44, "start": 916.96, "text": "if"}, {"end": 917.48, "start": 917.44, "text": "you're"}, {"end": 917.56, "start": 917.48, "text": "new"}, {"end": 917.8, "start": 917.56, "text": "to"}, {"end": 917.88, "start": 917.8, "text": "AI,"}, {"end": 918.28, "start": 917.88, "text": "I"}, {"end": 918.4, "start": 918.28, "text": "have"}, {"end": 918.52, "start": 918.4, "text": "here"}, {"end": 918.76, "start": 918.52, "text": "two"}, {"end": 919.2, "start": 918.76, "text": "videos"}, {"end": 919.4, "start": 919.2, "text": "where"}, {"end": 919.56, "start": 919.4, "text": "I"}, {"end": 919.92, "start": 919.56, "text": "explain,"}, {"end": 920.24, "start": 919.92, "text": "hey,"}, {"end": 920.72, "start": 920.24, "text": "MAMBA"}, {"end": 921.28, "start": 920.72, "text": "EIS-6"}, {"end": 921.44, "start": 921.28, "text": "are"}, {"end": 921.48, "start": 921.44, "text": "the"}, {"end": 921.6, "start": 921.48, "text": "better"}, {"end": 921.72, "start": 921.6, "text": "than"}, {"end": 921.96, "start": 921.72, "text": "a"}, {"end": 922.56, "start": 921.96, "text": 
"transformer,"}, {"end": 922.96, "start": 922.56, "text": "and"}, {"end": 923.2, "start": 922.96, "text": "I"}, {"end": 923.52, "start": 923.2, "text": "code"}, {"end": 923.84, "start": 923.52, "text": "here"}, {"end": 924.44, "start": 923.84, "text": "with"}, {"end": 924.8, "start": 924.44, "text": "you,"}, {"end": 925.08, "start": 924.8, "text": "here,"}, {"end": 925.28, "start": 925.08, "text": "a"}, {"end": 925.6, "start": 925.28, "text": "mixture"}, {"end": 925.88, "start": 925.6, "text": "of"}, {"end": 926.12, "start": 925.88, "text": "expert"}, {"end": 926.48, "start": 926.12, "text": "model"}, {"end": 926.64, "start": 926.48, "text": "here"}, {"end": 926.96, "start": 926.64, "text": "with"}, {"end": 927.28, "start": 926.96, "text": "a"}, {"end": 927.96, "start": 927.28, "text": "transformer,"}, {"end": 928.04, "start": 927.96, "text": "the"}, {"end": 928.64, "start": 928.04, "text": "code,"}, {"end": 928.88, "start": 928.64, "text": "and"}, {"end": 929.16, "start": 928.88, "text": "we"}, {"end": 929.44, "start": 929.16, "text": "also,"}, {"end": 929.64, "start": 929.44, "text": "if"}, {"end": 929.68, "start": 929.64, "text": "you"}, {"end": 929.84, "start": 929.68, "text": "go"}, {"end": 929.96, "start": 929.84, "text": "back"}], "text": " you are interested in this. At the beginning, I told you we are going to look at different architectures. Now, you are familiar with the transformer architecture, where we have either the BERT architecture or something like a Llama architecture or autoregressive decoder, but there is also here the Mamba architecture. 
And if you're new to AI, I have here two videos where I explain, hey, Mamba S6, are they better than a transformer, and I code here with you, here, a mixture of expert model here with a transformer, the code, and we also, if you go back"}, {"chunks": [{"end": 930.08, "start": 930.0, "text": "Back"}, {"end": 930.16, "start": 930.08, "text": "to"}, {"end": 930.32, "start": 930.16, "text": "the"}, {"end": 930.64, "start": 930.32, "text": "pure"}, {"end": 931.0, "start": 930.64, "text": "Mamba,"}, {"end": 931.16, "start": 931.0, "text": "we"}, {"end": 931.68, "start": 931.16, "text": "fine-tune"}, {"end": 931.84, "start": 931.68, "text": "it,"}, {"end": 932.08, "start": 931.84, "text": "we"}, {"end": 932.28, "start": 932.08, "text": "have"}, {"end": 932.36, "start": 932.28, "text": "a"}, {"end": 932.8, "start": 932.36, "text": "DPO"}, {"end": 932.96, "start": 932.8, "text": "alignment,"}, {"end": 933.2, "start": 932.96, "text": "I"}, {"end": 933.52, "start": 933.2, "text": "show"}, {"end": 933.6, "start": 933.52, "text": "you"}, {"end": 933.84, "start": 933.6, "text": "here,"}, {"end": 934.04, "start": 933.84, "text": "the"}, {"end": 934.28, "start": 934.04, "text": "code,"}, {"end": 934.48, "start": 934.28, "text": "we"}, {"end": 934.64, "start": 934.48, "text": "do"}, {"end": 934.88, "start": 934.64, "text": "a"}, {"end": 935.12, "start": 934.88, "text": "test"}, {"end": 935.56, "start": 935.12, "text": "assessment,"}, {"end": 935.84, "start": 935.56, "text": "and"}, {"end": 936.2, "start": 935.84, "text": "I"}, {"end": 936.48, "start": 936.2, "text": "even"}, {"end": 936.84, "start": 936.48, "text": "have"}, {"end": 936.96, "start": 936.84, "text": "a"}, {"end": 937.12, "start": 936.96, "text": "video."}, {"end": 937.24, "start": 937.12, "text": "If"}, {"end": 937.36, "start": 937.24, "text": "you"}, {"end": 937.6, "start": 937.36, "text": "want"}, {"end": 937.6, "start": 937.6, "text": "to"}, {"end": 937.8, "start": 937.6, "text": "go"}, {"end": 938.0, "start": 
937.8, "text": "beyond"}, {"end": 938.12, "start": 938.0, "text": "the"}, {"end": 938.92, "start": 938.12, "text": "classical"}, {"end": 939.44, "start": 938.92, "text": "Mamba"}, {"end": 940.08, "start": 939.44, "text": "AI"}, {"end": 941.08, "start": 940.08, "text": "implementation,"}, {"end": 941.6, "start": 941.08, "text": "if"}, {"end": 941.84, "start": 941.6, "text": "we"}, {"end": 942.24, "start": 941.84, "text": "say,"}, {"end": 942.44, "start": 942.24, "text": "okay,"}, {"end": 942.64, "start": 942.44, "text": "we"}, {"end": 942.8, "start": 942.64, "text": "want"}, {"end": 943.12, "start": 942.8, "text": "to"}, {"end": 943.44, "start": 943.12, "text": "integrate"}, {"end": 943.56, "start": 943.44, "text": "our"}, {"end": 944.52, "start": 943.56, "text": "self-attention"}, {"end": 944.6, "start": 944.52, "text": "here,"}, {"end": 944.84, "start": 944.6, "text": "we"}, {"end": 945.16, "start": 944.84, "text": "can"}, {"end": 945.48, "start": 945.16, "text": "do"}, {"end": 945.64, "start": 945.48, "text": "this"}, {"end": 945.84, "start": 945.64, "text": "with"}, {"end": 946.24, "start": 945.84, "text": "vector"}, {"end": 947.2, "start": 946.24, "text": "fields."}, {"end": 947.68, "start": 947.2, "text": "So,"}, {"end": 947.8, "start": 947.68, "text": "you"}, {"end": 948.0, "start": 947.8, "text": "see,"}, {"end": 948.36, "start": 948.0, "text": "already"}, {"end": 948.64, "start": 948.36, "text": "one"}, {"end": 948.96, "start": 948.64, "text": "year"}, {"end": 949.12, "start": 948.96, "text": "ago,"}, {"end": 949.44, "start": 949.12, "text": "we"}, {"end": 949.68, "start": 949.44, "text": "went"}, {"end": 950.04, "start": 949.68, "text": "far"}, {"end": 950.68, "start": 950.04, "text": "beyond"}, {"end": 950.84, "start": 950.68, "text": "what"}, {"end": 951.08, "start": 950.84, "text": "is"}, {"end": 951.52, "start": 951.08, "text": "called"}, {"end": 951.8, "start": 951.52, "text": "Mamba"}, {"end": 954.52, "start": 951.8, "text": "today."}, {"end": 
954.72, "start": 954.52, "text": "Now,"}, {"end": 954.96, "start": 954.72, "text": "let's"}, {"end": 955.0, "start": 954.96, "text": "look"}, {"end": 955.0, "start": 955.0, "text": "at"}, {"end": 955.4, "start": 955.0, "text": "the"}, {"end": 955.88, "start": 955.4, "text": "models"}, {"end": 955.92, "start": 955.88, "text": "that"}, {"end": 956.24, "start": 955.92, "text": "they"}, {"end": 956.44, "start": 956.24, "text": "looked"}, {"end": 956.48, "start": 956.44, "text": "at."}, {"end": 956.72, "start": 956.48, "text": "First"}, {"end": 957.68, "start": 956.72, "text": "is"}, {"end": 957.76, "start": 957.68, "text": "a"}, {"end": 958.84, "start": 957.76, "text": "simple"}, {"end": 959.12, "start": 958.84, "text": "BERT"}, {"end": 959.48, "start": 959.12, "text": "model."}, {"end": 959.96, "start": 959.48, "text": "Remember,"}], "text": " Back to the pure Mamba, we fine-tune it, we have a DPO alignment, I show you here, the code, we do a test assessment, and I even have a video. If you want to go beyond the classical Mamba AI implementation, if we say, okay, we want to integrate our self-attention here, we can do this with vector fields. So, you see, already one year ago, we went far beyond what is called Mamba today. Now, let's look at the models that they looked at. First is a simple BERT model. 
Remember,"}, {"chunks": [{"end": 960.48, "start": 960.0, "text": "the"}, {"end": 961.12, "start": 960.48, "text": "encoder"}, {"end": 961.56, "start": 961.12, "text": "part"}, {"end": 961.6, "start": 961.56, "text": "of"}, {"end": 961.76, "start": 961.6, "text": "the"}, {"end": 962.44, "start": 961.76, "text": "transformer"}, {"end": 962.8, "start": 962.44, "text": "a"}, {"end": 963.64, "start": 962.8, "text": "14"}, {"end": 964.48, "start": 963.64, "text": "million"}, {"end": 964.88, "start": 964.48, "text": "model"}, {"end": 965.76, "start": 964.88, "text": "size"}, {"end": 966.4, "start": 965.76, "text": "vocabulary"}, {"end": 966.88, "start": 966.4, "text": "size"}, {"end": 967.96, "start": 966.88, "text": "2000"}, {"end": 968.36, "start": 967.96, "text": "then"}, {"end": 968.84, "start": 968.36, "text": "they"}, {"end": 969.08, "start": 968.84, "text": "go"}, {"end": 969.24, "start": 969.08, "text": "for"}, {"end": 969.24, "start": 969.24, "text": "the"}, {"end": 969.6, "start": 969.24, "text": "decoder"}, {"end": 969.76, "start": 969.6, "text": "for"}, {"end": 970.2, "start": 969.76, "text": "the"}, {"end": 970.56, "start": 970.2, "text": "llama"}, {"end": 971.24, "start": 970.56, "text": "14"}, {"end": 972.16, "start": 971.24, "text": "million"}, {"end": 972.28, "start": 972.16, "text": "you"}, {"end": 972.48, "start": 972.28, "text": "see"}, {"end": 973.04, "start": 972.48, "text": "vocabulary"}, {"end": 973.36, "start": 973.04, "text": "size"}, {"end": 973.36, "start": 973.36, "text": "of"}, {"end": 974.28, "start": 973.36, "text": "2000"}, {"end": 974.52, "start": 974.28, "text": "so"}, {"end": 974.88, "start": 974.52, "text": "just"}, {"end": 974.92, "start": 974.88, "text": "to"}, {"end": 974.92, "start": 974.92, "text": "make"}, {"end": 975.04, "start": 974.92, "text": "this"}, {"end": 975.32, "start": 975.04, "text": "sure"}, {"end": 975.44, "start": 975.32, "text": "this"}, {"end": 976.04, "start": 975.44, "text": "is"}, {"end": 976.44, "start": 
976.04, "text": "a"}, {"end": 976.64, "start": 976.44, "text": "really"}, {"end": 977.48, "start": 976.64, "text": "experiment"}, {"end": 978.04, "start": 977.48, "text": "this"}, {"end": 978.6, "start": 978.04, "text": "is"}, {"end": 978.8, "start": 978.6, "text": "a"}, {"end": 978.96, "start": 978.8, "text": "really"}, {"end": 979.2, "start": 978.96, "text": "a"}, {"end": 979.88, "start": 979.2, "text": "simplified"}, {"end": 980.44, "start": 979.88, "text": "demonstration"}, {"end": 980.84, "start": 980.44, "text": "but"}, {"end": 981.04, "start": 980.84, "text": "with"}, {"end": 981.16, "start": 981.04, "text": "the"}, {"end": 981.76, "start": 981.16, "text": "full"}, {"end": 982.6, "start": 981.76, "text": "computational"}, {"end": 983.72, "start": 982.6, "text": "complexity"}, {"end": 984.24, "start": 983.72, "text": "and"}, {"end": 984.52, "start": 984.24, "text": "now"}, {"end": 984.52, "start": 984.52, "text": "the"}, {"end": 985.12, "start": 984.52, "text": "question"}, {"end": 986.04, "start": 985.12, "text": "is"}, {"end": 986.24, "start": 986.04, "text": "are"}, {"end": 986.52, "start": 986.24, "text": "we"}, {"end": 986.64, "start": 986.52, "text": "able"}, {"end": 986.8, "start": 986.64, "text": "to"}, {"end": 987.28, "start": 986.8, "text": "scale"}, {"end": 987.72, "start": 987.28, "text": "from"}, {"end": 988.2, "start": 987.72, "text": "this"}, {"end": 989.28, "start": 988.2, "text": "14"}, {"end": 989.6, "start": 989.28, "text": "million"}, {"end": 989.96, "start": 989.6, "text": "tiny"}], "text": " the encoder part of the transformer a 14 million model size vocabulary size 2000 then they go for the decoder for the llama 14 million you see vocabulary size of 2000 so just to make this sure this is a really experiment this is a really a simplified demonstration but with the full computational complexity and now the question is are we able to scale from this 14 million tiny"}, {"chunks": [{"end": 990.68, "start": 990.0, "text": "almost"}, {"end": 
991.72, "start": 990.68, "text": "nothing"}, {"end": 992.88, "start": 991.72, "text": "size"}, {"end": 994.0, "start": 992.88, "text": "to"}, {"end": 995.04, "start": 994.0, "text": "more"}, {"end": 995.64, "start": 995.04, "text": "potent"}, {"end": 995.92, "start": 995.64, "text": "edge"}, {"end": 996.64, "start": 995.92, "text": "devices"}, {"end": 996.68, "start": 996.64, "text": "or"}, {"end": 996.84, "start": 996.68, "text": "even"}, {"end": 997.36, "start": 996.84, "text": "to"}, {"end": 997.84, "start": 997.36, "text": "normal"}, {"end": 998.32, "start": 997.84, "text": "LLMs."}, {"end": 998.36, "start": 998.32, "text": "So"}, {"end": 998.76, "start": 998.36, "text": "we"}, {"end": 999.04, "start": 998.76, "text": "are"}, {"end": 999.28, "start": 999.04, "text": "not"}, {"end": 999.96, "start": 999.28, "text": "looking"}, {"end": 1000.2, "start": 999.96, "text": "up"}, {"end": 1000.52, "start": 1000.2, "text": "to"}, {"end": 1000.88, "start": 1000.52, "text": "huge"}, {"end": 1001.2, "start": 1000.88, "text": "models,"}, {"end": 1001.4, "start": 1001.2, "text": "but"}, {"end": 1001.72, "start": 1001.4, "text": "we"}, {"end": 1001.8, "start": 1001.72, "text": "are"}, {"end": 1002.28, "start": 1001.8, "text": "really"}, {"end": 1002.44, "start": 1002.28, "text": "looking"}, {"end": 1003.08, "start": 1002.44, "text": "down"}, {"end": 1003.32, "start": 1003.08, "text": "how"}, {"end": 1004.36, "start": 1003.32, "text": "small"}, {"end": 1004.6, "start": 1004.36, "text": "can"}, {"end": 1004.88, "start": 1004.6, "text": "we"}, {"end": 1005.4, "start": 1004.88, "text": "go"}, {"end": 1005.6, "start": 1005.4, "text": "with"}, {"end": 1005.8, "start": 1005.6, "text": "the"}, {"end": 1005.96, "start": 1005.8, "text": "architecture"}, {"end": 1006.08, "start": 1005.96, "text": "that"}, {"end": 1006.36, "start": 1006.08, "text": "you"}, {"end": 1006.96, "start": 1006.36, "text": "have"}, {"end": 1007.2, "start": 1006.96, "text": "an"}, {"end": 1007.68, "start": 1007.2, 
"text": "AI"}, {"end": 1007.92, "start": 1007.68, "text": "on"}, {"end": 1008.24, "start": 1007.92, "text": "your"}, {"end": 1008.48, "start": 1008.24, "text": "Apple"}, {"end": 1008.88, "start": 1008.48, "text": "Watch"}, {"end": 1009.24, "start": 1008.88, "text": "that"}, {"end": 1009.52, "start": 1009.24, "text": "is"}, {"end": 1009.84, "start": 1009.52, "text": "really"}, {"end": 1010.28, "start": 1009.84, "text": "locally"}, {"end": 1010.68, "start": 1010.28, "text": "performing"}, {"end": 1011.6, "start": 1010.68, "text": "the"}, {"end": 1012.76, "start": 1011.6, "text": "intelligence."}, {"end": 1012.88, "start": 1012.76, "text": "Yeah,"}, {"end": 1013.12, "start": 1012.88, "text": "and"}, {"end": 1013.32, "start": 1013.12, "text": "then"}, {"end": 1013.64, "start": 1013.32, "text": "the"}, {"end": 1013.88, "start": 1013.64, "text": "Mamba"}, {"end": 1014.2, "start": 1013.88, "text": "task"}, {"end": 1014.48, "start": 1014.2, "text": "here,"}, {"end": 1014.72, "start": 1014.48, "text": "this"}, {"end": 1014.88, "start": 1014.72, "text": "is"}, {"end": 1015.16, "start": 1014.88, "text": "the"}, {"end": 1015.48, "start": 1015.16, "text": "specification"}, {"end": 1015.56, "start": 1015.48, "text": "of"}, {"end": 1015.68, "start": 1015.56, "text": "the"}, {"end": 1016.24, "start": 1015.68, "text": "14"}, {"end": 1016.48, "start": 1016.24, "text": "million"}, {"end": 1016.8, "start": 1016.48, "text": "Mamba"}, {"end": 1017.32, "start": 1016.8, "text": "project."}, {"end": 1018.4, "start": 1017.32, "text": "For"}, {"end": 1018.6, "start": 1018.4, "text": "the"}, {"end": 1018.88, "start": 1018.6, "text": "hardware,"}, {"end": 1019.08, "start": 1018.88, "text": "I"}, {"end": 1019.28, "start": 1019.08, "text": "think"}, {"end": 1019.4, "start": 1019.28, "text": "it's"}, {"end": 1019.96, "start": 1019.4, "text": "interesting"}], "text": " almost nothing size to more potent edge devices or even to normal LLMs. 
So we are not looking up to huge models, but we are really looking down how small can we go with the architecture that you have an AI on your Apple Watch that is really locally performing the intelligence. Yeah, and then the Mamba task here, this is the specification of the 14 million Mamba project. For the hardware, I think it's interesting"}, {"chunks": [{"end": 1020.28, "start": 1020.0, "text": "what"}, {"end": 1020.48, "start": 1020.28, "text": "you"}, {"end": 1020.72, "start": 1020.48, "text": "have"}, {"end": 1020.92, "start": 1020.72, "text": "to"}, {"end": 1021.48, "start": 1020.92, "text": "use"}, {"end": 1021.76, "start": 1021.48, "text": "for"}, {"end": 1022.16, "start": 1021.76, "text": "14"}, {"end": 1022.68, "start": 1022.16, "text": "million"}, {"end": 1023.68, "start": 1022.68, "text": "pre-training"}, {"end": 1023.96, "start": 1023.68, "text": "and"}, {"end": 1024.16, "start": 1023.96, "text": "they"}, {"end": 1024.6, "start": 1024.16, "text": "used"}, {"end": 1024.76, "start": 1024.6, "text": "here"}, {"end": 1025.32, "start": 1024.76, "text": "four"}, {"end": 1025.92, "start": 1025.32, "text": "NVIDIA"}, {"end": 1026.68, "start": 1025.92, "text": "A6000"}, {"end": 1026.92, "start": 1026.68, "text": "with"}, {"end": 1027.28, "start": 1026.92, "text": "each"}, {"end": 1027.72, "start": 1027.28, "text": "of"}, {"end": 1028.2, "start": 1027.72, "text": "48"}, {"end": 1032.2, "start": 1028.2, "text": "GB."}, {"end": 1032.72, "start": 1032.2, "text": "If"}, {"end": 1032.92, "start": 1032.72, "text": "you're"}, {"end": 1033.56, "start": 1032.92, "text": "interested"}, {"end": 1034.0, "start": 1033.56, "text": "in"}, {"end": 1034.64, "start": 1034.0, "text": "Mamba-based"}, {"end": 1034.88, "start": 1034.64, "text": "language"}, {"end": 1034.92, "start": 1034.88, "text": "models,"}, {"end": 1035.12, "start": 1034.92, "text": "I"}, {"end": 1035.4, "start": 1035.12, "text": "think"}, {"end": 1035.64, "start": 1035.4, "text": "this"}, {"end": 1035.8, 
"start": 1035.64, "text": "is"}, {"end": 1036.04, "start": 1035.8, "text": "here"}, {"end": 1036.12, "start": 1036.04, "text": "an"}, {"end": 1036.8, "start": 1036.12, "text": "excellent"}, {"end": 1037.64, "start": 1036.8, "text": "summary,"}, {"end": 1037.8, "start": 1037.64, "text": "a"}, {"end": 1039.08, "start": 1037.8, "text": "study"}, {"end": 1039.52, "start": 1039.08, "text": "by"}, {"end": 1040.04, "start": 1039.52, "text": "NVIDIA,"}, {"end": 1040.64, "start": 1040.04, "text": "University"}, {"end": 1040.68, "start": 1040.64, "text": "of"}, {"end": 1041.28, "start": 1040.68, "text": "Wisconsin,"}, {"end": 1041.6, "start": 1041.28, "text": "Princeton"}, {"end": 1042.2, "start": 1041.6, "text": "University,"}, {"end": 1042.52, "start": 1042.2, "text": "Together"}, {"end": 1042.84, "start": 1042.52, "text": "AI,"}, {"end": 1043.24, "start": 1042.84, "text": "Carnegie"}, {"end": 1044.52, "start": 1043.24, "text": "Mellon."}, {"end": 1044.88, "start": 1044.52, "text": "They"}, {"end": 1045.16, "start": 1044.88, "text": "give"}, {"end": 1045.36, "start": 1045.16, "text": "you"}, {"end": 1045.76, "start": 1045.36, "text": "here"}, {"end": 1046.48, "start": 1045.76, "text": "really,"}, {"end": 1046.68, "start": 1046.48, "text": "I"}, {"end": 1046.8, "start": 1046.68, "text": "would"}, {"end": 1046.96, "start": 1046.8, "text": "say,"}, {"end": 1047.16, "start": 1046.96, "text": "state"}, {"end": 1047.16, "start": 1047.16, "text": "of"}, {"end": 1047.44, "start": 1047.16, "text": "the"}, {"end": 1047.8, "start": 1047.44, "text": "art"}, {"end": 1048.16, "start": 1047.8, "text": "of"}, {"end": 1048.72, "start": 1048.16, "text": "selective"}, {"end": 1049.16, "start": 1048.72, "text": "state"}, {"end": 1049.44, "start": 1049.16, "text": "space"}, {"end": 1049.64, "start": 1049.44, "text": "model"}, {"end": 1049.96, "start": 1049.64, "text": "like"}], "text": " what you have to use for 14 million pre-training and they used here four NVIDIA A6000 with each of 48 GB. 
If you're interested in Mamba-based language models, I think this is here an excellent summary, a study by NVIDIA, University of Wisconsin, Princeton University, Together AI, Carnegie Mellon. They give you here really, I would say, state of the art of selective state space model like"}, {"chunks": [{"end": 1050.52, "start": 1050.0, "text": "and"}, {"end": 1050.88, "start": 1050.52, "text": "you"}, {"end": 1051.0, "start": 1050.88, "text": "know"}, {"end": 1051.28, "start": 1051.0, "text": "that"}, {"end": 1051.68, "start": 1051.28, "text": "was"}, {"end": 1051.76, "start": 1051.68, "text": "in"}, {"end": 1051.88, "start": 1051.76, "text": "the"}, {"end": 1052.12, "start": 1051.88, "text": "summer"}, {"end": 1052.6, "start": 1052.12, "text": "of"}, {"end": 1054.08, "start": 1052.6, "text": "2024,"}, {"end": 1054.16, "start": 1054.08, "text": "it"}, {"end": 1054.48, "start": 1054.16, "text": "was"}, {"end": 1054.72, "start": 1054.48, "text": "almost"}, {"end": 1055.52, "start": 1054.72, "text": "famous,"}, {"end": 1055.84, "start": 1055.52, "text": "because"}, {"end": 1056.44, "start": 1055.84, "text": "they"}, {"end": 1056.64, "start": 1056.44, "text": "showed"}, {"end": 1056.8, "start": 1056.64, "text": "here"}, {"end": 1056.8, "start": 1056.8, "text": "that"}, {"end": 1057.24, "start": 1056.8, "text": "pure"}, {"end": 1057.92, "start": 1057.24, "text": "SSM-based"}, {"end": 1058.48, "start": 1057.92, "text": "model"}, {"end": 1058.72, "start": 1058.48, "text": "match"}, {"end": 1058.84, "start": 1058.72, "text": "or"}, {"end": 1058.92, "start": 1058.84, "text": "even"}, {"end": 1059.36, "start": 1058.92, "text": "exceed"}, {"end": 1060.04, "start": 1059.36, "text": "sometimes"}, {"end": 1060.2, "start": 1060.04, "text": "the"}, {"end": 1060.88, "start": 1060.2, "text": "transformer"}, {"end": 1061.0, "start": 1060.88, "text": "in"}, {"end": 1061.2, "start": 1061.0, "text": "many"}, {"end": 1062.08, "start": 1061.2, "text": "tasks,"}, {"end": 1063.6, "start": 
1062.08, "text": "especially"}, {"end": 1063.76, "start": 1063.6, "text": "the"}, {"end": 1063.84, "start": 1063.76, "text": "Mamba"}, {"end": 1064.08, "start": 1063.84, "text": "and"}, {"end": 1064.2, "start": 1064.08, "text": "the"}, {"end": 1064.44, "start": 1064.2, "text": "Mamba"}, {"end": 1064.64, "start": 1064.44, "text": "2"}, {"end": 1065.04, "start": 1064.64, "text": "models"}, {"end": 1065.64, "start": 1065.04, "text": "lag"}, {"end": 1066.08, "start": 1065.64, "text": "behind"}, {"end": 1066.16, "start": 1066.08, "text": "the"}, {"end": 1066.76, "start": 1066.16, "text": "transformer"}, {"end": 1067.04, "start": 1066.76, "text": "model"}, {"end": 1067.24, "start": 1067.04, "text": "on"}, {"end": 1067.72, "start": 1067.24, "text": "specific"}, {"end": 1068.08, "start": 1067.72, "text": "tasks"}, {"end": 1068.24, "start": 1068.08, "text": "that"}, {"end": 1069.24, "start": 1068.24, "text": "require"}, {"end": 1070.16, "start": 1069.24, "text": "in-context"}, {"end": 1070.88, "start": 1070.16, "text": "learning."}, {"end": 1071.24, "start": 1070.88, "text": "And"}, {"end": 1071.44, "start": 1071.24, "text": "now"}, {"end": 1071.56, "start": 1071.44, "text": "you"}, {"end": 1071.8, "start": 1071.56, "text": "see"}, {"end": 1072.2, "start": 1071.8, "text": "why"}, {"end": 1072.44, "start": 1072.2, "text": "all"}, {"end": 1072.6, "start": 1072.44, "text": "of"}, {"end": 1072.96, "start": 1072.6, "text": "my"}, {"end": 1073.16, "start": 1072.96, "text": "last"}, {"end": 1073.4, "start": 1073.16, "text": "two,"}, {"end": 1073.72, "start": 1073.4, "text": "three"}, {"end": 1074.24, "start": 1073.72, "text": "videos"}, {"end": 1074.32, "start": 1074.24, "text": "were"}, {"end": 1074.8, "start": 1074.32, "text": "also"}, {"end": 1075.36, "start": 1074.8, "text": "about"}, {"end": 1076.32, "start": 1075.36, "text": "in-context"}, {"end": 1076.68, "start": 1076.32, "text": "learning"}, {"end": 1076.8, "start": 1076.68, "text": "and"}, {"end": 1077.68, "start": 
1076.8, "text": "understand"}, {"end": 1077.84, "start": 1077.68, "text": "with"}, {"end": 1078.68, "start": 1077.84, "text": "mathematical"}, {"end": 1079.24, "start": 1078.68, "text": "models"}, {"end": 1079.48, "start": 1079.24, "text": "like"}, {"end": 1079.96, "start": 1079.48, "text": "here,"}], "text": " and you know that was in the summer of 2024, it was almost famous, because they showed here that pure SSM-based model match or even exceed sometimes the transformer in many tasks, especially the Mamba and the Mamba 2 models lag behind the transformer model on specific tasks that require in-context learning. And now you see why all of my last two, three videos were also about in-context learning and understand with mathematical models like here,"}, {"chunks": [{"end": 1080.2, "start": 1080.0, "text": "the"}, {"end": 1081.2, "start": 1080.2, "text": "Dirichlet"}, {"end": 1082.04, "start": 1081.2, "text": "energy"}, {"end": 1083.08, "start": 1082.04, "text": "minimization,"}, {"end": 1083.72, "start": 1083.08, "text": "how"}, {"end": 1084.0, "start": 1083.72, "text": "the"}, {"end": 1084.28, "start": 1084.0, "text": "in-context"}, {"end": 1084.64, "start": 1084.28, "text": "learning"}, {"end": 1086.08, "start": 1084.64, "text": "works,"}, {"end": 1086.36, "start": 1086.08, "text": "or"}, {"end": 1087.08, "start": 1086.36, "text": "the"}, {"end": 1087.36, "start": 1087.08, "text": "lag"}, {"end": 1087.68, "start": 1087.36, "text": "behind"}, {"end": 1087.8, "start": 1087.68, "text": "here,"}, {"end": 1087.96, "start": 1087.8, "text": "the"}, {"end": 1088.44, "start": 1087.96, "text": "abilities,"}, {"end": 1088.64, "start": 1088.44, "text": "if"}, {"end": 1088.84, "start": 1088.64, "text": "you"}, {"end": 1089.04, "start": 1088.84, "text": "have"}, {"end": 1089.2, "start": 1089.04, "text": "a"}, {"end": 1089.92, "start": 1089.2, "text": "transformer"}, {"end": 1090.6, "start": 1089.92, "text": "architecture"}, {"end": 1090.88, "start": 1090.6, "text": "for"}, 
{"end": 1092.0, "start": 1090.88, "text": "long-context"}, {"end": 1093.8, "start": 1092.0, "text": "reasoning."}, {"end": 1094.0, "start": 1093.8, "text": "Just"}, {"end": 1094.28, "start": 1094.0, "text": "as"}, {"end": 1094.48, "start": 1094.28, "text": "a"}, {"end": 1094.88, "start": 1094.48, "text": "summary,"}, {"end": 1095.24, "start": 1094.88, "text": "even"}, {"end": 1095.4, "start": 1095.24, "text": "though"}, {"end": 1095.44, "start": 1095.4, "text": "I'm"}, {"end": 1095.52, "start": 1095.44, "text": "not"}, {"end": 1095.76, "start": 1095.52, "text": "really"}, {"end": 1096.12, "start": 1095.76, "text": "familiar"}, {"end": 1096.32, "start": 1096.12, "text": "with"}, {"end": 1096.56, "start": 1096.32, "text": "this."}, {"end": 1097.8, "start": 1096.56, "text": "Another"}, {"end": 1098.2, "start": 1097.8, "text": "document,"}, {"end": 1098.44, "start": 1098.2, "text": "and"}, {"end": 1098.64, "start": 1098.44, "text": "I"}, {"end": 1099.16, "start": 1098.64, "text": "have"}, {"end": 1099.4, "start": 1099.16, "text": "to"}, {"end": 1099.92, "start": 1099.4, "text": "give"}, {"end": 1100.24, "start": 1099.92, "text": "you"}, {"end": 1100.52, "start": 1100.24, "text": "here"}, {"end": 1100.76, "start": 1100.52, "text": "the"}, {"end": 1102.2, "start": 1100.76, "text": "precise"}, {"end": 1102.72, "start": 1102.2, "text": "reference,"}, {"end": 1103.12, "start": 1102.72, "text": "because"}, {"end": 1103.24, "start": 1103.12, "text": "this"}, {"end": 1103.36, "start": 1103.24, "text": "is"}, {"end": 1103.92, "start": 1103.36, "text": "copyrighted"}, {"end": 1103.92, "start": 1103.92, "text": "by"}, {"end": 1104.2, "start": 1103.92, "text": "the"}, {"end": 1104.76, "start": 1104.2, "text": "Association"}, {"end": 1104.88, "start": 1104.76, "text": "of"}, {"end": 1105.6, "start": 1104.88, "text": "Computational"}, {"end": 1106.28, "start": 1105.6, "text": "Linguistics."}, {"end": 1106.6, "start": 1106.28, "text": "It"}, {"end": 1106.92, "start": 1106.6, "text": 
"is"}, {"end": 1107.16, "start": 1106.92, "text": "an"}, {"end": 1107.28, "start": 1107.16, "text": "old"}, {"end": 1107.52, "start": 1107.28, "text": "document,"}, {"end": 1107.64, "start": 1107.52, "text": "it"}, {"end": 1107.96, "start": 1107.64, "text": "is"}, {"end": 1108.2, "start": 1107.96, "text": "from"}, {"end": 1108.44, "start": 1108.2, "text": "June"}, {"end": 1108.92, "start": 1108.44, "text": "2024,"}, {"end": 1109.32, "start": 1108.92, "text": "but"}, {"end": 1109.52, "start": 1109.32, "text": "I"}, {"end": 1109.96, "start": 1109.52, "text": "think,"}], "text": " the Direct-Clee energy minimization, how the in-context learning works, or the lag behind here, the abilities, if you have a transform architecture for long-context reasoning. Just as a summary, even though I'm not really familiar with this. Another document, and I have to give you here the precise reference, because this is copyrighted by the Association of Computational Linguistics. It is an old document, it is from June 2024, but I think,"}, {"chunks": [{"end": 1110.6, "start": 1110.0, "text": "It's"}, {"end": 1110.68, "start": 1110.6, "text": "an"}, {"end": 1111.08, "start": 1110.68, "text": "excellent"}, {"end": 1111.56, "start": 1111.08, "text": "summary."}, {"end": 1111.92, "start": 1111.56, "text": "This"}, {"end": 1112.16, "start": 1111.92, "text": "is"}, {"end": 1112.4, "start": 1112.16, "text": "here"}, {"end": 1112.44, "start": 1112.4, "text": "a"}, {"end": 1113.68, "start": 1112.44, "text": "pre-training"}, {"end": 1114.2, "start": 1113.68, "text": "guide"}, {"end": 1114.44, "start": 1114.2, "text": "to"}, {"end": 1114.8, "start": 1114.44, "text": "training"}, {"end": 1115.08, "start": 1114.8, "text": "here,"}, {"end": 1115.56, "start": 1115.08, "text": "the"}, {"end": 1116.2, "start": 1115.56, "text": "pre-training"}, {"end": 1116.56, "start": 1116.2, "text": "data"}, {"end": 1117.08, "start": 1116.56, "text": "for"}, {"end": 1117.24, "start": 1117.08, "text": "an"}, {"end": 
1118.16, "start": 1117.24, "text": "LLM,"}, {"end": 1118.52, "start": 1118.16, "text": "data"}, {"end": 1118.84, "start": 1118.52, "text": "age,"}, {"end": 1119.28, "start": 1118.84, "text": "domain"}, {"end": 1119.8, "start": 1119.28, "text": "coverage,"}, {"end": 1120.48, "start": 1119.8, "text": "quality,"}, {"end": 1121.0, "start": 1120.48, "text": "toxicity,"}, {"end": 1121.16, "start": 1121.0, "text": "and"}, {"end": 1121.68, "start": 1121.16, "text": "the"}, {"end": 1122.12, "start": 1121.68, "text": "authors"}, {"end": 1122.32, "start": 1122.12, "text": "are"}, {"end": 1123.04, "start": 1122.32, "text": "MIT,"}, {"end": 1123.52, "start": 1123.04, "text": "Cornell"}, {"end": 1124.16, "start": 1123.52, "text": "University,"}, {"end": 1124.44, "start": 1124.16, "text": "Google"}, {"end": 1124.88, "start": 1124.44, "text": "Research,"}, {"end": 1125.64, "start": 1124.88, "text": "OpenAI,"}, {"end": 1125.88, "start": 1125.64, "text": "and"}, {"end": 1126.52, "start": 1125.88, "text": "Carnegie"}, {"end": 1126.88, "start": 1126.52, "text": "Mellon"}, {"end": 1127.76, "start": 1126.88, "text": "University."}, {"end": 1127.88, "start": 1127.76, "text": "So"}, {"end": 1128.12, "start": 1127.88, "text": "they"}, {"end": 1128.36, "start": 1128.12, "text": "pulled"}, {"end": 1128.88, "start": 1128.36, "text": "together"}, {"end": 1129.36, "start": 1128.88, "text": "their"}, {"end": 1130.16, "start": 1129.36, "text": "findings"}, {"end": 1130.24, "start": 1130.16, "text": "and"}, {"end": 1130.48, "start": 1130.24, "text": "their"}, {"end": 1131.04, "start": 1130.48, "text": "ideas,"}, {"end": 1131.08, "start": 1131.04, "text": "and"}, {"end": 1131.6, "start": 1131.08, "text": "I"}, {"end": 1131.92, "start": 1131.6, "text": "think"}, {"end": 1132.24, "start": 1131.92, "text": "this"}, {"end": 1132.48, "start": 1132.24, "text": "is"}, {"end": 1132.72, "start": 1132.48, "text": "great,"}, {"end": 1132.72, "start": 1132.72, "text": "and"}, {"end": 1133.08, "start": 1132.72, 
"text": "the"}, {"end": 1133.48, "start": 1133.08, "text": "authors"}, {"end": 1133.76, "start": 1133.48, "text": "here"}, {"end": 1133.8, "start": 1133.76, "text": "of"}, {"end": 1134.08, "start": 1133.8, "text": "our"}, {"end": 1134.28, "start": 1134.08, "text": "main"}, {"end": 1134.6, "start": 1134.28, "text": "study"}, {"end": 1134.72, "start": 1134.6, "text": "that"}, {"end": 1134.88, "start": 1134.72, "text": "we"}, {"end": 1135.16, "start": 1134.88, "text": "examined"}, {"end": 1135.48, "start": 1135.16, "text": "today,"}, {"end": 1135.96, "start": 1135.48, "text": "they"}, {"end": 1136.48, "start": 1135.96, "text": "used"}, {"end": 1136.76, "start": 1136.48, "text": "this"}, {"end": 1137.64, "start": 1136.76, "text": "pre-training"}, {"end": 1138.28, "start": 1137.64, "text": "guide,"}, {"end": 1138.8, "start": 1138.28, "text": "especially"}, {"end": 1139.04, "start": 1138.8, "text": "for"}, {"end": 1139.24, "start": 1139.04, "text": "the"}, {"end": 1139.76, "start": 1139.24, "text": "pre-training"}, {"end": 1139.96, "start": 1139.76, "text": "data,"}], "text": " It's an excellent summary. This is here a pre-training guide to training here, the pre-training data for an LLM, data age, domain coverage, quality, toxicity, and the authors are MIT, Cornell University, Google Research, OpenAI, and Carnegie Mellon University. 
So they pulled together their findings and their ideas, and I think this is great, and the authors here of our main study that we examined today, they used this pre-training guide, especially for the pre-training data,"}, {"chunks": [{"end": 1141.4, "start": 1140.0, "text": "design."}, {"end": 1141.6, "start": 1141.4, "text": "So"}, {"end": 1141.72, "start": 1141.6, "text": "all"}, {"end": 1141.92, "start": 1141.72, "text": "the"}, {"end": 1142.36, "start": 1141.92, "text": "insights"}, {"end": 1142.56, "start": 1142.36, "text": "that"}, {"end": 1142.72, "start": 1142.56, "text": "you"}, {"end": 1142.88, "start": 1142.72, "text": "might"}, {"end": 1143.2, "start": 1142.88, "text": "find"}, {"end": 1143.36, "start": 1143.2, "text": "here"}, {"end": 1143.68, "start": 1143.36, "text": "that"}, {"end": 1143.84, "start": 1143.68, "text": "you"}, {"end": 1144.16, "start": 1143.84, "text": "say,"}, {"end": 1144.36, "start": 1144.16, "text": "I"}, {"end": 1144.36, "start": 1144.36, "text": "don't"}, {"end": 1144.56, "start": 1144.36, "text": "know"}, {"end": 1144.76, "start": 1144.56, "text": "where"}, {"end": 1144.96, "start": 1144.76, "text": "they"}, {"end": 1145.2, "start": 1144.96, "text": "got"}, {"end": 1145.44, "start": 1145.2, "text": "this"}, {"end": 1146.28, "start": 1145.44, "text": "idea."}, {"end": 1146.84, "start": 1146.28, "text": "This"}, {"end": 1147.04, "start": 1146.84, "text": "would"}, {"end": 1147.28, "start": 1147.04, "text": "be"}, {"end": 1147.48, "start": 1147.28, "text": "your"}, {"end": 1148.08, "start": 1147.48, "text": "reference"}, {"end": 1149.56, "start": 1148.08, "text": "documentation."}, {"end": 1151.12, "start": 1149.56, "text": "Great."}, {"end": 1151.36, "start": 1151.12, "text": "Now"}, {"end": 1151.68, "start": 1151.36, "text": "our"}, {"end": 1152.12, "start": 1151.68, "text": "authors"}, {"end": 1152.4, "start": 1152.12, "text": "had"}, {"end": 1152.92, "start": 1152.4, "text": "three"}, {"end": 1153.28, "start": 1152.92, "text": 
"main"}, {"end": 1153.68, "start": 1153.28, "text": "questions"}, {"end": 1153.8, "start": 1153.68, "text": "they"}, {"end": 1154.0, "start": 1153.8, "text": "were"}, {"end": 1154.8, "start": 1154.0, "text": "interested"}, {"end": 1155.08, "start": 1154.8, "text": "in."}, {"end": 1155.4, "start": 1155.08, "text": "Can"}, {"end": 1155.8, "start": 1155.4, "text": "training"}, {"end": 1156.0, "start": 1155.8, "text": "now"}, {"end": 1156.12, "start": 1156.0, "text": "with"}, {"end": 1156.36, "start": 1156.12, "text": "some"}, {"end": 1156.72, "start": 1156.36, "text": "clean"}, {"end": 1157.2, "start": 1156.72, "text": "data"}, {"end": 1157.64, "start": 1157.2, "text": "sets"}, {"end": 1158.12, "start": 1157.64, "text": "that"}, {"end": 1158.16, "start": 1158.12, "text": "have"}, {"end": 1158.52, "start": 1158.16, "text": "this"}, {"end": 1158.92, "start": 1158.52, "text": "lower"}, {"end": 1159.4, "start": 1158.92, "text": "linguistic"}, {"end": 1160.2, "start": 1159.4, "text": "complexity,"}, {"end": 1160.88, "start": 1160.2, "text": "can"}, {"end": 1161.28, "start": 1160.88, "text": "they"}, {"end": 1161.64, "start": 1161.28, "text": "enhance"}, {"end": 1161.76, "start": 1161.64, "text": "here"}, {"end": 1162.08, "start": 1161.76, "text": "the"}, {"end": 1162.64, "start": 1162.08, "text": "learning"}, {"end": 1163.44, "start": 1162.64, "text": "efficiency"}, {"end": 1164.16, "start": 1163.44, "text": "of"}, {"end": 1164.8, "start": 1164.16, "text": "LLMs"}, {"end": 1164.88, "start": 1164.8, "text": "and"}, {"end": 1165.0, "start": 1164.88, "text": "of"}, {"end": 1165.6, "start": 1165.0, "text": "tiny"}, {"end": 1166.44, "start": 1165.6, "text": "LLMs?"}, {"end": 1166.64, "start": 1166.44, "text": "And"}, {"end": 1166.96, "start": 1166.64, "text": "you"}, {"end": 1167.04, "start": 1166.96, "text": "would"}, {"end": 1167.64, "start": 1167.04, "text": "assume"}, {"end": 1168.08, "start": 1167.64, "text": "yes,"}, {"end": 1168.12, "start": 1168.08, "text": "of"}, 
{"end": 1168.52, "start": 1168.12, "text": "course."}, {"end": 1169.6, "start": 1168.52, "text": "Then"}, {"end": 1169.96, "start": 1169.6, "text": "second"}], "text": " design. So all the insights that you might find here that you say, I don't know where they got this idea. This would be your reference documentation. Great. Now our authors had three main questions they were interested in. Can training now with some clean data sets that have this lower linguistic complexity, can they enhance here the learning efficiency of LLMs and of tiny LLMs? And you would assume yes, of course. Then second"}, {"chunks": [{"end": 1170.4, "start": 1170.0, "text": "question"}, {"end": 1170.84, "start": 1170.4, "text": "was"}, {"end": 1171.12, "start": 1170.84, "text": "do"}, {"end": 1171.32, "start": 1171.12, "text": "the"}, {"end": 1171.68, "start": 1171.32, "text": "language"}, {"end": 1171.96, "start": 1171.68, "text": "models"}, {"end": 1172.64, "start": 1171.96, "text": "pre-trained"}, {"end": 1172.72, "start": 1172.64, "text": "and"}, {"end": 1173.48, "start": 1172.72, "text": "instruction"}, {"end": 1173.8, "start": 1173.48, "text": "tuned"}, {"end": 1174.0, "start": 1173.8, "text": "with"}, {"end": 1174.4, "start": 1174.0, "text": "this"}, {"end": 1174.72, "start": 1174.4, "text": "low"}, {"end": 1175.44, "start": 1174.72, "text": "complexity"}, {"end": 1175.64, "start": 1175.44, "text": "data"}, {"end": 1176.48, "start": 1175.64, "text": "sets"}, {"end": 1176.88, "start": 1176.48, "text": "tend"}, {"end": 1177.04, "start": 1176.88, "text": "to"}, {"end": 1177.52, "start": 1177.04, "text": "develop"}, {"end": 1177.84, "start": 1177.52, "text": "here"}, {"end": 1178.44, "start": 1177.84, "text": "this"}, {"end": 1178.96, "start": 1178.44, "text": "instruction"}, {"end": 1179.4, "start": 1178.96, "text": "following"}, {"end": 1180.64, "start": 1179.4, "text": "abilities"}, {"end": 1182.12, "start": 1180.64, "text": "earlier"}, {"end": 1182.2, "start": 1182.12, "text": 
"are"}, {"end": 1182.36, "start": 1182.2, "text": "they"}, {"end": 1182.8, "start": 1182.36, "text": "maybe"}, {"end": 1183.0, "start": 1182.8, "text": "even"}, {"end": 1183.44, "start": 1183.0, "text": "better"}, {"end": 1183.96, "start": 1183.44, "text": "to"}, {"end": 1184.24, "start": 1183.96, "text": "be"}, {"end": 1184.44, "start": 1184.24, "text": "on"}, {"end": 1184.52, "start": 1184.44, "text": "the"}, {"end": 1184.88, "start": 1184.52, "text": "road"}, {"end": 1185.64, "start": 1184.88, "text": "to"}, {"end": 1186.08, "start": 1185.64, "text": "develop"}, {"end": 1186.28, "start": 1186.08, "text": "to"}, {"end": 1187.16, "start": 1186.28, "text": "autonomous"}, {"end": 1188.32, "start": 1187.16, "text": "self-learning"}, {"end": 1188.68, "start": 1188.32, "text": "AI"}, {"end": 1189.32, "start": 1188.68, "text": "agents"}, {"end": 1189.4, "start": 1189.32, "text": "and"}, {"end": 1189.72, "start": 1189.4, "text": "their"}, {"end": 1190.12, "start": 1189.72, "text": "third"}, {"end": 1190.48, "start": 1190.12, "text": "question"}, {"end": 1190.84, "start": 1190.48, "text": "was"}, {"end": 1192.56, "start": 1190.84, "text": "hey"}, {"end": 1192.8, "start": 1192.56, "text": "would"}, {"end": 1193.24, "start": 1192.8, "text": "they"}, {"end": 1193.6, "start": 1193.24, "text": "enable"}, {"end": 1193.72, "start": 1193.6, "text": "here"}, {"end": 1193.84, "start": 1193.72, "text": "a"}, {"end": 1194.16, "start": 1193.84, "text": "more"}, {"end": 1194.8, "start": 1194.16, "text": "efficient"}, {"end": 1195.56, "start": 1194.8, "text": "development"}, {"end": 1195.8, "start": 1195.56, "text": "of"}, {"end": 1195.96, "start": 1195.8, "text": "our"}, {"end": 1196.36, "start": 1195.96, "text": "language"}, {"end": 1196.72, "start": 1196.36, "text": "model"}, {"end": 1197.32, "start": 1196.72, "text": "architecture"}, {"end": 1197.4, "start": 1197.32, "text": "and"}, {"end": 1197.48, "start": 1197.4, "text": "of"}, {"end": 1197.6, "start": 1197.48, "text": "the"}, 
{"end": 1198.0, "start": 1197.6, "text": "training"}, {"end": 1198.6, "start": 1198.0, "text": "techniques"}, {"end": 1199.56, "start": 1198.6, "text": "especially"}, {"end": 1199.88, "start": 1199.56, "text": "if"}, {"end": 1199.96, "start": 1199.88, "text": "we"}], "text": " question was do the language models pre-trained and instruction tuned with this low complexity data sets tend to develop here this instruction following abilities earlier are they maybe even better to be on the road to develop to autonomous self-learning AI agents and their third question was hey would they enable here a more efficient development of our language model architecture and of the training techniques especially if we"}, {"chunks": [{"end": 1200.36, "start": 1200.0, "text": "have"}, {"end": 1200.68, "start": 1200.36, "text": "on"}, {"end": 1201.08, "start": 1200.68, "text": "edge"}, {"end": 1201.92, "start": 1201.08, "text": "devices"}, {"end": 1202.32, "start": 1201.92, "text": "a"}, {"end": 1202.84, "start": 1202.32, "text": "resource"}, {"end": 1203.6, "start": 1202.84, "text": "efficient,"}, {"end": 1203.72, "start": 1203.6, "text": "a"}, {"end": 1204.6, "start": 1203.72, "text": "reduced"}, {"end": 1206.96, "start": 1204.6, "text": "scale."}, {"end": 1207.24, "start": 1206.96, "text": "In"}, {"end": 1208.04, "start": 1207.24, "text": "Annex"}, {"end": 1208.16, "start": 1208.04, "text": "B,"}, {"end": 1208.48, "start": 1208.16, "text": "they"}, {"end": 1208.68, "start": 1208.48, "text": "give"}, {"end": 1209.08, "start": 1208.68, "text": "you"}, {"end": 1209.6, "start": 1209.08, "text": "some"}, {"end": 1210.28, "start": 1209.6, "text": "interesting"}, {"end": 1211.24, "start": 1210.28, "text": "insights"}, {"end": 1211.56, "start": 1211.24, "text": "and"}, {"end": 1212.16, "start": 1211.56, "text": "you"}, {"end": 1212.32, "start": 1212.16, "text": "can"}, {"end": 1212.56, "start": 1212.32, "text": "feel"}, {"end": 1213.08, "start": 1212.56, "text": "that"}, {"end": 1213.24, 
"start": 1213.08, "text": "they"}, {"end": 1214.0, "start": 1213.24, "text": "want"}, {"end": 1214.76, "start": 1214.0, "text": "to"}, {"end": 1215.08, "start": 1214.76, "text": "find"}, {"end": 1215.52, "start": 1215.08, "text": "your"}, {"end": 1216.4, "start": 1215.52, "text": "relationship"}, {"end": 1216.8, "start": 1216.4, "text": "between"}, {"end": 1217.24, "start": 1216.8, "text": "the"}, {"end": 1217.92, "start": 1217.24, "text": "complexity"}, {"end": 1217.96, "start": 1217.92, "text": "of"}, {"end": 1218.4, "start": 1217.96, "text": "the"}, {"end": 1218.8, "start": 1218.4, "text": "language"}, {"end": 1219.6, "start": 1218.8, "text": "data"}, {"end": 1220.12, "start": 1219.6, "text": "set"}, {"end": 1220.92, "start": 1220.12, "text": "for"}, {"end": 1221.4, "start": 1220.92, "text": "the"}, {"end": 1221.92, "start": 1221.4, "text": "scaling"}, {"end": 1222.6, "start": 1221.92, "text": "intention"}, {"end": 1222.72, "start": 1222.6, "text": "and"}, {"end": 1222.96, "start": 1222.72, "text": "its"}, {"end": 1223.4, "start": 1222.96, "text": "data"}, {"end": 1224.08, "start": 1223.4, "text": "distribution"}, {"end": 1224.76, "start": 1224.08, "text": "properties"}, {"end": 1225.04, "start": 1224.76, "text": "on"}, {"end": 1225.32, "start": 1225.04, "text": "its"}, {"end": 1225.68, "start": 1225.32, "text": "different"}, {"end": 1226.68, "start": 1225.68, "text": "domains."}, {"end": 1226.76, "start": 1226.68, "text": "So"}, {"end": 1227.6, "start": 1226.76, "text": "they"}, {"end": 1227.92, "start": 1227.6, "text": "say,"}, {"end": 1228.28, "start": 1227.92, "text": "hey,"}, {"end": 1228.36, "start": 1228.28, "text": "we"}, {"end": 1228.72, "start": 1228.36, "text": "define"}, {"end": 1228.84, "start": 1228.72, "text": "the"}, {"end": 1229.32, "start": 1228.84, "text": "complexity"}, {"end": 1229.36, "start": 1229.32, "text": "of"}, {"end": 1229.44, "start": 1229.36, "text": "a"}, {"end": 1229.76, "start": 1229.44, "text": "language"}, {"end": 1229.96, 
"start": 1229.76, "text": "data"}], "text": " have on edge devices a resource efficient, a reduced scale. In Annex B, they give you some interesting insights and you can feel that they want to find your relationship between the complexity of the language data set for the scaling intention and its data distribution properties on its different domains. So they say, hey, we define the complexity of a language data"}, {"chunks": [{"end": 1230.16, "start": 1230.0, "text": "said"}, {"end": 1230.28, "start": 1230.16, "text": "to"}, {"end": 1230.4, "start": 1230.28, "text": "be"}, {"end": 1230.52, "start": 1230.4, "text": "the"}, {"end": 1230.76, "start": 1230.52, "text": "total"}, {"end": 1231.04, "start": 1230.76, "text": "number"}, {"end": 1231.2, "start": 1231.04, "text": "of"}, {"end": 1231.52, "start": 1231.2, "text": "token"}, {"end": 1232.04, "start": 1231.52, "text": "combination"}, {"end": 1232.36, "start": 1232.04, "text": "patterns"}, {"end": 1232.88, "start": 1232.36, "text": "presented"}, {"end": 1233.04, "start": 1232.88, "text": "with"}, {"end": 1233.16, "start": 1233.04, "text": "the"}, {"end": 1233.92, "start": 1233.16, "text": "data"}, {"end": 1234.24, "start": 1233.92, "text": "set."}, {"end": 1234.6, "start": 1234.24, "text": "And"}, {"end": 1235.0, "start": 1234.6, "text": "they"}, {"end": 1235.24, "start": 1235.0, "text": "give"}, {"end": 1235.4, "start": 1235.24, "text": "you"}, {"end": 1235.92, "start": 1235.4, "text": "here"}, {"end": 1236.32, "start": 1235.92, "text": "a"}, {"end": 1237.08, "start": 1236.32, "text": "definition"}, {"end": 1237.24, "start": 1237.08, "text": "of"}, {"end": 1237.4, "start": 1237.24, "text": "a"}, {"end": 1237.84, "start": 1237.4, "text": "naive"}, {"end": 1238.56, "start": 1237.84, "text": "complexity,"}, {"end": 1238.72, "start": 1238.56, "text": "as"}, {"end": 1238.92, "start": 1238.72, "text": "they"}, {"end": 1239.2, "start": 1238.92, "text": "call"}, {"end": 1239.44, "start": 1239.2, "text": "it,"}, {"end": 
1239.6, "start": 1239.44, "text": "and"}, {"end": 1239.6, "start": 1239.6, "text": "for"}, {"end": 1239.6, "start": 1239.6, "text": "the"}, {"end": 1239.92, "start": 1239.6, "text": "training"}, {"end": 1240.8, "start": 1239.92, "text": "techniques."}, {"end": 1240.96, "start": 1240.8, "text": "And"}, {"end": 1240.96, "start": 1240.96, "text": "of"}, {"end": 1241.24, "start": 1240.96, "text": "course,"}, {"end": 1241.32, "start": 1241.24, "text": "the"}, {"end": 1241.8, "start": 1241.32, "text": "context"}, {"end": 1242.24, "start": 1241.8, "text": "window"}, {"end": 1242.6, "start": 1242.24, "text": "length,"}, {"end": 1242.68, "start": 1242.6, "text": "the"}, {"end": 1243.2, "start": 1242.68, "text": "size"}, {"end": 1243.76, "start": 1243.2, "text": "here,"}, {"end": 1244.32, "start": 1243.76, "text": "affects"}, {"end": 1244.48, "start": 1244.32, "text": "the"}, {"end": 1246.0, "start": 1244.48, "text": "complexity."}, {"end": 1246.12, "start": 1246.0, "text": "And"}, {"end": 1247.08, "start": 1246.12, "text": "they"}, {"end": 1247.76, "start": 1247.08, "text": "analyze"}, {"end": 1248.36, "start": 1247.76, "text": "here"}, {"end": 1248.6, "start": 1248.36, "text": "if"}, {"end": 1248.72, "start": 1248.6, "text": "you"}, {"end": 1248.96, "start": 1248.72, "text": "want"}, {"end": 1249.08, "start": 1248.96, "text": "the"}, {"end": 1249.76, "start": 1249.08, "text": "complexity"}, {"end": 1249.76, "start": 1249.76, "text": "of"}, {"end": 1249.88, "start": 1249.76, "text": "the"}, {"end": 1250.12, "start": 1249.88, "text": "data"}, {"end": 1250.4, "start": 1250.12, "text": "set"}, {"end": 1250.44, "start": 1250.4, "text": "and"}, {"end": 1251.56, "start": 1250.44, "text": "put"}, {"end": 1251.76, "start": 1251.56, "text": "it"}, {"end": 1251.92, "start": 1251.76, "text": "in"}, {"end": 1252.52, "start": 1251.92, "text": "relation"}, {"end": 1252.72, "start": 1252.52, "text": "here"}, {"end": 1252.92, "start": 1252.72, "text": "to"}, {"end": 1253.24, "start": 
1252.92, "text": "the"}, {"end": 1253.88, "start": 1253.24, "text": "information"}, {"end": 1254.2, "start": 1253.88, "text": "entropy."}, {"end": 1255.8, "start": 1254.2, "text": "Yeah,"}, {"end": 1255.88, "start": 1255.8, "text": "if"}, {"end": 1256.12, "start": 1255.88, "text": "you"}, {"end": 1256.12, "start": 1256.12, "text": "want"}, {"end": 1257.12, "start": 1256.12, "text": "to"}, {"end": 1257.56, "start": 1257.12, "text": "have"}, {"end": 1257.72, "start": 1257.56, "text": "here"}, {"end": 1258.08, "start": 1257.72, "text": "the"}, {"end": 1258.6, "start": 1258.08, "text": "notation"}, {"end": 1258.72, "start": 1258.6, "text": "for"}, {"end": 1258.88, "start": 1258.72, "text": "all"}, {"end": 1259.0, "start": 1258.88, "text": "the"}, {"end": 1259.96, "start": 1259.0, "text": "variables,"}], "text": " set to be the total number of token combination patterns presented with the data set. And they give you here a definition of a naive complexity, as they call it, and for the training techniques. And of course, the context window length, the size here, affects the complexity. And they analyze here if you want the complexity of the data set and put it in relation here to the information entropy. 
Yeah, if you want to have here the notation for all the variables,"}, {"chunks": [{"end": 1260.24, "start": 1260.0, "text": "is"}, {"end": 1260.36, "start": 1260.24, "text": "the"}, {"end": 1261.04, "start": 1260.36, "text": "explanation"}, {"end": 1261.2, "start": 1261.04, "text": "you"}, {"end": 1261.44, "start": 1261.2, "text": "need."}, {"end": 1261.44, "start": 1261.44, "text": "But"}, {"end": 1261.48, "start": 1261.44, "text": "you"}, {"end": 1262.28, "start": 1261.48, "text": "know,"}, {"end": 1262.48, "start": 1262.28, "text": "let"}, {"end": 1262.76, "start": 1262.48, "text": "me"}, {"end": 1263.52, "start": 1262.76, "text": "translate"}, {"end": 1263.8, "start": 1263.52, "text": "this"}, {"end": 1264.16, "start": 1263.8, "text": "formal"}, {"end": 1264.32, "start": 1264.16, "text": "in"}, {"end": 1264.92, "start": 1264.32, "text": "simple"}, {"end": 1265.44, "start": 1264.92, "text": "words"}, {"end": 1265.64, "start": 1265.44, "text": "since"}, {"end": 1265.76, "start": 1265.64, "text": "we"}, {"end": 1265.92, "start": 1265.76, "text": "are"}, {"end": 1266.08, "start": 1265.92, "text": "talking"}, {"end": 1266.28, "start": 1266.08, "text": "about"}, {"end": 1267.0, "start": 1266.28, "text": "tiny"}, {"end": 1267.6, "start": 1267.0, "text": "LLMs."}, {"end": 1267.92, "start": 1267.6, "text": "So"}, {"end": 1268.24, "start": 1267.92, "text": "what"}, {"end": 1268.24, "start": 1268.24, "text": "do"}, {"end": 1268.24, "start": 1268.24, "text": "you"}, {"end": 1268.24, "start": 1268.24, "text": "have?"}, {"end": 1268.52, "start": 1268.24, "text": "They"}, {"end": 1268.52, "start": 1268.52, "text": "have"}, {"end": 1269.04, "start": 1268.52, "text": "the"}, {"end": 1269.28, "start": 1269.04, "text": "lower"}, {"end": 1269.88, "start": 1269.28, "text": "bound"}, {"end": 1269.88, "start": 1269.88, "text": "and"}, {"end": 1270.12, "start": 1269.88, "text": "the"}, {"end": 1270.64, "start": 1270.12, "text": "entropy."}, {"end": 1270.88, "start": 1270.64, "text": 
"So"}, {"end": 1271.08, "start": 1270.88, "text": "the"}, {"end": 1271.24, "start": 1271.08, "text": "lower"}, {"end": 1271.6, "start": 1271.24, "text": "bound"}, {"end": 1271.68, "start": 1271.6, "text": "of"}, {"end": 1272.04, "start": 1271.68, "text": "this"}, {"end": 1272.84, "start": 1272.04, "text": "mathematical"}, {"end": 1273.56, "start": 1272.84, "text": "formulation"}, {"end": 1273.96, "start": 1273.56, "text": "with"}, {"end": 1274.24, "start": 1273.96, "text": "the"}, {"end": 1274.84, "start": 1274.24, "text": "exponential"}, {"end": 1274.92, "start": 1274.84, "text": "of"}, {"end": 1275.4, "start": 1274.92, "text": "information"}, {"end": 1275.76, "start": 1275.4, "text": "entropy"}, {"end": 1276.36, "start": 1275.76, "text": "implies"}, {"end": 1276.52, "start": 1276.36, "text": "that"}, {"end": 1276.96, "start": 1276.52, "text": "reducing"}, {"end": 1277.08, "start": 1276.96, "text": "the"}, {"end": 1277.24, "start": 1277.08, "text": "entropy"}, {"end": 1277.32, "start": 1277.24, "text": "of"}, {"end": 1277.44, "start": 1277.32, "text": "a"}, {"end": 1278.0, "start": 1277.44, "text": "data"}, {"end": 1278.2, "start": 1278.0, "text": "set."}, {"end": 1278.64, "start": 1278.2, "text": "This"}, {"end": 1278.96, "start": 1278.64, "text": "simply"}, {"end": 1279.16, "start": 1278.96, "text": "means"}, {"end": 1279.24, "start": 1279.16, "text": "by"}, {"end": 1279.8, "start": 1279.24, "text": "making"}, {"end": 1279.92, "start": 1279.8, "text": "the"}, {"end": 1280.12, "start": 1279.92, "text": "word"}, {"end": 1281.0, "start": 1280.12, "text": "distribution"}, {"end": 1281.44, "start": 1281.0, "text": "more"}, {"end": 1282.28, "start": 1281.44, "text": "predictable."}, {"end": 1282.44, "start": 1282.28, "text": "You"}, {"end": 1282.6, "start": 1282.44, "text": "remember"}, {"end": 1283.0, "start": 1282.6, "text": "the"}, {"end": 1283.4, "start": 1283.0, "text": "next"}, {"end": 1283.76, "start": 1283.4, "text": "token"}, {"end": 1284.28, "start": 
1283.76, "text": "prediction,"}, {"end": 1285.44, "start": 1284.28, "text": "autoregressive,"}, {"end": 1286.12, "start": 1285.44, "text": "for"}, {"end": 1286.44, "start": 1286.12, "text": "making"}, {"end": 1286.56, "start": 1286.44, "text": "a"}, {"end": 1286.84, "start": 1286.56, "text": "data"}, {"end": 1287.16, "start": 1286.84, "text": "set"}, {"end": 1288.56, "start": 1287.16, "text": "simpler."}, {"end": 1288.8, "start": 1288.56, "text": "It"}, {"end": 1289.52, "start": 1288.8, "text": "reduces"}, {"end": 1289.72, "start": 1289.52, "text": "its"}, {"end": 1289.96, "start": 1289.72, "text": "overall"}], "text": " is the explanation you need. But you know, let me translate this formal in simple words since we are talking about tiny LLMs. So what do you have? They have the lower bound and the entropy. So the lower bound of this mathematical formulation with the exponential of information entropy implies that reducing the entropy of a data set. This simply means by making the word distribution more predictable. You remember the next token prediction, autoregressive, for making a data set simpler. 
It reduces its overall"}, {"chunks": [{"end": 1290.76, "start": 1290.0, "text": "complexity"}, {"end": 1290.88, "start": 1290.76, "text": "and"}, {"end": 1291.16, "start": 1290.88, "text": "thus"}, {"end": 1291.36, "start": 1291.16, "text": "is"}, {"end": 1291.72, "start": 1291.36, "text": "easier"}, {"end": 1291.88, "start": 1291.72, "text": "for"}, {"end": 1292.2, "start": 1291.88, "text": "language"}, {"end": 1292.52, "start": 1292.2, "text": "models"}, {"end": 1292.68, "start": 1292.52, "text": "to"}, {"end": 1292.88, "start": 1292.68, "text": "learn"}, {"end": 1293.28, "start": 1292.88, "text": "from,"}, {"end": 1293.72, "start": 1293.28, "text": "especially"}, {"end": 1293.8, "start": 1293.72, "text": "the"}, {"end": 1294.12, "start": 1293.8, "text": "tiny"}, {"end": 1294.48, "start": 1294.12, "text": "language"}, {"end": 1295.48, "start": 1294.48, "text": "model."}, {"end": 1295.6, "start": 1295.48, "text": "I"}, {"end": 1295.72, "start": 1295.6, "text": "would"}, {"end": 1296.04, "start": 1295.72, "text": "say"}, {"end": 1296.56, "start": 1296.04, "text": "this"}, {"end": 1297.36, "start": 1296.56, "text": "is"}, {"end": 1297.76, "start": 1297.36, "text": "nothing"}, {"end": 1298.08, "start": 1297.76, "text": "I"}, {"end": 1298.24, "start": 1298.08, "text": "would"}, {"end": 1298.64, "start": 1298.24, "text": "doubt."}, {"end": 1298.96, "start": 1298.64, "text": "I"}, {"end": 1299.12, "start": 1298.96, "text": "would"}, {"end": 1299.28, "start": 1299.12, "text": "say,"}, {"end": 1299.48, "start": 1299.28, "text": "yeah,"}, {"end": 1299.88, "start": 1299.48, "text": "this"}, {"end": 1300.12, "start": 1299.88, "text": "is"}, {"end": 1300.28, "start": 1300.12, "text": "a"}, {"end": 1301.12, "start": 1300.28, "text": "given."}, {"end": 1301.4, "start": 1301.12, "text": "And"}, {"end": 1301.76, "start": 1301.4, "text": "then"}, {"end": 1302.04, "start": 1301.76, "text": "they"}, {"end": 1302.36, "start": 1302.04, "text": "say"}, {"end": 1302.36, "start": 
1302.36, "text": "for"}, {"end": 1302.56, "start": 1302.36, "text": "the"}, {"end": 1302.76, "start": 1302.56, "text": "upper"}, {"end": 1303.12, "start": 1302.76, "text": "bound"}, {"end": 1303.24, "start": 1303.12, "text": "and"}, {"end": 1303.24, "start": 1303.24, "text": "the"}, {"end": 1303.64, "start": 1303.24, "text": "dataset"}, {"end": 1304.04, "start": 1303.64, "text": "size,"}, {"end": 1304.24, "start": 1304.04, "text": "the"}, {"end": 1304.32, "start": 1304.24, "text": "upper"}, {"end": 1304.56, "start": 1304.32, "text": "bound"}, {"end": 1304.92, "start": 1304.56, "text": "with"}, {"end": 1304.92, "start": 1304.92, "text": "the"}, {"end": 1305.48, "start": 1304.92, "text": "exponential"}, {"end": 1305.48, "start": 1305.48, "text": "of"}, {"end": 1305.48, "start": 1305.48, "text": "the"}, {"end": 1305.68, "start": 1305.48, "text": "token"}, {"end": 1305.92, "start": 1305.68, "text": "and"}, {"end": 1306.0, "start": 1305.92, "text": "the"}, {"end": 1306.24, "start": 1306.0, "text": "log"}, {"end": 1306.6, "start": 1306.24, "text": "tokens"}, {"end": 1306.96, "start": 1306.6, "text": "shows"}, {"end": 1307.0, "start": 1306.96, "text": "that"}, {"end": 1307.24, "start": 1307.0, "text": "as"}, {"end": 1307.24, "start": 1307.24, "text": "the"}, {"end": 1307.84, "start": 1307.24, "text": "dataset"}, {"end": 1308.76, "start": 1307.84, "text": "size,"}, {"end": 1308.96, "start": 1308.76, "text": "this"}, {"end": 1309.16, "start": 1308.96, "text": "means"}, {"end": 1309.32, "start": 1309.16, "text": "the"}, {"end": 1309.52, "start": 1309.32, "text": "number"}, {"end": 1309.64, "start": 1309.52, "text": "of"}, {"end": 1309.76, "start": 1309.64, "text": "the"}, {"end": 1310.32, "start": 1309.76, "text": "tokens"}, {"end": 1311.56, "start": 1310.32, "text": "increase,"}, {"end": 1311.96, "start": 1311.56, "text": "so"}, {"end": 1312.36, "start": 1311.96, "text": "does"}, {"end": 1312.56, "start": 1312.36, "text": "the"}, {"end": 1315.68, "start": 1312.56, "text": 
"complexity."}, {"end": 1316.08, "start": 1315.68, "text": "This"}, {"end": 1316.52, "start": 1316.08, "text": "is,"}, {"end": 1316.84, "start": 1316.52, "text": "yeah,"}, {"end": 1317.52, "start": 1316.84, "text": "it"}, {"end": 1317.84, "start": 1317.52, "text": "is"}, {"end": 1318.44, "start": 1317.84, "text": "self-explanatory,"}, {"end": 1318.48, "start": 1318.44, "text": "but"}, {"end": 1318.64, "start": 1318.48, "text": "I"}, {"end": 1318.88, "start": 1318.64, "text": "think"}, {"end": 1319.28, "start": 1318.88, "text": "there"}, {"end": 1319.28, "start": 1319.28, "text": "are"}, {"end": 1319.48, "start": 1319.28, "text": "a"}, {"end": 1319.68, "start": 1319.48, "text": "lot"}, {"end": 1319.96, "start": 1319.68, "text": "of"}], "text": " complexity and thus is easier for language models to learn from, especially the tiny language model. I would say this is nothing I would doubt. I would say, yeah, this is a given. And then they say for the upper bound and the dataset size, the upper bound with the exponential of the token and the log tokens shows that as the dataset size, this means the number of the tokens increase, so does the complexity. 
This is, yeah, it is self-explanatory, but I think there are a lot of"}, {"chunks": [{"end": 1320.52, "start": 1320.0, "text": "hidden"}, {"end": 1321.88, "start": 1320.52, "text": "complexities,"}, {"end": 1323.28, "start": 1321.88, "text": "especially"}, {"end": 1323.68, "start": 1323.28, "text": "in"}, {"end": 1324.0, "start": 1323.68, "text": "their"}, {"end": 1325.2, "start": 1324.0, "text": "formulation."}, {"end": 1325.52, "start": 1325.2, "text": "I"}, {"end": 1325.84, "start": 1325.52, "text": "think"}, {"end": 1325.96, "start": 1325.84, "text": "I"}, {"end": 1326.72, "start": 1325.96, "text": "understand"}, {"end": 1327.16, "start": 1326.72, "text": "why"}, {"end": 1327.48, "start": 1327.16, "text": "they"}, {"end": 1327.72, "start": 1327.48, "text": "do"}, {"end": 1328.36, "start": 1327.72, "text": "it"}, {"end": 1328.48, "start": 1328.36, "text": "here."}, {"end": 1328.48, "start": 1328.48, "text": "And"}, {"end": 1328.64, "start": 1328.48, "text": "their"}, {"end": 1328.88, "start": 1328.64, "text": "goal"}, {"end": 1329.12, "start": 1328.88, "text": "is"}, {"end": 1329.12, "start": 1329.12, "text": "to"}, {"end": 1329.44, "start": 1329.12, "text": "provide"}, {"end": 1329.6, "start": 1329.44, "text": "a"}, {"end": 1329.84, "start": 1329.6, "text": "formal"}, {"end": 1330.44, "start": 1329.84, "text": "mathematical"}, {"end": 1330.72, "start": 1330.44, "text": "measure"}, {"end": 1331.0, "start": 1330.72, "text": "for"}, {"end": 1331.56, "start": 1331.0, "text": "evaluating"}, {"end": 1331.72, "start": 1331.56, "text": "the"}, {"end": 1332.12, "start": 1331.72, "text": "dataset"}, {"end": 1333.04, "start": 1332.12, "text": "complexity"}, {"end": 1333.28, "start": 1333.04, "text": "for"}, {"end": 1333.44, "start": 1333.28, "text": "their"}, {"end": 1334.12, "start": 1333.44, "text": "scaling."}, {"end": 1334.72, "start": 1334.12, "text": "But"}, {"end": 1334.92, "start": 1334.72, "text": "I"}, {"end": 1336.36, "start": 1334.92, "text": "don't"}, {"end": 
1337.36, "start": 1336.36, "text": "know."}, {"end": 1337.88, "start": 1337.36, "text": "I"}, {"end": 1338.08, "start": 1337.88, "text": "have"}, {"end": 1338.32, "start": 1338.08, "text": "a"}, {"end": 1338.6, "start": 1338.32, "text": "feeling"}, {"end": 1338.84, "start": 1338.6, "text": "that"}, {"end": 1339.2, "start": 1338.84, "text": "their"}, {"end": 1339.6, "start": 1339.2, "text": "proof"}, {"end": 1339.76, "start": 1339.6, "text": "here,"}, {"end": 1340.44, "start": 1339.76, "text": "yes,"}, {"end": 1340.8, "start": 1340.44, "text": "it"}, {"end": 1341.44, "start": 1340.8, "text": "supports"}, {"end": 1341.68, "start": 1341.44, "text": "here"}, {"end": 1341.8, "start": 1341.68, "text": "in"}, {"end": 1342.2, "start": 1341.8, "text": "their"}, {"end": 1342.36, "start": 1342.2, "text": "view,"}, {"end": 1342.4, "start": 1342.36, "text": "the"}, {"end": 1342.64, "start": 1342.4, "text": "data"}, {"end": 1343.4, "start": 1342.64, "text": "pre-processing"}, {"end": 1343.72, "start": 1343.4, "text": "approach"}, {"end": 1343.96, "start": 1343.72, "text": "to"}, {"end": 1344.16, "start": 1343.96, "text": "reduce"}, {"end": 1344.2, "start": 1344.16, "text": "the"}, {"end": 1344.72, "start": 1344.2, "text": "complexity"}, {"end": 1344.84, "start": 1344.72, "text": "by"}, {"end": 1345.32, "start": 1344.84, "text": "simplifying"}, {"end": 1345.44, "start": 1345.32, "text": "the"}, {"end": 1346.16, "start": 1345.44, "text": "language,"}, {"end": 1346.48, "start": 1346.16, "text": "thereby"}, {"end": 1346.92, "start": 1346.48, "text": "reducing"}, {"end": 1347.32, "start": 1346.92, "text": "the"}, {"end": 1347.6, "start": 1347.32, "text": "entropy"}, {"end": 1347.6, "start": 1347.6, "text": "of"}, {"end": 1347.8, "start": 1347.6, "text": "the"}, {"end": 1348.12, "start": 1347.8, "text": "information"}, {"end": 1348.6, "start": 1348.12, "text": "to"}, {"end": 1348.96, "start": 1348.6, "text": "train"}, {"end": 1349.4, "start": 1348.96, "text": "smaller"}, {"end": 
1349.72, "start": 1349.4, "text": "models"}, {"end": 1349.96, "start": 1349.72, "text": "effectively."}], "text": " hidden complexities, especially in their formulation. I think I understand why they do it here. And their goal is to provide a formal mathematical measure for evaluating the dataset complexity for their scaling. But I don't know. I have a feeling that their proof here, yes, it supports here in their view, the data pre-processing approach to reduce the complexity by simplifying the language, thereby reducing the entropy of the information to train smaller models effectively."}, {"chunks": [{"end": 1351.28, "start": 1350.0, "text": "But"}, {"end": 1351.56, "start": 1351.28, "text": "this"}, {"end": 1351.76, "start": 1351.56, "text": "is"}, {"end": 1351.8, "start": 1351.76, "text": "nothing"}, {"end": 1352.4, "start": 1351.8, "text": "that"}, {"end": 1352.56, "start": 1352.4, "text": "I"}, {"end": 1352.8, "start": 1352.56, "text": "would"}, {"end": 1353.04, "start": 1352.8, "text": "doubt."}, {"end": 1353.28, "start": 1353.04, "text": "This"}, {"end": 1353.48, "start": 1353.28, "text": "is"}, {"end": 1353.88, "start": 1353.48, "text": "what"}, {"end": 1354.0, "start": 1353.88, "text": "I"}, {"end": 1354.48, "start": 1354.0, "text": "would"}, {"end": 1354.84, "start": 1354.48, "text": "expect,"}, {"end": 1355.0, "start": 1354.84, "text": "you"}, {"end": 1355.24, "start": 1355.0, "text": "know."}, {"end": 1355.68, "start": 1355.24, "text": "I"}, {"end": 1355.96, "start": 1355.68, "text": "would"}, {"end": 1356.4, "start": 1355.96, "text": "maybe"}, {"end": 1357.08, "start": 1356.4, "text": "even,"}, {"end": 1357.24, "start": 1357.08, "text": "I"}, {"end": 1357.36, "start": 1357.24, "text": "would"}, {"end": 1357.64, "start": 1357.36, "text": "be"}, {"end": 1357.92, "start": 1357.64, "text": "more"}, {"end": 1358.92, "start": 1357.92, "text": "interested"}, {"end": 1359.24, "start": 1358.92, "text": "here"}, {"end": 1359.36, "start": 1359.24, "text": "in"}, 
{"end": 1359.64, "start": 1359.36, "text": "the"}, {"end": 1359.68, "start": 1359.64, "text": "hidden"}, {"end": 1360.36, "start": 1359.68, "text": "complexity."}, {"end": 1360.6, "start": 1360.36, "text": "That"}, {"end": 1360.88, "start": 1360.6, "text": "is"}, {"end": 1361.24, "start": 1360.88, "text": "not"}, {"end": 1361.48, "start": 1361.24, "text": "that"}, {"end": 1361.76, "start": 1361.48, "text": "you"}, {"end": 1362.12, "start": 1361.76, "text": "say,"}, {"end": 1362.64, "start": 1362.12, "text": "yeah,"}, {"end": 1362.88, "start": 1362.64, "text": "if"}, {"end": 1362.96, "start": 1362.88, "text": "you"}, {"end": 1363.12, "start": 1362.96, "text": "make"}, {"end": 1363.16, "start": 1363.12, "text": "the"}, {"end": 1363.76, "start": 1363.16, "text": "data"}, {"end": 1364.16, "start": 1363.76, "text": "set"}, {"end": 1364.56, "start": 1364.16, "text": "simpler,"}, {"end": 1364.8, "start": 1364.56, "text": "you"}, {"end": 1365.08, "start": 1364.8, "text": "reduce"}, {"end": 1365.2, "start": 1365.08, "text": "the"}, {"end": 1366.08, "start": 1365.2, "text": "complexity,"}, {"end": 1366.44, "start": 1366.08, "text": "therefore"}, {"end": 1366.48, "start": 1366.44, "text": "it"}, {"end": 1366.68, "start": 1366.48, "text": "is"}, {"end": 1367.04, "start": 1366.68, "text": "easier"}, {"end": 1367.24, "start": 1367.04, "text": "to"}, {"end": 1368.6, "start": 1367.24, "text": "learn."}, {"end": 1370.6, "start": 1368.6, "text": "Yes."}, {"end": 1370.84, "start": 1370.6, "text": "But"}, {"end": 1371.24, "start": 1370.84, "text": "let's"}, {"end": 1371.28, "start": 1371.24, "text": "come"}, {"end": 1371.48, "start": 1371.28, "text": "back"}, {"end": 1371.8, "start": 1371.48, "text": "to"}, {"end": 1372.04, "start": 1371.8, "text": "the"}, {"end": 1372.68, "start": 1372.04, "text": "core"}, {"end": 1373.08, "start": 1372.68, "text": "element"}, {"end": 1373.08, "start": 1373.08, "text": "of"}, {"end": 1373.2, "start": 1373.08, "text": "the"}, {"end": 1373.64, "start": 
1373.2, "text": "complete"}, {"end": 1374.4, "start": 1373.64, "text": "paper,"}, {"end": 1374.56, "start": 1374.4, "text": "not"}, {"end": 1374.76, "start": 1374.56, "text": "just"}, {"end": 1374.92, "start": 1374.76, "text": "this"}, {"end": 1375.48, "start": 1374.92, "text": "proof"}, {"end": 1375.6, "start": 1375.48, "text": "for"}, {"end": 1375.96, "start": 1375.6, "text": "the"}, {"end": 1376.72, "start": 1375.96, "text": "complexity."}, {"end": 1376.92, "start": 1376.72, "text": "So"}, {"end": 1377.24, "start": 1376.92, "text": "the"}, {"end": 1377.48, "start": 1377.24, "text": "authors"}, {"end": 1377.72, "start": 1377.48, "text": "state"}, {"end": 1377.88, "start": 1377.72, "text": "now"}, {"end": 1378.12, "start": 1377.88, "text": "that"}, {"end": 1378.2, "start": 1378.12, "text": "the"}, {"end": 1378.84, "start": 1378.2, "text": "complexity"}, {"end": 1378.84, "start": 1378.84, "text": "of"}, {"end": 1379.0, "start": 1378.84, "text": "the"}, {"end": 1379.32, "start": 1379.0, "text": "language"}, {"end": 1379.76, "start": 1379.32, "text": "data"}, {"end": 1379.96, "start": 1379.76, "text": "set"}], "text": " But this is nothing that I would doubt. This is what I would expect, you know. I would maybe even, I would be more interested here in the hidden complexity. That is not that you say, yeah, if you make the data set simpler, you reduce the complexity, therefore it is easier to learn. Yes. But let's come back to the core element of the complete paper, not just this proof for the complexity. 
So the authors state now that the complexity of the language data set"}, {"chunks": [{"end": 1380.36, "start": 1380.0, "text": "determined"}, {"end": 1380.56, "start": 1380.36, "text": "by"}, {"end": 1380.56, "start": 1380.56, "text": "the"}, {"end": 1381.28, "start": 1380.56, "text": "information"}, {"end": 1381.48, "start": 1381.28, "text": "entropy"}, {"end": 1381.48, "start": 1381.48, "text": "of"}, {"end": 1381.52, "start": 1381.48, "text": "the"}, {"end": 1381.76, "start": 1381.52, "text": "text"}, {"end": 1382.72, "start": 1381.76, "text": "distribution."}, {"end": 1383.08, "start": 1382.72, "text": "And"}, {"end": 1383.2, "start": 1383.08, "text": "the"}, {"end": 1383.44, "start": 1383.2, "text": "goal"}, {"end": 1383.8, "start": 1383.44, "text": "is"}, {"end": 1384.04, "start": 1383.8, "text": "to"}, {"end": 1384.36, "start": 1384.04, "text": "simplify"}, {"end": 1384.64, "start": 1384.36, "text": "the"}, {"end": 1384.84, "start": 1384.64, "text": "language"}, {"end": 1385.2, "start": 1384.84, "text": "dataset"}, {"end": 1385.44, "start": 1385.2, "text": "to"}, {"end": 1385.76, "start": 1385.44, "text": "reduce"}, {"end": 1385.92, "start": 1385.76, "text": "the"}, {"end": 1386.4, "start": 1385.92, "text": "randomness"}, {"end": 1386.48, "start": 1386.4, "text": "of"}, {"end": 1386.68, "start": 1386.48, "text": "the"}, {"end": 1387.04, "start": 1386.68, "text": "word"}, {"end": 1388.12, "start": 1387.04, "text": "occurrences"}, {"end": 1388.56, "start": 1388.12, "text": "by"}, {"end": 1389.16, "start": 1388.56, "text": "reducing"}, {"end": 1389.2, "start": 1389.16, "text": "the"}, {"end": 1389.52, "start": 1389.2, "text": "entropy"}, {"end": 1389.64, "start": 1389.52, "text": "of"}, {"end": 1389.96, "start": 1389.64, "text": "the"}, {"end": 1390.24, "start": 1389.96, "text": "text"}, {"end": 1391.44, "start": 1390.24, "text": "distribution."}, {"end": 1391.72, "start": 1391.44, "text": "And"}, {"end": 1392.2, "start": 1391.72, "text": "this"}, {"end": 
1392.52, "start": 1392.2, "text": "is"}, {"end": 1392.8, "start": 1392.52, "text": "what"}, {"end": 1393.0, "start": 1392.8, "text": "is"}, {"end": 1393.52, "start": 1393.0, "text": "targeted"}, {"end": 1393.72, "start": 1393.52, "text": "by"}, {"end": 1394.08, "start": 1393.72, "text": "stating"}, {"end": 1394.08, "start": 1394.08, "text": "that"}, {"end": 1394.24, "start": 1394.08, "text": "the"}, {"end": 1395.04, "start": 1394.24, "text": "complexity"}, {"end": 1395.28, "start": 1395.04, "text": "is"}, {"end": 1395.72, "start": 1395.28, "text": "related"}, {"end": 1395.84, "start": 1395.72, "text": "to"}, {"end": 1396.04, "start": 1395.84, "text": "the"}, {"end": 1396.64, "start": 1396.04, "text": "information"}, {"end": 1397.16, "start": 1396.64, "text": "entropy."}, {"end": 1397.36, "start": 1397.16, "text": "But"}, {"end": 1398.16, "start": 1397.36, "text": "this"}, {"end": 1398.44, "start": 1398.16, "text": "is"}, {"end": 1399.48, "start": 1398.44, "text": "something"}, {"end": 1399.8, "start": 1399.48, "text": "that"}, {"end": 1399.88, "start": 1399.8, "text": "I"}, {"end": 1400.0, "start": 1399.88, "text": "would"}, {"end": 1400.32, "start": 1400.0, "text": "say"}, {"end": 1400.68, "start": 1400.32, "text": "yes,"}, {"end": 1400.68, "start": 1400.68, "text": "of"}, {"end": 1401.68, "start": 1400.68, "text": "course."}, {"end": 1401.84, "start": 1401.68, "text": "I"}, {"end": 1402.2, "start": 1401.84, "text": "don't"}, {"end": 1402.44, "start": 1402.2, "text": "doubt"}, {"end": 1402.72, "start": 1402.44, "text": "this."}, {"end": 1404.16, "start": 1402.72, "text": "For"}, {"end": 1404.72, "start": 1404.16, "text": "the"}, {"end": 1405.56, "start": 1404.72, "text": "application,"}, {"end": 1405.88, "start": 1405.56, "text": "by"}, {"end": 1406.92, "start": 1405.88, "text": "simplifying"}, {"end": 1407.08, "start": 1406.92, "text": "in"}, {"end": 1407.36, "start": 1407.08, "text": "general"}, {"end": 1407.88, "start": 1407.36, "text": "the"}, {"end": 1408.8, 
"start": 1407.88, "text": "language,"}, {"end": 1408.84, "start": 1408.8, "text": "and"}, {"end": 1409.4, "start": 1408.84, "text": "they"}, {"end": 1409.76, "start": 1409.4, "text": "limit"}, {"end": 1409.96, "start": 1409.76, "text": "the"}], "text": " determined by the information entropy of the text distribution. And the goal is to simplify the language dataset to reduce the randomness of the word occurrences by reducing the entropy of the text distribution. And this is what is targeted by stating that the complexity is related to the information entropy. But this is something that I would say yes, of course. I don't doubt this. For the application, by simplifying in general the language, and they limit the"}, {"chunks": [{"end": 1410.04, "start": 1410.0, "text": "The"}, {"end": 1410.32, "start": 1410.04, "text": "vocabulary"}, {"end": 1410.6, "start": 1410.32, "text": "size"}, {"end": 1411.16, "start": 1410.6, "text": "massively"}, {"end": 1411.28, "start": 1411.16, "text": "down"}, {"end": 1411.48, "start": 1411.28, "text": "to"}, {"end": 1412.16, "start": 1411.48, "text": "2K,"}, {"end": 1412.56, "start": 1412.16, "text": "and"}, {"end": 1412.84, "start": 1412.56, "text": "thereby"}, {"end": 1413.32, "start": 1412.84, "text": "influence"}, {"end": 1413.32, "start": 1413.32, "text": "of"}, {"end": 1413.72, "start": 1413.32, "text": "course"}, {"end": 1413.92, "start": 1413.72, "text": "the"}, {"end": 1414.16, "start": 1413.92, "text": "word"}, {"end": 1414.84, "start": 1414.16, "text": "distribution"}, {"end": 1415.52, "start": 1414.84, "text": "itself."}, {"end": 1415.68, "start": 1415.52, "text": "And"}, {"end": 1416.04, "start": 1415.68, "text": "they"}, {"end": 1416.44, "start": 1416.04, "text": "remove"}, {"end": 1416.96, "start": 1416.44, "text": "a"}, {"end": 1417.4, "start": 1416.96, "text": "lot"}, {"end": 1417.68, "start": 1417.4, "text": "of"}, {"end": 1418.36, "start": 1417.68, "text": "outliers,"}, {"end": 1418.64, "start": 1418.36, "text": "the"}, 
{"end": 1418.84, "start": 1418.64, "text": "low"}, {"end": 1419.12, "start": 1418.84, "text": "frequency"}, {"end": 1419.32, "start": 1419.12, "text": "words"}, {"end": 1420.52, "start": 1419.32, "text": "completely."}, {"end": 1420.72, "start": 1420.52, "text": "So"}, {"end": 1421.0, "start": 1420.72, "text": "now"}, {"end": 1421.24, "start": 1421.0, "text": "they"}, {"end": 1421.28, "start": 1421.24, "text": "are"}, {"end": 1421.68, "start": 1421.28, "text": "kind"}, {"end": 1421.84, "start": 1421.68, "text": "of"}, {"end": 1422.12, "start": 1421.84, "text": "controlling"}, {"end": 1422.32, "start": 1422.12, "text": "the"}, {"end": 1422.96, "start": 1422.32, "text": "distribution"}, {"end": 1423.0, "start": 1422.96, "text": "of"}, {"end": 1423.28, "start": 1423.0, "text": "the"}, {"end": 1423.72, "start": 1423.28, "text": "words"}, {"end": 1424.0, "start": 1423.72, "text": "in"}, {"end": 1424.2, "start": 1424.0, "text": "their"}, {"end": 1424.92, "start": 1424.2, "text": "dataset."}, {"end": 1425.24, "start": 1424.92, "text": "And"}, {"end": 1425.48, "start": 1425.24, "text": "the"}, {"end": 1425.76, "start": 1425.48, "text": "idea"}, {"end": 1426.04, "start": 1425.76, "text": "here"}, {"end": 1426.32, "start": 1426.04, "text": "is,"}, {"end": 1426.48, "start": 1426.32, "text": "in"}, {"end": 1427.16, "start": 1426.48, "text": "my"}, {"end": 1427.52, "start": 1427.16, "text": "words,"}, {"end": 1427.84, "start": 1427.52, "text": "that"}, {"end": 1428.16, "start": 1427.84, "text": "the"}, {"end": 1428.44, "start": 1428.16, "text": "models,"}, {"end": 1428.56, "start": 1428.44, "text": "the"}, {"end": 1428.96, "start": 1428.56, "text": "tiny"}, {"end": 1429.16, "start": 1428.96, "text": "models"}, {"end": 1429.36, "start": 1429.16, "text": "can"}, {"end": 1429.64, "start": 1429.36, "text": "learn"}, {"end": 1429.76, "start": 1429.64, "text": "the"}, {"end": 1430.32, "start": 1429.76, "text": "distribution"}, {"end": 1430.76, "start": 1430.32, "text": "properties"}, 
{"end": 1432.48, "start": 1430.76, "text": "quicker."}, {"end": 1432.88, "start": 1432.48, "text": "Also"}, {"end": 1433.64, "start": 1432.88, "text": "something"}, {"end": 1433.72, "start": 1433.64, "text": "I"}, {"end": 1433.84, "start": 1433.72, "text": "would"}, {"end": 1434.12, "start": 1433.84, "text": "say,"}, {"end": 1434.36, "start": 1434.12, "text": "okay,"}, {"end": 1436.36, "start": 1434.36, "text": "yes."}, {"end": 1436.48, "start": 1436.36, "text": "Now"}, {"end": 1436.72, "start": 1436.48, "text": "for"}, {"end": 1436.72, "start": 1436.72, "text": "the"}, {"end": 1437.32, "start": 1436.72, "text": "training."}, {"end": 1438.08, "start": 1437.32, "text": "Diago,"}, {"end": 1438.4, "start": 1438.08, "text": "since"}, {"end": 1438.4, "start": 1438.4, "text": "the"}, {"end": 1438.92, "start": 1438.4, "text": "tiny"}, {"end": 1439.2, "start": 1438.92, "text": "LLMs"}, {"end": 1439.24, "start": 1439.2, "text": "are"}, {"end": 1439.52, "start": 1439.24, "text": "less"}, {"end": 1439.96, "start": 1439.52, "text": "powerful,"}], "text": " The vocabulary size massively down to 2K, and thereby influence of course the word distribution itself. And they remove a lot of outliers, the low frequency words completely. So now they are kind of controlling the distribution of the words in their dataset. And the idea here is, in my words, that the models, the tiny models can learn the distribution properties quicker. Also something I would say, okay, yes. Now for the training. 
Diago, since the tiny LLMs are less powerful,"}, {"chunks": [{"end": 1441.0, "start": 1440.0, "text": "absolutely,"}, {"end": 1441.44, "start": 1441.0, "text": "they"}, {"end": 1441.76, "start": 1441.44, "text": "may"}, {"end": 1442.08, "start": 1441.76, "text": "need"}, {"end": 1442.72, "start": 1442.08, "text": "to"}, {"end": 1443.12, "start": 1442.72, "text": "see"}, {"end": 1443.52, "start": 1443.12, "text": "less"}, {"end": 1444.04, "start": 1443.52, "text": "complex"}, {"end": 1444.88, "start": 1444.04, "text": "distribution"}, {"end": 1445.32, "start": 1444.88, "text": "patterns"}, {"end": 1445.36, "start": 1445.32, "text": "in"}, {"end": 1445.6, "start": 1445.36, "text": "the"}, {"end": 1446.04, "start": 1445.6, "text": "training"}, {"end": 1446.28, "start": 1446.04, "text": "data"}, {"end": 1447.12, "start": 1446.28, "text": "itself"}, {"end": 1447.32, "start": 1447.12, "text": "in"}, {"end": 1447.64, "start": 1447.32, "text": "order"}, {"end": 1447.8, "start": 1447.64, "text": "to"}, {"end": 1448.28, "start": 1447.8, "text": "train"}, {"end": 1448.64, "start": 1448.28, "text": "effectively."}, {"end": 1449.84, "start": 1448.64, "text": "And"}, {"end": 1450.2, "start": 1449.84, "text": "again,"}, {"end": 1450.28, "start": 1450.2, "text": "I"}, {"end": 1450.4, "start": 1450.28, "text": "would"}, {"end": 1450.56, "start": 1450.4, "text": "say,"}, {"end": 1450.64, "start": 1450.56, "text": "yes,"}, {"end": 1451.0, "start": 1450.64, "text": "this"}, {"end": 1451.32, "start": 1451.0, "text": "is"}, {"end": 1451.56, "start": 1451.32, "text": "also"}, {"end": 1451.72, "start": 1451.56, "text": "what"}, {"end": 1452.28, "start": 1451.72, "text": "I"}, {"end": 1452.56, "start": 1452.28, "text": "would"}, {"end": 1452.88, "start": 1452.56, "text": "assume."}, {"end": 1453.12, "start": 1452.88, "text": "So"}, {"end": 1453.16, "start": 1453.12, "text": "the"}, {"end": 1453.48, "start": 1453.16, "text": "goal"}, {"end": 1453.96, "start": 1453.48, "text": "was"}, 
{"end": 1454.04, "start": 1453.96, "text": "here"}, {"end": 1454.04, "start": 1454.04, "text": "to"}, {"end": 1454.4, "start": 1454.04, "text": "provide"}, {"end": 1455.0, "start": 1454.4, "text": "datasets"}, {"end": 1455.24, "start": 1455.0, "text": "with"}, {"end": 1455.52, "start": 1455.24, "text": "less"}, {"end": 1455.96, "start": 1455.52, "text": "complex"}, {"end": 1456.6, "start": 1455.96, "text": "distribution."}, {"end": 1456.72, "start": 1456.6, "text": "I'll"}, {"end": 1456.96, "start": 1456.72, "text": "show"}, {"end": 1457.2, "start": 1456.96, "text": "you"}, {"end": 1457.32, "start": 1457.2, "text": "in"}, {"end": 1457.4, "start": 1457.32, "text": "a"}, {"end": 1457.68, "start": 1457.4, "text": "minute"}, {"end": 1457.96, "start": 1457.68, "text": "how"}, {"end": 1458.28, "start": 1457.96, "text": "we"}, {"end": 1458.48, "start": 1458.28, "text": "built"}, {"end": 1458.68, "start": 1458.48, "text": "this"}, {"end": 1459.2, "start": 1458.68, "text": "dataset."}, {"end": 1459.28, "start": 1459.2, "text": "And"}, {"end": 1459.68, "start": 1459.28, "text": "I"}, {"end": 1459.76, "start": 1459.68, "text": "think"}, {"end": 1460.16, "start": 1459.76, "text": "it"}, {"end": 1460.32, "start": 1460.16, "text": "makes"}, {"end": 1460.72, "start": 1460.32, "text": "sense."}, {"end": 1461.04, "start": 1460.72, "text": "Now,"}, {"end": 1461.32, "start": 1461.04, "text": "if"}, {"end": 1461.64, "start": 1461.32, "text": "you"}, {"end": 1461.88, "start": 1461.64, "text": "have"}, {"end": 1462.04, "start": 1461.88, "text": "an"}, {"end": 1462.84, "start": 1462.04, "text": "extreme"}, {"end": 1464.32, "start": 1462.84, "text": "limited"}, {"end": 1464.76, "start": 1464.32, "text": "AI"}, {"end": 1465.88, "start": 1464.76, "text": "architecture"}, {"end": 1466.24, "start": 1465.88, "text": "from"}, {"end": 1466.52, "start": 1466.24, "text": "the"}, {"end": 1467.28, "start": 1466.52, "text": "parameters,"}, {"end": 1467.28, "start": 1467.28, "text": "from"}, {"end": 
1467.4, "start": 1467.28, "text": "the"}, {"end": 1468.08, "start": 1467.4, "text": "layers,"}, {"end": 1468.32, "start": 1468.08, "text": "from"}, {"end": 1468.4, "start": 1468.32, "text": "the"}, {"end": 1468.44, "start": 1468.4, "text": "complexity,"}, {"end": 1468.6, "start": 1468.44, "text": "I"}, {"end": 1469.0, "start": 1468.6, "text": "think"}, {"end": 1469.64, "start": 1469.0, "text": "it"}, {"end": 1469.96, "start": 1469.64, "text": "is,"}], "text": " absolutely, they may need to see less complex distribution patterns in the training data itself in order to train effectively. And again, I would say, yes, this is also what I would assume. So the goal was here to provide datasets with less complex distribution. I'll show you in a minute how we built this dataset. And I think it makes sense. Now, if you have an extreme limited AI architecture from the parameters, from the layers, from the complexity, I think it is,"}, {"chunks": [{"end": 1471.12, "start": 1470.0, "text": "Self-explanatory"}, {"end": 1471.56, "start": 1471.12, "text": "that"}, {"end": 1471.88, "start": 1471.56, "text": "you"}, {"end": 1472.36, "start": 1471.88, "text": "cannot"}, {"end": 1472.76, "start": 1472.36, "text": "have"}, {"end": 1473.2, "start": 1472.76, "text": "highly"}, {"end": 1473.76, "start": 1473.2, "text": "complex"}, {"end": 1474.32, "start": 1473.76, "text": "tasks"}, {"end": 1474.92, "start": 1474.32, "text": "because"}, {"end": 1475.44, "start": 1474.92, "text": "those"}, {"end": 1476.04, "start": 1475.44, "text": "little"}, {"end": 1476.76, "start": 1476.04, "text": "systems,"}, {"end": 1477.04, "start": 1476.76, "text": "AI"}, {"end": 1477.52, "start": 1477.04, "text": "systems"}, {"end": 1477.68, "start": 1477.52, "text": "with"}, {"end": 1478.12, "start": 1477.68, "text": "hardly"}, {"end": 1478.4, "start": 1478.12, "text": "any"}, {"end": 1478.6, "start": 1478.4, "text": "free"}, {"end": 1479.72, "start": 1478.6, "text": "parameter,"}, {"end": 1479.88, "start": 
1479.72, "text": "how"}, {"end": 1480.32, "start": 1479.88, "text": "should"}, {"end": 1480.56, "start": 1480.32, "text": "they"}, {"end": 1481.72, "start": 1480.56, "text": "have"}, {"end": 1481.92, "start": 1481.72, "text": "been"}, {"end": 1483.4, "start": 1481.92, "text": "pre-trained"}, {"end": 1483.64, "start": 1483.4, "text": "and"}, {"end": 1485.04, "start": 1483.64, "text": "imprinted"}, {"end": 1485.36, "start": 1485.04, "text": "here"}, {"end": 1485.92, "start": 1485.36, "text": "their"}, {"end": 1486.36, "start": 1485.92, "text": "tensor"}, {"end": 1486.88, "start": 1486.36, "text": "structure"}, {"end": 1487.32, "start": 1486.88, "text": "on"}, {"end": 1487.76, "start": 1487.32, "text": "a"}, {"end": 1487.92, "start": 1487.76, "text": "lot"}, {"end": 1488.16, "start": 1487.92, "text": "of"}, {"end": 1488.52, "start": 1488.16, "text": "different"}, {"end": 1489.0, "start": 1488.52, "text": "complex"}, {"end": 1489.36, "start": 1489.0, "text": "reasoning"}, {"end": 1489.88, "start": 1489.36, "text": "schemas?"}, {"end": 1490.12, "start": 1489.88, "text": "It"}, {"end": 1490.28, "start": 1490.12, "text": "is"}, {"end": 1490.64, "start": 1490.28, "text": "simply"}, {"end": 1490.76, "start": 1490.64, "text": "not"}, {"end": 1491.56, "start": 1490.76, "text": "possible."}, {"end": 1491.88, "start": 1491.56, "text": "Or"}, {"end": 1492.4, "start": 1491.88, "text": "maybe,"}, {"end": 1492.84, "start": 1492.4, "text": "but"}, {"end": 1493.12, "start": 1492.84, "text": "then"}, {"end": 1493.4, "start": 1493.12, "text": "we"}, {"end": 1493.4, "start": 1493.4, "text": "have"}, {"end": 1493.52, "start": 1493.4, "text": "to"}, {"end": 1493.72, "start": 1493.52, "text": "go"}, {"end": 1494.0, "start": 1493.72, "text": "back"}, {"end": 1494.4, "start": 1494.0, "text": "and"}, {"end": 1495.0, "start": 1494.4, "text": "massively"}, {"end": 1495.32, "start": 1495.0, "text": "change"}, {"end": 1495.52, "start": 1495.32, "text": "here"}, {"end": 1495.76, "start": 1495.52, 
"text": "the"}, {"end": 1496.48, "start": 1495.76, "text": "backpropagation"}, {"end": 1496.72, "start": 1496.48, "text": "and"}, {"end": 1497.12, "start": 1496.72, "text": "find"}, {"end": 1497.36, "start": 1497.12, "text": "here"}, {"end": 1497.44, "start": 1497.36, "text": "complete"}, {"end": 1497.68, "start": 1497.44, "text": "new"}, {"end": 1498.44, "start": 1497.68, "text": "ways"}, {"end": 1498.72, "start": 1498.44, "text": "how"}, {"end": 1499.16, "start": 1498.72, "text": "to"}, {"end": 1499.32, "start": 1499.16, "text": "do"}, {"end": 1499.48, "start": 1499.32, "text": "here"}, {"end": 1499.68, "start": 1499.48, "text": "the"}, {"end": 1499.84, "start": 1499.68, "text": "feedback"}, {"end": 1499.96, "start": 1499.84, "text": "loop."}], "text": " Self-explanatory that you cannot have highly complex tasks because those little systems, AI systems with hardly any free parameter, how should they have been pre-trained and imprinted here their tensor structure on a lot of different complex reasoning schemas? It is simply not possible. 
Or maybe, but then we have to go back and massively change here the backpropagation and find here complete new ways how to do here the feedback loop."}, {"chunks": [{"end": 1500.24, "start": 1500.0, "text": "Okay."}, {"end": 1500.24, "start": 1500.24, "text": "So,"}, {"end": 1500.72, "start": 1500.24, "text": "coming"}, {"end": 1500.88, "start": 1500.72, "text": "now"}, {"end": 1501.76, "start": 1500.88, "text": "to"}, {"end": 1501.88, "start": 1501.76, "text": "the"}, {"end": 1502.12, "start": 1501.88, "text": "conclusion"}, {"end": 1502.16, "start": 1502.12, "text": "from"}, {"end": 1502.68, "start": 1502.16, "text": "the"}, {"end": 1503.16, "start": 1502.68, "text": "side"}, {"end": 1503.48, "start": 1503.16, "text": "of"}, {"end": 1503.72, "start": 1503.48, "text": "the"}, {"end": 1504.28, "start": 1503.72, "text": "authors,"}, {"end": 1504.8, "start": 1504.28, "text": "and"}, {"end": 1505.0, "start": 1504.8, "text": "this"}, {"end": 1505.32, "start": 1505.0, "text": "is"}, {"end": 1505.68, "start": 1505.32, "text": "here"}, {"end": 1505.96, "start": 1505.68, "text": "officially"}, {"end": 1506.16, "start": 1505.96, "text": "the"}, {"end": 1506.72, "start": 1506.16, "text": "published"}, {"end": 1507.36, "start": 1506.72, "text": "conclusion."}, {"end": 1507.76, "start": 1507.36, "text": "This"}, {"end": 1508.08, "start": 1507.76, "text": "is"}, {"end": 1508.32, "start": 1508.08, "text": "just"}, {"end": 1508.72, "start": 1508.32, "text": "here"}, {"end": 1509.32, "start": 1508.72, "text": "my"}, {"end": 1509.88, "start": 1509.32, "text": "conclusion,"}, {"end": 1510.12, "start": 1509.88, "text": "if"}, {"end": 1510.36, "start": 1510.12, "text": "you"}, {"end": 1510.84, "start": 1510.36, "text": "want,"}, {"end": 1511.2, "start": 1510.84, "text": "what"}, {"end": 1511.44, "start": 1511.2, "text": "I"}, {"end": 1511.6, "start": 1511.44, "text": "saw"}, {"end": 1511.6, "start": 1511.6, "text": "in"}, {"end": 1511.8, "start": 1511.6, "text": "this"}, {"end": 
1512.04, "start": 1511.8, "text": "paper."}, {"end": 1512.12, "start": 1512.04, "text": "So"}, {"end": 1512.2, "start": 1512.12, "text": "the"}, {"end": 1512.72, "start": 1512.2, "text": "official"}, {"end": 1513.12, "start": 1512.72, "text": "conclusion"}, {"end": 1513.28, "start": 1513.12, "text": "is"}, {"end": 1513.28, "start": 1513.28, "text": "that"}, {"end": 1513.52, "start": 1513.28, "text": "the"}, {"end": 1513.88, "start": 1513.52, "text": "authors"}, {"end": 1514.2, "start": 1513.88, "text": "suggested"}, {"end": 1514.56, "start": 1514.2, "text": "for"}, {"end": 1515.04, "start": 1514.56, "text": "LLM"}, {"end": 1515.24, "start": 1515.04, "text": "and"}, {"end": 1515.56, "start": 1515.24, "text": "their"}, {"end": 1515.8, "start": 1515.56, "text": "most"}, {"end": 1516.24, "start": 1515.8, "text": "widespread"}, {"end": 1517.04, "start": 1516.24, "text": "application,"}, {"end": 1517.2, "start": 1517.04, "text": "the"}, {"end": 1518.24, "start": 1517.2, "text": "agents,"}, {"end": 1518.56, "start": 1518.24, "text": "the"}, {"end": 1519.08, "start": 1518.56, "text": "strategy"}, {"end": 1519.24, "start": 1519.08, "text": "that"}, {"end": 1519.36, "start": 1519.24, "text": "were"}, {"end": 1519.8, "start": 1519.36, "text": "tested"}, {"end": 1520.52, "start": 1519.8, "text": "effectively"}, {"end": 1520.8, "start": 1520.52, "text": "for"}, {"end": 1520.8, "start": 1520.8, "text": "the"}, {"end": 1521.08, "start": 1520.8, "text": "small"}, {"end": 1521.36, "start": 1521.08, "text": "models,"}, {"end": 1521.4, "start": 1521.36, "text": "and"}, {"end": 1521.72, "start": 1521.4, "text": "they"}, {"end": 1521.76, "start": 1521.72, "text": "have"}, {"end": 1522.32, "start": 1521.76, "text": "a"}, {"end": 1522.84, "start": 1522.32, "text": "lot"}, {"end": 1523.4, "start": 1522.84, "text": "of"}, {"end": 1524.12, "start": 1523.4, "text": "statistical"}, {"end": 1524.6, "start": 1524.12, "text": "data"}, {"end": 1524.72, "start": 1524.6, "text": "in"}, {"end": 
1525.04, "start": 1524.72, "text": "their"}, {"end": 1525.8, "start": 1525.04, "text": "preprint."}, {"end": 1526.36, "start": 1525.8, "text": "So"}, {"end": 1526.72, "start": 1526.36, "text": "please"}, {"end": 1527.28, "start": 1526.72, "text": "go"}, {"end": 1527.84, "start": 1527.28, "text": "there,"}, {"end": 1528.2, "start": 1527.84, "text": "have"}, {"end": 1528.52, "start": 1528.2, "text": "a"}, {"end": 1528.68, "start": 1528.52, "text": "look"}, {"end": 1528.88, "start": 1528.68, "text": "at"}, {"end": 1529.04, "start": 1528.88, "text": "it,"}, {"end": 1529.24, "start": 1529.04, "text": "you'll"}, {"end": 1529.48, "start": 1529.24, "text": "find"}, {"end": 1529.64, "start": 1529.48, "text": "a"}, {"end": 1529.64, "start": 1529.64, "text": "lot"}, {"end": 1529.64, "start": 1529.64, "text": "of"}, {"end": 1529.96, "start": 1529.64, "text": "data,"}], "text": " Okay. So, coming now to the conclusion from the side of the authors, and this is here officially the published conclusion. This is just here my conclusion, if you want, what I saw in this paper. So the official conclusion is that the authors suggested for LLM and their most widespread application, the agents, the strategy that were tested effectively for the small models, and they have a lot of statistical data in their preprint. 
So please go there, have a look at it, you'll find a lot of data,"}, {"chunks": [{"end": 1530.36, "start": 1530.0, "text": "regarding"}, {"end": 1530.76, "start": 1530.36, "text": "small"}, {"end": 1531.2, "start": 1530.76, "text": "models,"}, {"end": 1531.64, "start": 1531.2, "text": "regarding"}, {"end": 1531.8, "start": 1531.64, "text": "data"}, {"end": 1532.28, "start": 1531.8, "text": "sets,"}, {"end": 1532.8, "start": 1532.28, "text": "regarding"}, {"end": 1533.44, "start": 1532.8, "text": "agents"}, {"end": 1533.72, "start": 1533.44, "text": "in"}, {"end": 1534.36, "start": 1533.72, "text": "those"}, {"end": 1534.6, "start": 1534.36, "text": "simpler"}, {"end": 1534.96, "start": 1534.6, "text": "language"}, {"end": 1536.56, "start": 1534.96, "text": "environments."}, {"end": 1536.76, "start": 1536.56, "text": "In"}, {"end": 1537.08, "start": 1536.76, "text": "the"}, {"end": 1537.28, "start": 1537.08, "text": "authors'"}, {"end": 1538.24, "start": 1537.28, "text": "argument,"}, {"end": 1538.56, "start": 1538.24, "text": "they"}, {"end": 1539.0, "start": 1538.56, "text": "can"}, {"end": 1539.56, "start": 1539.0, "text": "potentially"}, {"end": 1539.92, "start": 1539.56, "text": "be"}, {"end": 1540.44, "start": 1539.92, "text": "adapted"}, {"end": 1540.76, "start": 1540.44, "text": "now"}, {"end": 1541.28, "start": 1540.76, "text": "to"}, {"end": 1541.8, "start": 1541.28, "text": "larger"}, {"end": 1542.16, "start": 1541.8, "text": "models,"}, {"end": 1542.32, "start": 1542.16, "text": "to"}, {"end": 1542.64, "start": 1542.32, "text": "larger"}, {"end": 1543.24, "start": 1542.64, "text": "data"}, {"end": 1543.56, "start": 1543.24, "text": "sets,"}, {"end": 1543.88, "start": 1543.56, "text": "and"}, {"end": 1544.04, "start": 1543.88, "text": "the"}, {"end": 1544.72, "start": 1544.04, "text": "behavior"}, {"end": 1544.8, "start": 1544.72, "text": "of"}, {"end": 1545.16, "start": 1544.8, "text": "agents"}, {"end": 1545.56, "start": 1545.16, "text": "in"}, {"end": 
1545.64, "start": 1545.56, "text": "more"}, {"end": 1546.24, "start": 1545.64, "text": "complex"}, {"end": 1547.8, "start": 1546.24, "text": "environments."}, {"end": 1548.24, "start": 1547.8, "text": "Now,"}, {"end": 1548.32, "start": 1548.24, "text": "you"}, {"end": 1548.52, "start": 1548.32, "text": "know"}, {"end": 1548.76, "start": 1548.52, "text": "from"}, {"end": 1549.4, "start": 1548.76, "text": "mathematics,"}, {"end": 1549.68, "start": 1549.4, "text": "whenever"}, {"end": 1549.88, "start": 1549.68, "text": "we"}, {"end": 1550.24, "start": 1549.88, "text": "deal"}, {"end": 1550.4, "start": 1550.24, "text": "with"}, {"end": 1550.56, "start": 1550.4, "text": "an"}, {"end": 1550.92, "start": 1550.56, "text": "increased"}, {"end": 1552.88, "start": 1550.92, "text": "complexity,"}, {"end": 1553.32, "start": 1552.88, "text": "I"}, {"end": 1553.64, "start": 1553.32, "text": "would"}, {"end": 1553.8, "start": 1553.64, "text": "not"}, {"end": 1554.08, "start": 1553.8, "text": "expect"}, {"end": 1554.36, "start": 1554.08, "text": "here"}, {"end": 1554.88, "start": 1554.36, "text": "a"}, {"end": 1555.28, "start": 1554.88, "text": "linear"}, {"end": 1555.8, "start": 1555.28, "text": "mapping,"}, {"end": 1557.6, "start": 1555.8, "text": "because"}, {"end": 1558.0, "start": 1557.6, "text": "we"}, {"end": 1558.12, "start": 1558.0, "text": "have"}, {"end": 1558.6, "start": 1558.12, "text": "unknown"}, {"end": 1559.28, "start": 1558.6, "text": "effects"}, {"end": 1559.28, "start": 1559.28, "text": "that"}, {"end": 1559.96, "start": 1559.28, "text": "can"}], "text": " regarding small models, regarding data sets, regarding agents in those simpler language environments. In the authors' argument, they can potentially be adapted now to larger models, to larger data sets, and the behavior of agents in more complex environments. 
Now, you know from mathematics, whenever we deal with an increased complexity, I would not expect here a linear mapping, because we have unknown effects that can"}, {"chunks": [{"end": 1560.64, "start": 1560.0, "text": "take"}, {"end": 1561.28, "start": 1560.64, "text": "over"}, {"end": 1561.44, "start": 1561.28, "text": "and"}, {"end": 1561.88, "start": 1561.44, "text": "take"}, {"end": 1562.52, "start": 1561.88, "text": "control"}, {"end": 1562.64, "start": 1562.52, "text": "of"}, {"end": 1563.36, "start": 1562.64, "text": "this."}, {"end": 1564.12, "start": 1563.36, "text": "So"}, {"end": 1564.68, "start": 1564.12, "text": "this"}, {"end": 1565.28, "start": 1564.68, "text": "adaptation"}, {"end": 1565.32, "start": 1565.28, "text": "from"}, {"end": 1565.32, "start": 1565.32, "text": "the"}, {"end": 1565.68, "start": 1565.32, "text": "inside"}, {"end": 1566.12, "start": 1565.68, "text": "of"}, {"end": 1566.4, "start": 1566.12, "text": "a"}, {"end": 1567.08, "start": 1566.4, "text": "small"}, {"end": 1567.4, "start": 1567.08, "text": "model"}, {"end": 1568.76, "start": 1567.4, "text": "to"}, {"end": 1568.88, "start": 1568.76, "text": "a"}, {"end": 1569.28, "start": 1568.88, "text": "medium"}, {"end": 1569.4, "start": 1569.28, "text": "or"}, {"end": 1569.68, "start": 1569.4, "text": "a"}, {"end": 1570.04, "start": 1569.68, "text": "large"}, {"end": 1570.08, "start": 1570.04, "text": "model,"}, {"end": 1570.36, "start": 1570.08, "text": "I"}, {"end": 1570.64, "start": 1570.36, "text": "understand"}, {"end": 1570.64, "start": 1570.64, "text": "what"}, {"end": 1570.96, "start": 1570.64, "text": "they"}, {"end": 1571.24, "start": 1570.96, "text": "do."}, {"end": 1571.52, "start": 1571.24, "text": "And"}, {"end": 1571.84, "start": 1571.52, "text": "I"}, {"end": 1572.08, "start": 1571.84, "text": "think"}, {"end": 1572.56, "start": 1572.08, "text": "this"}, {"end": 1572.84, "start": 1572.56, "text": "work"}, {"end": 1573.56, "start": 1572.84, "text": "is"}, {"end": 
1574.36, "start": 1573.56, "text": "amazing"}, {"end": 1574.68, "start": 1574.36, "text": "that"}, {"end": 1574.96, "start": 1574.68, "text": "they"}, {"end": 1575.16, "start": 1574.96, "text": "go"}, {"end": 1575.4, "start": 1575.16, "text": "down"}, {"end": 1575.76, "start": 1575.4, "text": "to"}, {"end": 1576.0, "start": 1575.76, "text": "the"}, {"end": 1576.6, "start": 1576.0, "text": "absolute"}, {"end": 1577.88, "start": 1576.6, "text": "minimum,"}, {"end": 1578.4, "start": 1577.88, "text": "1"}, {"end": 1578.88, "start": 1578.4, "text": "million"}, {"end": 1579.04, "start": 1578.88, "text": "or"}, {"end": 1579.4, "start": 1579.04, "text": "14"}, {"end": 1579.76, "start": 1579.4, "text": "million"}, {"end": 1579.92, "start": 1579.76, "text": "free"}, {"end": 1580.32, "start": 1579.92, "text": "trainable"}, {"end": 1581.4, "start": 1580.32, "text": "parameter."}, {"end": 1582.12, "start": 1581.4, "text": "But"}, {"end": 1582.4, "start": 1582.12, "text": "if"}, {"end": 1582.56, "start": 1582.4, "text": "we"}, {"end": 1583.0, "start": 1582.56, "text": "can"}, {"end": 1583.44, "start": 1583.0, "text": "adapt"}, {"end": 1583.72, "start": 1583.44, "text": "the"}, {"end": 1584.72, "start": 1583.72, "text": "inside,"}, {"end": 1584.8, "start": 1584.72, "text": "I"}, {"end": 1585.28, "start": 1584.8, "text": "think,"}, {"end": 1585.68, "start": 1585.28, "text": "wow,"}, {"end": 1586.32, "start": 1585.68, "text": "there"}, {"end": 1586.52, "start": 1586.32, "text": "is"}, {"end": 1586.52, "start": 1586.52, "text": "a"}, {"end": 1586.76, "start": 1586.52, "text": "lot"}, {"end": 1586.88, "start": 1586.76, "text": "of"}, {"end": 1587.36, "start": 1586.88, "text": "research"}, {"end": 1587.56, "start": 1587.36, "text": "to"}, {"end": 1587.68, "start": 1587.56, "text": "be"}, {"end": 1589.32, "start": 1587.68, "text": "done."}, {"end": 1589.68, "start": 1589.32, "text": "Yeah,"}, {"end": 1589.96, "start": 1589.68, "text": "I"}], "text": " take over and take control of 
this. So this adaptation from the insights of a small model to a medium or a large model, I understand what they do. And I think this work is amazing in that they go down to the absolute minimum, 1 million or 14 million free trainable parameters. But if we can adapt the insights, I think, wow, there is a lot of research to be done. Yeah, I"}, {"chunks": [{"end": 1590.28, "start": 1590.0, "text": "to"}, {"end": 1590.52, "start": 1590.28, "text": "show"}, {"end": 1590.56, "start": 1590.52, "text": "you"}, {"end": 1590.88, "start": 1590.56, "text": "they"}, {"end": 1591.4, "start": 1590.88, "text": "they"}, {"end": 1591.76, "start": 1591.4, "text": "are"}, {"end": 1592.04, "start": 1591.76, "text": "really"}, {"end": 1592.6, "start": 1592.04, "text": "transparent"}, {"end": 1592.8, "start": 1592.6, "text": "they"}, {"end": 1593.0, "start": 1592.8, "text": "give"}, {"end": 1593.28, "start": 1593.0, "text": "you"}, {"end": 1593.68, "start": 1593.28, "text": "everything"}, {"end": 1594.08, "start": 1593.68, "text": "so"}, {"end": 1594.28, "start": 1594.08, "text": "in"}, {"end": 1594.72, "start": 1594.28, "text": "nxg"}, {"end": 1595.08, "start": 1594.72, "text": "they"}, {"end": 1595.44, "start": 1595.08, "text": "give"}, {"end": 1595.56, "start": 1595.44, "text": "you"}, {"end": 1595.56, "start": 1595.56, "text": "the"}, {"end": 1596.0, "start": 1595.56, "text": "prompt"}, {"end": 1596.2, "start": 1596.0, "text": "for"}, {"end": 1596.72, "start": 1596.2, "text": "creating"}, {"end": 1596.88, "start": 1596.72, "text": "now"}, {"end": 1597.24, "start": 1596.88, "text": "this"}, {"end": 1597.52, "start": 1597.24, "text": "leaner"}, {"end": 1598.36, "start": 1597.52, "text": "training"}, {"end": 1598.76, "start": 1598.36, "text": "leaner"}, {"end": 1599.2, "start": 1598.76, "text": "glue"}, {"end": 1599.48, "start": 1599.2, "text": "and"}, {"end": 1599.72, "start": 1599.48, "text": "leaner"}, {"end": 1600.08, "start": 1599.72, "text": "evaluation"}, {"end": 1600.8, "start": 
1600.08, "text": "datas"}, {"end": 1601.04, "start": 1600.8, "text": "so"}, {"end": 1601.28, "start": 1601.04, "text": "you"}, {"end": 1601.4, "start": 1601.28, "text": "have"}, {"end": 1601.44, "start": 1601.4, "text": "a"}, {"end": 1601.8, "start": 1601.44, "text": "background"}, {"end": 1602.24, "start": 1601.8, "text": "prompt"}, {"end": 1602.36, "start": 1602.24, "text": "you"}, {"end": 1602.52, "start": 1602.36, "text": "just"}, {"end": 1602.76, "start": 1602.52, "text": "tell"}, {"end": 1603.12, "start": 1602.76, "text": "the"}, {"end": 1603.44, "start": 1603.12, "text": "llm"}, {"end": 1603.6, "start": 1603.44, "text": "hey"}, {"end": 1603.84, "start": 1603.6, "text": "you're"}, {"end": 1603.88, "start": 1603.84, "text": "a"}, {"end": 1604.24, "start": 1603.88, "text": "professional"}, {"end": 1604.76, "start": 1604.24, "text": "linguist"}, {"end": 1604.92, "start": 1604.76, "text": "you"}, {"end": 1605.0, "start": 1604.92, "text": "got"}, {"end": 1605.0, "start": 1605.0, "text": "this"}, {"end": 1605.08, "start": 1605.0, "text": "so"}, {"end": 1605.4, "start": 1605.08, "text": "and"}, {"end": 1605.52, "start": 1605.4, "text": "then"}, {"end": 1605.88, "start": 1605.52, "text": "you"}, {"end": 1606.16, "start": 1605.88, "text": "have"}, {"end": 1606.36, "start": 1606.16, "text": "the"}, {"end": 1606.72, "start": 1606.36, "text": "general"}, {"end": 1607.2, "start": 1606.72, "text": "requirement"}, {"end": 1607.6, "start": 1607.2, "text": "prompt"}, {"end": 1607.8, "start": 1607.6, "text": "and"}, {"end": 1608.36, "start": 1607.8, "text": "then"}, {"end": 1608.6, "start": 1608.36, "text": "here"}, {"end": 1608.8, "start": 1608.6, "text": "you"}, {"end": 1609.08, "start": 1608.8, "text": "have"}, {"end": 1609.24, "start": 1609.08, "text": "now"}, {"end": 1609.68, "start": 1609.24, "text": "the"}, {"end": 1610.8, "start": 1609.68, "text": "simplification"}, {"end": 1611.16, "start": 1610.8, "text": "prompt"}, {"end": 1611.72, "start": 1611.16, "text": "and"}, 
{"end": 1612.04, "start": 1611.72, "text": "you"}, {"end": 1612.48, "start": 1612.04, "text": "tell"}, {"end": 1612.88, "start": 1612.48, "text": "your"}, {"end": 1613.12, "start": 1612.88, "text": "i"}, {"end": 1613.44, "start": 1613.12, "text": "don't"}, {"end": 1613.72, "start": 1613.44, "text": "know"}, {"end": 1613.76, "start": 1613.72, "text": "what"}, {"end": 1613.96, "start": 1613.76, "text": "you"}, {"end": 1614.0, "start": 1613.96, "text": "like"}, {"end": 1614.24, "start": 1614.0, "text": "your"}, {"end": 1614.92, "start": 1614.24, "text": "o1"}, {"end": 1615.36, "start": 1614.92, "text": "model"}, {"end": 1616.12, "start": 1615.36, "text": "or"}, {"end": 1616.32, "start": 1616.12, "text": "your"}, {"end": 1616.84, "start": 1616.32, "text": "q1"}, {"end": 1617.28, "start": 1616.84, "text": "model"}, {"end": 1617.48, "start": 1617.28, "text": "you"}, {"end": 1617.88, "start": 1617.48, "text": "see"}, {"end": 1618.16, "start": 1617.88, "text": "this"}, {"end": 1618.44, "start": 1618.16, "text": "is"}, {"end": 1618.56, "start": 1618.44, "text": "the"}, {"end": 1619.2, "start": 1618.56, "text": "instruction"}, {"end": 1619.44, "start": 1619.2, "text": "on"}, {"end": 1619.96, "start": 1619.44, "text": "how"}], "text": " to show you, they are really transparent, they give you everything. So in nxg they give you the prompt for creating now this Leaner training, Leaner GLUE, and Leaner evaluation datasets. So you have a background prompt, you just tell the LLM, hey, you're a professional linguist, you got this. And then you have the general requirement prompt, and then here you have now the simplification prompt, and you tell your, I don't know what you like, your o1 model or your q1 model, you see, this is the instruction on how"}, {"chunks": [{"end": 1620.16, "start": 1620.0, "text": "to"}, {"end": 1621.64, "start": 1620.16, "text": "simplify"}, {"end": 1621.76, "start": 1621.64, "text": "the"}, {"end": 1622.08, "start": 1621.76, "text": "prompt."}, {"end": 1622.28, 
"start": 1622.08, "text": "And"}, {"end": 1622.28, "start": 1622.28, "text": "of"}, {"end": 1622.6, "start": 1622.28, "text": "course,"}, {"end": 1622.76, "start": 1622.6, "text": "you"}, {"end": 1623.04, "start": 1622.76, "text": "need"}, {"end": 1623.4, "start": 1623.04, "text": "training"}, {"end": 1623.64, "start": 1623.4, "text": "data."}, {"end": 1624.68, "start": 1623.64, "text": "So"}, {"end": 1624.92, "start": 1624.68, "text": "for"}, {"end": 1625.12, "start": 1624.92, "text": "the"}, {"end": 1625.28, "start": 1625.12, "text": "wiki"}, {"end": 1626.24, "start": 1625.28, "text": "simplification,"}, {"end": 1626.48, "start": 1626.24, "text": "you"}, {"end": 1626.8, "start": 1626.48, "text": "have"}, {"end": 1626.96, "start": 1626.8, "text": "here"}, {"end": 1627.2, "start": 1626.96, "text": "the"}, {"end": 1627.32, "start": 1627.2, "text": "backward"}, {"end": 1627.6, "start": 1627.32, "text": "prompt,"}, {"end": 1627.72, "start": 1627.6, "text": "you"}, {"end": 1627.92, "start": 1627.72, "text": "have"}, {"end": 1628.08, "start": 1627.92, "text": "here"}, {"end": 1628.44, "start": 1628.08, "text": "the"}, {"end": 1628.68, "start": 1628.44, "text": "simplification"}, {"end": 1629.0, "start": 1628.68, "text": "prompt,"}, {"end": 1629.48, "start": 1629.0, "text": "and"}, {"end": 1629.76, "start": 1629.48, "text": "then"}, {"end": 1630.0, "start": 1629.76, "text": "you"}, {"end": 1630.0, "start": 1630.0, "text": "have"}, {"end": 1630.28, "start": 1630.0, "text": "here"}, {"end": 1630.44, "start": 1630.28, "text": "the"}, {"end": 1631.04, "start": 1630.44, "text": "examples."}, {"end": 1631.36, "start": 1631.04, "text": "Example"}, {"end": 1631.76, "start": 1631.36, "text": "1,"}, {"end": 1632.16, "start": 1631.76, "text": "example"}, {"end": 1632.48, "start": 1632.16, "text": "2,"}, {"end": 1632.56, "start": 1632.48, "text": "and"}, {"end": 1632.88, "start": 1632.56, "text": "example"}, {"end": 1633.36, "start": 1632.88, "text": "3,"}, {"end": 1633.64, "start": 
1633.36, "text": "and"}, {"end": 1633.92, "start": 1633.64, "text": "you"}, {"end": 1634.28, "start": 1633.92, "text": "define"}, {"end": 1634.56, "start": 1634.28, "text": "the"}, {"end": 1634.8, "start": 1634.56, "text": "output"}, {"end": 1635.0, "start": 1634.8, "text": "format."}, {"end": 1635.72, "start": 1635.0, "text": "So"}, {"end": 1636.36, "start": 1635.72, "text": "really"}, {"end": 1637.56, "start": 1636.36, "text": "straightforward."}, {"end": 1637.72, "start": 1637.56, "text": "Be"}, {"end": 1638.16, "start": 1637.72, "text": "careful"}, {"end": 1638.44, "start": 1638.16, "text": "that"}, {"end": 1638.6, "start": 1638.44, "text": "you"}, {"end": 1639.04, "start": 1638.6, "text": "choose"}, {"end": 1639.16, "start": 1639.04, "text": "here"}, {"end": 1639.44, "start": 1639.16, "text": "real"}, {"end": 1640.16, "start": 1639.44, "text": "powerful"}, {"end": 1640.84, "start": 1640.16, "text": "examples"}, {"end": 1641.28, "start": 1640.84, "text": "that"}, {"end": 1641.68, "start": 1641.28, "text": "you"}, {"end": 1642.0, "start": 1641.68, "text": "have"}, {"end": 1642.32, "start": 1642.0, "text": "here,"}, {"end": 1642.68, "start": 1642.32, "text": "really"}, {"end": 1643.0, "start": 1642.68, "text": "the"}, {"end": 1643.44, "start": 1643.0, "text": "most"}, {"end": 1644.08, "start": 1643.44, "text": "significant"}, {"end": 1644.84, "start": 1644.08, "text": "functionals"}, {"end": 1645.16, "start": 1644.84, "text": "that"}, {"end": 1645.48, "start": 1645.16, "text": "you"}, {"end": 1645.84, "start": 1645.48, "text": "can"}, {"end": 1646.2, "start": 1645.84, "text": "find"}, {"end": 1646.48, "start": 1646.2, "text": "here"}, {"end": 1646.48, "start": 1646.48, "text": "in"}, {"end": 1646.68, "start": 1646.48, "text": "your"}, {"end": 1647.16, "start": 1646.68, "text": "system."}, {"end": 1647.36, "start": 1647.16, "text": "And"}, {"end": 1647.8, "start": 1647.36, "text": "then"}, {"end": 1647.92, "start": 1647.8, "text": "you"}, {"end": 1648.32, "start": 
1647.92, "text": "provide"}, {"end": 1648.6, "start": 1648.32, "text": "here"}, {"end": 1648.8, "start": 1648.6, "text": "one,"}, {"end": 1648.92, "start": 1648.8, "text": "two,"}, {"end": 1649.24, "start": 1648.92, "text": "three"}, {"end": 1649.72, "start": 1649.24, "text": "examples,"}, {"end": 1649.96, "start": 1649.72, "text": "or"}], "text": " to simplify the prompt. And of course, you need training data. So for the wiki simplification, you have here the background prompt, you have here the simplification prompt, and then you have here the examples. Example 1, example 2, and example 3, and you define the output format. So really straightforward. Be careful that you choose here really powerful examples that you have here, really the most significant functionals that you can find here in your system. And then you provide here one, two, three examples, or"}, {"chunks": [{"end": 1650.4, "start": 1650.0, "text": "much"}, {"end": 1650.68, "start": 1650.4, "text": "more"}, {"end": 1651.0, "start": 1650.68, "text": "if"}, {"end": 1651.28, "start": 1651.0, "text": "you"}, {"end": 1651.84, "start": 1651.28, "text": "have"}, {"end": 1652.04, "start": 1651.84, "text": "found"}, {"end": 1652.4, "start": 1652.04, "text": "them."}, {"end": 1652.52, "start": 1652.4, "text": "So"}, {"end": 1652.68, "start": 1652.52, "text": "you"}, {"end": 1653.8, "start": 1652.68, "text": "see,"}, {"end": 1654.36, "start": 1653.8, "text": "short"}, {"end": 1654.76, "start": 1654.36, "text": "video,"}, {"end": 1655.0, "start": 1654.76, "text": "just"}, {"end": 1655.16, "start": 1655.0, "text": "wanted"}, {"end": 1655.32, "start": 1655.16, "text": "to"}, {"end": 1655.56, "start": 1655.32, "text": "tell"}, {"end": 1656.2, "start": 1655.56, "text": "you,"}, {"end": 1656.44, "start": 1656.2, "text": "hey,"}, {"end": 1656.8, "start": 1656.44, "text": "they"}, {"end": 1656.8, "start": 1656.8, "text": "are"}, {"end": 1657.36, "start": 1656.8, "text": "tiny"}, {"end": 1657.76, "start": 1657.36, "text": 
"language"}, {"end": 1658.28, "start": 1657.76, "text": "models."}, {"end": 1658.8, "start": 1658.28, "text": "There's"}, {"end": 1659.32, "start": 1658.8, "text": "a"}, {"end": 1659.88, "start": 1659.32, "text": "lot"}, {"end": 1660.12, "start": 1659.88, "text": "of"}, {"end": 1660.4, "start": 1660.12, "text": "AI"}, {"end": 1660.76, "start": 1660.4, "text": "research"}, {"end": 1661.0, "start": 1660.76, "text": "going"}, {"end": 1661.0, "start": 1661.0, "text": "on"}, {"end": 1661.0, "start": 1661.0, "text": "and"}, {"end": 1661.52, "start": 1661.0, "text": "they"}, {"end": 1661.68, "start": 1661.52, "text": "have"}, {"end": 1661.8, "start": 1661.68, "text": "the"}, {"end": 1662.08, "start": 1661.8, "text": "potential"}, {"end": 1662.2, "start": 1662.08, "text": "that"}, {"end": 1662.32, "start": 1662.2, "text": "we"}, {"end": 1662.6, "start": 1662.32, "text": "can"}, {"end": 1663.04, "start": 1662.6, "text": "really"}, {"end": 1664.28, "start": 1663.04, "text": "explore"}, {"end": 1664.4, "start": 1664.28, "text": "here"}, {"end": 1664.52, "start": 1664.4, "text": "the"}, {"end": 1665.04, "start": 1664.52, "text": "behavior"}, {"end": 1665.2, "start": 1665.04, "text": "of"}, {"end": 1665.76, "start": 1665.2, "text": "language"}, {"end": 1666.28, "start": 1665.76, "text": "model"}, {"end": 1666.88, "start": 1666.28, "text": "to"}, {"end": 1668.32, "start": 1666.88, "text": "understand"}, {"end": 1668.48, "start": 1668.32, "text": "new"}, {"end": 1669.16, "start": 1668.48, "text": "pre-training"}, {"end": 1670.24, "start": 1669.16, "text": "methodologies,"}, {"end": 1670.56, "start": 1670.24, "text": "new"}, {"end": 1671.52, "start": 1670.56, "text": "pre-training"}, {"end": 1672.04, "start": 1671.52, "text": "data"}, {"end": 1672.68, "start": 1672.04, "text": "designs"}, {"end": 1672.88, "start": 1672.68, "text": "that"}, {"end": 1673.2, "start": 1672.88, "text": "those"}, {"end": 1673.6, "start": 1673.2, "text": "models"}, {"end": 1673.84, "start": 1673.6, 
"text": "need"}, {"end": 1674.56, "start": 1673.84, "text": "to"}, {"end": 1675.08, "start": 1674.56, "text": "improve"}, {"end": 1675.48, "start": 1675.08, "text": "their"}, {"end": 1676.04, "start": 1675.48, "text": "overall"}, {"end": 1676.36, "start": 1676.04, "text": "AI"}, {"end": 1677.0, "start": 1676.36, "text": "performance."}, {"end": 1677.2, "start": 1677.0, "text": "So"}, {"end": 1677.28, "start": 1677.2, "text": "I"}, {"end": 1677.56, "start": 1677.28, "text": "think"}, {"end": 1677.88, "start": 1677.56, "text": "this"}, {"end": 1678.24, "start": 1677.88, "text": "is"}, {"end": 1678.32, "start": 1678.24, "text": "it"}, {"end": 1678.64, "start": 1678.32, "text": "for"}, {"end": 1678.92, "start": 1678.64, "text": "this"}, {"end": 1679.48, "start": 1678.92, "text": "video."}, {"end": 1679.56, "start": 1679.48, "text": "I"}, {"end": 1679.6, "start": 1679.56, "text": "hope"}, {"end": 1679.96, "start": 1679.6, "text": "you"}], "text": " much more if you have found them. So you see, short video, just wanted to tell you, hey, they are tiny language models. There's a lot of AI research going on and they have the potential that we can really explore here the behavior of language model to understand new pre-training methodologies, new pre-training data designs that those models need to improve their overall AI performance. So I think this is it for this video. 
I hope you"}, {"chunks": [{"end": 1680.6, "start": 1680.0, "text": "enjoyed"}, {"end": 1680.68, "start": 1680.6, "text": "it,"}, {"end": 1680.8, "start": 1680.68, "text": "I"}, {"end": 1680.88, "start": 1680.8, "text": "hope"}, {"end": 1681.08, "start": 1680.88, "text": "I"}, {"end": 1681.08, "start": 1681.08, "text": "could"}, {"end": 1681.52, "start": 1681.08, "text": "provide"}, {"end": 1681.76, "start": 1681.52, "text": "some"}, {"end": 1681.88, "start": 1681.76, "text": "new"}, {"end": 1682.2, "start": 1681.88, "text": "insights,"}, {"end": 1682.48, "start": 1682.2, "text": "some"}, {"end": 1682.76, "start": 1682.48, "text": "new"}, {"end": 1683.32, "start": 1682.76, "text": "ideas,"}, {"end": 1683.6, "start": 1683.32, "text": "and"}, {"end": 1683.84, "start": 1683.6, "text": "it"}, {"end": 1684.08, "start": 1683.84, "text": "would"}, {"end": 1684.24, "start": 1684.08, "text": "be"}, {"end": 1684.68, "start": 1684.24, "text": "great"}, {"end": 1684.92, "start": 1684.68, "text": "if"}, {"end": 1685.0, "start": 1684.92, "text": "you"}, {"end": 1685.56, "start": 1685.0, "text": "subscribe"}, {"end": 1685.88, "start": 1685.56, "text": "to"}, {"end": 1686.12, "start": 1685.88, "text": "see"}, {"end": 1686.44, "start": 1686.12, "text": "you"}, {"end": 1686.56, "start": 1686.44, "text": "in"}, {"end": 1686.8, "start": 1686.56, "text": "my"}, {"end": 1687.04, "start": 1686.8, "text": "next"}, {"end": 1687.64, "start": 1687.04, "text": "video."}], "text": " enjoyed it, I hope I could provide some new insights, some new ideas, and it would be great if you subscribe to see you in my next video."}]}}