{"message": {"transcript": [{"chunks": [{"end": 6.44, "start": 0.0, "text": "So"}, {"end": 7.08, "start": 6.44, "text": "our"}, {"end": 7.4, "start": 7.08, "text": "next"}, {"end": 7.84, "start": 7.4, "text": "speaker"}, {"end": 8.36, "start": 7.84, "text": "is"}, {"end": 8.56, "start": 8.36, "text": "Min"}, {"end": 8.6, "start": 8.56, "text": "Min"}, {"end": 9.16, "start": 8.6, "text": "Chen"}, {"end": 11.2, "start": 9.16, "text": "from"}, {"end": 13.04, "start": 11.2, "text": "Google."}, {"end": 14.6, "start": 13.04, "text": "Thank"}, {"end": 15.28, "start": 14.6, "text": "you."}, {"end": 20.76, "start": 15.28, "text": "So"}, {"end": 21.2, "start": 20.76, "text": "I'm"}, {"end": 21.72, "start": 21.2, "text": "going"}, {"end": 21.96, "start": 21.72, "text": "to"}, {"end": 22.28, "start": 21.96, "text": "try"}, {"end": 22.88, "start": 22.28, "text": "to"}, {"end": 23.56, "start": 22.88, "text": "share"}, {"end": 24.56, "start": 23.56, "text": "our"}, {"end": 25.32, "start": 24.56, "text": "efforts"}, {"end": 25.64, "start": 25.32, "text": "in"}, {"end": 26.36, "start": 25.64, "text": "improving"}, {"end": 27.08, "start": 26.36, "text": "YouTube"}, {"end": 27.84, "start": 27.08, "text": "video"}, {"end": 28.72, "start": 27.84, "text": "recommendations"}, {"end": 29.44, "start": 28.72, "text": "from"}, {"end": 30.0, "start": 29.44, "text": "reinforcement"}], "text": " So our next speaker is Min Min Chen from Google. Thank you. 
So I'm going to try to share our efforts in improving YouTube video recommendations from reinforcement"}, {"chunks": [{"end": 30.92, "start": 30.0, "text": "learning"}, {"end": 32.92, "start": 30.92, "text": "perspective."}, {"end": 33.56, "start": 32.92, "text": "So"}, {"end": 34.0, "start": 33.56, "text": "this"}, {"end": 34.2, "start": 34.0, "text": "is"}, {"end": 34.68, "start": 34.2, "text": "actually"}, {"end": 34.88, "start": 34.68, "text": "a"}, {"end": 35.24, "start": 34.88, "text": "joint"}, {"end": 35.480000000000004, "start": 35.24, "text": "work"}, {"end": 35.56, "start": 35.480000000000004, "text": "with"}, {"end": 35.8, "start": 35.56, "text": "a"}, {"end": 36.16, "start": 35.8, "text": "lot"}, {"end": 36.32, "start": 36.16, "text": "of"}, {"end": 37.16, "start": 36.32, "text": "collaborators"}, {"end": 37.36, "start": 37.16, "text": "at"}, {"end": 37.88, "start": 37.36, "text": "Google."}, {"end": 37.96, "start": 37.88, "text": "Ed,"}, {"end": 38.6, "start": 37.96, "text": "who's"}, {"end": 39.24, "start": 38.6, "text": "actually"}, {"end": 39.92, "start": 39.24, "text": "helped"}, {"end": 40.32, "start": 39.92, "text": "us"}, {"end": 40.72, "start": 40.32, "text": "start"}, {"end": 42.0, "start": 40.72, "text": "this"}, {"end": 42.88, "start": 42.0, "text": "exploration,"}, {"end": 43.08, "start": 42.88, "text": "is"}, {"end": 43.8, "start": 43.08, "text": "also"}, {"end": 44.04, "start": 43.8, "text": "in"}, {"end": 44.08, "start": 44.04, "text": "the"}, {"end": 44.519999999999996, "start": 44.08, "text": "audience,"}, {"end": 45.24, "start": 44.519999999999996, "text": "so"}, {"end": 45.28, "start": 45.24, "text": "be"}, {"end": 45.56, "start": 45.28, "text": "sure"}, {"end": 45.76, "start": 45.56, "text": "to"}, {"end": 46.0, "start": 45.76, "text": "talk"}, {"end": 46.120000000000005, "start": 46.0, "text": "to"}, {"end": 46.4, "start": 46.120000000000005, "text": "him"}, {"end": 46.56, "start": 46.4, "text": "and"}, {"end": 46.68, "start": 
46.56, "text": "get"}, {"end": 46.96, "start": 46.68, "text": "his"}, {"end": 47.72, "start": 46.96, "text": "perspective"}, {"end": 47.96, "start": 47.72, "text": "as"}, {"end": 49.72, "start": 47.96, "text": "well."}, {"end": 51.4, "start": 49.72, "text": "So"}, {"end": 51.84, "start": 51.4, "text": "I'm"}, {"end": 52.120000000000005, "start": 51.84, "text": "going"}, {"end": 52.44, "start": 52.120000000000005, "text": "to"}, {"end": 52.879999999999995, "start": 52.44, "text": "divide"}, {"end": 52.879999999999995, "start": 52.879999999999995, "text": "this"}, {"end": 53.36, "start": 52.879999999999995, "text": "talk"}, {"end": 53.56, "start": 53.36, "text": "into"}, {"end": 53.760000000000005, "start": 53.56, "text": "two"}, {"end": 54.239999999999995, "start": 53.760000000000005, "text": "parts."}, {"end": 54.28, "start": 54.239999999999995, "text": "In"}, {"end": 54.56, "start": 54.28, "text": "the"}, {"end": 54.879999999999995, "start": 54.56, "text": "first"}, {"end": 55.760000000000005, "start": 54.879999999999995, "text": "part,"}, {"end": 56.239999999999995, "start": 55.760000000000005, "text": "I"}, {"end": 56.480000000000004, "start": 56.239999999999995, "text": "will"}, {"end": 56.8, "start": 56.480000000000004, "text": "try"}, {"end": 56.84, "start": 56.8, "text": "to"}, {"end": 57.480000000000004, "start": 56.84, "text": "motivate"}, {"end": 57.92, "start": 57.480000000000004, "text": "the"}, {"end": 58.120000000000005, "start": 57.92, "text": "work,"}, {"end": 58.44, "start": 58.120000000000005, "text": "talk"}, {"end": 58.68, "start": 58.44, "text": "about"}, {"end": 59.08, "start": 58.68, "text": "why"}, {"end": 59.120000000000005, "start": 59.08, "text": "do"}, {"end": 59.36, "start": 59.120000000000005, "text": "we"}, {"end": 59.64, "start": 59.36, "text": "want"}, {"end": 59.72, "start": 59.64, "text": "to"}, {"end": 59.96, "start": 59.72, "text": "use"}], "text": " learning perspective. 
So this is actually a joint work with a lot of collaborators at Google. Ed, who's actually helped us start this exploration, is also in the audience, so be sure to talk to him and get his perspective as well. So I'm going to divide this talk into two parts. In the first part, I will try to motivate the work, talk about why do we want to use"}, {"chunks": [{"end": 60.8, "start": 60.0, "text": "reinforcement"}, {"end": 61.28, "start": 60.8, "text": "learning"}, {"end": 62.04, "start": 61.28, "text": "for"}, {"end": 62.44, "start": 62.04, "text": "building"}, {"end": 63.24, "start": 62.44, "text": "recommender"}, {"end": 63.6, "start": 63.24, "text": "systems"}, {"end": 63.96, "start": 63.6, "text": "and"}, {"end": 64.36, "start": 63.96, "text": "the"}, {"end": 64.92, "start": 64.36, "text": "challenge"}, {"end": 65.2, "start": 64.92, "text": "of"}, {"end": 66.2, "start": 65.2, "text": "using"}, {"end": 66.56, "start": 66.2, "text": "them"}, {"end": 66.84, "start": 66.56, "text": "for"}, {"end": 67.44, "start": 66.84, "text": "recommender"}, {"end": 67.76, "start": 67.44, "text": "systems."}, {"end": 68.52, "start": 67.76, "text": "And"}, {"end": 68.72, "start": 68.52, "text": "in"}, {"end": 69.36, "start": 68.72, "text": "the"}, {"end": 69.8, "start": 69.36, "text": "second"}, {"end": 69.88, "start": 69.8, "text": "part,"}, {"end": 70.32, "start": 69.88, "text": "I"}, {"end": 70.36, "start": 70.32, "text": "will"}, {"end": 70.64, "start": 70.36, "text": "try"}, {"end": 70.84, "start": 70.64, "text": "to"}, {"end": 71.6, "start": 70.84, "text": "share"}, {"end": 72.32, "start": 71.6, "text": "our"}, {"end": 72.64, "start": 72.32, "text": "initial"}, {"end": 73.2, "start": 72.64, "text": "success"}, {"end": 73.68, "start": 73.2, "text": "stories"}, {"end": 73.92, "start": 73.68, "text": "of"}, {"end": 74.72, "start": 73.92, "text": "using"}, {"end": 75.4, "start": 74.72, "text": "reinforcement"}, {"end": 76.0, "start": 75.4, "text": "learning"}, {"end": 76.48, "start": 
76.0, "text": "for"}, {"end": 76.72, "start": 76.48, "text": "a"}, {"end": 77.0, "start": 76.72, "text": "YouTube"}, {"end": 77.16, "start": 77.0, "text": "video"}, {"end": 78.48, "start": 77.16, "text": "recommendation."}, {"end": 79.0, "start": 78.48, "text": "And"}, {"end": 79.52, "start": 79.0, "text": "actually"}, {"end": 79.96000000000001, "start": 79.52, "text": "using"}, {"end": 80.12, "start": 79.96000000000001, "text": "these"}, {"end": 80.84, "start": 80.12, "text": "techniques"}, {"end": 81.12, "start": 80.84, "text": "leads"}, {"end": 81.92, "start": 81.12, "text": "to"}, {"end": 82.36, "start": 81.92, "text": "the"}, {"end": 83.12, "start": 82.36, "text": "largest"}, {"end": 83.48, "start": 83.12, "text": "single"}, {"end": 83.96000000000001, "start": 83.48, "text": "launch"}, {"end": 84.48, "start": 83.96000000000001, "text": "improvement"}, {"end": 84.64, "start": 84.48, "text": "we've"}, {"end": 85.36, "start": 84.64, "text": "seen"}, {"end": 85.36, "start": 85.36, "text": "in"}, {"end": 85.72, "start": 85.36, "text": "YouTube"}, {"end": 86.6, "start": 85.72, "text": "for"}, {"end": 86.68, "start": 86.6, "text": "the"}, {"end": 87.03999999999999, "start": 86.68, "text": "last"}, {"end": 87.48, "start": 87.03999999999999, "text": "two"}, {"end": 88.24, "start": 87.48, "text": "years."}, {"end": 88.72, "start": 88.24, "text": "So"}, {"end": 89.32, "start": 88.72, "text": "hopefully"}, {"end": 89.48, "start": 89.32, "text": "I"}, {"end": 89.84, "start": 89.48, "text": "can"}, {"end": 90.0, "start": 89.84, "text": "motivate"}], "text": " reinforcement learning for building recommender systems and the challenge of using them for recommender systems. And in the second part, I will try to share our initial success stories of using reinforcement learning for a YouTube video recommendation. And actually using these techniques leads to the largest single launch improvement we've seen in YouTube for the last two years. 
So hopefully I can motivate"}, {"chunks": [{"end": 90.16, "start": 90.0, "text": "to"}, {"end": 90.48, "start": 90.16, "text": "try"}, {"end": 90.76, "start": 90.48, "text": "out"}, {"end": 91.2, "start": 90.76, "text": "some"}, {"end": 91.52, "start": 91.2, "text": "of"}, {"end": 91.68, "start": 91.52, "text": "these"}, {"end": 94.36, "start": 91.68, "text": "techniques."}, {"end": 95.68, "start": 94.36, "text": "So"}, {"end": 96.4, "start": 95.68, "text": "recommender"}, {"end": 97.0, "start": 96.4, "text": "systems"}, {"end": 97.24, "start": 97.0, "text": "are"}, {"end": 97.72, "start": 97.24, "text": "heavily"}, {"end": 98.2, "start": 97.72, "text": "relied"}, {"end": 98.92, "start": 98.2, "text": "in"}, {"end": 99.24, "start": 98.92, "text": "the"}, {"end": 99.88, "start": 99.24, "text": "industry"}, {"end": 100.08, "start": 99.88, "text": "to"}, {"end": 100.64, "start": 100.08, "text": "help"}, {"end": 101.2, "start": 100.64, "text": "users"}, {"end": 101.48, "start": 101.2, "text": "sort"}, {"end": 101.6, "start": 101.48, "text": "through"}, {"end": 102.03999999999999, "start": 101.6, "text": "a"}, {"end": 102.6, "start": 102.03999999999999, "text": "large"}, {"end": 103.32, "start": 102.6, "text": "corpus"}, {"end": 103.68, "start": 103.32, "text": "of"}, {"end": 104.28, "start": 103.68, "text": "content"}, {"end": 104.64, "start": 104.28, "text": "and"}, {"end": 105.08, "start": 104.64, "text": "find"}, {"end": 105.48, "start": 105.08, "text": "the"}, {"end": 105.88, "start": 105.48, "text": "very"}, {"end": 106.08, "start": 105.88, "text": "small"}, {"end": 106.68, "start": 106.08, "text": "fractions"}, {"end": 106.84, "start": 106.68, "text": "of"}, {"end": 107.12, "start": 106.84, "text": "content"}, {"end": 107.12, "start": 107.12, "text": "they"}, {"end": 107.44, "start": 107.12, "text": "will"}, {"end": 107.68, "start": 107.44, "text": "be"}, {"end": 108.32, "start": 107.68, "text": "interested"}, {"end": 108.92, "start": 108.32, "text": "in."}, 
{"end": 109.48, "start": 108.92, "text": "And"}, {"end": 109.96000000000001, "start": 109.48, "text": "no"}, {"end": 110.72, "start": 109.96000000000001, "text": "surprise,"}, {"end": 111.64, "start": 110.72, "text": "recommender"}, {"end": 112.0, "start": 111.64, "text": "systems"}, {"end": 112.48, "start": 112.0, "text": "are"}, {"end": 113.36, "start": 112.48, "text": "extensively"}, {"end": 113.68, "start": 113.36, "text": "used"}, {"end": 114.2, "start": 113.68, "text": "within"}, {"end": 114.68, "start": 114.2, "text": "Google"}, {"end": 114.92, "start": 114.68, "text": "to"}, {"end": 115.28, "start": 114.92, "text": "power"}, {"end": 115.68, "start": 115.28, "text": "different"}, {"end": 115.92, "start": 115.68, "text": "Google"}, {"end": 117.12, "start": 115.92, "text": "products."}, {"end": 117.24, "start": 117.12, "text": "So"}, {"end": 117.36, "start": 117.24, "text": "they"}, {"end": 117.6, "start": 117.36, "text": "are"}, {"end": 117.88, "start": 117.6, "text": "used"}, {"end": 118.4, "start": 117.88, "text": "in"}, {"end": 118.8, "start": 118.4, "text": "YouTube"}, {"end": 119.0, "start": 118.8, "text": "to"}, {"end": 119.52, "start": 119.0, "text": "recommend"}, {"end": 119.96000000000001, "start": 119.52, "text": "videos."}], "text": " to try out some of these techniques. So recommender systems are heavily relied in the industry to help users sort through a large corpus of content and find the very small fractions of content they will be interested in. And no surprise, recommender systems are extensively used within Google to power different Google products. 
So they are used in YouTube to recommend videos."}, {"chunks": [{"end": 120.64, "start": 120.0, "text": "In"}, {"end": 121.64, "start": 120.64, "text": "Google"}, {"end": 121.88, "start": 121.64, "text": "Play"}, {"end": 122.4, "start": 121.88, "text": "Store"}, {"end": 122.8, "start": 122.4, "text": "to"}, {"end": 123.52, "start": 122.8, "text": "recommend"}, {"end": 124.16, "start": 123.52, "text": "apps,"}, {"end": 124.92, "start": 124.16, "text": "games,"}, {"end": 125.28, "start": 124.92, "text": "books,"}, {"end": 126.12, "start": 125.28, "text": "and"}, {"end": 126.76, "start": 126.12, "text": "music."}, {"end": 127.56, "start": 126.76, "text": "And"}, {"end": 128.12, "start": 127.56, "text": "in"}, {"end": 128.48, "start": 128.12, "text": "Google"}, {"end": 129.04, "start": 128.48, "text": "News"}, {"end": 129.04, "start": 129.04, "text": "to"}, {"end": 129.4, "start": 129.04, "text": "try"}, {"end": 129.72, "start": 129.4, "text": "to"}, {"end": 130.84, "start": 129.72, "text": "personalize"}, {"end": 131.08, "start": 130.84, "text": "new"}, {"end": 131.76, "start": 131.08, "text": "stories"}, {"end": 131.92, "start": 131.76, "text": "for"}, {"end": 133.16, "start": 131.92, "text": "users."}, {"end": 133.72, "start": 133.16, "text": "And"}, {"end": 133.92, "start": 133.72, "text": "in"}, {"end": 134.28, "start": 133.92, "text": "Google"}, {"end": 134.76, "start": 134.28, "text": "Maps"}, {"end": 135.12, "start": 134.76, "text": "to"}, {"end": 135.4, "start": 135.12, "text": "find"}, {"end": 135.88, "start": 135.4, "text": "restaurants"}, {"end": 136.24, "start": 135.88, "text": "for"}, {"end": 136.36, "start": 136.24, "text": "you"}, {"end": 136.6, "start": 136.36, "text": "to"}, {"end": 136.88, "start": 136.6, "text": "eat"}, {"end": 137.48, "start": 136.88, "text": "and"}, {"end": 138.04, "start": 137.48, "text": "hotels"}, {"end": 138.07999999999998, "start": 138.04, "text": "to"}, {"end": 138.52, "start": 138.07999999999998, "text": "stay."}, {"end": 
138.8, "start": 138.52, "text": "And"}, {"end": 139.07999999999998, "start": 138.8, "text": "in"}, {"end": 139.52, "start": 139.07999999999998, "text": "fact,"}, {"end": 139.92000000000002, "start": 139.52, "text": "there"}, {"end": 140.07999999999998, "start": 139.92000000000002, "text": "are"}, {"end": 140.6, "start": 140.07999999999998, "text": "many,"}, {"end": 140.88, "start": 140.6, "text": "many"}, {"end": 141.44, "start": 140.88, "text": "other"}, {"end": 141.68, "start": 141.44, "text": "use"}, {"end": 142.16, "start": 141.68, "text": "cases"}, {"end": 142.48, "start": 142.16, "text": "within"}, {"end": 143.4, "start": 142.48, "text": "Google."}, {"end": 145.28, "start": 143.4, "text": "So"}, {"end": 145.48, "start": 145.28, "text": "in"}, {"end": 146.56, "start": 145.48, "text": "the"}, {"end": 147.04, "start": 146.56, "text": "last"}, {"end": 147.88, "start": 147.04, "text": "decade,"}, {"end": 148.24, "start": 147.88, "text": "we've"}, {"end": 148.88, "start": 148.24, "text": "seen"}, {"end": 149.48, "start": 148.88, "text": "the"}, {"end": 150.0, "start": 149.48, "text": "techniques"}], "text": " In Google Play Store to recommend apps, games, books, and music. And in Google News to try to personalize new stories for users. And in Google Maps to find restaurants for you to eat and hotels to stay. And in fact, there are many, many other use cases within Google. 
So in the last decade, we've seen the techniques"}, {"chunks": [{"end": 150.44, "start": 150.0, "text": "that's"}, {"end": 151.04, "start": 150.44, "text": "powering"}, {"end": 151.36, "start": 151.04, "text": "these"}, {"end": 152.04, "start": 151.36, "text": "recommender"}, {"end": 153.4, "start": 152.04, "text": "systems"}, {"end": 154.24, "start": 153.4, "text": "evolved."}, {"end": 154.4, "start": 154.24, "text": "So"}, {"end": 154.56, "start": 154.4, "text": "the"}, {"end": 155.04, "start": 154.56, "text": "first"}, {"end": 155.6, "start": 155.04, "text": "generation"}, {"end": 156.04, "start": 155.6, "text": "of"}, {"end": 156.8, "start": 156.04, "text": "recommender"}, {"end": 157.52, "start": 156.8, "text": "systems"}, {"end": 158.16, "start": 157.52, "text": "heavily"}, {"end": 158.68, "start": 158.16, "text": "relied"}, {"end": 159.0, "start": 158.68, "text": "on"}, {"end": 159.24, "start": 159.0, "text": "this"}, {"end": 160.12, "start": 159.24, "text": "user"}, {"end": 161.04, "start": 160.12, "text": "item"}, {"end": 161.6, "start": 161.04, "text": "interaction"}, {"end": 162.6, "start": 161.6, "text": "pairs."}, {"end": 162.96, "start": 162.6, "text": "For"}, {"end": 163.64, "start": 162.96, "text": "example,"}, {"end": 163.96, "start": 163.64, "text": "in"}, {"end": 164.32, "start": 163.96, "text": "matrix"}, {"end": 165.24, "start": 164.32, "text": "factorization,"}, {"end": 166.07999999999998, "start": 165.24, "text": "we"}, {"end": 166.24, "start": 166.07999999999998, "text": "learn"}, {"end": 166.24, "start": 166.24, "text": "kind"}, {"end": 166.28, "start": 166.24, "text": "of"}, {"end": 167.07999999999998, "start": 166.28, "text": "latent"}, {"end": 167.68, "start": 167.07999999999998, "text": "user"}, {"end": 167.92000000000002, "start": 167.68, "text": "and"}, {"end": 168.16, "start": 167.92000000000002, "text": "item"}, {"end": 168.6, "start": 168.16, "text": "embedding"}, {"end": 168.8, "start": 168.6, "text": "so"}, {"end": 169.28, 
"start": 168.8, "text": "that"}, {"end": 169.56, "start": 169.28, "text": "they"}, {"end": 169.96, "start": 169.56, "text": "will"}, {"end": 170.6, "start": 169.96, "text": "regress"}, {"end": 171.07999999999998, "start": 170.6, "text": "towards"}, {"end": 171.36, "start": 171.07999999999998, "text": "the"}, {"end": 172.24, "start": 171.36, "text": "few"}, {"end": 173.32, "start": 172.24, "text": "interaction"}, {"end": 173.88, "start": 173.32, "text": "pairs"}, {"end": 174.0, "start": 173.88, "text": "we"}, {"end": 174.32, "start": 174.0, "text": "have"}, {"end": 174.88, "start": 174.32, "text": "observed."}, {"end": 175.07999999999998, "start": 174.88, "text": "And"}, {"end": 175.32, "start": 175.07999999999998, "text": "in"}, {"end": 175.76, "start": 175.32, "text": "the"}, {"end": 176.04, "start": 175.76, "text": "hope"}, {"end": 176.56, "start": 176.04, "text": "that"}, {"end": 176.72, "start": 176.56, "text": "by"}, {"end": 177.07999999999998, "start": 176.72, "text": "learning"}, {"end": 177.24, "start": 177.07999999999998, "text": "these"}, {"end": 178.04, "start": 177.24, "text": "recommendations,"}, {"end": 178.2, "start": 178.04, "text": "we"}, {"end": 178.48, "start": 178.2, "text": "can"}, {"end": 178.72, "start": 178.48, "text": "then"}, {"end": 179.64, "start": 178.72, "text": "generalize"}, {"end": 179.96, "start": 179.64, "text": "to"}], "text": " that's powering these recommender systems evolved. So the first generation of recommender systems heavily relied on this user item interaction pairs. For example, in matrix factorization, we learn kind of latent user and item embedding so that they will regress towards the few interaction pairs we have observed. 
And in the hope that by learning these recommendations, we can then generalize to"}, {"chunks": [{"end": 181.52, "start": 180.0, "text": "missing"}, {"end": 182.2, "start": 181.52, "text": "user-item"}, {"end": 183.0, "start": 182.2, "text": "interaction"}, {"end": 184.36, "start": 183.0, "text": "pairs."}, {"end": 184.6, "start": 184.36, "text": "With"}, {"end": 185.08, "start": 184.6, "text": "the"}, {"end": 186.64, "start": 185.08, "text": "kind"}, {"end": 187.2, "start": 186.64, "text": "of"}, {"end": 187.24, "start": 187.2, "text": "deep"}, {"end": 188.28, "start": 187.24, "text": "neural"}, {"end": 188.88, "start": 188.28, "text": "network"}, {"end": 189.2, "start": 188.88, "text": "gaining"}, {"end": 190.56, "start": 189.2, "text": "popularity"}, {"end": 190.84, "start": 190.56, "text": "in"}, {"end": 191.2, "start": 190.84, "text": "different"}, {"end": 191.52, "start": 191.2, "text": "fields,"}, {"end": 191.56, "start": 191.52, "text": "they"}, {"end": 192.08, "start": 191.56, "text": "also"}, {"end": 192.4, "start": 192.08, "text": "started"}, {"end": 192.76, "start": 192.4, "text": "to"}, {"end": 193.28, "start": 192.76, "text": "influence"}, {"end": 194.0, "start": 193.28, "text": "the"}, {"end": 195.24, "start": 194.0, "text": "recommender"}, {"end": 196.07999999999998, "start": 195.24, "text": "system"}, {"end": 196.84, "start": 196.07999999999998, "text": "community."}, {"end": 197.04, "start": 196.84, "text": "So"}, {"end": 197.76, "start": 197.04, "text": "researchers"}, {"end": 198.07999999999998, "start": 197.76, "text": "try"}, {"end": 198.32, "start": 198.07999999999998, "text": "to"}, {"end": 199.04, "start": 198.32, "text": "use"}, {"end": 199.4, "start": 199.04, "text": "these"}, {"end": 200.36, "start": 199.4, "text": "techniques"}, {"end": 200.44, "start": 200.36, "text": "to"}, {"end": 201.36, "start": 200.44, "text": "build"}, {"end": 201.8, "start": 201.36, "text": "richer"}, {"end": 202.32, "start": 201.8, "text": "models"}, {"end": 
202.4, "start": 202.32, "text": "of"}, {"end": 202.48, "start": 202.4, "text": "the"}, {"end": 202.88, "start": 202.48, "text": "users"}, {"end": 203.68, "start": 202.88, "text": "as"}, {"end": 204.4, "start": 203.68, "text": "well"}, {"end": 204.72, "start": 204.4, "text": "as"}, {"end": 205.16, "start": 204.72, "text": "richer"}, {"end": 205.6, "start": 205.16, "text": "models"}, {"end": 205.76, "start": 205.6, "text": "of"}, {"end": 205.8, "start": 205.76, "text": "the"}, {"end": 206.76, "start": 205.8, "text": "items,"}, {"end": 207.12, "start": 206.76, "text": "for"}, {"end": 207.48, "start": 207.12, "text": "example,"}, {"end": 207.8, "start": 207.48, "text": "to"}, {"end": 208.12, "start": 207.8, "text": "address"}, {"end": 208.48, "start": 208.12, "text": "cold"}, {"end": 208.84, "start": 208.48, "text": "start"}, {"end": 210.0, "start": 208.84, "text": "issues."}], "text": " missing user-item interaction pairs. With the kind of deep neural network gaining popularity in different fields, they also started to influence the recommender system community. 
So researchers try to use these techniques to build richer models of the users as well as richer models of the items, for example, to address cold start issues."}, {"chunks": [{"end": 210.24, "start": 210.0, "text": "And"}, {"end": 210.44, "start": 210.24, "text": "these"}, {"end": 210.96, "start": 210.44, "text": "techniques"}, {"end": 211.24, "start": 210.96, "text": "are"}, {"end": 211.76, "start": 211.24, "text": "also"}, {"end": 212.24, "start": 211.76, "text": "relied"}, {"end": 212.64, "start": 212.24, "text": "on"}, {"end": 213.4, "start": 212.64, "text": "to"}, {"end": 213.52, "start": 213.4, "text": "kind"}, {"end": 213.96, "start": 213.52, "text": "of"}, {"end": 215.12, "start": 213.96, "text": "incorporate"}, {"end": 215.44, "start": 215.12, "text": "side"}, {"end": 216.28, "start": 215.44, "text": "information"}, {"end": 216.68, "start": 216.28, "text": "beyond"}, {"end": 217.0, "start": 216.68, "text": "just"}, {"end": 217.36, "start": 217.0, "text": "the"}, {"end": 217.92, "start": 217.36, "text": "user-item"}, {"end": 218.52, "start": 217.92, "text": "interaction"}, {"end": 219.08, "start": 218.52, "text": "pairs"}, {"end": 219.52, "start": 219.08, "text": "to"}, {"end": 220.2, "start": 219.52, "text": "improve"}, {"end": 220.96, "start": 220.2, "text": "recommendation"}, {"end": 222.32, "start": 220.96, "text": "systems."}, {"end": 222.72, "start": 222.32, "text": "So"}, {"end": 222.8, "start": 222.72, "text": "we've"}, {"end": 223.44, "start": 222.8, "text": "seen"}, {"end": 223.56, "start": 223.44, "text": "these"}, {"end": 224.36, "start": 223.56, "text": "techniques"}, {"end": 224.56, "start": 224.36, "text": "bring"}, {"end": 225.24, "start": 224.56, "text": "almost"}, {"end": 225.56, "start": 225.24, "text": "like"}, {"end": 226.88, "start": 225.56, "text": "revolutionized"}, {"end": 228.0, "start": 226.88, "text": "advances"}, {"end": 228.28, "start": 228.0, "text": "to"}, {"end": 228.64, "start": 228.28, "text": "industrial"}, {"end": 
229.36, "start": 228.64, "text": "recommender"}, {"end": 230.16, "start": 229.36, "text": "systems"}, {"end": 230.76, "start": 230.16, "text": "in"}, {"end": 231.16, "start": 230.76, "text": "early"}, {"end": 232.16, "start": 231.16, "text": "years,"}, {"end": 232.52, "start": 232.16, "text": "but"}, {"end": 232.96, "start": 232.52, "text": "these"}, {"end": 233.36, "start": 232.96, "text": "improvements"}, {"end": 233.52, "start": 233.36, "text": "are"}, {"end": 234.04, "start": 233.52, "text": "kind"}, {"end": 234.12, "start": 234.04, "text": "of"}, {"end": 234.96, "start": 234.12, "text": "saturated"}, {"end": 235.88, "start": 234.96, "text": "out"}, {"end": 236.44, "start": 235.88, "text": "lately."}, {"end": 236.92000000000002, "start": 236.44, "text": "So"}, {"end": 236.96, "start": 236.92000000000002, "text": "the"}, {"end": 238.36, "start": 236.96, "text": "question"}, {"end": 238.6, "start": 238.36, "text": "we"}, {"end": 239.12, "start": 238.6, "text": "want"}, {"end": 239.24, "start": 239.12, "text": "to"}, {"end": 239.76, "start": 239.24, "text": "ask"}, {"end": 239.76, "start": 239.76, "text": "to"}, {"end": 239.96, "start": 239.76, "text": "address"}], "text": " And these techniques are also relied on to kind of incorporate side information beyond just the user-item interaction pairs to improve recommendation systems. So we've seen these techniques bring almost like revolutionized advances to industrial recommender systems in early years, but these improvements are kind of saturated out lately. 
So the question we want to ask to address"}, {"chunks": [{"end": 240.32, "start": 240.0, "text": "address"}, {"end": 240.88, "start": 240.32, "text": "is"}, {"end": 241.44, "start": 240.88, "text": "how"}, {"end": 241.44, "start": 241.44, "text": "do"}, {"end": 241.64, "start": 241.44, "text": "we"}, {"end": 241.96, "start": 241.64, "text": "break"}, {"end": 242.56, "start": 241.96, "text": "out"}, {"end": 242.88, "start": 242.56, "text": "of"}, {"end": 243.04, "start": 242.88, "text": "the"}, {"end": 244.44, "start": 243.04, "text": "plateau?"}, {"end": 245.24, "start": 244.44, "text": "So"}, {"end": 245.52, "start": 245.24, "text": "the"}, {"end": 246.24, "start": 245.52, "text": "first"}, {"end": 246.52, "start": 246.24, "text": "and"}, {"end": 246.92, "start": 246.52, "text": "second"}, {"end": 247.52, "start": 246.92, "text": "generation"}, {"end": 247.72, "start": 247.52, "text": "of"}, {"end": 249.12, "start": 247.72, "text": "recommender"}, {"end": 250.16, "start": 249.12, "text": "systems"}, {"end": 250.48, "start": 250.16, "text": "are"}, {"end": 250.8, "start": 250.48, "text": "still"}, {"end": 251.04, "start": 250.8, "text": "full"}, {"end": 251.4, "start": 251.04, "text": "within"}, {"end": 251.56, "start": 251.4, "text": "the"}, {"end": 252.44, "start": 251.56, "text": "supervised"}, {"end": 252.96, "start": 252.44, "text": "learning"}, {"end": 253.72, "start": 252.96, "text": "paradigm"}, {"end": 253.92, "start": 253.72, "text": "in"}, {"end": 254.8, "start": 253.92, "text": "our"}, {"end": 255.84, "start": 254.8, "text": "perspective."}, {"end": 256.0, "start": 255.84, "text": "And"}, {"end": 256.32, "start": 256.0, "text": "there"}, {"end": 256.56, "start": 256.32, "text": "are"}, {"end": 257.28, "start": 256.56, "text": "several"}, {"end": 258.2, "start": 257.28, "text": "limitations"}, {"end": 258.92, "start": 258.2, "text": "with"}, {"end": 259.56, "start": 258.92, "text": "solving"}, {"end": 260.04, "start": 259.56, "text": "recommender"}, 
{"end": 260.44, "start": 260.04, "text": "systems"}, {"end": 260.84, "start": 260.44, "text": "as"}, {"end": 261.12, "start": 260.84, "text": "a"}, {"end": 262.8, "start": 261.12, "text": "supervised,"}, {"end": 263.04, "start": 262.8, "text": "using"}, {"end": 263.56, "start": 263.04, "text": "supervised"}, {"end": 264.0, "start": 263.56, "text": "learning"}, {"end": 266.2, "start": 264.0, "text": "approaches."}, {"end": 266.6, "start": 266.2, "text": "So"}, {"end": 267.36, "start": 266.6, "text": "first,"}, {"end": 268.8, "start": 267.36, "text": "these"}, {"end": 269.48, "start": 268.8, "text": "systems,"}, {"end": 269.96, "start": 269.48, "text": "I"}], "text": " address is how do we break out of the plateau? So the first and second generation of recommender systems are still full within the supervised learning paradigm in our perspective. And there are several limitations with solving recommender systems as a supervised, using supervised learning approaches. So first, these systems, I"}, {"chunks": [{"end": 270.28, "start": 270.0, "text": "And"}, {"end": 270.56, "start": 270.28, "text": "these"}, {"end": 271.2, "start": 270.56, "text": "algorithms"}, {"end": 271.64, "start": 271.2, "text": "basically"}, {"end": 272.68, "start": 271.64, "text": "take"}, {"end": 273.4, "start": 272.68, "text": "the"}, {"end": 274.16, "start": 273.4, "text": "feedbacks"}, {"end": 274.6, "start": 274.16, "text": "the"}, {"end": 275.0, "start": 274.6, "text": "user"}, {"end": 275.84, "start": 275.0, "text": "provided"}, {"end": 276.08, "start": 275.84, "text": "on"}, {"end": 276.48, "start": 276.08, "text": "the"}, {"end": 277.4, "start": 276.48, "text": "items"}, {"end": 277.76, "start": 277.4, "text": "as"}, {"end": 278.0, "start": 277.76, "text": "ground"}, {"end": 278.88, "start": 278.0, "text": "truth"}, {"end": 279.0, "start": 278.88, "text": "and"}, {"end": 279.4, "start": 279.0, "text": "try"}, {"end": 279.44, "start": 279.4, "text": "to"}, {"end": 279.84, "start": 279.44, 
"text": "build"}, {"end": 280.36, "start": 279.84, "text": "models"}, {"end": 280.68, "start": 280.36, "text": "to"}, {"end": 281.08, "start": 280.68, "text": "request"}, {"end": 281.48, "start": 281.08, "text": "to"}, {"end": 282.48, "start": 281.48, "text": "these"}, {"end": 282.84, "start": 282.48, "text": "user"}, {"end": 283.76, "start": 282.84, "text": "feedbacks."}, {"end": 284.08, "start": 283.76, "text": "But"}, {"end": 284.36, "start": 284.08, "text": "in"}, {"end": 284.96, "start": 284.36, "text": "reality,"}, {"end": 285.56, "start": 284.96, "text": "recommender"}, {"end": 286.24, "start": 285.56, "text": "systems"}, {"end": 286.76, "start": 286.24, "text": "are"}, {"end": 286.84, "start": 286.76, "text": "an"}, {"end": 287.44, "start": 286.84, "text": "interactive"}, {"end": 288.64, "start": 287.44, "text": "system"}, {"end": 289.2, "start": 288.64, "text": "where"}, {"end": 289.2, "start": 289.2, "text": "the"}, {"end": 289.96, "start": 289.2, "text": "recommender"}, {"end": 290.48, "start": 289.96, "text": "will"}, {"end": 290.64, "start": 290.48, "text": "be"}, {"end": 291.44, "start": 290.64, "text": "providing"}, {"end": 292.28, "start": 291.44, "text": "users"}, {"end": 292.6, "start": 292.28, "text": "with"}, {"end": 292.84, "start": 292.6, "text": "a"}, {"end": 293.12, "start": 292.84, "text": "few"}, {"end": 293.8, "start": 293.12, "text": "content"}, {"end": 294.08, "start": 293.8, "text": "and"}, {"end": 294.12, "start": 294.08, "text": "the"}, {"end": 294.36, "start": 294.12, "text": "users"}, {"end": 294.88, "start": 294.36, "text": "will"}, {"end": 295.36, "start": 294.88, "text": "provide"}, {"end": 295.92, "start": 295.36, "text": "feedback"}, {"end": 296.12, "start": 295.92, "text": "on"}, {"end": 296.32, "start": 296.12, "text": "these"}, {"end": 297.24, "start": 296.32, "text": "contents."}, {"end": 297.56, "start": 297.24, "text": "So"}, {"end": 297.72, "start": 297.56, "text": "as"}, {"end": 297.76, "start": 297.72, "text": "a"}, 
{"end": 298.16, "start": 297.76, "text": "result,"}, {"end": 298.32, "start": 298.16, "text": "we"}, {"end": 298.8, "start": 298.32, "text": "only"}, {"end": 299.48, "start": 298.8, "text": "observe"}, {"end": 300.0, "start": 299.48, "text": "feedback"}], "text": " And these algorithms basically take the feedbacks the user provided on the items as ground truth and try to build models to regress to these user feedbacks. But in reality, recommender systems are an interactive system where the recommender will be providing users with a few content and the users will provide feedback on these contents. So as a result, we only observe feedback"}, {"chunks": [{"end": 300.32, "start": 300.0, "text": "on"}, {"end": 300.8, "start": 300.32, "text": "items"}, {"end": 301.2, "start": 300.8, "text": "which"}, {"end": 301.6, "start": 301.2, "text": "the"}, {"end": 302.68, "start": 301.6, "text": "recommender,"}, {"end": 303.8, "start": 302.68, "text": "the"}, {"end": 304.36, "start": 303.8, "text": "current"}, {"end": 304.72, "start": 304.36, "text": "system"}, {"end": 305.0, "start": 304.72, "text": "would"}, {"end": 305.28, "start": 305.0, "text": "have"}, {"end": 305.84, "start": 305.28, "text": "recommended"}, {"end": 305.84, "start": 305.84, "text": "but"}, {"end": 305.84, "start": 305.84, "text": "not"}, {"end": 306.32, "start": 305.84, "text": "the"}, {"end": 306.88, "start": 306.32, "text": "others."}, {"end": 307.08, "start": 306.88, "text": "So"}, {"end": 307.36, "start": 307.08, "text": "for"}, {"end": 307.96, "start": 307.36, "text": "example,"}, {"end": 308.16, "start": 307.96, "text": "in"}, {"end": 308.2, "start": 308.16, "text": "this"}, {"end": 308.64, "start": 308.2, "text": "case,"}, {"end": 308.72, "start": 308.64, "text": "we"}, {"end": 308.76, "start": 308.72, "text": "will"}, {"end": 309.12, "start": 308.76, "text": "see,"}, {"end": 309.48, "start": 309.12, "text": "we"}, {"end": 309.88, "start": 309.48, "text": "know"}, {"end": 310.84, "start": 309.88, 
"text": "whether"}, {"end": 310.88, "start": 310.84, "text": "or"}, {"end": 311.2, "start": 310.88, "text": "not"}, {"end": 311.36, "start": 311.2, "text": "the"}, {"end": 311.68, "start": 311.36, "text": "user"}, {"end": 312.28, "start": 311.68, "text": "likes"}, {"end": 313.0, "start": 312.28, "text": "these"}, {"end": 313.52, "start": 313.0, "text": "two"}, {"end": 314.0, "start": 313.52, "text": "videos,"}, {"end": 314.12, "start": 314.0, "text": "but"}, {"end": 314.32, "start": 314.12, "text": "then"}, {"end": 314.44, "start": 314.32, "text": "we'll"}, {"end": 314.84, "start": 314.44, "text": "have"}, {"end": 315.24, "start": 314.84, "text": "no"}, {"end": 315.92, "start": 315.24, "text": "idea"}, {"end": 316.24, "start": 315.92, "text": "if"}, {"end": 316.36, "start": 316.24, "text": "a"}, {"end": 316.88, "start": 316.36, "text": "different"}, {"end": 317.16, "start": 316.88, "text": "video"}, {"end": 317.4, "start": 317.16, "text": "is"}, {"end": 318.48, "start": 317.4, "text": "recommended,"}, {"end": 318.88, "start": 318.48, "text": "how"}, {"end": 319.32, "start": 318.88, "text": "much"}, {"end": 319.96, "start": 319.32, "text": "more"}, {"end": 320.12, "start": 319.96, "text": "or"}, {"end": 320.6, "start": 320.12, "text": "less"}, {"end": 320.84, "start": 320.6, "text": "the"}, {"end": 321.16, "start": 320.84, "text": "user"}, {"end": 321.44, "start": 321.16, "text": "would"}, {"end": 322.4, "start": 321.44, "text": "prefer"}, {"end": 322.6, "start": 322.4, "text": "that"}, {"end": 324.72, "start": 322.6, "text": "video."}, {"end": 325.36, "start": 324.72, "text": "And"}, {"end": 326.04, "start": 325.36, "text": "ignoring"}, {"end": 326.24, "start": 326.04, "text": "the"}, {"end": 326.84, "start": 326.24, "text": "system"}, {"end": 327.36, "start": 326.84, "text": "bias"}, {"end": 327.72, "start": 327.36, "text": "could"}, {"end": 328.68, "start": 327.72, "text": "seriously"}, {"end": 329.16, "start": 328.68, "text": "impact"}, {"end": 329.36, "start": 
329.16, "text": "the"}, {"end": 329.96, "start": 329.36, "text": "quality"}], "text": " on items which the recommender, the current system would have recommended but not the others. So for example, in this case, we will see, we know whether or not the user likes these two videos, but then we'll have no idea if a different video is recommended, how much more or less the user would prefer that video. And ignoring the system bias could seriously impact the quality"}, {"chunks": [{"end": 330.24, "start": 330.0, "text": "of"}, {"end": 331.4, "start": 330.24, "text": "recommendation"}, {"end": 331.68, "start": 331.4, "text": "will"}, {"end": 331.76, "start": 331.68, "text": "be"}, {"end": 332.2, "start": 331.76, "text": "able"}, {"end": 333.32, "start": 332.2, "text": "to"}, {"end": 334.04, "start": 333.32, "text": "give."}, {"end": 334.28, "start": 334.04, "text": "And"}, {"end": 334.44, "start": 334.28, "text": "the"}, {"end": 335.12, "start": 334.44, "text": "second"}, {"end": 335.6, "start": 335.12, "text": "limitation"}, {"end": 335.72, "start": 335.6, "text": "is"}, {"end": 336.48, "start": 335.72, "text": "that"}, {"end": 337.28, "start": 336.48, "text": "these"}, {"end": 338.0, "start": 337.28, "text": "methods"}, {"end": 338.68, "start": 338.0, "text": "tend"}, {"end": 339.28, "start": 338.68, "text": "to"}, {"end": 340.24, "start": 339.28, "text": "provide"}, {"end": 340.92, "start": 340.24, "text": "myopic"}, {"end": 342.6, "start": 340.92, "text": "recommendations."}, {"end": 343.28, "start": 342.6, "text": "So"}, {"end": 343.84, "start": 343.28, "text": "as"}, {"end": 344.24, "start": 343.84, "text": "in"}, {"end": 344.6, "start": 344.24, "text": "this"}, {"end": 345.6, "start": 344.6, "text": "example,"}, {"end": 345.88, "start": 345.6, "text": "because"}, {"end": 346.6, "start": 345.88, "text": "these"}, {"end": 347.68, "start": 346.6, "text": "methods"}, {"end": 348.28, "start": 347.68, "text": "are"}, {"end": 348.96, "start": 348.28, "text": "often"}, 
{"end": 350.2, "start": 348.96, "text": "optimized"}, {"end": 351.24, "start": 350.2, "text": "to"}, {"end": 351.88, "start": 351.24, "text": "maximize"}, {"end": 352.36, "start": 351.88, "text": "directly"}, {"end": 353.28, "start": 352.36, "text": "immediate"}, {"end": 354.04, "start": 353.28, "text": "response,"}, {"end": 354.24, "start": 354.04, "text": "so"}, {"end": 355.2, "start": 354.24, "text": "as"}, {"end": 355.68, "start": 355.2, "text": "a"}, {"end": 356.28, "start": 355.68, "text": "result,"}, {"end": 356.48, "start": 356.28, "text": "they"}, {"end": 357.12, "start": 356.48, "text": "tend"}, {"end": 357.56, "start": 357.12, "text": "to"}, {"end": 357.96, "start": 357.56, "text": "recommend"}, {"end": 358.16, "start": 357.96, "text": "a"}, {"end": 359.0, "start": 358.16, "text": "content"}, {"end": 359.32, "start": 359.0, "text": "which"}, {"end": 359.6, "start": 359.32, "text": "are"}, {"end": 360.0, "start": 359.6, "text": "catchy."}], "text": " of recommendation will be able to give. And the second limitation is that these methods tend to provide myopic recommendations. 
So as in this example, because these methods are often optimized to maximize directly immediate response, so as a result, they tend to recommend a content which are catchy."}, {"chunks": [{"end": 360.48, "start": 360.0, "text": "or"}, {"end": 361.36, "start": 360.48, "text": "the"}, {"end": 361.88, "start": 361.36, "text": "users"}, {"end": 362.4, "start": 361.88, "text": "are"}, {"end": 363.8, "start": 362.4, "text": "more"}, {"end": 364.52, "start": 363.8, "text": "familiar"}, {"end": 364.6, "start": 364.52, "text": "with."}, {"end": 365.12, "start": 364.6, "text": "So"}, {"end": 365.48, "start": 365.12, "text": "as"}, {"end": 365.56, "start": 365.48, "text": "in"}, {"end": 366.08, "start": 365.56, "text": "this"}, {"end": 366.96, "start": 366.08, "text": "example,"}, {"end": 368.28, "start": 366.96, "text": "so"}, {"end": 368.4, "start": 368.28, "text": "we,"}, {"end": 368.52, "start": 368.4, "text": "it"}, {"end": 369.12, "start": 368.52, "text": "will"}, {"end": 370.08, "start": 369.12, "text": "recommend"}, {"end": 370.4, "start": 370.08, "text": "a"}, {"end": 370.8, "start": 370.4, "text": "video"}, {"end": 371.04, "start": 370.8, "text": "which"}, {"end": 371.24, "start": 371.04, "text": "the"}, {"end": 371.6, "start": 371.24, "text": "user"}, {"end": 372.36, "start": 371.6, "text": "has"}, {"end": 373.12, "start": 372.36, "text": "already"}, {"end": 373.36, "start": 373.12, "text": "been"}, {"end": 373.84, "start": 373.36, "text": "familiar"}, {"end": 374.16, "start": 373.84, "text": "with."}, {"end": 375.0, "start": 374.16, "text": "And"}, {"end": 376.0, "start": 375.0, "text": "as"}, {"end": 376.48, "start": 376.0, "text": "a"}, {"end": 376.96, "start": 376.48, "text": "result,"}, {"end": 377.44, "start": 376.96, "text": "users"}, {"end": 377.72, "start": 377.44, "text": "do"}, {"end": 377.88, "start": 377.72, "text": "not"}, {"end": 378.24, "start": 377.88, "text": "get"}, {"end": 379.24, "start": 378.24, "text": "additional"}, {"end": 379.92, "start": 
379.24, "text": "information"}, {"end": 380.36, "start": 379.92, "text": "from"}, {"end": 380.44, "start": 380.36, "text": "these"}, {"end": 381.0, "start": 380.44, "text": "watches."}, {"end": 381.08, "start": 381.0, "text": "And"}, {"end": 381.24, "start": 381.08, "text": "in"}, {"end": 381.56, "start": 381.24, "text": "the"}, {"end": 382.08, "start": 381.56, "text": "worst"}, {"end": 382.68, "start": 382.08, "text": "cases,"}, {"end": 382.76, "start": 382.68, "text": "it"}, {"end": 383.08, "start": 382.76, "text": "can"}, {"end": 383.72, "start": 383.08, "text": "even"}, {"end": 384.44, "start": 383.72, "text": "recommend"}, {"end": 385.04, "start": 384.44, "text": "videos"}, {"end": 385.28, "start": 385.04, "text": "which"}, {"end": 385.88, "start": 385.28, "text": "are"}, {"end": 386.12, "start": 385.88, "text": "kind"}, {"end": 386.2, "start": 386.12, "text": "of"}, {"end": 387.28, "start": 386.2, "text": "click-baity"}, {"end": 388.76, "start": 387.28, "text": "or"}, {"end": 389.52, "start": 388.76, "text": "eye-catching,"}, {"end": 389.68, "start": 389.52, "text": "but"}, {"end": 389.96, "start": 389.68, "text": "then"}], "text": " or the users are more familiar with. So as in this example, so we, it will recommend a video which the user has already been familiar with. And as a result, users do not get additional information from these watches. 
And in the worst cases, it can even recommend videos which are kind of click-baity or eye-catching, but then"}, {"chunks": [{"end": 390.76, "start": 390.0, "text": "can"}, {"end": 391.52, "start": 390.76, "text": "hurt"}, {"end": 391.92, "start": 391.52, "text": "the"}, {"end": 392.24, "start": 391.92, "text": "users"}, {"end": 392.96, "start": 392.24, "text": "trust"}, {"end": 393.28, "start": 392.96, "text": "on"}, {"end": 393.28, "start": 393.28, "text": "the"}, {"end": 393.84, "start": 393.28, "text": "platform"}, {"end": 393.84, "start": 393.84, "text": "in"}, {"end": 393.92, "start": 393.84, "text": "the"}, {"end": 394.4, "start": 393.92, "text": "long"}, {"end": 394.68, "start": 394.4, "text": "run"}, {"end": 395.08, "start": 394.68, "text": "but"}, {"end": 395.36, "start": 395.08, "text": "in"}, {"end": 395.72, "start": 395.36, "text": "on"}, {"end": 395.96, "start": 395.72, "text": "the"}, {"end": 396.44, "start": 395.96, "text": "other"}, {"end": 396.8, "start": 396.44, "text": "hand"}, {"end": 396.88, "start": 396.8, "text": "what"}, {"end": 397.48, "start": 396.88, "text": "we"}, {"end": 397.88, "start": 397.48, "text": "like"}, {"end": 398.44, "start": 397.88, "text": "the"}, {"end": 399.0, "start": 398.44, "text": "recommender"}, {"end": 399.36, "start": 399.0, "text": "system"}, {"end": 399.52, "start": 399.36, "text": "to"}, {"end": 399.8, "start": 399.52, "text": "do"}, {"end": 400.2, "start": 399.8, "text": "is"}, {"end": 400.88, "start": 400.2, "text": "to"}, {"end": 401.4, "start": 400.88, "text": "actually"}, {"end": 401.8, "start": 401.4, "text": "planning"}, {"end": 402.32, "start": 401.8, "text": "for"}, {"end": 402.72, "start": 402.32, "text": "the"}, {"end": 402.96, "start": 402.72, "text": "long"}, {"end": 403.68, "start": 402.96, "text": "term"}, {"end": 403.88, "start": 403.68, "text": "so"}, {"end": 404.56, "start": 403.88, "text": "that"}, {"end": 404.68, "start": 404.56, "text": "you"}, {"end": 404.88, "start": 404.68, "text": 
"will"}, {"end": 405.48, "start": 404.88, "text": "recommend"}, {"end": 406.32, "start": 405.48, "text": "videos"}, {"end": 406.4, "start": 406.32, "text": "that"}, {"end": 406.68, "start": 406.4, "text": "can"}, {"end": 407.24, "start": 406.68, "text": "actually"}, {"end": 407.84, "start": 407.24, "text": "broader"}, {"end": 408.28, "start": 407.84, "text": "the"}, {"end": 408.76, "start": 408.28, "text": "users"}, {"end": 409.36, "start": 408.76, "text": "interest"}, {"end": 410.08, "start": 409.36, "text": "group"}, {"end": 410.36, "start": 410.08, "text": "so"}, {"end": 410.92, "start": 410.36, "text": "that"}, {"end": 410.92, "start": 410.92, "text": "they"}, {"end": 412.2, "start": 410.92, "text": "can"}, {"end": 412.84, "start": 412.2, "text": "lead"}, {"end": 413.28, "start": 412.84, "text": "to"}, {"end": 413.8, "start": 413.28, "text": "more"}, {"end": 414.6, "start": 413.8, "text": "longer"}, {"end": 414.96, "start": 414.6, "text": "term"}, {"end": 415.36, "start": 414.96, "text": "user"}, {"end": 420.0, "start": 415.36, "text": "utility"}], "text": " can hurt the users trust on the platform in the long run but in on the other hand what we like the recommender system to do is to actually planning for the long term so that you will recommend videos that can actually broader the users interest group so that they can lead to more longer term user utility"}, {"chunks": [{"end": 420.84, "start": 420.0, "text": "So"}, {"end": 421.12, "start": 420.84, "text": "every"}, {"end": 422.88, "start": 421.12, "text": "recommender"}, {"end": 423.24, "start": 422.88, "text": "system"}, {"end": 424.12, "start": 423.24, "text": "researchers"}, {"end": 424.28, "start": 424.12, "text": "will"}, {"end": 424.56, "start": 424.28, "text": "probably"}, {"end": 424.96, "start": 424.56, "text": "have"}, {"end": 425.28, "start": 424.96, "text": "their"}, {"end": 425.72, "start": 425.28, "text": "vision"}, {"end": 426.48, "start": 425.72, "text": "of"}, {"end": 427.36, "start": 
426.48, "text": "what"}, {"end": 427.68, "start": 427.36, "text": "they"}, {"end": 428.08, "start": 427.68, "text": "like"}, {"end": 428.4, "start": 428.08, "text": "the"}, {"end": 428.68, "start": 428.4, "text": "next"}, {"end": 429.44, "start": 428.68, "text": "generation"}, {"end": 429.72, "start": 429.44, "text": "of"}, {"end": 430.36, "start": 429.72, "text": "recommender"}, {"end": 430.52, "start": 430.36, "text": "system"}, {"end": 430.52, "start": 430.52, "text": "to"}, {"end": 431.28, "start": 430.52, "text": "be."}, {"end": 431.64, "start": 431.28, "text": "So"}, {"end": 432.64, "start": 431.64, "text": "from"}, {"end": 432.92, "start": 432.64, "text": "our"}, {"end": 433.24, "start": 432.92, "text": "perspective,"}, {"end": 433.36, "start": 433.24, "text": "what"}, {"end": 433.36, "start": 433.36, "text": "we"}, {"end": 434.04, "start": 433.36, "text": "want"}, {"end": 434.36, "start": 434.04, "text": "the"}, {"end": 435.68, "start": 434.36, "text": "recommender"}, {"end": 436.32, "start": 435.68, "text": "system"}, {"end": 436.32, "start": 436.32, "text": "to"}, {"end": 436.36, "start": 436.32, "text": "do"}, {"end": 436.64, "start": 436.36, "text": "is"}, {"end": 436.96, "start": 436.64, "text": "kind"}, {"end": 437.28, "start": 436.96, "text": "of"}, {"end": 438.48, "start": 437.28, "text": "form"}, {"end": 438.84, "start": 438.48, "text": "a"}, {"end": 440.2, "start": 438.84, "text": "partnership"}, {"end": 440.2, "start": 440.2, "text": "with"}, {"end": 440.52, "start": 440.2, "text": "the"}, {"end": 440.96, "start": 440.52, "text": "user"}, {"end": 441.24, "start": 440.96, "text": "as"}, {"end": 441.48, "start": 441.24, "text": "they"}, {"end": 442.12, "start": 441.48, "text": "navigate"}, {"end": 442.32, "start": 442.12, "text": "the"}, {"end": 442.92, "start": 442.32, "text": "platform"}, {"end": 442.96, "start": 442.92, "text": "so"}, {"end": 443.28, "start": 442.96, "text": "that"}, {"end": 443.72, "start": 443.28, "text": "the"}, {"end": 
444.96, "start": 443.72, "text": "recommender"}, {"end": 445.68, "start": 444.96, "text": "system"}, {"end": 445.88, "start": 445.68, "text": "is"}, {"end": 446.08, "start": 445.88, "text": "not"}, {"end": 446.56, "start": 446.08, "text": "only"}, {"end": 447.04, "start": 446.56, "text": "able"}, {"end": 447.72, "start": 447.04, "text": "to"}, {"end": 448.4, "start": 447.72, "text": "kind"}, {"end": 448.72, "start": 448.4, "text": "of"}, {"end": 449.28, "start": 448.72, "text": "adapt"}, {"end": 449.28, "start": 449.28, "text": "to"}, {"end": 449.44, "start": 449.28, "text": "the"}, {"end": 449.96, "start": 449.44, "text": "user's"}], "text": " So every recommender system researchers will probably have their vision of what they like the next generation of recommender system to be. So from our perspective, what we want the recommender system to do is kind of form a partnership with the user as they navigate the platform so that the recommender system is not only able to kind of adapt to the user's"}, {"chunks": [{"end": 451.44, "start": 450.0, "text": "users'"}, {"end": 452.0, "start": 451.44, "text": "interest"}, {"end": 452.84, "start": 452.0, "text": "quickly,"}, {"end": 453.56, "start": 452.84, "text": "but"}, {"end": 454.6, "start": 453.56, "text": "also"}, {"end": 454.88, "start": 454.6, "text": "help"}, {"end": 455.08, "start": 454.88, "text": "users"}, {"end": 456.0, "start": 455.08, "text": "discover"}, {"end": 456.4, "start": 456.0, "text": "new"}, {"end": 457.0, "start": 456.4, "text": "interests"}, {"end": 457.12, "start": 457.0, "text": "so"}, {"end": 457.4, "start": 457.12, "text": "that"}, {"end": 457.56, "start": 457.4, "text": "we"}, {"end": 457.92, "start": 457.56, "text": "can"}, {"end": 458.6, "start": 457.92, "text": "optimize"}, {"end": 459.48, "start": 458.6, "text": "for"}, {"end": 460.56, "start": 459.48, "text": "long-term"}, {"end": 461.04, "start": 460.56, "text": "user"}, {"end": 465.0, "start": 461.04, "text": "utility."}, {"end": 
465.56, "start": 465.0, "text": "And"}, {"end": 465.8, "start": 465.56, "text": "we"}, {"end": 466.2, "start": 465.8, "text": "believe"}, {"end": 466.64, "start": 466.2, "text": "that"}, {"end": 467.28, "start": 466.64, "text": "reinforcement"}, {"end": 467.76, "start": 467.28, "text": "learning"}, {"end": 467.88, "start": 467.76, "text": "is"}, {"end": 468.4, "start": 467.88, "text": "actually"}, {"end": 468.56, "start": 468.4, "text": "a"}, {"end": 468.56, "start": 468.56, "text": "great"}, {"end": 468.6, "start": 468.56, "text": "tool"}, {"end": 469.48, "start": 468.6, "text": "for"}, {"end": 470.04, "start": 469.48, "text": "us"}, {"end": 470.24, "start": 470.04, "text": "to"}, {"end": 470.72, "start": 470.24, "text": "achieve"}, {"end": 471.76, "start": 470.72, "text": "that."}, {"end": 472.4, "start": 471.76, "text": "So"}, {"end": 472.96, "start": 472.4, "text": "reinforcement"}, {"end": 473.32, "start": 472.96, "text": "learning"}, {"end": 473.48, "start": 473.32, "text": "are"}, {"end": 474.08, "start": 473.48, "text": "naturally"}, {"end": 474.72, "start": 474.08, "text": "designed"}, {"end": 474.72, "start": 474.72, "text": "to"}, {"end": 475.12, "start": 474.72, "text": "plan"}, {"end": 475.12, "start": 475.12, "text": "a"}, {"end": 475.84, "start": 475.12, "text": "sequence"}, {"end": 476.2, "start": 475.84, "text": "of"}, {"end": 477.24, "start": 476.2, "text": "actions"}, {"end": 477.4, "start": 477.24, "text": "so"}, {"end": 477.72, "start": 477.4, "text": "that"}, {"end": 477.88, "start": 477.72, "text": "they"}, {"end": 478.16, "start": 477.88, "text": "can"}, {"end": 478.44, "start": 478.16, "text": "actually"}, {"end": 479.0, "start": 478.44, "text": "change"}, {"end": 479.28, "start": 479.0, "text": "the"}, {"end": 479.68, "start": 479.28, "text": "underlying"}, {"end": 479.96, "start": 479.68, "text": "state"}], "text": " users' interest quickly, but also help users discover new interests so that we can optimize for long-term user utility. 
And we believe that reinforcement learning is actually a great tool for us to achieve that. So reinforcement learning are naturally designed to plan a sequence of actions so that they can actually change the underlying state"}, {"chunks": [{"end": 480.84, "start": 480.0, "text": "and"}, {"end": 481.64, "start": 480.84, "text": "maximize"}, {"end": 482.28, "start": 481.64, "text": "long-term"}, {"end": 483.48, "start": 482.28, "text": "reward."}, {"end": 483.68, "start": 483.48, "text": "And"}, {"end": 484.28, "start": 483.68, "text": "second,"}, {"end": 484.72, "start": 484.28, "text": "there's"}, {"end": 484.96, "start": 484.72, "text": "a"}, {"end": 485.36, "start": 484.96, "text": "large"}, {"end": 485.84, "start": 485.36, "text": "body"}, {"end": 486.0, "start": 485.84, "text": "of"}, {"end": 486.08, "start": 486.0, "text": "literature"}, {"end": 486.12, "start": 486.08, "text": "in"}, {"end": 487.32, "start": 486.12, "text": "reinforcement"}, {"end": 487.76, "start": 487.32, "text": "learning"}, {"end": 488.32, "start": 487.76, "text": "that"}, {"end": 488.96, "start": 488.32, "text": "deal"}, {"end": 489.44, "start": 488.96, "text": "with"}, {"end": 489.64, "start": 489.44, "text": "this"}, {"end": 489.88, "start": 489.64, "text": "kind"}, {"end": 490.08, "start": 489.88, "text": "of"}, {"end": 490.64, "start": 490.08, "text": "bandit"}, {"end": 490.8, "start": 490.64, "text": "feedback"}, {"end": 491.2, "start": 490.8, "text": "where"}, {"end": 491.52, "start": 491.2, "text": "you"}, {"end": 491.92, "start": 491.52, "text": "only"}, {"end": 492.28, "start": 491.92, "text": "observe"}, {"end": 492.8, "start": 492.28, "text": "feedbacks"}, {"end": 493.04, "start": 492.8, "text": "on"}, {"end": 493.52, "start": 493.04, "text": "actions"}, {"end": 493.72, "start": 493.52, "text": "you"}, {"end": 493.88, "start": 493.72, "text": "have"}, {"end": 494.68, "start": 493.88, "text": "already,"}, {"end": 495.04, "start": 494.68, "text": "on"}, {"end": 495.48, "start": 
495.04, "text": "actions"}, {"end": 495.64, "start": 495.48, "text": "you"}, {"end": 496.2, "start": 495.64, "text": "chose"}, {"end": 496.28, "start": 496.2, "text": "but"}, {"end": 496.4, "start": 496.28, "text": "not"}, {"end": 497.0, "start": 496.4, "text": "the"}, {"end": 497.88, "start": 497.0, "text": "others."}, {"end": 498.4, "start": 497.88, "text": "So"}, {"end": 498.72, "start": 498.4, "text": "using"}, {"end": 499.44, "start": 498.72, "text": "techniques"}, {"end": 499.72, "start": 499.44, "text": "such"}, {"end": 499.92, "start": 499.72, "text": "as"}, {"end": 500.92, "start": 499.92, "text": "exploration"}, {"end": 501.32, "start": 500.92, "text": "and"}, {"end": 505.16, "start": 501.32, "text": "policy"}, {"end": 505.8, "start": 505.16, "text": "learning."}, {"end": 506.8, "start": 505.8, "text": "But"}, {"end": 507.44, "start": 506.8, "text": "applying"}, {"end": 508.04, "start": 507.44, "text": "reinforcement"}, {"end": 508.56, "start": 508.04, "text": "learning"}, {"end": 508.8, "start": 508.56, "text": "in"}, {"end": 509.16, "start": 508.8, "text": "recommender"}, {"end": 510.0, "start": 509.16, "text": "systems"}], "text": " and maximize long-term reward. And second, there's a large body of literature in reinforcement learning that deal with this kind of bandit feedback where you only observe feedbacks on actions you have already, on actions you chose but not the others. So using techniques such as exploration and policy learning. 
But applying reinforcement learning in recommender systems"}, {"chunks": [{"end": 511.16, "start": 510.0, "text": "not"}, {"end": 512.24, "start": 511.16, "text": "straightforward."}, {"end": 512.4, "start": 512.24, "text": "So"}, {"end": 512.8, "start": 512.4, "text": "there"}, {"end": 513.12, "start": 512.8, "text": "are"}, {"end": 513.16, "start": 513.12, "text": "a"}, {"end": 513.16, "start": 513.16, "text": "lot"}, {"end": 513.16, "start": 513.16, "text": "of"}, {"end": 513.92, "start": 513.16, "text": "challenges"}, {"end": 514.72, "start": 513.92, "text": "coming"}, {"end": 515.16, "start": 514.72, "text": "from"}, {"end": 515.16, "start": 515.16, "text": "the"}, {"end": 515.68, "start": 515.16, "text": "specific"}, {"end": 516.24, "start": 515.68, "text": "setups"}, {"end": 516.48, "start": 516.24, "text": "we"}, {"end": 516.84, "start": 516.48, "text": "have"}, {"end": 516.84, "start": 516.84, "text": "in"}, {"end": 517.44, "start": 516.84, "text": "recommender"}, {"end": 518.52, "start": 517.44, "text": "systems."}, {"end": 518.76, "start": 518.52, "text": "So"}, {"end": 519.28, "start": 518.76, "text": "first,"}, {"end": 519.4, "start": 519.28, "text": "we"}, {"end": 520.0, "start": 519.4, "text": "usually"}, {"end": 520.24, "start": 520.0, "text": "deal"}, {"end": 520.48, "start": 520.24, "text": "with"}, {"end": 520.64, "start": 520.48, "text": "a"}, {"end": 521.52, "start": 520.64, "text": "much"}, {"end": 522.32, "start": 521.52, "text": "bigger"}, {"end": 523.0, "start": 522.32, "text": "action"}, {"end": 523.48, "start": 523.0, "text": "space"}, {"end": 523.88, "start": 523.48, "text": "than"}, {"end": 524.2, "start": 523.88, "text": "the"}, {"end": 525.2, "start": 524.2, "text": "regular"}, {"end": 525.52, "start": 525.2, "text": "standard"}, {"end": 526.2, "start": 525.52, "text": "applications"}, {"end": 527.32, "start": 526.2, "text": "for"}, {"end": 527.6, "start": 527.32, "text": "RL."}, {"end": 527.8, "start": 527.6, "text": "So"}, {"end": 
527.92, "start": 527.8, "text": "in"}, {"end": 528.16, "start": 527.92, "text": "our"}, {"end": 528.44, "start": 528.16, "text": "case,"}, {"end": 528.84, "start": 528.44, "text": "we"}, {"end": 529.24, "start": 528.84, "text": "have"}, {"end": 529.48, "start": 529.24, "text": "to"}, {"end": 530.52, "start": 529.48, "text": "source"}, {"end": 530.72, "start": 530.52, "text": "through"}, {"end": 531.04, "start": 530.72, "text": "like"}, {"end": 531.52, "start": 531.04, "text": "millions"}, {"end": 531.72, "start": 531.52, "text": "or"}, {"end": 532.4, "start": 531.72, "text": "billions"}, {"end": 532.92, "start": 532.4, "text": "of"}, {"end": 533.48, "start": 532.92, "text": "videos"}, {"end": 533.56, "start": 533.48, "text": "in"}, {"end": 533.92, "start": 533.56, "text": "order"}, {"end": 534.16, "start": 533.92, "text": "to"}, {"end": 534.92, "start": 534.16, "text": "recommend."}, {"end": 535.16, "start": 534.92, "text": "So"}, {"end": 535.8, "start": 535.16, "text": "the"}, {"end": 536.24, "start": 535.8, "text": "action"}, {"end": 536.56, "start": 536.24, "text": "space"}, {"end": 536.8, "start": 536.56, "text": "is"}, {"end": 536.96, "start": 536.8, "text": "in"}, {"end": 537.6, "start": 536.96, "text": "orders"}, {"end": 537.76, "start": 537.6, "text": "of"}, {"end": 538.08, "start": 537.76, "text": "millions"}, {"end": 538.32, "start": 538.08, "text": "or"}, {"end": 538.76, "start": 538.32, "text": "billions."}, {"end": 539.04, "start": 538.76, "text": "And"}, {"end": 539.16, "start": 539.04, "text": "if"}, {"end": 539.2, "start": 539.16, "text": "you"}, {"end": 539.52, "start": 539.2, "text": "want"}, {"end": 539.52, "start": 539.52, "text": "to"}, {"end": 539.52, "start": 539.52, "text": "do"}, {"end": 539.96, "start": 539.52, "text": "setups,"}], "text": " not straightforward. So there are a lot of challenges coming from the specific setups we have in recommender systems. 
So first, we usually deal with a much bigger action space than the regular standard applications for RL. So in our case, we have to source through like millions or billions of videos in order to recommend. So the action space is in orders of millions or billions. And if you want to do setups,"}, {"chunks": [{"end": 541.16, "start": 540.0, "text": "recommendation"}, {"end": 541.4, "start": 541.16, "text": "then"}, {"end": 541.8, "start": 541.4, "text": "action"}, {"end": 542.2, "start": 541.8, "text": "space"}, {"end": 542.64, "start": 542.2, "text": "is"}, {"end": 543.24, "start": 542.64, "text": "even"}, {"end": 545.04, "start": 543.24, "text": "bigger"}, {"end": 545.4, "start": 545.04, "text": "and"}, {"end": 546.16, "start": 545.4, "text": "second"}, {"end": 547.24, "start": 546.16, "text": "exploration"}, {"end": 547.44, "start": 547.24, "text": "is"}, {"end": 547.8, "start": 547.44, "text": "actually"}, {"end": 548.24, "start": 547.8, "text": "much"}, {"end": 549.0, "start": 548.24, "text": "expensive"}, {"end": 549.0, "start": 549.0, "text": "in"}, {"end": 549.08, "start": 549.0, "text": "the"}, {"end": 550.08, "start": 549.08, "text": "recommendation"}, {"end": 550.64, "start": 550.08, "text": "settings"}, {"end": 550.76, "start": 550.64, "text": "so"}, {"end": 550.92, "start": 550.76, "text": "you"}, {"end": 551.48, "start": 550.92, "text": "can"}, {"end": 552.04, "start": 551.48, "text": "easily"}, {"end": 552.76, "start": 552.04, "text": "imagine"}, {"end": 552.88, "start": 552.76, "text": "if"}, {"end": 553.44, "start": 552.88, "text": "the"}, {"end": 554.16, "start": 553.44, "text": "recommender"}, {"end": 554.36, "start": 554.16, "text": "just"}, {"end": 554.8, "start": 554.36, "text": "show"}, {"end": 555.16, "start": 554.8, "text": "you"}, {"end": 556.0, "start": 555.16, "text": "random"}, {"end": 557.2, "start": 556.0, "text": "content"}, {"end": 557.44, "start": 557.2, "text": "that"}, {"end": 557.8, "start": 557.44, "text": "could"}, {"end": 
558.32, "start": 557.8, "text": "definitely"}, {"end": 558.48, "start": 558.32, "text": "lead"}, {"end": 558.72, "start": 558.48, "text": "to"}, {"end": 559.04, "start": 558.72, "text": "very"}, {"end": 559.72, "start": 559.04, "text": "bad"}, {"end": 560.08, "start": 559.72, "text": "user"}, {"end": 562.0, "start": 560.08, "text": "experience"}, {"end": 562.36, "start": 562.0, "text": "and"}, {"end": 562.76, "start": 562.36, "text": "third"}, {"end": 563.32, "start": 562.76, "text": "so"}, {"end": 564.04, "start": 563.32, "text": "actually"}, {"end": 564.56, "start": 564.04, "text": "most"}, {"end": 565.16, "start": 564.56, "text": "of"}, {"end": 565.88, "start": 565.16, "text": "the"}, {"end": 566.6, "start": 565.88, "text": "data"}, {"end": 566.8, "start": 566.6, "text": "we"}, {"end": 567.08, "start": 566.8, "text": "deal"}, {"end": 567.8, "start": 567.08, "text": "with"}, {"end": 568.12, "start": 567.8, "text": "is"}, {"end": 568.68, "start": 568.12, "text": "coming"}, {"end": 569.04, "start": 568.68, "text": "from"}, {"end": 569.44, "start": 569.04, "text": "an"}, {"end": 570.0, "start": 569.44, "text": "off-policy"}], "text": " recommendation then action space is even bigger and second exploration is actually much expensive in the recommendation settings so you can easily imagine if the recommender just show you random content that could definitely lead to very bad user experience and third so actually most of the data we deal with is coming from an off-policy"}, {"chunks": [{"end": 570.44, "start": 570.0, "text": "So"}, {"end": 570.8, "start": 570.44, "text": "it's"}, {"end": 571.2, "start": 570.8, "text": "coming"}, {"end": 571.76, "start": 571.2, "text": "from"}, {"end": 572.16, "start": 571.76, "text": "a"}, {"end": 573.4, "start": 572.16, "text": "behavior"}, {"end": 574.24, "start": 573.4, "text": "agent"}, {"end": 574.52, "start": 574.24, "text": "which"}, {"end": 574.8, "start": 574.52, "text": "actually"}, {"end": 575.32, "start": 574.8, "text": 
"have"}, {"end": 575.36, "start": 575.32, "text": "a"}, {"end": 575.6, "start": 575.36, "text": "different"}, {"end": 576.0, "start": 575.6, "text": "policy"}, {"end": 576.2, "start": 576.0, "text": "than"}, {"end": 576.32, "start": 576.2, "text": "the"}, {"end": 576.68, "start": 576.32, "text": "ones"}, {"end": 576.8, "start": 576.68, "text": "we"}, {"end": 577.2, "start": 576.8, "text": "are"}, {"end": 577.64, "start": 577.2, "text": "learning."}, {"end": 577.84, "start": 577.64, "text": "So"}, {"end": 577.92, "start": 577.84, "text": "we"}, {"end": 578.24, "start": 577.92, "text": "need"}, {"end": 578.56, "start": 578.24, "text": "to"}, {"end": 578.76, "start": 578.56, "text": "be"}, {"end": 579.2, "start": 578.76, "text": "able"}, {"end": 579.36, "start": 579.2, "text": "to"}, {"end": 579.56, "start": 579.36, "text": "do"}, {"end": 580.52, "start": 579.56, "text": "effective"}, {"end": 580.76, "start": 580.52, "text": "off"}, {"end": 581.2, "start": 580.76, "text": "policy"}, {"end": 582.44, "start": 581.2, "text": "learning."}, {"end": 582.92, "start": 582.44, "text": "And"}, {"end": 583.52, "start": 582.92, "text": "fourth"}, {"end": 583.64, "start": 583.52, "text": "is"}, {"end": 584.36, "start": 583.64, "text": "that,"}, {"end": 584.96, "start": 584.36, "text": "so"}, {"end": 585.32, "start": 584.96, "text": "in"}, {"end": 585.92, "start": 585.32, "text": "the"}, {"end": 586.32, "start": 585.92, "text": "recommender"}, {"end": 586.84, "start": 586.32, "text": "system"}, {"end": 587.84, "start": 586.84, "text": "cases,"}, {"end": 588.12, "start": 587.84, "text": "actually"}, {"end": 588.32, "start": 588.12, "text": "we"}, {"end": 588.36, "start": 588.32, "text": "do"}, {"end": 589.04, "start": 588.36, "text": "not"}, {"end": 590.0, "start": 589.04, "text": "observe"}, {"end": 590.44, "start": 590.0, "text": "the"}, {"end": 591.24, "start": 590.44, "text": "underlying"}, {"end": 591.8, "start": 591.24, "text": "state"}, {"end": 592.84, "start": 591.8, "text": 
"changes."}, {"end": 593.0, "start": 592.84, "text": "So"}, {"end": 593.2, "start": 593.0, "text": "the"}, {"end": 593.52, "start": 593.2, "text": "user"}, {"end": 593.64, "start": 593.52, "text": "is"}, {"end": 594.0, "start": 593.64, "text": "not"}, {"end": 594.24, "start": 594.0, "text": "going"}, {"end": 594.48, "start": 594.24, "text": "to"}, {"end": 595.0, "start": 594.48, "text": "tell"}, {"end": 595.16, "start": 595.0, "text": "us"}, {"end": 595.52, "start": 595.16, "text": "explicitly"}, {"end": 595.52, "start": 595.52, "text": "what"}, {"end": 595.92, "start": 595.52, "text": "they"}, {"end": 596.08, "start": 595.92, "text": "are"}, {"end": 596.8, "start": 596.08, "text": "interested"}, {"end": 597.04, "start": 596.8, "text": "in."}, {"end": 597.36, "start": 597.04, "text": "And"}, {"end": 597.56, "start": 597.36, "text": "we"}, {"end": 597.56, "start": 597.56, "text": "have"}, {"end": 597.64, "start": 597.56, "text": "to"}, {"end": 598.16, "start": 597.64, "text": "infer"}, {"end": 598.44, "start": 598.16, "text": "the"}, {"end": 599.44, "start": 598.44, "text": "user"}, {"end": 599.96, "start": 599.44, "text": "interest"}], "text": " So it's coming from a behavior agent which actually have a different policy than the ones we are learning. So we need to be able to do effective off-policy learning. And fourth is that, so in the recommender system cases, actually we do not observe the underlying state changes. So the user is not going to tell us explicitly what they are interested in. 
And we have to infer the user interest"}, {"chunks": [{"end": 601.04, "start": 600.0, "text": "from"}, {"end": 601.64, "start": 601.04, "text": "the"}, {"end": 602.32, "start": 601.64, "text": "activities"}, {"end": 602.56, "start": 602.32, "text": "they"}, {"end": 603.88, "start": 602.56, "text": "have"}, {"end": 604.16, "start": 603.88, "text": "on"}, {"end": 604.2, "start": 604.16, "text": "the"}, {"end": 605.6, "start": 604.2, "text": "platform."}, {"end": 605.8, "start": 605.6, "text": "And"}, {"end": 606.16, "start": 605.8, "text": "fifth,"}, {"end": 606.32, "start": 606.16, "text": "of"}, {"end": 606.56, "start": 606.32, "text": "course,"}, {"end": 606.68, "start": 606.56, "text": "we"}, {"end": 606.96, "start": 606.68, "text": "have"}, {"end": 607.32, "start": 606.96, "text": "very"}, {"end": 607.8, "start": 607.32, "text": "noisy"}, {"end": 608.12, "start": 607.8, "text": "and"}, {"end": 609.24, "start": 608.12, "text": "sparse"}, {"end": 609.88, "start": 609.24, "text": "reward"}, {"end": 609.88, "start": 609.88, "text": "signals"}, {"end": 610.8, "start": 609.88, "text": "coming"}, {"end": 611.12, "start": 610.8, "text": "from"}, {"end": 611.68, "start": 611.12, "text": "the"}, {"end": 615.0, "start": 611.68, "text": "users."}, {"end": 615.92, "start": 615.0, "text": "So"}, {"end": 616.76, "start": 615.92, "text": "here,"}, {"end": 617.24, "start": 616.76, "text": "I'm"}, {"end": 617.4, "start": 617.24, "text": "going"}, {"end": 617.56, "start": 617.4, "text": "to"}, {"end": 617.84, "start": 617.56, "text": "try"}, {"end": 618.48, "start": 617.84, "text": "share"}, {"end": 619.24, "start": 618.48, "text": "some"}, {"end": 619.68, "start": 619.24, "text": "of"}, {"end": 620.04, "start": 619.68, "text": "the"}, {"end": 620.32, "start": 620.04, "text": "work"}, {"end": 620.48, "start": 620.32, "text": "we"}, {"end": 620.56, "start": 620.48, "text": "did"}, {"end": 620.88, "start": 620.56, "text": "in"}, {"end": 621.36, "start": 620.88, "text": "this"}, 
{"end": 622.32, "start": 621.36, "text": "domain."}, {"end": 622.88, "start": 622.32, "text": "I'm"}, {"end": 623.8, "start": 622.88, "text": "trying"}, {"end": 624.16, "start": 623.8, "text": "to"}, {"end": 624.68, "start": 624.16, "text": "address"}, {"end": 625.0, "start": 624.68, "text": "some"}, {"end": 625.48, "start": 625.0, "text": "of"}, {"end": 625.64, "start": 625.48, "text": "the"}, {"end": 626.28, "start": 625.64, "text": "challenges"}, {"end": 626.48, "start": 626.28, "text": "we"}, {"end": 627.28, "start": 626.48, "text": "have."}, {"end": 627.52, "start": 627.28, "text": "So"}, {"end": 627.72, "start": 627.52, "text": "I'm"}, {"end": 627.88, "start": 627.72, "text": "going"}, {"end": 627.88, "start": 627.88, "text": "to"}, {"end": 628.4, "start": 627.88, "text": "try"}, {"end": 628.48, "start": 628.4, "text": "to"}, {"end": 628.76, "start": 628.48, "text": "stay"}, {"end": 629.08, "start": 628.76, "text": "at"}, {"end": 629.44, "start": 629.08, "text": "a"}, {"end": 629.88, "start": 629.44, "text": "relative"}, {"end": 629.92, "start": 629.88, "text": "high"}, {"end": 630.0, "start": 629.92, "text": "level"}], "text": " from the activities they have on the platform. And fifth, of course, we have very noisy and sparse reward signals coming from the users. So here, I'm going to try share some of the work we did in this domain. I'm trying to address some of the challenges we have. 
So I'm going to try to stay at a relative high level"}, {"chunks": [{"end": 631.04, "start": 630.0, "text": "high"}, {"end": 631.28, "start": 631.04, "text": "level,"}, {"end": 631.64, "start": 631.28, "text": "but"}, {"end": 631.72, "start": 631.64, "text": "if"}, {"end": 631.76, "start": 631.72, "text": "you"}, {"end": 632.08, "start": 631.76, "text": "want"}, {"end": 632.16, "start": 632.08, "text": "to"}, {"end": 632.4, "start": 632.16, "text": "know"}, {"end": 632.8, "start": 632.4, "text": "more"}, {"end": 633.2, "start": 632.8, "text": "details"}, {"end": 633.72, "start": 633.2, "text": "of"}, {"end": 634.12, "start": 633.72, "text": "this"}, {"end": 634.72, "start": 634.12, "text": "reinforced"}, {"end": 635.76, "start": 634.72, "text": "recommender,"}, {"end": 636.04, "start": 635.76, "text": "please"}, {"end": 636.28, "start": 636.04, "text": "come"}, {"end": 636.84, "start": 636.28, "text": "to"}, {"end": 637.2, "start": 636.84, "text": "our"}, {"end": 637.4, "start": 637.2, "text": "talk"}, {"end": 638.04, "start": 637.4, "text": "on"}, {"end": 640.28, "start": 638.04, "text": "Wednesday."}, {"end": 640.8, "start": 640.28, "text": "So"}, {"end": 641.44, "start": 640.8, "text": "YouTube"}, {"end": 642.2, "start": 641.44, "text": "as"}, {"end": 642.76, "start": 642.2, "text": "one"}, {"end": 643.2, "start": 642.76, "text": "of"}, {"end": 643.28, "start": 643.2, "text": "the"}, {"end": 643.96, "start": 643.28, "text": "largest"}, {"end": 644.52, "start": 643.96, "text": "video"}, {"end": 645.12, "start": 644.52, "text": "hosting"}, {"end": 645.96, "start": 645.12, "text": "platform"}, {"end": 646.2, "start": 645.96, "text": "has"}, {"end": 646.48, "start": 646.2, "text": "been"}, {"end": 647.0, "start": 646.48, "text": "attracting"}, {"end": 647.36, "start": 647.0, "text": "like"}, {"end": 647.84, "start": 647.36, "text": "billions"}, {"end": 648.2, "start": 647.84, "text": "of"}, {"end": 648.88, "start": 648.2, "text": "users"}, {"end": 649.08, "start": 
648.88, "text": "and"}, {"end": 650.8, "start": 649.08, "text": "creators."}, {"end": 651.2, "start": 650.8, "text": "And"}, {"end": 651.72, "start": 651.2, "text": "so"}, {"end": 651.88, "start": 651.72, "text": "people"}, {"end": 652.0, "start": 651.88, "text": "do"}, {"end": 653.48, "start": 652.0, "text": "spend"}, {"end": 653.76, "start": 653.48, "text": "a"}, {"end": 654.2, "start": 653.76, "text": "lot"}, {"end": 654.92, "start": 654.2, "text": "of"}, {"end": 655.44, "start": 654.92, "text": "time"}, {"end": 655.6, "start": 655.44, "text": "on"}, {"end": 655.92, "start": 655.6, "text": "this"}, {"end": 656.56, "start": 655.92, "text": "platform."}, {"end": 656.92, "start": 656.56, "text": "So"}, {"end": 657.08, "start": 656.92, "text": "it"}, {"end": 657.2, "start": 657.08, "text": "was"}, {"end": 657.6, "start": 657.2, "text": "reported"}, {"end": 658.08, "start": 657.6, "text": "in"}, {"end": 659.96, "start": 658.08, "text": "2017"}], "text": " high level, but if you want to know more details of this reinforced recommender, please come to our talk on Wednesday. So YouTube as one of the largest video hosting platform has been attracting like billions of users and creators. And so people do spend a lot of time on this platform. 
So it was reported in 2017"}, {"chunks": [{"end": 660.56, "start": 660.0, "text": "that"}, {"end": 660.72, "start": 660.56, "text": "the"}, {"end": 662.28, "start": 660.72, "text": "viewship"}, {"end": 662.36, "start": 662.28, "text": "in"}, {"end": 662.6, "start": 662.36, "text": "YouTube"}, {"end": 662.84, "start": 662.6, "text": "is"}, {"end": 663.24, "start": 662.84, "text": "actually"}, {"end": 664.2, "start": 663.24, "text": "surpassing"}, {"end": 664.64, "start": 664.2, "text": "the"}, {"end": 665.04, "start": 664.64, "text": "time"}, {"end": 665.8, "start": 665.04, "text": "people"}, {"end": 666.44, "start": 665.8, "text": "spend"}, {"end": 666.84, "start": 666.44, "text": "in"}, {"end": 668.36, "start": 666.84, "text": "TVs."}, {"end": 669.76, "start": 668.36, "text": "And"}, {"end": 670.04, "start": 669.76, "text": "actually,"}, {"end": 670.2, "start": 670.04, "text": "most"}, {"end": 670.8, "start": 670.2, "text": "of"}, {"end": 671.2, "start": 670.8, "text": "the"}, {"end": 671.68, "start": 671.2, "text": "increase"}, {"end": 671.92, "start": 671.68, "text": "in"}, {"end": 673.04, "start": 671.92, "text": "viewship"}, {"end": 674.36, "start": 673.04, "text": "was"}, {"end": 674.88, "start": 674.36, "text": "caused"}, {"end": 675.24, "start": 674.88, "text": "by"}, {"end": 675.64, "start": 675.24, "text": "using"}, {"end": 675.88, "start": 675.64, "text": "these"}, {"end": 676.48, "start": 675.88, "text": "personalization"}, {"end": 677.04, "start": 676.48, "text": "algorithms,"}, {"end": 677.24, "start": 677.04, "text": "in"}, {"end": 677.6, "start": 677.24, "text": "other"}, {"end": 678.0, "start": 677.6, "text": "words,"}, {"end": 678.08, "start": 678.0, "text": "the"}, {"end": 678.8, "start": 678.08, "text": "recommender"}, {"end": 681.28, "start": 678.8, "text": "systems."}, {"end": 681.88, "start": 681.28, "text": "So"}, {"end": 682.32, "start": 681.88, "text": "the"}, {"end": 683.52, "start": 682.32, "text": "recommender"}, {"end": 683.92, 
"start": 683.52, "text": "system"}, {"end": 684.44, "start": 683.92, "text": "that's"}, {"end": 685.64, "start": 684.44, "text": "powering"}, {"end": 685.96, "start": 685.64, "text": "the"}, {"end": 686.36, "start": 685.96, "text": "YouTube"}, {"end": 686.56, "start": 686.36, "text": "video"}, {"end": 687.52, "start": 686.56, "text": "recommendation"}, {"end": 687.84, "start": 687.52, "text": "is"}, {"end": 688.0, "start": 687.84, "text": "a"}, {"end": 689.32, "start": 688.0, "text": "multi-stage"}, {"end": 690.0, "start": 689.32, "text": "recommender."}], "text": " that the viewship in YouTube is actually surpassing the time people spend in TVs. And actually, most of the increase in viewship was caused by using these personalization algorithms, in other words, the recommender systems. So the recommender system that's powering the YouTube video recommendation is a multi-stage recommender."}, {"chunks": [{"end": 690.68, "start": 690.0, "text": "So"}, {"end": 691.2, "start": 690.68, "text": "it"}, {"end": 692.12, "start": 691.2, "text": "goes"}, {"end": 693.0, "start": 692.12, "text": "from"}, {"end": 693.4, "start": 693.0, "text": "a"}, {"end": 693.56, "start": 693.4, "text": "video"}, {"end": 694.6, "start": 693.56, "text": "corpus"}, {"end": 695.24, "start": 694.6, "text": "of"}, {"end": 695.96, "start": 695.24, "text": "billions"}, {"end": 696.96, "start": 695.96, "text": "of"}, {"end": 697.4, "start": 696.96, "text": "videos"}, {"end": 697.72, "start": 697.4, "text": "and"}, {"end": 698.12, "start": 697.72, "text": "tries"}, {"end": 698.4, "start": 698.12, "text": "to"}, {"end": 699.04, "start": 698.4, "text": "select"}, {"end": 699.44, "start": 699.04, "text": "a"}, {"end": 699.72, "start": 699.44, "text": "few"}, {"end": 700.16, "start": 699.72, "text": "dozens"}, {"end": 700.28, "start": 700.16, "text": "of"}, {"end": 701.2, "start": 700.28, "text": "videos"}, {"end": 702.04, "start": 701.2, "text": "for"}, {"end": 702.56, "start": 702.04, "text": "users"}, 
{"end": 702.6, "start": 702.56, "text": "to"}, {"end": 703.6, "start": 702.6, "text": "consume"}, {"end": 704.04, "start": 703.6, "text": "for"}, {"end": 704.28, "start": 704.04, "text": "each"}, {"end": 704.68, "start": 704.28, "text": "user"}, {"end": 705.12, "start": 704.68, "text": "request."}, {"end": 706.88, "start": 705.12, "text": "So"}, {"end": 707.2, "start": 706.88, "text": "in"}, {"end": 707.56, "start": 707.2, "text": "this"}, {"end": 707.96, "start": 707.56, "text": "work,"}, {"end": 708.2, "start": 707.96, "text": "we"}, {"end": 708.96, "start": 708.2, "text": "focused"}, {"end": 709.24, "start": 708.96, "text": "on"}, {"end": 709.44, "start": 709.24, "text": "the"}, {"end": 710.04, "start": 709.44, "text": "candidate"}, {"end": 710.76, "start": 710.04, "text": "generation"}, {"end": 711.52, "start": 710.76, "text": "part,"}, {"end": 711.76, "start": 711.52, "text": "where"}, {"end": 711.96, "start": 711.76, "text": "its"}, {"end": 712.36, "start": 711.96, "text": "goal"}, {"end": 712.6, "start": 712.36, "text": "is"}, {"end": 713.28, "start": 712.6, "text": "to"}, {"end": 714.08, "start": 713.28, "text": "go"}, {"end": 714.36, "start": 714.08, "text": "in"}, {"end": 715.04, "start": 714.36, "text": "from"}, {"end": 715.72, "start": 715.04, "text": "the"}, {"end": 715.92, "start": 715.72, "text": "video"}, {"end": 716.68, "start": 715.92, "text": "corpus"}, {"end": 716.88, "start": 716.68, "text": "of"}, {"end": 717.2, "start": 716.88, "text": "billions"}, {"end": 717.44, "start": 717.2, "text": "of"}, {"end": 718.32, "start": 717.44, "text": "videos"}, {"end": 718.52, "start": 718.32, "text": "and"}, {"end": 718.84, "start": 718.52, "text": "narrow"}, {"end": 718.92, "start": 718.84, "text": "it"}, {"end": 719.28, "start": 718.92, "text": "down"}, {"end": 719.28, "start": 719.28, "text": "to"}, {"end": 719.28, "start": 719.28, "text": "a"}, {"end": 719.28, "start": 719.28, "text": "few"}, {"end": 719.68, "start": 719.28, "text": "hundred"}, {"end": 
719.96, "start": 719.68, "text": "videos."}], "text": " So it goes from a video corpus of billions of videos and tries to select a few dozens of videos for users to consume for each user request. So in this work, we focused on the candidate generation part, where its goal is to go in from the video corpus of billions of videos and narrow it down to a few hundred videos."}, {"chunks": [{"end": 720.56, "start": 720.0, "text": "under"}, {"end": 721.56, "start": 720.56, "text": "most"}, {"end": 721.84, "start": 721.56, "text": "relevant"}, {"end": 721.84, "start": 721.84, "text": "to"}, {"end": 722.56, "start": 721.84, "text": "pass"}, {"end": 722.84, "start": 722.56, "text": "to"}, {"end": 722.92, "start": 722.84, "text": "the"}, {"end": 723.48, "start": 722.92, "text": "second"}, {"end": 724.56, "start": 723.48, "text": "stage."}, {"end": 724.68, "start": 724.56, "text": "So"}, {"end": 725.12, "start": 724.68, "text": "there"}, {"end": 725.2, "start": 725.12, "text": "are"}, {"end": 725.2, "start": 725.2, "text": "a"}, {"end": 725.2, "start": 725.2, "text": "few"}, {"end": 725.8, "start": 725.2, "text": "challenges"}, {"end": 726.0, "start": 725.8, "text": "we"}, {"end": 726.2, "start": 726.0, "text": "have"}, {"end": 726.48, "start": 726.2, "text": "to"}, {"end": 727.08, "start": 726.48, "text": "address"}, {"end": 727.28, "start": 727.08, "text": "in"}, {"end": 727.56, "start": 727.28, "text": "building"}, {"end": 727.8, "start": 727.56, "text": "this"}, {"end": 728.8, "start": 727.8, "text": "recommender."}, {"end": 729.76, "start": 728.8, "text": "So"}, {"end": 729.84, "start": 729.76, "text": "the"}, {"end": 730.56, "start": 729.84, "text": "recommender"}, {"end": 730.8, "start": 730.56, "text": "has"}, {"end": 731.04, "start": 730.8, "text": "to"}, {"end": 731.88, "start": 731.04, "text": "accommodate"}, {"end": 732.24, "start": 731.88, "text": "billions"}, {"end": 732.4, "start": 732.24, "text": "of"}, {"end": 733.36, "start": 732.4, "text": "users"}, {"end": 
733.84, "start": 733.36, "text": "whose"}, {"end": 735.0, "start": 733.84, "text": "interests"}, {"end": 735.28, "start": 735.0, "text": "are"}, {"end": 736.08, "start": 735.28, "text": "constantly"}, {"end": 736.28, "start": 736.08, "text": "shifting"}, {"end": 736.48, "start": 736.28, "text": "as"}, {"end": 736.76, "start": 736.48, "text": "they"}, {"end": 737.24, "start": 736.76, "text": "interact"}, {"end": 737.6, "start": 737.24, "text": "with"}, {"end": 737.92, "start": 737.6, "text": "the"}, {"end": 738.92, "start": 737.92, "text": "system."}, {"end": 740.44, "start": 738.92, "text": "And"}, {"end": 741.24, "start": 740.44, "text": "actually"}, {"end": 741.48, "start": 741.24, "text": "the"}, {"end": 742.0, "start": 741.48, "text": "platform"}, {"end": 742.36, "start": 742.0, "text": "hosts"}, {"end": 742.92, "start": 742.36, "text": "billions"}, {"end": 743.36, "start": 742.92, "text": "of"}, {"end": 743.96, "start": 743.36, "text": "videos"}, {"end": 744.32, "start": 743.96, "text": "which"}, {"end": 744.68, "start": 744.32, "text": "follows"}, {"end": 744.84, "start": 744.68, "text": "a"}, {"end": 745.04, "start": 744.84, "text": "long"}, {"end": 745.28, "start": 745.04, "text": "tail"}, {"end": 746.24, "start": 745.28, "text": "distribution,"}, {"end": 746.4, "start": 746.24, "text": "which"}, {"end": 746.68, "start": 746.4, "text": "means"}, {"end": 747.08, "start": 746.68, "text": "there"}, {"end": 748.04, "start": 747.08, "text": "are"}, {"end": 748.16, "start": 748.04, "text": "a"}, {"end": 748.56, "start": 748.16, "text": "lot"}, {"end": 748.68, "start": 748.56, "text": "of"}, {"end": 749.44, "start": 748.68, "text": "videos"}, {"end": 749.6, "start": 749.44, "text": "which"}, {"end": 749.96, "start": 749.6, "text": "are"}], "text": " under most relevant to pass to the second stage. So there are a few challenges we have to address in building this recommender. 
So the recommender has to accommodate billions of users whose interests are constantly shifting as they interact with the system. And actually the platform hosts billions of videos which follows a long tail distribution, which means there are a lot of videos which are"}, {"chunks": [{"end": 750.36, "start": 750.0, "text": "not"}, {"end": 751.16, "start": 750.36, "text": "receiving"}, {"end": 752.48, "start": 751.16, "text": "tons"}, {"end": 752.76, "start": 752.48, "text": "of"}, {"end": 753.24, "start": 752.76, "text": "views,"}, {"end": 753.24, "start": 753.24, "text": "but"}, {"end": 753.24, "start": 753.24, "text": "then"}, {"end": 753.72, "start": 753.24, "text": "they"}, {"end": 754.16, "start": 753.72, "text": "are"}, {"end": 754.36, "start": 754.16, "text": "still"}, {"end": 754.56, "start": 754.36, "text": "very"}, {"end": 755.44, "start": 754.56, "text": "relevant"}, {"end": 755.48, "start": 755.44, "text": "to"}, {"end": 755.52, "start": 755.48, "text": "a"}, {"end": 756.36, "start": 755.52, "text": "small"}, {"end": 756.92, "start": 756.36, "text": "group"}, {"end": 757.16, "start": 756.92, "text": "of"}, {"end": 757.84, "start": 757.16, "text": "users"}, {"end": 758.12, "start": 757.84, "text": "and"}, {"end": 758.8, "start": 758.12, "text": "we"}, {"end": 759.0, "start": 758.8, "text": "want"}, {"end": 759.64, "start": 759.0, "text": "to"}, {"end": 759.88, "start": 759.64, "text": "be"}, {"end": 760.64, "start": 759.88, "text": "able"}, {"end": 761.28, "start": 760.64, "text": "to"}, {"end": 761.72, "start": 761.28, "text": "recommend"}, {"end": 762.56, "start": 761.72, "text": "them."}, {"end": 762.88, "start": 762.56, "text": "And"}, {"end": 763.28, "start": 762.88, "text": "a"}, {"end": 764.64, "start": 763.28, "text": "combination"}, {"end": 765.36, "start": 764.64, "text": "of"}, {"end": 767.08, "start": 765.36, "text": "both"}, {"end": 768.16, "start": 767.08, "text": "means"}, {"end": 768.64, "start": 768.16, "text": "that"}, {"end": 769.0, 
"start": 768.64, "text": "we'll"}, {"end": 769.48, "start": 769.0, "text": "have"}, {"end": 769.76, "start": 769.48, "text": "a"}, {"end": 770.36, "start": 769.76, "text": "sparse"}, {"end": 771.44, "start": 770.36, "text": "and"}, {"end": 771.88, "start": 771.44, "text": "noisy"}, {"end": 772.36, "start": 771.88, "text": "user"}, {"end": 772.52, "start": 772.36, "text": "feedback."}, {"end": 773.12, "start": 772.52, "text": "So"}, {"end": 773.84, "start": 773.12, "text": "how"}, {"end": 774.16, "start": 773.84, "text": "do"}, {"end": 774.2, "start": 774.16, "text": "we"}, {"end": 774.64, "start": 774.2, "text": "convert"}, {"end": 774.92, "start": 774.64, "text": "this"}, {"end": 776.44, "start": 774.92, "text": "candidate"}, {"end": 777.2, "start": 776.44, "text": "generator"}, {"end": 778.2, "start": 777.2, "text": "recommender"}, {"end": 778.52, "start": 778.2, "text": "into"}, {"end": 778.72, "start": 778.52, "text": "a"}, {"end": 779.56, "start": 778.72, "text": "reinforcement"}, {"end": 779.88, "start": 779.56, "text": "learning"}, {"end": 780.0, "start": 779.88, "text": "one?"}], "text": " not receiving tons of views, but then they are still very relevant to a small group of users and we want to be able to recommend them. And a combination of both means that we'll have a sparse and noisy user feedback. 
So how do we convert this candidate generator recommender into a reinforcement learning one?"}, {"chunks": [{"end": 780.8, "start": 780.0, "text": "actions"}, {"end": 781.12, "start": 780.8, "text": "in"}, {"end": 781.4, "start": 781.12, "text": "some"}, {"end": 781.96, "start": 781.4, "text": "environments"}, {"end": 782.16, "start": 781.96, "text": "so"}, {"end": 782.4, "start": 782.16, "text": "as"}, {"end": 782.68, "start": 782.4, "text": "to"}, {"end": 783.8, "start": 782.68, "text": "maximize"}, {"end": 784.04, "start": 783.8, "text": "some"}, {"end": 784.4, "start": 784.04, "text": "notion"}, {"end": 784.8, "start": 784.4, "text": "of"}, {"end": 785.6, "start": 784.8, "text": "cumulative"}, {"end": 788.08, "start": 785.6, "text": "reward."}, {"end": 788.56, "start": 788.08, "text": "In"}, {"end": 788.88, "start": 788.56, "text": "this"}, {"end": 789.44, "start": 788.88, "text": "case,"}, {"end": 789.64, "start": 789.44, "text": "the"}, {"end": 790.0, "start": 789.64, "text": "agent"}, {"end": 790.2, "start": 790.0, "text": "we"}, {"end": 790.32, "start": 790.2, "text": "are"}, {"end": 790.76, "start": 790.32, "text": "trying"}, {"end": 790.8, "start": 790.76, "text": "to"}, {"end": 791.16, "start": 790.8, "text": "build"}, {"end": 791.52, "start": 791.16, "text": "is"}, {"end": 791.76, "start": 791.52, "text": "this"}, {"end": 792.24, "start": 791.76, "text": "candidate"}, {"end": 794.08, "start": 792.24, "text": "generator"}, {"end": 794.44, "start": 794.08, "text": "and"}, {"end": 794.72, "start": 794.44, "text": "the"}, {"end": 795.48, "start": 794.72, "text": "environment"}, {"end": 795.96, "start": 795.48, "text": "which"}, {"end": 796.32, "start": 795.96, "text": "is"}, {"end": 797.6, "start": 796.32, "text": "captured"}, {"end": 798.24, "start": 797.6, "text": "by"}, {"end": 798.96, "start": 798.24, "text": "the"}, {"end": 799.08, "start": 798.96, "text": "state,"}, {"end": 799.64, "start": 799.08, "text": "the"}, {"end": 799.92, "start": 799.64, 
"text": "state"}, {"end": 800.68, "start": 799.92, "text": "transition"}, {"end": 801.12, "start": 800.68, "text": "and"}, {"end": 801.48, "start": 801.12, "text": "a"}, {"end": 801.92, "start": 801.48, "text": "reward"}, {"end": 803.04, "start": 801.92, "text": "function."}, {"end": 803.2, "start": 803.04, "text": "So"}, {"end": 803.44, "start": 803.2, "text": "the"}, {"end": 803.92, "start": 803.44, "text": "state"}, {"end": 804.16, "start": 803.92, "text": "here"}, {"end": 804.36, "start": 804.16, "text": "should"}, {"end": 804.72, "start": 804.36, "text": "basically"}, {"end": 805.4, "start": 804.72, "text": "capture"}, {"end": 805.64, "start": 805.4, "text": "the"}, {"end": 806.12, "start": 805.64, "text": "user's"}, {"end": 806.68, "start": 806.12, "text": "interest"}, {"end": 807.16, "start": 806.68, "text": "as"}, {"end": 807.52, "start": 807.16, "text": "well"}, {"end": 808.0, "start": 807.52, "text": "as"}, {"end": 808.36, "start": 808.0, "text": "the"}, {"end": 808.88, "start": 808.36, "text": "recommendation"}, {"end": 809.96, "start": 808.88, "text": "context."}], "text": " actions in some environments so as to maximize some notion of cumulative reward. In this case, the agent we are trying to build is this candidate generator and the environment which is captured by the state, the state transition and a reward function. 
So the state here should basically capture the user's interest as well as the recommendation context."}, {"chunks": [{"end": 810.56, "start": 810.0, "text": "And"}, {"end": 811.0, "start": 810.56, "text": "the"}, {"end": 811.72, "start": 811.0, "text": "reward"}, {"end": 811.96, "start": 811.72, "text": "should"}, {"end": 812.96, "start": 811.96, "text": "capture"}, {"end": 813.16, "start": 812.96, "text": "a"}, {"end": 813.68, "start": 813.16, "text": "user's"}, {"end": 814.84, "start": 813.68, "text": "long-term"}, {"end": 815.8, "start": 814.84, "text": "engagement"}, {"end": 816.0, "start": 815.8, "text": "or"}, {"end": 817.08, "start": 816.0, "text": "satisfaction"}, {"end": 817.36, "start": 817.08, "text": "with"}, {"end": 817.56, "start": 817.36, "text": "the"}, {"end": 817.92, "start": 817.56, "text": "platform"}, {"end": 818.32, "start": 817.92, "text": "or"}, {"end": 818.48, "start": 818.32, "text": "the"}, {"end": 820.92, "start": 818.48, "text": "recommendations."}, {"end": 821.32, "start": 820.92, "text": "And"}, {"end": 821.72, "start": 821.32, "text": "the"}, {"end": 822.4, "start": 821.72, "text": "action"}, {"end": 822.44, "start": 822.4, "text": "the"}, {"end": 822.72, "start": 822.44, "text": "agent"}, {"end": 822.92, "start": 822.72, "text": "can"}, {"end": 823.4, "start": 822.92, "text": "take"}, {"end": 823.68, "start": 823.4, "text": "is"}, {"end": 823.96, "start": 823.68, "text": "to"}, {"end": 825.0, "start": 823.96, "text": "nominate"}, {"end": 825.64, "start": 825.0, "text": "videos"}, {"end": 825.92, "start": 825.64, "text": "for"}, {"end": 825.96, "start": 825.92, "text": "a"}, {"end": 826.52, "start": 825.96, "text": "catalog"}, {"end": 826.92, "start": 826.52, "text": "of"}, {"end": 830.8, "start": 826.92, "text": "millions."}, {"end": 831.4, "start": 830.8, "text": "So"}, {"end": 831.4, "start": 831.4, "text": "in"}, {"end": 831.48, "start": 831.4, "text": "the"}, {"end": 832.04, "start": 831.48, "text": "following"}, {"end": 832.56, 
"start": 832.04, "text": "slides,"}, {"end": 833.04, "start": 832.56, "text": "I'm"}, {"end": 833.32, "start": 833.04, "text": "going"}, {"end": 833.32, "start": 833.32, "text": "to"}, {"end": 833.68, "start": 833.32, "text": "try"}, {"end": 833.96, "start": 833.68, "text": "to"}, {"end": 834.52, "start": 833.96, "text": "walk"}, {"end": 834.56, "start": 834.52, "text": "you"}, {"end": 834.84, "start": 834.56, "text": "through"}, {"end": 835.16, "start": 834.84, "text": "on"}, {"end": 835.8, "start": 835.16, "text": "how"}, {"end": 836.36, "start": 835.8, "text": "we"}, {"end": 837.0, "start": 836.36, "text": "build"}, {"end": 837.4, "start": 837.0, "text": "the"}, {"end": 837.76, "start": 837.4, "text": "state"}, {"end": 838.76, "start": 837.76, "text": "representation,"}, {"end": 839.08, "start": 838.76, "text": "how"}, {"end": 839.2, "start": 839.08, "text": "we"}, {"end": 839.4, "start": 839.2, "text": "come"}, {"end": 839.52, "start": 839.4, "text": "up"}, {"end": 839.52, "start": 839.52, "text": "with"}, {"end": 839.52, "start": 839.52, "text": "a"}, {"end": 840.0, "start": 839.52, "text": "reward."}], "text": " And the reward should capture a user's long-term engagement or satisfaction with the platform or the recommendations. And the action the agent can take is to nominate videos for a catalog of millions. 
So in the following slides, I'm going to try to walk you through on how we build the state representation, how we come up with a reward."}, {"chunks": [{"end": 840.32, "start": 840.0, "text": "and"}, {"end": 840.84, "start": 840.32, "text": "how"}, {"end": 840.84, "start": 840.84, "text": "we"}, {"end": 841.64, "start": 840.84, "text": "choose"}, {"end": 843.8, "start": 841.64, "text": "actions."}, {"end": 844.12, "start": 843.8, "text": "So"}, {"end": 844.28, "start": 844.12, "text": "the"}, {"end": 844.64, "start": 844.28, "text": "data"}, {"end": 844.96, "start": 844.64, "text": "source"}, {"end": 845.2, "start": 844.96, "text": "we"}, {"end": 845.52, "start": 845.2, "text": "use"}, {"end": 845.8, "start": 845.52, "text": "to"}, {"end": 846.0, "start": 845.8, "text": "build"}, {"end": 846.28, "start": 846.0, "text": "this"}, {"end": 847.28, "start": 846.28, "text": "agent"}, {"end": 847.56, "start": 847.28, "text": "is"}, {"end": 847.8, "start": 847.56, "text": "the"}, {"end": 848.12, "start": 847.8, "text": "user"}, {"end": 849.32, "start": 848.12, "text": "trajectory."}, {"end": 850.24, "start": 849.32, "text": "So"}, {"end": 850.48, "start": 850.24, "text": "for"}, {"end": 850.88, "start": 850.48, "text": "each"}, {"end": 851.52, "start": 850.88, "text": "user,"}, {"end": 852.24, "start": 851.52, "text": "before"}, {"end": 852.96, "start": 852.24, "text": "our"}, {"end": 853.52, "start": 852.96, "text": "recommendations,"}, {"end": 853.72, "start": 853.52, "text": "we"}, {"end": 854.2, "start": 853.72, "text": "have"}, {"end": 854.84, "start": 854.2, "text": "access"}, {"end": 855.6, "start": 854.84, "text": "to"}, {"end": 855.96, "start": 855.6, "text": "a"}, {"end": 856.64, "start": 855.96, "text": "sequence"}, {"end": 856.96, "start": 856.64, "text": "of"}, {"end": 857.36, "start": 856.96, "text": "user"}, {"end": 858.04, "start": 857.36, "text": "activities"}, {"end": 858.24, "start": 858.04, "text": "on"}, {"end": 858.32, "start": 858.24, "text": "the"}, 
{"end": 859.16, "start": 858.32, "text": "platform,"}, {"end": 859.32, "start": 859.16, "text": "which"}, {"end": 859.84, "start": 859.32, "text": "videos"}, {"end": 860.08, "start": 859.84, "text": "they"}, {"end": 860.6, "start": 860.08, "text": "watched,"}, {"end": 860.8, "start": 860.6, "text": "which"}, {"end": 861.64, "start": 860.8, "text": "one"}, {"end": 862.16, "start": 861.64, "text": "they"}, {"end": 862.68, "start": 862.16, "text": "liked,"}, {"end": 863.4, "start": 862.68, "text": "what"}, {"end": 864.04, "start": 863.4, "text": "their"}, {"end": 864.32, "start": 864.04, "text": "search"}, {"end": 864.96, "start": 864.32, "text": "query"}, {"end": 865.12, "start": 864.96, "text": "looks"}, {"end": 866.68, "start": 865.12, "text": "like."}, {"end": 867.16, "start": 866.68, "text": "And"}, {"end": 867.2, "start": 867.16, "text": "we"}, {"end": 868.04, "start": 867.2, "text": "call"}, {"end": 868.24, "start": 868.04, "text": "this"}, {"end": 868.8, "start": 868.24, "text": "part"}, {"end": 869.24, "start": 868.8, "text": "of"}, {"end": 869.24, "start": 869.24, "text": "the"}, {"end": 869.68, "start": 869.24, "text": "search"}, {"end": 869.96, "start": 869.68, "text": "query."}], "text": " and how we choose actions. So the data source we use to build this agent is the user trajectory. So for each user, before our recommendations, we have access to a sequence of user activities on the platform, which videos they watched, which one they liked, what their search query looks like. 
And we call this part of the search query."}, {"chunks": [{"end": 870.52, "start": 870.0, "text": "trajectory"}, {"end": 871.12, "start": 870.52, "text": "the"}, {"end": 871.72, "start": 871.12, "text": "sequential"}, {"end": 873.32, "start": 871.72, "text": "past."}, {"end": 873.68, "start": 873.32, "text": "And"}, {"end": 874.36, "start": 873.68, "text": "after"}, {"end": 874.64, "start": 874.36, "text": "our"}, {"end": 874.76, "start": 874.64, "text": "recommendation,"}, {"end": 875.16, "start": 874.76, "text": "we"}, {"end": 875.52, "start": 875.16, "text": "also"}, {"end": 875.76, "start": 875.52, "text": "have"}, {"end": 876.44, "start": 875.76, "text": "access"}, {"end": 877.2, "start": 876.44, "text": "to"}, {"end": 877.68, "start": 877.2, "text": "the"}, {"end": 878.2, "start": 877.68, "text": "user's"}, {"end": 879.08, "start": 878.2, "text": "feedback:"}, {"end": 879.16, "start": 879.08, "text": "do"}, {"end": 879.2, "start": 879.16, "text": "they"}, {"end": 879.44, "start": 879.2, "text": "like"}, {"end": 880.08, "start": 879.44, "text": "our"}, {"end": 881.76, "start": 880.08, "text": "recommendations,"}, {"end": 882.04, "start": 881.76, "text": "which"}, {"end": 882.2, "start": 882.04, "text": "video"}, {"end": 882.44, "start": 882.2, "text": "they"}, {"end": 883.04, "start": 882.44, "text": "watched"}, {"end": 884.2, "start": 883.04, "text": "after,"}, {"end": 884.68, "start": 884.2, "text": "which"}, {"end": 885.16, "start": 884.68, "text": "videos"}, {"end": 885.36, "start": 885.16, "text": "they"}, {"end": 885.8, "start": 885.36, "text": "like,"}, {"end": 886.36, "start": 885.8, "text": "what's"}, {"end": 886.68, "start": 886.36, "text": "their"}, {"end": 887.84, "start": 886.68, "text": "comment."}, {"end": 888.08, "start": 887.84, "text": "And"}, {"end": 888.32, "start": 888.08, "text": "these"}, {"end": 888.48, "start": 888.32, "text": "we"}, {"end": 888.96, "start": 888.48, "text": "call"}, {"end": 889.2, "start": 888.96, "text": "the"}, {"end": 
889.6, "start": 889.2, "text": "sequential"}, {"end": 890.04, "start": 889.6, "text": "future."}, {"end": 890.76, "start": 890.04, "text": "And"}, {"end": 890.92, "start": 890.76, "text": "in"}, {"end": 891.44, "start": 890.92, "text": "the"}, {"end": 892.56, "start": 891.44, "text": "following,"}, {"end": 893.2, "start": 892.56, "text": "I'm"}, {"end": 894.52, "start": 893.2, "text": "going"}, {"end": 894.88, "start": 894.52, "text": "to"}, {"end": 895.48, "start": 894.88, "text": "discuss"}, {"end": 895.64, "start": 895.48, "text": "so"}, {"end": 896.08, "start": 895.64, "text": "how"}, {"end": 896.2, "start": 896.08, "text": "we"}, {"end": 896.4, "start": 896.2, "text": "use"}, {"end": 896.56, "start": 896.4, "text": "the"}, {"end": 897.04, "start": 896.56, "text": "sequential"}, {"end": 897.56, "start": 897.04, "text": "past"}, {"end": 897.72, "start": 897.56, "text": "to"}, {"end": 898.0, "start": 897.72, "text": "come"}, {"end": 898.2, "start": 898.0, "text": "up"}, {"end": 898.6, "start": 898.2, "text": "with"}, {"end": 899.16, "start": 898.6, "text": "our"}, {"end": 899.52, "start": 899.16, "text": "belief"}, {"end": 899.8, "start": 899.52, "text": "of"}, {"end": 900.0, "start": 899.8, "text": "the"}], "text": " trajectory the sequential past. And after our recommendation, we also have access to the user's feedback: do they like our recommendations, which video they watched after, which videos they like, what's their comment. And these we call the sequential future. And in the following, I'm going to discuss so how we use the sequential past to come up with our belief of the"}, {"chunks": [{"end": 900.36, "start": 900.0, "text": "user"}, {"end": 900.8, "start": 900.36, "text": "state"}, {"end": 901.2, "start": 900.8, "text": "and"}, {"end": 901.72, "start": 901.2, "text": "how"}, {"end": 901.8, "start": 901.72, "text": "we"}, {"end": 901.96, "start": 901.8, "text": "use"}, {"end": 902.08, "start": 901.96, "text": "the"}, {"end": 902.68, "start": 902.08, "text": 
"sequential"}, {"end": 903.24, "start": 902.68, "text": "future"}, {"end": 903.6, "start": 903.24, "text": "to"}, {"end": 903.92, "start": 903.6, "text": "come"}, {"end": 903.92, "start": 903.92, "text": "up"}, {"end": 903.96, "start": 903.92, "text": "with"}, {"end": 904.36, "start": 903.96, "text": "the"}, {"end": 905.32, "start": 904.36, "text": "reward."}, {"end": 905.84, "start": 905.32, "text": "So"}, {"end": 908.24, "start": 905.84, "text": "in"}, {"end": 909.52, "start": 908.24, "text": "building"}, {"end": 909.88, "start": 909.52, "text": "up"}, {"end": 910.16, "start": 909.88, "text": "the"}, {"end": 910.52, "start": 910.16, "text": "state"}, {"end": 911.16, "start": 910.52, "text": "representation,"}, {"end": 911.44, "start": 911.16, "text": "as"}, {"end": 911.84, "start": 911.44, "text": "said,"}, {"end": 912.08, "start": 911.84, "text": "one"}, {"end": 912.28, "start": 912.08, "text": "of"}, {"end": 912.48, "start": 912.28, "text": "the"}, {"end": 913.16, "start": 912.48, "text": "challenges"}, {"end": 913.16, "start": 913.16, "text": "is"}, {"end": 913.32, "start": 913.16, "text": "the"}, {"end": 914.0, "start": 913.32, "text": "partial"}, {"end": 914.88, "start": 914.0, "text": "observability."}, {"end": 915.04, "start": 914.88, "text": "So"}, {"end": 915.32, "start": 915.04, "text": "the"}, {"end": 915.76, "start": 915.32, "text": "user"}, {"end": 915.92, "start": 915.76, "text": "do"}, {"end": 916.36, "start": 915.92, "text": "not"}, {"end": 916.68, "start": 916.36, "text": "tell"}, {"end": 916.96, "start": 916.68, "text": "us"}, {"end": 917.6, "start": 916.96, "text": "about"}, {"end": 918.32, "start": 917.6, "text": "their"}, {"end": 918.76, "start": 918.32, "text": "interest"}, {"end": 919.24, "start": 918.76, "text": "groups"}, {"end": 919.24, "start": 919.24, "text": "or"}, {"end": 919.24, "start": 919.24, "text": "how"}, {"end": 919.24, "start": 919.24, "text": "happy"}, {"end": 919.56, "start": 919.24, "text": "they"}, {"end": 920.28, 
"start": 919.56, "text": "are"}, {"end": 920.32, "start": 920.28, "text": "with"}, {"end": 920.92, "start": 920.32, "text": "our"}, {"end": 922.32, "start": 920.92, "text": "recommendations."}, {"end": 922.88, "start": 922.32, "text": "What"}, {"end": 923.2, "start": 922.88, "text": "we"}, {"end": 923.44, "start": 923.2, "text": "did"}, {"end": 923.72, "start": 923.44, "text": "is"}, {"end": 924.08, "start": 923.72, "text": "that"}, {"end": 924.56, "start": 924.08, "text": "we"}, {"end": 925.08, "start": 924.56, "text": "based"}, {"end": 925.28, "start": 925.08, "text": "on"}, {"end": 925.32, "start": 925.28, "text": "the"}, {"end": 925.92, "start": 925.32, "text": "user"}, {"end": 926.96, "start": 925.92, "text": "activity"}, {"end": 927.4, "start": 926.96, "text": "on"}, {"end": 927.76, "start": 927.4, "text": "the"}, {"end": 928.16, "start": 927.76, "text": "platform"}, {"end": 928.68, "start": 928.16, "text": "and"}, {"end": 929.08, "start": 928.68, "text": "try"}, {"end": 929.28, "start": 929.08, "text": "to"}, {"end": 929.6, "start": 929.28, "text": "figure"}, {"end": 929.96, "start": 929.6, "text": "out"}], "text": " user state and how we use the sequential future to come up with the reward. So in building up the state representation, as said, one of the challenges is the partial observability. So the user do not tell us about their interest groups or how happy they are with our recommendations. 
What we did is that we based on the user activity on the platform and try to figure out"}, {"chunks": [{"end": 930.16, "start": 930.0, "text": "figure"}, {"end": 930.72, "start": 930.16, "text": "out"}, {"end": 931.88, "start": 930.72, "text": "what's"}, {"end": 933.12, "start": 931.88, "text": "their"}, {"end": 933.68, "start": 933.12, "text": "interest."}, {"end": 933.76, "start": 933.68, "text": "So"}, {"end": 933.8, "start": 933.76, "text": "what"}, {"end": 934.24, "start": 933.8, "text": "we"}, {"end": 934.76, "start": 934.24, "text": "did"}, {"end": 935.48, "start": 934.76, "text": "was"}, {"end": 935.76, "start": 935.48, "text": "we"}, {"end": 936.32, "start": 935.76, "text": "take"}, {"end": 937.16, "start": 936.32, "text": "the"}, {"end": 938.28, "start": 937.16, "text": "user"}, {"end": 939.24, "start": 938.28, "text": "watches"}, {"end": 939.64, "start": 939.24, "text": "that's"}, {"end": 939.96, "start": 939.64, "text": "right"}, {"end": 940.24, "start": 939.96, "text": "before"}, {"end": 940.88, "start": 940.24, "text": "our"}, {"end": 941.68, "start": 940.88, "text": "recommendation,"}, {"end": 942.16, "start": 941.68, "text": "we"}, {"end": 942.76, "start": 942.16, "text": "send"}, {"end": 943.12, "start": 942.76, "text": "them"}, {"end": 943.12, "start": 943.12, "text": "through"}, {"end": 943.12, "start": 943.12, "text": "a"}, {"end": 943.92, "start": 943.12, "text": "recurrent"}, {"end": 944.0, "start": 943.92, "text": "neural"}, {"end": 944.48, "start": 944.0, "text": "networks,"}, {"end": 944.8, "start": 944.48, "text": "and"}, {"end": 945.12, "start": 944.8, "text": "the"}, {"end": 945.72, "start": 945.12, "text": "RNN"}, {"end": 945.88, "start": 945.72, "text": "is"}, {"end": 946.16, "start": 945.88, "text": "going"}, {"end": 946.4, "start": 946.16, "text": "to"}, {"end": 946.96, "start": 946.4, "text": "aggregate"}, {"end": 947.32, "start": 946.96, "text": "these"}, {"end": 947.96, "start": 947.32, "text": "events"}, {"end": 948.2, "start": 
947.96, "text": "and"}, {"end": 948.64, "start": 948.2, "text": "produce"}, {"end": 948.8, "start": 948.64, "text": "a"}, {"end": 949.08, "start": 948.8, "text": "dense"}, {"end": 949.88, "start": 949.08, "text": "vector,"}, {"end": 950.04, "start": 949.88, "text": "which"}, {"end": 950.4, "start": 950.04, "text": "is"}, {"end": 950.88, "start": 950.4, "text": "our"}, {"end": 951.32, "start": 950.88, "text": "representation"}, {"end": 951.68, "start": 951.32, "text": "of"}, {"end": 952.56, "start": 951.68, "text": "the"}, {"end": 953.4, "start": 952.56, "text": "user"}, {"end": 953.64, "start": 953.4, "text": "state."}, {"end": 953.88, "start": 953.64, "text": "One"}, {"end": 953.88, "start": 953.88, "text": "thing"}, {"end": 956.0, "start": 953.88, "text": "that's"}, {"end": 956.68, "start": 956.0, "text": "worth"}, {"end": 957.2, "start": 956.68, "text": "noting"}, {"end": 957.44, "start": 957.2, "text": "is"}, {"end": 957.84, "start": 957.44, "text": "that"}, {"end": 958.44, "start": 957.84, "text": "actually"}, {"end": 958.64, "start": 958.44, "text": "the"}, {"end": 959.24, "start": 958.64, "text": "context"}, {"end": 959.36, "start": 959.24, "text": "of"}, {"end": 959.72, "start": 959.36, "text": "the"}, {"end": 959.96, "start": 959.72, "text": "recommendation"}], "text": " figure out what's their interest. So what we did was we take the user watches that's right before our recommendation, we send them through a recurrent neural networks, and the RNN is going to aggregate these events and produce a dense vector, which is our representation of the user state. 
One thing that's worth noting is that actually the context of the recommendation"}, {"chunks": [{"end": 960.72, "start": 960.0, "text": "matters"}, {"end": 960.92, "start": 960.72, "text": "a"}, {"end": 961.72, "start": 960.92, "text": "lot."}, {"end": 961.92, "start": 961.72, "text": "So"}, {"end": 962.2, "start": 961.92, "text": "the"}, {"end": 962.72, "start": 962.2, "text": "context"}, {"end": 962.96, "start": 962.72, "text": "of"}, {"end": 963.84, "start": 962.96, "text": "historical"}, {"end": 964.84, "start": 963.84, "text": "events"}, {"end": 965.08, "start": 964.84, "text": "is"}, {"end": 965.24, "start": 965.08, "text": "going"}, {"end": 965.44, "start": 965.24, "text": "to"}, {"end": 966.28, "start": 965.44, "text": "influence"}, {"end": 967.96, "start": 966.28, "text": "how"}, {"end": 968.84, "start": 967.96, "text": "much"}, {"end": 968.88, "start": 968.84, "text": "they"}, {"end": 969.12, "start": 968.88, "text": "will"}, {"end": 970.0, "start": 969.12, "text": "change"}, {"end": 970.24, "start": 970.0, "text": "the"}, {"end": 970.84, "start": 970.24, "text": "underlying"}, {"end": 971.16, "start": 970.84, "text": "user"}, {"end": 971.96, "start": 971.16, "text": "state"}, {"end": 972.48, "start": 971.96, "text": "for"}, {"end": 972.68, "start": 972.48, "text": "each"}, {"end": 973.92, "start": 972.68, "text": "event."}, {"end": 974.68, "start": 973.92, "text": "And"}, {"end": 975.52, "start": 974.68, "text": "the"}, {"end": 975.96, "start": 975.52, "text": "request"}, {"end": 976.6, "start": 975.96, "text": "context"}, {"end": 976.8, "start": 976.6, "text": "is"}, {"end": 977.24, "start": 976.8, "text": "also"}, {"end": 977.44, "start": 977.24, "text": "going"}, {"end": 977.56, "start": 977.44, "text": "to"}, {"end": 978.12, "start": 977.56, "text": "change"}, {"end": 978.72, "start": 978.12, "text": "our"}, {"end": 979.08, "start": 978.72, "text": "recommendation"}, {"end": 979.48, "start": 979.08, "text": "significantly."}, {"end": 980.32, "start": 
979.48, "text": "For"}, {"end": 981.12, "start": 980.32, "text": "example,"}, {"end": 981.36, "start": 981.12, "text": "we"}, {"end": 981.84, "start": 981.36, "text": "would"}, {"end": 983.28, "start": 981.84, "text": "probably"}, {"end": 983.52, "start": 983.28, "text": "want"}, {"end": 983.68, "start": 983.52, "text": "to"}, {"end": 984.4, "start": 983.68, "text": "recommend"}, {"end": 984.64, "start": 984.4, "text": "very"}, {"end": 984.64, "start": 984.64, "text": "different"}, {"end": 985.16, "start": 984.64, "text": "videos"}, {"end": 985.52, "start": 985.16, "text": "when"}, {"end": 985.56, "start": 985.52, "text": "the"}, {"end": 985.88, "start": 985.56, "text": "user"}, {"end": 986.48, "start": 985.88, "text": "are"}, {"end": 987.2, "start": 986.48, "text": "browsing"}, {"end": 987.56, "start": 987.2, "text": "on"}, {"end": 988.32, "start": 987.56, "text": "desktop"}, {"end": 988.92, "start": 988.32, "text": "or"}, {"end": 990.0, "start": 988.92, "text": "mobile."}], "text": " matters a lot. So the context of historical events is going to influence how much they will change the underlying user state for each event. And the request context is also going to change our recommendation significantly. 
For example, we would probably want to recommend very different videos when the user are browsing on desktop or mobile."}, {"chunks": [{"end": 993.84, "start": 990.0, "text": "mobile"}, {"end": 994.32, "start": 993.84, "text": "phones."}, {"end": 994.56, "start": 994.32, "text": "So"}, {"end": 994.8, "start": 994.56, "text": "we"}, {"end": 995.24, "start": 994.8, "text": "tested"}, {"end": 995.76, "start": 995.24, "text": "these"}, {"end": 996.32, "start": 995.76, "text": "ideas"}, {"end": 996.56, "start": 996.32, "text": "in"}, {"end": 997.44, "start": 996.56, "text": "multiple"}, {"end": 998.04, "start": 997.44, "text": "live"}, {"end": 999.24, "start": 998.04, "text": "experiments"}, {"end": 999.8, "start": 999.24, "text": "and"}, {"end": 1000.64, "start": 999.8, "text": "observed"}, {"end": 1000.88, "start": 1000.64, "text": "very"}, {"end": 1002.12, "start": 1000.88, "text": "significant"}, {"end": 1002.64, "start": 1002.12, "text": "improve"}, {"end": 1003.32, "start": 1002.64, "text": "on"}, {"end": 1003.68, "start": 1003.32, "text": "the"}, {"end": 1004.12, "start": 1003.68, "text": "main"}, {"end": 1004.76, "start": 1004.12, "text": "online"}, {"end": 1007.64, "start": 1004.76, "text": "metrics."}, {"end": 1008.04, "start": 1007.64, "text": "Now"}, {"end": 1008.52, "start": 1008.04, "text": "let"}, {"end": 1008.68, "start": 1008.52, "text": "me"}, {"end": 1010.32, "start": 1008.68, "text": "talk"}, {"end": 1011.56, "start": 1010.32, "text": "about"}, {"end": 1011.92, "start": 1011.56, "text": "the"}, {"end": 1012.48, "start": 1011.92, "text": "reward"}, {"end": 1014.12, "start": 1012.48, "text": "part."}, {"end": 1014.68, "start": 1014.12, "text": "So"}, {"end": 1015.32, "start": 1014.68, "text": "one"}, {"end": 1015.76, "start": 1015.32, "text": "of"}, {"end": 1016.24, "start": 1015.76, "text": "the,"}, {"end": 1016.4, "start": 1016.24, "text": "so"}, {"end": 1016.72, "start": 1016.4, "text": "to"}, {"end": 1017.28, "start": 1016.72, "text": "address"}, 
{"end": 1017.6, "start": 1017.28, "text": "issue"}, {"end": 1017.92, "start": 1017.6, "text": "of"}, {"end": 1018.56, "start": 1017.92, "text": "the"}, {"end": 1019.2, "start": 1018.56, "text": "supervised"}, {"end": 1019.6, "start": 1019.2, "text": "learning"}, {"end": 1019.96, "start": 1019.6, "text": "approach,"}], "text": " mobile phones. So we tested these ideas in multiple live experiments and observed very significant improve on the main online metrics. Now let me talk about the reward part. So one of the, so to address issue of the supervised learning approach,"}, {"chunks": [{"end": 1020.24, "start": 1020.0, "text": "where"}, {"end": 1020.44, "start": 1020.24, "text": "you"}, {"end": 1020.84, "start": 1020.44, "text": "only"}, {"end": 1021.24, "start": 1020.84, "text": "care"}, {"end": 1021.72, "start": 1021.24, "text": "about"}, {"end": 1022.4, "start": 1021.72, "text": "immediate"}, {"end": 1023.28, "start": 1022.4, "text": "response."}, {"end": 1023.76, "start": 1023.28, "text": "Here,"}, {"end": 1024.04, "start": 1023.76, "text": "we"}, {"end": 1024.76, "start": 1024.04, "text": "want"}, {"end": 1024.88, "start": 1024.76, "text": "to"}, {"end": 1025.88, "start": 1024.88, "text": "aggregate"}, {"end": 1026.4, "start": 1025.88, "text": "all"}, {"end": 1027.04, "start": 1026.4, "text": "the"}, {"end": 1027.84, "start": 1027.04, "text": "rewards"}, {"end": 1028.16, "start": 1027.84, "text": "that's"}, {"end": 1029.2, "start": 1028.16, "text": "coming"}, {"end": 1030.16, "start": 1029.2, "text": "after"}, {"end": 1030.52, "start": 1030.16, "text": "our"}, {"end": 1031.44, "start": 1030.52, "text": "recommendation"}, {"end": 1032.12, "start": 1031.44, "text": "so"}, {"end": 1032.56, "start": 1032.12, "text": "that"}, {"end": 1032.72, "start": 1032.56, "text": "we"}, {"end": 1033.92, "start": 1032.72, "text": "can"}, {"end": 1034.52, "start": 1033.92, "text": "give"}, {"end": 1035.08, "start": 1034.52, "text": "actions"}, {"end": 1035.4, "start": 1035.08, 
"text": "which"}, {"end": 1035.64, "start": 1035.4, "text": "leads"}, {"end": 1035.88, "start": 1035.64, "text": "to"}, {"end": 1036.32, "start": 1035.88, "text": "longer"}, {"end": 1036.8, "start": 1036.32, "text": "term"}, {"end": 1038.04, "start": 1036.8, "text": "engagement"}, {"end": 1039.72, "start": 1038.04, "text": "a"}, {"end": 1040.72, "start": 1039.72, "text": "boost"}, {"end": 1041.76, "start": 1040.72, "text": "versus"}, {"end": 1044.04, "start": 1041.76, "text": "just"}, {"end": 1044.64, "start": 1044.04, "text": "rewards"}, {"end": 1045.28, "start": 1044.64, "text": "the"}, {"end": 1045.84, "start": 1045.28, "text": "actions"}, {"end": 1046.16, "start": 1045.84, "text": "which"}, {"end": 1046.56, "start": 1046.16, "text": "focus"}, {"end": 1047.48, "start": 1046.56, "text": "more"}, {"end": 1047.88, "start": 1047.48, "text": "on"}, {"end": 1048.64, "start": 1047.88, "text": "the"}, {"end": 1049.16, "start": 1048.64, "text": "immediate"}, {"end": 1050.0, "start": 1049.16, "text": "returns."}], "text": " where you only care about immediate response. 
Here, we want to aggregate all the rewards that's coming after our recommendation so that we can give actions which leads to longer term engagement a boost versus just rewards the actions which focus more on the immediate returns."}, {"chunks": [{"end": 1050.64, "start": 1050.0, "text": "in"}, {"end": 1051.72, "start": 1050.64, "text": "the"}, {"end": 1052.16, "start": 1051.72, "text": "user"}, {"end": 1052.76, "start": 1052.16, "text": "feedbacks,"}, {"end": 1052.96, "start": 1052.76, "text": "we"}, {"end": 1053.36, "start": 1052.96, "text": "try"}, {"end": 1053.72, "start": 1053.36, "text": "to"}, {"end": 1054.24, "start": 1053.72, "text": "aggregate"}, {"end": 1054.4, "start": 1054.24, "text": "the"}, {"end": 1054.92, "start": 1054.4, "text": "future"}, {"end": 1055.32, "start": 1054.92, "text": "reward"}, {"end": 1055.8, "start": 1055.32, "text": "with"}, {"end": 1056.52, "start": 1055.8, "text": "exponential"}, {"end": 1056.72, "start": 1056.52, "text": "decay."}, {"end": 1059.12, "start": 1056.72, "text": "So"}, {"end": 1060.24, "start": 1059.12, "text": "in"}, {"end": 1060.44, "start": 1060.24, "text": "our"}, {"end": 1060.76, "start": 1060.44, "text": "live"}, {"end": 1062.12, "start": 1060.76, "text": "experiments,"}, {"end": 1062.96, "start": 1062.12, "text": "we"}, {"end": 1062.96, "start": 1062.96, "text": "see"}, {"end": 1063.52, "start": 1062.96, "text": "that"}, {"end": 1064.16, "start": 1063.52, "text": "when"}, {"end": 1064.4, "start": 1064.16, "text": "we"}, {"end": 1064.72, "start": 1064.4, "text": "move"}, {"end": 1065.2, "start": 1064.72, "text": "from"}, {"end": 1065.48, "start": 1065.2, "text": "a"}, {"end": 1066.0, "start": 1065.48, "text": "myopic"}, {"end": 1066.8, "start": 1066.0, "text": "recommendation"}, {"end": 1067.88, "start": 1066.8, "text": "towards"}, {"end": 1068.6, "start": 1067.88, "text": "accounting"}, {"end": 1069.2, "start": 1068.6, "text": "for"}, {"end": 1069.64, "start": 1069.2, "text": "future"}, {"end": 1070.56, 
"start": 1069.64, "text": "reward,"}, {"end": 1070.68, "start": 1070.56, "text": "we"}, {"end": 1072.12, "start": 1070.68, "text": "saw"}, {"end": 1074.16, "start": 1072.12, "text": "0.3%"}, {"end": 1076.72, "start": 1074.16, "text": "gain"}, {"end": 1077.6, "start": 1076.72, "text": "in"}, {"end": 1078.12, "start": 1077.6, "text": "online"}, {"end": 1078.8, "start": 1078.12, "text": "metrics."}, {"end": 1079.12, "start": 1078.8, "text": "And"}, {"end": 1079.44, "start": 1079.12, "text": "this"}, {"end": 1079.44, "start": 1079.44, "text": "is"}, {"end": 1079.44, "start": 1079.44, "text": "kind"}, {"end": 1079.56, "start": 1079.44, "text": "of"}, {"end": 1079.96, "start": 1079.56, "text": "one"}], "text": " in the user feedbacks, we try to aggregate the future reward with exponential decay. So in our live experiments, we see that when we move from a myopic recommendation towards accounting for future reward, we saw 0.3% gain in online metrics. And this is kind of one"}, {"chunks": [{"end": 1080.28, "start": 1080.0, "text": "of"}, {"end": 1081.08, "start": 1080.28, "text": "the"}, {"end": 1082.16, "start": 1081.08, "text": "larger"}, {"end": 1083.68, "start": 1082.16, "text": "improvements"}, {"end": 1084.0, "start": 1083.68, "text": "on"}, {"end": 1087.0, "start": 1084.0, "text": "YouTube."}, {"end": 1087.68, "start": 1087.0, "text": "So"}, {"end": 1088.0, "start": 1087.68, "text": "in"}, {"end": 1088.72, "start": 1088.0, "text": "the"}, {"end": 1089.04, "start": 1088.72, "text": "last"}, {"end": 1089.12, "start": 1089.04, "text": "part"}, {"end": 1089.6, "start": 1089.12, "text": "I'm"}, {"end": 1089.68, "start": 1089.6, "text": "going"}, {"end": 1089.68, "start": 1089.68, "text": "to"}, {"end": 1090.12, "start": 1089.68, "text": "talk"}, {"end": 1090.52, "start": 1090.12, "text": "about,"}, {"end": 1090.72, "start": 1090.52, "text": "so"}, {"end": 1091.36, "start": 1090.72, "text": "how"}, {"end": 1091.68, "start": 1091.36, "text": "do"}, {"end": 1092.36, "start": 
1091.68, "text": "we,"}, {"end": 1093.08, "start": 1092.36, "text": "given"}, {"end": 1093.4, "start": 1093.08, "text": "this"}, {"end": 1093.44, "start": 1093.4, "text": "user"}, {"end": 1094.12, "start": 1093.44, "text": "state"}, {"end": 1094.88, "start": 1094.12, "text": "representation"}, {"end": 1095.16, "start": 1094.88, "text": "and"}, {"end": 1095.4, "start": 1095.16, "text": "the"}, {"end": 1095.84, "start": 1095.4, "text": "reward"}, {"end": 1096.32, "start": 1095.84, "text": "function,"}, {"end": 1096.56, "start": 1096.32, "text": "how"}, {"end": 1096.64, "start": 1096.56, "text": "do"}, {"end": 1096.88, "start": 1096.64, "text": "we"}, {"end": 1097.96, "start": 1096.88, "text": "choose"}, {"end": 1100.64, "start": 1097.96, "text": "actions?"}, {"end": 1100.88, "start": 1100.64, "text": "So"}, {"end": 1101.2, "start": 1100.88, "text": "there"}, {"end": 1101.52, "start": 1101.2, "text": "are"}, {"end": 1102.0, "start": 1101.52, "text": "many"}, {"end": 1102.48, "start": 1102.0, "text": "approaches"}, {"end": 1102.68, "start": 1102.48, "text": "in"}, {"end": 1103.44, "start": 1102.68, "text": "RLs"}, {"end": 1103.56, "start": 1103.44, "text": "to"}, {"end": 1104.2, "start": 1103.56, "text": "help"}, {"end": 1104.52, "start": 1104.2, "text": "the"}, {"end": 1104.96, "start": 1104.52, "text": "agent"}, {"end": 1105.32, "start": 1104.96, "text": "choose"}, {"end": 1106.08, "start": 1105.32, "text": "actions."}, {"end": 1106.44, "start": 1106.08, "text": "In"}, {"end": 1106.96, "start": 1106.44, "text": "our"}, {"end": 1107.8, "start": 1106.96, "text": "work,"}, {"end": 1108.84, "start": 1107.8, "text": "we"}, {"end": 1109.56, "start": 1108.84, "text": "decided"}, {"end": 1109.56, "start": 1109.56, "text": "to"}, {"end": 1109.64, "start": 1109.56, "text": "go"}, {"end": 1110.0, "start": 1109.64, "text": "with"}], "text": " of the larger improvements on YouTube. 
So in the last part I'm going to talk about, so how do we, given this user state representation and the reward function, how do we choose actions? So there are many approaches in RLs to help the agent choose actions. In our work, we decided to go with"}, {"chunks": [{"end": 1110.2, "start": 1110.0, "text": "the"}, {"end": 1111.04, "start": 1110.2, "text": "policy-based"}, {"end": 1111.64, "start": 1111.04, "text": "approach"}, {"end": 1112.16, "start": 1111.64, "text": "because"}, {"end": 1112.28, "start": 1112.16, "text": "it"}, {"end": 1112.76, "start": 1112.28, "text": "is"}, {"end": 1113.24, "start": 1112.76, "text": "actually"}, {"end": 1114.04, "start": 1113.24, "text": "directly"}, {"end": 1114.4, "start": 1114.04, "text": "try"}, {"end": 1114.6, "start": 1114.4, "text": "to"}, {"end": 1115.32, "start": 1114.6, "text": "maximize"}, {"end": 1115.56, "start": 1115.32, "text": "this"}, {"end": 1115.68, "start": 1115.56, "text": "quantity"}, {"end": 1116.08, "start": 1115.68, "text": "that"}, {"end": 1116.2, "start": 1116.08, "text": "we"}, {"end": 1116.56, "start": 1116.2, "text": "are"}, {"end": 1117.24, "start": 1116.56, "text": "interested,"}, {"end": 1117.4, "start": 1117.24, "text": "which"}, {"end": 1117.76, "start": 1117.4, "text": "is"}, {"end": 1117.96, "start": 1117.76, "text": "the"}, {"end": 1118.44, "start": 1117.96, "text": "long-term"}, {"end": 1119.52, "start": 1118.44, "text": "reward."}, {"end": 1120.84, "start": 1119.52, "text": "And"}, {"end": 1121.52, "start": 1120.84, "text": "also"}, {"end": 1122.04, "start": 1121.52, "text": "because"}, {"end": 1122.72, "start": 1122.04, "text": "of"}, {"end": 1123.04, "start": 1122.72, "text": "its"}, {"end": 1123.8, "start": 1123.04, "text": "stability"}, {"end": 1123.92, "start": 1123.8, "text": "compared"}, {"end": 1124.16, "start": 1123.92, "text": "to"}, {"end": 1124.6, "start": 1124.16, "text": "a"}, {"end": 1125.32, "start": 1124.6, "text": "value-based"}, {"end": 1126.44, "start": 1125.32, 
"text": "approach."}, {"end": 1127.0, "start": 1126.44, "text": "So"}, {"end": 1127.56, "start": 1127.0, "text": "in"}, {"end": 1128.0, "start": 1127.56, "text": "this"}, {"end": 1128.52, "start": 1128.0, "text": "reinforced"}, {"end": 1129.44, "start": 1128.52, "text": "recommender,"}, {"end": 1129.64, "start": 1129.44, "text": "we're"}, {"end": 1130.36, "start": 1129.64, "text": "trying"}, {"end": 1130.84, "start": 1130.36, "text": "to"}, {"end": 1131.24, "start": 1130.84, "text": "learn"}, {"end": 1131.36, "start": 1131.24, "text": "a"}, {"end": 1131.88, "start": 1131.36, "text": "stochastic"}, {"end": 1131.92, "start": 1131.88, "text": "policy"}, {"end": 1132.28, "start": 1131.92, "text": "pi"}, {"end": 1133.12, "start": 1132.28, "text": "theta,"}, {"end": 1133.36, "start": 1133.12, "text": "which"}, {"end": 1133.52, "start": 1133.36, "text": "is"}, {"end": 1133.76, "start": 1133.52, "text": "going"}, {"end": 1133.96, "start": 1133.76, "text": "to"}, {"end": 1134.48, "start": 1133.96, "text": "cast"}, {"end": 1134.64, "start": 1134.48, "text": "a"}, {"end": 1134.88, "start": 1134.64, "text": "distribution"}, {"end": 1135.36, "start": 1134.88, "text": "over"}, {"end": 1135.84, "start": 1135.36, "text": "our"}, {"end": 1136.24, "start": 1135.84, "text": "action"}, {"end": 1136.72, "start": 1136.24, "text": "space"}, {"end": 1137.68, "start": 1136.72, "text": "given"}, {"end": 1138.16, "start": 1137.68, "text": "a"}, {"end": 1138.64, "start": 1138.16, "text": "user"}, {"end": 1138.96, "start": 1138.64, "text": "state"}, {"end": 1139.16, "start": 1138.96, "text": "so"}, {"end": 1139.96, "start": 1139.16, "text": "that"}], "text": " the policy-based approach because it is actually directly try to maximize this quantity that we are interested, which is the long-term reward. And also because of its stability compared to a value-based approach. 
So in this reinforced recommender, we're trying to learn a stochastic policy pi theta, which is going to cast a distribution over our action space given a user state so that"}, {"chunks": [{"end": 1140.24, "start": 1140.0, "text": "we"}, {"end": 1140.56, "start": 1140.24, "text": "can"}, {"end": 1141.6, "start": 1140.56, "text": "maximize"}, {"end": 1141.92, "start": 1141.6, "text": "this"}, {"end": 1142.88, "start": 1141.92, "text": "cumulative"}, {"end": 1143.72, "start": 1142.88, "text": "long-term"}, {"end": 1145.92, "start": 1143.72, "text": "reward."}, {"end": 1146.72, "start": 1145.92, "text": "So"}, {"end": 1147.4, "start": 1146.72, "text": "the"}, {"end": 1148.0, "start": 1147.4, "text": "trajectory"}, {"end": 1148.52, "start": 1148.0, "text": "tau"}, {"end": 1148.84, "start": 1148.52, "text": "here"}, {"end": 1149.28, "start": 1148.84, "text": "is"}, {"end": 1149.68, "start": 1149.28, "text": "the"}, {"end": 1150.68, "start": 1149.68, "text": "trajectory"}, {"end": 1151.44, "start": 1150.68, "text": "generated"}, {"end": 1151.76, "start": 1151.44, "text": "by"}, {"end": 1152.4, "start": 1151.76, "text": "following"}, {"end": 1152.68, "start": 1152.4, "text": "the"}, {"end": 1153.12, "start": 1152.68, "text": "policy"}, {"end": 1154.4, "start": 1153.12, "text": "and"}, {"end": 1154.76, "start": 1154.4, "text": "the"}, {"end": 1155.48, "start": 1154.76, "text": "cumulative"}, {"end": 1156.08, "start": 1155.48, "text": "reward"}, {"end": 1156.44, "start": 1156.08, "text": "is"}, {"end": 1157.0, "start": 1156.44, "text": "the"}, {"end": 1157.48, "start": 1157.0, "text": "total"}, {"end": 1157.76, "start": 1157.48, "text": "sum"}, {"end": 1158.08, "start": 1157.76, "text": "of"}, {"end": 1158.76, "start": 1158.08, "text": "rewards"}, {"end": 1158.92, "start": 1158.76, "text": "for"}, {"end": 1159.12, "start": 1158.92, "text": "the"}, {"end": 1159.6, "start": 1159.12, "text": "entire"}, {"end": 1161.52, "start": 1159.6, "text": "trajectory."}, {"end": 1161.68, 
"start": 1161.52, "text": "So"}, {"end": 1162.08, "start": 1161.68, "text": "because"}, {"end": 1162.24, "start": 1162.08, "text": "we"}, {"end": 1163.28, "start": 1162.24, "text": "have"}, {"end": 1163.52, "start": 1163.28, "text": "an"}, {"end": 1163.68, "start": 1163.52, "text": "explicit"}, {"end": 1163.96, "start": 1163.68, "text": "form"}, {"end": 1164.2, "start": 1163.96, "text": "of"}, {"end": 1164.52, "start": 1164.2, "text": "the"}, {"end": 1165.12, "start": 1164.52, "text": "policy,"}, {"end": 1165.16, "start": 1165.12, "text": "then"}, {"end": 1165.4, "start": 1165.16, "text": "we"}, {"end": 1166.0, "start": 1165.4, "text": "can"}, {"end": 1166.36, "start": 1166.0, "text": "basically"}, {"end": 1167.24, "start": 1166.36, "text": "optimize"}, {"end": 1167.48, "start": 1167.24, "text": "the"}, {"end": 1167.88, "start": 1167.48, "text": "policy"}, {"end": 1168.44, "start": 1167.88, "text": "parameter"}, {"end": 1168.96, "start": 1168.44, "text": "thetas"}, {"end": 1169.52, "start": 1168.96, "text": "by"}, {"end": 1170.0, "start": 1169.52, "text": "doing"}], "text": " we can maximize this cumulative long-term reward. So the trajectory tau here is the trajectory generated by following the policy and the cumulative reward is the total sum of rewards for the entire trajectory. 
So because we have an explicit form of the policy, then we can basically optimize the policy parameter thetas by doing"}, {"chunks": [{"end": 1172.88, "start": 1170.0, "text": "gradient"}, {"end": 1173.72, "start": 1172.88, "text": "ascent"}, {"end": 1174.0, "start": 1173.72, "text": "on"}, {"end": 1174.2, "start": 1174.0, "text": "these"}, {"end": 1177.68, "start": 1174.2, "text": "parameters."}, {"end": 1178.32, "start": 1177.68, "text": "And"}, {"end": 1178.84, "start": 1178.32, "text": "because"}, {"end": 1179.16, "start": 1178.84, "text": "of"}, {"end": 1179.48, "start": 1179.16, "text": "this"}, {"end": 1180.0, "start": 1179.48, "text": "log"}, {"end": 1180.56, "start": 1180.0, "text": "trick,"}, {"end": 1180.72, "start": 1180.56, "text": "so"}, {"end": 1181.8, "start": 1180.72, "text": "actually"}, {"end": 1182.16, "start": 1181.8, "text": "the"}, {"end": 1182.76, "start": 1182.16, "text": "gradient"}, {"end": 1183.04, "start": 1182.76, "text": "end"}, {"end": 1183.64, "start": 1183.04, "text": "up"}, {"end": 1184.24, "start": 1183.64, "text": "looking"}, {"end": 1185.24, "start": 1184.24, "text": "exactly"}, {"end": 1185.52, "start": 1185.24, "text": "like"}, {"end": 1186.08, "start": 1185.52, "text": "the"}, {"end": 1186.52, "start": 1186.08, "text": "gradient"}, {"end": 1187.0, "start": 1186.52, "text": "of"}, {"end": 1188.2, "start": 1187.0, "text": "weighted"}, {"end": 1188.8, "start": 1188.2, "text": "log"}, {"end": 1190.2, "start": 1188.8, "text": "likelihood"}, {"end": 1190.6, "start": 1190.2, "text": "with"}, {"end": 1191.12, "start": 1190.6, "text": "weight"}, {"end": 1191.88, "start": 1191.12, "text": "proportional"}, {"end": 1192.32, "start": 1191.88, "text": "to"}, {"end": 1192.64, "start": 1192.32, "text": "the"}, {"end": 1193.32, "start": 1192.64, "text": "long-term"}, {"end": 1193.8, "start": 1193.32, "text": "reward."}, {"end": 1194.2, "start": 1193.8, "text": "So"}, {"end": 1194.44, "start": 1194.2, "text": "in"}, {"end": 1194.88, "start": 
1194.44, "text": "the"}, {"end": 1195.24, "start": 1194.88, "text": "sense"}, {"end": 1195.64, "start": 1195.24, "text": "that"}, {"end": 1196.04, "start": 1195.64, "text": "actually"}, {"end": 1196.24, "start": 1196.04, "text": "the"}, {"end": 1196.84, "start": 1196.24, "text": "learning"}, {"end": 1197.08, "start": 1196.84, "text": "part"}, {"end": 1197.24, "start": 1197.08, "text": "of"}, {"end": 1197.88, "start": 1197.24, "text": "reinforcement"}, {"end": 1198.36, "start": 1197.88, "text": "learning"}, {"end": 1198.56, "start": 1198.36, "text": "is"}, {"end": 1198.96, "start": 1198.56, "text": "actually"}, {"end": 1199.32, "start": 1198.96, "text": "very"}, {"end": 1199.96, "start": 1199.32, "text": "connected"}], "text": " gradient ascent on these parameters. And because of this log trick, so actually the gradient end up looking exactly like the gradient of weighted log likelihood with weight proportional to the long-term reward. So in the sense that actually the learning part of reinforcement learning is actually very connected"}, {"chunks": [{"end": 1200.56, "start": 1200.0, "text": "to"}, {"end": 1201.2, "start": 1200.56, "text": "supervise"}, {"end": 1201.44, "start": 1201.2, "text": "learning"}, {"end": 1201.84, "start": 1201.44, "text": "where"}, {"end": 1201.92, "start": 1201.84, "text": "you"}, {"end": 1203.04, "start": 1201.92, "text": "would"}, {"end": 1203.44, "start": 1203.04, "text": "actually"}, {"end": 1204.2, "start": 1203.44, "text": "optimize"}, {"end": 1204.76, "start": 1204.2, "text": "for"}, {"end": 1204.84, "start": 1204.76, "text": "the"}, {"end": 1205.64, "start": 1204.84, "text": "likelihood"}, {"end": 1205.88, "start": 1205.64, "text": "of"}, {"end": 1206.72, "start": 1205.88, "text": "observing"}, {"end": 1206.88, "start": 1206.72, "text": "the"}, {"end": 1207.36, "start": 1206.88, "text": "next"}, {"end": 1207.84, "start": 1207.36, "text": "action."}, {"end": 1208.12, "start": 1207.84, "text": "But"}, {"end": 1208.44, "start": 
1208.12, "text": "what"}, {"end": 1208.96, "start": 1208.44, "text": "reinforcement"}, {"end": 1209.48, "start": 1208.96, "text": "learning"}, {"end": 1209.92, "start": 1209.48, "text": "offers"}, {"end": 1210.4, "start": 1209.92, "text": "us"}, {"end": 1211.0, "start": 1210.4, "text": "is"}, {"end": 1211.44, "start": 1211.0, "text": "a"}, {"end": 1212.4, "start": 1211.44, "text": "tool"}, {"end": 1212.56, "start": 1212.4, "text": "to"}, {"end": 1212.88, "start": 1212.56, "text": "think"}, {"end": 1213.56, "start": 1212.88, "text": "about"}, {"end": 1214.32, "start": 1213.56, "text": "exploration,"}, {"end": 1214.6, "start": 1214.32, "text": "think"}, {"end": 1214.96, "start": 1214.6, "text": "about"}, {"end": 1215.0, "start": 1214.96, "text": "the"}, {"end": 1215.96, "start": 1215.0, "text": "planning,"}, {"end": 1216.12, "start": 1215.96, "text": "and"}, {"end": 1216.36, "start": 1216.12, "text": "think"}, {"end": 1216.64, "start": 1216.36, "text": "about"}, {"end": 1217.48, "start": 1216.64, "text": "changing"}, {"end": 1217.72, "start": 1217.48, "text": "actually"}, {"end": 1218.04, "start": 1217.72, "text": "the"}, {"end": 1218.52, "start": 1218.04, "text": "underlying"}, {"end": 1218.88, "start": 1218.52, "text": "user"}, {"end": 1222.48, "start": 1218.88, "text": "state."}, {"end": 1223.32, "start": 1222.48, "text": "So"}, {"end": 1223.52, "start": 1223.32, "text": "a"}, {"end": 1223.96, "start": 1223.52, "text": "common"}, {"end": 1224.92, "start": 1223.96, "text": "choice"}, {"end": 1225.44, "start": 1224.92, "text": "to"}, {"end": 1226.32, "start": 1225.44, "text": "parameterize"}, {"end": 1226.48, "start": 1226.32, "text": "the"}, {"end": 1227.0, "start": 1226.48, "text": "policy"}, {"end": 1227.44, "start": 1227.0, "text": "is"}, {"end": 1227.64, "start": 1227.44, "text": "to"}, {"end": 1227.68, "start": 1227.64, "text": "do"}, {"end": 1228.04, "start": 1227.68, "text": "a"}, {"end": 1228.52, "start": 1228.04, "text": "soft"}, {"end": 1229.04, "start": 
1228.52, "text": "max"}, {"end": 1229.56, "start": 1229.04, "text": "over"}, {"end": 1229.72, "start": 1229.56, "text": "all"}, {"end": 1229.8, "start": 1229.72, "text": "of"}, {"end": 1229.96, "start": 1229.8, "text": "the"}], "text": " to supervise learning where you would actually optimize for the likelihood of observing the next action. But what reinforcement learning offers us is a tool to think about exploration, think about the planning, and think about changing actually the underlying user state. So a common choice to parameterize the policy is to do a soft max over all of the"}, {"chunks": [{"end": 1230.52, "start": 1230.0, "text": "actions"}, {"end": 1230.8, "start": 1230.52, "text": "but"}, {"end": 1231.0, "start": 1230.8, "text": "in"}, {"end": 1231.2, "start": 1231.0, "text": "our"}, {"end": 1231.44, "start": 1231.2, "text": "case"}, {"end": 1231.76, "start": 1231.44, "text": "because"}, {"end": 1231.88, "start": 1231.76, "text": "we"}, {"end": 1232.2, "start": 1231.88, "text": "have"}, {"end": 1232.28, "start": 1232.2, "text": "to"}, {"end": 1232.48, "start": 1232.28, "text": "deal"}, {"end": 1232.76, "start": 1232.48, "text": "with"}, {"end": 1233.32, "start": 1232.76, "text": "this"}, {"end": 1234.04, "start": 1233.32, "text": "large"}, {"end": 1234.72, "start": 1234.04, "text": "action"}, {"end": 1235.08, "start": 1234.72, "text": "space"}, {"end": 1235.4, "start": 1235.08, "text": "of"}, {"end": 1236.44, "start": 1235.4, "text": "millions"}, {"end": 1237.28, "start": 1236.44, "text": "so"}, {"end": 1237.8, "start": 1237.28, "text": "in"}, {"end": 1238.68, "start": 1237.8, "text": "learning"}, {"end": 1238.88, "start": 1238.68, "text": "we"}, {"end": 1239.12, "start": 1238.88, "text": "did"}, {"end": 1239.92, "start": 1239.12, "text": "samples"}, {"end": 1240.12, "start": 1239.92, "text": "of"}, {"end": 1240.52, "start": 1240.12, "text": "max"}, {"end": 1240.76, "start": 1240.52, "text": "to"}, {"end": 1241.68, "start": 1240.76, "text": "avoid"}, 
{"end": 1242.04, "start": 1241.68, "text": "this"}, {"end": 1242.96, "start": 1242.04, "text": "expensive"}, {"end": 1244.52, "start": 1242.96, "text": "computation"}, {"end": 1245.4, "start": 1244.52, "text": "of"}, {"end": 1245.92, "start": 1245.4, "text": "this"}, {"end": 1246.56, "start": 1245.92, "text": "normalization"}, {"end": 1247.48, "start": 1246.56, "text": "factors"}, {"end": 1247.76, "start": 1247.48, "text": "and"}, {"end": 1247.92, "start": 1247.76, "text": "at"}, {"end": 1248.28, "start": 1247.92, "text": "serving"}, {"end": 1248.88, "start": 1248.28, "text": "time"}, {"end": 1249.12, "start": 1248.88, "text": "we"}, {"end": 1249.32, "start": 1249.12, "text": "did"}, {"end": 1250.16, "start": 1249.32, "text": "a"}, {"end": 1250.56, "start": 1250.16, "text": "fast"}, {"end": 1250.96, "start": 1250.56, "text": "nearest"}, {"end": 1251.72, "start": 1250.96, "text": "neighborhood"}, {"end": 1252.72, "start": 1251.72, "text": "lookup"}, {"end": 1252.96, "start": 1252.72, "text": "before"}, {"end": 1253.4, "start": 1252.96, "text": "we"}, {"end": 1253.72, "start": 1253.4, "text": "actually"}, {"end": 1255.04, "start": 1253.72, "text": "sampling"}, {"end": 1255.6, "start": 1255.04, "text": "from"}, {"end": 1255.84, "start": 1255.6, "text": "a"}, {"end": 1256.4, "start": 1255.84, "text": "subset"}, {"end": 1256.64, "start": 1256.4, "text": "of"}, {"end": 1257.12, "start": 1256.64, "text": "the"}, {"end": 1257.32, "start": 1257.12, "text": "of"}, {"end": 1257.6, "start": 1257.32, "text": "the"}, {"end": 1257.96, "start": 1257.6, "text": "action"}, {"end": 1260.0, "start": 1257.96, "text": "space"}], "text": " actions but in our case because we have to deal with this large action space of millions so in learning we did samples of max to avoid this expensive computation of this normalization factors and at serving time we did a fast nearest neighborhood lookup before we actually sampling from a subset of the of the action space"}, {"chunks": [{"end": 1261.96, 
"start": 1260.0, "text": "So"}, {"end": 1262.56, "start": 1261.96, "text": "one"}, {"end": 1262.64, "start": 1262.56, "text": "of"}, {"end": 1263.2, "start": 1262.64, "text": "the"}, {"end": 1263.96, "start": 1263.2, "text": "biggest"}, {"end": 1265.12, "start": 1263.96, "text": "gain"}, {"end": 1265.56, "start": 1265.12, "text": "we"}, {"end": 1266.12, "start": 1265.56, "text": "gain"}, {"end": 1267.44, "start": 1266.12, "text": "from"}, {"end": 1268.12, "start": 1267.44, "text": "this"}, {"end": 1268.48, "start": 1268.12, "text": "line"}, {"end": 1268.76, "start": 1268.48, "text": "of"}, {"end": 1269.24, "start": 1268.76, "text": "work"}, {"end": 1269.76, "start": 1269.24, "text": "is"}, {"end": 1270.24, "start": 1269.76, "text": "to"}, {"end": 1270.56, "start": 1270.24, "text": "do"}, {"end": 1271.04, "start": 1270.56, "text": "our"}, {"end": 1271.88, "start": 1271.04, "text": "policy"}, {"end": 1272.36, "start": 1271.88, "text": "learning"}, {"end": 1272.8, "start": 1272.36, "text": "to"}, {"end": 1273.72, "start": 1272.8, "text": "address"}, {"end": 1274.12, "start": 1273.72, "text": "the"}, {"end": 1274.68, "start": 1274.12, "text": "system"}, {"end": 1275.84, "start": 1274.68, "text": "bias."}, {"end": 1276.56, "start": 1275.84, "text": "So"}, {"end": 1277.0, "start": 1276.56, "text": "in"}, {"end": 1277.4, "start": 1277.0, "text": "our"}, {"end": 1277.96, "start": 1277.4, "text": "case,"}, {"end": 1279.88, "start": 1277.96, "text": "also"}, {"end": 1280.4, "start": 1279.88, "text": "one"}, {"end": 1280.64, "start": 1280.4, "text": "of"}, {"end": 1281.24, "start": 1280.64, "text": "distinct"}, {"end": 1281.76, "start": 1281.24, "text": "fact"}, {"end": 1282.08, "start": 1281.76, "text": "of"}, {"end": 1282.56, "start": 1282.08, "text": "building"}, {"end": 1282.68, "start": 1282.56, "text": "a"}, {"end": 1283.32, "start": 1282.68, "text": "recommender"}, {"end": 1283.68, "start": 1283.32, "text": "system"}, {"end": 1284.72, "start": 1283.68, "text": 
"versus"}, {"end": 1285.08, "start": 1284.72, "text": "the"}, {"end": 1286.12, "start": 1285.08, "text": "standard"}, {"end": 1286.64, "start": 1286.12, "text": "I.O."}, {"end": 1287.12, "start": 1286.64, "text": "application"}, {"end": 1287.32, "start": 1287.12, "text": "is"}, {"end": 1287.72, "start": 1287.32, "text": "that"}, {"end": 1288.08, "start": 1287.72, "text": "we"}, {"end": 1288.96, "start": 1288.08, "text": "only"}, {"end": 1289.08, "start": 1288.96, "text": "have"}, {"end": 1289.96, "start": 1289.08, "text": "access"}], "text": " So one of the biggest gain we gain from this line of work is to do our policy learning to address the system bias. So in our case, also one of distinct fact of building a recommender system versus the standard I.O. application is that we only have access"}, {"chunks": [{"end": 1290.6, "start": 1290.0, "text": "to"}, {"end": 1291.16, "start": 1290.6, "text": "data"}, {"end": 1291.92, "start": 1291.16, "text": "which"}, {"end": 1292.12, "start": 1291.92, "text": "are"}, {"end": 1292.24, "start": 1292.12, "text": "of"}, {"end": 1292.88, "start": 1292.24, "text": "policy."}, {"end": 1293.24, "start": 1292.88, "text": "In"}, {"end": 1293.6, "start": 1293.24, "text": "our"}, {"end": 1294.08, "start": 1293.6, "text": "case,"}, {"end": 1294.12, "start": 1294.08, "text": "we"}, {"end": 1295.28, "start": 1294.12, "text": "have"}, {"end": 1295.84, "start": 1295.28, "text": "an"}, {"end": 1296.6, "start": 1295.84, "text": "agent"}, {"end": 1296.8, "start": 1296.6, "text": "which"}, {"end": 1296.88, "start": 1296.8, "text": "is"}, {"end": 1297.68, "start": 1296.88, "text": "refreshed"}, {"end": 1297.88, "start": 1297.68, "text": "every"}, {"end": 1298.32, "start": 1297.88, "text": "five"}, {"end": 1299.24, "start": 1298.32, "text": "hours,"}, {"end": 1299.48, "start": 1299.24, "text": "which"}, {"end": 1299.88, "start": 1299.48, "text": "means"}, {"end": 1300.04, "start": 1299.88, "text": "the"}, {"end": 1300.48, "start": 1300.04, "text": 
"agent's"}, {"end": 1301.12, "start": 1300.48, "text": "policy"}, {"end": 1302.0, "start": 1301.12, "text": "after,"}, {"end": 1302.52, "start": 1302.0, "text": "in"}, {"end": 1303.12, "start": 1302.52, "text": "five"}, {"end": 1303.56, "start": 1303.12, "text": "hours"}, {"end": 1303.88, "start": 1303.56, "text": "ago,"}, {"end": 1303.88, "start": 1303.88, "text": "could"}, {"end": 1303.96, "start": 1303.88, "text": "be"}, {"end": 1304.12, "start": 1303.96, "text": "very"}, {"end": 1304.44, "start": 1304.12, "text": "different"}, {"end": 1304.72, "start": 1304.44, "text": "from"}, {"end": 1304.72, "start": 1304.72, "text": "the"}, {"end": 1304.76, "start": 1304.72, "text": "target"}, {"end": 1305.16, "start": 1304.76, "text": "policy"}, {"end": 1305.6, "start": 1305.16, "text": "we're"}, {"end": 1306.16, "start": 1305.6, "text": "trying"}, {"end": 1306.6, "start": 1306.16, "text": "to"}, {"end": 1307.0, "start": 1306.6, "text": "learn"}, {"end": 1307.72, "start": 1307.0, "text": "now."}, {"end": 1308.08, "start": 1307.72, "text": "Meanwhile,"}, {"end": 1308.36, "start": 1308.08, "text": "we"}, {"end": 1309.4, "start": 1308.36, "text": "also"}, {"end": 1309.96, "start": 1309.4, "text": "have"}, {"end": 1310.56, "start": 1309.96, "text": "traffic"}, {"end": 1311.0, "start": 1310.56, "text": "which"}, {"end": 1311.68, "start": 1311.0, "text": "are"}, {"end": 1312.36, "start": 1311.68, "text": "acquired"}, {"end": 1312.92, "start": 1312.36, "text": "by"}, {"end": 1313.32, "start": 1312.92, "text": "other"}, {"end": 1313.84, "start": 1313.32, "text": "agents"}, {"end": 1314.08, "start": 1313.84, "text": "which"}, {"end": 1314.48, "start": 1314.08, "text": "have"}, {"end": 1314.76, "start": 1314.48, "text": "very"}, {"end": 1315.36, "start": 1314.76, "text": "different"}, {"end": 1316.52, "start": 1315.36, "text": "policy"}, {"end": 1316.84, "start": 1316.52, "text": "than"}, {"end": 1317.2, "start": 1316.84, "text": "the"}, {"end": 1317.64, "start": 1317.2, "text": 
"policy"}, {"end": 1318.0, "start": 1317.64, "text": "we"}, {"end": 1318.52, "start": 1318.0, "text": "are"}, {"end": 1319.2, "start": 1318.52, "text": "trying"}, {"end": 1319.36, "start": 1319.2, "text": "to"}, {"end": 1320.0, "start": 1319.36, "text": "learn."}], "text": " to data which are of policy. In our case, we have an agent which is refreshed every five hours, which means the agent's policy after, in five hours ago, could be very different from the target policy we're trying to learn now. Meanwhile, we also have traffic which are acquired by other agents which have very different policy than the policy we are trying to learn."}, {"chunks": [{"end": 1320.4, "start": 1320.0, "text": "to"}, {"end": 1321.24, "start": 1320.4, "text": "address"}, {"end": 1321.84, "start": 1321.24, "text": "the"}, {"end": 1322.4, "start": 1321.84, "text": "system"}, {"end": 1322.76, "start": 1322.4, "text": "bias"}, {"end": 1323.08, "start": 1322.76, "text": "which"}, {"end": 1323.64, "start": 1323.08, "text": "caused"}, {"end": 1324.44, "start": 1323.64, "text": "by"}, {"end": 1326.8, "start": 1324.44, "text": "having"}, {"end": 1327.24, "start": 1326.8, "text": "only"}, {"end": 1327.8, "start": 1327.24, "text": "access"}, {"end": 1328.36, "start": 1327.8, "text": "to"}, {"end": 1328.76, "start": 1328.36, "text": "log"}, {"end": 1329.2, "start": 1328.76, "text": "data"}, {"end": 1329.44, "start": 1329.2, "text": "which"}, {"end": 1329.52, "start": 1329.44, "text": "are"}, {"end": 1330.32, "start": 1329.52, "text": "generated"}, {"end": 1331.04, "start": 1330.32, "text": "by"}, {"end": 1331.44, "start": 1331.04, "text": "a"}, {"end": 1335.96, "start": 1331.44, "text": "different"}, {"end": 1336.6, "start": 1335.96, "text": "policy."}, {"end": 1337.48, "start": 1336.6, "text": "So"}, {"end": 1337.88, "start": 1337.48, "text": "there's"}, {"end": 1339.36, "start": 1337.88, "text": "actually"}, {"end": 1340.24, "start": 1339.36, "text": "many"}, {"end": 1341.04, "start": 1340.24, 
"text": "works"}, {"end": 1341.28, "start": 1341.04, "text": "which"}, {"end": 1341.68, "start": 1341.28, "text": "are"}, {"end": 1342.76, "start": 1341.68, "text": "kind"}, {"end": 1342.96, "start": 1342.76, "text": "of"}, {"end": 1343.52, "start": 1342.96, "text": "similar"}, {"end": 1344.0, "start": 1343.52, "text": "to"}, {"end": 1344.4, "start": 1344.0, "text": "what"}, {"end": 1345.04, "start": 1344.4, "text": "we"}, {"end": 1345.04, "start": 1345.04, "text": "did"}, {"end": 1345.2, "start": 1345.04, "text": "here."}, {"end": 1346.08, "start": 1345.2, "text": "So"}, {"end": 1346.32, "start": 1346.08, "text": "there's"}, {"end": 1346.92, "start": 1346.32, "text": "this"}, {"end": 1347.56, "start": 1346.92, "text": "kind"}, {"end": 1347.84, "start": 1347.56, "text": "of"}, {"end": 1348.48, "start": 1347.84, "text": "actual"}, {"end": 1348.96, "start": 1348.48, "text": "learning"}, {"end": 1349.96, "start": 1348.96, "text": "literature"}], "text": " to address the system bias which caused by having only access to log data which are generated by a different policy. So there's actually many works which are kind of similar to what we did here. 
So there's this kind of actual learning literature"}, {"chunks": [{"end": 1350.4, "start": 1350.0, "text": "It's"}, {"end": 1350.68, "start": 1350.4, "text": "also"}, {"end": 1351.04, "start": 1350.68, "text": "similar"}, {"end": 1351.36, "start": 1351.04, "text": "to"}, {"end": 1351.48, "start": 1351.36, "text": "what"}, {"end": 1351.48, "start": 1351.48, "text": "people"}, {"end": 1351.76, "start": 1351.48, "text": "have"}, {"end": 1351.92, "start": 1351.76, "text": "been"}, {"end": 1352.4, "start": 1351.92, "text": "doing"}, {"end": 1352.48, "start": 1352.4, "text": "in"}, {"end": 1352.92, "start": 1352.48, "text": "the"}, {"end": 1353.92, "start": 1352.92, "text": "domain"}, {"end": 1354.6, "start": 1353.92, "text": "adaptation"}, {"end": 1355.56, "start": 1354.6, "text": "cases."}, {"end": 1356.24, "start": 1355.56, "text": "Where"}, {"end": 1356.88, "start": 1356.24, "text": "basically"}, {"end": 1357.24, "start": 1356.88, "text": "what"}, {"end": 1357.28, "start": 1357.24, "text": "we"}, {"end": 1357.36, "start": 1357.28, "text": "are"}, {"end": 1357.64, "start": 1357.36, "text": "trying"}, {"end": 1357.64, "start": 1357.64, "text": "to"}, {"end": 1357.64, "start": 1357.64, "text": "do"}, {"end": 1357.64, "start": 1357.64, "text": "is"}, {"end": 1357.68, "start": 1357.64, "text": "to"}, {"end": 1358.12, "start": 1357.68, "text": "add"}, {"end": 1358.48, "start": 1358.12, "text": "an"}, {"end": 1359.72, "start": 1358.48, "text": "important"}, {"end": 1360.16, "start": 1359.72, "text": "weight,"}, {"end": 1360.52, "start": 1360.16, "text": "which"}, {"end": 1361.12, "start": 1360.52, "text": "tries"}, {"end": 1361.2, "start": 1361.12, "text": "to"}, {"end": 1361.68, "start": 1361.2, "text": "weight"}, {"end": 1361.96, "start": 1361.68, "text": "the"}, {"end": 1362.84, "start": 1361.96, "text": "trajectory"}, {"end": 1363.08, "start": 1362.84, "text": "based"}, {"end": 1363.72, "start": 1363.08, "text": "on"}, {"end": 1364.44, "start": 1363.72, "text": "how"}, 
{"end": 1365.04, "start": 1364.44, "text": "likely"}, {"end": 1365.24, "start": 1365.04, "text": "is"}, {"end": 1365.48, "start": 1365.24, "text": "the"}, {"end": 1365.8, "start": 1365.48, "text": "trajectory"}, {"end": 1366.16, "start": 1365.8, "text": "being"}, {"end": 1367.32, "start": 1366.16, "text": "generated"}, {"end": 1367.48, "start": 1367.32, "text": "by"}, {"end": 1367.6, "start": 1367.48, "text": "the"}, {"end": 1368.0, "start": 1367.6, "text": "target"}, {"end": 1368.88, "start": 1368.0, "text": "policy"}, {"end": 1369.36, "start": 1368.88, "text": "versus"}, {"end": 1369.96, "start": 1369.36, "text": "the"}, {"end": 1370.52, "start": 1369.96, "text": "behavior"}, {"end": 1371.4, "start": 1370.52, "text": "policy."}, {"end": 1371.64, "start": 1371.4, "text": "So"}, {"end": 1371.72, "start": 1371.64, "text": "we"}, {"end": 1372.2, "start": 1371.72, "text": "did"}, {"end": 1372.76, "start": 1372.2, "text": "actually"}, {"end": 1373.08, "start": 1372.76, "text": "a"}, {"end": 1373.96, "start": 1373.08, "text": "whole"}, {"end": 1374.4, "start": 1373.96, "text": "body"}, {"end": 1374.56, "start": 1374.4, "text": "of"}, {"end": 1375.08, "start": 1374.56, "text": "work"}, {"end": 1375.56, "start": 1375.08, "text": "trying"}, {"end": 1376.32, "start": 1375.56, "text": "to"}, {"end": 1377.12, "start": 1376.32, "text": "control"}, {"end": 1377.28, "start": 1377.12, "text": "the"}, {"end": 1378.08, "start": 1377.28, "text": "bias"}, {"end": 1378.56, "start": 1378.08, "text": "and"}, {"end": 1379.28, "start": 1378.56, "text": "variance"}, {"end": 1379.68, "start": 1379.28, "text": "in"}, {"end": 1380.0, "start": 1379.68, "text": "this,"}], "text": " It's also similar to what people have been doing in the domain adaptation cases. Where basically what we are trying to do is to add an important weight, which tries to weight the trajectory based on how likely is the trajectory being generated by the target policy versus the behavior policy. 
So we did actually a whole body of work trying to control the bias and variance in this,"}, {"chunks": [{"end": 1380.28, "start": 1380.0, "text": "in"}, {"end": 1380.6, "start": 1380.28, "text": "this"}, {"end": 1381.12, "start": 1380.6, "text": "important"}, {"end": 1381.84, "start": 1381.12, "text": "weight."}, {"end": 1382.04, "start": 1381.84, "text": "So"}, {"end": 1382.16, "start": 1382.04, "text": "if"}, {"end": 1382.28, "start": 1382.16, "text": "you"}, {"end": 1382.52, "start": 1382.28, "text": "want"}, {"end": 1383.16, "start": 1382.52, "text": "to"}, {"end": 1383.68, "start": 1383.16, "text": "know"}, {"end": 1383.8, "start": 1383.68, "text": "more"}, {"end": 1384.2, "start": 1383.8, "text": "details,"}, {"end": 1384.44, "start": 1384.2, "text": "please"}, {"end": 1384.68, "start": 1384.44, "text": "come"}, {"end": 1384.76, "start": 1384.68, "text": "to"}, {"end": 1385.08, "start": 1384.76, "text": "our"}, {"end": 1386.28, "start": 1385.08, "text": "talk"}, {"end": 1386.88, "start": 1386.28, "text": "on"}, {"end": 1389.84, "start": 1386.88, "text": "Wednesday."}, {"end": 1391.12, "start": 1389.84, "text": "So"}, {"end": 1392.72, "start": 1391.12, "text": "one"}, {"end": 1393.16, "start": 1392.72, "text": "of"}, {"end": 1393.76, "start": 1393.16, "text": "the,"}, {"end": 1394.2, "start": 1393.76, "text": "as"}, {"end": 1394.36, "start": 1394.2, "text": "we"}, {"end": 1394.76, "start": 1394.36, "text": "said,"}, {"end": 1395.24, "start": 1394.76, "text": "without"}, {"end": 1395.96, "start": 1395.24, "text": "addressing"}, {"end": 1396.16, "start": 1395.96, "text": "the"}, {"end": 1396.64, "start": 1396.16, "text": "system"}, {"end": 1397.44, "start": 1396.64, "text": "bias,"}, {"end": 1397.56, "start": 1397.44, "text": "we"}, {"end": 1397.72, "start": 1397.56, "text": "could"}, {"end": 1398.0, "start": 1397.72, "text": "just"}, {"end": 1398.32, "start": 1398.0, "text": "end"}, {"end": 1398.52, "start": 1398.32, "text": "up"}, {"end": 1399.48, "start": 
1398.52, "text": "recommending"}, {"end": 1399.76, "start": 1399.48, "text": "an"}, {"end": 1400.48, "start": 1399.76, "text": "item,"}, {"end": 1400.56, "start": 1400.48, "text": "not"}, {"end": 1400.84, "start": 1400.56, "text": "because"}, {"end": 1400.96, "start": 1400.84, "text": "the"}, {"end": 1401.64, "start": 1400.96, "text": "item"}, {"end": 1402.44, "start": 1401.64, "text": "was"}, {"end": 1403.32, "start": 1402.44, "text": "relevant,"}, {"end": 1403.6, "start": 1403.32, "text": "but"}, {"end": 1403.92, "start": 1403.6, "text": "instead"}, {"end": 1404.16, "start": 1403.92, "text": "just"}, {"end": 1404.52, "start": 1404.16, "text": "because"}, {"end": 1404.68, "start": 1404.52, "text": "the"}, {"end": 1405.28, "start": 1404.68, "text": "previous"}, {"end": 1405.6, "start": 1405.28, "text": "system"}, {"end": 1406.24, "start": 1405.6, "text": "are"}, {"end": 1406.8, "start": 1406.24, "text": "more"}, {"end": 1407.32, "start": 1406.8, "text": "likely"}, {"end": 1407.6, "start": 1407.32, "text": "to"}, {"end": 1408.2, "start": 1407.6, "text": "recommend"}, {"end": 1408.52, "start": 1408.2, "text": "that"}, {"end": 1409.4, "start": 1408.52, "text": "item."}, {"end": 1409.96, "start": 1409.4, "text": "So"}], "text": " in this important weight. So if you want to know more details, please come to our talk on Wednesday. So one of the, as we said, without addressing the system bias, we could just end up recommending an item, not because the item was relevant, but instead just because the previous system are more likely to recommend that item. 
So"}, {"chunks": [{"end": 1410.6, "start": 1410.0, "text": "We've"}, {"end": 1411.28, "start": 1410.6, "text": "seen"}, {"end": 1411.84, "start": 1411.28, "text": "that"}, {"end": 1412.12, "start": 1411.84, "text": "in"}, {"end": 1412.32, "start": 1412.12, "text": "our"}, {"end": 1413.2, "start": 1412.32, "text": "live"}, {"end": 1414.68, "start": 1413.2, "text": "experiment"}, {"end": 1415.24, "start": 1414.68, "text": "before"}, {"end": 1415.6, "start": 1415.24, "text": "we"}, {"end": 1415.88, "start": 1415.6, "text": "apply"}, {"end": 1416.36, "start": 1415.88, "text": "this"}, {"end": 1416.68, "start": 1416.36, "text": "off-policy"}, {"end": 1417.4, "start": 1416.68, "text": "correction"}, {"end": 1418.12, "start": 1417.4, "text": "versus"}, {"end": 1418.44, "start": 1418.12, "text": "after"}, {"end": 1418.68, "start": 1418.44, "text": "we"}, {"end": 1419.16, "start": 1418.68, "text": "apply,"}, {"end": 1419.24, "start": 1419.16, "text": "we"}, {"end": 1419.24, "start": 1419.24, "text": "can"}, {"end": 1419.44, "start": 1419.24, "text": "see"}, {"end": 1419.84, "start": 1419.44, "text": "that"}, {"end": 1420.4, "start": 1419.84, "text": "there"}, {"end": 1420.76, "start": 1420.4, "text": "are"}, {"end": 1421.2, "start": 1420.76, "text": "much"}, {"end": 1421.4, "start": 1421.2, "text": "more"}, {"end": 1422.44, "start": 1421.4, "text": "traffic"}, {"end": 1422.68, "start": 1422.44, "text": "that's"}, {"end": 1423.2, "start": 1422.68, "text": "coming"}, {"end": 1423.96, "start": 1423.2, "text": "from"}, {"end": 1424.08, "start": 1423.96, "text": "the"}, {"end": 1424.84, "start": 1424.08, "text": "tail"}, {"end": 1425.36, "start": 1424.84, "text": "recommendations"}, {"end": 1425.8, "start": 1425.36, "text": "versus"}, {"end": 1426.08, "start": 1425.8, "text": "the"}, {"end": 1426.88, "start": 1426.08, "text": "before,"}, {"end": 1427.32, "start": 1426.88, "text": "which"}, {"end": 1427.72, "start": 1427.32, "text": "basically"}, {"end": 1428.2, "start": 1427.72, 
"text": "tries"}, {"end": 1428.24, "start": 1428.2, "text": "to"}, {"end": 1429.0, "start": 1428.24, "text": "follow"}, {"end": 1429.24, "start": 1429.0, "text": "the"}, {"end": 1430.16, "start": 1429.24, "text": "prior"}, {"end": 1431.36, "start": 1430.16, "text": "system"}, {"end": 1432.68, "start": 1431.36, "text": "and"}, {"end": 1433.84, "start": 1432.68, "text": "recommends"}, {"end": 1434.72, "start": 1433.84, "text": "more"}, {"end": 1435.2, "start": 1434.72, "text": "head"}, {"end": 1435.92, "start": 1435.2, "text": "videos"}, {"end": 1436.24, "start": 1435.92, "text": "than"}, {"end": 1439.96, "start": 1436.24, "text": "tail."}], "text": " We've seen that in our live experiment before we apply this off-policy correction versus after we apply, we can see that there are much more traffic that's coming from the tail recommendations versus the before, which basically tries to follow the prior system and recommends more head videos than tail."}, {"chunks": [{"end": 1440.84, "start": 1440.0, "text": "So"}, {"end": 1441.0, "start": 1440.84, "text": "in"}, {"end": 1441.68, "start": 1441.0, "text": "the"}, {"end": 1442.12, "start": 1441.68, "text": "live"}, {"end": 1442.76, "start": 1442.12, "text": "experiment,"}, {"end": 1443.36, "start": 1442.76, "text": "so"}, {"end": 1443.64, "start": 1443.36, "text": "we"}, {"end": 1444.04, "start": 1443.64, "text": "also"}, {"end": 1445.08, "start": 1444.04, "text": "observed"}, {"end": 1446.52, "start": 1445.08, "text": "0.86"}, {"end": 1447.52, "start": 1446.52, "text": "overall"}, {"end": 1447.92, "start": 1447.52, "text": "matrix"}, {"end": 1448.0, "start": 1447.92, "text": "gain."}, {"end": 1448.48, "start": 1448.0, "text": "So"}, {"end": 1448.84, "start": 1448.48, "text": "this"}, {"end": 1449.2, "start": 1448.84, "text": "is"}, {"end": 1449.68, "start": 1449.2, "text": "actually"}, {"end": 1450.0, "start": 1449.68, "text": "the"}, {"end": 1450.52, "start": 1450.0, "text": "largest"}, {"end": 1452.16, "start": 
1450.52, "text": "single"}, {"end": 1453.36, "start": 1452.16, "text": "launch"}, {"end": 1454.56, "start": 1453.36, "text": "we've"}, {"end": 1455.32, "start": 1454.56, "text": "had"}, {"end": 1455.4, "start": 1455.32, "text": "in"}, {"end": 1455.76, "start": 1455.4, "text": "YouTube"}, {"end": 1456.2, "start": 1455.76, "text": "for"}, {"end": 1456.64, "start": 1456.2, "text": "two"}, {"end": 1460.16, "start": 1456.64, "text": "years."}, {"end": 1460.88, "start": 1460.16, "text": "So"}, {"end": 1461.92, "start": 1460.88, "text": "to"}, {"end": 1463.36, "start": 1461.92, "text": "conclude,"}, {"end": 1463.88, "start": 1463.36, "text": "we've"}, {"end": 1465.28, "start": 1463.88, "text": "built"}, {"end": 1465.92, "start": 1465.28, "text": "our"}, {"end": 1466.68, "start": 1465.92, "text": "initial"}, {"end": 1467.0, "start": 1466.68, "text": "kind"}, {"end": 1467.16, "start": 1467.0, "text": "of"}, {"end": 1467.84, "start": 1467.16, "text": "attempt"}, {"end": 1468.16, "start": 1467.84, "text": "of"}, {"end": 1468.64, "start": 1468.16, "text": "trying"}, {"end": 1469.16, "start": 1468.64, "text": "reinforcement"}, {"end": 1470.0, "start": 1469.16, "text": "learning"}], "text": " So in the live experiment, so we also observed 0.86 overall matrix gain. So this is actually the largest single launch we've had in YouTube for two years. 
So to conclude, we've built our initial kind of attempt of trying reinforcement learning"}, {"chunks": [{"end": 1470.48, "start": 1470.0, "text": "for"}, {"end": 1470.8, "start": 1470.48, "text": "recommender"}, {"end": 1471.2, "start": 1470.8, "text": "systems"}, {"end": 1471.56, "start": 1471.2, "text": "and"}, {"end": 1471.96, "start": 1471.56, "text": "we"}, {"end": 1472.48, "start": 1471.96, "text": "had"}, {"end": 1473.16, "start": 1472.48, "text": "quite"}, {"end": 1473.2, "start": 1473.16, "text": "a"}, {"end": 1473.44, "start": 1473.2, "text": "bit"}, {"end": 1474.0, "start": 1473.44, "text": "success"}, {"end": 1474.16, "start": 1474.0, "text": "with"}, {"end": 1474.16, "start": 1474.16, "text": "it."}, {"end": 1474.4, "start": 1474.16, "text": "So"}, {"end": 1474.64, "start": 1474.4, "text": "hopefully"}, {"end": 1474.88, "start": 1474.64, "text": "you"}, {"end": 1475.2, "start": 1474.88, "text": "can"}, {"end": 1475.68, "start": 1475.2, "text": "also"}, {"end": 1475.76, "start": 1475.68, "text": "think"}, {"end": 1477.4, "start": 1475.76, "text": "about"}, {"end": 1477.72, "start": 1477.4, "text": "it"}, {"end": 1478.08, "start": 1477.72, "text": "and"}, {"end": 1479.56, "start": 1478.08, "text": "try"}, {"end": 1480.04, "start": 1479.56, "text": "it"}, {"end": 1480.4, "start": 1480.04, "text": "in"}, {"end": 1481.08, "start": 1480.4, "text": "your"}, {"end": 1481.2, "start": 1481.08, "text": "use"}, {"end": 1481.2, "start": 1481.2, "text": "case."}, {"end": 1481.2, "start": 1481.2, "text": "But"}, {"end": 1481.2, "start": 1481.2, "text": "of"}, {"end": 1481.84, "start": 1481.2, "text": "course"}, {"end": 1482.36, "start": 1481.84, "text": "there"}, {"end": 1482.6, "start": 1482.36, "text": "are"}, {"end": 1482.76, "start": 1482.6, "text": "a"}, {"end": 1483.48, "start": 1482.76, "text": "lot"}, {"end": 1484.64, "start": 1483.48, "text": "of"}, {"end": 1485.24, "start": 1484.64, "text": "future"}, {"end": 1485.72, "start": 1485.24, "text": "work"}, 
{"end": 1485.92, "start": 1485.72, "text": "we"}, {"end": 1486.4, "start": 1485.92, "text": "are"}, {"end": 1487.24, "start": 1486.4, "text": "trying"}, {"end": 1487.4, "start": 1487.24, "text": "to"}, {"end": 1487.64, "start": 1487.4, "text": "do"}, {"end": 1487.64, "start": 1487.64, "text": "in"}, {"end": 1487.68, "start": 1487.64, "text": "this"}, {"end": 1488.2, "start": 1487.68, "text": "area"}, {"end": 1488.28, "start": 1488.2, "text": "in"}, {"end": 1488.6, "start": 1488.28, "text": "terms"}, {"end": 1488.88, "start": 1488.6, "text": "of"}, {"end": 1489.4, "start": 1488.88, "text": "building"}, {"end": 1489.44, "start": 1489.4, "text": "a"}, {"end": 1489.88, "start": 1489.44, "text": "better"}, {"end": 1490.32, "start": 1489.88, "text": "state"}, {"end": 1491.0, "start": 1490.32, "text": "representation"}, {"end": 1491.48, "start": 1491.0, "text": "so"}, {"end": 1491.48, "start": 1491.48, "text": "that"}, {"end": 1491.48, "start": 1491.48, "text": "we"}, {"end": 1491.92, "start": 1491.48, "text": "can"}, {"end": 1492.36, "start": 1491.92, "text": "account"}, {"end": 1492.96, "start": 1492.36, "text": "for"}, {"end": 1493.2, "start": 1492.96, "text": "longer"}, {"end": 1493.68, "start": 1493.2, "text": "range"}, {"end": 1494.68, "start": 1493.68, "text": "dependencies"}, {"end": 1495.4, "start": 1494.68, "text": "in"}, {"end": 1495.84, "start": 1495.4, "text": "the"}, {"end": 1496.12, "start": 1495.84, "text": "user"}, {"end": 1496.96, "start": 1496.12, "text": "state."}, {"end": 1497.6, "start": 1496.96, "text": "Having"}, {"end": 1498.04, "start": 1497.6, "text": "better"}, {"end": 1499.36, "start": 1498.04, "text": "exploration"}, {"end": 1499.96, "start": 1499.36, "text": "strategies"}], "text": " for recommender systems and we had quite a bit success with it. So hopefully you can also think about it and try it in your use case. 
But of course there are a lot of future work we are trying to do in this area in terms of building a better state representation so that we can account for longer range dependencies in the user state. Having better exploration strategies"}, {"chunks": [{"end": 1500.56, "start": 1500.0, "text": "and"}, {"end": 1501.28, "start": 1500.56, "text": "planning"}, {"end": 1501.48, "start": 1501.28, "text": "so"}, {"end": 1501.72, "start": 1501.48, "text": "that"}, {"end": 1501.88, "start": 1501.72, "text": "we"}, {"end": 1502.12, "start": 1501.88, "text": "can"}, {"end": 1502.76, "start": 1502.12, "text": "really"}, {"end": 1502.88, "start": 1502.76, "text": "lead"}, {"end": 1503.08, "start": 1502.88, "text": "the"}, {"end": 1504.0, "start": 1503.08, "text": "users"}, {"end": 1504.64, "start": 1504.0, "text": "towards"}, {"end": 1504.76, "start": 1504.64, "text": "a"}, {"end": 1505.16, "start": 1504.76, "text": "different"}, {"end": 1505.88, "start": 1505.16, "text": "state"}, {"end": 1506.68, "start": 1505.88, "text": "versus"}, {"end": 1507.4, "start": 1506.68, "text": "they"}, {"end": 1507.64, "start": 1507.4, "text": "just"}, {"end": 1509.0, "start": 1507.64, "text": "recommending"}, {"end": 1510.28, "start": 1509.0, "text": "content"}, {"end": 1511.12, "start": 1510.28, "text": "that"}, {"end": 1511.92, "start": 1511.12, "text": "are"}, {"end": 1512.56, "start": 1511.92, "text": "familiar"}, {"end": 1513.04, "start": 1512.56, "text": "to"}, {"end": 1513.32, "start": 1513.04, "text": "the"}, {"end": 1514.84, "start": 1513.32, "text": "users."}, {"end": 1515.24, "start": 1514.84, "text": "And"}, {"end": 1515.64, "start": 1515.24, "text": "so"}, {"end": 1516.36, "start": 1515.64, "text": "far"}, {"end": 1516.76, "start": 1516.36, "text": "we"}, {"end": 1517.52, "start": 1516.76, "text": "have"}, {"end": 1517.64, "start": 1517.52, "text": "been"}, {"end": 1518.16, "start": 1517.64, "text": "constraining"}, {"end": 1518.72, "start": 1518.16, "text": "ourselves"}, {"end": 
1518.96, "start": 1518.72, "text": "just"}, {"end": 1519.28, "start": 1518.96, "text": "thinking"}, {"end": 1519.96, "start": 1519.28, "text": "about"}, {"end": 1520.0, "start": 1519.96, "text": "the"}, {"end": 1520.36, "start": 1520.0, "text": "systems"}, {"end": 1520.6, "start": 1520.36, "text": "and"}, {"end": 1521.16, "start": 1520.6, "text": "users,"}, {"end": 1521.56, "start": 1521.16, "text": "but"}, {"end": 1522.08, "start": 1521.56, "text": "actually"}, {"end": 1522.32, "start": 1522.08, "text": "in"}, {"end": 1522.92, "start": 1522.32, "text": "YouTube"}, {"end": 1523.84, "start": 1522.92, "text": "there's"}, {"end": 1524.76, "start": 1523.84, "text": "also"}, {"end": 1525.72, "start": 1524.76, "text": "the"}, {"end": 1526.8, "start": 1525.72, "text": "creators."}, {"end": 1527.2, "start": 1526.8, "text": "So"}, {"end": 1527.48, "start": 1527.2, "text": "we"}, {"end": 1528.28, "start": 1527.48, "text": "want"}, {"end": 1528.4, "start": 1528.28, "text": "to"}, {"end": 1528.72, "start": 1528.4, "text": "maybe"}, {"end": 1530.0, "start": 1528.72, "text": "have"}], "text": " and planning so that we can really lead the users towards a different state versus they just recommending content that are familiar to the users. And so far we have been constraining ourselves just thinking about the systems and users, but actually in YouTube there's also the creators. 
So we want to maybe have"}, {"chunks": [{"end": 1530.76, "start": 1530.0, "text": "different"}, {"end": 1531.24, "start": 1530.76, "text": "utility"}, {"end": 1532.24, "start": 1531.24, "text": "functions"}, {"end": 1532.4, "start": 1532.24, "text": "to"}, {"end": 1533.28, "start": 1532.4, "text": "optimize"}, {"end": 1533.52, "start": 1533.28, "text": "so"}, {"end": 1533.6, "start": 1533.52, "text": "that"}, {"end": 1533.8, "start": 1533.6, "text": "we"}, {"end": 1534.56, "start": 1533.8, "text": "can"}, {"end": 1534.92, "start": 1534.56, "text": "improve"}, {"end": 1535.0, "start": 1534.92, "text": "the"}, {"end": 1535.88, "start": 1535.0, "text": "overall"}, {"end": 1536.24, "start": 1535.88, "text": "YouTube"}, {"end": 1537.44, "start": 1536.24, "text": "ecosystem"}, {"end": 1537.92, "start": 1537.44, "text": "instead"}, {"end": 1538.56, "start": 1537.92, "text": "of"}, {"end": 1538.92, "start": 1538.56, "text": "just"}, {"end": 1539.6, "start": 1538.92, "text": "focusing"}, {"end": 1539.96, "start": 1539.6, "text": "on"}, {"end": 1540.72, "start": 1539.96, "text": "improve"}, {"end": 1541.36, "start": 1540.72, "text": "the"}, {"end": 1541.88, "start": 1541.36, "text": "user"}, {"end": 1542.72, "start": 1541.88, "text": "utility"}, {"end": 1544.16, "start": 1542.72, "text": "overall."}, {"end": 1544.28, "start": 1544.16, "text": "Yeah,"}, {"end": 1544.68, "start": 1544.28, "text": "with"}, {"end": 1544.76, "start": 1544.68, "text": "that,"}, {"end": 1544.76, "start": 1544.76, "text": "we"}, {"end": 1545.32, "start": 1544.76, "text": "can"}, {"end": 1545.72, "start": 1545.32, "text": "take"}, {"end": 1546.36, "start": 1545.72, "text": "questions."}, {"end": 1547.6, "start": 1546.36, "text": "Thank"}, {"end": 1547.8, "start": 1547.6, "text": "you."}, {"end": 1549.84, "start": 1547.8, "text": "Questions?"}, {"end": 1550.32, "start": 1549.84, "text": "So"}, {"end": 1550.88, "start": 1550.32, "text": "I"}, {"end": 1551.56, "start": 1550.88, "text": "have"}, {"end": 
1553.88, "start": 1551.56, "text": "a"}, {"end": 1554.72, "start": 1553.88, "text": "simple"}, {"end": 1555.96, "start": 1554.72, "text": "question."}, {"end": 1556.72, "start": 1555.96, "text": "The"}, {"end": 1557.44, "start": 1556.72, "text": "first"}, {"end": 1557.88, "start": 1557.44, "text": "one,"}, {"end": 1558.12, "start": 1557.88, "text": "I"}, {"end": 1559.16, "start": 1558.12, "text": "think"}, {"end": 1559.44, "start": 1559.16, "text": "you'll"}, {"end": 1559.96, "start": 1559.44, "text": "select"}], "text": " different utility functions to optimize so that we can improve the overall YouTube ecosystem instead of just focusing on improve the user utility overall. Yeah, with that, we can take questions. Thank you. Questions? So I have a simple question. The first one, I think you'll select"}, {"chunks": [{"end": 1561.24, "start": 1560.0, "text": "I"}, {"end": 1562.32, "start": 1561.24, "text": "don't"}, {"end": 1563.8, "start": 1562.32, "text": "think"}, {"end": 1564.2, "start": 1563.8, "text": "we"}, {"end": 1571.08, "start": 1564.2, "text": "can"}, {"end": 1571.64, "start": 1571.08, "text": "reveal"}, {"end": 1573.84, "start": 1571.64, "text": "what's"}, {"end": 1574.44, "start": 1573.84, "text": "the,"}, {"end": 1574.84, "start": 1574.44, "text": "I"}, {"end": 1575.08, "start": 1574.84, "text": "mean"}, {"end": 1575.92, "start": 1575.08, "text": "there"}, {"end": 1576.0, "start": 1575.92, "text": "are"}, {"end": 1576.28, "start": 1576.0, "text": "multiple"}, {"end": 1576.6, "start": 1576.28, "text": "metrics"}, {"end": 1576.92, "start": 1576.6, "text": "we"}, {"end": 1577.36, "start": 1576.92, "text": "are"}, {"end": 1577.76, "start": 1577.36, "text": "looking"}, {"end": 1577.84, "start": 1577.76, "text": "at"}, {"end": 1578.4, "start": 1577.84, "text": "in"}, {"end": 1578.84, "start": 1578.4, "text": "our"}, {"end": 1579.08, "start": 1578.84, "text": "live"}, {"end": 1579.64, "start": 1579.08, "text": "experiments."}, {"end": 1580.0, "start": 
1579.64, "text": "Some"}, {"end": 1580.0, "start": 1580.0, "text": "of"}, {"end": 1580.12, "start": 1580.0, "text": "them"}, {"end": 1580.44, "start": 1580.12, "text": "are"}, {"end": 1580.88, "start": 1580.44, "text": "more"}, {"end": 1581.04, "start": 1580.88, "text": "engagement"}, {"end": 1581.32, "start": 1581.04, "text": "driven"}, {"end": 1581.4, "start": 1581.32, "text": "and"}, {"end": 1581.8, "start": 1581.4, "text": "some"}, {"end": 1582.04, "start": 1581.8, "text": "of"}, {"end": 1582.28, "start": 1582.04, "text": "them"}, {"end": 1582.64, "start": 1582.28, "text": "are"}, {"end": 1583.28, "start": 1582.64, "text": "actually"}, {"end": 1584.24, "start": 1583.28, "text": "measuring"}, {"end": 1584.6, "start": 1584.24, "text": "the"}, {"end": 1584.88, "start": 1584.6, "text": "user"}, {"end": 1585.32, "start": 1584.88, "text": "overall"}, {"end": 1586.24, "start": 1585.32, "text": "satisfaction"}, {"end": 1586.4, "start": 1586.24, "text": "with"}, {"end": 1586.6, "start": 1586.4, "text": "the"}, {"end": 1587.24, "start": 1586.6, "text": "platform"}, {"end": 1587.52, "start": 1587.24, "text": "through"}, {"end": 1588.04, "start": 1587.52, "text": "kind"}, {"end": 1588.32, "start": 1588.04, "text": "of"}, {"end": 1590.0, "start": 1588.32, "text": "survey"}], "text": " I don't think we can reveal what's the, I mean there are multiple metrics we are looking at in our live experiments. 
Some of them are more engagement driven and some of them are actually measuring the user overall satisfaction with the platform through kind of survey"}, {"chunks": [{"end": 1591.36, "start": 1590.0, "text": "Yeah,"}, {"end": 1592.08, "start": 1591.36, "text": "so"}, {"end": 1592.32, "start": 1592.08, "text": "this"}, {"end": 1592.68, "start": 1592.32, "text": "is"}, {"end": 1592.92, "start": 1592.68, "text": "one"}, {"end": 1593.0, "start": 1592.92, "text": "of"}, {"end": 1593.12, "start": 1593.0, "text": "the"}, {"end": 1593.56, "start": 1593.12, "text": "metrics"}, {"end": 1594.2, "start": 1593.56, "text": "we"}, {"end": 1596.16, "start": 1594.2, "text": "are"}, {"end": 1597.0, "start": 1596.16, "text": "looking"}, {"end": 1597.88, "start": 1597.0, "text": "at."}, {"end": 1597.96, "start": 1597.88, "text": "So"}, {"end": 1598.16, "start": 1597.96, "text": "right"}, {"end": 1598.52, "start": 1598.16, "text": "now,"}, {"end": 1598.68, "start": 1598.52, "text": "how"}, {"end": 1598.76, "start": 1598.68, "text": "do"}, {"end": 1599.12, "start": 1598.76, "text": "you"}, {"end": 1599.72, "start": 1599.12, "text": "address"}, {"end": 1599.88, "start": 1599.72, "text": "the"}, {"end": 1603.88, "start": 1599.88, "text": "costar"}, {"end": 1604.2, "start": 1603.88, "text": "problem?"}, {"end": 1604.2, "start": 1604.2, "text": "You"}, {"end": 1604.4, "start": 1604.2, "text": "mean"}, {"end": 1605.2, "start": 1604.4, "text": "costar"}, {"end": 1605.68, "start": 1605.2, "text": "on"}, {"end": 1606.04, "start": 1605.68, "text": "the"}, {"end": 1606.44, "start": 1606.04, "text": "item"}, {"end": 1607.88, "start": 1606.44, "text": "side?"}, {"end": 1608.04, "start": 1607.88, "text": "On"}, {"end": 1608.16, "start": 1608.04, "text": "the"}, {"end": 1608.56, "start": 1608.16, "text": "user"}, {"end": 1609.2, "start": 1608.56, "text": "side."}, {"end": 1610.52, "start": 1609.2, "text": "On"}, {"end": 1610.64, "start": 1610.52, "text": "the"}, {"end": 1610.72, "start": 1610.64, 
"text": "user"}, {"end": 1611.16, "start": 1610.72, "text": "side."}, {"end": 1611.36, "start": 1611.16, "text": "That's"}, {"end": 1611.56, "start": 1611.36, "text": "a"}, {"end": 1611.88, "start": 1611.56, "text": "good"}, {"end": 1612.48, "start": 1611.88, "text": "question."}, {"end": 1612.8, "start": 1612.48, "text": "So"}, {"end": 1612.96, "start": 1612.8, "text": "right"}, {"end": 1613.24, "start": 1612.96, "text": "now,"}, {"end": 1613.48, "start": 1613.24, "text": "we"}, {"end": 1613.48, "start": 1613.48, "text": "are"}, {"end": 1614.04, "start": 1613.48, "text": "mostly"}, {"end": 1614.6, "start": 1614.04, "text": "relying"}, {"end": 1614.84, "start": 1614.6, "text": "on"}, {"end": 1615.32, "start": 1614.84, "text": "this"}, {"end": 1615.76, "start": 1615.32, "text": "user"}, {"end": 1616.24, "start": 1615.76, "text": "path"}, {"end": 1617.0, "start": 1616.24, "text": "activities"}, {"end": 1617.28, "start": 1617.0, "text": "to"}, {"end": 1617.88, "start": 1617.28, "text": "kind"}, {"end": 1618.52, "start": 1617.88, "text": "of"}, {"end": 1619.16, "start": 1618.52, "text": "learn"}, {"end": 1619.44, "start": 1619.16, "text": "this"}, {"end": 1619.56, "start": 1619.44, "text": "user"}, {"end": 1619.96, "start": 1619.56, "text": "state"}], "text": " Yeah, so this is one of the metrics we are looking at. So right now, how do you address the costar problem? You mean costar on the item side? On the user side. On the user side. That's a good question. 
So right now, we are mostly relying on this user path activities to kind of learn this user state"}, {"chunks": [{"end": 1620.52, "start": 1620.0, "text": "So"}, {"end": 1621.32, "start": 1620.52, "text": "if"}, {"end": 1621.56, "start": 1621.32, "text": "a"}, {"end": 1622.64, "start": 1621.56, "text": "user"}, {"end": 1623.0, "start": 1622.64, "text": "does"}, {"end": 1623.28, "start": 1623.0, "text": "not"}, {"end": 1623.76, "start": 1623.28, "text": "have"}, {"end": 1624.24, "start": 1623.76, "text": "any"}, {"end": 1624.64, "start": 1624.24, "text": "past"}, {"end": 1625.36, "start": 1624.64, "text": "histories,"}, {"end": 1625.4, "start": 1625.36, "text": "then"}, {"end": 1625.8, "start": 1625.4, "text": "we"}, {"end": 1626.48, "start": 1625.8, "text": "kind"}, {"end": 1626.76, "start": 1626.48, "text": "of"}, {"end": 1627.04, "start": 1626.76, "text": "are"}, {"end": 1627.12, "start": 1627.04, "text": "in"}, {"end": 1628.16, "start": 1627.12, "text": "the"}, {"end": 1629.0, "start": 1628.16, "text": "last."}, {"end": 1629.28, "start": 1629.0, "text": "But"}, {"end": 1630.28, "start": 1629.28, "text": "then"}, {"end": 1630.8, "start": 1630.28, "text": "we"}, {"end": 1631.28, "start": 1630.8, "text": "are"}, {"end": 1631.48, "start": 1631.28, "text": "thinking"}, {"end": 1631.8, "start": 1631.48, "text": "of"}, {"end": 1632.28, "start": 1631.8, "text": "actually"}, {"end": 1632.56, "start": 1632.28, "text": "other"}, {"end": 1633.0, "start": 1632.56, "text": "ways"}, {"end": 1633.24, "start": 1633.0, "text": "to"}, {"end": 1634.56, "start": 1633.24, "text": "incorporate"}, {"end": 1635.16, "start": 1634.56, "text": "the"}, {"end": 1635.52, "start": 1635.16, "text": "user"}, {"end": 1636.48, "start": 1635.52, "text": "profile"}, {"end": 1636.76, "start": 1636.48, "text": "into"}, {"end": 1637.0, "start": 1636.76, "text": "the"}, {"end": 1637.28, "start": 1637.0, "text": "stage"}, {"end": 1638.48, "start": 1637.28, "text": "representation"}, {"end": 1639.36, 
"start": 1638.48, "text": "building"}, {"end": 1639.52, "start": 1639.36, "text": "so"}, {"end": 1640.08, "start": 1639.52, "text": "that"}, {"end": 1640.52, "start": 1640.08, "text": "I"}, {"end": 1641.6, "start": 1640.52, "text": "can"}, {"end": 1642.24, "start": 1641.6, "text": "hopefully"}, {"end": 1642.6, "start": 1642.24, "text": "address"}, {"end": 1643.24, "start": 1642.6, "text": "possibly"}, {"end": 1643.88, "start": 1643.24, "text": "their"}, {"end": 1644.56, "start": 1643.88, "text": "costart"}, {"end": 1650.0, "start": 1644.56, "text": "issue."}], "text": " So if a user does not have any past histories, then we kind of are in the last. But then we are thinking of actually other ways to incorporate the user profile into the stage representation building so that I can hopefully address possibly their costart issue."}, {"chunks": [{"end": 1650.4, "start": 1650.0, "text": "Other"}, {"end": 1658.96, "start": 1650.4, "text": "questions?"}, {"end": 1662.4, "start": 1658.96, "text": "Yeah,"}, {"end": 1668.84, "start": 1662.4, "text": "that's"}, {"end": 1675.28, "start": 1668.84, "text": "a"}, {"end": 1676.12, "start": 1675.28, "text": "very"}, {"end": 1676.52, "start": 1676.12, "text": "interesting"}, {"end": 1677.32, "start": 1676.52, "text": "question."}, {"end": 1678.04, "start": 1677.32, "text": "So"}, {"end": 1678.08, "start": 1678.04, "text": "it"}, {"end": 1678.36, "start": 1678.08, "text": "does"}, {"end": 1678.8, "start": 1678.36, "text": "touch"}, {"end": 1679.2, "start": 1678.8, "text": "upon"}, {"end": 1679.96, "start": 1679.2, "text": "the"}], "text": " Other questions? Yeah, that's a very interesting question. 
So it does touch upon the"}, {"chunks": [{"end": 1680.28, "start": 1680.0, "text": "on"}, {"end": 1680.4, "start": 1680.28, "text": "this"}, {"end": 1680.8, "start": 1680.4, "text": "kind"}, {"end": 1681.28, "start": 1680.8, "text": "of"}, {"end": 1681.48, "start": 1681.28, "text": "whole"}, {"end": 1682.0, "start": 1681.48, "text": "YouTube"}, {"end": 1682.56, "start": 1682.0, "text": "ecosystem."}, {"end": 1682.96, "start": 1682.56, "text": "So"}, {"end": 1683.04, "start": 1682.96, "text": "right"}, {"end": 1683.44, "start": 1683.04, "text": "now,"}, {"end": 1684.08, "start": 1683.44, "text": "what"}, {"end": 1685.12, "start": 1684.08, "text": "we"}, {"end": 1685.6, "start": 1685.12, "text": "are"}, {"end": 1686.16, "start": 1685.6, "text": "trying"}, {"end": 1686.56, "start": 1686.16, "text": "to"}, {"end": 1686.72, "start": 1686.56, "text": "do"}, {"end": 1687.0, "start": 1686.72, "text": "is"}, {"end": 1687.6, "start": 1687.0, "text": "we"}, {"end": 1688.04, "start": 1687.6, "text": "do"}, {"end": 1688.44, "start": 1688.04, "text": "make"}, {"end": 1689.64, "start": 1688.44, "text": "utilize"}, {"end": 1690.0, "start": 1689.64, "text": "of"}, {"end": 1690.16, "start": 1690.0, "text": "the"}, {"end": 1690.76, "start": 1690.16, "text": "other"}, {"end": 1691.12, "start": 1690.76, "text": "kind"}, {"end": 1691.96, "start": 1691.12, "text": "of"}, {"end": 1693.24, "start": 1691.96, "text": "degenerators"}, {"end": 1693.32, "start": 1693.24, "text": "traffic."}, {"end": 1693.64, "start": 1693.32, "text": "So"}, {"end": 1694.16, "start": 1693.64, "text": "by"}, {"end": 1694.48, "start": 1694.16, "text": "taking"}, {"end": 1695.04, "start": 1694.48, "text": "them"}, {"end": 1695.68, "start": 1695.04, "text": "into"}, {"end": 1696.04, "start": 1695.68, "text": "our"}, {"end": 1696.8, "start": 1696.04, "text": "learning,"}, {"end": 1698.0, "start": 1696.8, "text": "so"}, {"end": 1698.44, "start": 1698.0, "text": "that's"}, {"end": 1698.68, "start": 1698.44, "text": 
"in"}, {"end": 1699.52, "start": 1698.68, "text": "the"}, {"end": 1700.04, "start": 1699.52, "text": "sense"}, {"end": 1700.28, "start": 1700.04, "text": "we"}, {"end": 1701.0, "start": 1700.28, "text": "can"}, {"end": 1702.04, "start": 1701.0, "text": "sort"}, {"end": 1702.4, "start": 1702.04, "text": "of"}, {"end": 1702.44, "start": 1702.4, "text": "learn"}, {"end": 1703.0, "start": 1702.44, "text": "from"}, {"end": 1703.72, "start": 1703.0, "text": "kind"}, {"end": 1704.16, "start": 1703.72, "text": "of"}, {"end": 1704.88, "start": 1704.16, "text": "mimicking"}, {"end": 1705.28, "start": 1704.88, "text": "their"}, {"end": 1705.6, "start": 1705.28, "text": "good"}, {"end": 1707.08, "start": 1705.6, "text": "behaviors."}, {"end": 1707.68, "start": 1707.08, "text": "But"}, {"end": 1707.84, "start": 1707.68, "text": "then"}, {"end": 1708.0, "start": 1707.84, "text": "there"}, {"end": 1708.28, "start": 1708.0, "text": "are"}, {"end": 1708.96, "start": 1708.28, "text": "also"}, {"end": 1709.28, "start": 1708.96, "text": "work"}, {"end": 1709.6, "start": 1709.28, "text": "coming"}, {"end": 1709.96, "start": 1709.6, "text": "out"}], "text": " on this kind of whole YouTube ecosystem. So right now, what we are trying to do is we do make utilize of the other kind of degenerators traffic. So by taking them into our learning, so that's in the sense we can sort of learn from kind of mimicking their good behaviors. 
But then there are also work coming out"}, {"chunks": [{"end": 1710.32, "start": 1710.0, "text": "on"}, {"end": 1710.64, "start": 1710.32, "text": "the"}, {"end": 1710.64, "start": 1710.64, "text": "team"}, {"end": 1710.96, "start": 1710.64, "text": "try"}, {"end": 1711.16, "start": 1710.96, "text": "to"}, {"end": 1711.64, "start": 1711.16, "text": "account"}, {"end": 1712.08, "start": 1711.64, "text": "for,"}, {"end": 1712.16, "start": 1712.08, "text": "for"}, {"end": 1712.68, "start": 1712.16, "text": "example,"}, {"end": 1712.68, "start": 1712.68, "text": "the"}, {"end": 1713.44, "start": 1712.68, "text": "interactions"}, {"end": 1713.92, "start": 1713.44, "text": "between"}, {"end": 1714.12, "start": 1713.92, "text": "the"}, {"end": 1714.8, "start": 1714.12, "text": "candidate"}, {"end": 1715.56, "start": 1714.8, "text": "generator"}, {"end": 1715.76, "start": 1715.56, "text": "and"}, {"end": 1716.56, "start": 1715.76, "text": "the"}, {"end": 1717.04, "start": 1716.56, "text": "ranker,"}, {"end": 1717.28, "start": 1717.04, "text": "how"}, {"end": 1717.6, "start": 1717.28, "text": "to"}, {"end": 1718.76, "start": 1717.6, "text": "better"}, {"end": 1719.48, "start": 1718.76, "text": "integrate"}, {"end": 1719.64, "start": 1719.48, "text": "the"}, {"end": 1719.92, "start": 1719.64, "text": "two"}, {"end": 1720.32, "start": 1719.92, "text": "process"}, {"end": 1720.52, "start": 1720.32, "text": "so"}, {"end": 1720.84, "start": 1720.52, "text": "that"}, {"end": 1721.16, "start": 1720.84, "text": "it"}, {"end": 1721.72, "start": 1721.16, "text": "will"}, {"end": 1722.04, "start": 1721.72, "text": "lead"}, {"end": 1722.68, "start": 1722.04, "text": "to"}, {"end": 1723.16, "start": 1722.68, "text": "a"}, {"end": 1724.16, "start": 1723.16, "text": "better"}, {"end": 1725.32, "start": 1724.16, "text": "recommendation"}, {"end": 1734.32, "start": 1725.32, "text": "in"}, {"end": 1737.6, "start": 1734.32, "text": "the"}, {"end": 1740.0, "start": 1737.6, "text": "end."}], 
"text": " on the team try to account for, for example, the interactions between the candidate generator and the ranker, how to better integrate the two process so that it will lead to a better recommendation in the end."}, {"chunks": [{"end": 1745.28, "start": 1740.0, "text": "It's"}, {"end": 1745.8, "start": 1745.28, "text": "a"}, {"end": 1746.44, "start": 1745.8, "text": "very"}, {"end": 1747.2, "start": 1746.44, "text": "interesting"}, {"end": 1747.92, "start": 1747.2, "text": "question."}, {"end": 1748.56, "start": 1747.92, "text": "So"}, {"end": 1748.76, "start": 1748.56, "text": "indeed,"}, {"end": 1749.64, "start": 1748.76, "text": "we"}, {"end": 1751.92, "start": 1749.64, "text": "actually"}, {"end": 1752.88, "start": 1751.92, "text": "transitioned"}, {"end": 1753.88, "start": 1752.88, "text": "from"}, {"end": 1755.0, "start": 1753.88, "text": "an"}, {"end": 1755.48, "start": 1755.0, "text": "agent"}, {"end": 1755.8, "start": 1755.48, "text": "which"}, {"end": 1756.0, "start": 1755.8, "text": "is"}, {"end": 1756.64, "start": 1756.0, "text": "kind"}, {"end": 1757.24, "start": 1756.64, "text": "of"}, {"end": 1757.64, "start": 1757.24, "text": "doing"}, {"end": 1758.48, "start": 1757.64, "text": "purely"}, {"end": 1759.4, "start": 1758.48, "text": "mimicking"}, {"end": 1759.6, "start": 1759.4, "text": "from"}, {"end": 1760.12, "start": 1759.6, "text": "previous"}, {"end": 1760.4, "start": 1760.12, "text": "system."}, {"end": 1761.0, "start": 1760.4, "text": "So"}, {"end": 1761.28, "start": 1761.0, "text": "in"}, {"end": 1762.4, "start": 1761.28, "text": "the"}, {"end": 1762.76, "start": 1762.4, "text": "sense,"}, {"end": 1762.92, "start": 1762.76, "text": "the"}, {"end": 1763.48, "start": 1762.92, "text": "agent"}, {"end": 1763.88, "start": 1763.48, "text": "is"}, {"end": 1765.6, "start": 1763.88, "text": "having"}, {"end": 1766.68, "start": 1765.6, "text": "reasonable"}, {"end": 1767.2, "start": 1766.68, "text": "levels"}, {"end": 1767.44, "start": 1767.2, 
"text": "of"}, {"end": 1768.4, "start": 1767.44, "text": "performance"}, {"end": 1768.4, "start": 1768.4, "text": "and"}, {"end": 1768.72, "start": 1768.4, "text": "the"}, {"end": 1768.96, "start": 1768.72, "text": "agent"}, {"end": 1769.04, "start": 1768.96, "text": "is"}, {"end": 1769.96, "start": 1769.04, "text": "having"}], "text": " It's a very interesting question. So indeed, we actually transitioned from an agent which is kind of doing purely mimicking from previous system. So in the sense, the agent is having reasonable levels of performance and the agent is having"}, {"chunks": [{"end": 1770.56, "start": 1770.0, "text": "by"}, {"end": 1772.44, "start": 1770.56, "text": "incorporating"}, {"end": 1772.8, "start": 1772.44, "text": "like"}, {"end": 1773.12, "start": 1772.8, "text": "this"}, {"end": 1773.36, "start": 1773.12, "text": "our"}, {"end": 1773.92, "start": 1773.36, "text": "policy"}, {"end": 1774.56, "start": 1773.92, "text": "learning"}, {"end": 1774.84, "start": 1774.56, "text": "and"}, {"end": 1775.48, "start": 1774.84, "text": "long-term"}, {"end": 1776.2, "start": 1775.48, "text": "rewards"}, {"end": 1776.36, "start": 1776.2, "text": "is"}, {"end": 1776.76, "start": 1776.36, "text": "try"}, {"end": 1777.4, "start": 1776.76, "text": "to"}, {"end": 1778.32, "start": 1777.4, "text": "shift"}, {"end": 1778.68, "start": 1778.32, "text": "from"}, {"end": 1779.04, "start": 1778.68, "text": "this"}, {"end": 1780.0, "start": 1779.04, "text": "original"}, {"end": 1780.4, "start": 1780.0, "text": "behavior"}, {"end": 1780.72, "start": 1780.4, "text": "of"}, {"end": 1781.08, "start": 1780.72, "text": "just"}, {"end": 1782.04, "start": 1781.08, "text": "making"}, {"end": 1782.64, "start": 1782.04, "text": "the"}, {"end": 1783.0, "start": 1782.64, "text": "prior"}, {"end": 1783.56, "start": 1783.0, "text": "system"}, {"end": 1784.28, "start": 1783.56, "text": "towards"}, {"end": 1785.0, "start": 1784.28, "text": "really"}, {"end": 1785.92, "start": 1785.0, 
"text": "optimizing"}, {"end": 1786.44, "start": 1785.92, "text": "for"}, {"end": 1792.04, "start": 1786.44, "text": "the"}, {"end": 1792.36, "start": 1792.04, "text": "long"}, {"end": 1800.0, "start": 1792.36, "text": "term."}], "text": " by incorporating like this our policy learning and long-term rewards is try to shift from this original behavior of just making the prior system towards really optimizing for the long term."}, {"chunks": [{"end": 1805.2, "start": 1800.0, "text": "Yes,"}, {"end": 1808.12, "start": 1805.2, "text": "we"}, {"end": 1810.4, "start": 1808.12, "text": "actually"}, {"end": 1813.48, "start": 1810.4, "text": "do"}, {"end": 1815.16, "start": 1813.48, "text": "observe"}, {"end": 1820.2, "start": 1815.16, "text": "like"}, {"end": 1821.28, "start": 1820.2, "text": "this"}, {"end": 1822.24, "start": 1821.28, "text": "very"}, {"end": 1823.2, "start": 1822.24, "text": "interesting"}, {"end": 1823.64, "start": 1823.2, "text": "user"}, {"end": 1824.2, "start": 1823.64, "text": "learning"}, {"end": 1825.0, "start": 1824.2, "text": "process"}, {"end": 1825.4, "start": 1825.0, "text": "as"}, {"end": 1825.96, "start": 1825.4, "text": "well"}, {"end": 1827.52, "start": 1825.96, "text": "during"}, {"end": 1828.44, "start": 1827.52, "text": "our"}, {"end": 1829.96, "start": 1828.44, "text": "experiment."}], "text": " Yes, we actually do observe like this very interesting user learning process as well during our experiment."}, {"chunks": [{"end": 1830.48, "start": 1830.0, "text": "So"}, {"end": 1831.28, "start": 1830.48, "text": "there"}, {"end": 1831.56, "start": 1831.28, "text": "is"}, {"end": 1831.96, "start": 1831.56, "text": "this"}, {"end": 1832.8, "start": 1831.96, "text": "growing"}, {"end": 1833.24, "start": 1832.8, "text": "impact"}, {"end": 1833.6, "start": 1833.24, "text": "of"}, {"end": 1835.8, "start": 1833.6, "text": "our"}, {"end": 1836.8, "start": 1835.8, "text": "improvement"}, {"end": 1837.12, "start": 1836.8, "text": "during"}, {"end": 
1837.56, "start": 1837.12, "text": "like"}, {"end": 1837.92, "start": 1837.56, "text": "even"}, {"end": 1838.24, "start": 1837.92, "text": "just"}, {"end": 1838.68, "start": 1838.24, "text": "two,"}, {"end": 1838.96, "start": 1838.68, "text": "three"}, {"end": 1839.64, "start": 1838.96, "text": "weeks"}, {"end": 1840.36, "start": 1839.64, "text": "period."}, {"end": 1840.44, "start": 1840.36, "text": "We"}, {"end": 1840.48, "start": 1840.44, "text": "see"}, {"end": 1840.76, "start": 1840.48, "text": "like"}, {"end": 1841.04, "start": 1840.76, "text": "it's"}, {"end": 1841.92, "start": 1841.04, "text": "kind"}, {"end": 1843.08, "start": 1841.92, "text": "of"}, {"end": 1843.4, "start": 1843.08, "text": "a"}, {"end": 1843.8, "start": 1843.4, "text": "move"}, {"end": 1844.32, "start": 1843.8, "text": "from"}, {"end": 1844.68, "start": 1844.32, "text": "a"}, {"end": 1845.72, "start": 1844.68, "text": "neutral-ish"}, {"end": 1846.48, "start": 1845.72, "text": "improvement"}, {"end": 1847.16, "start": 1846.48, "text": "towards"}, {"end": 1847.44, "start": 1847.16, "text": "a"}, {"end": 1848.04, "start": 1847.44, "text": "bigger"}, {"end": 1848.28, "start": 1848.04, "text": "gain"}, {"end": 1848.64, "start": 1848.28, "text": "where"}, {"end": 1849.76, "start": 1848.64, "text": "the"}, {"end": 1850.56, "start": 1849.76, "text": "user"}, {"end": 1850.84, "start": 1850.56, "text": "is"}, {"end": 1851.28, "start": 1850.84, "text": "actually"}, {"end": 1851.76, "start": 1851.28, "text": "also"}, {"end": 1852.44, "start": 1851.76, "text": "adapting"}, {"end": 1853.04, "start": 1852.44, "text": "to"}, {"end": 1853.56, "start": 1853.04, "text": "this"}, {"end": 1853.68, "start": 1853.56, "text": "new"}, {"end": 1854.2, "start": 1853.68, "text": "behavior"}, {"end": 1854.76, "start": 1854.2, "text": "exhibited"}, {"end": 1855.92, "start": 1854.76, "text": "by"}, {"end": 1856.32, "start": 1855.92, "text": "the"}, {"end": 1860.0, "start": 1856.32, "text": "system."}], "text": " So 
there is this growing impact of our improvement during like even just two, three weeks period. We see like it's kind of a move from a neutral-ish improvement towards a bigger gain where the user is actually also adapting to this new behavior exhibited by the system."}, {"chunks": [{"end": 1869.48, "start": 1860.0, "text": "It's"}, {"end": 1869.92, "start": 1869.48, "text": "a"}, {"end": 1870.28, "start": 1869.92, "text": "great"}, {"end": 1870.88, "start": 1870.28, "text": "question."}, {"end": 1871.24, "start": 1870.88, "text": "So"}, {"end": 1873.76, "start": 1871.24, "text": "we"}, {"end": 1877.2, "start": 1873.76, "text": "haven't"}, {"end": 1877.56, "start": 1877.2, "text": "really"}, {"end": 1877.96, "start": 1877.56, "text": "been"}, {"end": 1878.24, "start": 1877.96, "text": "trying"}, {"end": 1878.48, "start": 1878.24, "text": "to"}, {"end": 1878.8, "start": 1878.48, "text": "tease"}, {"end": 1879.32, "start": 1878.8, "text": "out"}, {"end": 1879.84, "start": 1879.32, "text": "whether"}, {"end": 1880.52, "start": 1879.84, "text": "on,"}, {"end": 1880.72, "start": 1880.52, "text": "like,"}, {"end": 1881.0, "start": 1880.72, "text": "the"}, {"end": 1881.4, "start": 1881.0, "text": "improvement"}, {"end": 1881.56, "start": 1881.4, "text": "is"}, {"end": 1881.96, "start": 1881.56, "text": "more"}, {"end": 1882.64, "start": 1881.96, "text": "coming"}, {"end": 1883.92, "start": 1882.64, "text": "from"}, {"end": 1885.2, "start": 1883.92, "text": "the"}, {"end": 1885.84, "start": 1885.2, "text": "heavy"}, {"end": 1886.24, "start": 1885.84, "text": "head"}, {"end": 1886.72, "start": 1886.24, "text": "users"}, {"end": 1886.84, "start": 1886.72, "text": "or"}, {"end": 1886.96, "start": 1886.84, "text": "the"}, {"end": 1887.56, "start": 1886.96, "text": "tail"}, {"end": 1888.76, "start": 1887.56, "text": "users."}, {"end": 1889.04, "start": 1888.76, "text": "But"}, {"end": 1889.08, "start": 1889.04, "text": "usually"}, {"end": 1889.36, "start": 1889.08, "text": "the"}, 
{"end": 1889.72, "start": 1889.36, "text": "tail"}, {"end": 1889.96, "start": 1889.72, "text": "users"}], "text": " It's a great question. So we haven't really been trying to tease out whether on, like, the improvement is more coming from the heavy head users or the tail users. But usually the tail users"}, {"chunks": [{"end": 1890.28, "start": 1890.0, "text": "Kind"}, {"end": 1890.44, "start": 1890.28, "text": "of"}, {"end": 1890.88, "start": 1890.44, "text": "tail"}, {"end": 1891.16, "start": 1890.88, "text": "users"}, {"end": 1891.64, "start": 1891.16, "text": "are"}, {"end": 1891.88, "start": 1891.64, "text": "also"}, {"end": 1892.08, "start": 1891.88, "text": "more"}, {"end": 1892.44, "start": 1892.08, "text": "interested"}, {"end": 1892.6, "start": 1892.44, "text": "in"}, {"end": 1892.92, "start": 1892.6, "text": "the"}, {"end": 1893.28, "start": 1892.92, "text": "tail"}, {"end": 1893.92, "start": 1893.28, "text": "content."}, {"end": 1894.08, "start": 1893.92, "text": "So"}, {"end": 1894.4, "start": 1894.08, "text": "maybe"}, {"end": 1894.8, "start": 1894.4, "text": "the"}, {"end": 1895.16, "start": 1894.8, "text": "shift"}, {"end": 1896.08, "start": 1895.16, "text": "in"}, {"end": 1896.44, "start": 1896.08, "text": "kind"}, {"end": 1896.44, "start": 1896.44, "text": "of"}, {"end": 1896.96, "start": 1896.44, "text": "traffic"}, {"end": 1897.52, "start": 1896.96, "text": "towards"}, {"end": 1898.24, "start": 1897.52, "text": "tail"}, {"end": 1898.56, "start": 1898.24, "text": "also"}, {"end": 1898.72, "start": 1898.56, "text": "suggests"}, {"end": 1899.0, "start": 1898.72, "text": "maybe"}, {"end": 1899.72, "start": 1899.0, "text": "we're"}, {"end": 1900.0, "start": 1899.72, "text": "dealing"}, {"end": 1900.56, "start": 1900.0, "text": "better"}, {"end": 1900.72, "start": 1900.56, "text": "with"}, {"end": 1900.92, "start": 1900.72, "text": "the"}, {"end": 1901.44, "start": 1900.92, "text": "tail"}, {"end": 1908.12, "start": 1901.44, "text": "users."}, {"end": 
1908.52, "start": 1908.12, "text": "Yes."}, {"end": 1908.68, "start": 1908.52, "text": "When"}, {"end": 1909.2, "start": 1908.68, "text": "running"}, {"end": 1909.64, "start": 1909.2, "text": "these"}, {"end": 1909.8, "start": 1909.64, "text": "A/B"}, {"end": 1909.96, "start": 1909.8, "text": "tests,"}, {"end": 1910.76, "start": 1909.96, "text": "how"}, {"end": 1911.56, "start": 1910.76, "text": "often"}, {"end": 1911.56, "start": 1911.56, "text": "did"}, {"end": 1912.32, "start": 1911.56, "text": "you"}, {"end": 1912.52, "start": 1912.32, "text": "measure"}, {"end": 1912.92, "start": 1912.52, "text": "the"}, {"end": 1914.64, "start": 1912.92, "text": "changes?"}, {"end": 1915.08, "start": 1914.64, "text": "So"}, {"end": 1915.4, "start": 1915.08, "text": "we"}, {"end": 1915.84, "start": 1915.4, "text": "run"}, {"end": 1915.96, "start": 1915.84, "text": "it"}, {"end": 1916.56, "start": 1915.96, "text": "for"}, {"end": 1916.84, "start": 1916.56, "text": "like"}, {"end": 1917.28, "start": 1916.84, "text": "usually"}, {"end": 1917.96, "start": 1917.28, "text": "several"}, {"end": 1918.36, "start": 1917.96, "text": "weeks."}, {"end": 1918.4, "start": 1918.36, "text": "And"}, {"end": 1918.68, "start": 1918.4, "text": "then"}, {"end": 1919.2, "start": 1918.68, "text": "after"}, {"end": 1919.44, "start": 1919.2, "text": "that,"}, {"end": 1919.52, "start": 1919.44, "text": "we"}, {"end": 1919.96, "start": 1919.52, "text": "also"}], "text": " Kind of tail users are also more interested in the tail content. So maybe the shift in kind of traffic towards tail also suggests maybe we're dealing better with the tail users. Yes. When running these A/B tests, how often did you measure the changes? So we run it for like usually several weeks. 
And then after that, we also"}, {"chunks": [{"end": 1920.36, "start": 1920.0, "text": "when"}, {"end": 1920.92, "start": 1920.36, "text": "kind"}, {"end": 1921.04, "start": 1920.92, "text": "of"}, {"end": 1921.28, "start": 1921.04, "text": "hold"}, {"end": 1921.76, "start": 1921.28, "text": "back"}, {"end": 1922.36, "start": 1921.76, "text": "experiments"}, {"end": 1922.6, "start": 1922.36, "text": "even"}, {"end": 1922.88, "start": 1922.6, "text": "much"}, {"end": 1923.2, "start": 1922.88, "text": "longer"}, {"end": 1923.4, "start": 1923.2, "text": "to"}, {"end": 1923.8, "start": 1923.4, "text": "hold"}, {"end": 1923.88, "start": 1923.8, "text": "back"}, {"end": 1924.8, "start": 1923.88, "text": "experiments"}, {"end": 1925.08, "start": 1924.8, "text": "where"}, {"end": 1925.36, "start": 1925.08, "text": "we"}, {"end": 1926.28, "start": 1925.36, "text": "will"}, {"end": 1926.48, "start": 1926.28, "text": "hold"}, {"end": 1926.64, "start": 1926.48, "text": "back"}, {"end": 1926.76, "start": 1926.64, "text": "the"}, {"end": 1927.04, "start": 1926.76, "text": "system"}, {"end": 1927.52, "start": 1927.04, "text": "for"}, {"end": 1928.48, "start": 1927.52, "text": "months"}, {"end": 1928.96, "start": 1928.48, "text": "and"}, {"end": 1929.88, "start": 1928.96, "text": "try"}, {"end": 1930.52, "start": 1929.88, "text": "to"}, {"end": 1930.76, "start": 1930.52, "text": "see"}, {"end": 1930.88, "start": 1930.76, "text": "if"}, {"end": 1931.72, "start": 1930.88, "text": "the"}, {"end": 1932.56, "start": 1931.72, "text": "gain"}, {"end": 1932.88, "start": 1932.56, "text": "I"}, {"end": 1933.8, "start": 1932.88, "text": "mean"}, {"end": 1934.52, "start": 1933.8, "text": "the"}, {"end": 1934.96, "start": 1934.52, "text": "difference"}, {"end": 1935.08, "start": 1934.96, "text": "between"}, {"end": 1935.12, "start": 1935.08, "text": "the"}, {"end": 1935.88, "start": 1935.12, "text": "control"}, {"end": 1936.28, "start": 1935.88, "text": "and"}, {"end": 1936.32, "start": 1936.28, 
"text": "the"}, {"end": 1936.8, "start": 1936.32, "text": "experiment"}, {"end": 1937.32, "start": 1936.8, "text": "group."}, {"end": 1950.0, "start": 1937.32, "text": "Yes?"}], "text": " when kind of hold back experiments even much longer to hold back experiments where we will hold back the system for months and try to see if the gain I mean the difference between the control and the experiment group. Yes?"}, {"chunks": [{"end": 1957.76, "start": 1950.0, "text": "So,"}, {"end": 1958.16, "start": 1957.76, "text": "I"}, {"end": 1959.76, "start": 1958.16, "text": "mean,"}, {"end": 1960.16, "start": 1959.76, "text": "if"}, {"end": 1960.64, "start": 1960.16, "text": "a"}, {"end": 1961.12, "start": 1960.64, "text": "cookie"}, {"end": 1961.92, "start": 1961.12, "text": "expires"}, {"end": 1962.16, "start": 1961.92, "text": "or"}, {"end": 1962.44, "start": 1962.16, "text": "get"}, {"end": 1963.16, "start": 1962.44, "text": "cleaned,"}, {"end": 1963.32, "start": 1963.16, "text": "I"}, {"end": 1963.6, "start": 1963.32, "text": "think"}, {"end": 1963.96, "start": 1963.6, "text": "they"}, {"end": 1964.56, "start": 1963.96, "text": "will"}, {"end": 1964.76, "start": 1964.56, "text": "be"}, {"end": 1965.2, "start": 1964.76, "text": "just"}, {"end": 1965.48, "start": 1965.2, "text": "dropped"}, {"end": 1966.0, "start": 1965.48, "text": "out"}, {"end": 1966.88, "start": 1966.0, "text": "from"}, {"end": 1968.32, "start": 1966.88, "text": "the"}, {"end": 1968.84, "start": 1968.32, "text": "long-term"}, {"end": 1979.96, "start": 1968.84, "text": "holdback."}], "text": " So, I mean, if a cookie expires or get cleaned, I think they will be just dropped out from the long-term holdback."}, {"chunks": [{"end": 1983.24, "start": 1980.0, "text": "Yeah,"}, {"end": 1983.76, "start": 1983.24, "text": "that's"}, {"end": 1983.8, "start": 1983.76, "text": "a"}, {"end": 1984.16, "start": 1983.8, "text": "good"}, {"end": 1984.8, "start": 1984.16, "text": "question."}, {"end": 1985.0, "start": 
1984.8, "text": "I"}, {"end": 1985.12, "start": 1985.0, "text": "don't"}, {"end": 1985.36, "start": 1985.12, "text": "have"}, {"end": 1985.6, "start": 1985.36, "text": "a"}, {"end": 1986.04, "start": 1985.6, "text": "good"}, {"end": 1986.76, "start": 1986.04, "text": "answer"}, {"end": 1987.32, "start": 1986.76, "text": "right"}, {"end": 1987.8, "start": 1987.32, "text": "now."}, {"end": 1987.92, "start": 1987.8, "text": "Maybe"}, {"end": 1988.0, "start": 1987.92, "text": "I"}, {"end": 1988.2, "start": 1988.0, "text": "can"}, {"end": 1988.44, "start": 1988.2, "text": "address"}, {"end": 1994.56, "start": 1988.44, "text": "later."}, {"end": 1995.72, "start": 1994.56, "text": "Thank"}, {"end": 1996.2, "start": 1995.72, "text": "you,"}, {"end": 1996.8, "start": 1996.2, "text": "Henry."}, {"end": 1996.8, "start": 1996.8, "text": "Thank"}, {"end": 1996.84, "start": 1996.8, "text": "you."}], "text": " Yeah, that's a good question. I don't have a good answer right now. Maybe I can address later. Thank you, Henry. Thank you."}]}}