{"message": {"transcript": [{"chunks": [{"end": 6.88, "start": 0.0, "text": "So"}, {"end": 7.24, "start": 6.88, "text": "I"}, {"end": 7.48, "start": 7.24, "text": "interned"}, {"end": 7.56, "start": 7.48, "text": "at"}, {"end": 7.64, "start": 7.56, "text": "Two"}, {"end": 8.0, "start": 7.64, "text": "Sigma"}, {"end": 8.52, "start": 8.0, "text": "last"}, {"end": 8.92, "start": 8.52, "text": "summer"}, {"end": 9.08, "start": 8.92, "text": "and"}, {"end": 9.28, "start": 9.08, "text": "I"}, {"end": 9.76, "start": 9.28, "text": "returned"}, {"end": 10.2, "start": 9.76, "text": "this"}, {"end": 11.0, "start": 10.2, "text": "fall"}, {"end": 11.48, "start": 11.0, "text": "primarily"}, {"end": 11.72, "start": 11.48, "text": "because"}, {"end": 11.8, "start": 11.72, "text": "of"}, {"end": 11.96, "start": 11.8, "text": "the"}, {"end": 12.2, "start": 11.96, "text": "people"}, {"end": 12.52, "start": 12.2, "text": "I"}, {"end": 12.72, "start": 12.52, "text": "met."}, {"end": 12.72, "start": 12.72, "text": "I"}, {"end": 12.92, "start": 12.72, "text": "feel"}, {"end": 13.16, "start": 12.92, "text": "like"}, {"end": 13.24, "start": 13.16, "text": "people"}, {"end": 13.68, "start": 13.24, "text": "here"}, {"end": 14.2, "start": 13.68, "text": "genuinely"}, {"end": 14.56, "start": 14.2, "text": "form"}, {"end": 14.56, "start": 14.56, "text": "a"}, {"end": 14.92, "start": 14.56, "text": "community"}, {"end": 15.2, "start": 14.92, "text": "around"}, {"end": 15.64, "start": 15.2, "text": "shared"}, {"end": 16.04, "start": 15.64, "text": "interests,"}, {"end": 16.44, "start": 16.04, "text": "usually"}, {"end": 16.64, "start": 16.44, "text": "a"}, {"end": 16.96, "start": 16.64, "text": "shared"}, {"end": 17.4, "start": 16.96, "text": "passion"}, {"end": 17.6, "start": 17.4, "text": "for"}, {"end": 18.12, "start": 17.6, "text": "science"}, {"end": 18.52, "start": 18.12, "text": "and"}, {"end": 18.56, "start": 18.52, "text": "tech."}, {"end": 18.72, "start": 18.56, "text": "Honestly"}, 
{"end": 19.08, "start": 18.72, "text": "speaking,"}, {"end": 19.72, "start": 19.08, "text": "finance"}, {"end": 20.08, "start": 19.72, "text": "wasn't"}, {"end": 20.16, "start": 20.08, "text": "on"}, {"end": 20.48, "start": 20.16, "text": "my"}, {"end": 20.84, "start": 20.48, "text": "list"}, {"end": 21.0, "start": 20.84, "text": "when"}, {"end": 21.08, "start": 21.0, "text": "I"}, {"end": 21.44, "start": 21.08, "text": "started"}, {"end": 21.92, "start": 21.44, "text": "job"}, {"end": 22.56, "start": 21.92, "text": "hunting."}, {"end": 22.96, "start": 22.56, "text": "But"}, {"end": 23.16, "start": 22.96, "text": "when"}, {"end": 23.28, "start": 23.16, "text": "I"}, {"end": 23.56, "start": 23.28, "text": "came"}, {"end": 24.12, "start": 23.56, "text": "here,"}, {"end": 24.72, "start": 24.12, "text": "everyone"}, {"end": 24.96, "start": 24.72, "text": "was"}, {"end": 25.68, "start": 24.96, "text": "so"}, {"end": 26.0, "start": 25.68, "text": "smart"}, {"end": 26.4, "start": 26.0, "text": "and"}, {"end": 26.72, "start": 26.4, "text": "so"}, {"end": 27.28, "start": 26.72, "text": "friendly"}, {"end": 27.28, "start": 27.28, "text": "and"}, {"end": 27.8, "start": 27.28, "text": "so"}, {"end": 28.2, "start": 27.8, "text": "sincere."}, {"end": 28.68, "start": 28.2, "text": "And"}, {"end": 28.88, "start": 28.68, "text": "I"}, {"end": 29.04, "start": 28.88, "text": "feel"}, {"end": 29.52, "start": 29.04, "text": "like"}, {"end": 29.96, "start": 29.52, "text": "my"}], "text": " So I interned at Two Sigma last summer and I returned this fall primarily because of the people I met. I feel like people here genuinely form a community around shared interests, usually a shared passion for science and tech. Honestly speaking, finance wasn't on my list when I started job hunting. But when I came here, everyone was so smart and so friendly and so sincere. 
And I feel like my"}, {"chunks": [{"end": 30.48, "start": 30.0, "text": "Skillset"}, {"end": 30.68, "start": 30.48, "text": "can"}, {"end": 30.8, "start": 30.68, "text": "be"}, {"end": 31.08, "start": 30.8, "text": "truly"}, {"end": 31.52, "start": 31.08, "text": "valued"}, {"end": 31.72, "start": 31.52, "text": "here"}, {"end": 31.92, "start": 31.72, "text": "and"}, {"end": 32.64, "start": 31.92, "text": "that's"}, {"end": 33.04, "start": 32.64, "text": "something"}, {"end": 33.4, "start": 33.04, "text": "really"}, {"end": 33.72, "start": 33.4, "text": "important"}, {"end": 33.96, "start": 33.72, "text": "to"}, {"end": 34.2, "start": 33.96, "text": "me."}, {"end": 34.56, "start": 34.2, "text": "There's"}, {"end": 35.12, "start": 34.56, "text": "so"}, {"end": 35.36, "start": 35.12, "text": "many"}, {"end": 35.88, "start": 35.36, "text": "cool,"}, {"end": 36.36, "start": 35.88, "text": "exciting"}, {"end": 36.72, "start": 36.36, "text": "projects"}, {"end": 36.92, "start": 36.72, "text": "to"}, {"end": 37.32, "start": 36.92, "text": "work"}, {"end": 37.56, "start": 37.32, "text": "on"}, {"end": 37.72, "start": 37.56, "text": "and"}, {"end": 38.0, "start": 37.72, "text": "I"}, {"end": 38.36, "start": 38.0, "text": "know"}, {"end": 38.8, "start": 38.36, "text": "I'm"}, {"end": 38.96, "start": 38.8, "text": "going"}, {"end": 39.480000000000004, "start": 38.96, "text": "to"}, {"end": 39.64, "start": 39.480000000000004, "text": "be"}, {"end": 40.36, "start": 39.64, "text": "working"}, {"end": 40.36, "start": 40.36, "text": "with"}, {"end": 40.84, "start": 40.36, "text": "those"}, {"end": 41.24, "start": 40.84, "text": "people"}, {"end": 41.4, "start": 41.24, "text": "that"}, {"end": 41.76, "start": 41.4, "text": "are"}, {"end": 42.04, "start": 41.76, "text": "super"}, {"end": 42.24, "start": 42.04, "text": "knowledgeable"}, {"end": 42.519999999999996, "start": 42.24, "text": "and"}, {"end": 43.16, "start": 42.519999999999996, "text": "that's"}, {"end": 43.72, "start": 
43.16, "text": "what"}, {"end": 44.12, "start": 43.72, "text": "drives"}, {"end": 44.16, "start": 44.12, "text": "me"}, {"end": 44.4, "start": 44.16, "text": "to"}, {"end": 44.44, "start": 44.4, "text": "move"}, {"end": 44.84, "start": 44.44, "text": "forward."}, {"end": 45.0, "start": 44.84, "text": "At"}, {"end": 45.32, "start": 45.0, "text": "other"}, {"end": 45.6, "start": 45.32, "text": "tech"}, {"end": 46.0, "start": 45.6, "text": "companies"}, {"end": 46.16, "start": 46.0, "text": "I've"}, {"end": 46.32, "start": 46.16, "text": "worked"}, {"end": 46.480000000000004, "start": 46.32, "text": "at,"}, {"end": 46.6, "start": 46.480000000000004, "text": "you"}, {"end": 46.6, "start": 46.6, "text": "can"}, {"end": 46.84, "start": 46.6, "text": "kind"}, {"end": 46.92, "start": 46.84, "text": "of"}, {"end": 47.08, "start": 46.92, "text": "feel"}, {"end": 47.32, "start": 47.08, "text": "like"}, {"end": 47.68, "start": 47.32, "text": "you're"}, {"end": 47.760000000000005, "start": 47.68, "text": "a"}, {"end": 47.879999999999995, "start": 47.760000000000005, "text": "very,"}, {"end": 48.28, "start": 47.879999999999995, "text": "very"}, {"end": 48.64, "start": 48.28, "text": "small"}, {"end": 49.0, "start": 48.64, "text": "part"}, {"end": 49.120000000000005, "start": 49.0, "text": "of"}, {"end": 49.32, "start": 49.120000000000005, "text": "a"}, {"end": 49.32, "start": 49.32, "text": "very,"}, {"end": 49.6, "start": 49.32, "text": "very"}, {"end": 49.879999999999995, "start": 49.6, "text": "large"}, {"end": 50.4, "start": 49.879999999999995, "text": "thing"}, {"end": 50.6, "start": 50.4, "text": "and"}, {"end": 50.6, "start": 50.6, "text": "that"}, {"end": 50.6, "start": 50.6, "text": "you're"}, {"end": 50.92, "start": 50.6, "text": "kind"}, {"end": 51.16, "start": 50.92, "text": "of"}, {"end": 51.6, "start": 51.16, "text": "expendable"}, {"end": 51.879999999999995, "start": 51.6, "text": "that"}, {"end": 52.36, "start": 51.879999999999995, "text": "way."}, {"end": 52.84, 
"start": 52.36, "text": "Whereas"}, {"end": 53.36, "start": 52.84, "text": "here,"}, {"end": 53.64, "start": 53.36, "text": "people"}, {"end": 54.120000000000005, "start": 53.64, "text": "are"}, {"end": 54.56, "start": 54.120000000000005, "text": "extremely"}, {"end": 55.32, "start": 54.56, "text": "invested"}, {"end": 55.36, "start": 55.32, "text": "in"}, {"end": 55.56, "start": 55.36, "text": "my"}, {"end": 56.239999999999995, "start": 55.56, "text": "success."}, {"end": 56.68, "start": 56.239999999999995, "text": "This"}, {"end": 56.8, "start": 56.68, "text": "is"}, {"end": 57.0, "start": 56.8, "text": "just"}, {"end": 57.28, "start": 57.0, "text": "the"}, {"end": 57.8, "start": 57.28, "text": "perfect"}, {"end": 58.2, "start": 57.8, "text": "place"}, {"end": 58.32, "start": 58.2, "text": "to"}, {"end": 58.6, "start": 58.32, "text": "be."}, {"end": 58.96, "start": 58.6, "text": "So"}, {"end": 59.239999999999995, "start": 58.96, "text": "yeah,"}, {"end": 59.32, "start": 59.239999999999995, "text": "I'm"}, {"end": 59.6, "start": 59.32, "text": "pretty"}, {"end": 59.96, "start": 59.6, "text": "happy"}], "text": " Skillset can be truly valued here and that's something really important to me. There's so many cool, exciting projects to work on and I know I'm going to be working with those people that are super knowledgeable and that's what drives me to move forward. At other tech companies I've worked at, you can kind of feel like you're a very, very small part of a very, very large thing and that you're kind of expendable that way. Whereas here, people are extremely invested in my success. This is just the perfect place to be. 
So yeah, I'm pretty happy"}, {"chunks": [{"end": 60.32, "start": 60.0, "text": "and"}, {"end": 60.6, "start": 60.32, "text": "make"}, {"end": 60.88, "start": 60.6, "text": "that"}, {"end": 61.16, "start": 60.88, "text": "decision."}, {"end": 62.16, "start": 61.16, "text": "Thank"}, {"end": 66.04, "start": 62.16, "text": "you"}, {"end": 66.48, "start": 66.04, "text": "all"}, {"end": 66.92, "start": 66.48, "text": "very"}, {"end": 67.24, "start": 66.92, "text": "much"}, {"end": 67.6, "start": 67.24, "text": "for"}, {"end": 67.6, "start": 67.6, "text": "the"}, {"end": 68.16, "start": 67.6, "text": "opportunity"}, {"end": 68.16, "start": 68.16, "text": "to"}, {"end": 68.16, "start": 68.16, "text": "talk"}, {"end": 68.16, "start": 68.16, "text": "a"}, {"end": 68.16, "start": 68.16, "text": "little"}, {"end": 68.16, "start": 68.16, "text": "bit"}, {"end": 68.28, "start": 68.16, "text": "about"}, {"end": 69.0, "start": 68.28, "text": "how"}, {"end": 69.52, "start": 69.0, "text": "I"}, {"end": 69.76, "start": 69.52, "text": "see"}, {"end": 70.03999999999999, "start": 69.76, "text": "data"}, {"end": 70.76, "start": 70.03999999999999, "text": "science"}, {"end": 71.76, "start": 70.76, "text": "developing."}, {"end": 71.92, "start": 71.76, "text": "My"}, {"end": 74.08, "start": 71.92, "text": "principal"}, {"end": 74.32, "start": 74.08, "text": "goal"}, {"end": 74.76, "start": 74.32, "text": "is"}, {"end": 74.76, "start": 74.76, "text": "to"}, {"end": 74.76, "start": 74.76, "text": "argue"}, {"end": 74.88, "start": 74.76, "text": "that"}, {"end": 75.28, "start": 74.88, "text": "while"}, {"end": 75.6, "start": 75.28, "text": "both"}, {"end": 75.88, "start": 75.6, "text": "machine"}, {"end": 76.32, "start": 75.88, "text": "learning"}, {"end": 76.6, "start": 76.32, "text": "and"}, {"end": 77.32, "start": 76.6, "text": "statistics"}, {"end": 77.6, "start": 77.32, "text": "are"}, {"end": 77.92, "start": 77.6, "text": "important"}, {"end": 78.68, "start": 77.92, "text": 
"contributors"}, {"end": 78.68, "start": 78.68, "text": "to"}, {"end": 78.8, "start": 78.68, "text": "the"}, {"end": 79.6, "start": 78.8, "text": "field,"}, {"end": 79.88, "start": 79.6, "text": "there"}, {"end": 79.92, "start": 79.88, "text": "are"}, {"end": 80.4, "start": 79.92, "text": "other"}, {"end": 80.96000000000001, "start": 80.4, "text": "approaches"}, {"end": 81.03999999999999, "start": 80.96000000000001, "text": "and"}, {"end": 81.48, "start": 81.03999999999999, "text": "other"}, {"end": 82.32, "start": 81.48, "text": "algorithms"}, {"end": 82.64, "start": 82.32, "text": "that"}, {"end": 82.84, "start": 82.64, "text": "also"}, {"end": 83.48, "start": 82.84, "text": "play"}, {"end": 83.68, "start": 83.48, "text": "a"}, {"end": 83.96000000000001, "start": 83.68, "text": "key"}, {"end": 84.64, "start": 83.96000000000001, "text": "role."}, {"end": 85.0, "start": 84.64, "text": "In"}, {"end": 85.6, "start": 85.0, "text": "the"}, {"end": 85.88, "start": 85.6, "text": "last"}, {"end": 86.44, "start": 85.88, "text": "part"}, {"end": 86.56, "start": 86.44, "text": "of"}, {"end": 86.56, "start": 86.56, "text": "the"}, {"end": 86.72, "start": 86.56, "text": "talk,"}, {"end": 87.03999999999999, "start": 86.72, "text": "I'll"}, {"end": 87.36, "start": 87.03999999999999, "text": "show"}, {"end": 87.44, "start": 87.36, "text": "you"}, {"end": 87.44, "start": 87.44, "text": "two"}, {"end": 87.48, "start": 87.44, "text": "examples"}, {"end": 87.52, "start": 87.48, "text": "of"}, {"end": 88.03999999999999, "start": 87.52, "text": "algorithms"}, {"end": 88.36, "start": 88.03999999999999, "text": "that"}, {"end": 88.68, "start": 88.36, "text": "are"}, {"end": 89.48, "start": 88.68, "text": "important"}, {"end": 89.96000000000001, "start": 89.48, "text": "for"}], "text": " and make that decision. Thank you all very much for the opportunity to talk a little bit about how I see data science developing. 
My principal goal is to argue that while both machine learning and statistics are important contributors to the field, there are other approaches and other algorithms that also play a key role. In the last part of the talk, I'll show you two examples of algorithms that are important for"}, {"chunks": [{"end": 90.32, "start": 90.0, "text": "dealing"}, {"end": 90.6, "start": 90.32, "text": "with"}, {"end": 91.16, "start": 90.6, "text": "large-scale"}, {"end": 91.52, "start": 91.16, "text": "data,"}, {"end": 92.0, "start": 91.52, "text": "but"}, {"end": 92.56, "start": 92.0, "text": "that"}, {"end": 93.32, "start": 92.56, "text": "are"}, {"end": 94.4, "start": 93.32, "text": "definitely"}, {"end": 94.68, "start": 94.4, "text": "not"}, {"end": 95.12, "start": 94.68, "text": "machine"}, {"end": 96.6, "start": 95.12, "text": "learning."}, {"end": 97.6, "start": 96.6, "text": "Before"}, {"end": 98.44, "start": 97.6, "text": "proceeding,"}, {"end": 99.32, "start": 98.44, "text": "I"}, {"end": 99.96000000000001, "start": 99.32, "text": "want"}, {"end": 100.03999999999999, "start": 99.96000000000001, "text": "to"}, {"end": 100.44, "start": 100.03999999999999, "text": "point"}, {"end": 100.64, "start": 100.44, "text": "you"}, {"end": 100.76, "start": 100.64, "text": "to"}, {"end": 101.0, "start": 100.76, "text": "two"}, {"end": 101.48, "start": 101.0, "text": "places"}, {"end": 101.76, "start": 101.48, "text": "where"}, {"end": 101.92, "start": 101.76, "text": "you"}, {"end": 101.96000000000001, "start": 101.92, "text": "can"}, {"end": 102.36, "start": 101.96000000000001, "text": "find"}, {"end": 103.03999999999999, "start": 102.36, "text": "material"}, {"end": 103.44, "start": 103.03999999999999, "text": "relevant"}, {"end": 103.84, "start": 103.44, "text": "to"}, {"end": 104.32, "start": 103.84, "text": "this"}, {"end": 104.64, "start": 104.32, "text": "talk."}, {"end": 104.92, "start": 104.64, "text": "Okay."}, {"end": 105.28, "start": 104.92, "text": "First"}, {"end": 
105.56, "start": 105.28, "text": "of"}, {"end": 105.84, "start": 105.56, "text": "all,"}, {"end": 106.56, "start": 105.84, "text": "several"}, {"end": 106.8, "start": 106.56, "text": "years"}, {"end": 107.03999999999999, "start": 106.8, "text": "ago,"}, {"end": 107.32, "start": 107.03999999999999, "text": "I"}, {"end": 107.56, "start": 107.32, "text": "was"}, {"end": 108.32, "start": 107.56, "text": "part"}, {"end": 108.56, "start": 108.32, "text": "of"}, {"end": 108.56, "start": 108.56, "text": "a"}, {"end": 108.6, "start": 108.56, "text": "study"}, {"end": 109.16, "start": 108.6, "text": "conducted"}, {"end": 109.56, "start": 109.16, "text": "by"}, {"end": 109.92, "start": 109.56, "text": "the"}, {"end": 110.44, "start": 109.92, "text": "U.S."}, {"end": 111.03999999999999, "start": 110.44, "text": "National"}, {"end": 111.72, "start": 111.03999999999999, "text": "Research"}, {"end": 112.16, "start": 111.72, "text": "Council"}, {"end": 112.24, "start": 112.16, "text": "on"}, {"end": 112.52, "start": 112.24, "text": "Data"}, {"end": 113.08, "start": 112.52, "text": "Science"}, {"end": 114.56, "start": 113.08, "text": "Education."}, {"end": 114.96000000000001, "start": 114.56, "text": "It"}, {"end": 115.36, "start": 114.96000000000001, "text": "was"}, {"end": 115.6, "start": 115.36, "text": "run"}, {"end": 115.92, "start": 115.6, "text": "by"}, {"end": 116.08, "start": 115.92, "text": "the"}, {"end": 116.52, "start": 116.08, "text": "Statistics"}, {"end": 116.96000000000001, "start": 116.52, "text": "Branch"}, {"end": 117.03999999999999, "start": 116.96000000000001, "text": "of"}, {"end": 117.2, "start": 117.03999999999999, "text": "the"}, {"end": 118.0, "start": 117.2, "text": "NRC,"}, {"end": 118.16, "start": 118.0, "text": "not"}, {"end": 118.68, "start": 118.16, "text": "the"}, {"end": 119.16, "start": 118.68, "text": "Computer"}, {"end": 119.64, "start": 119.16, "text": "Science"}, {"end": 119.96000000000001, "start": 119.64, "text": "Branch."}], "text": " 
dealing with large-scale data, but that are definitely not machine learning. Before proceeding, I want to point you to two places where you can find material relevant to this talk. Okay. First of all, several years ago, I was part of a study conducted by the U.S. National Research Council on Data Science Education. It was run by the Statistics Branch of the NRC, not the Computer Science Branch."}, {"chunks": [{"end": 120.56, "start": 120.0, "text": "But"}, {"end": 121.04, "start": 120.56, "text": "the"}, {"end": 121.56, "start": 121.04, "text": "membership"}, {"end": 121.84, "start": 121.56, "text": "was"}, {"end": 122.24, "start": 121.84, "text": "about"}, {"end": 122.64, "start": 122.24, "text": "half"}, {"end": 123.32, "start": 122.64, "text": "and"}, {"end": 123.68, "start": 123.32, "text": "half."}, {"end": 123.8, "start": 123.68, "text": "There"}, {"end": 123.88, "start": 123.8, "text": "are"}, {"end": 124.24, "start": 123.88, "text": "a"}, {"end": 124.92, "start": 124.24, "text": "lot"}, {"end": 125.44, "start": 124.92, "text": "of"}, {"end": 126.08, "start": 125.44, "text": "interesting"}, {"end": 126.48, "start": 126.08, "text": "ideas"}, {"end": 126.84, "start": 126.48, "text": "in"}, {"end": 127.16, "start": 126.84, "text": "this"}, {"end": 127.52, "start": 127.16, "text": "collection"}, {"end": 127.68, "start": 127.52, "text": "of"}, {"end": 128.32, "start": 127.68, "text": "documents,"}, {"end": 128.6, "start": 128.32, "text": "but"}, {"end": 128.84, "start": 128.6, "text": "one"}, {"end": 128.88, "start": 128.84, "text": "of"}, {"end": 129.44, "start": 128.88, "text": "the"}, {"end": 129.96, "start": 129.44, "text": "things"}, {"end": 129.96, "start": 129.96, "text": "I"}, {"end": 130.04, "start": 129.96, "text": "got"}, {"end": 130.56, "start": 130.04, "text": "was"}, {"end": 130.6, "start": 130.56, "text": "an"}, {"end": 131.04, "start": 130.6, "text": "impression"}, {"end": 131.24, "start": 131.04, "text": "of"}, {"end": 131.92, "start": 131.24, 
"text": "how"}, {"end": 132.12, "start": 131.92, "text": "two"}, {"end": 132.48, "start": 132.12, "text": "different"}, {"end": 133.2, "start": 132.48, "text": "cultures"}, {"end": 133.56, "start": 133.2, "text": "approach"}, {"end": 134.36, "start": 133.56, "text": "data"}, {"end": 135.08, "start": 134.36, "text": "science"}, {"end": 136.76, "start": 135.08, "text": "differently."}, {"end": 137.24, "start": 136.76, "text": "So"}, {"end": 137.72, "start": 137.24, "text": "last"}, {"end": 138.12, "start": 137.72, "text": "year"}, {"end": 138.36, "start": 138.12, "text": "I"}, {"end": 138.36, "start": 138.36, "text": "wrote"}, {"end": 138.36, "start": 138.36, "text": "a"}, {"end": 138.64, "start": 138.36, "text": "piece"}, {"end": 139.36, "start": 138.64, "text": "for"}, {"end": 139.88, "start": 139.36, "text": "the"}, {"end": 140.32, "start": 139.88, "text": "IEEE"}, {"end": 140.64, "start": 140.32, "text": "Data"}, {"end": 140.92000000000002, "start": 140.64, "text": "Engineering"}, {"end": 141.8, "start": 140.92000000000002, "text": "Bulletin"}, {"end": 142.32, "start": 141.8, "text": "that"}, {"end": 142.68, "start": 142.32, "text": "expressed"}, {"end": 142.72, "start": 142.68, "text": "my"}, {"end": 143.04, "start": 142.72, "text": "own"}, {"end": 143.52, "start": 143.04, "text": "views"}, {"end": 143.96, "start": 143.52, "text": "on"}, {"end": 144.28, "start": 143.96, "text": "the"}, {"end": 144.72, "start": 144.28, "text": "topic."}, {"end": 145.28, "start": 144.72, "text": "A"}, {"end": 145.88, "start": 145.28, "text": "lot"}, {"end": 146.04, "start": 145.88, "text": "of"}, {"end": 146.2, "start": 146.04, "text": "the"}, {"end": 146.32, "start": 146.2, "text": "content"}, {"end": 146.56, "start": 146.32, "text": "that"}, {"end": 146.8, "start": 146.56, "text": "paper"}, {"end": 147.44, "start": 146.8, "text": "forms"}, {"end": 147.76, "start": 147.44, "text": "the"}, {"end": 148.24, "start": 147.76, "text": "first"}, {"end": 148.4, "start": 148.24, "text": 
"half"}, {"end": 148.6, "start": 148.4, "text": "of"}, {"end": 149.0, "start": 148.6, "text": "this"}, {"end": 149.96, "start": 149.0, "text": "talk"}], "text": " But the membership was about half and half. There are a lot of interesting ideas in this collection of documents, but one of the things I got was an impression of how two different cultures approach data science differently. So last year I wrote a piece for the IEEE Data Engineering Bulletin that expressed my own views on the topic. A lot of the content that paper forms the first half of this talk"}, {"chunks": [{"end": 152.44, "start": 150.0, "text": "Okay."}, {"end": 152.6, "start": 152.44, "text": "Now,"}, {"end": 152.96, "start": 152.6, "text": "where"}, {"end": 154.12, "start": 152.96, "text": "does"}, {"end": 154.76, "start": 154.12, "text": "data"}, {"end": 155.52, "start": 154.76, "text": "science"}, {"end": 155.84, "start": 155.52, "text": "come"}, {"end": 157.2, "start": 155.84, "text": "from?"}, {"end": 157.92, "start": 157.2, "text": "Okay."}, {"end": 158.72, "start": 157.92, "text": "Well,"}, {"end": 159.04, "start": 158.72, "text": "around"}, {"end": 159.04, "start": 159.04, "text": "the"}, {"end": 159.28, "start": 159.04, "text": "turn"}, {"end": 159.64, "start": 159.28, "text": "of"}, {"end": 159.68, "start": 159.64, "text": "the"}, {"end": 159.92, "start": 159.68, "text": "millennium,"}, {"end": 159.96, "start": 159.92, "text": "people"}, {"end": 160.44, "start": 159.96, "text": "were"}, {"end": 160.96, "start": 160.44, "text": "talking"}, {"end": 161.28, "start": 160.96, "text": "about"}, {"end": 161.56, "start": 161.28, "text": "data"}, {"end": 162.24, "start": 161.56, "text": "mining"}, {"end": 162.4, "start": 162.24, "text": "or"}, {"end": 163.28, "start": 162.4, "text": "knowledge"}, {"end": 164.04, "start": 163.28, "text": "discovery"}, {"end": 164.2, "start": 164.04, "text": "from"}, {"end": 164.68, "start": 164.2, "text": "which"}, {"end": 165.56, "start": 164.68, "text": 
"SIG-KDD"}, {"end": 166.07999999999998, "start": 165.56, "text": "took"}, {"end": 166.32, "start": 166.07999999999998, "text": "its"}, {"end": 167.72, "start": 166.32, "text": "name."}, {"end": 168.64, "start": 167.72, "text": "Then"}, {"end": 169.36, "start": 168.64, "text": "around"}, {"end": 170.44, "start": 169.36, "text": "2010,"}, {"end": 170.56, "start": 170.44, "text": "you"}, {"end": 170.84, "start": 170.56, "text": "couldn't"}, {"end": 171.24, "start": 170.84, "text": "say"}, {"end": 171.32, "start": 171.24, "text": "you"}, {"end": 171.48, "start": 171.32, "text": "were"}, {"end": 171.64, "start": 171.48, "text": "doing"}, {"end": 171.84, "start": 171.64, "text": "any"}, {"end": 171.84, "start": 171.84, "text": "of"}, {"end": 172.04, "start": 171.84, "text": "that"}, {"end": 172.68, "start": 172.04, "text": "anymore."}, {"end": 172.8, "start": 172.68, "text": "You"}, {"end": 172.88, "start": 172.8, "text": "had"}, {"end": 173.44, "start": 172.88, "text": "to"}, {"end": 174.0, "start": 173.44, "text": "say"}, {"end": 174.16, "start": 174.0, "text": "you"}, {"end": 174.16, "start": 174.16, "text": "were"}, {"end": 174.4, "start": 174.16, "text": "doing"}, {"end": 174.64, "start": 174.4, "text": "big"}, {"end": 177.16, "start": 174.64, "text": "data."}, {"end": 177.84, "start": 177.16, "text": "And"}, {"end": 177.96, "start": 177.84, "text": "now"}, {"end": 177.96, "start": 177.96, "text": "you"}, {"end": 177.96, "start": 177.96, "text": "need"}, {"end": 179.04, "start": 177.96, "text": "to"}, {"end": 179.48, "start": 179.04, "text": "say"}, {"end": 179.48, "start": 179.48, "text": "you're"}, {"end": 179.8, "start": 179.48, "text": "doing"}, {"end": 179.96, "start": 179.8, "text": "data"}], "text": " Okay. Now, where does data science come from? Okay. Well, around the turn of the millennium, people were talking about data mining or knowledge discovery from which SIG-KDD took its name. Then around 2010, you couldn't say you were doing any of that anymore. 
You had to say you were doing big data. And now you need to say you're doing data"}, {"chunks": [{"end": 180.76, "start": 180.0, "text": "data"}, {"end": 182.84, "start": 180.76, "text": "science."}, {"end": 183.4, "start": 182.84, "text": "But"}, {"end": 183.56, "start": 183.4, "text": "the"}, {"end": 184.28, "start": 183.56, "text": "concept"}, {"end": 184.88, "start": 184.28, "text": "behind"}, {"end": 185.2, "start": 184.88, "text": "these"}, {"end": 185.44, "start": 185.2, "text": "changing"}, {"end": 186.0, "start": 185.44, "text": "words"}, {"end": 186.44, "start": 186.0, "text": "hasn't"}, {"end": 187.8, "start": 186.44, "text": "really"}, {"end": 188.52, "start": 187.8, "text": "changed."}, {"end": 189.04, "start": 188.52, "text": "There"}, {"end": 189.48, "start": 189.04, "text": "are"}, {"end": 190.2, "start": 189.48, "text": "many"}, {"end": 190.96, "start": 190.2, "text": "scientists"}, {"end": 191.32, "start": 190.96, "text": "and"}, {"end": 191.76, "start": 191.32, "text": "engineers"}, {"end": 191.92, "start": 191.76, "text": "who"}, {"end": 191.92, "start": 191.92, "text": "are"}, {"end": 192.48, "start": 191.92, "text": "interested"}, {"end": 192.56, "start": 192.48, "text": "in"}, {"end": 192.8, "start": 192.56, "text": "what"}, {"end": 193.12, "start": 192.8, "text": "can"}, {"end": 193.4, "start": 193.12, "text": "be"}, {"end": 193.52, "start": 193.4, "text": "done"}, {"end": 193.72, "start": 193.52, "text": "with"}, {"end": 193.88, "start": 193.72, "text": "the"}, {"end": 194.44, "start": 193.88, "text": "biggest"}, {"end": 194.72, "start": 194.44, "text": "hardware"}, {"end": 195.56, "start": 194.72, "text": "configurations"}, {"end": 196.28, "start": 195.56, "text": "available,"}, {"end": 196.64, "start": 196.28, "text": "using"}, {"end": 197.2, "start": 196.64, "text": "the"}, {"end": 197.64, "start": 197.2, "text": "best"}, {"end": 198.07999999999998, "start": 197.64, "text": "and"}, {"end": 198.6, "start": 198.07999999999998, "text": 
"most"}, {"end": 198.96, "start": 198.6, "text": "efficient"}, {"end": 199.48, "start": 198.96, "text": "programming"}, {"end": 199.88, "start": 199.48, "text": "tools"}, {"end": 200.0, "start": 199.88, "text": "and"}, {"end": 200.0, "start": 200.0, "text": "the"}, {"end": 200.48, "start": 200.0, "text": "best"}, {"end": 201.04, "start": 200.48, "text": "algorithms"}, {"end": 201.28, "start": 201.04, "text": "in"}, {"end": 201.28, "start": 201.28, "text": "order"}, {"end": 201.8, "start": 201.28, "text": "to"}, {"end": 202.24, "start": 201.8, "text": "solve"}, {"end": 203.16, "start": 202.24, "text": "problems"}, {"end": 203.32, "start": 203.16, "text": "in"}, {"end": 203.32, "start": 203.32, "text": "a"}, {"end": 203.92000000000002, "start": 203.32, "text": "variety"}, {"end": 204.4, "start": 203.92000000000002, "text": "of"}, {"end": 204.88, "start": 204.4, "text": "application"}, {"end": 205.44, "start": 204.88, "text": "areas,"}, {"end": 206.12, "start": 205.44, "text": "including"}, {"end": 206.76, "start": 206.12, "text": "essentially"}, {"end": 206.92000000000002, "start": 206.76, "text": "all"}, {"end": 207.44, "start": 206.92000000000002, "text": "branches"}, {"end": 207.44, "start": 207.44, "text": "of"}, {"end": 208.24, "start": 207.44, "text": "science,"}, {"end": 208.68, "start": 208.24, "text": "as"}, {"end": 209.07999999999998, "start": 208.68, "text": "well"}, {"end": 209.4, "start": 209.07999999999998, "text": "as"}, {"end": 209.96, "start": 209.4, "text": "industrial"}], "text": " data science. But the concept behind these changing words hasn't really changed. 
There are many scientists and engineers who are interested in what can be done with the biggest hardware configurations available, using the best and most efficient programming tools and the best algorithms in order to solve problems in a variety of application areas, including essentially all branches of science, as well as industrial"}, {"chunks": [{"end": 210.64, "start": 210.0, "text": "applications."}, {"end": 212.36, "start": 210.64, "text": "Now,"}, {"end": 213.56, "start": 212.36, "text": "because"}, {"end": 214.48, "start": 213.56, "text": "data"}, {"end": 215.2, "start": 214.48, "text": "science"}, {"end": 215.28, "start": 215.2, "text": "is"}, {"end": 215.68, "start": 215.28, "text": "so"}, {"end": 216.44, "start": 215.68, "text": "important,"}, {"end": 216.8, "start": 216.44, "text": "it"}, {"end": 217.12, "start": 216.8, "text": "is"}, {"end": 217.12, "start": 217.12, "text": "no"}, {"end": 217.72, "start": 217.12, "text": "surprise"}, {"end": 217.8, "start": 217.72, "text": "that"}, {"end": 218.12, "start": 217.8, "text": "different"}, {"end": 218.8, "start": 218.12, "text": "communities"}, {"end": 219.0, "start": 218.8, "text": "want"}, {"end": 219.2, "start": 219.0, "text": "to"}, {"end": 219.56, "start": 219.2, "text": "claim"}, {"end": 219.92, "start": 219.56, "text": "it"}, {"end": 220.96, "start": 219.92, "text": "as"}, {"end": 221.32, "start": 220.96, "text": "their"}, {"end": 221.36, "start": 221.32, "text": "own."}, {"end": 221.6, "start": 221.36, "text": "So,"}, {"end": 221.68, "start": 221.6, "text": "for"}, {"end": 222.44, "start": 221.68, "text": "example,"}, {"end": 222.56, "start": 222.44, "text": "there's"}, {"end": 222.6, "start": 222.56, "text": "a"}, {"end": 223.12, "start": 222.6, "text": "tendency"}, {"end": 223.4, "start": 223.12, "text": "to"}, {"end": 223.88, "start": 223.4, "text": "equate"}, {"end": 224.44, "start": 223.88, "text": "data"}, {"end": 225.16, "start": 224.44, "text": "science"}, {"end": 225.48, "start": 225.16, 
"text": "with"}, {"end": 225.84, "start": 225.48, "text": "machine"}, {"end": 225.96, "start": 225.84, "text": "learning."}, {"end": 227.0, "start": 225.96, "text": "And"}, {"end": 227.92000000000002, "start": 227.0, "text": "as"}, {"end": 228.44, "start": 227.92000000000002, "text": "I"}, {"end": 229.04, "start": 228.44, "text": "saw"}, {"end": 229.6, "start": 229.04, "text": "from"}, {"end": 229.76, "start": 229.6, "text": "the"}, {"end": 230.28, "start": 229.76, "text": "NRC"}, {"end": 230.96, "start": 230.28, "text": "panel"}, {"end": 231.44, "start": 230.96, "text": "I"}, {"end": 231.88, "start": 231.44, "text": "was"}, {"end": 231.92000000000002, "start": 231.88, "text": "on,"}, {"end": 232.96, "start": 231.92000000000002, "text": "statisticians"}, {"end": 233.32, "start": 232.96, "text": "are"}, {"end": 233.32, "start": 233.32, "text": "not"}, {"end": 233.36, "start": 233.32, "text": "above"}, {"end": 233.8, "start": 233.36, "text": "claiming"}, {"end": 233.84, "start": 233.8, "text": "data"}, {"end": 234.48, "start": 233.84, "text": "science"}, {"end": 234.96, "start": 234.48, "text": "as"}, {"end": 235.32, "start": 234.96, "text": "a"}, {"end": 235.44, "start": 235.32, "text": "branch"}, {"end": 235.48, "start": 235.44, "text": "of"}, {"end": 236.76, "start": 235.48, "text": "statistics."}, {"end": 237.0, "start": 236.76, "text": "But"}, {"end": 237.44, "start": 237.0, "text": "in"}, {"end": 237.68, "start": 237.44, "text": "my"}, {"end": 238.32, "start": 237.68, "text": "own"}, {"end": 238.8, "start": 238.32, "text": "experience,"}, {"end": 239.2, "start": 238.8, "text": "I"}, {"end": 239.56, "start": 239.2, "text": "have"}, {"end": 239.96, "start": 239.56, "text": "always"}], "text": " applications. Now, because data science is so important, it is no surprise that different communities want to claim it as their own. So, for example, there's a tendency to equate data science with machine learning. 
And as I saw from the NRC panel I was on, statisticians are not above claiming data science as a branch of statistics. But in my own experience, I have always"}, {"chunks": [{"end": 240.64, "start": 240.0, "text": "viewed"}, {"end": 241.52, "start": 240.64, "text": "database"}, {"end": 242.08, "start": 241.52, "text": "systems"}, {"end": 242.68, "start": 242.08, "text": "research"}, {"end": 242.96, "start": 242.68, "text": "as"}, {"end": 243.08, "start": 242.96, "text": "a"}, {"end": 243.36, "start": 243.08, "text": "study"}, {"end": 243.64, "start": 243.36, "text": "of"}, {"end": 244.48, "start": 243.64, "text": "what"}, {"end": 244.84, "start": 244.48, "text": "can"}, {"end": 244.96, "start": 244.84, "text": "be"}, {"end": 245.2, "start": 244.96, "text": "done"}, {"end": 245.36, "start": 245.2, "text": "to"}, {"end": 245.56, "start": 245.36, "text": "exploit"}, {"end": 245.96, "start": 245.56, "text": "the"}, {"end": 246.44, "start": 245.96, "text": "largest"}, {"end": 246.72, "start": 246.44, "text": "possible"}, {"end": 246.88, "start": 246.72, "text": "data"}, {"end": 247.44, "start": 246.88, "text": "sets"}, {"end": 247.68, "start": 247.44, "text": "that"}, {"end": 247.88, "start": 247.68, "text": "can"}, {"end": 248.08, "start": 247.88, "text": "be"}, {"end": 248.64, "start": 248.08, "text": "handled"}, {"end": 248.92, "start": 248.64, "text": "at"}, {"end": 249.28, "start": 248.92, "text": "the"}, {"end": 250.4, "start": 249.28, "text": "time."}, {"end": 251.0, "start": 250.4, "text": "Obviously,"}, {"end": 251.08, "start": 251.0, "text": "the"}, {"end": 251.76, "start": 251.08, "text": "maximum"}, {"end": 252.28, "start": 251.76, "text": "amount"}, {"end": 252.6, "start": 252.28, "text": "of"}, {"end": 253.0, "start": 252.6, "text": "data"}, {"end": 253.24, "start": 253.0, "text": "we"}, {"end": 253.24, "start": 253.24, "text": "can"}, {"end": 253.56, "start": 253.24, "text": "handle"}, {"end": 254.0, "start": 253.56, "text": "has"}, {"end": 254.24, 
"start": 254.0, "text": "grown"}, {"end": 255.0, "start": 254.24, "text": "exponentially,"}, {"end": 255.24, "start": 255.0, "text": "leading"}, {"end": 255.36, "start": 255.24, "text": "to"}, {"end": 255.72, "start": 255.36, "text": "all"}, {"end": 255.76, "start": 255.72, "text": "the"}, {"end": 256.2, "start": 255.76, "text": "amazing"}, {"end": 257.0, "start": 256.2, "text": "opportunities"}, {"end": 257.08, "start": 257.0, "text": "we"}, {"end": 257.28, "start": 257.08, "text": "now"}, {"end": 257.56, "start": 257.28, "text": "have"}, {"end": 257.84, "start": 257.56, "text": "to"}, {"end": 258.4, "start": 257.84, "text": "influence"}, {"end": 259.96, "start": 258.4, "text": "science"}, {"end": 260.16, "start": 259.96, "text": "and"}, {"end": 261.44, "start": 260.16, "text": "commerce."}, {"end": 262.64, "start": 261.44, "text": "And"}, {"end": 263.24, "start": 262.64, "text": "especially"}, {"end": 263.32, "start": 263.24, "text": "if"}, {"end": 263.44, "start": 263.32, "text": "you"}, {"end": 263.84, "start": 263.44, "text": "want"}, {"end": 263.84, "start": 263.84, "text": "to"}, {"end": 264.16, "start": 263.84, "text": "become"}, {"end": 264.16, "start": 264.16, "text": "a"}, {"end": 264.16, "start": 264.16, "text": "data"}, {"end": 264.6, "start": 264.16, "text": "scientist,"}, {"end": 264.8, "start": 264.6, "text": "I"}, {"end": 264.8, "start": 264.8, "text": "believe"}, {"end": 265.52, "start": 264.8, "text": "the"}, {"end": 265.88, "start": 265.52, "text": "best"}, {"end": 266.2, "start": 265.88, "text": "road"}, {"end": 266.28, "start": 266.2, "text": "is"}, {"end": 266.64, "start": 266.28, "text": "still"}, {"end": 266.8, "start": 266.64, "text": "to"}, {"end": 267.32, "start": 266.8, "text": "major"}, {"end": 267.4, "start": 267.32, "text": "in"}, {"end": 268.16, "start": 267.4, "text": "computer"}, {"end": 268.76, "start": 268.16, "text": "science"}, {"end": 269.0, "start": 268.76, "text": "and"}, {"end": 269.96, "start": 269.0, "text": 
"specialize"}], "text": " viewed database systems research as a study of what can be done to exploit the largest possible data sets that can be handled at the time. Obviously, the maximum amount of data we can handle has grown exponentially, leading to all the amazing opportunities we now have to influence science and commerce. And especially if you want to become a data scientist, I believe the best road is still to major in computer science and specialize"}, {"chunks": [{"end": 270.2, "start": 270.0, "text": "in"}, {"end": 270.68, "start": 270.2, "text": "handling"}, {"end": 271.4, "start": 270.68, "text": "large-scale"}, {"end": 272.64, "start": 271.4, "text": "data."}, {"end": 273.2, "start": 272.64, "text": "Like"}, {"end": 273.76, "start": 273.2, "text": "any"}, {"end": 274.12, "start": 273.76, "text": "CS"}, {"end": 274.72, "start": 274.12, "text": "education,"}, {"end": 275.0, "start": 274.72, "text": "that"}, {"end": 275.24, "start": 275.0, "text": "will"}, {"end": 275.68, "start": 275.24, "text": "include"}, {"end": 275.76, "start": 275.68, "text": "a"}, {"end": 276.12, "start": 275.76, "text": "grounding"}, {"end": 276.24, "start": 276.12, "text": "in"}, {"end": 277.16, "start": 276.24, "text": "statistics,"}, {"end": 277.2, "start": 277.16, "text": "and"}, {"end": 277.6, "start": 277.2, "text": "you"}, {"end": 278.08, "start": 277.6, "text": "will"}, {"end": 278.16, "start": 278.08, "text": "take"}, {"end": 278.56, "start": 278.16, "text": "a"}, {"end": 278.96, "start": 278.56, "text": "number"}, {"end": 279.0, "start": 278.96, "text": "of"}, {"end": 279.4, "start": 279.0, "text": "courses"}, {"end": 279.48, "start": 279.4, "text": "in"}, {"end": 279.8, "start": 279.48, "text": "machine"}, {"end": 280.28, "start": 279.8, "text": "learning,"}, {"end": 280.76, "start": 280.28, "text": "but"}, {"end": 280.84, "start": 280.76, "text": "you"}, {"end": 281.2, "start": 280.84, "text": "will"}, {"end": 281.88, "start": 281.2, "text": "also"}, {"end": 281.92, 
"start": 281.88, "text": "learn"}, {"end": 282.24, "start": 281.92, "text": "about"}, {"end": 282.88, "start": 282.24, "text": "non-machine"}, {"end": 283.4, "start": 282.88, "text": "learning"}, {"end": 284.16, "start": 283.4, "text": "approaches"}, {"end": 284.16, "start": 284.16, "text": "to"}, {"end": 284.44, "start": 284.16, "text": "handling"}, {"end": 285.16, "start": 284.44, "text": "large-scale"}, {"end": 285.16, "start": 285.16, "text": "data."}, {"end": 285.16, "start": 285.16, "text": "Now,"}, {"end": 285.24, "start": 285.16, "text": "I"}, {"end": 285.72, "start": 285.24, "text": "don't"}, {"end": 285.88, "start": 285.72, "text": "want"}, {"end": 285.96, "start": 285.88, "text": "to"}, {"end": 287.0, "start": 285.96, "text": "beat"}, {"end": 287.92, "start": 287.0, "text": "the"}, {"end": 288.08, "start": 287.92, "text": "drum"}, {"end": 289.64, "start": 288.08, "text": "too"}, {"end": 290.48, "start": 289.64, "text": "loudly"}, {"end": 291.24, "start": 290.48, "text": "for"}, {"end": 291.76, "start": 291.24, "text": "database"}, {"end": 292.32, "start": 291.76, "text": "systems"}, {"end": 292.32, "start": 292.32, "text": "or"}, {"end": 292.92, "start": 292.32, "text": "computer"}, {"end": 293.4, "start": 292.92, "text": "science"}, {"end": 293.52, "start": 293.4, "text": "in"}, {"end": 293.92, "start": 293.52, "text": "general,"}, {"end": 293.92, "start": 293.92, "text": "so"}, {"end": 294.16, "start": 293.92, "text": "let"}, {"end": 294.16, "start": 294.16, "text": "me"}, {"end": 294.8, "start": 294.16, "text": "start"}, {"end": 295.08, "start": 294.8, "text": "by"}, {"end": 295.88, "start": 295.08, "text": "recognizing"}, {"end": 296.04, "start": 295.88, "text": "that"}, {"end": 296.2, "start": 296.04, "text": "the"}, {"end": 296.56, "start": 296.2, "text": "machine"}, {"end": 296.84, "start": 296.56, "text": "learning"}, {"end": 297.44, "start": 296.84, "text": "community"}, {"end": 298.4, "start": 297.44, "text": "has"}, {"end": 298.76, "start": 
298.4, "text": "had"}, {"end": 299.16, "start": 298.76, "text": "many"}, {"end": 299.72, "start": 299.16, "text": "remarkable"}, {"end": 299.96, "start": 299.72, "text": "achievements"}], "text": " in handling large-scale data. Like any CS education, that will include a grounding in statistics, and you will take a number of courses in machine learning, but you will also learn about non-machine learning approaches to handling large-scale data. Now, I don't want to beat the drum too loudly for database systems or computer science in general, so let me start by recognizing that the machine learning community has had many remarkable achievements"}, {"chunks": [{"end": 300.16, "start": 300.0, "text": "in"}, {"end": 300.48, "start": 300.16, "text": "recent"}, {"end": 300.92, "start": 300.48, "text": "years."}, {"end": 301.12, "start": 300.92, "text": "There"}, {"end": 302.64, "start": 301.12, "text": "was"}, {"end": 302.88, "start": 302.64, "text": "a"}, {"end": 303.12, "start": 302.88, "text": "time"}, {"end": 303.76, "start": 303.12, "text": "when"}, {"end": 304.28, "start": 303.76, "text": "the"}, {"end": 304.68, "start": 304.28, "text": "field"}, {"end": 305.16, "start": 304.68, "text": "of"}, {"end": 305.56, "start": 305.16, "text": "artificial"}, {"end": 306.08, "start": 305.56, "text": "intelligence"}, {"end": 306.12, "start": 306.08, "text": "was"}, {"end": 306.4, "start": 306.12, "text": "focused"}, {"end": 306.96, "start": 306.4, "text": "more"}, {"end": 307.28, "start": 306.96, "text": "on"}, {"end": 308.4, "start": 307.28, "text": "simulating"}, {"end": 308.56, "start": 308.4, "text": "or"}, {"end": 309.08, "start": 308.56, "text": "representing"}, {"end": 309.92, "start": 309.08, "text": "thought"}, {"end": 310.04, "start": 309.92, "text": "and"}, {"end": 310.64, "start": 310.04, "text": "reasoning,"}, {"end": 311.04, "start": 310.64, "text": "and"}, {"end": 311.48, "start": 311.04, "text": "I"}, {"end": 312.2, "start": 311.48, "text": "frankly"}, {"end": 
312.48, "start": 312.2, "text": "often"}, {"end": 313.12, "start": 312.48, "text": "criticize"}, {"end": 313.4, "start": 313.12, "text": "their"}, {"end": 314.04, "start": 313.4, "text": "approach."}, {"end": 314.36, "start": 314.04, "text": "But"}, {"end": 314.52, "start": 314.36, "text": "no"}, {"end": 315.04, "start": 314.52, "text": "more."}, {"end": 315.24, "start": 315.04, "text": "I"}, {"end": 315.88, "start": 315.24, "text": "have"}, {"end": 316.28, "start": 315.88, "text": "nothing"}, {"end": 316.68, "start": 316.28, "text": "but"}, {"end": 317.36, "start": 316.68, "text": "admiration"}, {"end": 317.6, "start": 317.36, "text": "for"}, {"end": 317.64, "start": 317.6, "text": "the"}, {"end": 318.24, "start": 317.64, "text": "successes"}, {"end": 318.36, "start": 318.24, "text": "that"}, {"end": 318.72, "start": 318.36, "text": "machine"}, {"end": 319.0, "start": 318.72, "text": "learning"}, {"end": 319.28, "start": 319.0, "text": "has"}, {"end": 319.72, "start": 319.28, "text": "had"}, {"end": 320.16, "start": 319.72, "text": "in"}, {"end": 320.56, "start": 320.16, "text": "building"}, {"end": 321.32, "start": 320.56, "text": "superior"}, {"end": 321.8, "start": 321.32, "text": "algorithms."}, {"end": 322.16, "start": 321.8, "text": "Likewise,"}, {"end": 322.72, "start": 322.16, "text": "I"}, {"end": 324.96, "start": 322.72, "text": "do"}, {"end": 325.88, "start": 324.96, "text": "not"}, {"end": 326.6, "start": 325.88, "text": "want"}, {"end": 326.96, "start": 326.6, "text": "to"}, {"end": 327.2, "start": 326.96, "text": "dismiss"}, {"end": 327.36, "start": 327.2, "text": "the"}, {"end": 328.32, "start": 327.36, "text": "statistics"}, {"end": 329.96, "start": 328.32, "text": "community."}], "text": " in recent years. There was a time when the field of artificial intelligence was focused more on simulating or representing thought and reasoning, and I frankly often criticize their approach. But no more. 
I have nothing but admiration for the successes that machine learning has had in building superior algorithms. Likewise, I do not want to dismiss the statistics community."}, {"chunks": [{"end": 330.2, "start": 330.0, "text": "Their"}, {"end": 330.8, "start": 330.2, "text": "achievements"}, {"end": 331.6, "start": 330.8, "text": "have"}, {"end": 331.6, "start": 331.6, "text": "been"}, {"end": 331.6, "start": 331.6, "text": "many"}, {"end": 331.6, "start": 331.6, "text": "and"}, {"end": 331.6, "start": 331.6, "text": "the"}, {"end": 332.08, "start": 331.6, "text": "tools"}, {"end": 332.28, "start": 332.08, "text": "they"}, {"end": 332.8, "start": 332.28, "text": "created"}, {"end": 332.96, "start": 332.8, "text": "have"}, {"end": 333.52, "start": 332.96, "text": "important"}, {"end": 333.96, "start": 333.52, "text": "uses"}, {"end": 334.08, "start": 333.96, "text": "in"}, {"end": 334.76, "start": 334.08, "text": "data"}, {"end": 335.0, "start": 334.76, "text": "science"}, {"end": 335.16, "start": 335.0, "text": "and"}, {"end": 335.36, "start": 335.16, "text": "in"}, {"end": 336.04, "start": 335.36, "text": "CS"}, {"end": 337.56, "start": 336.04, "text": "in"}, {"end": 338.0, "start": 337.56, "text": "general."}, {"end": 338.36, "start": 338.0, "text": "Many"}, {"end": 339.08, "start": 338.36, "text": "statisticians"}, {"end": 339.2, "start": 339.08, "text": "are"}, {"end": 339.4, "start": 339.2, "text": "beginning"}, {"end": 339.64, "start": 339.4, "text": "to"}, {"end": 339.8, "start": 339.64, "text": "get"}, {"end": 340.48, "start": 339.8, "text": "interested"}, {"end": 340.6, "start": 340.48, "text": "in"}, {"end": 340.84, "start": 340.6, "text": "computer"}, {"end": 341.32, "start": 340.84, "text": "science"}, {"end": 342.48, "start": 341.32, "text": "problems"}, {"end": 342.96, "start": 342.48, "text": "and"}, {"end": 343.04, "start": 342.96, "text": "are"}, {"end": 343.36, "start": 343.04, "text": "able"}, {"end": 343.48, "start": 343.36, "text": "to"}, 
{"end": 343.48, "start": 343.48, "text": "make"}, {"end": 343.84, "start": 343.48, "text": "important"}, {"end": 344.72, "start": 343.84, "text": "contributions."}, {"end": 345.12, "start": 344.72, "text": "For"}, {"end": 345.6, "start": 345.12, "text": "just"}, {"end": 346.72, "start": 345.6, "text": "one"}, {"end": 347.04, "start": 346.72, "text": "little"}, {"end": 347.52, "start": 347.04, "text": "personal"}, {"end": 348.32, "start": 347.52, "text": "example,"}, {"end": 348.4, "start": 348.32, "text": "a"}, {"end": 348.6, "start": 348.4, "text": "few"}, {"end": 349.04, "start": 348.6, "text": "years"}, {"end": 349.4, "start": 349.04, "text": "ago,"}, {"end": 349.44, "start": 349.4, "text": "I"}, {"end": 349.76, "start": 349.44, "text": "introduced"}, {"end": 349.8, "start": 349.76, "text": "one"}, {"end": 350.28, "start": 349.8, "text": "of"}, {"end": 350.8, "start": 350.28, "text": "my"}, {"end": 351.08, "start": 350.8, "text": "colleagues"}, {"end": 351.12, "start": 351.08, "text": "in"}, {"end": 351.4, "start": 351.12, "text": "the"}, {"end": 352.12, "start": 351.4, "text": "statistics"}, {"end": 352.44, "start": 352.12, "text": "department"}, {"end": 352.68, "start": 352.44, "text": "at"}, {"end": 353.12, "start": 352.68, "text": "Stanford"}, {"end": 353.24, "start": 353.12, "text": "to"}, {"end": 353.44, "start": 353.24, "text": "the"}, {"end": 353.92, "start": 353.44, "text": "idea"}, {"end": 354.0, "start": 353.92, "text": "of"}, {"end": 354.04, "start": 354.0, "text": "locality"}, {"end": 354.96, "start": 354.04, "text": "sensitive"}, {"end": 355.32, "start": 354.96, "text": "hashing."}, {"end": 355.52, "start": 355.32, "text": "One"}, {"end": 355.84, "start": 355.52, "text": "of"}, {"end": 356.28, "start": 355.84, "text": "the"}, {"end": 357.44, "start": 356.28, "text": "ideas"}, {"end": 357.64, "start": 357.44, "text": "I'm"}, {"end": 357.84, "start": 357.64, "text": "going"}, {"end": 358.24, "start": 357.84, "text": "to"}, {"end": 358.52, "start": 
358.24, "text": "cover"}, {"end": 358.52, "start": 358.52, "text": "at"}, {"end": 359.0, "start": 358.52, "text": "the"}, {"end": 359.2, "start": 359.0, "text": "end"}, {"end": 359.2, "start": 359.2, "text": "of"}, {"end": 359.52, "start": 359.2, "text": "this"}, {"end": 359.96, "start": 359.52, "text": "talk"}], "text": " Their achievements have been many and the tools they created have important uses in data science and in CS in general. Many statisticians are beginning to get interested in computer science problems and are able to make important contributions. For just one little personal example, a few years ago, I introduced one of my colleagues in the statistics department at Stanford to the idea of locality sensitive hashing. One of the ideas I'm going to cover at the end of this talk"}, {"chunks": [{"end": 360.4, "start": 360.0, "text": "He"}, {"end": 360.56, "start": 360.4, "text": "was"}, {"end": 360.56, "start": 360.56, "text": "able"}, {"end": 360.8, "start": 360.56, "text": "to"}, {"end": 361.28, "start": 360.8, "text": "show"}, {"end": 361.72, "start": 361.28, "text": "me"}, {"end": 362.16, "start": 361.72, "text": "something"}, {"end": 362.6, "start": 362.16, "text": "that"}, {"end": 362.92, "start": 362.6, "text": "speeds"}, {"end": 363.0, "start": 362.92, "text": "up"}, {"end": 363.48, "start": 363.0, "text": "one"}, {"end": 363.48, "start": 363.48, "text": "of"}, {"end": 363.48, "start": 363.48, "text": "the"}, {"end": 363.88, "start": 363.48, "text": "important"}, {"end": 364.48, "start": 363.88, "text": "algorithms"}, {"end": 364.56, "start": 364.48, "text": "in"}, {"end": 364.72, "start": 364.56, "text": "that"}, {"end": 365.0, "start": 364.72, "text": "field"}, {"end": 365.56, "start": 365.0, "text": "called"}, {"end": 365.84, "start": 365.56, "text": "min"}, {"end": 366.48, "start": 365.84, "text": "hashing"}, {"end": 366.96, "start": 366.48, "text": "by"}, {"end": 367.4, "start": 366.96, "text": "a"}, {"end": 367.96, "start": 367.4, "text": 
"lot."}, {"end": 368.52, "start": 367.96, "text": "I"}, {"end": 368.76, "start": 368.52, "text": "should"}, {"end": 369.0, "start": 368.76, "text": "have"}, {"end": 369.44, "start": 369.0, "text": "been"}, {"end": 369.96, "start": 369.44, "text": "able"}, {"end": 370.36, "start": 369.96, "text": "to"}, {"end": 370.36, "start": 370.36, "text": "see"}, {"end": 370.4, "start": 370.36, "text": "it"}, {"end": 370.52, "start": 370.4, "text": "for"}, {"end": 371.0, "start": 370.52, "text": "myself,"}, {"end": 371.48, "start": 371.0, "text": "but"}, {"end": 371.88, "start": 371.48, "text": "I"}, {"end": 372.12, "start": 371.88, "text": "didn't."}, {"end": 372.28, "start": 372.12, "text": "He"}, {"end": 376.24, "start": 372.28, "text": "did."}, {"end": 376.56, "start": 376.24, "text": "But"}, {"end": 376.56, "start": 376.56, "text": "now"}, {"end": 376.56, "start": 376.56, "text": "I'd"}, {"end": 376.56, "start": 376.56, "text": "like"}, {"end": 377.12, "start": 376.56, "text": "to"}, {"end": 377.96, "start": 377.12, "text": "start"}, {"end": 378.56, "start": 377.96, "text": "talking"}, {"end": 378.96, "start": 378.56, "text": "about"}, {"end": 378.96, "start": 378.96, "text": "what"}, {"end": 378.96, "start": 378.96, "text": "I"}, {"end": 379.6, "start": 378.96, "text": "feel"}, {"end": 379.76, "start": 379.6, "text": "is"}, {"end": 380.0, "start": 379.76, "text": "wrong"}, {"end": 380.4, "start": 380.0, "text": "with"}, {"end": 380.72, "start": 380.4, "text": "the"}, {"end": 381.6, "start": 380.72, "text": "statistics"}, {"end": 381.92, "start": 381.6, "text": "view"}, {"end": 381.96, "start": 381.92, "text": "of"}, {"end": 382.76, "start": 381.96, "text": "data"}, {"end": 383.56, "start": 382.76, "text": "science."}, {"end": 384.08, "start": 383.56, "text": "What"}, {"end": 384.08, "start": 384.08, "text": "we"}, {"end": 384.64, "start": 384.08, "text": "have"}, {"end": 385.28, "start": 384.64, "text": "here"}, {"end": 385.72, "start": 385.28, "text": "is"}, {"end": 
385.92, "start": 385.72, "text": "a"}, {"end": 386.28, "start": 385.92, "text": "Venn"}, {"end": 386.64, "start": 386.28, "text": "diagram"}, {"end": 386.88, "start": 386.64, "text": "due"}, {"end": 387.16, "start": 386.88, "text": "to"}, {"end": 387.48, "start": 387.16, "text": "Drew"}, {"end": 388.88, "start": 387.48, "text": "Conway."}, {"end": 389.32, "start": 388.88, "text": "Who"}, {"end": 389.88, "start": 389.32, "text": "is"}, {"end": 389.96, "start": 389.88, "text": "he?"}], "text": " He was able to show me something that speeds up one of the important algorithms in that field called min hashing by a lot. I should have been able to see it for myself, but I didn't. He did. But now I'd like to start talking about what I feel is wrong with the statistics view of data science. What we have here is a Venn diagram due to Drew Conway. Who is he?"}, {"chunks": [{"end": 390.88, "start": 390.0, "text": "Well,"}, {"end": 391.52, "start": 390.88, "text": "when"}, {"end": 391.64, "start": 391.52, "text": "he"}, {"end": 392.2, "start": 391.64, "text": "drew"}, {"end": 392.28, "start": 392.2, "text": "this"}, {"end": 392.56, "start": 392.28, "text": "many"}, {"end": 392.92, "start": 392.56, "text": "years"}, {"end": 393.44, "start": 392.92, "text": "ago,"}, {"end": 393.52, "start": 393.44, "text": "he"}, {"end": 393.52, "start": 393.52, "text": "was"}, {"end": 393.6, "start": 393.52, "text": "a"}, {"end": 394.08, "start": 393.6, "text": "graduate"}, {"end": 394.48, "start": 394.08, "text": "student"}, {"end": 394.6, "start": 394.48, "text": "in"}, {"end": 394.96, "start": 394.6, "text": "political"}, {"end": 396.28, "start": 394.96, "text": "science."}, {"end": 396.76, "start": 396.28, "text": "And"}, {"end": 397.12, "start": 396.76, "text": "I"}, {"end": 397.36, "start": 397.12, "text": "actually"}, {"end": 397.76, "start": 397.36, "text": "got"}, {"end": 397.76, "start": 397.76, "text": "this"}, {"end": 398.4, "start": 397.76, "text": "diagram"}, {"end": 399.04, 
"start": 398.4, "text": "from"}, {"end": 399.24, "start": 399.04, "text": "a"}, {"end": 400.0, "start": 399.24, "text": "Wikipedia"}, {"end": 400.48, "start": 400.0, "text": "page"}, {"end": 401.4, "start": 400.48, "text": "called"}, {"end": 401.92, "start": 401.4, "text": "Data"}, {"end": 402.76, "start": 401.92, "text": "Science"}, {"end": 403.16, "start": 402.76, "text": "Venn"}, {"end": 403.88, "start": 403.16, "text": "Diagrams."}, {"end": 403.96, "start": 403.88, "text": "It"}, {"end": 404.2, "start": 403.96, "text": "turns"}, {"end": 404.44, "start": 404.2, "text": "out"}, {"end": 405.0, "start": 404.44, "text": "that"}, {"end": 405.36, "start": 405.0, "text": "every"}, {"end": 405.68, "start": 405.36, "text": "field"}, {"end": 406.08, "start": 405.68, "text": "has"}, {"end": 406.4, "start": 406.08, "text": "its"}, {"end": 406.56, "start": 406.4, "text": "own"}, {"end": 407.0, "start": 406.56, "text": "definition"}, {"end": 407.04, "start": 407.0, "text": "of"}, {"end": 407.72, "start": 407.04, "text": "data"}, {"end": 408.2, "start": 407.72, "text": "science,"}, {"end": 408.44, "start": 408.2, "text": "and"}, {"end": 408.84, "start": 408.44, "text": "it's"}, {"end": 408.84, "start": 408.84, "text": "one"}, {"end": 409.32, "start": 408.84, "text": "that"}, {"end": 410.52, "start": 409.32, "text": "magnifies"}, {"end": 410.56, "start": 410.52, "text": "the"}, {"end": 411.24, "start": 410.56, "text": "importance"}, {"end": 412.16, "start": 411.24, "text": "of"}, {"end": 412.52, "start": 412.16, "text": "their"}, {"end": 412.52, "start": 412.52, "text": "own"}, {"end": 412.56, "start": 412.52, "text": "field"}, {"end": 412.56, "start": 412.56, "text": "that"}, {"end": 412.72, "start": 412.56, "text": "can"}, {"end": 412.92, "start": 412.72, "text": "be"}, {"end": 413.6, "start": 412.92, "text": "represented"}, {"end": 414.12, "start": 413.6, "text": "by"}, {"end": 414.28, "start": 414.12, "text": "a"}, {"end": 415.04, "start": 414.28, "text": "Venn"}, {"end": 
415.68, "start": 415.04, "text": "diagram."}, {"end": 415.76, "start": 415.68, "text": "I'm"}, {"end": 416.16, "start": 415.76, "text": "no"}, {"end": 416.84, "start": 416.16, "text": "exception."}, {"end": 417.52, "start": 416.84, "text": "I'll"}, {"end": 417.92, "start": 417.52, "text": "present"}, {"end": 418.36, "start": 417.92, "text": "my"}, {"end": 418.52, "start": 418.36, "text": "Venn"}, {"end": 419.04, "start": 418.52, "text": "diagram"}, {"end": 419.28, "start": 419.04, "text": "on"}, {"end": 419.44, "start": 419.28, "text": "the"}, {"end": 419.64, "start": 419.44, "text": "next"}, {"end": 419.96, "start": 419.64, "text": "slide."}], "text": " Well, when he drew this many years ago, he was a graduate student in political science. And I actually got this diagram from a Wikipedia page called Data Science Venn Diagrams. It turns out that every field has its own definition of data science, and it's one that magnifies the importance of their own field that can be represented by a Venn diagram. I'm no exception. 
I'll present my Venn diagram on the next slide."}, {"chunks": [{"end": 421.12, "start": 420.0, "text": "But"}, {"end": 421.92, "start": 421.12, "text": "the"}, {"end": 422.6, "start": 421.92, "text": "reason"}, {"end": 422.88, "start": 422.6, "text": "I"}, {"end": 423.36, "start": 422.88, "text": "focus"}, {"end": 423.36, "start": 423.36, "text": "on"}, {"end": 423.48, "start": 423.36, "text": "this"}, {"end": 423.76, "start": 423.48, "text": "one"}, {"end": 423.76, "start": 423.76, "text": "is"}, {"end": 424.2, "start": 423.76, "text": "that"}, {"end": 424.52, "start": 424.2, "text": "several"}, {"end": 425.12, "start": 424.52, "text": "times"}, {"end": 425.64, "start": 425.12, "text": "I've"}, {"end": 425.92, "start": 425.64, "text": "listened"}, {"end": 425.96, "start": 425.92, "text": "to"}, {"end": 427.56, "start": 425.96, "text": "statisticians"}, {"end": 428.36, "start": 427.56, "text": "presenting"}, {"end": 428.48, "start": 428.36, "text": "this"}, {"end": 428.8, "start": 428.48, "text": "very"}, {"end": 429.48, "start": 428.8, "text": "diagram"}, {"end": 429.88, "start": 429.48, "text": "as"}, {"end": 430.12, "start": 429.88, "text": "the"}, {"end": 430.64, "start": 430.12, "text": "true"}, {"end": 431.08, "start": 430.64, "text": "definition"}, {"end": 431.08, "start": 431.08, "text": "of"}, {"end": 431.12, "start": 431.08, "text": "data"}, {"end": 431.56, "start": 431.12, "text": "science."}, {"end": 431.8, "start": 431.56, "text": "And"}, {"end": 433.04, "start": 431.8, "text": "what's"}, {"end": 433.32, "start": 433.04, "text": "wrong"}, {"end": 433.36, "start": 433.32, "text": "with"}, {"end": 433.56, "start": 433.36, "text": "it?"}, {"end": 433.92, "start": 433.56, "text": "Well,"}, {"end": 434.08, "start": 433.92, "text": "it"}, {"end": 435.56, "start": 434.08, "text": "turns"}, {"end": 436.12, "start": 435.56, "text": "out"}, {"end": 436.72, "start": 436.12, "text": "everything"}, {"end": 436.84, "start": 436.72, "text": "is"}, {"end": 437.08, 
"start": 436.84, "text": "wrong"}, {"end": 437.52, "start": 437.08, "text": "with"}, {"end": 437.52, "start": 437.52, "text": "it."}, {"end": 437.56, "start": 437.52, "text": "Okay,"}, {"end": 439.68, "start": 437.56, "text": "first"}, {"end": 440.12, "start": 439.68, "text": "a"}, {"end": 440.88, "start": 440.12, "text": "small"}, {"end": 441.28, "start": 440.88, "text": "quibble."}, {"end": 441.36, "start": 441.28, "text": "Okay,"}, {"end": 441.36, "start": 441.36, "text": "I"}, {"end": 442.12, "start": 441.36, "text": "prefer"}, {"end": 442.48, "start": 442.12, "text": "the"}, {"end": 442.96, "start": 442.48, "text": "term"}, {"end": 443.4, "start": 442.96, "text": "domain"}, {"end": 443.84, "start": 443.4, "text": "knowledge"}, {"end": 444.04, "start": 443.84, "text": "to"}, {"end": 444.92, "start": 444.04, "text": "substantive"}, {"end": 449.96, "start": 444.92, "text": "expertise."}], "text": " But the reason I focus on this one is that several times I've listened to statisticians presenting this very diagram as the true definition of data science. And what's wrong with it? Well, it turns out everything is wrong with it. Okay, first a small quibble. 
Okay, I prefer the term domain knowledge to substantive expertise."}, {"chunks": [{"end": 450.16, "start": 450.0, "text": "But"}, {"end": 450.52, "start": 450.16, "text": "here's"}, {"end": 450.92, "start": 450.52, "text": "the"}, {"end": 451.04, "start": 450.92, "text": "thing"}, {"end": 451.44, "start": 451.04, "text": "that"}, {"end": 451.64, "start": 451.44, "text": "really"}, {"end": 451.76, "start": 451.64, "text": "drives"}, {"end": 451.76, "start": 451.76, "text": "me"}, {"end": 452.08, "start": 451.76, "text": "nuts."}, {"end": 452.84, "start": 452.08, "text": "Computer"}, {"end": 453.4, "start": 452.84, "text": "science"}, {"end": 453.56, "start": 453.4, "text": "is"}, {"end": 453.84, "start": 453.56, "text": "not"}, {"end": 454.2, "start": 453.84, "text": "just"}, {"end": 454.52, "start": 454.2, "text": "writing"}, {"end": 454.96, "start": 454.52, "text": "code."}, {"end": 455.64, "start": 454.96, "text": "We"}, {"end": 456.04, "start": 455.64, "text": "have"}, {"end": 456.36, "start": 456.04, "text": "very"}, {"end": 456.8, "start": 456.36, "text": "many"}, {"end": 457.32, "start": 456.8, "text": "models,"}, {"end": 458.16, "start": 457.32, "text": "abstractions,"}, {"end": 458.72, "start": 458.16, "text": "algorithms,"}, {"end": 458.8, "start": 458.72, "text": "all"}, {"end": 458.92, "start": 458.8, "text": "of"}, {"end": 459.44, "start": 458.92, "text": "which"}, {"end": 459.64, "start": 459.44, "text": "make"}, {"end": 459.8, "start": 459.64, "text": "the"}, {"end": 460.28, "start": 459.8, "text": "solution"}, {"end": 460.32, "start": 460.28, "text": "of"}, {"end": 460.68, "start": 460.32, "text": "data"}, {"end": 461.04, "start": 460.68, "text": "science"}, {"end": 461.52, "start": 461.04, "text": "problems"}, {"end": 462.48, "start": 461.52, "text": "possible."}, {"end": 462.84, "start": 462.48, "text": "A"}, {"end": 465.08, "start": 462.84, "text": "little"}, {"end": 465.84, "start": 465.08, "text": "respect"}, {"end": 466.4, "start": 465.84, 
"text": "would"}, {"end": 467.36, "start": 466.4, "text": "have"}, {"end": 468.2, "start": 467.36, "text": "been"}, {"end": 468.44, "start": 468.2, "text": "in"}, {"end": 468.76, "start": 468.44, "text": "order."}, {"end": 468.76, "start": 468.76, "text": "And"}, {"end": 469.2, "start": 468.76, "text": "then"}, {"end": 469.8, "start": 469.2, "text": "Conway"}, {"end": 470.8, "start": 469.8, "text": "calls"}, {"end": 470.92, "start": 470.8, "text": "a"}, {"end": 471.56, "start": 470.92, "text": "computer"}, {"end": 471.76, "start": 471.56, "text": "scientist"}, {"end": 472.08, "start": 471.76, "text": "trying"}, {"end": 472.16, "start": 472.08, "text": "to"}, {"end": 472.48, "start": 472.16, "text": "help"}, {"end": 472.76, "start": 472.48, "text": "some"}, {"end": 473.16, "start": 472.76, "text": "domain"}, {"end": 473.96, "start": 473.16, "text": "scientist"}, {"end": 474.16, "start": 473.96, "text": "a"}, {"end": 475.04, "start": 474.16, "text": "danger"}, {"end": 475.28, "start": 475.04, "text": "if"}, {"end": 475.36, "start": 475.28, "text": "they"}, {"end": 475.68, "start": 475.36, "text": "do"}, {"end": 476.08, "start": 475.68, "text": "not"}, {"end": 476.52, "start": 476.08, "text": "function"}, {"end": 476.84, "start": 476.52, "text": "under"}, {"end": 476.96, "start": 476.84, "text": "the"}, {"end": 477.52, "start": 476.96, "text": "wise"}, {"end": 478.24, "start": 477.52, "text": "guidance"}, {"end": 478.72, "start": 478.24, "text": "of"}, {"end": 478.8, "start": 478.72, "text": "a"}, {"end": 479.96, "start": 478.8, "text": "statistician."}], "text": " But here's the thing that really drives me nuts. Computer science is not just writing code. We have very many models, abstractions, algorithms, all of which make the solution of data science problems possible. A little respect would have been in order. 
And then Conway calls a computer scientist trying to help some domain scientist a danger if they do not function under the wise guidance of a statistician."}, {"chunks": [{"end": 480.32, "start": 480.0, "text": "I'm"}, {"end": 480.84, "start": 480.32, "text": "going"}, {"end": 481.44, "start": 480.84, "text": "to"}, {"end": 482.0, "start": 481.44, "text": "argue"}, {"end": 482.36, "start": 482.0, "text": "that"}, {"end": 482.88, "start": 482.36, "text": "most"}, {"end": 483.96, "start": 482.88, "text": "achievements"}, {"end": 484.72, "start": 483.96, "text": "of"}, {"end": 485.28, "start": 484.72, "text": "data"}, {"end": 485.84, "start": 485.28, "text": "science"}, {"end": 486.04, "start": 485.84, "text": "really"}, {"end": 486.56, "start": 486.04, "text": "fall"}, {"end": 486.6, "start": 486.56, "text": "in"}, {"end": 486.96, "start": 486.6, "text": "this"}, {"end": 487.64, "start": 486.96, "text": "category,"}, {"end": 487.68, "start": 487.64, "text": "this"}, {"end": 488.48, "start": 487.68, "text": "portion"}, {"end": 488.92, "start": 488.48, "text": "of"}, {"end": 490.8, "start": 488.92, "text": "the"}, {"end": 491.76, "start": 490.8, "text": "Venn"}, {"end": 492.76, "start": 491.76, "text": "diagram."}, {"end": 493.68, "start": 492.76, "text": "Here's"}, {"end": 494.24, "start": 493.68, "text": "what"}, {"end": 494.64, "start": 494.24, "text": "Conway"}, {"end": 495.28, "start": 494.64, "text": "calls"}, {"end": 495.84, "start": 495.28, "text": "traditional"}, {"end": 497.04, "start": 495.84, "text": "research."}, {"end": 497.6, "start": 497.04, "text": "Supplying"}, {"end": 498.36, "start": 497.6, "text": "statistics"}, {"end": 498.36, "start": 498.36, "text": "to"}, {"end": 498.4, "start": 498.36, "text": "a"}, {"end": 498.68, "start": 498.4, "text": "problem"}, {"end": 499.2, "start": 498.68, "text": "without"}, {"end": 499.2, "start": 499.2, "text": "writing"}, {"end": 499.2, "start": 499.2, "text": "any"}, {"end": 499.24, "start": 499.2, "text": 
"code."}, {"end": 499.48, "start": 499.24, "text": "Now,"}, {"end": 499.52, "start": 499.48, "text": "I"}, {"end": 500.08, "start": 499.52, "text": "don't"}, {"end": 500.8, "start": 500.08, "text": "know"}, {"end": 501.0, "start": 500.8, "text": "whose"}, {"end": 502.52, "start": 501.0, "text": "tradition"}, {"end": 502.72, "start": 502.52, "text": "that"}, {"end": 503.16, "start": 502.72, "text": "is,"}, {"end": 503.4, "start": 503.16, "text": "but"}, {"end": 503.72, "start": 503.4, "text": "I"}, {"end": 504.2, "start": 503.72, "text": "hope"}, {"end": 504.52, "start": 504.2, "text": "it's"}, {"end": 504.56, "start": 504.52, "text": "not"}, {"end": 505.92, "start": 504.56, "text": "yours."}, {"end": 506.2, "start": 505.92, "text": "All"}, {"end": 506.64, "start": 506.2, "text": "that"}, {"end": 506.96, "start": 506.64, "text": "does"}, {"end": 506.96, "start": 506.96, "text": "is"}, {"end": 508.04, "start": 506.96, "text": "provide"}, {"end": 508.4, "start": 508.04, "text": "amusement"}, {"end": 508.6, "start": 508.4, "text": "for"}, {"end": 508.84, "start": 508.6, "text": "the"}, {"end": 509.84, "start": 508.84, "text": "statistician"}, {"end": 509.84, "start": 509.84, "text": "or"}, {"end": 509.96, "start": 509.84, "text": "the"}], "text": " I'm going to argue that most achievements of data science really fall in this category, this portion of the Venn diagram. Here's what Conway calls traditional research. Supplying statistics to a problem without writing any code. Now, I don't know whose tradition that is, but I hope it's not yours. 
All that does is provide amusement for the statistician or the"}, {"chunks": [{"end": 510.32, "start": 510.0, "text": "and"}, {"end": 511.32, "start": 510.32, "text": "it"}, {"end": 511.6, "start": 511.32, "text": "doesn't"}, {"end": 512.32, "start": 511.6, "text": "provide"}, {"end": 512.68, "start": 512.32, "text": "a"}, {"end": 513.16, "start": 512.68, "text": "solution"}, {"end": 516.28, "start": 513.16, "text": "to"}, {"end": 516.96, "start": 516.28, "text": "anything."}, {"end": 517.4, "start": 516.96, "text": "And"}, {"end": 517.76, "start": 517.4, "text": "last,"}, {"end": 518.2, "start": 517.76, "text": "is"}, {"end": 518.68, "start": 518.2, "text": "machine"}, {"end": 519.12, "start": 518.68, "text": "learning"}, {"end": 519.52, "start": 519.12, "text": "really"}, {"end": 520.08, "start": 519.52, "text": "something"}, {"end": 520.16, "start": 520.08, "text": "that"}, {"end": 520.6, "start": 520.16, "text": "doesn't"}, {"end": 521.24, "start": 520.6, "text": "apply"}, {"end": 521.4, "start": 521.24, "text": "to"}, {"end": 521.52, "start": 521.4, "text": "any"}, {"end": 523.16, "start": 521.52, "text": "domain?"}, {"end": 523.84, "start": 523.16, "text": "Well,"}, {"end": 524.16, "start": 523.84, "text": "okay,"}, {"end": 524.48, "start": 524.16, "text": "there"}, {"end": 524.48, "start": 524.48, "text": "have"}, {"end": 524.72, "start": 524.48, "text": "been"}, {"end": 525.04, "start": 524.72, "text": "some"}, {"end": 525.4, "start": 525.04, "text": "great"}, {"end": 526.0, "start": 525.4, "text": "achievements"}, {"end": 526.16, "start": 526.0, "text": "by"}, {"end": 526.44, "start": 526.16, "text": "people"}, {"end": 527.28, "start": 526.44, "text": "looking"}, {"end": 527.56, "start": 527.28, "text": "at"}, {"end": 527.56, "start": 527.56, "text": "the"}, {"end": 528.24, "start": 527.56, "text": "methodology"}, {"end": 528.36, "start": 528.24, "text": "of"}, {"end": 528.68, "start": 528.36, "text": "machine"}, {"end": 529.16, "start": 528.68, "text": 
"learning"}, {"end": 529.56, "start": 529.16, "text": "rather"}, {"end": 530.16, "start": 529.56, "text": "than"}, {"end": 530.96, "start": 530.16, "text": "applying"}, {"end": 531.8, "start": 530.96, "text": "it."}, {"end": 532.2, "start": 531.8, "text": "I"}, {"end": 533.68, "start": 532.2, "text": "think"}, {"end": 534.24, "start": 533.68, "text": "that"}, {"end": 534.68, "start": 534.24, "text": "the"}, {"end": 535.36, "start": 534.68, "text": "reason"}, {"end": 535.72, "start": 535.36, "text": "everybody"}, {"end": 535.96, "start": 535.72, "text": "wants"}, {"end": 536.2, "start": 535.96, "text": "to"}, {"end": 536.44, "start": 536.2, "text": "engage"}, {"end": 536.6, "start": 536.44, "text": "in"}, {"end": 537.08, "start": 536.6, "text": "machine"}, {"end": 537.48, "start": 537.08, "text": "learning"}, {"end": 537.68, "start": 537.48, "text": "these"}, {"end": 538.16, "start": 537.68, "text": "days"}, {"end": 538.32, "start": 538.16, "text": "is"}, {"end": 538.92, "start": 538.32, "text": "because"}, {"end": 539.36, "start": 538.92, "text": "it's"}, {"end": 539.64, "start": 539.36, "text": "so"}, {"end": 539.96, "start": 539.64, "text": "useful"}], "text": " and it doesn't provide a solution to anything. And last, is machine learning really something that doesn't apply to any domain? Well, okay, there have been some great achievements by people looking at the methodology of machine learning rather than applying it. 
I think that the reason everybody wants to engage in machine learning these days is because it's so useful"}, {"chunks": [{"end": 540.2, "start": 540.0, "text": "and"}, {"end": 540.6, "start": 540.2, "text": "solving"}, {"end": 541.32, "start": 540.6, "text": "problems"}, {"end": 541.32, "start": 541.32, "text": "in"}, {"end": 541.36, "start": 541.32, "text": "a"}, {"end": 542.16, "start": 541.36, "text": "variety"}, {"end": 542.48, "start": 542.16, "text": "of"}, {"end": 546.08, "start": 542.48, "text": "domains."}, {"end": 546.96, "start": 546.08, "text": "Okay,"}, {"end": 547.4, "start": 546.96, "text": "so"}, {"end": 547.76, "start": 547.4, "text": "here's"}, {"end": 548.08, "start": 547.76, "text": "my"}, {"end": 548.52, "start": 548.08, "text": "Venn"}, {"end": 548.88, "start": 548.52, "text": "diagram."}, {"end": 552.4, "start": 548.88, "text": "Okay,"}, {"end": 552.88, "start": 552.4, "text": "there's"}, {"end": 553.88, "start": 552.88, "text": "computer"}, {"end": 555.84, "start": 553.88, "text": "science"}, {"end": 556.24, "start": 555.84, "text": "and"}, {"end": 556.8, "start": 556.24, "text": "there"}, {"end": 556.84, "start": 556.8, "text": "are"}, {"end": 557.52, "start": 556.84, "text": "scientific"}, {"end": 558.0, "start": 557.52, "text": "domains"}, {"end": 558.16, "start": 558.0, "text": "we'd"}, {"end": 558.32, "start": 558.16, "text": "like"}, {"end": 559.12, "start": 558.32, "text": "to"}, {"end": 559.96, "start": 559.12, "text": "affect"}, {"end": 560.16, "start": 559.96, "text": "and"}, {"end": 561.0, "start": 560.16, "text": "somewhere"}, {"end": 562.04, "start": 561.0, "text": "in"}, {"end": 562.76, "start": 562.04, "text": "the"}, {"end": 562.88, "start": 562.76, "text": "middle"}, {"end": 567.08, "start": 562.88, "text": "is"}, {"end": 567.88, "start": 567.08, "text": "data"}, {"end": 568.36, "start": 567.88, "text": "science."}, {"end": 568.6, "start": 568.36, "text": "Okay,"}, {"end": 569.96, "start": 568.6, "text": "now,"}], "text": " 
and solving problems in a variety of domains. Okay, so here's my Venn diagram. Okay, there's computer science and there are scientific domains we'd like to affect and somewhere in the middle is data science. Okay, now,"}, {"chunks": [{"end": 570.28, "start": 570.0, "text": "Okay,"}, {"end": 570.76, "start": 570.28, "text": "machine"}, {"end": 572.04, "start": 570.76, "text": "learning"}, {"end": 572.52, "start": 572.04, "text": "is"}, {"end": 572.56, "start": 572.52, "text": "a"}, {"end": 573.04, "start": 572.56, "text": "branch"}, {"end": 573.12, "start": 573.04, "text": "of"}, {"end": 573.6, "start": 573.12, "text": "data"}, {"end": 573.84, "start": 573.6, "text": "science."}, {"end": 573.88, "start": 573.84, "text": "It"}, {"end": 575.16, "start": 573.88, "text": "is"}, {"end": 575.64, "start": 575.16, "text": "used"}, {"end": 575.96, "start": 575.64, "text": "for"}, {"end": 576.0, "start": 575.96, "text": "a"}, {"end": 576.0, "start": 576.0, "text": "lot"}, {"end": 576.72, "start": 576.0, "text": "of"}, {"end": 577.36, "start": 576.72, "text": "work"}, {"end": 577.68, "start": 577.36, "text": "that"}, {"end": 578.36, "start": 577.68, "text": "serves"}, {"end": 578.92, "start": 578.36, "text": "the"}, {"end": 579.44, "start": 578.92, "text": "application"}, {"end": 579.64, "start": 579.44, "text": "domains,"}, {"end": 580.0, "start": 579.64, "text": "but"}, {"end": 580.88, "start": 580.0, "text": "it"}, {"end": 581.24, "start": 580.88, "text": "also"}, {"end": 581.52, "start": 581.24, "text": "has"}, {"end": 581.88, "start": 581.52, "text": "uses"}, {"end": 582.12, "start": 581.88, "text": "in"}, {"end": 582.68, "start": 582.12, "text": "purely"}, {"end": 583.04, "start": 582.68, "text": "internal"}, {"end": 583.4, "start": 583.04, "text": "matters"}, {"end": 583.88, "start": 583.4, "text": "of"}, {"end": 584.68, "start": 583.88, "text": "computer"}, {"end": 585.64, "start": 584.68, "text": "science,"}, {"end": 586.12, "start": 585.64, "text": "often"}, {"end": 
586.32, "start": 586.12, "text": "in"}, {"end": 587.2, "start": 586.32, "text": "applications"}, {"end": 587.28, "start": 587.2, "text": "that"}, {"end": 587.64, "start": 587.28, "text": "are"}, {"end": 588.08, "start": 587.64, "text": "called"}, {"end": 588.4, "start": 588.08, "text": "artificial"}, {"end": 588.88, "start": 588.4, "text": "intelligence"}, {"end": 589.24, "start": 588.88, "text": "rather"}, {"end": 589.84, "start": 589.24, "text": "than"}, {"end": 590.36, "start": 589.84, "text": "machine"}, {"end": 591.4, "start": 590.36, "text": "learning."}, {"end": 592.28, "start": 591.4, "text": "For"}, {"end": 593.28, "start": 592.28, "text": "example,"}, {"end": 594.12, "start": 593.28, "text": "machine"}, {"end": 594.92, "start": 594.12, "text": "learning"}, {"end": 595.16, "start": 594.92, "text": "is"}, {"end": 595.92, "start": 595.16, "text": "useful"}, {"end": 596.04, "start": 595.92, "text": "in"}, {"end": 596.56, "start": 596.04, "text": "protecting"}, {"end": 597.0, "start": 596.56, "text": "computer"}, {"end": 597.92, "start": 597.0, "text": "systems"}, {"end": 598.12, "start": 597.92, "text": "from"}, {"end": 599.96, "start": 598.12, "text": "intrusions."}], "text": " Okay, machine learning is a branch of data science. It is used for a lot of work that serves the application domains, but it also has uses in purely internal matters of computer science, often in applications that are called artificial intelligence rather than machine learning. 
For example, machine learning is useful in protecting computer systems from intrusions."}, {"chunks": [{"end": 600.64, "start": 600.0, "text": "which"}, {"end": 601.16, "start": 600.64, "text": "is"}, {"end": 601.44, "start": 601.16, "text": "a"}, {"end": 601.48, "start": 601.44, "text": "subject"}, {"end": 601.52, "start": 601.48, "text": "that"}, {"end": 601.96, "start": 601.52, "text": "falls"}, {"end": 602.6, "start": 601.96, "text": "squarely"}, {"end": 602.8, "start": 602.6, "text": "within"}, {"end": 603.44, "start": 602.8, "text": "computer"}, {"end": 603.68, "start": 603.44, "text": "science,"}, {"end": 604.08, "start": 603.68, "text": "not"}, {"end": 604.2, "start": 604.08, "text": "an"}, {"end": 604.68, "start": 604.2, "text": "application"}, {"end": 605.72, "start": 604.68, "text": "domain."}, {"end": 606.6, "start": 605.72, "text": "Or"}, {"end": 606.84, "start": 606.6, "text": "machine"}, {"end": 607.16, "start": 606.84, "text": "learning"}, {"end": 607.4, "start": 607.16, "text": "is"}, {"end": 607.96, "start": 607.4, "text": "useful"}, {"end": 608.2, "start": 607.96, "text": "in"}, {"end": 608.44, "start": 608.2, "text": "the"}, {"end": 609.0, "start": 608.44, "text": "implementation"}, {"end": 609.28, "start": 609.0, "text": "of"}, {"end": 609.6, "start": 609.28, "text": "things"}, {"end": 609.68, "start": 609.6, "text": "like"}, {"end": 610.6, "start": 609.68, "text": "chatbots"}, {"end": 610.76, "start": 610.6, "text": "and"}, {"end": 611.16, "start": 610.76, "text": "lots"}, {"end": 611.4, "start": 611.16, "text": "of"}, {"end": 611.44, "start": 611.4, "text": "other"}, {"end": 611.84, "start": 611.44, "text": "things"}, {"end": 612.28, "start": 611.84, "text": "that"}, {"end": 612.68, "start": 612.28, "text": "are"}, {"end": 613.52, "start": 612.68, "text": "sufficiently"}, {"end": 614.08, "start": 613.52, "text": "general"}, {"end": 614.16, "start": 614.08, "text": "in"}, {"end": 614.56, "start": 614.16, "text": "application"}, {"end": 614.88, 
"start": 614.56, "text": "area"}, {"end": 615.12, "start": 614.88, "text": "that"}, {"end": 615.16, "start": 615.12, "text": "they"}, {"end": 615.64, "start": 615.16, "text": "don't"}, {"end": 615.96, "start": 615.64, "text": "really"}, {"end": 616.36, "start": 615.96, "text": "belong"}, {"end": 616.72, "start": 616.36, "text": "to"}, {"end": 616.8, "start": 616.72, "text": "any"}, {"end": 617.28, "start": 616.8, "text": "particular"}, {"end": 617.48, "start": 617.28, "text": "domain."}, {"end": 617.48, "start": 617.48, "text": "Now,"}, {"end": 617.68, "start": 617.48, "text": "math"}, {"end": 617.68, "start": 617.68, "text": "and"}, {"end": 620.48, "start": 617.68, "text": "statistics"}, {"end": 620.92, "start": 620.48, "text": "both"}, {"end": 621.92, "start": 620.92, "text": "have"}, {"end": 622.08, "start": 621.92, "text": "a"}, {"end": 622.92, "start": 622.08, "text": "role"}, {"end": 622.92, "start": 622.92, "text": "to"}, {"end": 623.16, "start": 622.92, "text": "play"}, {"end": 623.44, "start": 623.16, "text": "in"}, {"end": 623.64, "start": 623.44, "text": "this"}, {"end": 624.28, "start": 623.64, "text": "picture."}, {"end": 624.48, "start": 624.28, "text": "And"}, {"end": 624.64, "start": 624.48, "text": "I"}, {"end": 624.92, "start": 624.64, "text": "want"}, {"end": 625.32, "start": 624.92, "text": "to,"}, {"end": 625.72, "start": 625.32, "text": "first"}, {"end": 625.76, "start": 625.72, "text": "of"}, {"end": 625.76, "start": 625.76, "text": "all,"}, {"end": 626.68, "start": 625.76, "text": "apologize"}, {"end": 627.36, "start": 626.68, "text": "that"}, {"end": 627.68, "start": 627.36, "text": "I"}, {"end": 628.12, "start": 627.68, "text": "was"}, {"end": 628.32, "start": 628.12, "text": "not"}, {"end": 628.32, "start": 628.32, "text": "able"}, {"end": 628.32, "start": 628.32, "text": "to"}, {"end": 628.32, "start": 628.32, "text": "draw"}, {"end": 628.36, "start": 628.32, "text": "the"}, {"end": 628.92, "start": 628.36, "text": "bubbles"}, {"end": 
629.12, "start": 628.92, "text": "in"}, {"end": 629.44, "start": 629.12, "text": "a"}, {"end": 629.48, "start": 629.44, "text": "wiggly"}, {"end": 629.48, "start": 629.48, "text": "enough"}, {"end": 629.96, "start": 629.48, "text": "shape"}], "text": " which is a subject that falls squarely within computer science, not an application domain. Or machine learning is useful in the implementation of things like chatbots and lots of other things that are sufficiently general in application area that they don't really belong to any particular domain. Now, math and statistics both have a role to play in this picture. And I want to, first of all, apologize that I was not able to draw the bubbles in a wiggly enough shape"}, {"chunks": [{"end": 630.68, "start": 630.0, "text": "for"}, {"end": 630.88, "start": 630.68, "text": "me"}, {"end": 631.24, "start": 630.88, "text": "to"}, {"end": 631.84, "start": 631.24, "text": "respect"}, {"end": 632.24, "start": 631.84, "text": "the"}, {"end": 632.6, "start": 632.24, "text": "fact"}, {"end": 633.2, "start": 632.6, "text": "that"}, {"end": 633.76, "start": 633.2, "text": "both"}, {"end": 633.96, "start": 633.76, "text": "of"}, {"end": 634.4, "start": 633.96, "text": "these"}, {"end": 635.28, "start": 634.4, "text": "fields"}, {"end": 636.08, "start": 635.28, "text": "deserve"}, {"end": 636.52, "start": 636.08, "text": "large"}, {"end": 636.96, "start": 636.52, "text": "bubbles,"}, {"end": 637.16, "start": 636.96, "text": "not"}, {"end": 637.56, "start": 637.16, "text": "the"}, {"end": 638.08, "start": 637.56, "text": "small"}, {"end": 638.48, "start": 638.08, "text": "ones"}, {"end": 638.6, "start": 638.48, "text": "that"}, {"end": 638.68, "start": 638.6, "text": "I"}, {"end": 638.8, "start": 638.68, "text": "drew"}, {"end": 639.12, "start": 638.8, "text": "here."}, {"end": 639.68, "start": 639.12, "text": "But"}, {"end": 640.0, "start": 639.68, "text": "my"}, {"end": 640.16, "start": 640.0, "text": "point"}, {"end": 640.32, "start": 
640.16, "text": "is"}, {"end": 641.08, "start": 640.32, "text": "that"}, {"end": 641.8, "start": 641.08, "text": "math"}, {"end": 642.2, "start": 641.8, "text": "and"}, {"end": 643.12, "start": 642.2, "text": "stat"}, {"end": 643.44, "start": 643.12, "text": "have"}, {"end": 643.84, "start": 643.44, "text": "lots"}, {"end": 644.08, "start": 643.84, "text": "of"}, {"end": 645.28, "start": 644.08, "text": "applications"}, {"end": 645.56, "start": 645.28, "text": "in"}, {"end": 646.44, "start": 645.56, "text": "computer"}, {"end": 646.92, "start": 646.44, "text": "science,"}, {"end": 647.2, "start": 646.92, "text": "but"}, {"end": 647.52, "start": 647.2, "text": "they"}, {"end": 649.04, "start": 647.52, "text": "don't"}, {"end": 650.4, "start": 649.04, "text": "affect"}, {"end": 650.72, "start": 650.4, "text": "domains"}, {"end": 651.12, "start": 650.72, "text": "by"}, {"end": 651.68, "start": 651.12, "text": "themselves."}, {"end": 652.04, "start": 651.68, "text": "They"}, {"end": 652.4, "start": 652.04, "text": "do"}, {"end": 653.64, "start": 652.4, "text": "so"}, {"end": 653.84, "start": 653.64, "text": "through"}, {"end": 654.24, "start": 653.84, "text": "the"}, {"end": 654.92, "start": 654.24, "text": "algorithms"}, {"end": 654.92, "start": 654.92, "text": "that"}, {"end": 655.32, "start": 654.92, "text": "they"}, {"end": 655.4, "start": 655.32, "text": "help"}, {"end": 656.16, "start": 655.4, "text": "design"}, {"end": 656.68, "start": 656.16, "text": "and"}, {"end": 659.96, "start": 656.68, "text": "analyze."}], "text": " for me to respect the fact that both of these fields deserve large bubbles, not the small ones that I drew here. But my point is that math and stat have lots of applications in computer science, but they don't affect domains by themselves. 
They do so through the algorithms that they help design and analyze."}, {"chunks": [{"end": 660.8, "start": 660.0, "text": "Okay,"}, {"end": 664.36, "start": 660.8, "text": "now"}, {"end": 665.72, "start": 664.36, "text": "there's"}, {"end": 666.64, "start": 665.72, "text": "a"}, {"end": 667.08, "start": 666.64, "text": "lot"}, {"end": 667.44, "start": 667.08, "text": "of"}, {"end": 668.36, "start": 667.44, "text": "value"}, {"end": 668.52, "start": 668.36, "text": "in"}, {"end": 668.96, "start": 668.52, "text": "what"}, {"end": 669.8, "start": 668.96, "text": "statistics"}, {"end": 670.32, "start": 669.8, "text": "brings"}, {"end": 671.08, "start": 670.32, "text": "to"}, {"end": 671.4, "start": 671.08, "text": "the"}, {"end": 671.8, "start": 671.4, "text": "table."}, {"end": 672.4, "start": 671.8, "text": "For"}, {"end": 673.0, "start": 672.4, "text": "example,"}, {"end": 673.4, "start": 673.0, "text": "many"}, {"end": 673.44, "start": 673.4, "text": "of"}, {"end": 673.48, "start": 673.44, "text": "the"}, {"end": 674.0, "start": 673.48, "text": "most"}, {"end": 674.28, "start": 674.0, "text": "efficient"}, {"end": 675.08, "start": 674.28, "text": "algorithms"}, {"end": 675.36, "start": 675.08, "text": "are"}, {"end": 676.16, "start": 675.36, "text": "randomized"}, {"end": 677.12, "start": 676.16, "text": "algorithms"}, {"end": 678.08, "start": 677.12, "text": "for"}, {"end": 678.48, "start": 678.08, "text": "things"}, {"end": 678.68, "start": 678.48, "text": "that"}, {"end": 679.16, "start": 678.68, "text": "usually"}, {"end": 679.56, "start": 679.16, "text": "perform"}, {"end": 680.12, "start": 679.56, "text": "very"}, {"end": 680.52, "start": 680.12, "text": "well,"}, {"end": 680.56, "start": 680.52, "text": "but"}, {"end": 680.6, "start": 680.56, "text": "have"}, {"end": 680.88, "start": 680.6, "text": "bad"}, {"end": 681.2, "start": 680.88, "text": "worst"}, {"end": 682.0, "start": 681.2, "text": "cases."}, {"end": 683.28, "start": 682.0, "text": "Quicksort"}, 
{"end": 683.64, "start": 683.28, "text": "is"}, {"end": 683.68, "start": 683.64, "text": "an"}, {"end": 684.2, "start": 683.68, "text": "example"}, {"end": 684.44, "start": 684.2, "text": "that"}, {"end": 684.64, "start": 684.44, "text": "might"}, {"end": 684.88, "start": 684.64, "text": "be"}, {"end": 685.4, "start": 684.88, "text": "familiar"}, {"end": 686.16, "start": 685.4, "text": "to"}, {"end": 686.48, "start": 686.16, "text": "you."}, {"end": 687.16, "start": 686.48, "text": "It's"}, {"end": 687.76, "start": 687.16, "text": "n"}, {"end": 688.2, "start": 687.76, "text": "log"}, {"end": 688.36, "start": 688.2, "text": "n"}, {"end": 688.52, "start": 688.36, "text": "on"}, {"end": 689.24, "start": 688.52, "text": "average,"}, {"end": 689.24, "start": 689.24, "text": "but"}, {"end": 689.24, "start": 689.24, "text": "the"}, {"end": 689.4, "start": 689.24, "text": "worst"}, {"end": 689.96, "start": 689.4, "text": "case"}], "text": " Okay, now there's a lot of value in what statistics brings to the table. For example, many of the most efficient algorithms are randomized algorithms for things that usually perform very well, but have bad worst cases. Quicksort is an example that might be familiar to you. 
It's n log n on average, but the worst case"}, {"chunks": [{"end": 690.6, "start": 690.0, "text": "case"}, {"end": 690.92, "start": 690.6, "text": "is"}, {"end": 690.96, "start": 690.92, "text": "N"}, {"end": 691.92, "start": 690.96, "text": "squared."}, {"end": 692.28, "start": 691.92, "text": "You"}, {"end": 692.8, "start": 692.28, "text": "need"}, {"end": 692.84, "start": 692.8, "text": "a"}, {"end": 693.32, "start": 692.84, "text": "statistical"}, {"end": 693.96, "start": 693.32, "text": "analysis"}, {"end": 694.12, "start": 693.96, "text": "to"}, {"end": 694.68, "start": 694.12, "text": "assure"}, {"end": 695.12, "start": 694.68, "text": "that"}, {"end": 695.48, "start": 695.12, "text": "the"}, {"end": 695.84, "start": 695.48, "text": "average"}, {"end": 696.24, "start": 695.84, "text": "case"}, {"end": 696.68, "start": 696.24, "text": "is"}, {"end": 697.36, "start": 696.68, "text": "as"}, {"end": 697.92, "start": 697.36, "text": "good"}, {"end": 698.0, "start": 697.92, "text": "as"}, {"end": 698.2, "start": 698.0, "text": "I"}, {"end": 698.8, "start": 698.2, "text": "said."}, {"end": 699.32, "start": 698.8, "text": "Wishing"}, {"end": 699.52, "start": 699.32, "text": "and"}, {"end": 699.8, "start": 699.52, "text": "hoping"}, {"end": 700.12, "start": 699.8, "text": "is"}, {"end": 700.68, "start": 700.12, "text": "not"}, {"end": 701.32, "start": 700.68, "text": "always"}, {"end": 704.44, "start": 701.32, "text": "enough."}, {"end": 705.28, "start": 704.44, "text": "In"}, {"end": 705.56, "start": 705.28, "text": "addition,"}, {"end": 705.88, "start": 705.56, "text": "there"}, {"end": 706.76, "start": 705.88, "text": "are"}, {"end": 707.64, "start": 706.76, "text": "certain"}, {"end": 708.52, "start": 707.64, "text": "claims"}, {"end": 708.92, "start": 708.52, "text": "that"}, {"end": 709.88, "start": 708.92, "text": "require"}, {"end": 710.68, "start": 709.88, "text": "precise"}, {"end": 711.44, "start": 710.68, "text": "statistical"}, {"end": 713.2, "start": 
711.44, "text": "analysis."}, {"end": 713.96, "start": 713.2, "text": "For"}, {"end": 714.72, "start": 713.96, "text": "example,"}, {"end": 714.88, "start": 714.72, "text": "when"}, {"end": 714.88, "start": 714.88, "text": "you"}, {"end": 715.32, "start": 714.88, "text": "claim"}, {"end": 715.88, "start": 715.32, "text": "10%"}, {"end": 715.88, "start": 715.88, "text": "of"}, {"end": 716.04, "start": 715.88, "text": "the"}, {"end": 716.76, "start": 716.04, "text": "population"}, {"end": 717.12, "start": 716.76, "text": "is"}, {"end": 717.36, "start": 717.12, "text": "in"}, {"end": 718.04, "start": 717.36, "text": "poverty"}, {"end": 718.12, "start": 718.04, "text": "in"}, {"end": 718.4, "start": 718.12, "text": "some"}, {"end": 719.08, "start": 718.4, "text": "place"}, {"end": 719.48, "start": 719.08, "text": "or"}, {"end": 719.96, "start": 719.48, "text": "other,"}], "text": " case is N squared. You need a statistical analysis to assure that the average case is as good as I said. Wishing and hoping is not always enough. In addition, there are certain claims that require precise statistical analysis. 
For example, when you claim 10% of the population is in poverty in some place or other,"}, {"chunks": [{"end": 720.28, "start": 720.0, "text": "Did"}, {"end": 720.48, "start": 720.28, "text": "you"}, {"end": 720.8, "start": 720.48, "text": "mean"}, {"end": 721.28, "start": 720.8, "text": "that"}, {"end": 721.6, "start": 721.28, "text": "there's"}, {"end": 722.08, "start": 721.6, "text": "a"}, {"end": 722.8, "start": 722.08, "text": "95%"}, {"end": 723.72, "start": 722.8, "text": "probability"}, {"end": 723.76, "start": 723.72, "text": "that"}, {"end": 724.12, "start": 723.76, "text": "the"}, {"end": 724.44, "start": 724.12, "text": "actual"}, {"end": 725.0, "start": 724.44, "text": "percentage"}, {"end": 725.12, "start": 725.0, "text": "is"}, {"end": 725.24, "start": 725.12, "text": "between"}, {"end": 725.48, "start": 725.24, "text": "9"}, {"end": 725.92, "start": 725.48, "text": "and"}, {"end": 727.52, "start": 725.92, "text": "11%"}, {"end": 727.8, "start": 727.52, "text": "or"}, {"end": 728.4, "start": 727.8, "text": "that"}, {"end": 728.84, "start": 728.4, "text": "there"}, {"end": 728.84, "start": 728.84, "text": "is"}, {"end": 729.04, "start": 728.84, "text": "a"}, {"end": 730.2, "start": 729.04, "text": "75%"}, {"end": 730.76, "start": 730.2, "text": "probability"}, {"end": 731.32, "start": 730.76, "text": "that"}, {"end": 731.84, "start": 731.32, "text": "it"}, {"end": 732.12, "start": 731.84, "text": "is"}, {"end": 732.24, "start": 732.12, "text": "between"}, {"end": 733.36, "start": 732.24, "text": "2%"}, {"end": 733.76, "start": 733.36, "text": "and"}, {"end": 734.88, "start": 733.76, "text": "20%?"}, {"end": 735.32, "start": 734.88, "text": "You"}, {"end": 735.84, "start": 735.32, "text": "need"}, {"end": 736.24, "start": 735.84, "text": "to"}, {"end": 736.92, "start": 736.24, "text": "get"}, {"end": 737.48, "start": 736.92, "text": "the"}, {"end": 741.08, "start": 737.48, "text": "story"}, {"end": 742.28, "start": 741.08, "text": "right."}, {"end": 
744.0, "start": 742.28, "text": "But"}, {"end": 744.28, "start": 744.0, "text": "one"}, {"end": 744.48, "start": 744.28, "text": "of"}, {"end": 745.52, "start": 744.48, "text": "the"}, {"end": 746.84, "start": 745.52, "text": "things"}, {"end": 747.24, "start": 746.84, "text": "I"}, {"end": 747.64, "start": 747.24, "text": "learned"}, {"end": 747.92, "start": 747.64, "text": "from"}, {"end": 748.28, "start": 747.92, "text": "the"}, {"end": 748.76, "start": 748.28, "text": "data"}, {"end": 749.4, "start": 748.76, "text": "science"}, {"end": 749.96, "start": 749.4, "text": "education"}], "text": " Did you mean that there's a 95% probability that the actual percentage is between 9 and 11% or that there is a 75% probability that it is between 2% and 20%? You need to get the story right. But one of the things I learned from the data science education"}, {"chunks": [{"end": 750.88, "start": 750.0, "text": "meetings"}, {"end": 751.8, "start": 750.88, "text": "was"}, {"end": 752.28, "start": 751.8, "text": "that"}, {"end": 753.36, "start": 752.28, "text": "statisticians"}, {"end": 753.8, "start": 753.36, "text": "tend"}, {"end": 753.92, "start": 753.8, "text": "to"}, {"end": 754.12, "start": 753.92, "text": "think"}, {"end": 754.52, "start": 754.12, "text": "with"}, {"end": 754.64, "start": 754.52, "text": "the"}, {"end": 754.96, "start": 754.64, "text": "mind"}, {"end": 755.44, "start": 754.96, "text": "of"}, {"end": 755.76, "start": 755.44, "text": "the"}, {"end": 756.96, "start": 755.76, "text": "mathematician."}, {"end": 758.0, "start": 756.96, "text": "That"}, {"end": 758.52, "start": 758.0, "text": "is,"}, {"end": 759.04, "start": 758.52, "text": "they're"}, {"end": 759.36, "start": 759.04, "text": "too"}, {"end": 759.56, "start": 759.36, "text": "much"}, {"end": 760.2, "start": 759.56, "text": "concerned"}, {"end": 760.4, "start": 760.2, "text": "with"}, {"end": 761.8, "start": 760.4, "text": "analysis"}, {"end": 762.36, "start": 761.8, "text": "and"}, {"end": 
762.48, "start": 762.36, "text": "not"}, {"end": 762.64, "start": 762.48, "text": "enough"}, {"end": 762.8, "start": 762.64, "text": "with"}, {"end": 764.44, "start": 762.8, "text": "problem"}, {"end": 765.08, "start": 764.44, "text": "solving."}, {"end": 765.52, "start": 765.08, "text": "Just"}, {"end": 765.76, "start": 765.52, "text": "to"}, {"end": 765.88, "start": 765.76, "text": "give"}, {"end": 766.08, "start": 765.88, "text": "you"}, {"end": 766.28, "start": 766.08, "text": "one"}, {"end": 766.72, "start": 766.28, "text": "example,"}, {"end": 767.28, "start": 766.72, "text": "one"}, {"end": 767.28, "start": 767.28, "text": "of"}, {"end": 767.32, "start": 767.28, "text": "the"}, {"end": 768.56, "start": 767.32, "text": "topics"}, {"end": 769.08, "start": 768.56, "text": "discussed"}, {"end": 769.08, "start": 769.08, "text": "in"}, {"end": 769.28, "start": 769.08, "text": "this"}, {"end": 769.48, "start": 769.28, "text": "group,"}, {"end": 769.68, "start": 769.48, "text": "we"}, {"end": 770.12, "start": 769.68, "text": "heard"}, {"end": 771.36, "start": 770.12, "text": "about"}, {"end": 771.92, "start": 771.36, "text": "a"}, {"end": 772.8, "start": 771.92, "text": "statistics"}, {"end": 773.88, "start": 772.8, "text": "view"}, {"end": 774.16, "start": 773.88, "text": "of"}, {"end": 774.24, "start": 774.16, "text": "a"}, {"end": 774.72, "start": 774.24, "text": "hackathon,"}, {"end": 774.84, "start": 774.72, "text": "which"}, {"end": 775.08, "start": 774.84, "text": "is"}, {"end": 775.12, "start": 775.08, "text": "a"}, {"end": 775.64, "start": 775.12, "text": "contest"}, {"end": 776.36, "start": 775.64, "text": "that"}, {"end": 776.88, "start": 776.36, "text": "they"}, {"end": 777.08, "start": 776.88, "text": "actually"}, {"end": 777.4, "start": 777.08, "text": "run"}, {"end": 777.52, "start": 777.4, "text": "each"}, {"end": 778.76, "start": 777.52, "text": "year,"}, {"end": 779.48, "start": 778.76, "text": "which"}, {"end": 779.8, "start": 779.48, "text": 
"a"}, {"end": 779.96, "start": 779.8, "text": "team"}], "text": " meetings was that statisticians tend to think with the mind of the mathematician. That is, they're too much concerned with analysis and not enough with problem solving. Just to give you one example, one of the topics discussed in this group, we heard about a statistics view of a hackathon, which is a contest that they actually run each year, which a team"}, {"chunks": [{"end": 780.2, "start": 780.0, "text": "of"}, {"end": 780.84, "start": 780.2, "text": "students"}, {"end": 780.84, "start": 780.84, "text": "are"}, {"end": 780.92, "start": 780.84, "text": "presented"}, {"end": 781.32, "start": 780.92, "text": "with"}, {"end": 781.72, "start": 781.32, "text": "a"}, {"end": 782.12, "start": 781.72, "text": "large"}, {"end": 784.76, "start": 782.12, "text": "data"}, {"end": 784.96, "start": 784.76, "text": "set."}, {"end": 784.96, "start": 784.96, "text": "Their"}, {"end": 785.04, "start": 784.96, "text": "job"}, {"end": 785.48, "start": 785.04, "text": "is"}, {"end": 785.76, "start": 785.48, "text": "to"}, {"end": 786.84, "start": 785.76, "text": "spend"}, {"end": 787.04, "start": 786.84, "text": "a"}, {"end": 787.36, "start": 787.04, "text": "weekend,"}, {"end": 787.76, "start": 787.36, "text": "with"}, {"end": 788.6, "start": 787.76, "text": "sleep"}, {"end": 789.12, "start": 788.6, "text": "optional"}, {"end": 790.52, "start": 789.12, "text": "apparently,"}, {"end": 791.0, "start": 790.52, "text": "finding"}, {"end": 791.64, "start": 791.0, "text": "something"}, {"end": 792.28, "start": 791.64, "text": "interesting"}, {"end": 792.52, "start": 792.28, "text": "in"}, {"end": 792.6, "start": 792.52, "text": "the"}, {"end": 792.92, "start": 792.6, "text": "data."}, {"end": 793.32, "start": 792.92, "text": "That's"}, {"end": 793.92, "start": 793.32, "text": "exactly"}, {"end": 794.28, "start": 793.92, "text": "how"}, {"end": 794.76, "start": 794.28, "text": "it's"}, {"end": 795.12, "start": 794.76, 
"text": "expressed."}, {"end": 795.16, "start": 795.12, "text": "Now"}, {"end": 795.2, "start": 795.16, "text": "I"}, {"end": 795.88, "start": 795.2, "text": "guess"}, {"end": 796.28, "start": 795.88, "text": "that"}, {"end": 796.64, "start": 796.28, "text": "can"}, {"end": 796.84, "start": 796.64, "text": "be"}, {"end": 797.48, "start": 796.84, "text": "very"}, {"end": 798.4, "start": 797.48, "text": "amusing"}, {"end": 798.96, "start": 798.4, "text": "as"}, {"end": 799.28, "start": 798.96, "text": "a"}, {"end": 800.64, "start": 799.28, "text": "contest,"}, {"end": 801.92, "start": 800.64, "text": "but"}, {"end": 802.76, "start": 801.92, "text": "wouldn't"}, {"end": 803.04, "start": 802.76, "text": "it"}, {"end": 803.24, "start": 803.04, "text": "be"}, {"end": 803.72, "start": 803.24, "text": "better"}, {"end": 803.96, "start": 803.72, "text": "to"}, {"end": 804.36, "start": 803.96, "text": "encourage"}, {"end": 805.0, "start": 804.36, "text": "students"}, {"end": 805.44, "start": 805.0, "text": "to"}, {"end": 805.6, "start": 805.44, "text": "take"}, {"end": 805.64, "start": 805.6, "text": "that"}, {"end": 806.12, "start": 805.64, "text": "same"}, {"end": 806.52, "start": 806.12, "text": "data"}, {"end": 806.52, "start": 806.52, "text": "and"}, {"end": 807.12, "start": 806.52, "text": "use"}, {"end": 807.48, "start": 807.12, "text": "it"}, {"end": 807.84, "start": 807.48, "text": "to"}, {"end": 808.52, "start": 807.84, "text": "solve"}, {"end": 808.56, "start": 808.52, "text": "a"}, {"end": 809.04, "start": 808.56, "text": "problem"}, {"end": 809.44, "start": 809.04, "text": "someone"}, {"end": 809.84, "start": 809.44, "text": "cares"}, {"end": 809.96, "start": 809.84, "text": "about?"}], "text": " of students are presented with a large data set. Their job is to spend a weekend, with sleep optional apparently, finding something interesting in the data. That's exactly how it's expressed. 
Now I guess that can be very amusing as a contest, but wouldn't it be better to encourage students to take that same data and use it to solve a problem someone cares about?"}, {"chunks": [{"end": 811.52, "start": 810.0, "text": "Okay,"}, {"end": 812.16, "start": 811.52, "text": "so"}, {"end": 813.56, "start": 812.16, "text": "I,"}, {"end": 813.8, "start": 813.56, "text": "from"}, {"end": 814.36, "start": 813.8, "text": "my"}, {"end": 815.04, "start": 814.36, "text": "own"}, {"end": 815.56, "start": 815.04, "text": "preference,"}, {"end": 815.72, "start": 815.56, "text": "I"}, {"end": 816.2, "start": 815.72, "text": "prefer"}, {"end": 816.92, "start": 816.2, "text": "the"}, {"end": 817.6, "start": 816.92, "text": "Kaggle"}, {"end": 817.84, "start": 817.6, "text": "approach,"}, {"end": 818.16, "start": 817.84, "text": "let's"}, {"end": 818.52, "start": 818.16, "text": "say,"}, {"end": 819.16, "start": 818.52, "text": "where"}, {"end": 819.88, "start": 819.16, "text": "people"}, {"end": 820.56, "start": 819.88, "text": "really"}, {"end": 821.28, "start": 820.56, "text": "want"}, {"end": 821.84, "start": 821.28, "text": "a"}, {"end": 822.48, "start": 821.84, "text": "solution"}, {"end": 822.48, "start": 822.48, "text": "to"}, {"end": 822.48, "start": 822.48, "text": "a"}, {"end": 823.0, "start": 822.48, "text": "problem,"}, {"end": 824.12, "start": 823.0, "text": "can"}, {"end": 824.36, "start": 824.12, "text": "post"}, {"end": 824.76, "start": 824.36, "text": "data"}, {"end": 825.44, "start": 824.76, "text": "sets,"}, {"end": 826.08, "start": 825.44, "text": "and"}, {"end": 826.64, "start": 826.08, "text": "people"}, {"end": 826.92, "start": 826.64, "text": "compete"}, {"end": 827.04, "start": 826.92, "text": "to"}, {"end": 827.44, "start": 827.04, "text": "solve"}, {"end": 827.48, "start": 827.44, "text": "the"}, {"end": 828.4, "start": 827.48, "text": "problem,"}, {"end": 828.68, "start": 828.4, "text": "backed"}, {"end": 829.2, "start": 828.68, "text": "off"}, 
{"end": 829.88, "start": 829.2, "text": "for"}, {"end": 830.32, "start": 829.88, "text": "cash"}, {"end": 833.12, "start": 830.32, "text": "prizes."}, {"end": 833.44, "start": 833.12, "text": "Anyway,"}, {"end": 834.8, "start": 833.44, "text": "a"}, {"end": 835.44, "start": 834.8, "text": "second"}, {"end": 835.88, "start": 835.44, "text": "problem"}, {"end": 836.6, "start": 835.88, "text": "that"}, {"end": 837.16, "start": 836.6, "text": "I"}, {"end": 837.56, "start": 837.16, "text": "have"}, {"end": 837.68, "start": 837.56, "text": "with"}, {"end": 837.68, "start": 837.68, "text": "the"}, {"end": 838.36, "start": 837.68, "text": "statistics"}, {"end": 838.36, "start": 838.36, "text": "approach"}, {"end": 838.72, "start": 838.36, "text": "to"}, {"end": 838.72, "start": 838.72, "text": "the"}, {"end": 839.44, "start": 838.72, "text": "world"}, {"end": 839.96, "start": 839.44, "text": "is"}], "text": " Okay, so I, from my own preference, I prefer the Kaggle approach, let's say, where people really want a solution to a problem, can post data sets, and people compete to solve the problem, backed off for cash prizes. 
Anyway, a second problem that I have with the statistics approach to the world is"}, {"chunks": [{"end": 840.08, "start": 840.0, "text": "but"}, {"end": 840.16, "start": 840.08, "text": "they"}, {"end": 840.28, "start": 840.16, "text": "seem"}, {"end": 840.4, "start": 840.28, "text": "to"}, {"end": 841.08, "start": 840.4, "text": "forget"}, {"end": 841.32, "start": 841.08, "text": "data"}, {"end": 842.0, "start": 841.32, "text": "science"}, {"end": 842.32, "start": 842.0, "text": "is"}, {"end": 842.68, "start": 842.32, "text": "largely,"}, {"end": 842.88, "start": 842.68, "text": "even"}, {"end": 843.08, "start": 842.88, "text": "if"}, {"end": 843.4, "start": 843.08, "text": "it's"}, {"end": 843.92, "start": 843.4, "text": "not"}, {"end": 844.2, "start": 843.92, "text": "completely"}, {"end": 844.64, "start": 844.2, "text": "so,"}, {"end": 844.88, "start": 844.64, "text": "it's"}, {"end": 845.2, "start": 844.88, "text": "largely"}, {"end": 845.4, "start": 845.2, "text": "an"}, {"end": 846.4, "start": 845.4, "text": "experimental"}, {"end": 846.88, "start": 846.4, "text": "science."}, {"end": 847.16, "start": 846.88, "text": "If"}, {"end": 847.68, "start": 847.16, "text": "you"}, {"end": 848.2, "start": 847.68, "text": "want"}, {"end": 849.36, "start": 848.2, "text": "to"}, {"end": 849.56, "start": 849.36, "text": "know"}, {"end": 849.92, "start": 849.56, "text": "if"}, {"end": 850.56, "start": 849.92, "text": "your"}, {"end": 852.0, "start": 850.56, "text": "idea"}, {"end": 852.6, "start": 852.0, "text": "solves"}, {"end": 852.64, "start": 852.6, "text": "a"}, {"end": 853.04, "start": 852.64, "text": "problem"}, {"end": 853.52, "start": 853.04, "text": "you're"}, {"end": 853.88, "start": 853.52, "text": "working"}, {"end": 854.32, "start": 853.88, "text": "on,"}, {"end": 854.72, "start": 854.32, "text": "then"}, {"end": 855.2, "start": 854.72, "text": "implement"}, {"end": 855.56, "start": 855.2, "text": "it,"}, {"end": 855.8, "start": 855.56, "text": "run"}, 
{"end": 856.12, "start": 855.8, "text": "it,"}, {"end": 857.2, "start": 856.12, "text": "and"}, {"end": 859.56, "start": 857.2, "text": "see."}, {"end": 859.92, "start": 859.56, "text": "Again,"}, {"end": 860.28, "start": 859.92, "text": "in"}, {"end": 860.72, "start": 860.28, "text": "most"}, {"end": 861.4, "start": 860.72, "text": "situations,"}, {"end": 861.92, "start": 861.4, "text": "maybe"}, {"end": 862.32, "start": 861.92, "text": "not"}, {"end": 863.08, "start": 862.32, "text": "all,"}, {"end": 863.52, "start": 863.08, "text": "but"}, {"end": 864.16, "start": 863.52, "text": "mostly"}, {"end": 864.36, "start": 864.16, "text": "it"}, {"end": 864.36, "start": 864.36, "text": "is"}, {"end": 864.72, "start": 864.36, "text": "better"}, {"end": 864.84, "start": 864.72, "text": "to"}, {"end": 864.84, "start": 864.84, "text": "do"}, {"end": 865.2, "start": 864.84, "text": "your"}, {"end": 866.6, "start": 865.2, "text": "best,"}, {"end": 867.32, "start": 866.6, "text": "measure"}, {"end": 867.56, "start": 867.32, "text": "and"}, {"end": 868.16, "start": 867.56, "text": "try"}, {"end": 869.08, "start": 868.16, "text": "to"}, {"end": 869.96, "start": 869.08, "text": "improve,"}], "text": " but they seem to forget data science is largely, even if it's not completely so, it's largely an experimental science. If you want to know if your idea solves a problem you're working on, then implement it, run it, and see. 
Again, in most situations, maybe not all, but mostly it is better to do your best, measure and try to improve,"}, {"chunks": [{"end": 870.88, "start": 870.0, "text": "analyzing"}, {"end": 871.08, "start": 870.88, "text": "your"}, {"end": 872.04, "start": 871.08, "text": "performance"}, {"end": 876.96, "start": 872.04, "text": "theoretically."}, {"end": 878.0, "start": 876.96, "text": "Okay."}, {"end": 878.68, "start": 878.0, "text": "Now,"}, {"end": 878.96, "start": 878.68, "text": "okay,"}, {"end": 879.04, "start": 878.96, "text": "I"}, {"end": 879.36, "start": 879.04, "text": "want"}, {"end": 879.4, "start": 879.36, "text": "to"}, {"end": 879.44, "start": 879.4, "text": "give"}, {"end": 879.64, "start": 879.44, "text": "you"}, {"end": 880.04, "start": 879.64, "text": "an"}, {"end": 881.36, "start": 880.04, "text": "example"}, {"end": 883.88, "start": 881.36, "text": "of"}, {"end": 884.52, "start": 883.88, "text": "this"}, {"end": 884.84, "start": 884.52, "text": "point."}, {"end": 885.52, "start": 884.84, "text": "Okay."}, {"end": 885.96, "start": 885.52, "text": "Let's"}, {"end": 886.68, "start": 885.96, "text": "consider"}, {"end": 887.08, "start": 886.68, "text": "the"}, {"end": 888.2, "start": 887.08, "text": "detection"}, {"end": 888.4, "start": 888.2, "text": "of"}, {"end": 888.92, "start": 888.4, "text": "phishing"}, {"end": 889.68, "start": 888.92, "text": "emails"}, {"end": 889.96, "start": 889.68, "text": "or"}, {"end": 890.4, "start": 889.96, "text": "spam"}, {"end": 890.88, "start": 890.4, "text": "emails"}, {"end": 890.96, "start": 890.88, "text": "in"}, {"end": 892.0, "start": 890.96, "text": "general."}, {"end": 892.64, "start": 892.0, "text": "Okay."}, {"end": 893.72, "start": 892.64, "text": "Now,"}, {"end": 894.12, "start": 893.72, "text": "I"}, {"end": 894.36, "start": 894.12, "text": "think"}, {"end": 894.4, "start": 894.36, "text": "Google"}, {"end": 895.12, "start": 894.4, "text": "and"}, {"end": 895.48, "start": 895.12, "text": "other"}, 
{"end": 895.72, "start": 895.48, "text": "email"}, {"end": 895.88, "start": 895.72, "text": "providers"}, {"end": 896.0, "start": 895.88, "text": "are"}, {"end": 896.76, "start": 896.0, "text": "pretty"}, {"end": 896.92, "start": 896.76, "text": "good"}, {"end": 897.2, "start": 896.92, "text": "at"}, {"end": 897.28, "start": 897.2, "text": "their"}, {"end": 897.32, "start": 897.28, "text": "job,"}, {"end": 897.88, "start": 897.32, "text": "although"}, {"end": 898.12, "start": 897.88, "text": "I"}, {"end": 898.32, "start": 898.12, "text": "can"}, {"end": 898.76, "start": 898.32, "text": "attest"}, {"end": 898.88, "start": 898.76, "text": "that"}, {"end": 899.44, "start": 898.88, "text": "they"}, {"end": 899.6, "start": 899.44, "text": "are"}, {"end": 899.96, "start": 899.6, "text": "not"}], "text": " analyzing your performance theoretically. Okay. Now, okay, I want to give you an example of this point. Okay. Let's consider the detection of phishing emails or spam emails in general. Okay. 
Now, I think Google and other email providers are pretty good at their job, although I can attest that they are not"}, {"chunks": [{"end": 904.52, "start": 900.0, "text": "Perfect."}, {"end": 905.4, "start": 904.52, "text": "Okay."}, {"end": 905.92, "start": 905.4, "text": "Well,"}, {"end": 906.2, "start": 905.92, "text": "so"}, {"end": 906.44, "start": 906.2, "text": "how"}, {"end": 906.8, "start": 906.44, "text": "good"}, {"end": 907.36, "start": 906.8, "text": "is,"}, {"end": 907.64, "start": 907.36, "text": "say,"}, {"end": 907.68, "start": 907.64, "text": "Google?"}, {"end": 909.4, "start": 907.68, "text": "Well,"}, {"end": 910.08, "start": 909.4, "text": "the"}, {"end": 911.0, "start": 910.08, "text": "answer"}, {"end": 911.12, "start": 911.0, "text": "is"}, {"end": 911.44, "start": 911.12, "text": "we"}, {"end": 911.76, "start": 911.44, "text": "don't"}, {"end": 912.24, "start": 911.76, "text": "know."}, {"end": 912.88, "start": 912.24, "text": "Okay."}, {"end": 913.2, "start": 912.88, "text": "And"}, {"end": 915.04, "start": 913.2, "text": "even"}, {"end": 915.52, "start": 915.04, "text": "if"}, {"end": 915.92, "start": 915.52, "text": "we"}, {"end": 916.44, "start": 915.92, "text": "had"}, {"end": 917.04, "start": 916.44, "text": "exact"}, {"end": 917.48, "start": 917.04, "text": "probabilities"}, {"end": 917.92, "start": 917.48, "text": "of"}, {"end": 918.24, "start": 917.92, "text": "false"}, {"end": 919.08, "start": 918.24, "text": "positives"}, {"end": 919.16, "start": 919.08, "text": "or"}, {"end": 919.48, "start": 919.16, "text": "false"}, {"end": 919.96, "start": 919.48, "text": "negatives"}, {"end": 920.64, "start": 919.96, "text": "today,"}, {"end": 921.2, "start": 920.64, "text": "it"}, {"end": 921.64, "start": 921.2, "text": "wouldn't"}, {"end": 922.0, "start": 921.64, "text": "be"}, {"end": 922.04, "start": 922.0, "text": "that"}, {"end": 922.44, "start": 922.04, "text": "useful"}, {"end": 923.08, "start": 922.44, "text": "tomorrow"}, {"end": 
923.88, "start": 923.08, "text": "because"}, {"end": 924.48, "start": 923.88, "text": "spammers"}, {"end": 924.52, "start": 924.48, "text": "are"}, {"end": 924.6, "start": 924.52, "text": "always"}, {"end": 924.68, "start": 924.6, "text": "coming"}, {"end": 924.88, "start": 924.68, "text": "up"}, {"end": 925.08, "start": 924.88, "text": "with"}, {"end": 925.28, "start": 925.08, "text": "new"}, {"end": 927.48, "start": 925.28, "text": "tricks."}, {"end": 928.52, "start": 927.48, "text": "Okay."}, {"end": 928.76, "start": 928.52, "text": "So"}, {"end": 929.04, "start": 928.76, "text": "if"}, {"end": 929.4, "start": 929.04, "text": "you"}, {"end": 929.96, "start": 929.4, "text": "regard"}], "text": " Perfect. Okay. Well, so how good is, say, Google? Well, the answer is we don't know. Okay. And even if we had exact probabilities of false positives or false negatives today, it wouldn't be that useful tomorrow because spammers are always coming up with new tricks. Okay. So if you regard"}, {"chunks": [{"end": 930.24, "start": 930.0, "text": "the"}, {"end": 931.12, "start": 930.24, "text": "development"}, {"end": 931.44, "start": 931.12, "text": "of"}, {"end": 931.64, "start": 931.44, "text": "spam"}, {"end": 932.2, "start": 931.64, "text": "detectors"}, {"end": 932.2, "start": 932.2, "text": "then"}, {"end": 932.32, "start": 932.2, "text": "as"}, {"end": 932.8, "start": 932.32, "text": "belonging"}, {"end": 932.96, "start": 932.8, "text": "to"}, {"end": 933.84, "start": 932.96, "text": "Conway's"}, {"end": 934.32, "start": 933.84, "text": "danger"}, {"end": 934.8, "start": 934.32, "text": "zone"}, {"end": 935.16, "start": 934.8, "text": "because"}, {"end": 935.24, "start": 935.16, "text": "it"}, {"end": 936.0, "start": 935.24, "text": "consists"}, {"end": 936.36, "start": 936.0, "text": "of"}, {"end": 937.76, "start": 936.36, "text": "hacking"}, {"end": 938.64, "start": 937.76, "text": "applied"}, {"end": 939.04, "start": 938.64, "text": "to"}, {"end": 940.04, "start": 
939.04, "text": "a"}, {"end": 940.72, "start": 940.04, "text": "domain,"}, {"end": 941.0, "start": 940.72, "text": "a"}, {"end": 941.28, "start": 941.0, "text": "spam"}, {"end": 941.64, "start": 941.28, "text": "detection"}, {"end": 941.84, "start": 941.64, "text": "in"}, {"end": 942.24, "start": 941.84, "text": "this"}, {"end": 942.84, "start": 942.24, "text": "case,"}, {"end": 943.4, "start": 942.84, "text": "without"}, {"end": 943.44, "start": 943.4, "text": "the"}, {"end": 943.96, "start": 943.44, "text": "guidance"}, {"end": 943.96, "start": 943.96, "text": "of"}, {"end": 944.08, "start": 943.96, "text": "a"}, {"end": 945.2, "start": 944.08, "text": "statistician"}, {"end": 945.48, "start": 945.2, "text": "or"}, {"end": 945.56, "start": 945.48, "text": "a"}, {"end": 946.4, "start": 945.56, "text": "mathematician."}, {"end": 946.88, "start": 946.4, "text": "Think"}, {"end": 947.08, "start": 946.88, "text": "of"}, {"end": 947.16, "start": 947.08, "text": "what"}, {"end": 947.36, "start": 947.16, "text": "would"}, {"end": 947.44, "start": 947.36, "text": "be"}, {"end": 947.84, "start": 947.44, "text": "lost"}, {"end": 948.12, "start": 947.84, "text": "by"}, {"end": 948.68, "start": 948.12, "text": "throwing"}, {"end": 949.0, "start": 948.68, "text": "the"}, {"end": 949.84, "start": 949.0, "text": "software"}, {"end": 951.28, "start": 949.84, "text": "away."}, {"end": 952.04, "start": 951.28, "text": "Okay,"}, {"end": 952.24, "start": 952.04, "text": "people"}, {"end": 952.56, "start": 952.24, "text": "would"}, {"end": 952.76, "start": 952.56, "text": "be"}, {"end": 953.16, "start": 952.76, "text": "falling"}, {"end": 953.76, "start": 953.16, "text": "into"}, {"end": 953.92, "start": 953.76, "text": "spam"}, {"end": 954.44, "start": 953.92, "text": "traps"}, {"end": 956.04, "start": 954.44, "text": "all"}, {"end": 956.52, "start": 956.04, "text": "over"}, {"end": 959.2, "start": 956.52, "text": "the"}, {"end": 959.96, "start": 959.2, "text": "world."}], "text": " 
the development of spam detectors then as belonging to Conway's danger zone because it consists of hacking applied to a domain, a spam detection in this case, without the guidance of a statistician or a mathematician. Think of what would be lost by throwing the software away. Okay, people would be falling into spam traps all over the world."}, {"chunks": [{"end": 960.28, "start": 960.0, "text": "You"}, {"end": 960.8, "start": 960.28, "text": "know,"}, {"end": 961.92, "start": 960.8, "text": "incidentally,"}, {"end": 962.2, "start": 961.92, "text": "I"}, {"end": 962.4, "start": 962.2, "text": "assume"}, {"end": 962.4, "start": 962.4, "text": "that"}, {"end": 962.44, "start": 962.4, "text": "if"}, {"end": 962.88, "start": 962.44, "text": "you"}, {"end": 963.12, "start": 962.88, "text": "look"}, {"end": 963.52, "start": 963.12, "text": "beneath"}, {"end": 964.08, "start": 963.52, "text": "the"}, {"end": 964.8, "start": 964.08, "text": "hood"}, {"end": 965.24, "start": 964.8, "text": "of"}, {"end": 965.56, "start": 965.24, "text": "the"}, {"end": 965.76, "start": 965.56, "text": "very"}, {"end": 966.48, "start": 965.76, "text": "sophisticated"}, {"end": 966.84, "start": 966.48, "text": "spam"}, {"end": 967.2, "start": 966.84, "text": "detection"}, {"end": 968.2, "start": 967.2, "text": "software,"}, {"end": 968.6, "start": 968.2, "text": "you'll"}, {"end": 969.48, "start": 968.6, "text": "find"}, {"end": 969.8, "start": 969.48, "text": "many"}, {"end": 970.44, "start": 969.8, "text": "sophisticated"}, {"end": 970.84, "start": 970.44, "text": "mathematical"}, {"end": 971.28, "start": 970.84, "text": "and"}, {"end": 972.28, "start": 971.28, "text": "statistical"}, {"end": 972.8, "start": 972.28, "text": "ideas."}, {"end": 972.88, "start": 972.8, "text": "But"}, {"end": 973.12, "start": 972.88, "text": "the"}, {"end": 973.6, "start": 973.12, "text": "value"}, {"end": 974.04, "start": 973.6, "text": "of"}, {"end": 974.4, "start": 974.04, "text": "these"}, {"end": 975.2, 
"start": 974.4, "text": "ideas"}, {"end": 975.4, "start": 975.2, "text": "is"}, {"end": 975.84, "start": 975.4, "text": "only"}, {"end": 976.68, "start": 975.84, "text": "realized"}, {"end": 976.96, "start": 976.68, "text": "through"}, {"end": 977.16, "start": 976.96, "text": "the"}, {"end": 978.64, "start": 977.16, "text": "implementation,"}, {"end": 979.04, "start": 978.64, "text": "not"}, {"end": 980.16, "start": 979.04, "text": "directly."}, {"end": 985.12, "start": 980.16, "text": "Okay."}, {"end": 985.52, "start": 985.12, "text": "And"}, {"end": 986.2, "start": 985.52, "text": "spam"}, {"end": 986.84, "start": 986.2, "text": "detection"}, {"end": 987.08, "start": 986.84, "text": "is"}, {"end": 987.44, "start": 987.08, "text": "far"}, {"end": 987.76, "start": 987.44, "text": "from"}, {"end": 988.28, "start": 987.76, "text": "unique"}, {"end": 988.4, "start": 988.28, "text": "as"}, {"end": 988.4, "start": 988.4, "text": "an"}, {"end": 989.04, "start": 988.4, "text": "example"}, {"end": 989.12, "start": 989.04, "text": "of"}, {"end": 989.56, "start": 989.12, "text": "where"}, {"end": 989.56, "start": 989.56, "text": "it"}, {"end": 989.56, "start": 989.56, "text": "is"}, {"end": 989.96, "start": 989.56, "text": "more"}], "text": " You know, incidentally, I assume that if you look beneath the hood of the very sophisticated spam detection software, you'll find many sophisticated mathematical and statistical ideas. But the value of these ideas is only realized through the implementation, not directly. Okay. 
And spam detection is far from unique as an example of where it is more"}, {"chunks": [{"end": 990.32, "start": 990.0, "text": "important"}, {"end": 990.4, "start": 990.32, "text": "to"}, {"end": 990.4, "start": 990.4, "text": "do"}, {"end": 990.76, "start": 990.4, "text": "the"}, {"end": 991.24, "start": 990.76, "text": "best"}, {"end": 991.64, "start": 991.24, "text": "you"}, {"end": 992.28, "start": 991.64, "text": "can"}, {"end": 992.68, "start": 992.28, "text": "rather"}, {"end": 993.4, "start": 992.68, "text": "than"}, {"end": 994.12, "start": 993.4, "text": "analyzing"}, {"end": 994.56, "start": 994.12, "text": "how"}, {"end": 994.76, "start": 994.56, "text": "well"}, {"end": 995.76, "start": 994.76, "text": "you're"}, {"end": 996.36, "start": 995.76, "text": "doing."}, {"end": 996.6, "start": 996.36, "text": "For"}, {"end": 997.36, "start": 996.6, "text": "example,"}, {"end": 997.6, "start": 997.36, "text": "we're"}, {"end": 997.96, "start": 997.6, "text": "not"}, {"end": 998.36, "start": 997.96, "text": "perfect"}, {"end": 998.84, "start": 998.36, "text": "at"}, {"end": 999.56, "start": 998.84, "text": "understanding"}, {"end": 999.64, "start": 999.56, "text": "the"}, {"end": 1000.4, "start": 999.64, "text": "relationship"}, {"end": 1000.48, "start": 1000.4, "text": "between"}, {"end": 1000.92, "start": 1000.48, "text": "your"}, {"end": 1001.48, "start": 1000.92, "text": "genome"}, {"end": 1001.68, "start": 1001.48, "text": "and"}, {"end": 1001.72, "start": 1001.68, "text": "the"}, {"end": 1002.12, "start": 1001.72, "text": "proper"}, {"end": 1003.0, "start": 1002.12, "text": "medical"}, {"end": 1003.76, "start": 1003.0, "text": "treatment,"}, {"end": 1004.04, "start": 1003.76, "text": "but"}, {"end": 1004.12, "start": 1004.04, "text": "it"}, {"end": 1004.32, "start": 1004.12, "text": "is"}, {"end": 1004.64, "start": 1004.32, "text": "far"}, {"end": 1004.64, "start": 1004.64, "text": "better"}, {"end": 1005.0, "start": 1004.64, "text": "to"}, {"end": 
1005.32, "start": 1005.0, "text": "use"}, {"end": 1006.08, "start": 1005.32, "text": "what"}, {"end": 1006.56, "start": 1006.08, "text": "knowledge"}, {"end": 1007.6, "start": 1006.56, "text": "we"}, {"end": 1007.64, "start": 1007.6, "text": "have"}, {"end": 1008.12, "start": 1007.64, "text": "managed"}, {"end": 1008.24, "start": 1008.12, "text": "to"}, {"end": 1008.64, "start": 1008.24, "text": "develop"}, {"end": 1008.84, "start": 1008.64, "text": "than"}, {"end": 1009.04, "start": 1008.84, "text": "to"}, {"end": 1009.72, "start": 1009.04, "text": "worry"}, {"end": 1010.36, "start": 1009.72, "text": "that"}, {"end": 1010.56, "start": 1010.36, "text": "our"}, {"end": 1011.08, "start": 1010.56, "text": "knowledge"}, {"end": 1011.68, "start": 1011.08, "text": "is"}, {"end": 1012.12, "start": 1011.68, "text": "not"}, {"end": 1012.8, "start": 1012.12, "text": "perfect"}, {"end": 1013.16, "start": 1012.8, "text": "or"}, {"end": 1013.6, "start": 1013.16, "text": "that"}, {"end": 1014.0, "start": 1013.6, "text": "we"}, {"end": 1014.36, "start": 1014.0, "text": "don't"}, {"end": 1014.48, "start": 1014.36, "text": "even"}, {"end": 1014.84, "start": 1014.48, "text": "know"}, {"end": 1015.56, "start": 1014.84, "text": "exact"}, {"end": 1017.48, "start": 1015.56, "text": "probabilities."}, {"end": 1017.96, "start": 1017.48, "text": "Similarly,"}, {"end": 1018.16, "start": 1017.96, "text": "we've"}, {"end": 1018.16, "start": 1018.16, "text": "made"}, {"end": 1018.44, "start": 1018.16, "text": "great"}, {"end": 1019.32, "start": 1018.44, "text": "progress"}, {"end": 1019.96, "start": 1019.32, "text": "in"}], "text": " important to do the best you can rather than analyzing how well you're doing. For example, we're not perfect at understanding the relationship between your genome and the proper medical treatment, but it is far better to use what knowledge we have managed to develop than to worry that our knowledge is not perfect or that we don't even know exact probabilities. 
Similarly, we've made great progress in"}, {"chunks": [{"end": 1020.56, "start": 1020.0, "text": "weather"}, {"end": 1021.12, "start": 1020.56, "text": "prediction,"}, {"end": 1021.52, "start": 1021.12, "text": "but"}, {"end": 1021.64, "start": 1021.52, "text": "we're"}, {"end": 1021.72, "start": 1021.64, "text": "not"}, {"end": 1022.16, "start": 1021.72, "text": "perfect."}, {"end": 1022.68, "start": 1022.16, "text": "I"}, {"end": 1022.92, "start": 1022.68, "text": "think"}, {"end": 1023.4, "start": 1022.92, "text": "it's"}, {"end": 1023.76, "start": 1023.4, "text": "still"}, {"end": 1024.16, "start": 1023.76, "text": "better"}, {"end": 1024.32, "start": 1024.16, "text": "to"}, {"end": 1025.0, "start": 1024.32, "text": "warn"}, {"end": 1025.36, "start": 1025.0, "text": "people"}, {"end": 1025.36, "start": 1025.36, "text": "that"}, {"end": 1025.36, "start": 1025.36, "text": "they're"}, {"end": 1025.44, "start": 1025.36, "text": "at"}, {"end": 1025.88, "start": 1025.44, "text": "risk"}, {"end": 1026.44, "start": 1025.88, "text": "of"}, {"end": 1026.64, "start": 1026.44, "text": "a"}, {"end": 1027.64, "start": 1026.64, "text": "dangerous"}, {"end": 1028.12, "start": 1027.64, "text": "storm,"}, {"end": 1028.6, "start": 1028.12, "text": "even"}, {"end": 1028.6, "start": 1028.6, "text": "if"}, {"end": 1028.6, "start": 1028.6, "text": "we"}, {"end": 1028.92, "start": 1028.6, "text": "can't"}, {"end": 1029.44, "start": 1028.92, "text": "give"}, {"end": 1029.8, "start": 1029.44, "text": "them"}, {"end": 1029.88, "start": 1029.8, "text": "the"}, {"end": 1030.24, "start": 1029.88, "text": "exact"}, {"end": 1031.2, "start": 1030.24, "text": "probability"}, {"end": 1031.44, "start": 1031.2, "text": "of"}, {"end": 1031.48, "start": 1031.44, "text": "the"}, {"end": 1032.44, "start": 1031.48, "text": "various"}, {"end": 1033.2, "start": 1032.44, "text": "outcomes."}, {"end": 1033.36, "start": 1033.2, "text": "Or"}, {"end": 1033.4, "start": 1033.36, "text": "even"}, {"end": 
1033.56, "start": 1033.4, "text": "selecting"}, {"end": 1033.56, "start": 1033.56, "text": "ads"}, {"end": 1034.16, "start": 1033.56, "text": "to"}, {"end": 1034.24, "start": 1034.16, "text": "show."}, {"end": 1034.24, "start": 1034.24, "text": "We"}, {"end": 1034.24, "start": 1034.24, "text": "don't"}, {"end": 1034.76, "start": 1034.24, "text": "have"}, {"end": 1034.8, "start": 1034.76, "text": "a"}, {"end": 1035.08, "start": 1034.8, "text": "great"}, {"end": 1035.6, "start": 1035.08, "text": "theory"}, {"end": 1036.16, "start": 1035.6, "text": "of"}, {"end": 1037.08, "start": 1036.16, "text": "what"}, {"end": 1037.92, "start": 1037.08, "text": "effect"}, {"end": 1038.44, "start": 1037.92, "text": "an"}, {"end": 1039.48, "start": 1038.44, "text": "ad"}, {"end": 1039.92, "start": 1039.48, "text": "will"}, {"end": 1040.28, "start": 1039.92, "text": "have."}, {"end": 1040.52, "start": 1040.28, "text": "So"}, {"end": 1040.92, "start": 1040.52, "text": "we"}, {"end": 1042.04, "start": 1040.92, "text": "experiment."}, {"end": 1042.44, "start": 1042.04, "text": "We"}, {"end": 1042.84, "start": 1042.44, "text": "show"}, {"end": 1043.32, "start": 1042.84, "text": "the"}, {"end": 1043.44, "start": 1043.32, "text": "ad"}, {"end": 1043.52, "start": 1043.44, "text": "and"}, {"end": 1043.56, "start": 1043.52, "text": "we"}, {"end": 1044.52, "start": 1043.56, "text": "estimate"}, {"end": 1044.96, "start": 1044.52, "text": "the"}, {"end": 1045.76, "start": 1044.96, "text": "click-through"}, {"end": 1046.32, "start": 1045.76, "text": "rate"}, {"end": 1046.4, "start": 1046.32, "text": "by"}, {"end": 1046.88, "start": 1046.4, "text": "the"}, {"end": 1047.64, "start": 1046.88, "text": "results"}, {"end": 1047.92, "start": 1047.64, "text": "of"}, {"end": 1048.68, "start": 1047.92, "text": "our"}, {"end": 1049.96, "start": 1048.68, "text": "experiments."}], "text": " weather prediction, but we're not perfect. 
I think it's still better to warn people that they're at risk of a dangerous storm, even if we can't give them the exact probability of the various outcomes. Or even selecting ads to show. We don't have a great theory of what effect an ad will have. So we experiment. We show the ad and we estimate the click-through rate by the results of our experiments."}, {"chunks": [{"end": 1051.4, "start": 1050.0, "text": "Now,"}, {"end": 1055.48, "start": 1051.4, "text": "let's"}, {"end": 1057.24, "start": 1055.48, "text": "look"}, {"end": 1057.76, "start": 1057.24, "text": "at"}, {"end": 1058.44, "start": 1057.76, "text": "some"}, {"end": 1058.8, "start": 1058.44, "text": "of"}, {"end": 1059.28, "start": 1058.8, "text": "the"}, {"end": 1059.72, "start": 1059.28, "text": "problems"}, {"end": 1059.84, "start": 1059.72, "text": "with"}, {"end": 1060.84, "start": 1059.84, "text": "machine"}, {"end": 1061.44, "start": 1060.84, "text": "learning."}, {"end": 1061.56, "start": 1061.44, "text": "But"}, {"end": 1062.04, "start": 1061.56, "text": "first,"}, {"end": 1062.48, "start": 1062.04, "text": "I"}, {"end": 1062.96, "start": 1062.48, "text": "want"}, {"end": 1063.28, "start": 1062.96, "text": "to"}, {"end": 1064.16, "start": 1063.28, "text": "concede"}, {"end": 1065.44, "start": 1064.16, "text": "two"}, {"end": 1066.84, "start": 1065.44, "text": "important"}, {"end": 1067.32, "start": 1066.84, "text": "points."}, {"end": 1067.96, "start": 1067.32, "text": "First,"}, {"end": 1068.56, "start": 1067.96, "text": "it"}, {"end": 1069.24, "start": 1068.56, "text": "is"}, {"end": 1069.8, "start": 1069.24, "text": "frequently"}, {"end": 1070.44, "start": 1069.8, "text": "possible"}, {"end": 1070.6, "start": 1070.44, "text": "to"}, {"end": 1071.16, "start": 1070.6, "text": "express"}, {"end": 1071.16, "start": 1071.16, "text": "a"}, {"end": 1071.8, "start": 1071.16, "text": "problem"}, {"end": 1072.2, "start": 1071.8, "text": "involving"}, {"end": 1072.64, "start": 1072.2, "text": "large"}, 
{"end": 1073.08, "start": 1072.64, "text": "scale"}, {"end": 1073.96, "start": 1073.08, "text": "data"}, {"end": 1074.52, "start": 1073.96, "text": "as"}, {"end": 1075.04, "start": 1074.52, "text": "one"}, {"end": 1075.44, "start": 1075.04, "text": "of"}, {"end": 1075.76, "start": 1075.44, "text": "building"}, {"end": 1076.04, "start": 1075.76, "text": "a"}, {"end": 1077.64, "start": 1076.04, "text": "model"}, {"end": 1077.76, "start": 1077.64, "text": "for"}, {"end": 1078.08, "start": 1077.76, "text": "something."}, {"end": 1078.52, "start": 1078.08, "text": "Machine"}, {"end": 1079.16, "start": 1078.52, "text": "learning"}, {"end": 1079.56, "start": 1079.16, "text": "is"}, {"end": 1079.96, "start": 1079.56, "text": "great at"}], "text": " Now, let's look at some of the problems with machine learning. But first, I want to concede two important points. First, it is frequently possible to express a problem involving large scale data as one of building a model for something. Machine learning is great at"}, {"chunks": [{"end": 1080.56, "start": 1080.0, "text": "building"}, {"end": 1080.84, "start": 1080.56, "text": "models,"}, {"end": 1081.16, "start": 1080.84, "text": "and"}, {"end": 1081.28, "start": 1081.16, "text": "in"}, {"end": 1081.48, "start": 1081.28, "text": "particular,"}, {"end": 1081.48, "start": 1081.48, "text": "you"}, {"end": 1081.48, "start": 1081.48, "text": "often"}, {"end": 1081.56, "start": 1081.48, "text": "get"}, {"end": 1081.96, "start": 1081.56, "text": "a"}, {"end": 1082.24, "start": 1081.96, "text": "model"}, {"end": 1082.68, "start": 1082.24, "text": "that"}, {"end": 1082.92, "start": 1082.68, "text": "is"}, {"end": 1083.6, "start": 1082.92, "text": "more"}, {"end": 1084.32, "start": 1083.6, "text": "accurate"}, {"end": 1084.76, "start": 1084.32, "text": "than"}, {"end": 1085.12, "start": 1084.76, "text": "what"}, {"end": 1085.56, "start": 1085.12, "text": "you"}, {"end": 1086.08, "start": 1085.56, "text": "get"}, {"end": 1086.32, "start": 
1086.08, "text": "from"}, {"end": 1087.2, "start": 1086.32, "text": "previously"}, {"end": 1091.6, "start": 1087.2, "text": "known"}, {"end": 1092.44, "start": 1091.6, "text": "approaches."}, {"end": 1092.92, "start": 1092.44, "text": "Okay,"}, {"end": 1093.76, "start": 1092.92, "text": "now,"}, {"end": 1094.48, "start": 1093.76, "text": "what"}, {"end": 1095.24, "start": 1094.48, "text": "is"}, {"end": 1096.16, "start": 1095.24, "text": "this"}, {"end": 1096.48, "start": 1096.16, "text": "thing?"}, {"end": 1096.8, "start": 1096.48, "text": "Okay,"}, {"end": 1097.2, "start": 1096.8, "text": "you"}, {"end": 1097.68, "start": 1097.2, "text": "may"}, {"end": 1098.04, "start": 1097.68, "text": "have"}, {"end": 1098.44, "start": 1098.04, "text": "heard"}, {"end": 1098.6, "start": 1098.44, "text": "of"}, {"end": 1099.12, "start": 1098.6, "text": "what's"}, {"end": 1099.44, "start": 1099.12, "text": "called"}, {"end": 1099.44, "start": 1099.44, "text": "the"}, {"end": 1100.88, "start": 1099.44, "text": "Gartner"}, {"end": 1101.0, "start": 1100.88, "text": "hype"}, {"end": 1101.24, "start": 1101.0, "text": "cycle."}, {"end": 1101.52, "start": 1101.24, "text": "Okay,"}, {"end": 1101.96, "start": 1101.52, "text": "the"}, {"end": 1102.68, "start": 1101.96, "text": "idea"}, {"end": 1103.04, "start": 1102.68, "text": "is"}, {"end": 1103.52, "start": 1103.04, "text": "that"}, {"end": 1104.0, "start": 1103.52, "text": "every"}, {"end": 1104.56, "start": 1104.0, "text": "interesting"}, {"end": 1105.48, "start": 1104.56, "text": "technological"}, {"end": 1105.52, "start": 1105.48, "text": "idea"}, {"end": 1106.28, "start": 1105.52, "text": "starts"}, {"end": 1106.56, "start": 1106.28, "text": "off"}, {"end": 1107.0, "start": 1106.56, "text": "by"}, {"end": 1107.36, "start": 1107.0, "text": "getting"}, {"end": 1107.56, "start": 1107.36, "text": "hyped"}, {"end": 1107.92, "start": 1107.56, "text": "to"}, {"end": 1108.12, "start": 1107.92, "text": "the"}, {"end": 1108.72, "start": 
1108.12, "text": "point"}, {"end": 1109.24, "start": 1108.72, "text": "that"}, {"end": 1109.96, "start": 1109.24, "text": "the"}], "text": " building models, and in particular, you often get a model that is more accurate than what you get from previously known approaches. Okay, now, what is this thing? Okay, you may have heard of what's called the Gartner hype cycle. Okay, the idea is that every interesting technological idea starts off by getting hyped to the point that the"}, {"chunks": [{"end": 1110.72, "start": 1110.0, "text": "Expectations"}, {"end": 1111.04, "start": 1110.72, "text": "for"}, {"end": 1111.68, "start": 1111.04, "text": "success"}, {"end": 1112.16, "start": 1111.68, "text": "greatly"}, {"end": 1112.56, "start": 1112.16, "text": "exceed"}, {"end": 1113.08, "start": 1112.56, "text": "reality."}, {"end": 1113.24, "start": 1113.08, "text": "Then"}, {"end": 1115.2, "start": 1113.24, "text": "people"}, {"end": 1115.68, "start": 1115.2, "text": "get"}, {"end": 1116.56, "start": 1115.68, "text": "disillusioned"}, {"end": 1117.04, "start": 1116.56, "text": "with"}, {"end": 1117.32, "start": 1117.04, "text": "the"}, {"end": 1117.72, "start": 1117.32, "text": "idea"}, {"end": 1118.04, "start": 1117.72, "text": "since"}, {"end": 1118.16, "start": 1118.04, "text": "it"}, {"end": 1118.68, "start": 1118.16, "text": "fails"}, {"end": 1119.48, "start": 1118.68, "text": "to"}, {"end": 1119.88, "start": 1119.48, "text": "live"}, {"end": 1119.88, "start": 1119.88, "text": "up"}, {"end": 1120.0, "start": 1119.88, "text": "to"}, {"end": 1120.16, "start": 1120.0, "text": "the"}, {"end": 1120.16, "start": 1120.16, "text": "hype."}, {"end": 1120.68, "start": 1120.16, "text": "That's"}, {"end": 1121.28, "start": 1120.68, "text": "this"}, {"end": 1121.76, "start": 1121.28, "text": "region"}, {"end": 1123.0, "start": 1121.76, "text": "here."}, {"end": 1123.48, "start": 1123.0, "text": "But"}, {"end": 1124.56, "start": 1123.48, "text": "at"}, {"end": 1124.72, "start": 
1124.56, "text": "that"}, {"end": 1124.92, "start": 1124.72, "text": "point,"}, {"end": 1125.48, "start": 1124.92, "text": "clearer"}, {"end": 1126.32, "start": 1125.48, "text": "heads"}, {"end": 1127.28, "start": 1126.32, "text": "look"}, {"end": 1128.0, "start": 1127.28, "text": "at"}, {"end": 1128.44, "start": 1128.0, "text": "what"}, {"end": 1129.36, "start": 1128.44, "text": "value"}, {"end": 1130.04, "start": 1129.36, "text": "there"}, {"end": 1130.32, "start": 1130.04, "text": "is"}, {"end": 1130.36, "start": 1130.32, "text": "in"}, {"end": 1130.64, "start": 1130.36, "text": "the"}, {"end": 1131.48, "start": 1130.64, "text": "idea,"}, {"end": 1131.92, "start": 1131.48, "text": "and"}, {"end": 1132.8, "start": 1131.92, "text": "eventually"}, {"end": 1133.08, "start": 1132.8, "text": "they"}, {"end": 1133.08, "start": 1133.08, "text": "learn"}, {"end": 1133.48, "start": 1133.08, "text": "to"}, {"end": 1134.32, "start": 1133.48, "text": "exploit"}, {"end": 1135.04, "start": 1134.32, "text": "whatever"}, {"end": 1135.48, "start": 1135.04, "text": "the"}, {"end": 1136.4, "start": 1135.48, "text": "idea"}, {"end": 1136.96, "start": 1136.4, "text": "is"}, {"end": 1137.48, "start": 1136.96, "text": "to"}, {"end": 1137.96, "start": 1137.48, "text": "the"}, {"end": 1138.48, "start": 1137.96, "text": "extent"}, {"end": 1139.12, "start": 1138.48, "text": "that"}, {"end": 1139.36, "start": 1139.12, "text": "is"}, {"end": 1139.88, "start": 1139.36, "text": "possible"}, {"end": 1139.96, "start": 1139.88, "text": "and"}], "text": " Expectations for success greatly exceed reality. Then people get disillusioned with the idea since it fails to live up to the hype. That's this region here. 
But at that point, clearer heads look at what value there is in the idea, and eventually they learn to exploit whatever the idea is to the extent that is possible and"}, {"chunks": [{"end": 1140.08, "start": 1140.0, "text": "Now,"}, {"end": 1140.44, "start": 1140.08, "text": "this"}, {"end": 1141.04, "start": 1140.44, "text": "chart,"}, {"end": 1141.44, "start": 1141.04, "text": "which"}, {"end": 1142.92, "start": 1141.44, "text": "comes"}, {"end": 1143.32, "start": 1142.92, "text": "from"}, {"end": 1144.64, "start": 1143.32, "text": "2015,"}, {"end": 1145.24, "start": 1144.64, "text": "that's"}, {"end": 1146.0, "start": 1145.24, "text": "a"}, {"end": 1146.48, "start": 1146.0, "text": "little"}, {"end": 1147.72, "start": 1146.48, "text": "bit"}, {"end": 1149.68, "start": 1147.72, "text": "old,"}, {"end": 1150.04, "start": 1149.68, "text": "shows"}, {"end": 1150.4, "start": 1150.04, "text": "machine"}, {"end": 1150.88, "start": 1150.4, "text": "learning"}, {"end": 1151.28, "start": 1150.88, "text": "having"}, {"end": 1151.64, "start": 1151.28, "text": "just"}, {"end": 1152.44, "start": 1151.64, "text": "passed"}, {"end": 1152.6, "start": 1152.44, "text": "the"}, {"end": 1152.92, "start": 1152.6, "text": "peak"}, {"end": 1153.12, "start": 1152.92, "text": "of"}, {"end": 1153.6, "start": 1153.12, "text": "hype."}, {"end": 1157.84, "start": 1153.6, "text": "So"}, {"end": 1158.44, "start": 1157.84, "text": "what"}, {"end": 1158.96, "start": 1158.44, "text": "happened"}, {"end": 1158.96, "start": 1158.96, "text": "in"}, {"end": 1161.96, "start": 1158.96, "text": "2016?"}, {"end": 1162.28, "start": 1161.96, "text": "Well,"}, {"end": 1162.4, "start": 1162.28, "text": "it"}, {"end": 1162.96, "start": 1162.4, "text": "appears"}, {"end": 1163.84, "start": 1162.96, "text": "that"}, {"end": 1164.32, "start": 1163.84, "text": "machine"}, {"end": 1164.56, "start": 1164.32, "text": "learning"}, {"end": 1165.0, "start": 1164.56, "text": "went"}, {"end": 1165.72, "start": 1165.0, 
"text": "backwards."}, {"end": 1165.84, "start": 1165.72, "text": "I'm"}, {"end": 1165.88, "start": 1165.84, "text": "not"}, {"end": 1166.4, "start": 1165.88, "text": "sure"}, {"end": 1166.6, "start": 1166.4, "text": "that"}, {"end": 1167.04, "start": 1166.6, "text": "that's"}, {"end": 1167.8, "start": 1167.04, "text": "actually"}, {"end": 1168.4, "start": 1167.8, "text": "even"}, {"end": 1168.88, "start": 1168.4, "text": "legal,"}, {"end": 1169.28, "start": 1168.88, "text": "but"}, {"end": 1169.56, "start": 1169.28, "text": "it"}, {"end": 1169.96, "start": 1169.56, "text": "did."}], "text": " Now, this chart, which comes from 2015, that's a little bit old, shows machine learning having just passed the peak of hype. So what happened in 2016? Well, it appears that machine learning went backwards. I'm not sure that that's actually even legal, but it did."}, {"chunks": [{"end": 1170.52, "start": 1170.0, "text": "and"}, {"end": 1171.04, "start": 1170.52, "text": "it"}, {"end": 1171.4, "start": 1171.04, "text": "was"}, {"end": 1172.68, "start": 1171.4, "text": "then"}, {"end": 1172.92, "start": 1172.68, "text": "deemed"}, {"end": 1173.08, "start": 1172.92, "text": "to"}, {"end": 1173.4, "start": 1173.08, "text": "be"}, {"end": 1173.68, "start": 1173.4, "text": "at"}, {"end": 1173.92, "start": 1173.68, "text": "the"}, {"end": 1174.64, "start": 1173.92, "text": "very"}, {"end": 1174.88, "start": 1174.64, "text": "peak"}, {"end": 1175.08, "start": 1174.88, "text": "of"}, {"end": 1177.72, "start": 1175.08, "text": "hype."}, {"end": 1178.88, "start": 1177.72, "text": "Okay,"}, {"end": 1179.4, "start": 1178.88, "text": "move"}, {"end": 1179.8, "start": 1179.4, "text": "on"}, {"end": 1180.08, "start": 1179.8, "text": "a"}, {"end": 1180.96, "start": 1180.08, "text": "year."}, {"end": 1180.96, "start": 1180.96, "text": "In"}, {"end": 1182.96, "start": 1180.96, "text": "2017,"}, {"end": 1183.52, "start": 1182.96, "text": "they"}, {"end": 1183.92, "start": 1183.52, "text": 
"split"}, {"end": 1184.24, "start": 1183.92, "text": "the"}, {"end": 1184.92, "start": 1184.24, "text": "field"}, {"end": 1185.56, "start": 1184.92, "text": "into"}, {"end": 1186.0, "start": 1185.56, "text": "machine"}, {"end": 1186.52, "start": 1186.0, "text": "learning"}, {"end": 1186.64, "start": 1186.52, "text": "and"}, {"end": 1186.92, "start": 1186.64, "text": "deep"}, {"end": 1187.12, "start": 1186.92, "text": "learning,"}, {"end": 1187.28, "start": 1187.12, "text": "which"}, {"end": 1187.44, "start": 1187.28, "text": "I"}, {"end": 1187.72, "start": 1187.44, "text": "hope"}, {"end": 1187.88, "start": 1187.72, "text": "you"}, {"end": 1188.12, "start": 1187.88, "text": "can"}, {"end": 1188.4, "start": 1188.12, "text": "see"}, {"end": 1188.96, "start": 1188.4, "text": "there"}, {"end": 1189.36, "start": 1188.96, "text": "at"}, {"end": 1189.68, "start": 1189.36, "text": "the"}, {"end": 1190.04, "start": 1189.68, "text": "top."}, {"end": 1190.52, "start": 1190.04, "text": "Again,"}, {"end": 1190.8, "start": 1190.52, "text": "they're"}, {"end": 1191.2, "start": 1190.8, "text": "both"}, {"end": 1191.56, "start": 1191.2, "text": "at"}, {"end": 1191.8, "start": 1191.56, "text": "the"}, {"end": 1192.32, "start": 1191.8, "text": "very"}, {"end": 1192.64, "start": 1192.32, "text": "top"}, {"end": 1192.72, "start": 1192.64, "text": "of"}, {"end": 1192.8, "start": 1192.72, "text": "the"}, {"end": 1193.12, "start": 1192.8, "text": "hype"}, {"end": 1197.2, "start": 1193.12, "text": "cycle."}, {"end": 1197.8, "start": 1197.2, "text": "Okay,"}, {"end": 1199.96, "start": 1197.8, "text": "2018,"}], "text": " and it was then deemed to be at the very peak of hype. Okay, move on a year. In 2017, they split the field into machine learning and deep learning, which I hope you can see there at the top. Again, they're both at the very top of the hype cycle. 
Okay, 2018,"}, {"chunks": [{"end": 1200.48, "start": 1200.0, "text": "again,"}, {"end": 1200.64, "start": 1200.48, "text": "looks"}, {"end": 1200.64, "start": 1200.64, "text": "pretty"}, {"end": 1200.68, "start": 1200.64, "text": "much"}, {"end": 1201.12, "start": 1200.68, "text": "the"}, {"end": 1201.4, "start": 1201.12, "text": "same"}, {"end": 1201.56, "start": 1201.4, "text": "thing,"}, {"end": 1201.8, "start": 1201.56, "text": "except"}, {"end": 1202.0, "start": 1201.8, "text": "you"}, {"end": 1202.68, "start": 1202.0, "text": "have"}, {"end": 1202.84, "start": 1202.68, "text": "only"}, {"end": 1203.48, "start": 1202.84, "text": "deep"}, {"end": 1203.96, "start": 1203.48, "text": "learning"}, {"end": 1204.76, "start": 1203.96, "text": "as"}, {"end": 1205.28, "start": 1204.76, "text": "a"}, {"end": 1206.12, "start": 1205.28, "text": "representation"}, {"end": 1206.2, "start": 1206.12, "text": "of"}, {"end": 1206.52, "start": 1206.2, "text": "machine"}, {"end": 1207.04, "start": 1206.52, "text": "learning,"}, {"end": 1207.16, "start": 1207.04, "text": "but"}, {"end": 1207.4, "start": 1207.16, "text": "again,"}, {"end": 1207.96, "start": 1207.4, "text": "it's"}, {"end": 1208.08, "start": 1207.96, "text": "at"}, {"end": 1208.4, "start": 1208.08, "text": "the"}, {"end": 1209.04, "start": 1208.4, "text": "maximum"}, {"end": 1209.52, "start": 1209.04, "text": "level"}, {"end": 1209.68, "start": 1209.52, "text": "of"}, {"end": 1210.84, "start": 1209.68, "text": "hype."}, {"end": 1211.04, "start": 1210.84, "text": "Now,"}, {"end": 1211.28, "start": 1211.04, "text": "I"}, {"end": 1211.44, "start": 1211.28, "text": "can't"}, {"end": 1211.8, "start": 1211.44, "text": "show"}, {"end": 1212.16, "start": 1211.8, "text": "you"}, {"end": 1212.4, "start": 1212.16, "text": "any"}, {"end": 1212.84, "start": 1212.4, "text": "more"}, {"end": 1212.88, "start": 1212.84, "text": "of"}, {"end": 1213.28, "start": 1212.88, "text": "this"}, {"end": 1213.64, "start": 1213.28, "text": 
"story"}, {"end": 1213.92, "start": 1213.64, "text": "because"}, {"end": 1214.36, "start": 1213.92, "text": "after"}, {"end": 1215.36, "start": 1214.36, "text": "that,"}, {"end": 1215.88, "start": 1215.36, "text": "Gartner"}, {"end": 1216.68, "start": 1215.88, "text": "started"}, {"end": 1217.16, "start": 1216.68, "text": "a"}, {"end": 1217.88, "start": 1217.16, "text": "separate"}, {"end": 1218.84, "start": 1217.88, "text": "diagram"}, {"end": 1219.48, "start": 1218.84, "text": "for"}, {"end": 1219.92, "start": 1219.48, "text": "the"}, {"end": 1220.48, "start": 1219.92, "text": "various"}, {"end": 1220.96, "start": 1220.48, "text": "branches"}, {"end": 1221.72, "start": 1220.96, "text": "of"}, {"end": 1222.2, "start": 1221.72, "text": "machine"}, {"end": 1222.64, "start": 1222.2, "text": "learning,"}, {"end": 1223.4, "start": 1222.64, "text": "and"}, {"end": 1223.72, "start": 1223.4, "text": "it"}, {"end": 1224.08, "start": 1223.72, "text": "no"}, {"end": 1224.52, "start": 1224.08, "text": "longer"}, {"end": 1225.08, "start": 1224.52, "text": "appeared"}, {"end": 1225.68, "start": 1225.08, "text": "as"}, {"end": 1226.0, "start": 1225.68, "text": "a"}, {"end": 1226.16, "start": 1226.0, "text": "single"}, {"end": 1227.64, "start": 1226.16, "text": "topic"}, {"end": 1227.68, "start": 1227.64, "text": "on"}, {"end": 1229.08, "start": 1227.68, "text": "the"}, {"end": 1229.08, "start": 1229.08, "text": "main"}, {"end": 1229.96, "start": 1229.08, "text": "diagram."}], "text": " again, looks pretty much the same thing, except you have only deep learning as a representation of machine learning, but again, it's at the maximum level of hype. 
Now, I can't show you any more of this story because after that, Gartner started a separate diagram for the various branches of machine learning, and it no longer appeared as a single topic on the main diagram."}, {"chunks": [{"end": 1230.96, "start": 1230.0, "text": "Okay."}, {"end": 1232.28, "start": 1230.96, "text": "So"}, {"end": 1233.04, "start": 1232.28, "text": "I'd"}, {"end": 1233.44, "start": 1233.04, "text": "now"}, {"end": 1234.28, "start": 1233.44, "text": "like"}, {"end": 1234.68, "start": 1234.28, "text": "to"}, {"end": 1235.12, "start": 1234.68, "text": "discuss"}, {"end": 1235.92, "start": 1235.12, "text": "why"}, {"end": 1236.2, "start": 1235.92, "text": "I"}, {"end": 1236.2, "start": 1236.2, "text": "do"}, {"end": 1236.24, "start": 1236.2, "text": "not"}, {"end": 1236.64, "start": 1236.24, "text": "believe"}, {"end": 1237.0, "start": 1236.64, "text": "machine"}, {"end": 1237.36, "start": 1237.0, "text": "learning"}, {"end": 1237.6, "start": 1237.36, "text": "can"}, {"end": 1238.12, "start": 1237.6, "text": "fairly"}, {"end": 1238.56, "start": 1238.12, "text": "claim"}, {"end": 1239.12, "start": 1238.56, "text": "to"}, {"end": 1239.48, "start": 1239.12, "text": "be"}, {"end": 1239.68, "start": 1239.48, "text": "all"}, {"end": 1239.76, "start": 1239.68, "text": "of"}, {"end": 1240.04, "start": 1239.76, "text": "data"}, {"end": 1240.64, "start": 1240.04, "text": "science."}, {"end": 1240.68, "start": 1240.64, "text": "I'm"}, {"end": 1241.12, "start": 1240.68, "text": "gonna"}, {"end": 1241.28, "start": 1241.12, "text": "give"}, {"end": 1241.64, "start": 1241.28, "text": "you"}, {"end": 1241.72, "start": 1241.64, "text": "three"}, {"end": 1242.72, "start": 1241.72, "text": "arguments."}, {"end": 1242.8, "start": 1242.72, "text": "Okay,"}, {"end": 1243.4, "start": 1242.8, "text": "one,"}, {"end": 1244.04, "start": 1243.4, "text": "first"}, {"end": 1244.44, "start": 1244.04, "text": "argument"}, {"end": 1244.52, "start": 1244.44, "text": "is"}, {"end": 
1245.08, "start": 1244.52, "text": "that"}, {"end": 1245.48, "start": 1245.08, "text": "one"}, {"end": 1246.0, "start": 1245.48, "text": "often"}, {"end": 1246.48, "start": 1246.0, "text": "sees"}, {"end": 1247.12, "start": 1246.48, "text": "classified"}, {"end": 1247.16, "start": 1247.12, "text": "as"}, {"end": 1247.56, "start": 1247.16, "text": "machine"}, {"end": 1248.48, "start": 1247.56, "text": "learning,"}, {"end": 1248.8, "start": 1248.48, "text": "ideas"}, {"end": 1249.44, "start": 1248.8, "text": "that"}, {"end": 1249.88, "start": 1249.44, "text": "really"}, {"end": 1250.24, "start": 1249.88, "text": "come"}, {"end": 1251.24, "start": 1250.24, "text": "from"}, {"end": 1251.52, "start": 1251.24, "text": "elsewhere."}, {"end": 1251.92, "start": 1251.52, "text": "People"}, {"end": 1252.2, "start": 1251.92, "text": "were"}, {"end": 1252.76, "start": 1252.2, "text": "doing"}, {"end": 1253.48, "start": 1252.76, "text": "things"}, {"end": 1253.92, "start": 1253.48, "text": "like"}, {"end": 1254.8, "start": 1253.92, "text": "clustering,"}, {"end": 1255.08, "start": 1254.8, "text": "gradient"}, {"end": 1255.88, "start": 1255.08, "text": "descent,"}, {"end": 1256.04, "start": 1255.88, "text": "or"}, {"end": 1256.8, "start": 1256.04, "text": "association"}, {"end": 1257.72, "start": 1256.8, "text": "rules"}, {"end": 1258.12, "start": 1257.72, "text": "long"}, {"end": 1258.52, "start": 1258.12, "text": "before"}, {"end": 1258.88, "start": 1258.52, "text": "machine"}, {"end": 1259.16, "start": 1258.88, "text": "learning"}, {"end": 1259.44, "start": 1259.16, "text": "was"}, {"end": 1259.48, "start": 1259.44, "text": "even"}, {"end": 1259.64, "start": 1259.48, "text": "a"}, {"end": 1259.96, "start": 1259.64, "text": "thing."}], "text": " Okay. So I'd now like to discuss why I do not believe machine learning can fairly claim to be all of data science. I'm gonna give you three arguments. 
Okay, one, first argument is that one often sees classified as machine learning, ideas that really come from elsewhere. People were doing things like clustering, gradient descent, or association rules long before machine learning was even a thing."}, {"chunks": [{"end": 1262.4, "start": 1260.0, "text": "The"}, {"end": 1263.04, "start": 1262.4, "text": "second"}, {"end": 1263.4, "start": 1263.04, "text": "is"}, {"end": 1264.4, "start": 1263.4, "text": "that"}, {"end": 1265.04, "start": 1264.4, "text": "while"}, {"end": 1265.04, "start": 1265.04, "text": "many"}, {"end": 1265.36, "start": 1265.04, "text": "data"}, {"end": 1265.84, "start": 1265.36, "text": "science"}, {"end": 1266.28, "start": 1265.84, "text": "problems"}, {"end": 1266.6, "start": 1266.28, "text": "can"}, {"end": 1266.6, "start": 1266.6, "text": "be"}, {"end": 1267.24, "start": 1266.6, "text": "expressed"}, {"end": 1267.6, "start": 1267.24, "text": "as"}, {"end": 1268.08, "start": 1267.6, "text": "a"}, {"end": 1268.88, "start": 1268.08, "text": "search"}, {"end": 1269.28, "start": 1268.88, "text": "for"}, {"end": 1269.4, "start": 1269.28, "text": "a"}, {"end": 1269.4, "start": 1269.4, "text": "good"}, {"end": 1269.76, "start": 1269.4, "text": "model,"}, {"end": 1270.08, "start": 1269.76, "text": "there"}, {"end": 1270.28, "start": 1270.08, "text": "are"}, {"end": 1270.6, "start": 1270.28, "text": "also"}, {"end": 1271.08, "start": 1270.6, "text": "many"}, {"end": 1271.48, "start": 1271.08, "text": "other"}, {"end": 1271.72, "start": 1271.48, "text": "problems"}, {"end": 1272.12, "start": 1271.72, "text": "involving"}, {"end": 1272.8, "start": 1272.12, "text": "large-scale"}, {"end": 1273.52, "start": 1272.8, "text": "data"}, {"end": 1274.0, "start": 1273.52, "text": "that"}, {"end": 1274.0, "start": 1274.0, "text": "do"}, {"end": 1274.28, "start": 1274.0, "text": "not"}, {"end": 1275.32, "start": 1274.28, "text": "involve"}, {"end": 1275.92, "start": 1275.32, "text": "modeling."}, {"end": 1276.32, 
"start": 1275.92, "text": "I'm"}, {"end": 1276.52, "start": 1276.32, "text": "going"}, {"end": 1277.12, "start": 1276.52, "text": "to"}, {"end": 1277.44, "start": 1277.12, "text": "talk"}, {"end": 1277.96, "start": 1277.44, "text": "shortly"}, {"end": 1278.32, "start": 1277.96, "text": "about"}, {"end": 1278.72, "start": 1278.32, "text": "two"}, {"end": 1279.08, "start": 1278.72, "text": "such"}, {"end": 1279.76, "start": 1279.08, "text": "examples"}, {"end": 1280.16, "start": 1279.76, "text": "that"}, {"end": 1280.16, "start": 1280.16, "text": "I"}, {"end": 1280.56, "start": 1280.16, "text": "consider"}, {"end": 1280.88, "start": 1280.56, "text": "very"}, {"end": 1282.84, "start": 1280.88, "text": "important,"}, {"end": 1283.0, "start": 1282.84, "text": "and"}, {"end": 1283.44, "start": 1283.0, "text": "there"}, {"end": 1283.88, "start": 1283.44, "text": "are"}, {"end": 1284.28, "start": 1283.88, "text": "lots"}, {"end": 1285.32, "start": 1284.28, "text": "of"}, {"end": 1285.88, "start": 1285.32, "text": "others."}, {"end": 1286.6, "start": 1285.88, "text": "And"}, {"end": 1287.2, "start": 1286.6, "text": "the"}, {"end": 1288.0, "start": 1287.2, "text": "third"}, {"end": 1289.12, "start": 1288.0, "text": "objection"}, {"end": 1289.96, "start": 1289.12, "text": "is"}], "text": " The second is that while many data science problems can be expressed as a search for a good model, there are also many other problems involving large-scale data that do not involve modeling. I'm going to talk shortly about two such examples that I consider very important, and there are lots of others. 
And the third objection is"}, {"chunks": [{"end": 1290.4, "start": 1290.0, "text": "that"}, {"end": 1290.56, "start": 1290.4, "text": "often"}, {"end": 1290.72, "start": 1290.56, "text": "the"}, {"end": 1291.0, "start": 1290.72, "text": "best"}, {"end": 1291.4, "start": 1291.0, "text": "machine"}, {"end": 1291.76, "start": 1291.4, "text": "learning"}, {"end": 1292.32, "start": 1291.76, "text": "algorithms"}, {"end": 1292.84, "start": 1292.32, "text": "create"}, {"end": 1293.28, "start": 1292.84, "text": "models"}, {"end": 1293.28, "start": 1293.28, "text": "that"}, {"end": 1293.76, "start": 1293.28, "text": "cannot"}, {"end": 1294.08, "start": 1293.76, "text": "be"}, {"end": 1294.92, "start": 1294.08, "text": "explained"}, {"end": 1295.36, "start": 1294.92, "text": "or"}, {"end": 1297.44, "start": 1295.36, "text": "understood."}, {"end": 1297.88, "start": 1297.44, "text": "Okay,"}, {"end": 1298.48, "start": 1297.88, "text": "sometimes"}, {"end": 1298.76, "start": 1298.48, "text": "you"}, {"end": 1298.76, "start": 1298.76, "text": "don't"}, {"end": 1299.12, "start": 1298.76, "text": "care,"}, {"end": 1299.12, "start": 1299.12, "text": "you"}, {"end": 1299.36, "start": 1299.12, "text": "just"}, {"end": 1299.56, "start": 1299.36, "text": "want"}, {"end": 1299.8, "start": 1299.56, "text": "good"}, {"end": 1300.64, "start": 1299.8, "text": "results."}, {"end": 1301.48, "start": 1300.64, "text": "Okay."}, {"end": 1301.56, "start": 1301.48, "text": "But"}, {"end": 1301.64, "start": 1301.56, "text": "there"}, {"end": 1301.64, "start": 1301.64, "text": "are"}, {"end": 1303.0, "start": 1301.64, "text": "times"}, {"end": 1303.32, "start": 1303.0, "text": "when"}, {"end": 1303.32, "start": 1303.32, "text": "it"}, {"end": 1303.36, "start": 1303.32, "text": "is"}, {"end": 1303.92, "start": 1303.36, "text": "preferable"}, {"end": 1304.04, "start": 1303.92, "text": "to"}, {"end": 1304.32, "start": 1304.04, "text": "use"}, {"end": 1304.64, "start": 1304.32, "text": "another"}, 
{"end": 1305.0, "start": 1304.64, "text": "approach"}, {"end": 1305.24, "start": 1305.0, "text": "in"}, {"end": 1306.0, "start": 1305.24, "text": "order"}, {"end": 1306.52, "start": 1306.0, "text": "that"}, {"end": 1306.92, "start": 1306.52, "text": "it"}, {"end": 1307.24, "start": 1306.92, "text": "is"}, {"end": 1307.84, "start": 1307.24, "text": "possible"}, {"end": 1308.24, "start": 1307.84, "text": "to"}, {"end": 1308.76, "start": 1308.24, "text": "give"}, {"end": 1309.16, "start": 1308.76, "text": "a"}, {"end": 1309.84, "start": 1309.16, "text": "realistic"}, {"end": 1310.84, "start": 1309.84, "text": "explanation"}, {"end": 1311.56, "start": 1310.84, "text": "of"}, {"end": 1312.08, "start": 1311.56, "text": "what's"}, {"end": 1312.48, "start": 1312.08, "text": "going"}, {"end": 1312.88, "start": 1312.48, "text": "on."}, {"end": 1313.12, "start": 1312.88, "text": "And"}, {"end": 1313.64, "start": 1313.12, "text": "I'm"}, {"end": 1315.4, "start": 1313.64, "text": "gonna"}, {"end": 1316.44, "start": 1315.4, "text": "give"}, {"end": 1318.12, "start": 1316.44, "text": "an"}, {"end": 1318.48, "start": 1318.12, "text": "example"}, {"end": 1319.28, "start": 1318.48, "text": "next."}, {"end": 1319.96, "start": 1319.28, "text": "Okay."}], "text": " that often the best machine learning algorithms create models that cannot be explained or understood. Okay, sometimes you don't care, you just want good results. Okay. But there are times when it is preferable to use another approach in order that it is possible to give a realistic explanation of what's going on. And I'm gonna give an example next. 
Okay."}, {"chunks": [{"end": 1320.52, "start": 1320.0, "text": "I"}, {"end": 1321.28, "start": 1320.52, "text": "want"}, {"end": 1321.56, "start": 1321.28, "text": "to"}, {"end": 1321.92, "start": 1321.56, "text": "remind"}, {"end": 1322.04, "start": 1321.92, "text": "everybody"}, {"end": 1322.16, "start": 1322.04, "text": "about"}, {"end": 1323.28, "start": 1322.16, "text": "association"}, {"end": 1324.04, "start": 1323.28, "text": "rules."}, {"end": 1324.16, "start": 1324.04, "text": "I"}, {"end": 1324.56, "start": 1324.16, "text": "assume"}, {"end": 1324.84, "start": 1324.56, "text": "most"}, {"end": 1325.48, "start": 1324.84, "text": "people"}, {"end": 1325.68, "start": 1325.48, "text": "are"}, {"end": 1326.04, "start": 1325.68, "text": "pretty"}, {"end": 1326.36, "start": 1326.04, "text": "familiar"}, {"end": 1326.6, "start": 1326.36, "text": "with"}, {"end": 1327.28, "start": 1326.6, "text": "them."}, {"end": 1327.68, "start": 1327.28, "text": "Okay,"}, {"end": 1328.12, "start": 1327.68, "text": "so"}, {"end": 1328.36, "start": 1328.12, "text": "these"}, {"end": 1328.48, "start": 1328.36, "text": "are"}, {"end": 1329.56, "start": 1328.48, "text": "if-then"}, {"end": 1330.16, "start": 1329.56, "text": "rules"}, {"end": 1330.64, "start": 1330.16, "text": "developed"}, {"end": 1330.88, "start": 1330.64, "text": "from"}, {"end": 1331.96, "start": 1330.88, "text": "data."}, {"end": 1332.24, "start": 1331.96, "text": "The"}, {"end": 1332.92, "start": 1332.24, "text": "original"}, {"end": 1333.4, "start": 1332.92, "text": "application"}, {"end": 1333.4, "start": 1333.4, "text": "and"}, {"end": 1333.76, "start": 1333.4, "text": "the"}, {"end": 1334.44, "start": 1333.76, "text": "easiest"}, {"end": 1334.88, "start": 1334.44, "text": "way"}, {"end": 1335.08, "start": 1334.88, "text": "to"}, {"end": 1335.4, "start": 1335.08, "text": "think"}, {"end": 1335.8, "start": 1335.4, "text": "about"}, {"end": 1336.6, "start": 1335.8, "text": "association"}, {"end": 1337.08, 
"start": 1336.6, "text": "rules"}, {"end": 1337.52, "start": 1337.08, "text": "is"}, {"end": 1338.12, "start": 1337.52, "text": "that"}, {"end": 1338.56, "start": 1338.12, "text": "they"}, {"end": 1339.12, "start": 1338.56, "text": "are"}, {"end": 1339.84, "start": 1339.12, "text": "derived"}, {"end": 1340.88, "start": 1339.84, "text": "from"}, {"end": 1341.04, "start": 1340.88, "text": "a"}, {"end": 1341.4, "start": 1341.04, "text": "data"}, {"end": 1341.48, "start": 1341.4, "text": "set"}, {"end": 1341.72, "start": 1341.48, "text": "consisting"}, {"end": 1341.76, "start": 1341.72, "text": "of"}, {"end": 1342.4, "start": 1341.76, "text": "baskets,"}, {"end": 1343.04, "start": 1342.4, "text": "such"}, {"end": 1343.76, "start": 1343.04, "text": "as"}, {"end": 1344.24, "start": 1343.76, "text": "the"}, {"end": 1344.68, "start": 1344.24, "text": "baskets"}, {"end": 1344.88, "start": 1344.68, "text": "one"}, {"end": 1345.12, "start": 1344.88, "text": "might"}, {"end": 1345.52, "start": 1345.12, "text": "fill"}, {"end": 1345.88, "start": 1345.52, "text": "in"}, {"end": 1346.28, "start": 1345.88, "text": "the"}, {"end": 1347.12, "start": 1346.28, "text": "supermarket"}, {"end": 1347.36, "start": 1347.12, "text": "with"}, {"end": 1348.44, "start": 1347.36, "text": "various"}, {"end": 1349.0, "start": 1348.44, "text": "items."}, {"end": 1349.32, "start": 1349.0, "text": "And"}, {"end": 1349.96, "start": 1349.32, "text": "each"}], "text": " I want to remind everybody about association rules. I assume most people are pretty familiar with them. Okay, so these are if-then rules developed from data. The original application and the easiest way to think about association rules is that they are derived from a data set consisting of baskets, such as the baskets one might fill in the supermarket with various items. 
And each"}, {"chunks": [{"end": 1350.08, "start": 1350.0, "text": "Each"}, {"end": 1350.56, "start": 1350.08, "text": "basket"}, {"end": 1351.04, "start": 1350.56, "text": "contains"}, {"end": 1351.04, "start": 1351.04, "text": "a"}, {"end": 1351.28, "start": 1351.04, "text": "small"}, {"end": 1351.64, "start": 1351.28, "text": "set"}, {"end": 1352.04, "start": 1351.64, "text": "of"}, {"end": 1352.68, "start": 1352.04, "text": "items."}, {"end": 1352.92, "start": 1352.68, "text": "We"}, {"end": 1353.36, "start": 1352.92, "text": "want"}, {"end": 1353.44, "start": 1353.36, "text": "to"}, {"end": 1354.24, "start": 1353.44, "text": "know"}, {"end": 1355.2, "start": 1354.24, "text": "rules"}, {"end": 1355.68, "start": 1355.2, "text": "that"}, {"end": 1356.0, "start": 1355.68, "text": "say"}, {"end": 1356.6, "start": 1356.0, "text": "when"}, {"end": 1356.8, "start": 1356.6, "text": "a"}, {"end": 1357.04, "start": 1356.8, "text": "certain"}, {"end": 1358.24, "start": 1357.04, "text": "set"}, {"end": 1358.88, "start": 1358.24, "text": "of"}, {"end": 1359.36, "start": 1358.88, "text": "items"}, {"end": 1359.96, "start": 1359.36, "text": "appears"}, {"end": 1359.96, "start": 1359.96, "text": "in"}, {"end": 1359.96, "start": 1359.96, "text": "a"}, {"end": 1360.12, "start": 1359.96, "text": "basket,"}, {"end": 1361.12, "start": 1360.12, "text": "then"}, {"end": 1361.6, "start": 1361.12, "text": "it"}, {"end": 1362.08, "start": 1361.6, "text": "is"}, {"end": 1362.44, "start": 1362.08, "text": "then"}, {"end": 1362.8, "start": 1362.44, "text": "much"}, {"end": 1363.12, "start": 1362.8, "text": "more"}, {"end": 1363.72, "start": 1363.12, "text": "likely"}, {"end": 1364.84, "start": 1363.72, "text": "than"}, {"end": 1365.24, "start": 1364.84, "text": "would"}, {"end": 1365.4, "start": 1365.24, "text": "be"}, {"end": 1366.2, "start": 1365.4, "text": "otherwise"}, {"end": 1366.76, "start": 1366.2, "text": "expected"}, {"end": 1367.32, "start": 1366.76, "text": "to"}, {"end": 
1367.8, "start": 1367.32, "text": "find"}, {"end": 1368.16, "start": 1367.8, "text": "another"}, {"end": 1368.96, "start": 1368.16, "text": "particular"}, {"end": 1369.28, "start": 1368.96, "text": "item."}, {"end": 1369.88, "start": 1369.28, "text": "So"}, {"end": 1370.24, "start": 1369.88, "text": "for"}, {"end": 1370.72, "start": 1370.24, "text": "example,"}, {"end": 1371.08, "start": 1370.72, "text": "many"}, {"end": 1371.48, "start": 1371.08, "text": "people"}, {"end": 1371.48, "start": 1371.48, "text": "like"}, {"end": 1371.88, "start": 1371.48, "text": "peanut"}, {"end": 1372.28, "start": 1371.88, "text": "butter"}, {"end": 1372.28, "start": 1372.28, "text": "and"}, {"end": 1372.28, "start": 1372.28, "text": "jelly"}, {"end": 1373.4, "start": 1372.28, "text": "sandwiches."}, {"end": 1373.52, "start": 1373.4, "text": "So"}, {"end": 1373.92, "start": 1373.52, "text": "if"}, {"end": 1374.24, "start": 1373.92, "text": "a"}, {"end": 1374.64, "start": 1374.24, "text": "basket"}, {"end": 1375.44, "start": 1374.64, "text": "contains"}, {"end": 1375.68, "start": 1375.44, "text": "bread"}, {"end": 1376.12, "start": 1375.68, "text": "and"}, {"end": 1376.44, "start": 1376.12, "text": "jelly,"}, {"end": 1376.84, "start": 1376.44, "text": "it"}, {"end": 1377.16, "start": 1376.84, "text": "is"}, {"end": 1377.6, "start": 1377.16, "text": "likely"}, {"end": 1377.8, "start": 1377.6, "text": "to"}, {"end": 1378.08, "start": 1377.8, "text": "contain"}, {"end": 1378.36, "start": 1378.08, "text": "peanut"}, {"end": 1378.68, "start": 1378.36, "text": "butter"}, {"end": 1379.08, "start": 1378.68, "text": "as"}, {"end": 1379.96, "start": 1379.08, "text": "well."}], "text": " Each basket contains a small set of items. We want to know rules that say when a certain set of items appears in a basket, then it is then much more likely than would be otherwise expected to find another particular item. So for example, many people like peanut butter and jelly sandwiches. 
So if a basket contains bread and jelly, it is likely to contain peanut butter as well."}, {"chunks": [{"end": 1382.88, "start": 1380.0, "text": "Okay,"}, {"end": 1383.6, "start": 1382.88, "text": "now,"}, {"end": 1384.44, "start": 1383.6, "text": "association"}, {"end": 1386.16, "start": 1384.44, "text": "rules"}, {"end": 1386.84, "start": 1386.16, "text": "were"}, {"end": 1387.52, "start": 1386.84, "text": "these"}, {"end": 1388.16, "start": 1387.52, "text": "algorithms"}, {"end": 1389.64, "start": 1388.16, "text": "for"}, {"end": 1390.24, "start": 1389.64, "text": "efficiently"}, {"end": 1390.56, "start": 1390.24, "text": "finding"}, {"end": 1391.36, "start": 1390.56, "text": "association"}, {"end": 1391.76, "start": 1391.36, "text": "rules"}, {"end": 1392.0, "start": 1391.76, "text": "first"}, {"end": 1392.08, "start": 1392.0, "text": "came"}, {"end": 1392.36, "start": 1392.08, "text": "out"}, {"end": 1392.56, "start": 1392.36, "text": "of"}, {"end": 1393.2, "start": 1392.56, "text": "the"}, {"end": 1394.08, "start": 1393.2, "text": "database"}, {"end": 1394.48, "start": 1394.08, "text": "community,"}, {"end": 1394.68, "start": 1394.48, "text": "not"}, {"end": 1395.24, "start": 1394.68, "text": "the"}, {"end": 1395.6, "start": 1395.24, "text": "machine"}, {"end": 1395.88, "start": 1395.6, "text": "learning"}, {"end": 1396.52, "start": 1395.88, "text": "community."}, {"end": 1396.64, "start": 1396.52, "text": "It"}, {"end": 1397.16, "start": 1396.64, "text": "was"}, {"end": 1398.0, "start": 1397.16, "text": "Rakesh"}, {"end": 1398.76, "start": 1398.0, "text": "Agrawal"}, {"end": 1399.04, "start": 1398.76, "text": "and"}, {"end": 1399.2, "start": 1399.04, "text": "his"}, {"end": 1399.84, "start": 1399.2, "text": "colleagues"}, {"end": 1399.96, "start": 1399.84, "text": "at"}, {"end": 1400.32, "start": 1399.96, "text": "IBM"}, {"end": 1400.44, "start": 1400.32, "text": "who"}, {"end": 1400.64, "start": 1400.44, "text": "did"}, {"end": 1400.84, "start": 1400.64, 
"text": "the"}, {"end": 1401.36, "start": 1400.84, "text": "original"}, {"end": 1402.4, "start": 1401.36, "text": "work."}, {"end": 1402.84, "start": 1402.4, "text": "And"}, {"end": 1403.68, "start": 1402.84, "text": "these"}, {"end": 1404.24, "start": 1403.68, "text": "algorithms,"}, {"end": 1404.32, "start": 1404.24, "text": "you"}, {"end": 1404.44, "start": 1404.32, "text": "see,"}, {"end": 1404.44, "start": 1404.44, "text": "did"}, {"end": 1404.84, "start": 1404.44, "text": "not"}, {"end": 1405.36, "start": 1404.84, "text": "learn"}, {"end": 1406.24, "start": 1405.36, "text": "anything."}, {"end": 1406.6, "start": 1406.24, "text": "They"}, {"end": 1406.92, "start": 1406.6, "text": "just"}, {"end": 1407.4, "start": 1406.92, "text": "counted"}, {"end": 1408.4, "start": 1407.4, "text": "co-occurrences"}, {"end": 1408.64, "start": 1408.4, "text": "of"}, {"end": 1409.24, "start": 1408.64, "text": "items"}, {"end": 1409.32, "start": 1409.24, "text": "in"}, {"end": 1409.4, "start": 1409.32, "text": "an"}, {"end": 1409.72, "start": 1409.4, "text": "efficient"}, {"end": 1409.96, "start": 1409.72, "text": "way."}], "text": " Okay, now, association rules were these algorithms for efficiently finding association rules first came out of the database community, not the machine learning community. It was Rakesh Agrawal and his colleagues at IBM who did the original work. And these algorithms, you see, did not learn anything. 
They just counted co-occurrences of items in an efficient way."}, {"chunks": [{"end": 1412.72, "start": 1410.0, "text": "So"}, {"end": 1413.44, "start": 1412.72, "text": "let's"}, {"end": 1417.68, "start": 1413.44, "text": "see"}, {"end": 1418.2, "start": 1417.68, "text": "how"}, {"end": 1418.52, "start": 1418.2, "text": "we"}, {"end": 1419.36, "start": 1418.52, "text": "could"}, {"end": 1419.96, "start": 1419.36, "text": "apply"}, {"end": 1420.68, "start": 1419.96, "text": "association"}, {"end": 1421.36, "start": 1420.68, "text": "rules"}, {"end": 1421.56, "start": 1421.36, "text": "to"}, {"end": 1422.24, "start": 1421.56, "text": "discovering"}, {"end": 1422.64, "start": 1422.24, "text": "phishing"}, {"end": 1423.0, "start": 1422.64, "text": "attacks."}, {"end": 1423.24, "start": 1423.0, "text": "We"}, {"end": 1424.6, "start": 1423.24, "text": "would"}, {"end": 1425.36, "start": 1424.6, "text": "think"}, {"end": 1425.92, "start": 1425.36, "text": "of"}, {"end": 1426.28, "start": 1425.92, "text": "each"}, {"end": 1426.6, "start": 1426.28, "text": "phishing"}, {"end": 1427.0, "start": 1426.6, "text": "email"}, {"end": 1427.32, "start": 1427.0, "text": "as"}, {"end": 1427.4, "start": 1427.32, "text": "a"}, {"end": 1428.04, "start": 1427.4, "text": "basket"}, {"end": 1428.4, "start": 1428.04, "text": "of"}, {"end": 1428.76, "start": 1428.4, "text": "words"}, {"end": 1429.04, "start": 1428.76, "text": "and"}, {"end": 1429.16, "start": 1429.04, "text": "look"}, {"end": 1429.68, "start": 1429.16, "text": "for"}, {"end": 1430.12, "start": 1429.68, "text": "sets"}, {"end": 1430.16, "start": 1430.12, "text": "of"}, {"end": 1430.68, "start": 1430.16, "text": "words"}, {"end": 1430.68, "start": 1430.68, "text": "that"}, {"end": 1431.08, "start": 1430.68, "text": "appeared"}, {"end": 1431.36, "start": 1431.08, "text": "more"}, {"end": 1431.88, "start": 1431.36, "text": "commonly"}, {"end": 1432.04, "start": 1431.88, "text": "in"}, {"end": 1432.92, "start": 1432.04, "text": 
"phishing"}, {"end": 1433.72, "start": 1432.92, "text": "emails"}, {"end": 1434.16, "start": 1433.72, "text": "than"}, {"end": 1434.36, "start": 1434.16, "text": "in"}, {"end": 1434.72, "start": 1434.36, "text": "other"}, {"end": 1435.48, "start": 1434.72, "text": "emails."}, {"end": 1435.68, "start": 1435.48, "text": "For"}, {"end": 1436.52, "start": 1435.68, "text": "example,"}, {"end": 1436.8, "start": 1436.52, "text": "if"}, {"end": 1436.8, "start": 1436.8, "text": "we"}, {"end": 1436.84, "start": 1436.8, "text": "did"}, {"end": 1437.2, "start": 1436.84, "text": "this"}, {"end": 1437.72, "start": 1437.2, "text": "analysis,"}, {"end": 1437.88, "start": 1437.72, "text": "we"}, {"end": 1438.4, "start": 1437.88, "text": "might"}, {"end": 1438.96, "start": 1438.4, "text": "discover"}, {"end": 1439.44, "start": 1438.96, "text": "that"}, {"end": 1439.8, "start": 1439.44, "text": "the"}, {"end": 1439.96, "start": 1439.8, "text": "two"}], "text": " So let's see how we could apply association rules to discovering phishing attacks. We would think of each phishing email as a basket of words and look for sets of words that appeared more commonly in phishing emails than in other emails. 
For example, if we did this analysis, we might discover that the two"}, {"chunks": [{"end": 1440.44, "start": 1440.0, "text": "words"}, {"end": 1441.52, "start": 1440.44, "text": "Nigerian"}, {"end": 1442.08, "start": 1441.52, "text": "and"}, {"end": 1443.4, "start": 1442.08, "text": "prince"}, {"end": 1443.96, "start": 1443.4, "text": "together"}, {"end": 1444.44, "start": 1443.96, "text": "in"}, {"end": 1444.68, "start": 1444.44, "text": "an"}, {"end": 1444.8, "start": 1444.68, "text": "email"}, {"end": 1445.08, "start": 1444.8, "text": "were"}, {"end": 1445.36, "start": 1445.08, "text": "a"}, {"end": 1445.44, "start": 1445.36, "text": "good"}, {"end": 1445.92, "start": 1445.44, "text": "indication"}, {"end": 1446.32, "start": 1445.92, "text": "that"}, {"end": 1446.48, "start": 1446.32, "text": "the"}, {"end": 1446.64, "start": 1446.48, "text": "email"}, {"end": 1446.96, "start": 1446.64, "text": "was"}, {"end": 1449.12, "start": 1446.96, "text": "phishing."}, {"end": 1449.52, "start": 1449.12, "text": "Well,"}, {"end": 1450.08, "start": 1449.52, "text": "this"}, {"end": 1450.8, "start": 1450.08, "text": "sort"}, {"end": 1450.92, "start": 1450.8, "text": "of"}, {"end": 1451.28, "start": 1450.92, "text": "approach"}, {"end": 1451.76, "start": 1451.28, "text": "is"}, {"end": 1452.0, "start": 1451.76, "text": "a"}, {"end": 1452.28, "start": 1452.0, "text": "few"}, {"end": 1452.92, "start": 1452.28, "text": "percentage"}, {"end": 1453.36, "start": 1452.92, "text": "points"}, {"end": 1453.36, "start": 1453.36, "text": "even"}, {"end": 1453.8, "start": 1453.36, "text": "less"}, {"end": 1454.4, "start": 1453.8, "text": "accurate"}, {"end": 1454.76, "start": 1454.4, "text": "than"}, {"end": 1455.28, "start": 1454.76, "text": "even"}, {"end": 1455.56, "start": 1455.28, "text": "a"}, {"end": 1456.04, "start": 1455.56, "text": "simple"}, {"end": 1456.44, "start": 1456.04, "text": "machine"}, {"end": 1456.84, "start": 1456.44, "text": "learning"}, {"end": 1457.4, "start": 
1456.84, "text": "approach,"}, {"end": 1457.6, "start": 1457.4, "text": "such"}, {"end": 1457.68, "start": 1457.6, "text": "as"}, {"end": 1458.16, "start": 1457.68, "text": "learning"}, {"end": 1458.52, "start": 1458.16, "text": "a"}, {"end": 1458.88, "start": 1458.52, "text": "weight"}, {"end": 1459.08, "start": 1458.88, "text": "for"}, {"end": 1459.28, "start": 1459.08, "text": "each"}, {"end": 1459.72, "start": 1459.28, "text": "word"}, {"end": 1460.44, "start": 1459.72, "text": "and"}, {"end": 1460.76, "start": 1460.44, "text": "summing"}, {"end": 1460.92, "start": 1460.76, "text": "the"}, {"end": 1461.28, "start": 1460.92, "text": "weights"}, {"end": 1461.36, "start": 1461.28, "text": "of"}, {"end": 1462.48, "start": 1461.36, "text": "words"}, {"end": 1462.6, "start": 1462.48, "text": "to"}, {"end": 1463.08, "start": 1462.6, "text": "see"}, {"end": 1463.36, "start": 1463.08, "text": "if"}, {"end": 1463.72, "start": 1463.36, "text": "the"}, {"end": 1463.96, "start": 1463.72, "text": "sum"}, {"end": 1464.48, "start": 1463.96, "text": "exceeds"}, {"end": 1464.52, "start": 1464.48, "text": "a"}, {"end": 1465.08, "start": 1464.52, "text": "threshold"}, {"end": 1465.68, "start": 1465.08, "text": "and"}, {"end": 1466.72, "start": 1465.68, "text": "declaring"}, {"end": 1467.04, "start": 1466.72, "text": "phishing"}, {"end": 1467.28, "start": 1467.04, "text": "if"}, {"end": 1467.52, "start": 1467.28, "text": "it"}, {"end": 1469.32, "start": 1467.52, "text": "does."}, {"end": 1469.96, "start": 1469.32, "text": "Okay."}], "text": " words Nigerian and prince together in an email were a good indication that the email was phishing. Well, this sort of approach is a few percentage points even less accurate than even a simple machine learning approach, such as learning a weight for each word and summing the weights of words to see if the sum exceeds a threshold and declaring phishing if it does. 
Okay."}, {"chunks": [{"end": 1470.68, "start": 1470.0, "text": "But"}, {"end": 1471.76, "start": 1470.68, "text": "there"}, {"end": 1472.6, "start": 1471.76, "text": "is"}, {"end": 1472.92, "start": 1472.6, "text": "a"}, {"end": 1473.52, "start": 1472.92, "text": "key"}, {"end": 1474.16, "start": 1473.52, "text": "advantage"}, {"end": 1474.16, "start": 1474.16, "text": "to"}, {"end": 1474.28, "start": 1474.16, "text": "the"}, {"end": 1475.24, "start": 1474.28, "text": "association"}, {"end": 1475.72, "start": 1475.24, "text": "rule"}, {"end": 1477.52, "start": 1475.72, "text": "approach."}, {"end": 1478.4, "start": 1477.52, "text": "Association"}, {"end": 1479.04, "start": 1478.4, "text": "rules"}, {"end": 1479.88, "start": 1479.04, "text": "can"}, {"end": 1480.2, "start": 1479.88, "text": "be"}, {"end": 1481.04, "start": 1480.2, "text": "used"}, {"end": 1481.48, "start": 1481.04, "text": "as"}, {"end": 1481.6, "start": 1481.48, "text": "a"}, {"end": 1481.72, "start": 1481.6, "text": "clear"}, {"end": 1482.04, "start": 1481.72, "text": "explanation"}, {"end": 1482.08, "start": 1482.04, "text": "of"}, {"end": 1482.68, "start": 1482.08, "text": "why"}, {"end": 1483.04, "start": 1482.68, "text": "a"}, {"end": 1483.28, "start": 1483.04, "text": "decision"}, {"end": 1483.92, "start": 1483.28, "text": "was"}, {"end": 1484.44, "start": 1483.92, "text": "made."}, {"end": 1484.6, "start": 1484.44, "text": "So"}, {"end": 1484.72, "start": 1484.6, "text": "when"}, {"end": 1485.36, "start": 1484.72, "text": "someone"}, {"end": 1485.84, "start": 1485.36, "text": "who"}, {"end": 1486.16, "start": 1485.84, "text": "really"}, {"end": 1486.52, "start": 1486.16, "text": "is"}, {"end": 1486.6, "start": 1486.52, "text": "a"}, {"end": 1487.16, "start": 1486.6, "text": "Nigerian"}, {"end": 1487.76, "start": 1487.16, "text": "prince"}, {"end": 1488.84, "start": 1487.76, "text": "complains"}, {"end": 1489.6, "start": 1488.84, "text": "that"}, {"end": 1490.24, "start": 1489.6, "text": 
"all"}, {"end": 1491.56, "start": 1490.24, "text": "their"}, {"end": 1491.68, "start": 1491.56, "text": "emails"}, {"end": 1491.72, "start": 1491.68, "text": "are"}, {"end": 1492.08, "start": 1491.72, "text": "going"}, {"end": 1492.24, "start": 1492.08, "text": "to"}, {"end": 1492.72, "start": 1492.24, "text": "spam,"}, {"end": 1493.32, "start": 1492.72, "text": "we"}, {"end": 1493.32, "start": 1493.32, "text": "can"}, {"end": 1493.32, "start": 1493.32, "text": "show"}, {"end": 1493.32, "start": 1493.32, "text": "them"}, {"end": 1493.32, "start": 1493.32, "text": "the"}, {"end": 1493.32, "start": 1493.32, "text": "rule"}, {"end": 1493.36, "start": 1493.32, "text": "that"}, {"end": 1494.0, "start": 1493.36, "text": "says"}, {"end": 1494.28, "start": 1494.0, "text": "any"}, {"end": 1494.64, "start": 1494.28, "text": "email"}, {"end": 1495.08, "start": 1494.64, "text": "with"}, {"end": 1495.36, "start": 1495.08, "text": "these"}, {"end": 1495.64, "start": 1495.36, "text": "two"}, {"end": 1496.16, "start": 1495.64, "text": "words,"}, {"end": 1496.8, "start": 1496.16, "text": "Nigerian"}, {"end": 1497.08, "start": 1496.8, "text": "and"}, {"end": 1498.44, "start": 1497.08, "text": "prince,"}, {"end": 1498.72, "start": 1498.44, "text": "are"}, {"end": 1499.44, "start": 1498.72, "text": "considered"}, {"end": 1499.96, "start": 1499.44, "text": "spam."}], "text": " But there is a key advantage to the association rule approach. Association rules can be used as a clear explanation of why a decision was made. 
So when someone who really is a Nigerian prince complains that all their emails are going to spam, we can show them the rule that says any email with these two words, Nigerian and prince, are considered spam."}, {"chunks": [{"end": 1501.16, "start": 1500.0, "text": "On"}, {"end": 1504.28, "start": 1501.16, "text": "the"}, {"end": 1504.72, "start": 1504.28, "text": "other"}, {"end": 1505.2, "start": 1504.72, "text": "hand,"}, {"end": 1505.84, "start": 1505.2, "text": "you"}, {"end": 1506.24, "start": 1505.84, "text": "may"}, {"end": 1506.76, "start": 1506.24, "text": "have"}, {"end": 1507.4, "start": 1506.76, "text": "found"}, {"end": 1507.68, "start": 1507.4, "text": "that"}, {"end": 1508.08, "start": 1507.68, "text": "Gmail"}, {"end": 1508.44, "start": 1508.08, "text": "puts"}, {"end": 1508.88, "start": 1508.44, "text": "into"}, {"end": 1509.04, "start": 1508.88, "text": "spam"}, {"end": 1509.56, "start": 1509.04, "text": "something"}, {"end": 1510.44, "start": 1509.56, "text": "that"}, {"end": 1510.92, "start": 1510.44, "text": "really"}, {"end": 1511.88, "start": 1510.92, "text": "isn't."}, {"end": 1512.2, "start": 1511.88, "text": "It"}, {"end": 1512.48, "start": 1512.2, "text": "happened"}, {"end": 1512.48, "start": 1512.48, "text": "to"}, {"end": 1512.72, "start": 1512.48, "text": "me"}, {"end": 1513.36, "start": 1512.72, "text": "recently."}, {"end": 1513.6, "start": 1513.36, "text": "And"}, {"end": 1514.12, "start": 1513.6, "text": "when"}, {"end": 1514.52, "start": 1514.12, "text": "I"}, {"end": 1514.84, "start": 1514.52, "text": "asked"}, {"end": 1515.24, "start": 1514.84, "text": "why"}, {"end": 1515.28, "start": 1515.24, "text": "it"}, {"end": 1515.52, "start": 1515.28, "text": "was"}, {"end": 1516.24, "start": 1515.52, "text": "considered"}, {"end": 1517.4, "start": 1516.24, "text": "spam,"}, {"end": 1517.92, "start": 1517.4, "text": "all"}, {"end": 1518.44, "start": 1517.92, "text": "I"}, {"end": 1519.04, "start": 1518.44, "text": "got"}, {"end": 
1519.52, "start": 1519.04, "text": "was"}, {"end": 1519.68, "start": 1519.52, "text": "a"}, {"end": 1520.2, "start": 1519.68, "text": "statement"}, {"end": 1520.56, "start": 1520.2, "text": "that"}, {"end": 1520.96, "start": 1520.56, "text": "it"}, {"end": 1521.44, "start": 1520.96, "text": "looks"}, {"end": 1521.76, "start": 1521.44, "text": "like"}, {"end": 1522.04, "start": 1521.76, "text": "other"}, {"end": 1522.4, "start": 1522.04, "text": "emails"}, {"end": 1523.0, "start": 1522.4, "text": "that"}, {"end": 1523.32, "start": 1523.0, "text": "many"}, {"end": 1523.68, "start": 1523.32, "text": "people"}, {"end": 1524.0, "start": 1523.68, "text": "reported"}, {"end": 1524.44, "start": 1524.0, "text": "as"}, {"end": 1524.6, "start": 1524.44, "text": "spam."}, {"end": 1524.64, "start": 1524.6, "text": "In"}, {"end": 1526.04, "start": 1524.64, "text": "other"}, {"end": 1526.52, "start": 1526.04, "text": "words,"}, {"end": 1526.8, "start": 1526.52, "text": "what"}, {"end": 1527.0, "start": 1526.8, "text": "they're"}, {"end": 1527.64, "start": 1527.0, "text": "saying"}, {"end": 1528.0, "start": 1527.64, "text": "is"}, {"end": 1528.76, "start": 1528.0, "text": "whatever"}, {"end": 1529.2, "start": 1528.76, "text": "model"}, {"end": 1529.2, "start": 1529.2, "text": "of"}, {"end": 1529.56, "start": 1529.2, "text": "spam"}, {"end": 1529.6, "start": 1529.56, "text": "we're"}, {"end": 1529.88, "start": 1529.6, "text": "using"}, {"end": 1529.96, "start": 1529.88, "text": "today"}], "text": " On the other hand, you may have found that Gmail puts into spam something that really isn't. It happened to me recently. And when I asked why it was considered spam, all I got was a statement that it looks like other emails that many people reported as spam. 
In other words, what they're saying is whatever model of spam we're using today"}, {"chunks": [{"end": 1530.2, "start": 1530.0, "text": "They"}, {"end": 1530.84, "start": 1530.2, "text": "said"}, {"end": 1531.48, "start": 1530.84, "text": "it"}, {"end": 1532.4, "start": 1531.48, "text": "was"}, {"end": 1532.68, "start": 1532.4, "text": "spam."}, {"end": 1533.12, "start": 1532.68, "text": "Don't"}, {"end": 1533.56, "start": 1533.12, "text": "bother"}, {"end": 1533.76, "start": 1533.56, "text": "us."}, {"end": 1534.08, "start": 1533.76, "text": "We"}, {"end": 1536.88, "start": 1534.08, "text": "can't"}, {"end": 1536.96, "start": 1536.88, "text": "really"}, {"end": 1537.92, "start": 1536.96, "text": "explain"}, {"end": 1538.2, "start": 1537.92, "text": "it"}, {"end": 1538.2, "start": 1538.2, "text": "in"}, {"end": 1538.2, "start": 1538.2, "text": "any"}, {"end": 1538.44, "start": 1538.2, "text": "detail."}, {"end": 1538.88, "start": 1538.44, "text": "Okay."}, {"end": 1539.36, "start": 1538.88, "text": "Now,"}, {"end": 1539.6, "start": 1539.36, "text": "that's"}, {"end": 1539.84, "start": 1539.6, "text": "not"}, {"end": 1540.16, "start": 1539.84, "text": "such"}, {"end": 1540.32, "start": 1540.16, "text": "a"}, {"end": 1540.36, "start": 1540.32, "text": "big"}, {"end": 1541.08, "start": 1540.36, "text": "deal."}, {"end": 1541.44, "start": 1541.08, "text": "I"}, {"end": 1542.12, "start": 1541.44, "text": "found"}, {"end": 1542.48, "start": 1542.12, "text": "the"}, {"end": 1542.48, "start": 1542.48, "text": "email"}, {"end": 1542.48, "start": 1542.48, "text": "I"}, {"end": 1542.48, "start": 1542.48, "text": "was"}, {"end": 1543.32, "start": 1542.48, "text": "expecting"}, {"end": 1543.68, "start": 1543.32, "text": "and"}, {"end": 1543.88, "start": 1543.68, "text": "all"}, {"end": 1544.12, "start": 1543.88, "text": "was"}, {"end": 1544.84, "start": 1544.12, "text": "fine."}, {"end": 1545.28, "start": 1544.84, "text": "And"}, {"end": 1546.0, "start": 1545.28, "text": "I"}, 
{"end": 1546.52, "start": 1546.0, "text": "should"}, {"end": 1547.08, "start": 1546.52, "text": "say"}, {"end": 1547.16, "start": 1547.08, "text": "I"}, {"end": 1547.88, "start": 1547.16, "text": "appreciate"}, {"end": 1548.44, "start": 1547.88, "text": "that"}, {"end": 1548.64, "start": 1548.44, "text": "Google"}, {"end": 1549.08, "start": 1548.64, "text": "not"}, {"end": 1549.32, "start": 1549.08, "text": "only"}, {"end": 1549.8, "start": 1549.32, "text": "provides"}, {"end": 1550.0, "start": 1549.8, "text": "me"}, {"end": 1550.44, "start": 1550.0, "text": "free"}, {"end": 1551.56, "start": 1550.44, "text": "email,"}, {"end": 1551.84, "start": 1551.56, "text": "but"}, {"end": 1552.08, "start": 1551.84, "text": "is"}, {"end": 1552.6, "start": 1552.08, "text": "generally"}, {"end": 1553.08, "start": 1552.6, "text": "quite"}, {"end": 1553.52, "start": 1553.08, "text": "accurate"}, {"end": 1554.0, "start": 1553.52, "text": "in"}, {"end": 1555.08, "start": 1554.0, "text": "identifying"}, {"end": 1556.68, "start": 1555.08, "text": "spam."}, {"end": 1557.44, "start": 1556.68, "text": "Okay."}, {"end": 1557.96, "start": 1557.44, "text": "But"}, {"end": 1558.36, "start": 1557.96, "text": "what"}, {"end": 1559.28, "start": 1558.36, "text": "happens,"}, {"end": 1559.56, "start": 1559.28, "text": "say,"}, {"end": 1559.6, "start": 1559.56, "text": "when"}, {"end": 1559.76, "start": 1559.6, "text": "you're"}, {"end": 1559.96, "start": 1559.76, "text": "insuring"}], "text": " They said it was spam. Don't bother us. We can't really explain it in any detail. Okay. Now, that's not such a big deal. I found the email I was expecting and all was fine. And I should say I appreciate that Google not only provides me free email, but is generally quite accurate in identifying spam. Okay. 
But what happens, say, when you're insuring"}, {"chunks": [{"end": 1560.08, "start": 1560.0, "text": "a"}, {"end": 1560.72, "start": 1560.08, "text": "science"}, {"end": 1560.72, "start": 1560.72, "text": "company"}, {"end": 1561.36, "start": 1560.72, "text": "constructs"}, {"end": 1561.36, "start": 1561.36, "text": "a"}, {"end": 1561.92, "start": 1561.36, "text": "machine"}, {"end": 1562.08, "start": 1561.92, "text": "learning"}, {"end": 1562.4, "start": 1562.08, "text": "model"}, {"end": 1562.84, "start": 1562.4, "text": "to"}, {"end": 1563.56, "start": 1562.84, "text": "decide"}, {"end": 1563.56, "start": 1563.56, "text": "on"}, {"end": 1563.6, "start": 1563.56, "text": "your"}, {"end": 1564.28, "start": 1563.6, "text": "premium"}, {"end": 1564.56, "start": 1564.28, "text": "and"}, {"end": 1564.84, "start": 1564.56, "text": "you're"}, {"end": 1565.12, "start": 1564.84, "text": "told"}, {"end": 1565.24, "start": 1565.12, "text": "you"}, {"end": 1565.48, "start": 1565.24, "text": "have"}, {"end": 1565.52, "start": 1565.48, "text": "to"}, {"end": 1565.76, "start": 1565.52, "text": "pay"}, {"end": 1566.12, "start": 1565.76, "text": "more"}, {"end": 1566.28, "start": 1566.12, "text": "than"}, {"end": 1566.44, "start": 1566.28, "text": "you've"}, {"end": 1566.48, "start": 1566.44, "text": "been"}, {"end": 1567.52, "start": 1566.48, "text": "paying."}, {"end": 1567.88, "start": 1567.52, "text": "You"}, {"end": 1568.08, "start": 1567.88, "text": "might"}, {"end": 1568.44, "start": 1568.08, "text": "take"}, {"end": 1568.76, "start": 1568.44, "text": "this"}, {"end": 1568.96, "start": 1568.76, "text": "very"}, {"end": 1569.72, "start": 1568.96, "text": "seriously"}, {"end": 1569.88, "start": 1569.72, "text": "if"}, {"end": 1569.96, "start": 1569.88, "text": "they"}, {"end": 1570.12, "start": 1569.96, "text": "were"}, {"end": 1570.48, "start": 1570.12, "text": "not"}, {"end": 1570.64, "start": 1570.48, "text": "able"}, {"end": 1570.8, "start": 1570.64, "text": "to"}, 
{"end": 1571.4, "start": 1570.8, "text": "explain"}, {"end": 1571.6, "start": 1571.4, "text": "what"}, {"end": 1572.12, "start": 1571.6, "text": "it"}, {"end": 1572.6, "start": 1572.12, "text": "is"}, {"end": 1572.88, "start": 1572.6, "text": "about"}, {"end": 1573.88, "start": 1572.88, "text": "you"}, {"end": 1574.36, "start": 1573.88, "text": "that"}, {"end": 1575.12, "start": 1574.36, "text": "makes"}, {"end": 1575.72, "start": 1575.12, "text": "you"}, {"end": 1575.92, "start": 1575.72, "text": "a"}, {"end": 1577.08, "start": 1575.92, "text": "higher"}, {"end": 1580.4, "start": 1577.08, "text": "risk."}, {"end": 1581.0, "start": 1580.4, "text": "Okay,"}, {"end": 1581.68, "start": 1581.0, "text": "so"}, {"end": 1582.28, "start": 1581.68, "text": "here's"}, {"end": 1582.72, "start": 1582.28, "text": "what"}, {"end": 1582.72, "start": 1582.72, "text": "I"}, {"end": 1582.72, "start": 1582.72, "text": "think"}, {"end": 1582.72, "start": 1582.72, "text": "is"}, {"end": 1582.72, "start": 1582.72, "text": "the"}, {"end": 1583.0, "start": 1582.72, "text": "proper"}, {"end": 1583.6, "start": 1583.0, "text": "role"}, {"end": 1584.08, "start": 1583.6, "text": "for"}, {"end": 1584.48, "start": 1584.08, "text": "the"}, {"end": 1585.16, "start": 1584.48, "text": "powerful"}, {"end": 1585.48, "start": 1585.16, "text": "machine"}, {"end": 1585.8, "start": 1585.48, "text": "learning"}, {"end": 1586.36, "start": 1585.8, "text": "algorithms"}, {"end": 1586.84, "start": 1586.36, "text": "that"}, {"end": 1587.08, "start": 1586.84, "text": "are"}, {"end": 1587.24, "start": 1587.08, "text": "so"}, {"end": 1588.36, "start": 1587.24, "text": "popular"}, {"end": 1588.96, "start": 1588.36, "text": "these"}, {"end": 1589.96, "start": 1588.96, "text": "days."}], "text": " a science company constructs a machine learning model to decide on your premium and you're told you have to pay more than you've been paying. 
You might take this very seriously if they were not able to explain what it is about you that makes you a higher risk. Okay, so here's what I think is the proper role for the powerful machine learning algorithms that are so popular these days."}, {"chunks": [{"end": 1591.12, "start": 1590.0, "text": "Obviously,"}, {"end": 1591.72, "start": 1591.12, "text": "your"}, {"end": 1592.48, "start": 1591.72, "text": "problem"}, {"end": 1595.12, "start": 1592.48, "text": "has"}, {"end": 1595.92, "start": 1595.12, "text": "to"}, {"end": 1596.4, "start": 1595.92, "text": "require"}, {"end": 1596.64, "start": 1596.4, "text": "a"}, {"end": 1596.68, "start": 1596.64, "text": "model"}, {"end": 1596.68, "start": 1596.68, "text": "of"}, {"end": 1597.04, "start": 1596.68, "text": "something."}, {"end": 1597.16, "start": 1597.04, "text": "And"}, {"end": 1597.16, "start": 1597.16, "text": "you"}, {"end": 1597.16, "start": 1597.16, "text": "really"}, {"end": 1597.2, "start": 1597.16, "text": "don't"}, {"end": 1597.24, "start": 1597.2, "text": "need"}, {"end": 1597.36, "start": 1597.24, "text": "to"}, {"end": 1597.8, "start": 1597.36, "text": "explain"}, {"end": 1598.0, "start": 1597.8, "text": "your"}, {"end": 1601.4, "start": 1598.0, "text": "results."}, {"end": 1601.64, "start": 1601.4, "text": "But"}, {"end": 1602.32, "start": 1601.64, "text": "there's"}, {"end": 1602.72, "start": 1602.32, "text": "another"}, {"end": 1603.32, "start": 1602.72, "text": "issue"}, {"end": 1603.32, "start": 1603.32, "text": "that"}, {"end": 1603.64, "start": 1603.32, "text": "people"}, {"end": 1604.12, "start": 1603.64, "text": "often"}, {"end": 1604.6, "start": 1604.12, "text": "forget"}, {"end": 1605.76, "start": 1604.6, "text": "about."}, {"end": 1606.28, "start": 1605.76, "text": "The"}, {"end": 1607.04, "start": 1606.28, "text": "problem"}, {"end": 1607.4, "start": 1607.04, "text": "needs"}, {"end": 1607.4, "start": 1607.4, "text": "to"}, {"end": 1607.44, "start": 1607.4, "text": "be"}, {"end": 
1607.96, "start": 1607.44, "text": "one"}, {"end": 1608.04, "start": 1607.96, "text": "you"}, {"end": 1608.44, "start": 1608.04, "text": "don't"}, {"end": 1608.52, "start": 1608.44, "text": "really"}, {"end": 1609.0, "start": 1608.52, "text": "understand."}, {"end": 1610.36, "start": 1609.0, "text": "Just"}, {"end": 1610.52, "start": 1610.36, "text": "for"}, {"end": 1610.52, "start": 1610.52, "text": "as"}, {"end": 1610.88, "start": 1610.52, "text": "an"}, {"end": 1611.2, "start": 1610.88, "text": "example,"}, {"end": 1611.96, "start": 1611.2, "text": "around"}, {"end": 1611.96, "start": 1611.96, "text": "the"}, {"end": 1613.16, "start": 1611.96, "text": "turn"}, {"end": 1613.4, "start": 1613.16, "text": "of"}, {"end": 1614.04, "start": 1613.4, "text": "the"}, {"end": 1614.64, "start": 1614.04, "text": "millennium,"}, {"end": 1615.04, "start": 1614.64, "text": "a"}, {"end": 1615.4, "start": 1615.04, "text": "former"}, {"end": 1616.24, "start": 1615.4, "text": "Stanford"}, {"end": 1616.72, "start": 1616.24, "text": "student"}, {"end": 1617.12, "start": 1616.72, "text": "got"}, {"end": 1617.88, "start": 1617.12, "text": "rich"}, {"end": 1618.16, "start": 1617.88, "text": "from"}, {"end": 1618.16, "start": 1618.16, "text": "a"}, {"end": 1618.52, "start": 1618.16, "text": "database"}, {"end": 1618.96, "start": 1618.52, "text": "company"}, {"end": 1618.96, "start": 1618.96, "text": "he'd"}, {"end": 1619.36, "start": 1618.96, "text": "helped"}, {"end": 1619.6, "start": 1619.36, "text": "to"}, {"end": 1619.96, "start": 1619.6, "text": "found."}], "text": " Obviously, your problem has to require a model of something. And you really don't need to explain your results. But there's another issue that people often forget about. The problem needs to be one you don't really understand. 
Just for as an example, around the turn of the millennium, a former Stanford student got rich from a database company he'd helped to found."}, {"chunks": [{"end": 1620.2, "start": 1620.0, "text": "And"}, {"end": 1620.6, "start": 1620.2, "text": "he"}, {"end": 1621.28, "start": 1620.6, "text": "decided"}, {"end": 1621.44, "start": 1621.28, "text": "to"}, {"end": 1621.96, "start": 1621.44, "text": "invest"}, {"end": 1622.4, "start": 1621.96, "text": "some"}, {"end": 1622.64, "start": 1622.4, "text": "of"}, {"end": 1622.92, "start": 1622.64, "text": "the"}, {"end": 1623.08, "start": 1622.92, "text": "money"}, {"end": 1623.4, "start": 1623.08, "text": "building"}, {"end": 1623.44, "start": 1623.4, "text": "on"}, {"end": 1623.72, "start": 1623.44, "text": "a"}, {"end": 1624.12, "start": 1623.72, "text": "machine"}, {"end": 1624.48, "start": 1624.12, "text": "learning"}, {"end": 1625.28, "start": 1624.48, "text": "company."}, {"end": 1625.84, "start": 1625.28, "text": "He"}, {"end": 1626.48, "start": 1625.84, "text": "actually"}, {"end": 1627.24, "start": 1626.48, "text": "hired"}, {"end": 1627.44, "start": 1627.24, "text": "some"}, {"end": 1627.48, "start": 1627.44, "text": "of"}, {"end": 1627.48, "start": 1627.48, "text": "the"}, {"end": 1627.52, "start": 1627.48, "text": "top"}, {"end": 1627.88, "start": 1627.52, "text": "names"}, {"end": 1628.0, "start": 1627.88, "text": "in"}, {"end": 1628.04, "start": 1628.0, "text": "the"}, {"end": 1628.32, "start": 1628.04, "text": "field"}, {"end": 1628.4, "start": 1628.32, "text": "at"}, {"end": 1628.56, "start": 1628.4, "text": "the"}, {"end": 1629.32, "start": 1628.56, "text": "time."}, {"end": 1629.8, "start": 1629.32, "text": "They"}, {"end": 1630.52, "start": 1629.8, "text": "tried"}, {"end": 1630.8, "start": 1630.52, "text": "to"}, {"end": 1631.2, "start": 1630.8, "text": "build"}, {"end": 1631.92, "start": 1631.2, "text": "a"}, {"end": 1632.56, "start": 1631.92, "text": "system"}, {"end": 1632.72, "start": 1632.56, 
"text": "that"}, {"end": 1633.44, "start": 1632.72, "text": "would"}, {"end": 1633.72, "start": 1633.44, "text": "search"}, {"end": 1633.92, "start": 1633.72, "text": "the"}, {"end": 1634.24, "start": 1633.92, "text": "web"}, {"end": 1634.84, "start": 1634.24, "text": "for"}, {"end": 1635.68, "start": 1634.84, "text": "resumes"}, {"end": 1635.68, "start": 1635.68, "text": "and"}, {"end": 1635.84, "start": 1635.68, "text": "use"}, {"end": 1636.2, "start": 1635.84, "text": "them"}, {"end": 1636.2, "start": 1636.2, "text": "to"}, {"end": 1636.96, "start": 1636.2, "text": "provide"}, {"end": 1637.16, "start": 1636.96, "text": "a"}, {"end": 1637.8, "start": 1637.16, "text": "facility"}, {"end": 1638.72, "start": 1637.8, "text": "for"}, {"end": 1639.56, "start": 1638.72, "text": "employers"}, {"end": 1639.72, "start": 1639.56, "text": "to"}, {"end": 1640.04, "start": 1639.72, "text": "find"}, {"end": 1640.48, "start": 1640.04, "text": "prospects."}, {"end": 1640.64, "start": 1640.48, "text": "So"}, {"end": 1640.72, "start": 1640.64, "text": "they"}, {"end": 1641.28, "start": 1640.72, "text": "labeled"}, {"end": 1641.96, "start": 1641.28, "text": "a"}, {"end": 1642.8, "start": 1641.96, "text": "large"}, {"end": 1643.24, "start": 1642.8, "text": "sample"}, {"end": 1643.48, "start": 1643.24, "text": "of"}, {"end": 1644.76, "start": 1643.48, "text": "resumes"}, {"end": 1644.92, "start": 1644.76, "text": "and"}, {"end": 1645.0, "start": 1644.92, "text": "non"}, {"end": 1645.84, "start": 1645.0, "text": "resumes"}, {"end": 1646.48, "start": 1645.84, "text": "and"}, {"end": 1647.2, "start": 1646.48, "text": "built"}, {"end": 1647.68, "start": 1647.2, "text": "a"}, {"end": 1648.12, "start": 1647.68, "text": "model"}, {"end": 1648.44, "start": 1648.12, "text": "of"}, {"end": 1649.96, "start": 1648.44, "text": "resumes."}], "text": " And he decided to invest some of the money building on a machine learning company. He actually hired some of the top names in the field at the time. 
They tried to build a system that would search the web for resumes and use them to provide a facility for employers to find prospects. So they labeled a large sample of resumes and non resumes and built a model of resumes."}, {"chunks": [{"end": 1650.52, "start": 1650.0, "text": "was"}, {"end": 1650.88, "start": 1650.52, "text": "that"}, {"end": 1651.08, "start": 1650.88, "text": "they"}, {"end": 1651.44, "start": 1651.08, "text": "were"}, {"end": 1652.08, "start": 1651.44, "text": "never"}, {"end": 1652.2, "start": 1652.08, "text": "able"}, {"end": 1652.2, "start": 1652.2, "text": "to"}, {"end": 1652.2, "start": 1652.2, "text": "improve"}, {"end": 1652.2, "start": 1652.2, "text": "on"}, {"end": 1652.32, "start": 1652.2, "text": "the"}, {"end": 1653.08, "start": 1652.32, "text": "accuracy"}, {"end": 1653.4, "start": 1653.08, "text": "of"}, {"end": 1653.64, "start": 1653.4, "text": "simple"}, {"end": 1654.36, "start": 1653.64, "text": "approaches,"}, {"end": 1654.64, "start": 1654.36, "text": "say,"}, {"end": 1654.88, "start": 1654.64, "text": "where"}, {"end": 1655.04, "start": 1654.88, "text": "you"}, {"end": 1655.36, "start": 1655.04, "text": "look"}, {"end": 1655.72, "start": 1655.36, "text": "for"}, {"end": 1656.2, "start": 1655.72, "text": "well-known"}, {"end": 1657.08, "start": 1656.2, "text": "phrases"}, {"end": 1657.16, "start": 1657.08, "text": "that"}, {"end": 1657.6, "start": 1657.16, "text": "indicate"}, {"end": 1657.6, "start": 1657.6, "text": "a"}, {"end": 1657.72, "start": 1657.6, "text": "web"}, {"end": 1657.92, "start": 1657.72, "text": "page"}, {"end": 1658.4, "start": 1657.92, "text": "is"}, {"end": 1658.72, "start": 1658.4, "text": "likely"}, {"end": 1658.72, "start": 1658.72, "text": "to"}, {"end": 1658.8, "start": 1658.72, "text": "be"}, {"end": 1659.64, "start": 1658.8, "text": "a"}, {"end": 1660.48, "start": 1659.64, "text": "resume,"}, {"end": 1661.32, "start": 1660.48, "text": "such"}, {"end": 1661.6, "start": 1661.32, "text": "as"}, 
{"end": 1662.32, "start": 1661.6, "text": "previously"}, {"end": 1662.56, "start": 1662.32, "text": "held"}, {"end": 1663.44, "start": 1662.56, "text": "positions"}, {"end": 1663.64, "start": 1663.44, "text": "or"}, {"end": 1664.44, "start": 1663.64, "text": "resume."}, {"end": 1665.08, "start": 1664.44, "text": "Unfortunately,"}, {"end": 1666.36, "start": 1665.08, "text": "too"}, {"end": 1666.72, "start": 1666.36, "text": "many"}, {"end": 1667.52, "start": 1666.72, "text": "people"}, {"end": 1668.44, "start": 1667.52, "text": "understand"}, {"end": 1669.56, "start": 1668.44, "text": "what"}, {"end": 1669.96, "start": 1669.56, "text": "a"}, {"end": 1671.08, "start": 1669.96, "text": "resume"}, {"end": 1671.64, "start": 1671.08, "text": "looks"}, {"end": 1671.96, "start": 1671.64, "text": "like,"}, {"end": 1672.28, "start": 1671.96, "text": "and"}, {"end": 1673.4, "start": 1672.28, "text": "they"}, {"end": 1674.16, "start": 1673.4, "text": "were"}, {"end": 1674.84, "start": 1674.16, "text": "unable"}, {"end": 1674.96, "start": 1674.84, "text": "to"}, {"end": 1675.24, "start": 1674.96, "text": "compete"}, {"end": 1675.32, "start": 1675.24, "text": "with"}, {"end": 1675.4, "start": 1675.32, "text": "the"}, {"end": 1676.04, "start": 1675.4, "text": "simplest"}, {"end": 1676.8, "start": 1676.04, "text": "systems"}, {"end": 1677.08, "start": 1676.8, "text": "that"}, {"end": 1677.16, "start": 1677.08, "text": "were"}, {"end": 1677.76, "start": 1677.16, "text": "under"}, {"end": 1678.36, "start": 1677.76, "text": "development"}, {"end": 1678.48, "start": 1678.36, "text": "at"}, {"end": 1678.56, "start": 1678.48, "text": "the"}, {"end": 1678.88, "start": 1678.56, "text": "same"}, {"end": 1679.96, "start": 1678.88, "text": "time."}], "text": " was that they were never able to improve on the accuracy of simple approaches, say, where you look for well-known phrases that indicate a web page is likely to be a resume, such as previously held positions or resume. 
Unfortunately, too many people understand what a resume looks like, and they were unable to compete with the simplest systems that were under development at the same time."}, {"chunks": [{"end": 1680.76, "start": 1680.0, "text": "and"}, {"end": 1681.4, "start": 1680.76, "text": "do"}, {"end": 1681.76, "start": 1681.4, "text": "the"}, {"end": 1682.08, "start": 1681.76, "text": "same"}, {"end": 1686.92, "start": 1682.08, "text": "thing."}, {"end": 1687.8, "start": 1686.92, "text": "Okay,"}, {"end": 1688.76, "start": 1687.8, "text": "so,"}, {"end": 1690.72, "start": 1688.76, "text": "okay,"}, {"end": 1691.44, "start": 1690.72, "text": "I'm"}, {"end": 1692.64, "start": 1691.44, "text": "now"}, {"end": 1693.08, "start": 1692.64, "text": "going"}, {"end": 1693.4, "start": 1693.08, "text": "to"}, {"end": 1693.68, "start": 1693.4, "text": "introduce"}, {"end": 1693.88, "start": 1693.68, "text": "you"}, {"end": 1693.88, "start": 1693.88, "text": "to"}, {"end": 1694.12, "start": 1693.88, "text": "two"}, {"end": 1694.48, "start": 1694.12, "text": "of"}, {"end": 1694.8, "start": 1694.48, "text": "my"}, {"end": 1695.92, "start": 1694.8, "text": "favorite"}, {"end": 1696.16, "start": 1695.92, "text": "big"}, {"end": 1696.72, "start": 1696.16, "text": "data"}, {"end": 1697.44, "start": 1696.72, "text": "ideas"}, {"end": 1697.72, "start": 1697.44, "text": "that"}, {"end": 1698.04, "start": 1697.72, "text": "cannot"}, {"end": 1698.16, "start": 1698.04, "text": "be"}, {"end": 1699.16, "start": 1698.16, "text": "classified"}, {"end": 1699.44, "start": 1699.16, "text": "as"}, {"end": 1699.84, "start": 1699.44, "text": "machine"}, {"end": 1700.36, "start": 1699.84, "text": "learning"}, {"end": 1700.96, "start": 1700.36, "text": "or"}, {"end": 1703.2, "start": 1700.96, "text": "statistics."}, {"end": 1703.8, "start": 1703.2, "text": "Now,"}, {"end": 1704.56, "start": 1703.8, "text": "interestingly"}, {"end": 1704.84, "start": 1704.56, "text": "though,"}, {"end": 1704.88, "start": 
1704.84, "text": "both"}, {"end": 1705.72, "start": 1704.88, "text": "require"}, {"end": 1706.4, "start": 1705.72, "text": "serious"}, {"end": 1706.64, "start": 1706.4, "text": "use"}, {"end": 1706.68, "start": 1706.64, "text": "of"}, {"end": 1707.44, "start": 1706.68, "text": "statistical"}, {"end": 1707.84, "start": 1707.44, "text": "ideas"}, {"end": 1707.88, "start": 1707.84, "text": "to"}, {"end": 1708.48, "start": 1707.88, "text": "prove"}, {"end": 1709.36, "start": 1708.48, "text": "that"}, {"end": 1709.68, "start": 1709.36, "text": "they"}, {"end": 1709.92, "start": 1709.68, "text": "work,"}, {"end": 1709.96, "start": 1709.92, "text": "okay?"}], "text": " and do the same thing. Okay, so, okay, I'm now going to introduce you to two of my favorite big data ideas that cannot be classified as machine learning or statistics. Now, interestingly though, both require serious use of statistical ideas to prove that they work, okay?"}, {"chunks": [{"end": 1710.08, "start": 1710.0, "text": "Both"}, {"end": 1710.72, "start": 1710.08, "text": "come"}, {"end": 1710.96, "start": 1710.72, "text": "out"}, {"end": 1711.16, "start": 1710.96, "text": "of"}, {"end": 1711.88, "start": 1711.16, "text": "mainstream"}, {"end": 1712.56, "start": 1711.88, "text": "computer"}, {"end": 1712.88, "start": 1712.56, "text": "science"}, {"end": 1713.4, "start": 1712.88, "text": "research,"}, {"end": 1713.72, "start": 1713.4, "text": "not"}, {"end": 1714.52, "start": 1713.72, "text": "statistics"}, {"end": 1714.52, "start": 1714.52, "text": "or"}, {"end": 1714.8, "start": 1714.52, "text": "machine"}, {"end": 1714.8, "start": 1714.8, "text": "learning."}, {"end": 1714.84, "start": 1714.8, "text": "Okay."}, {"end": 1715.12, "start": 1714.84, "text": "So"}, {"end": 1715.12, "start": 1715.12, "text": "we're"}, {"end": 1715.6, "start": 1715.12, "text": "going"}, {"end": 1715.72, "start": 1715.6, "text": "to"}, {"end": 1716.0, "start": 1715.72, "text": "get"}, {"end": 1716.48, "start": 1716.0, 
"text": "a"}, {"end": 1717.56, "start": 1716.48, "text": "very"}, {"end": 1718.56, "start": 1717.56, "text": "simple"}, {"end": 1720.0, "start": 1718.56, "text": "introduction"}, {"end": 1720.4, "start": 1720.0, "text": "to"}, {"end": 1721.12, "start": 1720.4, "text": "locality"}, {"end": 1721.96, "start": 1721.12, "text": "sensitive"}, {"end": 1722.44, "start": 1721.96, "text": "hashing"}, {"end": 1722.76, "start": 1722.44, "text": "or"}, {"end": 1723.36, "start": 1722.76, "text": "LSH."}, {"end": 1723.48, "start": 1723.36, "text": "And"}, {"end": 1724.36, "start": 1723.48, "text": "we're"}, {"end": 1724.68, "start": 1724.36, "text": "going"}, {"end": 1725.0, "start": 1724.68, "text": "to"}, {"end": 1725.08, "start": 1725.0, "text": "meet"}, {"end": 1726.0, "start": 1725.08, "text": "an"}, {"end": 1726.44, "start": 1726.0, "text": "algorithm"}, {"end": 1726.44, "start": 1726.44, "text": "due"}, {"end": 1726.68, "start": 1726.44, "text": "to"}, {"end": 1727.64, "start": 1726.68, "text": "Flajolet"}, {"end": 1728.56, "start": 1727.64, "text": "and"}, {"end": 1728.96, "start": 1728.56, "text": "Martin"}, {"end": 1728.96, "start": 1728.96, "text": "for"}, {"end": 1729.88, "start": 1728.96, "text": "counting"}, {"end": 1729.88, "start": 1729.88, "text": "the"}, {"end": 1729.88, "start": 1729.88, "text": "number"}, {"end": 1729.92, "start": 1729.88, "text": "of"}, {"end": 1731.16, "start": 1729.92, "text": "distinct"}, {"end": 1731.56, "start": 1731.16, "text": "elements"}, {"end": 1731.68, "start": 1731.56, "text": "in"}, {"end": 1734.04, "start": 1731.68, "text": "a"}, {"end": 1734.44, "start": 1734.04, "text": "list."}, {"end": 1734.96, "start": 1734.44, "text": "But"}, {"end": 1735.48, "start": 1734.96, "text": "before"}, {"end": 1735.84, "start": 1735.48, "text": "going"}, {"end": 1736.04, "start": 1735.84, "text": "on,"}, {"end": 1736.56, "start": 1736.04, "text": "I"}, {"end": 1736.8, "start": 1736.56, "text": "want"}, {"end": 1736.8, "start": 1736.8, "text": 
"to"}, {"end": 1737.2, "start": 1736.8, "text": "plug"}, {"end": 1737.44, "start": 1737.2, "text": "my"}, {"end": 1737.6, "start": 1737.44, "text": "book."}, {"end": 1737.8, "start": 1737.6, "text": "I"}, {"end": 1737.8, "start": 1737.8, "text": "don't"}, {"end": 1737.8, "start": 1737.8, "text": "feel"}, {"end": 1737.8, "start": 1737.8, "text": "guilty"}, {"end": 1738.36, "start": 1737.8, "text": "about"}, {"end": 1738.8, "start": 1738.36, "text": "this"}, {"end": 1739.36, "start": 1738.8, "text": "because"}, {"end": 1739.96, "start": 1739.36, "text": "it's"}], "text": " Both come out of mainstream computer science research, not statistics or machine learning. Okay. So we're going to get a very simple introduction to locality sensitive hashing or LSH. And we're going to meet an algorithm due to Flajolet and Martin for counting the number of distinct elements in a list. But before going on, I want to plug my book. I don't feel guilty about this because it's"}, {"chunks": [{"end": 1740.36, "start": 1740.0, "text": "It's"}, {"end": 1740.4, "start": 1740.36, "text": "a"}, {"end": 1741.08, "start": 1740.4, "text": "freebie."}, {"end": 1741.4, "start": 1741.08, "text": "You"}, {"end": 1742.64, "start": 1741.4, "text": "can"}, {"end": 1744.32, "start": 1742.64, "text": "learn"}, {"end": 1744.64, "start": 1744.32, "text": "much"}, {"end": 1744.92, "start": 1744.64, "text": "more"}, {"end": 1745.24, "start": 1744.92, "text": "about"}, {"end": 1745.6, "start": 1745.24, "text": "these"}, {"end": 1746.2, "start": 1745.6, "text": "two"}, {"end": 1746.76, "start": 1746.2, "text": "topics"}, {"end": 1746.84, "start": 1746.76, "text": "and"}, {"end": 1747.4, "start": 1746.84, "text": "others"}, {"end": 1748.08, "start": 1747.4, "text": "involving"}, {"end": 1748.48, "start": 1748.08, "text": "large"}, {"end": 1748.96, "start": 1748.48, "text": "scale"}, {"end": 1749.92, "start": 1748.96, "text": "data."}, {"end": 1750.4, "start": 1749.92, "text": "And"}, {"end": 1750.64, "start": 
1750.4, "text": "as"}, {"end": 1750.84, "start": 1750.64, "text": "I"}, {"end": 1751.16, "start": 1750.84, "text": "said,"}, {"end": 1751.36, "start": 1751.16, "text": "the"}, {"end": 1751.72, "start": 1751.36, "text": "best"}, {"end": 1752.0, "start": 1751.72, "text": "thing"}, {"end": 1752.64, "start": 1752.0, "text": "about"}, {"end": 1752.8, "start": 1752.64, "text": "it"}, {"end": 1752.8, "start": 1752.8, "text": "is"}, {"end": 1753.08, "start": 1752.8, "text": "that"}, {"end": 1753.08, "start": 1753.08, "text": "although"}, {"end": 1753.28, "start": 1753.08, "text": "it's"}, {"end": 1753.84, "start": 1753.28, "text": "published"}, {"end": 1753.84, "start": 1753.84, "text": "by"}, {"end": 1754.32, "start": 1753.84, "text": "Cambridge"}, {"end": 1755.04, "start": 1754.32, "text": "University"}, {"end": 1755.44, "start": 1755.04, "text": "Press,"}, {"end": 1755.6, "start": 1755.44, "text": "and"}, {"end": 1755.6, "start": 1755.6, "text": "they"}, {"end": 1755.76, "start": 1755.6, "text": "will"}, {"end": 1756.2, "start": 1755.76, "text": "sell"}, {"end": 1756.52, "start": 1756.2, "text": "you"}, {"end": 1756.68, "start": 1756.52, "text": "a"}, {"end": 1756.68, "start": 1756.68, "text": "copy,"}, {"end": 1756.76, "start": 1756.68, "text": "they"}, {"end": 1757.2, "start": 1756.76, "text": "let"}, {"end": 1757.6, "start": 1757.2, "text": "us"}, {"end": 1758.12, "start": 1757.6, "text": "offer"}, {"end": 1758.6, "start": 1758.12, "text": "free"}, {"end": 1759.68, "start": 1758.6, "text": "downloads,"}, {"end": 1760.04, "start": 1759.68, "text": "which"}, {"end": 1760.08, "start": 1760.04, "text": "we"}, {"end": 1760.52, "start": 1760.08, "text": "do"}, {"end": 1761.12, "start": 1760.52, "text": "at"}, {"end": 1761.36, "start": 1761.12, "text": "the"}, {"end": 1761.64, "start": 1761.36, "text": "site"}, {"end": 1763.36, "start": 1761.64, "text": "www.mmds,"}, {"end": 1763.64, "start": 1763.36, "text": "standing"}, {"end": 1763.8, "start": 1763.64, "text": "for,"}, 
{"end": 1763.84, "start": 1763.8, "text": "of"}, {"end": 1764.28, "start": 1763.84, "text": "course,"}, {"end": 1764.68, "start": 1764.28, "text": "mining"}, {"end": 1764.68, "start": 1764.68, "text": "of"}, {"end": 1765.08, "start": 1764.68, "text": "massive"}, {"end": 1766.52, "start": 1765.08, "text": "datasets,"}, {"end": 1769.96, "start": 1766.52, "text": ".org."}], "text": " It's a freebie. You can learn much more about these two topics and others involving large scale data. And as I said, the best thing about it is that although it's published by Cambridge University Press, and they will sell you a copy, they let us offer free downloads, which we do at the site www.mmds, standing for, of course, mining of massive datasets, .org."}, {"chunks": [{"end": 1772.28, "start": 1770.0, "text": "Now,"}, {"end": 1772.52, "start": 1772.28, "text": "the"}, {"end": 1773.0, "start": 1772.52, "text": "purpose"}, {"end": 1773.2, "start": 1773.0, "text": "of"}, {"end": 1774.0, "start": 1773.2, "text": "LSH"}, {"end": 1774.4, "start": 1774.0, "text": "then"}, {"end": 1774.48, "start": 1774.4, "text": "is"}, {"end": 1775.12, "start": 1774.48, "text": "to"}, {"end": 1775.76, "start": 1775.12, "text": "enable"}, {"end": 1776.24, "start": 1775.76, "text": "us"}, {"end": 1776.84, "start": 1776.24, "text": "to"}, {"end": 1777.52, "start": 1776.84, "text": "find"}, {"end": 1778.04, "start": 1777.52, "text": "pairs"}, {"end": 1778.4, "start": 1778.04, "text": "of"}, {"end": 1779.0, "start": 1778.4, "text": "items"}, {"end": 1779.56, "start": 1779.0, "text": "in"}, {"end": 1779.96, "start": 1779.56, "text": "a"}, {"end": 1780.48, "start": 1779.96, "text": "large"}, {"end": 1781.24, "start": 1780.48, "text": "set"}, {"end": 1781.76, "start": 1781.24, "text": "that"}, {"end": 1782.64, "start": 1781.76, "text": "are"}, {"end": 1783.12, "start": 1782.64, "text": "in"}, {"end": 1783.56, "start": 1783.12, "text": "some"}, {"end": 1785.24, "start": 1783.56, "text": "sense"}, {"end": 1785.64, 
"start": 1785.24, "text": "similar,"}, {"end": 1785.96, "start": 1785.64, "text": "without"}, {"end": 1786.52, "start": 1785.96, "text": "the"}, {"end": 1787.28, "start": 1786.52, "text": "quadratic"}, {"end": 1788.6, "start": 1787.28, "text": "process"}, {"end": 1789.04, "start": 1788.6, "text": "of"}, {"end": 1789.96, "start": 1789.04, "text": "evaluating"}, {"end": 1790.12, "start": 1789.96, "text": "each"}, {"end": 1790.4, "start": 1790.12, "text": "pair"}, {"end": 1791.2, "start": 1790.4, "text": "separately"}, {"end": 1791.28, "start": 1791.2, "text": "and"}, {"end": 1791.72, "start": 1791.28, "text": "deciding"}, {"end": 1791.8, "start": 1791.72, "text": "on"}, {"end": 1792.0, "start": 1791.8, "text": "the"}, {"end": 1792.76, "start": 1792.0, "text": "similarity"}, {"end": 1792.88, "start": 1792.76, "text": "of"}, {"end": 1793.0, "start": 1792.88, "text": "each"}, {"end": 1793.4, "start": 1793.0, "text": "pair"}, {"end": 1793.6, "start": 1793.4, "text": "separately."}, {"end": 1795.68, "start": 1793.6, "text": "Now,"}, {"end": 1796.12, "start": 1795.68, "text": "I'm"}, {"end": 1796.96, "start": 1796.12, "text": "just"}, {"end": 1797.48, "start": 1796.96, "text": "gonna"}, {"end": 1797.92, "start": 1797.48, "text": "give"}, {"end": 1798.4, "start": 1797.92, "text": "you"}, {"end": 1799.08, "start": 1798.4, "text": "an"}, {"end": 1799.96, "start": 1799.08, "text": "example,"}], "text": " Now, the purpose of LSH then is to enable us to find pairs of items in a large set that are in some sense similar, without the quadratic process of evaluating each pair separately and deciding on the similarity of each pair separately. 
Now, I'm just gonna give you an example,"}, {"chunks": [{"end": 1800.52, "start": 1800.0, "text": "I'm"}, {"end": 1800.96, "start": 1800.52, "text": "going"}, {"end": 1801.16, "start": 1800.96, "text": "to"}, {"end": 1801.48, "start": 1801.16, "text": "focus"}, {"end": 1801.48, "start": 1801.48, "text": "on"}, {"end": 1801.68, "start": 1801.48, "text": "a"}, {"end": 1801.84, "start": 1801.68, "text": "very"}, {"end": 1802.68, "start": 1801.84, "text": "particular"}, {"end": 1803.36, "start": 1802.68, "text": "application"}, {"end": 1803.84, "start": 1803.36, "text": "called"}, {"end": 1804.04, "start": 1803.84, "text": "entity"}, {"end": 1804.6, "start": 1804.04, "text": "resolution."}, {"end": 1804.76, "start": 1804.6, "text": "In"}, {"end": 1805.32, "start": 1804.76, "text": "this"}, {"end": 1805.88, "start": 1805.32, "text": "sort"}, {"end": 1806.6, "start": 1805.88, "text": "of"}, {"end": 1806.88, "start": 1806.6, "text": "problem,"}, {"end": 1806.92, "start": 1806.88, "text": "we"}, {"end": 1807.48, "start": 1806.92, "text": "have"}, {"end": 1807.88, "start": 1807.48, "text": "a"}, {"end": 1808.36, "start": 1807.88, "text": "large"}, {"end": 1808.76, "start": 1808.36, "text": "set"}, {"end": 1809.32, "start": 1808.76, "text": "of"}, {"end": 1810.08, "start": 1809.32, "text": "records"}, {"end": 1810.28, "start": 1810.08, "text": "that"}, {"end": 1810.8, "start": 1810.28, "text": "represent"}, {"end": 1811.36, "start": 1810.8, "text": "entities."}, {"end": 1811.48, "start": 1811.36, "text": "And"}, {"end": 1811.52, "start": 1811.48, "text": "I'm"}, {"end": 1811.76, "start": 1811.52, "text": "going"}, {"end": 1811.92, "start": 1811.76, "text": "to"}, {"end": 1812.2, "start": 1811.92, "text": "assume"}, {"end": 1812.84, "start": 1812.2, "text": "that"}, {"end": 1813.28, "start": 1812.84, "text": "entities"}, {"end": 1813.28, "start": 1813.28, "text": "are"}, {"end": 1813.6, "start": 1813.28, "text": "people,"}, {"end": 1814.4, "start": 1813.6, "text": 
"although"}, {"end": 1815.24, "start": 1814.4, "text": "obviously"}, {"end": 1815.8, "start": 1815.24, "text": "they"}, {"end": 1816.44, "start": 1815.8, "text": "don't"}, {"end": 1817.16, "start": 1816.44, "text": "have"}, {"end": 1817.88, "start": 1817.16, "text": "to"}, {"end": 1818.52, "start": 1817.88, "text": "be."}, {"end": 1818.92, "start": 1818.52, "text": "The"}, {"end": 1819.48, "start": 1818.92, "text": "records"}, {"end": 1819.76, "start": 1819.48, "text": "will"}, {"end": 1820.24, "start": 1819.76, "text": "have"}, {"end": 1820.56, "start": 1820.24, "text": "some"}, {"end": 1821.36, "start": 1820.56, "text": "fields"}, {"end": 1821.4, "start": 1821.36, "text": "like"}, {"end": 1821.92, "start": 1821.4, "text": "say"}, {"end": 1822.36, "start": 1821.92, "text": "name"}, {"end": 1822.6, "start": 1822.36, "text": "and"}, {"end": 1822.84, "start": 1822.6, "text": "phone."}, {"end": 1822.92, "start": 1822.84, "text": "And"}, {"end": 1822.92, "start": 1822.92, "text": "our"}, {"end": 1823.2, "start": 1822.92, "text": "job"}, {"end": 1823.32, "start": 1823.2, "text": "is"}, {"end": 1823.6, "start": 1823.32, "text": "to"}, {"end": 1824.8, "start": 1823.6, "text": "find"}, {"end": 1825.56, "start": 1824.8, "text": "records"}, {"end": 1827.04, "start": 1825.56, "text": "that"}, {"end": 1828.24, "start": 1827.04, "text": "represent"}, {"end": 1828.48, "start": 1828.24, "text": "the"}, {"end": 1828.8, "start": 1828.48, "text": "same"}, {"end": 1829.96, "start": 1828.8, "text": "person."}], "text": " I'm going to focus on a very particular application called entity resolution. In this sort of problem, we have a large set of records that represent entities. And I'm going to assume that entities are people, although obviously they don't have to be. The records will have some fields like say name and phone. 
And our job is to find records that represent the same person."}, {"chunks": [{"end": 1830.76, "start": 1830.0, "text": "For"}, {"end": 1831.56, "start": 1830.76, "text": "example,"}, {"end": 1831.76, "start": 1831.56, "text": "a"}, {"end": 1831.8, "start": 1831.76, "text": "credit"}, {"end": 1832.2, "start": 1831.8, "text": "rating"}, {"end": 1832.88, "start": 1832.2, "text": "agency"}, {"end": 1833.4, "start": 1832.88, "text": "gets"}, {"end": 1833.68, "start": 1833.4, "text": "input"}, {"end": 1834.2, "start": 1833.68, "text": "records"}, {"end": 1834.28, "start": 1834.2, "text": "from"}, {"end": 1834.52, "start": 1834.28, "text": "many"}, {"end": 1834.84, "start": 1834.52, "text": "different"}, {"end": 1835.32, "start": 1834.84, "text": "sources,"}, {"end": 1836.08, "start": 1835.32, "text": "such"}, {"end": 1836.8, "start": 1836.08, "text": "as"}, {"end": 1837.2, "start": 1836.8, "text": "credit"}, {"end": 1837.56, "start": 1837.2, "text": "card"}, {"end": 1839.08, "start": 1837.56, "text": "transactions,"}, {"end": 1839.32, "start": 1839.08, "text": "bank"}, {"end": 1839.96, "start": 1839.32, "text": "records,"}, {"end": 1840.32, "start": 1839.96, "text": "and"}, {"end": 1840.72, "start": 1840.32, "text": "needs"}, {"end": 1841.12, "start": 1840.72, "text": "to"}, {"end": 1841.72, "start": 1841.12, "text": "determine"}, {"end": 1842.12, "start": 1841.72, "text": "which"}, {"end": 1842.84, "start": 1842.12, "text": "records"}, {"end": 1843.52, "start": 1842.84, "text": "refer"}, {"end": 1843.76, "start": 1843.52, "text": "to"}, {"end": 1844.16, "start": 1843.76, "text": "the"}, {"end": 1844.56, "start": 1844.16, "text": "same"}, {"end": 1845.04, "start": 1844.56, "text": "person."}, {"end": 1846.88, "start": 1845.04, "text": "Okay."}, {"end": 1847.24, "start": 1846.88, "text": "Now,"}, {"end": 1847.4, "start": 1847.24, "text": "the"}, {"end": 1848.12, "start": 1847.4, "text": "problem"}, {"end": 1848.88, "start": 1848.12, "text": "can"}, {"end": 1848.88, 
"start": 1848.88, "text": "be"}, {"end": 1849.56, "start": 1848.88, "text": "tricky"}, {"end": 1849.84, "start": 1849.56, "text": "because"}, {"end": 1850.08, "start": 1849.84, "text": "the"}, {"end": 1850.52, "start": 1850.08, "text": "records"}, {"end": 1850.72, "start": 1850.52, "text": "can"}, {"end": 1851.2, "start": 1850.72, "text": "contain"}, {"end": 1851.92, "start": 1851.2, "text": "errors,"}, {"end": 1853.04, "start": 1851.92, "text": "misspellings,"}, {"end": 1853.52, "start": 1853.04, "text": "or"}, {"end": 1854.84, "start": 1853.52, "text": "one"}, {"end": 1854.92, "start": 1854.84, "text": "might"}, {"end": 1855.32, "start": 1854.92, "text": "have"}, {"end": 1855.72, "start": 1855.32, "text": "your"}, {"end": 1856.48, "start": 1855.72, "text": "landline"}, {"end": 1857.08, "start": 1856.48, "text": "phone,"}, {"end": 1857.08, "start": 1857.08, "text": "the"}, {"end": 1857.08, "start": 1857.08, "text": "other"}, {"end": 1857.84, "start": 1857.08, "text": "your"}, {"end": 1858.36, "start": 1857.84, "text": "cell"}, {"end": 1858.8, "start": 1858.36, "text": "phone,"}, {"end": 1859.36, "start": 1858.8, "text": "so"}, {"end": 1859.4, "start": 1859.36, "text": "you"}, {"end": 1859.56, "start": 1859.4, "text": "don't"}, {"end": 1859.72, "start": 1859.56, "text": "look"}, {"end": 1859.72, "start": 1859.72, "text": "like"}, {"end": 1859.72, "start": 1859.72, "text": "the"}, {"end": 1859.72, "start": 1859.72, "text": "same"}, {"end": 1859.96, "start": 1859.72, "text": "person."}], "text": " For example, a credit rating agency gets input records from many different sources, such as credit card transactions, bank records, and needs to determine which records refer to the same person. Okay. 
Now, the problem can be tricky because the records can contain errors, misspellings, or one might have your landline phone, the other your cell phone, so you don't look like the same person."}, {"chunks": [{"end": 1860.12, "start": 1860.0, "text": "in"}, {"end": 1860.6, "start": 1860.12, "text": "person"}, {"end": 1860.84, "start": 1860.6, "text": "and"}, {"end": 1861.2, "start": 1860.84, "text": "so"}, {"end": 1861.56, "start": 1861.2, "text": "on."}, {"end": 1862.44, "start": 1861.56, "text": "And"}, {"end": 1863.68, "start": 1862.44, "text": "looking"}, {"end": 1864.12, "start": 1863.68, "text": "at"}, {"end": 1864.2, "start": 1864.12, "text": "all"}, {"end": 1865.68, "start": 1864.2, "text": "pairs"}, {"end": 1866.24, "start": 1865.68, "text": "can"}, {"end": 1866.88, "start": 1866.24, "text": "be"}, {"end": 1867.28, "start": 1866.88, "text": "painful."}, {"end": 1867.84, "start": 1867.28, "text": "If"}, {"end": 1868.32, "start": 1867.84, "text": "I"}, {"end": 1869.04, "start": 1868.32, "text": "have"}, {"end": 1869.2, "start": 1869.04, "text": "even"}, {"end": 1869.48, "start": 1869.2, "text": "a"}, {"end": 1870.04, "start": 1869.48, "text": "million"}, {"end": 1871.72, "start": 1870.04, "text": "records,"}, {"end": 1872.2, "start": 1871.72, "text": "which"}, {"end": 1874.6, "start": 1872.2, "text": "of"}, {"end": 1875.2, "start": 1874.6, "text": "course"}, {"end": 1875.44, "start": 1875.2, "text": "is"}, {"end": 1875.84, "start": 1875.44, "text": "not"}, {"end": 1876.12, "start": 1875.84, "text": "a"}, {"end": 1876.44, "start": 1876.12, "text": "very"}, {"end": 1876.84, "start": 1876.44, "text": "big"}, {"end": 1877.24, "start": 1876.84, "text": "set"}, {"end": 1877.52, "start": 1877.24, "text": "in"}, {"end": 1878.24, "start": 1877.52, "text": "comparison"}, {"end": 1879.08, "start": 1878.24, "text": "to"}, {"end": 1879.4, "start": 1879.08, "text": "say"}, {"end": 1880.44, "start": 1879.4, "text": "credit"}, {"end": 1880.76, "start": 1880.44, "text": 
"card"}, {"end": 1881.76, "start": 1880.76, "text": "transaction"}, {"end": 1882.6, "start": 1881.76, "text": "records,"}, {"end": 1883.2, "start": 1882.6, "text": "that"}, {"end": 1883.96, "start": 1883.2, "text": "implies"}, {"end": 1884.36, "start": 1883.96, "text": "a"}, {"end": 1884.72, "start": 1884.36, "text": "half"}, {"end": 1884.8, "start": 1884.72, "text": "a"}, {"end": 1885.52, "start": 1884.8, "text": "trillion"}, {"end": 1889.96, "start": 1885.52, "text": "pairs."}], "text": " in person and so on. And looking at all pairs can be painful. If I have even a million records, which of course is not a very big set in comparison to say credit card transaction records, that implies a half a trillion pairs."}, {"chunks": [{"end": 1890.4, "start": 1890.0, "text": "Now,"}, {"end": 1891.12, "start": 1890.4, "text": "this"}, {"end": 1891.84, "start": 1891.12, "text": "problem"}, {"end": 1892.28, "start": 1891.84, "text": "is"}, {"end": 1892.96, "start": 1892.28, "text": "not"}, {"end": 1893.0, "start": 1892.96, "text": "one"}, {"end": 1893.36, "start": 1893.0, "text": "that"}, {"end": 1893.36, "start": 1893.36, "text": "can"}, {"end": 1893.4, "start": 1893.36, "text": "be"}, {"end": 1893.88, "start": 1893.4, "text": "solved"}, {"end": 1894.08, "start": 1893.88, "text": "by"}, {"end": 1894.44, "start": 1894.08, "text": "just"}, {"end": 1894.92, "start": 1894.44, "text": "finding"}, {"end": 1895.32, "start": 1894.92, "text": "a"}, {"end": 1895.68, "start": 1895.32, "text": "model."}, {"end": 1896.08, "start": 1895.68, "text": "It's"}, {"end": 1896.6, "start": 1896.08, "text": "true,"}, {"end": 1897.28, "start": 1896.6, "text": "actually,"}, {"end": 1897.44, "start": 1897.28, "text": "you"}, {"end": 1897.64, "start": 1897.44, "text": "can"}, {"end": 1897.96, "start": 1897.64, "text": "use"}, {"end": 1898.28, "start": 1897.96, "text": "machine"}, {"end": 1898.32, "start": 1898.28, "text": "learning"}, {"end": 1898.84, "start": 1898.32, "text": "to"}, {"end": 1899.36, 
"start": 1898.84, "text": "build"}, {"end": 1899.76, "start": 1899.36, "text": "a"}, {"end": 1900.2, "start": 1899.76, "text": "model"}, {"end": 1900.96, "start": 1900.2, "text": "of"}, {"end": 1901.48, "start": 1900.96, "text": "when"}, {"end": 1902.56, "start": 1901.48, "text": "records"}, {"end": 1902.8, "start": 1902.56, "text": "are"}, {"end": 1902.88, "start": 1902.8, "text": "similar"}, {"end": 1902.88, "start": 1902.88, "text": "enough"}, {"end": 1903.04, "start": 1902.88, "text": "to"}, {"end": 1903.16, "start": 1903.04, "text": "be"}, {"end": 1903.4, "start": 1903.16, "text": "considered"}, {"end": 1903.68, "start": 1903.4, "text": "the"}, {"end": 1904.0, "start": 1903.68, "text": "same"}, {"end": 1904.44, "start": 1904.0, "text": "person,"}, {"end": 1904.72, "start": 1904.44, "text": "but"}, {"end": 1905.0, "start": 1904.72, "text": "that's"}, {"end": 1905.52, "start": 1905.0, "text": "not"}, {"end": 1905.88, "start": 1905.52, "text": "the"}, {"end": 1906.48, "start": 1905.88, "text": "problem."}, {"end": 1907.04, "start": 1906.48, "text": "It's"}, {"end": 1907.24, "start": 1907.04, "text": "not"}, {"end": 1907.6, "start": 1907.24, "text": "the"}, {"end": 1908.76, "start": 1907.6, "text": "bottleneck."}, {"end": 1908.84, "start": 1908.76, "text": "The"}, {"end": 1909.36, "start": 1908.84, "text": "problem"}, {"end": 1909.56, "start": 1909.36, "text": "is"}, {"end": 1909.72, "start": 1909.56, "text": "you"}, {"end": 1910.44, "start": 1909.72, "text": "can't"}, {"end": 1910.92, "start": 1910.44, "text": "afford"}, {"end": 1910.96, "start": 1910.92, "text": "to"}, {"end": 1911.44, "start": 1910.96, "text": "look"}, {"end": 1911.8, "start": 1911.44, "text": "at"}, {"end": 1911.92, "start": 1911.8, "text": "each"}, {"end": 1912.24, "start": 1911.92, "text": "pair"}, {"end": 1912.32, "start": 1912.24, "text": "of"}, {"end": 1912.96, "start": 1912.32, "text": "records"}, {"end": 1913.84, "start": 1912.96, "text": "and"}, {"end": 1914.12, "start": 1913.84, 
"text": "make"}, {"end": 1914.68, "start": 1914.12, "text": "the"}, {"end": 1915.04, "start": 1914.68, "text": "same"}, {"end": 1915.56, "start": 1915.04, "text": "person"}, {"end": 1915.72, "start": 1915.56, "text": "or"}, {"end": 1916.04, "start": 1915.72, "text": "different"}, {"end": 1916.64, "start": 1916.04, "text": "person"}, {"end": 1919.96, "start": 1916.64, "text": "decision."}], "text": " Now, this problem is not one that can be solved by just finding a model. It's true, actually, you can use machine learning to build a model of when records are similar enough to be considered the same person, but that's not the problem. It's not the bottleneck. The problem is you can't afford to look at each pair of records and make the same person or different person decision."}, {"chunks": [{"end": 1923.92, "start": 1920.0, "text": "Now,"}, {"end": 1924.52, "start": 1923.92, "text": "okay,"}, {"end": 1925.04, "start": 1924.52, "text": "so"}, {"end": 1925.28, "start": 1925.04, "text": "the"}, {"end": 1925.72, "start": 1925.28, "text": "cool"}, {"end": 1926.56, "start": 1925.72, "text": "thing"}, {"end": 1927.16, "start": 1926.56, "text": "about"}, {"end": 1927.96, "start": 1927.16, "text": "locality"}, {"end": 1928.48, "start": 1927.96, "text": "sensitive"}, {"end": 1929.16, "start": 1928.48, "text": "hashing"}, {"end": 1929.36, "start": 1929.16, "text": "is"}, {"end": 1929.72, "start": 1929.36, "text": "that"}, {"end": 1929.96, "start": 1929.72, "text": "if"}, {"end": 1930.72, "start": 1929.96, "text": "you're"}, {"end": 1931.04, "start": 1930.72, "text": "willing"}, {"end": 1931.68, "start": 1931.04, "text": "to"}, {"end": 1932.2, "start": 1931.68, "text": "accept"}, {"end": 1932.2, "start": 1932.2, "text": "a"}, {"end": 1932.44, "start": 1932.2, "text": "few"}, {"end": 1932.84, "start": 1932.44, "text": "false"}, {"end": 1933.44, "start": 1932.84, "text": "negatives,"}, {"end": 1933.6, "start": 1933.44, "text": "in"}, {"end": 1933.64, "start": 1933.6, "text": 
"our"}, {"end": 1934.64, "start": 1933.64, "text": "example,"}, {"end": 1935.28, "start": 1934.64, "text": "that"}, {"end": 1935.56, "start": 1935.28, "text": "would"}, {"end": 1935.76, "start": 1935.56, "text": "mean"}, {"end": 1936.16, "start": 1935.76, "text": "missing"}, {"end": 1936.28, "start": 1936.16, "text": "a"}, {"end": 1936.52, "start": 1936.28, "text": "few"}, {"end": 1936.88, "start": 1936.52, "text": "pairs"}, {"end": 1936.92, "start": 1936.88, "text": "of"}, {"end": 1937.68, "start": 1936.92, "text": "records"}, {"end": 1938.0, "start": 1937.68, "text": "that"}, {"end": 1938.32, "start": 1938.0, "text": "do"}, {"end": 1939.0, "start": 1938.32, "text": "represent"}, {"end": 1939.08, "start": 1939.0, "text": "the"}, {"end": 1939.44, "start": 1939.08, "text": "same"}, {"end": 1939.84, "start": 1939.44, "text": "person,"}, {"end": 1939.84, "start": 1939.84, "text": "then"}, {"end": 1939.84, "start": 1939.84, "text": "you"}, {"end": 1939.92, "start": 1939.84, "text": "can"}, {"end": 1941.44, "start": 1939.92, "text": "cut"}, {"end": 1941.88, "start": 1941.44, "text": "way"}, {"end": 1942.64, "start": 1941.88, "text": "down"}, {"end": 1943.08, "start": 1942.64, "text": "on"}, {"end": 1943.32, "start": 1943.08, "text": "the"}, {"end": 1943.44, "start": 1943.32, "text": "number"}, {"end": 1943.68, "start": 1943.44, "text": "of"}, {"end": 1944.24, "start": 1943.68, "text": "pairs"}, {"end": 1944.36, "start": 1944.24, "text": "you"}, {"end": 1945.2, "start": 1944.36, "text": "actually"}, {"end": 1945.28, "start": 1945.2, "text": "have"}, {"end": 1945.72, "start": 1945.28, "text": "to"}, {"end": 1945.96, "start": 1945.72, "text": "evaluate"}, {"end": 1946.48, "start": 1945.96, "text": "for"}, {"end": 1948.44, "start": 1946.48, "text": "similarity."}, {"end": 1948.88, "start": 1948.44, "text": "And"}, {"end": 1949.24, "start": 1948.88, "text": "thus"}, {"end": 1949.84, "start": 1949.24, "text": "your"}, {"end": 1949.96, "start": 1949.84, "text": "algorithm"}], 
"text": " Now, okay, so the cool thing about locality sensitive hashing is that if you're willing to accept a few false negatives, in our example, that would mean missing a few pairs of records that do represent the same person, then you can cut way down on the number of pairs you actually have to evaluate for similarity. And thus your algorithm"}, {"chunks": [{"end": 1950.36, "start": 1950.0, "text": "gone"}, {"end": 1950.48, "start": 1950.36, "text": "in"}, {"end": 1950.68, "start": 1950.48, "text": "time"}, {"end": 1950.92, "start": 1950.68, "text": "that's"}, {"end": 1951.44, "start": 1950.92, "text": "much"}, {"end": 1952.12, "start": 1951.44, "text": "faster"}, {"end": 1952.48, "start": 1952.12, "text": "than"}, {"end": 1953.16, "start": 1952.48, "text": "quadratic"}, {"end": 1953.16, "start": 1953.16, "text": "in"}, {"end": 1953.44, "start": 1953.16, "text": "the"}, {"end": 1953.68, "start": 1953.44, "text": "number"}, {"end": 1953.92, "start": 1953.68, "text": "of"}, {"end": 1955.52, "start": 1953.92, "text": "records."}, {"end": 1956.04, "start": 1955.52, "text": "And"}, {"end": 1956.4, "start": 1956.04, "text": "remember,"}, {"end": 1956.6, "start": 1956.4, "text": "you"}, {"end": 1956.96, "start": 1956.6, "text": "can't"}, {"end": 1958.08, "start": 1956.96, "text": "really"}, {"end": 1958.64, "start": 1958.08, "text": "look,"}, {"end": 1959.36, "start": 1958.64, "text": "do"}, {"end": 1959.6, "start": 1959.36, "text": "anything"}, {"end": 1959.84, "start": 1959.6, "text": "that's"}, {"end": 1960.8, "start": 1959.84, "text": "quadratic"}, {"end": 1961.2, "start": 1960.8, "text": "in"}, {"end": 1961.56, "start": 1961.2, "text": "a"}, {"end": 1962.12, "start": 1961.56, "text": "number"}, {"end": 1962.72, "start": 1962.12, "text": "that's"}, {"end": 1963.04, "start": 1962.72, "text": "way"}, {"end": 1963.56, "start": 1963.04, "text": "up"}, {"end": 1964.0, "start": 1963.56, "text": "in"}, {"end": 1964.44, "start": 1964.0, "text": "the"}, {"end": 1964.72, 
"start": 1964.44, "text": "many,"}, {"end": 1965.0, "start": 1964.72, "text": "many"}, {"end": 1965.32, "start": 1965.0, "text": "millions,"}, {"end": 1965.64, "start": 1965.32, "text": "let's"}, {"end": 1965.92, "start": 1965.64, "text": "say,"}, {"end": 1966.08, "start": 1965.92, "text": "or"}, {"end": 1970.12, "start": 1966.08, "text": "billions."}, {"end": 1970.68, "start": 1970.12, "text": "Now,"}, {"end": 1971.28, "start": 1970.68, "text": "what"}, {"end": 1971.72, "start": 1971.28, "text": "you"}, {"end": 1972.16, "start": 1971.72, "text": "need"}, {"end": 1972.36, "start": 1972.16, "text": "to"}, {"end": 1972.64, "start": 1972.36, "text": "do"}, {"end": 1973.0, "start": 1972.64, "text": "for"}, {"end": 1974.52, "start": 1973.0, "text": "locality-sensitive"}, {"end": 1974.92, "start": 1974.52, "text": "hashing"}, {"end": 1975.48, "start": 1974.92, "text": "might"}, {"end": 1976.16, "start": 1975.48, "text": "appear"}, {"end": 1976.64, "start": 1976.16, "text": "at"}, {"end": 1976.68, "start": 1976.64, "text": "first"}, {"end": 1977.28, "start": 1976.68, "text": "glance"}, {"end": 1977.76, "start": 1977.28, "text": "to"}, {"end": 1977.92, "start": 1977.76, "text": "be"}, {"end": 1979.0, "start": 1977.92, "text": "magic."}, {"end": 1979.16, "start": 1979.0, "text": "You"}, {"end": 1979.92, "start": 1979.16, "text": "want"}, {"end": 1979.96, "start": 1979.92, "text": "to,"}], "text": " gone in time that's much faster than quadratic in the number of records. And remember, you can't really look, do anything that's quadratic in a number that's way up in the many, many millions, let's say, or billions. Now, what you need to do for locality-sensitive hashing might appear at first glance to be magic. 
You want to,"}, {"chunks": [{"end": 1980.68, "start": 1980.0, "text": "invent"}, {"end": 1981.04, "start": 1980.68, "text": "hash"}, {"end": 1981.48, "start": 1981.04, "text": "functions,"}, {"end": 1981.6, "start": 1981.48, "text": "several"}, {"end": 1982.2, "start": 1981.6, "text": "hash"}, {"end": 1983.04, "start": 1982.2, "text": "functions,"}, {"end": 1983.4, "start": 1983.04, "text": "and"}, {"end": 1983.64, "start": 1983.4, "text": "then"}, {"end": 1984.12, "start": 1983.64, "text": "each"}, {"end": 1984.4, "start": 1984.12, "text": "of"}, {"end": 1984.96, "start": 1984.4, "text": "which"}, {"end": 1985.32, "start": 1984.96, "text": "has"}, {"end": 1985.52, "start": 1985.32, "text": "the"}, {"end": 1985.52, "start": 1985.52, "text": "property"}, {"end": 1986.24, "start": 1985.52, "text": "that"}, {"end": 1986.44, "start": 1986.24, "text": "if"}, {"end": 1986.72, "start": 1986.44, "text": "two"}, {"end": 1987.64, "start": 1986.72, "text": "records"}, {"end": 1987.84, "start": 1987.64, "text": "are"}, {"end": 1988.36, "start": 1987.84, "text": "similar,"}, {"end": 1989.4, "start": 1988.36, "text": "then"}, {"end": 1989.4, "start": 1989.4, "text": "they"}, {"end": 1989.6, "start": 1989.4, "text": "have"}, {"end": 1990.04, "start": 1989.6, "text": "a"}, {"end": 1990.32, "start": 1990.04, "text": "good"}, {"end": 1990.84, "start": 1990.32, "text": "chance"}, {"end": 1990.88, "start": 1990.84, "text": "of"}, {"end": 1991.16, "start": 1990.88, "text": "being"}, {"end": 1991.52, "start": 1991.16, "text": "thrown"}, {"end": 1991.56, "start": 1991.52, "text": "into"}, {"end": 1991.8, "start": 1991.56, "text": "the"}, {"end": 1992.16, "start": 1991.8, "text": "same"}, {"end": 1992.32, "start": 1992.16, "text": "bucket."}, {"end": 1992.36, "start": 1992.32, "text": "But"}, {"end": 1994.0, "start": 1992.36, "text": "if"}, {"end": 1994.28, "start": 1994.0, "text": "the"}, {"end": 1994.88, "start": 1994.28, "text": "records"}, {"end": 1995.4, "start": 1994.88, "text": 
"are"}, {"end": 1996.08, "start": 1995.4, "text": "not"}, {"end": 1996.48, "start": 1996.08, "text": "similar,"}, {"end": 1997.12, "start": 1996.48, "text": "then"}, {"end": 1997.4, "start": 1997.12, "text": "there"}, {"end": 1997.64, "start": 1997.4, "text": "is"}, {"end": 1998.08, "start": 1997.64, "text": "very"}, {"end": 1998.2, "start": 1998.08, "text": "little"}, {"end": 1998.92, "start": 1998.2, "text": "chance"}, {"end": 1999.08, "start": 1998.92, "text": "that"}, {"end": 1999.2, "start": 1999.08, "text": "they'll"}, {"end": 2001.0, "start": 1999.2, "text": "wind"}, {"end": 2001.32, "start": 2001.0, "text": "up"}, {"end": 2001.64, "start": 2001.32, "text": "in"}, {"end": 2002.2, "start": 2001.64, "text": "the"}, {"end": 2002.6, "start": 2002.2, "text": "same"}, {"end": 2003.0, "start": 2002.6, "text": "bucket."}, {"end": 2003.44, "start": 2003.0, "text": "And"}, {"end": 2003.88, "start": 2003.44, "text": "the"}, {"end": 2004.24, "start": 2003.88, "text": "algorithm"}, {"end": 2004.24, "start": 2004.24, "text": "then"}, {"end": 2004.92, "start": 2004.24, "text": "consists"}, {"end": 2005.16, "start": 2004.92, "text": "really"}, {"end": 2005.2, "start": 2005.16, "text": "of,"}, {"end": 2005.36, "start": 2005.2, "text": "you"}, {"end": 2005.52, "start": 2005.36, "text": "do"}, {"end": 2005.84, "start": 2005.52, "text": "several"}, {"end": 2007.08, "start": 2005.84, "text": "different"}, {"end": 2007.68, "start": 2007.08, "text": "hashings"}, {"end": 2007.68, "start": 2007.68, "text": "of"}, {"end": 2007.96, "start": 2007.68, "text": "this"}, {"end": 2008.32, "start": 2007.96, "text": "type,"}, {"end": 2008.36, "start": 2008.32, "text": "but"}, {"end": 2009.36, "start": 2008.36, "text": "you"}, {"end": 2009.96, "start": 2009.36, "text": "only"}], "text": " invent hash functions, several hash functions, and then each of which has the property that if two records are similar, then they have a good chance of being thrown into the same bucket. 
But if the records are not similar, then there is very little chance that they'll wind up in the same bucket. And the algorithm then consists really of, you do several different hashings of this type, but you only"}, {"chunks": [{"end": 2010.16, "start": 2010.0, "text": "Let"}, {"end": 2010.2, "start": 2010.16, "text": "me"}, {"end": 2010.68, "start": 2010.2, "text": "compare"}, {"end": 2010.96, "start": 2010.68, "text": "two"}, {"end": 2011.36, "start": 2010.96, "text": "records"}, {"end": 2011.8, "start": 2011.36, "text": "if"}, {"end": 2012.32, "start": 2011.8, "text": "they"}, {"end": 2013.16, "start": 2012.32, "text": "wound"}, {"end": 2013.44, "start": 2013.16, "text": "up"}, {"end": 2013.64, "start": 2013.44, "text": "in"}, {"end": 2013.88, "start": 2013.64, "text": "the"}, {"end": 2014.24, "start": 2013.88, "text": "same"}, {"end": 2014.84, "start": 2014.24, "text": "bucket"}, {"end": 2015.32, "start": 2014.84, "text": "for"}, {"end": 2016.88, "start": 2015.32, "text": "at"}, {"end": 2017.0, "start": 2016.88, "text": "least"}, {"end": 2017.56, "start": 2017.0, "text": "one"}, {"end": 2017.72, "start": 2017.56, "text": "of"}, {"end": 2018.12, "start": 2017.72, "text": "the"}, {"end": 2019.4, "start": 2018.12, "text": "hashings."}, {"end": 2020.04, "start": 2019.4, "text": "So"}, {"end": 2020.48, "start": 2020.04, "text": "for"}, {"end": 2021.08, "start": 2020.48, "text": "instance,"}, {"end": 2021.24, "start": 2021.08, "text": "if"}, {"end": 2021.44, "start": 2021.24, "text": "we're"}, {"end": 2021.72, "start": 2021.44, "text": "dealing"}, {"end": 2021.96, "start": 2021.72, "text": "with"}, {"end": 2022.52, "start": 2021.96, "text": "records"}, {"end": 2022.72, "start": 2022.52, "text": "about"}, {"end": 2022.96, "start": 2022.72, "text": "people,"}, {"end": 2024.2, "start": 2022.96, "text": "then"}, {"end": 2024.56, "start": 2024.2, "text": "one"}, {"end": 2024.84, "start": 2024.56, "text": "hash"}, {"end": 2025.28, "start": 2024.84, "text": "function"}, 
{"end": 2025.68, "start": 2025.28, "text": "we"}, {"end": 2025.88, "start": 2025.68, "text": "use"}, {"end": 2026.2, "start": 2025.88, "text": "might"}, {"end": 2026.64, "start": 2026.2, "text": "be"}, {"end": 2027.12, "start": 2026.64, "text": "the"}, {"end": 2029.0, "start": 2027.12, "text": "exact"}, {"end": 2030.48, "start": 2029.0, "text": "name"}, {"end": 2030.6, "start": 2030.48, "text": "of"}, {"end": 2030.6, "start": 2030.6, "text": "the"}, {"end": 2031.48, "start": 2030.6, "text": "person."}, {"end": 2031.76, "start": 2031.48, "text": "And"}, {"end": 2031.76, "start": 2031.76, "text": "another"}, {"end": 2031.8, "start": 2031.76, "text": "might"}, {"end": 2032.36, "start": 2031.8, "text": "put"}, {"end": 2033.36, "start": 2032.36, "text": "records"}, {"end": 2033.36, "start": 2033.36, "text": "in"}, {"end": 2033.52, "start": 2033.36, "text": "the"}, {"end": 2033.88, "start": 2033.52, "text": "same"}, {"end": 2034.28, "start": 2033.88, "text": "bucket"}, {"end": 2034.64, "start": 2034.28, "text": "if"}, {"end": 2035.52, "start": 2034.64, "text": "and"}, {"end": 2035.8, "start": 2035.52, "text": "only"}, {"end": 2036.04, "start": 2035.8, "text": "if"}, {"end": 2036.36, "start": 2036.04, "text": "their"}, {"end": 2036.4, "start": 2036.36, "text": "phone"}, {"end": 2036.72, "start": 2036.4, "text": "fields"}, {"end": 2038.96, "start": 2036.72, "text": "are"}, {"end": 2039.96, "start": 2038.96, "text": "identical."}], "text": " Let me compare two records if they wound up in the same bucket for at least one of the hashings. So for instance, if we're dealing with records about people, then one hash function we use might be the exact name of the person. 
And another might put records in the same bucket if and only if their phone fields are identical."}, {"chunks": [{"end": 2040.16, "start": 2040.0, "text": "And"}, {"end": 2040.2, "start": 2040.16, "text": "we"}, {"end": 2040.48, "start": 2040.2, "text": "might"}, {"end": 2040.88, "start": 2040.48, "text": "find"}, {"end": 2041.08, "start": 2040.88, "text": "other"}, {"end": 2041.72, "start": 2041.08, "text": "fields"}, {"end": 2042.0, "start": 2041.72, "text": "on"}, {"end": 2042.28, "start": 2042.0, "text": "which"}, {"end": 2042.32, "start": 2042.28, "text": "to"}, {"end": 2042.92, "start": 2042.32, "text": "base"}, {"end": 2043.52, "start": 2042.92, "text": "hash"}, {"end": 2044.36, "start": 2043.52, "text": "functions,"}, {"end": 2045.0, "start": 2044.36, "text": "such"}, {"end": 2045.56, "start": 2045.0, "text": "as"}, {"end": 2045.76, "start": 2045.56, "text": "an"}, {"end": 2046.2, "start": 2045.76, "text": "address"}, {"end": 2046.72, "start": 2046.2, "text": "field"}, {"end": 2047.36, "start": 2046.72, "text": "or"}, {"end": 2047.76, "start": 2047.36, "text": "an"}, {"end": 2048.68, "start": 2047.76, "text": "ID,"}, {"end": 2049.16, "start": 2048.68, "text": "social"}, {"end": 2049.64, "start": 2049.16, "text": "security"}, {"end": 2050.16, "start": 2049.64, "text": "numbers"}, {"end": 2050.4, "start": 2050.16, "text": "of"}, {"end": 2050.64, "start": 2050.4, "text": "some"}, {"end": 2051.08, "start": 2050.64, "text": "sort."}, {"end": 2051.4, "start": 2051.08, "text": "Now,"}, {"end": 2051.44, "start": 2051.4, "text": "our"}, {"end": 2053.44, "start": 2051.44, "text": "expectation"}, {"end": 2056.32, "start": 2053.44, "text": "is"}, {"end": 2056.6, "start": 2056.32, "text": "that"}, {"end": 2056.6, "start": 2056.6, "text": "if"}, {"end": 2057.12, "start": 2056.6, "text": "two"}, {"end": 2057.2, "start": 2057.12, "text": "records"}, {"end": 2058.36, "start": 2057.2, "text": "represent"}, {"end": 2058.4, "start": 2058.36, "text": "the"}, {"end": 2058.4, 
"start": 2058.4, "text": "same"}, {"end": 2059.04, "start": 2058.4, "text": "person,"}, {"end": 2059.44, "start": 2059.04, "text": "then"}, {"end": 2060.0, "start": 2059.44, "text": "they'll"}, {"end": 2060.8, "start": 2060.0, "text": "have"}, {"end": 2061.64, "start": 2060.8, "text": "at"}, {"end": 2062.16, "start": 2061.64, "text": "least"}, {"end": 2063.0, "start": 2062.16, "text": "one"}, {"end": 2063.28, "start": 2063.0, "text": "field"}, {"end": 2063.72, "start": 2063.28, "text": "that"}, {"end": 2063.84, "start": 2063.72, "text": "we"}, {"end": 2064.72, "start": 2063.84, "text": "use"}, {"end": 2065.32, "start": 2064.72, "text": "for"}, {"end": 2065.72, "start": 2065.32, "text": "one"}, {"end": 2065.88, "start": 2065.72, "text": "of"}, {"end": 2065.96, "start": 2065.88, "text": "the"}, {"end": 2067.36, "start": 2065.96, "text": "hashings"}, {"end": 2068.2, "start": 2067.36, "text": "where"}, {"end": 2068.28, "start": 2068.2, "text": "the"}, {"end": 2069.96, "start": 2068.28, "text": "value"}], "text": " And we might find other fields on which to base hash functions, such as an address field or an ID, social security numbers of some sort. 
Now, our expectation is that if two records represent the same person, then they'll have at least one field that we use for one of the hashings where the value"}, {"chunks": [{"end": 2070.12, "start": 2070.0, "text": "The"}, {"end": 2070.64, "start": 2070.12, "text": "values"}, {"end": 2070.64, "start": 2070.64, "text": "of"}, {"end": 2070.68, "start": 2070.64, "text": "the"}, {"end": 2070.84, "start": 2070.68, "text": "two"}, {"end": 2071.4, "start": 2070.84, "text": "records"}, {"end": 2071.44, "start": 2071.4, "text": "in"}, {"end": 2072.0, "start": 2071.44, "text": "those"}, {"end": 2072.28, "start": 2072.0, "text": "fields"}, {"end": 2072.84, "start": 2072.28, "text": "are"}, {"end": 2073.36, "start": 2072.84, "text": "identical"}, {"end": 2073.4, "start": 2073.36, "text": "and"}, {"end": 2073.72, "start": 2073.4, "text": "therefore"}, {"end": 2074.0, "start": 2073.72, "text": "the"}, {"end": 2074.84, "start": 2074.0, "text": "records"}, {"end": 2075.24, "start": 2074.84, "text": "will"}, {"end": 2075.76, "start": 2075.24, "text": "wind"}, {"end": 2076.24, "start": 2075.76, "text": "up"}, {"end": 2076.6, "start": 2076.24, "text": "at"}, {"end": 2076.92, "start": 2076.6, "text": "least"}, {"end": 2077.56, "start": 2076.92, "text": "once"}, {"end": 2077.56, "start": 2077.56, "text": "in"}, {"end": 2077.72, "start": 2077.56, "text": "the"}, {"end": 2078.08, "start": 2077.72, "text": "same"}, {"end": 2080.92, "start": 2078.08, "text": "bucket."}, {"end": 2081.32, "start": 2080.92, "text": "Okay,"}, {"end": 2081.76, "start": 2081.32, "text": "now,"}, {"end": 2082.28, "start": 2081.76, "text": "but"}, {"end": 2083.36, "start": 2082.28, "text": "winding"}, {"end": 2083.36, "start": 2083.36, "text": "up"}, {"end": 2083.36, "start": 2083.36, "text": "in"}, {"end": 2083.44, "start": 2083.36, "text": "the"}, {"end": 2083.68, "start": 2083.44, "text": "same"}, {"end": 2083.76, "start": 2083.68, "text": "bucket"}, {"end": 2084.28, "start": 2083.76, "text": "only"}, {"end": 
2084.68, "start": 2084.28, "text": "makes"}, {"end": 2084.68, "start": 2084.68, "text": "the"}, {"end": 2085.0, "start": 2084.68, "text": "pair"}, {"end": 2085.16, "start": 2085.0, "text": "be"}, {"end": 2085.84, "start": 2085.16, "text": "candidates"}, {"end": 2085.88, "start": 2085.84, "text": "for"}, {"end": 2086.32, "start": 2085.88, "text": "similarity."}, {"end": 2087.0, "start": 2086.32, "text": "Okay,"}, {"end": 2087.44, "start": 2087.0, "text": "they"}, {"end": 2087.52, "start": 2087.44, "text": "still"}, {"end": 2087.52, "start": 2087.52, "text": "need"}, {"end": 2087.52, "start": 2087.52, "text": "to"}, {"end": 2087.52, "start": 2087.52, "text": "be"}, {"end": 2088.4, "start": 2087.52, "text": "evaluated"}, {"end": 2088.8, "start": 2088.4, "text": "as"}, {"end": 2089.0, "start": 2088.8, "text": "a"}, {"end": 2089.32, "start": 2089.0, "text": "whole."}, {"end": 2089.68, "start": 2089.32, "text": "Okay,"}, {"end": 2089.88, "start": 2089.68, "text": "just"}, {"end": 2090.48, "start": 2089.88, "text": "for"}, {"end": 2091.2, "start": 2090.48, "text": "example,"}, {"end": 2091.44, "start": 2091.2, "text": "two"}, {"end": 2091.72, "start": 2091.44, "text": "people"}, {"end": 2091.8, "start": 2091.72, "text": "could"}, {"end": 2092.36, "start": 2091.8, "text": "have"}, {"end": 2093.08, "start": 2092.36, "text": "the"}, {"end": 2093.8, "start": 2093.08, "text": "same"}, {"end": 2094.44, "start": 2093.8, "text": "name"}, {"end": 2095.16, "start": 2094.44, "text": "and"}, {"end": 2095.8, "start": 2095.16, "text": "yet"}, {"end": 2096.0, "start": 2095.8, "text": "be"}, {"end": 2096.36, "start": 2096.0, "text": "different"}, {"end": 2096.72, "start": 2096.36, "text": "people."}, {"end": 2098.08, "start": 2096.72, "text": "Okay,"}, {"end": 2099.08, "start": 2098.08, "text": "we"}, {"end": 2099.96, "start": 2099.08, "text": "assume"}], "text": " The values of the two records in those fields are identical and therefore the records will wind up at least once in the same 
bucket. Okay, now, but winding up in the same bucket only makes the pair be candidates for similarity. Okay, they still need to be evaluated as a whole. Okay, just for example, two people could have the same name and yet be different people. Okay, we assume"}, {"chunks": [{"end": 2100.28, "start": 2100.0, "text": "They"}, {"end": 2101.16, "start": 2100.28, "text": "will"}, {"end": 2101.56, "start": 2101.16, "text": "have"}, {"end": 2101.96, "start": 2101.56, "text": "different"}, {"end": 2102.36, "start": 2101.96, "text": "phone"}, {"end": 2102.8, "start": 2102.36, "text": "numbers"}, {"end": 2102.92, "start": 2102.8, "text": "and"}, {"end": 2103.32, "start": 2102.92, "text": "they'll"}, {"end": 2103.56, "start": 2103.32, "text": "live"}, {"end": 2103.68, "start": 2103.56, "text": "at"}, {"end": 2104.04, "start": 2103.68, "text": "different"}, {"end": 2104.76, "start": 2104.04, "text": "addresses."}, {"end": 2105.04, "start": 2104.76, "text": "So"}, {"end": 2105.16, "start": 2105.04, "text": "the"}, {"end": 2105.56, "start": 2105.16, "text": "records"}, {"end": 2106.0, "start": 2105.56, "text": "as"}, {"end": 2106.6, "start": 2106.0, "text": "a"}, {"end": 2107.12, "start": 2106.6, "text": "whole"}, {"end": 2107.76, "start": 2107.12, "text": "will"}, {"end": 2108.04, "start": 2107.76, "text": "not"}, {"end": 2108.08, "start": 2108.04, "text": "be"}, {"end": 2108.36, "start": 2108.08, "text": "sufficiently"}, {"end": 2108.64, "start": 2108.36, "text": "similar"}, {"end": 2109.76, "start": 2108.64, "text": "for"}, {"end": 2110.16, "start": 2109.76, "text": "us"}, {"end": 2110.44, "start": 2110.16, "text": "to"}, {"end": 2111.04, "start": 2110.44, "text": "think"}, {"end": 2111.36, "start": 2111.04, "text": "that"}, {"end": 2111.84, "start": 2111.36, "text": "these"}, {"end": 2112.56, "start": 2111.84, "text": "records"}, {"end": 2113.36, "start": 2112.56, "text": "represent"}, {"end": 2113.76, "start": 2113.36, "text": "the"}, {"end": 2114.32, "start": 2113.76, 
"text": "same"}, {"end": 2115.2, "start": 2114.32, "text": "people."}, {"end": 2115.52, "start": 2115.2, "text": "Or"}, {"end": 2115.88, "start": 2115.52, "text": "another"}, {"end": 2116.48, "start": 2115.88, "text": "example,"}, {"end": 2116.64, "start": 2116.48, "text": "you"}, {"end": 2117.28, "start": 2116.64, "text": "might"}, {"end": 2118.0, "start": 2117.28, "text": "at"}, {"end": 2118.32, "start": 2118.0, "text": "some"}, {"end": 2118.36, "start": 2118.32, "text": "point"}, {"end": 2118.44, "start": 2118.36, "text": "give"}, {"end": 2118.52, "start": 2118.44, "text": "up"}, {"end": 2118.88, "start": 2118.52, "text": "your"}, {"end": 2119.56, "start": 2118.88, "text": "landline"}, {"end": 2119.8, "start": 2119.56, "text": "phone"}, {"end": 2119.88, "start": 2119.8, "text": "and"}, {"end": 2119.88, "start": 2119.88, "text": "the"}, {"end": 2120.04, "start": 2119.88, "text": "number"}, {"end": 2120.52, "start": 2120.04, "text": "is"}, {"end": 2121.08, "start": 2120.52, "text": "later"}, {"end": 2122.0, "start": 2121.08, "text": "assigned"}, {"end": 2122.0, "start": 2122.0, "text": "to"}, {"end": 2122.52, "start": 2122.0, "text": "someone"}, {"end": 2122.96, "start": 2122.52, "text": "else."}, {"end": 2123.88, "start": 2122.96, "text": "So"}, {"end": 2124.16, "start": 2123.88, "text": "you"}, {"end": 2124.6, "start": 2124.16, "text": "have"}, {"end": 2125.24, "start": 2124.6, "text": "two"}, {"end": 2125.76, "start": 2125.24, "text": "different"}, {"end": 2126.08, "start": 2125.76, "text": "people"}, {"end": 2126.64, "start": 2126.08, "text": "who"}, {"end": 2127.2, "start": 2126.64, "text": "will"}, {"end": 2127.6, "start": 2127.2, "text": "appear"}, {"end": 2128.2, "start": 2127.6, "text": "in"}, {"end": 2129.24, "start": 2128.2, "text": "records"}, {"end": 2129.72, "start": 2129.24, "text": "as"}, {"end": 2129.72, "start": 2129.72, "text": "having"}, {"end": 2129.72, "start": 2129.72, "text": "the"}, {"end": 2129.72, "start": 2129.72, "text": "same"}, 
{"end": 2129.96, "start": 2129.72, "text": "phone."}], "text": " They will have different phone numbers and they'll live at different addresses. So the records as a whole will not be sufficiently similar for us to think that these records represent the same people. Or another example, you might at some point give up your landline phone and the number is later assigned to someone else. So you have two different people who will appear in records as having the same phone."}, {"chunks": [{"end": 2131.72, "start": 2130.0, "text": "Okay."}, {"end": 2132.44, "start": 2131.72, "text": "Same"}, {"end": 2133.12, "start": 2132.44, "text": "problem,"}, {"end": 2133.36, "start": 2133.12, "text": "you"}, {"end": 2133.68, "start": 2133.36, "text": "hope."}, {"end": 2133.92, "start": 2133.68, "text": "They"}, {"end": 2134.36, "start": 2133.92, "text": "will"}, {"end": 2134.8, "start": 2134.36, "text": "not,"}, {"end": 2135.24, "start": 2134.8, "text": "though"}, {"end": 2135.4, "start": 2135.24, "text": "records"}, {"end": 2135.6, "start": 2135.4, "text": "as"}, {"end": 2136.0, "start": 2135.6, "text": "a"}, {"end": 2136.6, "start": 2136.0, "text": "whole,"}, {"end": 2138.0, "start": 2136.6, "text": "will"}, {"end": 2138.64, "start": 2138.0, "text": "not"}, {"end": 2138.8, "start": 2138.64, "text": "look"}, {"end": 2139.0, "start": 2138.8, "text": "very"}, {"end": 2139.72, "start": 2139.0, "text": "similar."}, {"end": 2139.92, "start": 2139.72, "text": "Okay."}, {"end": 2140.24, "start": 2139.92, "text": "But"}, {"end": 2140.24, "start": 2140.24, "text": "the"}, {"end": 2141.04, "start": 2140.24, "text": "biggest"}, {"end": 2141.44, "start": 2141.04, "text": "source"}, {"end": 2141.44, "start": 2141.44, "text": "of"}, {"end": 2142.28, "start": 2141.44, "text": "candidates"}, {"end": 2143.04, "start": 2142.28, "text": "that"}, {"end": 2143.36, "start": 2143.04, "text": "are"}, {"end": 2143.4, "start": 2143.36, "text": "not"}, {"end": 2143.84, "start": 2143.4, "text": "really"}, 
{"end": 2144.16, "start": 2143.84, "text": "similar"}, {"end": 2144.32, "start": 2144.16, "text": "is"}, {"end": 2144.76, "start": 2144.32, "text": "probably"}, {"end": 2145.56, "start": 2144.76, "text": "typos"}, {"end": 2145.6, "start": 2145.56, "text": "that"}, {"end": 2146.08, "start": 2145.6, "text": "occur."}, {"end": 2146.56, "start": 2146.08, "text": "Someone"}, {"end": 2147.04, "start": 2146.56, "text": "might"}, {"end": 2147.44, "start": 2147.04, "text": "have"}, {"end": 2147.72, "start": 2147.44, "text": "a"}, {"end": 2148.24, "start": 2147.72, "text": "phone"}, {"end": 2148.76, "start": 2148.24, "text": "number"}, {"end": 2148.84, "start": 2148.76, "text": "that"}, {"end": 2149.08, "start": 2148.84, "text": "is,"}, {"end": 2149.4, "start": 2149.08, "text": "let's"}, {"end": 2149.6, "start": 2149.4, "text": "say,"}, {"end": 2149.84, "start": 2149.6, "text": "one"}, {"end": 2150.2, "start": 2149.84, "text": "digit"}, {"end": 2150.6, "start": 2150.2, "text": "off"}, {"end": 2150.64, "start": 2150.6, "text": "from"}, {"end": 2151.28, "start": 2150.64, "text": "yours,"}, {"end": 2151.44, "start": 2151.28, "text": "but"}, {"end": 2151.88, "start": 2151.44, "text": "then"}, {"end": 2152.24, "start": 2151.88, "text": "their"}, {"end": 2152.92, "start": 2152.24, "text": "number"}, {"end": 2153.24, "start": 2152.92, "text": "gets"}, {"end": 2153.84, "start": 2153.24, "text": "mistyped"}, {"end": 2153.92, "start": 2153.84, "text": "when"}, {"end": 2154.4, "start": 2153.92, "text": "the"}, {"end": 2155.12, "start": 2154.4, "text": "record"}, {"end": 2155.36, "start": 2155.12, "text": "is"}, {"end": 2155.84, "start": 2155.36, "text": "created."}, {"end": 2159.96, "start": 2155.84, "text": "Okay."}], "text": " Okay. Same problem, you hope. They will not, though records as a whole, will not look very similar. Okay. But the biggest source of candidates that are not really similar is probably typos that occur. 
Someone might have a phone number that is, let's say, one digit off from yours, but then their number gets mistyped when the record is created. Okay."}, {"chunks": [{"end": 2160.12, "start": 2160.0, "text": "So"}, {"end": 2161.92, "start": 2160.12, "text": "to"}, {"end": 2163.48, "start": 2161.92, "text": "summarize,"}, {"end": 2163.92, "start": 2163.48, "text": "we'll"}, {"end": 2164.16, "start": 2163.92, "text": "use"}, {"end": 2164.68, "start": 2164.16, "text": "fields"}, {"end": 2164.84, "start": 2164.68, "text": "like"}, {"end": 2165.12, "start": 2164.84, "text": "name"}, {"end": 2165.36, "start": 2165.12, "text": "and"}, {"end": 2165.76, "start": 2165.36, "text": "phone"}, {"end": 2165.96, "start": 2165.76, "text": "to"}, {"end": 2166.44, "start": 2165.96, "text": "hash"}, {"end": 2166.64, "start": 2166.44, "text": "all"}, {"end": 2167.04, "start": 2166.64, "text": "the"}, {"end": 2167.48, "start": 2167.04, "text": "records"}, {"end": 2168.0, "start": 2167.48, "text": "several"}, {"end": 2168.44, "start": 2168.0, "text": "times."}, {"end": 2170.44, "start": 2168.44, "text": "Okay."}, {"end": 2170.76, "start": 2170.44, "text": "We"}, {"end": 2171.16, "start": 2170.76, "text": "only"}, {"end": 2171.6, "start": 2171.16, "text": "compare"}, {"end": 2172.16, "start": 2171.6, "text": "the"}, {"end": 2172.56, "start": 2172.16, "text": "similarity"}, {"end": 2173.04, "start": 2172.56, "text": "of"}, {"end": 2173.64, "start": 2173.04, "text": "the"}, {"end": 2174.88, "start": 2173.64, "text": "relatively"}, {"end": 2175.48, "start": 2174.88, "text": "small"}, {"end": 2175.76, "start": 2175.48, "text": "number"}, {"end": 2176.04, "start": 2175.76, "text": "of"}, {"end": 2176.64, "start": 2176.04, "text": "pairs"}, {"end": 2176.72, "start": 2176.64, "text": "of"}, {"end": 2177.36, "start": 2176.72, "text": "records"}, {"end": 2177.4, "start": 2177.36, "text": "that"}, {"end": 2177.68, "start": 2177.4, "text": "appear"}, {"end": 2178.28, "start": 2177.68, "text": 
"together"}, {"end": 2178.44, "start": 2178.28, "text": "in"}, {"end": 2178.76, "start": 2178.44, "text": "one"}, {"end": 2179.04, "start": 2178.76, "text": "bucket."}, {"end": 2179.4, "start": 2179.04, "text": "And"}, {"end": 2180.0, "start": 2179.4, "text": "we"}, {"end": 2180.8, "start": 2180.0, "text": "expect"}, {"end": 2181.8, "start": 2180.8, "text": "that"}, {"end": 2181.84, "start": 2181.8, "text": "the"}, {"end": 2182.08, "start": 2181.84, "text": "vast"}, {"end": 2182.68, "start": 2182.08, "text": "majority"}, {"end": 2182.72, "start": 2182.68, "text": "of"}, {"end": 2183.68, "start": 2182.72, "text": "pairs"}, {"end": 2183.84, "start": 2183.68, "text": "will"}, {"end": 2184.2, "start": 2183.84, "text": "never"}, {"end": 2184.6, "start": 2184.2, "text": "appear"}, {"end": 2184.6, "start": 2184.6, "text": "in"}, {"end": 2184.64, "start": 2184.6, "text": "a"}, {"end": 2184.84, "start": 2184.64, "text": "bucket"}, {"end": 2185.0, "start": 2184.84, "text": "together."}, {"end": 2185.2, "start": 2185.0, "text": "So"}, {"end": 2185.76, "start": 2185.2, "text": "we"}, {"end": 2186.16, "start": 2185.76, "text": "never"}, {"end": 2186.28, "start": 2186.16, "text": "look"}, {"end": 2186.32, "start": 2186.28, "text": "at"}, {"end": 2186.68, "start": 2186.32, "text": "those"}, {"end": 2187.8, "start": 2186.68, "text": "pairs"}, {"end": 2188.28, "start": 2187.8, "text": "and"}, {"end": 2189.08, "start": 2188.28, "text": "thus"}, {"end": 2189.68, "start": 2189.08, "text": "save"}, {"end": 2189.96, "start": 2189.68, "text": "lots"}], "text": " So to summarize, we'll use fields like name and phone to hash all the records several times. Okay. We only compare the similarity of the relatively small number of pairs of records that appear together in one bucket. And we expect that the vast majority of pairs will never appear in a bucket together. 
So we never look at those pairs and thus save lots"}, {"chunks": [{"end": 2191.16, "start": 2190.0, "text": "time."}, {"end": 2193.08, "start": 2191.16, "text": "Okay."}, {"end": 2193.68, "start": 2193.08, "text": "But"}, {"end": 2195.12, "start": 2193.68, "text": "we"}, {"end": 2195.6, "start": 2195.12, "text": "will"}, {"end": 2196.04, "start": 2195.6, "text": "miss"}, {"end": 2196.36, "start": 2196.04, "text": "those"}, {"end": 2197.0, "start": 2196.36, "text": "pairs"}, {"end": 2197.4, "start": 2197.0, "text": "that"}, {"end": 2197.48, "start": 2197.4, "text": "really"}, {"end": 2197.6, "start": 2197.48, "text": "do"}, {"end": 2198.36, "start": 2197.6, "text": "represent"}, {"end": 2198.44, "start": 2198.36, "text": "the"}, {"end": 2198.6, "start": 2198.44, "text": "same"}, {"end": 2199.2, "start": 2198.6, "text": "person,"}, {"end": 2199.84, "start": 2199.2, "text": "but"}, {"end": 2200.48, "start": 2199.84, "text": "because"}, {"end": 2200.52, "start": 2200.48, "text": "of"}, {"end": 2200.84, "start": 2200.52, "text": "typos"}, {"end": 2201.4, "start": 2200.84, "text": "or"}, {"end": 2201.96, "start": 2201.4, "text": "other"}, {"end": 2203.44, "start": 2201.96, "text": "reasons,"}, {"end": 2203.92, "start": 2203.44, "text": "are"}, {"end": 2204.44, "start": 2203.92, "text": "not"}, {"end": 2205.12, "start": 2204.44, "text": "exact"}, {"end": 2205.12, "start": 2205.12, "text": "matches"}, {"end": 2205.32, "start": 2205.12, "text": "in"}, {"end": 2205.56, "start": 2205.32, "text": "any"}, {"end": 2205.64, "start": 2205.56, "text": "of"}, {"end": 2205.96, "start": 2205.64, "text": "the"}, {"end": 2206.36, "start": 2205.96, "text": "fields"}, {"end": 2206.92, "start": 2206.36, "text": "we"}, {"end": 2207.56, "start": 2206.92, "text": "use"}, {"end": 2207.84, "start": 2207.56, "text": "for"}, {"end": 2208.4, "start": 2207.84, "text": "hashing."}, {"end": 2208.88, "start": 2208.4, "text": "It's"}, {"end": 2209.24, "start": 2208.88, "text": "rare,"}, {"end": 2209.4, 
"start": 2209.24, "text": "but"}, {"end": 2209.4, "start": 2209.4, "text": "it"}, {"end": 2209.48, "start": 2209.4, "text": "can"}, {"end": 2209.76, "start": 2209.48, "text": "happen."}, {"end": 2210.36, "start": 2209.76, "text": "Okay."}, {"end": 2210.68, "start": 2210.36, "text": "Now"}, {"end": 2215.24, "start": 2210.68, "text": "I"}, {"end": 2215.44, "start": 2215.24, "text": "want"}, {"end": 2215.8, "start": 2215.44, "text": "to"}, {"end": 2216.2, "start": 2215.8, "text": "talk"}, {"end": 2216.96, "start": 2216.2, "text": "about"}, {"end": 2217.0, "start": 2216.96, "text": "a"}, {"end": 2217.92, "start": 2217.0, "text": "totally"}, {"end": 2218.4, "start": 2217.92, "text": "different"}, {"end": 2219.96, "start": 2218.4, "text": "problem."}], "text": " time. Okay. But we will miss those pairs that really do represent the same person, but because of typos or other reasons, are not exact matches in any of the fields we use for hashing. It's rare, but it can happen. Okay. Now I want to talk about a totally different problem."}, {"chunks": [{"end": 2220.28, "start": 2220.0, "text": "and"}, {"end": 2220.64, "start": 2220.28, "text": "its"}, {"end": 2220.96, "start": 2220.64, "text": "solution."}, {"end": 2221.76, "start": 2220.96, "text": "Now"}, {"end": 2222.04, "start": 2221.76, "text": "here"}, {"end": 2222.36, "start": 2222.04, "text": "we"}, {"end": 2223.24, "start": 2222.36, "text": "suppose"}, {"end": 2223.76, "start": 2223.24, "text": "we're"}, {"end": 2224.24, "start": 2223.76, "text": "given"}, {"end": 2224.28, "start": 2224.24, "text": "a"}, {"end": 2224.64, "start": 2224.28, "text": "stream"}, {"end": 2225.16, "start": 2224.64, "text": "or"}, {"end": 2225.68, "start": 2225.16, "text": "list"}, {"end": 2225.88, "start": 2225.68, "text": "of"}, {"end": 2226.88, "start": 2225.88, "text": "elements"}, {"end": 2227.44, "start": 2226.88, "text": "and"}, {"end": 2227.48, "start": 2227.44, "text": "we"}, {"end": 2227.88, "start": 2227.48, "text": "want"}, 
{"end": 2228.08, "start": 2227.88, "text": "to"}, {"end": 2228.6, "start": 2228.08, "text": "determine"}, {"end": 2228.92, "start": 2228.6, "text": "how"}, {"end": 2229.28, "start": 2228.92, "text": "many"}, {"end": 2229.72, "start": 2229.28, "text": "distinct"}, {"end": 2230.24, "start": 2229.72, "text": "values"}, {"end": 2230.6, "start": 2230.24, "text": "there"}, {"end": 2230.64, "start": 2230.6, "text": "are"}, {"end": 2230.92, "start": 2230.64, "text": "among"}, {"end": 2231.44, "start": 2230.92, "text": "those"}, {"end": 2232.64, "start": 2231.44, "text": "elements."}, {"end": 2233.64, "start": 2232.64, "text": "One"}, {"end": 2234.6, "start": 2233.64, "text": "example"}, {"end": 2235.2, "start": 2234.6, "text": "would"}, {"end": 2235.6, "start": 2235.2, "text": "be"}, {"end": 2236.36, "start": 2235.6, "text": "Facebook"}, {"end": 2236.96, "start": 2236.36, "text": "wants"}, {"end": 2237.12, "start": 2236.96, "text": "to"}, {"end": 2237.52, "start": 2237.12, "text": "report"}, {"end": 2237.68, "start": 2237.52, "text": "the"}, {"end": 2237.68, "start": 2237.68, "text": "number"}, {"end": 2237.88, "start": 2237.68, "text": "of"}, {"end": 2237.88, "start": 2237.88, "text": "people"}, {"end": 2238.48, "start": 2237.88, "text": "who"}, {"end": 2238.72, "start": 2238.48, "text": "use"}, {"end": 2238.92, "start": 2238.72, "text": "their"}, {"end": 2239.32, "start": 2238.92, "text": "system"}, {"end": 2239.44, "start": 2239.32, "text": "at"}, {"end": 2239.72, "start": 2239.44, "text": "least"}, {"end": 2240.28, "start": 2239.72, "text": "once"}, {"end": 2240.4, "start": 2240.28, "text": "in"}, {"end": 2240.72, "start": 2240.4, "text": "a"}, {"end": 2241.28, "start": 2240.72, "text": "given"}, {"end": 2241.48, "start": 2241.28, "text": "month."}, {"end": 2241.6, "start": 2241.48, "text": "Okay,"}, {"end": 2241.6, "start": 2241.6, "text": "it"}, {"end": 2242.24, "start": 2241.6, "text": "has"}, {"end": 2242.72, "start": 2242.24, "text": "a"}, {"end": 2243.44, 
"start": 2242.72, "text": "stream"}, {"end": 2243.96, "start": 2243.44, "text": "of"}, {"end": 2244.76, "start": 2243.96, "text": "logins"}, {"end": 2245.08, "start": 2244.76, "text": "but"}, {"end": 2245.08, "start": 2245.08, "text": "many"}, {"end": 2245.12, "start": 2245.08, "text": "people"}, {"end": 2245.6, "start": 2245.12, "text": "will"}, {"end": 2245.6, "start": 2245.6, "text": "log"}, {"end": 2245.8, "start": 2245.6, "text": "in"}, {"end": 2246.08, "start": 2245.8, "text": "more"}, {"end": 2246.52, "start": 2246.08, "text": "than"}, {"end": 2247.16, "start": 2246.52, "text": "once"}, {"end": 2247.16, "start": 2247.16, "text": "a"}, {"end": 2247.4, "start": 2247.16, "text": "month"}, {"end": 2247.92, "start": 2247.4, "text": "so"}, {"end": 2248.32, "start": 2247.92, "text": "the"}, {"end": 2248.92, "start": 2248.32, "text": "answer"}, {"end": 2249.04, "start": 2248.92, "text": "is"}, {"end": 2249.04, "start": 2249.04, "text": "not"}, {"end": 2249.24, "start": 2249.04, "text": "just"}, {"end": 2249.72, "start": 2249.24, "text": "the"}, {"end": 2249.8, "start": 2249.72, "text": "length"}, {"end": 2249.8, "start": 2249.8, "text": "of"}, {"end": 2249.8, "start": 2249.8, "text": "the"}, {"end": 2249.96, "start": 2249.8, "text": "stream."}], "text": " and its solution. Now here we suppose we're given a stream or list of elements and we want to determine how many distinct values there are among those elements. One example would be Facebook wants to report the number of people who use their system at least once in a given month. 
Okay, it has a stream of logins but many people will log in more than once a month so the answer is not just the length of the stream."}, {"chunks": [{"end": 2251.76, "start": 2250.0, "text": "Now,"}, {"end": 2252.76, "start": 2251.76, "text": "there's"}, {"end": 2253.6, "start": 2252.76, "text": "a"}, {"end": 2254.44, "start": 2253.6, "text": "straightforward"}, {"end": 2255.4, "start": 2254.44, "text": "solution."}, {"end": 2256.16, "start": 2255.4, "text": "You"}, {"end": 2256.64, "start": 2256.16, "text": "keep"}, {"end": 2257.08, "start": 2256.64, "text": "a"}, {"end": 2257.36, "start": 2257.08, "text": "hash"}, {"end": 2257.96, "start": 2257.36, "text": "table"}, {"end": 2258.4, "start": 2257.96, "text": "of"}, {"end": 2258.8, "start": 2258.4, "text": "all"}, {"end": 2258.8, "start": 2258.8, "text": "the"}, {"end": 2259.12, "start": 2258.8, "text": "login"}, {"end": 2259.2, "start": 2259.12, "text": "names"}, {"end": 2259.32, "start": 2259.2, "text": "you've"}, {"end": 2259.68, "start": 2259.32, "text": "seen"}, {"end": 2259.92, "start": 2259.68, "text": "so"}, {"end": 2260.24, "start": 2259.92, "text": "far"}, {"end": 2260.56, "start": 2260.24, "text": "this"}, {"end": 2261.76, "start": 2260.56, "text": "month,"}, {"end": 2262.12, "start": 2261.76, "text": "and"}, {"end": 2262.28, "start": 2262.12, "text": "you"}, {"end": 2262.68, "start": 2262.28, "text": "count"}, {"end": 2263.04, "start": 2262.68, "text": "how"}, {"end": 2263.4, "start": 2263.04, "text": "many"}, {"end": 2263.8, "start": 2263.4, "text": "unique"}, {"end": 2264.28, "start": 2263.8, "text": "names"}, {"end": 2266.6, "start": 2264.28, "text": "you've"}, {"end": 2269.36, "start": 2266.6, "text": "seen."}, {"end": 2270.04, "start": 2269.36, "text": "When"}, {"end": 2270.16, "start": 2270.04, "text": "a"}, {"end": 2270.52, "start": 2270.16, "text": "new"}, {"end": 2271.0, "start": 2270.52, "text": "login"}, {"end": 2272.2, "start": 2271.0, "text": "arrives,"}, {"end": 2272.6, "start": 2272.2, 
"text": "you"}, {"end": 2273.04, "start": 2272.6, "text": "hash"}, {"end": 2273.44, "start": 2273.04, "text": "it"}, {"end": 2274.08, "start": 2273.44, "text": "and"}, {"end": 2275.6, "start": 2274.08, "text": "see"}, {"end": 2276.2, "start": 2275.6, "text": "if"}, {"end": 2276.4, "start": 2276.2, "text": "it"}, {"end": 2276.88, "start": 2276.4, "text": "is"}, {"end": 2277.24, "start": 2276.88, "text": "already"}, {"end": 2278.0, "start": 2277.24, "text": "in"}, {"end": 2278.32, "start": 2278.0, "text": "the"}, {"end": 2278.92, "start": 2278.32, "text": "table."}, {"end": 2279.08, "start": 2278.92, "text": "If"}, {"end": 2279.52, "start": 2279.08, "text": "so,"}, {"end": 2279.68, "start": 2279.52, "text": "you"}, {"end": 2279.68, "start": 2279.68, "text": "do"}, {"end": 2279.88, "start": 2279.68, "text": "nothing"}, {"end": 2279.96, "start": 2279.88, "text": "more."}], "text": " Now, there's a straightforward solution. You keep a hash table of all the login names you've seen so far this month, and you count how many unique names you've seen. When a new login arrives, you hash it and see if it is already in the table. 
If so, you do nothing more."}, {"chunks": [{"end": 2280.16, "start": 2280.0, "text": "But"}, {"end": 2280.16, "start": 2280.16, "text": "if"}, {"end": 2280.88, "start": 2280.16, "text": "the"}, {"end": 2281.44, "start": 2280.88, "text": "new"}, {"end": 2281.84, "start": 2281.44, "text": "login"}, {"end": 2282.52, "start": 2281.84, "text": "is"}, {"end": 2283.2, "start": 2282.52, "text": "not"}, {"end": 2283.56, "start": 2283.2, "text": "on"}, {"end": 2284.04, "start": 2283.56, "text": "the"}, {"end": 2285.24, "start": 2284.04, "text": "table,"}, {"end": 2285.72, "start": 2285.24, "text": "then"}, {"end": 2286.24, "start": 2285.72, "text": "this"}, {"end": 2286.76, "start": 2286.24, "text": "is"}, {"end": 2287.28, "start": 2286.76, "text": "the"}, {"end": 2287.56, "start": 2287.28, "text": "first"}, {"end": 2287.8, "start": 2287.56, "text": "time"}, {"end": 2287.8, "start": 2287.8, "text": "that"}, {"end": 2287.8, "start": 2287.8, "text": "the"}, {"end": 2287.8, "start": 2287.8, "text": "user"}, {"end": 2287.8, "start": 2287.8, "text": "is"}, {"end": 2287.8, "start": 2287.8, "text": "logged"}, {"end": 2287.84, "start": 2287.8, "text": "in"}, {"end": 2288.36, "start": 2287.84, "text": "this"}, {"end": 2289.16, "start": 2288.36, "text": "month."}, {"end": 2289.72, "start": 2289.16, "text": "So"}, {"end": 2290.08, "start": 2289.72, "text": "you"}, {"end": 2290.52, "start": 2290.08, "text": "add"}, {"end": 2291.04, "start": 2290.52, "text": "the"}, {"end": 2291.28, "start": 2291.04, "text": "name"}, {"end": 2291.52, "start": 2291.28, "text": "to"}, {"end": 2291.84, "start": 2291.52, "text": "the"}, {"end": 2292.6, "start": 2291.84, "text": "table"}, {"end": 2292.68, "start": 2292.6, "text": "and"}, {"end": 2292.96, "start": 2292.68, "text": "you"}, {"end": 2293.36, "start": 2292.96, "text": "increase"}, {"end": 2293.36, "start": 2293.36, "text": "the"}, {"end": 2293.92, "start": 2293.36, "text": "count"}, {"end": 2294.12, "start": 2293.92, "text": "by"}, {"end": 
2294.88, "start": 2294.12, "text": "one."}, {"end": 2295.32, "start": 2294.88, "text": "So"}, {"end": 2295.8, "start": 2295.32, "text": "this"}, {"end": 2296.64, "start": 2295.8, "text": "method"}, {"end": 2297.08, "start": 2296.64, "text": "uses"}, {"end": 2297.08, "start": 2297.08, "text": "a"}, {"end": 2297.56, "start": 2297.08, "text": "lot"}, {"end": 2298.08, "start": 2297.56, "text": "of"}, {"end": 2298.4, "start": 2298.08, "text": "space"}, {"end": 2298.4, "start": 2298.4, "text": "for"}, {"end": 2298.4, "start": 2298.4, "text": "the"}, {"end": 2298.68, "start": 2298.4, "text": "hash"}, {"end": 2299.04, "start": 2298.68, "text": "table,"}, {"end": 2300.12, "start": 2299.04, "text": "but"}, {"end": 2300.56, "start": 2300.12, "text": "in"}, {"end": 2301.12, "start": 2300.56, "text": "Facebook's"}, {"end": 2302.04, "start": 2301.12, "text": "case,"}, {"end": 2302.4, "start": 2302.04, "text": "they"}, {"end": 2302.96, "start": 2302.4, "text": "can"}, {"end": 2303.44, "start": 2302.96, "text": "afford"}, {"end": 2303.84, "start": 2303.44, "text": "the"}, {"end": 2304.16, "start": 2303.84, "text": "space."}, {"end": 2305.48, "start": 2304.16, "text": "It's"}, {"end": 2306.32, "start": 2305.48, "text": "not"}, {"end": 2306.52, "start": 2306.32, "text": "a"}, {"end": 2307.68, "start": 2306.52, "text": "problem."}, {"end": 2308.04, "start": 2307.68, "text": "But"}, {"end": 2308.04, "start": 2308.04, "text": "on"}, {"end": 2308.2, "start": 2308.04, "text": "the"}, {"end": 2308.76, "start": 2308.2, "text": "next"}, {"end": 2309.96, "start": 2308.76, "text": "slide,"}], "text": " But if the new login is not on the table, then this is the first time that the user is logged in this month. So you add the name to the table and you increase the count by one. So this method uses a lot of space for the hash table, but in Facebook's case, they can afford the space. It's not a problem. 
But on the next slide,"}, {"chunks": [{"end": 2311.04, "start": 2310.0, "text": "I'm"}, {"end": 2311.44, "start": 2311.04, "text": "gonna"}, {"end": 2311.52, "start": 2311.44, "text": "talk"}, {"end": 2312.4, "start": 2311.52, "text": "about"}, {"end": 2313.28, "start": 2312.4, "text": "another"}, {"end": 2313.92, "start": 2313.28, "text": "example"}, {"end": 2314.08, "start": 2313.92, "text": "of"}, {"end": 2314.68, "start": 2314.08, "text": "applications"}, {"end": 2315.12, "start": 2314.68, "text": "where"}, {"end": 2315.52, "start": 2315.12, "text": "the"}, {"end": 2316.04, "start": 2315.52, "text": "tables"}, {"end": 2316.6, "start": 2316.04, "text": "themselves"}, {"end": 2316.6, "start": 2316.6, "text": "are"}, {"end": 2316.84, "start": 2316.6, "text": "quite"}, {"end": 2317.28, "start": 2316.84, "text": "small,"}, {"end": 2317.32, "start": 2317.28, "text": "but"}, {"end": 2317.32, "start": 2317.32, "text": "there"}, {"end": 2317.48, "start": 2317.32, "text": "are"}, {"end": 2317.6, "start": 2317.48, "text": "so"}, {"end": 2317.8, "start": 2317.6, "text": "many"}, {"end": 2318.32, "start": 2317.8, "text": "of"}, {"end": 2318.8, "start": 2318.32, "text": "them"}, {"end": 2319.08, "start": 2318.8, "text": "that"}, {"end": 2319.32, "start": 2319.08, "text": "it's"}, {"end": 2319.56, "start": 2319.32, "text": "not"}, {"end": 2319.88, "start": 2319.56, "text": "really"}, {"end": 2320.44, "start": 2319.88, "text": "feasible"}, {"end": 2320.64, "start": 2320.44, "text": "to"}, {"end": 2321.28, "start": 2320.64, "text": "use"}, {"end": 2321.44, "start": 2321.28, "text": "the"}, {"end": 2321.72, "start": 2321.44, "text": "simple"}, {"end": 2325.2, "start": 2321.72, "text": "solution."}, {"end": 2325.2, "start": 2325.2, "text": "Okay,"}, {"end": 2327.16, "start": 2325.2, "text": "now"}, {"end": 2327.84, "start": 2327.16, "text": "web"}, {"end": 2328.88, "start": 2327.84, "text": "crawlers"}, {"end": 2329.36, "start": 2328.88, "text": "are"}, {"end": 2330.12, "start": 
2329.36, "text": "designed"}, {"end": 2330.56, "start": 2330.12, "text": "to"}, {"end": 2331.24, "start": 2330.56, "text": "start"}, {"end": 2331.48, "start": 2331.24, "text": "at"}, {"end": 2331.72, "start": 2331.48, "text": "some"}, {"end": 2332.32, "start": 2331.72, "text": "webpage"}, {"end": 2332.52, "start": 2332.32, "text": "and"}, {"end": 2332.72, "start": 2332.52, "text": "see"}, {"end": 2332.72, "start": 2332.72, "text": "what"}, {"end": 2333.12, "start": 2332.72, "text": "pages"}, {"end": 2333.16, "start": 2333.12, "text": "can"}, {"end": 2333.4, "start": 2333.16, "text": "be"}, {"end": 2333.96, "start": 2333.4, "text": "reached"}, {"end": 2334.52, "start": 2333.96, "text": "from"}, {"end": 2335.08, "start": 2334.52, "text": "there."}, {"end": 2336.96, "start": 2335.08, "text": "Presumably"}, {"end": 2336.96, "start": 2336.96, "text": "the"}, {"end": 2337.32, "start": 2336.96, "text": "entire"}, {"end": 2337.72, "start": 2337.32, "text": "web,"}, {"end": 2338.2, "start": 2337.72, "text": "if"}, {"end": 2338.36, "start": 2338.2, "text": "the"}, {"end": 2338.76, "start": 2338.36, "text": "starting"}, {"end": 2339.48, "start": 2338.76, "text": "point"}, {"end": 2339.96, "start": 2339.48, "text": "or"}], "text": " I'm gonna talk about another example of applications where the tables themselves are quite small, but there are so many of them that it's not really feasible to use the simple solution. Okay, now web crawlers are designed to start at some webpage and see what pages can be reached from there. 
Presumably the entire web, if the starting point or"}, {"chunks": [{"end": 2340.72, "start": 2340.0, "text": "points"}, {"end": 2341.36, "start": 2340.72, "text": "are"}, {"end": 2342.08, "start": 2341.36, "text": "reasonably"}, {"end": 2343.72, "start": 2342.08, "text": "chosen."}, {"end": 2343.96, "start": 2343.72, "text": "Okay,"}, {"end": 2346.84, "start": 2343.96, "text": "so"}, {"end": 2347.64, "start": 2346.84, "text": "they"}, {"end": 2348.52, "start": 2347.64, "text": "follow"}, {"end": 2349.0, "start": 2348.52, "text": "all"}, {"end": 2349.84, "start": 2349.0, "text": "links"}, {"end": 2350.16, "start": 2349.84, "text": "that"}, {"end": 2350.44, "start": 2350.16, "text": "they"}, {"end": 2351.08, "start": 2350.44, "text": "can"}, {"end": 2351.44, "start": 2351.08, "text": "find"}, {"end": 2352.2, "start": 2351.44, "text": "for"}, {"end": 2352.56, "start": 2352.2, "text": "some"}, {"end": 2353.12, "start": 2352.56, "text": "distance,"}, {"end": 2354.04, "start": 2353.12, "text": "but"}, {"end": 2354.48, "start": 2354.04, "text": "eventually"}, {"end": 2354.64, "start": 2354.48, "text": "they"}, {"end": 2354.96, "start": 2354.64, "text": "need"}, {"end": 2355.28, "start": 2354.96, "text": "to"}, {"end": 2355.8, "start": 2355.28, "text": "stop"}, {"end": 2356.12, "start": 2355.8, "text": "because"}, {"end": 2356.32, "start": 2356.12, "text": "the"}, {"end": 2356.36, "start": 2356.32, "text": "web"}, {"end": 2356.72, "start": 2356.36, "text": "is"}, {"end": 2356.8, "start": 2356.72, "text": "simply"}, {"end": 2356.96, "start": 2356.8, "text": "too"}, {"end": 2357.44, "start": 2356.96, "text": "large"}, {"end": 2357.72, "start": 2357.44, "text": "to"}, {"end": 2357.76, "start": 2357.72, "text": "be"}, {"end": 2358.4, "start": 2357.76, "text": "crawled"}, {"end": 2358.52, "start": 2358.4, "text": "in"}, {"end": 2358.72, "start": 2358.52, "text": "its"}, {"end": 2359.16, "start": 2358.72, "text": "entirety."}, {"end": 2359.36, "start": 2359.16, "text": "They"}, 
{"end": 2359.84, "start": 2359.36, "text": "might,"}, {"end": 2360.4, "start": 2359.84, "text": "for"}, {"end": 2361.0, "start": 2360.4, "text": "example,"}, {"end": 2362.16, "start": 2361.0, "text": "catalog"}, {"end": 2362.48, "start": 2362.16, "text": "all"}, {"end": 2363.0, "start": 2362.48, "text": "pages"}, {"end": 2363.0, "start": 2363.0, "text": "that"}, {"end": 2363.68, "start": 2363.0, "text": "can"}, {"end": 2363.96, "start": 2363.68, "text": "be"}, {"end": 2364.4, "start": 2363.96, "text": "reached"}, {"end": 2364.92, "start": 2364.4, "text": "from"}, {"end": 2365.24, "start": 2364.92, "text": "the"}, {"end": 2365.88, "start": 2365.24, "text": "starting"}, {"end": 2366.4, "start": 2365.88, "text": "point"}, {"end": 2366.96, "start": 2366.4, "text": "by"}, {"end": 2367.36, "start": 2366.96, "text": "at"}, {"end": 2367.56, "start": 2367.36, "text": "most"}, {"end": 2368.04, "start": 2367.56, "text": "10"}, {"end": 2369.28, "start": 2368.04, "text": "hops,"}, {"end": 2369.72, "start": 2369.28, "text": "but"}, {"end": 2369.96, "start": 2369.72, "text": "then"}], "text": " points are reasonably chosen. Okay, so they follow all links that they can find for some distance, but eventually they need to stop because the web is simply too large to be crawled in its entirety. 
They might, for example, catalog all pages that can be reached from the starting point by at most 10 hops, but then"}, {"chunks": [{"end": 2370.48, "start": 2370.0, "text": "And"}, {"end": 2370.76, "start": 2370.48, "text": "they"}, {"end": 2371.28, "start": 2370.76, "text": "start"}, {"end": 2371.4, "start": 2371.28, "text": "selecting"}, {"end": 2371.76, "start": 2371.4, "text": "only"}, {"end": 2372.24, "start": 2371.76, "text": "certain"}, {"end": 2372.64, "start": 2372.24, "text": "pages"}, {"end": 2372.96, "start": 2372.64, "text": "to"}, {"end": 2373.76, "start": 2372.96, "text": "crawl"}, {"end": 2374.2, "start": 2373.76, "text": "and"}, {"end": 2374.92, "start": 2374.2, "text": "examine"}, {"end": 2375.6, "start": 2374.92, "text": "only"}, {"end": 2376.16, "start": 2375.6, "text": "the"}, {"end": 2376.68, "start": 2376.16, "text": "links"}, {"end": 2376.84, "start": 2376.68, "text": "on"}, {"end": 2377.44, "start": 2376.84, "text": "those"}, {"end": 2377.72, "start": 2377.44, "text": "pages."}, {"end": 2380.04, "start": 2377.72, "text": "So"}, {"end": 2380.12, "start": 2380.04, "text": "a"}, {"end": 2380.68, "start": 2380.12, "text": "good"}, {"end": 2381.68, "start": 2380.68, "text": "heuristic"}, {"end": 2381.72, "start": 2381.68, "text": "is"}, {"end": 2382.24, "start": 2381.72, "text": "to"}, {"end": 2382.92, "start": 2382.24, "text": "select"}, {"end": 2383.32, "start": 2382.92, "text": "for"}, {"end": 2383.92, "start": 2383.32, "text": "crawling"}, {"end": 2384.16, "start": 2383.92, "text": "only"}, {"end": 2384.48, "start": 2384.16, "text": "those"}, {"end": 2385.0, "start": 2384.48, "text": "pages"}, {"end": 2385.68, "start": 2385.0, "text": "with"}, {"end": 2385.92, "start": 2385.68, "text": "a"}, {"end": 2386.24, "start": 2385.92, "text": "high"}, {"end": 2387.48, "start": 2386.24, "text": "page"}, {"end": 2387.84, "start": 2387.48, "text": "rank."}, {"end": 2389.0, "start": 2387.84, "text": "The"}, {"end": 2389.68, "start": 2389.0, "text": 
"intuition"}, {"end": 2390.28, "start": 2389.68, "text": "is"}, {"end": 2391.0, "start": 2390.28, "text": "that"}, {"end": 2391.28, "start": 2391.0, "text": "it's"}, {"end": 2391.48, "start": 2391.28, "text": "only"}, {"end": 2391.8, "start": 2391.48, "text": "those"}, {"end": 2392.2, "start": 2391.8, "text": "pages"}, {"end": 2392.88, "start": 2392.2, "text": "that"}, {"end": 2393.0, "start": 2392.88, "text": "will"}, {"end": 2393.84, "start": 2393.0, "text": "ever"}, {"end": 2393.88, "start": 2393.84, "text": "be"}, {"end": 2394.36, "start": 2393.88, "text": "shown"}, {"end": 2394.6, "start": 2394.36, "text": "in"}, {"end": 2395.28, "start": 2394.6, "text": "response"}, {"end": 2395.28, "start": 2395.28, "text": "to"}, {"end": 2395.32, "start": 2395.28, "text": "some"}, {"end": 2395.72, "start": 2395.32, "text": "search"}, {"end": 2396.88, "start": 2395.72, "text": "query,"}, {"end": 2397.32, "start": 2396.88, "text": "at"}, {"end": 2397.96, "start": 2397.32, "text": "least"}, {"end": 2398.16, "start": 2397.96, "text": "on"}, {"end": 2398.36, "start": 2398.16, "text": "the"}, {"end": 2398.72, "start": 2398.36, "text": "first"}, {"end": 2399.12, "start": 2398.72, "text": "page"}, {"end": 2399.24, "start": 2399.12, "text": "of"}, {"end": 2399.96, "start": 2399.24, "text": "responses."}], "text": " And they start selecting only certain pages to crawl and examine only the links on those pages. So a good heuristic is to select for crawling only those pages with a high page rank. 
The intuition is that it's only those pages that will ever be shown in response to some search query, at least on the first page of responses."}, {"chunks": [{"end": 2400.04, "start": 2400.0, "text": "the"}, {"end": 2400.6, "start": 2400.04, "text": "ones"}, {"end": 2401.0, "start": 2400.6, "text": "that"}, {"end": 2401.24, "start": 2401.0, "text": "people"}, {"end": 2401.88, "start": 2401.24, "text": "actually"}, {"end": 2402.28, "start": 2401.88, "text": "look"}, {"end": 2402.68, "start": 2402.28, "text": "at."}, {"end": 2403.36, "start": 2402.68, "text": "So"}, {"end": 2403.8, "start": 2403.36, "text": "the"}, {"end": 2404.12, "start": 2403.8, "text": "rest,"}, {"end": 2404.12, "start": 2404.12, "text": "if"}, {"end": 2404.12, "start": 2404.12, "text": "it"}, {"end": 2404.24, "start": 2404.12, "text": "has"}, {"end": 2404.56, "start": 2404.24, "text": "low"}, {"end": 2405.28, "start": 2404.56, "text": "page"}, {"end": 2405.68, "start": 2405.28, "text": "rank"}, {"end": 2406.24, "start": 2405.68, "text": "and"}, {"end": 2406.72, "start": 2406.24, "text": "it's"}, {"end": 2406.96, "start": 2406.72, "text": "very"}, {"end": 2407.36, "start": 2406.96, "text": "far"}, {"end": 2407.68, "start": 2407.36, "text": "away,"}, {"end": 2408.2, "start": 2407.68, "text": "nobody's"}, {"end": 2408.6, "start": 2408.2, "text": "ever"}, {"end": 2408.6, "start": 2408.6, "text": "gonna"}, {"end": 2409.64, "start": 2408.6, "text": "see"}, {"end": 2410.32, "start": 2409.64, "text": "it"}, {"end": 2410.8, "start": 2410.32, "text": "anyway,"}, {"end": 2411.56, "start": 2410.8, "text": "so"}, {"end": 2412.12, "start": 2411.56, "text": "why"}, {"end": 2412.48, "start": 2412.12, "text": "bother"}, {"end": 2412.76, "start": 2412.48, "text": "crawling"}, {"end": 2413.16, "start": 2412.76, "text": "it?"}, {"end": 2413.88, "start": 2413.16, "text": "Okay,"}, {"end": 2414.16, "start": 2413.88, "text": "the"}, {"end": 2414.4, "start": 2414.16, "text": "technical"}, {"end": 2415.2, "start": 
2414.4, "text": "problem"}, {"end": 2415.72, "start": 2415.2, "text": "is"}, {"end": 2416.12, "start": 2415.72, "text": "of"}, {"end": 2416.6, "start": 2416.12, "text": "course"}, {"end": 2417.0, "start": 2416.6, "text": "that"}, {"end": 2417.16, "start": 2417.0, "text": "you"}, {"end": 2417.56, "start": 2417.16, "text": "can't"}, {"end": 2418.2, "start": 2417.56, "text": "compute"}, {"end": 2418.6, "start": 2418.2, "text": "the"}, {"end": 2418.96, "start": 2418.6, "text": "page"}, {"end": 2419.4, "start": 2418.96, "text": "rank"}, {"end": 2419.4, "start": 2419.4, "text": "until"}, {"end": 2419.56, "start": 2419.4, "text": "you"}, {"end": 2419.96, "start": 2419.56, "text": "finish"}, {"end": 2420.84, "start": 2419.96, "text": "crawling,"}, {"end": 2421.24, "start": 2420.84, "text": "so"}, {"end": 2421.32, "start": 2421.24, "text": "you"}, {"end": 2421.8, "start": 2421.32, "text": "need"}, {"end": 2421.92, "start": 2421.8, "text": "to"}, {"end": 2421.96, "start": 2421.92, "text": "do"}, {"end": 2424.08, "start": 2421.96, "text": "something"}, {"end": 2424.76, "start": 2424.08, "text": "instead."}, {"end": 2425.12, "start": 2424.76, "text": "Okay,"}, {"end": 2425.6, "start": 2425.12, "text": "well,"}, {"end": 2425.96, "start": 2425.6, "text": "a"}, {"end": 2426.48, "start": 2425.96, "text": "sensible"}, {"end": 2427.16, "start": 2426.48, "text": "substitute"}, {"end": 2427.56, "start": 2427.16, "text": "approach"}, {"end": 2427.76, "start": 2427.56, "text": "is"}, {"end": 2427.8, "start": 2427.76, "text": "to"}, {"end": 2428.44, "start": 2427.8, "text": "count"}, {"end": 2428.84, "start": 2428.44, "text": "for"}, {"end": 2428.96, "start": 2428.84, "text": "each"}, {"end": 2429.96, "start": 2428.96, "text": "page"}], "text": " the ones that people actually look at. So the rest, if it has low page rank and it's very far away, nobody's ever gonna see it anyway, so why bother crawling it? 
Okay, the technical problem is of course that you can't compute the page rank until you finish crawling, so you need to do something instead. Okay, well, a sensible substitute approach is to count for each page"}, {"chunks": [{"end": 2430.48, "start": 2430.0, "text": "the"}, {"end": 2430.68, "start": 2430.48, "text": "number"}, {"end": 2430.88, "start": 2430.68, "text": "of"}, {"end": 2431.28, "start": 2430.88, "text": "different"}, {"end": 2432.0, "start": 2431.28, "text": "predecessor"}, {"end": 2432.6, "start": 2432.0, "text": "pages"}, {"end": 2432.8, "start": 2432.6, "text": "you"}, {"end": 2432.84, "start": 2432.8, "text": "have"}, {"end": 2434.4, "start": 2432.84, "text": "crawled."}, {"end": 2434.88, "start": 2434.4, "text": "Okay,"}, {"end": 2435.4, "start": 2434.88, "text": "now"}, {"end": 2435.72, "start": 2435.4, "text": "that's"}, {"end": 2435.96, "start": 2435.72, "text": "not"}, {"end": 2436.56, "start": 2435.96, "text": "exactly"}, {"end": 2437.2, "start": 2436.56, "text": "page"}, {"end": 2437.24, "start": 2437.2, "text": "rank,"}, {"end": 2437.32, "start": 2437.24, "text": "but"}, {"end": 2437.96, "start": 2437.32, "text": "it's"}, {"end": 2438.48, "start": 2437.96, "text": "a"}, {"end": 2438.88, "start": 2438.48, "text": "good"}, {"end": 2441.12, "start": 2438.88, "text": "clue."}, {"end": 2441.92, "start": 2441.12, "text": "So"}, {"end": 2442.56, "start": 2441.92, "text": "imagine,"}, {"end": 2443.2, "start": 2442.56, "text": "therefore,"}, {"end": 2443.4, "start": 2443.2, "text": "that"}, {"end": 2444.24, "start": 2443.4, "text": "for"}, {"end": 2444.44, "start": 2444.24, "text": "each"}, {"end": 2444.96, "start": 2444.44, "text": "page"}, {"end": 2445.52, "start": 2444.96, "text": "we've"}, {"end": 2446.0, "start": 2445.52, "text": "found,"}, {"end": 2446.52, "start": 2446.0, "text": "but"}, {"end": 2446.96, "start": 2446.52, "text": "not"}, {"end": 2447.04, "start": 2446.96, "text": "yet"}, {"end": 2447.56, "start": 2447.04, "text": 
"decided"}, {"end": 2447.72, "start": 2447.56, "text": "to"}, {"end": 2448.0, "start": 2447.72, "text": "crawl,"}, {"end": 2448.48, "start": 2448.0, "text": "there"}, {"end": 2449.08, "start": 2448.48, "text": "is"}, {"end": 2449.6, "start": 2449.08, "text": "a"}, {"end": 2450.16, "start": 2449.6, "text": "stream"}, {"end": 2450.2, "start": 2450.16, "text": "of"}, {"end": 2450.52, "start": 2450.2, "text": "all"}, {"end": 2450.52, "start": 2450.52, "text": "the"}, {"end": 2451.2, "start": 2450.52, "text": "predecessor"}, {"end": 2451.8, "start": 2451.2, "text": "pages"}, {"end": 2451.8, "start": 2451.8, "text": "that"}, {"end": 2452.12, "start": 2451.8, "text": "we've"}, {"end": 2453.44, "start": 2452.12, "text": "found."}, {"end": 2453.8, "start": 2453.44, "text": "We"}, {"end": 2454.16, "start": 2453.8, "text": "want"}, {"end": 2454.36, "start": 2454.16, "text": "to,"}, {"end": 2454.36, "start": 2454.36, "text": "of"}, {"end": 2454.56, "start": 2454.36, "text": "course,"}, {"end": 2455.04, "start": 2454.56, "text": "count"}, {"end": 2455.28, "start": 2455.04, "text": "the"}, {"end": 2455.84, "start": 2455.28, "text": "number"}, {"end": 2456.12, "start": 2455.84, "text": "of"}, {"end": 2456.84, "start": 2456.12, "text": "distinct"}, {"end": 2457.36, "start": 2456.84, "text": "ones."}, {"end": 2457.8, "start": 2457.36, "text": "Now,"}, {"end": 2458.28, "start": 2457.8, "text": "you"}, {"end": 2458.36, "start": 2458.28, "text": "might"}, {"end": 2458.68, "start": 2458.36, "text": "imagine"}, {"end": 2458.8, "start": 2458.68, "text": "that"}, {"end": 2458.96, "start": 2458.8, "text": "we"}, {"end": 2459.28, "start": 2458.96, "text": "wouldn't"}, {"end": 2459.96, "start": 2459.28, "text": "crawl"}], "text": " the number of different predecessor pages you have crawled. Okay, now that's not exactly page rank, but it's a good clue. 
So imagine, therefore, that for each page we've found, but not yet decided to crawl, there is a stream of all the predecessor pages that we've found. We want to, of course, count the number of distinct ones. Now, you might imagine that we wouldn't crawl"}, {"chunks": [{"end": 2460.28, "start": 2460.0, "text": "a"}, {"end": 2461.04, "start": 2460.28, "text": "predecessor"}, {"end": 2461.36, "start": 2461.04, "text": "page"}, {"end": 2461.6, "start": 2461.36, "text": "more"}, {"end": 2461.68, "start": 2461.6, "text": "than"}, {"end": 2462.16, "start": 2461.68, "text": "once,"}, {"end": 2462.32, "start": 2462.16, "text": "but"}, {"end": 2462.48, "start": 2462.32, "text": "in"}, {"end": 2463.48, "start": 2462.48, "text": "fact,"}, {"end": 2463.84, "start": 2463.48, "text": "crawling"}, {"end": 2464.16, "start": 2463.84, "text": "is"}, {"end": 2465.12, "start": 2464.16, "text": "going"}, {"end": 2465.8, "start": 2465.12, "text": "on"}, {"end": 2466.36, "start": 2465.8, "text": "with"}, {"end": 2466.68, "start": 2466.36, "text": "many"}, {"end": 2466.96, "start": 2466.68, "text": "different"}, {"end": 2467.44, "start": 2466.96, "text": "machines"}, {"end": 2467.68, "start": 2467.44, "text": "and"}, {"end": 2467.68, "start": 2467.68, "text": "the"}, {"end": 2467.72, "start": 2467.68, "text": "many"}, {"end": 2468.28, "start": 2467.72, "text": "threads"}, {"end": 2468.52, "start": 2468.28, "text": "and"}, {"end": 2469.36, "start": 2468.52, "text": "everything's"}, {"end": 2469.76, "start": 2469.36, "text": "working"}, {"end": 2469.8, "start": 2469.76, "text": "in"}, {"end": 2470.52, "start": 2469.8, "text": "parallel."}, {"end": 2470.84, "start": 2470.52, "text": "So"}, {"end": 2471.08, "start": 2470.84, "text": "the"}, {"end": 2471.32, "start": 2471.08, "text": "same"}, {"end": 2471.72, "start": 2471.32, "text": "page"}, {"end": 2471.96, "start": 2471.72, "text": "might"}, {"end": 2472.08, "start": 2471.96, "text": "be"}, {"end": 2472.68, "start": 2472.08, "text": 
"crawled"}, {"end": 2472.84, "start": 2472.68, "text": "several"}, {"end": 2474.84, "start": 2472.84, "text": "times."}, {"end": 2475.24, "start": 2474.84, "text": "As"}, {"end": 2475.6, "start": 2475.24, "text": "a"}, {"end": 2476.6, "start": 2475.6, "text": "result,"}, {"end": 2477.24, "start": 2476.6, "text": "we"}, {"end": 2477.6, "start": 2477.24, "text": "have"}, {"end": 2477.92, "start": 2477.6, "text": "for"}, {"end": 2478.32, "start": 2477.92, "text": "each"}, {"end": 2478.48, "start": 2478.32, "text": "of"}, {"end": 2478.64, "start": 2478.48, "text": "the"}, {"end": 2479.0, "start": 2478.64, "text": "perhaps"}, {"end": 2479.56, "start": 2479.0, "text": "trillions"}, {"end": 2479.56, "start": 2479.56, "text": "of"}, {"end": 2480.28, "start": 2479.56, "text": "pages"}, {"end": 2480.6, "start": 2480.28, "text": "that"}, {"end": 2480.64, "start": 2480.6, "text": "we"}, {"end": 2480.68, "start": 2480.64, "text": "might"}, {"end": 2482.0, "start": 2480.68, "text": "visit,"}, {"end": 2482.24, "start": 2482.0, "text": "a"}, {"end": 2482.8, "start": 2482.24, "text": "problem"}, {"end": 2482.8, "start": 2482.8, "text": "of"}, {"end": 2483.36, "start": 2482.8, "text": "counting"}, {"end": 2483.6, "start": 2483.36, "text": "the"}, {"end": 2484.08, "start": 2483.6, "text": "number"}, {"end": 2484.36, "start": 2484.08, "text": "of"}, {"end": 2484.52, "start": 2484.36, "text": "unique"}, {"end": 2485.64, "start": 2484.52, "text": "predecessor"}, {"end": 2487.52, "start": 2485.64, "text": "pages."}, {"end": 2487.84, "start": 2487.52, "text": "Now,"}, {"end": 2488.8, "start": 2487.84, "text": "importantly,"}, {"end": 2489.16, "start": 2488.8, "text": "an"}, {"end": 2489.96, "start": 2489.16, "text": "approximation"}], "text": " a predecessor page more than once, but in fact, crawling is going on with many different machines and the many threads and everything's working in parallel. So the same page might be crawled several times. 
As a result, we have for each of the perhaps trillions of pages that we might visit, a problem of counting the number of unique predecessor pages. Now, importantly, an approximation"}, {"chunks": [{"end": 2490.2, "start": 2490.0, "text": "to"}, {"end": 2490.6, "start": 2490.2, "text": "this"}, {"end": 2491.12, "start": 2490.6, "text": "page"}, {"end": 2491.52, "start": 2491.12, "text": "is"}, {"end": 2492.64, "start": 2491.52, "text": "okay"}, {"end": 2493.48, "start": 2492.64, "text": "because"}, {"end": 2494.32, "start": 2493.48, "text": "the"}, {"end": 2494.8, "start": 2494.32, "text": "count"}, {"end": 2495.28, "start": 2494.8, "text": "is"}, {"end": 2495.68, "start": 2495.28, "text": "just"}, {"end": 2496.44, "start": 2495.68, "text": "an"}, {"end": 2496.96, "start": 2496.44, "text": "approximation"}, {"end": 2498.6, "start": 2496.96, "text": "to"}, {"end": 2499.56, "start": 2498.6, "text": "the"}, {"end": 2500.08, "start": 2499.56, "text": "page"}, {"end": 2500.76, "start": 2500.08, "text": "rank"}, {"end": 2501.4, "start": 2500.76, "text": "anyway."}, {"end": 2502.0, "start": 2501.4, "text": "So"}, {"end": 2502.72, "start": 2502.0, "text": "our"}, {"end": 2502.96, "start": 2502.72, "text": "goal"}, {"end": 2503.56, "start": 2502.96, "text": "then"}, {"end": 2503.92, "start": 2503.56, "text": "is"}, {"end": 2504.16, "start": 2503.92, "text": "to"}, {"end": 2508.52, "start": 2504.16, "text": "get"}, {"end": 2508.64, "start": 2508.52, "text": "a"}, {"end": 2508.96, "start": 2508.64, "text": "close"}, {"end": 2509.4, "start": 2508.96, "text": "approximation"}, {"end": 2509.72, "start": 2509.4, "text": "to"}, {"end": 2510.44, "start": 2509.72, "text": "the"}, {"end": 2512.56, "start": 2510.44, "text": "count"}, {"end": 2513.4, "start": 2512.56, "text": "of"}, {"end": 2514.36, "start": 2513.4, "text": "distinct"}, {"end": 2514.96, "start": 2514.36, "text": "elements"}, {"end": 2514.96, "start": 2514.96, "text": "in"}, {"end": 2515.04, "start": 2514.96, "text": 
"a"}, {"end": 2515.32, "start": 2515.04, "text": "stream."}, {"end": 2516.04, "start": 2515.32, "text": "And"}, {"end": 2516.12, "start": 2516.04, "text": "we"}, {"end": 2516.8, "start": 2516.12, "text": "want"}, {"end": 2517.84, "start": 2516.8, "text": "to"}, {"end": 2518.48, "start": 2517.84, "text": "use"}, {"end": 2519.04, "start": 2518.48, "text": "much"}, {"end": 2519.52, "start": 2519.04, "text": "less"}, {"end": 2519.96, "start": 2519.52, "text": "space"}], "text": " to this page is okay because the count is just an approximation to the page rank anyway. So our goal then is to get a close approximation to the count of distinct elements in a stream. And we want to use much less space"}, {"chunks": [{"end": 2520.12, "start": 2520.0, "text": "would"}, {"end": 2520.16, "start": 2520.12, "text": "be"}, {"end": 2520.6, "start": 2520.16, "text": "required"}, {"end": 2521.36, "start": 2520.6, "text": "by"}, {"end": 2521.6, "start": 2521.36, "text": "a"}, {"end": 2521.64, "start": 2521.6, "text": "hash"}, {"end": 2522.64, "start": 2521.64, "text": "table"}, {"end": 2523.0, "start": 2522.64, "text": "with"}, {"end": 2523.56, "start": 2523.0, "text": "all"}, {"end": 2524.4, "start": 2523.56, "text": "the"}, {"end": 2525.16, "start": 2524.4, "text": "unique"}, {"end": 2525.96, "start": 2525.16, "text": "elements."}, {"end": 2526.84, "start": 2525.96, "text": "And"}, {"end": 2527.12, "start": 2526.84, "text": "now"}, {"end": 2527.4, "start": 2527.12, "text": "there"}, {"end": 2527.56, "start": 2527.4, "text": "are"}, {"end": 2527.6, "start": 2527.56, "text": "a"}, {"end": 2527.84, "start": 2527.6, "text": "number"}, {"end": 2527.92, "start": 2527.84, "text": "of"}, {"end": 2528.48, "start": 2527.92, "text": "ways"}, {"end": 2529.08, "start": 2528.48, "text": "to"}, {"end": 2529.2, "start": 2529.08, "text": "do"}, {"end": 2529.64, "start": 2529.2, "text": "this,"}, {"end": 2530.04, "start": 2529.64, "text": "and"}, {"end": 2530.4, "start": 2530.04, "text": "I'm"}, 
{"end": 2530.72, "start": 2530.4, "text": "going"}, {"end": 2530.88, "start": 2530.72, "text": "to"}, {"end": 2531.32, "start": 2530.88, "text": "talk"}, {"end": 2531.56, "start": 2531.32, "text": "about"}, {"end": 2531.64, "start": 2531.56, "text": "an"}, {"end": 2532.48, "start": 2531.64, "text": "algorithm"}, {"end": 2533.44, "start": 2532.48, "text": "which"}, {"end": 2533.64, "start": 2533.44, "text": "is"}, {"end": 2534.04, "start": 2533.64, "text": "called"}, {"end": 2534.16, "start": 2534.04, "text": "the"}, {"end": 2535.16, "start": 2534.16, "text": "Flajolet-Martin"}, {"end": 2535.84, "start": 2535.16, "text": "algorithm."}, {"end": 2536.2, "start": 2535.84, "text": "So"}, {"end": 2536.8, "start": 2536.2, "text": "in"}, {"end": 2537.96, "start": 2536.8, "text": "Flajolet-Martin,"}, {"end": 2538.52, "start": 2537.96, "text": "instead"}, {"end": 2539.04, "start": 2538.52, "text": "of"}, {"end": 2539.52, "start": 2539.04, "text": "a"}, {"end": 2539.92, "start": 2539.52, "text": "hash"}, {"end": 2540.24, "start": 2539.92, "text": "table,"}, {"end": 2541.4, "start": 2540.24, "text": "we're"}, {"end": 2541.72, "start": 2541.4, "text": "going"}, {"end": 2541.72, "start": 2541.72, "text": "to"}, {"end": 2542.12, "start": 2541.72, "text": "keep"}, {"end": 2542.52, "start": 2542.12, "text": "a"}, {"end": 2543.36, "start": 2542.52, "text": "collection"}, {"end": 2543.6, "start": 2543.36, "text": "of"}, {"end": 2543.72, "start": 2543.6, "text": "what"}, {"end": 2543.72, "start": 2543.72, "text": "are"}, {"end": 2543.88, "start": 2543.72, "text": "called"}, {"end": 2544.28, "start": 2543.88, "text": "variables,"}, {"end": 2544.36, "start": 2544.28, "text": "each"}, {"end": 2544.48, "start": 2544.36, "text": "of"}, {"end": 2545.04, "start": 2544.48, "text": "which"}, {"end": 2545.44, "start": 2545.04, "text": "is"}, {"end": 2545.8, "start": 2545.44, "text": "a"}, {"end": 2546.4, "start": 2545.8, "text": "small"}, {"end": 2546.52, "start": 2546.4, "text": "integer."}, 
{"end": 2546.52, "start": 2546.52, "text": "And"}, {"end": 2546.52, "start": 2546.52, "text": "when"}, {"end": 2546.6, "start": 2546.52, "text": "I"}, {"end": 2547.24, "start": 2546.6, "text": "say"}, {"end": 2548.16, "start": 2547.24, "text": "small,"}, {"end": 2548.64, "start": 2548.16, "text": "I"}, {"end": 2549.12, "start": 2548.64, "text": "mean"}, {"end": 2549.52, "start": 2549.12, "text": "really"}, {"end": 2549.96, "start": 2549.52, "text": "small."}], "text": " would be required by a hash table with all the unique elements. And now there are a number of ways to do this, and I'm going to talk about an algorithm which is called the Flajolet-Martin algorithm. So in Flajolet-Martin, instead of a hash table, we're going to keep a collection of what are called variables, each of which is a small integer. And when I say small, I mean really small."}, {"chunks": [{"end": 2550.52, "start": 2550.0, "text": "Okay,"}, {"end": 2550.92, "start": 2550.52, "text": "five"}, {"end": 2551.24, "start": 2550.92, "text": "bits"}, {"end": 2551.8, "start": 2551.24, "text": "is"}, {"end": 2552.36, "start": 2551.8, "text": "quite"}, {"end": 2552.76, "start": 2552.36, "text": "sufficient"}, {"end": 2552.84, "start": 2552.76, "text": "in"}, {"end": 2553.32, "start": 2552.84, "text": "most"}, {"end": 2554.2, "start": 2553.32, "text": "applications"}, {"end": 2554.6, "start": 2554.2, "text": "and"}, {"end": 2554.76, "start": 2554.6, "text": "in"}, {"end": 2555.2, "start": 2554.76, "text": "some"}, {"end": 2555.8, "start": 2555.2, "text": "applications"}, {"end": 2555.96, "start": 2555.8, "text": "even"}, {"end": 2556.36, "start": 2555.96, "text": "four"}, {"end": 2556.64, "start": 2556.36, "text": "bits"}, {"end": 2556.8, "start": 2556.64, "text": "will"}, {"end": 2557.16, "start": 2556.8, "text": "do."}, {"end": 2557.24, "start": 2557.16, "text": "So"}, {"end": 2557.24, "start": 2557.24, "text": "we"}, {"end": 2557.92, "start": 2557.24, "text": "keep"}, {"end": 2558.6, "start": 
2557.92, "text": "a"}, {"end": 2558.84, "start": 2558.6, "text": "hundred"}, {"end": 2558.88, "start": 2558.84, "text": "of"}, {"end": 2559.12, "start": 2558.88, "text": "these"}, {"end": 2560.08, "start": 2559.12, "text": "variables,"}, {"end": 2560.72, "start": 2560.08, "text": "we"}, {"end": 2560.8, "start": 2560.72, "text": "need"}, {"end": 2561.6, "start": 2560.8, "text": "500"}, {"end": 2562.28, "start": 2561.6, "text": "bits"}, {"end": 2562.6, "start": 2562.28, "text": "or"}, {"end": 2562.72, "start": 2562.6, "text": "that's"}, {"end": 2562.84, "start": 2562.72, "text": "about"}, {"end": 2563.6, "start": 2562.84, "text": "63"}, {"end": 2564.44, "start": 2563.6, "text": "bytes."}, {"end": 2564.8, "start": 2564.44, "text": "That's"}, {"end": 2565.08, "start": 2564.8, "text": "way"}, {"end": 2565.64, "start": 2565.08, "text": "smaller"}, {"end": 2565.92, "start": 2565.64, "text": "than"}, {"end": 2566.16, "start": 2565.92, "text": "the"}, {"end": 2566.76, "start": 2566.16, "text": "hash"}, {"end": 2567.64, "start": 2566.76, "text": "table"}, {"end": 2568.72, "start": 2567.64, "text": "approach."}, {"end": 2569.12, "start": 2568.72, "text": "Now"}, {"end": 2569.64, "start": 2569.12, "text": "technically"}, {"end": 2570.48, "start": 2569.64, "text": "possibly"}, {"end": 2570.8, "start": 2570.48, "text": "you"}, {"end": 2570.84, "start": 2570.8, "text": "want"}, {"end": 2571.24, "start": 2570.84, "text": "two"}, {"end": 2571.44, "start": 2571.24, "text": "or"}, {"end": 2571.56, "start": 2571.44, "text": "three"}, {"end": 2572.16, "start": 2571.56, "text": "hundred"}, {"end": 2573.12, "start": 2572.16, "text": "variables"}, {"end": 2573.64, "start": 2573.12, "text": "to"}, {"end": 2574.16, "start": 2573.64, "text": "get"}, {"end": 2574.36, "start": 2574.16, "text": "a"}, {"end": 2574.6, "start": 2574.36, "text": "more"}, {"end": 2575.24, "start": 2574.6, "text": "accurate"}, {"end": 2575.76, "start": 2575.24, "text": "estimate,"}, {"end": 2575.92, "start": 2575.76, 
"text": "but"}, {"end": 2576.2, "start": 2575.92, "text": "even"}, {"end": 2576.64, "start": 2576.2, "text": "then"}, {"end": 2576.8, "start": 2576.64, "text": "we're"}, {"end": 2577.4, "start": 2576.8, "text": "talking"}, {"end": 2577.92, "start": 2577.4, "text": "about"}, {"end": 2578.08, "start": 2577.92, "text": "a"}, {"end": 2578.72, "start": 2578.08, "text": "tiny"}, {"end": 2578.92, "start": 2578.72, "text": "amount"}, {"end": 2579.24, "start": 2578.92, "text": "of"}, {"end": 2579.8, "start": 2579.24, "text": "space"}, {"end": 2579.96, "start": 2579.8, "text": "compared"}], "text": " Okay, five bits is quite sufficient in most applications and in some applications even four bits will do. So we keep a hundred of these variables, we need 500 bits or that's about 63 bytes. That's way smaller than the hash table approach. Now technically possibly you want two or three hundred variables to get a more accurate estimate, but even then we're talking about a tiny amount of space compared"}, {"chunks": [{"end": 2580.32, "start": 2580.0, "text": "with"}, {"end": 2580.68, "start": 2580.32, "text": "a"}, {"end": 2580.88, "start": 2580.68, "text": "hash"}, {"end": 2581.08, "start": 2580.88, "text": "table"}, {"end": 2581.6, "start": 2581.08, "text": "that"}, {"end": 2582.16, "start": 2581.6, "text": "holds"}, {"end": 2582.44, "start": 2582.16, "text": "everything"}, {"end": 2582.6, "start": 2582.44, "text": "you've"}, {"end": 2582.96, "start": 2582.6, "text": "ever"}, {"end": 2586.28, "start": 2582.96, "text": "seen."}, {"end": 2588.48, "start": 2586.28, "text": "Okay,"}, {"end": 2589.28, "start": 2588.48, "text": "now,"}, {"end": 2589.76, "start": 2589.28, "text": "with"}, {"end": 2590.0, "start": 2589.76, "text": "each"}, {"end": 2590.56, "start": 2590.0, "text": "variable"}, {"end": 2591.24, "start": 2590.56, "text": "v,"}, {"end": 2592.32, "start": 2591.24, "text": "there's"}, {"end": 2592.64, "start": 2592.32, "text": "a"}, {"end": 2593.16, "start": 2592.64, "text": 
"different"}, {"end": 2593.48, "start": 2593.16, "text": "hash"}, {"end": 2593.96, "start": 2593.48, "text": "function"}, {"end": 2595.44, "start": 2593.96, "text": "associated."}, {"end": 2596.12, "start": 2595.44, "text": "Okay,"}, {"end": 2596.2, "start": 2596.12, "text": "we're"}, {"end": 2597.0, "start": 2596.2, "text": "going"}, {"end": 2597.24, "start": 2597.0, "text": "to"}, {"end": 2597.76, "start": 2597.24, "text": "assume"}, {"end": 2598.64, "start": 2597.76, "text": "that"}, {"end": 2599.28, "start": 2598.64, "text": "the"}, {"end": 2599.84, "start": 2599.28, "text": "result"}, {"end": 2599.84, "start": 2599.84, "text": "of"}, {"end": 2600.44, "start": 2599.84, "text": "applying"}, {"end": 2600.6, "start": 2600.44, "text": "a"}, {"end": 2601.0, "start": 2600.6, "text": "hash"}, {"end": 2601.8, "start": 2601.0, "text": "function"}, {"end": 2602.56, "start": 2601.8, "text": "is"}, {"end": 2602.8, "start": 2602.56, "text": "a"}, {"end": 2602.92, "start": 2602.8, "text": "bit"}, {"end": 2603.32, "start": 2602.92, "text": "sequence,"}, {"end": 2603.44, "start": 2603.32, "text": "in"}, {"end": 2604.12, "start": 2603.44, "text": "particular,"}, {"end": 2604.2, "start": 2604.12, "text": "of"}, {"end": 2604.68, "start": 2604.2, "text": "31"}, {"end": 2608.84, "start": 2604.68, "text": "bits."}, {"end": 2609.48, "start": 2608.84, "text": "Now,"}, {"end": 2609.96, "start": 2609.48, "text": "suppose"}], "text": " with a hash table that holds everything you've ever seen. Okay, now, with each variable v, there's a different hash function associated. Okay, we're going to assume that the result of applying a hash function is a bit sequence, in particular, of 31 bits. 
Now, suppose"}, {"chunks": [{"end": 2610.16, "start": 2610.0, "text": "a"}, {"end": 2610.52, "start": 2610.16, "text": "new"}, {"end": 2610.92, "start": 2610.52, "text": "element"}, {"end": 2611.36, "start": 2610.92, "text": "appears"}, {"end": 2611.36, "start": 2611.36, "text": "on"}, {"end": 2611.48, "start": 2611.36, "text": "the"}, {"end": 2612.96, "start": 2611.48, "text": "stream."}, {"end": 2613.56, "start": 2612.96, "text": "For"}, {"end": 2614.12, "start": 2613.56, "text": "example,"}, {"end": 2614.6, "start": 2614.12, "text": "someone"}, {"end": 2614.6, "start": 2614.6, "text": "whom"}, {"end": 2614.6, "start": 2614.6, "text": "we"}, {"end": 2614.68, "start": 2614.6, "text": "may"}, {"end": 2614.84, "start": 2614.68, "text": "or"}, {"end": 2615.4, "start": 2614.84, "text": "may"}, {"end": 2616.0, "start": 2615.4, "text": "not"}, {"end": 2616.0, "start": 2616.0, "text": "have"}, {"end": 2616.24, "start": 2616.0, "text": "seen"}, {"end": 2617.16, "start": 2616.24, "text": "before"}, {"end": 2617.6, "start": 2617.16, "text": "logs"}, {"end": 2617.8, "start": 2617.6, "text": "into"}, {"end": 2618.12, "start": 2617.8, "text": "Facebook."}, {"end": 2618.16, "start": 2618.12, "text": "We"}, {"end": 2621.92, "start": 2618.16, "text": "apply"}, {"end": 2622.4, "start": 2621.92, "text": "each"}, {"end": 2622.76, "start": 2622.4, "text": "of"}, {"end": 2623.28, "start": 2622.76, "text": "the"}, {"end": 2623.76, "start": 2623.28, "text": "hash"}, {"end": 2624.56, "start": 2623.76, "text": "functions,"}, {"end": 2624.68, "start": 2624.56, "text": "one"}, {"end": 2625.28, "start": 2624.68, "text": "for"}, {"end": 2625.8, "start": 2625.28, "text": "each"}, {"end": 2626.04, "start": 2625.8, "text": "of"}, {"end": 2626.08, "start": 2626.04, "text": "the"}, {"end": 2626.4, "start": 2626.08, "text": "variables,"}, {"end": 2626.56, "start": 2626.4, "text": "to"}, {"end": 2626.84, "start": 2626.56, "text": "this"}, {"end": 2631.24, "start": 2626.84, "text": "new"}, {"end": 
2632.12, "start": 2631.24, "text": "element."}, {"end": 2632.64, "start": 2632.12, "text": "Now,"}, {"end": 2633.04, "start": 2632.64, "text": "we"}, {"end": 2633.56, "start": 2633.04, "text": "have"}, {"end": 2634.04, "start": 2633.56, "text": "to"}, {"end": 2634.56, "start": 2634.04, "text": "define"}, {"end": 2634.6, "start": 2634.56, "text": "the"}, {"end": 2635.24, "start": 2634.6, "text": "tail"}, {"end": 2636.36, "start": 2635.24, "text": "length"}, {"end": 2636.68, "start": 2636.36, "text": "of"}, {"end": 2637.0, "start": 2636.68, "text": "the"}, {"end": 2637.52, "start": 2637.0, "text": "result"}, {"end": 2637.88, "start": 2637.52, "text": "of"}, {"end": 2638.16, "start": 2637.88, "text": "a"}, {"end": 2638.44, "start": 2638.16, "text": "hash"}, {"end": 2638.64, "start": 2638.44, "text": "function"}, {"end": 2639.0, "start": 2638.64, "text": "is"}, {"end": 2639.44, "start": 2639.0, "text": "the"}, {"end": 2639.96, "start": 2639.44, "text": "number"}], "text": " a new element appears on the stream. For example, someone whom we may or may not have seen before logs into Facebook. We apply each of the hash functions, one for each of the variables, to this new element. 
Now, we have to define the tail length of the result of a hash function is the number"}, {"chunks": [{"end": 2640.12, "start": 2640.0, "text": "of"}, {"end": 2640.76, "start": 2640.12, "text": "consecutive"}, {"end": 2641.24, "start": 2640.76, "text": "zeros"}, {"end": 2641.28, "start": 2641.24, "text": "at"}, {"end": 2641.6, "start": 2641.28, "text": "the"}, {"end": 2641.76, "start": 2641.6, "text": "end"}, {"end": 2641.76, "start": 2641.76, "text": "of"}, {"end": 2641.84, "start": 2641.76, "text": "the"}, {"end": 2642.04, "start": 2641.84, "text": "bit"}, {"end": 2642.84, "start": 2642.04, "text": "string."}, {"end": 2643.2, "start": 2642.84, "text": "So"}, {"end": 2643.6, "start": 2643.2, "text": "about"}, {"end": 2643.88, "start": 2643.6, "text": "half"}, {"end": 2644.12, "start": 2643.88, "text": "of"}, {"end": 2644.16, "start": 2644.12, "text": "the"}, {"end": 2644.84, "start": 2644.16, "text": "elements"}, {"end": 2645.24, "start": 2644.84, "text": "will"}, {"end": 2645.64, "start": 2645.24, "text": "have"}, {"end": 2645.84, "start": 2645.64, "text": "a"}, {"end": 2646.08, "start": 2645.84, "text": "tail"}, {"end": 2646.28, "start": 2646.08, "text": "length"}, {"end": 2646.44, "start": 2646.28, "text": "of"}, {"end": 2646.96, "start": 2646.44, "text": "zero,"}, {"end": 2647.36, "start": 2646.96, "text": "a"}, {"end": 2648.2, "start": 2647.36, "text": "quarter"}, {"end": 2648.44, "start": 2648.2, "text": "have"}, {"end": 2648.6, "start": 2648.44, "text": "a"}, {"end": 2649.52, "start": 2648.6, "text": "tail"}, {"end": 2650.12, "start": 2649.52, "text": "length"}, {"end": 2650.52, "start": 2650.12, "text": "of"}, {"end": 2650.8, "start": 2650.52, "text": "one,"}, {"end": 2651.0, "start": 2650.8, "text": "an"}, {"end": 2651.24, "start": 2651.0, "text": "eighth"}, {"end": 2651.56, "start": 2651.24, "text": "have"}, {"end": 2651.68, "start": 2651.56, "text": "two,"}, {"end": 2652.16, "start": 2651.68, "text": "tail"}, {"end": 2652.2, "start": 2652.16, "text": 
"length"}, {"end": 2652.28, "start": 2652.2, "text": "of"}, {"end": 2652.6, "start": 2652.28, "text": "two,"}, {"end": 2652.96, "start": 2652.6, "text": "and"}, {"end": 2656.08, "start": 2652.96, "text": "so"}, {"end": 2657.12, "start": 2656.08, "text": "on."}, {"end": 2657.88, "start": 2657.12, "text": "Now,"}, {"end": 2658.08, "start": 2657.88, "text": "each"}, {"end": 2658.76, "start": 2658.08, "text": "variable"}, {"end": 2659.08, "start": 2658.76, "text": "is"}, {"end": 2659.44, "start": 2659.08, "text": "given"}, {"end": 2659.44, "start": 2659.44, "text": "the"}, {"end": 2659.56, "start": 2659.44, "text": "value"}, {"end": 2660.32, "start": 2659.56, "text": "that"}, {"end": 2660.68, "start": 2660.32, "text": "is"}, {"end": 2661.16, "start": 2660.68, "text": "the"}, {"end": 2662.12, "start": 2661.16, "text": "largest"}, {"end": 2662.28, "start": 2662.12, "text": "tail"}, {"end": 2662.36, "start": 2662.28, "text": "length"}, {"end": 2662.8, "start": 2662.36, "text": "seen"}, {"end": 2663.2, "start": 2662.8, "text": "so"}, {"end": 2663.44, "start": 2663.2, "text": "far."}, {"end": 2664.88, "start": 2663.44, "text": "So"}, {"end": 2665.24, "start": 2664.88, "text": "if"}, {"end": 2665.64, "start": 2665.24, "text": "we're"}, {"end": 2666.16, "start": 2665.64, "text": "hashing"}, {"end": 2666.36, "start": 2666.16, "text": "to"}, {"end": 2666.72, "start": 2666.36, "text": "a"}, {"end": 2667.48, "start": 2666.72, "text": "31-bit"}, {"end": 2668.2, "start": 2667.48, "text": "string,"}, {"end": 2668.6, "start": 2668.2, "text": "then"}, {"end": 2668.68, "start": 2668.6, "text": "the"}, {"end": 2669.28, "start": 2668.68, "text": "value"}, {"end": 2669.52, "start": 2669.28, "text": "of"}, {"end": 2669.76, "start": 2669.52, "text": "each"}, {"end": 2669.96, "start": 2669.76, "text": "variable"}], "text": " of consecutive zeros at the end of the bit string. 
So about half of the elements will have a tail length of zero, a quarter have a tail length of one, an eighth have two, tail length of two, and so on. Now, each variable is given the value that is the largest tail length seen so far. So if we're hashing to a 31-bit string, then the value of each variable"}, {"chunks": [{"end": 2670.52, "start": 2670.0, "text": "variable"}, {"end": 2670.76, "start": 2670.52, "text": "is"}, {"end": 2671.0, "start": 2670.76, "text": "between"}, {"end": 2671.48, "start": 2671.0, "text": "zero"}, {"end": 2671.6, "start": 2671.48, "text": "and"}, {"end": 2672.2, "start": 2671.6, "text": "31,"}, {"end": 2672.84, "start": 2672.2, "text": "and"}, {"end": 2673.44, "start": 2672.84, "text": "therefore"}, {"end": 2673.8, "start": 2673.44, "text": "five"}, {"end": 2674.52, "start": 2673.8, "text": "bits"}, {"end": 2675.32, "start": 2674.52, "text": "suffice"}, {"end": 2675.6, "start": 2675.32, "text": "to"}, {"end": 2675.88, "start": 2675.6, "text": "represent"}, {"end": 2676.0, "start": 2675.88, "text": "the"}, {"end": 2676.32, "start": 2676.0, "text": "variable,"}, {"end": 2677.28, "start": 2676.32, "text": "as"}, {"end": 2677.6, "start": 2677.28, "text": "I"}, {"end": 2677.96, "start": 2677.6, "text": "said."}, {"end": 2679.08, "start": 2677.96, "text": "Now,"}, {"end": 2679.24, "start": 2679.08, "text": "we"}, {"end": 2680.08, "start": 2679.24, "text": "need"}, {"end": 2680.24, "start": 2680.08, "text": "to"}, {"end": 2681.16, "start": 2680.24, "text": "notice"}, {"end": 2682.32, "start": 2681.16, "text": "something"}, {"end": 2682.72, "start": 2682.32, "text": "quite"}, {"end": 2683.6, "start": 2682.72, "text": "important,"}, {"end": 2684.32, "start": 2683.6, "text": "okay?"}, {"end": 2684.72, "start": 2684.32, "text": "If"}, {"end": 2685.04, "start": 2684.72, "text": "the"}, {"end": 2685.48, "start": 2685.04, "text": "same"}, {"end": 2686.12, "start": 2685.48, "text": "element"}, {"end": 2686.52, "start": 2686.12, "text": "appears"}, 
{"end": 2686.84, "start": 2686.52, "text": "many"}, {"end": 2687.8, "start": 2686.84, "text": "times"}, {"end": 2688.6, "start": 2687.8, "text": "in"}, {"end": 2689.24, "start": 2688.6, "text": "the"}, {"end": 2689.84, "start": 2689.24, "text": "stream,"}, {"end": 2690.16, "start": 2689.84, "text": "each"}, {"end": 2690.48, "start": 2690.16, "text": "time"}, {"end": 2691.2, "start": 2690.48, "text": "it"}, {"end": 2691.68, "start": 2691.2, "text": "appears,"}, {"end": 2691.92, "start": 2691.68, "text": "it"}, {"end": 2692.08, "start": 2691.92, "text": "will"}, {"end": 2692.64, "start": 2692.08, "text": "get"}, {"end": 2692.64, "start": 2692.64, "text": "the"}, {"end": 2692.64, "start": 2692.64, "text": "same"}, {"end": 2692.64, "start": 2692.64, "text": "value"}, {"end": 2692.68, "start": 2692.64, "text": "for"}, {"end": 2692.84, "start": 2692.68, "text": "each"}, {"end": 2692.96, "start": 2692.84, "text": "of"}, {"end": 2694.16, "start": 2692.96, "text": "its"}, {"end": 2695.08, "start": 2694.16, "text": "hash"}, {"end": 2696.0, "start": 2695.08, "text": "functions,"}, {"end": 2696.2, "start": 2696.0, "text": "and"}, {"end": 2696.76, "start": 2696.2, "text": "therefore"}, {"end": 2696.76, "start": 2696.76, "text": "it"}, {"end": 2697.0, "start": 2696.76, "text": "gets"}, {"end": 2697.0, "start": 2697.0, "text": "the"}, {"end": 2697.0, "start": 2697.0, "text": "same"}, {"end": 2697.0, "start": 2697.0, "text": "tail"}, {"end": 2697.0, "start": 2697.0, "text": "length,"}, {"end": 2698.56, "start": 2697.0, "text": "okay?"}, {"end": 2698.92, "start": 2698.56, "text": "As"}, {"end": 2699.12, "start": 2698.92, "text": "a"}, {"end": 2699.96, "start": 2699.12, "text": "result,"}], "text": " variable is between zero and 31, and therefore five bits suffice to represent the variable, as I said. Now, we need to notice something quite important, okay? 
If the same element appears many times in the stream, each time it appears, it will get the same value for each of its hash functions, and therefore it gets the same tail length, okay? As a result,"}, {"chunks": [{"end": 2700.16, "start": 2700.0, "text": "Well,"}, {"end": 2700.28, "start": 2700.16, "text": "if"}, {"end": 2700.32, "start": 2700.28, "text": "you"}, {"end": 2700.96, "start": 2700.32, "text": "think"}, {"end": 2701.92, "start": 2700.96, "text": "about"}, {"end": 2702.04, "start": 2701.92, "text": "it,"}, {"end": 2702.04, "start": 2702.04, "text": "an"}, {"end": 2702.44, "start": 2702.04, "text": "element"}, {"end": 2702.8, "start": 2702.44, "text": "can"}, {"end": 2703.16, "start": 2702.8, "text": "only"}, {"end": 2703.4, "start": 2703.16, "text": "affect"}, {"end": 2703.52, "start": 2703.4, "text": "the"}, {"end": 2704.32, "start": 2703.52, "text": "value"}, {"end": 2704.52, "start": 2704.32, "text": "of"}, {"end": 2704.6, "start": 2704.52, "text": "a"}, {"end": 2705.28, "start": 2704.6, "text": "variable"}, {"end": 2705.6, "start": 2705.28, "text": "the"}, {"end": 2706.04, "start": 2705.6, "text": "first"}, {"end": 2706.36, "start": 2706.04, "text": "time"}, {"end": 2706.8, "start": 2706.36, "text": "it's"}, {"end": 2706.84, "start": 2706.8, "text": "seen,"}, {"end": 2706.88, "start": 2706.84, "text": "in"}, {"end": 2707.16, "start": 2706.88, "text": "the"}, {"end": 2707.4, "start": 2707.16, "text": "case"}, {"end": 2707.56, "start": 2707.4, "text": "that"}, {"end": 2708.04, "start": 2707.56, "text": "it"}, {"end": 2708.68, "start": 2708.04, "text": "happens"}, {"end": 2709.12, "start": 2708.68, "text": "to"}, {"end": 2709.72, "start": 2709.12, "text": "have"}, {"end": 2709.88, "start": 2709.72, "text": "the"}, {"end": 2710.44, "start": 2709.88, "text": "longest"}, {"end": 2711.12, "start": 2710.44, "text": "tail"}, {"end": 2711.44, "start": 2711.12, "text": "so"}, {"end": 2711.88, "start": 2711.44, "text": "far."}, {"end": 2712.36, "start": 2711.88, 
"text": "The"}, {"end": 2713.16, "start": 2712.36, "text": "second"}, {"end": 2713.44, "start": 2713.16, "text": "and"}, {"end": 2714.2, "start": 2713.44, "text": "subsequent"}, {"end": 2715.4, "start": 2714.2, "text": "occurrences"}, {"end": 2715.72, "start": 2715.4, "text": "of"}, {"end": 2715.76, "start": 2715.72, "text": "an"}, {"end": 2716.16, "start": 2715.76, "text": "element,"}, {"end": 2716.36, "start": 2716.16, "text": "you"}, {"end": 2716.88, "start": 2716.36, "text": "get"}, {"end": 2717.16, "start": 2716.88, "text": "the"}, {"end": 2717.4, "start": 2717.16, "text": "same"}, {"end": 2717.76, "start": 2717.4, "text": "tail"}, {"end": 2718.08, "start": 2717.76, "text": "lengths,"}, {"end": 2718.44, "start": 2718.08, "text": "so"}, {"end": 2718.96, "start": 2718.44, "text": "you"}, {"end": 2719.44, "start": 2718.96, "text": "can't"}, {"end": 2719.76, "start": 2719.44, "text": "change"}, {"end": 2720.2, "start": 2719.76, "text": "anything."}, {"end": 2720.6, "start": 2720.2, "text": "And"}, {"end": 2720.88, "start": 2720.6, "text": "so"}, {"end": 2721.4, "start": 2720.88, "text": "as"}, {"end": 2721.92, "start": 2721.4, "text": "far"}, {"end": 2722.24, "start": 2721.92, "text": "as"}, {"end": 2722.64, "start": 2722.24, "text": "this"}, {"end": 2723.76, "start": 2722.64, "text": "process"}, {"end": 2723.88, "start": 2723.76, "text": "is"}, {"end": 2724.68, "start": 2723.88, "text": "concerned,"}, {"end": 2724.76, "start": 2724.68, "text": "the"}, {"end": 2725.52, "start": 2724.76, "text": "second"}, {"end": 2725.92, "start": 2725.52, "text": "and"}, {"end": 2726.64, "start": 2725.92, "text": "subsequent"}, {"end": 2726.92, "start": 2726.64, "text": "elements,"}, {"end": 2727.76, "start": 2726.92, "text": "occurrences"}, {"end": 2728.2, "start": 2727.76, "text": "of"}, {"end": 2728.64, "start": 2728.2, "text": "an"}, {"end": 2728.76, "start": 2728.64, "text": "element"}, {"end": 2728.96, "start": 2728.76, "text": "might"}, {"end": 2729.32, "start": 2728.96, 
"text": "not"}, {"end": 2729.96, "start": 2729.32, "text": "exist."}], "text": " Well, if you think about it, an element can only affect the value of a variable the first time it's seen, in the case that it happens to have the longest tail so far. The second and subsequent occurrences of an element, you get the same tail lengths, so you can't change anything. And so as far as this process is concerned, the second and subsequent elements, occurrences of an element might not exist."}, {"chunks": [{"end": 2731.04, "start": 2730.0, "text": "intuitively"}, {"end": 2731.96, "start": 2731.04, "text": "why"}, {"end": 2732.52, "start": 2731.96, "text": "we"}, {"end": 2732.92, "start": 2732.52, "text": "are"}, {"end": 2733.84, "start": 2732.92, "text": "approximating"}, {"end": 2733.88, "start": 2733.84, "text": "the"}, {"end": 2734.44, "start": 2733.88, "text": "count"}, {"end": 2734.84, "start": 2734.44, "text": "of"}, {"end": 2740.8, "start": 2734.84, "text": "distinct"}, {"end": 2742.56, "start": 2740.8, "text": "elements."}, {"end": 2742.92, "start": 2742.56, "text": "Now,"}, {"end": 2743.28, "start": 2742.92, "text": "each"}, {"end": 2744.4, "start": 2743.28, "text": "variable"}, {"end": 2745.08, "start": 2744.4, "text": "gives"}, {"end": 2745.52, "start": 2745.08, "text": "its"}, {"end": 2745.68, "start": 2745.52, "text": "own"}, {"end": 2746.68, "start": 2745.68, "text": "estimate"}, {"end": 2747.24, "start": 2746.68, "text": "of"}, {"end": 2747.48, "start": 2747.24, "text": "how"}, {"end": 2747.88, "start": 2747.48, "text": "many"}, {"end": 2748.52, "start": 2747.88, "text": "distinct"}, {"end": 2749.0, "start": 2748.52, "text": "elements"}, {"end": 2749.08, "start": 2749.0, "text": "you've"}, {"end": 2750.88, "start": 2749.08, "text": "seen."}, {"end": 2751.32, "start": 2750.88, "text": "In"}, {"end": 2752.08, "start": 2751.32, "text": "particular,"}, {"end": 2752.28, "start": 2752.08, "text": "a"}, {"end": 2753.0, "start": 2752.28, "text": "variable"}, {"end": 
2753.6, "start": 2753.0, "text": "that"}, {"end": 2754.28, "start": 2753.6, "text": "has"}, {"end": 2754.6, "start": 2754.28, "text": "the"}, {"end": 2755.32, "start": 2754.6, "text": "value"}, {"end": 2755.84, "start": 2755.32, "text": "of"}, {"end": 2756.56, "start": 2755.84, "text": "tail"}, {"end": 2757.16, "start": 2756.56, "text": "length"}, {"end": 2758.56, "start": 2757.16, "text": "r"}, {"end": 2759.16, "start": 2758.56, "text": "estimates"}, {"end": 2759.16, "start": 2759.16, "text": "we"}, {"end": 2759.48, "start": 2759.16, "text": "have"}, {"end": 2759.96, "start": 2759.48, "text": "seen"}], "text": " intuitively why we are approximating the count of distinct elements. Now, each variable gives its own estimate of how many distinct elements you've seen. In particular, a variable that has the value of tail length r estimates we have seen"}, {"chunks": [{"end": 2760.0, "start": 2760.0, "text": "2"}, {"end": 2760.16, "start": 2760.0, "text": "to"}, {"end": 2760.52, "start": 2760.16, "text": "the"}, {"end": 2761.04, "start": 2760.52, "text": "power"}, {"end": 2761.32, "start": 2761.04, "text": "r"}, {"end": 2761.6, "start": 2761.32, "text": "different"}, {"end": 2765.88, "start": 2761.6, "text": "elements."}, {"end": 2766.68, "start": 2765.88, "text": "Now,"}, {"end": 2767.24, "start": 2766.68, "text": "why"}, {"end": 2767.88, "start": 2767.24, "text": "this"}, {"end": 2768.64, "start": 2767.88, "text": "makes"}, {"end": 2769.16, "start": 2768.64, "text": "sense,"}, {"end": 2770.0, "start": 2769.16, "text": "it"}, {"end": 2770.48, "start": 2770.0, "text": "looks"}, {"end": 2771.44, "start": 2770.48, "text": "tricky."}, {"end": 2772.68, "start": 2771.44, "text": "The"}, {"end": 2773.12, "start": 2772.68, "text": "intuition"}, {"end": 2773.56, "start": 2773.12, "text": "is"}, {"end": 2774.08, "start": 2773.56, "text": "that"}, {"end": 2774.36, "start": 2774.08, "text": "if"}, {"end": 2774.6, "start": 2774.36, "text": "you've"}, {"end": 2775.4, "start": 2774.6, 
"text": "seen"}, {"end": 2775.6, "start": 2775.4, "text": "as"}, {"end": 2775.76, "start": 2775.6, "text": "many"}, {"end": 2775.92, "start": 2775.76, "text": "as"}, {"end": 2776.16, "start": 2775.92, "text": "2"}, {"end": 2776.64, "start": 2776.16, "text": "to"}, {"end": 2776.88, "start": 2776.64, "text": "the"}, {"end": 2777.04, "start": 2776.88, "text": "r"}, {"end": 2777.76, "start": 2777.04, "text": "different"}, {"end": 2778.28, "start": 2777.76, "text": "elements,"}, {"end": 2778.76, "start": 2778.28, "text": "then"}, {"end": 2779.28, "start": 2778.76, "text": "there's"}, {"end": 2779.28, "start": 2779.28, "text": "a"}, {"end": 2779.28, "start": 2779.28, "text": "good"}, {"end": 2779.28, "start": 2779.28, "text": "chance"}, {"end": 2779.28, "start": 2779.28, "text": "that"}, {"end": 2779.28, "start": 2779.28, "text": "one"}, {"end": 2779.28, "start": 2779.28, "text": "of"}, {"end": 2779.4, "start": 2779.28, "text": "these"}, {"end": 2780.12, "start": 2779.4, "text": "elements"}, {"end": 2780.36, "start": 2780.12, "text": "will"}, {"end": 2780.76, "start": 2780.36, "text": "hash"}, {"end": 2780.76, "start": 2780.76, "text": "to"}, {"end": 2780.8, "start": 2780.76, "text": "a"}, {"end": 2781.12, "start": 2780.8, "text": "string"}, {"end": 2781.44, "start": 2781.12, "text": "with"}, {"end": 2781.68, "start": 2781.44, "text": "r0s"}, {"end": 2781.68, "start": 2781.68, "text": "at"}, {"end": 2782.08, "start": 2781.68, "text": "the"}, {"end": 2782.72, "start": 2782.08, "text": "end."}, {"end": 2782.84, "start": 2782.72, "text": "In"}, {"end": 2783.12, "start": 2782.84, "text": "fact,"}, {"end": 2784.16, "start": 2783.12, "text": "for"}, {"end": 2784.32, "start": 2784.16, "text": "any"}, {"end": 2784.56, "start": 2784.32, "text": "string"}, {"end": 2784.6, "start": 2784.56, "text": "of"}, {"end": 2785.28, "start": 2784.6, "text": "length"}, {"end": 2785.48, "start": 2785.28, "text": "r,"}, {"end": 2786.0, "start": 2785.48, "text": "there's"}, {"end": 2786.44, 
"start": 2786.0, "text": "a"}, {"end": 2786.64, "start": 2786.44, "text": "good"}, {"end": 2787.08, "start": 2786.64, "text": "chance"}, {"end": 2787.28, "start": 2787.08, "text": "you'll"}, {"end": 2787.4, "start": 2787.28, "text": "see"}, {"end": 2787.4, "start": 2787.4, "text": "that"}, {"end": 2789.96, "start": 2787.4, "text": "string."}], "text": " 2 to the power r different elements. Now, why this makes sense, it looks tricky. The intuition is that if you've seen as many as 2 to the r different elements, then there's a good chance that one of these elements will hash to a string with r 0s at the end. In fact, for any string of length r, there's a good chance you'll see that string."}, {"chunks": [{"end": 2790.0, "start": 2790.0, "text": "Okay,"}, {"end": 2791.24, "start": 2790.0, "text": "but"}, {"end": 2791.56, "start": 2791.24, "text": "if"}, {"end": 2791.76, "start": 2791.56, "text": "you've"}, {"end": 2791.92, "start": 2791.76, "text": "seen"}, {"end": 2792.4, "start": 2791.92, "text": "many"}, {"end": 2793.24, "start": 2792.4, "text": "fewer"}, {"end": 2793.76, "start": 2793.24, "text": "than"}, {"end": 2793.76, "start": 2793.76, "text": "two"}, {"end": 2793.92, "start": 2793.76, "text": "to"}, {"end": 2794.36, "start": 2793.92, "text": "the"}, {"end": 2794.64, "start": 2794.36, "text": "r"}, {"end": 2794.88, "start": 2794.64, "text": "elements,"}, {"end": 2795.08, "start": 2794.88, "text": "then"}, {"end": 2795.36, "start": 2795.08, "text": "it's"}, {"end": 2797.4, "start": 2795.36, "text": "unlikely"}, {"end": 2797.6, "start": 2797.4, "text": "that"}, {"end": 2798.08, "start": 2797.6, "text": "any"}, {"end": 2798.44, "start": 2798.08, "text": "element"}, {"end": 2798.52, "start": 2798.44, "text": "you've"}, {"end": 2798.72, "start": 2798.52, "text": "seen"}, {"end": 2799.2, "start": 2798.72, "text": "will"}, {"end": 2799.84, "start": 2799.2, "text": "have"}, {"end": 2800.64, "start": 2799.84, "text": "a"}, {"end": 2801.12, "start": 2800.64, "text": 
"hash"}, {"end": 2801.48, "start": 2801.12, "text": "value"}, {"end": 2801.8, "start": 2801.48, "text": "with"}, {"end": 2802.12, "start": 2801.8, "text": "as"}, {"end": 2802.24, "start": 2802.12, "text": "many"}, {"end": 2802.52, "start": 2802.24, "text": "as"}, {"end": 2802.76, "start": 2802.52, "text": "r"}, {"end": 2803.28, "start": 2802.76, "text": "zeros"}, {"end": 2803.48, "start": 2803.28, "text": "at"}, {"end": 2808.24, "start": 2803.48, "text": "the"}, {"end": 2808.96, "start": 2808.24, "text": "end."}, {"end": 2809.2, "start": 2808.96, "text": "Okay,"}, {"end": 2809.36, "start": 2809.2, "text": "well,"}, {"end": 2810.64, "start": 2809.36, "text": "it"}, {"end": 2811.28, "start": 2810.64, "text": "is"}, {"end": 2811.72, "start": 2811.28, "text": "then"}, {"end": 2812.6, "start": 2811.72, "text": "necessary"}, {"end": 2812.64, "start": 2812.6, "text": "to"}, {"end": 2813.52, "start": 2812.64, "text": "combine"}, {"end": 2813.8, "start": 2813.52, "text": "the"}, {"end": 2814.4, "start": 2813.8, "text": "estimates"}, {"end": 2814.84, "start": 2814.4, "text": "from"}, {"end": 2815.52, "start": 2814.84, "text": "perhaps"}, {"end": 2816.12, "start": 2815.52, "text": "several"}, {"end": 2816.28, "start": 2816.12, "text": "hundred"}, {"end": 2816.92, "start": 2816.28, "text": "variables."}, {"end": 2818.12, "start": 2816.92, "text": "This"}, {"end": 2818.44, "start": 2818.12, "text": "is"}, {"end": 2818.44, "start": 2818.44, "text": "a"}, {"end": 2818.88, "start": 2818.44, "text": "little"}, {"end": 2819.6, "start": 2818.88, "text": "tricky"}, {"end": 2819.8, "start": 2819.6, "text": "to"}, {"end": 2819.8, "start": 2819.8, "text": "get"}, {"end": 2819.96, "start": 2819.8, "text": "right."}], "text": " Okay, but if you've seen many fewer than two to the r elements, then it's unlikely that any element you've seen will have a hash value with as many as r zeros at the end. 
Okay, well, it is then necessary to combine the estimates from perhaps several hundred variables. This is a little tricky to get right."}, {"chunks": [{"end": 2820.36, "start": 2820.0, "text": "You"}, {"end": 2821.08, "start": 2820.36, "text": "can't"}, {"end": 2821.48, "start": 2821.08, "text": "just"}, {"end": 2822.24, "start": 2821.48, "text": "average"}, {"end": 2822.64, "start": 2822.24, "text": "them."}, {"end": 2822.96, "start": 2822.64, "text": "The"}, {"end": 2823.36, "start": 2822.96, "text": "reason"}, {"end": 2823.92, "start": 2823.36, "text": "is"}, {"end": 2824.64, "start": 2823.92, "text": "that"}, {"end": 2825.44, "start": 2824.64, "text": "one"}, {"end": 2826.68, "start": 2825.44, "text": "extreme"}, {"end": 2827.28, "start": 2826.68, "text": "element,"}, {"end": 2827.64, "start": 2827.28, "text": "one"}, {"end": 2828.4, "start": 2827.64, "text": "extremely"}, {"end": 2828.76, "start": 2828.4, "text": "high"}, {"end": 2828.84, "start": 2828.76, "text": "element"}, {"end": 2828.96, "start": 2828.84, "text": "would"}, {"end": 2829.24, "start": 2828.96, "text": "just"}, {"end": 2829.76, "start": 2829.24, "text": "bias"}, {"end": 2830.0, "start": 2829.76, "text": "the"}, {"end": 2830.32, "start": 2830.0, "text": "average."}, {"end": 2830.92, "start": 2830.32, "text": "And"}, {"end": 2831.16, "start": 2830.92, "text": "you"}, {"end": 2831.8, "start": 2831.16, "text": "can't"}, {"end": 2831.8, "start": 2831.8, "text": "take"}, {"end": 2832.44, "start": 2831.8, "text": "the"}, {"end": 2832.96, "start": 2832.44, "text": "median."}, {"end": 2833.08, "start": 2832.96, "text": "The"}, {"end": 2833.6, "start": 2833.08, "text": "reason"}, {"end": 2834.64, "start": 2833.6, "text": "is"}, {"end": 2835.16, "start": 2834.64, "text": "that"}, {"end": 2835.76, "start": 2835.16, "text": "you'd"}, {"end": 2835.8, "start": 2835.76, "text": "only"}, {"end": 2835.8, "start": 2835.8, "text": "get"}, {"end": 2836.4, "start": 2835.8, "text": "powers"}, {"end": 2836.4, 
"start": 2836.4, "text": "of"}, {"end": 2837.12, "start": 2836.4, "text": "two"}, {"end": 2837.84, "start": 2837.12, "text": "because"}, {"end": 2837.92, "start": 2837.84, "text": "every"}, {"end": 2838.12, "start": 2837.92, "text": "estimate's"}, {"end": 2838.12, "start": 2838.12, "text": "a"}, {"end": 2838.48, "start": 2838.12, "text": "power"}, {"end": 2838.68, "start": 2838.48, "text": "of"}, {"end": 2839.4, "start": 2838.68, "text": "two."}, {"end": 2839.76, "start": 2839.4, "text": "So"}, {"end": 2839.84, "start": 2839.76, "text": "as"}, {"end": 2840.16, "start": 2839.84, "text": "I"}, {"end": 2840.52, "start": 2840.16, "text": "said,"}, {"end": 2840.84, "start": 2840.52, "text": "it's"}, {"end": 2840.96, "start": 2840.84, "text": "a"}, {"end": 2840.96, "start": 2840.96, "text": "little"}, {"end": 2841.32, "start": 2840.96, "text": "tricky"}, {"end": 2842.0, "start": 2841.32, "text": "to"}, {"end": 2842.4, "start": 2842.0, "text": "get"}, {"end": 2843.12, "start": 2842.4, "text": "right."}, {"end": 2843.76, "start": 2843.12, "text": "One"}, {"end": 2844.64, "start": 2843.76, "text": "way"}, {"end": 2845.16, "start": 2844.64, "text": "that"}, {"end": 2845.68, "start": 2845.16, "text": "works"}, {"end": 2846.08, "start": 2845.68, "text": "is"}, {"end": 2846.32, "start": 2846.08, "text": "you"}, {"end": 2847.04, "start": 2846.32, "text": "form"}, {"end": 2847.92, "start": 2847.04, "text": "small"}, {"end": 2848.76, "start": 2847.92, "text": "groups"}, {"end": 2849.12, "start": 2848.76, "text": "of"}, {"end": 2849.96, "start": 2849.12, "text": "variables."}], "text": " You can't just average them. The reason is that one extreme element, one extremely high element would just bias the average. And you can't take the median. The reason is that you'd only get powers of two because every estimate's a power of two. So as I said, it's a little tricky to get right. 
One way that works is you form small groups of variables."}, {"chunks": [{"end": 2850.08, "start": 2850.0, "text": "You"}, {"end": 2850.48, "start": 2850.08, "text": "take"}, {"end": 2850.68, "start": 2850.48, "text": "the"}, {"end": 2851.52, "start": 2850.68, "text": "average"}, {"end": 2851.92, "start": 2851.52, "text": "of"}, {"end": 2852.28, "start": 2851.92, "text": "the"}, {"end": 2852.8, "start": 2852.28, "text": "estimates"}, {"end": 2853.52, "start": 2852.8, "text": "within"}, {"end": 2853.8, "start": 2853.52, "text": "that"}, {"end": 2853.88, "start": 2853.8, "text": "group,"}, {"end": 2855.0, "start": 2853.88, "text": "and"}, {"end": 2855.52, "start": 2855.0, "text": "then"}, {"end": 2855.76, "start": 2855.52, "text": "you"}, {"end": 2856.08, "start": 2855.76, "text": "take"}, {"end": 2856.24, "start": 2856.08, "text": "the"}, {"end": 2856.84, "start": 2856.24, "text": "median"}, {"end": 2856.84, "start": 2856.84, "text": "of"}, {"end": 2856.88, "start": 2856.84, "text": "the"}, {"end": 2857.28, "start": 2856.88, "text": "group"}, {"end": 2858.16, "start": 2857.28, "text": "estimates."}, {"end": 2858.6, "start": 2858.16, "text": "And"}, {"end": 2858.96, "start": 2858.6, "text": "I"}, {"end": 2859.36, "start": 2858.96, "text": "won't"}, {"end": 2860.04, "start": 2859.36, "text": "go"}, {"end": 2860.32, "start": 2860.04, "text": "into"}, {"end": 2860.32, "start": 2860.32, "text": "it,"}, {"end": 2860.32, "start": 2860.32, "text": "but"}, {"end": 2860.6, "start": 2860.32, "text": "that"}, {"end": 2861.28, "start": 2860.6, "text": "actually"}, {"end": 2865.68, "start": 2861.28, "text": "works."}, {"end": 2866.56, "start": 2865.68, "text": "So"}, {"end": 2866.8, "start": 2866.56, "text": "here"}, {"end": 2867.6, "start": 2866.8, "text": "are"}, {"end": 2867.96, "start": 2867.6, "text": "the"}, {"end": 2869.48, "start": 2867.96, "text": "three"}, {"end": 2870.32, "start": 2869.48, "text": "points"}, {"end": 2870.64, "start": 2870.32, "text": "that"}, {"end": 
2870.88, "start": 2870.64, "text": "I"}, {"end": 2871.16, "start": 2870.88, "text": "hope"}, {"end": 2871.32, "start": 2871.16, "text": "you"}, {"end": 2871.56, "start": 2871.32, "text": "will"}, {"end": 2871.96, "start": 2871.56, "text": "take"}, {"end": 2872.56, "start": 2871.96, "text": "away."}, {"end": 2873.24, "start": 2872.56, "text": "First,"}, {"end": 2873.96, "start": 2873.24, "text": "data"}, {"end": 2874.48, "start": 2873.96, "text": "science"}, {"end": 2874.96, "start": 2874.48, "text": "is"}, {"end": 2875.24, "start": 2874.96, "text": "really"}, {"end": 2875.72, "start": 2875.24, "text": "the"}, {"end": 2876.4, "start": 2875.72, "text": "natural"}, {"end": 2877.0, "start": 2876.4, "text": "evolution"}, {"end": 2877.24, "start": 2877.0, "text": "of"}, {"end": 2877.48, "start": 2877.24, "text": "work"}, {"end": 2877.76, "start": 2877.48, "text": "on"}, {"end": 2878.64, "start": 2877.76, "text": "large"}, {"end": 2879.96, "start": 2878.64, "text": "scale"}], "text": " You take the average of the estimates within that group, and then you take the median of the group estimates. And I won't go into it, but that actually works. So here are the three points that I hope you will take away. 
First, data science is really the natural evolution of work on large scale"}, {"chunks": [{"end": 2880.76, "start": 2880.0, "text": "data"}, {"end": 2881.92, "start": 2880.76, "text": "management"}, {"end": 2882.52, "start": 2881.92, "text": "that"}, {"end": 2882.84, "start": 2882.52, "text": "occurred"}, {"end": 2883.04, "start": 2882.84, "text": "in"}, {"end": 2883.4, "start": 2883.04, "text": "many"}, {"end": 2883.76, "start": 2883.4, "text": "areas"}, {"end": 2883.8, "start": 2883.76, "text": "of"}, {"end": 2884.36, "start": 2883.8, "text": "computer"}, {"end": 2885.24, "start": 2884.36, "text": "science,"}, {"end": 2885.84, "start": 2885.24, "text": "but"}, {"end": 2886.12, "start": 2885.84, "text": "it's"}, {"end": 2886.64, "start": 2886.12, "text": "oriented"}, {"end": 2886.96, "start": 2886.64, "text": "toward"}, {"end": 2887.32, "start": 2886.96, "text": "the"}, {"end": 2887.8, "start": 2887.32, "text": "applications"}, {"end": 2888.16, "start": 2887.8, "text": "to"}, {"end": 2888.88, "start": 2888.16, "text": "science"}, {"end": 2888.96, "start": 2888.88, "text": "and"}, {"end": 2890.8, "start": 2888.96, "text": "industry."}, {"end": 2891.96, "start": 2890.8, "text": "The"}, {"end": 2893.4, "start": 2891.96, "text": "statistics"}, {"end": 2894.08, "start": 2893.4, "text": "community,"}, {"end": 2894.36, "start": 2894.08, "text": "I"}, {"end": 2894.72, "start": 2894.36, "text": "believe,"}, {"end": 2895.0, "start": 2894.72, "text": "has"}, {"end": 2895.12, "start": 2895.0, "text": "an"}, {"end": 2895.48, "start": 2895.12, "text": "important"}, {"end": 2895.8, "start": 2895.48, "text": "role"}, {"end": 2895.88, "start": 2895.8, "text": "to"}, {"end": 2896.32, "start": 2895.88, "text": "play,"}, {"end": 2896.88, "start": 2896.32, "text": "but"}, {"end": 2897.24, "start": 2896.88, "text": "their"}, {"end": 2898.28, "start": 2897.24, "text": "importance"}, {"end": 2898.64, "start": 2898.28, "text": "is"}, {"end": 2899.44, "start": 2898.64, "text": "enhanced"}, 
{"end": 2899.68, "start": 2899.44, "text": "by"}, {"end": 2899.96, "start": 2899.68, "text": "focusing"}, {"end": 2900.28, "start": 2899.96, "text": "less"}, {"end": 2900.4, "start": 2900.28, "text": "on"}, {"end": 2901.16, "start": 2900.4, "text": "examining"}, {"end": 2901.64, "start": 2901.16, "text": "data"}, {"end": 2901.72, "start": 2901.64, "text": "and"}, {"end": 2902.36, "start": 2901.72, "text": "more"}, {"end": 2902.48, "start": 2902.36, "text": "on"}, {"end": 2903.2, "start": 2902.48, "text": "algorithms"}, {"end": 2903.36, "start": 2903.2, "text": "that"}, {"end": 2903.68, "start": 2903.36, "text": "actually"}, {"end": 2904.04, "start": 2903.68, "text": "solve"}, {"end": 2904.52, "start": 2904.04, "text": "someone's"}, {"end": 2904.64, "start": 2904.52, "text": "problems."}, {"end": 2906.48, "start": 2904.64, "text": "And"}, {"end": 2906.84, "start": 2906.48, "text": "finally,"}, {"end": 2907.24, "start": 2906.84, "text": "machine"}, {"end": 2908.16, "start": 2907.24, "text": "learning"}, {"end": 2908.44, "start": 2908.16, "text": "is"}, {"end": 2908.68, "start": 2908.44, "text": "a"}, {"end": 2909.24, "start": 2908.68, "text": "great"}, {"end": 2909.56, "start": 2909.24, "text": "tool"}, {"end": 2909.96, "start": 2909.56, "text": "for"}], "text": " data management that occurred in many areas of computer science, but it's oriented toward the applications to science and industry. The statistics community, I believe, has an important role to play, but their importance is enhanced by focusing less on examining data and more on algorithms that actually solve someone's problems. 
And finally, machine learning is a great tool for"}, {"chunks": [{"end": 2910.24, "start": 2910.0, "text": "Many"}, {"end": 2910.56, "start": 2910.24, "text": "data"}, {"end": 2910.96, "start": 2910.56, "text": "science"}, {"end": 2912.04, "start": 2910.96, "text": "problems,"}, {"end": 2912.32, "start": 2912.04, "text": "but"}, {"end": 2912.56, "start": 2912.32, "text": "there"}, {"end": 2912.84, "start": 2912.56, "text": "are"}, {"end": 2913.36, "start": 2912.84, "text": "also"}, {"end": 2913.72, "start": 2913.36, "text": "many"}, {"end": 2914.24, "start": 2913.72, "text": "important"}, {"end": 2914.6, "start": 2914.24, "text": "ideas"}, {"end": 2914.6, "start": 2914.6, "text": "in"}, {"end": 2914.88, "start": 2914.6, "text": "data"}, {"end": 2915.92, "start": 2914.88, "text": "science"}, {"end": 2916.52, "start": 2915.92, "text": "that"}, {"end": 2916.88, "start": 2916.52, "text": "are"}, {"end": 2917.48, "start": 2916.88, "text": "not"}, {"end": 2917.84, "start": 2917.48, "text": "machine"}, {"end": 2918.6, "start": 2917.84, "text": "learning."}, {"end": 2919.28, "start": 2918.6, "text": "Anyway,"}, {"end": 2920.0, "start": 2919.28, "text": "thank"}, {"end": 2920.36, "start": 2920.0, "text": "you"}, {"end": 2920.72, "start": 2920.36, "text": "very"}, {"end": 2921.72, "start": 2920.72, "text": "much."}, {"end": 2921.92, "start": 2921.72, "text": "Thank"}, {"end": 2922.16, "start": 2921.92, "text": "you"}, {"end": 2922.4, "start": 2922.16, "text": "for"}, {"end": 2922.56, "start": 2922.4, "text": "helping"}, {"end": 2922.76, "start": 2922.56, "text": "us"}, {"end": 2923.24, "start": 2922.76, "text": "to"}, {"end": 2923.76, "start": 2923.24, "text": "enjoy"}, {"end": 2924.2, "start": 2923.76, "text": "one"}, {"end": 2924.4, "start": 2924.2, "text": "more"}, {"end": 2924.56, "start": 2924.4, "text": "time,"}, {"end": 2924.8, "start": 2924.56, "text": "I"}, {"end": 2925.28, "start": 2924.8, "text": "would"}, {"end": 2925.68, "start": 2925.28, "text": "say"}, {"end": 
2925.88, "start": 2925.68, "text": "some"}, {"end": 2926.04, "start": 2925.88, "text": "of"}, {"end": 2926.16, "start": 2926.04, "text": "the"}, {"end": 2926.52, "start": 2926.16, "text": "most"}, {"end": 2926.8, "start": 2926.52, "text": "beautiful"}, {"end": 2927.48, "start": 2926.8, "text": "gems"}, {"end": 2927.72, "start": 2927.48, "text": "in"}, {"end": 2928.6, "start": 2927.72, "text": "computer"}, {"end": 2929.0, "start": 2928.6, "text": "science"}, {"end": 2929.88, "start": 2929.0, "text": "algorithms."}, {"end": 2930.0, "start": 2929.88, "text": "I'm"}, {"end": 2930.76, "start": 2930.0, "text": "sure"}, {"end": 2930.96, "start": 2930.76, "text": "these"}, {"end": 2931.36, "start": 2930.96, "text": "two"}, {"end": 2931.8, "start": 2931.36, "text": "algorithms"}, {"end": 2932.2, "start": 2931.8, "text": "often"}, {"end": 2932.6, "start": 2932.2, "text": "appear"}, {"end": 2932.72, "start": 2932.6, "text": "in"}, {"end": 2933.08, "start": 2932.72, "text": "many"}, {"end": 2933.72, "start": 2933.08, "text": "PhD"}, {"end": 2934.52, "start": 2933.72, "text": "students'"}, {"end": 2935.52, "start": 2934.52, "text": "qualification"}, {"end": 2936.52, "start": 2935.52, "text": "exams."}, {"end": 2936.8, "start": 2936.52, "text": "Now,"}, {"end": 2937.52, "start": 2936.8, "text": "thank"}, {"end": 2937.76, "start": 2937.52, "text": "you"}, {"end": 2938.2, "start": 2937.76, "text": "for"}, {"end": 2938.68, "start": 2938.2, "text": "the"}, {"end": 2939.96, "start": 2938.68, "text": "talk."}], "text": " Many data science problems, but there are also many important ideas in data science that are not machine learning. Anyway, thank you very much. Thank you for helping us to enjoy one more time, I would say some of the most beautiful gems in computer science algorithms. I'm sure these two algorithms often appear in many PhD students' qualification exams. 
Now, thank you for the talk."}, {"chunks": [{"end": 2940.36, "start": 2940.0, "text": "have"}, {"end": 2940.8, "start": 2940.36, "text": "collected"}, {"end": 2941.12, "start": 2940.8, "text": "some"}, {"end": 2941.72, "start": 2941.12, "text": "questions"}, {"end": 2942.4, "start": 2941.72, "text": "for"}, {"end": 2942.76, "start": 2942.4, "text": "your"}, {"end": 2943.44, "start": 2942.76, "text": "talk."}, {"end": 2943.72, "start": 2943.44, "text": "So"}, {"end": 2943.92, "start": 2943.72, "text": "the"}, {"end": 2944.84, "start": 2943.92, "text": "first"}, {"end": 2945.32, "start": 2944.84, "text": "one,"}, {"end": 2945.92, "start": 2945.32, "text": "now"}, {"end": 2946.96, "start": 2945.92, "text": "you"}, {"end": 2947.28, "start": 2946.96, "text": "have"}, {"end": 2947.56, "start": 2947.28, "text": "helped"}, {"end": 2948.04, "start": 2947.56, "text": "us"}, {"end": 2948.16, "start": 2948.04, "text": "review"}, {"end": 2948.4, "start": 2948.16, "text": "some"}, {"end": 2948.56, "start": 2948.4, "text": "of"}, {"end": 2948.8, "start": 2948.56, "text": "the"}, {"end": 2949.4, "start": 2948.8, "text": "histories"}, {"end": 2949.48, "start": 2949.4, "text": "of"}, {"end": 2949.6, "start": 2949.48, "text": "the"}, {"end": 2950.28, "start": 2949.6, "text": "interaction"}, {"end": 2950.96, "start": 2950.28, "text": "between"}, {"end": 2951.16, "start": 2950.96, "text": "data"}, {"end": 2951.96, "start": 2951.16, "text": "science"}, {"end": 2952.12, "start": 2951.96, "text": "and"}, {"end": 2952.24, "start": 2952.12, "text": "the"}, {"end": 2952.56, "start": 2952.24, "text": "machine"}, {"end": 2952.92, "start": 2952.56, "text": "learning"}, {"end": 2953.24, "start": 2952.92, "text": "AI"}, {"end": 2954.16, "start": 2953.24, "text": "community."}, {"end": 2954.4, "start": 2954.16, "text": "So"}, {"end": 2954.6, "start": 2954.4, "text": "if"}, {"end": 2954.88, "start": 2954.6, "text": "we"}, {"end": 2955.0, "start": 2954.88, "text": "look"}, {"end": 2955.24, "start": 
2955.0, "text": "into"}, {"end": 2955.72, "start": 2955.24, "text": "the"}, {"end": 2956.08, "start": 2955.72, "text": "next"}, {"end": 2956.48, "start": 2956.08, "text": "five"}, {"end": 2957.04, "start": 2956.48, "text": "years,"}, {"end": 2957.28, "start": 2957.04, "text": "what"}, {"end": 2957.64, "start": 2957.28, "text": "would"}, {"end": 2957.8, "start": 2957.64, "text": "you"}, {"end": 2958.28, "start": 2957.8, "text": "say"}, {"end": 2958.96, "start": 2958.28, "text": "about"}, {"end": 2959.4, "start": 2958.96, "text": "the,"}, {"end": 2959.48, "start": 2959.4, "text": "you"}, {"end": 2960.0, "start": 2959.48, "text": "know,"}, {"end": 2960.32, "start": 2960.0, "text": "how"}, {"end": 2960.84, "start": 2960.32, "text": "data"}, {"end": 2961.28, "start": 2960.84, "text": "science"}, {"end": 2962.04, "start": 2961.28, "text": "would"}, {"end": 2962.28, "start": 2962.04, "text": "evolve"}, {"end": 2962.6, "start": 2962.28, "text": "together"}, {"end": 2963.0, "start": 2962.6, "text": "with"}, {"end": 2963.68, "start": 2963.0, "text": "machine"}, {"end": 2964.68, "start": 2963.68, "text": "learning"}, {"end": 2965.4, "start": 2964.68, "text": "and"}, {"end": 2966.92, "start": 2965.4, "text": "AI?"}, {"end": 2967.44, "start": 2966.92, "text": "Yeah."}, {"end": 2968.16, "start": 2967.44, "text": "Okay,"}, {"end": 2969.96, "start": 2968.16, "text": "well,"}], "text": " have collected some questions for your talk. So the first one, now you have helped us review some of the histories of the interaction between data science and the machine learning AI community. So if we look into the next five years, what would you say about the, you know, how data science would evolve together with machine learning and AI? Yeah. 
Okay, well,"}, {"chunks": [{"end": 2970.52, "start": 2970.0, "text": "You"}, {"end": 2971.92, "start": 2970.52, "text": "know,"}, {"end": 2972.36, "start": 2971.92, "text": "sort"}, {"end": 2972.36, "start": 2972.36, "text": "of"}, {"end": 2972.8, "start": 2972.36, "text": "at"}, {"end": 2973.56, "start": 2972.8, "text": "random,"}, {"end": 2973.88, "start": 2973.56, "text": "I"}, {"end": 2974.84, "start": 2973.88, "text": "had"}, {"end": 2975.48, "start": 2974.84, "text": "an"}, {"end": 2975.84, "start": 2975.48, "text": "email"}, {"end": 2975.96, "start": 2975.84, "text": "from"}, {"end": 2976.48, "start": 2975.96, "text": "somebody"}, {"end": 2976.84, "start": 2976.48, "text": "that"}, {"end": 2977.2, "start": 2976.84, "text": "I"}, {"end": 2977.28, "start": 2977.2, "text": "had"}, {"end": 2978.36, "start": 2977.28, "text": "corresponded"}, {"end": 2978.96, "start": 2978.36, "text": "with"}, {"end": 2979.4, "start": 2978.96, "text": "a"}, {"end": 2979.6, "start": 2979.4, "text": "couple"}, {"end": 2979.76, "start": 2979.6, "text": "of"}, {"end": 2979.96, "start": 2979.76, "text": "years"}, {"end": 2980.08, "start": 2979.96, "text": "ago,"}, {"end": 2980.76, "start": 2980.08, "text": "basically"}, {"end": 2982.84, "start": 2980.76, "text": "saying,"}, {"end": 2983.88, "start": 2982.84, "text": "machine"}, {"end": 2984.44, "start": 2983.88, "text": "learning"}, {"end": 2984.64, "start": 2984.44, "text": "is"}, {"end": 2984.92, "start": 2984.64, "text": "becoming"}, {"end": 2985.64, "start": 2984.92, "text": "canned."}, {"end": 2986.0, "start": 2985.64, "text": "You"}, {"end": 2986.6, "start": 2986.0, "text": "know,"}, {"end": 2986.76, "start": 2986.6, "text": "you"}, {"end": 2987.12, "start": 2986.76, "text": "have"}, {"end": 2987.44, "start": 2987.12, "text": "a"}, {"end": 2988.04, "start": 2987.44, "text": "problem,"}, {"end": 2988.2, "start": 2988.04, "text": "you"}, {"end": 2989.96, "start": 2988.2, "text": "just"}, {"end": 2991.08, "start": 2989.96, "text": 
"use"}, {"end": 2991.88, "start": 2991.08, "text": "some"}, {"end": 2995.76, "start": 2991.88, "text": "application"}, {"end": 2998.12, "start": 2995.76, "text": "and"}, {"end": 2999.12, "start": 2998.12, "text": "you"}, {"end": 2999.92, "start": 2999.12, "text": "solve"}, {"end": 2999.96, "start": 2999.92, "text": "it."}], "text": " You know, sort of at random, I had an email from somebody that I had corresponded with a couple of years ago, basically saying, machine learning is becoming canned. You know, you have a problem, you just use some application and you solve it."}, {"chunks": [{"end": 3000.64, "start": 3000.0, "text": "don't"}, {"end": 3001.04, "start": 3000.64, "text": "think"}, {"end": 3001.28, "start": 3001.04, "text": "that"}, {"end": 3001.64, "start": 3001.28, "text": "that's"}, {"end": 3002.48, "start": 3001.64, "text": "quite"}, {"end": 3002.6, "start": 3002.48, "text": "the"}, {"end": 3003.76, "start": 3002.6, "text": "case."}, {"end": 3004.48, "start": 3003.76, "text": "There"}, {"end": 3005.0, "start": 3004.48, "text": "is"}, {"end": 3005.52, "start": 3005.0, "text": "no"}, {"end": 3006.12, "start": 3005.52, "text": "question"}, {"end": 3006.32, "start": 3006.12, "text": "that"}, {"end": 3006.4, "start": 3006.32, "text": "there"}, {"end": 3007.0, "start": 3006.4, "text": "will"}, {"end": 3007.48, "start": 3007.0, "text": "be"}, {"end": 3007.56, "start": 3007.48, "text": "a"}, {"end": 3008.04, "start": 3007.56, "text": "lot"}, {"end": 3009.0, "start": 3008.04, "text": "more,"}, {"end": 3009.8, "start": 3009.0, "text": "let's"}, {"end": 3010.04, "start": 3009.8, "text": "say"}, {"end": 3010.88, "start": 3010.04, "text": "regularization."}, {"end": 3011.24, "start": 3010.88, "text": "A"}, {"end": 3011.84, "start": 3011.24, "text": "lot"}, {"end": 3012.12, "start": 3011.84, "text": "of"}, {"end": 3012.72, "start": 3012.12, "text": "things"}, {"end": 3013.12, "start": 3012.72, "text": "that"}, {"end": 3013.12, "start": 3013.12, "text": "are"}, 
{"end": 3013.48, "start": 3013.12, "text": "now"}, {"end": 3014.2, "start": 3013.48, "text": "interesting"}, {"end": 3014.2, "start": 3014.2, "text": "and"}, {"end": 3015.08, "start": 3014.2, "text": "complex"}, {"end": 3016.04, "start": 3015.08, "text": "will"}, {"end": 3016.6, "start": 3016.04, "text": "become"}, {"end": 3019.84, "start": 3016.6, "text": "routine."}, {"end": 3020.24, "start": 3019.84, "text": "I"}, {"end": 3021.12, "start": 3020.24, "text": "think"}, {"end": 3021.56, "start": 3021.12, "text": "I"}, {"end": 3021.76, "start": 3021.56, "text": "could"}, {"end": 3022.44, "start": 3021.76, "text": "see,"}, {"end": 3024.32, "start": 3022.44, "text": "well,"}, {"end": 3024.88, "start": 3024.32, "text": "I"}, {"end": 3026.28, "start": 3024.88, "text": "could"}, {"end": 3027.8, "start": 3026.28, "text": "see"}, {"end": 3029.32, "start": 3027.8, "text": "the"}, {"end": 3029.96, "start": 3029.32, "text": "data,"}], "text": " don't think that that's quite the case. There is no question that there will be a lot more, let's say regularization. A lot of things that are now interesting and complex will become routine. 
I think I could see, well, I could see the data,"}, {"chunks": [{"end": 3030.56, "start": 3030.0, "text": "Data"}, {"end": 3031.72, "start": 3030.56, "text": "preparation,"}, {"end": 3031.96, "start": 3031.72, "text": "that"}, {"end": 3032.64, "start": 3031.96, "text": "is"}, {"end": 3033.12, "start": 3032.64, "text": "using"}, {"end": 3033.12, "start": 3033.12, "text": "the,"}, {"end": 3033.12, "start": 3033.12, "text": "you"}, {"end": 3033.2, "start": 3033.12, "text": "know,"}, {"end": 3033.76, "start": 3033.2, "text": "finding"}, {"end": 3034.16, "start": 3033.76, "text": "the"}, {"end": 3035.48, "start": 3034.16, "text": "right"}, {"end": 3035.8, "start": 3035.48, "text": "data,"}, {"end": 3036.6, "start": 3035.8, "text": "I"}, {"end": 3036.88, "start": 3036.6, "text": "think"}, {"end": 3037.44, "start": 3036.88, "text": "that's"}, {"end": 3037.48, "start": 3037.44, "text": "still"}, {"end": 3037.52, "start": 3037.48, "text": "going"}, {"end": 3037.68, "start": 3037.52, "text": "to"}, {"end": 3037.8, "start": 3037.68, "text": "be"}, {"end": 3038.2, "start": 3037.8, "text": "a"}, {"end": 3039.88, "start": 3038.2, "text": "problem."}, {"end": 3041.84, "start": 3039.88, "text": "I"}, {"end": 3043.2, "start": 3041.84, "text": "suspect"}, {"end": 3046.12, "start": 3043.2, "text": "that"}, {"end": 3046.96, "start": 3046.12, "text": "there"}, {"end": 3047.52, "start": 3046.96, "text": "will"}, {"end": 3049.24, "start": 3047.52, "text": "be"}, {"end": 3049.72, "start": 3049.24, "text": "a"}, {"end": 3050.76, "start": 3049.72, "text": "number"}, {"end": 3051.08, "start": 3050.76, "text": "of"}, {"end": 3051.48, "start": 3051.08, "text": "more"}, {"end": 3052.04, "start": 3051.48, "text": "powerful"}, {"end": 3053.48, "start": 3052.04, "text": "programming"}, {"end": 3054.16, "start": 3053.48, "text": "tools"}, {"end": 3055.4, "start": 3054.16, "text": "around."}, {"end": 3055.88, "start": 3055.4, "text": "I"}, {"end": 3057.12, "start": 3055.88, "text": "can"}, {"end": 
3059.96, "start": 3057.12, "text": "see"}], "text": " Data preparation, that is using the, you know, finding the right data, I think that's still going to be a problem. I suspect that there will be a number of more powerful programming tools around. I can see"}, {"chunks": [{"end": 3061.0, "start": 3060.0, "text": "changes,"}, {"end": 3061.84, "start": 3061.0, "text": "improvements"}, {"end": 3062.24, "start": 3061.84, "text": "in"}, {"end": 3062.96, "start": 3062.24, "text": "hardware,"}, {"end": 3063.44, "start": 3062.96, "text": "for"}, {"end": 3064.68, "start": 3063.44, "text": "example,"}, {"end": 3065.24, "start": 3064.68, "text": "that"}, {"end": 3065.8, "start": 3065.24, "text": "will"}, {"end": 3066.36, "start": 3065.8, "text": "make"}, {"end": 3066.76, "start": 3066.36, "text": "a"}, {"end": 3068.88, "start": 3066.76, "text": "lot"}, {"end": 3069.44, "start": 3068.88, "text": "of"}, {"end": 3070.92, "start": 3069.44, "text": "machine"}, {"end": 3071.76, "start": 3070.92, "text": "learning"}, {"end": 3073.16, "start": 3071.76, "text": "apps"}, {"end": 3074.84, "start": 3073.16, "text": "run"}, {"end": 3075.32, "start": 3074.84, "text": "much"}, {"end": 3076.88, "start": 3075.32, "text": "faster."}, {"end": 3077.48, "start": 3076.88, "text": "It's"}, {"end": 3078.24, "start": 3077.48, "text": "not"}, {"end": 3078.64, "start": 3078.24, "text": "only"}, {"end": 3079.16, "start": 3078.64, "text": "graphics"}, {"end": 3080.0, "start": 3079.16, "text": "processing"}, {"end": 3080.76, "start": 3080.0, "text": "units,"}, {"end": 3081.48, "start": 3080.76, "text": "but"}, {"end": 3082.08, "start": 3081.48, "text": "I"}, {"end": 3082.6, "start": 3082.08, "text": "can"}, {"end": 3082.68, "start": 3082.6, "text": "see"}, {"end": 3082.76, "start": 3082.68, "text": "there"}, {"end": 3082.88, "start": 3082.76, "text": "are"}, {"end": 3082.88, "start": 3082.88, "text": "a"}, {"end": 3082.92, "start": 3082.88, "text": "number"}, {"end": 3083.12, "start": 3082.92, "text": 
"of"}, {"end": 3083.48, "start": 3083.12, "text": "things"}, {"end": 3083.76, "start": 3083.48, "text": "coming"}, {"end": 3083.88, "start": 3083.76, "text": "down"}, {"end": 3084.0, "start": 3083.88, "text": "the"}, {"end": 3084.44, "start": 3084.0, "text": "pike"}, {"end": 3084.68, "start": 3084.44, "text": "that"}, {"end": 3084.88, "start": 3084.68, "text": "I"}, {"end": 3085.0, "start": 3084.88, "text": "think"}, {"end": 3085.24, "start": 3085.0, "text": "in"}, {"end": 3085.56, "start": 3085.24, "text": "five"}, {"end": 3086.56, "start": 3085.56, "text": "years"}, {"end": 3087.48, "start": 3086.56, "text": "will"}, {"end": 3088.0, "start": 3087.48, "text": "be"}, {"end": 3089.24, "start": 3088.0, "text": "frequently"}, {"end": 3089.96, "start": 3089.24, "text": "used"}], "text": " changes, improvements in hardware, for example, that will make a lot of machine learning apps run much faster. It's not only graphics processing units, but I can see there are a number of things coming down the pike that I think in five years will be frequently used"}, {"chunks": [{"end": 3090.72, "start": 3090.0, "text": "in"}, {"end": 3092.44, "start": 3090.72, "text": "data"}, {"end": 3095.08, "start": 3092.44, "text": "science"}, {"end": 3099.6, "start": 3095.08, "text": "applications."}, {"end": 3100.32, "start": 3099.6, "text": "Thanks."}, {"end": 3100.72, "start": 3100.32, "text": "So,"}, {"end": 3101.6, "start": 3100.72, "text": "you"}, {"end": 3102.24, "start": 3101.6, "text": "know,"}, {"end": 3102.6, "start": 3102.24, "text": "KDD"}, {"end": 3103.16, "start": 3102.6, "text": "is"}, {"end": 3103.36, "start": 3103.16, "text": "a"}, {"end": 3103.88, "start": 3103.36, "text": "conference"}, {"end": 3104.08, "start": 3103.88, "text": "not"}, {"end": 3104.28, "start": 3104.08, "text": "just"}, {"end": 3104.6, "start": 3104.28, "text": "for"}, {"end": 3104.8, "start": 3104.6, "text": "academia."}, {"end": 3105.12, "start": 3104.8, "text": "We"}, {"end": 3105.52, "start": 3105.12, 
"text": "actually"}, {"end": 3105.84, "start": 3105.52, "text": "have"}, {"end": 3106.12, "start": 3105.84, "text": "at"}, {"end": 3106.36, "start": 3106.12, "text": "least"}, {"end": 3106.72, "start": 3106.36, "text": "half"}, {"end": 3107.0, "start": 3106.72, "text": "from"}, {"end": 3107.24, "start": 3107.0, "text": "the"}, {"end": 3108.12, "start": 3107.24, "text": "industry."}, {"end": 3108.44, "start": 3108.12, "text": "And"}, {"end": 3109.0, "start": 3108.44, "text": "what"}, {"end": 3109.28, "start": 3109.0, "text": "do"}, {"end": 3109.64, "start": 3109.28, "text": "you"}, {"end": 3110.24, "start": 3109.64, "text": "say"}, {"end": 3110.84, "start": 3110.24, "text": "in"}, {"end": 3111.08, "start": 3110.84, "text": "the"}, {"end": 3111.48, "start": 3111.08, "text": "future,"}, {"end": 3111.72, "start": 3111.48, "text": "what"}, {"end": 3112.04, "start": 3111.72, "text": "would"}, {"end": 3112.16, "start": 3112.04, "text": "be"}, {"end": 3112.32, "start": 3112.16, "text": "the"}, {"end": 3112.64, "start": 3112.32, "text": "best"}, {"end": 3112.92, "start": 3112.64, "text": "way"}, {"end": 3113.2, "start": 3112.92, "text": "that"}, {"end": 3113.8, "start": 3113.2, "text": "industry"}, {"end": 3114.16, "start": 3113.8, "text": "and"}, {"end": 3115.12, "start": 3114.16, "text": "academia"}, {"end": 3115.24, "start": 3115.12, "text": "in"}, {"end": 3115.48, "start": 3115.24, "text": "this"}, {"end": 3115.8, "start": 3115.48, "text": "KDD"}, {"end": 3116.16, "start": 3115.8, "text": "community"}, {"end": 3116.36, "start": 3116.16, "text": "could"}, {"end": 3116.84, "start": 3116.36, "text": "collaborate?"}, {"end": 3117.04, "start": 3116.84, "text": "And"}, {"end": 3117.88, "start": 3117.04, "text": "what"}, {"end": 3118.12, "start": 3117.88, "text": "would"}, {"end": 3118.24, "start": 3118.12, "text": "be"}, {"end": 3118.32, "start": 3118.24, "text": "the"}, {"end": 3118.8, "start": 3118.32, "text": "synergy"}, {"end": 3119.16, "start": 3118.8, "text": 
"between"}, {"end": 3119.28, "start": 3119.16, "text": "the"}, {"end": 3119.96, "start": 3119.28, "text": "two?"}], "text": " in data science applications. Thanks. So, you know, KDD is a conference not just for academia. We actually have at least half from the industry. And what do you say in the future, what would be the best way that industry and academia in this KDD community could collaborate? And what would be the synergy between the two?"}, {"chunks": [{"end": 3120.44, "start": 3120.0, "text": "you"}, {"end": 3121.12, "start": 3120.44, "text": "know,"}, {"end": 3122.08, "start": 3121.12, "text": "people"}, {"end": 3122.48, "start": 3122.08, "text": "for"}, {"end": 3122.68, "start": 3122.48, "text": "the"}, {"end": 3123.12, "start": 3122.68, "text": "best"}, {"end": 3124.64, "start": 3123.12, "text": "of"}, {"end": 3125.28, "start": 3124.64, "text": "data"}, {"end": 3125.72, "start": 3125.28, "text": "science?"}, {"end": 3125.92, "start": 3125.72, "text": "Oh,"}, {"end": 3127.44, "start": 3125.92, "text": "boy."}, {"end": 3127.92, "start": 3127.44, "text": "You"}, {"end": 3128.8, "start": 3127.92, "text": "know,"}, {"end": 3129.36, "start": 3128.8, "text": "it's,"}, {"end": 3130.64, "start": 3129.36, "text": "I"}, {"end": 3131.52, "start": 3130.64, "text": "think"}, {"end": 3132.0, "start": 3131.52, "text": "it's"}, {"end": 3132.72, "start": 3132.0, "text": "an"}, {"end": 3133.64, "start": 3132.72, "text": "interesting,"}, {"end": 3134.28, "start": 3133.64, "text": "well,"}, {"end": 3134.88, "start": 3134.28, "text": "it's"}, {"end": 3135.28, "start": 3134.88, "text": "an"}, {"end": 3136.28, "start": 3135.28, "text": "interesting"}, {"end": 3137.12, "start": 3136.28, "text": "competition."}, {"end": 3137.6, "start": 3137.12, "text": "There"}, {"end": 3139.16, "start": 3137.6, "text": "are"}, {"end": 3139.92, "start": 3139.16, "text": "obviously"}, {"end": 3141.56, "start": 3139.92, "text": "things"}, {"end": 3142.84, "start": 3141.56, "text": "that"}, {"end": 
3143.64, "start": 3142.84, "text": "industry"}, {"end": 3144.92, "start": 3143.64, "text": "can"}, {"end": 3145.52, "start": 3144.92, "text": "do."}, {"end": 3148.16, "start": 3145.52, "text": "Resources"}, {"end": 3148.6, "start": 3148.16, "text": "are"}, {"end": 3149.96, "start": 3148.6, "text": "enormous."}], "text": " you know, people for the best of data science? Oh, boy. You know, it's, I think it's an interesting, well, it's an interesting competition. There are obviously things that industry can do. Resources are enormous."}, {"chunks": [{"end": 3150.28, "start": 3150.0, "text": "not"}, {"end": 3150.52, "start": 3150.28, "text": "only"}, {"end": 3150.96, "start": 3150.52, "text": "financial"}, {"end": 3151.6, "start": 3150.96, "text": "resources,"}, {"end": 3152.08, "start": 3151.6, "text": "but"}, {"end": 3152.4, "start": 3152.08, "text": "often"}, {"end": 3153.04, "start": 3152.4, "text": "data"}, {"end": 3154.56, "start": 3153.04, "text": "resources."}, {"end": 3155.08, "start": 3154.56, "text": "If"}, {"end": 3155.72, "start": 3155.08, "text": "you"}, {"end": 3156.28, "start": 3155.72, "text": "were"}, {"end": 3157.44, "start": 3156.28, "text": "Google"}, {"end": 3157.88, "start": 3157.44, "text": "or"}, {"end": 3158.2, "start": 3157.88, "text": "Facebook"}, {"end": 3158.44, "start": 3158.2, "text": "or"}, {"end": 3159.0, "start": 3158.44, "text": "something,"}, {"end": 3159.36, "start": 3159.0, "text": "you"}, {"end": 3160.16, "start": 3159.36, "text": "just"}, {"end": 3162.24, "start": 3160.16, "text": "have"}, {"end": 3162.92, "start": 3162.24, "text": "access"}, {"end": 3163.68, "start": 3162.92, "text": "to"}, {"end": 3164.4, "start": 3163.68, "text": "data"}, {"end": 3165.76, "start": 3164.4, "text": "that"}, {"end": 3166.32, "start": 3165.76, "text": "nobody"}, {"end": 3166.8, "start": 3166.32, "text": "else"}, {"end": 3167.6, "start": 3166.8, "text": "has."}, {"end": 3167.96, "start": 3167.6, "text": "On"}, {"end": 3169.88, "start": 3167.96, 
"text": "the"}, {"end": 3170.24, "start": 3169.88, "text": "other"}, {"end": 3170.72, "start": 3170.24, "text": "hand,"}, {"end": 3171.24, "start": 3170.72, "text": "I"}, {"end": 3172.0, "start": 3171.24, "text": "have"}, {"end": 3172.24, "start": 3172.0, "text": "seen"}, {"end": 3172.32, "start": 3172.24, "text": "some"}, {"end": 3172.64, "start": 3172.32, "text": "pretty"}, {"end": 3172.68, "start": 3172.64, "text": "good"}, {"end": 3174.64, "start": 3172.68, "text": "things"}, {"end": 3175.48, "start": 3174.64, "text": "coming"}, {"end": 3176.28, "start": 3175.48, "text": "out"}, {"end": 3177.68, "start": 3176.28, "text": "of"}, {"end": 3179.36, "start": 3177.68, "text": "academia"}, {"end": 3179.96, "start": 3179.36, "text": "still."}], "text": " not only financial resources, but often data resources. If you were Google or Facebook or something, you just have access to data that nobody else has. On the other hand, I have seen some pretty good things coming out of academia still."}, {"chunks": [{"end": 3180.4, "start": 3180.0, "text": "So"}, {"end": 3180.72, "start": 3180.4, "text": "I'm"}, {"end": 3182.56, "start": 3180.72, "text": "optimistic"}, {"end": 3182.8, "start": 3182.56, "text": "that,"}, {"end": 3183.36, "start": 3182.8, "text": "you"}, {"end": 3184.4, "start": 3183.36, "text": "know,"}, {"end": 3184.88, "start": 3184.4, "text": "it'll"}, {"end": 3185.0, "start": 3184.88, "text": "be"}, {"end": 3186.96, "start": 3185.0, "text": "about"}, {"end": 3188.28, "start": 3186.96, "text": "half"}, {"end": 3189.24, "start": 3188.28, "text": "and"}, {"end": 3191.08, "start": 3189.24, "text": "half,"}, {"end": 3191.88, "start": 3191.08, "text": "I"}, {"end": 3192.96, "start": 3191.88, "text": "would"}, {"end": 3194.08, "start": 3192.96, "text": "say."}, {"end": 3194.48, "start": 3194.08, "text": "You"}, {"end": 3194.84, "start": 3194.48, "text": "know,"}, {"end": 3195.56, "start": 3194.84, "text": "a"}, {"end": 3196.12, "start": 3195.56, "text": "lot,"}, {"end": 
3197.68, "start": 3196.12, "text": "you"}, {"end": 3200.08, "start": 3197.68, "text": "know,"}, {"end": 3200.32, "start": 3200.08, "text": "if"}, {"end": 3200.64, "start": 3200.32, "text": "you"}, {"end": 3201.24, "start": 3200.64, "text": "look"}, {"end": 3203.28, "start": 3201.24, "text": "at,"}, {"end": 3203.92, "start": 3203.28, "text": "you"}, {"end": 3205.92, "start": 3203.92, "text": "know,"}, {"end": 3206.76, "start": 3205.92, "text": "where"}, {"end": 3208.16, "start": 3206.76, "text": "a"}, {"end": 3208.8, "start": 3208.16, "text": "lot"}, {"end": 3209.56, "start": 3208.8, "text": "of"}, {"end": 3209.8, "start": 3209.56, "text": "the"}, {"end": 3209.96, "start": 3209.8, "text": "basic"}], "text": " So I'm optimistic that, you know, it'll be about half and half, I would say. You know, a lot, you know, if you look at, you know, where a lot of the basic"}, {"chunks": [{"end": 3210.48, "start": 3210.0, "text": "machine"}, {"end": 3211.48, "start": 3210.48, "text": "learning"}, {"end": 3211.96, "start": 3211.48, "text": "ideas"}, {"end": 3212.24, "start": 3211.96, "text": "came"}, {"end": 3212.48, "start": 3212.24, "text": "from"}, {"end": 3212.48, "start": 3212.48, "text": "a"}, {"end": 3212.64, "start": 3212.48, "text": "lot"}, {"end": 3213.0, "start": 3212.64, "text": "of"}, {"end": 3213.48, "start": 3213.0, "text": "that"}, {"end": 3213.6, "start": 3213.48, "text": "a"}, {"end": 3213.96, "start": 3213.6, "text": "lot"}, {"end": 3214.2, "start": 3213.96, "text": "of"}, {"end": 3214.24, "start": 3214.2, "text": "it"}, {"end": 3214.4, "start": 3214.24, "text": "came"}, {"end": 3214.68, "start": 3214.4, "text": "from"}, {"end": 3215.12, "start": 3214.68, "text": "from"}, {"end": 3216.28, "start": 3215.12, "text": "uh"}, {"end": 3217.16, "start": 3216.28, "text": "from"}, {"end": 3217.68, "start": 3217.16, "text": "from"}, {"end": 3218.6, "start": 3217.68, "text": "academia"}, {"end": 3219.6, "start": 3218.6, "text": "especially"}, {"end": 3219.88, "start": 
3219.6, "text": "way"}, {"end": 3220.08, "start": 3219.88, "text": "way"}, {"end": 3220.4, "start": 3220.08, "text": "back"}, {"end": 3221.08, "start": 3220.4, "text": "not"}, {"end": 3221.36, "start": 3221.08, "text": "way"}, {"end": 3222.52, "start": 3221.36, "text": "back"}, {"end": 3222.88, "start": 3222.52, "text": "decade"}, {"end": 3223.2, "start": 3222.88, "text": "two"}, {"end": 3224.4, "start": 3223.2, "text": "decades"}, {"end": 3225.56, "start": 3224.4, "text": "all"}, {"end": 3225.88, "start": 3225.56, "text": "right"}, {"end": 3226.72, "start": 3225.88, "text": "thanks"}, {"end": 3227.32, "start": 3226.72, "text": "um"}, {"end": 3227.6, "start": 3227.32, "text": "yeah"}, {"end": 3228.04, "start": 3227.6, "text": "so"}, {"end": 3228.32, "start": 3228.04, "text": "um"}, {"end": 3228.6, "start": 3228.32, "text": "yeah"}, {"end": 3228.72, "start": 3228.6, "text": "i"}, {"end": 3229.0, "start": 3228.72, "text": "definitely"}, {"end": 3229.28, "start": 3229.0, "text": "agree"}, {"end": 3229.92, "start": 3229.28, "text": "so"}, {"end": 3230.16, "start": 3229.92, "text": "these"}, {"end": 3230.76, "start": 3230.16, "text": "days"}, {"end": 3230.8, "start": 3230.76, "text": "you"}, {"end": 3231.12, "start": 3230.8, "text": "know"}, {"end": 3231.52, "start": 3231.12, "text": "uh"}, {"end": 3231.8, "start": 3231.52, "text": "as"}, {"end": 3232.04, "start": 3231.8, "text": "a"}, {"end": 3232.8, "start": 3232.04, "text": "conference"}, {"end": 3233.0, "start": 3232.8, "text": "review"}, {"end": 3233.32, "start": 3233.0, "text": "you"}, {"end": 3233.68, "start": 3233.32, "text": "know"}, {"end": 3233.96, "start": 3233.68, "text": "we"}, {"end": 3234.32, "start": 3233.96, "text": "see"}, {"end": 3234.72, "start": 3234.32, "text": "that"}, {"end": 3235.44, "start": 3234.72, "text": "uh"}, {"end": 3235.84, "start": 3235.44, "text": "for"}, {"end": 3236.4, "start": 3235.84, "text": "example"}, {"end": 3236.56, "start": 3236.4, "text": "the"}, {"end": 3236.84, "start": 
3236.56, "text": "applied"}, {"end": 3237.04, "start": 3236.84, "text": "data"}, {"end": 3237.4, "start": 3237.04, "text": "science"}, {"end": 3237.64, "start": 3237.4, "text": "track"}, {"end": 3237.8, "start": 3237.64, "text": "and"}, {"end": 3238.12, "start": 3237.8, "text": "research"}, {"end": 3238.76, "start": 3238.12, "text": "track"}, {"end": 3238.92, "start": 3238.76, "text": "um"}, {"end": 3239.12, "start": 3238.92, "text": "it's"}, {"end": 3239.16, "start": 3239.12, "text": "an"}, {"end": 3239.64, "start": 3239.16, "text": "interesting"}, {"end": 3239.64, "start": 3239.64, "text": "you"}, {"end": 3239.92, "start": 3239.64, "text": "know"}, {"end": 3239.96, "start": 3239.92, "text": "in"}], "text": " machine learning ideas came from a lot of that a lot of it came from from uh from from academia especially way way back not way back decade two decades all right thanks um yeah so um yeah i definitely agree so these days you know uh as a conference review you know we see that uh for example the applied data science track and research track um it's an interesting you know in"}, {"chunks": [{"end": 3240.16, "start": 3240.0, "text": "to"}, {"end": 3240.56, "start": 3240.16, "text": "play"}, {"end": 3240.64, "start": 3240.56, "text": "between"}, {"end": 3240.64, "start": 3240.64, "text": "the"}, {"end": 3240.88, "start": 3240.64, "text": "two"}, {"end": 3241.68, "start": 3240.88, "text": "tracks."}, {"end": 3242.24, "start": 3241.68, "text": "Sometimes"}, {"end": 3242.72, "start": 3242.24, "text": "it's"}, {"end": 3243.2, "start": 3242.72, "text": "actually"}, {"end": 3243.44, "start": 3243.2, "text": "very"}, {"end": 3244.16, "start": 3243.44, "text": "hard"}, {"end": 3244.76, "start": 3244.16, "text": "for"}, {"end": 3244.96, "start": 3244.76, "text": "some"}, {"end": 3245.12, "start": 3244.96, "text": "of"}, {"end": 3245.4, "start": 3245.12, "text": "the"}, {"end": 3245.72, "start": 3245.4, "text": "reviewers"}, {"end": 3245.96, "start": 3245.72, "text": 
"to"}, {"end": 3246.28, "start": 3245.96, "text": "actually"}, {"end": 3246.76, "start": 3246.28, "text": "evaluate"}, {"end": 3246.96, "start": 3246.76, "text": "some"}, {"end": 3247.08, "start": 3246.96, "text": "of"}, {"end": 3247.24, "start": 3247.08, "text": "the"}, {"end": 3247.56, "start": 3247.24, "text": "work"}, {"end": 3248.44, "start": 3247.56, "text": "from,"}, {"end": 3248.64, "start": 3248.44, "text": "let's"}, {"end": 3249.08, "start": 3248.64, "text": "just"}, {"end": 3249.4, "start": 3249.08, "text": "say"}, {"end": 3249.64, "start": 3249.4, "text": "the"}, {"end": 3249.8, "start": 3249.64, "text": "applied"}, {"end": 3249.96, "start": 3249.8, "text": "data"}, {"end": 3250.28, "start": 3249.96, "text": "science"}, {"end": 3250.72, "start": 3250.28, "text": "track"}, {"end": 3251.48, "start": 3250.72, "text": "for"}, {"end": 3251.84, "start": 3251.48, "text": "many"}, {"end": 3252.0, "start": 3251.84, "text": "of"}, {"end": 3252.12, "start": 3252.0, "text": "the"}, {"end": 3252.44, "start": 3252.12, "text": "reasons"}, {"end": 3252.52, "start": 3252.44, "text": "you"}, {"end": 3252.84, "start": 3252.52, "text": "already"}, {"end": 3253.8, "start": 3252.84, "text": "mentioned."}, {"end": 3254.04, "start": 3253.8, "text": "But"}, {"end": 3254.2, "start": 3254.04, "text": "I"}, {"end": 3254.64, "start": 3254.2, "text": "think,"}, {"end": 3254.8, "start": 3254.64, "text": "yeah,"}, {"end": 3255.04, "start": 3254.8, "text": "going"}, {"end": 3255.8, "start": 3255.04, "text": "forward,"}, {"end": 3256.12, "start": 3255.8, "text": "it's"}, {"end": 3256.56, "start": 3256.12, "text": "still"}, {"end": 3256.92, "start": 3256.56, "text": "gonna"}, {"end": 3257.2, "start": 3256.92, "text": "be"}, {"end": 3258.12, "start": 3257.2, "text": "maybe"}, {"end": 3258.44, "start": 3258.12, "text": "half-half"}, {"end": 3259.2, "start": 3258.44, "text": "thing."}, {"end": 3259.48, "start": 3259.2, "text": "But"}, {"end": 3260.4, "start": 3259.48, "text": "KDD,"}, 
{"end": 3260.64, "start": 3260.4, "text": "I"}, {"end": 3260.88, "start": 3260.64, "text": "mean,"}, {"end": 3261.12, "start": 3260.88, "text": "as"}, {"end": 3261.12, "start": 3261.12, "text": "a"}, {"end": 3261.52, "start": 3261.12, "text": "KDD"}, {"end": 3262.2, "start": 3261.52, "text": "conference,"}, {"end": 3263.12, "start": 3262.2, "text": "how"}, {"end": 3263.16, "start": 3263.12, "text": "do"}, {"end": 3263.36, "start": 3263.16, "text": "you"}, {"end": 3263.6, "start": 3263.36, "text": "think"}, {"end": 3264.48, "start": 3263.6, "text": "KDD"}, {"end": 3264.84, "start": 3264.48, "text": "can"}, {"end": 3265.72, "start": 3264.84, "text": "strive"}, {"end": 3266.16, "start": 3265.72, "text": "to"}, {"end": 3266.8, "start": 3266.16, "text": "basically"}, {"end": 3267.4, "start": 3266.8, "text": "sustain"}, {"end": 3267.68, "start": 3267.4, "text": "its"}, {"end": 3268.64, "start": 3267.68, "text": "identity"}, {"end": 3269.0, "start": 3268.64, "text": "in"}, {"end": 3269.64, "start": 3269.0, "text": "today's"}, {"end": 3269.96, "start": 3269.64, "text": "trend"}], "text": " to play between the two tracks. Sometimes it's actually very hard for some of the reviewers to actually evaluate some of the work from, let's just say the applied data science track for many of the reasons you already mentioned. But I think, yeah, going forward, it's still gonna be maybe half-half thing. 
But KDD, I mean, as a KDD conference, how do you think KDD can strive to basically sustain its identity in today's trend"}, {"chunks": [{"end": 3270.76, "start": 3270.0, "text": "where"}, {"end": 3271.04, "start": 3270.76, "text": "all"}, {"end": 3271.24, "start": 3271.04, "text": "the"}, {"end": 3271.64, "start": 3271.24, "text": "conference"}, {"end": 3272.24, "start": 3271.64, "text": "include"}, {"end": 3272.8, "start": 3272.24, "text": "machine"}, {"end": 3273.04, "start": 3272.8, "text": "learning,"}, {"end": 3273.48, "start": 3273.04, "text": "AI,"}, {"end": 3273.88, "start": 3273.48, "text": "and"}, {"end": 3274.16, "start": 3273.88, "text": "there's"}, {"end": 3274.8, "start": 3274.16, "text": "lots"}, {"end": 3275.04, "start": 3274.8, "text": "of"}, {"end": 3275.4, "start": 3275.04, "text": "things"}, {"end": 3276.12, "start": 3275.4, "text": "together."}, {"end": 3276.48, "start": 3276.12, "text": "What"}, {"end": 3277.36, "start": 3276.48, "text": "can"}, {"end": 3278.32, "start": 3277.36, "text": "KDD"}, {"end": 3278.52, "start": 3278.32, "text": "do"}, {"end": 3278.84, "start": 3278.52, "text": "to"}, {"end": 3279.32, "start": 3278.84, "text": "sustain"}, {"end": 3279.6, "start": 3279.32, "text": "its"}, {"end": 3281.68, "start": 3279.6, "text": "identity?"}, {"end": 3282.08, "start": 3281.68, "text": "Well,"}, {"end": 3283.0, "start": 3282.08, "text": "I"}, {"end": 3283.24, "start": 3283.0, "text": "mean,"}, {"end": 3284.08, "start": 3283.24, "text": "that's"}, {"end": 3284.64, "start": 3284.08, "text": "an"}, {"end": 3285.28, "start": 3284.64, "text": "interesting"}, {"end": 3286.16, "start": 3285.28, "text": "question."}, {"end": 3286.96, "start": 3286.16, "text": "Being"}, {"end": 3287.4, "start": 3286.96, "text": "a"}, {"end": 3288.0, "start": 3287.4, "text": "real"}, {"end": 3288.76, "start": 3288.0, "text": "old"}, {"end": 3289.28, "start": 3288.76, "text": "guy,"}, {"end": 3289.76, "start": 3289.28, "text": "I"}, {"end": 3290.12, "start": 
3289.76, "text": "mean,"}, {"end": 3290.56, "start": 3290.12, "text": "I"}, {"end": 3291.96, "start": 3290.56, "text": "remember,"}, {"end": 3292.44, "start": 3291.96, "text": "for"}, {"end": 3293.08, "start": 3292.44, "text": "example,"}, {"end": 3293.52, "start": 3293.08, "text": "being"}, {"end": 3294.12, "start": 3293.52, "text": "involved"}, {"end": 3294.84, "start": 3294.12, "text": "in"}, {"end": 3295.28, "start": 3294.84, "text": "the"}, {"end": 3295.72, "start": 3295.28, "text": "theory"}, {"end": 3296.96, "start": 3295.72, "text": "conferences,"}, {"end": 3297.32, "start": 3296.96, "text": "the"}, {"end": 3298.72, "start": 3297.32, "text": "SIGACT"}, {"end": 3299.44, "start": 3298.72, "text": "and"}, {"end": 3299.96, "start": 3299.44, "text": "FOCS."}], "text": " where all the conference include machine learning, AI, and there's lots of things together. What can KDD do to sustain its identity? Well, I mean, that's an interesting question. 
Being a real old guy, I mean, I remember, for example, being involved in the theory conferences, the SIGACT and FOCS."}, {"chunks": [{"end": 3300.76, "start": 3300.0, "text": "what"}, {"end": 3302.0, "start": 3300.76, "text": "happens"}, {"end": 3302.48, "start": 3302.0, "text": "in"}, {"end": 3302.88, "start": 3302.48, "text": "these"}, {"end": 3303.52, "start": 3302.88, "text": "conferences,"}, {"end": 3303.84, "start": 3303.52, "text": "it"}, {"end": 3304.24, "start": 3303.84, "text": "just"}, {"end": 3304.8, "start": 3304.24, "text": "evolves."}, {"end": 3305.64, "start": 3304.8, "text": "People"}, {"end": 3306.88, "start": 3305.64, "text": "send"}, {"end": 3307.16, "start": 3306.88, "text": "in"}, {"end": 3307.84, "start": 3307.16, "text": "papers"}, {"end": 3308.32, "start": 3307.84, "text": "that"}, {"end": 3309.76, "start": 3308.32, "text": "are"}, {"end": 3311.04, "start": 3309.76, "text": "interesting."}, {"end": 3312.76, "start": 3311.04, "text": "They"}, {"end": 3313.08, "start": 3312.76, "text": "may"}, {"end": 3313.72, "start": 3313.08, "text": "be"}, {"end": 3314.56, "start": 3313.72, "text": "somewhat"}, {"end": 3315.04, "start": 3314.56, "text": "outside"}, {"end": 3316.12, "start": 3315.04, "text": "the"}, {"end": 3316.8, "start": 3316.12, "text": "scope"}, {"end": 3318.68, "start": 3316.8, "text": "or"}, {"end": 3319.68, "start": 3318.68, "text": "mainstream"}, {"end": 3319.88, "start": 3319.68, "text": "of"}, {"end": 3320.2, "start": 3319.88, "text": "the"}, {"end": 3321.28, "start": 3320.2, "text": "conference."}, {"end": 3321.96, "start": 3321.28, "text": "And"}, {"end": 3322.44, "start": 3321.96, "text": "if"}, {"end": 3323.04, "start": 3322.44, "text": "you"}, {"end": 3323.4, "start": 3323.04, "text": "look"}, {"end": 3324.12, "start": 3323.4, "text": "through"}, {"end": 3324.56, "start": 3324.12, "text": "the"}, {"end": 3325.24, "start": 3324.56, "text": "years,"}, {"end": 3326.08, "start": 3325.24, "text": "like"}, {"end": 3326.64, 
"start": 3326.08, "text": "every"}, {"end": 3327.36, "start": 3326.64, "text": "10"}, {"end": 3327.96, "start": 3327.36, "text": "years,"}, {"end": 3328.36, "start": 3327.96, "text": "the"}, {"end": 3328.76, "start": 3328.36, "text": "main"}, {"end": 3329.04, "start": 3328.76, "text": "topics"}, {"end": 3329.08, "start": 3329.04, "text": "are"}, {"end": 3329.64, "start": 3329.08, "text": "completely"}, {"end": 3329.96, "start": 3329.64, "text": "different."}], "text": " what happens in these conferences, it just evolves. People send in papers that are interesting. They may be somewhat outside the scope or mainstream of the conference. And if you look through the years, like every 10 years, the main topics are completely different."}, {"chunks": [{"end": 3331.04, "start": 3330.0, "text": "different"}, {"end": 3331.28, "start": 3331.04, "text": "and"}, {"end": 3331.6, "start": 3331.28, "text": "so"}, {"end": 3331.6, "start": 3331.6, "text": "i"}, {"end": 3331.6, "start": 3331.6, "text": "i"}, {"end": 3331.6, "start": 3331.6, "text": "i"}, {"end": 3331.6, "start": 3331.6, "text": "would"}, {"end": 3331.84, "start": 3331.6, "text": "i"}, {"end": 3332.0, "start": 3331.84, "text": "just"}, {"end": 3332.08, "start": 3332.0, "text": "i"}, {"end": 3332.64, "start": 3332.08, "text": "just"}, {"end": 3332.68, "start": 3332.64, "text": "wouldn't"}, {"end": 3332.88, "start": 3332.68, "text": "worry"}, {"end": 3333.52, "start": 3332.88, "text": "about"}, {"end": 3333.88, "start": 3333.52, "text": "it"}, {"end": 3333.96, "start": 3333.88, "text": "uh"}, {"end": 3334.24, "start": 3333.96, "text": "as"}, {"end": 3334.64, "start": 3334.24, "text": "long"}, {"end": 3334.88, "start": 3334.64, "text": "as"}, {"end": 3336.64, "start": 3334.88, "text": "people"}, {"end": 3336.88, "start": 3336.64, "text": "are"}, {"end": 3337.16, "start": 3336.88, "text": "showing"}, {"end": 3337.44, "start": 3337.16, "text": "up"}, {"end": 3337.92, "start": 3337.44, "text": "for"}, {"end": 3338.08, 
"start": 3337.92, "text": "the"}, {"end": 3338.72, "start": 3338.08, "text": "conference"}, {"end": 3338.88, "start": 3338.72, "text": "or"}, {"end": 3339.0, "start": 3338.88, "text": "in"}, {"end": 3340.0, "start": 3339.0, "text": "this"}, {"end": 3341.44, "start": 3340.0, "text": "case"}, {"end": 3341.84, "start": 3341.44, "text": "uh"}, {"end": 3342.4, "start": 3341.84, "text": "tuning"}, {"end": 3342.68, "start": 3342.4, "text": "in"}, {"end": 3343.04, "start": 3342.68, "text": "online"}, {"end": 3344.24, "start": 3343.04, "text": "uh"}, {"end": 3344.84, "start": 3344.24, "text": "and"}, {"end": 3345.36, "start": 3344.84, "text": "they're"}, {"end": 3345.72, "start": 3345.36, "text": "submitting"}, {"end": 3346.24, "start": 3345.72, "text": "papers"}, {"end": 3346.68, "start": 3346.24, "text": "which"}, {"end": 3346.84, "start": 3346.68, "text": "i"}, {"end": 3347.0, "start": 3346.84, "text": "know"}, {"end": 3347.32, "start": 3347.0, "text": "they"}, {"end": 3347.72, "start": 3347.32, "text": "they"}, {"end": 3348.4, "start": 3347.72, "text": "are"}, {"end": 3349.12, "start": 3348.4, "text": "uh"}, {"end": 3349.4, "start": 3349.12, "text": "in"}, {"end": 3349.64, "start": 3349.4, "text": "great"}, {"end": 3350.2, "start": 3349.64, "text": "numbers"}, {"end": 3351.2, "start": 3350.2, "text": "uh"}, {"end": 3351.76, "start": 3351.2, "text": "i"}, {"end": 3352.4, "start": 3351.76, "text": "i"}, {"end": 3352.72, "start": 3352.4, "text": "wouldn't"}, {"end": 3353.32, "start": 3352.72, "text": "worry"}, {"end": 3354.16, "start": 3353.32, "text": "about"}, {"end": 3354.16, "start": 3354.16, "text": "it"}, {"end": 3354.2, "start": 3354.16, "text": "at"}, {"end": 3354.28, "start": 3354.2, "text": "all"}, {"end": 3355.48, "start": 3354.28, "text": "just"}, {"end": 3355.84, "start": 3355.48, "text": "just"}, {"end": 3355.96, "start": 3355.84, "text": "let"}, {"end": 3356.56, "start": 3355.96, "text": "it"}, {"end": 3357.96, "start": 3356.56, "text": "happen"}, {"end": 
3358.24, "start": 3357.96, "text": "great"}, {"end": 3358.4, "start": 3358.24, "text": "yeah"}, {"end": 3358.68, "start": 3358.4, "text": "so"}, {"end": 3359.16, "start": 3358.68, "text": "as"}, {"end": 3359.2, "start": 3359.16, "text": "long"}, {"end": 3359.6, "start": 3359.2, "text": "as"}, {"end": 3359.84, "start": 3359.6, "text": "we"}, {"end": 3359.84, "start": 3359.84, "text": "can"}, {"end": 3359.96, "start": 3359.84, "text": "try"}], "text": " different and so i i i would i just i just wouldn't worry about it uh as long as people are showing up for the conference or in this case uh tuning in online uh and they're submitting papers which i know they they are uh in great numbers uh i i wouldn't worry about it at all just just let it happen great yeah so as long as we can try"}, {"chunks": [{"end": 3360.44, "start": 3360.0, "text": "people"}, {"end": 3360.88, "start": 3360.44, "text": "to"}, {"end": 3361.12, "start": 3360.88, "text": "these"}, {"end": 3361.44, "start": 3361.12, "text": "conferences"}, {"end": 3362.12, "start": 3361.44, "text": "and"}, {"end": 3362.48, "start": 3362.12, "text": "you"}, {"end": 3362.88, "start": 3362.48, "text": "know"}, {"end": 3363.2, "start": 3362.88, "text": "as"}, {"end": 3363.48, "start": 3363.2, "text": "online"}, {"end": 3363.96, "start": 3363.48, "text": "ones"}, {"end": 3364.28, "start": 3363.96, "text": "are"}, {"end": 3364.6, "start": 3364.28, "text": "not"}, {"end": 3364.84, "start": 3364.6, "text": "easy"}, {"end": 3365.08, "start": 3364.84, "text": "as"}, {"end": 3365.08, "start": 3365.08, "text": "we"}, {"end": 3365.24, "start": 3365.08, "text": "can"}, {"end": 3365.48, "start": 3365.24, "text": "see"}, {"end": 3365.72, "start": 3365.48, "text": "from"}, {"end": 3365.88, "start": 3365.72, "text": "the"}, {"end": 3366.28, "start": 3365.88, "text": "last"}, {"end": 3366.72, "start": 3366.28, "text": "year"}, {"end": 3366.96, "start": 3366.72, "text": "um"}, {"end": 3367.4, "start": 3366.96, "text": "all"}, {"end": 
3367.64, "start": 3367.4, "text": "right"}, {"end": 3367.64, "start": 3367.64, "text": "good"}, {"end": 3367.92, "start": 3367.64, "text": "so"}, {"end": 3368.08, "start": 3367.92, "text": "the"}, {"end": 3368.56, "start": 3368.08, "text": "next"}, {"end": 3368.88, "start": 3368.56, "text": "question"}, {"end": 3369.16, "start": 3368.88, "text": "we"}, {"end": 3369.4, "start": 3369.16, "text": "have"}, {"end": 3369.72, "start": 3369.4, "text": "is"}, {"end": 3369.8, "start": 3369.72, "text": "um"}, {"end": 3370.44, "start": 3369.8, "text": "you"}, {"end": 3370.88, "start": 3370.44, "text": "know"}, {"end": 3371.44, "start": 3370.88, "text": "recent"}, {"end": 3372.04, "start": 3371.44, "text": "years"}, {"end": 3372.52, "start": 3372.04, "text": "there"}, {"end": 3372.68, "start": 3372.52, "text": "is"}, {"end": 3373.24, "start": 3372.68, "text": "this"}, {"end": 3373.4, "start": 3373.24, "text": "growing"}, {"end": 3373.96, "start": 3373.4, "text": "trend"}, {"end": 3374.36, "start": 3373.96, "text": "of"}, {"end": 3374.52, "start": 3374.36, "text": "uh"}, {"end": 3374.76, "start": 3374.52, "text": "this"}, {"end": 3375.12, "start": 3374.76, "text": "open"}, {"end": 3375.96, "start": 3375.12, "text": "source"}, {"end": 3376.36, "start": 3375.96, "text": "and"}, {"end": 3376.84, "start": 3376.36, "text": "this"}, {"end": 3377.04, "start": 3376.84, "text": "uh"}, {"end": 3377.12, "start": 3377.04, "text": "you"}, {"end": 3377.44, "start": 3377.12, "text": "know"}, {"end": 3377.6, "start": 3377.44, "text": "uh"}, {"end": 3377.8, "start": 3377.6, "text": "this"}, {"end": 3378.4, "start": 3377.8, "text": "movement"}, {"end": 3378.6, "start": 3378.4, "text": "and"}, {"end": 3378.84, "start": 3378.6, "text": "we"}, {"end": 3378.92, "start": 3378.84, "text": "see"}, {"end": 3379.48, "start": 3378.92, "text": "a"}, {"end": 3380.0, "start": 3379.48, "text": "lot"}, {"end": 3380.44, "start": 3380.0, "text": "of"}, {"end": 3380.52, "start": 3380.44, "text": "uh"}, {"end": 
3381.24, "start": 3380.52, "text": "companies"}, {"end": 3381.48, "start": 3381.24, "text": "and"}, {"end": 3381.96, "start": 3381.48, "text": "and"}, {"end": 3382.2, "start": 3381.96, "text": "and"}, {"end": 3382.84, "start": 3382.2, "text": "and"}, {"end": 3382.92, "start": 3382.84, "text": "and"}, {"end": 3383.08, "start": 3382.92, "text": "people"}, {"end": 3383.52, "start": 3383.08, "text": "in"}, {"end": 3384.32, "start": 3383.52, "text": "academia"}, {"end": 3384.56, "start": 3384.32, "text": "open"}, {"end": 3385.2, "start": 3384.56, "text": "source"}, {"end": 3385.44, "start": 3385.2, "text": "their"}, {"end": 3385.68, "start": 3385.44, "text": "stuff"}, {"end": 3385.88, "start": 3385.68, "text": "how"}, {"end": 3385.88, "start": 3385.88, "text": "do"}, {"end": 3386.0, "start": 3385.88, "text": "you"}, {"end": 3386.2, "start": 3386.0, "text": "think"}, {"end": 3386.56, "start": 3386.2, "text": "this"}, {"end": 3386.68, "start": 3386.56, "text": "open"}, {"end": 3387.12, "start": 3386.68, "text": "source"}, {"end": 3387.36, "start": 3387.12, "text": "this"}, {"end": 3387.52, "start": 3387.36, "text": "whole"}, {"end": 3387.6, "start": 3387.52, "text": "thing"}, {"end": 3387.88, "start": 3387.6, "text": "is"}, {"end": 3388.24, "start": 3387.88, "text": "going"}, {"end": 3388.28, "start": 3388.24, "text": "to"}, {"end": 3389.0, "start": 3388.28, "text": "affect"}, {"end": 3389.16, "start": 3389.0, "text": "uh"}, {"end": 3389.36, "start": 3389.16, "text": "data"}, {"end": 3389.76, "start": 3389.36, "text": "science"}, {"end": 3389.96, "start": 3389.76, "text": "community"}], "text": " people to these conferences and you know as online ones are not easy as we can see from the last year um all right good so the next question we have is um you know recent years there is this growing trend of uh this open source and this uh you know uh this movement and we see a lot of uh companies and and and and and people in academia open source their stuff how do you think 
this open source this whole thing is going to affect uh data science community"}, {"chunks": [{"end": 3390.56, "start": 3390.0, "text": "as"}, {"end": 3392.48, "start": 3390.56, "text": "a"}, {"end": 3392.96, "start": 3392.48, "text": "whole?"}, {"end": 3393.36, "start": 3392.96, "text": "Well,"}, {"end": 3393.84, "start": 3393.36, "text": "I"}, {"end": 3394.4, "start": 3393.84, "text": "think"}, {"end": 3395.04, "start": 3394.4, "text": "it's"}, {"end": 3395.24, "start": 3395.04, "text": "the,"}, {"end": 3395.32, "start": 3395.24, "text": "you"}, {"end": 3395.36, "start": 3395.32, "text": "know,"}, {"end": 3395.48, "start": 3395.36, "text": "I"}, {"end": 3396.0, "start": 3395.48, "text": "think"}, {"end": 3396.68, "start": 3396.0, "text": "open"}, {"end": 3397.72, "start": 3396.68, "text": "source"}, {"end": 3398.52, "start": 3397.72, "text": "benefits"}, {"end": 3399.64, "start": 3398.52, "text": "every"}, {"end": 3400.4, "start": 3399.64, "text": "field,"}, {"end": 3401.68, "start": 3400.4, "text": "not"}, {"end": 3402.48, "start": 3401.68, "text": "only"}, {"end": 3403.0, "start": 3402.48, "text": "data"}, {"end": 3403.44, "start": 3403.0, "text": "science."}, {"end": 3403.8, "start": 3403.44, "text": "I"}, {"end": 3404.28, "start": 3403.8, "text": "know"}, {"end": 3405.76, "start": 3404.28, "text": "there"}, {"end": 3406.08, "start": 3405.76, "text": "are"}, {"end": 3406.8, "start": 3406.08, "text": "a"}, {"end": 3407.6, "start": 3406.8, "text": "lot"}, {"end": 3409.56, "start": 3407.6, "text": "of"}, {"end": 3410.2, "start": 3409.56, "text": "free"}, {"end": 3410.76, "start": 3410.2, "text": "components"}, {"end": 3411.4, "start": 3410.76, "text": "available"}, {"end": 3412.16, "start": 3411.4, "text": "that"}, {"end": 3412.76, "start": 3412.16, "text": "data"}, {"end": 3414.36, "start": 3412.76, "text": "scientists"}, {"end": 3414.84, "start": 3414.36, "text": "can"}, {"end": 3415.48, "start": 3414.84, "text": "use."}, {"end": 3416.08, "start": 3415.48, 
"text": "And,"}, {"end": 3417.24, "start": 3416.08, "text": "you"}, {"end": 3417.68, "start": 3417.24, "text": "know,"}, {"end": 3418.24, "start": 3417.68, "text": "I"}, {"end": 3418.96, "start": 3418.24, "text": "think"}, {"end": 3419.12, "start": 3418.96, "text": "it's"}, {"end": 3419.96, "start": 3419.12, "text": "basically"}], "text": " as a whole? Well, I think it's the, you know, I think open source benefits every field, not only data science. I know there are a lot of free components available that data scientists can use. And, you know, I think it's basically"}, {"chunks": [{"end": 3420.96, "start": 3420.0, "text": "It's"}, {"end": 3421.04, "start": 3420.96, "text": "a"}, {"end": 3421.56, "start": 3421.04, "text": "great"}, {"end": 3422.36, "start": 3421.56, "text": "thing."}, {"end": 3422.92, "start": 3422.36, "text": "I"}, {"end": 3423.4, "start": 3422.92, "text": "wish"}, {"end": 3424.0, "start": 3423.4, "text": "I"}, {"end": 3424.8, "start": 3424.0, "text": "had"}, {"end": 3425.0, "start": 3424.8, "text": "a,"}, {"end": 3425.52, "start": 3425.0, "text": "I"}, {"end": 3426.16, "start": 3425.52, "text": "could"}, {"end": 3426.52, "start": 3426.16, "text": "say"}, {"end": 3427.28, "start": 3426.52, "text": "thing,"}, {"end": 3428.96, "start": 3427.28, "text": "progress"}, {"end": 3429.48, "start": 3428.96, "text": "goes"}, {"end": 3431.16, "start": 3429.48, "text": "20%"}, {"end": 3431.64, "start": 3431.16, "text": "faster"}, {"end": 3432.4, "start": 3431.64, "text": "because"}, {"end": 3432.8, "start": 3432.4, "text": "open"}, {"end": 3433.76, "start": 3432.8, "text": "source"}, {"end": 3434.84, "start": 3433.76, "text": "products."}, {"end": 3434.84, "start": 3434.84, "text": "I"}, {"end": 3435.44, "start": 3434.84, "text": "can't"}, {"end": 3435.72, "start": 3435.44, "text": "say"}, {"end": 3436.44, "start": 3435.72, "text": "that,"}, {"end": 3436.64, "start": 3436.44, "text": "but"}, {"end": 3436.88, "start": 3436.64, "text": "my"}, {"end": 3437.44, 
"start": 3436.88, "text": "intuition"}, {"end": 3437.92, "start": 3437.44, "text": "is"}, {"end": 3438.48, "start": 3437.92, "text": "that"}, {"end": 3439.64, "start": 3438.48, "text": "it's"}, {"end": 3439.84, "start": 3439.64, "text": "a"}, {"end": 3440.32, "start": 3439.84, "text": "really"}, {"end": 3440.76, "start": 3440.32, "text": "good"}, {"end": 3440.96, "start": 3440.76, "text": "thing."}, {"end": 3441.84, "start": 3440.96, "text": "And"}, {"end": 3442.8, "start": 3441.84, "text": "if"}, {"end": 3443.52, "start": 3442.8, "text": "you"}, {"end": 3443.88, "start": 3443.52, "text": "can"}, {"end": 3444.44, "start": 3443.88, "text": "contribute"}, {"end": 3445.0, "start": 3444.44, "text": "to"}, {"end": 3446.0, "start": 3445.0, "text": "it,"}, {"end": 3446.32, "start": 3446.0, "text": "I"}, {"end": 3448.8, "start": 3446.32, "text": "think"}, {"end": 3448.92, "start": 3448.8, "text": "you"}, {"end": 3449.36, "start": 3448.92, "text": "should."}, {"end": 3449.72, "start": 3449.36, "text": "Thank"}, {"end": 3449.96, "start": 3449.72, "text": "you."}], "text": " It's a great thing. I wish I had a, I could say thing, progress goes 20% faster because open source products. I can't say that, but my intuition is that it's a really good thing. And if you can contribute to it, I think you should. 
Thank you."}, {"chunks": [{"end": 3450.48, "start": 3450.0, "text": "One"}, {"end": 3450.76, "start": 3450.48, "text": "last"}, {"end": 3451.04, "start": 3450.76, "text": "question"}, {"end": 3451.2, "start": 3451.04, "text": "we"}, {"end": 3451.36, "start": 3451.2, "text": "have"}, {"end": 3451.6, "start": 3451.36, "text": "is,"}, {"end": 3451.6, "start": 3451.6, "text": "you"}, {"end": 3451.8, "start": 3451.6, "text": "know,"}, {"end": 3453.0, "start": 3451.8, "text": "since"}, {"end": 3453.24, "start": 3453.0, "text": "you've"}, {"end": 3453.6, "start": 3453.24, "text": "been"}, {"end": 3453.96, "start": 3453.6, "text": "around"}, {"end": 3454.0, "start": 3453.96, "text": "the"}, {"end": 3454.68, "start": 3454.0, "text": "field"}, {"end": 3455.12, "start": 3454.68, "text": "for"}, {"end": 3455.6, "start": 3455.12, "text": "so"}, {"end": 3456.0, "start": 3455.6, "text": "long"}, {"end": 3456.2, "start": 3456.0, "text": "and"}, {"end": 3456.48, "start": 3456.2, "text": "you've"}, {"end": 3456.68, "start": 3456.48, "text": "got,"}, {"end": 3456.92, "start": 3456.68, "text": "you"}, {"end": 3457.0, "start": 3456.92, "text": "know,"}, {"end": 3457.6, "start": 3457.0, "text": "almost"}, {"end": 3457.76, "start": 3457.6, "text": "all"}, {"end": 3458.44, "start": 3457.76, "text": "the"}, {"end": 3459.04, "start": 3458.44, "text": "prestigious"}, {"end": 3459.28, "start": 3459.04, "text": "awards,"}, {"end": 3459.6, "start": 3459.28, "text": "so"}, {"end": 3459.8, "start": 3459.6, "text": "if"}, {"end": 3460.24, "start": 3459.8, "text": "you"}, {"end": 3460.6, "start": 3460.24, "text": "have"}, {"end": 3460.8, "start": 3460.6, "text": "some"}, {"end": 3461.28, "start": 3460.8, "text": "advice"}, {"end": 3461.48, "start": 3461.28, "text": "for"}, {"end": 3461.6, "start": 3461.48, "text": "the"}, {"end": 3462.6, "start": 3461.6, "text": "coming"}, {"end": 3463.76, "start": 3462.6, "text": "researchers"}, {"end": 3463.88, "start": 3463.76, "text": "and"}, {"end": 3464.52, 
"start": 3463.88, "text": "practitioners"}, {"end": 3464.88, "start": 3464.52, "text": "field,"}, {"end": 3465.12, "start": 3464.88, "text": "those"}, {"end": 3465.32, "start": 3465.12, "text": "young"}, {"end": 3466.2, "start": 3465.32, "text": "students,"}, {"end": 3466.72, "start": 3466.2, "text": "what"}, {"end": 3466.96, "start": 3466.72, "text": "would"}, {"end": 3467.12, "start": 3466.96, "text": "you"}, {"end": 3467.64, "start": 3467.12, "text": "suggest"}, {"end": 3467.92, "start": 3467.64, "text": "them"}, {"end": 3467.96, "start": 3467.92, "text": "if"}, {"end": 3468.24, "start": 3467.96, "text": "they"}, {"end": 3468.52, "start": 3468.24, "text": "want"}, {"end": 3469.2, "start": 3468.52, "text": "to"}, {"end": 3469.48, "start": 3469.2, "text": "achieve"}, {"end": 3469.72, "start": 3469.48, "text": "anything"}, {"end": 3469.92, "start": 3469.72, "text": "in"}, {"end": 3470.0, "start": 3469.92, "text": "the"}, {"end": 3470.16, "start": 3470.0, "text": "data"}, {"end": 3470.48, "start": 3470.16, "text": "science"}, {"end": 3470.76, "start": 3470.48, "text": "community"}, {"end": 3471.0, "start": 3470.76, "text": "in"}, {"end": 3471.44, "start": 3471.0, "text": "the"}, {"end": 3472.52, "start": 3471.44, "text": "future?"}, {"end": 3472.88, "start": 3472.52, "text": "Oh,"}, {"end": 3473.52, "start": 3472.88, "text": "boy."}, {"end": 3474.08, "start": 3473.52, "text": "Well,"}, {"end": 3476.6, "start": 3474.08, "text": "I"}, {"end": 3477.52, "start": 3476.6, "text": "don't"}, {"end": 3478.68, "start": 3477.52, "text": "know,"}, {"end": 3479.96, "start": 3478.68, "text": "actually."}], "text": " One last question we have is, you know, since you've been around the field for so long and you've got, you know, almost all the prestigious awards, so if you have some advice for the coming researchers and practitioners field, those young students, what would you suggest them if they want to achieve anything in the data science community in the future? Oh, boy. 
Well, I don't know, actually."}, {"chunks": [{"end": 3480.8, "start": 3480.0, "text": "When"}, {"end": 3481.12, "start": 3480.8, "text": "I"}, {"end": 3481.16, "start": 3481.12, "text": "look"}, {"end": 3481.64, "start": 3481.16, "text": "back"}, {"end": 3482.04, "start": 3481.64, "text": "at"}, {"end": 3482.6, "start": 3482.04, "text": "my"}, {"end": 3483.16, "start": 3482.6, "text": "life,"}, {"end": 3483.64, "start": 3483.16, "text": "it's"}, {"end": 3483.88, "start": 3483.64, "text": "just"}, {"end": 3483.88, "start": 3483.88, "text": "sort"}, {"end": 3483.88, "start": 3483.88, "text": "of"}, {"end": 3483.88, "start": 3483.88, "text": "a"}, {"end": 3484.08, "start": 3483.88, "text": "bunch"}, {"end": 3484.44, "start": 3484.08, "text": "of"}, {"end": 3485.16, "start": 3484.44, "text": "random"}, {"end": 3486.56, "start": 3485.16, "text": "choices"}, {"end": 3486.84, "start": 3486.56, "text": "that"}, {"end": 3488.44, "start": 3486.84, "text": "didn't"}, {"end": 3488.68, "start": 3488.44, "text": "seem"}, {"end": 3489.0, "start": 3488.68, "text": "that"}, {"end": 3489.8, "start": 3489.0, "text": "important"}, {"end": 3490.24, "start": 3489.8, "text": "at"}, {"end": 3490.6, "start": 3490.24, "text": "the"}, {"end": 3491.48, "start": 3490.6, "text": "time"}, {"end": 3492.6, "start": 3491.48, "text": "and"}, {"end": 3492.88, "start": 3492.6, "text": "turned"}, {"end": 3493.48, "start": 3492.88, "text": "out"}, {"end": 3493.76, "start": 3493.48, "text": "to"}, {"end": 3494.28, "start": 3493.76, "text": "work"}, {"end": 3495.04, "start": 3494.28, "text": "well."}, {"end": 3495.28, "start": 3495.04, "text": "I"}, {"end": 3495.88, "start": 3495.28, "text": "think,"}, {"end": 3496.56, "start": 3495.88, "text": "you"}, {"end": 3498.16, "start": 3496.56, "text": "know,"}, {"end": 3498.68, "start": 3498.16, "text": "the"}, {"end": 3500.36, "start": 3498.68, "text": "obvious"}, {"end": 3502.0, "start": 3500.36, "text": "things,"}, {"end": 3502.2, "start": 3502.0, "text": 
"it"}, {"end": 3503.24, "start": 3502.2, "text": "pays"}, {"end": 3504.08, "start": 3503.24, "text": "to"}, {"end": 3504.44, "start": 3504.08, "text": "get"}, {"end": 3504.88, "start": 3504.44, "text": "a"}, {"end": 3505.0, "start": 3504.88, "text": "good"}, {"end": 3505.96, "start": 3505.0, "text": "education."}, {"end": 3506.44, "start": 3505.96, "text": "I"}, {"end": 3507.28, "start": 3506.44, "text": "presume"}, {"end": 3507.6, "start": 3507.28, "text": "that"}, {"end": 3508.28, "start": 3507.6, "text": "everybody"}, {"end": 3509.32, "start": 3508.28, "text": "listening"}, {"end": 3509.96, "start": 3509.32, "text": "is,"}], "text": " When I look back at my life, it's just sort of a bunch of random choices that didn't seem that important at the time and turned out to work well. I think, you know, the obvious things, it pays to get a good education. I presume that everybody listening is,"}, {"chunks": [{"end": 3510.28, "start": 3510.0, "text": "either"}, {"end": 3510.96, "start": 3510.28, "text": "done"}, {"end": 3511.56, "start": 3510.96, "text": "that"}, {"end": 3512.04, "start": 3511.56, "text": "or"}, {"end": 3512.68, "start": 3512.04, "text": "about"}, {"end": 3514.16, "start": 3512.68, "text": "to"}, {"end": 3514.64, "start": 3514.16, "text": "do"}, {"end": 3517.12, "start": 3514.64, "text": "that."}, {"end": 3518.28, "start": 3517.12, "text": "And,"}, {"end": 3519.08, "start": 3518.28, "text": "you"}, {"end": 3519.52, "start": 3519.08, "text": "know,"}, {"end": 3520.88, "start": 3519.52, "text": "you"}, {"end": 3521.56, "start": 3520.88, "text": "want"}, {"end": 3522.8, "start": 3521.56, "text": "to"}, {"end": 3524.08, "start": 3522.8, "text": "get,"}, {"end": 3524.32, "start": 3524.08, "text": "you"}, {"end": 3524.52, "start": 3524.32, "text": "know,"}, {"end": 3524.8, "start": 3524.52, "text": "first"}, {"end": 3525.04, "start": 3524.8, "text": "of"}, {"end": 3525.76, "start": 3525.04, "text": "all,"}, {"end": 3526.28, "start": 3525.76, "text": "find"}, 
{"end": 3526.32, "start": 3526.28, "text": "a"}, {"end": 3526.76, "start": 3526.32, "text": "job"}, {"end": 3526.96, "start": 3526.76, "text": "that"}, {"end": 3527.08, "start": 3526.96, "text": "you"}, {"end": 3527.36, "start": 3527.08, "text": "feel"}, {"end": 3528.04, "start": 3527.36, "text": "comfortable"}, {"end": 3529.8, "start": 3528.04, "text": "with."}, {"end": 3530.28, "start": 3529.8, "text": "Okay,"}, {"end": 3531.2, "start": 3530.28, "text": "don't,"}, {"end": 3531.84, "start": 3531.2, "text": "again,"}, {"end": 3532.04, "start": 3531.84, "text": "sort"}, {"end": 3532.44, "start": 3532.04, "text": "of,"}, {"end": 3533.12, "start": 3532.44, "text": "don't"}, {"end": 3533.68, "start": 3533.12, "text": "overthink"}, {"end": 3533.96, "start": 3533.68, "text": "it,"}, {"end": 3534.08, "start": 3533.96, "text": "I"}, {"end": 3534.44, "start": 3534.08, "text": "guess"}, {"end": 3535.12, "start": 3534.44, "text": "is"}, {"end": 3536.12, "start": 3535.12, "text": "what"}, {"end": 3536.96, "start": 3536.12, "text": "I'm"}, {"end": 3538.44, "start": 3536.96, "text": "saying."}, {"end": 3539.96, "start": 3538.44, "text": "Don't,"}], "text": " either done that or about to do that. And, you know, you want to get, you know, first of all, find a job that you feel comfortable with. Okay, don't, again, sort of, don't overthink it, I guess is what I'm saying. 
Don't,"}, {"chunks": [{"end": 3541.56, "start": 3540.0, "text": "Don't"}, {"end": 3541.76, "start": 3541.56, "text": "think"}, {"end": 3542.72, "start": 3541.76, "text": "in"}, {"end": 3543.36, "start": 3542.72, "text": "terms"}, {"end": 3544.12, "start": 3543.36, "text": "of"}, {"end": 3544.96, "start": 3544.12, "text": "what"}, {"end": 3545.24, "start": 3544.96, "text": "will"}, {"end": 3545.76, "start": 3545.24, "text": "the"}, {"end": 3546.0, "start": 3545.76, "text": "payoff"}, {"end": 3546.12, "start": 3546.0, "text": "be"}, {"end": 3546.84, "start": 3546.12, "text": "to"}, {"end": 3548.0, "start": 3546.84, "text": "taking"}, {"end": 3548.96, "start": 3548.0, "text": "a"}, {"end": 3549.44, "start": 3548.96, "text": "certain"}, {"end": 3550.52, "start": 3549.44, "text": "job"}, {"end": 3550.84, "start": 3550.52, "text": "or"}, {"end": 3551.48, "start": 3550.84, "text": "going"}, {"end": 3552.36, "start": 3551.48, "text": "to"}, {"end": 3552.64, "start": 3552.36, "text": "a"}, {"end": 3553.2, "start": 3552.64, "text": "certain"}, {"end": 3553.6, "start": 3553.2, "text": "school"}, {"end": 3554.48, "start": 3553.6, "text": "or"}, {"end": 3555.12, "start": 3554.48, "text": "studying"}, {"end": 3555.24, "start": 3555.12, "text": "a"}, {"end": 3555.96, "start": 3555.24, "text": "certain"}, {"end": 3556.4, "start": 3555.96, "text": "subject."}, {"end": 3556.88, "start": 3556.4, "text": "Just"}, {"end": 3557.24, "start": 3556.88, "text": "do"}, {"end": 3557.92, "start": 3557.24, "text": "what"}, {"end": 3558.48, "start": 3557.92, "text": "feels"}, {"end": 3558.88, "start": 3558.48, "text": "good"}, {"end": 3559.12, "start": 3558.88, "text": "to"}, {"end": 3559.44, "start": 3559.12, "text": "you."}, {"end": 3559.44, "start": 3559.44, "text": "If"}, {"end": 3560.68, "start": 3559.44, "text": "you're"}, {"end": 3562.08, "start": 3560.68, "text": "interested,"}, {"end": 3562.56, "start": 3562.08, "text": "for"}, {"end": 3563.48, "start": 3562.56, "text": "example,"}, 
{"end": 3564.0, "start": 3563.48, "text": "in"}, {"end": 3564.28, "start": 3564.0, "text": "a"}, {"end": 3564.88, "start": 3564.28, "text": "certain"}, {"end": 3565.4, "start": 3564.88, "text": "application"}, {"end": 3566.92, "start": 3565.4, "text": "area,"}, {"end": 3568.44, "start": 3566.92, "text": "learn"}, {"end": 3569.96, "start": 3568.44, "text": "that."}], "text": " Don't think in terms of what will the payoff be to taking a certain job or going to a certain school or studying a certain subject. Just do what feels good to you. If you're interested, for example, in a certain application area, learn that."}, {"chunks": [{"end": 3571.64, "start": 3570.0, "text": "don't"}, {"end": 3572.24, "start": 3571.64, "text": "worry"}, {"end": 3572.64, "start": 3572.24, "text": "about"}, {"end": 3572.68, "start": 3572.64, "text": "whether"}, {"end": 3573.28, "start": 3572.68, "text": "there's"}, {"end": 3573.28, "start": 3573.28, "text": "a"}, {"end": 3573.36, "start": 3573.28, "text": "good"}, {"end": 3574.12, "start": 3573.36, "text": "award"}, {"end": 3574.16, "start": 3574.12, "text": "to"}, {"end": 3574.16, "start": 3574.16, "text": "be"}, {"end": 3574.16, "start": 3574.16, "text": "had"}, {"end": 3574.24, "start": 3574.16, "text": "in"}, {"end": 3574.96, "start": 3574.24, "text": "that"}, {"end": 3575.56, "start": 3574.96, "text": "area"}, {"end": 3576.16, "start": 3575.56, "text": "or"}, {"end": 3576.56, "start": 3576.16, "text": "something"}, {"end": 3576.72, "start": 3576.56, "text": "like"}, {"end": 3577.32, "start": 3576.72, "text": "that."}, {"end": 3577.6, "start": 3577.32, "text": "What"}, {"end": 3577.8, "start": 3577.6, "text": "can"}, {"end": 3578.04, "start": 3577.8, "text": "I"}, {"end": 3578.64, "start": 3578.04, "text": "say?"}, {"end": 3579.88, "start": 3578.64, "text": "It's"}, {"end": 3580.16, "start": 3579.88, "text": "more"}, {"end": 3580.68, "start": 3580.16, "text": "important"}, {"end": 3580.88, "start": 3580.68, "text": "to"}, {"end": 
3581.52, "start": 3580.88, "text": "enjoy"}, {"end": 3582.04, "start": 3581.52, "text": "what"}, {"end": 3582.56, "start": 3582.04, "text": "you're"}, {"end": 3582.72, "start": 3582.56, "text": "doing"}, {"end": 3583.16, "start": 3582.72, "text": "than"}, {"end": 3583.36, "start": 3583.16, "text": "it"}, {"end": 3584.12, "start": 3583.36, "text": "is"}, {"end": 3585.4, "start": 3584.12, "text": "to,"}, {"end": 3585.76, "start": 3585.4, "text": "because"}, {"end": 3586.28, "start": 3585.76, "text": "I"}, {"end": 3586.56, "start": 3586.28, "text": "mean,"}, {"end": 3586.76, "start": 3586.56, "text": "all"}, {"end": 3587.08, "start": 3586.76, "text": "of"}, {"end": 3587.44, "start": 3587.08, "text": "these,"}, {"end": 3587.56, "start": 3587.44, "text": "you"}, {"end": 3587.6, "start": 3587.56, "text": "know,"}, {"end": 3588.08, "start": 3587.6, "text": "these"}, {"end": 3588.8, "start": 3588.08, "text": "academies"}, {"end": 3589.2, "start": 3588.8, "text": "and"}, {"end": 3589.28, "start": 3589.2, "text": "so"}, {"end": 3589.52, "start": 3589.28, "text": "on,"}, {"end": 3590.24, "start": 3589.52, "text": "all"}, {"end": 3590.32, "start": 3590.24, "text": "that"}, {"end": 3590.52, "start": 3590.32, "text": "is"}, {"end": 3590.72, "start": 3590.52, "text": "just"}, {"end": 3591.0, "start": 3590.72, "text": "pay"}, {"end": 3591.56, "start": 3591.0, "text": "dues."}, {"end": 3592.12, "start": 3591.56, "text": "Basically"}, {"end": 3592.44, "start": 3592.12, "text": "that's"}, {"end": 3592.6, "start": 3592.44, "text": "all."}, {"end": 3592.76, "start": 3592.6, "text": "All"}, {"end": 3592.92, "start": 3592.76, "text": "right."}, {"end": 3593.44, "start": 3592.92, "text": "Thank"}, {"end": 3593.8, "start": 3593.44, "text": "you."}, {"end": 3594.8, "start": 3593.8, "text": "So"}, {"end": 3595.0, "start": 3594.8, "text": "yeah."}, {"end": 3595.32, "start": 3595.0, "text": "Yeah."}, {"end": 3595.6, "start": 3595.32, "text": "Thank"}, {"end": 3596.0, "start": 3595.6, "text": 
"you"}, {"end": 3596.68, "start": 3596.0, "text": "for"}, {"end": 3596.84, "start": 3596.68, "text": "your"}, {"end": 3597.2, "start": 3596.84, "text": "talks."}, {"end": 3597.4, "start": 3597.2, "text": "I"}, {"end": 3598.04, "start": 3597.4, "text": "don't"}, {"end": 3598.24, "start": 3598.04, "text": "know"}, {"end": 3598.48, "start": 3598.24, "text": "whether"}, {"end": 3598.64, "start": 3598.48, "text": "people"}, {"end": 3598.92, "start": 3598.64, "text": "would"}, {"end": 3598.96, "start": 3598.92, "text": "have"}, {"end": 3599.36, "start": 3598.96, "text": "finally"}, {"end": 3599.56, "start": 3599.36, "text": "agree"}, {"end": 3599.88, "start": 3599.56, "text": "on"}, {"end": 3599.96, "start": 3599.88, "text": "the"}], "text": " don't worry about whether there's a good award to be had in that area or something like that. What can I say? It's more important to enjoy what you're doing than it is to, because I mean, all of these, you know, these academies and so on, all that is just pay dues. Basically that's all. All right. Thank you. So yeah. Yeah. Thank you for your talks. 
I don't know whether people would have finally agree on the"}, {"chunks": [{"end": 3600.04, "start": 3600.0, "text": "of"}, {"end": 3600.48, "start": 3600.04, "text": "data"}, {"end": 3600.96, "start": 3600.48, "text": "science,"}, {"end": 3601.12, "start": 3600.96, "text": "but"}, {"end": 3601.28, "start": 3601.12, "text": "I'm"}, {"end": 3601.56, "start": 3601.28, "text": "sure"}, {"end": 3601.68, "start": 3601.56, "text": "the"}, {"end": 3602.16, "start": 3601.68, "text": "audience"}, {"end": 3602.36, "start": 3602.16, "text": "could"}, {"end": 3602.6, "start": 3602.36, "text": "benefit"}, {"end": 3602.8, "start": 3602.6, "text": "a"}, {"end": 3603.04, "start": 3602.8, "text": "lot"}, {"end": 3603.28, "start": 3603.04, "text": "from"}, {"end": 3603.44, "start": 3603.28, "text": "the"}, {"end": 3603.8, "start": 3603.44, "text": "wisdom"}, {"end": 3603.84, "start": 3603.8, "text": "and"}, {"end": 3604.44, "start": 3603.84, "text": "beauty"}, {"end": 3604.64, "start": 3604.44, "text": "from"}, {"end": 3604.84, "start": 3604.64, "text": "your"}, {"end": 3605.52, "start": 3604.84, "text": "talk."}, {"end": 3605.76, "start": 3605.52, "text": "Thank"}, {"end": 3605.92, "start": 3605.76, "text": "you"}, {"end": 3606.12, "start": 3605.92, "text": "so"}, {"end": 3606.36, "start": 3606.12, "text": "much"}, {"end": 3606.76, "start": 3606.36, "text": "for"}, {"end": 3606.92, "start": 3606.76, "text": "your"}, {"end": 3607.0, "start": 3606.92, "text": "talk."}, {"end": 3607.28, "start": 3607.0, "text": "Thank"}, {"end": 3607.4, "start": 3607.28, "text": "you."}, {"end": 3607.52, "start": 3607.4, "text": "Thank"}, {"end": 3607.76, "start": 3607.52, "text": "you"}, {"end": 3608.32, "start": 3607.76, "text": "very"}, {"end": 3608.88, "start": 3608.32, "text": "much."}, {"end": 3610.16, "start": 3608.88, "text": "Thank"}, {"end": 3610.32, "start": 3610.16, "text": "you"}, {"end": 3610.56, "start": 3610.32, "text": "all"}, {"end": 3611.2, "start": 3610.56, "text": "for"}, {"end": 
3612.44, "start": 3611.2, "text": "listening."}], "text": " of data science, but I'm sure the audience could benefit a lot from the wisdom and beauty from your talk. Thank you so much for your talk. Thank you. Thank you very much. Thank you all for listening."}]}}