Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling

May 03, 2019
  • Keywords: web crawling, convex optimization, reinforcement learning
  • Abstract: From traditional Web search engines to virtual assistants and Web accelerators, services that rely on online information need to continually keep track of remote content changes by explicitly requesting content updates from remote sources (e.g., web pages). We propose a novel optimization objective for this setting that has several practically desirable properties, along with efficient algorithms for optimizing it that carry optimality guarantees even when content change observability is mixed and the change model parameters are initially unknown. Experiments on 18.5M URLs crawled daily for 14 weeks show significant advantages of this approach over prior art.