Abstract: Twitter's data platform team is serving a large number of real-time analytics jobs, powering a wide range of data science use cases, from aggregations over time to spam detection. These analytics jobs constitute a crucial step in Twitter's data science infrastructure. As a key part of Twitter's “partly cloudy” strategy, real-time data analytics jobs are being migrated from on-premises into the cloud. We would like to share our migration approach and findings in this paper. The jobs to be migrated vary but follow common patterns, including the “read-modify-write store” and “lambda architecture” patterns. Both patterns can be migrated to the Beam data model in general ways. Besides job patterns, the job IOs are handled by replicating or proxying between on-premises and the cloud. Tests are applied in two phases through monitoring metrics and control tests. A case study demonstrates the business impact of migration. Finally, we discuss lessons learned.
0 Replies
Loading