Abstract: Persistent connections are increasingly used in Web retrieval due to the wide adoption of HTTP/1.1. With persistent connections, the request allocation algorithm used by Web clusters is often session-grained. This article studies the caching performance of Web clusters under session-grained request allocation. It is shown that although content-based algorithms considerably improve caching performance over content-blind algorithms at the request-grained level, most of this gain is offset by the allocation dependency that arises when requests are allocated at the session-grained level. The performance loss increases with cluster size and connection holding time. An optimization problem is then formulated for improving the caching effectiveness of session-grained allocation, and the problem is proven to be NP-complete. Based on a heuristic approach, a session-affinity aware algorithm is presented that exploits the correlation between the requests in a session. The new algorithm is shown to significantly outperform the content-based algorithm under session-grained allocation. It is also shown that optimizing session-grained allocation cannot fully compensate for the performance loss caused by allocation dependency.
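The allocation dependency described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the article's algorithms: the hypothetical helpers `owner` and `count_hits` assume a content-based policy that hashes each URL to a node, assume that a session-grained dispatcher pins a whole session to the node chosen by its first request, and give each node an unbounded cache so that only compulsory misses occur. Under these assumptions, request-grained allocation sends every URL to a single node. Session-grained allocation can scatter copies of the same URL across nodes, so it can never score more hits.

```python
import zlib
from typing import List, Set

Session = List[str]  # hypothetical: a session is an ordered list of requested URLs


def owner(url: str, nodes: int) -> int:
    # Assumed content-based placement: a stable hash of the URL picks the node.
    return zlib.crc32(url.encode()) % nodes


def count_hits(sessions: List[Session], nodes: int, session_grained: bool) -> int:
    # Unbounded per-node caches (an assumption): only compulsory misses occur.
    caches: List[Set[str]] = [set() for _ in range(nodes)]
    hits = 0
    for session in sessions:
        if not session:
            continue  # skip empty sessions
        # Session-grained: the first request fixes the node for the whole session.
        pinned = owner(session[0], nodes)
        for url in session:
            node = pinned if session_grained else owner(url, nodes)
            if url in caches[node]:
                hits += 1
            else:
                caches[node].add(url)
    return hits


sessions = [["/a", "/b", "/c"], ["/b", "/a", "/c"], ["/c", "/b", "/a"]]
print(count_hits(sessions, 3, session_grained=False))  # request-grained
print(count_hits(sessions, 3, session_grained=True))   # session-grained
```

With 9 requests for 3 distinct URLs, request-grained allocation yields 6 hits. Session-grained allocation yields at most 6, depending on how the first-request hashes fall. With a single node the two policies coincide, which matches the abstract's observation that the loss grows with cluster size.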