IPdb: A High-precision IP Level Industry Categorization of Web Services

Published: 29 Jan 2025, Last Modified: 29 Jan 2025. Venue: WWW 2025 Poster. License: CC BY 4.0
Track: Web mining and content analysis
Keywords: Internet management, Web measurement
Abstract: IP addresses that host web services are crucial to the Internet ecosystem. Classifying these addresses by industry and organization offers valuable insight into the entities that use them, enabling more efficient network management and stronger security. Previous work in website classification and Internet management struggles to offer an IP-level view of the industries behind web services, either because it uses a limited set of industry categories or because the industry of an IP address owner may differ from that of the AS owner. To this end, we present IPdb, an IP-level industry categorization dataset. To construct the dataset, we developed LLMIC, a Large Language Model-based Industry Categorization framework with a precision of nearly 96%. IPdb covers over 200 million IP addresses and serves as a labeled database for future work on IP-level industry classifiers. Furthermore, our study indicates that 30% to 50% of organizations within critical infrastructure industries deploy web servers across multiple ASes. Our study also confirms the problem of mismatched granularity in AS-level industry categorization: 87.83% of ASes in IPv4 and 72.96% of ASes in IPv6 contain IP addresses from different industries.
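As an illustration of the AS-level granularity statistic reported in the abstract, the following minimal sketch computes the share of ASes whose labeled IP addresses span more than one industry. The `(ip, asn, industry)` record layout, the function name `mixed_industry_as_share`, and the toy records are assumptions made for illustration; they are not the published IPdb schema or the authors' measurement code.

```python
from collections import defaultdict


def mixed_industry_as_share(records):
    """Return the fraction of ASes whose labeled IPs span more than one industry.

    `records` is an iterable of (ip, asn, industry) tuples. This layout is
    a hypothetical one chosen for illustration, not the IPdb format.
    """
    industries_by_as = defaultdict(set)
    for _ip, asn, industry in records:
        industries_by_as[asn].add(industry)
    if not industries_by_as:
        return 0.0
    # An AS counts as "mixed" when its IPs carry at least two distinct labels.
    mixed = sum(1 for labels in industries_by_as.values() if len(labels) > 1)
    return mixed / len(industries_by_as)


# Made-up records using documentation-reserved IP ranges and AS numbers.
sample = [
    ("192.0.2.1", 64496, "Finance"),
    ("192.0.2.2", 64496, "Education"),
    ("198.51.100.7", 64497, "Healthcare"),
    ("198.51.100.8", 64497, "Healthcare"),
]
share = mixed_industry_as_share(sample)
```

In this toy input, one of the two ASes carries two different industry labels, so the computed share is one half. Applied to a full IP-level labeling, the same ratio would correspond to the kind of percentage the abstract reports for IPv4 and IPv6, assuming one industry label per IP address.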
Submission Number: 1896