Abstract: Passwords are today’s dominant form of authentication, and password guessing is the most effective method for evaluating password strength. Most password guessing models (e.g., PCFG, Markov, and RFGuess) treat passwords as sequences of basic units (i.e., characters or segments) with a unidirectional information flow (i.e., each unit is predicted from the units that precede it). However, modeling passwords in a single direction fails to capture users’ password creation behavior, which is actually shaped by the full password context. Through an in-depth analysis of real-world passwords, we reveal that users often build passwords around a central keyword, such as a common word, name, or date, and then embellish it with numbers or symbols. Based on this observation, we, for the first time, attempt to parse passwords as trees. Unlike existing sequence models, trees can reveal the semantic connections within passwords and the logical thought processes users follow when creating them. For instance, in the password iloveyou, the basic units i and you semantically depend on the predicate love, forming a natural tree structure. We propose a trawling guessing model called PassTree and a targeted guessing model based on personally identifiable information (PII), named PassTree-PII. Our extensive experiments demonstrate the effectiveness of our models: (1) PassTree outperforms its leading counterparts by 0.38% to 2.51% when the number of guesses is below 10^7; (2) PassTree-PII achieves a cracking rate comparable to that of the state-of-the-art RFGuess-PII proposed at USENIX Security ’23, while operating significantly more efficiently, using only 0.87% of the memory and running 16.60 times faster. Our work provides a new perspective on understanding user passwords and demonstrates a feasible technical route for applying tree structures to password guessing.
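The contrast between a sequence view and a tree view can be illustrated with a small sketch. This is not PassTree's parser, whose details the abstract does not give. The names `Node`, `linearize`, and `keyword_tree` are hypothetical, and the rule "the longest run of letters is the central keyword" is an assumed stand-in for the paper's keyword-centred observation. The sketch places a head unit at the root with dependents on its left and right, and an in-order walk recovers the original string, as in the iloveyou example.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One basic unit (segment) of a password, with dependents on either side."""
    unit: str
    left: list[Node] = field(default_factory=list)   # dependents written before the head
    right: list[Node] = field(default_factory=list)  # dependents written after the head

    def linearize(self) -> str:
        """Recover the password by an in-order walk: left dependents, head, right dependents."""
        return ("".join(c.linearize() for c in self.left)
                + self.unit
                + "".join(c.linearize() for c in self.right))


def keyword_tree(password: str) -> Node:
    """Hypothetical heuristic (an assumption, not the paper's method): split the
    password into runs of letters, digits, and other symbols, make the longest
    letter run the root keyword, and attach every other run as a left or right
    dependent depending on its position relative to the keyword."""
    segments = re.findall(r"[A-Za-z]+|\d+|[^A-Za-z\d]+", password)
    if not segments:
        raise ValueError("empty password")
    letter_idx = [i for i, s in enumerate(segments) if re.fullmatch(r"[A-Za-z]+", s)]
    pool = letter_idx or range(len(segments))  # fall back to any segment if no letters
    head = max(pool, key=lambda i: len(segments[i]))  # first longest run wins ties
    root = Node(segments[head])
    root.left = [Node(s) for s in segments[:head]]
    root.right = [Node(s) for s in segments[head + 1:]]
    return root


# The abstract's example: i and you depend on the predicate love.
iloveyou = Node("love", left=[Node("i")], right=[Node("you")])
print(iloveyou.linearize())

# A keyword embellished with symbols and digits, parsed by the heuristic above.
tree = keyword_tree("!!password123")
print(tree.unit, [n.unit for n in tree.left], [n.unit for n in tree.right])
```

Because every unit hangs off a head rather than off the unit immediately before it, a model over such trees can condition an embellishment like 123 on the keyword password directly, rather than only on whatever preceded it left to right.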
DOI: 10.1109/tdsc.2025.3552583