Making Translators Privacy-aware on the User's Side

Published: 10 Jun 2024, Last Modified: 10 Jun 2024. Accepted by TMLR. License: CC BY-SA 4.0
Abstract: We propose PRISM to enable users of machine translation systems to preserve the privacy of data on their own initiative. There is a growing demand to apply machine translation systems to data that require privacy protection. While several machine translation engines claim to prioritize privacy, the extent and specifics of such protection are largely ambiguous. First, there is often a lack of clarity on how and to what degree the data is protected. Even if service providers believe they have sufficient safeguards in place, sophisticated adversaries might still extract sensitive information. Second, vulnerabilities may exist outside of these protective measures, such as within communication channels, potentially leading to data leakage. As a result, users are hesitant to utilize machine translation engines for data demanding high levels of privacy protection, thereby missing out on their benefits. PRISM resolves this problem. Instead of relying on the translation service to keep data safe, PRISM provides the means to protect data on the user's side. This approach ensures that even machine translation engines with inadequate privacy measures can be used securely. For platforms already equipped with privacy safeguards, PRISM acts as an additional protection layer, further reinforcing their security. PRISM adds these privacy features without significantly compromising translation accuracy. We prove that PRISM enjoys the theoretical guarantee of word-level differential privacy. Our experiments demonstrate the effectiveness of PRISM using real-world translators, T5 and ChatGPT (GPT-3.5-turbo), and datasets in two languages. PRISM achieves a better balance between privacy protection and translation accuracy than other user-side privacy protection protocols, and it helps users grasp the content written in a foreign language without leaking the original content.
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url:
Changes Since Last Submission: We added a discussion on the definition of differential privacy. We inserted explanations of the theory just after Theorem 3.2 (p.5). We made this change because a reviewer and the AE misunderstood Theorem 3.2 as being inapplicable to correlated texts, and that misunderstanding was the reason for the decision. We clarified that Theorem 3.2 is applicable to correlated texts and block-wise changes. We replaced "differential privacy" with "word-level differential privacy" in the introduction. We inserted the following discussion in the introduction (page 2, L.5--L.9): ```Here, we mean word-level by regarding texts with one different word as adjacent. It should be noted that this definition may be oversimplified for practical guarantee of privacy as words in real-world texts are correlated. Group privacy, where a group of words is considered a unit, mitigates this problem as differential privacy can be composed, and our results can be extended to group privacy. Exploring more sophisticated guarantees that care text correlations is left as future work.``` We inserted `We prove that PRISM enjoys the theoretical guarantee of word-level differential privacy.` in the abstract.
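For readers unfamiliar with the distinction discussed above, the following is a minimal sketch of word-level adjacency and the group-privacy extension it enables; the notation is ours and is not quoted from the paper:

```latex
% Word-level adjacency: two texts x = (w_1, \dots, w_n) and
% x' = (w'_1, \dots, w'_n) are adjacent if they differ in exactly
% one word position. A randomized mechanism M satisfies
% \varepsilon-word-level differential privacy if, for all adjacent
% x, x' and all measurable output sets S,
\[
  \Pr[M(x) \in S] \le e^{\varepsilon} \, \Pr[M(x') \in S].
\]
% Group privacy: if x and x' differ in at most k word positions
% (e.g., a block of correlated words), chaining the bound along a
% path of k adjacent texts yields
\[
  \Pr[M(x) \in S] \le e^{k\varepsilon} \, \Pr[M(x') \in S],
\]
% so block-wise or correlated changes of k words are still covered,
% at the cost of a privacy budget that scales to k\varepsilon.
```

This composition argument is why a word-level guarantee extends to correlated texts, as the revision clarifies.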
Assigned Action Editor: ~Nihar_B_Shah1
Submission Number: 2414