TL;DR: Our paper explores the supply-chain risks associated with open ML models
Abstract: Powerful machine learning (ML) models are now readily available online, which creates exciting possibilities for users who lack the deep technical expertise or substantial computing resources needed to develop them. On the other hand, this type of open ecosystem comes with many risks. In this paper, we argue that the current ecosystem for open ML models contains significant *supply-chain* risks, some of which have already been exploited in real attacks. These include an attacker replacing a model with something malicious (e.g., malware), or a model being trained using a vulnerable version of a framework or on restricted or poisoned data. We then explore how Sigstore, a solution designed to bring transparency to open-source software supply chains, can bring similar transparency to open ML models by enabling model publishers to sign their models and to prove properties about the datasets they use.
Lay Summary: Powerful machine learning models are now easy to find and download online. This is exciting because people who aren't tech experts or don't have powerful computers can still use them. On the other hand, sharing these models so openly comes with downsides. Specifically, our paper highlights the big security risks that come from downloading and using models we don't know much about, similar to the risks of downloading and running computer programs from unsafe sources. For example, someone could replace a good model with a harmful one (like a computer virus), or a model might have been built in an unsafe way, or using data that is bad or that its creators weren't supposed to use.
We suggest using something called Sigstore to make this process safer. Sigstore adds transparency by helping people digitally "sign" the models they create, in a way that proves the models are legitimate and shows where the data used to create them came from.
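As a rough illustration of this workflow, the sketch below signs and verifies a local model using the `model_signing` Python package from the linked model-transparency repository. The exact API (the `Config` builders, `use_sigstore_verifier`, and the default keyless signing flow) is an assumption based on the repository's documentation and may differ between releases; the model path, publisher identity, and OIDC issuer are hypothetical placeholders.

```python
# Minimal sketch: signing and verifying an ML model with Sigstore via the
# model-transparency `model_signing` package (pip install model-signing).
# NOTE: method names below are assumptions based on the repository's docs
# and may vary between releases; consult the README for the current API.
import model_signing

MODEL_PATH = "finbert/"          # hypothetical local model directory
SIGNATURE_PATH = "finbert.sig"   # where the signature bundle is written

# Sign: hashes the model's files into a manifest and signs it. Per the
# repo's docs (assumed here), Sigstore's keyless flow is the default, so
# this triggers an OIDC login that proves the publisher's identity.
model_signing.signing.Config().sign(MODEL_PATH, SIGNATURE_PATH)

# Verify: recomputes the hashes and checks the signature against the
# publisher identity recorded in Sigstore's transparency log.
model_signing.verifying.Config().use_sigstore_verifier(
    identity="publisher@example.com",           # hypothetical identity
    oidc_issuer="https://accounts.google.com",  # hypothetical OIDC issuer
).verify(MODEL_PATH, SIGNATURE_PATH)
```

Because verification ties the model's contents to a publicly logged identity rather than to a long-lived private key, a swapped or tampered model fails the hash check, and a signature from an unexpected identity fails the identity check.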
Verify Author Names: My co-authors have confirmed that their names are spelled correctly both on OpenReview and in the camera-ready PDF. (If needed, please update ‘Preferred Name’ in OpenReview to match the PDF.)
No Additional Revisions: I understand that after the May 29 deadline, the camera-ready submission cannot be revised before the conference. I have verified with all authors that they approve of this version.
Pdf Appendices: My camera-ready PDF file contains both the main text (not exceeding the page limits) and all appendices that I wish to include. I understand that any other supplementary material (e.g., separate files previously uploaded to OpenReview) will not be visible in the PMLR proceedings.
Latest Style File: I have compiled the camera ready paper with the latest ICML2025 style files <https://media.icml.cc/Conferences/ICML2025/Styles/icml2025.zip> and the compiled PDF includes an unnumbered Impact Statement section.
Paper Verification Code: MjQzZ
Link To Code: https://github.com/sigstore/model-transparency/
Permissions Form: pdf
Primary Area: System Risks, Safety, and Government Policy
Keywords: transparency, provenance, integrity
Submission Number: 196