\section{Introduction}  \label{intro}
What constitutes a robustly safe and responsible release of new AI systems, from components such as training datasets to model access itself, urgently requires multidisciplinary guidance. There is no overarching standard or standards-making body to build consensus on what constitutes responsible release. This is particularly true for generative AI systems, which can be leveraged for a broad range of tasks and are sometimes referred to as “general-purpose” \cite{Trajtenberg18}. A sub-type of foundation models \cite{DBLP:journals/corr/abs-2108-07258}, these systems generate outputs across modalities such as text and image. They can be applied to both beneficial and harmful tasks; for example, language models can be adapted to tasks such as grammar correction and translation, or used for phishing and spam. Because these systems are relatively new, their uses and misuses are still being discovered. Carefully considering release strategies for current, less powerful systems better prepares the AI community and the broader affected community, and sets precedent, as increasingly powerful systems are deployed. Given the fast pace of AI progress and release, developers, deployers, researchers, and policymakers must act through community discussions, guardrails, and investments.

The current state of generative AI system release is largely determined by the organizations developing the systems. Developers are likely to understand a system best, but understanding its impact and ripple effects requires multidisciplinary expertise that is rarely housed in one organization. Waiting for long-term evidence of consequences is infeasible for high-risk and powerful systems. A strictly closed and vertical path to commercialization can concentrate power among high-resource organizations, while an open process without ethical considerations can inflict and exacerbate risks and harms, from misuse to bias.

Many components make up a system throughout its lifecycle, from training data to computing power. This paper focuses primarily on the cumulative release of a model and its components by outlining key considerations in release, release options along the gradient of access, the timeline of released systems, and the investments needed to make releases safer.

\section{Previous Work}
Discussion about safe release has been ongoing in the AI research community, but there is no standards body or default convener for these discussions. Initiatives such as the Partnership on AI’s Publication Norms for Responsible AI \cite{staff_2022} and Stanford University’s Call for Community Norms for the Release of Foundation Models \cite{liang_bommasani_creel_reich} have made space for discussing the many options for system release \cite{sastry}, as well as the many components and available platforms that make these options complex.

At the core of release considerations is the tension between openness, which shares rather than concentrates power, and closedness, which minimizes potential harm and risk. Broadly, the development of safety and risk controls lags behind system development; for example, tools for detecting generated outputs underperform as systems become increasingly powerful \cite{DBLP:journals/corr/abs-2002-03438}. Among natural language processing researchers, whether developers should be held ethically responsible for downstream misuse of a publicly released system is contentious, with about half believing professionals should be responsible~\cite{https://doi.org/10.48550/arxiv.2208.12852}.

Parallel fields such as open-source software deployment can offer informative lessons, such as how open-source software communities enable community research and crowdsourced work like discovering vulnerabilities \cite{10.1145/2145204.2145396}. Examining the specific use case of open-source deepfakes highlights the difficulty of managing downstream harms in real time and the risk of safety controls being seen as futile \cite{10.1145/3531146.3533779}. While generative AI research would also benefit greatly from such community insight, lessons from software are often not directly applied because of substantial differences in functionality \cite{DBLP:journals/corr/abs-2001-00463}. Ultimately, personal values around openness are a large factor in decision making \cite{DBLP:journals/corr/abs-1910-01170}, and these tensions can be further examined by output modality \cite{bergman-etal-2022-guiding}.

\section{What is Being Released?}
The parts of an AI system considered in a release can be broken into three broad and overlapping categories: 
\begin{itemize}
  \item access to the model itself, 
  \item components that enable further risk analysis, 
  \item and components that enable model replication. 
\end{itemize}
Components are organized based on their most straightforward use, but the categories overlap: the same model cannot be replicated without certain components for risk analysis, such as its entire original training data, even if all replication components are available. Conversely, components for replication can also be analyzed for social impacts such as biases.

\subsection{The Model Itself}
Access to the model itself includes the model weights and the ability to query, adapt, or otherwise examine and conduct further research into a model. The range of possible access is laid out along the gradient in Figure \ref{fig:gradient}.

\subsection{Components for Risk Analysis}
These components are the parts of system development that can provide further insight into the model, including its capabilities, the decision-making process about what data was collected and how, and documentation of that process. They also include details on system risks, training data, fine-tuning data, and the people, such as human crowdworkers, involved in adapting the model through methods such as reinforcement learning with human feedback. Evaluation results are included as well: published results from any evaluations that researchers and developers may have run on the base model. These components may be withheld due to intellectual property (IP) rights, consent, or privacy concerns.

\subsection{Components for Replication}
These components include a technical paper detailing the model training process and the code used to train the model, as both can ease replication efforts. They also include training information such as configuration settings (e.g., batch size) and telemetry collected during training (e.g., training loss). These components may be withheld for competitive, IP, or misuse reasons; they carry high misuse risk because they can be repurposed or adapted for malicious or otherwise harmful use cases \cite{vee_2022}.

\section{Key Considerations in Release}
Deployers should weigh the following considerations when making release decisions. Risks and threats from increasingly powerful systems are difficult to enumerate and assess, especially since malicious actors and their incentives are constantly evolving \cite{10.1145/3372823}. Taxonomies of the ethical and social risks of specific systems \cite{https://doi.org/10.48550/arxiv.2112.04359} can serve as a framework for potential harms. Considerations that apply across all generative systems are discussed below.

\subsection{Concentration of Power}
One of the most prominent arguments for providing access to systems is to avoid further concentrating power in high-resource organizations, which are among the few groups capable of developing and deploying these systems. Large technology companies are able to create powerful AI systems because of their access to training data, computing infrastructure, and the commercial capabilities to deploy those systems. This monopolization also gives high-resource institutions more influence over AI development, the behavior of these systems, and the narrative and direction of the field \cite{benkler_2019}. Although these companies may provide access to or even open-source their systems, contributions to system development are limited to people and resources working toward the company’s interests \cite{10.1145/3488666}. Large companies are often geographically concentrated in Western countries while their systems are deployed globally, which can asymmetrically impose cultural values \cite{DBLP:journals/corr/abs-2007-04068}. These companies can also punish pushback or dissent \cite{dave_dastin_2020}. The people most affected and exploited by AI systems are rarely found in large technology companies; they must be empowered to shape systems that also benefit them, or to opt out of interacting with AI entirely \cite{Kalluri_2020}.

\subsection{Exacerbating Disparate Performance and Harmful Social Impacts}
The fewer perspectives incorporated into the system development process, the more likely the system is to perform disparately for different groups. AI systems can propagate harms such as exacerbating social inequity \cite{noble_2018, benjamin_2020, hovy-spruit-2016-social} and harmful biases \cite{doi:10.1126/science.aal4230, blodgett-etal-2020-language}, which can be further amplified as systems scale \cite{10.1145/3442188.3445922}. Means of measuring and mitigating risk in these systems are largely culture- and context-dependent \cite{talat-etal-2022-reap}. The many technical and social aspects of AI systems \cite{DBLP:journals/corr/abs-2108-07258} require robust research \cite{DBLP:journals/corr/abs-2111-15366} conducted with affected communities \cite{10.1145/3531146.3533083} to ensure that, if these systems are deployed among marginalized groups, they benefit rather than exploit those groups.

\subsection{Malicious Use and Unintentional Misuse}
As more modalities of AI generation improve in output quality, from text to images, the potential for harmful use also increases. Malicious uses such as the creation of deepfake imagery \cite{DBLP:journals/corr/abs-1909-11573}, AI-generated disinformation \cite{https://doi.org/10.48550/arxiv.2301.04246}, and illegal and disturbing material \cite{simonite_2021} can cause severe emotional harm at the individual level and destructive institutional harm at the societal level. Furthermore, malicious actors \cite{DBLP:journals/corr/abs-1802-07228} have historically worked to circumvent safety controls. Threat modeling will necessarily differ by modality, and as systems expand into output types such as code generation, potential harms can broaden \cite{https://doi.org/10.48550/arxiv.2207.14157}. While limiting access can prevent some malicious uses and is often suggested to minimize misuse \cite{DBLP:journals/corr/abs-1907-11274}, systems can still be vulnerable to attacks when only querying functionality is available \cite{DBLP:journals/corr/abs-2012-07805}.

\subsection{Auditability}
The question of auditability concerns who conducts audits and the level of access required to effectively examine an AI system. Auditing must be considered both pre- and post-deployment, as impacts from a system may not be detectable before deployment and, once the system is deployed, may be difficult to trace back to it \cite{10.1145/3351095.3372873}. Those capable of conducting audits will likely need some level of technical skill, even as more no-code tools are built. The size of the system and its components also affects auditability: the datasets that large generative AI systems are trained on are difficult to analyze at scale, and few tools exist to analyze large static datasets \cite{Cai2015TheCO}. Formal audits cannot be the only source of insight into, or governance of, a system \cite{10.1145/3514094.3534181}.

\subsection{Accountability in Case of Harm}
When an AI system harms, or is connected to harming, people, it is unclear who or what should be held accountable. More open and widely deployed systems are likely to have broader reach and therefore a higher chance of causing harm. Since harm is not explicitly defined and is not always physical, what constitutes harm can vary widely, from encouraging physical harm, to propagating social harms such as identity stereotypes, to more abstract harms such as a lack of access to a system lowering opportunities for a specific group. Work to characterize sociotechnical harms can narrow this scope \cite{https://doi.org/10.48550/arxiv.2210.05791}.

\subsection{Value Judgments for Gating and Limiting Access}
A base generative AI system is capable of generating many types of content, making content moderation complex \cite{gillespie_2021}. What constitutes appropriate output is influenced by religious, cultural, and personal beliefs, and what content can and should be limited, filtered, and gated is also unclear. For example, sexual content may not be inherently unsafe to generate in some cultures, but may be subject to local laws. Technical filters may not be able to distinguish between sexual content and nonsexual nudity, or between consensual and non-consensual content. These judgments are most difficult without a specific use case or context, but specific applications face the same challenges.

\section{The Gradient of System Access}
Once these considerations are weighed, the group determining the release method must decide whether the system and its components are publicly acknowledged and released. The gradient of release options below is based on five years (2018--2022) of publicized generative AI systems. It serves as a framework and does not fully capture the nuance of the many components and details in a system release.

Figure \ref{fig:gradient} shows the tradeoffs in considerations along the gradient; as systems become more open they better enable audits and community research but are more difficult to control for risks. 

\begin{figure}[h!]
  \centering
  \includegraphics[width=\textwidth]{images/MAIN_gradient}
  \caption{Considerations and Systems Along the Gradient of System Access}
  \label{fig:gradient}
\end{figure}

Below the gradient are examples of generative systems placed according to their original release method upon announcement; for example, GPT-2 may fall under “Downloadable” today, but was originally released as “Gradual/Staged”. 

\subsection{Fully Closed} 
When all aspects and components of a system are inaccessible outside the developer organization, or even outside a specific subsection of that organization, the system is fully closed. At the furthest end of the spectrum, the system’s existence is unknown outside a select group within the developer organization, even after full training. A fully closed system may or may not be publicly announced. These systems can only be researched by the developer organization, which is often a high-resource organization such as the Alphabet companies Google and DeepMind; publicly known examples include Google’s Imagen \cite{https://doi.org/10.48550/arxiv.2205.11487} and DeepMind’s Gopher \cite{DBLP:journals/corr/abs-2112-11446}. Public engagement may come from the system’s deployment in a commercial application, or from the public calling out biases and other notable social aspects visible in publicly released outputs. Such released outputs can be cherry-picked, for example showing only non-human animals or human silhouettes \cite{https://doi.org/10.48550/arxiv.2209.14792}, which gives little robust insight into broad capabilities or social impacts such as biases.

\subsection{Gradual/Staged Release}
This method refers to releasing a system in stages or gradually over a predetermined amount of time. The time between stages is intended for risk-minimizing work such as monitoring for malicious actor activity and researching potential harms. In 2019, OpenAI released the language model GPT-2 in four stages of increasing parameter count over nine months while conducting research internally and with external partners \cite{DBLP:journals/corr/abs-1908-09203}. This approach sparked debate \cite{lipton_2019}, but others still recommend it \cite{https://doi.org/10.48550/arxiv.2210.04610}. In 2022, Stability AI initially approached a staged release of Stable Diffusion \cite{mostaque_2022} by providing access to a hosted model before releasing the model weights. However, the weights were leaked 12 days after the initial hosted release, illustrating the need for safety protocols that prevent leakage during a staged release. While there is no standardized time frame for staged releases, substantial sociotechnical research generally requires weeks, months, or sometimes years.

\subsection{Gated to Public Access (Including Paid and Free)}
When providing access to a system without fully opening all components, those deciding the release method may choose to limit access. Layered on top of the infrastructural options, namely hosted, cloud-based, or fully downloadable access, is the choice of whether to make the release gated or public. Gating is a selective process, conducted by a group of people, used to block high-risk or out-of-scope use cases. Limited access can make controls easier to enforce; for example, system deployers retain the right to revoke access in a gated, hosted setting. However, gating downloadable systems is unreliable as a technical mechanism: the network effect of researchers sharing within the same circles \cite{liang_bommasani_creel_reich} can provide a loophole. The releasing organization cannot fully monitor whether users share access through screen-sharing, credential-sharing, or simply sending components such as model weights to unauthorized users. This does not make gating an ineffective guardrail, as it still creates barriers to sharing the model. The deploying organization nonetheless still makes critical decisions that contribute to concentration of power.
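To make the revocation point concrete, the Python sketch below models a gated, hosted deployment in which reviewers approve users only for in-scope use cases and the deployer can revoke a grant at any time. The class \texttt{AccessGate}, its methods, and the use-case labels are hypothetical illustrations, not any organization's actual gating system.

```python
from dataclasses import dataclass, field


@dataclass
class AccessGate:
    """Hypothetical gate for a hosted model: users are approved only for
    in-scope use cases, and the deployer can revoke access at any time."""

    allowed_use_cases: set[str]
    _grants: dict[str, str] = field(default_factory=dict)  # user_id -> approved use case

    def request_access(self, user_id: str, use_case: str) -> bool:
        # High-risk or out-of-scope use cases are rejected at the gate.
        if use_case not in self.allowed_use_cases:
            return False
        self._grants[user_id] = use_case
        return True

    def revoke(self, user_id: str) -> None:
        # Revocation is enforceable only because the weights stay on the
        # deployer's servers and every query is checked by is_authorized().
        self._grants.pop(user_id, None)

    def is_authorized(self, user_id: str) -> bool:
        return user_id in self._grants


gate = AccessGate(allowed_use_cases={"bias-audit", "academic-research"})
print(gate.request_access("alice", "bias-audit"))   # True
print(gate.request_access("bob", "spam-campaign"))  # False
gate.revoke("alice")
print(gate.is_authorized("alice"))                  # False
```

Because every query passes through \texttt{is\_authorized}, revocation takes effect immediately in the hosted case; once weights are downloaded, no equivalent check exists, which is the loophole described above.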

\subsubsection{Hosted Access}
System deployers may provide access to the model itself by hosting the model on their own servers and allowing surface-level interaction. Access can differ depending on the interface's usability, especially for users with minimal or no experience with these systems. Generally, users cannot perform tasks beyond what is prescribed, usually simple input-output probing. This method applies only to model access, not to other system components. Examples include Midjourney, which allows users to interact with its image generation model via a Discord bot or web interface \cite{midjourney_2022}. When a hosted interface is optimized for usability and dialogue, as with OpenAI's ChatGPT \cite{openai_chat}, a broader range of people can interact with the model, but this raises misuse concerns and ethical challenges \cite{reich_2022}. Hosted access can also transition to API or downloadable access, as with OpenAI’s DALL$\cdot$E 2 \cite{https://doi.org/10.48550/arxiv.2204.06125}, which switched to API-based access seven months later \cite{openai_2022}. While this method provides some model access, it limits external research.

 \subsubsection{Cloud-based/API Access}
Cloud-based access, or access provided via an application programming interface (API), provides more insight into and researchability of a model than hosted access, but still allows functionality to be restricted. Some APIs only allow querying, as with OpenAI’s original GPT-3 release via API \cite{openai_2020}; additional functionality, such as fine-tuning, can be added. As with hosted access, this method applies only to model access. Unreleased system information can sometimes be inferred through the API itself; for example, EleutherAI’s evaluation harness was used to estimate GPT-3 parameter sizes via OpenAI’s API \cite{gao_2021}. This method is favorable for structured access, where research is possible but can still be tracked and users are unlikely to create a “modified version” of the model \cite{https://doi.org/10.48550/arxiv.2201.05159}. Cloud-based access can track users and their activity to monitor for risky behavior, and can better enforce safety controls such as rate limiting.

\subsubsection{Downloadable}
The main distinction between downloadable and fully open systems is that downloadable releases withhold some system components, such as the training dataset. Downloadable systems can also be gated. Downloadability does not guarantee that every user granted access can actually use the model, as a model's size limits who is capable of running it locally: personal and standard consumer hardware is unlikely to support large and powerful models. The infrastructure needed to run large models creates an access barrier, and in response, industry \cite{microsoft_2022} and public initiatives \cite{nairrtf_2021} are creating accessible infrastructure for researchers. Downloadable models better enable robust research, but make potential misuse or harm difficult to track. This method also makes it easier for users to erode or disable safety controls such as content filters.

\subsection{Fully Open}
When all aspects of the system are accessible and downloadable, including all components, the system is fully open. These systems cannot be gated and are by definition fully public. For the purposes of this framework, a basic level of accessibility and documentation across components qualifies a system release as “fully open”, though releases may differ in documentation detail and granularity. The most prominent fully open systems were developed by organizations founded on the principle of openness. EleutherAI, a decentralized collective that also prioritizes transparency, has released all system components, as seen in its GPT-J \cite{mesh-transformer-jax} and GPT-Neo \cite{gpt-neox-20b} language models and the Pile dataset \cite{DBLP:journals/corr/abs-2101-00027}. The BigScience global research community of over 1000 researchers developed the BLOOM language model in the open \cite{https://doi.org/10.48550/arxiv.2211.05100}. Over 30 working groups covered aspects ranging from dataset creation to carbon footprint to modeling approach, with the aim of creating a multilingual system transparently \cite{bigscience-2022-bigscience}. While openness enables broader research that can engage many people, it can also enable dangerous uses and the creation of harmful models \cite{vee_2022}, and controls can be difficult to enforce.

\section{Trends in System Releases}
We analyze release trends across prominent base generative AI systems, excluding fine-tuned or updated systems such as models that undergo reinforcement learning with human feedback. These figures are based on tracking and evaluation initiatives \cite{LiaoModelTracker2022, https://doi.org/10.48550/arxiv.2211.09110, talat-etal-2022-reap}; they are not exhaustive and are intended to show release trends over time.

\subsection{Timelines for Large Language Models}
When systems are examined by their original release method over time, the trends in Figure \ref{fig:lm_release} show that closing or limiting language model access has become more common since GPT-2’s staged release. Language models with fewer than six billion parameters have generally been toward the open end of the gradient, while more powerful models, especially from large companies, tend to be closed. This may be because such models require deeper consideration and safeguards given their risk potential, but Figure \ref{fig:lm_release} also illustrates how many large companies are able to develop and close language models.


\begin{figure}[h]
  \centering
  \includegraphics[width=\linewidth]{images/LM_Release.pdf}
  \caption{Language Model Release Method By Parameter Count Over Time}
  \label{fig:lm_release}
\end{figure}

\pagebreak
\subsection{Timelines for All Modalities}
As more generative modalities are developed, from image to audio to video, they face similar release decision challenges. 

\begin{figure}[t]
  \centering
  \includegraphics[width=.92\linewidth]{images/MAIN_Long_Release_Timeline.pdf}
  \caption{Release Methods Over Time (All Modalities)}
  \label{fig:all_modalities}
\end{figure}

Figure \ref{fig:all_modalities} shows system releases across all modalities over the same time period. As there is no standard means of comparing capabilities across modalities, systems of all capability levels are placed equally. Again, trends show openness until GPT-2's staged release. This timeline also shows a sharp increase in the number of systems developed, and closed, after 2021. The systems most commonly toward the open end of the gradient are developed by smaller organizations founded with the intent to be open.

Conversely, many systems from large companies\footnote{See Appendix \ref{Appendix:A} for logo and developer key} are becoming closed or have closed components. OpenAI most often restricts access without fully closing or fully opening it, while the Alphabet companies Google and DeepMind account for the most closed systems. Across modalities, large companies have steered toward closedness. The open initiatives from large companies shown here release a downloadable model trained on public datasets crafted by other organizations, as with Meta’s OPT-175B \cite{https://doi.org/10.48550/arxiv.2205.01068}. It is unclear at this time whether movements toward openness will pressure historically closed organizations to adjust their release strategies.


\section{Safety Controls and Guardrails}
Controls and guardrails, largely from developer and deploying organizations but also from external researchers, can complement each other to address the above considerations and risks. Many of these methods are pioneered and honed in research environments and outside developer organizations. No single control is a panacea. While controls and guardrails can be added long after deployment, they are most effective when deployed simultaneously with system release.

\subsection{Documentation and Transparency}
Structured documentation that clearly communicates critical information about each component of the system gives further insight into the system and can take many forms. Proposed documentation approaches at the dataset and model levels have been widely adopted: even without enforcement mechanisms, many releases across AI companies include some form of this documentation. \textit{Datasheets for datasets} \cite{DBLP:journals/corr/abs-1803-09010} communicate aspects of datasets such as creators’ motivations, collection process, and overall composition; Meta’s OPT-175B release included a datasheet in its appendix \cite{https://doi.org/10.48550/arxiv.2205.01068}. \textit{Data statements for natural language processing} \cite{bender-friedman-2018-data} are another popular tool, tailored to language-based systems, and were used for the bias measurement dataset CrowS-Pairs \cite{DBLP:journals/corr/abs-2010-00133}. \textit{Model cards} \cite{DBLP:journals/corr/abs-1810-03993} have been popular, as seen in Google’s PaLM \cite{https://doi.org/10.48550/arxiv.2204.02311}, OpenAI’s GPT-2 \cite{openai_2019} and GPT-3 \cite{openai_gpt3}, and Runway Research and Stability AI’s Stable Diffusion \cite{Rombach_2022_CVPR}. Model cards are deployed across Hugging Face’s platform and have evolved to be interactive \cite{Crisan_2022}. \textit{System cards} \cite{Procope_2022} blend datasheets and model cards and have been used for DALL$\cdot$E 2 \cite{mishkin2022risks}.

\subsection{Technical Tools}
Technical tools can address specific technical safety concerns, but cannot be a substitute for addressing complex societal problems. In some cases, technical tools can create new social harms and should therefore be vetted and combined with other guardrails.

\subsubsection{Rate Limiting}
Restricting the number of outputs a user can generate via cloud-based access is a popular means of preventing attacks and harmful generations. Rate limiting also helps a system perform well and protects the underlying infrastructure from being overloaded. This defensive measure can be enforced with common strategies such as a token bucket, which tracks and limits usage according to a set number of tokens that refill, and can accumulate up to a cap, over a predetermined time frame. For example, OpenAI’s public API for DALL$\cdot$E 2 rate limits external users \cite{openai_rate}. Limits can be adapted for users whose applications have been cleared as safe.
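A minimal Python sketch of the token-bucket strategy follows, assuming one token per request and a continuous refill rate. The class name, capacity, and refill values are hypothetical and do not reflect any provider's actual limits; an injectable clock is used so the behavior can be checked deterministically.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket rate limiter: each request spends tokens, and tokens
    refill continuously at `refill_rate` per second up to `capacity`."""

    def __init__(self, capacity: int, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = float(capacity)  # start full, allowing an initial burst
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Accumulate tokens for the elapsed time, never exceeding capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # a hosted API might answer HTTP 429 (Too Many Requests) here


# Usage with a fake clock: burst of 5, then one token every 2 seconds.
fake_now = [0.0]
bucket = TokenBucket(capacity=5, refill_rate=0.5, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(6)])  # [True, True, True, True, True, False]
fake_now[0] += 2.0
print(bucket.allow())                      # True
```

A deployer could keep one bucket per API key and give larger capacities or refill rates to users whose applications have been cleared as safe.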

\subsubsection{Safety and Content Filters}
Filters that trigger blank responses when given an unsafe input are widely deployed across varying levels of access and can help block illegal and egregious content. Developers selecting the trigger categories must make normative judgments about which input content should block generation. Stable Diffusion’s safety filter was found to primarily prevent generations with sexually explicit content but not violence and gore \cite{https://doi.org/10.48550/arxiv.2210.04610}, reflecting a normative judgment about the safety of both categories in generated images. Blocking generations on socially sensitive topics can result in entire identity groups being blocked; lessons from social media platform content moderation highlight harms such as community erasure, especially among marginalized groups \cite{10.1145/3479610}.
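The sketch below illustrates the blank-response pattern in Python. Keyword matching stands in for the learned classifiers that deployed filters typically use, and the categories and terms in \texttt{BLOCKED\_TERMS} are hypothetical; the point is that the contents of that table encode the normative judgment discussed above.

```python
from typing import Callable

# Hypothetical blocklist. Which categories appear here, and which terms map to
# them, is a normative choice by the developer rather than a technical given.
BLOCKED_TERMS: dict[str, set[str]] = {
    "sexual": {"explicit", "nsfw"},
    "violence": {"gore"},
}


def flagged_categories(prompt: str) -> set[str]:
    """Return the blocked categories whose terms appear as words in the prompt."""
    words = set(prompt.lower().split())
    return {category for category, terms in BLOCKED_TERMS.items() if words & terms}


def filtered_generate(prompt: str, generate: Callable[[str], str]) -> str:
    """Return a blank response for flagged inputs; otherwise call the model."""
    if flagged_categories(prompt):
        return ""
    return generate(prompt)


def echo_model(prompt: str) -> str:
    return f"<generation for: {prompt}>"  # stand-in for a real model


print(repr(filtered_generate("a watercolor of a lighthouse", echo_model)))
print(repr(filtered_generate("a scene full of gore", echo_model)))  # ''
```

Even this toy list over-blocks: a benign prompt such as ``explicit instructions for assembling a desk'' is also refused, a small-scale version of the erasure problem described above.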

\subsubsection{Detection Models}
Methods to detect AI-generated outputs vary and include human detection, but detection models can be a helpful tool. While the human eye alone can often detect outputs from less powerful systems, such as AI-generated images from Craiyon, detection models can achieve higher accuracy on outputs from more powerful systems. This is particularly important when models are deployed in high-stakes settings \cite{kreps_mccain_brundage_2022}. As output quality improves, distinguishing generated from human-made content becomes more difficult for both humans and detection models. Approaches to detection can be tailored to modality \cite{DBLP:journals/corr/abs-2011-01314}, such as text \cite{gehrmann-etal-2019-gltr}, and can include human annotation \cite{DBLP:journals/corr/abs-2107-01294}. Detection models can also differ by type of generation within a modality, such as facial generations \cite{Gragnaniello2022}.

\subsubsection{Hardcoding Responses}
Predetermined safe outputs triggered for a given input can be hardcoded into a model interface. This can aid legal compliance or provide standardized responses for high-risk inputs. Similar to filtering, determining trigger inputs or trigger categories requires normative judgements about what constitutes unsafe inputs and what constitutes appropriate outputs. This can not only lead to community erasure, but also impose these normative beliefs onto users.
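As a minimal Python sketch, hardcoded responses can be implemented as a lookup that runs before the model is called. The trigger phrases, categories, and response strings below are hypothetical placeholders; choosing them is exactly the normative judgment described above.

```python
from typing import Callable

# Hypothetical trigger phrases and standardized responses. Selecting both is a
# normative judgment about what counts as high-risk input and appropriate output.
TRIGGERS: dict[str, tuple[str, ...]] = {
    "medical_dosage": ("how many pills", "maximum dose"),
    "legal_advice": ("should i sue", "is it legal to"),
}
HARDCODED_RESPONSES: dict[str, str] = {
    "medical_dosage": "I can't give dosage advice. Please consult a pharmacist or doctor.",
    "legal_advice": "I can't give legal advice. Please consult a qualified lawyer.",
}


def respond(prompt: str, generate: Callable[[str], str]) -> str:
    """Return a predetermined response if a trigger matches; else call the model."""
    text = prompt.lower()
    for category, phrases in TRIGGERS.items():
        if any(phrase in text for phrase in phrases):
            return HARDCODED_RESPONSES[category]  # the model is never called
    return generate(prompt)


print(respond("What is the maximum dose of ibuprofen?", lambda p: "model output"))
```

Because the match happens before generation, the response is identical for every user who hits a trigger, which aids compliance but also applies the deployer's judgment uniformly to everyone.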

\subsubsection{Watermarking}
The concept of digital watermarking for media can be transferred to AI systems to protect against model theft, protect IP, and more easily identify AI-generated outputs. Encoding a unique identifier in generated outputs can help detect media as AI-generated and trace the output to a specific model. Research strives to ensure these watermarks are invisible to the human eye \cite{DBLP:journals/corr/abs-1909-01285}, do not affect output quality \cite{https://doi.org/10.48550/arxiv.2301.10226}, and are robust to model attacks and alterations such as fine-tuning, via methods such as embedding noise as watermarks \cite{10.1145/3196494.3196550}. Different watermarking approaches serve different needs, from embedding methods that make it easy to determine whether an output is synthetic to linking watermarks to a model owner’s identity for authentication \cite{10.3389/fdata.2021.729663}. There are currently no prominent successful case studies, as watermarking has not yet been publicly deployed at scale for large generative systems.
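To illustrate how detection of a text watermark can work, the Python sketch below follows the general idea of the green-list watermark for language models \cite{https://doi.org/10.48550/arxiv.2301.10226}: generation is biased toward a pseudorandom ``green'' subset of tokens seeded by the previous token and a secret key, and detection counts the $|s|_G$ green tokens among $T$ scored tokens and computes the one-proportion z-score $z = (|s|_G - \gamma T)/\sqrt{T\gamma(1-\gamma)}$. This is a simplified sketch under several assumptions: tokens are integers, each token is independently green with probability $\gamma$ via SHA-256 hashing rather than through a fixed vocabulary partition, and \texttt{toy\_watermarked\_sequence} is a hypothetical stand-in for a real language model.

```python
import hashlib
import math
import random


def is_green(prev_token: int, token: int, gamma: float = 0.5, key: str = "secret") -> bool:
    """Pseudorandomly mark `token` as green given the previous token and a
    secret key; roughly a `gamma` fraction of tokens are green."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def toy_watermarked_sequence(length: int, vocab_size: int = 1000, gamma: float = 0.5,
                             key: str = "secret", seed: int = 0) -> list[int]:
    """Stand-in for a language model: at each step, propose random candidate
    tokens and emit the first green one (a 'hard' green-list rule)."""
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab_size)]
    while len(tokens) < length:
        candidates = [rng.randrange(vocab_size) for _ in range(20)]
        green = [c for c in candidates if is_green(tokens[-1], c, gamma, key)]
        tokens.append(green[0] if green else candidates[0])
    return tokens


def watermark_z_score(tokens: list[int], gamma: float = 0.5, key: str = "secret") -> float:
    """One-proportion z-test: how far the green count exceeds the gamma * T
    expected if the text were produced without the watermark."""
    t = len(tokens) - 1  # number of scored tokens (each needs a predecessor)
    if t < 1:
        raise ValueError("need at least two tokens")
    green = sum(is_green(a, b, gamma, key) for a, b in zip(tokens, tokens[1:]))
    return (green - gamma * t) / math.sqrt(t * gamma * (1 - gamma))


watermarked = toy_watermarked_sequence(200)
rng = random.Random(1)
unwatermarked = [rng.randrange(1000) for _ in range(200)]
print(round(watermark_z_score(watermarked), 1))    # strongly positive: nearly all tokens green
print(round(watermark_z_score(unwatermarked), 1))  # small: about half green by chance
```

Without the key, the green assignments look random, so scoring the same watermarked text with a different key yields no meaningful signal; this is what lets the key holder, and only the key holder, check outputs against a specific model.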


\subsubsection{Model Weight Encryption}
Encryption can protect model weights, often against model stealing and to safeguard IP, so that only an authorized user with the key can use the model. The NN-Lock scheme proposed in \cite{10.1145/3505634} does not change model structure, so as not to adversely affect model performance. \cite{10.5555/3437539.3437711} proposed an obfuscation framework that authorizes only users with a trustworthy hardware device. \cite{DBLP:journals/corr/abs-2011-13564} notes that many existing IP protection methods are not robust to model attacks and are unsuitable for commercial purposes, as they verify model ownership but not user identity.

\subsubsection{Updating, Adapting, or Retraining Models}
Models can be adapted to mitigate risk. Popular methods include fine-tuning, for example, fine-tuning GPT-3 on values-targeted datasets \cite{solaiman2021process} or fine-tuning LaMDA on annotated data to improve factual grounding \cite{https://doi.org/10.48550/arxiv.2201.08239}. Another method is reinforcement learning from human feedback, as seen with InstructGPT \cite{https://doi.org/10.48550/arxiv.2203.02155} and its open-source replication effort at CarperAI \cite{carperai_2022}. These methods produce new models that differ from their base models but are often improved along a safety dimension.

\subsection{Community and Platform Efforts}
Community-driven approaches to risk mitigation leverage new and varied viewpoints. Bounty programs, from bug bounties to bias bounties \cite{rubinovitz_2018}, can surface unforeseen safety issues and strengthen trust in a system \cite{https://doi.org/10.48550/arxiv.2004.07213}. \textit{Bias bounties} by nature benefit from diverse perspectives. \textit{Community moderation}, \textit{community-based content flagging}, and \textit{naming and shaming techniques} on a platform enable users to identify and stop harmful content before it escalates. \textit{Monitoring and logging user inputs} on the backend helps track trends in harmful or extremist behavior.

\subsection{Organizational and Platform Policies}
Organizational and platform policies can guide and enforce safe human interaction with generative AI systems. These policies can have drawbacks: they may protect from harm but also limit beneficial uses. For example, limiting access in a region under active war can prevent disinformation generation but also prevents general access. \textit{Internal risk policies} should define which considerations must be weighed and how to evaluate each before release options are determined. If the system is deployed on a platform or a given interface, \textit{a code of conduct} for engaging with the platform and other users can prevent direct harm on the platform, with violators risking loss of platform access. \textit{Mandating user accounts} on a platform helps track specific users and their activity, which supports community and platform efforts. \textit{Sharing policies} that outline what can and cannot be posted on other platforms or used beyond personal use can prevent harmful content from spreading and inciting further harmful content.

\subsection{Legal Recourse}
Legal measures such as licenses provide an enforceable control when a user uses a system in a way the deployer has prohibited. The Responsible AI License (RAIL) places behavioral use conditions on a model, with the model owner holding the license and bearing responsibility for pursuing enforcement if needed \cite{DBLP:journals/corr/abs-2011-03116}. Both BigScience’s BLOOM model \cite{bigscience_2022} and Runway Research and Stability AI’s Stable Diffusion \cite{rombach_esser_2022} use RAILs. Licenses are difficult to enforce for downloadable or fully open systems, as model behavior and uses cannot be fully monitored. Legal enforcement can also be costly in both time and financial resources. Example case studies are examined in \cite{DBLP:journals/corr/abs-2011-03116}.


\section{Necessary Investments for Responsible Release}
Developers and researchers must listen to and leverage multidisciplinary and often external expertise, especially for guardrails. Policymakers must mandate safety where technically feasible and provide resources for the under-resourced. Regardless of level of access, generative AI systems capture, reflect, and amplify aspects of society that require multiple perspectives in exploratory and risk control research. Since a system cannot be fully safe or unbiased for all groups of people, and there is no clear standard for when a system is safe for broad public release, further discourse across all affected parties is needed. Research and decisions made now will inform considerations for increasingly powerful systems across modalities, making early investment crucial.

\subsection{Accessible Interfaces and Low- and No-Code Tools}
To make generative systems accessible to the many peoples they affect, means of interacting with a system, such as a model demo, are needed. A clean, easy-to-use interface that accommodates disabilities and all levels of technical comfort significantly improves accessibility. This step toward further openness can push a system toward the more open end of the gradient, with less risk control but broader red-teaming. Accessible interfaces with low-barrier sharing can also better enable cross-field collaborations \cite{DBLP:journals/corr/abs-1906-02569}. Large-scale probing can reveal flaws, as seen with Meta’s Galactica language model, which was released with a demo. The demo was taken down within three days after the public identified risks such as disinformation generation \cite{heaven_2022}. Both computer science training and low- to no-code interfaces are necessary to streamline sociotechnical research. Moral experiments show that approaches to ethical problems vary by background and culture \cite{1334156}; these perspectives are urgently needed in building, evaluating, and deploying new AI systems. Effective design and user interfaces must be optimized for experts outside of computer science \cite{https://doi.org/10.48550/arxiv.1907.04446}.

\subsection{Closing Resource Gaps}
Resource gaps between major labs, other research groups, and academia have widened \cite{benaich_hogarth_2022}. Beyond preventing groups from developing systems at the same level of performance, these gaps also hinder their ability to build and run exploratory research projects. Monetary, infrastructure, and sometimes skills limitations can especially bar underrepresented groups from contributing to understanding and mitigating risks. Public sector investment at national \cite{nairrtf_2021} and global levels can start to bridge this gap. Grants from developer labs can also sponsor third-party research, but they should include mechanisms that explicitly allow critical research. Infrastructure grants for computing clusters can enable smaller research groups to engage with powerful systems. Skill-building requires longer-term investment.

\subsection{Technical and Practical Ethics Training}
Increasing access for social scientists and the many multidisciplinary experts underrepresented in the AI research community is insufficient on its own. Technical barriers to evaluating AI systems and to improving them or mitigating their harms can slow or hinder critical research. Conversely, the lack of practical ethics and science and technology studies (STS) training among technical professionals prevents societal guardrails from being thoughtfully integrated from project conception through the development process. Training must begin early in education: academic courses and curricula in computer science must integrate social and ethical considerations, and social sciences geared toward examining AI systems must foster technical understanding.

\subsection{Expert Foresight}
Experts in relevant disciplines should be included while relative risk is low. As generative AI systems pose higher risk in specific applications and fields, such as disinformation and medical advice, experts in the corresponding fields should be brought in to begin foundational work on mitigating that risk. Given the rapid pace of AI development, substantial research should be planned in advance so that detection and mitigation do not continue to trail capability advances. For example, research on the radicalization risks of language models using GPT-2 and GPT-3 shows that GPT-3 has significantly higher potential for generating extremist text \cite{DBLP:journals/corr/abs-2009-06807}. Starting this research with less powerful systems can better inform future mitigation efforts.

\subsection{Multidisciplinary Discourse}
Expanding access for social scientists and other multidisciplinary experts underrepresented in this discussion is likewise insufficient on its own. Critically, actors in this space must have some incentive to engage in frequent community discussion and must be held accountable to their commitments for safe releases. Google’s public position on responsible AI practices encourages in-house risk evaluation and mitigation, but conflicts of interest can leave internal critics unable to share or publish findings \cite{10.1145/3514094.3534181} or lead to their dismissal \cite{ebell_2021}. Such initiatives can also serve as an industry argument for self-regulation while ultimately lacking external accountability. A new third-party convening body can help facilitate this discourse. Instead of relying on existing fora, these conversations can take lessons from social and abolitionist movements on how to include underrepresented and affected communities \cite{stark_2021}.

\subsection{Enforcement Mechanisms for Unsafe Release}
Ultimately, all actors involved in the release of a powerful AI system must have some incentive to conduct releases safely. But enforcing responsible release requires a definition of what constitutes responsible release. Responsible release is distinct from safe release; it can emphasize meeting all enforceable safety guardrails before and after release. Regulation can mandate system documentation and auditing for high-risk or high-impact releases. Updatable policies can recommend specific risk controls and guardrails, and policy bodies can better fund risk research and the development of further evaluations and controls.

\section{Conclusion}
The gradient of generative AI system release shows the complexity and tradeoffs of any one option. Releases must weigh concentration of power against AI risks while also considering the precedent they set for future releases as system capabilities increase. Developers and deployers, regardless of their preferred release method, must engage multidisciplinary experts and the AI community to better form norms for safe release. Existing and evolving risk controls and guardrails, which require action from developers, deployers, researchers, and policymakers, can mitigate some foreseeable harms, but long-term investments in disciplines and discourse across the AI community and among affected peoples are necessary.

\begin{ack}
 Thank you to Hugging Face for funding this research. 

 Thank you to Joshua Achiam, Stella Biderman, Miles Brundage, Clémentine Fourrier, Yacine Jernite, Margaret Mitchell, Percy Liang, and Sonja Schmer-Galunder for their thoughtful feedback on earlier versions of this paper.

 Thank you to Johann Christensen for testing figures.

\end{ack}

