\section{Introduction}

Level design is a core part of what defines a video game. When executed well, it is a major determinant of player experience. From the designer's perspective, level design can be either a tedious but necessary step in the game's development or a creatively freeing process, and sometimes it is both. Most levels are designed to teach the game's interactable space (its mechanics) to the player in a way that is, ideally, engaging, fun, visually pleasing, intuitive, and informative \cite{rogers2014level,koster2013theory,green2017press}.  
Levels designed for tutorial sections of the game create simple, low-risk environments. These levels are direct and sometimes overt in their intention so that the player can grasp the core mechanics of the game as quickly as possible. As the player becomes more familiar with the mechanics and how they work together in the game's system, the levels should increase in complexity and challenge. In turn, the design of these later levels needs to become correspondingly complex and engaging, both visually and functionally~\cite{khalifa2019intentional,anthropy2014game}.


Most games demonstrate each mechanic at least once throughout the entire level space, combining and ordering the mechanics so that the progression builds on the player's current skill as they become more familiar with the game \cite{totten_2016,anthropy2014game}. However, designing levels that explore many combinations of mechanics is an arduous task for a level designer. While a player is unlikely to play all of these levels, creating this possibility space would allow the level designer to hand-pick levels and order them so that the mechanics feel coherent. Furthermore, an adaptive game with a diverse set of levels would allow the player to explore different combinations of mechanics at their own pace and according to their own preference. For example, if a player is having difficulty with a necessary mechanic, such as long jumps in most Super Mario games or the spin attack in most Legend of Zelda games, a subset of levels focused on that mechanic would help the player develop their skill more effectively than a single tutorial level~\cite{anthropy2014game}. Conversely, if a player has fully mastered a mechanic, the game could select a level that uses the mechanic in a more challenging situation, or a level built around an entirely different set of mechanics for them to master next. Levels could be selected automatically from the generated level space and ordered dynamically to adapt to the player's current skill level, as opposed to following a hard-coded level ordering~\cite{yannakakis2011experience}. Generating a diverse set of levels that explore many mechanic combinations would save time in the design process and allow for a degree of creative flexibility that manually designed levels alone could not provide. 
 

  

This paper presents a system that combines human design and AI-driven design to enable mixed-initiative, collaborative game level creation. Users can start from a blank slate, add their own edits, and then have an AI back-end evolve the level towards a pre-defined objective. This objective function can reward minimalism in design, maximize coverage of game mechanics, target overall quality, or capture any other feature that could contribute to the quality of the level. Alternatively, users may begin from one of a variety of AI suggestions and pre-generated samples and then make changes as necessary. This process is not limited to initializing the level: the user and the AI system can switch roles as designers at any point during creation. Meanwhile, the AI system tracks what previous users have created and submitted and asks new users to design levels that complement the existing collection. With this design process, the mechanic space of a game can be systematically explored, with the aim of representing every combination of mechanics by at least one level. With a human-based rating system, the automated system can learn to design higher-quality levels, and human users can design the levels that are still missing from the space of mechanic combinations. 

This project demonstrates the mixed-initiative collaborative process through level design for `Baba is You', an independent Sokoban-like game whose mechanics are defined and modified by the level design itself and by the player's interaction with it. Levels can be made by users, by the AI, or by a combination of both, and uploaded to the level database to be used in future creations and to improve the quality of the AI's objective function.






\subsection{Baba is Y'all v1 (prototype)}

\begin{figure}[ht]
    \centering
    \includegraphics[width=0.8\linewidth]{imgs/level_matrix.png}
    \caption{Baba is Y'all Version 1 Main Screen (from April 2020)}
    \label{fig:levelMat1}
\end{figure}
The first version of Baba is Y'all (BiY v1) was officially released on March 29th, 2020, and promoted chiefly on Twitter. This version served as a prototype and proof-of-concept system for mixed-initiative, AI-assisted collaboration on game content, specifically for designing levels for the game `Baba is You' (Arvi ``Hempuli'' Teikari, 2017).

This system was built on concepts from three different areas of content creation:
\begin{itemize}
    \item \textbf{Crowdsourcing:} a model used by different systems that allows a large set of users to contribute toward a common goal provided by the system~\cite{brabham2013crowdsourcing}. For example, Wikipedia users contribute by filling in missing information on particular topics.
    \item \textbf{User content creation:} allows players to create levels for a game or system and upload them to an online level database for other players to play and enjoy, e.g. Super Mario Maker (Nintendo, 2015), Line Rider (inXile Entertainment, 2006), and LittleBigPlanet (Media Molecule, 2008).
    \item \textbf{Quality diversity:} the underlying technique behind our system. It ensures that the levels made by combining the first two concepts are both of good quality and diverse with respect to the feature space they occupy~\cite{pugh2016quality}. For this system, the feature space is defined by the potential game mechanics implemented in each level.
\end{itemize}

The Baba is Y'all website (shown in figure~\ref{fig:levelMat1}) was a prototype of a mixed-initiative collaborative level design system. However, the site was limited by the steep learning curve required to interact with it~\cite{charity2020baba}. Its features were overwhelming to use and lacked cohesion, which made the site difficult to navigate.

\subsection{Baba is Y'all v2 (updated release)}

\begin{figure}[ht]
    \centering
    \includegraphics[width=0.8\linewidth]{imgs/dark_main.png}
    \caption{Baba is Y'all Version 2 Main Screen (as of September 2021)}
    \label{fig:levelMat2}
\end{figure}


The second version of Baba is Y'all\footnote{http://equius.gil.engineering.nyu.edu/} (BiY v2) was released on May 27th, 2021, and was designed to be more user-friendly. It was similarly promoted via Twitter and on mailing lists. This version has a cleaner, more compact, and more fluid user interface across the entire website and consolidates many of the separate features of the BiY v1 site onto fewer pages for easier access. Three main webpages were created for this updated system.

Unlike the previous version, which showed all of the mechanic combination levels (both those in the database and those not yet made) in random order, the updated level selection page adds tabs that separate levels into recently added (New), highest rated (Top), and levels with rule combinations that have not been made yet (Unmade). A carousel shows 9 levels at a time to avoid overwhelming the player with choices (as shown in figure~\ref{fig:levelMat2}). The level rating system and the search feature are also included on the main page as tabs. The personal level selection tab allows users to see their previously submitted levels and to log in to their account to submit levels with their username as the author or co-author.

The updated level editing page consolidates manual user editing and PCG level evolution onto one page. Users can easily switch between editing the level themselves and letting the PCG back-end edit it, pausing the evolution in between. Users can also select rule objectives for the system to evolve towards. To combat blank-canvas paralysis, users can start from a set of different types of levels, both PCG and user-made~\cite{krall2012artist}. Once a level has been solved, users may name it upon submission, further personalizing the level and assigning authorship.

A slideshow tutorial, which replaces the walkthrough video featured on BiY v1, describes every feature and function of the site. Users can also play a demo version of `Baba is You' (Arvi ``Hempuli'' Teikari, 2017) to familiarize themselves with the game's mechanics and rule space and with how these interact (the game dynamics). For quick assistance, a helper tool on the level editing page serves as a refresher on how to use the editor.

In addition to updating the features and collecting more data about the levels created, we conducted a formal user study with 76 participants to gather information about which features they chose to use for their level creation process and their subjective opinion on using the site overall. This user study, as well as the general level statistics collected from the site's database, showed that our new interface better facilitated the user-AI collaborative experience to create more diverse levels.

\section{Background and Related Work}

The Baba is Y'all system uses the following methods in the collaborative level design process: procedural content generation, so that the AI back-end can create new levels; quality diversity, to maintain the different kinds of levels produced by the system and to show the coverage of game mechanics across levels; crowdsourcing, so that the AI can learn to create new levels from previously submitted ``valid'' levels, whether made exclusively by users, by the system itself, or by a combination of both; and mixed-initiative AI, so that the user and the evolutionary algorithm can develop a level together. Each method is described below.

\subsection{Procedural Content Generation}
Procedural content generation (PCG) is the process of using a computer program to create content with limited or indirect user input \cite{shaker2016procedural}. Such methods can make content creation automated, faster, and more efficient, and can also enable aesthetics based on generation itself. PCG has been used in games ranging from Rogue (1980) to its descendant genre, the roguelikes, including Spelunky (Mossmouth, LLC, 2008) and Hades (Supergiant Games, 2020), as well as games built around level and world generation such as Minecraft (Mojang, 2011) and No Man's Sky (Hello Games, 2016). PCG can be used to build levels, as in The Binding of Isaac (Edmund McMillen, 2011); enemy encounters, as in Phoenix HD (Firi Games, 2011); or items and weapons, as in Borderlands (Gearbox Software, 2009). In academia, PCG has been explored for many different game facets, generating assets \cite{ruela2017procedural, gonzalez2020generating}, mechanics \cite{khalifa2019general, togelius2008experiment,browne2010evolutionary}, levels \cite{snodgrass2016learning,charity2020mech}, boss fights~\cite{siu2016programming}, tutorials~\cite{khalifa2019intentional,green2018atdelfi}, and even other generators \cite{kerssemakers2012procedural,earle2021learning,earle2021illuminating,khalifa2020multi}. 

A plethora of AI methods underpin successful PCG approaches, including evolutionary search \cite{togelius2010search}, supervised and unsupervised learning \cite{summerville2018procedural,liu2021deep}, and reinforcement learning \cite{khalifa2020pcgrl}. These approaches have enabled PCG processes to generate higher-quality, more generalizable, and more diverse content. In the Baba is Y'all system, PCG allows the mutator module to create new `Baba is You' levels.

\subsection{Quality Diversity}
Quality-diversity (QD) search methods are increasingly used by both game researchers and AI researchers \cite{pugh2016quality,gravina2019procedural}. Quality-diversity techniques are search-based techniques that try to generate a set of diverse solutions while maintaining a high level of quality for each solution. A well-known and popular example is MAP-Elites, an evolutionary algorithm that stores its solutions in a multi-dimensional map instead of a population~\cite{mouret2015illuminating}. This map is constructed by dividing the solution space into a grid of cells according to pre-defined behavior characteristics. Each new solution is evaluated both for its fitness and for its behavior characteristics, and is then placed in the corresponding cell of the map. If the cell is already occupied, the two solutions compete and only the fitter one survives. Because of this map structure and per-cell competition, each cell always holds the fittest solution found so far for its characteristics, so over successive iterations MAP-Elites produces a map of diverse, high-quality solutions. 
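A minimal sketch of the MAP-Elites loop described above, written in Python for illustration only; it is not the Baba is Y'all implementation. The function \texttt{map\_elites} and its parameters are hypothetical: the behavior characteristic is whatever hashable value \texttt{characterize} returns, and this sketch treats higher fitness as better.

```python
import random
from typing import Callable, Dict, Hashable, Tuple, TypeVar

S = TypeVar("S")


def map_elites(
    random_solution: Callable[[], S],
    mutate: Callable[[S], S],
    fitness: Callable[[S], float],
    characterize: Callable[[S], Hashable],
    iterations: int,
    n_init: int = 10,
    seed: int = 0,
) -> Dict[Hashable, Tuple[float, S]]:
    """Minimal MAP-Elites: keep one elite per behavior-characteristic cell.

    Higher fitness is treated as better in this sketch.
    """
    rng = random.Random(seed)  # used only for parent selection
    archive: Dict[Hashable, Tuple[float, S]] = {}

    def try_insert(solution: S) -> None:
        cell = characterize(solution)  # which cell the solution belongs to
        score = fitness(solution)
        # An empty cell accepts the solution; an occupied cell keeps the fitter one.
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, solution)

    for _ in range(n_init):
        try_insert(random_solution())
    for _ in range(iterations):
        # Pick a random elite from the map as the parent, mutate it, try to insert the child.
        _, parent = archive[rng.choice(list(archive))]
        try_insert(mutate(parent))
    return archive
```

Storing the map as a dictionary keyed by behavior characteristic, rather than as a dense array, means only occupied cells use memory, which suits sparse spaces such as combinations of game rules.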

The MAP-Elites algorithm has also been extended into Constrained MAP-Elites \cite{khalifa2018talakat, khalifa2019intentional, alvarez2019empowering}, Covariance Matrix Adaptation MAP-Elites (CMA-ME) \cite{fontaine2020covariance}, Monte Carlo Elites~\cite{sfikas2021monte}, and MAP-Elites via Gradient Arborescence~\cite{fontaine2021differentiable}, among others. For this project, we use the Constrained MAP-Elites algorithm to maintain a diverse population of `Baba is You' levels, where the behavior characteristic space of the matrix is defined by the starting and ending rules of a level when it is submitted. 
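The constrained variants cited above keep feasible and infeasible solutions apart within each cell, typically so that infeasible solutions that come close to satisfying the constraints can still serve as stepping stones. The sketch below simplifies this to a single feasible elite (ranked by fitness) and a single infeasible elite (ranked by constraint violation) per cell. The names \texttt{ConstrainedCell} and \texttt{offer}, the \texttt{violation} measure, and the lower-is-better convention are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Hashable, Optional, Tuple

Entry = Tuple[float, Any]  # (score, solution); lower score is better


@dataclass
class ConstrainedCell:
    feasible: Optional[Entry] = None    # best feasible solution, scored by fitness
    infeasible: Optional[Entry] = None  # least-violating infeasible solution


def offer(archive: Dict[Hashable, ConstrainedCell], cell: Hashable,
          solution: Any, fitness: float, violation: float) -> bool:
    """Try to place a solution in its cell; return True if it was kept."""
    slot = archive.setdefault(cell, ConstrainedCell())
    if violation <= 0:
        # Feasible: compete only with this cell's feasible elite.
        if slot.feasible is None or fitness < slot.feasible[0]:
            slot.feasible = (fitness, solution)
            return True
    else:
        # Infeasible: keep whichever solution is closest to satisfying the constraints.
        if slot.infeasible is None or violation < slot.infeasible[0]:
            slot.infeasible = (violation, solution)
            return True
    return False
```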

\subsection{Crowdsourcing data and content}
Relatively few games allow users to submit their own custom creations using the game's engine, as most games do not make their source code available, or even partially accessible, for modifications that add content within the context of the game. Whether through a built-in level editor, as in Super Mario Maker (Nintendo, 2015), LittleBigPlanet (Media Molecule, 2008), or Line Rider (inXile Entertainment, 2006), or through a modding community that alters the source code of notable games such as Skyrim (Bethesda, 2011), Minecraft (Mojang, 2011), or Friday Night Funkin' (Ninjamuffin99, 2020), players can create their own content to enhance their experience and/or share it with others. 

In crowdsourcing, many users contribute data toward a common goal. Some systems, like Wikipedia, rely entirely on content submitted by their user base to provide information to others on a given subject. Other systems, like Amazon Mechanical Turk, crowdsource data collection, such as for research experiments \cite{buhrmester2016amazon}, by outsourcing small tasks to many users for a small wage. An example of a game generator based on crowdsourced data is Barros et al.'s DATA Agent \cite{barros2018killed,green2018data}, which uses crowdsourced open data, such as Wikipedia, to generate point-and-click adventure games.

What differentiates the Baba is Y'all system from other level editing systems and interactive PCG systems is that the site has a central goal: to populate the MAP-Elites matrix with levels that cover all possible rule combinations. Users may freely create the levels they want, but they may also work towards this global goal by making levels with behavior characteristics that have not been made before. The AI back-end encourages participation in this task by keeping track of the empty cells in the MAP-Elites matrix. 
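The bookkeeping behind the Unmade levels can be illustrated with a small sketch. Here a cell is assumed, purely for illustration, to be the set of rules a level covers; the actual indexing of the site's matrix (for example, how starting and ending rules are combined) is not reproduced, and \texttt{candidate\_cells}, \texttt{unmade\_cells}, and the \texttt{max\_rules} cap are hypothetical.

```python
from itertools import combinations
from typing import AbstractSet, FrozenSet, Iterable, List

Cell = FrozenSet[str]  # hypothetical cell key: the set of rules a level covers


def candidate_cells(rules: Iterable[str], max_rules: int) -> List[Cell]:
    """Enumerate every combination of 1..max_rules rules as a potential cell."""
    pool = sorted(set(rules))
    return [frozenset(combo)
            for size in range(1, max_rules + 1)
            for combo in combinations(pool, size)]


def unmade_cells(candidates: Iterable[Cell], filled: AbstractSet[Cell]) -> List[Cell]:
    """Cells that no submitted level occupies yet, i.e. what to ask users for next."""
    return [cell for cell in candidates if cell not in filled]
```

Because the number of candidate cells grows combinatorially with the number of rules, a cap such as \texttt{max\_rules} keeps the enumeration tractable.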


\subsection{Mixed-Initiative AI}
Mixed-initiative AI systems involve the co-creation of content by a human user and an artificially intelligent system~\cite{yannakakis2014mixed}. Previous mixed-initiative systems include selecting from and evolving a population of generated images \cite{secretan2008picbreeder,bontrager2018deep}, composing music \cite{mann2016ai,tokui2000music}, and creating game levels through suggestive feedback \cite{machado2019pitako}. Mixed-initiative and collaborative AI level editors for games have also been explored extensively, through both direct and indirect interaction with the AI back-end \cite{shaker2013ropossum,liapis2013sentient,butler2013mixed,guzdial2018co,zhou2021toward,bhaumik2021lode,alvarez2019empowering,smith2010tanagra,delarosa2021mixed}. 

Since the release of the first Baba is Y'all prototype and paper~\cite{charity2020baba}, mixed-initiative systems have become more widespread in game and AI research. Bhaumik implemented an AI-constrained system in the Lode Encoder level editing tool, which only allows users to edit levels from a set generated by a variational autoencoder, restricting users to a palette provided by the AI back-end~\cite{bhaumik2021lode}. Delarosa used a reinforcement learning agent in a mixed-initiative web app to collaboratively suggest edits to Sokoban levels \cite{delarosa2021mixed}. Zhou used levels created with Morai Maker, an AI-assisted Super Mario level editor, to apply transfer learning to level editing for Zelda \cite{zhou2021toward}. These recent developments examine how human users are affected by their collaboration with AI systems and how that collaboration can be improved, by examining the dimensionality of the QD algorithm, the evolutionary process, or the human-system interaction itself \cite{alvarez2020exploring}. We incorporate these perspectives into this updated iteration of Baba is Y'all and evaluate their effects through a user study.

\section{System Description}
The updated Baba is Y'all site's features were condensed into two main pages to make navigation and level editing easier and more intuitive:
\begin{itemize}
    \item \textbf{The Home Screen:} contains the level matrix \textit{Map Module}, the search page, the \textit{Rating Module} page, and the \textit{User Profile} page. From here, users can also switch the site's visuals between light and dark mode, view the tutorial section or the site stats page by clicking on the Baba and Keke sprites respectively at the top of the page, and create a new level from scratch by clicking one of the `Create New Level' buttons placed on the various subpages. Figure~\ref{fig:levelMat2} shows the starting page of the home screen.
    \item \textbf{The Level Editor Screen:} contains both the \textit{Editor Module} and the \textit{Mutator Module}. Users can also test their levels by playing them themselves or by running the Keke solver, using the Baba and Keke icons at the bottom of the canvas. Figure~\ref{fig:editor_screen} shows the starting page of the level editor screen.
\end{itemize}
In the following subsections, we describe the modules that make up these two main screens. Each module is used on the home screen, the level editor screen, or both.

\subsection{Baba is You}

`Baba is You' (Arvi ``Hempuli'' Teikari, 2019) is a puzzle game in which players manipulate the rules of a level and the properties of its game objects by pushing word blocks around the map with Sokoban-like movement. These dynamically changing rules create interesting exploration spaces both for procedurally generating levels and for solving them. The different combinations of rules can also lead to a large diversity of level types within this space. 


The general rules of `Baba is You' are described in our previous paper~\cite{charity2020baba}. To reiterate, there are three types of rule formats in the game:
\begin{itemize}
    \item \textbf{X-IS-(KEYWORD):} a property rule stating that the game object class `X' has a certain property, such as `WIN', `YOU', or `MOVE'.
    \item \textbf{X-IS-X:} a reflexive rule stating that the game object class `X' cannot be changed into another game object class.
    \item \textbf{X-IS-Y:} a transformative rule changing all game objects of class `X' into game objects of class `Y'.
\end{itemize}
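To make the three formats concrete, the following sketch classifies a rule written as SUBJECT-IS-TARGET. The word lists are small illustrative subsets rather than the system's full sprite set, and \texttt{classify\_rule} is a hypothetical helper.

```python
from enum import Enum

# Illustrative subsets only; the system's full sprite lists are not reproduced here.
OBJECT_WORDS = {"BABA", "FLAG", "ROCK", "WALL", "KEKE"}
PROPERTY_WORDS = {"YOU", "WIN", "PUSH", "STOP", "MOVE", "KILL"}


class RuleType(Enum):
    PROPERTY = "X-IS-(KEYWORD)"
    REFLEXIVE = "X-IS-X"
    TRANSFORMATIVE = "X-IS-Y"


def classify_rule(subject: str, target: str) -> RuleType:
    """Classify a rule written as SUBJECT-IS-TARGET into one of the three formats."""
    if subject not in OBJECT_WORDS:
        raise ValueError(f"rule subject must be an object word, got {subject!r}")
    if target == subject:
        return RuleType.REFLEXIVE       # e.g. BABA-IS-BABA: Baba cannot be transformed
    if target in OBJECT_WORDS:
        return RuleType.TRANSFORMATIVE  # e.g. ROCK-IS-FLAG: every rock becomes a flag
    if target in PROPERTY_WORDS:
        return RuleType.PROPERTY        # e.g. BABA-IS-YOU: the player controls Baba
    raise ValueError(f"unknown rule target {target!r}")
```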


\begin{figure}
    \centering
    \includegraphics[width=0.4\linewidth]{imgs/simple_level.png}
    \caption{An example of a simple `Baba is You' level.} 
    \label{fig:simple_map}
\end{figure}

The game sprites are divided into two main classes: the object class and the keyword class. Sprites in the object class represent the interactable objects in the map as well as the word representing each object. Sprites in the keyword class represent the rules of the level that manipulate the properties of the objects. For example, figure~\ref{fig:simple_map} shows four different object class sprites [BABA (object and corresponding word) and FLAG (object and corresponding word)] and three different keyword class sprites [IS (x2), YOU, and WIN]. These sprites form two rules: `BABA-IS-YOU', which lets the player control all the Baba objects, and `FLAG-IS-WIN', which indicates that reaching any flag object wins the level. The system has a total of 32 different sprites: 11 object class sprites and 21 keyword class sprites. Because the game allows rule manipulation, object classes are arbitrary: they serve only to provide a variety of objects for rules to affect and for aesthetic pleasure. 
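Rules are formed by word sprites lying in a line on the map. The sketch below scans a grid for NOUN-IS-(NOUN or PROPERTY) triples read left to right or top to bottom. The tile encoding (uppercase strings for word sprites, lowercase strings for object sprites, \texttt{.} for empty tiles), the word subsets, and the function \texttt{active\_rules} are assumptions; multi-word constructions are ignored.

```python
from typing import List, Set, Tuple

# Hypothetical tile encoding: uppercase strings are word sprites, lowercase strings
# are object sprites, and "." is an empty tile. Word lists are illustrative subsets.
OBJECT_WORDS = {"BABA", "FLAG", "ROCK", "WALL"}
PROPERTY_WORDS = {"YOU", "WIN", "PUSH", "STOP"}

Rule = Tuple[str, str, str]


def active_rules(grid: List[List[str]]) -> Set[Rule]:
    """Find NOUN-IS-(NOUN|PROPERTY) triples read left-to-right or top-to-bottom."""
    rules: Set[Rule] = set()
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):  # horizontal, then vertical
                r2, c2 = r + 2 * dr, c + 2 * dc
                if r2 >= rows or c2 >= cols:
                    continue  # a three-tile rule would run off the map
                subject, verb, target = grid[r][c], grid[r + dr][c + dc], grid[r2][c2]
                if subject in OBJECT_WORDS and verb == "IS" and (
                        target in OBJECT_WORDS or target in PROPERTY_WORDS):
                    rules.add((subject, verb, target))
    return rules
```

For a hypothetical layout with BABA IS YOU on one row and FLAG IS WIN on another, the function returns exactly those two rules.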


\subsection{Game Module}

The game module is responsible for simulating a `Baba is You' level. It also allows users to test the playability of levels, either by playing through the level themselves or by letting a solver agent attempt to solve it. This component is used on the home screen when a user selects a level to play, and on the editor screen when a user tests their created level.

Because the game rules are dynamic and can be altered by the player at any point in the solution, the system keeps track of all active rules at every state. Once the win condition has been met, the game module records the solution, the active rules at the start of the level, and the active rules when the solution was reached. These properties are saved to be used and interpreted by the Map module (section~\ref{sec:map_module}). 
The active rules serve as the level's characteristic feature representation and are saved as a chromosome in the MAP-Elites matrix.
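A sketch of the record the game module keeps for a solved level, assuming the simulator reports the set of active rules after every move. The class \texttt{SolvedLevelRecord}, the helper \texttt{record\_solution}, and the use of the (start rules, end rules) pair as the map key are illustrative assumptions rather than the system's actual data format.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Sequence, Tuple


@dataclass(frozen=True)
class SolvedLevelRecord:
    solution: Tuple[str, ...]    # sequence of player inputs that wins the level
    start_rules: FrozenSet[str]  # rules active before the first move
    end_rules: FrozenSet[str]    # rules active when the win condition is met

    def characteristic(self) -> Tuple[FrozenSet[str], FrozenSet[str]]:
        """Hypothetical MAP-Elites key built from the start and end rule sets."""
        return (self.start_rules, self.end_rules)


def record_solution(moves: Sequence[str],
                    rules_per_state: Sequence[Iterable[str]]) -> SolvedLevelRecord:
    """rules_per_state[i] holds the active rules after i moves (one more entry than moves)."""
    if len(rules_per_state) != len(moves) + 1:
        raise ValueError("expected one rule set per visited state")
    return SolvedLevelRecord(tuple(moves),
                             frozenset(rules_per_state[0]),
                             frozenset(rules_per_state[-1]))
```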




The game module provides an AI solver called `KEKE' (named after one of the characters traditionally used as an autonomous `NPC' in the game). KEKE uses a greedy best-first tree search algorithm to try to solve the input level. The branching factor is five, one branch per possible player input: move left, move right, move up, move down, and do nothing. The algorithm's heuristic function averages the Manhattan distances from the player objects to three groups of sprites: keyword objects, objects associated with the `WIN' rule, and objects associated with the `PUSH' rule. These groups were chosen for their critical importance to solving a level: winning objects are required to complete the level, keyword objects allow the active rules to be manipulated, and pushable objects can directly and indirectly change the layout of the level map and therefore whether player objects can reach winning objects. The heuristic function is represented by the following equation:
\begin{equation}
    h = (n + w + p) / 3
\end{equation}
where $h$ is the final heuristic value used for placement in the priority queue, $n$ is the minimum Manhattan distance from any player object to the nearest winnable object, $w$ is the minimum Manhattan distance from any player object to the nearest word sprite, and $p$ is the minimum Manhattan distance from any player object to the nearest pushable object.
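The heuristic can be written directly from the definitions of $n$, $w$, and $p$. In this sketch, positions are $(x, y)$ tile coordinates, the function names are hypothetical, and the value used when a group of sprites is absent from the map (\texttt{default}) is an assumption, since that case is not specified here.

```python
from typing import Iterable, Tuple

Pos = Tuple[int, int]  # (x, y) tile coordinates


def manhattan(a: Pos, b: Pos) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def min_distance(players: Iterable[Pos], targets: Iterable[Pos], default: float) -> float:
    """Smallest Manhattan distance from any player object to any target."""
    players, targets = list(players), list(targets)
    if not players or not targets:
        return default  # assumption: scoring of an empty group is not specified
    return min(manhattan(p, t) for p in players for t in targets)


def keke_heuristic(players: Iterable[Pos], win_objects: Iterable[Pos],
                   words: Iterable[Pos], push_objects: Iterable[Pos],
                   default: float = 0.0) -> float:
    players = list(players)  # reused three times below
    n = min_distance(players, win_objects, default)   # distance to a winnable object
    w = min_distance(players, words, default)         # distance to a word sprite
    p = min_distance(players, push_objects, default)  # distance to a pushable object
    return (n + w + p) / 3  # lower values are expanded first
```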

In this version of the system, the agent can run for a maximum of 10,000 iterations and can be stopped at any time. A user may also solve part of the level themselves and let the KEKE solver pick up where they left off to attempt the remainder. This creates a mixed-initiative approach to solving levels in addition to editing them. However, even with this collaborative approach, the solver still has difficulty with levels that have complex solutions, specifically solutions that require back-tracking across the level after a rule has been changed. The solver runs on the client side of the site and is therefore limited by the user's computational resources. Future work will look into reducing the solver's computational cost and into better solving algorithms, such as Monte Carlo Tree Search (MCTS) with reversibility compression~\cite{cook2021monte}.
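The search loop itself can be sketched as greedy best-first search over the five player inputs, capped at 10,000 iterations. Passing the state where the user stopped as \texttt{start} corresponds to KEKE picking up a partially solved level. States are assumed to be hashable snapshots produced by a hypothetical \texttt{step} function; this is an illustration, not the site's client-side solver.

```python
import heapq
from itertools import count
from typing import Callable, Hashable, List, Optional

ACTIONS = ("left", "right", "up", "down", "wait")  # the five possible player inputs


def greedy_best_first(start: Hashable,
                      step: Callable[[Hashable, str], Hashable],
                      is_win: Callable[[Hashable], bool],
                      heuristic: Callable[[Hashable], float],
                      max_iterations: int = 10_000) -> Optional[List[str]]:
    """Return actions leading from start to a winning state, or None if none is found."""
    tiebreak = count()  # keeps heap entries comparable when heuristic values tie
    frontier = [(heuristic(start), next(tiebreak), start, [])]
    visited = {start}
    for _ in range(max_iterations):
        if not frontier:
            return None  # reachable state space exhausted
        _, _, state, path = heapq.heappop(frontier)
        if is_win(state):
            return path
        for action in ACTIONS:
            nxt = step(state, action)
            if nxt in visited:
                continue
            visited.add(nxt)
            heapq.heappush(frontier, (heuristic(nxt), next(tiebreak), nxt, path + [action]))
    return None  # iteration budget exhausted
```

Because the frontier is ordered by the heuristic alone, with path length ignored, the search is cheap per step but is not guaranteed to find the shortest solution, and solutions that first lead away from the targets may only be reached late in the iteration budget.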

\subsection{Editor Module}

\begin{figure}[ht]
    \centering
    \includegraphics[width=0.9\linewidth]{imgs/editor_screen.png}
    \caption{A screenshot of the level editor screen} 
    \label{fig:editor_screen}
\end{figure}

The editor module allows human users to create their own `Baba is You' levels in the same vein as Super Mario Maker (Nintendo, 2015). Figure~\ref{fig:editor_screen} shows the editor window available to the user. The user can place and erase any game sprite or keyword at any location on the map using the provided tools. As a starting point, the user can modify a blank map, a basic map (a map with X-IS-YOU and Y-IS-WIN rules already placed along with X and Y objects), a randomly generated map, or an elite level provided by the Map Module. As in Super Mario Maker, created levels can only be submitted after they have been tested for solvability by the human player or the AI agent. To test a level, the editor module sends the level information to the game module.

This updated version of the site also includes undo and redo features so that users can reverse and reapply their changes. A selection and lasso feature is also available so users can select specific areas of the level and move them to another location. Unlike in the previous version, all tiles are available to the user on the same screen, and the user can seamlessly switch between the editor module and the mutator module, which improves ease of access as well as interactivity and collaboration between the AI system and the user.
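Undo and redo are commonly implemented with two stacks of level snapshots; the sketch below shows that pattern. The class \texttt{EditHistory} is hypothetical, and states are assumed to be immutable snapshots of the level (for example, tuples of row strings).

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class EditHistory(Generic[T]):
    """Two-stack undo/redo over immutable level snapshots."""

    def __init__(self, initial: T) -> None:
        self._undo: List[T] = []
        self._redo: List[T] = []
        self.current = initial

    def apply(self, new_state: T) -> None:
        self._undo.append(self.current)
        self.current = new_state
        self._redo.clear()  # a new edit invalidates the undone branch

    def undo(self) -> T:
        if self._undo:  # nothing happens at the oldest state
            self._redo.append(self.current)
            self.current = self._undo.pop()
        return self.current

    def redo(self) -> T:
        if self._redo:  # nothing happens at the newest state
            self._undo.append(self.current)
            self.current = self._redo.pop()
        return self.current
```

Clearing the redo stack on a new edit discards the undone branch, which is the behavior users usually expect from editors.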

\subsection{Mutator Module}\label{sec:mutator_module}

\begin{figure}[ht]
    \centering
    \includegraphics[width=0.8\linewidth]{imgs/evolver_screen.png}
    \caption{A screenshot of the level evolver page}
    \label{fig:evolver_screen}
\end{figure}

The Mutator module is a procedural content level generator. More specifically, the Baba is Y'all system uses an evolutionary level generator whose fitness function is based on a version of the tile-pattern Kullback-Leibler Divergence algorithm (ETPKLDiv\footnote{https://github.com/amidos2006/ETPKLDiv})~\cite{lucas2019tile}. Figure~\ref{fig:evolver_screen} shows the updated evolver interface. As mentioned in the previous subsection, this version of the mutator module interfaces seamlessly with the other modules, so the user can easily move a level from the editor module to the mutator module and back. 
When a level passes between the editor module and the mutator module, it loses its purely procedurally generated or purely human-designed character and becomes a hybrid of the two, the product of mixed-initiative interaction between the algorithm and the user. 

The evolver interface gives the user several customizations, such as the initialization method, the stopping criteria, pausing the evolution, and manually applying the mutation function. With these features, the user does not change the evolution process itself but instead guides and limits the algorithm towards generating the level they want.

The ETPKLDiv algorithm uses a (1+1) evolution strategy, also known as a hill climber, to increase the similarity between the evolved level and a reference level. The algorithm slides a window of fixed size over both the reference level and the evolved level to compute the probability of each tile configuration (called a tile pattern), and tries to minimize the Kullback-Leibler Divergence between the two probability distributions.
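A sketch of the tile-pattern comparison: count every $3\times3$ window in each level, turn the counts into probability distributions, and compute the divergence between them. This sketch computes $D_{\mathrm{KL}}(P_{\mathrm{ref}} \,\|\, P_{\mathrm{cand}})$ in one direction, with additive smoothing $\epsilon$ so that patterns missing from one level do not produce undefined logarithms. The direction, the smoothing, the one-character-per-tile encoding, and the function names are assumptions and may differ from ETPKLDiv.

```python
import math
from collections import Counter
from typing import List

Grid = List[str]  # each string is a row; each character is one tile


def tile_patterns(grid: Grid, size: int = 3) -> Counter:
    """Count every size x size window (tile pattern) in the level."""
    rows, cols = len(grid), len(grid[0])
    counts: Counter = Counter()
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            counts[tuple(row[c:c + size] for row in grid[r:r + size])] += 1
    return counts


def pattern_kl(reference: Grid, candidate: Grid, size: int = 3, eps: float = 1e-6) -> float:
    """Smoothed KL divergence from the reference pattern distribution to the candidate's."""
    p_counts = tile_patterns(reference, size)
    q_counts = tile_patterns(candidate, size)
    keys = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + eps * len(keys)
    q_total = sum(q_counts.values()) + eps * len(keys)
    kl = 0.0
    for k in keys:
        p = (p_counts[k] + eps) / p_total  # missing patterns get a small nonzero mass
        q = (q_counts[k] + eps) / q_total
        kl += p * math.log(p / q)
    return kl
```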

Like Lucas and Volz, we use a $3\times3$ window for the tile patterns. This increases the probability of generating initial rules for a level, since rules in `Baba is You' are made up of 3 tiles. However, we use a (2+2) evolution strategy instead of the (1+1) strategy to allow slightly more diversity in the population~\cite{lucas2019tile}. We also modified the fitness function so that it can compare a level against more than one reference level. The fitness value also includes the potential solvability of the level ($p$), the ratio of empty tiles ($s$), and the ratio of useless sprites ($u$).  
The final fitness equation for a level is as follows:
\begin{equation}\label{eq:fitness}
    fitness_{new} = \min(fitness_{old}) + u + p + 0.1 \cdot s
\end{equation}
where $fitness_{old}$ is the Kullback-Leibler Divergence fitness from the work of Lucas and Volz~\cite{lucas2019tile}, computed against a reference level. The minimum operator is added because we use multiple reference levels instead of one and want the fitness with respect to the most similar reference level.
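The (2+2) evolution strategy can be sketched as a generic $(\mu+\lambda)$ loop that minimizes fitness, in which parents and offspring compete for survival; setting $\mu=\lambda=1$ recovers the (1+1) hill climber. The function \texttt{plus\_strategy} and the uniform random choice of parents are assumptions for illustration.

```python
import random
from typing import Callable, List, TypeVar

S = TypeVar("S")


def plus_strategy(init: Callable[[random.Random], S],
                  mutate: Callable[[S, random.Random], S],
                  fitness: Callable[[S], float],
                  mu: int = 2, lam: int = 2,
                  generations: int = 100, seed: int = 0) -> List[S]:
    """(mu + lambda) evolution strategy minimizing fitness; mu = lam = 1 is a hill climber."""
    rng = random.Random(seed)
    parents = [init(rng) for _ in range(mu)]
    for _ in range(generations):
        # Each child is a mutated copy of a uniformly chosen parent.
        children = [mutate(rng.choice(parents), rng) for _ in range(lam)]
        # Parents and children compete together; the mu lowest-fitness individuals survive.
        parents = sorted(parents + children, key=fitness)[:mu]
    return parents
```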

In the updated version of Baba is Y'all, we redefine the ratio of useless sprites ($u$) used in the original version's equation. The value $u$ combines the proportions of unnecessary object and word sprites in the level. It is broken into two variables, $o$ and $w$, for objects and words respectively. The $o$ value corresponds to objects that are neither required for nor predicted to act as a constraint or solution for the level. The value of $o$ is calculated as follows:
\begin{equation}
    o = \frac{i}{j}
\end{equation}
where $i$ is the number of object sprites initialized in the level without a corresponding object-word sprite and $j$ is the total number of object sprites initialized in the level. The $w$ value corresponds to the words that have no associated object in the map (this does not apply to keyword class words such as ``KILL'' or ``MOVE''). The value of $w$ is calculated as follows: 
\begin{equation}
    w = \frac{k}{l}
\end{equation} 
where $k$ is the number of object-word sprites initialized in the level without a corresponding object sprite and $l$ is the total number of word sprites initialized in the level. The two variables are combined into $u$ as a weighted sum, $u = 0.85\,o + 0.15\,w$. This places more weight on reducing useless object sprites than useless word sprites, because word sprites can still be used to modify the properties of objects or to transform other object sprites.
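The computation of $u$ from sprite counts can be sketched as follows, assuming the level has already been summarized as counts of object sprites and of object-word sprites per class, plus the total number of word sprites. The weights 0.85 and 0.15 come from the text; the function name and the convention of returning 0 for a level with no objects or no words are hypothetical.

```python
from typing import Dict


def useless_ratio(object_counts: Dict[str, int],
                  noun_word_counts: Dict[str, int],
                  total_words: int,
                  object_weight: float = 0.85,
                  word_weight: float = 0.15) -> float:
    """u = 0.85*o + 0.15*w, with o = i/j for objects and w = k/l for words."""
    j = sum(object_counts.values())  # all object sprites
    # i: object sprites whose class has no matching object-word sprite on the map
    i = sum(n for name, n in object_counts.items() if noun_word_counts.get(name, 0) == 0)
    # k: object-word sprites whose class has no matching object on the map
    k = sum(n for name, n in noun_word_counts.items() if object_counts.get(name, 0) == 0)
    l_total = total_words  # all word sprites, including keywords such as IS or WIN
    o = i / j if j else 0.0  # assumption: a level with no objects scores 0
    w = k / l_total if l_total else 0.0
    return object_weight * o + word_weight * w
```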

The $u$ term is included to prevent noise within the level caused by object tiles that cannot be manipulated in any way or have no relevance to the level. A human-made level may include such ``useless'' tiles for aesthetic purposes or to give the level a theme, as in the original `Baba is You' levels. However, the PCG algorithm optimizes towards efficient, minimalist levels and therefore ignores this subjective aspect of a level's quality (which the user can add later).

The playability of the level ($p$) is a binary constraint value that determines whether a level is potentially winnable or not. The value can be calculated as follows:
\begin{equation}
    p = 
    \begin{cases}
        1, & \text{has [`X-IS-YOU' rule, `WIN' keyword]} \\
        0, & \text{otherwise}
    \end{cases}
\end{equation}
This ensures that levels that are impossible to play or win are penalized in the population and are less likely to be mutated and evolved from in future generations. We use this simple constraint check instead of the solver because running the solver to verify playability is time-consuming. In addition, the levels that the solver can verify as playable tend to be easy ones, due to the limited search space allotted to its best-first search.
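A minimal sketch of the playability constraint $p$, assuming that active rules are available as hypothetical (subject, property) pairs such as (BABA, YOU) and that the level's words are available as a list of strings. As in the equation above, it only checks that some X-IS-YOU rule is active and that a WIN word is present; it does not attempt to solve the level.

```python
from __future__ import annotations


def playability(active_rules: set[tuple[str, str]], words_in_level: list[str]) -> int:
    """Binary constraint p: 1 if some X-IS-YOU rule is active and a WIN word exists, else 0.

    active_rules holds hypothetical (subject, property) pairs parsed from
    "X IS Y" sentences, e.g. ("BABA", "YOU").
    """
    has_you_rule = any(prop == "YOU" for _subject, prop in active_rules)
    has_win_word = "WIN" in words_in_level
    return int(has_you_rule and has_win_word)


print(playability({("BABA", "YOU")}, ["BABA", "IS", "YOU", "FLAG", "WIN"]))  # 1
print(playability({("BABA", "YOU")}, ["BABA", "IS", "YOU"]))                 # 0: no WIN word
```

Because the check is a set lookup and a list scan, it costs almost nothing compared with running the solver, which is the trade-off the paragraph above describes.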

The ratio of empty tiles ($s$) is the ratio of empty space tiles to all of the tiles in the level. The equation can be calculated as follows:
\begin{equation}
    s = \frac{e}{t}
\end{equation}
where $e$ is the number of empty tiles in the level and $t$ is the total number of tiles in the level. In equation~\ref{eq:fitness}, $s$ is multiplied by $0.1$ so that empty space is only lightly penalized; a heavier weight would encourage levels to mutate towards filling the map with an overabundance of similar tiles just to eliminate empty space.
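The empty-space ratio and its down-weighted contribution can be sketched as follows. The grid encoding (one string per row, with \texttt{\_} marking an empty tile) is hypothetical, and only the $0.1\,s$ term is computed; the remaining terms of equation~\ref{eq:fitness} are omitted.

```python
from __future__ import annotations

EMPTY = "_"  # hypothetical marker for an empty tile


def empty_ratio(grid: list[str]) -> float:
    """s = e / t: fraction of tiles in the map that are empty."""
    tiles = [tile for row in grid for tile in row]
    e = sum(1 for tile in tiles if tile == EMPTY)
    t = len(tiles)
    return e / t if t else 0.0


def empty_space_term(grid: list[str], weight: float = 0.1) -> float:
    """The down-weighted 0.1 * s contribution; other fitness terms are not modeled here."""
    return weight * empty_ratio(grid)


grid = ["__b", "r__"]  # 4 of 6 tiles are empty
print(empty_ratio(grid))       # 0.6666666666666666
print(empty_space_term(grid))  # roughly 0.0667
```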

The Mutator module is not run as a background process to find more levels; instead, the user must run it manually. This is because the solver cannot solve some generated levels without human input. One might ask why the system does not simply generate a huge corpus of levels and have users test them later. Doing so could produce a multitude of levels that are either impossible to solve or solvable but not subjectively ``good'' (i.e., levels the user would not find pleasing or enjoyable). Such an overabundance of ``garbage'' levels would waste both memory and human effort. By giving the user direct control over which generated levels are submitted, the system ensures that archived levels are solvable and of sufficient quality, and it promotes using the tool in a mixed-initiative way. Future work will explore implementing a fully autonomous generator and associated solver to expand the archive of levels without human input.

\subsection{Objective Module}\label{sec:objective_module}

\begin{figure}[ht]
    \centering
    \includegraphics[width=0.9\linewidth]{imgs/obj_screen.png}
    \caption{A screenshot of the rule objective screen} 
    \label{fig:objective_screen}
\end{figure}

In conjunction with the Mutator module (section~\ref{sec:mutator_module}), an Objective Module has been implemented to help guide the evolver towards generating levels that match selected objectives (i.e., rules) set by either the Map Module or the user. As before, this nudges both the user and the evolver back-end towards creating levels with mechanic combinations that do not yet exist in the site database. 

Users select from the table of mechanics which rules the level should include: at the start of the level, at the solution, or at either. Initial rules are detected automatically as the user or evolver edits the level, whereas final rules can only be determined at the end of the level, once a solution has been found. Active rules are highlighted in green in the table and update whenever a rule is created or removed. 

The evolver also prioritizes levels that match as many of the selected rules as possible. A cascading function ranks the generated levels in the chromosome population: the evolver first evaluates how well each generated level matches the selected objectives and only then considers the fitness function. In this way, the evolver takes a more active role in expanding the site's level database and helps the user fill in the levels still missing from it.
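One way to read the cascading ranking is as a lexicographic sort: levels are ordered first by how many selected objectives they satisfy and, among levels that satisfy the same number, by fitness. The sketch below assumes this reading and assumes that higher fitness is better. The \texttt{Candidate} record, holding the rule types active at the start and at the solution, is hypothetical; each objective is tagged \texttt{start}, \texttt{end}, or \texttt{either}, mirroring the objective table.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    start_rules: set[str]  # rule types active at the start, e.g. {"X-IS-KILL"}
    end_rules: set[str]    # rule types active at the solution
    fitness: float         # assumed: higher is better


def objectives_met(c: Candidate, objectives: dict[str, str]) -> int:
    """Count satisfied objectives; each maps a rule type to 'start', 'end', or 'either'."""
    met = 0
    for rule, when in objectives.items():
        if when == "start":
            met += rule in c.start_rules
        elif when == "end":
            met += rule in c.end_rules
        else:  # "either"
            met += rule in c.start_rules or rule in c.end_rules
    return met


def cascade_rank(population: list[Candidate], objectives: dict[str, str]) -> list[Candidate]:
    """Sort by objectives met first, then by fitness (both descending)."""
    return sorted(population, key=lambda c: (objectives_met(c, objectives), c.fitness), reverse=True)


pop = [
    Candidate("a", {"X-IS-KILL"}, set(), 0.90),
    Candidate("b", {"X-IS-KILL"}, {"X-IS-SINK"}, 0.50),
    Candidate("c", set(), set(), 0.99),
]
goals = {"X-IS-KILL": "start", "X-IS-SINK": "either"}
print([c.name for c in cascade_rank(pop, goals)])  # ['b', 'a', 'c']
```

Level \texttt{c} has the highest fitness but satisfies no objective, so it is ranked last; fitness only separates levels that meet the same number of objectives.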

\subsection{Rating Module}\label{sec:rating_module}

\begin{figure}[ht]
    \centering
    \includegraphics[width=0.9\linewidth]{imgs/rating_screen.png}
    \caption{A screenshot of the rating screen with 2 levels shown} 
    \label{fig:rating_screen}
\end{figure}

As in the original system, a level is rated by comparing it with another level in the site database. The user judges which level is better on two qualities: level of challenge and quality of aesthetic design. A level judged `more challenging' may have a solution that takes longer to find or is less intuitive or straightforward. A level judged to have `better design' is more visually pleasing and elegant in its map layout, a quality that is hard to generate automatically with AI. For each quality, users indicate their choice by shifting a slider towards one level or the other. 

\subsection{Map Module}\label{sec:map_module}

\begin{figure}[ht]
    \centering
    \includegraphics[width=0.9\linewidth]{imgs/map_screen.png}
    \caption{A screenshot of the map selection screen} 
    \label{fig:select_screen}
\end{figure}

The Map module both stores all of the levels in the site database and recommends specific levels for users to start their own level creation process from. It is the core module of the system. To maintain an archive of levels that are both diverse and of high quality, we implemented the MAP-Elites algorithm for this module. 

\begin{table}[t]
    \caption{Chromosome Rule Representation}
    \centering
    \begin{tabular}{|p{0.2\linewidth}|p{0.7\linewidth}|}
    \hline
         Rule Type & Definition \\
    \hline
    \hline
        X-IS-X & objects of class X cannot be changed to another class \\
        X-IS-Y & objects of class X will transform to class Y \\
        X-IS-PUSH & X can be pushed \\
        X-IS-MOVE & X will autonomously move \\
        X-IS-STOP & X will prevent the player from passing through it\\
        X-IS-KILL & X will kill the player on contact\\
        X-IS-SINK & X will destroy any object on contact\\
        X-IS-[PAIR] & both rules `X-IS-HOT' and `X-IS-MELT' are present \\
        X,Y-IS-YOU & two distinct object classes are controlled by the player \\
    \hline
    \end{tabular}
    \label{tab:rrp}
\end{table}

When a level is submitted to the archive, the system uses the lists of active rules at the start and at the end of the level as behavior characteristics that determine its location in the map. Each level is checked for 9 different rules, based on the rule mechanics that can be formed in the Game module; Table~\ref{tab:rrp} shows the full list. Since each rule can be active at the beginning or at the end, there are 18 binary behavior characteristics instead of 9, which gives a map of $2^{18} = 262{,}144$ cells.
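To make the size of the map concrete, the sketch below encodes the 18 binary behavior characteristics (the 9 rule types of Table~\ref{tab:rrp}, each checked at the start and at the end) as a single integer cell index in $[0, 2^{18})$. The bit ordering and the function names are assumptions; the text does not specify how cells are indexed.

```python
from __future__ import annotations

# The 9 rule types of Table tab:rrp; this ordering is an assumption.
RULE_TYPES = [
    "X-IS-X", "X-IS-Y", "X-IS-PUSH", "X-IS-MOVE", "X-IS-STOP",
    "X-IS-KILL", "X-IS-SINK", "X-IS-[PAIR]", "X,Y-IS-YOU",
]


def behavior_vector(start_rules: set[str], end_rules: set[str]) -> list[int]:
    """18 binary characteristics: rule present at the start (9), then at the end (9)."""
    return ([int(r in start_rules) for r in RULE_TYPES]
            + [int(r in end_rules) for r in RULE_TYPES])


def cell_index(start_rules: set[str], end_rules: set[str]) -> int:
    """Pack the 18 bits into one integer in [0, 2**18); the first characteristic is the highest bit."""
    index = 0
    for bit in behavior_vector(start_rules, end_rules):
        index = (index << 1) | bit
    return index


print(cell_index({"X-IS-KILL"}, {"X-IS-KILL", "X-IS-SINK"}))  # 4108
```

Each distinct pair of start and end rule sets maps to exactly one of the $2^{18}$ cells, so coverage can be measured simply by counting the distinct indices that hold at least one level.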

The Map Module can also recommend levels to start from when designing a new level. Like the Mutator Module (section~\ref{sec:mutator_module}), it takes the Objective Module (section~\ref{sec:objective_module}) into account when selecting its recommendations: it provides the levels that most closely match the chosen objectives, drawn either from other levels the user has previously made or from highly rated (and presumably high-quality) ``elite'' levels. 

In this project, we keep a population of levels in each cell of the MAP-Elites map rather than a single elite, similar to Constrained MAP-Elites~\cite{khalifa2018talakat}. The quality of a level is determined by user ratings collected through the Rating Module.
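A minimal sketch of a rating-based archive that keeps every submitted level in its cell and treats the highest-rated one as the cell's elite. The class and method names are hypothetical, and a single scalar rating per level is an assumption: the Rating Module collects pairwise comparisons on two qualities, and how these are aggregated into a score is not specified here. The sketch also does not reproduce the separate feasible and infeasible populations of Constrained MAP-Elites.

```python
from __future__ import annotations

from collections import defaultdict


class RatedArchive:
    """Hypothetical archive: every level is kept in its cell; the best-rated one is the elite."""

    def __init__(self) -> None:
        # cell index -> list of (rating, level_id), kept sorted best-first
        self.cells: defaultdict[int, list[tuple[float, str]]] = defaultdict(list)

    def add(self, cell: int, level_id: str, rating: float) -> None:
        population = self.cells[cell]
        population.append((rating, level_id))
        population.sort(key=lambda entry: entry[0], reverse=True)

    def elite(self, cell: int) -> str | None:
        population = self.cells.get(cell)  # .get avoids creating empty cells
        return population[0][1] if population else None

    def coverage(self) -> int:
        """Number of cells that hold at least one level."""
        return sum(1 for population in self.cells.values() if population)


archive = RatedArchive()
archive.add(4108, "level-1", 0.4)
archive.add(4108, "level-2", 0.8)
archive.add(0, "level-3", 0.1)
print(archive.elite(4108), archive.coverage())  # level-2 2
```

Keeping the whole population, instead of discarding everything but the elite, matches the Map module's role of storing all levels, and lets the elite change as ratings accumulate.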

\subsection{User Profiles}

\begin{figure}[ht]
    \centering
    \includegraphics[width=0.9\linewidth]{imgs/user_screen.png}
    \caption{A screenshot of the user profile screen for the user `Milk'} 
    \label{fig:profile_screen}
\end{figure}

The user profiles feature is the newest addition to the Baba is Y'all site. Like the original system, if a user creates a profile through the site's login system and submits a level, they get authorship attributed to the submitted level. Users can also find their previously made levels on the  profile page - called ``My Levels'' - and replay them, edit them, or view the level's mechanic combination. A user's personal stats for their level submissions can also be viewed on the page including the number of levels submitted, number of rule combinations contributed, and their top rated level. This feature was implemented to provide more user agency and personalization on the site and give users better access to their own submitted levels. 

Through the search page, players can search for specific levels by username or by level name. This creates a sense of authorship over each of the levels, even if the level wasn't designed with any human input (i.e. a level with PCG.js as the author) and encourages the collaborative nature of the site between AI and human. Users may also share links to site levels via the game page.

\section{Results}

The following results were extracted from the entire Baba is Y'all v2 site and include data from levels made by participants not involved in the study. 

\subsection{User and Author-based Data}


All users on the Baba is Y'all site had the option of registering for a new account to easily find their saved work as well as attribute personal authorship to any levels they submitted. Those who participated in the user study were given pre-made usernames in order to verify the levels they submitted from their responses and to protect their identities. These users only had to provide an email address to register for both the site and the survey. 
Of the 727 unique users registered on the site, only 78 (10.7\%) registered outside of the user study; the remaining 649 registered as part of the study.

\begin{figure}[ht]
    \centering
    \includegraphics[width=0.95\linewidth]{imgs/Level-types.png}
    \caption{Sample levels generated for the system. The left column is user generated levels, the middle column is evolver module levels, and the right column is mixed-initiative user and evolver levels}
    \label{fig:level_types}
\end{figure}

We divided all the levels created by users into three main categories according to how the mixed-initiative tool was used to create them (as shown in figure~\ref{fig:level_types}):
\begin{itemize}
    \item \textbf{User-Only levels:} were created from a blank map exclusively by the human user without any AI assistance.
    \item \textbf{PCG-only levels:} were created solely by the AI tool without any human input aside from choosing which tool to use and when.
    \item \textbf{Mixed-author levels:} involved both the human user as well as the AI tool in the creation process of the level. 
\end{itemize}

\begin{table}[ht]
\begin{center}
\begin{tabular}{|c c c|} 
 \hline
 Author Type & Number & \%\\
 \hline\hline
 User-only & 103 & 66.45 \\
 \hline
 PCG-only & 16 & 10.32\\
 \hline
 Mixed-author & 36 & 23.23\\
 \hline
 \hline
 Total & 155 & 100\\
 \hline 
\end{tabular}
\end{center}
\caption{Authorship for levels submitted}
\label{tab:level_author}
\end{table}

The majority of the submitted levels were user-only (66.45\%); however, almost a quarter (23.23\%) had mixed authorship. Table~\ref{tab:level_author} shows the full breakdown. Comparing it with the registration data, far fewer levels were submitted than there were registered users (155 levels versus 727 users). We attribute this large gap to releasing the system online without any security measures: the site attracted many bots that created multiple accounts so they could fill out the user survey via the provided link, but did not submit any levels.

\subsection{Level-based Data}


\begin{figure}[ht]
    \centering
    \includegraphics[width=0.8\linewidth]{site_graphs/rule_perc.png}
    \caption{Site results for the rule distribution across levels submitted}
    \label{fig:level_rule_dist}
\end{figure}

Across all $155$ submitted levels, only $74$ distinct cells of the MAP-Elites map were covered, roughly 0.03\% of the $2^{18}$ possible rule combinations. Figure~\ref{fig:level_rule_dist} shows the rule distribution over all submitted levels. X-is-KILL was the most used rule, appearing in over half of the levels, and X-is-STOP was second at 44.52\%. This may be because these rules create hazards for the player and add depth to the level and its solution. Meanwhile, X-is-[PAIR] was the least used rule, appearing in only 12.9\% of the levels. This is likely due to the lock-and-key nature of this rule combination, which requires more deliberately placed word blocks, whereas a similar effect can be achieved with the X-is-SINK or X-is-KILL rule.

\begin{table}[ht]
\begin{center}
\begin{tabular}{|c c c c|} 
 \hline
 Author Type & \# Rules & Sol. Length & Map Size (\# tiles) \\
 \hline\hline
 User-only & 2.563 $\pm$ 2.19 & 25.834 $\pm$ 26.11 & 117.883 $\pm$ 50.84\\
 \hline
 PCG-only & 1.00 $\pm$ 1.17 & 19.062 $\pm$ 14.68 & 95.437 $\pm$ 25.60\\
 \hline
 Mixed-author & \textbf{2.833 $\pm$ 2.56} & \textbf{26.027 $\pm$ 20.36} & \textbf{127.722 $\pm$ 49.91}\\
 \hline
\end{tabular}
\end{center}
\caption{Averaged attributes for different types of created levels}
\label{tab:avg_author}
\end{table}


\begin{figure}[ht]
    \centering
    \includegraphics[width=1.0\linewidth]{site_graphs/rule_dist.png}
    \caption{Rule distributions across the different authored levels}
    \label{fig:rule_dist}
\end{figure}

Table~\ref{tab:avg_author} shows how the number of rules relates to each author type. Some levels use no rules at all beyond the required X-is-YOU and X-is-WIN rules. The mixed-author levels have the highest average number of rules per level ($2.833$), while PCG-only levels have the lowest ($1.00$). The rule distributions for each author type are shown in Figure~\ref{fig:rule_dist}. The PCG-only levels had the least variability in rules, while the mixed-author levels had the most. Mixed-author levels also had the highest average solution length and the highest average map size, with PCG-only levels having the lowest for both attributes. 



\section{User Study}
The following results were extracted from a Google Form survey given to the experiment participants. Users were instructed to play a level already made on the site, create a new level using the level editor, test it, and finally submit it to the site. They were also given the option to go through the tutorial of the site if they were unfamiliar with the `Baba is You' game or needed assistance with interacting with the level editor tool. 

Although $727$ users registered on the site, we received only $170$ survey responses, and only $76$ of these were valid. Responses were validated by cross-checking the level ID that each respondent submitted via the survey, and claimed to have authored, against the levels saved on the website. Many of the invalid responses referenced levels that either did not exist in the database or were already attributed to another user. The following results are taken from the self-reported subjective survey answers of the $76$ valid users. 
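The validity check described above can be sketched as a simple filter over survey responses. The data layout (pairs of assigned username and claimed level ID, plus a mapping from level ID to the author recorded on the site) is a hypothetical simplification: a response is kept only if the claimed level exists and is attributed to the respondent.

```python
from __future__ import annotations


def valid_responses(
    responses: list[tuple[str, str]],
    level_authors: dict[str, str],
) -> list[tuple[str, str]]:
    """Keep (username, level_id) responses whose level exists and is attributed to that user."""
    kept = []
    for username, level_id in responses:
        author = level_authors.get(level_id)
        if author is None:       # claimed level does not exist in the database
            continue
        if author != username:   # level is attributed to a different user
            continue
        kept.append((username, level_id))
    return kept


authors = {"L1": "Keke1", "L2": "Keke2"}
responses = [("Keke1", "L1"), ("Keke3", "L2"), ("Keke4", "L9")]
print(valid_responses(responses, authors))  # [('Keke1', 'L1')]
```

Assigning pre-made usernames, as described earlier, is what makes the author comparison meaningful: a respondent cannot claim a level registered under another account.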

\subsection{Demographic Data}\label{sec:demographics}





\begin{figure}[ht]
    \centering
    \includegraphics[width=1.0\linewidth]{hor_survey_graphs/freq_v2.png}
    \caption{A. Frequency for playing games; B. Frequency for designing levels for games}
    \label{fig:freq_des_play}
\end{figure}

\begin{figure}[ht]
    \centering
    \includegraphics[width=1.0\linewidth]{hor_survey_graphs/pref_v2.png}
    \caption{Preference for solving or making puzzles}
    \label{fig:design_pref}
\end{figure}

Half of the users who completed the survey answered that they frequently play video games (more than 10 hours a week), and around 80\% stated that they play for at least 2 hours a week (figure~\ref{fig:freq_des_play}A). In contrast, only 28.9\% of users spend 2 or more hours a week designing levels for games, and 40.8\% never design levels at all (figure~\ref{fig:freq_des_play}B). When asked whether they prefer to solve or make puzzles, 50\% of participants preferred solving puzzles, while only 6.6\% preferred making them; another 40.8\% had no clear preference between the two (figure~\ref{fig:design_pref}). 



\begin{figure}[ht]
    \centering
    \includegraphics[width=1.0\linewidth]{hor_survey_graphs/experience_v2.png}
    \caption{A. Experience playing Sokoban; B. Experience with `Baba is You'; C. Experience with AI-assisted level editing tools}
    \label{fig:exp_graph}
\end{figure}

We asked participants whether they had ever played the original game `Baba is You' by Hempuli (either the jam version or the Steam release, as both contain the rules used in the Baba is Y'all site), whether they had played a Sokoban-like game (puzzle games with block-pushing mechanics), and whether they had experience with AI-assisted level editing tools. Figure~\ref{fig:exp_graph} shows the distribution of the users' answers. Only 30\% of participants had played the game before, while 22\% had heard of it but never played it; for the rest, this study was their first experience with the game. However, 96\% of the participants had played a Sokoban-like game, which suggests that the learning curve of the game itself would not be too steep for new players. Regarding AI-assisted level editing tools, 75\% of users had never used one before and 5.3\% were unsure whether they had, so collaborating with an AI tool was new to most participants and likely had a much steeper learning curve.



\subsection{Self-Reported Site Interactions}

\begin{figure}[ht]
    \centering
    \includegraphics[width=1.0\linewidth]{survey_graphs/feat.png}
    \caption{Survey results for users' reports on the features they used}
    \label{fig:feat_report}
\end{figure}

Figure~\ref{fig:feat_report} shows the full list of features that participants interacted with on the site. Users were given the optional task of going through the tutorial section of the Baba is Y'all site to familiarize themselves with the mechanics of the original `Baba is You' game, the AI-assisted tools available in the level editor, and the site layout and navigation. 81.6\% of users went through this tutorial (whether fully or partially was not recorded). The second task was to play a level previously submitted to the website database. All users were able to solve a level by themselves, and 72.4\% also chose to watch the Keke AI solver complete the submitted level. The third and final task was to submit their own `Baba is You' level using the level editor; this is where users were asked the most about their involvement with the AI system. Some users created more than one level, so their reported design choices are not mutually exclusive (e.g., a user may have started one level from a blank map and another from an AI-suggested level).

For the initial creation of the level, 88.2\% of users chose to start with a blank map. 9.2\% of users started with a level already submitted to the database, either a level ranked as an elite level or a level they had created themselves (if they submitted more than one level during the study). 6.6\% of users started with a level suggested from the `Unmade' page, ideally with the intent of making a level with a rule combination that had not been made yet, thus expanding the MAP-Elites rule combination matrix in the database. Unfortunately, the survey did not ask whether users started with the random level option also provided by the AI assistance tool, so we cannot report this statistic. 

For editing the level, 81.6\% of users reported editing a level completely by hand without any AI assistance. 27.6\% of users edited the level with help from either the evolver algorithm or the mutator functions provided by the AI back-end, and 19.7\% reported using the objective table to guide the evolver. We think this low usage is attributable to the fact that a large share of users were unfamiliar with the system or with the `Baba is You' game overall. This, together with the small number of previously submitted levels available for comparison in the database, likely made steering the evolver towards specific goals too difficult to learn and accomplish. Finally, when testing their levels, 59.2\% of users reported using the Keke solver AI, and 72.4\% of users named their levels.

Although not required by the tasks, we also asked participants about any extra site features they chose to explore. 23.7\% of users reported submitting a level rating from the `Rate' page, and 51.3\% reported using the `Search' tool to look for specific levels (we did not ask what search criteria they used). Finally, 19.7\% of users reported using the `Share Level' feature to share a link to a submitted level with others online. 

The least used interactions (`Started with a database-saved map in the level editor', `Started with a level suggestion from the Unmade page', and `Used the objectives table to evolve levels') were all related to the mixed-initiative AI features of the system. The first could be attributed to the small number of levels in the database (at the start of the experiment, only around 40 levels were available), which left users with few viable options to choose from. The lack of usage of the other two features, however, could stem from the opposite problem: with so few levels in the database, nearly every rule combination was still unmade, leaving users with too many options to choose from. Trying to make a level under constrained parameters may also have been too steep a task for someone totally unfamiliar with the system or even with the `Baba is You' game. There was also no incentive for a player to create a level suggested by the system rather than making one from scratch. We also did not explicitly instruct users to make a level from the suggested set; instead, we allowed them to make whatever level they wanted with the editor, whether from the prompted rule set or from their own ideas.

\section{Discussion}
\subsection{Data Analysis}
Both the submitted level statistics and the self-reported user survey indicate that users did not prefer mixed authorship when designing levels. Many users still preferred to have total control over their level design process from start to finish. In future work, we could limit user control and encourage more AI assistance in the design process, similar to the work by Bhaumik et al.~\cite{bhaumik2021lode}. 

The limitations of the AI back-end (both the evolver and the solver) may partly explain the lack of AI interaction. The mutator and evolver depend on previously submitted levels and level ratings to ``learn'' how to effectively evolve levels towards high-quality design; in other words, the assistant tool is always learning what makes a ``good'' level from human input. If there is little data for the tool to learn from, the AI will be unable to create quality levels, making users less likely to submit mixed-initiative co-created levels, which in turn leaves the AI with even less data: a negative feedback loop. 

The fitness function defined for the evolver and mutator may also be inadequate for level design: it could produce a level that is ``optimal'' by its internal definition but sub-par in quality for a human user. Another possible flaw in the AI-collaboration system is that users lacked direct control over the evolver and mutator, and using them in the middle of creation could be counterproductive because they might destroy level structures the user was working on. Future work could remedy this problem by giving users several mutation ``options'' to choose from, similar to the AI selections in RLBrush~\cite{delarosa2021mixed} and Pitako~\cite{machado2019pitako}. Finally, the `Keke' AI solver also lacked performance: a few participants mentioned that the solver was unable to solve prototype levels that they themselves could solve in just a couple of moves. An improved AI solver would make level creation more efficient. 

\subsection{User Comments and Feedback}
We gave the participants opportunities to provide open feedback about their experience using the site in order to gather more subjective data about their experience as well as collect suggestions for potential new features.

Almost no users experienced technical difficulties or bugs that prevented them from using the site. The few who did mentioned formatting issues caused by their browser (e.g., icons placed too close together, problems loading the helper GIFs, font colors). One user mentioned that these issues may have occurred because they were using the site on a phone (unfortunately, we did not instruct users to complete the study on a desktop or laptop). In the future, we will test the site on as many desktop and mobile browsers as possible to make it more accessible. 

Some users found the tutorial confusing because of the amount of information it conveyed about the entire site, describing it as ``intimidating'', ``overwhelming'', and ``a bit complex''. Other users, however, felt it lacked information, saying it was ``not detailed'' or had ``sufficient information [...] but could have been delivered in a more comprehensible way.'' To make the game more accessible, we plan to make the tutorial less intimidating to new users by limiting the amount of information shown (possibly through a ``table of contents'', as suggested by one participant) while keeping it comprehensive enough to explain the level editor and tools. 

For feature suggestions, many users wished for larger maps and a larger vocabulary, like those found in the Steam release of `Baba is You'. Users also wished for a save feature that would let them keep ``drafts'' of their levels to edit later. Many users also suggested a cooperative multiplayer feature for level editing and level solving, presumably with other humans rather than an AI agent.

\begin{figure}[ht]
    \centering
    \includegraphics[width=1.0\linewidth]{hor_survey_graphs/browser_v2.png}
    \caption{User feedback for likelihood to return using the site after the experiment}
    \label{fig:reuse_likelihood}
\end{figure}

Although the statistics on the submitted levels showed disappointingly little involvement of the AI assistant, we also asked users how likely they were to continue using the site after the experiment. 38.2\% of users said they would continue to use the site, while 55.3\% said they might (figure~\ref{fig:reuse_likelihood}). Many users were optimistic and encouraging about the concept of combining AI and PCG technologies with level design, describing the project as a ``cool project'', ``a very unique experience'', a ``lovely game and experiment'', and ``very fun.'' At the time of writing, a few users had returned: their assigned `Keke' usernames appeared as authors on the New page long after the study was completed. Most notably, study participant Keke978, who later took the username `Jme7', contributed 28 more levels to the site after the study concluded and currently holds the title for the most levels submitted and the most rule combinations on the site.


Many users also provided constructive feedback on feature implementation, site usability, and how to further incorporate the AI back-end's interactivity. As shown in figure~\ref{fig:exp_graph}, 70\% of users who played with the system had never played `Baba is You', and 75\% had never used an AI-assisted level editor before this experiment. Given this, and given that users stayed engaged long enough to complete the survey and provide constructive feedback, we draw two tentative conclusions: (1) the game stands on its own as an entertainment system, independent of `Baba is You'; and (2) even for people with limited experience of AI in games, as long as they are not completely new to gaming, this project can hold their attention long enough for them to understand it, tinker with it, and give constructive feedback.
 

\section{Conclusion and Future Work}
The results from the user study demonstrate both the benefits and the limitations of a crowd-sourced mixed-initiative collaborative AI system. Currently, users still prefer to edit most of the content themselves with minimal AI input, likely due to the lack of submitted content and ratings for the AI to learn from. We recommend pretraining the AI components before incorporating them into the full system, so that they can collaborate more effectively with their human partners in designing and editing content. This should lead to more helpful suggestions from the evolver as well as better-designed levels overall. This project is the start of a much longer and broader investigation into crowd-sourced mixed-initiative systems that use quality diversity methods to produce content, and we have many more ideas for improving the Baba is Y'all system. 

As suggested by many participants in the user study, we would like to support level design collaborations between multiple users and multiple types of evolutionary algorithms at once. Such a system would take inspiration from collaboration tools such as LodeEncoder~\cite{bhaumik2021lode}, RLBrush~\cite{delarosa2021mixed}, and Roblox (Roblox Corporation, 2006). This would broaden the scope and possibilities of level design and development even further, allowing more creativity and evolutionary progress within the system. This collaborative setting will also open up a multitude of interesting problems to investigate, such as authorship. 

Beyond `Baba is You', we propose developing an open-source framework for mixed-initiative crowd-sourced level design for any game or game clone. Such games could include Zelda, Pacman, Final Fantasy, Kirby, or any other game, as long as we have a way to differentiate levels mechanically and to measure a minimum viable quality for levels. Adding more games to the mixed-initiative framework would lower the barrier to entry for players who are unfamiliar with the independent game `Baba is You' but are very familiar with triple-A games produced by companies such as Nintendo.

We also propose a competition for online `Keke' solver algorithms on the challenging levels. In this competition, users would submit their own agents that can solve the user-made and artificially created `Baba is You' levels. Ideally, this would not only improve the solver of the Baba is Y'all system but also introduce novel agents capable of solving levels with dynamically changing content and rules, an area that has not been previously explored in the field. Development of the framework for this competition had already begun at the time of writing.

Finally, we propose creating a fully autonomous level generator and solver that can act as a user of our system. 
This generator-solver pair would work in parallel with the system's current mixed-initiative approach, but with a focus on coverage, exhaustively finding and creating levels for every combination of mechanics. With a redefined fitness function and an updated solver (possibly from the Keke solver competition), this could be more efficient than having users manually submit levels, while still using content created by human users to maintain the mixed-initiative approach. 

There are many new directions for the Baba is Y'all system and for the concept of crowd-sourced collaborative mixed-initiative level design as a whole. We hope this project serves as a stepping stone into the area and provides insight into how AI and users can work together on a crowd-sourced website to generate new and creative content.

\section*{Acknowledgment}
The authors would like to thank the Game Innovation Lab, Rodrigo Canaan, Mike Cook, and Jack Buckley for their feedback on the site in its beta version as well as the numerous users who participated in the study and left feedback.


\ifCLASSOPTIONcaptionsoff
  \newpage
\fi


\bibliographystyle{IEEEtran}
