Enabling reproducible scientific software environments through collaboration within the conda ecosystem

31 Jul 2023 (modified: 01 Aug 2023), InvestinOpen 2023 OI Fund Submission
Funding Area: Capacity building / Construcción de capacidad
Problem Statement: Open research practices aim to make research more reproducible, transparent, collaborative, and accessible. While progress has been made toward publishing data and software, creating, versioning, and sharing reproducible software environments remains an unsolved problem that hinders [reproducible and open research](https://reproducible.cs.princeton.edu/). Today, research analyses can, at most, aim to be [repeatable or replicable](https://bit.ly/3YgnR1d). Software portability (sharing and running the same scientific computing analysis on various platforms) and extensibility (integrating with other software) are not currently possible. As such, reproducible research remains an unattainable gold standard for anyone relying on data analysis or software. [conda-store](https://conda.store/) aims to bridge these gaps through improvements at the infrastructure (enablement), user experience (reducing friction and overhead), and community (normative) levels through:

* Low-friction enforcement and adoption of best practices for managing software dependencies and for creating and managing environments.
* Creation of repeatable, versioned, and portable software environments, including dependency provenance.
* Generation of shareable software artifacts that users can readily reuse.
* An intuitive UI for environment creation and management, removing friction for researchers and users without deep software engineering or packaging knowledge.
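To make the idea of a "versioned" environment concrete, the following is a minimal sketch, not conda-store's actual implementation. It assumes an environment is described by a list of channels and a list of pinned dependencies, and derives a deterministic identifier from that description. The function name `environment_fingerprint` and the example pins are hypothetical and used for illustration only.

```python
import hashlib
import json


def environment_fingerprint(spec: dict) -> str:
    """Return a short, deterministic identifier for an environment spec.

    Hypothetical helper for illustration; it is not part of conda or conda-store.
    """
    normalized = {
        # Channel order is kept as given, since it can affect dependency resolution.
        "channels": list(spec.get("channels", [])),
        # Dependency order carries no meaning here, so sort it for stability.
        "dependencies": sorted(spec.get("dependencies", [])),
    }
    payload = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Illustrative specs: the same pins in a different order, and one changed pin.
spec_a = {"channels": ["conda-forge"], "dependencies": ["python=3.11.4", "numpy=1.25.1"]}
spec_b = {"channels": ["conda-forge"], "dependencies": ["numpy=1.25.1", "python=3.11.4"]}
spec_c = {"channels": ["conda-forge"], "dependencies": ["python=3.11.4", "numpy=1.25.2"]}

print(environment_fingerprint(spec_a) == environment_fingerprint(spec_b))  # same pins
print(environment_fingerprint(spec_a) == environment_fingerprint(spec_c))  # different pin
```

Under these assumptions, two researchers who share the same fingerprint are working from the same declared dependencies, and any change to a pin produces a new identifier that can be recorded alongside an analysis. Note that this covers only the declared specification; a fully resolved, platform-specific lockfile would be needed to capture transitive dependencies.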
Proposed Activities: This proposal focuses on capacity building by strengthening the project's collaboration, co-creation, and participation practices. We aim to complete conda-store's transition into [conda-incubator](https://bit.ly/3rSnj5D), an intermediate organization through which new projects gradually transition into the stable conda organization and its governance. This incubation will help ensure the long-term sustainability of the project and the diversification of its contributor base and core team. This, in turn, will allow us to support and build capacity on currently understaffed and underserved projects such as conda-build, conda-pack, and conda-docker, thus ensuring the maintainability of tools critical to open source maintainers and researchers. Additionally, we aim to bridge a knowledge and practice gap by generating resources for the open research community centered on creating portable software environments. We'll achieve this goal with the following concrete activities:

**1. Integrate conda-store in the conda ecosystem. [Month: 1]**

* Join the [conda-incubator](https://bit.ly/3rSnj5D) as a “federated project” and adopt the conda governance and community standards.
* Create and document formal guidelines within conda-store for when and how to collaborate with upstream projects in the ecosystem.

**2. Improvements in maintenance, contribution workflows, and overall project quality assurance with a view to long-term sustainability and maintainability. [Months: 1-2]**

* Standardize practices across conda-store repositories (linting, coding standards, documentation style guide, CI/CD workflows, etc.).
* Write documentation for user, contributor, and maintainer workflows for conda-store in alignment with the governance framework.
* Enable contributions and community-first development practices through better documentation, mentorship of contributors, and participation in community activities, including community calls and contribution sprints.

**3. Development of resources (tutorials, how-tos) for the research community. [Months: 3-4]**

* Develop and publish educational resources for the research community on creating, versioning, and sharing portable and replicable computational environments, including guidance on how best to document software environments and the provenance of their dependencies.
* Translate the tutorials into Spanish.

Effective execution of this integration and collaboration proposal will require the following expertise:

* Experience-based understanding of community-governed open source development practices.
* Deep understanding of open source and open research communities, practices, and current barriers.
* Familiarity with Python packaging infrastructure, specifically the conda ecosystem of tools.
* Knowledge of and experience with open source technical and community documentation standards and community management principles.
* Previous experience writing educational resources and catering to an international audience.
Openness: conda-store, the primary focus of this proposal, is released under the permissive [BSD 3-Clause License](https://github.com/Quansight/conda-store/blob/main/LICENSE) for open source software. conda-store was initially developed as a company-backed open source project. However, transitioning to conda-incubator will involve adopting a community-driven and open approach to contribution, communication, and collaboration in keeping with the [conda governance model](https://github.com/conda-incubator/governance) and [Code of Conduct](https://github.com/conda-incubator/governance/blob/main/CODE_OF_CONDUCT.md). The upstream projects we’ll collaborate with in the conda, conda-forge, and conda-incubator organizations are also community-governed projects with [OSI-approved permissive open source licenses](https://opensource.org/) that support open communication and development. Concretely, we will carry out the work in this proposal in the open on [GitHub and GitHub Projects](https://github.com/orgs/Quansight/projects/41). The team will provide regular updates to the relevant upstream maintainer teams and collaborate closely with the conda contributors’ community throughout this project. In addition to the corresponding GitHub issues and pull requests, discussions will take place on the public [Matrix chat](https://matrix.to/#/#conda:matrix.org) and [Discourse](https://conda.discourse.group/) community forums. The educational resources generated in this project will be released under a CC-BY 4.0 license.
Challenges: Based on our extensive experience building and maintaining open source projects, we anticipate the most significant challenges to be:

* Building bridges and alignment with other projects in the open source packaging ecosystem. Many tools already deal with aspects of software dependency management and packaging; thus, new tools are often seen as "the new kid on the block" or as yet another tool in an already saturated market, with minimal possibility of being integrated with other tools in the ecosystem.
* Building and nurturing a sustainable and safe community around the project.

We aim to mitigate these challenges and their associated risks through the following actions:

* By transferring conda-store to the conda-incubator organization, we aim to communicate that this is not a single-stakeholder project but a project for and by the community. Embracing and adopting the conda guidelines will be a step towards building trust with the community and ensuring we have transparent and equitable processes for decision-making, onboarding new contributors and maintainers, and handling community matters (from outreach to Code of Conduct reports).
* We will extend this commitment beyond conda-store's immediate contributors and users by adopting and practicing guidelines for contributing to upstream projects, as a sustainability goal.
* As part of the proposal mentioned in the Neglectedness section, we have set deliverables around integration with other projects in the ecosystem.
Neglectedness: We recently applied to, and are awaiting results from, the [Sovereign Tech Fund's Improve FOSS Developer Tooling Challenge](https://sovereigntechfund.de/en/challenges/), which covers tasks for interoperability of conda-store with other packages in the ecosystem. That call focuses on the technical aspect (implementation) of digital infrastructure rather than on capacity building and co-creation. The [NASA ROSES F.7](https://bit.ly/43R9cux) and [CZI's EOSS](https://chanzuckerberg.com/eoss/) grants are available for general OSS development; however, conda-store's early development stage and the community collaboration focus of this application make the proposed work poorly suited for these types of grants. Broadly, the packaging ecosystem is underfunded in the Python OSS community (and in scientific computing at large). Even though Python is at the forefront of data science and research advancements, packaging tools are rarely recognized as part of the software dependency toolchain or included in broader open source funding calls. In the conda ecosystem, the core conda package manager has corporate sponsorship through Anaconda, but this does not trickle down to the rest of the vast packaging ecosystem. Instead, essential supporting tools in the ecosystem, like conda-lock and conda-build, are in maintenance mode due to a lack of resources and capacity. Successful completion of this project will directly support these projects' long-term sustainability through our direct care and contributions.
Success: During the project implementation, we’ll measure the completion of each activity against the following success criteria:

**Activity 1.**

* The conda-incubator governance and community standards are followed across all conda-store spaces.
* Community documentation for conda-store includes guidelines for upstream collaboration.
* The conda-store team has actively participated in community calls and contributor sprints (measured through attendance recorded in public minutes).

**Activity 2.**

* The conda-store issue tracker shows active triage and maintenance during the funded period (using GitHub’s project metrics).
* All conda-store repositories have clear standards for coding, linting, and documentation incorporated in the CI/CD workflows.
* Community documentation for conda-store includes onboarding guides and user, contributor, and maintainer workflows.

**Activity 3.**

* There is at least one published tutorial on portable software environments for researchers, available in English and Spanish.

Measuring success and impact in open source communities is challenging because the work has wide-reaching, sometimes invisible, ripple effects. That said, over the next few years, we hope for a steady increase in conda-store’s users and contributors and a more active and healthy conda contributors’ community. We can measure these by tracking engagement on project repositories and community forums and by actively seeking feedback from the projects' core maintainers.
Total Budget: $8,800.00
Budget File: pdf
Affiliations: Quansight Labs: https://labs.quansight.org/, Conda Organization: https://github.com/conda#the-conda-organization
LMIE Carveout: No, this project proposal does not fit in this category. The conda-store team in the conda organization, who currently work at Quansight Labs (https://github.com/conda-incubator/governance/issues/106), will be the primary team working on this project, and the majority of the team members work in high-income locations.
Team Skills: The team that would work on this project consists of the creators and maintainers of conda-store, the open source project central to this proposal. The team members currently work at Quansight Labs, a non-profit organization with a substantial history of interactions with and contributions to many projects within the conda ecosystem, and a long and respected history of contributing to other open source projects in the Python data science ecosystem. In addition to development, the conda-store team at Quansight Labs also has expertise in technical documentation, community engagement, and governance frameworks. This experience will be critical for successfully facilitating collaboration within the conda community and creating high-quality resources and guidelines.
Submission Number: 118