Colandr 2.0 – An Upgrade for Open Source Evidence-Based Research Synthesis Software with Machine Learning Support

31 Jul 2023 (modified: 01 Aug 2023) · InvestinOpen 2023 OI Fund Submission
Funding Area: Critical shared infrastructure / Infraestructura compartida critica
Problem Statement: Decision-making and public debate are best served when policymakers have access to the best current evidence on an issue. ‘Evidence synthesis’ is the process of bringing together information from a range of sources and disciplines to inform debates and decisions on specific issues. Because evidence synthesis is time-intensive, DataKind built and released ‘Colandr’ in 2017 to bring the power of data to bear on this challenge. Originally built by DataKind volunteers and our partners, Colandr is a free, open-source web application for conducting evidence reviews. It can be used by collaborative teams of any size and provides an organizational structure to manage information throughout the entire evidence review process. As of 2023, Colandr averages 1,000 daily active users, with nearly 7,000 unique users producing over 4,000 scientific reviews. Colandr also hosts an established and growing user and developer community (https://www.colandrcommunity.com/), where users have built supplemental software products on top of the Colandr core code base. An Invest in Open Infrastructure grant would allow DataKind to move Colandr into a new era of usability, GDPR compliance, and stability so it can continue to grow as a critical tool and inform debate on how to tackle the world’s greatest challenges.
Proposed Activities: In this project, we plan to significantly refactor and update all aspects of the Colandr platform. The front-end web UI work will require UX, JavaScript, CSS, and HTML skills. The back-end work will require multiple data science skills, including natural language processing, search optimization, and data labeling. Additionally, resources will need to be devoted to optimizing storage, of both data and documents, and content delivery across a geographically dispersed user base. Working with Colandr's partner, the Center for Biodiversity and Conservation at the American Museum of Natural History, DataKind has compiled a list of the improvements the Colandr web application needs. Our first task is to tackle the 30+ issues we’ve identified, which together represent a significant refactor and rebuild, with solutions ranging from no-code changes to extensive programming. This work will improve the functionality of Colandr and bring it into General Data Protection Regulation (GDPR) compliance. Colandr has built a significant European Union user base and is not currently compliant with the GDPR, which limits its ability to fully serve the scientific community in the EU. Our aim is to upgrade Colandr in the following key areas:
- New GDPR compliance features
- New and improved user profile and account experience
- Improved screening function for uploaded source materials
- Improved upload functionality for source materials
- Improved data extraction performance and accuracy
- Improved import and export capabilities
- New interoperability features for related systems
- New ability to donate to the upkeep of the Colandr web application
A final goal for this phase of the project is to significantly improve the current documentation and code repository so that they adhere to best practices for open-source projects seeking contributions from a robust developer community.
It is clear that the Colandr community has an interest in supporting the growth of the Colandr application, both financially and by contributing code. We anticipate this project would require 18 months to complete, with a start date of 1 December 2023.
Openness: Colandr is an open-source, open-access tool. When created, it was designed to be a commercially competitive, yet free, tool to democratize access to this type of synthesis technology, which is accessible almost exclusively to those with academic credentials. Colandr has been a highlighted project in Hacktoberfest, DigitalOcean’s annual engagement for developers on open-source projects. Furthermore, Colandr has a highly engaged community, including developers who have built on top of the Colandr core code base to create supplemental product solutions. As a key part of open research infrastructure across government agencies, nonprofits, academia, and enterprise, Colandr is licensed permissively (MIT license), with an open code repository. Philanthropic investment in refactoring and expanding Colandr will allow us to keep this tool free and open source. This funding will also allow DataKind to submit and manage Colandr as a Digital Public Good as recognized by the Digital Public Goods Alliance. DataKind will engage the broader user community in developing the Colandr product backlog to ensure that the changes being made reflect community interests and needs. DataKind will also formalize a Colandr advisory committee to represent those needs and interests, and will engage the community through ongoing training opportunities and evergreen training materials.
Challenges: DataKind and collaborators see few challenges in carrying out this work and have worked to mitigate those concerns as follows. Our chief concern is obtaining funding; receiving the Invest in Open Infrastructure grant would alleviate a significant amount of this concern and allow us to proceed confidently in solution creation. Our secondary concern is the potential impact on our user community. To that end, we will work with key stakeholders to craft a messaging plan and will conduct any production environment change-over during a low-utilization period, such as a holiday or weekend. Since its launch in 2012, DataKind has mobilized a global community of 20,000+ skilled volunteer technologists and staffed an in-house technical team. This collective force has successfully completed 350+ projects and delivered more than $35 million in pro bono services. Our ability to augment an in-house technical team by leveraging a global community of skilled volunteers gives us confidence in our ability to succeed in this project.
Neglectedness: DataKind previously sought funding for Colandr 2.0 through RFPs hosted by Fast Forward Labs and AWS. Neither application was successful, and neither funder was willing to provide us with specific feedback on the application.
Success: Success for this proposed work will be measured in three key ways. First, completion of the activities: Colandr has not had a significant refactoring and release since it was launched nearly six years ago, and improving and modernizing the codebase will be a key deliverable. Second, user satisfaction: we expect the proposed work to lead to significant improvements in the user experience, and we will survey users to understand how they view the changes. Finally, usage: we will measure the numbers of new and repeat users in the six months following the release of Colandr 2.0.
Total Budget: USD $25,000
Budget File: pdf
Affiliations: Center for Biodiversity and Conservation at the American Museum of Natural History, World Wildlife Fund, University of Dresden
LMIE Carveout: No. The Colandr tool was designed to make a computer-aided research function, evidence synthesis, accessible to researchers regardless of economic means. As a free and open-source tool, it democratizes access to science and the scientific research process by design. It was launched at a conference held in Colombia and was recently highlighted at ICCB 2023 in Kigali, Rwanda. Colandr has a robust global user base that includes researchers (academic and practitioner) from LMIEs.
Team Skills: Caitlin Augustin, Ph.D., Vice President, Product and Programs at DataKind, was part of the initial team that built Colandr in 2017 and is an active team member at DataKind. Her knowledge and community connections will be vital to the success and expanded adoption of the project as Colandr 2.0 is released. Burton DeWilde, Ph.D., Senior Data Science and DevOps Engineer at DataKind, was part of the initial team that built Colandr in 2017 and is a longtime and committed consultant to DataKind. His skills and history with DataKind are essential to the success of Colandr 2.0. Larry Kilroy, Head of Technology at DataKind, has over 20 years of experience building free and open-source products at nonprofit institutions. Larry’s front-end and back-end skills will be key to ensuring the launch of Colandr 2.0. Samantha Cheng, Ph.D., is DataKind’s key external collaborator and a senior biodiversity scientist at the World Wildlife Fund. Dr. Cheng was part of the initial Colandr development team and has been a key champion of Colandr throughout the years. Her engagement will ensure sustained support of Colandr 2.0 in the research community.
Submission Number: 174