Help! ¡Ayuda! Providing local software documentation in Spanish and other languages

01 Aug 2023 (modified: 01 Aug 2023), InvestinOpen 2023 OI Fund Submission
Funding Area: Critical shared infrastructure
Problem Statement: Open Science is a global movement, yet it faces language-based exclusion because most of its resources are in English. This affects scientists and research software engineers (RSEs) working in R, particularly those in Latin America and the Caribbean who are part of the TRACE-LAC and rOpenSci communities. rOpenSci provides community support, standards, and infrastructure to develop, maintain, and publish high-quality open-source scientific software. TRACE-LAC builds a high-quality, open-source, and interoperable data toolkit for epidemic analytics, and grows an engaged user community, to inform decision-makers responding to epidemics in Latin America and the Caribbean. rOpenSci's multilingual efforts aim to lower access barriers, democratize access to quality resources, and increase opportunities to contribute to open software and science. We have successfully piloted our Spanish-language software peer review and the localization into Spanish of our comprehensive guide to software development, and a Portuguese translation is underway. The next step is to develop infrastructure for multilingual documentation of R packages, benefiting TRACE-LAC and non-English-speaking scientists worldwide. Not addressing language barriers could perpetuate exclusion and limit the engagement and participation of researchers from Latin America. It may hinder their ability to contribute to open software and science and, in turn, impede informed decision-making in response to epidemics in the region.
Proposed Activities: R is the programming language used by the rOpenSci and TRACE-LAC communities. Each organization develops and maintains a suite of open-source scientific R packages, and the two organizations collaborate on developing and publishing these suites. There is currently no solution for multilingual documentation in R packages. This project proposes to create, test, and document the infrastructure needed to provide local documentation for R packages in several languages. The project is divided into the following activities and stages:
● Develop, in a public repository, a minimum viable R package that renders the local documentation (manual pages) of any other R package in a chosen language and installs that package with its documentation in that language, ensuring a good experience for developers and translators (6 days).
● Create functions, in a public repository, that generate placeholder documentation in a language other than English and automatically produce a first translation of it with the DeepL API (3.5 days); a person will then review and edit this translation. Starting from a machine translation automates repetitive tasks while keeping human supervision, so researchers and developers can focus their time on tasks that add value to the software and its documentation. We have already used this approach successfully to localize rOpenSci books, subtitles, web pages, and blog posts.
● The TRACE-LAC team's release schedule is: Sivirep 0.1 (Sep 2023), Serofoi 0.1 (Oct 2023), and Vaccineff 0.0.1 (Nov 2023). All three packages will be ready to use, with complete documentation in English and at least one round of user testing before release. We will apply the documentation translation workflow, including human review of the automatic translation and documentation of the installation process, to at least one of these three packages of the TRACE-LAC epidemics package suite (7 days).
● Record feedback in the package's public GitHub repository during the pilot implementation to improve the package's features, documentation, and workflow (7 days).
● Publish the developed package(s) and the package(s) with multilingual documentation on R-universe (1 day).
● Write and cross-post multilingual (Spanish and English) blog posts about the tools developed and the project results (1 day). All rOpenSci blog posts are shared on social media, in our newsletter (2000+ subscribers, with a 50% open rate), and through our RSS feed (consumed by other large publication venues such as R-bloggers).
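The placeholder-and-fallback logic behind the first two activities can be sketched in a few lines. This is a minimal, language-agnostic illustration in Python, not the planned R package: the `HelpPage` record, the topic names, the `needs_review` flag, and `fake_translate` (a hypothetical stand-in for a DeepL API call) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class HelpPage:
    text: str
    lang: str
    needs_review: bool = False  # True for machine drafts not yet checked by a person


def make_placeholders(
    english: Dict[str, str],
    translated: Dict[str, HelpPage],
    lang: str,
    machine_translate: Callable[[str, str], str],
) -> Dict[str, HelpPage]:
    """Return one page per English topic in `lang`.

    Existing translations are kept as they are; every missing topic gets a
    machine-translated draft flagged for human review.
    """
    pages = dict(translated)
    for topic, text in english.items():
        if topic not in pages:
            pages[topic] = HelpPage(machine_translate(text, lang), lang, needs_review=True)
    return pages


def resolve_page(
    topic: str,
    lang: str,
    english: Dict[str, str],
    pages_by_lang: Dict[str, Dict[str, HelpPage]],
) -> HelpPage:
    """Return the page in the requested language, falling back to English."""
    page = pages_by_lang.get(lang, {}).get(topic)
    if page is not None:
        return page
    return HelpPage(english[topic], "en")


def fake_translate(text: str, lang: str) -> str:
    """Hypothetical stand-in for a DeepL API call; it only tags the text."""
    return f"[{lang}] {text}"


# Hypothetical help topics of a package.
english = {"read_cases": "Read case data from a file.", "plot_cases": "Plot weekly cases."}
spanish = {"read_cases": HelpPage("Lee datos de casos desde un archivo.", "es")}
pages = {"es": make_placeholders(english, spanish, "es", fake_translate)}

print(resolve_page("plot_cases", "es", english, pages))  # machine draft, needs_review=True
print(resolve_page("plot_cases", "pt", english, pages).lang)  # en (no Portuguese pages)
```

Keeping the review flag on each page, rather than on the whole language, would let a translator see exactly which topics still need a human pass.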
Openness: All software packages developed in the rOpenSci suite (https://ropensci.org/packages/) and by TRACE-LAC (https://github.com/epiverse-trace) come with an Open Source Initiative (OSI) approved license. rOpenSci also develops and maintains resources, such as books, guides, and lectures, that are freely available under a Creative Commons license. Software produced by the rOpenSci community enables primary research in many disciplines. Our work has been cited in well over 1000 publications, and our software has been downloaded over 15 million times. Our software peer-review system has reviewed and accepted more than 200 packages. Our impact also extends beyond software: our peer review system inspired the Journal of Open Source Software (JOSS), pyOpenSci, and the software papers of Methods in Ecology and Evolution, and has led to changes in practice at government agencies such as the US Geological Survey and the UK government. We welcome code and non-code contributions from new and seasoned researchers and developers at any career stage, from any sector and region. Our Contributing Guide (https://contributing.ropensci.org/) and Code of Conduct describe how people can contribute to rOpenSci and how they may benefit. TRACE-LAC publishes collaboration guidelines in its GitHub organization, and its suite is also available through rOpenSci's R-universe platform (https://epiverse-trace.r-universe.dev/builds).
Challenges: R does not currently support multilingual documentation in packages. We will create a solution that provides documentation and help pages in multiple languages. We anticipate technical challenges, which we can overcome thanks to our team's expertise and past successes in developing software and workflows for localization, translation, and multilingual content. We also anticipate social challenges in making our setup known while keeping it compatible with the rest of the R package development infrastructure; we are convinced that the pilot use of the software and workflow by TRACE-LAC developers will help address this difficulty. Future work will involve creating workflows that make it more straightforward to update documentation jointly across languages.
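One way a joint-update workflow could detect outdated translations is to record a fingerprint of the English text each translation was made from and flag pages whose English source has changed since. The sketch below assumes this hashing approach, a hypothetical `recorded` mapping, and hypothetical topic names; it illustrates the idea and is not an existing rOpenSci tool.

```python
import hashlib
from typing import Dict, List


def source_hash(text: str) -> str:
    """Short fingerprint of an English help page."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def stale_translations(english: Dict[str, str], recorded: Dict[str, str]) -> List[str]:
    """Topics whose translation was made from a different English version.

    `recorded` maps each translated topic to the hash of the English text it
    was translated from. Topics without a translation, and topics no longer
    present in English, are not reported.
    """
    return sorted(
        topic
        for topic, old_hash in recorded.items()
        if topic in english and source_hash(english[topic]) != old_hash
    )


english_v1 = {"read_cases": "Read case data from a file."}
recorded = {"read_cases": source_hash(english_v1["read_cases"])}
english_v2 = {"read_cases": "Read case data from a CSV or Excel file."}

print(stale_translations(english_v1, recorded))  # []
print(stale_translations(english_v2, recorded))  # ['read_cases']
```

Storing the hash alongside each translated page keeps the check cheap enough to run in continuous integration whenever the English documentation changes.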
Neglectedness: We applied for, and obtained, funding for the previous stages of this multilingual publishing project. We received funding from the R Consortium (8000) to develop R packages providing a complete translation and file-creation pipeline, from “path to a file in one language” to “path to a file in another language”, for Hugo or Quarto content, as well as the creation and update of a DeepL technical glossary. We were also awarded a NumFOCUS Small Grant (10000) and CZI funding (5000) to pilot the localization workflow using the packages developed with the R Consortium grant (babeldown and babelquarto). With this support, we translated into Spanish our package development guide and other artifacts of our peer-review process. This funding allowed us to pay reviewers and editors, write translation and localization guides, access the DeepL API, and organize outreach activities such as the project's webpage, blog posts, and community calls. TRACE-LAC has never had funding to localize its packages.
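The “path to a file in one language” to “path to a file in another language” step can be illustrated with a small Python sketch. It assumes the filename-suffix convention that Hugo supports for translation by filename (for example `about.es.md`), with the default-language file carrying no language tag; the function name `translated_path`, its `source_lang` parameter, and the example paths are hypothetical, and this is not the babeldown or babelquarto API.

```python
from pathlib import PurePosixPath
from typing import Optional


def translated_path(path: str, target_lang: str, source_lang: Optional[str] = None) -> str:
    """Map a content file to the path of its translation.

    Follows a filename-suffix convention: `intro.md` (default language)
    becomes `intro.es.md`; `intro.es.md` with source_lang="es" becomes
    `intro.pt.md` when target_lang="pt".
    """
    p = PurePosixPath(path)
    if not p.suffix:
        raise ValueError(f"expected a file with an extension: {path}")
    stem = p.stem  # e.g. "intro" or "intro.es"
    if source_lang and stem.endswith("." + source_lang):
        stem = stem[: -(len(source_lang) + 1)]  # drop the ".es" language tag
    return str(p.with_name(f"{stem}.{target_lang}{p.suffix}"))


print(translated_path("content/blog/intro.md", "es"))  # content/blog/intro.es.md
print(translated_path("book/intro.es.qmd", "pt", source_lang="es"))  # book/intro.pt.qmd
```

Deriving the output path from the input path, rather than asking the translator to name files, keeps the translated files where a multilingual site or book build expects them.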
Success: We will consider the project successful if we have: a solution for generating multilingual documentation in R packages (both rendering it and kick-starting it with an automatic translation), piloted in the TRACE-LAC packages selected for this test; the test results available in the open repositories and in the workflow documentation; the packages, workflows, and other project products published; the outreach activities completed; other packages in the TRACE-LAC and rOpenSci suites using our solution and workflow; and other R-related communities using our solution.
Total Budget: 14110
Budget File: pdf
Affiliations: rOpenSci and TRACE-LAC
LMIE Carveout: The TRACE-LAC team is located in Colombia. On the rOpenSci side, Maëlle Salmon is located in France, and the rest of the rOpenSci Multilingual Publishing project team is in Argentina and Brazil. TRACE-LAC community members are researchers and software developers from Latin America and the Caribbean. rOpenSci has a global community of contributors, and we are carrying out a series of activities and projects to ensure that our research software serves everyone in our communities, with a focus on people from historically and systematically excluded groups who are interested in contributing to rOpenSci and the broader open source and open science communities. The multilingual publishing project is part of these activities, starting with Spanish and Portuguese because they are major languages in Latin America.
Team Skills:
- Yanina Bellini Saibene, rOpenSci Community Manager, has expertise in community-driven translations (R for Data Science, Teaching Tech Together, The Carpentries' lessons) and is in charge of the rOpenSci multilingual project. She has expertise in R development, including non-English packages (for the agricultural sector and educational tutorials). She holds leading roles in several communities (R-Ladies, The Carpentries, R Consortium ISC, LatinR).
- Maëlle Salmon, research software engineer with rOpenSci, has expertise in R package development, as shown by her maintenance of the rOpenSci package development guide and her creation of the R-hub blog, and in the infrastructure for building package documentation websites, as a co-author of the pkgdown R package. She was in charge of the technical infrastructure for our guide's translation: she created and maintains the babeldown and babelquarto packages, which respectively translate Markdown content through the DeepL API and render multilingual Quarto books.
- Zulma M. Cucunubá, PhD in Infectious Disease Epidemiology, is a professor of Epidemiology at Universidad Javeriana in Colombia. She is co-lead of the TRACE-LAC project, “Strengthening the tools for the response, analysis and control of epidemics in Latin America and the Caribbean”. As part of TRACE-LAC, she has led the development of several R packages aimed at improving data analytics for epidemic response in the LAC region.
How Did You Hear About This Call: Word of mouth (e.g. conversations and emails from IOI staff, friends, colleagues, etc.)
Submission Number: 195