Projects in Open Seeds

Adomas Aleno (Unsplash License)

Participants join the Open Seeds program with a project that they are either already working on or want to develop during the program, individually or in teams. Project ideas can range from solving technical questions to creating an open data project or report, developing an open source software project, writing an open publication, facilitating community/team culture movements, advancing open educational resources, or contributing to other existing projects/communities.

Projects per cohorts

All projects

Name Participants Keywords Mentors Description Cohort Status Collaboration
Evaluating criteria of “best paper” awards from research journals across disciplines Malgorzata Lagisz Open Science, Education, Outreach, Meta-research, Big Data, Data Science, Bibliometrics Juyeon Kim This project aims to create a template for an engaging Open Science course tailored to the needs and abilities of high school students. It will use a catchy premise of learning Data Science to introduce students to good (open) scientific practices and explain the importance of such practices. The teaching materials will be composed of several modules, each including a short presentation on key topics, readings, quizzes, hands-on activities, and a mini-project. For the latter, we will apply Data Science techniques to investigate science itself, specifically a corpus of scientific literature (Big Data, accessed via free Dimensions accounts). Students will plan and conduct meta-science projects on chosen topics. In other exercises, students will track the life cycle of a scientific publication and get into the shoes of one of the researchers. We will explore some of the fundamental issues that modern science and scientists face, including incentives, fraud, questionable research practices and biases, and how science intersects with the broader society and policy. Overall, this course will deepen students' understanding of science as a collective activity and a complex system with many players, helping young people to navigate their careers in research and develop critical skills for other career paths. 6
Open science spread Dario Basset Alessandra Candian The project consists of finding the right ways (media, news and tools) to spread the concept of open science and attract new adherents. 6
Open collaborative journal Pol Arranz-Gibert Arielle Bennett, Diego Onna I want to create a platform to publish scientific articles from all science disciplines that would allow: - Open peer review – by sharing reviews with the names of reviewers, we ensure a fairer reviewing process - Open post-publication review – allowing people to comment on articles to share their views, experience reproducing results, etc. - Free publication - Random assignment of reviewers to articles – ensures fairness in the publication process and allows junior scientists to participate in the process 6
Development of a Multi-Adaptor Quality Control Kit for Integration with Radiographic Equipment Umar Farouk Ahmad Elisee Jafsia It was reported that “there are over 4000 x-ray machines in Nigeria with less than 5% of them under any form of regulatory control of the Nigerian Nuclear Regulatory Authority (NNRA)”. There is, therefore, a need for proper Quality Assurance (QA) testing during installation and control tests at regular intervals, as this will further help to reduce repeated radiographs and rejected films, thereby saving cost and reducing patient doses. This project is aimed at determining the level of QA compliance across radiology centres in Kano State and then developing a robust, multi-adaptor quality control kit for all forms of x-ray equipment. This will help to develop local capacity, improve healthcare and promote wellbeing for Nigerians of all ages, in line with Sustainable Development Goal 3. 6
A FAIRer training module on Open Science Elena Giglia Scholarly communication, Open Science, FAIR data management, training Emma Karoune Objective: to provide a FAIR-by-design training course on Open Science. The project aims at creating FAIRer teaching material on my subject of expertise, Open Science, and at improving my teaching skills by attending your course on opening up every step of the research cycle alongside researchers in different fields. 6
Ethical standards and reproducibility of computer models in Neurobiology Susana Roman Garcia ethics, neuroscience, open-research, computer models reproducibility, open source software Siobhan Mackenzie Hall Working with the Alan Turing Institute, I want to create a framework that allows questioning of ethical standards and reproducibility of Computer Models in Computational Neurobiology, both specifically within my PhD work and as an example for others to learn from. - My PhD project work includes working with open source software (MCell, BioNetGen and Biodynamo) to create biological predictions of how memory works. It is therefore both intrinsically embedded and important that the work I create for my PhD is as accessible and collaborative as possible. - I plan to implement the Turing Way, Turing Commons and Data Hazards principles as I think about the ethical implications of my work. This way, I will use my PhD as a case study and share it with the wider peer community. - Explore these questions by designing seminars and workshops with people knowledgeable in different fields, e.g., a workshop where people with expertise in Ethics research and people researching Computational Neuroscience come together. - Offering workshops and collaboration cafés where we can work through these topics with an intersectional and collaborative approach, offering different solutions to problems that arise. 6
Open Data Custodianship tool Jennifer Ding civic tech, data collection, urban data science, mapping Malvika Sharan ODC (Open Data Custodians) is an open tool for connecting public domain Github repo data to a DB/API. Inspired by the BigScience Data Governance Framework, this project seeks to empower repo maintainers to do more with their data and create new pathways for responsible data sharing in the open AI ecosystem. 6
Life Science in 2022, India. Nilabha Mukherjea Generation Z researchers, Research Trends, Open Community, Life Science, 2022 Anne Treasure Generation Z stands to become the new entrants into the world of research from 2024. I believe new systems and easier ways of accessing information with more vocal communities are needed to usher in the new generation. But to do so, the new bachelor and high school students must be the recipient of effective scientific communication to help them realize the existing communities and opportunities in life science research in India. As a penultimate bioengineering student, my exposure to existing communities is recent and new. This fact allows me to understand and appreciate the importance of such communities in guiding students with a passion for research in life sciences. Life Science in 2022 India is a concept that seeks to unite parallel enterprises by individuals across the nation under one banner. The goal is to provide a complete and interconnected picture of Life Science research and communities in India. Simultaneously, as a founder of a Bioengineering Chapter in my institution, I will be applying our findings to build and train my colleagues and juniors to have the tools and awareness to choose their next steps in research. Finally, we measure the success of this project through the successful publication of a paper that encompasses the findings and suggestions for building a better and more robust open community of life science research in India. 6
An open support resource for primary mental health for researchers. Shamim Wanjiku Osata Mental health, researchers, Tools and techniques, open resource Mayya Sundukova Mental health among researchers is a huge, neglected burden that has a significant impact on their lives and the people around them. The majority of the focus on mental health is on capacity building for mental health research or offering sustainable mental healthcare. The lack of conversations around how students undertaking research fail to complete projects or progress beyond certain levels in research shines a bright light on how severely the mental health of researchers is neglected. It is therefore paramount to create awareness and provide tools catered to researchers' needs. The project focus is to develop an open online support resource for primary mental health catered to researchers, particularly scientific researchers. This platform will contain educational materials concerning emotional health, which sits at the core of mental health. It will also contain self-assessment tools and techniques for developing mental resilience. In addition, the resource will contain a peer support network that will offer help and emotional support. The overall objective is to provide help, progress monitoring and increased understanding of the importance of mental health and wellbeing among researchers. 6
Mapping the Open Life Science Cohorts Saranjeet Kaur Bhogal Mapping, Visualisation Esther Plomp, Fotis Psomopoulos This project is about creating an interactive map that locates all the past and present mentors, mentees, experts, and speakers who participated in the Open Life Science (OLS) programme. The map will be created using the Shiny R package. It will help show the geographical reach of the programme. The aim will be to create a map that automatically updates whenever new information is stored in the database. It will also have an option to display only the mentors, only the mentees, only the experts, or only the speakers. If time permits, the project will be further expanded to visually summarise other information, such as a table with the areas of expertise of the mentors, experts, and speakers. 6
Developing effective online communities to build tools for genomic equity Marie Nugent community building, crowdsourcing, collective intelligence, engagement, participatory research, genomics medicine, bioinformatics, healthcare, data science, population genetics, diversity, equity, health outcomes Lisanna Paladin, John Ogunsola The Diverse Data initiative at Genomics England intends to build community spaces to: facilitate effective engagement; enable crowdsourcing of tools and collaborative opportunities; and build relationships and understanding across communities around the complex nature of diverse data for genomic medicine. These communities will involve a range of researchers, clinicians and healthcare professionals, patients, publics and cultural partners. 6
AGAPE: Building an open science practicing community in Ireland Nina Trubanová, Cassandra Murphy, Aswathi Surendran community, introductory course, early career researchers, practitioners, Ireland Sara Villa Under AGAPE, we would like to build a community of open science practice to grow our work to date in a way that can catalyse changes in researchers’ perceptions of open science across Ireland and later internationally. Our first step was creating a massive open online course (MOOC) by early career researchers (ECRs) for ECRs and other academics. The content of the MOOC is currently under review and is planned to launch in the late summer. We hope to grow the learning experience further by hosting workshops and events where individuals can learn from experts and share their own experiences about open science practices. 6
Open sourcing the Pyxium learning platform Hari Sood education, pedagogy, social justice, online learning Nadine Spychala Over 2020/21 I was working on launching a social justice learning platform called Pyxium: It became apparent after working on it for a while that this should exist as an open, free and community driven platform, rather than a private for-profit enterprise. The codebase is currently private, written exclusively by me. There is a lot of work that needs to be done to prepare it for open sourcing - I see it as fitting into a few different categories: * Technical: Commenting throughout the code, cleaning the code up, clarifying folder/file structure, rewriting confusing functions/variables, ensuring basic security * Collaboration: Documentation, contributing guidelines, code of conduct, role distribution * Governance: Strategy, project management, community roles, contribution process and requirements * Community: Engaging developers (and educators, SJ advocates, designers…) with the project And probably much more! 6
Open Science Community Nigeria (OSCN) Umar Ahmad, Mahmood Usman Open science, Community, Nigeria Caleb Kibet Inspired by the Open Science (OS) movement, which focuses on making science more open and improving the quality, accessibility, and efficiency of science and education, Open Science Community Nigeria (OSCN) was established to provide a space that will encourage and promote openness, transparency and reproducibility in science among scientists, networks of researchers and societal stakeholders in Nigeria. There are quite a number of initiatives in Nigeria, such as AI Hub, Nigeria, Python Nigeria and R User groups, that mainly promote data science and computational sciences. However, no single community exists that instils responsible conduct of research, promotes the principles of open science, or establishes guiding principles, core values or policies that support scientists, researchers, and societal stakeholders to meet, inspire and co-create important things together as a community. Thus, we aim to establish and develop an open science community in Nigeria where newcomers and experienced peers interact, inspire each other to adopt Open Science practices and values, identify opportunities and pitfalls, and provide feedback on policies, infrastructure, and support services, as is done in other regions of the world. We aim to specifically target scientists, researchers and students who are passionate about open science but have little to no experience with open science principles and practices. 6
Developing an ontology to connect open science technology, RSEs and the different skills and services that follow Megan Lisa Stock, Nina Roux, Rhoné Roux, Ariana Bethany Subroyen Ontology, RSE (Research software engineer), Semantic technology, Data science, Artificial Intelligence (AI), Knowledge engineering Kim Martin Our project aims to map the relationship between people, skills and technologies, structured under the AI and data-science landscape, in order to develop an ontology that can assist Research Software Engineers (RSE’s) to better manage their teams and affiliates. It also aims to enable better collaboration within the RSE community and between research institutions. This project is driven by the growing need for RSE’s by researchers, as research is becoming increasingly reliant on data-driven technologies, and within the academic community itself, there is an emphasis on providing reusable, reproducible data which is not a skill that most researchers have. By creating a semantic overview of the interaction between skills, technologies and the key role-players in the RSE community, this ontology will be able to provide information on the research fields that each RSE has background knowledge in, their development skills, and the services that they are willing to provide clients, enabling a way in which suitable RSEs can be assigned to projects where they will be able to provide software development and software-development-related assistance. This project also aligns with the thinking supported by Open Science, as the ontology is aimed to be reusable and reproducible. 6
Mapping the RSSE landscape in Africa Nomalungelo Octavia Maphanga, Anelda van der Walt, Peter van Heusden Research landscape, RSSE, RSE, community Joyce Kao, Anne Lee Steele This project aims to identify African research groups that are heavily involved in in-house research software development and could potentially benefit from being part of RSSE-Africa and the larger global RSE movement. This may, for example, include data science research initiatives at universities, bioinformatics or astronomy groups, computational social sciences and digital humanities groups, and many more. Furthermore, the project will identify open communities of practice related to research software and systems engineering, such as RLadies, Python User Groups, and HPC User Groups that could potentially supplement the support RSSE-Africa is providing. For communities of practice, we will collect information on several attributes, including their focus area, communication platforms, activities offered, and more. This information will be shared in a Google Spreadsheet on the RSSE-Africa website and will be open for the community to contribute to and update as necessary. The RSSE-Africa website will simultaneously be updated to improve content and navigation. The project will build on work done by members of the RSSE-Africa community and the Research Software Alliance (ReSA). 6
Open Communities: building a supportive community of practice across the AI for Multiple Long Term Conditions Research Support Facility. Eirini Zormpa, Sophia Batchelor community building, training, health research, artificial intelligence, collaboration, reproducibility, reprohacks Lena Karvovskaya The AI for Multiple Long-term Conditions (AIM) research programme consists of seven consortia, all working with artificial intelligence methods to understand multiple long-term medical conditions. The Research Support Facility (AIM RSF) is a collaboration aimed at supporting those consortia. Our project aims to foster a community formed from the AIM RSF and seven AIM consortia and support collaboration from three core approaches: - supporting the upskilling of researchers so that their work may meet the highest ethical, technical, and reproducibility standards - ensuring that existing technical and domain expertise is well-documented and understandable to a wide audience - leading by example and empowering community members to champion open and collaborative research practices. We propose to do this through hosting workshops and events centred around collaborative work. We will start by collating and contributing to training materials around open science and reproducibility and supplement with workshops on tools that support openness, collaboration, and reproducibility. We aim to host regular “Collaborations Cafe” sessions, where community members exchange knowledge, propose projects, and get feedback. Ideally, our project would culminate in a ReproHack, where teams check the computational reproducibility of research, provide feedback, and further develop their technical skillset. 6
Building a Platform for Open and Reproducible Super-resolution Imaging Hardware and Analytical Tools Ran Huo, Moritz Engelhardt Bioimaging, super-resolution, open hardware, open microscopy, image analysis Sara El-Gebali Super-resolution microscopy (SRM) bypasses the diffraction limit and makes the nanoscale visualization of subcellular structures and dynamics possible. Yet the complexity and high expense of SRM setups often obstruct access to the sub-diffraction information for biologists. As a team working at the interface of optics and cell biology, we are motivated to build up an online platform that documents our homemade, powerful, and cost-efficient super-resolution imaging systems and analytical tools that are currently running and under development in our group, as well as to contribute to the expanding global community of open science. The proposed platform will provide adequate information for researchers in need of reproducible SRM. It can benefit cell biologists who are interested in advancing their research with the assistance of super-resolution imaging techniques, such as single-molecule localization microscopy (SMLM) and super-resolution optical fluctuation microscopy (SOFI) that our team focuses on, but lack the experience in selecting optical components, constructing light microscopes and processing the data acquired. We plan to build the platform around three main topics: SRM open hardware (including the open control software), image analysis tools, and an educational core around optics. 6
OpSciHack: Open Life Science Hackathon Harini Lakshminarayanan Open Science, Implementation hurdles, Hackathon, Life Science Yo Yehudi OpSciHack is an open science-focused hackathon that we, Open innovation in life sciences (OILS), would like to organize as an annual event. The hackathon aims to solve problems in open science and open science problems. In each iteration of the hackathon, we would like to focus on one pillar of open science (OS) and develop solutions to encourage a culture of OS in research communities. At the hackathon, participants can either come up with their own problem statements or choose one to work on from our list. The questions will focus on: (1) difficulties that prevent them from fully implementing and practicing some or all aspects of OS in their science networks/institutions/universities, and (2) problems that can be solved through OS. OILS will provide support to the participants by tapping into the OS network in Switzerland and introducing them to necessary experts and tools. An example of such a problem could be the absence of a searchable data repository in their university that contains links to all published datasets from the participating research groups; such an infrastructure would encourage the generation and sharing of well-annotated data. At the end of the event, ideally, the participants would have actionable steps and solutions that can either be directly implemented by them or proposed to competent authorities. Starting out at the Swiss national level, we hope to grow OpSciHack to involve global participation. 6
The hub portal and academy Laurah Ondari, Pauline Karega, Gladys Rotich, Ken Mugambi Stephane Fadanka The Hub portal and academy seeks to create a curriculum and a data portal targeting young researchers and incoming biology students, to equip them with data management skills, project planning skills and basic bioinformatics skills. Pauline Karega has already kickstarted the hub portal, where we hope to gather undergraduates interested in bioinformatics and connect them to a platform where they can interact and get connected to opportunities, which will be automatically gathered from sources such as Twitter using keywords and integrated into the backend of the hub portal at BHKi. The initial value proposition will be to create an academy that gives basic scientific research training and skills to these students, who will be embarking on their research projects in the final year of study; we hope to equip them with basic programming skills that will help in analysing biological data and guide them through designing and planning a research project. Eventually we hope to evaluate the uptake of knowledge through hackathons and collaborate with high-ranking institutions for our mentees. 6
Making Science simple and accessible Romullo Lima, Fabio Ivo Perdigão Online platform, compendium, didactic schemes, didactic figures, multidiscipline Andrea Sánchez Tapia, Gracielle Higino The idea is to create an online platform that hosts any sort of didactic material. It can be a site that gathers, in an organized way, different educational materials to be used by university professors or schoolteachers with their students. At first it can be a link aggregator for materials found on the internet, but in the long term the site will be open to researchers to upload their material in a form editable enough that others can translate it into their own language. This material will be accessible so that different professionals can download it, and they will be encouraged to rate and comment on the material used. To accomplish this, the idea needs at least two moderators to approve the material uploaded, a curator to organize the site, and a digger to find links and people who want to upload their material. 6
Preparing IceNet stakeholder engagement framework for open and collaborative development Alden Conner polar science, sea ice, conservation, climate change, environmental science, product management, product roadmap, stakeholder map Mallory Freeberg As part of a NERC proposal that would fund three years of further work on the IceNet project, I have proposed a work package called “Demonstrating and deploying real-world solutions”, with the first part of that work package consisting of stakeholder engagement leading to creation of product roadmaps for both the IceNet forecast and the supporting digital infrastructure. I would like to outline a plan for performing this stakeholder engagement openly via the GitHub repository, leading to collaboratively-generated user requirements for the final products. I will consider how to reach stakeholders and engage them in open discussion on GitHub, and how to organise that work to generate a mutually agreed-upon set of user requirements. Stakeholders will include scientists such as polar researchers and AI researchers, who will contribute to requirements for the forecasting capabilities, as well as additional end-users such as conservation researchers and indigenous communities, who will help shape software requirements. 6
An extensible notebook for open specimens Nicky Nicolson biodiversity informatics, species description, specimen citation, research management, record linkage, document production Andrea Sánchez Tapia, Batool Almarzouq This project is developing a prototype “extensible notebook for open specimens”. This is a link-aware editor for semi-structured data based on personal knowledge management software (Obsidian). This environment plus standard open science tools (reference management tooling and pandoc document production) could help the adoption of open science principles amongst biodiversity researchers. The project is split into three main areas of investigation (effort so far has been focussed on the first): 1. Working environment: can we extend personal knowledge management software to reference biodiversity-relevant data classes (in a similar way to how bibliographic citations are managed)? - We have developed a set of Obsidian plugins which facilitate easy access to the data resources needed to (a) work with existing species descriptions from literature and (b) recognise and formally describe new species. (Entry for the forthcoming Ebbe Nielsen Challenge) 2. Review environment: can we generate snapshots for peer review/publication? 3. Publication environment: can we package data for harvesting into data aggregators? We aim to enable researchers to develop the “digital extended specimen”, but without being prescriptive about their workflow: open to access and publish the necessary data, but also open to choose how to organise their work. 6
Multiomics profiling and analysis of cardiovascular diseases Rushda Patel ethics, neuroscience, open-research, computer models reproducibility, open source software Hans-Rudolf Hotz Cardiovascular diseases (CVDs) are the leading cause of death around the world and account for nearly 32% (2019) of all global deaths. The advancement in omics technologies over recent years has provided a deeper understanding of the molecular processes and dynamic interactions involved in diseases, which has helped in identifying various diagnostic and prognostic biomarkers along with therapeutic targets. In my project, I aim to build a tool to carry out multi-omics analysis and profiling encompassing major CVDs using data from relevant datasets. The biological interests of this tool would be the identification of differentially expressed genes, proteins, metabolites, and transcripts, progression-associated genes, SNPs, pathways and networks involved, survival analysis, and small molecule identification. It would also include a user-friendly repository of data from research articles which would be easy to navigate. This tool would be a one-stop solution platform for CVD multi-omics that would help researchers in drug discovery, unveil disease mechanisms, identify biomarkers, 6
The Undergraduate’s Guide To Research Software Engineering Aman Goel research software engineering, open science, open education, community Mariana Meireles The Undergraduate’s Guide To Research Software Engineering aims to provide an open-source, dynamic, and accessible collection of resources on Research Software Engineering to undergraduates and newcomers interested in knowing more about the field. The project aims to develop resources mainly around the following four areas: 1. Information and Background of Research Software Engineering * This area would cover all the necessary context, history, background information, and the current situation of the Research Software Engineering movement across the world. 2. Training and Education Resources * This area would cover all the necessary resources and materials to help develop the skills required by a Research Software Engineer. * It would cover existing resources and could be expanded to include new material. 3. Job Board for Entry Level Positions * This area would primarily provide entry-level job listings in the field of Research Software Engineering and Open Science to lower the entry barrier for newcomers. 4. Support and Community Engagement Resources * This area would provide support to newcomers in the form of open access community platforms and aim to provide help from experts on a case-by-case basis. 6
Developing policy briefs on mental well-being of researchers in academia across different countries Mayya Sundukova Natalie Banner 6
Bioinformatics Secondary school Outreach in Nigeria Emmanuel Adamolekun Bioinformatics, Students, data analysis Michael Landi Bioinformatics Secondary School Outreach (BSSO) is an initiative to develop bioinformatics capacity among high school students in Nigeria. It will create early interest in genomics data analysis among the students and equip them with the relevant skills and knowledge in Bioinformatics. Bioinformatics Hub Nigeria will train these students to use Bioinformatics tools and pipelines by establishing Bioinformatics research clubs in the visited schools to facilitate the trainings. We will be working alongside other sister organizations to achieve this goal. 6
Collective and Open Research in Climate Science Angelo Varlotta Planetary Sciences, Biology, Chemistry, Global Changes Sara Villa The idea is to facilitate the creation of an environment (a working group or, eventually, a community) where researchers, students, and citizen scientists can collaborate on research projects and articles, or just share their work, based on open science principles. There are examples of open research environments in the scientific community, in particular in the machine learning field. Examples are mlcollective, Machine Learning Tokyo and Neuromatch Academy. Those communities, run by volunteers, created a friendly environment and study groups for machine learning practitioners, as well as curricula to introduce students to data science. For example, the mlcollective effort led to the publication of many projects. Eventually, these examples could be used to create or simply design (white paper/report) a similar project in other research areas. 7
Implementing Facial Recognition and Biometric Attendance Monitoring in Educational and Corporate Settings Richard Dushime Attendance Tracking, Facial Recognition, Biometric Technology, Educational Settings, Corporate Settings, Data Management, Data Privacy, Ethics, Machine Learning, Artificial Intelligence, Python, Django, Web Development, Cloud Computing, Data Visualization, Project Management Yo Yehudi This project aims to design and implement a system for tracking attendance using facial recognition or biometric technology in educational and corporate settings. The project will begin by reviewing current attendance tracking systems and analyzing various facial recognition and biometric technologies. Based on this analysis, a suitable technology will be chosen and a prototype system will be developed. The project will also consider the ethical and privacy implications of using such technology and develop strategies to mitigate any potential risks. The end result will be a functional system for tracking attendance and a report detailing the technical, ethical, and privacy considerations involved. This system will provide a reliable and efficient solution for tracking attendance in educational and corporate settings while addressing ethical and privacy concerns. 7
From invisible to Citizens: Apparent Age for Primary School children without birth certificates Elisee Jafsia, Stephane Fadanka, Nathanael Kedmayla, Yanick Diapa Nana, Hylary Emmanuela Ndegala Nhana, Babari A Babari Michelle Freddia Apparent Age, Children, Process, Determinants, Machine Learning, Birth Certificate Julien Colomb Age fraud scandals have tarnished African sport at the international level. In 2009, FIFA adopted MRI as an age determination tool for its youth level competitions, but recent studies show that bone age assessment has a very large margin of error and cannot be a standard for determining apparent age, given that environmental factors greatly influence the development of the child. There are populations in which the disappearance of growth cartilage occurs 4 to 5 years earlier than in others. The ugly face of the problem is that many children are born outside the healthcare system. In Cameroon, approximately 1.7 million children, or 66%, do not have a birth certificate. These children are said to be “invisible”. In other words, they live without being recognised by the country whose citizenship they claim. According to a study carried out by Unicef, 40,000 children, in the Far North region alone, will find it difficult to present their end of year First School Leaving Certificate (FSLC) for lack of a birth certificate. To establish these birth certificates, it will be necessary to make an estimate of their apparent age in a complex process which discourages parents. It is evident that the determination of apparent age should not only be seen from a clinical standpoint. Instead, socio-cultural and environmental factors must also be taken into account. To remedy this situation, which has reached the alert level, it is urgent to propose a simplified and open system for determining Apparent Age, based on available data and Machine Learning techniques, to turn invisible kids into visible ones. 7
Development of an online open science platform, easy and accessible for agricultural students in Cameroon. Lessa Tchohou Fabrice Open Science, Agriculture, Cameroon Gladys Rotich The main aim of the project is to contribute to quality and adaptable research work through the practice of open science in Cameroon. Specifically, it will: - Develop a platform that will connect researchers, students, potential funders and research institutions; - Empower students with skills like research methodology, proposal writing (in the context of Cameroon), and grant writing and management; - Provide job placement opportunities for freshers 7
Data analysis in soil physics: an opportunity of teaching data management and reproducibility Sara Acevedo, Carolina Giraldo Olmos Soil Dataset, Soil Data Reproducibility, Soil Physics Diego Onna The core idea of this project is to create an online repository that hosts examples, datasets and demos for soil science data management. Stakeholders will include early-career soil scientists who are not familiar with programming, task automation and open science principles. Overall, we plan to develop online materials that help students understand how open science, the use of scripts and soil science can be merged to create a skillset that is essential in research. 7
An interactive map of socio-ecological projects Nahuel Escobedo Socialecology, Sovereignty, Ecology, Environmental Yo Yehudi We seek to bring knowledge to the general public and provide tools for change in order to generate a natural-human environment that is beneficial to both parties. The dissemination of socio-environmental and biodiversity conservation projects, actions and problems is a key point to achieve this objective. Today, with so much information available, but at the same time dispersed across different platforms, there is more than ever a need to concentrate and centralize it, so that it is easily accessible. In this text the fundamental concepts that define us will be explained and we will present the first version of a web tool that provides information on different socio-environmental projects and details them in an Interactive Map, ideally managed by contributing users. 7
Knowledge and perspectives of open science among researchers in Ghana, West Africa Bernard Kwame Solodzi Open Science, Os, Open Access, Low Middle-Income Countries, Ghana, Unesco, Open Science Knowledge, Open Science Challenges Johanssen Obanda Open Science in Africa: The Ghanaian narrative. UNESCO and its stakeholders perceive the Open Science concept as an essential component of the advancement of human development through increasing accessibility of the scientific process. To that end, UNESCO adopted the Open Science recommendations in 2021 to define shared values and principles for OS, identify concrete measures towards OS and provide an international framework for OS policy and practice. However, uptake and adoption of open science in Africa, and for that matter Ghana, has been low, denying researchers and the country in general the benefits that these practices accrue. The goal of this project is to assess the knowledge base, understanding and perception of open science in the Ghanaian research space while considering the challenges faced by researchers practising open science. Findings of this study will help establish gaps in the practice of open science by researchers and advise areas of intervention. 7
Open Science Awareness Project for Central Asian Universities Saule Anafinova Open Science, Central Asia, Developing Countries Saranjeet Kaur Bhogal I would like to conduct a series of open science awareness trainings on Zoom, within which I would like to analyze the developing context of Central Asia and how open science responds, or does not respond, to the regional science development needs. I would like to share my findings with Central Asian colleagues. 7
Open Science platform for humanities and computational social scientists Aditi Dutta Open Research, Open Data, Accessibility, Reproducibility, Humanities, Social Science, Computational Social Science Riva Quiroga, Laurah Ondari Open Data (OD) plays a crucial role in openly sharing scientific knowledge with a worldwide audience (among both academic and non-academic audiences) and encouraging reproducibility. But OD is much less developed in Humanities and Social Science (HSS) research, primarily because of the levels of sensitivity of the data involved, and consent required from participants, which could be difficult for researchers, especially if they are new to data sharing. With the growing interest in computing, researchers working in Computational Social Science (CSS) face similar issues when working with social data in their research, while conforming to legal and ethical restrictions. As per OD guidelines, data must comply with the FAIR principles. But the metrics offered by various OD repositories and publication platforms could be different, which limits the affordances of social activity and the type of OD published. To embrace Open Science projects among HSS and CSS researchers who are working or looking to work with sensitive data, this project proposes a platform (or hub) to promote free, online resources for both professionals and students looking to learn about science and ethics. Here, they can also contribute to discussions related to the topic, post questions, and learn from others’ experiences. 7
Open NeuroScience Arielle Bennett Neuroscience is a vast field of research, ranging from single-cell recordings in animals and primates to electrophysiological and behavioural research in humans and work with patients suffering from brain diseases. This big and diverse field has the unifying goal of studying the underlying structure and function of the brain and uncovering its secrets. Each of the areas in neuroscience uses a variety of research practices and, while the results they yield are qualitatively different, their ultimate integration towards answering the big ‘how the brain works’ question is of paramount importance. Open science has a big role to play in this exchange and integration of information. Towards that end I would like to examine closely the degree to which the practices of Open Science are used in each of the main areas of Neuroscience and write a scientific report. This report will include a literature review, examples of specific cases in which Open Science practices are already being used, as well as challenges that researchers may be facing while applying these practices on a day-to-day basis. Finally, the report will conclude with final remarks and future directions. 7
Translate Science Daniel Chan, Jennifer Miller Translation, Database, Academic Culture, Publishing, Governance, Global, Multilingual Ailís O’Carroll Translate Science is an open volunteer group interested in improving the translation of scientific texts. The group has come together to advocate for and support work on tools, services, and policy for translating science. This encompasses a range of activities to help translations including: providing information, networking, designing and building tools, and advocating for seeing translations as valuable research output. A core activity has been the development of the Translation Switchboard, an open source web application to discover scientific translations. The members of Translate Science are from different backgrounds and motivations, but we are interested in collaborating. For our group, the term “scientific texts” has a wide spectrum of forms and can mean anything from articles, reports and books, to abstracts, titles, keywords, and indices. We also consider summaries of research for non-research audiences as scientific texts. The project we would like to undertake within the OLS program is to define the scope and mission of the organization and develop shared governance that will have sustainable legitimacy. This project has taken on new urgency with the passing of the organization’s founder, Victor Venema. 7
Ethical Implications of Open Source AI: Transparency and Accountability Gift Kenneth Ethics, Open Source Ai, Transparency, Accountability, Bias, Machine Learning, Governance, Data Privacy, Algorithmic Fairness, Open Data, Responsible Ai, Explainability Isil Poyraz Bilgin Exploring the ethical implications of open-source AI is a topic that deals with understanding the potential consequences of making AI systems and their underlying data and algorithms available to the public. One of the main ethical implications of using AI is that it can potentially lead to bias and discrimination, as the data and algorithms used to train these systems may reflect societal biases. Additionally, open data could also raise concerns about data privacy, as the release of sensitive data could lead to negative consequences for individuals and organizations. However, open source AI can also be used to promote transparency and accountability in AI systems. Making the data and algorithms used to train AI systems publicly available allows for greater oversight and monitoring of these systems to ensure that they are not being used in unethical or harmful ways. Additionally, open source AI can foster community development and collaboration, allowing for transparency and for a wider range of perspectives and expertise to be brought to bear on the development and use of AI. In summary, open source AI has both potential benefits and drawbacks and it is important to consider the ethical implications before implementing it. Therefore, responsible AI governance and social responsibility are crucial. 7
Open-source object recognition software specifically designed for mice in Alzheimers disease Amir Jafari Nina Trubanová Here are the general steps to developing an open-source object recognition software for mice in Alzheimer’s disease studies: 1. Gather a diverse dataset of images of mice commonly used in Alzheimer’s research. 2. Preprocess the dataset by cropping, resizing, and converting images to a consistent format. 3. Train a CNN model using the preprocessed dataset with libraries like TensorFlow or PyTorch. 4. Evaluate the model’s performance on a test set of images to determine its ability to recognize mice in different scenarios. 5. Integrate the model into a software application for analyzing images of mice in Alzheimer’s disease studies, including object detection, tracking, and annotation. 6. Test the software on a diverse set of images and make necessary adjustments. 7. Share the code and dataset on an open-source platform such as GitHub for other researchers to use and improve. To the best of our knowledge, this is an almost exact overview of the process; the actual development may require more steps and resources depending on the specific requirements. 7
Open educational hands-on tutorial for evaluating fairness in AI classification models Mariela Rajngewerc Training, Educational, Hands-On Tutorial, Machine Learning, Fairness, Bias Analysis, Classification Andres Sebastian Ayala Ruano, Sabrina López The main objective of this project is to develop a hands-on tutorial for evaluating the fairness of machine learning models in different contexts. Specifically, it aims to generate open-source class material (pipeline, slides, code) and a trainer user guide to develop and reproduce the tutorial. It is expected to be a ~3 hour class where the tutor presents different datasets and different approaches to fairness. The analysis will focus on understanding which definition of fairness is appropriate in each context, and on understanding the relations between the selected evaluation method and the implications it could have for disadvantaged groups when deployed. On one hand, the focus will be on alerting and understanding how to evaluate fairness and, on the other hand, on how to do it using open-source libraries. The code part will be developed in Python using mainly the Scikit-Learn and Fairlearn open-source libraries. 7
Comparison of three growth curves for the diagnosis of extrauterine growth retardation at 40 weeks postconceptional age and associated factors in preterm infants in the city of Yaoundé, Cameroon Sandrine Kengne Preterm Infants, Growth Curves, Evaluation, Extrauterine Growth Failure, Associated Factors Elisee Jafsia Extrauterine growth retardation (EUGR) is defined as weight or height below the 10th percentile relative to the mean for the child’s age and sex at 40 weeks post-conceptional age (PCA). The objective of this work is to analyze the applicability of growth curves (Intergrowth-21, Fenton and WHO) for the assessment of extrauterine growth at 40 weeks post-conceptional age and the factors associated with extrauterine growth retardation in preterm infants in Cameroon. This will be an analytical study of all low birth weight preterm mother-infant pairs present in referral hospitals in the city of Yaoundé whose mothers have given their consent to participate in the study between June 2022 and March 2023. For the diagnosis of EUGR and its associated factors, we will first take the anthropometric parameters of the children at birth, at 36 and at 40 weeks PCA, as well as information on the parents or their environment. At the end of this analysis, we will be able to choose the appropriate curve for the evaluation of the extrauterine growth of preterm infants with low birth weight in Yaoundé and to determine the factors associated with EUGR. 7
Euro-BioImaging Scientific Ambassadors Program Beatriz Serrano-Solano Imaging, Microscopy, Community, Ambassadors, Champions, Community Building Joyce Kao Euro-BioImaging is a research infrastructure that offers open access to imaging technologies, training, and data services in biological and biomedical imaging. Euro-BioImaging consists of imaging facilities, called ‘Nodes’, that have opened their doors to all life science researchers. There are informal community leads within the nodes’ staff and users that often raise awareness about Euro-BioImaging and the provided services. However, there is no official program at the moment that acknowledges them as formal ambassadors. This project aims at launching a Scientific Ambassadors Program for Euro-BioImaging, which will officially reward such contributions and encourage new leads to emerge in the community. 7
momenTUM Research Platform: An open-source, reproducible research infrastructure for digital health Manuel Spitschan, Anna Magdalena Biller, Marco Ma Digital Health, Open Source, Smartphone App, Reproducibility, Experience Sampling, Field Studies, Psychology Rowland Mosbergen Over the past year, we built the new research platform momenTUM, which consists of an iOS/Android app for delivering interventions and questionnaires to participants in research studies and server software to configure trials, accept data and register them into a REDCap server. Based on the previously developed schema-app, the app uses reproducible JSON files to specify the timing, order and properties of interventions and questionnaires sent to the end-user device. We are beginning to use momenTUM in our research studies examining human sleep and circadian physiology and believe it will be valuable to the scientific community. Currently, while there are various commercial and sometimes pricy solutions to design smartphone-based interventions, we believe that developing an open-source solution is the right thing to do. Our diverse team – consisting of two software developers and two researchers – shares this ethos. Through our participation in OLS-7, we will advance our project, grow our network and learn how to build a community around the project. 7
Investigating effective strategies for radical inclusion at academic events Siobhan Mackenzie Hall, Daniel Kochin, Carmel Carne Inclusion, Awareness, Access, Intersectionality, Equal Opportunity Malvika Sharan We are a group of researchers from the University of Oxford looking to establish a discussion about pushing the boundaries of what it means to host an inclusive academic event. This discussion cannot happen in a vacuum and we are mindful of our limited experience and expertise in this space. As such, we intend to take a proactive stance and give voice to thoughts, opinions and lived experiences of conference organisers and attendees with inclusivity concerns. We intend to work towards a publication which is a compilation of our investigation into multiple mediums for collecting viewpoints, debate and strategic implementations. As a first step in achieving this aim and an effective pilot of the discussion points, we have recently hosted our first round of focus groups where we addressed the following topics: * Who are we not seeing at conferences, and what would it actually take to get them there? * Implementation of considerations for constantly changing and sensitive geopolitical issues? * Support for visas and other regulatory considerations * Enforcement of harassment policies and embedding of harassment prevention methods at the earliest planning stages Through this endeavour, we connected with interesting and passionate people. We set up a mailing list where we share regular updates on the ongoing work, and excitingly, we have a list of people potentially interested in collaborating on this project. We hope to draw on their perspectives and experiences in developing a publication and any other potential outputs that will help incite real, tangible change in the space of academic events. 7
Queering the law: How to make AI with a Queer Perspective from the Majority of the World Umut Pajaro Velasquez Gender, Queer, Theory, Majority World, Framework, Ai Milagros Miceli For an AI framework with a queer perspective from the Majority World that is going to be used by technologists, final users, policymakers, sociologists, activists, and everyone involved with Artificial Intelligence, we must consider using a common language in all stages of the foundation of any AI: design, development, deployment, and detection of biases. Furthermore, this common language should revolve around the concepts of fairness, accountability, transparency, and ethics, because of the actions these could bring in setting the AI foundation and making it less harmful for queer people from the Majority World. This is going to demand that all of us involved in this process think outside the box and bring together queer and Majority World theories, methods, and methodologies, as well as the use of MLP, NLP, neurosciences, trans theories from the Majority World, body theories, cultural studies as a methodology, interviews, focus groups, participatory design, documentary or case analysis, field research, law-making, regulatory processes, and others, to create this framework. To be understandable, it needs to include language that can transform specific harmful stereotypes into regulations, formal definitions, and actions that can benefit not only queer people but all of us. 7
Data management materials for ELIXIR Estonia Diana Pilvar Data Management, Data Life Cycle, Dmp, Fair, Open Science Patricia Herterich Online data management materials for researchers should include the following: * Introduction on FAIR principles * Walkthrough of data life cycle * Expanding different life cycle stages with relevant info * Plan - Requirements for data management in Estonia and Europe * Process, Analyse – links to local HPC capabilities * Preserve, Share – links to local and generic repositories * Reuse – Information on how to write good metadata, READMEs, licensing * Open Science * DMP * Current requirements * Programs available for writing DMPs * Tips for writing a good DMP 7
Build community Doaa Abdelkader Open Science, Research Data Management, Research Service Michael Landi Open science community in Egypt: I intend to build up an Open Science community in Egypt to raise awareness among researchers of the importance of Open Science and of how we will guide them to make their research more organized and manageable, increase their visibility, and meet the FAIR principles (Findable, Accessible, Interoperable, Reusable). It will also be in line with Egypt Vision 2030. We will have support from FAIR points to meet researchers’ needs by providing workshops, webinars, and conferences to guide them to the right path. Further, Open Science (OS) is an umbrella term that is used to describe movements and practices aimed at increasing the accessibility of scientific knowledge and fostering innovations. To demystify these principles, we briefly describe six of its core components. 7
The Impact of Sleep Deprivation on Mental Health Deborah Udoh, Bisola Ahmed Sleep Deprivation, Mental Health, Illness Aswathi Surendran Millennials and Gen Zs have recently become so obsessed with being online and informed 24/7 that it has encroached on sleep time. We tend to attach more importance to skin care, beauty products, exercise and many other ways to live healthily but fail to acknowledge the importance of a good night’s rest. The need for sleep tends to be underestimated. With the evidence garnered from cumulative research works, one can boldly assert that sleep deprivation has a profound effect on one’s health. This project pays particular attention to the negative impacts of lack of sleep on mental health. Thus, the work will include various sleep disorders and other causes of sleep loss. It will touch on the normal sleep cycle and factors that distort this cycle. The aim is to provide adequate evidence to enable people to make an informed decision about getting better sleep. 7
liforient Agien Petra Ukeh Artificial Intelligence, Machine Learning, Data Set, Skills, Degrees, Career Path, Internships, Scholarships, Programmes, Schools, Qualifications, Requirements Caleb Kibet This project is to help every user, be it parents, children or other adults wishing to follow a particular career path, find the right pattern towards achieving their life goals. The platform uses a top-down or a bottom-up approach to achieve this goal. Using the top-down approach, a user can be interested in a particular field, say artificial intelligence. The user then keys in the words artificial intelligence and the system gives a detailed guide to all that is required of the field, such as the academic qualifications and natural skills needed. The program also includes various study programs, schools, scholarships etc., differentiating whether each is remote, onsite or hybrid and taking note of the locations of the available programs. Each program will also have fee allocations and other details provided by the dataset fed into the system. Statistics such as average salary, working hours etc. will also be provided, as well as related fields; if a user clicks on one of these, the system will redirect to the clicked field. The bottom-up approach uses qualifications, be they academic or natural, to direct an individual towards available careers for those qualifications. Say a student has just passed the Advanced level in Biology, Chemistry, Computer Science and Maths. The system will also prompt the student to give natural qualifications such as great organizational skills, working alone or in a team, interpersonal skills etc., from which a roadmap to the various available careers based on those qualifications will be given as a guide for the student to follow. Other information such as scholarship opportunities, internships, schools, programmes etc. will be given in the system as well. 
This system is to take data from contributors all over the world, and updates will be made every now and then to ensure accuracy of the data. 7
Building a searchable project database Angelica Maineri Project Database Lena Karvovskaya Funders, consortia and RIs dealing with a large array of different smaller projects often find it difficult to clearly communicate to the outside what kind of projects they support. While there are efforts to build repositories that focus on data, software or research outputs as primary units, there is less attention to projects as focal points that lead to further outcomes (e.g. data, software, journal articles). I want to build an open, searchable database which connects project descriptions to project members, keywords, and related outcomes. As a use case, I want to build a searchable database of past OLS projects. 7
Closer to the sky: co-creating astronomical knowledge in a favela of Rio de Janeiro Claudia Mignone, Arianna Cortesi, Gabriela Rufino, Maria Clara Heringer Lourenço Astronomy, Space Science, Public Engagement Of Science, Community Building, Science-Art Gracielle Higino “Closer to the sky” is an astronomy public engagement project in Rio de Janeiro, Brazil that builds upon workshops for children based on the practice of non-violence developed by the astronomy club, initiated by A. Cortesi and C. Araujo. It entails two parts: 1) The collective exploration of astronomical images and space-based images of Earth (available from open access databases) with children, artists and people in a favela (slum) in Rio. Children will use space imagery to create poems, music, performances and street art together with local artists, to promote a positive and healthy vision of their environment. By exploring images with computers at the local cultural centre, they will practice digital skills while handling data containing information to observe the environment. 2) The creation of a digital map of cultural projects in Rio favelas, including the new artwork and educational material, along with locations linked to the history and practice of astronomy in the city. The project is informed by different aspects of the Open Science framework, from opening a dialogue with the community to creating science and educational material together. The interactive map, all content, lessons learnt and documentation on how to run similar initiatives will be released as open source projects. 7
Mapping open-science communities, organizations, and events in Latin America Jesica Formoso, Irene Vazano, Patricia Loto Open Access, Open Source, Open Science Communities, Spanish-Speaking Communities, Mapping, Visualization Alexander Martinez Mendez This project aims to identify communities that actively participate in the implementation, training, and dissemination of open practices such as open access, open data, open science, and open educational resources in the Latin American region. We will create a repository that will collect information regarding main areas, topics of expertise, and geographical location, as well as the contact information of the identified organizations and communities. Then, we will create a website or web-based application that will feature a searchable list and an interactive map interface. Organizations interested in being added to the knowledge base will be able to do so through a pull request on GitHub. The project foresees an automatic update of the list and the map every time a pull request is approved after the new information is curated. 7
Developing an online snake venomics catalog for easy access to venom proteomics data Carol O’Brien Online Platform, Open Research, Snakebite, Venom, Proteomics, Neglected Tropical Disease Billy Broderick Snake venoms are incredibly complex mixtures of many different proteins called toxins. Understanding which particular toxins are in the venoms of each snake is important information for researchers working with snakes, particularly for those developing new anti-venoms against snakebite. There have been many previous studies which have identified which toxins are in which snakes. This is incredibly valuable information for researchers, but the data is trapped in tables, scattered across many different scientific papers. These papers are themselves often trapped behind journal pay-walls. In this project, I propose, firstly, to create a database of snake venom proteomics studies and, secondly, to create an online platform which will allow easy access to, and visualisation of, the proteins that have been identified in each snake venom. I hope to create a tool which will be useful to anyone interested in understanding snake venoms. 7
Biodegradability of Drilling Fluids in different soil types Chukwuka Ogbonna Biodegradability, Drilling Fluid, Soil, Petroleum Hydrocarbons Stephane Fadanka Drilling fluid is a lubricant used in drilling oil and gas wells and in exploration rigs. It may vary with additives aimed at optimizing and improving the drilling process (Sadiq et al., 2003). Drilling fluids are mixtures of natural and synthetic chemical compounds applied to cool and lubricate the drill bit, clean the borehole bottom, drive cuttings to the surface, control formation pressure and improve the function of the drill string and tools in the borehole (API, 1992). The decline of the environment is presently a common and overwhelming challenge globally (Gazali et al., 2017). Studies have shown that uncontrolled disposal of drilling fluid wastes, with accompanying toxic components, contributes heavily to diminishing biodiversity (Vincent-Akpu, 2015; Vincent-Akpu and Sikoki, 2013; Sil et al., 2012). Most drilling firms, especially on-shore, prefer to use land spraying (or throwing) as a method of disposing of the waste drilling muds. Such unsustainable methods could lead to some hostile environmental impacts (Gazali et al., 2017). Yet biodegradation is relied on as a natural and cost-effective option for pollutant breakdown and pollution control (Nrior et al., 2017). The aim of this study is to investigate the biodegradability of drilling fluid in different soil types. This study, in addition to specifying the ecotoxicological effect of the drilling fluid pollutants in the soil, will chiefly show which soil type most efficiently promotes the biodegradability of drilling waste and, by implication, provide reliable data for policy makers as regards environmental reclamation. 7
Building an open, structured, knowledge listing system Mike Trizna Knowledge Management, Data Hubs, Awesome Lists, Quarto, Version Control, Schema Validation Pauline Karega The project I have in mind for this program has 2 major components. The first component is to build a resource listing system, likely based on Quarto, that will take data from YAML files and convert it to a listing on a static web page that allows for filtering and sorting. The website actually does an incredible job of displaying structured lists that come from YAML, but uses Jekyll to do this. I also intend to build out a system in GitHub Actions that will accept and validate new entries that come through a form or GitHub Issues. The second component would be to create a training program that teaches how to use the system, while also teaching the concepts of version control, metadata schemas, and continuous integration to complete novices. 7
LA-CoNGA Physics Citizen Science Project Alexander Martinez Mendez Citizen Science, Climate Change, Research Seedbed, Programming Melissa Black The LA-CoNGA Physics Citizen Science project is a collaborative experience that aims to educate young people about climate change from an open science and data science framework. The project aims to expose secondary school students in Colombia, Ecuador, Peru and Venezuela to data science research environments and tools at an early age. Through the creation of research workshops and the data generated in a network of weather stations, students will be guided in understanding open science concepts, climate change, statistics and programming. Ultimately, it is hoped that by carrying out a project, the students will be able to take action to address the effects of climate change in their communities. 7
Facilitating easier academic selection of research students using ML Sivasubramanian Murugappan Research, Students, Supervisor, Machine Learning Harini Lakshminarayanan Our project is to build a machine learning-based tool to match the research students with the most suitable supervisor for them and vice versa by receiving their requirements and expectations. Our ML model uses the responses received from users and classifies them on a high-to-low suitability scale. 7
Bioinformatics University Outreach Emmanuel Adamolekun, Seun Elijah Olufemi, Ayomide Akinlotan Bioinformatics, Open Science, Outreach Michael Landi Bioinformatics is an interdisciplinary field that combines computer science, statistics, and biology to analyze and interpret biological data. As the amount of biological data generated continues to grow rapidly, there is an increasing demand for professionals with expertise in bioinformatics. To help meet this demand, we propose a university outreach program focused on developing bioinformatics capacity among university students in Nigeria. This will create early interest in genomics data analysis among the students and equip them with the relevant skill set and knowledge in bioinformatics. We will also raise awareness of open science among the university students. We will be working alongside other sister organizations to achieve this goal. 7
Practical Guide to Reproducibility in Bioinformatics Sangram Sahu Hans-Rudolf Hotz Reproducibility in life science has always been under debate. Although standard practices have lately been adopted in different aspects, they are sometimes not practically applicable by the end researcher. Reproducibility practices need to be used by all; only then will they be successful at both ends (research producer and research reader). To bridge this gap, a lot of practical training is required. This is where this project comes into the picture. Here we aim to create training materials for doing reproducible research in a more practical and application-oriented way, with a specific focus on niche areas of the respective research (for now, bioinformatics as a starting point). The project is not limited to materials: it also includes virtual training that promotes the practice of creating and sharing computational environments and analyses in a reproducible manner with systems in scaling. 2 graduated
Mapping Science using Open Scholarship Stefan Gaillard Malvika Sharan, Yo Yehudi I want to work out a way to use data visualization to map the current state of scientific knowledge within the life sciences. Some literature on mapping science already exists, but most was written before the machine learning revolution. The more contemporary literature suffers from other mistakes, such as flawed comparisons of studies (using the same terminology in different manners) and, most importantly, not all scientific research is easily available. Thus, Open Science should be an integral component of any effort to map scientific knowledge. In this project, I aim to outline which Open Science practices currently exist that facilitate the mapping of science, which Open Science practices are needed to facilitate the mapping of science, and what a prototype for a collaborative effort to map science could look like. 2
Creating community awareness of open science practices in phytolith research Emma Karoune Yvan Le Bras I have been conducting my own research to assess the state of data sharing in phytolith analysis. The results show that data sharing practices are minimal (articles are currently being written!). Therefore, I am aware that my discipline needs to focus on open science practices to develop more transparent and robust methodologies. I want to raise awareness of the need for open science practices, initially focusing on data sharing. This will involve distributing my current research findings widely through blogs and conference talks, and these efforts will enable the formation of community connections to initiate a working group for open science in my field. There are currently working groups for nomenclature and morphometrics in phytolith analysis as part of the International Phytolith Society. Initial efforts of the working group would be to investigate the best ways to share data, developing resources/training opportunities for students and early career researchers, and recommending/publishing guidelines for colleagues to follow. 2 graduated
Synthetic Biology Chassis Toolkit for Public Domain Use Stephen Klusza Anita Bröllochs A synthetic biology platform that is compatible with existing and future public-domain biotechnologies in the Biosafety Level 1 (BSL-1) organism Bacillus subtilis. Bacillus subtilis has the ability to secrete biomaterials outside of the cell and also undergoes sporulation under certain conditions, which provides a stable way of storing DNA at room temperature without a need for refrigeration. Despite the tremendous progress in the synthetic biology and biomanufacturing industries, most innovations are patented with various restrictions on use, which impedes global access to and use of these tools. A synthetic biology toolkit that exclusively uses technologies in the public domain is the first step towards providing scientists and non-scientists the tools to conduct science and create valuable biomaterials for their communities without restriction. 2 graduated
Creating a single pipeline for metagenome classification Muhammet Celik Mallory Freeberg Metagenomics has become increasingly important in studies of human and animal health. It has become clear that the bacteria in and on our bodies are very significant. Thus there is growing interest in the scientific community in studying metagenomics. When we want to do downstream analysis of a microbial community, the first step is to find out what the community ultimately looks like. My idea is to benchmark the state-of-the-art tools (such as Kraken2, Centrifuge and CLARK) and create a single pipeline that will produce a single PDF file with cutoff-significance-sensitivity values for each tool. Later we can set up that pipeline on a web-based system that would make this analysis easier for researchers. 2 graduated
Chronic Learning Bailey Harrington Piraveen Gopalasingam Chronic Learning is an open resource for academics and individuals who are trying to teach or learn something new; the scope is not limited to science. The project currently consists of a WordPress site (unlaunched), a Dropbox for material storage, and a Twitter account for dissemination. I own the copyright for all of the materials, and they are free to use for academic purposes, such as lecture slides or handouts. The materials are made up of a mixture of graphical abstracts, tutorials, and reference pdfs; I hope to incorporate more formats, such as videos, as well. 2 graduated
Open Innovation in Life Sciences Joyce Kao, Christina Ambrosi Aidan Budd Open Innovation in Life Sciences is an open science organization that promotes the practice of open science in the Zurich-area life science research community. We take a bottom-up approach focusing on early career researchers (ECRs) that complements the Swiss national and institutional top-down open science initiatives. We are currently establishing an official Swiss association in collaboration with Life Science Zurich and expanding our program to include open science workshops/courses for ECRs and public lectures to complement our annual conference and keep the discussion on open science active all year round. The association will be operated by ECRs working with a board of academic and industry leaders; thus it doubles as a career training program, adding hands-on experience of practicing soft skills (e.g. teamwork, communication, etc.) and learning non-scientific expertise (e.g. project management, budgeting, advertising, etc.) to enhance traditional Ph.D. and postdoctoral training. 2 graduated
Using collective action to change cultural norms and drive progress in the life sciences Cooper Smout Luis Pedro Coelho, Lilian Juma Open Science practices have the potential to benefit everyone in the life science community (and broader society), but their adoption is limited by incentive structures that reward ‘fast’ and unreliable science at the individual level. ‘Crowd-acting’ platforms (e.g., Kickstarter, Collaction) can overcome such collective action problems by organising ‘pledges’ for a particular behaviour, which are only acted on if and when a predetermined critical mass of support is met. By protecting individuals’ interests until such time that they have the support of their community, crowd-acting platforms can thus resolve the collective action problem and increase the uptake of the beneficial behaviour in question. Similarly, Project Free Our Knowledge aims to organise collective action in the global research community. Researchers can create and sign campaigns asking their peers to adopt a new behaviour, but only act on that pledge if a certain threshold of support is met (e.g., 1000 signatures). If and when the threshold is reached, everyone who has signed that campaign will be listed on the website and directed to carry out the new behaviour in unison, with the protection of their peers. 2 graduated
Developing the Research Software and Systems Engineering Community to support Life Sciences in Africa Peter van Heusden, Eugene de Beste Mesfin Diro, Raniere Silva Bioinformatics and other computational life sciences (e.g. disease modelling) rely on computing infrastructure built and maintained by software engineers and sysadmins/systems engineers. The RSSE Africa (Research Software & Systems Engineers) forum was established last year just prior to the ASBCB conference to provide a home for people whose main contribution to research is software and computing systems. The aim for 2020 is to grow the project in terms of visibility and participation. 2
Growing the Galaxy Community Beatriz Serrano-Solano Dave Clements Galaxy is an open-source framework that enables scientists to perform computational biology analysis. A worldwide community supports the Galaxy project, providing tools, workflows, computational resources and training materials in many different scientific areas. At the beginning of September 2020, I’ll be starting a new position at the University of Freiburg to help with community management within the European Galaxy team. The European Galaxy community has grown both in number and in fields, and now needs coordination across several countries that must work collaboratively to achieve common goals. My new role will include the responsibility of coordinating the European Galaxy and scientific community as well as strengthening the relationship with the Australian community, helping build the Asian and African communities, and organising events around Bioconda, BioContainers and Galaxy. With this personal project, I would like to advance my community building and project management skills while enhancing the connection with the communities outside Europe. 2 graduated
Global land-use and land-cover data under historical, current, and future climatic conditions for ecologists Tainá Rocha Bruno Soares, Yvan Le Bras Land-Use Land-cover (LULC) data are important predictors of species occurrence and biodiversity threats. Although there are LULC datasets available for ecologists under current conditions, there is a lack of such data under past and future climatic conditions. This hinders projecting the predictions from niche and distribution models over global change scenarios at different time periods. The Land Use Harmonization Project (LUH2) is a global dataset in raster format covering continental areas, and provides massive LULC data from 850 to 2300. However, these data are compressed in a file format (NetCDF) that is intractable for most ecologists. We aim to select the most useful LULC data for ecologists, transform the existing dataset into regular GIS formats, derive new LULC data often used in macroecology and ecological niche modeling, and provide these LULC data for thirty-seven time slices from year 900 to 2100, with the following temporal resolution: every 100 years from 900 to 1950, every 10 years from 1950 to 2010, and every five years from 2010 to 2300. 2 graduated
BioLab Open Source Projects Boï Kone, Bakary N’tji Diallo Delphine Lariviere BioLab Open Source Projects is the beginning of a platform where biomedical researchers and computer scientists will work closely together to address some of the main tropical infectious diseases through collaboration and sharing ideas. As a main source of problems in Africa, tropical diseases are affecting the socio-economic development of this part of the world. Now is the time to raise awareness of the problem by creating an environment where the only interest is sharing: sharing ideas, knowledge, skills, and opportunities in biology. The main challenge in African biomedical research is the lack of resources and of skill sharing, especially among the young generation of scientists, who most of the time do not have the orientation and information necessary to address these problems. The project aims to build a community around the main challenges surrounding tropical infectious diseases such as malaria and tuberculosis. Tackling the challenge of understanding the biology and the burden of these diseases requires more training opportunities and collaboration between students, young researchers, and potential supervising programs. We will try to build an open-source project around these challenges to better understand these diseases’ modes of interaction with drugs via bioinformatics tools. The project will provide training support to students and young researchers, bring them together to develop solutions for general health problems, and share those solutions for collaboration and training purposes. 2
Target approach in the stages of liver cirrhosis to reverse back in good condition Wasifa Rahman Sonika Tyagi, Malvika Sharan This is a brand new project on a targeted approach to the stages of liver cirrhosis. The project will identify the significant reasons behind the development of cirrhosis and aims to prevent progression through its stages as the damage continues. The major focuses for reversing the damage to a healthy condition are the enzymes, mutations, SNPs, and genetic factors related to disease progression. These are the key aspects for targeting the stages of liver cirrhosis and for identifying their role in healthy liver growth or in decreasing their damaging features. 2
Science For All (Sci4All) - Citizens’ daily problem solving by engaging multidisciplinary scientific communities Teresa Laguna, Laura Judith Marcos-Zambrano Katharina Lauer The project concept is to establish a collaboration between the general public and scientists, where citizens can present their daily issues and scientists can collaborate to try to find solutions for them. For that purpose, we aim to establish a multidisciplinary scientific network that helps citizens improve their daily lives. To start the project, we think it is more appropriate to first create local communities, starting in Madrid as it is the location of the applicants. In the future, the local network can be expanded or can occasionally contact other local communities, such as the ones already established by means of this program (Barcelona, Utrecht, Montreal), to work collaboratively on global public issues. Connections with companies can also be established (for example, to help in the production of prototypes), but with the requirement to use open licenses and follow the precepts of this program. 2 graduated
Supervised Classification with UMAP and Network Propagation Joel Hancock, Markus Kirolos Youssef Markus Löning Using UMAP and network propagation to make a fast, accurate and interpretable supervised learning algorithm. We will benchmark our novel method against other state-of-the-art machine learning algorithms using image analysis data and other single-cell readouts. Supervised tasks using features with many artifacts and strange distributions are typically solved using computationally intensive neural nets. This kind of data frequently occurs in morphological image analysis and bioinformatics generally. We would like to apply the technique to different datasets, investigate its theoretical underpinnings and see whether GPUs and other computational resources can be leveraged to improve the speed of classifications. We will publish the results in an open science journal and make the code available to the community. 2
Building a Database for Open Access Immunology Pre-prints and Publications on COVID-19 Harriet Natabona Mukhongo, Brenda Muthoni, Jelioth Muthoni Naomi Penfold This project aims to develop a database of open-access Immunology pre-prints and publications on COVID-19. As the COVID-19 pandemic unfolds, scientists are working round-the-clock to generate data, resulting in an upsurge of information on various aspects of the virus. Various publishers such as Lancet, Elsevier, Oxford, New England Journal of Medicine (NEJM), Nature, have open access early journal releases and publications on a wide range of fields covering the COVID-19 topic. Many efforts are still underway to develop a feasible vaccine with trials taking place at different research centers globally. An open repository of open access pre-prints and journal articles in the area of COVID-19 immunology and vaccinology will easily connect researchers on current data available for efficient communication of scientific research and open science principles. The repository will be divided into subsections of publications from the 7 continents to be able to easily identify how the disease is affecting different geographical locations. 2 graduated
Sharing 3D Modeling Workflows for Biomechanists and Palaeontologists Eva Herbst, Dylan Bastiaans Holger Dinkel We recently received some funding from our university to host a workshop about 3D modeling methods for biomechanics. This sparked the idea of creating some sort of online database of workflows in our field. This website will serve early career researchers or researchers starting out with biomechanical analyses, or those wishing to switch their workflows (for example to be more freeware based). Currently there is a need for a central repository of such workflows. Usually they are only circulated within lab groups or between collaborators. However, an online hub with modeling resources would significantly reduce the time researchers need to invest in choosing programs and figuring out how to set up their models, giving them more time to do research and develop new methods. For this project, we specifically want to build a website with open source workflows, tips and tricks for setting up models for finite element analysis, as well as other 3D workflows (such as fossil reconstruction in freeware programs). 2 graduated
Open Science Office Hours to support trainees in Montreal to help make their research more open and reproducible Kendra Oudyk Andrew Stewart The goal of this project is to establish Open-Science Office Hours, a meeting space where students in Montreal can come throughout the year to get help making their research more open and reproducible. This initiative will promote Open Education in life science, particularly in the area of neuroimaging data science. Students in this field often learn hard skills in open science at weekend workshops, summer schools, and hackathons, but there is a gap in the learning experience regarding long-term support. The main measurable outcome of this project would be having a group of senior trainees who take turns hosting Open-Science office hours at regular intervals throughout the year. These meetings would take place virtually and/or in person in Montreal, Canada (depending on COVID-19). Although we hope to spark this initiative, it would need community support to be sustainable. In the first stage, I would welcome collaborators to help plan. Later, we would need support in the form of funding so that we can compensate TAs; this would help enable less-privileged trainees to get involved as TAs. Finally, we would need to recruit the TAs. Thus, this project will involve community contributions at all stages, in various capacities. 2 graduated
Registered Reports in Primate Neurophysiology Danny Garside Ivo Jimenez In this new project I aim to explore the specific challenges (both practical and metascience related) to the adoption of registered reports in primate neurophysiology. As far as we are aware, there has not yet been a registered report within the field of non-human-primate (NHP) neurophysiology. There are specific challenges for using this format within this field; some practical and some metascientific. A central issue is that the existing format for registered reports focuses on hypothesis testing, whereas neurophysiology is quite often an exploratory and iterative process. Working with NHPs is a necessarily limited privilege and comes with considerable responsibility; the cost of having experiments where the results are not robust, or where opportunities for discovery are missed, is particularly high, since opportunity for replication is limited. Therefore, peer review in advance of performing an experiment seems a particularly valuable idea. This project would explore ways to find a compromise in the conflict between existing registered report formats and current methodologies in primate neurophysiology. 2 graduated
OSUM - Open Science UMontreal Samuel Burke, Andréanne Proulx, Pauline Ligonie, Myreille Larouche, Valerie Parent, Béatrice P.De Koninck Renato Alves We are building Open Science UMontreal (OSUM) - a student initiative - whose aims are to create an onboarding experience to the practice of open science (in all of its forms) for the predominantly French-speaking scientific community in Quebec (and beyond). We aim to attract as many stakeholders as possible (from scientists at the undergraduate level all the way to emeritus status), to unite those who already have a strong open science mindset, and to build our resources such that we become the first point of contact for the local scientific community. OSUM will discuss open science values, principles, and practices. By raising awareness of current issues, we hope this project will continue to drive the culture shift and help people realize how they can immediately benefit from using open and reproducible practices. Furthermore, we strive to navigate our institution towards becoming an institutional member of OSF, just as the University of British Columbia has done in the last year. As open science is a broad and wide-ranging concept with domain-specific definitions, we want to provide opportunities to meet and encourage the exchange of ideas within and across fields because we can all learn from each other. 2 graduated
APBioNetTalks Hilyatuz Zahroh Patricia Herterich APBioNetTalks is a new program to be launched by the Asia Pacific Bioinformatics Network (APBioNet) by mid-2020. It will serve as an online platform to host and stream bioinformatics-related talks, tutorials, and training. The program aims to make the learning of bioinformatics more inclusive and accessible. It will invite experts, early career researchers/scientists, and Ph.D. students from different countries and institutions to share their knowledge and skills with the audiences. Upon completion of the live streaming, the videos will be archived and available on demand. The program will help provide people with open bioinformatics resources and foster the growth of bioinformatics. Additionally, the program is expected to reach wider audiences and introduce more people to bioinformatics. 2 graduated
Database for coordinating training needs in Kenya to facilitate preparation and collaboration Pauline Karega Lorena Pantano, Sarah Gibson There are many universities in different counties of Kenya offering life sciences as part of their curriculum. There is, however, a difference in accessibility to resources. There are a number of science clubs in these universities. In light of the recent pandemic, there has been a change in how conferences and science events are being conducted, which has shown that online training is an effective mode of training and so is remote distribution of resources. With the increasing need to adopt online trainings and conferences, to narrow the gap in the distribution of resources to different institutions, and to encourage collaboration, I propose a platform that contains data on all science clubs in Kenyan institutions. The platform will allow submission of training needs from the different institutions to allow focused preparation, and it will also allow collaboration between students from various institutions. Based on the requests obtained, talks and trainings can be organized more efficiently. These can eventually transition into physical meetups. 2 graduated
Open platform for Indian Bioinformatics community Pradeep Eranti David Selassie Opoku The Bioinformatics community is one of the ever-growing communities in India, with a presence across the length and breadth of the country. Likewise, education and research programs are spread across different branches of bioinformatics. There is a need to establish open platform(s) where students, (early-career and established) researchers, and policy makers can exchange knowledge freely across these different areas through open science practices and principles. In realizing such a platform, the benefits of following open principles need to be widely communicated to raise awareness among the community by building pathways and encouraging contributors and ambassadors. The aim of this project is to explore the scope of and path towards establishing an open platform, and to provide opportunities to participate in, contribute to and organize open initiatives. 2 graduated
OpenWorkstation - A modular and open-source concept for customized automation equipment Sebastian Eggert Meag Doherty The OpenWorkstation project presents a modular and open-source concept to develop customized automation equipment. Inspired by assembly lines, the concept consists of ready-to-use and customizable hardware modules which can be plugged into the base frame. In contrast to current commercial and open-source standalone solutions, this concept enables the combination of single hardware modules, each with a specific set of functionalities, into a modular workstation that provides a fully automated setup. The base setup consists of a pipetting and transport module and is designed to execute basic protocol steps for in vitro research applications, including pipetting operations for non-viscous and viscous liquids and transportation of cell culture vessels between the modules. The successful application of this concept is presented within a case study by the development of a storage module to facilitate high-throughput studies and a crosslinker module to initiate polymerization of hydrogel solutions. By combining capabilities from various open source instrumentations into a modular technology platform, this concept has the potential to facilitate the development of customized automation equipment for efficient and reliable experimentation for in vitro research. Ultimately, the OpenWorkstation concept will empower academic groups to develop their own equipment to automate their research workflows. 2 graduated
The Turing Way - Guide for Ethical Research Sophia Batchelor, Ismael Kherroubi Garcia, Laura Carter Jez Cope, Anjali Mazumder The Turing Way project aims to provide all the information that data scientists in academia, industry, government and in the third sector need at the start of their projects to ensure that they are easy to reproduce and reuse at the end. The Guide for Ethical Research is one book of the project (complementing the other four which cover reproducible research, project design, communication, and collaboration). Producing an initial full draft of the Guide for Ethical Research aims to positively impact research in the following ways: 1. Equipping data science researchers to approach their work in a more reflective way and to think holistically about their research, its processes and outcomes. 2. Equipping scientists working in a range of fields to advocate for ethical research at all stages of the research process. 3. Compiling diverse research ethics resources in one area. 2 graduated
sktime - a unified toolbox for machine learning with time series Markus Löning Martina Vilas sktime is a new Python toolbox for machine learning with time series. We provide state-of-the-art time series algorithms and scikit-learn compatible tools for building, optimising and evaluating complex models, based on a clear taxonomy of learning tasks and clear design principles and patterns. We want to enable understandable and accessible machine learning with time series by providing instructive documentation and by building a friendly, collaborative and inclusive community. The aim is to unify the time series analysis field by providing a common framework for multiple learning tasks, bringing together contributors from academia and the wider data science community into a joint framework and embedding best practices into the time series analysis field. 2 graduated
Turing Data Stories Camila Rangel Smith, David Beavan, Sam Van Stroud, Kevin Xu Yo Yehudi Our goal at Turing Data Stories is to produce educational data science content through the storytelling medium for the general public. Our content will be split into different stories, each of which begins with an interesting and relevant real-world hypothesis and walks the user through the entire data science process - from gathering and cleaning the data, to using it for data analysis. The aim of the Turing Data Stories is to spark curiosity and motivate more people to play with data. The Turing Data Stories are detailed and pedagogic Jupyter Notebooks that document an interesting insight or result using real world open data. A Turing Data Story follows these principles: 1. The story should be told in an engaging and educational way, describing both the context of the story and the methods used in the analysis. 2. The analysis must be fully reproducible (the notebooks should be executable by others using a provided computer environment, requiring no installation of software). 3. The results should be transparent: all data sources are correctly referred to or included. 4. In order to maintain the quality of the results, the story should be peer-reviewed by other contributors before being published. 2 graduated
Embedding Accessibility in The Turing Way Open Source Community Guidance Neha Moopen, Paul Owoicho Samuel Guay The Turing Way book is a community-driven book for reproducible research in data science. It is written by data scientists, academics, researchers, students, technologists, software engineers, policymakers, educators and other stakeholders from varying technical backgrounds. I will be contributing to the project as a technical writer from September 2020 to December 2020, where I will have the following responsibilities: 1. Helping authors in effectively engaging with the project and collaborating in groups to write, edit and review current and new chapters in the book. 2. Developing resources that will ensure consistency across all chapters and make the book more comprehensible for readers. 3. Making pages more responsive and discoverable so that readers can read it on any device (like a smartphone or a tablet) with the same efficiency. 4. Participating in the community discussions of the project to learn about the experience of its readers, authors, collaborators, and other members through their stories of using the book. All these tasks in The Turing Way will require me to integrate Open Science principles and community practices, which I am confident I will learn through my participation in the Open Life Science training and mentoring program. 2 graduated
Towards open and citizen-led data informing the decarbonisation of existing housing Kate Simpson Arielle Bennett The Design for Retrofit project I work on at The Alan Turing Institute, within the Data-Centric Engineering programme, aims to use data-driven methods to address uncertainty in design stage decision-making towards reducing the carbon impact of homes through retrofit. This involves quantification of the current energy demand of housing archetypes and physics-based modelling to evaluate the impact of energy-efficiency technologies available to be installed in the home during the retrofit process. However, uncertainty exists due to a lack of monitored data on indoor air temperature, which leads to a performance gap between modelled and monitored energy demand as heating practices are the most sensitive parameter within building energy modelling. Further uncertainty is acknowledged due to a lack of longitudinal data following retrofit. Such data could inform the evaluation of long-term impacts of technologies installed, to inform future decision making. Motivated householders who are planning or have recently completed a retrofit project might be interested in sharing data, perhaps in return for research insight on the impacts and success of similar projects through a citizen science approach. This is an idea in development for follow-on research which requires data collection protocols, ethical guidelines and data repositories. 2 graduated
Autistica - Turing Citizen Science Project Georgia Aitkenhead, Katharina Kloppenborg Anelda van der Walt This project is to build an online citizen science platform with the collaboration of autistic people and their families. The platform will then be used to gather experiences in order to investigate how sensory processing might affect the ways autistic people navigate the world around them. We are also using learning from the project to create a framework to support researchers in making their own work more participatory. The project is a collaboration between The Alan Turing Institute (the UK’s national institute for data science and AI research) and Autistica, a UK autism research charity. I am working under the supervision of Dr. Kirstie Whitaker, an academic who is committed to the principles of open research and building welcoming and inclusive online communities. The funding for the project is a result of a James Lind priority setting alliance run by Autistica. It is a complex project with multiple stakeholders. Open Humans, a foundation who have provided the back end for the platform, and a development team from Fujitsu are currently co-designing the front-end interface with the input of the autistic community. Most important are a diverse and growing number of autistic participants and (sometimes overlapping) open source developers. 2 graduated
Arua City Open Learning Circles Ibrahim Ssali Caleb Kibet The H3ABioNet Open Learning Circles (OLC) Initiative aims to build a community of peers committed to learning and teaching each other different skills (coding, data analysis, etc.) through lectures, tutorials and work on collaborative projects. The goal is to have regular meetups for scientists, researchers and students to openly work together, learn/share code, learn new tools/software or simply improve their general coding skills. The project also allows space for continued learning and growth in various bioinformatics tools for our ex-trainees in this instance. In the beginning, learning circles will enable continuity of learning after IBT. I am at the proposal-writing stage to start a bioinformatics department and resource centre at Muni University, Faculty of Health Sciences, located in Arua City, West Nile, Uganda. The aim is to develop a critical mass of practitioners in this region who can develop and utilize bioinformatics approaches to biosciences, and to enhance bioinformatics training at all levels and increase the size and quality of the pool of potential students and researchers. 2
Western Cape-H3ABioNet Open Learning Circles Ekeoma Festus Caleb Kibet, Lena Karvovskaya The H3ABioNet Open Learning Circles (OLC) Initiative aims to build a community of peers committed to learning and teaching each other different skills (coding, data analysis, etc.) through lectures, tutorials and work on collaborative projects. The goal is to have regular meetups for scientists, researchers and students to openly work together, learn/share code, learn new tools/software or simply improve their general coding skills. The project also allows space for continued learning and growth in various bioinformatics tools for our ex-trainees in this instance. Through my participation in OLS, I will adapt the Open Learning Circles concept to our community in UWC. 2
Best practices for online collaboration/peer-production in citizen science Katharina Kloppenborg citizen science, peer-production, participatory design Fotis Psomopoulos Citizen science revolves around the idea of integrating the public in scientific research. However, there are different interpretations of this idea. A large share of citizen science projects allow laymen to participate only in a limited scope of microtasks and thus keep reinforcing the power gap between academic scientists and the public. Literature has called for more autonomy of citizen scientists by allowing them to participate in more phases of the research cycle. Commons-based peer-production, an alternative mode of production in which people self-organize to develop complex knowledge-commons like Wikipedia or open software, seems to be a promising approach to facilitate this. However, a design-centred approach implementing this for a specific use case is yet to be done. In my PhD project I am trying to fill this gap by redesigning the online ecosystem of Open Humans - an existing community of practice around citizen science - collaborating closely with this community in a user-centered design approach. As one of the first steps, I am working on a best practices guide, summarizing the experiences of existing similar projects. 3 graduated
A virtual conference management system with seamless open science integration Simon Duerr conferences, virtual, poster session Emily Lescak VCMS is a convenient tool to set up a website for a virtual conference, including an abstract submission portal, timezone-adapted scheduling of talks and an interactive virtual poster session with video chat and spotlights for posters, with some features still in development. The tool is currently in beta and will be released as FOSS under MPL once the software is battle-tested (in mid-February). 3 graduated
Memory Collecting: Croatian Homeland War Annalee Sekulic database, video, generational memory, record, document, historical, Croatia, Diaspora Kate Simpson The “Memory Collecting: Croatian Homeland War” project aims to create a platform where survivors can submit video recordings of their own memories and reflections of the 1992 Homeland War. The repository will also store them in a publicly accessible database. By having the software be open to citizen scientists, the database will be one of the most inclusive and easily accessible memory banks. This initiative seeks to preserve the memory of the role of Croatian-Americans in the creation of free, modern Croatia during the Homeland War in the 1990s. 3 graduated
Open Phototroph Steven Burgess synthetic biology, community building, citizen science Stephen Klusza I want to help build a culture of open science and good practice (as well as fun) within the plant synthetic biology community, with an initial emphasis on the US. I hope to do this by (1) establishing an open toolset for genetic manipulation of algae and photosynthesis enzymes, (2) developing an open repository of protocols for genetic manipulation, (3) producing educational resources to aid experimentation, both in academia and for citizen scientists, and (4) building a community of interested individuals to expand and contribute to the project. 3
Opensource Transpiler of Synthetic Biology Lab Protocols for Wetlab Robotics William Jackson Robotics, Synthetic Biology, Open Source, Community Enhancement An open-source software tool and associated protocol repository that translates wet-lab protocols into instruction sets for commonly available robotic liquid handlers. Protocols will be hosted on a publicly accessible website, and community members can edit, annotate, and report on different protocols. Think GitHub for biological protocols with an issue tracker. 3
Field and laboratory based research project researching, surveying, and discovering the palaeoecology and palaeogeography of West Cork Robin Lewando palaeoecology, palaeoenvironment, palynology, interdependence, interconnectedness, landscape, public, geomorphology, geology, geography, microscopy, microfossils Bruno Soares This project is a field and laboratory based research project researching, surveying, and discovering the palaeoecology and palaeogeography of West Cork. The project will make use of: paper research methods; sampling and scientific analysis of sediments; digital mapping; field and site visits and landscape analysis; scientific processing, analysis and identification of microfossils from sediments; site visits and surveys; ecological surveys; and local enquiry. Results and findings will be published on a website in the form of stories, accounts, photographs, digital interactive maps, and graphics, with a prime emphasis on accessibility, understandability and relevance. Principal attention will be paid to environmental areas that are productive of microfossils (bog and lake sediments); that have distinctive landscape features and sediment types (relict glacial and past and present fluvial landscape features); different natural habitat types and plant and animal communities; geological distinctiveness; and archaeological sites. Emphasis will be placed on the interconnectedness of these aspects of the current and past environments. The final step will be to show how, in each area, however local, these many and varied aspects have contributed to the present landscape and environment, and thus to give an understanding of how future development may progress. 3
Open data schema for actigraphy data in chronobiology and sleep research Manuel Spitschan, Grégory Hammad open data, open science, data schemas, metadata, actigraphy, actimetry, rest-activity cycles, circadian rhythms, chronobiology, sleep research Mallory Freeberg Actigraphy provides a measure of the 24h rest-activity cycles based on movement counts, typically of the wrist. It is obtained using wearable devices and is a widely used, non-invasive way to determine sleep and circadian properties. Importantly, metrics derived from actigraphy are being increasingly used in clinical contexts, where groups of psychiatric and neurological patients in specific conditions have been found to exhibit abnormal rest-activity rhythms and sleep. Sleep and circadian parameters from actigraphy are derived measures. These are obtained by converting the movement counts (usually obtained at a resolution of 1 minute) into sleep parameters and circadian metrics using algorithms ranging from threshold-based computations to machine learning techniques. Unfortunately, at present, there are no standards or schemas for specifying and sharing actigraphy data and corresponding algorithms. The goal of this project is to develop a common schema for the use, analysis, reporting and open and interoperable sharing of actigraphy data across different actigraphy devices produced by different commercial manufacturers and for use by researchers and research users. This project builds upon core research and technical expertise amongst the team members, and provides a framework to structure the work of the newly funded Chronobiology Data Standards Interest Group (CDSIG). 3 graduated
BioFerm: A web application used for kinetic modeling, parameter estimation and simulation of bioprocesses Olayile Ejekwu Kinetic modelling, optimization, Bioprocesses, Microbial growth, parameter estimation Renato Alves BioFerm is a web application platform which can be used for kinetic study, simulation and optimization of bioprocesses. The user is able to calculate the best initial conditions as well as overall operating conditions which will result in the highest product yield (or any user-specified output). Kinetic modelling can also be done to further analyse the process and to calculate and estimate yield and kinetic parameters respectively. This allows the prediction of substrate, product and biomass concentrations over the bioprocess period. The BioFerm web application will be able to take in a variety of bioreactor configurations (batch, fed batch, continuous) and fit the results to a variety of models (inhibition and non-inhibition) to return the above-mentioned parameters. The software is currently being written in Python using an open-source app framework (Streamlit) to run the app but will later be written using Django, also a popular web framework. 3 graduated
Intellectual Property, Indigenous Knowledges, and the Rise of Open Data in Australian Environmental Archaeology Carly Monks Australian archaeology, Open data, Indigenous Knowledge Esther Plomp This project will investigate existing literature on the benefits, risks, and limitations of open data practices in Australian environmental archaeology, seeking to characterise the ethical and practical issues associated with the dissemination of data owned or stewarded (either wholly or in part) by Indigenous communities. Environmental archaeology, and its partner field of palaeoecology, is inherently interdisciplinary, drawing on diverse lines of evidence including faunal and botanical remains, geomorphological records, and Indigenous knowledges in order to understand past and present human-environmental relationships. The project will consider the tensions between Western scientific and Indigenous epistemologies, including the ways in which ‘data’ are understood and connected (or disconnected) to people and places, and where the boundaries of ‘archaeological’ and ‘non-archaeological’ environmental records lie. This project will provide the groundwork for the development of a larger, collaborative project engaging Indigenous and non-Indigenous researchers to advance a Code of Conduct for Australian archaeologists and palaeoecologists seeking to work openly while supporting the rights of Indigenous communities to manage access as they consider appropriate. 3 graduated
MiSET Publication Standards: A tool for AI-assisted peer-review of experimental information Fabienne Lucas rigor, reproducibility, peer-review, publishing, research quality defects, experimental methods, flow cytometry, tool, AI-assisted peer-review Sonika Tyagi The MiSET initiative aims to develop a minimum set of quality standards in the form of a quality assessment tool that evaluates the technical aspects of cytometry publications, and to fully integrate these flow cytometry standards into grant submission and publication requirements across scientific fields (Lucas et al., Cytometry A 2019). 3 graduated
Global Distribution of APOL1 Genetic variants John Ogunsola bioinformatics, data visualization, open educational resource Sam Haynes, Yo Yehudi Genetic variants of APOL1 commonly found in people of recent African ancestry can predispose to chronic kidney disease. It is however unknown if and to what extent the variants are present outside of Africa. This project aims to create a visual representation of the global distribution of the frequencies of these genetic variants, by mining genomic information from publicly available datasets. 3
LA-CoNGA physics (Latin American alliance for Capacity buildiNG in Advanced physics) Reina Camacho Toro, Alexander Martinez Mendez Open educational content, Data science training, Open science training Laura Ación LA-CoNGA physics is an Erasmus+ project, a European-Latin American network of 11 universities, 9 research institutions and 3 industrial partners (2 of them in the data science field) in advanced physics. We aim to create a set of postgraduate courses in Advanced Physics (high energy physics and complex systems) that will be common and inter-institutional, supported by the installation of interconnected instrumentation laboratories. This program will be inserted as a specialization in the Physics masters of the 8 Latin American partners in Colombia, Ecuador, Peru and Venezuela. It will comply with the Bologna protocols and is based on three pillars: courses in physics theory/phenomenology, data science and instrumentation. We are guided by the principles of open science and education: (1) Content should be engaging and pedagogical. (2) The content will be created and made available following the FAIR principles. (3) Reproducibility is the base of the data science pillar: we want to teach the students how to use the correct tools to work with large amounts of data but also create an environment where the reproducibility of their work, tasks and projects is inculcated and applied from the first day. Our website: Our GitHub repository: 3 graduated
Junto Labs - Advancing Virtual Environments for Life Science Research and Active Learning Lomax Boyd online research, mentorship, virtual environments, Jupyter notebooks Melissa Burke Inspired by the social clubs founded by Benjamin Franklin, the Junto Labs initiative seeks to provide life science researchers with an online space for pursuing collaborative research and supporting active learning. Life science laboratories can be open and highly collaborative spaces for in person research, learning and discovery. While online tools, such as Git and Jupyter notebooks, help facilitate openness and reproducibility among peers, they can also provide a highly creative and flexible medium for designing interactive educational experiences. The Junto Labs initiative aims to create a catalog of Jupyter notebooks that exemplify how to design virtual environments optimized for conducting research, facilitating mentorship, and encouraging active learning. Researchers would be able to more easily collaborate on active projects, but also expand active learning opportunities for students who may not otherwise have the chance to participate in research. Importantly, life science laboratories could use the resource to design and provide research and mentorship opportunities to students from under-resourced communities or universities where opportunities to participate in life science research are limited or nonexistent. 3
metaNanoPype: a reproducible Nanopore python pipeline for metabarcoding António Sousa metabarcoding, python pipeline, reproducible Hans-Rudolf Hotz The emergence of short-read NGS technologies has brought profound knowledge to the field of microbial ecology/evolution through the taxonomic identification of microbial communities - metabarcoding. However, their main limitation is their short read length, which has been overcome by long-read/real-time sequencing technologies such as Oxford Nanopore MinION. Currently, there are many standalone tools/algorithms to process this data, including bioinformatic pipelines, but they lack good integration. My proposal is the development of a modular Python pipeline for nanopore metabarcoding data - metaNanoPype - with the following modules: (I) demultiplexing; (II) quality-assessment; (III) quality-filtering and trimming; (IV) taxonomic classification; (V) diversity analyses. Each module could include several options to allow flexibility. Each step could generate a log file used later to build a report in html/pdf format describing the versions, commands and references of software used. The report built would ensure reproducibility, transparency and acknowledgement, and could be used as supplementary material of papers. metaNanoPype could be publicly available on GitHub (open source) with further documentation published with GitHub Pages. 3 graduated
MBiO: Designing an open-collaborative website in the field of molecular biology Nihan Sultan Milat open educational resource, open-collaborative, molecular biology Michael Landi, Renato Alves, Toby Hodges Molecular biology is the field that seeks to discover, identify and explain mechanisms at the DNA, RNA and protein level in a cell. Although it is a relatively young discipline, its prominence in the life sciences keeps growing. Within the scope of my project, I aim to evaluate papers on molecular biology studies and make them accessible to everyone. The main idea is to choose a weekly topic and summarize it so that everyone can understand it. As a workflow, I aim to write a brief introduction to the paper and its authors, explain the purpose of the study, and present the results. In short, I would like to design a publicly accessible website. I aim to make this website a resource for the academic community, students and anyone else who wants to read and learn. At the same time, I plan to prepare a section where readers can ask questions, in order to build a community that discusses the related article. I want to connect students and researchers in this field of science to improve knowledge and to share and even find new ideas. 3 graduated
Open Life Science (OLS) Program, a driver of open science skills among early stage researchers and young leaders: mentee perspective Muhammet Celik value of OLS, participant perspective, internalize openness, pendown Bérénice Batut, Yo Yehudi, Malvika Sharan OLS is a great platform for opening up life science, with the objective of training young researchers in open science skills and practices. I am a graduand of OLS-2, which recently concluded, and coming out of that program I felt that there is tremendous value in it. However, it might not be reaching as many people as possible. I think one way of extending the outreach beyond what has already been done is to pen down the experience of participants of the previous program. As a participant myself, I can see that there are many ways one could promote the program, share the journey with readers, especially the young generation, and highlight its essence. Thus, I took it upon myself to contribute in this direction by re-joining the program in OLS-3 with this as my project, with the goal of producing a tangible document in the form of a publication to be shared with the community at large. 3
Documentation enhancement with open science practices in sktime Afzal Ansari, Abdulelah Al Mesfer sktime, documentation, algorithm maintainer, codeowner Toby Hodges sktime is a new Python toolbox for machine learning with time series. It provides state-of-the-art time series algorithms and scikit-learn compatible tools for building, tuning and evaluating complex models. The goal of this project is to improve sktime’s online documentation with a specific focus on documenting algorithm contributors. Algorithms form a major part of sktime. They require special expertise in their development and maintenance. We plan to enhance the existing documentation by making algorithm contributors more visible. The aim is (i) to make it easier for users and other developers to directly get in touch with the algorithm experts to ask questions or suggest code improvements and (ii) to recognize their contributions more visibly and formally to encourage long-term maintenance of their contributions. sktime has already defined a new community role as part of their governance guidelines to ensure that algorithm contributors have extra rights and responsibilities with regard to their algorithm. However, up-to-date documentation listing the current contributors and links to their algorithms is currently missing. Optionally, we can add other information like literature references. We plan to automate the generation of this documentation by making use of the existing documentation and other components such as the CODEOWNERS file and author strings in Python files. 3 graduated
An Open Source Service Area for Turing research projects Sarah Gibson open-source, research, strategy Meag Doherty This project is to develop a Turing Service Area in Open Source that will provide formal support in open working and embedding best practices of open software development into Turing projects. This service area will create an Open Developer Advocate position whose role will be to work with and guide projects into working openly and either build a community around their open project, or make a contribution to an existing open project. This guidance would take the form of regular meetings, co-working and/or drop-in sessions and would address roadmapping of the project in terms of its open goals, and developing project policies for engaging openly. The area would work with the Turing Way project to draw on existing material and contribute new processes there. 3 graduated
Towards FAIRer phytolith data Javier Ruiz Pérez, Juan José García-Granero, Carla Lancelotti, Marco Madella FAIR, data sharing, palaeobotany, phytolith research, archaeology, palaeoecology Emma Karoune Phytoliths are microfossils of plants used world-wide to address a variety of questions in fields like archaeology, palaeoecology and palaeontology. Diverse laboratory procedures, analyses and identification criteria are used resulting from different research traditions. Some steps, such as the normalisation of nomenclature through the International Phytolith Society, have been promoted to standardise the phytolith analysis and the subsequent publication of data. However, the standardisation of phytolith research and data publication is still far from being achieved. Moreover, a recent assessment of the data sharing practices within the phytolith community found only half of the publications share some form of data and the majority do not provide reusable data. This project has grown from initial efforts by Emma Karoune during OLS2 to raise awareness of issues with poor data sharing practice. It is part of a broader initiative supported by the International Phytolith Society on data sharing and represents the first steps towards the FAIRification of phytolith data: an evaluation of sharing practices in phytolith research; the creation of a GitHub repository for collaborative use by this working group and in the forthcoming FAIRification project; and the development of a webpage to provide the community with information as the project proceeds. 3 graduated
Systems Genomic Integration of Diabetes Related Genes: A Quest for Development of Biomarkers Arvinpreet Kaur, Ashutosh Tiwari, Robandeep Kaur, Mehak Chopra, Harpreet Singh, Prash Suravajhala Obesity, Diabetes, Gut microbiome, Linkage disequilibrium, pleiotropy Prash Suravajhala, Harpreet Singh, Bérénice Batut Obesity causes approximately 4.7 million premature deaths annually, which accounts for a loss of ca. 8% globally. Obesity is an outcome of complex, heritable, and multi-factorial interaction of multiple genes, environmental factors, and behavioral traits that makes management and prevention challenging in the human population (Rao et al., 2014). Experimental research has demonstrated that altered metabolites in multiple metabolic pathways are associated with obesity (Zhao et al., 2016). Alteration in the proportion of Bacteroidetes and Firmicutes in the gut microbiome can trigger obesity. The gut microbiome’s influence on obesity is much more complicated than the imbalance of these bacterial groups. Modulation of the gut microbiome through diet, prebiotics, surgery, and antibiotics significantly affects the obesity epidemic (John & Mullin, 2016). Obesity is one of the largest global health problems, associated with increased morbidity and mortality mediated by its association with several other metabolic disorders (Saini et al., 2018). We aim to target obesity and diabetes-associated metabolic disorders and annotate the genes common to these complex diseases using a systems genomic integrated approach, with Galaxy as a platform. 3 graduated
Boosting research visibility using Preprints Didik Utomo, Hilyatuz Zahroh, Zenita Milla Luthfiya preprints, open resource, open access, publishing Iratxe Puebla AKADEMISI PREPRINTS is a free distribution service for preprints from multidisciplinary fields. The server plans to include a connection hub to journals and an open peer review community. By doing so, we hope to promote the transparency and quick visibility of research results to the public. 3 graduated
Open Science Community in Saudi Arabia Batool Almarzouq Open science, Saudi Arabia, Community Anelda van der Walt Although there is an increasing number of initiatives in Saudi Arabia to raise awareness in Data Science (DS) and connect researchers in artificial intelligence (AI), there is no single community dedicated to stimulating responsible research practices and Open Science policies. I wish (with the help of a mentor) to establish an open science community in Saudi Arabia. Our target groups are researchers and students who are open and curious about open science but have little to no experience with open science practices. 3 graduated
COMPUTATIONAL DRUG DISCOVERY (CORONAVIRUS) Anshika Sah SARS coronavirus 3C-like proteinase, IC50, pIC50, Bioactivity, Lipinski’s rule, Scatter plot, Frequency plot, Box plot, Mann-Whitney test Yo Yehudi Biological activity data was retrieved from the ChEMBL database and pre-processed by selecting the target, SARS coronavirus 3C-like proteinase, and filtering the target protein’s data frame to remove molecules whose standard type is not IC50 and those with a missing standard value. The data was classified as active, inactive, or intermediate according to the IC50 values. The SMILES notation (representing the unique chemical structure of compounds) from the dataset was used to compute the molecular descriptors. The project uses Lipinski’s descriptors, which consider molecular weight, LogP, number of hydrogen bond donors, and number of hydrogen bond acceptors. These descriptors are related to the pharmacokinetic properties of molecules. Exploratory data analysis was performed via Lipinski’s descriptors. Simple box plots and scatter plots were plotted to discern differences between the active and inactive sets of compounds. A Mann-Whitney U test was performed for each descriptor to determine whether there is a statistically significant difference between active and inactive molecules. 3 graduated
Skills for Open Agrobiodiversity Data Irene Ramos agrobiodiversity, open data, oer Piraveen Gopalasingam I aim to develop training materials to support the use of open data by researchers working on agrobiodiversity conservation. At CONABIO (Mexico), a governmental agency that coordinates biodiversity data collection, I collaborate in the development of an Agrobiodiversity Information System (SIAgroBD); my role involves technical and community management responsibilities. Currently, twelve teams of students and researchers from different institutions contribute to field data collection for SIAgroBD. While we are committed to open practices at CONABIO and all collected data are open, some external contributors lack the skills to use these data, even if they have helped collect them, and are not familiar with open practices. Thus my project consists in developing training materials (OER) for an introductory workshop on open data with a focus on FAIR principles, biodiversity standards, effective management strategies, among other skills that encourage contributors to become active users of data apart from collectors. The integration of social and biological information and the use of Indigenous data are distinctive features of agrobiodiversity research in Mexico that will also be addressed. I expect this serves as a prototype for advanced training modules that could be used by future contributors or other researchers working on agrobiodiversity topics. 3 graduated
Postdoc Empirical Legal Research Open Notebook Jennifer Miller postdoc, empirical legal research, systematic review, public policy, open notebook science Beth Duckles The project is an open notebook living systematic review of legal documents related to postdoctoral scholars and appointments (postdocs). The project aims to use the methods of empirical legal scholarship to describe and categorize the ways postdoctoral scholars and their appointments have been involved in the legal system. Briefly, empirical legal scholarship is a form of qualitative or mixed-methods research, often involving content analysis, that uses legal documents or decisions as its data source. We are not aware of any other research applying this method or data source to the study of postdocs. In fact, there has been little research of any kind on the legal aspects of postdoc appointments. Building on a “file drawer” paper by Jennifer Miller (with Kristina Van Buskirk), we frame our project around the question of whether postdocs are employees or students. Based on economic theory, we expect the types of cases to reflect whether postdocs are employees producing in a labor market or students consuming in a services market. More information about the project is available on GitHub and Zotero. 3 graduated
The UKCRC Tissue Directory and Coordination Centre Emma Lawrence, Jessica Sims Biobanking, research, samples, biospecimens, COVID19 Sarah Gibson The mission of the UKCRC Tissue Directory and Coordination Centre (UKCRC TDCC) is to maximise the use, value and impact of the UK’s human sample resources, in the UK and beyond. The UKCRC TDCC is creating a world-leading, research-enabling, and networked biobanking infrastructure to facilitate the discovery and use of the UK’s human samples and data. The UKCRC TDCC works to help researchers discover samples and data, help sample resources improve their data systems for sharing, and harmonise policy relating to the discovery and use of samples and data. The work of the UKCRC TDCC is guided by the belief that the biomedical research ecosystem should be based on open standards, open science, and pre-competitive collaboration. 3 graduated
Development of language resources for Hausa Natural Language Processing Shamsuddeen Muhammad, Ibrahim Said Ahmad, Ruqayya Nasir Iro Natural Language Processing, Low-resources, Machine Learning, Corpus, Language resources Laura Carter This work aims to create a Nigerian sentiment corpus, a sentiment lexicon, and a hate speech lexicon through manual annotation for three different languages (Hausa, Igbo, Yoruba). Our method for creating these language resources is as follows. Nigerian Sentiment Corpus: To create the sentiment corpus, tweets from major Nigerian news headlines for each of the three languages will be crawled from Twitter using an existing Python crawler we developed. Ten thousand tweets will be extracted per language via the Twitter API. Thereafter, the tweets will be annotated by native annotators for each of the languages. These annotators will be hired and trained to perform the annotation. The annotation task consists of labeling each tweet as positive, negative or neutral. To mitigate errors and bias, each dataset will be annotated by three different annotators, after which the project team will compute the kappa agreement between the annotators. Nigerian Sentiment Lexicon: In the same way, manual annotation of the tweets will be used to create the sentiment lexicon for each of the three languages. The sentiment lexicon annotation task involves identifying sentiment-bearing words in each tweet and assigning a sentiment score between +1 and +5 (with +1 being the most negative sentiment and +5 the most positive sentiment). Nigerian Hate Speech Lexicon: Extreme negative sentiment from the sentiment lexicon will be used to develop the hate speech lexicon. Annotation tool: We plan to use a web-based annotation tool, brat (Stenetorp et al., 2012), which many researchers have found efficient for this type of task. The annotators must be native speakers of the language and follow the annotation guidelines provided by the project team. HausaNLP aims to create more language resources that can be used to train models in machine learning. 3 graduated
ProCancer-I - An AI Platform integrating imaging data and models, supporting precision care through prostate cancer’s continuum Haridimos Kondylakis, Stelios Sfakianakis Prostate Cancer, Open Data, Pca Harpreet Singh In Europe, prostate cancer (PCa) is the second most frequent type of cancer in men and the third most lethal. Current clinical practices, often leading to overdiagnosis and overtreatment of indolent tumors, suffer from a lack of precision calling for advanced AI models to go beyond SoA. The ProCAncer-I project brings together 20 partners, including PCa centers of reference, world leaders in AI, and innovative SMEs, with the objective to design, develop, and sustain a cloud-based, secure European Image Infrastructure with tools and services for data handling. The platform hosts the largest collection of PCa multi-parametric (mp)MRI, anonymized image data worldwide (>17,000 cases), based on data donorship, in line with EU legislation (GDPR). Robust AI models are developed, based on novel ensemble learning methodologies, leading to vendor-specific and -neutral AI models for addressing 8 PCa clinical scenarios. To accelerate the clinical translation of PCa AI models, we focus on improving the trust of the solutions with respect to fairness, safety, explainability, and reproducibility. A roadmap for AI models certification is defined, interacting with regulatory authorities, thus contributing to a European regulatory roadmap for validating the effectiveness of AI-based models for clinical decision making. 3 graduated
Seeding Dario Pescini, Marzia Di Filippo, Chiara Damiani, Paolo Pedaletti community building, Systems Biology, Metabolism, quantitative Life Science, technology, infrastructure Bérénice Batut The long-term objective of this project is to establish at my university a core team/lab able to help the community design and implement open science projects. To start this long-term project, I believe that working on a use case would help in several ways. It will help to consolidate and align the team’s domain knowledge, to begin involving the technical and administrative staff, and to gain visibility and credibility. The use case is a computational framework for metabolism modelling that we are currently developing in my lab and that is on its way to being published. The accompanying publication is nearly ready for submission, and the framework itself is developed entirely with open software. This use case is particularly suitable for following various aspects of the Open Science approach, from journal paper management to software publication. I think that this application can be a great opportunity to learn the open science approach in an organic way and to discover how to put it into practice. 3
Creating a network of Open Science ambassadors in Spanish Health Research Institutes Marta Marin, Santi Rello Varona, Iris San Pedro Health Research Institutes, Network of Open Science Ambassadors, Best practices’ Toolkits, Open Science implementation Joyce Kao This project will create a network of Open Science (OS) ambassadors in Spanish Health Research Institutes (HRI). It aims to implement OS in HRI by raising awareness of the OS principles that apply to this particular field (i.e., reproducibility, transparency, dissemination and data sharing). To enable an easy and comprehensive transition to OS, researchers will be given access to best-practice toolkits for OS implementation. As part of their activity, OS ambassadors will be encouraged to engage with the general public, patients and future generations of scientists. To accomplish that, professionals in the HRI willing to be trained as OS ambassadors will be identified and recruited. This network will be in charge of promoting OS in their institutions. The ambassadors will identify potential OS activities that can be kickstarted, answer questions raised, and advise on best practices in their institutions. By the end of this project, a network will have been created, together with a framework to keep the group active, and ambassadors will disseminate the knowledge gathered about the application of OS principles to health research in their institutes. This will allow the progressive implementation of OS in HRI. 3 graduated
FAIR MAFIL: FAIRification of imaging/neurophysiological data of MAFIL CEITEC MUNI laboratory for EOSC Michal Růžička, Michal Javornik, Zdenka Dudova Louise Bezuidenhout, Bérénice Batut The Multimodal and Functional Imaging Laboratory (MAFIL) is one of the core facilities at CEITEC MUNI and part of the national large research infrastructure Czech-BioImaging and the European research infrastructure Euro-BioImaging. Its main role is to provide access to medical imaging technologies, mainly magnetic resonance imaging accompanied by various electrophysiological methods. Within this project we aim to prepare the data and metadata of neuroimaging datasets processed in MAFIL to follow FAIR principles and be ready for publication and cloud-based processing in EOSC. As MAFIL is an “open access” laboratory, i.e. it provides researchers outside of CEITEC with access to the laboratories, technologies, and experts of CEITEC to conduct their analyses and support their research needs, the procedures will be documented and training will be provided to MAFIL users (“customers”) so that they are aware of FAIRification procedures and able to apply them to their data, making them “EOSC ready”. The outputs and experiences will also be shared with other labs/nodes within the Czech-BioImaging/Euro-BioImaging infrastructures. Thus, we would greatly appreciate and welcome any training, help, advice, or good practice on FAIRification and anonymisation of neuroimaging datasets. 3 graduated
The Turing Way - Developing a community health report and assessing its impact on the wider data science community Ali Humayun Malvika Sharan In The Turing Way, we want to systematically understand community practices, including the community engagement pathways, contributors’ roles and the nature of their participation, that have been successful at supporting its community of diverse contributors. Simultaneously, we want to identify factors that may currently hinder short- or long-term commitments from our contributors and how they can be further supported. With my participation in OLS-3, I will develop a community health report of the project, capturing community development aspects from growth to retention. I will build upon the Open Source community health metric, which involves evaluating the group of contributors actively involved in a project, the number of new contributors who join the project, and the members who leave. For online projects, it can also involve tracking the number of community ambassadors, the number of return attendees to events and the rate of churned attendees. Developing an ideal metric in this project will require further deliberation and consultation with The Turing Way team and core contributors. Hence, this project will be collaboratively designed with other community members by actively inviting their contributions and thoughts. 3
Developing and embedding open science practices within the Research Application Management team at Turing Aida Mehonic open science workflow, open source code, stakeholder engagement, research application, ASG Malvika Sharan I have just started a new role as a Research Application Manager at the Turing Institute. My responsibility is to define my own workplan as well as guide the development of the workplans of 2 other Research Application Managers once they are recruited. This is a new role and we do not yet have a blueprint for what a good research application manager at Turing looks like. My goal is to embed open science practices into the philosophy and the workflow of the RAM team, as much as possible within the constraints of a given project and Theme. Since I am personally new to the open science community, I would benefit from OLS training and mentorship. My ambition is to ensure that we create a good basis for open science practices within ASG and hopefully in other parts of Turing that RAMs interface with. 3
GyaNamuna: Virtual School Connecting Rural Students To The World Prakriti Karki, Mohan Gupta, Ujwal Shrestha online, village, education, DIY, Makerspace Teresa Laguna The project aims to connect rural school students to the cities of Nepal and to other countries. The pandemic has helped a few Nepalese rural schools get internet facilities. Our project will use the internet, an online conferencing tool and a team of young people from different disciplines to connect our little heroes to the outside world, helping them learn languages, better understand other cultures, meet new friends, learn science, do DIY innovations and initiate a makerspace movement through a virtual collaborative environment. 3 graduated
Open Source Project for Evaluating Reproducibility Trends in AI Research Projects Martina Vilas Reproducibility, AI trends, Data Science, Reproducible research practices, Computational research methods, Research software Anna Krystalli In The Turing Way, we define reproducible research as work that can be independently recreated using the same data and code from the original study. Reproducible research is necessary to ensure that scientific output can be trusted and built upon. Despite this importance, many studies are difficult to reproduce, including those involving the application of a computational model. To overcome this “Reproducibility Crisis” we need to identify and standardize reproducible practices that researchers can apply in their projects from the start. But these may vary across fields and methods. In this context, this project will quantitatively assess and derive those research practices that can ensure the reproducibility of studies involving the development and application of AI models for understanding cognitive systems, with the overarching goal of increasing their transparency and trustworthiness. As a cognitive neuroscientist, I will develop a prototype of this assessment by identifying and openly documenting reproducible practices of computational modelling projects in my field. With my participation in OLS-3, I will review the reproducible practices of gold-standard studies and assess the level of transparency maintained in their research. I will also curate relevant guidelines and expert recommendations. The findings will be collaboratively reported as chapters in The Turing Way. 3 graduated
Towards an infrastructure for open-source (online) training in data science and AI Mishka Nemes education, training, data science, AI, open infrastructure, community Jez Cope This project aims to devise, develop and implement an online tool that allows interested users to suggest or contribute to training courses in an open source fashion. The tool could involve a GitHub repository where users can suggest training ideas, review and comment on existing courses, or share their resources for the larger community. As the national institute for data science and AI promoting open, expert and ethical leadership, I believe the Institute and my team would be well placed to support such an engagement stream with the wider community of trainers and researchers. 3
Implementing a series of pedagogical games to teach pupils and citizens (metagenomic) data analysis Teresa Müller, Alireza Khanteymoori, Masako Kaufmann, Florian Heyl citizen science, DNA sequencing, metagenomics, Galaxy Yvan Le Bras As part of the Street Science Community, we successfully developed the BeerDeCoded project: a hands-on workshop for pupils and citizens with the general aim of scientific outreach. During these workshops, we help participants to extract and identify different yeasts contained in a beer sample. The identification is performed by sequencing the extracted yeast DNA, using our self-developed protocols, and analyzing the generated reads via an easy and straightforward Galaxy workflow. Because of the pandemic situation, we cannot run face-to-face workshops. For a more scalable outreach to the public and the long-term sustainability of the project, we want to implement the data analysis as a series of fun and easy-to-understand online games. We will use already existing games to get participants interested and give them the biological background necessary for our project. Primarily, we will develop a game that teaches the data analysis of the BeerDeCoded project. Here, participants will get familiar with Galaxy and run and play with their first data analysis pipeline. They are going to compare their results with others and use different available datasets. For this game, we will work with the Galaxy community on the technical part and with teachers on the pedagogical and gamification side. 3 graduated
PyOrb 1.0 – Automated Analysis Tool for Orbital Interactions Tori Gijzen, Yuman Hordijk, Trevor Hamlin Software, Chemistry, Open-Source, Density Functional Calculations, Reaction Mechanism, Activation Strain Model, Bonding Mechanism Nadine Spychala Introducing PyOrb 1.0, a user-friendly, open-source software designed to streamline the analysis of orbital interactions within various fragments (e.g., atoms in a chemical bond or reactants in a chemical reaction). By harnessing density functional theory computations, this program automates the exploration of Kohn-Sham molecular orbitals (KS-MOs), offering invaluable insights into the fundamental driving forces behind the chemical bond formation and reactions. Our primary objective is to develop an intuitive tool that effortlessly generates and consolidates essential details, including orbital overlaps, energy gaps, orbital Gross Mulliken populations, and coefficients of fragment orbitals within the broader molecular orbitals. The PyOrb 1.0 program will address three major challenges associated with the analysis of MO bonding mechanisms: 1. It automatically identifies the dominant orbital interaction patterns. 2. It presents three different orbital interaction schemes corresponding to different stages of bond formation. 3. It constructs the orbital diagram including a final plot summarizing the key molecular orbitals. While we have already created an initial prototype, there are several enhancements required to enhance the usability of our tool. PyOrb 1.0 holds immense potential in simplifying the systematic analysis of atomic and molecular bonding mechanisms, allowing users to concentrate on advanced analysis tasks. We plan to make the PyOrb 1.0 source code openly accessible online, accompanied by an introductory tutorial, to facilitate easy retrieval and utilization. 8 [‘VU Amsterdam’]
Understanding Natural History Organizational Structures, Data Sharing, and Coordination through Biodiversity Informatics and Biocollections Kit Lewers Science Of Science, HCI, Human Computer Interaction, Information Overload, Collaboration, Networks, Complex Systems, Information Theory, Multi-Agent Systems Nicky Nicolson, Laura Carter An ethnographic study that will use participant observation, open interviews and collection of documents to extract information about networks and information ecosystems of research. Data collection will be in the form of field notes, interview notes and audio recordings, and document analysis, respectively. The expected duration of the total study is four years, but this first phase is envisioned as a 4-month project. 8
Molerhealth Onabajo Monsurat Healthcare, Africa, Research, Opensource, Disease Misdiagnosis, Technology Mallory Freeberg The Molerhealth project is an initiative aimed at transforming healthcare in Nigeria through the development of an open-source electronic health records (EHR) application. My goal is to address the critical issue of disease misdiagnosis by enabling better information sharing and collaboration between healthcare providers, leading to improved patient outcomes and a more efficient healthcare system. Molerhealth will provide individuals with a secure and user-friendly platform to access, update, and share their comprehensive health records, regardless of their location or healthcare provider. Through this comprehensive EHR system, patients will have seamless continuity of care, as their medical history, test results, medications, allergies and treatment plans will be readily available to healthcare professionals. By harnessing the power of technology, Molerhealth will significantly reduce the rate of misdiagnoses in Nigeria. Doctors will have access to a complete and up-to-date patient profile, enabling accurate diagnoses, appropriate treatment decisions, and timely referrals to specialists when necessary. Moreover, Molerhealth will facilitate better communication between healthcare providers by providing a platform for secure messaging and consultation, enabling the exchange of vital patient information. This collaboration will lead to more informed decision-making, increased efficiency, and ultimately, improved healthcare outcomes. Through its open-source nature, Molerhealth will encourage community participation, innovation, and customization, making it adaptable to the unique healthcare needs of different regions in Nigeria. It will serve as a catalyst for positive change, empowering individuals, and strengthening the healthcare ecosystem as a whole. Together, we will revolutionize healthcare in Nigeria with Molerhealth, ensuring accurate diagnoses, improved patient care, and a healthier future for all. 8
Bioinformatics and Health Data Science Internship Daniel Adediran Bioinformatics, Data Science, Open Source Projects, Capacity Building, Industries Pauline Karega My project focuses on creating a dynamic platform that serves as a bridge between bioinformatics and data science students and the pharmaceutical, healthcare and biotechnology industries. By embracing an open-source foundation, the platform will facilitate collaboration and encourage the execution of open-source projects within these industries. This initiative aims to foster a symbiotic relationship in which students enroll in internships and gain practical experience while contributing to the advancement of scientific research and development. Through the platform, bioinformatics and data science students can connect with industry professionals, forming interdisciplinary teams to tackle real-world challenges. By leveraging their specialized skills, these students can apply their knowledge to diverse projects, such as genomics analysis, drug discovery, personalized medicine, and data-driven healthcare solutions. Moreover, the open-source nature of the platform allows for transparency, accessibility, and reproducibility of projects. By participating in these collaborative endeavours, students gain hands-on experience, expand their professional network, and acquire industry-specific insights. At the same time, the pharmaceutical, healthcare, and biotechnology sectors benefit from the fresh perspectives, innovative ideas and cost-effective solutions provided by the enthusiastic student community. The platform acts as a catalyst for knowledge exchange, empowering students and fostering a culture of open-source collaboration within these industries. 8
An Open Handbook and MOOC on Computational Methods for African Media Researchers Alette Schoon, Henri-Count Evans Educational Resource Production, Media Studies, Social Media Studies, Research Methods, Big Data, Digital Inequality, African Academic Resources, Scraping, Natural Language Processing, Network Analysis, Data Visualisation, Machine Learning Sara El-Gebali This project will revolve around the production of two resources: an open access handbook and a MOOC on computational methods for African media researchers. These will share various digital, computational and data research methods for studying the media in African contexts (including social media) in an easy and accessible manner. We would like to provide some typical African data examples that users will be able to download to follow along. The resources will be based on the winter school Alette recently convened, which was geared specifically towards digital media researchers based in Africa. There we worked with international media researchers from the UK and the Netherlands who shared some of their approaches. They agreed that we could repurpose these into shared open resources as long as their contribution is acknowledged. 8 [‘Africa-OLS’]
dr Konrad Kording Scientific Rigor, Reproducibility, Neuroscience Fotis Psomopoulos Inspired by Neuromatch and the many other open educational projects, we will produce high-quality notebooks to intuitively teach scientific rigor. These notebooks will combine videos, interactives, text and equations into engaging teaching materials. Results will be Jupyter notebooks, Jupyter books, and good old-fashioned webpages. Community for Rigor will build an education team, a tech team, and a community team. All materials will be shared under CC-BY. We hope to collaborate with a wider open community in optimizing the materials for a broad range of scientists. If we can lower the incidence of non-rigorous science even by just a few percentage points, we would majorly improve biomedical research. 8
Digitalization for sustainable Agricultural practices and climate change mitigation Nwakaego Gloria Ashiegbu, Olufemi Adesope, Joseph Chimere Onwumere, Dozie Onunkwo Technology, Climate Change, Agricultural Methods, Sustainability, Mitigation Umar Farouk Ahmad, Irene Ramos Agriculture is essential to mitigating climate change and ensuring global food security. Traditional farming methods, however, cause environmental harm and increase greenhouse gas emissions. Digitalization and cutting-edge technologies give agriculture tremendous potential to be transformed into a sustainable and climate-resilient industry. This proposal describes a method for promoting sustainable agriculture and reducing climate change by using digitalization. 8
Wind Power System Stability Analysis for Grid Integration: A Statistical Mechanics Approach to the Swing Equation. Nanje Patrick Itarngoh Wind Power, Fluctuation, Stability Analysis, Swing Equation, Statistical Mechanics, Grid Integration Umar Farouk Ahmad Stabilizing wind power output for grid integration is a challenge due to variable-speed turbines, causing fluctuating power generation. To prevent voltage overloading, the grid power system must undergo rapid changes to maintain stability. The Swing equation is widely used for stability analysis, but its classical model may lead to erroneous conclusions. A modified Swing equation with forced oscillation and nonlinear damping is proposed, using the Lagrangian of the system derived from the maximum entropy principle. Statistical Mechanics preserves the Swing equation for power systems subjected to forced oscillations. The modified Swing equation is numerically integrated for various external load conditions, including constant, step, periodic, and stochastic loads. The steady state and transient stability of the system are determined by the load nature, magnitude, and damping. 8
Is my research used in clinical trials? - Tool(s) to assess societal impact using open scholarly data Maximillian Paulus, Ruben Lacroix, Matthijs de Zwaan Open Scholarly Data, Clinical Research, Citations, Societal Impact, Scientometrics Andres Sebastian Ayala Ruano In the past, researchers have relied on commercial databases for scholarly information such as citations. With the acceleration of open science, open scholarly databases have emerged providing access to everyone for free. However, where commercial providers add a layer of abstraction in the form of aggregations and scientometrics, users of open scholarly data are confronted with raw data that can be difficult to work with. As research intelligence, this is where we step in, aiming to turn raw data into useful information. While making custom reports and dashboards is time-consuming and incurs a cost, we are exploring ways to provide a basic set of open-source tools that can be used by everyone, without charge. Our proposal is a tool that allows researchers to check whether their (fundamental) research is being cited by clinical trials. While traditionally only the number of citations has been used to establish impact of a publication, we aim to provide a more inclusive measure of societal impact by investigating the citation network. We will also explore ways to disseminate impact e.g. through visualisations. This implementation will pave the way for a series of similar tools that promote the recognition of research on various levels. 8 [‘VU Amsterdam’]
Assessing the Zinc Finger Protein Mutation and Expression Profile in Kenyan Women with Breast Cancer Michael Kitoi ZNF Proteins, Breast Cancer, Mutation, Gene Expression Stephane Fadanka The most recent evidence (2021) shows that breast cancer is the second leading cause of death in Kenyan women, after cervical cancer. Zinc Finger Proteins (ZNFs) have been shown to play a role in the progression of various types of cancers, including breast cancer. However, the impact of various types of mutations on breast cancer progression and prognosis has not been established in the Kenyan population. The lack of comprehensive knowledge of specific mutations within ZNFs 703, 750 and 213 in breast cancer among Kenyan women has hindered the development of diagnostic and therapeutic strategies for this population. 8 [‘Africa-OLS’]
openTECR - an open database on thermodynamics of enzyme-catalyzed reactions Robert Giessmann, Teddy Groves Enzymes, Thermodynamics, Open Source, Community Nicolás Palopoli I aim to create a reliable, free, machine-actionable data collection of apparent equilibrium constants of enzyme-catalyzed reactions, with a clear change process to integrate new data and correct errors. A data collection compiled by other people in the field already exists, but it is a free-floating CSV file, and different versions of it are used by different software packages. I want to unify those. This also includes cross-referencing and integrating other databases to exploit division of labor. Further, I want the project to provide a blueprint for how to set up databases at no cost for infrastructure or developer capacity, given a specific set of technological skills. Currently I see three big problems: 1) create a FAIR representation of the data, 2) enable community curation technologically, 3) motivate the community to invest work in the database. In the OLS Open Seeds program, I want to focus on problems 2) and 3). 8
The Open Umbrella & Hybrid Learning Field Guide Derek Moore Open Education Resources, Open Education Practices, Quality Improvement, Professional Learning Harini Lakshminarayanan The umbrella offers teachers and lecturers who use educational technology 8 entry points for improving their remote, on-campus or hybrid courses. The field guide fleshes out these entry points with manageable examples and projects. 8 [‘Africa-OLS’]
Bioinformatics codeathon Pauline Wambui Gachanja, Diana Karan Codeathon, Bioinformatics, Open Data, Collaboration Laurah Ondari, Pradeep Eranti This project involves a Bioinformatics codeathon. The purpose is to empower upcoming researchers interested in bioinformatics either as a career or as a set of tools for their research work. This project aims to provide a platform for bioinformatics enthusiasts to learn, network and appreciate bioinformatics. The codeathon will include two phases: first, participants pitch projects using publicly available data; then, participants apply to join the selected projects. The project is estimated to run for 16 weeks. The first 4 weeks will be designed to train the participants in the basics of bioinformatics, manuscript writing and how to make excellent presentations. During the codeathon, the participants will be expected to divide the roles among themselves and work collaboratively until completion of the project. The participants will present the progress of their work twice a week. After the 16 weeks, the teams will present their work, the best teams will receive awards and successful projects will be published. 8 [‘Africa-OLS’]
Making corporate social responsibility data available and accessible for other researchers Marlou Ramaekers Fair, Data Sharing, Survey Data, Philanthropy, Corporate Social Responsibility, Business Administration, Economics, Organization Science Diana Pilvar, Gladys Rotich Since 1995, the Center for Philanthropic Studies at VU Amsterdam has conducted a biennial survey measuring corporate social responsibility in the Netherlands. The survey includes questions on corporate philanthropy in the form of giving money, goods and services to charitable causes, sponsorships of organizations, corporate volunteering, and corporate social responsibility behavior, among other topics. The thirteen waves of data have been collected for the Giving in the Netherlands (“Geven in Nederland”) series, which reports macro-economic estimates of the size, composition and trends in philanthropy. The data are rich and unique; they contain extensive information about corporate giving, volunteering and social responsibility as well as background information for large samples of the for-profit landscape in the Netherlands (total n ≈ 12,000). The data have not been documented for external use. As a result, the data have unused potential. The goal of the current project is to make the data publicly available in line with the FAIR principles in a repository that can be used for future editions of the survey. 8 [‘VU Amsterdam’]
Gender Differences in the Impostor Phenomenon: A Preregistered and Open-Science Meta-Analysis Bo Wang, Jacek Buczny, Wendy Andrews, Hans Ket, Reinout de Vries Gender, Impostor Feelings, Impostor Phenomenon, Impostor Syndrome, Mental Health, Meta-Analysis Umut Pajaro Velasquez, Nina Trubanová The impostor phenomenon (IP) refers to self-doubts about one’s abilities and difficulty internalizing individual accomplishments. People who experience it attribute their success to external factors (e.g., oversight or luck) instead of internal ones (e.g., intelligence or competence). Therefore, they are worried that they will be found out as intellectual frauds or “impostors”. This phenomenon was first described by Clance and Imes in 1978 and has received great attention over the last decades. Although early researchers suggested that women in particular suffer from the impostor phenomenon, more recent empirical evidence regarding gender differences is mixed. The aim of the current meta-analysis is to examine whether there are gender differences in the impostor phenomenon and, using moderation and mediation analyses, where these potential gender differences come from. When conducting the meta-analysis, we want to follow the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P; Moher et al., 2015) and use the PRISMA-P templates developed by Moreau and Gamble (2022). We will preregister our study by uploading these templates (e.g., hypotheses, article searching strategies, screening criteria, codebook) and share all study materials and data via the Open Science Framework. We will also keep a logbook to track all details of our project. 8 [‘VU Amsterdam’]
Creating an online repository for open collaboration in psychology. Vasiliki Kentrou, Antonis Koutsoumpis, Bo Wang Database, Open Repository, Online Study, Surveys Irene Vazano, Jesica Formoso, Patricia Loto This project aims to develop an online repository to facilitate open collaboration in the field of psychology. The objective is to enable researchers from various locations to share their anonymized data on a centralized platform. The shared datasets will be made accessible to other researchers for further analysis. To accomplish this, the development of machine-readable codebooks is crucial. These codebooks - part of the present project - will enable the automated reading and merging of individual researchers’ datasets, resulting in the creation of a comprehensive Giga database in psychology research. 8 [‘VU Amsterdam’]
JICMar Network: towards better opportunities and greater collaboration among young marine science researchers in Latin America Romina Trinchin, Nicolás Lois, Virginia Andrea García Alonso, Milagro Urricariet, Daniela Risaro, Loreley Lago Young Researchers, Marine Sciences, Latin America, Community, Accessibility, Equality Jose Luis Villca Villegas, Alexander Martinez Mendez The JICMar network (acronym for Young Investigators in Marine Sciences-Latin America) seeks to generate a space for the exchange of ideas and experiences that will serve as input to face the next steps in our professional careers. The virtue of the JICMar network lies in the collaboration to promote inclusion, gender diversity and equal opportunities for anyone studying or working on marine issues. The main activities of JICMar are related to generating a database of researchers, groups and institutes from different countries; socialising academic work with cutting-edge methodologies; centralising and sharing strategies for finding job opportunities and funding, as well as generating a job bank. The communication strategy of the JICMar includes the creation of a mailing list, social networks and a website to disseminate and carry out the various activities. 8
UbuntuEdu (Concept title) Roné Wierenga, Henk Wierenga Mother Tongue Education; E-Learning; Education; Short Courses; Multilingualism Tajuddeen Gwadabe, Joyce Kao I aspire to develop an e-learning platform inspired by Udemy that specifically caters to African individuals, enabling them to upload courses in their native languages. The primary objective of this platform is to promote mother tongue education and offer e-learning opportunities for Africans seeking courses developed by their fellow Africans. Extensive research has consistently demonstrated the advantages of mother tongue education, such as increased literacy rates, enhanced critical thinking abilities, and elevated levels of cognitive processing. While my ultimate vision is to serve the entire African continent, I plan to initiate this endeavor by focusing on South African languages. Accomplishing this goal will entail fostering a sense of community, engaging with educators including teachers, lecturers, and professors, as well as actively involving the general public to encourage content creation and course material publication. While platforms like MijnNederlands have been successfully established for specific languages like Dutch, to the best of my knowledge, no such platform currently exists for African languages. Building this platform will undoubtedly be an immense undertaking, but upon completion of this mentorship program, my aim is to have developed the website and initiated the process of community building. 8 [‘Africa-OLS’]
Building a Searchable Open Data Repository of DNA Collections that are Freely-Shared under OpenMTA, with an Associated Landscape Map Visualisation of the Collection Users Yan-Kay Ho, Cibele Zolnier Sousa do Nascimento Dna Collections, Openmta, Fair, Database, Repository, Github, Visualisation, Mapping, Landscape Analysis, Open Communities, Open Bioeconomy, Open Enzymes, Molecular Diagnostics Toolkits, Synthetic Biology Toolkits, Capacity Building, Biotechnology Sara Villa, Mariela Rajngewerc The documentation for the Open DNA Collections currently exists in multiple formats and across three different platforms (the Open Bioeconomy Lab website, Freegenes website, and as collections on Addgene). Despite being a free and open resource, the dispersed nature of the available data makes it difficult for new users to understand which to use, and where to find the most up-to-date information. This project aims to gather all the disparate resources together, reformat the collections into a more standardised data structure, and produce a searchable/FAIR database on the website (the stewarding organisation) to make the collections more informative and accessible for new and old users alike. The outputs of this open data repository project will complement an existing Reclone project that aims to establish Regional Reagent Distribution Hubs in partner institutes (in Argentina, Ghana, and the Philippines) for sharing the physical DNA collections, and to help with capacity building within these institutes and local researchers. A parallel project goal would be to produce a landscape analysis and visualisation map of the current DNA Collections users to help future users in identifying local researchers within the Reclone community who may be able to better support their use of the collections. 8
Developing an Open Community Repository for SciArt Arianna Zuanazzi Science, Art, Sciart, Raw Images, Community Building, Open Access, Education, Outreach Saranjeet Kaur Bhogal In recent years, the traditional dichotomy between art and science has been increasingly challenged by initiatives and programs (e.g., online resources, mixer programs, “SciArt” projects) that question the rigid definition of either field. These endeavours strive to establish a decentralised collaborative environment where artists and scientists can come together, encouraging them to reconsider and redefine their identities and practices. As part of the open science effort to make scientific data widely available, this project aims to create an Open Community Repository for deposition of raw scientific images. Scientific images are often not accessible to the general audience, educators, and visual artists whose work would benefit from images that portray real scientific spaces, methods, data, and discoveries. We envision that an open repository dedicated to raw scientific images could strengthen collaborations between artists and scientists and kickstart new SciArt partnerships. 8
Assessing word stability in Corpora and its influence on learner dictionaries Mmasibidi Setaka Corpora, Dictionaries, Sesotho, Learners, Word Addition, Word Removal Riva Quiroga Assessing word stability in corpora through the addition and removal of words is an important process that can influence which words should be included in or omitted from learner dictionaries. The aim of the project is to investigate the stability of words in corpora when words are either removed or added, to gain insights into how word frequency distributions found in corpora influence the content of learner dictionaries. The Oxford and Macmillan dictionaries each list a top 3 000 words they deem important, and this project seeks to follow that example and assess the influence on word selection for learner dictionaries. 8 [‘Africa-OLS’]
A Fiji plugin for SOFI analysis Miyase Tekpinar Sofi, Fiji, Super-Resolution, Microscopy Diego Onna Fluorescence microscopy has significantly contributed to our understanding of cellular processes in recent decades. The development of advanced super-resolution (SR) microscopy techniques has allowed for the examination of cellular structures at the nanoscale. SOFI (super-resolution optical fluctuation imaging) is an SR method that uses the intensity fluctuations of single emitters and can provide higher resolution using only a few hundred frames of wide-field images. However, its analysis is mostly limited to scripted code and is not accessible enough for biologists. Creating a Fiji plugin is important to help biologists and to spread the method to different research fields. 8
The invisible society: Misgendering in Facial Recognition Systems Elena Beretta Face Recognition, Gender, Non-Binary, Identity, Ethics Of Ai Bethan Iley Gender is a deeply ingrained construct in human culture and societies. It permeates our everyday activities, whether real or virtual, from social interactions to our digital lives. By nature, the most significant aspects of culture are reflected in scientific progress, determining its development and evolution. The way technological change is shaped and structured is thus inherently grounded in societal norms and relations, which are themselves equally affected by technological transformations. In this sense, the relationship between technology and gender can be considered mutually constitutive. This relationship has at least two main consequences: i) individuals increasingly encounter representations of gender embedded in technology; ii) machines are being trained to recognize and react to relevant traits of human identity, including gender. These consequences are made even more evident by the increasingly pervasive use of Automatic Face Analysis Systems (AFAS), especially when those systems are based on face recognition. However, these systems are consistently built on a gender binary construct and almost never take into account non-binary individuals, causing exclusion and reinforcing existing prejudices about these communities. The project will break this barrier by analysing how algorithms (un)recognize non-binary faces, to foster the development of inclusive, diverse and trustworthy AI. 8 [‘VU Amsterdam’]
Identification and Prediction of Bacterial Pathogens Colonizing Yellowing Disease in Coastal Kenyan Coconuts: A Machine Learning Approach Fatma Omar Coconut, Machine Learning, Next-Generation Sequencing, Bacterial Diversity Elisee Jafsia This project aims to investigate the diversity of bacterial pathogens associated with coconuts affected by yellowing disease along the Kenyan coast. The study will employ a combination of culture-independent methods and NGS techniques for bacterial identification. DNA extraction will be performed using the CTAB method, followed by sequencing of the 16S rRNA gene on the Illumina MiSeq platform. The obtained sequence data will be analyzed using the QIIME 2 pipeline. To accurately detect and classify genetic variants, machine learning models including XGBoost, LightGBM, and Random Forest will be trained using Python. The proposed research will shed light on the pathogenic bacteria associated with coconut diseases and facilitate the development of effective control measures. 8 [‘OLS-Africa’]
Open-source handbooks infrastructure at VU Amsterdam Lena Karvovskaya, Jolien Scholten, Koen Leuveld, Elisa Rodenburg, Jessica Hrudey Open Source, Handbook, Github, Rdm, Open Science, Template, Governance, Collaboration Arielle Bennett Inspired by the Privacy Handbook developed by Utrecht University, colleagues from VU Amsterdam want to create guides on various Research Data Management topics for their own organization. We take the Privacy Handbook and the Turing Way as examples for our work. The goal of the project is to work out the infrastructure that is necessary for such a collaboration. We need a template that the team will be able to maintain. We will also develop a governance model and contribution guidelines that will make it possible to keep the handbooks as up-to-date collaborative resources, developed and maintained beyond departmental boundaries. 8 [‘VU Amsterdam’]
An online repository of tools and methods for automated archaeological survey Lucy Killoran Archaeology, Computational Archaeology, Computer Vision, Machine Learning, Heritage Management, Geospatial, Landscape Bjørn Peare Bartholdy The aim of this project is to build an online repository which will improve accessibility to automated methods for detecting archaeological features in remote sensing data. The online repository will primarily act as a model garden for archaeological computer vision models, but will also contain example notebooks sharing basic workflows that can be reproduced. I have agreements in place with vision model developers that will allow me to build on their existing technical work to provide the initial set of models. This project is aimed at stakeholders who are (1) interested in gaining an overview of technical research, (2) interested in understanding what is involved in the technical implementation of automated methods, or (3) actively looking for models to apply in their own work. In its first iteration, the model garden will be focused on vision models for archaeological survey but could be expanded in the future to include models for application to different areas of archaeological practice. If possible, and if necessary for the first prototype of the repository, I would also like to collaborate with the Turing’s Scivision project to build on their existing catalogue infrastructure to provide access to the models. 8
Open Science in Neuroscience: A Practical Guide for Researchers Amber Koert, Niels Reijner, Mar Barrantes Cepas, Eduarda Centeno, Dustin Schetters, Lucas Breedt, Nadza Dzinalija, Mona Zimmermann Neuroscience, Biomedical, Open Science, Resource, Innovation, Collaboration Siobhan Mackenzie Hall Our project aims to develop an innovative and comprehensive Open Science guidebook specifically designed for the neuroscience field, addressing the knowledge gaps and lack of guidance that hinder the implementation of open science (OS) practices among researchers. The guidebook will serve as a go-to resource for researchers, providing them with the necessary knowledge and practical tools to incorporate OS principles into their work. It will offer clear, step-by-step guidance on various aspects of OS, such as data management, pre-registration, sharing protocols, analyzing code, and publishing open access. What sets our guidebook apart is its tailored approach to the neuroscience field. Recognizing the diverse subfields and unique challenges within neuroscience research, we will curate and consolidate existing resources while also developing new content that specifically addresses these considerations. Moreover, this guidebook will offer easily accessible guidance to researchers at all levels: students, early-career scientists, and established professionals alike. It will serve as a valuable resource, supporting them in adopting OS practices and fostering collaboration, reproducibility, and transparency within the neuroscience community. Overall, our project aims to bridge the gap between knowledge and implementation, empowering neuroscience researchers to embrace OS and contribute to the advancement of their field. 8 [‘VU Amsterdam’]
Structure and infrastructure for training through virtual cohorts Nicolás Palopoli, Monica Alonso, Julián Buede, Melissa Black, Paz Míguez Open Science, Latin America, Community, Teaching, Virtual Cohorts Gemma Turon As part of our work building scientific and technical capacity responsibly and with a local perspective, at MetaDocencia we offer free virtual courses for the Spanish-speaking community. Following the award of three NASA TOPST grants, and working together with OLS and 2i2c, two organizations with similar interests, we will migrate towards organizing virtual training cohorts focused on open science. This project seeks to explore the available alternatives for organizing effective virtual cohorts and to advance the design and implementation of a roadmap that eases MetaDocencia’s migration to this training approach, while also serving as a reference for other organizations with similar interests. 8 [‘MetaDocencia’]
MetaDocencia Governance 2.0 Romina Pendino, Iván Gabriel Poggio, Paola Andrea Lefer, Laura Ascenzi Governance, Formalization, Transparency, Participation, Capacity Building, Local Perspective Verónica Xhardez In 2022 we began a collective and collaborative learning process to design MetaDocencia’s governance. The goal was to develop a transparent model for strategic decision-making, conceived from and for our cultural and regional context. To that end, we defined an internal working arrangement and a voting method for making decisions democratically by majority agreement. We discussed, agreed on and updated our mission and vision so that they reflect our current proposal more faithfully. As a result of the process, operating bodies and internal regulations were established, along with executive roles, responsibilities and functions for those who lead each team. Our new governance was put into practice in December 2022, with the formation of a new Advisory Board (Consejo Asesor, CA) with an expanded structure and roles, and the convening of a first meeting in line with the updated operating principles. In this project we aim to complete MetaDocencia’s governance by formalizing and documenting the following sections: Community Guidelines (Pautas de Convivencia, PdC); Conflict of Interest Policy (COI); Open Editorial Policy (Política Editorial Abierta, PEA); and a plan for the annual review of our governance. 8 [‘MetaDocencia’]
Open Science Community Barcelona (OSCBa) Elisenda Bonet-Carne Harry Smith This project aims to create a community of local scientists to share and promote open science in Barcelona, Spain. I would like to use as examples the OSCU and other initiatives such as openscience beers in Montreal. The initial goal would be to reach researchers from different institutions (UB, UPC, UPF, UOC, UAB, etc.) and meet informally in a bar/pub periodically. During the meetings we would present and talk about what we do, promote knowledge exchange and discuss how we could improve our research in terms of open science, for example by performing more transparent research and sharing data/code. Later on, the plan is to use this community to create workshops and symposia about open science and to agree on some common needs. For example, a common need in life science research groups would be to have resources to help researchers with data sharing or interactive paper publication. If we can define common needs we will be able to apply for funding to cover them. We will also use this platform to motivate colleagues from other parts of the country to create their own local communities and then connect for some events.
Investigate feasible solutions to help create a system that creates, collects and processes data efficiently for impactful research especially for driving health research and driving health related policies Deborah Akuoko Vicky Nembaware Most sub-Saharan African countries have very little quality health data for national policy making, especially on cancer. More work is therefore needed to improve data quality, especially in regions with insufficient health data registries. This project seeks to investigate feasible solutions to help create a system that creates, collects and processes data efficiently for impactful research, especially for driving health research and health-related policies. In this project, I will focus on: improving the quality of gender data generated in sub-Saharan Africa for research and for driving health policies, making health surveillance possible and making it easy for policy makers to take data-driven decisions, and helping establish a framework to run health data registries in more regions within sub-Saharan Africa.
Infusing a culture of open science within the community of researchers at the Zuckerman Institute Chiara Bertipaglia Mateusz Kuzak The community of Columbia University’s Zuckerman Institute is composed of neuroscientists at all career stages (graduate students, postdocs, faculty members). The Office of Scientific Programs, where I work, was established to build and nurture the community by fostering a professional environment that is welcoming, safer, diverse and inclusive. In its first year, the Office has identified relevant issues that it seeks to pursue in order to support a collaborative climate of interdisciplinary research and discovery. Transparency is a key value of the Zuckerman Institute’s mission, and the Open Life Science mentor program is the perfect platform for me, as the Assistant Director of Scientific Programs, to elaborate a strategy that will infuse open science best practices in the community. I hope that with the help of the cohort and the mentor, I can design a strategy to infuse a culture of open science within the community of researchers at the Zuckerman Institute. Ideally this would include power mapping, stakeholder mapping, action plans, and outreach. In the long run, work done in this project aims at developing community engagement for policy development, and will ultimately pave the way to establish the Zuckerman Institute as a gold standard for open science. 1 graduated
Bioinformatics Hub of Kenya (BHK) Festus Nyasimi, Margaret Wanjiku, David Kiragu Mwaura, Michael Landi Toby Hodges, Malvika Sharan The Bioinformatics Hub of Kenya is an entity that develops and manages training in bioinformatics and computational biology and provides a space in which research in bioinformatics is practiced. The aim of the hub is to provide a vibrant environment that fosters research excellence, facilitating the immersive engagement of established and upcoming bioinformaticians with computational and data intensive activities in biology and the life sciences, and promoting the discipline of bioinformatics in Kenya. Our mission is to bridge the gap between well established and aspiring bioinformaticians through peer training and mentorship to provide a pool of qualified bioinformaticians who will focus on innovation and bringing novelty to assigned projects promising quality, integrity, reliability and trust in their services. 1 graduated
Expanding plenoptic Python Package Billy Broderick Rodrigo Oliveira Campos plenoptic is a Python package for computational vision with two main components: standard differentiable models of the visual system and synthesis methods. The models operate on images and make predictions about perception or neural activity, and are relatively simple, with a small number of parameters; they can thus be used out-of-the-box on arbitrary images. To start, we will focus on spatial vision, so the inputs are expected to be static grayscale images. Synthesis methods are used to better understand models of this type. The most common applications of computational models are simulation (holding the parameters and input constant, while predicting the output) and learning (holding the input and output constant, while fitting the parameters). In synthesis, the output and parameters are held constant while the input is fit. This allows scientists to develop better intuition about which aspects of images their models consider relevant and which they ignore, as well as to carry out more efficient and effective model comparison and validation experiments. We will include four such methods, all of which have been described in the literature, but none of which have standard, accessible implementations that work on more than a handful of specific models. 1 graduated
@Diversidade em Foco Bruno Soares, Naraiana Loureiro Benone Renato Alves Biodiversidade em Foco (@Biodiversity in Focus, in English) will combine a Twitter account and a blog to promote science communication about Brazilian biodiversity. The project will coordinate daily posts on Twitter communicating recent research findings about Brazilian biodiversity, written by researchers at different career stages and from different thematic areas. A different researcher will administer the Twitter account each week, posting daily about their area of expertise. Posts will be published in Portuguese in order to achieve a higher national reach. The blog will present general information about the project and will summarize individual and community-level efforts, as well as half-yearly reports on the project. Reports will be published in Portuguese and in English to share our experience with a broad audience and promote science communication. Reports will cover the number of tweets, the participants and their experience while administering the account, and the number of followers and reach statistics for the Twitter account and the blog. 1 graduated
Media Lab Nepal for Computational Biology Sudarshan GC Aidan Budd, Malvika Sharan Media Lab Nepal is the only community bio lab in Nepal focusing on the democratization of life sciences through open science. It is an interdisciplinary team of students and experts with different backgrounds: basic science, engineering, journalism and management. It acted as a community partner for Darwin, India’s biggest evolutionary movement, and as an innovation partner for Hult Prize Purbanchal University. The team is connecting innovations to entrepreneurship to maintain sustainability. We are also partnering with community bio labs in neighbouring countries such as India, Bangladesh and Pakistan. During this program, I want to introduce computational biology in Nepal to more enthusiasts from different fields. Support and mentoring from mentors and like-minded people will help the sustainability of our project. If we can at least initiate this step and give a platform to more students, we can benefit the life sciences sector of the whole country through an open science initiative. I also want to make this initiative feasible and sustainable by creating an income-generating platform with computational biology. 1 graduated
InterMine Similarity Explorer Himanshu Singh Yo Yehudi InterMine is a powerful open source data warehouse system. It allows users to integrate diverse data sources with a minimum of effort, providing powerful web-services and an elegant web-application with minimal configuration. But InterMine warehouses are fragmented, as there are 30 InterMines which are not interconnected. Project BlueGenes aims to integrate all the sources and provide a single interface to access the data of all 30 InterMines. This project aims to develop a JavaScript-based tool which will be embedded on BlueGenes list pages (which can be lists of genes or proteins). This tool will allow users to find similarities between the different genes or proteins in the list. To enable this, we’ll make an explorable graph visualization of the relationships between InterMine entities (proteins or genes). Each entity or object will be represented as a node. The idea is that visualizing things helps us to spot patterns. So this tool will visualize the interactions and similarities between different entities, with options to tweak the appearance/structure to make things more explorable.
Open Connect Platform in Africa Lilian Juma Caleb Kibet In the Open community, there is a disconnect between Open practitioners and advocates, hence the need for a platform to connect everyone within the open landscape. Open Connect will therefore be a platform where everyone across the divide will be part of the solution through lessons, trainings and mentorship. This is an interactive platform targeting students, early career researchers, policy makers and senior researchers in shaping Open policies, one country at a time, in Africa. Since Africa poses different and unique development dynamics, the development of all phases of this project will take into consideration various divides such as rich vs poor, rural vs urban, and research literate vs research illiterate, among others. 1 graduated
EMBL Bio-IT Community Project Renato Alves Malvika Sharan Bio-IT project at the European Molecular Biology Laboratory (EMBL) is an initiative established in 2010 to support the development and technical capacity of its diverse bio-computational community.
After finishing my PhD recently at EMBL, I took the next role as a Bio-IT community coordinator within the same organisation. In order to perform effectively in my job, I want to formally gain community management skills and understand the various tasks and responsibilities within the Bio-IT project. With my application to the Open Life Science program, I want to be paired with a mentor, potentially Malvika Sharan, who is my predecessor in Bio-IT. I want to take this opportunity to ensure a smooth knowledge transfer within the management team of the Bio-IT project and navigate my own path and future prospects as a community manager.
1 graduated
Oxford Neuroscience Open Science Co-ordinator: Changing Culture Cassandra Gould Van Praag Naomi Penfold The Wellcome Centre for Integrative Neuroscience (WIN) and other centres in Oxford Neuroscience recently received large awards which include substantial components of open science. These centres have thus far focused on developing the infrastructure required to share data and tools, and are now ready to turn their attention to the ethical, ecological and social challenges of supporting the uptake of open science practices. I have been recruited as an Open Science Co-ordinator for Oxford Neuroscience, to work between departments and alongside partners in other institutions to develop policies and recommendations for good governance that work for individual facilities (e.g. WIN), across departmental boundaries within medical sciences (e.g. Oxford Neuroscience), and then beyond into the wider University and national networks. The aim is to develop an open science community through an understanding of the principles of behavior change, including recognizing scientific and individual risks and benefits for research participants, researchers (at all levels), and the institution. 1 graduated
AMRITA: an online database for herbal medicine Anunya Opasawatchai, David Chentanaman Holger Dinkel In South East Asia, herbal medicine has been used over the centuries as a preventive and therapeutic intervention. However, due to the scarcity of scientific evidence, the use of herbal medicines is limited to local wisdom. Several constraints such as the absence of funding, patent issues, and most importantly, the lack of modern scientific approaches have blocked the path to this gold mine of drug discovery. Here, we propose the development of AMRITA, an online web-based library of medicinal herbs that includes their phylogeny, medicinal properties, known active components and related scientific literature, and that is publicly open to the research community worldwide. This platform would facilitate the collaboration of researchers across the fields of biological, pharmaceutical, and clinical sciences in establishing a solid foundation of medicinal plants and enabling the use of these plants in modern medicine. 1 graduated
How to build an healthy and open lab Elsa Loissel Daniela Saderi, Patricia Herterich I would like to create a one-stop shop for new life science PIs who want to start their labs using principles, resources and practices found in the open space, but who may struggle to find and access this content themselves. More importantly, I would like to design an efficient communication strategy to bring this relevant content to young PIs in a way that saves them time and energy. This would first require understanding what PIs most needed when they started (and therefore which already existing resources are actually useful), and how it should be delivered to them - in which form, through which channels (e.g. young PIs newsletter), at what time(s). Then, I would like to curate existing content, repackage it in an easily accessible and “digestible” form, add to it, but also create readily usable tools (e.g. templates - not just things people have to read). Topics would include how to work openly and inclusively (e.g. lab codes of conduct, lab manuals, leading inclusive meetings, setting up a healthy lab culture etc.); how to do research openly (preprints, toolkits with software to work open, project management tools etc.); how to train the next generation of open scientists; and how to open your work to the public. Finally, I would like to approach organisations and institutions to help more young PIs get access to these resources. 1 graduated
Benchmarking environment for bioinformatics tools Zoe Chervontseva Luis Pedro Coelho The aim of this project is to provide a flexible protocol and unified instruments for iterative benchmarking of bioinformatics tools solving various problems. The results of benchmarking for each particular task are to be published as a blog post. The posts are to be automatically updated each time somebody provides new data on the performance of any considered tool. It should be possible to add new tools as well as new datasets to an existing comparison. 1 graduated
EDAM ontology Matúš Kalaš Björn Grüning EDAM ontology is a long-established project, defining a common, controlled vocabulary in the form of a network of concepts and terms. EDAM covers topics, operations, types of data, and data formats used in computational life science, i.e. in analysis of biological data, with some bias towards molecular life science and cellular bioimaging. Bioimage informatics was the first larger extension of EDAM beyond mainstream bioinformatics and computational biology, with a highly successful development model engaging a broad community of bioimaging and bioimage analysis experts. The EDAM-bioimaging work was initiated by Jon Ison, Josh Moore, and others, and I have been responsible for its further coordination.
EDAM is used in applications such as registries of computational tools or training materials (contributing to open science), in the Common Workflow Language (CWL), or in automated workflow composition. It is also in the process of implementation in various parts of the Galaxy project. EDAM is developed in a transparent and participatory manner, similar to modern open source software (it is available under CC BY-SA).
The goal of this concrete mentorship project is to further extend EDAM, so that other communities can benefit from it, and from the applications and use cases it fosters.
1 graduated
Improving Documentation for Open Data and Software in the Agricultural Research Community Kristina Riemer, David LeBauer, Emily Cain, Jorge Barrios Katrin Leinweber We are members of the Data Infrastructure for Agriculture Group at The University of Arizona. Our mission is to provide scientists with open software, data, and computing that improves development of productive and sustainable agricultural systems. One of our persistent hurdles has been enabling and motivating the community to use our software and data.
To address this, our project will focus on improving documentation for data and software that our group develops, as well as identifying and sharing best practices with the broader community. We will include the TERRA REF project, which produces high-resolution sensor data on crop plants, and the drone pipeline, which automates steps in the use of drones to study crops. Both projects produce open data (CC0 and CC-BY) and software (MIT/BSD compatible).
Our goal is to make our products more usable for our intended audience, which is primarily scientists across a range of disciplines from computer science to remote sensing. We plan to have each person in the group complete a sub-project that can be finished in 15 weeks. Each sub-project would result in an accompanying blog post describing the best practices implemented.
1 graduated
Improving open platform accessibility Christine Rogers Hao Ye I’m a member of the technical team for LORIS, an open-source neuroinformatics platform that allows researchers to collaboratively collect, curate and share data on the brain, behaviour and genes.
Aside from Jupyter notebooks, GitLab, and a few platforms that focus on data storage and linear processing paths, it can be hard to find platforms where researchers can safely collaborate with real transparency while sharing the actual workflows of their open science practices as they go through a data collection lifecycle. I already work with an open-source project that provides these tools, but our platform can be challenging to adopt for new projects without dedicated technical resources, given the depth of knowledge and time required.
My objective for this project is to begin tackling this from a few angles, including documentation, publication, and nudges in our code development process. I believe it is feasible in this timeline to also pilot open tools with a new project or as an extension to an existing project – to start putting this idea into action.
1 graduated
Enabling open science practices in biology education Sandy Kawano Fotis Psomopoulos The proposed project would establish a foundation for developing a tutorial on training researchers in biology to adopt more open science practices. The main goal would be to design a website (e.g., through GitHub) that would contain different chapters on how to accomplish different open science achievements (e.g., designing a Code of Conduct for collaboration, developing a data management plan, setting up an Open Lab Notebook) and that could be easily cloned to their personal websites and modified for their specific purposes. As an extension, I am hoping that this could then be expanded into a university course for undergraduate and/or graduate students or even a workshop that could be held at different education centers, museums, or conferences. Not only would this allow junior faculty to get off on the right foot in developing an open and inclusive lab, it would also help train the next generation of open scientists who trained under the faculty member. By making these resources freely available, it would also help maintain open lines of communication between lab members and make sure that expectations of the faculty mentor and student / staff mentees are fully transparent so that the mentoring process can be improved for science mentees. 1 graduated
Open Science UMontreal (OSUM) Samuel Guay, Danny Colin Andrew Stewart In 2019, we founded Open Science UMontreal (OSUM) because we believe in an Open by Design approach when it comes to Science. As a student initiative, we want to establish an inclusive and collaborative community where students and researchers at all stages of their careers feel welcome to learn, share, and discuss open science values, principles, and practices. By raising awareness of current issues, we hope this project will instigate a culture shift and help people realize how they can immediately benefit from using open and reproducible practices. As open science is a broad and wide-ranging concept with domain-specific definitions, we want to provide opportunities to meet and encourage the exchange of ideas within and across fields because we can all learn from each other. Throughout the Open Life Science mentorship program, we will focus on laying the foundations of our initiative by developing our web platform and organizing some on-campus activities.
Ultimately, we aim to facilitate networking between local “open initiatives” through Open Science Canada. Inspired by the Australia and New Zealand Open Research Network and the Open Science Communities in Europe, we hope that Open Science Canada will draw together like-minded people across the country and foster collaborations and the sharing of open resources and knowledge.
If you’re reading this and you’re interested in open research feel free to come chat with us on, an open-source alternative to Slack! We want to get to know you (and your initiative)!
1 graduated
Creating network of Data Champions at the library of Free University of Amsterdam Lena Karvovskaya Patricia Herterich From February 1st I am starting a new position as community manager RDM at the library of Free University of Amsterdam (VU). My ultimate goal is to bridge the gap between researchers and support personnel by creating opportunities for networking and collaboration. The project that I would like to develop as part of the Innovation Leaders programme is setting up a network of Data Champions. Data Champions are local community members willing to share their discipline-specific expertise with colleagues; they advocate good RDM practice and advise on proper handling of research data.
Similar programs have been launched at the University of Cambridge, TU Delft, and Wageningen University. The main goal of the Data Champion programme is to drive cultural change towards open science and open data.
1 graduated
Multilingual Open Science: Creating Open Educational Resources in Central Asian Languages Zarena Syrgak equity, diversity, inclusion, open scholarship, multilingual open science, epistemic/linguistic, justice Alejandro Coca Castro, Stephen Klusza The project Multilingual Open Science aims to create an open educational resource about FAIR research in Central Asian (CA) languages. The goal is to popularise and raise local researchers’ awareness of open science practices and platforms and thus to tackle the existing structural barriers (e.g., linguistic, epistemic, etc.) in accessing the information and disseminating the research from Central Asia. Currently, the local scholarship is located - geographically, linguistically, epistemically - in the global periphery which hinders the knowledge production, use, and dissemination in the region. Many scholars, as the result, become trapped in fraudulent research and publication practices. Moreover, there are cases when openly accessible educational resources are offered to researchers as a marketable product. So, to address these and other issues related to inaccessibility of knowledge production infrastructure, this project aims to create an openly accessible, fair, and multilingual educational resource about open science with the particular focus on Central Asia. To this end, I am planning to use Jupyter Notebook and create a platform similar to The Turing Way. 5
Training Latin American scientists’ community to create high-quality open journals with good editorial standards Camila Gómez, Danae Carelis Davila Espinoza, Alvaro Andre Vargas Aguilar, Nelson Franco Condori Salluco, Jose Luis Villca Villegas training and education, research community, editorial community, open journals systems. Alexander Martinez Mendez Our initiative is called “Training Latin American scientists community to create high-quality open journals with good editorial standards”. We intend to build a community of native Spanish-speaking scientists who teach and learn about creating editorial boards and open access scientific journals that meet high international standards of editorial quality. We intend to organise hands-on workshops about different steps in the editorial process of creating scientific open access journals, with experienced trainers and with open source tools. Some of the topics we would like to cover are:
- Acquiring ISSN code for online journals,
- Acquiring plug-ins for displaying online journal view statistics,
- Implementing DOIs for scientific articles,
- Implementing iThenticate to stop plagiarism in articles submitted to the OJS system.
These workshops would merge synchronous and asynchronous activities over a period of 6 weeks, so that trainees learn all the necessary concepts, familiarize themselves with the required software, and conceptualize and brainstorm solutions to the local needs of the journals they intend to develop, with peer support and mentoring. 5 graduated
Building a Cloud-SPAN community of practice Evelyn Greeves Education, Training, Environmental biotechnology, Omics, Community Anne Fouilloux Cloud-SPAN trains researchers, and the research software engineers that support them, to run specialised analyses for environmental omics datasets on cloud-based high-performance computing infrastructure. It is a collaboration between the University of York and The Software Sustainability Institute funded by the UKRI Innovation Scholars award (Project Reference: MR/V038680/1). The primary objective of the project is to generate training materials and opportunities which are open, accessible and reusable (FAIR) for all researchers. We aim to build an engaged community of practice around the participation in, and the development and maintenance of, the materials. The community-building element would be the main focus of the OLS scheme project. 5 graduated
Developing a carbon footprint database for buildings in Ghana Sitsofe Morgah Sustainability, Open science, Carbon footprint, Buildings Anne Treasure There is a considerable gap in sub-Saharan African countries regarding the assessment of embodied energy of building materials and operational energy of various building types. Lack of data remains a critical barrier to closing this gap. Creating a database of embodied energy of building materials and operational energy of building typologies will be key in establishing the carbon footprint of buildings in Ghana. The development of an online platform will also allow interested groups, individuals and corporations to submit key information needed for the computation of energy outputs in buildings. The aim of this project is to explore the scope and path towards establishing an open database and an inclusive community. 5
Publishing development plan for the Open access journals of TU Delft OPEN Publishing Frédérique Belliard open publishing, open access, open peer review, scholarly communication Arielle Bennett, Julien Colomb TU Delft OPEN Publishing, Open Access academic publisher of the TU Delft, publishes open access (OA) journals and open (text)-books. The OA journals currently do not receive enough support for their development. This project aims to bring all OA journals to the same high level of quality expected of any TU Delft product. This project will establish the identity of TU Delft OPEN Publishing as a trustworthy academic publisher within and beyond TU Delft.
The success of the plan is linked to the open collaboration of the journal editorial boards. The publishing plan will help journals
- Identify their readership by targeting the correct authors,
- Evaluate the quality of their publications (Impact),
- Determine their publishing priorities,
- Consider other publishing options (special Issues, new article types, publishing peer review comments, cascading),
- Grow,
- Improve communications and networking,
- Diversify the editorial board,
- Take part in the education of young researchers.
The plan needs to integrate, as much as possible, open science principles such as Open Data, Open peer review or authorship transparency into the journals’ publishing processes. While the plan will benefit all, it should consider the specificity of each journal. Overall the plan has to demonstrate its value to the editorial board. 5 graduated
Why R, for those with legal backgrounds using examples from South Africa Biandri Joubert RStudio, qualitative research, quantitative research, legal research, law Batool Almarzouq This project is envisioned as one that ends up as something that can explain the “why” when encouraging people from legal backgrounds to learn R and some data science. A case for open and reproducible research in a field that does not typically use R and qualitative methods. The idea is to get to that point by creating different data sets derived from commonly used legal sources that law students or graduates would be familiar with and incorporating them into a single platform with a few examples of practical use and application as well as code to encourage such persons to see the “why”. On this platform, I would like to share links to places where intro to R courses are available, etc. At this stage, this is an idea I have and I hope to develop it as the weeks progress. 5 graduated
NGS-HuB: An open educational resource for NGS data analysis Nihan Sultan Milat, Faruk Üstünel, Esra Büşra Işık, Birgül Çolak-Al Next-generation sequencing, data analysis, bioinformatics, genomics, open education, open resource Beatriz Serrano-Solano, Fotis Psomopoulos Next-generation sequencing (NGS) is a revolutionary technique with wide applications in biology. It is widely used by researchers ranging from young researchers to pioneers of the field. NGS data analysis has various approaches and many resources explain these methods. However, it can be difficult for beginners in this field to quickly understand and apply these methods. Also, beginner researchers might not have enough knowledge to overcome the most common errors that are encountered during these analyses. Due to the challenges mentioned above, in this project we aim to create a comprehensive open educational resource that explains NGS data analysis and its different methods and offers an example of the application of the methodology and solutions to the most common errors. We believe that this project will be enlightening for anyone interested in NGS data analysis by providing a necessary roadmap. 5
Community engagement: Building an open online learning community Sarah Nietopski community, community engagement, open learning, Data Science and AI education, training Caleb Kibet This project aims to establish and nurture an active (virtual) community of learners with the Turing. The Institute is in the process of creating an open online learning platform to host training and learning materials, covering a range of subjects related to Data Science and AI. In order to ensure that the offering is useful to our audience, that it is responsive to their needs, and that it continues to grow, it is key to involve them in the creation and direction of the resources. Having real user voices and input will help to create an open learning space that is truly useful and valuable. Not only that, but the platform can serve as a central meeting point where learners can come together over shared interests and issues, and collaborate on projects that may have a wider impact.
In order to do this, learners will need a way to connect and communicate effectively. 5
Finding paths through the FAIR forest; linking metadata from forestry models and datasets to assist analysis and hypothesis generation Kim Martin forestry, xylogenesis, wood formation, ecophysiology, FAIR, metadata, models, data, ontologies, knowledge graph Deepak Unni This project aims to assist researchers in the field of wood formation and ecophysiology to explore datasets and computational models in a flexible and integrative way. The goal is to provide a platform - in the form of an open knowledge graph and associated interface - that will allow researchers (even those with minimal familiarity with the underlying technology) to explore linked representations of metadata for the included datasets and models. Researchers should be able to survey a variety of models that target phenomena at different scales; ranging from process-based models of the cellular determinants of wood formation, to empirical models of gross tree growth in different environmental contexts. The linked information should allow complex questions to be asked, including: how similar models differ; which datasets can be repurposed to test different model outputs; and identifying whether and how different models can be composed together. This will promote open scientific practices in this research area (through the use of common metadata standards and terms), and may serve as a valuable framework for collaborative knowledge capture and exploration. 5
SciHack: Promoting the Open Source and DIY movements in Peru Maria Andrea Gonzales Castillo, Nadia Odaliz Chamana Chura, Piero Beraun, Darwin Diaz, Sandra Mirella Larriega Cruz, Jhon Anderson Pérez Silva, Rodrigo Gallegos DIY Bio, Open Source, Biohacking, Computational Biology, Python programming, R programming Diego Onna We propose SciHack, as the first Open Community Lab in Peru. Our principal aim is to make available the tools and resources necessary for anyone, including non-professionals, to conduct biological engineering research and learning. As part of our activities, we will focus on democratizing science and biotechnology knowledge in low-income populations of Peru where there’s little to no presence of science and technology education. To address these issues, we will conduct workshops about DIY lab equipment, Bioinformatics open source projects and educational resources, and molecular biology tools to train teachers and teach students how to make and analyze scientific experiments. Furthermore, by the end of our activities, we would like to implement a space for bio-makers of all ages and backgrounds to conduct research projects and build prototypes. In this way, we would be fighting against misinformation about scientific methodology and results, which are essential not only for the current COVID-19 situation but also for the progress of science. 5
Data Science and AI Educators’ Programme AND Tools, Practices & Systems Peer Mentorship Programme Ayesha Dunk, Bridget Nea, Andrea Kocsis peer-mentoring, open source, collaboration, academic, Turing Way, ethics, communication Emmy Tsang I would like to work with the Skills Team at the Alan Turing Institute in order to create a peer-mentoring training programme from the main topics of The Turing Way. Our aim is to turn the five main areas in The Turing Way, namely research reproducibility, project design, collaboration, communication, and ethics into modules which the participants of the peer-mentoring programme can work on together. They would apply them to their own research/projects, so they could put the knowledge they gain to use immediately. The aim of The Turing Way has been to be an open source, collaborative, applicable, and practical tool, and with this programme we would like to facilitate its use. We would like to see how people at different stages of their career (PhD, Post-Doc, Researcher, Admin) can collaborate to deepen their knowledge about the contents of The Turing Way, while understanding each other’s perspective on the subject. If the prototype is successful, we would like to open it up to any applicants interested in applying the practices of The Turing Way to their own projects as part of the other trainings offered by the Institute. 5
Increasing the usability of ChemSpaX, a Python tool for chemical space exploration Adarsh Kalikadien software, homogeneous catalysis, data-driven chemistry, chemical space, transition, metal complexes, open-source Esther Plomp Together with co-workers, I have developed a Python-based tool (Source code, publication 1, publication 2) which can be used to explore the local chemical space of an existing molecular scaffold. Homogeneous catalysts are important in many of our daily processes, but also in our fight against climate change. Our goal was to create a tool that can automatically generate large datasets that can be used in research for data-driven catalyst discovery. The issue was that a simple SMILES representation of a molecular scaffold does not work when it contains a transition metal complex. With this tool we use the 3D coordinates of a molecular scaffold and let the user place molecular fragments to create many variations of this scaffold. Other inorganic chemistry fields that use transition-metal containing molecules might benefit from this tool as well. A first prototype is published, but several things can be done to increase the usability of our tool. 5 graduated
The Ersilia Model Hub: encouraging deployment of computer-based research tools for non-expert usage Gemma Turon reproducibility, sustainability, accessibility, artificial intelligence, community engagement Fotis Psomopoulos We have created the Ersilia Model Hub, a FLOSS platform containing AI/ML models for infectious and neglected disease research. These models can be accessed with little to no coding expertise, solving a major roadblock in the applicability of such technologies to day-to-day research. The Hub currently includes a hundred open source models, both from the literature and developed by the Ersilia organization. With this project, we aim to open the Hub to the whole computer science community, encouraging third-author model depositions so that the code they develop is not simply open source but also deployed in a user-friendly manner. By leveraging the Hub architecture, computer scientists can reach more users, interact with them and further the impact of the assets they have developed by facilitating their implementation in real case scenarios. To this end, the project will focus on establishing clear guidelines on the quality of the software and its reproducibility (as it won’t necessarily be peer-reviewed yet) and creating a standard model deposition form and minimum required documentation. The ultimate goal of the project is to build a community of contributors by facilitating their access. 5 graduated
Easy Access Autism Resources for Rural Parents Robert Schreiber free resources, autism, translation Georgia Aitkenhead, Stephan Heunis Living in South Africa, and in much of the world, you can see a large gap in the accessibility of healthcare resources based on a person’s income and education level. I am privileged enough to have had access to good quality healthcare resources throughout my life, which resulted in me being seen by numerous different therapists, counsellors and psychologists and being diagnosed with Autism Spectrum Disorder, more specifically Asperger’s syndrome. I am also privileged in that I am able to comfortably speak and read English. Unfortunately, a large percentage of our country has a very poor level of education, often not being able to read or speak English very well. Most South African resources I have found for individuals with, and parents of children with, autism are only available in English. My project aims to create a database of resources in a variety of South African languages, both written and as a video format (due to poor literacy rates, especially among older generations due to historical discrimination), in easy to understand language and with concepts that are easy to grasp. This is also useful as more people will be able to access online resources than resources at a healthcare facility. 5
Exploring “Governance Models” for Open Science community projects as per their maturity stage Malvika Sharan, Anne Lee Steele Open Source, Community, Governance, Research Gracielle Higino, Emma Karoune As open science projects mature, they attract, engage and retain members who actively participate and contribute to the project. They build Communities of Practice around the project through knowledge exchange, maintenance or development practices and ultimately guide the future directions for both the projects and their communities. To build a more equitable, resilient and sustainable open infrastructure for a diverse community, it is important to select governance models that give voices to people (users, contributors and wider society) from different socio-technical and socio-cultural backgrounds, identities, career stages, contextual needs and research communities. The Turing Way is a guide for reproducible, ethical and collaborative research and data science. Open Life Science is a training and mentoring program to help researchers learn and apply open principles in their work. They are mission-aligned open science projects that involve participants from around the world to create something deeply meaningful for them. The Turing Way has grown exponentially in the last three years and now offers more than 200 pages co-created by more than 300 contributors. Open Life Science has offered 4 cohorts in the last 2 years and currently supports a community of over 300 members (present and past mentees, mentors and experts). To support the governance work in these projects in 2022, I would like to carry out a systematic study of governance models suitable for decentralised and distributed communities such as these. By creating a portfolio of governance models suitable for projects at different maturity levels, members from these projects will be able to identify the right model for their respective projects.
The aim is to establish norms, workflows and processes that ensure a democratic structure for decision-making and leadership in a way that contributes to projects’ own visions while collaborating on the shared mission for global open science. 5 graduated
International Committee on Open Phytolith Science: community building initiative and open science training for the Phytolith Community Carla Lancelotti, Javier Ruiz-Pérez, Maria Gabriela Musaubach, Abraham Dabengwa, Emma Karoune, Celine Kerfant, Zachary Dunseth, Juan José García-Granero Community Building, Open Science, Open Data, Open Access, Phytolith, Archaeology, Palaeoecology Gracielle Higino, Malvika Sharan The International Committee on Open Phytolith Science (ICOPS) was initiated as a new committee within the International Phytolith Society in September 2021 and the first committee meeting took place in December 2021. This committee aims to increase the knowledge of and implementation of open science practices in phytolith research. We are embracing an open source approach to our work in this committee so that the work of our committee is transparent. This will include open documentation, regularly communicating with our community, and providing guidelines and communication channels to enable our community to engage with us. Therefore, we want to establish a solid base for this work going forward by further developing our GitHub repository - adding clear contributing guidelines and documentation on how the committee is to be run. All members of the committee will also benefit from training in all aspects of open science practices. This will allow us to gain further insight into the training and initiatives that we want to work on with our community.
We will also start to develop training packages specific to phytolith research, for example on open publishing and open and FAIR data. 5 graduated
Building an open community around the Turing-Roche Strategic Partnership Vicky Hellon treatment heterogeneity, missing health data, academia-industry partnership, open science Katharina Lauer The Turing-Roche strategic partnership was established in June 2021 as a collaboration in advanced analytics between the two organisations. Its goal is to develop new data science methods to investigate large, complex, clinical and healthcare datasets to better understand how and why patients respond differently to treatment, and how treatment can be improved. As Community Manager for the project I am developing a collaborative and open community between both organisations and beyond. As the partnership is just beginning and is flexible in nature, there are opportunities to embed open practices such as open publication, reproducibility, open data, training and co-working, as well as to establish networks, such as one for early career researchers. 5
Art as a means to open science Eirini Botsari Lena Karvovskaya One of my goals in my current position, as a community manager at the Open Science Community Rotterdam, is to engage and include as many people as possible; I want to sustain and grow the community. One of my ambitions is curating events and engaging the public in open discussions around open science. One of the events that I really want to arrange uses art as a means to raise awareness of open science practices and to create the floor for an open discussion around open science. It is still a general idea, so I am aware that I still need to go over all the details, and I am still not sure what obstacles will arise along the way. But I am really positive and I truly believe that art is open and can act as a mediator towards connection, expression, openness, and understanding. 5
An Incomplete History of Research Ethics Ismael Kherroubi Garcia History of Science, Philosophy of Science, Research Ethics Lisanna Paladin A History of Research Ethics is a free, online resource for researchers, governance professionals, and even college students to learn about science and ethics, and be inspired to develop practical tools for the assurance of adequately conducted research. A key purpose of A History of Research Ethics is to demonstrate the variety of disciplines and backgrounds that research ethics can draw on. In other words, interdisciplinarity is critical for its success. This means both interdisciplinary contributions and adapting to audiences from diverse fields and sectors. By embracing the collaborative nature of GitHub and OLS’ open science community, I expect to take A History of Research Ethics to its next stage of development. 5 graduated
Developing thermally stable loop mediated isothermal amplification kit for a non invasive detection of malaria Cavin Mgawe Non invasive diagnostics, molecular kit development, molecular assays, molecular diagnostics, R programming Luis Pedro Coelho This study aims to develop a diagnostic LAMP kit that is sensitive and robust for detecting Plasmodium falciparum in saliva and urine, to enable simple and easy non-invasive molecular testing. The first phase of this study involved target validation and primer and probe design using open-access tools. Here, a principal component analysis (PCA) from the R/adegenet package (Jombart et al, 2010) and a phylogenetic tree (Neighbour-joining) built with the R/ape package (Paradis et al, 2004) were used to cluster repeats of the chosen amplification target. These clusters were aligned to generate a consensus sequence for designing primers and probes for establishing the LAMP assay. I have designed an incorporated strand displacement probe using the engineering guideline of Juan et al, 2015 and the open-access Nupack software tool. The assay has and the master mix is being lyophilized for further experimental evaluation.
The sensitivity of this kit will be evaluated on extracted DNA from three sample types: saliva, urine, and clinical blood, with crude samples lysed using lysis buffer. Correlations generated from these results will identify the most sensitive amplification. Further, possible decay of the lyophilised master mix will be evaluated over six months to ascertain its possible shelf-life. 5 graduated
Building Pathways for Onboarding to Research Software Engineering (RSE) Asia Association and Adoption of Code of Conduct Jyoti Bhogal Community Building, Creating Pathway, Onboarding, modifying Code of Conduct Malvika Sharan In October 2021, the RSE Asia Association was launched. This was done to create awareness in the Asia region about the field of Research Software Engineering. The digital infrastructure for the association has been built during the Open Life Science Cohort 4 (OLS-4) program. The webpage is in place, the contact addresses have also been created for communication with people. A small community has also started emerging. It is time that the community can expand. With the expanding community, it is now required that we create well-defined pathways for people to get onboard to the association. This project aims at building such pathways. Also, a basic Code of Conduct is already present on the RSE Asia webpage. It is to be modified to make it more appropriate for the Asian region. 5 graduated
Transcriptomics profiling of bladder cancer using publicly available datasets Umar Ahmad RNA-Seq, Bladder Cancer, Transcriptomics, Bioinformatics Malvika Sharan, Yo Yehudi We will process publicly available bladder cancer RNA-Seq datasets. The co-authors of this manuscript will apply bioinformatics methods to analyse this large-scale, heterogeneous RNA-sequencing dataset (20+ samples, 3Gb), which will be downloaded from any of the following databases: The Cancer Genome Atlas (TCGA) database, Genotype-Tissue Expression (GTEx), cBio Cancer Genomics Portal (cBioPortal) database and SRA NCBI database (choose any suitable database you are familiar with). The biological questions of particular interest include
1. identification of differentially expressed transcripts (DEGs)
2. pathways and gene networks
3. hub genes associated with cancer progression and recurrence
4. small molecule identification
5. survival analysis.
Open source software such as R/Bioconductor (DESeq2), Unix/Linux, Python and Jupyter notebook will be mainly used for the analyses. 5
Bioinformatics Secondary school Outreach in Nigeria Emmanuel Adamolekun Bioinformatics, Students, data analysis Meag Doherty Bioinformatics Secondary School Outreach (BSSO) is an initiative to develop bioinformatics capacity among high school students in Nigeria. It will create early interest in genomics data analysis among the students and equip them with the relevant skills and knowledge in bioinformatics. Bioinformatics Hub Nigeria will train these students to use bioinformatics tools and pipelines, and will establish bioinformatics research clubs in the visited schools to facilitate the trainings. We will work alongside other sister organizations to achieve this goal. 5
OpenGHG - a cloud platform for greenhouse gas data analysis and collaboration Gareth Jones Greenhouse gases, repeatable science, data science, data sharing, data analysis, open source Michael Addy The OpenGHG project is a NERC funded project that aims to be a community platform for greenhouse gas data science. There is currently no central platform for greenhouse gas / atmospheric chemistry researchers to access standardised data / workflows, or easily share and analyse their measurements. Currently our prototype service processes and standardises the raw measurement data taken from sensor networks worldwide (such as the DECC and AGAGE networks), records associated metadata and makes this data searchable. We are currently in the process of adding the ability to process data from other sources, such as satellite and meteorological models. 5 graduated
Visualisation of participants by their countries Akanksha Chaudhari Data Visualization, database, web application Muhammet Celik, Burce Elbasan I would like to work on the project ‘Visualization of participants by their countries’ from the projects listed here.
With this project, I would try to represent participants and mentors participating in OLS on a map by their countries and year of participation. I am planning to achieve this using Mapbox and OpenStreetMap. When hovering over a particular flag, we can see all of that participant’s or mentor’s information, which is listed here or this one.
I would like to make something like this but would do a lot of brainstorming about design and representation. Also, we can add something like showing mentors vs ols-1/2/3/4 participants or showing number per country, or anything else creative. The next step would be to implement this on the official site of OLS. I want to practice my coding and visualization skills through this project and would appreciate the opportunity to meet, interact and work with like-minded people. 5
Build a community around the TU Delft Open Science MOOC Alessandra Candian, Lisanne Walma Open Science, Community Building, Open Education, Engagement Patricia Herterich In this project we want to develop and implement new ways of building, engaging and maintaining the community around the TU Delft Open Science MOOC. We are part of the teaching team of the TU Delft Open Science MOOC called: ‘Open Science: sharing your research with the world’. The MOOC’s next run starts in May 2022.
The course runs for 6 weeks and discusses a variety of open science topics; course materials are also available on TU OpenCourseWare.
The first Open Science MOOC started in 2018 and on average the course attracts about 1000 participants from an international environment. As the course runs participants engage with the teachers and each other through discussion forums. Here they introduce themselves, post assignments, and share and reply to each other’s thoughts. While a few participants actively contribute and respond in the fora, engagement is still quite limited. Moreover, there is not yet a strategy in place to maintain the community after the course has finished. We would like to strengthen the community building taking place during and after this course by implementing additional strategies to engage participants during the course run and keeping up with participants after the course has finished. 5 graduated
Hub23: An open source community and infrastructure for Turing’s BinderHub Callum Mole, Lydia France, Luke Hare Open Source, Reproducibility, Community, Open Infrastructure, Research Renato Alves BinderHub is a service that allows users to share reproducible interactive computing environments through public code repositories. The subject of our project, Hub23, is an organisational deployment of BinderHub, designed to allow Turing researchers to use Binder (the user interface) to collaborate on repositories internal to Turing. This is sometimes necessary if the underlying repository cannot be shared for some reason, or is not yet ready to publish openly. During the OLS program, we aim to build an open community around Hub23 to help guide future technical developments, and encourage use and contributions from the wider Turing community. We will host a series of Zero-to-Binder workshops aimed at introducing Turing researchers to regular Binder, followed by structured discussion of what the ideal features of a collaborative reproducible environment for research would be. Any conclusions and subsequent technical development will be fed upstream to BinderHub, and we also aim to open source the methodologies used to create an internal BinderHub deployment, allowing other organisations to do so. 5 graduated
Development of an Open Source Platform for the Storage, Sharing, Synthesis and Meta-Analysis of Clinical Data Valentina Borghesani, Isil Poyraz Bilgin, Pedro Pinheiro-Chagas, Sladjana Lukic neuropsychology, neuroimaging, data sharing, cognitive neuroscience, clinical neuroscience, data visualization, meta-analysis Sara El-Gebali We aim to build an online platform and community that allows open sharing, storage, and synthesis of clinical (meta)data, crucial for the development of modern, transdiagnostic, FAIR neuropsychology. First, published peer-reviewed papers will be scraped to collect already available (meta)data. Second, our platform will allow direct uploading of clinical brain maps and their corresponding metadata.
A basic automated preprocessing and data-quality check pipeline will be implemented. Key data will be automatically extracted, synthesized, and made available alongside the data uploaded directly. All the available demographic, behavioral, clinical, and cognitive data will be properly organized and mapped onto the neural data to allow statistical analysis (i.e., data-driven lesion-symptom mapping). Ultimately, probabilistic maps synthesizing transdiagnostic information on lesion-symptom mapping would be constantly updated as more data are gathered. To this end, data visualization will be critical. Overall, the platform will
1. enable sharing of FAIR neuropsychological datasets across research centres and groups;
2. foster understanding of the topographical distribution and morphological characteristics of brain lesions;
3. allow large-scale, data-driven exploration of the associations between behavior and cognitive symptoms and brain regions. 5 graduated
Open Science, Open Future Mariangela Panniello open education, student training, community building, educational resources, neuroscience Sara Villa Open science is vital for reproducible, fair, and rigorous research. For its principles to thrive, we need OS practices to be shared and adopted by as many scientists as possible, from the earliest stages of their career. A 2017 survey by the European Commission reports that, among 1277 researchers at all career stages, the majority were unaware of the OS concept, and had never attended an OS initiative (from the Open Science Skills Working Group Report, July 2017).
For open research to become an established reality, those who are taking their first steps into research must have the opportunity to develop the necessary skillset to apply and disseminate the OS framework. “Open Science, Open Future” aims to be an educational resource to be used online by young scientists: undergraduates, MSc students, and potentially high-school pupils. The curriculum will consist of several modules explaining why each aspect of the OS practice can improve research and make it fairer (e.g. best practices in sharing protocols, storing data, publishing, collaborating). I’m the co-founder of a pan-European collective of scientists, aiming at rethinking the way we do science. Fellow members are willing to take part in the project. 5
Open Science for Improve diagnostics of Cancer through Artificial Intelligence and Digital Pathology Nodira Ibrogimova, Elisee Jafsia, Stephane Fadanka, Agossou Bidossessi Emmanuel AI, Cancer, Diagnostics, Digital pathology Andres Sebastian Ayala Ruano Cancer is becoming increasingly prevalent among the group of treatable diseases in African countries. In sub-Saharan Africa, only 10% of histopathology needs are met and this is a major barrier to comprehensive management of cancers (1). There is a shortage of clinicians and pathologists available for cancer diagnosis and treatment. One of the critical factors in treatment efficiency is the correct and timely diagnosis of specimens by pathologists. However, there is currently a significant shortage of cancer care clinicians in Africa and an even more considerable shortage of pathologists. In Cameroon, there are 19 pathologists currently in practice for 22,179,707 inhabitants (2). The absolute number of patients with cancer in Cameroon was estimated to be 25,000 cases a year.
Diagnosis of cancer relies on histology in nearly 80% of cases, cytology in 10%, and clinical diagnosis in 10% (1). There is, therefore, an urgent need to develop a rapid, highly sensitive diagnostic tool for cancers, to increase cancer treatment efficacy and reduce overtreatment of tumors clinically suspicious for malignancy. We propose a hybrid diagnosis method with a deep learning algorithm applied to hematoxylin and eosin histology slides. Digital microscopy and telepathology were already successfully used to mitigate the lack of pathologists in Cameroon, thus confirming the availability of a robust dataset for our project (1). After splitting the collected images into training, validation and test sets, we will train CNNs on them before deployment and testing. 
In addition to automated diagnosis, the developed program will have specific features such as sample information storage and tracking software as well as image optimization and analysis tools.
1. Gruber-Mösenbacher, U., Katzell, L., McNeely, M., Neier, E., Jean, B., Kuran, A., & Chamala, S. (2021). Digital pathology in Cameroon. JCO Global Oncology, 7, 1380-1389.
2. Ministry of Public Health (2017) Health analytical profile 2016 Cameroon. Ministry of Public Health, Cameroon, Yaounde. 5 graduated
Cultivating a Community of Practice of AI researchers Achintya Rao community management, open and reproducible data science, Artificial Intelligence, community white papers Yo Yehudi The “AI for Science and Government” (ASG) programme at The Alan Turing Institute seeks to produce three community-led white papers that will capture the outcomes of research into deploying AI and data science in priority areas to support the UK’s economy. The papers will also highlight advances in practices towards open and reproducible research in the fields of AI and data science. The process of authoring the white papers will itself be collaborative, open and transparent, soliciting contributions from the wider ASG community at every step of the way. 5 graduated
Argentinean Public Health Research on Data Science and Artificial Intelligence for Epidemic Prevention (ARPHAI) Verónica Xhardez, Sabrina López, Victoria Dumas, Federico Cestares, Laura Ación Mayya Sundukova ARPHAI is an interdisciplinary research consortium, whose mission is to develop technological tools and recommendations to anticipate and manage epidemiological events. ARPHAI pilots data-driven open source tools using artificial intelligence and data science towards upgrading Argentina’s electronic health record (EHR) system. ARPHAI is part of the Global South AI4COVID Program.
ARPHAI includes persons from 20 institutions. ARPHAI started in October 2020 and has grown very fast from scratch. More specifically, ARPHAI is piloting three EHR-based components in parallel to anticipate and detect potential epidemic outbreaks:
1. The extraction of computable phenotypes of diseases, symptoms, and syndromes using natural language processing to analyze EHR structured and free-text;
2. Models for understanding and prediction of relevant epidemiological variables using computable phenotypes and open data information as input; and
3. Dashboard visualization of the results from both points above, along with additional open data sources to inform decisions made by the public sector epidemiological authorities.
There are two additional lines of work ARPHAI undertakes that are transversal to these three research developments, which include a) diversity, equity, and inclusion (DEI) with a focus on gender and b) responsible use of health data. 5 graduated
FarawayFermi - A platform for open source bioinformatic tools to detect biosignatures in astrobiology Sagarika Valluri, Sairaj R Dillikar Bioinformatics, data, astrobiology, biosignatures, life evolution, game theory The project focuses on building tools to understand the evolution of life. We develop bioinformatic tools to determine evolutionary processes to detect early stage life development. The project looks at data from two specific parts of detecting life: co-evolution of life and environment, and biosignature assessment within the context of habitability. We use data from current experimental projects and develop new models to aid the growth of the astrobiology search for life. The platform will cater to multiple sections, such as data management for all astrobiology projects, experiments, research labs and conferences; new tools to analyse data; a predictive model section for simulations from the data set; and a collaborative forum to encourage citizen science. The platform will help create open source bioinformatic tools to help detect biosignatures, assess habitability, and promote involvement within astrobiology. In addition to using bioinformatic tools, a part of the platform will use game theory and gamification to test the citizen science component. Using the platform as a destination for both tool testing and science education, we hope to advance the research in astrobiology. 5
Open collaborative network and incentive system for brain health Juyeon Kim Collaboration, building trust, intersectoral collaboration, Emotional/Physical/Nutritional diet for brain, global connectivity, Digitalization, Incentives to public and scientists
  1. Scientific perspectives. Depression is becoming a more common mental illness, caused by a complicated network of various extrinsic and intrinsic stimulators. For a healthier and happier brain status, three key diets (emotional, physical, and nutritional) should be considered. Scientific researchers mostly focus on unraveling the key molecules associated with the mechanism of depression. However, we need to pull and process more extensive datasets from individuals, public health, and professionals in psychology, nutrition science, physical science, and neuroscience to keep the brain in, or help it recover to, a healthy status.
  2. Open science perspectives. For comprehensive data, we need to build a collaborative digital network based on trust, with benefits for each participant, to accelerate innovation from the open dataset. An effective collaboration tool or co-creation platform for intersectoral collaboration would be required to put this project into practice.
An open educational resource to introduce fundamental concepts of GNU/Linux, terminal usage, Bash/AWK scripting, and Git/GitHub for Bioinformatics Andres Sebastian Ayala Ruano version control, training and education, programming Alexander Martinez Mendez, Julien Colomb This project aims to create an open educational resource that introduces important concepts for aspiring Bioinformaticians, written in Spanish. Topics included in this resource are:
* The basics of GNU/Linux
* Jupyter Lab
* Terminal usage
* Text and file processing command-line tools (e.g. grep, sed), regex, and pipes
* Text and file processing Bioinformatics exercises
* Make to install software
* SAMtools: useful pipelines and software for Bioinformatics
* AWK: A programming language for text file processing
* Bash: Shell and programming language
* Git and GitHub for version control
As part of a boot camp organized by RSG Ecuador and iGEM Ecuador, we generated the first version of this resource, available as an e-book powered by Jupyter Book and GitHub. However, this resource has not been launched yet because we have doubts about open licenses, permissions to use external images, and other topics that it would be nice to learn about during OLS-4. 4
Genestorian: An Open Source web application for model organism collections Manuel Lera Ramirez web application, genetic engineering, reproducible research Sam Haynes I want to develop Genestorian, a web application to manage collections of model organism strains and recombinant DNA, which will store the genetic engineering steps followed to generate new entities from existing ones. The project would consist of developing:
1. A standard file format to document genetic engineering steps
2. A web application to generate this documentation in the browser or programmatically
3. A web application with a database to store such information
None of the above software pieces, which are all essential for open reproducible science, exist as Open Source, and proprietary solutions do not cover the use-case of model organism research.
Essentially, Genestorian will be a web application that researchers use routinely to consult the “genealogy” of existing biological resources, plan the generation of new resources, and attach experimental data supporting their successful generation.
The cornerstone of this project is the file format mentioned above. It will be similar to a data structure for a family tree: it will store a list of entities, and a list of objects describing their relation. Something like this:
{
  "entities": [{id:1, ...},{id:2, ...},{id:3, ...}],
  "steps": [
    {
      "inputs_ids": [1,2],
      "output_id": 3,
      "method": {...},
      "proofs": [{...},{...}]
    },
    {...}
  ]
}
4
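The family-tree idea behind the Genestorian file format can be illustrated with a minimal Python sketch. It assumes only the field names given in the description ("entities", "steps", "inputs_ids", "output_id", "method", "proofs"); the example values, the "name" key inside "method", and the `ancestors` helper are hypothetical and are not part of the project.

```python
import json

# Hypothetical document in the sketched format. Field names come from the
# project description; every value (ids, method names) is made up here.
DOCUMENT = """
{
  "entities": [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}],
  "steps": [
    {"inputs_ids": [1, 2], "output_id": 3,
     "method": {"name": "example-method-a"}, "proofs": []},
    {"inputs_ids": [3], "output_id": 4,
     "method": {"name": "example-method-b"}, "proofs": []}
  ]
}
"""


def ancestors(document, entity_id):
    """Return the ids of all entities that entity_id was derived from."""
    # Map each produced entity to the inputs of the step that produced it.
    parents = {step["output_id"]: step["inputs_ids"] for step in document["steps"]}
    found = set()
    stack = list(parents.get(entity_id, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            # Walk further up the tree through the parents of this entity.
            stack.extend(parents.get(current, []))
    return found


genealogy = json.loads(DOCUMENT)
print(sorted(ancestors(genealogy, 4)))  # [1, 2, 3]
```

Storing steps as a flat list with input and output ids, rather than nesting entities inside each other, keeps the file easy to append to and lets the "genealogy" of any entity be recovered with a simple traversal like the one above.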
Multibeam electron microscopy for imaging large tissue volumes Arent Kievits electron microscopy and imaging, deep learning Esther Plomp, Martin Jones Recent developments in electron microscopy have led to a significant scale-up in the imaging of biological tissues, making throughput a major bottleneck for further progress. Electron microscopes are inherently throughput limited. A new type of scanning electron microscopy, multibeam electron microscopy, speeds up imaging by scanning the sample with an array of beams instead of a single beam. Furthermore, this microscope makes use of a new detection system based on transmitted electrons and scintillation photons, which provides comparable information to conventional detection methods. To make use of the full potential of this microscope, new methods for data management, data analysis and visualization have to be designed. For example, we would like to employ deep learning methods for automatic segmentation. We would like to apply the multibeam electron microscope to study mitotic cells, zebrafish development and pancreatic stress. 4 graduated
Open data for nanosystem synthesis experimental conditions Guillermo Luciano Fiorini, Diego Onna, Tobías Aprea database, FAIR principles, research community Gracielle Higino We propose to build a database that compiles literature data for the Stöber synthesis of silica nanoparticles. This database is intended to support other studies, such as statistical and machine learning models to guide the design and synthesis of calibrated and monodispersed silica nanoparticles by the Stöber method. The data should follow the FAIR principles to make it findable, accessible, interoperable, and reusable, making it public and available for any person that is interested in the study of this synthesis. We also aim at building a community of contributors of new synthesis data to enrich the dataset, which will allow better models for Stöber synthesis to be studied. Our long-term vision is that the nanosynthesis research community opens and shares its data as this will advance the nanosynthesis field in a more sustainable way globally. 4 graduated
Grassroots: Nurturing the EMBL Bio-IT Community Lisanna Paladin training and education, research community, peer consulting Emmy Tsang, Dave Clements The Bio-IT project at the European Molecular Biology Laboratory (EMBL) is a community initiative aiming at:
- delivering training in computational research skills;
- creating connections between community members;
- developing and maintaining resources and supporting infrastructure;
- disseminating relevant information throughout the community.
In order to strengthen the community interactions, Bio-IT launched the Grassroots consulting initiative, listing volunteers among EMBL Staff interested in providing assistance on a wide range of computational topics. Building on this effort, I aim at expanding the crowdsourcing of Bio-IT’s activities and supporting the community of practice at EMBL. In line with Open Science objectives, a culture of sharing (of skills, resources, data and ideas) within the institute will also foster the same culture beyond it. Within the OLS program, I will elaborate a Grassroots project strategic plan by:
- identifying actionable information on the current interaction with the community;
- performing a SWOT analysis of community building strategies at EMBL, in particular:
- analysing EMBL-specific challenges and developing strategies to address them;
- defining the project goals, milestones and deliverables and stating them publicly.
The objectives of the project are:
- strengthening peer-consulting and internal communication,
- acknowledging Bio-IT contributors and giving them visibility,
- renewing the community interest in Bio-IT and computational best practices. 4
Balconnect - A network of private outdoor areas improving urban ecoliteracy and biodiversity Adel Sarvary biodiversity, research community, green infrastructure, policymaking Emma Karoune Balconnect is a social and environmental intervention with the long-term goal of using open-source, emerging data-driven technologies and open science tools to motivate and enable urbanites to bring active daily experiences of real nature into their lives, as well as to provide them with tools not only to improve their direct, individual natural environment, but to conserve local biodiversity and ecosystem services as well.
Balconnect aims at building knowledge-based people-plant interactions using open, community-focused practices and augmented collaborative learning, while mapping and building a structured network of metropolitan outdoor ornamental plants raised on window sills, balconies, terraces and backyards.
As the open database is being built, participants could gradually:
- Document and visualize their personal eco-legacy by learning and sharing nature with their communities (through data, experiences, plants, cultural products).
- Increase demand for locally cultivated and native plants.
- Consciously create biodiverse green patches in private outdoor areas.
- Collect valuable data and conduct open research for science.
- Make cities more liveable, reducing urban green inequalities.
- Influence local government decisions on city planning, green infrastructure and nature-based solutions. 4 graduated
Generic data stewards in the Netherlands: who they are, what they do, and who they could become Elisa Rodenburg data stewards, research community, interview Carly Monks, Alexandra Holinski In my project, I want to interview several colleagues/peers who were hired or recently started working as generic data stewards at Dutch universities, and, to some extent, Dutch Universities of Applied Sciences – I am one of them. I will rework these interviews into blog posts and analyse what I learnt from these interviews. The interviews will focus on the following elements:\n1.\tWhat is their background, how did they enter this job?\n2.\tWhat is expected from them in this role?\n3.\tWhat is their role/position within the landscape of RDM and Open Science support at their institution?\n4.\tWhat skills and competencies do they already have?\n5.\tWhat skills and competencies would they like or need to acquire?\n6.\tWhat does a possible career path look like for them?\n\nI will analyse these interviews to make suggestions about necessary training for new generic data stewards, why (and how) they are necessary for the institution, and their possible career paths. We have a blog about the RDM Community at the VU, which is meant to showcase the RDM Community. This is a possible location for my blogs and other progress made in the project. 4
Open and reproducible data analysis for wet lab neuroscientists Sara Villa training and education, reproducible research, sequencing Hans-Rudolf Hotz My project aims to tackle the large gap between the growing use of RNA sequencing analysis, with its massive expansion of datasets, and the difficulty wet lab scientists have in using this data or even reviewing it when it is published. I would also like to create awareness in my community about the necessity of implementing open science and reproducibility tools.\nI would like to: \n1st, publish a tutorial from a basic level to help people understand the technology and the data analysis behind RNA sequencing. Implement open science tools for this, and introduce every scientist to their existence and need. \n2nd, create a reproducible analysis pipeline, based on my existing one, but incorporating reproducibility basics such as version control and workflows, so researchers can see how the tutorial would work from an existing example. 4 graduated
Building the Research Software Engineering (RSE) Association in Asia region Saranjeet Kaur Bhogal research community, research software engineering Anne Fouilloux We plan to create a “Research Software Engineering Association in Asia region – RSE Association (Asia)”. The aim of this association would be to emphasize the importance of researchers in Asian academia adopting good software practices. Software and programming play an important part in most research these days. Yet, academia has not fully adopted good modern software practices and principles. We also plan to create awareness of the Research Software Engineering (RSE) role in Asian academia so that people who have expertise in software as well as research find a firm footing in academia. The plan in this project is to first build a community of interested people. We would also like to build the technical set-up required for this project: a GitHub repository for easy collaboration; a website which would describe the aims of this association and include information on its future events; a mailing list which people can use to join this association easily; a Twitter account; and a Slack channel in the global RSE Slack. This project is highly inspired by the Society of Research Software Engineering in the UK. Saranjeet Kaur Bhogal is the primary applicant of this project, and the one who initiated the project and came up with the idea. 4 graduated
Encouraging Responsible AI Through An Open Framework for Synthetic Data Generation and Assessment Erika Salomon, Caitlin Augustin synthetic data, Ethical AI Fotis Psomopoulos Operating from a place of data for co-liberation, we have three complementary goals: conduct a landscape analysis of open source synthetic data projects, ask critical questions about the embedded assumptions and ethical considerations in generating synthetic data, and recommend appropriate approaches to synthetic generation for multiple case studies. \n\nWe see an urgent need for this work - while methodologies have become more common in the financial policy realm, most applied data researchers outside of large financial or tech companies do not have access to - nor an understanding of - the state of the art approaches, how to evaluate appropriateness of generation for their use cases, and how to evaluate synthetic data fitness for use. \n\nWe are expressly sector-agnostic, bringing expertise from health, environment, and education backgrounds to this problem - all sectors with a need for privacy-protecting solutions. Building on sector-agnostic frameworks such as Datasheets for Datasets, and the very recent CDEI-UK PETs Adoption Guide we aim to deliver a similarly sector-agnostic framework for synthetic data generation. With our interdisciplinary background, we will approach the development of such a framework in a way that acknowledges the political nature of data production and openly consider questions of bias, fairness, and ethical AI. 4
Talleres Open Source community platform Cecilia Herbert training and education, data visualisation, research community Yo Yehudi Our initiative is called “Talleres Open Source” (Open Source Workshops in Spanish), a community of scientists teaching and learning about open source tools in Spanish. We organize workshops in which a trainer with experience in a certain method shows others how to apply it using open source tools, and attendees give it a try with short exercises. For example, “Data visualization using Python”, or “Digital Fabrication with FreeCAD”. We have hosted two cycles of 4 workshops spanning 1 month, with both software and hardware tools, as well as two stand-alone events about design heuristics with a local fablab.\n\nThe goal of this project is to find common problems within Latin American scientific communities and hold workshops and training courses to tackle these problems. Each of these workshops serves as a way to connect people facing the same problem, introduce them to open source tools and concepts, and enable sharing resources. We hope attendees finish each workshop with a concrete first impression of the method and hands-on experience with the tool to reduce the barrier to adoption. 4 graduated
Creating an open database for carbon foot printing of buildings in Ghana Michael Addy energy, database, carbon footprinting Yvan Le Bras, Kate Simpson There is a considerable gap in sub-Saharan African countries regarding the assessment of the embodied energy of building materials and the operational energy of various building types. Lack of open data remains a critical barrier to closing this gap. Creating an open and accessible database of the embodied energy of building materials and the operational energy of building typologies will be key in establishing the carbon footprint of buildings in Ghana. The development of an online platform will also allow interested groups, individuals and corporations to submit key information needed for the computation of energy outputs in buildings. The aim of the project is to set up the infrastructure to launch an open database for carbon footprinting of buildings in Ghana, following best practice in open science principles. 4 graduated
Environmental mapping for urban farming project Florence Okoye agriculture, DIY and makerspace, biodiversity Lilly Winfree Portable land is an urban DIY Agriculture project which aims to create a distributed farm. Currently we have four dedicated sites based in the West Midlands - a washyard which has been converted into a growing space, a community grow room, a patio for urban farming and an indoor DIY aquaponics setup. We are also part of a larger community of urban farmers and DIY horticulturalists in the West Midlands, sharing knowledge and supporting collective organising of growers and urban naturalists. \n\nThe primary goal of the project is to create a distributed network of growers and land guardians, but in order to achieve this, we need to develop consistent protocols for understanding environmental quality and biodiversity across our sites as well as shared repositories of data and reports. 4
Culture-independent discovery of natural products from soil metagenomes Mai Alajaji, Batool Almarzouq, Leena AlMehlisy citizen science, therapeutics, soil sample, sequencing, metagenomics Bérénice Batut We are working on establishing a Research Hub in the National Guard. This hub is a virtual platform linked to TDM to support academics (particularly young female researchers), scientists, and physicians. It aims to bridge the boundaries between research, foster cross-subject collaboration, and establish a community of like-minded people. It will work towards increasing the visibility of ECRs and invite them to apply Open Science practices in their research. A part of this Research Hub is to establish a Citizen Science Soil Collection program in Saudi Arabia. This project aims to adapt the Citizen Science approach in Saudi Arabia to bring together researchers with citizens. Our lab's focus is the use of metagenomics of Natural Products (NPs), and it is based at King Abdullah International Medical Research Center (KAIMRC). Metagenomics of NPs is an innovative approach that utilizes next-generation sequencing to study microorganisms via the analysis of their DNA acquired directly from an environmental sample. The screening of natural product extracts has traditionally been the most effective method for identifying new compounds with unique cellular targets which are potentially useful as lead structures for the development of new therapeutics. \n\nHowever, there is hardly any known project which utilises Citizen Science in Saudi Arabia. Therefore, we are collaborating with Open Science community Saudi Arabia (OSASA) to adapt this approach in our current project. This project will engage citizen scientists in collecting and examining soil samples from various regions in Saudi Arabia and bring awareness about the role of Citizen Science in research as part of the Research Hub. 4 graduated
Developing a library in Python for applying measures of emergence and complexity Nadine Spychala complexity measures, programming, Python library Dario Pescini, Anthony Bretaudeau I aim to develop a Python library which allows users to call and apply several measures of emergence and complexity to either empirical or simulated data, and provides guidance for comparisons among and conclusions about different measures.  \n\nMeasures of complexity operationalize the idea that a system of interconnected parts is both segregated (i.e., parts act independently), and integrated (i.e., parts show unified behaviour). Emergence, on the other hand, is a phenomenon in which a property occurs only in a collection of elements, but not in the individual elements themselves. Both emergence and complexity are promising concepts in the study of the brain (with a close relationship between the two).  \n\nQuantifications thereof can take on very different flavours, and there is no one-size-fits-all way to do it. While a plethora of complexity measures have been investigated quite substantially in the last couple of decades, quantifying emergence is completely new territory. A few measures exist, but they are not readily implementable - they are scattered over different GitHub repositories (or people, if repositories do not exist) and programming languages (including Matlab, which is not open source).  \n\nA way to easily use & compare a set of state of the art emergence and complexity measures by using a few lines of code is thus missing – this is the gap that I’d like to fill. 4 graduated
The Environmental AI Book Alejandro Coca Castro artificial intelligence, data science, reproducible research, research community Delphine Lariviere We propose a living, free and open document, named The Environmental AI book, compiling research in the application of AI and Data Science for monitoring and modelling a wide diversity of settings of the natural and urban environments.\n\nThrough a set of interactive use-cases, the document, powered by Jupyter Book, aims to inform and guide the scientific community about information extraction and analysis from environmental sensors (including ground sensors, drones, and satellite Earth observations) using data-driven methods.\n\nIn addition to the book, our goal is to build a community dedicated to making collaborative, reusable, and transparent research in environmental science. In this regard, inspired by The Turing Way, we are hosting online \nCollaboration Cafes to engage anyone interested in learning and discussing relevant themes in AI and data science to help understand our changing planet.\n\nWhile the scientific community is broad, we think the target audience of this book is:\n- Researchers with some background in environmental science interested in data-driven methods.\n- Researchers with some background in computer science interested in environmental studies.\n- Anyone else interested in reproducible, inclusive, shareable and collaborative AI and data science for environmental applications. 4 graduated
Bioinformatics Secondary school Outreach in Nigeria Emmanuel Adamolekun outreach, secondary school outreach, training and education Meag Doherty Bioinformatics Secondary School Outreach (BSSO) is an initiative to develop bioinformatics capacity among high school students in Nigeria. It will create early interest in genomics data analysis among the students and equip them with the relevant skills and knowledge in Bioinformatics. Bioinformatics Hub Nigeria will train these students to use Bioinformatics tools and pipelines, establishing Bioinformatics research clubs in the visited schools to facilitate the trainings. We will work alongside other sister organizations to achieve this goal. 4
Citizen Scientists as Data Explorers Wai-Yin Kwan citizen science, environmental data, outreach Bruno Soares I want to develop a project that gives iNaturalist citizen scientists the chance to move from data collectors to data explorers. I want to use my programming skills and outreach experience to create an online tool where users can browse through iNaturalist and environmental data. By creating online data exploration tools, I want users to form their own questions and look for answers to those questions. 4 graduated
EROS Stories: Conversation and case studies in open research across educational disciplines Cylcia Bolibaugh, Gill Francis research community, case study, training Lena Karvovskaya, Esther Plomp Established in 2018, EROS (education researchers for open science) is an open research working group at the University of York. We monitor and communicate ongoing developments within the open research landscape, provide guidance and training on adopting open practices, and influence incentive structures to recognise commitment to open research practices.\n\nThe aim of the EROS Stories project is to deepen EROS as a community of practice by providing a mechanism for junior and senior researchers to engage in dialogue about their experiences with particular open research practices, and to showcase the resulting conversations as publicly available case studies.\n\nConcretely, EROS Stories will pair researchers, at least one of whom must be an ECR, and at least one of whom must have experience with a particular research practice. The less experienced partner (who may be senior) commits to reading at least one primer on a particular open research practice, and then leads a conversation with their partner, asking about their experiences, motivations, and insights and top tips for working with the practice.\n\nThe project will help the EROS community build a shared repertoire of experiences, stories, tools and ways of addressing common challenges in doing open, inclusive research. 4
FarawayFermi- A platform for open source bioinformatic tools to detect biosignatures in astrobiology Sagarika Valluri astrobiology, environmental data, citizen science Harpreet Singh The project focuses on building tools to understand the evolution of life. We develop bioinformatic tools to determine evolutionary processes to detect early stage life development. The project looks at data from two specific parts of detecting life - co-evolution of life and environment, and biosignature assessment within the context of habitability. We use data from current experimental projects and develop new models to aid the growth of the astrobiology search for life. The platform will have multiple sections: data management from all astrobiology projects, experiments, research labs and conferences; new tools to analyse data; a predictive modelling section for simulations from the data sets; and a collaborative forum to encourage citizen science. The platform will help create open source bioinformatic tools to help detect biosignatures, assess habitability, and promote involvement within astrobiology. 4
A guide towards reproducible research for Decision Sciences researchers Andreea Avramescu reproducible research, data science Jessica Scheick In academia today there is a certain pressure for young researchers to publish, and given the time constraints and continuous deadlines, the aspects of reproducibility and replicability are often overlooked. The Turing Way is a great starting point and a guide for people who want to get familiar with what needs to be done to make their results, data, hypotheses, etc. available to the research community and the public. However, it is currently oriented towards issues encountered mostly in Data Science, and while many of the resources are extendable to other fields, I consider that it could benefit from specific chapters focused on different research areas. The aim of this project is to create such a chapter by understanding the exact barriers for young researchers when considering reproducible research in Decision Sciences. Rather than a collection of resources that can simply be used at the end to make “your research more reproducible”, the guide would offer a way of thinking and contain resources that help you consider reproducibility throughout the entire research process. 4
Building Open Science and Data Analysis Skills by Leading the OLS Survey Data Project Burce Elbasan survey data, data visualisation, programming Beth Duckles Although my research is mostly wet-lab based, I am aware that, with advancing technologies, computational and data analysis approaches in research promise great opportunities. Therefore, I am proposing the Open Life Science Survey analysis project to participate in the Open Life Science (OLS) community and make a contribution to the program. In this project, I will be working on already collected survey data from their 3 cohorts. In my opinion, in this era, barriers to open research are not technical but rather socio-cultural. Therefore, by analyzing OLS participants’ survey data, we could gain insight into the demography and socio-culture of the participants. In this way, the OLS team will get a chance to enhance their program for the future, and I will learn how to perform data analysis, deal with survey data and gain some programming skills as well. I believe this opportunity will help me develop my academic skills and give me different perspectives on open research, which are different from but beneficial for my current research. 4 graduated
Hub23: An open source community and infrastructure for Turing’s BinderHub Lydia France, Luke Hare, Callum Mole research community, technical development Renato Alves BinderHub is a service that allows users to share reproducible interactive computing environments through public code repositories. The subject of our project, Hub23, is an organisational deployment of BinderHub, designed to allow Turing Researchers to use binder (the user interface) to collaborate on repositories internal to Turing. This is sometimes necessary if the underlying repository cannot be shared for some reason, or is not yet ready to publish openly. During the OLS program, we aim to build an open community around Hub23 to help guide future technical developments, and encourage use and contributions from the wider Turing community. We will host a series of Zero-to-Binder workshops aimed at introducing Turing researchers to regular binder, followed by structured discussion of what the ideal features of a collaborative reproducible environment for research would be. Any conclusions and subsequent technical development will be fed upstream to BinderHub, and we also aim to open source the methodologies used to create an internal BinderHub deployment, allowing other organisations to do so. 4
Online event “Women in Data Science - Perspectives in Industry and Academia” Part II Irena Maus scientific events, data science, gender equality Iratxe Puebla The project is about the conception, organization, and coordination of an online event with a focus on identifying the key driving factors for a scientific career as a woman in data sciences. During the event, we will try to get to the bottom of the large gender gap in the data science field and present efforts to get women into this field, further driving progress towards gender equality. The aim of such an event is to show how diverse and attractive the job of a data scientist can be, including open and fair data principles. My colleagues and I already organized such an event in July 2021. The number of participants, and therefore the response and demand, were so great that we decided to hold the meeting again in autumn 2021 with a slightly different thematic focus. 4 graduated
open and international River University Ewa Leś biodiversity, training and education, river and freshwater ecology, environmental data Emily Lescak The international and open RIVER UNIVERSITY started in Poland with its pilot in 2018 and the 1st edition in 2020.\nIts professional river education has the ambition to change the reality by creating a strong center/network of modern knowledge, linking experts and giving the opportunity to participants to learn about the peculiarities of different rivers in the Baltic region and in Europe.\nWe provide tools to use in practice, exchange information and experience about inland waters and its impact on the Baltic Sea, to spark joint initiatives for the sake of rivers’ good condition.\nRiverine topics always reflect source-to-sea approach and relation to current trends, challenges, legislation regarding freshwaters, and free-flowing rivers: restore nature law, UN restoration decade, Biodiversity strategy 2030, European Green Deal, national recovery plans, etc.\nIn 2018, the River University mostly served for the purpose of gathering river experts from several transboundary basins of Poland (Odra, Vistula/Western Bug, Neman) with their counterparts from Belarus and Ukraine to discuss i.a. issues related to inland navigation and large infrastructure projects on straightening river flows in these countries. 1st edition of River University in 2020 provided general solid knowledge about healthy rivers, their benefits and threats, presented one of the most stunning rivers in Poland – Drawa river and highlighted current riverine challenges impacting the society. All these looking towards European community goals, as usual.\n\nDuring the 2nd edition in 2021, we swim into Lithuania’s waters, to dive deep – finally live! – into water challenges in the next country in the Baltic region. 
We will get to know best practices of good water management, experience with transboundary river cooperation, and innovations and developments within sewerage system management. We can already read about flood risk management and about river barriers to remove or mitigate in the Baltic Sea Region. This time we ask – how are the Lithuanian rivers?\nBeing a visitor of the largest protected area in Lithuania, crossing rivers outdoors, stepping into practical lectures in the national park, we will also ask about how to limit riverine pollution from tourism and how the connectivity of amazing Lithuanian rivers is ensured.\nChecking the European background: what does the situation of rivers in Europe look like in general? Later in time, I may widen it to Europe, not only the Baltic region.\n\nRiver University has been granted patronage from the European Parliament, and I seek and encourage water-related institution patronage at every edition. It engages top-level lecturers, universities and practitioners, e.g.: the University of Lausanne, Warsaw University of Life Sciences, Leibniz Institute of Freshwater Ecology and Inland Fisheries. 4 graduated
Learning about open science communities and help build “community health” report for The Turing Way Ali Humayun research community, community health metric, reproducible research Arielle Bennett In The Turing Way, we want to systematically understand community practices, including the community engagement pathways, contributors’ roles and the nature of their participation, that have been successful at supporting its community of diverse contributors. Simultaneously, we want to identify factors that may currently prohibit short or long term commitments of our contributors and how they can be further supported.\n\nWith my participation in OLS-3, I will develop a community health report of the project, capturing community development aspects from growth to retention. I will build upon the Open Source community health metric, which involves evaluating the group of contributors that is actively involved in a project, the number of new contributors that join the project, and the members who leave. For online projects, it can also involve tracking the number of community ambassadors, the number of return attendees to events and the rate of churned attendees. Developing an ideal metric in this project will require further deliberation and consultation from The Turing Way team and core contributors. Hence, this project will be collaboratively designed with other community members by actively inviting their contributions and thoughts. 4

This table is based on the work done by Angelica Maineri as part of the OLS-7 cohort.