Science and innovation are not a luxury but a prerequisite for social and economic development (Annan, ). Across different fields, acquisition and analysis of large amounts of data have become a common practice to drive innovation (Yang, Huang, Li, Liu, & Hu, ), particularly with today's highly instrumented data collection methods (Borgman, Wallis, & Mayernik, ). The efficient analysis of such data has an unprecedented potential to transform how we tackle the major challenges faced by humanity, from climate change to food security (Hilbert, ).
Data‐driven innovation can only be achieved through greater access to data, through effective and efficient enabling resources, and by ensuring that the best available expertise is harnessed through them. This is particularly the case when collaboration is needed to address research questions at a continental scale, such as the effect of global impacts on rich, vast ecological systems in the present climate change scenario (Peters, Loescher, SanClements, & Havstad, ). One way of ensuring these conditions is to cultivate and foster a research data infrastructure or cyberinfrastructure (Florio & Sirtori, ), which aims to meet the needs of the research community for democratic access to digital resources and collaborative environments around common practices (Atkins, ). A cyberinfrastructure includes high performance computing (HPC) and large shared data storage, a platform or stack of services that provides methods for leveraging those physical resources, and a community of people and institutes that manage these resources in a sustainable, secure, collaborative, and interoperable way (Goff et al., ).
Colombia's topography and location near the equator make it a highly biodiverse country (Rangel‐Ch, ). The country is one of the 17 “megadiverse countries” in the world according to the United Nations Environment Programme (UNEP). Colombia has suffered an expensive internal conflict for five decades, which was only recently alleviated through a peace agreement in late 2016 (Baptiste et al., ). Lack of stability and limited opportunities in at least half of the country, particularly remote rural regions, have resulted in evident negative socioeconomic and ecological impacts (Baumann & Kuemmerle, ).
The “Colombia BIO” programme led by the Colombian Research Council (Colciencias) is seeking to make sustainable use of this natural capital to drive the growth of the Colombian bioeconomy, social equality, and a long‐lasting peace (Sierra et al., ). In the “Colombia BIO” expeditions, large amounts of data about Colombia's ecosystems are being collected, including novel biodiversity in regions that were previously unexplored due to the internal conflict (Gonzalez, Arenas, Tovar, Pulido, & Tenorio, ). As of 2019, Colombia is one of the 11 country funders of the “Earth Biogenome Project” (EBP; earthbiogenome.org). The EBP “can be viewed as infrastructure for the new biology” that aims to sequence, catalogue, and characterize the genomes of all known eukaryotes to inform ecosystem preservation under the growing impacts from climate change and overexploitation (Lewin et al., ). The EBP consortium in Colombia is led by the University of Los Andes and “BRIDGE Colombia” (Prof F. Di Palma, personal communication).
The capacity to share and analyze this information needs to keep pace with the wealth of information gleaned from these new and upcoming explorations (Canhos et al., ). To date, the national catalogue of Colombian biodiversity (SiB Colombia) (Abud et al., ) includes 7,848 endemic species and around 10% of all known species. Researchers and policymakers need to be provided with comprehensive evidence to inform evidence‐supported decisions on biodiversity management and protection.
To that end, the essential biodiversity variables (EBVs) define a minimum common set of data about taxa (distribution, genome, phenome, traits, ecological interactions, etc.), including their environmental and evolutionary context (La Salle, Williams, & Moritz, ), although the practical implementation of EBVs remains a challenge (Kissling et al., ; Pereira et al., ).
“C3biodiversidad,” the Colombian Cyberinfrastructure Consortium for Biodiversity, aims to develop a research cyberinfrastructure in Colombia, particularly for the analysis of biodiversity data, by sharing computing resources already available in the country, promoting new resources under “open access” incentives, and building skilled human capital capable of operating these resources in the long term. Here (Table S1), we provide a summary of the analysis of Colombia's internal and external strengths and weaknesses (SWOT) for building a local cyberinfrastructure, organized into four subjects (following Sections 4 to 7). These conclusions are an output of the “C3biodiversidad workshop” held in Bogota, Colombia in June 2018. The workshop included 36 experts from 16 leading Colombian institutions and a group of international facilitators and experts that represented a fair distribution of interest groups. These discussions also highlighted the need to coordinate with existing centers of excellence in the country, tap into successful initiatives in the region, and leverage existing international open source resources and projects. Building on these conclusions, we have identified key priorities (Table ) and developed a reference framework (Figure ) for building cyberinfrastructure. While the conclusions reflect Colombia's environment at the time of the workshop, we believe these can be applied to other middle and upper‐middle income countries. The four key priorities are discussed in the following sections.
Four priorities for building cyberinfrastructure in emerging countries. These priorities should be developed in a coordinated manner by local innovators. For each priority, possible interventions are also included as examples.
Priorities | Vision | Mission | Objectives | Open SC. Framework* | Interventions |
Improve the provisioning and availability of physical infrastructure | Accelerate data‐intensive scientific research. | Organically build a distributed sustainable cooperative computational platform. | | | |
Grow training in scientific data analysis for users and providers | A generation skilled in scientific data analysis. | Promote a coordinated accessible programme of training in scientific data analysis. | | “Cross‐cutting issue” | |
Develop and enforce a national policy for research data | Data‐supported decision‐making. | Develop and enforce a national policy for research data. | | | |
Engage diverse stakeholders in research projects and funding planning | A society highly involved and interested in science and technology. | Develop transversal research schemes that reward stakeholders' engagement. | | | |
*Based on the conceptual open‐science (OS) framework defined by the Colombian Research Council Colciencias (Colciencias, ).
A reference framework consisting of four priorities to facilitate the socioeconomic growth in emerging countries through innovation by developing a research cyberinfrastructure
Biodiversity cyberinfrastructures increase data access and reusability, and also support education and effective public policies. To balance the potential costs against the scientific benefits, the research community often self‐organizes to identify the broad‐scale questions that require large data‐driven analyses and can only be addressed with expensive infrastructure, which is then funded by research councils, usually on the condition that it is shared as a community resource.
The main challenge Colombia and other middle and upper‐middle income countries currently face is their limited access to computational capacity and physical connectivity between research institutions. This technological gap is mostly the result of limited funding, the high cost of foundational infrastructure, inconsistent interest from multinational vendors, and short‐term strategic planning. As a result, key academic and industrial institutions prioritise limiting uncertainty, unforeseen overheads, and imported commodities. Still, major universities and centers in Colombia and other emerging countries have access to HPC infrastructure (Cazar, ). However, these infrastructures are primarily, and usually exclusively, implemented to meet the internal needs of the host institution.
It is a priority to deploy high‐performance computational platforms in the institutions of a country as a requirement to accelerate research and skills training. We believe the best option is to progressively integrate existing resources, in increasing orders of complexity, under a fair‐sharing policy that prioritises the host institution, while promoting the sharing of new computational and data storage capacities through capital investments and incentives. Infrastructures require substantial financial investments in the hardware itself, physical space, environment control, management, and maintenance. For example, the CyVerse cyberinfrastructure leverages considerable investment from the USA's National Science Foundation (NSF) (Goff et al., ).
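As an illustration only, a fair‐sharing policy that prioritises the host institution could be sketched as below. This is a minimal sketch under our own assumptions, not a description of any deployed scheduler: the `Job` record, the `admit_jobs` function, and the institution labels are hypothetical names introduced for the example, and real platforms would also account for queue time, quotas, and usage history.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    institution: str  # hypothetical label of the submitting institution
    cores: int        # number of cores requested


def admit_jobs(jobs, host, free_cores):
    """Admit jobs for one scheduling cycle on a shared resource.

    Jobs from the host institution are considered first; jobs from
    other institutions then fill whatever capacity remains.
    Submission order is preserved within each group because
    Python's sort is stable.
    """
    ordered = sorted(jobs, key=lambda job: job.institution != host)
    admitted = []
    for job in ordered:
        if job.cores <= free_cores:
            admitted.append(job)
            free_cores -= job.cores
    return admitted


if __name__ == "__main__":
    queue = [
        Job("a", "inst_x", 8),
        Job("b", "inst_host", 8),
        Job("c", "inst_host", 4),
    ]
    for job in admit_jobs(queue, host="inst_host", free_cores=12):
        print(job.job_id, job.institution, job.cores)
```

In this toy policy an external job is only delayed, never rejected: it is reconsidered in the next cycle, when the host's own demand may be lower.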
Distributed infrastructures are composed of multiple independent and distributed resources that act as one, often with resources provided by different institutions (Towns et al., ), so the initial costs and complexity are distributed. These are usually rolled out in stages with an increasing degree of decentralization (Chaterji et al., ).
A federation of heterogeneous computing resources of the kind proposed needs to address two managerial requirements in order to be successful. Firstly, because most institutions want to retain the right to define their own policies on data management and execution priorities, the system must guarantee that users can access each resource at the right level of privilege. As a result, a distributed system typically “authenticates, authorises, and accounts” (AAA system) the user for each individual system in a centralized server. Secondly, when computational resources and data are dispersed in storage locations among participating organizations, end users should be relieved of the complexities associated with negotiating access rights with individual organizations, moving data back and forth, or porting programs to process the data (Langmead & Nellore, ). Technical software solutions, for example data management middleware such as the open source iRODS software (Rajasekar et al., ), workflow software, and virtual machines (Boettiger, ; Köster & Rahmann, ), provide tested options for data federation, data replication, quota management, and access control.
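The first requirement, a centralized server that authenticates, authorises, and accounts for each access, can be illustrated with a toy model. The sketch below is an assumption‐laden example and not the interface of iRODS or of any real AAA product: the `CentralAAA` class, its token table, and the privilege labels are hypothetical, and a production system would rely on an established identity federation rather than an in‐memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class CentralAAA:
    """Toy central server for a federation of computing resources."""

    tokens: dict      # token -> user id (hypothetical credential store)
    privileges: dict  # (user id, resource) -> privilege level set by the host
    usage_log: list = field(default_factory=list)

    def authenticate(self, token):
        # Authentication: map a presented token to a known user, or None.
        return self.tokens.get(token)

    def authorise(self, user, resource):
        # Authorisation: look up the privilege that the hosting
        # institution granted this user on this specific resource.
        if user is None:
            return None
        return self.privileges.get((user, resource))

    def access(self, token, resource):
        # Accounting: every attempt is recorded, whether granted or not.
        user = self.authenticate(token)
        level = self.authorise(user, resource)
        self.usage_log.append((user, resource, level))
        return level


if __name__ == "__main__":
    server = CentralAAA(
        tokens={"tok-1": "ana"},
        privileges={("ana", "cluster_a"): "submit"},
    )
    print(server.access("tok-1", "cluster_a"))  # privilege granted by host
    print(server.access("tok-1", "cluster_b"))  # no privilege on this resource
    print(server.usage_log)
```

Keeping the privilege table keyed by resource reflects the point made above: each institution retains control over its own access policy, while users deal with a single point of entry.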
A successful precedent of a distributed high‐performance computational platform is the Iberian‐American Network for High‐Performance Computing (RICAP, 2017–2020). RICAP's resources are distributed across 11 sites in various Latin‐American countries, which are connected through RedClara, the network of Latin America's academic networks (Cazar, ). The existence in many emerging countries of state‐sponsored high‐speed academic‐network providers (Red Nacional Académica de Tecnología Avanzada, RENATA, in the case of Colombia) is key to facilitating the necessary physical connectivity between institutions. However, our SWOT analysis highlighted that Colombian research institutions also actively use the connectivity services of private providers (Table S1).
The absence of a comprehensive policy that regulates and enforces access to research data restrains research. It is a priority to develop and implement a national policy for research data that regulates the access, processing, and sharing of data in a standardized way. This would facilitate data‐supported decision‐making, as well as scientific excellence and innovation. In the case of biodiversity, the national implementations of “Access and Benefit Sharing” of genetic resources, designed to give greater control over the natural capital, have also generated regulatory regimes fraught with unintended consequences, a problem that is not exclusive to Colombia (Prathapan et al., ; Wight, ). In Colombia and other emerging countries, there are well‐developed policies regulating other data types, such as government (e‐gov) and personal data, that can serve as examples for developing research data policy (Sanabria, Pliscoff, & Gomes, ). We recommend requiring open access to taxpayer‐funded research, including both generated data and research publications, as recommended by the Organisation for Economic Co‐operation and Development (OECD) (Arzberger et al., ). When funding is a limiting factor, policy needs to maximize return on investment in data generation.
Good data management is not a goal in itself, but rather the key conduit for guaranteeing experimental reproducibility (Baker, ) and maximizing return on investment in data generation by facilitating its reuse by third parties. Four foundational principles, findability, accessibility, interoperability, and reusability (FAIR), usually guide good data management practices among producers and publishers (Wilkinson et al., ). In Colombia, Colciencias has recently published its vision to promote “open science” in the country based on the FAIR principles (Colciencias, ). Significant challenges to implementing data management arise from the size and complexities of modern scientific collaboration (Borgman et al., ). Still, when psychology researchers were asked to rank barriers to data sharing, technological barriers (such as “My dataset is too big” or “There is no suitable repository to share my data”) were at the bottom of the list (Houtkoop et al., ). Similar results were obtained in other disciplines (Kaye, Bruce, & Fripp, ; Van den Eynden et al., ), and in the specific case of Colombian researchers (OCyT, ).
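To make the four FAIR principles more concrete, the sketch below checks a dataset's metadata record for the fields that a repository might require under each principle. It is a simplification under stated assumptions: the field names in `REQUIRED_FIELDS`, the `fair_gaps` function, and the example record (including its placeholder identifier) are hypothetical, and the FAIR principles themselves are considerably richer than a list of mandatory fields.

```python
# Hypothetical mapping from each FAIR principle to the metadata
# fields that a repository might require; illustrative only.
REQUIRED_FIELDS = {
    "findable": ["identifier", "title"],
    "accessible": ["access_url"],
    "interoperable": ["format"],
    "reusable": ["license"],
}


def fair_gaps(record):
    """Return, for each principle, the required fields that are missing or empty."""
    gaps = {}
    for principle, fields in REQUIRED_FIELDS.items():
        missing = [name for name in fields if not record.get(name)]
        if missing:
            gaps[principle] = missing
    return gaps


if __name__ == "__main__":
    record = {
        "identifier": "doi:10.0000/example",  # placeholder, not a real DOI
        "title": "Example occurrence records",
        "format": "text/csv",
    }
    for principle, missing in fair_gaps(record).items():
        print(principle, "is missing:", ", ".join(missing))
```

A check of this kind could run when a dataset is deposited, so that gaps are reported to the researcher before the record is published.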
Data sharing can be incentivized by normative pressure, for example through a strong centralized information system or through requirements from funding agencies and journals to release research data at the time of publication or end of funding (Wolkovich, Regetz, & O'Connor, ). In large projects, funding agencies and international directorates will need to work together in joint initiatives to overcome cultural barriers and geopolitical constraints among countries (Vargas et al., ). However, regardless of journal or funder requirements, data are routinely shared in some scientific fields as a result of a cultural shift, scholarly altruism, and peer approval (Kim & Stanton, ; OCyT, ). Data sharing can also be promoted by recognizing those who analyze data as creative collaborators in need of career paths (Chang, ). Highlighting and disseminating specific research communities and projects that follow standards, curation, and preservation approaches can serve as showcases (Canhos et al., ; Sanabria et al., ). For example, SiB Colombia was recognized by Colciencias as the best “open science initiative” in the country in 2017. Further interventions in this area include creating the figure of “Data Champions” (volunteers who advise researchers in their institutions on good research data management and promote FAIR guidelines) and promoting a model where institutional repositories would coexist with a centralized national data management repository.
Skilled labor emigration and limited advanced training opportunities for new recruits are constant risks in middle and upper‐middle income countries (O'Mahony, Robinson, & Vecchi, ). It is therefore a priority to design and promote a coordinated programme of training in scientific data analysis tailored to different career levels, as well as to provide opportunities for career development, to address “brain drain.” The demand for training is high, yet our analysis of Colombia's situation showed that opportunities for coordinated training between strong groups have not been fully explored, and internships and visits between groups are uncommon. Possible interventions include providing technical skills to experts in data analysis, coordinating the training offered in the country, engaging with the global training communities, and funding visits from international trainers and staff exchanges. The amount of data generated by high‐throughput experimental technologies has increased the demand for scientists involved in research to acquire a minimum set of capabilities in bioinformatics to effectively communicate with bioinformaticians (Tan, Lim, Khan, & Ranganathan, ; Welch et al., ). The Global Organisation for Bioinformatics Learning, Education and Training (GOBLET) surveys provide “perspectives on the current status of training gaps” and evidence that “the need for bioinformatics training is both real and urgent, and requires worldwide solutions” (Attwood et al., ).
Running effective courses and workshops means having tailored teaching materials and instructors trained in how to teach students who may come from different backgrounds and have different goals. Not surprisingly, the completion rate for self‐paced Massive Open Online Courses (MOOCs) is less than 10% (Jordan, ). However, trainers are available in Colombia and equivalent countries. For example, there is an academic network in Colombia focused on bioinformatics, as well as a biannual national bioinformatics conference, which is often organized in collaboration with other scientific societies. Another key strength is the availability of graduate system administrators and developers; formal training is available through at least four M.Sc. programmes in bioinformatics, data science, or computational biology, as well as several in computational sciences. On the one hand, Train‐the‐Trainers (TTT) workshops, where future instructors are equipped with practical skills to teach effectively, are a cost‐effective way to prepare instructors (Pfund et al., ; Via et al., ). On the other hand, the “keep training local but act to deliver and develop training materials globally” motto highlights how a community might break down the effort of producing training materials in a modular way (Williams & Teal, ). This decentralized approach allows training to become accessible to more people while “responding at scale to rapidly evolving science” (Teal et al., ). For example, Software Carpentry and Data Carpentry lessons are developed collaboratively on GitHub by volunteers.
We believe it is a priority to secure the engagement of a diverse range of stakeholders in research planning, and particularly in cyberinfrastructure planning and execution. Researchers are the driving force in the innovation process, and they will only engage with the cyberinfrastructure if they perceive it as a way to ease data management and analysis. There is consequently a need to survey the needs of the community in advance (Cutcher‐Gershenfeld et al., ; Nativi, Craglia, & Pearlman, ). For example, the DataONE cyberinfrastructure (
The workshop results also proposed promoting private–public partnerships and extending the involvement of the third sector (non‐profit associations, charities, cooperatives, etc.) in research. While researchers are the driving force in the innovation process, the environment where each researcher works (industry, academia, nonprofit, general public, or government) frames how researchers can conduct that research. Our analysis in Colombia highlighted that there is a limited number of initiatives to engage stakeholders in research and a variable interest in research from different sectors. Partnerships between industry, third sector, government, and academia appear to be more established in the agricultural and environmental sectors, for example. We identified the following three positive recent initiatives in Colombia: (1) specific public research funding opportunities involving industry; (2) a new research funding system from the regions to promote regional redistribution; and (3) increasing international investment after Colombia's accession to the OECD and the peace agreement process.
Finally, secondary stakeholders (citizens, educators, librarians, policymakers, funding officers, editors, professional societies, etc.) have their particular interests and priorities, and consequently a say in planning. When asked about the impact of open science on society, researchers in Colombia highlighted the mutual benefits of improving the social awareness, reproducibility and general efficiency of science (OCyT, ).
The authors would like to acknowledge the support from the UK Research and Innovation (UKRI) Global Challenges Research Fund (GCRF) GROW Colombia grant via the UK's Biotechnology and Biological Sciences Research Council (BB/P028098/1), as well as from Colciencias Colombia BIO project and the British Council in Colombia. This publication builds on the analysis from a panel of experts at the Colombian Science Council (Colciencias) in Bogota, Colombia on 16‐18 June 2018. As a result, we would like to acknowledge the contributions of Alejandro Caro, AGROSAVIA; Andrés Pinzón Velasco, National University of Colombia; Camilo Corchuelo Rodríguez, Santo Tomás University; Cesar Orlando Díaz, Jorge Tadeo Lozano University; Daniel Fernando López, Humboldt Institute; Dany Molina, Colombia's Center for Bioinformatics and Computational Biology (BIOS); Diego Rincón, Catholic University of Colombia; Gastón Lyons, University of Los Andes; Jaime Erazo, Earlham Institute; John Jaime Riascos, CENICAÑA; Jorge William, Colombia's Center for Bioinformatics and Computational Biology (BIOS); Juan Manuel Anzola, Corpogen; Laura Natalia González García, University of Los Andes; Leroy Mwanzia, International Center for Tropical Agriculture (CIAT); Luz Miriam Díaz, National Academic Network of Advanced Technology of Colombia (RENATA); María Camila Martínez, CENICAÑA; Patricia Jaramillo, National Academic Network of Advanced Technology of Colombia (RENATA); Paula Reyes, AGROSAVIA; Raúl Ramos Pollán, University of Antioquia; Romain Guyot, Autonomous University of Manizales; Tomás Viloria Lagares, University of Los Llanos; and Yesid Cuesta Astroz, University of Antioquia.
JdV, RPD, WH, and FdP conceived and financed this work. EB‐H, JD, and JPM‐R organized the stakeholders' workshop where the data were collected. JdV, RPD, GJH, AM, MM‐T, and NF‐F coached the workshop and facilitated the data collection. All authors participated in and contributed to the "C3biodiversidad" workshop in Bogota, on whose conclusions this manuscript builds; in particular, DE, MAC‐A, NEA‐S, JDP‐D, JC‐A, and AVCR analyzed and interpreted data for the SWOT analysis. All authors contributed to the discussion of the structure of the manuscript. JdV, RPD, and NF‐F drafted the article with contributions by all authors at different stages. All the authors revised and approved the final version.
© 2020. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Societal Impact Statement
Colombia is a “megadiverse” country with vast natural resources. A history of recent conflict means that information is only now being collected on the natural capital of regions that were previously unexplored. Better access to data, tools, and expertise is required for evidence‐supported decisions on the conservation of these resources. The development of a bespoke cyberinfrastructure could help fulfill this need by providing access to digital resources in a collaborative cyberenvironment. We outline key priorities and develop a reference framework for building cyberinfrastructure in Colombia. This framework could be applied to other fields and countries to promote knowledge exchange, scientific innovation, and socioeconomic growth.
1 Earlham Institute, Norwich Research Park, Norwich, UK
2 Systems and Computing Engineering Department, Universidad de Los Andes, Bogotá, Colombia
3 SiB Colombia ‐ Instituto de Investigación de Recursos Biológicos Alexander von Humboldt, Bogotá, Colombia
4 Faculty of Sciences, Universidad de Los Andes, Bogotá, Colombia
5 Faculty of Sciences, Universidad Antonio Nariño, Bogotá, Colombia
6 Apolo Supercomputing Centre, EAFIT University, Medellín, Colombia
7 Faculty of Sciences, EAFIT University, Medellín, Colombia
8 The John Bingham Laboratory, NIAB, Cambridge, UK
9 Biotechnology Institute, Universidad Nacional de Colombia, Bogotá, Colombia
10 Translational and Integrative Sciences Lab, Oregon State University, Corvallis, OR, USA
11 IBERS, Aberystwyth University, Aberystwyth, UK