The SANBI-GBIF Team embarked on a two-day trip to Spain in December 2022 to take part in an exciting new project with GBIF-Spain, funded through the Capacity Enhancement Support Programme (CESP) of the Global Biodiversity Information Facility (GBIF). Francisco Pando (GBIF-Spain) described the project as “a beautiful challenge”, as it is taking us into an area that is “relevant and full of potential in the GBIF universe, and out of our comfort zone”.
The project, entitled “Cross-continental partnership to investigate data mining approaches for impactful data use cases and stories”, is expected to build capacity in data mining and big data handling, and to produce a jointly developed eLearning course on “analytical techniques in biodiversity big data”. It aims to train more than 20 participants, with trainers from Spain, Ghana and South Africa, and will also produce training materials for future rollout of training events.
The visit also included training and information exchange on the eLearning platform hosted by GBIF-Spain, which was launched in February 2022 at a Data Cleaning workshop held in Cape Town.
The need for training and capacity building in biodiversity informatics, a relatively new area of science, is critical. Vast quantities of biodiversity data, assembled in a uniform, fair way and now openly available, in part through the work of GBIF, are recognized as an enabler of high-impact, innovative science and as key to responding to current societal challenges. However, limited capacity often hinders the use of this resource. GBIF has recognized this issue, and some of its national nodes, such as South Africa and Spain, make capacity building, through training and especially eLearning, a strong pillar of their strategies and work plans. SANBI-GBIF and GBIF-Spain aim to optimize their efforts and grow national capacity, making course content available to support the needs of the biodiversity informatics community.
Further topics covered at the meeting included artificial intelligence (AI) and ‘big data’, presented by Dr. Fernando Aguilar (Institute of Physics of Cantabria, specialising in computing and data science), and contributions from Professor David Galicia of the University of Navarra, who focused on the use of software such as R and RStudio, and on the value of API client packages like rgbif for accessing, downloading and visualising biodiversity data. The use of R in Jupyter Notebook, a web-based interactive computing platform, was also explored. Other topics included advances in machine learning and deep learning that make it possible to interpret the content of natural images. These advances can give researchers new insights and turn difficult analyses of natural images into routine tasks.
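For readers new to programmatic data access: rgbif is an R package that wraps GBIF's public REST API. As a rough illustration of what such a client does under the hood, the Python sketch below builds a request to GBIF's occurrence search endpoint and, when network access is available, downloads one page of results. This is not the material presented at the meeting; the helper names `build_occurrence_query` and `fetch_occurrences` are hypothetical, and only a few common query parameters are shown.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public GBIF occurrence search endpoint (the REST service rgbif also calls).
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_occurrence_query(scientific_name, country=None, limit=20):
    """Build a search URL for georeferenced records of one taxon (hypothetical helper)."""
    params = {
        "scientificName": scientific_name,
        "hasCoordinate": "true",  # only records that carry coordinates
        "limit": limit,           # page size
    }
    if country:
        params["country"] = country  # ISO 3166-1 alpha-2 code, e.g. "ZA"
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


def fetch_occurrences(url):
    """Download one page of results and return the list of occurrence records.

    Requires network access; each record is a dict of Darwin Core-style fields.
    """
    with urlopen(url, timeout=30) as response:
        page = json.load(response)
    return page["results"]
```

For example, `fetch_occurrences(build_occurrence_query("Protea cynaroides", country="ZA"))` would return up to 20 South African records of that species. Paging through larger result sets and requesting bulk downloads are left out here; in R, rgbif handles those details.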
Data cleaning techniques were also discussed, particularly for improving data quality, including a quality index for data publishing that scores a dataset according to its quality. Data reduction techniques applied to digital accessible knowledge (DAK) were also addressed, especially as they relate to the time, geographic space and taxonomy of biodiversity data. Ultimately, data should be assessed for fitness for use, to establish which questions the data can answer.
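To make the idea of reducing data along time, geographic space and taxonomy more concrete, here is a minimal Python sketch of one common approach: binning occurrence records by species, grid cell and time period, and keeping one record per bin. It is an assumed illustration, not the specific technique presented at the meeting; the `Occurrence` class, the field names (loosely modelled on the Darwin Core terms `species`, `decimalLatitude`, `decimalLongitude` and `year`) and the default 1° cells and 10-year periods are all hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical minimal occurrence record."""
    species: str
    decimal_latitude: float
    decimal_longitude: float
    year: int


def reduction_key(rec, cell_degrees=1.0, years_per_bin=10):
    """Map a record to its (taxon, grid row, grid column, time period) bin."""
    row = math.floor(rec.decimal_latitude / cell_degrees)
    col = math.floor(rec.decimal_longitude / cell_degrees)
    period = rec.year // years_per_bin * years_per_bin  # start year of the period
    return (rec.species, row, col, period)


def reduce_occurrences(records, cell_degrees=1.0, years_per_bin=10):
    """Keep the first record seen in each taxon/cell/period bin, preserving order."""
    seen = set()
    kept = []
    for rec in records:
        key = reduction_key(rec, cell_degrees, years_per_bin)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept
```

Coarser cells or longer periods remove more records; the appropriate resolution depends on the question being asked, which ties back to assessing whether the data are fit for use.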
An exciting outcome of this collaboration is a training workshop to be hosted by SANBI-GBIF in Cape Town this June.