#FAIRprinciples


One of our #FAIRsharingCommunityChampions, #MarkMcKerracher, has created a short video, "Data tips: #FAIR principles in 60 seconds", as part of his work at the SDS repository at #UniversityofOxford, where he also recommends @fairsharing

Take a look at doi.org/10.25446/oxford.283235; the entire series of videos is available at portal.sds.ox.ac.uk/SDS_self_h

See also fairsharing.org/educational#fa

Domain Ontologies: Indispensable for Knowledge Graph Construction

AI slop is all around us, and extracting useful information will only become harder as we feed more noise into an already noisy world of knowledge. We are in an era of unprecedented data abundance, yet this deluge of information often lacks the structure necessary to derive meaningful insights. Knowledge graphs (KGs), with their ability to represent entities and their relationships as interconnected nodes and edges, have emerged as a powerful tool for managing and leveraging complex data. However, the efficacy of a KG depends critically on the underlying structure provided by domain ontologies. These ontologies, which are formal, machine-readable conceptualizations of a specific field of knowledge, are not merely useful but essential for the creation of robust and insightful KGs. Let’s explore the role that domain ontologies play in scaffolding KG construction, drawing on fields such as AI, healthcare, and cultural heritage to illuminate their importance.

Vassily Kandinsky, Composition VII (1913)
According to Kandinsky, this is the most complex piece he ever painted.

At its core, an ontology is a formal representation of knowledge within a specific domain, providing a structured vocabulary and defining the semantic relationships between concepts. In the context of KGs, ontologies serve as the blueprint that defines the types of nodes (entities) and edges (relationships) that can exist within the graph. Without this foundational structure, a KG would be a mere collection of isolated data points with limited utility. The ontology ensures that the KG’s data is not only interconnected but also semantically interoperable. For example, in the biomedical domain, an ontology like the Chemical Entities of Biological Interest (ChEBI) provides a standardized way of representing molecules and their relationships, which is essential for building biomedical KGs. Similarly, in the cultural domain, an ontology provides a controlled vocabulary to define the entities, such as artworks, artists, and historical events, and their relationships, thus creating a consistent representation of cultural heritage information.
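To make this concrete, here is a minimal sketch in Python using rdflib. The namespace and the class and property names (Artwork, Artist, createdBy) are illustrative assumptions rather than terms from any published ontology; the point is the two-layer split between the ontology, which acts as the blueprint, and the instance data it constrains.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

# Hypothetical cultural-heritage namespace; illustrative only.
CH = Namespace("https://example.org/cultural-heritage#")

g = Graph()
g.bind("ch", CH)

# Ontology layer: the "blueprint" defining node and edge types.
g.add((CH.Artwork, RDF.type, RDFS.Class))
g.add((CH.Artist, RDF.type, RDFS.Class))
g.add((CH.createdBy, RDF.type, RDF.Property))
g.add((CH.createdBy, RDFS.domain, CH.Artwork))
g.add((CH.createdBy, RDFS.range, CH.Artist))

# Instance layer: data constrained by the ontology.
g.add((CH.CompositionVII, RDF.type, CH.Artwork))
g.add((CH.Kandinsky, RDF.type, CH.Artist))
g.add((CH.CompositionVII, CH.createdBy, CH.Kandinsky))
g.add((CH.CompositionVII, RDFS.label, Literal("Composition VII (1913)")))

print(g.serialize(format="turtle"))
```

Serializing to Turtle makes the shared vocabulary inspectable: every instance triple uses a predicate whose domain and range the ontology layer has already pinned down.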

One of the primary reasons domain ontologies are crucial for KGs is their role in ensuring data consistency and interoperability. Ontologies provide unique identifiers and clear definitions for each concept, which helps in aligning data from different sources and avoiding ambiguities. Consider, for example, a healthcare KG that integrates data from various clinical trials, patient records, and research publications. Without a shared ontology, terms like “cancer” or “hypertension” may be interpreted differently across these data sets. The use of ontologies standardizes the representation of these concepts, thus allowing for effective integration and analysis. This not only enhances the accuracy of the KG but also makes the information more accessible and reusable. Furthermore, using ontologies that follow the FAIR (Findable, Accessible, Interoperable, Reusable) principles facilitates data integration, unification, and information sharing, essential for building robust KGs.
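As a small illustration of that standardization step, the sketch below normalizes free-text source terms to a single shared identifier before they enter the KG. The CURIEs are assumed Disease Ontology identifiers used purely for illustration; a real pipeline would consult the ontology itself rather than a hand-written table.

```python
# A minimal sketch of source-vocabulary alignment: every local term is
# normalized to one shared ontology identifier before it enters the KG.
# The CURIEs below are assumed Disease Ontology identifiers (illustration only).
TERM_TO_ONTOLOGY_ID = {
    "cancer": "DOID:162",
    "malignant neoplasm": "DOID:162",
    "hypertension": "DOID:10763",
    "high blood pressure": "DOID:10763",
}

def normalize(term: str) -> str:
    """Map a free-text clinical term to its canonical ontology identifier."""
    key = term.strip().lower()
    if key not in TERM_TO_ONTOLOGY_ID:
        raise KeyError(f"unmapped term {term!r}: needs manual curation")
    return TERM_TO_ONTOLOGY_ID[key]

# Records from two different sources now resolve to the same KG node.
assert normalize("Cancer") == normalize("malignant neoplasm")
```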

Moreover, ontologies facilitate the application of advanced AI methods to unlock new knowledge. They support deductive reasoning to infer new knowledge and provide structured background knowledge for machine learning. In the context of drug discovery, for instance, a KG built on a biomedical ontology can help identify potential drug targets by connecting genes, proteins, and diseases through clearly defined relationships. This structured approach to data also enables the development of explainable AI models, which are critical in fields like medicine where the decision-making process must be transparent and interpretable. Ontology-grounded KGs can then be used to generate hypotheses that can be validated through manual review, in vitro experiments, or clinical studies, highlighting the utility of ontologies in translating complex data into actionable knowledge.
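Because the relationships are clearly defined, candidate targets can be surfaced with a declarative query that chains those predicates. A minimal sketch, again with rdflib, using a hypothetical biomedical namespace and toy facts:

```python
from rdflib import Graph, Namespace, RDF

BIO = Namespace("https://example.org/biomed#")  # hypothetical namespace

g = Graph()
g.bind("bio", BIO)

# Toy facts expressed with ontology-defined predicates (illustrative only).
g.add((BIO.GeneX, RDF.type, BIO.Gene))
g.add((BIO.GeneX, BIO.encodes, BIO.ProteinY))
g.add((BIO.ProteinY, BIO.associatedWith, BIO.DiseaseZ))
g.add((BIO.DrugA, BIO.inhibits, BIO.ProteinY))

# Chain the predicates declaratively: proteins that are both druggable
# and disease-associated point to candidate targets.
q = """
PREFIX bio: <https://example.org/biomed#>
SELECT ?drug ?protein ?disease WHERE {
    ?drug bio:inhibits ?protein .
    ?protein bio:associatedWith ?disease .
}
"""
for drug, protein, disease in g.query(q):
    print(f"{drug} -> {protein} -> {disease}")
```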

Despite their many advantages, domain ontologies are not without their challenges. One major hurdle is the lack of direct integration between data and ontologies, meaning that most ontologies are abstract knowledge models not designed to contain or integrate data. This necessitates the use of (semi-)automated approaches to integrate data with the ontological knowledge model, which can be complex and resource-intensive. Additionally, the existence of multiple ontologies within a domain can lead to semantic inconsistencies that impede the construction of holistic KGs. Integrating different ontologies with overlapping information may result in semantic irreconcilability, making it difficult to reuse the ontologies for the purpose of KG construction. Careful planning is therefore required when choosing or building an ontology.
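A common semi-automated first pass at this integration problem is to flag classes from two ontologies whose labels coincide as candidates for reconciliation. The sketch below is exactly that naive pass, not a full ontology-matching system; the matches it returns still require expert review.

```python
from rdflib import Graph, RDFS

def label_overlap(onto_a: Graph, onto_b: Graph) -> set:
    """Flag classes from two ontologies whose rdfs:label strings coincide,
    as candidates for manual reconciliation. Case-insensitive exact match;
    a real alignment pipeline would add synonyms and fuzzy matching."""
    labels_a = {str(label).lower(): subj
                for subj, _, label in onto_a.triples((None, RDFS.label, None))}
    matches = set()
    for subj_b, _, label in onto_b.triples((None, RDFS.label, None)):
        key = str(label).lower()
        if key in labels_a:
            matches.add((labels_a[key], subj_b, key))
    return matches
```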

As we move forward, the development of integrated, holistic solutions will be crucial to unlocking the full potential of domain ontologies in KG construction. This means creating methods for integrating multiple ontologies, ensuring data quality and credibility, and focusing on semantic expansion techniques to leverage existing resources. Furthermore, there needs to be a greater emphasis on creating ontologies with the explicit purpose of instantiating them and on storing data directly in graph databases. Integrating expert knowledge into KG learning systems via ontological rules is crucial to ensure that KGs capture not only the data but also the logical patterns, inferences, and analytic approaches of a specific domain.
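One way to carry such expert knowledge into the graph is to materialize an ontological rule as new edges. A minimal sketch, reusing the hypothetical biomedical predicates from the drug-discovery example above; the rule and the candidateTreatmentFor predicate are assumptions for illustration:

```python
from rdflib import Graph, Namespace

BIO = Namespace("https://example.org/biomed#")  # same hypothetical namespace

def apply_treatment_rule(g: Graph) -> int:
    """Expert rule stated over ontology predicates: if a drug inhibits a
    protein and that protein is associated with a disease, add an inferred
    candidate-treatment edge. Returns the number of edges added."""
    added = 0
    for drug, _, protein in g.triples((None, BIO.inhibits, None)):
        for _, _, disease in g.triples((protein, BIO.associatedWith, None)):
            inferred = (drug, BIO.candidateTreatmentFor, disease)
            if inferred not in g:
                g.add(inferred)
                added += 1
    return added
```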

Domain ontologies will prove to be the key to building robust and useful KGs. They provide the structure, consistency, and interpretability that enable AI systems to extract valuable insights from complex data. By understanding and addressing the challenges associated with ontology design and implementation, we can harness the power of KGs to solve complex problems across diverse domains, from healthcare and science to culture and beyond. The future of knowledge management lies not just in the accumulation of data but in the development of intelligent, ontologically grounded systems that can bridge the gap between information and meaningful understanding.

References

  1. Al-Moslmi, T., El Alaoui, I., Tsokos, C.P., & Janjua, N. (2021). Knowledge graph construction approaches: A survey of recent research works. arXiv preprint. https://arxiv.org/abs/2011.00235
  2. Chandak, P., Huang, K., & Zitnik, M. (2023). PrimeKG: A multimodal knowledge graph for precision medicine. Scientific Data. https://www.nature.com/articles/s41597-023-01960-3
  3. Gilbert, S., et al. (2024). Augmented non-hallucinating large language models using ontologies and knowledge graphs in biomedicine. npj Digital Medicine. https://www.nature.com/articles/s41746-024-01081-0
  4. Guzmán, A.L., et al. (2022). Applications of Ontologies and Knowledge Graphs in Cancer Research: A Systematic Review. Cancers, 14(8), 1906. https://www.mdpi.com/2072-6694/14/8/1906
  5. Hura, A., & Janjua, N. (2024). Constructing domain-specific knowledge graphs from text: A case study on subprime mortgage crisis. Semantic Web Journal. https://www.semantic-web-journal.net/content/constructing-domain-specific-knowledge-graphs-text-case-study-subprime-mortgage-crisis
  6. Kilicoglu, H., et al. (2024). Towards better understanding of biomedical knowledge graphs: A survey. arXiv preprint. https://arxiv.org/abs/2402.06098
  7. Noy, N.F., & McGuinness, D.L. (2001). Ontology Development 101: A Guide to Creating Your First Ontology. Semantic Scholar. https://www.semanticscholar.org/paper/Ontology-Development-101%3A-A-Guide-to-Creating-Your-Noy/c15cf32df98969af5eaf85ae3098df6d2180b637
  8. Taneja, S.B., et al. (2023). NP-KG: A knowledge graph for pharmacokinetic natural product-drug interaction discovery. Journal of Biomedical Informatics. https://www.sciencedirect.com/science/article/pii/S153204642300062X
  9. Zhao, X., & Han, Y. (2023). Architecture of Knowledge Graph Construction. Semantic Scholar. https://www.semanticscholar.org/paper/Architecture-of-Knowledge-Graph-Construction-Zhao-Han/dcd600619962d5c1f1cfa08a85d0be43a626b301

Due to 🇺🇦 bureaucratic requirements, many are trying to calculate the 'amount of #FAIR data' in December. This is absurd, as FAIR represents principles, not 'units of data.' There is no standardized method to measure how much data complies with FAIR, and moreover, these principles are multifaceted - each aspect can have varying levels of implementation. In short, FAIR assessment requires a comprehensive analysis, not a simple count.

#FAIRprinciples #OpenScience #DataSharing #FAIRData

🌙 FAIR and Open Data: Bridging the Gap!
At Lübeck’s Nights of Open Knowledge, Maria Chlastak hosted an interactive session on the need for both FAIR and open data in science. Highlighting the importance of open formats and research software, she guided a discussion on how true scientific impact relies on data that’s FAIR and open to all.
👉 nfdixcs.org/meldung/focus-on-r
#Nook24 #FAIRData #FAIRprinciples #OpenScience #OpenData #FDM #RDM #RSM

Best Practice for Publishing Environmental DNA (eDNA) Data According to FAIR Principles

biss.pensoft.net/article/13774

Biodiversity Information Science and Standards: Best Practice for Publishing Environmental DNA (eDNA) Data According to FAIR Principles

Reversing global biodiversity loss will require transformational human actions and robust measurements of their effectiveness. Diversity assessment using environmental DNA (eDNA) has emerged as a cutting-edge technique with the potential to address the challenges of measuring biodiversity. Vast amounts of eDNA sequences and eDNA-based species detections are generated in scientific studies. These datasets are typically stored in a variety of different repositories in multiple formats, hindering their reuse (Berry et al. 2020). Ensuring the publication of eDNA data following the FAIR (Findable, Accessible, Interoperable, Reusable) principles (Wilkinson et al. 2016) would revolutionise environmental assessment, including monitoring of biodiversity, individual species, and interactions across extensive spatial and temporal scales, and generate critical knowledge for evidence-based management. Archiving FAIR eDNA data requires standardising data formats and vocabularies, cyberinfrastructures, guidelines, data sharing policy, and collaboration among scientists and institutions. Some of these requirements are addressed by existing data standards and infrastructures, including Darwin Core (DwC) (Wieczorek et al. 2012), Minimum Information about any (x) Sequence (MIxS) (Yilmaz et al. 2011), the Global Biodiversity Information Facility (GBIF) network, and International Nucleotide Sequence Database Collaboration (INSDC) partners (Arita et al. 2020).

However, multiple challenges remain, and FAIR data practices have yet to be established in the eDNA community. This is partly because critical attributes unique to eDNA data are not adequately accommodated by existing standards. For example, the procedures for monitoring contamination and excluding non-target taxa, and the parameters used for quality filtering and species detection, vary greatly between studies, depending on the study scope and the financial and ecological costs of incorrectly inferring presence or absence. Making such information FAIR is needed for future studies that reuse data and require high confidence in species detection and taxonomic assignment. Furthermore, the procedures of targeted-taxon detection approaches (e.g., interpreting quantitative polymerase chain reaction (qPCR) results to detect the presence of DNA from individual taxa) have not yet been fully captured by existing standards. Increasing efforts have been made to establish minimum reporting requirements to validate eDNA study methods and data (Klymus et al. 2020, Thalinger et al. 2021). These requirements need further development to be translated into data standards and formats that enhance machine readability and reusability, and to support and guide the eDNA community so that they are effectively utilised.

In this talk, we share our best practice guide for formatting and publishing eDNA data, developed by an international multidisciplinary working group comprising eDNA researchers, journal editors, and biodiversity and omics data scientists. We identified required data types, formats, and metadata checklists by reviewing and integrating existing data standards, devising subject-specific vocabularies, and introducing additional terms to accommodate the distinctive properties of eDNA data. Implementing the FAIR eDNA data best practice guide offers a pivotal step towards standardising and enhancing the publication and re-use of eDNA data.
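As a rough illustration of what such a standardized record might look like, the sketch below combines Darwin Core (dwc) occurrence terms with MIxS-style sequencing context in Python dictionary form. The field selection and the values are illustrative assumptions, not the working group's published checklist.

```python
# A minimal sketch of an eDNA occurrence record mixing Darwin Core (dwc)
# and MIxS-style terms; field choices and values are illustrative only.
edna_record = {
    # Darwin Core occurrence terms
    "dwc:occurrenceID": "urn:uuid:...",   # stable, resolvable identifier
    "dwc:scientificName": "Salmo salar",
    "dwc:eventDate": "2023-06-15",
    "dwc:decimalLatitude": 60.39,
    "dwc:decimalLongitude": 5.32,
    # MIxS-style sequencing context
    "mixs:target_gene": "12S rRNA",
    "mixs:env_medium": "sea water",
    # eDNA-specific attributes the abstract argues are under-captured
    "pcr_primers": "MiFish-U",            # assumed primer-set label
    "detection_threshold": "3 of 8 qPCR replicates positive",
}
```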

The Minimaldatensatz-Empfehlung für Museen & Sammlungen (minimum-record recommendation for museums & collections) v1.0 is online! 🥳 It names the most important data fields for publishing object information online & is LIDO-compatible! A great deal of work and brainpower went into it. Thanks @ddbkultur for this important milestone 🙏
▶️minimaldatensatz.de

#Kulturdaten @museum #FAIRprinciples #LIDO #normdaten #GLAM #openglam #digitalculturalheritage

wiki.deutsche-digitale-bibliothek.de: Minimaldatensatz-Empfehlung für Museen und Sammlungen (v1.0) - DDBinfo für Daten - Confluence - Deutsche Digitale Bibliothek