U.S. Semantic Technologies Symposium Series

4th U.S. Semantic Technologies Symposium
Sept. 29 - Oct. 1, 2022 at Michigan State University, East Lansing, MI

Program

Event Date:

Thursday, September 29, 2022 10:30 AM – Saturday, October 1, 2022 1:00 PM (Eastern Time)


Schedule

Schedule at a glance (see talk details below the schedule):

All sessions will be held at the MSU Union, 49 Abbot Rd, East Lansing, MI 48824 (except the reception).

September 29, 2022, Thursday

| Time | Type | Lake Huron Room, 3rd Floor | Lake Superior Room, 3rd Floor |
|------|------|----------------------------|-------------------------------|
| 8:30am | Introduction and Welcome, Housekeeping | Anne Thessen & Hande Kucuk McGinty | |
| 9:00am | Keynote | Nicole Vasilevsky | |
| 10:00am | Coffee Break | | |
| 10:30am | Sessions for Special Interest Groups | Convergence Accelerator | Easier RDF |
| 12:00pm | Lunch | | |
| 1:00pm | Sessions for Special Interest Groups | Natural Hazards | Perfect Syllabus |
| 2:30pm | Break | | Posters on Display |
| 3:00pm | Submitted Talk | NHANES | |
| 3:15pm | Submitted Talk | Hybrid AI | |
| 3:30pm | Submitted Talk | COSMO | |
| 3:45pm | Lightning Talks | TBD | |
| 4:00pm | Poster | | Posters on Display |
| 5:00pm | End (Reception at 7:00 pm) | Steering Committee Meeting (Invitation Only) | |

September 30, 2022, Friday

| Time | Type | Lake Huron Room, 3rd Floor | Lake Superior Room, 3rd Floor |
|------|------|----------------------------|-------------------------------|
| 8:30am | Introduction and Welcome, Housekeeping | Anne Thessen & Hande Kucuk McGinty | |
| 9:00am | Keynote | Ora Lassila | |
| 10:00am | Coffee Break | | |
| 10:30am | Sessions for Special Interest Groups | Food & Health I | Common-sense Reasoning |
| 12:00pm | Lunch - Sponsored Talk with Food | BioTeam | |
| 1:00pm | Sessions for Special Interest Groups | Food & Health II | Environmental Data Integration |
| 2:30pm | Break | | Posters on Display |
| 3:00pm - 4:30pm | Sessions for Special Interest Groups | Food & Health III | AI Readiness |
| 4:30pm - 6:00pm | Sessions for Special Interest Groups | FAIR Semantic Resources | AI Readiness |
| 6:00pm | End | | |

October 1, 2022, Saturday

| Time | Type | Lake Huron Room, 3rd Floor | Lake Superior Room, 3rd Floor |
|------|------|----------------------------|-------------------------------|
| 8:30am | Introduction and Welcome, Housekeeping | Anne Thessen & Hande Kucuk McGinty | |
| 9:00am | Keynote | John Graybeal | |
| 10:00am | Coffee Break | | |
| 10:30am | Sessions for Special Interest Groups | Food Shed | |
| 12:00pm | Town Hall with Brown Bag Lunch | | |
| 1:00pm | End | | |

Keynotes And Invited Talks

Nicole Vasilevsky

Nicole Vasilevsky’s research focuses on the development and use of semantic technologies to facilitate new knowledge discovery and promote scientific reproducibility as part of the Translational and Integrative Sciences Lab at the University of Colorado Anschutz Medical Campus. Her expertise is in biocuration and the development of biomedical ontologies and data standards for phenotypes, diseases, and other biomedical domains, with the goal of improving disease diagnostics and health outcomes. Additionally, she is active in educational efforts in ontology development and data science.

John Graybeal

John Graybeal is a Technical Program Manager at the Stanford Center for Biomedical Informatics Research (BMIR), the home of Protégé (and WebProtégé), BioPortal (and OntoPortal), and CEDAR. Previously he was a Principal Investigator of the Marine Metadata Interoperability Project, which created the MMI Ontology Registry and Repository and later led to the creation of the ESIP Community Ontology Repository. Most recently he has been heavily involved in metadata and harmonization efforts on behalf of the NIH Rapid Acceleration of Diagnostics (RADx) Data Hub, supports the NIH Human Biomolecular Atlas Program (HuBMAP) project, and contributes to the Simple Standard for Sharing Ontological Mappings (SSSOM) project.

Ora Lassila

Ora Lassila is a Principal Graph Technologist in the Amazon Neptune team. Earlier, he was a Managing Director at State Street, heading their efforts to adopt ontologies and graph databases. Before that, he worked as a technology architect at Pegasystems, as an architect and technology strategist at Nokia Location & Commerce (aka HERE), and prior to that he was a Research Fellow at the Nokia Research Center Cambridge. He was an elected member of the Advisory Board of the World Wide Web Consortium (W3C) from 1998 to 2013, and represented Nokia in the W3C Advisory Committee from 1998 to 2002. From 1996 to 1997 he was a Visiting Scientist at the MIT Laboratory for Computer Science, working with W3C and launching the Resource Description Framework (RDF) standard; he served as a co-editor of the RDF Model and Syntax specification.

Much of his research work at the Nokia Research Center focused on the Semantic Web and particularly its applications to mobile and ubiquitous computing. He collaborated with several US universities, and was an active participant in the DARPA Agent Markup Language (DAML) program.

His positions before that include Project Manager at the Robotics Institute of Carnegie Mellon University and Research Scientist at the CS Laboratory of Helsinki University of Technology (now Aalto University). He has also worked as a software engineer in several companies (including his own start-up). He is the author of more than 100 conference papers and journal articles. Dr. Lassila holds a Ph.D. in Computer Science from the Helsinki University of Technology.


Sessions

NSF’s Convergence Accelerator Track A: Open Knowledge Networks (Convergence Accelerator)

The NSF Convergence Accelerator is a new organizational structure and activity within the NSF, designed to accelerate the transition of use-inspired convergence research into practice in areas of national importance. A guiding rationale of the Convergence Accelerator is that to achieve progress on scientific and societal challenges it is necessary to develop teams that can share expertise across research fields as well as with multiple partners and stakeholders, including the ultimate users of the research products.

Track A of the NSF Convergence Accelerator is themed “Open Knowledge Networks” (OKNs), funding the creation of nonproprietary infrastructure for building open knowledge networks to enable new modes of data-driven discovery. Semantic Web technologies, including knowledge graphs and ontologies, play an important role in these projects. Phase 1 of Track A ran from 2019 to 2020 and funded 21 OKN projects at approx. $1M each. Five projects then received a Phase 2 award of approx. $5M each, running from 2020 to 2022. At the time of US2TS 2022, these Phase 2 projects will have reached the final stretch of their runtime. At the session, representatives from each of these projects will present project work and outcomes, and be available for a panel discussion.

Session Organizers: Lara Campbell, Pascal Hitzler, and Tom Narock


Toward Easier RDF (Easier RDF)

In spite of RDF’s 20+ year history, it still holds only a fraction of the graph database market. How can we make RDF – or some RDF-based successor – easy enough that most developers who are new to RDF can be consistently successful? Numerous ideas have been discussed since this topic was raised three years ago. The session focuses on three complementary themes:

  1. A community-driven RDF cookbook. A GitHub-driven collection of ready-to-go “recipes” for solving common RDF problems, along with a set of use-case tutorials that employ those recipes to build a working application (see the sketch after this list).
  2. A bundled RDF software release. Analogous to a LAMP stack, the idea is to create a bundled release of free and open-source software that provides all the tools a developer is likely to need to implement typical RDF applications.
  3. Blue sky RDF 2.0. If RDF were redesigned, what changes should be made? If a higher-level successor to RDF were developed, what features should it have? This theme explores future possibilities. LinkML will be presented as one example of a higher-level form of RDF that is already in use and aimed at solving problems in biological modeling. LinkML compiles down to RDF, JSON, and other representations.
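As an illustration of theme 1, here is a minimal sketch of the kind of ready-to-go “recipe” such a cookbook might collect, written with rdflib; the ex: namespace and sample data are invented for the example.

```python
# Recipe sketch (illustrative): build a small graph, then answer a common
# question with SPARQL. The ex: namespace and the data are placeholders.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF

EX = Namespace("http://example.org/")

g = Graph()
g.bind("ex", EX)
g.add((EX.alice, RDF.type, FOAF.Person))
g.add((EX.alice, FOAF.name, Literal("Alice")))
g.add((EX.alice, FOAF.knows, EX.bob))

# Common task: list the names of all people in the graph.
query = """
SELECT ?name WHERE {
    ?person a foaf:Person ;
            foaf:name ?name .
}
"""
for row in g.query(query, initNs={"foaf": FOAF}):
    print(row.name)   # -> Alice
```

A cookbook entry would pair a snippet like this with the problem statement it solves and pointers to the relevant tutorials.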

Session Organizers: David Booth and Peter Winstanley


A Reference Ontology for Natural Hazards and Disasters (Natural Hazards)

Ever since humans started living in dense urban and suburban settlements, natural hazards such as hurricanes, wildfires, and drought have presented risks of major loss to human life and property. Recent trends indicate that the risks of natural hazards leading to disasters are further exacerbated by rapid climate change. While a huge volume of data is available to predict and describe natural hazard events and their related impacts, these data are highly heterogeneous in structure and semantics. This heterogeneity leads to huge challenges in developing effective, evidence-based strategies for mitigation, preparedness, response, and recovery from natural hazards that result in disasters.

Integrated management of these heterogeneous hazard- and disaster-related datasets is key to efficient and effective preparation for and response to disasters. However, there is still a significant deficiency in ontologies that can represent and connect data across independent sources and variant schemas, as well as provide insights, through formal logic-supported inferencing, into the processes and relationships involved in the natural hazard-disaster dynamic. Improved disaster life-cycle responses will depend on information gleaned from disparate data sources documenting past hazard-disaster events, outcomes, and responses occurring in differing environmental contexts and under changing climate conditions.

Current hazard- and disaster-related conceptualizations (i.e., glossaries, taxonomies, and ontologies) either catalog a range of relevant terms with little or no formal semantic inter-connections or, if more semantically driven, focus on some specific type of hazard (e.g., wildfires) or crisis management domain (e.g., wildfire response) from the limited perspective of preventing or minimizing some particular type of disaster (e.g., wildfire damage to crops or infrastructure). However, the broad and inter-connected nature of phenomena in the hazard-disaster domain surely can benefit from a more comprehensive and formal representation based on shared concepts and relationships. This effort should proceed while accommodating, i.e., aligning with, existing standard taxonomies and dataset schemas in the hazard domain, to better enable discovery and understanding of causal inter-relationships among hazard and disaster types distributed across various institutional/agency documents and data resources.

Such alignment can enable synthesis and complementarity across autonomous information resources, as well as inferred insights: for example, better disclosing the likelihood that 1) wildfires in some region, followed by 2) heavy rainstorms, could lead to 3) debris flows that 4) destroy structures and cause human injury and death. This real example involves three hazards in succession that result in a disaster. The potential for better understanding such causal chains and inter-dependencies motivates the need for a broad reference ontology across the hazard-disaster domain. This reference ontology must also take into consideration prior conceptualizations (established taxonomies and vocabularies) while bringing in new concepts and relations to enable more holistic understanding across these different terminologies and database schemas.

This panel session will bring together knowledge modeling engineers and domain ontology experts to decide on best approaches for constructing a reference ontology for Natural Hazards and Disasters.
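As a toy illustration of the causal-chain reasoning described above (this is not the session’s proposed ontology; all hz: terms are invented), the wildfire example can be encoded as triples and its transitive consequences retrieved with a SPARQL property path:

```python
# Toy sketch: encode hazard-to-disaster causal links and query their
# transitive closure. The hz: vocabulary is purely illustrative.
from rdflib import Graph, Namespace

HZ = Namespace("http://example.org/hazard#")

g = Graph()
g.bind("hz", HZ)
g.add((HZ.Wildfire, HZ.canLeadTo, HZ.DebrisFlow))          # given heavy rain
g.add((HZ.DebrisFlow, HZ.canLeadTo, HZ.StructuralDamage))
g.add((HZ.StructuralDamage, HZ.canLeadTo, HZ.HumanCasualty))

# Property path hz:canLeadTo+ walks the whole causal chain.
query = "SELECT ?effect WHERE { hz:Wildfire hz:canLeadTo+ ?effect }"
for row in g.query(query, initNs={"hz": HZ}):
    print(row.effect)   # DebrisFlow, StructuralDamage, HumanCasualty
```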

Session Organizers: Shirly Stephen, Mark Schildhauer, and Rui Zhu


Building the Perfect Syllabus (for Knowledge Engineering) (Perfect Syllabus)

The purpose of this breakout session is to build out an opinionated approach to building the perfect syllabus. In particular, we want to consider the pedagogy of knowledge engineering from three different perspectives, generate a draft of an effective syllabus, and start an open and living document that describes the educational and pedagogical needs in knowledge engineering.

Session Organizers: Cogan Shimizu, Ryan McGranaghan, and Adam C. Kellerman


Food and Health: Agriculture and Nutrition Ontology connectivity (Food & Health I, II, III)

Great emphasis has been placed in the past on developing agricultural techniques that increase production to alleviate hunger and reduce the cost of food on the one hand, and, on the other, on researching how particular kinds of nutrition help human health. What relationships in the farm-to-fork continuum exist between them? For example, what best practices from a soil sustainability perspective align with the nutritional content of produced food? What kind of ontology modelling or terminology is required to describe the selection of samples from agricultural contexts during growth and harvesting, such that the impacts of soil amendments and crop stress on agricultural outputs can be assessed? The workshop will take a deep dive into the field-level agricultural modelling and biosample contextual data required. Agriculture (AGRO) and nutrition (CDNO) ontology curators will be present to facilitate discussion and feedback on specification development. The workshop will seek to arrive at a generic data collection model that can be refined for domain-specific sampling, such as fisheries and aquaculture, capturing variables such as fish feeding, chemical treatments, etc.

Session Organizers: Damion Dooley, Hande Küçük-McGinty, and Liliana Andres Hernandez


Knowledge-based commonsense reasoning and explainability (Common Sense Reasoning)

Common sense reasoning and explainability are essential for building more advanced ‘general’ AI systems that have human-like capabilities and reasoning ability, even when facing uncertain, implicit (or potentially contradictory) information. Recognizing the importance of these topics, researchers in several communities have increasingly engaged in researching and evaluating explainable commonsense reasoning on tasks pertaining to question answering and abductive reasoning. Unlike other ‘pure’ or logical reasoning tasks where the knowledge base and inference axioms can be separated (at least in principle), knowledge is an important aspect of explainable commonsense reasoning. This knowledge may be acquired over large natural language (and even visual) corpora, as transformer-based models such as BERT and GPT have attempted to do, or through knowledge graphs of concepts, relations and events constructed using natural language processing and crowdsourcing techniques. Once acquired, the knowledge must also be represented appropriately to support human-like reasoning and question answering.

In this tutorial, we will present state-of-the-art techniques for machine explainability and open-domain commonsense reasoning, based both on classic research as well as modern advances in the Natural Language Processing and Semantic Web communities. We will describe how knowledge can be obtained, combined, and organized into dimensions, as well as applied for reasoning on downstream tasks, such as question answering and story understanding.

Knowledge-based commonsense is a key part of interpretability and eXplainable AI (XAI), enabling end users to understand the underlying processes. In this tutorial, we will present an overview of explainability techniques that leverage knowledge graphs, semantic rules, ontologies, and commonsense reasoning. These XAI techniques are motivated by building user trust, supplementing data-driven decisions with human-level knowledge and understanding.

Tutorial presenters: Filip Ilievski, Information Sciences Institute, University of Southern California; Leilani Gilpin, University of California Santa Cruz.


Environmental Data Integration and Intelligence Services using Knowledge Graphs (Environmental Data Integration)

Knowledge graphs are not merely a set of technologies but a novel paradigm for representing, retrieving, integrating, and reasoning over data from highly heterogeneous and multimodal sources. Within just a few years, knowledge graphs have become a core component of modern search engines, intelligent personal assistants, business intelligence, and so on. Interestingly, despite large-scale data availability, they have not yet been as successful in the realm of environmental studies.

This session will provide an exposition of large-scale knowledge graphs for environmental data integration and intelligence. More specifically, a framework for leveraging spatial and temporal knowledge as the nexus to integrate environmental data of various themes will be emphasized, based on real-world examples. In addition, the session will give an overview of state-of-the-art spatiotemporally explicit machine learning methods and tools, and discuss their potential as well as their limitations in addressing geospatial challenges.

Session Organizers: Lu Zhou, Gengchen Mai, Cogan Shimizu, Shirly Stephen, Thomas Thelen, Ling Cai, Yuanyuan Tian, and Zhining Gu


Semantics of Sustainability: Building a Smart Foodshed Application Ontology (Food Shed)

Compilation of the UN Sustainable Development Goals (UN SDGs) was a major global achievement that shed light on the fragile nature and health of humans and the natural, economic, and socio-technical environments in which we live, while providing a framework of aspirational states-of-being. Further codification of the UN SDGs into an ontology (SDGIO) has provided a semantic framework for the Sustainable Development Goals, their targets and indicators, and the large array of entities to which they refer. Analogous efforts have put forth comprehensive covering sets of inter-related food systems sustainability issues and indicators. At the same time, development agencies like USAID have developed “Practitioners’ Guidebooks” in order to “better identify capacity development opportunities and activities that continuously improve innovation capabilities within value chains, agrifood systems, and agricultural innovation systems.”

While each of these resources represents a step in advancing sustainable and resilient food systems, they have been developed in disparate, siloed communities. The objective of this workshop is to develop an ontological framework for carrying forward the development of a logically coherent application ontology that food systems actors can use to assemble smart foodshed cyberinfrastructure components enabling health and environmental improvement, financial sustainability, and resilience to shocks.

This 90-minute workshop will begin by exploring the varied perspectives of food system actors and stakeholders, along with the ontological foundations necessary to cohere scientific metrics of sustainability and resiliency issues with the experiential knowledge of food systems actors and the challenges they face in innovating and implementing positive change.

The workshop will then take a deep dive into aligning and updating the United Nations Sustainable Development Goals Interface Ontology (SDGIO), food systems sustainability issues and indicators, and a compilation of objectives, strategies, tactics, and factors affecting capacity of individuals, organizations, and networks to develop local, regional, and global solutions to complex agri-food system problems relative to sustainability and resiliency.

The workshop will conclude by outlining the steps necessary to further cohere identified resources into a Smart Foodshed Application Ontology capable of underpinning local, regional, national, and global-scale Smart Foodshed Knowledge Graphs and related cyberinfrastructure components.

This amendment to the panel “Semantics of Sustainability: Building a Smart Foodshed Application Ontology” proposes to employ our meta-framework-based participative approach to manage stakeholder engagement during the workshop session. We suggest strategies for outlining ontologically coherent and robust foundations, drawing on the varied perspectives of session participants, for further developing a Smart Foodshed Application Ontology to support the transition to healthier and more sustainable food systems.

Session Organizers: Giorgio Alberto Ubbiali, Patrick Huber, Kurt Richter, Andrea Borghini, Nicola Piras, and Matthew C. Lange


Ontology and Data Readiness for AI/ML (AI Readiness)

This US2TS session presents highlights of the ICBO 2022 conference around the theme of “ontology and data readiness for AI/ML”. It is also an extension of the FAIR Ontology Harmonization and TRUST Data Interoperability session. This session will focus on:

  1. What value can ontological engineering bring to supporting AI/ML data readiness?
  2. How can the ontology community best implement such support?

Ontologies provide prior knowledge to AI/ML models, decrease the size of required ML training sets, support AI/ML explainability through definitions, and support consistency checking through explicit logical commitments. We aim to bring together domain experts from different biomedical areas to discuss these issues.

Session Organizers: Yongqun “Oliver” He, John Beverley, Sivaram Arabandi, Asiyah Yu Lin, and Hande Küçük McGinty


Towards FAIR, Trustworthy and Harmonized Semantic Resources (Part of FOHTI-22: FAIR Ontology Harmonization and TRUST Data Interoperability Workshop) (FAIR Semantic Resources)

To avoid repetitive modeling and to harmonize common knowledge from different groups and organizations, it is desirable to develop some consensus on FAIRness across a range of relevant semantic resources. This includes digital representations of structured vocabularies, but also high-level domain ontologies and ontology design patterns that support a wide range of use cases. The FAIR principles, along with the Transparency, Responsibility, User focus, Sustainability and Technology (TRUST) principles, have been established and accepted by the global scientific community for digital objects. Can principles like these also be applied to help manage and standardize ontologies and other semantic resources and support opportunities for semantic resource discovery and TRUSTful sharing? Recently there have been several relevant discussions of FAIR semantic resource concepts at the Onto4FAIR and FOHTI workshops. One example is how OBO Foundry principles support FAIR aims, and how this has been extended by the Core Ontology for Biology and Biomedicine (COB) in an effort to provide a set of foundational classes and relations to be used by all OBO ontologies.

In this panel, three distinguished ontology architects (Chris Mungall, Pascal Hitzler, and Ramona Walls) will answer audience questions regarding steps towards FAIR, TRUSTworthy, and harmonized semantic resources. Expected topics include the role of modularization, the use of Ontology Design Patterns and Reference Ontologies, and challenges that projects face in adopting FAIR principles for semantic resources.

Session Organizers: Asiyah Lin (NIH), Gary Berg-Cross, and Nomi Harris

Panelists: Pascal Hitzler (Kansas State University), Chris Mungall (Lawrence Berkeley National Laboratory), and Ramona Walls (Critical Path Institute)



Talks

Knowledge Graph Construction from Data, Data Dictionaries, and Codebooks: the National Health and Nutrition Examination Surveys Use Case (NHANES)

CDC’s National Health and Nutrition Examination Surveys (NHANES) is a continuous survey that aims to study the relationship between diet, nutrition, and health and their roles in designated population subgroups with selected diseases and risk factors. Data is acquired using questionnaires (administered either by human interviewers or computer-assisted) aimed at collecting data about participants’ households and families, medical conditions, substance usage, and more. NHANES data and supporting documentation, including data dictionaries (DDs) and codebooks (CBs), are made publicly available and are used in many data science efforts to support a wide range of health informatics projects. A typical use of NHANES data requires a complex human interpretation of the data with the help of the DDs and CBs. For example, to retrieve “diseases treated by a specific drug in households with annual income under $20,000”, one must select all the relevant variables (diseases, drugs, household income, participants) across the relevant datasets (demographic, drug usage) and perform a series of transformations (normalizing disease and income codes) to generate the answer to the query. During data processing, it is not uncommon for data to be misinterpreted, as NHANES may use the same variable for multiple purposes (e.g., the same variable is used for diseases being treated and diseases being prevented by a drug, and sometimes this distinction is critical to applications). Furthermore, the result of this processing may be incorrectly combined (e.g., harmonized with new data from NHANES or other studies).

We present our approach for translating NHANES’ datasets, metadata, and any additional documentation from the surveys into a rich knowledge graph (KG) that maintains semantic distinctions. We leverage the Human-Aware Data Acquisition Infrastructure (HADatAc) [1] and its underlying Human-Aware Science Ontology (HAScO) [2] to systematically represent the complete data acquisition process. Semantic Data Dictionaries (SDDs) [3], which are derived from DDs and CBs, support the elicitation of objects that are not directly represented within NHANES datasets (including the household, the household reference person, drug usage for disease treatment, drug usage for disease prevention, etc.). We demonstrate how we use the KG to generate tailored datasets based on the user’s choice of variables and alignment criteria across multiple NHANES datasets. Our use of SDDs enables the combined use of ontologies and data. We further demonstrate that once data is encoded into the KG, the KG can be used to support complex automated data harmonization that, until now, has been done manually whenever required in meta-analysis studies based on NHANES.
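The following sketch illustrates the kind of query the KG is meant to support, including the treated-versus-prevented distinction; the nh: vocabulary and toy data are hypothetical stand-ins (the actual KG uses HAScO- and SDD-derived terms):

```python
# Illustrative only: a KG query that keeps "treats" and "prevents" distinct.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

NH = Namespace("http://example.org/nhanes#")

g = Graph()
g.add((NH.usage1, NH.drug, NH.aspirin))
g.add((NH.usage1, NH.treatsDisease, NH.arthritis))   # treated, not prevented
g.add((NH.usage1, NH.participant, NH.p1))
g.add((NH.p1, NH.memberOf, NH.h1))
g.add((NH.h1, NH.annualIncome, Literal(18000, datatype=XSD.integer)))

query = """
PREFIX nh: <http://example.org/nhanes#>
SELECT ?disease WHERE {
    ?usage nh:drug nh:aspirin ;
           nh:treatsDisease ?disease ;   # nh:preventsDisease would not match
           nh:participant ?p .
    ?p nh:memberOf ?household .
    ?household nh:annualIncome ?income .
    FILTER (?income < 20000)
}
"""
for row in g.query(query):
    print(row.disease)   # -> http://example.org/nhanes#arthritis
```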

Presenters: Henrique Santos, Paulo Pinheiro, and Deborah L. McGuinness of the Rensselaer Polytechnic Institute.


Assisting the technical workforce with Hybrid AI (Hybrid AI)

AI-based technologies have become increasingly relevant for assisting a highly specialized workforce in technical decisions, following the general trend in industry to adopt AI solutions at different stages of the product lifecycle, from design to commercialization. What is crucial in this context is to have a framework guaranteeing that AI-based decision support can unfold in an effective and efficient way, where expertise, typically accumulated over years of “hands-on” experience and constant learning, can be coherently aggregated and leveraged together with computational algorithms.

A lack of “common ground” could lead to an erosion of trust in AI, which defies one of the core purposes of this technology, i.e., to foster human-machine collaboration. Neuro-symbolic AI can play the role of such a framework: in the poster, we present specific methods to harvest experts’ know-how, combining it with semantically structured data and ultimately transforming this knowledge corpus into actionable recommendations that experts can follow to make decisions in a timely manner. This transformation occurs through hybrid reasoning, emerging from the interplay between rule-based inference, a cognitively adequate way to formalize experts’ know-how, and machine learning, which can be used to rapidly gain insights from high volumes of data, a process that would otherwise be time-consuming and require extraordinary manual effort (e.g., finding errors, recurring patterns, etc.).

Presenter: Alessandro Oltramari


Investigating Semantic Primitives Using an OWL-based Ontology (COSMO)

The COSMO ontology project was initiated to provide a foundation ontology (FO) to enable broad semantic interoperability by identifying a "logical defining vocabulary" (LDV), a subset of the elements in the FO that in combination can provide logical specifications for any entity of interest. This effort includes an attempt to identify as many “semantic primitives” (elements that cannot themselves be properly specified by combinations of other elements in the ontology) as can be recognized as needed to provide the basic logical component elements of a broad range of concepts. The OWL format was chosen to allow logical consistency checking in Protégé for the full ontology as it grew larger. As the ontology grew, it became clear that a small ‘top-level’ ontology would be inadequate to provide sufficient detail for multiple independent developers to create unambiguous extensions understandable to each other and adequate for computational consistency. COSMO uses parts of the OpenCyc ontology and the WordNet hierarchy and synset labels. COSMO currently has over 30,000 OWL classes and over 1,360 OWL object properties relating those classes, with over 20,000 of the most common words of English labeling at least one OWL class. The project has reached a point where further development will be greatly assisted by feedback from other ontology or database developers to fill in gaps and make modifications important to the various fields that can use a common LDV for interoperability of their applications. One metric for the adequacy of the LDV thus developed is the frequency with which new relations between the classes – the OWL object properties – need to be added as new groups of OWL classes are added. That metric is presented as a graph.
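A minimal sketch (not the project’s own tooling) of how that metric could be computed: count owl:Class and owl:ObjectProperty declarations in successive releases of an ontology and compare the increments. The file names here are hypothetical.

```python
# Count class and object-property declarations in an OWL file with rdflib.
from rdflib import Graph
from rdflib.namespace import OWL, RDF

def counts(path: str) -> tuple[int, int]:
    g = Graph()
    g.parse(path)  # format guessed from the file extension
    n_classes = len(set(g.subjects(RDF.type, OWL.Class)))
    n_properties = len(set(g.subjects(RDF.type, OWL.ObjectProperty)))
    return n_classes, n_properties

# Hypothetical releases: how many new object properties were needed
# as new classes were added?
c1, p1 = counts("cosmo-v1.owl")
c2, p2 = counts("cosmo-v2.owl")
print(f"new classes: {c2 - c1}, new object properties: {p2 - p1}")
```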

Presenter: Patrick Cassidy



Posters

Environmental Health Language Collaborative (EHLC): Harmonizing Data, Connecting Knowledge, and Improving Health

Understanding the role that the environment plays in affecting human health involves the collection, analysis, and integration of varied data types. The broadness of the domains adds an extra layer of complexity for data retrieval and integration. To ensure that the information is captured, combined, processed, and shared correctly and consistently, a rigorous and clear understanding of the meaning of the data is necessary. To this end, NIEHS has launched the Environmental Health Language Collaborative (https://www.niehs.nih.gov/research/programs/ehlc/index.cfm). The Collaborative is a new community initiative that aims to advance integrative environmental health science research by promoting access, use, and harmonization of data through interoperable terminologies and best practices. The community comprises three main elements:

  1. Community of Practice: to exchange information, ideas, and expertise, as well as foster education and training;
  2. Community Forum: to coordinate harmonization activities and collaborate on defining use cases, gaps, and activities;
  3. Community Platform: to promote the development and application of harmonized language solutions to address use case needs.

The products, outcomes, and recommendations developed and endorsed by this community are expected to enhance data sharing and management efforts for NIEHS and the Environmental Health Sciences community, making data more FAIR. NIEHS encourages anyone interested in advancing this mission to engage in this community.

Authors: Anne E. Thessen, Anna Maria Masci, Stephanie Holmgren, and Charles Schmitt


Vector ontology – Evaluating Semantic Alignments

Ontology alignment is a well-known and difficult problem, with many factors to consider when producing alignments, such as terminological, structural, and semantic factors [1]. Approaches such as simple keyword matches and string edit distances are usually inadequate, because they assume high degrees of syntactic overlap and fail to account for ambiguities within language. Word embedding algorithms, on the other hand, use neighboring context to overcome ambiguities, and have been used successfully to support entity resolution tasks in laboratory settings [2]. However, prior work in embedding spaces has focused more on algorithm design and performance than on human-centered design, which focuses on the practical utility and deployment of algorithms in operational settings. Good data alignments still require human expertise and domain knowledge, which makes it difficult to verify and qualify alignments. Our goal is to leverage word embeddings to assess the veracity of ontology alignments based on the supporting literature, documents, and data of the aligned ontologies.

Our research computes the relevancy of datasets to background knowledge asserted directly by users. Users encode background knowledge in a novel form, known as a vector ontology, which consists of arithmetic expressions that convey different mathematical relationships among words. We can determine whether the content of a dataset is congruent with the mathematical relationships in a vector ontology. We bind vectors from the embedding space of a dataset to corresponding words mentioned in a vector ontology, and then evaluate the resulting vector expressions. The resultant quantities determine whether a particular dataset combination refutes or supports the relationships asserted in a vector ontology.
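A toy numpy sketch of this binding-and-evaluation step (not the authors’ implementation; the embedding values are invented):

```python
# Bind word vectors from a dataset's embedding space to the words in a
# vector-ontology expression, then score how well the data supports it.
import numpy as np

emb = {  # in practice, learned from the dataset combination under test
    "king":  np.array([0.8, 0.6, 0.1]),
    "man":   np.array([0.7, 0.1, 0.0]),
    "woman": np.array([0.1, 0.8, 0.1]),
    "queen": np.array([0.2, 0.9, 0.2]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Vector-ontology expression: king - man + woman ~ queen.
lhs = emb["king"] - emb["man"] + emb["woman"]
score = cosine(lhs, emb["queen"])
print(f"support for the asserted relationship: {score:.3f}")
```

Re-evaluating the same expression over embedding spaces built from different dataset combinations yields the evaluation deltas described below.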

Users can explore a space of different dataset combinations to determine which ones are most congruent with a target vector ontology. Each unique dataset combination results in a unique embedding space, which results in unique vector expressions and resultant evaluations. Users can track the deltas of evaluations to understand whether a specific data combination has improved, degraded, or had no effect on the evaluation of the expressions in a target vector ontology, compared to other combinations. Thus, vector ontologies can be used to quantify the relevancy of adding, removing, or changing a particular dataset, giving users a good macro-level intuition for whether the unknown material embedded is relevant for their alignment task before they expend effort exploring individual vectors.

In our case study, we used vector ontologies to validate claims from a genuine ontology, the ML Schema Core Ontology (https://ml-schema.github.io/documentation/ML%20Schema.html). The ML Schema Ontology [2] is a top-level ontology used to describe ML algorithms, datasets, and experiments. Its documentation provides a mapping of how the ML Schema ontology aligns with other well-defined public ontologies, such as the OntoDM [3], DMOP [4], Exposé [5], and MEX [6] ontologies. The ML Schema documentation states that these mappings were derived manually, using knowledge from several experienced researchers, and in this study we explored whether we could use vector ontologies to assess the veracity of these alignments.

The results for each expression in the vector ontology provided us with a way to assess the veracity of the relationships specified within the mapping. The study not only showed whether a pairing was valid or invalid but also gave clues about which documents or ontologies contributed to the validation. Some relationships had high scores, allowing us to validate the alignments, whereas others were inconsistent, making it difficult to judge whether the expression was valid. Some relationships scored poorly, and we could not find relevant data to justify the claim. Using this information from the results, we could then annotate the mapping according to whether it was truthful or not.

Distribution Statement A. Approved for public release: distribution unlimited. PA Approval AFRL-2021-0720.

References:

[1] J. Euzenat, A. Mocan, and F. Scharffe, “Ontology Alignments: An Ontology Management Perspective,” Springer, 2008.
[2] G. Publio, D. Esteves, A. Ławrynowicz, P. Panov, L. Soldatova, T. Soru, J. Vanschoren, and H. Zafar, “ML-Schema: Exposing the Semantics of Machine Learning with Schemas and Ontologies,” arXiv preprint arXiv:1807.05351, 2018.
[3] P. Panov, S. Džeroski, and L. Soldatova, “OntoDM: An Ontology of Data Mining,” in Workshops Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), 2008.
[4] C. M. Keet, A. Ławrynowicz, C. d’Amato, A. Kalousis, P. Nguyen, R. Palma, R. Stevens, and M. Hilario, “The Data Mining OPtimization Ontology,” Journal of Web Semantics, vol. 32, pp. 43-53, 2015.
[5] J. Vanschoren and L. Soldatova, “Exposé: An Ontology for Machine Learning Experimentation,” in DM Ontology Jamboree 2010, 2010.
[6] D. Esteves, D. Moussallem, C. B. Neto, and T. Soru, “MEX Vocabulary: A Lightweight Interchange Format for Machine Learning Experiments,” in Proceedings of the 11th International Conference on Semantic Systems, 2015.

Authors: Patrick Fisher and Nicholas Del Rio


Ethics of Semantics

This poster and/or lightning talk states the position that the ethical and moral aspects of semantics and semantic modeling (and cognates such as conceptual modeling, ontology, and knowledge representation) deserve attention. As an untreated topic related to the significant one of the ethics of AI, its value lies in its relevance, novelty, and interdisciplinary innovation. This presentation draws on work started by the author in approximately 2013.

Author: Robert Rovetto


Space and Spaceflight Semantics

This poster or lightning talk summarizes the author’s concepts and ongoing work on what they call ‘space semantics’ (broadly) and ‘spaceflight semantics’ (discipline-specific). The space ontology efforts, as described at https://purl.org/space-ontology, are introduced. The project develops knowledge organization systems (e.g., conceptual models, ontologies) for space topics of interest, with focus areas of astronautics and astronomy. Terminology refinement and development are part of the project as both stand-alone and component tasks. Combined with the interdisciplinary and abstract aspects of the methodology, these tasks and the overall project stand to contribute to distinct disciplines, from semantics to policy to philosophy to terminology. As this is a personal project conducted in the author’s own time, formal support to sustain development is solicited, e.g., formal work collaborations, sponsors, or a PhD study opportunity for the author to realize the vision in a protected, mentored, and stable environment.

Author: Robert Rovetto


Towards an Ontology for Fairness Metrics

Recent research has revealed that many machine-learning models and datasets suffer from various forms of bias, and a wide variety of metrics have been developed to evaluate a model’s fairness. However, there is no single gold standard for fairness evaluation; instead, a model developer must choose which fairness metrics to use from among dozens. Additional challenges arise because many metrics have alternate naming schemes, are not consistent across sources, and are often mutually exclusive. In order to provide guidance in fairness metric selection, understanding, and interpretation, we introduce the Fairness Metrics Ontology (FMO), aimed initially at machine-learning model evaluation. FMO is a single comprehensive knowledge source that defines each fairness metric by leveraging existing well-cited literature, describes their use cases, and delineates the inter-relationships between them. In addition, we define further concepts related to fairness: the different types of bias that fairness metrics address, the different categories of fairness metrics, and the underlying statistical metrics used to calculate each fairness metric. Furthermore, we provide concepts related to machine-learning models and datasets, in order to enable the representation of fairness information about them. We demonstrate the utility of FMO by using it with an OWL reasoner to recommend metrics for specific machine-learning models, and we are developing a system that uses an FMO-based knowledge graph representation to compute and track fairness metric information over time.
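As a small illustration of the kind of metric FMO catalogs, the sketch below computes two standard group-fairness metrics; the definitions are the conventional ones from the fairness literature, not FMO’s own encoding, and the data are invented.

```python
# Two standard group-fairness metrics, computed from predictions.
import numpy as np

def statistical_parity_difference(y_pred, group):
    """P(y_hat = 1 | unprivileged) - P(y_hat = 1 | privileged)."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return y_pred[group == 0].mean() - y_pred[group == 1].mean()

def equal_opportunity_difference(y_true, y_pred, group):
    """TPR(unprivileged) - TPR(privileged)."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tpr = lambda g: y_pred[(group == g) & (y_true == 1)].mean()
    return tpr(0) - tpr(1)

y_true = [1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
group  = [0, 0, 0, 1, 1, 1]   # 0 = unprivileged, 1 = privileged
print(statistical_parity_difference(y_pred, group))          # -0.667
print(equal_opportunity_difference(y_true, y_pred, group))   # -0.5
```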

Authors: Jade Franklin, Mohamed Ghalwash, Jamie McCusker, Kristin Bennett, and Deborah McGuinness


Standardized genome-wide function prediction enables comparative functional genomics: a new application area for Gene Ontologies in plants (GO-MAP)

The availability of genome-wide gene function annotations enables researchers to generate hypotheses and prioritize candidate genes that may be responsible for phenotypes of interest. In this study, we functionally annotated 18 crop plant genomes across 14 species using the GOMAP pipeline (Gene Ontology Meta Annotator for Plants; doi.org/10.1186/s13007-021-00754-1). Compared to other GO (Gene Ontology) annotation datasets, GOMAP offers datasets with higher gene coverage and more GO term annotations. We were interested in determining whether the GOMAP-generated datasets could be used to perform comparative functional genomic analyses across the different species of plants. Therefore, as a proof of concept, we generated dendrograms of functional relatedness based on the GO term datasets of the 18 genomes. The resulting dendrograms were compared to well-established species-level evolutionary phylogenies to assess whether the generated trees were in agreement with known evolutionary relationships, which they largely are. Where discrepancies were observed, we determined branch support based on jack-knifing, then removed individual annotation sets by genome to identify the annotation sets causing unexpected relationships. In conclusion, GOMAP-generated functional annotations across different plant species generally retain sufficient biological signal to recover known phylogenetic relationships based on genome-wide functional similarities. This shows that comparative functional genomics across species based on GO data holds promise for generating novel hypotheses about comparative gene function and traits.
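A schematic sketch of the dendrogram construction (this is not the GOMAP pipeline itself; the GO term sets are invented): compute pairwise distances between per-genome GO annotation sets and cluster hierarchically.

```python
# Functional-relatedness dendrogram from per-genome GO term sets.
from itertools import combinations
from scipy.cluster.hierarchy import dendrogram, linkage

go_terms = {  # toy annotation sets; real ones hold thousands of terms
    "maize":   {"GO:0006412", "GO:0009765", "GO:0015979"},
    "sorghum": {"GO:0006412", "GO:0009765"},
    "soybean": {"GO:0006412", "GO:0006950"},
}
names = list(go_terms)

def jaccard_distance(a: set, b: set) -> float:
    return 1 - len(a & b) / len(a | b)

# Condensed pairwise distance list in the order scipy expects.
dists = [jaccard_distance(go_terms[x], go_terms[y])
         for x, y in combinations(names, 2)]
tree = linkage(dists, method="average")
info = dendrogram(tree, labels=names, no_plot=True)
print(info["ivl"])   # leaf order of the functional-relatedness tree
```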

Authors: Leila Fattel, Dennis Psaroudakis, Colleen Yanarella, Kevin Chiteri, Haley Dostalik, Parnal Joshi, Dollye Starr, Ha Vu, Kokulapalan Wimalanathan, and Carolyn Lawrence-Dill


Explaining Deep Learning with Background Knowledge

Neural networks have successfully tackled very complex tasks across many fields. Nonetheless, deep neural networks are still considered black boxes, as there is no human-interpretable explanation for why a network gave a specific output. We propose identifying hidden-layer activation patterns in trained deep neural networks and, utilizing an expressive knowledge graph as background knowledge, mapping those patterns to the dataset with an explanation-generation algorithm, concept induction, to arrive at human-understandable interpretations.
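A minimal PyTorch sketch of the first step, capturing hidden-layer activation patterns with a forward hook (the network and data are placeholders, not the authors’ setup):

```python
# Record which hidden neurons fire for each input; concept induction would
# then describe the inputs sharing a pattern using background knowledge.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
activations = {}

def hook(module, inputs, output):
    activations["hidden"] = (output > 0).float()  # binary firing pattern

model[1].register_forward_hook(hook)   # hook the ReLU layer
model(torch.randn(4, 10))              # a batch of 4 random inputs
print(activations["hidden"])           # 4 x 16 activation patterns
```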

Author: Abhilekha Dalal


Improving Deep Learning Performance Using Background Knowledge

Deep neural networks are widely used systems for understanding data, and improving their performance using background knowledge has been explored for some time. In our work, we use concept induction and knowledge graphs to explain learning improvements achieved through background knowledge. For this study, we used a public text dataset and a traditional deep learning model for data analysis. We also prepared a knowledge graph to provide background knowledge for network inputs and ran an analysis through ECII. Based on the results, we explain our findings from the study.

Authors: Sulogna Chowdhury and Pascal Hitzler


Pointer Network in Deep Deductive Reasoning

Deep Deductive Reasoning aims at leveraging modern deep-learning techniques to perform deductive reasoning. The problem is hard for deep-learning models: they must create an embedding that extracts the inference rules and logic underlying the symbolic representation presented to the network as input, and the task is considered symbol-invariant. In a sequence-to-sequence architecture, the model tries to learn the relationships between symbols. We have implemented a Pointer Network to generate the full inference of a given graph; the particular configuration of the network resulted in very low accuracy.
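For reference, a bare-bones sketch of the pointing mechanism at the core of a Pointer Network: attention scores over the encoded input symbols are used directly as the output distribution over input positions (dimensions and data here are arbitrary):

```python
# Pointer attention: emit a distribution over input positions.
import torch
import torch.nn as nn

class PointerAttention(nn.Module):
    def __init__(self, d: int):
        super().__init__()
        self.w_enc = nn.Linear(d, d, bias=False)
        self.w_dec = nn.Linear(d, d, bias=False)
        self.v = nn.Linear(d, 1, bias=False)

    def forward(self, enc, dec):  # enc: (n, d) inputs, dec: (d,) decoder state
        scores = self.v(torch.tanh(self.w_enc(enc) + self.w_dec(dec)))
        return torch.softmax(scores.squeeze(-1), dim=-1)

ptr = PointerAttention(8)
enc_states = torch.randn(5, 8)     # 5 encoded input symbols (e.g., triples)
dec_state = torch.randn(8)
print(ptr(enc_states, dec_state))  # which input position to emit next
```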

Author: Rushrukh Rayan


Deep Supervised Reasoning

Deep learning (DL) has emerged in almost every aspect of our lives. Although it achieves groundbreaking accuracy and considerable results in various applications, we know little about the reasoning and decision-making behind the scenes. Semantic Web technologies have proven to be one of the promising ways to increase the explainability of DL models by providing supervision on how to reason. In this research, we employ an iterative approach named Prototype Learning (PL) to help DL models learn to reason better. PL is an instance-based reasoning approach that tries to find the most relevant examples during each inference in DL models. Our preliminary results show that this method can increase the transparency of the reasoning process of DL models.
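A bare-bones numpy sketch of the prototype idea in general (generic nearest-prototype classification, not the authors’ iterative PL method): the nearby prototypes that drive a prediction double as its explanation.

```python
# Classify by distance to class prototypes; the prototypes explain the result.
import numpy as np

prototypes = {  # e.g., mean embeddings of labeled training examples
    "cat": np.array([0.9, 0.1]),
    "dog": np.array([0.2, 0.8]),
}
x = np.array([0.8, 0.3])   # embedding of a new input

dists = {label: float(np.linalg.norm(x - p)) for label, p in prototypes.items()}
prediction = min(dists, key=dists.get)
print(prediction, dists)   # "cat", plus the distances that justify it
```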

Authors: Mohammad Saeid Mahdavinejad and Pascal Hitzler



Lightning Talks

Ontology Design for Wikibase

Wikibase – the software underlying Wikidata – is a powerful platform for knowledge graph creation and management. However, it has been developed with a crowd-sourced knowledge graph creation scenario in mind, which in particular means that it has not been designed for use cases in which a tightly controlled, high-quality schema, in the form of an ontology, is to be imposed; indeed, independently developed ontologies do not necessarily map seamlessly to the Wikibase approach. We report on a preprint [1] in which we provide the key ingredients needed to combine traditional ontology modeling with use of the Wikibase platform, namely a set of axiom patterns that bridge the paradigm gap, together with usage instructions and a worked example for historical data.

[1] Cogan Shimizu, Andrew Eells, Seila Gonzalez, Lu Zhou, Pascal Hitzler, Alicia Sheill, Catherine Foley, Dean Rehberger, Ontology Design Facilitating Wikibase Integration - and a Worked Example for Historical Data. https://arxiv.org/abs/2205.14032
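As a small illustration of the paradigm gap (not taken from the preprint), the same subclass fact can be written as a plain OWL-style axiom and, in the Wikibase RDF model, as a reified statement that can carry qualifiers and references; the ex: terms are invented, while the Wikidata namespaces and property P279 (“subclass of”) are real:

```python
# One fact, two paradigms: a direct subclass triple vs. a Wikibase statement.
from rdflib import BNode, Graph, Namespace
from rdflib.namespace import RDFS

EX  = Namespace("http://example.org/onto#")
WDT = Namespace("http://www.wikidata.org/prop/direct/")
P   = Namespace("http://www.wikidata.org/prop/")
PS  = Namespace("http://www.wikidata.org/prop/statement/")

g = Graph()

# Traditional ontology modeling: a plain subclass axiom.
g.add((EX.Battle, RDFS.subClassOf, EX.Event))

# Wikibase modeling: a "direct" triple plus a statement node, which is
# where qualifiers and references would attach.
stmt = BNode()
g.add((EX.Battle, WDT.P279, EX.Event))
g.add((EX.Battle, P.P279, stmt))
g.add((stmt, PS.P279, EX.Event))

print(g.serialize(format="turtle"))
```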

Presenters: Cogan Shimizu, Andrew Eells, and Pascal Hitzler




