Opening Ceremony of ACIIDS 2020
Ensuring high-quality data integration results frequently involves increasing the number of merged data sources. But does more always mean more? In this paper we apply consensus theory, which originates from the field of collective intelligence, and investigate which parameters describing a collective most prominently affect the quality of its consensus, which can be treated as the output of data integration. We identify, either analytically or experimentally, which properties of the conflict profile (the input data) can be adjusted so that the integration quality exceeds the expected level. In other words, we determine which properties have the biggest influence and which are insignificant.
FOKI is a formally defined framework, proposed by the authors, which addresses storing, processing, and integrating ontologies. Its model is based on a mathematical apparatus but lacks a concrete syntax. These features make it difficult to use standardized benchmark datasets, usually expressed in OWL2, during experimental verification of FOKI’s validity. To enable practical usage of FOKI, a set of bidirectional transformation rules (defined at the abstract syntax level) between OWL2 RL and the framework is needed. However, due to major differences in base assumptions, it is impossible to provide a straightforward translation between FOKI and OWL2. Therefore, the aim of the paper is to identify which elements of OWL2 syntax can be transformed into the FOKI formalism (in its current state of development) and which of these rules are bidirectional. The defined rules are illustrated with examples. The paper also provides a short discussion of different approaches to defining transformations.
This paper reports an application of blockchains to knowledge refinement. Constructing a high-quality knowledge base is crucial for building an intelligent system. One promising approach to this task is to make use of “the wisdom of the crowd,” commonly through crowdsourcing. To give users proper incentives, gamification can be introduced into crowdsourcing so that users are rewarded according to their contribution. In such a case, it is important to ensure the transparency of the reward system. In this paper, we consider the process of refining the knowledge base of our word retrieval assistant system. In this knowledge base, each piece of knowledge is represented as a triple. To validate triples acquired from various sources, we introduce yes/no quizzes. Only triples that receive a “yes” vote from a sufficient number of users are incorporated into the main knowledge base. Users are given rewards based on their contribution to this validation process. We describe how a blockchain can be used to ensure the transparency of the process, and we present some simulation results of the knowledge refinement process.
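The validation process described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hash-chained list standing in for a blockchain, the threshold `YES_THRESHOLD`, the flat reward `REWARD_PER_VOTE`, and all function names are assumptions made for illustration only. The sketch records each quiz answer in a tamper-evident chain, accepts a triple once enough distinct users have voted "yes", and derives rewards from the recorded votes so that anyone can recompute them.

```python
import hashlib
import json

YES_THRESHOLD = 3      # hypothetical: distinct "yes" voters needed to accept a triple
REWARD_PER_VOTE = 1    # hypothetical: flat reward for each recorded vote
GENESIS_HASH = "0" * 64


def block_hash(index, prev_hash, payload):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "prev": prev_hash, "payload": payload}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_vote(chain, user, triple, vote):
    """Append one quiz answer ("yes" or "no") to the chain."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    payload = {"user": user, "triple": list(triple), "vote": vote}
    chain.append(
        {
            "index": index,
            "prev": prev_hash,
            "payload": payload,
            "hash": block_hash(index, prev_hash, payload),
        }
    )


def verify(chain):
    """Return True if no block has been altered, removed, or reordered."""
    prev_hash = GENESIS_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True


def accepted_triples(chain):
    """Triples with "yes" votes from at least YES_THRESHOLD distinct users."""
    yes_voters = {}
    for block in chain:
        payload = block["payload"]
        if payload["vote"] == "yes":
            yes_voters.setdefault(tuple(payload["triple"]), set()).add(payload["user"])
    return {t for t, users in yes_voters.items() if len(users) >= YES_THRESHOLD}


def rewards(chain):
    """Recompute each user's reward from the recorded votes."""
    totals = {}
    for block in chain:
        user = block["payload"]["user"]
        totals[user] = totals.get(user, 0) + REWARD_PER_VOTE
    return totals
```

Because acceptance and rewards are both computed from the chain alone, any participant holding a copy can call `verify` and then recompute the results independently. A real deployment would also need consensus among nodes and a reward rule tied to vote quality, which this sketch leaves out.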
Word embeddings are a useful tool for extracting knowledge from the free-form text contained in electronic health records, but it has become commonplace to train such word embeddings on data that do not accurately reflect how language is used in a healthcare context. We use prediction of medical codes as an example application to compare the accuracy of word embeddings trained on health corpora with that of embeddings trained on more general collections of text. We show that both an increase in embedding dimensionality and an increase in the volume of health-related training data improve prediction accuracy. We also present a comparison with the traditional bag-of-words feature representation, demonstrating that in many cases this conceptually simple method for representing text results in superior accuracy to that of word embeddings.
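For readers unfamiliar with the bag-of-words representation mentioned in this abstract, the following is a minimal sketch of the general technique, not the feature pipeline used in the paper. The whitespace tokenizer, the function names, and the toy clinical phrases in the usage note are assumptions for illustration.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocabulary(documents):
    """Map each distinct token in the corpus to a fixed vector position."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: position for position, tok in enumerate(tokens)}


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary token occurs; word order is discarded."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocabulary:  # tokens outside the vocabulary are ignored
            vector[vocabulary[tok]] = count
    return vector
```

With a vocabulary built from the two hypothetical notes "chest pain on exertion" and "no chest pain", each note becomes a five-dimensional count vector that a standard classifier can consume. Unlike an embedding, the vector length grows with the vocabulary and related words share no information.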