An Inclusive Learning Algorithm Framework Towards Achieving “Less Artificial” Intelligence
The current state of our understanding of learning is a significant hindrance to achieving “less artificial”, more “natural-like” intelligence that can be implemented in machines. An inclusive framework for learning algorithms will be presented, discussing the “known unknowns” and speculating about the “unknown unknowns” in learning algorithm development. We are already witnessing a paradigm shift in wide-ranging application domains such as neural engineering, pharmaceutical drug development, and microbial ecology, which are empowered by rapidly advancing technologies that can quickly generate terabytes of “imperfect” data for the analysis of advanced processes, compounds and organisms. These applications increasingly demand transparency, hence the need to move away from completely black-box approaches to learning. These technologies have been spurred by recent advances in Deep Learning coupled with improvements in processor technology (e.g. GPUs), which have allowed practitioners and researchers to overcome the computational limitations of many Neural Networks that depend on fully human-curated (i.e. labeled) data (i.e. Supervised Learning). The following fundamental question then naturally arises: what happens when curated information or labels capture only a subset of the critical classes, or when the curation process itself is not free of faults or errors? Undoubtedly, the algorithm’s perceived reality will distort any subsequent analysis of these data, which may have detrimental downstream effects when new discoveries and critical decisions are made on the basis of these analyses. In such scenarios, learning algorithms that can find models (underlying structures or distinct patterns within data) without relying on labels (i.e. using Unsupervised Learning) have made great progress toward answering these sorts of questions; however, these algorithms address only part of the problem.
Unsupervised Learning algorithms do not take into account any available and potentially reliable information or domain knowledge that could prove useful in developing a robust model of the data. It can be advantageous to consider such information, as well as any other available domain knowledge, not as ground truth but as a starting point for building a more complete picture of the problem under investigation. The following applications will be discussed to illustrate the usefulness of the methods developed by my research groups at the Australian National University and the University of Melbourne [1-9]. Some key contributions in the CI area are also highlighted [10-12].
Application 1: Exploring new inter-drug interactions and re-purposing of known drugs
Given the vast number of clinical drugs, only a small portion of inter-drug interactions are known, and very little is known about non-interacting drug pairs. We therefore expand this knowledge base by detecting inter-drug interactions as well as null interactions (label completeness). Most drugs function through multiple mechanisms. Knowing which of these mechanisms are effective for a particular disease paves the way to discovering novel compounds that share similar functions.
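The setting described above, with a few known interactions and many unlabeled pairs, is often called positive-unlabeled learning. The following is a minimal sketch of one common first step, choosing "reliable negatives" as the unlabeled items farthest from the centroid of the known positives. It is a generic heuristic and not the method of [2]; the function names, the two-dimensional feature vectors and the `fraction` parameter are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reliable_negatives(positives, unlabeled, fraction=0.25):
    """Return sorted indices of the unlabeled items farthest from the
    centroid of the known positives (hypothetical helper)."""
    center = centroid(positives)
    ranked = sorted(
        range(len(unlabeled)),
        key=lambda i: distance(unlabeled[i], center),
        reverse=True,
    )
    k = max(1, int(len(unlabeled) * fraction))
    return sorted(ranked[:k])


# Toy data (assumed): each vector stands for one drug pair.
positives = [[0.9, 0.8], [1.0, 0.9], [0.8, 1.0]]   # known interactions
unlabeled = [[0.95, 0.85], [0.1, 0.2], [0.85, 0.9], [0.0, 0.1]]

negatives = reliable_negatives(positives, unlabeled, fraction=0.5)
print(negatives)  # indices of candidate non-interacting pairs
```

In a two-step scheme, the selected indices would then serve as provisional negative examples for training an ordinary classifier, while the remaining unlabeled pairs stay candidates for new interactions.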
Application 2: Understanding niche environmental ecology by shedding light on microbial ‘dark matter’
We are only at the very brink of understanding the intricacies of the hidden world of microbes. Most samples will contain a majority of microbial ‘dark matter’, a collection of data that cannot be matched to any known or previously discovered organism. As such, many methods which rely purely on Supervised Learning (i.e. those that require knowledge of all microbes in a sample) cannot be used to analyze such data sets.
[1] P. N. Hameed, K. Verspoor, S. Kusljic and S. K. Halgamuge, “A two-tiered unsupervised clustering approach for drug repositioning through heterogeneous data integration”, BMC Bioinformatics, 2018.
[2] P. N. Hameed, K. Verspoor, S. Kusljic and S. K. Halgamuge, “Positive-Unlabeled Learning for Inferring Drug Interactions Based on Heterogeneous Attributes”, BMC Bioinformatics 18.1 (2017): 140.
[3] D. Herath, S. L. Tang, K. Tandon, D. Ackland and S. K. Halgamuge, “CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision”, BMC Bioinformatics 18 (16), 2017.
[4] Y. Sun, M. Kirley and S. K. Halgamuge, “A Recursive Decomposition Method for Large Scale Continuous Optimization”, IEEE Transactions on Evolutionary Computation, 2017.
[5] Y. Sun, M. Kirley and S. K. Halgamuge, “Quantifying Variable Interactions in Continuous Optimization Problems”, IEEE Transactions on Evolutionary Computation, 2016.
[6] D. C. Mendis, E. Morrisroe, S. Petrou and S. K. Halgamuge, “Use of adaptive network burst detection methods for multielectrode array data and the generation of artificial spike patterns for method evaluation”, Journal of Neural Engineering 13.2 (2016): 026009.
[7] D. Jayasundara, I. Saeed, S. Maheswararajah, B. C. Chang, S. L. Tang and S. K. Halgamuge, “ViQuaS: an improved reconstruction pipeline for viral quasispecies spectra generated by next-generation sequencing”, Bioinformatics (2014): btu754.
[8] I. Saeed, S. L. Tang and S. K. Halgamuge, “Unsupervised discovery of microbial population structure within metagenomes using nucleotide base composition”, Nucleic Acids Research 40.5 (2012): e34-e34.
[9] A. L. Hsu, S. L. Tang and S. K. Halgamuge, “An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data”, Bioinformatics 19.16 (2003): 2131-2140.
[10] A. Ratnaweera, S. K. Halgamuge and H. C. Watson, “Self-Organizing Hierarchical Particle Swarm Optimizer with time varying acceleration coefficients”, IEEE Transactions on Evolutionary Computation, June 2004, IEEE Press. [JIP=4.4, Most cited Australian paper since 2003 in all IEEE journals and conferences (Source ISI). Top 1% cited paper in Essential Science Indicators]
[11] D. Alahakoon, S. K. Halgamuge and B. Srinivasan, “Dynamic Self Organising Maps with Controlled Growth for Knowledge Discovery” (Special Issue in Data Mining), IEEE Transactions on Neural Networks, May 2000. [Top 1% cited paper in Essential Science Indicators (2000-2010)]
[12] S. K. Halgamuge and M. Glesner, “Neural Networks in Designing Fuzzy Systems for Real World Applications”, International Journal for Fuzzy Sets and Systems, Vol 65, No 1, pages 1-12, Elsevier, 1994. [Included in the most cited papers in 1994-03 by ISI]
Saman Halgamuge, Fellow of the IEEE, is a Professor in the School of Electrical, Mechanical and Infrastructure Engineering of the University of Melbourne and an honorary Professor of the Australian National University. He was previously the Director/Head of the Research School of Engineering of the Australian National University (2016-18), and Professor, Associate Dean International, Associate Professor and Reader, and Senior Lecturer at the University of Melbourne (1997-2016). He graduated with Dipl.-Ing and PhD degrees in Data Engineering (“Datentechnik”) from the Technical University of Darmstadt, Germany, and a B.Sc. Engineering from the University of Moratuwa, Sri Lanka. He is an Associate Editor of BMC Bioinformatics, IEEE Transactions on Circuits and Systems II and Applied Mathematics (Hindawi). His research, which has led to 250 publications, has been funded over the last 22 years by the Australian Research Council (16 grants), the National Health and Medical Research Council (2 grants), and industry and other external organisations (13 grants or contracts), along with funding to support stipends for about 50 PhD students. His research contributions are in Data Engineering, which includes Data Analytics based on Unsupervised and Near Unsupervised Learning, and Optimization, focusing on applications in Mechatronics, Energy, Biology and Medicine. His publication profile is at http://scholar.google.com.au/citations?sortby=pubdate&hl=en&user=9cafqywAAAAJ&view_op=list_works