Dr Dany Varghese FHEA
Academic and research departments
School of Computer Science and Electronic Engineering, Computer Science Research Centre.
About
Biography
Dr. Dany Varghese is a researcher and educator specialising in Explainable Artificial Intelligence (XAI), Logic-Based Machine Learning, and Human-Like Computing. His work focuses on developing transparent, trustworthy AI systems through Inductive Logic Programming (ILP) and the novel framework of Meta Inverse Entailment (MIE).
He is the creator of PyGol and PyILP, widely used open-source tools that bridge symbolic reasoning with modern machine learning. His research has advanced areas such as healthcare AI, autonomous systems, legal-aware AI, and symbolic regression, receiving recognition through multiple Best Paper Awards.
He has supervised projects in robotics, healthcare, and financial applications of XAI, and he actively collaborates with international partners in academia and industry. His broader vision is to design explainable, legally compliant, and human-centred AI systems that can be trusted in high-stakes domains.
He contributes extensively to the teaching of Artificial Intelligence, Machine Learning, and Explainable AI (XAI), with a focus on symbolic reasoning, logic-based learning, and trustworthy AI. His teaching emphasises the development of transparent, ethical, and human-centred intelligent systems for real-world and high-stakes applications.
University roles and responsibilities
- Machine Learning and Data Science (Module Co-Lead)
- Personal Tutor (Masters)
- Project Supervisor (Bachelors and Masters)
My qualifications
Research
Research interests
My research focuses on Explainable Artificial Intelligence (XAI), Logic-Based Machine Learning, and Human-Like Computing, with the goal of building transparent, trustworthy, and legally aware AI systems. I am particularly interested in:
- Relational and Logic-Based Machine Learning (Inductive Logic Programming, ILP): Developing interpretable models that combine induction, abduction, and deduction to capture human-like reasoning.
- Meta Inverse Entailment (MIE): Advancing efficient methods for generating human-readable rules, enabling few-shot learning and knowledge discovery from sparse data.
- Explainable AI in Practice: Applying XAI to critical domains such as healthcare diagnostics, autonomous robotics, finance, and legal reasoning, where transparency and accountability are essential.
- Symbolic Regression and Data Science: Creating interpretable rule-based regression frameworks as alternatives to black-box models, with applications in energy efficiency, biology, and clinical decision-making.
- Legal- and Ethics-Aware AI: Embedding formal rules (e.g., maritime COLREGs, clinical guidelines) into AI models to ensure compliance, fairness, and trustworthiness.
- Tools and Frameworks for XAI: I develop and maintain open-source systems such as PyGol (more than 5K downloads since 2023) and PyILP (more than 15K downloads since 2022), bridging symbolic logic with modern machine learning workflows.
My broader vision is to integrate explainability, legal compliance, and human-centred design into the next generation of AI systems, ensuring they are not only accurate but also transparent, fair, and accountable.
Teaching
Machine Learning for Data Science (COMM075)
How do streaming platforms recommend movies you might enjoy, how do banks detect fraudulent transactions, and how can AI assist doctors in identifying diseases earlier? The answer lies in Machine Learning and Data Science.
In Machine Learning for Data Science, you will explore how intelligent systems learn patterns from data and transform raw information into meaningful insights. The module introduces core machine learning concepts including classification, regression, clustering, feature engineering, and model evaluation, while also examining the growing importance of Explainable AI (XAI) and trustworthy machine learning.
The module combines strong theoretical foundations with extensive hands-on laboratory experience using real-world datasets from domains such as healthcare, finance, robotics, and social applications. You will develop practical machine learning solutions, analyse model behaviour, and explore how explainability techniques can improve transparency, fairness, and trust in AI systems.
A strong emphasis is placed on research-informed teaching and the use of modern academic and industry-standard tools, allowing students to engage with current developments in machine learning and data science. Through practical projects and experimental analysis, you will learn not only how to build accurate predictive models, but also how to critically evaluate their reliability, interpretability, and ethical implications.
By the end of the module, you will be able to design, implement, and evaluate machine learning pipelines for complex real-world problems, preparing you for careers in AI, data science, research, and intelligent system development.
Publications
Highlights
Dr. Dany Varghese has authored and co-authored several publications in Explainable AI, Inductive Logic Programming, Human-Like Computing, and trustworthy machine learning, with contributions spanning healthcare AI, autonomous systems, symbolic regression, and logic-based reasoning. His research outputs, including award-winning papers and open-source frameworks such as PyGol and PyILP, have contributed to advancing transparent and interpretable AI methodologies in both academia and applied domains.
ORCID:
Cognitive computing is an emerging method that helps to analyse human brain behaviour and simulate it mathematically. Cognitive computing systems learn and interact naturally with people to extend what either humans or machines could do on their own. Cognitive science spans multiple research disciplines, including psychology, artificial intelligence, philosophy, neuroscience, linguistics, and anthropology. Cognitive computing helps autonomous systems to work like the human brain. COMPASS is a simulator of cognitive computing, based entirely on the TrueNorth architecture developed by IBM. COMPASS makes it possible to simulate brain-like functions on a hardware platform.
High-resolution images are a necessity in almost all image processing applications. Super Resolution (SR) is an image processing method that creates a High Resolution (HR) image from one or several low-resolution (LR) images so that high spatial frequency information can be recovered. SR methods are applied to LR images in order to increase the spatial resolution of a new image. Super-resolution processing includes two main tasks: up-sampling the image and removing degradations that arise during image capture. In effect, the super-resolution process tries to generate the missing high-frequency components. Applications include HDTV and biological imaging. In this work we address the problem of producing an HR image from a single low-resolution image using a statistical mathematical model. The performance of these algorithms was checked using the objective image quality criteria PSNR and MSSIM and compared with other existing methods.
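The PSNR criterion named in this abstract can be illustrated with a minimal sketch. It assumes 8-bit grayscale images stored as lists of pixel rows; the function name `psnr` and the `max_value` parameter are illustrative choices and are not taken from the paper. MSSIM is not covered here.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized images.

    Both images are lists of rows of pixel intensities. This layout is an
    assumption made for the sketch, not the paper's data format.
    """
    same_shape = len(reference) == len(estimate) and all(
        len(r) == len(e) for r, e in zip(reference, estimate)
    )
    if not same_shape:
        raise ValueError("images must have the same shape")

    n_pixels = sum(len(row) for row in reference)
    if n_pixels == 0:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixels.
    squared_error = sum(
        (r - e) ** 2
        for ref_row, est_row in zip(reference, estimate)
        for r, e in zip(ref_row, est_row)
    )
    mse = squared_error / n_pixels

    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the reconstructed HR image is closer to the reference; identical images give an infinite value.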
The general focus of domain adaptation methodology is transferring learned knowledge from a labeled training domain to an unlabeled test domain. Domain adaptation tries to minimize the domain shift problem by modeling a classifier using labeled training-domain data taken under definite conditions; this classifier is then used to test data taken under distinct conditions. Common adaptation approaches learn a newly acquired feature vector space using a labeled (source) data domain and an unlabeled (target) data domain with similar characteristics, and a supervised, unsupervised or semi-supervised classifier carries out the further task. Here we present the design of an incremental KM-ELM classifier that can be used for better classification in various domain adaptation tasks. This classifier is a fusion of the high-performing K-Means algorithm and the fast neural network Extreme Learning Machine (ELM). The approach uses the cross-domain learning capability of ELM together with PCA and GFK (Geodesic Flow Kernel) methods to address the domain adaptation task. First, PCA and PLS are used to create the subspaces of the testing data and training data, and these subspaces are considered as points on a Grassmann manifold. After that, the geodesic-based domain shift representation is carried out, and integration of these data points creates the intermediate cross domain. This forms a new space containing feature vectors from the training domain and testing domain, where the likelihood of these vectors in this space is maximum.
Alzheimer's disease (AD) is a progressively intensifying brain disorder that gradually damages memory and thinking skills and, later, the ability to carry out normal tasks. It is the most common cause of dementia in older adults. While dementia is more common as people grow older, it is not a normal part of aging. One of the first signs of Alzheimer's disease is memory loss. AD accounts for up to 80% of cases of dementia. The three stages of AD are mild, moderate and severe. In mild cognitive impairment (MCI), the loss of cognitive skills only slightly affects a person's daily life; the moderate stage is the middle stage of AD; in severe AD, a person is no longer able to function independently and becomes totally reliant on others for care. In this paper, a Support Vector Machine (SVM) is used for diagnosing Alzheimer's disease from brain MRI and for classifying it into specific stages. The algorithm was trained and tested using MRI data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). The data used include MRI scans of about 70 AD patients and 30 normal controls.
Unlike most computer vision approaches, which depend on hundreds or thousands of training images, humans can typically learn from a single visual example. Humans achieve this ability using background knowledge. Rule-based machine learning approaches such as Inductive Logic Programming (ILP) provide a framework for incorporating domain specific background knowledge. These approaches have the potential for human-like learning from small data or even one-shot learning, i.e. learning from a single positive example. By contrast, statistics based computer vision algorithms, including Deep Learning, have no general mechanisms for incorporating background knowledge. In this paper, we present an approach for one-shot rule learning called One-Shot Hypothesis Derivation (OSHD) which is based on using a logic program declarative bias. We apply this approach to the challenging task of Malayalam character recognition. This is a challenging task due to the spherical and complex structure of the Malayalam hand-written language. Unlike for other languages, there is currently no efficient algorithm for Malayalam hand-written recognition. We compare our results with a state-of-the-art Deep Learning approach, called Siamese Network, which has been developed for one-shot learning. The results suggest that our approach can generate human-understandable rules and also outperforms the deep learning approach with a significantly higher average predictive accuracy.
Unlike most computer vision approaches, which depend on hundreds or thousands of training images, humans can typically learn from a single visual example. Humans achieve this ability using background knowledge. Rule-based machine learning approaches such as Inductive Logic Programming (ILP) provide a framework for incorporating domain specific background knowledge. These approaches have the potential for human-like learning from small data or even one-shot learning, i.e. learning from a single positive example. By contrast, statistics based computer vision algorithms, including Deep Learning, have no general mechanisms for incorporating background knowledge. This paper presents an approach for one-shot rule learning called One-Shot Hypothesis Derivation (OSHD) based on using a logic program declarative bias. We apply this approach to two challenging human-like computer vision tasks: 1) Malayalam character recognition and 2) neurological diagnosis using retinal images. We compare our results with a state-of-the-art Deep Learning approach, called Siamese Network, developed for one-shot learning. The results suggest that our approach can generate human-understandable rules and outperforms the deep learning approach with a significantly higher average predictive accuracy.
Plant diseases are one of the main causes of crop loss in agriculture. Machine Learning, in particular statistical and neural net (NN) approaches, has been used to help farmers identify plant diseases. However, since new diseases continue to appear in agriculture due to climate change and other factors, we need more data-efficient approaches to identify and classify new diseases as early as possible. Even though statistical machine learning approaches and neural nets have demonstrated state-of-the-art results on many classification tasks, they usually require a large amount of training data. This may not be available for emergent plant diseases. So, data-efficient approaches are essential for an early and precise diagnosis of new plant diseases and necessary to prevent the disease's spread. This study explores a data-efficient Inductive Logic Programming (ILP) approach for plant disease classification. We compare some ILP algorithms (including our new implementation, PyGol) with several statistical and neural-net based machine learning algorithms on the task of tomato plant disease classification with varying sizes of training data set (6, 10, 50 and 100 training images per disease class). The results suggest that ILP outperforms other learning algorithms and this is more evident when fewer training data are available.
Abductive reasoning plays an essential part in day-to-day problem-solving. It has been considered a powerful mechanism for hypothetical reasoning in the presence of incomplete knowledge; a form of “common sense” reasoning. In machine learning, abduction is viewed as a conceptual method in which data and the bond that jointly brings the different types of inference. The traditional Mode-Directed Inverse Entailment (MDIE) based systems for abduction, such as Progol and Aleph, were not data-efficient since their execution time on large datasets was too long. We present a new abductive learning procedure using Meta Inverse Entailment (MIE). MIE is similar to Mode-Directed Inverse Entailment (MDIE) but does not require user-defined mode declarations. In this paper, we use an implementation of MIE in Python called PyGol. We evaluate this approach on revealing the microbial interactions in the ecosystem and compare it with state-of-the-art methods for abduction, such as Progol and Aleph. Our results show that PyGol has comparable predictive accuracies but is significantly faster than Progol and Aleph.
The functional diversity of microbial communities emerges from a combination of the great number of species and the many interaction types, such as competition, mutualism, predation or parasitism, in microbial ecological networks. Understanding the relationship between microbial networks and the functions delivered by the microbial communities is a key challenge for microbial ecology, particularly as so many of these interactions are difficult to observe and characterise. We believe that this 'Dark Web' of interactions could be unravelled using an explainable machine learning approach, called Abductive/Inductive Logic Programming (A/ILP) in the R package InfIntE, which uses mechanistic rules (interaction hypotheses) to infer directly the network structure and interaction types. Here we attempt to unravel the dark web of the plant microbiome in metabarcoding data sampled from the grapevine foliar microbiome. Using synthetic, simulated data, we first show that it is possible to satisfactorily reconstruct microbial networks using explainable machine learning. Then we confirm that the dark web of the grapevine microbiome is diverse, being composed of a range of interaction types consistent with the literature. This first attempt to use explainable machine learning to infer microbial interaction networks advances our understanding of the ecological processes that occur in microbial communities and allows us to hypothesise specific types of interaction within the grapevine microbiome. This work will have potentially valuable applications, such as the discovery of antagonistic interactions that might be used to identify potential biological control agents within the microbiome.
Traditional machine learning methods heavily rely on large amounts of labelled data for effective generalisation, posing a challenge in few-shot learning scenarios. In many real-world applications, acquiring large amounts of training data can be difficult or impossible. This paper presents an efficient and explainable method for few-shot learning from images using inductive logic programming (ILP). ILP utilises logical representations and reasoning to capture complex relationships and generalise from sparse data. We demonstrate the effectiveness of our proposed ILP-based approach through an experimental evaluation focused on detecting neurodegenerative diseases from fundus images. By extending our previous work on neurodegenerative disease detection, including Alzheimer's disease, Parkinson's disease, and vascular dementia disease, we achieve improved explainability in identifying these diseases using fundus images collected from the UK Biobank dataset. The logical representation and reasoning inherent in ILP enhances the interpretability of the detection process. The results highlight the efficacy of ILP in few-shot learning scenarios, showcasing its remarkable generalisation performance compared to a range of other machine learning algorithms. This research contributes to the field of few-shot learning using ILP and paves the way for addressing challenging real-world problems.
In this paper, we present NumLog, an Inductive Logic Programming (ILP) system designed for feature range discovery. NumLog generates quantitative rules with clear confidence bounds to discover feature-range values from examples. Our approach focuses on generating rules with minimal complexity from numerical values, ensuring the assessment of methods that could impact accuracy and comprehensibility. Traditional ILP systems, especially those intersecting with computer vision, struggle with numerical data. This convergence presents unique challenges, often hindering the generation of meaningful insights due to the limited capabilities of conventional ILP systems to handle numerical values. NumLog stands out by incorporating an advanced range discovery mechanism that generates low-complexity rules while maintaining high accuracy and comprehensibility. This enhancement significantly improves interpretability, promoting more effective human-machine learning collaboration. We compare NumLog with the state-of-the-art ILP systems such as NumSynth and Aleph and conduct comprehensive experiments on several datasets. We evaluated our approach by measuring accuracy, precision, F1 score, and rule complexity to demonstrate the effectiveness of the methodology.
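The idea of discovering a feature range from labelled examples can be illustrated with a small sketch. This is not NumLog's algorithm, whose details are not given in this abstract; it is a brute-force stand-in that picks the closed interval covering the most positive examples net of negatives. The function names `best_range` and `as_rule`, the feature name `f1`, and the Prolog-style rule layout are all assumptions made for illustration.

```python
def best_range(values, labels):
    """Return (low, high, score) for the closed interval [low, high]
    that maximises covered positives minus covered negatives.

    Interval endpoints are restricted to observed values. Ties keep the
    first interval found. Returns None for empty input.
    """
    candidates = sorted(set(values))
    best = None
    for i, low in enumerate(candidates):
        for high in candidates[i:]:
            covered = [y for v, y in zip(values, labels) if low <= v <= high]
            positives = sum(1 for y in covered if y)
            negatives = len(covered) - positives
            score = positives - negatives
            if best is None or score > best[2]:
                best = (low, high, score)
    return best


def as_rule(feature, low, high):
    """Render the interval as a human-readable, Prolog-style rule string."""
    return (
        f"in_range_{feature}(X) :- "
        f"{feature}(X, V), V >= {low}, V =< {high}."
    )


# Hypothetical toy data: one numeric feature and a boolean class label.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = [False, True, True, True, False, False]
low, high, score = best_range(values, labels)
rule = as_rule("f1", low, high)
```

The single interval keeps the rule short, which is the trade-off between complexity and accuracy that the abstract highlights.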
Large Language Models (LLMs) have achieved remarkable success in various natural language processing tasks. However, their ability to perform tasks requiring formal representation and reasoning remains limited. This paper explores the integration of Meta Inverse Entailment (MIE) with LLMs to enhance their reasoning capabilities. In our experiments, we examine a hybrid GPT-MIE model on a simplified natural language grammar. The results suggest that the accuracy of GPT is significantly improved when it incorporates the grammar learned using MIE. This hybrid approach demonstrates the potential of combining LLMs' linguistic proficiency with MIE's rigorous formalism, leading to better performance in tasks demanding logical representation and reasoning.
"One-shot learning" traditionally refers to classifying a single instance using a machine learning model pre-trained on extensive datasets. In contrast, Meta Inverse Entailment (MIE), a type of Inductive Logic Programming (ILP), can generate complex logic programs from just a single positive example and minimal background knowledge without prior extensive training. This approach offers a human-centred form of machine learning that is more controllable, reliable, and comprehensible due to its small training data size and the inherent interpretability of logic programs. We use PyGol, a Python-based implementation of Meta Inverse Entailment, and compare its performance with ExpGen-PPO, a leading deep reinforcement learning system. Our experiments focus on two domains: maze-solving and obstacle avoidance for mobile robotics. In both domains, we first train the systems in simplified environments without obstacles and then test their ability to generalise to more complex environments with obstacles. Our results show that PyGol effectively learns generalisable solutions from a single example in both domains, whereas ExpGen-PPO requires more training and significantly more exploration to achieve similar performance.
Explainable Medical Reasoning: From Data to Transparent, Trustworthy Clinical Insights
Learning from small datasets is crucial in biomedical research due to the limited availability of large, annotated data in many domains. Inductive Logic Programming (ILP) offers a robust framework for integrating symbolic reasoning with machine learning, enabling the generation of interpretable models. In this work, we explore the application of numerical symbolic learning approaches to biomedical data using ILP systems such as NumLog, PyGol, and NumSynth. These systems demonstrate superior efficiency in handling numerical features and extracting meaningful rules compared to traditional rule learning and machine learning methods. We evaluate these approaches on two datasets: a neurodegenerative dataset for Alzheimer's disease detection from fundus images and the benchmark Breast Cancer dataset. The results underscore the potential of ILP-based numerical-symbolic learning in identifying complex relationships within biomedical data, providing actionable insights for advancing precision medicine and disease diagnosis.
Autism Spectrum Disorder (ASD) diagnosis relies on integrating heterogeneous behavioral and cognitive indicators, demanding AI systems that are not only accurate but also interpretable and verifiable. In this study, we present an explainable ASD detection framework based on Inductive Logic Programming (ILP), using phenotypic data from the ABIDE dataset. Unlike black-box models, ILP produces symbolic rules in first-order logic, supporting clinical transparency and auditability. We evaluate ILP against standard machine learning models (e.g., Random Forest, SVM, Gradient Boosting) using 10-fold cross-validation and report competitive accuracy, with ILP demonstrating superior specificity and high precision, critical metrics in clinical screening. We further compare the interpretability of ILP explanations with state-of-the-art post-hoc methods, SHAP and LIME, using a held-out test instance. While all methods identify consistent predictive features, ILP offers globally consistent, human-readable rules that are more accessible to non-expert users. Our findings affirm ILP as a viable and trustworthy alternative for ASD classification, providing both predictive utility and symbolic transparency. Future work will extend this approach to incorporate fMRI-derived features, enabling richer multimodal reasoning in neurodevelopmental diagnostics.
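The two screening metrics emphasised in this abstract, specificity and precision, follow directly from confusion-matrix counts. The sketch below assumes binary labels encoded as 1 (positive) and 0 (negative); the function names are illustrative and the label vectors are made-up toy values, not results from the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels encoded as 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def specificity(y_true, y_pred):
    """True negative rate: TN / (TN + FP). NaN if there are no negatives."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp) if (tn + fp) else float("nan")


def precision(y_true, y_pred):
    """Positive predictive value: TP / (TP + FP). NaN if nothing is predicted positive."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else float("nan")


# Hypothetical toy labels for illustration only.
y_true = [1, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1]
```

High specificity limits false alarms among unaffected individuals, and high precision means a positive prediction is likely to be correct.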
This paper introduces a symbolic regression framework based on Inductive Logic Programming (ILP) to address the growing demand for interpretable machine learning models in sensitive and regulation-intensive domains. Unlike black-box regressors such as ensemble methods or neural networks, our approach learns human-readable rules that explain how input features relate to output predictions using logic-based representations. We leverage PyGol, a novel ILP system, to perform multi-class symbolic regression through a one-vs-rest strategy, where continuous targets are either preserved or discretised into symbolic labels. Each label is represented by a distinct set of logic rules defined over feature intervals, facilitating transparent and modular reasoning. A Bayesian-inspired scoring mechanism extends inference to noisy or partially matching instances, enhancing robustness. Through empirical evaluations on benchmark regression datasets, we demonstrate that PyGol achieves competitive predictive performance compared to state-of-the-art regressors while offering superior transparency and traceability. We further present sample learned rules and interpret their behaviour, highlighting the system's explanatory potential. This work affirms the value of ILP-based symbolic models as viable alternatives to black-box approaches, particularly where accountability and decision interpretability are paramount.
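The pipeline described in this abstract (discretising a continuous target into symbolic labels, representing each label by conditions over feature intervals, and scoring partially matching instances) can be sketched roughly as follows. This is not PyGol's API or the paper's scoring mechanism: the bin edges, the feature names `load` and `temp`, the hand-written rules, and the simple matched-fraction score are all assumptions chosen for illustration.

```python
def discretise(y, edges, names):
    """Map a continuous target to a symbolic label.

    `edges` are ascending upper bounds (exclusive); `names` has one more
    entry than `edges`, the last covering everything above the final edge.
    """
    for edge, name in zip(edges, names):
        if y < edge:
            return name
    return names[-1]


# Hypothetical rules: each label is a list of (feature, low, high)
# interval conditions, standing in for learned logic rules.
RULES = {
    "low": [("load", 0.0, 0.3), ("temp", 0.0, 15.0)],
    "high": [("load", 0.7, 1.0), ("temp", 25.0, 40.0)],
}


def match_fraction(instance, conditions):
    """Fraction of interval conditions the instance satisfies."""
    hits = sum(1 for f, lo, hi in conditions if lo <= instance[f] <= hi)
    return hits / len(conditions)


def predict(instance, rules):
    """One-vs-rest style prediction: score every label, return the best.

    The matched fraction is a simple stand-in for a scoring mechanism
    that tolerates partial matches; ties go to the first label.
    """
    scores = {
        label: match_fraction(instance, conditions)
        for label, conditions in rules.items()
    }
    return max(scores, key=scores.get), scores
```

An instance that satisfies only some conditions of a label still receives a non-zero score, so a prediction can be made and traced back to the specific intervals that matched.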
Maritime Autonomous Surface Ships (MASS) promise to reduce casualty rates and improve operational efficiency, yet two obstacles impede widespread adoption: the qualitative, often conflicting language of the COLREGs and the opacity of prevailing AI collision-avoidance algorithms. We present a socio-technical decision framework that formalises COLREG hierarchy, including the lex specialis ordering confirmed in Ever Smart v. Alexandra 1, as a tiered rule tree and encodes it in an explainable, auditable knowledge base. Using symbolic logic, the system resolves rule conflicts, logs its reasoning, and outputs a single safe manoeuvre aligned with good seamanship. Three representative scenarios (narrow-channel crossing, cascading multi-vessel conflict, and overtaking in a channel) demonstrate that the framework reproduces expert decisions while exposing a transparent proof trail. The result is a legally coherent foundation for logic-based machine learning using inductive logic programming (ILP) and future maritime autonomous systems trials, advancing the IMO goal of “at least equivalent” safety for unmanned vessels.
This paper presents preliminary work on integrating symbolic learning and reasoning into autonomous maritime systems using inductive logic programming (ILP). A key challenge in operationalising ILP is bridging the gap between continuous sensing and actuation data and discrete symbolic logic. We propose a framework that enables autonomous vessels to query maritime rules (COLREGs) and learn from human oversight. Using the ILP system PyGol, we demonstrate the learning of COLREG Rule 13 for overtaking situations from discretised bearing data, and further explore the learning of an exception to Rule 15 for crossing situations through examples inspired by case law. These results show the potential for interpretable, legally compliant decision-making and lay the groundwork for learning more complex rules in dynamic maritime environments.
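Bridging continuous bearing data and discrete symbols, as described above, might look like the following sketch. It uses the geometric condition in COLREG Rule 13, under which a vessel is deemed to be overtaking when it approaches from more than 22.5 degrees abaft the other vessel's beam, which corresponds to relative bearings strictly between 112.5 and 247.5 degrees. The sector names, function names and fact format are assumptions for illustration and are not the paper's encoding; other conditions of the rule are not modelled.

```python
def relative_bearing(own_heading_deg, bearing_to_other_deg):
    """Bearing of the other vessel relative to own bow, in [0, 360)."""
    return (bearing_to_other_deg - own_heading_deg) % 360.0


def bearing_sector(rel_bearing_deg):
    """Discretise a relative bearing into a symbolic sector.

    The stern sector is the arc more than 22.5 degrees abaft the beam on
    either side. Dead ahead (0 degrees) is assigned to the starboard
    sector by convention in this sketch.
    """
    rel = rel_bearing_deg % 360.0
    if 112.5 < rel < 247.5:
        return "stern_sector"
    if rel <= 112.5:
        return "starboard_sector"
    return "port_sector"


def sector_fact(observer, other, own_heading_deg, bearing_to_other_deg):
    """Render one observation as a Prolog-style fact string that an ILP
    system could take as background knowledge (the format is hypothetical)."""
    rel = relative_bearing(own_heading_deg, bearing_to_other_deg)
    return f"bearing_sector({observer}, {other}, {bearing_sector(rel)})."


# A vessel heading 350 degrees observes another vessel on true bearing
# 170 degrees, i.e. directly astern.
fact = sector_fact("v1", "v2", 350.0, 170.0)
```

Once bearings are mapped to a small set of symbols, examples of overtaking and non-overtaking encounters can be expressed as logical facts for rule learning.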