IMPORTANT CONCEPTS IN XAI
The ultimate goal of this blog post series is to provide a basic understanding of XAI and its methods. In our previous post, we explained what XAI is, why it is necessary, who its stakeholders are and how they are affected by it, and the current AI regulatory actions involving XAI. To continue with the technical details of XAI, a good starting point is to define the basic terminology. Once the reader understands these terms and approaches, it will be easier to follow the technical details behind the methods explained in the following sections of this blog post series.
Let us begin with the terms “explainability” and “interpretability”. Although there is confusion about the difference between these two terms in the literature, [1] draws a clear line between them. According to this study, the term “interpretability” refers to the extent to which a human can understand and make sense of the output or operation of a machine-learning model. It enables users to grasp the "gist" or essential aspects of the model's decision-making process, leveraging their background knowledge to make informed decisions. More clearly stated in [2], interpretability is the "level of understanding how the underlying (AI) technology works."
On the other hand, the term “explainability” refers to a more technical understanding of the mechanisms and processes that led to a specific outcome of a machine learning model. Often technical and causal, these explanations outline the rules or procedures the model applied to arrive at its conclusions, catering primarily to technical practitioners who use this in-depth understanding for tasks such as debugging. Again, as stated in [2], explainability is the "level of understanding how the AI-based system ... came up with a given result".
Intrinsic interpretability is a term used for models whose decision-making processes are transparent. Given an intrinsically interpretable model, generating the rationale behind a given decision is easy because the model is simple and interpretability lies within its structure. Intrinsically interpretable models are considered "white box" models in machine learning and artificial intelligence. This designation means that their internal workings are transparent, allowing users to see directly how inputs are transformed into outputs. Unlike "black box" models, where the decision-making process is opaque and difficult to scrutinize, white box models offer visibility into the logic and calculations driving their predictions or decisions, facilitating clearer understanding and greater trust among users.
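To make this concrete, the short sketch below (in Python with scikit-learn) fits a shallow decision tree, a typical white box model, and prints its learned rules; the dataset and tree depth are illustrative assumptions rather than part of any specific method.

```python
# Minimal sketch of an intrinsically interpretable ("white box") model:
# a shallow decision tree whose learned decision rules can be read directly.
# The Iris dataset and max_depth=2 are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

# The model's entire decision logic fits in a few human-readable if/else rules.
print(export_text(tree, feature_names=list(data.feature_names)))
```

Because the whole model is visible in these few rules, no separate explanation step is needed to understand why it made a particular prediction.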
When explaining machine learning models, the methods used can have different characteristics. For example, if a technique is post-hoc, it is applied after the model's training phase is complete and explains the model using its finalized parameters and behavior.
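A hedged sketch of what "post-hoc" can look like in practice: below, a random forest is trained first, and only afterwards is a simple surrogate tree fitted to its predictions to approximate the finished model's behavior; the particular models and dataset are illustrative assumptions.

```python
# Minimal sketch of a post-hoc explanation: the complex model is trained first,
# and only afterwards is a simple surrogate tree fitted to its predictions
# to approximate its finalized behavior. Models and data are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
black_box = RandomForestClassifier(random_state=0).fit(data.data, data.target)  # training ends here

# Post-hoc step: approximate the finished model with a small, readable tree.
surrogate = DecisionTreeClassifier(max_depth=2, random_state=0)
surrogate.fit(data.data, black_box.predict(data.data))
print(export_text(surrogate, feature_names=list(data.feature_names)))
```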
Moreover, the concepts of "model-agnostic" and "model-specific" hold significant importance. A model-agnostic method is an explainability method that can be applied to any model, regardless of the type of artificial intelligence model used or its internal structure. For example, the same interpretation technique could be applied to models ranging from decision trees to deep neural networks.
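One way to illustrate this, under illustrative model and dataset assumptions, is permutation importance: the identical explanation call runs unchanged on two very different models, as sketched below.

```python
# Minimal sketch of a model-agnostic method: permutation importance only needs
# a fitted model and data, so the identical explanation code runs on very
# different model types. Models and dataset are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

for model in (LogisticRegression(max_iter=1000), GradientBoostingClassifier(random_state=0)):
    model.fit(X, y)
    # The explanation step never looks inside the model; it only perturbs inputs.
    result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
    print(type(model).__name__, result.importances_mean.round(3))
```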
On the other hand, the term "Model specific" means that an explainability method is designed to be specific to certain types of models. In this case, a particular method of explainability may only be compatible with artificial intelligence models of a specific type. For example, an interpretation method might only be compatible with linear regression models or a particular type like decision trees. These concepts are essential in explainability research regarding dealing with various model types and focusing on specific requirements.
In XAI, "Global explanation" and "Local explanation" concepts indicate the scope of interpretation or explainability methods. A global method is an explanation technique that provides a general explanation encompassing all data points. It informs about the model's overall behavior or general trends. For example, an explainability method could address the entire dataset to understand the general characteristics of a classification model.
On the other hand, the term "Local" indicates that the explanation is limited to a specific point or a few points. In this case, the explanation method focuses on explaining the model's decisions at those particular data points, often identified as interesting or critical examples. For instance, a "Local" explainability method could be used to understand the decision of a classification model on a specific example. These concepts represent a critical distinction in explainability analysis between understanding the general behavior of the model and focusing on specific data examples.
In conclusion, this blog post has embarked on a foundational exploration of XAI, clarifying crucial terms underpinning this evolving field. From dissecting the nuanced differences between explainability and interpretability to demystifying the nature of intrinsically interpretable models, we have paved the way for a deeper understanding of how AI decisions can be made transparent and understandable. The distinction between model-agnostic and model-specific approaches, alongside the differentiation between global and local explanations, serves as a crucial framework for practitioners and stakeholders to navigate the complexities of XAI. As we move forward, the insights gained from these discussions are instrumental in fostering trust and reliability in AI systems and ensuring that these systems are developed and deployed in a manner that is ethical and understandable to all. As we delve into the technical specifics in the forthcoming sections of this series, the foundational knowledge provided here will be invaluable for a comprehensive grasp of the mechanisms that make AI systems transparent and accountable.
References
[1] Broniatowski, David A. "Psychological foundations of explainability and interpretability in artificial intelligence." NIST Technical Report, 2021.
[2] ISO/IEC TR 29119-11:2020, Software and systems engineering – Software testing – Part 11: Guidelines on the testing of AI-based systems. ISO, 2020. Retrieved 25 November 2023.