Developing inherently interpretable models for prediction has gained prominence in recent years. A subclass of these models, in which the interpretable network relies on learning high-level concepts, is valued for the closeness of concept representations to human communication. However, visualizing and understanding the learnt unsupervised dictionary of concepts faces major limitations, especially for large-scale images. We propose a novel method that maps the concept features to the latent space of a pretrained generative model. The generative model enables high-quality visualization and naturally lays out an intuitive, interactive procedure for better interpretation of the learnt concepts. Furthermore, leveraging a pretrained generative model makes training the system more efficient. We quantitatively assess the efficacy of our method in terms of accuracy of the interpretable prediction network, fidelity of reconstruction, and faithfulness and consistency of the learnt concepts. Experiments are conducted on multiple image recognition benchmarks with large-scale images.
Our system (named VisCoIN) belongs to a broader family of by-design interpretable models, termed Concept-based Interpretable Networks (CoINs), shown on the left below. CoINs predict a concept representation \(\Phi(x)\) (a dictionary of \(K\) concept functions) and make the final classification decision with a simple function \(\Theta\) that operates on \(\Phi(x)\). VisCoIN is an unsupervised CoIN: it learns \(\Phi(x)\) without any concept annotations by imposing the desired properties as loss functions.
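To make the structure concrete, here is a minimal sketch of a CoIN forward pass in PyTorch. It assumes a linear \(\Theta\); the module names are illustrative, not the paper's implementation:

```python
import torch
import torch.nn as nn

class CoIN(nn.Module):
    """Minimal CoIN sketch: predict concepts Phi(x), then classify with a
    simple readout Theta. `concept_extractor` is a hypothetical module."""

    def __init__(self, concept_extractor: nn.Module, num_concepts: int, num_classes: int):
        super().__init__()
        self.phi = concept_extractor                        # x -> Phi(x), shape (B, K)
        self.theta = nn.Linear(num_concepts, num_classes)   # simple, interpretable readout

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        concepts = self.phi(x)         # interpretable concept bottleneck Phi(x)
        return self.theta(concepts)    # final decision g(x) = Theta(Phi(x))
```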
The system design of VisCoIN is shown on the right below. It leverages a pretrained generative model \(G\) for visualization and a pretrained classifier \(f\). The hidden layers of the pretrained classifier provide a rich source for learning concept representations that are also useful for prediction. The generator is used to obtain a high-quality reconstruction \(\tilde{x}\) of the input \(x\) through \(\Phi(x)\), which is essential for visualizing individual concept functions. We keep the pretrained generator fixed, which makes the design flexible and modular and keeps training costs low. Specifically, VisCoIN optimizes a training loss consisting of three broad terms: an output fidelity loss \(\mathcal{L}_{of}\), a modified reconstruction loss \(\mathcal{L}_{rec}^G\), and all regularization terms combined under \(\mathcal{L}_{reg}\). The training loss is optimized w.r.t. the trainable subnetworks \(\Psi, \Theta, \Omega\), where \(g\) denotes the complete interpretable network (\(\Theta\) applied to \(\Phi(x)\)), \(\tilde{x} = G(\Omega(\Phi(x), \Phi^{\prime}(x)))\) is the reconstruction, and \(w_x^+\) the latent code produced by \(\Omega\): \[ \mathcal{L}_{of}(x ; \Psi, \Theta) = \alpha CE(g(x), f(x)). \] \[ \mathcal{L}_{rec}^G(x ; \Psi, \Omega) = ||\tilde{x} - x||_2^2 + ||\tilde{x} - x||_1 + \beta \textrm{LPIPS}(\tilde{x}, x) + \gamma CE(f(\tilde{x}), f(x)). \] \[ \begin{aligned} & \mathcal{L}_{reg}(x ; \Psi, \Omega) = \mathcal{L}_{reg-\Psi}(x ; \Psi) + \mathcal{L}_{reg-\Omega}(x ; \Omega), \\ & \mathcal{L}_{reg-\Omega}(x ; \Omega) = ||w_x^+ - \bar{w}||_2^2, \quad \mathcal{L}_{reg-\Psi}(x ; \Psi) = \delta ||\Phi(x)||_1 + \mathcal{L}_{orth}(\Psi). \end{aligned} \] \[ \mathcal{L}_{train}(x ; \Psi, \Theta, \Omega) = \mathcal{L}_{of}(x;\Psi, \Theta) + \mathcal{L}^G_{rec}(x;\Psi, \Omega) + \mathcal{L}_{reg}(x;\Psi, \Omega) \]
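Below is a hedged sketch of one training step under the loss above. The names (`psi` producing \(\Phi(x)\) and support features \(\Phi^{\prime}(x)\), concept translator `omega`, frozen `f` and `G`, average latent `w_bar` as \(\bar{w}\)) are illustrative assumptions; \(\mathcal{L}_{orth}\) is omitted for brevity, and \(CE(\cdot, f(x))\) is approximated as cross-entropy against \(f\)'s predicted labels:

```python
import torch
import torch.nn.functional as F
import lpips  # perceptual similarity (pip install lpips)

lpips_fn = lpips.LPIPS(net='vgg')

def train_step(x, f, G, psi, theta, omega, w_bar, alpha, beta, gamma, delta):
    """One VisCoIN-style training step (sketch). f and G stay frozen;
    psi, theta, omega are the trainable subnetworks."""
    phi, phi_support = psi(x)                        # Phi(x) and Phi'(x)
    with torch.no_grad():
        y_f = f(x).argmax(dim=1)                     # pretrained classifier's prediction

    # Output fidelity: align g(x) = Theta(Phi(x)) with f's prediction.
    l_of = alpha * F.cross_entropy(theta(phi), y_f)

    # Reconstruction through the frozen generator.
    w_plus = omega(phi, phi_support)                 # latent code w_x^+
    x_rec = G(w_plus)                                # reconstruction \tilde{x}
    l_rec = (F.mse_loss(x_rec, x)
             + F.l1_loss(x_rec, x)
             + beta * lpips_fn(x_rec, x).mean()
             + gamma * F.cross_entropy(f(x_rec), y_f))

    # Regularization: sparse concept activations, latents near the average w.
    # (The orthogonality term L_orth on Psi is omitted in this sketch.)
    l_reg = delta * phi.abs().mean() + ((w_plus - w_bar) ** 2).mean()
    return l_of + l_rec + l_reg
```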
The interpretation phase is divided into two parts: (1) concept relevance estimation, and (2) concept visualization.
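The relevance estimator is not detailed here; a common choice in unsupervised CoINs with a simple \(\Theta\) (e.g. FLINT) scores concept \(k\) for class \(c\) by its normalized contribution to the class logit. The sketch below assumes that choice and a linear \(\Theta\):

```python
import torch

def local_relevance(phi_x: torch.Tensor, theta_weight: torch.Tensor, c: int) -> torch.Tensor:
    """Relevance of each concept for class c on one sample: the contribution
    phi_k(x) * w_{k,c} to the class logit, normalized by its maximum magnitude."""
    contrib = phi_x * theta_weight[c]                 # (K,) contributions to logit c
    return contrib / contrib.abs().max().clamp_min(1e-8)

# Global (class-level) relevance can then be estimated by averaging local
# relevances over test samples predicted as class c.
```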
The previous step identifies which concepts are important for a given sample, or globally for a class. Next, we visualize the information encoded by the important concepts. The visualization pipeline is based on the idea of latent traversals in generative models: by imputing a higher activation for \(\phi_k(x)\) in \(\Phi(x)\) and comparing the resulting visualization to the original reconstruction \(\tilde{x}\) (obtained with the untouched \(\Phi(x)\)), we interpret the information encoded by \(\phi_k\) about image \(x\).
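A sketch of this traversal, reusing the hypothetical `psi`/`omega`/`G` names from above; the imputed activation value `lam` is an illustrative choice:

```python
import torch

@torch.no_grad()
def visualize_concept(x, k, psi, omega, G, lam=3.0):
    """Impute a higher activation for concept k and decode through G."""
    phi, phi_support = psi(x)
    x_rec = G(omega(phi, phi_support))      # original reconstruction \tilde{x}

    phi_up = phi.clone()
    phi_up[:, k] = lam                      # imputed high activation (assumed value)
    x_up = G(omega(phi_up, phi_support))    # visualization for concept phi_k
    return x_rec, x_up                      # compare the two to interpret phi_k
```

Sweeping `lam` over a range of values yields the granular, interactive traversal described below.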
Illustration: Visualization of the same learnt concept (``Yellow-colored head'') using activation maximization (second column), as in FLINT, and our proposed VisCoIN visualization. Using our concept translator, which maps the concept representation space to the latent space of a generative model, we can visualize each concept at different activation values, allowing for more granular and interactive interpretation.
Classification accuracy of VisCoIN is competitive with or better than other unsupervised CoINs. It is also comparable to the performance of the original base classifier.
Reconstruction quality in VisCoIN is significantly better than in other unsupervised CoINs, both in terms of perceptual similarity (LPIPS) and approximation of the original image distribution (FID).
For a given sample \(x\) with activation \(\Phi(x)\), predicted class \(\hat{c}\) and a threshold \(\tau\), we first ``remove'' all concepts with relevance greater than \(\tau\) by setting their activation to 0. This modified version of \(\Phi(x)\) is referred to as \(\Phi_{rem}(x)\). To compute faithfulness for a given \(x\), denoted \(\text{FF}_x\), we compute the change in probability of the predicted class from the original reconstructed sample \(\tilde{x} = G(\Omega(\Phi(x), \Phi^{\prime}(x)))\) to the new sample \(x_{rem} = G(\Omega(\Phi_{rem}(x), \Phi^{\prime}(x)))\). Faithfulness of the concept dictionary (median over 1000 random test images) is similar across all three unsupervised CoINs on CelebA-HQ, but significantly better in VisCoIN on classification tasks with a large number of classes.
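A sketch of the per-sample computation, reusing `local_relevance` from the relevance sketch above; using \(f\) to produce the class probabilities is an assumption of this sketch:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def faithfulness(x, f, psi, omega, G, theta_weight, tau):
    """Per-sample faithfulness FF_x: probability drop for the predicted class
    after removing (zeroing) concepts with relevance above tau."""
    phi, phi_support = psi(x)
    x_rec = G(omega(phi, phi_support))               # \tilde{x}
    logits = f(x_rec)
    c = int(logits.argmax(dim=1))                    # predicted class \hat{c}

    rel = local_relevance(phi[0], theta_weight, c)   # from the relevance sketch
    phi_rem = phi.clone()
    phi_rem[:, rel > tau] = 0.0                      # "remove" the relevant concepts
    x_rem = G(omega(phi_rem, phi_support))           # reconstruction without them

    p_rec = F.softmax(logits, dim=1)[0, c]
    p_rem = F.softmax(f(x_rem), dim=1)[0, c]
    return (p_rec - p_rem).item()                    # FF_x
```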
Our concept consistency metric for a given concept is the accuracy of a binary classifier trained to separate two sets of samples: those generated with a very high activation of the given concept, and those generated with zero activation of that concept. Consistency of the concept dictionary in VisCoIN is noticeably better than in other unsupervised CoINs.
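A sketch of how the binary dataset for this metric can be built, again with the hypothetical `psi`/`omega`/`G` names; the `high` activation value is an illustrative assumption:

```python
import torch

@torch.no_grad()
def make_consistency_dataset(images, k, psi, omega, G, high=3.0):
    """Binary dataset for concept k: samples generated with a very high
    activation of phi_k (label 1) vs. zero activation (label 0)."""
    pos, neg = [], []
    for x in images:
        phi, phi_support = psi(x)
        phi_hi = phi.clone()
        phi_hi[:, k] = high                   # very high activation (assumed value)
        phi_lo = phi.clone()
        phi_lo[:, k] = 0.0                    # concept switched off
        pos.append(G(omega(phi_hi, phi_support)))
        neg.append(G(omega(phi_lo, phi_support)))
    return torch.cat(pos), torch.cat(neg)
```

A binary probe (e.g. a small classifier on these generated images) is then trained on a split of this dataset, and its held-out accuracy is reported as the consistency of concept \(k\).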