Causal Representation Learning

List of publications

See below for a conceptual summary of this line of work, along with the settings and key results of the papers listed above.

Project summary:

In causal representation learning (CRL), we consider a data-generating process in which the high-dimensional observations \(X\) are generated from low-dimensional, causally related latent variables \(Z\) through an unknown transformation \(g\) as \(X=g(Z)\). The causal relationships among the latent variables are captured by a directed acyclic graph (DAG) \(\mathcal{G}_Z\) over \(Z\).
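For concreteness, a minimal sketch of such a data-generating process is given below; the specific graph (\(Z_1 \to Z_2\)), dimensions, and linear mechanisms are illustrative choices rather than assumptions of the general framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent DAG over Z = (Z1, Z2): Z1 -> Z2 (illustrative choice).
n = 10_000
z1 = rng.normal(size=n)                     # exogenous cause
z2 = 0.8 * z1 + 0.5 * rng.normal(size=n)    # effect of Z1 plus noise
Z = np.stack([z1, z2], axis=1)              # latent causal variables, shape (n, 2)

# Unknown mixing g: here a random linear map into a higher-dimensional space.
d_obs = 5
G = rng.normal(size=(2, d_obs))
X = Z @ G                                   # observations X = g(Z), shape (n, 5)
```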

The central goal of CRL is to use the observed data \(X\) to recover \(Z\) and \(\mathcal{G}_Z\). This is typically done by learning an inverse of the data-generating process, i.e., an encoder that recovers \(g^{-1}\), the inverse of the unknown generative mapping \(g\). We focus on two central questions:

  1. Identifiability: Determining the necessary and sufficient conditions under which \(\mathcal{G}_Z\) and \(Z\) can be recovered. Recovery is known to be impossible without additional supervision or sufficient statistical diversity among the samples of the observed data \(X\) (a simple example: an isotropic Gaussian is rotation-invariant, so latents mixed by a linear map can only be recovered up to a rotation). Thus, the scope of identifiability (e.g., node-level, perfect, or partial) critically depends on the extent of information available about the data and the underlying data-generating process.
  2. Achievability: Designing tractable algorithms that recover \(Z\) and \(\mathcal{G}_Z\) while retaining the identifiability guarantees. Identifiability results can be non-constructive, i.e., they need not specify a feasible algorithm. We therefore distinguish the two notions and aim to design practical algorithms that realize constructive identifiability results.

To ensure identifiability together with tractable algorithms, we need a reasonable combination of (i) assumptions on the data-generating process (the model class we consider) and (ii) richer observations.

Interventions / Multi-environments as Diverse Data Sources

Consider observing data from multiple related environments. Typically, the differences between those environments can be explained by changes in a few key factors. In other words, taking the view that high-dimensional data lie on a low-dimensional manifold, sparse changes in the low-dimensional representation can explain the differences across environments. From the CRL viewpoint, these sparse changes can be formalized as interventions on the latent causal space, which provide observations rich enough to enable identifiability of the latent variables and the graph. Specifically, in addition to the observational environment, we consider a set of given interventional environments, in each of which a subset of nodes is intervened on. In this framework, we only use distribution-level information: the interventions serve as a weak form of supervision in that we only have access to the pair of distributions before and after an intervention (as opposed to requiring paired observed and intervened samples); a toy construction of such environments is sketched after the list below. This allows us to:

  • model distribution shifts via changes in causal mechanisms,
  • contrast interventional and observational distributions,
  • exploit sparse changes as a natural way to constrain the solution set.
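For concreteness, the sketch below constructs one observational and one interventional environment from the same latent SEM (graph, mechanisms, and mixing are again illustrative); only the unpaired sample sets from the two environments are used downstream.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

def sample_environment(intervene_z2=None):
    """Sample latents from the SEM Z1 -> Z2; optionally intervene on Z2."""
    z1 = rng.normal(size=n)
    if intervene_z2 == "hard":
        z2 = rng.normal(size=n)                    # do(Z2): edge Z1 -> Z2 is cut
    elif intervene_z2 == "soft":
        z2 = 0.3 * z1 + 1.5 * rng.normal(size=n)   # mechanism change, edge kept
    else:
        z2 = 0.8 * z1 + 0.5 * rng.normal(size=n)   # observational mechanism
    return np.stack([z1, z2], axis=1)

G = rng.normal(size=(2, 5))                        # unknown mixing (illustrative)
X_obs  = sample_environment() @ G                  # observational environment
X_hard = sample_environment("hard") @ G            # one interventional environment
# Only the two unpaired sample sets (X_obs, X_hard) are available, i.e.,
# distribution-level information rather than paired before/after samples.
```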

To harness the additional knowledge induced by the interventions, we developed score-based CRL. The key idea is that latent score functions (gradients of log-densities) encode intervention effects as sparse variations across environments. By linking score functions of observed data (which can be efficiently estimated even in high-dimensional settings) to latent scores, we obtain tractable learning objectives and algorithms. In short, we develop algorithms that search for representations whose inter-environment differences are maximally sparse, aligning them with the true causal mechanisms.

The theoretical foundations of CRL have two main axes: (i) model complexity, i.e., how expressive the true causal mechanisms and the mapping from latent to observed data are; and (ii) data richness, i.e., what diversity of environments and interventions is necessary and sufficient for identifiability. This perspective clarifies which structural assumptions and data resources are needed in different regimes.

Ideally, one would prefer no restrictions on intervention size, type, or prior knowledge about the interventions. In our work, we consider a spectrum of settings to lay out the identifiability landscape and the accompanying algorithms, e.g., linear vs. general transformations, single-node vs. multi-node interventions, and soft vs. hard interventions. Before going over the individual papers, we first summarize the score-based machinery that is common to them.

Score-based CRL

In our papers, we establish connections between score functions (gradients of the log-density) and interventional distributions. Specifically, we show that differences of score functions across pairs of environments contain all the information about the latent DAG. Leveraging this property, we show that the data-generating process can be inverted by finding the inverse transform that minimizes score differences in the latent space. Using this approach, we develop constructive identifiability proofs and algorithms in various settings. To give some insight, two key properties of score functions make this idea work. Denote the scores of the observational distributions \(p(z)\) and \(p_X(x)\) by \(s(z) = \nabla_z \log p(z)\) and \(s_X(x) = \nabla_x \log p_X(x)\), and use the superscript \(^m\) for the corresponding quantities in the interventional environment \(\mathcal{E}^m\).

Score differences are sparse: Consider a single-node intervention, e.g., node \(i\) is intervened on in \(\mathcal{E}^m\). Then the score difference function \(s(z) - s^m(z)\) is sparse, and the indices of its non-zero entries correspond exactly to the intervened node and its parents. This implies that, given access to all single-node interventions, the changes in the score functions exactly reveal the DAG structure!
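The sketch below illustrates this sparsity for a linear Gaussian SEM, where scores are available in closed form through the precision matrix; the three-node chain and its parameters are illustrative choices.

```python
import numpy as np

# Illustrative linear Gaussian SEM over three latents with DAG Z0 -> Z1 -> Z2.
A = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, 0.7, 0.0]])          # A[j, k] is the edge weight Z_k -> Z_j
noise_var = np.array([1.0, 0.5, 0.5])

def precision(A, noise_var):
    """Precision matrix of Z for the SEM Z = A Z + eps, eps ~ N(0, diag(noise_var))."""
    B = np.eye(len(noise_var)) - A
    return B.T @ np.diag(1.0 / noise_var) @ B

# Hard intervention on node i = 2: remove its incoming edges and reset its noise.
A_int = A.copy(); A_int[2, :] = 0.0
noise_var_int = noise_var.copy(); noise_var_int[2] = 1.0

Theta, Theta_int = precision(A, noise_var), precision(A_int, noise_var_int)

# For a zero-mean Gaussian, s(z) = -Theta z, so the score difference is
# s(z) - s^m(z) = -(Theta - Theta_int) z.
z = np.random.default_rng(0).normal(size=3)
print(np.round(-(Theta - Theta_int) @ z, 3))
# Only coordinates 1 and 2 (the intervened node and its parent) are non-zero.
```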

How can we use this property to guide the learning of an inverse transform? Consider a candidate encoder \(h\), and let \(\hat{Z} = h(X)\). Intuitively, we can use the sparsity of the true latent score differences to find the true encoder \(g^{-1}\): the score differences of the estimated latents cannot be sparser than the true latent score differences, so an encoder attaining that minimal sparsity aligns with the true one.

Latent score differences can be computed from observed score differences: The sparsity properties above concern the latent variables, since they stem from the causal relationships and the interventions on the latents. However, we only have access to the observed variables \(X\). For a pure identifiability argument, one could imagine computing the score functions of \(\hat{Z}\) for every possible encoder \(h\), but this is infeasible. Instead, we take a constructive approach and show that latent score differences can be computed from observed score differences using the Jacobian of \(h^{-1}\). Specifically, \(s_{\hat{Z}}(\hat{z}) - s_{\hat{Z}}^{m}(\hat{z}) = J_{h^{-1}}(\hat{z})^{\top} \big(s_{X}(x) - s_{X}^{m}(x)\big)\).
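This identity can be checked numerically; the sketch below does so for a square, invertible linear mixing with Gaussian latents (so that all scores are available in closed form), with illustrative matrices and names.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 3

def random_precision():
    """A random positive-definite precision matrix (illustrative latent distribution)."""
    M = rng.normal(size=(d, d))
    return M @ M.T + d * np.eye(d)

Theta_Z, Theta_Z_m = random_precision(), random_precision()   # two environments

G = rng.normal(size=(d, d))      # true (square, invertible) mixing: x = G z
H = rng.normal(size=(d, d))      # candidate linear encoder: z_hat = H x

z = rng.normal(size=d)
x = G @ z
z_hat = H @ x

# Gaussian scores: s(v) = -Theta v, where a linear map v -> M v transforms the
# precision as Theta -> M^{-T} Theta M^{-1}.
def score(Theta, M, v):
    Minv = np.linalg.inv(M)
    return -(Minv.T @ Theta @ Minv) @ v

delta_s_X    = score(Theta_Z, G, x)         - score(Theta_Z_m, G, x)
delta_s_Zhat = score(Theta_Z, H @ G, z_hat) - score(Theta_Z_m, H @ G, z_hat)

# Claimed identity: latent score difference = J_{h^{-1}}^T (observed score difference),
# where h^{-1}(z_hat) = H^{-1} z_hat, so J_{h^{-1}} = H^{-1}.
assert np.allclose(delta_s_Zhat, np.linalg.inv(H).T @ delta_s_X)
```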

How to form an objective function?: Different settings (i.e., pairings of a data-generation model and the available observations) require different levels of algorithm design (e.g., dealing with multi-node interventions, parametric vs. nonparametric constructions, etc.). At their core, though, these methods rely on the same principle of algorithmically enforcing sparse changes on the latent mechanisms. Using the link between the score functions of the observed and latent variables, this principle can be summarized as learning an (encoder, decoder) pair \((h,f)\) that (i) minimizes score variations in the latents across environments, and (ii) ensures perfect reconstruction:

\[\mathcal{L}(h, f) = \underbrace{\mathbb{E}\!\left[\| f \circ h(x) - x \|^2\right]}_{\text{Reconstruction Loss}} + \lambda \underbrace{\left\| \mathbb{E}\!\left[ J_{f}(\hat{z})^{\top} \cdot \left(\Delta s_X(x)\right) \right] - e_i \right\|^2}_{\text{Sparsity Loss}}.\]
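As a rough illustration (not the exact training procedure of the papers), this objective can be set up with an autoencoder as sketched below; the architecture and all names are our own illustrative choices, the decoder-Jacobian term \(J_{f}(\hat{z})^{\top}\Delta s_X(x)\) is computed with a vector-Jacobian product, and the observed score differences are assumed to be supplied by a separate score estimator.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Illustrative encoder/decoder pair (h, f)."""
    def __init__(self, d_obs, d_latent):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_obs, 64), nn.ReLU(), nn.Linear(64, d_latent))
        self.decoder = nn.Sequential(nn.Linear(d_latent, 64), nn.ReLU(), nn.Linear(64, d_obs))

    def forward(self, x):
        z_hat = self.encoder(x)
        return z_hat, self.decoder(z_hat)

def crl_loss(model, x, delta_s_x, e_i, lam=1.0):
    """Reconstruction loss + sparsity-promoting score regularizer for one environment pair.

    delta_s_x: pre-computed observed score differences, shape (batch, d_obs).
    e_i: target sparsity pattern (standard basis vector), shape (d_latent,).
    """
    z_hat, x_rec = model(x)
    recon = ((x_rec - x) ** 2).sum(dim=1).mean()

    # Latent score differences via the decoder Jacobian: J_f(z_hat)^T delta_s_X(x),
    # evaluated per sample with a vector-Jacobian product.
    _, vjp = torch.autograd.functional.vjp(
        lambda z: model.decoder(z), z_hat, delta_s_x, create_graph=True
    )
    sparsity = ((vjp.mean(dim=0) - e_i) ** 2).sum()
    return recon + lam * sparsity
```

In practice, one such sparsity term would typically be added for each (observational, interventional) environment pair, with \(e_i\) selecting the coordinate hypothesized to be intervened on.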

Score-based Causal Representation Learning: Linear and General Transformations (JMLR’25)

TLDR:

  • Linear transform, general causal models, one single-node intervention per node.
  • Hard interventions: element-wise identifiability up to scaling and perfect recovery of DAG.
  • Soft interventions: identifiability up to parents. If the causal model is sufficiently nonlinear, then the latent DAG is fully identified and the latent variables are identified up to their surrounding variables (shown to be a tight result).
  • General transform: reorganization of the AISTATS paper, with additional experiments.

ROPES: Robotic Pose Estimation via Score-based Causal Representation Learning (NeurIPS’25 EMW Workshop)

TLDR: Taking a step towards bridging the theory-practice gap in interventional CRL by applying it to the robot pose estimation problem.

  • Formalization: Pose estimation as a CRL problem in which robot joint angles are treated as controllable latent causal variables embedded in a larger generative mapping
  • Methodology: ROPES, an autoencoder-based architecture augmented with interventional regularizers that rely on score variations upon interventions. This relies on score-based CRL algorithms.
  • Empirical validation: We use common simulators with a multi-joint robot to collect visual data, and show a strong correlation between the angles recovered by ROPES and the ground-truth values, verifying successful disentanglement.
  • No reliance on pose labels: Disentanglement is achieved by exploiting distributional changes, and therefore requires no conventional supervision from pose labels.
  • Comparison with state-of-the-art: ROPES, without using labels except for a final calibration step, achieves performance comparable to the state-of-the-art RoboPEPP, which uses a JEPA-based self-supervised backbone followed by supervised training to predict joint angles. Specifically, our ablation study shows that RoboPEPP requires a substantial amount of labeled data to outperform our pose-label-free, CRL-based method.

Linear Causal Representation Learning from Unknown Multi-node Interventions (NeurIPS’24)

TLDR:

  • Linear transform, general causal models, unknown multi-node interventions. Same guarantees as single-node interventions!
  • Hard interventions: element-wise identifiability up to scaling and perfect recovery of DAG.
  • Soft interventions: identifiability up to ancestors
  • Requirement: multi-node interventions are diverse enough, specified as having a full-rank intervention signature matrix.

Sample Complexity of Interventional Causal Representation Learning (NeurIPS’24)

TLDR:

  • Linear transform, general causal models, one single-node soft intervention per node.
  • First sample complexity results for interventional CRL!
  • Probably approximately correct (PAC)-identifiability via generic score estimators.
  • Specific sample complexity results for an RKHS-based score estimator.

General Identifiability and Achievability for Causal Representation Learning (AISTATS’24 oral)

TLDR:

  • General transform, general causal models, two single-node hard interventions per node suffice for element-wise identifiability (up to an invertible transform)!
  • First provably correct algorithm for general transforms! Experiments with images confirm the scalability.
  • Benefits: does not require the faithfulness assumption common in causal discovery, and does not require knowing which pairs of environments intervene on the same node.

References

2025

  1. JMLR
    Score-based causal representation learning: Linear and General Transformations
    Burak Varıcı*, Emre Acartürk*, Karthikeyan Shanmugam, Abhishek Kumar, and Ali Tajer
    Journal of Machine Learning Research, 2025
  2. NeurIPS Workshop
    ROPES: Robotic Pose Estimation via Score-based Causal Representation Learning
    Pranamya Prashant Kulkarni, Puranjay Datta, Burak Varıcı, Emre Acartürk, Karthikeyan Shanmugam, and Ali Tajer
    arXiv:2510.20884 (NeurIPS 2025 Workshop on Embodied World Models for Decision Making), 2025

2024

  1. NeurIPS
    Linear Causal Representation Learning from Unknown Multi-node Interventions
    Burak Varıcı, Emre Acartürk, Karthikeyan Shanmugam, and Ali Tajer
    In Proc. Advances in Neural Information Processing Systems, 2024
  2. NeurIPS
    Sample Complexity of Interventional Causal Representation Learning
    Emre Acartürk, Burak Varıcı, Karthikeyan Shanmugam, and Ali Tajer
    In Proc. Advances in Neural Information Processing Systems, 2024
  3. AISTATS (oral)
    General Identifiability and Achievability for Causal Representation Learning
    Burak Varıcı, Emre Acartürk, Karthikeyan Shanmugam, and Ali Tajer
    In Proc. International Conference on Artificial Intelligence and Statistics, 2024

2023

  1. arXiv
    Score-based causal representation learning with interventions
    Burak Varıcı, Emre Acartürk, Karthikeyan Shanmugam, Abhishek Kumar, and Ali Tajer
    arXiv:2301.08230, 2023