Causality For Data Scientists
This will be a thread highlighting when and how Data Scientists can effectively leverage #causality.
Inspired by and directly referencing @akelleh's excellent content on Causal #DataScience.
https://medium.com/causal-data-science/causal-data-science-721ed63a4027
/1
Before I jump in, just going to add a bit of credibility by introducing @akelleh :)
(who all of this content is stolen from)
- Chief Data Scientist for Research at @BarclaysIB
- Teaches Causal Inference for Data Science at @DSI_Columbia
/2
Confounding is something that is well understood by experienced Data Scientists.
So are the dangers of selection bias.
- If you're conditioning on something when training, then not conditioning on that in prod, bad things can happen
And they know correlation !=> causation
/3
But if you asked them to understand how X affects Z using only observational data, they may collect all possible confounding variables (Y) and put them into a linear model to explain how those are affecting Z.
Sound reasonable?
What if X -> Y -> Z?
=> with Y in the model, the coefficient on X -> 0, even though X really does affect Z (through Y).
/4
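To make the X -> Y -> Z trap concrete, here is a minimal simulation sketch. The effect sizes (2.0 and 3.0), the noise model, and the `ols` helper are all made up for illustration; the point is only to compare the coefficient on X with and without Y in the linear model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical chain X -> Y -> Z: X affects Z only through Y.
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
z = 3.0 * y + rng.normal(size=n)


def ols(features, target):
    """Return least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(target)), *features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


coef_total = ols([x], z)      # Z ~ X: picks up the total effect, about 2 * 3 = 6
coef_with_y = ols([x, y], z)  # Z ~ X + Y: Y absorbs the effect of X

print("Z ~ X     : coefficient on X =", round(coef_total[1], 2))
print("Z ~ X + Y : coefficient on X =", round(coef_with_y[1], 2))
```

In this setup the first fit should land near 6 and the second near 0, so "throw everything into the regression" would wrongly suggest X does nothing.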
So there's obviously a bit more to understanding how X affects Z.
The rest of this thread will focus on how exactly to do that robustly (or as robustly as possible with the observational data that you can get your hands on :) ).
/5
1. Understand Causality:
- You need to understand how the phenomenon above can be corrected for using the "Back-Door Adjustment"
- Here is a thread to help: https://twitter.com/parker_brydon/status/1209114528845316097 covering this great post “A Technical Primer On Causality” by @akelleh https://link.medium.com/8TvoYDLEE2
/6
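For reference, the standard form of the back-door adjustment, written with this thread's symbols (X the cause, Z the outcome). Here Y is assumed to be a set of variables satisfying the back-door criterion, which rules out descendants of X such as the mediator from /4:

```latex
P(Z \mid \mathrm{do}(X = x)) = \sum_{y} P(Z \mid X = x,\, Y = y)\, P(Y = y)
```

In words: estimate the X-Z relationship within each level of Y, then average those estimates using the overall distribution of Y rather than the distribution of Y among units that happened to receive X = x.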
2. Understand How To Identify Causal Structure in Your Data:
Now you know how to correct for the variables that satisfy the "back-door criterion"
But you still need to find those variables
Thread: https://twitter.com/parker_brydon/status/1312505624983343104 (also covering a great post from @akelleh :) )
/7
3. Understand How To Implement:
Now you just need to know how to leverage the "Back-Door Adjustment" in practice with those variables
Thread: https://twitter.com/parker_brydon/status/1209136670706143232 covering this great post "Causal Inference With pandas.DataFrames" by @akelleh
https://link.medium.com/64UeJw7KE2
/8
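As a rough idea of what the adjustment looks like on a DataFrame, here is a hand-rolled sketch using plain pandas groupby. This is not the API from the linked post; the columns `c`, `x`, `z`, the simulated data, and the true effect of 1.0 are all assumptions made up for illustration, and it only handles a single discrete confounder.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical data: binary confounder c drives both treatment x and outcome z.
c = rng.binomial(1, 0.5, size=n)
x = rng.binomial(1, 0.2 + 0.6 * c)          # treated more often when c == 1
z = 1.0 * x + 2.0 * c + rng.normal(size=n)  # true effect of x on z is 1.0
df = pd.DataFrame({"c": c, "x": x, "z": z})

# Naive contrast: mixes the effect of x with the effect of c.
naive = df.loc[df["x"] == 1, "z"].mean() - df.loc[df["x"] == 0, "z"].mean()

# Back-door adjustment: take E[z | x, c] within each stratum of c,
# then average the treated-vs-untreated gap over the marginal P(c).
cond_means = df.groupby(["c", "x"])["z"].mean().unstack("x")
p_c = df["c"].value_counts(normalize=True).sort_index()
adjusted = ((cond_means[1] - cond_means[0]) * p_c).sum()

print("naive difference   :", round(naive, 2))
print("back-door adjusted :", round(adjusted, 2))
```

The naive difference should come out well above the true effect because treated rows are mostly `c == 1`, while the adjusted estimate should sit close to 1.0. With continuous or high-dimensional adjustment sets you would need a model for E[z | x, c] instead of exact stratification.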