
Machine learning-based causal inference/uplift in Python
Installation
Install from PyPI:
pip install causeinfer
Or install from source:
git clone https://github.com/andrewtavis/causeinfer.git
cd causeinfer
python setup.py install
Then import the package:
import causeinfer
standard_algorithms
The standard_algorithms module compiles causal inference modeling techniques for quick application.
Base Models
Base models for the following algorithms:
The Two Model Approach
The Interaction Term Approach
The Binary Class Transformation (BCT) Approach
The Quaternary Class Transformation (QCT) Approach
The Reflective Uplift Approach
The Pessimistic Uplift Approach
Note: these classes should not be used directly. Please use derived classes instead.
- Based on
Kuchumov, A. pyuplift: Lightweight uplift modeling framework for Python. (2019). URL: https://github.com/duketemon/pyuplift. License: https://github.com/duketemon/pyuplift/blob/master/LICENSE.
- Contents
- BaseModel Class
fit, predict
- TransformationModel Class (see notation/methodology explanation)
is_treatment_positive, is_control_positive, is_control_negative, is_treatment_negative
- class causeinfer.standard_algorithms.base_models.BaseModel
Base class for the Two Model and Interaction Term Approaches.
- fit(X, y, w)
- Parameters
- X : numpy.ndarray (num_units, num_features) of int, float
Matrix of covariates.
- y : numpy.ndarray (num_units,) of int, float
Vector of unit responses.
- w : numpy.ndarray (num_units,) of int, float
Designates the original treatment allocation across units.
- Returns
- self : object
- predict(X, w)
- Parameters
- X : numpy.ndarray (num_pred_units, num_pred_features) of int, float
New data on which to make a prediction.
- w : numpy.ndarray (num_pred_units,) of int, float
Treatment allocation for predicted units.
- Returns
- y_pred : numpy.ndarray (num_pred_units,) of int, float
Vector of predicted unit responses.
- class causeinfer.standard_algorithms.base_models.TransformationModel
Base class for the Response Transformation Approaches.
Notes
The following is a non-standard notation that combines marketing and other methodologies.
Traditional marketing terminology is given in parentheses.
The response transformation approach splits the units based on response and treatment:
TP : Treatment Positives (Treatment Responders).
CP : Control Positives (Control Responders).
CN : Control Negatives (Control Nonresponders).
TN : Treatment Negatives (Treatment Nonresponders).
From these four known classes we want to derive the characteristic responses of four unknown classes:
AP : Affected Positives (Persuadables) : within TPs and CNs.
UP : Unaffected Positives (Sure Things) : within TPs and CPs.
UN : Unaffected Negatives (Lost Causes) : within CNs and TNs.
AN : Affected Negatives (Do Not Disturbs) : within CPs and TNs.
The focus then falls onto predicting APs and ANs via their known classes.
- is_treatment_positive(y, w)
Checks whether a subject responded when treated.
- Parameters
- y : int, float
The target response.
- w : int, float
The treatment value.
- Returns
- is_treatment_positive : bool
- is_control_positive(y, w)
Checks whether a subject responded when not treated.
- Parameters
- y : int, float
The target response.
- w : int, float
The treatment value.
- Returns
- is_control_positive : bool
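The four known classes follow directly from the response y and the treatment flag w. The sketch below mirrors is_treatment_positive and its siblings as standalone functions; it assumes y and w are coded as 0/1 and is an illustration, not the library's source.

```python
def is_treatment_positive(y, w):
    """TP: a treated unit that responded."""
    return w == 1 and y == 1


def is_control_positive(y, w):
    """CP: a control unit that responded."""
    return w == 0 and y == 1


def is_control_negative(y, w):
    """CN: a control unit that did not respond."""
    return w == 0 and y == 0


def is_treatment_negative(y, w):
    """TN: a treated unit that did not respond."""
    return w == 1 and y == 0


# Each (y, w) pair falls into exactly one of the four known classes.
units = [(1, 1), (1, 0), (0, 0), (0, 1)]
labels = []
for y, w in units:
    if is_treatment_positive(y, w):
        labels.append("TP")
    elif is_control_positive(y, w):
        labels.append("CP")
    elif is_control_negative(y, w):
        labels.append("CN")
    else:
        labels.append("TN")
print(labels)  # ['TP', 'CP', 'CN', 'TN']
```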
Two Model
The Two Model Approach (Double Model, Separate Model).
- Based on
Kuchumov, A. pyuplift: Lightweight uplift modeling framework for Python. (2019). URL: https://github.com/duketemon/pyuplift. License: https://github.com/duketemon/pyuplift/blob/master/LICENSE.
Hansotia, B. and B. Rukstales (2002). “Incremental value modeling”. In: Journal of Interactive Marketing 16(3), pp. 35–46. URL: https://search.proquest.com/openview/1f86b52432f7d80e46101b2b4b7629c0/1?cbl=32002&pq-origsite=gscholar
Devriendt, F. et al. (2018). A Literature Survey and Experimental Evaluation of the State-of-the-Art in Uplift Modeling: A Stepping Stone Toward the Development of Prescriptive Analytics. Big Data, Vol. 6, No. 1, March 1, 2018, pp. 1-29. Codes found at: data-lab.be/downloads.php.
- Contents
- TwoModel Class
fit, predict, predict_proba
- class causeinfer.standard_algorithms.two_model.TwoModel(control_model=None, treatment_model=None)
- fit(X, y, w)
Trains a model given covariates, responses and assignments.
- Parameters
- X : numpy.ndarray (num_units, num_features) of int, float
Matrix of covariates.
- y : numpy.ndarray (num_units,) of int, float
Vector of unit responses.
- w : numpy.ndarray (num_units,) of int, float
Vector of original treatment allocations across units.
- Returns
- treatment_model, control_model : causeinfer.standard_algorithms.TwoModel
Two trained models (one for the treatment group, one for the control group).
- predict(X)
Predicts a causal effect given covariates.
- Parameters
- X : numpy.ndarray (num_units, num_features) of int, float
New data on which to make predictions.
- Returns
- predictions : numpy.ndarray (num_units, 2) of float
Predicted causal effects for all units given the treatment and control models.
- predict_proba(X)
Predicts the probability that a subject belongs to a given class, given covariates.
- Parameters
- X : numpy.ndarray (num_units, num_features) of int, float
New data on which to make predictions.
- Returns
- probas : numpy.ndarray (num_units, 2) of float
Predicted probability to respond for all units given the treatment and control models.
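A minimal sketch of the Two Model Approach, assuming plain least-squares regressors stand in for the control_model and treatment_model arguments; fit_linear, predict_linear, two_model_fit and two_model_predict are hypothetical helpers, not causeinfer functions. One regressor is fit per arm, and the estimated effect is the difference between the two predictions. The column order (treated first, control second) is also an assumption.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def two_model_fit(X, y, w):
    # One model per arm: treated units (w == 1) and control units (w == 0).
    treatment_coef = fit_linear(X[w == 1], y[w == 1])
    control_coef = fit_linear(X[w == 0], y[w == 0])
    return treatment_coef, control_coef


def two_model_predict(models, X):
    treatment_coef, control_coef = models
    # Column 0: predicted response if treated; column 1: if untreated.
    return np.column_stack([predict_linear(treatment_coef, X),
                            predict_linear(control_coef, X)])


# Synthetic data: the true effect of treatment is 1 + X[:, 0].
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
w = rng.integers(0, 2, size=1000)
y = X @ np.array([0.5, -0.3]) + w * (1.0 + X[:, 0])

preds = two_model_predict(two_model_fit(X, y, w), X)
uplift = preds[:, 0] - preds[:, 1]  # estimated per-unit effect
```

Because the synthetic outcome is exactly linear within each arm, the difference of the two fitted models recovers the simulated effect 1 + X[:, 0].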
Interaction Term
The Interaction Term Approach (The True Lift Model, The Dummy Variable Approach).
- Based on
Kuchumov, A. pyuplift: Lightweight uplift modeling framework for Python. (2019). URL: https://github.com/duketemon/pyuplift. License: https://github.com/duketemon/pyuplift/blob/master/LICENSE.
Lo, VSY. (2002). “The true lift model: a novel data mining approach to response modeling in database marketing”. In: SIGKDD Explorations 4(2), 78–86. URL: https://dl.acm.org/citation.cfm?id=772872
Devriendt, F. et al. (2018). A Literature Survey and Experimental Evaluation of the State-of-the-Art in Uplift Modeling: A Stepping Stone Toward the Development of Prescriptive Analytics. Big Data, Vol. 6, No. 1, March 1, 2018, pp. 1-29. Codes found at: data-lab.be/downloads.php.
- Contents
- InteractionTerm Class
fit, predict, predict_proba
- class causeinfer.standard_algorithms.interaction_term.InteractionTerm(model=None)
- fit(X, y, w)
Trains a model given covariates, responses and assignments.
- Parameters
- X : numpy.ndarray (num_units, num_features) of int, float
Matrix of covariates.
- y : numpy.ndarray (num_units,) of int, float
Vector of unit responses.
- w : numpy.ndarray (num_units,) of int, float
Vector of original treatment allocations across units.
- Returns
- self : causeinfer.standard_algorithms.InteractionTerm
A trained model.
- predict(X)
Predicts a causal effect given covariates.
- Parameters
- X : numpy.ndarray (num_units, num_features) of int, float
New data on which to make predictions.
- Returns
- predictions : numpy.ndarray (num_units, 2) of float
Predicted causal effects for all units with the interaction term set to 1 and to 0.
- predict_proba(X)
Predicts the probability that a subject belongs to a given class, given covariates.
- Parameters
- X : numpy.ndarray (num_units, num_features) of int, float
New data on which to make predictions.
- Returns
- probas : numpy.ndarray (num_units, 2) of float
Predicted causal probabilities for all units with the interaction term set to 1 and to 0.
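In the Interaction Term Approach a single model sees the covariates, the treatment dummy w, and the products X·w; each unit is then scored twice, once with w set to 1 and once with w set to 0. This sketch assumes an ordinary least-squares model in place of the model argument; design, interaction_fit and interaction_predict are hypothetical helper names.

```python
import numpy as np


def design(X, w):
    # Intercept, covariates, treatment dummy, and covariate-by-treatment products.
    w = np.asarray(w, dtype=float).reshape(-1, 1)
    return np.column_stack([np.ones(len(X)), X, w, X * w])


def interaction_fit(X, y, w):
    coef, *_ = np.linalg.lstsq(design(X, w), y, rcond=None)
    return coef


def interaction_predict(coef, X):
    n = len(X)
    # Score every unit twice: once as treated (w = 1), once as control (w = 0).
    pred_treated = design(X, np.ones(n)) @ coef
    pred_control = design(X, np.zeros(n)) @ coef
    return np.column_stack([pred_treated, pred_control])


# Synthetic data: the true effect of treatment is 1 + X[:, 0].
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
w = rng.integers(0, 2, size=1000)
y = X @ np.array([0.5, -0.3]) + w * (1.0 + X[:, 0])

preds = interaction_predict(interaction_fit(X, y, w), X)
uplift = preds[:, 0] - preds[:, 1]
```

The difference between the two columns depends only on the treatment and interaction coefficients, which is why a single model is enough here.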
Binary Class Transformation
The Binary Class Transformation Approach (Influential Marketing, Response Transformation Approach).
- Based on
Kuchumov, A. pyuplift: Lightweight uplift modeling framework for Python. (2019). URL: https://github.com/duketemon/pyuplift. License: https://github.com/duketemon/pyuplift/blob/master/LICENSE.
Lai, L.Y.-T. (2006). “Influential marketing: A new direct marketing strategy addressing the existence of voluntary buyers”. Master of Science thesis, Simon Fraser University School of Computing Science, Burnaby, BC, Canada. URL: https://summit.sfu.ca/item/6629
Shaar, A., Abdessalem, T., and Segard, O. (2016). “Pessimistic Uplift Modeling”. ACM SIGKDD, August 2016, San Francisco, California USA, arXiv:1603.09738v1. URL: https://pdfs.semanticscholar.org/a67e/401715014c7a9d6a6679df70175be01daf7c.pdf.
Devriendt, F. et al. (2018). A Literature Survey and Experimental Evaluation of the State-of-the-Art in Uplift Modeling: A Stepping Stone Toward the Development of Prescriptive Analytics. Big Data, Vol. 6, No. 1, March 1, 2018, pp. 1-29. Codes found at: data-lab.be/downloads.php.
- Contents
- BinaryTransformation Class
_binary_transformation, _binary_regularization, fit, predict (not available at this time), predict_proba
- class causeinfer.standard_algorithms.binary_transformation.BinaryTransformation(model=None, regularize=False)
- _binary_transformation(y, w)
Derives which of the unknown Affected Positive or Affected Negative classes each unit could fall into, based on known outcomes.
- Parameters
- y : numpy.ndarray (num_units,) of int, float
Vector of unit responses.
- w : numpy.ndarray (num_units,) of int, float
Vector of original treatment allocations across units.
- Returns
- np.array(y_transformed) : numpy.ndarray
An array of transformed unit classes.
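The binary class transformation collapses the four known classes into favorable units (TP or CN, the classes that can contain Affected Positives) and unfavorable units (CP or TN, which can contain Affected Negatives). A sketch assuming 0/1 coding of y and w; binary_transformation is a hypothetical name, not the private method itself.

```python
import numpy as np


def binary_transformation(y, w):
    """Return 1 for favorable units (TP or CN) and 0 for unfavorable units (CP or TN)."""
    y = np.asarray(y)
    w = np.asarray(w)
    # TP: treated responders; CN: control non-responders.
    favorable = ((w == 1) & (y == 1)) | ((w == 0) & (y == 0))
    return favorable.astype(int)


y = np.array([1, 1, 0, 0])  # responses
w = np.array([1, 0, 0, 1])  # TP, CP, CN, TN respectively
z = binary_transformation(y, w)
print(z)  # [1 0 1 0]
```

A classifier trained on z scores units by P(z = 1 | x); when the treatment and control groups are the same size, 2·P(z = 1 | x) − 1 is an estimate of uplift.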
- _binary_regularization(y=None, w=None)
Regularization of binary classes is based on the positive and negative binary affectual classes.
- Parameters
- y : numpy.ndarray (num_units,) of int, float
Vector of unit responses.
- w : numpy.ndarray (num_units,) of int, float
Vector of original treatment allocations across units.
- Returns
- fav_ratio, unfav_ratio : float
Regularized ratios of favorable and unfavorable classes.
- fit(X, y, w)
Trains a model given covariates, responses and assignments.
- Parameters
- X : numpy.ndarray (num_units, num_features) of int, float
Matrix of covariates.
- y : numpy.ndarray (num_units,) of int, float
Vector of unit responses.
- w : numpy.ndarray (num_units,) of int, float
Vector of original treatment allocations across units.
- Returns
- self : causeinfer.standard_algorithms.BinaryTransformation
A trained model.
- predict_proba(X)
Predicts the probability that a subject belongs to a given class, given covariates.
- Parameters
- X : numpy.ndarray (num_units, num_features) of int, float
New data on which to make predictions.
- Returns
- probas : numpy.ndarray (num_units, 2) of float
Predicted probabilities of belonging to the favorable and unfavorable classes.
Quaternary Class Transformation
The Quaternary Class Transformation Approach (Response Transformation Approach).
- Based on
Kuchumov, A. pyuplift: Lightweight uplift modeling framework for Python. (2019). URL: https://github.com/duketemon/pyuplift. License: https://github.com/duketemon/pyuplift/blob/master/LICENSE.
Kane, K., Lo, VSY., and Zheng, J. (2014). “Mining for the truly responsive customers and prospects using truelift modeling: Comparison of new and existing methods”. In: Journal of Marketing Analytics 2(4), 218–238. URL: https://link.springer.com/article/10.1057/jma.2014.18
Devriendt, F. et al. (2018). A Literature Survey and Experimental Evaluation of the State-of-the-Art in Uplift Modeling: A Stepping Stone Toward the Development of Prescriptive Analytics. Big Data, Vol. 6, No. 1, March 1, 2018, pp. 1-29. Codes found at: data-lab.be/downloads.php.
- Contents
- QuaternaryTransformation Class
_quaternary_transformation, _quaternary_regularization, fit, predict (not available at this time), predict_proba
- class causeinfer.standard_algorithms.quaternary_transformation.QuaternaryTransformation(model=None, regularize=False)
- _quaternary_transformation(y, w)
Assigns known quaternary (TP, CP, CN, TN) classes to units.
- Parameters
- y : numpy.ndarray (num_units,) of int, float
Vector of unit responses.
- w : numpy.ndarray (num_units,) of int, float
Vector of original treatment allocations across units.
- Returns
- np.array(y_transformed) : np.array
An array of transformed unit classes.
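A sketch of the quaternary labeling, assuming 0/1 coding of y and w and the hypothetical integer codes TP=0, CP=1, CN=2, TN=3 (the codes causeinfer uses internally are not stated here).

```python
import numpy as np

# Hypothetical integer codes for the four known classes.
TP, CP, CN, TN = 0, 1, 2, 3


def quaternary_transformation(y, w):
    y = np.asarray(y)
    w = np.asarray(w)
    conditions = [
        (w == 1) & (y == 1),  # treated responders
        (w == 0) & (y == 1),  # control responders
        (w == 0) & (y == 0),  # control non-responders
        (w == 1) & (y == 0),  # treated non-responders
    ]
    # Units whose y or w is not coded 0/1 receive -1.
    return np.select(conditions, [TP, CP, CN, TN], default=-1)


labels = quaternary_transformation([1, 1, 0, 0], [1, 0, 0, 1])
print(labels)  # [0 1 2 3]
```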
- _quaternary_regularization(y=None, w=None)
Regularization of quaternary classes is based on their treatment assignment.
- Parameters
- y : numpy.ndarray (num_units,) of int, float
Vector of unit responses.
- w : numpy.ndarray (num_units,) of int, float
Vector of original treatment allocations across units.
- Returns
- control_count, treatment_count : int
Regularized counts of control and treatment classes.
- fit(X, y, w)
Trains a model given covariates, responses and assignments.
- Parameters
- X : numpy.ndarray (num_units, num_features) of int, float
Matrix of covariates.
- y : numpy.ndarray (num_units,) of int, float
Vector of unit responses.
- w : numpy.ndarray (num_units,) of int, float
Vector of original treatment allocations across units.
- Returns
- self : causeinfer.standard_algorithms.QuaternaryTransformation
A trained model.
- predict_proba(X)
Predicts the probability that a subject belongs to a given class, given covariates.
- Parameters
- X : numpy.ndarray (num_units, num_features) of int, float
New data on which to make predictions.
- Returns
- probas : numpy.ndarray (num_units, 2) of float
Predicted probabilities of belonging to the favorable and unfavorable classes.
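Kane, Lo and Zheng (2014), cited above, turn four-class probabilities into an uplift score by dividing each class probability by the share of its treatment group: P(TP|x)/P(T) + P(CN|x)/P(C) − P(TN|x)/P(T) − P(CP|x)/P(C). The sketch below implements that formula with the group shares estimated from w. It assumes the probability columns are ordered TP, CP, CN, TN and is not causeinfer's predict_proba, which returns two columns.

```python
import numpy as np


def kane_uplift_score(class_probas, w):
    """class_probas: (num_units, 4) array, columns assumed ordered TP, CP, CN, TN."""
    w = np.asarray(w)
    p_t = np.mean(w == 1)  # share of treated units
    p_c = 1.0 - p_t        # share of control units
    p_tp, p_cp, p_cn, p_tn = np.asarray(class_probas, dtype=float).T
    return p_tp / p_t + p_cn / p_c - p_tn / p_t - p_cp / p_c


w = np.array([1, 0] * 50)  # balanced assignment
probas = np.array([
    [0.25, 0.25, 0.25, 0.25],  # same response rate in both arms
    [0.50, 0.00, 0.50, 0.00],  # all treated respond, no control responds
])
scores = kane_uplift_score(probas, w)
```

Under random assignment each ratio reduces to a conditional response rate, so the score equals twice the difference between the treated and control response rates; it ranks units the same way uplift does.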
Reflective Uplift Transformation
The Reflective Uplift Transformation Approach.
- Based on
Kuchumov, A. pyuplift: Lightweight uplift modeling framework for Python. (2019). URL: https://github.com/duketemon/pyuplift. License: https://github.com/duketemon/pyuplift/blob/master/LICENSE.
Shaar, A., Abdessalem, T., and Segard, O. (2016). “Pessimistic Uplift Modeling”. ACM SIGKDD, August 2016, San Francisco, California USA, arXiv:1603.09738v1. URL: https://pdfs.semanticscholar.org/a67e/401715014c7a9d6a6679df70175be01daf7c.pdf.
- Contents
- ReflectiveUplift Class
fit, predict (not available at this time), predict_proba, _reflective_transformation, _reflective_weights
- class causeinfer.standard_algorithms.reflective.ReflectiveUplift(model=None)
- fit(X, y, w)
Trains a model given covariates, responses and assignments.
- Parameters
- X : numpy.ndarray (num_units, num_features) of int, float
Matrix of covariates.
- y : numpy.ndarray (num_units,) of int, float
Vector of unit responses.
- w : numpy.ndarray (num_units,) of int, float
Vector of original treatment allocations across units.
- Returns
- self : causeinfer.standard_algorithms.ReflectiveUplift
A trained model.
- predict_proba(X)
Predicts the probability that a subject belongs to a given class, given covariates.
- Parameters
- X : numpy.ndarray (num_units, num_features) of int, float
New data on which to make predictions.
- Returns
- probas : numpy.ndarray (num_units, 2) of float
Predicted probabilities of belonging to the favorable and unfavorable classes.
- _reflective_transformation(y, w)
Assigns known quaternary (TP, CP, CN, TN) classes to units.
- Parameters
- y : numpy.ndarray (num_units,) of int, float
Vector of unit responses.
- w : numpy.ndarray (num_units,) of int, float
Vector of original treatment allocations across units.
- Returns
- np.array(y_transformed) : np.array
An array of transformed unit classes.
- _reflective_weights(y, w)
Derives weights to normalize binary transformation noise.
- Parameters
- y : numpy.ndarray (num_units,) of int, float
Vector of unit responses.
- w : numpy.ndarray (num_units,) of int, float
Vector of original treatment allocations across units.
- Returns
- p_tp_fav, p_cp_fav, p_cn_unfav, p_tn_unfav : np.array
Probabilities of being a quaternary class per binary class.
Pessimistic Uplift Transformation
The Pessimistic Uplift Transformation Approach.
- Based on
Kuchumov, A. pyuplift: Lightweight uplift modeling framework for Python. (2019). URL: https://github.com/duketemon/pyuplift. License: https://github.com/duketemon/pyuplift/blob/master/LICENSE.
Shaar, A., Abdessalem, T., and Segard, O. (2016). “Pessimistic Uplift Modeling”. ACM SIGKDD, August 2016, San Francisco, California USA, arXiv:1603.09738v1. URL: https://pdfs.semanticscholar.org/a67e/401715014c7a9d6a6679df70175be01daf7c.pdf.
- Contents
- PessimisticUplift Class
fit, predict (not available at this time), predict_proba
- class causeinfer.standard_algorithms.pessimistic.PessimisticUplift(model=None)
- fit(X, y, w)
Trains a model given covariates, responses and assignments.
- Parameters
- X : numpy.ndarray (num_units, num_features) of int, float
Matrix of covariates.
- y : numpy.ndarray (num_units,) of int, float
Vector of unit responses.
- w : numpy.ndarray (num_units,) of int, float
Vector of original treatment allocations across units.
- Returns
- self : causeinfer.standard_algorithms.PessimisticUplift
A trained model.
- predict_proba(X)
Predicts the probability that a subject belongs to a given class, given covariates.
- Parameters
- X : numpy.ndarray (num_units, num_features) of int, float
New data on which to make predictions.
- Returns
- probas : numpy.ndarray (num_units, 2) of float
Predicted probabilities of belonging to the favorable and unfavorable classes.
evaluation
The evaluation module provides methods for accuracy measurement and presentation.
Metrics
causeinfer metrics provide statistical measures of model performance.
Functions
causeinfer.evaluation.get_batch_metrics()
- causeinfer.evaluation.get_cum_effect(df, models=None, outcome_col='y', treatment_col='w', treatment_effect_col='tau', normalize=False, random_seed=None)
Gets the average causal effects of model estimates in the cumulative population.
- Parameters
- df : pandas.DataFrame
A data frame with model estimates and actual data as columns.
- models : list
A list of models corresponding to estimated treatment effect columns.
- outcome_col : str, optional (default=y)
The column name for the actual outcome.
- treatment_col : str, optional (default=w)
The column name for the treatment indicator (0 or 1).
- treatment_effect_col : str, optional (default=tau)
The column name for the true treatment effect.
- normalize : bool, not implemented (default=False)
Included for consistency with get_cum_gain and get_qini.
- random_seed : int, optional (default=None)
Random seed for numpy.random.rand().
- Returns
- effects : pandas.DataFrame
Average causal effects of model estimates in the cumulative population.
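The cumulative effect at each cutoff k can be sketched as follows: rank units by predicted effect, then take the difference in mean outcome between treated and control units among the top k. This is a NumPy sketch on plain arrays rather than the DataFrame interface above; cumulative_effect is a hypothetical name, ties are broken by argsort order instead of with random_seed, and cutoffs where one arm is still empty return NaN.

```python
import numpy as np


def cumulative_effect(tau_hat, y, w):
    """Mean treated outcome minus mean control outcome among the top-k ranked units, for every k."""
    order = np.argsort(-np.asarray(tau_hat))  # highest predicted effect first
    y = np.asarray(y, dtype=float)[order]
    treated = np.asarray(w)[order] == 1
    n_t = np.cumsum(treated)   # treated units seen so far
    n_c = np.cumsum(~treated)  # control units seen so far
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.cumsum(y * treated) / n_t - np.cumsum(y * ~treated) / n_c


effects = cumulative_effect([0.9, 0.8, 0.1, 0.0], [1, 0, 1, 1], [1, 0, 1, 0])
print(effects)  # [nan 1.  1.  0.5]
```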
- causeinfer.evaluation.get_cum_gain(df, models=None, outcome_col='y', treatment_col='w', treatment_effect_col='tau', normalize=False, random_seed=None)
Gets the cumulative gains of model estimates in the population.
- Parameters
- df : pandas.DataFrame
A data frame with model estimates and actual data as columns.
- models : list
A list of models corresponding to estimated treatment effect columns.
- outcome_col : str, optional (default=y)
The column name for the actual outcome.
- treatment_col : str, optional (default=w)
The column name for the treatment indicator (0 or 1).
- treatment_effect_col : str, optional (default=tau)
The column name for the true treatment effect.
- normalize : bool, optional (default=False)
Whether to normalize the y-axis to 1 or not.
- random_seed : int, optional (default=None)
Random seed for numpy.random.rand().
- Returns
- gains : pandas.DataFrame
Cumulative gains of model estimates in the population.
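Cumulative gain scales the cumulative effect at each cutoff by the number of units targeted so far. A self-contained sketch on plain arrays; cumulative_gain is a hypothetical name, empty-arm cutoffs are counted as zero gain, and treating normalize as division by the absolute gain over the full population is an assumption about what normalizing the y-axis to 1 means.

```python
import numpy as np


def cumulative_gain(tau_hat, y, w, normalize=False):
    order = np.argsort(-np.asarray(tau_hat))  # highest predicted effect first
    y = np.asarray(y, dtype=float)[order]
    treated = np.asarray(w)[order] == 1
    n_t = np.cumsum(treated)
    n_c = np.cumsum(~treated)
    with np.errstate(divide="ignore", invalid="ignore"):
        effect = np.cumsum(y * treated) / n_t - np.cumsum(y * ~treated) / n_c
    # Scale the running effect by the number of units targeted so far (NaN -> 0).
    gain = np.nan_to_num(effect) * np.arange(1, len(y) + 1)
    if normalize:
        gain = gain / np.abs(gain[-1])  # assumption: end of curve scaled to 1
    return gain


tau_hat, y, w = [0.9, 0.8, 0.1, 0.0], [1, 0, 1, 1], [1, 0, 1, 0]
gains = cumulative_gain(tau_hat, y, w)                        # [0. 2. 3. 2.]
gains_norm = cumulative_gain(tau_hat, y, w, normalize=True)  # [0.  1.  1.5 1. ]
```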
- causeinfer.evaluation.get_qini(df, models=None, outcome_col='y', treatment_col='w', treatment_effect_col='tau', normalize=False, random_seed=None)
Gets the Qini curves of model estimates in the population.
- Parameters
- df : pandas.DataFrame
A data frame with model estimates and actual data as columns.
- models : list
A list of models corresponding to estimated treatment effect columns.
- outcome_col : str, optional (default=y)
The column name for the actual outcome.
- treatment_col : str, optional (default=w)
The column name for the treatment indicator (0 or 1).
- treatment_effect_col : str, optional (default=tau)
The column name for the true treatment effect.
- normalize : bool, optional (default=False)
Whether to normalize the y-axis to 1 or not.
- random_seed : int, optional (default=None)
Random seed for numpy.random.rand().
- Returns
- qinis : pandas.DataFrame
Qini curves of model estimates in the population.
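The Qini curve counts incremental responders among the top k ranked units: treated responders minus control responders scaled by the ratio of treated to control units, Y_t(k) − Y_c(k)·N_t(k)/N_c(k). A sketch on plain arrays; qini_curve is a hypothetical name, and the control term is taken as 0 until the first control unit appears.

```python
import numpy as np


def qini_curve(tau_hat, y, w):
    order = np.argsort(-np.asarray(tau_hat))  # highest predicted effect first
    y = np.asarray(y, dtype=float)[order]
    treated = np.asarray(w)[order] == 1
    n_t = np.cumsum(treated)
    n_c = np.cumsum(~treated)
    y_t = np.cumsum(y * treated)   # treated responders so far
    y_c = np.cumsum(y * ~treated)  # control responders so far
    # N_t / N_c, left at 0 while no control units have been seen (Y_c is 0 there too).
    ratio = np.divide(n_t, n_c, out=np.zeros(len(y)), where=n_c > 0)
    return y_t - y_c * ratio


qini = qini_curve([0.9, 0.8, 0.1, 0.0], [1, 0, 1, 1], [1, 0, 1, 0])
print(qini)  # [1. 1. 2. 1.]
```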
- causeinfer.evaluation.auuc_score(df, models=None, outcome_col='y', treatment_col='w', treatment_effect_col='tau', normalize=False, random_seed=None)
Calculates the AUUC score (Gini): the Area Under the Uplift Curve.
- Parameters
- df : pandas.DataFrame
A data frame with model estimates and actual data as columns.
- models : list
A list of models corresponding to estimated treatment effect columns.
- outcome_col : str, optional (default=y)
The column name for the actual outcome.
- treatment_col : str, optional (default=w)
The column name for the treatment indicator (0 or 1).
- treatment_effect_col : str, optional (default=tau)
The column name for the true treatment effect.
- normalize : bool, optional (default=False)
Whether to normalize the y-axis to 1 or not.
- random_seed : int, for inheritance (default=None)
Random seed for numpy.random.rand().
- Returns
- AUUC score : float
- causeinfer.evaluation.qini_score(df, models=None, outcome_col='y', treatment_col='w', treatment_effect_col='tau', normalize=False, random_seed=None)
Calculates the Qini score: the area between the Qini curve of a model and random assignment.
- Parameters
- df : pandas.DataFrame
A data frame with model estimates and actual data as columns.
- models : list
A list of models corresponding to estimated treatment effect columns.
- outcome_col : str, optional (default=y)
The column name for the actual outcome.
- treatment_col : str, optional (default=w)
The column name for the treatment indicator (0 or 1).
- treatment_effect_col : str, optional (default=tau)
The column name for the true treatment effect.
- normalize : bool, optional (default=False)
Whether to normalize the y-axis to 1 or not.
- random_seed : int, for inheritance (default=None)
Random seed for numpy.random.rand().
- Returns
- Qini score : float
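One way to compute a Qini score from a Qini curve is to average the gap between the curve and a random-targeting baseline, here drawn as a straight line from the origin to the curve's final value. Both the straight-line baseline and the averaging (rather than, say, a trapezoidal area) are assumptions of this sketch; qini_score_from_curve is a hypothetical name, not the library function.

```python
import numpy as np


def qini_score_from_curve(qini_curve):
    qini_curve = np.asarray(qini_curve, dtype=float)
    n = len(qini_curve)
    # Random targeting: incremental responders grow linearly with the share targeted.
    random_line = qini_curve[-1] * np.arange(1, n + 1) / n
    # Mean vertical gap between the model's curve and the random baseline.
    return np.mean(qini_curve - random_line)


score = qini_score_from_curve([1.0, 1.0, 2.0, 1.0])
print(score)  # 0.625
```

A curve that lies exactly on the random line, such as [1, 2, 3, 4], scores 0.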
- causeinfer.evaluation.signal_to_noise(y, w)
Computes the signal-to-noise ratio of a dataset to gauge the potential benefit of causal inference.
- Parameters
- y : numpy.ndarray (num_units,) of int, float
Vector of unit responses.
- w : numpy.ndarray (num_units,) of int, float
Vector of original treatment allocations across units.
- Returns
- sn_ratio : float
Notes
The signal-to-noise ratio is the difference between the treatment and control responses, divided by the control response.
Values close to 0 imply that causal inference (CI) would offer little benefit over predictive modeling.
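Read literally, the Notes describe (treatment response − control response) / control response. A sketch under that reading, using mean responses per arm so that unequal group sizes do not distort the ratio; whether causeinfer uses means or raw response counts is an assumption here, and the lowercase function is a standalone illustration.

```python
import numpy as np


def signal_to_noise(y, w):
    y = np.asarray(y, dtype=float)
    w = np.asarray(w)
    treatment_mean = y[w == 1].mean()  # response rate among treated units
    control_mean = y[w == 0].mean()    # response rate among control units
    return (treatment_mean - control_mean) / control_mean


y = [1, 1, 0, 1, 0, 0, 0, 1]
w = [1, 1, 1, 1, 0, 0, 0, 0]
sn_ratio = signal_to_noise(y, w)  # (0.75 - 0.25) / 0.25
print(sn_ratio)  # 2.0
```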
Plots
causeinfer plots provide graphical representations of model performance.
Functions
- causeinfer.evaluation.plot_eval(df, kind=None, n=100, percent_of_pop=False, normalize=False, figsize=(15, 5), fontsize=20, axis=None, legend_metrics=None, *args, **kwargs)
Plots one of the effect, gain, or Qini charts of model estimates.
- Parameters
- df : pandas.DataFrame
A data frame with model estimates and unit outcomes as columns.
- kind : str, optional (default='gain')
The kind of plot to draw: 'effect', 'gain', and 'qini' are supported.
- n : int, optional (default=100)
The number of samples to be used for plotting.
- percent_of_pop : bool, optional (default=False)
Whether the x-axis is displayed as a percent of the whole population.
- normalize : bool, for inheritance (default=False)
Passes this argument to interior functions directly.
- figsize : tuple, optional
Allows quick changes to the figure size.
- fontsize : int or float, optional (default=20)
The font size of the plots, with all labels scaled accordingly.
- axis : str, optional (default=None)
Adds an axis to the plot so that plots can be combined.
- legend_metrics : bool, optional (default=True)
Calculates AUUC or Qini metrics to add to the plot legend for gain and Qini plots, respectively.
- causeinfer.evaluation.plot_cum_effect(df, n=100, models=None, percent_of_pop=False, outcome_col='y', treatment_col='w', treatment_effect_col='tau', random_seed=None, figsize=None, fontsize=20, axis=None, legend_metrics=None)
Plots the causal effect chart of model estimates in the cumulative population.
- Parameters
- df : pandas.DataFrame
A data frame with model estimates and actual data as columns.
- kind : effect
The kind of plot to draw.
- n : int, optional (default=100)
The number of samples to be used for plotting.
- models : list
A list of models corresponding to estimated treatment effect columns.
- percent_of_pop : bool, optional (default=False)
Whether the x-axis is displayed as a percent of the whole population.
- outcome_col : str, optional (default=y)
The column name for the actual outcome.
- treatment_col : str, optional (default=w)
The column name for the treatment indicator (0 or 1).
- treatment_effect_col : str, optional (default=tau)
The column name for the true treatment effect.
- random_seed : int, optional (default=None)
Random seed for numpy.random.rand().
- figsize : tuple, optional
Allows quick changes to the figure size.
- fontsize : int or float, optional (default=20)
The font size of the plots, with all labels scaled accordingly.
- axis : str, optional (default=None)
Adds an axis to the plot so that plots can be combined.
- legend_metrics : bool, optional (default=False)
Not supported for plot_cum_effect; the user will be notified.
- Returns
- A plot of the cumulative effects of all models in df.
- causeinfer.evaluation.plot_cum_gain(df, n=100, models=None, percent_of_pop=False, outcome_col='y', treatment_col='w', treatment_effect_col='tau', normalize=False, random_seed=None, figsize=None, fontsize=20, axis=None, legend_metrics=True)
Plots the cumulative gain chart (or uplift curve) of model estimates.
- Parameters
- df : pandas.DataFrame
A data frame with model estimates and actual data as columns.
- kind : gain
The kind of plot to draw.
- n : int, optional (default=100)
The number of samples to be used for plotting.
- models : list
A list of models corresponding to estimated treatment effect columns.
- percent_of_pop : bool, optional (default=False)
Whether the x-axis is displayed as a percent of the whole population.
- outcome_col : str, optional (default=y)
The column name for the actual outcome.
- treatment_col : str, optional (default=w)
The column name for the treatment indicator (0 or 1).
- treatment_effect_col : str, optional (default=tau)
The column name for the true treatment effect.
- normalize : bool, optional (default=False)
Whether to normalize the y-axis to 1 or not.
- random_seed : int, optional (default=None)
Random seed for numpy.random.rand().
- figsize : tuple, optional
Allows quick changes to the figure size.
- fontsize : int or float, optional (default=20)
The font size of the plots, with all labels scaled accordingly.
- axis : str, optional (default=None)
Adds an axis to the plot so that plots can be combined.
- legend_metrics : bool, optional (default=True)
Calculates AUUC metrics to add to the plot legend.
- Returns
- A plot of the cumulative gains of all models in df.
- causeinfer.evaluation.plot_qini(df, n=100, models=None, percent_of_pop=False, outcome_col='y', treatment_col='w', treatment_effect_col='tau', normalize=False, random_seed=None, figsize=None, fontsize=20, axis=None, legend_metrics=True)
Plots the Qini chart (or uplift curve) of model estimates.
- Parameters
- df : pandas.DataFrame
A data frame with model estimates and actual data as columns.
- kind : qini
The kind of plot to draw.
- n : int, optional (default=100)
The number of samples to be used for plotting.
- models : list
A list of models corresponding to estimated treatment effect columns.
- percent_of_pop : bool, optional (default=False)
Whether the x-axis is displayed as a percent of the whole population.
- outcome_col : str, optional (default=y)
The column name for the actual outcome.
- treatment_col : str, optional (default=w)
The column name for the treatment indicator (0 or 1).
- treatment_effect_col : str, optional (default=tau)
The column name for the true treatment effect.
- normalize : bool, optional (default=False)
Whether to normalize the y-axis to 1 or not.
- random_seed : int, optional (default=None)
Random seed for numpy.random.rand().
- figsize : tuple, optional
Allows quick changes to the figure size.
- fontsize : int or float, optional (default=20)
The font size of the plots, with all labels scaled accordingly.
- axis : str, optional (default=None)
Adds an axis to the plot so that plots can be combined.
- legend_metrics : bool, optional (default=True)
Calculates Qini metrics to add to the plot legend.
- Returns
- A plot of the Qini curves of all models in df.
- causeinfer.evaluation.plot_batch_metrics(df, kind=None, n=10, models=None, outcome_col='y', treatment_col='w', normalize=False, figsize=(15, 5), fontsize=20, axis=None, *args, **kwargs)
Plots the batch chart: the cumulative batch metrics predicted by a model given ranked treatment effects.
- Parameters
- df : pandas.DataFrame
A data frame with model estimates and unit outcomes as columns.
- kind : str, optional (default='gain')
The kind of plot to draw: 'effect', 'gain', 'qini', and 'response' are supported.
- n : int, optional (default=10, i.e. deciles; 5 (quintiles) and 20 (vigintiles) are also common)
The number of batches to split the units into.
- models : list
A list of models corresponding to estimated treatment effect columns.
- outcome_col : str, optional (default=y)
The column name for the actual outcome.
- treatment_col : str, optional (default=w)
The column name for the treatment indicator (0 or 1).
- figsize : tuple, optional
Allows quick changes to the figure size.
- fontsize : int or float, optional (default=20)
The font size of the plots, with all labels scaled accordingly.
- axis : str, optional (default=None)
Adds an axis to the plot so that plots can be combined.
- Returns
- A plot of batch metrics of all models in df.
- causeinfer.evaluation.plot_batch_responses(df, n=10, models=None, outcome_col='y', treatment_col='w', normalize=False, figsize=(15, 5), fontsize=20, axis=None)
Plots the batch response chart: the cumulative batch responses predicted by a model given ranked treatment effects.
- Parameters
- df : pandas.DataFrame
A data frame with model estimates and unit outcomes as columns.
- kind : response
The kind of plot to draw.
- n : int, optional (default=10, i.e. deciles; 5 (quintiles) and 20 (vigintiles) are also common)
The number of batches to split the units into.
- models : list
A list of models corresponding to estimated treatment effect columns.
- outcome_col : str, optional (default=y)
The column name for the actual outcome.
- treatment_col : str, optional (default=w)
The column name for the treatment indicator (0 or 1).
- figsize : tuple, optional
Allows quick changes to the figure size.
- fontsize : int or float, optional (default=20)
The font size of the plots, with all labels scaled accordingly.
- axis : str, optional (default=None)
Adds an axis to the plot so that plots can be combined.
- Returns
- A plot of batch responses of all models in df.
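The batch view splits the ranked units into n roughly equal groups and compares response rates within each. A sketch on plain arrays, assuming per-batch (not cumulative) treated and control mean responses; batch_responses is a hypothetical name, np.array_split handles unit counts that do not divide evenly, and a batch without treated or control units yields NaN.

```python
import numpy as np


def batch_responses(tau_hat, y, w, n=10):
    """Return an (n, 2) array of [treated mean response, control mean response] per batch."""
    order = np.argsort(-np.asarray(tau_hat))  # highest predicted effect first
    y = np.asarray(y, dtype=float)[order]
    w = np.asarray(w)[order]
    rows = []
    for y_b, w_b in zip(np.array_split(y, n), np.array_split(w, n)):
        rows.append((y_b[w_b == 1].mean(), y_b[w_b == 0].mean()))
    return np.array(rows)


tau_hat = [7, 6, 5, 4, 3, 2, 1, 0]
y = [1, 0, 1, 0, 0, 1, 0, 1]
w = [1, 0, 1, 0, 1, 0, 1, 0]
rates = batch_responses(tau_hat, y, w, n=2)
batch_effects = rates[:, 0] - rates[:, 1]  # per-batch effect estimate
```

A well-ranked model shows large effects in the first batches and small or negative effects in the last, as in this toy data.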
Iteration
Iteration methods allow a researcher or practitioner to derive average model accuracy.
Functions
- causeinfer.evaluation.iterate_model(model, X_train, y_train, w_train, X_test, y_test, w_test, tau_test=None, n=10, pred_type='predict', eval_type=None, normalize_eval=False, verbose=True)
Trains and makes predictions with a model multiple times to derive average predictions and their variance.
- Parameters
- model : object
A model over which iterations will be done.
- X_train : numpy.ndarray (num_train_units, num_features) of int, float
Matrix of covariates.
- y_train : numpy.ndarray (num_train_units,) of int, float
Vector of unit responses.
- w_train : numpy.ndarray (num_train_units,) of int, float
Vector of original treatment allocations across units.
- X_test : numpy.ndarray (num_test_units, num_features) of int, float
A matrix of covariates.
- y_test : numpy.ndarray (num_test_units,) of int, float
A vector of unit responses.
- w_test : numpy.ndarray (num_test_units,) of int, float
A vector of original treatment allocations across units.
- tau_test : numpy.ndarray (num_test_units,) of int, float
A vector of the actual treatment effects given simulated data.
- n : int (default=10)
The number of train and prediction iterations to run.
- pred_type : str (default=predict)
predict or predict_proba: the type of prediction the iterations will make.
- eval_type : str (default=None)
qini or auuc: the type of evaluation to be done on the predictions.
Note: if None, model predictions will be averaged without their variance being calculated.
- normalize_eval : bool, optional (default=False)
Whether to normalize the evaluation metric.
- verbose : bool (default=True)
Whether to show a tqdm progress bar over the iterations.
- Returns
- avg_preds_probas : numpy.ndarray (num_units, 2) of float
Averaged per-unit predictions.
- all_preds_probas : dict
A dictionary of all predictions produced during iterations.
- avg_eval : float
The average of the iterated model evaluations.
- eval_variance : float
The variance of all prediction evaluations.
- all_evals : dict
A dictionary of all evaluations produced during iterations.
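The iteration procedure above amounts to repeating fit-and-predict, averaging the per-unit predictions, and summarizing the per-iteration evaluation scores. The following is a minimal NumPy sketch of that idea, not causeinfer's implementation: `iterate_fit_predict`, `fit_predict`, and `evaluate` are hypothetical names, where `fit_predict` stands in for one round of `model.fit` plus `model.predict` and `evaluate` stands in for a Qini or AUUC scorer. The variance is assumed to be the population variance (`numpy.var` with its default `ddof=0`).

```python
import numpy as np


def iterate_fit_predict(fit_predict, evaluate=None, n=10):
    """Hypothetical sketch of repeated training and prediction.

    fit_predict: callable taking the iteration index and returning an
        array of per-unit predictions for the test set.
    evaluate: optional callable mapping predictions to a scalar score
        (a stand-in for a Qini or AUUC evaluation).
    """
    all_preds = {}
    all_evals = {}
    for i in range(n):
        preds = np.asarray(fit_predict(i), dtype=float)
        all_preds[i] = preds
        if evaluate is not None:
            all_evals[i] = float(evaluate(preds))

    # Average the per-unit predictions across all iterations
    avg_preds = np.mean(np.stack(list(all_preds.values())), axis=0)

    if evaluate is None:
        # Without an evaluation, only averaged predictions are returned
        return avg_preds, all_preds

    evals = np.array(list(all_evals.values()))
    avg_eval = float(evals.mean())
    eval_variance = float(evals.var())  # population variance (ddof=0)
    return avg_preds, all_preds, avg_eval, eval_variance, all_evals
```

Because the test set is fixed across iterations in this sketch, any spread in the evaluations comes only from randomness inside `fit_predict`, such as random initialization or resampling in the underlying model.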
- causeinfer.evaluation.eval_table(eval_dict, variances=False, annotate_vars=False)
Displays the evaluation of models given a dictionary of their evaluations over datasets.
- Parameters
- eval_dict : dict
A dictionary of model evaluations over datasets.
- variances : bool (default=False)
Whether to annotate the evaluations with their variances.
- annotate_vars : bool (default=False)
Whether to annotate the evaluation variances with stars given their standard deviations.
- Returns
- eval_table : pandas.DataFrame : (num_datasets, num_models)
A dataframe of dataset to model evaluation comparisons.
data
The data module in causeinfer provides example datasets from business, medical, and socio-economic fields as benchmarks for causal inference (CI) techniques.
Hillstrom Email Marketing
An email marketing dataset from Kevin Hillstrom’s MineThatData blog.
See an example using this data at causeinfer/examples/business_hillstrom.
- Description found at:
https://blog.minethatdata.com/2008/03/minethatdata-e-mail-analytics-and-data.html
- Based on
Kuchumov, A. pyuplift: Lightweight uplift modeling framework for Python. (2019). URL: https://github.com/duketemon/pyuplift. License: https://github.com/duketemon/pyuplift/blob/master/LICENSE.
K. Hillstrom. “The MineThatData E-Mail Analytics And Data Mining Challenge”. 2008. URL: https://blog.minethatdata.com/2008/03/minethatdata-e-mail-analytics-and-data.html.
- Contents
download_hillstrom, _format_data, load_hillstrom
- causeinfer.data.hillstrom.download_hillstrom(data_path=None, url='http://www.minethatdata.com/Kevin_Hillstrom_MineThatData_E-MailAnalytics_DataMiningChallenge_2008.03.20.csv')
Downloads the dataset from Kevin Hillstrom’s blog.
- Parameters
- data_path : str, optional (default=None)
A user specified path for where the data should go.
- url : str
The URL from which the data is to be downloaded.
- Returns
- The data ‘hillstrom.csv’ in a ‘datasets’ folder, unless otherwise specified.
- causeinfer.data.hillstrom._format_data(df, format_covariates=True, normalize=True)
Formats the data upon loading for consistent data preparation.
- Parameters
- df : pd.DataFrame
The original unformatted version of the data.
- format_covariates : bool, optional (default=True), controlled in load_hillstrom
True: creates dummy columns and encodes the data.
False: only steps for data readability will be taken.
- normalize : bool, optional (default=True), controlled in load_hillstrom
Normalize dataset columns to prepare them for ML methods.
- Returns
- df : pd.DataFrame
A formatted version of the data.
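The two formatting steps named above, dummy encoding and normalization, can be illustrated with a small NumPy sketch. The exact encoding and scaling that causeinfer applies are not specified here, so this sketch assumes one-hot dummy columns and column-wise z-score standardization; `make_dummies` and `standardize` are hypothetical helpers.

```python
import numpy as np


def make_dummies(values):
    """One-hot encode a sequence of category labels (hypothetical helper)."""
    categories = sorted(set(values))
    index = {cat: j for j, cat in enumerate(categories)}
    dummies = np.zeros((len(values), len(categories)), dtype=float)
    for i, v in enumerate(values):
        dummies[i, index[v]] = 1.0
    return categories, dummies


def standardize(X):
    """Z-score each column (assumed scaling); constant columns become zeros."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mean) / std
```

Standardizing after encoding would also rescale the dummy columns; whether causeinfer normalizes dummies or only numeric columns is not stated in this reference.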
- causeinfer.data.hillstrom.load_hillstrom(file_path=None, format_covariates=True, download_if_missing=True, normalize=True)
Loads the Hillstrom dataset with formatting if desired.
- Parameters
- file_path : str, optional (default=None)
Specify another path for the dataset.
By default the dataset should be stored in the ‘datasets’ folder in the cwd.
- format_covariates : bool, optional (default=True)
Whether to format the covariates (True) or load the raw data without covariate manipulation (False).
- download_if_missing : bool, optional (default=True)
Whether to download the dataset using ‘download_hillstrom’ if it has not been downloaded before.
- normalize : bool, optional (default=True)
Normalize dataset columns to prepare them for ML methods.
- Returns
- data : dict object with the following attributes:
- data.description : str
A description of the Hillstrom email marketing dataset.
- data.dataset_full : numpy.ndarray : (64000, 12) or formatted (64000, 22)
The full dataset with features, treatment, and target variables.
- data.dataset_full_names : list, size 12 or formatted 22
List of dataset variable names.
- data.features : numpy.ndarray : (64000, 8) or formatted (64000, 18)
Each row corresponds to the 8 (or formatted 18) feature values in order.
- data.feature_names : list, size 8 or formatted 18
List of feature names.
- data.treatment : numpy.ndarray : (64000,)
Each value corresponds to the treatment.
- data.response_spend : numpy.ndarray : (64000,)
Each value corresponds to how much customers spent during the two-week outcome period.
- data.response_visit : numpy.ndarray : (64000,)
Each value corresponds to whether people visited the site during the two-week outcome period.
- data.response_conversion : numpy.ndarray : (64000,)
Each value corresponds to whether they purchased at the site (i.e. converted) during the two-week outcome period.
Mayo Clinic PBC
A dataset on medical trials to combat primary biliary cholangitis (PBC, formerly known as primary biliary cirrhosis) of the liver from the Mayo Clinic.
See an example using this data at causeinfer/examples/medical_mayo_pbc.
- Description found at:
https://www.mayo.edu/research/documents/pbchtml/DOC-10027635
- Based on
Mayo Clinic. “Primary Biliary Cirrhosis”. 1991. URL: https://www.mayo.edu/research/documents/pbchtml/DOC-10027635.
- Contents
download_mayo_pbc, _format_data, load_mayo_pbc
- causeinfer.data.mayo_pbc.download_mayo_pbc(data_path=None, url='http://www.mayo.edu/research/documents/pbcdat/DOC-10026921')
Downloads the dataset from the Mayo Clinic’s research documents.
- Parameters
- data_path : str, optional (default=None)
A user specified path for where the data should go.
- url : str
The URL from which the data is to be downloaded.
- Returns
- The text file ‘mayo_pbc’ in a ‘datasets’ folder, unless otherwise specified.
- causeinfer.data.mayo_pbc._format_data(dataset_path, format_covariates=True, normalize=True)
Formats the data upon loading for consistent data preparation.
- Parameters
- dataset_path : str
The path to the original file, a text file with inconsistent spacing and periods for NaNs.
Furthermore, only the units that took part in the randomized trial are loaded, as there are 106 cases that were monitored but not part of the trial.
- format_covariates : bool, optional (default=True)
True: creates dummy columns and encodes the data.
False: only steps for data readability will be taken.
- normalize : bool, optional (default=True)
Normalization step controlled in load_mayo_pbc.
- Returns
- df : pd.DataFrame
A formatted version of the data.
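The parsing problems described above (inconsistent spacing, periods for NaNs, and units outside the trial) can be handled with plain Python. This is a hedged sketch rather than causeinfer's parser: it assumes whitespace-separated numeric columns with the treatment at a known column index, and it assumes that the monitored-but-not-randomized units are the rows whose treatment value is missing. `parse_pbc_lines` and `treatment_col` are hypothetical names.

```python
import math


def parse_pbc_lines(lines, treatment_col):
    """Parse whitespace-delimited rows, mapping '.' to NaN.

    Rows whose treatment value is missing are dropped, on the assumption
    that they belong to the monitored but non-randomized cases.
    """
    rows = []
    for line in lines:
        tokens = line.split()  # split() collapses runs of spaces and tabs
        if not tokens:
            continue  # skip blank lines
        row = [math.nan if tok == "." else float(tok) for tok in tokens]
        if not math.isnan(row[treatment_col]):
            rows.append(row)
    return rows
```

Using `str.split()` with no argument is what makes the inconsistent spacing harmless, since any run of whitespace counts as a single separator.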
- causeinfer.data.mayo_pbc.load_mayo_pbc(file_path=None, format_covariates=True, download_if_missing=True, normalize=True)
Loads the Mayo PBC dataset with formatting if desired.
- Parameters
- file_path : str, optional (default=None)
Specify another path for the dataset.
By default the dataset should be stored in the ‘datasets’ folder in the cwd.
- format_covariates : bool, optional (default=True)
Whether to format the covariates (True) or load the raw data without covariate manipulation (False).
- download_if_missing : bool, optional (default=True)
Whether to download the dataset using ‘download_mayo_pbc’ if it has not been downloaded before.
- normalize : bool, optional (default=True)
Normalize the dataset to prepare it for ML methods.
- Returns
- data : dict object with the following attributes:
- data.description : str
A description of the Mayo Clinic PBC dataset.
- data.dataset_full : numpy.ndarray : (312, 19) or formatted (312, 24)
The full dataset with features, treatment, and target variables.
- data.dataset_full_names : list, size 19 or formatted 24
List of dataset variable names.
- data.features : numpy.ndarray : (312, 17) or formatted (312, 22)
Each row corresponds to the 17 (or formatted 22) feature values in order.
- data.feature_names : list, size 17 or formatted 22
List of feature names.
- data.treatment : numpy.ndarray : (312,)
Each value corresponds to the treatment (1 = treat, 0 = control).
- data.response : numpy.ndarray : (312,)
Each value corresponds to one of the outcomes (0 = alive, 1 = liver transplant, 2 = dead).
CMF Microfinance
A dataset on microfinance from The Centre for Micro Finance (CMF) at the Institute for Financial Management Research (Chennai, India).
See an example using this data at causeinfer/examples/socioeconomic_cmf_micro.
- Description found at:
https://www.aeaweb.org/articles?id=10.1257/app.20130533 (see paper)
- Based on
A. Banerjee et al. “The Miracle of Microfinance? Evidence from a Randomized Evaluation”. In: American Economic Journal: Applied Economics 7 (1 2015), pp. 22–53. URL: https://www.aeaweb.org/articles?id=10.1257/app.20130533.
- Contents
download_cmf_micro (deprecated), _format_data, load_cmf_micro
- causeinfer.data.cmf_micro._format_data(dataset_path, format_covariates=True, normalize=True)
Formats the data upon loading for consistent data preparation.
Source: https://github.com/thmstang/apa19-microfinance/blob/master/helpers.r (R-version)
- Parameters
- dataset_path : str
The path to the original data, a folder containing various .dta files.
- format_covariates : bool, optional (default=True)
True: creates dummy columns and encodes the data.
False: only steps for data readability will be taken.
- normalize : bool, optional (default=True)
Normalization step controlled in load_cmf_micro.
- Returns
- df : pd.DataFrame
A formatted version of the data.
- causeinfer.data.cmf_micro.load_cmf_micro(file_path=None, format_covariates=True, normalize=True)
Loads the CMF micro dataset with formatting if desired.
- Parameters
- file_path : str, optional (default=None)
Specify another path for the dataset.
By default the dataset should be stored in the ‘datasets’ folder in the cwd.
- format_covariates : bool, optional (default=True)
Whether to format the covariates (True) or load the raw data without covariate manipulation (False).
- download_if_missing : bool, optional (default=True) (Deprecated)
Whether to download the dataset using ‘download_cmf_micro’ if it has not been downloaded before.
- normalize : bool, optional (default=True)
Normalize the dataset to prepare it for ML methods.
- Returns
- data : dict object with the following attributes:
- data.description : str
A description of the CMF microfinance data.
- data.dataset_full : numpy.ndarray : (5328, 183) or formatted (5328, 60)
The full dataset with features, treatment, and target variables.
- data.dataset_full_names : list, size 61
List of dataset variable names.
- data.features : numpy.ndarray : (5328, 186) or formatted (5328, 57)
Each row corresponds to the 58 feature values in order (note that the other target can be used as a feature).
- data.feature_names : list, size 58
List of feature names.
- data.treatment : numpy.ndarray : (5328,)
Each value corresponds to the treatment (1 = treat, 0 = control).
- data.response_biz_index : numpy.ndarray : (5328,)
Each value corresponds to the business index of each participant.
- data.response_women_emp : numpy.ndarray : (5328,)
Each value corresponds to the women’s empowerment index of each participant.
Download Utilities
Utility functions for downloading data.
- Based on
Kuchumov, A. pyuplift: Lightweight uplift modeling framework for Python. (2019). URL: https://github.com/duketemon/pyuplift. License: https://github.com/duketemon/pyuplift/blob/master/LICENSE.
- Contents
download_file, get_download_paths
- causeinfer.data.download_utils.download_file(url: str, output_path: str, zip_file=False)
Downloads a file from a URL to a specified path.
- Parameters
- url : str
The URL from which the file is downloaded.
- output_path : str
A user specified path, which defaults to a ‘files’ folder in the cwd.
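Downloading a file to a path, as described above, can be sketched with the standard library alone. This is a hypothetical stand-in (`download_file_sketch`), not causeinfer's code; in particular, it assumes that `zip_file=True` means the downloaded archive should be extracted into the same directory as `output_path`.

```python
import os
import urllib.request
import zipfile


def download_file_sketch(url, output_path, zip_file=False):
    """Download `url` to `output_path`, optionally extracting a zip archive.

    The zip handling is an assumption about what a `zip_file` flag might do.
    """
    directory = os.path.dirname(output_path)
    if directory:
        # Create the target folder if it does not exist yet
        os.makedirs(directory, exist_ok=True)
    urllib.request.urlretrieve(url, output_path)
    if zip_file:
        # Extract the archive's members next to the downloaded file
        with zipfile.ZipFile(output_path) as archive:
            archive.extractall(directory or ".")
    return output_path
```

`urllib.request.urlretrieve` also accepts `file://` URLs, which makes it possible to exercise a function like this offline against a local file.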
- causeinfer.data.download_utils.get_download_paths(file_path, file_directory='files', file_name='file')
Derives paths for a file folder and a file.
- Parameters
- file_path : str
A user specified path that the data should go to.
- file_directory : str (default=files)
A user specified directory.
- file_name : str (default=file)
The name to call the file.
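One plausible way to derive a file folder and file path from these parameters is sketched below. `get_download_paths_sketch` is hypothetical: it assumes that a missing `file_path` falls back to `<cwd>/<file_directory>/<file_name>`, and it returns a `(directory_path, file_path)` pair, whereas the actual return values of `get_download_paths` are not documented here.

```python
import os


def get_download_paths_sketch(file_path=None, file_directory="files", file_name="file"):
    """Return (directory_path, file_path) for a download (hypothetical behavior).

    With no path given, default to <cwd>/<file_directory>/<file_name>;
    otherwise use the given path and its parent directory.
    """
    if file_path is None:
        directory_path = os.path.join(os.getcwd(), file_directory)
        file_path = os.path.join(directory_path, file_name)
    else:
        directory_path = os.path.dirname(file_path)
    return directory_path, file_path
```

For example, `get_download_paths_sketch(file_directory="datasets", file_name="hillstrom.csv")` would point at a ‘datasets’ folder in the cwd, matching the default location the loaders above describe.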
utils
The utils module provides needed functions for causal inference model testing and deployment.
Functions
causeinfer.utils.mutli_cross_tab()
- causeinfer.utils.train_test_split(X, y, w, percent_train=0.7, random_state=None, maintain_proportions=False)
Splits unit X covariates and (y, w) outcome tuples into training and testing sets.
- Parameters
- X : numpy.ndarray : (n_samples, n_features)
Matrix of unit covariate features.
- y : numpy.ndarray : (n_samples,)
Array of unit responses.
- w : numpy.ndarray : (n_samples,)
Array of unit treatments.
- percent_train : float (default=0.7)
The proportion of the covariates and outcomes to allocate to model training.
- random_state : int (default=None)
A seed for the random number generator for consistency.
- maintain_proportions : bool, optional (default=False)
Whether to maintain the treatment group proportions within the split samples.
- Returns
- X_train, X_test, y_train, y_test, w_train, w_test : numpy.ndarray
Arrays of split covariates and outcomes.
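Maintaining treatment group proportions amounts to a split stratified by `w`: each treatment group is shuffled and split on its own. The sketch below illustrates this with NumPy; `split_sketch` is a hypothetical name, and rounding each group's training count with `round` is an assumption rather than causeinfer's documented behavior. The return order mirrors the documented one.

```python
import numpy as np


def split_sketch(X, y, w, percent_train=0.7, random_state=None, maintain_proportions=False):
    """Shuffle-and-split sketch that can optionally stratify by treatment."""
    X, y, w = np.asarray(X), np.asarray(y), np.asarray(w)
    rng = np.random.default_rng(random_state)

    if maintain_proportions:
        # Split each treatment group separately so train and test keep
        # (approximately) the same treatment proportions
        train_idx, test_idx = [], []
        for group in np.unique(w):
            idx = rng.permutation(np.flatnonzero(w == group))
            n_train = int(round(percent_train * len(idx)))
            train_idx.extend(idx[:n_train])
            test_idx.extend(idx[n_train:])
        train_idx = np.array(train_idx, dtype=int)
        test_idx = np.array(test_idx, dtype=int)
    else:
        # Plain random split over all units
        idx = rng.permutation(len(w))
        n_train = int(round(percent_train * len(w)))
        train_idx, test_idx = idx[:n_train], idx[n_train:]

    return X[train_idx], X[test_idx], y[train_idx], y[test_idx], w[train_idx], w[test_idx]
```

Stratifying matters most when one treatment group is small, since an unstratified random split can otherwise leave the test set with very few treated or control units.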
- causeinfer.utils.plot_unit_distributions(df, variable, treatment=None, bins=None, axis=None)
Plots seaborn countplots of unit covariate and outcome distributions.
- Parameters
- df : pandas.DataFrame : (n_samples, n_features)
The data from which the plot is made.
- variable : str
A unit covariate or outcome for which the plot is desired.
- treatment : str, optional (default=None)
The treatment variable for comparing across segments.
- bins : int (default=None)
Bins the column values such that larger distributions can be plotted.
- axis : str, optional (default=None)
Adds an axis to the plot so they can be combined.
- Returns
- ax : matplotlib.axes
Displays a seaborn plot of unit distributions across the given covariate or outcome value.
- causeinfer.utils.over_sample(X_1, y_1, w_1, sample_2_size, shuffle=True, random_state=None)
Over-samples a given sample so that its size matches that of another, larger sample.
- Parameters
- X_1 : numpy.ndarray : (num_sample1_units, num_sample1_features)
Matrix of sample covariates.
- y_1 : numpy.ndarray : (num_sample1_units,)
Vector of sample unit responses.
- w_1 : numpy.ndarray : (num_sample1_units,)
Designates the original treatment allocation across sample units.
- sample_2_size : int
The size of the other sample to match.
- shuffle : bool, optional (default=True)
Whether to shuffle the new sample after it’s created.
- random_state : int (default=None)
A seed for the random number generator to allow for consistency.
- Returns
- The provided covariates and outcomes, having been over-sampled to match the other sample.
X_os : numpy.ndarray : (num_sample2_units, num_sample1_features).
y_os : numpy.ndarray : (num_sample2_units,).
w_os : numpy.ndarray : (num_sample2_units,).
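Random over-sampling with replacement is one common way to equalize sample sizes as described above. In this hypothetical `over_sample_sketch`, every original unit is kept once and only the shortfall is drawn at random with NumPy; causeinfer's `over_sample` may draw units differently. Note that only the number of rows changes: the covariate columns of sample 1 are carried over unchanged.

```python
import numpy as np


def over_sample_sketch(X_1, y_1, w_1, sample_2_size, shuffle=True, random_state=None):
    """Grow sample 1 to sample_2_size rows by re-drawing units with replacement.

    All original units are kept once; only the shortfall is drawn at random.
    """
    X_1, y_1, w_1 = np.asarray(X_1), np.asarray(y_1), np.asarray(w_1)
    n_1 = len(y_1)
    if sample_2_size < n_1:
        raise ValueError("sample_2_size must be at least the size of sample 1")
    rng = np.random.default_rng(random_state)
    # Draw the extra row indices with replacement
    extra = rng.choice(n_1, size=sample_2_size - n_1, replace=True)
    idx = np.concatenate([np.arange(n_1), extra])
    if shuffle:
        idx = rng.permutation(idx)
    # Index all three arrays identically so units stay aligned
    return X_1[idx], y_1[idx], w_1[idx]
```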
Contributing to causeinfer
Thank you for your consideration in contributing to this project!
Please take a moment to review this document in order to make the contribution process easy and effective for everyone involved.
Following these guidelines helps to communicate that you respect the time of the developers managing and developing this open source project. In return, and in accordance with this project’s code of conduct, other contributors will reciprocate that respect in addressing your issue or assessing patches and features.
Using the issue tracker
The issue tracker for causeinfer is the preferred channel for bug reports, feature requests and submitting pull requests.
Bug reports
A bug is a demonstrable problem that is caused by the code in the repository. Good bug reports are extremely helpful - thank you!
Guidelines for bug reports:
Use the GitHub issue search to check if the issue has already been reported.
Check if the issue has been fixed by trying to reproduce it using the latest main or development branch in the repository.
Isolate the problem to make sure that the code in the repository is definitely responsible for the issue.
Great bug reports tend to have:
A quick summary
Steps to reproduce
What you expected would happen
What actually happens
Notes (why this might be happening, things tried that didn’t work, etc.)
Again, thank you for your time in reporting issues!
Feature requests
Feature requests are more than welcome! Please take a moment to find out whether your idea fits with the scope and aims of the project. When making a suggestion, provide as much detail and context as possible, and further make clear the degree to which you would like to contribute in its development.
Pull requests
Good pull requests (patches, improvements and new features) are a fantastic help. They should remain focused in scope and avoid containing unrelated commits. Note that all contributions to this project will be made under the specified license and should follow the coding indentation and style standards (contact us if unsure).
Please ask first before embarking on any significant pull request (implementing features, refactoring code, etc.), otherwise you risk spending a lot of time working on something that the developers might not want to merge into the project. With that being said, major additions are very appreciated!
When making a contribution, adhering to the GitHub flow process is the best way to get your work merged:
Fork the repo, clone your fork, and configure the remotes:
# Clone your fork of the repo into the current directory
git clone https://github.com/<your-username>/<repo-name>
# Navigate to the newly cloned directory
cd <repo-name>
# Assign the original repo to a remote called "upstream"
git remote add upstream https://github.com/<upstream-owner>/<repo-name>
If you cloned a while ago, get the latest changes from upstream:
git checkout <dev-branch>
git pull upstream <dev-branch>
Create a new topic branch (off the main project development branch) to contain your feature, change, or fix:
git checkout -b <topic-branch-name>
Commit your changes in logical chunks, and please try to adhere to Conventional Commits. Use Git’s interactive rebase feature to tidy up your commits before making them public.
Locally merge (or rebase) the upstream development branch into your topic branch:
git pull --rebase upstream <dev-branch>
Push your topic branch up to your fork:
git push origin <topic-branch-name>
Open a Pull Request with a clear title and description.
Thank you in advance for your contributions!
License
BSD 3-Clause License
Copyright (c) 2019, the causeinfer developers. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Changelog
causeinfer tries to follow semantic versioning, a MAJOR.MINOR.PATCH version where increments are made of the:
MAJOR version when we make incompatible API changes
MINOR version when we add functionality in a backwards compatible manner
PATCH version when we make backwards compatible bug fixes
causeinfer 1.0.1 (June 3rd, 2022)
Updates source code files with direct references to the code they’re based on.
causeinfer 1.0.0 (December 28th, 2021)
Release switches causeinfer over to semantic versioning and indicates that it is stable.
causeinfer 0.1.2 (April 4th, 2021)
Changes include:
An src structure has been adopted to improve organization and testing
Users are now able to implement the following models:
Reflective Uplift (Shaar 2016)
Pessimistic Uplift (Shaar 2016)
The contribution guidelines have been expanded
Code quality checks via Codacy have been added
Extensive code formatting has been done to improve quality and style
Bug fixes and a more explicit use of exceptions
causeinfer 0.1.0 (Feb 25th, 2021)
First stable release of causeinfer
Users are able to implement baseline causal inference models including:
Two model
Interaction term (Lo 2002)
Binary transformation (Lai 2006)
Quaternary transformation (Kane 2014)
Plotting functions allow for graphical analysis of models
Functions useful for research such as model iterations, oversampling, and variance analysis are included
The package is fully documented
Virtual environment files are provided
Extensive testing of all modules with GH Actions and Codecov has been performed
A code of conduct and contribution guidelines are included