utils

The utils module provides helper functions for testing and deploying causal inference models.

Functions

causeinfer.utils.train_test_split(X, y, w, percent_train=0.7, random_state=None, maintain_proportions=False)

Split unit X covariates and (y,w) outcome tuples into training and testing sets.

Parameters:
X : numpy.ndarray : (n_samples, n_features)

Matrix of unit covariate features.

y : numpy.ndarray : (n_samples,)

Array of unit responses.

w : numpy.ndarray : (n_samples,)

Array of unit treatments.

percent_train : float (default=0.7)

The fraction of the covariates and outcomes to allocate to model training.

random_state : int (default=None)

A seed for the random number generator, so that the split is reproducible.

maintain_proportions : bool, optional (default=False)

Whether to maintain the treatment group proportions within the split samples.

Returns:
X_train, X_test, y_train, y_test, w_train, w_test : numpy.ndarray

Arrays of split covariates and outcomes.
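
The splitting behaviour described above can be sketched with NumPy alone. This is not the causeinfer implementation; split_sketch is a hypothetical stand-in that only mirrors the documented parameters and return order, and it assumes that maintain_proportions means each treatment group in w is split separately.

```python
import numpy as np


def split_sketch(X, y, w, percent_train=0.7, random_state=None,
                 maintain_proportions=False):
    """Hypothetical stand-in for illustration; not causeinfer's code."""
    rng = np.random.default_rng(random_state)
    n = X.shape[0]
    if maintain_proportions:
        # Assumption: split each treatment group on its own so that both
        # halves keep roughly the treatment ratio of the full sample.
        train_parts, test_parts = [], []
        for group in np.unique(w):
            idx = rng.permutation(np.flatnonzero(w == group))
            cut = int(round(percent_train * idx.size))
            train_parts.append(idx[:cut])
            test_parts.append(idx[cut:])
        train_idx = np.concatenate(train_parts)
        test_idx = np.concatenate(test_parts)
    else:
        # Plain random split over all units.
        idx = rng.permutation(n)
        cut = int(round(percent_train * n))
        train_idx, test_idx = idx[:cut], idx[cut:]
    return (X[train_idx], X[test_idx], y[train_idx], y[test_idx],
            w[train_idx], w[test_idx])


# Toy data: 20 units, 2 covariates, half treated.
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.arange(20) % 2
w = np.array([0] * 10 + [1] * 10)

X_train, X_test, y_train, y_test, w_train, w_test = split_sketch(
    X, y, w, percent_train=0.7, random_state=0, maintain_proportions=True)
```

With the toy data, both the training and the testing set stay half treated, which is the property maintain_proportions is documented to preserve.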

causeinfer.utils.plot_unit_distributions(df, variable, treatment=None, bins=None, axis=None)

Plots seaborn countplots of unit covariate and outcome distributions.

Parameters:
df : pandas.DataFrame : (n_samples, n_features)

The data from which the plot is made.

variable : str

A unit covariate or outcome for which the plot is desired.

treatment : str, optional (default=None)

The treatment variable for comparing across segments.

bins : int (default=None)

The number of bins into which the column values are grouped, so that variables with many distinct values can be plotted.

axis : str, optional (default=None)

An axis on which to draw the plot, so that several plots can be combined.

Returns:
ax : matplotlib.axes

Displays a seaborn plot of unit distributions across the given covariate or outcome value.
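
To make the role of bins and treatment concrete, the counting step behind such a plot can be sketched without any plotting library. This is an assumption about what the binned countplot summarises, not causeinfer's code; binned_counts_sketch is a hypothetical helper.

```python
import numpy as np


def binned_counts_sketch(values, treatment, bins):
    """Hypothetical helper: per-treatment counts over shared bins."""
    # Compute the bin edges once so every treatment group shares them.
    edges = np.histogram_bin_edges(values, bins=bins)
    counts = {}
    for group in np.unique(treatment):
        group_counts, _ = np.histogram(values[treatment == group], bins=edges)
        counts[int(group)] = group_counts
    return edges, counts


values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
treatment = np.array([0, 0, 0, 0, 1, 1, 1, 1])

edges, counts = binned_counts_sketch(values, treatment, bins=2)
```

Each entry of counts holds the bar heights one treatment segment would contribute to the plot.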

causeinfer.utils.over_sample(X_1, y_1, w_1, sample_2_size, shuffle=True, random_state=None)

Over-samples a given sample so that its size matches that of a larger sample.

Parameters:
X_1 : numpy.ndarray : (num_sample1_units, num_sample1_features)

Matrix of sample covariates.

y_1 : numpy.ndarray : (num_sample1_units,)

Vector of sample unit responses.

w_1 : numpy.ndarray : (num_sample1_units,)

Designates the original treatment allocation across sample units.

sample_2_size : int

The size of the other sample to match.

shuffle : bool, optional (default=True)

Whether to shuffle the new sample after it’s created.

random_state : int (default=None)

A seed for the random number generator to allow for consistency.

Returns:
The provided covariates and outcomes, having been over-sampled to match another.
  • X_os : numpy.ndarray : (num_sample2_units, num_sample2_features).

  • y_os : numpy.ndarray : (num_sample2_units,).

  • w_os : numpy.ndarray : (num_sample2_units,).
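
One way such over-sampling might work is sketched below with NumPy. It is not the causeinfer implementation; over_sample_sketch is a hypothetical stand-in, and drawing the extra rows uniformly with replacement is an assumption.

```python
import numpy as np


def over_sample_sketch(X_1, y_1, w_1, sample_2_size, shuffle=True,
                       random_state=None):
    """Hypothetical stand-in for illustration; not causeinfer's code."""
    rng = np.random.default_rng(random_state)
    n = X_1.shape[0]
    if sample_2_size < n:
        raise ValueError("sample_2_size must be at least the sample size")
    # Keep every original unit, then draw the missing units with
    # replacement (assumption: uniform draws over the original sample).
    extra = rng.integers(0, n, size=sample_2_size - n)
    idx = np.concatenate([np.arange(n), extra])
    if shuffle:
        idx = rng.permutation(idx)
    # The same index is applied to all three arrays so rows stay aligned.
    return X_1[idx], y_1[idx], w_1[idx]


X_1 = np.arange(12, dtype=float).reshape(6, 2)
y_1 = np.array([0, 1, 0, 1, 1, 0])
w_1 = np.ones(6, dtype=int)

X_os, y_os, w_os = over_sample_sketch(X_1, y_1, w_1, sample_2_size=10,
                                      random_state=0)
```

Here the six original units grow to ten, and every over-sampled row is a copy of an original one.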