utils

The utils module provides helper functions for testing and deploying causal inference models.

Functions

causeinfer.utils.train_test_split(X, y, w, percent_train=0.7, random_state=None, maintain_proportions=False)

Splits unit covariates X and outcome tuples (y, w) into training and testing sets.

Parameters:

X (numpy.ndarray, shape (n_samples, n_features)): Matrix of unit covariate features.
y (numpy.ndarray, shape (n_samples,)): Array of unit responses.
w (numpy.ndarray, shape (n_samples,)): Array of unit treatments.
percent_train (float, default=0.7): The fraction of the covariates and outcomes to allocate to model training.
random_state (int, default=None): A seed for the random number generator, so that splits are reproducible.
maintain_proportions (bool, optional, default=False): Whether to maintain the treatment group proportions within the split samples.

Returns:

X_train, X_test, y_train, y_test, w_train, w_test (numpy.ndarray): Arrays of split covariates and outcomes.
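The effect of maintain_proportions can be illustrated with a minimal NumPy sketch. The function `split_sketch` below is a hypothetical stand-in written for this example, not causeinfer's implementation; it assumes that maintaining proportions means splitting each treatment group separately, and the toy data is made up.

```python
import numpy as np


def split_sketch(X, y, w, percent_train=0.7, random_state=None,
                 maintain_proportions=False):
    """Hypothetical illustration of a treatment-aware split (not causeinfer's code)."""
    rng = np.random.default_rng(random_state)
    n = len(X)
    if maintain_proportions:
        # Split each treatment group on its own so that both the training
        # and testing sets keep that group's share of the units.
        parts = []
        for group in np.unique(w):
            idx = rng.permutation(np.flatnonzero(w == group))
            n_train = int(round(percent_train * len(idx)))
            parts.append(idx[:n_train])
        train_idx = np.concatenate(parts)
    else:
        # Plain random split that ignores the treatment assignment.
        train_idx = rng.permutation(n)[: int(round(percent_train * n))]
    train_mask = np.zeros(n, dtype=bool)
    train_mask[train_idx] = True
    return (X[train_mask], X[~train_mask],
            y[train_mask], y[~train_mask],
            w[train_mask], w[~train_mask])


# Toy data (assumed for the example): 100 units, 3 covariates, 30% treated.
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(100, 3))
y = data_rng.integers(0, 2, size=100)
w = np.array([1] * 30 + [0] * 70)

X_train, X_test, y_train, y_test, w_train, w_test = split_sketch(
    X, y, w, percent_train=0.7, random_state=42, maintain_proportions=True
)
print(w_train.mean(), w_test.mean())  # treated share in each split
```

With this toy data both splits keep the 30% treated share of the full sample, which a plain random split does not guarantee.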

causeinfer.utils.plot_unit_distributions(df, variable, treatment=None, bins=None, axis=None)

Plots seaborn countplots of unit covariate and outcome distributions.

Parameters:

df (pandas.DataFrame, shape (n_samples, n_features)): The data from which the plot is made.
variable (str): The unit covariate or outcome to plot.
treatment (str, optional, default=None): The treatment variable for comparing across segments.
bins (int, default=None): The number of bins into which the column values are grouped, so that variables with many distinct values can be plotted.
axis (str, optional, default=None): An axis on which to draw the plot, so that multiple plots can be combined.

Returns:

ax (matplotlib.axes): Displays a seaborn plot of unit distributions across the given covariate or outcome value.
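To show why binning helps for variables with many distinct values, here is a minimal NumPy sketch of the counts a binned, treatment-segmented count plot would display. The helper `binned_counts_sketch` is hypothetical, and equal-width bins are an assumption made for this example; the library may bin differently.

```python
import numpy as np


def binned_counts_sketch(values, treatment, bins):
    """Hypothetical helper: per-treatment counts over equal-width bins."""
    edges = np.linspace(values.min(), values.max(), bins + 1)
    # Use interior edges only, so that the maximum value lands in the last bin.
    bin_idx = np.digitize(values, edges[1:-1])
    counts = {}
    for group in np.unique(treatment):
        counts[int(group)] = np.bincount(bin_idx[treatment == group],
                                         minlength=bins)
    return edges, counts


# Toy data (assumed): ten units with an age covariate and alternating treatment.
ages = np.array([18, 22, 25, 31, 38, 44, 47, 52, 60, 68])
treated = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])

edges, counts = binned_counts_sketch(ages, treated, bins=5)
for group, per_bin in counts.items():
    print(group, per_bin.tolist())
```

Ten distinct ages collapse into five bars per treatment group, which is the kind of reduction the bins parameter is described as providing.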

causeinfer.utils.over_sample(X_1, y_1, w_1, sample_2_size, shuffle=True, random_state=None)

Over-samples a given sample so that its size equals that of another, larger sample.

Parameters:

X_1 (numpy.ndarray, shape (num_sample1_units, num_sample1_features)): Matrix of sample covariates.
y_1 (numpy.ndarray, shape (num_sample1_units,)): Vector of sample unit responses.
w_1 (numpy.ndarray, shape (num_sample1_units,)): Designates the original treatment allocation across sample units.
sample_2_size (int): The size of the other sample to match.
shuffle (bool, optional, default=True): Whether to shuffle the new sample after it is created.
random_state (int, default=None): A seed for the random number generator, so that results are reproducible.

Returns:

The provided covariates and outcomes, over-sampled to match the size of the other sample.
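The idea of over-sampling can be sketched in a few lines of NumPy. The function `over_sample_sketch` is a hypothetical illustration, not causeinfer's implementation; it assumes that every original unit is kept and the shortfall is drawn with replacement, and that the covariates, responses, and treatments are returned in that order.

```python
import numpy as np


def over_sample_sketch(X_1, y_1, w_1, sample_2_size, shuffle=True,
                       random_state=None):
    """Hypothetical illustration of over-sampling (not causeinfer's code)."""
    rng = np.random.default_rng(random_state)
    n_1 = len(X_1)
    if sample_2_size <= n_1:
        raise ValueError("sample_2_size must exceed the size of the given sample")
    # Keep every original unit and draw the shortfall with replacement.
    extra = rng.choice(n_1, size=sample_2_size - n_1, replace=True)
    idx = np.concatenate([np.arange(n_1), extra])
    if shuffle:
        # Mix the duplicated units in with the originals.
        idx = rng.permutation(idx)
    return X_1[idx], y_1[idx], w_1[idx]


# Toy data (assumed): a sample of 4 units to be matched to a sample of 10.
X_small = np.arange(8).reshape(4, 2)
y_small = np.array([0, 1, 1, 0])
w_small = np.ones(4, dtype=int)

X_os, y_os, w_os = over_sample_sketch(
    X_small, y_small, w_small, sample_2_size=10, random_state=0
)
print(X_os.shape, y_os.shape, w_os.shape)
```

Because rows are indexed together, each duplicated unit keeps its own response and treatment value.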