utils
The utils module provides helper functions for testing and deploying causal inference models.
Functions
causeinfer.utils.mutli_cross_tab()
- causeinfer.utils.train_test_split(X, y, w, percent_train=0.7, random_state=None, maintain_proportions=False)
Split unit X covariates and (y,w) outcome tuples into training and testing sets.
- Parameters:
- X : numpy.ndarray : (n_samples, n_features)
Matrix of unit covariate features.
- y : numpy.ndarray : (n_samples,)
Array of unit responses.
- w : numpy.ndarray : (n_samples,)
Array of unit treatments.
- percent_train : float (default=0.7)
The fraction of the covariates and outcomes to allocate to model training.
- random_state : int (default=None)
A seed for the random number generator, so that the split is reproducible.
- maintain_proportions : bool, optional (default=False)
Whether to maintain the treatment group proportions within the split samples.
- Returns:
- X_train, X_test, y_train, y_test, w_train, w_test : numpy.ndarray
Arrays of split covariates and outcomes.
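The effect of maintain_proportions can be illustrated with a minimal NumPy sketch. This is not the causeinfer implementation: the function name split_sketch, the per-group rounding, and the toy data are assumptions made purely for illustration. It splits each treatment group separately so that the training and testing sets keep roughly the same treatment ratio as the full data.

```python
import numpy as np


def split_sketch(X, y, w, percent_train=0.7, random_state=None,
                 maintain_proportions=False):
    """Illustrative split (hypothetical helper, not causeinfer's code)."""
    rng = np.random.default_rng(random_state)
    n = len(X)
    if maintain_proportions:
        # Split each treatment group on its own so both sets keep
        # roughly the same treatment ratio as the full data.
        train_parts = []
        for group in np.unique(w):
            idx = rng.permutation(np.flatnonzero(w == group))
            n_train = int(round(percent_train * len(idx)))
            train_parts.append(idx[:n_train])
        train_idx = np.concatenate(train_parts)
    else:
        # Plain random split that ignores treatment assignment.
        train_idx = rng.permutation(n)[: int(round(percent_train * n))]
    train_mask = np.zeros(n, dtype=bool)
    train_mask[train_idx] = True
    return (X[train_mask], X[~train_mask],
            y[train_mask], y[~train_mask],
            w[train_mask], w[~train_mask])


# Toy data (assumed): 100 units, 3 covariates, 30 treated units.
X = np.arange(300, dtype=float).reshape(100, 3)
y = np.arange(100) % 2
w = np.array([1] * 30 + [0] * 70)

X_train, X_test, y_train, y_test, w_train, w_test = split_sketch(
    X, y, w, percent_train=0.7, random_state=0, maintain_proportions=True
)
```

With 30 treated units out of 100 and percent_train=0.7, the sketch places 21 treated units in the training set and 9 in the testing set, so both sets stay at 30% treated.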
- causeinfer.utils.plot_unit_distributions(df, variable, treatment=None, bins=None, axis=None)
Plots seaborn countplots of unit covariate and outcome distributions.
- Parameters:
- df : pandas.DataFrame : [n_samples, n_features]
The data from which the plot is made.
- variable : str
A unit covariate or outcome for which the plot is desired.
- treatment : str, optional (default=None)
The treatment variable for comparing across segments.
- bins : int (default=None)
Number of bins for the column values, so that variables with many distinct values can be plotted.
- axis : str, optional (default=None)
An axis to draw the plot on, so that multiple plots can be combined.
- Returns:
- ax : matplotlib.axes
Displays a seaborn plot of unit distributions across the given covariate or outcome value.
- causeinfer.utils.over_sample(X_1, y_1, w_1, sample_2_size, shuffle=True, random_state=None)
Over-samples a given sample so that its size equals that of a larger sample.
- Parameters:
- X_1 : numpy.ndarray : (num_sample1_units, num_sample1_features)
Matrix of sample covariates.
- y_1 : numpy.ndarray : (num_sample1_units,)
Vector of sample unit responses.
- w_1 : numpy.ndarray : (num_sample1_units,)
Designates the original treatment allocation across sample units.
- sample_2_size : int
The size of the other sample to match.
- shuffle : bool, optional (default=True)
Whether to shuffle the new sample after it’s created.
- random_state : int (default=None)
A seed for the random number generator to allow for consistency.
- Returns:
- The provided covariates and outcomes, over-sampled to match the size of the other sample.
X_os : numpy.ndarray : (num_sample2_units, num_sample2_features).
y_os : numpy.ndarray : (num_sample2_units,).
w_os : numpy.ndarray : (num_sample2_units,).
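The idea of over-sampling to a target size can be sketched in plain NumPy as follows. This is an illustration under stated assumptions, not the library's code: the name over_sample_sketch is hypothetical, and the choice to keep every original unit and draw only the shortfall with replacement is one plausible strategy that the documentation above does not confirm.

```python
import numpy as np


def over_sample_sketch(X_1, y_1, w_1, sample_2_size, shuffle=True,
                       random_state=None):
    """Illustrative over-sampling (hypothetical helper, not causeinfer's code)."""
    rng = np.random.default_rng(random_state)
    n_1 = len(X_1)
    if sample_2_size < n_1:
        raise ValueError("sample_2_size must be at least the size of sample 1")
    # Keep every original unit, then draw the shortfall with replacement.
    extra = rng.integers(0, n_1, size=sample_2_size - n_1)
    idx = np.concatenate([np.arange(n_1), extra])
    if shuffle:
        # Shuffle so the duplicated units are not clustered at the end.
        idx = rng.permutation(idx)
    # The same row indices are applied to all three arrays, so each
    # covariate row stays aligned with its response and treatment.
    return X_1[idx], y_1[idx], w_1[idx]


# Toy data (assumed): 4 units with 2 covariates, grown to 10 units.
X_1 = np.arange(8.0).reshape(4, 2)
y_1 = np.array([0, 1, 0, 1])
w_1 = np.ones(4, dtype=int)

X_os, y_os, w_os = over_sample_sketch(
    X_1, y_1, w_1, sample_2_size=10, shuffle=True, random_state=0
)
```

In this sketch the over-sampled arrays have 10 rows each, every original unit appears at least once, and no new units are invented.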