Hillstrom Email Marketing
An email marketing dataset from Kevin Hillstrom’s MineThatData blog.
See an example using this data at causeinfer/examples/business_hillstrom.
- Description found at:
https://blog.minethatdata.com/2008/03/minethatdata-e-mail-analytics-and-data.html
- Based on
Kuchumov, A. (2019). pyuplift: Lightweight uplift modeling framework for Python. URL: https://github.com/duketemon/pyuplift. License: https://github.com/duketemon/pyuplift/blob/master/LICENSE.
Hillstrom, K. (2008). “The MineThatData E-Mail Analytics And Data Mining Challenge”. URL: https://blog.minethatdata.com/2008/03/minethatdata-e-mail-analytics-and-data.html.
- Contents
download_hillstrom, _format_data, load_hillstrom
- causeinfer.data.hillstrom.download_hillstrom(data_path=None, url='http://www.minethatdata.com/Kevin_Hillstrom_MineThatData_E-MailAnalytics_DataMiningChallenge_2008.03.20.csv')
Downloads the dataset from Kevin Hillstrom’s blog.
- Parameters:
- data_path : str, optional (default=None)
A user-specified path where the data should be saved.
- url : str
The URL from which the data is downloaded.
- Returns:
- The data, saved as ‘hillstrom.csv’ in a ‘datasets’ folder unless another path is specified.
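A minimal sketch of the download step, assuming the file is fetched with the standard library's `urllib.request.urlretrieve`, that `data_path` is treated as a full file path, and that the default location is `datasets/hillstrom.csv` under the current working directory (our reading of the Returns entry above). The names `default_hillstrom_path` and `download_hillstrom_sketch` are hypothetical and are not part of causeinfer.

```python
import os
import urllib.request

# Default URL, copied from the download_hillstrom signature above.
HILLSTROM_URL = (
    "http://www.minethatdata.com/"
    "Kevin_Hillstrom_MineThatData_E-MailAnalytics_DataMiningChallenge_2008.03.20.csv"
)


def default_hillstrom_path():
    """Assumed default location: ./datasets/hillstrom.csv relative to the cwd."""
    return os.path.join(os.getcwd(), "datasets", "hillstrom.csv")


def download_hillstrom_sketch(data_path=None, url=HILLSTROM_URL):
    """Hypothetical stand-in for download_hillstrom; not the library's code."""
    if data_path is None:
        data_path = default_hillstrom_path()
    # Create the target folder (e.g. 'datasets') if it does not exist yet.
    folder = os.path.dirname(data_path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    # Fetch the CSV over HTTP and write it to data_path.
    urllib.request.urlretrieve(url, data_path)
    return data_path
```

Calling `download_hillstrom_sketch()` with no arguments would write the CSV to the assumed default path; passing `data_path="some/dir/hillstrom.csv"` overrides it.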
- causeinfer.data.hillstrom._format_data(df, format_covariates=True, normalize=True)
Formats the data upon loading for consistent data preparation.
- Parameters:
- df : pd.DataFrame
The original unformatted version of the data.
- format_covariates : bool, optional (default=True), controlled in load_hillstrom
True: creates dummy columns and encodes the data.
False: only steps for data readability will be taken.
- normalize : bool, optional (default=True), controlled in load_hillstrom
Normalize dataset columns to prepare them for ML methods.
- Returns:
- df : pd.DataFrame
A formatted version of the data.
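The formatting steps above can be sketched with pandas, under stated assumptions: the categorical columns `history_segment`, `zip_code` and `channel` are one-hot encoded with no level dropped, and normalization is a z-score of the continuous columns `recency` and `history` only, so treatment and response columns keep their values. One-hot encoding those three columns (7 + 3 + 3 levels) while keeping the other five features would give 5 + 13 = 18 feature columns, consistent with the formatted shapes listed for load_hillstrom; that is our inference, not something this page states. `format_data_sketch` is hypothetical and the toy rows are made up.

```python
import pandas as pd

# Columns we assume are categorical in the raw Hillstrom CSV.
CATEGORICAL = ["history_segment", "zip_code", "channel"]
# Columns we assume are continuous and therefore standardized.
CONTINUOUS = ["recency", "history"]


def format_data_sketch(df, format_covariates=True, normalize=True):
    """Hypothetical illustration of _format_data; not causeinfer's code."""
    df = df.copy()
    if format_covariates:
        # One dummy column per level of each categorical column (no level dropped).
        present = [c for c in CATEGORICAL if c in df.columns]
        df = pd.get_dummies(df, columns=present, dtype=float)
    if normalize:
        # Z-score the continuous columns only; other columns are left untouched.
        for col in CONTINUOUS:
            if col in df.columns:
                df[col] = (df[col] - df[col].mean()) / df[col].std(ddof=0)
    return df


# Made-up toy rows, for illustration only.
toy = pd.DataFrame({
    "recency": [1, 4, 7, 10],
    "history": [50.0, 150.0, 250.0, 350.0],
    "zip_code": ["Urban", "Rural", "Urban", "Rural"],
    "channel": ["Web", "Phone", "Multichannel", "Web"],
    "visit": [0, 1, 0, 1],
})
formatted = format_data_sketch(toy)
```

Keeping `drop_first=False` (the `get_dummies` default) preserves one column per level, which is what the 18-column count above would require; dropping a level per variable would give fewer columns.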
- causeinfer.data.hillstrom.load_hillstrom(file_path=None, format_covariates=True, download_if_missing=True, normalize=True)
Loads the Hillstrom dataset with formatting if desired.
- Parameters:
- file_path : str, optional (default=None)
Specify another path for the dataset.
By default, the dataset is expected in the ‘datasets’ folder of the current working directory.
- format_covariates : bool, optional (default=True)
Indicates whether covariates should be formatted (dummy columns and encoding). Set to False to load the raw data without covariate manipulation.
- download_if_missing : bool, optional (default=True)
Download the dataset with ‘download_hillstrom’ if it has not already been downloaded.
- normalize : bool, optional (default=True)
Normalize dataset columns to prepare them for ML methods.
- Returns:
- data : dict object with the following attributes:
- data.description : str
A description of the Hillstrom email marketing dataset.
- data.dataset_full : numpy.ndarray, shape (64000, 12), or (64000, 22) when formatted
The full dataset with features, treatment, and target variables.
- data.dataset_full_names : list, size 12, or 22 when formatted
List of dataset variable names.
- data.features : numpy.ndarray, shape (64000, 8), or (64000, 18) when formatted
Each row contains the 8 feature values (18 when formatted), in order.
- data.feature_names : list, size 8, or 18 when formatted
List of feature names.
- data.treatment : numpy.ndarray, shape (64000,)
Each value corresponds to the treatment, i.e. the campaign the customer was randomly assigned to (Mens E-Mail, Womens E-Mail, or No E-Mail).
- data.response_spend : numpy.ndarray, shape (64000,)
Each value corresponds to how much the customer spent during the two-week outcome period.
- data.response_visit : numpy.ndarray, shape (64000,)
Each value corresponds to whether the customer visited the site during the two-week outcome period.
- data.response_conversion : numpy.ndarray, shape (64000,)
Each value corresponds to whether the customer purchased at the site (i.e., converted) during the two-week outcome period.
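Because the Hillstrom campaigns were randomly assigned, the treatment and response arrays alone support a first estimate of each campaign's effect: the mean response in a treated arm minus the mean response in the control arm. A minimal sketch, assuming `data` supports key access (e.g. `data["treatment"]`) and that the formatted treatment uses integers with 0 meaning no e-mail; both are assumptions, so check the actual encoding first. `average_uplift` is a hypothetical helper, and the six-row arrays are made up to stand in for the real 64,000-row ones.

```python
import numpy as np


def average_uplift(treatment, response, control=0):
    """Hypothetical helper: mean response in each treated arm minus the control mean."""
    treatment = np.asarray(treatment)
    response = np.asarray(response, dtype=float)
    control_mean = response[treatment == control].mean()
    return {
        arm.item(): float(response[treatment == arm].mean() - control_mean)
        for arm in np.unique(treatment)
        if arm != control
    }


# Made-up stand-in for the dict returned by load_hillstrom (key access assumed).
data = {
    "treatment": np.array([0, 0, 1, 1, 2, 2]),
    "response_visit": np.array([0, 1, 1, 1, 0, 1]),
}
uplift = average_uplift(data["treatment"], data["response_visit"])
```

With the real arrays from `causeinfer.data.hillstrom.load_hillstrom()`, the same call applies to any of the three responses; if the treatment is left as raw strings (format_covariates=False), pass `control="No E-Mail"` instead of 0.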