Mayo Clinic PBC

A dataset from the Mayo Clinic's medical trial on treating primary biliary cholangitis (PBC, formerly primary biliary cirrhosis) of the liver.

See an example using this data at causeinfer/examples/medical_mayo_pbc.

Description found at:

https://www.mayo.edu/research/documents/pbchtml/DOC-10027635

Based on

Mayo Clinic. “Primary Biliary Cirrhosis”. 1991. URL: https://www.mayo.edu/research/documents/pbchtml/DOC-10027635.

Contents

download_mayo_pbc, _format_data, load_mayo_pbc

causeinfer.data.mayo_pbc.download_mayo_pbc(data_path=None, url='http://www.mayo.edu/research/documents/pbcdat/DOC-10026921')

Downloads the dataset from the Mayo Clinic’s research documents.

Parameters:
data_path : str, optional (default=None)

A user specified path for where the data should go.

url : str

The url from which the data is to be downloaded.

Returns:
The text file ‘mayo_pbc’ in a ‘datasets’ folder, unless otherwise specified.
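
The download behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the helper names `resolve_data_path` and `download_sketch` and the file name `mayo_pbc.txt` are assumptions, while the URL is the default from the signature.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Default URL copied from the documented signature.
MAYO_PBC_URL = "http://www.mayo.edu/research/documents/pbcdat/DOC-10026921"


def resolve_data_path(data_path=None):
    """Return the target file path (hypothetical helper).

    Falls back to ./datasets/mayo_pbc.txt; the file extension is an assumption.
    """
    if data_path is None:
        return Path.cwd() / "datasets" / "mayo_pbc.txt"
    return Path(data_path)


def download_sketch(data_path=None, url=MAYO_PBC_URL):
    """Fetch the raw text file and return where it was written."""
    target = resolve_data_path(data_path)
    # Create the destination folder if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, str(target))  # performs a network request
    return target
```

Calling `download_sketch()` would write into a `datasets` folder under the current working directory, and passing `data_path` overrides that location.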

causeinfer.data.mayo_pbc._format_data(dataset_path, format_covariates=True, normalize=True)

Formats the data upon loading for consistent data preparation.

Parameters:
dataset_path : str

The path to the dataset file. The original file is a text file with inconsistent spacing and periods in place of NaNs.

Furthermore, the process loads only those units that took part in the randomized trial, as 106 additional cases were monitored but were not in the trial.

format_covariates : bool, optional (default=True)
  • True: creates dummy columns and encodes the data.

  • False: only steps for data readability will be taken.

normalize : bool, optional (default=True)

Whether to normalize the data; this step is controlled from load_mayo_pbc.

Returns:
df : pd.DataFrame

A formatted version of the data.
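
The readability steps described for this function (irregular whitespace, periods standing in for NaNs, and dropping monitored-only cases) can be sketched as follows. This is a standard-library illustration under stated assumptions, not the library's code: the function names are hypothetical, the three-column sample is made up, and it assumes non-trial cases can be recognised by a missing treatment value.

```python
import math


def parse_token(token):
    """Map a lone '.' to NaN, numbers to float, and leave other text as-is."""
    if token == ".":
        return math.nan
    try:
        return float(token)
    except ValueError:
        return token


def is_missing(value):
    """True only for float NaN values."""
    return isinstance(value, float) and math.isnan(value)


def load_trial_rows(text, treatment_index):
    """Parse whitespace-separated rows and keep those with a treatment value."""
    rows = [
        [parse_token(token) for token in line.split()]  # split() absorbs uneven spacing
        for line in text.splitlines()
        if line.strip()
    ]
    # Assumption: cases outside the randomized trial have no treatment recorded.
    return [row for row in rows if not is_missing(row[treatment_index])]


# Made-up sample with the layout: id, treatment, age.
sample = """\
  1   1  58.8
 2 2    56.4
   3  .   70.1
"""
trial = load_trial_rows(sample, treatment_index=1)
```

Here the third row is dropped because its treatment field is a period, so `trial` holds the two remaining rows as lists of floats.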

causeinfer.data.mayo_pbc.load_mayo_pbc(file_path=None, format_covariates=True, download_if_missing=True, normalize=True)

Loads the Mayo PBC dataset with formatting if desired.

Parameters:
file_path : str, optional (default=None)

Specify another path for the dataset.

By default, the dataset is expected in the ‘datasets’ folder in the current working directory.

format_covariates : bool, optional (default=True)

Indicates whether the covariates should be formatted. If False, the raw data is loaded without covariate manipulation.

download_if_missing : bool, optional (default=True)

Download the dataset using ‘download_mayo_pbc’ if it has not already been downloaded.

normalize : bool, optional (default=True)

Normalize the dataset to prepare it for ML methods.

Returns:
data : dict object with the following attributes:

data.description : str

A description of the Mayo Clinic PBC dataset.

data.dataset_full : numpy.ndarray (312, 19) or formatted (312, 24)

The full dataset with features, treatment, and target variables.

data.dataset_full_names : list, size 19 or formatted 24

List of dataset variable names.

data.features : numpy.ndarray (312, 17) or formatted (312, 22)

Each row corresponds to the feature values in order (17 values, or 22 when formatted).

data.feature_names : list, size 17 or formatted 22

List of feature names.

data.treatment : numpy.ndarray (312,)

Each value corresponds to the treatment (1 = treat, 0 = control).

data.response : numpy.ndarray (312,)

Each value corresponds to one of the outcomes (0 = alive, 1 = liver transplant, 2 = dead).
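
As a rough illustration of how the treatment and response encodings above fit together, the sketch below cross-tabulates outcomes by trial arm. The function name `outcomes_by_arm` is hypothetical, and the short lists are made-up stand-ins for the returned arrays, not real data.

```python
from collections import Counter

# Encodings taken from the attribute descriptions above.
ARM_LABELS = {1: "treat", 0: "control"}
OUTCOME_LABELS = {0: "alive", 1: "liver transplant", 2: "dead"}


def outcomes_by_arm(treatment, response):
    """Count each outcome separately for the treated and control arms."""
    counts = {label: Counter() for label in ARM_LABELS.values()}
    for arm, outcome in zip(treatment, response):
        # int() lets the same code accept plain ints or NumPy scalars.
        counts[ARM_LABELS[int(arm)]][OUTCOME_LABELS[int(outcome)]] += 1
    return counts


# Made-up stand-ins for the treatment and response arrays.
treatment = [1, 0, 1, 0, 1]
response = [0, 2, 2, 0, 1]
counts = outcomes_by_arm(treatment, response)
```

The same function should work on the `treatment` and `response` arrays returned by load_mayo_pbc, since it only iterates over the values and converts each one with `int()`.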