API Reference

Estimate Treatment Effects

SparseSC.estimate_effects.estimate_effects(outcomes, unit_treatment_periods, T0=None, T1=None, covariates=None, max_n_pl=10000, ret_pl=False, ret_CI=False, level=0.95, fast=True, model_type='retrospective', T2=None, cf_folds=10, cf_seed=110011, **kwargs)

Determines statistical significance for average and individual effects

Parameters:

outcomes (np.array or pd.DataFrame with shape (N,T)) -- Outcomes
unit_treatment_periods (np.array or pd.Series with shape (N)) -- Vector of treatment periods for each unit. If a unit is never treated, use np.NaN if the vector refers to time periods by numerical index, or np.datetime64('NaT') if it uses DateTime to refer to time periods (and then the outcomes (Y) must be a pd.DataFrame with columns in DateTime too). If using a prospective-based design, these are the true treatment periods (and fit will be called with pseudo-treatment periods that are T1 periods earlier).
T0 (int, Optional (default is pre-period for first treatment)) -- Pre-history length to match over.
T1 (int, Optional (default is post-period for last treatment)) -- Post-history length to fit over.
covariates (np.array or pd.DataFrame with shape (N,K), Optional) -- Additional pre-treatment features
max_n_pl (int, Optional) -- The full number of placebos is choose(N0,N1). If N1=1 then this is N0. This number grows quickly with N1, so a maximum number to compute can be set; that many placebos are then drawn at random.
ret_pl (bool) -- Return the matrix of placebos (different from the SC of the controls when N1>1)
ret_CI -- Whether to return confidence intervals (requires more memory during execution)
level (float (between 0 and 1)) -- Level for confidence intervals
fast (bool) -- Whether to use the fast approximate solution (fit_fast() rather than fit())
model_type -- Model type
T2 -- If model_type='prospective', the period for which to evaluate the effect
kwargs -- Additional parameters passed to fit() or fit_fast()
Returns:	An instance of SparseSCEstResults with the fitted results
Raises:	ValueError -- when invalid parameters are passed
Keyword Args:	Passed on to fit() or fit_fast()
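
The placebo count described for max_n_pl can be illustrated with a short sketch. It assumes N0 control units and N1 treated units and uses only the standard library. The helper name `draw_placebo_sets` and its sampling approach are illustrative assumptions, not SparseSC's implementation.

```python
import itertools
import math
import random

def draw_placebo_sets(n_controls, n_treated, max_n_pl=10000, seed=0):
    """Return (full count, up to max_n_pl placebo sets drawn from the controls).

    Hypothetical helper: it only illustrates the choose(N0, N1) count.
    """
    n_full = math.comb(n_controls, n_treated)  # choose(N0, N1)
    if n_full <= max_n_pl:
        # Small enough to enumerate every combination of control units.
        return n_full, list(itertools.combinations(range(n_controls), n_treated))
    # Otherwise draw distinct combinations at random until the cap is reached.
    rng = random.Random(seed)
    seen = set()
    while len(seen) < max_n_pl:
        seen.add(tuple(sorted(rng.sample(range(n_controls), n_treated))))
    return n_full, sorted(seen)

# With N1 = 1 the number of placebos equals N0.
n_full, sets_small = draw_placebo_sets(n_controls=20, n_treated=1)
print(n_full, len(sets_small))

# With N1 = 5 the full count is far larger than the cap, so a sample is drawn.
n_full_big, sets_big = draw_placebo_sets(n_controls=40, n_treated=5, max_n_pl=500)
print(n_full_big, len(sets_big))
```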

class SparseSC.estimate_effects.SparseSCEstResults(outcomes, fits, unit_treatment_periods, unit_treatment_periods_idx, unit_treatment_periods_idx_fit, T0, T1, pl_res_pre, pl_res_post, pl_res_post_scaled, max_n_pl, covariates=None, ind_CI=None, model_type='retrospective', T2=None, pl_res_post_fit=None)

Bases: object

Holds estimation info

CI: Confidence interval for the current model if relevant, else None.

__init__(outcomes, fits, unit_treatment_periods, unit_treatment_periods_idx, unit_treatment_periods_idx_fit, T0, T1, pl_res_pre, pl_res_post, pl_res_post_scaled, max_n_pl, covariates=None, ind_CI=None, model_type='retrospective', T2=None, pl_res_post_fit=None)

Parameters:

outcomes -- Outcome for the whole sample
fits (dictionary of period->SparseSCFit) -- The fit() return objects
unit_treatment_periods -- Vector of treatment periods for each unit. If a unit is never treated, use np.NaN if the vector refers to time periods by numerical index, or np.datetime64('NaT') if it uses DateTime to refer to time periods (and then the outcomes (Y) must be a pd.DataFrame with columns in DateTime too).
unit_treatment_periods_idx -- the conversion of unit_treatment_periods to indexes (helpful if the user had a datetime index)
unit_treatment_periods_idx_fit -- the treatment period indexes passed to fit (helpful if prospective-based design)
T0 -- Pre-history to match over
T1 -- post-history to evaluate over
pl_res_pre (PlaceboResults) -- Statistics for the average fit of the treated units in the pre-period (used for diagnostics)
pl_res_post (PlaceboResults) -- Statistics for the average treatment effect in the post-period
pl_res_post_scaled (PlaceboResults) -- Statistics for the average scaled treatment effect (difference divided by pre-treatment RMS fit) in the post-period.
max_n_pl -- maximum number of placebo effects used for inference
covariates -- Nxk matrix of full baseline covariates (or None)
ind_CI (dictionary of period->CI_int. Each CI_int is for the full sample (not necessarily T0+T1)) -- Confidence intervals for SC predictions at the unit level (not averaged over N1). Used for graphing rather than treatment effect statistics
model_type -- Model type string
T2 -- T2 (if prospective-type design)
pl_res_post_fit -- If prospective-type designs, the PlaceboResults for target period used for fit (still before actual treatment)
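
The scaled effect mentioned for pl_res_post_scaled (difference divided by pre-treatment RMS fit) can be shown with made-up numbers. This is a minimal sketch for a single unit; the data and variable names are hypothetical and the arithmetic is not taken from the library's code.

```python
import math

def rms(values):
    """Root mean square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical differences between observed and synthetic-control outcomes.
pre_diffs = [0.5, -0.5, 0.5, -0.5]   # T0 = 4 pre-period differences
post_diffs = [2.0, 3.0]              # T1 = 2 post-period differences

pre_rms = rms(pre_diffs)                         # pre-treatment RMS fit
scaled_post = [d / pre_rms for d in post_diffs]  # scaled post-period effects

print(pre_rms, scaled_post)
```

A unit that was poorly matched in the pre-period has a large pre_rms, so its post-period differences are shrunk accordingly.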

fit: Handy accessor if there is only one treatment-period

get_V(treatment_period=None): Returns V split across potential pre-treatment outcomes and baseline features

get_W(treatment_period=None): Get W (2D np.ndarray or pd.DataFrame, depending on what was passed in) for one of the treatment periods

get_sc(treatment_period=None): Returns an NxT matrix of synthetic controls. For units that are not eligible (those previously treated or treated between treatment_period and treatment_period+T1), the result is left empty.

get_tr_time_info(treatment_period)

Returns treatment info: a) indexes for the time index (helpful if user used an np datetime index) b) treatment period used in the call to fit (helpful if model is a prospective type)

Parameters:	treatment_period -- treatment period in the user's view (could be TimeIndex)
Returns:	(treatment_period_idx, treatment_period_idx_fit, treatment_period_fit)
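
The three returned values can be pictured with a small standalone sketch. It assumes a monthly datetime index and uses the rule stated for estimate_effects, that a prospective-type design fits on a pseudo-treatment period T1 periods earlier. The function `tr_time_info` and the `prospective` flag are hypothetical and do not reproduce the method's code.

```python
from datetime import date

# Hypothetical monthly time index (the columns of the outcomes matrix).
time_index = [date(2020, m, 1) for m in range(1, 13)]

def tr_time_info(treatment_period, T1, prospective=False):
    """Illustrative only: map a user-facing period to positional indexes."""
    treatment_period_idx = time_index.index(treatment_period)
    # In a prospective-type design the fit uses a pseudo-treatment period
    # that is T1 periods earlier than the true one.
    if prospective:
        treatment_period_idx_fit = treatment_period_idx - T1
    else:
        treatment_period_idx_fit = treatment_period_idx
    treatment_period_fit = time_index[treatment_period_idx_fit]
    return treatment_period_idx, treatment_period_idx_fit, treatment_period_fit

retro = tr_time_info(date(2020, 9, 1), T1=3)
prosp = tr_time_info(date(2020, 9, 1), T1=3, prospective=True)
print(retro)
print(prosp)
```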

p_value: p-value for the current model if relevant, else None.

class SparseSC.utils.metrics_utils.PlaceboResults(effect_vec, avg_joint_effect, rms_joint_effect, N_placebo)

Bases: object

Holds statistics for a vector of effects, including the full vector and two choices of aggregates (average and RMS)

__init__(effect_vec, avg_joint_effect, rms_joint_effect, N_placebo)

Parameters:

effect_vec (EstResultCI) -- Statistics for a vector of time-specific effects.
avg_joint_effect (EstResultCI) -- Statistics for the average effect.
rms_joint_effect (EstResultCI) -- Statistics for the RMS effect.
N_placebo (EstResultCI) -- Number of placebos used for the statistics
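
The relationship between the time-specific effect vector and its two aggregates can be sketched as follows. The data are made up, and averaging over treated units to form the effect vector is an assumption of this example rather than a statement about the library's internals.

```python
import math
from statistics import fmean

# Hypothetical effects: rows are treated units, columns are post-periods.
unit_effects = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
]

# Time-specific effect vector: average over treated units in each period.
effect_vec = [fmean(col) for col in zip(*unit_effects)]

# Two aggregates over the effect vector.
avg_joint_effect = fmean(effect_vec)
rms_joint_effect = math.sqrt(fmean([e * e for e in effect_vec]))

print(effect_vec, avg_joint_effect, rms_joint_effect)
```

The average keeps the sign of the effect, while the RMS measures its magnitude regardless of direction.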

class SparseSC.utils.metrics_utils.EstResultCI(effect, p, ci=None, placebos=None)

Bases: object

Holds an estimation result (effect + statistical significance)

__init__(effect, p, ci=None, placebos=None)

Parameters:

effect (scalar, vector, or pd.Series) -- Effect
p (scalar or vector) -- p-value
ci (CI_int) -- Confidence interval
placebos (matrix) -- Full matrix of placebos

class SparseSC.utils.metrics_utils.CI_int(ci_low, ci_high, level)

Bases: object

Class to hold information for a confidence interval (for a single point or for a vector)

__init__(ci_low, ci_high, level)

Parameters:

ci_low (scalar, vector, or pd.Series) -- Low-bound
ci_high (scalar, vector, or pd.Series) -- High-bound
level (float) -- Level (1-alpha) for the confidence interval

class SparseSC.utils.metrics_utils.AA_results(diffs_pre, diffs_post, level=0.95, sym_CI=True)

Bases: object

Constructs simple results object from AA test that has already been fit (model_type="full") and diffs constructed.

__init__(diffs_pre, diffs_post, level=0.95, sym_CI=True)

Provides typical stats for sims of estimation

Parameters:

diffs_pre -- Pre-treatment diffs
diffs_post -- vector of CI lower bounds
level -- level
sym_CI -- Return symmetric CIs. Will also set "treatment" effect to 0 rather than mean of placebo diffs.

Returns: AA_results
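
The effect of sym_CI can be pictured with a rough sketch. The only behavior taken from the description above is that the symmetric variant reports an effect of 0 instead of the mean of the placebo diffs. The quantile-based interval construction, the helper names, and the data are assumptions made for illustration and are not the library's method.

```python
import math
from statistics import fmean

def nearest_rank_quantile(values, q):
    """Nearest-rank empirical quantile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def simple_ci(diffs, level=0.95, sym_CI=True):
    """Hypothetical helper returning (effect, ci_low, ci_high)."""
    if sym_CI:
        # Symmetric interval around 0; the "effect" is fixed at 0.
        half_width = nearest_rank_quantile([abs(d) for d in diffs], level)
        return 0.0, -half_width, half_width
    # Otherwise use both tails of the diffs and report their mean.
    alpha = 1.0 - level
    return (
        fmean(diffs),
        nearest_rank_quantile(diffs, alpha / 2),
        nearest_rank_quantile(diffs, 1 - alpha / 2),
    )

# Made-up placebo diffs for a single post-period.
diffs_post = [-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 4.0]
sym = simple_ci(diffs_post, level=0.8, sym_CI=True)
asym = simple_ci(diffs_post, level=0.8, sym_CI=False)
print(sym)
print(asym)
```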

Fit a Synthetic Controls Model (Slow, Joint)

SparseSC.fit.fit(features, targets, treated_units=None, w_pen=None, v_pen=None, grid=None, grid_min=1e-06, grid_max=1, grid_length=20, stopping_rule=2, gradient_folds=10, w_pen_inner=False, match_space_maker=None, **kwargs)

Parameters:

features (matrix of floats) -- Matrix of features
targets (matrix of floats) -- Matrix of targets
model_type (str, default = `"retrospective"`) -- Type of model being fit. One of `"retrospective"`, `"prospective"`, `"prospective-restricted"` or `"full"`
treated_units (int[], Optional) -- An iterable indicating the rows of X and Y which contain data from treated units.
w_pen (float | float[], optional) -- Penalty applied to the difference between the current weights and the null weights (1/n). Default provided by `w_pen_guestimate`.
v_pen (float | float[], optional) -- Penalty (penalties) applied to the magnitude of the covariate weights. Defaults to `[ Lambda_c_max * g for g in grid]`, where Lambda_c_max is determined via `get_max_v_pen()`.
grid (float | float[], optional) -- Only used when v_pen is not provided. Defaults to `np.exp(np.linspace(np.log(grid_min),np.log(grid_max),grid_length))`
grid_min (float, default = 1e-6) -- Lower bound for `grid` when `v_pen` and `grid` are not provided. Must be in the range `(0,1)`
grid_max (float, default = 1) -- Upper bound for `grid` when `v_pen` and `grid` are not provided. Must be in the range `(0,1]`
grid_length (int, default = 20) -- Number of points in the `grid` parameter when `v_pen` and `grid` are not provided
stopping_rule (int, float, or function) -- A stopping rule less than one is interpreted as the percent improvement in the out-of-sample squared prediction error required between the current and previous iteration in order to continue with the coordinate descent. A stopping rule of one or greater is interpreted as the number of iterations of the coordinate descent (rounded down to the nearest int). Alternatively, `stopping_rule` may be a function which will be passed the current model fit, the previous model fit, and the iteration number (depending on its signature), and should return a truthy value if the coordinate descent should stop and a falsey value if the coordinate descent should continue.
choice (str or function, default = `"min"`) -- Method for choosing from among the v_pen. Only used when v_pen is an iterable. Defaults to `"min"`, which selects the v_pen parameter associated with the lowest cross validation error.
cv_folds (int or (int[],int[])[], default = 10) -- An integer number of Cross Validation folds passed to `sklearn.model_selection.KFold()`, or an explicit list of train validation folds. TODO: These folds are calculated with `KFold(...,shuffle=False)`, but instead, it should be assigned a random state.
gradient_folds (int or (int[],int[])[], default = 10) -- An integer number of Gradient folds passed to `sklearn.model_selection.KFold()`, or an explicit list of train validation folds, to be used when model_type is one of either `"foo"` or `"bar"`.
cv_seed (int, default = 10101) -- Passed to `sklearn.model_selection.KFold()` to allow for consistent cross validation folds across calls
gradient_seed (int, default = 10101) -- Passed to `sklearn.model_selection.KFold()` to allow for consistent gradient folds across calls when model_type is one of either `"foo"` or `"bar"` and gradient_folds is an integer.
progress (boolean, default = `True`) -- Controls the level of verbosity. If True, messages indicating the progress are printed to the console (stdout).
kwargs -- Additional arguments passed to the optimizer (i.e. `method` or scipy.optimize.minimize). See below.
custom_donor_pool (boolean, default = `None`) -- By default all control units are allowed to be donors for all units. There are cases where this is not desired, so the user can pass in a matrix specifying a unit-specific donor pool (NxC matrix of booleans). Common reasons for restricting the allowability: (a) When we would like to reduce interpolation bias by restricting the donor pool to those units similar along certain features. (b) If units are not completely independent (for example there may be contamination between neighboring units). This is a violation of the Single Unit Treatment Value Assumption (SUTVA). Note: These are not used in the fitting stage (of V and penalties), just in final unit weight determination.
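
The default grid and the resulting v_pen values can be reproduced with NumPy alone. In this sketch `Lambda_c_max` is a placeholder number chosen for illustration; in the library it is determined via `get_max_v_pen()`, which is not called here.

```python
import numpy as np

grid_min, grid_max, grid_length = 1e-6, 1.0, 20

# Default grid: log-spaced multipliers between grid_min and grid_max.
grid = np.exp(np.linspace(np.log(grid_min), np.log(grid_max), grid_length))

# Placeholder value for illustration only (not computed from data here).
Lambda_c_max = 3.5
v_pen = [Lambda_c_max * g for g in grid]

print(grid[0], grid[-1], len(v_pen))
```

The same spacing can be obtained with `np.geomspace(grid_min, grid_max, grid_length)`. The largest candidate penalty equals Lambda_c_max because grid_max is 1.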
Keyword Args:

method (str or callable) -- The method or function responsible for performing gradient descent in the covariate space. If a string, it is passed as the `method` argument to `scipy.optimize.minimize()`. Otherwise, `method` must be a function with a signature compatible with `scipy.optimize.minimize()` (`method(fun,x0,grad,kwargs)`) which returns an object having `x` and `fun` attributes. (Default = `SparseSC.optimizers.cd_line_search.cdl_search()`)
learning_rate (float, Default = 0.2) -- The initial learning rate which determines the initial step size, which is set to `learning_rate * null_model_error / gradient`. Must be between 0 and 1.
learning_rate_adjustment (float, Default = 0.9) -- Adjustment factor applied to the learning rate between iterations when the optimal step size returned by `scipy.optimize.line_search()` is less than 1; otherwise the step size is adjusted by `1/learning_rate_adjustment`. Must be between 0 and 1.
tol (float, Default = 0.0001) -- Tolerance used for the stopping rule based on the proportion of the in-sample residual error reduced in the last step of the gradient descent.
Returns:	A `SparseSCFit` object containing details of the fitted model.
Return type:	`SparseSCFit`
Raises:	ValueError -- when `treated_units` is not None and not an `iterable`, or when model_type is not one of the allowed values
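
The three forms of stopping_rule described in the parameter list can be summarized in a small dispatcher. This is a sketch under stated assumptions: `make_stopping_check` is a hypothetical helper, it works on error values rather than model fit objects, and it treats a value below one as a fraction (0.05 meaning 5%).

```python
def make_stopping_check(stopping_rule):
    """Hypothetical helper mirroring the three documented forms of stopping_rule.

    Returns a function (current_error, previous_error, iteration) -> bool that
    is True when the coordinate descent should stop.
    """
    if callable(stopping_rule):
        # A user-supplied function decides for itself.
        return stopping_rule
    if stopping_rule >= 1:
        # Interpreted as a number of iterations, rounded down.
        max_iterations = int(stopping_rule)
        return lambda current, previous, iteration: iteration >= max_iterations

    # Interpreted as the improvement required in order to continue.
    def check(current, previous, iteration):
        improvement = (previous - current) / previous
        return improvement < stopping_rule

    return check

by_count = make_stopping_check(2.7)        # behaves like 2 iterations
by_improvement = make_stopping_check(0.05)

print(by_count(1.0, 1.0, 1), by_count(1.0, 1.0, 2))
print(by_improvement(0.99, 1.0, 1), by_improvement(0.80, 1.0, 1))
```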

class SparseSC.fit.SparseSCFit(features, targets, control_units, treated_units, model_type, V, sc_weights, targets_sc=None, fitted_v_pen=None, fitted_w_pen=None, initial_v_pen=None, initial_w_pen=None, score=None, scores=None, selected_score=None, match_space_trans=None, match_space=None, match_space_desc=None)

Bases: object

A class representing the results of a Synthetic Control model instance.

__init__(features, targets, control_units, treated_units, model_type, V, sc_weights, targets_sc=None, fitted_v_pen=None, fitted_w_pen=None, initial_v_pen=None, initial_w_pen=None, score=None, scores=None, selected_score=None, match_space_trans=None, match_space=None, match_space_desc=None)

get_weights(include_trivial_donors=True): getter for the sc_weights. By default, the trivial

predict(targets=None, include_trivial_donors=True)

predict method

Parameters:

targets ((optional) matrix of floats) -- Matrix of targets
include_trivial_donors (boolean) -- Should donors for whom selected predictors and all targets are equal to zero be included in the weights for non-trivial units. These units will typically have a weight of 1 / total number of units as they do not contribute to the gradient. Default = `True`
Returns:	matrix of predicted outcomes
Return type:	matrix of floats
Raises:	ValueError -- When `targets.shape[0]` is inconsistent with the fitted model.

sc_weights: getter for the sc_weights. By default, the trivial

show(): display figures illustrating goodness of fit

summary()

A summary of the model fit / penalty selection

This illustrates that (a) the grid function could / should be better, and (b) currently more than two iterations is typically useless.

Fit a Synthetic Controls Model (Fast, Separate)

SparseSC.fit_fast.fit_fast(features, targets, model_type='restrospective', treated_units=None, w_pens=None, custom_donor_pool=None, match_space_maker=None, w_pen_inner=True, avoid_NxN_mats=False, verbose=0, targets_aux=None, **kwargs)

Parameters:

features (matrix of floats) -- Matrix of features
targets (matrix of floats) -- Matrix of targets
model_type (str, default = `"retrospective"`) -- Type of model being fit. One of `"retrospective"`, `"prospective"`, `"prospective-restricted"` or `"full"`
treated_units (int[], Optional) -- An iterable indicating the rows of X and Y which contain data from treated units.
w_pens (float[], default=np.logspace(start=-5, stop=5, num=40) (sklearn.RidgeCV can't automatically pick)) -- Penalization values to try when searching for unit weights.
custom_donor_pool (boolean, default = `None`) -- By default all control units are allowed to be donors for all units. There are cases where this is not desired, so the user can pass in a matrix specifying a unit-specific donor pool (NxC matrix of booleans). Common reasons for restricting the allowability: (a) When we would like to reduce interpolation bias by restricting the donor pool to those units similar along certain features. (b) If units are not completely independent (for example there may be contamination between neighboring units). This is a violation of the Single Unit Treatment Value Assumption (SUTVA). Note: These are not used in the fitting stage (of V and penalties), just in final unit weight determination.
match_space_maker -- Function with signature `MatchSpace_transformer, V_vector, best_v_pen, V desc = match_space_maker(X, Y, fit_model_wrapper)`, where we can call `fit_model_wrapper(MatchSpace_transformer, V_vector)`. Default is MTLassoCV_MatchSpace_factory().
avoid_NxN_mats (bool, default=False) -- There are several points where matrices on the order of NxN would typically be made (either N or N_c). With a large number of units these can be quite big. These can be avoided. One consequence is that the full unit-level weights will not be kept and just the built Synthetic Control outcome will be returned.
verbose (int, default=0) -- Verbosity level. 0 means no printouts. 1 will note times of completing each of the 3 main stages and some loop progress bars. 2 will print memory snapshots (optionally out to a file if the env var SparseSC_log_file is set).
kwargs -- Additional parameters so that one can easily switch between fit() and fit_fast()
Returns:	A `SparseSCFit` object containing details of the fitted model.
Return type:	`SparseSCFit`
Raises:	ValueError -- when `treated_units` is not None and not an `iterable`, or when model_type is not one of the allowed values
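
A unit-specific donor pool of the kind described for custom_donor_pool (an NxC matrix of booleans) might be built as in the sketch below. It covers reason (a), restricting donors to units that are similar along a feature. The unit layout, the `size` feature, and the `max_gap` threshold are all hypothetical, and the sketch does not call fit_fast().

```python
import numpy as np

# Hypothetical setup: 6 units, the first two treated, the rest controls.
N = 6
treated_units = [0, 1]
control_units = [u for u in range(N) if u not in treated_units]  # C = 4

# Illustrative baseline feature used to judge similarity between units.
size = np.array([10.0, 50.0, 12.0, 48.0, 11.0, 95.0])
max_gap = 5.0

# Row i, column j is True when control j may serve as a donor for unit i.
gaps = np.abs(size[:, None] - size[control_units][None, :])
custom_donor_pool = gaps <= max_gap

print(custom_donor_pool.shape)
print(custom_donor_pool[treated_units])
```

For reason (b), contamination between neighboring units, the same shape of matrix could be built from an adjacency rule instead of a similarity threshold.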

SparseSC.utils.match_space.Fixed_V_factory(V)

Return a MatchSpace function with user-supplied V over raw X.

Parameters:	V -- V Matrix on the raw features
Returns:	a function with the signature MatchSpace fn, V vector, best_v_pen, V = function(X,Y)
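
The return shape described here (MatchSpace fn, V vector, best_v_pen, V) can be mimicked by a small closure. This is a hypothetical stand-in written for illustration: it assumes the match space is the raw feature matrix and that no penalty is tuned when V is supplied, neither of which is confirmed by the text above.

```python
def fixed_v_sketch(V):
    """Hypothetical stand-in with the documented return shape, not library code."""
    def match_space_maker(X, Y, **kwargs):
        # Identity transform: the match space is the raw feature matrix X.
        def transform(features):
            return features

        best_v_pen = float("nan")  # assumption: nothing is tuned when V is fixed
        return transform, V, best_v_pen, V

    return match_space_maker

maker = fixed_v_sketch([1.0, 0.0, 2.0])
X = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
Y = [[1.0], [2.0]]
transform, V_vector, best_v_pen, V_desc = maker(X, Y)
print(transform(X) is X, V_vector)
```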

SparseSC.utils.match_space.MTLassoCV_MatchSpace_factory(v_pens=None, n_v_cv=5, sample_frac=1, Y_col_block_size=None, se_factor=None, normalize=True)

Return a MatchSpace function that will fit a MultiTaskLassoCV for Y ~ X

Parameters:

v_pens -- Penalties to evaluate (default is to automatically determine)
n_v_cv -- Number of Cross-Validation folds
sample_frac -- Fraction of the data to sample
se_factor -- Allows taking a different penalty than the min mse. Similar to the lambda.1se rule, if not None, it will take the max lambda that has mse < min_mse + se_factor*(MSE standard error).
Returns:	MatchSpace fn, V vector, best_v_pen, V
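
The se_factor rule can be written out as a short selection function. This sketch assumes the cross-validated MSE and its standard error are already available per penalty, and it uses the standard error at the minimum-MSE penalty, which is an assumption borrowed from the usual lambda.1se convention. `pick_penalty` and the numbers are hypothetical.

```python
def pick_penalty(penalties, mse, mse_se, se_factor=None):
    """Hypothetical selection rule following the description of se_factor."""
    best = min(range(len(penalties)), key=lambda i: mse[i])
    if se_factor is None:
        return penalties[best]  # plain minimum-MSE choice
    threshold = mse[best] + se_factor * mse_se[best]
    # Largest penalty whose MSE stays under the threshold; fall back to the
    # minimum-MSE penalty if none qualifies (for example when se_factor is 0).
    eligible = [p for p, m in zip(penalties, mse) if m < threshold]
    return max(eligible, default=penalties[best])

# Made-up cross-validation results for four candidate penalties.
penalties = [0.01, 0.1, 1.0, 10.0]
mse = [1.20, 1.00, 1.05, 1.60]
mse_se = [0.10, 0.08, 0.09, 0.12]

print(pick_penalty(penalties, mse, mse_se))
print(pick_penalty(penalties, mse, mse_se, se_factor=1.0))
```

A larger se_factor trades a little cross-validated error for a stronger penalty and therefore a sparser V.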

SparseSC.utils.match_space.MTLSTMMixed_MatchSpace_factory(T0=None, K_fixed=0, M_sizes=None, dropout_rate=0.2, epochs=2, verbose=0, hidden_length=100)

Return a MatchSpace function that will fit an LSTM of [X_fixed, X_time_varying, Y_pre] ~ Y with the hidden-layer size optimized to reduce errors on goal units

Parameters:

T0 -- length of Y_pre
K_fixed -- Number of fixed unit-covariates (rest will be assumed to be time-varying)
M_sizes -- list of sizes of hidden layer (match-space) sizes to try. Default is range(1, 2*int(np.log(Y.shape[0])))
dropout_rate --
epochs --
verbose --
hidden_length --
Returns:	MatchSpace fn, V vector, best_M_size, V

SparseSC.utils.match_space.MTLassoMixed_MatchSpace_factory(v_pens=None, n_v_cv=5)

Return a MatchSpace function that will fit a MultiTaskLassoCV for Y ~ X with the penalization optimized to reduce errors on goal units

Parameters:

v_pens -- Penalties to evaluate (default is to automatically determine)
n_v_cv -- Number of Cross-Validation folds
Returns:	MatchSpace fn, V vector, best_v_pen, V