API Reference

Estimate Treatment Effects

SparseSC.estimate_effects.estimate_effects(outcomes, unit_treatment_periods, T0=None, T1=None, covariates=None, max_n_pl=10000, ret_pl=False, ret_CI=False, level=0.95, fast=True, model_type='retrospective', T2=None, cf_folds=10, cf_seed=110011, **kwargs)

Determines statistical significance for average and individual effects

Parameters:
  • outcomes (np.array or pd.DataFrame with shape (N,T)) -- Outcomes
  • unit_treatment_periods (np.array or pd.Series with shape (N)) -- Vector of treatment periods for each unit. If a unit is never treated, use np.NaN when the vector refers to time periods by numerical index, or np.datetime64('NaT') when it refers to them by DateTime (and then Y must be a pd.DataFrame with DateTime columns too). In a prospective-based design these are the true treatment periods (and fit will be called with pseudo-treatment periods that are T1 periods earlier).
  • T0 (int, Optional (default is pre-period for first treatment)) -- pre-history length to match over.
  • T1 (int, Optional (Default is post-period for last treatment)) -- post-history length to fit over.
  • covariates (np.array or pd.DataFrame with shape (N,K), Optional) -- Additional pre-treatment features
  • max_n_pl (int, Optional) -- The full number of placebos is choose(N0,N1). If N1=1 then this is N0. This number grows quickly with N1, so a maximum number to compute can be set; when the cap binds, the placebos that are computed are drawn at random.
  • ret_pl (bool) -- Return the matrix of placebos (different from the SC of the controls when N1>1)
  • ret_CI -- Whether to return confidence intervals (requires more memory during execution)
  • level (float (between 0 and 1)) -- Level for confidence intervals
  • fast (bool) -- Whether to use the fast approximate solution (fit_fast() rather than fit())
  • model_type -- Model type
  • T2 -- If model_type='prospective', the period for which to evaluate the effect
  • kwargs -- Additional parameters passed to fit() or fit_fast()
Returns:

An instance of SparseSCEstResults with the fitted results

Raises:

ValueError -- when invalid parameters are passed

Keyword Args:

Passed on to fit() or fit_fast()
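
The placebo count described for max_n_pl can be worked through with a small sketch. It uses only the standard library, with a plain list standing in for the np.array that estimate_effects() expects; the six-unit treatment vector is hypothetical, and the cap is applied here as a simple minimum rather than by calling the library.

```python
import math

# Hypothetical treatment-period vector for 6 units, indexed by period number.
# NaN marks never-treated (control) units, as the parameter description requires.
nan = float("nan")
unit_treatment_periods = [nan, 8, nan, nan, 8, nan]

n_treated = sum(not math.isnan(p) for p in unit_treatment_periods)  # N1
n_control = len(unit_treatment_periods) - n_treated                 # N0

# Full number of placebo assignments: choose(N0, N1).
n_placebos_full = math.comb(n_control, n_treated)

# At most max_n_pl placebos are computed (the signature's default is used here).
max_n_pl = 10000
n_placebos_used = min(n_placebos_full, max_n_pl)

print(n_treated, n_control, n_placebos_full, n_placebos_used)  # 2 4 6 6
```

With N0=50 and N1=5 the full count is already math.comb(50, 5) = 2118760, which is why the cap matters once more than a handful of units are treated.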

class SparseSC.estimate_effects.SparseSCEstResults(outcomes, fits, unit_treatment_periods, unit_treatment_periods_idx, unit_treatment_periods_idx_fit, T0, T1, pl_res_pre, pl_res_post, pl_res_post_scaled, max_n_pl, covariates=None, ind_CI=None, model_type='retrospective', T2=None, pl_res_post_fit=None)

Bases: object

Holds estimation info

CI

Confidence interval for the current model if relevant, else None.

__init__(outcomes, fits, unit_treatment_periods, unit_treatment_periods_idx, unit_treatment_periods_idx_fit, T0, T1, pl_res_pre, pl_res_post, pl_res_post_scaled, max_n_pl, covariates=None, ind_CI=None, model_type='retrospective', T2=None, pl_res_post_fit=None)
Parameters:
  • outcomes -- Outcome for the whole sample
  • fits (dictionary of period->SparseSCFit) -- The fit() return objects
  • unit_treatment_periods -- Vector of treatment periods for each unit. If a unit is never treated, use np.NaN when the vector refers to time periods by numerical index, or np.datetime64('NaT') when it refers to them by DateTime (and then Y must be a pd.DataFrame with DateTime columns too).
  • unit_treatment_periods_idx -- the conversion of unit_treatment_periods to indexes (helpful if the user had a datetime index)
  • unit_treatment_periods_idx_fit -- the treatment period indexes passed to fit (helpful if prospective-based design)
  • T0 -- Pre-history to match over
  • T1 -- post-history to evaluate over
  • pl_res_pre (PlaceboResults) -- Statistics for the average fit of the treated units in the pre-period (used for diagnostics)
  • pl_res_post (PlaceboResults) -- Statistics for the average treatment effect in the post-period
  • pl_res_post_scaled (PlaceboResults) -- Statistics for the average scaled treatment effect (difference divided by pre-treatment RMS fit) in the post-period.
  • max_n_pl -- maximum number of placebo effects used for inference
  • covariates -- Nxk matrix of full baseline covariates (or None)
  • ind_CI (dictionary of period->CI_int. Each CI_int is for the full sample (not necessarily T0+T1)) -- Confidence intervals for SC predictions at the unit level (not averaged over N1). Used for graphing rather than treatment effect statistics
  • model_type -- Model type string
  • T2 -- T2 (if prospective-type design)
  • pl_res_post_fit -- If prospective-type designs, the PlaceboResults for target period used for fit (still before actual treatment)
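
The scaled effect held in pl_res_post_scaled is described as the difference divided by the pre-treatment RMS fit. A minimal sketch of that arithmetic for a single treated unit follows; all numbers and variable names are hypothetical, and the library's own aggregation across units and placebos is not reproduced here.

```python
import math

# Hypothetical observed outcomes and synthetic-control predictions for one unit.
pre_actual = [10.0, 11.0, 12.0, 13.0]
pre_sc = [8.0, 13.0, 10.0, 15.0]
post_actual = [16.0, 18.0]
post_sc = [14.0, 14.0]

# Pre-period fit: root-mean-square of the pre-treatment differences.
pre_diffs = [a - s for a, s in zip(pre_actual, pre_sc)]
pre_rms = math.sqrt(sum(d * d for d in pre_diffs) / len(pre_diffs))

# Post-period effects, raw and scaled by the pre-period RMS fit.
post_effects = [a - s for a, s in zip(post_actual, post_sc)]
post_effects_scaled = [e / pre_rms for e in post_effects]

print(pre_rms, post_effects, post_effects_scaled)  # 2.0 [2.0, 4.0] [1.0, 2.0]
```

Scaling in this way expresses each post-period gap in units of how well the unit was matched before treatment, so a poorly matched unit does not dominate the comparison.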
fit

Handy accessor if there is only one treatment-period

get_V(treatment_period=None)

Returns V split across potential pre-treatment outcomes and baseline features

get_W(treatment_period=None)

Get W (np.ndarray 2D or pd.DataFrame depends on what was passed in) for one of the treatment periods

get_sc(treatment_period=None)

Returns an NxT matrix of synthetic controls. For units not eligible (those previously treated or between treatment_period and treatment_period+T1) the result is left empty.

get_tr_time_info(treatment_period)

Returns treatment info: a) indexes for the time index (helpful if user used an np datetime index) b) treatment period used in the call to fit (helpful if model is a prospective type)

Parameters:treatment_period -- treatment period in the user's view (could be TimeIndex)
Returns:(treatment_period_idx, treatment_period_idx_fit, treatment_period_fit)
p_value

p-value for the current model if relevant, else None.

class SparseSC.utils.metrics_utils.PlaceboResults(effect_vec, avg_joint_effect, rms_joint_effect, N_placebo)

Bases: object

Holds statistics for a vector of effects, including the full vector and two choices of aggregates (average and RMS)

__init__(effect_vec, avg_joint_effect, rms_joint_effect, N_placebo)
Parameters:
  • effect_vec (EstResultCI) -- Statistics for a vector of time-specific effects.
  • avg_joint_effect (EstResultCI) -- Statistics for the average effect.
  • rms_joint_effect (EstResultCI) -- Statistics for the RMS effect
  • N_placebo (EstResultCI) -- Number of placebos used for the statistics
class SparseSC.utils.metrics_utils.EstResultCI(effect, p, ci=None, placebos=None)

Bases: object

Holds an estimation result (effect + statistical significance)

__init__(effect, p, ci=None, placebos=None)
Parameters:
  • effect (scalar, vector, or pd.Series) -- Effect
  • p (Scalar or vector) -- p-value
  • ci (CI_int) -- Confidence interval
  • placebos (matrix) -- Full matrix of placebos
class SparseSC.utils.metrics_utils.CI_int(ci_low, ci_high, level)

Bases: object

Class to hold information for a confidence interval (for a single point or for a vector)

__init__(ci_low, ci_high, level)
Parameters:
  • ci_low (scalar, vector, or pd.Series) -- Low-bound
  • ci_high (scalar, vector, or pd.Series) -- High-bound
  • level (float) -- Level (1-alpha) for the confidence interval
class SparseSC.utils.metrics_utils.AA_results(diffs_pre, diffs_post, level=0.95, sym_CI=True)

Bases: object

Constructs simple results object from AA test that has already been fit (model_type="full") and diffs constructed.

__init__(diffs_pre, diffs_post, level=0.95, sym_CI=True)

Provides typical stats for sims of estimation

Parameters:
  • diffs_pre -- Pre-treatment diffs
  • diffs_post -- vector of CI lower bounds
  • level -- level
  • sym_CI -- Return symmetric CIs. Will also set "treatment" effect to 0 rather than mean of placebo diffs.
Returns:

AA_results

Fit a Synthetic Controls Model (Slow, Joint)

SparseSC.fit.fit(features, targets, treated_units=None, w_pen=None, v_pen=None, grid=None, grid_min=1e-06, grid_max=1, grid_length=20, stopping_rule=2, gradient_folds=10, w_pen_inner=False, match_space_maker=None, **kwargs)
Parameters:
  • features (matrix of floats) -- Matrix of features
  • targets (matrix of floats) -- Matrix of targets
  • model_type (str, default = "retrospective") -- Type of model being fit. One of "retrospective", "prospective", "prospective-restricted" or "full"
  • treated_units (int[], Optional) -- An iterable indicating the rows of X and Y which contain data from treated units.
  • w_pen (float | float[], optional) -- Penalty applied to the difference between the current weights and the null weights (1/n). Default provided by w_pen_guestimate().
  • v_pen (float | float[], optional) -- penalty (penalties) applied to the magnitude of the covariate weights. Defaults to [ Lambda_c_max * g for g in grid], where Lambda_c_max is determined via get_max_v_pen() .
  • grid (float | float[], optional) -- only used when v_pen is not provided. Defaults to np.exp(np.linspace(np.log(grid_min),np.log(grid_max),grid_length))
  • grid_min (float, default = 1e-6) -- Lower bound for grid when v_pen and grid are not provided. Must be in the range (0,1)
  • grid_max (float, default = 1) -- Upper bound for grid when v_pen and grid are not provided. Must be in the range (0,1]
  • grid_length (int, default = 20) -- number of points in the grid parameter when v_pen and grid are not provided
  • stopping_rule (int, float, or function) -- A stopping rule less than one is interpreted as the percent improvement in the out-of-sample squared prediction error required between the current and previous iteration in order to continue with the coordinate descent. A stopping rule of one or greater is interpreted as the number of iterations of the coordinate descent (rounded down to the nearest Int). Alternatively, stopping_rule may be a function which will be passed the current model fit, the previous model fit, and the iteration number (depending on its signature), and should return a truthy value if the coordinate descent should stop and a falsey value if the coordinate descent should continue.
  • choice (str or function. default = "min") -- Method for choosing from among the v_pen. Only used when v_pen is an iterable. Defaults to "min" which selects the v_pen parameter associated with the lowest cross validation error.
  • cv_folds (int or (int[],int[])[], default = 10) -- An integer number of Cross Validation folds passed to sklearn.model_selection.KFold(), or an explicit list of train validation folds. TODO: These folds are calculated with KFold(...,shuffle=False), but instead, it should be assigned a random state.
  • gradient_folds (int or (int[],int[])[]) -- (default = 10) An integer number of Gradient folds passed to sklearn.model_selection.KFold(), or an explicit list of train validation folds, to be used when model_type is either "foo" or "bar".
  • cv_seed (int, default = 10101) -- passed to sklearn.model_selection.KFold() to allow for consistent cross validation folds across calls
  • gradient_seed (int, default = 10101) -- passed to sklearn.model_selection.KFold() to allow for consistent gradient folds across calls when model_type is either "foo" or "bar" and gradient_folds is an integer.
  • progress (boolean, default = True) -- Controls the level of verbosity. If True, messages indicating progress are printed to the console (stdout).
  • kwargs -- Additional arguments passed to the optimizer (i.e. method or scipy.optimize.minimize). See below.
  • custom_donor_pool (boolean, default = None) -- By default all control units are allowed to be donors for all units. There are cases where this is not desired and so the user can pass in a matrix specifying a unit-specific donor pool (NxC matrix of booleans). Common reasons for restricting the allowability: (a) When we would like to reduce interpolation bias by restricting the donor pool to those units similar along certain features. (b) If units are not completely independent (for example there may be contamination between neighboring units). This is a violation of the Single Unit Treatment Value Assumption (SUTVA). Note: These are not used in the fitting stage (of V and penalties) just in final unit weight determination.
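
The default grid and the way v_pen is derived from it can be reproduced by hand. The sketch below is a pure-Python equivalent of the documented np.exp(np.linspace(np.log(grid_min),np.log(grid_max),grid_length)) expression; the helper name log_spaced_grid and the value chosen for lambda_c_max are hypothetical (in the library the latter comes from get_max_v_pen()).

```python
import math

def log_spaced_grid(grid_min=1e-6, grid_max=1.0, grid_length=20):
    """Points evenly spaced on a log scale from grid_min to grid_max inclusive."""
    if grid_length < 2:
        raise ValueError("this sketch assumes grid_length >= 2")
    lo, hi = math.log(grid_min), math.log(grid_max)
    step = (hi - lo) / (grid_length - 1)
    return [math.exp(lo + i * step) for i in range(grid_length)]

grid = log_spaced_grid()

# Hypothetical stand-in for the value returned by get_max_v_pen().
lambda_c_max = 2.5

# Documented default: v_pen = [Lambda_c_max * g for g in grid].
v_pen = [lambda_c_max * g for g in grid]

print(len(v_pen), v_pen[0], v_pen[-1])
```

Because the points are spaced evenly in log space, consecutive penalties differ by a constant ratio, so the search covers six orders of magnitude with only 20 candidates.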
Keyword Args:
  • method (str or callable) -- The method or function responsible for performing gradient descent in the covariate space. If a string, it is passed as the method argument to scipy.optimize.minimize(). Otherwise, method must be a function with a signature compatible with scipy.optimize.minimize() (method(fun,x0,grad,**kwargs)) which returns an object having x and fun attributes. (Default = SparseSC.optimizers.cd_line_search.cdl_search())
  • learning_rate (float, Default = 0.2) -- The initial learning rate which determines the initial step size, which is set to learning_rate * null_model_error / gradient. Must be between 0 and 1.
  • learning_rate_adjustment (float, Default = 0.9) -- Adjustment factor applied to the learning rate between iterations when the optimal step size returned by scipy.optimize.line_search() is less than 1, else the step size is adjusted by 1/learning_rate_adjustment. Must be between 0 and 1.
  • tol (float, Default = 0.0001) -- Tolerance used for the stopping rule based on the proportion of the in-sample residual error reduced in the last step of the gradient descent.
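
The shape of a callable method can be illustrated with a toy optimizer. The sketch below follows the signature quoted above, method(fun, x0, grad, **kwargs), and returns an object with x and fun attributes. The function name, the fixed step size, and the use of SimpleNamespace as the return type are all assumptions made for illustration; this is not the library's default optimizer, and the keyword arguments fit() actually forwards are not documented here.

```python
from types import SimpleNamespace

def simple_gradient_descent(fun, x0, grad, step=0.1, max_iter=500, tol=1e-10, **kwargs):
    """Fixed-step gradient descent returning an object with x and fun attributes."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        x_new = [xi - step * gi for xi, gi in zip(x, g)]
        converged = abs(fun(x) - fun(x_new)) < tol
        x = x_new
        if converged:
            break
    return SimpleNamespace(x=x, fun=fun(x))

# Toy objective with its minimum at (3, -1), used only to exercise the function.
def objective(x):
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2

def objective_grad(x):
    return [2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)]

result = simple_gradient_descent(objective, [0.0, 0.0], objective_grad)
print(result.x, result.fun)
```

Unused keyword arguments are swallowed by **kwargs so that extra options passed through by a caller do not raise an error.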
Returns:

A SparseSCFit object containing details of the fitted model.

Return type:

SparseSCFit

Raises:

ValueError -- when treated_units is not None and not an iterable, or when model_type is not one of the allowed values
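
The three forms of stopping_rule can be summarized in a small decision function. This is one reading of the parameter description rather than the library's code: the name should_continue is hypothetical, scalar errors stand in for the model fits that a user-supplied function would receive, "percent improvement" is taken to mean a fraction of the previous error, and iteration is assumed to count completed iterations.

```python
import math

def should_continue(stopping_rule, prev_error, curr_error, iteration):
    """Return True if the coordinate descent should run another iteration."""
    if callable(stopping_rule):
        # A user function returns a truthy value when the descent should stop.
        return not stopping_rule(curr_error, prev_error, iteration)
    if stopping_rule >= 1:
        # Interpreted as an iteration count, rounded down.
        return iteration < math.floor(stopping_rule)
    if prev_error == 0:
        return False  # nothing left to improve
    # Interpreted as the relative improvement required to keep going.
    improvement = (prev_error - curr_error) / prev_error
    return improvement >= stopping_rule

print(should_continue(2, 1.0, 0.9, iteration=1))      # True
print(should_continue(2.9, 1.0, 0.9, iteration=2))    # False
print(should_continue(0.05, 100.0, 90.0, iteration=1))  # True
print(should_continue(0.05, 100.0, 99.0, iteration=1))  # False
```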

class SparseSC.fit.SparseSCFit(features, targets, control_units, treated_units, model_type, V, sc_weights, targets_sc=None, fitted_v_pen=None, fitted_w_pen=None, initial_v_pen=None, initial_w_pen=None, score=None, scores=None, selected_score=None, match_space_trans=None, match_space=None, match_space_desc=None)

Bases: object

A class representing the results of a Synthetic Control model instance.

__init__(features, targets, control_units, treated_units, model_type, V, sc_weights, targets_sc=None, fitted_v_pen=None, fitted_w_pen=None, initial_v_pen=None, initial_w_pen=None, score=None, scores=None, selected_score=None, match_space_trans=None, match_space=None, match_space_desc=None)
get_weights(include_trivial_donors=True)

getter for the sc_weights. By default, the trivial

predict(targets=None, include_trivial_donors=True)

predict method

Parameters:
  • targets ((optional) matrix of floats) -- Matrix of targets
  • include_trivial_donors (boolean) -- Whether donors for whom the selected predictors and all targets are equal to zero should be included in the weights for non-trivial units. These units will typically have a weight of 1 / total number of units as they do not contribute to the gradient. Default = `True`
Returns:

matrix of predicted outcomes

Return type:

matrix of floats

Raises:

ValueError -- When targets.shape[0] is inconsistent with the fitted model.

sc_weights

getter for the sc_weights. By default, the trivial

show()

Display figures illustrating goodness of fit

summary()

A summary of the model fit / penalty selection

This illustrates that (a) the grid function could / should be better, and (b) currently more than two iterations is typically useless.

Fit a Synthetic Controls Model (Fast, Separate)

SparseSC.fit_fast.fit_fast(features, targets, model_type='restrospective', treated_units=None, w_pens=None, custom_donor_pool=None, match_space_maker=None, w_pen_inner=True, avoid_NxN_mats=False, verbose=0, targets_aux=None, **kwargs)
Parameters:
  • features (matrix of floats) -- Matrix of features
  • targets (matrix of floats) -- Matrix of targets
  • model_type (str, default = "retrospective") -- Type of model being fit. One of "retrospective", "prospective", "prospective-restricted" or "full"
  • treated_units (int[], Optional) -- An iterable indicating the rows of X and Y which contain data from treated units.
  • w_pens (float[], default=np.logspace(start=-5, stop=5, num=40) (sklearn.RidgeCV can't automatically pick)) -- Penalization values to try when searching for unit weights.
  • custom_donor_pool (boolean, default = None) -- By default all control units are allowed to be donors for all units. There are cases where this is not desired and so the user can pass in a matrix specifying a unit-specific donor pool (NxC matrix of booleans). Common reasons for restricting the allowability: (a) When we would like to reduce interpolation bias by restricting the donor pool to those units similar along certain features. (b) If units are not completely independent (for example there may be contamination between neighboring units). This is a violation of the Single Unit Treatment Value Assumption (SUTVA). Note: These are not used in the fitting stage (of V and penalties) just in final unit weight determination.
  • match_space_maker -- Function with signature MatchSpace_transformer, V_vector, best_v_pen, V desc = match_space_maker(X, Y, fit_model_wrapper) where we can call fit_model_wrapper(MatchSpace_transformer, V_vector). Default is MTLassoCV_MatchSpace_factory().
  • avoid_NxN_mats (bool, default=False) -- There are several points where matrices on the order of NxN would typically be made (either N or N_c). With a large number of units these can be quite big. These can be avoided. One consequence is that the full unit-level weights will not be kept and just the built Synthetic Control outcome will be returned.
  • verbose (int, default=0) -- Verbosity level. 0 means no printouts. 1 will note times of completing each of the 3 main stages and some loop progress bars. 2 will print memory snapshots (Optionally out to a file if the env var SparseSC_log_file is set).
  • kwargs -- Additional parameters so that one can easily switch between fit() and fit_fast()
Returns:

A SparseSCFit object containing details of the fitted model.

Return type:

SparseSCFit

Raises:

ValueError -- when treated_units is not None and not an iterable, or when model_type is not one of the allowed values
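
A unit-specific donor pool of the kind custom_donor_pool describes (an NxC matrix of booleans) might be built as follows. The sketch restricts donors to units in the same region, which corresponds to reason (a) in the parameter description. The region labels, the column order (one column per control unit, in the order of control_units), and the exclusion of a unit from its own donor pool are assumptions, and nested lists stand in for the boolean array that would be passed in practice.

```python
# Hypothetical setup: 5 units; unit 0 is treated, units 1-4 are controls.
regions = ["north", "north", "south", "north", "south"]
control_units = [1, 2, 3, 4]

# custom_donor_pool[i][j] is True when control_units[j] may be a donor for unit i.
# Assumption: a control unit is never allowed to be its own donor.
custom_donor_pool = [
    [regions[i] == regions[c] and i != c for c in control_units]
    for i in range(len(regions))
]

for row in custom_donor_pool:
    print(row)
```

Here the treated unit 0 may draw only on controls 1 and 3, the two other northern units.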

SparseSC.utils.match_space.Fixed_V_factory(V)

Return a MatchSpace function with user-supplied V over raw X.

Parameters:V -- V Matrix on the raw features
Returns:a function with the signature MatchSpace fn, V vector, best_v_pen, V = function(X,Y)
SparseSC.utils.match_space.MTLassoCV_MatchSpace_factory(v_pens=None, n_v_cv=5, sample_frac=1, Y_col_block_size=None, se_factor=None, normalize=True)

Return a MatchSpace function that will fit a MultiTaskLassoCV for Y ~ X

Parameters:
  • v_pens -- Penalties to evaluate (default is to determine them automatically)
  • n_v_cv -- Number of Cross-Validation folds
  • sample_frac -- Fraction of the data to sample
  • se_factor -- Allows taking a different penalty than the min mse. Similar to the lambda.1se rule, if not None, it will take the max lambda that has mse < min_mse + se_factor*(MSE standard error).
Returns:

MatchSpace fn, V vector, best_v_pen, V
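
The se_factor rule can be sketched in a few lines. The function below picks the largest penalty whose cross-validated MSE is below min_mse + se_factor * (MSE standard error), as the parameter description states. The name pick_penalty and the sample numbers are hypothetical, and the standard error is assumed to be the one at the minimum-MSE penalty, as in the lambda.1se convention the description refers to.

```python
def pick_penalty(penalties, mse, mse_se, se_factor=None):
    """Choose a penalty by minimum MSE, or by the se_factor rule when it is set."""
    best = min(range(len(penalties)), key=lambda i: mse[i])
    if se_factor is None:
        return penalties[best]
    # Assumption: use the standard error at the minimum-MSE penalty.
    threshold = mse[best] + se_factor * mse_se[best]
    eligible = [p for p, m in zip(penalties, mse) if m < threshold]
    # Fall back to the minimum-MSE penalty if nothing clears the threshold.
    return max(eligible, default=penalties[best])

# Hypothetical cross-validation results for four candidate penalties.
penalties = [0.01, 0.1, 1.0, 10.0]
mse = [1.20, 1.00, 1.05, 1.50]
mse_se = [0.10, 0.08, 0.09, 0.20]

print(pick_penalty(penalties, mse, mse_se))               # 0.1
print(pick_penalty(penalties, mse, mse_se, se_factor=1))  # 1.0
```

A larger penalty gives a sparser V, so the rule trades a small, statistically insignificant increase in error for a simpler match space.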

SparseSC.utils.match_space.MTLSTMMixed_MatchSpace_factory(T0=None, K_fixed=0, M_sizes=None, dropout_rate=0.2, epochs=2, verbose=0, hidden_length=100)

Return a MatchSpace function that will fit an LSTM of [X_fixed, X_time_varying, Y_pre] ~ Y with the hidden-layer size optimized to reduce errors on goal units

Parameters:
  • T0 -- length of Y_pre
  • K_fixed -- Number of fixed unit-covariates (the rest are assumed to be time-varying)
  • M_sizes -- list of sizes of hidden layer (match-space) sizes to try. Default is range(1, 2*int(np.log(Y.shape[0])))
  • dropout_rate --
  • epochs --
  • verbose --
  • hidden_length --
Returns:

MatchSpace fn, V vector, best_M_size, V
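
The documented default for M_sizes, range(1, 2*int(np.log(Y.shape[0]))), depends only on the number of units, so the candidate hidden-layer sizes can be listed in advance. The helper below is a hypothetical illustration that uses math.log in place of np.log.

```python
import math

def default_m_sizes(n_units):
    """Candidate match-space sizes implied by the documented default."""
    return list(range(1, 2 * int(math.log(n_units))))

print(default_m_sizes(100))   # [1, 2, 3, 4, 5, 6, 7]
print(default_m_sizes(1000))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Since int(math.log(n)) is 0 for n below 3, the expression yields an empty range for one or two units, so an explicit M_sizes would be needed in that case.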

SparseSC.utils.match_space.MTLassoMixed_MatchSpace_factory(v_pens=None, n_v_cv=5)

Return a MatchSpace function that will fit a MultiTaskLassoCV for Y ~ X with the penalization optimized to reduce errors on goal units

Parameters:
  • v_pens -- Penalties to evaluate (default is to determine them automatically)
  • n_v_cv -- Number of Cross-Validation folds
Returns:

MatchSpace fn, V vector, best_v_pen, V