python - scikit ShuffleSplit raising pandas "IndexError: index N is out of bounds for axis 0 with size M"


I'm trying to use scikit-learn's `GridSearchCV` to find the best alpha for Lasso, and one of the parameters I want to iterate over is the cross-validation split. So I'm doing:

```python
# x_train := pandas DataFrame with no index (auto-numbered index), 62064 rows
# y_train := pandas 1-column DataFrame with no index (auto-numbered index), 62064 rows
from sklearn import linear_model as lm
from sklearn import cross_validation as cv
from sklearn import grid_search

model = lm.LassoCV(eps=0.001, n_alphas=1000)
params = {"cv": [cv.ShuffleSplit(n=len(x_train), test_size=0.2),
                 cv.ShuffleSplit(n=len(x_train), test_size=0.1)]}
m_model = grid_search.GridSearchCV(model, params)

m_model.fit(x_train, y_train)
```

but it raises an exception:

```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-113-f791cb0644c1> in <module>()
     10 m_model = grid_search.GridSearchCV(model, params)
     11 
---> 12 m_model.fit(x_train.as_matrix(), y_train.as_matrix())

/home/user/programs/repos/pyenv/versions/3.5.2/envs/work/lib/python3.5/site-packages/sklearn/grid_search.py in fit(self, X, y)
    802 
    803         """
--> 804         return self._fit(X, y, ParameterGrid(self.param_grid))
    805 
    806 

/home/user/programs/repos/pyenv/versions/3.5.2/envs/work/lib/python3.5/site-packages/sklearn/grid_search.py in _fit(self, X, y, parameter_iterable)
    551                                     self.fit_params, return_parameters=True,
    552                                     error_score=self.error_score)
--> 553                 for parameters in parameter_iterable
    554                 for train, test in cv)
    555 

/home/user/programs/repos/pyenv/versions/3.5.2/envs/work/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    798             # dispatched. In particular this covers the edge
    799             # case of Parallel used with an exhausted iterator.
--> 800             while self.dispatch_one_batch(iterator):
    801                 self._iterating = True
    802             else:

/home/user/programs/repos/pyenv/versions/3.5.2/envs/work/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
    656                 return False
    657             else:
--> 658                 self._dispatch(tasks)
    659                 return True
    660 

/home/user/programs/repos/pyenv/versions/3.5.2/envs/work/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
    564 
    565         if self._pool is None:
--> 566             job = ImmediateComputeBatch(batch)
    567             self._jobs.append(job)
    568             self.n_dispatched_batches += 1

/home/user/programs/repos/pyenv/versions/3.5.2/envs/work/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __init__(self, batch)
    178         # Don't delay the application, to avoid keeping the input
    179         # arguments in memory
--> 180         self.results = batch()
    181 
    182     def get(self):

/home/user/programs/repos/pyenv/versions/3.5.2/envs/work/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self)
     70 
     71     def __call__(self):
---> 72         return [func(*args, **kwargs) for func, args, kwargs in self.items]
     73 
     74     def __len__(self):

/home/user/programs/repos/pyenv/versions/3.5.2/envs/work/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0)
     70 
     71     def __call__(self):
---> 72         return [func(*args, **kwargs) for func, args, kwargs in self.items]
     73 
     74     def __len__(self):

/home/user/programs/repos/pyenv/versions/3.5.2/envs/work/lib/python3.5/site-packages/sklearn/cross_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, error_score)
   1529             estimator.fit(X_train, **fit_params)
   1530         else:
-> 1531             estimator.fit(X_train, y_train, **fit_params)
   1532 
   1533     except Exception as e:

/home/user/programs/repos/pyenv/versions/3.5.2/envs/work/lib/python3.5/site-packages/sklearn/linear_model/coordinate_descent.py in fit(self, X, y)
   1146                 for train, test in folds)
   1147         mse_paths = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
-> 1148                              backend="threading")(jobs)
   1149         mse_paths = np.reshape(mse_paths, (n_l1_ratio, len(folds), -1))
   1150         mean_mse = np.mean(mse_paths, axis=1)

/home/user/programs/repos/pyenv/versions/3.5.2/envs/work/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    798             # dispatched. In particular this covers the edge
    799             # case of Parallel used with an exhausted iterator.
--> 800             while self.dispatch_one_batch(iterator):
    801                 self._iterating = True
    802             else:

/home/user/programs/repos/pyenv/versions/3.5.2/envs/work/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
    656                 return False
    657             else:
--> 658                 self._dispatch(tasks)
    659                 return True
    660 

/home/user/programs/repos/pyenv/versions/3.5.2/envs/work/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
    564 
    565         if self._pool is None:
--> 566             job = ImmediateComputeBatch(batch)
    567             self._jobs.append(job)
    568             self.n_dispatched_batches += 1

/home/user/programs/repos/pyenv/versions/3.5.2/envs/work/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __init__(self, batch)
    178         # Don't delay the application, to avoid keeping the input
    179         # arguments in memory
--> 180         self.results = batch()
    181 
    182     def get(self):

/home/user/programs/repos/pyenv/versions/3.5.2/envs/work/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self)
     70 
     71     def __call__(self):
---> 72         return [func(*args, **kwargs) for func, args, kwargs in self.items]
     73 
     74     def __len__(self):

/home/user/programs/repos/pyenv/versions/3.5.2/envs/work/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0)
     70 
     71     def __call__(self):
---> 72         return [func(*args, **kwargs) for func, args, kwargs in self.items]
     73 
     74     def __len__(self):

/home/user/programs/repos/pyenv/versions/3.5.2/envs/work/lib/python3.5/site-packages/sklearn/linear_model/coordinate_descent.py in _path_residuals(X, y, train, test, path, path_params, alphas, l1_ratio, X_order, dtype)
    931         avoid memory copies
    932     """
--> 933     X_train = X[train]
    934     y_train = y[train]
    935     X_test = X[test]

IndexError: index 60527 is out of bounds for axis 0 with size 41376
```
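The two numbers in the last line can be checked with a little arithmetic. This sketch assumes `GridSearchCV` is using its default 3-fold outer split (the `sklearn.grid_search` default when `cv` is not passed, as in the code above); the helper `kfold_train_sizes` is hypothetical and written only for illustration, not taken from scikit-learn. It suggests that every outer training fold holds 41376 rows, exactly the size in the error, while a `ShuffleSplit` built with `n=62064` can hand out indices up to 62063.

```python
# Hypothetical helper for illustration; not part of scikit-learn.
def kfold_train_sizes(n_samples, n_folds=3):
    """Training-set size of each fold in a plain (unshuffled) k-fold split.

    The first n_samples % n_folds test folds get one extra sample, and each
    training set is everything outside its test fold.
    """
    base, extra = divmod(n_samples, n_folds)
    test_sizes = [base + 1 if i < extra else base for i in range(n_folds)]
    return [n_samples - size for size in test_sizes]


N_ROWS = 62064      # rows in x_train, from the question
BAD_INDEX = 60527   # index reported in the IndexError

# Assumption: default 3-fold outer split in GridSearchCV.
outer_train_sizes = kfold_train_sizes(N_ROWS, n_folds=3)
print(outer_train_sizes)  # → [41376, 41376, 41376]

# ShuffleSplit(n=62064) draws its indices from range(62064).
largest_possible_index = N_ROWS - 1
print(largest_possible_index, BAD_INDEX >= outer_train_sizes[0])  # → 62063 True
```

In other words, the inner split was sized for the full data set, but `LassoCV.fit` only ever sees two thirds of it.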

I tried using `x_train.as_matrix()` as well, but that didn't work either; it gives the same error.

Strangely, I can use it manually:

```python
cv_split = cv.ShuffleSplit(n=len(x_train), test_size=0.2)
for tr, te in cv_split:
    print(x_train.as_matrix()[tr], y_train.as_matrix()[tr])
```

Output:

```
[[0 0 0 ..., 0 0 1]
 [0 0 0 ..., 0 0 1]
 [0 0 0 ..., 0 0 1]
 ...,
 [0 0 0 ..., 0 0 1]
 [0 0 0 ..., 0 0 1]
 [0 0 0 ..., 0 0 1]] [2 1 1 ..., 1 4 1]
[[   0    0    0 ...,    0    0    1]
 [1720    0    0 ...,    0    0    1]
 [   0    0    0 ...,    0    0    1]
 ...,
 [ 773    0    0 ...,    0    0    1]
 [   0    0    0 ...,    0    0    1]
 [ 501    1    0 ...,    0    0    1]] [1 1 1 ..., 1 2 1]
```
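The manual loop succeeds because the split and the array have the same length: the indices come from `range(len(x_train))` and are applied to all of `x_train`. Inside `GridSearchCV` the same indices end up applied to an outer training fold instead. The sketch below mimics both cases with plain Python lists. `shuffle_split` is a hypothetical stand-in for the old `cv.ShuffleSplit(n=..., test_size=...)` (it rounds the test size up and shuffles `range(n)`), not scikit-learn's implementation, and `take` is likewise made up to play the role of `X[train]`.

```python
import math
import random


def shuffle_split(n, test_size, n_iter=10, seed=0):
    """Hypothetical stand-in for the old cross_validation.ShuffleSplit(n=...):
    yields (train, test) index lists drawn from a permutation of range(n)."""
    rng = random.Random(seed)
    n_test = math.ceil(test_size * n)
    for _ in range(n_iter):
        perm = list(range(n))
        rng.shuffle(perm)
        yield perm[n_test:], perm[:n_test]


def take(rows, indices):
    """Select rows by position, like X[train]; raises IndexError if out of range."""
    return [rows[i] for i in indices]


full = list(range(62064))   # stands in for x_train.as_matrix()
outer_fold = full[:41376]   # stands in for the rows LassoCV.fit receives

train, test = next(shuffle_split(len(full), test_size=0.2))
print(len(take(full, train)))  # manual case works → 49651

try:
    take(outer_fold, train)    # GridSearchCV case
except IndexError as exc:
    print("IndexError:", exc)
```

The failure is guaranteed rather than unlucky: the training part holds 49651 distinct indices, but only 41376 of the possible values are below 41376, so at least 8275 of them must be out of range for the smaller array.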

What am I not seeing here? Am I doing something wrong, or is this a scikit-learn bug?


Update 1

I just found out that the `cv` parameter is not supposed to be a `cv.ShuffleSplit` object. This is counterintuitive to me, since the docs say:

[screenshot of the documentation for the `cv` parameter]

Aren't the `cross_validation` classes an "object to be used as a cross-validation generator"?

Thanks!

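You shouldn't vary `cv` in the parameter grid. The idea is to keep the cross-validation fixed and use grid search over the other parameters. Putting `cv` in the grid sets `LassoCV`'s own inner `cv`, while `GridSearchCV` first splits `x_train` with its default 3-fold CV, so `LassoCV.fit` receives only 41376 of the 62064 rows. The `ShuffleSplit` objects, however, were built for `n=62064` and can yield indices up to 62063, which is why you get `index 60527 is out of bounds for axis 0 with size 41376`. Pass the split to `GridSearchCV` instead, like this: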

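```python
# 'learning_rate' is only a placeholder: LassoCV has no such parameter,
# so grid over one of its own parameters (e.g. 'eps') instead.
m_model = grid_search.GridSearchCV(model,
                                   {'learning_rate': [0.1, 0.05, 0.02]},
                                   cv=cv.ShuffleSplit(n=len(x_train), test_size=0.2))
```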
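A fixed `cv` on `GridSearchCV` works because `GridSearchCV` splits the full `x_train`, so a split built with `n=len(x_train)` matches the data it is applied to, while `LassoCV` keeps its default integer `cv`, which is resolved against whatever rows it receives. The newer `sklearn.model_selection` splitters (0.18 and later) avoid the mismatch by design: `ShuffleSplit(n_splits=..., test_size=...)` takes no `n`, and the row count comes from the data passed to `split(X)`. The sketch below illustrates that design in plain Python; `LazyShuffleSplit` is a hypothetical class written for illustration, not scikit-learn's code.

```python
import math
import random


class LazyShuffleSplit:
    """Hypothetical splitter in the style of sklearn.model_selection.ShuffleSplit:
    the number of rows is read from X when split() is called, not at construction."""

    def __init__(self, n_splits=5, test_size=0.2, seed=0):
        self.n_splits = n_splits
        self.test_size = test_size
        self.seed = seed

    def split(self, X):
        n = len(X)  # size taken from the data actually being split
        n_test = math.ceil(self.test_size * n)
        rng = random.Random(self.seed)
        for _ in range(self.n_splits):
            perm = list(range(n))
            rng.shuffle(perm)
            yield perm[n_test:], perm[:n_test]


splitter = LazyShuffleSplit(n_splits=2, test_size=0.2)
outer_fold = list(range(41376))  # e.g. the rows an inner estimator receives
for train, test in splitter.split(outer_fold):
    # Every index is valid for the array that was actually split.
    print(len(train), len(test), max(train + test) < len(outer_fold))  # → 33100 8276 True
```

Because the same splitter object can be reused on arrays of any length, it is safe to hand to an estimator that only sees part of the data.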
