scipy - How to force SVC to treat a user-provided kernel as sparse


SVC appears to treat kernels that can take sparse matrices differently from those that don't. However, if a user-provided kernel is written to take sparse matrices and a sparse matrix is provided during fit, SVC still converts the sparse matrix to dense and treats the kernel as dense, because the kernel is not one of the sparse kernels pre-defined in scikit-learn.

Is there a way to force SVC to recognize the kernel as sparse and not convert the sparse matrix to dense before passing it to the kernel?
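For concreteness, a kernel "written to take sparse matrices" might look like the sketch below. The name `sparse_linear_kernel` is hypothetical, and the sketch assumes CSR inputs on both sides: it keeps the matrix product in scipy.sparse form and only densifies the final Gram matrix, which is the (n_samples_X, n_samples_Y) array a callable SVC kernel is expected to return.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse


def sparse_linear_kernel(X, Y):
    """Hypothetical linear kernel written for scipy.sparse inputs.

    When X and Y are both sparse, X @ Y.T is computed as a sparse
    product; only the final Gram matrix is converted to a dense
    ndarray of shape (n_samples_X, n_samples_Y).
    """
    K = X @ Y.T
    if issparse(K):
        K = K.toarray()
    return np.asarray(K, dtype=np.float64)


# Small illustrative inputs (not the data from the question)
A = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0]]))
B = csr_matrix(np.array([[1.0, 1.0, 1.0]]))
K = sparse_linear_kernel(A, B)
print(K)  # a 2x1 dense array with both entries equal to 3.0
```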

Edit 1: Minimal working example

As an example, if SVC is given the string "linear" as its kernel on creation, the built-in linear kernel is used, sparse matrices are passed directly to it, and the support vectors are stored as sparse matrices if a sparse matrix is provided when fitting. However, if the linear_kernel function is passed to SVC instead, the sparse matrices are converted to ndarray before being passed to the kernel, and the support vectors are stored as ndarray.

    import numpy as np
    from scipy.sparse import csr_matrix
    from sklearn.metrics.pairwise import linear_kernel
    from sklearn.svm import SVC


    def make_random_sparsemat(m, n=1024, p=.94):
        """Make an mxn sparse matrix with 1-p probability of 1."""
        return csr_matrix(np.random.uniform(size=(m, n)) > p, dtype=np.float64)


    X = make_random_sparsemat(100)
    y = np.asarray(np.random.uniform(size=(100)) > .5, dtype=np.float64)

    # Built-in kernel, selected by name
    model1 = SVC(kernel="linear")
    model1.fit(X, y)
    print("Built-in kernel:")
    print("Kernel treated as sparse: {}".format(model1._sparse))
    print("Type of dual coefficients: {}".format(type(model1.dual_coef_)))
    print("Type of support vectors: {}".format(type(model1.support_vectors_)))

    # The same linear kernel, passed as a callable
    model2 = SVC(kernel=linear_kernel)
    model2.fit(X, y)
    print("User-provided kernel:")
    print("Kernel treated as sparse: {}".format(model2._sparse))
    print("Type of dual coefficients: {}".format(type(model2.dual_coef_)))
    print("Type of support vectors: {}".format(type(model2.support_vectors_)))

Output:

    Built-in kernel:
    Kernel treated as sparse: True
    Type of dual coefficients: <class 'scipy.sparse.csr.csr_matrix'>
    Type of support vectors: <class 'scipy.sparse.csr.csr_matrix'>
    User-provided kernel:
    Kernel treated as sparse: False
    Type of dual coefficients: <type 'numpy.ndarray'>
    Type of support vectors: <type 'numpy.ndarray'>
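One possible workaround, not part of the question and offered here as an untested assumption rather than a confirmed fix: compute the Gram matrix yourself from the sparse data and pass kernel="precomputed". The kernel computation then stays in scipy.sparse, and SVC only ever sees the dense (n_samples, n_samples) Gram matrix, which is what a callable kernel is used to compute anyway. SVC rejects sparse precomputed kernels, so the Gram matrix is densified before fitting. The helper `gram` and the random data are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.svm import SVC


def gram(A, B):
    """Hypothetical helper: linear Gram matrix A @ B.T, computed sparse, returned dense."""
    K = A @ B.T
    return K.toarray() if hasattr(K, "toarray") else np.asarray(K)


rng = np.random.RandomState(0)
X_train = csr_matrix(rng.uniform(size=(100, 1024)) > 0.94, dtype=np.float64)
y_train = (rng.uniform(size=100) > 0.5).astype(np.float64)
X_test = csr_matrix(rng.uniform(size=(10, 1024)) > 0.94, dtype=np.float64)

model = SVC(kernel="precomputed")
# Training Gram matrix: shape (n_train, n_train)
model.fit(gram(X_train, X_train), y_train)
# Test Gram matrix: rows are test samples, columns are training samples
predictions = model.predict(gram(X_test, X_train))
print(predictions.shape)  # (10,)
```

The trade-off is that you must build the test-versus-train Gram matrix yourself at prediction time.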

I'm fishing around in the dark here, working from the scikit-learn code I can find on GitHub.

A lot of the SVC linear code appears to be in a C library, and there is talk of its internal representation being sparse.

Your linear_kernel function does:

    X, Y = check_pairwise_arrays(X, Y)
    return safe_sparse_dot(X, Y.T, dense_output=True)
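A quick way to see this outside SVC is to call linear_kernel directly on sparse input. Because dense_output defaults to True, the Gram matrix comes back as an ndarray even though both inputs are sparse. This sketch assumes a scikit-learn release in which linear_kernel exposes dense_output as a parameter (the snippet above shows it hard-coded), and the small matrix is illustrative only.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse
from sklearn.metrics.pairwise import linear_kernel

X = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0]]))

# Default dense_output=True: sparse inputs, dense Gram matrix
K_dense = linear_kernel(X)
# Both operands are sparse, so with dense_output=False the result stays sparse
K_sparse = linear_kernel(X, dense_output=False)

print(type(K_dense), issparse(K_sparse))
```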

If I make X and y:

    In [119]: X
    Out[119]:
    <100x1024 sparse matrix of type '<class 'numpy.float64'>'
        with 6108 stored elements in Compressed Sparse Row format>

    In [120]: y = np.asarray(np.random.uniform(size=(100)) > .5, dtype=np.float64)

and recreate the safe_sparse_dot call:

    In [122]: safe_sparse_dot(y, X, dense_output=True)
    Out[122]: array([ 3.,  5.,  3., ...,  4.,  2.,  4.])

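So applying it to y and X (in the order that makes sense), I get a dense array. Changing the dense_output parameter doesn't change things, since that flag only controls the result when both operands are sparse. Basically, y*X, a dense array times a sparse matrix, returns a dense result.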
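The same effect can be reproduced with scipy alone: multiplying a sparse matrix by a 1-D dense array returns a plain ndarray, so there is nothing left for a dense_output=False flag to preserve. The small matrix here is illustrative, not the data from the question.

```python
import numpy as np
from scipy.sparse import csr_matrix

X = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0],
                         [4.0, 0.0, 0.0]]))
y = np.array([1.0, 0.0, 1.0])  # dense 1-D array, like y in the question

# y^T X computed as X.T @ y: sparse matrix times dense vector
yX = X.T @ y
print(type(yX), yX)  # a dense 1-D ndarray: [5. 0. 2.]
```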

If I make y sparse, I can get a sparse product:

    In [125]: ym = sparse.csr_matrix(y)

    In [126]: ym*X
    Out[126]:
    <1x1024 sparse matrix of type '<class 'numpy.float64'>'
        with 1000 stored elements in Compressed Sparse Row format>

    In [127]: safe_sparse_dot(ym, X, dense_output=False)
    Out[127]:
    <1x1024 sparse matrix of type '<class 'numpy.float64'>'
        with 1000 stored elements in Compressed Sparse Row format>

    In [128]: safe_sparse_dot(ym, X, dense_output=True)
    Out[128]: array([[ 3.,  5.,  3., ...,  4.,  2.,  4.]])
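Conversely, when both operands are sparse, the product stays sparse until you explicitly ask for a dense result. This sketch mirrors In [125] to In [128] with a small illustrative matrix; calling `.toarray()` plays the role of dense_output=True.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse

X = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0],
                         [4.0, 0.0, 0.0]]))
y = np.array([1.0, 0.0, 1.0])

ym = csr_matrix(y)   # 1x3 sparse row vector
prod = ym @ X        # sparse @ sparse -> sparse 1x3 matrix
print(issparse(prod), prod.shape)

dense = prod.toarray()  # explicit densification, like dense_output=True
print(dense)            # [[5. 0. 2.]]
```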

I don't know the inner workings of SVC and fit, but from working with sparse matrices I know you have to be careful when mixing sparse and dense matrices. It is easy to get a dense result, whether you want one or not.
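One more place where mixing goes wrong quietly is the meaning of `*`. On scipy's sparse matrix classes such as csr_matrix (used in the session above), `*` is matrix multiplication, but on the sparse array classes such as csr_array and on numpy arrays it is element-wise. Using `@` avoids the ambiguity. This sketch assumes scipy 1.8 or later, where csr_array exists.

```python
import numpy as np
from scipy.sparse import csr_matrix, csr_array

dense = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
M = csr_matrix(dense)  # sparse matrix class: '*' means matrix product
A = csr_array(dense)   # sparse array class: '*' means element-wise product

print((M * M).toarray())  # matrix product, values [[7, 10], [15, 22]]
print((A * A).toarray())  # element-wise, values [[1, 4], [9, 16]]
print((A @ A).toarray())  # '@' is the matrix product for both classes
```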