scipy - How to force SVC to treat a user-provided kernel as sparse
SVC appears to treat kernels that can take sparse matrices differently from those that can't. However, if a user-provided kernel is written to take sparse matrices, and a sparse matrix is provided during fit, SVC still converts the sparse matrix to dense and treats the kernel as dense, because the kernel is not one of the sparse kernels predefined in scikit-learn.

Is there a way to force SVC to recognize the kernel as sparse, and not convert the sparse matrix to dense before passing it to the kernel?
Edit 1: minimal working example

As an example, if SVC is passed the string "linear" as the kernel upon creation, the built-in linear kernel is used, sparse matrices are passed directly to it, and the support vectors are stored as sparse matrices if a sparse matrix is provided when fitting. However, if the linear_kernel function itself is passed to SVC instead, sparse matrices are converted to ndarray before being passed to the kernel, and the support vectors are stored as ndarray.
    import numpy as np
    from scipy.sparse import csr_matrix
    from sklearn.metrics.pairwise import linear_kernel
    from sklearn.svm import SVC

    def make_random_sparsemat(m, n=1024, p=.94):
        """Make an m x n sparse matrix where each entry is 1 with probability 1-p."""
        return csr_matrix(np.random.uniform(size=(m, n)) > p, dtype=np.float64)

    X = make_random_sparsemat(100)
    Y = np.asarray(np.random.uniform(size=(100)) > .5, dtype=np.float64)

    # Built-in kernel, passed by name
    model1 = SVC(kernel="linear")
    model1.fit(X, Y)
    print("Built-in kernel:")
    print("Kernel treated as sparse: {}".format(model1._sparse))
    print("Type of dual coefficients: {}".format(type(model1.dual_coef_)))
    print("Type of support vectors: {}".format(type(model1.support_vectors_)))

    # The same kernel, passed as a callable
    model2 = SVC(kernel=linear_kernel)
    model2.fit(X, Y)
    print("User-provided kernel:")
    print("Kernel treated as sparse: {}".format(model2._sparse))
    print("Type of dual coefficients: {}".format(type(model2.dual_coef_)))
    print("Type of support vectors: {}".format(type(model2.support_vectors_)))
Output:

    Built-in kernel:
    Kernel treated as sparse: True
    Type of dual coefficients: <class 'scipy.sparse.csr.csr_matrix'>
    Type of support vectors: <class 'scipy.sparse.csr.csr_matrix'>
    User-provided kernel:
    Kernel treated as sparse: False
    Type of dual coefficients: <type 'numpy.ndarray'>
    Type of support vectors: <type 'numpy.ndarray'>
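The same comparison can be made with public attributes only, instead of the private `_sparse` flag. This is a minimal sketch assuming a recent NumPy, SciPy and scikit-learn; `scipy.sparse.issparse` is a real SciPy function, while the seeded random data and the helper name `fit_and_report` are illustrative. It only reports what SVC does; it does not change it.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse
from sklearn.metrics.pairwise import linear_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)  # seeded so the run is reproducible

def make_random_sparsemat(m, n=1024, p=.94):
    """Make an m x n sparse matrix where each entry is 1 with probability 1-p."""
    return csr_matrix(rng.uniform(size=(m, n)) > p, dtype=np.float64)

X = make_random_sparsemat(100)
Y = np.asarray(rng.uniform(size=100) > .5, dtype=np.float64)

def fit_and_report(kernel):
    """Hypothetical helper: fit an SVC and report which fitted attributes are sparse."""
    model = SVC(kernel=kernel).fit(X, Y)
    return {
        "dual_coef_sparse": issparse(model.dual_coef_),
        "support_vectors_sparse": issparse(model.support_vectors_),
    }

builtin = fit_and_report("linear")             # kernel given by name
user_provided = fit_and_report(linear_kernel)  # same kernel given as a callable
print("built-in:", builtin)
print("user-provided:", user_provided)
```

`issparse` also recognizes SciPy's newer sparse array classes, so the check does not depend on exactly which sparse type a given scikit-learn version returns.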
I'm fishing around in the dark here, working from the scikit-learn code I can find on GitHub.
A lot of the SVC linear code appears to be in a C library, and there is talk of its internal representation being sparse.
Your linear_kernel function just does:

    X, Y = check_pairwise_arrays(X, Y)
    return safe_sparse_dot(X, Y.T, dense_output=True)
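In recent scikit-learn releases (0.20 and later, as far as I know), `linear_kernel` itself also takes a `dense_output` argument, defaulting to `True`, which controls whether a product of two sparse inputs is densified. A minimal sketch, with illustrative seeded data in place of the question's `X`, showing that the kernel function can return a sparse Gram matrix when asked:

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse
from sklearn.metrics.pairwise import linear_kernel

rng = np.random.default_rng(0)
# Illustrative 0/1 sparse matrix, roughly 6% nonzero, like make_random_sparsemat(100)
X = csr_matrix(rng.uniform(size=(100, 1024)) > .94, dtype=np.float64)

gram_dense = linear_kernel(X, X)                       # default: dense ndarray
gram_sparse = linear_kernel(X, X, dense_output=False)  # both inputs sparse, so stays sparse

print(type(gram_dense).__name__, gram_dense.shape)
print(issparse(gram_sparse), gram_sparse.shape)
```

Whether SVC then keeps such a result sparse is a separate matter; the output in the question suggests that a callable kernel is handled on the dense path regardless.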
If I make your X and Y:

    In [119]: X
    Out[119]:
    <100x1024 sparse matrix of type '<class 'numpy.float64'>'
        with 6108 stored elements in Compressed Sparse Row format>

    In [120]: Y = np.asarray(np.random.uniform(size=(100)) > .5, dtype=np.float64)
and recreate the safe_sparse_dot call:

    In [122]: safe_sparse_dot(Y, X, dense_output=True)
    Out[122]: array([ 3.,  5.,  3., ...,  4.,  2.,  4.])
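So applying it to Y and X (in the only order that makes sense), I get a dense array. Changing the dense_output parameter doesn't change things: as far as I can tell, safe_sparse_dot only uses it to turn a sparse result into a dense one, and a product that mixes dense and sparse operands is never sparse in the first place. Basically, Y*X, a dense vector times a sparse matrix, returns a dense result.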
If I make Y sparse, I can get a sparse product:

    In [125]: Ym = sparse.csr_matrix(Y)

    In [126]: Ym*X
    Out[126]:
    <1x1024 sparse matrix of type '<class 'numpy.float64'>'
        with 1000 stored elements in Compressed Sparse Row format>

    In [127]: safe_sparse_dot(Ym, X, dense_output=False)
    Out[127]:
    <1x1024 sparse matrix of type '<class 'numpy.float64'>'
        with 1000 stored elements in Compressed Sparse Row format>

    In [128]: safe_sparse_dot(Ym, X, dense_output=True)
    Out[128]: array([[ 3.,  5.,  3., ...,  4.,  2.,  4.]])
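The underlying rule (mixed sparse/dense products come back dense, sparse times sparse stays sparse) can be seen with NumPy and SciPy alone, without safe_sparse_dot. A minimal sketch, using seeded random data in place of the question's `X` and `Y`:

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
X = sparse.csr_matrix(rng.uniform(size=(100, 1024)) > .94, dtype=np.float64)
Y = np.asarray(rng.uniform(size=100) > .5, dtype=np.float64)

# Mixed operands: sparse X^T times the dense vector Y gives the same numbers
# as Y*X, and the result is a plain 1-D ndarray.
mixed = X.T.dot(Y)

# Both operands sparse: the product stays sparse.
Ym = sparse.csr_matrix(Y)   # shape (1, 100)
both_sparse = Ym.dot(X)     # shape (1, 1024)

print(type(mixed).__name__, mixed.shape)
print(sparse.issparse(both_sparse), both_sparse.shape)
```

`.dot` is used here rather than `*` because with SciPy's newer sparse array classes (such as `csr_array`) `*` means element-wise multiplication, whereas `.dot` and `@` mean matrix multiplication for both the matrix and array classes.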
I don't know the workings of SVC and its fit with sparse matrices, but I do know that you have to be careful when mixing sparse and dense matrices. It's easy to end up with a dense result, whether you want one or not.
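One workaround sidesteps the question rather than answering it: compute the Gram matrix yourself from the sparse data and pass `kernel="precomputed"`, so your kernel function receives sparse matrices because you are the one calling it. This is a minimal sketch under stated assumptions, not a change to SVC's internals: it assumes a recent scikit-learn, `my_sparse_kernel` is a hypothetical stand-in for your own sparse-aware kernel (here it just wraps `linear_kernel`), the 80/20 split is illustrative, and the Gram matrix handed to SVC is still dense, since as far as I know SVC does not accept a sparse precomputed kernel.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import linear_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = csr_matrix(rng.uniform(size=(100, 1024)) > .94, dtype=np.float64)
Y = np.asarray(rng.uniform(size=100) > .5, dtype=np.float64)

# Illustrative split: first 80 rows for training, last 20 for testing.
X_train, X_test = X[:80], X[80:]
Y_train = Y[:80]

def my_sparse_kernel(A, B):
    """Hypothetical user kernel that works directly on sparse input.
    Here it simply wraps linear_kernel, which returns a dense Gram matrix."""
    return linear_kernel(A, B)

# The kernel is evaluated on the sparse matrices here, outside SVC.
gram_train = my_sparse_kernel(X_train, X_train)  # (80, 80): train vs train
gram_test = my_sparse_kernel(X_test, X_train)    # (20, 80): test rows vs train columns

model = SVC(kernel="precomputed")
model.fit(gram_train, Y_train)
predictions = model.predict(gram_test)
print(predictions.shape)
```

The trade-off is that the sparse training matrix has to be kept around, because prediction needs the kernel between each test point and every training point.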