python - Import a sparse matrix from a CSV file
I have a CSV file, test.csv, that contains a sparse matrix:

```
"a","b","c","d","e","f","timestamp"
611.88243,0,0,0,0,0,0
0,9089.5601,0,864.07514,0,0,0
0,0,5133.0,0,0,0,0
```
I want to load it as a sparse matrix/ndarray with 3 rows and 7 columns. If I use numpy.loadtxt, I get a dense array with 3 rows and 7 columns:

```python
numpy.loadtxt(open("test.csv", "rb"), delimiter=",", skiprows=1)
```

The real file, however, is huge: 10,000 columns and 7,000 rows, so it takes a lot of time to load. Is there a more efficient method in scipy/numpy that loads the data as a sparse matrix or array, and takes less time by exploiting the sparsity?
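For reference, here is the dense baseline as a self-contained, runnable snippet (the sample rows are the ones from the question, fed in from memory instead of test.csv):

```python
import io
import numpy as np

# the sample data from the question, as an in-memory CSV
csv_text = (
    '"a","b","c","d","e","f","timestamp"\n'
    "611.88243,0,0,0,0,0,0\n"
    "0,9089.5601,0,864.07514,0,0,0\n"
    "0,0,5133.0,0,0,0,0\n"
)

# skiprows=1 skips the header; the result is a dense (3, 7) float array
arr = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
print(arr.shape)  # (3, 7)
```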
I tested a bare-bones loadtxt on your data (replicated to produce a (39, 7) array):

```python
import numpy as np

def my_loadtxt(file):
    # bare-bones loadtxt
    f = open(file)
    h = f.readline()  # skip the header line
    ll = []
    for l in f:
        y = [float(x) for x in l.split(',')]
        ll.append(y)
    x = np.array(ll)
    f.close()
    return x
```
It is about 2x as fast as np.loadtxt.

The result can then be turned into a sparse matrix, e.g. sparse.csr_matrix(my_loadtxt(...)). But that step isn't going to save any time.
Conceivably each data line y in the function could be turned into a sparse matrix, and those collected into one large sparse matrix. One would have to know the scipy.sparse matrix types well to do that efficiently. I'm not optimistic about it saving time.
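That per-line idea can be sketched by collecting only the nonzero entries in COO (coordinate) form; my_loadtxt_sparse and its internals below are my illustration, not tested code from the answer:

```python
import numpy as np
from scipy import sparse

def my_loadtxt_sparse(file):
    # hypothetical sketch: record only nonzero entries as (row, col, value)
    rows, cols, vals = [], [], []
    with open(file) as f:
        f.readline()  # skip the header line
        for i, line in enumerate(f):
            for j, tok in enumerate(line.split(',')):
                v = float(tok)
                if v != 0:
                    rows.append(i)
                    cols.append(j)
                    vals.append(v)
    # assumes at least one data row; i and j still hold the last indices seen
    return sparse.coo_matrix((vals, (rows, cols)),
                             shape=(i + 1, j + 1)).tocsr()
```

Note that this still parses every token with float(), so the per-line cost is roughly the same as my_loadtxt; the saving is only in memory, not load time.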
If you are going to load the file once, or only rarely, you'll have to use loadtxt or a simplified version of it, and accept the time cost. If you have to load it frequently, saving it in a more efficient form might be worth the effort.

You could try a plain numpy save and load (though in my tests that was slower).
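A minimal sketch of that save/load round trip (the file path and the small array are illustrative; the real array would come from my_loadtxt or np.loadtxt):

```python
import os
import tempfile
import numpy as np

# illustrative data standing in for the loaded CSV contents
x = np.array([[611.88243, 0.0, 0.0],
              [0.0, 9089.5601, 0.0]])

path = os.path.join(tempfile.mkdtemp(), "test_dense.npy")

np.save(path, x)    # one-time conversion to binary .npy
x2 = np.load(path)  # later loads skip text parsing entirely

print(np.array_equal(x, x2))  # True
```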
A couple of formats in scipy.io can save sparse matrices. For example, the MATLAB-compatible format:

```python
from scipy import io, sparse

io.savemat('stack24426239.mat', {'x2': sparse.csr_matrix(x1)})
x2 = io.loadmat('stack24426239.mat')['x2']
```
In small tests, loadmat was a bit faster than my_loadtxt. I don't know how it would fare on a larger file. The '.mat' file is also a bit smaller than the '.txt' one.
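As an aside beyond the original answer: newer SciPy releases (0.19+) also provide a native on-disk format for sparse matrices, scipy.sparse.save_npz/load_npz, which avoids the MATLAB container entirely. A sketch (file path and data are illustrative):

```python
import os
import tempfile
import numpy as np
from scipy import sparse

x1 = sparse.csr_matrix(np.array([[0.0, 611.88243, 0.0],
                                 [9089.5601, 0.0, 0.0]]))

path = os.path.join(tempfile.mkdtemp(), "stack24426239.npz")

sparse.save_npz(path, x1)  # stores the CSR data/indices/indptr arrays
x2 = sparse.load_npz(path)

print((x1 - x2).nnz)  # 0 -> the round trip is exact
```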