python - Import sparse matrix from csv file


I have a CSV file with headers. The given test.csv file contains a sparse matrix:

"a","b","c","d","e","f","timestamp" 611.88243,0,0,0,0,0,0 0,9089.5601,0,864.07514,0,0,0 0,0,5133.0,0,0,0,0 

I want to load this as a sparse matrix/ndarray with 3 rows and 7 columns. If I use loadtxt, I get an array with 3 rows and 7 columns:

numpy.loadtxt(open("test.csv","rb"),delimiter=",",skiprows=1) 

Now the real file is huge, with 10,000 columns and 7,000 rows, so it takes a long time to load. Is there a more efficient method in scipy/numpy to load the matrix as a sparse matrix or array, one that takes less time by taking advantage of the sparsity?

I tested a bare-bones loadtxt on your data (replicated to produce a (39,7) array):

def my_loadtxt(file):
    # bare-bones loadtxt
    f = open(file)
    h = f.readline()        # skip the header line
    ll = []
    for l in f:
        y = [float(x) for x in l.split(',')]
        ll.append(y)
    x = np.array(ll)
    f.close()
    return x

It is about 2x as fast as np.loadtxt.

The result can then be turned into a sparse matrix, e.g. sparse.csr_matrix(my_loadtxt(...)). That step isn't going to save time, though.

Conceivably each data line y in the function could be turned into a sparse matrix and collected into one large sparse matrix. But one would have to know the scipy.sparse matrix types well to do that efficiently. I'm not optimistic about it saving time.
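As a sketch of that idea (not from the answer above, so treat it as an assumption): instead of a per-line sparse matrix, collect only the nonzero entries while reading and build one COO matrix at the end. The function name load_sparse_csv is illustrative.

```python
import numpy as np
from scipy import sparse

def load_sparse_csv(fname):
    # Read a dense CSV (one header row) but store only the nonzero
    # entries, then assemble a single scipy.sparse COO matrix.
    rows, cols, vals = [], [], []
    nrows = ncols = 0
    with open(fname) as f:
        f.readline()                      # skip the header line
        for i, line in enumerate(f):
            fields = line.strip().split(',')
            ncols = len(fields)
            nrows = i + 1
            for j, field in enumerate(fields):
                v = float(field)
                if v != 0.0:              # keep nonzeros only
                    rows.append(i)
                    cols.append(j)
                    vals.append(v)
    return sparse.coo_matrix((vals, (rows, cols)), shape=(nrows, ncols))
```

Note that float() is still called on every field, so the parsing cost is unchanged; the gain, if any, is in peak memory when the matrix is very sparse.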


If you are going to load the file once, or rarely, you'll have to use loadtxt or a simplified version of it, and accept the time cost. If you have to load it frequently, saving it in a more efficient form might be worth it.

You could try a simple numpy save and load (though in my tests that was slower).
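A minimal sketch of that caching route, with illustrative filenames and a small array standing in for the parsed CSV data: pay the text-parsing cost once, then reuse NumPy's binary format on later runs.

```python
import tempfile, os
import numpy as np

# Small dense array standing in for the result of parsing the CSV.
x = np.array([[611.88243, 0.0, 0.0],
              [0.0, 9089.5601, 864.07514]])

path = os.path.join(tempfile.mkdtemp(), 'test.npy')
np.save(path, x)        # one-time cost after the first CSV parse
x2 = np.load(path)      # later loads skip text parsing entirely
```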

A couple of formats in scipy.io can save sparse matrices, for example the MATLAB-compatible format:

io.savemat('stack24426239.mat', {'x2': sparse.csr_matrix(x1)})
x2 = io.loadmat('stack24426239.mat')['x2']

In small tests, loadmat was a bit faster than my_loadtxt. I don't know how it would fare on a larger file. The '.mat' file is also a bit smaller than the '.txt'.
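On newer SciPy (0.19 and later, so after this answer was written) there is also scipy.sparse.save_npz/load_npz, which stores a sparse matrix directly without the MATLAB detour; a hedged sketch with illustrative filenames:

```python
import tempfile, os
import numpy as np
from scipy import sparse

x1 = np.array([[611.88243, 0.0, 0.0],
               [0.0, 9089.5601, 0.0]])
m = sparse.csr_matrix(x1)

path = os.path.join(tempfile.mkdtemp(), 'cache.npz')
sparse.save_npz(path, m)          # stores only the nonzero structure
m2 = sparse.load_npz(path)
```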

