Python - memory issues with big numpy/scipy arrays
I have the code snippet below. data/imat are data matrices of size 100000 x 500, while the matrix s that I'm constructing is of order 50000 x 100000. The matrix s is super sparse, with 1 entry in each column.

    import numpy as np
    from scipy import sparse

    def getsparsecoverr(imat, sketch):
        # Relative spectral-norm error between the two Gram matrices.
        ata = np.dot(imat.transpose(), imat)
        btb = sketch.transpose().dot(sketch)
        fn = np.linalg.norm(imat, 'fro') ** 2
        val = np.linalg.norm(ata - btb, 2) / fn
        del ata
        del btb
        return val

    nrows, ncols = data.shape
    samples = noofsamples(ncols, eps, delta)
    cols = np.arange(nrows)
    rows = np.random.random_integers(samples - 1, size=nrows)
    # Random +1/-1 sign for each column of s.
    diag = []
    for _ in range(len(cols)):
        if np.random.random() < 0.5:
            diag.append(1)
        else:
            diag.append(-1)
    s = sparse.csc_matrix((diag, (rows, cols)), shape=(samples, nrows)) / np.sqrt(samples)
    q = s.dot(data)
    q = sparse.bsr_matrix(q)
    print(getsparsecoverr(data, q))

When I run the above code the first time, it gives me the output of the print statement. After that, if I run it again, I get the error below:
python: malloc.c:2369: sysmalloc: assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end & pagemask) == 0)' failed.
Then, if I run it once again, I get something like:

        q = sparse.bsr_matrix(q)
      File "/usr/lib64/python2.7/site-packages/scipy/sparse/bsr.py", line 170, in __init__
        arg1 = coo_matrix(arg1, dtype=dtype).tobsr(blocksize=blocksize)
      File "/usr/lib64/python2.7/site-packages/scipy/sparse/coo.py", line 186, in __init__
        self.data = m[self.row, self.col]
    IndexError: index -1517041769959067988 is out of bounds for axis 0 with size 178133
    None

It seems to me that the first run is creating memory issues. How can I debug this, and what are the possible problems and solutions?
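For scale, here is a rough size estimate of the arrays named above. It is only a sketch: it assumes float64 values (8 bytes) and int32 sparse indices (4 bytes), neither of which is stated in the question, and the helper names dense_bytes and csc_bytes are made up for illustration.

```python
# Back-of-the-envelope memory footprints for the matrices in the question.
# Assumptions (not stated in the question): float64 values, int32 indices.
FLOAT_BYTES = 8
INDEX_BYTES = 4


def dense_bytes(n_rows, n_cols, itemsize=FLOAT_BYTES):
    """Bytes needed for a dense n_rows x n_cols array."""
    return n_rows * n_cols * itemsize


def csc_bytes(n_cols, nnz, itemsize=FLOAT_BYTES, index_size=INDEX_BYTES):
    """Bytes for a CSC matrix: values + row indices + column pointers."""
    return nnz * itemsize + nnz * index_size + (n_cols + 1) * index_size


sizes = [
    ("data (100000 x 500, dense)", dense_bytes(100000, 500)),
    ("s (50000 x 100000, CSC, 1 entry per column)", csc_bytes(100000, 100000)),
    ("s if it were stored dense", dense_bytes(50000, 100000)),
    ("q = s.dot(data) (50000 x 500, dense)", dense_bytes(50000, 500)),
    ("ata (500 x 500, dense)", dense_bytes(500, 500)),
]

for name, n in sizes:
    print("%-45s %12.1f MiB" % (name, n / 2.0 ** 20))
```

Under these assumptions data alone is 100000 * 500 * 8 = 400,000,000 bytes, so a few live copies of arrays this size add up quickly, while s stays small as long as it is kept sparse.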
Would this work?

    def getsparsecoverr(imat, sketch):
        # ord=2 (spectral norm) on the difference, as in the question's version.
        return (np.linalg.norm(np.dot(imat.transpose(), imat)
                               - sketch.transpose().dot(sketch), 2)
                / (np.linalg.norm(imat, 'fro') ** 2))

    def getq(data, rows, cols, diag, samples, nrows):
        # Build s, scale it, and project data without keeping s around.
        return sparse.bsr_matrix(
            (sparse.csc_matrix((diag, (rows, cols)), shape=(samples, nrows))
             / np.sqrt(samples)).dot(data))

    print(getsparsecoverr(data, getq(data, rows, cols, diag, samples, nrows)))

That is, trying to keep as many things out of scope as possible. I might have the parentheses wrong, since it's hard to test without your functions.
If not, I assume that one of the functions is storing or changing state, or holding on to data.
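The idea of keeping temporaries out of scope can be checked on a toy object. This is a minimal sketch, assuming CPython's reference counting; Block is a hypothetical stand-in for a large temporary array, not something from the original code.

```python
import gc
import weakref


class Block(object):
    """Hypothetical stand-in for a large temporary array."""

    def __init__(self, n):
        self.payload = bytearray(n)


def compute_in_scope(n):
    # tmp exists only inside this function; a weak reference lets us
    # observe its lifetime without keeping it alive.
    tmp = Block(n)
    ref = weakref.ref(tmp)
    return len(tmp.payload), ref


result, temp_ref = compute_in_scope(1024)
gc.collect()

# A module-level name, by contrast, keeps its object alive.
kept = Block(1024)
kept_ref = weakref.ref(kept)

print("temporary released after return:", temp_ref() is None)
print("module-level object still alive:", kept_ref() is not None)
```

If the temporary is not released in a check like this, something else (a global, a cache, or a stored attribute) is still holding a reference to it.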
Using your original code, and given that you use IPython, you can do the following:

    In [5]: %%bash
    ps -e -orss=,args= | sort -b -k1,1n | pr -TW$COLUMNS | tail -n 10

to monitor the allocation of memory at each step of your code and nail down the problem.
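A similar check can be done from inside Python. The following is a rough sketch using the standard-library resource module (Unix only); peak_rss and log_memory are hypothetical helper names, and the bytes allocation is just a stand-in for one step of the real computation.

```python
import resource
import sys


def peak_rss():
    """Peak resident set size of this process.

    Reported in kilobytes on Linux and in bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def log_memory(label):
    """Print the peak RSS with a label and return it."""
    peak = peak_rss()
    sys.stdout.write("%-28s peak RSS = %d\n" % (label, peak))
    return peak


before = log_memory("start")

# Stand-in for an expensive step such as q = s.dot(data).
buf = b"\x01" * (50 * 1024 * 1024)
after = log_memory("after 50 MiB allocation")

del buf
# ru_maxrss is a high-water mark, so it does not drop after del.
final = log_memory("after del")
```

Calling log_memory after each line of the original script shows which step raises the high-water mark. Because the value never decreases, it identifies where memory is first needed, not whether it was released afterwards.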