Clojure performance on set updates or large number computations -


i'm trying sum prime numbers euler project problem using sieve algorithm. i'm using mutable set store numbers aren't prime , using 'dosync' , 'commute' update set (otherwise ran out of memory if immutable) . performance linear till 1.2 million primes, performance terrible @ 1.5 million (7 seconds vs. 64 seconds). ideas i'm doing wrong? guess possibly numbers getting large, or maybe updating mutable sets inefficient.

(defn mark-multiples [not-prime-set multiple prime-max]   (loop [ counter (long 2)        product (* counter multiple)]     (if (> product prime-max) not-prime-set       (do         (dosync (commute not-prime-set conj product))         (recur  (inc counter) (* (inc counter) multiple))))))   (defn sieve-summation [prime-max]   (def not-prime-set (ref #{ (long 1) }) )   (loop [counter (long 2)      summation (long 0)]     (if (> counter prime-max) summation       (if (not (contains? @not-prime-set counter))          (do            (mark-multiples not-prime-set counter prime-max)           (recur  (inc counter) (+ summation counter)))         (recur (inc counter) summation))))) 

=> (time (sieve-summation 100000)) "elapsed time: 496.673 msecs" 454396537

=> (time (sieve-summation 150000)) "elapsed time: 763.333 msecs" 986017447

=> (time (sieve-summation 1000000)) "elapsed time: 6037.926 msecs" 37550402023

=> (time (sieve-summation 1100000)) "elapsed time: 6904.385 msecs" 45125753695

=> (time (sieve-summation 1200000)) "elapsed time: 7321.299 msecs" 53433406131

=> (time (sieve-summation 1500000)) "elapsed time: 64995.216 msecs" 82074443256

----edit----

thanks a. webb, great suggestion! code had bit slower, in order speed up, had make not-prime-set transient @ beginning , runs faster (roughly 8 times). i'm still getting out-of-memory errors, i'll try figure out how increase heap size on jvm see if fixes it. i'm running clojure on eclipse on mac, , i'm new clojure , macs.

i interested see how further refactor program (keeping same logic) more elegant in clojure. again.

(defn mark-multiples2 [not-prime-set prime prime-max]   (loop [multiple (* 2 prime) nps not-prime-set ]     (if (> multiple prime-max)        nps       (recur (+ multiple prime) (conj! nps multiple)))))   (defn sieve-summation2 [prime-max]   (loop [counter 2, summation 0, not-prime-set (transient #{1})]     (if (> counter prime-max)        summation       (if (not-prime-set counter)          (recur (inc counter) summation not-prime-set)         (recur (inc counter)             (+ summation counter)             (mark-multiples2 not-prime-set counter prime-max)))))) 

=> (time (sieve-summation2 100000)) "elapsed time: 124.781 msecs" 454396537

=> (time (sieve-summation 100000)) "elapsed time: 876.744 msecs" 454396537

there better , more elegant ways solve problem in clojure, that's not point of question.

using reference type -- whether ref, or more appropriately atom -- nothing here. still creating garbage. swapping out contents of mutable storage location 1 immutable data structure another. don't know caused time spike, 1 possibility triggered long garbage collection cycle.

what want use here transients. without changing code much, following should significant speed-up.

(defn mark-multiples [not-prime-set multiple prime-max]   (loop [m (* 2 multiple), nps (transient not-prime-set)]     (if (> m prime-max)        (persistent! nps)       (recur (+ m multiple) (conj! nps m)))))   (defn sieve-summation [prime-max]   (loop [counter 2, summation 0, not-prime-set #{1}]     (if (> counter prime-max)        summation       (if (contains? not-prime-set counter)          (recur (inc counter) summation not-prime-set)         (recur (inc counter)                 (+ summation counter)                 (mark-multiples not-prime-set counter prime-max)))))) 

this same algorithm, in more idiomatic style:

(defn mark [s n m]   (into s (range (* 2 n) m n)))  (defn prime-sum [m]   (let [step (fn [[a s] n]                 (if (s n)                  [a s]                   [(+ n) (mark s n m)]))]   (first (reduce step [0 #{}] (range 2 m))))) 

from here, might start attack inherent memory problem of algorithm -- storing non-primes, whereas need store next non-primes @ given point. beautiful implementation of idea, see christophe grand's everybody loves sieve of eratosthenes entry.


Comments

Popular posts from this blog

google api - Incomplete response from Gmail API threads.list -

Installing Android SQLite Asset Helper -

Qt Creator - Searching files with Locator including folder -