Clojure performance on set updates or large number computations -
i'm trying sum prime numbers euler project problem using sieve algorithm. i'm using mutable set store numbers aren't prime , using 'dosync' , 'commute' update set (otherwise ran out of memory if immutable) . performance linear till 1.2 million primes, performance terrible @ 1.5 million (7 seconds vs. 64 seconds). ideas i'm doing wrong? guess possibly numbers getting large, or maybe updating mutable sets inefficient.
(defn mark-multiples [not-prime-set multiple prime-max] (loop [ counter (long 2) product (* counter multiple)] (if (> product prime-max) not-prime-set (do (dosync (commute not-prime-set conj product)) (recur (inc counter) (* (inc counter) multiple)))))) (defn sieve-summation [prime-max] (def not-prime-set (ref #{ (long 1) }) ) (loop [counter (long 2) summation (long 0)] (if (> counter prime-max) summation (if (not (contains? @not-prime-set counter)) (do (mark-multiples not-prime-set counter prime-max) (recur (inc counter) (+ summation counter))) (recur (inc counter) summation)))))
=> (time (sieve-summation 100000)) "elapsed time: 496.673 msecs" 454396537
=> (time (sieve-summation 150000)) "elapsed time: 763.333 msecs" 986017447
=> (time (sieve-summation 1000000)) "elapsed time: 6037.926 msecs" 37550402023
=> (time (sieve-summation 1100000)) "elapsed time: 6904.385 msecs" 45125753695
=> (time (sieve-summation 1200000)) "elapsed time: 7321.299 msecs" 53433406131
=> (time (sieve-summation 1500000)) "elapsed time: 64995.216 msecs" 82074443256
----edit----
thanks a. webb, great suggestion! code had bit slower, in order speed up, had make not-prime-set transient @ beginning , runs faster (roughly 8 times). i'm still getting out-of-memory errors, i'll try figure out how increase heap size on jvm see if fixes it. i'm running clojure on eclipse on mac, , i'm new clojure , macs.
i interested see how further refactor program (keeping same logic) more elegant in clojure. again.
(defn mark-multiples2 [not-prime-set prime prime-max] (loop [multiple (* 2 prime) nps not-prime-set ] (if (> multiple prime-max) nps (recur (+ multiple prime) (conj! nps multiple))))) (defn sieve-summation2 [prime-max] (loop [counter 2, summation 0, not-prime-set (transient #{1})] (if (> counter prime-max) summation (if (not-prime-set counter) (recur (inc counter) summation not-prime-set) (recur (inc counter) (+ summation counter) (mark-multiples2 not-prime-set counter prime-max))))))
=> (time (sieve-summation2 100000)) "elapsed time: 124.781 msecs" 454396537
=> (time (sieve-summation 100000)) "elapsed time: 876.744 msecs" 454396537
there better , more elegant ways solve problem in clojure, that's not point of question.
using reference type -- whether ref, or more appropriately atom -- nothing here. still creating garbage. swapping out contents of mutable storage location 1 immutable data structure another. don't know caused time spike, 1 possibility triggered long garbage collection cycle.
what want use here transients. without changing code much, following should significant speed-up.
(defn mark-multiples [not-prime-set multiple prime-max] (loop [m (* 2 multiple), nps (transient not-prime-set)] (if (> m prime-max) (persistent! nps) (recur (+ m multiple) (conj! nps m))))) (defn sieve-summation [prime-max] (loop [counter 2, summation 0, not-prime-set #{1}] (if (> counter prime-max) summation (if (contains? not-prime-set counter) (recur (inc counter) summation not-prime-set) (recur (inc counter) (+ summation counter) (mark-multiples not-prime-set counter prime-max))))))
this same algorithm, in more idiomatic style:
(defn mark [s n m] (into s (range (* 2 n) m n))) (defn prime-sum [m] (let [step (fn [[a s] n] (if (s n) [a s] [(+ n) (mark s n m)]))] (first (reduce step [0 #{}] (range 2 m)))))
from here, might start attack inherent memory problem of algorithm -- storing non-primes, whereas need store next non-primes @ given point. beautiful implementation of idea, see christophe grand's everybody loves sieve of eratosthenes entry.
Comments
Post a Comment