hadoop - Implementing a custom Apache Pig algebraic UDF

- February 15, 2013

Hi everyone,

I implemented a custom aggregate Pig UDF. The UDF implements the Algebraic interface, so there are 3 classes - Initial, Intermed and Final - that work at different phases. It works correctly, but inefficiently.

The UDF uses an algorithm that is a bit heavy when it runs on a single value. It would work more efficiently on bigger groups of data - say, 100 values at a time. I observed that the Initial class is invoked with a single value, and the results are later combined by the Intermed and Final classes.

I am aware that there is an Accumulator interface for such cases, but I could not find any documentation on how to use it with an Algebraic UDF.

So my question is: is there a way for me to "force" Pig to pass more values to the Initial calculation, either by using the Accumulator interface or in some other way?

An explanation, or a pointer to documentation or a sample, would be appreciated.

Thanks, Amir

It seems that Pig's Algebraic Initial function always receives a single value in its tuple (at least according to this blog post).

To solve the issue, I ended up returning the single value from Initial without processing it at all. The Intermed and Final functions then perform the algorithm.

Since the Intermed function may receive the outputs of either the Initial function or the Intermed function (this is according to the docs; I did not see it in practice, and in my tests Intermed only received values from the Initial function), both the Initial and Intermed functions return a tuple of 2 values. The first value in the tuple is a string telling me the source of the value - either "initial" or "intermed". The second value in the tuple is the actual result.
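
The tagging scheme can be sketched in plain Java with no Pig dependency. This is only an illustration of the idea, not the actual UDF: the class name TaggedAlgebraicSketch, the Tagged class (a stand-in for a two-field Pig tuple), and the sum-of-squares heavyBatch placeholder are all assumptions made for the example. In a real UDF the three stages would live in the classes named by the Algebraic interface and would read and write Pig tuples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TaggedAlgebraicSketch {
    static final String INITIAL = "initial";
    static final String INTERMED = "intermed";

    /** Hypothetical stand-in for a Pig tuple of (source, value). */
    static final class Tagged {
        final String source;
        final long value;

        Tagged(String source, long value) {
            this.source = source;
            this.value = value;
        }
    }

    /** Placeholder for the expensive batch algorithm (assumed here: sum of squares). */
    static long heavyBatch(List<Long> raw) {
        long total = 0;
        for (long v : raw) {
            total += v * v;
        }
        return total;
    }

    /** Initial stage: pass the single value through untouched, tagged with its source. */
    static Tagged initial(long value) {
        return new Tagged(INITIAL, value);
    }

    /** Shared logic: batch the raw values, then merge with already-computed partials. */
    static long combine(List<Tagged> inputs) {
        List<Long> raw = new ArrayList<>();
        long partial = 0;
        for (Tagged t : inputs) {
            if (INITIAL.equals(t.source)) {
                raw.add(t.value);      // unprocessed value from Initial
            } else {
                partial += t.value;    // partial result from an earlier Intermed
            }
        }
        return partial + heavyBatch(raw);
    }

    /** Intermed stage: the output is tagged so a later stage knows it is a partial result. */
    static Tagged intermed(List<Tagged> inputs) {
        return new Tagged(INTERMED, combine(inputs));
    }

    /** Final stage: same combining logic, but returns the bare result. */
    static long finalStage(List<Tagged> inputs) {
        return combine(inputs);
    }

    public static void main(String[] args) {
        Tagged partial = intermed(Arrays.asList(initial(1), initial(2)));
        // Final receives a mix of one partial result and one raw value.
        System.out.println(finalStage(Arrays.asList(partial, initial(3)))); // prints 14
    }
}
```

The tag matters because raw values still need the heavy algorithm applied, while partial results only need to be merged; without it, a stage could not tell the two apart.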
