hadoop - R Reducer is not working properly in Amazon EMR
I have written map-reduce code in R and run it on Amazon EMR.

My input file format:

url1 word1 word2 word3
url2 word4 word2 word3
url3 word1 word7 word2

I'm expecting output where each word is followed by its urls, concatenated with spaces:

word1 url1 url3
word2 url1 url2 url3
word3 url1 url2
..
..
But EMR is using 3 reducers and creating 3 output files. Each file's output is correct on its own: the values are combined and there are no duplicate keys within a file. But if you look at the 3 files together, there are duplicate keys across them.

Output file 1:

word1 url1 url3
word2 url1
..

Output file 2:

word2 url2 url3
word3 url1
..

As you can see, word2 is distributed across 2 files. I need each key to appear in only 1 file.
I'm using Hadoop streaming in EMR. Please suggest the correct settings to avoid duplicate keys across the different output files.
I assume the mapper is working fine.
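The mapper itself is not shown in the question; purely as a rough sketch, a streaming mapper in R for this transform might look like the following (the stdin handling and the space separator are assumptions chosen to match the reducer's sep = " ", not taken from the actual job):

#!/usr/bin/env Rscript
# Hypothetical streaming mapper (the real one is not shown in the question).
# Reads lines like "url1 word1 word2 word3" from stdin and emits one
# "word url" pair per word, space-separated.
con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
  fields <- strsplit(line, " ", fixed = TRUE)[[1]]
  if (length(fields) < 2) next
  url <- fields[1]
  for (word in fields[-1]) {
    cat(paste(word, url), "\n", sep = "")
  }
}
close(con)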
The reducer code:

process <- function(mat) {
  rows <- nrow(mat)
  cols <- ncol(mat)
  # Merge rows that share the same key (column 1) by concatenating their values.
  # Note: i+1:rows is parsed as i + (1:rows); out-of-range j values are
  # skipped by the j <= rows guard below.
  for (i in 1:rows) {
    for (j in i+1:rows) {
      if (j <= rows) {
        if (toString(mat[i,1]) == toString(mat[j,1])) {
          x <- paste(mat[i,2], mat[j,2], sep=" ")
          mat[i,2] <- x
          mat <- mat[-j,]
          rows <- rows - 1
        }
      }
    }
  }
  write.table(mat, file=stdout(), quote=FALSE, row.names=FALSE, col.names=FALSE)
}

reduce <- function(input) {
  # Create column names to make it easier to work with the data set.
  names <- c("word", "value")
  cols <- as.list(vector(length=2, mode="character"))
  names(cols) <- names
  # Read the mapper output in chunks and merge duplicate keys within each chunk.
  hsTableReader(file=input, cols, ignoreKey=TRUE, chunkSize=100000, fun=process, sep=" ")
}
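The question does not show how this reducer is wired up as an executable streaming script; a minimal wrapper might look like the sketch below (the HadoopStreaming library call and the stdin connection are assumptions, with process() and reduce() defined above in the same file):

#!/usr/bin/env Rscript
# Hypothetical wrapper for the reduce() function above; assumes the
# HadoopStreaming R package (which provides hsTableReader) is installed
# on the cluster nodes.
library(HadoopStreaming)

con <- file("stdin", open = "r")
reduce(con)
close(con)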
Have you tried using a combiner to gather the same keys onto the same reducer? That way you should be able to gather the words with the same key into a single reducer. Check the WordCount examples with a combiner to understand how the combiner class works.
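One thing worth checking, offered as a hedged sketch rather than a verified fix: Hadoop streaming splits each mapper output line at the first tab by default, so if the mapper emits space-separated pairs the whole line becomes the key and the same word can be hashed to different reducers. Telling streaming where the key ends should route every occurrence of a word to one reducer, and the combiner mentioned above can then be added with -combiner to pre-merge values on the map side. The jar path, bucket names and script names below are placeholders, and the exact property names are worth verifying against the streaming docs for your Hadoop version:

hadoop jar /path/to/hadoop-streaming.jar \
    -D stream.map.output.field.separator=' ' \
    -D stream.num.map.output.key.fields=1 \
    -D stream.reduce.input.field.separator=' ' \
    -input s3://your-bucket/input \
    -output s3://your-bucket/output \
    -mapper mapper.R \
    -combiner reducer.R \
    -reducer reducer.R \
    -file mapper.R \
    -file reducer.R

If a single output file is acceptable regardless of partitioning, setting -D mapred.reduce.tasks=1 (mapreduce.job.reduces on newer Hadoop) also guarantees that every key ends up in one file, at the cost of reduce-side parallelism.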