Link-Map-Reduce in Riak: an example from inagist.com

My last post felt a little incomplete without some code backing it up, so I'm following it up with a code sample showing exactly how this map-reduce is wired up.

I will walk through how we build the "Popular Replies" section on the conversation page. Again, here is a @BarackObama tweet with more than 500 replies. Popular Replies extracts only those replies which have been further replied to, have been retweeted, or come from the author of the tweet itself. Right now it has picked out 1 of these 500+ replies.

Data Model

Responses to a tweet are captured in a bucket of their own, <<"tweet_responses_bucket">>. Each tweet is keyed by its tweet id as a 128-bit binary, <<TweetId:128>>. Response details are not stored directly on this resource but on linked resources in a second bucket, <<"tweet_responses_subkeys_bucket">>. Each of these resources is keyed as <<TweetId:128, (ResponseId rem 10):8>>, so the responses to one tweet are spread across at most ten subkeys. Each subkey resource is added as a link on the {<<"tweet_responses_bucket">>, <<TweetId:128>>} resource and tagged <<"tweet_response">>. A reply is recorded on its subkey resource as a link of the form {{<<ResponseId:128>>, <<ResponseAuthorId:128>>}, <<"reply">>}. A link is represented as {{Bucket, Key}, Tag}; the reply link does not point to a valid bucket/key pair but is purely for our own interpretation.

Here is how it would look:

           <<"tweet_responses_bucket">>
           ----------------------------

           |----------------------------------------|
           |   <<20337776197:128>>                  |
           |----------------------------------------|
           |   Links                                |
           |                                        |
           | {{<<"tweet_responses_subkeys_bucket">>,|
           |  <<20337776197:128,0:8>>},             |
           |  <<"tweet_response">>},                |
           | {{<<"tweet_responses_subkeys_bucket">>,|
           |  <<20337776197:128,1:8>>},             |
           |  <<"tweet_response">>},                |
           |  ....                                  |
           |----------------------------------------|
           |   Value                                |
           |                                        |
           |----------------------------------------|


           <<"tweet_responses_subkeys_bucket">>
           ------------------------------------

           |----------------------------------------|
           |   <<20337776197:128,0:8>>              |
           |----------------------------------------|
           |   Links                                |
           |                                        |
           |{{<<20339861590:128>>,<<18035803:128>>},|
           |  <<"reply">>},                         |
           |  ....                                  |
           |----------------------------------------|
           |   Value                                |
           |                                        |
           |----------------------------------------|

           |----------------------------------------|
           |   <<20337776197:128,1:8>>              |
           |----------------------------------------|
           |   Links                                |
           |                                        |
           |{{<<20337857101:128>>,<<82294968:128>>},|
           |  <<"reply">>},                         |
           |  ....                                  |
           |----------------------------------------|
           |   Value                                |
           |                                        |
           |----------------------------------------|
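
To make the key layout concrete, here is a minimal sketch of how the keys and links described above could be built. The module and function names (response_keys, tweet_key/1, subkey/2, subkey_link/2, reply_link/2) are made up for illustration; only the binary layouts, bucket names and tags come from the data model.

```erlang
-module(response_keys).
-export([tweet_key/1, subkey/2, subkey_link/2, reply_link/2]).

%% Hypothetical helper names; only the binary layouts come from the post.

%% Key of a tweet in <<"tweet_responses_bucket">>: its id as a 128-bit binary.
tweet_key(TweetId) ->
    <<TweetId:128>>.

%% Key of the resource in <<"tweet_responses_subkeys_bucket">> that holds a
%% given response. The trailing byte is ResponseId rem 10, i.e. 0..9.
subkey(TweetId, ResponseId) ->
    <<TweetId:128, (ResponseId rem 10):8>>.

%% Link stored on the tweet resource, pointing at one subkey resource.
subkey_link(TweetId, ResponseId) ->
    {{<<"tweet_responses_subkeys_bucket">>, subkey(TweetId, ResponseId)},
     <<"tweet_response">>}.

%% Link stored on a subkey resource. It uses the {{Bucket, Key}, Tag} shape,
%% but the two "bucket/key" slots carry the response id and its author id.
reply_link(ResponseId, AuthorId) ->
    {{<<ResponseId:128>>, <<AuthorId:128>>}, <<"reply">>}.
```

With the ids from the diagram, response_keys:subkey(20337776197, 20339861590) gives <<20337776197:128,0:8>> because 20339861590 rem 10 is 0, while the reply 20337857101 lands on subkey 1.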

 

 

 

Code

And now here is the piece of code that does the extraction of the popular replies. The function gives a sorted list of {TweetId, AuthorId} tuples, which are then looked up and served.
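
As a rough sketch of the shape such a link-map-reduce can take (not the production code), the module below builds the phase list as plain data and shows the pure parts of a map and a reduce step. The module name and the function names query/2, replies_from_links/1 and reduce_sorted/2 are hypothetical. How the map function pulls the link list out of the stored Riak object is left out, and so is the popularity filter itself; sorting by id is an arbitrary choice made for the sketch.

```erlang
-module(popular_replies_sketch).
-export([query/2, replies_from_links/1, reduce_sorted/2]).

%% Phase list for a query whose input is
%% {<<"tweet_responses_bucket">>, <<TweetId:128>>}. The link phase follows
%% every <<"tweet_response">> link into the subkeys bucket, the map phase
%% runs MapFun on each subkey resource, and only the reduce output is kept.
%% MapFun and ReduceFun are supplied by the caller (assumption: funs that
%% the executing nodes are able to run).
query(MapFun, ReduceFun) ->
    [{link, <<"tweet_responses_subkeys_bucket">>, <<"tweet_response">>, false},
     {map, {qfun, MapFun}, none, false},
     {reduce, {qfun, ReduceFun}, none, true}].

%% Pure core of a map step: decode the <<"reply">> links of one subkey
%% resource into {ResponseId, AuthorId} integer tuples. Links that do not
%% match the pattern (other tags, other key sizes) are skipped.
replies_from_links(Links) ->
    [{ResponseId, AuthorId}
     || {{<<ResponseId:128>>, <<AuthorId:128>>}, <<"reply">>} <- Links].

%% Reduce step: merge partial results, drop duplicates and sort. It is safe
%% to apply again to its own output, which a reduce phase may do.
reduce_sorted(Replies, _Arg) ->
    lists:usort(Replies).
```

For the first subkey in the diagram, replies_from_links/1 would yield [{20339861590, 18035803}], and reduce_sorted/2 combines such lists from all subkeys into one sorted list of tuples.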

Hopefully the code is self-explanatory. Of interest is make_local_fun, which creates a function reference that can be passed to a remote node without the remote node having a copy of this compiled code in its path.
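
One way a helper like make_local_fun could be written is to evaluate the function's source text with erl_eval. This is an assumption about the technique, not necessarily the implementation used at inagist.com, and the module name local_fun_sketch is made up. The resulting fun is a closure created by erl_eval, so a node that receives it needs only erl_eval from the standard library rather than the module in which the source was written.

```erlang
-module(local_fun_sketch).
-export([make_local_fun/1]).

%% Turn source text such as "fun(X) -> X + 1 end." into a callable fun.
%% The text must be a single expression terminated by a full stop.
make_local_fun(Source) ->
    %% Tokenise the source string.
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    %% Parse the tokens into exactly one abstract expression.
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    %% Evaluate the expression with no variable bindings; the value of a
    %% fun expression is the fun itself.
    {value, Fun, _Bindings} = erl_eval:expr(Expr, erl_eval:new_bindings()),
    Fun.
```

For example, F = local_fun_sketch:make_local_fun("fun(List, _Arg) -> lists:usort(List) end.") followed by F([3,1,2], none) returns [1,2,3]. The trade-off is that such a fun is interpreted rather than compiled, and the sending and receiving nodes need compatible versions of erl_eval.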

Feel free to comment on anything I have overlooked or that could be done better :)

Posted by Jebu Ittiachen