Apache Spark: Split a pair RDD into multiple RDDs by key

This drove me crazy but I finally found a solution.

If you want to split a pair RDD of type (A, Iterable(B)) by key, so that the result is several RDDs of type B, here is how to do it:

The trick is twofold: (1) get the list of all the keys, and (2) iterate through that list and, for each key, create a new RDD by filtering the original pair RDD to keep only the values attached to that key. Note the use of a Java 8 lambda expression.
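The two steps above can be sketched roughly as follows. This is only an illustration, not the original listing: the class name SplitByKey and the helper splitByKey are made up for this example, and it assumes Spark 2.x or later on the classpath (where flatMap expects a function returning an Iterator) and keys that are serializable with a sensible equals().

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;

public class SplitByKey {

    // Hypothetical helper (not from the original post): returns one RDD of
    // values per distinct key of a pair RDD of type (K, Iterable<V>).
    public static <K, V> Map<K, JavaRDD<V>> splitByKey(JavaPairRDD<K, Iterable<V>> grouped) {
        // Step 1: bring the distinct keys to the driver.
        List<K> keys = grouped.keys().distinct().collect();

        // Step 2: for each key, filter the original pair RDD down to that key.
        Map<K, JavaRDD<V>> result = new LinkedHashMap<>();
        for (K key : keys) {
            JavaRDD<V> values = grouped
                .filter(pair -> pair._1().equals(key))      // keep entries for this key only
                .values()                                   // drop the key, keep Iterable<V>
                .flatMap(iterable -> iterable.iterator());  // flatten Iterable<V> into V (Spark 2.x+)
            result.put(key, values);
        }
        return result;
    }
}
```

Given a JavaPairRDD<String, Integer> called pairs, a call such as SplitByKey.splitByKey(pairs.groupByKey()) would return a map from each key to its own RDD of values. Each filter is a separate pass over the pair RDD, so caching the RDD before the call may help when there are many keys.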

Remark: bear in mind that the collect() action brings the keys of the entire distributed RDD onto the driver machine. Keys are generally integers or small strings, so collecting them on one machine shouldn't be a problem as long as the number of distinct keys stays modest. Otherwise, this isn't the best way to go. If you have a suggestion, you are more than welcome to leave a comment below… spread the word, help the world!

So you’ve decided to be a developer? Ok but…

Two must-have principles:

passion and patience.

…then here are some tips on what to do next:

Think outside the box

Programming today is easier and more accessible than ever. You must know (and accept) that you are just one of millions of developers[1] on the globe. If you want to shine among them, you need to think and act differently.

Need for time

This one is simple. Coding is a very time-consuming process. If you don’t have time (I'd say two hours a day … at least), then I doubt you can become an (efficient) developer. “Developer” is a job, and jobs need time.

P.R.A.C.T.I.C.E

I wrote it in capital letters because it’s the key to your success in this universe. You will absolutely need to write many, many examples before starting anything big. One reason is that when you start a big project you’ll encounter lions and crocodiles (people are kind enough to call them just bugs), and that is what has pushed people, throughout history, to abandon programming in their first days. Again, don’t dare to start big things if you are not ready for them … and readiness comes from training.

Join serious projects

After a whole series of practice sessions, you should be ready to go. Right after that comes step two: get involved in serious projects. Whatever you have done during the practice phase, you are not a real developer until you join (or possibly start) a serious project. A project means targets to reach. If you don’t have a clear destination, you are surely going nowhere, or anywhere … When you start a project with chained, nested targets, you start to use your talents and the tools you are (supposed to be) competent to handle. You actually grow with your projects.

Be useful to your world

Once you have solved a sticky problem, please take a few minutes to share it in a blog post or as an answer to the same problem found somewhere online. You should do it for two great reasons: (1) you save time and effort for future learners who will face the same problem; and (2) you return the favor to those who suffered nightmares and headaches in order to provide you with ready-to-use solutions, by passing their tips on to others as part of your solution.

To be continued…