How do I shuffle each part file of a Pig input within itself?
I want to shuffle the records within each part file of a Pig input directory. This is convenient for n-fold cross-validation: I can select, say, 10% of the shards as the test set and use the remaining 90% for training, while ensuring that the training data is completely shuffled. So I do NOT want a global shuffle like the following:
generated = foreach data generate *, RANDOM() as rnd;
ordered = order generated by rnd;
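To make the goal concrete, here is a minimal sketch of the behaviour I am after, written in plain Python rather than Pig. It is only an illustration: the function names `shuffle_within_shards` and `split_shards` are hypothetical, and each part file is modelled as an in-memory list of rows.

```python
import random


def shuffle_within_shards(shards, seed=0):
    """Return new shards where each shard's rows are shuffled independently.

    Rows never move between shards; only their order inside a shard changes.
    """
    rng = random.Random(seed)
    shuffled = []
    for rows in shards:
        rows = list(rows)   # copy so the input shard is left untouched
        rng.shuffle(rows)   # in-place shuffle of this shard only
        shuffled.append(rows)
    return shuffled


def split_shards(shards, test_fraction=0.1, seed=0):
    """Pick roughly test_fraction of the shards as test, the rest as training."""
    rng = random.Random(seed)
    indices = list(range(len(shards)))
    rng.shuffle(indices)
    n_test = max(1, int(round(len(shards) * test_fraction)))
    test_idx = set(indices[:n_test])
    test = [shards[i] for i in range(len(shards)) if i in test_idx]
    train = [shards[i] for i in range(len(shards)) if i not in test_idx]
    return train, test


# Hypothetical input: 10 part files with 10 rows each.
shards = [[i * 10 + j for j in range(10)] for i in range(10)]
shuffled = shuffle_within_shards(shards)
train, test = split_shards(shuffled, test_fraction=0.1)
```

What I am looking for is the Pig equivalent of `shuffle_within_shards`: each part file keeps exactly its own records, but in random order.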