Difference between tf.train.shuffle_batch_join and tf.train.shuffle_batch

Looking at both function signatures and their arguments,

tf.train.shuffle_batch_join(
    tensors_list,
    batch_size,
    capacity,
    min_after_dequeue,
    seed=None,
    enqueue_many=False,
    shapes=None,
    allow_smaller_final_batch=False,
    shared_name=None,
    name=None
)

and

tf.train.shuffle_batch(
    tensors,
    batch_size,
    capacity,
    min_after_dequeue,
    num_threads=1,
    seed=None,
    enqueue_many=False,
    shapes=None,
    allow_smaller_final_batch=False,
    shared_name=None,
    name=None
)

the only difference among the arguments is num_threads, which intuitively indicates that tf.train.shuffle_batch can fill its queue using multiple threads. Apart from that, they seem to do pretty much the same work.

I was wondering if there is a fundamental difference that would make someone choose one over the other, apart from multi-threaded batching.
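
To make the comparison concrete, here is a minimal sketch of the two call patterns as I understand them (the read_example helper and the data.tfrecords filename are made up for illustration):

import tensorflow as tf  # TF 1.x queue-based input pipeline

# Hypothetical reader: parses one (image, label) example from a
# shared filename queue.
filename_queue = tf.train.string_input_producer(["data.tfrecords"])

def read_example(queue):
    reader = tf.TFRecordReader()
    _, serialized = reader.read(queue)
    features = tf.parse_single_example(
        serialized,
        features={"image": tf.FixedLenFeature([784], tf.float32),
                  "label": tf.FixedLenFeature([], tf.int64)})
    return features["image"], features["label"]

# shuffle_batch: a single example tuple, parallelism via num_threads.
image, label = read_example(filename_queue)
images_a, labels_a = tf.train.shuffle_batch(
    [image, label], batch_size=32,
    capacity=2000, min_after_dequeue=1000, num_threads=4)

# shuffle_batch_join: a list of example tuples, one enqueue thread each.
example_list = [read_example(filename_queue) for _ in range(4)]
images_b, labels_b = tf.train.shuffle_batch_join(
    example_list, batch_size=32,
    capacity=2000, min_after_dequeue=1000)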

1 answer

  • answered 2018-07-11 13:48 Olivier Dehaene

    Quoting from the shuffle_batch_join TF documentation:

    The tensors_list argument is a list of tuples of tensors, or a list of dictionaries of tensors. Each element in the list is treated similarly to the tensors argument of tf.train.shuffle_batch().

    Basically, shuffle_batch_join expects to (see the sketch after this list):

    • Receive a list of tensor tuples (or dictionaries)
    • Enqueue the tensors from each member of the list into a single shared shuffling queue, one thread per member
    • Return a list (or dictionary) of tensors with the same number and types as tensors_list[i].
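
    As a sketch of that behaviour (reusing the hypothetical read_example helper and filename_queue from the question), tensors_list can also be a list of dictionaries, in which case a single dictionary of batched tensors comes back:

    # Each element of tensors_list is a dict with the same keys and
    # dtypes; shuffle_batch_join returns one dict of batched tensors.
    examples = []
    for _ in range(4):
        image, label = read_example(filename_queue)
        examples.append({"image": image, "label": label})

    batch = tf.train.shuffle_batch_join(
        examples, batch_size=32, capacity=2000, min_after_dequeue=1000)
    # batch["image"] has shape (32, 784); batch["label"] has shape (32,)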

    Be aware that if you use shuffle_batch_join:

    len(tensors_list) threads will be started, with thread i enqueuing the tensors from tensors_list[i]. tensors_list[i1][j] must match tensors_list[i2][j] in type and shape, except in the first dimension if enqueue_many is true.
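
    Note that those enqueue threads only start running once you launch the queue runners; a minimal session sketch (using the batch dict from above):

    with tf.Session() as sess:
        coord = tf.train.Coordinator()
        # Starts the len(tensors_list) enqueue threads for the join
        # queue, plus the file-reading threads.
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        try:
            images, labels = sess.run([batch["image"], batch["label"]])
        finally:
            coord.request_stop()
            coord.join(threads)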

    Here is a link to the doc.