Difference between tf.train.shuffle_batch_join and tf.train.shuffle_batch
Looking at both function signatures and their arguments:
tf.train.shuffle_batch_join(
tensors_list,
batch_size,
capacity,
min_after_dequeue,
seed=None,
enqueue_many=False,
shapes=None,
allow_smaller_final_batch=False,
shared_name=None,
name=None
)
and
tf.train.shuffle_batch(
tensors,
batch_size,
capacity,
min_after_dequeue,
num_threads=1,
seed=None,
enqueue_many=False,
shapes=None,
allow_smaller_final_batch=False,
shared_name=None,
name=None
)
the only difference among the arguments is num_threads,
which intuitively suggests that tf.train.shuffle_batch
can process its input with multiple threads. Except for that, they seem to do pretty much the same work.
I was wondering whether there is a fundamental difference on which someone might choose one over the other, besides multithreaded batching.
1 answer
-
answered 2018-07-11 13:48
Olivier Dehaene
Quoting from the shuffle_batch_join TF documentation:
The tensors_list argument is a list of tuples of tensors, or a list of dictionaries of tensors. Each element in the list is treated similarly to the tensors argument of tf.train.shuffle_batch().
Basically, shuffle_batch_join expects to:
- Receive a list of tensors
- Perform shuffle_batch on each member of the list
- Return a list of tensors with the same number and types as tensors_list[i].
Be aware that if you use shuffle_batch_join:
len(tensors_list) threads will be started, with thread i enqueuing the tensors from tensors_list[i]. tensors_list[i1][j] must match tensors_list[i2][j] in type and shape, except in the first dimension if enqueue_many is true.
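To make the contrast concrete: shuffle_batch runs num_threads threads over a single input pipeline (the tensors argument), while shuffle_batch_join starts one thread per pipeline in tensors_list, all feeding the same shuffling queue. A rough plain-Python analogy of the _join pattern (this is not the TF API, just the threading shape):

```python
import queue
import threading

def shuffle_batch_join_sketch(tensors_list, batch_size):
    """Rough analogy of shuffle_batch_join: one enqueuing thread per
    element of tensors_list, all feeding a single shared queue, from
    which a batch is then dequeued.  (shuffle_batch with num_threads=N
    instead runs N threads over ONE input pipeline.)"""
    q = queue.Queue()

    def enqueue(source):
        for item in source:
            q.put(item)

    # len(tensors_list) threads; thread i enqueues tensors_list[i]
    threads = [threading.Thread(target=enqueue, args=(src,))
               for src in tensors_list]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    return [q.get() for _ in range(batch_size)]
```

One practical reason to prefer the join variant is that it lets you give each thread its own reader (e.g. one reader per input file) instead of having num_threads threads share a single reader.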
See also questions close to this topic
-
how to display contents of text file one line at a time via timer using python on windows?
this is the code.
def wndProc(hWnd, message, wParam, lParam):
    if message == win32con.WM_PAINT:
        hdc, paintStruct = win32gui.BeginPaint(hWnd)
        dpiScale = win32ui.GetDeviceCaps(hdc, win32con.LOGPIXELSX) / 60.0
        fontSize = 36
        # http://msdn.microsoft.com/en-us/library/windows/desktop/dd145037(v=vs.85).aspx
        lf = win32gui.LOGFONT()
        lf.lfFaceName = "Times New Roman"
        lf.lfHeight = int(round(dpiScale * fontSize))
        #lf.lfWeight = 150
        # Use nonantialiased to remove the white edges around the text.
        # lf.lfQuality = win32con.NONANTIALIASED_QUALITY
        hf = win32gui.CreateFontIndirect(lf)
        win32gui.SelectObject(hdc, hf)
        rect = win32gui.GetClientRect(hWnd)
        # http://msdn.microsoft.com/en-us/library/windows/desktop/dd162498(v=vs.85).aspx
        win32gui.DrawText(
            hdc,
            'Glory be to the Father, and to the son and to the Holy Spirit.',
            -1,
            rect,
            win32con.DT_CENTER | win32con.DT_NOCLIP | win32con.DT_VCENTER)
        win32gui.EndPaint(hWnd, paintStruct)
        return 0
Where it says the "Glory be to the Father..." prayer, I would like that string to actually display a few different prayers on a timer. What I mean is: I want to save short prayers to a text file and have the displayed line change to a new prayer every 60 seconds, cycling through a few prayers such as the Serenity Prayer, etc.
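For the rotating text itself (separate from the Win32 painting code), one approach is to read the file once and cycle through its lines; a WM_TIMER set to 60 000 ms would then fetch the next line and invalidate the window so WM_PAINT redraws. A minimal sketch of the rotation logic only; the file name and timer wiring are assumptions:

```python
import itertools

def make_prayer_cycler(lines):
    """Return a zero-argument function that yields the next prayer on
    each call (e.g. from a 60-second WM_TIMER handler)."""
    cycler = itertools.cycle(lines)
    return lambda: next(cycler)

# Usage sketch: read prayers.txt once, then call next_prayer() on each
# timer tick and pass the result to win32gui.DrawText.
# next_prayer = make_prayer_cycler(open('prayers.txt').read().splitlines())
```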
-
How to plot the frequency of my data per day in a histogram?
I want to plot the number of occurrences of my data per day. y represents the ids of my data; x represents the timestamps, which I convert to a time and day. But I can't produce the correct plot.
import matplotlib.pyplot as plt
plt.style.use('ggplot')
import time

y = ['5914cce8-fad6-45d1-bec2-e59e62823617', '1c2067e0-5173-4a1d-8a75-b18267ee4598',
     'db6830ff-fa9c-4aa5-b71e-f6da9333f357', '672cc9d5-360e-4451-bb7c-03e3d0bd8f0d',
     'fb0f8122-fffc-47fe-a87a-b2b749df173b', '558e96ca-0222-40c7-acc0-e444f7663f53',
     'c3f86fd5-eac3-48d3-a44c-b325f30b6139', '21dd849f-895f-4cf5-a168-45a4c1a9fbf9',
     'e3b4cd56-e291-4671-93b6-d2226ee82ae7', '01346c48-a8c4-43d1-ac02-1efa33ca0f4e',
     '23b78b0f-85be-4ca7-99f4-1a5add76c12e', 'b1c036c0-0c2b-4170-a170-8fd0add0dec2',
     '74737546-e9c3-4126-bcb2-4d34503421ca', '342991f5-ec87-4c9d-83eb-9908f3e221aa',
     '4fdcd83a-eb68-4e26-b79b-753c5e022a4e', 'b7fbeca9-9416-43c4-9e90-9e71acc1eaba',
     '27c9d358-a3ef-4c69-ba89-eac16d8d3bdb', 'ef982c4b-a115-48a1-aef1-2f672d7f1f00',
     'efedede2-9bb4-4c52-98b1-8b03070df3fd', 'eb03ae1b-4cde-409c-8d34-2a16a8be30d2']
x = ['1548143296750', '1548183033872', '1548346185194', '1548443373507',
     '1548446119319', '1548446239441', '1548446068267', '1548445962159',
     '1548446011209', '1548446259465', '1548446180380', '1548239985290',
     '1548240060367', '1548240045347', '1547627568993', '1548755333313',
     '1548673604016', '1548673443843', '1548673503914', '1548673563975']
date = []
for i in x:
    print(i)
    print()
    i = i[:10]
    print(i)
    readable = time.ctime(int(i))
    readable = readable[:10]
    date.append(readable)
print(date)
plt.hist(date, y)
plt.show()
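One likely problem in the code above is plt.hist(date, y): the second argument of plt.hist is the bins specification, so passing the list of ids there cannot work. A sketch of the aggregation step, assuming millisecond-precision epoch strings like those in x: count per day first (e.g. with collections.Counter), then bar-plot the counts.

```python
import time
from collections import Counter

def occurrences_per_day(timestamps_ms):
    """Truncate ms-precision epoch strings to seconds, convert each to
    its day string, and count how many fall on each day."""
    days = [time.ctime(int(ts[:10]))[:10] for ts in timestamps_ms]
    return Counter(days)

# counts = occurrences_per_day(x)
# plt.bar(list(counts.keys()), list(counts.values()))  # one bar per day
```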
-
mysql.connector.errors.ProgrammingError: Error in SQL Syntax
I'm using the Python MySQL connector to add data to a table by updating the row. A user enters a serial number, and then the row with the serial number is added. I keep getting a SQL syntax error and I can't figure out what it is.
query = ("UPDATE `items` SET salesInfo = %s, shippingDate = %s, warrantyExpiration = %s, item = %s, WHERE serialNum = %s")
cursor.execute(query, (info, shipDate, warranty, name, sn, ))
conn.commit()
Error:
mysql.connector.errors.ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'WHERE serialNum = '1B0000021A974726'' at line 1
"1B0000021A974726" is a serial number inputted by the user and it is already present in the table.
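The stray comma after item = %s is what breaks the statement: UPDATE ... SET a = %s, WHERE ... is invalid SQL, which is why the error points at the WHERE clause. A corrected version of the query string (table and column names taken from the question):

```python
# Same UPDATE, with the trailing comma before WHERE removed
query = ("UPDATE `items` SET salesInfo = %s, shippingDate = %s, "
         "warrantyExpiration = %s, item = %s WHERE serialNum = %s")
# cursor.execute(query, (info, shipDate, warranty, name, sn))
# conn.commit()
```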
-
Set header in python list()
How can I set header for a python list?
I'm currently doing it like this:
df = list()
df.append('header1')
Just wondering if there's a better solution...
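A list has no notion of a header; appending 'header1' just makes it the first element, which every later loop or len() has to work around. If the goal is a named column of values, a dict models that more directly (a minimal sketch, with the value 42 as a placeholder):

```python
# A dict gives each "column" a real name instead of a header element
data = {'header1': []}
data['header1'].append(42)
```

pandas.DataFrame(data) would then pick the key up as the column name.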
-
Better way to Vlookup
I would like to know if there is a better alternative to VLOOKUP to find matches between two cells (or Python DataFrames).
I want my code to check whether the values in DF1 are in DF2; if the values match exactly OR partially, return the corresponding value from DF2.
Just like the matched values returned in the 4th column, rows 2 and 3.
Thanks Amigo!
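In pandas an exact match is usually df1.merge(df2); partial matches typically need something like str.contains. The core lookup can be sketched in plain Python; note the matching rule (substring in either direction) is an assumption about what "partially match" means here:

```python
def vlookup_partial(needle, candidates):
    """Return the first candidate equal to `needle` or sharing a
    substring with it (in either direction), else None."""
    for value in candidates:
        if needle == value or needle in value or value in needle:
            return value
    return None
```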
-
How to have topic directive show up in table of contents?
The topic directive (found here: http://docutils.sourceforge.net/docs/ref/rst/directives.html#topic) is said to be used as a "self-contained section".
How can I make the title of each topic box show up in the table of contents, like a normal section title would?
-
CUDA_ERROR_OUT_OF_MEMORY tensorflow
As part of my study project, I try to train a neural network which makes a segmentation on images (based on FCN), and during the execution I received the following error message:
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,67,1066,718] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Note that I have fixed the batch_size to 1 and I get the same error even when I try different image sizes; I also used just 1 image to train instead of 1600, still the same error! Could you help me solve this problem? What is it really about?
-
Error while importing tensorflow on server
I have installed tensorflow on my college server offline, by downloading it and then installing. It shows up in conda list, but when I try to import it I get this error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/173190025/anaconda3/lib/python3.6/site-packages/tensorflow/__init__.py", line 22, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/home/173190025/anaconda3/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 52, in <module>
    from tensorflow.core.framework.graph_pb2 import *
  File "/home/173190025/anaconda3/lib/python3.6/site-packages/tensorflow/core/framework/graph_pb2.py", line 6, in <module>
    from google.protobuf import descriptor as _descriptor
  File "/home/173190025/anaconda3/lib/python3.6/site-packages/google/protobuf/descriptor.py", line 46, in <module>
    from google.protobuf.pyext import _message
ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by /home/173190025/anaconda3/lib/python3.6/site-packages/google/protobuf/pyext/../../../../../libprotobuf.so.15)
-
Output TFRecord to Google Cloud Storage from Python
I know tf.python_io.TFRecordWriter has a concept of GCS, but it doesn't seem to have permissions to write to it. If I do the following:

output_path = 'gs://my-bucket-name/{}/{}.tfrecord'.format(object_name, record_name)
writer = tf.python_io.TFRecordWriter(output_path)
# write to writer
writer.close()

then I get 401s saying "Anonymous caller does not have storage.objects.create access to my-bucket-name."
However, on the same machine, if I do gsutil rsync -d r gs://my-bucket-name bucket-backup, it properly syncs, so I've authenticated properly using gcloud. How can I give TFRecordWriter permissions to write to GCS? I'm going to just use Google's GCP python API for now, but I'm sure there's a way to do this using TF alone.
-
Word2vec compact models
Are there any w2v models that do not require a dictionary? Everything I found in torchtext first wants to build the vocabulary with build_vocab. But if I have a huge corpus of text, I would like a model that works at the level of phrases. I could not find one.
-
supervised learning for parcours
For my school project I have to implement a neural network for a parcours (an obstacle course). I know it's useless, but I want the neural net to learn a simple algorithm:
if front right is bigger than front left -> go right, else -> go left.
I want to use supervised learning. I have 2 input neurons, 2 hidden neurons and 1 output neuron. The goal is that when the player has to go left, the output gives a number below 0.5, and when the player has to go right, the nn has to return a number greater than 0.5.
Somehow I made a mistake and the nn always tries to return 0.5. Do you know what I did wrong and what I can do now?
That's what the parcours looks like.
-
Categorical Variables and too many NA for ML model
We have a data set of 250 variables and 50,000 records. One variable is numeric, 248 variables are categorical and one variable is binary (the target variable). Each categorical variable has more than 3000 levels. We have many NAs. Each row is the record of diseases that a patient has suffered; that is why there are so many NAs, because one patient may have suffered 100 diseases while another has suffered only one. The objective is to predict whether patients can have a specific disease from the information about the other diseases they have suffered. How can this data set be handled in machine learning?
-
I am getting "Bash wrote one or more lines to the standard error stream" in Azure pipeline step
I am running the following
bash command:

- bash: $(ci_scripts_path)/01_install_python_tools.sh
  displayName: 'Install python 2.7 tools'
  failOnStderr: true

The sh script 01_install_python_tools.sh completes successfully, but I still get this error for the step:

##[error]Bash wrote one or more lines to the standard error stream.
-
PySpark MLlib pipeline - customised StringIndexer
I'm building pyspark MLlib pipeline. Some of the data used in analysis is categorical. I'd like to be able to perform following customised actions on those categorical data, using StringIndexer if possible:
- Customise indexing of the weekdays as: Monday-0, Tuesday-1, Wednesday-2... Sunday-6
- Reduce cardinality, i.e. merge some of the categories, for example working days vs weekends: Monday,..Friday - 0, Saturday,Sunday - 1
What is the best way to achieve this, within pyspark pipelines? Is it a good practice?
Thanks !
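By default StringIndexer only orders labels (e.g. by frequency), so a fixed custom mapping is usually applied before or instead of it. The two mappings from the list above can be sketched as plain dictionaries:

```python
# Fixed weekday indexing: Monday-0 ... Sunday-6
WEEKDAY_INDEX = {'Monday': 0, 'Tuesday': 1, 'Wednesday': 2, 'Thursday': 3,
                 'Friday': 4, 'Saturday': 5, 'Sunday': 6}

# Reduced cardinality: working day -> 0, weekend -> 1
WORKDAY_VS_WEEKEND = {day: (1 if day in ('Saturday', 'Sunday') else 0)
                      for day in WEEKDAY_INDEX}
```

In a real pyspark pipeline these dictionaries would back a udf or a when/otherwise column expression that produces the index column directly; whether that sits inside a custom Transformer or as a plain withColumn step is a design choice.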
-
Jenkins / groovy - Dynamic stage showing all stages as failed
I'm reading a shell script file /tmp/cmd_list.sh with a Groovy script and creating a dynamic stage to build.
The content of /tmp/cmd_list.sh is:
ls
pwd
aaaaaa
who
Only "aaaaaa" must fail to execute (exit code 127). My problem is that all stages are marked as failed, but when I look at the log, commands such as "ls", "pwd" and "who" work fine and the return code is 0.
I tried to force the stage status for each box, but without success... My Groovy script (Jenkinsfile):
import hudson.model.Result
node('master') {
    stage ('\u27A1 Checkout') {
        sh "echo 'checkout ok'"
    }
    def BUILD_LIST = readFile('/tmp/cmd_list.sh').split()
    for (CMDRUN in BUILD_LIST) {
        def status;
        try {
            node {
                stage(CMDRUN) {
                    println "Building ..."
                    status = sh(returnStatus: true, script: CMDRUN)
                    println "---> EX CODE: " + status
                    if (status == 0) {
                        currentBuild.result = 'SUCCESS'
                        currentBuild.rawBuild.@result = hudson.model.Result.SUCCESS
                    } else {
                        currentBuild.result = 'UNSTABLE'
                        currentBuild.rawBuild.@result = hudson.model.Result.UNSTABLE
                    }
                    def e2e = build job: CMDRUN, propagate: false
                }
            }
        } catch (e) {
            println "===> " + e
            currentBuild.result = 'UNSTABLE'
            println "++++> EX CODE: " + status
            if (status == 0) {
                println "++++> NEW STATUS: " + status
                currentBuild.rawBuild.@result = hudson.model.Result.SUCCESS
                currentBuild.result = 'SUCCESS'
            } else {
                println "++++> NEW STATUS: " + status
                currentBuild.rawBuild.@result = hudson.model.Result.UNSTABLE
            }
        }
    }
}
And the result is a list where every stage shows as failed. Can anyone help me show the right status? Thank you!