ONNX Runtime error: node->GetOutputEdgesCount() == 0 was false. Can't remove node
I have a simple Keras RNN model composed of embedding, LSTM, and dense layers:
loaded_model.layers
Out[23]:
[<keras.layers.embeddings.Embedding at 0x2275dc1f6a0>,
<keras.layers.recurrent_v2.LSTM at 0x2275dc8d5b0>,
<keras.layers.core.dense.Dense at 0x2275dd17730>,
<keras.layers.core.activation.Activation at 0x2275de3ee80>]
The model works well in Keras when dumped and loaded. I converted the loaded model to ONNX opset 15 using tf2onnx.convert.from_keras, but I get this error when I initialize the InferenceSession object:
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: D:\a\_work\1\s\onnxruntime\core\graph\graph.cc:3275 onnxruntime::Graph::RemoveNode node->GetOutputEdgesCount() == 0 was false. Can't remove node sequential/lstm_7/transpose as it still has output edges.
This is the relevant node in Netron:
Indeed it has output edges...
I don't want to see this error. Is this caused by some kind of optimization that I can turn off in InferenceSession with disabled_optimizers=...? (Unfortunately, this argument is not documented.)
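For reference, this is the kind of workaround I have in mind: disabling graph optimizations through SessionOptions (a minimal sketch using the documented onnxruntime Python API; whether it actually avoids this particular transpose-removal error is exactly what I'm asking):

import onnxruntime as ort

# Hypothetical path to the model exported by tf2onnx.
model_path = "model.onnx"

# Disable all graph optimizations before creating the session.
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

sess = ort.InferenceSession(model_path, sess_options, providers=["CPUExecutionProvider"])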
Thank you.
See also questions close to this topic
-
Preparing input data for LSTM layer with conditions
I have a data frame that looks like the one below:
DF.head(20):

time   var1  var2  prob
12:30   10    12    85
12:31   15    45    85
12:32   18    12    85
12:33   17    26    85
12:34   11    14    85
12:35   14    65    85
12:36   19    29    92
12:37   15    32    92
12:38   13    44    92
12:39   15    33    92
12:40   11    15    92
12:41   15    45    92
12:42   13    44    94
12:43   15    33    94
12:44   11    15    94
12:45   15    45    94
12:46   13    44    92
12:47   15    33    92
12:48   11    15    92
12:49   15    45    92
I want to predict the value of prob from a sequence of the 6 previous values. So for the given example, I will take the two time series var1 and var2 from time 12:30 to 12:35 to predict prob at 12:35. As far as I know, the input shape that goes to the LSTM will be (df.shape[0], 6, 1), but I do not know how to convert my input from 2 dimensions to 3 dimensions. I also have a condition: I can only use the previous 6 time steps if they all share the same prob value. So in the given example, I cannot take the previous 6 values for prob = 94, because 94 occurs only 4 times and I cannot make 6 timesteps from that.
My pseudo code looks like this:
for i in range(df.shape[0]):  # loop across all rows
    if final_df[i, 'prob'] == final_df[i+1, 'prob']:  # go until the value of prob changes
        # make multiple non-overlapping dataframes of shape (6, 2)
    else:
        continue
I need help building the logic and preparing the input data for my LSTM.
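To make this concrete, here is my rough attempt at the windowing logic (a sketch that assumes a pandas DataFrame df with columns var1, var2 and prob; I am not sure it handles the condition correctly):

import numpy as np
import pandas as pd

def make_windows(df, lookback=6):
    """Build non-overlapping (lookback, 2) windows of var1/var2 from rows that
    share the same prob value, and pair each window with that prob as target."""
    X, y = [], []
    # group consecutive rows that have the same prob value
    group_id = (df['prob'] != df['prob'].shift()).cumsum()
    for _, g in df.groupby(group_id):
        if len(g) < lookback:
            continue  # e.g. prob = 94 occurs only 4 times, so it is skipped
        vals = g[['var1', 'var2']].to_numpy()
        # non-overlapping windows within the group
        for start in range(0, len(g) - lookback + 1, lookback):
            X.append(vals[start:start + lookback])
            y.append(g['prob'].iloc[0])
    return np.array(X), np.array(y)

# X would have shape (n_samples, 6, 2), ready for an LSTM input layer:
# X, y = make_windows(df)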
-
Replace bidirectional LSTM with GRU in coref?
I am training the coarse-to-fine coreference model (for a language other than English) from Allennlp with template configs from bert_lstm.jsonnet. When I replace the type "lstm" of the context layer with "gru", it works, but it seems to have very little impact on training. The same 63 GB of RAM are consumed each epoch, and the validation f1-score hovers around the same value. Does this change in the config actually replace the Bi-LSTM layer with a Bi-GRU layer, or am I missing something?
"context_layer": { "type": "gru", "bidirectional": true, "hidden_size": gru_dim, "input_size": bert_dim, "num_layers": 1 },
-
NaNs in predictions with LSTM
I have an LSTM model that I have trained and tested with one dataset. Now I want to test it on another dataset, and I use the following snippet:
from keras.models import load_model
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_percentage_error

def Create_Dataset(df, lookback=1, prediction_horizon=1):
    X, Y = [], []
    for i in range(lookback, len(df)-lookback):
        X.append(df[i-lookback : i, 0])
        Y.append(df[i : i + prediction_horizon, 0])
    return np.array(X), np.array(Y)

model = load_model('lstm.h5')

df = pd.read_csv('datasets/Residential_4.csv')
data = df['energy_kWh'].values
data = data.reshape((-1,1))

scaler = MinMaxScaler(feature_range=(0,1))
data = scaler.fit_transform(data)

lookback = 7 * 24
prediction_horizon = 24
X_test, Y_test = Create_Dataset(data, lookback, prediction_horizon)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

test_predict = model.predict(X_test)
test_predict = scaler.inverse_transform(test_predict)
Y_test = scaler.inverse_transform(Y_test)
The problem, however, is that test_predict has NaN values after row 96. Any idea why this is happening?
-
Onnxruntime NodeJS set intraOpNumThreads and interOpNumThreads by execution mode
I'm using Onnxruntime in NodeJS to execute onnx converted models on the cpu backend to run inference. According to the docs, the optional parameters are the following:
var options = {
  /**
   * Execution providers.
   */
  executionProviders: ['cpu'],
  /**
   * The optimization level.
   * 'disabled'|'basic'|'extended'|'all'
   */
  graphOptimizationLevel: 'all',
  /**
   * The intra OP threads number.
   * Change the number of threads used in the threadpool for Intra Operator Execution for CPU operators.
   */
  intraOpNumThreads: 1,
  /**
   * The inter OP threads number.
   * Controls the number of threads used to parallelize the execution of the graph (across nodes).
   */
  interOpNumThreads: 1,
  /**
   * Whether to enable CPU memory arena.
   */
  enableCpuMemArena: false,
  /**
   * Whether to enable memory pattern.
   */
  enableMemPattern: false,
  /**
   * Execution mode.
   * 'sequential'|'parallel'
   */
  executionMode: 'sequential',
  /**
   * Log severity level.
   * @see ONNX.Severity
   * 0|1|2|3|4
   */
  logSeverityLevel: ONNX.Severity.kERROR,
  /**
   * Log verbosity level.
   */
  logVerbosityLevel: ONNX.Severity.kERROR,
};
Specifically, I can control (like in Tensorflow) the threading parameters intraOpNumThreads and interOpNumThreads, which are defined as above.
I want to optimize both of them for the sequential and parallel execution modes (controlled by the executionMode parameter defined above). My approach was like

var numCPUs = require('os').cpus().length;
options.intraOpNumThreads = numCPUs;
in order to have at least as many threads as available cpus; hence on my macbook pro I get this session configuration for the sequential execution mode:

{
  executionProviders: [ 'cpu' ],
  graphOptimizationLevel: 'all',
  intraOpNumThreads: 8,
  interOpNumThreads: 1,
  enableCpuMemArena: false,
  enableMemPattern: false,
  executionMode: 'sequential',
  logSeverityLevel: 3,
  logVerbosityLevel: 3
}
and for the parallel execution mode I set both:

{
  executionProviders: [ 'cpu' ],
  graphOptimizationLevel: 'all',
  intraOpNumThreads: 8,
  interOpNumThreads: 8,
  enableCpuMemArena: false,
  enableMemPattern: false,
  executionMode: 'parallel',
  logSeverityLevel: 3,
  logVerbosityLevel: 3
}
or another approach could be to consider a percentage of the available cpus:
var perc = (val, tot) => Math.round( tot*val/100 );
var numCPUs = require('os').cpus().length;
if (options.executionMode == 'parallel') { // parallel
  options.interOpNumThreads = perc(50, numCPUs);
  options.intraOpNumThreads = perc(10, numCPUs);
} else { // sequential
  options.interOpNumThreads = perc(100, numCPUs);
  options.intraOpNumThreads = 1;
}
but I cannot find any doc confirming that this is the optimal configuration for those two scenarios based on the executionMode ('sequential' and 'parallel'). Is this approach theoretically correct?
-
Installing ncnn on Google Colab
I have trained a custom YOLOX model on google colab and want to convert it from .onnx to .ncnn.
I'm using the following as directions: https://github.com/Megvii-BaseDetection/YOLOX/blob/main/demo/ncnn/cpp/README.md#step4
Step 1 requires building ncnn with directions: https://github.com/Tencent/ncnn/wiki/how-to-build#build-for-macos
The directions give instructions for building on different devices.
My question: Which instructions should I use to build ncnn on Google Colab?
-
yolov5 custom trained weights converted to ONNX showing wrong labels
After converting custom trained Yolov5 weights (.pt) to ONNX and running inference on the ONNX file using:
https://github.com/BlueMirrors/Yolov5-ONNX.git
the detection works well, but my image labels/classes are the COCO labels (i.e. person, airplane, etc.) instead of my own labels. How can I change the labels to my own? I'm unsure of the formatting of the json (or yaml?) file. Thanks!
def detect_image(device, weight, image_path, output_image):
    # load model
    model = Yolov5Onnx(classes="coco", backend="onnx", weight=weight, device=device)

    # read image
    image = cv2.imread(image_path)

    # inference
    preds = model(image)
    print(preds)

    # draw image
    preds.draw(image)

    # write image
    cv2.imwrite(output_image, image)
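What I would like to end up with is something like the snippet below: reading my own label names from the YOLOv5 training data.yaml and passing them instead of "coco" (this is only my guess; whether the classes argument of Yolov5Onnx accepts a plain list of names is essentially what I'm asking):

import yaml

# The YOLOv5 dataset config used for training holds the label names, e.g.
# names: ['helmet', 'vest', 'person']
with open("data.yaml") as f:          # hypothetical path to the training data.yaml
    class_names = yaml.safe_load(f)["names"]

# Assumption: Yolov5Onnx (imported from the wrapper as in the snippet above)
# accepts a list of class names instead of the string "coco".
model = Yolov5Onnx(classes=class_names, backend="onnx",
                   weight="best.onnx", device="cpu")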
-
How to use onnxruntime parallel with flask?
I created a server where I want to run an onnxruntime session in parallel.
First question: should I use multiple threads or multiple processes?
I tried to use multiple threads with app.run(host='127.0.0.1', port='12345', threaded=True).
When I run 3 threads and the GPU's memory stays below 8 GB, the program can run. But when I run 4 threads and the GPU's memory goes above 8 GB, the program fails with this error: onnxruntime::CudaCall CUBLAS failure 3: CUBLAS_STATUS_ALLOC_FAILED.
I know that the problem is running out of GPU memory, but I would like the program not to crash. So I tried to limit the number of threads and set intra_op_num_threads = 2, or inter_op_num_threads = 2, or os.environ["OMP_NUM_THREADS"] = "2", but that doesn't work. I also tried 'gpu_mem_limit', which doesn't work either.

import onnxruntime as rt
from flask import Flask, request

app = Flask(__name__)
sess = rt.InferenceSession(model_XXX, providers=['CUDAExecutionProvider'])

@app.route('/algorithm', methods=['POST'])
def parser():
    prediction = sess.run(...)

if __name__ == '__main__':
    app.run(host='127.0.0.1', port='12345', threaded=True)
My understanding is that the Flask HTTP server may use a different sess for each call. How can I make every call use the same onnxruntime session?
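Concretely, this is the direction I have been experimenting with: creating the session once, capping CUDA memory through the provider options, and serializing the actual sess.run calls with a lock (just a sketch of my attempt; the 6 GB cap, the JSON payload format, the model path, and whether this provider-options form is available in ONNX Runtime 1.8 are all assumptions on my side):

import threading

import numpy as np
import onnxruntime as rt
from flask import Flask, request

app = Flask(__name__)

# One session shared by every request thread, with a CUDA memory cap (6 GB is an arbitrary value).
providers = [
    ("CUDAExecutionProvider", {"gpu_mem_limit": 6 * 1024 * 1024 * 1024}),
    "CPUExecutionProvider",
]
sess = rt.InferenceSession("model.onnx", providers=providers)  # hypothetical model path

# Serialize inference so concurrent requests do not allocate GPU workspace at the same time.
lock = threading.Lock()

@app.route("/algorithm", methods=["POST"])
def parser():
    # Assumes the request body is JSON like {"data": [...]}; adjust to the real payload.
    data = np.asarray(request.get_json()["data"], dtype=np.float32)
    input_feed = {sess.get_inputs()[0].name: data}
    with lock:
        prediction = sess.run(None, input_feed)
    return {"prediction": prediction[0].tolist()}

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=12345, threaded=True)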
System information
- OS Platform and Distribution: Windows10
- ONNX Runtime version: 1.8
- Python version: python 3.7
- GPU model and memory: RTX3070 - 8G
-
How to use onnxruntime with flask
I created a server that can run a session with multiple threads using Flask.
When I run 3 threads and the GPU's memory stays below 8 GB, the program can run. But when I run 4 threads and the GPU's memory goes above 8 GB, the program fails with this error: onnxruntime::CudaCall CUBLAS failure 3: CUBLAS_STATUS_ALLOC_FAILED.
I know that the problem is running out of GPU memory, but I would like the program not to crash. So I tried to limit the number of threads and set intra_op_num_threads = 2, or inter_op_num_threads = 2, or os.environ["OMP_NUM_THREADS"] = "2", but that doesn't work. I also tried 'gpu_mem_limit', which doesn't work either.

import onnxruntime as rt
from flask import Flask, request

app = Flask(__name__)
sess = rt.InferenceSession(model_XXX, providers=['CUDAExecutionProvider'])

@app.route('/algorithm', methods=['POST'])
def parser():
    prediction = sess.run(...)

if __name__ == '__main__':
    app.run(host='127.0.0.1', port='12345', threaded=True)
My understanding is that the Flask HTTP server may use a different sess for each call. How can I make every call use the same onnxruntime session?

System information
- OS Platform and Distribution: Windows10
- ONNX Runtime version: 1.8
- Python version: python 3.7
- GPU model and memory: RTX3070 - 8G
-
pth to onnx model conversion
I am trying to convert a .pth model to ONNX, and while converting I face this issue:
RuntimeError: Exporting the operator _convolution_mode to ONNX opset version 13 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub.
pytorch version is 1.11.0+cpu
model -
cnn = nn.Sequential()
cnn.add_module('l1', nn.Conv2d(3, 32, 2, 1, "same"))
cnn.add_module('l2', nn.ReLU())
cnn.add_module('l3', nn.MaxPool2d(3, 3))
...
cnn.add_module('fl', nn.LogSoftmax())

torch.onnx.export(model, torch.zeros(16, 3, 224, 224).to(device), save_file,
                  verbose=True,
                  opset_version=13,
                  export_params=True,
                  training=torch.onnx.TrainingMode.EVAL,
                  do_constant_folding=True,
                  input_names=['cnn.l1'],
                  output_names=['cnn.fl'],
                  dynamic_axes={'cnn.l1': {0: 'batch'},
                                'cnn.fl': {0: 'batch'}})
In this, I got two warnings:
UserWarning: Using padding='same' with even kernel lengths and odd dilation may require a zero-padded copy of the input be created (Triggered internally at ../aten/src/ATen/native/Convolution.cpp:744.)
Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
How can I successfully convert the .pth file to ONNX? Thanks.
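To make the question more concrete, my current guess is to avoid padding="same" in the exported model, since the error points at _convolution_mode; the sketch below rebuilds the first layer with explicit padding (kernel sizes taken from the model above, but I have not confirmed that this actually exports):

import torch
import torch.nn as nn

# Guess at a workaround: for kernel_size=2 and stride=1, "same" output size needs
# a total padding of 1 per spatial dim, which can be written explicitly with
# ZeroPad2d((left, right, top, bottom)) instead of padding="same".
cnn = nn.Sequential()
cnn.add_module('l1_pad', nn.ZeroPad2d((0, 1, 0, 1)))
cnn.add_module('l1', nn.Conv2d(3, 32, 2, 1, 0))   # padding=0, handled by the pad layer
cnn.add_module('l2', nn.ReLU())
cnn.add_module('l3', nn.MaxPool2d(3, 3))
# ... (remaining layers as in the original model)
cnn.add_module('fl', nn.LogSoftmax(dim=1))        # explicit dim also silences the second warning

# quick shape check with a dummy batch
out = cnn(torch.zeros(2, 3, 224, 224))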
-
tf2onnx Unsupported ops: Counter({'SigmoidGrad': 2, 'StridedSliceGrad': 1}) when converting saved model to onnx file
I'm trying to convert a saved TensorFlow model to an ONNX file to be digested by C#. I'm using tf2onnx:
python -m tf2onnx.convert --saved-model "C:\\Users\\goncalo\\OneDrive\\Área de Trabalho\\cyclop\\My_Class_Model" --output "C:\\Users\\goncalo\\OneDrive\\Área de Trabalho\\cyclop\\My_Class_Model_opt_16.onnx" --opset 15
and it gives me these errors:
2022-05-03 12:01:16,131 - ERROR - Tensorflow op [StatefulPartitionedCall/gradient_tape/grad_cam__class/strided_slice/StridedSliceGrad: StridedSliceGrad] is not supported
2022-05-03 12:01:16,131 - ERROR - Tensorflow op [StatefulPartitionedCall/gradient_tape/grad_cam__class/model/Head_out_fc/Sigmoid/SigmoidGrad: SigmoidGrad] is not supported
2022-05-03 12:01:16,134 - ERROR - Tensorflow op [StatefulPartitionedCall/gradient_tape/grad_cam__class/model/Head_fc/swish_activation_1/Sigmoid/SigmoidGrad: SigmoidGrad] is not supported
2022-05-03 12:01:16,149 - ERROR - Unsupported ops: Counter({'SigmoidGrad': 2, 'StridedSliceGrad': 1})
And I'm a bit stuck. Can anyone help?
Thanks
-
Unhandled exception at 0x00007FFABE6A9538 (cudnn_cnn_infer64_8.dll) in Onnx.exe
I am using onnxruntime-gpu to run an object detection model in C++. I installed onnxruntime GPU version 1.6.0 and I am using it in Visual Studio 2019. But no matter what version I use, I get this error: "Unhandled exception at 0x00007FFABE6A9538 (cudnn_cnn_infer64_8.dll) in Onnx.exe". The model is loaded successfully with onnxruntime-gpu, but this error occurs while performing inference. Please help me figure this out. The model can be loaded and run successfully with onnxruntime CPU.
-
How to convert TensorFlow 2 saved model to be used with OpenCV dnn.readNet
I am struggling to find a way to convert my network, trained with the TensorFlow 2 Object Detection API, so it can be used with OpenCV for deployment purposes. I tried two methods for that but without success. Could someone help me resolve this issue or propose the best and easiest deep learning framework for converting my model to OpenCV (OpenCV friendly)? I really appreciate any help you can provide.
This is my system information:
OS Platform: Windows 10 64 bits
Tensorflow Version: 2.8
Python version: 3.9.7
OpenCV version: 4.5.5
1st Method: Using tf2onnx
I used the following code since I am using TensorFlow 2
python -m tf2onnx.convert --saved-model tensorflow-model-path --output model.onnx --opset 15
The conversion process generates model.onnx successfully. However, when I try to read the converted model, I get the following error:
File "C:\Tensorflow\testcovertedTF2ToONNX.py", line 10, in <module> net = cv2.dnn.readNetFromONNX('C:/Tensorflow/model.onnx') cv2.error: Unknown C++ exception from OpenCV code
The code used to read the converted network is simple.
import cv2
import numpy as np

image = cv2.imread("img002500.jpg")
if image is None:
    print("image emplty")
image_height, image_width, _ = image.shape

net = cv2.dnn.readNetFromONNX('model.onnx')
image = image.astype(np.float32)

input_blob = cv2.dnn.blobFromImage(image, 1, (640,640), 0, swapRB=False, crop=False)
net.setInput(input_blob)
output = net.forward()
2nd Method: Trying to get Frozen graph from saved model
I tried to get frozen_graph.pb from my saved_model using the script below, found in https://github.com/opencv/opencv/issues/16879#issuecomment-603815872

import tensorflow as tf
print(tf.__version__)

from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

loaded = tf.saved_model.load('models/mnist_test')
infer = loaded.signatures['serving_default']

f = tf.function(infer).get_concrete_function(
    input_tensor=tf.TensorSpec(shape=[None, 640, 640, 3], dtype=tf.float32))
f2 = convert_variables_to_constants_v2(f)
graph_def = f2.graph.as_graph_def()

# Export frozen graph
with tf.io.gfile.GFile('frozen_graph.pb', 'wb') as f:
    f.write(graph_def.SerializeToString())
Then, I tried to generate the text graph representation (graph.pbtxt) using tf_text_graph_ssd.py found in https://github.com/opencv/opencv/wiki/TensorFlow-Object-Detection-API
python tf_text_graph_ssd.py --input path2frozen_graph.pb --config path2pipeline.config --output outputgraph.pbtxt
The execution of this script returns the following error:
cv.dnn.writeTextGraph(modelPath, outputPath)
cv2.error: OpenCV(4.5.5) D:\a\opencv-python\opencv-python\opencv\modules\dnn\src\tensorflow\tf_graph_simplifier.cpp:1052: error: (-215:Assertion failed) permIds.size() == net.node_size() in function 'cv::dnn::dnn4_v20211220::sortByExecutionOrder'

During the handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Tensorflow\generatepBtxtgraph\tf_text_graph_ssd.py", line 413, in <module>
    createSSDGraph(args.input, args.config, args.output)
  File "C:\Tensorflow\generatepBtxtgraph\tf_text_graph_ssd.py", line 127, in createSSDGraph
    writeTextGraph(modelPath, outputPath, outNames)
  File "C:\Tensorflow\generatepBtxtgraph\tf_text_graph_common.py", line 320, in writeTextGraph
    from tensorflow.tools.graph_transforms import TransformGraph
ModuleNotFoundError: No module named 'tensorflow.tools.graph_transforms'
I then tried to read the generated frozen model (without a graph.pbtxt) using dnn.readNet and the code below:
import cv2
import numpy as np

image = cv2.imread("img002500.jpg")
if image is None:
    print("image emplty")
image_height, image_width, _ = image.shape

net = cv2.dnn.readNet('frozen_graph_centernet.pb')
image = image.astype(np.float32)

# create blob from image (opencv dnn way of pre-processing)
input_blob = cv2.dnn.blobFromImage(image, 1, (1024,1024), 0, swapRB=False, crop=False)
net.setInput(input_blob)
output = net.forward()
This returns the following error:
Traceback (most recent call last):
  File "C:\Tensorflow\testFrozengraphTF2.py", line 14, in <module>
    output = net.forward()
cv2.error: OpenCV(4.5.5) D:\a\opencv-python\opencv-python\opencv\modules\dnn\src\dnn.cpp:621: error: (-2:Unspecified error) Can't create layer "StatefulPartitionedCall" of type "StatefulPartitionedCall" in function 'cv::dnn::dnn4_v20211220::LayerData::getLayerInstance'
I understand that OpenCV doesn't import models with StatefulPartitionedCall (TF Eager mode). Unfortunately, this means the script found to export my saved model to frozen_graph did not work.
saved model
You can get my saved model from the link below:
https://www.dropbox.com/s/liw5ff87rz7v5n5/my_model.zip?dl=0
Note: the exported model works well with the TensorFlow script.