Onnxruntime NodeJS set intraOpNumThreads and interOpNumThreads by execution mode
I'm using Onnxruntime in NodeJS to run inference on ONNX-converted models with the cpu backend.
According to the docs, the optional parameters are the following:
var options = {
/**
*
*/
executionProviders: ['cpu'],
/*
* The optimization level.
* 'disabled'|'basic'|'extended'|'all'
*/
graphOptimizationLevel: 'all',
/**
* The intra OP threads number.
* change the number of threads used in the threadpool for Intra Operator Execution for CPU operators
*/
intraOpNumThreads: 1,
/**
* The inter OP threads number.
* Controls the number of threads used to parallelize the execution of the graph (across nodes).
*/
interOpNumThreads: 1,
/**
* Whether enable CPU memory arena.
*/
enableCpuMemArena: false,
/**
* Whether enable memory pattern.
*
*/
enableMemPattern: false,
/**
* Execution mode.
* 'sequential'|'parallel'
*/
executionMode: 'sequential',
/**
* Log severity level
* @see ONNX.Severity
* 0|1|2|3|4
*/
logSeverityLevel: ONNX.Severity.kERROR,
/**
* Log verbosity level.
*
*/
logVerbosityLevel: ONNX.Severity.kERROR,
};
Specifically, I can control (as in TensorFlow) the threading parameters intraOpNumThreads and interOpNumThreads, which are defined as above.
I want to optimize both of them for the sequential and parallel execution modes (controlled by the executionMode parameter defined above).
My first approach was:
var numCPUs = require('os').cpus().length;
options.intraOpNumThreads = numCPUs;
so that there are at least as many threads as available CPUs. On my MacBook Pro this gives the following session configuration for sequential execution mode:
{
executionProviders: [ 'cpu' ],
graphOptimizationLevel: 'all',
intraOpNumThreads: 8,
interOpNumThreads: 1,
enableCpuMemArena: false,
enableMemPattern: false,
executionMode: 'sequential',
logSeverityLevel: 3,
logVerbosityLevel: 3
}
and for parallel execution mode I set both:
{
executionProviders: [ 'cpu' ],
graphOptimizationLevel: 'all',
intraOpNumThreads: 8,
interOpNumThreads: 8,
enableCpuMemArena: false,
enableMemPattern: false,
executionMode: 'parallel',
logSeverityLevel: 3,
logVerbosityLevel: 3
}
Another approach could be to use a percentage of the available CPUs:
var perc = (val, tot) => Math.round(tot * val / 100);
var numCPUs = require('os').cpus().length;
if (options.executionMode === 'parallel') { // parallel
  options.interOpNumThreads = perc(50, numCPUs);
  options.intraOpNumThreads = perc(10, numCPUs);
} else { // sequential
  options.interOpNumThreads = perc(100, numCPUs);
  options.intraOpNumThreads = 1;
}
However, I cannot find any documentation confirming that this is the optimal configuration for these two scenarios, based on the executionMode ('sequential' or 'parallel'). Is this approach theoretically correct?
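One detail of the percentage approach that I noticed: with Math.round, a small percentage of a small core count rounds down to 0 (for example 10% of 4 CPUs). Below is a minimal sketch of a clamped variant that always returns at least one thread. The names percClamped and threadOptionsFor are hypothetical helpers of mine, not part of onnxruntime, and the percentages are just the ones from my snippet, not recommended values.

```javascript
const os = require('os');

// Percentage of `total`, rounded, but never below `min` (default 1).
function percClamped(val, total, min = 1) {
  return Math.max(min, Math.round((total * val) / 100));
}

// Hypothetical helper: returns the two thread counts for a given execution
// mode, using the same percentages as in the percentage approach.
function threadOptionsFor(executionMode, numCPUs = os.cpus().length) {
  if (executionMode === 'parallel') {
    return {
      interOpNumThreads: percClamped(50, numCPUs),
      intraOpNumThreads: percClamped(10, numCPUs),
    };
  }
  // sequential
  return {
    interOpNumThreads: percClamped(100, numCPUs),
    intraOpNumThreads: 1,
  };
}

// On a 4-core machine, the unclamped helper would give intraOpNumThreads = 0.
console.log(threadOptionsFor('parallel', 4));
// prints { interOpNumThreads: 2, intraOpNumThreads: 1 }
```

The result can be merged into the session options with Object.assign(options, threadOptionsFor(options.executionMode)).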