Accessing GpuMat on another GPU
I'm pretty sure this is extremely dodgy code, but I just want to make sure. Assume I have two GPUs, and let us skip error checking:
cudaSetDevice(0);
cv::cuda::GpuMat foo(5, 5, CV_8UC1);
foo.at<uchar>(2, 2) = 128;
cudaSetDevice(1);
foo.at<uchar>(4, 4) = 192;
Is GpuMat or the CUDA runtime smart enough to know that foo's data pointer points to GPU 0, or am I (as I suspect) accessing potentially unowned memory on GPU 1 with the foo.at<uchar>(4, 4) = 192; line? I'm using OpenCV 3.2 and CUDA 8.0 on Ubuntu 16.04.
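For illustration only (this sketch is not part of the original question): with CUDA 8.0's unified virtual addressing on 64-bit Linux, you can at least ask the runtime which device owns the GpuMat buffer before touching it. A minimal sketch, error checking omitted:

#include <cstdio>
#include <cuda_runtime.h>
#include <opencv2/core/cuda.hpp>

int main()
{
    cudaSetDevice(0);
    cv::cuda::GpuMat foo(5, 5, CV_8UC1);           // allocated on device 0

    cudaPointerAttributes attr;
    cudaPointerGetAttributes(&attr, foo.data);      // ask the runtime about foo's buffer
    std::printf("foo.data is owned by device %d\n", attr.device);

    cudaSetDevice(1);
    // foo.data still refers to device 0 memory; GpuMat does not migrate the allocation
    // when the current device changes, so any access issued from here targets device 0's
    // buffer and is only safe if UVA / peer-access rules happen to permit it.
    return 0;
}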
See also questions close to this topic
-
Can you deserialize bytes with memcpy?
If I have a class that has primitives in it, a bunch of ints or chars for example, is it possible to serialize and deserialize it with memcpy?
MyClass toSerialize;
unsigned char byteDump[sizeof(toSerialize)];
memcpy(&byteDump, &toSerialize, sizeof(toSerialize));
WriteToFile(byteDump);
Then on another program or computer, do this:
MyClass toDeserialize;
unsigned char byteDump[sizeof(toSerialize)];
LoadFile(byteDump);
memcpy(&toDeserialize, &byteDump, sizeof(byteDump));
I have cases where this does in fact work in the same program. But if I try to run it on other programs or PCs, it sometimes does not work and MyClass will have different values. Is this safe to do or not?
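As a side note (not from the original question): whether this is safe hinges on MyClass being trivially copyable and on both programs agreeing on padding, endianness, and sizeof(MyClass). A minimal sketch of the same round trip with that requirement made explicit, using a hypothetical MyClass:

#include <cstring>
#include <type_traits>

struct MyClass {            // hypothetical example type containing only primitives
    int  a;
    char b;
};

// memcpy-based (de)serialization is only well-defined for trivially copyable types,
// and even then the byte layout must match on both ends of the transfer.
static_assert(std::is_trivially_copyable<MyClass>::value,
              "memcpy serialization requires a trivially copyable type");

int main()
{
    MyClass toSerialize{42, 'x'}, toDeserialize{};
    unsigned char byteDump[sizeof(MyClass)];
    std::memcpy(byteDump, &toSerialize, sizeof(toSerialize));   // serialize
    std::memcpy(&toDeserialize, byteDump, sizeof(byteDump));    // deserialize
    return toDeserialize.a == 42 ? 0 : 1;
}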
-
Unity Native c++ plugin memory usage
I am allocating memory from my C++ plugin in Unity, but I'm not sure it is working correctly.
When I allocate ~25 GB of memory (for testing purposes), it appears (according to the Windows Task Manager) that my memory is not being allocated. I then check the Resource Monitor and find that it says Unity is using ~25 GB of memory.
Why is this?
Also, the paging file is set to 0.
-
C++ basic_string Destructor Segmentation Fault
I have a program that is trying to log some contents of a std::vector<std::string> and it keeps segmentation faulting when the destructor for the std::vector<> is called. Here is the output of my gdb session:

# gdb -p 11914
(gdb) Reading symbols...
### REST OF 'reading symbols' OMITTED FOR BREVITY ###
(gdb) list StddsClient::sendDatisMsg()
### some output ###
(gdb) b 716
(gdb) continue
Continuing.

Breakpoint 1, StddsClient::sendDatisMsg (this=0x7fffa42164a0) at StddsClient.cpp:716
716         if(!stdds_conn.isConnected())
(gdb) inspect atm_vec
$1 = std::vector of length 3, capacity 4 = {"\001QU ANPDAXA\r\n.SFOATXA 232039\r\n\002TIS\r\nAD SFO /OS AD2039\r\n- SFO ARR INFO D 2039Z. 10010KT 9SM FEW025 29/20 A3002 (THREE ZERO ZERO TWO) RMK 51004 A02 SLP156. ILS APPROACHES ON RWY 28R, 28C, VISUAL APPROA"..., "INFO D", "A"}
(gdb) continue
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007fb3ac6d3496 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() () from /usr/lib64/libstdc++.so.6
(gdb) backtrace
#0  0x00007fb3ac6d3496 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() () from /usr/lib64/libstdc++.so.6
#1  0x0000000000410be0 in _Destroy<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > (this=<value optimized out>) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_construct.h:83
#2  __destroy<std::basic_string<char, std::char_traits<char>, std::allocator<char> >*> (this=<value optimized out>) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_construct.h:93
#3  _Destroy<std::basic_string<char, std::char_traits<char>, std::allocator<char> >*> (this=<value optimized out>) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_construct.h:116
#4  _Destroy<std::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > (this=<value optimized out>) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_construct.h:142
#5  ~vector (this=<value optimized out>) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_vector.h:313
#6  StddsClient::sendDatisMsg (this=<value optimized out>) at StddsClient.cpp:735
#7  0x000000000041692f in StddsClient::checkConnection (this=0x7fff39fb1d70) at StddsClient.cpp:174
#8  0x000000000041714a in StddsClient::run (this=0x7fff39fb1d70) at StddsClient.cpp:557
#9  0x0000000000423c7d in main (argc=<value optimized out>, argv=<value optimized out>) at StddsCommd.cpp:21
(gdb)
As you can see, I put a breakpoint before the function finishes and I inspect the vector atm_vec, and its contents look okay to me. Nothing weird in there and everything looks correct to me. Here is the StddsClient::sendDatisMsg() function:

709 void StddsClient::sendDatisMsg()
710 {
711     vector<string> atm_vec = atis_send_reader.readAtisSend();
712     if(atm_vec.empty())
713     {
714         return;
715     }
716     if(!stdds_conn.isConnected())
717     {
718         TLOGMSG(LOG_ERR) << "Unable to send DATIS " << atm_vec[1] << " for EDITOR " << atm_vec[2] << " because the STDDS connection is currently down.";
719         return;
720     }
721     DatisMessage tis_msg(icao_airport, atm_vec[0]);
722     try
723     {
724         stdds_conn << tis_msg.getMessage();
725     }
726     catch(std::exception& e)
727     {
728         TLOGMSG(LOG_ERR) << "Failed to send DATIS " << atm_vec[1] << " for EDITOR " << atm_vec[2] << ". ";
729         throw;
730     }
731     TLOGMSG(LOG_INFO) << "Successfully delivered DATIS " << atm_vec[1] << " for EDITOR " << atm_vec[2] << " to STDDS.";
732     if(RawLogger::logRawMessage(tis_msg.getMessage(), stdds_log_file) == -1)
733     {
734         LOGERR(LOG_ERR, "Failed to log TS message", errno);
735     }
736 }
In this scenario, the line if(!stdds_conn.isConnected()) on line 716 in the above source code evaluates to true, so it dips down into the if construct and executes the TLOGMSG macro. Here are the header and source files associated with the TLOGMSG macro.

Here is the TdlsLogger.hpp file:
TdlsLogger.hpp:

#ifndef __TDLS_LOGGER_HPP__
#define __TDLS_LOGGER_HPP__

#include <sstream>
#include <string>
#include "tdls_log.h"

using namespace std;

#define LOGFLAGS ( LOG_PID | LOG_CONS | LOG_NOWAIT )
#define TLOGMSG(LVL) TdlsLogger((LVL), __FILE__, __LINE__).getStream()
#define TLOGERR(LVL, ERR) TdlsLogger((LVL), __FILE__, __LINE__, (ERR)).getStream()

class TdlsLogger
{
    static string program_name;
    ostringstream oss;
    int log_lvl;
    int line_number;
    int errno;
    string file_name;

public:
    TdlsLogger(int lvl, const string& fname, int linenum, int err = 0);
    ~TdlsLogger();
    TdlsLogger(TdlsLogger& other);
    TdlsLogger& operator = (TdlsLogger& other);
    ostringstream& getStream();
    void syslog_send(bool retain_contents = false);
    static void init(const string& progname, int fac);
    static void closeLog();
};

#endif
Here is the TdlsLogger.cpp file:
TdlsLogger.cpp:

#include "TdlsLogger.hpp"

string TdlsLogger::program_name;

TdlsLogger::TdlsLogger(int lvl, const string& fname, int linenum, int err)
{
    log_lvl = lvl;
    line_number = linenum;
    errno = err;
    file_name = fname;
}

TdlsLogger::~TdlsLogger()
{
    syslog_send();
}

TdlsLogger::TdlsLogger(TdlsLogger &other)
{
    log_lvl = other.log_lvl;
    line_number = other.line_number;
    errno = other.errno;
    file_name = other.file_name;
    oss.str(other.oss.str());
}

TdlsLogger& TdlsLogger::operator = (TdlsLogger &other)
{
    log_lvl = other.log_lvl;
    line_number = other.line_number;
    errno = other.errno;
    file_name = other.file_name;
    oss.str(other.oss.str());
    return *this;
}

void TdlsLogger::syslog_send(bool retain_contents)
{
    if(oss.str().length() > 0)
    {
        tdls_log(log_lvl, oss.str().c_str(), file_name.c_str(), line_number, errno);
        if(!retain_contents)
        {
            oss.str("");
            oss.clear();
        }
    }
}

void TdlsLogger::init(const string& progname, int fac)
{
    TdlsLogger::program_name.assign(progname);
    openlog(program_name.c_str(), LOGFLAGS, fac);
}

void TdlsLogger::closeLog()
{
    closelog();
}

ostringstream& TdlsLogger::getStream()
{
    return oss;
}
Here is how the vector is constructed from the AtisSendReader::readAtisSend() function if this matters somehow.
const std::string AtisSendReader::soh_str("<SOH>");
const std::string AtisSendReader::etx_str("<ETX>");
const std::string AtisSendReader::stx_str("<STX>");
const std::string AtisSendReader::info_str("INFO ");
const std::string AtisSendReader::editor_str("/OS ");

std::vector<std::string> AtisSendReader::readAtisSend()
{
    std::vector<std::string> atm_vec;
    if(isAtisSendNew())
    {
        bufferAtisSend();
    }
    size_t soh_pos = read_buffer.find(soh_str);
    if(soh_pos == string::npos)
    {
        return atm_vec;
    }
    size_t etx_pos = read_buffer.find(etx_str, soh_pos);
    if(etx_pos == string::npos)
    {
        return atm_vec;
    }
    std::string atm_msg, info_code, editor_code;
    atm_msg.assign(read_buffer.begin() + soh_pos, read_buffer.begin() + etx_pos + etx_str.size());
    read_buffer.erase(read_buffer.begin(), read_buffer.begin() + etx_pos + etx_str.size());
    size_t char_pos = atm_msg.find(soh_str);
    if(char_pos == std::string::npos)
    {
        TLOGMSG(LOG_ERR) << "Unable to find <SOH> in ATM message extracted from atissend.fil.";
    }
    else
    {
        atm_msg.replace(char_pos, soh_str.size(), "^A");
    }
    char_pos = atm_msg.find(stx_str);
    if(char_pos == std::string::npos)
    {
        TLOGMSG(LOG_ERR) << "Unable to find <STX> in ATM message extracted from atissend.fil";
    }
    else
    {
        atm_msg.replace(char_pos, stx_str.size(), "^B");
    }
    char_pos = atm_msg.find(etx_str);
    if(char_pos == std::string::npos)
    {
        TLOGMSG(LOG_ERR) << "Unable to find <ETX> in ATM message extracted from atissend.fil";
    }
    else
    {
        atm_msg.replace(char_pos, etx_str.size(), "^C");
    }
    char_pos = 0;
    while((char_pos = atm_msg.find("\n", char_pos)) != std::string::npos)
    {
        atm_msg.replace(char_pos, 1, "\r\n");
        char_pos += 2;
    }
    size_t info_pos = atm_msg.find(info_str);
    if(info_pos != string::npos)
    {
        info_code.assign(atm_msg.begin() + info_pos, atm_msg.begin() + info_pos + info_str.size() + 1);
    }
    size_t editor_pos = atm_msg.find(editor_str);
    if(editor_pos != string::npos)
    {
        editor_code = atm_msg.at(editor_pos + editor_str.size());
    }
    atm_vec.push_back(atm_msg);
    atm_vec.push_back(info_code);
    atm_vec.push_back(editor_code);
    return atm_vec;
}
The only thing that I can think of is that the TLOGMSG macro is somehow garbling up some memory. But this macro is used everywhere in the source code for our team, and yet it is only blowing up here.

Here is how the stdds_commd binary is built:

[tlytle@vraptor3 stdds]$ make debug
g++ -I../../../src/inc -I../../../src/inc/odb -I../../../../../shared_code/logger -I../../../../../shared_code/socket -Wall -Wno-unknown-pragmas -O2 -g -c -o StddsMessage.o StddsMessage.cpp
g++ -I../../../src/inc -I../../../src/inc/odb -I../../../../../shared_code/logger -I../../../../../shared_code/socket -Wall -Wno-unknown-pragmas -O2 -g -c -o TdlsStatusMsg.o TdlsStatusMsg.cpp
g++ -I../../../src/inc -I../../../src/inc/odb -I../../../../../shared_code/logger -I../../../../../shared_code/socket -Wall -Wno-unknown-pragmas -O2 -g -c -o ClearanceDeliveryMsg.o ClearanceDeliveryMsg.cpp
g++ -I../../../src/inc -I../../../src/inc/odb -I../../../../../shared_code/logger -I../../../../../shared_code/socket -Wall -Wno-unknown-pragmas -O2 -g -c -o StddsSendMsg.o StddsSendMsg.cpp
g++ -I../../../src/inc -I../../../src/inc/odb -I../../../../../shared_code/logger -I../../../../../shared_code/socket -Wall -Wno-unknown-pragmas -O2 -g -c -o AtisSendReader.o AtisSendReader.cpp
g++ -I../../../src/inc -I../../../src/inc/odb -I../../../../../shared_code/logger -I../../../../../shared_code/socket -Wall -Wno-unknown-pragmas -O2 -g -c -o StddsClient.o StddsClient.cpp
g++ -I../../../src/inc -I../../../src/inc/odb -I../../../../../shared_code/logger -I../../../../../shared_code/socket -Wall -Wno-unknown-pragmas -O2 -g -c -o StddsConfig.o StddsConfig.cpp
g++ -I../../../src/inc -I../../../src/inc/odb -I../../../../../shared_code/logger -I../../../../../shared_code/socket -Wall -Wno-unknown-pragmas -O2 -g -c -o StddsCommd.o StddsCommd.cpp
g++ -I../../../src/inc -I../../../src/inc/odb -I../../../../../shared_code/logger -I../../../../../shared_code/socket -Wall -Wno-unknown-pragmas -O2 -g -c -o DatisMessage.o DatisMessage.cpp
g++ -o stdds_commd StddsMessage.o TdlsStatusMsg.o ClearanceDeliveryMsg.o StddsSendMsg.o AtisSendReader.o StddsClient.o StddsConfig.o StddsCommd.o DatisMessage.o -L/opt/FAA/lib -ltdls -lnsl -lexpat -lodb -lodb-boost -lodb-mysql -lrt -lttshared
The TdlsLogger class is a part of the libttshared.so shared library.

I have been working on this for about a week now. I keep running into problems like this where I somehow break something deep down in the guts of C++. This happens quite often, to be honest. So often, in fact, that I am convinced that I am just not a very good programmer.
-
opencv-python cv2.imread() returns a NoneType
I try to use cv2.imread('~/Download/image.jpg') to read an image, but it always returned a NoneType. It seems that this function cannot read any image. I am pretty sure that the path is right. Does anyone know what is going on? Thanks
-
CreateDIBSection ERROR_TAG_NOT_FOUND when transforming a PNG type resource into a cv::Mat
I'm currently using a modified version of the following code, which I found here, to try to convert a .png resource in my project to an HBITMAP and then into a cv::Mat.
cv::Mat Resource2mat(const HMODULE hModule, const LPCSTR lpPNGName)
{
    cv::Mat src;

    HRSRC found = FindResource(hModule, lpPNGName, "PNG");
    unsigned int size = SizeofResource(hModule, found);
    HGLOBAL loaded = LoadResource(hModule, found);
    void* resource_data = LockResource(loaded);

    /* Now we decode the PNG */
    vector<unsigned char> raw;
    unsigned long width, height;
    int err = decodePNG(raw, width, height, (const unsigned char*)resource_data, size);
    if (err != 0)
    {
        cout << "\nError while decoding png splash: " << err << endl;
        return src;
    }

    // copy from the window device context to the bitmap device context
    BITMAPV5HEADER bmpheader = { 0 };
    bmpheader.bV5Size = sizeof(BITMAPV5HEADER);
    bmpheader.bV5Width = width;
    bmpheader.bV5Height = height;
    bmpheader.bV5Planes = 1;
    bmpheader.bV5BitCount = 32;
    bmpheader.bV5Compression = BI_BITFIELDS;
    bmpheader.bV5SizeImage = width * height * 4;
    bmpheader.bV5RedMask = 0x00FF0000;
    bmpheader.bV5GreenMask = 0x0000FF00;
    bmpheader.bV5BlueMask = 0x000000FF;
    bmpheader.bV5AlphaMask = 0xFF000000;
    bmpheader.bV5CSType = LCS_WINDOWS_COLOR_SPACE;
    bmpheader.bV5Intent = LCS_GM_BUSINESS;

    void* converted = NULL;
    HDC screen = GetDC(NULL);
    HBITMAP result = CreateDIBSection(screen, reinterpret_cast<BITMAPINFO*>(&bmpheader), DIB_RGB_COLORS, &converted, NULL, 0);
    cout << "Error Final: " << GetLastError() << endl;

    /* Copy the decoded image into the bitmap in the correct order */
    for (unsigned int y1 = height - 1, y2 = 0; y2 < height; y1--, y2++)
        for (unsigned int x = 0; x < width; x++)
        {
            *((char*)converted + 0 + 4 * x + 4 * width*y2) = raw[2 + 4 * x + 4 * width*y1]; // Blue
            *((char*)converted + 1 + 4 * x + 4 * width*y2) = raw[1 + 4 * x + 4 * width*y1]; // Green
            *((char*)converted + 2 + 4 * x + 4 * width*y2) = raw[0 + 4 * x + 4 * width*y1]; // Red
            *((char*)converted + 3 + 4 * x + 4 * width*y2) = raw[3 + 4 * x + 4 * width*y1]; // Alpha
        }

    GetDIBits(screen, result, 0, height, src.data, (BITMAPINFO *)&bmpheader, DIB_RGB_COLORS);
    cv::Mat Actual = src.clone();
    ReleaseDC(NULL, screen);

    /* Done! */
    return Actual;
}
my .rc file looks like this:
and the resources.h entry looks like this
When running the code and hitting this line, I end up with a 2012 (ERROR_TAG_NOT_FOUND) error:
HBITMAP result = CreateDIBSection(screen, reinterpret_cast<BITMAPINFO*>(&bmpheader), DIB_RGB_COLORS, &converted, NULL, 0);
I found it by calling GetLastError() before and after this line of code.
And this is how I call this function in my int main():
HINSTANCE BotModuleHandle = GetModuleHandle(NULL);
cout << "Attempting to load a resource: " << endl;
cv::Mat S = Resource2mat(BotModuleHandle, MAKEINTRESOURCE(103));
Thanks in advance.
Also, any suggestions for a better approach for converting a PNG resource into a cv::Mat are highly appreciated.
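One commonly suggested alternative, sketched here purely as an illustration (the function name Resource2matAlt is hypothetical, not from the original post), is to skip GDI entirely and hand the raw resource bytes to cv::imdecode:

#include <windows.h>
#include <vector>
#include <opencv2/imgcodecs.hpp>

// Hypothetical sketch: decode the PNG resource straight into a cv::Mat.
cv::Mat Resource2matAlt(HMODULE hModule, LPCSTR lpPNGName)
{
    HRSRC found = FindResourceA(hModule, lpPNGName, "PNG");
    DWORD size = SizeofResource(hModule, found);
    HGLOBAL loaded = LoadResource(hModule, found);
    const unsigned char* bytes = static_cast<const unsigned char*>(LockResource(loaded));

    std::vector<unsigned char> buf(bytes, bytes + size);   // raw PNG stream
    return cv::imdecode(buf, cv::IMREAD_UNCHANGED);        // BGRA cv::Mat; empty on failure
}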
-
OpenCV 3.4.1 Get Primal Form of Custom Trained Linear SVM HoG detectMultiScale
I have trained a Linear SVM in OpenCV 3.4.1. Now I want to use my custom SVM with OpenCV 3's HoG detectMultiScale function. The old method of setting the HoG detector with the custom SVM primal vector no longer works.
For OpenCV 2, one would get the primal vector from the custom-trained SVM like this:
#include "linearsvm.h" LinearSVM::LinearSVM() { qDebug() << "Creating SVM and loading trained data..."; load("/home/pi/trainedSVM.xml"); qDebug() << "Done loading data..."; } std::vector<float> LinearSVM::getPrimalForm() const { std::vector<float> support_vector; int sv_count = get_support_vector_count(); const CvSVMDecisionFunc* df = getDecisionFunction(); if ( !df ) { return support_vector; } const double* alphas = df[0].alpha; double rho = df[0].rho; int var_count = get_var_count(); support_vector.resize(var_count, 0); for (unsigned int r = 0; r < (unsigned)sv_count; r++) { float myalpha = alphas[r]; const float* v = get_support_vector(r); for (int j = 0; j < var_count; j++,v++) { support_vector[j] += (-myalpha) * (*v); } } support_vector.push_back(rho); return support_vector; }
Once the primal vector was created from the trained SVM data, one would set the HoG detector SVM like this:
// Primal form of cvsvm descriptor
vector<float> primalVector = m_CvSVM.getPrimalForm();

qDebug() << "Got primal form of detection vector...";
qDebug() << "Setting SVM detector...";

// Set the SVM Detector - custom trained HoG Detector
m_HoG.setSVMDetector(primalVector);
In OpenCV 3.4.1, this no longer works as CvSVM no longer exists and much of the SVM API has changed.
How do I get the primal vector of my custom SVM in OpenCV 3.4.1 that I trained like this:
// Set up SVM's parameters
cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();
svm->setType(cv::ml::SVM::C_SVC);
svm->setKernel(cv::ml::SVM::LINEAR);
svm->setTermCriteria(cv::TermCriteria(cv::TermCriteria::MAX_ITER, 10, 1e-6));

// Train the SVM with given parameters
cv::Ptr<cv::ml::TrainData> td = cv::ml::TrainData::create(trainingDataMat, cv::ml::ROW_SAMPLE, trainingLabelsMat);

// Or auto train
qDebug() << "Training dataset...";
QElapsedTimer trainingTimer;
trainingTimer.restart();

svm->trainAuto(td);

qDebug() << "Done training dataset in: " << (float)trainingTimer.elapsed() / 1000.0f;
Thanks.
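For reference, OpenCV 3's own samples/cpp/train_HOG.cpp derives the detector vector from a cv::ml::SVM via getSupportVectors() and getDecisionFunction(); a sketch along those lines (adapted for illustration, not verified against the exact training code above) looks like this:

#include <cstring>
#include <vector>
#include <opencv2/ml.hpp>

// Sketch modeled on OpenCV's samples/cpp/train_HOG.cpp: build the primal (detector)
// vector of a trained linear cv::ml::SVM so it can be passed to HOGDescriptor::setSVMDetector.
std::vector<float> getPrimalForm(const cv::Ptr<cv::ml::SVM>& svm)
{
    cv::Mat sv = svm->getSupportVectors();               // one compressed SV for a linear kernel
    cv::Mat alpha, svidx;
    double rho = svm->getDecisionFunction(0, alpha, svidx);

    std::vector<float> detector(sv.cols + 1);
    std::memcpy(detector.data(), sv.ptr<float>(), sv.cols * sizeof(float));
    detector[sv.cols] = static_cast<float>(-rho);         // bias term goes last
    return detector;
}

// Usage (sketch): m_HoG.setSVMDetector(getPrimalForm(svm));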
-
Is it possible to run a whole application on GPU?
The question itself explains everything.
I don't care about how fast the code will execute.
It's all about running a whole application on the GPU.
-
CUDA GeForce 650M sm_30 slower than sm_20
I am using Visual Studio 2015 with CUDA 8.0 and a GeForce 650M.
The card is of compute capability 3.0.
When I use settings compute_20,sm_20 my code runs around 150x slower in debug mode than in release mode - this is not what I am interested in.
When I use settings compute_30,sm_30, the debug version runs as before, but the release build now runs as slowly as debug too ... Furthermore, I use an extra CPU thread to print the progress of the number of iterations via
CHECK(cudaHostGetDevicePointer((void **)&d_iter, (void *)h_iter, 0));
to avoid printing directly from the kernel. This also does not work in the release version with sm_30.
Am I missing some additional settings in Visual Studio?
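For context, here is a reconstruction of the mapped pinned-memory progress counter described above; the names and setup are my own assumptions, not the original code:

#include <cuda_runtime.h>

volatile unsigned int* h_iter = nullptr;   // host view, polled by the extra CPU thread
unsigned int*          d_iter = nullptr;   // device view, updated by the kernel

void setupProgressCounter()
{
    cudaSetDeviceFlags(cudaDeviceMapHost);                                      // enable mapped pinned memory
    cudaHostAlloc((void**)&h_iter, sizeof(unsigned int), cudaHostAllocMapped);  // pinned, host-visible counter
    *h_iter = 0;
    cudaHostGetDevicePointer((void**)&d_iter, (void*)h_iter, 0);                // as used in the question
}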
-
How to find out the required amount of memory for the cusolverRF function?
In my project, I need to solve a bunch of sparse linear systems of equations Ai*xi=bi. Each matrix Ai has the same sparsity pattern and is not symmetric (in fact, those are pentadiagonal systems). Therefore, I chose to solve them with the cusolver function cusolverRfBatchSolve(). Essentially, I perform the LU factorization of the first matrix on the host and then use it to solve my systems on the GPU in batch.

Here is the weird thing: my test.cu script with a bunch of dummy systems worked well. However, in my real program, after calling a class method that solves the batched system with cusolverRfBatchSolve(), I noticed that another array was changed, although it was not supposed to be changed. I made sure that this is not my fault. After that, I compared the pointer of the array that was changed unpredictably (called wwp_d in my program) with the whole bunch of arrays that were used with cusolver.

I found out that the array d_Temp is very close to the array wwp_d in memory. This is a temporary array that is used by the cusolverRfBatchSolve() function. Its size, according to both the cusolver documentation and the CUDA sample CUDA Samples\v9.0\7_CUDALibraries\cuSolverRf, is supposed to be n*batchsize*sizeof(double), where n is the size of a single right-hand side and batchsize is the size of the batch (the number of linear systems to be solved).

So, both the documentation and the CUDA sample seem to be wrong. Is there a way to figure out the real amount of memory required for the temporary storage in the cusolverRfBatchSolve() function? In fact, increasing the amount of preallocated memory by a factor of 100 solves my problem: the program starts to behave in a predictable way.

P.S. My code is messy and part of a bigger project. I am not going to reproduce it here; it would only confuse the community.
Thanks, Mikhail
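Editorial sketch of the padding workaround described in the question (not from the original post): making the over-allocation explicit and isolated at least keeps an overrun away from neighbouring arrays such as wwp_d. The factor of 100 is the empirical value mentioned above, not a documented requirement.

#include <cuda_runtime.h>

// Allocate the Temp buffer for cusolverRfBatchSolve() with an empirical safety factor
// on top of the documented n*batchsize doubles.
double* allocPaddedTemp(int n, int batchsize, size_t safety_factor = 100)
{
    size_t documented = static_cast<size_t>(n) * static_cast<size_t>(batchsize); // n*batchsize doubles
    double* d_Temp = nullptr;
    cudaMalloc(&d_Temp, documented * safety_factor * sizeof(double));
    return d_Temp;   // pass as the Temp argument (with matching ldt) to cusolverRfBatchSolve()
}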