Trying to do real-time detection, but video is lagging behind

I have done transfer learning on a TensorFlow model to detect Rubik's Cubes. Since I don't have a webcam, I am using an app called IP Webcam to stream my phone's camera, and I grab the live feed with cv2 like this:

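import cv2 as cv
address = "http://{My IP}/video"  # stream URL served by the IP Webcam app
cap = cv.VideoCapture(address)    # open the phone's stream rather than local device 0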

When I run the object detection in real time (on a GTX 1060), the model understandably can't keep up with the camera's 30 fps. But instead of displaying the live detection at, say, 10 fps and skipping the frames it can't process, it seems to process and display every one of the 30 frames per second, however long that takes. As a result the video falls behind real time: if I move, it takes around 5-10 seconds for the movement to show up in the video.
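To illustrate what I think is happening, here is a toy simulation. It is not my actual code: `simulate_lag` and its parameters are made-up names, and I'm assuming that frames pile up in a first-in-first-out buffer somewhere between the camera and my loop.

```python
from collections import deque


def simulate_lag(camera_fps=30, detector_fps=10, seconds=10, drop_stale=False):
    """Return how far behind live (in seconds) the last displayed frame is.

    Time advances in ticks of 1/camera_fps. The camera adds one frame per
    tick; the detector finishes one frame every camera_fps/detector_fps ticks.
    """
    buffer = deque()   # buffered frames, identified by their capture tick
    credit = 0         # integer accumulator that paces the slower detector
    shown_tick = 0     # capture tick of the frame currently on screen
    total_ticks = seconds * camera_fps

    for tick in range(total_ticks):
        buffer.append(tick)          # camera delivers a new frame
        credit += detector_fps
        if credit >= camera_fps:     # detector is ready for another frame
            credit -= camera_fps
            if drop_stale:
                shown_tick = buffer.pop()      # take the newest frame
                buffer.clear()                 # discard everything older
            else:
                shown_tick = buffer.popleft()  # take the oldest frame (FIFO)

    last_tick = total_ticks - 1
    return (last_tick - shown_tick) / camera_fps


if __name__ == "__main__":
    print("FIFO lag (s):", simulate_lag())
    print("Drop-stale lag (s):", simulate_lag(drop_stale=True))
```

With the defaults (30 fps camera, 10 fps detector, 10 seconds), the FIFO case ends up about 6.7 seconds behind, which is in the range I'm seeing, while the case that always takes the newest frame and discards the rest stays at 0.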

I don't know whether this is an issue with TensorFlow or with cv2. Is the problem that I'm not using a directly connected webcam?