Explain the ground-truth .mat file of an image for a CNN

Good evening,

I'm new to coding CNNs. I've got the ShanghaiTech crowd counting dataset, which has (besides the images) .mat files that I believe hold the ground truth (the counts) for the images.

I tried printing the content of one .mat file in Python; here is what I got:

{'image_info': array([[array([[(array([[ 855.32345978,  590.49587357],
   [ 965.5908524 ,  472.79472415],
   [ 937.09478464,  400.93507502],
   [  42.5852337 ,  359.87860699],
   [1017.48233659,    8.99748811],
   [1017.48233659,   23.31916643]]), array([[920]], dtype=uint16))]],
  dtype=[('location', 'O'), ('number', 'O')])]], dtype=object), '__version__': '1.0', '__header__': 'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Fri Nov 18 20:06:05 2016', '__globals__': []}
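
For reference, here is a minimal sketch of how I think the nested structure in the printout could be unpacked. It assumes the dictionary comes from `scipy.io.loadmat` with default options, that each row of `location` is one annotated point, and that `number` is the total count for the image; those are my guesses from the printout, not something I have confirmed. The `mat` dictionary is a tiny synthetic stand-in built with NumPy so the snippet runs without the dataset, and `unpack_image_info` is a name I made up.

```python
import numpy as np


def unpack_image_info(mat):
    """Return (points, count) from a dict shaped like the printout above."""
    # Outer (1, 1) object array -> inner (1, 1) structured array -> one record.
    record = mat["image_info"][0, 0][0, 0]
    # Assumed meaning: an (N, 2) array with one coordinate pair per annotation.
    points = np.asarray(record["location"], dtype=float)
    # Assumed meaning: the total count, stored as a (1, 1) uint16 array.
    count = int(record["number"][0, 0])
    return points, count


# Synthetic stand-in with the same nesting as the printout (made-up values).
locations = np.array([[10.5, 20.0], [30.0, 40.25], [50.0, 60.0]])
inner = np.empty((1, 1), dtype=[("location", "O"), ("number", "O")])
inner["location"][0, 0] = locations
inner["number"][0, 0] = np.array([[3]], dtype=np.uint16)
outer = np.empty((1, 1), dtype=object)
outer[0, 0] = inner
mat = {"image_info": outer}

points, count = unpack_image_info(mat)
print(points.shape, count)
```

With a real file, `mat` would instead be the dictionary I printed above, passed to the same function.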

Each .mat file corresponds to one image. I know that at some point a CNN needs to calculate the error between the network's output and the ground truth, but I don't understand the structure and content of these .mat files.
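
To make concrete what I mean by "error", here is a rough sketch. It assumes the network outputs a single predicted count per image and that the ground truth is the count stored in each .mat file. The function name `count_errors` and the example numbers are made up for illustration, and I don't know whether this is how crowd-counting networks are actually trained.

```python
def count_errors(predicted, actual):
    """Return (MAE, RMSE) between predicted and ground-truth counts."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    diffs = [p - a for p, a in zip(predicted, actual)]
    # Mean absolute error: average size of the miscount per image.
    mae = sum(abs(d) for d in diffs) / len(diffs)
    # Root mean squared error: penalizes large miscounts more heavily.
    rmse = (sum(d * d for d in diffs) / len(diffs)) ** 0.5
    return mae, rmse


# Hypothetical predictions for two images against their ground-truth counts.
mae, rmse = count_errors([900.0, 100.0], [920, 90])
print(mae, rmse)
```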

Can someone explain what's in these files, and how that content is used in crowd estimation?