Should I use an activation function and normalization for regression?

I have a regression problem. A model in a related paper uses min-max normalization to scale both the input and output data to the [-1, 1] range, and applies a tanh activation in the last (output) layer. However, I found this setup very hard to train: the loss and RMSE decrease slowly. If I remove the activation function from the output layer and don't use any data normalization, I get the best score. So, I have two questions:
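To be clear about what I mean by min-max normalization: here is a minimal NumPy sketch of the scaling I'm applying to the targets (the price values are made up for illustration):

```python
import numpy as np

def minmax_scale(x, lo=-1.0, hi=1.0):
    """Linearly map x so its minimum lands on lo and its maximum on hi."""
    x_min, x_max = x.min(), x.max()
    scaled = lo + (x - x_min) * (hi - lo) / (x_max - x_min)
    return scaled, (x_min, x_max)

def minmax_unscale(y, stats, lo=-1.0, hi=1.0):
    """Invert minmax_scale using the stored (min, max) statistics."""
    x_min, x_max = stats
    return x_min + (y - lo) * (x_max - x_min) / (hi - lo)

# made-up target values (e.g. house prices)
prices = np.array([120.0, 300.0, 450.0, 80.0, 210.0])
scaled, stats = minmax_scale(prices)      # all values now in [-1, 1]
restored = minmax_unscale(scaled, stats)  # back to the original scale
```

After training, predictions are mapped back through the inverse transform before computing RMSE on the original scale.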

  1. Do I have to use an activation function in the last layer, together with data normalization, for a regression problem? (All features and the target values are on the same scale, e.g. house prices in different areas.)

  2. Even after removing the activation function from the last layer, I found that the loss decreases faster if I don't use any data normalization. If I normalize the data to the [-1, 1] or [0, 1] range (using min-max normalization), the result is always worse. Why?
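To make the comparison concrete, here is a minimal NumPy sketch of the two output layers I'm comparing; the network body and weights are stand-ins, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(1, 4))  # made-up last-layer weights
b = np.zeros(1)

def hidden(x):
    """Stand-in for the network body: any 4-dim feature vector."""
    return np.tanh(x)

x = rng.normal(size=4)
linear_out = W @ hidden(x) + b         # unbounded output (my best setup)
tanh_out = np.tanh(W @ hidden(x) + b)  # squashed into (-1, 1), as in the paper
# note: a tanh output can never reach targets outside (-1, 1),
# which is why the paper pairs it with min-max scaling of the targets
```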