Efficiently accessing columns of a matrix in C

I have an Nx x Ny matrix U stored as a one-dimensional array of length Nx*Ny. In terms of application, each entry represents the solution value of some differential equation at the grid point (x_i, y_j), although I don't think that's important.

I am not very proficient in C, but I am storing the matrix in row-major order (the same convention C uses for its own 2D arrays), so to avoid too many cache misses the column index j should vary in the innermost loop:

#define U(i,j) U[(j)+Ny*(i)] // parenthesized so that U(i+1,j) expands correctly
for (int i=0; i<Nx; ++i)
    for (int j=0; j<Ny; ++j)
        U(i,j) = i*j; // example operation

My algorithm requires me to do two different types of operations:

  1. For row i of U, do some computation that outputs row i of another array F
  2. For column j of U, do some computation that outputs column j of another array G

where F and G have the same length and "shape" as U. The goal is a computational step like this:

#define U(i,j) U[(j)+Ny*(i)]
#define F(i,j) F[(j)+Ny*(i)]
#define G(i,j) G[(j)+Ny*(i)]

for (int i=0; i<Nx; ++i) {
    /* use U(i,:) to compute F(i,:); the : is just pseudocode shorthand for an entire row or column */
}

for (int j=0; j<Ny; ++j) {
    /* use U(:,j) to compute G(:,j) */
}

for (int i=0; i<Nx; ++i)
    for (int j=0; j<Ny; ++j)
        U(i,j) += F(i,j) + G(i,j); // example computation

I am struggling a bit to see how to do this computation efficiently. The steps that operate on rows of U seem fine, but then the operations on the columns of U will be quite slow, and entering values into G in a column-wise fashion will also be slow.
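To make the problem concrete, here is a minimal sketch of the kind of naive column pass I mean. The operation itself (a second difference along each column, with the boundary rows set to zero) and the name column_pass_naive are just placeholders I made up for illustration, not my actual computation, and I am assuming the arrays hold doubles. The point is only the access pattern: for a fixed j, consecutive values of i are Ny elements apart in memory.

```c
#include <stddef.h>

/* Hypothetical column operation, for illustration only:
   G(i,j) = U(i-1,j) - 2*U(i,j) + U(i+1,j) for interior rows,
   and G(i,j) = 0 on the first and last row.
   U and G are Nx x Ny, row-major: element (i,j) lives at [j + Ny*i]. */
void column_pass_naive(const double *U, double *G, size_t Nx, size_t Ny)
{
    if (Nx == 0)
        return;

    for (size_t j = 0; j < Ny; ++j) {
        G[j] = 0.0;                  /* first row of column j */
        G[j + Ny * (Nx - 1)] = 0.0;  /* last row of column j  */

        /* Walking down column j: each step in i jumps Ny doubles,
           so every access is likely to touch a different cache line. */
        for (size_t i = 1; i + 1 < Nx; ++i) {
            G[j + Ny * i] = U[j + Ny * (i - 1)]
                          - 2.0 * U[j + Ny * i]
                          + U[j + Ny * (i + 1)];
        }
    }
}
```

Both the reads from U and the writes to G use stride Ny here, which is the part I expect to be slow for large Ny.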

One method I thought of would involve storing both U and its transpose UT, so that operations on columns of U become operations on rows of UT. But I have to repeat the computational step many thousands of times, and explicitly recomputing the transpose each time seems like it would be even slower. Likewise, I could assemble the transpose of G so that I only ever write values in row-major order, but then the update step becomes U(i,j) += F(i,j) + G(j,i), which reads G column-wise.
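For reference, the explicit transpose I have in mind would look roughly like the sketch below. It works on small square tiles so that both the reads from U and the writes to UT stay within a few cache lines at a time. The function name transpose_blocked and the tile size BLOCK are my own assumptions (the tile size would need tuning), I am again assuming doubles, and I have not measured whether this is cheap enough to run on every step.

```c
#include <stddef.h>

#define BLOCK 32  /* tile edge length; an assumed value, not tuned */

/* Writes the transpose of the Nx x Ny row-major matrix U into UT,
   which is Ny x Nx and row-major, so UT[i + Nx*j] == U[j + Ny*i].
   U and UT must not overlap. */
void transpose_blocked(const double *U, double *UT, size_t Nx, size_t Ny)
{
    for (size_t ib = 0; ib < Nx; ib += BLOCK) {
        size_t imax = (ib + BLOCK < Nx) ? ib + BLOCK : Nx;

        for (size_t jb = 0; jb < Ny; jb += BLOCK) {
            size_t jmax = (jb + BLOCK < Ny) ? jb + BLOCK : Ny;

            /* Copy one tile: reads from U are contiguous in j,
               writes to UT are strided but confined to the tile. */
            for (size_t i = ib; i < imax; ++i)
                for (size_t j = jb; j < jmax; ++j)
                    UT[i + Nx * j] = U[j + Ny * i];
        }
    }
}
```

With UT available, the column pass over U would turn into a row pass over UT, at the cost of one extra sweep over the data per step.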

How should I deal with this situation in an efficient way?