# Efficiently accessing columns of a matrix in C

I have an `Nx x Ny` matrix `U` stored as a one-dimensional array of length `Nx*Ny`. In terms of application, each entry represents the solution value to some differential equation at the grid point `(x_i, y_j)`, although I don't think that's important.

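I am not very proficient in C, but I know that it uses row-major order (and my indexing follows that convention), so to avoid too many cache misses it is better to keep the column index `j` in the inner loop, so that consecutive iterations touch adjacent memory: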

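```
#define U(i,j) U[(j) + Ny*(i)]  // parenthesized so that e.g. U(i+1,j) indexes correctly
for (int i=0; i<Nx; ++i)        // rows in the outer loop
    for (int j=0; j<Ny; ++j)    // columns in the inner loop: stride-1 access
        U(i,j) = i*j;           // example operation
```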
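To make the memory-access argument concrete, here is a minimal sketch using two hypothetical helpers, `row_sum` and `col_sum`, on the same row-major layout. Reading row `i` walks `Ny` consecutive doubles, while reading column `j` jumps `Ny` doubles between elements, so for large `Ny` each access in the column version may land on a different cache line.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of row i: elements U[Ny*i + 0 .. Ny*i + Ny-1] are contiguous (stride 1). */
double row_sum(const double *U, int Nx, int Ny, int i)
{
    assert(i >= 0 && i < Nx);
    const double *row = U + (size_t)Ny * i;
    double s = 0.0;
    for (int j = 0; j < Ny; ++j)
        s += row[j];
    return s;
}

/* Sum of column j: consecutive elements are Ny doubles apart (stride Ny). */
double col_sum(const double *U, int Nx, int Ny, int j)
{
    assert(j >= 0 && j < Ny);
    double s = 0.0;
    for (int i = 0; i < Nx; ++i)
        s += U[j + (size_t)Ny * i];
    return s;
}
```

Both functions do the same amount of arithmetic; only the stride of the memory accesses differs.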

My algorithm requires me to do two different types of operations:

- For row `i` of `U`, do some computation that outputs row `i` of another array `F`.
- For column `j` of `U`, do some computation that outputs column `j` of another array `G`.

where `F` and `G` have the same length and "shape" as `U`. The goal is a computational step like this:

```
#define U(i,j) U[(j) + Ny*(i)]
#define F(i,j) F[(j) + Ny*(i)]
#define G(i,j) G[(j) + Ny*(i)]

/* ":" is pseudocode shorthand for an entire row or column */
for (int i=0; i<Nx; ++i) {
    /* use U(i,:) to compute F(i,:) */
}
for (int j=0; j<Ny; ++j) {
    /* use U(:,j) to compute G(:,j) */
}
for (int i=0; i<Nx; ++i)
    for (int j=0; j<Ny; ++j)
        U(i,j) += F(i,j) + G(i,j);  // example computation
```
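A runnable sketch of this step, assuming a hypothetical placeholder for both the row and the column computation: a 1D second difference, with values outside the grid taken as zero. Writing the 1D routine against a stride, in the style of the `incx` arguments in BLAS, lets one function serve rows (stride 1) and columns (stride `Ny`). Note that the column pass is still strided, so this tidies the code but does not by itself change the cache behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1D operation standing in for the real row/column computation:
   out[k] = in[k-1] - 2*in[k] + in[k+1], with values outside 0..n-1 taken as 0.
   `stride` is the distance (in elements) between consecutive entries of both
   `in` and `out`, so the same routine works on a row (stride 1) or a column
   (stride Ny). */
static void second_diff(const double *in, double *out, int n, ptrdiff_t stride)
{
    for (int k = 0; k < n; ++k) {
        double left  = (k > 0)     ? in[(k - 1) * stride] : 0.0;
        double right = (k < n - 1) ? in[(k + 1) * stride] : 0.0;
        out[k * stride] = left - 2.0 * in[k * stride] + right;
    }
}

/* One step: F from the rows of U, G from the columns of U, then U += F + G. */
void step(double *U, double *F, double *G, int Nx, int Ny)
{
    assert(Nx > 0 && Ny > 0);
    for (int i = 0; i < Nx; ++i)   /* rows: contiguous, stride 1 */
        second_diff(U + (ptrdiff_t)Ny * i, F + (ptrdiff_t)Ny * i, Ny, 1);
    for (int j = 0; j < Ny; ++j)   /* columns: stride Ny */
        second_diff(U + j, G + j, Nx, Ny);
    /* Walking all entries in memory order is the same as the i-outer, j-inner loop. */
    for (size_t k = 0; k < (size_t)Nx * Ny; ++k)
        U[k] += F[k] + G[k];
}
```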

I am struggling a bit to see how to do this computation efficiently. The steps that operate on rows of `U` seem fine, but the operations on the columns of `U` will be quite slow, because consecutive elements of a column are `Ny` doubles apart in memory, and writing values into `G` column by column will be slow for the same reason.
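One common way to cope with slow column access, sketched here with the same hypothetical placeholder operation: copy each column into a contiguous scratch buffer, do the work there, and copy the result back into `G`. The gather and scatter are still strided, so this mainly helps when the column computation reads each entry several times (for example, a forward and a backward sweep); copying a few adjacent columns at once would likely use more of each cache line that is loaded. The function names are illustrative, not from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical contiguous kernel (placeholder for the real column computation):
   out[k] = in[k-1] - 2*in[k] + in[k+1], with out-of-range values taken as 0. */
static void second_diff_contig(const double *in, double *out, int n)
{
    for (int k = 0; k < n; ++k) {
        double left  = (k > 0)     ? in[k - 1] : 0.0;
        double right = (k < n - 1) ? in[k + 1] : 0.0;
        out[k] = left - 2.0 * in[k] + right;
    }
}

/* Compute every column of G from the matching column of U by copying the
   column into a contiguous buffer, running the kernel there, and copying the
   result back. Returns 0 on success, -1 if the scratch allocation fails. */
int columns_via_buffer(const double *U, double *G, int Nx, int Ny)
{
    assert(Nx > 0 && Ny > 0);
    double *col_in  = malloc((size_t)Nx * sizeof *col_in);
    double *col_out = malloc((size_t)Nx * sizeof *col_out);
    if (col_in == NULL || col_out == NULL) {
        free(col_in);
        free(col_out);
        return -1;
    }
    for (int j = 0; j < Ny; ++j) {
        for (int i = 0; i < Nx; ++i)              /* gather column j (stride Ny) */
            col_in[i] = U[j + (size_t)Ny * i];
        second_diff_contig(col_in, col_out, Nx);  /* contiguous work */
        for (int i = 0; i < Nx; ++i)              /* scatter into column j of G */
            G[j + (size_t)Ny * i] = col_out[i];
    }
    free(col_in);
    free(col_out);
    return 0;
}
```

In a time-stepping loop, the two scratch buffers could be allocated once outside the loop instead of on every call.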

One method I thought of would be to store both `U` and its transpose `UT`, so that operations on columns of `U` can be done on rows of `UT`. But I have to do the computational step many thousands of times, and it seems like explicitly computing a transpose would be even slower. Likewise, I could assemble the transpose of `G` so that I'm only ever writing values row by row, but then in the step `U(i,j) += F(i,j) + G(j,i)` I would have to read the values of `G` column-wise instead.
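On the cost of the transpose: it touches each entry once, which is the same order of work as one pass over `U`, so whether it is actually slower than strided column access is something to measure rather than assume. A naive transpose is slow mainly because one side of the copy is strided; a standard remedy is to transpose in small square tiles. A minimal out-of-place sketch, where the name `transpose_blocked` and the tile size `TILE = 32` are hypothetical choices that would need tuning:

```c
#include <assert.h>
#include <stddef.h>

#define TILE 32  /* hypothetical tile size; tune for the target cache */

/* Write the transpose of the Nx-by-Ny row-major matrix A into the
   Ny-by-Nx row-major matrix AT, i.e. AT(j,i) = A(i,j).
   Working on TILE x TILE blocks means the rows of A being read and the rows
   of AT being written stay small enough to remain in cache. */
void transpose_blocked(const double *A, double *AT, int Nx, int Ny)
{
    assert(A != AT);  /* out-of-place only */
    for (int ii = 0; ii < Nx; ii += TILE)
        for (int jj = 0; jj < Ny; jj += TILE) {
            int imax = (ii + TILE < Nx) ? ii + TILE : Nx;  /* clip edge tiles */
            int jmax = (jj + TILE < Ny) ? jj + TILE : Ny;
            for (int i = ii; i < imax; ++i)
                for (int j = jj; j < jmax; ++j)
                    AT[i + (size_t)Nx * j] = A[j + (size_t)Ny * i];
        }
}
```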

How should I deal with this situation in an efficient way?