Circular buffer in assembly on the TMS320C6713

I have an FIR filter with 40 coefficients, which take up 80 bytes in total (2 bytes each). I am trying to implement a circular buffer for this filter, but the size of the buffer must be a power of 2. The smallest power of 2 that holds all the coefficients is 128 bytes. After accessing the 40 coefficients, though, the pointer does not wrap back to the start: it moves past the end of the coefficient data (out of bounds) into the remaining 48 bytes of the block. What can I do to solve this? Thanks in advance!
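To make the mismatch concrete, here is a minimal C sketch of the arithmetic. It is only a model of a power-of-two wrap, not the actual C6713 assembly, and the names `NUM_TAPS`, `BYTES_PER_TAP`, `next_pow2` and `wrap_offset` are made up for illustration.

```c
#include <assert.h>

#define NUM_TAPS      40u   /* number of FIR coefficients (from the question) */
#define BYTES_PER_TAP 2u    /* each coefficient is 2 bytes */

/* Smallest power of 2 that is >= n (hypothetical helper). */
static unsigned next_pow2(unsigned n)
{
    unsigned p = 1u;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

/* Byte offset of the pointer after stepping `steps` coefficients forward
 * from the start of a block that wraps at `block_bytes` (a power of 2).
 * This models the wrap as a simple mask; it is an assumption about how
 * the wrap behaves, not a description of the hardware. */
static unsigned wrap_offset(unsigned steps, unsigned block_bytes)
{
    return (steps * BYTES_PER_TAP) & (block_bytes - 1u);
}
```

With these definitions, `next_pow2(NUM_TAPS * BYTES_PER_TAP)` gives 128, and `wrap_offset(40, 128)` gives 80 rather than 0: after the 40th coefficient the pointer sits just past the data, and it only returns to offset 0 after 64 steps.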