In all the routines that loop through a range of virtual addresses, the loop
is controlled by subtracting the cache line size from the total length of the
request. After the subtract, a 'bpl' instruction was used, which branches if
the result of the subtraction is zero or greater, but we need to exit the
loop when the count hits zero. Thus, all the bpl instructions in those loops
have been changed to 'bhi' (branch if greater than zero).
In addition, the two routines that walk through the cache using set-and-index
were correct, but confusing. The loop control for those has been simplified,
just so that it's easier to see by examination that the code is correct.
Routines for other arm architectures and generations still have the bpl
instruction, but compensate for the off-by-one situation by decrementing
the count register by one before entering the loop.
PR: arm/174461
Approved by: cognet (mentor)