How to align variables in a subroutine

Started by sapero, August 18, 2009, 04:35:14 AM


sapero

Sometimes you need a structure to be aligned on an 8- or 16-byte boundary, for example if you're using SSE instructions. You probably did it by allocating memory and aligning the returned pointer:
unaligned = new(matrix, count+1);
aligned = &*unaligned[1] & ~15;
...
delete unaligned;

This is another method that needs no memory allocation. All it does is align the stack pointer before calling a subroutine. This is an inline macro:
#asm
%macro SSEALIGN 0-1
and esp,~15 ; 16 bytes boundary (sse/sse2)
%if %0=1
%assign x ((2-(%1))&3)
%if x>0
sub esp,x*4
%endif
%else
%assign x 2
sub esp,8
%endif
%endmacro

%macro SSERESTORE 0
%if x>0
add esp,x*4
%endif
%endmacro
#endasm


SSEALIGN has one optional parameter: the number of 32-bit parameters passed to the called subroutine. Count 1 for each integer, word, or pointer, and 2 for each double or int64. If you're passing a structure by value, use sizeof(struct)/4.
The SSERESTORE macro is needed only when you call another subroutine with a different number of parameters after SSEALIGN has already been used in the same subroutine.

This is an example with a 4*float matrix for use with SSE1. The matrix must be aligned to a 16-byte boundary if we want the real SSE speed:
#asm
%macro SSEALIGN 0-1
and esp,~15 ; 16 bytes boundary (sse/sse2)
%if %0=1
%assign x ((2-(%1))&3)
%if x>0
sub esp,x*4
%endif
%else
%assign x 2
sub esp,8
%endif
%endmacro

%macro SSERESTORE 0
%if x>0
add esp,x*4
%endif
%endmacro
#endasm

struct SSE1MATRIX
{
float f[4];
}

sub main()
{
SSE1MATRIX xx; // here &xx can be unaligned
print("initial xx: ",hex$(&xx));
#emit SSEALIGN
sub1();
}


sub sub1()
{
SSE1MATRIX xx;
print("xx: ",hex$(&xx));
#emit SSEALIGN
sub2();
}


sub sub2()
{
SSE1MATRIX xx;
print("xx: ",hex$(&xx));
#emit SSEALIGN
sub3();
#emit SSERESTORE  ;// unalign
#emit SSEALIGN 2  ;// set alignment for another function
sub4(7,6);
}


sub sub3()
{
SSE1MATRIX xx1;
SSE1MATRIX xx2;
SSE1MATRIX xx3;

print("xx1: ",hex$(&xx1));
print("xx2: ",hex$(&xx2));
print("xx3: ",hex$(&xx3));

xx1.f = 100,200,300,400;
xx2.f = 1,2,3,4;
// xx3 = xx1 + xx2
// if xx1, xx2 or xx3 is not aligned, you'll get an exception here
#emit movaps xmm0,[ebp-16] ; xmm0 = xx1
#emit addps  xmm0,[ebp-32] ; xmm0 += xx2
#emit movaps [ebp-48],xmm0 ; xx3 = xmm0
// finished with SSE; emms is only required after MMX code, so it is harmless but not needed here
#emit emms
print("sse test: ",xx3.f[0],", ",xx3.f[1],", ",xx3.f[2],", ",xx3.f[3]);

// if the subroutine is returning a 32-bit value and you save it, add one to the number of values:
// SSEALIGN 2+1
// result = function(8,9);
int result;
#emit SSEALIGN 2+1
result=sub4(7,6);
}


sub sub4(int yy,int uu),int
{
SSE1MATRIX xx;
print("xx: ",hex$(&xx));
return &xx;
}


Without SSEALIGN, the matrix variable ends up 16-byte aligned only by chance.