SPO600 Lab 5 – SIMD Lab (Part 2)

Part 2: Inline Assembler

2. Modify add.c to calculate b mod a using inline assembler, and print the result.

#include <stdio.h>

// This is a very simple example of inline assembler.
// On AArch64, this code will calculate c = b mod a
// (19 mod 3), then print the value of c.
//
int main() {
	int a = 3;
	int b = 19;
	int c;
	int d;

	// __asm__ ("assembly code template" : outputs : inputs : clobbers)
	// __asm__ ("add %0, %1, %2" : "=r"(c) : "r"(a),"r"(b) );

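	// c = b / a (unsigned divide; %w selects the 32-bit W registers for int operands)
	__asm__("udiv %w0, %w1, %w2" : "=r"(c) : "r"(b), "r"(a) );
	// c = b - (c * a), i.e. the remainder b mod a
	__asm__("msub %w0, %w1, %w2, %w3" : "=r"(c) : "r"(c), "r"(a), "r"(b) );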

	printf("%d\n", c);
}

Result:

[cle@aarchie simd_lab]$ time ./add
1

real	0m0.005s
user	0m0.001s
sys	0m0.004s

3. vol_inline.c contains a version of the volume scaling problem which uses inline assembler and the SQDMULH instruction. Copy, build and verify the operation of this program.

Default number of samples:

vol.h
#define SAMPLES 5000000

Results:

[cle@aarchie simd_lab]$ time ./vol_inline
Generating sample data.
Scaling samples.
Summing samples.
Result: 930

real	0m0.521s
user	0m0.500s
sys	0m0.020s

Decreased number of samples:

vol.h
#define SAMPLES 500

Results:

[cle@aarchie simd_lab]$ time ./vol_inline
Generating sample data.
Scaling samples.
Summing samples.
Result: 152

real	0m0.005s
user	0m0.001s
sys	0m0.004s

Increased number of samples:

vol.h
#define SAMPLES 90000000

Results:

[cle@aarchie simd_lab]$ time ./vol_inline
Generating sample data.
Scaling samples.
Summing samples.
Result: 713

real	0m9.404s
user	0m8.999s
sys	0m0.379s

Part 3: C Intrinsics

1. Default Result (changed SAMPLES in vol.h back to 5000000):

[cle@aarchie simd_lab]$ time ./vol_intrinsics 
Generating sample data.
Scaling samples.
Summing samples.
Result: 930

real	0m0.522s
user	0m0.500s
sys	0m0.020s

Increased number of samples:

#define SAMPLES 80000000

Results:

[cle@aarchie simd_lab]$ time ./vol_intrinsics 
Generating sample data.
Scaling samples.
Summing samples.
Result: -219

real	0m8.294s
user	0m8.012s
sys	0m0.239s

Q1: What do these intrinsic functions do?

vst1q_s16(cursor, vqdmulhq_s16(vld1q_s16(cursor), vdupq_n_s16(vol_int)));

Q2: Why is the increment below 8 instead of 16 or some other value?

Q3: Why is this line not needed in the inline assembler version of this program?

Q4: Are the results usable? Are they accurate?

Published by cindyledev

Full Stack Developer, Computer Programmer, and Analyst
