Oxygen Basic
Programming => Example Code => General => Topic started by: Charles Pegge on January 22, 2014, 02:29:58 PM
-
SIMD: Single Instruction Multiple Data
'SIMD ARITHMETIC USING DOUBLES
'2 DOUBLES CALCULATED IN PARALLEL
indexbase 0
double d[100]={1,2,3,4,5,6}
d[18]<=10, 100
movupd xmm0,d[0]
addpd xmm0,xmm0
movupd xmm1,d[2]
mulpd xmm0,xmm1
movupd xmm2,d[18]
divpd xmm0,xmm2
movupd d[16],xmm0
print "SIMD: " str(d[16],4) " " str(d[17],4)
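
For anyone who wants to check the numbers without running the assembler, here is a minimal Python sketch of the same lane-by-lane arithmetic. It only models the values and does not execute any real SIMD instructions; the helper names `packed` and `run_example` are made up for illustration.

```python
import operator

def packed(op, a, b):
    """Apply op to matching lanes of a and b, the way addpd/mulpd/divpd do."""
    return [op(x, y) for x, y in zip(a, b)]

def run_example():
    # Same data layout as the snippet above (indexbase 0).
    d = [0.0] * 100
    d[0:6] = [1, 2, 3, 4, 5, 6]
    d[18:20] = [10, 100]

    xmm0 = d[0:2]                                  # movupd xmm0,d[0]
    xmm0 = packed(operator.add, xmm0, xmm0)        # addpd  xmm0,xmm0  -> 2, 4
    xmm1 = d[2:4]                                  # movupd xmm1,d[2]
    xmm0 = packed(operator.mul, xmm0, xmm1)        # mulpd  xmm0,xmm1  -> 6, 16
    xmm2 = d[18:20]                                # movupd xmm2,d[18]
    xmm0 = packed(operator.truediv, xmm0, xmm2)    # divpd  xmm0,xmm2
    d[16:18] = xmm0                                # movupd d[16],xmm0
    return d[16], d[17]

print("SIMD:", *run_example())
```

Each register holds two doubles, so every arithmetic line does two calculations at once: (1+1)*3/10 in the low lane and (2+2)*4/100 in the high lane.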
-
Calculating Hypotenuse (single precision)
This requires an instruction from the SSE3 set (haddps), so it will only work on Intel CPUs from 2004 onward or AMD CPUs from 2005 onward.
'SIMD ARITHMETIC USING SINGLES
'HYPOTENUSE OF 3,4 and 1,1
single s[20]={3.0,4.0,1,1}
movups xmm1,s[0]
mulps xmm1,xmm1
xorps xmm0,xmm0 'zero xmm0 (subps xmm0,xmm0 would give NaN if xmm0 already held NaN or infinity)
haddps xmm0,xmm1 'Horizontal Add requires SSE3 instruction set (INTEL 2004 / AMD 2005)
sqrtps xmm1,xmm0
movups s[16],xmm1
print "SIMD: " str(s[18],4) " " str(s[19],4) 'result: 5 1.4142
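Note that array expressions with a constant index, such as s[16], are usable as operands in Assembler instructions. The snippets in this post index arrays from 0, as set by indexbase 0 in the first example.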
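
The horizontal add is the step that decides where the answers end up, so here is a small Python sketch of it. This is only a scalar model of the lane movement, not real SSE3 code, and the function names `haddps` and `hypotenuses` are illustrative stand-ins.

```python
import math

def haddps(dst, src):
    """Horizontal add: adjacent pairs of dst fill lanes 0-1, pairs of src fill lanes 2-3."""
    return [dst[0] + dst[1], dst[2] + dst[3],
            src[0] + src[1], src[2] + src[3]]

def hypotenuses(s):
    xmm1 = [x * x for x in s[0:4]]         # movups xmm1,s[0] then mulps xmm1,xmm1
    xmm0 = [0.0] * 4                       # zeroed register
    xmm0 = haddps(xmm0, xmm1)              # haddps xmm0,xmm1
    return [math.sqrt(x) for x in xmm0]    # sqrtps xmm1,xmm0

result = hypotenuses([3.0, 4.0, 1.0, 1.0])
print(result)   # lanes 0 and 1 are zero, lanes 2 and 3 hold the two hypotenuses
```

Because the sums from the source register land in the upper two lanes, the results are read from s[18] and s[19] rather than s[16] and s[17].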
-
SIMD Comparison
comiss compares only the first (lowest) element of each register.
'SIMD COMPARES USING SINGLES
single s[20]={0,1,2,3,4,5,6,7}
movups xmm0,s[0]
movups xmm1,s[4]
comiss xmm0,xmm1 'comparing 0 with 4
jae nv
print "below"
.nv
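
As a rough guide to what the jump is testing, here is a Python sketch of the flags comiss sets. It is a scalar model only, and `comiss` and `is_below` are hypothetical helper names used for illustration.

```python
import math

def comiss(a, b):
    """Compare lane 0 only; return (ZF, PF, CF) as the CPU would set them."""
    x, y = a[0], b[0]
    if math.isnan(x) or math.isnan(y):
        return (1, 1, 1)        # unordered: all three flags set
    if x > y:
        return (0, 0, 0)
    if x < y:
        return (0, 0, 1)
    return (1, 0, 0)            # equal

def is_below(a, b):
    """Mirror of 'comiss / jae nv': the print runs only when CF is set."""
    zf, pf, cf = comiss(a, b)
    return cf == 1

print(is_below([0, 1, 2, 3], [4, 5, 6, 7]))   # compares 0 with 4
```

One thing the model makes visible: an unordered compare (a NaN in either lane 0) also sets CF, so jae would not jump and "below" would still be printed. Testing PF first (jp) is the usual way to catch that case.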
-
SIMD Interleaving and Shuffling
'SIMD INTERLEAVING (LOW PAIR)
single s[20]={0,1,2,3,4,5,6,7}
movups xmm0,s[0]
movups xmm1,s[4]
unpcklps xmm0,xmm1
movups s[0],xmm0
print "SIMD: " s[0] " " s[1] " " s[2] " " s[3] " " ' 0 4 1 5
'SIMD INTERLEAVING (HIGH PAIR)
single s[20]={0,1,2,3,4,5,6,7}
movups xmm0,s[0]
movups xmm1,s[4]
unpckhps xmm0,xmm1
movups s[0],xmm0
print "SIMD: " s[0] " " s[1] " " s[2] " " s[3] " " ' 2 6 3 7
'SIMD SHUFFLING
single s[20]={0,1,2,3,4}
movups xmm0,s[0]
shufps xmm0,xmm0,0b00011011 'reverses order of elements 00 01 10 11
movups s[0],xmm0
print "SIMD: " s[0] " " s[1] " " s[2] " " s[3] " " ' 3 2 1 0
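
To make the lane movement easier to follow, here is a Python sketch of the three instructions used above. It models the element order only; the function names simply reuse the mnemonics and are not a real library.

```python
def unpcklps(dst, src):
    """Interleave the low pairs: dst0 src0 dst1 src1."""
    return [dst[0], src[0], dst[1], src[1]]

def unpckhps(dst, src):
    """Interleave the high pairs: dst2 src2 dst3 src3."""
    return [dst[2], src[2], dst[3], src[3]]

def shufps(dst, src, imm8):
    """Each result lane gets a 2-bit selector, lowest bits first.
    Lanes 0-1 pick from dst, lanes 2-3 pick from src."""
    sel = [(imm8 >> (2 * i)) & 3 for i in range(4)]
    return [dst[sel[0]], dst[sel[1]], src[sel[2]], src[sel[3]]]

lo = [0, 1, 2, 3]
hi = [4, 5, 6, 7]
print(unpcklps(lo, hi))              # [0, 4, 1, 5]
print(unpckhps(lo, hi))              # [2, 6, 3, 7]
print(shufps(lo, lo, 0b00011011))    # [3, 2, 1, 0]
```

Reading the shuffle constant from right to left gives the source element for lanes 0, 1, 2 and 3 in turn, which is why 0b00011011 (selectors 3, 2, 1, 0) reverses the register.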
-
For those like myself wondering what Charles is up to.
Single instruction, multiple data (SIMD), is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Thus, such machines exploit data level parallelism. SIMD is particularly applicable to common tasks like adjusting the contrast in a digital image or adjusting the volume of digital audio. Most modern CPU designs include SIMD instructions in order to improve the performance of multimedia use.
(http://upload.wikimedia.org/wikipedia/commons/2/21/SIMD.svg)
-
Hi Charles,
SIMD is quite simple.
single v1[4] = {1.1,2.2,3.3,4.4}
single v2[4] = {5.5,6.6,7.7,8.8}
single v3[4]
movups xmm0,v1
movups xmm1,v2
addps xmm0,xmm1
mulps xmm0,xmm1
subps xmm0,xmm1
movups v3,xmm0
print str(v3[1],8)
print str(v3[2],8)
print str(v3[3],8)
print str(v3[4],8)
-
Yes, it is very simple for parallel arithmetic, but shuffling values horizontally and diagonally is like trying to solve one of your magic number squares. :)
-
Pool congestion.
(http://files.allbasic.info/O2/duckpool.png)
-
X86: CPU, FPU, MMX, SSE. They are almost separate devices.
(http://static2.wikia.nocookie.net/__cb20110415165552/dumbledoresarmyroleplay/images/0/07/517-magical-menagerie.jpg)
-
Another SIMD example for kids.
single A[4] = {5.0,0.0,6.6,3.1}
single B[4] = {1.1,2.0,2.2,4.0}
single C[4]
lea eax,A
lea edx,B
movups xmm0,[eax]
movups xmm1,[edx]
addps xmm0,xmm1
movups C,xmm0
print Str(C[1],4)
print Str(C[2],4)
print Str(C[3],4)
print Str(C[4],4)
-
Simple shuffling
single A[4] = {1.5,1.6,1.7,1.8}
single B[4] = {1.1,2.1,3.1,4.1}
single C[4]
lea eax,A
lea edx,B
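movups xmm0,[eax] 'xmm0 = A
movups xmm1,[edx] 'xmm1 = B
movaps xmm2,xmm0 'copy of A
movaps xmm3,xmm1 'copy of B
shufps xmm0,xmm0,0xD8 'A in lane order 0 2 1 3
shufps xmm1,xmm1,0xE1 'B in lane order 1 0 2 3
mulps xmm0,xmm1
shufps xmm2,xmm2,0xE1 'A in lane order 1 0 2 3
shufps xmm3,xmm3,0xD8 'B in lane order 0 2 1 3
mulps xmm2,xmm3
subps xmm0,xmm2 'difference of the two products
movups C,xmm0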
print Str(C[1],8)
print Str(C[2],8)
print Str(C[3],8)
print Str(C[4],8)
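
For anyone tracing this by hand, here is a Python sketch that follows the same register steps. It is a scalar model using double precision, so the last digits will differ slightly from the single-precision output, and the helper names `shufps`, `mulps`, `subps` and `shuffle_example` are illustrative only.

```python
def shufps(dst, src, imm8):
    """2-bit selectors, lowest bits first; lanes 0-1 from dst, lanes 2-3 from src."""
    sel = [(imm8 >> (2 * i)) & 3 for i in range(4)]
    return [dst[sel[0]], dst[sel[1]], src[sel[2]], src[sel[3]]]

def mulps(a, b):
    return [x * y for x, y in zip(a, b)]

def subps(a, b):
    return [x - y for x, y in zip(a, b)]

def shuffle_example(A, B):
    xmm0, xmm1, xmm2, xmm3 = A, B, A, B
    xmm0 = shufps(xmm0, xmm0, 0xD8)    # A in lane order 0 2 1 3
    xmm1 = shufps(xmm1, xmm1, 0xE1)    # B in lane order 1 0 2 3
    xmm0 = mulps(xmm0, xmm1)
    xmm2 = shufps(xmm2, xmm2, 0xE1)    # A in lane order 1 0 2 3
    xmm3 = shufps(xmm3, xmm3, 0xD8)    # B in lane order 0 2 1 3
    xmm2 = mulps(xmm2, xmm3)
    return subps(xmm0, xmm2)

C = shuffle_example([1.5, 1.6, 1.7, 1.8], [1.1, 2.1, 3.1, 4.1])
print([round(c, 4) for c in C])
```

Counting lanes from 0, the result works out to A0*B1 - A1*B0, A2*B0 - A0*B2, A1*B2 - A2*B1 and 0. Those three non-zero lanes are the components of the 3D cross product of the first three elements of A and B, in z, y, x order. That is an observation from the arithmetic, not something stated in the post.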