Today we have 64Bit processors common, Intel & AMD are thinking about banning 32Bit registers,
Meanwhile ML, Machine Learning & Applications such as Image Composer, FFMPG, Blender, Gimp & Adobe Photoshop & Corel Draw, ..
Are dealing with the realities of editing & distributing, Efficient Media content..
Today the operations of MultiBit Integers & integer floats with remainder theory are called DOT4,..
But we have to clear up the utility & function of MultiBit Integers & deal with how they act on ML & General Maths!
The central causes of MultiBit Integers are several rules:
Memory Storage with no operations, Means that we can PackBytes in Larger Registers, Without effects except that we have to fetch the entire value.. For the set
Data Overflow Events:
We need to leave a digit above or below a number if the maths requires expanding the number larger than our number of digits..
Or reduces our number below our threshold to a decimal place number or negative..
We could handle negative values by for example, Having 8 digits & saying that 2 to 4 of them are -
We could handle Decimal values, The same way with 8 digits & 2 to 4 of them as fractional..
Both operations will result in totally different rules to the maths! This can cause Errors to happen when people who code & accountancy.. cannot see the difference!
But we can account for memory storage, Without errors & maths without overflows,..
Identical RAM Alignment & operational Cache buffering & fetch cycles.
Identical Addition & Division Effects:
ADDERS Roll numbers up & down, With the same operation on all values, We can use a single operation..
If we need variation in ADDER, multiple operations are logical & for that reason, We would single fetch the entire value & Automate the transformation into Maths operations.. In parallel..
With a less optimal approach, We can load the value with a special register & operate a single number before storing it again..
But we can carry out 4 identical operations on a MultiBit Integer operation, With a single pass..
Because we calculated that with a non fraction, All the numbers will move identically.
MUL Multiplication By solid non fractional numbers (Integers), Move all numbers the same..
Same basic rule set as additions of multiple numbers..
So we have identical non fractional,..
I mean 0.1 style operations that are complex, Not single digit or non standard multiplications & divisions, Fractional Additions & So on.. operations..
Multiple Operations with a single operator (Identical MUL / or ADD) Same operation..
Variety of operators (Non Identical MUL / or ADD), Special procedures to extract values from dataset, & multiple operations & then return to storage in RAM..
As long as we obey these laws, The operations make sense, ..
Photo, Art, Music, Recordings & those kinds of things in storage, Without operations by maths are safe.. This way.
Although, These rules are complex, We have many uses for DOT2, DOT4 or DOT8 & So on..
Nominally integer values in sets, 2, 4, 8 or more, Values per Memory Storage Slot, 32Bit for example..
While these operations are complex, They allow multiple smaller digits to be calculated & stored & also saved.
(c)Rupert Summerskill
*
DOT4 To explain simply "polyfill version of pack4xI8(), pack4xU8(), pack4xI8Clamp(), unpack4xI8() and unpack4xU8() on D3D12 backends"
As explained, The usage of virtual floats is possible.. Through various methods..
Example usage scenario of PackedBit & DOT4..
Multiple Synchronised operations in cache, With the same write cycle to RAM, If the values are still valid for the same packing address in ram..
SiMD for 2 to 4 operations in the same cycle..
Synchronised Integer instructions on the CPU, ..
Parallel FPU operations, If permitted..
Non identical operations on a Quad Packed Byte, SiMD parallel operations, Single write back from cache.
Example of function:
Involving a direct transform of Quad INT8 multipliers into INT32, Through Double INT8 x INT8 or INT8 x INT8 x INT8 x INT8 into 32Bit or 64Bit operations..
Involving addition of 4x 8Bit operations in a single INT32, ..
If the results are predicted to be inside the "data bit depth" of 8Bit, ..
The entire result set is written back at the came cache cycle.. To the memory array, As packed.
Roll .. all small values..
DOT4 a PackedByte can be in any value for a dataset range..
128Bit, 2xINT64, 4xINT32, 8xINT16, 16xINT8, 32xINT4
64Bit, 2xINT32, 4xINT16, 8xINT8, 16xINT4
32Bit, 2xINT16, 4xINT8, 8xINT4
16Bit, 2xINT8, 4xINT4
As you can see, RAM with 384Bit, Such as GPU & 256Bit such as TB of RAM on a server..
Has real alignment issues!
BytePacking, Dot4 & compression.. Have real uses.
(c)RS
*
PackedByte & DOT4 Operational Fetch Cycles.. Start with a simple grab from RAM of a 32Bit or 64Bit Byte set.. For our example..
Since the classification of SiMD, FPU & Integer function of a CPU is simple enough for the technology gifted ..
We start with our first priority, To justify the calling of 4 INT8 Byte Codes.. In a single fetch..
Which we know that we have to do!
Because of the Cache block size, Although if we subrequest.. we could fetch 1 to 4 blocks of INT8 relatively fast..
Firstly I shall be presuming that.. You know to store common data together & That is a presumption, One we cannot afford to make!
Secondly I shall assume that.. You know to store almost identically reasoned operations together.. So that the same maths result could be used for all? No I have to say it :p
Thirdly I shall know that SiMD with multiline inputs can do the entire fetch & also with identical maths.. or different maths.. Assuming that, We know :p
PackedByte & DOT4 Table:
Try to maintain streamlining & optimisation..
Store almost identically reasoned operations together.. So that the same maths result could be used for all
Store common data together, Especially if they use a common math solve & Particularly if ..
X+Y = Common Y, X*Y=Common Y & so on..
SiMD can multiply or ADD all operations at the same time, Line parity .. Means parallel & differential Maths..
SiMD with multiline inputs can do the entire fetch & also with identical maths.. or different maths
(c)RS
*
May take higher or lower bit depth & precisions: Rupert S 2021
2 16 Bit loads is 32Bit but takes 2 cycles...
16 Bit loads with 32 Bit Stores & Math unit:
Operation 1
16Bit , 16Bit , 16Bit , 16Bit Operation
\ / \ /
Inline Store
32Bit Store 32Bit Store
64Bit Store
\ /
32Bit ADD/DIV x 2 or 64Bit ADD/DIV x1
Operation 2
32Bit ADD/DIV x 2 or 64Bit ADD/DIV x1
\ /
4x 16Bit Store
4 x 16Bit Operation
MultiBit Serial & Parallel execution conversion inline of N*Bit -+
In the case of ADD -+ Signed for example:(c)RS
Plus & - Lines ADD or Subtract (Signed, Bit Depth Irrelevant)
Multiples of 16Bit works in place of 32Bit or 64Bit
V1: 16Bit Values composing a total 128Bit number
V2: 16Bit Values composing a total 128Bit number - (Value less than V1)
V3: Result
NBit: Bit Depth
4x16Bit operations in the same cycle >
If Value = 16Bit = Store
If Value = V3=Bit = Store * NBit
Stored 128Bit RAM or if remainder = less > 4x16Bit -1-1-1 ; 16Bit Value Store
RS
https://bit.ly/DJ_EQ
*
*RAND OP Ubuntu
https://pollinate.n-helix.com/
(Rn1 *<>/ Rn2 *<>/ Rn3)
-+
VAR(+-) Var = Rn1 +- Rn8
(Rn5 *<>/ Rn6 *<>/ Rn7)
4 Samples over N * Sample 1 to 4
Input into pool 1 Low half -+
Input into pool 1 High half -+
*RAND OP Recycle It
RS
*
https://science.n-helix.com/2021/11/parallel-execution.html
https://science.n-helix.com/2021/11/monticarlo-workload-selector.html
References:
https://science.n-helix.com/2018/01/integer-floats-with-remainder-theory.html
https://science.n-helix.com/2021/02/multi-operation-maths.html
https://science.n-helix.com/2021/11/parallel-execution.html
https://science.n-helix.com/2022/12/math-error-solve.html
On the subject of how deep a personality of 4Bit, 8Bit, 16Bit is reference:
https://science.n-helix.com/2021/03/brain-bit-precision-int32-fp32-int16.html
https://science.n-helix.com/2022/10/ml.html