Saturday, February 13, 2021

Multi Operation Maths - CPU,GPU Computation

Multi Operation Maths - CPU,GPU Computation (c)RS

Performing multiple 4, 8, 16 or 32-bit operations on a 64-bit integer core (the example)



This is a kind of F16 operation, or Integer 16 or Int8 if you need it, given careful management and special libraries.
Capable of speeding up PC, Mac & consoles (HPC).
Requires specially compiled libraries, so that compiled code can be managed & roll (shift) operations assessed.

Rules:

All packed values must be multiplied by the same multiplier

Rolls (shifts) can be used to convert a value, for example in place of multiplication & division


For example :


451 722 551 834 x 6


In the case of a multiplier that is not the base roll number:

We have to account for the difference between the multiplier and our base roll number,

10 and 6 for example, so the maths is convoluted & may not be worth it.

We could do the x6 with + rolls & then - rolls.

On a base 10 processor the first factor would be 10x, because we could compensate by placement (a roll of one digit).

But we still need space to expand the result to the right or left:

0451072205510834 x 10 =

4510722055108340

or 4510 roll -12
7220 roll -8
5510 roll -4
8340 no roll
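
The worked example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 4-digit field width and the helper names `pack_fields`, `unpack_fields` and `packed_multiply` are hypothetical, and Python integers stand in for a wide register.

```python
FIELD_DIGITS = 4                 # decimal digits reserved per packed value (assumption)
FIELD_BASE = 10 ** FIELD_DIGITS  # one field spans 10**4


def pack_fields(values):
    """Pack values (most significant first) into one integer, 4 digits each."""
    packed = 0
    for v in values:
        if not 0 <= v < FIELD_BASE:
            raise ValueError("value does not fit in one field")
        packed = packed * FIELD_BASE + v
    return packed


def unpack_fields(packed, count):
    """Split the packed integer into `count` fields: roll right, then mask."""
    fields = []
    for i in range(count):
        roll = FIELD_DIGITS * (count - 1 - i)   # 12, 8, 4, 0 digits to the right
        fields.append((packed // 10 ** roll) % FIELD_BASE)
    return fields


def packed_multiply(values, multiplier):
    """Multiply every value by the same multiplier with a single multiplication."""
    if any(v * multiplier >= FIELD_BASE for v in values):
        raise OverflowError("a product would spill into the neighbouring field")
    return unpack_fields(pack_fields(values) * multiplier, len(values))


print(packed_multiply([451, 722, 551, 834], 10))  # [4510, 7220, 5510, 8340]
print(packed_multiply([451, 722, 551, 834], 6))   # [2706, 4332, 3306, 5004]
```

The overflow check is the "space to expand" rule: a single multiplication only works while every product still fits inside its own field.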

Converting base 10 to & from hex may make sense.

Depending on the cost of a roll, this operation may be worth it!

The example above is in base 10; 8-bit fields in hex make more sense for most common operations.

But 8 bits does not hold a very big number for larger maths, & 16-bit makes more sense because it holds larger numbers.
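
A binary version of the same idea, with four 16-bit lanes in one 64-bit word, might look like the sketch below. The lane layout, the 64-bit mask and the names `pack_lanes`, `unpack_lanes` and `multiply_lanes` are assumptions for illustration; real hardware would do this in a register rather than in a Python integer.

```python
LANE_BITS = 16                       # width of each packed value
LANE_MASK = (1 << LANE_BITS) - 1     # 0xFFFF
WORD_MASK = (1 << 64) - 1            # emulate a 64-bit integer register


def pack_lanes(values):
    """Pack four 16-bit values into one 64-bit word, most significant lane first."""
    if len(values) != 4:
        raise ValueError("exactly four lanes are expected")
    word = 0
    for v in values:
        if not 0 <= v <= LANE_MASK:
            raise ValueError("value does not fit in a 16-bit lane")
        word = (word << LANE_BITS) | v
    return word


def unpack_lanes(word):
    """Roll each lane down to the bottom of the word and mask it out."""
    return [(word >> shift) & LANE_MASK for shift in (48, 32, 16, 0)]


def multiply_lanes(values, multiplier):
    """Multiply all four lanes by the same multiplier in one 64-bit multiply."""
    if any(v * multiplier > LANE_MASK for v in values):
        raise OverflowError("a product would carry into the next lane")
    return unpack_lanes((pack_lanes(values) * multiplier) & WORD_MASK)


# In hex, multiplying by 0x10 is a roll of one hex digit, like x10 in base 10.
print([hex(v) for v in multiply_lanes([0x0451, 0x0722, 0x0551, 0x0834], 0x10)])
# ['0x4510', '0x7220', '0x5510', '0x8340']
print(multiply_lanes([451, 722, 551, 834], 6))  # [2706, 4332, 3306, 5004]
```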

Performing numeric expansion:

This applies to consoles in particular, and to the FPU where expansion is required for emergence mathematics.

Performing numeric expansion for circumstances where we require larger numbers, for example:

To fill the 187 FPU buffer..

To do that we roll to the left & expand the number, although we may need multiple operations..

As I say : Roll + or Roll -

1447000
-Roll 3 = 1447
or
+Roll 3 = 1447000000

That way we can potentially exceed the bit depth, 32-bit for example.
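
The Roll + / Roll - example can be written as a small decimal shift. This is a minimal sketch; the function name `roll` and its sign convention (positive rolls left, negative rolls right) are assumptions chosen to match the notation above.

```python
def roll(value, digits):
    """Decimal roll: positive digits shift left (x10**n), negative shift right (//10**n)."""
    if digits >= 0:
        return value * 10 ** digits
    return value // 10 ** (-digits)


print(roll(1447000, -3))               # 1447
print(roll(1447000, 3))                # 1447000000
print(roll(1447000, 4) > 2 ** 32 - 1)  # True: one more roll no longer fits in 32 bits
```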

Rupert S https://science.n-helix.com


*****

Packed F16C & F16 Values in use on CPU & GPU - RS

F16C & F16 : lower precision values that are usable to optimise GPU & CPU operations that involve less detailed values, like hashes, game metadata or machine learning : RS

Firstly, F16C is an instruction set supported by the FX 8320E; it converts packed F16 values to & from F32, so the CPU can potentially work with packed F16 float data directly,
As quoted, carefully managed F16 produces a pipeline that is 100% F16..

Packed F16 instructions hold 2 values per 32-bit storage register...
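
To make the 2-values-per-32-bit layout concrete, here is a small sketch using Python's `struct` module, whose `e` format code is IEEE 754 half precision. The helper names `pack_f16_pair` and `unpack_f16_pair` and the choice of putting the first value in the low 16 bits are assumptions for illustration.

```python
import struct


def pack_f16_pair(a, b):
    """Store two half-precision floats in one 32-bit unsigned integer.

    `a` lands in the low 16 bits and `b` in the high 16 bits (little-endian).
    """
    return struct.unpack("<I", struct.pack("<2e", a, b))[0]


def unpack_f16_pair(word):
    """Recover the two half-precision values from a packed 32-bit word."""
    return struct.unpack("<2e", struct.pack("<I", word))


word = pack_f16_pair(1.5, -2.25)
print(hex(word))              # 0xc0803e00
print(unpack_f16_pair(word))  # (1.5, -2.25)
```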

Data is converted if the array of instructions includes F32, & commonly all F16 values should be present first, before group conversion, or alternatively...

Allocating an additional 16 bits of data, for example 0000x4 or subtle variance data that allows unique renders... Such as a chaos key or entropy / RNG random data...

Potentially allocating a static key in the form of AES output from the base pair F16C value...

The additional data potentially makes each game player's render unique!

Fast Conversion Proposals include:

Unique per-player additional data (AES key conversion for example, or DES; DES produces smaller, faster values)

Static key, sorted by data type (based on player profile or game map)

Dynamic Key

0000 or empty buffer hash
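
One way to picture the proposals above is an F16 value in the upper 16 bits of a 32-bit word with a 16-bit key in the lower 16 bits. The sketch below is only an assumption about layout: Python's standard library has no AES or DES, so a truncated SHA-256 digest stands in for the key derivation, and `player_key16`, `pack_f16_with_key`, `unpack_f16_with_key` and the player id `"player-42"` are hypothetical.

```python
import hashlib
import struct


def player_key16(player_id):
    """Derive a 16-bit static key from a player id (SHA-256 stand-in for AES/DES)."""
    digest = hashlib.sha256(player_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


def pack_f16_with_key(value, key16):
    """Place the F16 bit pattern in the high 16 bits and the key in the low 16 bits."""
    f16_bits = struct.unpack("<H", struct.pack("<e", value))[0]
    return (f16_bits << 16) | (key16 & 0xFFFF)


def unpack_f16_with_key(word):
    """Split a packed word back into its F16 value and its 16-bit key."""
    value = struct.unpack("<e", struct.pack("<H", word >> 16))[0]
    return value, word & 0xFFFF


key = player_key16("player-42")                # hypothetical per-player key
print(unpack_f16_with_key(pack_f16_with_key(0.75, key)) == (0.75, key))  # True
print(hex(pack_f16_with_key(1.0, 0x0000)))     # 0x3c000000, the "0000" empty buffer case
```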

Side by Side : Wide format texture = 2xF16 values in the same 32-bit value
Top & Bottom : F16 double layered format texture = 2xF16 values in the same 32-bit value

Yes, transparency for alien skin can use a Top & Bottom F16 layered texture,
Machines also; or even 4 layers for a truly special effect.

Combine both methodologies and a crypto hash with one or more layers of BumpMap RayTracing SiMD

SiMD is also 16-bit compatible, so no conversion is required.

Weather & clouds are perfect examples of light, fast loads over massive GPU arrays.

F16 values are also theoretically ideal for 16-bit audio if using SiMD..

In the case of AVX it is probably worth using dynamic key conversion:
A dynamic remainder key that allows the lower bits to interpolate sound data.
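
One possible reading of the remainder key is sketched here: F16 carries only an 11-bit significand, so a signed 16-bit PCM sample stored as F16 loses its lowest bits at higher amplitudes, and a small integer remainder can restore them. The scaling by 32768 and the names `to_f16`, `encode_sample` and `decode_sample` are assumptions, not a description of any existing audio API.

```python
import struct


def to_f16(x):
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def encode_sample(sample):
    """Split a signed 16-bit PCM sample into an F16 value and an integer remainder."""
    coarse = to_f16(sample / 32768.0)              # what the F16 pipeline carries
    remainder = sample - round(coarse * 32768.0)   # low bits that F16 dropped
    return coarse, remainder


def decode_sample(coarse, remainder):
    """Rebuild the exact 16-bit sample from the F16 value plus the remainder key."""
    return round(coarse * 32768.0) + remainder


coarse, remainder = encode_sample(12345)
print(decode_sample(coarse, remainder))  # 12345
```

Under this scaling the remainder never exceeds 8 in magnitude, so it needs far fewer bits than the sample itself.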

Other object sources such as render can potentially use the F16 system too..
Interpolate or tessellate bits on the shift from F16 to F32, on the final plane write to the frame buffer..
The memory required would be the buffer & not the source process..

An example is to replace the bits missing from F16 in F32/F64 with tessellation shaping and sharpening code, dynamically relative to the performance of the GPU/CPU...
F16 values obviously transfer fast from GPU to CPU & CPU to GPU..
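
The "missing bits" can be shown at the bit level: F16 has 10 mantissa bits and F32 has 23, so a finite F16 value widened to F32 leaves the lowest 13 mantissa bits at zero. The sketch below fills those bits with extra detail; where that detail comes from (tessellation, sharpening, a key) is left open, and the names `f16_to_f32_bits` and `fill_low_bits` are hypothetical.

```python
import struct

LOW_BITS_MASK = 0x1FFF  # the 13 mantissa bits F32 has beyond F16


def f16_to_f32_bits(value):
    """Round `value` to F16, widen to F32, and return the raw 32-bit pattern."""
    half = struct.unpack("<e", struct.pack("<e", value))[0]
    return struct.unpack("<I", struct.pack("<f", half))[0]


def fill_low_bits(value, detail13):
    """Write 13 bits of detail into the mantissa bits an F16 source leaves at zero.

    Only meaningful for finite values; filling the bits of an infinity makes a NaN.
    """
    bits = (f16_to_f32_bits(value) & ~LOW_BITS_MASK) | (detail13 & LOW_BITS_MASK)
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(f16_to_f32_bits(1.5) & LOW_BITS_MASK)          # 0: nothing stored there yet
print(1.5 < fill_low_bits(1.5, 0x1FFF) < 1.5 + 2 ** -10)  # True: stays below the next F16 step
```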

Image enhancement is also possible with a bitshift stack buffer that passes additional data into the missing bits..
For example a pre-processed micro BumpMapping or compute shading process that will pull the bits in under the F16 data: 453000.172000 > 453545.172711 bit swap.. could be complex!
Done with a cache? Entirely possible with a unified L3 cache

DLSS & dynamic sharpen & smooth/filter enhanced virtual resolution can significantly enhance the process
of dynamic buffer pipelining to the render path (where there is a benefit).

(c)Rupert S https://science.n-helix.com/2019/06/vulkan-stack.html

https://gpuopen.com/learn/first-steps-implementing-fp16/
