Thursday, March 25, 2021

Upscaling Enhancement

Super resolution API Photo & Video Enhance & upscaling demonstrations

Upscaling Enhancement For Telescopes, Space & Research Aviation Photography & Video
Photo Enhance & upscaling:


Photographic Enhancers:


Bloodborne : "Why not shoot for 4K too? Thus began a week of experiments using a tool called Topaz Video Enhance AI, which uses a number of different AI upscaling models - and it turned out that most of them could deliver appreciably higher detail."



Department of Energy - RGB_Color-Seal_Green-Mark_SC_Vertical V2 Helix.jpg (2.08 MB) https://mirrorace.org/m/3Jz5o

JOE Science Workshop V1 - DcXw0jCU8AA-Jdk.jpg (1.09 MB) https://mirrorace.org/m/3Jz5p

SuperNova image_2144_1e-SN-1993J.jpg (1.74 MB) https://mirrorace.org/m/3Jz5q

XC50 Cray Met Data Test DQ-aoSpUQAAXby-.png (6.72 MB) https://mirrorace.org/m/3Jz5r

**

deadpool V2 3000.jpg (2.81 MB) https://mirrorace.org/m/5LrrU

Such wow art V2 3000 tGi0Ap74NwbRC.jpg (4.09 MB) https://mirrorace.org/m/4pwxi

Friday, March 12, 2021

Brain Bit Precision Int32 FP32, Int16 FP16, Int8 FP8, Int6 FP6, Int4? Idealness of Computational Machine Learning ML TOPS for the human brain

Brain Bit Precision Int32 FP32, Int16 FP16, Int8 FP8, Int6 FP6, Int4? Idealness of Computational Machine Learning ML TOPS for the human brain:

Brain level Int/Float inferencing is ideally in Int8/7 with error bits or float remainders

Comparison List : RS

48Bit Int+Float Int48+FP48 (many connections, Eyes for example) HDR Vision

40Bit Int+Float Int40+FP40 HDR Basic

Int16 FP32

Int8 Float16 (2 Channel, Brain Node) (3% Brain Study)

Int7 (20% Brain Study)

Int6 (80% Brain Study)

Int5 (Wolves (some are 6+))

Int4 (Sheep & worms)

Int3 (Germ biosystems)


Statistically, one science test stated that 80% of human brains quantify at 6 bits and 20% at 7 bits.

Xbox X & PlayStation 5 go down to Int4 (quite likely for quick inferencing).

Be aware that using 4-bit Int instructions potentially means more instructions used per clock cycle & more micro data transfers.

Int8 is most commonly able to quantify data with minimum error in 8 bits, like the Atari STE or the 8-bit Nintendo.

Colour perception, for example, is many orders of magnitude higher! Otherwise 8-bit EGA colours would be all we would use.


16Bit was not good enough.. But 32Bit suits most people! But 10Bit(x4) 40Bit & Dolby 12Bit(x4) 48Bit is a luxury & we love it!
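
As a rough illustration of the Int8-with-remainder idea above, a minimal sketch in C (the input value is made up, not from any brain study): a float is quantised to Int8 and the rounding error is kept aside as a float remainder so it can be re-applied later.

#include <stdint.h>
#include <stdio.h>

/* Quantise x (expected range -1..1) to Int8 plus a float remainder.
   The remainder holds the rounding error ("error bits") so precision
   can be restored when needed. */
static int8_t quantise_int8(float x, float *remainder)
{
    float scaled = x * 127.0f;
    int8_t q = (int8_t)(scaled >= 0 ? scaled + 0.5f : scaled - 0.5f);
    *remainder = scaled - (float)q;   /* error kept aside */
    return q;
}

int main(void)
{
    float r;
    int8_t q = quantise_int8(0.3721f, &r);
    /* Dequantise: q/127 plus the stored remainder recovers the input. */
    printf("q=%d  recovered=%f\n", q, (q + r) / 127.0f);
    return 0;
}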


(c)Rupert S https://is.gd/ProcessorLasso


Restricted Boltzmann ML Networks : Brain Efficient

I propose that SIMD of large scale width & depth can implement the model:
Restricted Boltzmann Machines (RBMs) have been proposed for developing neural networks for a variety of unsupervised machine learning applications.

Restricted Boltzmann Machines utilize a percentage correctness based upon the energy levels of multiple node values that represent a percentage chance of a correct solution,

My impression is that annealer machines simply utilise more hidden values per node on a neural network;
thus I propose that SIMD of large scale width & depth can implement the model..

A flexible approach is to experiment with percentages from a base value...
100 or 1000; we can therefore attempt to work with percentiles in order to adapt classical computation to the theory of multiplicity.

SiMD in parallel can, as we know with RISC architecture,
attempt to run an ideal network composed of many factor & regression learning models..

Once the rules are set, millions of independent IO OPS can be performed in cyclic learning,

Without sending or receiving data in a way that interferes with the main CPU & GPU Function..

Localised DMA.
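
A minimal sketch of that proposal in C, assuming a tiny made-up network (the sizes and weights are placeholders, not taken from the paper below): each hidden node's energy is summed over the visible layer and squashed to a probability, the percentage chance of a correct solution; the inner loop is the part a wide SIMD unit would run in parallel.

#include <math.h>
#include <stdio.h>

#define VISIBLE 8
#define HIDDEN  4

/* Probability that each hidden node switches on, given the visible layer:
   sigmoid(b_j + sum_i v_i * W_ij) -- the node energy expressed as a
   percentage chance of a correct solution. */
static void hidden_probabilities(const float v[VISIBLE],
                                 const float W[VISIBLE][HIDDEN],
                                 const float b[HIDDEN],
                                 float p[HIDDEN])
{
    for (int j = 0; j < HIDDEN; ++j) {
        float energy = b[j];
        for (int i = 0; i < VISIBLE; ++i)   /* SIMD-friendly inner loop */
            energy += v[i] * W[i][j];
        p[j] = 1.0f / (1.0f + expf(-energy));
    }
}

int main(void)
{
    float v[VISIBLE] = {1, 0, 1, 1, 0, 0, 1, 0};
    float W[VISIBLE][HIDDEN] = {{0}};      /* would be learned weights */
    float b[HIDDEN] = {0};
    float p[HIDDEN];
    hidden_probabilities(v, W, b, p);
    for (int j = 0; j < HIDDEN; ++j) printf("h%d: %.2f\n", j, p[j]);
    return 0;
}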

"Adaptive hyperparameter updating for training restricted Boltzmann machines on quantum annealers"

Adaptive hyperparameter updating for training restricted Boltzmann machines on:
Quantum annealers
Wide Path SiMD




"Restricted Boltzmann Machines (RBMs) have been proposed for developing neural networks for a
variety of unsupervised machine learning applications such as image recognition, drug discovery,
and materials design. The Boltzmann probability distribution is used as a model to identify network
parameters by optimizing the likelihood of predicting an output given hidden states trained on
available data. Training such networks often requires sampling over a large probability space that
must be approximated during gradient based optimization. Quantum annealing has been proposed
as a means to search this space more efficiently which has been experimentally investigated on
D-Wave hardware. D-Wave implementation requires selection of an effective inverse temperature
or hyperparameter (β) within the Boltzmann distribution which can strongly influence optimization.
Here, we show how this parameter can be estimated as a hyperparameter applied to D-Wave
hardware during neural network training by maximizing the likelihood or minimizing the Shannon
entropy. We find both methods improve training RBMs based upon D-Wave hardware experimental
validation on an image recognition problem. Neural network image reconstruction errors are
evaluated using Bayesian uncertainty analysis which illustrate more than an order magnitude
lower image reconstruction error using the maximum likelihood over manually optimizing the
hyperparameter. The maximum likelihood method is also shown to out-perform minimizing the
Shannon entropy for image reconstruction."

(c)Rupert S

Example ML Statistic Variable Conversion : Super Sampling Virtual Resolutions : Talking about machine learning & Hardware functions to use it/Run it; To run within the SiMD & AVX feature-set.

For example this works well with fonts & web browsers & consoles, or standard input display hubs or User Interfaces: UI & JS & Webpage code.

In the old days photo applications existed that used ML image enhancement on older processors..
So how did they exploit machine learning on hardware with only MMX, for example?

Procedural process data analytics:

Converting large statistics databases for general tessellation/interpolation of images:
The procedural element is writing the code that interpolates data based upon the statistics database...

Associated colours..
Face identity...
Linearity or curvature...
Association of grain & texture...

Databases get large fast & a 2 MB to 15 MB database makes the most sense...
Averages have to be categorized as either worthy of two places in the database or a single average..

You can still run ML on a database object & then the points in the table are called nodes!

Indeed you can do both; however, database conversion makes datasets far more manageable to run within the SiMD & AVX feature-set.

However the matter of inferencing then has to be reduced to statistical averages & sometimes ML runs fine inferencing this way.

Both ways work, whatever is best for you & the specific hardware.
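
A minimal sketch of the database-as-nodes approach, with hypothetical table entries (the luminance/gain figures are invented for illustration): the statistics database is a small table of nodes and the procedural code interpolates between the two nearest nodes, work that maps cleanly onto SiMD & AVX lanes.

#include <stddef.h>
#include <stdio.h>

/* One node in the statistics database: an input luminance and the
   average enhancement factor associated with it (values invented). */
typedef struct { float luma; float gain; } StatNode;

static const StatNode table[] = {
    {0.00f, 1.00f}, {0.25f, 1.10f}, {0.50f, 1.25f},
    {0.75f, 1.15f}, {1.00f, 1.05f}
};
#define NODES (sizeof table / sizeof table[0])

/* The procedural element: interpolate between the two nearest nodes. */
static float enhance_gain(float luma)
{
    if (luma <= table[0].luma) return table[0].gain;
    for (size_t i = 1; i < NODES; ++i)
        if (luma <= table[i].luma) {
            float t = (luma - table[i - 1].luma) /
                      (table[i].luma - table[i - 1].luma);
            return table[i - 1].gain + t * (table[i].gain - table[i - 1].gain);
        }
    return table[NODES - 1].gain;
}

int main(void)
{
    printf("gain at 0.6 = %.3f\n", enhance_gain(0.6f));
    return 0;
}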

(c)Rupert S

**

DL-ML slide : Machine Learning DL-ML


By my logic the implementation of a CPU+GPU model would be fluid to both..

Machine Learning : Scientific details relevant to DL-ML slide (CPU, GPU, SiMD Hash table (M1 Vector Matrix-table + Speed))

The vector logic is compatible with both CPU+GPU+SiMD+AVX.

Relevant because we use Vector Matrix Table hardware.. and in notes the Matrix significantly speeds up the process.
(Quantum Light Matrix)

The relevance to us is immense with world VM servers;
the DL-ML machine learning model is compatible with our hardware.

However this is a model we can use & train..
For common core : Rupert S https://is.gd/ProcessorLasso


Saturday, February 13, 2021

Multi Operation Maths - CPU,GPU Computation

Multi Operation Maths - CPU,GPU Computation (c)RS

Performing multiple 4-, 8-, 16-, 32-bit operations on a 64Bit integer core (the example)



Kind of an F16 operation & Integer 16 or Int8 if you need it; with careful management and special libraries it is capable of speeding up PC, Mac & Consoles :HPC:
Requires specially compiled libraries so compiled code can be managed & roll ops assessed.

Rules:

All operations need to be by the same multiplication

Rolls are usable to convert values, for example for multiplication & division


For example :


451 722 551 834 x 6


In the case of non-base-factor roll numbers,

We have to fraction the difference between the value and our base roll number,

10 for example and 6, so the maths is convoluted & may not be worth it,

Could do 6 + rolls & then -rolls

On a base-10 processor the first factor would be 10x because we could compensate by placement,

But we still need space to expand the result to the right or left

0451072205510834 x 10 =

4510722055108340

or 4510 roll -12
7220 roll -8
5510 roll -4
8340 no roll

Converting base 10 to & from hex may make sense

Depending on the cost of roll; This operation may be worth it!

This operation is in Base 10 & 8Bit makes more sense mostly for common operations in hex..

But 8 is not a very big number for larger maths & 16Bit makes more sense because it has a larger range.
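
A minimal sketch of the packed idea in binary rather than base 10, using the example values above: four 16-bit lanes are held in one 64-bit integer and all multiplied by the same factor (the "same multiplication" rule), with the caller responsible for leaving enough headroom that no lane overflows into its neighbour.

#include <stdint.h>
#include <stdio.h>

/* Multiply four packed 16-bit lanes by the same factor.
   Caller must guarantee lane * factor still fits in 16 bits,
   otherwise carries would spill into the neighbouring lane. */
static uint64_t packed_mul4x16(uint64_t packed, uint16_t factor)
{
    uint64_t out = 0;
    for (int lane = 0; lane < 4; ++lane) {
        uint64_t v = (packed >> (lane * 16)) & 0xFFFFu;
        out |= ((v * factor) & 0xFFFFu) << (lane * 16);
    }
    return out;
}

int main(void)
{
    /* lanes: 451, 722, 551, 834 -- the example values above, x 6 */
    uint64_t packed = (uint64_t)451 | (uint64_t)722 << 16 |
                      (uint64_t)551 << 32 | (uint64_t)834 << 48;
    uint64_t result = packed_mul4x16(packed, 6);
    for (int lane = 0; lane < 4; ++lane)
        printf("lane %d: %llu\n", lane,
               (unsigned long long)((result >> (lane * 16)) & 0xFFFFu));
    return 0;
}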

Performing numeric expansion:

Consoles in particular, and FPUs, where expansion is required for emergence mathematics

Performing numeric expansion for circumstances where we require larger numbers for example:

To fill the 187 FPU buffer..

To do that we will roll to the left & expand the number, although we may need multiple operations..

Like i say : Roll + or Roll -

1447000
-Roll 3 = 1447
or
+Roll 3 = 1447000000

That way we can potentially exceed the bit depth, 32Bit for example.
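
A minimal sketch of the roll idea, using binary shifts in place of the decimal rolls above: a compact 32-bit value is rolled left into a 64-bit working register so intermediate maths can exceed the stored bit depth, then rolled back down before the final store.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t small = 1447;            /* compact stored form */

    /* +Roll: expand into a 64-bit working value (a shift of 20 bits,
       roughly a factor of a million, like the +Roll 3 decimal example) */
    uint64_t expanded = (uint64_t)small << 20;

    /* ... larger intermediate maths would happen here ... */

    /* -Roll: contract back to the compact form for storage */
    uint32_t compact = (uint32_t)(expanded >> 20);

    printf("expanded=%llu compact=%u\n",
           (unsigned long long)expanded, compact);
    return 0;
}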

Rupert S https://science.n-helix.com


*****

Packed F16C & F16 Values in use on CPU & GPU - RS

F16C & F16 : lower precision values that are usable to optimise GPU & CPU operations that involve less detailed values like hashes, game data metadata or Machine Learning : RS

Firstly, F16C is an instruction set supported by the FX 8320E, so the CPU can potentially use packed F16 float instructions directly,
As quoted, F16 carefully managed produces a pipeline that is 100% F16..

Packed F16 instructions use 2 data sets per 32Bit storage register...

Data is converted if the array of instructions includes F32; commonly all F16 should be present first, before group conversion, or alternatively...

Allocating an additional 16 bits of data, for example 0000x4 or subtle variance data that allows unique renders... such as a chaos key or Entropy / RNG random data...

Potentially allocating a static key in the form of AES Output from base pair F16c Value...

The additional data can potentially make each game player's render unique!
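
A minimal sketch of the F16C path, assuming an x86 CPU with F16C enabled (the FX 8320E above supports it; compile with something like -mf16c): four F32 values are packed to F16 so two F16 values share each 32-bit slot, then unpacked again when full precision is needed.

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    float in[4] = {1.5f, 0.25f, 3.0f, 0.125f};

    /* Pack 4x F32 -> 4x F16: the result occupies 64 bits, i.e. two
       F16 values per 32-bit storage slot, as described above. */
    __m128  f32   = _mm_loadu_ps(in);
    __m128i f16x4 = _mm_cvtps_ph(f32, _MM_FROUND_TO_NEAREST_INT);

    /* Unpack back to F32 when a later stage needs full precision. */
    __m128 back = _mm_cvtph_ps(f16x4);

    float out[4];
    _mm_storeu_ps(out, back);
    for (int i = 0; i < 4; ++i)
        printf("%f -> %f\n", in[i], out[i]);
    return 0;
}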

Fast Conversion Proposals include:

Unique per player additional data (AES Key conversion for example, Or DES; DES Produces smaller faster values)

Static key, sorted by data type (based on player profile or game map)

Dynamic Key

0000 or empty buffer hash

Side by Side : Wide format texture = 2xF16 Value on same 32Bit Value
Top & Bottom : F16 Double layered format texture = 2xF16 Value on same 32Bit Value

Yes, transparency for alien skin can use the Top & Bottom F16 layered texture;
Machines also, or even 4 layers for a truly special effect.

Combine both methodologies and a crypto hash with one or more layers of BumpMap RayTracing SiMD

SiMD is also 16Bit compatible so no conversion required.

Weather & clouds are examples perfect for light fast loads over massive GPU Arrays.

F16 values are also theoretically ideal for 16Bit audio if using SiMD..

In the case of AVX probably worth using dynamic key conversion..
A Dynamic Remainder key that allows lower bits to interpolate Sound data.

Other object sources such as renders can potentially use the F16 system to..
Interpolate or tessellate bits on the shift from F16 to F32 on the final plane write to the frame buffer..
The memory required would be the buffer & not the source process..

An example is to replace the bits missing from F16 in F32/F64 with tessellation shaping and sharpening code, dynamically relative to the performance of the GPU/CPU...
F16 values obviously transfer fast from GPU to CPU & from CPU to GPU..

Image enhancement is also possible with a bitshift stack buffer that passes additional data to the missing bits..
For example a pre-processed micro BumpMapping or compute shading process that will pull the bits in under the F16 data: 453000.172000 > 453545.172711 bit swap.. could be complex!
Done with a cache? Entirely possible with a unified L3 cache
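
A minimal sketch of that final-write idea, with a hypothetical per-pixel detail buffer standing in for the pre-processed BumpMap/compute-shading data: four packed F16 pixels are widened to F32 and the missing low bits are filled from the detail term instead of being left at zero (again assumes F16C, compile with -mf16c).

#include <immintrin.h>
#include <stdio.h>

/* Widen 4 packed F16 pixels to F32 and add a precomputed detail term
   (hypothetical sharpening / micro bump-map data) on the final write. */
static void widen_with_detail(__m128i f16_pixels,
                              const float detail[4],
                              float out[4])
{
    __m128 wide = _mm_cvtph_ps(f16_pixels);          /* F16 -> F32 */
    __m128 fine = _mm_loadu_ps(detail);              /* missing-bit data */
    _mm_storeu_ps(out, _mm_add_ps(wide, fine));      /* frame buffer write */
}

int main(void)
{
    float src[4]    = {0.50f, 0.25f, 0.75f, 1.00f};
    float detail[4] = {0.001f, -0.002f, 0.0015f, 0.0f};
    __m128i f16 = _mm_cvtps_ph(_mm_loadu_ps(src), _MM_FROUND_TO_NEAREST_INT);

    float out[4];
    widen_with_detail(f16, detail, out);
    for (int i = 0; i < 4; ++i) printf("%f\n", out[i]);
    return 0;
}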

DLSS & dynamic sharpen & smooth/filter enhanced virtual resolution can significantly enhance the process of dynamic buffer pipelining to the render path (on requirement benefit).

(c)Rupert S https://science.n-helix.com/2019/06/vulkan-stack.html

https://gpuopen.com/learn/first-steps-implementing-fp16/