Friday, March 12, 2021

Brain Bit Precision Int32 FP32, Int16 FP16, Int8 FP8, Int6 FP6, Int4? Idealness of Computational Machine Learning ML TOPS for the human brain

Brain level Int/Float inferencing is ideally Int8/Int7 with error bits or float remainders
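
A minimal sketch of that idea (the numbers are illustrative, not brain data): quantize a float signal to Int8 & keep the float remainder as the error term, so the original can be compared or restored later.

```python
import numpy as np

# Quantize a float signal to Int8, keeping the float remainder as 'error bits'.
signal = np.array([0.113, -0.724, 0.958, -0.331], dtype=np.float32)
scale = np.abs(signal).max() / 127.0            # map the range onto [-127, 127]

q = np.round(signal / scale).astype(np.int8)    # Int8 inference value
remainder = signal - q.astype(np.float32) * scale  # float remainder left over

restored = q.astype(np.float32) * scale + remainder
print(q, remainder, np.allclose(restored, signal))  # exact reconstruction
```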

Comparison List : RS

48Bit Int+Float Int48+FP48 (many connections, Eyes for example) HDR Vision

40Bit Int+Float Int40+FP40 HDR Basic

Int16 FP32

Int8 Float16 (2-Channel, Brain Node) (3% Brain Study)

Int7 (20% Brain Study)

Int6 (80% Brain Study)

Int5 (Wolves (some are 6+))

Int4 (Sheep & worms)

Int3 (Germ biosystems)


Statistically, a science test stated that 80% of human brains quantify at 6Bit & 20% at 7Bit

Xbox Series X & PlayStation 5 go down to Int4 (quite likely for quick inferencing)

Be aware that using 4Bit Int instructions potentially means more instructions used per clock cycle & more micro data transfers..

Int8 is most commonly able to quantify data with minimum error, much as 8Bit machines like the Atari STE or the Nintendo 8Bit did..

Colour perception for example is many orders of magnitude higher! Otherwise 8Bit EGA colours would be all we would use..

16Bit was not good enough.. But 32Bit suits most people! But 10Bit(x4) 40Bit & Dolby 12Bit(x4) 48Bit is a luxury & we love it!
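
The arithmetic behind those depths, as a quick sketch (assuming 4 channels, e.g. RGBA, with colours counted over the 3 colour channels):

```python
# Bits per channel x 4 channels = total depth; 2^(3 x bits per channel) = colours.
for bpc in (8, 10, 12):
    print(f"{bpc}Bit x4 = {bpc * 4}Bit total, {2 ** (bpc * 3):,} colours")
# 8Bit x4 = 32Bit, ~16.8 million colours; 10Bit x4 = 40Bit, ~1.07 billion;
# 12Bit x4 = 48Bit, ~68.7 billion.
```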

Precision Quality Control in ML:


While nothing is sure, human beings appear to have an integer precision of around 8Bit & are more surely able to practice float units,

Bundling is when multiple neuron roots go to the same neuron in sync from the same response cluster of neurons,

This feature enhances data integrity & precision by multiplying data transfer & response precision..

Eye Neurons are an example & so are feelings from clustered neurons such as hands, feet & the sensory organs,

Memory & Maths calculations.
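
A hedged sketch of why bundling helps, assuming bundled connections behave like repeated noisy samples of one value: averaging N synchronized copies shrinks the noise by roughly sqrt(N), which buys extra effective bits of precision.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 0.6
noise = 1.0 / 2**6   # assume one connection carries roughly 6Bit of precision

for bundle in (1, 2, 4, 8):
    # 'bundle' synchronized connections each report the value plus independent noise
    samples = true_value + rng.normal(0.0, noise, size=(100_000, bundle))
    err = np.abs(samples.mean(axis=1) - true_value).mean()
    print(f"bundle={bundle}: mean error {err:.5f} (~{np.log2(1 / err):.1f} effective bits)")
```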

(c)Rupert S https://is.gd/ProcessorLasso


ML Classification Bundling for HIM & Her

Sorting bundles in priorities such as,

Time to process, Similarity & by probability (likelihood) improves perception & thought process,

Logical sort orders..
Required processing order based on sorted requirements (one needs another)
Items that go locally together, { Cleaning, Cooking, Cleanup }
Logical order, { Drink, Power, Computer, Application, Search, Webpage, Notebook, Read, Write } (see the sketch after this list)
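
A minimal Python sketch of that bundle sorting (task names, dependencies & the time stand-in are illustrative): dependencies decide the required processing order, & within each ready group the quickest items go first.

```python
import graphlib  # stdlib topological sorter, Python 3.9+

# Each item lists what it needs first ('one needs another')
needs = {
    "Power": [], "Drink": [],
    "Computer": ["Power"], "Application": ["Computer"],
    "Search": ["Application"], "Webpage": ["Search"],
    "Notebook": ["Computer"], "Read": ["Webpage", "Notebook"], "Write": ["Read"],
}
time_to_process = {task: len(task) for task in needs}  # illustrative priority

order = []
sorter = graphlib.TopologicalSorter(needs)
sorter.prepare()
while sorter.is_active():
    ready = sorted(sorter.get_ready(), key=time_to_process.get)  # quickest first
    order.extend(ready)
    sorter.done(*ready)

print(order)  # e.g. ['Drink', 'Power', 'Computer', 'Notebook', ...]
```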

Saving data caches it & aids processing; But organising it first makes retrieval clean & thought clean, Meditation Logic.

Connection specifics for a better brain; classified by type & example:

Human Brain cells have 1000 connections, squid 10000; Each connection does:

7Bit regular,
8Bit sharp,
9Bit on better effort,
10Bit on clarity & meditation + hard work,
6Bit when relaxed,
5Bit when drunk

Connections for dedicated skills such as maths have:

Dedication bundling (multiple connections)
Multiple Affirmations, A-Synchronous, Synchronous

1, 5Bit to 7Bit
2, 5Bit to 18Bit
3, 7Bit to 26Bit
4, 16Bit to 38Bit
5, 17Bit to 48Bit

Eyes for example can bundle 5 with training, for colour purity..
Lower bundling offers more flexibility,
High bundling offers assurance & speed & retention.

RS

Python & JS Configurations
https://is.gd/DictionarySortJS

*

Quantization modelling : RS : Physics III Slit Experiment

"(SmoothQuant).The optimized model achieves >3X latency improvement with a custom dequantization kernel for FP16 inference. Although the work does not map to Int8 engine"

Given that inferencing is being activated in Int4 & Int8 & Int16 & Floats F16b, F8 & F4,

Now my view is a vision of a Slit experiment in Physics; A slit experiment shows light photons in slices through a screen..

Int4 IIII < Int8 IIIIIIII < Int16 IIIIIIIIIIIIIIII

Ratio 1:2:4 on contained knowledge

Minimal Origin of mankind's knowledge : IIII < IIIIIIII < IIIIIIIIIIIIIIII Defined Summit of all power
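
The 1:2:4 ratio counts the bit slots; the number of distinct states each width can contain grows much faster, which is worth keeping in view when weighing 'contained knowledge':

```python
for bits in (4, 8, 16):
    print(f"Int{bits}: {bits} slits, {2 ** bits:,} distinct states")
# Int4: 16 states, Int8: 256 states, Int16: 65,536 states
```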

My method is to compress the point node data with
https://is.gd/WaveletAutoEncoder
https://github.com/GPUOpen-LibrariesAndSDKs/brotli_g_sdk

So what we do is take advantage of patterns; Creating tables of 1111 1010 as examples; These compress well & can be short-noted as patterns,

We can expand 4Bit into 8Bit inference & compress as patterns; The total data point is 4Bit if it is a pattern,
The subject is not predictable unless we pick the patterns!
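
A minimal sketch of the pattern-table idea (table values are illustrative, not from a fitted model): store each data point as a 4Bit code plus one shared 16-entry table, then expand to FP16 at inference time.

```python
import numpy as np

# One shared 16-entry pattern table: each 4Bit code expands to a learned value.
table = np.linspace(-1.0, 1.0, 16).astype(np.float16)

codes = np.array([0b1111, 0b1010, 0b0011, 0b0101], dtype=np.uint8)  # 4Bit points
expanded = table[codes]   # expand 4Bit patterns into FP16 inference values

print(codes, expanded)    # stored cost per point stays 4Bit (plus the small table)
```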

We can however Quantize the memory footprint; The Double/Single precision operations may be faster!

We need the models to work in F16 & Int8 & Int4 after all, But I see a reason to use Floats because sub-quantization does leave a remainder for us to compare..

That relevant 'F16' >=-

RS

Study Subject Reduction :

https://science.n-helix.com/2021/03/brain-bit-precision-int32-fp32-int16.html
https://science.n-helix.com/2022/10/ml.html

https://blog.openvino.ai/blog-posts/q123-technology-update-low-precision-and-model-optimization
https://blog.openvino.ai/blog-posts/q223-technology-update-low-precision-and-model-optimization
https://blog.openvino.ai/blog-posts/q323-technology-update-low-precision-and-model-optimization
https://blog.openvino.ai/blog-posts/q423-technology-update-low-precision-and-model-optimization

Batch Size 240W>65W, 32GB {64, 16}; 15W>5W, 4GB {16, 1} : 16, 8, 4 seems optimal,
Time taken compatible:

ML_With_USB_Stress-Testing_USB_Accelerators_for_Efficient_Edge
https://www.researchgate.net/publication/377174200_Stress-Testing_USB_Accelerators_for_Efficient_Edge_Inference
https://github.com/raphischer/edge-acc

https://is.gd/CJS_DictionarySort

Python & JS Configurations
https://is.gd/DictionarySortJS

*

Restricted Boltzmann ML Networks : Brain Efficient

I propose that SIMD of large scale width & depth can implement the model :
Restricted Boltzmann Machines (RBMs) have been proposed for developing neural networks for a variety of unsupervised machine learning applications

Restricted Boltzmann Machines utilize a percentage correctness based upon energy levels of multiple node values that represent a percentage chance of a correct solution,

My impression is that Annealer machines simply utilise more hidden values per node on a neural network,
Thus I propose that SiMD of large scale width & depth can implement the model..

A flexible approach is to experiment with percentages from a base value...
100 or 1000; We can therefore attempt to work with percentiles in order to adapt classical computation to the theory of multiplicity.

SiMD in parallel can, as we know with RISC Architecture, attempt to run an ideal network composing a many-times factor & regression learning model..

Once the rules are set; Millions of independent IO OPS can be performed in cyclic learning,

Without sending or receiving data in a way that interferes with the main CPU & GPU Function..

Localised DMA.
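
A minimal numpy sketch of that proposal (sizes & data are illustrative): one vectorized Gibbs step of an RBM, where each batch row is an independent chain & the hidden activation probability plays the role of the percentage chance of a correct solution; wide array ops stand in for large scale SiMD lanes.

```python
import numpy as np

rng = np.random.default_rng(1)
n_visible, n_hidden, batch = 16, 8, 1024       # illustrative sizes

W = rng.normal(0.0, 0.1, (n_visible, n_hidden))
b_h, b_v = np.zeros(n_hidden), np.zeros(n_visible)
v = rng.integers(0, 2, (batch, n_visible)).astype(float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One Gibbs sampling step, fully vectorized: every row is an independent
# chain, the kind of cyclic learning work wide SiMD lanes do well.
p_h = sigmoid(v @ W + b_h)                     # percentage chance each hidden fires
h = (rng.random(p_h.shape) < p_h).astype(float)
p_v = sigmoid(h @ W.T + b_v)
v_recon = (rng.random(p_v.shape) < p_v).astype(float)

print(np.abs(v - v_recon).mean())              # reconstruction error before training
```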

"Adaptive hyperparameter updating for training restricted Boltzmann machines on quantum annealers"

Adaptive hyperparameter updating for training restricted Boltzmann machines on:
Quantum annealers
Wide Path SiMD




"Restricted Boltzmann Machines (RBMs) have been proposed for developing neural networks for a
variety of unsupervised machine learning applications such as image recognition, drug discovery,
and materials design. The Boltzmann probability distribution is used as a model to identify network
parameters by optimizing the likelihood of predicting an output given hidden states trained on
available data. Training such networks often requires sampling over a large probability space that
must be approximated during gradient based optimization. Quantum annealing has been proposed
as a means to search this space more efficiently which has been experimentally investigated on
D-Wave hardware. D-Wave implementation requires selection of an effective inverse temperature
or hyperparameter (β) within the Boltzmann distribution which can strongly influence optimization.
Here, we show how this parameter can be estimated as a hyperparameter applied to D-Wave
hardware during neural network training by maximizing the likelihood or minimizing the Shannon
entropy. We find both methods improve training RBMs based upon D-Wave hardware experimental
validation on an image recognition problem. Neural network image reconstruction errors are
evaluated using Bayesian uncertainty analysis which illustrate more than an order magnitude
lower image reconstruction error using the maximum likelihood over manually optimizing the
hyperparameter. The maximum likelihood method is also shown to out-perform minimizing the
Shannon entropy for image reconstruction."

(c)Rupert S

Example ML Statistic Variable Conversion : Super Sampling Virtual Resolutions : Talking about machine learning & the hardware functions to run it within the SiMD & AVX feature-set.

For example this works well with fonts & web browsers & consoles or standard input display hubs or User Interfaces, UI & JS & Webpage code.

In the old days photo applications existed that used ML Image enhancement on older processors..
So how did they exploit Machine Learning on hardware with MMX for example?

Procedural process data analytics:

Converting large statistics databases; For general Tessellation/Interpolation of images,
The procedural element is writing the code that interpolates data based upon the statistics database...

Associated colours..
Face identity...
Linearity or curvature...
Association of grain & texture...

Databases get large fast & a 2MB to 15MB Database makes the most sense...
Averages have to be categorized as either being worthy of 2 places in the database or being merged into an average..
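
A hedged sketch of that categorization rule (threshold & data are illustrative): a new statistic earns its own place in the database only when it differs enough from every stored average; otherwise it is folded into the nearest one, keeping the database small.

```python
THRESHOLD = 0.15   # illustrative 'worthy of its own place' cutoff

def add_sample(db, key, value):
    entries = db.setdefault(key, [])
    for i, (avg, n) in enumerate(entries):
        if abs(value - avg) < THRESHOLD:       # close enough: merge into the average
            entries[i] = ((avg * n + value) / (n + 1), n + 1)
            return
    entries.append((value, 1))                 # distinct: worth a place of its own

db = {}
for v in (0.50, 0.52, 0.90, 0.51, 0.88):
    add_sample(db, "edge_curvature", v)
print(db)   # two entries survive: ~0.51 (3 samples) & ~0.89 (2 samples)
```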

You can still run ML on a database object & then the points in the table are called nodes!

Indeed you can do both, However database conversion makes datasets way more manageable to run within the SiMD & AVX feature-set.

However the matter of inferencing then has to be reduced to statistical averages & sometimes ML runs fine inferencing this way.

Both ways work, Whatever is best for you & the specific hardware.

(c)Rupert S

**

DL-ML slide : Machine Learning DL-ML


By my logic the implementation of a CPU+GPU model would be fluid to both..

Machine Learning : Scientific details relevant to DL-ML slide (CPU, GPU, SiMD Hash table (M1 Vector Matrix-table + Speed))

The vector logic is compatible with both CPU+GPU+SiMD+AVX.

Relevant because we use Vector Matrix Table hardware.. and in notes the Matrix significantly speeds up the process.
(Quantum Light Matrix)

The relevance to us is immense with world VM servers;
A DL-ML Machine Learning Model compatible with our hardware.
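
A minimal sketch of what that fluidity can mean in practice (the numpy/cupy swap is an assumption, not from the slide): write the vector matrix logic once against an array module, & the same code runs on CPU SiMD/AVX via numpy or on GPU via cupy.

```python
import numpy as np   # swap in `import cupy as np` to run the same logic on a GPU

def infer(weights, activations):
    # One vector-matrix step: the op both CPU SiMD/AVX units
    # & GPU matrix hardware accelerate.
    return np.maximum(weights @ activations, 0.0)   # matmul + ReLU

W = np.full((4, 8), 0.1, dtype=np.float32)
x = np.arange(8, dtype=np.float32)
print(infer(W, x))   # [2.8 2.8 2.8 2.8]
```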

However this is a model we can use & train..
For common core : Rupert S https://is.gd/ProcessorLasso



"State-of-the-art approaches such as OpenMP and OpenCL"
https://is.gd/LEDSource

https://science.n-helix.com/2023/06/tops.html


Tokma ML

Python & JS Configurations
https://is.gd/DictionarySortJS

https://iopscience.iop.org/article/10.1088/1741-4326/ad142f

https://is.gd/TokmaML
