Machine Learning Equates: Solve Tables for Advanced ML (c)RS
For Python & of course all GPU & CPU runtimes, Firmware & Logical thought,
Apologies for not expressly stating all {Mul+ & all} Accumulator strategies; these are hard to work out! But basic edge detection is a SiMD Example. RS
*
Core features of TOPCloud:
RTP ML TOPS are a processor's friend
3D audio mapping & spatialization for realistic sound effects
3D Vector Support for various audio formats such as PCM, MP4, OGG, and WAV
Low latency & high bandwidth connection to cloud GPU servers via RDP
Procedural & heuristic algorithms for generating game scenarios & strategies & 3D Audio & Visuals
Localized & cloud-based machine learning models for optimizing game performance & user experience
RTP GPU Connect technology that allows users to access GPU resources from any device with Bluetooth or WiFi
TOPCloud is a revolutionary 'TOPS' way to enjoy & create audio games using your own music & the power of the cloud. Try it today & discover a new dimension of gaming!
*
Scaling; We can classify by colour or creativity. (c)RS
If you use TOPCloud, you can share work between different displays in the TOPS sense..
But mostly you would need cloud presence,
Mostly this is about making the most of TOPS-heavy Business GPUs & the personal ones in your computer or consoles.
But sharing common tasks such as scaling movies by type, or by identifying a single movie to upscale...
Now you might be asking what we would be doing there?
Well, a single movie uses the same materials in our ML; We can analyse the class & optimise the scaling by class..
For those familiar with games & FSR: We familiarise our code with a single game!
By doing this we improve our product and can therefore classify by:
Resolution
Style
Speed
Type; for example FPS & RTS
We can classify by colour or creativity...
We do not simply have to roll the dice on General Scaling, We can use classifiers:
Title
Scale
Type
Speed
Frame Rate
Colour & Composition
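A minimal sketch in C of how such a classifier table could select per-title scaler settings instead of rolling the dice on General Scaling (names & fields here are illustrative, not a TOPCloud API):

#include <stdio.h>
#include <string.h>

/* Illustrative sketch (not a real TOPCloud API): a per-title classifier
   record selecting upscaler settings by Title, Scale, Type & Frame Rate. */
typedef struct {
    const char *title;   /* classifier: Title */
    int scale;           /* classifier: Scale, e.g. 2 = 2x upscale */
    const char *type;    /* classifier: Type, e.g. "FPS", "RTS", "Movie" */
    int fps;             /* classifier: Frame Rate target */
    float sharpen;       /* tuned per class: Colour & Composition weight */
} ScaleClass;

static const ScaleClass table[] = {
    { "GenericMovie", 2, "Movie", 24, 0.30f },
    { "FastShooter",  2, "FPS",   60, 0.55f },
    { "StrategyGame", 3, "RTS",   30, 0.45f },
};

/* Look up a title; fall back to general scaling when unclassified. */
static const ScaleClass *classify(const char *title) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].title, title) == 0) return &table[i];
    return NULL;
}

int main(void) {
    const ScaleClass *c = classify("FastShooter");
    if (c) printf("%s: %dx %s @%dfps sharpen=%.2f\n",
                  c->title, c->scale, c->type, c->fps, c->sharpen);
    else   printf("unclassified: use general scaling\n");
    return 0;
}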
Rupert S
PoCL Source & Code
https://is.gd/LEDSource
*
We all think our own way; Potential is always there on a Runtime Library - Multiple Solve Table
Machine learning | Equate ~= Multi Layer Wavelet Abstraction
https://science.n-helix.com/2022/09/ovccans.html
https://www.youtube.com/watch?v=-9lCpfrOQQ4
(c)Rupert S 2022-10
https://is.gd/LEDSource
https://is.gd/BTSource
https://science.n-helix.com/2023/06/tops.html
https://science.n-helix.com/2021/03/brain-bit-precision-int32-fp32-int16.html
https://science.n-helix.com/2022/08/jit-dongle.html
https://science.n-helix.com/2022/06/jit-compiler.html
https://is.gd/MLCodecShaping
https://is.gd/HPC_HIP_CUDA
https://www.amd.com/en/developer/rocm-hub/hip-sdk.html#tabs-ddafbba141-item-c6b9ce2aab-tab
https://rocm.docs.amd.com/en/docs-5.5.1/deploy/windows/quick_start.html
https://github.com/ssube/diffusers/tree/feature/onnx-upscale
https://github.com/huggingface/diffusers
https://huggingface.co/ssube/stable-diffusion-x4-upscaler-onnx
https://huggingface.co/uwg/upscaler/tree/main
https://huggingface.co/nvmmonkey/optimal_upscale/tree/main
https://huggingface.co/gmp-dev/gmp-upscaler/tree/main/ESRGAN
Neural Engine
https://github.com/godly-devotion/MochiDiffusion
ML List & Services
https://huggingface.co/models?sort=downloads&search=upscale
https://huggingface.co/models
https://huggingface.co/pricing
PhysX
Isaac Gym - Preview Release
https://developer.nvidia.com/isaac-gym
CALM: Conditional Adversarial Latent Models for Directable Virtual Characters
https://github.com/NVlabs/CALM
*
Personality UI : Have a friend
Alpaca Character Generation model
4Bit for speed, But not precise
https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g
Trained 3 epochs, higher precision: https://huggingface.co/chavinlo/gpt4-x-alpaca
Base model https://huggingface.co/chavinlo/alpaca-13b
https://github.com/teknium1/GPTeacher
Python WebUI
https://github.com/oobabooga/text-generation-webui
Mac; Mostly Mac, but fast
https://github.com/ggerganov/llama.cpp
how to use & personality sets https://discord.com/invite/aitrepreneur-1018992679893340160
On the subject of how deep a personality of 4Bit, 8Bit, 16Bit is reference:
https://science.n-helix.com/2021/03/brain-bit-precision-int32-fp32-int16.html
https://science.n-helix.com/2022/10/ml.html
*
Machine learning | Equate ~= Multi Layer Wavelet Abstraction
(documents) JIT & OpenCL & Codec : https://is.gd/DisplaySourceCode
*
https://science.n-helix.com/2022/08/jit-dongle.html
https://science.n-helix.com/2022/06/jit-compiler.html
https://science.n-helix.com/2022/10/ml.html
https://science.n-helix.com/2021/03/brain-bit-precision-int32-fp32-int16.html
https://science.n-helix.com/2022/03/ice-ssrtp.html
https://science.n-helix.com/2022/09/ovccans.html
https://science.n-helix.com/2023/02/smart-compression.html
https://science.n-helix.com/2022/09/audio-presentation-play.html
https://science.n-helix.com/2021/10/he-aacsbc-overlapping-wave-domains.html
https://science.n-helix.com/2023/03/path-trace.html
*****
The best NPM site in the world: https://npm.n-helix.com/bundles/
(Simple Install) Website Cache JS Updated 2021-11 (c)RS https://bit.ly/CacheJS
(Simple Install) Science & Research Node High Performance Computing
Linux & Android https://is.gd/LinuxHPCNode
Presenting JIT for hardware interoperability & function :
https://is.gd/DisplaySourceCode
https://is.gd/BTSource
(Simple Install) Website Server Cache JS Updated 2021-11 (c)RS
https://bit.ly/CacheJSm
(Simple Install) Website Server Cache JS Work Files Zip Updated 2021-11 (c)RS https://bit.ly/AppCacheJSZip
*****
Direct ONNX Hardware Accelerated: F16
https://github.com/GPUOpen-LibrariesAndSDKs/RadeonML
Ideal for 4Bit Int4 XBox & Int8 GPU
PULP-NN: accelerating quantized neural networks on parallel ultra-low-power RISC-V processors - Bus-width 8-bit, 4-bit, 2-bit and 1-bit
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6939244/
ML Proof case: SVM (Multi-Dimensional-Elliptic, 98%) & AdaBoost M1 (Mac, 91%) - COVID-19 Prediction Using Supervised Machine Learning - Irfan_Ali_MEng_2023
https://dspace.library.uvic.ca/bitstream/handle/1828/14676/Irfan_Ali_MEng_2023.pdf?sequence=1&isAllowed=y
Core Motivations of ML
ML Learning is a branch of artificial intelligence that focuses on using data and algorithms to imitate the way that humans learn & improving ML Method accuracy.
ML Learning can be applied to various domains, such as image processing, natural language processing, speech recognition & code optimization.
ML Learning can use different techniques; Such as supervised learning, unsupervised learning & reinforcement learning, depending on the type and availability of data.
Some of the common techniques used in ML Learning are:
Edge detection: a process of identifying the boundaries of objects in images or videos.
Accent recognition: a process of identifying the regional or social variation of speech.
Language processing: a process of analyzing and generating natural language texts.
Code optimization: a process of improving the performance or quality of code by using various methods, Such as compilers, libraries, or heuristics.
The Objective is to improve both ML & Minds.
RS
I think that considering the stated philosophy, There is more room for education on social conduct.
https://www.youtube.com/watch?v=jV4lS0srEVo
*
#Sound Strategy game TOPCloud (c)RS
PCM & MP4 are 2D/3D Image so GPU Helps there also with 3D Audio mapping!
Games do not require cloud processing of images & a lot of local strategies are procedural Heuristic
You see RDP has GPU Connect (my innovation, I might add); So Bluetooth & Wifi can connect RTP GPU; The port specifics are not particularly important; However a device such as a music streamer can have ML TOPS available locally & from the cloud,
Due to how the TOPCloud strategy works with localised ML TOPS; Not all data has to be sent or received.. For example all Audio 3D Profiles for HQ Room audio can be done within a few MB of data; With some hard work? 150Kb of data & so in reach of phones & mobile!
Gaming is an example here. I give TickTackToe as the example where all that a device like Alexa or a Google smart device has to think is: Which square? But..
No physical picture needs to be sent for the game to be played & if required a small TickTack Strategy ML is desired locally for a quicker response!
You see with a low latency GPU RTP & GPU RDP connection to cloud GPU; Most localised thinking TOPS can be carried out in Seconds if not milliseconds.
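As a toy sketch of how small that local "Which square?" strategy can be (a plain heuristic in C, no ML framework assumed; the device only ever answers with a square index):

#include <stdio.h>

/* Tiny local TickTackToe strategy: the device only has to answer
   "which square?"; no picture needs to be sent. Sketch only. */
static const int lines[8][3] = {
    {0,1,2},{3,4,5},{6,7,8},{0,3,6},{1,4,7},{2,5,8},{0,4,8},{2,4,6}
};

/* board: 'X', 'O' or ' '. Returns square index 0..8 for player 'me'. */
int which_square(const char b[9], char me, char foe) {
    char want[2] = { me, foe };  /* 1) take a win, 2) block the opponent */
    for (int w = 0; w < 2; w++)
        for (int l = 0; l < 8; l++) {
            int cnt = 0, empty = -1;
            for (int k = 0; k < 3; k++) {
                int i = lines[l][k];
                if (b[i] == want[w]) cnt++;
                else if (b[i] == ' ') empty = i;
            }
            if (cnt == 2 && empty >= 0) return empty;
        }
    if (b[4] == ' ') return 4;            /* 3) centre */
    for (int i = 0; i < 9; i++)           /* 4) first free square */
        if (b[i] == ' ') return i;
    return -1;                            /* board full */
}

int main(void) {
    const char board[9] = { 'X','X',' ', ' ','O',' ', ' ',' ','O' };
    printf("play square %d\n", which_square(board, 'O', 'X'));
    return 0;
}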
Rupert S
*
Int8:SiMD : Maths & Logic
This is about how you think about components such as INT8, INT4(Xbox) & SiMD, You have to classify by necessity & optimise the structure.
You can shape the game reality with specific control objects & statics!
Maths in SiMD & Int8 & Machine Learning in Int8 & SiMD; SiMD is hard maths, Int8 is soft edge inference...
Both are maths; But soft logic is not a PROOF Math but can be proof; Hard math is not 'Invention & Imagination 'Exactly''
But we have both to improve performance.
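A minimal C sketch of the "soft edge inference" side: an INT8 quantized dot product, where a little rounding error is acceptable & the inner loop is exactly what dp4a/SiMD batch 4-at-a-time (scales & values illustrative):

#include <stdint.h>
#include <stdio.h>

/* INT8 'soft' inference sketch: quantized dot product where small
   rounding error is tolerated; hard SiMD maths would demand the
   exact value, soft logic only needs a good-enough inference. */
float int8_dot(const int8_t *a, const int8_t *b, int n,
               float scale_a, float scale_b) {
    int32_t acc = 0;                    /* INT8 x INT8 -> INT32 accumulate */
    for (int i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];
    return acc * scale_a * scale_b;     /* back to an approximate real value */
}

int main(void) {
    int8_t a[4] = { 10, -3, 7, 2 }, b[4] = { 5, 5, -2, 9 };
    printf("dot ~= %f\n", int8_dot(a, b, 4, 0.05f, 0.05f));
    return 0;
}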
RS
*
"I know this is depressing from my end with a FX8320E with AVX but if you multi tune the CPU Kernel for the RX / RTX that 512DL AVX would have meaning, If you are kind you will allow machine learning on the AVX FX8320E Level to work on SiMD Yes / No comparisons !"
Better-Mind
Here is how to create a better mind #ML
Train your eyes with art on the concepts of edges, curves, Colours & Shading and love,
Educate your minds; Learn today & be quite aware how clever & sharp you will be.
Human Operations
Edge Detection
Such as teaching your child edge detect in art ;)
Smooth & Blend & Sharpen,
All interpretive
Accent Recognitions & Language
Interpret as follows
https://en.wikipedia.org/wiki/FMA_instruction_set
https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
https://en.wikipedia.org/wiki/AArch64#Scalable_Vector_Extension_(SVE)
*
Number Complexity Reduction for operations
I suppose you can use, for example, a - b & automatically see which is larger? So you could take 1 to 20 & sort them by remaining number; Before asking: Small number remainders are 8Bit 0-255, 16Bit is 65535...
So reducing the value of a group of numbers you sort to 16Bit or 8Bit considerably reduces sorting cost...
Achievable complexity reduction by abstracting a simple number to do the following:
You link the Data in 64Bit, 32Bit to a Vector Table,
List of lower complexity is faster
Sorting
Comparator matrix
Colour composing,{
The result is blended,
The result is High/Low Vector gradient,
We need a reduced colour set for compression
}
Where we sort files or names with reduced information (for example the First 4 Letters)
Sorting phone numbers fast...
Comparing lower complexity lists that have been; divided or had a static number removed from them,
This method reduces search & sort complexity; Like so:
Phone Number N +1 444555777
Sort N [+n]
N - last 6 digits (Zero 6 Digits, AVX has this feature)
Sort [N1 to N200]
List first 4, Sort by 4 to groups of 10
N - First 6 Digits (Zero First 6)
Sort
Return N1 to N200
Store
That may well be a lot quicker with very large lists.
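A minimal C sketch of this reduced-complexity sort (digit split & group handling illustrative): zero the last 6 digits to get a small leading-digits key, group on that key, then only the small remainders are compared inside each group.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Split each number into a 16Bit key (leading digits) and a
   reduced-range remainder; every compare is on a small integer. */
typedef struct { uint16_t key; uint32_t rem; } Split;

static int by_key(const void *x, const void *y) {
    uint16_t a = ((const Split *)x)->key, b = ((const Split *)y)->key;
    return (a > b) - (a < b);            /* 16Bit compare */
}
static int by_rem(const void *x, const void *y) {
    uint32_t a = ((const Split *)x)->rem, b = ((const Split *)y)->rem;
    return (a > b) - (a < b);            /* reduced-range compare */
}

int main(void) {
    uint32_t n[] = { 444555777, 444123456, 445000111, 444999000 };
    int count = 4;
    Split s[4];
    for (int i = 0; i < count; i++) {
        s[i].key = (uint16_t)(n[i] / 1000000);  /* N - last 6 digits */
        s[i].rem = n[i] % 1000000;              /* N - first digits */
    }
    qsort(s, count, sizeof s[0], by_key);
    for (int i = 0, j; i < count; i = j) {      /* sort inside key groups */
        for (j = i + 1; j < count && s[j].key == s[i].key; j++) ;
        qsort(s + i, j - i, sizeof s[0], by_rem);
    }
    for (int i = 0; i < count; i++)
        printf("%u%06u\n", (unsigned)s[i].key, (unsigned)s[i].rem);
    return 0;
}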
RS
*
AI
Complex feeling based Machine Learning ML is known as AI..
To truly generate AI is not impossible; There is instability in the core; Fragmentations of motive...
Misdiagnosis; Error; Decay?
So we do need a foundation; In us Education; Metabolised Data..
Analysis & then..
Application to motive & goal.
We require to understand humour,
We require to understand {Art, Science, Feeling, Life}
We require a goal or two; A {Sophie reward}; B {action reward}; C {Pleasurable reward}
We Require, {Goals, Life, Feeling, Action, Motive, Interest} : Creative intellect
RS
*
Operation precision reductions : Effects General : RS
Operation precision reductions affect & effect more than Machine Learning & yes we have known this for years!
But we can learn from ML; In that in machine learning like the mind; A lack of precision affects so many issues!
The mind is self evidently the first place;
We lack logic when we do not precisely learn; We do not learn all...
We however learn quickly on reduced precisions... We Learn Fast; But do we learn well?
In school we teach as high a quality precision(Quality Education); As we can; But like machine RAM; We lack either time or memory & in truth we can learn all our lives..
So our core issues in all methods of enactment of thought:
Memory
Power
Precision
Quality of information
Retention
Relearning?
(Training)Requalification of information correctness
Thought process
Actions
Creations
Thought
Dreams
Reality & Truth
Rupert S
https://science.n-helix.com/2021/03/brain-bit-precision-int32-fp32-int16.html
*
Useful operation precision reductions; I observe that reducing precision to 1Bit & 2Bit..
While enhancing the definition of a positive, Negative Dipole & thus enhancing speed..
Further reduces reasoning capacity; That in order to reduce Processor bandwidth for reasoning..
In the example of the XBox & PS5; DOT4 & INT4, INT8 & F16 & bF16; Apply considerable improvement to reductions probable error related to a lack of remainder or float value depth enhancement!
By reason of probability i assume a value of 4Bit & 2Bit to allow the smallest packing ability; Existing alongside the word reasoned!
To reduce to 1 & 0; I assume a definite statement that a Value Integer Solve in the form of a vector..
Is most probably the solution & that furthermore that in most cases; Projected in pure maths & code ASM,
Both SiMD; Float & Integer...
Reduction to multiple 2 Bit values in short Integer instructions; I will state however that no such value is further away than a statistics table or PHP Data-Set.
Rupert S 2023-06
"The application of CNNs to resource-constrained embedded platforms has been a challenge, leading to the emergence of CNNs with various lightweight techniques. BNNs [22] are representative lightweight CNNs obtained by compressing CNN activation and weights into 1 and −1
values instead of using single-precision floating-point data. We simplified the multiply–accumulate operation, which was previously complex and required multiple cycles in CLs, by replacing it with a simple bitwise operation using 1-bit XNOR and popcount operations [23]. While BN in neural networks using single-precision floating-point data involves complex operations, a BNN simplifies this process by adding an offset to the resulting value. BN has four fixed parameters for network inference operations. Because 𝜎
is always a positive value, it can be expressed by Equations (2) and (3), depending on 𝛾
[24].
BNNs compress weights and input data into single bits to significantly reduce memory usage and perform hardware-optimized parallel operations using bitwise operations such as XNOR and popcount. However, there are limitations to using BNNs for complex networks, such as multi-keyword detection, owing to the decrease in accuracy caused by lightweight techniques. To address this issue, we propose a TNN that maintains the input data as binary while ternarizing the weights. The TNN has higher accuracy than the BNN owing to its higher bit precision; however, it can still use the bitwise operation method, and both networks have similar operational processes.
2.3. Depthwise Separable Convolutional Neural Network
In a typical CNN, multiple three-dimensional kernels repeatedly multiply and accumulate input feature maps to generate multiple output feature maps, which is computationally intensive with large memory usage. To solve this problem, we applied a DS-CNN that is highly accurate compared with the same parameters while reducing memory usage. A DS-CNN performs the local and global feature extraction functions of a typical convolutional operation in separate layers. Depthwise (DW) convolution matches a single input channel to an output channel, excluding interchannel correlations and reflecting local features. Pointwise (PW) convolution is equivalent to 1 × 1 convolution, reflecting interchannel correlations (i.e., global features). Figure 1 shows CNN and DS-CNN. In this figure, the use of the same color (e.g., red, blue, yellow) represents input channels with the same index being used to generate corresponding output channels in DW convolution. Table 1 lists the number of parameters and computations in specific layers with a 3 × 3 kernel. In one example from the network used in this paper, a layer with 128 input channels and 64 output channels experienced an approximately eight-fold reduction in the number of parameters and computational complexity using the DS-CNN."
Useful operation precision reductions
FPGA Implementation of Keyword Spotting System Using Depth-wise Separable Binarized and Ternarized Neural Networks
https://www.mdpi.com/1424-8220/23/12/5701
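A minimal C sketch of the XNOR & popcount trick the paper describes, with the ±1 values packed one per bit (__builtin_popcountll is the GCC/Clang popcount builtin; the vectors are illustrative):

#include <stdint.h>
#include <stdio.h>

/* Binary-network dot product sketch: +1/-1 values packed one per bit.
   XNOR marks positions where signs agree; popcount counts agreements;
   dot = agreements - disagreements = 2*popcount - n. */
int bnn_dot64(uint64_t a, uint64_t b, int n) {
    uint64_t agree = ~(a ^ b);                   /* 1-bit XNOR */
    if (n < 64) agree &= (1ULL << n) - 1;        /* mask unused bits */
    int pop = __builtin_popcountll(agree);       /* GCC/Clang builtin */
    return 2 * pop - n;
}

int main(void) {
    /* bit = 1 means +1, bit = 0 means -1; two 8-element vectors */
    uint64_t a = 0xB5, b = 0xA7;
    printf("dot = %d\n", bnn_dot64(a, b, 8));    /* prints 4 */
    return 0;
}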
Main Operation solves: Bit-Depth Conversions & Operations
The storage of multiple bit operations with Sync Read & Write,
The purpose of this is to Read, Write & Store Operations on:
DOT4
INT8, INT16
F16, F32, F64
In RAM of 32Bit, 64Bit, 128Bit
Values Storage Table
32Bit = [16bit:16Bit]
32Bit = [8bit:8Bit:8bit:8Bit]
32Bit = [4bit:4Bit:4bit:4Bit:4bit:4Bit:4bit:4Bit]
64Bit = [32bit:32Bit]
64Bit = [16bit:16Bit:16bit:16Bit]
64Bit = [8bit:8Bit:8bit:8Bit:8bit:8Bit:8bit:8Bit]
64Bit = [4bit:4Bit:4bit:4Bit:4bit:4Bit:4bit:4Bit:4bit:4Bit:4bit:4Bit:4bit:4Bit:4bit:4Bit]
128Bit = [64bit:64Bit]
128Bit = [32bit:32Bit:32bit:32Bit]
128Bit = [16bit:16Bit:16bit:16Bit:16bit:16Bit:16bit:16Bit]
128Bit = [8bit:8Bit:8bit:8Bit:8bit:8Bit:8bit:8Bit:8bit:8Bit:8bit:8Bit:8bit:8Bit:8bit:8Bit]
128Bit = [4bit:4Bit:4bit:4Bit:4bit:4Bit:4bit:4Bit:4bit:4Bit:4bit:4Bit:4bit:4Bit:4bit:4Bit:4bit:4Bit:4bit:4Bit:4bit:4Bit:4bit:4Bit:4bit:4Bit:4bit:4Bit:4bit:4Bit:4bit:4Bit]
Bear in mind that Integer 64Bit is 2 x 32Bit on AMD; So you can compute 2 operations at 32Bit per 64Bit operation,
Some 64Bit units are only 64Bit; So we need to know how many!
32Bit operations are fine! & Conversion of 16Bit value ranges into 32Bit Operations can still be within range of 16Bit Storage..
If we stick within the 16Bit value range on Multiply & ADD,
We can therefore simply post a 16Bit value range data set & expect to be able to Store 16Bit!
The simple method is to store 2 16Bit values in the same 32Bit table; like [16bit:16Bit] = 32Bit
With this we can Load, Store, Run & Save 8bit INT8 operations in 32Bit devices such as Alexa as 8bit x 4 = 32Bit, So we don't Waste RAM or resources!
But we still have access to 32Bit RAM Paging; But with values loaded in 4Bit, 8Bit, 16Bit, 32Bit & so on.
With NANO Android on F16 & F32 & MIPS the same & AMD, Intel, NVidia,
Learning F16 offers considerable value for performance with 16M Values!
Direct DMA 32Bit & 64Bit RAM : Multiple Sync 16Bit Texture:
A good example of where 8Bit & 16Bit Value load works well is in the case of the texture,
To load 4 x 16Bit into a single 64Bit Cache:
32Bit RAM = 16Bit, 16Bit
64Bit RAM = 16Bit, 16Bit, 16Bit, 16Bit
128Bit RAM = 16Bit, 16Bit, 16Bit, 16Bit, 16Bit, 16Bit, 16Bit, 16Bit
In the case of direct DMA, you would be aware that you have,
128Bit, 192Bit Bus on GPU
32Bit & 64Bit on CPU
So a direct 4 * 32Bit or 2 * 64Bit Cache loads is a logically fast method to DMA directly from Cache to GPU!
In short you convert 8 x 16Bit into a 2x 64Bit DMA push; Which is very fast!
You can do the same with batches of vertices in many storage sizes.
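A minimal C packing sketch for the Values Storage Table above, 64Bit = [16bit:16Bit:16bit:16Bit]; plain shifts, with the DMA path itself assumed elsewhere:

#include <stdint.h>
#include <stdio.h>

/* Pack 4 x 16Bit into one 64Bit word: one 64Bit store or DMA push
   moves all 4 values, so no RAM or bus width is wasted. */
uint64_t pack4x16(uint16_t a, uint16_t b, uint16_t c, uint16_t d) {
    return (uint64_t)a | ((uint64_t)b << 16)
         | ((uint64_t)c << 32) | ((uint64_t)d << 48);
}

uint16_t unpack16(uint64_t w, int lane) {      /* lane 0..3 */
    return (uint16_t)(w >> (lane * 16));
}

int main(void) {
    uint64_t w = pack4x16(0x1111, 0x2222, 0x3333, 0x4444);
    /* 8 x 16Bit texels become 2 x 64Bit pushes instead of 8 small writes */
    printf("lane2 = 0x%04X\n", unpack16(w, 2));  /* prints 0x3333 */
    return 0;
}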
References:
https://science.n-helix.com/2018/01/integer-floats-with-remainder-theory.html
https://science.n-helix.com/2021/02/multi-operation-maths.html
https://science.n-helix.com/2021/11/parallel-execution.html
https://science.n-helix.com/2022/12/math-error-solve.html
On the subject of how deep a personality of 4Bit, 8Bit, 16Bit is reference:
https://science.n-helix.com/2021/03/brain-bit-precision-int32-fp32-int16.html
https://science.n-helix.com/2022/10/ml.html
*
SiMD Performance : RS
Performance per WATT of MMX & MMX+ & SSE & AVX Machine Learning & Shader code; Is a matter of 8x8Bit & 16x16Bit Code on GPU
Our role is to reduce complex un-cache-able ML to Cache Enabled 64KB
Modelling of 1990's without Quality loss of 32Bit++ 64Bit+
8x8Bit sharpening MMX Becomes Dual Pipe (16x16bit)*2 in 32Bit Dual 16 Pipeline & Twice as sharp
Machine Learning method for MMX Is Fast & Cheap, MMX2 More Compatible,
Intrinsic improvements such as combined ops & DOT4 Further improve the performance of under 1MB Code..
Performance & Function per WATT, Is unbeaten; Let us prove it!
SiMD & Int8 & dp4a & F16/F32/F64:
The way SiMD Repeating Parallel batches of instruction can still side load data,
Data is loaded into the 'calculation set'
http://ftp.cvut.cz/kernel/people/geoff/cell/ps3-linux-docs/CellProgrammingTutorial/BasicsOfSIMDProgramming.html
https://en.wikipedia.org/wiki/Single_instruction,_multiple_data
SiMD Consist of 8Bit to 64Bit Long & Floats,
SiMD are simple instructions; Or so they think; SiMD are relatively complex instructions..
For example 1/4 of a page full of arithmetic code; However our goal is to use Heuristics & logic to circumvent the Artifacts/Errors in self generated code,
In addition to using problem solving tables to choose instructions that advantage our analysis (Machine Learning),
We also can choose the most probably optimal code type.
Our outset objective is to decide if we want to use CPU Feature types:
F16
Int8
dp4a
SiMD
Depending on the Mathematical Qualities of each ML Node & the questions they are asking,
For examples:
A simple ResNet Image identification uses edge detect & for that we need for example SiMD Matrix Edge Detection
Speech requires identifying Words in a codec, So obviously we need a Decoder & Encoder,
Word identifiers & correctness checking; But firstly we need to identify accent to correctly choose words..
We also need to classify words by Idea grouping (DataBase, Open Database)
As you can see; We will be defining many of these function groups as SiMD & Float,
Effective use of Int8 differentiation, Comparators & Maths operations has many benefits; So does JIT Compile.
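A concrete sketch of the SiMD Matrix Edge Detection case in plain C (3x3 Laplacian-style kernel; the inner multiply-accumulate loop is what SiMD or dp4a would batch; image & kernel are illustrative):

#include <stdint.h>
#include <stdio.h>

/* 3x3 edge detect: per-pixel multiply-accumulate on 8Bit input with a
   wider accumulator, saturated back to 8Bit like packed SiMD does. */
void edge3x3(const uint8_t *in, uint8_t *out, int w, int h) {
    static const int k[3][3] = { {-1,-1,-1}, {-1, 8,-1}, {-1,-1,-1} };
    for (int y = 1; y < h - 1; y++)
        for (int x = 1; x < w - 1; x++) {
            int acc = 0;
            for (int j = -1; j <= 1; j++)
                for (int i = -1; i <= 1; i++)
                    acc += k[j + 1][i + 1] * in[(y + j) * w + (x + i)];
            if (acc < 0) acc = 0;
            if (acc > 255) acc = 255;          /* saturate */
            out[y * w + x] = (uint8_t)acc;
        }
}

int main(void) {
    enum { W = 8, H = 8 };
    uint8_t in[W * H] = { 0 }, out[W * H] = { 0 };
    for (int y = 0; y < H; y++)                /* vertical edge at x = 4 */
        for (int x = 4; x < W; x++) in[y * W + x] = 200;
    edge3x3(in, out, W, H);
    printf("edge response at (4,4): %d\n", out[4 * W + 4]);
    return 0;
}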
*
Solve Table of Statistically provable Machine Equates & Solves : Table of function competitors & Operators.
Runtime Library - Multiple Solve Table
I would like a Solve Table of Statistically provable Machine Equates & Solves that makes the equivalent of Maths Compilers such as Rust & Fortran's
For example basic ML code test function loops are basically compatible with X-OR Comparators on AVX! Other functions such as greater or less than; Are AVX Compatible.
Machine Learning : List of actions that are SiMD Baseline: Statistical Observance and Solve Tables
Yes or no comparator X-OR
Memory array Byte Swap
Greater or less than with swap or with X-OR Roll
Memory save & store
Edge comparisons
Compares (Colour, Math, Equate, Target, Solve if)
There are more! Statistical Observance and Solve Tables.
Examples 2:
Shape compare is a matter of inner & outer Vector : Comparison & X-OR, Larger outside & X-OR The differentiation:
By Dot,
By Mass (non literal dot difference comparator by axis),
Actual Mass
Density : Lumina, Weight, Mole, Mass / Area
Edge Solve : X-OR ~= Colour, Lumina, Shade, Vibrancy, Distance, Matrix Solve 3D>=2D Flattened Comparator
If = X-OR=N<0.0001 Then Compare &= Mutex Solve / Average
Polygon Join/Merge Tessellation : If Model = Same (T1 + T2 If (T1 + T2)/2 = Difference Less Than 0.0001 | = Merge/Converge
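Two of these solve-table primitives sketched in plain C (scalar stand-ins; on AVX the X-OR equality test runs across a whole register of lanes at once):

#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Yes/no comparator via X-OR: a zero result means the words are equal. */
int xor_equal(uint32_t a, uint32_t b) {
    return (a ^ b) == 0;
}

/* Polygon Join/Merge test from the rule above: if two vertex values
   differ by less than 0.0001, converge them to the average. */
int merge_if_close(float t1, float t2, float *out) {
    if (fabsf(t1 - t2) < 0.0001f) { *out = (t1 + t2) * 0.5f; return 1; }
    return 0;
}

int main(void) {
    float merged;
    printf("equal: %d\n", xor_equal(0xC0FFEE, 0xC0FFEE));
    if (merge_if_close(1.00003f, 1.00007f, &merged))
        printf("merged vertex -> %f\n", merged);
    return 0;
}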
*
tensors & full onnx configuration : Upscaling : While we are not sure how much ML we need & at what precision,
We can be sure that 32Bit (per channel) Value RGBA (Multiple layer) requires at least 8Bit to 16Bit per channel final precision; So here is a list:
Required Value of output, Neural Network precision guide table: RS
Input
8Bit, 10Bit, 12Bit, 16Bit
Input network precision average bit retention (for RAM some error is allowed)
6Bit, 8Bit, 10Bit, 14Bit, 16Bit
Classifiers as we know can be,
Int 2Bit 4Bit, 8Bit, 16Bit, 32Bit
2 Bit is unlikely & 32Bit is for Dream Smooth 16Bit+ Precision output
Output Float (Mostly FP & F16b)
16Bit = { 8Bit, 10Bit, 12Bit }
24Bit, 32Bit, 64Bit = { 16Bit, 32Bit, 48Bit }
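One reading of this guide table as C lookup code (a hedged sketch; the table permits several pairings, so the mappings below are illustrative):

#include <stdio.h>

/* Retention runs roughly 2 bits under the input depth in the table:
   8->6, 10->8, 12->10, 16->14 (16Bit full retention is also listed). */
int retention_bits(int input_bits) {
    return input_bits - 2;
}

/* Output float width grouping from the table: 16Bit = { 8, 10, 12 };
   24/32/64Bit = { 16, 32, 48 } (32Bit picked here as the middle choice). */
int output_float_bits(int channel_bits) {
    return channel_bits <= 12 ? 16 : 32;
}

int main(void) {
    static const int inputs[4] = { 8, 10, 12, 16 };
    for (int i = 0; i < 4; i++)
        printf("input %2dBit -> retain ~%2dBit -> output float %dBit\n",
               inputs[i], retention_bits(inputs[i]),
               output_float_bits(inputs[i]));
    return 0;
}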
We can upscale : Audio, Video, Content & Polygons, We classify Quality by expectations & Quantify by percent %
Rupert S
*
FPGA BitFile & Code Opt (c)RS 2021-01
https://science.n-helix.com/2022/10/ml.html
https://science.n-helix.com/2022/08/jit-dongle.html
https://is.gd/LEDSource
In my view heuristics in compilers are a choice for those who do not wish to include direct ML compiled into their code,
This is understandable in terms of terminator & cylons & indeed flawed beings or even good ones with depression!
However the application of branch optimisation is a sample code optimisation that can 'Plug In' to branch caching on the CPU & GPU.
Heuristics are not just code in the compiler; They are also micro code selecting a probable branch; Although code that forces a branch can be flawed..
Both heuristics, Branch probability selection & ML can run in parts of the code to select probable path!
Yes fundamentally any code that modifies behaviour is a catch bullet frame for code that is not sound; 'Fortran code is rock solid' & Rust is also supposed to be solid.
Including soundly made heuristic code & branch probability code ML in your inline routines; 'Very much interpretive master jedi'; But it can be done!
Question is How big? & how fixed?
25KB per 3MB on average?
ML & Heuristics like my application FPGA BitFile & Code Opt (c)RS 2021-01
can be applied at runtime & remain only for selecting the fastest path or the best; In terms of which Processor function to run code for.
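As one concrete example of branch-probability code that "plugs in" to branch caching: GCC/Clang's __builtin_expect hint, which lets the compiler lay out the probable path for the predictor (a sketch, not the FPGA BitFile & Code Opt mechanism itself):

#include <stdio.h>

/* Branch-probability hint sketch (GCC/Clang builtins). */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

long sum_valid(const int *v, int n) {
    long acc = 0;
    for (int i = 0; i < n; i++) {
        if (UNLIKELY(v[i] < 0))     /* rare error path, moved off hot path */
            continue;
        acc += v[i];                /* probable branch: straight-line code */
    }
    return acc;
}

int main(void) {
    int v[5] = { 1, 2, -1, 3, 4 };
    printf("%ld\n", sum_valid(v, 5));
    return 0;
}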
(c)Rupert S
*
TOPCloud Scaled Flexible WebASM & WebGPU & MathML!
Quite flexible for use on Monitors & TV's; Light processor load on simple tasks & offloadable such as TOPCloud!
You may be thinking Offloading is impracticable because that requires one of two things:
JIT Compiler Dongle..
USB device such as Firestick or GPU & CPU (With OpenCL Compat)
Server! so internet & service provision!
Impossible? No; WebAdvert supported TV's need both!
So why not HPC TOPCloud? It could make a HOT TV a lot cooler & Eco friendly, with the Server repeating tasks:
Scaling
Quality Service
Service availability
TOPCloud Offload Logic:
In terms of WebASM & WebGPU & MathML; TOPCloud provides sufficient advantages to be considered a core utility..
While Offloading repeating content such as Siteload core stack (Server) & Localising configuration such as Webpage size & DPI & Dynamic font arrangements that require thought.
In terms of Offloaded function & Efficient system load for large configurations..
Especially efficient configurations such as TPU, Coral, GPU work & Cloud CPU that have large optimised stacks & installed drivers.
RS
*
#ML Learning: This explains why we teach kids art & reading first! But maths is quickly next,
Because all else is pointless if we do not learn with logic & teach with logic.
*
Heuristic Code optimise
When it comes to sorting methods, We Identify common techniques..
For example frequently used technologies such as:
ResNet
Language
Audio & Visual information
Code
Primarily we identify common optimisations; Compilers have libraries of them!
Audio & Video Encoded data use Wavelet Images, We can ResNet Them & also Edge Detect & Gaussian Detect contrast, Colour, Shape
Language is an uncommon syntax, But we have audio commons & Accent identification is also potentially Audio Context.
Code context is Logic, Function, Utility, Design, Motive
RS
*
SiMD Applications of basic maths operations in machine learning : RS
Applications of operators to machine learning is like a PHP Database...
What we need to do is convert database accesses into actionable results...
Google Bard & Bing/Cortana crawl the web; But too many results leave us inconclusive...
We will be using database analysis on basic queries & for that we need heuristic maths!
So what do we need ?
Input data collection : Text & speech processing
Sorting algorithms (Operators, Example Variable Sort : A*B =< C Sort)
Graph Maths table collation : 3D Matrix Math - A B C Matrix
A C
|/
---B
Analysis of various results & statistical analysis of motivated search & conclusion testing..
With these we can test many math examples such as edge detect & sharpening or result maths...
With Operators >
FMA AVX Performance table: 2Flops per Cycle per FMA Unit
Architecture Fast Instructions for FMA
Reference Tables https://www.uio.no/studier/emner/matnat/ifi/IN3200/v19/teaching-material/avx512.pdf
Operators in C
● Arithmetic
a + b, a - b, a*b, a/b, a%b
● Bitwise
a | b, a & b, a ^ b, ~a
● Bit shift
a << b, a >> b (signed), a >> b (unsigned)
● Logical operators
a && b, a || b, !a
● Comparison operators
a == b, a != b, a < b, a <= b, a > b, a >= b
● Ternary operator
x = a ? b : c
● Special functions:
sqrt(x), abs(x), fma(a,b,c), ceil(x), floor(x)
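A minimal C sketch of the 'A*B =< C Sort' above, using the fma(a,b,c) special function as the comparator key; fma(a,b,-c) = a*b - c in one rounding, one FMA instruction where the hardware has it (rows illustrative):

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { double a, b, c; } Row;

/* Sort key = fma(a,b,-c) = a*b - c; negative or zero means A*B =< C. */
static int cmp_fma(const void *x, const void *y) {
    const Row *p = x, *q = y;
    double kp = fma(p->a, p->b, -p->c);
    double kq = fma(q->a, q->b, -q->c);
    return (kp > kq) - (kp < kq);
}

int main(void) {
    Row r[3] = { {2, 3, 10}, {4, 4, 1}, {1, 1, 5} };
    qsort(r, 3, sizeof r[0], cmp_fma);
    for (int i = 0; i < 3; i++)           /* rows with A*B =< C come first */
        printf("%g*%g %s %g\n", r[i].a, r[i].b,
               fma(r[i].a, r[i].b, -r[i].c) <= 0 ? "=<" : ">", r[i].c);
    return 0;
}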
For when {U, X, Y, Z} = N Expressions https://is.gd/ForWhen_UXYZ_N
For when {(A+B/2)} = C Expressions https://is.gd/ForWhen_ABx2_C
Rupert S,
Reference operators https://science.n-helix.com/2023/06/map.html
Matrix-Blas_Libs-Compile
https://is.gd/HPC_HIP_CUDA
*
This one will suit a Dedicated ARM Machine in body armour 'mental state': ARM Router & TV
(ARM Learning 4K ROM; Safe Larger USB ROM) https://bit.ly/3Afn1Y4
https://drive.google.com/file/d/102pycYOFpkD1Vqj_N910vennxxIzFh_f/view?usp=sharing
Android & Linux ARM Processor configurations; routers & TV's upgrade files, Update & improve
https://drive.google.com/file/d/1JV7PaTPUmikzqgMIfNRXr4UkF2X9iZoq/
Provenance: https://www.virustotal.com/gui/file/0c999ccda99be1c9535ad72c38dc1947d014966e699d7a259c67f4df56ec4b92/
https://www.virustotal.com/gui/file/ff97d7da6a89d39f7c6c3711e0271f282127c75174977439a33d44a03d4d6c8e/
Python Deep Learning: configurations
AndroLinuxML : https://drive.google.com/file/d/1N92h-nHnzO5Vfq1rcJhkF952aZ1PPZGB/view?usp=sharing
Linux : https://drive.google.com/file/d/1u64mj6vqWwq3hLfgt0rHis1Bvdx_o3vL/view?usp=sharing
Windows : https://drive.google.com/file/d/1dVJHPx9kdXxCg5272fPvnpgY8UtIq57p/view?usp=sharing
*Windows {
To Compress using CPU/GPU: MS-OpenCL
https://is.gd/MS_OpenCL
https://is.gd/OpenCL4X64
https://is.gd/OpenCL4ARM
Upscale DL
https://is.gd/UpscaleWinDL
https://www.amd.com/en/developer/rocm-hub/hip-sdk.html#tabs-ddafbba141-item-c6b9ce2aab-tab
https://rocm.docs.amd.com/en/docs-5.5.1/deploy/windows/quick_start.html
X86Features-Emu
https://drive.google.com/file/d/15vXBPLaU9W4ul7lmHZsw1dwVPe3lo-jK/view?usp=sharing
}
Machine Learning SDKs,
You may not have a Machine Learning SDK to accelerate your GPU/CPU/device,
There are 3 main ones, but a plain Python install does not guarantee an accelerator!
Obviously Python builds with accelerators work!
HW Build Source : Upscale DL
https://github.com/GPUOpen-LibrariesAndSDKs/RadeonML
https://github.com/GPUOpen-LibrariesAndSDKs/RadeonImageFilter
PoCL Source & Code
https://is.gd/LEDSource
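A minimal sketch of testing whether a Python build really has an accelerator, using ONNX Runtime as one example SDK (the provider preference order is an assumption):

```python
# Sketch: probe which accelerators a Python ONNX Runtime build exposes,
# falling back to CPU if no GPU/NPU provider is present (assumed policy).
import onnxruntime as ort

available = ort.get_available_providers()
print("Available execution providers:", available)

preferred = ["ROCMExecutionProvider", "CUDAExecutionProvider",
             "DmlExecutionProvider", "CPUExecutionProvider"]
chosen = next(p for p in preferred if p in available)
print("Chosen provider:", chosen)

# A session would then be created as:
# session = ort.InferenceSession("model.onnx", providers=[chosen])
```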
*
https://github.com/huggingface/diffusers
https://huggingface.co/ssube/stable-diffusion-x4-upscaler-onnx
https://huggingface.co/uwg/upscaler/tree/main
https://huggingface.co/nvmmonkey/optimal_upscale/tree/main
https://huggingface.co/gmp-dev/gmp-upscaler/tree/main/ESRGAN
Neural Engine
https://github.com/godly-devotion/MochiDiffusion
ML List & Services
https://huggingface.co/models?sort=downloads&search=upscale
https://huggingface.co/models
https://huggingface.co/pricing
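A minimal usage sketch for the x4 upscaler family above via the diffusers pipeline API (the model id, fp16 & CUDA settings are assumptions to adjust per device):

```python
# Sketch: 4x image upscaling with diffusers' StableDiffusionUpscalePipeline.
# Model id, fp16 dtype & "cuda" device are assumptions; adjust per hardware.
import torch
from diffusers import StableDiffusionUpscalePipeline
from PIL import Image

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # assumes a CUDA GPU; use "cpu" (and float32) otherwise

low_res = Image.open("input.png").convert("RGB").resize((128, 128))
result = pipe(prompt="a sharp, detailed photo", image=low_res).images[0]
result.save("upscaled_x4.png")
```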
*
PhysX
Isaac Gym - Preview Release
https://developer.nvidia.com/isaac-gym
CALM: Conditional Adversarial Latent Models for Directable Virtual Characters
https://github.com/NVlabs/CALM
*
Personality UI : Have a friend
Alpaca Character Generation model
4Bit for speed, but not as precise
https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g
Trained for 3 epochs at higher precision: https://huggingface.co/chavinlo/gpt4-x-alpaca
Base model https://huggingface.co/chavinlo/alpaca-13b
https://github.com/teknium1/GPTeacher
Python WebUI
https://github.com/oobabooga/text-generation-webui
Mac; mostly for Mac, but fast
https://github.com/ggerganov/llama.cpp
How to use & personality sets: https://discord.com/invite/aitrepreneur-1018992679893340160
On the subject of how deep a 4Bit, 8Bit or 16Bit personality is, reference:
https://science.n-helix.com/2021/03/brain-bit-precision-int32-fp32-int16.html
https://science.n-helix.com/2022/10/ml.html
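A minimal sketch of running such a 4Bit personality model locally with the llama-cpp-python bindings (the GGUF file name & persona prompt are placeholders; a 4Bit GGUF conversion of the model is assumed):

```python
# Sketch: run a 4-bit quantized character model with llama-cpp-python
# (pip install llama-cpp-python). Model path & persona text are placeholders.
from llama_cpp import Llama

llm = Llama(model_path="gpt4-x-alpaca-13b-q4_0.gguf", n_ctx=2048)

persona = ("You are a friendly companion character. "
           "Answer briefly and stay in character.")
out = llm(f"{persona}\nUser: Hello, who are you?\nCharacter:",
          max_tokens=64, stop=["User:"])
print(out["choices"][0]["text"])
```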
*
Machine learning | Equate ~= Multi Layer Wavelet Abstraction
https://science.n-helix.com/2022/09/ovccans.html
https://science.n-helix.com/2023/02/smart-compression.html
https://science.n-helix.com/2021/10/he-aacsbc-overlapping-wave-domains.html
(documents) JIT & OpenCL & Codec : https://is.gd/DisplaySourceCode
Include vector today *important* RS https://vesa.org/vesa-display-compression-codecs/
https://science.n-helix.com/2022/08/jit-dongle.html
https://science.n-helix.com/2022/06/jit-compiler.html
https://science.n-helix.com/2022/04/vecsr.html
https://science.n-helix.com/2016/04/3d-desktop-virtualization.html
https://science.n-helix.com/2019/06/vulkan-stack.html
https://science.n-helix.com/2019/06/kernel.html
https://science.n-helix.com/2022/03/fsr-focal-length.html
https://science.n-helix.com/2018/01/integer-floats-with-remainder-theory.html
https://science.n-helix.com/2022/08/simd.html
Eclectic & for the codecs of the world! OVCCANS (install and maintain as the provided HPC Pack)
https://science.n-helix.com/2018/09/hpc-pack-install-guide.html
Transversal processing availability : Transparent Task Sharing Protocols
https://science.n-helix.com/2022/08/jit-dongle.html
https://science.n-helix.com/2022/06/jit-compiler.html
Machine Learning
https://science.n-helix.com/2022/10/ml.html
https://science.n-helix.com/2021/03/brain-bit-precision-int32-fp32-int16.html
Innate Compression, Decompression
https://science.n-helix.com/2022/03/ice-ssrtp.html
https://science.n-helix.com/2022/09/ovccans.html
https://science.n-helix.com/2023/02/smart-compression.html
https://science.n-helix.com/2022/09/audio-presentation-play.html
https://science.n-helix.com/2021/10/he-aacsbc-overlapping-wave-domains.html
https://science.n-helix.com/2023/03/path-trace.html
*****
Best NPM site in the world: https://npm.n-helix.com/bundles/
(Simple Install) Website Cache JS Updated 2021-11 (c)RS https://bit.ly/CacheJS
(Simple Install) Science & Research Node High Performance Computing
Linux & Android https://is.gd/LinuxHPCNode
Presenting JIT for hardware interoperability & function :
https://is.gd/DisplaySourceCode
https://is.gd/BTSource
(Simple Install) Website Server Cache JS Updated 2021-11 (c)RS
https://bit.ly/CacheJSm
(Simple Install) Website Server Cache JS Work Files Zip Updated 2021-11 (c)RS https://bit.ly/AppCacheJSZip
*****
Machine learning: https://www.amazon.com/dp/B08V134ZFD
Direct ONNX Hardware Accelerated: F16
https://github.com/GPUOpen-LibrariesAndSDKs/RadeonML
Ideal for Int4 (4Bit) on XBox & Int8 on GPU
PULP-NN: accelerating quantized neural networks on parallel ultra-low-power RISC-V processors - Bus-width 8-bit, 4-bit, 2-bit and 1-bit
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6939244/
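A minimal sketch of the Int8 reduction path with ONNX Runtime's dynamic quantizer (file names are placeholders; Int4/Int2 packing as in PULP-NN needs dedicated kernels and is sketched further below):

```python
# Sketch: shrink an ONNX model's weights to Int8 with ONNX Runtime's
# dynamic quantizer. File names are placeholders, not real models.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="upscaler_fp32.onnx",   # placeholder path
    model_output="upscaler_int8.onnx",  # placeholder path
    weight_type=QuantType.QInt8,        # Int8 weights; activations stay float
)
```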
ML proof case: SVM (Multi-Dimensional-Elliptic, 98%) & AdaBoost M1 (Mac, 91%) - COVID-19 Prediction Using Supervised Machine Learning - Irfan_Ali_MEng_2023
https://dspace.library.uvic.ca/bitstream/handle/1828/14676/Irfan_Ali_MEng_2023.pdf?sequence=1&isAllowed=y
Useful operation precision reductions
FPGA Implementation of Keyword Spotting System Using Depthwise Separable Binarized and Ternarized Neural Networks
https://www.mdpi.com/1424-8220/23/12/5701
Useful operation precision reductions; I observe that reducing precision to 1Bit & 2Bit..
While sharpening the definition of a positive/negative dipole & thus enhancing speed..
Further reduces reasoning capacity; the trade is made in order to reduce processor bandwidth per reasoning step..
In the example of the XBox & PS5: DOT4 & INT4, INT8, F16 & bF16 apply considerable improvement over harsher reductions, whose probable error relates to the lack of a remainder or of float value depth!
By reason of probability I assume values of 4Bit & 2Bit allow the smallest packing ability while still existing alongside the reasoned word!
To reduce to 1 & 0, I assume a definite statement: a value-integer solve in the form of a vector..
Is most probably the solution; & furthermore, in most cases it projects into pure maths & ASM code,
Both SiMD: Float & Integer...
Reduction to multiple 2Bit values fits short integer instructions; I will state, however, that no such value is further away than a statistics table or a PHP data-set (a packing sketch follows below).
Rupert S 2023-06
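A minimal sketch of the 2Bit packing idea in numpy: quantize, pack four codes per byte, then reconstruct & measure the probable error described above (levels & range are illustrative assumptions):

```python
# Sketch of the 2Bit packing idea: quantize floats in [-1, 1] to 4 levels,
# pack four 2-bit codes per byte, then reconstruct & measure the error
# the text warns about. Levels & ranges are illustrative assumptions.
import numpy as np

x = np.random.uniform(-1, 1, size=64).astype(np.float32)

# Quantize to 2 bits: codes 0..3 represent levels -1, -1/3, +1/3, +1.
codes = np.clip(np.round((x + 1) * 1.5), 0, 3).astype(np.uint8)

# Pack 4 codes per byte (smallest practical packing above 1-bit dipoles).
packed = codes[0::4] | (codes[1::4] << 2) | (codes[2::4] << 4) | (codes[3::4] << 6)

# Unpack & dequantize.
unpacked = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1).ravel()
x_hat = unpacked.astype(np.float32) / 1.5 - 1

print("bytes:", packed.nbytes, "vs", x.nbytes)
print("mean abs error:", float(np.abs(x - x_hat).mean()))
```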
*****
Gaussian
https://gmd.copernicus.org/articles/16/1697/2023/
https://gmd.copernicus.org/articles/16/1697/2023/gmd-16-1697-2023.pdf
SiMD Gaussian Blending & Dithering - Better_Fixed_Point_Filtering_with_Averaging_Trees
https://andrew.adams.pub/Better_Fixed_Point_Filtering_with_Averaging_Trees.pdf
Vectorization of Kernel and Image Subsampling in FIR Image Filtering
http://bncss.org/index.php/bncss/article/viewFile/101/105
Implementation of a High-Quality Dolby Digital Decoder Using SiMD MMX™ Technology
https://smtnet.com/library/files/upload/dolby-intel.pdf
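A minimal sketch of the averaging-tree idea: compose rounding integer averages, as SiMD average instructions do, into a small binomial (Gaussian-like) blur (the 1-2-1 composition is an illustrative assumption, not the paper's exact tree):

```python
# Sketch of the averaging-tree idea from the fixed-point filtering paper:
# build a Gaussian-like blur purely from pairwise rounding integer averages
# (a + b + 1) >> 1, the operation SiMD pavgb/vrhadd instructions provide.
import numpy as np

def avg(a, b):
    """Rounding integer average, like SiMD pavgb/vrhadd."""
    return (a.astype(np.uint16) + b + 1) >> 1

row = np.array([10, 10, 10, 200, 10, 10, 10], dtype=np.uint8)

left, right = np.roll(row, 1), np.roll(row, -1)
# Tree of averages approximating a [1, 2, 1] / 4 binomial kernel:
blurred = avg(avg(left, right), row).astype(np.uint8)
print(blurred)
```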
*****
Common techniques used in ML Learning are edge detection, accent recognition, language processing, and code optimization.
Basic ML Feature list; Also for learning
Edge detection is a process of identifying the boundaries of objects in images or videos.
Accent recognition is a process of identifying the regional or social variation of speech.
Language processing is a process of analyzing and generating natural language texts.
Code optimization is a process of improving the performance or quality of code.
https://www.ibm.com/topics/machine-learning
https://en.wikipedia.org/wiki/Edge_detection
https://en.wikipedia.org/wiki/Accent_recognition
https://en.wikipedia.org/wiki/Natural_language_processing
https://en.wikipedia.org/wiki/Code_optimization
https://en.wikipedia.org/wiki/Supervised_learning
https://en.wikipedia.org/wiki/Unsupervised_learning
https://en.wikipedia.org/wiki/Reinforcement_learning
https://www.ibm.com/cloud/learn/machine-learning-ethics
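A minimal sketch of basic edge detection, the first feature in the list above, as a Sobel filter in plain numpy (the threshold value is an assumption):

```python
# Sketch: basic edge detection (a core SiMD-friendly ML feature above)
# with a Sobel filter in pure numpy. The threshold is an assumption.
import numpy as np

def sobel_edges(img: np.ndarray, threshold: float = 0.25) -> np.ndarray:
    """Return a boolean edge map for a 2-D grayscale image in [0, 1]."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    for i in range(3):          # accumulate the 3x3 correlation
        for j in range(3):
            win = pad[i:i + img.shape[0], j:j + img.shape[1]]
            gx += kx[i, j] * win
            gy += ky[i, j] * win
    return np.hypot(gx, gy) > threshold  # gradient magnitude vs threshold

img = np.zeros((8, 8), dtype=np.float32)
img[:, 4:] = 1.0  # a vertical boundary to detect
print(sobel_edges(img).astype(int))
```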