Dual Blend Mode with Vectors
Dual Blend Mode comes in 3 parts to accelerate the display:
The CPU / GPU combined rendering is handled by Fence mode for synchronisation,
Vectors & textures are handled by the Multi Source Rendering pipeline & VecSR
https://science.n-helix.com/2022/04/vecsr.html
https://science.n-helix.com/2016/04/3d-desktop-virtualization.html
https://science.n-helix.com/2019/06/vulkan-stack.html
https://science.n-helix.com/2025/06/dualblend.html
https://science.n-helix.com/2022/06/jit-compiler.html
https://science.n-helix.com/2022/08/jit-dongle.html
https://science.n-helix.com/2022/09/audio-presentation-play.html
Hardware Acceleration
Dual Blending is viable in the case of offloading; examples: Audio, 3D Audio, Video, 3D & other dual blending modes,
The Fence & combined texture modes can be used in many fields that use PCM graphs, graphs in general & the common practice of blending sources together,
Primarily aimed at the concept of inset video & graphics; Audio is an FFT graph, you know!
Apply carefully: you never know when a CPU will help in combination with hardware accelerators..
USB, Motherboard Audio, GPU & so on.. A case exists for accelerating Bluetooth gear from the JIT Compiler & Dongle,..
*
The Fencing plan: (c)RS
The Fencing plan is to layer actions at the speed CPU & GPU modify content in single frames,
With Vulkan & DirectX 12 We worked so hard to make the API front the GPU Directly so that the CPU is not stalling the game,..
Innate Compression, Decompression
https://science.n-helix.com/2022/03/ice-ssrtp.html
https://science.n-helix.com/2022/09/ovccans.html
https://science.n-helix.com/2023/02/smart-compression.html
In most cases we therefore use the GPU directly; the origins of the direct low-latency GPU API lie right with RS & AMD,
However AMD had a very hard time getting their API into other GPU manufacturers' source trees..
Microsoft DirectX 12, DX11 WARP & Vulkan/OpenCL are the results..
But we still need a CPU; an APU is normally better, but we have ReBAR & RDMA for CPU-to-GPU data transfers,..
There are many small issues that face Vulkan, DirectX & the ANGLE API,..
What are these issues?:
Mouse & pointer devices deliver input with IO & DMA direct to the CPU
Fonts
Sprites
Polygon maps
Textures
come from the system & hence directly from the CPU in most cases,..
SDK & API CPU originated content:
Pointers
Memory routing
System control
QAT : DMA, IO & general system control & function.
We need a direct rendering path for the CPU, We have the CPU & We can use it!
Directly leveraging the CPU's functions that are unique:
FPU 80-Bit high precision floats (x87 extended precision)
AVX & SVE direct parallel computation at fairly high speed
Integer & float general registers
We recognise that without proper coding, most CPU direct display rendering does not have..:
AntiAliasing
Supersampling
Smoothing
Dithering
HDR & WCG Automated colour control
We handle these functions in the following ways:
We pass the pre-computed intermediary to the GPU
We create code that does all these in the MMX & AVX SiMD Registers
We compose the frame at a larger scale, which the GPU will then use for the final rendering..
Super Upscaling is our friend and there are many forms of upscaling to use,..
For most CPU-related issues of jagged edges, the solution is to draw the frame at 2x the resolution, or a multiple of the final size.
We can also use SiMD Dithering & SuperSampling to handle the traditional CPU Deficit of jagged edges,..
We can also colour in greyscale & primary polygons with the GPU,..
So why? Whatever the deficits of the CPU are,..
The direct high precision qualities inside the FPU & AVX/SiMD for the CPU are at least Double the final quality of most GPU Functions..
CPU FPU, SiMD & integer 32/64-bit functions can enrich the displayed content..
Presenting an educated SDK/API sampling of what the component processing features are takes skill & education.. We have both & we will keep them!
Composing the final view point from all composing parts requires a specific set of solutions:
Frame jitter (misaligned SiMD, CPU, GPU, Audio)
Finalised frame : Gating .. Fence Mode for GPU & CPU
Synchronised & fast data transfers : Enhanced IO, RDMA & ReBAR
Security : AES, ECC & enhanced media protocol DMA & TCP/UDP/QUIC Hyper Frame transport
These are my solutions, These are our solutions..
Rupert S
*
QFT & VRR Fence mode: (C)RS
VRR & QFT & frame rate deviations over time..
What fence mode does is allow us to buffer a work block so all tasks are finished before we write frame Shader blocks..
We use ETA, Delivery Time & Estimated work time, To allow ML & DL to directly optimise the packet system..
Fence Mode is for DirectX, OpenCL, OpenGLES, Vulkan & VESA Displays..
CPU Rendering into the GPU SiMD Shader pipeline requires:
GL_NV_fence VK_KHR_present_wait2
https://www.phoronix.com/news/Vulkan-1.4.317-Released
https://developer.download.nvidia.com/assets/gamedev/docs/VertexArrayRange.pdf
https://registry.khronos.org/OpenGL/extensions/NV/NV_fence.txt
What Fence does is use properties to define a load group for display. We need to know the CPU clock: 800MHz to 5GHz on the average phone,..
The phone processor may be between 400MHz & around 2.5GHz (quad-core Sony), while the GPU is between 250MHz & 1200MHz,.. So..
When the CPU writes the texture, polygon & colour maps, the cycle differentials usually mean calculating the difference with fractions,..
CPU at 2x the clock speed of the GPU gives 2:1 cycles per write, as an example; you can do it by polling the frame rate & writes per frame with maths,..
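The cycle-ratio fraction can be computed directly from the two clocks; a sketch (names illustrative), noting that 3500:2000 MHz reduces exactly to 7:4:

```cpp
#include <numeric>  // std::gcd (C++17)

// Reduce CPU:GPU clock rates (in MHz) to a small whole-number cycle
// ratio, used to decide how many CPU writes fit per GPU frame segment.
struct Ratio { long cpu, gpu; };

Ratio cycleRatio(long cpuMHz, long gpuMHz) {
    long g = std::gcd(cpuMHz, gpuMHz);
    return { cpuMHz / g, gpuMHz / g };
}
```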
Fence & present wait is where you set a frame delivery timeline, so we can deliver a single clear frame at a steady rate of Hz,..
Fence however does the conditional wait by groups of shaders, The relevance of this fact is that these days we use VRR & QFT & frame rate deviates over time.
The Fence solution is per screen block & We will use that to update per segment, VRR Fence mode.
Input threads: core count multiplexed by the average division between CPU & GPU clock-cycle effective work,..
For example my FX8320E does 2 threads SMT per core.. So with 8 cores & 2 threads per core, 16 total threads:
8 Cores, 2 Threads per core SMT : { a1, b1, c1, d1, a2, b2, c2, d2, a3, b3, c3, d3, a4, b4, c4, d4 };
CPU 3.5GHz, GPU 2GHz, So.. 3.5:2 reduces to 1.75 to 1,
CPU AVAILABLE_ASYNC_QUEUES_AMD: 2, MAX_COMPUTE_UNITS: ?
GPU AVAILABLE_ASYNC_QUEUES_AMD: 2, MAX_COMPUTE_UNITS: 16
So at 16 CU per task, both the CPU & GPU are fairly simple & the result is 1.75 to 1, or 7 to 4 when we make whole numbers of it..
Tasks Array at 7 to 4
A{1,2,3,4}
B{1,2,3,4}
C{1,2,3,4}
D{1,2,3,4}
This allows us to divide the screen into 16 Groups & Refresh them VRR/QFT at 2Ghz at a rate of 3.5 to 2 & 2000Mhz / 60Hz around 32x a frame,
At that approximate speed we could fully modify each zone 32x per frame,
In actual fact we would be using most of the clock cycles for Maths SiMD Tasks, Textures, Shading, 3D & 2D & DRAW..
We could still manage at least 5x per group : { a1, b1, c1, d1, a2, b2, c2, d2, a3, b3, c3, d3, a4, b4, c4, d4 };
We can Fence each zone & VRR / QFT as we want.
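The 16-zone split above can be sketched as a 4x4 grid over the frame; each Zone would carry its own fence & VRR/QFT refresh state (names are illustrative):

```cpp
#include <array>

// One fence-able screen zone: a rectangle that can be refreshed
// independently of the others under VRR / QFT.
struct Zone { int x, y, w, h; };

// Divide the frame into a 4x4 grid: 16 groups, matching the 16
// threads / 16 CUs in the example above.
std::array<Zone, 16> makeZones(int width, int height) {
    std::array<Zone, 16> z{};
    int zw = width / 4, zh = height / 4;
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            z[r * 4 + c] = { c * zw, r * zh, zw, zh };
    return z;
}
```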
Rupert S
*
A Multiple Source Rendering Pipeline
Dual source blending is going to make a lot of sense for games,
Where DirectX12 removes the CPU from the game render target,
Dual Source is not just composing 2 Shaders in a single pixel array, It is also composing with more than 1 device,..
CPU & GPU & Also Parallel Multi Render pipeline..
Using direct CPU blends for menus & small polygon renders (in high resolution SR), where the CPU non-alpha blend makes sense!
Well, it makes even more sense when you can use MMX/AVX SiMD blends & especially ADDER blends that can use the CPU integer instructions!
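A sketch of such an integer ADDER blend: fixed-point divide-by-255, no FPU, the kind of per-channel operation that maps one-to-one onto MMX/AVX integer lanes:

```cpp
#include <cstdint>

// Blend two 8-bit channel values with an 8-bit alpha using only CPU
// integer instructions: result = round((src*alpha + dst*(255-alpha)) / 255).
inline uint8_t blend8(uint8_t src, uint8_t dst, uint8_t alpha) {
    unsigned v = src * alpha + dst * (255u - alpha) + 128u;  // +128 to round
    return static_cast<uint8_t>((v + (v >> 8)) >> 8);        // fast x/255
}
```

The `(v + (v >> 8)) >> 8` step is the classic exact divide-by-255 trick, so the blend stays in integer registers end to end.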
Observations of the CPU to GPU Pipeline are like so!
Texture creation can be expensive for the CPU, So you cannot go far,
A simple texture example, as in simple to compress on the CPU:
You can use texture formats like greyscale alpha (RA, RX, RGA) to emulate grey shading for polygon draw, so-called texture-on-top CPU rendering,
SVG XML
Another format that can be used by the CPU is SVG; SVG allows rendering of polygons in an optimised layer or 3D mesh,..
Polygons can be pre-culled by the CPU from high resolution meshes & emitted as SVG XML
Polygon SVG / Font Dictionary Estimation
Fonts & Polygon cache fonts: SVG XML & Font Systems can compose dictionaries of polygon shapes to estimate the final result from Dictionary estimation..
How does it work?
You cube map your outline polygon (present in 3D Render or there is no work)
Estimate the best shape from a pre composed & optimised Polygon Font that has shapes in 2D & 3D in the dictionary,..
The result is that high quality pre composed polygons can be pushed into the ZBuffer & frame space,..
Both as a texture, & or cube map in ZBuffer for uploading to GPU,..
Allows dynamic content such as explosions & effects such as skin deformation & bones, noses, et cetera to be hand crafted for the scene but dynamically made into the final render,..
Thus saving storage with pre compressed content.
Logical proof that shaders can add pre composed textures to emulate polygons...
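The dictionary-estimation step can be sketched as a nearest-descriptor search; the 4-number signature here is a hypothetical stand-in for a real cube-mapped outline descriptor:

```cpp
#include <array>
#include <cstdlib>

// Toy shape descriptor: {width, height, depth, vertex count} of the
// cube-mapped outline polygon. A real system would use a richer signature.
using Sig = std::array<int, 4>;

// Return the index of the pre-composed dictionary shape whose descriptor
// is closest (L1 distance) to the target outline's descriptor.
int bestMatch(const Sig& target, const Sig* dict, int n) {
    int best = 0;
    long bestDist = -1;
    for (int i = 0; i < n; ++i) {
        long dist = 0;
        for (int k = 0; k < 4; ++k)
            dist += std::labs(static_cast<long>(target[k]) - dict[i][k]);
        if (bestDist < 0 || dist < bestDist) { bestDist = dist; best = i; }
    }
    return best;  // closest pre-composed polygon, ready to push to ZBuffer
}
```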
Rupert S
*
Chrome Example : Dual source blending : RS
A game or Chrome requires a UI, But we will discuss the process of rendering with the CPU & GPU productively & well,..
Method list:
Dynamic Micro ZBuffers: when we wish to render a depth array of polygons, a Micro ZBuffer is allocated to part of the screen & a depth,..
We will assign an array of 10 layers; in ML you use layers for dimensions & we will do the same,.. 10 layers is a reasonable amount for a web page,.. We could easily assign more!
Layers { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }, We can assign layers to groups : { A, B, C, D };
We assign a Micro ZBuffer with dimensions { X, Y } : { A, B, C, D } : { X, Y, Z } & Location Displacement on screen : { Xd, Yd };
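The assignment above, as a plain descriptor (a sketch; the field names mirror the { X, Y }, group & { Xd, Yd } parameters in the text):

```cpp
#include <cstddef>

// One Micro ZBuffer tile: its own size, depth-layer count, group tag
// & on-screen location displacement, as assigned above.
struct MicroZBuffer {
    int x, y;       // tile dimensions { X, Y }
    int layers;     // depth layers, e.g. 10 for a web page
    char group;     // group tag, one of { A, B, C, D }
    int xd, yd;     // location displacement on screen { Xd, Yd }
};

// Cache footprint in cells: what one tile costs before compression.
std::size_t cells(const MicroZBuffer& b) {
    return static_cast<std::size_t>(b.x) * b.y * b.layers;
}
```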
We will be compressing our ZBuffers & We will be using:
SVG XML Polygons Packing for pre rendering,.. & Font Hinting to save further processing requirements
Processed MATHS XML
We can use Texture Conversion if we like!
Normally we would be flattening the layer on finalisation for ease of use,..
Rendering is one of Polygon arrays, SVG XML Polygons, Textures & fonts.
We will pass Compressed Micro Zbuffers back and forth between the GPU & the CPU to make the work look seamless!
We will thus be able to process MATHS XML on both the GPU & The CPU at the same time, Per frame
Rupert S
*
Cube maps & Micro ZBuffers
Method:
Now to assign a Micro ZBuffer or cube map, We have to fetch the full screen space size & map the screen space to cubes,..
We can Shader render each Cube Map & Micro ZBuffer with either Textures & SVG Polygon drawing or with a depth array ZBuffer,..
We can also allocate the Full ZBuffer from the task, But Allocating the Full Buffer is too large for our cache arrangement,..
So we allocate Micro ZBuffers & Cube Maps that we can draw polygon arrays into (For 3D & 2D AKA WebGL & WebGPU),..
We can also arrange RGBA Textures & SVG Polygons in layers or mapped to 2D & 3D Shapes,..
Cube Maps, Micro ZBuffers & Textures, SVG Polygons once Mapped to the buffer allow dynamic refreshing with low latency & Processor usage,..
ZBuffer & Cube Map Buffer
A, B, C, D, E, F, G
1:
2:
3:
4:
5:
6:
7:
Micro Allocation sample:
4 Block
Location C, D, 1, 2
Content 'Buffer Array' {(), (), (), ()}
If we move the screen we can remap the displacement map & virtually move it,..
If we allocate the entire screen space / Web Page / UI to the total space then we can displace in the total CPU / GPU ZBuffer,..
We can keep a small displacement map locally with a size of ZBuffer parameter that does not take too large a space in RAM / Cache.
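A sketch of that remap: scrolling rewrites only each tile's displacement while the rendered contents stay cached (Tile is a hypothetical minimal descriptor):

```cpp
// Scrolling the screen / web page / UI rewrites only the on-screen
// displacement of each Micro ZBuffer tile; the tile contents are
// untouched, so nothing is re-drawn.
struct Tile { int xd, yd; };  // on-screen displacement { Xd, Yd }

void scroll(Tile* tiles, int count, int dx, int dy) {
    for (int i = 0; i < count; ++i) {
        tiles[i].xd += dx;
        tiles[i].yd += dy;
    }
}
```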
Rupert S
*
Cube maps & Layered rendering : (c)RS
The problem: Efficiency
Right, so Chrome looks wonderful, but Micro ZBuffers, Layers, Cube maps 10 pcx deep, aka 10 layers,.. put GPU usage at 100% on simple video pages with 4K video & a chat window, YouTube & social media & so on..
Now the stream looks very good, But 100% GPU,..
The 3D plan:
FSR, 4x AA, Super Sampling 16x Performance, Texture sharpening, blending etcetera on
Now I will iterate the following: Rendering methodology
Firstly The Micro ZBuffers, Cube maps, Polygon & texture layers should be as deep as required by the page,..
Total depth is a reference of 5 deep, So for example :
Overlay & Video
Micro Cube-Map / MicroZBuffer : 10x10pcx by 5 Deep for overlay,
3D & 2D Deep content
Micro Cube-Map / MicroZBuffer : 10x10pcx by 10 Deep x X, Y, Z, or maybe larger Cube-Map / ZBuffer Depth Sizes
Most pages would have about 4 layers:
Game / App / Chrome UI render layer 1
Fonts & Overlays Layer 2
Page layout Layer 3
Video Stream Layer 4
Application overlay Layer 1
Text overlays & Page UI Layer 2
Now in order of priority if the Video is priority, that needs to be layer 3
That is 3 to 5 layers of 1 pcx each
Now 3D Deep rendered UI with 3D Images, for example a plane radar or integrity page 3D Image in the UI means the same priority lists:
Game / App / Chrome UI render layer 1
Fonts & Overlays Layer 2
Page layout Layer 3
Boxes for animation 3D 4 : Deep Cubemaps
Video Stream Layer 5
3D & 2D Application content, such as a plane window 6 : Deep Cubemaps
First they have to define depth: 2D content is a layer or cubemap, But not with a lot of depth,..
On the performance side, an RX280 4G can render Frontier's Elite Dangerous at 2000 x 1000 at 60FPS with FSR, 4x AA, Super Sampling 16x Performance, Texture sharpening, blending etcetera on..
The layers or cubemaps should each be as deep as required but no deeper,..
Video is preferably 1 layer deep &or Cube-Mapped on a single layer,..
Don't over-analyse depth tests on animated 2D content: a single test, depth & run the texture at the correct depth; it does not need to be refreshed unless changed.
Rupert S
*
3D Layers & 3D Geometric micro layers / Micro ZBuffers / Render tiles & other forms of mathematical geometry, For use in ML, Learning & Graphics presentation : RS
Optimising ML, Using Micro Tiling Dimensional Arrays, Sometimes separate so the Computation can be in parallel & also optimised per thread.
Machine Learning in the sense of using OpenCL & Direct-Compute & other API Involving 3D geometries.
Now it makes sense while regarding the other works in this document to think about Geometric, Volumetric & Layer Acceleration in Machine Learning,..
A common feature to use in Code & ML is OpenCL, JIT Compiler & Direct Compute,..
Maths are the primary strategic appliance of ML, Afterall Maths & calculations are the majority of our education & function as higher education, Work & life in research & practice ..
Common usage of dimensions in thought & Human, Machine, ML &or AI:
Common arguments on the maths arrangement of ML are reason: now Greek philosophers, nay scientists, displaced water & founded mass,..
Doctors measure wounds & count lesions & germs or viruses & cell counts for cancer!
Engineers need to measure a bridge or create one with the required strengths, mass & tension & of course the desire for that to look good too, So aesthetics!
Dimensional parameters are used to create rules by evolution in ML, That is to say that we measure the "Game of Life", If you don't know the Game of Life,..
G.O.L is normally germs, microbes, ants & other life forms such as Humans, Humans? Yes! Rogue is a common game-of-life game that has existed so far back that it was drawn in ASCII text on an IBM 16-colour computer & a BBC Micro!
So we need dimensions for something like 80% of all ML is dimension related maths..
We can use dimension size priority from the earlier work in this document & state that we will be optimising the ML, Using Micro Tiling Dimensional Arrays, Sometimes separate so the Computation can be in parallel & also optimised per thread.
So we will be using the following concepts in ML & Application Gaming:
Layers, Cube-Maps & Micro ZBuffers present our dimensional arrays & the methods by which we shall compress & optimise our operations..
Buffer & Micro ZBuffer Technology:
Layered Drawing : { 3D & 2D }
SVG Vector : { 3D & 2D }
Texture Format sent directly to the display : { 3D & 2D }
DSC Frame, Directly Rendered
Codecs & Frame By Frame : Texture & SVG + Vectors
Machine Learning & Draw related functions :
N Cubic < N 2 + 1 & so on, Gather & Scatter, Layer, Dimension & so on
https://www.w3.org/TR/webnn/#api-mlgraphbuilder-gathernd
The relevance to us is that both WebGPU/GL & WebNN can scatter & group,..
Known as multithreading & Single Thread performance modes:
Gather them into optimised groups
Scatter them over an array of independent tasks,
Combine tasks on a CPU... for single thread heavy
Scatter them so that we can parallel thread..
Tessellate between them
Combine or multi thread Polygon &or draw
Polygons for example in dense fields require Grouping & Scattering, .. So we can:
Do .. Work & #DoWorkSocial
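The gather & scatter scheduling modes above can be sketched directly (gather = pack into one contiguous group for a single-thread-heavy core; scatter = deal round-robin across parallel threads; names illustrative):

```cpp
#include <cstddef>
#include <vector>

// Scatter: deal tasks round-robin over N parallel threads.
std::vector<std::vector<int>> scatter(const std::vector<int>& tasks,
                                      int threads) {
    std::vector<std::vector<int>> out(threads);
    for (std::size_t i = 0; i < tasks.size(); ++i)
        out[i % threads].push_back(tasks[i]);
    return out;
}

// Gather: pack scattered groups back into one contiguous list for a
// core that prefers single-thread-heavy work.
std::vector<int> gather(const std::vector<std::vector<int>>& groups) {
    std::vector<int> out;
    for (const auto& g : groups)
        out.insert(out.end(), g.begin(), g.end());
    return out;
}
```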
Rupert S
*
Deep Random forest
ML for tasks such as Audio 3D is basically a Deep Random forest,
Basically a Gaussian mesh that is optimised over days,
In essence once trained they require almost no processing,
Think of a random forest as 9000 option choices in a configuration.
You may begin training Random Forests to your hearts content,
The main content is XML tables & option choice lists,..
Compress with GZIP, Deflate, LZ4 & done!
Moderately simple tasks with a regular tick, Such as pace makers & car wheels, Enhancement
RS
*
Direct Vectors : A deeper View : VESA, Displays & Applications
https://en.wikipedia.org/wiki/Matrix_(mathematics)
High Performance Direct Rendering & Indirect Texture Creation & Presentation,..
Expected hardware for modern 4K & 8K TVs & monitors : Mali GPU & ARM CPU with Vector SiMD, X64 AMD/Intel x86, RISC-V + Vec,..
With these capacities we can! #YesWeCan
VESA & HDMI drawing directly to the Frame
DSC & texture formats from the CPU & GPU are a logical choice, So we write Vector Drawing directly to the texture format, with anti-aliasing, super sampling & dithering error reduction for HDR & WCG,..
Direct Vector is where we send Vectors along the pipeline to the display,..
We can simplify the contents as 2D with SVG Polygons & Flat texture rendering,..
We can make it complex & use 3D ZBuffers or layered rendering; for common usage we would prefer to flatten, apart from 3D TVs & VR, where 3D input has more processors available on the display..
Send that directly to the Display from the GPU & CPU.
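The flatten-on-finalisation step, as a toy model (greyscale layers, 0 = transparent, topmost layer first; real layers would carry colour & alpha):

```cpp
#include <cstdint>
#include <vector>

// Flatten a stack of greyscale layers into the single 2D frame sent to
// the display: the first non-transparent (non-zero) sample per pixel,
// scanning from the topmost layer down, wins.
std::vector<uint8_t> flatten(const std::vector<std::vector<uint8_t>>& layers,
                             std::size_t pixels) {
    std::vector<uint8_t> frame(pixels, 0);
    for (std::size_t p = 0; p < pixels; ++p)
        for (const auto& layer : layers)  // topmost layer first
            if (layer[p] != 0) { frame[p] = layer[p]; break; }
    return frame;
}
```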
Table :
Internally rendered from CPU & GPU & Sent to display : Direct & Indirect, Device to device rendering pipeline.
Buffer & Micro ZBuffer Technology:
Layered Drawing : { 3D & 2D }
SVG Vector : { 3D & 2D }
Texture Format sent directly to the display : { 3D & 2D }
DSC Frame, Directly Rendered
Codecs & Frame By Frame : Texture & SVG + Vectors
Layers, Cube-Maps & Micro ZBuffers present our dimensional arrays & the methods by which we shall compress & optimise our operations..
The VBE Video Bios Extensions have not been updated, So we will make these!
But some 2D & 3D SDK will be useful!
The objective being to accelerate the HDMI & VESA Display Ports, The Displays, The applications such as Games, Chrome, Angle, DirectX & OpenCL/GL, Vulkan & Metal
https://shawnhargreaves.com/freebe/freebs12.zip
https://github.com/google/angle
Reference
https://is.gd/SVG_DualBlend https://is.gd/MediaSecurity https://is.gd/JIT_RDMA
https://is.gd/PackedBit https://is.gd/BayerDitherPackBitDOT
https://is.gd/QuantizedFRC https://is.gd/BlendModes https://is.gd/TPM_VM_Sec
https://is.gd/IntegerMathsML https://is.gd/ML_Opt https://is.gd/OPC_ML_Opt https://is.gd/OPC_ML_QuBit https://is.gd/QuBit_GPU https://is.gd/NUMA_Thread
(C)Rupert S
Additional information on VBE
The VBE Bios Extensions have not been updated, So 2D & 3D Drawing may not be standard
"VESA Bios version 3.0 (access to linear framebuffer video memory, high speed protected mode bank switching, page flipping, hardware scrolling, etc), and adds the ability to use 2D hardware acceleration in an efficient and portable manner"
2D+3D Acceleration Reference Video-Bios-Extension V3
http://www.petesqbsite.com/sections/tutorials/tuts/vbe3.pdf
https://en.wikipedia.org/wiki/VESA_BIOS_Extensions
https://www.thejat.in/learn/vesa-bios-extensions-vbe
https://shawnhargreaves.com/freebe/
https://shawnhargreaves.com/freebe/freebs12.zip
https://www.drdobbs.com/architecture-and-design/examining-the-vesa-vbe-20-specification/184409592
However AMD had a very hard time getting their API into other GPU Manufacturers Source trees..
Microsoft DirectX12 & DX11Warp & Vulkan/OpenCL are the results..
But we need to have a CPU, An APU is normally better but we have REBAR & RDMA for CPU to GPU Data Transfers,..
There are many small issues that face Vulkan & DirectX & the ANGLE API,..
What are these issues?:
Mouse & Pointer device delivers with IO & DMA Direct to the CPU
Fonts
Sprites
Polygon maps
Textures
come from the system & hence directly from the CPU in most cases,..
SDK & API CPU originated content:
Pointers
Memory routing
System control
QAT : DMA, IO & general system control & function.
We need a direct rendering path for the CPU, We have the CPU & We can use it!
Directly leveraging the CPU's functions that are unique:
FPU 183Bit High precision floats
AVX & SVE Direct parallel computation of a fairly high speed
Integer & Float general registers
We recognise that without proper coding most CPU Direct Display rendering, does not have..:
AntiAliasing
Supersampling
Smoothing
Dithering
HDR & WCG Automated colour control
We handle these functions in the following ways:
We pass the pre-computed intermediary to the GPU
We create code that does all these in the MMX & AVX SiMD Registers
We compose the frame at a larger scaling that the GPU will use for the final rendering..
Super Upscaling is our friend and there are many forms of upscaling to use,..
For most CPU related issues of jagged edges, The solution is that the Frame is drawn at 2x the resolution or a multiple of the final size.
We can also use SiMD Dithering & SuperSampling to handle the traditional CPU Deficit of jagged edges,..
We can also colour in greyscale & primary polygons with the GPU,..
So why? Whatever the deficits of the CPU are,..
The direct high precision qualities inside the FPU & AVX/SiMD for the CPU are at least Double the final quality of most GPU Functions..
CPU FPU & SiMD & Integral 32/64Bit functions can flourish the displayed content..
Presenting an educated SDK/API sampling for what the component Processing features are takes skill! We have it, It takes education.. We have it & we will have!
Composing the Final view point from all composing parts requires a specific set of solutions:
Frame jitter (misaligned SiMD, GPU, GPU, Audio)
Finalised frame : Gating .. Fence Mode for GPU & CPU
Synchronised & fast data transfers: Enhanced IO RDMA & Rebar
Security : AES, ECC & Enhanced media protocol DMA & TCP/UDP/QUICC Hyper Frame transport
These are my solutions, These are our solutions..
Rupert S
*
QFT & VRR Fence mode: (C)RS
VRR & QFT & frame rate deviations over time..
What fence mode does is allow us to buffer a work block so all tasks are finished before we write frame Shader blocks..
We use ETA, Delivery Time & Estimated work time, To allow ML & DL to directly optimise the packet system..
Fence Mode is for DirectX, OpenCL, OpenGLES, Vulkan & VESA Displays..
CPU Rendering into the GPU SiMD Shader pipeline requires:
GL_NV_fence VK_KHR_present_wait2
https://www.phoronix.com/news/Vulkan-1.4.317-Released
https://developer.download.nvidia.com/assets/gamedev/docs/VertexArrayRange.pdf
https://registry.khronos.org/OpenGL/extensions/NV/NV_fence.txt
What Fence does is use properties to define a load group for display, We need to know that the CPU is 800Mhz to 5Ghz on average the phone,..
The Phone processor may be between 400MHz & around 2.5 (Quad core Sony) While the GPU is between 250Mhz & 1200Mhz,.. So..
When the CPU writes the Texture, Polygon & colour maps, The Cycle differentiators usually mean calculating the difference with fractions,..
CPU 2x The clock speed than GPU 2:1 Cycles per write, As an example, You can do it by polling the Frame rate & Write per frame on maths,..
Fence & Presence wait is where you set a frame delivery timeline, So we can deliver a single clear frame as a steady rate of Hz,..
Fence however does the conditional wait by groups of shaders, The relevance of this fact is that these days we use VRR & QFT & frame rate deviates over time.
The Fence solution is per screen block & We will use that to update per segment, VRR Fence mode.
Input threads, Core count multiplexed by average devision between CPU & GPU Clock Cycle Effective work,..
For example my FX8320E does 2 threads SMT per core.. So with 8 cores & 2 threads per CPU 16 total threads:
8 Cores, 2 Threads per core SMT : { a1, b1, c1, d1, a2, b2, c2, d2, a3, b3, c3, d3, a4, b4, c4, d4 };
CPU 3.5Ghz, GPU 2Ghz, So.. 3.5:2 reduces to about 1.3 to 1,
CPU AVAILABLE_ASYNC_QUEUES_AMD: 2, MAX_COMPUTE_UNITS: ?
GPU AVAILABLE_ASYNC_QUEUES_AMD: 2, MAX_COMPUTE_UNITS: 16
So at 16 CU per task, Both the CPU & GPU are fairly simple & the result is 1.3 to 1 or rather 4 to 3 when we make an approximate whole number of it..
Tasks Array at 4 to 3
A{1,2,3,4}
B{1,2,3,4}
C{1,2,3,4}
D{1,2,3,4}
This allows us to divide the screen into 16 Groups & Refresh them VRR/QFT at 2Ghz at a rate of 3.5 to 2 & 2000Mhz / 60Hz around 32x a frame,
At that approximate speed we could fully modify each zone 32x per frame,
In actual fact we would be using most of the clock cycles for Maths SiMD Tasks, Textures, Shading, 3D & 2D & DRAW..
We could still manage at least 5x per group : { a1, b1, c1, d1,a2, b2, c2, d2,a3, b3, c3, d3,a4, b4, c4, d4 };
We can Fence each zone & VRR / QFT as we want.
Rupert S
*
A Multiple Source Rendering Pipeline
Dual source blending is going to make a lot of sense for games,
Where DirectX12 removes the CPU from the game render target,
Dual Source in not just composing 2 Shaders in a single pixel array, It is also composing with more than 1 device,..
CPU & GPU & Also Parallel Multi Render pipeline..
Using direct CPU blends for menus & small polygon renders (in High resolution SR) Where the CPU non alpha blend makes sense!
Well it makes more sense when you can : MMX AVX SiMD Blends & Especially ADDER blends that can use the CPU Integer Instructions!
Observations of the CPU to GPU Pipeline are like so!
Texture creation can be expensive to CPU, So you cannot go far,
Simple Texture example, As in Simple to Compress on CPU
However you can use texture formats like Grey Scale Alpha : RA, RX, RGA to emulate grey shading for polygon draw, So called texture on top of, CPU Rendering,
SVG XML
Another format that can be used by the CPU is SVG & SVG allows rendering of polygons in an optimised layer or 3D Mesh,..
Polygons can be pre culled by the CPU from high resolution meshes & created as SVG XML
Polygon SVG / Font Dictionary Estimation
Fonts & Polygon cache fonts: SVG XML & Font Systems can compose dictionaries of polygon shapes to estimate the final result from Dictionary estimation..
How does it work?
You cube map your outline polygon (present in 3D Render or there is no work)
Estimate the best shape from a pre composed & optimised Polygon Font that has shapes in 2D & 3D in the dictionary,..
The result is that high quality pre composed polygons can be pushed into the ZBuffer & frame space,..
Both as a texture, & or cube map in ZBuffer for uploading to GPU,..
Allows dynamic content such as explosions & effects such as skin deformation & bones, noses, exetera to be hand crafted for the scene but dynamically made into the final render,..
Thus saving storage with pre compressed content.
Logical proof that shaders can add pre composed textures to emulate polygons...
Rupert S
*
Chrome Example : Dual source blending : RS
A game or chrome requires a UI, But we will discuss the process of rendering with the CPU & GPU Productively & well,..
Method list:
Dynamic Micro ZBuffers, We wish to render a depth array of polygons then a Micro ZBuffer is allocated to part of the screen & a depth,..
We will Assign an array of 10 Layers, In ML you use layers for dimensions & we will do the same,.. 10 layers is a reasonable amount for a web page,.. We could easily assign more!
Layers { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }, We can assign layers to groups : { A, B, C, D };
We assign a Micro ZBuffer with dimensions { X, Y } : { A, B, C, D } : { X, Y, Z } & Location Displacement on screen : { Xd, Yd };
We will be compressing our ZBuffers & We will be using:
SVG XML Polygons Packing for pre rendering,.. & Font Hinting to save further processing requirements
Processed MATHS XML
We can use Texture Conversion of we like!
Normally we would be flattening the layer on finalisation for ease of use,..
Rendering is one of Polygon arrays, SVG XML Polygons, Textures & fonts.
We will pass Compressed Micro Zbuffers back and forth between the GPU & the CPU to make the work look seamless!
We will thus be able to process MATHS XML on both the GPU & The CPU at the same time, Per frame
Rupert S
*
Cube maps & Micro ZBuffers
Method:
Now to assign a Micro ZBuffer or cube map, We have to fetch the full screen space size & map the screen space to cubes,..
We can Shader render each Cube Map & Micro ZBuffer with either Textures & SVG Polygon drawing or with a depth array ZBuffer,..
We can also allocate the Full ZBuffer from the task, But Allocating the Full Buffer is too large for our cache arrangement,..
So we allocate Micro ZBuffers & Cube Maps that we can draw polygon arrays into (For 3D & 2D AKA WebGL & WebGPU),..
We can also arrange RGBA Textures & SVG Polygons in layers or mapped to 2D & 3D Shapes,..
Cube Maps, Micro ZBuffers & Textures, SVG Polygons once Mapped to the buffer allow dynamic refreshing with low latency & Processor usage,..
ZBuffer & Cube Map Buffer
A, B, C, D, E, F, G
1:
2:
3:
4:
5:
6:
7:
Micro Allocation sample:
4 Block
Location C, D, 1, 2
Content 'Buffer Array' {(), (), (), ()}
If we move the screen wie can remap the displacement map & virually move it,..
If we allocate the entire screen space / Web Page / UI to the total space then we can displace in the total CPU / GPU ZBuffer,..
We can keep a small displacement map locally with a size of ZBuffer parameter that does not take too large a space in RAM / Cache.
Rupert S
*
Cube maps & Layered rendering : (c)RS
The problem: Efficiency
Right so chrome looks wonderful but Micro ZBuffers, Layers , Cube maps 10 PCX deep, aka 10 layers,.. GPU Usage 100% on simple Video pages with 4K Video & Chat window, Youtube & Social media & so on..
Now the stream looks very good, But 100% GPU,..
The 3D plan:
FSR, 4x AA, Super Sampling 16x Performance, Texture sharpening, blending etcetera on
Now i will iterate the following: Rendering methodology
Firstly The Micro ZBuffers, Cube maps, Polygon & texture layers should be as deep as required by the page,..
Total depth is a reference of 5 deep, So for example :
Overlay & Video
Micro Cube-Map / MicroZBuffer : 10x10pcx by 5 Deep for overlay,
3D & 2D Deep content
Micro Cube-Map / MicroZBuffer : 10x10pcx by 10 Deep x X, Y, Z, or maybe larger Cube-Map / ZBuffer Depth Sizes
Most mages would have about 4 layers:
Game / App / Chrome UI render layer 1
Fonts & Overlays Layer 2
Page layout Layer 3
Video Stream Layer 4
Application overlay Layer 1
Text overlays & Page UI Layer 2
Now in order of priority if the Video is priority, that needs to be layer 3
That is 3 to 5 layers of 1 pcx each
Now 3D Deep rendered UI with 3D Images, for example a plane radar or integrity page 3D Image in the UI means the same priority lists:
Game / App / Chrome UI render layer 1
Fonts & Overlays Layer 2
Page layout Layer 3
Boxes for animation 3D 4 : Deep Cubemaps
Video Stream Layer 5
3D & 2D Application content, such as a plane window 6 : Deep Cubemaps
Firstly they have to define depth; 2D content is a layer or cubemap, but not with a lot of depth,..
Firstly on the performance side, RX280 4G can render Frontier Elite Dangerous in 2000 x 1000 at 60FPS with FSR, 4x AA, Super Sampling 16x Performance, Texture sharpening, blending etcetera on..
Firstly the layers or cubemaps should each be as deep as required but no deeper,..
Video is preferred 1 layer deep &or Cube-Mapped on a single layer,..
Don't over-analyse depth tests on animated 2D content: a single test establishes depth, then run the texture at the correct depth; it does not need to be refreshed unless changed.
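The priority lists above can be sketched as a small layer composer. This is an assumption-laden illustration (the Layer record & compose function are mine, not from any real renderer): each layer carries a priority, a depth, and a dirty flag, so static layers are depth-tested once and only redrawn when their content changes.

```python
# Illustrative sketch of the layer priority lists: draw in priority order,
# cache clean layers, keep video 1 layer deep. Names are hypothetical.

from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    priority: int          # 1 = drawn first (bottom of the stack)
    depth: int = 1         # cubemap / ZBuffer depth; video prefers 1 deep
    dirty: bool = True     # refresh only when the content has changed

def compose(layers):
    # draw in priority order; a clean layer is cached, not redrawn
    drawn = []
    for layer in sorted(layers, key=lambda l: l.priority):
        if layer.dirty:
            drawn.append(layer.name)
            layer.dirty = False
    return drawn

stack = [
    Layer("Game / App / Chrome UI", 1),
    Layer("Fonts & Overlays", 2),
    Layer("Page layout", 3),
    Layer("Boxes for animation 3D", 4, depth=10),  # deep cubemap
    Layer("Video Stream", 5, depth=1),             # kept 1 layer deep
]
first_frame = compose(stack)   # every layer drawn once
second_frame = compose(stack)  # nothing dirty, nothing redrawn
```

The second call returning nothing is the efficiency point: layers as deep as required but no deeper, refreshed only on change.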
Rupert S
*
3D Layers & 3D Geometric micro layers / Micro ZBuffers / Render tiles & other forms of mathematical geometry, For use in ML, Learning & Graphics presentation : RS
Optimising ML, Using Micro Tiling Dimensional Arrays, Sometimes separate so the Computation can be in parallel & also optimised per thread.
Machine Learning in the sense of using OpenCL, Direct-Compute & other APIs involving 3D geometries.
Now it makes sense while regarding the other works in this document to think about Geometric, Volumetric & Layer Acceleration in Machine Learning,..
A common feature to use in Code & ML is OpenCL, JIT Compiler & Direct Compute,..
Maths is the primary strategic appliance of ML; after all, maths & calculations are the majority of our education & function in higher education, work & life, in research & practice ..
Common usage of dimensions in thought & Human, Machine, ML &or AI:
Common arguments on the maths arrangement of ML concern reason: the Greek philosophers, nay scientists, displaced water & founded the measurement of Mass (Newton's apple came later),..
Doctors measure wounds & count lesions & germs or viruses & cell counts for cancer!
Engineers need to measure a bridge or create one with the required strength, mass & tension, & of course the desire for it to look good too, so aesthetics!
Dimensional parameters are used to create rules by evolution in ML, That is to say that we measure the "Game of Life", If you don't know the game of life,..
G.O.L is normally germs, microbes, ants & other life forms such as Humans, Humans? Yes! Rogue is a common game of life game that has existed so far back that it was drawn in ASCII Text on an IBM 16 Colour computer & a BBC Micro!
So we need dimensions; something like 80% of all ML is dimension-related maths..
We can use dimension size priority from the earlier work in this document & state that we will be optimising the ML, Using Micro Tiling Dimensional Arrays, Sometimes separate so the Computation can be in parallel & also optimised per thread.
So we will be using the following concepts in ML & Application Gaming:
Layers, Cube-Maps & Micro ZBuffers present our dimensional arrays & the methods by which we shall compress & optimise our operations..
Buffer & Micro ZBuffer Technology:
Layered Drawing : { 3D & 2D }
SVG Vector : { 3D & 2D }
Texture Format sent directly to the display : { 3D & 2D }
DSC Frame, Directly Rendered
Codecs & Frame By Frame : Texture & SVG + Vectors
Machine Learning & Draw related functions :
N Cubic < N 2 + 1 & so on, Gather & Scatter, Layer, Dimension & so on
https://www.w3.org/TR/webnn/#api-mlgraphbuilder-gathernd
The relevance to us is that both WebGPU/GL & WebNN can scatter & group,..
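The gatherND operation referenced above can be sketched in plain Python to show what "gather" means here: a list of index tuples walks the nested data and picks out elements. This is my own minimal stand-in for illustration, not the WebNN API itself.

```python
# Plain-Python sketch of gatherND semantics: each index tuple walks down
# the nested data array to select one element. Illustrative only.

def gather_nd(data, indices):
    out = []
    for idx in indices:
        item = data
        for i in idx:        # descend one nesting level per index component
            item = item[i]
        out.append(item)
    return out

result = gather_nd([[1, 2], [3, 4]], [(0, 0), (1, 1)])  # picks 1 and 4
```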
Known as multithreading & Single Thread performance modes:
Gather them into optimised groups
Scatter them over an array of independent tasks,
Combine tasks on a CPU... for single thread heavy
Scatter them so that we can parallel thread..
Tessellate between them
Combine or multi thread Polygon &or draw
Polygons for example in dense fields require Grouping & Scattering, .. So we can:
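The gather / scatter scheduling above can be sketched as follows: gather work items into optimised groups (the single-thread-heavy mode), then scatter the groups over an array of independent tasks (the multi-thread mode). The grouping key & work function are illustrative assumptions, not a real engine's API.

```python
# Sketch of Gather & Scatter scheduling: gather into optimised groups,
# then scatter the groups over parallel threads. Names are hypothetical.

from concurrent.futures import ThreadPoolExecutor
from itertools import groupby

def gather(items, key):
    # combine tasks into groups (single-thread-heavy mode)
    return {k: list(g) for k, g in groupby(sorted(items, key=key), key=key)}

def scatter(groups, work):
    # scatter groups over an array of independent tasks (multi-thread mode)
    with ThreadPoolExecutor() as pool:
        return dict(zip(groups, pool.map(work, groups.values())))

# e.g. polygons in a dense field, grouped by the mesh they belong to
polygons = [("mesh_a", 3), ("mesh_b", 5), ("mesh_a", 4), ("mesh_b", 1)]
groups = gather(polygons, key=lambda p: p[0])
areas = scatter(groups, lambda g: sum(v for _, v in g))  # parallel reduce
```

Gathering first keeps each thread's work coherent; scattering afterwards lets the groups run in parallel, which is the multithreading / single-thread trade the list above describes.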
Do .. Work & #DoWorkSocial
Rupert S
*
Deep Random forest
ML for tasks such as 3D Audio is basically a Deep Random Forest,
Essentially a Gaussian mesh that is optimised over days,
In essence, once trained they require almost no processing,
Think of a random forest as 9000 option choices in a configuration.
You may begin training Random Forests to your heart's content,
The main content is XML tables & option-choice lists,..
Compress with GZIP, Deflate, LZ4 & done!
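The "trained forest is just tables" point can be sketched as below. A trained forest reduces to option tables (here tiny hand-written decision stumps, standing in for a real trained model) that evaluate with almost no processing and compress well; JSON stands in for the XML tables mentioned above.

```python
# Sketch: a trained random forest as compressible lookup tables.
# The stumps are hand-written for illustration, not actually trained.

import gzip, json

# each tree: (feature index, threshold, value_if_below, value_if_above)
forest = [
    (0, 0.5, "left", "right"),
    (1, 2.0, "quiet", "loud"),
    (0, 0.8, "left", "right"),
]

def predict(forest, sample):
    # majority vote over the trees; evaluation is comparisons & a count
    votes = [lo if sample[f] < t else hi for f, t, lo, hi in forest]
    return max(set(votes), key=votes.count)

# store the option tables compressed, decompress & restore on use
blob = gzip.compress(json.dumps(forest).encode())
restored = [tuple(t) for t in json.loads(gzip.decompress(blob))]
```

Once compressed, the model is a small static blob; restoring it & predicting is cheap enough for a regular-tick device.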
Moderately simple tasks with a regular tick, such as pacemakers & car wheels, benefit from this enhancement.
RS
*
Direct Vectors : A deeper View : VESA, Displays & Applications
https://en.wikipedia.org/wiki/Matrix_(mathematics)
High Performance Direct Rendering & Indirect Texture Creation & Presentation,..
Expected Hardware for modern 4K & 8K TV & Monitor : Mali GPU & ARM CPU with Vector SIMD, x64 AMD/Intel x86, RISC-V + Vec,..
With these capacities we can! #YesWeCan
VESA & HDMI drawing directly to the Frame
DSC & Texture Formats from the CPU & GPU are a logical choice, so we write Vector Drawing directly into the Texture Format, with Anti-Aliasing, Super Sampling & Dithering error reductions for HDR & WCG,..
Direct Vector is where we send Vectors along the pipeline to the display,..
We can simplify the contents as 2D with SVG Polygons & Flat texture rendering,..
We can make it complex & use 3D ZBuffers or Layered Rendering; for common usage we would prefer to flatten, apart from 3D TVs & VR! Where 3D input has more processors available on the display..
Send that directly to the Display from the GPU & CPU.
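The flatten-and-send path can be sketched as follows. Under stated assumptions: the even-odd point-in-polygon rasteriser and the 2×2 ordered-dither matrix are illustrative stand-ins for the real pipeline, and "send to the display" is reduced to returning the flat RGBA byte buffer; this is not a real DSC or VESA implementation.

```python
# Sketch of Direct Vector: flatten an SVG-style polygon into a flat RGBA
# texture, dithering the fill for error reduction. Illustrative only.

BAYER_2X2 = [[0, 2], [3, 1]]   # small ordered-dither matrix

def inside(poly, x, y):
    # even-odd ray cast: is pixel centre (x+0.5, y+0.5) inside the polygon?
    px, py, hit = x + 0.5, y + 0.5, False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > py) != (y2 > py) and px < x1 + (py - y1) * (x2 - x1) / (y2 - y1):
            hit = not hit
    return hit

def rasterise(poly, w, h, shade=140):
    # flatten the vector to one RGBA layer, dithering the grey level
    tex = bytearray(w * h * 4)
    for y in range(h):
        for x in range(w):
            if inside(poly, x, y):
                level = 255 if shade / 64 > BAYER_2X2[y % 2][x % 2] else 0
                i = (y * w + x) * 4
                tex[i:i + 4] = bytes((level, level, level, 255))
    return bytes(tex)  # the flat buffer handed on to the frame / DSC packer

texture = rasterise([(1, 1), (7, 1), (7, 7), (1, 7)], 8, 8)
```

Flattening to one layer like this is the preferred common-usage case above; the 3D ZBuffer / layered variants would keep depth instead of collapsing it.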
Table :
Internally rendered from CPU & GPU & Sent to display : Direct & Indirect, Device to device rendering pipeline.
Buffer & Micro ZBuffer Technology:
Layered Drawing : { 3D & 2D }
SVG Vector : { 3D & 2D }
Texture Format sent directly to the display : { 3D & 2D }
DSC Frame, Directly Rendered
Codecs & Frame By Frame : Texture & SVG + Vectors
Layers, Cube-Maps & Micro ZBuffers present our dimensional arrays & the methods by which we shall compress & optimise our operations..
The VBE Video Bios Extensions have not been updated, So we will make these!
But some 2D & 3D SDK will be useful!
The objective being to accelerate the HDMI & VESA Display Ports, The Displays, The applications such as Games, Chrome, Angle, DirectX & OpenCL/GL, Vulkan & Metal
https://shawnhargreaves.com/freebe/freebs12.zip
https://github.com/google/angle
Reference
https://is.gd/SVG_DualBlend https://is.gd/MediaSecurity https://is.gd/JIT_RDMA
https://is.gd/PackedBit https://is.gd/BayerDitherPackBitDOT
https://is.gd/QuantizedFRC https://is.gd/BlendModes https://is.gd/TPM_VM_Sec
https://is.gd/IntegerMathsML https://is.gd/ML_Opt https://is.gd/OPC_ML_Opt https://is.gd/OPC_ML_QuBit https://is.gd/QuBit_GPU https://is.gd/NUMA_Thread
(C)Rupert S
Additional information on VBE
The VBE Bios Extensions have not been updated, So 2D & 3D Drawing may not be standard
"VESA Bios version 3.0 (access to linear framebuffer video memory, high speed protected mode bank switching, page flipping, hardware scrolling, etc), and adds the ability to use 2D hardware acceleration in an efficient and portable manner"
2D+3D Acceleration Reference Video-Bios-Extension V3
http://www.petesqbsite.com/sections/tutorials/tuts/vbe3.pdf
https://en.wikipedia.org/wiki/VESA_BIOS_Extensions
https://www.thejat.in/learn/vesa-bios-extensions-vbe
https://shawnhargreaves.com/freebe/
https://shawnhargreaves.com/freebe/freebs12.zip
https://www.drdobbs.com/architecture-and-design/examining-the-vesa-vbe-20-specification/184409592
*****
References
https://science.n-helix.com/2019/06/vulkan-stack.html
https://science.n-helix.com/2022/04/vecsr.html
https://science.n-helix.com/2016/04/3d-desktop-virtualization.html
https://science.n-helix.com/2025/06/dualblend.html
VSR https://drive.google.com/file/d/1hewfYqLmY0z-Am800LMR-6H-P5J0Sr0N/view?usp=drive_link
VecSR https://drive.google.com/file/d/1WDvpD9a6TttMTmIz_sRYWaQT3RExBuSq/view?usp=drive_link
https://science.n-helix.com/2022/10/ml.html
https://science.n-helix.com/2021/03/brain-bit-precision-int32-fp32-int16.html
https://science.n-helix.com/2022/06/jit-compiler.html
https://science.n-helix.com/2022/08/jit-dongle.html
https://science.n-helix.com/2022/09/audio-presentation-play.html
Innate Compression, Decompression
https://science.n-helix.com/2022/03/ice-ssrtp.html
https://science.n-helix.com/2022/09/ovccans.html
https://science.n-helix.com/2023/02/smart-compression.html
https://is.gd/SVG_DualBlend https://is.gd/MediaSecurity https://is.gd/JIT_RDMA
https://is.gd/PackedBit https://is.gd/BayerDitherPackBitDOT
https://is.gd/QuantizedFRC https://is.gd/BlendModes https://is.gd/TPM_VM_Sec
https://is.gd/IntegerMathsML https://is.gd/ML_Opt https://is.gd/OPC_ML_Opt https://is.gd/OPC_ML_QuBit https://is.gd/QuBit_GPU https://is.gd/NUMA_Thread