Mainly because the CBoMontin Processor Scheduler is intended to run in cache, we need to optimise the scheduler for each Processor's Cache size & depth.
Ordering instructions from inside the Processor Cache requires optimised code; We create our task list interfaces (UDP & TCP Port approximates) inside the cache.
We prefetch our workloads from kernel space & user space & order them into our processor workflows.
The main process polls the priority & nice values of each task & selects the processing order.
We prioritise tasks onto the same processor as the parent task if those tasks belong to the same application.
For that we have to know whether the task requires out-of-order or in-order execution; Tasks such as video rendering can afford to have Audio & Video on two threads, but the time stamps must be precise!
We priority sort based on applied function groups, combined with optimised processor selection:
A preferred thread & processor for sustained & fast function, & reduced processor-to-processor transfers.
From that we compile our Cache Loaded Runtime & optimise our Processor, Process & priority.
RS
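The ordering described above can be sketched in Python. This is only an illustration under stated assumptions: the `Task` fields, the rule that a higher `priority` runs first, and the use of a simple task count as processor load are all hypothetical and are not part of the CBoMontin scheduler itself.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    name: str
    app: str                           # owning application
    priority: int                      # assumed convention: higher runs first
    nice: int                          # Unix convention: lower is more favourable
    parent_cpu: Optional[int] = None   # processor of the parent task, if known

def order_tasks(tasks):
    """Sort by priority (descending), then by nice value (ascending)."""
    return sorted(tasks, key=lambda t: (-t.priority, t.nice))

def pick_cpu(task, cpu_load):
    """Prefer the parent's processor; otherwise take the least-loaded one."""
    if task.parent_cpu is not None and task.parent_cpu in cpu_load:
        return task.parent_cpu
    return min(cpu_load, key=cpu_load.get)

def schedule(tasks, cpu_load):
    """Return [(task name, cpu)] in run order; cpu_load is updated in place."""
    plan = []
    for task in order_tasks(tasks):
        cpu = pick_cpu(task, cpu_load)
        cpu_load[cpu] += 1
        plan.append((task.name, cpu))
    return plan
```

Keeping child tasks on the parent's processor is what reduces the processor-to-processor transfers; a task with no known parent simply falls back to the least-loaded processor.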
QoS To Optimise the routing: Task Management To optimise the process
https://science.n-helix.com/2021/11/monticarlo-workload-selector.html
https://science.n-helix.com/2023/02/pm-qos.html
Transparent Task Sharing Protocols
https://science.n-helix.com/2022/08/jit-dongle.html
https://science.n-helix.com/2022/06/jit-compiler.html
*
Monticarlo Workload Selector
CPU, GPU, APU, SPU, ROM, Kernel & Operating system :
CPU/GPU/Chip/Kernel Cache & Thread Work Operations management
In/out Memory operations & CU feature selection are ordered into groups based on:
CU Selection is preferred by Chip features used by code & Cache in-lining in the same group.
Global Use (In application or common DLL) Group Core CU
Localised Thread group, Sub prioritised to Sub CU in location of work use
Prioritised to local CU with Chip feature available & with lower utilisation (lowers latency)
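A minimal sketch of the CU selection rule above, assuming each CU reports a locality group, a feature set & a utilisation figure. The `ComputeUnit` type, its field names & the feature labels are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ComputeUnit:
    cu_id: int
    group: str            # cache / locality group the CU belongs to
    features: frozenset   # chip features this CU offers
    utilisation: float    # 0.0 (idle) to 1.0 (saturated)

def select_cu(units, required, preferred_group):
    """Pick a CU that has every required feature.

    CUs in the preferred (local) group win first; among those,
    the one with the lowest utilisation is chosen.
    Returns None when no CU offers the required features.
    """
    capable = [u for u in units if required <= u.features]
    if not capable:
        return None
    # False sorts before True, so same-group CUs come first.
    return min(capable, key=lambda u: (u.group != preferred_group, u.utilisation))
```

For example, a task that needs `{"avx2", "fma"}` & belongs to the "core" group stays on a busier local CU rather than moving to an idle CU in another group, which trades some throughput for cache locality.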
{ Monticarlos In/Out }
{ System input load: Predictable Statistic analysis }
{ Monticarlo Assumed averages per task }
{ System: IO, IRQ, DMA, Data Motion }
{ Process by Advantage }
{ Process By Task FeatureSet }
{ Process by time & Tick & Clock Cycle: Estimates }
{ Monticarlos Out/In }
Random task & workload optimiser ,
Task & Workload Assignment Requestor,
Pointer Allocator,
Cache RAM Allocation System.
Multithreaded pointer Cache Object tasks & management.
{SEV_TDL_TDX Kernel Interaction mount point: Input & Output by SSL Code Class}:
{Code Runtime Classification & Arch:Feature & Location Store: Kernel System Interaction Cache Flow Buffer}
https://is.gd/SEV_SSLSecureCore
https://is.gd/SSL_DRM_CleanKernel
*
Based upon the fact that you can input Monti Carlos Semi Random Ordered work loads into the core process:
*Core Process Instruction*
CPU, Cache, Light memory load job selector
Resident in Cache L3 for 256KB+- Cache list + Code 4Kb L2 with list access to L3
L2:L3 <> L1 Data + Instruction
*formula*
(c)RS 12:00 to 14:00 Haptic & 3D Audio : Group Cluster Thread SPU:GPU CU
Merge = "GPU+CPU SiMD" 3D Wave (Audio 93% * Haptic 7%)
Grouping selector
3D Wave selector
Group Property value A = Audio S=Sound G=Geometry V=Video H=Haptic B=Both BH=BothHaptic
CPU Int : ID+ (group of)"ASGVH"
Float ops FPU Light localised positioning 8 thread
Shader ID + Group 16 Blocks
SiMD/AVX Big Group 2 Cycle
GPU CU / Audio CU (Localised grouping MultiThreads)
https://www.youtube.com/watch?v=cJkx-OLgLzo
*
Task & Workload Assignment Requestor : Memory & Power
We have to bear in mind power requirements & task persistence in the Task & Workload Assignment Requestor,
& knowledge of the operating system's requirements:
Latency list in groups { high processor load requirements > Low processor load requirements } : { latency Estimates }
Ram load , Store & clear {high burst : 2ns < 15ns } GB/s Ordered
Ram load , Store & clear {high burst : 5ns < 20ns } MB/s Disordered
GPU Ram load , Store & clear {high burst : 2ns < 15ns } GB/s Ordered
AUDIO Ram load , Store & clear {high burst : 1ns < 15ns } MB/s Disordered
AUDIO Ram load , Store & clear {high burst : 1ns < 15ns } MB/s Ordered
AUDIO Ram load , Store & clear {high burst : 1ns < 15ns } KB/s Disordered
Network load , Send & Receive {Medium burst : 2ns < 15ns } GB/s Ordered
Network load , Send & Receive {high burst : 1ns < 20ns } MB/s Disordered
Hard drive management & storage {medium : 15ns SSD < 40ns HDD}
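The latency list above can be held as a small table & sorted so that the tightest latency classes are serviced first. This is a sketch only: the `LatencyClass` type is hypothetical, the three AUDIO rows are collapsed into one entry, & the `ordered` flag on the storage row is an assumption because the list does not state it.

```python
from typing import NamedTuple

class LatencyClass(NamedTuple):
    name: str
    min_ns: int     # lower latency estimate from the list
    max_ns: int     # upper latency estimate from the list
    ordered: bool   # Ordered / Disordered access

CLASSES = [
    LatencyClass("RAM ordered", 2, 15, True),
    LatencyClass("RAM disordered", 5, 20, False),
    LatencyClass("GPU RAM ordered", 2, 15, True),
    LatencyClass("Audio RAM", 1, 15, True),
    LatencyClass("Network ordered", 2, 15, True),
    LatencyClass("Network disordered", 1, 20, False),
    LatencyClass("Storage (SSD to HDD)", 15, 40, True),  # ordered flag assumed
]

def by_latency(classes):
    """Tightest lower bound first, then tightest upper bound, ordered before disordered."""
    return sorted(classes, key=lambda c: (c.min_ns, c.max_ns, not c.ordered))
```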
*
Also Good for disassociated Asymmetric cores; Since these pose a significant challenge to most software,
However categorising by Processor function yields remarkable classification abilities:
Processor Advanced Instruction set
Core speed
Importance
Location in association with a group of baton passing & interthread messaging & cache,
Symmetry classed processes & threads.
*
Bo-Montin Workload Compute :&: Hardware Accelerated Audio : 3D Audio Dolby NR & DTS
Hardware Accelerated Audio : 3D Audio Dolby NR & DTS : Project Acoustics : Strangely enough ....
Be more positive about Audio Block : Dolby & DTS will use it & thereby in games!
Workload Compute : Where you optimise workload lists through SiMD Maths to HASH subtasks into new GPU workloads,
Simply utilize Direct ML to anticipate future motion vectors (As with video)
OpenCL & Direct Compute : Lists & Compute RAM Loads and Shaders to load...
DMA & Reversed DMA (From GPU to & from RAM)
ReBAR to vector compressed textures without intervention of one processor or another...
Compression Block :
KRAKEN & BC Compression & Decompression
&
SiMD Direct Compressed Load using the Cache Block per SiMD Work Group.
Shaders Optimised & compiled in FPU & SiMD Code form for GPU: Compiling Methods:
In advance load & compile : BRT : Before Runtime Time : task load optimised & ordered Task Executor : Bo-Montin Scheduler
GPU SiMD & FPU (micro 128KB Block encoder : decoder : compiler)
CPU SiMD & FPU (micro 128KB Block encoder : decoder : compiler)
JIT : Just in Time task load optimised & ordered Task Executor : Bo-Montin Scheduler
load & compile :
GPU SiMD & FPU (micro 128KB Block encoder : decoder : compiler)
CPU SiMD & FPU (micro 128KB Block encoder : decoder : compiler)
*
Task manager opportunistically &or Systematic Resource Allocation (c)RS
We also need a direct transport tunnel for data between GPU of different types,
Firstly my experience is as follows:
I have a RX280x & RX560 & Intel® Movidius™ Neural Compute SDK Python API v2 & both do Python work! When I have this configuration the RX280x is barely used unless clearly utilized independently!
The Task manager & Python need to directly transfer workloads & processor tasks between each system processor,
Not limited to the primary Processor (4GHz FX8320E) & the AVX supporting Movidius & to & from the RX280 & RX560; Both however supported direct Video rendering & Encoding through DX12,
However the RX6500 does not directly support the AMD Hardware Encode under DX12.1 (New Version 2022-04-21)
& That RX560 comes in handy! if the Video rendering work is directly transferred to RX560 or RX280x & Encoded there!
Therefore I clearly see 2 examples.. & there are more!
Clearly Movidius is advantaged for scaler work on behalf of the Python process & in addition the Upscaling RSR & Dynamic Resolution; We do however need directly to have the Task manager opportunistically or systematically plan the use of resources & Even the processor could offload AVX Work.
No-one has this planned & We DO.
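The difference between opportunistic & systematic allocation can be sketched as follows. The device names come from the configuration described above, but the capability labels, task costs & both function names are hypothetical; real capabilities would have to be queried from the drivers.

```python
def opportunistic(task_kind, devices, load):
    """Pick the least-loaded device that can run this kind of task right now."""
    capable = [d for d, caps in devices.items() if task_kind in caps]
    if not capable:
        raise LookupError(f"no device supports {task_kind!r}")
    return min(capable, key=lambda d: load[d])

def systematic(tasks, devices):
    """Plan a whole batch: place the costliest tasks first to balance the load.

    tasks is a list of (name, kind, cost) tuples.
    Returns (plan, load) where plan maps task name to device.
    """
    load = {d: 0.0 for d in devices}
    plan = {}
    for name, kind, cost in sorted(tasks, key=lambda t: -t[2]):
        device = opportunistic(kind, devices, load)
        load[device] += cost
        plan[name] = device
    return plan, load

# Hypothetical capability map for the system described above.
DEVICES = {
    "FX8320E": {"python", "avx"},
    "RX280x": {"render", "encode"},
    "RX560": {"render", "encode"},
    "Movidius": {"python", "scale"},
}
```

With this map two encode jobs land on different GPUs instead of queueing on one, which is the behaviour missing when the RX280x sits barely used.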
*
Multiple Busses &or Processor Features in an Open Compute environment with competitive task scheduling
[Task Scheduler] Monticarlo-Workload-Selector
We prioritise data traffic by importance & Need to ensure that all CPU Functions are used...
In the case of a Chiplet GPU We need to assign function groups to CU & QoS is used to assess available Multiple Bus Capacities over competing merits,
[Merits : Bus Data Capacity, Bus Cycles, Available Features, Function Endpoint]
PM-QoS is a way of Prioritising Bus traffic to processor functions & RAM & Storage Busses that:
States a data array such as:
Bus Width
Divisibility (Example: Where you transform a 128Bit bus into 32Bit x 4 Data motions and synchronize the transfers)
Data Transfer Cycles Available
Used Data Rate / Total Data Throughput Rate = N
(c)Rupert S https://science.n-helix.com
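The PM-QoS data array above (bus width, divisibility & the ratio N) can be worked through in a short sketch. The function names are hypothetical & the lane order (least-significant lane first) is an assumption made for illustration.

```python
def lane_count(width_bits, lane_bits):
    """How many equal lanes a bus divides into, e.g. 128Bit into 32Bit x 4."""
    if width_bits % lane_bits != 0:
        raise ValueError("bus width is not divisible by the lane width")
    return width_bits // lane_bits

def split_word(value, width_bits=128, lane_bits=32):
    """Split one bus-wide word into lanes, least-significant lane first."""
    mask = (1 << lane_bits) - 1
    return [(value >> (i * lane_bits)) & mask
            for i in range(lane_count(width_bits, lane_bits))]

def merge_lanes(lanes, lane_bits=32):
    """Reassemble the word once all lane transfers have completed (the sync point)."""
    value = 0
    for i, lane in enumerate(lanes):
        value |= lane << (i * lane_bits)
    return value

def utilisation(used_rate, total_rate):
    """N = Used Data Rate / Total Data Throughput Rate."""
    if total_rate <= 0:
        raise ValueError("total throughput must be positive")
    return used_rate / total_rate
```

A bus carrying 8 GB/s out of a 32 GB/s total gives N = 0.25, so three quarters of the capacity is still available to competing traffic.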
Kernel Computation Resources Management :
OpenCL, Direct Compute, Compute Shaders & MipMaps :
Optimisation of all system resource use & management 2022 HPC RS
On the matter of Asymmetric GPU / CPU configuration, As in when 2 GPU are not of the same Class or from different providers,
Such a situation is when the motherboard is NVidia & the GPU is AMD for example.
We need both to work, So how?
Firstly the kind of work matters: Operating System Managed Workload Scheduler : Open CL & Direct X as examples:
Firstly PCI 1+ has DMA Transfers of over 500MB/s so data transfer is not a problem,
Secondly DMA is card based; So a shader can transfer work.
Thirdly the memory transfer can be compressed & does not need to transition mainly through the CPU..
No Cache Issue; Same for Audio Bus
MipMaping is an example with a low PCI to PCI DMA Transfer cost,
But Shaders & OpenCL or Direct Compute are primary examples,
(Direct Compute & OpenCL workloads are cross compatible & convertible)
Exposing a system's potential does require that a DX11 card be utilized for MipMaps or Texture Storage & operations; Within the capacities of DirectX 11, 12, 12.1 As and when compatible..
Optimisation of all system resource use & management 2022 HPC
Rupert S
*
Innate Smart Access (c)RS
The Smart-access features require 3 things:
[Innate Compression, Decompression, QoS To Optimise the routing, Task Management To optimise the process] : Task Managed Transfer : DMA:PIO : Transparent Task Sharing Protocols
The following is the initiation of the Smart-access Age
https://science.n-helix.com/2023/02/smart-compression.html
QoS To Optimise the routing: Task Management To optimise the process
https://science.n-helix.com/2021/11/monticarlo-workload-selector.html
https://science.n-helix.com/2023/02/pm-qos.html
Transparent Task Sharing Protocols
https://science.n-helix.com/2022/08/jit-dongle.html
https://science.n-helix.com/2022/06/jit-compiler.html
Innate Compression, Decompression
https://science.n-helix.com/2022/03/ice-ssrtp.html
https://science.n-helix.com/2022/09/ovccans.html
https://science.n-helix.com/2022/08/simd.html
*
EMS Leaf Allocations & Why we find them useful: (c)RS https://science.n-helix.com
Memory clear through page Voltage removal..
Systematic Cache randomisation flipping (On RAM Cache Directs syncobable (RAND Static, Lower quality RAND)(Why not DEV Write 8 x 16KB (Aligned Streams (2x) L2 CACHE Reasons)
Anyway in order to do this we Allocate Leaf Pages or Large Pages...
De Allocation invokes scrubbing or VOID Call in the case of a VM.
So in our case VT86 Instructions are quite useful in a Hypervisor;
& So Hypervisor from kernel = WIN!
(c)Rupert S
Reference T Clear
Opening Time Security Layering Reference PID with RDPID LeafHASH
https://lkml.org/lkml/2022/4/12/300
*
High performance firmware:
https://is.gd/SEV_SSLSecureCore
https://is.gd/SSL_DRM_CleanKernel
*
More on HRTF 3D Audio
Cyberpunk 2077 HDR : THX, DTS, Dolby : Haptic response so clear you can feel the 3D SOUND
*
AES RAND*****
If we had a front door & a back door & we said, "That door is available exclusively to us", someone would still want to use our code!
AES is good for one thing! Stopping Cyber Crime!
God save us from total anarchistic cynicism
Rupert S
/*
* This function will use the architecture-specific hardware random
- * number generator if it is available. The arch-specific hw RNG will
- * almost certainly be faster than what we can do in software, but it
- * is impossible to verify that it is implemented securely (as
- * opposed, to, say, the AES encryption of a sequence number using a
- * key known by the NSA). So it's useful if we need the speed, but
- * only if we're willing to trust the hardware manufacturer not to
- * have put in a back door.
- *
- * Return number of bytes filled in.
+ * number generator if it is available. It is not recommended for
+ * use. Use get_random_bytes() instead. It returns the number of
+ * bytes filled in.
*/
https://lore.kernel.org/lkml/20220209135211.557032-1-Jason@zx2c4.com/t/
RAND : Callback & spinlock
Callback & spinlock are not just Linux : Best we hash &or Encrypt several sources (if we have them)
If we have a pure source of Random.. we like the purity! But 90% of the time we like to hash them all together & keep the quality & source integrally variable to improve complexity.
Rupert S
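Hashing several sources together, as described above, can be sketched in user space with SHA-256. The choice of sources in `gather_sources()` is an assumption made for illustration; this is not the kernel's mixing code & is no substitute for the operating system's RNG.

```python
import hashlib
import os
import time

def mix_entropy(sources):
    """Hash every source into one 32-byte digest.

    Each chunk is length-prefixed so that chunk boundaries stay
    unambiguous: [b"a", b"bc"] and [b"ab", b"c"] mix differently.
    """
    h = hashlib.sha256()
    for chunk in sources:
        h.update(len(chunk).to_bytes(4, "big"))
        h.update(chunk)
    return h.digest()

def gather_sources():
    """Example sources (assumed): OS randomness, a timer reading, the process id."""
    return [
        os.urandom(32),
        time.perf_counter_ns().to_bytes(8, "big"),
        os.getpid().to_bytes(4, "big"),
    ]
```

Because the digest depends on every input, one weak source does not lower the quality of the output as long as at least one of the others is good.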
https://www.spinics.net/lists/linux-crypto/msg61312.html
'function gets random data from the best available source. The current code has a sequence in several places that calls one or more of arch_get_random_long() or related functions, checks the return value(s) and on failure falls back to random_get_entropy(). get_source_long() is intended to replace all such sequences. This is better in several ways. In the fallback case it gives much more random output than random_get_entropy(). It never wasted effort by calling arch_get_random_long() et al. when the relevant config variables are not set. When it does use arch_get_random_long(), it does not deliver raw output from that function but masks it by mixing with stored random data.'
RAND : Callback & spinlock : Code Method
Spinlock IRQ Interrupted upon RAND Pool Transfer > Why not Use DMA Transfer & Memory Buffer Merge with SiMD : AVX Byte Swapping & Merge into present RAM Buffer or Future location with Memory location Fast Table.
Part of Bo-Montin Selector Code:
(CPU & Thread Synced & on same CPU)
(Thread 1 : cpu:1:2:3:4)
(RAND)
(Buffer 1) > SiMD cache & Function :
(Thread 2 : cpu:1:2:3:4)
(Memory Location Table : EMS:XMS:32Bit:64Bit)
(Selection Buffer & Transfer)
(Buffer 1) (Buffer 2) (Buffer 3)
(Entropy Sample : DieHARD : Small)
Rupert S
https://lore.kernel.org/all/20220211011446.392673-1-Jason@zx2c4.com/
Random Initiator : Linus' 50ee7529ec45
Linus' 50ee7529ec45 ("random: try to actively add entropy
rather than passively wait for it"), the RNG does a haveged-style jitter
dance around the scheduler, in order to produce entropy
The key is to initialize with a SEED key; To avoid the seed needing to be replaced too often we Encipher it in a set order with an additive key..
to create the perfect circumstances we utilize 2 seeds:
AES/SHA2/PolyCHA
Initiator math key CH1:8Bit to 32Bit High quality HASH Cryptic
& Key 2 CrH
8Bit to 256Bit : Stored HASH Cryptic
We operate maths on the differential and Crypto the HASH :
AES/SHA2/PolyCHA
CrH 'Math' CH1(1,2,3>)
AES/SHA2/PolyCHA > Save to /dev/random & use
We may also use the code directly to do unique HASH RAND & therefore keep crucial details personal or per application & MultiThreads &or CPU & GPU & Task.
Rupert S
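One way to read the two-seed scheme above is sketched below: the stored key CrH stays fixed, the initiator key CH1 is advanced additively through steps 1, 2, 3 & so on, and each step is passed through a keyed hash. Using HMAC-SHA256 for the "Math" & hash stage is an assumption chosen for illustration (the text lists AES/SHA2/PolyCHA as options), & the function names are hypothetical.

```python
import hashlib
import hmac

def derive_block(crh: bytes, ch1: int, step: int) -> bytes:
    """One 32-byte output block from the stored key CrH and initiator CH1.

    The initiator is advanced additively and wrapped to 32 bits,
    then authenticated with CrH as the HMAC key.
    """
    additive = (ch1 + step) & 0xFFFFFFFF
    return hmac.new(crh, additive.to_bytes(4, "big"), hashlib.sha256).digest()

def stream(crh: bytes, ch1: int, n_blocks: int):
    """Blocks for steps 1..n_blocks, i.e. CH1(1,2,3>)."""
    return [derive_block(crh, ch1, step) for step in range(1, n_blocks + 1)]
```

A distinct CrH per application, thread or device yields an independent stream from the same initiator, which matches the idea of keeping details personal or per application.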
(Spectre & Retpoline Ablation) PreFETCH Statistical Load Adaptive CPU Optimising Task Manager ML (c)RS 2022
Come to think of it, Light encryption 'In State' may be possible in the Cache L3 (the main problem with Retpoline) & L2 (secondary) : How?
PFIO_Pol & GPIO Combined with PSLAC TaskManager (CBo_Montin) Processor, Kernel, UserSpace.
Byte Swapping for example, or a 16b instruction, if a lightly used instruction (one that is under utilized) is available.
Other XOR SiMD instructions can potentially be used to pre load L2 & L1 Instruction & Data.
Spectre & Retpoline 1% CPU Hit : 75% improved Security : ALL CPU & GPU Processor Type Compatible.
In Terms of passwords & SSL Certificate loads only, The Coding would take 20Minutes & consume only 0.1% of total CPU Time.
HASH Example
https://lkml.org/lkml/2022/3/17/120
https://lkml.org/lkml/2022/3/17/119
https://lkml.org/lkml/2022/3/17/116
https://lkml.org/lkml/2022/3/17/115
https://lkml.org/lkml/2022/3/17/118
https://science.n-helix.com/2022/02/interrupt-entropy.html
In reference to : https://science.n-helix.com/2021/11/monticarlo-workload-selector.html
CPU Statistical load debug 128 Thread :
https://lkml.org/lkml/2022/3/17/243
PFIO_Pol Generic Processor Function IO & Feature Statistics polling + CPUFunctionClass.h + VCache Memory Table Secure HASH
GPIO: Simple logic analyzer using polling : Prefer = Precise Core VClock + GPIO + Processor Function IO & Feature Statistics polling
https://lkml.org/lkml/2022/3/17/216
https://lkml.org/lkml/2022/3/17/215
Security bug; Solutions & explanations : RS
https://lkml.org/lkml/2022/2/11/1082