ESA Space blog - All Rights Reserved RS: April 2017

Tuesday, April 25, 2017

RNG and the random web - Haveged / RNGTools - Chaos - Crypto - Science of Hardware & Computer Driver - entropy

(c)Rupert S

****
Preface: what is the difference between chaos and entropy?
Chaos is a matter of confusion: logic that spirals unpredictably out of control,
sometimes exciting, sometimes bad, and lacking a perfect definition.
Order and logic go hand in hand in the digital age.
Entropy is the disordered, yet ordered on average, breakdown of a system into a form that statistically meets the requirement that all sums eventually average to zero as far as possible.
Ergo, statistically, chaos and order/logic both exist.
---------------------------------------------------
entropy ...

Entropy, or preferably randomness, plays a very important role in science and on the internet.
Security and research both need it.

But most commonly hardware random number generators lack drivers.

https://en.wikipedia.org/wiki/Comparison_of_hardware_random_number_generators

Random/seed/entropy supply on phones and PCs is a problem, so making an app like Ubuntu's entropy seeding app,
with high quality random data, would be a life saver for the phone user.
In addition, the RNG (CRNG, TRNG or NRNG) could use AES, or Blowfish etcetera, to magnify the pool!

For non-rooted phones, install an RNG device; if an RNG device is impossible to install, then use another noise source.
This applies to the phone, PC, Mac and server OS.

*Driver Function and utilisation* (Copyright Rupert S)

Multiple sources of entropy, hashed together and injected through AES hardware,
are not included in applications on phone, Windows, Mac etcetera.
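A minimal sketch of the idea above, assuming Python and only its standard library: several inputs are hashed into one seed, and the seed is then stretched into a larger pool. Note the swap: HMAC-SHA256 in counter mode stands in for the AES hardware expansion, because AES is not in the Python standard library. The choice of sources and the names collect_sources, mix and expand are illustrative assumptions, not part of any existing driver.

```python
import hashlib
import hmac
import os
import time


def collect_sources() -> list:
    """Gather a few inputs to mix (a hypothetical choice of sources)."""
    return [
        os.urandom(32),                             # the OS entropy pool
        time.perf_counter_ns().to_bytes(8, "big"),  # high-resolution timer
        os.getpid().to_bytes(8, "big"),             # weak, but harmless to mix in
    ]


def mix(sources: list) -> bytes:
    """Hash all sources together into one 32-byte seed."""
    h = hashlib.sha256()
    for s in sources:
        # Length-prefix each input so that different splits of the same
        # bytes cannot produce the same hash input.
        h.update(len(s).to_bytes(4, "big"))
        h.update(s)
    return h.digest()


def expand(seed: bytes, n_bytes: int) -> bytes:
    """Stretch the seed with HMAC-SHA256 in counter mode (stand-in for AES)."""
    out = bytearray()
    counter = 0
    while len(out) < n_bytes:
        out += hmac.new(seed, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:n_bytes])


seed = mix(collect_sources())
pool = expand(seed, 1024)
print(len(seed), len(pool))
```

Expansion magnifies the pool but adds no new entropy: the pool is only as unpredictable as the seed that went in, which is why the mixing step takes more than one source.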

Consider the use of a hardware-encrypted cache saved to drive, for example:

Original fresh random/entropy data will be stored securely in flash and/or on HD/SSD/RAM to further secure the RND pool.

1 MB of RNG data that has not yet been used is kept to add to the boot source and to draw on during low ebbs in entropy data,
to be refreshed depending on the recording media,
with additional data processed in RAM beforehand in a ChaCha/AES/Blowfish/Twofish encryption mode.
(Personally, I think AES on hardware encryption devices makes sense.)

(4 MB is large enough to use but small enough for 256 MB RAM devices.)
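The seed cache described above could look roughly like this. It is a sketch under stated assumptions: Python standard library only, SHAKE-256 as a stand-in for the ChaCha/AES expansion, and no encryption of the file itself (the hardware-encrypted storage is left out). The function name load_and_refresh and the file name are hypothetical.

```python
import hashlib
import os
import tempfile
from pathlib import Path

CACHE_SIZE = 4 * 1024 * 1024  # 4 MB, the size suggested above


def _stretch(key: bytes, n_bytes: int) -> bytes:
    """Expand a key to n_bytes with SHAKE-256 (stand-in for AES/ChaCha)."""
    return hashlib.shake_256(key).digest(n_bytes)


def load_and_refresh(path: Path, size: int = CACHE_SIZE) -> bytes:
    """Return a boot seed, then overwrite the cache so it is never reused."""
    old = path.read_bytes() if path.exists() else b""
    fresh = os.urandom(32)
    # Mix the saved cache with fresh OS entropy: the result stays
    # unpredictable as long as at least one of the two inputs is.
    key = hashlib.sha256(hashlib.sha256(old).digest() + fresh).digest()
    boot_seed = _stretch(b"boot" + key, 32)
    new_cache = _stretch(b"cache" + key, size)
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(new_cache)
    os.replace(tmp, path)  # swap the new cache into place
    return boot_seed


# Small demonstration in a temporary directory (4 KB instead of 4 MB).
with tempfile.TemporaryDirectory() as d:
    cache = Path(d) / "rng-seed.bin"
    first = load_and_refresh(cache, size=4096)
    second = load_and_refresh(cache, size=4096)
    cache_len = cache.stat().st_size

print(len(first), first != second, cache_len)
```

The cache is rewritten on every load so that the same stored data is never used for two boots.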

Fortunately this is 4 weeks of development at most.

So kernel inclusion of the driver base is a must,

with the main tool running in protected space and distributing AES, Blowfish, etcetera hashed and expanded data to the user.

The data is contained securely under NX/DEP protection;

you can seed the data and remix it with new data.

Mixed data is the strongest and surely the least predictable of the lot: although algorithms are used, the output cannot be predicted without knowing every input.

Entropy SIM and SSD cards are an option and can contain an actual memory array and flash combination, to be super fast
but economical.

(Copyright Rupert S)

*****

For a Windows/phone RNG device, I have been thinking!

You could modify the driver, or make your own, to take data from RNG devices on the COM ports and, obviously, PCI etcetera.
On Linux systems the entropy/RNG/random drivers are commonly in the kernel but are most often not configured properly;
these are the problems we need to fix, and fix well.
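As a first check of how a Linux machine is configured, the kernel exposes its entropy estimate and its hardware RNG selection as plain files. The sketch below reads them, assuming Python; the paths are the standard procfs and sysfs locations, while the helper names are my own. On systems without these files (Windows, Mac, or a kernel without hw_random) the values simply come back as not available, and newer kernels may report a fixed entropy_avail value.

```python
from pathlib import Path
from typing import Optional

RANDOM_DIR = Path("/proc/sys/kernel/random")
HWRNG_DIR = Path("/sys/class/misc/hw_random")


def read_text(path: Path) -> Optional[str]:
    """Return the stripped file contents, or None if the file cannot be read."""
    try:
        return path.read_text().strip()
    except (OSError, ValueError):
        return None


def entropy_report() -> dict:
    """Collect the kernel's entropy estimate and hardware RNG selection."""
    avail = read_text(RANDOM_DIR / "entropy_avail")
    return {
        "entropy_avail": int(avail) if avail and avail.isdigit() else None,
        "rng_available": read_text(HWRNG_DIR / "rng_available"),
        "rng_current": read_text(HWRNG_DIR / "rng_current"),
    }


report = entropy_report()
for key, value in report.items():
    print(f"{key}: {value if value is not None else 'not available'}")
```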

Haveged exists on Linux but not on Mac or Windows. (Haveged's output is not necessarily guaranteed to have all the chaos that we need.)

However, haveged is one option that, combined with AES or Blowfish random expansion, can help with entropy issues!

Haveged is not the only solution; furthermore, TRNG/CRNG sources need optimisation to increase security and to provide true crypto/rand function.

Haveged provides a viable additional source of entropy,
preferably not as the only source;
however, haveged is a product that produces results,

which we surely need in random-bit-starved computers and mobile markets.

Yes, a CPU/GPU configured this way can obviously create logical and not so perfectly entropic results.
However, we have to ask ourselves: do we need a random pool filled from a viable source available to all?
The answer is obviously yes.

Haveged produces data far superior to user input alone.
Furthermore, the tasks running on the computer and/or within the system improve the output.

As the necessity to use haveged increases,
the user will most likely be running more tasks that need it, and hence there will be better results and more of them.
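To illustrate why a busy system helps, here is a toy sketch of harvesting timing jitter, assuming Python. It is not the HAVEGE algorithm and not haveged's code: it only times a small workload repeatedly and hashes the low bits of the timing differences, which vary with whatever else the machine is doing. The function name jitter_sample is hypothetical, and the output should be mixed with other sources rather than used alone.

```python
import hashlib
import time


def jitter_sample(rounds: int = 2048) -> bytes:
    """Hash the low 16 bits of successive timer differences into 32 bytes."""
    h = hashlib.sha256()
    prev = time.perf_counter_ns()
    acc = 0
    for i in range(rounds):
        acc = (acc * 31 + i) & 0xFFFFFFFF  # small workload between readings
        now = time.perf_counter_ns()
        h.update(((now - prev) & 0xFFFF).to_bytes(2, "big"))
        prev = now
    return h.digest()


sample = jitter_sample()
print(sample.hex())
```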

Yes, a true TRNG is a state of peace in the true security advocate's heart, but there is always room for an improved haveged
on Windows, on Mac and on other operating systems.

(copyright : Rupert S)

http://www.issihosts.com/haveged/index.html

https://www.irisa.fr/caps/projects/hipsor/

https://fedoraproject.org/wiki/Windows_Virtio_Drivers

viorng/: Virtio RNG driver

This seems a simple and elegant solution that would allow the use of RNG data and would allow other devices of the same type to work well!
It would be a service to all and would allow research sharing.
The driver is open source.

https://github.com/YanVugenfirer/kvm-guest-drivers-windows/blob/master/viorng/viorng/viorng.inf

https://fatminmin.com/blog/install-win10-with-virtio.html

https://pve.proxmox.com/wiki/Windows_VirtIO_Drivers

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/sect-Guest_virtual_machine_device_configuration-Random_number_generator_device.html

Other device drivers could also be made not just for virtual machines...

RS

Other tools and functions to call to make the C/N/T/RNG ... Functional - please read all !

*well thought out analysis of the entropy system care of getnetrandom & Wisconsin university*

http://pages.cs.wisc.edu/~swift/papers/oakland14-rng.pdf

*online entropy fetch with Client for windows and linux servers and soon android*

https://www.getnetrandom.com/#howitworks

https://www.getnetrandom.com/quickstart-guide.pdf

http://whitewoodsecurity.com/products/entropy-engine/

news and paper

https://eurekalert.org/pub_releases/2017-05/udg-rnh053117.php
https://www.theregister.co.uk/2015/04/30/geneva_boffins_make_light_work_of_random_numbers/
https://arxiv.org/abs/1410.2790

(c)RS

*****

Q & A (Copyright Rupert S etc)

"how can you ensure that a particular kernel driver runs before other system processes?
for example doesn't ASLR run way before anything else?"

The boot kernel drivers load before the OS, together with the network driver
(for secure network driver loading for server sessions).
Keep a cache of RND data and bingo:
a secured boot with high chaos maintenance.

"to make USB tpm/dongle devices and boot is secure and the os is safe from intrusion (low priced preferably)"

the driver has to have a verified certificate

"everything makes sense here the details of boot kernel driver vs regular kernel module."

Microsoft and Red Hat kernel drivers need certification on servers and in generic OS implementations;
go directly to them and register your certificate.

Get involved in the RNG Tools project and in kernel development for Linux, Windows and Mac.

Also, the Android kernel is based on the Linux kernel but is implemented through open source development that deviates from the Linux source.

"What's your feeling on RNG Tools in general, and from the point of view of it being an optional component people have to consciously seek out and add in vs. being a "built in" part of a standard distribution?"

Personally I believe in RNGTools, and its usage is a must!

Multiple sources of entropy, hashed together and injected through AES hardware,
are not included.

Fortunately this is 4 weeks of development at most.

So kernel inclusion of the driver base is a must (with the main tool running in protected space and distributing AES, Blowfish, etcetera hashed and expanded data to the user).

(c)RS
******

The places the random go...

Voyages of the scientific imagination.

https://www.technologyreview.com/s/609482/ai-is-dreaming-up-new-kinds-of-video-games/

https://www.technologyreview.com/s/529136/no-mans-sky-a-vast-game-crafted-by-algorithms/

Random is ever of use to science and creative imagination,
lest we forgo the unusual for common substance.

https://science.n-helix.com/2018/12/rng.html

Friday, April 7, 2017

boinc - enhancing research workloads for the benefit of mankind & humanity - Computer Optimisation - CPU , GPU & RAM - PC, Mac & ARM development

HPC - High Performance Computation for beneficial goals and obvious worth.

(Guide, experimentation, developer kit's and manuals)

By Rupert S

何百万のコアで何をするのですか？

混乱した毛穴から血が流出するまで、混乱の罠から惑星を救いなさい。

永遠の海のイルカのような時間の川で踊りましょう。

夕方の海岸まで科学の蝋燭をちらつかせる。

what would we do with a million cores!?

save a planet from the grip of chaos death till the blood runs from shattered pores..

dance in the rivers of time like the dolphins in the seas of evermore..

flicker the candle of science till evening shores.

Observing the workloads of many beneficial projects, we find that the workload data set is commonly small.

In addition to the memory set being smaller or larger than a machine can compute optimally, we find that feature sets such as fae and AVX have commonly not been implemented.

Some projects, like Asteroids@home and the SETI project, use enhanced instruction sets such as AVX, and memory loads that benefit from the 4 GB or more of RAM available on decent gaming machines and home laptops.

Not all modern machines have loads of RAM; however, research and/or university establishments use sufficiently powerful machines that can glow on the BOINC record in full glory with a 256 MB to 768 MB workload.

In addition, the machines are commonly Opteron or Xeon, and servers may have SPARC or PowerPC specific hardware and instruction sets.

To examine examples: below we can see that workloads include small data arrays, in the 40 MB to 79 MB range.

In line with servers and gaming rigs, we have 1 GB of RAM per core; of course not all problems require a larger array in the workload, and some machines have 256 MB per core!

However much RAM you allocate to the projected workload, small memory loads can and will be sufficient for data swapping and/or paging (like DNA replicators).

Some tasks can benefit considerably from larger thread and data models; to my mind DNA and mapping data are fine examples of specific workloads where memory counts.

In addition, the thread count can be 4 or another number, and I suggest that a single task can use more than one core and instruction set (NEON, for example, or symmetric FPU threading, SMT).

Specific workload optimisation, or rather generic optimisation with SSE, AVX, FPU threading and precision tuning, would be very cool while we deal with the workload-running app.

In particular, the Ryzen multi-core is a new and exciting product,

so take care to read the guides in the lower half of the document; AVX2, RDSEED, ADX and additional encryption formats are some of the most exciting changes in the AMD Ryzen architecture.

The report on the Vina BOINC project for the Zika virus chemical examination through a computer hive proves interesting, and mentally testing/stimulating,

showing the problems that properly optimising code for chemical/biological examination can face.

AVX similarities to a GPU core: the function of AVX can be thought of as a CPU extension with the same kind of usage as a GPU!
In short, combined with the FPU, it is very much in the same performance category as GPU cores, and of much worth to scientific research and to the development of game dynamics, sound, video and spaces in N-dimensional space.

CPU extensions can prepare vector space for GPU to enhance the speed and optimize vector tables before GPU rendering and sound space in 3D for surround sound...
Interpolate texture, sound and other data with bit swapping.. In SIMD instructions.

RND Function can be used to explore additional data spaces.

Encryption function to enhance unpredictable behavior or to save space.

Further thought ... Efficiency :

Add a MHz/Dhrystone/MIPS performance-per-watt figure to each system;
then projects will further optimise workloads to improve workload energy and environmental efficiency versus work carried out.

Work Hours x Mhz / (efficiency per watt)

-------

Hours / % of projects finished with work completed
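One possible reading of the per-watt figure suggested above, as a minimal Python sketch. The host names and the wattages are made-up illustration values, and MIPS per watt is just one of the candidate measures named; the class and method names are hypothetical too.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    mips: float   # measured integer throughput (hypothetical numbers below)
    watts: float  # power drawn while working (hypothetical numbers below)

    def mips_per_watt(self) -> float:
        return self.mips / self.watts


hosts = [
    Host("desktop", 11000.0, 65.0),
    Host("laptop", 6000.0, 15.0),
]

# Rank hosts so that the most work per watt comes first.
ranked = sorted(hosts, key=lambda h: h.mips_per_watt(), reverse=True)
for h in ranked:
    print(f"{h.name}: {h.mips_per_watt():.1f} MIPS per watt")
```

A project could publish such a figure next to each host so that volunteers and developers can see where work is done most cheaply.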

Also bear in mind that GPUs need watt efficiency and task management to optimise power used versus work done.

worker priority should always be :

efficiency + merit of the work
--------
time / % necessity
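The priority rule above can be written as a small function. This is a sketch in Python under my own assumptions: efficiency and merit are taken as scores between 0 and 1, time is in hours, and necessity is a fraction between 0 and 1 rather than a percentage. The name worker_priority is hypothetical.

```python
def worker_priority(efficiency: float, merit: float,
                    time_hours: float, necessity: float) -> float:
    """(efficiency + merit) divided by (time / necessity)."""
    if time_hours <= 0:
        raise ValueError("time_hours must be positive")
    if not 0 < necessity <= 1:
        raise ValueError("necessity must be in (0, 1]")
    return (efficiency + merit) / (time_hours / necessity)


# A short, necessary task outranks a long, optional one of equal merit.
urgent = worker_priority(0.5, 0.5, time_hours=2.0, necessity=0.5)
optional = worker_priority(0.5, 0.5, time_hours=8.0, necessity=0.25)
print(urgent, optional)
```

Read this way, higher efficiency, merit and necessity raise the priority, and longer run time lowers it.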

Please examine the issue further.

Rupert S

https://www.worldcommunitygrid.org

https://boinc.berkeley.edu/

http://www.charityengine.com/

https://lhcathome.cern.ch/

https://cern.n-helix.com/lhcathome/cpu_list.php

CERNVM-FS-Both : Run & Install Commands : RS https://is.gd/CERN_SH_Scripts

https://cvmfs.readthedocs.io/en/stable/cpt-quickstart.html
https://cvmfs.readthedocs.io/en/stable/cpt-configure.html

HPC Computing work load Photos - HPCSet 2 photos - HPC Set 3 Photos

Conducting Research Photo set 1 - Photo set 2 - photo set 3

http://esa-space.blogspot.ru/2017/04/rng-and-random-web.html - we need Chaos Seeds : Random seeds for our work

https://www.youtube.com/watch?v=mLQGXlxemlg - Optimizing HPC Service Delivery by a life time super computing tec

https://youtu.be/KbjFGQ9fHvw - Scaling and Optimizing Climate and Weather Forecasting Programs on Sunway TaihuLight - very exciting

https://insidehpc.com/2017/06/video-scaling-climate-weather-forecasting-sunway-taihulight/

HPC Best Practices..

http://www.intertwine-project.eu/best-practice-guides

AMD Platform Optimization - please read for all developers

https://community.amd.com/thread/213045 - particular instruction differences for microcode optimisation

http://32ipi028l5q82yhj72224m8j.wpengine.netdna-cdn.com/wp-content/uploads/2017/03/GDC2017-Optimizing-For-AMD-Ryzen.pdf - code optimisation a few very important lessons... may seem simple to some but obviously is not to be taken for granted.

CPU Optimisation - utility and function.

http://gpuopen.com/compute-product/codexl/ - CodeXL is a code efficiency analyser optimiser debugger for GPU and CPU and system.

https://github.com/GPUOpen-Tools/CodeXL/releases/latest

http://bit.ly/CoXLPhoto - CodeXL in action photos

http://www.noamross.net/blog/2013/4/25/faster-talk.html - speeding up code a guide - profiling and bench-marking.

http://www.pgroup.com/doc/pgi17ug-x64.pdf - PGI Compiler guide

http://www.agner.org/optimize/ - code optimisation for all programmers on X86,X86-64bit and some others.. this is a terrific resource !

http://www.agner.org

https://github.com/ctuning/ck - data & program - testing and tuning

for example : Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 popcnt aes f16c syscall nx lm avx sse4a osvw xop wdt fma4 topx page1gb rdtscp bmi1

or for example : Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 popcnt aes f16c syscall nx lm avx svm sse4a osvw ibs xop skinit wdt lwp fma4 tce tbm topx page1gb rdtscp bmi1

for an improved upon instruction list in the newer boinc application.. (with appropriate configuration)
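A feature line like the ones above is easy to check before choosing an optimised build. The sketch below, in Python, uses the first example line as its input; the function name has_features and the list of wanted features are my own choices for illustration.

```python
FEATURES = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 popcnt aes "
    "f16c syscall nx lm avx sse4a osvw xop wdt fma4 topx page1gb rdtscp bmi1"
)


def has_features(feature_line: str, wanted: list) -> dict:
    """Map each wanted feature name to True/False for this processor."""
    present = set(feature_line.split())
    return {name: name in present for name in wanted}


result = has_features(FEATURES, ["avx", "avx2", "aes", "fma", "sse4_2"])
for name, ok in result.items():
    print(f"{name}: {'yes' if ok else 'no'}")
```

For this processor the check reports AVX, AES, FMA and SSE4.2 as present and AVX2 as absent, so an AVX build would run but an AVX2 build would not.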

11000 Mips & 2700 FPU Mips - Per Core

**
An article that took some deep learning itself; anyway, very interesting.
HIP C++ will, we think, be simpler than OpenCL as a higher-level code port,
with CUDA code machine-converted to 99.6%.

http://www.anandtech.com/show/10831/amd-sc16-rocm-13-released-boltzmann-realized

**

Interesting examination of instruction infrastructure from x86 CISC to RISC

Code Efficiency etcetera...

http://scholarworks.wmich.edu/masters_theses/1519/

Direct link http://scholarworks.wmich.edu/cgi/viewcontent.cgi?article=2517&context=masters_theses

Compilers and Make compliant with SMT and other HPC Standards

https://cmake.org/

http://llvm.org/
http://llvm.org/docs/FAQ.html

https://gcc.gnu.org/

*not free obviously .. intel*
https://software.intel.com/en-us/articles/intel-advisor-roofline

**

*Compilers with FORTRAN specifics, preferably also C/C++ and HPC (C/C++ compatible with FORTRAN preferably)*

https://gcc.gnu.org/wiki/HomePage
https://gcc.gnu.org/wiki/GFortranBinaries

https://software.intel.com/en-us/intel-parallel-studio-xe/try-buy/#parallelstudioxe

http://www.pgroup.com/products/pgiworkstationg.htm (limitations: NVIDIA-compatible GPU CUDA code and no obvious statement of OpenCL support)

http://llvm.org/ - LLVM, it seems, has Fortran compatibility (needs research)
http://llvm.org/docs/FAQ.html

http://www.pathscale.com/ - check it out

Fortran specialists (no C++ etcetera)

https://www.absoft.com/products/windows-fortran-compiler-suite/
http://www.fortran.com/products-page/compilers/fortrantools-for-windows/

https://www.cs.sfu.ca/~fedorova/Teaching/CMPT886/Spring2007/papers/adaptive-execution.pdf

*ibm guidance*
http://www.prace-ri.eu/best-practice-guide-ibm-power-775-html/
https://www.redbooks.ibm.com/redbooks/pdfs/sg248280.pdf

Release code to use Power chips and emulation code embedded in BOINC mainframe 800 Core Multiplex

https://access.redhat.com/articles/3158511 - Power9 Edition RedHat
https://www.hpcwire.com/off-the-wire/ibm-releases-new-compilers-exploit-power9-technology/

https://www.ibm.com/blogs/systems/ibm-unveils-new-software-for-ai-machine-and-deep-learning/
https://www.ibm.com/spectrum-computing - HPC Management,workload and core efficiency
https://www-01.ibm.com/common/ssi/ShowDoc.wss?docURL=/common/ssi/rep_ca/3/897/ENUS216-043/index.html&request_locale=en

https://books.google.co.uk/books?id=NcrDAgAAQBAJ - cloud structures for science and business

PC/Mac/Windows/Linux/Android - high performance computation - the method and the means

http://science.n-helix.com/2018/09/hpc.html

http://science.n-helix.com/2018/09/hpc-pack-install-guide.html

https://www.khronos.org/news/events/2016-isc-high-performance

https://www.khronos.org/assets/uploads/developers/library/2008_siggraph_bof_opengl/OpenCL%20and%20OpenGL%20SIGGRAPH%20BOF%20Aug08.pdf HPC Report

*
http://www.ziti.uni-heidelberg.de/ziti/uploads/ce_group/2017-ISC.pdf - Overview of MPI message characteristics of HPC Server proxy applications.

*Interesting statistics from which one can conclude that 64 to 256 core units is the space within which,
The maximum increase in message noise/entropic noise; Related to inter process communication is observed.*

https://www.microsoft.com/en-us/download/details.aspx?id=54507 Microsoft HPC Pack 2016 including linux

https://technet.microsoft.com/en-us/library/cc514029(v=ws.11).aspx all HPC Packs 2016,2012 to 2008 info and download

https://msdn.microsoft.com/en-us/library/ff976568.aspx Microsoft High Performance Computing for Developers - info and downloads

https://docs.microsoft.com/en-us/azure/virtual-machines/windows/hpcpack-cluster-active-directory - information and virtualisation

https://www.openfabrics.org/

https://openhpc.community/downloads/

http://www.opencompute.org/

http://www.cray.com/blog/getting-new-intel-xeon-scalable-processors-hpc-workloads/ - details about intel arch in HPC workloads.

https://arxiv.org/pdf/1707.09414.pdf - Network data-load on GPU cluster node arrays - HPC performance.

https://www.hpcwire.com/2018/03/01/part-one-deep-dive-2018-trends-life-sciences-hpc/ - Life sciences; Cloud HPC

https://cs.lbl.gov/news-media/news/2018/a-game-changer-metagenomic-clustering-powered-by-supercomputers/

https://bitbucket.org/azadcse/hipmcl/ - cluster optimize code

https://azure.microsoft.com/en-us/resources/templates/slurm/

https://wikis.nyu.edu/display/NYUHPC/Slurm+Tutorial
https://wiki.hpc.uconn.edu/index.php/SLURM_Guide

Linux, Windows etcetera: HTCondor
https://research.cs.wisc.edu/htcondor/

Platform LSF will do everything you need. It runs on Windows. It is commercial, and can be purchased with support. https://www.ibm.com/it-infrastructure

https://stackoverflow.com/questions/3149131/please-recommend-an-alternative-to-microsoft-hpc

OpenVX for high performance Computing : Multi platform spec

"OpenVX for HPC Neural Nets and processing .... a new way to deliver on research, gaming & processing of data and images"

https://www.khronos.org/news/tags/tag/OpenVX

https://www.khronos.org/news/press/openvx-1.2-specification-cross-platform-acceleration-power-efficient-vision

https://www.ibm.com/blogs/research/2017/12/pruning-ai-networks/

https://arxiv.org/abs/1611.05162v4 - net-trim

A somewhat over-complex formula,
considering that the objective is to trim the network to as few nodes as necessary
to reduce complexity and improve performance.
(One may want to net-trim verbose complexity out of science and code generation.)

Open CL "GPU Development" links

https://www.khronos.org/blog/iwocl-where-you-learn-the-latest-on-opencl

https://www.khronos.org/opencl/

https://www.khronos.org/opencl/resources for SDK, learning & optimisation resources.

http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/

https://github.com/RadeonOpenCompute - ROCm: Platform for GPU Enabled HPC and UltraScale Computing

http://gpuopen.com/professional-compute/

http://gpuopen.com/compute-product/hcrng/

https://bitbucket.org/multicoreware/hcrng

http://gpuopen.com/compute-product/clrng/

installing the AMD SDK improves compute performance, Optimise your code !

https://streamhpc.com/blog/2017-05-21/amd-open-sourced-rocms-opencl-driver-stack/

https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/blob/amd-master/README.md

http://developer.amd.com/tools-and-sdks/opencl-zone/

http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/

http://gpuopen.com/games-cgi/

http://developer.amd.com/tools-and-sdks/graphics-development/

http://hgpu.org information and interesting learning & source

http://dspace.princeton.edu/jspui/bitstream/88435/dsp01wm117r22g/1/Jia_princeton_0181D_11168.pdf Optimisation for parallel computing information.

https://arxiv.org/pdf/1705.05249 - CLBlast: A Tuned OpenCL BLAS Library demonstration.

https://arxiv.org/pdf/1710.08616.pdf - FORTRAN for GPU and multiprocessor usage in scientific research;
also of interest for coding format, style, implementation and structure.

"The new implementation performs up to 4.9x faster when comparing one GPU to one
multi-core CPU socket. On a full-scale production run with 1581 x 1301 x 58
grid size and 2km resolution, 24 Tesla P100 GPUs are shown to replace more
than 50 18-core Broadwell Xeon sockets."

"GPUs are an attractive target architecture, with a memory bandwidth that is
typically 5 to 7 times higher than Intel Xeon architectures of a similar generation."

"Compared to CPUs, GPUs support a very high number of parallel threads while
having a very low thread switching overhead - however with the cost of small
caches available per thread and a low single-threaded performance."

LHC Cern 6 Track GPU Study < help needed...

https://lhcathome.cern.ch/lhcathome/index.php - coders desired.

RS

**

HIP - HSA - the CUDA Compatible C++ for Heterogeneous Computing

http://developer.amd.com/wordpress/media/2012/09/7637-HIP-Datasheet-V1_4-US-Letter.pdf

http://developer.amd.com/wordpress/media/2012/10/hsa10.pdf - a full guide

http://www.hsafoundation.com/

http://www.hsafoundation.com/hsa-developer-tools/

https://github.com/HSAFoundation/HSA-docs-AMD/wiki#initial-implementation

https://github.com/HSAFoundation/HSAIL-Tools

https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver - Driver for kernel

http://www.amd.com/Documents/SDN-Whitepaper.pdf - Smart Software Defined Networks

http://support.amd.com/TechDocs/55766_SEV-KM%20API_Spec.pdf - Secure Encrypted Virtualisation Key Management

http://support.amd.com/TechDocs/Protecting%20VM%20Register%20State%20with%20SEV-ES.pdf - PROTECTING VM REGISTER STATE WITH SEV-ES

http://support.amd.com/TechDocs/50742_15h_Models_60h-6Fh_BKDG.pdf - bios and kernel drivers

**

Machine Intelligence code optimization platforms

https://www.tensorflow.org/ - machine intelligence
https://github.com/tensorflow/tensorflow
https://github.com/hughperkins/tf-coriander - openCL Tensor flow

PyTorch - Machine learning with graphs, Tensor philosophy and Python - https://github.com/pytorch/pytorch - http://pytorch.org

Hyperdash python SDK - PyTorch
https://github.com/hyperdashio/hyperdash-sdk-py

Richard Herbert - Real-time Machine Learning with PyTorch and Filestack
https://blog.filestack.com/tutorials/realtime-machine-learning-pytorch/

"Kirill Dubovikov - Knowledge distiller, Data Scientist and Software Architect"
https://medium.com/towards-data-science/pytorch-vs-tensorflow-spotting-the-difference-25c75777377b

speed and data comparison
https://medium.com/@yaroslavvb/tensorflow-meets-pytorch-with-eager-mode-714cce161e6c

**

ARM Development software/SDK's & tools

https://developer.arm.com/products/software-development-tools

https://developer.arm.com/products/software-development-tools/hpc for high performance computing (ideal for Boinc)

https://developer.arm.com/products/software-development-tools/compilers for both HPC and APP development.

https://developer.arm.com/products/system-design/fixed-virtual-platforms

https://www.synopsys.com/verification/virtual-prototyping/vdk/vdk-for-arm.html

https://www.synopsys.com/designware-ip/technical-bulletin/designware-hybrid-ip.html

**

ARM Feature Sets

https://www.arm.com/products/processors/instruction-set-architectures/index.php

https://www.arm.com/products/processors/armv8-architecture.php

IOT links - (internet of things)

https://www.infoq.com/articles/thread-protocol-for-home-automation

http://wso2.com/wso2_resources/wso2_whitepaper_a-reference-architecture-for-the-internet-of-things.pdf

**

compiler optimisation - process

https://crd.lbl.gov/departments/computer-science/PAR/research/roofline/

https://www.nextplatform.com/2017/05/25/nersc-supercomputing-site-eases-path-optimization-scale/

https://www-ssl.intel.com/content/www/us/en/events/hpcdevcon/parallel-programming-track.html#utilizing

Linux arch reference material

https://www.ibm.com/developerworks/library/l-linuxuniversal/

**

Agency GPL

https://code.nasa.gov/

Workers :

https://www.upwork.com/hire/driver-development-freelancers/

Update 2:

For a comparison of GFLOPS/MIPS throughput of various BOINC tasks:

Here we show the relevance of the code or function used. AVX, for example, works on several data elements in parallel, and so does the FPU pipeline of the AMD FX and Ryzen processors.

http://bit.ly/HPCImpact (original non edited photos ...)

and set 2 (newer) http://bit.ly/2HPCImpact ....

set 3 http://bit.ly/HPCImpact2 to examine the improvement that code streamlining brings.

Some of our work with the updated graphics http://bit.ly/ReserchPhotos

see the work throughput GFlops compared to code efficiency per task !

Sometimes entropy is needed to fulfil the task, one would imagine (for example on Android): http://bit.ly/tRNG-Dev

The improvement of the BOINC and World Community Grid projects has been observed and noted, and, one feels, improved upon.

Further improvement should be implemented as soon as possible, to improve work-versus-output efficiency.

thank you kindly programmers/Workers & scientists for your perseverance & effort.

RS

http://bit.ly/BoincStudies - Result Studies

Update 3 Q & A:

"In reference to the use of VirtualBox, there is a new product by Berkeley called Singularity (http://singularity.lbl.gov/) that handles repeatable-condition containers and has low overhead for the virtualisation data set.

As to the particle spread, one should possibly consider the multiple-core and threaded-core model specific to the Ryzen and Intel sets.

One could imagine that the multi-threaded nature of ARM server cores, combined with the nature of multi-threaded and multi-headed ARM CPUs and GPU run-script environments, is a new and uncompromising land of opportunity and challenge.

Many of the instructions in the FMV4 and vector instruction sets have multi-threaded enaction at lower precision..."

http://fife.fnal.gov/singularity-on-the-osg/

RS

----

Eric Mcintosh accredited scientist Cern
Project administrator
Project developer
Project tester
Project scientist

"Well we are far from trying to optimise GPU code.

First let me explain that we have a tracking loop over turns
(up to 1,000,000 hoping for 10,000,000 soon) which contains
a large number of inner loops over particles, currently up to 64.

Luckily these loops over particles can be paralleled as each
particle is totally independent. In addition the original author F. Schmidt
pre-calculated everything possible before entering the tracking loop.
Each turn involves some 10,000 steps over a varying number of inner loops,

e.g. straight section, quadrupole, beam-beam interaction, power supply ripple, etc etc

Of which there are about 50 different possibilities. A straight section is really just
a multiply and add, whereas beam beam involves hundreds or more FLOP's.
The first idea would be to use a much larger number of particles to best
utilise the GPU. This however would produce a large amount of I/O and
use a lot of disk space, but maybe not insurmountable,

However all the code is FORTRAN, the outer loop calls subroutines (could inline), and has many tests/branches.
It would be great if the main loop fitted entirely into the GPU and we would have
rare Host access for I/O or BOINC checkpoint and progress calls or when
one or more particles are lost.

My colleague Ricardo is actively looking at redoing in C which would also allow
much more portability and also allow to be parallel on multi-core systems.
For the moment we just run tasks in parallel, which works rather well (apart
from some current infrastructure problems). I hope to come up with
some numbers next week on GPU testing.

The code itself has been regularly measured and optimised; for example we
re-ordered array indices to optimise memory access and rewrote the Error Function
of a Complex Number to be faster but with adequate precision.

Portability does come at a price but ensures accuracy of results. I shall publish
measurements in an upcoming paper. I am sure we gain much more from being portable
and being able to use almost any IEEE 754 compliant processor.

On the issue of SixTrack and/or experiments this will shortly be under discussion at
CERN I am sure. Currently SixTrack has many more Hosts/volunteers, is simple to install,
and has been around for 13 years. Not everyone loves VMbox. Not a big deal at
present as we rarely have enough SixTrack work to keep all volunteers busy.

I hope to re-address all this in some weeks after current BOINC infrastructure issues
are resolved and we have the new "super" sixtrack with much broader application
e.g.collimation studies and we support a much wider range of platforms MacOS ARM
and use features such as AVX.

Eric.

____________"
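The loop structure Eric describes (an outer loop over turns, steps around the ring, and an inner loop over independent particles) can be sketched in a few lines. This is a toy in Python, not SixTrack's FORTRAN and not its physics: the two element maps, the aperture check and all names are assumptions for illustration only.

```python
from typing import Callable, List, Tuple

Particle = Tuple[float, float]            # (position x, angle xp), toy phase space
Element = Callable[[Particle], Particle]  # one step of the ring, for one particle


def drift(length: float) -> Element:
    """Straight section: really just a multiply and an add."""
    def apply(p: Particle) -> Particle:
        x, xp = p
        return (x + length * xp, xp)
    return apply


def thin_quad(strength: float) -> Element:
    """Thin focusing kick (a stand-in for the heavier elements)."""
    def apply(p: Particle) -> Particle:
        x, xp = p
        return (x, xp - strength * x)
    return apply


def track(particles: List[Particle], lattice: List[Element],
          turns: int, aperture: float = 1.0) -> List[Particle]:
    alive = list(particles)
    for _ in range(turns):                  # outer loop over turns
        for element in lattice:             # steps within one turn
            # Inner loop over particles. Each particle is independent,
            # so this is the loop that could run in parallel.
            alive = [element(p) for p in alive]
        # Drop lost particles once per turn.
        alive = [p for p in alive if abs(p[0]) < aperture]
    return alive


lattice = [drift(1.0), thin_quad(0.5)]
bunch = [(0.001 * i, 0.0) for i in range(1, 65)]  # 64 particles, as in the letter
survivors = track(bunch, lattice, turns=100)
print(len(survivors), "of", len(bunch), "particles survive")
```

On a GPU the inner list comprehension is the part that would become one thread per particle, which is why a much larger particle count is the first idea mentioned for using the hardware well.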

Update 4 : Virtualisation

QEMU is obviously of use on many projects because of its machine emulation and virtualisation.

It comes in flavours including Windows, Mac and Linux.

http://www.qemu.org/

https://www.vmware.com/try-vmware.html - free products at the bottom
https://www.vmware.com/go/downloadplayer
https://www.vmware.com/go/get-free-esxi

Docker Server and Docker CE (Community Edition), and this comes with the server edition!

So what do the projects and systems feel about using Docker CE?

Obviously the professional version could be used for support of the main project, and the CE edition or Pro for the user.

https://store.docker.com/editions/community/docker-ce-desktop-windows

https://store.docker.com/search?offering=community&q=&type=edition

https://www.ctl.io/developers/blog/post/what-is-docker-and-when-to-use-it/

https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-getting-started

https://www.howtoforge.com/tutorial/how-to-use-docker-introduction/

VM and Microprocessor bug fixes incoming..
Hopefully microcode quickly also.

Creating a better virtualization header that is:

More efficient at isolating the contained OS with attributes in the OS's to contain secured data?
We find answers to improve efficiency and protect against VM>VM data transfer or to use this for a creative purpose!

We need answers! and science. : Microcode update
"Thank you for Google's firm responses to the bug; faith in Google is high.
Could the microcode be updated to flush and/or contain the speculative data in a data-cycle secure storage,
within the framework of cache and RAM/virtual RAM?
Cycle efficiency would be at most two cycles and a flush XOR bit data overlay.

Bit masking before and after pre-fetch presents and also uses data; this method would be fast! (c)Rupert S"

Google systems have been updated for Meltdown bug https://security.googleblog.com/2018/01/todays-cpu-vulnerability-what-you-need.html
Attack mitigation - https://support.google.com/faqs/answer/7622138#android

"Microsoft issued an emergency update today,
Amazon said it protected AWS customers running Amazon's tailored Linux version and will roll out the MSFT patch,
for other customers to day"

We need answers! and science. : Microcode : update

(c)RS

Spectre & Meltdown information

How to convert VMs and use Hyper-V and Docker

https://www.virtualbox.org/manual/ch10.html - compatibility

https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/quick-start/enable-hyper-v

https://www.groovypost.com/howto/migrate-virtual-box-vms-windows-10-hyper-v/

https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/nested-virtualization

https://hyperv.veeam.com/blog/nested-vitualization-hyperv/

https://superuser.com/questions/1144405/enable-virtualization-for-windows-10-pro-running-inside-virtualbox

Update 5 : IO Bottlenecks and solutions.

http://blog.scoutapp.com/articles/2011/02/10/understanding-disk-i-o-when-should-you-be-worried

http://www.violin-memory.com/blog/understanding-io-random-vs-sequential/

Drive Cache :

Even 128 MB of cache does wonders for #DataScience #storage;
we use 2 GB.

http://www.romexsoftware.com/en-us/primo-cache/index.html

#Cache to the #Drive: 300 MB/s

Update 6 : GPU driver OpenCL HPC #workload optimize flag comparison.

http://bit.ly/CLOptimizeSetting - Observe the results by #GPU number.
As we can observe, the flag doubles the speed of OpenCL output on average!
This flag is, we believe, available in the driver settings under General / Advanced.

Update 7 : CPU Score comparisons

CPU Comparisons by the LHC Project : Hot Topic

AMD Ryzen Threadripper 2990WX Phenomenal performance.

By this comparison, Hyper-Threading (H/T) seems a great thing on the 1950X; comparing the two chips (2950X / 1950X), cooling is possibly the issue, or something else.

Cern CPU Performance List

https://lhcathome.cern.ch/lhcathome/cpu_list.php

Update : GPU & Tasks - 7.14.2

Boinc GPU Tasks & Kernel fluctuation

Einstein GPU Work: GPU work for BOINC projects: project power: let's improve efficiency. RX560 power: max 30 W, min 5 W.

Einstein GPU Work:

Project graphics and power use video

For https://boinc.n-helix.com

& https://einsteinathome.org/about

https://is.gd/EinsteinGPU