silver71
March 07, 2014, 09:49:59 AM Last edit: March 07, 2014, 10:03:08 AM by silver71
TITLE : DIY A1 PCB WITH RASPBERRY PI AS CONTROLLER YACTO-IMAGE TURBO-MODE KERNEL DEVELOPERS GUIDELINE
DESCRIPTION : LIMITATIONS AND BOTTLENECKS regarding TURBO-MODE for RPi-based controllers - technical YACTO-IMAGE DEVELOPERS guideline
CONTENTS :
About Raspberry Pi, Transmission performance issues, Reception performance issues
RAW TEXT :
About the controller
The Raspberry Pi (model B), referred to in the rest of the text as RPi or RasPi, is a low-cost computer designed for educational purposes and developed by the Raspberry Pi Foundation charity. Its main component is a BCM2835 system-on-chip by Broadcom, which features an ARM1176JZF-S processor running at 700 MHz (up to 1 GHz when boosted) and a VideoCore IV GPU capable of high-definition video and OpenGL ES 2.0. It also mounts a LAN9512 by SMSC, a combined USB hub and Ethernet controller with 100 Mb/s Ethernet capability.
The board provides an Ethernet RJ-45 socket, two USB-HS type A ports, HDMI and composite video outputs, a stereo line/headphone socket, and an SDHC card slot. Most of the BCM2835 signals (GPIO, UART, I2C, SPI, PWM, display, camera, and so on) are exposed through a set of pin headers and the display and camera interface connectors. The model used for the benchmarks mounts 256 MiB of RAM.
The operating system is usually some flavor of Linux. There are at least 6-7 different choices, the most widely used being Raspbian, a Debian-based distribution with specific support for the Raspberry Pi. The platform is controlled through an SSH connection, which has a negligible impact on performance. No other user software or services are running besides the SSH connection and the benchmark executables.
Here are some RPi benchmarks that give more insight into Rx/Tx issues, that is, the problems that might occur while the RPi talks to DIY boards and A1 chips.
Unlike the R2P_GW benchmarks, which measured single threads, CPU usage here was collected as aggregate values from /proc/stat, since the network stack runs at the system and interrupt levels.
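For anyone who wants to repeat this kind of measurement, here is a minimal sketch of how aggregate CPU shares can be derived from two samples of the first "cpu" line in /proc/stat. The function names are my own, not from the benchmark code, and the field order is the one documented in proc(5).

```python
# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the aggregate CPU counters (in clock ticks) from /proc/stat text."""
    for line in stat_text.splitlines():
        parts = line.split()
        # The aggregate line is labeled exactly "cpu"; per-core lines are "cpu0", "cpu1", ...
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_shares(before, after):
    """Percentage of elapsed ticks spent in each state between two samples."""
    deltas = {k: after.get(k, 0) - before.get(k, 0) for k in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return {k: 100.0 * d / total for k, d in deltas.items()}


def sample(path="/proc/stat"):
    """Read one sample from the live system (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.read())
```

Take one sample() before and one after the benchmark run and pass both to cpu_shares(). The "system", "irq" and "softirq" shares correspond roughly to the system-call and interrupt figures quoted in this post; how the original benchmark grouped them is not known to me.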
Transmission performance issues
As in the R2P_GW benchmark, the RasPi generates /benchmark/output messages at the maximum achievable rate, with no forced delays. Timeouts are disabled, and only a single message resides in memory, which the message loop streams repeatedly.
The platform can saturate the host receiver at 20000 msg/s when the message size is not greater than 200 B. CPU usage stays below 50%, mainly from system processes (around 25%) and the topic handler (around 15%). As the message size increases from 8 B to 200 B, the share of (software) interrupt requests grows, but stays below 10%.
Between 200 B and 500 B per message there is a sudden increase in CPU usage: the CPU becomes saturated by interrupts and system processes, which limits the throughput to 13000 msg/s. The topic handler stays around 15%, which means that the Linux network stack, not the application, accounts for most of the load in these circumstances.
With messages larger than 500 B there are no considerable changes in CPU usage. At 10000 msg/s interrupts have a share of 40% and system calls 55%, while the topic handler uses less than 5% of the CPU. The bandwidth gets close to 100 Mb/s but is still not reached at 10000 msg/s; the CPU is the limit here, as the idle time stays around 1% without growing.
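To put the transmission figures in perspective, here is a quick back-of-the-envelope calculation of the payload bandwidth for the rates quoted above. It counts payload bytes only and ignores protocol headers (the transport is not specified in this post), and the 1000 B size in the last row is just an example value I picked for the "larger than 500 B" range.

```python
LINK_MBPS = 100.0  # nominal Fast Ethernet rate of the RPi, as stated in the text


def payload_mbps(msgs_per_s, msg_bytes):
    """Payload bandwidth in Mb/s (headers and framing overhead not counted)."""
    return msgs_per_s * msg_bytes * 8 / 1e6


def msg_bytes_to_fill_link(msgs_per_s, link_mbps=LINK_MBPS):
    """Payload size at which the given message rate would fill the link."""
    return link_mbps * 1e6 / (8 * msgs_per_s)


# (rate, size) pairs: the first two come from the text, the third size is an example.
for rate, size in ((20000, 200), (13000, 500), (10000, 1000)):
    print(f"{rate} msg/s x {size} B = {payload_mbps(rate, size):.1f} Mb/s payload")

print(f"10000 msg/s fills the link at {msg_bytes_to_fill_link(10000):.0f} B per message")
```

So at 20000 msg/s and 200 B the link carries only about 32 Mb/s of payload, and at 13000 msg/s and 500 B about 52 Mb/s. In both cases the wire is far from full, which is consistent with the CPU, not the Ethernet link, being the limiting factor.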
Reception performance
The reception performance was measured by streaming messages at 14000 msg/s, the maximum achievable by the host computer. As with R2P_GW, reception was evaluated in two ways: first by buffering each new incoming message, and then by processing the incoming message stream on the fly, skipping its contents.
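The difference between the two reception modes can be sketched as follows. This is only an illustration: the 4-byte length-prefix framing and the function names are hypothetical, since the actual wire format of the benchmark is not described here, and an in-memory stream stands in for the network socket.

```python
import io
import struct

# Hypothetical framing: each message is a 4-byte big-endian length plus payload.
HEADER = struct.Struct("!I")


def frame(payload):
    """Prefix a payload with its length."""
    return HEADER.pack(len(payload)) + payload


def receive_buffered(stream):
    """Buffered mode: copy every payload into memory and keep it."""
    messages = []
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # end of stream
        (length,) = HEADER.unpack(header)
        messages.append(stream.read(length))
    return messages


def receive_skipping(stream):
    """On-the-fly mode: read only the header and skip the payload bytes."""
    count = 0
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # end of stream
        (length,) = HEADER.unpack(header)
        # seek() works on seekable streams such as BytesIO; on a real socket
        # the payload would instead be read into a reusable scratch buffer.
        stream.seek(length, io.SEEK_CUR)
        count += 1
    return count


data = frame(b"abc") + frame(b"x" * 200)
print(len(receive_buffered(io.BytesIO(data))), receive_skipping(io.BytesIO(data)))
```

The buffered variant pays for an allocation and a copy per message, while the skipping variant touches only the header, which is the kind of difference the two sets of figures in this section are meant to expose.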
With buffered reception, up to 100 B per message the platform can receive all of the messages with low effort. The CPU is idle for more than 40% of the time, with the topic handler using less than 20% of the CPU time and system calls less than 40%. There is an unexplained dip in the share of system calls at a message size of 50 B, probably caused by some kernel optimization. The throughput stays at the maximum.
Between 100 B and 500 B, the CPU usage of interrupts rises above 40% and the CPU becomes saturated. The topic handler and system calls do not show significant changes in their impact. Beyond 200 B per message the throughput starts decreasing, but is still above 13000 msg/s.
With a message size beyond 500 B, the bandwidth is completely used. Software interrupts take about 10% of the CPU, system calls stay around 35%, and the topic handler drops to as low as 10%. The idle time goes back to almost 50%.
The results for on-the-fly reception show that below 100 B per message, the platform can again receive all of the messages with low effort.
The CPU is idle for 60% of the time; system calls take less than 30%, the topic handler 10%, and (software) interrupts the rest. Again, there is an unexplained dip in system call usage at a size of 50 B.
Between 100 B and 500 B per message, the CPU usage of system calls and interrupts increases up to 45% and 35% respectively, while the topic handler stays slightly above 10%.
With a message size greater than 500 B, the bandwidth reaches the 100 Mb/s limit and the CPU load decreases. Software interrupts stay steadily below 10%, as does the topic handler, which keeps decreasing. System calls go down to 30%, and the idle time almost reaches 60% again.
CONCLUSION :
This matters when programming the kernel for overclocking the boards and when writing mining software for the RPi: currently, the more clock and power you bring to the boards (regardless of cooling), the more errors you get. The measurements above may therefore serve as a guideline, something to keep in mind for future firmware and kernel improvements.
In other words, these are RPi LIMITATIONS, not A1 chip limitations. To boost the hashing speed of desk(s) and rig(s) based on this chip (Coincraft A1) and this PCB design (DIY 2xA1 board), the problem needs to be circumvented or exploited. It is a software issue, though of course it also depends on the PCB design.
The way the RPi communicates with daisy-chained chips and boards populated with A1 chips is the principal bottleneck on the way to TURBO-mode deployment. Cooling is only a technical limitation in the time domain (long-term non-stop operation of desk(s) and rig(s)) and should not be an issue for a short speed-trial boost as a proof of concept, although currently it mostly is.
THIS IS NOT A SOLUTION, BUT A GUIDELINE FOR SOMEONE TO FIND ONE.