Pi of 1000 decimal digits calculation speed

This is a translation of a thread from the Gadget Renesas Japanese site.
Title: Pi of 1000 decimal digits calculation speed
Author: fujita nozomu
Posted: 28 Sep 2013 7:09
__________________________________________________________________________
 
  • GR-SAKURA
  • GR-KURUMI
  • Arduino Uno R3
  • Arduino Due
  • Japanino
 
I ran the following sketch on each of the boards above to compare how long each one takes to calculate pi to 1000 digits after the decimal point.
__________________________________________________________________________
 
 
#if defined(__AVR_ATmega168__) || defined(__AVR_ATmega328P__) || defined(__AVR_ATmega1280__) || defined(__AVR_ATmega2560__) || defined(__AVR_ATmega32U4__)
    #if defined(ARDUINO) && ARDUINO >= 100
        #include <Arduino.h>
    #else
        #include <WProgram.h>
    #endif
    #define _ARDUINO_AVR_ 1
#elif defined(__SAM3X8E__)
    #include <Arduino.h>
    #define _ARDUINO_DUE_ 1
#elif defined(__RX__)
    #include <rxduino.h>
    #define _GR_SAKURA_ 1
#elif defined(__RL78__)
    #include <RLduino78.h>
    #define _GR_KURUMI_ 1
#else
    #error unknown target.
#endif
 
#define DIGITS 1000
#define BUFBITS (int(/*log(10)/log(2)*/3.32192809489 * DIGITS) + 1)
 
static uint8_t buf[(BUFBITS + 7) / 8];
 
static uint8_t pi_dig16(unsigned n);
 
void setup()
{
    Serial.begin(9600);
 
    unsigned long start = millis();
    for (int i = 0; i < ((BUFBITS + 3) / 4); i++) {
        if ((i & 1) == 0) {
            buf[i / 2] = pi_dig16(i) << 4;
        } else {
            buf[i / 2] |= pi_dig16(i);
        }
    }
    unsigned long time = millis() - start;
 
    Serial.println("PI = 3.");
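    // Convert the binary fraction in buf[] to decimal: multiplying the whole
    // buffer by 10 pushes the next decimal digit out as the final carry.
    for (int i = 0; i < DIGITS; i++) {
        int c = 0;  // the carry must start at zero for every digit
        for (int j = (BUFBITS + 7) / 8 - 1; j >= 0; j--) {
            c += 10 * buf[j];
            buf[j] = c % 256;
            c /= 256;
        }
        Serial.print(c);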
        if ((i + 1) ==  DIGITS || i % 50 == 49) {
            Serial.println();
        } else if (i % 10 == 9) {
            Serial.print(" ");
        }
    }
    Serial.println();
    Serial.print(time);
    Serial.println("msec.");
}
 
void loop()
{
}
 
#define main _main
 
#if _ARDUINO_AVR_
#define lrintf lrint
#define floorf floor
#undef __AVR__
#endif
 
#if _GR_SAKURA_
#define _STDINT_H
#endif
 
/*
Copy and paste the program file pi.c listed at the link below after this comment.
http://www.mikrocontroller.net/articles/4000_Stellen_von_Pi_mit_ATtiny2313
*/
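
To make the buffer arithmetic in the sketch concrete, here is a minimal standalone sketch in plain C++ (not Arduino code) that evaluates the same DIGITS and BUFBITS expressions. The names kBufBytes and kHexDigits are labels added here for illustration; the original sketch only uses the expressions inline.

```cpp
// Same constants as the sketch above.
constexpr int kDigits = 1000;
// About log2(10) = 3.32192809489 bits are needed per decimal digit.
constexpr int kBufBits = int(3.32192809489 * kDigits) + 1;

// Hypothetical names for the two inline expressions used in the sketch.
constexpr int kBufBytes  = (kBufBits + 7) / 8;  // size of buf[] in bytes
constexpr int kHexDigits = (kBufBits + 3) / 4;  // number of pi_dig16() calls

// The last hex digit must still land inside the buffer.
static_assert((kHexDigits - 1) / 2 < kBufBytes, "buffer too small");
```

With DIGITS set to 1000 this gives 3322 bits, so the timed loop calls pi_dig16() 831 times and packs the results, two hex digits per byte, into a 416-byte buffer.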
__________________________________________________________________________
Comment: fujita nozomu
Posted: 28 Sep 2013 7:34
 
 
The results are as follows:
 
| Board name | On-board MCU | Operating frequency (MHz) | Binary file size (bytes) | Execution time (ms) |
|---|---|---|---|---|
| GR-SAKURA | Renesas RX63N R5F563NBDDFP | 96 | 39,612 | 20,298 |
| GR-KURUMI | Renesas RL78/G13 R5F100GJAFB | 32 | 62,172 | 427,235 |
| Arduino Uno R3 | Atmel AVR ATmega328 | 16 | 4,808 | 234,335 |
| Arduino Due | Atmel SAM3X8E ARM Cortex-M3 | 84 | 12,548 | 47,113 |
| Japanino (the hands-on board bundled with Otonanokagaku Magazine Vol. 27) | Atmel AVR ATmega168V | 8 | 4,814 | 469,192 |

__________________________________________________________________________
Comment: fujita nozomu
Posted: 28 Sep 2013 7:53
 
Random comments:
  • Execution time on GR-SAKURA was better than expected, maybe because it is the only board in this comparison with a hardware floating-point unit.
  • The GR-KURUMI binary is huge, most likely because the compiler generates inefficient code and the current linker pulls in unnecessary objects.
  • GR-KURUMI was slow because the code generated by the current compiler (GCC) is extremely inefficient. I suspect performance would improve (by about 1/4 to 1/3) with the official CubeSuite+ or IAR compilers, but I have not confirmed this.
  • I had expected Arduino Due to do much better against Arduino Uno, but was disappointed. I thought it would be at least 10 times faster, but it was not…
  • The Arduino Uno and Due binaries are smaller than the GR binaries. GR should follow suit…
  • Adding Japanino to the lineup had one great result: it saved GR-KURUMI from coming in last in performance! I’m so glad I bought that science magazine for geeks (Otonanokagaku)!
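
One way to check the Due versus Uno impression is to normalize the times by clock frequency. The following is only a rough sketch in plain C++: the Result struct and the kilocycles() helper are names made up for this illustration, the figures are copied from the results table above, and multiplying time by clock frequency ignores flash wait states and other board differences.

```cpp
#include <cstdint>

// One row of the results table above (floating-point version).
struct Result {
    const char* board;
    uint32_t mhz;      // operating frequency (MHz)
    uint32_t time_ms;  // measured execution time (ms)
};

// ms * MHz = thousands of clock cycles spent on the calculation.
// A smaller value means more work done per clock.
constexpr uint64_t kilocycles(const Result& r) {
    return static_cast<uint64_t>(r.time_ms) * r.mhz;
}

constexpr Result kResults[] = {
    {"GR-SAKURA",      96,  20298},
    {"GR-KURUMI",      32, 427235},
    {"Arduino Uno R3", 16, 234335},
    {"Arduino Due",    84,  47113},
    {"Japanino",        8, 469192},
};
```

By this measure the Due spends about 3.96 billion cycles against about 3.75 billion for the Uno R3, so its roughly 5x shorter time tracks its 5.25x higher clock almost exactly. GR-SAKURA needs about 1.95 billion cycles and GR-KURUMI about 13.67 billion.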
 
 
__________________________________________________________________________
Comment: fujita nozomu
Posted: 22 Oct 2013 11:43
 
 

I had a couple of boards that I had bought for a great price but never used, so I added them to the lineup.

I also made a few changes to the sketch.


#if defined(__AVR_ATmega168__) || defined(__AVR_ATmega328P__) || defined(__AVR_ATmega1280__) || defined(__AVR_ATmega2560__) || defined(__AVR_ATmega32U4__)
    #if defined(ARDUINO) && ARDUINO >= 100
        #include <Arduino.h>
    #else
        #include <WProgram.h>
    #endif
    #define _ARDUINO_AVR_ 1
#elif defined(ARDUINO) && defined(__SAM3X8E__)
    #include <Arduino.h>
    #define _ARDUINO_DUE_ 1
#elif defined(__RX__)
    #include <rxduino.h>
    #define _GR_SAKURA_ 1
#elif defined(__RL78__)
    #include <RLduino78.h>
    #define _GR_KURUMI_ 1
#elif defined(ENERGIA)
    #include <Energia.h>
    #if defined(__MSP430G2452__) || defined(__MSP430G2553__)
        #define _LAUNCHPAD_VALUELINE_ 1
    #elif defined(__LM4F120H5QR__)
        #define _LAUNCHPAD_STELLARIS_ 1
    #else
        #error unknown target.
    #endif
#else
    #error unknown target.
#endif
 
#define DIGITS 1000
#define BUFBITS (int(/*log(10)/log(2)*/3.32192809489 * DIGITS) + 1)
 
#if !_LAUNCHPAD_VALUELINE_
static uint8_t buf[(BUFBITS + 7) / 8];
#endif
 
static uint8_t pi_dig16(unsigned n);
 
void setup()
{
    Serial.begin(9600);
 
    unsigned long start = millis();
    for (int i = 0; i < ((BUFBITS + 3) / 4); i++) {
#if !_LAUNCHPAD_VALUELINE_
        if ((i & 1) == 0) {
            buf[i / 2] = pi_dig16(i) << 4;
        } else {
            buf[i / 2] |= pi_dig16(i);
        }
#else
        pi_dig16(i);
#endif
    }
    unsigned long time = millis() - start;
 
    Serial.println("PI = 3.");
#if !_LAUNCHPAD_VALUELINE_
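    // Convert the binary fraction in buf[] to decimal: multiplying the whole
    // buffer by 10 pushes the next decimal digit out as the final carry.
    for (int i = 0; i < DIGITS; i++) {
        int c = 0;  // the carry must start at zero for every digit
        for (int j = (BUFBITS + 7) / 8 - 1; j >= 0; j--) {
            c += 10 * buf[j];
            buf[j] = c % 256;
            c /= 256;
        }
        Serial.print(c);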
        if ((i + 1) ==  DIGITS || i % 50 == 49) {
            Serial.println();
        } else if (i % 10 == 9) {
            Serial.print(" ");
        }
    }
#endif
    Serial.println();
    Serial.print(time);
    Serial.println("msec.");
}
 
void loop()
{
}
 
#define main _main
 
#if _ARDUINO_AVR_
#define lrintf lrint
#define floorf floor
#undef __AVR__
#endif
 
#if _GR_SAKURA_
#define _STDINT_H
#endif
 
#if _LAUNCHPAD_VALUELINE_
#define lrintf (int)rintf
#define stdout (0)
#define fputc(x, y) ((void)x, (void)y)
#define fflush(x) ((void)x)
#endif
 
/*
Copy and paste the program file pi.c listed at the link below after this comment.
http://www.mikrocontroller.net/articles/4000_Stellen_von_Pi_mit_ATtiny2313
*/
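
The print loop in the sketch turns the packed hexadecimal fraction into decimal digits by repeatedly multiplying the whole buffer by 10. As a minimal sketch of that idea in standard C++ (no Arduino dependencies), the function below does the same on an arbitrary byte buffer. The names fraction_to_decimal() and demo_pi_digits() are invented for this example, and the demo bytes are the well-known first 16 hexadecimal digits of pi after the point (3.243F6A8885A308D3...), not output taken from pi.c.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Convert a binary fraction (big-endian bytes after the radix point) into
// its first `digits` decimal digits. The buffer is overwritten in place,
// just like buf[] in the sketch.
std::string fraction_to_decimal(uint8_t* frac, std::size_t len, int digits) {
    std::string out;
    for (int i = 0; i < digits; i++) {
        unsigned carry = 0;  // fresh carry for every decimal digit
        for (std::size_t j = len; j-- > 0;) {  // least significant byte first
            carry += 10u * frac[j];
            frac[j] = static_cast<uint8_t>(carry % 256);
            carry /= 256;
        }
        // What spills out of the top byte is the integer part: the next digit.
        out += static_cast<char>('0' + carry);
    }
    return out;
}

// Demo input: 64 bits of the fractional part of pi.
std::string demo_pi_digits() {
    uint8_t frac[] = {0x24, 0x3F, 0x6A, 0x88, 0x85, 0xA3, 0x08, 0xD3};
    return fraction_to_decimal(frac, sizeof frac, 15);
}
```

Sixty-four bits carry a little over 19 decimal digits, so asking for 15 stays within what the input can support. This is the same reason the sketch sizes its buffer at about 3.32 bits per decimal digit.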

Results:

| Board name | On-board MCU | Operating frequency (MHz) | Development tool (compiler) | Binary file size (bytes) | Execution time (ms) |
|---|---|---|---|---|---|
| GR-SAKURA | Renesas RX63N R5F563NBDDFP | 96 | Web Compiler V1.43 (GNURX Toolchain v12.03) | 39,612 | 20,298 |
| GR-KURUMI | Renesas RL78/G13 R5F100GJAFB | 32 | Web Compiler V1.43 (GNURL78 Toolchain v13.01) | 62,172 | 427,235 |
| Arduino Uno R3 | Atmel AVR ATmega328 | 16 | Arduino IDE 1.52 (gcc version 4.3.2 (WinAVR 20081205)) | 4,808 | 234,335 |
| Arduino Due | Atmel SAM3X8E (ARM Cortex-M3) | 84 | Arduino IDE 1.52 (gcc version 4.4.1 (Sourcery G++ Lite 2010q1-188)) | 12,548 | 47,113 |
| Japanino (the hands-on board bundled with Otonanokagaku Magazine Vol. 27) | Atmel AVR ATmega168V | 8 | Arduino IDE 0018 (gcc version 4.3.2 (WinAVR 20081205)) | 4,814 | 469,192 |
| MSP430 LaunchPad Value Line Development kit | Texas Instruments MSP430G2553 | 16 | Energia 0101E0010 (gcc version 4.6.3 20120301 (mspgcc LTS 20120406 unpatched) (MSPGCC 20120406 (With patches: sf3540953 sf3559978))) | 7,152 | 707,340 |
| Stellaris LM4F120 LaunchPad Evaluation Kit | Texas Instruments EK-LM4F120XL (ARM Cortex-M4F) | 80 | Energia 0101E0010 (gcc version 4.7.1 (GCC)) | 4,344 | 31,812 |

Random comments:

  • I assume the large difference in execution time between Arduino Due and the Stellaris LM4F120 LaunchPad Evaluation Kit, despite their similar clock speeds, reflects whether or not the MCU has a hardware floating-point unit.
  • Still, I was amazed by GR-SAKURA (RX63N), which was the clear winner in execution time.
  • I’m not sure how well the MSP430 compiler optimizes its output, but judging by the binary size, it can’t be too bad.
  • Once again, I found the binaries for both GR-SAKURA and GR-KURUMI simply too big.
  • After seeing the execution time for the MSP430 LaunchPad Value Line Development kit, the GR-KURUMI compiler doesn’t look so bad! But that is surely just an illusion.
  • The thing is, I have been measuring everything with millis(), but I’m pretty sure millis() counts 1.007 ms as 1 ms. Something isn’t quite right!
  • AVR-LIBC, the library used by Arduino Uno R3 and Japanino, includes floating-point routines that are hand-optimized in assembly. The other boards that do floating point in software probably use newlib or something similar, written in C and built by the C compiler.
    I figure Arduino Uno R3 and Japanino show decent execution times, despite their supposedly underpowered 8-bit processor, thanks to these streamlined libraries. If the manufacturer developed something similar to AVR-LIBC for RL78, the GR-KURUMI execution time would improve significantly. Yet it would take a lot of effort and investment by manufacturers to make any progress.
  • It’s about time I stopped posting self-satisfying comments that are of little use to anyone else and just started my own blog!
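
To show what the millis() remark would mean for the numbers, here is a small sketch in plain C++. It assumes, as the comment above suspects but does not confirm, that one millis() count really lasts 1.007 ms. The thread does not say which boards this applies to, and the names kMicrosPerCount and elapsed_us() are invented for this illustration.

```cpp
#include <cstdint>

// Unverified assumption taken from the comment above: one millis() count
// really lasts 1.007 ms, i.e. 1007 microseconds.
constexpr uint32_t kMicrosPerCount = 1007;

// Convert a millis() reading into elapsed microseconds under that assumption.
// Integer math avoids floating-point rounding in the result.
constexpr uint64_t elapsed_us(uint32_t millis_reading) {
    return static_cast<uint64_t>(millis_reading) * kMicrosPerCount;
}
```

Under this assumption every reading understates the real time by 0.7%. For example, a reading of 20,298 would correspond to 20,440,086 microseconds, or about 20,440 ms.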
 
__________________________________________________________________________
Comment: fujita nozomu
Posted: 1 Nov 2013 20:02
 
 

I recently found out that the RL78 compiler comparison had replaced the floating-point processing with fixed-point arithmetic, so I decided to run some tests with a fixed-point version as well.

The results are as follows:

| Board name | On-board MCU | Operating frequency (MHz) | Development tool (compiler) | Fixed-point binary file size (bytes) | Fixed-point execution time (ms) | Floating-point binary file size (bytes) | Floating-point execution time (ms) |
|---|---|---|---|---|---|---|---|
| GR-SAKURA | Renesas RX63N R5F563NBDDFP | 96 | Web Compiler V1.43 (GNURX Toolchain v12.03) | 33,640 | 21,977 | 39,612 | 20,298 |
| GR-KURUMI | Renesas RL78/G13 R5F100GJAFB | 32 | Web Compiler V1.43 (GNURL78 Toolchain v13.01) | 50,672 | 174,142 | 62,172 | 427,235 |
| Arduino Uno R3 | Atmel AVR ATmega328 | 16 | Arduino IDE 1.52 (gcc version 4.3.2 (WinAVR 20081205)) | 3,454 | 197,338 | 4,808 | 234,335 |
| Arduino Due | Atmel SAM3X8E (ARM Cortex-M3) | 84 | Arduino IDE 1.52 (gcc version 4.4.1 (Sourcery G++ Lite 2010q1-188)) | 10,688 | 38,834 | 12,548 | 47,113 |
| Japanino (the hands-on board bundled with Otonanokagaku Magazine Vol. 27) | Atmel AVR ATmega168V | 8 | Arduino IDE 0018 (gcc version 4.3.2 (WinAVR 20081205)) | 3,258 | 395,118 | 4,814 | 469,192 |
| MSP430 LaunchPad Value Line Development kit | Texas Instruments MSP430G2553 | 16 | Energia 0101E0010 (gcc version 4.6.3 20120301 (mspgcc LTS 20120406 unpatched) (MSPGCC 20120406 (With patches: sf3540953 sf3559978))) | 2,662 | 193,906 | 7,152 | 707,340 |
| Stellaris LM4F120 LaunchPad Evaluation Kit | Texas Instruments EK-LM4F120XL (ARM Cortex-M4F) | 80 | Energia 0101E0010 (gcc version 4.7.1 (GCC)) | 3,960 | 34,381 | 4,344 | 31,812 |
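
The gap between the two versions is easier to see as a ratio per board. The following is a small sketch in plain C++ using the execution times from the results above; the Timing struct and float_to_fixed_ratio() are names invented for this illustration.

```cpp
#include <cstdint>

// Execution times (ms) for the fixed-point and floating-point versions.
struct Timing {
    const char* board;
    uint32_t fixed_ms;
    uint32_t float_ms;
};

// Floating-point time divided by fixed-point time.
// Above 1: the fixed-point version was faster. Below 1: it was slower.
constexpr double float_to_fixed_ratio(const Timing& t) {
    return static_cast<double>(t.float_ms) / t.fixed_ms;
}

constexpr Timing kTimings[] = {
    {"GR-SAKURA",                    21977,  20298},
    {"GR-KURUMI",                   174142, 427235},
    {"Arduino Uno R3",              197338, 234335},
    {"Arduino Due",                  38834,  47113},
    {"Japanino",                    395118, 469192},
    {"MSP430 LaunchPad Value Line", 193906, 707340},
    {"Stellaris LM4F120 LaunchPad",  34381,  31812},
};
```

The ratios come out at roughly 0.92 for GR-SAKURA and the Stellaris LaunchPad, about 1.19 for the two AVR boards, about 1.21 for Arduino Due, about 2.45 for GR-KURUMI and about 3.65 for the MSP430 LaunchPad.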

Random comments:

  • GR-SAKURA (RX63N) is fast. That is all there is to it!
  • GR-SAKURA and the Stellaris LM4F120 LaunchPad Evaluation Kit: the fixed-point version ran slower than the float version. This is basically because both MCUs support floating-point operations in hardware, so floating point carries no penalty. In addition, division became more expensive, because the fixed-point version routes its divisions through the functions div_a() and div_b(). Since GR-SAKURA (RX63N) has an integer division instruction, I tried changing the body of div_a() to return (int32_t)(((long long)a << DecimalPoint) / b) and div_b() to return (1UL << a) / b. Execution time improved to 19,988 milliseconds, which is a bit faster than the float version.
  • Arduino Uno R3 and Japanino don’t show a great difference between floating-point and fixed-point execution times, most likely because the AVR library’s floating-point support routines are written quite efficiently.
  • GR-KURUMI, Arduino Uno R3, and the MSP430 LaunchPad Value Line Development kit all carry MCUs of comparable classes, from 8 to 16 bits. All three show similar execution times for fixed-point operations. (Still, what is up with GR-KURUMI… 32MHz and this is all it can do?)
  • While Arduino Uno R3 and Japanino did not show any significant difference in execution time between the floating-point and fixed-point versions, the fixed-point execution time for GR-KURUMI was less than half that of the floating-point version. This indicates that the floating-point performance of RL78 gcc for GR-KURUMI is not good, and that of the MSP430 LaunchPad Value Line Development kit is even worse. None of Arduino Uno R3, GR-KURUMI, and the MSP430 LaunchPad Value Line Development kit has floating-point hardware, so floating-point operations are implemented with integer operations, and all three boards show similar integer (fixed-point) performance. Therefore, I see no reason why the float processing for GR-KURUMI and the MSP430 LaunchPad Value Line Development kit could not be improved to approach the level of AVR. Still, that is up to the manufacturers, and whether they can afford the effort and cost required. I also think that floating-point performance is not the most important feature of 8 to 16 bit MCUs.
  • Japanino came in handy when I was evaluating the float version, as a free extra board to add to the lineup. This time, though, it is outnumbered by the other boards. The truth is, Japanino uses an MCU similar to the one in Arduino Uno R3 and runs at half the clock frequency, so I knew from the start that it would run at about half the speed. Now I feel kind of bad about keeping Japanino in the lineup, like I was comparing apples and oranges on purpose. I will try to mend my ways in the future!
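
The two replacement expressions quoted in the comments can be tried out on a PC. The following is a minimal sketch in plain C++, not the code from pi.c: the function signatures are assumed, and DecimalPoint is set to a hypothetical 16 fractional bits because the real value is defined in pi.c and not shown in this thread.

```cpp
#include <cstdint>

// Hypothetical number of fractional bits; the real value lives in pi.c.
constexpr int DecimalPoint = 16;

// Fixed-point quotient a / b, where a and b share the same scale.
// The dividend is widened to 64 bits so the pre-shift cannot overflow.
// Assumed signature; a is taken to be non-negative in this sketch.
int32_t div_a(int32_t a, int32_t b) {
    return (int32_t)(((long long)a << DecimalPoint) / b);
}

// 2^a divided by b with a single unsigned integer division.
// Assumed signature; a must be smaller than the bit width of unsigned long.
uint32_t div_b(unsigned a, uint32_t b) {
    return (uint32_t)((1UL << a) / b);
}
```

With 16 fractional bits, div_a(3 << 16, 2 << 16) returns 98304, which is 1.5 in that format, and div_b(16, 4) returns 16384. On an MCU with an integer divide instruction, each call compiles down to a shift and a hardware division instead of a software division loop.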
 
 