Sound Source and Audio Electronics Projects (Planned): Comparing Single-Precision Floating-Point Speed on Arduino, Nucleo, and PSoC

Friday, June 2, 2017

Board	MPU	Core	Clock
Arduino Uno R3	ATmega328P	AVR (8-bit)	16MHz
PSoC 4 Pioneer Kit	PSoC 4	Cortex-M0	48MHz
PSoC 5LP Prototyping Kit	PSoC 5 LP	Cortex-M3	80MHz
Nucleo F401RE	STM32F401	Cortex-M4	84MHz
Nucleo F446RE	STM32F446	Cortex-M4	180MHz

The ATmega is an 8-bit MPU. The Cortex-M0 and Cortex-M3 have no FPU; the Cortex-M4 has one.

Arduino sketch
FloatingPointTest.ino

#define LOOP_N  (100)

float buffer[LOOP_N];

void setup()
{
  char strBuffer[80];
  unsigned long start, end, elapse;

  Serial.begin(9600);
  Serial.print("\r\nFloating Point Test\r\n");
  sprintf(strBuffer, "LOOP_N\t%d\r\n", LOOP_N);
  Serial.print(strBuffer);
  Serial.print("op\tstart\tend\telapse\r\n");
  Serial.print("-------------------------------\r\n");

  // div
  start = micros();
  for (int i = 0; i < LOOP_N; i++) {
    buffer[i] = (float)i / LOOP_N;
  }
  end = micros();
  elapse = end - start;
  sprintf(strBuffer, "div\t%lu\t%lu\t%lu\r\n", start, end, elapse);
  Serial.print(strBuffer);

  // sinf
  start = micros();
  for (int i = 0; i < LOOP_N; i++) {
    buffer[i] = sinf((float)i / LOOP_N);
  }
  end = micros();
  elapse = end - start;
  sprintf(strBuffer, "sinf\t%lu\t%lu\t%lu\r\n", start, end, elapse);
  Serial.print(strBuffer);
  
  // cosf
  start = micros();
  for (int i = 0; i < LOOP_N; i++) {
    buffer[i] = cosf((float)i / LOOP_N);
  }
  end = micros();
  elapse = end - start;
  sprintf(strBuffer, "cosf\t%lu\t%lu\t%lu\r\n", start, end, elapse);
  Serial.print(strBuffer);

  // expf
  start = micros();
  for (int i = 0; i < LOOP_N; i++) {
    buffer[i] = expf((float)i / LOOP_N);
  }
  end = micros();
  elapse = end - start;
  sprintf(strBuffer, "expf\t%lu\t%lu\t%lu\r\n", start, end, elapse);
  Serial.print(strBuffer);

  // logf
  start = micros();
  for (int i = 0; i < LOOP_N; i++) {
    buffer[i] = logf((float)i / LOOP_N);
  }
  end = micros();
  elapse = end - start;
  sprintf(strBuffer, "logf\t%lu\t%lu\t%lu\r\n", start, end, elapse);
  Serial.print(strBuffer);
  
  // sqrtf
  start = micros();
  for (int i = 0; i < LOOP_N; i++) {
    buffer[i] = sqrtf((float)i / LOOP_N);
  }
  end = micros();
  elapse = end - start;
  sprintf(strBuffer, "sqrtf\t%lu\t%lu\t%lu\r\n", start, end, elapse);
  Serial.print(strBuffer);  

  Serial.print("\r\nEnd.\r\n");
}


void loop() {
  // put your main code here, to run repeatedly:

}

Each test calls a math function on float (single-precision floating-point) values in a loop of LOOP_N iterations and fills the buffer with the results.
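The same measurement pattern can be tried on a PC as a rough sanity check of the harness. What follows is only a minimal sketch, assuming a desktop C++11 compiler: `time_per_op_us` is a hypothetical helper that does not appear in the original sketch, and `std::chrono::steady_clock` stands in for `micros()`. As in the sketch, the measured time includes the int-to-float conversion and the division that produce each argument.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the original sketch).
// Calls op(x) for x = i / n with i = 0 .. n-1, stores every result in
// buffer, and returns the mean time per iteration in microseconds.
template <typename Op>
double time_per_op_us(Op op, std::vector<float>& buffer)
{
    using clock = std::chrono::steady_clock;
    const std::size_t n = buffer.size();
    if (n == 0) {
        return 0.0;  // nothing to measure
    }

    const auto start = clock::now();
    for (std::size_t i = 0; i < n; i++) {
        buffer[i] = op(static_cast<float>(i) / n);
    }
    const auto end = clock::now();

    // Convert the elapsed time to microseconds as a double.
    const std::chrono::duration<double, std::micro> elapsed = end - start;
    return elapsed.count() / n;
}
```

For example, with `std::vector<float> buf(100);`, the call `time_per_op_us([](float x) { return std::sin(x); }, buf)` mirrors the sinf test. Desktop timings are of course not comparable with the microcontroller results, and an optimizing compiler may vectorize or simplify the loop.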

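The details differ, but I wrote code that does the same processing for the PSoC boards in PSoC Creator and for the Nucleo boards on mbed, and measured those as well.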

GitHub:
https://github.com/ryood/FloatingPointTest

mbed repository:
https://developer.mbed.org/users/ryood/code/FloatingPointTest/

No	Device	No2	Op	time(us)	clock(MHz)	period(us)	clock:op
1	Nucleo F446RE	1	div	0.117	180	0.005555556	21.06
1	Nucleo F446RE	2	sinf	0.4939	180	0.005555556	88.902
1	Nucleo F446RE	3	cosf	0.4665	180	0.005555556	83.97
1	Nucleo F446RE	4	expf	0.6758	180	0.005555556	121.644
1	Nucleo F446RE	5	logf	0.614	180	0.005555556	110.52
1	Nucleo F446RE	6	sqrtf	0.2392	180	0.005555556	43.056
2	Nucleo F401RE	1	div	0.2505	84	0.011904762	21.042
2	Nucleo F401RE	2	sinf	1.0109	84	0.011904762	84.9156
2	Nucleo F401RE	3	cosf	0.964	84	0.011904762	80.976
2	Nucleo F401RE	4	expf	1.4127	84	0.011904762	118.6668
2	Nucleo F401RE	5	logf	1.3038	84	0.011904762	109.5192
2	Nucleo F401RE	6	sqrtf	0.5127	84	0.011904762	43.0668
3	PSoC 5 LP	1	div	2.35	80	0.0125	188
3	PSoC 5 LP	2	sinf	17.25	80	0.0125	1380
3	PSoC 5 LP	3	cosf	20.69	80	0.0125	1655.2
3	PSoC 5 LP	4	expf	27.05	80	0.0125	2164
3	PSoC 5 LP	5	logf	29.78	80	0.0125	2382.4
3	PSoC 5 LP	6	sqrtf	9.07	80	0.0125	725.6
4	PSoC 4	1	div	13.19	48	0.020833333	633.12
4	PSoC 4	2	sinf	69.17	48	0.020833333	3320.16
4	PSoC 4	3	cosf	80	48	0.020833333	3840
4	PSoC 4	4	expf	86.38	48	0.020833333	4146.24
4	PSoC 4	5	logf	106.25	48	0.020833333	5100
4	PSoC 4	6	sqrtf	24.35	48	0.020833333	1168.8
5	Arduino Uno	1	div	34	16	0.0625	544
5	Arduino Uno	2	sinf	139.24	16	0.0625	2227.84
5	Arduino Uno	3	cosf	146.52	16	0.0625	2344.32
5	Arduino Uno	4	expf	196.2	16	0.0625	3139.2
5	Arduino Uno	5	logf	182.4	16	0.0625	2918.4
5	Arduino Uno	6	sqrtf	64.08	16	0.0625	1025.28

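"time(us)" is the execution time per operation. "clock:op" is my estimate of the number of clock cycles per operation, calculated as time(us) × clock(MHz), which is the same as time(us) ÷ period(us).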

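Single-precision floating-point division

Execution time per operation

The FPU-equipped Nucleo F446RE and F401RE are overwhelmingly faster, but the Cortex-M3-based PSoC 5 LP holds up fairly well too: 2.35 us per division, against 0.117 us and 0.2505 us on the two Nucleo boards.

Clock cycles per operation

Looking at cycles per operation should show the performance of the core itself. Surprisingly, the AVR core of the Arduino Uno seems to get more float work done per clock than the Cortex-M0 of the PSoC 4 (544 versus about 633 cycles per division).

AVRs can only be clocked up to about 20 MHz, so in practice the M0 is faster, but for an 8-bit MPU the AVR does pretty well.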

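Single-precision floating-point sinf()

The other functions show much the same trend, so I graphed only sinf().

Execution time per operation

Clock cycles per operation

Notes:

Only on the Nucleo boards is cosf() faster than sinf(); on all the others, sinf() is faster.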
