Measuring processing time when loading data into a DB with Spring Batch (single vs. multi-threaded)

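The other day I wrote an entry about loading multiple files into a DB with multiple threads in Spring Batch,
but it occurred to me that I never measured how long it takes, so I gave it a quick try.
# On a quad-core CPU a difference like this would probably show up dramatically,
# but my development Ubuntu machine at work runs a Pentium D, so the result may be so-so..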
　
■ Test data preparation
Program used to generate the test data

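#!/usr/bin/ruby
# Usage: ./input.rb START_NUMBER FILE_NAME LINE_COUNT
start = ARGV[0].to_i
file_name = ARGV[1]
count = ARGV[2].to_i
# Characters used to build the random 5-character key
source = ("a".."z").to_a + ("A".."Z").to_a + ("0".."9").to_a
# The block form closes the file even if an error occurs
File.open(file_name, 'w') do |file|
  count.times do |i|
    key = Array.new(5) { source.sample }.join
    # One CSV line: sequential number, random key
    file.puts "#{start + i},#{key}"
  end
end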

1. For Multi (five files of 100,000 lines each)
  ./input.rb 100000 hage1.csv 100000
  ./input.rb 200000 hage2.csv 100000
  ./input.rb 300000 hage3.csv 100000
  ./input.rb 400000 hage4.csv 100000
  ./input.rb 500000 hage5.csv 100000
2. For Single (one file of 500,000 lines)
  ./input.rb 500000 hage.csv 500000
　
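■ Processing
reader ... reads the file
processor ... appends the string "value" to the second CSV field and sets the result as the third DB column
writer ... INSERTs into the DB (commit-interval="10")
・Single reads one 500,000-line CSV file (hage.csv)
・Multi uses 5 threads, each reading one 100,000-line CSV file (hage1.csv to hage5.csv)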
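
To make the processor step concrete, here is a minimal sketch in Ruby of the row transformation described above. It is not the Spring Batch code of the actual job (which is not shown in this entry). The names process_line and each_chunk are hypothetical, and it assumes that "value" is appended as a suffix and that the first two DB columns hold the CSV fields unchanged. The each_slice(10) call only mimics the idea of commit-interval="10", which is to hand rows to the writer 10 at a time.

```ruby
# Hypothetical stand-in for the processor: turns one CSV line into a
# 3-column row. Assumption: "value" is appended to the 2nd field as a suffix.
def process_line(line)
  id, key = line.chomp.split(",", 2)
  [id.to_i, key, key + "value"]
end

# Hypothetical stand-in for chunked writing: yields rows in groups of
# chunk_size, the way a writer would receive them before each commit.
def each_chunk(lines, chunk_size = 10)
  lines.map { |line| process_line(line) }.each_slice(chunk_size) do |rows|
    yield rows
  end
end

sample = (0...25).map { |i| "#{100000 + i},aB3dE\n" }
chunk_sizes = []
each_chunk(sample) { |rows| chunk_sizes << rows.size }
puts chunk_sizes.inspect
```

With 25 input lines and a chunk size of 10, the writer side would see two full chunks followed by one chunk of 5 rows.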
　
■ Results
★ Multi => about 89 seconds

2010-11-24 16:53:09,189 INFO main [org.springframework.context.support.ClassPathXmlApplicationContext] - <Refreshing org.springframework.context.support.ClassPathXmlApplicationContext@3e86d0: startup date [Wed Nov 24 16:53:09 JST 2010]; root of context hierarchy>
  (omitted)
2010-11-24 16:54:37,925 INFO main [org.springframework.batch.core.launch.support.SimpleJobLauncher] - <Job: [FlowJob: [name=hageMultiJob]] completed with the following parameters: [{}] and the following status: [COMPLETED]>

　
★ Single => about 139 seconds

2010-11-24 17:00:42,631 INFO [org.springframework.context.support.ClassPathXmlApplicationContext] - <Refreshing org.springframework.context.support.ClassPathXmlApplicationContext@3e86d0: startup date [Wed Nov 24 17:00:42 JST 2010]; root of context hierarchy>
  (omitted)
2010-11-24 17:03:01,594 INFO [org.springframework.batch.core.launch.support.SimpleJobLauncher] - <Job: [FlowJob: [name=hageJob]] completed with the following parameters: [{}] and the following status: [COMPLETED]>

　　
＝＝＝＝＝
　　
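The target DB runs on the same machine, so the CPU became the bottleneck
and I could not get a truly dramatic difference,
but running in parallel still gave roughly 1.6 times the throughput, so I guess the measurement was not a waste of time..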
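
The elapsed times and the ratio can be recomputed from the first and last log timestamps of each run. The following is a minimal sketch in Ruby. The helper names parse_log_time and elapsed_seconds are made up for illustration, and the timestamps are treated as UTC purely for convenience, since only the differences matter.

```ruby
# Start/end timestamps copied from the first and last log lines of each run.
RUNS = {
  "Multi"  => ["2010-11-24 16:53:09,189", "2010-11-24 16:54:37,925"],
  "Single" => ["2010-11-24 17:00:42,631", "2010-11-24 17:03:01,594"],
}

# Parse "YYYY-MM-DD HH:MM:SS,mmm" (the comma separates the milliseconds).
# The time zone is ignored on purpose; only differences are used.
def parse_log_time(text)
  date, clock = text.split(" ")
  year, month, day = date.split("-").map(&:to_i)
  hms, millis = clock.split(",")
  hour, min, sec = hms.split(":").map(&:to_i)
  Time.utc(year, month, day, hour, min, sec, millis.to_i * 1000)
end

# Elapsed seconds between two log timestamps, rounded to milliseconds.
def elapsed_seconds(start_text, end_text)
  (parse_log_time(end_text) - parse_log_time(start_text)).round(3)
end

elapsed = {}
RUNS.each do |name, (start_text, end_text)|
  elapsed[name] = elapsed_seconds(start_text, end_text)
  puts "#{name}: #{elapsed[name]} s"
end

speedup = elapsed["Single"] / elapsed["Multi"]
puts "Speedup: #{speedup.round(2)}x"
```

For the logs above this comes out to roughly 88.7 seconds for Multi and 139.0 seconds for Single, a ratio of a little under 1.6. Note that both intervals include application context startup, so they slightly overstate the time spent in the job itself.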