bleukit - NTCIR7 Scoring tools for Patent Translation task

What's this?

This package consists of the following commands for computing BLEU scores and related statistics.

doc_bleu.rb: Calculate the BLEU score for an entire document.
line_bleu.rb: Calculate the BLEU score for each line.
bootstrap.rb: Test whether two translation results differ significantly, using the bootstrap method.
binom.rb: Test whether two translation results differ significantly, using the sign test.
confiv.rb: Calculate the confidence interval of a BLEU score.

Download

Change Log

1.05

Ruby 1.9 ready.

The Japanese comments in bleulib.rb have been translated into English.

1.04

Japanese comments were added to bleulib.rb. The charset is ~~EUC_JP~~ UTF-8. These comments are planned to be translated into English in the next version.

If the corpus word count is smaller than the BLEU n-gram order (default: 4 words), bleulib.rb (version <= 1.03) aborts. This bug is fixed in 1.04.

1.03

bleulib.rb (0.08 <= version <= 1.01) has a small bug that affects the "-ngram" option of line_bleu.rb. Please use the bugfix version (1.03).

Stable version

version 1.05 (latest version)
md5sum: 0faae44f8f6aaf66eb62c04cb9900b2a
version 1.04
md5sum: 84855fe700d21979e1bdf23bace1735d
version 1.03
md5sum: 3e8531cbf41a328e17b07561041beb4a
version 1.01
md5sum: e58d79c7e72c317834eb10cd73325378
version 1.0
md5sum: c4a48f5800c786e822a0dc4b4211dba6
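
A downloaded archive can be checked against the md5sum values listed above. The following is a minimal sketch using Ruby's standard Digest library; the helper name md5_matches? is hypothetical and is not part of bleukit.

```ruby
require "digest/md5"

# Returns true when the MD5 digest of the file at `path` equals
# `expected_hex` (compared case-insensitively).
# NOTE: md5_matches? is a hypothetical helper, not a bleukit command.
def md5_matches?(path, expected_hex)
  Digest::MD5.file(path).hexdigest == expected_hex.downcase
end
```

For example, md5_matches?(path_to_archive, "0faae44f8f6aaf66eb62c04cb9900b2a") should return true for an intact copy of version 1.05, where path_to_archive is wherever you saved the download.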

Usages and Outputs

doc_bleu.rb

ruby doc_bleu.rb [options] [Translation Result] [Reference...]

The options of doc_bleu.rb are as follows:

-v level: Verbose mode (0, 1, 2; default: 0)
--ngram n: Calculate n-gram BLEU
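
To illustrate what a document-level n-gram BLEU score measures, here is a minimal sketch of the standard definition from Papineni et al.: clipped n-gram precision combined with a brevity penalty. It is not the code in bleulib.rb, and the function names (ngram_counts, corpus_bleu) as well as the choice to return 0.0 when some n-gram order has no match are assumptions made for this example.

```ruby
# Count the n-grams of order n in an array of tokens.
def ngram_counts(tokens, n)
  counts = Hash.new(0)
  tokens.each_cons(n) { |gram| counts[gram] += 1 }
  counts
end

# Document-level BLEU (illustrative sketch, not bleulib.rb).
# hyps: array of token arrays, one per sentence.
# refs: array with one entry per sentence; each entry is an array of
#       reference token arrays for that sentence.
def corpus_bleu(hyps, refs, max_n = 4)
  matched = Array.new(max_n, 0)
  total = Array.new(max_n, 0)
  hyp_len = 0
  ref_len = 0
  hyps.each_with_index do |hyp, i|
    hyp_len += hyp.size
    # Use the reference length closest to the hypothesis length
    # (ties go to the shorter reference).
    ref_len += refs[i].map(&:size).min_by { |len| [(len - hyp.size).abs, len] }
    (1..max_n).each do |n|
      hyp_counts = ngram_counts(hyp, n)
      # Maximum count of each n-gram over all references (for clipping).
      max_ref = Hash.new(0)
      refs[i].each do |ref|
        ngram_counts(ref, n).each { |gram, c| max_ref[gram] = [max_ref[gram], c].max }
      end
      hyp_counts.each { |gram, c| matched[n - 1] += [c, max_ref[gram]].min }
      total[n - 1] += hyp_counts.values.sum
    end
  end
  # Assumption: score 0.0 when any order has no match (this also covers
  # input shorter than max_n words).
  return 0.0 if hyp_len.zero? || matched.any?(&:zero?)
  log_precision = (0...max_n).sum { |k| Math.log(matched[k].to_f / total[k]) } / max_n
  brevity = hyp_len >= ref_len ? 1.0 : Math.exp(1.0 - ref_len.to_f / hyp_len)
  brevity * Math.exp(log_precision)
end
```

In this sketch the max_n argument plays the role that the --ngram value plays on the command line; whether doc_bleu.rb treats edge cases the same way is not confirmed here.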

line_bleu.rb

ruby line_bleu.rb [options] [translation result] [references...]

line_bleu.rb has only one option:

-ngram min:max: Calculate BLEU for each n-gram order from min to max (default: 4:4)

Outputs

[max-gram BLEU]\t[(max-1)-gram BLEU]\t...[min-gram BLEU]\t[Translation Result]
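
The tab-separated output above can be read back into a script. The sketch below assumes that each score is printed as a plain decimal number and that the caller passes the same min and max values that were given to -ngram; parse_line_bleu is a hypothetical helper, not part of bleukit.

```ruby
# Parse one output line of the form
#   [max-gram BLEU]\t...\t[min-gram BLEU]\t[Translation Result]
# into a hash of scores keyed by n-gram order plus the sentence text.
# NOTE: hypothetical helper; the numeric format of the scores is assumed.
def parse_line_bleu(line, min_n, max_n)
  fields = line.chomp.split("\t", -1)
  n_scores = max_n - min_n + 1
  scores = {}
  # Scores are listed from the highest order down to the lowest.
  max_n.downto(min_n).each_with_index { |n, i| scores[n] = Float(fields[i]) }
  { scores: scores, sentence: fields.drop(n_scores).join("\t") }
end
```

With the default 4:4 setting, a line therefore carries a single score followed by the translated sentence.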

bootstrap.rb

ruby bootstrap.rb [Random Seed] [Number of Sampling] [System1 Result] [System2 Result] [Reference..]
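
The arguments above (random seed, number of samplings, two system outputs) match the paired bootstrap resampling procedure described by Koehn (2004). The following is a minimal sketch of that general procedure, not the code of bootstrap.rb: the function name paired_bootstrap and the toy data are assumptions, and the mean of invented per-sentence scores stands in for a metric that would normally be recomputed as corpus-level BLEU on each resampled test set.

```ruby
# Paired bootstrap resampling (illustrative sketch).
# Draws n_samples test sets of n_sentences sentence indices each, with
# replacement, and yields every sample to the block. The block must
# return [score_of_system1, score_of_system2] for that sample.
# Returns the fraction of samples in which system 1 scored strictly higher.
def paired_bootstrap(n_sentences, n_samples, seed)
  rng = Random.new(seed)
  wins = 0
  n_samples.times do
    sample = Array.new(n_sentences) { rng.rand(n_sentences) }
    score1, score2 = yield(sample)
    wins += 1 if score1 > score2
  end
  wins.to_f / n_samples
end

# Toy per-sentence scores, invented for illustration only.
sys1 = [0.42, 0.35, 0.51, 0.28, 0.40]
sys2 = [0.40, 0.30, 0.45, 0.30, 0.38]
mean = ->(scores, idx) { idx.sum { |i| scores[i] } / idx.size }
fraction = paired_bootstrap(sys1.size, 1000, 12345) do |sample|
  [mean.call(sys1, sample), mean.call(sys2, sample)]
end
puts "system 1 better in #{(fraction * 100).round(1)}% of samples"
```

Under this scheme, system 1 is usually called significantly better at the 95% level when it wins in at least 95% of the samples. Fixing the seed makes the result reproducible.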

binom.rb

ruby binom.rb [Random Seed] [Significance Level(%)] [Partition Number] [System1] [System2] [Reference..]
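
A sign test compares two systems by counting in how many comparisons each one wins and asking how likely such a split would be if both were equally good. The sketch below computes a two-sided p-value from win and loss counts; it illustrates the general test only. How binom.rb forms its partitions and counts wins is not described here, and the function names are assumptions.

```ruby
# Binomial coefficient C(n, k), computed with exact integer arithmetic.
def binomial(n, k)
  (1..k).reduce(1) { |acc, i| acc * (n - k + i) / i }
end

# Two-sided sign test p-value (illustrative sketch).
# wins / losses: number of comparisons won by system 1 / system 2,
# with ties left out. Null hypothesis: each side wins with probability 0.5.
# Intended for moderate counts; 2.0**n overflows for n above about 1000.
def sign_test_p_value(wins, losses)
  n = wins + losses
  return 1.0 if n.zero?
  k = [wins, losses].min
  # P(X <= k) for X ~ Binomial(n, 0.5)
  tail = (0..k).sum { |i| binomial(n, i) } / (2.0**n)
  [1.0, 2.0 * tail].min
end
```

For a significance level given in percent, the difference would be judged significant when the p-value is below level / 100.0.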

confiv.rb

ruby confiv.rb [options] [Random Seed] [Number of Sampling] [Significance Level(%)] [System result] [Reference..]

confiv.rb has only one option:

--gnuplot: Output the results in a gnuplot-friendly format.

Outputs

If you use the --gnuplot option, the output has the following format, where the interval is [min,max]:

[System BLEU]\t[min]\t[max]\n
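
One common way to obtain such an interval from resampled scores is the percentile method: sort the bootstrap scores and cut off half of the significance level from each end. The sketch below assumes that method and a four-decimal number format. Neither is confirmed to be what confiv.rb does, and both function names are hypothetical.

```ruby
# Percentile bootstrap interval (illustrative sketch).
# scores: BLEU scores computed on resampled test sets.
# significance_percent: e.g. 5 for a 95% interval.
def percentile_interval(scores, significance_percent)
  sorted = scores.sort
  # Number of scores to discard at each end of the sorted list.
  drop = (sorted.size * significance_percent / 200.0).floor
  [sorted[drop], sorted[sorted.size - 1 - drop]]
end

# Format one line as [System BLEU]\t[min]\t[max]\n.
# The four-decimal precision is an assumption for this example.
def gnuplot_line(system_bleu, interval)
  format("%.4f\t%.4f\t%.4f\n", system_bleu, *interval)
end
```

A file of such lines, one per system, has the three columns that gnuplot expects when drawing a value with a lower and upper bound.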

File Format

Every program requires result and reference files that are aligned at the sentence level: line N of the result file must correspond to line N of each reference file. The file format is as follows:

Each line must contain exactly one sentence.
Each line must be tokenized.
These programs do NOT require any tags.

Please see the examples included in the package.
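
Because the programs rely on line-by-line alignment, it can help to check the files before scoring. The following sketch reports mismatched line counts and empty result lines; check_alignment is a hypothetical helper, not part of bleukit, and it works on file contents passed in as strings.

```ruby
# Check that a result text and its reference texts are aligned line by line.
# Returns an array of problem descriptions (empty when nothing was found).
# NOTE: hypothetical helper; it cannot verify tokenization or that a line
# holds exactly one sentence.
def check_alignment(result_text, reference_texts)
  result_lines = result_text.lines.map(&:chomp)
  problems = []
  reference_texts.each_with_index do |ref_text, i|
    ref_count = ref_text.lines.size
    next if ref_count == result_lines.size
    problems << "reference #{i + 1} has #{ref_count} lines, result has #{result_lines.size}"
  end
  result_lines.each_with_index do |line, i|
    problems << "result line #{i + 1} is empty" if line.strip.empty?
  end
  problems
end
```

For files on disk, the contents can be passed in with File.read, for example check_alignment(File.read(result_path), reference_paths.map { |p| File.read(p) }), where result_path and reference_paths are your own file names.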

License

MIT License

References

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002.
BLEU: a method for automatic evaluation of machine translation. In ACL '02: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 311-318, Morristown, NJ, USA. Association for Computational Linguistics.
Philipp Koehn. 2004.
Statistical Significance Tests for Machine Translation Evaluation. In Proc. of EMNLP 2004.
Philipp Koehn and Christof Monz. 2006.
Manual and Automatic Evaluation of Machine Translation between European Languages

Acknowledgement

The author would like to thank Professor Mikio Yamamoto for his extensive advice on this work.

Jun-ya NORIMATSU
Department of Computer Science
Graduate school of Systems and Information Engineering,
University of Tsukuba.