bleukit - NTCIR7 Scoring tools for Patent Translation task

What's this?

This package consists of the following commands for computing BLEU and related statistics.

doc_bleu.rb
Calculate the BLEU score for an entire document.
line_bleu.rb
Calculate the BLEU score for each line.
bootstrap.rb
Test whether two translation results differ significantly, using the bootstrap method.
binom.rb
Test whether two translation results differ significantly, using the sign test.
confiv.rb
Calculate the confidence interval of a BLEU score.

Download

Change Log

1.05

Compatible with Ruby 1.9.

The Japanese comments in bleulib.rb have been translated into English.

1.04

Japanese comments were added to bleulib.rb. Charset is EUC_JP UTF-8. These comments will be translated into English in the next version.

If the corpus word count is smaller than the BLEU n-gram order (default: 4 words), bleulib.rb (version <= 1.03) aborts. This bug is fixed in 1.04.

1.03

bleulib.rb (0.08 <= version <= 1.01) has a small bug that affects the "-ngram" option of line_bleu.rb. Please use the bugfix version (1.03).

Stable version

Usages and Outputs

doc_bleu.rb

ruby doc_bleu.rb [options] [Translation Result] [Reference...]

The options of doc_bleu.rb are as follows:

-v level
Verbose mode (0, 1, 2; default=0)
--ngram n
Calculate n-gram BLEU
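
For orientation, the following is a minimal Ruby sketch of corpus-level BLEU as defined by Papineni et al. It is not the code in bleulib.rb. The names ngram_counts and corpus_bleu are hypothetical, and the sketch assumes a single reference per sentence, whitespace-tokenized input, and no smoothing. The max_n parameter is assumed to play the role of the --ngram option.

```ruby
# Hypothetical sketch of corpus-level BLEU (Papineni et al.); not bleulib.rb.
# Assumes one reference per sentence and already tokenized input.

# Count every n-gram of order n in an array of tokens.
def ngram_counts(tokens, n)
  counts = Hash.new(0)
  tokens.each_cons(n) { |gram| counts[gram] += 1 }
  counts
end

# hyps and refs are arrays of token arrays, aligned sentence by sentence.
def corpus_bleu(hyps, refs, max_n = 4)
  log_precision_sum = 0.0
  (1..max_n).each do |n|
    matched = 0
    total = 0
    hyps.zip(refs).each do |hyp, ref|
      ref_counts = ngram_counts(ref, n)
      ngram_counts(hyp, n).each do |gram, count|
        matched += [count, ref_counts[gram]].min  # clip by the reference count
        total += count
      end
    end
    return 0.0 if matched.zero?  # no smoothing: a zero precision gives BLEU 0
    log_precision_sum += Math.log(matched.to_f / total)
  end
  hyp_len = hyps.map(&:size).inject(0, :+)
  ref_len = refs.map(&:size).inject(0, :+)
  # Brevity penalty: penalize output that is shorter than the reference.
  brevity_penalty = hyp_len >= ref_len ? 1.0 : Math.exp(1.0 - ref_len.to_f / hyp_len)
  brevity_penalty * Math.exp(log_precision_sum / max_n)
end
```

With multiple references, the clipped count would use the maximum count over all references, and the reference length would typically be the one closest to the hypothesis length. How bleulib.rb handles these cases is not covered by this sketch.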

line_bleu.rb

ruby line_bleu.rb [options] [translation result] [references...]

line_bleu.rb has only one option:

-ngram min:max
Calculate BLEU for every order from min-gram to max-gram (default 4:4)

Outputs

[max-gram BLEU]\t[(max-1)-gram BLEU]\t...[min-gram BLEU]\t[Translation Result]
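
Assuming the tab-separated layout shown above, one output line can be split back into its scores and the translated sentence as sketched here. The helper name parse_line_bleu and the sample line are hypothetical, and the sketch assumes that the sentence itself contains no tab characters.

```ruby
# Hypothetical helper: split one line of line_bleu.rb output into scores and text.
# Assumes the documented layout and no tab characters inside the sentence.
def parse_line_bleu(line)
  fields = line.chomp.split("\t", -1)    # -1 keeps an empty trailing sentence
  sentence = fields.pop
  scores = fields.map { |f| Float(f) }   # ordered from max-gram down to min-gram
  [scores, sentence]
end

# Made-up sample line with two scores (as "-ngram 3:4" would produce).
scores, sentence = parse_line_bleu("0.25\t0.5\tthis is a test\n")
```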

bootstrap.rb

ruby bootstrap.rb [Random Seed] [Number of Sampling] [System1 Result] [System2 Result] [Reference..]
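
The idea behind the arguments above can be sketched as paired bootstrap resampling in the sense of Koehn (2004). This is an illustration, not the code in bootstrap.rb. The function name paired_bootstrap is hypothetical, and the scoring metric is passed in as a block so that the sketch stays self-contained. The toy metric in the usage part is exact-match accuracy, not BLEU.

```ruby
# Hypothetical sketch of paired bootstrap resampling (Koehn, 2004); not bootstrap.rb.
# The block receives (hypotheses, references) and returns a corpus-level score.
def paired_bootstrap(sys1, sys2, refs, samples, seed)
  rng = Random.new(seed)
  n = refs.size
  wins1 = 0
  samples.times do
    idx = Array.new(n) { rng.rand(n) }   # draw n sentence indices with replacement
    sample_refs = idx.map { |i| refs[i] }
    score1 = yield(idx.map { |i| sys1[i] }, sample_refs)
    score2 = yield(idx.map { |i| sys2[i] }, sample_refs)
    wins1 += 1 if score1 > score2
  end
  wins1.to_f / samples                   # fraction of samples where system 1 wins
end

# Toy data and a toy metric (exact-match accuracy) for illustration only.
refs = ["a b", "c d", "e f", "g h"]
sys1 = ["a b", "c d", "e f", "x x"]
sys2 = ["a b", "x x", "x x", "x x"]
ratio = paired_bootstrap(sys1, sys2, refs, 1000, 12345) do |hyps, rs|
  hyps.zip(rs).count { |h, r| h == r }.to_f / rs.size
end
```

If system 1 wins in, for example, at least 95% of the samples, it is commonly considered significantly better at the 5% level. Both systems are scored on the same resampled sentence indices, which is what makes the test paired.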

binom.rb

ruby binom.rb [Random Seed] [Significance Level(%)] [Partition Number] [System1] [System2] [Reference..]
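
The internals of binom.rb are not described here. Assuming it splits the test set into the given number of partitions and counts the partitions in which each system scores higher, a standard two-sided sign test on those counts could look like the sketch below. The function names are hypothetical.

```ruby
# Hypothetical sketch of a two-sided sign test; not the code in binom.rb.
# wins / losses: number of partitions where system 1 scored higher / lower
# than system 2 (ties are assumed to be dropped beforehand).

# Exact binomial coefficient C(n, k) using integer arithmetic.
def binomial_coefficient(n, k)
  (1..k).inject(1) { |acc, i| acc * (n - k + i) / i }
end

def sign_test_p_value(wins, losses)
  n = wins + losses
  return 1.0 if n.zero?
  k = [wins, losses].min
  # Probability of a split at least this uneven under the null hypothesis p = 0.5.
  tail = (0..k).inject(0) { |sum, i| sum + binomial_coefficient(n, i) }
  [1.0, 2.0 * tail / 2**n].min
end

# Example: system 1 is better in 9 of 10 partitions.
p_value = sign_test_p_value(9, 1)
significant = p_value < 5 / 100.0   # significance level given in percent
```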

confiv.rb

ruby confiv.rb [options] [Random Seed] [Number of Sampling] [Significance Level(%)] [System result] [Reference..]

confiv.rb has only one option:

--gnuplot
Output the results in a gnuplot-friendly format.

Outputs

If you use the --gnuplot option and the interval is [min,max], the output is as follows:

[System BLEU]\t[min]\t[max]\n
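
How confiv.rb derives the interval is not described here. One common approach, given a random seed, a number of samples, and a significance level, is a percentile bootstrap, sketched below under that assumption. The names bootstrap_interval and gnuplot_line are hypothetical, the metric is passed in as a block, and the number formatting of the real output may differ.

```ruby
# Hypothetical sketch of a percentile bootstrap confidence interval; not confiv.rb.
# The block receives (hypotheses, references) and returns a corpus-level score.
def bootstrap_interval(hyps, refs, samples, significance_percent, seed)
  rng = Random.new(seed)
  n = refs.size
  scores = Array.new(samples) do
    idx = Array.new(n) { rng.rand(n) }   # resample sentences with replacement
    yield(idx.map { |i| hyps[i] }, idx.map { |i| refs[i] })
  end
  sorted = scores.sort
  # Number of samples cut from each tail, e.g. 2.5% per side for a 5% level.
  drop = (samples * significance_percent / 100.0 / 2).floor
  [sorted[drop], sorted[samples - 1 - drop]]
end

# Format one line in the tab-separated layout shown above.
def gnuplot_line(system_score, min, max)
  "#{system_score}\t#{min}\t#{max}\n"
end
```

A line in this layout has the score in column 1 and the interval bounds in columns 2 and 3, which is the kind of column data gnuplot can draw as error bars.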

File Format

All programs require result and reference files that are aligned at the sentence level: line N of the result file must be the translation that corresponds to line N of each reference file. The file format is as follows:

  1. Each line must contain exactly one sentence.
  2. Each line must be tokenized.
  3. These programs do NOT require any tags.

Please see the examples in the package.
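
The format rules above can be checked before scoring with a small loader such as the following sketch. It is not part of bleukit. The function name load_parallel is hypothetical, and it assumes that "tokenized" means tokens separated by whitespace.

```ruby
# Hypothetical loader illustrating the file format; not part of bleukit.
# Reads a result file and its reference files, one tokenized sentence per line,
# and checks that every reference has the same number of lines as the result.
def load_parallel(result_path, reference_paths)
  read_tokens = lambda do |path|
    File.readlines(path).map { |line| line.split }   # whitespace-separated tokens
  end
  results = read_tokens.call(result_path)
  references = reference_paths.map do |path|
    lines = read_tokens.call(path)
    unless lines.size == results.size
      raise ArgumentError, "#{path}: #{lines.size} lines, expected #{results.size}"
    end
    lines
  end
  [results, references]
end
```

A line-count mismatch is the most common alignment error, so the sketch fails early instead of silently scoring misaligned sentences.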

License

References

  1. Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2001.
    BLEU: a method for automatic evaluation of machine translation. In ACL '02: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 311-318, Morristown, NJ, USA. Association for Computational Linguistics.
  2. Philipp Koehn. 2004.
    Statistical Significance Tests for Machine Translation Evaluation. In Proc. of EMNLP 2004.
  3. Philipp Koehn and Christof Monz. 2006.
    Manual and Automatic Evaluation of Machine Translation between European Languages.

Acknowledgement

The author would like to thank Professor Mikio Yamamoto for his extensive advice on this work.

Jun-ya NORIMATSU
Department of Computer Science
Graduate School of Systems and Information Engineering,
University of Tsukuba.