Computer software is commonly used to translate text from one language to another. As part of his Ph.D. thesis, Philipp Koehn developed a phrase-based translation program called Pharaoh. The quality of the translation can vary. A good translation system should match a professional human translation. It is important to be able to quantify how good the translations produced by Pharaoh are. The IBM T. J. Watson Research Center developed methods to measure the quality of a translation from one language to another. One of these is the BiLingual Evaluation Understudy (BLEU). BLEU is a score ranging from 0 to 1 that indicates how well a computer translation matches a professional human translation of the same text. Higher scores indicate a better match. BLEU helps companies that develop translation software "to monitor the effect of daily changes to their systems in order to weed out bad ideas from good ideas." To compare Pharaoh's ability to translate with similar computer translation software, Koehn took a random sample of 100 blocks of Spanish text, each of which contained 300 sentences, and used Pharaoh to translate each of these to English. The BLEU score was calculated for each of the 100 blocks. Open the data file BLEU-Scores.
11. Assuming the requirements are satisfied, calculate a 95% confidence interval for the mean of the BLEU test scores. Give the answer in interval notation, rounded to three decimal places, with the lower bound first and the upper bound second. [Example: (42.335, 54.859)]
Solution
Sample Data:
n=100
Can you provide the BLEU test scores? They are required to calculate the sample mean and standard deviation, which are then used to calculate the confidence interval with the general formula:
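CI: ( Mean - z*SD/sqrt(n) , Mean + z*SD/sqrt(n) )
where Mean and SD are the sample mean and sample standard deviation of the BLEU scores, n = 100, and z = 1.96 for a 95% confidence interval. Because SD is estimated from the sample, the t critical value with n - 1 = 99 degrees of freedom (about 1.984) could be used instead; with n this large the two intervals are nearly identical.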
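Once the scores are available, the formula above can be applied directly. The following is a minimal Python sketch, not a worked answer: the function name mean_confidence_interval and the short list of scores are hypothetical placeholders for illustration and are not taken from the BLEU-Scores data file. It uses the sample standard deviation (n - 1 in the denominator) and z = 1.96, as in the formula.

```python
import math
import statistics


def mean_confidence_interval(scores, z=1.96):
    """Return (lower, upper) for the z-based confidence interval of the mean.

    Implements Mean -/+ z * SD / sqrt(n), where SD is the sample standard
    deviation. The default z = 1.96 corresponds to 95% confidence.
    Both bounds are rounded to three decimal places.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("at least two scores are needed to estimate SD")
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD (divides by n - 1)
    margin = z * sd / math.sqrt(n)
    return round(mean - margin, 3), round(mean + margin, 3)


if __name__ == "__main__":
    # Placeholder values only; replace with the 100 scores from BLEU-Scores.
    placeholder_scores = [0.25, 0.31, 0.28, 0.30, 0.27]
    lower, upper = mean_confidence_interval(placeholder_scores)
    print(f"({lower}, {upper})")
```

To use the t critical value instead, pass it as z (for example z=1.984 for 99 degrees of freedom); the rest of the calculation is unchanged.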
