On August 11, researchers at Amazon published a comprehensive progress report on the new iteration of the e-commerce giant’s open-source, sequence-to-sequence toolkit for neural machine translation (NMT), known as Sockeye 2. Amazon introduced the original Sockeye in July 2017, after acquiring the Pittsburgh, Pennsylvania-based machine translation vendor Safaba. Since the acquisition, Amazon has pushed further into localization through machine learning offerings, such as machine dubbing and quality estimation of translated subtitles, that were once the exclusive territory of language service providers (LSPs).
According to its GitHub page, Sockeye 2 provides “out-of-the-box support for quickly training strong Transformer models for research or production.” The toolkit has been referenced in numerous scientific publications in the three years since its release, and it has also powered winning submissions to Conference on Machine Translation (WMT) evaluations.
Tech conglomerates Intel and NVIDIA have also been credited with contributing to the improvement of Sockeye 2: Intel improved the performance of Sockeye inference, while NVIDIA improved the Transformer implementation.
Five Amazon research scientists and an external advisor, Professor Kenneth Heafield of the University of Edinburgh, attribute Sockeye 2’s substantial gains primarily to a streamlined Gluon implementation; support for state-of-the-art architectures and efficient decoding; and improved model training.
The simplified Gluon code base enables rapid development and experimentation. By adopting Gluon, “the latest and preferred API of MXNet,” Sockeye 2 requires about 25% less Python code than the original Sockeye while training roughly 14% faster. Inspired by the success of self-attentional models, the researchers concentrated on the Transformer architecture and found that “deep encoders with shallow decoders are competitive in BLEU and significantly faster for decoding.”
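The intuition behind the deep-encoder/shallow-decoder result can be sketched with a back-of-envelope latency model (this is an illustration, not Sockeye’s actual profiling; the layer counts and sentence length below are hypothetical). The encoder processes all source positions in parallel, so its latency scales roughly with its depth alone, while the autoregressive decoder must run its full layer stack once per generated token:

```python
def decode_latency(enc_layers: int, dec_layers: int, tgt_len: int) -> int:
    """Toy sequential-latency model for Transformer translation, in
    arbitrary units: one unit per layer evaluation on the critical path.
    The encoder runs once (all source tokens in parallel); the decoder
    runs its whole stack once per output token."""
    return enc_layers + dec_layers * tgt_len

# Hypothetical configurations for a 30-token output sentence:
balanced = decode_latency(enc_layers=6, dec_layers=6, tgt_len=30)       # 6 + 180 = 186
deep_shallow = decode_latency(enc_layers=20, dec_layers=2, tgt_len=30)  # 20 + 60 = 80

print(balanced, deep_shallow)
```

Even though the deep-encoder configuration has more total layers, the shallow decoder dominates the sequential decoding path, which is consistent with the researchers’ observation that such models decode significantly faster.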