This is the second part of an interview with Jean Senellart (JAS), Global CTO and Director General of SYSTRAN SAS. The first part can be found here: A Deep Dive into SYSTRAN’s Neural Machine Translation (NMT) Technology.
Those translators who accuse MT vendors of stealing or undermining their jobs should take note that SYSTRAN is the largest independent MT vendor, a position it has held for most of its existence, yet it has never generated more than €20 million in annual revenue. To me, this suggests that MT is mostly being used for different kinds of translation tasks and is hardly taking any jobs away. The problem has been much more related to unscrupulous or incompetent LSPs who used MT improperly in rate negotiations. MT is hugely successful for those companies that find other ways to monetize the technology, as I pointed out in The Larger Context Translation Market. This does suggest that MT has huge value in enabling global communication and commerce, and even in its less-than-perfect state it is considered valuable by many who might otherwise need to acquire human translation services. If anything, MT vendors are the ones trying hardest to develop technology that is actually useful to professional translators, as the Lilt offering shows and as this new SYSTRAN offering also promises to be. The new MT technology is reaching a point where it responds rapidly to corrective feedback and is thus much more useful in professional translation scenarios. The forces affecting translator jobs and work quality are much more complex and harder to pin down, as I have also mentioned in the past.
In my conversations with Jean, I realized that he is one of the few people in the "MT industry" who has deep knowledge of, and production experience with, all three MT paradigms. I therefore tried to get him to share both his practical, experience-based perspectives and his philosophical perspectives on the three approaches. I found his comments fascinating and thought it would be worth highlighting them separately in this post. Jean (through SYSTRAN) is one of the rare practitioners who has produced commercial release versions of, say, French <> English MT systems using all three MT paradigms; more if you count the SPE and NPE “hybrid” variants, where more than one approach is used in a single production process.
Again in this post, I have tried as much as possible to keep Jean Senellart’s direct quotes intact to avoid any misinterpretation. My comments are in italics when they occur within his quotes.
Comparing the Three Approaches: The Practical View
Some interesting comparative comments made by Jean about his actual implementation experience with the three methodologies (RBMT, SMT, NMT):
“The NMT approach is extremely smart at learning the language structure but is not as good at memorizing long lists of terminology as RBMT or SMT was. With RBMT, the terminology coverage was right, but the structure was clumsy – with SMT or SPE, we had an “in-between” situation where we got the illusion of fluency, but sometimes at the price of a complete mistranslation, and a strange ability to memorize huge lists but without any consideration of their linguistic nature. With NMT, the language structure is excellent, as if the neural network really deeply understands the grammar of the language – and introducing [greater] support of terminology was the missing link to the previous technologies.”
(This inability of NMT to handle large vocabulary lists is currently considered one of the main weaknesses of the technology. Here is another reference discussing this issue. However, it appears that SYSTRAN has developed some kind of solution to address this issue.)
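The interview does not spell out how SYSTRAN injects terminology into its NMT engines, so the sketch below is not their method. One common and simple technique is placeholder substitution: protected terms are swapped for neutral tokens before translation, and the required target-language terms are restored afterwards. The glossary entries, the `translate_fn` callback and the token format here are all hypothetical.

```python
import re

# Hypothetical user glossary: source term -> required target-language term
GLOSSARY = {
    "torque converter": "convertisseur de couple",
    "drive shaft": "arbre de transmission",
}

def translate_with_glossary(source, translate_fn, glossary=GLOSSARY):
    """Swap glossary terms for neutral placeholder tokens, translate the
    masked sentence with any MT backend, then restore the required target terms."""
    mapping = {}
    for i, (src_term, tgt_term) in enumerate(glossary.items()):
        pattern = re.compile(re.escape(src_term), re.IGNORECASE)
        if pattern.search(source):
            token = f"__TERM{i}__"
            source = pattern.sub(token, source)
            mapping[token] = tgt_term
    translated = translate_fn(source)          # translate_fn is any MT system call
    for token, tgt_term in mapping.items():
        translated = translated.replace(token, tgt_term)
    return translated

# Example with a dummy backend that simply echoes the masked input
print(translate_with_glossary(
    "Check the torque converter before shipping.", translate_fn=lambda s: s))
```

More sophisticated approaches, such as lexically constrained decoding, work inside the decoder itself, but the placeholder trick illustrates why terminology handling can be bolted onto any of the three paradigms.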
“What is interesting with NMT is that it seems far more tolerant than PBSMT (Phrase-Based SMT) to noisy data. However, it does need far less data than PBSMT to learn – so we can afford to provide only data for which we know we have a good alignment quality. Regarding the domain or the quality [of MT output of] the languages, we are for the moment trying to be as broad as possible [rather than focusing on specialized domains].”
In terms of training data volume, JAS said: “This is still very empirical, but we can outperform Google or Microsoft MT on their best language pairs using only 5M translation units – and we get very good quality (a BLEU score of about 45**) for languages like EN>FR with only 1M TUs. I would say we need about 1/5 of the data necessary to train SMT. Also, generic translation engines like Google or Bing Translate are using billions of words for their language models; here we probably need less than 1/100th of that.”
(**I think it bears saying that I fully expect the BLEU here to be measured with great care and competence, unlike what we so often see from Moses practitioners and LSPs in general, who assume scores of 75+ are needed for the technology to be usable.
The ability of the new MT technology to improve rapidly with small amounts of good-quality training data and small amounts of corrective feedback suggests that we may be approaching new thresholds in the professional use of MT.)
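For readers who want to sanity-check such numbers themselves, the snippet below shows a typical way of computing a corpus-level BLEU score with the open-source sacrebleu library. The file names are illustrative, and this is of course not the exact evaluation setup behind the figure quoted above.

```python
import sacrebleu  # pip install sacrebleu

# Illustrative file names: one detokenized segment per line, same order in both files
with open("engine_output.fr", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("reference_translations.fr", encoding="utf-8") as f:
    references = [line.strip() for line in f]

# corpus_bleu takes the hypotheses and a list of reference streams (a single one here)
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.1f}")  # a strong EN>FR engine might land in the mid-40s
```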
Comparing RBMT, SMT, and NMT: The Philosophical View
When I probed more deeply into the differences between these MT approaches (since SYSTRAN is really the only company that has real experience with all three), JAS said: “I would rather compare them in terms of what they are trying to do, and in terms of their ability to learn.” His explanation reflects his long-term experience and expertise and is worth careful reading. I have left the response in his own words as much as possible.
“RBMT, fundamentally (unlike the other two), has an ulterior motive: it attempts to describe the actual translation process. And by doing that, it has been trying to solve a far more complicated challenge than just machine translation; it tries to decompose [and deconstruct and analyze] and explain how a translation is produced. I still believe that this goal is the ultimate goal of any [automated translation] system. For many applications – language learning in particular, but also post-editing – the [automated] system would be far more valuable if it could produce not only the translation but also an explanation of the translation.”
“In addition, RBMT systems are facing three main limitations in language [modeling], which are:
1) the intrinsic ambiguity of language for a machine, which is not the same for a human who has access to meaning [and a sense for the underlying semantics],
2) the exception-based grammar system, and
3) the huge, contextual and always expanding volume of terminology units.”
“Technically, RBMT systems might have different levels of complexity depending on the linguistic formalism being used, making it hard to compare with the others (SMT, NMT), so I would rather say that one of the main reasons for the limitations of a pure RBMT system lies in its [higher reaching] goal. The fact is that fully describing a language is a very complicated matter, and there is no language today for which we can claim a full linguistic description.”
“SMT came with an extraordinary ability to memorize from its exposure to existing translations – and with this ability, it brought a partial solution to the first challenge mentioned above, and a very good solution to the third one – the handling of terminology – however, it mostly ignores the modeling of the grammar. Technically, I think SMT is the most difficult of the three approaches: it combines many algorithms to optimize the MT process, and it is hard work to deal with the huge database of [relevant training] corpora.”
“NMT has access to the meaning and deals well with modeling the grammar of human language. In terms of difficulty, NMT engines are probably the simplest to implement; a full training of an NMT engine involves only a few thousand lines of code. The simplicity of implementation is, however, hiding the fact that nobody really knows why it is so effective.”
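(To make the “few thousand lines of code” remark concrete, here is a deliberately toy PyTorch sketch of the core of a sequence-to-sequence training loop. It is not SYSTRAN’s architecture: it omits attention, subword segmentation and real corpus handling, uses made-up dimensions and random data, and only shows how compact the basic recipe is.)

```python
import torch
import torch.nn as nn

# Toy dimensions; a real engine would use subword vocabularies of tens of thousands of entries
SRC_VOCAB, TGT_VOCAB, EMB, HID = 1000, 1000, 64, 128

class Seq2Seq(nn.Module):
    """A bare-bones encoder-decoder: embed the source, encode it with a GRU,
    then decode target tokens with teacher forcing."""
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, EMB)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, EMB)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)
        self.decoder = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)

    def forward(self, src, tgt_in):
        _, state = self.encoder(self.src_emb(src))        # final encoder state
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), state)
        return self.out(dec_out)                          # logits per target position

model = Seq2Seq()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Dummy parallel batch: 8 "sentence pairs" of length 12 (random token ids)
src = torch.randint(0, SRC_VOCAB, (8, 12))
tgt = torch.randint(0, TGT_VOCAB, (8, 12))

for step in range(100):                                   # the training loop itself
    logits = model(src, tgt[:, :-1])                      # predict the next target token
    loss = loss_fn(logits.reshape(-1, TGT_VOCAB), tgt[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```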
“I would use an analogy of a human learning to drive a car to explain this more fully:
- The Rule-based approach will attempt to provide a full model of the car’s dynamics: how the engine is connected to the wheels, the effect of acceleration on the trajectory, etc. This is very complicated. (And possibly impossible to model in totality.)
- The Statistical approach will use data from past experience; it will try to compare a new situation with a past situation and will decide on the action based on this large database [of known experience]. This is a huge task and very difficult to implement. (And it can only be as good as the database it learns from.)
- The Neural approach, with limited access to the phenomena involved, or with a limited ability to remember, will build its own “thinking” system to optimize the driving experience; it will actually learn to drive the car and build reflexes – but it will not be able to explain why and how such decisions are being made, and it will not be able to leverage local knowledge – for instance, that at a specific bend in the road, in very specific weather conditions, it had to anticipate [braking] because it is particularly dangerous, etc. This approach is surprisingly simple and, thanks to the evolution of computing power, has become much more accessible.”
“Today, this last approach (NMT) is clearly the most promising but will need to integrate the second (SMT) to be more robust, and eventually to deal with the first one (RBMT) to be able to not only make choices but also explain them.”
Comparing NMT to Adaptive MT
When probed about this, JAS said: “Adaptive MT is an innovative concept based on the current SMT paradigm – it is, however, a concept that is quite naturally embedded in the NMT paradigm; of course, work will be needed to make it function as nicely as the Lilt product does. But my point is that NMT (and not just from SYSTRAN) will bring a far more intuitive solution to this issue of continuous adaptive learning, because it is built for that: on a trained model, we can tune the model, without any tricks, with the feedback of one single sentence – and produce a translation which immediately adapts to user input.” (A rough sketch of what such single-sentence adaptation could look like in code appears below, after the concluding thoughts.)
The latest generation of MT technology, especially NMT and Adaptive MT, looks like a major step forward in enabling the expanding use of MT in professional translation settings. With continuing exploration and discovery in the fields of NLP, artificial intelligence and machine intelligence, I think we may be in for some exciting times ahead as these discoveries benefit MT research. Hopefully, the focus will shift to making new content multilingual and to solving new kinds of translation challenges, especially in speech and video. I believe that we will see more of the kinds of linguistic steering activities we are seeing in motion at eBay, and that there will always be a role for competent linguists and translators.
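As promised above, here is a rough, self-contained sketch of single-sentence adaptation. It is emphatically not SYSTRAN’s mechanism, which the interview does not describe at this level of detail; it simply fine-tunes an off-the-shelf open-source NMT model (Helsinki-NLP’s Marian EN>FR checkpoint, via the Hugging Face transformers library) on one post-edited segment for a few low-learning-rate steps. The example sentence and hyperparameters are illustrative only.

```python
import torch
from transformers import MarianMTModel, MarianTokenizer

# Generic pretrained EN>FR model standing in for a production engine
model_name = "Helsinki-NLP/opus-mt-en-fr"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

source = "The torque converter is mounted behind the engine."
post_edit = "Le convertisseur de couple est monté derrière le moteur."  # translator's preferred wording

# Encode the corrected pair; text_target produces the labels used for the loss
batch = tokenizer([source], text_target=[post_edit], return_tensors="pt")

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # small learning rate: nudge, don't overwrite
model.train()
for _ in range(3):                      # a handful of update steps on the single corrected pair
    loss = model(**batch).loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Subsequent translations of this (and similar) sentences now reflect the feedback
model.eval()
with torch.no_grad():
    out_ids = model.generate(**tokenizer([source], return_tensors="pt"))
print(tokenizer.batch_decode(out_ids, skip_special_tokens=True)[0])
```

The point of the sketch is simply that, in the neural paradigm, a corrected sentence is an ordinary training example, so continuous adaptation needs no separate machinery.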
Jean Senellart, CEO, SYSTRAN SA
The first part of the SYSTRAN interview can be found at A Deep Dive into SYSTRAN’s Neural Machine Translation (NMT) Technology
Post Script: A detailed article describing the differences between these approaches was published on the SYSTRAN website a few weeks after this post.