In the world of human communication, common concepts are extremely important not only because of the characteristics themselves but because they uniquely identify objects or classes of objects and determine the common reality of the communicating subjects.
When information is transferred this way, the absolute priority is to determine the nature and identity of the object under discussion first: what that object is, what it represents for both parties, what its common meaning is. Only after the common meaning is established can structural information, or details about the object, come into the discussion.
When two people speak, the information transfer is possible because both brains, the sender's and the recipient's, relate to the same “reality”, be that the real world or some abstract world like mathematics or feelings. The recipient may be a doctor and the sender a banker, but when the word “Person” is invoked, both of them associate it with an individual in the real world. They may know many different characteristics of the person that are particular to their professions, but when they speak, none of those particularities matter. What matters is what they have in common, and to access that they need an absolutely simple common pointer, which is the concept of “Person”. The simplicity of this is so powerful that even if they speak different languages they can still transmit the information using a dictionary, because regardless of how the word is pronounced or written, it references the same simple concept in a reality that is common to both parties (Illustration 1).
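The idea of a word acting as a pointer to a shared concept can be sketched in a few lines of code. This is only an illustration under stated assumptions: the `Concept` enumeration, the `LEXICON` table, and the `translate_word` function are hypothetical names introduced here, and the two tiny vocabularies stand in for real dictionaries.

```python
from enum import Enum
from typing import Optional


class Concept(Enum):
    """Hypothetical identifiers for concepts in the shared reality."""
    PERSON = "person"
    PIG = "pig"


# Each language encodes the same concepts with different words
# (toy vocabularies, assumed for illustration only).
LEXICON = {
    "en": {"person": Concept.PERSON, "pig": Concept.PIG},
    "fr": {"personne": Concept.PERSON, "cochon": Concept.PIG},
}


def translate_word(word: str, source: str, target: str) -> Optional[str]:
    """Translate by going through the shared concept, not word to word."""
    concept = LEXICON[source].get(word)
    if concept is None:
        return None  # the source word points at no known concept
    for candidate, candidate_concept in LEXICON[target].items():
        if candidate_concept is concept:
            return candidate
    return None  # the target language has no word for this concept


print(translate_word("person", "en", "fr"))
print(translate_word("bond", "en", "fr"))
```

The lookup succeeds only when both vocabularies point at the same concept; a word with no concept behind it on either side yields `None`.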
Sentences can be translated the same way. Rules (grammars) may differ between languages; nevertheless, they are based on the same reality, on the same objects, on the same actions, and on the same context (time, person, possession, command, etc.). As such, a translator, somebody who knows both languages, can transpose the sentences (these conglomerates of objects, contexts and rules) from one language to another such that the reality, the semantics of the message, stays intact. The fact that this reality is common is a sine qua non for such translations. Each language is in fact an encoding of the reality of the people who speak it, and when a translator has knowledge of both encodings and the encoded realities overlap sufficiently, he or she can transcode this portion of reality from one encoding to the other without any loss. If, however, one of the languages lacks a term for a concept, translation becomes difficult or impossible. This usually occurs when the reality of the people who developed the language lacks the concept itself altogether.
The Pirahã language, for example, has no cardinal or ordinal numbers. Why this is so is still subject to academic debate: some argue that the Pirahã people cannot learn numeracy, while others hold that they can count but choose not to. Whatever the reason, translating Example 3 into Pirahã exactly is impossible, because their reality lacks a concept that the source sentence relies on, namely numbers:
- Example 3, Partially translatable sentence
- “Twenty people went hunting and they brought back three pigs.”
For the Pirahã, numbers don’t exist. A translator might be able to approximate the meaning and transpose it to the other reality with some information loss. Instead of “twenty” they can use something akin to “many”, instead of “three” they can use something like “few”. This is a much more difficult job to do than simple translation, because direct correspondence between realities does not exist.
An interesting aspect of this translation process is that we, whose reality is compatible with the one that the English language models, would be tempted to say that the translation occurred with information loss simply because the Pirahã reality lacks some fundamental aspect of the actual reality, and so a back-and-forth translation will not restore the information to its original state. This view, however, is judgmental and incomplete. The Pirahã reality does capture multiplicity and quantity, but it does so in different forms, which are not fully compatible with ours. The proper way to see this is that the two subjective realities are fundamentally different from this point of view, and hence the translation occurs with information loss not because one reality is deficient, but because the two subjective realities contain concepts that are not fully compatible.
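The reason a back-and-forth translation cannot restore the original can be shown with a small sketch. It assumes, purely for illustration, that exact counts are mapped onto two coarse quantity terms glossed in English as "few" and "many"; the cutoff `FEW_LIMIT` and the function names are hypothetical and make no claim about how the Pirahã language actually divides quantities.

```python
FEW_LIMIT = 5  # hypothetical cutoff between "few" and "many"


def to_coarse_quantity(count: int) -> str:
    """Map an exact count onto a coarse quantity term (many-to-one)."""
    if count < 1:
        raise ValueError("count must be a positive integer")
    return "few" if count <= FEW_LIMIT else "many"


def compatible_counts(term: str, upper: int = 30) -> list:
    """List every count up to `upper` that maps onto the given term."""
    return [n for n in range(1, upper + 1) if to_coarse_quantity(n) == term]


# "Twenty people ... three pigs" becomes "many people ... few pigs".
print(to_coarse_quantity(20), to_coarse_quantity(3))

# Translating back, "many" is compatible with a whole set of counts,
# so the original "twenty" cannot be recovered from it.
print(len(compatible_counts("many")))
```

The mapping is many-to-one in this direction, which is all the sketch shows: the two systems carve up quantity differently, and neither is treated as the deficient one.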
With such a fractured reality, translation is only possible provided that some level of correspondence does exist between the two realities within the domain of the information being transferred, and that the translator knows both realities well enough to make the correlation. By the same token, the translation of Example 4 would be utterly impossible, because there is nothing in the message that could even remotely be correlated to the Pirahã reality¹:
- Example 4, Untranslatable sentence
- “Three brokers sold ten thousand bonds today on the stock market and made a million dollars in profit.”
In the case of this sentence, there is no common reality that a translator can refer to in translating it. From this particular angle, their world is based on different values from ours, and as such there is nothing of value to them in this sentence. This particular information cannot flow from one side to the other simply because there is no context (meaning) to give birth to information on one of the sides. There is no reality behind the data. In fairness, I must point out that, for obvious reasons, it is impossible for me to give an example of a sentence (information conveyed) that would transmit something that exists in the Pirahã world and does not exist in ours.
It is, I think, reasonable to state that the reason why computers have such difficulties in translating human language may be that they do not “understand”² the reality on which the sentence is based. Computers are unaware of the meaning of the sentence, and as such they must rely on translation between words and grammar and on statistical matching, which is highly inaccurate and can sometimes be confusing.
Computers don’t have a reality, and whether they can have one remains to be decided by future AI research. Until then, short of learning the correspondence between languages for every single expression that exists, translations will be imprecise. Even if future computers possess an internal cognitive process akin to consciousness, one that is reflected in some sort of reality that they share among themselves, it is still questionable whether they will be able to translate our language. For that to happen, they will also have to sense our reality, because only then can they create a precise correspondence between real concepts and written information.