Natural language processing – a never-ending story

At first sight, language processing seems to be an issue which the average user of a computer or smartphone simply takes for granted, given the constantly developing technology we already have at our disposal, which seems to offer unlimited possibilities. However, even a casual look at the brief Wikipedia definition (translated) proves the fallacy of such thinking: “Natural language processing (NLP) – an interdisciplinary scientific field which joins the issues of artificial intelligence and linguistics, dealing with the automation of language analysis, understanding, translation and the generation of natural language by computers.”

To put it in a nutshell, the whole idea is as follows: a natural language generation system transforms information stored in a computer's database into a form easily accessible and comprehensible to humans. In the other direction, natural language samples are transformed into more formal symbols, which in turn are easier for computer programs to handle. (By the way, by natural language we mean a human one such as German or Spanish, as opposed to a computer language, e.g. C++, Java or Lisp.)

Attractive though the idea of cooperation between both kinds of languages might sound, the analysis of natural language required for the process is much more complex than one might initially think. This was clearly visible at the very advent of NLP: the initial enthusiasm towards one of the first systems of this type, called SHRDLU, which worked surprisingly well but only within a restricted world of blocks and an equally limited vocabulary, soon faded away when the system was supposed to face real-life situations with their frequent ambiguity of words and phrases. It soon became clear that merely parsing a natural language was not sufficient to understand it; the problem is well known to foreign language teachers as well as to users of programs such as Techland's Intelligent Translating System, available on the market several years ago, whose usefulness proved to be highly debatable.

THE PROBLEM CALLS FOR ARTIFICIAL INTELLIGENCE

Therefore, it became clear that finding effective solutions would depend on the field of artificial intelligence (AI). It would require broad knowledge of the external world and the ability to operate on it. As a matter of fact, in order to achieve full understanding, an AI would have to perceive the world in exactly the same way humans do. Let us have a brief look at some of the problems that need to be solved to overcome this complexity. It is worth noting that they can pose serious challenges even in casual human communication, not to mention communication with a computer:

– Speech signal segmentation – in the majority of spoken languages, the sounds representing successive letters blend into one another. The transformation of the analog signal into discrete symbols can therefore create serious obstacles, particularly if we take into account the small number of pauses between words (in English they are frequently hardly discernible). Moreover, the location of a pause usually depends on semantics, grammar, the context of a sentence and, last but not least, sheer emotion.

– Text segmentation – some languages, particularly those spoken in the Far East, like Chinese, Japanese or Thai, do not mark clear word boundaries in writing, and finding them can require serious analysis. To make matters worse, they do not even have clearly defined parts of speech, as languages of European origin do. In the preface to “The Art of War” by Sun Tzu, Jarosław Zawadzki, the translator, admits that another translation of the text, very different from his own, would be easily possible.

– Syntactic ambiguity – the grammars of natural languages are to a large extent ambiguous, and there are usually several possible syntactic analyses of a sentence. (“Give me a ring” may be asking for a piece of jewellery, a sort of encouragement to propose, or equally well a request for a phone call from the addressee of the utterance.) So what we actually need is broad information about semantics and context.

– Wrong or irregular data – another real and serious obstacle may be caused by a foreign or regional accent, speech defects, slips of the tongue, or the incorrect grammar so common in casual speech.

– Wrongly understood purpose – frequently a speaker utters a sentence to trigger some sort of action. However, the structure of the sentence may provide no information to identify that action. E.g. “Could you pass me the salt?” is, in fact, a request for a physical action, and the answer “Yes, I can”, if not followed by the action itself, makes no sense at all.
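The text segmentation problem above can be illustrated with a minimal sketch: a greedy longest-match segmenter over a toy dictionary. (The dictionary and the sample string are invented for illustration; real segmenters for Chinese or Thai rely on statistical models, not a word list.)

```python
def segment(text, dictionary, max_len=12):
    """Greedy longest-match segmentation for text written
    without spaces: a toy version of 'maximum matching'."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match
        # is found; a single character is always accepted as a fallback.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words

# English with the spaces removed stands in for a space-free script.
print(segment("artificialintelligence", {"artificial", "intelligence", "art"}))
# → ['artificial', 'intelligence']
```

Note that the greedy strategy prefers "artificial" over the shorter dictionary word "art"; with a less convenient dictionary, the same greediness can produce wrong splits, which is exactly why segmentation is described above as a subject of serious analysis.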

MODERN APPROACH GETS US CLOSER!

According to the modern approach, NLP is based on AI algorithms, driven by the increased need to manage unstructured enterprise content alongside structured data. We must keep in mind that any technology treating language merely as a sequence of symbols for pattern matching, or relying on the distribution and frequency of keywords, to mention just two examples, is bound to fall a long way short of the goal, which is to process everyday language and turn spoken or written words into structured data. Any NLP algorithm without an authentic comprehension of language will always be limited in its understanding capacity.
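The keyword-frequency limitation mentioned above can be demonstrated in a few lines: two sentences with opposite meanings produce identical frequency representations, because word order and structure are discarded. (The sentences are made up for illustration.)

```python
from collections import Counter

def bag_of_words(sentence):
    """Keyword-frequency representation: counts words,
    discarding order and structure entirely."""
    return Counter(sentence.lower().split())

a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

# Identical representations, opposite meanings - exactly the
# limitation described above.
print(a == b)  # → True
```

Any system built purely on such counts cannot tell who bit whom, which is why the article argues that pattern matching and keyword statistics alone cannot reach real language understanding.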

Here enters the highly sophisticated field of cognitive computing, which embraces attempts to overcome these limits by applying semantic algorithms able to mimic the human ability to read and understand.

IS IT REALLY WORTH THE EFFORT?

NLP, as a form of AI, helps computers read, comprehend and, crucially, respond by simulating the human ability to understand everyday language. Nowadays many organizations apply NLP techniques to optimize customer support, improve the efficiency of text analytics by making it easy to find the information they need, and enhance social media monitoring. For instance, banks may implement NLP algorithms to facilitate customer support, while large consumer product brands may combine NLP and semantic analysis to enhance their knowledge, refine management strategies and monitor social media. These few instances are only the tip of the iceberg, as will become apparent in the part devoted to voice recognition technology.

BEFORE THE COMPUTER UNDERSTANDS…

As Chris Jager recently put it on Lifehacker: “Before a computer can even understand what you mean, it needs to be able to understand what you said.” It goes without saying that achieving such a goal requires a sophisticated process involving audio sampling, feature extraction and, eventually, the actual voice recognition, in order to recognize individual sounds and then convert them into text. The technology that makes this possible has been worked on for many years, and the main achievement seems to be the development of techniques able to extract features of speech in a way similar to the human ear, and to recognize these features as the phonemes and sounds which make up human speech. This includes the application of neural networks, hidden Markov models and other methods commonly understood as artificial intelligence. All of this has significantly improved recognition rates, bringing error rates below 8%.
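The hidden Markov model step mentioned above can be sketched with the Viterbi algorithm: given a sequence of acoustic observations, it finds the most likely sequence of hidden states (here, phonemes). The two phoneme states, the observation labels and all the probabilities below are invented toy numbers; a real recognizer works on acoustic feature vectors, not named labels.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence (e.g. phonemes) for a
    sequence of observations - the core HMM decoding step."""
    # best[s] = (probability, path) of the best sequence ending in state s
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        best = {
            s: max(
                ((p * trans_p[prev][s] * emit_p[s][obs], path + [s])
                 for prev, (p, path) in best.items()),
                key=lambda t: t[0],
            )
            for s in states
        }
    return max(best.values(), key=lambda t: t[0])[1]

# Toy model: a hissing sound suggests "S", a tonal sound suggests "IY".
states = ("S", "IY")
start_p = {"S": 0.6, "IY": 0.4}
trans_p = {"S": {"S": 0.7, "IY": 0.3}, "IY": {"S": 0.4, "IY": 0.6}}
emit_p = {"S": {"hiss": 0.9, "tone": 0.1}, "IY": {"hiss": 0.2, "tone": 0.8}}

print(viterbi(["hiss", "hiss", "tone"], states, start_p, trans_p, emit_p))
# → ['S', 'S', 'IY']
```

The point of the sketch is that the decoder weighs each observation against both the emission probabilities and the transitions between phonemes, so context within the model, not a single sound in isolation, decides the output.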

This, unfortunately, means winning only half the battle, as the fact that a computer produces a text based on what you said does not mean that it truly understands it; this is where NLP comes into action! Moreover, it is action of an extremely sophisticated sort, if we keep in mind that human language itself is full of context and semantics (hidden meanings, reading between the lines).

The early systems were based on a very limited vocabulary and on commands which had to be uttered in exactly the right way to make sure the computer would understand them. It soon became clear that this was just the teething phase of what was to come!

A LITTLE BIT OF HISTORY

The phenomenon of voice recognition technology already has quite a long history. Over half a century ago, in the 1960s, in one of the episodes of the Star Trek series (Season 1), Captain Kirk searches for information about a criminal by talking to a computer. The scene, which can be found on YouTube, comes across as very realistic to a contemporary viewer: it bears an uncanny resemblance to Siri (Apple iOS). The idea can also be encountered in more recent productions such as Sneakers (1992). At some point in the plot, Robert Redford collects snippets of an executive's voice and then plays them back on a tape recorder into a computer in order to gain access to the system. Needless to say, the simplicity of sci-fi depictions may overshadow the real complexity of the whole process.

NOT SO EASY!

Anyone who has dealt with the early systems knows how difficult it could be to achieve any sort of communication success, given their limited vocabulary and comprehension capacity. They employed hard rules and decision trees to interpret commands; any deviation from these caused immediate problems, leading to communication breakdowns.

Newer systems, on the other hand, apply machine learning algorithms, similar to the hidden Markov models used in speech recognition, to build a vocabulary base. This allows more flexible queries, where the wording can change while the content of the message remains the same and is still comprehensible to the computer.
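The flexibility described above can be sketched as keyword-based intent matching: different wordings of the same request map to the same intent. This is a deliberately simplified illustration of the idea, not how Siri or any production assistant actually works; the intent names and keyword sets are invented.

```python
def match_intent(utterance, intents):
    """Map an utterance to the intent whose keywords it best covers,
    so the wording may vary while the content stays recognizable."""
    words = set(utterance.lower().replace("?", "").split())
    best_intent, best_score = None, 0.0
    for intent, keywords in intents.items():
        score = len(words & keywords) / len(keywords)  # fraction of keywords hit
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent

intents = {
    "set_reminder": {"remind", "reminder", "remember"},
    "send_message": {"send", "message", "text"},
}

print(match_intent("Please remind me about the concert", intents))   # → set_reminder
print(match_intent("Could you send a message to Anna?", intents))    # → send_message
```

The contrast with the old hard-rule systems is that no exact command phrasing is required: any utterance touching the right keywords is routed to the same action.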

One of the most spectacular examples of putting the above into practice, though by no means the only one, is Siri (an intelligent personal assistant), part of Apple operating systems such as iOS, watchOS, tvOS and macOS. It debuted in October 2011 in iOS 5 and first appeared on the iPhone 4s. It recognizes the user's natural speech, answers questions and performs various requests such as sending messages or making phone calls. Thanks to machine learning it is able to achieve even more: over time it analyses the user's preferences in order to provide the most suitable results and recommendations. At the moment its main disadvantage seems to be the lack of a Polish language version, but the progress it makes with each new release is indisputable.

SOME PROBLEMS STILL HAUNT!

In spite of all the above-mentioned advancements, voice recognition, with its vast variety of accents and pronunciations, is still a challenging and problematic field. A system may still fail to recognize the actual content of a message (e.g. a GPS system mispronouncing “Ipswich” as “eyp-swich” shows that, obviously enough, it is a two-way problem). The already mentioned Siri also frequently fails to understand commands whenever a pronunciation mistake appears or a user's message is not clear enough. The only reasonable way forward is to train the system on the variety of ways in which words can be pronounced. This, however, can be a large, time-consuming and complex process.

Then there emerges another highly debatable issue, which is context. More often than not, we have to pay close attention to a conversation in order to catch the right context, and only then do we realize how much we adjust the way we speak to convey the right extra context in our utterances. This is obviously even more challenging for computers. If, for instance, we ask “Did you get my e-mail?”, we expect an answer describing the reaction it triggered rather than a plain “Yes/No” reply.

AND SO?

Despite the constant and quite rapid progress, we are still a long way from having computers that properly interpret everything we say, like the one in Star Trek. The fact that we are gradually getting closer was recently confirmed by the automatic translation system created by Microsoft. Also, Google has just revealed a technology that combines image and voice recognition for more effective natural language processing: the camera in your smartphone automatically translates short texts from one language to another. It can even try to match the font, so the message looks the same, but in English! This means you can avoid an embarrassing situation when pondering over a menu written in Spanish, or give your order to a waiter who cannot speak English; Google will do the job for you! It is certainly only one more step, but one worth noting.

At the recent Worldwide Developers Conference, Apple presented an additional feature of Siri, introduced in iOS 9. It makes it possible to give simple, undetailed commands while looking at contextual information on your iPhone. Now, instead of asking Siri to “remind me about the concert on Wednesday”, it is enough to say “remind me of this” while viewing the Facebook event for the concert. It will know what we mean. Siri is by no means the only system of this more modern kind, yet regular users of, say, iPhones can easily track this line of development.

Similar technology is also present in Google devices, such as the “OK Google” voice commands, which brings us even closer to context-aware voice recognition.

The extremely exuberant technology market seems to know neither limits nor pause, to such an extent that the majority of the public do not even notice the new advancements, or are simply unable to do so. One way or another, each new development is a step towards the final goal, which is… Well, here we have one more question!

