So far, this series has addressed two key elements of Generative AI: Large Language Models and Natural Language Processing. In this article, the author takes a look back at the surprisingly long and fascinating history of Natural Language Processing. He also introduces several NLP platforms that are available.

The History of Natural Language Processing

Although Natural Language Processing is only now coming to the forefront, it has a very long history, one that stretches back even further than that of AI. Here is a timeline of how it all has evolved:

  • 1906–1911: The first courses laying the groundwork for NLP were taught at the University of Geneva by Professor Ferdinand de Saussure, who introduced the concept of language as a structured system.
  • 1916: Albert Sechehaye and Charles Bally compiled the teachings of Professor Saussure into a book called Cours de Linguistique Générale. This manuscript led to the “Structuralist Approach” that is still used in NLP today.
  • 1950: Alan Turing published a scientific paper describing a test for a machine that could “think” on its own. His basic hypothesis was that if a computer could carry on a conversation with a human being convincingly enough, then it could also be said to “think.” This eventually became known as the “Turing Test.”
  • 1952: The Hodgkin–Huxley model showed how neurons in the brain form the “networks” that underlie the thought and reasoning processes. This line of work eventually led to the first known Chatbot, “ELIZA,” developed by Joseph Weizenbaum and designed to mimic a psychotherapist. Its responses were pre-scripted, however; it could not generate answers to queries on its own.
  • The 1960s: Scientists created the first versions of Semantic Analysis, Part-of-Speech Tagging, and Parsing. The first “corpora” also appeared: machine-readable collections of documents, annotated with linguistic information, that could subsequently be used to create NLP algorithms.
  • The 1970s: The first natural language understanding systems appeared. The best known was “SHRDLU,” developed by Terry Winograd, a rule-based program that could interpret typed commands and move colored blocks in a virtual environment.
  • The 1980s: The first NLP algorithms that relied on Machine Learning emerged.
  • The 1990s: Statistical and Neural Network based models gained increasing levels of sophistication. The Hidden Markov Model, also called the “HMM,” was widely adopted and could convert a spoken phrase into a block of text.
  • The 2000s and early 2010s: “Word Embeddings” were developed. Two influential models, known as “Word2Vec” and “GloVe,” were created. They represent words and blocks of text as “Dense Vectors.” A technical definition of Dense Vectors is as follows:

“Dense vectors are a type of mathematical objects that represent data in machine learning and artificial intelligence.” 1

    These models could capture the semantics of words and the relationships between them. For example, the words “computer” and “keyboard” can be represented as mathematical vectors that point in nearly the same direction geometrically (see the short sketch after this timeline).

  • The 2010s: Google came out with a new NLP platform called “Neural Machine Translation.” It was designed for foreign language translation while keeping the semantics of the text or spoken language nearly identical through the conversion process.
  • Present day: ChatGPT, powered by Transformer-based Large Language Models, has become the de facto Generative AI platform.
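
To make the “Dense Vector” idea above a bit more concrete, here is a minimal Python sketch of how similarity between word vectors is measured. The three-dimensional vectors and their values are purely hypothetical stand-ins; real Word2Vec or GloVe embeddings have hundreds of dimensions learned from large corpora.

```python
# A minimal sketch of the "dense vector" idea: words are represented as
# real-valued vectors, and semantically related words end up geometrically
# close. The 3-dimensional vectors below are hypothetical, for illustration only.
import numpy as np

embeddings = {
    "computer": np.array([0.81, 0.45, 0.12]),  # hypothetical values
    "keyboard": np.array([0.78, 0.50, 0.10]),  # deliberately close to "computer"
    "banana":   np.array([0.05, 0.20, 0.95]),  # an unrelated word
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two dense vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["computer"], embeddings["keyboard"]))  # ~0.99, very similar
print(cosine_similarity(embeddings["computer"], embeddings["banana"]))    # ~0.27, not similar
```

The higher the cosine similarity, the more closely related the two words are in the embedding space, which is exactly the property that models such as Word2Vec and GloVe are trained to produce.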

The Tools in Natural Language Processing

Just as with Generative AI, there are a number of platforms you can make use of instead of building your own NLP model from scratch. Here are some of the most widely used tools:

  • Gensim: This tool can recognize the semantic similarities between blocks of written text. It also has an indexing function that can handle a large volume of data. More detailed information can be found here.
  • spaCy: This is deemed to be one of the newer NLP libraries. It has a plethora of pretrained models, which can also be used for Deep Learning applications (a short sketch of its pretrained English pipeline appears after this list). More information can be found here.
  • IBM Watson: This is probably one of the best known and most widely used platforms in Natural Language Processing. For example, it can determine the keywords used in spoken language, as well as the emotional states conveyed by the end user. It is also used quite heavily in both the financial and healthcare sectors. More information about this can be seen here.
  • Natural Language Toolkit (NLTK): This is a tool that allows you to create and execute Python-based source code to gain an overall understanding of human language and of the steps that need to be taken to model it. More information can be found here.
  • MonkeyLearn: This is powered by NLP algorithms and is used to gain analytical insights from both written and spoken language. One of its key advantages is its powerful Sentiment Analysis engine. It can also connect to Google Sheets and Excel files. More information can be found here.
  • TextBlob: This tool has pretrained models for Classification, Sentiment Analysis, and various kinds of keyword extraction, and it also comes with prebuilt Machine Learning models (see the sentiment example after this list). More information can be found here.
  • Stanford CoreNLP: This is an NLP toolkit that was developed at Stanford University. It makes use of the Java Development Kit and is heavily used for Tokenization and Named Entity Recognition. More details about this platform can be seen here.
  • Google Cloud Natural Language API: This is an API that you can call from your own source code to build NLP-based applications that perform Entity Extraction, Content Classification, and Sentiment Analysis. More information about this platform can be found here.
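
To give a feel for how a couple of the libraries above are used in practice, here is a minimal sketch with spaCy's small pretrained English pipeline (en_core_web_sm, installed separately) performing tokenization, Part-of-Speech Tagging, and Named Entity Recognition. The example sentence is made up purely for illustration.

```python
# Minimal spaCy sketch: tokenization, POS tagging, and Named Entity Recognition
# with the small pretrained English pipeline. Assumes:
#   pip install spacy
#   python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Google released a new translation model in Mountain View last year.")

# Each token with its part-of-speech tag
for token in doc:
    print(token.text, token.pos_)

# Named entities found by the pretrained model (e.g., ORG, GPE, DATE)
for ent in doc.ents:
    print(ent.text, ent.label_)
```

And here is an equally small sketch of Sentiment Analysis with TextBlob's pretrained, lexicon-based model. The review text is again hypothetical; polarity runs from -1.0 (negative) to +1.0 (positive).

```python
# Minimal TextBlob sketch: sentiment analysis with the library's pretrained model.
# Assumes: pip install textblob
# (You may also need: python -m textblob.download_corpora for other features.)
from textblob import TextBlob

review = TextBlob("The new platform is fast and remarkably accurate.")
print(review.sentiment.polarity)      # closer to +1.0 means more positive
print(review.sentiment.subjectivity)  # 0.0 = objective, 1.0 = subjective
```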

Up Next: The Pros & Cons of Natural Language Processing

In the next article, the author will wrap up the series by discussing the advantages and disadvantages of Natural Language Processing.

Sources/References:

  1. Dense Vectors in Natural Language Processing, Medium.com

Ravi Das is a Cybersecurity Consultant and Business Development Specialist. He provides Cybersecurity Consulting through his private practice, RaviDas Tech, Inc., and holds the Certified in Cybersecurity (CC) certification from ISC2.
Visit his website at mltechnologies.io
