Foundations of NLU – Word Representations

by Jayanth Srinivasa

Published on 05/17/2022
Last updated on 02/05/2024

5 min read

    Natural Language Understanding (NLU) is a branch of Machine Learning (ML) that deals with a machine's ability to understand human language. Human language is made up of words, whereas machines and ML algorithms require words to be represented as numbers or vectors. This blog explores how words are 'represented' in ML algorithms.

    One-hot encoded vector

One method is to represent words using one-hot encoded vectors, where the length of the vector equals the number of words in the dictionary. Each word in the dictionary maps to one vector component, and that component is set to one when the word is present; a sentence is then represented as the sum of the one-hot vectors of its words. For example, with the dictionary [cat, bat, rat, sat, mat, on, the], the sentence 'the cat sat on the mat' can be represented as [1, 0, 0, 1, 1, 1, 1]. The drawbacks of this method are that a word's position in the dictionary is arbitrary and carries no meaning; that most sentences use only a small fraction of the vocabulary, so the resulting vectors are sparse (a large number of components are zero); and that any addition to the vocabulary changes the length of every vector.
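To make this concrete, here is a minimal sketch of that encoding in Python. The dictionary and sentence are the toy examples above; the helper name is purely illustrative.

```python
# Toy dictionary from the example above.
dictionary = ["cat", "bat", "rat", "sat", "mat", "on", "the"]

def encode_sentence(sentence, vocab):
    """Set a component to 1 for every vocabulary word present in the sentence."""
    words = sentence.lower().split()
    return [1 if word in words else 0 for word in vocab]

print(encode_sentence("the cat sat on the mat", dictionary))
# -> [1, 0, 0, 1, 1, 1, 1]
```

Note how adding even one word to the dictionary would change the length of every vector produced this way.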

    Word vector

    "You shall know a word by the company it keeps" – John R Firth.
    The meaning of words can be effectively captured by observing the words around them. This insight and deep learning helped develop the concept of word vectors. In word vectors, the words are represented by vectors of fixed length. The values of the individual components of the vectors are acquired by a deep learning (DL) network in the training phase. Following are the tasks the DL network algorithm is trained on: a) learn to predict a word, given the neighbors of the word (called 'continuous bag of words'). b) learn to predict the neighbors, given a word (called a 'skip-gram'). Word vectors are randomly initialized and become semantically meaningful during the training phase. Two words that are similar in meaning will also have corresponding word vectors close to each other. These word vectors are dense vectors, useful as inputs in downstream tasks such as classification and word generation. Well-known word vectors such as Word2Vec and Glove were created using different vocabularies and are available in open source.
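As a rough illustration (not the original Word2Vec training setup), skip-gram vectors can be trained with the gensim library; the toy corpus, vector size, and window below are arbitrary choices.

```python
# Sketch of skip-gram training with gensim 4.x (assumed installed).
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

# sg=1 selects the skip-gram objective; sg=0 would select continuous bag of words.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, epochs=100)

# Words that appear in similar contexts end up with nearby vectors.
print(model.wv.most_similar("cat", topn=3))
```

With a real corpus, the nearest neighbors returned by `most_similar` become semantically meaningful.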

Context-aware word embeddings

Word vectors free the representation from the size and ordering of the dictionary, but they still have a limitation: context. A word can have different meanings depending on the context in which it appears. For example, 'bank' can be a noun referring to a place where we transact money or to the land beside a river, or a verb indicating a turn in three-dimensional space. Humans resolve these meanings from context, but a fixed vector representation forces a single vector to stand for all the meanings of a word. That single representation across different contexts can cause confusion in NLU tasks such as question answering. Researchers have addressed this by developing context-aware embeddings built on the concepts of attention and Transformers: the original word vector is adjusted by the surrounding word vectors, producing a context-aware embedding. Using these embeddings in downstream NLU tasks has significantly improved state-of-the-art results.
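One quick way to see this in practice is to extract the vector for 'bank' in two different sentences from a pretrained Transformer. The sketch below assumes Hugging Face's transformers library and the bert-base-uncased checkpoint; the helper function is illustrative.

```python
# Sketch: context-aware embeddings for the same word in two different contexts.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    """Return the contextual vector of the token 'bank' in the given sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, hidden_size)
    bank_id = tokenizer.convert_tokens_to_ids("bank")
    idx = inputs["input_ids"][0].tolist().index(bank_id)
    return hidden[idx]

v1 = bank_vector("I deposited cash at the bank.")
v2 = bank_vector("We sat on the bank of the river.")

# The same word gets different vectors in different contexts,
# so the similarity is well below 1.0.
print(torch.cosine_similarity(v1, v2, dim=0).item())
```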

Beyond word vectors

Word vectors and context-aware vectors are used across many General Language Understanding Evaluation (GLUE) tasks (e.g., question answering). But sometimes it is necessary to classify sentences or documents as a whole, which calls for an embedding vector that represents the sentence itself. The simplest way to obtain a sentence vector is to average the word vectors of the words in the sentence. Another approach is to prepend a special tag to the sentence; when that tag is passed through a context-aware Transformer network along with the sentence, its output captures the context-aware meaning of the entire sentence, not just the individual words. Similarly, we can create vectors that represent a whole document, most simply by averaging the sentence vectors of the sentences it contains. Document vectors capture the average meaning of a document, and vector metrics such as distance or similarity can then be used to measure the 'semantic distance' between two documents.
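Here is a small sketch of the averaging approach described above. It assumes `word_vectors` is some mapping from words to fixed-length numpy arrays (e.g., loaded Word2Vec or GloVe vectors); the helper names are hypothetical.

```python
import numpy as np

def sentence_vector(sentence, word_vectors):
    """Average the word vectors of the words in a sentence."""
    vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0)

def document_vector(sentences, word_vectors):
    """Average the sentence vectors of the sentences in a document."""
    return np.mean([sentence_vector(s, word_vectors) for s in sentences], axis=0)

def semantic_distance(doc_a, doc_b):
    """Cosine distance between two document vectors."""
    cos = np.dot(doc_a, doc_b) / (np.linalg.norm(doc_a) * np.linalg.norm(doc_b))
    return 1.0 - cos
```

Averaging is crude but surprisingly effective as a baseline; Transformer-based sentence embeddings generally give better results on semantic similarity tasks.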

Summary

The vector representations of words (and of sentences or documents) form the basis of many current NLU applications, but they do have some shortcomings, and knowledge graphs are known to overcome them. In upcoming blog posts, we will explore knowledge graphs and how they are used in NLU, and look at applications and downstream tasks that use word representations such as word vectors or context-aware word vectors as their building block. Visit the Cisco Research site to learn about other initiatives.