Named entity recognition (NER) is the process of identifying and categorizing named entities in a given text. Common categories include organizations, locations, times, names, monetary values, and rates. NER is also known as:

  • Entity identification
  • Entity extraction
  • Entity chunking

NER is part of information extraction (IE), the process of automatically pulling structured information out of unstructured documents. In NER, the named entity is the specific piece of information extracted. For example, take the following unannotated text:

Bill Gates sold US$35.8 billion worth of Microsoft stocks and gave it to the Bill and Melinda Gates Foundation.

NER creates the following annotated text from the sentence above:

[Bill Gates] person sold [US$35.8 billion] money worth of Microsoft stocks and gave it to the [Bill and Melinda Gates Foundation] organization.
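
In software, an annotation like this is usually stored as a list of labeled character spans rather than as bracketed text. The sketch below is one minimal way to do that; the `Entity` class and the `find_entity` and `annotate` helpers are names made up for this illustration, not part of any particular NER library, and the spans are supplied by hand rather than predicted by a model.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    start: int   # offset of the entity's first character
    end: int     # offset one past the entity's last character
    label: str   # category assigned to the span


def find_entity(text, surface, label):
    """Locate the first occurrence of `surface` and wrap it as an Entity."""
    start = text.index(surface)
    return Entity(start, start + len(surface), label)


def annotate(text, entities):
    """Rebuild the text with each entity bracketed and followed by its label."""
    pieces, cursor = [], 0
    for ent in sorted(entities, key=lambda e: e.start):
        pieces.append(text[cursor:ent.start])
        pieces.append(f"[{text[ent.start:ent.end]}] {ent.label}")
        cursor = ent.end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = (
    "Bill Gates sold US$35.8 billion worth of Microsoft stocks and gave it "
    "to the Bill and Melinda Gates Foundation."
)
# Hand-written spans standing in for the output of a real NER system.
entities = [
    find_entity(text, "Bill Gates", "person"),
    find_entity(text, "US$35.8 billion", "money"),
    find_entity(text, "Bill and Melinda Gates Foundation", "organization"),
]
print(annotate(text, entities))
```

Keeping offsets instead of bracketed text leaves the original sentence untouched, so other tools can still read it while looking up the entities separately.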


NER first came to light in 1995, at the Sixth Message Understanding Conference (MUC-6) in the U.S. Back then, it was considered a subtask of IE. Today, NER is also a standard task in natural language processing (NLP), and it has proven useful across many sectors.

But before we dive into the nitty-gritty of NER, let’s define what a named entity is first.

What Is a Named Entity?

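The term “named entity” was coined for the MUC-6 evaluation campaign. It covered entity name expressions (ENAMEX), such as people, organizations, and locations, and numerical expressions (NUMEX), such as monetary amounts and percentages. The campaign also defined temporal expressions (TIMEX) for dates and times. Together, these give three broad classes: names, quantities, and dates and durations.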

How Does a Named Entity Recognition (NER) System Work?

At its core, the goal of an NER system is to extract meaningful information about the entities that appear in raw data, such as document text. It does so in three steps:

  1. The NER system reads the text.
  2. The system identifies and highlights entities.
  3. The system classifies the entities into predefined categories.
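
The three steps above can be sketched in a few lines. This is a toy illustration only: the `KNOWN_ENTITIES` lookup table is a hypothetical stand-in for a trained classifier, the capitalization regex is a crude way to find candidates, and the sample sentence is made up.

```python
import re

# Hypothetical lookup table standing in for a trained classifier.
KNOWN_ENTITIES = {
    "Sebastian Thrun": "person",
    "Recode": "organization",
}

# One or more capitalized words in a row, e.g. "Sebastian Thrun".
CANDIDATE = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*")


def recognize(text):
    """Return (entity text, start offset, category) for each recognized entity."""
    results = []
    # Steps 1 and 2: read the text and pick out candidate spans.
    for match in CANDIDATE.finditer(text):
        # Step 3: classify the candidate; unknown candidates are dropped.
        label = KNOWN_ENTITIES.get(match.group())
        if label is not None:
            results.append((match.group(), match.start(), label))
    return results


# Made-up sample sentence for illustration.
sample = "Sebastian Thrun spoke with Recode about his latest project."
for entity in recognize(sample):
    print(entity)
```

A production system would replace both the regex and the lookup table with a statistical model, but the read, identify, and classify flow stays the same.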

In the document below, for example, the NER system found eight entities that have been classified into four different categories—person, organization, date, and nationality or religious or political group (NORP).

How NER Works

Source: AI Time Journal

If you read the paragraphs in the screenshot above on a news site, NER is the reason the sidebar would probably show other articles about Sebastian Thrun and Recode. You might also see this week’s news from the U.S.

How Do Named Entity Recognition (NER) Systems Extract Entities?

NER systems commonly use two entity extraction methods: deep neural network extractors and pattern-matching extractors.

  • Deep neural network extractors: Also known as “statistical extractors,” these are mainly used to identify entities that can’t be itemized. People, locations, and organizations, for instance, can’t be listed one by one, and the same word can belong to more than one category: Dakota can refer to a person or a place. By modeling the surrounding context statistically, an NER system can choose the most likely category for an entity.
  • Pattern-matching extractors: NER systems can also be set up to recognize expressions that follow a fixed format, such as dates, times, uniform resource locators (URLs), email addresses, phone numbers, credit card numbers, and social media tags. When the system detects a string of characters, the @ symbol, another string of characters, a period, and a top-level domain (TLD), for instance, it identifies an email address.

There are other entity extraction methods, but these two are the most popular. They can also be used together for more accurate entity extraction.
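
To make the pattern-matching idea concrete, here is a minimal sketch built on Python’s standard `re` module. The three patterns are simplified assumptions (real email and date formats are far more varied), and the sample post, address, and handle are invented.

```python
import re

# Simplified, illustrative patterns; production systems use stricter ones.
PATTERNS = {
    # characters, "@", characters, then one or more ".something" parts
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # ISO-style dates only, e.g. 2024-03-15
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    # hashtags and mentions that do not sit inside another word
    "social_tag": re.compile(r"(?<!\w)[#@]\w+"),
}


def extract_patterns(text):
    """Return (category, matched text) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    found.sort()
    return [(label, value) for _, label, value in found]


# Invented sample post for illustration.
post = "Email support@example.com before 2024-03-15 or tag @acme_help with #refund."
for label, value in extract_patterns(post):
    print(label, value)
```

The lookbehind in the `social_tag` pattern keeps the `@example` inside the email address from being mistaken for a social media mention, a small example of why the two kinds of extractors are often tuned together.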

What Methodologies Do Named Entity Recognition (NER) Systems Typically Use?

NER systems typically use various methodologies, including:

  • Rule-based approaches: NER systems rely on handcrafted rules and patterns to identify named entities based on linguistic features, such as capitalization, part-of-speech (PoS) tags, and surrounding words. While simple and interpretable, rule-based systems may struggle with text that departs from the patterns their authors anticipated.
  • Statistical models: Statistical models, such as Hidden Markov Models (HMMs), Conditional Random Fields (CRFs), and Maximum Entropy Markov Models (MEMMs), learn to recognize named entities from labeled training data. They use features extracted from text, such as word embeddings, PoS tags, and context windows, to make predictions. These models are effective at capturing complex patterns but require annotated training data.
  • Hybrid approaches: Some NER systems combine rule-based methods with statistical or machine learning (ML) techniques to leverage the strengths of each approach. For example, rules may be used to preprocess text and generate features for a statistical or ML model, improving overall performance.
  • Ensemble methods: These combine predictions from multiple models to improve accuracy and robustness. Techniques like bagging, boosting, and stacking are commonly used to create ensemble NER systems that outperform individual models.
  • Deep learning architectures: Recent advances in deep learning have produced architectures tailored to NER. For instance, bidirectional long short-term memory networks with a conditional random field output layer (BiLSTM-CRF) and transformer-based models like BERT have achieved state-of-the-art performance on various NER benchmarks.
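
As an illustration of the rule-based approach, the sketch below labels a token by looking at capitalization and at the words around it. The two rules and the word lists are assumptions chosen for this example, and the sample sentence is invented.

```python
# Hypothetical cue words; a real rule set would be much larger.
PERSON_TITLES = {"Mr.", "Ms.", "Mrs.", "Dr."}
ORG_SUFFIXES = {"Inc.", "Corp.", "Ltd."}


def tag_tokens(tokens):
    """Assign PERSON, ORG, or O (outside any entity) to each token."""
    labels = ["O"] * len(tokens)
    for i, token in enumerate(tokens):
        # Rule 0: only capitalized tokens can be entities.
        if not token[0].isupper():
            continue
        # Rule 1: a capitalized word right after a title is a person.
        if i > 0 and tokens[i - 1] in PERSON_TITLES:
            labels[i] = "PERSON"
        # Rule 2: a capitalized word right before a company suffix is an organization.
        elif i + 1 < len(tokens) and tokens[i + 1] in ORG_SUFFIXES:
            labels[i] = "ORG"
    return labels


# Invented sample sentence, split on whitespace for simplicity.
tokens = "Dr. Rivera joined Initech Inc. last year".split()
for token, label in zip(tokens, tag_tokens(tokens)):
    print(token, label)
```

The rules are easy to read and debug, but drop the title or the suffix and both entities are missed, which is the kind of variability that pushes systems toward statistical or hybrid methods.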

What Are Some of the Most Popular Named Entity Recognition (NER) Techniques?

Some of the most popular NER techniques include:

  • Conditional random fields (CRFs): CRFs are widely used due to their ability to model sequential dependencies between labels. They consider contextual information along with input features to make predictions, making them effective for capturing patterns in text data.
  • BiLSTM: BiLSTM networks are recurrent neural networks (RNNs) that process input sequences in both forward and backward directions. They can capture long-range dependencies in text data and have been successfully used for NER tasks by learning representations from input sequences.
  • BERT: BERT is a transformer-based deep learning model that utilizes self-attention mechanisms to capture contextual information from input text. It has achieved state-of-the-art performance in various NLP tasks, including NER, by pretraining on large text corpora and fine-tuning on task-specific data.
  • Gated recurrent units (GRUs): Another type of recurrent neural network that can model sequential data. They are similar to LSTMs but have a simpler architecture, making them computationally more efficient. GRUs have been used for their ability to capture dependencies in text sequences.
  • Support vector machines (SVMs): Traditional ML algorithms that have been applied to NER tasks. They work by finding the hyperplane that best separates two classes in feature space, so multi-class NER is typically handled by combining several binary classifiers. SVMs are often used in conjunction with feature engineering techniques for NER.
  • Ensemble learning: Combines predictions from multiple models to improve overall performance. Ensemble methods, such as bagging, boosting, and stacking have been applied to NER systems by combining the outputs of different classifiers or models trained on different subsets of data.
  • BiLSTM-CRF: A popular architecture that combines the strengths of BiLSTM networks for capturing contextual information and CRFs for modeling label dependencies. This architecture has shown excellent performance in NER tasks because the BiLSTM encoder and the CRF layer are trained jointly.
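
To see why modeling label dependencies matters, here is a small sketch of Viterbi decoding, the standard way a CRF layer picks the best label sequence. All scores are invented for illustration; in a real CRF or BiLSTM-CRF, the per-token scores and the transition scores are learned from labeled data. Labels follow the common B/I/O scheme: B-PER starts a person name, I-PER continues it, and O marks everything else.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions: one dict per token mapping label -> score
    transitions: dict mapping (previous label, current label) -> score
    """
    # best[t][label] is the best score of any path ending in `label` at token t.
    best = [{lab: emissions[0][lab] for lab in labels}]
    back = []  # back[t - 1][label] is the previous label on that best path
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in labels:
            prev = max(
                labels,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back pointers from the best final label.
    path = [max(labels, key=lambda lab: best[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


LABELS = ["O", "B-PER", "I-PER"]
tokens = ["Bill", "Gates", "sold"]
# Invented per-token scores; note that "Gates" alone slightly prefers O.
emissions = [
    {"O": 0.1, "B-PER": 2.0, "I-PER": 0.5},
    {"O": 1.0, "B-PER": 0.8, "I-PER": 0.9},
    {"O": 2.0, "B-PER": 0.0, "I-PER": 0.0},
]
# Invented transition scores: reward B-PER -> I-PER, penalize O -> I-PER.
transitions = {("B-PER", "I-PER"): 1.5, ("O", "I-PER"): -10.0}

greedy = [max(LABELS, key=lambda lab: e[lab]) for e in emissions]
print("greedy: ", greedy)
print("viterbi:", viterbi(emissions, transitions, LABELS))
```

Choosing each label independently tags “Gates” as O, while Viterbi decoding keeps “Bill Gates” together as one person name because the transition scores reward a name continuing after it starts.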

What Are Some Real-World Applications of Named Entity Recognition (NER)?

NER has several applications across industries. Some of them are listed below.

  • Content recommendation: When you read an article on a news website, such as the BBC or CNN, you’ll notice a list of related articles beside or below it. These websites use NER to extract entities from the article you’re reading and recommend others that mention the same entities. For instance, if an article is about the coronavirus outbreak, you’d see a slew of other articles about the same topic.
  • Search algorithm creation: Have you ever wondered how sites with millions of pages can return relevant results when you search for something? Take Wikipedia, for example. When you search for “jobs,” instead of returning all articles with the word “jobs” in them, Wikipedia returns a page that lists the predefined entities that the search term may refer to. The page links to the article where “occupation” is defined and has a section for people named Jobs and another for movies, video games, and other entertainment content where the word “jobs” appears. You would also see a section for places that contain the search term.
  • E-commerce enablement: Online stores that offer hundreds or thousands of products can benefit a lot from using NER in their product search algorithm. Without NER, a search for “black stiletto boots” would show stilettos that aren’t boots, boots that aren’t stilettos, and stiletto boots that aren’t black, and e-commerce sites lose customers when that happens. With NER, the search term in our example would be classified with black as the color and stiletto boots as the product type.
  • Customer support provision: Most customers these days tag a brand’s social media handle when complaining. For companies with branches all over the world, NER makes the job of the customer service department easier. All posts from customers can go through a scan for a location entity, and once found, the concern can get forwarded to the right branch.
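
The e-commerce example can be sketched as follows. The color and product-type vocabularies, the tiny catalog, and the `parse_query` and `search` helpers are all hypothetical; a real store would use a trained model and a proper search index.

```python
# Hypothetical vocabularies and catalog for illustration.
COLORS = {"black", "brown", "red"}
PRODUCT_TYPES = {"boots", "stilettos", "stiletto boots"}

CATALOG = [
    {"name": "Black stiletto boots", "color": "black", "product_type": "stiletto boots"},
    {"name": "Black stilettos", "color": "black", "product_type": "stilettos"},
    {"name": "Brown ankle boots", "color": "brown", "product_type": "boots"},
    {"name": "Red stiletto boots", "color": "red", "product_type": "stiletto boots"},
]


def parse_query(query):
    """Label parts of a search query as color or product type."""
    words = query.lower().split()
    entities = {}
    # Try the longest phrases first so "stiletto boots" wins over "boots".
    for size in range(len(words), 0, -1):
        for start in range(len(words) - size + 1):
            phrase = " ".join(words[start:start + size])
            if phrase in PRODUCT_TYPES and "product_type" not in entities:
                entities["product_type"] = phrase
            elif phrase in COLORS and "color" not in entities:
                entities["color"] = phrase
    return entities


def search(query):
    """Return catalog items that match every entity found in the query."""
    wanted = parse_query(query)
    return [
        item["name"]
        for item in CATALOG
        if all(item.get(key) == value for key, value in wanted.items())
    ]


print(parse_query("black stiletto boots"))
print(search("black stiletto boots"))
```

Because the query is matched entity by entity rather than word by word, the black stilettos, the brown boots, and the red stiletto boots are all filtered out.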

NER is a robust process that can benefit various industries and departments. It can answer several questions that help companies understand their market and improve their business processes.

Key Takeaways

  • NER is the process of identifying and categorizing named entities in text, such as organizations, locations, time, names, money, and others.
  • Named entities refer to real-world objects, such as people, organizations, and locations.
  • Two commonly used entity extraction methods are deep neural network extractors and pattern-matching extractors.
  • NER systems typically use rule-based approaches, statistical models, hybrid and ensemble methods, and deep learning architectures to identify and categorize named entities.
  • NER systems use techniques like CRFs, BiLSTM, BERT, and others to label entities.
  • NER has real-world applications in content recommendation, search algorithm creation, e-commerce enablement, and customer support provision.