Named Entity Recognition (NER) is the process of identifying and categorizing named entities in a given text. Examples of categories are organizations, locations, times, personal names, monetary amounts, and rates such as percentages. Other terms that are synonymous with NER are:
- Entity Identification
- Entity Extraction
- Entity Chunking
NER is part of information extraction (IE), the process of automatically getting structured information from an unstructured document. With NER, the entity is the specific piece of information extracted. An example of NER is when the following unannotated text gets annotated:
Bill Gates sold US$35.8 billion worth of Microsoft stock and gave it to the Bill and Melinda Gates Foundation.
NER creates the following annotated text from the sentence above:
[Bill Gates]Person sold [US$35.8 billion]Money worth of Microsoft stock and gave it to the [Bill and Melinda Gates Foundation]Organization.
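The bracketed notation above can be reproduced with a few lines of Python. This is only a sketch of the output format, not of a real recognizer: the `Entity` type and the `annotate` and `span_of` helpers are hypothetical names introduced here, and the entity positions are found by plain string search rather than by an NER model.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    start: int   # index of the first character of the entity
    end: int     # index one past the last character
    label: str   # category such as "Person" or "Money"


def annotate(text: str, entities: list[Entity]) -> str:
    """Wrap each entity span in [brackets] followed by its label."""
    pieces = []
    cursor = 0
    for ent in sorted(entities):
        pieces.append(text[cursor:ent.start])  # untouched text before the entity
        pieces.append(f"[{text[ent.start:ent.end]}]{ent.label}")
        cursor = ent.end
    pieces.append(text[cursor:])  # remainder after the last entity
    return "".join(pieces)


def span_of(text: str, phrase: str, label: str) -> Entity:
    """Hypothetical helper: locate a phrase by plain string search."""
    start = text.index(phrase)
    return Entity(start, start + len(phrase), label)


sentence = ("Bill Gates sold US$35.8 billion worth of Microsoft stock and gave it "
            "to the Bill and Melinda Gates Foundation.")

entities = [
    span_of(sentence, "Bill Gates", "Person"),
    span_of(sentence, "US$35.8 billion", "Money"),
    span_of(sentence, "Bill and Melinda Gates Foundation", "Organization"),
]

annotated = annotate(sentence, entities)
print(annotated)
```

Storing entities as character offsets plus a label, rather than as marked-up text, keeps the original sentence intact and lets the same spans be rendered in any format.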
Named Entity Recognition first came to light in 1995 at the Sixth Message Understanding Conference (MUC-6) in the U.S. Back then, it was considered a subtask of IE. Today, however, NER is also used more broadly in natural language processing (NLP), and it has proven useful across many sectors. Some of its use cases are covered toward the end of this article.
But before we dive into the nitty-gritty of Named Entity Recognition, let’s first define what a named entity is.
What Is a Named Entity?
A named entity is a real-world object that a text refers to. It can be a person, an organization, a location, or a product or service. Anything that can be referred to by a proper name can be considered a named entity, and in practice the term also covers specific quantities and dates. In the example above, “Bill Gates,” “US$35.8 billion,” and “Bill and Melinda Gates Foundation” are all named entities.
The term was coined for the MUC-6 evaluation campaign, where named entities consisted of entity name expressions (ENAMEX), temporal expressions (TIMEX), and numerical expressions (NUMEX). These correspond to three broad classes: names, dates and durations, and quantities.
How a Named Entity Recognition (NER) System Works
At its core, the goal of an NER system is to extract meaningful information about the entities that appear in raw data, such as a text document. It follows this basic process:
- The NER system reads the text.
- The NER system identifies and highlights the entities.
- The NER system classifies the entities into predefined categories.
In the document below, for example, the NER system found eight entities and classified them into four different categories: person, organization, date, and nationality or religious or political group (NORP).
As a result of NER, you would probably see several articles about Sebastian Thrun and Recode on the sidebar when you read the paragraphs in the screenshot above on a news site. You may also see this week’s news from the U.S.
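The three steps listed earlier in this section (read, identify, classify) can be sketched in Python as follows. Everything here is illustrative: the sample sentence is made up, the function names are hypothetical, and the `KNOWN_ENTITIES` lookup table is a toy stand-in for the model a real system would use to find and label entities.

```python
import re

# Step 3 needs predefined categories. This tiny lookup table is a made-up
# stand-in for whatever model or dictionary a real system would use.
KNOWN_ENTITIES = {
    "Sebastian Thrun": "PERSON",
    "Recode": "ORG",
    "this week": "DATE",
}


def read_text(document: str) -> str:
    """Step 1: read the text (here, just normalize whitespace)."""
    return " ".join(document.split())


def identify_entities(text: str) -> list[tuple[int, int]]:
    """Step 2: find the character spans where entities occur."""
    spans = []
    for name in KNOWN_ENTITIES:
        for match in re.finditer(re.escape(name), text):
            spans.append((match.start(), match.end()))
    return sorted(spans)


def classify_entities(text: str, spans: list[tuple[int, int]]) -> list[tuple[str, str]]:
    """Step 3: assign each identified span to a predefined category."""
    return [(text[start:end], KNOWN_ENTITIES[text[start:end]]) for start, end in spans]


# Made-up input sentence, used only to exercise the three steps.
document = "Sebastian Thrun   spoke with Recode earlier this week."
text = read_text(document)
spans = identify_entities(text)
results = classify_entities(text, spans)
print(results)
```

Keeping identification and classification as separate functions mirrors the process described above, even though many real systems perform both in a single pass.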
Popular Ways Named Entity Recognition (NER) Systems Extract Entities
Two entity extraction methods are commonly used in NER systems: deep neural network extractors and pattern matching extractors.
- Deep neural network extractors: Also known as “statistical extractors,” deep neural network extractors are mainly used to identify entities that can’t be itemized. People, locations, and organizations, for instance, can’t all be listed one by one, and the same name can be ambiguous: Dakota can refer to a person or a place. By modeling the surrounding context statistically, NER systems can categorize such an entity more accurately.
- Pattern matching extractors: NER systems can also be set up to recognize expressions that follow a regular format, such as dates, times, uniform resource locators (URLs), email addresses, phone numbers, credit card numbers, and social media tags. When the system detects a string of characters followed by the @ symbol, a domain name, a period, and a top-level domain (TLD), for instance, it identifies an email address.
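Pattern matching of the kind described in the second item can be sketched with Python's standard `re` module. The three regular expressions below are deliberately simplified assumptions, not production-grade validators, and the sample sentence and the `extract_patterns` name are made up for illustration.

```python
import re

# Deliberately simplified patterns; real-world email, URL, and date formats
# have many more variations than these cover.
PATTERNS = {
    # characters, then "@", then a domain name ending in a period and a TLD
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[A-Za-z]{2,}"),
    "URL": re.compile(r"https?://[^\s,]+"),
    # ISO-style dates only, e.g. 2024-01-31
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def extract_patterns(text: str) -> list[tuple[str, str]]:
    """Return (matched text, label) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), label))
    found.sort()  # order by position in the text
    return [(value, label) for _, value, label in found]


sample = ("Write to support@example.com before 2024-01-31, "
          "or visit https://example.com/help for details.")
matches = extract_patterns(sample)
print(matches)
```

Because each pattern is independent, supporting a new kind of expression (a phone number format, for instance) only requires adding one entry to `PATTERNS`.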
There are other entity extraction methods, but these two are the most popular. They can also be used together for more accurate entity extraction.
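One way to use the two methods together is to run both extractors and then merge their spans. The sketch below assumes each extractor reports character offsets; the `Span` type, the `merge_spans` function, and the "longer span wins" policy are illustrative choices rather than a standard, and the extractor outputs are hand-written stand-ins, including a plausible mistake by the statistical side.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    label: str
    source: str  # which extractor produced the span


def merge_spans(*span_lists: list[Span]) -> list[Span]:
    """Combine spans from several extractors, dropping overlaps.

    Policy (an assumption, not a standard): longer spans win, and ties go to
    the span that starts first.
    """
    candidates = [span for spans in span_lists for span in spans]
    candidates.sort(key=lambda s: (-(s.end - s.start), s.start))
    kept: list[Span] = []
    for span in candidates:
        overlaps = any(span.start < k.end and k.start < span.end for k in kept)
        if not overlaps:
            kept.append(span)
    return sorted(kept)  # back to reading order


text = "Email Dakota at dakota@example.com"

# Hand-written stand-ins for what each extractor might return.
statistical = [
    Span(6, 12, "PERSON", "statistical"),     # "Dakota"
    Span(16, 22, "LOCATION", "statistical"),  # "dakota" inside the address (a mistake)
]
pattern = [
    Span(16, 34, "EMAIL", "pattern"),         # "dakota@example.com"
]

merged = merge_spans(statistical, pattern)
for span in merged:
    print(text[span.start:span.end], span.label, span.source)
```

Here the pattern extractor's longer email span overrides the mistaken location tag, while the person name found only by the statistical side is kept.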
Real-World Applications of Named Entity Recognition (NER)
- Content recommendation: When you read an article on a news website such as BBC or CNN, you will notice a list of related articles on the side or below the one you’re reading. Websites like these can use NER to extract entities from the current article and recommend others that contain information about the same entities. For instance, if an article is about the coronavirus outbreak, you’d see a slew of other articles about the same topic.
- Search algorithm: Have you ever wondered how sites that have millions of pages can return relevant results when you search for something? Take Wikipedia, for example. When you search for “jobs,” instead of returning all articles with the word “jobs” in them, Wikipedia returns a page that lists predefined entities that the search term might refer to. Hence, Wikipedia suggests a link to the page where “occupation” is defined, a section for people named Jobs, and another section for movies, video games, and other entertainment content where the word “jobs” appears. You would also see a section for places that contain the search term.
- E-commerce: Online stores that offer hundreds or thousands of products can benefit a lot from using NER in their product search algorithm. Without NER, a search for “black stiletto boots” could show stilettos that aren’t boots, boots that aren’t stilettos, and stiletto boots that aren’t black. E-commerce sites risk losing customers if this is the case. NER would classify the search term in our example by tagging “black” as the color and “stiletto boots” as the product type.
- Customer support: Many customers these days tag a brand’s social media handle when complaining. For companies with branches all over the world, NER makes the job of the customer service department easier. All posts from customers can be scanned for a location entity, and once one is found, the concern can be forwarded to the right branch.
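The e-commerce case in the list above can be sketched as follows. This is a minimal illustration, assuming small hand-written vocabularies in place of a trained model; the `COLORS` and `PRODUCT_TYPES` sets, the `CATALOG` entries, and the `tag_query` and `search` functions are all hypothetical.

```python
# Hypothetical vocabularies; a real store would derive these from its catalog
# or from a trained model.
COLORS = {"black", "red", "brown"}
PRODUCT_TYPES = {"stiletto boots", "boots", "stilettos"}


def tag_query(query: str) -> dict[str, str]:
    """Split a search query into a color and a product type."""
    tags: dict[str, str] = {}
    remaining = []
    for word in query.lower().split():
        if word in COLORS and "color" not in tags:
            tags["color"] = word
        else:
            remaining.append(word)
    # Prefer the longest product type so "stiletto boots" beats "boots".
    rest = " ".join(remaining)
    for product_type in sorted(PRODUCT_TYPES, key=len, reverse=True):
        if product_type in rest:
            tags["product_type"] = product_type
            break
    return tags


# Made-up catalog covering the three mismatches described in the text.
CATALOG = [
    {"name": "Classic pump", "color": "black", "product_type": "stilettos"},
    {"name": "Rain boot", "color": "black", "product_type": "boots"},
    {"name": "Night out boot", "color": "red", "product_type": "stiletto boots"},
    {"name": "City boot", "color": "black", "product_type": "stiletto boots"},
]


def search(query: str) -> list[str]:
    """Return the names of products that match every tagged attribute."""
    tags = tag_query(query)
    return [
        item["name"]
        for item in CATALOG
        if all(item.get(field) == value for field, value in tags.items())
    ]


tags = tag_query("black stiletto boots")
results = search("black stiletto boots")
print(tags, results)
```

Matching on tagged attributes rather than on individual words is what filters out the stilettos that aren't boots, the boots that aren't stilettos, and the stiletto boots that aren't black.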
NER is a robust process that can benefit various industries and departments. It can answer several questions that help companies understand their market and improve their business processes.