Phrasification and Revisiting Google’s Phrase-Based Indexing
A newly granted Google patent on phrase-based indexing calls for a new look at that approach to indexing phrases on the Web, including a process referred to as phrasification.
Say you want to find out who the chief of police is in New York City. You might type the following words into a search box at Google:
New York police chief
When Google attempts to find an answer for you, it may break your query into individual words to find all of the documents that might be a best match for your search:
New AND York AND police AND chief
Google may then look at the documents that are returned, see which ones contain all of the terms you used, and rank those using some of its ranking algorithms to try to show you the best matches for your query.
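To make that word-by-word approach concrete, here is a minimal sketch of an AND search over a tiny inverted index. The three sample documents and the function names build_index and and_query are made up for illustration; none of this is taken from Google's actual systems.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def and_query(index, query):
    """Return the ids of documents that contain every word in the query."""
    postings = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*postings) if postings else set()

# Hypothetical documents, invented for this example.
docs = {
    1: "the new york police chief spoke today",
    2: "york police appoint a new chief",
    3: "new york pizza guide",
}

index = build_index(docs)
matches = and_query(index, "New York police chief")
print(matches)
```

Documents 1 and 2 both match, even though only the first is about the police chief of New York. Matching on individual words alone cannot tell the two apart.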
But what if Google instead tried to find phrases from your query that appear on web pages matching your search? What if Google used something they refer to as phrasification? Google might start by taking your query and breaking it into different combinations of phrases, such as the following:
New York AND police AND chief
New York AND police chief
New AND York police AND chief
New AND York police chief
New AND York AND police chief
New York police chief
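One way to produce combinations like these is to decide, at each gap between two neighboring words, whether to split there or keep the words together as a phrase. The sketch below does that exhaustively. It is an assumption about how candidates could be generated, not a description of what the patent does, and the function name phrasifications is hypothetical.

```python
from itertools import product

def phrasifications(query):
    """Return every way to split the query into runs of adjacent words."""
    words = query.split()
    if not words:
        return []
    results = []
    # One True/False choice per gap between neighboring words.
    for breaks in product([False, True], repeat=len(words) - 1):
        phrases, current = [], [words[0]]
        for word, split_here in zip(words[1:], breaks):
            if split_here:
                phrases.append(" ".join(current))
                current = [word]
            else:
                current.append(word)
        phrases.append(" ".join(current))
        results.append(phrases)
    return results

candidates = phrasifications("New York police chief")
for phrases in candidates:
    print(" AND ".join(phrases))
```

A four-word query has three gaps, so this yields eight candidates: the six listed above, plus the all-single-words version and "New York police AND chief". The count doubles with every extra word, so a real system would likely need to prune candidates rather than enumerate all of them.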
Each of these phrasifications may be scored by using a scoring model that includes:
The expected probability of the phrase occurring in a document,
The number of phrases in the phrasification,
A confidence measure of each phrase,
Some adjustment parameters for controlling the precision and recall of searches on the phrases.
The highest scoring phrasifications may be selected as best representing the phrases contained in a query, and possibly lead to a combination that best matches what you may intend to find with your search.
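The patent's actual scoring formula is not reproduced here, but a toy version shows how the listed factors could combine. Everything in this sketch is an assumption made for illustration: the probability and confidence numbers in PHRASE_STATS are invented, and the formula (average confidence, plus a weighted average probability, minus a penalty per phrase) is just one simple way to fold the factors together. The names recall_weight and count_penalty stand in for the adjustment parameters.

```python
# Hypothetical statistics per phrase: (probability of appearing in a
# document, confidence that the words form a real phrase). Invented values.
PHRASE_STATS = {
    "new": (0.30, 0.30),
    "york": (0.03, 0.30),
    "police": (0.05, 0.50),
    "chief": (0.04, 0.50),
    "new york": (0.02, 0.95),
    "police chief": (0.005, 0.90),
    "york police": (0.0005, 0.40),
    "new york police chief": (0.0003, 0.60),
}

def score(phrases, recall_weight=1.0, count_penalty=0.05):
    """Toy score: mean confidence + weighted mean probability - size penalty."""
    # Phrases missing from the table get zero probability and confidence.
    stats = [PHRASE_STATS.get(p, (0.0, 0.0)) for p in phrases]
    mean_prob = sum(prob for prob, _ in stats) / len(stats)
    mean_conf = sum(conf for _, conf in stats) / len(stats)
    return mean_conf + recall_weight * mean_prob - count_penalty * len(phrases)

candidates = [
    ["new", "york", "police", "chief"],
    ["new york", "police", "chief"],
    ["new york", "police chief"],
    ["new", "york police", "chief"],
    ["new york police chief"],
]

best = max(candidates, key=score)
print(" AND ".join(best))
```

With these invented numbers, the split into "new york" and "police chief" scores highest, because both pieces have high confidence as phrases and the phrasification stays short. Different statistics or parameter settings would change the ranking.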
For instance, it’s much more likely that you were searching for the chief of police in New York City than for the new chief of the York police.
That analysis might also tell Google that a related phrase such as "chief of police" could help it find pages that match the meaning behind your search.
If Google’s index contained information about phrases that appear on web pages in addition to individual terms, the phrasification approach might work to improve the results that you see at Google.
Google Phrase-Based Indexing
Over the past few years, a number of Google patent applications have been published that describe how the search engine might use a phrase-based indexing system.