Sphinx Search comes to HubPages

This week, HubPages upgraded its search features to use a popular indexing technology known as Sphinx Search.  This is the same search technology used by Craigslist, and it has enabled HubPages to add the following new capabilities to our search:

  • Matching search terms in bold letters
  • Exact phrase search
  • Ability to exclude search terms from the results
  • Support for both American and British spellings
  • Searches automatically ignore common words

Matching Search Terms in Bold Letters

In the past, we showed the hub summary for every search result (in HubPages terminology, each article written on HubPages is called a “hub”).  While the hub summary gives a great, short write-up describing the hub, it does not make it clear how a hub matches the search terms.

Using Google search as our model, we now show each matching search term in bold letters.  We believe that this makes it easier to find the hub that you are looking for and to refine your search when the first results are not quite right.

For people who like to browse hubs, the summary text will still be used: when a user sorts hubs by Latest, Hot, or Best, and when a user lists hubs by tag.
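As a rough illustration of the bolding described above (a minimal sketch, not the actual HubPages or Sphinx code), the function below wraps each whole-word match in bold tags. The name `bold_matches` and the use of `<b>` tags are assumptions made for this example, and the excerpt is assumed to be plain text.

```python
import re

def bold_matches(excerpt, terms):
    """Wrap each whole-word occurrence of a search term in <b> tags.

    Hypothetical helper for illustration; assumes a plain-text excerpt.
    """
    # Longest terms first so a longer term wins over a shorter prefix.
    words = sorted((re.escape(t) for t in terms if t), key=len, reverse=True)
    if not words:
        return excerpt
    # \b keeps "tea" from matching inside "steak"; matching ignores case.
    pattern = re.compile(r"\b(" + "|".join(words) + r")\b", re.IGNORECASE)
    return pattern.sub(r"<b>\1</b>", excerpt)

print(bold_matches("Wu Yi tea is an oolong tea.", ["tea", "wu"]))
```

The original capitalization of the excerpt is preserved; only the markup around the matching words is added.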

Exact Phrase Search

Now, on HubPages, you can search for an exact phrase such as ebay capsule by putting the search terms in quotes: “ebay capsule”.

Without quotes, the search returns any article that contains both terms, regardless of their order or how far apart they are.  As with Google search, you now have the choice.

For example, if you want to find all hubs related to Alice in Wonderland without getting all the hubs related just to Alice, you would enter “Alice in Wonderland” (in quotes) as your search.

Ability to Exclude Search Terms from the Results

We’ve also added the ability to exclude terms from the search results.

For example, let’s say you want to find hubs written about Brad from Linkin Park.  You don’t remember his last name (it’s Delson, if you are interested), but you don’t feel like going through all the hubs about Brad Pitt.  You can then do the following query:

brad -“brad pitt”

Of course, if you knew his last name, then you could also use:

“brad delson”
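To make the semantics of quoted phrases and excluded terms concrete, here is a small sketch of how such a query could be interpreted. It is not Sphinx's parser; `parse_query`, `contains_phrase`, and `matches` are hypothetical names, and the sketch works on a single piece of text rather than an index.

```python
import re

# A token is an optional leading "-", then a quoted phrase or a bare word.
TOKEN = re.compile(r'(-?)(?:"([^"]+)"|(\S+))')

def parse_query(query):
    """Split a query into required and excluded word sequences."""
    # Treat typographic quotes like plain double quotes.
    query = query.replace("\u201c", '"').replace("\u201d", '"')
    include, exclude = [], []
    for minus, phrase, word in TOKEN.findall(query):
        words = re.findall(r"\w+", (phrase or word).lower())
        if words:
            (exclude if minus else include).append(words)
    return include, exclude

def contains_phrase(doc_words, phrase):
    """True if the words of phrase appear consecutively in doc_words."""
    n = len(phrase)
    return any(doc_words[i:i + n] == phrase
               for i in range(len(doc_words) - n + 1))

def matches(text, query):
    """True if text has every required item and none of the excluded ones."""
    doc_words = re.findall(r"\w+", text.lower())
    include, exclude = parse_query(query)
    return (all(contains_phrase(doc_words, p) for p in include)
            and not any(contains_phrase(doc_words, p) for p in exclude))

print(matches("Brad Delson plays guitar in Linkin Park", 'brad -"brad pitt"'))
print(matches("Brad Pitt stars in a new film", 'brad -"brad pitt"'))
```

A bare word is treated as a one-word phrase, so unquoted terms match in any order and at any distance, while quoted terms must appear side by side.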

Support for Both American and British Spellings

Many words are spelled differently in British English and American English:

  • colour and color
  • theatre and theater
  • humour and humor

To make search as intuitive as possible, we’ve made it so that you can search in either American English or British English and get the same results.

In the past, if you did a search such as “British Humor”, you might not find any hubs.  After all, these hubs will most likely use the term “British Humour”.  But with the support for both British and American spellings, the following two searches are exactly the same:

“British Humor”

“British Humour”

and likewise, these searches will return exactly the same results:

“American Humour”

“American Humor”
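One way to picture this behavior is to map every word to a single canonical spelling before comparing, for both the hub text and the query. The sketch below assumes that approach; the mapping table holds only the three word pairs named in this post, and `normalize_spelling` is a hypothetical name rather than part of Sphinx or HubPages.

```python
import re

# Illustrative mapping only: the three pairs mentioned in the post.
BRITISH_TO_AMERICAN = {
    "colour": "color",
    "theatre": "theater",
    "humour": "humor",
}

def normalize_spelling(text):
    """Lowercase the text and replace known British spellings."""
    words = re.findall(r"\w+", text.lower())
    return [BRITISH_TO_AMERICAN.get(w, w) for w in words]

# Both spellings reduce to the same word list, so they match the same hubs.
print(normalize_spelling("British Humour") == normalize_spelling("British Humor"))
```

Because the same normalization is applied on both sides, it does not matter which spelling the author or the searcher used.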

Searches Automatically Ignore Common Words

Every once in a while someone includes common words without thinking about it.  If not automatically ignored, common words can needlessly slow down a search since they are pretty much found in every hub.  We’ve set up the HubPages search to ignore common words.

Here is the list of terms that are ignored:

  • the
  • this
  • that
  • a
  • an
  • of
  • from
  • to
  • very
  • much
  • some
  • and
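The effect of the list above can be sketched in a few lines. This is only an illustration of the idea; `strip_common_words` is a hypothetical name, and the set simply copies the twelve terms listed.

```python
import re

# The ignored terms, copied from the list above.
COMMON_WORDS = {
    "the", "this", "that", "a", "an", "of",
    "from", "to", "very", "much", "some", "and",
}

def strip_common_words(query):
    """Return the query's words in order, without the ignored terms."""
    words = re.findall(r"\w+", query.lower())
    return [w for w in words if w not in COMMON_WORDS]

print(strip_common_words("The history of the theatre"))
```

With this filter, a search for "The history of the theatre" is effectively a search for the two words history and theatre.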

New General Hub Search

The General Hub Search is perhaps the most prominent search on HubPages.  In the past, when you entered a search, only terms with at least three letters were used.  For example, if you wanted to look for articles on wu yi tea, you would get a list of hubs based only on the search term tea.

Then, you would browse through the list of summaries.  If the summary or title included the search term, that was fine, but often it was not clear how the listed hub matched the search terms.  Additionally, the quality of the hub was not a factor in the ordering of search results.  A hub with a score of 50 might very well get listed before a hub with a score of 99.

The new hub search seeks to address these limitations.  An excerpt from each hub is displayed with the matching terms in bold.  The search can now be based on two-letter keywords.  If you search by relevance, the hubs are ordered based on the level of the match and on the hub score.  The results are now returned significantly faster.  In our tests, we found a consistent 5x speed improvement with the hub search.  Response times will improve further when we move the related author and related tag searches to Sphinx.  Below is a screenshot of the new search:

Results from the New General Hub Search
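The post says that relevance ordering takes both the level of the match and the hub score into account, but not how the two are combined. The sketch below assumes the simplest reading: sort by match level first and use the hub score to break ties. The `Hit` class and the `match_weight` field are hypothetical names for this example.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    title: str
    match_weight: int  # assumed: a higher value means a closer match
    hub_score: int     # the hub's score, e.g. 50 or 99

def order_by_relevance(hits):
    """Best match first; among equal matches, higher hub score first."""
    return sorted(hits, key=lambda h: (h.match_weight, h.hub_score),
                  reverse=True)

hits = [
    Hit("Tea basics", 2, 50),
    Hit("Wu Yi tea guide", 2, 99),
    Hit("Wu Yi tea review", 3, 10),
]
for hit in order_by_relevance(hits):
    print(hit.title, hit.match_weight, hit.hub_score)
```

Under this assumption, a hub with a score of 99 is listed before an equally well-matched hub with a score of 50, which is the case the old search got wrong.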

New Author Hub Search

We’ve also upgraded the author hub search.  This is a search that is done from an author’s profile page.  Below an author’s image, there is a search box with two buttons: Search Text and Search by Tag.  The Search Text button results in a search across all of an author’s hubs.  You search in exactly the same way as the general hub search and the results show excerpts with the matching search terms in bold.  Below is a sample search done on Ryan “Hup” Hupfer’s profile for a hub on jelly beans.

Results from the New Author Hub Search

New Request Search

HubPages has long had its own version of Yahoo! Answers.  At HubPages, we call the questions “Requests”, and the answers are themselves hubs.  In this way, hubbers (the HubPages term for authors who write hubs) can look through the requests and decide which ones to answer, or take a look at the answers given.

In the old version of the request search, your first result might be a request that was over 2 years old.  This first result might have 0 answers or it might have 10 answers.  The search by relevance was done solely in terms of word matching, and it ignored recency and the answer count.

For the new request search, we assume that people are looking for requests to answer.  So, it makes sense that newer requests should come first and unanswered requests should come before requests that have already been answered (with requests answered by only 1 person coming before requests answered by 2 people, etc.).  The new request search accomplishes this by grouping the results in 2-week buckets.  Unanswered requests made in the last 2 weeks come first, followed by requests from the last 2 weeks with only 1 answer, and so on; these are then followed by requests made in the last 4 weeks, 6 weeks, etc.
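The bucketed ordering described above can be expressed as a two-part sort key: the age of the request in whole 2-week periods, then the number of answers. The sketch below is an illustration under that reading; `request_sort_key` and the sample data are hypothetical.

```python
from datetime import date

BUCKET_DAYS = 14  # requests are grouped in 2-week buckets

def request_sort_key(created, answers, today):
    """Sort key: newer 2-week bucket first, then fewer answers first."""
    bucket = (today - created).days // BUCKET_DAYS
    return (bucket, answers)

today = date(2024, 3, 1)  # arbitrary reference date for the example
requests = [
    ("old, unanswered", date(2024, 1, 1), 0),
    ("new, 1 answer", date(2024, 2, 25), 1),
    ("new, unanswered", date(2024, 2, 20), 0),
    ("3 weeks old, unanswered", date(2024, 2, 10), 0),
]
ordered = sorted(requests, key=lambda r: request_sort_key(r[1], r[2], today))
for label, created, answers in ordered:
    print(label)
```

Note that an unanswered request from two months ago still sorts after an answered request from last week, because the bucket is compared before the answer count.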

We’ve also made it so that each search automatically gets added to the hub search box.  In this way, you can also do a hub search with the same search term.

Results from the New Request Search

New Forum Search

There are three major differences in the new forum search that we hope you will notice.  First, the excerpt in the results shows the matching search terms in bold (just like the Hub Search and the Author Search).  Second, the search results come back faster, and we believe that the results are now a better match for your search terms.  Third, the hub search box is automatically populated with the same search terms in case you would like to do a general hub search with the same words.

Below is a search I did on “ebay capsules”.  The quotes enable me to do an exact-phrase search.  You can see the results below:

Results from the new forum search

The Future of Search at HubPages

The HubPages staff has been very impressed with the Sphinx Search technology.  As a result, we are planning to roll out additional search features over the upcoming months.  We are planning a HubPages-wide search across articles, forums, profiles, etc., with the ability to limit the search to the sections that you are interested in.  For example, you could limit your search to hubs with images, RSS, video, etc.

We are very excited about the new search features.  We hope that you are too.
