While gathering data, at some point you have so much information that you can’t browse it in the ordinary manner. Manually scouring through numerous records can be a real pain in the neck. That’s when search engines come in handy.

There is a gem that lets you use PostgreSQL’s full-text search features. It’s called pg_search. Even though it is very convenient and easy to use, we were afraid it wouldn’t cope with that big an amount of data. A very big amount, to be honest.

The three most popular search engines for web apps are Sphinx, Solr and Elasticsearch. Solr and Elasticsearch are both based on Apache Lucene, a powerful information retrieval library. Needless to say, search engines are very powerful tools. Even though they serve the same purpose (full-text searching a database and returning results), each has different advantages and key features.

After a short comparison of all three, we decided to use Sphinx. It seemed it would give us the right efficiency in a reasonable amount of time. Also, none of us had had the opportunity to work with it yet, so we considered it a great chance to learn something new.

Getting started

Sphinx Installation

On the Sphinx downloads page you can get installation packages suited specifically for your system. I recommend you get the source tarball – it won’t hurt to compile it manually, and it works across different platforms (I’ve done it both on Mac OS X and Debian Wheezy).

To do this, get the tarball from the page and run ./configure. Remember: if you want to connect Sphinx to a PostgreSQL database, you have to pass the --with-pgsql flag. Then run make and make install, respectively:

./configure --with-pgsql
make
make install

Sphinx Search is now installed.

If you are afraid of messing anything up, and you happen to use a Mac, you can also install Sphinx through Homebrew. Just type brew install sphinx --mysql --pgsql in your terminal and press Enter. Et voilà. Homebrew should do the rest.

Refer to the installation guide if you encounter any problems.

The Gem

Next, we need to get the gem. It is called thinking-sphinx, and setting it up is a piece of cake. Put gem 'thinking-sphinx' and gem 'mysql2' (if you haven’t done it yet) in your Gemfile and run bundle install. Why is the latter needed? Thinking Sphinx communicates with the Sphinx daemon over the MySQL protocol, so MySQL client support is required even when your data lives in PostgreSQL.
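
The relevant Gemfile entries end up looking something like this:

# Gemfile
gem 'mysql2'           # lets Thinking Sphinx talk to the Sphinx daemon over the MySQL protocol
gem 'thinking-sphinx'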

Usage

Thinking Sphinx isn’t difficult to use, since it is fairly well documented on its website. Most importantly, Sphinx has to index your database entries in order to search them later. Therefore, you need to define which fields it should index for future searches, as in the sketch below. You can also define a set of attributes for each model; these can be used for sorting and filtering your search results later.
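
For illustration, here’s a minimal index definition in the Thinking Sphinx v3 style, assuming a hypothetical Question model with title and content columns (it goes in app/indices/question_index.rb):

# app/indices/question_index.rb
ThinkingSphinx::Index.define :question, with: :active_record do
  # fields – full-text searchable
  indexes title, content

  # attributes – available for sorting and filtering
  has user_id, created_at
end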

Rake tasks

After you’ve defined your indices, you have to tell Sphinx to actually index them, and then start the Sphinx daemon process. Pat – the creator of Thinking Sphinx – gave us a few easy-to-memorize rake tasks.

rake ts:index   # index your records
rake ts:start   # start the Sphinx daemon
rake ts:stop    # stop the Sphinx daemon

When you need to re-index the contents (e.g. you’ve added some new records to the database, or set up new indices or attributes), you have to stop the daemon, index the records, and start it again. Fortunately, Pat thought about that too – rake ts:rebuild does all three in one go.

Asking the Sphinx

Our Sphinx should now be ready to answer calls. Thinking Sphinx adds a search method to each indexed class. It takes your query string and accepts many options to help you get better-suited results. I’ll give you a brief overview – for more options, refer to the manual.

Sort by attribute:

Question.search 'override', order: 'created_at DESC'

Search only for records with specified attributes:

Question.search 'override', with: { created_at: Date.today }
# you can also pass a range
Question.search 'override', with: { created_at: 2.days.ago..Date.today }

…or indexed fields:

Question.search 'override', conditions: { content: 'programming' }

You can also search application-wide:

ThinkingSphinx.search 'override'

…or limit search to perform only on given classes:

ThinkingSphinx.search 'override', classes: [Question, Answer]

The integrated will_paginate pagination is pretty handy too:

Question.search 'override', page: params[:page], per_page: 42

Get number of matches:

@questions = Question.search 'override'
@questions.total_entries

Scoping the results also tends to be useful, but since Sphinx can’t work with ActiveRecord scopes, we have to use its own scoping mechanism:

class Question < ActiveRecord::Base
  # some other code

  sphinx_scope(:next_week) do
    { with: { q_date: Date.today..(Date.today + 1.week) } }
  end

  sphinx_scope(:today) do
    { with: { q_date: Date.today } }
  end

  default_sphinx_scope :next_week

  # some other code
end
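
With the above in place, Question.search 'override' is automatically narrowed by the default next_week scope, and the other scopes can be chained explicitly, e.g.:

Question.today.search 'override'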

Testing

Testing was the hardest bit. At first, we tried to run an actual search engine in the test environment. That was fairly tricky, but we managed to pull it off. As it turned out, though, it noticeably slowed down the specs, and in the end we dropped the idea – our search wasn’t complex enough to justify it on every spec run. But if you ever find yourself in need to test Sphinx’s behavior, try putting this code in your spec:

  # turn off transactional fixtures, so the records are visible to Sphinx
  self.use_transactional_fixtures = false

  # suspend delta indexing for the duration of the suite
  ThinkingSphinx::Deltas.suspend!

  # ensure the Sphinx directories exist for the test environment
  ThinkingSphinx::Test.init

  # configure and start Sphinx, and automatically
  # stop it at the end of the test suite
  ThinkingSphinx::Test.start_with_autostop

  # run your spec here

  ThinkingSphinx::Deltas.resume!

Otherwise, we advise you to stub out the search method as follows :)

allow(Question).to receive(:search).and_return(Question.limit(1).paginate(page: 1))

And then call the search method – I actually did it in a Capybara integration spec, so it ran sort of ‘under the covers’. You can also call the method explicitly and expect it to return the desired outcome, as in the example below. I bet you know what to do ;)
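
For instance, a minimal explicit expectation, assuming the stub above is in place:

expect(Question.search('override')).to include(Question.first)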

Deployment

One last thing to consider is actually deploying the app to the server. It’s rather painless, especially when using Capistrano – you just add the recipe that ships with the gem. Remember to tweak the config to suit your needs, and to set up your indexing task in your crontab (or use the whenever gem).
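
If memory serves, pulling in those bundled Capistrano tasks is a single require (assuming a standard Capistrano setup):

# Capfile
require 'thinking_sphinx/capistrano'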

Conclusion

These are the basics you need to run this powerful search engine. Configuration, workflow and maintenance aren’t that hard to handle. Hope you found this article helpful!
