Start Searching with Solr - Integrating Solr into any PHP project is easy with Solarium

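Listing 1

{
    "require": {
        "solarium/solarium": "3.0.0"
    }
}

Listing 2

<?php
// Load Composer's autoloader (assumes the default vendor directory)
require __DIR__ . '/vendor/autoload.php';

$config = array(
    'endpoint' => array(
        'localhost' => array(
            'host' => '127.0.0.1', 'port' => '8983', 'path' => '/solr/'
        )
    )
);

// new Solarium Client object
$client = new Solarium\Client($config);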

Listing 3

<field name="id" type="int" indexed="true" stored="true" />
<field name="name" type="string" indexed="true" stored="true" />
<field name="text" type="text" indexed="true" stored="true" />

Introduction to Solr

Search is an important part of most web projects and deserves careful attention. Over the years, the way content is made searchable has changed a great deal. Obviously, exposing SQL queries to users through search is dangerous. Modern search implementations require a secure approach, such as a stand-alone search server, separate from site data. There are many search options to choose from. Your search list might include Sphinx, Flax, ElasticSearch, Google, and Solr. Solr’s scalability, speed, built-in features and community make it an ideal platform for any project. Luckily, integrating Solr into your PHP project has been made easy by the Solarium Project.

In Apache’s own words, “Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON, Ruby, and Python APIs, hit highlighting, faceted search, caching, replication, and a web administration interface” [1]. Apache Solr is an open source, extensible, stand-alone search engine built on Lucene and managed by The Apache Software Foundation. There are currently two stable versions of Solr available for use: Solr 3.6.2 and Solr 4.0. Either version may be used with Solarium; Solr 4 ships with added features, however, so use Solr 4 if possible.

Solr is designed to run as a stand-alone Java web application (app). The documentation for downloading, installing, and running Solr can be found at http://lucene.apache.org/solr/tutorial.html. Because Solr is meant to be separate from the primary project, it can be installed anywhere. All interactions with Solr happen over HTTP through a REST-like API. Solarium utilizes both PHP and JSON for its interactions with Solr.
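
To make the REST-like API concrete, the following sketch builds the URL for a simple search request by hand. The host, port, and path match Listing 2; the select handler, the query on the name field, and the helper name buildSelectUrl() are assumptions for illustration, and the exact path depends on how Solr and its cores are configured.

```php
<?php
// Build the URL for a basic Solr search request.
// Host, port, and path mirror Listing 2; the "select" handler and the
// parameter values are assumptions made for this illustration.
function buildSelectUrl(array $params)
{
    $base = 'http://127.0.0.1:8983/solr/select';

    // http_build_query() URL-encodes every parameter value.
    return $base . '?' . http_build_query($params, '', '&');
}

$url = buildSelectUrl(array(
    'q'    => 'name:First', // search the name field
    'wt'   => 'json',       // ask Solr for a JSON response
    'rows' => 10,           // return at most ten docs
));

echo $url, PHP_EOL;
```

Requesting a URL like this in a browser or with any HTTP client is all it takes to query Solr, which is why the server can live on any host the PHP application can reach.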

One of the greatest features of Solr is its scalability. Replication can be set up to run multiple Solr instances in a master/slave setup, just like MySQL replication. The documentation for setting up master/slave replication is very thorough, so it will not be discussed further here. Another option for scaling Solr is to spread the index across multiple search cores. Solr can run multiple cores at a time and, through its distributed search and sharding capabilities, search several of them in a single request. A large search index can be split into multiple smaller indexes, which a distributed search then queries together. The best part of a multicore search setup is that not all cores need to exist on the same server. This allows for a very flexible search architecture.
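
A distributed search is requested by listing the shards to query in Solr's shards request parameter. The sketch below assembles such a request by hand; the host names are hypothetical, and a real setup would list its own servers and cores.

```php
<?php
// Hypothetical shard locations. Each entry is host:port/path, without
// the http:// scheme, and the entries are joined with commas.
$shards = array(
    'search1.example.com:8983/solr',
    'search2.example.com:8983/solr',
);

$params = array(
    'q'      => '*:*',                 // match every doc
    'shards' => implode(',', $shards), // cores to search together
);

// Any one of the servers can receive the request and merge the results.
$url = 'http://search1.example.com:8983/solr/select?'
     . http_build_query($params, '', '&');

echo $url, PHP_EOL;
```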

Solarium acts as a bridge

Solr has a REST-like API, so interactions with it are very simple and happen over HTTP. The Solarium Project is “an open source Solr client library for PHP applications” [2], which makes interacting with Solr even simpler. Solarium exists to expose the Solr API through an easy-to-use PHP library.

Installation of Solarium has been made very easy through the use of Composer (Listing 1). Installing through Composer makes the library available through Composer's generated autoloader (vendor/autoload.php). To install Solarium into a project not utilizing Composer, download the files from GitHub (https://github.com/basdenooijer/solarium/tags), then register the Solarium library with the PHP autoloader manually. After Solarium is installed, Solr can be accessed by passing a configuration array to a new Solarium Client ($client) object. The configuration array should be made up of the Solr host, port and path (Listing 2).

Adding content to Solr

Before anything can be searched in Solr, documents (docs) must be added to the Solr index. Solr maintains a collection of docs in its index. A doc is a collection of fields and values. Solr fields must be defined in the Solr schema. A field can occur multiple times in a doc.
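
One way to picture a doc is as the XML add message that Solr's update handler accepts. The following is a minimal sketch of that message built by hand; the tag field and the helper name docToXml() are hypothetical, and tag would need multiValued="true" in the schema. Solarium generates the real request itself, so this is for illustration only.

```php
<?php
// Render one doc as a Solr XML <doc> element.
// An array value becomes several <field> elements with the same name,
// which is how a field occurs multiple times in a doc.
function docToXml(array $fields)
{
    $xml = '<doc>';
    foreach ($fields as $name => $values) {
        foreach ((array) $values as $value) {
            $xml .= '<field name="'
                  . htmlspecialchars((string) $name, ENT_QUOTES, 'UTF-8')
                  . '">'
                  . htmlspecialchars((string) $value, ENT_NOQUOTES, 'UTF-8')
                  . '</field>';
        }
    }

    return $xml . '</doc>';
}

$xml = '<add>' . docToXml(array(
    'id'   => 123,
    'name' => 'First Document',
    'tag'  => array('php', 'search'), // hypothetical multi-valued field
)) . '</add>';

echo $xml, PHP_EOL;
```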

To make an update to Solr using Solarium, start with the $client object. Using the $client object, create an $update instance then make a doc:

$update = $client->createUpdate();
$doc1 = $update->createDocument();

With the newly created $doc1, begin adding content by setting each Solr field as a property of the doc:

$doc1->id = 123;
$doc1->name = 'First Document';
$doc1->text = 'This is the first document\'s content';

// Create another doc
$doc2 = $update->createDocument();
$doc2->id = 234;
$doc2->name = 'Next Document';
$doc2->text = 'This is the next document\'s content';

In this example, the Solr schema must be set up to accept the id, name and text fields. Solr ships with an example search app that includes a good demo schema; each field must be defined in the schema before Solr can add the docs to the search index (Listing 3). Next, pass the docs as an array to the $update object's addDocuments() method. This queues a Solr add command for every document in the array:

$update->addDocuments(array($doc1, $doc2));

The newly added docs will not become a searchable part of the Solr index until Solr is told to commit them. To do this with Solarium, call addCommit() to append a commit command to the update, then pass $update to $client->update() to send everything to Solr:

$update->addCommit();
$client->update($update);

The new docs should now exist and be searchable using Solr.

Searching Solr through Solarium

Solarium offers a robust PHP API for searching Solr. To run a basic query, start with the $client object and call its createSelect() method to create a query object:

$query = $client->createSelect();

// *:* is equivalent to telling solr to return all docs
$query->setQuery('*:*');

Because the createSelect() function is being used to generate the query, the select() function should be used when executing the query:

$resultSet = $client->select($query);

Iterate over the $resultSet object to display the results:

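// Escape values for HTML output; multi-valued fields arrive as arrays
$e = function ($value) {
    return htmlspecialchars(implode(', ', (array) $value), ENT_QUOTES, 'UTF-8');
};

echo '<div class="search-results">';
foreach ($resultSet as $result) {
    echo '<div class="search-result">';
    echo '<p>' . $e($result->id) . '</p>';
    echo '<p>' . $e($result->name) . '</p>';
    echo '<p>' . $e($result->text) . '</p>';
    echo '</div>';
}
echo '</div>';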

The query can also be refined before it is executed. Setting up pagination, for instance, is very easy:

$query->setStart(0)->setRows(10);
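
Note that setStart() takes the zero-based offset of the first doc to return rather than a page number. A small helper can do the conversion; the function name pageToStart() is hypothetical and not part of Solarium.

```php
<?php
// Convert a 1-based page number into the zero-based offset that
// setStart() expects.
function pageToStart($page, $rows)
{
    $page = max(1, (int) $page); // treat zero or negative pages as page 1

    return ($page - 1) * (int) $rows;
}

$rows  = 10;
$start = pageToStart(3, $rows); // page 3 begins at offset 20

echo $start, PHP_EOL;
```

The result would then be passed along as $query->setStart($start)->setRows($rows).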

Sorting is also very easy (note that Solr will sort by score if no sort is set):

$query->addSort('name', $query::SORT_ASC);

Filtering, setting query fields, boosting, and faceting are also possible ways of searching Solr. Filtering narrows search results without changing the main query. Query fields control which fields are searched. Boosting gives certain fields higher precedence in the search. Facets give users a way to better navigate search results. Solarium gives easy access to all of these ways of searching Solr content.
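
Each of these features corresponds to a handful of Solr request parameters. As a rough sketch, the raw parameters for a search that combines all four might look like the following; the field names come from Listing 3, while the search term, filter, and boost value are assumptions chosen for illustration. Solarium builds the equivalent request through its own query API.

```php
<?php
$params = array(
    'q'           => 'document',              // what the user typed
    'defType'     => 'dismax',                // query parser that honors qf
    'qf'          => 'name^2 text',           // query fields; name is boosted
    'fq'          => 'name:"First Document"', // filter applied on top of q
    'facet'       => 'true',                  // turn faceting on
    'facet.field' => 'name',                  // count results per name value
);

$queryString = http_build_query($params, '', '&');

echo $queryString, PHP_EOL;
```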

Removing content from Solr

Solr does not have a command called update; however, updates can be performed by re-adding content. Solr uses a unique identifier for each doc and only allows one instance of that identifier in the search index. To update content, call the add command with the id of an existing Solr doc; the new doc replaces the old one. Sometimes updating Solr with new content isn’t enough, and content needs to be deleted instead. As when adding docs, start by creating an $update instance:

$update = $client->createUpdate();

Using the $update instance, documents can be deleted by id or by query. This offers a great deal of flexibility for deleting content. Deleting by id is used more often because it offers very precise deletes:

$update->addDeleteById(234);
$update->addCommit();
$client->update($update);

Deleting by query, however, can be incredibly powerful and useful. When starting development of a new search project with test content, for instance, deleting by query offers the ability to remove all test content. If all the test content starts with the word test, deleting the content is simple:

$update->addDeleteQuery('name:test*');
$update->addCommit();
$client->update($update);

Another common practice is to wipe the index and start fresh. Using delete by query, this task is again made very simple:

$update->addDeleteQuery('*:*');
$update->addCommit();
$client->update($update);
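
For reference, the delete calls above correspond to small XML delete messages in Solr's update format. The sketch below builds those messages by hand; the helper names are hypothetical, and Solarium produces the real requests itself.

```php
<?php
// Escape text for use inside an XML element.
function xmlText($value)
{
    return htmlspecialchars((string) $value, ENT_NOQUOTES, 'UTF-8');
}

// Delete message that removes a single doc by its unique id.
function deleteByIdXml($id)
{
    return '<delete><id>' . xmlText($id) . '</id></delete>';
}

// Delete message that removes every doc matching a query.
function deleteByQueryXml($query)
{
    return '<delete><query>' . xmlText($query) . '</query></delete>';
}

echo deleteByIdXml(234), PHP_EOL;
echo deleteByQueryXml('name:test*'), PHP_EOL;
echo deleteByQueryXml('*:*'), PHP_EOL; // wipes the whole index
```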

Solarium’s delete doc API is very useful in maintaining an up-to-date search index.

Conclusion

Solr is an incredibly scalable, powerful and easy-to-use search engine that should be considered for any search project. Solr is designed to be language agnostic through its REST-like API, and thanks to Solarium it fits very easily into any PHP project. This article covers only three use cases for Solarium; the examples shipped with Solarium offer a great deal more. Search highlighting, debugging, optimizing, and ‘more like this’ functionality are all possible with Solr and Solarium.

References