Searching Multiple Solr Cores using Shards and eZ Find

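<?php
// $distributedSearch is the 'distributed_search' hash passed to the fetch;
// $iniShards is the Shards[] mapping read from solr.ini.
$shardUrls = array();
if ( isset( $distributedSearch['shards'] ) )
{
    foreach ( $distributedSearch['shards'] as $shard )
    {
        $shardUrls[] = $iniShards[$shard];
    }
    $shardQuery = implode( ',', $shardUrls );
}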

Prerequisites:

  • eZ Publish with eZ Find installed

This post is based on the Shards option in the eZ Find solr.ini:

[SolrBase]
#Shards mapping, can be to multicores in one servlet or even across servers
#typical use is multilingual setups, but also for external index support
#the keys are used as shorthands in template functions
#Shards[]
#Shards[eng-GB]=http://localhost:8983/solr/eng-GB
#Shards[fre-FR]=http://localhost:8983/solr/fre-FR
#Shards[myforeignindex]=http://myotherhost:8983/solr

What this means for eZ Find developers: we can do a distributed search in our standard eZ Find fetch (with a little work).

First things first, we need to tell each siteaccess what its search server URI is. Within each settings/siteaccess/<siteaccess-name>/ directory, create a new solr.ini.append.php file (duplicate the file in the admin version). In each file, set SearchServerURI to the URL of that siteaccess's Solr core: for harmssite it would be SearchServerURI=http://localhost:8983/solr/harmssite, and for the counterhelix siteaccess it would be SearchServerURI=http://localhost:8983/solr/counterhelix.
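
As a rough illustration, the override for the harmssite siteaccess might look like the fragment below. The [SolrBase] section name comes from the solr.ini excerpt above. The comment wrapper around it is the usual eZ Publish convention for .ini.append.php files and is an assumption about your installation, so compare it against an existing override before relying on it.

```ini
<?php /* #?ini charset="utf-8"?

[SolrBase]
SearchServerURI=http://localhost:8983/solr/harmssite

*/ ?>
```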

Next, make sure that your cores are installed in Solr. To do this, copy the default eng-GB directory from the solr.multicore/ directory and rename the copy for each core you wish to have. In my case I have a harmssite core and a counterhelix core. There is also a solr.xml file that we need to edit, where we declare each core that we created.

Mine:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="harmssite" instanceDir="harmssite" />
    <core name="counterhelix" instanceDir="counterhelix" />
  </cores>
</solr>

Finally, we need to start Solr and point it at the new home directory.

java -Dsolr.solr.home=solr.multicore -jar start.jar

The setup up to this point is reasonably well documented here.

Next, we need to add a list of our shards to our override solr.ini.append.php file. Use the demo from above and put in the URL of each core. One thing to note: I believe the documentation is wrong here. We should not include the http:// in this list, because Solr adds it automatically when it runs the search. Thus, mine looks like this:

[SolrBase]
Shards[]
Shards[harmssite]=localhost:8983/solr/harmssite
Shards[counterhelix]=localhost:8983/solr/counterhelix
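
To make the mapping concrete, here is a minimal sketch of how shard keys could be resolved into the comma-separated list that Solr's shards request parameter expects. The function name buildShardQuery and the $iniShards array are hypothetical stand-ins for illustration, not part of eZ Find. Stripping a leading http:// is an assumption that follows from the note above about leaving the protocol out.

```php
<?php
/**
 * Build the comma-separated value for Solr's "shards" request parameter.
 *
 * $iniShards : hypothetical array mirroring Shards[] from solr.ini
 * $requested : shard keys, e.g. from the 'distributed_search' fetch parameter
 */
function buildShardQuery( array $iniShards, array $requested )
{
    $shardUrls = array();
    foreach ( $requested as $shard )
    {
        // Skip keys that have no mapping instead of raising a notice.
        if ( !isset( $iniShards[$shard] ) )
        {
            continue;
        }
        // Drop a leading http:// or https://, since the list above omits it.
        $shardUrls[] = preg_replace( '#^https?://#i', '', $iniShards[$shard] );
    }
    return implode( ',', $shardUrls );
}

$iniShards = array(
    'harmssite'    => 'localhost:8983/solr/harmssite',
    'counterhelix' => 'http://localhost:8983/solr/counterhelix',
);

// Prints: localhost:8983/solr/harmssite,localhost:8983/solr/counterhelix
echo buildShardQuery( $iniShards, array( 'harmssite', 'counterhelix' ) ), "\n";
```

Unknown keys are skipped silently in this sketch; depending on your setup you may prefer to log them instead.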

I am assuming that you know how to index your content, so go ahead and update your search index with these settings in place. The next step is to set up a distributed search in our eZ Find fetch.

Example:

{set $search=fetch( 'ezfind', 'search',
                    hash( 'query', $query,
                          'sort_by', $sort_by,
                          'facet', $defaultSearchFacets,
                          'filter', $filterParameters,
                          'publish_date', $search_date,
                          'offset', $view_parameters.offset,
                          'limit', $page_limit,
                          'as_objects', false(),
                          'distributed_search', hash(
                              'shards',array('harmssite', 'counterhelix')
                          )
                         ))}

Note that we have to send the distributed_search parameter as a hash. In this case we are telling eZ Find to search both shards, but we can have it search whichever cores we want: just harmssite, just counterhelix, both, or even new cores that we might add later. The advantage is that separate indexes let Solr handle much larger sets of data. The downside is that searching multiple cores at once is a little slower, so you should not use this option unless you really need to scale Solr out; it would be overkill for my site.

If you have followed me up to this point and have tried running this, you may have noticed that it does not work out of the box. The ezfezpsolrquerybuilder.php file is still a work in progress and has not been set up to handle shards yet. If you send the distributed search shards array to eZ Find, it will take it, generate the shard URLs and then do nothing with them. The task I leave you is to try to fix this problem for eZ Find. I have a fix for this, but would love to see what other people come up with (I will post my code here next week). The file in question is ezfezpsolrquerybuilder.php. Remember, all we need to do is pass the shards parameter on to the search plugin whenever one is sent to eZ Find; the search plugin is already set up to handle shards.
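
As one possible starting point for that exercise (a sketch only, and not the fix mentioned above), the idea of passing the shards along can be shown in isolation. Solr's distributed search reads a shards request parameter containing a comma-separated list of host:port/path entries. The helper addShardsParam and the shape of $queryParams below are hypothetical; the real parameter array inside the query builder will look different.

```php
<?php
/**
 * Return a copy of the Solr request parameters with "shards" added.
 *
 * Hypothetical helper: $queryParams stands in for whatever parameter
 * array the query builder assembles; it is not eZ Find's real structure.
 */
function addShardsParam( array $queryParams, $shardQuery )
{
    // Only add the parameter when at least one shard was resolved, so
    // ordinary single-core searches are left untouched.
    if ( is_string( $shardQuery ) && $shardQuery !== '' )
    {
        $queryParams['shards'] = $shardQuery;
    }
    return $queryParams;
}

$params = addShardsParam(
    array( 'q' => 'solr', 'rows' => 10 ),
    'localhost:8983/solr/harmssite,localhost:8983/solr/counterhelix'
);

// Print the URL-encoded query string a client could send to a core.
echo http_build_query( $params ), "\n";
```

Keeping the parameter out of the request when no shards are given means existing fetches that do not use distributed_search behave exactly as before.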
