Automatic ID generation in Apache Solr

I have been working with Apache Solr for the last few months and have been receiving requests to speed up query processing. During the investigation, I found that generating unique IDs for the retrieved documents contributes to query processing time, so I decided to write this post.

# Data Structure

Our sample data structure (the fields section from schema.xml) looks like this:

  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="_version_" type="long" indexed="true" stored="true" />
  </fields>

In addition, I specified which field should contain the unique identifier. This is also done in the schema.xml file:

<uniqueKey>id</uniqueKey>

# Solr Configuration

In addition to the changes in the schema.xml file, I need to modify the solrconfig.xml file and introduce a suitable UpdateRequestProcessorChain, as shown below:

<updateRequestProcessorChain>
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

This tells Solr to generate the contents of the id field automatically when a document arrives without one.
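
To make the effect of the UUID processor concrete, here is a minimal Python sketch of the idea: a document that arrives without an id gets a random UUID, while a document that already has one is left alone. This is only an illustration, not Solr's actual implementation, and the function name `add_default_id` is hypothetical.

```python
import uuid


def add_default_id(doc, field_name="id"):
    """Return a copy of doc with a random UUID in field_name if it is absent.

    Hypothetical helper that mimics the behaviour of the processor chain;
    it is not part of Solr.
    """
    result = dict(doc)
    if field_name not in result:
        result[field_name] = str(uuid.uuid4())
    return result


# Two documents without an id each receive their own identifier.
first = add_default_id({"name": "Test"})
second = add_default_id({"name": "Test"})

# A document that already carries an id keeps it unchanged.
kept = add_default_id({"id": "my-own-id", "name": "Test"})

print(first["id"] != second["id"])  # True
print(kept["id"])                   # my-own-id
```

Because the identifier is random rather than derived from the document, sending the same content twice produces two separate documents.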

# Simple Test

Enough configuration; time to test it. Run the command below from a terminal to index a document before querying the index.

$> curl -XPOST 'localhost:8993/solr/update?commit=true' --data-binary '<add><doc><field name="name">Test</field></doc></add>' -H 'Content-type:application/xml'
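
If you would rather send the same request from a script, the following Python sketch builds the identical XML payload using only the standard library. The helper name `build_add_payload` is hypothetical, and the URL simply mirrors the curl command above with an explicit `http://` scheme; the request object is prepared but not sent, so the snippet runs without a Solr instance.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Same endpoint as the curl command; adjust host, port, and path to your setup.
UPDATE_URL = "http://localhost:8993/solr/update?commit=true"


def build_add_payload(fields):
    """Build an <add><doc>...</doc></add> message from a dict of field values.

    Hypothetical helper for illustration; no id field is included, so the
    processor chain is expected to supply one.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


payload = build_add_payload({"name": "Test"})

# Prepare the POST request; nothing is sent until urlopen() is called.
request = urllib.request.Request(
    UPDATE_URL,
    data=payload.encode("utf-8"),
    headers={"Content-type": "application/xml"},
    method="POST",
)

print(payload)
# To actually send it (requires a running Solr instance):
# urllib.request.urlopen(request)
```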

If the command runs without errors, the document is indexed. The following command can then be used to query it:

$> curl -XGET 'localhost:8993/solr/select?q=*:*&indent=true'

This returns the indexed documents, as shown below:

<?xml version="1.0" encoding="UTF-8"?>
<response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  <lst name="params">
   <str name="indent">true</str>
   <str name="q">*:*</str>
  </lst>
 </lst>
 <result name="response" numFound="1" start="0">
  <doc>
   <str name="name">Test</str>
   <str name="id">1cdee8b4-c42d-4101-8301-4dc350a4d522</str>
   <long name="_version_">1439726523307261952</long>
  </doc>
 </result>
</response>
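
A response like the one above can also be read programmatically. The sketch below parses an abbreviated copy of it (response header omitted) with Python's standard library and pulls out the generated id. The helper name `parse_docs` is hypothetical, and the code assumes every field is single-valued, as in this schema.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the response (responseHeader left out for brevity).
SAMPLE_RESPONSE = """<response>
 <result name="response" numFound="1" start="0">
  <doc>
   <str name="name">Test</str>
   <str name="id">1cdee8b4-c42d-4101-8301-4dc350a4d522</str>
   <long name="_version_">1439726523307261952</long>
  </doc>
 </result>
</response>"""


def parse_docs(response_xml):
    """Return each <doc> as a dict mapping field name to its text value.

    Hypothetical helper; assumes single-valued fields only.
    """
    root = ET.fromstring(response_xml)
    docs = []
    for doc in root.iter("doc"):
        docs.append({field.get("name"): field.text for field in doc})
    return docs


docs = parse_docs(SAMPLE_RESPONSE)
print(docs[0]["id"])  # 1cdee8b4-c42d-4101-8301-4dc350a4d522
```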

If you examine the response, you can see that the unique identifier was generated automatically. If you now run the same commands again (adding a document, then querying), the result looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params">
   <str name="indent">true</str>
   <str name="q">*:*</str>
  </lst>
 </lst>
 <result name="response" numFound="2" start="0">
  <doc>
   <str name="name">Test</str>
   <str name="id">1cdee8b4-c42d-4101-8301-4dc350a4d522</str>
   <long name="_version_">1439726523307261952</long>
  </doc>
  <doc>
   <str name="name">Test</str>
   <str name="id">9bedcb5f-1b71-4ab7-80a9-9882a6bf319e</str>
   <long name="_version_">1439726693819351040</long>
  </doc>
 </result>
</response>

As you can see, the two documents have different unique identifiers generated by Solr.
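
As a quick sanity check, the two identifiers from the response above can be validated with Python's `uuid` module. The sketch confirms that both parse as version 4 (random) UUIDs and that they differ from each other; the helper name `is_random_uuid` is hypothetical.

```python
import uuid

# The two ids copied from the response.
ids = [
    "1cdee8b4-c42d-4101-8301-4dc350a4d522",
    "9bedcb5f-1b71-4ab7-80a9-9882a6bf319e",
]


def is_random_uuid(value):
    """Return True if value parses as a version 4 (random) UUID."""
    try:
        return uuid.UUID(value).version == 4
    except ValueError:
        # Raised when the string is not a well-formed UUID.
        return False


all_valid = all(is_random_uuid(value) for value in ids)
all_distinct = len(set(ids)) == len(ids)

print(all_valid, all_distinct)  # True True
```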