Saumil Patel

IT Professional, Tinkerer, Thinker Code, Thoughts and Ideas

Hey folks, I'm Saumil. I work in tech. I’m an experienced full-stack software engineer and a tinkerer turned software architect.

I like to work with simple things: things that make the day-to-day job easier for any software engineer. Most of the time I am behind a computer trying to find better ways to solve problems. And, naturally, I pick up a few new things now and then while looking for solutions.


Automatic ID generation in Apache Solr
2015-02-16

I have been working with Apache Solr for the last few months and have been receiving requirements to speed up query processing. As part of the investigation, I found that generating unique IDs for the retrieved documents contributes to query processing time, so I decided to write this post.

Data Structure

Our sample data structure (the fields section of schema.xml) is shown below:

  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="_version_" type="long" indexed="true" stored="true" />
  </fields>

In addition, I declared which field holds the unique identifier. This is also done in the schema.xml file:

<uniqueKey>id</uniqueKey>

Solr Configuration

In addition to the changes in the schema.xml file, I need to modify the solrconfig.xml file and introduce a proper UpdateRequestProcessorChain, as shown below:

<updateRequestProcessorChain>
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

This tells Solr to generate the contents of the id field automatically whenever a document is added without a value for that field.
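To make the behaviour concrete, here is a minimal Python sketch of the idea behind the processor: if a document arrives without a value in the configured field, a freshly generated UUID is filled in, and an existing value is left alone. This is only an illustration, not Solr's actual implementation; the function name `fill_missing_id` and the use of plain dictionaries as documents are assumptions made for the example.

```python
import uuid


def fill_missing_id(doc, field_name="id"):
    """Return a copy of doc with a random UUID in field_name if it is absent.

    Hypothetical helper for illustration only; Solr does this inside
    its update processor chain, not in client code.
    """
    result = dict(doc)
    if field_name not in result:
        # uuid4() produces a random (version 4) UUID
        result[field_name] = str(uuid.uuid4())
    return result


first = fill_missing_id({"name": "Test"})
second = fill_missing_id({"name": "Test"})
explicit = fill_missing_id({"id": "doc-1", "name": "Test"})

print(first["id"] != second["id"])  # two documents, two different ids
print(explicit["id"])               # an id supplied by the client is kept
```

The point of the sketch is that two documents with identical content still end up with different identifiers, which is what the test in the next section demonstrates against a real Solr instance.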

Simple Test

Enough configuration; time to test it. Run the command below from a terminal to index a document before querying:

$> curl -XPOST 'localhost:8993/solr/update?commit=true' --data-binary '<add><doc><field name="name">Test</field></doc></add>' -H 'Content-type:application/xml'

If the curl command completes without errors, the document is indexed. To query it, use the following command:

$> curl -XGET 'localhost:8993/solr/select?q=*:*&indent=true'

This returns the indexed documents, as shown below:

<?xml version="1.0" encoding="UTF-8"?>
<response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  <lst name="params">
   <str name="indent">true</str>
   <str name="q">*:*</str>
  </lst>
 </lst>
 <result name="response" numFound="1" start="0">
  <doc>
   <str name="name">Test</str>
   <str name="id">1cdee8b4-c42d-4101-8301-4dc350a4d522</str>
   <long name="_version_">1439726523307261952</long>
  </doc>
 </result>
</response>

If you look at the response, you can see that the unique identifier was generated automatically. If you now run the same commands again (adding the document, then querying), the result looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params">
   <str name="indent">true</str>
   <str name="q">*:*</str>
  </lst>
 </lst>
 <result name="response" numFound="2" start="0">
  <doc>
   <str name="name">Test</str>
   <str name="id">1cdee8b4-c42d-4101-8301-4dc350a4d522</str>
   <long name="_version_">1439726523307261952</long>
  </doc>
  <doc>
   <str name="name">Test</str>
   <str name="id">9bedcb5f-1b71-4ab7-80a9-9882a6bf319e</str>
   <long name="_version_">1439726693819351040</long>
  </doc>
 </result>
</response>

As you can see, the two documents have different unique identifiers, both generated by Solr.
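If you would rather check this programmatically than by eye, the following Python sketch parses a trimmed copy of the second response and confirms that every id is a well-formed UUID and that no two are equal. It uses only the standard library; the helper name `extract_ids` is an assumption made for the example, and the XML is pasted in as a string rather than fetched from Solr.

```python
import uuid
import xml.etree.ElementTree as ET

# Trimmed copy of the second response above (responseHeader omitted).
RESPONSE_XML = """<response>
 <result name="response" numFound="2" start="0">
  <doc>
   <str name="name">Test</str>
   <str name="id">1cdee8b4-c42d-4101-8301-4dc350a4d522</str>
   <long name="_version_">1439726523307261952</long>
  </doc>
  <doc>
   <str name="name">Test</str>
   <str name="id">9bedcb5f-1b71-4ab7-80a9-9882a6bf319e</str>
   <long name="_version_">1439726693819351040</long>
  </doc>
 </result>
</response>"""


def extract_ids(response_xml):
    """Collect the text of every <str name="id"> inside result/doc."""
    root = ET.fromstring(response_xml)
    return [field.text for field in root.findall("./result/doc/str[@name='id']")]


ids = extract_ids(RESPONSE_XML)

for value in ids:
    uuid.UUID(value)  # raises ValueError if the id is not a valid UUID

print(len(ids) == len(set(ids)))  # True when all ids are distinct
```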
