Solr Standalone Analysers

21 Jul 2019 solr java

To search text efficiently and effectively, Solr/Lucene splits text into tokens (which are actually graphs) at index time as well as at query time. These tokens/graphs can be pre-filtered and post-filtered to provide additional flexibility. This post, however, only covers analysers that are standalone: they are not chainable and cannot be combined with pre- or post-filters.
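For contrast, a chained (non-standalone) analyser might look roughly like the sketch below. The field type name text_chained_example is made up for illustration, and the particular char filter, tokenizer and filter are just one possible combination of stock Solr factories, not a recommendation.

```xml
<!-- Hypothetical field type, shown only to contrast with standalone analysers -->
<fieldType name="text_chained_example" class="solr.TextField">
  <analyzer>
    <!-- pre-filter: works on the raw character stream before tokenisation -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- tokenizer: splits the character stream into tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- post-filter: works on the tokens emitted by the tokenizer -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

A standalone analyser replaces this whole chain with a single class attribute on the analyzer element, so no charFilter, tokenizer or filter children are declared.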

Before we start with the list of standalone analysers, it should be noted that these analysers only work on fields whose field type uses the solr.TextField class.

Solr schema

To set up a solr.TextField to use an analyser, we need to configure the field type inside the Solr schema. In the example below we have a field type named text_german that uses the org.apache.lucene.analysis.de.GermanAnalyzer. As this is a standalone analyser, it is used for analysis at both index time and query time.

<fieldType name="text_german" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.de.GermanAnalyzer"/>
</fieldType>
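The field type on its own does nothing until a field references it. A minimal sketch of such a field is shown here; the field name description_de and the indexed/stored settings are assumptions chosen for illustration, not something the schema above requires.

```xml
<!-- "description_de" is a hypothetical field name; adjust indexed/stored to your needs -->
<field name="description_de" type="text_german" indexed="true" stored="true"/>
```

Any text indexed into this field, and any query run against it, would then pass through the GermanAnalyzer.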

Standalone analysers

All standalone analysers ultimately extend org.apache.lucene.analysis.Analyzer, but most of them do so by extending org.apache.lucene.analysis.StopwordAnalyzerBase.