<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Juozas devBlog &#187; sphinx</title>
	<atom:link href="http://dev.juokaz.com/tag/sphinx/feed" rel="self" type="application/rss+xml" />
	<link>http://dev.juokaz.com</link>
	<description>Random ideas, scripts and facts</description>
	<lastBuildDate>Mon, 22 Mar 2010 10:48:42 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Starting with Zend_Search_Lucene</title>
		<link>http://dev.juokaz.com/php/starting-with-zend_search_lucene</link>
		<comments>http://dev.juokaz.com/php/starting-with-zend_search_lucene#comments</comments>
		<pubDate>Wed, 11 Mar 2009 14:19:37 +0000</pubDate>
		<dc:creator>Juozas</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[full-text]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[speed]]></category>
		<category><![CDATA[sphinx]]></category>
		<category><![CDATA[zend]]></category>
		<category><![CDATA[zend framework]]></category>

		<guid isPermaLink="false">http://dev.juokaz.com/?p=344</guid>
		<description><![CDATA[As websites grow, searches like &#8220;title LIKE &#8216;%search term%&#8217;&#8221; become unreliable. There are very good solutions like Sphinx and Lucene, but, not surprisingly, you can&#8217;t always have Sphinx installed (shared servers again), so another solution has to be chosen. 
MySQL supports full-text indexing, but it doesn&#8217;t give much control over the actual index. Luckily, the Zend [...]]]></description>
			<content:encoded><![CDATA[<p>As websites grow, searches like &#8220;title LIKE &#8216;%search term%&#8217;&#8221; become unreliable. There are very good solutions like <a href="http://sphinxsearch.com/">Sphinx</a> and <a href="http://lucene.apache.org/java/docs/">Lucene</a>, but, not surprisingly, you can&#8217;t always have Sphinx installed (shared servers <a href="http://dev.juokaz.com/php/using-phing-to-synchronize-files">again</a>), so another solution has to be chosen.</p>
<p>MySQL supports <a href="http://www.devarticles.com/c/a/MySQL/Getting-Started-With-MySQLs-Full-Text-Search-Capabilities/">full-text indexing</a>, but it doesn&#8217;t give much control over the actual index. Luckily, the Zend team has done a wonderful job and implemented Lucene search entirely in PHP. <a href="http://framework.zend.com/manual/en/zend.search.lucene.html">Zend_Search_Lucene</a> is part of Zend Framework but, like all framework modules, it runs almost independently (it uses Zend_Exception, etc.).</p>
<p>How do you start indexing data? The Zend <a href="http://framework.zend.com/manual/en/zend.search.lucene.index-creation.html">manual</a> has very good examples of how to start with Lucene, but to create a sample index you can use this code (you need to have auto-loading enabled and a db connection available):</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// Create index</span>
<span style="color: #000088;">$index</span> <span style="color: #339933;">=</span> Zend_Search_Lucene<span style="color: #339933;">::</span><span style="color: #004000;">create</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'indexes/products'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000088;">$sql</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;select product_name, product_url from products&quot;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000088;">$results</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$db</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">fetchAll</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$sql</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #b1b100;">foreach</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$results</span> <span style="color: #b1b100;">as</span> <span style="color: #000088;">$result</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #000088;">$doc</span> <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Zend_Search_Lucene_Document<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// Store document URL to identify it in the search results</span>
    <span style="color: #000088;">$doc</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">addField</span><span style="color: #009900;">&#40;</span>
    Zend_Search_Lucene_Field<span style="color: #339933;">::</span><span style="color: #004000;">UnIndexed</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'url'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$result</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">product_url</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// Index document title</span>
    <span style="color: #000088;">$doc</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">addField</span><span style="color: #009900;">&#40;</span>
    Zend_Search_Lucene_Field<span style="color: #339933;">::</span><span style="color: #004000;">Text</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'title'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$result</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">product_name</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// Add document to the index</span>
    <span style="color: #000088;">$index</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">addDocument</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$doc</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// Optimize index.</span>
<span style="color: #000088;">$index</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">optimize</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>This simple code selects product information from the database, loops through the results and adds them as documents to the index. In this example I added <em>url</em> as UnIndexed, because I&#8217;m only going to search by title, but Lucene allows other <a href="http://framework.zend.com/manual/en/zend.search.lucene.html#zend.search.lucene.index-creation.understanding-field-types">field types</a>. In most cases, the product description or document text should be added as well (or maybe even indexed).</p>
<p>Searching through the index is even easier. The one thing you need to learn is how to construct search queries in the required <a href="http://framework.zend.com/manual/en/zend.search.lucene.query-language.html">query language</a>. Example:</p>
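<p>To give an idea of how the other field types could be used, here is a minimal sketch. The <em>buildProductDocument()</em> helper and the <em>sku</em> and <em>description</em> columns are hypothetical and only there for illustration, and the row is assumed to be an associative array rather than an object:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">// Hypothetical helper: builds an index document from one product row.
// The 'sku' and 'description' keys are assumed for illustration only.
function buildProductDocument(array $row)
{
    $doc = new Zend_Search_Lucene_Document();

    // UnIndexed: stored so it can be shown in results, but not searchable
    $doc-&gt;addField(Zend_Search_Lucene_Field::UnIndexed('url', $row['product_url']));

    // Text: tokenized, indexed and stored
    $doc-&gt;addField(Zend_Search_Lucene_Field::Text('title', $row['product_name']));

    // Keyword: indexed and stored as a single term, without tokenizing
    $doc-&gt;addField(Zend_Search_Lucene_Field::Keyword('sku', $row['sku']));

    // UnStored: tokenized and indexed, but the text is not kept in the index
    $doc-&gt;addField(Zend_Search_Lucene_Field::UnStored('description', $row['description']));

    return $doc;
}</pre></div></div>

<p>UnStored is probably the better fit for long descriptions: they become searchable without making the index much bigger, at the cost of not being able to read the text back from a hit.</p>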

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// Open index</span>
<span style="color: #000088;">$index</span> <span style="color: #339933;">=</span> Zend_Search_Lucene<span style="color: #339933;">::</span><span style="color: #004000;">open</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'indexes/products'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000088;">$query</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">'title:&quot;Apple MacBook&quot;'</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// Search by query</span>
<span style="color: #000088;">$hits</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$index</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">find</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$query</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #b1b100;">foreach</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$hits</span> <span style="color: #b1b100;">as</span> <span style="color: #000088;">$hit</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #b1b100;">echo</span> <span style="color: #000088;">$hit</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">score</span> <span style="color: #339933;">.</span> <span style="color: #0000ff;">&quot; &quot;</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">echo</span> <span style="color: #000088;">$hit</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">title</span> <span style="color: #339933;">.</span> <span style="color: #0000ff;">&quot; &quot;</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">echo</span> <span style="color: #000088;">$hit</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">url</span> <span style="color: #339933;">.</span> PHP_EOL<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>I tried creating an index of 6,000 products: the index (0.7 MB) was created in around 3 minutes and all searches take about 0.1 s. I tested it on my laptop, without APC and with a development Apache/PHP configuration. Normal servers would run this task much faster, but 0.1 s per search is not that bad.</p>
<p>Zend_Search_Lucene will not replace Sphinx or Lucene, but in limited environments (like shared servers) it can be quite useful. It supports many query types: phrase queries, boolean queries, wildcard queries, proximity queries, range queries and many others, which are hard to achieve with MySQL full-text indexes.</p>
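<p>As a rough illustration of those query types, here is one query string for each, all written against the <em>title</em> field from the index above. The search terms and the <em>printMatches()</em> helper are made up for this sketch; check the query language manual for the exact rules (for example, the minimum prefix length for wildcards):</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">// One example query string per query type (search terms are made up)
$queries = array(
    'phrase'    =&gt; 'title:&quot;Apple MacBook&quot;',
    'boolean'   =&gt; '+title:apple -title:ipod',
    'wildcard'  =&gt; 'title:mac*',
    'proximity' =&gt; 'title:&quot;apple pro&quot;~5',
    'range'     =&gt; 'title:[apple TO dell]',
);

// Hypothetical helper: runs every query and prints the number of hits
function printMatches(Zend_Search_Lucene_Interface $index, array $queries)
{
    foreach ($queries as $type =&gt; $query) {
        $hits = $index-&gt;find($query);
        echo $type . ': ' . count($hits) . ' hit(s)' . PHP_EOL;
    }
}

// Usage (needs the index created earlier):
// printMatches(Zend_Search_Lucene::open('indexes/products'), $queries);</pre></div></div>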
]]></content:encoded>
			<wfw:commentRss>http://dev.juokaz.com/php/starting-with-zend_search_lucene/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
	</channel>
</rss>
