<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Starting with Zend_Search_Lucene</title>
	<atom:link href="http://dev.juokaz.com/php/starting-with-zend_search_lucene/feed" rel="self" type="application/rss+xml" />
	<link>http://dev.juokaz.com/php/starting-with-zend_search_lucene</link>
	<description>Random ideas, scripts and facts</description>
	<lastBuildDate>Mon, 29 Mar 2010 18:47:16 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Patrick</title>
		<link>http://dev.juokaz.com/php/starting-with-zend_search_lucene/comment-page-1#comment-1293</link>
		<dc:creator>Patrick</dc:creator>
		<pubDate>Mon, 20 Apr 2009 15:42:01 +0000</pubDate>
		<guid isPermaLink="false">http://dev.juokaz.com/?p=344#comment-1293</guid>
		<description>What is the record limit for the Zend implementation of Lucene?
Some time ago I inserted 10,000 records from a MySQL database (key/value pairs), with the value stored in a MySQL TEXT column.
When I ran $index-&gt;find(&quot;value:&#039;test&#039;&quot;); I got too many records back, some even with scores of about 4.08383E+39 (or something like that).
If I keep the number of records below 1,000, everything seems to work fine. (Debian Etch GNU/Linux, PHP 5.2, Zend Framework 1.7.5)

Regards,

Patrick</description>
		<content:encoded><![CDATA[<p>What is the record limit for the Zend implementation of Lucene?<br />
Some time ago I inserted 10,000 records from a MySQL database (key/value pairs), with the value stored in a MySQL TEXT column.<br />
When I ran $index-&gt;find("value:'test'"); I got too many records back, some even with scores of about 4.08383E+39 (or something like that).<br />
If I keep the number of records below 1,000, everything seems to work fine. (Debian Etch GNU/Linux, PHP 5.2, Zend Framework 1.7.5)</p>
<p>Regards,</p>
<p>Patrick</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: willian</title>
		<link>http://dev.juokaz.com/php/starting-with-zend_search_lucene/comment-page-1#comment-676</link>
		<dc:creator>willian</dc:creator>
		<pubDate>Fri, 03 Apr 2009 19:56:28 +0000</pubDate>
		<guid isPermaLink="false">http://dev.juokaz.com/?p=344#comment-676</guid>
		<description>Hi,

I&#039;m trying to delete a single document that I added. The index has 20 documents. Lucene says the document was deleted, but when I search, the result still says &quot;contains 20 documents&quot; when it should be 19. Do you have any idea why this is happening?

Thanks,
willian</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>I&#8217;m trying to delete a single document that I added. The index has 20 documents. Lucene says the document was deleted, but when I search, the result still says &#8220;contains 20 documents&#8221; when it should be 19. Do you have any idea why this is happening?</p>
<p>Thanks,<br />
willian</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Juozas</title>
		<link>http://dev.juokaz.com/php/starting-with-zend_search_lucene/comment-page-1#comment-315</link>
		<dc:creator>Juozas</dc:creator>
		<pubDate>Fri, 13 Mar 2009 18:52:41 +0000</pubDate>
		<guid isPermaLink="false">http://dev.juokaz.com/?p=344#comment-315</guid>
		<description>This &lt;a href=&quot;http://framework.zend.com/code/rdiff/Zend_Framework?csid=14304&amp;u&amp;N&quot; rel=&quot;nofollow&quot;&gt;patch&lt;/a&gt; adds limits on the maximum number of terms per query and on the wildcard prefix length. I will probably test it with the new version next week, but judging from the source, the &quot;* and * and *&quot; issue should be fixed.</description>
		<content:encoded><![CDATA[<p>This <a href="http://framework.zend.com/code/rdiff/Zend_Framework?csid=14304&#038;u&#038;N" rel="nofollow">patch</a> adds limits on the maximum number of terms per query and on the wildcard prefix length. I will probably test it with the new version next week, but judging from the source, the &#8220;* and * and *&#8221; issue should be fixed.</p>
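<p>To make the prefix-length idea concrete, here is a minimal plain-PHP sketch. It does not use the patched Zend API; the function name and the default minimum of 3 characters are assumptions for illustration. A term passes only if at least that many literal characters come before its first * or ?, which rules out a bare * entirely.</p>

```php
<?php
// Hypothetical helper, not part of Zend Framework: returns true when every
// wildcard term in $query has at least $minPrefix literal characters before
// its first * or ?. A bare "*" has a prefix length of 0 and is rejected.
function wildcardPrefixesOk($query, $minPrefix = 3)
{
    // Treat each whitespace-separated token as one term (a simplification).
    $terms = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($terms as $term) {
        // Length of the leading run without * or ?; equals strlen() if none.
        $prefixLength = strcspn($term, '*?');
        if ($prefixLength < strlen($term) && $prefixLength < $minPrefix) {
            return false; // the wildcard appears too early in this term
        }
    }
    return true;
}
```

<p>With the default of 3, lucene* passes, while lu* and the bare * terms in * and * and * are rejected.</p>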
]]></content:encoded>
	</item>
	<item>
		<title>By: robo47</title>
		<link>http://dev.juokaz.com/php/starting-with-zend_search_lucene/comment-page-1#comment-314</link>
		<dc:creator>robo47</dc:creator>
		<pubDate>Fri, 13 Mar 2009 18:46:54 +0000</pubDate>
		<guid isPermaLink="false">http://dev.juokaz.com/?p=344#comment-314</guid>
		<description>It seems this gets fixed in the ZF 1.7.7 release:

http://framework.zend.com/issues/browse/ZF-3321
http://framework.zend.com/code/changelog/Zend_Framework/?cs=14304

I haven&#039;t looked at what exactly changed, but the ticket has been marked as fixed.</description>
		<content:encoded><![CDATA[<p>It seems this gets fixed in the ZF 1.7.7 release:</p>
<p><a href="http://framework.zend.com/issues/browse/ZF-3321" rel="nofollow">http://framework.zend.com/issues/browse/ZF-3321</a><br />
<a href="http://framework.zend.com/code/changelog/Zend_Framework/?cs=14304" rel="nofollow">http://framework.zend.com/code/changelog/Zend_Framework/?cs=14304</a></p>
<p>I haven&#8217;t looked at what exactly changed, but the ticket has been marked as fixed.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Juozas Kaziukenas&#8217; Blog: Starting with Zend_Search_Lucene : Dragonfly Networks</title>
		<link>http://dev.juokaz.com/php/starting-with-zend_search_lucene/comment-page-1#comment-305</link>
		<dc:creator>Juozas Kaziukenas&#8217; Blog: Starting with Zend_Search_Lucene : Dragonfly Networks</dc:creator>
		<pubDate>Fri, 13 Mar 2009 04:52:29 +0000</pubDate>
		<guid isPermaLink="false">http://dev.juokaz.com/?p=344#comment-305</guid>
		<description>[...] Kaziukenas has a recent post to his blog introducing one of the many useful components of the Zend Framework - [...]</description>
		<content:encoded><![CDATA[<p>[...] Kaziukenas has a recent post to his blog introducing one of the many useful components of the Zend Framework &#8211; [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Juozas</title>
		<link>http://dev.juokaz.com/php/starting-with-zend_search_lucene/comment-page-1#comment-301</link>
		<dc:creator>Juozas</dc:creator>
		<pubDate>Thu, 12 Mar 2009 19:23:44 +0000</pubDate>
		<guid isPermaLink="false">http://dev.juokaz.com/?p=344#comment-301</guid>
		<description>4 billion results would clearly kill Zend_Lucene :)) But the standard solutions (Lucene, Sphinx) should work fine, I guess.</description>
		<content:encoded><![CDATA[<p>4 billion results would clearly kill Zend_Lucene :)) But the standard solutions (Lucene, Sphinx) should work fine, I guess.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: selected</title>
		<link>http://dev.juokaz.com/php/starting-with-zend_search_lucene/comment-page-1#comment-300</link>
		<dc:creator>selected</dc:creator>
		<pubDate>Thu, 12 Mar 2009 19:21:16 +0000</pubDate>
		<guid isPermaLink="false">http://dev.juokaz.com/?p=344#comment-300</guid>
		<description>It is fine as long as you use it for something small (like 6,000 products or a blog like this one). In my case (4 billion A4 documents), I would have to hang myself with the Zend implementation of Lucene.</description>
		<content:encoded><![CDATA[<p>It is fine as long as you use it for something small (like 6,000 products or a blog like this one). In my case (4 billion A4 documents), I would have to hang myself with the Zend implementation of Lucene.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: robo47</title>
		<link>http://dev.juokaz.com/php/starting-with-zend_search_lucene/comment-page-1#comment-299</link>
		<dc:creator>robo47</dc:creator>
		<pubDate>Thu, 12 Mar 2009 18:57:12 +0000</pubDate>
		<guid isPermaLink="false">http://dev.juokaz.com/?p=344#comment-299</guid>
		<description>Without the ability to let users write complete queries, Lucene loses a lot: all the power the Lucene implementation offers is gone.
I currently filter out only * and ~, so most of the query language is still usable.
After running into the * AND * ... AND * problem myself, I tried a lot of test queries against the Zend implementation, and the only dangerous queries I could create contained * or ~.
It&#039;s only basic blacklisting, but it&#039;s better than nothing. In addition, I have implemented a search query log that records the execution time and memory usage of each search, so if anything bad happens, I can analyze the queries and probably find a way to filter them.
It would be nice to find a proper solution for this, for example a way to give a query a time limit that is checked during the search and throws an exception once it is exceeded. If the script dies because of memory_limit or max_execution_time, there are only two possible outcomes: a blank white page or, if display_errors is on, an error message shown to the user. I don&#039;t want either of them.</description>
		<content:encoded><![CDATA[<p>Without the ability to let users write complete queries, Lucene loses a lot: all the power the Lucene implementation offers is gone.<br />
I currently filter out only * and ~, so most of the query language is still usable.<br />
After running into the * AND * &#8230; AND * problem myself, I tried a lot of test queries against the Zend implementation, and the only dangerous queries I could create contained * or ~.<br />
It&#8217;s only basic blacklisting, but it&#8217;s better than nothing. In addition, I have implemented a search query log that records the execution time and memory usage of each search, so if anything bad happens, I can analyze the queries and probably find a way to filter them.<br />
It would be nice to find a proper solution for this, for example a way to give a query a time limit that is checked during the search and throws an exception once it is exceeded. If the script dies because of memory_limit or max_execution_time, there are only two possible outcomes: a blank white page or, if display_errors is on, an error message shown to the user. I don&#8217;t want either of them.</p>
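<p>A minimal plain-PHP sketch of the blacklist plus query log described above. The function names and log fields are assumptions, and so is dropping the number after ~ (otherwise "foo bar"~10 would leave a stray 10 in the query). The search itself is passed in as a callable, so the sketch does not depend on any particular Zend_Search_Lucene call.</p>

```php
<?php
// Hypothetical helper for the blacklist described above: remove the wildcard
// operator (*) and the fuzzy/proximity operator (~), together with any number
// that follows ~ (as in roam~0.8 or "foo bar"~10), before parsing the query.
function stripDangerousOperators($userQuery)
{
    $query = str_replace('*', '', $userQuery);
    $query = preg_replace('/~[0-9.]*/', '', $query);
    return trim($query);
}

// Hypothetical query log: sanitize the input, run $search (any callable that
// takes the query string and returns results) and append one log entry with
// the elapsed wall-clock time and the script's peak memory usage so far.
function runLoggedSearch($userQuery, $search, array &$log)
{
    $query  = stripDangerousOperators($userQuery);
    $start  = microtime(true);
    $result = call_user_func($search, $query);
    $log[] = array(
        'query'       => $query,
        'seconds'     => microtime(true) - $start,
        'peak_memory' => memory_get_peak_usage(true), // whole request, not only this search
    );
    return $result;
}
```

<p>In a real application $search could be a closure that calls $index-&gt;find() on the sanitized string. If the script is killed by memory_limit or max_execution_time, the entry is never appended, so catching those fatal cases would require writing the query out before $search is called.</p>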
]]></content:encoded>
	</item>
	<item>
		<title>By: Juozas</title>
		<link>http://dev.juokaz.com/php/starting-with-zend_search_lucene/comment-page-1#comment-297</link>
		<dc:creator>Juozas</dc:creator>
		<pubDate>Thu, 12 Mar 2009 17:34:58 +0000</pubDate>
		<guid isPermaLink="false">http://dev.juokaz.com/?p=344#comment-297</guid>
		<description>Oh, I see.
I have read many (well, some) complaints about memory and speed issues. Now I need to add &quot;query injection&quot; to the list of things that can cause problems.

What about letting users submit only keywords instead of actual queries, and building the queries from them yourself? It should work, but it limits you a lot :(</description>
		<content:encoded><![CDATA[<p>Oh, I see.<br />
I have read many (well, some) complaints about memory and speed issues. Now I need to add &#8220;query injection&#8221; to the list of things that can cause problems.</p>
<p>What about letting users submit only keywords instead of actual queries, and building the queries from them yourself? It should work, but it limits you a lot :(</p>
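<p>A minimal sketch of the keywords-only approach in plain PHP. The function name, the five-keyword cap and the rule of replacing every character that is not a letter, digit or whitespace are assumptions; each remaining word becomes a required term (+word) in Lucene query syntax, and the resulting string would then be passed to the query parser or find().</p>

```php
<?php
// Hypothetical helper: turn free-text keywords into a query in which every
// word is required (+word), keeping at most $maxKeywords terms.
function buildKeywordQuery($input, $maxKeywords = 5)
{
    // Replace each run of characters that is not a letter (\p{L}), digit (\p{N})
    // or whitespace with a space, so operators such as * ~ " ( ) : ^ + - never
    // reach the parser. The /u flag treats the input as UTF-8.
    $clean = preg_replace('/[^\p{L}\p{N}\s]+/u', ' ', $input);
    $words = preg_split('/\s+/u', trim($clean), -1, PREG_SPLIT_NO_EMPTY);

    $terms = array();
    foreach ($words as $word) {
        // Skip words the parser could read as operators (case-insensitive).
        if (in_array(strtoupper($word), array('AND', 'OR', 'NOT', 'TO'), true)) {
            continue;
        }
        $terms[] = '+' . $word;
        if (count($terms) >= $maxKeywords) {
            break; // cap the number of terms per query
        }
    }
    return implode(' ', $terms);
}
```

<p>For example, zend lucene becomes +zend +lucene, while * AND * AND * becomes an empty string, which the caller should treat as no query at all.</p>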
]]></content:encoded>
	</item>
	<item>
		<title>By: robo47</title>
		<link>http://dev.juokaz.com/php/starting-with-zend_search_lucene/comment-page-1#comment-296</link>
		<dc:creator>robo47</dc:creator>
		<pubDate>Thu, 12 Mar 2009 17:28:01 +0000</pubDate>
		<guid isPermaLink="false">http://dev.juokaz.com/?p=344#comment-296</guid>
		<description>But currently it is important to filter the data you pass to the query method, because some inputs can easily make the script hit memory_limit or max_execution_time.

Queries like * AND * AND * ... AND * in particular use LOTS of memory and can run for minutes or longer. Proximity search is also dangerous: two identical words combined with a large distance can produce very long-running queries.

Thread in a German forum: http://www.zfforum.de/showthread.php?p=29697
A related open bug in the tracker: http://framework.zend.com/issues/browse/ZF-3321</description>
		<content:encoded><![CDATA[<p>But currently it is important to filter the data you pass to the query method, because some inputs can easily make the script hit memory_limit or max_execution_time.</p>
<p>Queries like * AND * AND * &#8230; AND * in particular use LOTS of memory and can run for minutes or longer. Proximity search is also dangerous: two identical words combined with a large distance can produce very long-running queries.</p>
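<p>Instead of stripping characters, a query can also be rejected before it reaches the parser. This is a hypothetical plain-PHP pre-check; the function name and both thresholds (one wildcard term, a maximum ~ distance of 10) are arbitrary assumptions, and splitting on whitespace is a simplification of the real query grammar.</p>

```php
<?php
// Hypothetical pre-check run before the query string reaches the parser.
// Rejects queries with more than $maxWildcardTerms terms containing * or ?,
// and any proximity/fuzzy suffix ~N whose number exceeds $maxDistance.
function isQuerySafe($query, $maxWildcardTerms = 1, $maxDistance = 10)
{
    // Count whitespace-delimited tokens that contain a wildcard character.
    $wildcardTerms = preg_match_all('/\S*[*?]\S*/', $query, $unused);
    if ($wildcardTerms > $maxWildcardTerms) {
        return false;
    }

    // Find every "~" followed by a number, e.g. "test test"~10000 or roam~0.8.
    preg_match_all('/~(\d+(?:\.\d+)?)/', $query, $matches);
    foreach ($matches[1] as $distance) {
        if ((float) $distance > $maxDistance) {
            return false;
        }
    }
    return true;
}
```

<p>With these defaults, "test test"~10000 and * AND * AND * are rejected, while "zend lucene"~5 and luc* are accepted.</p>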
<p>Thread in a German forum: <a href="http://www.zfforum.de/showthread.php?p=29697" rel="nofollow">http://www.zfforum.de/showthread.php?p=29697</a><br />
A related open bug in the tracker: <a href="http://framework.zend.com/issues/browse/ZF-3321" rel="nofollow">http://framework.zend.com/issues/browse/ZF-3321</a></p>
]]></content:encoded>
	</item>
</channel>
</rss>
