Zend Framework and Doctrine. Part 3

Posted November 30th, 2009 by Juozas

During the last two months I spent a massive amount of time tweaking the Doctrine ORM framework to make it perform as fast as possible (as you might have noticed from my never-ending tweets). This post is devoted to performance and efficiency, with practical tips and tricks on how to reduce memory usage, make it work faster and save resources.

Doctrine is a very powerful framework, but you should study its behavior a little to get it working properly. As it turns out, it’s not that hard.

Speed

One of the first things I recommend looking at is the query cache. Because quite a lot happens when a DQL statement is turned into an SQL query, caching that process can increase performance by a big margin. The best choice is APC, although robo47.net has example code for using Zend_Cache adapters; just don’t forget that the file back-end is probably not the best pick.
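As a minimal sketch, enabling the query cache with the APC driver might look like this. It assumes Doctrine 1.2 is already loaded and the APC extension is enabled on the server; adjust it to your own bootstrap code.

```php
<?php
// Assumption: Doctrine 1.2 is autoloaded and the APC extension is available.
$manager = Doctrine_Manager::getInstance();

// Cache the parsed DQL -> SQL translation in APC shared memory,
// so the same DQL statement is not parsed again on every request.
$manager->setAttribute(
    Doctrine_Core::ATTR_QUERY_CACHE,
    new Doctrine_Cache_Apc()
);
```

The query cache only stores the generated SQL, not the query results, so it should be safe to enable for every query.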

Hydrators are probably the easiest thing to misuse. Hydration, in Doctrine language, is turning SQL query results into a data graph. The result can be an array, an object or a single value. It’s very nice to get results as PHP objects (in this case, models), but it’s slow: slower than hydrating as an array, and much slower than getting the raw result from PDO. That’s why I always recommend answering one simple question: “will you be updating/deleting records?”. If the answer is no, hydrate as array, because you are not going to need an actual model (usually).

For this reason I always try to use array notation to access properties rather than object-property notation. Basically, instead of $product->name I tend to write $product['name'], which returns the same result but makes switching to array hydration very easy. You can find more tips and tricks in the Doctrine manual here, but the key thing to remember is that the fastest data structure in PHP is the array (I believe so), so if an application is not performing well, start playing with hydration first.
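A small sketch of a read-only listing hydrated as an array. The Product model and its name column are assumptions taken from the $product['name'] example above, and Doctrine 1.2 is assumed to be bootstrapped.

```php
<?php
// Assumption: "Product" is a hypothetical model with a "name" column.
$query = Doctrine_Query::create()->from('Product p');

// Nothing is updated or deleted here, so ask for plain arrays
// instead of fully hydrated model objects.
$products = $query->execute(array(), Doctrine_Core::HYDRATE_ARRAY);

foreach ($products as $product) {
    // Array notation works with both record and array hydration,
    // so this loop does not change when the hydration mode does.
    print $product['name'] . '<br />';
}
```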

Doctrine has very nice support for relations: they work like proxies, and data is retrieved only when it’s needed (lazy loading). However, this is wrong:

$user = Doctrine_Core::getTable('User')->find(1);
 
foreach ($user->Comments as $comment)
{
   print $comment->NewsItem->title . '<br />';
}

This code is bad because each comment loads its news item with a separate query, so you waste database server resources and make the code slower. The optimization is pretty straightforward:

$query = Doctrine_Query::create();
 
$query->from('Comments C')
        ->innerJoin('C.NewsItem N')
        ->where('C.user_id = ?');
 
foreach ($query->execute(array(1)) as $comment)
{
   print $comment->NewsItem->title . '<br />';
}

Here all the data is loaded in one query and everything happens much faster (and in this case hydrating as array would also improve performance).
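For illustration, the same joined query hydrated as an array could look roughly like this. It reuses the model and relation names from the example above and assumes Doctrine 1.2 is bootstrapped; with array hydration the joined relation appears as a nested array under the relation name.

```php
<?php
// Same query as above, but returning nested arrays instead of records.
$comments = Doctrine_Query::create()
    ->from('Comments C')
    ->innerJoin('C.NewsItem N')
    ->where('C.user_id = ?')
    ->execute(array(1), Doctrine_Core::HYDRATE_ARRAY);

foreach ($comments as $comment) {
    // The joined news item is a nested array keyed by the relation name.
    print $comment['NewsItem']['title'] . '<br />';
}
```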

However, make sure you know what you are joining. For example, imagine that in the previous example the news item relation is also one-to-many (a comment has many news items), and user 1 has written 1000 comments, each attached to 50 news items on average. The query above will return 50′000 rows (50 * 1000), which Doctrine then needs to hydrate. This will be very slow and will probably kill your web server after some time. One day I had a query which was returning 2GB of data; the server admins were probably not very happy about it…

Memory

One of my favorite parts of software development is playing with memory usage and making it efficient. Even though it sounds really simple, debugging it and finding where the memory is leaking is not an easy task. Recently I was using Doctrine to work with quite big datasets (on average 50′000 records) and have probably tried all the possible tricks to make Doctrine memory efficient. My code looked really simple:

$data = Doctrine_Query::create()
    ->from('Item i') // "Item" is a placeholder model name
    ->execute();
 
foreach ($data as $item)
{
    // do some work here
}

You might expect memory usage to be steady, but it is not. Doctrine uses an identity map, and objects also hold a lot of references, which makes freeing memory a tricky job. In my experience, even though records have a free() method which is supposed to de-reference them, sometimes it doesn’t help.

So to make memory management work, make sure to try these:

  • $record->free(true) – deep free-up, calls free() on all relations too
  • $collection->free() – free all collection references
  • Doctrine_Manager::connection()->clean() – cleanup connection (and remove identity map entries)

With some debugging and profiling (and these methods) you can hopefully keep memory usage low. I’m also using some custom iterators (available here) to divide a query into chunks, because loading 50′000 objects in one go is not going to work, so you might want to look at them too.
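A rough sketch of chunked processing that uses the free-up methods listed above. This is not the custom iterator mentioned in the text, only a plain limit/offset loop; the Item model, its id column and the chunk size are assumptions, and Doctrine 1.2 is assumed to be bootstrapped.

```php
<?php
// Assumptions: "Item" is a hypothetical model with an "id" column,
// and 500 is an arbitrary chunk size to tune for your data.
$chunkSize = 500;
$offset    = 0;

do {
    $query = Doctrine_Query::create()
        ->from('Item i')
        ->orderBy('i.id')      // stable order so chunks do not overlap
        ->limit($chunkSize)
        ->offset($offset);

    $items   = $query->execute();
    $fetched = count($items);

    foreach ($items as $item) {
        // do some work here
    }

    $items->free(true);        // deep free: records and their loaded relations
    $query->free();            // release the parsed parts held by the query
    unset($items, $query);

    $offset += $fetched;
} while ($fetched === $chunkSize);
```

Only one chunk of records is hydrated at a time, so peak memory should depend on the chunk size rather than on the size of the whole table.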

Another recommended tip: make sure to free queries too. Just as collections hold references to all their records, a query object holds references to the parsed sub-parts of the query. For this I use the auto-free setting, enabled as follows (also shown in my first part post):

// enable automatic queries resource freeing
$manager->setAttribute(
	Doctrine_Core::ATTR_AUTO_FREE_QUERY_OBJECTS,
	true
);

It’s very easy to forget to free a query, and that again leaves references behind which make the garbage collector’s work hard. After enabling this setting, though, I don’t know of any other place where memory can start to leak.

Last step: PHP 5.3. I’ve been working with this version for a few months and it works great, so if it’s possible, I recommend using it (you will also help with testing frameworks, though both Zend Framework and Doctrine should already work fine). One important feature here is the improved garbage collector, so the actual script takes less memory to execute. I haven’t recorded benchmarks on this one, but what I’ve noticed during development is that 5.3 does in fact have a lower peak memory usage.
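To illustrate the kind of leak the PHP 5.3 collector deals with, here is a small standalone sketch without Doctrine. The Node class and makeCycle() function are made up for this example; gc_enable() and gc_collect_cycles() are the standard PHP functions available since 5.3.

```php
<?php
// Two objects that reference each other form a cycle which plain
// reference counting cannot release on its own.
class Node
{
    public $peer;
}

function makeCycle()
{
    $a = new Node();
    $b = new Node();
    $a->peer = $b;
    $b->peer = $a;
    // $a and $b go out of scope here, but each still holds the other.
}

gc_enable(); // make sure the cycle collector is switched on

for ($i = 0; $i < 1000; $i++) {
    makeCycle();
}

// Force a collection run; the return value reports how much was collected.
$collected = gc_collect_cycles();

print 'Collected: ' . $collected . "\n";
```

Before 5.3 such cycles stayed in memory until the script ended, which is why records that reference each other were so hard to get rid of.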

Conclusion

I think I haven’t missed anything here, or at least these are the tips which made my applications work fast (if I have, let me know). To finish: good optimizations come from comprehensive benchmarking and profiling, and the fact that Doctrine is a big framework doesn’t necessarily mean it’s slow. At the end of the day, it’s usually only a matter of reading the manual.

All parts:

  1. Zend Framework and Doctrine. Part 1
  2. Zend Framework and Doctrine. Part 2
  3. Zend Framework and Doctrine. Part 3

 


Comments (4)

  1. romanb

    Memory leaks are usually not that hard to debug. It can just be very hard under php < 5.3 because your code may look completely fine, yet objects keep each other alive even though *your code* never references any of them anymore anywhere.

    These are some good tips for Doctrine 1.

    With 5.3 and Doctrine 2 I think you will find performance and memory usage to be much better, straight-forward and easier to investigate.

  2. Juozas (author)

    Roman, I was testing on both 5.3 and 5.2 branches, yet sometimes you are just stuck, because whatever you change doesn’t change the usage, then with some luck you somehow manage to track it down.

    I’m looking forward to Doctrine 2.0 (and Symfony/ZF too), though I have big projects to support for at least a year (probably) with < 5.3. Right now the only problem with Doctrine 1 branch is SqlSrv extension which I need ASAP and hopefully will start working on it as soon as I can.

  3. Carl Helmertz

    Firstly, thanks for a well-written and interesting blog!

    A bit of an OT question but here goes: do you have a longer reasoning behind “just don’t forget that file back-end is probably not the best pick”, such as benchmarking or any custom Zend_Cache-backends that are publicly available? I know about APC so the question is more turned towards Zend_Cache implementations.

  4. Juozas (author)

    Carl, I’m happy that you liked it.

    I don’t know about any benchmarks available though. I’m suggesting APC in this case because it’s very fast: not only is retrieving data quick, there is also very little seek time (finding information in storage). HDD I/O is very expensive and usually one of the first things you need to look at when optimizing.

    So I would suggest looking at memory-based solutions whenever you can: memcache, xcache or apc are great. For example, even the database in our production is almost entirely in memory, with most of the data waiting there so it can be retrieved very fast. I use file cache for pre-generated thumbnails, XML files or things like that, which need a lot of space and are useful to keep even if the server needs a restart. With memory-based solutions you would (usually) lose everything you stored before.
