Tuesday, 9 August 2011

Hibernate Gotchas!

I've been using Hibernate for some time now, and when I don't work on a Hibernate project for a while I find myself making the same mistakes I made the previous times.

So here is a sort of watch list for myself; hopefully it will be useful to someone else as well.

Implement hashCode and equals

You should always implement these methods in general, but for your entities you should pay a bit more attention. The first thing you think of when I say equals is probably to use the id to distinguish between instances. Well, that's going to cause you a lot of trouble.
You need to keep in mind that you are working with db entities and not normal POJOs.

When Hibernate fetches objects it uses collections, and hence equals and hashCode, to know whether the object you are looking for is already in the session. For new objects the id will be null or 0.
That means that with id-based equality two unsaved objects of the same class look equal, so when you try to save both, the second is going to overwrite the first one.
Also, when Hibernate saves a new instance it sets the id, so the object suddenly compares as a different one even though it is exactly the same.
You need to use a business key. Unique codes are great, but if you can't think of anything just use a meaningful field plus a timestamp (like the creation date) to make it unique.
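Here is a minimal sketch of business-key equality in plain Java, with no Hibernate on the classpath. The Customer class and its code and createdAt fields are made up for illustration, and setId only imitates Hibernate assigning the id on save.

```java
import java.time.Instant;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical entity: Customer, code and createdAt are illustrative names.
final class Customer {
    private Long id;                 // would be assigned by Hibernate on save; null until then
    private final String code;       // business key, known before the object is saved
    private final Instant createdAt; // extra field that helps make the key unique

    Customer(String code, Instant createdAt) {
        this.code = code;
        this.createdAt = createdAt;
    }

    void setId(Long id) { this.id = id; }
    Long getId() { return id; }

    // Equality deliberately ignores id, so it does not change when the id is set.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Customer)) return false;
        Customer that = (Customer) other;
        return code.equals(that.code) && createdAt.equals(that.createdAt);
    }

    @Override
    public int hashCode() {
        return Objects.hash(code, createdAt);
    }
}

class BusinessKeyDemo {
    public static void main(String[] args) {
        Set<Customer> customers = new HashSet<>();
        Customer first = new Customer("ACME", Instant.parse("2011-08-09T10:00:00Z"));
        Customer second = new Customer("GLOBEX", Instant.parse("2011-08-09T10:00:01Z"));
        customers.add(first);
        customers.add(second);

        first.setId(1L); // imitates the id being assigned on save

        System.out.println(customers.size());          // prints 2: both unsaved objects were kept
        System.out.println(customers.contains(first)); // prints true: the hash did not change
    }
}
```

With an id-based equals the two new objects would both have a null id and count as the same element, and setting the id afterwards would change the hash of an object already stored in the set.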

This is a good reference if you want to understand a bit further what's happening.

Careful with One-to-One and Many-to-One relations

This is something you really need to know.
When mapping a relation as One-to-One or Many-to-One, you can't have lazy loading on the "One" side of the relation unless you specify the field as not nullable.

Why is that?
Essentially, on the many side of the relation Hibernate can use collection proxies and lazily load instances when required.
On the "One" side there is no collection interface but instead a reference to one of your model classes.
Hibernate can proxy that one as well, but only if it is sure the reference will never be null!
So remember: if you want lazy loading, mark the one side as not null and combine that with the lazy annotation (or its XML equivalent).
If your relation can be null but you still really want to make it lazy, then you have some options:
  • Create a value to represent that. For example, if you have a relation like Person -> Partner, just use a specific instance of Partner that means "no partner".
  • Use build time instrumentation. Check this
  • Fake the one side using a List and getting the field with get(0)
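As a rough sketch of the first and third options, here is how the Person -> Partner example could look in plain Java. The mapping annotations are left out on purpose, Partner.NONE is a hypothetical sentinel (in a real schema it would have to be a persisted row), and the list stands in for a collection that Hibernate could load lazily.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model classes; no Hibernate annotations are shown.
class Partner {
    // Option 1: a specific instance meaning "no partner", so the reference is never null.
    static final Partner NONE = new Partner("(none)");

    private final String name;

    Partner(String name) { this.name = name; }

    String getName() { return name; }
    boolean isNone() { return this == NONE; }
}

class Person {
    // Option 3: hold the "one" side in a list and expose it as a single value.
    private final List<Partner> partners = new ArrayList<>();

    Partner getPartner() {
        return partners.isEmpty() ? Partner.NONE : partners.get(0);
    }

    void setPartner(Partner partner) {
        partners.clear();
        if (partner != null && !partner.isNone()) {
            partners.add(partner);
        }
    }
}
```

Callers only ever see getPartner() and setPartner(), so the list stays an implementation detail of the mapping.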

Read more in the Hibernate documentation

Enable the statement logging

This is the only way to verify Hibernate is really doing what you expect it to do. Luckily there are different logging parameters that you can use to find out what is happening, both at the HQL level and, if you want, at the SQL level. You'll be surprised how many times Hibernate runs queries you did not expect.

Try to do this from the very beginning, and help the team understand the importance of having the best and fewest possible queries, or you'll surely have performance issues when running the application on some real data.

To enable logging just set the hibernate.show_sql property to true in the session configuration file.
If you want to see it nicely formatted, also set hibernate.format_sql to true.
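If you build your configuration in code rather than in a file, the same two settings can be collected in a java.util.Properties object. This is only a sketch: the class and method names are made up, and how the properties are handed to Hibernate depends on how your session factory is bootstrapped.

```java
import java.util.Properties;

// Hypothetical helper; only the two property keys are Hibernate's own.
class HibernateLoggingSettings {
    static Properties sqlLoggingProperties() {
        Properties props = new Properties();
        props.setProperty("hibernate.show_sql", "true");   // print every SQL statement
        props.setProperty("hibernate.format_sql", "true"); // pretty-print the printed SQL
        return props;
    }

    public static void main(String[] args) {
        // Print the settings in key=value form, as they would appear in a properties file.
        sqlLoggingProperties().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```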

Watch what goes in the toString method.

This one is again related to what Hibernate fetches for you without you really being aware of it. Often, when you see queries and can't figure out why some lazy list is being loaded, check the toString method.
It might be the culprit!
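To make the point concrete, here is a small plain-Java sketch. LazyItems only imitates a lazy collection (it is not Hibernate's real wrapper) and PurchaseOrder is a made-up entity; the idea is simply that toString sticks to scalar fields.

```java
import java.util.List;
import java.util.function.Supplier;

// Stand-in for a lazy collection: it remembers whether anything forced it to load.
class LazyItems {
    private final Supplier<List<String>> loader;
    private List<String> loaded;

    LazyItems(Supplier<List<String>> loader) { this.loader = loader; }

    List<String> get() {
        if (loaded == null) {
            loaded = loader.get(); // with Hibernate, this is where the extra query would run
        }
        return loaded;
    }

    boolean isLoaded() { return loaded != null; }
}

// Hypothetical entity with one lazy association.
class PurchaseOrder {
    private final Long id;
    private final String code;
    final LazyItems items;

    PurchaseOrder(Long id, String code, LazyItems items) {
        this.id = id;
        this.code = code;
        this.items = items;
    }

    // Only scalar fields: logging or printing the order never touches items.
    @Override
    public String toString() {
        return "PurchaseOrder{id=" + id + ", code='" + code + "'}";
    }
}
```

If toString concatenated items.get() instead, every log statement or debugger view that printed the order would load the list.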

What are your hibernate gotchas?


DoghouseReilly said...

Good list! Thanks.

MxHyway said...

I have found "hibernate.show_sql=true" to be the most valuable tool, particularly when explaining to a client why their current Hibernate implementation is bringing MySQL to its knees. Seeing multi-gigabyte query logs can be a real eye opener.

salient1 said...

The only real gotcha you need to know is that you shouldn't use Hibernate at all. It requires the user to know far too much about the underlying implementation of the API to be considered a competent implementation of an ORM. It's really a model example of what not to do with a library. The fact that you need an 800+ page book (courtesy of Gavin King) to describe it is your first clue that something isn't right.

Jilles said...

Fully agree with salient1 here. The fallacy with ORM, and Hibernate in particular, is believing that this makes things easier for people who are not database experts. In reality it is an OKish solution only for those without any real scalability or performance requirements, since addressing either will require a deep understanding of both how things work at the database level and inside Hibernate.

So keep things simple and remove hibernate from the equation entirely. Databases are hard enough without hibernate pretending it is easy. In the end all you will get out of hibernate is not having to write a few simple classes that map rows to objects, which is actually a comparatively simple problem for which many alternative solutions exist. Write those classes manually and you will still be better off without hibernate. If you don't understand databases, keeping things simple should be your primary goal. Hibernate only provides the illusion of simplicity here.

董益宏 said...

Agree with salient1 and Jilles, but I believe it is because we are into relational database and we are more database oriented mindset thinking which is correct for building robust application. Robust application needs proper sql tuning and as Jilles said is complicated. Theoretically, object oriented mindset is so much different from relational concept(Set in maths) thus it is not easy to do a mapping easily. I appreciate Hibernate contributes such a good mapping so far but I think is not good enough for robust enterprise application.

Yannick Majoros said...

I just don't agree with this. ORM has a value. If you are doing OOP, chances are you need Hibernate (or actually, JPA) to manipulate objects instead of bits of strings. This is not PHP. Using JPA 2 (criteria query), I'm sure my apps are 100% typesafe and my queries are all ok, even when built dynamically.

I think the only reason for not using any ORM is not mastering it. But then, you let ignorance guide your choice. ORM is not here to make things easy, it's here to let you work with objects and have a serious OOP architecture without reinventing the wheel.

Libor Šubčík said...

For logging, I prefer log categories
org.hibernate.SQL for sql and org.hibernate.type for parameter binding. It can leverage your logger framework capabilities, the hibernate.format_sql=true property applies to the logging too.

I use common parent class for all entities with predefined equals, hashCode and toString based on T getId() method (T extends Serializable). I have never encountered problems with equals based on id, maybe because I use DTOs on a business layer, so the entities are short lived. But imho an object with no id assigned is not equal to another object with id assigned (saved in db) from the hibernate perspective (providing hibernate solves persistence layer).

Sreenath V said...

We had used JPA (ORM) in one of my previous news corp application and we are running close to 100 news site built on 8 node in a JBoss cluster, Oracle DB, 250 Entities (some of them are marked as caching and others not cached), used Named Queries in most of the cases and hardly in 10 cases we used native query. And this application is running since 2008.

We migrated the old application, which used plain JDBC code and was not scaling beyond 35 news sites. With JPA things were much better due to transparent second level caching and hence scalability...

Hope some of you will understand the real power of JPA.

mericano1 said...

It's great to see all this feedback from you guys...
I think hibernate is actually a great tool. It is widely adopted, has lots of contributors and it is one of the most reliable and mature ORM tools available.

When used properly it can support massive applications and together with caching can improve application performance and scale up pretty well.

@MxHyway absolutely, it is great to show customers what they are actually doing.

@Libor Šubčík log categories are a better way to log SQL, but the point is to always keep an eye on the actual query being run; it is so easy to assume it's fine!

Keep your comments coming!

Diesel Don said...

I really strongly disagree with salient1 here. Judging the usefulness of something by the size of the manual means that no one should fly a Boeing 777.

In any case, one of my gotchas is to be careful using bi-directional relationships in object graphs. Don't do it unless you really, really need to navigate to a parent from a child. This is especially true with deep object graphs. Bi-directionality in this case will kill performance and consume loads of memory. In my experience, I have found it's better to keep the parent object available some other way, even to the point of retrieving it from the database again, rather than navigating to it from a child object 5 or 6 generations below the parent.

JAlexoid said...

You are wrong in the case of ManyToOne laziness. Hibernate handles laziness on both sides properly, without issues. With proxies and all that...

I've just delivered a rather large project that actually depends on Hibernate working properly in that case.