vendredi 18 avril 2014

La mise en cache de données de l'EAV - XML ou NoSQL / MongoDB ? -Débordement de pile


I'm building a web app that relies heavily on the EAV pattern for storing data. This basically means that each attribute of an object has it's own row in a massive database table. I'm using MySQL to store everything. This is a very simplified example of what I'm storing...


OBJECTS              ATTRIBUTES

objId | type objId | attribute | value
============= =========================
1 | fruit 1 | color | green
2 | fruit 1 | shape | round
3 | book 2 | color | red

I know some people hate EAV, but I need to be able to add new object attributes arbitrarily without modifying the database schema, and it's working very well for me so far.


As I think anyone else finds when building a system using an EAV data structure, the weakness of this approach is the retrieval of multiple objects together with each object's attributes. At the moment my app only displays 10 objects at a time, so I just query my EAV table 10 times (once for each object) and it's still very fast. However, I'd like to remove this limitation and allow hundreds of objects to be fetched in one go. I also want to be able to query objects in a more flexible way than I'm doing currently.


Doing this with SQL joins would be hideous, so I'm considering caching the data. On average the database gets about 300 reads for every 1 write, so I think this it's a good candidate for caching.


So far these are the options I've come up with...



  1. XML database column: Every time a write is performed, update an XML text column in the objects table containing all the object's attributes. This would work for reading the data quickly, but querying XML data hidden in a database table is messy.


  2. XML file: Every time a write is performed, write an XML file to disk which contains each object and it's attributes. This has the benefit that I can then use XQuery to query the objects.


  3. NoSQL (eg. MongoDB): Perhaps I should have built the system on a schemaless database like MongoDB. Re-writing the entire app to use MongoDB would be quite time consuming, but it struck me that I could use it as a cache. So for example, every time data is written to the EAV store, the equivalent object would be updated in MongoDB which would then be used for reads and queries.



Originally I thought an XML file would be the best approach, but I can see the file getting really big and unmanageable. At the moment I'm leaning towards using MongoDB. I know it seems crazy running two database servers for one app, but I think it could work in my case.


I'd love to hear your thoughts on this.



I'm building a web app that relies heavily on the EAV pattern for storing data. This basically means that each attribute of an object has it's own row in a massive database table. I'm using MySQL to store everything. This is a very simplified example of what I'm storing...


OBJECTS              ATTRIBUTES

objId | type objId | attribute | value
============= =========================
1 | fruit 1 | color | green
2 | fruit 1 | shape | round
3 | book 2 | color | red

I know some people hate EAV, but I need to be able to add new object attributes arbitrarily without modifying the database schema, and it's working very well for me so far.


As I think anyone else finds when building a system using an EAV data structure, the weakness of this approach is the retrieval of multiple objects together with each object's attributes. At the moment my app only displays 10 objects at a time, so I just query my EAV table 10 times (once for each object) and it's still very fast. However, I'd like to remove this limitation and allow hundreds of objects to be fetched in one go. I also want to be able to query objects in a more flexible way than I'm doing currently.


Doing this with SQL joins would be hideous, so I'm considering caching the data. On average the database gets about 300 reads for every 1 write, so I think this it's a good candidate for caching.


So far these are the options I've come up with...



  1. XML database column: Every time a write is performed, update an XML text column in the objects table containing all the object's attributes. This would work for reading the data quickly, but querying XML data hidden in a database table is messy.


  2. XML file: Every time a write is performed, write an XML file to disk which contains each object and it's attributes. This has the benefit that I can then use XQuery to query the objects.


  3. NoSQL (eg. MongoDB): Perhaps I should have built the system on a schemaless database like MongoDB. Re-writing the entire app to use MongoDB would be quite time consuming, but it struck me that I could use it as a cache. So for example, every time data is written to the EAV store, the equivalent object would be updated in MongoDB which would then be used for reads and queries.



Originally I thought an XML file would be the best approach, but I can see the file getting really big and unmanageable. At the moment I'm leaning towards using MongoDB. I know it seems crazy running two database servers for one app, but I think it could work in my case.


I'd love to hear your thoughts on this.


0 commentaires:

Enregistrer un commentaire