Thursday, May 1, 2014

Hadoop Hive - Cassandra - integration - Stack Overflow


What's the best practice for integrating Cassandra and Hive?


An old question on Stack Overflow (Cassandra with Hive) points to Brisk, which has since become a subscription-only DataStax Enterprise product.


A Google search turns up only two open JIRA issues,



but neither has resulted in any code being committed to either project.


Is patching the Cassandra/Hive source code the only way to integrate Cassandra and Hive? Which solution are you using in your stack?




I did the same research a month ago and reached the same conclusion. Brisk is no longer available as a community download, and apart from patching the Cassandra/Hive code, the only way to run map/reduce jobs against your Cassandra database is to use DSE (DataStax Enterprise), which I believe is free for any use except production clusters.


You might also have a look at HBase, which is built on HDFS.




There's an open source Cassandra Storage Handler for Hive, currently maintained by DataStax.




You can use an integration framework or integration suite for this problem. Take a look at my presentation "Big Data beyond Hadoop - How to integrate ALL your data" for more information about how to use open source integration frameworks and integration suites with Hadoop.


For example, Apache Camel (integration framework) and Talend Open Studio for Big Data (integration suite) are two open source solutions which offer connectors to both Cassandra and Hadoop.




