Monday, April 14, 2014

hive - cassandra hive TimeUUIDType - Stack Overflow


I am using Brisk, where Cassandra column families are automatically mapped to Hive tables.
However, if a column family uses the timeuuid data type, those values are unreadable in the Hive table.


For example, I used the following command to create an external table in Hive that maps to the column family.


hive> create external table A (rowkey string, column_name string, value string)
    > STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
    > WITH SERDEPROPERTIES (
    > "cassandra.columns.mapping" = ":key,:column,:value");

If the column name is a TimeUUIDType in Cassandra, it becomes unreadable in the Hive table.


For example, a row in the Cassandra column family looks like this:


RowKey: 2d36a254bb04272b120aaf79d70a3578  
=> (column=29139210-b6dc-11df-8c64-f315e3a329d6, value={"event_id":101},timestamp=1283464254261)

Here the column name is a TimeUUIDType.
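
For context, a TimeUUIDType value is a version 1 (time-based) UUID, so the column name itself carries a timestamp. The following is a minimal Python sketch, not part of the original question, that pulls the timestamp out of the column name shown above. The helper name timeuuid_to_unix_ms is made up for illustration; it uses only the standard uuid module.

```python
import uuid

# Column name copied from the example row above.
column_name = uuid.UUID("29139210-b6dc-11df-8c64-f315e3a329d6")

# Number of 100-nanosecond intervals between the UUID epoch
# (1582-10-15) and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_unix_ms(u: uuid.UUID) -> int:
    """Return the timestamp embedded in a version 1 UUID, in Unix milliseconds.

    Hypothetical helper for illustration only.
    """
    if u.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    # u.time is the 60-bit timestamp in 100 ns units since the UUID epoch.
    return (u.time - UUID_EPOCH_OFFSET) // 10_000


if __name__ == "__main__":
    print(column_name.version)
    print(timeuuid_to_unix_ms(column_name))
```

For this row the embedded time works out to 1283464254261 ms, the same value as the column's timestamp field.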


In the Hive table, the same row looks like this:


 2d36a254bb04272b120aaf79d70a3578    t��ߒ4��!��   {"event_id":101}

So the column name is unreadable in the Hive table.
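
A likely explanation, stated here as an assumption rather than something confirmed in the thread, is that the 16 raw bytes of the UUID are being displayed as if they were a string. The Python sketch below illustrates that effect with the column name from the example. The exact garbage characters depend on the terminal and encoding, so it will not reproduce the Hive output byte for byte.

```python
import uuid

# A TimeUUID is 16 raw bytes; the hyphenated hex form is only a rendering.
raw = uuid.UUID("29139210-b6dc-11df-8c64-f315e3a329d6").bytes

# Treating those bytes as UTF-8 text yields control characters and
# replacement characters, similar in spirit to the unreadable Hive column.
as_text = raw.decode("utf-8", errors="replace")

# Parsing the same bytes as a UUID restores the readable form.
readable = str(uuid.UUID(bytes=raw))

if __name__ == "__main__":
    print(len(raw))
    print(repr(as_text))
    print(readable)
```

This is why declaring the column as a plain string is not enough: the bytes have to be interpreted as a UUID somewhere before they are displayed.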




This is a known issue with the automatic table mapping. For best results with a TimeUUIDType, turn off the auto-mapping feature ("cassandra.autoCreateHiveSchema") in $brisk_home/resources/hive/hive-site.xml and create the table in Hive manually.



