Tuesday, April 22, 2014

Hadoop - Can PL/SQL reliably be converted to Pig Latin, or to an Oozie pipeline with Pig Latin and Hive? - Stack Overflow


I am curious about replacing my Oracle db with Hadoop and am learning about the Hadoop ecosystem.


I have many PL/SQL scripts that would require replacement if I were to go this route.


I am under the impression that, with some hard work, I would be able to convert/translate any PL/SQL script into an analogous Pig Latin script. If Pig Latin alone is not enough, then a combination of Hive and Pig orchestrated by Oozie.


Is this correct?




While most SQL statements can be translated into equivalent Pig and/or Hive statements, several limitations inherent to the Hadoop filesystem (HDFS) carry over to those languages. The primary one is that HDFS is a write-once, read-many system. This means that a script relying on an UPDATE or DELETE SQL command will not work as-is: both commands would require the language to change the contents of an already existing file, which contradicts Hadoop's write-once paradigm.
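Conceptually, an UPDATE or DELETE on write-once storage has to become "read everything, write a new dataset". The Python sketch below illustrates that idea only; it is not Pig or Hive code. The `employees` rows and the `dept`/`salary` columns are hypothetical. In Pig, the update roughly corresponds to a `FOREACH ... GENERATE` with a conditional expression, and the delete to a `FILTER` on the negated condition, each followed by a `STORE` into a new location.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def simulate_update(rows: Iterable[Row],
                    where: Callable[[Row], bool],
                    changes: Dict[str, object]) -> List[Row]:
    """Like UPDATE ... SET <changes> WHERE <where>, but returns a NEW list.

    Matching rows get `changes` merged in; all other rows are copied unchanged.
    The input rows are never modified (write-once semantics).
    """
    return [{**row, **changes} if where(row) else dict(row) for row in rows]


def simulate_delete(rows: Iterable[Row],
                    where: Callable[[Row], bool]) -> List[Row]:
    """Like DELETE ... WHERE <where>: keep only rows that do NOT match, in a new list."""
    return [dict(row) for row in rows if not where(row)]


if __name__ == "__main__":
    # Hypothetical table contents, for illustration only.
    employees = [
        {"id": 1, "dept": "sales", "salary": 100},
        {"id": 2, "dept": "ops", "salary": 90},
    ]
    raised = simulate_update(employees, lambda r: r["dept"] == "sales", {"salary": 110})
    trimmed = simulate_delete(employees, lambda r: r["dept"] == "ops")
    print(raised)
    print(trimmed)
    print(employees)  # the original rows are left untouched
```

The important property is that the original data is never touched; the "updated" table is a separate output, which is exactly what the workaround below has to swap into place.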


There are, however, workarounds. Both commands can be simulated by copying the file in question and applying the changes while writing the copy, then deleting the original and moving the copy into the original's location. Neither Pig nor Hive has this functionality built in, so you would have to step slightly outside these languages. For instance, a few lines of bash could handle deleting the original and moving the copy once the Pig script has finished. Since you can already use bash to launch the Pig script in the first place, this makes for a fairly simple solution. Alternatively, you could look into HBase, which supports record-level updates and deletes. However, both solutions involve tools outside of Pig/Hive, so if you absolutely cannot go outside those languages, the answer is no.
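The copy, delete, move sequence described above can be sketched in Python as follows. This is a local-filesystem stand-in, not HDFS code: the `.tmp` suffix, the CSV layout (`id,dept,salary`) and the `raise_sales_drop_ops` transform are hypothetical. In a real pipeline the transform would be the Pig script writing to a new output path, and the last two steps would be shell commands such as `hdfs dfs -rm -r` and `hdfs dfs -mv`.

```python
import os
import tempfile
from typing import Callable, Optional


def rewrite_by_copy(path: str, transform: Callable[[str], Optional[str]]) -> None:
    """Simulate an in-place UPDATE/DELETE on a file we may only write once.

    1. Stream the original and write transformed lines to a copy.
       `transform` returns the new line, or None to drop it (a DELETE).
    2. Delete the original.
    3. Move the copy into the original's location.
    """
    copy_path = path + ".tmp"  # hypothetical naming scheme for the copy
    with open(path, "r", encoding="utf-8") as src, \
            open(copy_path, "w", encoding="utf-8") as dst:
        for line in src:
            new_line = transform(line.rstrip("\n"))
            if new_line is not None:
                dst.write(new_line + "\n")
    os.remove(path)             # step 2: analogous to `hdfs dfs -rm` on the original
    os.rename(copy_path, path)  # step 3: analogous to `hdfs dfs -mv` of the copy


def raise_sales_drop_ops(line: str) -> Optional[str]:
    """Hypothetical transform: UPDATE sales salaries by +10, DELETE ops rows."""
    emp_id, dept, salary = line.split(",")
    if dept == "ops":
        return None                     # DELETE ... WHERE dept = 'ops'
    if dept == "sales":
        salary = str(int(salary) + 10)  # UPDATE ... SET salary = salary + 10
    return ",".join([emp_id, dept, salary])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "employees.csv")
        with open(path, "w", encoding="utf-8") as f:
            f.write("1,sales,100\n2,ops,90\n3,sales,80\n")
        rewrite_by_copy(path, raise_sales_drop_ops)
        with open(path, encoding="utf-8") as f:
            print(f.read(), end="")  # prints "1,sales,110" and "3,sales,90"
```

On a local filesystem `os.replace` could do steps 2 and 3 in one call; the explicit delete-then-move is kept here only to mirror the sequence described in the answer.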


