<b>The difference is a more efficient pipeline</b>

Terrence Miao - 2015-10-11 15:09:59+1100 - Updated: 2015-10-11 15:12:25+1100


• MySQL uses so-called “schema on write”: the data has to be converted and loaded into MySQL before it can be queried. If our data is not inside MySQL, we can’t use SQL to query it.

• Spark (and Hadoop/Hive as well) uses “schema on read”: it can apply a table structure on top of, for example, a compressed text file (or any other supported input format) and treat it as a table; we can then use SQL to query this “table.”
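
The two approaches above can be sketched in plain Python. This is only an illustration of the idea, not Spark or MySQL code: the standard-library sqlite3 module stands in for MySQL, and the sample data, the SCHEMA list, and all function names are made up for the example.

```python
import csv
import gzip
import io
import sqlite3

# Hypothetical sample data: a gzip-compressed, comma-separated text "file" kept in memory.
RAW_LINES = "2015-10-01,home,120\n2015-10-01,about,15\n2015-10-02,home,98\n"
COMPRESSED = gzip.compress(RAW_LINES.encode("utf-8"))

# The schema lives apart from the data: (column name, converter) pairs.
SCHEMA = [("day", str), ("page", str), ("hits", int)]


def read_with_schema(blob, schema):
    """Schema on read: decompress the text and type each field as it is read."""
    with gzip.open(io.BytesIO(blob), mode="rt", encoding="utf-8", newline="") as fh:
        for fields in csv.reader(fh):
            yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


def total_hits_on_read(blob, page):
    """Query the raw file directly; nothing is loaded into a database first."""
    return sum(row["hits"] for row in read_with_schema(blob, SCHEMA) if row["page"] == page)


def total_hits_on_write(blob, page):
    """Schema on write: create the table, convert and load the data, then run SQL."""
    conn = sqlite3.connect(":memory:")  # assumption: sqlite3 stands in for MySQL
    try:
        conn.execute("CREATE TABLE pageviews (day TEXT, page TEXT, hits INTEGER)")
        rows = [(r["day"], r["page"], r["hits"]) for r in read_with_schema(blob, SCHEMA)]
        conn.executemany("INSERT INTO pageviews VALUES (?, ?, ?)", rows)
        cur = conn.execute("SELECT SUM(hits) FROM pageviews WHERE page = ?", (page,))
        return cur.fetchone()[0]
    finally:
        conn.close()
```

Both functions return the same total for a given page. The difference is the extra create-and-load step that the schema-on-write path needs before any query can run, which is the pipeline cost the comparison above refers to.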

Using Apache Spark and MySQL for Data Analysis