Thoughts on Databricks Cloud
Some thoughts after using Databricks Cloud for data exploration. It is a great platform for ad-hoc analysis, with more robust features than Apache Zeppelin.
What I Like
- Support for PySpark and PyPI.
- Built-in visualizations and support for matplotlib, ggplot2, and d3.js.
- Scheduled jobs.
- Job progress and Spark web UI.
- AWS integration and bleeding-edge Spark.
What I Dislike
- Poor error/debug output.
- No line numbering.
- Limited built-in visualization configuration options.
- No easy way to download data without resorting to S3.
- Intermittent connectivity issues with long-running Spark contexts.
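
On the data download point, the S3 workaround can be sketched roughly as below. This is only an illustration, not a Databricks-specific API: the helper names `s3_export_path` and `export_csv` and the bucket and prefix values are made up, and it assumes Spark 2.0+ (for `DataFrameWriter.csv`) and a cluster that already has write access to the bucket.

```python
def s3_export_path(bucket, prefix, name):
    """Build an s3a:// URI. bucket, prefix, and name are placeholders."""
    return "s3a://{}/{}/{}".format(bucket, prefix.strip("/"), name)


def export_csv(df, bucket, prefix, name):
    """Write a Spark DataFrame to S3 as CSV and return the target path.

    Assumes Spark 2.0+ and that the cluster can already write to the bucket.
    """
    path = s3_export_path(bucket, prefix, name)
    # coalesce(1) produces a single part file, which is easier to download
    # but only sensible for small result sets.
    df.coalesce(1).write.mode("overwrite").option("header", "true").csv(path)
    return path
```

Calling something like `export_csv(df, "my-bucket", "exports", "top_users")` from a notebook cell would write a directory of part files under that path, which can then be pulled down with `aws s3 cp --recursive`.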
Examples