Thoughts on Databricks Cloud

Some random thoughts after using Databricks Cloud for data exploration. It is a great platform for ad-hoc analysis, with more robust features than Apache Zeppelin.


What I Like

  • Support for PySpark and PyPI packages.
  • Built-in visualizations, plus support for matplotlib, ggplot2, and d3.js.
  • Scheduled jobs.
  • Job progress tracking and access to the Spark web UI.
  • AWS integration and bleeding-edge Spark releases.

What I Dislike

  • Poor error/debug output.
  • No line numbering.
  • Limited built-in visualization configuration options.
  • No easy way to download data without resorting to S3.
  • Intermittent connectivity issues with long-running Spark contexts.

Examples

Screenshots: Databricks with Python; Databricks visualization; Databricks jobs; Job progress; Databricks error output; Databricks visualization options; Overflow.
