Your First Databricks Notebook

On August 4, 2020, I presented this material on the weekly Pragmatic Works webinar series. You can view that presentation here.

As part of that presentation, I committed to giving you access to the Databricks notebook so you can run through it as well. You can find the notebook on my GitHub. It is stored as a DBC (Databricks archive) file, so you will need Databricks to open it.

Two questions came up during the session that I wanted to address here. The first was about connecting to relational databases. The short answer is yes: you can use the JDBC driver to work with SQL Server or Snowflake, for instance. Details can be found in the data sources article on the Databricks site.
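
For example, a minimal PySpark sketch of a JDBC read might look like the following. The server, database, table name, and credentials are all placeholders, and the exact URL format depends on the database and driver you are using:

```python
# Minimal sketch of reading a SQL Server table over JDBC from a
# Databricks notebook. All connection details below are placeholders.
jdbc_url = "jdbc:sqlserver://yourserver.database.windows.net:1433;database=yourdb"

df = (spark.read
      .format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "dbo.YourTable")   # table or subquery to read
      .option("user", "your_user")
      .option("password", "your_password")
      .load())

display(df)
```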

The next question was about Databricks' ability to work with SFTP. While I cannot speak to the specifics, I did find a Spark library that may provide the support you need. To be clear, I have not implemented it myself, but I wanted to share a potential resource. I found it in a Databricks forum, and it may work for you: https://github.com/springml/spark-sftp. If one of you finds it useful, feel free to post a comment here for others to refer to.
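
Based on that library's README, usage would look something like the sketch below. Again, this is untested on my end; the host, credentials, and file path are placeholders, and you would need to attach the spark-sftp library to your cluster first:

```python
# Untested sketch based on the spark-sftp README. Attach the
# com.springml:spark-sftp library to the cluster before running.
# Host, credentials, and path below are placeholders.
df = (spark.read
      .format("com.springml.spark.sftp")
      .option("host", "sftp.example.com")
      .option("username", "your_user")
      .option("password", "your_password")
      .option("fileType", "csv")       # file type per the library's README
      .option("inferSchema", "true")
      .load("/remote/path/data.csv"))

display(df)
```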

Thanks again to everyone who was able to attend. We look forward to continuing to work with Databricks.

2 thoughts on “Your First Databricks Notebook”

  1. Helpful hint: try not to use .dbc files to share or version-control notebooks. DBC is a binary format, so it's about worthless for versioning, and as you mentioned it requires Databricks to open it. A better choice is to save/export each notebook as an .ipynb file. This is the Jupyter format, which is JSON; GitHub has a built-in viewer, so you can render your notebooks right in GH. The .ipynb files themselves, however, are not very human-readable. The best choice, imo, is to save the notebooks in “source” format (either .py or .scala). The only real benefit of .ipynb files over source is that you can save your result cells inline with your code cells, so any data or visualizations will also render in GitHub.

    1. Dave,

      Thanks for the feedback on this. However, I created the notebook using Azure Databricks, which gave me two options for export: DBC or HTML. I chose DBC in this case because the intent is to walk through the solution from the presentation in Azure Databricks, which has a free trial anyone can use. I will keep this in mind for other notebooks I may want to share, and I appreciate the recommendations.

      Thanks again for taking the time to read the blog and give suggestions.

      Steve
