There are tonnes of information out there about Databricks, and it can be a bit overwhelming. Where does somebody even start? We have done you a favor and curated a list of learning materials that we found useful when we started our Databricks journey and that we now share with new employees.
July 29, 2022
We found there were tonnes of information about Databricks to take in, and it can be overwhelming! So we compiled a list of materials that we found useful when we started our Databricks journey and that we now share with new employees when they begin. We keep this list up to date, so feel free to share it with your colleagues when they begin their own Databricks journey.
What is Databricks?
Databricks is a single, cloud-based platform that can handle all of your data needs, which means it’s also a single platform on which your entire data team can collaborate. Not only does it unify and simplify your data systems, Databricks is also fast, cost-effective and inherently scales to very large data. Databricks is available on top of your existing cloud, whether that’s Amazon Web Services (AWS), Microsoft Azure, Google Cloud, or even a multi-cloud combination of those. Read our guide to 'What is Databricks and What is it used for' for further information.
What is Azure Databricks?
We like how succinct this is, and it links out if the reader wants to explore the Databricks "areas" in more detail. This is a live page that gets updated to include the latest offerings from Databricks.
What is Databricks used for?
We've written an article on just that, which you can read here.
For another perspective, we think this article from Omar Mamood, titled 'What Does Databricks Do?', is quite comprehensive. However, it does not cover the 'SQL Analytics Workspace', possibly because this feature was added after the blog post was written.
Another source of information is, of course, the Databricks website, which is kept up to date. The SQL Analytics feature is simple in concept, so we think the information on the Databricks page itself should be sufficient. It shouldn't be missed, though, since some organisations can benefit greatly from this feature.
How can I get a Databricks certification?
The Databricks Academy is the main source of all official Databricks training. Free vouchers are also available for partners and customers, which is a great incentive. There are various learning paths available, not only to provide in-depth technical training but also to allow business users to become comfortable with the platform.
What is a Data Lakehouse?
The Databricks article is authored by the vendor, while the article from James Serra, a Microsoft employee, provides a great comparison of how the architecture can be applied in other tools as well.
Data Lakehouse Defined
What is Databricks Delta Lake?
This resource covers the open source specification, which is important to include because a lot of Databricks features are open source. This link, by contrast, is specific to the Databricks implementation of Delta Lake.
What is the best data lake software?
We think this is a great side-by-side comparison of the various data lake platforms.
What is the difference between Databricks and Snowflake?
It's a complicated topic. While similar in theory, Databricks and Snowflake have some noticeable differences. Databricks can work with all data types in their original format, while Snowflake requires that structure is added to your unstructured data before you work with it. Databricks also focuses more on the data processing and application layers, meaning you can leave your data wherever it is (even on-premises) in any format, and Databricks can process it. Like Databricks, Snowflake provides ODBC and JDBC drivers to integrate with third parties. However, unlike Snowflake, Databricks can also work with your data in a variety of programming languages, which is important for data science and machine learning applications.
This resource explains the core differences quite well, with good use of diagrams. We like that it highlights the fact that Snowflake doesn't have good support for unstructured data, which is increasingly the common state of data today. Since this article is from November, there may have been some Snowflake updates in the meantime. However, the article nicely highlights the different philosophies fundamental to each of the platforms.