Overview
This lecture provides a detailed walkthrough of working with notebooks in Azure Databricks, covering creation, import/export, permissions, and language usage.
Navigating the Databricks Workspace
- Access the Databricks workspace from the Azure portal after deployment.
- The workspace overview in the Azure portal shows details such as the pricing tier and workspace URL, along with a delete option for removing unused workspaces.
- Always delete unused resources to avoid unnecessary billing.
Working with Notebooks
- Notebooks are the main interface for coding and data analysis in Databricks.
- Create new notebooks by selecting a default language (Python, Scala, SQL, or R) and attaching the notebook to a cluster; a minimal first cell is sketched after this list.
- Notebooks are organized under users and shared folders within the workspace.
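As a quick orientation, here is what a first cell in a Python notebook might look like. This is only a sketch: it assumes the notebook is attached to a running cluster, which is what makes the built-in `spark` session available.

```python
# Minimal first cell in a Python notebook. Assumes the notebook is attached
# to a running cluster, which provides the built-in `spark` session.
df = spark.range(10).withColumnRenamed("id", "n")  # small demo DataFrame
df.show()                                          # print the rows below the cell
```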
Importing & Exporting Notebooks
- Import notebooks from files (formats: .py, .scala, .sql, .r, .dbc, .ipynb, .html) or by URL.
- Export notebooks as source files, a DBC archive, an IPython (Jupyter) notebook, or HTML.
- Use export to download notebooks for sharing or backup; a programmatic export sketch follows this list.
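Besides the UI, notebooks can also be exported programmatically through the Workspace API's export endpoint (`GET /api/2.0/workspace/export`). The sketch below is a minimal example; the workspace URL, personal access token, and notebook path are placeholders you would replace with your own.

```python
# Export a notebook via the Databricks Workspace API (minimal sketch).
# The host, token, and notebook path below are placeholders.
import base64
import requests

host = "https://<your-workspace>.azuredatabricks.net"   # placeholder workspace URL
token = "<personal-access-token>"                        # placeholder PAT

resp = requests.get(
    f"{host}/api/2.0/workspace/export",
    headers={"Authorization": f"Bearer {token}"},
    params={"path": "/Users/someone@example.com/my_notebook", "format": "SOURCE"},
)
resp.raise_for_status()

# The API returns the notebook content base64-encoded in the "content" field.
with open("my_notebook.py", "wb") as f:
    f.write(base64.b64decode(resp.json()["content"]))
```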
Managing Notebooks and Folders
- Options include: create, import, export, rename, move, clone (duplicate), and move to trash (delete).
- The Copy File Path option copies the notebook's path within the workspace, handy for referencing it from other notebooks or API calls.
- Organize notebooks into folders for structured collaboration (see the folder-management sketch below).
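For scripted housekeeping, the same Workspace API also exposes folder operations. The sketch below creates a project folder and lists a user's home directory; again, the host, token, and paths are placeholders.

```python
# Create a folder and list workspace contents via the Workspace API (sketch).
import requests

host = "https://<your-workspace>.azuredatabricks.net"   # placeholder
token = "<personal-access-token>"                        # placeholder
headers = {"Authorization": f"Bearer {token}"}

# Create a project folder under a user's home directory.
requests.post(
    f"{host}/api/2.0/workspace/mkdirs",
    headers=headers,
    json={"path": "/Users/someone@example.com/project-a"},
).raise_for_status()

# List everything under the user's home directory.
listing = requests.get(
    f"{host}/api/2.0/workspace/list",
    headers=headers,
    params={"path": "/Users/someone@example.com"},
).json()

for obj in listing.get("objects", []):
    print(obj["object_type"], obj["path"])
```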
Permissions & Collaboration
- Set permissions at the folder or notebook level to control access (read, run, edit, manage) for users or groups.
- Useful for managing projects with multiple team members; permissions can also be granted via script, as sketched below.
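Permissions are usually set from the notebook's or folder's Permissions dialog, but they can also be managed through the Permissions REST API. The sketch below looks up a notebook's numeric object ID and grants a teammate "Can Run"; the host, token, path, and email are placeholders.

```python
# Grant a user "Can Run" on a notebook via the Permissions API (sketch).
import requests

host = "https://<your-workspace>.azuredatabricks.net"   # placeholder
token = "<personal-access-token>"                        # placeholder
headers = {"Authorization": f"Bearer {token}"}

# Look up the notebook's numeric object ID from its workspace path.
status = requests.get(
    f"{host}/api/2.0/workspace/get-status",
    headers=headers,
    params={"path": "/Users/someone@example.com/my_notebook"},
).json()

# PATCH adds a CAN_RUN entry without replacing existing permissions.
requests.patch(
    f"{host}/api/2.0/permissions/notebooks/{status['object_id']}",
    headers=headers,
    json={
        "access_control_list": [
            {"user_name": "teammate@example.com", "permission_level": "CAN_RUN"}
        ]
    },
).raise_for_status()
```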
Using Multiple Languages & Magic Commands
- The default notebook language is chosen at creation but can be changed later for the whole notebook or overridden in individual cells.
- Use magic commands (e.g., %sql, %scala, %python, %r) to execute code in a different language in a cell.
- It is recommended to keep a notebook primarily in one language for readability; a short mixed-language example follows this list.
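As an illustration, the two cells sketched below mix Python and SQL in one notebook: the first uses the built-in `spark` session to register a temporary view, and the second queries it with the %sql magic command. The view name and sample data are just examples.

```python
# Cell 1 (notebook default language: Python)
# Register a small DataFrame as a temporary view so SQL cells can query it.
events = spark.createDataFrame(
    [("click", 3), ("view", 7)], ["event_type", "hits"]
)
events.createOrReplaceTempView("events")
```

```sql
%sql
-- Cell 2: the %sql magic command makes this cell run as SQL.
SELECT event_type, hits FROM events ORDER BY hits DESC
```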
Notebook Cells & Execution
- Code is written in cells; each cell holds a segment of code that can be run on its own.
- Add new cells using the plus button.
- Run cells individually (Shift+Enter) or run all cells using the 'Run All' option.
- Output is displayed below the cell; by default, only the first 1,000 rows of a result are shown (see the sketch after this list).
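The 1,000-row limit applies to the rendered results table, not to the data itself. The sketch below contrasts `display()`, a Databricks notebook built-in that renders a truncated interactive table, with an explicit count when the full size matters; the DataFrame is just a placeholder example.

```python
# Rendered output is truncated, so confirm sizes explicitly when they matter.
big = spark.range(1_000_000).toDF("n")

display(big)         # interactive table in the cell output, first 1,000 rows shown
print(big.count())   # full row count, computed on the cluster -> 1000000
```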
Other Features
- Use dark or light themes for the notebook interface.
- Clear output and state to remove results and reset the notebook.
- Add comments to code using a hash (#) in Python.
Key Terms & Definitions
- Notebook — Interactive document in Databricks for code, output, and visualizations.
- Magic Command — Special notation (e.g., %sql) that specifies the language of a cell.
- Cell — Section in a notebook to write and run code.
- Cluster — Group of compute resources for running notebook commands.
- Permissions — Access controls for users/groups to view/edit/run notebooks.
Action Items / Next Steps
- Practice creating, importing, exporting, and organizing notebooks in your Databricks workspace.
- Ensure unused resources are deleted to avoid extra costs.
- Experiment with magic commands and multi-language support in notebooks.