SnowPro Core Certification Series - Video 7
Topics Covered
- File Formats
- Sequences
- Streams
- Tasks
File Formats
- Definition: Named objects used for loading and unloading data into Snowflake.
- Supported Formats:
- CSV
- JSON
- Parquet (Par)
- Avro (AO)
- ORC
- XML
- Loading and Unloading:
- CSV, JSON, and Parquet support both loading and unloading.
- Avro, ORC, and XML support loading only.
- Creating File Formats:
- Can be created via Snowflake’s web UI (Snow Site) or SQL commands.
- Options include compression, skip header lines, trim spaces, etc.
- Schema defined in the file format must match the Snowflake table schema.
- Partitioning Loading:
- Supported for ORC, Parquet, and JSON only.
Sequences
- Purpose: Generate unique and continuous values, often used for primary keys.
- Behavior:
- Used across sessions and concurrent statements.
- Snowflake does not guarantee gapless sequence numbers.
- SQL Commands:
CREATE OR REPLACE SEQUENCE sequence_name START WITH 1 INCREMENT BY 1
- Example: Inserting values using sequence increments by defined steps, not always continuous.
- Example Execution:
- Sequence starting at 1 increments by 1 or a specified value.
Tasks
- Definition: Mechanism to schedule SQL statements or procedures at specific intervals.
- Types:
- Serverless Task: Compute managed by Snowflake.
- User-managed Task: Requires specifying a virtual warehouse.
- Scheduling:
- Can use CRON expressions for scheduling.
- Creation and Management:
- Create task using
CREATE TASK statement.
- Tasks are created in a suspended state and require a
RESUME command to activate.
- Task Graphs (DAG):
- Tasks can be dependent on each other, forming a Directed Acyclic Graph.
- Tasks have a root and dependent tasks, and flow is unidirectional.
- Maximum of 10,000 tasks in a graph, each task can have up to 100 predecessors and successors.
Streams
- Purpose: Change Data Capture (CDC) mechanism to track changes like inserts, updates, and deletes on tables.
- Functionality:
- Tracks DML changes using metadata fields:
METADATA$ACTION, METADATA$IS_UPDATE, METADATA$ROW_ID.
- Types of Streams:
- Standard Stream: Tracks all DML changes.
- Append-only Stream: Tracks only insert operations.
- Insert-only Stream: Tracks inserts and omits deletes.
- Integration with Tasks:
- Combine with tasks to automate processes based on data changes detected by streams.
Important Links
- Snowflake Documentation for detailed understanding of:
- File formats
- Sequences
- Tasks (Serverless and User-managed)
- Streams
Conclusion
This video provides an overview of critical functionalities in Snowflake, which are essential for efficient data management and automation tasks within the platform. Understanding these components is vital for the SnowPro Core Certification.