Data Engineering Guardrails and Best Practices - General#
Updated by Gareth Stretch / 2025.03.13
Version Control#
| Version | Date | Owner | Change Description |
|---|---|---|---|
| 0.1 | 9 April 2025 | Gareth Stretch | Initial Framework created |
Naming Standards#
This section contains the guardrails and best practices for naming standards.
Recommended areas to define naming standards.#
Document and publish naming standards, and train users on them, for at least the following areas:
- Databricks Clusters
- Landing zone implementation
- Databases and schemas
- Tables
- Columns
- Indexes and Constraints
- Stored Procedures and Functions
- Triggers
- Databricks jobs
- Security and roles
- Service principals
- Resource groups
Guardrails#
- Consistency
  - Uniform case style: Use a consistent case style, such as lowercase with underscores (e.g., customer_data, order_details).
  - Standard prefixes: Apply standard prefixes for different types of objects (e.g., tbl_ for tables, vw_ for views).
- Clarity
  - Descriptive names: Ensure names are descriptive and clearly indicate the purpose or content of the object (e.g., sales_transactions, customer_profiles).
  - Avoid abbreviations: Minimize the use of abbreviations unless they are widely understood within your organization.
- Avoid reserved words
  - SQL reserved words: Avoid using SQL reserved words in names to prevent conflicts and errors (e.g., select, table, join).
- Scalability
  - Future-proofing: Design naming conventions that can accommodate future growth and changes in the data model.
  - Versioning: Include version numbers in names if applicable (e.g., customer_data_v1, customer_data_v2).
- Documentation
  - Comprehensive guidelines: Maintain a detailed document outlining all naming conventions and examples.
  - Training: Ensure all team members are trained on the naming standards and understand their importance.
- Integration
  - Interoperability: Ensure naming conventions are compatible with other systems and tools used in your data ecosystem.
  - Consistency across platforms: Apply the same naming conventions across different platforms and environments.
- Review and enforcement
  - Regular reviews: Implement a process for regularly reviewing and updating naming conventions.
  - Automated checks: Use automated tools to enforce naming standards and catch deviations.
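An automated check can be quite small. The sketch below is illustrative only: the function name `check_name`, the short reserved-word set, and the rule that only views require a prefix are assumptions made for the example, not part of any Databricks API. A real check would load the full reserved-word list for the SQL dialect in use and the prefix rules your team has agreed on.

```python
import re

# Illustrative subset only; a real check would use the full reserved-word
# list for the SQL dialect in use.
RESERVED_WORDS = {"select", "table", "join", "from", "where", "order", "group"}

# Lowercase letters and digits, separated by single underscores,
# starting with a letter (e.g. customer_data, customer_data_v1).
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")

# Hypothetical prefix rules; adjust to whatever your standard defines.
REQUIRED_PREFIX = {"view": "vw_"}


def check_name(name: str, object_type: str = "table") -> list[str]:
    """Return a list of violations; an empty list means the name passes."""
    problems = []
    if not SNAKE_CASE.fullmatch(name):
        problems.append("not lowercase snake_case")
    if name.lower() in RESERVED_WORDS:
        problems.append("SQL reserved word")
    prefix = REQUIRED_PREFIX.get(object_type)
    if prefix and not name.startswith(prefix):
        problems.append(f"missing '{prefix}' prefix")
    return problems


if __name__ == "__main__":
    for candidate, kind in [
        ("customer_data", "table"),
        ("CustomerData", "table"),
        ("select", "table"),
        ("customer_orders_summary", "view"),
    ]:
        print(candidate, kind, check_name(candidate, kind))
```

A function like this could run in a CI step or a pre-deployment job against the list of object names a change introduces, so deviations are caught before they reach a shared environment.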
Pitfalls to avoid#
When establishing naming conventions for a Databricks data platform, avoid the following common pitfalls to ensure clarity, consistency, and maintainability:
- Inconsistent case styles
  - Avoid mixed case styles: Mixing case styles (e.g., camelCase, PascalCase, snake_case) can lead to confusion and errors. Stick to a single case style, such as lowercase with underscores (e.g., customer_data, order_details).
- Ambiguous names
  - Avoid vague names: Names like data1, table2, or temp do not provide any context about the content or purpose of the object. Use descriptive names that clearly indicate the object's role (e.g., sales_transactions, customer_profiles).
- Abbreviations and acronyms
  - Avoid unclear abbreviations: Using abbreviations or acronyms that are not widely understood can lead to confusion. If abbreviations are necessary, ensure they are documented and standardized.
- Reserved words
  - Avoid SQL reserved words: Using SQL reserved words (e.g., select, table, join) as names can cause conflicts and errors in queries.
- Lack of standard prefixes
  - Avoid missing prefixes: Not using standard prefixes for different types of objects can make it difficult to distinguish between tables, views, indexes, etc. Use prefixes like tbl_ for tables, vw_ for views, idx_ for indexes.
- Overly long names
  - Avoid excessively long names: Names that are too long can be cumbersome to work with and may exceed system limits. Aim for concise yet descriptive names.
- Inconsistent naming across environments
  - Avoid environment-specific naming: Ensure naming conventions are consistent across development, testing, and production environments to avoid confusion and errors during deployment.
- Lack of documentation
  - Avoid undocumented conventions: Failing to document naming conventions can lead to inconsistencies and misunderstandings. Maintain a comprehensive guide that outlines all naming standards and provides examples.
- Ignoring business terminology
  - Avoid source-specific terminology: Use business terminology rather than source-specific terminology to ensure names are meaningful to all stakeholders (e.g., customer_id instead of user_id).
General Naming Guidelines to Follow:#
- Use lowercase with underscores or kebab-case consistently.
- Be descriptive but concise.
- Include context: environment, domain, layer, purpose.
- Avoid abbreviations unless they are widely understood.
- Never use temporary or test names in production environments.
✅ Databricks Naming Standards Table#
| Area | Object Type | Naming Convention Example | Pattern | Notes |
|---|---|---|---|---|
| Workspace | Folder | `team-data-eng` | `team-[function]` | Use team or domain names to organize notebooks and jobs. |
| Workspace | Notebook | `ingest_customer_data` | `[verb]_[object]_[details]` | Use snake_case; keep names action-oriented. |
| Jobs | Job Name | `daily_ingest_customer_data` | `[frequency]_[verb]_[object]` | Include frequency if scheduled. |
| Jobs | Task Name | `transform_orders_silver` | `[verb]_[object]_[layer]` | Reflects the transformation and target layer. |
| Clusters | Cluster Name | `dev-data-eng-shared` | `[env]-[team]-[purpose]` | Helps identify usage and environment. |
| Repos | Repo Name | `data-platform-utils` | `[project]-[purpose]` | Use kebab-case for Git repo names. |
| Data Storage | External Location | `s3://company-data/bronze/sales/` | `[cloud-provider]://[org]/[layer]/[domain]/` | Use consistent folder hierarchy. |
| Databases / Schemas | Schema Name | `bronze_sales` | `[layer]_[domain]` | Reflects medallion layer and domain. |
| Databases / Schemas | Table Name | `customer_orders` | `[subject]_[object]` | Use snake_case; avoid abbreviations. |
| Databases / Schemas | View Name | `vw_customer_orders_summary` | `vw_[table]_[purpose]` | Prefix with vw_ to distinguish views. |
| Delta Live Tables (DLT) | Pipeline Name | `dlt_sales_pipeline` | `dlt_[domain]_pipeline` | Use dlt_ prefix for clarity. |
| Delta Live Tables (DLT) | Table Name | `silver_customer_orders` | `[layer]_[subject]_[object]` | Include layer in name. |
| Unity Catalog | Catalog Name | `prod_catalog` | `[env]_catalog` | Separate by environment (dev, test, prod). |
| Unity Catalog | Managed Table | `gold_finance_kpi_summary` | `[layer]_[domain]_[purpose]` | Fully qualified: catalog.schema.table. |
| Functions / UDFs | Function Name | `fn_calculate_discount` | `fn_[verb]_[object]` | Prefix with fn_ for clarity. |
| Secrets | Scope Name | `azure-keyvault-prod` | `[provider]-[purpose]-[env]` | Use consistent naming across environments. |
| Secrets | Secret Key | `db_password` | `[service]_[credential_type]` | Avoid sensitive info in names. |
| Monitoring | Alert Name | `alert_failed_ingest_jobs` | `alert_[condition]_[target]` | Clear and actionable. |
| Tags / Metadata | Tag Key | `owner`, `env`, `cost_center` | lowercase, underscore-separated | Use for governance and cost tracking. |
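One way to keep names aligned with these patterns is to generate them from their parts rather than typing them by hand. The helpers below are a minimal sketch: every function name is hypothetical, and the allowed environments and layers are taken from the values mentioned in this section (dev, test, prod and bronze, silver, gold). Adjust both to your own standard.

```python
ENVIRONMENTS = {"dev", "test", "prod"}
LAYERS = {"bronze", "silver", "gold"}


def _require(value: str, allowed: set[str], label: str) -> str:
    """Return value unchanged, or raise if it is not an allowed option."""
    if value not in allowed:
        raise ValueError(f"unknown {label}: {value!r}")
    return value


def cluster_name(env: str, team: str, purpose: str) -> str:
    """Pattern: [env]-[team]-[purpose]"""
    return f"{_require(env, ENVIRONMENTS, 'environment')}-{team}-{purpose}"


def job_name(frequency: str, verb: str, obj: str) -> str:
    """Pattern: [frequency]_[verb]_[object]"""
    return f"{frequency}_{verb}_{obj}"


def schema_name(layer: str, domain: str) -> str:
    """Pattern: [layer]_[domain]"""
    return f"{_require(layer, LAYERS, 'layer')}_{domain}"


def catalog_name(env: str) -> str:
    """Pattern: [env]_catalog"""
    return f"{_require(env, ENVIRONMENTS, 'environment')}_catalog"


def managed_table_name(layer: str, domain: str, purpose: str) -> str:
    """Pattern: [layer]_[domain]_[purpose]"""
    return f"{_require(layer, LAYERS, 'layer')}_{domain}_{purpose}"


def qualified_table(catalog: str, schema: str, table: str) -> str:
    """Fully qualified three-part name: catalog.schema.table"""
    return f"{catalog}.{schema}.{table}"


if __name__ == "__main__":
    print(cluster_name("dev", "data-eng", "shared"))
    print(job_name("daily", "ingest", "customer_data"))
    print(
        qualified_table(
            catalog_name("prod"),
            schema_name("gold", "finance"),
            managed_table_name("gold", "finance", "kpi_summary"),
        )
    )
```

Rejecting unknown environments and layers at construction time means a typo such as `prd` fails immediately instead of producing a plausible-looking but non-standard name.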
🚫 Databricks Naming Anti-Patterns Table#
The table below lists naming anti-patterns: names to avoid when working across a Databricks data platform. These examples highlight poor practices that can lead to confusion, maintenance issues, or governance problems.
| Area | Bad Naming Example | Why It's Bad |
|---|---|---|
| Workspace Folder | `misc`, `temp`, `new_folder` | Vague, non-descriptive, and hard to manage at scale. |
| Notebook | `Untitled`, `test1`, `final_v2` | Lacks context, versioning is unclear, and not reusable. |
| Job Name | `job1`, `pipeline_test`, `run_this` | Doesn't describe purpose, frequency, or data domain. |
| Cluster Name | `cluster123`, `mycluster`, `test-cluster` | Not environment-specific or team-specific; hard to track usage. |
| Repo Name | `scripts`, `code`, `myrepo` | Too generic; doesn't reflect project or function. |
| Schema Name | `db1`, `test_schema`, `junk` | Doesn't indicate layer (bronze/silver/gold) or domain. |
| Table Name | `tbl1`, `data`, `temp_table` | Non-descriptive, unclear purpose, and may conflict with reserved words. |
| View Name | `view1`, `vw`, `temp_view` | Doesn't indicate what data is being viewed or its purpose. |
| DLT Pipeline | `pipeline1`, `dlt_test` | Doesn't reflect domain, layer, or business logic. |
| Function Name | `func1`, `doStuff`, `calc` | Ambiguous, not reusable, and lacks clarity. |
| Secrets | `password`, `key`, `secret1` | Too generic; hard to manage securely across scopes. |
| Alert Name | `alert1`, `fail`, `error_job` | Not actionable or specific; hard to triage. |
| Tags / Metadata | `tag1`, `misc`, `x` | Not standardized; lacks governance and traceability. |
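Several of these anti-patterns are mechanical enough to flag automatically. The following sketch is a heuristic, not a complete detector: the function name `find_anti_patterns` and the placeholder word list are assumptions drawn from the examples above, and it will not catch every bad name (for instance `mycluster` or `calc` pass through). Versioned names such as customer_data_v1 are deliberately not flagged.

```python
import re

# Placeholder words drawn from the anti-pattern examples above;
# illustrative, not exhaustive.
PLACEHOLDER_WORDS = {"misc", "temp", "test", "junk", "untitled", "new", "final"}

# A single generic stem followed only by digits, e.g. job1, tbl1, cluster123.
NUMBERED_STEM = re.compile(r"[a-z]+\d+")


def find_anti_patterns(name: str) -> list[str]:
    """Return reasons the name looks like an anti-pattern (empty if none found)."""
    lowered = name.lower()
    # Split on underscores, hyphens, and whitespace to inspect each word.
    tokens = [t for t in re.split(r"[_\-\s]+", lowered) if t]
    reasons = []
    if NUMBERED_STEM.fullmatch(lowered):
        reasons.append("generic stem followed by a number")
    for word in sorted(set(tokens) & PLACEHOLDER_WORDS):
        reasons.append(f"placeholder word: {word}")
    return reasons


if __name__ == "__main__":
    for candidate in ["sales_transactions", "job1", "temp_table", "test-cluster"]:
        print(candidate, find_anti_patterns(candidate))
```

Because this is a heuristic, its findings are better treated as review prompts than as hard failures, at least until the word list has been tuned against real object names.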