
Data Engineering Guardrails and Best Practices - General#

Updated by Gareth Stretch / 2025.03.13


Version Control#

| Version | Date | Owner | Change Description |
| --- | --- | --- | --- |
| 0.1 | 9 April 2025 | Gareth Stretch | Initial Framework created |

Naming Standards#

This section contains the guardrails and best practices for naming standards.

It is recommended to document, publish, and train users on naming standards for at least the following areas:

  • Databricks Clusters
  • Landing zone implementation
  • Databases and schemas
  • Tables
  • Columns
  • Indexes and Constraints
  • Stored Procedures and Functions
  • Triggers
  • Databricks jobs
  • Security and roles
  • Service principals
  • Resource groups

Guard Rails#

  1. Consistency
       • Uniform case style: Use a consistent case style, such as lowercase with underscores (e.g., customer_data, order_details).
       • Standard prefixes: Apply standard prefixes for different types of objects (e.g., tbl_ for tables, vw_ for views).

  2. Clarity
       • Descriptive names: Ensure names are descriptive and clearly indicate the purpose or content of the object (e.g., sales_transactions, customer_profiles).
       • Avoid abbreviations: Minimize the use of abbreviations unless they are widely understood within your organization.

  3. Avoid Reserved Words
       • SQL reserved words: Avoid using SQL reserved words in names to prevent conflicts and errors (e.g., select, table, join).

  4. Scalability
       • Future-proofing: Design naming conventions that can accommodate future growth and changes in the data model.
       • Versioning: Include version numbers in names if applicable (e.g., customer_data_v1, customer_data_v2).

  5. Documentation
       • Comprehensive guidelines: Maintain a detailed document outlining all naming conventions and examples.
       • Training: Ensure all team members are trained on the naming standards and understand their importance.

  6. Integration
       • Interoperability: Ensure naming conventions are compatible with other systems and tools used in your data ecosystem.
       • Consistency across platforms: Apply the same naming conventions across different platforms and environments.

  7. Review and Enforcement
       • Regular reviews: Implement a process for regularly reviewing and updating naming conventions.
       • Automated checks: Use automated tools to enforce naming standards and catch deviations.
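As one illustration of an automated check, the sketch below validates a single name against three of the guard rails above: lowercase snake_case, no SQL reserved words, and a standard prefix per object type. The function name `check_name` and the constants are hypothetical, and the reserved-word set holds only the three examples from this page; a real check would load the full reserved-word list for the SQL dialect in use.

```python
import re

# Lowercase words separated by single underscores, starting with a letter.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")

# Assumption: only the examples named in this guideline, not a complete list.
RESERVED_WORDS = {"select", "table", "join"}

# Prefixes taken from the examples in this guideline.
OBJECT_PREFIXES = {"table": "tbl_", "view": "vw_"}


def check_name(name, object_type=None):
    """Return a list of guard-rail violations for one object name."""
    problems = []
    if not SNAKE_CASE.match(name):
        problems.append("not lowercase snake_case")
    if name.lower() in RESERVED_WORDS:
        problems.append("SQL reserved word")
    prefix = OBJECT_PREFIXES.get(object_type)
    if prefix and not name.startswith(prefix):
        problems.append(f"missing '{prefix}' prefix")
    return problems


if __name__ == "__main__":
    for candidate in ("customer_data", "CustomerData", "select"):
        print(candidate, check_name(candidate))
```

A check like this could run in a pull-request pipeline or a scheduled job that lists catalog objects, failing when any name returns a non-empty list.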

Pitfalls to avoid#

When establishing naming conventions for a Databricks data platform, there are several common pitfalls to avoid to ensure clarity, consistency, and maintainability.

  1. Inconsistent Case Styles
       • Avoid mixed case styles: Using mixed case styles (e.g., camelCase, PascalCase, snake_case) can lead to confusion and errors. Stick to a single case style, such as lowercase with underscores (e.g., customer_data, order_details).
  2. Ambiguous Names
       • Avoid vague names: Names like data1, table2, or temp do not provide any context about the content or purpose of the object. Use descriptive names that clearly indicate the object's role (e.g., sales_transactions, customer_profiles).
  3. Abbreviations and Acronyms
       • Avoid unclear abbreviations: Using abbreviations or acronyms that are not widely understood can lead to confusion. If abbreviations are necessary, ensure they are documented and standardized.
  4. Reserved Words
       • Avoid SQL reserved words: Using SQL reserved words (e.g., select, table, join) as names can cause conflicts and errors in queries.
  5. Lack of Standard Prefixes
       • Avoid missing prefixes: Not using standard prefixes for different types of objects can make it difficult to distinguish between tables, views, indexes, etc. Use prefixes like tbl_ for tables, vw_ for views, idx_ for indexes.
  6. Overly Long Names
       • Avoid excessively long names: Names that are too long can be cumbersome to work with and may exceed system limits. Aim for concise yet descriptive names.
  7. Inconsistent Naming Across Environments
       • Avoid environment-specific naming: Ensure naming conventions are consistent across development, testing, and production environments to avoid confusion and errors during deployment.
  8. Lack of Documentation
       • Avoid undocumented conventions: Failing to document naming conventions can lead to inconsistencies and misunderstandings. Maintain a comprehensive guide that outlines all naming standards and provides examples.
  9. Ignoring Business Terminology
       • Avoid source-specific terminology: Use business terminology rather than source-specific terminology to ensure names are meaningful to all stakeholders (e.g., customer_id instead of user_id).
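Two of these pitfalls, mixed case styles and overly long names, lend themselves to a simple audit over a list of existing names. The following is a minimal sketch: the helper names are hypothetical, the regular expressions are one possible definition of each style, and the 64-character limit is an assumed placeholder rather than a documented platform limit.

```python
import re
from collections import Counter

# Checked in order; a single lowercase word counts as snake_case.
STYLES = [
    ("snake_case", re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")),
    ("kebab-case", re.compile(r"^[a-z][a-z0-9]*(?:-[a-z0-9]+)+$")),
    ("camelCase", re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+$")),
    ("PascalCase", re.compile(r"^(?:[A-Z][a-z0-9]*)+$")),
]

# Assumption: illustrative threshold only; check the limits of your platform.
MAX_LENGTH = 64


def case_style(name):
    """Classify a name by the first style pattern it matches."""
    for style, pattern in STYLES:
        if pattern.match(name):
            return style
    return "other"


def style_report(names):
    """Count how many names fall into each case style."""
    return Counter(case_style(name) for name in names)


def has_mixed_styles(names):
    """True when the names use more than one case style."""
    return len(style_report(names)) > 1


def overly_long(names, max_length=MAX_LENGTH):
    """Return the names that exceed the length threshold."""
    return [name for name in names if len(name) > max_length]


if __name__ == "__main__":
    tables = ["customer_data", "orderDetails", "SalesTransactions"]
    print(style_report(tables))
    print(has_mixed_styles(tables))
```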

General Naming Guidelines to Follow:#

  • Use lowercase with underscores or kebab-case consistently.
  • Be descriptive but concise.
  • Include context: environment, domain, layer, purpose.
  • Avoid abbreviations unless they are widely understood.
  • Never use temporary or test names in production environments.
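Names that follow these guidelines can also be generated rather than typed by hand. The sketch below assembles a name from context parts (such as environment, domain, layer, and purpose) and normalizes each part to lowercase. Both helper functions are hypothetical and show only one way to do the normalization.

```python
import re


def to_snake_case(text):
    """Lowercase free text or camelCase/PascalCase into snake_case."""
    # Insert a space at lower-to-upper boundaries, e.g. "CustomerOrders".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    # Keep alphanumeric runs only; punctuation and spaces become separators.
    words = re.findall(r"[A-Za-z0-9]+", spaced)
    return "_".join(word.lower() for word in words)


def compose_name(*parts, separator="_"):
    """Join context parts into one name using a single separator style."""
    cleaned = [to_snake_case(part).replace("_", separator) for part in parts]
    return separator.join(piece for piece in cleaned if piece)


if __name__ == "__main__":
    print(compose_name("silver", "Customer Orders"))
    print(compose_name("dev", "Data Eng", "shared", separator="-"))
```

Passing the separator explicitly keeps each object type on one style, for example underscores for tables and hyphens for cluster names.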

✅ Databricks Naming Standards Table#

| Area | Object Type | Example Name | Naming Pattern | Notes |
| --- | --- | --- | --- | --- |
| Workspace | Folder | team-data-eng | team-[function] | Use team or domain names to organize notebooks and jobs. |
| Workspace | Notebook | ingest_customer_data | [verb]_[object]_[details] | Use snake_case; keep names action-oriented. |
| Jobs | Job Name | daily_ingest_customer_data | [frequency]_[verb]_[object] | Include frequency if scheduled. |
| Jobs | Task Name | transform_orders_silver | [verb]_[object]_[layer] | Reflects the transformation and target layer. |
| Clusters | Cluster Name | dev-data-eng-shared | [env]-[team]-[purpose] | Helps identify usage and environment. |
| Repos | Repo Name | data-platform-utils | [project]-[purpose] | Use kebab-case for Git repo names. |
| Data Storage | External Location | s3://company-data/bronze/sales/ | [cloud-provider]://[org]/[layer]/[domain]/ | Use consistent folder hierarchy. |
| Databases / Schemas | Schema Name | bronze_sales | [layer]_[domain] | Reflects medallion layer and domain. |
| Databases / Schemas | Table Name | customer_orders | [subject]_[object] | Use snake_case; avoid abbreviations. |
| Databases / Schemas | View Name | vw_customer_orders_summary | vw_[table]_[purpose] | Prefix with vw_ to distinguish views. |
| Delta Live Tables (DLT) | Pipeline Name | dlt_sales_pipeline | dlt_[domain]_pipeline | Use dlt_ prefix for clarity. |
| Delta Live Tables (DLT) | Table Name | silver_customer_orders | [layer]_[subject]_[object] | Include layer in name. |
| Unity Catalog | Catalog Name | prod_catalog | [env]_catalog | Separate by environment (dev, test, prod). |
| Unity Catalog | Managed Table | gold_finance_kpi_summary | [layer]_[domain]_[purpose] | Fully qualified: catalog.schema.table. |
| Functions / UDFs | Function Name | fn_calculate_discount | fn_[verb]_[object] | Prefix with fn_ for clarity. |
| Secrets | Scope Name | azure-keyvault-prod | [provider]-[purpose]-[env] | Use consistent naming across environments. |
| Secrets | Secret Key | db_password | [service]_[credential_type] | Avoid sensitive info in names. |
| Monitoring | Alert Name | alert_failed_ingest_jobs | alert_[condition]_[target] | Clear and actionable. |
| Tags / Metadata | Tag Key | owner, env, cost_center | lowercase, underscore-separated | Use for governance and cost tracking. |
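Several of the patterns in this table can be written as regular expressions, which makes them checkable and lets the bracketed parts be parsed back out of a name. The sketch below covers five object types. The names `PATTERNS`, `parse_name`, and `qualified_table_name` are hypothetical, the environment and layer values are the ones mentioned on this page, and composing the schema as [layer]_[domain] inside an [env]_catalog is an assumption about how the rows combine.

```python
import re

ENVIRONMENTS = ("dev", "test", "prod")
LAYERS = ("bronze", "silver", "gold")

_ENV = "|".join(ENVIRONMENTS)
_LAYER = "|".join(LAYERS)
_WORDS = r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*"  # snake_case words

PATTERNS = {
    # [env]-[team]-[purpose]; the team part may itself contain hyphens.
    "cluster": re.compile(
        rf"^(?P<env>{_ENV})-(?P<team>[a-z0-9]+(?:-[a-z0-9]+)*)-(?P<purpose>[a-z0-9]+)$"
    ),
    # [layer]_[domain]
    "schema": re.compile(rf"^(?P<layer>{_LAYER})_(?P<domain>{_WORDS})$"),
    # [env]_catalog
    "catalog": re.compile(rf"^(?P<env>{_ENV})_catalog$"),
    # vw_[table]_[purpose] and fn_[verb]_[object], checked by prefix only.
    "view": re.compile(rf"^vw_{_WORDS}$"),
    "function": re.compile(rf"^fn_{_WORDS}$"),
}


def parse_name(object_type, name):
    """Return the named parts of a conforming name, or None if it does not match."""
    match = PATTERNS[object_type].match(name)
    return match.groupdict() if match else None


def qualified_table_name(env, layer, domain, table):
    """Build catalog.schema.table after validating the catalog and schema parts."""
    catalog = f"{env}_catalog"
    schema = f"{layer}_{domain}"
    for object_type, value in (("catalog", catalog), ("schema", schema)):
        if parse_name(object_type, value) is None:
            raise ValueError(f"{value!r} does not follow the {object_type} pattern")
    return f"{catalog}.{schema}.{table}"


if __name__ == "__main__":
    print(parse_name("cluster", "dev-data-eng-shared"))
    print(qualified_table_name("prod", "gold", "finance", "gold_finance_kpi_summary"))
```

Note that `parse_name` returns an empty dictionary for a valid view or function name, because those patterns have no named parts, so callers should compare the result against `None` rather than test its truthiness.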

The following table lists naming anti-patterns: names to avoid when working across a Databricks data platform. These examples highlight poor practices that can lead to confusion, maintenance issues, or governance problems.


🚫 Databricks Naming Anti-Patterns Table#

| Area | Bad Naming Example | Why It’s Bad |
| --- | --- | --- |
| Workspace Folder | misc, temp, new_folder | Vague, non-descriptive, and hard to manage at scale. |
| Notebook | Untitled, test1, final_v2 | Lacks context, versioning is unclear, and not reusable. |
| Job Name | job1, pipeline_test, run_this | Doesn’t describe purpose, frequency, or data domain. |
| Cluster Name | cluster123, mycluster, test-cluster | Not environment-specific or team-specific; hard to track usage. |
| Repo Name | scripts, code, myrepo | Too generic; doesn’t reflect project or function. |
| Schema Name | db1, test_schema, junk | Doesn’t indicate layer (bronze/silver/gold) or domain. |
| Table Name | tbl1, data, temp_table | Non-descriptive, unclear purpose, and may conflict with reserved words. |
| View Name | view1, vw, temp_view | Doesn’t indicate what data is being viewed or its purpose. |
| DLT Pipeline | pipeline1, dlt_test | Doesn’t reflect domain, layer, or business logic. |
| Function Name | func1, doStuff, calc | Ambiguous, not reusable, and lacks clarity. |
| Secrets | password, key, secret1 | Too generic; hard to manage securely across scopes. |
| Alert Name | alert1, fail, error_job | Not actionable or specific; hard to triage. |
| Tags / Metadata | tag1, misc, x | Not standardized; lacks governance and traceability. |
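Many of the bad examples above share a few recognizable shapes: a generic word with a number appended (job1, tbl1), a placeholder word such as temp or test, or a name too short to describe anything (x, vw). The sketch below flags those shapes. It is a heuristic under stated assumptions: the function name, the word list, and the three-character minimum are all hypothetical, and it will not catch every anti-pattern in the table (mycluster or calc, for example, pass unflagged).

```python
import re
from typing import List

# Assumption: a small starter list drawn from the examples in the table above.
PLACEHOLDER_WORDS = {"misc", "temp", "test", "untitled", "junk"}

# A single word followed directly by digits, e.g. job1 or cluster123.
NUMBERED_GENERIC = re.compile(r"^[a-z]+\d+$")


def find_anti_patterns(name: str) -> List[str]:
    """Return the reasons a name looks like a naming anti-pattern."""
    reasons = []
    lowered = name.lower()
    tokens = [token for token in re.split(r"[_\-\s]+", lowered) if token]
    if len(lowered) < 3:
        reasons.append("too short to be descriptive")
    if NUMBERED_GENERIC.match(lowered):
        reasons.append("generic word with a numeric suffix")
    hits = sorted(PLACEHOLDER_WORDS.intersection(tokens))
    if hits:
        reasons.append("placeholder word: " + ", ".join(hits))
    return reasons


if __name__ == "__main__":
    for candidate in ("job1", "temp_table", "x", "daily_ingest_customer_data"):
        print(candidate, find_anti_patterns(candidate))
```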