Data Engineering Guardrails and Best Practices - General#

Updated by Gareth Stretch / 2025.03.13

Version Control#

Version	Date	Owner	Change Description
0.1	9 April 2025	Gareth Stretch	Initial Framework created

Naming Standards#

This section contains the guardrails and best practices for naming standards.

Recommended areas to define naming standards#

It is recommended to document, publish, and train users on naming standards for, at minimum, the following areas:

Databricks Clusters
Landing zone implementation
Databases and schemas
Tables
Columns
Indexes and Constraints
Stored Procedures and Functions
Triggers
Databricks jobs
Security and roles
Service principals
Resource groups

Guardrails#

Consistency:
- Uniform Case Style: Use a consistent case style, such as lowercase with underscores (e.g., customer_data, order_details).
- Standard Prefixes: Apply standard prefixes for different types of objects (e.g., tbl_ for tables, vw_ for views).
Clarity:
- Descriptive Names: Ensure names are descriptive and clearly indicate the purpose or content of the object (e.g., sales_transactions, customer_profiles).
- Avoid Abbreviations: Minimize the use of abbreviations unless they are widely understood within your organization.
Avoid Reserved Words:
- SQL Reserved Words: Avoid using SQL reserved words in names to prevent conflicts and errors (e.g., select, table, join).
Scalability:
- Future-Proofing: Design naming conventions that can accommodate future growth and changes in the data model.
- Versioning: Include version numbers in names if applicable (e.g., customer_data_v1, customer_data_v2).
Documentation:
- Comprehensive Guidelines: Maintain a detailed document outlining all naming conventions and examples.
- Training: Ensure all team members are trained on the naming standards and understand their importance.
Integration:
- Interoperability: Ensure naming conventions are compatible with other systems and tools used in your data ecosystem.
- Consistency Across Platforms: Apply the same naming conventions across different platforms and environments.
Review and Enforcement:
- Regular Reviews: Implement a process for regularly reviewing and updating naming conventions.
- Automated Checks: Use automated tools to enforce naming standards and catch deviations.
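The "Automated Checks" guardrail could be approached with a small validation helper. The following is a minimal Python sketch, not a definitive implementation. The function name `check_name`, the 64-character limit, and the short reserved-word list are all assumptions made for illustration; replace them with the limits and reserved words that apply to your platform.

```python
import re

# Assumption: names are lowercase snake_case and start with a letter.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")

# Illustrative subset of SQL reserved words; this is not a complete list.
RESERVED_WORDS = {"select", "table", "join", "from", "where", "order", "group"}


def check_name(name: str, max_length: int = 64) -> list[str]:
    """Return a list of naming violations for `name` (empty if none)."""
    problems = []
    if not SNAKE_CASE.fullmatch(name):
        problems.append("not lowercase snake_case")
    if name.lower() in RESERVED_WORDS:
        problems.append("SQL reserved word")
    if len(name) > max_length:  # hypothetical limit, adjust as needed
        problems.append(f"longer than {max_length} characters")
    return problems
```

For example, `check_name("customer_data")` returns an empty list, while `check_name("select")` reports a reserved-word conflict. A check like this could run in a CI step or a pre-commit hook so that deviations are caught before deployment.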

Pitfalls to avoid#

When establishing naming conventions for a Databricks data platform, there are several common pitfalls to avoid to ensure clarity, consistency, and maintainability.

Inconsistent Case Styles - Avoid Mixed Case Styles: Mixing case styles (e.g., camelCase, PascalCase, snake_case) can lead to confusion and errors. Stick to a single case style, such as lowercase with underscores (e.g., customer_data, order_details).
Ambiguous Names - Avoid Vague Names: Names like data1, table2, or temp do not provide any context about the content or purpose of the object. Use descriptive names that clearly indicate the object's role (e.g., sales_transactions, customer_profiles).
Abbreviations and Acronyms - Avoid Unclear Abbreviations: Using abbreviations or acronyms that are not widely understood can lead to confusion. If abbreviations are necessary, ensure they are documented and standardized.
Reserved Words - Avoid SQL Reserved Words: Using SQL reserved words (e.g., select, table, join) as names can cause conflicts and errors in queries.
Lack of Standard Prefixes - Avoid Missing Prefixes: Not using standard prefixes for different types of objects can make it difficult to distinguish between tables, views, indexes, etc. Use prefixes like tbl_ for tables, vw_ for views, idx_ for indexes.
Overly Long Names - Avoid Excessively Long Names: Names that are too long can be cumbersome to work with and may exceed system limits. Aim for concise yet descriptive names.
Inconsistent Naming Across Environments - Avoid Environment-Specific Naming: Ensure naming conventions are consistent across development, testing, and production environments to avoid confusion and errors during deployment.
Lack of Documentation - Avoid Undocumented Conventions: Failing to document naming conventions can lead to inconsistencies and misunderstandings. Maintain a comprehensive guide that outlines all naming standards and provides examples.
Ignoring Business Terminology - Avoid Source-Specific Terminology: Use business terminology rather than source-specific terminology to ensure names are meaningful to all stakeholders (e.g., customer_id instead of user_id).

General Naming Guidelines to Follow:#

Use lowercase with underscores or kebab-case consistently.
Be descriptive but concise.
Include context: environment, domain, layer, purpose.
Avoid abbreviations unless they are widely understood.
Never use temporary or test names in production environments.

✅ Databricks Naming Standards Table#

Area	Object Type	Naming Convention Example	Pattern	Notes
Workspace	Folder	`team-data-eng`	`team-[function]`	Use team or domain names to organize notebooks and jobs.
	Notebook	`ingest_customer_data`	`[verb]_[object]_[details]`	Use snake_case; keep names action-oriented.
Jobs	Job Name	`daily_ingest_customer_data`	`[frequency]_[verb]_[object]`	Include frequency if scheduled.
	Task Name	`transform_orders_silver`	`[verb]_[object]_[layer]`	Reflects the transformation and target layer.
Clusters	Cluster Name	`dev-data-eng-shared`	`[env]-[team]-[purpose]`	Helps identify usage and environment.
Repos	Repo Name	`data-platform-utils`	`[project]-[purpose]`	Use kebab-case for Git repo names.
Data Storage	External Location	`s3://company-data/bronze/sales/`	`[cloud-provider]://[org]/[layer]/[domain]/`	Use consistent folder hierarchy.
Databases / Schemas	Schema Name	`bronze_sales`	`[layer]_[domain]`	Reflects medallion layer and domain.
	Table Name	`customer_orders`	`[subject]_[object]`	Use snake_case; avoid abbreviations.
	View Name	`vw_customer_orders_summary`	`vw_[table]_[purpose]`	Prefix with `vw_` to distinguish views.
Delta Live Tables (DLT)	Pipeline Name	`dlt_sales_pipeline`	`dlt_[domain]_pipeline`	Use `dlt_` prefix for clarity.
	Table Name	`silver_customer_orders`	`[layer]_[subject]_[object]`	Include layer in name.
Unity Catalog	Catalog Name	`prod_catalog`	`[env]_catalog`	Separate by environment (dev, test, prod).
	Managed Table	`gold_finance_kpi_summary`	`[layer]_[domain]_[purpose]`	Fully qualified: `catalog.schema.table`.
Functions / UDFs	Function Name	`fn_calculate_discount`	`fn_[verb]_[object]`	Prefix with `fn_` for clarity.
Secrets	Scope Name	`azure-keyvault-prod`	`[provider]-[purpose]-[env]`	Use consistent naming across environments.
	Secret Key	`db_password`	`[service]_[credential_type]`	Avoid sensitive info in names.
Monitoring	Alert Name	`alert_failed_ingest_jobs`	`alert_[condition]_[target]`	Clear and actionable.
Tags / Metadata	Tag Key	`owner`, `env`, `cost_center`	lowercase, underscore-separated	Use for governance and cost tracking.
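Some of the patterns in the table above can be generated rather than typed by hand, which keeps names consistent. The following Python sketch shows one possible way to do this. The helper function names are hypothetical, and the allowed environment and layer values are taken from the examples in this document; extend them to match your own standards.

```python
# Allowed values taken from the examples in this document.
ENVIRONMENTS = {"dev", "test", "prod"}
LAYERS = {"bronze", "silver", "gold"}


def catalog_name(env: str) -> str:
    """Build a Unity Catalog name using the pattern [env]_catalog."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return f"{env}_catalog"


def schema_name(layer: str, domain: str) -> str:
    """Build a schema name using the pattern [layer]_[domain]."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return f"{layer}_{domain}"


def cluster_name(env: str, team: str, purpose: str) -> str:
    """Build a cluster name using the pattern [env]-[team]-[purpose]."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return f"{env}-{team}-{purpose}"


def qualified_table_name(catalog: str, schema: str, table: str) -> str:
    """Join the three levels into catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"
```

With these helpers, `qualified_table_name(catalog_name("prod"), schema_name("bronze", "sales"), "customer_orders")` produces `prod_catalog.bronze_sales.customer_orders`, and an unexpected environment or layer value raises an error instead of producing a nonstandard name.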

The following table lists naming anti-patterns: names to avoid when working across a Databricks data platform. These examples highlight poor practices that can lead to confusion, maintenance issues, or governance problems.

🚫 Databricks Naming Anti-Patterns Table#

Area	Bad Naming Example	Why It’s Bad
Workspace Folder	`misc`, `temp`, `new_folder`	Vague, non-descriptive, and hard to manage at scale.
Notebook	`Untitled`, `test1`, `final_v2`	Lacks context, versioning is unclear, and not reusable.
Job Name	`job1`, `pipeline_test`, `run_this`	Doesn’t describe purpose, frequency, or data domain.
Cluster Name	`cluster123`, `mycluster`, `test-cluster`	Not environment-specific or team-specific; hard to track usage.
Repo Name	`scripts`, `code`, `myrepo`	Too generic; doesn’t reflect project or function.
Schema Name	`db1`, `test_schema`, `junk`	Doesn’t indicate layer (bronze/silver/gold) or domain.
Table Name	`tbl1`, `data`, `temp_table`	Non-descriptive, unclear purpose, and may conflict with reserved words.
View Name	`view1`, `vw`, `temp_view`	Doesn’t indicate what data is being viewed or its purpose.
DLT Pipeline	`pipeline1`, `dlt_test`	Doesn’t reflect domain, layer, or business logic.
Function Name	`func1`, `doStuff`, `calc`	Ambiguous, not reusable, and lacks clarity.
Secrets	`password`, `key`, `secret1`	Too generic; hard to manage securely across scopes.
Alert Name	`alert1`, `fail`, `error_job`	Not actionable or specific; hard to triage.
Tags / Metadata	`tag1`, `misc`, `x`	Not standardized; lacks governance and traceability.
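Many of the anti-patterns above share a recognizable shape: a throwaway word such as `temp` or `test`, or a single generic word followed by a number. The following Python sketch is a rough heuristic for flagging such names during review. The function name `looks_like_anti_pattern` and both word lists are assumptions chosen for illustration from the examples in the table. The heuristic will not catch every bad name (for example, `doStuff` passes) and should be tuned before use.

```python
import re

# Words that suggest temporary or placeholder objects (assumed list).
THROWAWAY_WORDS = {"temp", "tmp", "test", "misc", "junk", "untitled", "final"}

# Words that are too generic to stand alone as a full name (assumed list).
GENERIC_WORDS = {
    "job", "cluster", "tbl", "data", "view", "vw", "func", "calc",
    "alert", "tag", "pipeline", "secret", "key", "password", "db",
}


def looks_like_anti_pattern(name: str) -> bool:
    """Heuristically flag names that resemble the anti-patterns above."""
    # Split on underscores, hyphens, and whitespace; drop empty tokens.
    tokens = [t for t in re.split(r"[_\-\s]+", name.lower()) if t]
    if not tokens:
        return True  # an empty name is never acceptable
    # Strip trailing digits so that "test1" is treated like "test".
    stems = [re.sub(r"\d+$", "", t) for t in tokens]
    # Rule 1: any token is a throwaway word.
    if any(stem in THROWAWAY_WORDS for stem in stems):
        return True
    # Rule 2: the whole name is one generic word or a single character.
    if len(stems) == 1 and (stems[0] in GENERIC_WORDS or len(stems[0]) <= 1):
        return True
    return False
```

Names from the standards table, such as `customer_orders` and `db_password`, are not flagged, while `job1`, `dlt_test`, and `temp_view` are.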