{"id":17804,"date":"2024-03-05T12:38:20","date_gmt":"2024-03-05T12:38:20","guid":{"rendered":"https:\/\/beta.bluetab.net\/?p=17804"},"modified":"2026-06-15T01:11:47","modified_gmt":"2026-06-15T00:11:47","slug":"databricks-on-aws-an-architectural-perspective-part-2","status":"publish","type":"post","link":"https:\/\/beta.bluetab.net\/en\/2024\/03\/databricks-on-aws-an-architectural-perspective-part-2\/","title":{"rendered":"Databricks on AWS &#8211; An Architectural Perspective (part 2)"},"content":{"rendered":"<figure><a href=\"https:\/\/www.linkedin.com\/in\/ruben-villa-munoz\/\" target=\"_blank\" tabindex=\"-1\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/beta.bluetab.net\/wp-content\/uploads\/2024\/03\/1646762031857-150x150.jpeg\" alt=\"\" \/><\/a><\/figure>\n<h4><a href=\"https:\/\/www.linkedin.com\/in\/ruben-villa-munoz\/\" target=\"_blank\" rel=\"noopener\">Rub\u00e9n Villa<\/a><\/h4>\n<p>Big Data &#038; Cloud Architect\n<\/p>\n<figure><a href=\"https:\/\/www.linkedin.com\/in\/jongaraialdegallego\/\" target=\"_blank\" tabindex=\"-1\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/beta.bluetab.net\/wp-content\/uploads\/2024\/03\/1672931909474-150x150.jpeg\" alt=\"\" \/><\/a><\/figure>\n<h4><a href=\"https:\/\/www.linkedin.com\/in\/jongaraialdegallego\/\" target=\"_blank\" rel=\"noopener\">Jon Garaialde<\/a><\/h4>\n<p>Cloud Data Solutions Engineer\/Architect<\/p>\n<figure><a href=\"https:\/\/www.linkedin.com\/in\/alfonsojerezizquierdo\/\" target=\"_blank\" tabindex=\"-1\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/beta.bluetab.net\/wp-content\/uploads\/2024\/03\/alfonso-150x150.jpeg\" alt=\"\" \/><\/a><\/figure>\n<h4><a href=\"https:\/\/www.linkedin.com\/in\/alfonsojerezizquierdo\/\" target=\"_blank\" rel=\"noopener\">Alfonso Jerez<\/a><\/h4>\n<p>Analytics Engineer | GCP | AWS | Python Dev 
| Azure | Databricks | Spark<\/p>\n<figure><a href=\"https:\/\/www.linkedin.com\/in\/albertojaenrevuelta\/\" target=\"_blank\" tabindex=\"-1\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/beta.bluetab.net\/wp-content\/uploads\/2022\/01\/1607606292897-150x150.jpg\" alt=\"\" \/><\/a><\/figure>\n<h4><a href=\"https:\/\/www.linkedin.com\/in\/albertojaenrevuelta\/\" target=\"_blank\" rel=\"noopener\">Alberto Ja\u00e9n<\/a><\/h4>\n<p>Cloud Engineer | 3x AWS Certified | 2x HashiCorp Certified | GitHub: ajaen4\n<\/p>\n<p>This article is the second in a two-part series on integrating Databricks into AWS environments, analyzing the architectural design alternatives the product offers. The first part focused on architecture and networking; this second installment covers security and general administration.<\/p>\n<p>The contents of each article are as follows:<\/p>\n<p><strong>First installment:<\/strong><\/p>\n<ul>\n<li>Introduction<\/li>\n<li>Data Lakehouse &#038; Delta<\/li>\n<li>Concepts<\/li>\n<li>Architecture<\/li>\n<li>Plans and types of workloads<\/li>\n<li>Networking<\/li>\n<\/ul>\n<p><strong>This installment:<\/strong><\/p>\n<ul>\n<li>Security<\/li>\n<li>Persistence<\/li>\n<li>Billing<\/li>\n<\/ul>\n<p>The first article is available at this\u00a0<a href=\"https:\/\/beta.bluetab.net\/en\/databricks-on-aws-an-architectural-perspective-part-1\/\" target=\"_blank\" rel=\"noopener\">link<\/a>.<\/p>\n<h2>Glossary<\/h2>\n<ul>\n<li><strong>Control Plane:<\/strong>\u00a0Hosts the Databricks backend services that provide the graphical interface and the REST APIs for account and workspace management. These services are deployed in an AWS account owned by Databricks. 
Refer to the first article for more information.<\/li>\n<li><strong>Credentials Passthrough:<\/strong>\u00a0Mechanism used by Databricks to manage access to different data sources. Refer to the first article for more information.<\/li>\n<li><strong>Cross-account role:<\/strong>\u00a0Role provided for Databricks to assume from its AWS account. It is used to deploy infrastructure and to assume other roles within AWS. Refer to the first article for more information.<\/li>\n<li><strong>Compute Plane:<\/strong>\u00a0Hosts all the infrastructure necessary for data processing: persistence, clusters, logging services, Spark libraries, etc. The Compute Plane (also referred to as the Data Plane) is deployed in the client\u2019s AWS account. Refer to the first article for more information.<\/li>\n<li><strong>Data role:<\/strong>\u00a0Roles with read\/write permissions on S3 buckets that the cluster assumes through the meta instance profile. Refer to the first article for more information.<\/li>\n<li><strong>DBFS:<\/strong>\u00a0Distributed storage system available to clusters. It is an abstraction over an object storage system, in this case S3, and allows access to files and folders without the need to use URLs. 
Refer to the first article for more information.<\/li>\n<li><strong>IAM Policies:<\/strong>\u00a0Policies through which access permissions are defined in AWS.<\/li>\n<li><strong>Key Management Service (KMS):<\/strong>\u00a0AWS service for creating and managing encryption keys.<\/li>\n<li><strong>Pipelines:<\/strong>\u00a0Series of processes applied to a set of data.<\/li>\n<li><strong>Prepared:<\/strong>\u00a0Data processed from Raw, used as the basis for creating Trusted data.<\/li>\n<li><strong>Init Script (User Data Script):<\/strong>\u00a0Script that the EC2 instances launched for a Databricks cluster run at startup, for example to install software updates or download libraries\/modules.<\/li>\n<li><strong>Mount:<\/strong>\u00a0Instead of loading the data required by a process internally, Databricks can link external sources, such as S3, so that files can be handled as if they were local (which makes relative paths simpler) while they remain stored in the corresponding external storage source.<\/li>\n<li><strong>Personal Access Token (PAT):<\/strong>\u00a0Token for personal authentication that replaces username and password authentication.<\/li>\n<li><strong>Raw:<\/strong>\u00a0Ingested raw data.<\/li>\n<li><strong>Root Bucket:<\/strong>\u00a0Root directory for the workspace (DBFS root). Used to host cluster logs, notebook revisions, and libraries. 
Refer to the first article for more information.<\/li>\n<li><strong>Secret Scope:<\/strong>\u00a0Environment for storing sensitive information as key-value pairs (name \u2013 secret).<\/li>\n<li><strong>Trusted:<\/strong>\u00a0Data prepared for visualization and study by different interest groups.<\/li>\n<li><strong>Workflows:<\/strong>\u00a0Sequence of tasks.<\/li>\n<\/ul>\n<h2>Security<\/h2>\n<p>See Data security and encryption at this\u00a0<a href=\"https:\/\/docs.databricks.com\/en\/security\/keys\/index.html\" target=\"_blank\" rel=\"noopener\">link<\/a>.<\/p>\n<p>Databricks provides data security configurations to protect information in transit and at rest. The documentation gives a comprehensive overview of the available encryption features, which are:<\/p>\n<ul>\n<li>\n<p><strong>Customer-managed keys for encryption:<\/strong>\u00a0Protect and control access to data in the Databricks control plane, including notebook source files, notebook results, secrets, SQL queries, and personal access tokens.<\/p>\n<\/li>\n<li>\n<p><strong>Encryption of traffic between cluster nodes:<\/strong>\u00a0Secures communication between the nodes of a cluster.<\/p>\n<\/li>\n<li>\n<p><strong>Encryption of queries and results:<\/strong>\u00a0Protects the privacy of queries and of the stored results.<\/p>\n<\/li>\n<li>\n<p><strong>Encryption of S3 buckets at rest:<\/strong>\u00a0Secures data stored in S3 buckets.<\/p>\n<\/li>\n<\/ul>\n<p>Within the support for customer-managed keys, it is worth highlighting that:<\/p>\n<ul>\n<li><strong>Keys can be configured to encrypt data in the root S3 bucket and EBS volumes of the cluster.<\/strong><\/li>\n<\/ul>\n<p>Databricks can also use AWS KMS keys to encrypt SQL queries and their history stored in the control plane.<\/p>\n<p>Lastly, Databricks also supports the encryption of traffic between cluster nodes and the administration of 
security configurations for the workspace by administrators.<\/p>\n<p>In this article, we will delve into two of the options: customer-managed keys and the encryption of traffic between cluster worker nodes.<\/p>\n<h4>Customer-managed keys<\/h4>\n<p>See Customer-managed keys at this\u00a0<a href=\"https:\/\/docs.databricks.com\/en\/security\/keys\/customer-managed-keys.html\" target=\"_blank\" rel=\"noopener\">link<\/a>.<\/p>\n<p>Databricks account administrators can configure customer-managed keys for encryption. There are two use cases for adding a customer-managed key: data from managed services in the Databricks control plane (such as notebooks, secrets, and SQL queries) and workspace storage (root S3 buckets and EBS volumes).<\/p>\n<p>Note that customer-managed keys for EBS volumes do not apply to serverless compute resources, as these disks are ephemeral and tied to the lifecycle of the serverless workload. The Databricks documentation compares the use cases for customer-managed keys and states that this feature is available in the Enterprise subscription.<\/p>\n<p>Encryption key configurations are account-level objects that reference the customer's cloud keys. Account administrators can create these configurations in the account console and associate them with one or more workspaces. The configuration process involves creating or selecting a symmetric key in AWS KMS and then editing the key policy to allow Databricks to perform encryption and decryption operations. 
Detailed instructions, along with examples of the necessary JSON policies for both use cases (managed services and workspace storage), can be found in the documentation.<\/p>\n<p>Lastly, there is the option to add an access policy to a cross-account IAM role in AWS, in case the KMS key is in a different account.<\/p>\n<h4>Encryption in transit<\/h4>\n<p>Encryption in transit relies on an init script. The relevant documentation:<\/p>\n<ul>\n<li><a href=\"https:\/\/docs.databricks.com\/en\/security\/keys\/encrypt-otw.html\" target=\"_blank\" rel=\"noopener\">Encrypt traffic between cluster worker nodes<\/a><\/li>\n<li><a href=\"https:\/\/docs.databricks.com\/en\/security\/keys\/encrypt-otw.html#example-init-script\" target=\"_blank\" rel=\"noopener\">Example init script<\/a><\/li>\n<li><a href=\"https:\/\/docs.databricks.com\/en\/init-scripts\/cluster-scoped.html\">Use cluster-scoped init scripts<\/a><\/li>\n<\/ul>\n<p>In Databricks, the init script is used, among other things, to configure encryption between worker nodes in a Spark cluster. This init script retrieves a shared encryption secret from the key store stored in DBFS. If the secret is rotated by updating the key store file in DBFS, all running clusters must be restarted to avoid authentication issues between Spark workers and the driver. Note that, since the shared secret is stored in DBFS, any user with access to DBFS can retrieve the secret through a notebook.<\/p>\n<p>While certain AWS instance types automatically encrypt data between worker nodes without additional configuration, using the init script provides an added level of security for data in transit and full control over the type of encryption applied.<\/p>\n<p>The script is responsible for obtaining the key store file and its password, as well as configuring the necessary Spark parameters for encryption. 
Written in Bash, the script waits, if necessary, until the key store file is available in DBFS and derives the shared encryption secret from the hash of the key store file. Once the driver and worker nodes have finished initializing, all traffic between them is encrypted using the key store file.<\/p>\n<p>These features are part of the Enterprise plan.<\/p>\n<h2>Persistence and Metastores<\/h2>\n<p>Databricks supports two main types of persistent storage: DBFS (Databricks File System) and S3 (Amazon Simple Storage Service).<\/p>\n<p><strong>DBFS<\/strong><\/p>\n<p>DBFS is a distributed file system integrated into Databricks and available to the clusters of a workspace. As described in the glossary, it is an abstraction over object storage (S3 in this case), with the DBFS root hosted in the workspace's root bucket. It provides a file interface similar to standard HDFS and facilitates collaboration by offering a centralized place to store and access data.<\/p>\n<p><strong>S3<\/strong><\/p>\n<p>Databricks can also connect directly to data stored in Amazon S3. S3 data is independent of clusters and workspaces and can be accessed by multiple clusters and users. S3 stands out for its scalability, its durability, and its separation of storage from compute, which makes data easy to access even from multiple environments.<\/p>\n<p>Regarding metastores, Databricks on AWS supports various types, including:<\/p>\n<p><strong>Hive Metastore<\/strong><\/p>\n<p>Databricks can integrate with the Hive metastore, allowing users to use tables and schemas defined in Hive.<\/p>\n<p><strong>Glue Metastore in Data Plane<\/strong><\/p>\n<p>Databricks can also use AWS Glue to host the metastore in the compute plane itself.<\/p>\n<p>These metastores enable users to manage and query table metadata, facilitating schema management and integration with other data services. 
The choice of metastore depends on the specific workflow requirements and metadata management preferences in the Databricks environment on AWS.<\/p>\n<p><strong>Unity Catalog<\/strong><\/p>\n<p>Unity Catalog is a newer Databricks feature that unifies the previous metastores and extends the options and tools each of them offers.<\/p>\n<figure>\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"444\" height=\"310\" src=\"https:\/\/beta.bluetab.net\/wp-content\/uploads\/2024\/03\/image1-1.png\" alt=\"\" srcset=\"https:\/\/beta.bluetab.net\/wp-content\/uploads\/2024\/03\/image1-1.png 444w, https:\/\/beta.bluetab.net\/wp-content\/uploads\/2024\/03\/image1-1-300x209.png 300w\" sizes=\"(max-width: 444px) 100vw, 444px\" \/><figcaption><\/figcaption><\/figure>\n<p>Unity Catalog provides centralized capabilities for access control, auditing, lineage, and data discovery.<\/p>\n<p>Key Features of Unity Catalog:<\/p>\n<ul>\n<li>Manages data access policies in a single location, and they apply to all defined workspaces.<\/li>\n<li>Uses a security model based on ANSI SQL, so administrators can grant permissions using SQL syntax.<\/li>\n<li>Automatically captures user-level audit logs.<\/li>\n<li>Lets users tag tables and schemas and provides a search interface for finding information.<\/li>\n<\/ul>\n<p>Databricks recommends configuring all access to cloud object storage through Unity Catalog to manage relationships between data in Databricks and cloud storage.<\/p>\n<h3><strong>Unity Catalog Object Model<\/strong><\/h3>\n<ul>\n<li><strong>Metastore:<\/strong>\u00a0Top-level metadata container that exposes a three-level namespace (catalog.schema.table).<\/li>\n<li><strong>Catalog:<\/strong>\u00a0First layer of the hierarchy, used to organize data assets.<\/li>\n<li><strong>Schema:<\/strong>\u00a0Second layer, which organizes tables and views.<\/li>\n<li><strong>Tables, Views, and Volumes:<\/strong>\u00a0Lower levels, with volumes 
providing non-tabular access to data.<\/li>\n<li><strong>Models:<\/strong>\u00a0Not data assets in the strict sense; they register machine learning models.<\/li>\n<\/ul>\n<figure>\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"952\" height=\"442\" src=\"https:\/\/beta.bluetab.net\/wp-content\/uploads\/2024\/03\/image2-1.png\" alt=\"\" srcset=\"https:\/\/beta.bluetab.net\/wp-content\/uploads\/2024\/03\/image2-1.png 952w, https:\/\/beta.bluetab.net\/wp-content\/uploads\/2024\/03\/image2-1-300x139.png 300w, https:\/\/beta.bluetab.net\/wp-content\/uploads\/2024\/03\/image2-1-768x357.png 768w\" sizes=\"(max-width: 952px) 100vw, 952px\" \/><figcaption><\/figcaption><\/figure>\n<h2>Billing<\/h2>\n<p>Databricks on AWS can deliver billable usage logs and make them accessible. Account administrators can configure the daily delivery of CSV logs to an AWS S3 bucket. Each CSV file provides historical data on cluster usage in Databricks, broken down by criteria such as cluster ID, billing SKU, cluster creator, and tags. The delivery includes logs for both running and canceled workspaces, so that the last day of a canceled workspace is properly represented (it must have been operational for at least 24 hours).<\/p>\n<p>The setup involves creating an S3 bucket and an IAM role in AWS, along with calling the Databricks API to set up storage configuration objects and credentials. Cross-account support allows delivery to a different AWS account through an S3 bucket policy. CSV files are located at &lt;bucket-name&gt;\/&lt;prefix&gt;\/billable-usage\/csv\/, and it is advisable to review S3 security best practices.<\/p>\n<p>The account API allows a shared configuration for all workspaces or separate configurations for each workspace or group of workspaces. The delivery of these CSVs enables account owners to download the logs directly. 
S3 Object Ownership is automatically set to \u201cBucket owner preferred\u201d to support ownership of newly created objects.<\/p>\n<p>The number of log delivery configurations is limited, and setting them up requires being an account administrator and providing the account ID. Extra caution is required when S3 Object Ownership is set to \u201cObject writer\u201d instead of \u201cBucket owner preferred\u201d, due to potential access difficulties.<\/p>\n<p>Each CSV file contains the following fields:<\/p>\n<table id=\"eael-data-table-1999682\">\n<thead>\n<tr>\n<th id=\"\" colspan=\"Fields\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\tField<\/th>\n<th id=\"\" colspan=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\tDescription<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tworkspaceId\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tWorkspace Id\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<\/tr>\n<tr>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\ttimestamp\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tEstablished frequency (hourly, daily,&#8230;)\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<\/tr>\n<tr>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tclusterId\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tCluster Id\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<\/tr>\n<tr>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tclusterName\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tName assigned to the cluster\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<\/tr>\n<tr>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tclusterNodeType\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tType of node assigned\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<\/tr>\n<tr>\n<td colspan=\"\" rowspan=\"\" 
id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tclusterOwnerUserId\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tCluster creator user id\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<\/tr>\n<tr>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tclusterCustomTags\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tCustomizable cluster information labels\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<\/tr>\n<tr>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tsku\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tPackage assigned by Databricks in relation to the cluster characteristics.\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<\/tr>\n<tr>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tdbus\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tDBU consumption per machine hour\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<\/tr>\n<tr>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tmachineHours\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tCluster deployment machine hours\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<\/tr>\n<tr>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tclusterOwnerUserName\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tUsername of the cluster creator\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<\/tr>\n<tr>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\ttags\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<td colspan=\"\" rowspan=\"\" id=\"\">\n\t\t\t\t\t\t\t\t\t\t\t\tCustomizable cluster information labels\n\t\t\t\t\t\t\t\t\t\t\t<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>References<\/h2>\n<ol>\n<li><a href=\"https:\/\/beta.bluetab.net\/databricks-sobre-aws-una-perspectiva-de-arquitectura-parte-1\/\" target=\"_blank\" rel=\"noopener\" 
>https:\/\/beta.bluetab.net\/databricks-sobre-aws-una-perspectiva-de-arquitectura-parte-1\/<\/a><\/li>\n<li><a href=\"https:\/\/docs.databricks.com\/en\/security\/keys\/index.html\" target=\"_blank\" rel=\"noopener\">https:\/\/docs.databricks.com\/en\/security\/keys\/index.html<\/a> | 2024-02-06<\/li>\n<li><a href=\"https:\/\/docs.databricks.com\/en\/security\/keys\/customer-managed-keys.html\" target=\"_blank\" rel=\"noopener\">https:\/\/docs.databricks.com\/en\/security\/keys\/customer-managed-keys.html<\/a> | 2024-02-06<\/li>\n<li><a href=\"https:\/\/docs.databricks.com\/en\/security\/keys\/encrypt-otw.html\" target=\"_blank\" rel=\"noopener\">https:\/\/docs.databricks.com\/en\/security\/keys\/encrypt-otw.html<\/a> | 2024-02-24<\/li>\n<li><a href=\"https:\/\/docs.databricks.com\/en\/security\/keys\/encrypt-otw.html#example-init-script\" target=\"_blank\" rel=\"noopener\" 
>https:\/\/docs.databricks.com\/en\/security\/keys\/encrypt-otw.html#example-init-script<\/a> | 2024-02-24<\/li>\n<li><a href=\"https:\/\/docs.databricks.com\/en\/init-scripts\/cluster-scoped.html\" target=\"_blank\" rel=\"noopener\">https:\/\/docs.databricks.com\/en\/init-scripts\/cluster-scoped.html<\/a> | 2023-12-05<\/li>\n<li><a href=\"https:\/\/docs.databricks.com\/en\/data-governance\/unity-catalog\/index.html\" target=\"_blank\" rel=\"noopener\">https:\/\/docs.databricks.com\/en\/data-governance\/unity-catalog\/index.html<\/a> | 2024-02-26<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>Rub\u00e9n Villa Big Data &#038; Cloud Architect Jon Garaialde Cloud Data Solutions Engineer\/Architect Alfonso Jerez Analytics Engineer | GCP | AWS | Python Dev | Azure | Databricks | Spark Alberto Ja\u00e9n Cloud Engineer | 3x AWS Certified | 2x HashiCorp Certified | GitHub: ajaen4 This article is the second in a two-part series aimed [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":20842,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[7,29,30],"tags":[],"class_list":["post-17804","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog-es","category-practices-en","category-tech-en"],"acf":[],"_links":{"self":[{"href":"https:\/\/beta.bluetab.net\/en\/wp-json\/wp\/v2\/posts\/17804","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/beta.bluetab.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/beta.bluetab.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/beta.bluetab.net\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/beta.bluetab.net\/en\/wp-json\/wp\/v2\/comments?post=17804"}],"version-history":[{"count":1,"href":"https:\/\/beta.bluetab.net\/en\/wp-json\/wp\/v2\/posts\/17804\/revisions"}],"predecessor-version":[{"id":21156,"href":"https:\/\/beta.bluetab.net\/en\/wp-json\/wp\/v2\/posts\/17804\/revisions\/21156"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/beta.bluetab.net\/en\/wp-json\/wp\/v2\/media\/20842"}],"wp:attachment":[{"href":"https:\/\/beta.bluetab.net\/en\/wp-json\/wp\/v2\/media?parent=17804"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/
\/beta.bluetab.net\/en\/wp-json\/wp\/v2\/categories?post=17804"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/beta.bluetab.net\/en\/wp-json\/wp\/v2\/tags?post=17804"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}