Leverage Data Catalog to discover & annotate Qlik Sense assets

Background photo by Lauren Mancke on Unsplash

In this blog post, I’m going to share thoughts and knowledge that drove the development of a new sample connector for Google Data Catalog: the Qlik Sense connector.

This is the third open-source connector that allows customers to ingest metadata from BI Systems into Data Catalog: the ones for Looker and Tableau have been available for a while.

Disclaimer: Google and/or Google Cloud do not officially support any tool to connect Data Catalog to non-GCP systems at the time of writing (January 2021). …


A custom model


More and more companies migrate their data workloads to the cloud every day. At the same time, new data protection regulations are coming into effect around the globe. Together, these trends push companies to enforce their Data Governance standards in order to avoid legal charges arising from inappropriate data management.

Data Governance frameworks are hence becoming more common, but providers do not necessarily speak the same language, which is to be expected since each player has its own strategies to tackle problems… On the other hand, customers have distinct requirements that are not fulfilled by a single platform. …


Hints and tips to create your own driver from scratch

Photo by Aron Visuals on Unsplash

GitLab is rich and scalable when it comes to software architecture, and Runner is one of the key components of that ecosystem. It is responsible for running CI/CD jobs and is deployed separately from the GitLab instance.

Runners come in multiple flavors: Docker, Kubernetes, Shell, SSH, VirtualBox, and others. Technically speaking, each “flavor” is implemented as an Executor. But what if you want to run CI/CD jobs on an infrastructure that is not supported by the native executors, or to scale your Runner fleet with a custom strategy that better fits your needs? …


How to leverage CSV, Cloud Storage, and Data Studio to support fast data-driven decision making

Photo by Luke Chesser on Unsplash

What comes to mind when you hear “data-driven”? Expensive BI system licenses? Complex ETL processes that prepare data to populate dashboards? Corporate or technical constraints that prevent you from accessing relevant data to make decisions? Or affordable tools that are available whenever you need them, even to guide simple decisions? I hope you answered “The last option!”, but I have good news for you in case you didn’t.

Well, “data-driven” has been a trending topic for a long time and it is not difficult to understand why. As all of us know, people, gadgets, and devices — aka “things” — generate more and more data day after day; and, fortunately, tools that use such data to support decision-making are easier and more convenient to access. …


Leverage Data Catalog to discover & annotate Tableau assets

Background photo by Lauren Mancke on Unsplash

The Google Cloud Data Catalog Team has recently announced its product is now GA and ready to accept custom (aka user-defined) entries! This brand new feature opens up scope for integrations and now users can leverage Data Catalog’s well-known potential to manage metadata from almost any kind of data asset.

To demonstrate how it works, I’ll share design thoughts and sample code to connect Data Catalog to market-leading Business Intelligence/Data Visualization tools, covering Tableau metadata integration in this blog post. Both come from the experience of participating in the development of fully operational sample connectors, publicly available on GitHub.

Disclaimer: Google and/or Google Cloud do not officially support any tool to connect Data Catalog to non-GCP systems at the time of writing (May 2020). What you will find here is merely the result of my experience as a Data Catalog early adopter.


Leverage Data Catalog to discover & annotate Looker assets

Background photo by Lauren Mancke on Unsplash

The Google Cloud Data Catalog Team has recently announced its product is now GA and ready to accept custom (aka user-defined) entries! This brand new feature opens up scope for integrations and now users can leverage Data Catalog’s well-known potential to manage metadata from almost any kind of data asset.

To demonstrate how it works, I’ll share design thoughts and sample code to connect Data Catalog to market-leading Business Intelligence/Data Visualization tools, covering Looker metadata integration in this blog post. Both come from the experience of participating in the development of fully operational sample connectors, publicly available on GitHub.

Disclaimer: Google and/or Google Cloud do not officially support any tool to connect Data Catalog to non-GCP systems at the time of writing (Apr 2020). What you will find here is merely the result of my experience as a Data Catalog early adopter.


How the most important soft skill may help you to succeed

Photo by Power Digital Marketing on Unsplash

Some of my co-workers have been encouraging me to write something about soft skills for a while, so let me try!

Soft skills are a set of competencies a person should develop in order to empathize and do a better job, especially when working with teams. They’re no more or less important than technical (aka hard) skills. They are peers — they make relationships more human.

I’ll start with some insights from Gordon Haff (Red Hat). …


Leverage Public-key cryptography to establish secure connections

SSH into an AWS Fargate managed container
Background photo by chuttersnap on Unsplash

I’ve recently joined a conversation on how to establish secure connections to AWS Fargate-managed containers. SSH is one of the first options that come to mind when we talk about this… but how do these technologies fit together? After some research and discussion with my teammates, we came up with a solution that seems to be compliant with AWS security standards, making use of a few products:

  • Amazon Elastic Container Registry
  • AWS Systems Manager Parameter Store (optional)
  • Amazon ECS Task Definitions
  • AWS Fargate


Using Cloud Functions and Node.js Client for Google Cloud Storage to build a data processing pipeline. No server setup required.

Serverless ETL on Google Cloud, a case study: raw data into JSON Lines
Background photo by Ilze Lucero on Unsplash

I’m working on a task that consists of populating BigQuery tables with Tomcat and Nginx access log data. Every day, web servers upload new log files to GCS, containing raw data collected during the previous 24 hours. The data needs to be converted into a format that BigQuery load jobs understand before it can be loaded into the tables.

I opted for JSON Lines (newline-delimited JSON) as the target format instead of CSV due to the nature of the data I’m handling. Since one can’t predict the data that will be transformed during this ETL process (e.g. …
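To make the target format concrete, here is a minimal Python sketch of the raw-log-to-JSON-Lines step. It is not the article's Node.js implementation: the regular expression assumes a common access log layout, and the field names (remote_addr, time_local, and so on) and the to_json_lines helper are hypothetical, chosen only for illustration.

```python
import json
import re

# Assumption: lines follow a common access log layout; the original
# article does not state the exact format its servers produce.
LOG_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) \S+ (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+|-)'
)


def to_json_lines(raw_lines):
    """Convert raw access log lines to newline-delimited JSON.

    Lines that do not match the assumed pattern are skipped.
    """
    records = []
    for line in raw_lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue
        record = match.groupdict()
        # Store numeric fields as numbers rather than strings.
        record["status"] = int(record["status"])
        size = record["body_bytes_sent"]
        record["body_bytes_sent"] = 0 if size == "-" else int(size)
        # json.dumps escapes quotes and other special characters,
        # so each record stays on a single line.
        records.append(json.dumps(record))
    return "\n".join(records)


sample = [
    '203.0.113.7 - - [10/Oct/2019:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 2326',
    'not a log line',
]
output = to_json_lines(sample)
print(output)
```

Each output line is an independent JSON object, so free-form values such as request strings need no CSV-style quoting rules.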


Thoughts on data discovery and metadata management in Google Cloud

Photo by Jesse Bowser on Unsplash

2020 is getting closer and we live in the information era. I believe no one would disagree that companies have access to an unprecedented amount of data, so managing data assets reliably becomes trickier day after day.

By managing data assets, I mean: finding affordable storage solutions, using the most appropriate and up-to-date information for business performance and marketing analysis, granting people the right access levels, and being compliant with new privacy regulations such as GDPR, HIPAA, and CCPA — only to list a few challenges.

Data management discussions are among the top five topics in corporate conversations today, and I often see many of them as parts of something bigger, called Data Governance. Although interest in Data Governance has grown significantly in the past few years, there is no silver-bullet solution addressing all of its aspects. …

About

Ricardo Mendes

staff software engineer @ ciandt.com • google cloud certified architect • tech writer • father of a young princess • birder
