In Data Engineer Things, by Matheus Jericó: "Cloud-Native Data Engineering: Orchestrating Spark on Kubernetes with Custom Airflow Operator and…". Deploying Apache Airflow on Kubernetes with Custom Airflow Operator integrated with Apache Spark and Google Cloud Storage. Nov 9, 2024.
In TDS Archive, by 💡Mike Shakhomirov: "A Guide To Data Pipeline Testing with Python". A gentle introduction to unit testing, mocking and patching for beginners. Mar 9, 2024.
In The Modern Scientist, by janmeskens: "Data Ingestion — Part 2: Tool Selection Strategy". This article is the second one in my series on data ingestion. For an introduction to the topic and to explore ‘data ingestion patterns’… Jan 17, 2024.
In The Modern Scientist, by janmeskens: "Data Ingestion — Part 1: Architectural Patterns". Over the course of two articles, I will thoroughly explore data ingestion, a fundamental process that bridges the operational and… Nov 27, 2023.
In TDS Archive, by Evan Heitman: "A Neanderthal’s Guide to Apache Spark in Python". Tutorial on Getting Started with Spark for Complete Beginners. Jun 14, 2019.
In Blog Técnico QuintoAndar, by Felipe Miquelim: "From Traditional BI to Lake House, a Data Architecture Evolution". How QuintoAndar evolved its Data Platform to a Distributed Strategy. Jul 10, 2023.
In TDS Archive, by Tuan Nguyen: "How to Deploy dbt to Production using GitHub Actions". How to (and not to) deploy dbt to production. Aug 21, 2021.
In TDS Archive, by Romain Granger: "How to Use Partitions and Clusters in BigQuery Using SQL". Optimize your costs and speed up your queries. Jun 1, 2022.
In TDS Archive, by Borna Almasi: "How to Measure Data Quality". 13 Metrics You Should Be Tracking (But Don’t). Mar 15, 2021.
In SeattleDataGuy By SeattleDataGuy, by Ben Rogojan: "Why Building Data Reliability Systems Is Hard". Scaling Out Your Data Quality System. Apr 6, 2022.
In TDS Archive, by Jon Loyens: "How Should We Be Thinking about Data Lineage?" Get a top-down view of your data and analytics ecosystem with comprehensive lineage. Mar 22, 2022.
In Geek Culture, by Alvin Lee: "The Rise of the Data Reliability Engineer". Emerging Opportunities in Data. Mar 22, 2022.
In TDS Archive, by Eric Broda: "Data Mesh Patterns: Change Data Capture". Data Mesh uses the Change Data Capture (CDC) pattern to move data reliably around the enterprise. This is a deep dive on the CDC pattern. Jan 26, 2022.
In Tributary Data, by Dunith Danushka: "8 Practical Use Cases of Change Data Capture". How to apply the practices of Change Data Capture to reliably move data from operational databases to other systems for other purposes. Jun 20, 2021.
In SeattleDataGuy By SeattleDataGuy, by Ben Rogojan: "Can Palantir Take Over The Entire Data Stack". Or Is It Too Late To The Modern Data Stack Party. Mar 1, 2022.
In TDS Archive, by Elena Goydina: "The Missing Piece of Data Discovery and Observability Platforms: Open Standard for Metadata". What could work for true data democratization at scale? Aug 3, 2021.
In TDS Archive, by Stephanie Shen: "7 Steps to Ensure and Sustain Data Quality". Several years ago, I met a senior director from a large company. He mentioned the company he worked for was facing data quality issues… Jul 29, 2019.
In TDS Archive, by Piethein Strengholt: "Data domains and data products". Practical guidance from the field. Nov 23, 2021.
In TDS Archive, by Barr Moses: "7 Questions to Ask When Building Your Data Team". When it comes to ensuring your team is happy and productive, answering these questions will make all the difference. Nov 18, 2021.
In TDS Archive, by Prukalpa: "Modern Data Stack Conference (MDSCON) 2021: The Top 5 Takeaways You Should Know". Pro tips on growing data participation, protecting your data, increasing diversity, and more. Oct 23, 2021.