DataExpert-io/data-engineer-handbook
Fork: 3932 Star: 23064 (updated 2024-12-15 10:59:11)
license: none
Language: Jupyter Notebook
This is a repo with links to everything you'd ever want to learn about data engineering
The Data Engineering Handbook
This repo has all the resources you need to become an amazing data engineer!
Getting started
If you are new to data engineering, start by following this 2024 breaking into data engineering roadmap
If you are here for the 6-week free YouTube boot camp you can check out
For more applied learning:
- Check out the projects section for more hands-on examples!
- Check out the interviews section for more advice on how to pass data engineering interviews!
- Check out the books section for a list of high-quality data engineering books
- Check out the communities section for a list of high-quality data engineering communities to join
- Check out the newsletter section to learn via email
Resources
Great list of over 25 books
Top 3 must read books are:
- Fundamentals of Data Engineering
- Designing Data-Intensive Applications
- Designing Machine Learning Systems
Great list of over 10 communities to join:
Top must-join communities for DE:
Top must-join communities for ML:
Companies:
- Orchestration
- Data Lake / Cloud
- Data Warehouse
- Data Quality
- Education Companies
- Analytics / Visualization
- Data Integration
- Semantic Layers
- Cube
- dbt Semantic Layer
- Modern OLAP
- LLM application library
- Real-Time Data
Data Engineering blogs of companies:
- Netflix
- Uber
- Databricks
- Airbnb
- Amazon AWS Blog
- Microsoft Data Architecture Blogs
- Microsoft Fabric Blog
- Oracle
- Meta
- Onehouse
Data Engineering Whitepapers:
- A Five-Layered Business Intelligence Architecture
- Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics
- Big Data Quality: A Data Quality Profiling Model
- The Data Lakehouse: Data Warehousing and More
- Spark: Cluster Computing with Working Sets
- The Google File System
- Building a Universal Data Lakehouse
- XTable in Action: Seamless Interoperability in Data Lakes
- MapReduce: Simplified Data Processing on Large Clusters
- Tidy Data
Social Media Accounts
Here's a mostly comprehensive list of data engineering creators (you need at least 5k followers somewhere to be added):
Name | YouTube | LinkedIn | X/Twitter | Instagram | TikTok |
---|---|---|---|---|---|
Zach Wilson | Data with Zach (70k+) | Zach Wilson (400k+) | EcZachly (30k+) | eczachly (150k+) | @eczachly (70k+) |
Shashank Mishra | E-learning Bridge (100k+) | Shashank Mishra (100k+) | |||
Seattle Data Guy | Seattle Data Guy (100k+) | Ben Rogojan (100k+) | SeattleDataGuy (10k+) | ||
TrendyTech | TrendyTech (100k+) | Sumit Mittal (100k+) | |||
Darshil Parmar | Darshil Parmar (100k+) | Darshil Parmar (100k+) | |||
Andreas Kretz | Andreas Kretz (100k+) | Andreas Kretz (100k+) | learndataengineering (5k+) | ||
ByteByteGo | ByteByteGo (1m+) | Alex Xu (100k+) | alexxubyte (100k+) | ||
The Ravit Show | The Ravit Show (100k+) | ||||
Guy in a Cube | Guy in a Cube (100k+) | ||||
Adam Marczak | Adam Marczak (100k+) | ||||
nullQueries | nullQueries (100k+) | ||||
TECHTFQ by Thoufiq | TECHTFQ by Thoufiq (100k+) | ||||
SQLBI | SQLBI (100k+) | Marco Russo (50k+) | marcorus (10k+) | ||
Azure Lib | Azure Lib (10k+) | Deepak Goyal (100k+) | |||
Prashanth Kumar Pandey | ScholarNest (77k+) | Prashanth Kumar Pandey (37K+) | |||
Advancing Analytics | Advancing Analytics (10k+) | Simon Whiteley (10k+) | |||
Kahan Data Solutions | Kahan Data Solutions (10k+) | ||||
Ankit Bansal | Ankit Bansal (10k+) | Ankit Bansal (50k+) | |||
Mr. K Talks Tech | Mr. K Talks Tech (10k+) | ||||
Li Yin | Li Yin (10k+) | ||||
Jaco van Gelder | Jaco van Gelder (10k+) | ||||
Joseph Machado | Joseph Machado (10k+) | startdataeng (5k+) | |||
Eric Roby | Eric Roby (10k+) | ||||
Simon Späti | Simon Späti (10k+) | ||||
Dipankar Mazumdar | Dipankar Mazumdar (5k+) | ||||
Daniel Ciocirlan | Daniel Ciocirlan (5k+) | ||||
Hugo Lu | Hugo Lu (5k+) | ||||
Tobias Macey | Tobias Macey (5k+) | ||||
Marcos Ortiz | Marcos Ortiz (5k+) | ||||
Julien Hurault | Julien Hurault (5k+) | ||||
Alex Freberg | Alex The Analyst (100k+) | Alex Freberg (100k+) | @alex_the_analyst (10k+) | ||
Marc Lamberti | Marc Lamberti (50k+) | ||||
Chip Huyen | Chip Huyen (250k+) | ||||
Alex Merced | Alex Merced Data | Alex Merced (30k+) | @amdatalakehouse | @alexmercedcoder | |
John Kutay | John Kutay | John Kutay (5k+) | @JohnKutay | ||
Lakshmi Sontenam | Lakshmi Sontenam (9.5k+) | ||||
Hassaan Akbar | Hassaan Akbar (5k+) | ||||
Samuel Focht | Python Basics (10k+) | ||||
Constantin Lungu | Constantin Lungu (10k+) | ||||
Ijaz Ali | Ijaz Ali (24K+) | ||||
Subhankar | Subhankar (5k+) | ||||
Ankur Ranjan | Big Data Show (100k+) | Ankur Ranjan (48k+) | |||
Lenny | Lenny A (6k+) | ||||
Mehdi Ouazza | Mehdio DataTV (3k+) | Mehdi Ouazza (20k+) | mehd_io | @mehdio_datatv | |
ITVersity | ITVersity (67k+) | Durga Gadiraju (48k+) | |||
Arnaud Milleker | Arnaud Milleker (7k+) | ||||
Soumil Shah | [Soumil Shah](https://www.youtube.com/@SoumilShah) (50k) | Soumil Shah (8k+) | |||
Ananth Packkildurai | Ananth Packkildurai (18k+) | ||||
Dan Kornas | dankornas (66k+) | ||||
Nitin | https://www.linkedin.com/in/tomernitin29/ | ||||
Manojkumar Vadivel | Manojkumar Vadivel (12k+) |
Great Podcasts
- The Data Engineering Show
- Data Engineering Podcast
- DataTopics
- The Data Engineering Side Of Data
- DataWare
- The Data Coffee Break Podcast
- The Data Stack Show
- Intricity101 Data Sharks Podcast
- Drill to Detail with Mark Rittman
- Analytics Power Hour
- Catalog & Cocktails
- Datatalks
- Data Brew by Databricks
- The Data Cloud Podcast by Snowflake
- What's New in Data
- Open||Source||Data by DataStax
- Streaming Audio by Confluent
- The Data Scientist Show
- MLOps.community
- Monday Morning Data Chat
- The Data Chief
Great list of 20+ newsletters
Top must follow newsletters for data engineering:
Glossaries:
- Data Engineering Vault
- Airbyte Data Glossary
- Data Engineering Wiki by Reddit
- Secoda Glossary
- Glossary Databricks
- Airtable Glossary
- Data Engineering Glossary by Dagster
Design Patterns
- Cumulative Table Design
- Microbatch Deduplication
- The Little Book of Pipelines
- Data Developer Platform
Courses / Academies
- DataExpert.io course (use code HANDBOOK10 for a discount!)
- LearnDataEngineering.com
- Technical Freelancer Academy (use code zwtech for a discount!)
- IBM Data Engineering for Everyone
- Qwiklabs
- DataCamp
- Udemy Courses from Shruti Mantri
- Rock the JVM teaches Spark (in Scala), Flink and others
- Data Engineering Zoomcamp by DataTalksClub
- Efficient Data Processing in Spark
- Scaler
- DataTeams - Data Engineer hiring platform
- Udemy Courses from Daniel Blanco
Certification Courses
- Google Cloud Certified - Professional Data Engineer
- Databricks - Certified Associate Developer for Apache Spark
- Databricks - Data Engineer Associate
- Databricks - Data Engineer Professional
- Exam DP-203: Data Engineering on Microsoft Azure
- Microsoft Fabric Analytics Engineer Associate
- AWS Certified Data Engineer - Associate
Topics:
apachespark, awesome, bigdata, data, dataengineering, sql