Confluent
  • Videos: 877
  • Views: 10,324,256
How to resolve issues with your Python Kafka Producers
Learn how to leverage the native monitoring capabilities of the Python Kafka producer, along with Confluent Cloud’s Metrics API, while exploring how linger.ms affects latency and batch sizes (a minimal producer-config sketch follows this listing).
Use the promo code MONITORINGAPPS to get $25 of additional free Confluent Cloud usage: cnfl.io/try-cloud-monitoring-and-troubleshooting-data-streaming-apps
Promo code details: cnfl.io/monitoring-and-troubleshooting-promo-code-details
RELATED RESOURCES
► Confluent Developer: cnfl.io/3VYWxow
► Python Kafka client - cnfl.io/3xxYxuI
► Confluent Cloud Metrics API - cnfl.io/3RK3p6U
► Knight Capital - www.cnbc.com/2012/08/02/the-knight-fiasco-how-did-it-lose-440-million.html
CHAPTERS
00:00 - Intro
00:26 - What is ling...
Views: 435
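
The sketch below shows, as an assumption rather than the video's own code, how linger.ms and the client's built-in statistics feed can be wired up with the confluent-kafka Python client; the broker address, topic name, and interval values are placeholders:

    import json
    from confluent_kafka import Producer

    def print_stats(stats_json):
        # The client invokes this every statistics.interval.ms with a JSON
        # blob of internal metrics (queue depth, per-partition batch stats).
        stats = json.loads(stats_json)
        print("messages in queue:", stats["msg_cnt"])

    producer = Producer({
        "bootstrap.servers": "localhost:9092",  # placeholder
        "linger.ms": 100,                # wait up to 100 ms to build larger batches
        "batch.size": 32768,             # upper bound on a batch, in bytes
        "statistics.interval.ms": 5000,  # emit stats every 5 seconds
        "stats_cb": print_stats,
    })

    for i in range(1000):
        producer.produce("demo-topic", value=f"event-{i}".encode())
    producer.flush()

Raising linger.ms generally trades per-message latency for larger batches and higher throughput, which is the relationship the video explores.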

Videos

Microservice Pitfalls: Solving the Dual-Write Problem | Designing Event-Driven Microservices
Views: 2.6K • 12 hours ago
► LEARN MORE: cnfl.io/microservices-101-module-1 When building a distributed system, developers are often faced with something known as the dual-write problem. It occurs whenever the system needs to perform individual writes to separate systems that can't be transactionally linked. This situation creates the potential for data loss if the developer isn't careful. However, techniques such as the...
Tabs or spaces? Merge vs. rebase? Let’s settle it with Kafka and Node.js
Views: 523 • 15 hours ago
Tabs or spaces? Merge vs. rebase? Flink SQL vs. KStreams? Lucia Cerchie wanted to settle these debates once and for all, so she created a website to let the internet decide: www.lets-settle-this.com/ Let’s Settle This is powered by a new Kafka JavaScript client from Confluent: confluent-kafka-javascript (early access). Find out how Lucia used it to make the website in the video above. RELATED R...
What is a Headless Data Architecture?
Views: 8K • 1 day ago
The headless data architecture. Is it a fad? Some marketecture? Or something real? In this video, Adam Bellemare takes you through the basics of the headless data architecture and why it’s beginning to emerge as its own respective pattern. Driven by the decoupling of data computation from storage, the headless data architecture provides the basis for a modular data ecosystem. Stream your data f...
Using Asynchronous Events to enrich Fraud Detection | Designing Event-Driven Microservices
Views: 1.4K • 14 days ago
► LEARN MORE: cnfl.io/microservices-101-module-1 In this video, you will see an example of how Tributary bank uses asynchronous events to enrich its domain and protect its fraud detection system from failures. To learn more about Microservices, check out my Designing Event-Driven Microservices course on Confluent Developer: cnfl.io/microservices-101-module-1 Relying purely on synchronous reques...
Extracting a Fraud Detection Microservice from a Bank System | Designing Event-Driven Microservices
Views: 952 • 21 days ago
► LEARN MORE: cnfl.io/microservices-101-module-1 In this video, we discuss one way a business can approach decomposing a monolith using a series of clearly defined steps and robust monitoring. To learn more about Microservices, check out my Designing Event-Driven Microservices course on Confluent Developer: cnfl.io/microservices-101-module-1 Decomposing a monolith into a set of microservices wh...
Ignite Series
Views: 182 • 21 days ago
Introducing the Ignite Series! We sat down with some of our inspiring female leaders to get their insights on topics ranging from effective leadership development for women in tech to strategies for re-entering the workforce after a break. ABOUT CONFLUENT Confluent is pioneering a fundamentally new category of data infrastructure focused on data in motion. Confluent’s cloud-native offering is t...
How to Analyze Data from a REST API with Flink SQL
Views: 1.3K • 21 days ago
Join Lucia Cerchie in a coding walkthrough, bridging the gap between REST APIs and data streaming. Together we’ll transform the OpenSky Network's live API into a data stream using Kafka and Flink SQL. To see more data streaming in action, check out the demos on Confluent Developer: cnfl.io/3X9niaV Not only do we change the REST API into a data stream in this walkthrough, but we clean up the dat...
Defining Asynchronous Microservice APIs for Fraud Detection | Designing Event-Driven Microservices
Views: 1.6K • 28 days ago
► LEARN MORE: cnfl.io/microservices-101-module-1 In this video, Wade explores the process of decomposing a monolith into a series of microservices. You'll see how Tributary bank extracts a variety of API methods from an existing monolith. To learn more about Microservices, check out my Designing Event-Driven Microservices course on Confluent Developer: cnfl.io/microservices-101-module-1 Tributa...
Confluent Data Portal: Data Discovery Made Easy
Views: 491 • 28 days ago
Learn how the Data Portal and Apache Flink® in Confluent Cloud can help developers and data practitioners find the data they need to quickly create new data products. Try Data Portal now without worrying about the setup or configuration in Confluent Cloud: cnfl.io/45hx3WN RELATED RESOURCES ► Data Portal on Confluent Cloud - cnfl.io/3R5lADI CHAPTERS 00:00 - Intro 00:21 - Data Portal Overview 02:...
Retrieval Augmented Generation (RAG) with Data Streaming
Views: 2.1K • 1 month ago
How do you prevent hallucinations from large language models (LLMs) in GenAI applications? LLMs need real-time, contextualized, and trustworthy data to generate the most reliable outputs. Kai Waehner, Global Field CTO at Confluent, explains how RAG and a data streaming platform with Apache Kafka and Flink make that possible. RESOURCES ► GenAI hub: www.confluent.io/generative-ai ► Webinar: Build...
Event-Driven Microservices in Banking and Fraud Detection | Designing Event-Driven Microservices
Views: 3.9K • 1 month ago
► LEARN MORE: cnfl.io/microservices-101-module-1 How do we know whether Event-Driven Microservices are the right solution? This is the question that Tributary Bank faced when they looked at modernizing their old fraud-detection system. They were faced with many challenges, including scalability, reliability, and security. Some members of their team felt that switching to an event-driven microse...
Everything you’ve wanted to ask about Event-Driven Architectures | The Duchess & The Doctor Show
Views: 1.2K • 1 month ago
For their inaugural episode, Anna McDonald (the Duchess) and Matthias J. Sax (the Doctor) from Confluent, along with their extinct friend, Phil, wax rhapsodic about all things eventing: you’ll learn why events are a mindset, why the Duchess thinks you’ll find event immutability relaxing, and why your event streams might need some windows. To learn more about fundamentals across the Apache Kafka...
Kafka Summit Bangalore 2024 Keynote | Jay Kreps, Co-founder & CEO, Confluent
Views: 2.7K • 1 month ago
Join the Confluent leadership team as they share their vision of streaming data products enabled by a data streaming platform built around Apache Kafka. Jay Kreps, Co-creator of Apache Kafka and CEO of Confluent, will present his vision of unifying the operational and analytical worlds with data streams and showcase exciting new product capabilities. ABOUT CONFLUENT Confluent is pioneering a fu...
Exactly-Once Processing in Apache Flink
Views: 2.5K • 1 month ago
Learn how Apache Flink® can handle hundreds or even thousands of compute nodes running 24/7 and still produce correct results. Try Flink without worrying about the setup or configuration in Confluent Cloud: cnfl.io/4bhZZiQ RELATED RESOURCES ► Apache Flink 101: cnfl.io/4b206zA ► Streaming Joins in Apache Flink: ruclips.net/video/ChiAXgTuzaA/видео.html ► Original paper on state management in Apac...
Event-Driven Architecture (EDA) vs Request/Response (RR)
Views: 119K • 2 months ago
Confluent Connectors | Fast, frictionless, and secure Apache Kafka integrations
Views: 527 • 2 months ago
confluent investor testimonials
Views: 358 • 2 months ago
How to Unlock the Power of Event-Driven Architecture | Designing Event-Driven Microservices
Views: 8K • 3 months ago
Introducing Gitpod for Confluent Developer
Views: 1.1K • 3 months ago
Set your Data in Motion with Confluent on Google Cloud
Views: 572 • 3 months ago
Streams Forever: Kafka Summit London 2024 Keynote | Jay Kreps, Co-founder & CEO, Confluent
Views: 9K • 3 months ago
The Confluent Q1 ‘24 Launch
Views: 504 • 3 months ago
Confluent Cloud for Apache Flink | Simple, Serverless Stream Processing
Views: 776 • 3 months ago
Apache Flink 1.19 - Deprecations, New Features, and Improvements
Views: 1.2K • 3 months ago
4 Key Types of Event-Driven Architecture
Views: 11K • 3 months ago
How to Evolve your Microservice Schemas | Designing Event-Driven Microservices
Views: 3.6K • 3 months ago
What is a Kafka Consumer and How does it work?
Views: 3.5K • 3 months ago
What are Kafka Producers and How do they work?
Views: 2.9K • 3 months ago
What is the Listen to Yourself Pattern? | Designing Event-Driven Microservices
Views: 7K • 4 months ago

Comments

  • @NamLe-fl4sz • 12 hours ago

    From Viet Nam. Thanks <3

  • @ilijanl • 13 hours ago

    You can actually leverage a legacy DB transaction to publish to Kafka, with some trade-offs. The flow can be the following:
    1. Start a transaction.
    2. Insert into the legacy DB.
    3. Publish to Kafka.
    4. Commit.
    If step 2 or 3 throws, nothing is committed and the whole handler fails, which can be retried later. If 2 and 3 succeed but 4 fails, you have published the event to Kafka without storing it in the DB, so you get at-least-once semantics for publishing. The trade-off, of course, is that your runtime has a dependency on Kafka: if Kafka is down, the transaction can never succeed. That said, Kafka is considered highly available and high-performance, so the problem may be smaller than it seems. (See the sketch below.)
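
    A minimal sketch of this flow, assuming psycopg2 and the confluent-kafka client; the connection string, table, and topic names are placeholders:

        import psycopg2
        from confluent_kafka import Producer

        producer = Producer({"bootstrap.servers": "localhost:9092"})  # placeholder
        conn = psycopg2.connect("dbname=legacy")                      # placeholder DSN
        errors = []

        def on_delivery(err, msg):
            # confluent-kafka reports delivery failures via callbacks, not exceptions
            if err is not None:
                errors.append(err)

        with conn:                          # 1. begin; commits on clean exit (step 4)
            with conn.cursor() as cur:      # 2. write to the legacy DB
                cur.execute("INSERT INTO orders (payload) VALUES (%s)", ("order-123",))
            producer.produce("orders", b"order-123", on_delivery=on_delivery)  # 3. publish
            producer.flush()                # wait for broker acks
            if errors:
                # Raising here rolls back the DB write; if the commit itself fails
                # after a successful flush(), the event is already in Kafka, which
                # is the at-least-once trade-off described above.
                raise RuntimeError(f"Kafka publish failed: {errors[0]}")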

  • @ConfluentDevXTeam • 21 hours ago

    👋 Hey there! Thanks for watching; we hope you found it useful on your journey to building solid data streaming applications in Python. Don’t forget to subscribe if you did, as we release content quite often! If you have any questions or feedback, drop a comment below; we’d love to hear from you! 😊 Also, check out the description for links to related resources. Enjoy! 🎉

  • @vinitsunita • 1 day ago

    What will happen if I have millions of users? How will Flink manage state at that scale?

    • @ConfluentDevXTeam • 1 day ago

      Wade here. Thankfully, scale is something that Flink does very well. You can scale out your jobs, use tools to shuffle and rebalance data across a cluster, etc. Having said that, as you scale, you do have to be more careful. The state needs to be stored somewhere by Flink so you will need to ensure you aren't running out of resources. You can often alleviate this with careful distribution across a cluster. In general, try to avoid storing things that grow in an unbounded fashion. When you do have something that is going to grow, make sure you have a way to partition and scale it across the cluster.
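
      A minimal PyFlink sketch of that partitioning advice (hypothetical stream and names, not from the video): keying the stream shards its state, so each parallel instance holds only the state for its own keys.

          from pyflink.common.typeinfo import Types
          from pyflink.datastream import StreamExecutionEnvironment
          from pyflink.datastream.functions import KeyedProcessFunction, RuntimeContext
          from pyflink.datastream.state import ValueStateDescriptor

          class CountPerUser(KeyedProcessFunction):
              def open(self, runtime_context: RuntimeContext):
                  # State is declared per key; Flink distributes it across the cluster.
                  self.count = runtime_context.get_state(
                      ValueStateDescriptor("count", Types.LONG()))

              def process_element(self, value, ctx):
                  current = (self.count.value() or 0) + 1
                  self.count.update(current)
                  yield value[0], current

          env = StreamExecutionEnvironment.get_execution_environment()
          env.set_parallelism(4)  # state spreads across 4 parallel instances
          events = env.from_collection(
              [("alice", 1), ("bob", 1), ("alice", 2)],
              type_info=Types.TUPLE([Types.STRING(), Types.INT()]))
          events.key_by(lambda e: e[0]) \
                .process(CountPerUser(),
                         output_type=Types.TUPLE([Types.STRING(), Types.LONG()])) \
                .print()
          env.execute("keyed-state-sketch")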

  • @user-bc6kt6ei7m • 2 days ago

    Thank you for this tutorial!

  • @ramannanda • 3 days ago

    crisp intro missing barriers and stuff..

  • @nroelandt • 3 days ago

    Hi Adam, this sounds great in theory and in 'full load' scenarios. What about CDC workloads, where full loads and deltas are separate? The logic and needed compute power (credits) will skyrocket.

    • @ConfluentDevXTeam • 3 days ago

      Adam here. When you create the CDC topic and snapshot the table, set retention to infinite and enable compaction. Your topic will then be an eventually-consistent replica of your database table: whenever you CUD a row in the DB, the latest full set of data is propagated to the topic (think Debezium, with its before/after fields). As a consumer, you can simply materialize the whole topic if you want the whole set of data, or select just the fields your app cares about and discard the rest.

      Basically, you want to do whatever you can to relieve your consumers of having to "reconstruct" the data on their own; merging snapshots and deltas leads to a lot of complexity and work, replicated for every single consumer that wants the data. The trade-off is more data over the wire than with conventional message brokering, but the benefit is a much simpler architecture. Storage is super cheap, especially if you're using cloud storage as a backer for your Kafka topics. Note that this design is becoming even simpler with the adoption of Kafka replication without the network; e.g., Confluent Freight topics use Amazon S3 as both storage AND replication, so that you don't pay cross-AZ fees anymore. Check it out here if you want to know more: www.confluent.io/blog/introducing-confluent-cloud-freight-clusters/
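
      As an illustration of those topic settings, here is a minimal sketch using the confluent-kafka Python AdminClient; the topic name, partition count, and broker address are hypothetical:

          from confluent_kafka.admin import AdminClient, NewTopic

          admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # placeholder

          # cleanup.policy=compact keeps the latest record per key;
          # retention.ms=-1 disables time-based expiry, so the topic remains an
          # eventually-consistent replica of the source table.
          topic = NewTopic(
              "customers.cdc",  # hypothetical CDC topic name
              num_partitions=6,
              replication_factor=3,
              config={"cleanup.policy": "compact", "retention.ms": "-1"},
          )

          futures = admin.create_topics([topic])
          futures["customers.cdc"].result()  # raises if creation failed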

  • @darwinmanalo5436 • 3 days ago

    So instead of manually sending events to Kafka, we save the events to the database first. Then, there is a CDC tool that detects updates and automatically sends them to Kafka? Another tool adds another layer of complexity. Event Sourcing is quite complex, so people should carefully consider if it's the right tool for the project before implementing it. I wish these inconsistencies/issues were already solved in Kafka itself, not by us. P.S. The presentation is well-explained though. Wade is a good teacher.

    • @ConfluentDevXTeam • 3 days ago

      Wade here. CDC is entirely optional. If you already have a CDC system in place, it can assist you in this process. But if you don't have CDC in place, and don't want to introduce it, you can write some code in your microservice to do this. Unfortunately, this isn't a problem you can solve in Kafka. That's in part because it's not a Kafka problem; it's a distributed systems problem that can exist in any system. You can encounter the exact same problem when saving to a file and sending an email. The other issue is that in this specific instance, Kafka can't solve the problem because Kafka doesn't know about it, i.e. the situation outlined is what happens when Kafka doesn't get the message. How could Kafka solve the problem when it doesn't even know the message exists? To understand the problem in more depth, check out www.confluent.io/blog/dual-write-problem/

    • @darwinmanalo5436 • 3 days ago

      @ConfluentDevXTeam I got your point. Thanks for the reply, Wade.

  • @MrGoGetItt • 3 days ago

    Exceptional content delivery! Not only were you articulate, but the visuals were an excellent aid. Great work

    • @ConfluentDevXTeam • 3 days ago

      Wade here. Thanks. I'm glad you enjoyed it. On the visual side, I've started working a bit more with animations to try to assist in focusing the eyes where they need to be. It sounds like that has been effective.

  • @petermoskovits8470 • 3 days ago

    At 1:25 you cover in great detail how to address the problem when the Kafka write fails and the DB write succeeds. How about the other way around? What if the Kafka write succeeds, and the DB write fails?

    • @ConfluentDevXTeam • 3 days ago

      Wade here. There's an implicit assumption in the video about order. I am assuming that your order of operations is 1. Write to the Database, 2. Write to Kafka. With that assumption in mind, if the write to the database fails, then presumably the operation aborts, and the write to Kafka never happens. This scenario is fine and we don't have to do anything. Now, if you reverse the order of operations and do the write to Kafka first, then we are back to the same problem and have to investigate the same solutions. You can find a deeper discussion on this here: developer.confluent.io/courses/microservices/the-dual-write-problem/. There's also a separate edge case that I haven't discussed. What happens if the write to the database times out? In that case, it's actually possible that the write succeeded, but the code will view it as a failure and the write to Kafka won't proceed, again resulting in an inconsistency. To be safe, it's best to deal with the dual-write problem upfront so we can avoid these edge cases.
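
      A self-contained sketch of that ordering, using sqlite3 as a stand-in database; the topic, table, and broker address are hypothetical:

          import sqlite3
          from confluent_kafka import Producer

          producer = Producer({"bootstrap.servers": "localhost:9092"})  # placeholder
          db = sqlite3.connect("app.db")
          db.execute("CREATE TABLE IF NOT EXISTS events (payload TEXT)")

          def handle(payload: str) -> None:
              # 1. Write to the database first. If this raises, we abort and
              #    Kafka never sees the event - the benign failure mode.
              with db:
                  db.execute("INSERT INTO events (payload) VALUES (?)", (payload,))
              # 2. Publish only after a confirmed commit. The timeout edge case
              #    above still applies: a commit that times out may actually have
              #    succeeded, so this ordering reduces, but does not eliminate,
              #    the dual-write problem.
              producer.produce("events", payload.encode())
              producer.flush()

          handle("ride-requested-42")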

  • @xxXAsuraXxx • 4 days ago

    Nice. Outbox will do

    • @ConfluentDevXTeam • 3 days ago

      Wade here. Thanks, glad you enjoyed the content. Though, in this case, although the Outbox will work, I feel that Event Sourcing might be a better solution.

  • @user-ev9jg6ts6e • 4 days ago

    Nice one. Thanks Wade.

  • @ConfluentDevXTeam • 4 days ago

    Wade here. This is my second video on the Dual-Write Problem. The previous video generated a lot of interesting comments and discussions. I'd love for the same to happen here. If you haven't watched the previous video, you can find it linked in the description. But if you are still confused about the Dual-Write problem, drop me a comment and I'll do my best to try and clarify.

    • @marcom. • 3 days ago

      Hi Wade. Thanks a lot for your videos. I have one thing that you could probably address: I read somewhere that, if you use CDC (e.g. Debezium), there is no need to keep the events in the outbox table at all. It is sufficient to save and immediately delete them, because CDC only sees the changelog of the DB, reacts to the insert, and ignores the delete. Right?

  • @shinypants2204 • 6 days ago

    Great content! Thanks for putting this together

  • @thisismissem • 7 days ago

    The most impressive thing about this video is he's writing everything backwards on that glass board he's behind 😮

  • @ConfluentDevXTeam • 8 days ago

    Hey there, it’s Lucia! If you’re a JavaScript developer who’s new to Kafka, I highly recommend checking out the confluent-kafka-javascript client! It’s in early access, and you can find resources on getting started with it here: github.com/confluentinc/confluent-kafka-javascript

  • @jarrodhroberson • 8 days ago

    Congratulations, you rediscovered client-server architecture and just confusingly renamed it "headless". By your definition, every RDBMS is "headless".

    • @ConfluentDevXTeam • 8 days ago

      Adam here. It seems you're missing some key elements:
      1) A traditional RDBMS bundles processing with storage. In headless, the storage is completely separate from the processing and query layers. I do not know of any RDBMS that lets you access its underlying storage (e.g., a B-tree) directly, but if there is one, I would be keen to find out.
      2) You don't have the risk of one query saturating your data layer and stalling other queries, as you do in an RDBMS. However, this relies on most data layers being served via massive cloud storage (R2, GCS, Azure Blob, S3, etc.), not by you running your own data layer on an in-house HDFS.
      3) Client-server embeds business logic inside the server for the client to call, creating tight coupling between the two. In HDA, there are no smarts in the data layer; it's just data created by the upstream services. If your server only provided raw data via a GET, and writes via a PUT/POST, but had absolutely no other functionality whatsoever, then you could equate it to a headless model. That's pretty much what Iceberg and Kafka do, with a bit of cleanup optimization sprinkled in.

  • @maf_aka • 9 days ago

    How can I model branched state, though? For example, if I add a payment flow between the Ride Requested and Ride Scheduled events, the payment result can trigger either a Ride Scheduled if successful or a Ride Cancelled otherwise. If the decider function only throws an error without evolving the state, how would the system know the ride no longer needs to be scheduled?

  • @puneetbhatia2326 • 10 days ago

    So basically, the system would allow at least one fraudulent transaction to go through per account before it can mark the account as compromised. That can be an expensive proposition for the bank if these fraudulent transactions are high $$ value. Thoughts?

    • @ConfluentDevXTeam • 10 days ago

      Wade here. Depends on the final implementation details of the system. They certainly could have an initial layer of basic checks that run synchronously (see other comments). This would be looking for obvious things that can be detected on a single transaction. But, the reality is that most fraud doesn't show up on a single transaction. A lot of fraud detection happens over multiple transactions. As a result, the nature of the problem dictates that they have to let through some fraudulent transactions because they can't detect it until they have seen a pattern develop. And yes, if the $$ amount is high, that could be a problem. That is why many banks have transaction limits. They have decided what level of risk they are willing to take and set the limit accordingly. There's a balance to be struck. The bank has to decide where they want to strike that balance. Would they rather have valid transactions fail (or timeout) or potentially let through more fraud? Both situations are costly. But which one costs more?

  • @puneetbhatia2326 • 10 days ago

    I don’t understand the proposed asynchrony in the fraud detection system with Kafka in the middle. What good is the fraud detection system if it doesn’t stop/prevent fraud on the transaction being serviced? I would have expected a synchronous request-response framework. No?

    • @ConfluentDevXTeam • 10 days ago

      Wade here. You probably want to take some time and watch the rest of the video series here: developer.confluent.io/courses/microservices In particular, you might find the most recent video helpful (and the comments below it): developer.confluent.io/courses/microservices/case-study-asynchronous-events/ But the gist of it is that Fraud Detection takes time. You generally can't look at a single transaction and say "oh, that's fraud." You have to look at patterns of transactions over time. Eventually, you develop enough information that you can start to say "Ah, something weird is happening here" and at that point, you lock the account and prevent future transactions.

  • @an2ny2100 • 10 days ago

    Good explanation, though in terms of a bank transaction, as a customer I'd prefer the transaction to be detected as fraud rather than pushed through. But hey, everything comes down to preference, which is why we have a lot of software accomplishing the same thing in different ways :-)

    • @WadeWaldron-Confluent • 10 days ago

      Wade here. The thing is, it may not be as simple as detect fraud or don't detect fraud. It's a balance. Yes, you obviously want to detect fraud as accurately as possible. But your bank also needs to be able to give people money when they ask for it. Financial systems, payment processors, etc, have very strict limits on how long they can take. If running fraud detection exceeds those limits, all transactions will be rejected, whether they are fraudulent or not. If that happens, the fraud detection is going to be largely irrelevant because the bank won't have any customers left. Having said that, in a real system, they can definitely have layers of fraud detection. Do the fast stuff in real-time, and leave the slower stuff for an async process.

    • @an2ny2100 • 10 days ago

      @WadeWaldron-Confluent Alright, now I know a bit more about banking systems.

  • @uchechukwumadu9625 • 10 days ago

    Why not just call it a data lakehouse architecture like others are doing?

    • @ConfluentDevXTeam • 10 days ago

      Adam here. I covered that here: ruclips.net/video/dahs6jpCYQs/видео.html Headless is the full decoupling of data access from processing _of all forms_, providing reusable data assets for anywhere in the company, not just for analytics use cases. To emphasize: a data lake, lakehouse, or lake-like warehouse is an analytics construct first and foremost. They're also predicated on the notion that you must copy all the data into the data lake's bronze layer "as-is". From there, you add schemas, formatting, and structure, leading to a silver layer (another copy). Then you can start doing work on it.

      The problems: 1) You don't own the data. The source breaks, your pipeline breaks, and then you're forced to react to it, determine impact/contamination, reprocess data sets, etc. (Did this for 7-8 years myself; no thanks.) 2) It's stuck in your data lake. All that work you did to convert it Source->Bronze->Silver is only usable if you use the data lake. _Historically_, leading data lake providers have been happy to give you the best performance ONLY if you use their query engines; using an external engine (if even compatible) would lead to far worse performance. Data lakehouse/warehouse providers were more than happy to lock you in on the data front, because they made big $$$ on it.

      But happily, this is starting to change due to the adoption of open-source formats that you can run yourself in house - you can see it in the growing adoption of Iceberg (Databricks bought Tabular, Iceberg's co-creators; Snowflake is investing heavily in Iceberg and open-sourcing their Polaris catalog). Data lake providers _could_ decide NOT to adopt these open formats, but then they risk losing their business to those who have - so the result is that most players are letting go of their control over storage so they can take on Iceberg/Delta/Hudi-compatible workloads they may not have had access to before.

      If you want a quick mental shortcut for how this is different: a headless data architecture lets you plug your data into your data lake _at the silver layer_. The data is well-formed, controlled, schematized, and has designated ownership. But you can also plug that same data into a DuckDB instance, or into BigQuery, or Flink, or any other Iceberg-compatible / Kafka-compatible consumer endpoint. The idea is that you've decoupled the creation of the data plane from "analytics-only" concerns, and instead focused on building modular, reusable building blocks that can power any number of data lakes, warehouses, or swamps, in addition to operational systems.

  • @marcom. • 10 days ago

    I don't get the point of this video, I must admit. If we build modular architectures with bounded contexts, each with its own data, loosely coupled with EDA - why should I want something that sounds like the exact opposite?

    • @ConfluentDevXTeam • 10 days ago

      Adam here. I'm not advocating removing bounded contexts, putting all the data into a big mixed pile, and tightly coupling all your systems together. A headless data architecture promotes the data that should be shared into an accessible format - in this version, the stream or the table. If you already have EDA with well-formed and schematized streams you're halfway there. The table component is an extension, where you take business data circulating in your streams and materializing it into an Iceberg table - but note that we didn't shove it into some data lake somewhere for just the lake to use. It remains outside the lake, and is _pluggable_ into whatever lake, warehouse, operational system, SaaS app, or client that needs it. This pluggability forgoes copying, so that you don't need to build pipelines and copy data. The gist is that you focus on building a data layer that promotes access and reuse - something that comes for free with a Kafka-based EDA, but that has historically struggled for tables due to the general approach of dumping it all in a data lake to sort it out later.

  • @danielthebear • 11 days ago

    I love Iceberg, but I probably would not apply this architecture when data is distributed across different cloud providers, because each query that goes across cloud providers will incur high latency and generate egress costs - costs that are difficult to predict. Furthermore, the CAP theorem applies when data is distributed. What are your thoughts on those three points?

    • @LtdJorge • 10 days ago

      Well, the team building your architecture could abstract it below the public API. If you query data from BigQuery, make the system do all processing on GCP, and so on. However, if you're trying to join/aggregate data from different clouds, then yeah, I guess you're out of luck. Or you could make a query engine that is architecture-aware and takes into account where the data is, the potential egress/ingress, etc. as costs for the query planner, and then tries to push down as many operations as possible, so that you only send the most compact, already-processed data over the internet, instead of the entire projection.

    • @ConfluentDevXTeam • 10 days ago

      Adam here. Inter-cloud costs remain a factor, but typically I wouldn't expect to see a single query federated across clouds WITHOUT taking into consideration data locality. For example, issue separate queries for each cloud, aggregate locally, then bring those results to a single cloud for final aggregation (basically a multi-cloud Map-Reduce - thanks Hadoop!). Speed also remains a factor, as you pointed out, due to CAP theorem. There is no free lunch, so if you're going with a global, multi-cloud, distributed data layer, then yeah, you should probably invest in some tooling to prevent your users from shooting themselves in the foot with a $50k per query bill.

  • @ConfluentDevXTeam • 11 days ago

    Hey, Adam here. Plugging your data into the processing and query heads of your choice is a significant benefit of the headless data architecture. Let me know what heads you make the most use of, and what pain points you have!

  • @ConfluentDevXTeam • 11 days ago

    Try www.confluent.io today, the cloud-native Data Streaming Platform to set your data in motion.

  • @Mahi47XI • 11 days ago

    Too much theory, not enough explanation.

  • @ericsphone6915 • 11 days ago

    I believe it's "All that glitters is not gold".

    • @ConfluentDevXTeam • 11 days ago

      That's the common quote, yes. However, in this case, Danica is correctly quoting from J.R.R. Tolkien. www.goodreads.com/quotes/229-all-that-is-gold-does-not-glitter-not-all-those

  • @GamingAmbienceLive • 12 days ago

    i call crows kafkas cute name

  • @samwei51 • 12 days ago

    Thanks for the explanation. I would like to point out a few places where it sounded vague and unclear to me:
    1. The arrows at 3:09 are intertwined and hard to follow.
    2. At 8:12, it sounds like the P0, P1, P7 messages are enqueued into the coordinator queue all at once, but I wonder if they are progressively enqueued as the transaction proceeds.
    3. At 10:21, "the broker also returns a little bit of metadata (for message #57) so the consumer can ignore the aborted record" - but the abort message (#61) has not been read yet. It's unclear to me how Kafka includes the info "message #57 has been aborted" in the metadata.
    Maybe I'm just being ignorant, as I'm not experienced with Kafka. Any illumination would be appreciated.

  • @himanshuthakur8635 • 12 days ago

    Great information!

  • @tomasselnekovic • 13 days ago

    I don't understand why you are in such a hurry when explaining something like this. This video could be really good, but it should be redone, because the pace of the explanation unfortunately makes it useless.

  • @cu7695 • 14 days ago

    How do you add business domain metadata? Is it supported via Terraform?

    • @ConfluentDevXTeam • 12 days ago

      Hi, Gilles from Confluent here. You can add business domain metadata via the UI or the API, or you can use our Terraform provider. Check out my video to see how to do that: ruclips.net/video/Vg7k_3vlC3Q/видео.html.

  • @YoyoMoneyRing • 15 days ago

    It's a 1.25x-speed video already.

  • @Zmey5656 • 15 days ago

    Thank you, very interesting explanation about using asynchronous events

    • @ConfluentDevXTeam • 15 days ago

      Wade here. You are welcome. Glad you enjoyed it.

  • @chalequin • 15 days ago

    Great video, enjoying this series. When it comes to transaction approval, there is the transactional part (no pun intended), where the system has to decide within ~100 milliseconds whether it goes through or not. Here we must pack a rule-based solution inside the monolith. Then, for higher-latency ML models, the microservice takes over. I haven't seen or implemented the async approach myself, but it feels like the right system design to me.

    • @ConfluentDevXTeam • 15 days ago

      Wade here. Yeah, that's more or less my thinking as well. Simple rules that can be executed fast can be used for a quick sanity check on the transaction. Async events for a more detailed analysis. The key is to recognize that sometimes things need to be done synchronously, but that doesn't mean everything needs to be synchronous.

  • @user-ev9jg6ts6e • 16 days ago

    The problem with event-driven architecture is that business almost always wants things to be strongly consistent. In the example, transaction processing is not strongly consistent with fraud detection. I mean a transaction can be processed, but detected as fraud after several milliseconds. So a compensation mechanism needs to come into play and business people usually do not like this :)

    • @ConfluentDevXTeam • 16 days ago

      Wade here. Absolutely. Within our industry, there is a tendency to assign strong consistency to processes that aren't actually strongly consistent and don't really require it. That is a hurdle to overcome. And it's not to suggest that you can always overcome it. Sometimes, the business wants strong consistency, and you can't do anything to change their mind. However, we should avoid assuming that everything must be strongly consistent. Start with async events. Fallback to other techniques if required. And make sure the business understands the consequences. If making the operation strongly consistent results in a 10x increase in latency (as an example), is that really what the business (or customer) wants?

    • @user-ev9jg6ts6e • 16 days ago

      @ConfluentDevXTeam Totally agree.

  • @user-ev9jg6ts6e • 16 days ago

    Excellent as always. Thanks, Wade.

  • @ConfluentDevXTeam • 16 days ago

    Wade here. This is an interesting topic for me. Having built applications that use Async messages as their foundation, I'm sold. I've seen so many amazing benefits from it. But it's sometimes hard to explain those benefits to people who haven't tried it. So I am curious. Have you used async events? Have you seen some of the benefits I outline in the video? Have you seen challenges? Have you encountered issues in synchronous communication that you think might be solved using these techniques?

  • @dswork-wh2yf • 17 days ago

    Thank you for the clear explanation

  • @cu7695 • 17 days ago

    A lot of orgs need to hear this.

    • @ConfluentDevXTeam • 17 days ago

      Wade here. I'm glad you got value out of it. I'm curious what aspect you found so important that you feel other orgs need to hear it.

  • @szymonbernasiak7526 • 17 days ago

    For those who cannot find the API credentials: 1. Select your environment (probably 'default'). 2. You'll see the "endpoint" in the bottom-right corner. This is what you're looking for.

  • @cpthermes3703 • 18 days ago

    Actually such a great explanation.

  • @oscarluizoliveira • 19 days ago

    Congratulations on the content, Adam. This area of technology of ours is very complicated, with many of the same terms having different meanings depending on the context; I'm doing postgraduate studies in software architecture and it drives me crazy. While I have you: when we talk about an ephemeral form without retention, are we referring to a pattern where events are processed in real time and are not stored for later consultation? Is that it?

    • @ConfluentDevXTeam • 18 days ago

      Ephemeral means that there is no indefinite persistence of the events. It can vary from "all events are only in memory" (and therefore vulnerable to loss on power failure) to "events are persisted to disk, but are deleted after a short period of time" (seconds, minutes, or perhaps after consumption). For example, a consumer may only receive a message if it is online at the time the producer writes it to the broker; a consumer coming online later is unable to access the historic events because they are no longer stored anywhere.