How do big tech companies share databases across multiple teams?

Time:01-16

How do multiple teams (which own different system components/microservices) in a big tech company share their databases?

I can think of multiple use cases where this would be required. For example, in an e-commerce firm the same product is shared among multiple teams: it first becomes part of the product onboarding service, then maybe the catalog service (which stores all products and categories), then the search service, cart service, order placing service, recommendation service, cancellation & return service and so on.

If they don't share any DB, then:

  1. Do they all keep a redundant copy of the products with the same product ID, and
  2. Wouldn't it be a challenge to keep the data consistent across multiple teams?

I have multiple related doubts in both cases, whether they share a DB or not. I have been through multiple tech blogs and videos on software design and still haven't found a satisfying answer. Please share some resources that give a complete workflow of how things work end-to-end in a big tech firm. Thank you.

CodePudding user response:

In a microservice architecture, each microservice exposes endpoints through which other microservices can access the data it owns. A service therefore stores as little as possible of a record that is managed by another microservice. For example, in an e-commerce system, if the user service wants to fetch the orders for a particular user, the order service exposes an endpoint that takes a user ID and returns all orders for that user. So essentially the only user-related field the order service needs to store is the user ID; the rest of the user details are irrelevant to it.
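As a rough illustration of that idea, here is a minimal in-process sketch. The class and method names (`OrderService`, `UserService`, `orders_for_user`, `profile_with_orders`) are made up for this example, and plain method calls stand in for what would be HTTP or RPC endpoints in a real system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: int
    user_id: int       # the only user-related field the order service keeps
    total_cents: int


class OrderService:
    """Owns order data. Other services call its methods, never its storage."""

    def __init__(self) -> None:
        self._orders: list[Order] = []

    def place_order(self, order_id: int, user_id: int, total_cents: int) -> Order:
        order = Order(order_id, user_id, total_cents)
        self._orders.append(order)
        return order

    def orders_for_user(self, user_id: int) -> list[Order]:
        # Stand-in for an endpoint such as GET /users/{user_id}/orders
        # (hypothetical route, not taken from any real API).
        return [o for o in self._orders if o.user_id == user_id]


class UserService:
    """Owns user details and asks the order service for order data."""

    def __init__(self, order_service: OrderService) -> None:
        self._order_service = order_service
        self._names: dict[int, str] = {}

    def register(self, user_id: int, name: str) -> None:
        self._names[user_id] = name

    def profile_with_orders(self, user_id: int) -> dict:
        # User details come from local storage; orders come from the other
        # service's "endpoint", keyed only by the shared user ID.
        return {
            "user_id": user_id,
            "name": self._names[user_id],
            "orders": self._order_service.orders_for_user(user_id),
        }


if __name__ == "__main__":
    orders = OrderService()
    users = UserService(orders)
    users.register(7, "Ada")
    orders.place_order(1, 7, 1999)
    print(users.profile_with_orders(7))
```

Note that `OrderService` never learns the user's name: the user ID is the only thing the two services have in common.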

To further improve cohesion and understanding between teams, data discovery APIs/documentation are also built to share database metadata with other teams, explaining what each table/field means so that a team can plan out a microservice efficiently. You can read more about how such companies build data discovery tools here

CodePudding user response:

If I understand you correctly, you are unsure how different departments in a company get access to the same data?

The idea is that you create reusable and effective APIs to solve this problem.

Let's say, as a generic example, that the company we're looking at is Walmart. Walmart has millions of items in one or more databases. Each item has a unique ID, etc.

If Walmart is selling items online via walmart.com, they need a way to get those items, so they create APIs and use them to grab items based on certain query conditions.

Now, let's say Walmart has decided to build an app... well, they need those exact same items! Good thing they already created those APIs: the app uses the exact same ones to grab the data.

Now, how does Walmart manage which items are available at which store, and at what price? They would usually link this metadata through additional database tables, tying them all together with primary and foreign keys.

^^ This essentially allows Walmart to grab the item from their CORE database, which holds only the details intrinsic to the item (e.g. name, size, color, SKU, details, etc.), and link it to another database, say that of YOUR local Walmart, which holds the information about that item that is relevant only to your location (e.g. price, stock, aisle number, etc.).
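To make the key-linking idea concrete, here is a small sketch using Python's built-in `sqlite3` module. The table and column names (`item`, `store_item`, `price_cents`, and so on) and the sample rows are invented for illustration, and for simplicity both tables live in one in-memory database rather than in a separate core and per-store database.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(
        """
        -- Core table: details that are the same in every store.
        CREATE TABLE item (
            item_id INTEGER PRIMARY KEY,
            sku     TEXT NOT NULL UNIQUE,
            name    TEXT NOT NULL
        );
        -- Per-store table: details that only make sense for one location.
        CREATE TABLE store_item (
            store_id    INTEGER NOT NULL,
            item_id     INTEGER NOT NULL REFERENCES item(item_id),
            price_cents INTEGER NOT NULL,
            stock       INTEGER NOT NULL,
            aisle       TEXT,
            PRIMARY KEY (store_id, item_id)
        );
        """
    )
    conn.execute("INSERT INTO item VALUES (1, 'SKU-001', 'Blue T-shirt, size M')")
    conn.executemany(
        "INSERT INTO store_item VALUES (?, ?, ?, ?, ?)",
        [(100, 1, 1299, 12, "A4"), (200, 1, 1199, 0, "C1")],
    )
    conn.commit()
    return conn


def item_at_store(conn: sqlite3.Connection, store_id: int, sku: str):
    """Join the core item row with the store-specific row for one store."""
    return conn.execute(
        """
        SELECT i.name, s.price_cents, s.stock, s.aisle
        FROM item AS i
        JOIN store_item AS s ON s.item_id = i.item_id
        WHERE s.store_id = ? AND i.sku = ?
        """,
        (store_id, sku),
    ).fetchone()


if __name__ == "__main__":
    conn = build_demo_db()
    print(item_at_store(conn, 100, "SKU-001"))
    print(item_at_store(conn, 200, "SKU-001"))
```

The same item row is stored once, while each store carries its own price, stock and aisle for it. `item_at_store` returns `None` when a store does not carry the item.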

So yes, in a sense, they are using multiple databases.

Perhaps these will point you down some more roads: https://learnsql.com/blog/why-use-primary-key-foreign-key/ https://towardsdatascience.com/designing-a-relational-database-and-creating-an-entity-relationship-diagram-89c1c19320b2

CodePudding user response:

There's a substantial diversity of approaches used between and even within big tech companies, driven by different company/org cultures and different requirements around consistency and availability.

Any time you have an explicit "query another service/another DB" dependency, you have a coupling which tends to turn a problem in one service into a problem in both services. This isn't necessarily a one-way thing: it's quite possible for the querying service to encounter a problem which cascades into a problem in the queried service. This is especially possible when a cache becomes load-bearing, which has led to major outages at at least one FANMAG in the not-that-distant past.

This has led some companies that could be fairly called big tech to eschew that approach in their service design, typically by having services publish events describing what has changed to a durable log (append-only storage). Other services subscribe to that log and use the events to construct their own eventually consistent view of the data owned by the other service (i.e. there's some level of data duplication, with services storing exactly the data they need to function).
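A toy version of that publish/subscribe pattern might look like the sketch below. Everything here is hypothetical: the `EventLog` is just an in-memory list standing in for a durable log, and `ProductEvent` and `SearchView` are names invented for the example, with the search service keeping only product names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProductEvent:
    kind: str          # "created", "price_changed" or "deleted"
    product_id: int
    data: dict


class EventLog:
    """Append-only log. In production this would be durable storage."""

    def __init__(self) -> None:
        self._events: list[ProductEvent] = []

    def append(self, event: ProductEvent) -> int:
        self._events.append(event)
        return len(self._events) - 1   # offset of the new event

    def read_from(self, offset: int) -> list[ProductEvent]:
        return self._events[offset:]


class SearchView:
    """The search service's own copy: only the fields it needs (names)."""

    def __init__(self, log: EventLog) -> None:
        self._log = log
        self._offset = 0               # how far into the log we have read
        self.names: dict[int, str] = {}

    def catch_up(self) -> int:
        """Apply all unread events and return how many were applied."""
        events = self._log.read_from(self._offset)
        for event in events:
            if event.kind == "deleted":
                self.names.pop(event.product_id, None)
            elif "name" in event.data:
                self.names[event.product_id] = event.data["name"]
            # Events without a name (e.g. price changes) are ignored here.
        self._offset += len(events)
        return len(events)


if __name__ == "__main__":
    log = EventLog()
    view = SearchView(log)
    log.append(ProductEvent("created", 1, {"name": "Kettle", "price_cents": 2500}))
    print(view.names)   # still empty: the view lags until it catches up
    view.catch_up()
    print(view.names)
```

The window between `append` and `catch_up` is where the "eventually consistent" part shows up: the view is briefly stale, but the search service never has to query the product service directly.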
