I am currently in a situation where I think of redesigning one of my classic relational database tables into a non-relational design and I am not sure if I should do it or not. Reason is that performance problems simply got out of control.
This table has 11 columns and it is designed like this:
Id bigint (PK) -- Clustered Index
FK1 bigint (FK) -- Non-Clustered Unique Composite Index ( FK1_FK2_FK3 in this order )
FK2 bigint (FK) -- Non-Clustered Unique Composite Index ( FK1_FK2_FK3 in this order )
FK3 bigint (FK) -- Non-Clustered Unique Composite Index ( FK1_FK2_FK3 in this order )
Value1 nvarchar(100)
Value2 nvarchar(100)
Value3 nvarchar(100)
Value4 nvarchar(100)
Value5 nvarchar(100)
Value6 nvarchar(100)
Value7 nvarchar(100)
Value8 nvarchar(100)
Here are some facts:
- database table has ~800.000.000 records and increasing faster than expected
- very low to low number of requests but high workload per request
5% of the requests look like this:
SELECT *
FROM myTable
WHERE Id = 12346 -- (works perfectly)
SELECT *
FROM myTable
WHERE Id IN (123456, 654321) -- (works OK because IN list contains only a small number of IDs)
UPDATE myTable
SET .....
WHERE Id = 123456 (works perfectly)
Unfortunately 95% of the requests look like this:
SELECT *
FROM myTable
WHERE Fk1 = 123456 AND FK2 = 654321
-- works badly because it gets 100.000 - 300.000 records but I need all of them. Yes, unique index is used because order of index is correct )
UPDATE myTable
SET Value1 = '1', Value2 = '2', Value3 = '3', Value 4 = '4',
Value5 = '5', Value6 = '6', Value7 = '7', Value8 = '8'
WHERE Fk1 = 123456 AND FK2 = 654321 -- horrible because also 300.000 and yes, unique index is used because order of index is correct )
Instead I would like to design it like that:
Id1 bigint (PK) -- Clustered Composite Index (former FK1 column)
Id2 bigint (PK) -- Clustered Composite Index (former FK2 column)
ContentColumn JSON -- Contains all former Value columns and the FK3 column as an array of objects. A object is column FK3, Value1, Value2 ....
ArrayLength INT -- length of json array
- Former FK1 and Fk2 are now a clustered unique composite PK because 95% are selecting and updating on that
- This means of course when I run in my 5% situation I have to load more than I actually need but in my opinion it is still worth
- I have to know of course in which Json FK3 is before request but I do that because of some other logic
- I also do not necessary need the composite unique constraint together with FK3.
- Foreign key constraint of FK3 is also not necessary.
- There are no form of SQLs that specifically read or search in a specific former Value column e.g. WHERE Value5 = 'ABC'
So what do you guys think? Should I give it a try?
Or do you have some completely different ideas?
Thanks for any help!
CodePudding user response:
Or do you have some completely different ideas?
You can do better with a better set of index designs. You want to optimize for your most expensive queries:
SELECT *
FROM myTable
WHERE Fk1 = 123456 AND FK2 = 654321
-- works badly because it gets 100.000 - 300.000 records but I need all of them. Yes, unique index is used because order of index is correct )
UPDATE myTable
SET Value1 = '1', Value2 = '2', Value3 = '3', Value 4 = '4',
Value5 = '5', Value6 = '6', Value7 = '7', Value8 = '8'
WHERE Fk1 = 123456 AND FK2 = 654321
Making FK1_FK2_FK3 the clustered index and making ID a non-clustered PK would be better. For queries that retrieve a handful of rows, using nested loop join from the non-clustered PK to the composite clustered index should be fine. But doing 300,000 lookups when querying by (Fk1,Fk2) is going to be expensive. It's so expensive that these queries might be doing table scans instead.
And after clustering the table by (FK1,FK2,FK3) consider partitioning it by FK2 into 10-100 separate partitions. Then a predicate like WHERE Fk1 = 123456 AND FK2 = 654321 will only have to scan the partition containing FK2=654321, and can seek in that partition directly to the first page with FK1=123456.
In addition consider ROW or PAGE compression if PAGEIOLATCH waits are a significant part of your query runtime.
