I need to generate a loan repayment schedule per date from a summarized loan table in Databricks usi-CodePudding

I have the following table

LoanID	Event_count	StartDateTime	EndDateTime	Frequency	Amount
12	3	2020-09-01T00:00:00Z	2020-12-01T00:00:00Z	Monthly	120
99	4	2021-01-01T00:00:00Z	2021-10-01T00:00:00Z	Quarterly	50

Column definitions

Event_count is the number of times a repayment is made.
StartDateTime is the time of the first payment.
EndDateTime is the date of the last payment.
Frequency is the interval of payment.
Amount is the sum that is paid back each time.

How do I transform this to the format below? (without using loops as they are not supported by Databricks Spark SQL)

Date	LoanID	RepaymentAmount	RepaymentNumber
2020-09-01	12	120	1
2020-10-01	12	120	2
2020-11-01	12	120	3
2021-01-01	99	50	1
2021-04-01	99	50	2
2021-07-01	99	50	3
2021-10-01	99	50	4

CodePudding user response：

As per today , Databricks SQL is being solved as federated query engine for most of the use cases . So this is not being used for any transformation use case . Thats why Spark SQL support is for basic SQL queries only . So procedure-oriented queries (Like for, while) are not supported on Spark. This would fall under a new feature request and may be implemented in future. You should handle through pyspark or spark SQL .

CodePudding user response：

You should be able to use the range function with a CROSS JOIN, a simple example:

%sql
SELECT * 
FROM tmp t
    CROSS JOIN range(1, 99) r
WHERE r.id <= t.Event_count

Just a word of caution on the event count - the second argument to range should be the max number of events you can have, I've just guessed at 99 here, plus cross joins can be slow - please test with your data