Home > Back-end >  I need to generate a loan repayment schedule per date from a summarized loan table in Databricks usi
I need to generate a loan repayment schedule per date from a summarized loan table in Databricks usi

Time:01-15

I have the following table

LoanID Event_count StartDateTime EndDateTime Frequency Amount
12 3 2020-09-01T00:00:00Z 2020-12-01T00:00:00Z Monthly 120
99 4 2021-01-01T00:00:00Z 2021-10-01T00:00:00Z Quarterly 50

Column definitions

  • Event_count is the number of times a repayment is made.
  • StartDateTime is the time of the first payment.
  • EndDateTime is the date of the last payment.
  • Frequency is the interval of payment.
  • Amount is the sum that is paid back each time.

How do I transform this to the format below? (without using loops as they are not supported by Databricks Spark SQL)

Date LoanID RepaymentAmount RepaymentNumber
2020-09-01 12 120 1
2020-10-01 12 120 2
2020-11-01 12 120 3
2021-01-01 99 50 1
2021-04-01 99 50 2
2021-07-01 99 50 3
2021-10-01 99 50 4

CodePudding user response:

As per today , Databricks SQL is being solved as federated query engine for most of the use cases . So this is not being used for any transformation use case . Thats why Spark SQL support is for basic SQL queries only . So procedure-oriented queries (Like for, while) are not supported on Spark. This would fall under a new feature request and may be implemented in future. You should handle through pyspark or spark SQL .

CodePudding user response:

You should be able to use the range function with a CROSS JOIN, a simple example:

%sql
SELECT * 
FROM tmp t
    CROSS JOIN range(1, 99) r
WHERE r.id <= t.Event_count

Just a word of caution on the event count - the second argument to range should be the max number of events you can have, I've just guessed at 99 here, plus cross joins can be slow - please test with your data

  •  Tags:  
  • Related