How to do data enrichment on a CSV file with Apache NiFi using SimpleDatabaseLookupService
I have a CSV file like this (nearly 1 million records):

    ProductId,CategoryId
    1,1
    1,2
    1,3
    2,1
    2,2
    2,3
    ...

From it, I want to produce another CSV like below:

    ProductId,CategoryId,ProductName,CategoryName
    1,1,Shirt,Mens
    1,2,Shirt,Boys
    1,3,Shirt,Uni
    2,1,Watch,Mens
    2,2,Watch,Boys
    2,3,Watch,Uni
To do this, I have set up a NiFi flow as below:
    GetFile
      -> LookupRecord: uses SimpleDatabaseLookupService to query the DB table "PRODUCT" by ProductId for ProductName, and CSVRecordSetWriter to write the ProductName value back to the CSV
      -> LookupRecord: uses SimpleDatabaseLookupService to query the DB table "CATEGORY" by CategoryId for CategoryName, and CSVRecordSetWriter to write the CategoryName value back to the CSV
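For reference, the first LookupRecord (the PRODUCT lookup) is configured roughly like this; the property names are as I see them in my NiFi version, the values are my own, and the second lookup is analogous:

```
LookupRecord
    Record Reader          : CSVReader
    Record Writer          : CSVRecordSetWriter
    Lookup Service         : SimpleDatabaseLookupService
    Result RecordPath      : /ProductName
    key (dynamic property) : /ProductId

SimpleDatabaseLookupService
    Database Connection Pooling Service : DBCPConnectionPool
    Table Name             : PRODUCT
    Lookup Key Column      : ProductId
    Lookup Value Column    : ProductName
```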
This works, at least on files that have about 40K lines. But as soon as I feed it the original file containing over a million records, NiFi just hangs.
So my question is: is there a way to optimize my flow so that it can handle such large data sets?
Note: ProductId and CategoryId values repeat many times in my file, yet my current flow performs a lookup for every row. I was wondering whether this repetition could be leveraged to optimize the flow, but I couldn't figure out how. Any help will be greatly appreciated. Thanks.
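To make the idea concrete, here is what I mean sketched outside NiFi in Python: each distinct id is looked up at most once and reused from a cache. The PRODUCT and CATEGORY dicts are stand-ins for the real DB tables, which only have a handful of distinct ids compared to the million rows:

```python
# Stand-ins for the DB tables; in the real flow these lookups are
# SELECTs against PRODUCT and CATEGORY via SimpleDatabaseLookupService.
PRODUCT = {"1": "Shirt", "2": "Watch"}
CATEGORY = {"1": "Mens", "2": "Boys", "3": "Uni"}

def enrich(rows):
    """Enrich each row dict with ProductName/CategoryName,
    querying each distinct id only once."""
    product_cache = {}   # ProductId  -> ProductName
    category_cache = {}  # CategoryId -> CategoryName
    for row in rows:
        pid, cid = row["ProductId"], row["CategoryId"]
        if pid not in product_cache:
            product_cache[pid] = PRODUCT[pid]      # one "DB hit" per distinct id
        if cid not in category_cache:
            category_cache[cid] = CATEGORY[cid]
        row["ProductName"] = product_cache[pid]
        row["CategoryName"] = category_cache[cid]
        yield row
```

Is there a way to get this kind of per-distinct-id caching behaviour in the NiFi flow itself?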