Friday, June 21, 2024

Using DFSORT to Generate Bulk SQL Queries: A Step-by-Step Guide

One of the frequent tasks that I have dealt with is bulk updates to the DB2 table. I'll be provided with an excel spreadsheet containing thousands of rows, each specifying the current and target values for the columns of the table. Manually writing an UPDATE SQL for each row of the excel sheet is not only tedious but highly inefficient. Even if it takes just 30 seconds to write the UPDATE SQL for a row, you'll end up spending 8.3 hours (roughly 1 business day) to complete the task.

In this blog post, I'll show you how to automate the generate of SQL queries for bulk updates. There are many ways to automate it, but I'll be using IBM DFSORT.  

What is DFSORT?

DFSORT is a high-performance sort, merge, copy, and data manipulation utility used on IBM mainframes. It's an incredibly versatile tool that can handle a variety of data processing tasks, including the generation of SQL queries from a dataset.

The Scenario

Imagine you receive an Excel spreadsheet with thousands of rows, each containing the current and target values for specific columns in your DB2 table. Your task is to update the table with these target values. The sheet consists of details about Employees and the task is to update the current salary with the target salary. 

Preparing the Dataset

First, convert the Excel spreadsheet into a format suitable for processing in Mainframe, such as a comma-separated values (CSV) file. Once you have the CSV file, you need to upload it to the mainframe and create a dataset (INPUT.DATA) that DFSORT can process. At the site I work for, I use WS FTP Pro software to download/upload files from/to Mainframe server. 

Assuming your CSV data looks like this:

 001,John,Smith,Sales,5000,5500  
 002,Jane,Doe,HR,6000,6500  
 003,Mike,Johnson,IT,5500,6000  
 ...  

Each field is:

  • Employee ID: Position 1-3
  • First Name: Position 5-10
  • Last Name: Position 12-17
  • Department: Position 19-24
  • Current Salary: Position 26-30
  • Target Salary: Position 32-36

Reformatting the CSV file 

After uploading the CSV file to Mainframe, we need to reformat the CSV file into a fixed-field dataset where each column value starts at a specific position. We can achieve this using DFSORT's PARSE feature.
 //REFORMAT EXEC PGM=SORT  
 //SYSOUT  DD SYSOUT=*  
 //SORTIN  DD DSN=INPUT.CSV,DISP=SHR  
 //SORTOUT DD DSN=INPUT.DATA,DISP=(NEW,CATLG,DELETE),  
 //       SPACE=(CYL,(1,1)),UNIT=SYSDA  
 //SYSIN  DD *  
  OPTION COPY  
  INREC PARSE=(%01=(ENDBEFR=C',',FIXLEN=3),  
         %02=(ENDBEFR=C',',FIXLEN=6),  
         %03=(ENDBEFR=C',',FIXLEN=7),  
         %04=(ENDBEFR=C',',FIXLEN=6),  
         %05=(ENDBEFR=C',',FIXLEN=5),  
         %06=(FIXLEN=5)),  
     BUILD=(%01,3X,%02,2X,%03,X,%04,2X,%05,2X,%06)  
 /*  

Explanation

  • OPTION COPY: Instructs DFSORT to copy the input records.
  • INREC PARSE: Defines how to parse the input CSV records:
    • %01=(ENDBEFR=C',',FIXLEN=3): Parses the first field (Employee ID) up to the comma and with a fixed length of 3.
    • %02=(ENDBEFR=C',',FIXLEN=6): Parses the second field (First Name) up to the comma and with a fixed length of 6.
    • %03=(ENDBEFR=C',',FIXLEN=7): Parses the third field (Last Name) up to the comma and fixed length of 6.
    • %04=(ENDBEFR=C',',FIXLEN=6): Parses the fourth field (Department) up to the comma and fixed length of 6.
    • %05=(ENDBEFR=C',',FIXLEN=5): Parses the fifth field (Current Salary) up to the comma and fixed length of 5.
    • %06=(FIXLEN=5): Parses the sixth field (Target Salary) with fixed length of 5.
  • BUILD: Rebuilds the records with fixed positions and adds spaces as needed.

  • Generating SQL Queries with DFSORT

    Now that we have the fixed-field dataset (INPUT.DATA), we can proceed to generate the SQL queries:
     //SORTSTEP EXEC PGM=SORT   
     //SYSOUT DD SYSOUT=*   
     //SORTIN DD DSN=INPUT.DATA,DISP=SHR   
     //SORTOUT DD DSN=OUTPUT.SQL,DISP=(NEW,CATLG,DELETE),   
     //    SPACE=(CYL,(1,1)),UNIT=SYSDA   
     //SYSIN DD *   
      OPTION COPY   
      OUTFIL BUILD=(C'-- UPDATE FOR ',7,6,X,15,7,80:X,/,  
            C'UPDATE EMPLOYEES ',80:X,/,  
            C'  SET SALARY = ',38,5,80:X,/,  
            C' WHERE EMP_ID = ',1,3,80:X,/,  
            C'  AND SALARY = ',31,5,80:X,/,  
            C';',80:X,/,  
            80:X)   
     /*   
    

    Explanation

  • OPTION COPY: Instructs DFSORT to copy the input records.
  • OUTFIL: Defines the output file, the default being SORTOUT.
  • BUILD: Specifies the format of the output records. Here's a breakdown:
    • C'-- UPDATE FOR ': Adds a comment with the employee's name.
    • 7,6,X,15,7,80:X,/,: Extracts the first name and last name. Adds a line break (/) after 80th byte.
    • C'UPDATE EMPLOYEES ',80:X,/,: Static text for the SQL SET statement. Adds a line break (/) after 80th byte so that further statements of the SQL can be written in the next line. 
    • C'  SET SALARY = ',38,5,80:X,/,: Static text for the SQL SET statement. Extracts the target salary value from position 38, length 5 and places it after the SET statement. Adds a line break after 80th byte.
    • C' WHERE EMP_ID = ',1,3,80:X,/,: Static text for the WHERE clause. Extracts the employee ID from position 1, length 3 and places it after the first column in the WHERE clause. Adds a line break after 80th byte. 
    • C' AND SALARY = ',31,5,80:X,/,: Adds a condition for the current salary. Extracts the current salary value from position 31, length 5. As usual, a line break is added after the 80th byte. 
    • 32,5: Extracts the target salary value from position 32, length 5.
    • C';',80:X,/,: Adds a semicolon to end the SQL statement. A line break is added after the 80th byte.
    • 80:X: Adds an empty line.
  • The Output

    After running the DFSORT step, the OUTPUT.SQL dataset will contain SQL update statements like this:

     -- UPDATE FOR JOHN  SMITH  
     UPDATE EMPLOYEES   
       SET SALARY = 5500   
      WHERE EMP_ID = 001   
       AND SALARY = 5000  
     ;  
       
     -- UPDATE FOR JANE  DOE  
     UPDATE EMPLOYEES   
       SET SALARY = 6500   
      WHERE EMP_ID = 002   
       AND SALARY = 6000  
     ;  
       
     -- UPDATE FOR MIKE  JOHNSON  
     UPDATE EMPLOYEES   
       SET SALARY = 6000   
      WHERE EMP_ID = 003   
       AND SALARY = 5500  
     ;  
    
    It is to be noted that for one record in the input dataset, DFSORT wrote 7 lines in the output dataset. Now that the bulk SQL queries are generated, the next step would be to execute the SQL queries using IKJEFT01 utility. 

    Conclusion

    By leveraging DFSORT’s powerful data manipulation capabilities, you can automate the generation of SQL queries for bulk updates to your DB2 tables. This method not only saves a significant amount of time but also minimizes errors associated with manual SQL writing. Whether you’re handling hundreds or thousands of updates, DFSORT provides a robust and efficient solution for your data processing needs on the mainframe.

    There are other ways to generate bulk SQL queries. In the next blog post, I'll show you how to achieve the same task using REXX. 

    Hope this helps. Should you have any questions, please leave them in the Comments section below. 



    No comments:

    Post a Comment