Friday, February 26, 2021

How to sort on the bits of a byte using IBM DFSORT?

Recently, I came across a DFSORT coding challenge titled "Odds & Evens".

The problem statement goes like this - Given a file with valid sequence numbers in columns 1 thru 6, sort the file so the corresponding output has all the even numbered records first, followed by all the odd numbered records.

I put on my thinking cap 🎩 for a while and came up with the following answer:

 ----+----1----+----2----+----3----+----4----+----5----+----6----+----7--  
 ***************************** Top of Data ******************************  
 //Z01071A JOB 1,NOTIFY=&SYSUID                       
 //STEP01  EXEC PGM=SORT                           
 //SORTIN  DD *                               
 000001                                   
 000002                                   
 000003                                   
 000004                                   
 000005                                   
 000006                                   
 000007                                   
 000008                                   
 000009                                   
 000010                                   
 000011                                   
 000012                                   
 000013                                   
 000014                                   
 000015                                   
 000016                                   
 000017                                   
 000018                                   
 000019                                   
 000020                                   
 //SORTOUT DD SYSOUT=*                            
 //SYSOUT  DD SYSOUT=*                            
 //SYSIN   DD *                               
   INREC IFTHEN=(WHEN=GROUP,RECORDS=2,PUSH=(10:SEQ=1))
   SORT FIELDS=(10,1,CH,D,1,6,CH,A)                     
   OUTREC FIELDS=(1,6)                            
 /*                                     
I just formed a group of 2 records and PUSH'ed sequence numbers (of 1 byte) for each record of the group. As there are only 2 records in a group, the sequence number will be 1 for the first record and 2 for the second record. The sequence number will be restarted from 1 when a new group is started. 

Then, I used the sequence number field (at col 10) in the SORT statement to sort it in the descending order so that all the records with sequence number as 2 will be at the top.  A secondary sort was applied on the first 6 bytes. 
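For readers who think more easily in Python, the same idea can be sketched outside DFSORT. This is only an illustration of the logic, not how DFSORT works internally, and the helper name group_push_sort is made up for this post.

```python
def group_push_sort(records, group_size=2):
    """Mimic WHEN=GROUP,RECORDS=2 with a pushed SEQ, then the two-key SORT."""
    tagged = []
    for index, record in enumerate(records):
        # SEQ restarts at 1 for every new group of `group_size` records.
        seq = index % group_size + 1
        tagged.append((seq, record))
    # Primary key: pushed sequence number, descending (10,1,CH,D).
    # Secondary key: the 6-byte sequence number, ascending (1,6,CH,A).
    tagged.sort(key=lambda pair: (-pair[0], pair[1]))
    # OUTREC keeps only columns 1 to 6.
    return [record[:6] for _, record in tagged]


records = [str(n).zfill(6) for n in range(1, 21)]
result = group_push_sort(records)
print(result)
```

Like the DFSORT job, this relies on the input already being in ascending order and starting with an odd number, because the group sequence number depends on a record's position and not on its value.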

Submitting this job, I got the following output:
 ********************************* TOP OF DATA **********************************  
 000002                                       
 000004                                       
 000006                                       
 000008                                       
 000010                                       
 000012                                       
 000014                                       
 000016                                       
 000018                                       
 000020                                       
 000001                                       
 000003                                       
 000005                                       
 000007                                       
 000009                                       
 000011                                       
 000013                                       
 000015                                       
 000017                                       
 000019                                       
 ******************************** BOTTOM OF DATA ********************************  
WHEN=GROUP is one amazing feature in DFSORT, thanks to Frank Yaeger of the IBM DFSORT Development team, who is one of the brains behind the invention of WHEN=GROUP.

We got the answer. Are we done here?

Nope, I'm just done with the Intro. 

The main reason for writing this blog post was that, while looking at other answers, I stumbled upon a solution with a syntax that I'd never seen before. It goes like this:
SORT FIELDS=(6.7,0.1,BI,A),EQUALS
Most of us use the SORT control statement to specify the control field on which the sorting should take place. We provide:
  1. the position of the field within the record
  2. the length of the field (in bytes)
  3. the format of the data in control field
  4. the order in which field must be sorted (ascending or descending)
Let's take the first 2 items. The position of the field within the record is the byte position relative to the beginning of the record. The length of the field is usually expressed as a whole number of bytes. We deal with Bytes (and a pet lover has to deal with bites 🐢 sometimes).

Let's take a look under the hood 🔧


A byte consists of 2 nibbles and each nibble is 4 bits long. A bit is either 0 or 1.

The IBM Mainframe uses the EBCDIC character encoding. Each character is represented by its 8-bit EBCDIC code. When we turn on the Hex mode, we can see a hex value for each byte. When the hex value of each byte is converted to binary, we get the corresponding bits.

For example, 

SRINI becomes,
E2        D9        C9        D5        C9                  Hexadecimal
11100010  11011001  11001001  11010101  11001001    Binary
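The same conversion can be reproduced in Python as a quick check. This is a small sketch that assumes code page cp037, one of the EBCDIC code pages shipped with Python's standard codecs. Your system may use a different EBCDIC code page, though letters and digits have the same codes in the common ones.

```python
text = "SRINI"

# cp037 is an assumption: one common EBCDIC code page available in Python.
ebcdic = text.encode("cp037")

# Two hex digits and eight binary digits per byte.
hex_codes = [f"{byte:02X}" for byte in ebcdic]
bit_strings = [f"{byte:08b}" for byte in ebcdic]

print("  ".join(f"{code:<8}" for code in hex_codes))
print("  ".join(bit_strings))
```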

📣 IBM DFSORT allows us to sort on the bits of a byte with the "bytes.bits" notation.

How to sort on the bits of a byte?

Now that we know each character has an 8-bit binary value, we can use the bytes.bits notation to sort on bits.
  • First, specify the byte location relative to the beginning of the record and follow it with a period.
  • Then, specify the bit location relative to the beginning of that byte. Remember that the first (high-order) bit of a byte is bit 0 (not bit 1); the remaining bits are numbered 1 through 7.
In the SORT FIELDS=(6.7,0.1,BI,A),EQUALS statement:
6.7 - says that the starting position is the last bit (bit 7) of byte 6.
0.1 - says that the length is 0 bytes and 1 bit.
BI  - for Binary format, as we want to sort on bits.
A   - for Ascending order.
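To make the numbering concrete, here is a rough Python sketch that picks out the bits a bytes.bits position and length refer to. The function name bit_field is hypothetical, and the cp037 code page is an assumption. It only mirrors the numbering rules described above (1-based bytes, 0-based bits counted from the high-order end).

```python
def bit_field(record: bytes, position: str, length: str) -> int:
    """Return the bits selected by a bytes.bits style position and length."""
    pos_byte, pos_bit = (int(part) for part in position.split("."))
    len_bytes, len_bits = (int(part) for part in length.split("."))
    # Byte positions are 1-based; bit positions are 0-based from the high-order bit.
    start = (pos_byte - 1) * 8 + pos_bit
    width = len_bytes * 8 + len_bits
    all_bits = "".join(f"{byte:08b}" for byte in record)
    return int(all_bits[start:start + width], 2)


odd_record = "000005".encode("cp037")
even_record = "000006".encode("cp037")
print(bit_field(odd_record, "6.7", "0.1"), bit_field(even_record, "6.7", "0.1"))
```

The position and length are passed as strings on purpose, so that something like 6.7 is never treated as a floating point number.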

But why 6.7 as the start position of the control field? 

That's because by looking at the 6th byte of every sequence number, we can say whether that's an even number or odd number.

Example:
000001 - 👀 -> that's an odd number
000002 - 👀 -> that's an even
000003 - 👀 -> that's an odd
000004 - 👀 -> that's an even
000005 - 👀 -> that's an odd
000006 - 👀 -> that's an even. I'm tired😑
....
....
.... and so on.

There is another way to see it: in the EBCDIC code of the last digit, the least significant bit (the last bit) is 0 for every even number and 1 for every odd number.

Example:
1         EBCDIC character
F1        Hexadecimal  
11110001  Binary

2         EBCDIC character
F2        Hexadecimal
11110010  Binary

3         EBCDIC character   
F3        Hexadecimal
11110011  Binary

4         EBCDIC character
F4        Hexadecimal
11110100  Binary

Hence, if we sort on the last bit of the 6th byte in ascending order, we get all the even numbered records first, followed by the odd numbered records.

The EQUALS parameter is coded in the SORT statement to preserve the original sequence of records that have equal sort keys. If EQUALS is not coded, the output will still have all the even numbered records first, followed by the odd numbered records, but within each group the records are not guaranteed to stay in their original (sorted) order.
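The effect of sorting on that single bit with EQUALS can be imitated in plain Python. This is just a sketch of the idea under the assumption of code page cp037, with last_bit_of_byte_6 as an illustrative helper name. Python's sorted() is a stable sort, which plays the role that EQUALS plays in DFSORT.

```python
records = [str(n).zfill(6) for n in range(1, 21)]


def last_bit_of_byte_6(record: str) -> int:
    # Encode to EBCDIC (cp037 assumed) and keep the low-order bit of byte 6.
    return record.encode("cp037")[5] & 1


# sorted() is stable: records with the same key keep their input order,
# so evens stay ascending and odds stay ascending.
output = sorted(records, key=last_bit_of_byte_6)
print(output)
```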

Let's try running this SORT operation using Python 🐍


We can make use of the Python APIs provided by ZOAU to run the SORT operation. Z Open Automation Utilities (abbreviated to ZOAU) lets you perform many tasks on z/OS without needing to get into JCL. IBM has built a bridge between Python and z/OS by creating Python APIs that allow Pythonistas to access z/OS resources.

Before we start, we need the following stuff to run the SORT operation from Python:
  1. VS Code with Zowe explorer and IBM Z Open Editor extensions.
  2. Access to Zowe explorer.
  3. Access to USS (Unix System Services). 
  4. A little bit of Python skill.
Note: Access to Zowe explorer and USS can be obtained when you sign up for MTM2020.

First, we need to create a new file under our home directory (/z/zxxxxx) in Unix System Services. Use the touch command to create a new file.

I created one using this command, touch run_sort.py. Then, I used the IBM Z Open Editor to write the following code inside this file.

 
I've used Trinket to embed the Python code in this blog post. Note that you may not be able to RUN 🏃 this script as the ZOAU utilities for Python aren't available in Trinket.

Let's walk through the code.

Lines 1 thru 3: The required ZOAU libraries for Python are imported so that you can use them in your code. 

Lines 5 thru 9: Line #5 calls the os.getenv() method in Python with 'USER' as the argument. Because the operating system that Python is running under is z/OS, the USERID variable is assigned your TSO user ID.
Lines 6 thru 9 define 3 string variables that store the dataset names.

Lines 11 thru 28: These lines mimic the functionality of the IEFBR14 utility by deleting the datasets before they are created. We make use of the zoautil_py.Datasets module, which has several dataset related functions like create, delete, exists and so on.

Lines 30 thru 41: Writes data into the SORTIN and SYSIN datasets. 
Line #31 defines an empty list called num. This list is created to store the sequence numbers from 1 to 20.
Lines 34 and 35 create sequence numbers from 1 to 20 with the help of a for loop and the range() function in Python. The zfill() method is used to populate leading zeros. As zfill() can be applied only to string data, the numbers are converted to strings using the str() function (in line #35). All the sequence numbers are appended to the list, num. The list is then written to the SORTIN dataset.
Lines 40 and 41 write the SORT statements to the SYSIN dataset. The write functionality is achieved via the zoautil_py.Datasets module.
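The number generation step described above might look roughly like the following in plain Python. This is only a sketch of that one step and not the embedded script itself. The name sortin_text is illustrative, and the actual write to the SORTIN dataset (done through ZOAU in the script) is left out.

```python
num = []
for i in range(1, 21):
    # str() is needed because zfill() is a string method; pad to 6 digits.
    num.append(str(i).zfill(6))

# One record per line, ready to be written to the SORTIN dataset.
sortin_text = "\n".join(num)
print(sortin_text)
```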

Lines 43 thru 53: Line 44 creates an empty list called dd_names to store the DD names that are needed for the SORT program to run. 
In Line 53, MVSCmd.execute API is called to run the program SORT with arguments MSGPRT=CRITICAL,LIST (which goes to the PARM parameter in EXEC statement) and the list of DDStatements created in lines 47 thru 50. When this instruction is executed, a job might be submitted on z/OS in the background. 

Line #55 checks the return code from MVSCmd.execute API call. If it's zero, a message is displayed in the terminal and the output dataset (SORTOUT) created from the previous execution is read. 

When the program is run with the python3 run_sort.py command in the terminal, we get the following output.


The records in the output dataset are displayed in the terminal after running the Python code. The even numbered records are at the top, followed by the odd numbered records.

The datasets that were created from Python can also be accessed from the terminal. The datasets are shown below:

The input dataset to the SORT program. The sequence numbers were generated in Python.


The SYSIN input to the SORT program. The SORT statements were written from Python.


The output dataset. 

We have reached the bottom of this post, and we discussed two things:
  1. How to sort on the bits of a byte using IBM DFSORT?
  2. How to perform DFSORT operation using Python? 
I hope the content in this post was helpful to you. Please post your questions/suggestions in the Comments section of this post. 

Thx and Happy Weekend!


1 comment:

  1. Can we use to check a bit in particular byte using include cond=(6.7,0.1,BI,EQ,1).
