IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

DFSORT - Selecting Duplicate records


IBM Mainframe Forums -> DFSORT/ICETOOL
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
murali3955

New User


Joined: 25 Nov 2009
Posts: 11
Location: Chennai

PostPosted: Sun Apr 24, 2011 12:29 pm
Reply with quote

Hi,

I have to select records from a file which has duplicates. The file layout is below:


Client no ID No Account no
===========================
123344 1 0123344
234455 1 0123344
324521 1 0123344
234531 2 1236789
456879 3 1236789
784322 3 1236789


The file is sorted based on Account no.
For the same Account no and ID no, I have to select the records.

As in the example for Account no '0123344' and ID no '1' there are 3 rows, so this should be written to the file. But in case of Account no '1236789' the ID no changes as '2' and '3' so this should not be written to output.


Only when the ID no is same with an Account no it should be written to output file. If the ID no changes within the same Account no it should not be written to output.

Please let me know if this be achieved with DFSORT. I will be grateful for your ideas.


Thanks and Regards,
Murali
Back to top
View user's profile Send private message
Nic Clouston

Global Moderator


Joined: 10 May 2007
Posts: 2455
Location: Hampshire, UK

PostPosted: Sun Apr 24, 2011 8:40 pm
Reply with quote

I am not clear about what your output is like - is it one example of the duplicates of an account or is it all duplicates of the account or is it ALL records for an account with duplicates? I.E for account 0123344 do you want 1, 2 or 3 records written? If 1, which one? If 2, which one is not to be written?
Back to top
View user's profile Send private message
shrivatsa
Warnings : 1

Active User


Joined: 17 Mar 2006
Posts: 174
Location: Bangalore

PostPosted: Mon Apr 25, 2011 10:49 am
Reply with quote

Hi Murali,

I hope the below SORT output you are expecting

Code:
//S1    EXEC PGM=ICETOOL                         
//TOOLMSG DD SYSOUT=                             
//DFSMSG DD SYSOUT=*                             
//IN   DD *                                     
123344 1 0123344                                 
234455 1 0123344                                 
324521 1 0123344                                 
234531 2 1236789                                 
456879 3 1236789                                 
784322 3 1236789                                 
//OUT  DD SYSOUT=*                               
//NODUPS   DD SYSOUT=*                           
//TOOLIN   DD *                                 
 SELECT FROM(IN) TO(OUT) ON(10,7,CH) ON(8,1,CH) -
  ALLDUPS DISCARD(NODUPS)                       
/*


Output:-

Code:
123344 1 0123344
234455 1 0123344
324521 1 0123344
456879 3 1236789
784322 3 1236789
Back to top
View user's profile Send private message
murali3955

New User


Joined: 25 Nov 2009
Posts: 11
Location: Chennai

PostPosted: Mon Apr 25, 2011 1:04 pm
Reply with quote

Hi Nic/Shrivatsa,

Sorry for not mentioning how my output file should be.

Since for the Account no '1236789' the ID no changes to '2' and '3'. This should not be written to output.

Finally my output file should look like below.

Client no ID No Account no
===========================
123344 1 0123344
234455 1 0123344
324521 1 0123344

Thanks,
Murali.
Back to top
View user's profile Send private message
Frank Yaeger

DFSORT Developer


Joined: 15 Feb 2005
Posts: 7129
Location: San Jose, CA

PostPosted: Tue Apr 26, 2011 3:13 am
Reply with quote

Murali,

So basically are you saying that you only want to output the records for each account number that has only a single ID (e.g. every ID is 1 for that account number, or every ID is 2 for that account number, etc)?

What is the RECFM and LRECL of the input file?

What is the starting position, length and format of each field?

Is the ID always already in ascending order by account number? Or could you have IDs out of order, for example:

Code:

734531 3 3333333
000002 2 3333333
584322 4 3333333
888888 1 3333333


Please show an example of input and expected output with more variations.
Back to top
View user's profile Send private message
murali3955

New User


Joined: 25 Nov 2009
Posts: 11
Location: Chennai

PostPosted: Tue Apr 26, 2011 10:20 am
Reply with quote

Hi Frank,

Yes, you are correct. I want to write the records to output for the same Account number that has single ID.

RECFM=FB, LRECL=80.

The fields start position,length and format :

CLIENT NO --> 3,9, Numeric
ID NO --> 29,7,Alphanumeric
ACC NO --> 38,14,Numeric
WIP NO --> 60,9,Numeric

The input file is sorted based on Account number, ID No. So within the same account number, the ID NO will be in sorted order.

In the input, for the Account number '00000000001329' as the ID No changes. This Account number records should not be written to output and remaining records should be written.

The input/output files are attached.

Thanks,
Murali.
Back to top
View user's profile Send private message
Nic Clouston

Global Moderator


Joined: 10 May 2007
Posts: 2455
Location: Hampshire, UK

PostPosted: Tue Apr 26, 2011 7:49 pm
Reply with quote

Well, I am confused - for account 1329 you have duplicate IDs (2 for 7176830 and 2 for 7179998) so why is that account not to be written?
Back to top
View user's profile Send private message
murali3955

New User


Joined: 25 Nov 2009
Posts: 11
Location: Chennai

PostPosted: Tue Apr 26, 2011 10:01 pm
Reply with quote

My requirement is to write only those Account numbers that have same ID NO.

For account number '1329' there are 2 different ID NO's (7176830 and 7179998) so this should not be written to output.

For example refer to the Account number '30079'. There are 6 records with same ID no '8227896'. So this should be written to output.

Thanks,
Murali.
Back to top
View user's profile Send private message
Skolusu

Senior Member


Joined: 07 Dec 2007
Posts: 2205
Location: San Jose

PostPosted: Tue Apr 26, 2011 11:00 pm
Reply with quote

murali3955,

The following DFSORT JCL will give you the desired results.

Code:

//STEP0100 EXEC PGM=SORT                                             
//SYSOUT   DD SYSOUT=*                                               
//INA      DD DSN=Your Input FB 80 byte file,DISP=SHR
//INB      DD DSN=Same Input FB 80 byte file,DISP=SHR
//SORTOUT  DD SYSOUT=*                                               
//SYSIN    DD *                                                     
  JOINKEYS F1=INA,FIELDS=(38,14,A)                                   
  JOINKEYS F2=INB,FIELDS=(38,14,A)                                   
  REFORMAT FIELDS=(F1:1,80,F2:81,3)                                 
  OPTION COPY                                                       
  OUTFIL OMIT=(81,3,ZD,GT,0),BUILD=(1,80)                           
//*
//JNF2CNTL DD *                                                     
  INREC IFTHEN=(WHEN=INIT,OVERLAY=(81:SEQNUM,3,ZD,RESTART=(38,14))),
  IFTHEN=(WHEN=GROUP,KEYBEGIN=(29,22),PUSH=(84:SEQ=3)),             
  IFTHEN=(WHEN=(81,3,ZD,EQ,84,3,ZD),OVERLAY=(81:3C'0'))             
  SUM FIELDS=(81,3,ZD)                                               
//*
Back to top
View user's profile Send private message
murali3955

New User


Joined: 25 Nov 2009
Posts: 11
Location: Chennai

PostPosted: Wed Apr 27, 2011 10:56 am
Reply with quote

Hi Skolusu,

Thank you very much:D. The job worked fine.

Almost I was able to understand the job provided by you.But could you please explain me 'SUM FIELDS=(81,3,ZD)'.

I ran the job without SUM FIELDS.For each record in input, there were two records written to the output file.

SUM Fields example :
--------------------
SORT FIELDS=(1,5,CH,A)
SUM FIELDS=(11,4,ZD)

This will sort the fields 1,5 in Ascending order and then If there are any duplicates in 1,5 it will SUM the values in 11,4. Please correct me if I am wrong.

As per the SUM fields in the job, position 81,3 will have '000' since overlayed,when 81,3 and 84,3 are equal.Please let me know how SUM FIELDS=(81,3,ZD) identifies the duplicates and eliminates it without using any SORT fields.

Thanks,
Murali
Back to top
View user's profile Send private message
Skolusu

Senior Member


Joined: 07 Dec 2007
Posts: 2205
Location: San Jose

PostPosted: Wed Apr 27, 2011 9:55 pm
Reply with quote

murali3955 wrote:

As per the SUM fields in the job, position 81,3 will have '000' since overlayed,when 81,3 and 84,3 are equal.Please let me know how SUM FIELDS=(81,3,ZD) identifies the duplicates and eliminates it without using any SORT fields.

Thanks,
Murali


murali3955,

You need to look at the JNF2JMSG output and you will see that it used the following SORT statement

Code:

  SORT  FORMAT=BI,FIELDS=(38,14,A)


for a better understanding of the job run the following steps individually and check out the contents in pos 81 and 85.
Code:

//STEP0100 EXEC PGM=SORT                                               
//SYSOUT   DD SYSOUT=*                                                 
//SORTIN   DD *                                                         
  630774221                 8766917 00000000000317         630774221   
  963495850                 8766917 00000000000317         963495850   
  345695561                 54M1908 00000000000789         345695561   
  678609548                 54M1908 00000000000789         678609548   
  918367402                 54M1908 00000000000789         918367402   
  279702180                 7176830 00000000001329         279702180   
  379702180                 7176830 00000000001329         379702180   
  479702180                 7179998 00000000001329         479702180   
  579702180                 7179998 00000000001329         579702180   
  102834750                 8227896 00000000030079         102834750   
  192074221                 8227896 00000000030079         192074221   
  392074221                 8227896 00000000030079         392074221   
  592074221                 8227896 00000000030079         592074221   
  782074221                 8227896 00000000030079         782074221   
  982074221                 8227896 00000000030079         982074221   
//SORTOUT  DD SYSOUT=*                                                 
//SYSIN    DD *                                                         
  SORT FIELDS=COPY                                                     
  INREC IFTHEN=(WHEN=INIT,OVERLAY=(81:SEQNUM,3,ZD,RESTART=(38,14))),   
  IFTHEN=(WHEN=GROUP,KEYBEGIN=(29,22),PUSH=(85:SEQ=3)),                 
  IFTHEN=(WHEN=(81,3,ZD,EQ,85,3,ZD),OVERLAY=(81:3C'0'))                 
//*


Now add the sort statement and sum statement run this job and check the contents in pos 81 and 85 (especially for the acct # 1329 record)
Code:

//STEP0100 EXEC PGM=SORT                                             
//SYSOUT   DD SYSOUT=*                                               
//SORTIN   DD *                                                       
  630774221                 8766917 00000000000317         630774221 
  963495850                 8766917 00000000000317         963495850 
  345695561                 54M1908 00000000000789         345695561 
  678609548                 54M1908 00000000000789         678609548 
  918367402                 54M1908 00000000000789         918367402 
  279702180                 7176830 00000000001329         279702180 
  379702180                 7176830 00000000001329         379702180 
  479702180                 7179998 00000000001329         479702180 
  579702180                 7179998 00000000001329         579702180 
  102834750                 8227896 00000000030079         102834750 
  192074221                 8227896 00000000030079         192074221 
  392074221                 8227896 00000000030079         392074221 
  592074221                 8227896 00000000030079         592074221 
  782074221                 8227896 00000000030079         782074221 
  982074221                 8227896 00000000030079         982074221 
//SORTOUT  DD SYSOUT=*                                               
//SYSIN    DD *                                                       
  SORT FIELDS=(38,14,CH,A)                                           
  INREC IFTHEN=(WHEN=INIT,OVERLAY=(81:SEQNUM,3,ZD,RESTART=(38,14))), 
  IFTHEN=(WHEN=GROUP,KEYBEGIN=(29,22),PUSH=(85:SEQ=3)),               
  IFTHEN=(WHEN=(81,3,ZD,EQ,85,3,ZD),OVERLAY=(81:3C'0'))               
                                                                     
  SUM FIELDS=(81,3,ZD)                                               
//*
Back to top
View user's profile Send private message
murali3955

New User


Joined: 25 Nov 2009
Posts: 11
Location: Chennai

PostPosted: Thu Apr 28, 2011 10:07 am
Reply with quote

Hi Skolusu,

Thank you for your help and explanation. icon_biggrin.gif

Now I understood how it works.

Thanks,
Murali.
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> DFSORT/ICETOOL

 


Similar Topics
Topic Forum Replies
No new posts DFSORT GUID DFSORT/ICETOOL 2
No new posts To fetch records that has Ttamp value... DFSORT/ICETOOL 4
No new posts ICETOOL returns no records JCL & VSAM 1
No new posts Compare 2 files and retrive records f... DFSORT/ICETOOL 3
No new posts Compare 2 files(F1 & F2) and writ... JCL & VSAM 8
Search our Forums:

Back to Top