Click here to Skip to main content
15,868,016 members
Please Sign up or sign in to vote.
1.00/5 (2 votes)
Experts,
Need your support for this awk script. we have only one input file, all these column 1 and column 2 are in same file and have to do lookup for values in one file(column1 and column2) but output we need in another file
Need to grep row whose string contains 9K in column 1 and then grep its corresponding row in column 2. Suppose check for 9K1 then grep A1, check for A1 in column 1 if not, output result as shown in below expected output file
Then again check for 9K1, grep A2, look for A2 in column1, if found grep its corresponding row value B2.
Now check for B2 value in column 1, then grep for C2 instead of A2 as we have already considered previously in lookup. Now look for C2 in column and grep D2 instead of B2

This needs to be checked for all rows in column 1 that contains 9K, as there can be rows with value 9K2, 9K3, 9K4 in column 1 with corresponding data in column2




Input file :
9K1,A1
9K1,A2
9K1,A3
9K1,A4 
9K1,A5 
A2,B2
B2,A2
B2,C2
C2,B2
C2,D2
A5,B5
B5,C5
B5,A5
9K1,A6
A6,B6
B6,A6
B6,C6


Below is required and expected output which is needed(another file) :

9K1,A1
9K1,A2,B2,C2,D2
9K1,A3
9K1,A4
9K1,A5,B5,C5
9K1,A6,B6,C6


Output returned by above code: (which is not as per expectation)
9K1,A3
9K1,A4
9K1,A5
9K1,A6,B6,C6


What I have tried:

awk '
NR==FNR{
    assoc[$1]=$2
    next
}
(FNR!=1)&&($1~/9K/){
    printf "%s,%s", $1,$2
    seen[$1]; seen[$2]
    search=$2
    while((search in assoc) && !(assoc[search] in seen)){
        search=assoc[search]
        printf ",%s", search
        seen[search]
    }
    print ""
    for(var in seen){
         delete seen[var]
    }
}' input.csv input.csv > output.csv
Posted
Updated 7-May-23 0:46am

1 solution

/^9K/ {
        kees[$0];
        next;
}
{
        for ( key in kees )
        {
                #
                # is it part of the key ...
                #
                if ( match( key, $1) )
                {
                        if( match( kees[key],$2) == 0 )
                                kees[key]=kees[key]"," $2
                        break;
                }
                else if ( match( key, $2 ) )
                {
                        if( match( kees[key],$1 ) == 0 )
                                kees[key]=kees[key]"," $1
                        break;
                }

                #
                # not part of the key but a member
                #
                if ( match( kees[key], $1 ) )
                {
                        if( match( kees[key],$2) == 0 )
                                kees[key]=kees[key]"," $2
                }
                else if ( match( kees[key], $2 ) )
                {
                        if( match( kees[key],$1) == 0 )
                                kees[key]=kees[key]"," $1
                }

        }
}
END{
        for ( key in kees )
                print key kees[key]
}


Given your input, a very rough job,
(NB: you need to test/check ... this can be further reduced, I leave that to you.

awk -f s1.awk FS=',' input.csv
9K1,A1
9K1,A2,B2,C2,D2
9K1,A3
9K1,A4
9K1,A5,B5,C5
9K1,A6,B6,C6
 
Share this answer
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900