Click here to Skip to main content
15,880,796 members
Please Sign up or sign in to vote.
1.00/5 (1 vote)
Experts,
Need your support for this awk script.

we have only one input file, all these column 1 and column 2 are in same file and have to do lookup for values in one file(column1 and column2) but output we need in another file
Need to grep row whose string contains 9K from column 1. When found match, grep that row suppose(BGL_0BC_901_1AG_A_CASR9KTR176) and pick corresponding rows form column 2 for that column 1 matched string.
If column2 contains 5 rows, then pick each value one by one and search in column1 and grep its corresponding row data from column 2 and prepare new output file in below format.
In output file all lines should start with columns value/string containing 9K


BGL_0BC_901_1AG_A_CASR9KTR176,BGL_CHT_903_1AC_B_CASR920R879,BGL_BAM_912_2AC_B_CASR920R879,BGL_BAM_912_1AC_B_CASR920R879,BGL_BAM_907_1AC_B_CASR920R879


For example :

if strings in rows of column1 contains 9K. Suppose we find (BGL_0BC_901_1AG_A_CASR9KTR176) in column 1. Now pick it's corresponding row value(BGL_KMR_919_1AC_B_CASR920R899) from column 2 and look in column1

column1,column2
BGL_0BC_901_1AG_A_CASR9KTR176,BGL_KMR_919_1AC_B_CASR920R899
BGL_0BC_901_1AG_A_CASR9KTR176,BGL_CHT_903_1AC_B_CASR920R879
BGL_0BC_901_1AG_A_CASR9KTR176,BGL_0UT_901_1AC_CASR903R551
BGL_0BC_901_1AG_A_CASR9KTR176,BGL_YOT_919_1AC_CASR903R458
BGL_0BC_901_1AG_A_CASR9KTR176,BGL_0BC_901_1AC_T_CASR920R504
BGL_CHT_903_1AC_B_CASR920R879,BGL_BAM_910_1AC_B_CASR920R879
BGL_BAM_910_1AC_B_CASR920R879,BGL_CHT_903_1AC_B_CASR920R879
BGL_BAM_910_1AC_B_CASR920R879,BGL_BAM_912_2AC_B_CASR920R879
BGL_BAM_912_2AC_B_CASR920R879,BGL_BAM_910_1AC_B_CASR920R879
BGL_BAM_912_2AC_B_CASR920R879,BGL_BAM_912_1AC_B_CASR920R879
BGL_BAM_912_1AC_B_CASR920R879,BGL_BAM_907_1AC_B_CASR920R879
BGL_BAM_912_1AC_B_CASR920R879,BGL_BAM_912_2AC_B_CASR920R879
BGL_2BC_901_1AG_A_CASR9KTR124,BGL_BGM_908_1AC_CASR903R173
BGL_2BC_901_1AG_A_CASR9KTR124,BGL_ABT_932_1AC_CASR903R963
BGL_2BC_901_1AG_A_CASR9KTR124,BGL_2BC_901_1AC_T_CASR920R948
BGL_2BC_901_1AG_A_CASR9KTR124,BGL_2BC_901_1AC_T_CASR920R948
BGL_2BC_901_1AG_A_CASR9KTR124,BGL_ABT_918_1AC_CASR903R963


if value(BGL_KMR_919_1AC_B_CASR920R899) is found in column 1 then pick it's corresponding row value from column 2 and if not found output the result in new file like below

output file :

BGL_0BC_901_1AG_A_CASR9KTR176,BGL_KMR_919_1AC_B_CASR920R899


Continuing, Now again for(BGL_0BC_901_1AG_A_CASR9KTR176) start and take the second row value(BGL_CHT_903_1AC_B_CASR920R879) from column 2 and look for this in column 1 and if found pick for its corresponding row value(BGL_BAM_910_1AC_B_CASR920R879) from column 2


column1,column 2
BGL_CHT_903_1AC_B_CASR920R879,BGL_BAM_910_1AC_B_CASR920R879

Now again look for value(BGL_BAM_910_1AC_B_CASR920R879) in column 1, if found pick for its corresponding row value(BGL_BAM_912_2AC_B_CASR920R879) from column 2 In this case we don't have to consider(BGL_CHT_903_1AC_B_CASR920R879) as we have already looked for it in above part

column 1                          column 2
BGL_BAM_910_1AC_B_CASR920R879 ,BGL_CHT_903_1AC_B_CASR920R879
BGL_BAM_910_1AC_B_CASR920R879,BGL_BAM_912_2AC_B_CASR920R879


Now again look for row value picked from column 2(BGL_BAM_912_2AC_B_CASR920R879) in column 1 and if found pick for its corresponding column 2 value(BGL_BAM_912_1AC_B_CASR920R879) In this case also we don't have to consider(BGL_BAM_910_1AC_B_CASR920R879) as we have already looked for it in above part

BGL_BAM_912_2AC_B_CASR920R879,BGL_BAM_910_1AC_B_CASR920R879
BGL_BAM_912_2AC_B_CASR920R879,BGL_BAM_912_1AC_B_CASR920R879


Now again look for row value picked from column 2(BGL_BAM_912_1AC_B_CASR920R879) in column 1 and if found pick for its corresponding column 2 value(BGL_BAM_907_1AC_B_CASR920R879) Tn this case also we don't have to consider(BGL_BAM_912_2AC_B_CASR920R879 as we have already looked for it in above part

BGL_BAM_912_1AC_B_CASR920R879,BGL_BAM_907_1AC_B_CASR920R879
BGL_BAM_912_1AC_B_CASR920R879,BGL_BAM_912_2AC_B_CASR920R879


Now again look for row value picked from column 2(BGL_BAM_907_1AC_B_CASR920R879) in column 1 and if don't find any corresponding row value in column 2. Stop the search and append the result to above sampple output file in below format

REQUIRED SAMPLE OUTPUT FILE :

GL_0BC_901_1AG_A_CASR9KTR176,BGL_KMR_919_1AC_B_CASR920R899
BGL_0BC_901_1AG_A_CASR9KTR176,BGL_CHT_903_1AC_B_CASR920R879,BGL_BAM_912_2AC_B_CASR920R879,BGL_BAM_912_1AC_B_CASR920R879,BGL_BAM_907_1AC_B_CASR920R879


Similary grep done for(BGL_0BC_901_1AG_A_CASR9KTR176), have to repeat and keep looking for the values in other rows values in column 2 in complete csv file and have to perform operation like above.

Similary have to perform operation like above for all others strings like BGL_2BC_901_1AG_A_CASR9KTR124 that has 9K in column 1 in this csv file.

Code i am trying but dnt know how to complete the code for above required

What I have tried:

awk '
NR==FNR{
    assoc[$1]=$2
    next
}
FNR!=1{         
    printf "%s,%s", $1,$2
    seen[$1]; seen[$2]
    search=$2 
    while((search in assoc) && !(assoc[search] in seen)){
        search=assoc[search]
        printf ",%s", search
        seen[search]
    }
    print ""
    for(var in seen){ 
         delete seen[var]
    }
}' inputfile.csv inputfile.csv > output.csv
Posted

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900