Replacing matching entries in one column of a file by another column from a different fileMerge two files: two lines, partial line, two lines, partial line, etcFind common elements in a given column from two files and output the column values from each filecompare multiple files(more than two) with two different columnsReplace column in one file with column from another using awk?Joining columns from files if they contain a match in another columnMerging two files, one column at a timeColumn matching in separate filesExtract row if both column values appear in a single column from a separate fileJoining entries based off of column using awk/joinCompare two files by first column. Keep rows if matchingRecursively find and replace contents of one file using a key from another file

Can a Cauchy sequence converge for one metric while not converging for another?

Theorems that impeded progress

Replacing matching entries in one column of a file by another column from a different file

Codimension of non-flat locus

Watching something be written to a file live with tail

Do infinite dimensional systems make sense?

Can you really stack all of this on an Opportunity Attack?

DC-DC converter from low voltage at high current, to high voltage at low current

Why does Kotter return in Welcome Back Kotter?

How to format long polynomial?

Could an aircraft fly or hover using only jets of compressed air?

Alternative to sending password over mail?

How can I prevent hyper evolved versions of regular creatures from wiping out their cousins?

What typically incentivizes a professor to change jobs to a lower ranking university?

Is it unprofessional to ask if a job posting on GlassDoor is real?

What defenses are there against being summoned by the Gate spell?

How to determine what difficulty is right for the game?

Convert two switches to a dual stack, and add outlet - possible here?

Why is Minecraft giving an OpenGL error?

Are astronomers waiting to see something in an image from a gravitational lens that they've already seen in an adjacent image?

How does one intimidate enemies without having the capacity for violence?

Does an object always see its latest internal state irrespective of thread?

What are these boxed doors outside store fronts in New York?

Languages that we cannot (dis)prove to be Context-Free



Replacing matching entries in one column of a file by another column from a different file


Merge two files: two lines, partial line, two lines, partial line, etcFind common elements in a given column from two files and output the column values from each filecompare multiple files(more than two) with two different columnsReplace column in one file with column from another using awk?Joining columns from files if they contain a match in another columnMerging two files, one column at a timeColumn matching in separate filesExtract row if both column values appear in a single column from a separate fileJoining entries based off of column using awk/joinCompare two files by first column. Keep rows if matchingRecursively find and replace contents of one file using a key from another file






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;








4















I have two tab-separated files which look as follows:



file1:



NC_008146.1 WP_011558474.1 1155234 1156286 44173
NC_008146.1 WP_011558475.1 1156298 1156807 12
NC_008146.1 WP_011558476.1 1156804 1157820 -3
NC_008705.1 WP_011558474.1 1159543 1160595 42748
NC_008705.1 WP_011558475.1 1160607 1161116 12
NC_008705.1 WP_011558476.1 1161113 1162129 -3
NC_009077.1 WP_011559727.1 2481079 2481633 8
NC_009077.1 WP_011854835.1 1163068 1164120 42559
NC_009077.1 WP_011854836.1 1164127 1164636 7


file2:



NC_008146.1 GCF_000014165.1_ASM1416v1_protein.faa
NC_008705.1 GCF_000015405.1_ASM1540v1_protein.faa
NC_009077.1 GCF_000016005.1_ASM1600v1_protein.faa


I want to match column 1 of file1 to file2 and replace itself with the respective column 2 entry of file 2.
The output would look like this:



GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7









share|improve this question









New contributor




BhushanDhamale is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.




















  • It looks like you might also be interested in our sister site: Bioinformatics.

    – terdon
    12 hours ago











  • Thank you for the link @terdon!

    – BhushanDhamale
    12 hours ago

















4















I have two tab-separated files which look as follows:



file1:



NC_008146.1 WP_011558474.1 1155234 1156286 44173
NC_008146.1 WP_011558475.1 1156298 1156807 12
NC_008146.1 WP_011558476.1 1156804 1157820 -3
NC_008705.1 WP_011558474.1 1159543 1160595 42748
NC_008705.1 WP_011558475.1 1160607 1161116 12
NC_008705.1 WP_011558476.1 1161113 1162129 -3
NC_009077.1 WP_011559727.1 2481079 2481633 8
NC_009077.1 WP_011854835.1 1163068 1164120 42559
NC_009077.1 WP_011854836.1 1164127 1164636 7


file2:



NC_008146.1 GCF_000014165.1_ASM1416v1_protein.faa
NC_008705.1 GCF_000015405.1_ASM1540v1_protein.faa
NC_009077.1 GCF_000016005.1_ASM1600v1_protein.faa


I want to match column 1 of file1 to file2 and replace itself with the respective column 2 entry of file 2.
The output would look like this:



GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7









share|improve this question









New contributor




BhushanDhamale is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.




















  • It looks like you might also be interested in our sister site: Bioinformatics.

    – terdon
    12 hours ago











  • Thank you for the link @terdon!

    – BhushanDhamale
    12 hours ago













4












4








4








I have two tab-separated files which look as follows:



file1:



NC_008146.1 WP_011558474.1 1155234 1156286 44173
NC_008146.1 WP_011558475.1 1156298 1156807 12
NC_008146.1 WP_011558476.1 1156804 1157820 -3
NC_008705.1 WP_011558474.1 1159543 1160595 42748
NC_008705.1 WP_011558475.1 1160607 1161116 12
NC_008705.1 WP_011558476.1 1161113 1162129 -3
NC_009077.1 WP_011559727.1 2481079 2481633 8
NC_009077.1 WP_011854835.1 1163068 1164120 42559
NC_009077.1 WP_011854836.1 1164127 1164636 7


file2:



NC_008146.1 GCF_000014165.1_ASM1416v1_protein.faa
NC_008705.1 GCF_000015405.1_ASM1540v1_protein.faa
NC_009077.1 GCF_000016005.1_ASM1600v1_protein.faa


I want to match column 1 of file1 to file2 and replace itself with the respective column 2 entry of file 2.
The output would look like this:



GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7









share|improve this question









New contributor




BhushanDhamale is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.












I have two tab-separated files which look as follows:



file1:



NC_008146.1 WP_011558474.1 1155234 1156286 44173
NC_008146.1 WP_011558475.1 1156298 1156807 12
NC_008146.1 WP_011558476.1 1156804 1157820 -3
NC_008705.1 WP_011558474.1 1159543 1160595 42748
NC_008705.1 WP_011558475.1 1160607 1161116 12
NC_008705.1 WP_011558476.1 1161113 1162129 -3
NC_009077.1 WP_011559727.1 2481079 2481633 8
NC_009077.1 WP_011854835.1 1163068 1164120 42559
NC_009077.1 WP_011854836.1 1164127 1164636 7


file2:



NC_008146.1 GCF_000014165.1_ASM1416v1_protein.faa
NC_008705.1 GCF_000015405.1_ASM1540v1_protein.faa
NC_009077.1 GCF_000016005.1_ASM1600v1_protein.faa


I want to match column 1 of file1 to file2 and replace itself with the respective column 2 entry of file 2.
The output would look like this:



GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7






awk






share|improve this question









New contributor




BhushanDhamale is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











share|improve this question









New contributor




BhushanDhamale is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this question




share|improve this question








edited 12 hours ago









Rui F Ribeiro

41.9k1483142




41.9k1483142






New contributor




BhushanDhamale is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









asked 13 hours ago









BhushanDhamaleBhushanDhamale

1453




1453




New contributor




BhushanDhamale is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





BhushanDhamale is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






BhushanDhamale is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.












  • It looks like you might also be interested in our sister site: Bioinformatics.

    – terdon
    12 hours ago











  • Thank you for the link @terdon!

    – BhushanDhamale
    12 hours ago

















  • It looks like you might also be interested in our sister site: Bioinformatics.

    – terdon
    12 hours ago











  • Thank you for the link @terdon!

    – BhushanDhamale
    12 hours ago
















It looks like you might also be interested in our sister site: Bioinformatics.

– terdon
12 hours ago





It looks like you might also be interested in our sister site: Bioinformatics.

– terdon
12 hours ago













Thank you for the link @terdon!

– BhushanDhamale
12 hours ago





Thank you for the link @terdon!

– BhushanDhamale
12 hours ago










2 Answers
2






active

oldest

votes


















10














You can do this very easily with awk:



$ awk 'NR==FNRa[$1]=$2; next$1=a[$1]; print' file2 file1
GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7


Or, since that looks like a tab-separated file:



$ awk -vOFS="t" 'NR==FNRa[$1]=$2; next$1=a[$1]; print' file2 file1
GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7


This assumes that every RefSeq (NC_*) id in file1 has a corresponding entry in file2.



Explanation




  • NR==FNR : NR is the current line number, FNR is the line number of the current file. The two will be identical only while the 1st file (here, file2) is being read.


  • a[$1]=$2; next: if this is the first file (see above), save the 2nd field in an array whose key is the 1st field. Then, move on to the next line. This ensures the next block isn't executed for the 1st file.


  • $1=a[$1]; print : now, in the second file, set the 1st field to whatever value was saved in the array a for the 1st field (so, the associated value from file2) and print the resulting line.





share|improve this answer




















  • 1





    NR == FNR doesn't work correctly when the first file is empty. See this and the associated answer for a workaround

    – iruvar
    12 hours ago







  • 1





    @iruvar nothing will work well if the first file is empty, so I don't really see why that's relevant. The entire point here is to combine the data from the two files. If either file is empty, the whole exercise is pointless.

    – terdon
    12 hours ago












  • sorry I should have said in this particular case file2 and not file1 is empty. Sane behaviour when file2 is empty is to report the contents of file1. The problem with NR == FNR is that code associated with it executes on the contents of file1 when file2 is empty

    – iruvar
    12 hours ago






  • 1





    @iruvar there is no sane behavior here if either file is empty. That's what I'm saying :) So trying to make it deal with that case gracefully is pointless. And, in any case, when either file is empty here, nothing is printed. Which actually seems like the sanest approach, I'd rather get no data than wrong data.

    – terdon
    12 hours ago



















12














No need for awk, assuming the files are sorted, you can use coreutils join:



join -o '2.2 1.2 1.3 1.4 1.5' file1 file2


Output:



GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7


If your files aren't, sorted, you can either sort them first (sort file1 > file1.sorted; sort file2 > file2.sorted) and then use the command above, or, if your shell supports the <() construct (bash does), you can do:



join -o '2.2 1.2 1.3 1.4 1.5' <(sort file1) <(sort file2)





share|improve this answer

























    Your Answer








    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "106"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );






    BhushanDhamale is a new contributor. Be nice, and check out our Code of Conduct.









    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f510709%2freplacing-matching-entries-in-one-column-of-a-file-by-another-column-from-a-diff%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    10














    You can do this very easily with awk:



    $ awk 'NR==FNRa[$1]=$2; next$1=a[$1]; print' file2 file1
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
    GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
    GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
    GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7


    Or, since that looks like a tab-separated file:



    $ awk -vOFS="t" 'NR==FNRa[$1]=$2; next$1=a[$1]; print' file2 file1
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
    GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
    GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
    GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7


    This assumes that every RefSeq (NC_*) id in file1 has a corresponding entry in file2.



    Explanation




    • NR==FNR : NR is the current line number, FNR is the line number of the current file. The two will be identical only while the 1st file (here, file2) is being read.


    • a[$1]=$2; next: if this is the first file (see above), save the 2nd field in an array whose key is the 1st field. Then, move on to the next line. This ensures the next block isn't executed for the 1st file.


    • $1=a[$1]; print : now, in the second file, set the 1st field to whatever value was saved in the array a for the 1st field (so, the associated value from file2) and print the resulting line.





    share|improve this answer




















    • 1





      NR == FNR doesn't work correctly when the first file is empty. See this and the associated answer for a workaround

      – iruvar
      12 hours ago







    • 1





      @iruvar nothing will work well if the first file is empty, so I don't really see why that's relevant. The entire point here is to combine the data from the two files. If either file is empty, the whole exercise is pointless.

      – terdon
      12 hours ago












    • sorry I should have said in this particular case file2 and not file1 is empty. Sane behaviour when file2 is empty is to report the contents of file1. The problem with NR == FNR is that code associated with it executes on the contents of file1 when file2 is empty

      – iruvar
      12 hours ago






    • 1





      @iruvar there is no sane behavior here if either file is empty. That's what I'm saying :) So trying to make it deal with that case gracefully is pointless. And, in any case, when either file is empty here, nothing is printed. Which actually seems like the sanest approach, I'd rather get no data than wrong data.

      – terdon
      12 hours ago
















    10














    You can do this very easily with awk:



    $ awk 'NR==FNRa[$1]=$2; next$1=a[$1]; print' file2 file1
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
    GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
    GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
    GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7


    Or, since that looks like a tab-separated file:



    $ awk -vOFS="t" 'NR==FNRa[$1]=$2; next$1=a[$1]; print' file2 file1
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
    GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
    GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
    GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7


    This assumes that every RefSeq (NC_*) id in file1 has a corresponding entry in file2.



    Explanation




    • NR==FNR : NR is the current line number, FNR is the line number of the current file. The two will be identical only while the 1st file (here, file2) is being read.


    • a[$1]=$2; next: if this is the first file (see above), save the 2nd field in an array whose key is the 1st field. Then, move on to the next line. This ensures the next block isn't executed for the 1st file.


    • $1=a[$1]; print : now, in the second file, set the 1st field to whatever value was saved in the array a for the 1st field (so, the associated value from file2) and print the resulting line.





    share|improve this answer




















    • 1





      NR == FNR doesn't work correctly when the first file is empty. See this and the associated answer for a workaround

      – iruvar
      12 hours ago







    • 1





      @iruvar nothing will work well if the first file is empty, so I don't really see why that's relevant. The entire point here is to combine the data from the two files. If either file is empty, the whole exercise is pointless.

      – terdon
      12 hours ago












    • sorry I should have said in this particular case file2 and not file1 is empty. Sane behaviour when file2 is empty is to report the contents of file1. The problem with NR == FNR is that code associated with it executes on the contents of file1 when file2 is empty

      – iruvar
      12 hours ago






    • 1





      @iruvar there is no sane behavior here if either file is empty. That's what I'm saying :) So trying to make it deal with that case gracefully is pointless. And, in any case, when either file is empty here, nothing is printed. Which actually seems like the sanest approach, I'd rather get no data than wrong data.

      – terdon
      12 hours ago














    10












    10








    10







    You can do this very easily with awk:



    $ awk 'NR==FNRa[$1]=$2; next$1=a[$1]; print' file2 file1
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
    GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
    GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
    GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7


    Or, since that looks like a tab-separated file:



    $ awk -vOFS="t" 'NR==FNRa[$1]=$2; next$1=a[$1]; print' file2 file1
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
    GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
    GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
    GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7


    This assumes that every RefSeq (NC_*) id in file1 has a corresponding entry in file2.



    Explanation




    • NR==FNR : NR is the current line number, FNR is the line number of the current file. The two will be identical only while the 1st file (here, file2) is being read.


    • a[$1]=$2; next: if this is the first file (see above), save the 2nd field in an array whose key is the 1st field. Then, move on to the next line. This ensures the next block isn't executed for the 1st file.


    • $1=a[$1]; print : now, in the second file, set the 1st field to whatever value was saved in the array a for the 1st field (so, the associated value from file2) and print the resulting line.





    share|improve this answer















    You can do this very easily with awk:



    $ awk 'NR==FNRa[$1]=$2; next$1=a[$1]; print' file2 file1
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
    GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
    GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
    GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7


    Or, since that looks like a tab-separated file:



    $ awk -vOFS="t" 'NR==FNRa[$1]=$2; next$1=a[$1]; print' file2 file1
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
    GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
    GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
    GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7


    This assumes that every RefSeq (NC_*) id in file1 has a corresponding entry in file2.



    Explanation




    • NR==FNR : NR is the current line number, FNR is the line number of the current file. The two will be identical only while the 1st file (here, file2) is being read.


    • a[$1]=$2; next: if this is the first file (see above), save the 2nd field in an array whose key is the 1st field. Then, move on to the next line. This ensures the next block isn't executed for the 1st file.


    • $1=a[$1]; print : now, in the second file, set the 1st field to whatever value was saved in the array a for the 1st field (so, the associated value from file2) and print the resulting line.






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited 12 hours ago

























    answered 12 hours ago









    terdonterdon

    133k33268448




    133k33268448







    • 1





      NR == FNR doesn't work correctly when the first file is empty. See this and the associated answer for a workaround

      – iruvar
      12 hours ago







    • 1





      @iruvar nothing will work well if the first file is empty, so I don't really see why that's relevant. The entire point here is to combine the data from the two files. If either file is empty, the whole exercise is pointless.

      – terdon
      12 hours ago












    • sorry I should have said in this particular case file2 and not file1 is empty. Sane behaviour when file2 is empty is to report the contents of file1. The problem with NR == FNR is that code associated with it executes on the contents of file1 when file2 is empty

      – iruvar
      12 hours ago






    • 1





      @iruvar there is no sane behavior here if either file is empty. That's what I'm saying :) So trying to make it deal with that case gracefully is pointless. And, in any case, when either file is empty here, nothing is printed. Which actually seems like the sanest approach, I'd rather get no data than wrong data.

      – terdon
      12 hours ago













    • 1





      NR == FNR doesn't work correctly when the first file is empty. See this and the associated answer for a workaround

      – iruvar
      12 hours ago







    • 1





      @iruvar nothing will work well if the first file is empty, so I don't really see why that's relevant. The entire point here is to combine the data from the two files. If either file is empty, the whole exercise is pointless.

      – terdon
      12 hours ago












    • sorry I should have said in this particular case file2 and not file1 is empty. Sane behaviour when file2 is empty is to report the contents of file1. The problem with NR == FNR is that code associated with it executes on the contents of file1 when file2 is empty

      – iruvar
      12 hours ago






    • 1





      @iruvar there is no sane behavior here if either file is empty. That's what I'm saying :) So trying to make it deal with that case gracefully is pointless. And, in any case, when either file is empty here, nothing is printed. Which actually seems like the sanest approach, I'd rather get no data than wrong data.

      – terdon
      12 hours ago








    1




    1





    NR == FNR doesn't work correctly when the first file is empty. See this and the associated answer for a workaround

    – iruvar
    12 hours ago






    NR == FNR doesn't work correctly when the first file is empty. See this and the associated answer for a workaround

    – iruvar
    12 hours ago





    1




    1





    @iruvar nothing will work well if the first file is empty, so I don't really see why that's relevant. The entire point here is to combine the data from the two files. If either file is empty, the whole exercise is pointless.

    – terdon
    12 hours ago






    @iruvar nothing will work well if the first file is empty, so I don't really see why that's relevant. The entire point here is to combine the data from the two files. If either file is empty, the whole exercise is pointless.

    – terdon
    12 hours ago














    sorry I should have said in this particular case file2 and not file1 is empty. Sane behaviour when file2 is empty is to report the contents of file1. The problem with NR == FNR is that code associated with it executes on the contents of file1 when file2 is empty

    – iruvar
    12 hours ago





    sorry I should have said in this particular case file2 and not file1 is empty. Sane behaviour when file2 is empty is to report the contents of file1. The problem with NR == FNR is that code associated with it executes on the contents of file1 when file2 is empty

    – iruvar
    12 hours ago




    1




    1





    @iruvar there is no sane behavior here if either file is empty. That's what I'm saying :) So trying to make it deal with that case gracefully is pointless. And, in any case, when either file is empty here, nothing is printed. Which actually seems like the sanest approach, I'd rather get no data than wrong data.

    – terdon
    12 hours ago






    @iruvar there is no sane behavior here if either file is empty. That's what I'm saying :) So trying to make it deal with that case gracefully is pointless. And, in any case, when either file is empty here, nothing is printed. Which actually seems like the sanest approach, I'd rather get no data than wrong data.

    – terdon
    12 hours ago














    12














    No need for awk, assuming the files are sorted, you can use coreutils join:



    join -o '2.2 1.2 1.3 1.4 1.5' file1 file2


    Output:



    GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
    GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
    GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
    GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
    GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
    GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7


    If your files aren't, sorted, you can either sort them first (sort file1 > file1.sorted; sort file2 > file2.sorted) and then use the command above, or, if your shell supports the <() construct (bash does), you can do:



    join -o '2.2 1.2 1.3 1.4 1.5' <(sort file1) <(sort file2)





    share|improve this answer





























      12














      No need for awk, assuming the files are sorted, you can use coreutils join:



      join -o '2.2 1.2 1.3 1.4 1.5' file1 file2


      Output:



      GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
      GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
      GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
      GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
      GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
      GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
      GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
      GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
      GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7


      If your files aren't, sorted, you can either sort them first (sort file1 > file1.sorted; sort file2 > file2.sorted) and then use the command above, or, if your shell supports the <() construct (bash does), you can do:



      join -o '2.2 1.2 1.3 1.4 1.5' <(sort file1) <(sort file2)





      share|improve this answer



























        12












        12








        12







        No need for awk, assuming the files are sorted, you can use coreutils join:



        join -o '2.2 1.2 1.3 1.4 1.5' file1 file2


        Output:



        GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
        GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
        GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
        GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
        GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
        GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
        GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
        GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
        GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7


        If your files aren't, sorted, you can either sort them first (sort file1 > file1.sorted; sort file2 > file2.sorted) and then use the command above, or, if your shell supports the <() construct (bash does), you can do:



        join -o '2.2 1.2 1.3 1.4 1.5' <(sort file1) <(sort file2)





        share|improve this answer















        No need for awk, assuming the files are sorted, you can use coreutils join:



        join -o '2.2 1.2 1.3 1.4 1.5' file1 file2


        Output:



        GCF_000014165.1_ASM1416v1_protein.faa WP_011558474.1 1155234 1156286 44173
        GCF_000014165.1_ASM1416v1_protein.faa WP_011558475.1 1156298 1156807 12
        GCF_000014165.1_ASM1416v1_protein.faa WP_011558476.1 1156804 1157820 -3
        GCF_000015405.1_ASM1540v1_protein.faa WP_011558474.1 1159543 1160595 42748
        GCF_000015405.1_ASM1540v1_protein.faa WP_011558475.1 1160607 1161116 12
        GCF_000015405.1_ASM1540v1_protein.faa WP_011558476.1 1161113 1162129 -3
        GCF_000016005.1_ASM1600v1_protein.faa WP_011559727.1 2481079 2481633 8
        GCF_000016005.1_ASM1600v1_protein.faa WP_011854835.1 1163068 1164120 42559
        GCF_000016005.1_ASM1600v1_protein.faa WP_011854836.1 1164127 1164636 7


        If your files aren't, sorted, you can either sort them first (sort file1 > file1.sorted; sort file2 > file2.sorted) and then use the command above, or, if your shell supports the <() construct (bash does), you can do:



        join -o '2.2 1.2 1.3 1.4 1.5' <(sort file1) <(sort file2)






        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited 12 hours ago









        terdon

        133k33268448




        133k33268448










        answered 12 hours ago









        ThorThor

        12.1k13762




        12.1k13762




















            BhushanDhamale is a new contributor. Be nice, and check out our Code of Conduct.









            draft saved

            draft discarded


















            BhushanDhamale is a new contributor. Be nice, and check out our Code of Conduct.












            BhushanDhamale is a new contributor. Be nice, and check out our Code of Conduct.











            BhushanDhamale is a new contributor. Be nice, and check out our Code of Conduct.














            Thanks for contributing an answer to Unix & Linux Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid


            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.

            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f510709%2freplacing-matching-entries-in-one-column-of-a-file-by-another-column-from-a-diff%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Magento 2 duplicate PHPSESSID cookie when using session_start() in custom php scriptMagento 2: User cant logged in into to account page, no error showing!Magento duplicate on subdomainGrabbing storeview from cookie (after using language selector)How do I run php custom script on magento2Magento 2: Include PHP script in headerSession lock after using Cm_RedisSessionscript php to update stockMagento set cookie popupMagento 2 session id cookie - where to find it?How to import Configurable product from csv with custom attributes using php scriptMagento 2 run custom PHP script

            Can not update quote_id field of “quote_item” table magento 2Magento 2.1 - We can't remove the item. (Shopping Cart doesnt allow us to remove items before becomes empty)Add value for custom quote item attribute using REST apiREST API endpoint v1/carts/cartId/items always returns error messageCorrect way to save entries to databaseHow to remove all associated quote objects of a customer completelyMagento 2 - Save value from custom input field to quote_itemGet quote_item data using quote id and product id filter in Magento 2How to set additional data to quote_item table from controller in Magento 2?What is the purpose of additional_data column in quote_item table in magento2Set Custom Price to Quote item magento2 from controller

            How to solve knockout JS error in Magento 2 Planned maintenance scheduled April 23, 2019 at 23:30 UTC (7:30pm US/Eastern) Announcing the arrival of Valued Associate #679: Cesar Manara Unicorn Meta Zoo #1: Why another podcast?(Magento2) knockout.js:3012 Uncaught ReferenceError: Unable to process bindingUnable to process binding Knockout.js magento 2Cannot read property `scopeLabel` of undefined on Product Detail PageCan't get Customer Data on frontend in Magento 2Magento2 Order Summary - unable to process bindingKO templates are not loading in Magento 2.1 applicationgetting knockout js error magento 2Product grid not load -— Unable to process binding Knockout.js magento 2Product form not loaded in magento2Uncaught ReferenceError: Unable to process binding “if: function()return (isShowLegend()) ” magento 2