partial match help


cupcakeluv_333

Hello,
I am in desperate need of help! Here are my details:
I have one column with different gene identities, such as
"gi|351702631|gb|EHB05550.1|".
I have another column with the identities matched with gene
descriptions, for example "351702631gb|EHB05550.1|EHB05550.1cGMP-gated
cation channel alpha-1 [Heterocephalus glaber]".
I need to find the match for column 1 in column 2; however, as you can
see, they are not exact matches. I need to put the match from column 2
into a 3rd column. The lengths of column 1 and 2 do not match.
Please help!
Thanks :)
 

Ron Rosenfeld

Hello,
I am in desperate need of help! Here are my details:
I have one column with different gene identities, such as
"gi|351702631|gb|EHB05550.1|".
I have another column with the identities matched with gene
descriptions, for example "351702631gb|EHB05550.1|EHB05550.1cGMP-gated
cation channel alpha-1 [Heterocephalus glaber]".
I need to find the match for column 1 in column 2; however, as you can
see, they are not exact matches. I need to put the match from column 2
into a 3rd column. The lengths of column 1 and 2 do not match.
Please help!
Thanks :)

This is not straightforward, because it would require multiple substitutions in one string or the other to produce a match. In the example you present, one would have to remove the leading "gi|" and the second "|" from the gene identity just to get a partial match. Without knowing how the gene identity strings and the gene description strings are constructed, it would be very difficult to develop an accurate algorithm for deciding which matches are proper and which are improper.

Some questions that come to mind have to do with the location of the pipes, especially since they differ between the two strings:
the leading "gi|" in the gene identity string -- is there something at the beginning that can always be ignored?
the significance of the second "EHB05550.1" in the gene description string
how much of the gene identity has to match the gene description in order to constitute a proper match
etc.
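
Purely as an illustration of the kind of rule these questions would have to settle, here is a minimal Python sketch of one possible approach. It assumes, without confirmation from the thread, that the leading "gi|" can always be ignored and that the pipes carry no meaning for matching, so a description counts as a match when it starts with the pipe-free identity. The function names (normalize_id, find_description, match_columns) are hypothetical.

```python
from typing import List, Optional


def normalize_id(gene_id: str) -> str:
    """Drop a leading 'gi|' (assumed ignorable) and remove every pipe."""
    text = gene_id.strip()
    if text.startswith("gi|"):
        text = text[3:]
    return text.replace("|", "")


def find_description(gene_id: str, descriptions: List[str]) -> Optional[str]:
    """Return the first description whose pipe-free text starts with the
    normalized identity, or None when nothing matches."""
    key = normalize_id(gene_id)
    if not key:
        return None
    for description in descriptions:
        if description.strip().replace("|", "").startswith(key):
            return description
    return None


def match_columns(ids: List[str], descriptions: List[str]) -> List[Optional[str]]:
    """Build the 'third column': one entry per identity in column 1.
    The two input columns may have different lengths."""
    return [find_description(gene_id, descriptions) for gene_id in ids]


if __name__ == "__main__":
    column_1 = ["gi|351702631|gb|EHB05550.1|"]
    column_2 = [
        "351702631gb|EHB05550.1|EHB05550.1cGMP-gated cation channel "
        "alpha-1 [Heterocephalus glaber]"
    ]
    for gene_id, match in zip(column_1, match_columns(column_1, column_2)):
        print(gene_id, "->", match)
```

If either assumption fails on the real data (for instance, if some identities start with something other than "gi|"), the normalization step would need to change accordingly.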
 
