This will be a short exercise since all of you have projects to work on - and pattern finding should be a part of all projects.
Start with your sequence.
If this is a new or unknown sequence, where do you start an analysis?
OK, so most of you know by now that a BLAST or FASTA search is the best place to start any analysis of an unknown sequence, but let's skip that for now and think about pattern recognition. Does this gene match any known patterns?
The first problem is that this is an unknown chunk of DNA. Does it have any genes in it? Let's do a quick check for open reading frames. The best tool for this is FRAMES, but we can also use MAP if no graphics viewing options are available.
The output from MAP is a bit confusing, but if you choose just one enzyme and the option to translate only open reading frames, it can be deciphered. The command line option "/open=20" will limit the display to open reading frames longer than 20 amino acids.
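For anyone working outside GCG, the idea behind the open reading frame check can be sketched in a few lines of Python. This is not how FRAMES or MAP is implemented; it is a minimal illustration that assumes the standard genetic code and treats any stop-free stretch of a translated frame as "open" (no start codon is required). The names `open_frames` and `min_len` are made up for this sketch, with `min_len=20` mirroring the 20 amino acid cutoff above.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: aa
               for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unrecognized codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def open_frames(dna, min_len=20):
    """List (strand, frame, peptide) for stop-free stretches longer than min_len.

    Assumption for this sketch: an open reading frame is any run of codons
    between two stop codons (or a sequence end), in any of the six frames.
    """
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            protein = translate(seq[offset:])
            for segment in protein.split("*"):
                if len(segment) > min_len:
                    hits.append((strand, offset + 1, segment))
    return hits
```

Calling `open_frames(your_sequence)` returns one entry per open stretch, which is roughly the information you are trying to read out of the MAP display. Expect many short spurious hits on real DNA, which is why the length cutoff matters.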
OK, now try a BLAST search against the SWISS-PROT database (GCG will automatically use BLASTX to translate your DNA query sequence in all six reading frames for comparison to the protein database).
Isn't that an easier way to find the coding sequence in a stretch of DNA? Too bad this doesn't work for all new sequences that you find in the lab.
Take the protein sequence generated by BLASTX and use it for a MOTIFS search. In this case it is simple and it works. Use this same protein sequence to search the ProDom database and look at a multiple alignment of related genes. [ http://protein.toulouse.inra.fr/prodom.html ]
If you were new to the study of this gene, this information would probably be valuable.
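To see what a MOTIFS-style search is doing under the hood, here is a rough Python sketch that turns a PROSITE-style pattern into a regular expression and scans a protein sequence with it. It is an illustration only, not the GCG implementation: the helper names `prosite_to_regex` and `find_motif` are invented here, and the converter handles only the common pattern syntax (`x`, `[...]`, `{...}`, repeat counts, and end anchors outside brackets). The example pattern `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation site; the test peptide is made up.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern to a Python regular expression.

    Simplifying assumption: anchors (< and >) appear only outside brackets.
    """
    pattern = pattern.rstrip(".")
    parts = []
    for element in pattern.split("-"):
        # {P} means "anything except P"; do this before reusing braces below.
        element = element.replace("{", "[^").replace("}", "]")
        # x(2,4) means "two to four of any residue".
        element = element.replace("(", "{").replace(")", "}")
        element = element.replace("x", ".")
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


def find_motif(protein, pattern):
    """Return (1-based position, matched residues) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(protein.upper())]


if __name__ == "__main__":
    # Made-up peptide: the first N fits the pattern, the second is followed by P.
    print(find_motif("MKNGSAANPTQ", "N-{P}-[ST]-{P}"))
```

Short patterns like this one match by chance in many proteins, so a hit is a hint to follow up on, not proof of function.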