BLAST Two Sequences from a Python script - alignment

I have a list of pairs of proteins and I want to compare the speed and accuracy of "BLAST Two Sequences" against those of a Smith-Waterman alignment program.
I know there is a "BLAST Two Sequences" option on the NCBI website, but I would like to run it from a Python script. Perhaps Biopython has this capability?
If I cannot use Blast Two Sequences, I will compare different versions of Smith-Waterman, but this would not be nearly as exciting :)
OR, if anyone has another idea for a great senior year project in Bioinformatics involving comparing pairs of proteins, please don't hesitate to let me know! Thank you in advance.

Chapter 7 (BLAST) of the Biopython Tutorial and Cookbook should have what you're looking for.
The Bio.Blast.NCBIWWW module allows interaction with NCBI's online BLAST tools, Bio.Blast.Applications wraps a number of local alignment utilities, and the Bio.Seq module contains objects for working with sequences.
Biopython's documentation is quite good and the API is generally well written. If you're looking for an interesting project, I suggest you read the Tutorial and Cookbook: it's a good overview of what Biopython has to offer.
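To make this concrete, here is a minimal sketch of both halves of the comparison, assuming the BLAST+ binaries are installed and that a.fasta and b.fasta are placeholder one-record protein files (blastp accepts a -subject file in place of a database, which is the command-line equivalent of "BLAST Two Sequences"). Note that newer Biopython releases move blosum62 into Bio.Align.substitution_matrices, so adjust the import to your version.

    # Pairwise BLAST via BLAST+, then Smith-Waterman via pairwise2.
    from Bio.Blast.Applications import NcbiblastpCommandline
    from Bio.Blast import NCBIXML
    from Bio import pairwise2
    from Bio.SubsMat.MatrixInfo import blosum62

    # "BLAST two sequences": -subject replaces the usual database.
    blastp = NcbiblastpCommandline(query="a.fasta", subject="b.fasta",
                                   outfmt=5, out="pair.xml")
    stdout, stderr = blastp()
    with open("pair.xml") as handle:
        record = NCBIXML.read(handle)
    for alignment in record.alignments:
        for hsp in alignment.hsps:
            print("BLAST score:", hsp.score, "E-value:", hsp.expect)

    # Smith-Waterman-style local alignment with BLOSUM62;
    # gap open -10 / extend -0.5 are common defaults worth tuning.
    seq_a = "MKTAYIAKQR"            # placeholder sequences
    seq_b = "MKTAHIAKQRQISFVK"
    alignments = pairwise2.align.localds(seq_a, seq_b, blosum62, -10, -0.5)
    print(pairwise2.format_alignment(*alignments[0]))

Timing both calls over your list of pairs (e.g. with time.perf_counter) would give you the speed half of the comparison directly.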

Are there any papers describing why flex is faster than lex?

flex is called the "fast" lexical analyzer, but I cannot find any document that explains why it is faster than lex. flex has a manual, but it focuses on usage rather than internals. Could any experts in this field help, please? Either an explanation of flex's performance improvements or a link to one is welcome.
This answer is from Vern Paxson, who has allowed it to be shared here.
Alas, this would take quite a bit of time to sketch in any sort of useful detail, as there are a number of techniques that contribute to its performance. I wrote a paper about it a loooong time ago (mid 80s!) but don't have a copy of it. Evidently you can buy it from:
http://www.ntis.gov/search/product.aspx?ABBR=DE85000703
Sorry not to be of more help ...
To add to Vern's statement, flex does a lot better job of table compression, providing several different space/time tradeoffs, and its inner loop is also considerably faster than lex's.
According to a (Usenet?) paper by Van Jacobson in the 1980s, lex was largely written by an AT&T intern. VJ described how its inner loop could be reduced from several dozen instructions to about three.
Vern Paxson wrote flex for what he described at the time as the fastest data acquisition applications in the world. Not sure if I should go into more detail here.
I had the privilege of helping Vern with the 8-bit version, as I was working on compilers that had to scan Kanji and Katakana at the time.
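To give a feel for what "inner loop" means here: a table-driven scanner spends nearly all of its time in a tiny state-transition loop, and flex's wins come from shrinking both that loop and the transition tables it indexes. A simplified, illustrative sketch (Python rather than generated C, and a plain dict where flex uses compressed arrays):

    def scan(transition, accepting, data):
        """Run a DFA over data, returning the end of the longest match."""
        state = 0
        last_accept = -1                 # position of longest match so far
        for pos, ch in enumerate(data):
            state = transition.get((state, ch), -1)
            if state < 0:                # dead state: no token can continue
                break
            if state in accepting:
                last_accept = pos
        return last_accept

In generated C the body of that loop is just a couple of indexed loads and a branch, which is why going from dozens of instructions to about three matters so much.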
I'm not so sure flex is that much faster than the AT&T version of lex. The two programs were developed independently, and to avoid confusion with the official version, the authors of flex probably chose a slightly different name. They may have intended to generate faster scanners, which is also suggested by a couple of options that trade space for time; they likewise cite scanner speed as the motivation for making %option yylineno (and a few other features) optional.
Whether the slight differences in speed for such scanners are still relevant is debatable. I couldn't find any official statement on the choice of name either, so I guess you'd have to ask the original authors, Jef Poskanzer and/or Vern Paxson. If you find them and get an answer, please let us know here. The history of software is interesting, and you can still get the answer first hand.

Creating parsers using flex/bison

Hi, I need to create a parser for search engine advanced query languages.
For instance: "food" language:es
I want to use flex and bison, but I've never used them. If anyone could point me to a good online tutorial, that would be really helpful; I've been looking but haven't found anything useful.
Also, if anyone can provide sample flex/bison code, I would really appreciate it.
Thanks so much in advance
I'm surprised you have been unable to find good tutorials online, as flex, bison, and similar compiler tools are taught in a large number of computer science courses worldwide. Because so many people are learning them, there are plenty of resources available; you may not have been using the right search terms. There are also numerous helpful tutorial videos on YouTube (including mine).
When I searched, this one came up as the first result: http://aquamentus.com/flex_bison.html
The page suggested by @Bart Kiers, http://dinosaur.compilertools.net/, is good too.
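If you'd like something runnable to poke at before setting up the C toolchain, here is a hedged sketch of the same lexer/parser split using Python's ply library, which deliberately mirrors flex/bison. The toy grammar (quoted phrases, bare words, and key:value filters) is my own invention for queries like "food" language:es:

    import ply.lex as lex
    import ply.yacc as yacc

    # ----- lexer (the flex half) -----
    tokens = ('PHRASE', 'WORD', 'COLON')
    t_COLON = r':'
    t_ignore = ' \t'

    def t_PHRASE(t):
        r'"[^"]*"'
        t.value = t.value[1:-1]          # strip the quotes
        return t

    def t_WORD(t):
        r'[A-Za-z]+'
        return t

    def t_error(t):
        t.lexer.skip(1)

    lexer = lex.lex()

    # ----- parser (the bison half) -----
    def p_query(p):
        '''query : query term
                 | term'''
        p[0] = p[1] + [p[2]] if len(p) == 3 else [p[1]]

    def p_term_filter(p):
        'term : WORD COLON WORD'
        p[0] = ('filter', p[1], p[3])

    def p_term_phrase(p):
        'term : PHRASE'
        p[0] = ('phrase', p[1])

    def p_term_word(p):
        'term : WORD'
        p[0] = ('word', p[1])

    def p_error(p):
        pass

    parser = yacc.yacc()
    print(parser.parse('"food" language:es'))
    # -> [('phrase', 'food'), ('filter', 'language', 'es')]

The .l/.y files in the flex/bison tutorials above have exactly the same shape: token regexes on one side, grammar rules with actions on the other.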

A production-ready, real-time recommendation engine that's easy to set up

I want to store a large number of data points for user actions, such as likes, tags, etc. (I have plans for both e-commerce and document management).
With the data points, I want to support functions such as
"users who loved X loved Y,Z" recommendations
"fetch more stuff similar to X,Y" clustering.
By production-ready and real-time, I mean that I can enter data points and make queries at the same time; the server will take care of answering queries and updating scores by itself.
I searched around the interwebs, and the solutions that come up are either:
Data-mining libraries that are mostly academically oriented and meant for large batch operations, not heavy real-time queries
Hadoop/Mahout, which is production-ready and supports real-time updates and queries, but has a steep learning curve and is tough to administer.
For recommenders, Mahout has a non-distributed recommender implementation that does not use Hadoop. In fact, this is the only part that is real-time; the Hadoop-based parts are not.
I think there is little learning curve to it; see here and here for a pretty complete writeup.
Mahout in Action chapters 2-5 cover this quite well too.
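To make the non-distributed case concrete, here is a library-free sketch of the item-based "users who loved X loved Y" idea using cosine similarity over a made-up ratings dict (Mahout's Taste recommenders are, at heart, a tuned and scalable version of this):

    from math import sqrt
    from collections import defaultdict

    # user -> {item: score}; a stand-in for your likes/tags data points
    ratings = {
        "alice": {"X": 5, "Y": 4, "Z": 1},
        "bob":   {"X": 4, "Y": 5},
        "carol": {"X": 1, "Z": 5},
    }

    def item_vectors(ratings):
        """Invert to item -> {user: score}."""
        items = defaultdict(dict)
        for user, prefs in ratings.items():
            for item, score in prefs.items():
                items[item][user] = score
        return items

    def cosine(a, b):
        shared = set(a) & set(b)
        num = sum(a[u] * b[u] for u in shared)
        den = sqrt(sum(v * v for v in a.values())) * \
              sqrt(sum(v * v for v in b.values()))
        return num / den if den else 0.0

    items = item_vectors(ratings)
    # rank the other items by similarity to X
    print(sorted(((cosine(items["X"], items[i]), i)
                  for i in items if i != "X"), reverse=True))
    # Y scores ~0.96 here and Z ~0.30, so Y is recommended first

Updating such scores incrementally as new data points arrive is exactly the part that takes engineering effort, which is where Mahout's non-distributed recommenders earn their keep.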
Please understand that for useful recommendations, the various parameters of such a system must be carefully fine-tuned. The out-of-the-box functionality many systems provide (Oracle data mining, Microsoft data mining extensions, etc.) offers just the core functionality.
So in the end, you will not get around the "steep learning curve", I guess. That is why you need experts for data mining. If there were a point-and-click solution, it would already be integrated everywhere.
Example "similar items". I laughed hard, when Amazon once recommended me to buy two products: Debian Linux Administrators Handbook and ... Debian Linux Admininstrators Handbook WITH CD.
I hope you get the key point of this example: to a plain algorithm, the two books appear "similar", and thus a sensible combination. To a human, it it pointless to buy the same book twice. You need to teach such rules to any recommendation system, as they cannot be trivially learned from the data. There will always be good results and useless results, and you need to tune and parameterize the system carefully.

Introduction to LaTeX

What's a good website that has an introduction to LaTeX for Windows users? I will be using it mainly to write up math homework problems and then converting them to PDF to print out. I'm hoping somebody has bookmarked a good link already so I don't have to search. Thanks!
You should start with "The Not So Short Introduction to LaTeX":
http://www.ctan.org/tex-archive/info/lshort/english/lshort.pdf
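To give a taste before you read further, a math homework file can be as small as this; compile it with pdflatex to get the PDF:

    \documentclass{article}
    \usepackage{amsmath}  % standard package for math environments

    \begin{document}

    \section*{Homework 1}

    \textbf{Problem 1.} Solve for $x$:
    \[
      x^2 - 5x + 6 = 0
    \]

    \textbf{Solution.} Factoring gives $(x-2)(x-3) = 0$, so $x = 2$ or $x = 3$.

    \end{document}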
I recommend this one: http://en.wikibooks.org/wiki/LaTeX
LaTeX works the same across platforms (and even across distributions, except that some may provide features others don't), so it doesn't really matter what your platform is.
I find that Andy Roberts's site is perfect for beginners: it has a lot of newbie exercises and takes you by the hand in a perfectly controlled fashion. It is my online reference for basic LaTeX questions.
http://www.andy-roberts.net/misc/latex/index.html
The LaTeX Community site doesn't seem to have any beginner tutorials, but it does have a number of articles that go into specific uses. If one of those articles covers what you're trying to do, it may give you a head start.
Beyond tutorials: as a beginner getting into LaTeX, I found the TeXnicCenter open-source IDE very useful. It makes life a lot easier to have syntax colouring and templates to help with common structures like tables.

How can I implement a semantic ontology in Ruby on Rails?

I'm working on a "twitter filter", more to learn Ruby on Rails than anything else. The idea is that I use a semantic ontology to look up a user's interests. So if a user says they're interested in "sports", that means flagging any tweets that discuss "sports", "golf", "football", and so on.
I'd like to be able to expand it to any hierarchy of topics, though. So if you're interested in Europe, flag all the countries in Europe.
Naturally this is rather complex, so maybe we'd limit it to one or two "levels" of lookup...
How could I do this efficiently? I'm pretty familiar with Java, C, and Ruby, and have worked a lot with MySQL.
I'd look into Doug Lenat's Cyc. It's done and open.
I'm not sure if it will help you, but Google has something called Google Sets. You can look at it here: http://labs.google.com/sets
Before you think about programming languages and technology, think about this: what kind of data structure is a "semantic ontology"?
To me that sounds like some kind of a directed graph.
Knowing that, you'll soon find that it's quite easy to implement such a structure in whatever language and technology you want, and that a lot of languages already have some kind of graph library (e.g. RGL for Ruby).
To me the real problem isn't how to implement such a data structure efficiently, but how to get the semantic information you need out of Twitter to build it (e.g. who tells your application that Europe isn't a part of Spain, but that Spain is a part of Europe?).
Anyway, have fun implementing it, sounds like a cool project! :-)
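To make the directed-graph idea concrete (sketched in Python only because, as noted above, the structure is language-agnostic; the topic edges are invented):

    from collections import deque

    # topic -> narrower topics; a hand-built stand-in for a real ontology
    ontology = {
        "sports": ["golf", "football"],
        "europe": ["spain", "france"],
        "spain":  ["madrid"],
    }

    def expand_interest(topic, max_depth=2):
        """Breadth-first walk collecting subtopics up to max_depth levels."""
        seen = {topic}
        queue = deque([(topic, 0)])
        while queue:
            current, depth = queue.popleft()
            if depth == max_depth:
                continue
            for child in ontology.get(current, []):
                if child not in seen:
                    seen.add(child)
                    queue.append((child, depth + 1))
        return seen

    print(expand_interest("europe"))
    # -> {'europe', 'spain', 'france', 'madrid'}

A tweet would then be flagged when any of its terms hits the expanded interest set, and the max_depth parameter is your "one or two levels" knob.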
I'm not sure what your requirements are. But it seems that either Singular Value Decomposition (SVD) or Support Vector Machines (SVM) will work for you.
