I'm a graduate student researching complex networks. I am working on a project that involves analyzing connections between users (following and followers). Is it possible to write a crawler for Twitter based on friendship information?
I looked around but couldn't find anything useful so far.
Thanks
Leila
I don't know whether something "out of the box" exists in this case.
However, there are frameworks that make it quite easy to write crawlers/spiders.
For instance:
In Python: http://scrapy.org/
In Java: http://java.sun.com/developer/technicalArticles/ThirdParty/WebCrawler/
I like Scrapy for its simplicity in this case, but you might want to look for a library suited to the languages you are familiar with.
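Whichever library you pick, the crawl itself is usually a breadth-first traversal of the follow graph. Here is a minimal sketch of that idea in plain Python. It assumes you supply your own `get_following` function (a hypothetical name, not part of Scrapy or any Twitter client) that returns the accounts a user follows; the toy dictionary below only stands in for real responses.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set, Tuple


def crawl_follow_graph(
    seed: str,
    get_following: Callable[[str], Iterable[str]],
    max_users: int = 1000,
) -> List[Tuple[str, str]]:
    """Breadth-first crawl starting at `seed`.

    Returns directed edges (follower, followed). At most `max_users`
    accounts are expanded, which keeps the crawl bounded.
    """
    visited: Set[str] = {seed}
    queue = deque([seed])
    edges: List[Tuple[str, str]] = []
    expanded = 0
    while queue and expanded < max_users:
        user = queue.popleft()
        expanded += 1
        for friend in get_following(user):
            edges.append((user, friend))
            if friend not in visited:
                visited.add(friend)
                queue.append(friend)
    return edges


# Toy in-memory data standing in for real API or scraper responses.
TOY_GRAPH: Dict[str, List[str]] = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": ["alice"],
}


def toy_get_following(user: str) -> List[str]:
    """Hypothetical fetcher; replace with a real lookup."""
    return TOY_GRAPH.get(user, [])


edges = crawl_follow_graph("alice", toy_get_following)
```

The resulting edge list can be loaded into whatever network analysis tool you already use. A real fetcher would also need to respect rate limits and the site's terms of use, which this sketch leaves out.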
I've been looking for a long time for a simple solution for building a comparison website for different kinds of products.
There are ways to build something like laptopvslaptop.com with WordPress or another CMS, but they all involve too much code and tons of JS libraries. I'm looking for a lightweight, simple script for comparing products.
The sticking point is that I am not a programmer. And yes, there are tons of tutorials out there, but none that describe building a site like snapsort, cpuboss or laptopvslaptop. Hiring someone would be an option, but I would like to build it myself with basic PHP, CSS and HTML skills (or, better, something based on Node.js or Next.js).
Maybe someone here can give me some advice on building a simple, lightweight comparison script. I like laptopvslaptop.com because of its simplicity. How is this site made?
Thanks in advance for your thoughts.
I want to build a wiki-based website with some additional features, namely comments, social sharing, video insertion, article rating and gamification. In a nutshell, something very close to the StackExchange websites, but each page would consist of a single article implementing the footnote feature, instead of a thread of questions.
I have not coded a single line yet.
I am rather experienced with Grails, so I know Groovy and Java. I also know jQuery and a bit of PHP, but I can learn basically anything required. I will be the only one programming on the project.
My questions are:
Which technology should I use, according to YOU?
Should I use Grails, as this is what I know best, and try to integrate a wiki technology within my app (if yes, which one)?
Should I start from an already existing wiki technology (MediaWiki, XWiki, TWiki, MoinMoin, ...) and modify it to integrate the features I need (gamification, comments, video insertion, article rating and social sharing)? Once again, if you think that is the best solution, please name a technology and, if possible, tell me why this is THE one.
Thank you very much for your help. I find it rather hard to choose, and even harder to know which path is the right one to take.
Any suggestion is most welcome.
I would suggest using MediaWiki for the following reasons:
You mentioned a wiki-based website.
It already has lots of extensions built for your needs (comments, article rating, sharing).
Since you mentioned you know a little PHP, you can also modify some of the extensions for your use.
MediaWiki has (via extensions) support for social sharing, video insertion and article rating, and not-great-but-okay support for comments. (Most other wiki platforms probably do too; these are common enough features.) Wikia (a MediaWiki-based wiki farm that open-sourced most of its custom code) has some gamification features, though I am not familiar with them. MediaWiki also has the advantage of having the most widely known wiki dialect (due to the popularity of Wikipedia).
That said, if you are going for minimal development effort, I would look into adding wiki features to an existing StackOverflow clone before trying to add gamification, comments and similar features to a wiki.
I'm quite new to the field of computer science but I think I've got a pretty decent idea for a website to aid classroom CS learning and collaboration. I'd really like to develop the website from the ground up and make it a sort of pet project in hopes of eventually getting it out on the web for free. Hopefully I can get some teachers to adopt it for use with their classes.
The problem is that I honestly don't know where to start. I've got the idea but I don't have enough formal education to guide the implementation of my idea. The site should have quite a bit of functionality in the long run. I'll need to be able to store user and class data/files as well as offer discussion boards and other things.
Without getting into too many details, what is the best way for me to get started? What languages and databases should I be most interested in as I build the site and ensure scalability and future functionality developments? I would really appreciate any information you could give me on how to structure the project/stack as I don't have much of a clue at this point. I have the idea. Now I just need a little bit of help getting started.
Thanks!
There are definitely already projects out there that will (more than likely) do everything you're currently considering. That said, there's immense benefit in doing a project like this for personal development - you get to learn, and you expand your public portfolio. If you run the project as open source, you can also demonstrate your ability to work with others. All very good (hireable) attributes.
Are there any programming languages you already know? Are there any that your course is going to be teaching that you know ahead of time?
There are so many different languages and frameworks available to choose from, but I'll only mention a few.
Language: Framework
.NET: ASP.NET MVC
Python: Django
Ruby: Ruby on Rails
I'm a huge fan of Django. Python is quite a nice language to learn. I'd recommend Django purely from a biased point of view. Python runs on Windows, Linux, and Mac, though you probably don't want to host Python on Windows (culture more than ability).
Conversely, if you really like Windows, ASP.NET MVC makes building websites very easy. Mono does allow you to run .NET on Linux and Mac, but you might find support lacking, and I wouldn't suggest using Mono for your first project.
PHP is (was?) another popular language for building websites in. There are tonnes of web frameworks available for PHP. Popular opinion seems to be that PHP makes it easier for developers to write bad code, though it is possible to write good code with PHP.
Unfortunately, without knowing a rough direction in which you're headed, it's nearly impossible to offer concrete advice. Database choice will generally come down to what language and platform (Linux/.NET) you're targeting. The web server choice follows the same pattern. Once you decide on a language, narrowing down the other choices becomes a lot easier.
Learn HTML to start with, and keep improving as needed with CSS and JavaScript. You won't need more than this.
A recent announcement by Google about the Google Prediction API sounded very interesting. It could be useful for a project that is coming up, and would probably do a better job than some custom code I was considering.
However, there is some vendor lock-in. Google retain the trained model, and could later choose to overcharge me for it. It occurred to me that there are probably open-source equivalents, if I was willing to host the training myself (I am) and live without their ability to throw hardware at the problem at a moment's notice.
Last time I looked at 3rd Party computer training code was many years ago, and there were a lot of details that needed to be carefully considered and customised for your project. Google appear to have hidden those decisions, and take care of them for you. To me, this is still indistinguishable from magic, but I would like to hear whether others can do the same.
So my question is:
What alternatives to the Google Prediction API exist that:
categorise data with supervised machine learning,
can be easily configured (or don't need configuration) for different kinds and scales of data sets,
are open-source and self-hosted (or at the very least, provide you with royalty-free use of your model, without a dependence on a third party)?
Maybe Apache Mahout?
PredictionIO is an open source machine learning server for software developers to create predictive features, such as personalization, recommendation and content discovery.
I have recently been looking at tools like the Google Prediction API. One of the first I was pointed to was the Weka machine learning tool, which could be worth checking out for anyone looking.
I'm not sure if it's relevant, but directededge seems to be doing exactly that :)
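For a sense of what "categorise data with supervised machine learning" means when you host it yourself, here is a minimal sketch of a multinomial Naive Bayes text classifier using only the Python standard library. This is an illustration of one simple technique, not the API of Weka, Mahout, PredictionIO or the Google service; the class name and the toy training data are made up for the example, and real projects would need proper tokenisation and evaluation.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()
        self.vocab: Set[str] = set()

    def train(self, examples: Iterable[Tuple[str, str]]) -> None:
        """Each example is a (text, label) pair."""
        for text, label in examples:
            words = text.lower().split()  # naive whitespace tokenisation
            self.doc_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def predict(self, text: str) -> str:
        """Return the label with the highest log-probability."""
        if not self.doc_counts:
            raise ValueError("model has not been trained")
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = "", -math.inf
        for label, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # class prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in words:
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set, invented for illustration only.
model = NaiveBayesText()
model.train([
    ("cheap pills buy now", "spam"),
    ("buy cheap watches", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
])
```

The trained state here is just a few counters, so the model stays entirely under your control, which is the point of the self-hosting requirement in the question.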
There is a good free-to-use service, Yandex Predictor, with a quota of 100,000 requests per day. It works for text only, and supports several languages and spelling correction.
You need to get a free API key; then you can use a simple RESTful API. The API supports JSON, XML and JSONP as output formats.
Unfortunately I cannot find documentation in English, but you can use Google Translate.
I can translate the docs if there is demand.
I'm working on a "Twitter filter", more to learn Ruby on Rails than anything else. The idea is that I use a semantic ontology to look up a user's interests. So if a user says they're interested in "sports", that means flagging any tweets that discuss "sports", "golf", "football" and so on.
I'd like to be able to expand it to any hierarchy of topics, though. So if you're interested in Europe, flag all the countries in Europe.
Naturally this is rather complex, so maybe we'd limit it to one or two "levels" of lookup...
How could I do this efficiently? I'm pretty familiar with Java, C and Ruby, and have worked a lot with MySQL.
I'd look into Doug Lenat's Cyc. It's done and open.
I'm not sure if it will help you, but Google has something called Google Sets. You can look at it here: http://labs.google.com/sets
Before you think about programming languages and technology, think about this: what kind of data structure is a "semantic ontology"?
To me that sounds like some kind of a directed graph.
Knowing that, you'll soon find out that it's quite easy to implement such a structure in whatever language and technology you want, and that a lot of languages already have some kind of graph library (e.g. RGL for Ruby).
To me the real problem isn't how to implement such a data structure efficiently, but how to get the semantic information you need to build it (e.g. who tells your application that Europe isn't a part of Spain but that Spain is a part of Europe?).
Anyway, have fun implementing it, sounds like a cool project! :-)
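To make the directed-graph idea concrete, here is a minimal sketch in Python (the same shape carries over to Ruby). It assumes a hand-written toy ontology stored as a mapping from each topic to its direct subtopics; the names `ONTOLOGY`, `expand_topic` and `is_flagged` are hypothetical, and the matching is plain substring search, which is cruder than anything you'd want in production.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy ontology: each topic maps to its direct subtopics.
ONTOLOGY: Dict[str, List[str]] = {
    "sports": ["golf", "football"],
    "football": ["premier league"],
    "europe": ["spain", "france"],
    "spain": ["madrid"],
}


def expand_topic(
    topic: str, ontology: Dict[str, List[str]], max_depth: int = 2
) -> Set[str]:
    """Return the topic plus its descendants, at most `max_depth` levels down."""
    found = {topic}
    frontier = [topic]
    for _ in range(max_depth):
        next_frontier = []
        for current in frontier:
            for child in ontology.get(current, []):
                if child not in found:  # guards against cycles
                    found.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return found


def is_flagged(
    tweet: str,
    interests: Iterable[str],
    ontology: Dict[str, List[str]],
    max_depth: int = 2,
) -> bool:
    """Flag a tweet if it mentions any expanded interest term (naive substring match)."""
    text = tweet.lower()
    terms: Set[str] = set()
    for interest in interests:
        terms |= expand_topic(interest.lower(), ontology, max_depth)
    return any(term in text for term in terms)
```

With `max_depth=1`, an interest in "Europe" covers "spain" and "france" but not "madrid"; with the default depth of 2 it covers "madrid" as well. In a real application you would expand each user's interests once and cache the resulting term set rather than recomputing it per tweet.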
I'm not sure what your requirements are. But it seems that either Singular Value Decomposition (SVD) or Support Vector Machines (SVM) will work for you.